Chapter 24: Project Structure and Conventions

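Learning Objectives

  • Understand how Python project layouts have evolved and which practices are now recommended
  • Master the modern Python packaging toolchain (pyproject.toml, uv, Poetry)
  • Use code-quality tools fluently (Ruff, Black, isort, mypy)
  • Understand layered configuration management and its security practices
  • Learn how to build out a documentation system
  • Know the principles and common patterns of refactoring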

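24.1 Project Organization

24.1.1 Standard Structure for a Library Project

The Python community recommends the src layout. Its main advantage is that the package source lives under src/ rather than at the project root, so the package cannot be imported until it has been installed (for example with an editable install). Tests therefore run against the installed package instead of accidentally importing the uninstalled working copy: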

myproject/
├── src/
│   └── myproject/
│       ├── __init__.py           # Package init; exposes the public API
│       ├── py.typed              # PEP 561 typing marker
│       ├── core/
│       │   ├── __init__.py
│       │   ├── models.py         # Data models
│       │   ├── services.py       # Business logic
│       │   └── repositories.py   # Data access
│       ├── api/
│       │   ├── __init__.py
│       │   ├── routes.py         # Route definitions
│       │   └── schemas.py        # Request/response schemas
│       ├── utils/
│       │   ├── __init__.py
│       │   ├── logging.py        # Logging configuration
│       │   └── helpers.py        # General-purpose helpers
│       └── config.py             # Configuration management
├── tests/
│   ├── __init__.py
│   ├── conftest.py               # Shared pytest fixtures
│   ├── unit/
│   │   ├── __init__.py
│   │   ├── test_models.py
│   │   └── test_services.py
│   ├── integration/
│   │   ├── __init__.py
│   │   └── test_api.py
│   └── e2e/
│       ├── __init__.py
│       └── test_workflow.py
├── docs/
│   ├── conf.py
│   ├── index.rst
│   └── Makefile
├── scripts/
│   ├── setup_dev.sh
│   └── migrate_db.py
├── .github/
│   ├── workflows/
│   │   └── ci.yml
│   ├── CODEOWNERS
│   └── PULL_REQUEST_TEMPLATE.md
├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── README.md
├── CHANGELOG.md
├── pyproject.toml
└── Makefile

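src layout vs. flat layout

| Dimension | src layout | flat layout |
| --- | --- | --- |
| Test isolation | The package must be installed before it can be tested | Tests may accidentally import uninstalled code |
| Packaging safety | Avoids accidentally shipping test files | Test files must be excluded explicitly |
| IDE support | Requires configuring the source root | Works out of the box |
| Community recommendation | Recommended by PyPA | Acceptable for simple projects |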
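
The test-isolation difference can be checked with a small experiment. The sketch below builds two throwaway project trees, one flat and one using src/, and asks Python's import machinery whether the package is visible when only the project root is searched. The package name demo_pkg and both helper functions are made up for illustration.

```python
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def make_package(parent: Path, name: str) -> None:
    """Create an empty package directory `name` under `parent`."""
    pkg = parent / name
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")


def importable_from_root(project_root: Path, name: str) -> bool:
    """Return True if `name` is found when only the project root is searched."""
    return PathFinder.find_spec(name, [str(project_root)]) is not None


with tempfile.TemporaryDirectory() as tmp:
    flat_root = Path(tmp) / "flat_project"
    src_root = Path(tmp) / "src_project"
    make_package(flat_root, "demo_pkg")         # flat: package sits at the root
    make_package(src_root / "src", "demo_pkg")  # src: package sits under src/

    flat_visible = importable_from_root(flat_root, "demo_pkg")
    src_visible = importable_from_root(src_root, "demo_pkg")

print(flat_visible, src_visible)  # → True False
```

Running tests from the project root usually puts that root on the import path, which is why the flat tree can shadow an installed copy while the src tree cannot.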

24.1.2 Web Application Project Structure

webapp/
├── app/
│   ├── __init__.py               # Application factory
│   ├── extensions.py             # Extension initialization
│   ├── models/
│   │   ├── __init__.py
│   │   ├── user.py
│   │   ├── post.py
│   │   └── mixins.py             # Reusable model mixins
│   ├── views/
│   │   ├── __init__.py
│   │   ├── auth.py
│   │   ├── dashboard.py
│   │   └── api/
│   │       ├── __init__.py
│   │       ├── v1/
│   │       │   ├── __init__.py
│   │       │   ├── users.py
│   │       │   └── posts.py
│   │       └── v2/
│   │           └── __init__.py
│   ├── services/
│   │   ├── __init__.py
│   │   ├── auth_service.py
│   │   └── email_service.py
│   ├── forms/
│   │   ├── __init__.py
│   │   └── auth_forms.py
│   ├── templates/
│   │   ├── base.html
│   │   ├── components/
│   │   │   ├── navbar.html
│   │   │   └── pagination.html
│   │   ├── auth/
│   │   │   ├── login.html
│   │   │   └── register.html
│   │   └── dashboard/
│   │       └── index.html
│   ├── static/
│   │   ├── css/
│   │   ├── js/
│   │   └── img/
│   ├── middleware/
│   │   ├── __init__.py
│   │   └── auth.py
│   └── utils/
│       ├── __init__.py
│       └── decorators.py
├── migrations/
│   ├── env.py
│   ├── versions/
│   └── alembic.ini
├── tests/
│   ├── conftest.py
│   ├── factories.py              # Test data factories
│   ├── unit/
│   └── integration/
├── requirements/
│   ├── base.txt
│   ├── dev.txt
│   └── prod.txt
├── config.py                     # Configuration classes
├── .env.example
├── .env                          # Local environment variables (not committed)
└── run.py                        # Entry-point script

24.1.3 Data Science Project Structure

datascience/
├── data/
│   ├── raw/                      # Raw data (read-only)
│   ├── interim/                  # Intermediate data
│   ├── processed/                # Final processed data
│   └── external/                 # External data sources
├── models/
│   ├── trained/                  # Trained models
│   └── predictions/              # Model predictions
├── notebooks/
│   ├── 01_exploratory.ipynb
│   ├── 02_feature_engineering.ipynb
│   └── 03_model_training.ipynb
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── make_dataset.py
│   │   └── preprocess.py
│   ├── features/
│   │   ├── __init__.py
│   │   └── build_features.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── train.py
│   │   └── predict.py
│   └── visualization/
│       ├── __init__.py
│       └── visualize.py
├── tests/
├── pyproject.toml
├── Makefile
├── dvc.yaml                      # DVC pipeline
└── params.yaml                   # Hyperparameter configuration

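24.2 Package Management

24.2.1 pyproject.toml in Detail

pyproject.toml is the modern configuration standard for Python projects. PEP 518 introduced the file and its [build-system] table, and PEP 621 standardized the [project] metadata table. Together with the [tool.*] tables, it brings the build system, project metadata, and tool configuration into a single file: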

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "myproject"
version = "1.0.0"
description = "A production-grade Python application"
readme = "README.md"
license = "MIT"
license-files = ["LICENSE"]
authors = [
{name = "Your Name", email = "your.email@example.com"},
]
maintainers = [
{name = "Maintainer Name", email = "maintainer@example.com"},
]
classifiers = [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
# No license classifier: PEP 639 deprecates them once `license` is an SPDX expression
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: Implementation :: CPython",
"Topic :: Software Development :: Libraries",
"Typing :: Typed",
]
requires-python = ">=3.10"
dependencies = [
"requests>=2.31.0",
"click>=8.1.0",
"pydantic>=2.0.0",
"sqlalchemy>=2.0.0",
"structlog>=23.0.0",
]

[project.optional-dependencies]
dev = [
"pytest>=8.0.0",
"pytest-cov>=5.0.0",
"pytest-asyncio>=0.23.0",
"ruff>=0.4.0",
"mypy>=1.10.0",
"pre-commit>=3.7.0",
"commitizen>=3.27.0",
]
docs = [
"sphinx>=7.0.0",
"sphinx-rtd-theme>=2.0.0",
"myst-parser>=3.0.0",
]
test = [
"pytest>=8.0.0",
"pytest-cov>=5.0.0",
"pytest-asyncio>=0.23.0",
"hypothesis>=6.100.0",
"factory-boy>=3.3.0",
]

[project.scripts]
myproject = "myproject.cli:main"

[project.gui-scripts]
myproject-gui = "myproject.gui:run"

[project.entry-points."myproject.plugins"]
builtin = "myproject.plugins.builtin"

[project.urls]
Homepage = "https://github.com/user/myproject"
Documentation = "https://myproject.readthedocs.io"
Repository = "https://github.com/user/myproject"
Changelog = "https://github.com/user/myproject/blob/main/CHANGELOG.md"
Issues = "https://github.com/user/myproject/issues"
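
The [project.scripts] entry above maps the myproject command to a main function in myproject/cli.py. The configuration does not show that module, so the following is only a sketch of what it might look like. It uses argparse from the standard library to stay self-contained (the dependency list suggests the real project may use click), and the name argument and --shout flag are invented for illustration.

```python
from __future__ import annotations

import argparse
from collections.abc import Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="myproject")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--shout", action="store_true", help="upper-case the greeting")
    return parser


def main(argv: Sequence[str] | None = None) -> int:
    """Entry point referenced as "myproject.cli:main" (hypothetical body)."""
    # argv=None makes argparse read sys.argv[1:], which is what the
    # generated console script relies on.
    args = build_parser().parse_args(argv)
    greeting = f"Hello, {args.name}!"
    print(greeting.upper() if args.shout else greeting)
    return 0  # becomes the process exit code
```

Accepting an optional argv and returning an exit code keeps the function easy to call from tests, for example main(["Ada", "--shout"]).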

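24.2.2 Comparing Package Management Tools

| Feature | pip + venv | Poetry | uv | Hatch |
| --- | --- | --- | --- | --- |
| Dependency resolution | Limited | Full | Very fast | Full |
| Lock file | requirements.txt | poetry.lock | uv.lock | None (configurable) |
| Virtual environments | Managed manually | Managed automatically | Managed automatically | Managed automatically |
| Build backend | setuptools | poetry-core | None (uses others) | hatchling |
| Publishing | twine | Built in | Built in | |
| Performance | Baseline | Moderate | Very fast | Moderate |
| Maturity | | | Growing rapidly | Moderate |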

Using uv

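uv init myproject                # Create a new project
uv add requests pydantic         # Add runtime dependencies
uv add --dev pytest ruff mypy    # Add development dependencies
uv sync                          # Sync the environment with the project's dependencies
uv run pytest                    # Run a command inside the project environment
uv build                         # Build the source distribution and wheel
uv publish                       # Upload the built distributions to a package index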

Using Poetry

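poetry new myproject                      # Create a new project skeleton
poetry add requests pydantic              # Add runtime dependencies
poetry add --group dev pytest ruff mypy   # Add dependencies to the dev group
poetry install                            # Install dependencies into the environment
poetry run pytest                         # Run a command inside the environment
poetry build                              # Build the source distribution and wheel
poetry publish                            # Upload the built distributions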

24.2.3 Dependency Version Strategies

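# Exact pin (deterministic builds; recommended for deployed applications)
requests==2.31.0

# Bounded range (allows minor and patch updates within 2.x; recommended for libraries)
requests>=2.31.0,<3.0.0

# Compatible release operator (PEP 440): equivalent to >=2.31.0,==2.31.*, so only patch updates are allowed
requests~=2.31.0

# Minimum version only (not recommended; a later major release may introduce breaking changes)
requests>=2.31.0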
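
The compatible release operator is easy to misread, so it can help to expand it by hand. The function below is a minimal sketch of the rule from PEP 440, restricted to plain numeric versions; the function name is made up, and real tooling should rely on a full specifier parser instead.

```python
def compatible_release_bounds(version: str) -> tuple[str, str]:
    """Expand `~=version` into an inclusive lower and an exclusive upper bound.

    Only plain numeric versions such as "2.31.0" or "2.31" are handled;
    pre-releases, post-releases and epochs are out of scope for this sketch.
    """
    parts = [int(p) for p in version.split(".")]
    if len(parts) < 2:
        raise ValueError("~= requires at least two version segments")
    # Drop the last segment, then bump the segment that is now last.
    prefix = parts[:-1]
    prefix[-1] += 1
    upper = ".".join(str(p) for p in prefix)
    return f">={version}", f"<{upper}"


print(compatible_release_bounds("2.31.0"))  # → ('>=2.31.0', '<2.32')
print(compatible_release_bounds("2.31"))    # → ('>=2.31', '<3')
```

The second call shows why the number of segments matters: ~=2.31 admits any 2.x release from 2.31 onward, while ~=2.31.0 admits only 2.31.x.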

Best practices for dependency groups

[project.optional-dependencies]
dev = [
"ruff>=0.4.0",
"mypy>=1.10.0",
"pre-commit>=3.7.0",
"ipython>=8.0.0",
]
test = [
"pytest>=8.0.0",
"pytest-cov>=5.0.0",
"hypothesis>=6.100.0",
]
docs = [
"sphinx>=7.0.0",
"sphinx-rtd-theme>=2.0.0",
]
all = [
"myproject[dev,test,docs]",
]

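24.3 Code Conventions

24.3.1 PEP 8 and Modern Code Style

PEP 8 is the baseline style guide for Python code, but modern projects should combine it with the following complementary conventions: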

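| Standard | Area | Key points |
| --- | --- | --- |
| PEP 8 | Code style | Indentation, line length, naming, blank lines |
| PEP 257 | Docstrings | Docstring format and content |
| PEP 484 | Type hints | Static type annotations |
| PEP 526 | Variable annotations | Syntax for annotating variables |
| PEP 3134 | Exception chaining | raise ... from ... |
| Google Style Guide | General conventions | Google's Python conventions |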
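
Of the entries above, exception chaining is the one most easily shown in a few lines. In this sketch, ConfigError and parse_port are hypothetical names; the point is only that raise ... from ... keeps the original error attached as __cause__.

```python
class ConfigError(Exception):
    """Raised when a configuration value cannot be used."""


def parse_port(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as exc:
        # `from exc` records the original error on __cause__, so the
        # traceback shows both the low-level and the domain-level failure.
        raise ConfigError(f"invalid port: {raw!r}") from exc


try:
    parse_port("eighty")
except ConfigError as err:
    cause = err.__cause__
    print(type(cause).__name__)  # → ValueError
```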

Naming conventions

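# Classes: CapWords
class UserRepository:
    pass


# Acronyms stay fully capitalized inside class names
class HTTPClient:
    pass


# Module-level constants: UPPER_SNAKE_CASE
MAX_CONNECTIONS = 100
DEFAULT_TIMEOUT = 30


# Functions: snake_case
def calculate_total(items: list[dict]) -> float:
    ...


# A leading underscore marks a module-internal helper
def _internal_helper(data: bytes) -> str:
    ...


# Variables: snake_case
user_count: int = 0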

24.3.2 Ruff: An All-in-One Linter and Formatter

Ruff is a very fast Python linter and formatter written in Rust. It can replace several separate tools, including flake8, isort, and pyupgrade:

[tool.ruff]
target-version = "py310"
line-length = 88
src = ["src"]

[tool.ruff.lint]
select = [
"E", # pycodestyle errors
"W", # pycodestyle warnings
"F", # pyflakes
"I", # isort
"N", # pep8-naming
"UP", # pyupgrade
"B", # flake8-bugbear
"SIM", # flake8-simplify
"TCH", # flake8-type-checking
"RUF", # ruff-specific rules
"PERF", # perflint
"PGH", # pygrep-hooks
"PLC", # pylint convention
"PLE", # pylint error
"PLR", # pylint refactor
"PLW", # pylint warning
]
ignore = [
"E501", # line too long (handled by formatter)
"PLR0913", # too many arguments
]

[tool.ruff.lint.per-file-ignores]
"tests/**" = ["PLR2004", "S101"]

[tool.ruff.lint.isort]
known-first-party = ["myproject"]
force-single-line = true

[tool.ruff.lint.pylint]
max-args = 7

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
skip-magic-trailing-comma = false
line-ending = "auto"
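
Common commands:

ruff check .           # Report lint violations
ruff check --fix .     # Apply the available automatic fixes
ruff format .          # Format files in place
ruff format --check .  # Report files that would be reformatted, without changing them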

24.3.3 Type Checking with mypy

[tool.mypy]
python_version = "3.10"  # Match the minimum version declared in requires-python
strict = true
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_any_generics = true
check_untyped_defs = true
no_implicit_optional = true
warn_redundant_casts = true
warn_unused_ignores = true
warn_no_return = true
strict_optional = true

[[tool.mypy.overrides]]
module = "third_party_lib.*"
ignore_missing_imports = true

[[tool.mypy.overrides]]
module = "tests.*"
disallow_untyped_defs = false

Type annotation best practices

from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import Any, Protocol, TypeVar, runtime_checkable


@runtime_checkable
class Comparable(Protocol):
    """Structural type: anything that implements the < operator."""

    def __lt__(self, other: Any) -> bool: ...


# T is bound to Comparable so that items can be ordered even when no key is given.
T = TypeVar("T", bound=Comparable)
# D is the type of the fallback value in DataStore.get.
D = TypeVar("D")


def sort_items(
    items: Sequence[T],
    key: Callable[[T], Any] | None = None,
    *,
    reverse: bool = False,
) -> list[T]:
    return sorted(items, key=key, reverse=reverse)


class DataStore:
    def __init__(self, data: dict[str, str] | None = None) -> None:
        self._data: dict[str, str] = data or {}

    def get(self, key: str, default: D) -> str | D:
        # Returns the stored string, or `default` when the key is missing.
        return self._data.get(key, default)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
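
The @runtime_checkable decorator used above is what allows a Protocol to appear in isinstance() checks. The following sketch illustrates this with hypothetical SupportsClose and FileHandle classes; note that the runtime check only looks for the presence of the method, not its signature.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsClose(Protocol):
    def close(self) -> None: ...


class FileHandle:
    """Does not inherit from SupportsClose; it only has a matching method."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def close_all(resources: list[object]) -> int:
    """Close every object that structurally satisfies SupportsClose."""
    closed = 0
    for resource in resources:
        # isinstance() on a runtime_checkable protocol checks only that
        # the method exists.
        if isinstance(resource, SupportsClose):
            resource.close()
            closed += 1
    return closed


handle = FileHandle()
count = close_all([handle, "not closable", 42])
print(count, handle.closed)  # → 1 True
```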

24.3.4 pre-commit Configuration

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-toml
      - id: check-json
      - id: check-merge-conflict
      - id: check-added-large-files
        args: ['--maxkb=500']
      - id: detect-private-key
      - id: debug-statements
      - id: no-commit-to-branch
        args: ['--branch', 'main']

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.0
    hooks:
      - id: ruff
        args: ['--fix', '--exit-non-zero-on-fix']
      - id: ruff-format

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
        additional_dependencies: ['pydantic>=2.0']
        args: ['--strict']

24.4 Configuration Management

24.4.1 Layered Configuration Architecture

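┌────────────────────────────────────────────────────────┐
│ Layer 1: In-code defaults                              │
│ (sensible hard-coded defaults; runs with zero config)  │
├────────────────────────────────────────────────────────┤
│ Layer 2: Configuration files                           │
│ (pyproject.toml / config.yaml / .env)                  │
├────────────────────────────────────────────────────────┤
│ Layer 3: Environment variables                         │
│ (overrides config files; suits container deployments)  │
├────────────────────────────────────────────────────────┤
│ Layer 4: Command-line arguments                        │
│ (highest priority; for temporary overrides)            │
└────────────────────────────────────────────────────────┘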
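
One way to realize this precedence order is collections.ChainMap, which looks a key up in each mapping in turn and returns the first hit. The sketch below is an illustration under assumptions, not a prescribed design: the setting names, the APP_ prefix convention, and the load_layers helper are all invented, and the file layer is passed in as an already-parsed dict.

```python
import argparse
from collections import ChainMap

# Layer 1: defaults that let the program run with no configuration at all.
DEFAULTS = {"host": "127.0.0.1", "port": "8000", "log_level": "INFO"}


def load_layers(
    file_values: dict[str, str],
    environ: dict[str, str],
    argv: list[str],
) -> ChainMap:
    """Combine the four layers; earlier maps in the ChainMap win."""
    # Layer 3: only APP_-prefixed variables (hypothetical naming convention).
    env_values = {
        key[len("APP_"):].lower(): value
        for key, value in environ.items()
        if key.startswith("APP_")
    }

    # Layer 4: command-line flags. Unset flags are dropped so that a None
    # value does not mask the lower layers.
    parser = argparse.ArgumentParser()
    parser.add_argument("--host")
    parser.add_argument("--port")
    parser.add_argument("--log-level", dest="log_level")
    parsed = vars(parser.parse_args(argv))
    cli_values = {k: v for k, v in parsed.items() if v is not None}

    return ChainMap(cli_values, env_values, file_values, DEFAULTS)


config = load_layers(
    file_values={"port": "9000", "log_level": "DEBUG"},
    environ={"APP_PORT": "9100", "HOME": "/root"},
    argv=["--host", "0.0.0.0"],
)
print(config["host"], config["port"], config["log_level"])  # → 0.0.0.0 9100 DEBUG
```

In a real program the environ argument would typically be dict(os.environ) and argv would be sys.argv[1:]; passing them in explicitly keeps the function easy to test.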

24.4.2 Class-Based Configuration

from __future__ import annotations

import os
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path


class Environment(Enum):
    DEVELOPMENT = "development"
    TESTING = "testing"
    STAGING = "staging"
    PRODUCTION = "production"


@dataclass
class DatabaseConfig:
    url: str = "sqlite:///app.db"
    pool_size: int = 5
    max_overflow: int = 10
    echo: bool = False
    connect_timeout: int = 30


@dataclass
class RedisConfig:
    url: str = "redis://localhost:6379/0"
    max_connections: int = 50
    socket_timeout: int = 5


@dataclass
class SecurityConfig:
    secret_key: str = "change-me-in-production"
    algorithm: str = "HS256"
    access_token_expire_minutes: int = 30
    refresh_token_expire_days: int = 7
    bcrypt_rounds: int = 12


@dataclass
class LoggingConfig:
    level: str = "INFO"
    format: str = "json"
    file_path: Path | None = None
    max_bytes: int = 10 * 1024 * 1024
    backup_count: int = 5


@dataclass
class AppConfig:
    env: Environment = Environment.DEVELOPMENT
    debug: bool = True
    base_dir: Path = field(default_factory=lambda: Path(__file__).parent.parent)
    database: DatabaseConfig = field(default_factory=DatabaseConfig)
    redis: RedisConfig = field(default_factory=RedisConfig)
    security: SecurityConfig = field(default_factory=SecurityConfig)
    logging: LoggingConfig = field(default_factory=LoggingConfig)

    @classmethod
    def from_env(cls) -> AppConfig:
        env_name = os.getenv("APP_ENV", "development")
        env = Environment(env_name)

        config = cls(
            env=env,
            debug=os.getenv("APP_DEBUG", "true").lower() == "true",
            database=DatabaseConfig(
                url=os.getenv("DATABASE_URL", "sqlite:///app.db"),
                pool_size=int(os.getenv("DB_POOL_SIZE", "5")),
                echo=os.getenv("DB_ECHO", "false").lower() == "true",
            ),
            security=SecurityConfig(
                secret_key=os.getenv("SECRET_KEY", "change-me-in-production"),
                access_token_expire_minutes=int(
                    os.getenv("ACCESS_TOKEN_EXPIRE_MINUTES", "30")
                ),
            ),
            logging=LoggingConfig(
                level=os.getenv("LOG_LEVEL", "INFO"),
                format=os.getenv("LOG_FORMAT", "json"),
                file_path=Path(log_path) if (log_path := os.getenv("LOG_FILE_PATH")) else None,
            ),
        )

        if env == Environment.PRODUCTION:
            config.debug = False
            if config.security.secret_key == "change-me-in-production":
                raise ValueError("SECRET_KEY must be set in production")

        return config

24.4.3 Pydantic Settings

from pydantic import Field, SecretStr, model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        env_prefix="APP_",
        env_nested_delimiter="__",
        case_sensitive=False,
    )

    app_name: str = "MyApp"
    environment: str = Field(default="development", pattern="^(development|staging|production)$")
    debug: bool = False

    database_url: SecretStr = SecretStr("sqlite:///app.db")
    database_pool_size: int = Field(default=5, ge=1, le=100)

    redis_url: str = "redis://localhost:6379/0"

    secret_key: SecretStr = SecretStr("change-me")
    access_token_expire_minutes: int = Field(default=30, ge=1)

    log_level: str = Field(default="INFO", pattern="^(DEBUG|INFO|WARNING|ERROR|CRITICAL)$")

    # A model-level validator also runs when secret_key falls back to its
    # default; field validators skip defaults unless validate_default=True.
    @model_validator(mode="after")
    def require_secret_key_in_production(self) -> "Settings":
        if self.environment == "production" and self.secret_key.get_secret_value() == "change-me":
            raise ValueError("APP_SECRET_KEY must be set in production")
        return self


settings = Settings()

24.4.4 Environment Variable Security Practices

APP_ENVIRONMENT=production
APP_DEBUG=false
APP_DATABASE_URL=postgresql://user:pass@db:5432/app
APP_SECRET_KEY=${VAULT_SECRET_KEY}
APP_LOG_LEVEL=INFO
APP_REDIS_URL=redis://redis:6379/0

Keep these files out of version control with matching .gitignore entries:

.env
.env.local
.env.*.local

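Security principles

  1. Never commit secrets: inject all sensitive values through environment variables
  2. Provide a .env.example: list every required environment variable with a sample value
  3. Validate at runtime: check at startup that the required environment variables are set
  4. Least privilege: give each environment only the minimum permissions it needs
  5. Rotate secrets: replace keys regularly and support seamless switchover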
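
Principle 3 can be implemented with a few lines of standard-library code that run before the application starts. This is a minimal sketch: the variable names in REQUIRED_VARS and the MissingConfigError class are assumptions chosen to match the .env example, not a fixed convention.

```python
import os
from collections.abc import Mapping

# Hypothetical list of variables this application cannot run without.
REQUIRED_VARS = ("APP_SECRET_KEY", "APP_DATABASE_URL")


class MissingConfigError(RuntimeError):
    """Raised at startup when required environment variables are absent."""


def validate_environment(environ: Mapping[str, str] = os.environ) -> None:
    # Empty strings are treated the same as missing values.
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        # Report names only; secret values must never reach the logs.
        raise MissingConfigError(
            "missing required environment variables: " + ", ".join(missing)
        )
```

Calling validate_environment() as the first step of the entry point makes a misconfigured deployment fail immediately, rather than at the first request that needs the missing value.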

24.5 Documentation System

24.5.1 README.md Conventions

# Project Name

> One-line description of what this project does

## Features

- Feature 1
- Feature 2
- Feature 3

## Quick Start

### Prerequisites

- Python >= 3.10
- uv (recommended) or pip

### Installation

```bash
uv pip install myproject
```

### Basic Usage

```python
from myproject import App

app = App()
app.run()
```

## Documentation

Full documentation is available at docs.example.com.

## Development

```bash
git clone https://github.com/user/myproject.git
cd myproject
uv sync
uv run pytest
```

## Contributing

Please read CONTRIBUTING.md for details.

## License

This project is licensed under the MIT License - see LICENSE.


24.5.2 API Documentation

from __future__ import annotations


def calculate_discount(
    price: float,
    discount_rate: float,
    *,
    min_price: float = 0.0,
    max_discount: float | None = None,
) -> float:
    """Calculate the discounted price of an item.

    Applies the given discount rate to the original price, with optional
    constraints on minimum price and maximum discount amount.

    Args:
        price: The original price of the item. Must be non-negative.
        discount_rate: The discount rate as a decimal (e.g., 0.15 for 15%).
            Must be between 0.0 and 1.0.
        min_price: The minimum allowable price after discount.
            Defaults to 0.0.
        max_discount: The maximum discount amount allowed. If None,
            no cap is applied. Defaults to None.

    Returns:
        The discounted price, clamped to [min_price, price].

    Raises:
        ValueError: If price is negative or discount_rate is out of range.

    Examples:
        >>> calculate_discount(100.0, 0.15)
        85.0
        >>> calculate_discount(100.0, 0.50, max_discount=30.0)
        70.0
        >>> calculate_discount(100.0, 0.90, min_price=20.0)
        20.0
    """
    if price < 0:
        raise ValueError(f"price must be non-negative, got {price}")
    if not 0.0 <= discount_rate <= 1.0:
        raise ValueError(
            f"discount_rate must be between 0.0 and 1.0, got {discount_rate}"
        )

    discount_amount = price * discount_rate
    if max_discount is not None:
        discount_amount = min(discount_amount, max_discount)

    discounted = price - discount_amount
    return max(discounted, min_price)
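
The Examples section of a docstring like this one can double as a test, because the standard doctest module is able to execute the >>> lines and compare the results. The sketch below shows one way to run the examples of a single function programmatically; apply_rate and check_examples are hypothetical names used only for this illustration.

```python
import doctest


def apply_rate(price: float, rate: float) -> float:
    """Return the price after applying a discount rate.

    Examples:
        >>> apply_rate(100.0, 0.5)
        50.0
        >>> apply_rate(80.0, 0.0)
        80.0
    """
    return price * (1 - rate)


def check_examples(func) -> doctest.TestResults:
    """Run the >>> examples in func's docstring and return the tally."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__ or "",
        globs={func.__name__: func},  # names visible to the examples
        name=func.__name__,
        filename=None,
        lineno=0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


results = check_examples(apply_rate)
print(results.failed, results.attempted)  # → 0 2
```

In everyday use the same check is usually run for a whole module with python -m doctest, or through pytest's --doctest-modules option.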

24.5.3 Architecture Decision Records (ADR)

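# ADR-001: Choose SQLAlchemy as the ORM

## Status

Accepted

## Context

The project needs to interact with a relational database, so an ORM framework must be chosen.

## Decision

Use SQLAlchemy 2.0 as the ORM framework.

## Rationale

1. **Maturity**: SQLAlchemy is the most mature ORM in the Python ecosystem and has broad community support
2. **2.0 improvements**: native async support, type annotations, and dataclass integration
3. **Flexibility**: multiple levels of abstraction, from the high-level ORM down to raw SQL
4. **Performance**: 2.0 brings significant improvements in bulk operations and query performance

## Alternatives Considered

- Django ORM: tightly coupled to Django and not suited to standalone use
- Tortoise ORM: async-first, but with a smaller ecosystem
- SQLModel: built on SQLAlchemy but younger; its stability is still to be proven

## Consequences

- The team needs to learn the new SQLAlchemy 2.0 API
- Alembic can be used for database migrations
- Care is needed to use async sessions correctly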

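24.6 Refactoring

24.6.1 Refactoring Principles

The core principles of refactoring come from Martin Fowler's classic book Refactoring: Improving the Design of Existing Code:

  1. Take small steps: make one small change at a time and keep the code working after each step (in Python, this means the tests still pass)
  2. Rely on tests: make sure test coverage is adequate before you start refactoring
  3. Preserve behavior: refactoring does not change the externally observable behavior of the code
  4. Refactor continuously: follow the "Rule of Three" and refactor the third time you do something similar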
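
When legacy code has no tests, principles 2 and 3 are often met by first writing characterization tests that simply record what the code does today. The following sketch uses unittest from the standard library; legacy_shipping_cost is an invented stand-in for real legacy code, and the expected values were obtained by reading its current behavior rather than from a specification.

```python
import unittest


def legacy_shipping_cost(weight_kg: float, express: bool) -> float:
    # Hypothetical legacy function whose behavior must be preserved.
    if express:
        if weight_kg > 10:
            return 25.0 + (weight_kg - 10) * 2.0
        return 25.0
    if weight_kg > 10:
        return 10.0 + (weight_kg - 10) * 1.0
    return 10.0


class ShippingCostCharacterization(unittest.TestCase):
    """Pins down current outputs so a later refactor can be verified."""

    CASES = [
        ((5.0, False), 10.0),
        ((12.0, False), 12.0),
        ((5.0, True), 25.0),
        ((12.0, True), 29.0),
        ((10.0, True), 25.0),  # boundary: exactly at the threshold
    ]

    def test_known_inputs(self) -> None:
        for args, expected in self.CASES:
            with self.subTest(args=args):
                self.assertEqual(legacy_shipping_cost(*args), expected)
```

With these cases in place, each small refactoring step can be followed by a test run, and any change in observable behavior shows up as a failing subtest.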

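24.6.2 Code Smells and Refactoring Techniques

| Code smell | Description | Refactoring techniques |
| --- | --- | --- |
| Long Method | The method is too long to understand easily | Extract Method, Replace Temp with Query |
| Large Class | The class has too many responsibilities | Extract Class, apply the Single Responsibility Principle |
| Long Parameter List | Too many parameters | Introduce Parameter Object, Preserve Whole Object |
| Divergent Change | One class changes for several different reasons | Extract Class |
| Shotgun Surgery | One change touches many classes | Move Method/Field, Inline Class |
| Feature Envy | A method relies too heavily on another class's data | Move Method, Extract Method |
| Data Clumps | Data items that always appear together | Extract Class |
| Primitive Obsession | Overuse of primitive types | Replace Data Value with Object |
| Switch Statements | Complex conditional logic | Replace Conditional with Polymorphism, Strategy pattern |
| Speculative Generality | Over-engineering | Collapse Hierarchy, Inline Class |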

24.6.3 A Worked Refactoring Example

Before refactoring

def process_order(data: dict) -> dict:
    if not data.get("items"):
        raise ValueError("Order must have items")

    total = 0
    for item in data["items"]:
        if item.get("price") and item.get("quantity"):
            if item.get("discount"):
                item_total = item["price"] * item["quantity"] * (1 - item["discount"])
            else:
                item_total = item["price"] * item["quantity"]
            total += item_total

    if data.get("coupon"):
        if data["coupon"] == "SAVE10":
            total = total * 0.9
        elif data["coupon"] == "SAVE20":
            total = total * 0.8

    if total > 1000:
        total = total * 0.95

    tax = total * 0.08

    return {
        "subtotal": total,
        "tax": tax,
        "total": total + tax,
        "items_count": len(data["items"]),
    }

After refactoring

from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Coupon(Enum):
    SAVE10 = ("SAVE10", 0.10)
    SAVE20 = ("SAVE20", 0.20)

    def __init__(self, code: str, discount_rate: float) -> None:
        self.code = code
        self.discount_rate = discount_rate

    @classmethod
    def from_code(cls, code: str) -> Coupon | None:
        # Unknown codes yield None, so they are ignored as in the original.
        return next((c for c in cls if c.code == code), None)


@dataclass
class OrderItem:
    price: float
    quantity: int
    discount: float = 0.0

    @property
    def subtotal(self) -> float:
        return self.price * self.quantity * (1 - self.discount)


@dataclass
class Order:
    items: list[OrderItem] = field(default_factory=list)
    coupon: Coupon | None = None
    bulk_discount_threshold: float = 1000.0
    bulk_discount_rate: float = 0.05
    tax_rate: float = 0.08

    @property
    def items_count(self) -> int:
        return len(self.items)

    @property
    def subtotal(self) -> float:
        base = sum(item.subtotal for item in self.items)
        after_coupon = self._apply_coupon(base)
        after_bulk = self._apply_bulk_discount(after_coupon)
        return after_bulk

    @property
    def tax(self) -> float:
        return self.subtotal * self.tax_rate

    @property
    def total(self) -> float:
        return self.subtotal + self.tax

    def to_dict(self) -> dict:
        # Amounts are rounded to two decimals for presentation; the
        # original function returned unrounded floats.
        return {
            "subtotal": round(self.subtotal, 2),
            "tax": round(self.tax, 2),
            "total": round(self.total, 2),
            "items_count": self.items_count,
        }

    def _apply_coupon(self, amount: float) -> float:
        if self.coupon is None:
            return amount
        return amount * (1 - self.coupon.discount_rate)

    def _apply_bulk_discount(self, amount: float) -> float:
        if amount > self.bulk_discount_threshold:
            return amount * (1 - self.bulk_discount_rate)
        return amount


def process_order(data: dict) -> dict:
    # Unlike the original function, an item without "price" or "quantity"
    # raises KeyError here instead of being silently skipped.
    items = [
        OrderItem(
            price=item["price"],
            quantity=item["quantity"],
            discount=item.get("discount", 0.0),
        )
        for item in data.get("items", [])
    ]

    if not items:
        raise ValueError("Order must have items")

    coupon = None
    if coupon_code := data.get("coupon"):
        coupon = Coupon.from_code(coupon_code)

    order = Order(items=items, coupon=coupon)
    return order.to_dict()

24.6.4 Automation with a Makefile

.PHONY: install lint format test test-e2e build clean check setup

PYTHON := python
UV := uv

install:
	$(UV) sync

lint:
	$(UV) run ruff check .
	$(UV) run mypy src/

format:
	$(UV) run ruff format .
	$(UV) run ruff check --fix .

test:
	$(UV) run pytest tests/ -v --cov=src --cov-report=term-missing

test-e2e:
	$(UV) run pytest tests/e2e/ -v

build:
	$(UV) build

clean:
	find . -type d -name __pycache__ -exec rm -rf {} +
	find . -type f -name "*.pyc" -delete
	rm -rf .pytest_cache .mypy_cache .ruff_cache
	rm -rf dist build *.egg-info

check: lint test
	@echo "✅ All checks passed!"

setup: install
	$(UV) run pre-commit install
	@echo "✅ Development environment ready!"

24.7 Recent Developments

24.7.1 Modern Python Project Management

# Modern pyproject.toml configuration
[project]
name = "my-project"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
"requests>=2.28.0",
]

[project.optional-dependencies]
dev = ["pytest", "ruff", "mypy"]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.ruff]
line-length = 88

# Lint rule selection belongs under [tool.ruff.lint]; the top-level form is deprecated
[tool.ruff.lint]
select = ["E", "F", "I", "N", "W"]

[tool.mypy]
strict = true

24.7.2 The uv Package Manager

# uv: a very fast Python package manager
uv init my-project
uv add requests numpy pandas
uv add --dev pytest ruff mypy
uv sync
uv run pytest

24.7.3 Ruff as an All-in-One Tool

[tool.ruff]
line-length = 88
target-version = "py311"

[tool.ruff.lint]
select = ["E", "F", "I", "N", "W", "UP", "B", "C4", "SIM"]
ignore = ["E501"]

[tool.ruff.format]
quote-style = "double"

24.7.4 Configuration Management with Pydantic Settings

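from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # model_config replaces the inner `class Config`, which is deprecated in Pydantic v2
    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

    app_name: str = "MyApp"
    debug: bool = False
    # Required field, read from the DATABASE_URL environment variable
    database_url: str = Field(alias="DATABASE_URL")


settings = Settings()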

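24.8 Chapter Summary

This chapter covered the core body of knowledge on Python project structure and conventions:

  1. Project organization: design principles for the src layout, web application structure, and data science project structure
  2. Package management: standard pyproject.toml configuration, a comparison of modern package management tools, and dependency version strategies
  3. Code conventions: PEP 8 and its complementary standards, Ruff as an all-in-one tool, type checking with mypy, and automation with pre-commit
  4. Configuration management: layered configuration architecture, class-based configuration, Pydantic Settings, and environment variable security
  5. Documentation system: README conventions, API documentation, and architecture decision records
  6. Refactoring: refactoring principles, recognizing code smells, and a worked refactoring example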

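24.9 Exercises and Projects

Basic exercises

  1. Project initialization: Use uv to create a standard Python library project and configure pyproject.toml with complete metadata, dependencies, and tool settings.

  2. Code-quality configuration: Set up Ruff + mypy + pre-commit for an existing project and make sure all checks pass.

  3. Configuration management: Implement a configuration system based on Pydantic Settings that supports .env files and environment-variable overrides.

Intermediate exercises

  1. Project template: Create a reusable project template (cookiecutter or copier) that includes:

    • The src layout
    • A complete pyproject.toml
    • pre-commit configuration
    • GitHub Actions CI
    • A documentation skeleton
  2. Code review: Analyze a piece of legacy code of at least 200 lines for code smells, list every problem you find, and draw up a refactoring plan.

  3. Refactoring in practice: Refactor a procedural Python script into an object-oriented design, keeping the tests passing throughout the process.

Project exercises

  1. Full project setup: Build a production-grade Python web project from scratch that:

    • Uses the src layout
    • Implements layered configuration management
    • Integrates Ruff + mypy + pre-commit
    • Has a complete README and API documentation
    • Has GitHub Actions CI/CD configured
    • Includes at least one meaningful refactoring
  2. Code-quality dashboard: Develop a tool that analyzes code-quality metrics for a Python project:

    • Cyclomatic complexity
    • Lines of code and comment ratio
    • Type annotation coverage
    • Test coverage
    • Dependency health

Discussion questions

  1. In a microservice architecture where several services share common libraries, how should package structure and versioning be designed to balance reuse against independence?

  2. When a project evolves from a monolith to microservices, how should the project structure change, and which refactoring strategies need to be considered?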

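24.10 Further Reading

24.10.1 Project Structure

24.10.2 Package Management Tools

24.10.3 Code Quality

24.10.4 Refactoring and Architecture

  • Refactoring (Martin Fowler): the classic work on refactoring
  • Clean Code (Robert C. Martin): a guide to writing clean code
  • Architecture Patterns with Python: architecture patterns applied in Python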
