第6章列表与元组

学习目标

完成本章学习后，读者将能够：

掌握列表的内部实现（动态数组）及其对时间复杂度的影响
熟练运用列表切片、推导式与常用方法进行数据操作
理解元组的不可变语义及其在API设计与数据完整性中的应用
运用命名元组（namedtuple）与数据类（dataclass）构建轻量级数据结构
根据场景正确选择列表或元组，理解性能差异与可哈希性

6.1 列表基础

6.1.1 创建列表

empty_list = []
numbers = [1, 2, 3, 4, 5]
mixed = [1, "hello", 3.14, True, [1, 2]]

# 从其他类型转换
list_from_range = list(range(5))        # [0, 1, 2, 3, 4]
list_from_string = list("Python")       # ['P', 'y', 't', 'h', 'o', 'n']
list_from_tuple = list((1, 2, 3))       # [1, 2, 3]

# 重复与嵌套
zeros = [0] * 5                          # [0, 0, 0, 0, 0]
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# 注意：嵌套列表的浅拷贝陷阱
wrong = [[0] * 3] * 3      # 三个子列表是同一对象！
wrong[0][0] = 1
print(wrong)  # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

correct = [[0] * 3 for _ in range(3)]  # 各子列表独立
correct[0][0] = 1
print(correct)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]

学术注记：CPython的列表基于动态数组实现。预分配额外空间以摊销扩容成本，扩容策略为new_size = old_size + (old_size >> 3) + 6（约1.125倍增长）。这意味着append()的均摊时间复杂度为O(1)，但在中间插入/删除为O(n)。

6.1.2 索引与切片

fruits = ["apple", "banana", "cherry", "date", "elderberry"]

# 索引访问
print(fruits[0])       # "apple"
print(fruits[-1])      # "elderberry"

# 切片语法：[start:stop:step]
print(fruits[1:4])     # ['banana', 'cherry', 'date']
print(fruits[:3])      # ['apple', 'banana', 'cherry']
print(fruits[2:])      # ['cherry', 'date', 'elderberry']
print(fruits[::2])     # ['apple', 'cherry', 'elderberry']
print(fruits[::-1])    # ['elderberry', 'date', 'cherry', 'banana', 'apple']

# 切片创建新列表（浅拷贝）
copy = fruits[:]
copy[0] = "avocado"
print(fruits[0])       # "apple" - 原列表不变

# 赋值切片
numbers = [1, 2, 3, 4, 5]
numbers[1:3] = [20, 30]      # 替换
numbers[1:1] = [100, 200]    # 插入
del numbers[1:3]              # 删除

6.1.3 常用操作时间复杂度

操作	时间复杂度	说明
`lst[i]`	O(1)	索引访问
`lst.append(x)`	O(1)*	尾部追加（均摊）
`lst.pop()`	O(1)	尾部弹出
`lst.insert(0, x)`	O(n)	头部插入
`lst.pop(0)`	O(n)	头部弹出
`x in lst`	O(n)	线性查找
`lst.sort()`	O(n log n)	排序

6.2 列表方法

6.2.1 添加元素

fruits = ["apple", "banana"]

fruits.append("cherry")                    # 尾部追加
fruits.extend(["date", "elderberry"])      # 批量追加
fruits.insert(0, "apricot")                # 指定位置插入

6.2.2 删除元素

fruits = ["apple", "banana", "cherry", "date"]

fruits.remove("banana")     # 按值删除（第一个匹配）
popped = fruits.pop()       # 弹出最后一个
popped = fruits.pop(0)      # 弹出指定位置
del fruits[1:3]             # 删除切片
fruits.clear()              # 清空列表

6.2.3 查找与统计

fruits = ["apple", "banana", "cherry", "banana"]

print("apple" in fruits)           # True
print(fruits.index("banana"))      # 1 - 第一次出现的索引
print(fruits.index("banana", 2))   # 3 - 从索引2开始查找
print(fruits.count("banana"))      # 2

6.2.4 排序

# 原地排序
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
numbers.sort()                     # 升序
numbers.sort(reverse=True)         # 降序

# 按key函数排序
words = ["banana", "Apple", "cherry"]
words.sort(key=str.lower)          # 忽略大小写
words.sort(key=len)                # 按长度

students = [{"name": "Alice", "score": 85}, {"name": "Bob", "score": 92}]
students.sort(key=lambda s: s["score"], reverse=True)

# 返回新列表（非原地）
sorted_numbers = sorted(numbers)
sorted_desc = sorted(numbers, reverse=True)

# 稳定排序
data = [("Alice", 85), ("Bob", 85), ("Charlie", 92)]
sorted(data, key=lambda x: x[1])  # Bob在Alice之后（稳定）

6.3 列表推导式

# 基本形式
squares = [x ** 2 for x in range(10)]

# 带条件过滤
evens = [x for x in range(20) if x % 2 == 0]

# 带条件表达式
labels = ["偶数" if x % 2 == 0 else "奇数" for x in range(5)]

# 嵌套推导式
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = [num for row in matrix for num in row]

# 矩阵转置
transposed = [[row[i] for row in matrix] for i in range(3)]

# 笛卡尔积
pairs = [(x, y) for x in range(3) for y in range(3) if x != y]

工程实践：推导式应保持简洁。如果逻辑需要多行或嵌套超过两层，应改用普通for循环，以提升可读性。

6.4 元组

6.4.1 创建元组

empty = ()
single = (1,)          # 注意逗号！ (1) 是整数，不是元组
single = 1,            # 等价写法
coordinates = (3, 4)
mixed = (1, "hello", 3.14)
nested = ((1, 2), (3, 4))

tuple_from_list = tuple([1, 2, 3])
tuple_from_string = tuple("Python")

6.4.2 元组的不可变性

coordinates = (3, 4)
coordinates[0] = 5      # TypeError！元组不可修改

# 但元组内的可变对象仍可修改
point = ([1, 2], [3, 4])
point[0].append(3)      # 合法！修改的是列表，不是元组
print(point)            # ([1, 2, 3], [3, 4])

# 元组"连接"创建新元组
new_coords = coordinates + (5,)    # (3, 4, 5)

学术注记：元组的不可变性是指顶层引用不可变，而非深层内容不可变。元组仅保证其中存储的引用不变，如果引用指向可变对象，该对象本身仍可修改。这一特性使得”不可变容器包含可变元素”成为可能。

6.4.3 元组的优势

# 1. 可作为字典键（可哈希）
location_map = {(35.68, 139.69): "东京", (40.71, -74.01): "纽约"}

# 2. 函数多返回值
def get_stats(numbers: list[int]) -> tuple[int, int, float]:
    return min(numbers), max(numbers), sum(numbers) / len(numbers)

# 3. 数据完整性保证
# 元组向调用者承诺：数据不会被意外修改
CONFIG = ("localhost", 5432, "mydb")

# 4. 性能优势
import sys
print(sys.getsizeof([1, 2, 3]))    # 列表：约88字节
print(sys.getsizeof((1, 2, 3)))    # 元组：约72字节

6.4.4 序列解包

# 基本解包
x, y, z = (1, 2, 3)

# 扩展解包
first, *rest = (1, 2, 3, 4, 5)
*head, last = (1, 2, 3, 4, 5)
first, *middle, last = (1, 2, 3, 4, 5)

# 交换变量
a, b = b, a

# 遍历解包
for index, value in enumerate(["a", "b", "c"]):
    print(f"{index}: {value}")

6.5 命名元组与数据类

6.5.1 namedtuple

from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)

print(p.x, p.y)          # 3 4 - 按名称访问
print(p[0], p[1])        # 3 4 - 按索引访问
x, y = p                 # 解包

print(p._fields)         # ('x', 'y')
print(p._asdict())       # {'x': 3, 'y': 4}
new_p = p._replace(x=5)  # 创建新实例

# 默认值
Point = namedtuple("Point", "x y", defaults=[0])
p = Point(5)             # Point(x=5, y=0)

6.5.2 dataclass（推荐）

from dataclasses import dataclass, field

@dataclass
class Point:
    x: float
    y: float = 0.0
    
    def distance_to_origin(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3, 4)
print(p)                          # Point(x=3, y=4)
print(p.distance_to_origin())     # 5.0

@dataclass(frozen=True)           # 不可变数据类
class Color:
    red: int
    green: int
    blue: int

c = Color(255, 128, 0)
# c.red = 100  # FrozenInstanceError

@dataclass
class Employee:
    name: str
    department: str
    salary: float = 0.0
    skills: list[str] = field(default_factory=list)

e = Employee("Alice", "Engineering", 80000, ["Python", "SQL"])

工程实践：对于简单数据容器，优先使用@dataclass而非namedtuple。dataclass支持默认值、方法、继承，且可设置frozen=True实现不可变性。namedtuple仅在需要与元组兼容的场景中使用。

6.6 实用技巧

6.6.1 去重保序

numbers = [1, 2, 2, 3, 3, 3, 4]

# 方法1：set去重（不保序）
unique = list(set(numbers))

# 方法2：dict保序（Python 3.7+）
unique = list(dict.fromkeys(numbers))

# 方法3：推导式保序
seen = set()
unique = [x for x in numbers if not (x in seen or seen.add(x))]

6.6.2 扁平化嵌套列表

from itertools import chain

# 单层扁平化
nested = [[1, 2], [3, 4], [5, 6]]
flat = list(chain.from_iterable(nested))  # [1, 2, 3, 4, 5, 6]
flat = [x for sub in nested for x in sub] # 等价推导式

# 深层扁平化（递归）
def flatten(lst: list) -> list:
    result = []
    for item in lst:
        if isinstance(item, list):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result

6.6.3 分组

from itertools import groupby

students = [
    {"name": "Alice", "grade": "A"},
    {"name": "Bob", "grade": "B"},
    {"name": "Charlie", "grade": "A"},
]

students.sort(key=lambda x: x["grade"])
for grade, group in groupby(students, key=lambda x: x["grade"]):
    print(f"{grade}: {[s['name'] for s in group]}")

6.7 性能分析与优化

6.7.1 内存布局分析

import sys

def analyze_memory():
    """分析列表与元组的内存占用"""
    
    print("空容器:")
    print(f"  列表: {sys.getsizeof([])} 字节")
    print(f"  元组: {sys.getsizeof(())} 字节")
    
    print("\n单个元素:")
    print(f"  列表: {sys.getsizeof([1])} 字节")
    print(f"  元组: {sys.getsizeof((1,))} 字节")
    
    print("\n10个元素:")
    print(f"  列表: {sys.getsizeof(list(range(10)))} 字节")
    print(f"  元组: {sys.getsizeof(tuple(range(10)))} 字节")
    
    print("\n100个元素:")
    print(f"  列表: {sys.getsizeof(list(range(100)))} 字节")
    print(f"  元组: {sys.getsizeof(tuple(range(100)))} 字节")

analyze_memory()

学术注记：列表额外存储容量指针和长度信息，元组仅需存储元素指针。列表预分配约25%额外空间以优化append性能，元组无此开销。

6.7.2 操作性能基准测试

import timeit
from collections import deque

def benchmark_operations():
    """列表操作性能对比"""
    n = 100_000
    
    print(f"数据规模: {n}")
    
    append_time = timeit.timeit(
        'lst.append(1)',
        setup='lst = []',
        number=n
    )
    print(f"append(): {append_time:.4f}秒")
    
    insert_time = timeit.timeit(
        'lst.insert(0, 1)',
        setup='lst = []',
        number=n
    )
    print(f"insert(0, x): {insert_time:.4f}秒")
    
    deque_appendleft = timeit.timeit(
        'd.appendleft(1)',
        setup='from collections import deque; d = deque()',
        number=n
    )
    print(f"deque.appendleft(): {deque_appendleft:.4f}秒")
    
    in_list = timeit.timeit(
        'x in lst',
        setup='lst = list(range(10000)); x = 9999',
        number=10000
    )
    print(f"x in list (末尾): {in_list:.4f}秒")
    
    in_set = timeit.timeit(
        'x in s',
        setup='s = set(range(10000)); x = 9999',
        number=10000
    )
    print(f"x in set: {in_set:.4f}秒")

benchmark_operations()

6.7.3 性能优化策略

策略1：预分配列表大小

import time

def slow_approach(n: int) -> list:
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

def fast_approach(n: int) -> list:
    result = [None] * n
    for i in range(n):
        result[i] = i ** 2
    return result

def best_approach(n: int) -> list:
    return [i ** 2 for i in range(n)]

n = 1_000_000
print(f"推导式: {timeit.timeit(lambda: best_approach(n), number=1):.4f}秒")

策略2：使用deque处理双端操作

from collections import deque

class TaskQueue:
    def __init__(self):
        self._queue = deque()
    
    def add_front(self, task):
        self._queue.appendleft(task)
    
    def add_back(self, task):
        self._queue.append(task)
    
    def get_front(self):
        return self._queue.popleft() if self._queue else None
    
    def get_back(self):
        return self._queue.pop() if self._queue else None

策略3：使用array模块处理数值数据

from array import array
import sys

numbers = list(range(1000))
numbers_array = array('i', range(1000))

print(f"list: {sys.getsizeof(numbers)} 字节")
print(f"array: {sys.getsizeof(numbers_array)} 字节")

6.7.4 时间复杂度速查表

操作	列表	元组	deque	说明
索引访问	O(1)	O(1)	O(1)	随机访问
尾部追加	O(1)*	-	O(1)	列表均摊
头部追加	O(n)	-	O(1)	deque优势
尾部弹出	O(1)	-	O(1)	-
头部弹出	O(n)	-	O(1)	deque优势
查找元素	O(n)	O(n)	O(n)	线性查找
排序	O(n log n)	-	-	原地排序

6.8 前沿技术动态

6.8.1 高性能数据结构

from array import array
import numpy as np

arr = array('i', [1, 2, 3, 4, 5])

np_arr = np.array([1, 2, 3, 4, 5], dtype=np.int32)
result = np_arr * 2

6.8.2 类型化列表（PEP 585, PEP 646）

from typing import list

def process_numbers(nums: list[int]) -> list[int]:
    return [x * 2 for x in nums]

type Vector = list[float]

def dot_product(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

6.8.3 结构化模式匹配

def process_data(data: list | tuple) -> str:
    match data:
        case []:
            return "Empty"
        case [x]:
            return f"Single: {x}"
        case [x, y]:
            return f"Pair: {x}, {y}"
        case [first, *rest]:
            return f"First: {first}, Rest: {rest}"

6.8.4 不可变数据结构

from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float

p1 = Point(1.0, 2.0)
p2 = p1._replace(x=3.0)

6.9 本章小结

本章系统介绍了Python列表与元组的完整体系：

列表：动态数组实现，支持索引、切片、增删改查，注意时间复杂度
元组：不可变序列，可哈希，性能优于列表，适合固定数据
推导式：声明式数据转换，简洁但不宜过度嵌套
命名元组与数据类：轻量级数据结构，优先使用dataclass
性能优化：预分配、deque、array模块的使用场景
选择原则：可变数据用列表，不可变数据用元组，结构化数据用dataclass

6.9.1 数据结构选择指南

场景	推荐结构	原因
频繁尾部增删	list	O(1) append/pop
频繁头部增删	deque	O(1) 两端操作
固定数据集合	tuple	不可变、可哈希、内存小
结构化数据	dataclass	类型安全、方法支持
大量数值计算	array/numpy	内存紧凑、向量化运算
快速成员检测	set	O(1) 查找

6.9.2 常见陷阱与规避

# 陷阱1：嵌套列表浅拷贝
matrix = [[0] * 3] * 3
matrix[0][0] = 1
print(matrix)  # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# 正确做法
matrix = [[0] * 3 for _ in range(3)]

# 陷阱2：迭代中修改列表
lst = [1, 2, 3, 4, 5]
for x in lst:
    if x % 2 == 0:
        lst.remove(x)

# 正确做法：遍历副本或使用推导式
lst = [x for x in lst if x % 2 != 0]

# 陷阱3：单元素元组
single = (1)    # 整数，不是元组！
single = (1,)   # 正确的元组

6.10 练习题

基础题

创建包含1-100的列表，筛选出所有偶数。
给定列表[3, 1, 4, 1, 5, 9, 2, 6]，排序并去重，保持升序。
使用列表推导式生成3×3乘法表矩阵。

进阶题

实现函数flatten(lst)，支持任意深度嵌套列表的扁平化。
使用dataclass定义Student类（姓名、年龄、成绩列表），实现计算平均成绩的方法。
实现函数chunk(lst, size)，将列表按指定大小分块，如chunk([1,2,3,4,5], 2) → [[1,2], [3,4], [5]]。

项目实践

学生成绩分析系统：编写一个程序，要求：
- 使用dataclass定义Student
- 支持从CSV文件读取数据
- 实现按科目、按学生的统计分析
- 使用推导式进行数据过滤
- 支持排序、分组输出

思考题

为什么元组可以作为字典键而列表不能？从哈希表原理的角度解释。
列表的append()和insert(0, x)的时间复杂度为何不同？如果需要频繁在头部插入，应使用什么数据结构？
Python列表的动态扩容策略是什么？为什么append()的均摊时间复杂度是O(1)？

6.11 延伸阅读

6.11.1 数据结构与算法

《数据结构与算法分析（Python语言描述）》 (Goodrich等) — Python数据结构的权威教材
《算法导论》第10-12章 — 基本数据结构的理论分析
Python Time Complexity Wiki (https://wiki.python.org/moin/TimeComplexity) — 官方复杂度参考

6.11.2 Python内部实现

CPython源码 (https://github.com/python/cpython) — Objects/listobject.c、Objects/tupleobject.c
《Python源码剖析》 (陈儒) — 深入理解CPython实现细节
PEP 3113 — Removal of Tuple Parameter Unpacking (https://peps.python.org/pep-3113/)

6.11.3 高性能数据结构

NumPy文档 (https://numpy.org/doc/) — 科学计算数组库
collections模块 (https://docs.python.org/3/library/collections.html) — deque、namedtuple等
array模块 (https://docs.python.org/3/library/array.html) — 紧凑数值数组

6.11.4 函数式数据结构

《Purely Functional Data Structures》 (Chris Okasaki) — 函数式编程中的持久化数据结构
immutables库 (https://github.com/MagicStack/immutables) — Python高性能不可变映射
pyrsistent库 (https://github.com/tobgu/pyrsistent) — 持久化/不可变数据结构

下一章：第7章字典与集合

第6章 列表与元组

学习目标

6.1 列表基础

6.1.1 创建列表

6.1.2 索引与切片

6.1.3 常用操作时间复杂度

6.2 列表方法

6.2.1 添加元素

6.2.2 删除元素

6.2.3 查找与统计

6.2.4 排序

6.3 列表推导式

6.4 元组

6.4.1 创建元组

6.4.2 元组的不可变性

6.4.3 元组的优势

6.4.4 序列解包

6.5 命名元组与数据类

6.5.1 namedtuple

6.5.2 dataclass（推荐）

6.6 实用技巧

6.6.1 去重保序

6.6.2 扁平化嵌套列表

6.6.3 分组

6.7 性能分析与优化

6.7.1 内存布局分析

6.7.2 操作性能基准测试

6.7.3 性能优化策略

6.7.4 时间复杂度速查表

6.8 前沿技术动态

6.8.1 高性能数据结构

6.8.2 类型化列表（PEP 585, PEP 646）

6.8.3 结构化模式匹配

6.8.4 不可变数据结构

6.9 本章小结

6.9.1 数据结构选择指南

6.9.2 常见陷阱与规避

6.10 练习题

基础题

进阶题

项目实践

思考题

6.11 延伸阅读

6.11.1 数据结构与算法

6.11.2 Python内部实现

6.11.3 高性能数据结构

6.11.4 函数式数据结构

第6章列表与元组