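03 Data Types

Data types are the foundation of every C++ program. Understanding how they are represented and what they cost at run time is essential for writing efficient, reliable code. This chapter examines the C++ type system from its underlying representation through to practical use.

Integer types in depth

Basic integer types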

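Type	Typical size (bytes)	Typical range	Representation	Typical alignment	Recommended use
char	1	-128 to 127 or 0 to 255 (signedness is implementation-defined)	one byte, signed or unsigned	1 byte	character storage, small integers
signed char	1	-128 to 127	one byte, two's complement	1 byte	small signed integers
unsigned char	1	0 to 255	one byte, unsigned	1 byte	small unsigned integers, raw bytes
short	2	-32768 to 32767	2 bytes, two's complement	2 bytes	integers where space is tight
unsigned short	2	0 to 65535	2 bytes, unsigned	2 bytes	short unsigned integers
int	4	-2147483648 to 2147483647	4 bytes, two's complement	4 bytes	general-purpose integer arithmetic
unsigned int	4	0 to 4294967295	4 bytes, unsigned	4 bytes	unsigned arithmetic, bit manipulation
long	4 or 8	platform-dependent (4 bytes on 64-bit Windows, 8 bytes on 64-bit Linux and macOS)	4 or 8 bytes, two's complement	4 or 8 bytes	platform-dependent long integers
unsigned long	4 or 8	platform-dependent	4 or 8 bytes, unsigned	4 or 8 bytes	unsigned long integers
long long	8	-9223372036854775808 to 9223372036854775807	8 bytes, two's complement	8 bytes	large integer arithmetic
unsigned long long	8	0 to 18446744073709551615	8 bytes, unsigned	8 bytes	large unsigned integers, bit manipulation

The sizes shown are those of mainstream 32-bit and 64-bit platforms. The standard guarantees only minimum widths: char at least 8 bits, short and int at least 16, long at least 32, and long long at least 64.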

Fixed-width integer types (C++11 and later)

Type	Size (bytes)	Range	Header	Recommended use
int8_t	1	-128 to 127	<cstdint>	exactly 1-byte signed integer
uint8_t	1	0 to 255	<cstdint>	exactly 1-byte unsigned integer
int16_t	2	-32768 to 32767	<cstdint>	exactly 2-byte signed integer
uint16_t	2	0 to 65535	<cstdint>	exactly 2-byte unsigned integer
int32_t	4	-2147483648 to 2147483647	<cstdint>	exactly 4-byte signed integer
uint32_t	4	0 to 4294967295	<cstdint>	exactly 4-byte unsigned integer
int64_t	8	-9223372036854775808 to 9223372036854775807	<cstdint>	exactly 8-byte signed integer
uint64_t	8	0 to 18446744073709551615	<cstdint>	exactly 8-byte unsigned integer

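Technical details of integer representation

Two's complement representation:

Principle: the most significant bit acts as the sign bit (0 for non-negative, 1 for negative), and a negative number is stored as the two's complement of its absolute value. C++20 requires this representation for all signed integer types.
Computation: the bit pattern of -x is obtained by inverting every bit of x and adding 1
Advantages:
- Addition and subtraction share the same hardware circuitry
- Zero has a single representation
- The range is -2^(n-1) to 2^(n-1)-1, which is asymmetric: there is one more negative value than there are positive values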
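
The "invert and add 1" rule can be sketched on 8-bit patterns as follows. The helper names twos_complement_negate and as_signed are illustrative only; the code works on unsigned bit patterns so that every step is well-defined.

```cpp
#include <cstdint>

// Negate an 8-bit pattern using the "invert every bit, then add 1" rule.
constexpr std::uint8_t twos_complement_negate(std::uint8_t bits) {
    return static_cast<std::uint8_t>(~bits + 1u);
}

// Interpret an 8-bit pattern as a two's complement signed value.
constexpr int as_signed(std::uint8_t bits) {
    return bits < 128 ? bits : static_cast<int>(bits) - 256;
}
```

Negating the pattern of 5 (0x05) yields 0xFB, which reads back as -5. Negating 0x80 (-128) yields 0x80 again, because +128 does not fit in 8 bits.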

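Byte order (endianness):

Little-endian: the least significant byte is stored at the lowest address (x86/x64)
Big-endian: the most significant byte is stored at the lowest address (network byte order, some embedded platforms)
Mixed-endian: the byte order differs by type or within a word (rare; the PDP-11 is the historical example)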
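
When data crosses a machine boundary, one common approach is to serialize with shifts, which produces the same bytes regardless of the host's byte order. The following is a minimal sketch; Bytes4, to_big_endian, and from_big_endian are hypothetical names chosen for illustration.

```cpp
#include <cstdint>

// Four bytes in the order they would appear in memory or on the wire.
struct Bytes4 {
    std::uint8_t b0, b1, b2, b3;
};

// Most significant byte first (big-endian / network byte order).
constexpr Bytes4 to_big_endian(std::uint32_t v) {
    return Bytes4{static_cast<std::uint8_t>(v >> 24),
                  static_cast<std::uint8_t>(v >> 16),
                  static_cast<std::uint8_t>(v >> 8),
                  static_cast<std::uint8_t>(v)};
}

// Reassemble the value from big-endian bytes.
constexpr std::uint32_t from_big_endian(Bytes4 b) {
    return (static_cast<std::uint32_t>(b.b0) << 24) |
           (static_cast<std::uint32_t>(b.b1) << 16) |
           (static_cast<std::uint32_t>(b.b2) << 8) |
           static_cast<std::uint32_t>(b.b3);
}
```

Because the shifts operate on values rather than on memory, the same code is correct on little-endian and big-endian hosts.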

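Integer overflow:

Signed integer overflow: undefined behavior. The compiler may assume it never happens, which can lead to removed checks, crashes, or security vulnerabilities
Unsigned integer overflow: well-defined. The result wraps around modulo 2^n, where n is the bit width of the type
Overflow detection:
- GCC/Clang built-ins: __builtin_add_overflow, __builtin_sub_overflow, __builtin_mul_overflow
- Manual detection: for signed types, compare the operands against the type's limits before the operation; for unsigned types, checking whether the result is smaller than an operand also works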
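
For the unsigned case, both detection styles can be sketched in a few lines. The function names below are illustrative, and the sketch assumes a 32-bit unsigned type.

```cpp
#include <cstdint>
#include <limits>

// Unsigned addition is defined to wrap modulo 2^32.
constexpr std::uint32_t wrapping_add(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a + b);
}

// Check before adding: would a + b exceed the maximum?
constexpr bool add_would_wrap(std::uint32_t a, std::uint32_t b) {
    return a > std::numeric_limits<std::uint32_t>::max() - b;
}

// Check after adding: a wrapped sum is smaller than either operand.
constexpr bool add_did_wrap(std::uint32_t a, std::uint32_t b) {
    return wrapping_add(a, b) < a;
}
```

The "check after" form is only valid for unsigned types; for signed types the addition itself would already be undefined behavior.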

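Bit manipulation:

Bit masks: use unsigned types for bit operations
Shifts: a left shift multiplies by a power of 2 and a right shift divides by a power of 2. This holds exactly for unsigned values; right-shifting a negative signed value rounds toward negative infinity rather than toward zero (and was implementation-defined before C++20)
Common idioms:
- Test parity: x & 1
- Clear the lowest set bit: x & (x-1)
- Isolate the lowest set bit: x & -x
- Swap two values: a ^= b; b ^= a; a ^= b (this zeroes the value if a and b are the same object, and it is rarely faster than std::swap on modern hardware)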
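
The idioms above can be wrapped in small helpers to make their effect concrete. This is a sketch with illustrative function names, using uint32_t so that every operation is well-defined.

```cpp
#include <cstdint>

constexpr bool is_odd(std::uint32_t x) { return (x & 1u) != 0; }

// x & (x - 1) clears the lowest set bit.
constexpr std::uint32_t clear_lowest_set_bit(std::uint32_t x) { return x & (x - 1u); }

// x & -x keeps only the lowest set bit (0 - x is the unsigned negation).
constexpr std::uint32_t isolate_lowest_set_bit(std::uint32_t x) { return x & (0u - x); }

// A power of two has exactly one set bit.
constexpr bool is_power_of_two(std::uint32_t x) { return x != 0 && (x & (x - 1u)) == 0; }

// Count set bits by clearing the lowest one until none remain.
constexpr int count_set_bits(std::uint32_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1u;
        ++n;
    }
    return n;
}
```

For x = 0x2C (binary 101100), clearing the lowest set bit gives 0x28 and isolating it gives 0x04.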

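Performance considerations for integer types

Choosing a type:

Match the register width where it matters: types that match the CPU's register width (for example int64_t/uint64_t on 64-bit platforms) avoid extra widening instructions for indices and pointer arithmetic. For ordinary arithmetic, a 32-bit int is usually just as fast and uses less memory
Avoid conversions: minimize implicit conversions, particularly between signed and unsigned types, where negative values silently become large positive ones
Use unsigned types deliberately: an unsigned type gains one extra bit of range for non-negative values and is the right choice for bit manipulation, but mixing it with signed arithmetic is a common source of bugs
Use fixed-width types: where an exact width matters (file formats, network protocols, hardware registers), they improve portability and remove platform differences

Performance advantages of bit operations:

Branch-free: bit operations usually need no branches, so there is nothing for the CPU to mispredict
Hardware support: modern CPUs provide dedicated bit-manipulation instructions (for example population count and count-trailing-zeros)
Memory savings: bit-fields and bit masks reduce memory usage

Code examples: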

// Integer overflow detection
#include <cstdint>
#include <limits>
#include <type_traits>

// Checked addition: returns false instead of overflowing
template <typename T>
bool safe_add(T a, T b, T& result) {
    if constexpr (std::is_signed_v<T>) {
        if (b > 0 && a > std::numeric_limits<T>::max() - b) return false;
        if (b < 0 && a < std::numeric_limits<T>::min() - b) return false;
    } else {
        if (a > std::numeric_limits<T>::max() - b) return false;
    }
    result = a + b;
    return true;
}

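// Overflow detection with the GCC/Clang built-in
bool gcc_safe_add(int a, int b, int& result) {
    return !__builtin_add_overflow(a, b, &result);
}

// Bit mask wrapper; valid positions are 0 to 31 (larger shifts are undefined)
class BitMask {
private:
    uint32_t mask_;

public:
    BitMask() : mask_(0) {}
    
    // Set the bit at the given position
    void set_bit(int position) {
        mask_ |= (1U << position);
    }
    
    // Clear the bit at the given position
    void clear_bit(int position) {
        mask_ &= ~(1U << position);
    }
    
    // Toggle the bit at the given position
    void toggle_bit(int position) {
        mask_ ^= (1U << position);
    }
    
    // Test the bit at the given position
    bool test_bit(int position) const {
        return (mask_ & (1U << position)) != 0;
    }
    
    // Number of set bits (Hamming weight)
    int count_set_bits() const {
        return __builtin_popcount(mask_);
    }
    
    // Index of the lowest set bit (result is undefined if the mask is zero)
    int find_first_set() const {
        return __builtin_ctz(mask_);
    }
    
    // Index of the highest set bit (result is undefined if the mask is zero)
    int find_last_set() const {
        return 31 - __builtin_clz(mask_);
    }
    
    // Bitwise operators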
    BitMask& operator|=(const BitMask& other) {
        mask_ |= other.mask_;
        return *this;
    }
    
    BitMask& operator&=(const BitMask& other) {
        mask_ &= other.mask_;
        return *this;
    }
    
    BitMask& operator^=(const BitMask& other) {
        mask_ ^= other.mask_;
        return *this;
    }
    
    friend BitMask operator|(BitMask lhs, const BitMask& rhs) {
        lhs |= rhs;
        return lhs;
    }
    
    friend BitMask operator&(BitMask lhs, const BitMask& rhs) {
        lhs &= rhs;
        return lhs;
    }
    
    friend BitMask operator^(BitMask lhs, const BitMask& rhs) {
        lhs ^= rhs;
        return lhs;
    }
};

// Bit-mixing hash function
uint32_t hash_uint32(uint32_t x) {
    // The 32-bit finalizer (fmix32) from MurmurHash3
    x ^= x >> 16;
    x *= 0x85ebca6b;
    x ^= x >> 13;
    x *= 0xc2b2ae35;
    x ^= x >> 16;
    return x;
}

// Exponentiation by squaring; the result wraps modulo 2^64 on overflow
uint64_t fast_pow(uint64_t base, uint64_t exponent) {
    uint64_t result = 1;
    while (exponent > 0) {
        if (exponent & 1) {
            result *= base;
        }
        base *= base;
        exponent >>= 1;
    }
    return result;
}

Detecting type sizes and byte order:

// Compile-time checks on type sizes
#include <bit>      // std::endian (C++20)
#include <climits>
#include <cstdint>

static_assert(sizeof(int) >= 4, "int must be at least 4 bytes");
static_assert(sizeof(long long) == 8, "long long must be 8 bytes");
static_assert(sizeof(intptr_t) == sizeof(void*), "intptr_t must match pointer size");

// Bit widths of types
constexpr int int_bits = sizeof(int) * CHAR_BIT;
constexpr int ll_bits = sizeof(long long) * CHAR_BIT;
constexpr int ptr_bits = sizeof(void*) * CHAR_BIT;

// Byte-order detection. Reading an inactive union member is not permitted in a
// constant expression, so this uses std::endian (C++20) rather than a union.
constexpr bool is_little_endian() {
    return std::endian::native == std::endian::little;
}

constexpr bool is_big_endian() {
    return std::endian::native == std::endian::big;
}

static_assert(is_little_endian() || is_big_endian(), "Byte order detection failed");

// Byte order name, known at compile time
constexpr auto byte_order = is_little_endian() ? "little-endian" : "big-endian";

// Pointer width detection
constexpr bool is_32bit_platform = sizeof(void*) == 4;
constexpr bool is_64bit_platform = sizeof(void*) == 8;

static_assert(is_32bit_platform || is_64bit_platform, "Unknown platform bitness");

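Performance-oriented integer code:

// Guidelines for choosing integer types
// 1. Prefer types that match the register width where it matters
// 2. Avoid unnecessary conversions
// 3. Use unsigned types deliberately for non-negative values
// 4. Use size_t for loop counters that index containers
// 5. Use unsigned types for bit operations to avoid sign extension

// Example: a counter sized to the platform word
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

class HighPerformanceCounter {
private:
    // Pick the type that matches the platform's register width
    using CounterType = std::conditional_t<sizeof(void*) == 8, uint64_t, uint32_t>;
    CounterType count_;

public:
    HighPerformanceCounter() : count_(0) {}

    // Branch-free increment (a plain add; not thread-safe)
    void increment() {
        // For concurrent use, switch to std::atomic or, on GCC/Clang,
        // __atomic_add_fetch(&count_, 1, __ATOMIC_RELAXED);
        count_ = count_ + 1;
    }

    // Branch-free bulk increment
    void increment_by(size_t n) {
        count_ = count_ + static_cast<CounterType>(n);
    }

    // Modulo by a power of two, reduced to a single AND
    template <unsigned int Mod>
    CounterType get_mod() const {
        static_assert(Mod != 0 && (Mod & (Mod - 1)) == 0, "Mod must be power of 2");
        return count_ & (Mod - 1);
    }

    // General modulo (uses the hardware division instruction)
    CounterType get_mod_generic(size_t mod) const {
        return count_ % mod;
    }

    // Reset the counter
    void reset() {
        count_ = 0;
    }

    // Current value
    CounterType get() const { return count_; }

    // Would adding n wrap around? (valid because CounterType is unsigned)
    bool would_overflow(size_t n) const {
        return count_ > std::numeric_limits<CounterType>::max() - n;
    }
};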

// Example: a 64-bit flag set
#include <climits>

class BitFlags {
private:
    using StorageType = uint64_t; // one 64-bit word holds all flags
    static constexpr size_t BitsPerStorage = sizeof(StorageType) * CHAR_BIT;
    StorageType bits_;

public:
    BitFlags() : bits_(0) {}

    // Set a flag (branch-free); indices of 64 or more wrap around via the modulo
    void set(size_t index) {
        bits_ |= (static_cast<StorageType>(1) << (index % BitsPerStorage));
    }

    // Clear a flag (branch-free)
    void clear(size_t index) {
        bits_ &= ~(static_cast<StorageType>(1) << (index % BitsPerStorage));
    }

    // Toggle a flag (branch-free)
    void toggle(size_t index) {
        bits_ ^= (static_cast<StorageType>(1) << (index % BitsPerStorage));
    }

    // Test a flag (branch-free)
    bool test(size_t index) const {
        return (bits_ & (static_cast<StorageType>(1) << (index % BitsPerStorage))) != 0;
    }

    // Is any flag set?
    bool any() const {
        return bits_ != 0;
    }

    // Are all flags clear?
    bool none() const {
        return bits_ == 0;
    }

    // Number of set flags (Hamming weight)
    size_t count() const {
        return __builtin_popcountll(bits_); // GCC/Clang built-in
    }

    // Index of the lowest set flag (result is undefined if no flag is set)
    size_t find_first_set() const {
        return __builtin_ctzll(bits_); // GCC/Clang built-in
    }

    // Index of the highest set flag (result is undefined if no flag is set)
    size_t find_last_set() const {
        return BitsPerStorage - 1 - __builtin_clzll(bits_); // GCC/Clang built-in
    }

    // Bitwise operators
    BitFlags& operator|=(const BitFlags& other) {
        bits_ |= other.bits_;
        return *this;
    }

    BitFlags& operator&=(const BitFlags& other) {
        bits_ &= other.bits_;
        return *this;
    }

    BitFlags& operator^=(const BitFlags& other) {
        bits_ ^= other.bits_;
        return *this;
    }

    friend BitFlags operator|(BitFlags lhs, const BitFlags& rhs) {
        lhs |= rhs;
        return lhs;
    }

    friend BitFlags operator&(BitFlags lhs, const BitFlags& rhs) {
        lhs &= rhs;
        return lhs;
    }

    friend BitFlags operator^(BitFlags lhs, const BitFlags& rhs) {
        lhs ^= rhs;
        return lhs;
    }
};

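// Example: loop counter types
#include <vector>

void process_elements(const std::vector<int>& elements) {
    // size_t matches the type returned by size(), avoiding signed/unsigned comparisons
    for (size_t i = 0; i < elements.size(); ++i) {
        // process elements[i]
    }

    // A range-based for loop avoids the counter type question entirely
    for (const auto& element : elements) {
        // process element
    }
}

// Example: mixed-type arithmetic
void avoid_type_conversions() {
    // Mixed types: a is converted to double before the multiplication.
    // The conversion is cheap, but in a hot loop it is one extra instruction per iteration.
    int a = 100;
    double b = 3.14;
    auto result = a * b; // implicit int -> double conversion

    // Same type throughout: no conversion needed
    double a_double = 100.0;
    double b_double = 3.14;
    auto result_opt = a_double * b_double; // no conversion
}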

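Floating-point precision

Type	Typical size (bytes)	Precision (significant decimal digits)	Exponent range (base 2)	Format	Memory layout	Typical alignment
float	4	6 to 9 (about 7)	-126 to +127	IEEE 754 single precision (binary32)	4 contiguous bytes: sign (1) + exponent (8) + fraction (23)	4 bytes
double	8	15 to 17	-1022 to +1023	IEEE 754 double precision (binary64)	8 contiguous bytes: sign (1) + exponent (11) + fraction (52)	8 bytes
long double	8, 12, or 16	15 to 34, depending on the implementation	implementation-defined	binary64, x87 80-bit extended, or binary128, depending on the platform	implementation-defined	4, 8, or 16 bytes
__float128	16	about 34	-16382 to +16383	IEEE 754 quadruple precision (binary128); GCC extension	16 contiguous bytes: sign (1) + exponent (15) + fraction (112)	16 bytes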

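IEEE 754 floating-point representation in depth

Single precision (float):

Total width: 32 bits
Sign: 1 bit (0 for positive, 1 for negative)
Exponent: 8 bits (bias 127; normal numbers span exponents -126 to +127)
Fraction: 23 bits (an implicit leading 1 gives 24 bits of effective precision)
Value of a normal number: (-1)^sign × 2^(exponent-127) × (1.fraction)
Special values:
- Zero: exponent all 0s, fraction all 0s (positive and negative zero both exist)
- Infinity: exponent all 1s, fraction all 0s (positive and negative)
- NaN: exponent all 1s, fraction non-zero (signaling and quiet variants)
- Subnormal numbers: exponent all 0s, fraction non-zero (values very close to zero, with no implicit leading 1)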
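
To make the field layout concrete, the sketch below splits a raw 32-bit pattern into sign, exponent, and fraction. Float32Fields, decode_bits, and float_to_bits are illustrative names, and the code assumes float is a 32-bit IEEE 754 type, as it is on mainstream platforms.

```cpp
#include <cstdint>
#include <cstring>

// The three fields of an IEEE 754 single-precision value.
struct Float32Fields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t fraction;  // 23 bits, without the implicit leading 1
};

// Split a raw 32-bit pattern into its fields.
constexpr Float32Fields decode_bits(std::uint32_t bits) {
    return Float32Fields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// Copy a float's object representation into an integer.
// memcpy is the portable way to do this before C++20's std::bit_cast.
inline std::uint32_t float_to_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

For example, 1.0f is stored as 0x3F800000: sign 0, biased exponent 127 (so 2^0), fraction 0. The pattern 0x3DCCCCCD, the nearest float to 0.1, has biased exponent 123 (2^-4) and fraction 0x4CCCCD.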

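Double precision (double):

Total width: 64 bits
Sign: 1 bit
Exponent: 11 bits (bias 1023; normal numbers span exponents -1022 to +1023)
Fraction: 52 bits (an implicit leading 1 gives 53 bits of effective precision)
Value of a normal number: (-1)^sign × 2^(exponent-1023) × (1.fraction)
Special values: the same categories as single precision, with wider bit patterns

Extended precision (long double):

x86 platforms: 80-bit extended precision (1 sign bit, 15 exponent bits, 64 significand bits with an explicit leading bit)
Other platforms: may be 128-bit quadruple precision or identical to double (as with MSVC)
Benefit: higher precision and a wider exponent range where the platform provides them, which suits numerically demanding code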

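Mathematical basis of the representation:

Scientific notation: IEEE 754 represents numbers in binary scientific notation
Implicit bit: the leading 1 of the significand is not stored, which gains one bit of precision
Biased exponent: storing the exponent with a bias avoids a separate sign bit for the exponent
Limited precision: with a finite number of fraction bits, many decimal fractions cannot be represented exactly

Floating-point precision problems:

Representation error: 0.1, for example, has no exact binary representation
Accumulated error: rounding errors add up over many operations
Catastrophic cancellation: subtracting two nearly equal numbers loses significant digits
Overflow and underflow: the value falls outside the representable range
NaN propagation: arithmetic with a NaN operand yields NaN, and every ordered comparison with NaN, including NaN == NaN, is false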
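
A few of these effects can be observed directly. The sketch below assumes double is IEEE 754 binary64 with round-to-nearest; the function names are illustrative.

```cpp
#include <limits>

// Representation error: 0.1 and 0.2 are rounded when stored, and their
// rounded sum is not the double nearest to 0.3.
constexpr bool representation_error_visible() {
    return 0.1 + 0.2 != 0.3;
}

// Absorption followed by cancellation: near 1e16 adjacent doubles are 2
// apart, so adding 1.0 is rounded away and the subtraction returns 0.
constexpr bool small_addend_lost() {
    return (1e16 + 1.0) - 1e16 == 0.0;
}

// NaN compares unequal to everything, including itself.
constexpr bool is_nan(double x) {
    return x != x;
}

constexpr bool nan_equals_itself() {
    return std::numeric_limits<double>::quiet_NaN() ==
           std::numeric_limits<double>::quiet_NaN();
}
```

In production code std::isnan from <cmath> is the clearer way to test for NaN; the x != x form is shown only because it follows from the comparison rule.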

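Performance characteristics of floating-point arithmetic:

Hardware support: modern CPUs have dedicated floating-point units (FPUs)
SIMD instructions: SSE, AVX, and similar instruction sets accelerate floating-point arithmetic
Precision versus speed: scalar float and double operations often cost about the same, but a SIMD register holds twice as many floats as doubles, so vectorized single-precision code can reach roughly twice the throughput
Memory bandwidth: single precision halves the memory footprint, which can improve cache hit rates

Choosing a floating-point type:

Single precision (float): graphics, real-time systems, memory-constrained workloads
Double precision (double): scientific computing and other work that needs high precision. For monetary amounts, note that binary floating point cannot represent most decimal fractions exactly, so integer or decimal fixed-point types are usually preferred
Extended precision (long double): numerical work that needs extra precision, at the cost of portability and SIMD support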

Techniques for controlling floating-point precision:

// Inspecting floating-point precision
#include <cmath>
#include <iomanip>
#include <iostream>
#include <limits>

void float_precision_demo() {
    // Machine epsilon of each type
    std::cout << "float epsilon: " << std::numeric_limits<float>::epsilon() << std::endl;
    std::cout << "double epsilon: " << std::numeric_limits<double>::epsilon() << std::endl;
    std::cout << "long double epsilon: " << std::numeric_limits<long double>::epsilon() << std::endl;
    
    // Range of each type (min() is the smallest positive normal value)
    std::cout << "float min: " << std::numeric_limits<float>::min() << std::endl;
    std::cout << "float max: " << std::numeric_limits<float>::max() << std::endl;
    std::cout << "double min: " << std::numeric_limits<double>::min() << std::endl;
    std::cout << "double max: " << std::numeric_limits<double>::max() << std::endl;
    
    // Representation error of 0.1
    float f = 0.1f;
    double d = 0.1;
    std::cout << "float 0.1: " << std::setprecision(20) << f << std::endl;
    std::cout << "double 0.1: " << std::setprecision(20) << d << std::endl;
}

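// Comparing floating-point values with a relative tolerance
#include <algorithm>

template <typename T>
bool almost_equal(T a, T b, int ulp = 1) {
    // The tolerance is `ulp` machine epsilons scaled to the larger magnitude;
    // the second test treats differences below the smallest normal value as equal
    return std::abs(a - b) <= std::numeric_limits<T>::epsilon() * std::max(std::abs(a), std::abs(b)) * ulp ||
           std::abs(a - b) < std::numeric_limits<T>::min();
}

// Controlling accumulated error
double kahan_sum(const double* values, size_t count) {
    // Kahan (compensated) summation. Do not compile this with -ffast-math,
    // which allows the compiler to optimize the compensation term away
    double sum = 0.0;
    double compensation = 0.0;
    
    for (size_t i = 0; i < count; ++i) {
        double y = values[i] - compensation;
        double t = sum + y;
        compensation = (t - sum) - y;
        sum = t;
    }
    
    return sum;
}

// Pairwise summation over [start, end): rounding error grows with log n
// rather than with n as in a naive left-to-right loop
double recursive_sum(const double* values, size_t start, size_t end) {
    if (end <= start) {
        return 0.0; // empty range
    }
    if (end - start == 1) {
        return values[start];
    }
    if (end - start == 2) {
        return values[start] + values[start+1];
    }
    size_t mid = start + (end - start) / 2;
    return recursive_sum(values, start, mid) + recursive_sum(values, mid, end);
}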

// Dot product with a manually unrolled loop
double fast_dot_product(const double* a, const double* b, size_t n) {
    double sum = 0.0;
    
    // Unroll by four to reduce loop overhead
    size_t i = 0;
    for (; i + 3 < n; i += 4) {
        sum += a[i] * b[i] + a[i+1] * b[i+1] + a[i+2] * b[i+2] + a[i+3] * b[i+3];
    }
    
    // Handle the remaining elements
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    
    return sum;
}

// Dot product with AVX intrinsics (needs an AVX-capable CPU and, on GCC/Clang, -mavx)
#include <immintrin.h>

double simd_dot_product(const double* a, const double* b, size_t n) {
    __m256d sum = _mm256_setzero_pd();
    
    // Process four doubles per iteration; loadu accepts unaligned pointers
    size_t i = 0;
    for (; i + 3 < n; i += 4) {
        __m256d va = _mm256_loadu_pd(&a[i]);
        __m256d vb = _mm256_loadu_pd(&b[i]);
        __m256d vmul = _mm256_mul_pd(va, vb);
        sum = _mm256_add_pd(sum, vmul);
    }
    
    // Horizontal sum of the four lanes
    double result = 0.0;
    double tmp[4];
    _mm256_storeu_pd(tmp, sum);
    result = tmp[0] + tmp[1] + tmp[2] + tmp[3];
    
    // Scalar tail for the remaining elements
    for (; i < n; ++i) {
        result += a[i] * b[i];
    }
    
    return result;
}

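// Floating-point exception flags
#include <cfenv>
#include <cmath>
#include <iostream>

void float_exception_handling() {
    // Floating-point "exceptions" are status flags, not C++ exceptions, so
    // try/catch cannot observe them. (glibc's feenableexcept turns them into
    // SIGFPE signals instead, which is not portable.)
    std::feclearexcept(FE_ALL_EXCEPT);
    
    // volatile keeps the compiler from folding the operations at compile time
    volatile double zero = 0.0;
    volatile double minus_one = -1.0;
    volatile double x = 1.0 / zero;            // division by zero -> +infinity
    volatile double y = std::sqrt(minus_one);  // invalid operation -> NaN
    
    if (std::fetestexcept(FE_DIVBYZERO)) {
        std::cout << "FE_DIVBYZERO raised" << std::endl;
    }
    if (std::fetestexcept(FE_INVALID)) {
        std::cout << "FE_INVALID raised" << std::endl;
    }
    
    // Reset the flags
    std::feclearexcept(FE_ALL_EXCEPT);
}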

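Character types and encodings

Type	Typical size (bytes)	Range	Encoding use	Memory layout	Alignment	C++ standard
char	1	-128 to 127 or 0 to 255 (implementation-defined)	ASCII, extended ASCII, UTF-8 code units	one byte	1 byte	C++98+
signed char	1	-128 to 127	signed character	one byte, two's complement	1 byte	C++98+
unsigned char	1	0 to 255	unsigned character, raw bytes	one byte, unsigned	1 byte	C++98+
wchar_t	2 or 4	implementation-defined	wide characters (UTF-16 on Windows, UTF-32 on Linux and macOS)	2 or 4 bytes, native byte order	2 or 4 bytes	C++98+
char16_t	2	0 to 65535	UTF-16 code units	2 bytes, native byte order	2 bytes	C++11+
char32_t	4	0 to 4294967295	UTF-32 code units	4 bytes, native byte order	4 bytes	C++11+
char8_t	1	0 to 255	UTF-8 code units	one byte	1 byte	C++20+
std::byte	1	0 to 255	raw bytes	one byte	1 byte	C++17+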

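Unicode encodings in depth

How UTF-8 works:

Variable-length encoding: 1 to 4 bytes per code point
ASCII-compatible: 0x00 to 0x7F take 1 byte
Multi-byte sequences:
- The high bits of the lead byte give the length of the sequence
- Every continuation byte starts with the bits 10
Encoding rules:
- U+0000 to U+007F: 0xxxxxxx (1 byte)
- U+0080 to U+07FF: 110xxxxx 10xxxxxx (2 bytes)
- U+0800 to U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx (3 bytes)
- U+10000 to U+10FFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (4 bytes)
Advantages:
- ASCII-compatible
- No byte-order issues
- Compact for ASCII-heavy text
- Self-synchronizing: from any byte offset, a decoder can find the next code point boundary by skipping continuation bytes
Disadvantages:
- Random access by code point is slow
- Many characters take several bytes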
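
The encoding rules translate almost line for line into code. The following is a minimal sketch; Utf8Bytes and encode_utf8 are hypothetical names, and the function reports invalid input (surrogates or values above U+10FFFF) with a length of 0.

```cpp
#include <cstdint>

// Up to four UTF-8 bytes plus the number of bytes actually used.
struct Utf8Bytes {
    std::uint8_t bytes[4];
    int length;  // 0 means the code point cannot be encoded
};

constexpr Utf8Bytes encode_utf8(std::uint32_t cp) {
    Utf8Bytes out{{0, 0, 0, 0}, 0};
    if (cp <= 0x7F) {                          // 0xxxxxxx
        out.bytes[0] = static_cast<std::uint8_t>(cp);
        out.length = 1;
    } else if (cp <= 0x7FF) {                  // 110xxxxx 10xxxxxx
        out.bytes[0] = static_cast<std::uint8_t>(0xC0 | (cp >> 6));
        out.bytes[1] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        out.length = 2;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
        // Surrogates are not valid code points in UTF-8; length stays 0.
    } else if (cp <= 0xFFFF) {                 // 1110xxxx 10xxxxxx 10xxxxxx
        out.bytes[0] = static_cast<std::uint8_t>(0xE0 | (cp >> 12));
        out.bytes[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[2] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        out.length = 3;
    } else if (cp <= 0x10FFFF) {               // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.bytes[0] = static_cast<std::uint8_t>(0xF0 | (cp >> 18));
        out.bytes[1] = static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F));
        out.bytes[2] = static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[3] = static_cast<std::uint8_t>(0x80 | (cp & 0x3F));
        out.length = 4;
    }
    return out;
}
```

For instance, U+20AC (the euro sign) encodes as E2 82 AC and U+1F600 as F0 9F 98 80.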

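How UTF-16 works:

Variable-length encoding: 2 or 4 bytes per code point
Basic Multilingual Plane (BMP): U+0000 to U+FFFF, encoded in 2 bytes
Supplementary planes: U+10000 to U+10FFFF, encoded as a surrogate pair
Surrogate pair rules:
- High surrogate: 0xD800 to 0xDBFF
- Low surrogate: 0xDC00 to 0xDFFF
- Decoding: code point = 0x10000 + ((high surrogate - 0xD800) << 10) + (low surrogate - 0xDC00)
Advantages:
- Most commonly used characters take 2 bytes
- Indexing by code unit is O(1), although access by code point is still O(n) because of surrogate pairs
- Compact for East Asian text (most common characters take 2 bytes, versus 3 in UTF-8)
Disadvantages:
- Byte-order dependent (needs a BOM or an explicit UTF-16LE/UTF-16BE label)
- Surrogate pairs add complexity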
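
The surrogate pair arithmetic can be sketched as a pair of inverse functions. The names SurrogatePair, to_surrogates, and from_surrogates are illustrative; to_surrogates assumes its argument is in the range U+10000 to U+10FFFF.

```cpp
// A UTF-16 surrogate pair: high (lead) unit followed by low (trail) unit.
struct SurrogatePair {
    char16_t high;
    char16_t low;
};

// Does this code point need two UTF-16 code units?
constexpr bool needs_surrogates(char32_t cp) {
    return cp >= 0x10000 && cp <= 0x10FFFF;
}

// Split a supplementary-plane code point into its surrogate pair.
// Precondition: needs_surrogates(cp).
constexpr SurrogatePair to_surrogates(char32_t cp) {
    return SurrogatePair{
        static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10)),
        static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF))};
}

// Recombine a surrogate pair into the code point it encodes.
constexpr char32_t from_surrogates(char16_t high, char16_t low) {
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10) +
           (static_cast<char32_t>(low) - 0xDC00);
}
```

U+1F600, for example, becomes the pair D83D DE00, and recombining that pair gives U+1F600 back.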

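How UTF-32 works:

Fixed-length encoding: 4 bytes per code point
Direct representation: each code point is stored as one 4-byte value
Advantages:
- Simple and direct
- O(1) random access by code point
- No variable-length handling
Disadvantages:
- High storage cost (four times the size of UTF-8 for ASCII text)
- Byte-order dependent (needs a BOM or an explicit UTF-32LE/UTF-32BE label)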

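Encoding conversion:

// UTF-8 and UTF-16 conversion
// The decoders below assume well-formed UTF-8: they stop at truncated input
// but do not validate continuation bytes or reject overlong encodings.
#include <cstdint>
#include <string>
#include <string_view>

// UTF-8 to UTF-16 conversion
std::u16string utf8_to_utf16(std::string_view utf8) {
    std::u16string utf16;
    utf16.reserve(utf8.size()); // preallocate: never more code units than input bytes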
    
    size_t i = 0;
    while (i < utf8.size()) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        
        if (c < 0x80) {
            // One byte
            utf16.push_back(static_cast<char16_t>(c));
            i++;
        } else if (c < 0xE0) {
            // Two bytes
            if (i + 1 >= utf8.size()) break;
            unsigned char c2 = static_cast<unsigned char>(utf8[i+1]);
            char16_t code = static_cast<char16_t>(((c & 0x1F) << 6) | (c2 & 0x3F));
            utf16.push_back(code);
            i += 2;
        } else if (c < 0xF0) {
            // Three bytes
            if (i + 2 >= utf8.size()) break;
            unsigned char c2 = static_cast<unsigned char>(utf8[i+1]);
            unsigned char c3 = static_cast<unsigned char>(utf8[i+2]);
            char16_t code = static_cast<char16_t>(((c & 0x0F) << 12) | ((c2 & 0x3F) << 6) | (c3 & 0x3F));
            utf16.push_back(code);
            i += 3;
        } else {
            // Four bytes (needs a surrogate pair)
            if (i + 3 >= utf8.size()) break;
            unsigned char c2 = static_cast<unsigned char>(utf8[i+1]);
            unsigned char c3 = static_cast<unsigned char>(utf8[i+2]);
            unsigned char c4 = static_cast<unsigned char>(utf8[i+3]);
            uint32_t code_point = static_cast<uint32_t>(((c & 0x07) << 18) | ((c2 & 0x3F) << 12) | ((c3 & 0x3F) << 6) | (c4 & 0x3F));
            
            // Convert to a surrogate pair
            code_point -= 0x10000;
            char16_t high_surrogate = static_cast<char16_t>((code_point >> 10) + 0xD800);
            char16_t low_surrogate = static_cast<char16_t>((code_point & 0x3FF) + 0xDC00);
            utf16.push_back(high_surrogate);
            utf16.push_back(low_surrogate);
            i += 4;
        }
    }
    
    return utf16;
}

// UTF-16 to UTF-8 conversion
std::string utf16_to_utf8(std::u16string_view utf16) {
    std::string utf8;
    utf8.reserve(utf16.size() * 3); // worst case: three bytes per UTF-16 code unit
    
    size_t i = 0;
    while (i < utf16.size()) {
        char16_t c = utf16[i];
        
        if (c < 0x80) {
            // One byte
            utf8.push_back(static_cast<char>(c));
            i++;
        } else if (c < 0x800) {
            // Two bytes
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
            i++;
        } else if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate
            if (i + 1 >= utf16.size()) break;
            char16_t low = utf16[i+1];
            if (low < 0xDC00 || low > 0xDFFF) break;
            
            // Combine the pair into a code point
            uint32_t code_point = static_cast<uint32_t>(((c - 0xD800) << 10) | (low - 0xDC00)) + 0x10000;
            
            // Four-byte UTF-8
            utf8.push_back(static_cast<char>(0xF0 | (code_point >> 18)));
            utf8.push_back(static_cast<char>(0x80 | ((code_point >> 12) & 0x3F)));
            utf8.push_back(static_cast<char>(0x80 | ((code_point >> 6) & 0x3F)));
            utf8.push_back(static_cast<char>(0x80 | (code_point & 0x3F)));
            i += 2;
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            break; // unpaired low surrogate: stop, as for other malformed input
        } else {
            // Three bytes (U+0800 to U+FFFF, excluding surrogates)
            utf8.push_back(static_cast<char>(0xE0 | (c >> 12)));
            utf8.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
            i++;
        }
    }
    
    return utf8;
}

// UTF-8 to UTF-32 conversion
std::u32string utf8_to_utf32(std::string_view utf8) {
    std::u32string utf32;
    utf32.reserve(utf8.size()); // preallocate: never more code points than input bytes
    
    size_t i = 0;
    while (i < utf8.size()) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        
        if (c < 0x80) {
            utf32.push_back(static_cast<char32_t>(c));
            i++;
        } else if (c < 0xE0) {
            if (i + 1 >= utf8.size()) break;
            unsigned char c2 = static_cast<unsigned char>(utf8[i+1]);
            char32_t code = static_cast<char32_t>(((c & 0x1F) << 6) | (c2 & 0x3F));
            utf32.push_back(code);
            i += 2;
        } else if (c < 0xF0) {
            if (i + 2 >= utf8.size()) break;
            unsigned char c2 = static_cast<unsigned char>(utf8[i+1]);
            unsigned char c3 = static_cast<unsigned char>(utf8[i+2]);
            char32_t code = static_cast<char32_t>(((c & 0x0F) << 12) | ((c2 & 0x3F) << 6) | (c3 & 0x3F));
            utf32.push_back(code);
            i += 3;
        } else {
            if (i + 3 >= utf8.size()) break;
            unsigned char c2 = static_cast<unsigned char>(utf8[i+1]);
            unsigned char c3 = static_cast<unsigned char>(utf8[i+2]);
            unsigned char c4 = static_cast<unsigned char>(utf8[i+3]);
            char32_t code = static_cast<char32_t>(((c & 0x07) << 18) | ((c2 & 0x3F) << 12) | ((c3 & 0x3F) << 6) | (c4 & 0x3F));
            utf32.push_back(code);
            i += 4;
        }
    }
    
    return utf32;
}

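Character encoding performance:

// UTF-8 string wrapper
#include <stdexcept>
#include <string>

class Utf8String {
private:
    std::string data_;

public:
    Utf8String(const char* str) : data_(str) {}
    Utf8String(const std::u8string& str) : data_(reinterpret_cast<const char*>(str.data()), str.size()) {}
    Utf8String(const std::string& str) : data_(str) {}
    
    // Structural UTF-8 check: lead bytes and continuation bytes only.
    // Overlong encodings, surrogates, and values above U+10FFFF are not rejected.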
    bool is_valid() const {
        size_t i = 0;
        while (i < data_.size()) {
            unsigned char c = static_cast<unsigned char>(data_[i]);
            
            if (c < 0x80) {
                i++;
            } else if (c < 0xC0) {
                return false; // continuation byte where a lead byte is expected
            } else if (c < 0xE0) {
                if (i + 1 >= data_.size() || (static_cast<unsigned char>(data_[i+1]) & 0xC0) != 0x80) {
                    return false;
                }
                i += 2;
            } else if (c < 0xF0) {
                if (i + 2 >= data_.size() || 
                    (static_cast<unsigned char>(data_[i+1]) & 0xC0) != 0x80 ||
                    (static_cast<unsigned char>(data_[i+2]) & 0xC0) != 0x80) {
                    return false;
                }
                i += 3;
            } else if (c < 0xF8) {
                if (i + 3 >= data_.size() || 
                    (static_cast<unsigned char>(data_[i+1]) & 0xC0) != 0x80 ||
                    (static_cast<unsigned char>(data_[i+2]) & 0xC0) != 0x80 ||
                    (static_cast<unsigned char>(data_[i+3]) & 0xC0) != 0x80) {
                    return false;
                }
                i += 4;
            } else {
                return false; // invalid lead byte
            }
        }
        return true;
    }
    
    // Number of code points: O(n), counts every byte that is not a continuation byte
    size_t length() const {
        size_t count = 0;
        for (unsigned char c : data_) {
            if ((c & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }
    
    // Code point at the given index: O(n) scan from the start, no caching
    char32_t operator[](size_t index) const {
        size_t count = 0;
        size_t i = 0;
        for (; i < data_.size(); ++i) {
            unsigned char b = static_cast<unsigned char>(data_[i]);
            if ((b & 0xC0) == 0x80) {
                continue; // continuation byte
            }
            if (count == index) {
                break; // i now points at the lead byte of the requested code point
            }
            count++;
        }
        
        if (i >= data_.size()) {
            throw std::out_of_range("Index out of range");
        }
        
        unsigned char c = static_cast<unsigned char>(data_[i]);
        if (c < 0x80) {
            return c;
        } else if (c < 0xE0) {
            if (i + 1 >= data_.size()) {
                throw std::out_of_range("Invalid UTF-8 sequence");
            }
            return ((c & 0x1F) << 6) | (static_cast<unsigned char>(data_[i+1]) & 0x3F);
        } else if (c < 0xF0) {
            if (i + 2 >= data_.size()) {
                throw std::out_of_range("Invalid UTF-8 sequence");
            }
            return ((c & 0x0F) << 12) | 
                   ((static_cast<unsigned char>(data_[i+1]) & 0x3F) << 6) |
                   (static_cast<unsigned char>(data_[i+2]) & 0x3F);
        } else {
            if (i + 3 >= data_.size()) {
                throw std::out_of_range("Invalid UTF-8 sequence");
            }
            return ((c & 0x07) << 18) | 
                   ((static_cast<unsigned char>(data_[i+1]) & 0x3F) << 12) |
                   ((static_cast<unsigned char>(data_[i+2]) & 0x3F) << 6) |
                   (static_cast<unsigned char>(data_[i+3]) & 0x3F);
        }
    }
    
    // Comparison is byte-wise; for valid UTF-8 this orders strings by code point
    bool operator==(const Utf8String& other) const {
        return data_ == other.data_;
    }
    
    bool operator<(const Utf8String& other) const {
        return data_ < other.data_;
    }
    
    // Substring by code point position and count
    Utf8String substr(size_t pos, size_t count = std::string::npos) const {
        if (pos > length()) {
            throw std::out_of_range("Position out of range");
        }
        
        // Advance past n code points starting at a lead byte; returns the byte offset
        auto advance = [this](size_t from, size_t n) {
            size_t i = from;
            while (i < data_.size() && n > 0) {
                i++; // step over the lead byte
                while (i < data_.size() && (static_cast<unsigned char>(data_[i]) & 0xC0) == 0x80) {
                    i++; // step over its continuation bytes
                }
                n--;
            }
            return i;
        };
        
        size_t byte_pos = advance(0, pos);
        if (count == std::string::npos) {
            return Utf8String(data_.substr(byte_pos));
        }
        
        size_t byte_end = advance(byte_pos, count);
        return Utf8String(data_.substr(byte_pos, byte_end - byte_pos));
    }
    
    // Access to the underlying string
    const std::string& str() const { return data_; }
    std::u8string u8str() const {
        return std::u8string(reinterpret_cast<const char8_t*>(data_.data()), data_.size());
    }
};

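// Performance considerations for character encodings
// 1. For mostly-ASCII text, UTF-8 is the most compact
// 2. For frequent random access by code point, UTF-32 is the most efficient
// 3. For internationalized text, UTF-8 is the usual choice on the web and across platforms
// 4. On Windows, UTF-16 is the native encoding

// Example: efficient string handling
#include <string>
#include <string_view>

void high_performance_string_operations() {
    // String view (no copy)
    std::string_view sv = "Hello, World!";
    
    // Small string optimization (SSO)
    std::string small_str = "Small string";
    // Short strings are stored inside the std::string object itself, avoiding a heap allocation
    
    // Concatenation
    std::string result;
    result.reserve(100); // preallocate
    result += "Hello, ";
    result += "World!";
    
    // Search
    size_t pos = sv.find("World");
    
    // Comparison
    bool equal = sv == "Hello, World!";
}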

Encoding safety and best practices:

// Handling encoded text safely
#include <iostream>

void encoding_safety() {
    // Avoid buffer overflows
    char buffer[1024];
    const char* source = "Long string that might overflow";
    // Unsafe: strcpy(buffer, source);
    // Safe:   strncpy(buffer, source, sizeof(buffer)-1); buffer[sizeof(buffer)-1] = '\0';
    
    // std::string sidesteps fixed-size buffers altogether
    std::string safe_str = source;
    
    // Validate the encoding of input
    Utf8String user_input = "User input with Unicode: 你好世界";
    if (!user_input.is_valid()) {
        std::cerr << "Invalid UTF-8 input" << std::endl;
        return;
    }
    
    // Handle out-of-range access
    try {
        char32_t ch = user_input[100]; // may throw
    } catch (const std::out_of_range& e) {
        std::cerr << "Invalid index: " << e.what() << std::endl;
    }
}

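// Character encoding best practices
// 1. Store UTF-8 text in std::string
// 2. Use std::u8string (C++20 and later) to mark text explicitly as UTF-8
// 3. Prefer std::string_view over raw char* for string operations
// 4. Validate the encoding of user input
// 5. Let RAII manage string resources
// 6. Do not mix strings in different encodings
// 7. Use UTF-8 for applications that need internationalization
// 8. On Windows, convert between UTF-16 and UTF-8 at API boundaries

// Example: cross-platform string handling
#ifdef _WIN32
// Windows: UTF-16 is the native encoding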
std::wstring to_wstring(const std::string& utf8) {
    std::u16string utf16 = utf8_to_utf16(std::string_view(utf8));
    return std::wstring(utf16.begin(), utf16.end());
}

std::string from_wstring(const std::wstring& wide) {
    std::u16string utf16(wide.begin(), wide.end());
    return utf16_to_utf8(std::u16string_view(utf16));
}
#else
// Unix-like platforms: UTF-8 is the native encoding
std::string to_native_string(const std::string& utf8) {
    return utf8;
}

std::string from_native_string(const std::string& native) {
    return native;
}
#endif

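Character type additions in C++20:

char8_t: a character type that explicitly denotes UTF-8 code units
std::u8string: a string type for UTF-8 text
std::u8string_view: a string view type for UTF-8 text
Literals:
- u8'c': a char8_t literal
- u8"string": a UTF-8 string literal

C++17 std::byte:

Purpose: represents a raw byte rather than a character
Benefit: type safety, with no confusion with char
Operations: only bitwise operators (including shifts) and comparisons
Conversion: converting to any other type must be explicit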

// std::byte example
#include <cstddef>

void byte_example() {
    // Creating std::byte values
    std::byte b1{0x41}; // ASCII 'A'
    std::byte b2 = static_cast<std::byte>(65); // also 'A'
    
    // Bitwise operations
    std::byte b3 = b1 | b2;
    std::byte b4 = b1 & b2;
    std::byte b5 = ~b1;
    
    // Comparison
    bool equal = (b1 == b2);
    
    // Conversion to an integer
    int i = std::to_integer<int>(b1); // 65
    
    // Conversion to char
    char c = static_cast<char>(std::to_integer<unsigned char>(b1)); // 'A'
}

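Boolean types: optimization and memory layout

Type	Size	Range	Use	Memory layout	Alignment	Notable optimizations
bool	1 byte (typically)	false or true	boolean logic	one byte: 0 for false, 1 for true	1 byte	can be packed manually into bits; arrays can be processed with SIMD
std::vector<bool>	1 bit per element	false or true	space-optimized boolean vector	bit-packed	implementation-defined	bit-level operations
std::bitset<N>	1 bit per element	false or true	fixed-size set of bits	bit-packed	implementation-defined	size known at compile time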

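How bool is implemented

Technical properties of the C++ bool type:

Size: sizeof(bool) is implementation-defined and at least 1; a standalone bool object cannot occupy less than one byte, so sub-byte storage requires bit-fields or explicit packing
Representation: false is stored as 0 and true as 1 (any non-zero value converts to true)
Conversions:
- bool to integer: false→0, true→1
- integer to bool: 0→false, non-zero→true
- floating point to bool: 0.0→false, non-zero→true
- pointer to bool: nullptr→false, non-null→true

Memory optimizations for booleans:

Bit packing: store several bool values in a single byte
Bit-fields: use struct bit-fields to reduce memory usage
SIMD instructions: process many boolean values in parallel with the CPU's vector instructions
Branch prediction: arrange boolean conditions so that the CPU's branch predictor can predict them well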

位压缩技术：

// 位压缩的布尔数组
class BitArray {
private:
    using WordType = uint64_t; // 64位字，提高处理效率
    static constexpr size_t BITS_PER_WORD = sizeof(WordType) * CHAR_BIT;
    
    std::vector<WordType> words_;
    size_t size_;
    
    // 计算字索引和位偏移
    size_t word_index(size_t bit) const { return bit / BITS_PER_WORD; }
    size_t bit_offset(size_t bit) const { return bit % BITS_PER_WORD; }
    
    // 确保容量足够
    void ensure_capacity(size_t required_bits) {
        size_t required_words = (required_bits + BITS_PER_WORD - 1) / BITS_PER_WORD;
        if (words_.size() < required_words) {
            words_.resize(required_words, 0);
        }
    }

public:
    BitArray(size_t size = 0) : size_(size) {
        ensure_capacity(size);
    }
    
    // 设置指定位
    void set(size_t index, bool value) {
        if (index >= size_) {
            throw std::out_of_range("Index out of range");
        }
        
        size_t word_idx = word_index(index);
        size_t bit_off = bit_offset(index);
        
        if (value) {
            words_[word_idx] |= (1ULL << bit_off);
        } else {
            words_[word_idx] &= ~(1ULL << bit_off);
        }
    }
    
    // 获取指定位
    bool get(size_t index) const {
        if (index >= size_) {
            throw std::out_of_range("Index out of range");
        }
        
        size_t word_idx = word_index(index);
        size_t bit_off = bit_offset(index);
        
        return (words_[word_idx] & (1ULL << bit_off)) != 0;
    }
    
    // 翻转指定位
    void flip(size_t index) {
        if (index >= size_) {
            throw std::out_of_range("Index out of range");
        }
        
        size_t word_idx = word_index(index);
        size_t bit_off = bit_offset(index);
        
        words_[word_idx] ^= (1ULL << bit_off);
    }
    
    // 全部设置为true
    void set_all() {
        std::fill(words_.begin(), words_.end(), ~WordType(0));
        // 清除超出大小的位
        if (size_ % BITS_PER_WORD != 0) {
            size_t last_word_idx = word_index(size_ - 1);
            WordType mask = (1ULL << (size_ % BITS_PER_WORD)) - 1;
            words_[last_word_idx] &= mask;
        }
    }
    
    // 全部设置为false
    void reset_all() {
        std::fill(words_.begin(), words_.end(), 0);
    }
    
    // 统计true的数量
    size_t count() const {
        size_t total = 0;
        for (WordType word : words_) {
            total += __builtin_popcountll(word); // GCC/Clang内置函数
        }
        // 调整超出大小的位计数
        if (size_ % BITS_PER_WORD != 0) {
            size_t last_word_idx = word_index(size_ - 1);
            WordType mask = (1ULL << (size_ % BITS_PER_WORD)) - 1;
            WordType last_word = words_[last_word_idx] & mask;
            total -= __builtin_popcountll(words_[last_word_idx]);
            total += __builtin_popcountll(last_word);
        }
        return total;
    }
    
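    // Size operations
    size_t size() const { return size_; }
    void resize(size_t new_size) {
        // When shrinking, clear the bits that fall out of range first, so that
        // a later grow does not expose stale values
        for (size_t i = new_size; i < size_; ++i) {
            words_[word_index(i)] &= ~(1ULL << bit_offset(i));
        }
        ensure_capacity(new_size);
        size_ = new_size;
    }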
    
    // Bitwise operators
    BitArray& operator|=(const BitArray& other) {
        if (size_ != other.size_) {
            throw std::invalid_argument("Size mismatch");
        }
        for (size_t i = 0; i < words_.size(); ++i) {
            words_[i] |= other.words_[i];
        }
        return *this;
    }
    
    BitArray& operator&=(const BitArray& other) {
        if (size_ != other.size_) {
            throw std::invalid_argument("Size mismatch");
        }
        for (size_t i = 0; i < words_.size(); ++i) {
            words_[i] &= other.words_[i];
        }
        return *this;
    }
    
    BitArray& operator^=(const BitArray& other) {
        if (size_ != other.size_) {
            throw std::invalid_argument("Size mismatch");
        }
        for (size_t i = 0; i < words_.size(); ++i) {
            words_[i] ^= other.words_[i];
        }
        return *this;
    }
    
    friend BitArray operator|(BitArray lhs, const BitArray& rhs) {
        lhs |= rhs;
        return lhs;
    }
    
    friend BitArray operator&(BitArray lhs, const BitArray& rhs) {
        lhs &= rhs;
        return lhs;
    }
    
    friend BitArray operator^(BitArray lhs, const BitArray& rhs) {
        lhs ^= rhs;
        return lhs;
    }
};

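// Bit-field optimization
struct Flags {
    bool is_enabled : 1;
    bool is_read_only : 1;
    bool is_modified : 1;
    unsigned char reserved : 5; // pads the struct to 8 bits
};

// Size check (bit-field layout is implementation-defined; this holds on mainstream compilers)
static_assert(sizeof(Flags) == 1, "Flags should be 1 byte");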

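// SIMD optimization of boolean operations
#include <cstddef>     // size_t
#include <immintrin.h> // SSE2 intrinsics (x86/x64 only)

// Process bool arrays in parallel with SSE2
void simd_bool_operations(const bool* a, const bool* b, bool* result, size_t n) {
    size_t i = 0;

    // Main loop: 16 elements per iteration (unaligned loads, so no alignment requirement)
    for (; i + 15 < n; i += 16) {
        // Load 16 bool values (each bool is stored as one byte holding 0 or 1)
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&a[i]));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&b[i]));

        // Bytewise AND; every input byte is 0 or 1, so each result byte is a valid bool
        __m128i vresult = _mm_and_si128(va, vb);

        // Store the result
        _mm_storeu_si128(reinterpret_cast<__m128i*>(&result[i]), vresult);
    }

    // Handle the remaining elements
    for (; i < n; ++i) {
        result[i] = a[i] && b[i];
    }
}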

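// Branch prediction
void branch_prediction_optimization(const std::vector<bool>& flags) {
    // The CPU learns branch patterns, so branches that follow a regular
    // pattern are predicted with high accuracy.

    // Favorable pattern: a regular true/false sequence
    for (bool flag : flags) {
        if (flag) {
            // handle true
        } else {
            // handle false
        }
    }

    // Unfavorable pattern: a random true/false sequence
    // causes frequent mispredictions and lowers throughput
}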

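// Branch-free boolean operations
// (compilers usually emit branch-free code for a && b on plain bools as well)
bool no_branch_and(bool a, bool b) {
    // Bitwise AND evaluates both operands, so no short-circuit branch is needed
    return static_cast<bool>(static_cast<uint8_t>(a) & static_cast<uint8_t>(b));
}

bool no_branch_or(bool a, bool b) {
    // Bitwise OR evaluates both operands, so no short-circuit branch is needed
    return static_cast<bool>(static_cast<uint8_t>(a) | static_cast<uint8_t>(b));
}

bool no_branch_not(bool a) {
    // XOR with 1 flips the single value bit
    return static_cast<bool>(static_cast<uint8_t>(a) ^ 1u);
}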
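
The branch-prediction remarks above can be made concrete with a small sketch. The two functions below are hypothetical (their names are not from the text): both count the elements above a threshold, once with an `if` and once by adding the comparison result directly. Whether the second form is faster depends on the compiler and the data, because optimizers often generate branch-free code for both, so it is worth measuring before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Branchy version: the `if` can be mispredicted when the data is random.
std::size_t count_above_branchy(const std::vector<int>& data, int threshold) {
    std::size_t count = 0;
    for (int x : data) {
        if (x > threshold) {
            ++count;
        }
    }
    return count;
}

// Branch-free formulation: the comparison yields a bool, which converts to 0 or 1.
std::size_t count_above_branchless(const std::vector<int>& data, int threshold) {
    std::size_t count = 0;
    for (int x : data) {
        count += static_cast<std::size_t>(x > threshold);
    }
    return count;
}

// Usage sketch: both formulations must agree.
void count_above_demo() {
    const std::vector<int> data{1, 9, 3, 7, 5};
    assert(count_above_branchy(data, 4) == 3);     // 9, 7 and 5
    assert(count_above_branchless(data, 4) == 3);
}
```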

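Memory Layout and Alignment Optimization

How memory alignment works:

Alignment requirement: every type has an alignment requirement, a power of two that for fundamental types usually equals the type's size
Padding: the compiler inserts padding bytes between struct members (and after the last one) to satisfy alignment requirements
Memory access: aligned accesses are faster; unaligned accesses can cost performance or, on some architectures, raise a hardware exception

How alignment affects performance:

Cache lines: a properly aligned object does not straddle two cache lines, so a single access touches only one line
Memory bandwidth: aligned accesses make more efficient use of the memory bus
Atomic operations: some CPU architectures require the address of an atomic operation to be aligned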
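
The rules above can be checked directly in code. The following is a minimal sketch, assuming a hypothetical helper `is_aligned` and a hypothetical struct `CharThenDouble` (neither appears in the text); it uses `alignof`, `offsetof` and a pointer test to show that members sit at offsets satisfying their alignment and that a struct's size is a multiple of its alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if `ptr` is a multiple of `alignment`, which must be a power of two.
bool is_aligned(const void* ptr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// A char followed by a double: the compiler inserts padding before `d`.
struct CharThenDouble {
    char c;
    double d;
};

static_assert(alignof(char) == 1, "char has the weakest alignment");
static_assert(sizeof(CharThenDouble) % alignof(CharThenDouble) == 0,
              "a struct's size is a multiple of its alignment");
static_assert(offsetof(CharThenDouble, d) % alignof(double) == 0,
              "members sit at offsets that satisfy their alignment");
static_assert(sizeof(CharThenDouble) >= sizeof(char) + sizeof(double),
              "padding can only add to the sum of the member sizes");
```

A caller could use `is_aligned(buffer, 16)` before choosing an aligned SIMD load over an unaligned one.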

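Optimizing struct layout:

// Unoptimized struct (contains padding)
struct UnoptimizedStruct {
    char c;     // 1 byte
    int i;      // 4 bytes (needs 4-byte alignment, preceded by 3 bytes of padding)
    double d;   // 8 bytes (needs 8-byte alignment; offset 8 is already aligned, so no padding)
    bool b;     // 1 byte
};
// Size: 1 + 3 (padding) + 4 + 8 + 1 + 7 (padding) = 24 bytes on a typical 64-bit ABI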

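// Optimized struct (members ordered by size to reduce padding)
struct OptimizedStruct {
    double d;   // 8 bytes
    int i;      // 4 bytes
    char c;     // 1 byte
    bool b;     // 1 byte
};
// Size: 8 + 4 + 1 + 1 + 2 (padding) = 16 bytes

// Principles of layout optimization
// 1. Order members from largest to smallest (strictly, by alignment requirement)
// 2. Use bit-fields to reduce memory footprint
// 3. Use the compiler's packed attribute (with care: it can hurt performance)
// 4. Keep the cache-line size in mind and avoid false sharing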

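// Cache-line optimization
// Typical cache-line size: 64 bytes (x86/x64)
struct CacheAlignedStruct {
    // Every instance is aligned to a cache-line boundary
    alignas(64) double values[8]; // 64 bytes, exactly one cache line
};

// Avoiding false sharing
struct NoFalseSharing {
    alignas(64) int counter1; // first counter, aligned to a cache line
    alignas(64) int counter2; // second counter, aligned to the next cache line
};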

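// Compile-time layout computation
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout_v

// offsetof takes a member name, not a type, so it cannot be driven by a
// template parameter pack of types. It is only guaranteed to work for
// standard-layout classes, so check that property first.
static_assert(std::is_standard_layout_v<OptimizedStruct>,
              "offsetof requires a standard-layout type");

// Offset of the first byte after member i, computed at compile time
constexpr std::size_t end_of_i =
    offsetof(OptimizedStruct, i) + sizeof(int);
static_assert(end_of_i == offsetof(OptimizedStruct, c),
              "no padding between i and c");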

// Verify the layout at compile time
static_assert(offsetof(OptimizedStruct, d) == 0, "d should be at offset 0");
static_assert(offsetof(OptimizedStruct, i) == 8, "i should be at offset 8");
static_assert(offsetof(OptimizedStruct, c) == 12, "c should be at offset 12");
static_assert(offsetof(OptimizedStruct, b) == 13, "b should be at offset 13");
static_assert(sizeof(OptimizedStruct) == 16, "OptimizedStruct should be 16 bytes");

// Inspect the layout at run time
void check_memory_layout() {
    OptimizedStruct s;
    std::cout << "OptimizedStruct size: " << sizeof(s) << std::endl;
    std::cout << "d offset: " << offsetof(OptimizedStruct, d) << std::endl;
    std::cout << "i offset: " << offsetof(OptimizedStruct, i) << std::endl;
    std::cout << "c offset: " << offsetof(OptimizedStruct, c) << std::endl;
    std::cout << "b offset: " << offsetof(OptimizedStruct, b) << std::endl;
}

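// Explicit alignment control
struct ExplicitAlignment {
    alignas(8) char small;  // forced to 8-byte alignment
    int normal;             // 4 bytes, naturally aligned at offset 4
};

// Size check: small at offset 0, normal at offset 4, total rounded up to the 8-byte struct alignment
static_assert(sizeof(ExplicitAlignment) == 8, "ExplicitAlignment should be 8 bytes");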

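// Performance impact of memory layout
void memory_layout_performance() {
    // Sequential vs. random access:
    // the hardware prefetcher favors sequential memory access

    // Data-locality optimization:
    // keep data that is accessed together in the same cache line
    struct DataLocality {
        int key;            // accessed frequently
        int value;          // accessed together with key
        char padding[64 - sizeof(int) * 2]; // pad to a full cache line
    };
}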
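
The false-sharing and cache-line points above can be combined into a per-thread counter layout. This is a sketch under stated assumptions: the 64-byte line size is typical for x86-64 but not universal, and `kCacheLine`, `PaddedCounter` and `PerThreadCounters` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache-line size; 64 bytes is typical for x86-64 but not universal.
constexpr std::size_t kCacheLine = 64;

// Each counter occupies a full cache line, so counters updated by
// different threads never share a line.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};

static_assert(alignof(PaddedCounter) == kCacheLine, "aligned to a cache line");
static_assert(sizeof(PaddedCounter) == kCacheLine, "padded to a full cache line");

// One slot per worker thread (the count of 4 is arbitrary).
struct PerThreadCounters {
    PaddedCounter slots[4];
};
static_assert(sizeof(PerThreadCounters) == 4 * kCacheLine, "one line per slot");

// Usage sketch: neighbouring slots are exactly one cache line apart.
void padded_counter_demo() {
    PerThreadCounters counters;
    counters.slots[1].value += 5;
    const auto* first = reinterpret_cast<const char*>(&counters.slots[0]);
    const auto* second = reinterpret_cast<const char*>(&counters.slots[1]);
    assert(second - first == static_cast<std::ptrdiff_t>(kCacheLine));
    assert(counters.slots[0].value == 0);
    assert(counters.slots[1].value == 5);
}
```

The cost of this layout is memory: each `long` now occupies 64 bytes, which is usually acceptable for a handful of hot counters but not for large arrays.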

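Modern C++ Type Enhancements

Type traits:

Header: <type_traits>
Purpose: compile-time type queries and transformations
Core facilities:
- Type categories: std::is_integral, std::is_floating_point, std::is_class, etc.
- Type properties: std::is_const, std::is_volatile, std::is_trivial, etc.
- Type transformations: std::add_const, std::remove_reference, std::decay, etc.
- Type relationships: std::is_same, std::is_base_of, std::is_convertible, etc.

Code example:

// Using type traits
#include <iostream>
#include <type_traits>

// Compile-time type checks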
template <typename T>
void process(T value) {
    static_assert(std::is_integral_v<T>, "T must be integral type");
    
    if constexpr (std::is_signed_v<T>) {
        std::cout << "Signed integral type" << std::endl;
    } else {
        std::cout << "Unsigned integral type" << std::endl;
    }
    
    if constexpr (std::is_arithmetic_v<T>) {
        std::cout << "Arithmetic type" << std::endl;
    }
}

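// Type transformations
template <typename T>
void type_conversion_demo(T value) {
    // Remove the reference (a no-op here: T is deduced by value and is never a reference)
    using NoRef = std::remove_reference_t<T>;

    // Add const
    using ConstType = std::add_const_t<NoRef>;

    // Decay (strips references and cv-qualifiers, converts arrays and functions to pointers)
    using Decayed = std::decay_t<T>;

    // Note: typeid (from <typeinfo>) ignores references and top-level cv-qualifiers,
    // so several of the names printed below can be identical
    std::cout << "Original type: " << typeid(T).name() << std::endl;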
    std::cout << "No reference: " << typeid(NoRef).name() << std::endl;
    std::cout << "Const type: " << typeid(ConstType).name() << std::endl;
    std::cout << "Decayed type: " << typeid(Decayed).name() << std::endl;
}

// Conditional type selection
template <typename T>
using SafeIntegral = std::conditional_t<
    sizeof(T) <= sizeof(int),
    int,
    std::conditional_t<
        sizeof(T) <= sizeof(long long),
        long long,
        void
    >
>;

// Collecting type traits in one place
template <typename T>
struct TypeTraits {
    static constexpr bool is_integral = std::is_integral_v<T>;
    static constexpr bool is_floating = std::is_floating_point_v<T>;
    static constexpr bool is_arithmetic = std::is_arithmetic_v<T>;
    static constexpr bool is_trivial = std::is_trivial_v<T>;
    static constexpr bool is_standard_layout = std::is_standard_layout_v<T>;
    static constexpr size_t size = sizeof(T);
};
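
To show type traits steering an implementation rather than only reporting properties, here is a minimal sketch of a hypothetical `magnitude` function (not from the text). It uses `std::make_unsigned_t` and `if constexpr` to return an absolute value in the matching unsigned type, which stays well defined for the most negative value of a signed type, where negating the signed value would overflow.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Returns |value| as the corresponding unsigned type.
template <typename T>
constexpr std::make_unsigned_t<T> magnitude(T value) {
    static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                  "magnitude requires a non-bool integral type");
    using U = std::make_unsigned_t<T>;
    if constexpr (std::is_signed_v<T>) {
        const U bits = static_cast<U>(value);
        // Unsigned negation wraps modulo 2^N, which yields the magnitude.
        return value < 0 ? static_cast<U>(U{0} - bits) : bits;
    } else {
        return value;  // already non-negative
    }
}

// Usage sketch: everything is evaluated at compile time.
static_assert(magnitude(-5) == 5u, "negative input");
static_assert(magnitude(7u) == 7u, "unsigned input passes through");
static_assert(magnitude(std::int8_t{-128}) == 128, "most negative int8_t");
static_assert(magnitude(std::numeric_limits<int>::min()) ==
                  static_cast<unsigned>(std::numeric_limits<int>::max()) + 1u,
              "most negative int");
```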

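C++17 std::optional:

Purpose: represents a value that may be absent
Advantage: avoids sentinel values (such as nullptr or -1) to signal absence
Performance: the value is stored inside the optional object itself, so no heap allocation ever occurs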
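
As a small illustration of replacing a sentinel with `std::optional`, the sketch below uses a hypothetical `find_index` function (the name and signature are assumptions for this example). A lookup that would otherwise return -1 for "not found" returns `std::nullopt` instead, and the caller has to check before using the result.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Returns the index of the first occurrence of `name`, or std::nullopt.
std::optional<std::size_t> find_index(const std::vector<std::string>& names,
                                      const std::string& name) {
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i] == name) {
            return i;
        }
    }
    return std::nullopt;
}

// The value lives inside the optional next to an "engaged" flag,
// so the object is larger than the value it wraps.
static_assert(sizeof(std::optional<int>) > sizeof(int), "value plus flag");

// Usage sketch.
void find_index_demo() {
    const std::vector<std::string> names{"ada", "bjarne", "grace"};
    assert(find_index(names, "bjarne").value() == 1);
    assert(!find_index(names, "linus").has_value());
    // value_or supplies a fallback without an explicit branch at the call site.
    assert(find_index(names, "linus").value_or(names.size()) == 3);
}
```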

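C++17 std::variant:

Purpose: a type-safe union
Advantage: no manual tracking of the active type; supports the visitor pattern
Performance: alternatives are stored inside the variant object itself (no heap allocation), and switching alternatives is cheap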
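
The visitor pattern mentioned above is often written with an "overloaded" helper that merges several lambdas into one visitor. The sketch below assumes hypothetical names `overloaded`, `Value` and `describe`; the helper is a common C++17 idiom rather than a standard library facility.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Merges several lambdas into one callable with an overload per alternative.
template <typename... Fs>
struct overloaded : Fs... {
    using Fs::operator()...;
};
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

using Value = std::variant<int, double, std::string>;

// std::visit dispatches to the lambda that matches the active alternative.
std::string describe(const Value& v) {
    return std::visit(overloaded{
        [](int i) { return "int " + std::to_string(i); },
        [](double) { return std::string("double"); },
        [](const std::string& s) { return "string " + s; },
    }, v);
}

// Usage sketch.
void describe_demo() {
    assert(describe(Value{42}) == "int 42");
    assert(describe(Value{3.5}) == "double");
    assert(describe(Value{std::string("hi")}) == "string hi");
}
```

If an alternative is later added to `Value` without a matching lambda, the call to `std::visit` stops compiling, which is the main safety gain over a hand-written tag and `union`.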

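C++20 std::span:

Purpose: a non-owning view over a contiguous sequence
Advantage: avoids copies; supports arrays whose size is known only at run time
Performance: typically a zero-overhead abstraction, equivalent to a raw pointer plus a size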

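C++23 std::expected:

Purpose: represents the result of an operation that may fail
Advantage: carries either the return value or the error information
Performance: value and error are stored inside the object itself (no heap allocation), and the error path avoids the cost of throwing an exception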

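C++23 std::mdspan:

Purpose: a non-owning view of a multidimensional array
Advantage: supports any number of dimensions and removes manual index arithmetic
Performance: typically a zero-overhead abstraction, comparable to a raw multidimensional array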

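Performance notes for modern C++ types:

// std::optional performance
#include <optional>
#include <string>

void optional_performance() {
    // std::optional always stores its value inline, whatever the size of T;
    // it never allocates on the heap
    std::optional<int> opt_int = 42;

    // An empty optional still occupies sizeof(T) plus a flag, so for large,
    // often-absent objects consider std::unique_ptr instead

    // Fast access
    if (opt_int.has_value()) {
        int value = opt_int.value();   // checked: throws std::bad_optional_access if empty
        // or use operator*
        int value2 = *opt_int;         // unchecked: undefined behavior if empty
    }

    // emplace constructs in place and avoids an extra copy
    std::optional<std::string> opt_str;
    opt_str.emplace("Hello, World!");
}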

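// std::variant performance
void variant_performance() {
    // Stored inline; the size is at least that of the largest alternative plus a discriminator
    std::variant<int, double, std::string> var;

    // Switching the active alternative is cheap
    var = 3.14;

    // std::visit removes the need for manual type checks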
    std::visit([](auto&& value) {
        using T = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<T, int>) {
            std::cout << "int: " << value << std::endl;
        } else if constexpr (std::is_same_v<T, double>) {
            std::cout << "double: " << value << std::endl;
        } else if constexpr (std::is_same_v<T, std::string>) {
            std::cout << "string: " << value << std::endl;
        }
    }, var);
    
    // std::holds_alternative checks which alternative is active
    if (std::holds_alternative<double>(var)) {
        double value = std::get<double>(var);
        std::cout << "Holds double: " << value << std::endl;
    }
}

// std::span performance
void span_performance() {
    // Create a span from a built-in array
    int arr[] = {1, 2, 3, 4, 5};
    std::span<int> span_arr(arr);

    // Create a span from a std::vector
    std::vector<int> vec = {1, 2, 3, 4, 5};
    std::span<int> span_vec(vec);

    // Sub-span
    auto sub_span = span_vec.subspan(1, 3); // elements 2, 3, 4

    // Range-based for loop
    for (int value : span_vec) {
        std::cout << value << " ";
    }
    std::cout << std::endl;

    // Interoperating with a C-style API
    // (declared here for illustration; process_array must be defined elsewhere)
    void process_array(int* data, size_t size);
    process_array(span_vec.data(), span_vec.size());
}

// std::expected performance
#include <expected>

std::expected<int, std::string> divide(int a, int b) {
    if (b == 0) {
        return std::unexpected("Division by zero");
    }
    return a / b;
}

void expected_performance() {
    // Check for success before using the value
    auto result = divide(10, 2);
    if (result) {
        std::cout << "Result: " << *result << std::endl;
    } else {
        std::cout << "Error: " << result.error() << std::endl;
    }
    
    // Chain operations with and_then; the chain stops at the first error
    auto chained = divide(10, 2)
        .and_then([](int x) { return divide(x, 2); })
        .and_then([](int x) { return divide(x, 2); });
    
    if (chained) {
        std::cout << "Chained result: " << *chained << std::endl;
    }
}

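// std::mdspan performance
#include <mdspan>

void mdspan_performance() {
    // 2D array
    int data[3][4] = {
        {1, 2, 3, 4},
        {5, 6, 7, 8},
        {9, 10, 11, 12}
    };

    // Create a 2D view; mdspan expects a pointer to the first element
    std::mdspan<int, std::extents<size_t, 3, 4>> md(&data[0][0]);

    // Access an element (the multidimensional operator[] requires C++23)
    std::cout << "Element at (1, 2): " << md[1, 2] << std::endl;

    // Iterate with nested index loops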
    for (size_t i = 0; i < md.extent(0); ++i) {
        for (size_t j = 0; j < md.extent(1); ++j) {
            std::cout << md[i, j] << " ";
        }
        std::cout << std::endl;
    }
    
    // Dynamic extents
    std::vector<int> vec(12);
    std::mdspan<int, std::dextents<size_t, 2>> dynamic_md(vec.data(), 3, 4);

    // Fill with data
    for (size_t i = 0; i < dynamic_md.extent(0); ++i) {
        for (size_t j = 0; j < dynamic_md.extent(1); ++j) {
            dynamic_md[i, j] = static_cast<int>(i * dynamic_md.extent(1) + j);
        }
    }
}

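// Best practices for modern C++ types
// 1. Use std::optional instead of nullptr or sentinel values
// 2. Use std::variant instead of union
// 3. Use std::span instead of pointer + size
// 4. Use std::expected instead of exceptions (on performance-critical paths)
// 5. Use std::mdspan for multidimensional arrays
// 6. Use type traits judiciously for compile-time optimization
// 7. Rely on the inline storage of these types to avoid heap allocations
// 8. Avoid overusing complex types, which increases compile time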

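Data Type Best Practices and Performance Summary

Integer type best practices

Choose a suitable type: pick the smallest type that covers the value range
Use unsigned types: for non-negative values, an unsigned type provides one extra bit of range
Avoid conversions: minimize implicit conversions, especially between signed and unsigned types
Use fixed-width types: they improve portability and remove platform differences
Bit manipulation: for flags and masks, use bitwise operations rather than arithmetic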
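
Two of the integer pitfalls above, signed overflow and mixed signed/unsigned comparison, can be handled with small helpers. The sketch below is portable C++17; `checked_add` and `less_signed_unsigned` are hypothetical names chosen for this example, and the overflow test is the manual range check rather than a compiler builtin.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Adds two 32-bit signed integers, returning std::nullopt instead of overflowing.
// The range check happens before the addition, so no undefined behavior occurs.
std::optional<std::int32_t> checked_add(std::int32_t a, std::int32_t b) {
    constexpr auto kMax = std::numeric_limits<std::int32_t>::max();
    constexpr auto kMin = std::numeric_limits<std::int32_t>::min();
    if (b > 0 && a > kMax - b) return std::nullopt;  // would exceed kMax
    if (b < 0 && a < kMin - b) return std::nullopt;  // would fall below kMin
    return a + b;
}

// Compares a signed and an unsigned value by mathematical value.
// A plain `s < u` converts `s` to unsigned first, so `-1 < 1u` is false.
bool less_signed_unsigned(std::int32_t s, std::uint32_t u) {
    return s < 0 || static_cast<std::uint32_t>(s) < u;
}

// Usage sketch.
void integer_safety_demo() {
    constexpr auto kMax = std::numeric_limits<std::int32_t>::max();
    assert(checked_add(1, 2).value() == 3);
    assert(!checked_add(kMax, 1).has_value());
    assert(less_signed_unsigned(-1, 1u));
    assert(!less_signed_unsigned(2, 1u));
}
```

C++20 adds `std::cmp_less` and related functions in `<utility>` for the comparison case.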

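Floating-point type best practices

Precision: choose the precision the application needs (float/double/long double)
Avoid precision traps: know the limits of floating-point representation and avoid testing floating-point values for exact equality
Use numerical algorithms: for operations such as summation, use numerically stable algorithms such as Kahan summation
SIMD: for large data sets, use SIMD instructions to accelerate floating-point computation
Exception handling: handle floating-point exceptions properly to avoid crashes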
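
The two numerical points above, compensated summation and tolerance-based comparison, can be sketched as follows. `kahan_sum` and `approx_equal` are hypothetical helper names, the default tolerance is an arbitrary choice, and the code assumes IEEE 754 doubles compiled without value-changing options such as `-ffast-math`, which can optimize the compensation term away.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Kahan (compensated) summation: `c` carries the low-order bits
// that each addition would otherwise lose.
double kahan_sum(const std::vector<double>& values) {
    double sum = 0.0;
    double c = 0.0;
    for (double x : values) {
        const double y = x - c;
        const double t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}

// Relative-tolerance comparison. Near zero a relative test becomes very
// strict, so real code often adds an absolute tolerance as well.
bool approx_equal(double a, double b, double rel_tol = 1e-9) {
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}

// Usage sketch.
void floating_point_demo() {
    assert(0.1 + 0.2 != 0.3);              // exact comparison fails
    assert(approx_equal(0.1 + 0.2, 0.3));  // tolerance comparison succeeds
}
```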

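Character type best practices

Encoding: prefer UTF-8 for storing text
Type safety: store UTF-8 in std::string rather than raw char*
Encoding conversion: avoid unnecessary conversions and use string views to reduce copying
Safe handling: validate the encoding of input and guard against buffer overflows
Cross-platform: mind the encoding differences between Windows and Unix platforms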
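
One practical consequence of storing UTF-8 in `std::string` is that `size()` reports bytes, not characters. The sketch below counts code points through a `std::string_view`, so nothing is copied. `utf8_length` is a hypothetical helper, and it assumes the input has already been validated as UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t utf8_length(std::string_view text) {
    std::size_t count = 0;
    for (unsigned char byte : text) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Usage sketch: byte count and code-point count differ for non-ASCII text.
void utf8_length_demo() {
    const std::string_view word = "h\xC3\xA9llo";  // "hello" with an accented e (2 bytes)
    assert(word.size() == 6);
    assert(utf8_length(word) == 5);
}
```

Code points are still not the same as user-perceived characters, since combining sequences span several code points; that level needs a Unicode library.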

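Boolean type best practices

Memory: for large numbers of boolean values, use bit packing or std::vector<bool>
Performance: use branch-free boolean operations and keep branch patterns predictable
Type safety: avoid mixing boolean values with integers
SIMD: for bulk boolean operations, use SIMD instructions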
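
The memory point above can be illustrated with the standard bit-packed containers. This sketch compares `std::bitset` with a plain `bool` array and counts set flags in a `std::vector<bool>`; `kFlagCount` and `count_set` are hypothetical names, and the flag count of 1024 is arbitrary.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kFlagCount = 1024;

// One bit per flag versus at least one byte per flag.
static_assert(sizeof(std::bitset<kFlagCount>) < sizeof(bool[kFlagCount]),
              "bitset packs flags more densely than a bool array");

// Counts set flags in a std::vector<bool>. Non-const element access on this
// container returns a proxy object rather than bool&, so the element is
// converted to bool explicitly.
std::size_t count_set(const std::vector<bool>& flags) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        total += static_cast<bool>(flags[i]) ? 1u : 0u;
    }
    return total;
}

// Usage sketch.
void packed_flags_demo() {
    std::vector<bool> flags(10, false);
    flags[2] = true;
    flags[7] = true;
    assert(count_set(flags) == 2);

    std::bitset<kFlagCount> bits;
    bits.set(3);
    bits.set(1000);
    assert(bits.count() == 2);
}
```

`std::bitset` fits when the size is known at compile time; `std::vector<bool>` fits when it is not, at the price of the proxy-reference quirks noted in the comment.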

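Memory layout best practices

Struct reordering: order members from largest to smallest to reduce padding
Cache: keep the cache-line size in mind and avoid false sharing
Alignment control: use alignas and alignof to control and query alignment
Bit-fields: for flags, use bit-fields to reduce memory footprint
Compile-time checks: use static_assert to verify the layout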

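Modern C++ type best practices

Zero-overhead abstractions: use abstractions such as std::span to improve safety at no run-time cost
Type safety: use type-safe wrapper types such as std::optional and std::variant
Error handling: use std::expected instead of exceptions (on performance-critical paths)
Multidimensional arrays: use std::mdspan and avoid manual index arithmetic
Compile-time optimization: use type traits for compile-time type checks and optimization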

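Performance Summary

Performance impact of type selection:

Memory footprint: smaller types use less memory and improve cache hit rates
CPU operations: types that match the register width are generally processed most efficiently
Conversions: fewer type conversions mean less overhead
Parallelism: suitable types lend themselves to SIMD acceleration

Performance impact of memory layout:

Cache utilization: an optimized layout improves cache hit rates
Memory bandwidth: less padding means better use of memory bandwidth
False sharing: keep data written by different threads on separate cache lines to reduce contention
Aligned access: aligned memory accesses are faster

Performance impact of coding practices:

Bit manipulation: bitwise operations are cheap and branch-free, although compilers already turn multiplication and division by powers of two into shifts
Numerical algorithms: numerically stable algorithms reduce error
SIMD: use SIMD instructions to accelerate data-parallel processing
Zero-overhead abstractions: use modern C++ zero-overhead abstractions to improve code quality

With these technical details in hand, developers can write C++ code that is more efficient, reliable, and maintainable, and make full use of the language's performance advantages.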