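Chapter 8: Characters and Strings

A Deep Dive into Character Types

Low-Level Implementation of Character Types

How C++ character types are represented depends on the target platform and the compiler, but their design follows rules laid down by the standard. The sizes and alignments below are the values found on mainstream platforms with 8-bit bytes; the standard itself only guarantees `sizeof(char) == 1` and minimum widths of 16 and 32 bits for `char16_t` and `char32_t`.

| Type | Size (bytes) | Range | Underlying representation | Use | Alignment |
|---|---|---|---|---|---|
| `char` | 1 | -128 to 127 or 0 to 255 (implementation-defined) | Single-byte integer | ASCII characters, UTF-8 code units | 1 byte |
| `signed char` | 1 | -128 to 127 | Signed single-byte integer | Signed character values | 1 byte |
| `unsigned char` | 1 | 0 to 255 | Unsigned single-byte integer | Unsigned character values, raw bytes | 1 byte |
| `wchar_t` | 2 or 4 | Implementation-defined | Multi-byte integer | Wide characters (platform-dependent) | 2 or 4 bytes |
| `char16_t` (C++11) | 2 | 0 to 65535 | 16-bit unsigned integer | UTF-16 code units | 2 bytes |
| `char32_t` (C++11) | 4 | 0 to 4294967295 | 32-bit unsigned integer | UTF-32 code units, Unicode code points | 4 bytes |
| `char8_t` (C++20) | 1 | 0 to 255 | 8-bit unsigned integer | UTF-8 code units | 1 byte |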
Memory Representation and Hardware Mapping

Character types are plain integer types, so their values are loaded, stored, and compared with ordinary integer register and memory instructions:

```cpp
#include <iostream>

int main() {
    char c = 'A';            // 0x41
    unsigned char uc = 255;  // 0xFF
    wchar_t wc = L'中';      // U+4E2D, stored in 2 or 4 bytes depending on the platform
    char16_t c16 = u'中';    // one UTF-16 code unit
    char32_t c32 = U'中';    // one UTF-32 code unit
    char8_t c8 = u8'A';      // one UTF-8 code unit (C++20)

    std::cout << "Size of char: " << sizeof(char) << " bytes, Alignment: " << alignof(char) << std::endl;
    std::cout << "Size of wchar_t: " << sizeof(wchar_t) << " bytes, Alignment: " << alignof(wchar_t) << std::endl;
    std::cout << "Size of char16_t: " << sizeof(char16_t) << " bytes, Alignment: " << alignof(char16_t) << std::endl;
    std::cout << "Size of char32_t: " << sizeof(char32_t) << " bytes, Alignment: " << alignof(char32_t) << std::endl;
    std::cout << "Size of char8_t: " << sizeof(char8_t) << " bytes, Alignment: " << alignof(char8_t) << std::endl;
}
```
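Character Types at the Assembly Level

Character operations compile down to the CPU's integer instructions, and each character type maps to a register width:

```asm
; char (x86-64)
mov al, 'A'               ; load the character 'A' into AL (8 bits)
mov byte ptr [c], al      ; store it to memory

; wchar_t (x86-64, 2-byte wchar_t as on Windows)
mov ax, 0x4E2D            ; load the code point of '中' into AX (16 bits)
mov word ptr [wc], ax     ; store it to memory

; char32_t (x86-64)
mov eax, 0x4E2D           ; load the code point of '中' into EAX (32 bits)
mov dword ptr [c32], eax  ; store it to memory
```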
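Advanced Features of Character Constants

1. Type system and value representation

Every character constant has a well-defined type in the C++ type system, and the compiler converts it to an integer value of that type:

```cpp
#include <type_traits>

// The literal prefix selects the type
static_assert(std::is_same_v<decltype('A'), char>);
static_assert(std::is_same_v<decltype(L'A'), wchar_t>);
static_assert(std::is_same_v<decltype(u'A'), char16_t>);
static_assert(std::is_same_v<decltype(U'A'), char32_t>);
static_assert(std::is_same_v<decltype(u8'A'), char8_t>);  // C++20 (char in C++17)

constexpr char c = 'A';                  // 65 in ASCII
constexpr wchar_t wc = L'\x4E2D';        // hexadecimal escape
constexpr char16_t c16 = u'\u4E2D';      // universal character name, 4 hex digits
constexpr char32_t c32 = U'\U00004E2D';  // universal character name, 8 hex digits

constexpr int i = 'A';          // char converts implicitly to int: 65
constexpr long long ll = L'中';  // wchar_t converts implicitly to long long: 0x4E2D
```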
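2. Multi-character constants

A multi-character constant such as `'ABCD'` has type `int`, and its value is implementation-defined. GCC, Clang, and MSVC typically pack the characters one per byte, with the first character in the most significant byte. The numeric value therefore does not depend on the target's byte order, although the in-memory layout of that `int` does.

```cpp
#include <iostream>

int main() {
    int multiChar = 'ABCD';  // type int, implementation-defined value; GCC warns under -Wmultichar
    std::cout << "Multi-character constant: " << std::hex << multiChar << std::endl;
    // Commonly prints 41424344
}
```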
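Because the value of a multi-character constant is implementation-defined, portable code often packs the bytes explicitly. The following is a minimal sketch; `makeTag` is a hypothetical helper name, not something from the text above, and the checked value assumes an ASCII execution character set.

```cpp
#include <cstdint>

// Hypothetical helper: packs four characters into a 32-bit tag with the
// first character in the most significant byte, on every compiler.
constexpr std::uint32_t makeTag(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
           static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Assumes ASCII: 'A' == 0x41, 'B' == 0x42, ...
static_assert(makeTag('A', 'B', 'C', 'D') == 0x41424344u);
```

Going through `unsigned char` first keeps bytes above 0x7F from being sign-extended into the higher bytes of the tag.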
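3. Encoding of Unicode character constants

Unicode character constants can be written with universal character names or by typing the character directly; the compiler converts from the source encoding to the encoding of the literal's type:

```cpp
// Universal character names
char16_t c16_1 = u'\u0041';      // 'A'
char16_t c16_2 = u'\u4E2D';      // '中'
char32_t c32_1 = U'\U00000041';  // 'A'
char32_t c32_2 = U'\U00004E2D';  // '中'

// Direct characters (the compiler must read the source file in the right encoding)
char16_t c16_3 = u'A';
char16_t c16_4 = u'中';
```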
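4. Compile-time computation with character constants

Character constants are constant expressions, so they can be used in template metaprogramming and `constexpr` contexts. The arithmetic on letters below assumes an ASCII-compatible character set; only the decimal digits are guaranteed to be contiguous.

```cpp
constexpr char lowercaseA = 'A' + 32;  // 'a' in ASCII
constexpr char digit5 = '0' + 5;       // '5'
constexpr bool isDigit = '5' >= '0' && '5' <= '9';  // true

// Compile-time character classification
constexpr bool isAlpha(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static_assert(isAlpha('A'));
static_assert(!isAlpha('5'));

// A character as a non-type template parameter
template <char C>
struct CharValue {
    static constexpr int value = C;
};

static_assert(CharValue<'A'>::value == 65);
```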
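The same idea extends from single characters to whole literals. As a small sketch, the hypothetical helper `parseDecimal` below (not part of the original text) turns the leading decimal digits of a string into an `int` at compile time. It relies only on the guarantee that `'0'` through `'9'` are contiguous, and it does not guard against overflow.

```cpp
constexpr bool isDigitChar(char c) {
    return c >= '0' && c <= '9';
}

// Hypothetical helper: parses the leading decimal digits of s.
// Stops at the first non-digit; returns 0 when there are no digits.
constexpr int parseDecimal(const char* s) {
    int value = 0;
    while (isDigitChar(*s)) {
        value = value * 10 + (*s - '0');
        ++s;
    }
    return value;
}

static_assert(parseDecimal("2024") == 2024);
static_assert(parseDecimal("42abc") == 42);
static_assert(parseDecimal("") == 0);
```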
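A Deep Dive into Escape Sequences

Kinds of Escape Sequences and Their Compile-Time Handling

Escape sequences are processed at compile time and replaced by the corresponding character values. They fall into the following groups:

| Kind | Examples | Character values | Compile-time handling | Typical use |
|---|---|---|---|---|
| Simple escapes | `\n`, `\t`, `\r` | 0x0A, 0x09, 0x0D | Replaced by the corresponding value | Text formatting |
| Quote escapes | `\'`, `\"`, `\\` | 0x27, 0x22, 0x5C | Escape special characters | String literals |
| Null character | `\0` | 0x00 | Produces the null terminator | End-of-string marker |
| Control characters | `\a`, `\b`, `\f`, `\v` | 0x07, 0x08, 0x0C, 0x0B | Produce control characters | Terminal control |
| Hexadecimal | `\x41`, `\xFF` | 0x41, 0xFF | Parsed as a hexadecimal value | Arbitrary byte values |
| Octal | `\101`, `\040` | 0x41, 0x20 | Parsed as an octal value | Arbitrary byte values |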
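How Escape Sequences Are Implemented

The compiler handles escape sequences while processing literals and turns each one into an integer value:

```cpp
char newline = '\n';     // 0x0A
char tab = '\t';         // 0x09
char backslash = '\\';   // 0x5C
char quote = '\'';       // 0x27
char null = '\0';        // 0x00

char hexChar1 = '\x41';  // 'A'
char hexChar2 = '\xFF';  // 255, which reads back as -1 where char is signed

char octChar1 = '\101';  // 'A' (octal 101 = 65)
char octChar2 = '\040';  // space (octal 40 = 32)
// char octChar3 = '\777';  // octal 777 = 511 does not fit in a char; compilers diagnose or reject it
```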
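A Deep Dive into Raw String Literals

Raw string literals (C++11) may contain special characters without escaping. They are written as `R"delim( ... )delim"`, where the optional delimiter makes it possible for the content itself to contain `)"`:

```cpp
#include <iostream>
#include <string>

int main() {
    std::string raw1 = R"(Raw string with \n and ")";
    std::cout << raw1 << std::endl;  // prints: Raw string with \n and "

    // Custom delimiter "=": the literal ends only at )="
    std::string raw2 = R"=(
Raw string with
newlines and "quotes"
and backslashes \ \ \
)=";

    // Adjacent ordinary and raw literals are concatenated by the compiler.
    // ("Normal " + R"(raw)" would not compile: it tries to add two pointers.)
    std::string mixed = "Normal " R"(raw)" " string";

    constexpr auto sqlQuery = R"(
        SELECT * FROM users
        WHERE id = ?
        ORDER BY name
    )";
}
```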
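Performance Impact of Escape Sequences

Escape sequences are resolved at compile time and have no run-time cost. A few points are still worth keeping in mind:

- Compile-time cost: complex escape sequences add a little work for the compiler, which is normally negligible.
- String length: an escape sequence counts as a single character, so `"\n"` has length 1.
- Internationalization: hexadecimal and octal escapes denote code units, so what they mean depends on the encoding of the literal they appear in.
- Readability and debugging: used sensibly, escape sequences (or raw strings) make code easier to read.

```cpp
#include <string>
#include <string_view>

using namespace std::string_view_literals;

std::string path1 = "C:\\Users\\Name\\File.txt";  // every backslash escaped
std::string path2 = R"(C:\Users\Name\File.txt)";  // raw string with the same content

static_assert(sizeof("Hello\nWorld") == 12);   // 11 characters plus the terminating '\0'
static_assert("Hello\nWorld"sv.size() == 11);  // \n counts as one character
```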
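Conversions Between Character Types

1. How implicit conversions work

Implicit conversions of character types follow the integral promotion and conversion rules. Widening is a sign extension for signed types and a zero extension for unsigned types (for example `movsx` and `movzx` on x86):

```cpp
char c = 'A';
int i = c;     // integral promotion: 65 (sign- or zero-extended depending on whether char is signed)
double d = c;  // integer-to-floating conversion: 65.0

unsigned char uc = 255;
signed char sc = uc;  // 255 is out of range: the result is -1 (modular since C++20, implementation-defined before)
```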
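Sign extension is where implicit conversions most often go wrong: on platforms where `char` is signed, a byte such as 0xE4 promotes to a negative `int`. The sketch below uses a hypothetical helper `byteHistogram` (an illustration, not part of the original text) to show the usual remedy of converting to `unsigned char` before using a character as an index.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: counts how often each byte value occurs in a C string.
std::array<std::size_t, 256> byteHistogram(const char* s) {
    std::array<std::size_t, 256> counts{};
    for (; *s; ++s) {
        // Without the cast, a byte >= 0x80 could become a negative index
        // wherever char is a signed type.
        counts[static_cast<unsigned char>(*s)]++;
    }
    return counts;
}
```

The same cast is needed before passing a `char` to the `<cctype>` classification functions, which require a value representable as `unsigned char` (or `EOF`).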
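2. Explicit conversions

Explicit conversions state the intent in the code and make the handling of out-of-range values visible:

```cpp
int i = 65;
char c = static_cast<char>(i);                     // 'A'
unsigned char uc = static_cast<unsigned char>(i);  // 65

int large = 300;
unsigned char uc2 = static_cast<unsigned char>(large);  // 300 mod 256 = 44
signed char sc2 = static_cast<signed char>(large);      // 44: the low 8 bits (modular since C++20)

char8_t utf8Char = u8'A';
char c8 = static_cast<char>(utf8Char);  // same code unit value, different type

wchar_t wc = L'A';
char cw = static_cast<char>(wc);  // only meaningful if the value fits in char; no encoding conversion happens
```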
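3. Converting efficiently between characters and numbers

Converting between digit characters and numeric values is a common operation. It relies on the guarantee that `'0'` through `'9'` are contiguous:

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>

// Character to digit
char digitChar = '5';
int digit = digitChar - '0';  // 5

// Checked variant
int safeDigitToInt(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    throw std::invalid_argument("Not a digit");
}

// Digit to character
int num = 5;
char numChar = static_cast<char>('0' + num);  // '5'

// Integer to string, digit by digit
std::string intToString(int value) {
    if (value == 0) return "0";

    const bool negative = value < 0;
    // Take the magnitude in unsigned arithmetic so that INT_MIN does not overflow
    unsigned int magnitude = negative ? 0u - static_cast<unsigned int>(value)
                                      : static_cast<unsigned int>(value);

    std::string result;
    while (magnitude > 0) {
        result += static_cast<char>('0' + (magnitude % 10));  // least significant digit first
        magnitude /= 10;
    }
    if (negative) {
        result += '-';
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```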
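4. Performance of conversions

All of these conversions take constant time; they differ only in how many instructions they need:

| Conversion | Implementation | Cost | Typical use |
|---|---|---|---|
| `char` → `int` | Zero or sign extension | O(1), single instruction | Character processing |
| `char` → `double` | Integer-to-floating conversion | O(1), several instructions | Numeric computation |
| `int` → `char` | Truncation | O(1), single instruction | Values known to be in range |
| `wchar_t` → `char` | Truncation or conversion | O(1), may involve a branch | Narrow-character output |
| `char` → `bool` | Comparison with zero | O(1), single instruction | Conditions |

```cpp
#include <cstddef>
#include <vector>

// Comparisons on char happen after promotion to int; no explicit cast is needed
void processCharacters(const std::vector<char>& chars) {
    for (char c : chars) {
        if (c >= 'A' && c <= 'Z') {
            // handle an uppercase letter here
        }
    }
}

// Widening a whole buffer: a simple loop that compilers can usually vectorize
void batchConvert(const std::vector<char>& input, std::vector<int>& output) {
    output.resize(input.size());  // allocate once up front
    for (std::size_t i = 0; i < input.size(); i++) {
        output[i] = static_cast<int>(input[i]);
    }
}
```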
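Best Practices for Character Types

1. Choosing a character type

The right character type depends on the encoding, on platform compatibility, and on performance requirements:

| Scenario | Recommended type | Representation | Performance | Compatibility |
|---|---|---|---|---|
| ASCII characters | `char` | Single-byte integer | Most efficient | All platforms |
| Raw bytes | `unsigned char` | Unsigned single-byte integer | Cache friendly | All platforms |
| UTF-8 | `char8_t` (C++20) or `char` | Single-byte code units | Cache friendly | Modern compilers |
| UTF-16 | `char16_t` | 16-bit unsigned integer | Medium | C++11 and later |
| UTF-32 | `char32_t` | 32-bit unsigned integer | Lower | C++11 and later |
| Wide characters (Windows) | `wchar_t` | 2-byte UTF-16 | Medium | Windows API |
| Wide characters (Unix) | `char32_t` | 4-byte UTF-32 | Lower | Unix systems |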
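One consequence of choosing UTF-8 in `char` is that a character is no longer one array element. As a minimal sketch, the hypothetical helper `countCodePoints` below (not part of the original text) counts code points by skipping continuation bytes of the form `10xxxxxx`. It assumes well-formed UTF-8 and performs no validation.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of code points in a well-formed UTF-8 string.
// Every byte that is not a continuation byte (10xxxxxx) starts a code point.
constexpr std::size_t countCodePoints(std::string_view utf8) {
    std::size_t count = 0;
    for (char ch : utf8) {
        if ((static_cast<unsigned char>(ch) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// "A" followed by U+4E2D, which UTF-8 encodes as the bytes E4 B8 AD
static_assert(countCodePoints("A\xE4\xB8\xAD") == 2);
```

The byte length (`utf8.size()`) and the code point count differ for any non-ASCII text, which is why buffer sizes should always be computed in code units.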
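2. Performance techniques for character processing

Character processing sits on hot paths, so it pays to use efficient implementations. The examples below assume ASCII; the SIMD variant additionally requires AVX2.

```cpp
#include <array>
#include <cstddef>
#include <immintrin.h>  // AVX2 intrinsics; build with -mavx2 (GCC/Clang) or /arch:AVX2 (MSVC)

// Bitmask lookup: one bit per letter of the alphabet
bool isVowel(char c) {
    constexpr unsigned int vowelMask =
        (1u << ('a' - 'a')) | (1u << ('e' - 'a')) | (1u << ('i' - 'a')) |
        (1u << ('o' - 'a')) | (1u << ('u' - 'a'));
    // Fold ASCII uppercase onto lowercase, then reject everything outside a..z
    const unsigned char lower = static_cast<unsigned char>(c) | 0x20;
    if (lower < 'a' || lower > 'z') {
        return false;
    }
    return ((vowelMask >> (lower - 'a')) & 1u) != 0;
}

// Table lookup: the table is built once, at compile time
constexpr std::array<char, 256> makeUpperTable() {
    std::array<char, 256> table{};
    for (int i = 0; i < 256; i++) {
        table[i] = static_cast<char>((i >= 'a' && i <= 'z') ? i - 32 : i);
    }
    return table;
}

char toUpper(char c) {
    static constexpr auto upperTable = makeUpperTable();
    return upperTable[static_cast<unsigned char>(c)];
}

// SIMD: convert 32 bytes per iteration
void batchToUpper(char* data, std::size_t size) {
    const __m256i belowA = _mm256_set1_epi8('a' - 1);
    const __m256i aboveZ = _mm256_set1_epi8('z' + 1);
    const __m256i caseBit = _mm256_set1_epi8(0x20);

    std::size_t i = 0;
    for (; i + 32 <= size; i += 32) {
        __m256i vec = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        // Signed compares: bytes >= 0x80 are negative and never count as lowercase
        __m256i isLower = _mm256_and_si256(_mm256_cmpgt_epi8(vec, belowA),
                                           _mm256_cmpgt_epi8(aboveZ, vec));
        // Clear bit 0x20 only in the bytes that are lowercase letters
        __m256i upper = _mm256_andnot_si256(_mm256_and_si256(isLower, caseBit), vec);
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(data + i), upper);
    }
    // Scalar tail for the remaining bytes
    for (; i < size; i++) {
        if (data[i] >= 'a' && data[i] <= 'z') {
            data[i] -= 32;
        }
    }
}
```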
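3. Safety considerations

Safety in character processing is mostly a matter of bounds checking, sign extension, and encoding validation:

```cpp
#include <cstddef>
#include <cstdio>

void processByte(unsigned char byte);  // placeholder for application logic, defined elsewhere

// Raw bytes: use unsigned char so that values above 0x7F stay positive
void processRawBytes(const unsigned char* data, std::size_t size) {
    for (std::size_t i = 0; i < size; i++) {
        unsigned char byte = data[i];
        processByte(byte);
    }
}

// getchar returns int so that EOF can be told apart from every byte value
char safeGetChar() {
    int c = std::getchar();
    if (c == EOF) {
        return '\0';  // note: the caller cannot distinguish EOF from a real NUL byte
    }
    return static_cast<char>(static_cast<unsigned char>(c));
}

// UTF-8 lead byte check
bool isValidUTF8LeadByte(unsigned char c) {
    if (c <= 0x7F) {
        return true;  // ASCII
    }
    // 0xC0 and 0xC1 would start overlong encodings; 0xF5 and above exceed U+10FFFF
    return c >= 0xC2 && c <= 0xF4;
}

// Printable ASCII check, done on the unsigned value
bool isPrintableASCII(char c) {
    unsigned char uc = static_cast<unsigned char>(c);
    return uc >= 0x20 && uc <= 0x7E;
}
```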
4. Modern C++ features for character types

Modern C++ offers safer and more flexible ways to work with character types:

```cpp
#include <concepts>
#include <iostream>
#include <type_traits>

// Size-based concepts. They also admit non-character integral types:
// bool satisfies NarrowCharType and int satisfies WideCharType.
template <typename T>
concept NarrowCharType = std::integral<T> && sizeof(T) == 1;

template <typename T>
concept WideCharType = std::integral<T> && sizeof(T) > 1;

template <NarrowCharType T>
void processNarrowChar(T c) {
    std::cout << "Narrow character: " << static_cast<int>(c) << std::endl;
}

template <WideCharType T>
void processWideChar(T c) {
    std::cout << "Wide character: " << static_cast<int>(c) << std::endl;
}

// Type traits
static_assert(std::is_trivial_v<char>);
static_assert(std::is_standard_layout_v<char>);
static_assert(std::is_integral_v<char>);
static_assert(!std::is_floating_point_v<char>);

// Exact-type check for the Unicode character types
template <typename T>
constexpr bool isUnicodeCharType() {
    return std::is_same_v<T, char16_t> ||
           std::is_same_v<T, char32_t> ||
           std::is_same_v<T, char8_t>;
}

static_assert(isUnicodeCharType<char16_t>());
static_assert(isUnicodeCharType<char32_t>());
static_assert(isUnicodeCharType<char8_t>());
static_assert(!isUnicodeCharType<char>());
```
Modern C++ Features for Character Types

1. Concepts for character types (C++20)

```cpp
#include <concepts>
#include <iostream>

template <typename T>
concept CharType = std::integral<T> && sizeof(T) == 1;

template <typename T>
concept WideCharType = std::integral<T> && sizeof(T) > 1;

template <CharType T>
void processNarrowChar(T c) {
    std::cout << "Narrow character: " << static_cast<int>(c) << std::endl;
}

template <WideCharType T>
void processWideChar(T c) {
    std::cout << "Wide character: " << static_cast<int>(c) << std::endl;
}

int main() {
    processNarrowChar('A');  // char
    processWideChar(L'A');   // wchar_t
    processWideChar(u'A');   // char16_t
    processWideChar(U'A');   // char32_t
}
```

2. Type traits of character types

```cpp
#include <type_traits>

static_assert(std::is_integral_v<char>);
static_assert(std::is_trivial_v<char>);
static_assert(std::is_standard_layout_v<char>);
static_assert(std::is_trivially_copyable_v<char>);  // std::is_pod is deprecated since C++20

// Sizes (the 2 and 4 hold on platforms with 8-bit bytes)
static_assert(sizeof(char) == 1);
static_assert(sizeof(wchar_t) >= sizeof(char));
static_assert(sizeof(char16_t) == 2);
static_assert(sizeof(char32_t) == 4);
static_assert(sizeof(char8_t) == 1);

// Signedness
static_assert(std::is_signed_v<signed char>);
static_assert(!std::is_signed_v<unsigned char>);
```
Low-Level Implementation Details of Character Types

1. Storage

- `char`: stored in a single byte; whether it is signed is decided by the implementation
- `wchar_t`: typically 2 bytes (UTF-16) on Windows and 4 bytes (UTF-32) on Unix-like systems
- `char16_t`: 2 bytes, used for UTF-16
- `char32_t`: 4 bytes, used for UTF-32
- `char8_t`: 1 byte, used for UTF-8 (C++20)

2. Alignment

```cpp
#include <iostream>

int main() {
    std::cout << "Alignment of char: " << alignof(char) << " bytes" << std::endl;
    std::cout << "Alignment of wchar_t: " << alignof(wchar_t) << " bytes" << std::endl;
    std::cout << "Alignment of char16_t: " << alignof(char16_t) << " bytes" << std::endl;
    std::cout << "Alignment of char32_t: " << alignof(char32_t) << " bytes" << std::endl;
    std::cout << "Alignment of char8_t: " << alignof(char8_t) << " bytes" << std::endl;
}
```
Performance Analysis of Character Types

1. Cost of character operations

```cpp
#include <benchmark/benchmark.h>  // Google Benchmark

// Compares two characters
static void BM_CharComparison(benchmark::State& state) {
    char c1 = 'A', c2 = 'B';
    for (auto _ : state) {
        bool result = c1 == c2;
        benchmark::DoNotOptimize(result);  // keep the result from being optimized away
    }
}

// Widens a character to int
static void BM_CharConversion(benchmark::State& state) {
    char c = 'A';
    for (auto _ : state) {
        int i = static_cast<int>(c);
        benchmark::DoNotOptimize(i);
    }
}

BENCHMARK(BM_CharComparison);
BENCHMARK(BM_CharConversion);
BENCHMARK_MAIN();
```
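2. Memory access patterns

- `char`: single-byte access, cache friendly
- `wchar_t`: 2- or 4-byte access depending on the platform; misaligned access can cause problems
- `char16_t`: 2-byte access, requires 2-byte alignment
- `char32_t`: 4-byte access, requires 4-byte alignment
- `char8_t`: single-byte access, cache friendly

Summary

Character types are among the most basic data types in C++. Choosing the right one matters for string handling, encoding conversion, and performance. Modern C++ provides several character types to support different encodings, Unicode encodings in particular.

In practice, choose the character type according to the use case:

- For ASCII characters and UTF-8, use `char` or `char8_t` (C++20)
- For UTF-16, use `char16_t`
- For UTF-32, use `char32_t`
- For raw byte data, use `unsigned char`
- For platform APIs, use `wchar_t` (especially on Windows)

Understanding how character types are implemented, how they are laid out in memory, and what they cost makes it possible to write character-processing code that is both efficient and reliable.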
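A Deep Dive into C-Style Strings

How String Literals Are Implemented

A C-style string is an array of characters terminated by the null character (`'\0'`). Its implementation involves compile-time processing, memory layout, and run-time operations.

1. Compile-time processing of string literals

String literals are processed at compile time and are typically placed in a read-only data segment:

```cpp
#include <string>
#include <type_traits>

using namespace std::string_literals;

const char* str1 = "Hello, world!";   // pointer to the literal's read-only storage
const char str2[] = "Hello, world!";  // array initialized with a copy of the literal

// A string literal is an lvalue of type const char[N], so decltype yields a reference
static_assert(std::is_same_v<decltype("Hello"), const char(&)[6]>);
static_assert(sizeof("Hello") == 6);                     // 5 characters plus '\0'
static_assert(sizeof("Hello"s) == sizeof(std::string));  // the s suffix produces a std::string

// Length computed at compile time
constexpr auto strLen = sizeof("Hello") - 1;
static_assert(strLen == 5);
```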
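2. Memory layout and linking of string literals

The memory layout of string literals is decided by the compiler and the linker. Whether identical literals share storage is unspecified, although most toolchains pool them:

```cpp
#include <iostream>

int main() {
    const char* s1 = "Hello";
    const char* s2 = "Hello";  // may or may not point to the same storage as s1

    std::cout << "s1: " << static_cast<const void*>(s1) << std::endl;
    std::cout << "s2: " << static_cast<const void*>(s2) << std::endl;
    std::cout << "s1 == s2: " << (s1 == s2) << std::endl;  // often 1, but not guaranteed
}
```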
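Since literal pooling is not guaranteed, comparing two `const char*` values with `==` only tells whether they point to the same storage. A short sketch of the content-based alternative follows; `sameText` is a hypothetical helper name used for illustration.

```cpp
#include <cstring>

// Hypothetical helper: compares two C strings by content, never by address.
// Both arguments must be non-null, null-terminated strings.
inline bool sameText(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

With `char buf[] = "Hello";`, the call `sameText(buf, "Hello")` is true even though `buf` and the literal are guaranteed to live at different addresses.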
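3. Multi-line strings and raw strings

Adjacent string literals are concatenated by the compiler, and C++11 adds raw string literals, which use a delimiter mechanism of the form `R"delim( ... )delim"`:

```cpp
// Adjacent literals are joined at compile time
const char* multiLine = "Line 1\n"
                        "Line 2\n"
                        "Line 3";

// Raw string: \n and " are taken literally
const char* raw = R"(Raw string with \n and ")";

// Raw string with the custom delimiter "="
const char* rawWithDelimiter = R"=(
Raw string with
newlines and "quotes"
)=";

// These two literals have identical contents
const char* normal = "Raw string with \\n and \"";
const char* rawEq = R"(Raw string with \n and ")";
```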
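4. Safety considerations for string literals

String literals usually live in read-only memory, and modifying one is undefined behavior:

```cpp
#include <cstring>

void literalSafety() {
    // char* bad = "Hello";  // ill-formed since C++11: the literal is a const char[6]
    // bad[0] = 'h';         // undefined behavior even where a compiler accepts the line above

    // Safe: the array holds its own writable copy
    char str[] = "Hello";
    str[0] = 'h';

    // Safe: dynamically allocated copy
    char* dynamicStr = new char[6];
    std::strcpy(dynamicStr, "Hello");
    dynamicStr[0] = 'h';
    delete[] dynamicStr;
}
```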
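A Closer Look at the String Functions

The C standard library provides a set of string functions, declared in `<cstring>`. Their implementation involves memory operations, algorithm design, and performance tuning.

1. Computing the length

`strlen` finds the length by scanning linearly for the terminator, one byte at a time in the simplest implementation:

```cpp
#include <cstddef>
#include <cstring>

const char* str = "Hello";
size_t length = strlen(str);  // 5, the terminator is not counted

// Reference implementation
size_t my_strlen(const char* s) {
    const char* p = s;
    while (*p) {
        ++p;
    }
    return static_cast<size_t>(p - s);
}

// Bounded variant: never reads more than maxLength bytes
size_t safeStrlen(const char* str, size_t maxLength) {
    if (!str) return 0;

    size_t len = 0;
    while (len < maxLength && str[len]) {
        ++len;
    }
    return len;
}
```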
2. Copying

`strcpy` copies up to and including the terminator and does no bounds checking. `strncpy` limits the number of bytes written but does not terminate the result when the source is too long:

```cpp
#include <cstddef>
#include <cstring>

void copyExamples() {
    char dest[20];
    strcpy(dest, "Hello");  // safe only because the source is known to fit

    strncpy(dest, "Hello", sizeof(dest) - 1);
    dest[sizeof(dest) - 1] = '\0';  // strncpy does not terminate on truncation
}

// Reference implementation
char* my_strcpy(char* dest, const char* src) {
    char* p = dest;
    while ((*p++ = *src++)) {
        // the assignment copies one byte; the loop ends after copying '\0'
    }
    return dest;
}

// Bounded copy that always terminates the destination
char* safeStrcpy(char* dest, size_t destSize, const char* src) {
    if (!dest || !src || destSize == 0) return nullptr;

    size_t srcLen = strlen(src);
    size_t copyLen = (srcLen < destSize - 1) ? srcLen : (destSize - 1);

    memcpy(dest, src, copyLen);
    dest[copyLen] = '\0';

    return dest;
}
```
3. Concatenation

`strcat` and `strncat` first find the end of the destination and then copy the source there:

```cpp
#include <cstddef>
#include <cstring>

void concatExamples() {
    char dest[20] = "Hello";
    strcat(dest, " world");  // safe only because the result is known to fit

    // strncat takes the number of characters that may still be appended
    size_t destLen = strlen(dest);
    strncat(dest, " world", sizeof(dest) - destLen - 1);
}

// Reference implementation
char* my_strcat(char* dest, const char* src) {
    char* p = dest;
    while (*p) {  // find the end of dest
        ++p;
    }
    while ((*p++ = *src++)) {
        // copy src including its terminator
    }
    return dest;
}

// Bounded concatenation that always terminates the destination
char* safeStrcat(char* dest, size_t destSize, const char* src) {
    if (!dest || !src || destSize == 0) return nullptr;

    size_t destLen = strlen(dest);
    if (destLen >= destSize - 1) return dest;  // no room left

    size_t srcLen = strlen(src);
    size_t copyLen = (srcLen < destSize - destLen - 1) ? srcLen : (destSize - destLen - 1);

    memcpy(dest + destLen, src, copyLen);
    dest[destLen + copyLen] = '\0';

    return dest;
}
```
4. Comparison

`strcmp` and `strncmp` compare byte by byte, treating each byte as an `unsigned char`:

```cpp
#include <cctype>
#include <cstring>

void compareExamples() {
    int result = strcmp("Hello", "World");  // negative: 'H' < 'W'
    result = strcmp("Hello", "Hello");      // 0
    result = strcmp("World", "Hello");      // positive
    result = strncmp("Hello", "World", 3);  // compares at most 3 characters
    (void)result;
}

// Reference implementation
int my_strcmp(const char* s1, const char* s2) {
    while (*s1 && *s1 == *s2) {
        ++s1;
        ++s2;
    }
    return static_cast<unsigned char>(*s1) - static_cast<unsigned char>(*s2);
}

// Case-insensitive comparison (ASCII / current C locale)
int strcmpIgnoreCase(const char* s1, const char* s2) {
    if (!s1 || !s2) return (s1 == s2) ? 0 : (s1 ? 1 : -1);

    for (;; ++s1, ++s2) {
        // tolower requires a value representable as unsigned char
        int c1 = std::tolower(static_cast<unsigned char>(*s1));
        int c2 = std::tolower(static_cast<unsigned char>(*s2));

        if (c1 != c2) return (c1 < c2) ? -1 : 1;
        if (c1 == 0) return 0;  // both strings ended at the same position
    }
}
```
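The three-way result of `strcmp` is most often consumed by sorting and searching code. As an illustrative sketch, the hypothetical helper `sortCStrings` below (not part of the original text) sorts an array of C strings by content, converting the three-way result into the strict "less than" predicate that `std::sort` expects.

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Hypothetical helper: sorts C strings by content in strcmp order.
// All pointers must be non-null, null-terminated strings.
inline void sortCStrings(std::vector<const char*>& items) {
    std::sort(items.begin(), items.end(),
              [](const char* a, const char* b) { return std::strcmp(a, b) < 0; });
}
```

Sorting the pointers without a comparator would order them by address, which is rarely what is wanted.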
5. Searching

`strchr`, `strrchr`, and `strstr` search for a character or a substring. The reference versions below use the straightforward algorithms. `fastStrStr` is still O(n·m) in the worst case, but it stops as soon as the remaining text is shorter than the pattern:

```cpp
#include <cstddef>
#include <cstring>

void searchExamples() {
    const char* found = strchr("Hello", 'l');  // first 'l'
    found = strrchr("Hello", 'l');             // last 'l'
    found = strstr("Hello world", "world");    // start of the substring
    (void)found;
}

// Reference implementation of strchr
const char* my_strchr(const char* s, int c) {
    while (*s && *s != static_cast<char>(c)) {
        ++s;
    }
    // Also finds the terminator when c == '\0'
    return *s == static_cast<char>(c) ? s : nullptr;
}

// Reference implementation of strstr (naive algorithm)
const char* my_strstr(const char* haystack, const char* needle) {
    if (!*needle) {
        return haystack;  // an empty needle matches at the start
    }

    const char* h = haystack;
    while (*h) {
        const char* h2 = h;
        const char* n = needle;

        while (*h2 && *n && *h2 == *n) {
            ++h2;
            ++n;
        }

        if (!*n) {
            return h;  // the whole needle matched
        }

        ++h;
    }
    return nullptr;
}

// Naive search with a precomputed end: never tries positions where the needle cannot fit
const char* fastStrStr(const char* haystack, const char* needle) {
    if (!haystack || !needle) return nullptr;
    if (!*needle) return haystack;

    const char* haystackEnd = haystack;
    while (*haystackEnd) ++haystackEnd;

    size_t needleLen = strlen(needle);
    if (needleLen > static_cast<size_t>(haystackEnd - haystack)) {
        return nullptr;
    }

    const char* haystackPtr = haystack;
    while (haystackPtr <= haystackEnd - needleLen) {
        const char* needlePtr = needle;
        const char* haystackCheck = haystackPtr;

        while (*needlePtr && *haystackCheck == *needlePtr) {
            ++needlePtr;
            ++haystackCheck;
        }

        if (!*needlePtr) {
            return haystackPtr;
        }

        ++haystackPtr;
    }

    return nullptr;
}
```
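Best Practices for String Input and Output

1. Safe and efficient input

Reading strings involves buffer management, error handling, and performance considerations:

```cpp
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <limits>
#include <string>

void inputExamples() {
    char name[50];
    // Before C++20, operator>> into a char array is unbounded;
    // std::setw limits it to 49 characters plus the terminator
    std::cin >> std::setw(sizeof(name)) >> name;

    // getline reads a whole line, up to the buffer size
    std::cin.getline(name, sizeof(name));
    if (std::cin.fail()) {
        std::cin.clear();  // reset the error state
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');  // drop the rest of the line
        std::cout << "Input too long, truncated." << std::endl;
    }

    // Preferred: std::string grows as needed
    std::string safeInput;
    std::getline(std::cin, safeInput);
}

// Bounded line input with error recovery
bool safeGetLine(char* buffer, size_t bufferSize) {
    if (!buffer || bufferSize == 0) return false;

    std::cin.getline(buffer, static_cast<std::streamsize>(bufferSize));
    if (std::cin.fail()) {
        std::cin.clear();
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        return false;
    }
    return true;
}
```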
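2. Output

Writing strings involves formatting, buffering, and performance considerations:

```cpp
#include <cstdio>
#include <cstring>
#include <format>  // C++20
#include <iostream>
#include <string>

// Unformatted write of a string whose length is computed once
void fastPrint(const char* str) {
    if (!str) return;
    std::cout.write(str, static_cast<std::streamsize>(std::strlen(str)));
    std::cout << std::endl;  // newline plus flush; use '\n' when no flush is needed
}

void outputExamples() {
    const char* str = "Hello, world!";
    std::cout << str << std::endl;  // formatted stream output

    fastPrint(str);

    std::puts(str);  // C stdio: writes the string followed by a newline

    std::string message = "Hello, C++!";
    std::cout << message << std::endl;

    // Type-safe formatting (C++20)
    std::string formatted = std::format("Hello, {}!", "World");
    std::cout << formatted << std::endl;
}
```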
3. Performance strategies for string I/O

Different scenarios call for different strategies. The memory-mapping example uses POSIX APIs and therefore applies to Unix-like systems only:

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// POSIX headers for mmap
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Batch input: reserve once, move each line into the vector
void batchInput(std::vector<std::string>& lines, size_t maxLines) {
    lines.reserve(maxLines);
    std::string line;
    for (size_t i = 0; i < maxLines && std::getline(std::cin, line); ++i) {
        lines.push_back(std::move(line));
    }
}

// Batch output: build the text in memory, then write it in one call
void batchOutput(const std::vector<std::string>& lines) {
    std::stringstream ss;
    for (const auto& line : lines) {
        ss << line << '\n';
    }
    std::cout << ss.str();
}

// Large files: map the file into memory instead of reading it
void processLargeFile(const char* filename) {
    int fd = open(filename, O_RDONLY);
    if (fd == -1) return;

    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) {  // mmap rejects a length of 0
        close(fd);
        return;
    }

    void* mapping = mmap(nullptr, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapping != MAP_FAILED) {
        const char* addr = static_cast<const char*>(mapping);
        // Process addr[0 .. sb.st_size) here; the mapping is not null-terminated
        (void)addr;
        munmap(mapping, sb.st_size);
    }

    close(fd);
}
```
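Performance Optimization for C-Style Strings

1. Memory layout

```cpp
#include <cstddef>
#include <cstring>

// Small strings go to a stack buffer; only large ones pay for a heap allocation
void processSmallString(const char* input) {
    char buffer[256];  // stack buffer: fast, no allocation

    size_t len = strlen(input);
    if (len < sizeof(buffer) - 1) {
        strncpy(buffer, input, sizeof(buffer) - 1);
        buffer[sizeof(buffer) - 1] = '\0';
        // work on buffer here
    } else {
        // Too large for the stack buffer: fall back to the heap
        char* largeBuffer = new char[len + 1];
        strcpy(largeBuffer, input);
        // work on largeBuffer here
        delete[] largeBuffer;
    }
}
```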
2. Measuring string operations

```cpp
#include <benchmark/benchmark.h>  // Google Benchmark
#include <cstring>

static void BM_Strlen(benchmark::State& state) {
    const char* str = "Hello, world! This is a test string.";
    for (auto _ : state) {
        size_t len = strlen(str);
        benchmark::DoNotOptimize(len);
    }
}

static void BM_Strcpy(benchmark::State& state) {
    const char* src = "Hello, world!";
    char dest[20];
    for (auto _ : state) {
        strcpy(dest, src);
        benchmark::DoNotOptimize(dest);
    }
}

static void BM_Strcmp(benchmark::State& state) {
    const char* s1 = "Hello, world!";
    const char* s2 = "Hello, world!";
    for (auto _ : state) {
        int result = strcmp(s1, s2);
        benchmark::DoNotOptimize(result);
    }
}

BENCHMARK(BM_Strlen);
BENCHMARK(BM_Strcpy);
BENCHMARK(BM_Strcmp);
BENCHMARK_MAIN();
```
3. Advanced optimization techniques

- Avoid recomputing lengths: cache the result of `strlen` instead of rescanning
- Use `memcpy` instead of `strcpy` when the length is known: copying is faster
- Use `memcmp` instead of `strcmp` when the length is known: no terminator check is needed for each byte
- Preallocate enough space: avoid repeated reallocation and copying
- Use stack buffers for small strings: stack memory is fast to obtain and access
- Align memory: aligned data is accessed more efficiently by SIMD instructions
- Use a memory pool: avoid frequent dynamic allocation and deallocation
- Be cache friendly: process large strings in blocks to reduce cache misses

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

void processStringBlock(const char* block, size_t size);  // placeholder, defined elsewhere

void optimizedStringOperations() {
    const char* src = "Hello, world!";
    size_t srcLen = strlen(src);  // computed once, reused below

    char dest[20];
    strncpy(dest, src, sizeof(dest) - 1);
    dest[sizeof(dest) - 1] = '\0';

    // Known length: memcpy, then terminate
    char dest2[20];
    if (srcLen < sizeof(dest2)) {
        memcpy(dest2, src, srcLen);
        dest2[srcLen] = '\0';
    }

    // Known length: memcmp
    const char* s1 = "Hello";
    const char* s2 = "Hello";
    bool equal = (memcmp(s1, s2, 5) == 0);
    (void)equal;
}

void alignedStringProcessing() {
    alignas(16) char alignedBuffer[256];  // 16-byte aligned for SSE loads
    const char* input = "Aligned memory test string";
    size_t inputLen = strlen(input);
    memcpy(alignedBuffer, input, inputLen + 1);
}

// Simple bump allocator for strings; everything is released together
class StringMemoryPool {
private:
    static constexpr size_t BLOCK_SIZE = 4096;
    std::vector<char*> blocks;
    size_t currentBlock = 0;
    size_t currentPos = 0;

public:
    StringMemoryPool() = default;
    StringMemoryPool(const StringMemoryPool&) = delete;  // owns raw memory: not copyable
    StringMemoryPool& operator=(const StringMemoryPool&) = delete;

    ~StringMemoryPool() {
        for (char* block : blocks) {
            delete[] block;
        }
    }

    char* allocate(size_t size) {
        if (size > BLOCK_SIZE) {
            // Oversized request: dedicated block, marked full so the next request opens a new one
            blocks.push_back(new char[size]);
            currentBlock = blocks.size() - 1;
            currentPos = BLOCK_SIZE;
            return blocks.back();
        }
        if (currentBlock >= blocks.size() || currentPos + size > BLOCK_SIZE) {
            blocks.push_back(new char[BLOCK_SIZE]);
            currentBlock = blocks.size() - 1;
            currentPos = 0;
        }

        char* result = blocks[currentBlock] + currentPos;
        currentPos += size;
        return result;
    }

    char* copyString(const char* str) {
        size_t len = strlen(str) + 1;  // include the terminator
        char* buffer = allocate(len);
        memcpy(buffer, str, len);
        return buffer;
    }
};

// Process a long string one cache line at a time
void cacheFriendlyStringProcess(const char* str) {
    size_t len = strlen(str);
    const size_t CACHE_LINE_SIZE = 64;

    for (size_t i = 0; i < len; i += CACHE_LINE_SIZE) {
        size_t blockSize = std::min(CACHE_LINE_SIZE, len - i);
        processStringBlock(str + i, blockSize);
    }
}
```
4. SIMD optimization

Modern CPUs can examine 16 or more bytes per instruction. The SSE2 sketches below use `__builtin_ctz`, which is specific to GCC and Clang. They also read whole 16-byte chunks, which may extend past the terminator. An aligned chunk never crosses a page boundary, so that is harmless in practice for the aligned loads, but it is formally outside the language rules, and the unaligned load in `simd_strcmp` can cross into an unmapped page.

```cpp
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

size_t simd_strlen(const char* s) {
    const char* p = s;

    // Scalar prologue until p is 16-byte aligned
    while (reinterpret_cast<uintptr_t>(p) % 16 != 0) {
        if (!*p) return static_cast<size_t>(p - s);
        ++p;
    }

    const __m128i zero = _mm_setzero_si128();
    while (true) {
        __m128i chunk = _mm_load_si128(reinterpret_cast<const __m128i*>(p));
        __m128i cmp = _mm_cmpeq_epi8(chunk, zero);  // 0xFF where a byte is zero
        int mask = _mm_movemask_epi8(cmp);          // one bit per byte

        if (mask != 0) {
            return static_cast<size_t>(p - s) + __builtin_ctz(mask);  // index of the first zero byte
        }
        p += 16;
    }
}

int simd_strcmp(const char* s1, const char* s2) {
    const char* p1 = s1;
    const char* p2 = s2;

    // Scalar prologue until p1 is 16-byte aligned
    while (reinterpret_cast<uintptr_t>(p1) % 16 != 0) {
        if (!*p1 || *p1 != *p2) {
            return static_cast<unsigned char>(*p1) - static_cast<unsigned char>(*p2);
        }
        ++p1;
        ++p2;
    }

    const __m128i zero = _mm_setzero_si128();
    while (true) {
        __m128i chunk1 = _mm_load_si128(reinterpret_cast<const __m128i*>(p1));
        __m128i chunk2 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p2));  // p2 may be unaligned

        int eq_mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk1, chunk2));
        int zero_mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk1, zero));

        // Stop at the first position that differs or holds the terminator, whichever comes first
        int stop = (~eq_mask | zero_mask) & 0xFFFF;
        if (stop != 0) {
            int pos = __builtin_ctz(stop);
            return static_cast<unsigned char>(p1[pos]) - static_cast<unsigned char>(p2[pos]);
        }

        p1 += 16;
        p2 += 16;
    }
}

// Scalar alternative: loop unrolled four times, checking every byte
size_t optimized_strlen(const char* s) {
    size_t len = 0;
    while (true) {
        if (!s[len]) return len;
        if (!s[len + 1]) return len + 1;
        if (!s[len + 2]) return len + 2;
        if (!s[len + 3]) return len + 3;
        len += 4;
    }
}
```
5. Best practices for string operations

The following function pulls the strategies above together. It uses the `StringMemoryPool` class from the advanced optimization techniques section:

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

void stringOperationBestPractices() {
    // Small string: stack buffer, length computed once
    const char* smallStr = "Hello, world!";
    char smallBuffer[128];
    size_t smallLen = strlen(smallStr);
    memcpy(smallBuffer, smallStr, smallLen + 1);

    // Large string: heap buffer owned by a smart pointer
    const char* largeStr = "This is a very long string that exceeds the cache line size...";
    size_t largeLen = strlen(largeStr);
    std::unique_ptr<char[]> largeBuffer(new char[largeLen + 1]);
    memcpy(largeBuffer.get(), largeStr, largeLen + 1);

    // Comparison with a known length
    const char* s1 = "Test string for comparison";
    const char* s2 = "Test string for comparison";
    size_t compareLen = strlen(s1);
    bool equal = (memcmp(s1, s2, compareLen) == 0);

    // Search: rely on the library's tuned implementation
    const char* haystack = "Hello, world! This is a test string.";
    const char* needle = "test";
    const char* found = strstr(haystack, needle);

    // Concatenation: check the space first, then copy with known lengths
    char concatBuffer[256] = "Hello, ";
    const char* suffix = "world!";
    size_t prefixLen = strlen(concatBuffer);
    size_t suffixLen = strlen(suffix);
    if (prefixLen + suffixLen < sizeof(concatBuffer) - 1) {
        memcpy(concatBuffer + prefixLen, suffix, suffixLen + 1);
    }

    // Many small strings: allocate from a pool
    StringMemoryPool pool;
    char* pooledStr = pool.copyString("Memory pool allocated string");

    (void)equal; (void)found; (void)pooledStr;
}
```
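Security of C-Style Strings

1. Common security problems

| Problem | Technical cause | Consequences | Mitigation |
|---|---|---|---|
| Buffer overflow | The input is longer than the buffer and overwrites adjacent memory | Crashes, code execution, data leaks | Use length-limited functions such as `strncpy` and `strncat`, and terminate the result explicitly |
| Null pointer dereference | A string function is called on `nullptr` and accesses invalid memory | Crashes, denial of service | Check pointers for `nullptr` before calling |
| Unterminated string | The null character is missing, so string functions scan out of bounds | Undefined behavior, invalid memory access | Make sure every string ends with `'\0'` |
| Modifying a string literal | An attempt to write to read-only memory violates memory protection | Crashes, segmentation faults | Use `const char*` and never modify literals |
| Integer overflow | A length computation overflows and produces a wrong size | Buffer overflow, memory corruption | Use `size_t` and check boundary conditions |
| Format string vulnerability | User input is used directly as the format string and triggers unintended operations | Code execution, memory disclosure | Use a fixed format string and pass user input as an argument |
| Time-of-check to time-of-use (TOCTOU) | The string changes between the check and the use | Buffer overflow, bypassed checks | Use atomic operations or copy to a temporary buffer first |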
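Two of the mitigations above, a fixed format string and a length-limited write, can be combined in a single call to `std::snprintf`. The helper `formatGreeting` below is a hypothetical example written for illustration, not part of the original text.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes "Hello, <userName>!" into dest.
// Returns true only if the whole text fit; dest is terminated in both cases.
inline bool formatGreeting(char* dest, std::size_t destSize, const char* userName) {
    if (!dest || destSize == 0 || !userName) {
        return false;
    }
    // The format string is a fixed literal; user input is only ever an argument,
    // so conversion specifiers inside userName are never interpreted.
    int needed = std::snprintf(dest, destSize, "Hello, %s!", userName);
    // snprintf returns the length the full output would have had, or a negative value on error
    return needed >= 0 && static_cast<std::size_t>(needed) < destSize;
}
```

Comparing the return value with the buffer size is what reveals truncation; `strncpy` gives no such signal.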
2. Safe string handling functions

```cpp
#include <cstddef>
#include <cstring>

// Copies src into dest. Returns false on invalid arguments or truncation;
// dest is always null-terminated when the arguments are valid.
bool safeStringCopy(char* dest, size_t destSize, const char* src) {
    if (!dest || !src || destSize == 0) {
        return false;
    }

    // Never scan further than what could be copied
    size_t srcLen = 0;
    while (srcLen < destSize && src[srcLen]) {
        ++srcLen;
    }

    if (srcLen >= destSize) {
        memcpy(dest, src, destSize - 1);  // truncate
        dest[destSize - 1] = '\0';
        return false;
    }

    memcpy(dest, src, srcLen + 1);  // includes the terminator
    return true;
}

// Appends src to dest. Returns false on invalid arguments or truncation.
bool safeStringConcat(char* dest, size_t destSize, const char* src) {
    if (!dest || !src || destSize == 0) {
        return false;
    }

    size_t destLen = 0;
    while (destLen < destSize && dest[destLen]) {
        ++destLen;
    }
    if (destLen >= destSize) {
        return false;  // dest is not terminated within its buffer
    }

    size_t remaining = destSize - destLen;  // bytes left, including room for '\0'
    size_t srcLen = 0;
    while (srcLen < remaining && src[srcLen]) {
        ++srcLen;
    }

    if (srcLen >= remaining) {
        memcpy(dest + destLen, src, remaining - 1);  // truncate
        dest[destSize - 1] = '\0';
        return false;
    }

    memcpy(dest + destLen, src, srcLen + 1);
    return true;
}

// Length of str, reading at most maxLength bytes
size_t safeStringLength(const char* str, size_t maxLength) {
    if (!str || maxLength == 0) {
        return 0;
    }

    size_t len = 0;
    const char* p = str;
    while (*p && len < maxLength) {
        ++p;
        ++len;
    }
    return len;
}

// Compares at most maxLength characters; null pointers sort before any string
int safeStringCompare(const char* s1, const char* s2, size_t maxLength) {
    if (!s1 && !s2) return 0;
    if (!s1) return -1;
    if (!s2) return 1;

    size_t i = 0;
    while (i < maxLength && *s1 && *s2 && *s1 == *s2) {
        ++s1;
        ++s2;
        ++i;
    }

    if (i >= maxLength) return 0;
    if (!*s1 && !*s2) return 0;
    if (!*s1) return -1;
    if (!*s2) return 1;

    return static_cast<unsigned char>(*s1) - static_cast<unsigned char>(*s2);
}
```
3. A safe string class

The following class wraps a C-style buffer and adds ownership and bounds checking. It is meant as a teaching example; production code should normally use `std::string`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <new>

class SafeString {
private:
    char* buffer;
    size_t capacity;  // usable characters, not counting the terminator
    size_t length;

    // Makes room for `required` characters plus '\0'. Returns false on failure.
    bool ensureCapacity(size_t required) {
        if (buffer && required <= capacity) {
            return true;
        }
        if (required > SIZE_MAX - 1) {
            return false;  // newCapacity + 1 would overflow
        }

        // Double the capacity where possible to amortize reallocation
        size_t newCapacity = (capacity > (SIZE_MAX - 1) / 2) ? required : capacity * 2;
        if (newCapacity < required) {
            newCapacity = required;
        }

        char* newBuffer = new (std::nothrow) char[newCapacity + 1];  // nullptr instead of an exception
        if (!newBuffer) {
            return false;
        }

        if (buffer) {
            memcpy(newBuffer, buffer, length + 1);
            delete[] buffer;
        } else {
            newBuffer[0] = '\0';
        }

        buffer = newBuffer;
        capacity = newCapacity;
        return true;
    }

public:
    SafeString(size_t initialCapacity = 64) : buffer(nullptr), capacity(0), length(0) {
        ensureCapacity(initialCapacity);
    }

    ~SafeString() {
        delete[] buffer;
    }

    SafeString(const SafeString& other) : buffer(nullptr), capacity(0), length(0) {
        if (other.buffer && ensureCapacity(other.length)) {
            memcpy(buffer, other.buffer, other.length + 1);
            length = other.length;
        }
    }

    SafeString(SafeString&& other) noexcept
        : buffer(other.buffer), capacity(other.capacity), length(other.length) {
        other.buffer = nullptr;
        other.capacity = 0;
        other.length = 0;
    }

    SafeString(const char* str) : buffer(nullptr), capacity(0), length(0) {
        if (str) {
            size_t strLen = strlen(str);
            if (ensureCapacity(strLen)) {
                memcpy(buffer, str, strLen + 1);
                length = strLen;
            }
        }
    }

    SafeString& operator=(const SafeString& other) {
        if (this != &other) {
            if (!other.buffer) {
                clear();
            } else if (ensureCapacity(other.length)) {
                memcpy(buffer, other.buffer, other.length + 1);
                length = other.length;
            }
        }
        return *this;
    }

    SafeString& operator=(SafeString&& other) noexcept {
        if (this != &other) {
            delete[] buffer;
            buffer = other.buffer;
            capacity = other.capacity;
            length = other.length;
            other.buffer = nullptr;
            other.capacity = 0;
            other.length = 0;
        }
        return *this;
    }

    // str must not point into this object's own buffer
    bool append(const char* str) {
        if (!str) {
            return false;
        }
        size_t strLen = strlen(str);
        if (strLen > SIZE_MAX - length || !ensureCapacity(length + strLen)) {
            return false;
        }
        memcpy(buffer + length, str, strLen + 1);
        length += strLen;
        return true;
    }

    bool append(char c) {
        if (!ensureCapacity(length + 1)) {
            return false;
        }
        buffer[length] = c;
        buffer[length + 1] = '\0';
        ++length;
        return true;
    }

    void clear() {
        if (buffer) {
            buffer[0] = '\0';
        }
        length = 0;
    }

    size_t size() const { return length; }
    size_t getCapacity() const { return capacity; }
    const char* c_str() const { return buffer ? buffer : ""; }

    bool operator==(const SafeString& other) const { return strcmp(c_str(), other.c_str()) == 0; }
    bool operator!=(const SafeString& other) const { return !(*this == other); }
    bool operator<(const SafeString& other) const { return strcmp(c_str(), other.c_str()) < 0; }
    bool operator<=(const SafeString& other) const { return strcmp(c_str(), other.c_str()) <= 0; }
    bool operator>(const SafeString& other) const { return strcmp(c_str(), other.c_str()) > 0; }
    bool operator>=(const SafeString& other) const { return strcmp(c_str(), other.c_str()) >= 0; }
};

void safeStringUsage() {
    SafeString s1(100);
    s1.append("Hello");
    s1.append(", ");
    s1.append("world!");

    std::cout << "String: " << s1.c_str() << std::endl;
    std::cout << "Length: " << s1.size() << std::endl;
    std::cout << "Capacity: " << s1.getCapacity() << std::endl;

    SafeString s2 = s1;  // copy construction
    std::cout << "s2: " << s2.c_str() << std::endl;

    SafeString s3 = "Test";  // construction from a C string
    s3.append(s1.c_str());
    std::cout << "s3: " << s3.c_str() << std::endl;
}
```
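4. Secure Coding Guidelines

- Input validation: validate the length and content of all user input.
- Bounds checking: check boundary conditions in every string operation.
- Memory safety: use safe memory allocation and release functions.
- Type safety: use appropriate types (for example, size_t for lengths).
- Error handling: check the return value of every function call.
- Code review: review string-handling code for security issues regularly.
- Tool support: use static analysis tools to detect vulnerabilities.
- Least privilege: restrict what string operations are permitted to touch.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <iostream>
#include <limits>

void secureInputHandling() {
    constexpr size_t BUFFER_SIZE = 256;
    char buffer[BUFFER_SIZE];

    std::cout << "Enter your name: " << std::endl;
    // getline stores at most BUFFER_SIZE - 1 characters plus the terminator.
    std::cin.getline(buffer, BUFFER_SIZE);

    if (std::cin.fail()) {
        // failbit is set when the line did not fit: clear the error state
        // and discard the rest of the line.
        std::cin.clear();
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        std::cout << "Input too long, truncated." << std::endl;
    }

    // Accept only printable characters and whitespace.
    bool valid = true;
    for (size_t i = 0; buffer[i] && valid; ++i) {
        const auto ch = static_cast<unsigned char>(buffer[i]);
        if (!std::isprint(ch) && !std::isspace(ch)) {
            valid = false;
        }
    }

    if (valid) {
        std::cout << "Hello, " << buffer << "!" << std::endl;
    } else {
        std::cout << "Invalid input." << std::endl;
    }
}

// The format string is a fixed literal; caller data is passed only as arguments.
void securePrintf(const char* label, const char* input) {
    std::printf("%s: %s\n", label, input);
}

// Never call printf(userInput): user-controlled text would be parsed as a format string.
void avoidFormatVulnerability(const char* userInput) {
    std::printf("User input: %s\n", userInput);
}
```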
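Integrating C-Style Strings with Modern C++

1. Interoperating with std::string

```cpp
#include <iostream>
#include <string>

// std::string -> C-style string
std::string cppStr = "Hello, world!";
const char* cStr = cppStr.c_str();  // null-terminated; valid until cppStr is modified or destroyed
const char* data = cppStr.data();   // equivalent to c_str() since C++11

// C-style string -> std::string
const char* cStr2 = "Hello";
std::string cppStr2(cStr2);         // copies the characters

// Accept a C-style string at the boundary, then work with std::string.
void processString(const char* cStr) {
    std::string cppStr(cStr);       // cStr must not be null
    cppStr += "!";
    std::cout << cppStr << std::endl;
}
```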
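2. Integrating with std::string_view (C++17+)

```cpp
#include <iostream>
#include <string>
#include <string_view>

// A view over a string literal: no copy, no allocation.
std::string_view sv1 = "Hello, world!";

// A view over a std::string: valid only while cppStr is alive and unmodified.
std::string cppStr = "Hello";
std::string_view sv2(cppStr);

// data() is NOT guaranteed to be null-terminated in general. It is here only
// because sv1 views a whole string literal.
const char* cStr = sv1.data();

void processStringView(std::string_view sv) {
    std::cout << "Length: " << sv.length() << std::endl;
    std::cout << "Substring: " << sv.substr(0, 5) << std::endl;

    // Make an owning copy when the text must be modified.
    std::string copy = std::string(sv);
    copy += "!";
    std::cout << copy << std::endl;
}
```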
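Best Practices for C-Style Strings

1. When to Use C-Style Strings

- Interfacing with C libraries: when calling C library functions.
- Performance-critical paths: where allocation and copying costs must be controlled precisely.
- Memory-constrained environments: embedded systems with tight memory budgets.
- Low-level systems programming: operating system kernels, drivers, and similar code.

2. Summary of Best Practices

| Practice | Reason | Example |
| --- | --- | --- |
| Use const char* | Prevents modifying string literals | const char* str = "Hello"; |
| Check for null pointers | Avoids null pointer dereference | if (str) { /* process */ } |
| Ensure null termination | Avoids undefined behavior | buffer[sizeof(buffer)-1] = '\0'; |
| Use bounded string functions | Avoids buffer overflow | strncpy(dest, src, size - 1); dest[size - 1] = '\0'; |
| Cache the string length | strlen is O(n) on every call | size_t len = strlen(str); |
| Prefer std::string | Recommended in modern C++ | std::string str = "Hello"; |
| Use std::string_view | Cheap non-owning view | std::string_view sv = str; |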
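Summary

C-style strings are the most basic string representation in C++. Modern C++ recommends std::string and std::string_view, but C-style strings still matter in many situations, especially when interfacing with C libraries, on performance-critical paths, and in low-level systems programming.

Understanding how C-style strings are implemented, how to manipulate them, and how to use them safely is essential for writing efficient, reliable C++ code. Bounded string functions, careful performance tuning, and integration with modern C++ features let you keep their advantages while avoiding their security pitfalls.

In practice, choose the representation that fits the use case: std::string for most application code, std::string_view for cheap read-only views, and C-style strings when interfacing with C libraries or on performance-critical paths.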
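The string Class in Depth

How std::string Is Implemented

std::string is the string class provided by the C++ standard library. Its implementation typically has the following characteristics: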
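1. Memory Layout

- Small string optimization (SSO): short strings are stored in a buffer inside the string object itself, which avoids a heap allocation. (The object may live on the stack or anywhere else.)
- Dynamic memory management: longer strings are stored in heap-allocated memory.
- Reference counting (some implementations): older copy-on-write implementations used reference counting. The C++11 requirements effectively rule it out, so current implementations do not use it.

```cpp
#include <iostream>
#include <string>

// Short enough to fit the SSO buffer on common implementations
std::string shortStr = "Hello";

// Too long for any SSO buffer: stored on the heap
std::string longStr = "Hello, world! This is a long string that exceeds SSO buffer.";

// The object size is implementation-defined (commonly 24 or 32 bytes on 64-bit platforms)
std::cout << "Size of std::string: " << sizeof(std::string) << " bytes" << std::endl;
```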
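To make the SSO description concrete, the sketch below checks whether a string's character data lies inside the std::string object itself. The helper name `storedInline` is hypothetical, and the result for short strings depends on the standard library in use; the only portable expectation is that a string much longer than `sizeof(std::string)` cannot be stored inline.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: returns true when the character data lives inside the
// std::string object itself, which is how SSO implementations keep short strings.
bool storedInline(const std::string& s) {
    const char* objBegin = reinterpret_cast<const char*>(&s);
    const char* objEnd = objBegin + sizeof(std::string);
    const char* data = s.data();
    // std::less yields a well-defined total order even for unrelated pointers.
    std::less<const char*> before;
    return !before(data, objBegin) && before(data, objEnd);
}
```

On the mainstream implementations, `storedInline(std::string("Hello"))` is expected to report true, but that is an implementation detail rather than a guarantee of the standard.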
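2. Capacity Management

```cpp
#include <iostream>
#include <string>

std::string s = "Hello";

std::cout << "Size: " << s.size() << std::endl;          // number of characters: 5
std::cout << "Capacity: " << s.capacity() << std::endl;  // characters that fit without reallocating

// Request room for at least 100 characters up front.
s.reserve(100);
std::cout << "Capacity after reserve: " << s.capacity() << std::endl;

// Non-binding request to release unused capacity.
s.shrink_to_fit();
std::cout << "Capacity after shrink_to_fit: " << s.capacity() << std::endl;
```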
Advanced Use of std::string

1. Initialization

```cpp
#include <initializer_list>
#include <string>
#include <string_view>

std::string s1;                                 // empty string
std::string s2 = "Hello";                       // from a string literal
std::string s3("Hello");                        // direct initialization
std::string s4(5, 'a');                         // "aaaaa"
std::string s5(s2);                             // copy
std::string s6(s2, 1, 3);                       // substring: "ell"
std::string s7({'H', 'e', 'l', 'l', 'o'});      // initializer list
std::string s8(s2.begin(), s2.end());           // iterator range
std::string s9("Hello", 5);                     // first 5 characters of a C-style string

// From a string_view (C++17); the conversion is explicit.
std::string_view sv = "Hello";
std::string s10(sv);
```
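2. Assignment

```cpp
#include <string>
#include <string_view>

std::string s2 = "Hello";
std::string_view sv = "Hello";
std::string s;

s = "World";                              // from a string literal
s = s2;                                   // from another string

s.assign("Hello");                        // "Hello"
s.assign("Hello", 2, 3);                  // substring starting at index 2: "llo"
s.assign(5, 'x');                         // "xxxxx"
s.assign(s2.begin(), s2.end());           // iterator range
s.assign({'W', 'o', 'r', 'l', 'd'});      // initializer list
s.assign(sv);                             // string_view (C++17)
```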
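3. Concatenation

```cpp
#include <string>
#include <string_view>

std::string s2 = "Hello";
std::string_view sv = "view";
std::string s = "Hello";

s += " world";                      // append a C-style string
s += '!';                           // append a single character

s.append("!");                      // append a C-style string
s.append("abc", 2);                 // append the first 2 characters: "ab"
s.append(3, '?');                   // append "???"
s.append(s2);                       // append another string
s.append(s2, 0, 2);                 // append a substring: "He"
s.append(s2.begin(), s2.end());     // append an iterator range
s.append(sv);                       // append a string_view (C++17)

// Reserve before appending to avoid repeated reallocations.
s.reserve(s.size() + 10);
s += " world!";
```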
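4. Element Access

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

void accessExamples() {
    std::string s = "Hello";

    char c1 = s[0];      // no bounds check
    char c2 = s.at(0);   // bounds-checked

    try {
        char c3 = s.at(10);  // throws std::out_of_range
        (void)c3;
    } catch (const std::out_of_range& e) {
        std::cout << "Out of range: " << e.what() << std::endl;
    }

    char front = s.front();  // first character (string must not be empty)
    char back = s.back();    // last character (string must not be empty)

    const std::string& cs = s;
    const char* data = cs.data();    // read-only pointer to the characters
    char* mutableData = s.data();    // writable pointer (C++17)
    const char* cStr = s.c_str();    // null-terminated C-style string

    // Forward iteration with iterators
    for (std::string::iterator it = s.begin(); it != s.end(); ++it) {
        std::cout << *it;
    }
    // Range-based for loop
    for (char ch : s) {
        std::cout << ch;
    }
    // Reverse iteration
    for (std::string::reverse_iterator rit = s.rbegin(); rit != s.rend(); ++rit) {
        std::cout << *rit;
    }
}
```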
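5. Modification

```cpp
#include <string>

std::string s2 = "Hello";
std::string s = "Hello, world!";

// Insertion
s.insert(5, " ");                  // insert a C-style string at index 5
s.insert(5, 3, '!');               // insert "!!!"
s.insert(5, "abc", 2);             // insert the first 2 characters: "ab"
s.insert(5, s2);                   // insert another string
s.insert(s.begin() + 5, 'x');      // insert one character via iterator

// Erasure
s.erase(5, 1);                     // erase 1 character at index 5
s.erase(s.begin() + 5);            // erase the character at the iterator
s.erase(s.begin() + 5, s.end());   // erase to the end; s is now "Hello"

// Replacement
s.replace(0, 5, "Hi");                         // s is now "Hi"
s.replace(s.begin(), s.begin() + 2, "Hello");  // iterators must stay inside s
s.replace(0, 5, 3, 'x');                       // s is now "xxx"
s.replace(0, 5, "abc", 2);                     // count is clamped; s is now "ab"

s.clear();                         // remove all characters
bool isEmpty = s.empty();          // true

s.resize(10);                      // grow to 10 characters, filled with '\0'
s.resize(5, 'x');                  // shrink to 5; the fill character is unused when shrinking
```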
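6. Searching

```cpp
#include <algorithm>
#include <functional>
#include <iostream>
#include <string>

// Substring search using the C++17 Boyer-Moore searcher. std::string::find
// is free to use any algorithm, so the searcher is requested explicitly here.
size_t boyerMooreSearch(const std::string& haystack, const std::string& needle) {
    if (needle.empty()) return 0;
    if (haystack.size() < needle.size()) return std::string::npos;
    auto it = std::search(haystack.begin(), haystack.end(),
                          std::boyer_moore_searcher(needle.begin(), needle.end()));
    return it == haystack.end() ? std::string::npos
                                : static_cast<size_t>(it - haystack.begin());
}

void searchExamples() {
    std::string s = "Hello, world!";

    size_t pos1 = s.find('l');                   // first 'l': 2
    size_t pos2 = s.rfind('l');                  // last 'l': 10
    size_t pos3 = s.find_first_of("aeiou");      // first vowel: 1
    size_t pos4 = s.find_last_of("aeiou");       // last vowel: 8
    size_t pos5 = s.find_first_not_of("Helo,");  // first character outside the set: 6
    size_t pos6 = s.find("world");               // substring: 7
    size_t pos7 = s.find('l', 3);                // search starting at index 3: 3

    // A failed search returns std::string::npos.
    if (s.find("test") == std::string::npos) {
        std::cout << "Substring not found" << std::endl;
    }
}
```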
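The `find` overload that takes a starting position can be called in a loop to collect every occurrence of a substring. The following is a minimal sketch; the function name `findAll` is made up for illustration, and it reports overlapping matches because each search resumes one character after the previous hit.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the start index of every occurrence of
// `needle` in `haystack`, including overlapping occurrences.
std::vector<std::size_t> findAll(const std::string& haystack, const std::string& needle) {
    std::vector<std::size_t> positions;
    if (needle.empty()) {
        return positions;  // an empty needle would match everywhere; report nothing
    }
    for (std::size_t pos = haystack.find(needle);
         pos != std::string::npos;
         pos = haystack.find(needle, pos + 1)) {
        positions.push_back(pos);
    }
    return positions;
}
```

To skip overlapping matches instead, resume the search at `pos + needle.size()`.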
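Performance Optimization for std::string

1. Memory Management

```cpp
#include <string>
#include <utility>
#include <vector>

// Preallocate when the final size is known in advance.
std::string s;
s.reserve(1000);

// Returning a local by value is cheap: the copy is elided or becomes a move.
std::string createString() {
    std::string s = "Hello, world!";
    return s;
}

// Move instead of copy when the source is no longer needed.
std::string s1 = "Hello";
std::string s2 = std::move(s1);  // s1 is left valid but unspecified

// Short strings benefit from SSO; long strings need a heap allocation.
std::string shortStr = "Hello";
std::string longStr = "This is a long string that exceeds the SSO buffer size.";

// Compute the total length first so the result is allocated only once.
std::string buildString(const std::vector<std::string>& parts) {
    size_t totalLength = 0;
    for (const auto& part : parts) {
        totalLength += part.size();
    }

    std::string result;
    result.reserve(totalLength);
    for (const auto& part : parts) {
        result += part;
    }
    return result;
}
```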
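2. String Operations

```cpp
#include <sstream>
#include <string>
#include <utility>

// Build mixed-type text with a stream instead of many temporary strings.
std::string buildString(int x, double y, const std::string& z) {
    std::ostringstream oss;
    oss << "x: " << x << ", y: " << y << ", z: " << z;
    return oss.str();
}

// Take the string by value when the function needs its own copy.
void processString(std::string s) {
    // work with s
}

void operationExamples() {
    // Hoist size() out of the loop condition.
    std::string text = "Hello, world!";
    size_t len = text.size();
    for (size_t i = 0; i < len; ++i) {
        // process text[i]
    }

    // Move into a by-value parameter to avoid a copy.
    std::string s = "Hello";
    processString(std::move(s));

    // Compare sub-ranges in place, without creating substrings.
    std::string s1 = "Hello";
    std::string s2 = "World";
    if (s1.compare(0, 2, s2, 0, 2) == 0) {
        // the first two characters are equal
    }

    // Direct access to the underlying buffer (contiguous since C++11).
    std::string raw = "Hello";
    char* data = &raw[0];
    (void)data;
}
```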
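Modern C++ Features of std::string

1. Move Semantics (C++11+)

```cpp
#include <string>
#include <utility>

std::string createLargeString() {
    std::string s(1000000, 'x');
    return s;  // elided or moved, never deep-copied
}

// Initialization from a returned temporary: no copy of the million characters.
std::string s1 = createLargeString();

// Move assignment from a temporary.
std::string s2;
s2 = createLargeString();

// Explicit move: s3 is left in a valid but unspecified state.
std::string s3 = "Hello";
std::string s4 = std::move(s3);
```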
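2. String Views (C++17+)

```cpp
#include <iostream>
#include <string>
#include <string_view>

std::string s = "Hello, world!";
std::string_view sv(s);           // view over a std::string; no copy
std::string_view sv2 = "Hello";   // view over a string literal

// Accepts std::string, string literals, and other views without copying.
void processStringView(std::string_view sv) {
    std::cout << "Length: " << sv.length() << std::endl;
    std::cout << "Substring: " << sv.substr(0, 5) << std::endl;
}

// Converting a view back to an owning string is explicit.
std::string s5(sv);
```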
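3. Conversions (C++11+)

```cpp
#include <charconv>
#include <iostream>
#include <string>
#include <system_error>

void conversionExamples() {
    // Number -> string
    int i = 42;
    double d = 3.14;
    std::string s1 = std::to_string(i);  // "42"
    std::string s2 = std::to_string(d);  // %f-style output such as "3.140000"

    // String -> number
    std::string s3 = "42";
    std::string s4 = "3.14";
    int i2 = std::stoi(s3);
    double d2 = std::stod(s4);

    // Error handling: each failing call needs its own try block, because
    // the first exception skips the rest of the block.
    try {
        int i3 = std::stoi("abc");  // throws std::invalid_argument
        (void)i3;
    } catch (const std::exception& e) {
        std::cout << "Conversion error: " << e.what() << std::endl;
    }
    try {
        int i4 = std::stoi("12345678901234567890");  // throws std::out_of_range
        (void)i4;
    } catch (const std::exception& e) {
        std::cout << "Conversion error: " << e.what() << std::endl;
    }

    // Other bases
    std::string hexStr = "4A";
    int hexVal = std::stoi(hexStr, nullptr, 16);  // 74

    // C++17: std::from_chars does not throw, allocate, or depend on the locale.
    std::string numStr = "42";
    int value = 0;
    auto [ptr, ec] = std::from_chars(numStr.data(), numStr.data() + numStr.size(), value);
    if (ec == std::errc()) {
        std::cout << "Value: " << value << std::endl;
    }
}
```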
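The `std::from_chars` call shown above reports success even when only a prefix of the input is numeric. One way to make the check strict is to wrap it in a helper that also requires the whole input to be consumed. This is a sketch; `parseInt` is a hypothetical name, not a standard function.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses a base-10 int and returns std::nullopt unless
// the entire input is a valid number that fits in int.
std::optional<int> parseInt(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;  // no digits, overflow, or trailing characters
    }
    return value;
}
```

Note that `std::from_chars` accepts a leading minus sign but neither a leading plus sign nor leading whitespace, so inputs such as " 42" are rejected by this helper.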
4. Formatting (C++20+)

```cpp
#include <format>
#include <string>

// Basic formatting
std::string s1 = std::format("Hello, {}!", "world");
std::string s2 = std::format("The answer is {}.", 42);
std::string s3 = std::format("Pi is approximately {:.2f}.", 3.14159);

// Alignment and width
std::string s4 = std::format("{:<10} {:>10}", "Left", "Right");
std::string s5 = std::format("{:^10}", "Centered");

// Number bases
std::string s6 = std::format("Decimal: {}, Hex: {:x}, Octal: {:o}", 42, 42, 42);

// Custom type support
struct Point {
    int x, y;
};

template <>
struct std::formatter<Point> {
    // This sketch accepts only an empty format specification ("{}").
    constexpr auto parse(std::format_parse_context& ctx) {
        return ctx.begin();
    }

    // format must be const-qualified so it can be called on a const formatter.
    template <typename FormatContext>
    auto format(const Point& p, FormatContext& ctx) const {
        return std::format_to(ctx.out(), "({}, {})", p.x, p.y);
    }
};

Point p = {1, 2};
std::string s7 = std::format("Point: {}", p);  // "Point: (1, 2)"
```
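Best Practices for std::string

1. Avoiding Common Errors

| Error | Cause | Solution |
| --- | --- | --- |
| Out-of-bounds access | operator[] does no bounds checking | Use at() or check the index first |
| Null pointer dereference | Constructing a std::string from a null const char* (data() and c_str() themselves never return null) | Check C-style pointers before constructing a string from them |
| Memory leak | None (std::string manages its own memory) | - |
| Poor performance | Frequent reallocation | Preallocate with reserve() |
| Unnecessary copies | Passing large strings by value | Pass by const& or use string_view |

2. Performance Best Practices

| Practice | Reason | Example |
| --- | --- | --- |
| Preallocate space | Avoids repeated reallocation | s.reserve(1000); |
| Use move semantics | Avoids unnecessary copies | std::string s2 = std::move(s1); |
| Rely on return value optimization | Avoids a copy on return | return std::string("Hello"); |
| Use string_view | Avoids copying read-only string arguments | void process(std::string_view sv); |
| Avoid repeated small concatenations | Reduces memory allocations | Call reserve() before concatenating |
| Choose a suitable comparison | Avoids temporary substrings | Use compare() for prefix comparison |

3. Code Style Best Practices

| Practice | Reason | Example |
| --- | --- | --- |
| Use std::string | Recommended in modern C++ | std::string s = "Hello"; |
| Avoid C-style strings | Safer and more convenient | Use std::string instead of char* |
| Use auto for type deduction | Concise code | auto s = std::string("Hello"); |
| Use exceptions sensibly | Error handling | Handle the exception from at() with try-catch |
| Follow a naming convention | Readability | For example, camelCase used consistently |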
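Input and Output with std::string

1. String Input

```cpp
#include <iostream>
#include <string>
#include <vector>

// Reads one line; returns false at end of input or on error.
bool readLine(std::string& line) {
    return static_cast<bool>(std::getline(std::cin, line));
}

void inputExamples() {
    // operator>> reads a single whitespace-delimited word.
    std::string name;
    std::cout << "Enter your name: ";
    std::cin >> name;
    std::cout << "Your name is: " << name << std::endl;

    // operator>> leaves the newline in the stream. std::ws skips it (and any
    // other leading whitespace) before getline reads the whole line.
    std::cout << "Enter a line: ";
    std::getline(std::cin >> std::ws, name);
    std::cout << "You entered: " << name << std::endl;

    // Read lines until end of input (Ctrl+Z on Windows, Ctrl+D on Unix-like systems).
    std::vector<std::string> lines;
    std::string line;
    std::cout << "Enter lines (end of input to finish): " << std::endl;
    while (std::getline(std::cin, line)) {
        lines.push_back(line);
    }
}
```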
2. String Output

```cpp
#include <format>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Unformatted write of a large string.
void printLargeString(const std::string& s) {
    std::cout.write(s.data(), static_cast<std::streamsize>(s.size()));
    std::cout << std::endl;
}

void outputExamples() {
    std::string s = "Hello, world!";

    // Standard output
    std::cout << s << std::endl;

    // File output; the destructor would also close the file.
    std::ofstream file("output.txt");
    if (file.is_open()) {
        file << s << std::endl;
        file.close();
    }

    // Build text in memory with a string stream.
    std::ostringstream oss;
    oss << "Name: " << s << ", Length: " << s.size();
    std::string output = oss.str();
    std::cout << output << std::endl;

    // Formatted output (C++20)
    std::cout << std::format("String: '{}', Length: {}", s, s.size()) << std::endl;
}
```
Performance Analysis of std::string

1. Benchmarks

```cpp
// Requires the Google Benchmark library.
#include <benchmark/benchmark.h>
#include <string>

// Construct a 100-character string.
static void BM_StringCreation(benchmark::State& state) {
    for (auto _ : state) {
        std::string s(100, 'x');
        benchmark::DoNotOptimize(s);
    }
}

// Append repeatedly. The string keeps growing across iterations, so this
// measures amortized append cost, including occasional reallocations.
static void BM_StringConcatenation(benchmark::State& state) {
    std::string s;
    s.reserve(1000);
    for (auto _ : state) {
        s += "Hello, world!";
        benchmark::DoNotOptimize(s);
    }
}

// Deep-copy a 1000-character string.
static void BM_StringCopy(benchmark::State& state) {
    std::string s(1000, 'x');
    for (auto _ : state) {
        std::string copy = s;
        benchmark::DoNotOptimize(copy);
    }
}

BENCHMARK(BM_StringCreation);
BENCHMARK(BM_StringConcatenation);
BENCHMARK(BM_StringCopy);
BENCHMARK_MAIN();
```
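2. Performance Comparison

| Operation | std::string | C-style string | Notes |
| --- | --- | --- | --- |
| Creation | O(n) | O(n) | std::string has the SSO optimization |
| Copy | O(n) | O(n) | std::string performs a deep copy |
| Concatenation | O(n) | O(n) | std::string manages memory automatically |
| Search | O(n) | O(n) | Similar implementations |
| Element access | O(1) | O(1) | Identical |
| Comparison | O(n) | O(n) | Similar implementations |

One difference the table does not show: size() on a std::string is O(1), whereas strlen on a C-style string is O(n) because it scans for the terminator.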
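Summary

std::string is the powerful, safe, and reliable string class of the C++ standard library. Its advantages are:

- Safe: it manages memory automatically, which avoids buffer overflows.
- Convenient: a rich set of member functions covers most string operations.
- Efficient: small string optimization (SSO), move semantics, and other performance features.
- Modern: it supports C++11 and later features such as move semantics and string views.
- Compatible: it interoperates with C-style strings.

In modern C++ code, prefer std::string over C-style strings unless there is a specific performance requirement or a need to interface with a C library. Using its features and the practices above leads to string-handling code that is safer, more efficient, and easier to maintain.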
String Streams

Input String Stream (istringstream)

```cpp
#include <iostream>
#include <sstream>
#include <string>

std::string data = "123 45.67 Hello";
std::istringstream iss(data);

int i;
double d;
std::string s;

// Extraction works as with std::cin: fields are separated by whitespace.
iss >> i >> d >> s;

std::cout << "i: " << i << ", d: " << d << ", s: " << s << std::endl;
```
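Output String Stream (ostringstream)

```cpp
#include <iostream>
#include <sstream>
#include <string>

std::ostringstream oss;
int i = 123;
double d = 45.67;
std::string s = "Hello";

// Insert values as with std::cout; the text accumulates in memory.
oss << "i: " << i << ", d: " << d << ", s: " << s;

std::string result = oss.str();  // retrieve the accumulated string
std::cout << result << std::endl;
```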
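Applications of String Streams

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Number -> string
int number = 123;
std::ostringstream toText;
toText << number;
std::string numberStr = toText.str();  // "123"

// String -> number
std::string input = "123";
std::istringstream fromText(input);
int parsed = 0;
fromText >> parsed;  // check fromText.fail() to detect invalid input

// Formatted output: fixed notation with two decimal places
std::ostringstream formatted;
formatted << std::fixed << std::setprecision(2);
formatted << "Pi is approximately " << 3.14159;
std::string message = formatted.str();  // "Pi is approximately 3.14"
```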
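Another common application of string streams is splitting a line into delimited fields with `std::getline`. A minimal sketch follows; the function name `splitFields` is hypothetical. One quirk worth knowing: with this approach an empty field in the middle is preserved, but a trailing delimiter does not produce a final empty field.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` at every `delimiter` character.
std::vector<std::string> splitFields(const std::string& line, char delimiter) {
    std::vector<std::string> fields;
    std::istringstream iss(line);
    std::string field;
    // getline with a delimiter argument reads up to (and discards) the delimiter.
    while (std::getline(iss, field, delimiter)) {
        fields.push_back(field);
    }
    return fields;
}
```

For example, `splitFields("a,b,,c", ',')` yields the four fields "a", "b", "" and "c".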
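Wide Strings

Wide Characters and Wide Strings

```cpp
#include <iostream>
#include <string>

wchar_t wc = L'A';                          // wide character
const wchar_t* wstr = L"Hello, world!";     // wide string literal

// Wide streams. Non-ASCII text also requires a suitable locale to be set.
std::wcout << L"Enter your name: ";
std::wstring wname;
std::wcin >> wname;
std::wcout << L"Hello, " << wname << L"!" << std::endl;
```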
The wstring Class

```cpp
#include <iostream>
#include <string>

std::wstring ws1 = L"Hello";
std::wstring ws2(5, L'a');      // L"aaaaa"

ws1 += L" world";               // concatenation
size_t len = ws1.length();      // number of wchar_t units: 11

std::wcout << ws1 << std::endl;
```
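Unicode Strings

UTF-8 Strings

```cpp
#include <iostream>
#include <string>

// Up to C++17, u8 literals have type const char[] and fit in std::string.
const char* utf8Str = u8"Hello, 世界!";
std::string utf8String = u8"Hello, 世界!";

// Since C++20, u8 literals have type const char8_t[], so the two lines above
// no longer compile there; use std::u8string instead:
// std::u8string utf8String20 = u8"Hello, 世界!";

// The terminal must interpret the bytes as UTF-8 to display them correctly.
std::cout << utf8String << std::endl;
```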
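Because UTF-8 is a variable-length encoding, `size()` on a UTF-8 string counts bytes, not characters. The sketch below counts code points by skipping continuation bytes, which always have the bit pattern `10xxxxxx`. It assumes the input is already valid UTF-8, and the name `countCodePoints` is made up for illustration.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts Unicode code points in a valid UTF-8 string.
std::size_t countCodePoints(std::string_view utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Every code point has exactly one byte that is not a continuation byte.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 encoding of "Hello, 世界!" the string holds 14 bytes but this function reports 10 code points, since each of the two Chinese characters occupies three bytes.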
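UTF-16 Strings

```cpp
#include <string>

// char16_t code units; characters outside the BMP take two units (a surrogate pair).
const char16_t* utf16Str = u"Hello, 世界!";
std::u16string utf16String = u"Hello, 世界!";
```

UTF-32 Strings

```cpp
#include <string>

// char32_t code units; every code point takes exactly one unit.
const char32_t* utf32Str = U"Hello, 世界!";
std::u32string utf32String = U"Hello, 世界!";
```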
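String Best Practices

1. Prefer std::string

- Safety: std::string manages memory automatically, which avoids buffer overflows.
- Convenience: std::string provides a rich set of member functions.
- Readability: code using std::string is easier to read and maintain.
- Compatibility: std::string interoperates with C-style strings.

2. Avoid Buffer Overflows

```cpp
#include <iostream>
#include <string>

// Risky: before C++20, operator>> into a char array does not check its size.
char buffer[10];
std::cin >> buffer;

// Safe: std::string grows to fit the input.
std::string input;
std::cin >> input;
```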
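3. String Concatenation

```cpp
#include <sstream>
#include <string>

// Efficient: append in place with +=
std::string result;
result = "Hello";
result += " ";
result += "world";
result += "!";

// Less efficient: each operator+ may create a temporary string
std::string chained = "Hello" + std::string(" ") + "world" + "!";

// A string stream is convenient when mixing strings with other types
std::ostringstream oss;
oss << "Hello" << " " << "world" << "!";
std::string streamed = oss.str();
```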
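4. String Comparison

```cpp
#include <cstring>
#include <string>

void comparisonExamples() {
    const char* str1 = "Hello";
    const char* str2 = "Hello";

    // Wrong: compares the pointers, not the characters.
    if (str1 == str2) {
        // may or may not run, depending on whether the literals share storage
    }

    // Correct for C-style strings: compare the contents with strcmp.
    if (std::strcmp(str1, str2) == 0) {
        // contents are equal
    }

    // std::string: operator== compares the contents.
    std::string s1 = "Hello";
    std::string s2 = "Hello";
    if (s1 == s2) {
        // contents are equal
    }
}
```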
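5. String Conversion

```cpp
#include <sstream>
#include <string>

// Number -> string with std::to_string (C++11)
int number = 123;
std::string str = std::to_string(number);

// Number -> string with a string stream
std::ostringstream oss;
oss << number;
std::string str2 = oss.str();

// String -> number with std::stoi (C++11); throws on invalid input
std::string text = "123";
int parsed = std::stoi(text);

// String -> number with a string stream; check iss.fail() on invalid input
std::istringstream iss(text);
int parsed2 = 0;
iss >> parsed2;
```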
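New String-Handling Features in C++11 and Later

String Views (std::string_view, C++17+)

std::string_view, introduced in C++17, is a non-owning view of a character sequence. It gives efficient read-only access to string data and avoids unnecessary copies: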
```cpp
#include <iostream>
#include <string>
#include <string_view>

void stringViewExamples() {
    // View over a std::string
    std::string s = "Hello, world!";
    std::string_view sv(s);
    std::cout << sv << std::endl;

    // View over a C-style string
    const char* cstr = "Hello";
    std::string_view sv2(cstr);

    // View over part of a string. string_view has no (string, pos, count)
    // constructor, so take a substring of the full view instead.
    std::string_view sv3 = sv.substr(0, 5);  // "Hello"

    // Basic queries
    std::cout << "Length: " << sv.length() << std::endl;
    std::cout << "Empty: " << sv.empty() << std::endl;
    std::cout << "Substring: " << sv.substr(7, 5) << std::endl;  // "world"

    // Searching
    size_t pos = sv.find("world");
    if (pos != std::string_view::npos) {
        std::cout << "Found 'world' at position: " << pos << std::endl;
    }

    // Prefix and suffix checks (C++20)
    if (sv.starts_with("Hello")) {
        std::cout << "Starts with 'Hello'" << std::endl;
    }
    if (sv.ends_with("!")) {
        std::cout << "Ends with '!'" << std::endl;
    }
}
```
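Since a string_view is only a pointer and a length, it can be used to split text into tokens without copying any characters. The sketch below illustrates the idea; `splitView` is a hypothetical helper name. The returned views point into the original text, so they are valid only while that text stays alive and unmodified.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` at every `delimiter` without copying.
// Empty fields are kept, so "a,,b" yields three views.
std::vector<std::string_view> splitView(std::string_view text, char delimiter) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t end = text.find(delimiter, start);
        if (end == std::string_view::npos) {
            parts.push_back(text.substr(start));  // last field
            break;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    return parts;
}
```

Passing a temporary std::string to this function and keeping the result would leave every view dangling, which is the main hazard to watch for with non-owning views.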
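New std::string Member Functions (C++11+)

Additions in C++11

```cpp
#include <string>
#include <utility>

// Move semantics
std::string s1 = "Hello";
std::string s2 = std::move(s1);

// std::string has push_back but no emplace_back; C++11 added pop_back,
// front, back, and shrink_to_fit.
std::string s;
s.push_back('H');
s.append("ello");
```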
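Additions in C++14

```cpp
#include <string>

// The s suffix creates a std::string instead of a const char array.
using namespace std::string_literals;
std::string s = "Hello"s;

// It also combines with raw string literals.
std::string raw = R"(Raw string with "quotes" and \backslashes)"s;
```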
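Additions in C++20

```cpp
#include <iostream>
#include <string>
#include <vector>

void cpp20Examples() {
    std::string s = "Hello, world!";

    // starts_with and ends_with accept a string_view, a C-style string, or a character.
    if (s.starts_with("Hello")) {
        std::cout << "Starts with 'Hello'" << std::endl;
    }
    if (s.ends_with("!")) {
        std::cout << "Ends with '!'" << std::endl;
    }
    // There is no initializer-list overload; use the single-character one.
    if (s.starts_with('H')) {
        std::cout << "Starts with 'H'" << std::endl;
    }

    // Construction from an iterator range (available in every C++ standard).
    std::vector<char> chars = {'H', 'e', 'l', 'l', 'o'};
    std::string s2(chars.begin(), chars.end());
}
```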
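Additions in C++23

```cpp
#include <cstring>
#include <iostream>
#include <string>

void cpp23Examples() {
    std::string s = "Hello, world!";

    // contains: substring or character membership test
    if (s.contains("world")) {
        std::cout << "Contains 'world'" << std::endl;
    }
    if (s.contains('o')) {
        std::cout << "Contains 'o'" << std::endl;
    }

    // resize_and_overwrite: write directly into the buffer, then report the
    // final length. Avoids zero-filling characters that are overwritten anyway.
    std::string filled;
    filled.resize_and_overwrite(10, [](char* buffer, size_t size) -> size_t {
        std::memcpy(buffer, "Hello", 5);  // size (10) is the writable capacity
        return 5;                         // final length of the string
    });
    std::cout << filled << std::endl;
}
```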
Regular Expressions (C++11+)

C++11 introduced the std::regex library for pattern matching and replacement on strings:
```cpp
#include <iostream>
#include <regex>
#include <string>

void regexExamples() {
    // Basic search
    std::string s = "Hello, world!";
    std::regex pattern("world");
    if (std::regex_search(s, pattern)) {
        std::cout << "Found 'world'" << std::endl;
    }

    // Capture groups: matches[0] is the whole match, matches[1..] the groups.
    std::string date = "2023-12-25";
    std::regex datePattern(R"((\d{4})-(\d{2})-(\d{2}))");
    std::smatch matches;
    if (std::regex_search(date, matches, datePattern)) {
        std::cout << "Year: " << matches[1] << std::endl;
        std::cout << "Month: " << matches[2] << std::endl;
        std::cout << "Day: " << matches[3] << std::endl;
    }

    // Replacement: every match is replaced by default.
    std::string text = "Hello, world! Hello, C++!";
    std::regex replacePattern("Hello");
    std::string result = std::regex_replace(text, replacePattern, "Hi");
    std::cout << result << std::endl;

    // Case-insensitive matching
    std::regex caseInsensitivePattern("hello", std::regex::icase);
    if (std::regex_search(s, caseInsensitivePattern)) {
        std::cout << "Found 'hello' (case insensitive)" << std::endl;
    }
}
```
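`std::regex_search` stops at the first match. To visit every match in a string, the library provides `std::sregex_iterator`. The following sketch collects all matches into a vector; the helper name `findAllMatches` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of every non-overlapping match of
// `pattern` in `text`, in order of appearance.
std::vector<std::string> findAllMatches(const std::string& text, const std::regex& pattern) {
    std::vector<std::string> results;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it) {
        results.push_back(it->str());
    }
    return results;
}
```

Each dereferenced iterator is a `std::smatch`, so capture groups are available through `(*it)[1]`, `(*it)[2]`, and so on, just as with `std::regex_search`.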
C++20 introduced the std::format library, which provides type-safe, flexible string formatting:
```cpp
#include <format>
#include <iostream>
#include <string>

void formatExamples() {
    // Basic substitution
    std::string message = std::format("Hello, {}!", "world");
    std::cout << message << std::endl;

    // Multiple arguments, substituted in order
    std::string info = std::format("Name: {}, Age: {}", "Alice", 30);
    std::cout << info << std::endl;

    // Precision
    std::string number = std::format("Pi is approximately {:.2f}", 3.14159);
    std::cout << number << std::endl;

    // Alignment and width
    std::string aligned = std::format("{:<10} {:>10}", "Left", "Right");
    std::cout << aligned << std::endl;

    // Number bases
    std::string hex = std::format("Decimal: {}, Hex: {:x}, Octal: {:o}", 42, 42, 42);
    std::cout << hex << std::endl;
}
```
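- Type safety: unlike printf, std::format checks argument types against the format string.
- Flexibility: supports positional arguments such as {0} and {1}. Named arguments are not part of std::format.
- Readability: format strings are clearer and easier to read.
- Performance: generally intended to be comparable to printf, though results depend on the implementation.
- Extensibility: user-defined types can be formatted by specializing std::formatter.

New in C++23: the print Library

C++23 introduced std::print and std::println, which offer a more convenient way to write formatted output: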
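```cpp
#include <print>

void printExamples() {
    // std::print does not append a newline.
    std::print("Hello, {}!", "world");

    // std::println appends a newline.
    std::println("Hello, {}!", "world");

    // Multiple arguments
    std::println("Name: {}, Age: {}", "Alice", 30);

    // Format specifications work as in std::format.
    std::println("Pi is approximately {:.2f}", 3.14159);
}
```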
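Advanced Unicode String Handling

Unicode Code Points and Code Units

```cpp
#include <clocale>
#include <cstddef>
#include <cuchar>
#include <iostream>
#include <string>

// Prints each code point of a UTF-8 string. std::mbrtoc32 decodes according
// to the current C locale, so a UTF-8 locale must be active.
void printUtf8CodePoints(const std::string& utf8Str) {
    const char* p = utf8Str.data();
    const char* end = p + utf8Str.size();
    std::mbstate_t state{};

    while (p < end) {
        char32_t codePoint;
        size_t len = std::mbrtoc32(&codePoint, p, static_cast<size_t>(end - p), &state);
        // (size_t)-1: invalid sequence, -2: incomplete sequence, -3: no input consumed
        if (len == static_cast<size_t>(-1) || len == static_cast<size_t>(-2) ||
            len == static_cast<size_t>(-3)) {
            break;
        }
        if (len == 0) {
            len = 1;  // a '\0' was decoded; step over it to avoid looping forever
        }
        // Cast to an integer: streaming a char32_t to std::cout is deleted in C++20.
        std::cout << "Code point: U+" << std::hex << std::uppercase
                  << static_cast<unsigned long>(codePoint) << std::dec << std::endl;
        p += len;
    }
}

void printExample() {
    // The locale name is platform-dependent; "en_US.UTF-8" is an assumption.
    std::setlocale(LC_ALL, "en_US.UTF-8");
    // "Hello, 世界!" written as explicit UTF-8 bytes
    std::string utf8Str = "Hello, \xE4\xB8\x96\xE7\x95\x8C!";
    printUtf8CodePoints(utf8Str);
}
```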
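Converting Between Unicode Encodings

```cpp
// Note: <codecvt> and std::wstring_convert are deprecated since C++17.
// They still compile on common toolchains (with warnings) but should be
// avoided in new code.
#include <codecvt>
#include <locale>
#include <string>

void convertExample() {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;

    // "Hello, 世界!" written as explicit UTF-8 bytes
    std::string utf8Str = "Hello, \xE4\xB8\x96\xE7\x95\x8C!";

    // UTF-8 -> UTF-16 code units stored in wchar_t; throws std::range_error on invalid input
    std::wstring utf16Str = converter.from_bytes(utf8Str);

    // UTF-16 -> UTF-8
    std::string utf8Str2 = converter.to_bytes(utf16Str);
}
```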
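Because the `<codecvt>` facilities are deprecated since C++17, conversions are often written by hand or delegated to a dedicated library. The sketch below decodes UTF-8 into UTF-32 directly from the bit layout of the encoding. It is deliberately simplified: the name `decodeUtf8` is hypothetical, malformed bytes are replaced with U+FFFD, and overlong encodings and surrogate code points are not rejected, which a production decoder should do.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: decodes UTF-8 into UTF-32 code points.
// Malformed input produces U+FFFD (the replacement character).
std::u32string decodeUtf8(std::string_view in) {
    constexpr char32_t kReplacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(kReplacement); ++i; continue; }  // stray or invalid lead byte

        if (extra > in.size() - i - 1) {  // sequence is cut off at the end
            out.push_back(kReplacement);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out.push_back(kReplacement); ++i; continue; }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

As a check on the bit arithmetic, the three bytes E4 B8 AD decode to U+4E2D, the code point of 中.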
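Common Errors and Pitfalls

1. Null Pointer Dereference

```cpp
#include <cstring>

void nullPointerExample() {
    const char* str = nullptr;

    // Wrong: strlen on a null pointer is undefined behavior.
    // size_t len = strlen(str);

    // Correct: check the pointer first.
    if (str != nullptr) {
        size_t len = std::strlen(str);
        (void)len;
    }
}
```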
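2. Buffer Overflow

```cpp
#include <cstring>
#include <string>

void overflowExample() {
    char buffer[10];

    // Wrong: the source needs 24 bytes, the destination has 10.
    // strcpy(buffer, "This string is too long");

    // Correct: bound the copy and terminate explicitly, because strncpy does
    // not add '\0' when the source is too long.
    std::strncpy(buffer, "This string is too long", sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';

    // Better: std::string sizes itself.
    std::string text = "This string is too long";
}
```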
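3. Forgetting the Null Terminator

```cpp
#include <iostream>

void terminatorExample() {
    // Wrong: all 10 elements are filled, so there is no room for '\0'.
    char unterminated[10];
    for (int i = 0; i < 10; i++) {
        unterminated[i] = 'a';
    }
    // std::cout << unterminated << std::endl;  // undefined behavior: reads past the array

    // Correct: reserve one extra element for the terminator.
    char buffer[11];
    for (int i = 0; i < 10; i++) {
        buffer[i] = 'a';
    }
    buffer[10] = '\0';
    std::cout << buffer << std::endl;
}
```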
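4. Modifying a String Literal

```cpp
// Wrong: string literals are read-only. Since C++11 this conversion is
// ill-formed, and writing through the pointer is undefined behavior.
// char* str = "Hello";
// str[0] = 'h';

// Correct: a char array initialized from the literal holds a writable copy.
char str[] = "Hello";
str[0] = 'h';
```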
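5. Mixing C-Style Strings and std::string

```cpp
#include <string>

// Risky: the pointer from c_str() is invalidated as soon as s is modified
// or destroyed.
std::string s = "Hello";
const char* cstr = s.c_str();

// Safer: keep an owning std::string alive (and unmodified) for as long as
// the pointer is in use.
std::string copy = s;
const char* stableCstr = copy.c_str();
```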
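Chapter Summary

This chapter covered character and string handling in C++, including:

- Character types: char, wchar_t, char16_t, char32_t
- C-style strings: character arrays, string literals, and string manipulation functions
- The std::string class: the standard library string class with its rich set of member functions
- String streams: istringstream and ostringstream for string-based input and output
- Wide strings: wchar_t and std::wstring
- Unicode strings: UTF-8, UTF-16, and UTF-32 strings
- String best practices: prefer std::string, avoid buffer overflows, and so on
- Common errors and pitfalls: null pointer dereference, buffer overflow, missing null terminators, and so on

Strings are among the most frequently used data types in C++ programs, and handling them well is essential for writing efficient, reliable code. In practice, prefer the std::string class, which offers a safer and more convenient way to work with strings. It is still worth understanding the basic concepts and functions of C-style strings, because they appear in legacy code and whenever a program interfaces with C libraries.

Later chapters cover more advanced C++ features such as the memory model, object-oriented programming, and templates. These features combine with string handling to help build more complex and more capable programs.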