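Chapter 23: Strings and Stream Processing
String Handling in Depth
Core Features of std::string
std::string is the most commonly used string class in the C++ standard library. It owns its character buffer and offers a rich set of operations for construction, element access, modification, and searching.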
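```cpp
#include <cstddef>
#include <iostream>
#include <string>

int main() {
    // Construction
    std::string s1;                              // empty string
    std::string s2(5, 'a');                      // "aaaaa"
    std::string s3("hello");                     // from a C string
    std::string s4(s3);                          // copy of s3
    std::string s5(s3, 1, 3);                    // substring of s3: "ell"
    std::string s6({'w', 'o', 'r', 'l', 'd'});   // initializer list: "world"

    // Element access
    char c1 = s3[0];     // no bounds checking
    char c2 = s3.at(0);  // throws std::out_of_range on a bad index

    // Modification
    s1 = "Hello, World!";
    s1 += " Test";
    s1.append(" More");             // "Hello, World! Test More"
    s1.insert(7, "Beautiful ");     // "Hello, Beautiful World! Test More"
    s1.erase(7, 10);                // removes "Beautiful " (10 characters)
    s1.replace(7, 5, "Wonderful");  // "Hello, Wonderful! Test More"

    // Concatenation, size, substrings, searching
    std::string s7 = s3 + " " + s6;      // "hello world"
    std::size_t length = s7.size();      // 11
    bool empty = s1.empty();             // false
    std::string s8 = s7.substr(6, 5);    // "world"
    std::size_t pos = s7.find("world");  // 6; find is case-sensitive

    std::cout << s1 << '\n' << s2 << ' ' << s4 << ' ' << s5 << ' ' << s8 << '\n'
              << c1 << c2 << ' ' << length << ' ' << empty << ' ' << pos << '\n';
    return 0;
}
```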
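Memory Management in std::string
std::string obtains its storage through its allocator. Most implementations also apply the small string optimization (SSO): short strings are stored inside the string object itself, so they need no heap allocation.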
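```cpp
#include <iostream>
#include <string>

int main() {
    // Short strings typically live inside the object itself (SSO)
    std::string small("Hello");
    // Longer strings are stored in memory obtained from the allocator
    std::string large("This is a much longer string that will not fit in the small string optimization buffer");
    std::cout << "small capacity: " << small.capacity() << std::endl;
    std::cout << "large capacity: " << large.capacity() << std::endl;

    std::string s;
    std::cout << "Capacity: " << s.capacity() << std::endl;

    s.reserve(100);  // request room for at least 100 characters
    std::cout << "Capacity after reserve: " << s.capacity() << std::endl;

    s.shrink_to_fit();  // non-binding request to release unused capacity
    std::cout << "Capacity after shrink_to_fit: " << s.capacity() << std::endl;
    return 0;
}
```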
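std::string_view (C++17)
std::string_view, introduced in C++17, is a non-owning view of a character sequence. It gives read-only access to a string without copying it, which makes it a good parameter type for functions that only inspect text.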
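```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <string_view>

void print_string_view(std::string_view sv) {
    std::cout << "Length: " << sv.length() << ", Content: " << sv << std::endl;
}

int main() {
    std::string s = "Hello, World!";
    std::string_view sv1(s);  // view of the whole string
    // string_view has no (string, pos, len) constructor; take a sub-view instead
    std::string_view sv2 = std::string_view(s).substr(7, 5);  // "World"

    const char* cstr = "Test String";
    std::string_view sv3(cstr);     // up to the null terminator
    std::string_view sv4(cstr, 4);  // first 4 characters: "Test"

    std::string_view sv5 = "Hello";
    std::string_view sv6 = sv5.substr(1, 3);  // "ell"; substr returns a view, not a copy

    bool starts_with = sv5.starts_with("He");  // requires C++20
    bool ends_with = sv5.ends_with("lo");      // requires C++20
    std::size_t pos = sv5.find("ll");          // 2

    print_string_view(sv1);
    print_string_view(sv2);
    print_string_view(sv3);
    print_string_view(sv4);
    print_string_view(sv6);
    print_string_view("Direct literal");
    std::cout << std::boolalpha << starts_with << ' ' << ends_with << ' ' << pos << std::endl;
    return 0;
}
```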
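String Encodings
C++11 introduced UTF-8 string literals (the u8 prefix). C++20 added the char8_t type and std::u8string, and changed u8 literals to produce char8_t characters, so code that assigns a u8 literal to a std::string no longer compiles in C++20.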
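```cpp
#include <iostream>
#include <string>

int main() {
    // UTF-8 literal: the element type of a u8 literal changed in C++20
#if defined(__cpp_char8_t)
    std::u8string utf8_str = u8"你好,世界!";  // C++20: char8_t elements
#else
    std::string utf8_str = u8"你好,世界!";    // C++11 to C++17: char elements
#endif
    std::cout << "UTF-8 code units: " << utf8_str.size() << std::endl;

    // Raw string literal: no escape processing, newlines are kept as written
    std::string raw_str = R"(
Line 1
Line 2
Line 3
)";
    std::cout << raw_str;

    // Wide string: wchar_t elements, whose size is platform-dependent
    std::wstring wide_str = L"Wide string";
    std::wcout << wide_str << std::endl;
    return 0;
}
```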
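The Input/Output Stream System
The Stream Class Hierarchy
C++ I/O is built on the concept of streams. The main classes are related as follows: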
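```
std::ios_base                      (formatting flags and state types, independent of the character type)
└── std::ios                       (stream buffer and error-state management)
    ├── std::istream               (input stream base class)
    │   ├── std::ifstream          (file input stream)
    │   └── std::istringstream     (string input stream)
    ├── std::ostream               (output stream base class)
    │   ├── std::ofstream          (file output stream)
    │   └── std::ostringstream     (string output stream)
    └── std::iostream              (derives from both std::istream and std::ostream)
        ├── std::fstream           (file input/output stream)
        └── std::stringstream      (string input/output stream)

std::cin is an object of type std::istream; std::cout, std::cerr, and std::clog are objects of type std::ostream.
```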
Standard Input and Output Streams
```cpp
#include <iostream>
#include <limits>
#include <string>

int main() {
    // Standard output
    std::cout << "Hello, World!" << std::endl;
    std::cout << "Number: " << 42 << ", Float: " << 3.14 << std::endl;

    // Standard error (unbuffered) and standard log (buffered)
    std::cerr << "Error message" << std::endl;
    std::clog << "Log message" << std::endl;

    // Formatted input: operator>> skips leading whitespace
    int number;
    std::cout << "Enter a number: ";
    if (!(std::cin >> number)) {
        std::cerr << "Invalid number" << std::endl;
        return 1;
    }
    std::cout << "You entered: " << number << std::endl;

    // Reads a single whitespace-delimited word
    std::string name;
    std::cout << "Enter your name: ";
    std::cin >> name;
    std::cout << "Hello, " << name << "!" << std::endl;

    // Discard the rest of the current line, including the newline left
    // behind by operator>>, before reading a whole line
    std::string line;
    std::cout << "Enter a line: ";
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    std::getline(std::cin, line);
    std::cout << "You entered: " << line << std::endl;
    return 0;
}
```
File Streams
```cpp
#include <fstream>
#include <iostream>
#include <string>

int main() {
    // Write a text file
    std::ofstream out_file("example.txt");
    if (out_file.is_open()) {
        out_file << "Hello, File!" << std::endl;
        out_file << "Number: " << 42 << std::endl;
        out_file.close();
    }

    // Read it back line by line
    std::ifstream in_file("example.txt");
    if (in_file.is_open()) {
        std::string line;
        while (std::getline(in_file, line)) {
            std::cout << line << std::endl;
        }
        in_file.close();
    }

    // Write raw bytes; the layout depends on the platform's int size and byte order
    std::ofstream bin_out("data.bin", std::ios::binary);
    int data[] = {1, 2, 3, 4, 5};
    bin_out.write(reinterpret_cast<const char*>(data), sizeof(data));
    bin_out.close();

    // Read the bytes back and verify that the whole array arrived
    std::ifstream bin_in("data.bin", std::ios::binary);
    int read_data[5] = {};
    if (!bin_in.read(reinterpret_cast<char*>(read_data), sizeof(read_data))) {
        std::cerr << "Short read: got " << bin_in.gcount() << " bytes" << std::endl;
    }
    bin_in.close();
    return 0;
}
```
String Streams
String streams perform formatted I/O on an in-memory string, which makes them a convenient tool for converting between strings and other types.
```cpp
#include <iostream>
#include <sstream>
#include <string>

int main() {
    // Output string stream: build a string with stream syntax
    std::ostringstream oss;
    oss << "Name: " << "John" << ", Age: " << 30;
    std::string result = oss.str();
    std::cout << result << std::endl;

    // Input string stream: parse whitespace-separated values
    std::string input = "123 45.67 Hello";
    std::istringstream iss(input);
    int i;
    double d;
    std::string s;
    if (iss >> i >> d >> s) {
        std::cout << "Integer: " << i << std::endl;
        std::cout << "Double: " << d << std::endl;
        std::cout << "String: " << s << std::endl;
    }

    // Bidirectional string stream
    std::stringstream ss;
    ss << "Test " << 123;
    std::string intermediate = ss.str();
    std::cout << "Intermediate: " << intermediate << std::endl;

    // Reuse the stream: str("") empties the buffer, clear() resets the state flags
    ss.str("");
    ss.clear();
    ss << "New value: " << 456;
    std::cout << "New: " << ss.str() << std::endl;
    return 0;
}
```
Formatted Output
Traditional Formatting
```cpp
#include <iomanip>
#include <iostream>

int main() {
    // Integer bases; base manipulators stay in effect until changed
    std::cout << "Decimal: " << 42 << std::endl;
    std::cout << "Octal: " << std::oct << 42 << std::endl;                          // 52
    std::cout << "Hexadecimal: " << std::hex << std::uppercase << 42 << std::endl;  // 2A
    std::cout << std::dec << std::nouppercase;  // restore the defaults

    // Floating-point notation and precision
    double pi = 3.141592653589793;
    std::cout << "Default: " << pi << std::endl;                                      // 3.14159
    std::cout << "Fixed: " << std::fixed << std::setprecision(2) << pi << std::endl;  // 3.14
    std::cout << "Scientific: " << std::scientific << std::setprecision(4) << pi << std::endl;  // 3.1416e+00

    // setw applies to the next output only; alignment and fill are sticky
    std::cout << std::setw(10) << std::left << "Left"
              << std::setw(10) << std::right << "Right" << std::endl;
    std::cout << std::setw(10) << std::setfill('*') << 42 << std::endl;  // ********42
    return 0;
}
```
The C++20 Formatting Library
C++20 introduced the <format> library, a type-safe alternative to printf-style formatting that uses {} replacement fields.
```cpp
#include <format>
#include <iostream>
#include <iterator>
#include <string>

int main() {
    // Replacement fields are filled in argument order
    std::string s1 = std::format("Hello, {}!", "World");
    std::cout << s1 << std::endl;

    std::string s2 = std::format("Name: {}, Age: {}", "John", 30);
    std::cout << s2 << std::endl;

    // Integer presentation types: default decimal, :o octal, :X uppercase hex
    int value = 42;
    std::string s3 = std::format("Decimal: {}, Octal: {:o}, Hex: {:X}", value, value, value);
    std::cout << s3 << std::endl;

    // Precision and notation for floating-point values
    double pi = 3.141592653589793;
    std::string s4 = std::format("Pi: {:.2f}, Scientific: {:.4e}", pi, pi);
    std::cout << s4 << std::endl;

    // Alignment within a field width: < left, > right
    std::string s5 = std::format("{:<10} {:>10}", "Left", "Right");
    std::cout << s5 << std::endl;

    // Format straight into an output iterator, without an intermediate string
    std::format_to(std::ostream_iterator<char>(std::cout), "Formatted: {}", 42);
    std::cout << std::endl;
    return 0;
}
```
Advanced String Operations
String Algorithms
```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>

int main() {
    std::string s = "Hello, World!";

    // Case conversion. The lambda takes unsigned char because passing a
    // negative char value to std::toupper or std::tolower is undefined behavior.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    std::cout << "Uppercase: " << s << std::endl;

    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    std::cout << "Lowercase: " << s << std::endl;

    // Find and replace (the string is lowercase at this point)
    std::size_t pos = s.find("world");
    if (pos != std::string::npos) {
        s.replace(pos, 5, "C++");
    }
    std::cout << "After replacement: " << s << std::endl;

    // Trim leading and trailing whitespace
    std::string s2 = " Hello World ";
    auto start = s2.find_first_not_of(" \t\n\r");
    auto end = s2.find_last_not_of(" \t\n\r");
    if (start != std::string::npos && end != std::string::npos) {
        s2 = s2.substr(start, end - start + 1);
    } else {
        s2.clear();  // the string was empty or all whitespace
    }
    std::cout << "Trimmed: '" << s2 << "'" << std::endl;
    return 0;
}
```
Splitting Strings
```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Splits s on a single-character delimiter using std::getline.
// Note: a trailing empty field is dropped, so "a,b," yields two tokens.
std::vector<std::string> split(const std::string& s, char delimiter) {
    std::vector<std::string> tokens;
    std::string token;
    std::istringstream tokenStream(s);
    while (std::getline(tokenStream, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}

int main() {
    std::string s = "apple,banana,orange,grape";
    std::vector<std::string> fruits = split(s, ',');
    for (const auto& fruit : fruits) {
        std::cout << fruit << std::endl;
    }
    return 0;
}
```
Regular Expressions
C++11 introduced the <regex> library for pattern-based searching, matching, and replacement.
```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
    // Search for the first match
    std::string s = "Hello, my email is user@example.com";
    std::regex email_regex(R"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})");
    std::smatch match;
    if (std::regex_search(s, match, email_regex)) {
        std::cout << "Found email: " << match.str() << std::endl;
    }

    // Replace every match
    std::string phone = "My phone number is 123-456-7890";
    std::regex phone_regex(R"(\d{3}-\d{3}-\d{4})");
    std::string replaced = std::regex_replace(phone, phone_regex, "***-***-****");
    std::cout << "Original: " << phone << std::endl;
    std::cout << "Replaced: " << replaced << std::endl;

    // Iterate over all matches; "555-1234" does not fit the 10-digit pattern
    std::string numbers = "123-456-7890, 987-654-3210, 555-1234";
    std::regex number_regex(R"(\d{3}[-]?\d{3}[-]?\d{4})");
    auto begin = std::sregex_iterator(numbers.begin(), numbers.end(), number_regex);
    auto end = std::sregex_iterator();
    for (std::sregex_iterator i = begin; i != end; ++i) {
        const std::smatch& m = *i;
        std::cout << "Found number: " << m.str() << std::endl;
    }
    return 0;
}
```
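Stream State Management
Stream State Flags
```cpp
#include <fstream>
#include <iostream>

int main() {
    std::ifstream file("non_existent_file.txt");

    // A stream converts to false when failbit or badbit is set
    if (!file) {
        std::cout << "File failed to open" << std::endl;
    }

    // fail(): failbit or badbit is set (open failure, malformed input, ...)
    if (file.fail()) {
        std::cout << "Failbit set" << std::endl;
    }
    // bad(): badbit is set, an unrecoverable I/O error
    if (file.bad()) {
        std::cout << "Badbit set" << std::endl;
    }
    // eof(): eofbit is set, an input operation reached the end of the stream
    if (file.eof()) {
        std::cout << "Eofbit set" << std::endl;
    }

    // Reset all state flags (this does not open the file)
    file.clear();

    // Read loop: stops at end of file or at the first failed extraction
    int value;
    while (file >> value) {
        // process value
    }

    // Distinguish a clean end of input from a failure
    if (file.eof()) {
        std::cout << "End of file reached" << std::endl;
    } else if (file.fail()) {
        std::cout << "Failed to read value" << std::endl;
    }
    return 0;
}
```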
Stream Exceptions
```cpp
#include <fstream>
#include <iostream>

int main() {
    std::ifstream file;
    // Ask the stream to throw std::ios_base::failure when failbit or badbit is set.
    // With failbit enabled, a read that hits end of file throws as well.
    file.exceptions(std::ifstream::failbit | std::ifstream::badbit);

    try {
        file.open("non_existent_file.txt");  // throws: the file does not exist
        int value;
        file >> value;
    } catch (const std::ifstream::failure& e) {
        std::cout << "Exception: " << e.what() << std::endl;
    }
    return 0;
}
```
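Performance Optimization Strategies
Optimizing String Operations
```cpp
#include <string>
#include <string_view>
#include <utility>

// Taking a std::string_view accepts std::string, literals, and substrings
// without copying
void process_string(std::string_view sv) {
    (void)sv;  // placeholder body
}

int main() {
    // Reserve the final size up front: 1000 appends of 4 characters each
    std::string s;
    s.reserve(4000);
    for (int i = 0; i < 1000; ++i) {
        s += "test";
    }

    // Prefer in-place append over s = s + ...
    std::string s1, s2;
    s1.reserve(4000);
    s2.reserve(4000);
    for (int i = 0; i < 1000; ++i) {
        s1.append("test");  // appends in place
        s2 = s2 + "test";   // builds a temporary string on every iteration
    }

    // Move instead of copy
    std::string large_string(10000, 'x');
    std::string s3 = std::move(large_string);  // large_string is now valid but unspecified

    process_string(s3);
    process_string("literal");
    return 0;
}
```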
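Optimizing Stream Operations
```cpp
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

int main() {
    // Decouple iostreams from C stdio; afterwards, do not mix std::cout with printf
    std::ios_base::sync_with_stdio(false);
    // Stop std::cin from flushing std::cout before every read
    std::cin.tie(nullptr);

    // Read a whole file into a string in one pass
    std::ifstream file("large_file.txt");
    std::string buffer((std::istreambuf_iterator<char>(file)),
                       std::istreambuf_iterator<char>());

    // Write a block of binary data with a single call
    std::ofstream bin_file("data.bin", std::ios::binary);
    int data[1000] = {};  // zero-initialized so no indeterminate values are written
    bin_file.write(reinterpret_cast<const char*>(data), sizeof(data));

    // unitbuf flushes after every output operation (slow);
    // nounitbuf restores normal buffering
    std::cout << std::unitbuf;
    std::cout << std::nounitbuf;
    return 0;
}
```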
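Memory-Mapped Files
For large files, memory mapping can perform better than stream reads because the operating system pages the data in on demand. The C++ standard library has no memory-mapped file facility; the example below uses Boost.Iostreams.
```cpp
#include <cstddef>
#include <exception>
#include <iostream>
#include <string_view>
#include <boost/iostreams/device/mapped_file.hpp>

int main() {
    try {
        boost::iostreams::mapped_file_source file;
        file.open("large_file.txt");  // may throw if the file cannot be mapped
        if (file.is_open()) {
            const char* data = file.data();
            std::size_t size = file.size();

            // View the mapped bytes in place; constructing a std::string here
            // would copy the whole file and defeat the purpose of mapping it
            std::string_view content(data, size);
            std::cout << "File size: " << content.size() << " bytes" << std::endl;
            file.close();  // content must not be used after this point
        }
    } catch (const std::exception& e) {
        std::cerr << "Mapping failed: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}
```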
Practical Examples
Example 1: Parsing a Configuration File
```cpp
#include <cstddef>
#include <exception>
#include <fstream>
#include <iostream>
#include <map>
#include <string>

class ConfigParser {
public:
    // Loads "key = value" lines; blank lines and lines starting with '#' are skipped
    bool load(const std::string& filename) {
        std::ifstream file(filename);
        if (!file) {
            return false;
        }
        std::string line;
        while (std::getline(file, line)) {
            trim(line);  // also removes a trailing '\r' from Windows line endings
            if (line.empty() || line[0] == '#') {
                continue;
            }
            std::size_t pos = line.find('=');
            if (pos != std::string::npos) {
                std::string key = line.substr(0, pos);
                std::string value = line.substr(pos + 1);
                trim(key);
                trim(value);
                config_[key] = value;
            }
        }
        return true;
    }

    std::string get(const std::string& key, const std::string& default_value = "") const {
        auto it = config_.find(key);
        return (it != config_.end()) ? it->second : default_value;
    }

    int get_int(const std::string& key, int default_value = 0) const {
        auto it = config_.find(key);
        if (it == config_.end()) {
            return default_value;
        }
        try {
            return std::stoi(it->second);
        } catch (const std::exception&) {
            return default_value;  // not a number, or out of range
        }
    }

    double get_double(const std::string& key, double default_value = 0.0) const {
        auto it = config_.find(key);
        if (it == config_.end()) {
            return default_value;
        }
        try {
            return std::stod(it->second);
        } catch (const std::exception&) {
            return default_value;  // not a number, or out of range
        }
    }

private:
    // Removes leading and trailing whitespace in place
    static void trim(std::string& s) {
        const char* whitespace = " \t\n\r";
        const std::size_t start = s.find_first_not_of(whitespace);
        if (start == std::string::npos) {
            s.clear();
            return;
        }
        const std::size_t end = s.find_last_not_of(whitespace);
        s = s.substr(start, end - start + 1);
    }

    std::map<std::string, std::string> config_;
};

int main() {
    ConfigParser config;
    if (config.load("config.txt")) {
        std::string name = config.get("name", "Default");
        int port = config.get_int("port", 8080);
        double timeout = config.get_double("timeout", 30.0);
        std::cout << "Name: " << name << std::endl;
        std::cout << "Port: " << port << std::endl;
        std::cout << "Timeout: " << timeout << std::endl;
    } else {
        std::cout << "Failed to load config file" << std::endl;
    }
    return 0;
}
```
Example 2: A Logging System
```cpp
#include <chrono>
#include <ctime>
#include <exception>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

class Logger {
public:
    // Note: <windows.h> defines an ERROR macro that clashes with this enumerator
    enum Level { DEBUG, INFO, WARN, ERROR, FATAL };

    // Opens the log file in append mode; throws if it cannot be opened
    explicit Logger(const std::string& filename) : file_(filename, std::ios::app) {
        if (!file_) {
            throw std::runtime_error("Failed to open log file");
        }
    }
    // No destructor is needed: std::ofstream closes the file itself (RAII)

    // Writes one line to both the log file and the console
    void log(Level level, const std::string& message) {
        std::string level_str = levelToString(level);
        std::string timestamp = getTimestamp();
        std::string log_message = "[" + timestamp + "] [" + level_str + "] " + message;
        file_ << log_message << std::endl;
        std::cout << log_message << std::endl;
    }

    void debug(const std::string& message) { log(DEBUG, message); }
    void info(const std::string& message) { log(INFO, message); }
    void warn(const std::string& message) { log(WARN, message); }
    void error(const std::string& message) { log(ERROR, message); }
    void fatal(const std::string& message) { log(FATAL, message); }

private:
    std::ofstream file_;

    static std::string levelToString(Level level) {
        switch (level) {
            case DEBUG: return "DEBUG";
            case INFO:  return "INFO";
            case WARN:  return "WARN";
            case ERROR: return "ERROR";
            case FATAL: return "FATAL";
            default:    return "UNKNOWN";
        }
    }

    // Formats the current local time as YYYY-MM-DD HH:MM:SS.
    // std::localtime uses a shared static buffer and is not thread-safe.
    static std::string getTimestamp() {
        auto now = std::chrono::system_clock::now();
        auto now_c = std::chrono::system_clock::to_time_t(now);
        std::stringstream ss;
        ss << std::put_time(std::localtime(&now_c), "%Y-%m-%d %H:%M:%S");
        return ss.str();
    }
};

int main() {
    try {
        Logger logger("application.log");
        logger.info("Application started");
        logger.debug("Initializing components");
        logger.warn("Configuration file not found, using defaults");
        logger.error("Failed to connect to database");
        logger.fatal("Critical error, shutting down");
    } catch (const std::exception& e) {
        std::cerr << "Logger error: " << e.what() << std::endl;
    }
    return 0;
}
```
Example 3: Parsing a CSV File
```cpp
#include <exception>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

class CSVReader {
public:
    explicit CSVReader(const std::string& filename, char delimiter = ',')
        : filename_(filename), delimiter_(delimiter) {}

    // Reads the whole file; each row is a vector of fields
    std::vector<std::vector<std::string>> read() {
        std::vector<std::vector<std::string>> data;
        std::ifstream file(filename_);
        if (!file.is_open()) {
            throw std::runtime_error("Failed to open CSV file");
        }
        std::string line;
        while (std::getline(file, line)) {
            data.push_back(parseLine(line));
        }
        return data;  // the ifstream destructor closes the file
    }

private:
    std::string filename_;
    char delimiter_;

    // Splits one line on the delimiter, ignoring delimiters inside double quotes.
    // Limitations: escaped quotes ("") and quoted fields that span several
    // lines are not handled.
    std::vector<std::string> parseLine(const std::string& line) const {
        std::vector<std::string> tokens;
        bool inQuotes = false;
        std::string currentToken;
        for (char c : line) {
            if (c == '"') {
                inQuotes = !inQuotes;  // quotes toggle the state and are dropped
            } else if (c == delimiter_ && !inQuotes) {
                tokens.push_back(currentToken);
                currentToken.clear();
            } else {
                currentToken += c;
            }
        }
        tokens.push_back(currentToken);  // last field
        return tokens;
    }
};

int main() {
    try {
        CSVReader reader("data.csv");
        std::vector<std::vector<std::string>> data = reader.read();
        for (const auto& row : data) {
            for (const auto& cell : row) {
                std::cout << cell << "\t";
            }
            std::cout << std::endl;
        }
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }
    return 0;
}
```
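Best Practices and Caveats
1. String handling best practices
- Prefer std::string to C-style strings: std::string manages its own memory and is safer and easier to use.
- Prefer std::string_view for read-only access: it avoids unnecessary copies.
- Pre-allocate capacity: call reserve() before building a string through many appends.
- Use move semantics: std::move transfers a large string's buffer instead of copying it.
- Mind the encoding: std::string stores bytes, so size() and indexing count UTF-8 code units, not characters.
- Use the standard algorithms: the <algorithm> library works on strings through their iterators.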
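2. Stream best practices
- Manage stream lifetimes: make sure file streams are opened successfully and closed when no longer needed.
- Check stream state: test the stream after each read or write to confirm that the operation succeeded.
- Manage streams with RAII: let constructors and destructors open and close the underlying resource.
- Optimize I/O: for heavy I/O, rely on buffering and read or write in bulk.
- Disable synchronization: for performance-critical console I/O, consider std::ios_base::sync_with_stdio(false).
- Use string streams for type conversion: std::stringstream is type-safe; check the stream state afterwards to detect a failed conversion.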
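3. Performance notes
- Measure, don't guess: use a profiler to find the real bottlenecks.
- Avoid frequent small string allocations: use std::string::reserve and move semantics.
- Batch I/O: fewer system calls mean higher I/O throughput.
- Consider memory mapping: for large files, memory mapping may perform better.
- Choose the right string type: use std::string when the text must be owned or modified, and std::string_view when it is only read.
- Watch regular expression performance: complex patterns can be slow, so consider simpler alternatives such as find() where they suffice.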
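4. Safety notes
- Avoid out-of-bounds access: use std::string's at() for bounds-checked element access.
- Handle exceptions properly: file operations can fail, so catch and handle the resulting exceptions.
- Validate input: validate and sanitize strings that come from external sources.
- Do not rely on a moved-from string: it is in a valid but unspecified state, so assign to it before reading it again.
- Mind lifetimes: when using std::string_view, make sure the underlying string outlives the view.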
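Summary
Strings and streams are fundamental to C++ programming. This chapter covered:
- String handling: from the basic operations of std::string to memory management, the small string optimization, and std::string_view from C++17.
- The stream system: the stream class hierarchy, standard I/O, file I/O, and string streams.
- Formatted output: from the traditional manipulators to the C++20 formatting library.
- Advanced operations: string algorithms, regular expressions, and stream state management.
- Performance: optimization strategies for string and stream operations.
- Practical examples: configuration file parsing, a logging system, and CSV file parsing.
With these tools you can handle strings and streams more efficiently and safely, and write C++ programs that are more robust and perform better.
In real projects, choose the string type and stream technique that fit the requirements at hand, balancing ease of use, performance, and safety.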