diff --git a/CppCoreGuidelines.md b/CppCoreGuidelines.md
index 23fa877..a1558f9 100644
--- a/CppCoreGuidelines.md
+++ b/CppCoreGuidelines.md
@@ -1,6 +1,6 @@
 # C++ Core Guidelines
 
-April 9, 2017
+April 16, 2017
 
 Editors:
 
@@ -2078,7 +2078,7 @@ Parameter passing semantic rules:
 * [F.22: Use `T*` or `owner<T*>` or a smart pointer to designate a single object](#Rf-ptr)
 * [F.23: Use a `not_null<T>` to indicate "null" is not a valid value](#Rf-nullptr)
 * [F.24: Use a `span<T>` or a `span_p<T>` to designate a half-open sequence](#Rf-range)
-* [F.25: Use a `zstring` or a `not_null<zstring>` to designate a C-style string](#Rf-string)
+* [F.25: Use a `zstring` or a `not_null<zstring>` to designate a C-style string](#Rf-zstring)
 * [F.26: Use a `unique_ptr<T>` to transfer ownership where a pointer is needed](#Rf-unique_ptr)
 * [F.27: Use a `shared_ptr<T>` to share ownership](#Rf-shared_ptr)
 
@@ -3066,7 +3066,7 @@ Passing a `span` object as an argument is exactly as efficient as passing a pair
 
 (Complex) Warn where accesses to pointer parameters are bounded by other parameters that are integral types and suggest they could use `span` instead.
 
-### F.25: Use a `zstring` or a `not_null<zstring>` to designate a C-style string
+### F.25: Use a `zstring` or a `not_null<zstring>` to designate a C-style string
 
 ##### Reason
 
@@ -17102,8 +17102,278 @@ If you have a good reason to use another container, use that instead. For exampl
 
 ## SL.str: String
 
+Text manipulation is a huge topic.
+`std::string` doesn't cover all of it.
+This section primarily tries to clarify `std::string`'s relation to `char*`, `zstring`, `string_view`, and `gsl::string_span`.
+The important issue of non-ASCII character sets and encodings (e.g., `wchar_t`, Unicode, and UTF-8) will be covered elsewhere.
+
+See also [regular expressions](#SS-regex).
+
+Here, we use "sequence of characters" or "string" to refer to a sequence of characters meant to be read as text (somehow, eventually).
+We don't consider
+
+String summary:
+
+* [SL.str.1: Use `std::string` to own character sequences](#Rstr-string)
+* [SL.str.2: Use `std::string_view` or `gsl::string_span` to refer to character sequences](#Rstr-view)
+* [SL.str.3: Use `zstring` or `czstring` to refer to a C-style, zero-terminated, sequence of characters](#Rstr-zstring)
+* [SL.str.4: Use `char*` to refer to a single character](#Rstr-char*)
+* [SL.str.5: Use `std::byte` to refer to byte values that do not necessarily represent characters](#Rstr-byte)
+
+* [SL.str.10: Use `std::string` when you need to perform locale-sensitive string operations](#Rstr-locale)
+* [SL.str.11: Use `gsl::string_span` rather than `std::string_view` when you need to mutate a string](#Rstr-span)
+* [SL.str.12: Use the `s` suffix for string literals meant to be standard-library `string`s](#Rstr-s)
+
+See also
+
+* [F.24 span](#Rf-range)
+* [F.25 zstring](#Rf-zstring)
+
+
+### SL.str.1: Use `std::string` to own character sequences
+
+##### Reason
+
+`string` correctly handles allocation, ownership, copying, gradual expansion, and offers a variety of useful operations.
+
+##### Example
+
+    vector<string> read_until(const string& terminator)
+    {
+        vector<string> res;
+        for (string s; cin>>s && s!=terminator; ) // read a word
+            res.push_back(s);
+        return res;
+    }
+
+Note how `>>` and `!=` are provided for `string` (as examples of useful operations) and that there are no explicit
+allocations, deallocations, or range checks (`string` takes care of those).
+
+In C++17, we might use `string_view` as the argument, rather than `const string&`, to allow more flexibility to callers:
+
+    vector<string> read_until(string_view terminator)   // C++17
+    {
+        vector<string> res;
+        for (string s; cin>>s && s!=terminator; ) // read a word
+            res.push_back(s);
+        return res;
+    }
+
+The `gsl::string_span` is a current alternative offering most of the benefits of `std::string_view` for simple examples:
+
+    vector<string> read_until(string_span terminator)
+    {
+        vector<string> res;
+        for (string s; cin>>s && s!=terminator; ) // read a word
+            res.push_back(s);
+        return res;
+    }
+
+##### Example, bad
+
+Don't use C-style strings for operations that require non-trivial memory management:
+
+    char* cat(const char* s1, const char* s2)   // beware!
+        // return s1 + '.' + s2
+    {
+        int l1 = strlen(s1);
+        int l2 = strlen(s2);
+        char* p = (char*)malloc(l1+l2+2);
+        strncpy(p,s1,l1);
+        p[l1] = '.';
+        strncpy(p+l1+1,s2,l2);
+        p[l1+l2+1] = 0;
+        return p;
+    }
+
+Did we get that right?
+Will the caller remember to `free()` the returned pointer?
+Will this code pass a security review?
+
+##### Note
+
+Do not assume that `string` is slower than lower-level techniques without measurement, and remember that not all code is performance critical.
+[Don't optimize prematurely](#Rper-Knuth).
+
+##### Enforcement
+
 ???
+### SL.str.2: Use `std::string_view` or `gsl::string_span` to refer to character sequences
+
+##### Reason
+
+`std::string_view` or `gsl::string_span` provides simple and (potentially) safe access to character sequences independently of how
+those sequences are allocated and stored.
+
+##### Example
+
+    vector<string> read_until(string_span terminator);
+
+    void user(zstring p, const string& s, string_span ss)
+    {
+        auto v1 = read_until(p);
+        auto v2 = read_until(s);
+        auto v3 = read_until(ss);
+        // ...
+    }
+
+##### Note
+
+???
+
+##### Enforcement
+
+???
+
+### SL.str.3: Use `zstring` or `czstring` to refer to a C-style, zero-terminated, sequence of characters
+
+##### Reason
+
+Readability.
+Statement of intent.
+A plain `char*` can be a pointer to a single character, a pointer to an array of characters, a pointer to a C-style (zero-terminated) string, or even a pointer to a small integer.
+Distinguishing these alternatives prevents misunderstandings and bugs.
+
+##### Example
+
+    void f1(const char* s); // s is probably a string
+
+All we know is that it is supposed to be the nullptr or point to at least one character.
+
+    void f1(zstring s);     // s is a C-style string or the nullptr
+    void f1(czstring s);    // s is a C-style string constant or the nullptr
+    void f1(std::byte* s);  // s is a pointer to a byte (C++17)
+
+##### Note
+
+Don't convert a C-style string to `string` unless there is a reason to.
+
+##### Note
+
+Like any other "plain pointer", a `zstring` should not represent ownership.
+
+##### Note
+
+There are billions of lines of C++ "out there"; most use `char*` and `const char*` without documenting intent.
+They are used in a wide variety of ways, including to represent ownership and as generic pointers to memory (instead of `void*`).
+It is hard to separate these uses, so this guideline is hard to follow.
+This is one of the major sources of bugs in C and C++ programs, so it is worthwhile to follow this guideline wherever feasible.
+
+##### Enforcement
+
+* Flag uses of `[]` on a `char*`
+* Flag uses of `delete` on a `char*`
+* Flag uses of `free()` on a `char*`
+
+### SL.str.4: Use `char*` to refer to a single character
+
+##### Reason
+
+The variety of uses of `char*` in current code is a major source of errors.
+
+##### Example, bad
+
+    char arr[] = {'a', 'b', 'c'};
+
+    void print(const char* p)
+    {
+        cout << p << '\n';
+    }
+
+    void use()
+    {
+        print(arr);   // run-time error; potentially very bad
+    }
+
+The array `arr` is not a C-style string because it is not zero-terminated.
+
+##### Alternative
+
+See [`zstring`](#Rstr-zstring), [`string`](#Rstr-string), and [`string_span`](#Rstr-view).
+
+##### Enforcement
+
+* Flag uses of `[]` on a `char*`
+
+### SL.str.5: Use `std::byte` to refer to byte values that do not necessarily represent characters
+
+##### Reason
+
+Use of `char*` to represent a pointer to something that is not necessarily a character causes confusion
+and disables valuable optimizations.
+
+##### Example
+
+    ???
+
+##### Note
+
+C++17
+
+##### Enforcement
+
+???
+
+
+### SL.str.10: Use `std::string` when you need to perform locale-sensitive string operations
+
+##### Reason
+
+`std::string` supports the standard-library [`locale` facilities](#Rstr-locale).
+
+##### Example
+
+    ???
+
+##### Note
+
+???
+
+##### Enforcement
+
+???
+### SL.str.11: Use `gsl::string_span` rather than `std::string_view` when you need to mutate a string
+
+##### Reason
+
+`std::string_view` is read-only.
+
+##### Example
+
+???
+
+##### Note
+
+???
+
+##### Enforcement
+
+The compiler will flag attempts to write to a `string_view`.
+
+### SL.str.12: Use the `s` suffix for string literals meant to be standard-library `string`s
+
+##### Reason
+
+Direct expression of an idea minimizes mistakes.
+
+##### Example
+
+    auto pp1 = make_pair("Tokyo",9.00);          // {C-style string,double} intended?
+    pair<string, double> pp2 = {"Tokyo",9.00};   // a bit verbose
+    auto pp3 = make_pair("Tokyo"s,9.00);         // {std::string,double}    // C++14
+    pair pp4 = {"Tokyo"s,9.00};                  // {std::string,double}    // C++17
+
+
+##### Note
+
+C++17
+
+##### Enforcement
+
+???
+
+
 ## SL.io: Iostream
 
 ???