From 8b6d79573257fa11672ca03020ff571e0ee1a2cf Mon Sep 17 00:00:00 2001
From: hsutter
Date: Mon, 17 Apr 2017 11:09:46 -0700
Subject: [PATCH] Added first draft of GSL intro

---
 docs/gsl-intro.md | 314 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 314 insertions(+)
 create mode 100644 docs/gsl-intro.md

diff --git a/docs/gsl-intro.md b/docs/gsl-intro.md
new file mode 100644
index 0000000..b75cea1
--- /dev/null
+++ b/docs/gsl-intro.md
@@ -0,0 +1,314 @@
+
+# Using the Guidelines Support Library (GSL): A Tutorial and FAQ
+
+by Herb Sutter
+
+updated 2017-04-17
+
+
+## Overview: "Is this document a tutorial or a FAQ?"
+
+It aims to be both:
+
+- a tutorial you can read in order, in a style similar to the introduction of [K&R](https://en.wikipedia.org/wiki/The_C_Programming_Language), building up examples of increasing complexity; and
+
+- a FAQ you can use as a reference, with each section showing the answer to a specific question.
+
+
+## Motivation: "Why would I use GSL, and where can I get it?"
+
+First look at the [C++ Core Guidelines](https://github.com/isocpp/CppCoreGuidelines); the GSL is a support library for that document. Select a set of guidelines you want to adopt, then bring in the GSL as directed by those guidelines.
+
+You can try out the examples in this document on all major compilers and platforms using [this GSL reference implementation](https://github.com/microsoft/gsl).
+
+# gsl::span: "What is gsl::span for?"
+
+`gsl::span<T>` is a replacement for `(pointer, length)` pairs to refer to a sequence of contiguous objects. It can be thought of as a pointer to an array that also knows its bounds.
+
+## span parameters: "How should I choose between span and traditional (ptr, length) parameters?"
+
+In new code, prefer the bounds-checkable `span<T>` instead of separate pointer and length parameters. In older code, adopt `span` where reasonable as you maintain the code.
+
+A function that takes a pointer to an array and a separate length, such as:
+
+~~~cpp
+// Error-prone: Process n contiguous ints starting at *p
+void dangerous_process_ints(const int* p, size_t n);
+~~~
+
+is error-prone and difficult to use correctly:
+
+~~~cpp
+int a[100];
+dangerous_process_ints(a, 1000);           // oops: buffer overflow
+
+vector<int> v(200);
+dangerous_process_ints(v.data(), 1000);    // oops: buffer overflow
+
+auto remainder = find(v.begin(), v.end(), some_value);
+    // now call dangerous_process_ints() to process the rest of the container from *remainder to the end
+dangerous_process_ints(&*remainder, v.end() - remainder);    // correct but convoluted
+~~~
+
+Instead, using `span` encapsulates the pointer and the length:
+
+~~~cpp
+// BETTER: Process s.size() contiguous ints starting at s[0]
+void process_ints(span<const int> s);
+~~~
+
+which makes `process_ints` easier to use correctly because it conveniently deduces from common types:
+
+~~~cpp
+int a[100];
+process_ints(a);    // deduces correct length: 100 (constructs the span from an array)
+
+vector<int> v(200);
+process_ints(v);    // deduces correct length: 200 (constructs the span from a container)
+~~~
+
+and conveniently supports modern C++ argument initialization when the calling code has the range as two separate values, such as an iterator pair:
+
+~~~cpp
+auto remainder = find(v.begin(), v.end(), some_value);
+    // now call process_ints() to process the rest of the container from *remainder to the end
+process_ints({remainder, v.end()});    // correct and clear (constructs the span from an iterator pair)
+~~~
+
+> Things to remember
+> - Prefer `span<T>` instead of (pointer, length) pairs.
+> - Pass a `span` like a pointer (i.e., by value for "in" parameters). Treat it like a pointer range.
+
+
+## span and const: "What's the difference between `span<const T>` and `const span<T>`?"
+
+`span<const T>` means that the `T` objects are read-only. Prefer this by default, especially as a parameter, if you don't need to modify the `T`s.
+
+`const span<T>` means that the `span` itself can't be made to point at a different target.
+
+`const span<const T>` means both.
+
+> Things to remember
+> - Prefer a `span<const T>` by default to denote that the contents are read-only, unless you do need read-write access.
+
+
+## Iteration: "How do I iterate over a span?"
+
+A `span` is an encapsulated range, and so can be visited using a range-based `for` loop.
+
+Consider the implementation of a function like the `process_ints` that we saw in an earlier example. Visiting every object using a (pointer, length) pair requires an explicit index:
+
+~~~cpp
+void dangerous_process_ints(int* p, size_t n) {
+    for (size_t i = 0; i < n; ++i) {    // index type matches n to avoid a signed/unsigned comparison
+        p[i] = next_character();
+    }
+
+    // or:
+    // while (n--) {
+    //     *p++ = next_character();     // must advance p by hand
+    // }
+}
+~~~
+
+A `span` supports range-`for`:
+
+~~~cpp
+void process_ints(span<int> s) {
+    for (auto& c : s) {
+        c = next_character();
+    }
+}
+~~~
+
+
+## Sub-spans: "What if I need a subrange of a span?"
+
+To refer to a sub-span, use `first`, `last`, or `subspan`.
+
+~~~cpp
+void process_ints(span<int> s) {
+    if (s.size() > 10) {
+        read_header(s.first(10));    // first 10 entries
+        read_rest(s.subspan(10));    // remaining entries
+        // ...
+    }
+}
+~~~
+
+In rarer cases, when you know the number of elements at compile time and want to enable `constexpr` use of `span`, you can pass the length of the sub-span as a template argument:
+
+~~~cpp
+constexpr int process_ints(span<int> s) {
+    if (s.size() > 10) {
+        read_header(s.first<10>());    // first 10 entries
+        read_rest(s.subspan<10>());    // remaining entries
+        // ...
+    }
+    return s.size();
+}
+~~~
+
+
+## span and STL: "How do I pass a span to an STL-style [begin,end) function?"
+
+Use `span::iterator`s. A `span` is iterable like any STL range.
+
+To call an STL `[begin,end)`-style interface, use `begin` and `end` by default, or other valid iterators if you don't want to pass the whole range:
+
+~~~cpp
+void f(span<const int> s) {
+    // ...
+    auto found = find(s.begin(), s.end(), some_value);
+    // ...
+}
+~~~
+
+If you are using a range-based algorithm such as from [Range-V3](https://github.com/ericniebler/range-v3), you can use a `span` as a range directly:
+
+~~~cpp
+void f(span<const int> s) {
+    // ...
+    auto found = find(s, some_value);
+    // ...
+}
+~~~
+
+
+## Comparison: "When I compare `span<T>`s, do I compare the `T` values or the underlying pointers?"
+
+Comparing two `span<T>`s compares the `T` values. To compare two `span`s for identity, to see if they're pointing to the same thing, use `.data()`.
+
+~~~cpp
+int a[] = { 1, 2, 3 };
+span<int> sa{a};
+
+vector<int> v = { 1, 2, 3 };
+span<int> sv{v};
+
+assert(sa == sv);                  // sa and sv both point to contiguous ints with values 1, 2, 3
+assert(sa.data() != sv.data());    // but sa and sv point to different memory areas
+~~~
+
+> Things to remember
+> - Comparing `span`s compares their contents, not whether they point to the same location.
+
+
+## Empty vs null: "Do I have to explicitly check whether a span is null?"
+
+Usually not, because what you usually want to check is that the `span` is not empty, which means its size is not zero. It's safe to test the size of a span even if it's null.
+
+Remember that the following all have identical meaning for a `span s`:
+
+- `!s.empty()`
+- `s.size() != 0`
+- `s.data() != nullptr && s.size() != 0` (the first condition is actually redundant)
+
+The following is also functionally equivalent as it just tests whether there are zero elements:
+
+- `s != nullptr` (compares `s` against a null-constructed empty `span`)
+
+For example:
+
+~~~cpp
+void f(span<int> s) {
+    if (s != nullptr && s.size() > 0) {    // bad: redundant, overkill
+        // ...
+    }
+
+    if (s.size() > 0) {                    // good: not redundant
+        // ...
+    }
+
+    if (!s.empty()) {                      // good: same as "s.size() > 0"
+        // ...
+    }
+}
+~~~
+
+> Things to remember
+> - Usually you shouldn't check for a null `span<T>`. For a `span<T> s`, if you're comparing `s != nullptr` or `s.data() != nullptr`, check to make sure you shouldn't just be asking `!s.empty()`.
+
+
+## as_bytes: "Why would I convert a span to `span<const byte>`?"
+
+Because it's a type-safe way to get a read-only view of the objects' bytes.
+
+Without `span`, to view the bytes of an object requires writing a brittle cast:
+
+~~~cpp
+void serialize(char* p, int length);    // bad: forgot const
+
+void f(widget* p, int length) {
+    // serialize one object's bytes (incl. padding)
+    serialize(reinterpret_cast<char*>(p), 1);    // bad: copies just the first byte, forgot sizeof(widget)
+}
+~~~
+
+With `span` the code is safer and cleaner:
+
+~~~cpp
+void serialize(span<const byte>);    // can't forget const: the result of as_bytes would not convert to span<byte>
+
+void f(span<widget> s) {
+    // ...
+    // serialize the objects' bytes (incl. padding)
+    serialize(as_bytes(s));    // ok
+}
+~~~
+
+Also, `span<T>` lets you distinguish between `.size()` and `.size_bytes()`; make use of that distinction instead of multiplying by `sizeof(T)`.
+
+> Things to remember
+> - Prefer `span<T>`'s `.size_bytes()` instead of `.size() * sizeof(T)`.
+
+
+
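The distinction between `.size()` and `.size_bytes()` can be sketched without GSL itself. The example below is only an illustration: `mini_span`, `bytes_of`, `serialize`, and `serialize_widgets` are hypothetical names invented here, `mini_span` is a toy stand-in rather than `gsl::span`, and `unsigned char` stands in for `byte`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for gsl::span, reduced to what this sketch needs.
template <class T>
struct mini_span {
    T* ptr;
    std::size_t count;

    std::size_t size() const { return count; }                      // number of elements
    std::size_t size_bytes() const { return count * sizeof(T); }    // number of bytes
};

// Read-only byte view of the same memory (hypothetical analogue of as_bytes).
template <class T>
mini_span<const unsigned char> bytes_of(mini_span<T> s) {
    return { reinterpret_cast<const unsigned char*>(s.ptr), s.size_bytes() };
}

// Example element type; padding typically sits between tag and value.
struct widget {
    char tag;
    int value;
};

// Hypothetical serializer: it only reports how many bytes it was handed.
std::size_t serialize(mini_span<const unsigned char> bytes) {
    return bytes.size();
}

// Serializes every widget in s, padding included, with no sizeof arithmetic at the call site.
std::size_t serialize_widgets(mini_span<widget> s) {
    mini_span<const unsigned char> bytes = bytes_of(s);
    assert(bytes.size() == s.size_bytes());    // the byte view covers exactly the elements
    return serialize(bytes);
}
```

The caller never multiplies by `sizeof(widget)`: the element count and the byte count each come from the view itself, which is the point of the guideline.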


+# *** TODO - Other span suggestions and questions back to Bjarne and Neil
+
+Bjarne suggested:
+
+- given an STL style interface ([b:e)), how do I implement it using a span?
+
+HS: I couldn't think of an example so I skipped this
+
+- show a use of string_span
+
+HS: I think we're dropping this, so it doesn't need an example, right?
+
+- I would concentrate on span and push not_null(), narrow(), and friends to a separate note.
+
+HS: OK, stopping with the above for now -- what more can we say about span?
+
+- I would be happy to review a rough draft.
+
+HS: Here you go! :)
+
+Neil suggested:
+
+- some guidance on how to deal with standard lib container size_t vs span ptrdiff_t mismatch.
+
+HS: Do you have an example in mind?
+
+




+# MORE RAW NOTES
+
+I'll continue with more of these, possibly in a separate note as Bjarne suggests a few lines above, if everyone agrees.
+
+## Neil
+
+- use `byte` everywhere you are handling memory (as opposed to characters or integers)
+
+- use `narrow()` when you cannot afford to be surprised by a value change during conversion to a smaller range (including conversions between signed and unsigned)
+
+- use `narrow_cast()` when you are *sure* you won’t be surprised by a value change during conversion to a smaller range
+
+> - pass `not_null` by value
+
+I suspect this isn't right -- I think it should be "pass `not_null<T>` the same as `T`". For example, `not_null<int*>` should be passed by value, but `not_null<shared_ptr<int>>` should probably be passed by `const&`.
+
+- use `not_null<T*>` on any raw pointer parameter that should never contain nullptr
+
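The difference between the `narrow()` and `narrow_cast()` notes above can be sketched as follows. This is an illustration of the idea only, not GSL's code: `checked_narrow` and `unchecked_narrow` are hypothetical names, and throwing `std::runtime_error` is an assumption made for this sketch.

```cpp
#include <stdexcept>
#include <type_traits>

// Unchecked: a searchable spelling of static_cast, for when the caller is
// sure the value fits (the role the notes give to narrow_cast).
template <class To, class From>
constexpr To unchecked_narrow(From value) noexcept {
    return static_cast<To>(value);
}

// Checked: refuses to return a value that differs from the input
// (the role the notes give to narrow). Hypothetical error type.
template <class To, class From>
To checked_narrow(From value) {
    const To result = static_cast<To>(value);

    // Converting back must reproduce the original value.
    if (static_cast<From>(result) != value) {
        throw std::runtime_error("narrowing changed the value");
    }

    // Between signed and unsigned types the round trip can succeed even
    // though the sign flipped, so compare the signs explicitly.
    if (std::is_signed<To>::value != std::is_signed<From>::value &&
        ((result < To{}) != (value < From{}))) {
        throw std::runtime_error("narrowing changed the sign");
    }

    return result;
}
```

For instance, `checked_narrow<unsigned char>(42)` returns 42, while `checked_narrow<unsigned char>(300)` and `checked_narrow<unsigned>(-1)` both throw in this sketch; `unchecked_narrow` would silently return the converted value in each case.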