Basti’s Buggy Blog

Poking at software.

When to Use C++17 std::string_view?

C++17 brings many fancy new things, many of which significantly improve the readability and expressiveness of code (if constexpr, std::optional). However, there are also features like std::string_view, which can be footguns in disguise. This article gives a short overview of the possible (mis-)uses of this “new” class.

TL;DR

string_view can improve performance if used correctly, but provides many possibilities to shoot yourself in the foot. If in doubt, stick with strings.

Function parameters:

  • Do use string_view wherever you would use const string&
  • Don’t assume a string_view is NUL-terminated
    • Construct a string, if NUL-termination is required
    • Use const char* otherwise
  • Don’t store string_view past the lifetime of the function
    • If you need to store the string, construct a std::string
    • If the function requires a string_view with static lifetime, use a different parameter type (e.g. string_view*) to make mistakes harder (other suggestions welcome)
  • Don’t use string_view& or const string_view& (the class is tiny)

Return types:

  • Don’t use string_view, as it is hard for the callee to guarantee that the string_view’s data outlive the returned object
    • Exception: The returned string_view is a substring of a passed argument.
  • Don’t use string_view in getters, not even in getters for constant members

Class members:

  • Don’t use string_view, unless you are sure they have a static lifetime

Static Strings

  • Do initialize static strings with the sv literal suffix
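
A minimal sketch of that last point, using made-up constant names (kGreeting, kWithNul and kTruncated are hypothetical). The sv suffix from std::literals builds the view from the literal’s known length, so an embedded NUL does not cut the string short, whereas the const char* constructor has to search for the terminator:

```cpp
#include <string_view>
using namespace std::literals;

// Hypothetical constants. String literals have static storage duration,
// so views of them never dangle.
constexpr std::string_view kGreeting = "Hello World!"sv;

// The sv suffix uses the literal's full length, embedded NUL included.
constexpr std::string_view kWithNul = "Hello\0World"sv;

// The const char* constructor stops at the first NUL instead.
constexpr std::string_view kTruncated = "Hello\0World";

static_assert(kGreeting.size() == 12);
static_assert(kWithNul.size() == 11);
static_assert(kTruncated.size() == 5);
```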

Why a New String Class?

Consider the following C++ function, which accepts a std::string as the only argument:

#include <string>
#include <iostream>

// Standard C++
void printMe(const std::string& sOutput) {
  std::cout << sOutput << std::endl;
}

int main() {
  // works, calls function as expected
  printMe(std::string{"Look at me, I'm a modern string!"});

  // works, implicitly calls std::string constructor
  printMe("I'm a plain old C-String");
}

This function accepts C++ strings and plain old C-strings, which are implicitly converted to C++ strings. Without looking at the implementation of the function, we can tell the following things:

  • The function wants a string (std::string)
  • The function is not planning on modifying the string (const)
  • At least in the function call, no copy of the string is created (&)

Note: Those are healthy assumptions, which do not necessarily hold. For example, the callee could keep a pointer to the referenced string and modify the underlying value at a later point in time. However, in a well-designed API we can expect those assumptions to be true.

The drawback of passing the string as a constant reference is that C-strings are converted into temporary C++ strings, which copies the characters (and usually allocates memory for them). This might not be a problem when the function is called infrequently with small strings. But as you can imagine, repeatedly calling the function with multi-GB strings can have a negative impact on performance.

Before C++17, we were stuck with providing an extra overload for C-strings to prevent the copy. But C++17 brings string_view, which sets out to solve this problem. By using a pass-by-value string_view, we can accept both C and C++ strings! Let’s try it out by simply swapping our const std::string& for a std::string_view:

#include <string>
#include <string_view>
#include <iostream>

// Standard C++
void printMe(std::string_view sOutput) {
  std::cout << sOutput << std::endl;
}

int main() {
  // works, implicitly calls std::string::operator string_view
  printMe(std::string{"Look at me, I'm a modern string!"});

  // works, implicitly calls std::string_view constructor
  printMe("I'm a plain old C-String");
}

It doesn’t look like much has changed. However, passing a C-string no longer copies the string implicitly. Instead, a string_view is constructed, which usually is implemented using only two members:

The class template basic_string_view describes an object that can refer to a constant contiguous sequence of char-like objects with the first element of the sequence at position zero.

A typical implementation holds only two members: a pointer to constant CharT and a size.

cppreference.com

Based on that description, let’s draw a quick sketch of what an implementation could look like. Notes: The white boxes show ownership, and the data types are implementation-dependent.

Figure: string_view anatomy, a possible implementation of a string_view class
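
In code, the cppreference description boils down to something like the following minimal sketch. TinyView and viewOf are hypothetical names used only for illustration. A real std::string_view adds many member functions, and its exact layout is up to the implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, simplified stand-in for std::string_view (not the real class).
struct TinyView {
  const char* data;  // points at the first character, owns nothing
  std::size_t size;  // number of characters in the view
};

// Viewing a C-string: only the length is computed, no characters are copied.
TinyView viewOf(const char* cstr) { return {cstr, std::strlen(cstr)}; }

// Viewing a std::string: pointer and size are taken from the string itself.
TinyView viewOf(const std::string& s) { return {s.data(), s.size()}; }
```

Copying a TinyView copies just the pointer and the size, which is why passing such an object by value is cheap.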

As you can see, the string_view does not own any of the actual underlying string. It just points to the first character and has a member that keeps track of the size. This means that it is the caller’s responsibility to ensure the data outlives the function call. Let’s compare it to passing a string by reference:

Figure: a C++ string reference with pointer indirection

To access our actual string, we must follow two pointers: the first one to follow the string reference, the second one to get to the actual data. The string_view object, on the other hand, is small enough to be passed by value and will most likely be passed in CPU registers. Even leaving aside the extra copies a string object may require, this difference is worth keeping in mind. However, it is not a guarantee of a performance improvement: If in doubt, benchmark!

The string object’s advantage is that it owns the data and is responsible for managing the memory containing the string. This means we generally don’t have to worry about lifetimes. With the string_view we have to be really careful to prevent the view from outliving the underlying data, otherwise we will be left with a dangling pointer. Therefore, make sure that you know about the lifetimes, or just use a plain string object.

Substrings

One of the things that string_view is really good at is substrings. It allows us to view a string through a window of fixed position and size. Let’s say instead of looking at the string “Hello World!”, we only want to reference the word “World”:

Figure: referencing part of a string using string_view

string_view provides the method substr, which returns another string_view for the given start position and length. A string_view can be used in place of a string in many C++ interfaces, such as iostream:

#include <string_view>
#include <iostream>
using namespace std::literals;

void printMe(std::string_view sOutput) {
  std::cout << sOutput << std::endl; // prints "World"
}

int main() {
  constexpr auto str = "Hello World!"sv;
  printMe(str.substr(6, 5)); // pass "World" as string view
}
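
Besides substr, string_view also provides remove_prefix and remove_suffix, which shrink the view in place. As a small sketch (trimSpaces is a hypothetical helper, not part of the standard library), trimming spaces then needs no allocation at all, and the result is still a window into the caller’s buffer:

```cpp
#include <string_view>

// Hypothetical helper: strips leading and trailing spaces without copying.
// The returned view refers to the same characters as the argument.
std::string_view trimSpaces(std::string_view s) {
  while (!s.empty() && s.front() == ' ') {
    s.remove_prefix(1);  // move the start of the window forward
  }
  while (!s.empty() && s.back() == ' ') {
    s.remove_suffix(1);  // move the end of the window backward
  }
  return s;
}
```

Because the result is a substring of the passed argument, it is valid for exactly as long as the argument’s underlying data.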

C-style interfaces that require a NUL-terminated string should not be used with a string_view. string_view does not guarantee that the string it refers to is NUL-terminated, as the substring in the example above shows. Functions like printf do not know where the string_view ends:

#include <string_view>
#include <cstdio>
using namespace std::literals;

void printMe(std::string_view sOutput) {
  // data() points to the first char element
  printf("%s\n", sOutput.data()); // oops, prints "World!"
}

int main() {
  constexpr auto str = "Hello World!"sv;
  printMe(str.substr(6, 5));
}

To fix this, we need to construct a new std::string from the string_view. This creates a copy of the data, which is guaranteed to be NUL-terminated:

#include <string>
#include <string_view>
#include <cstdio>
using namespace std::literals;

void printMe(std::string_view sOutput) {
  // copy is created, but NUL-terminated
  printf("%s\n", std::string{sOutput}.c_str()); // prints "World"
}

int main() {
  constexpr auto str = "Hello World!"sv;
  printMe(str.substr(6, 5));
}
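
If the extra copy is undesirable, printf can also be told how many characters to read. The following sketch uses the standard %.*s precision specifier. printView is a hypothetical name, and the cast assumes the view is shorter than INT_MAX characters. Note that printf still stops early if the view contains an embedded NUL.

```cpp
#include <cstdio>
#include <string_view>

// Prints at most sOutput.size() characters, so no NUL terminator is needed.
// Returns printf's result: the number of characters written.
int printView(std::string_view sOutput) {
  // Assumption: sOutput.size() fits into an int.
  return std::printf("%.*s\n",
                     static_cast<int>(sOutput.size()),
                     sOutput.data());
}
```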

Examples (Good and Bad)

string_view as Function Parameter

Setter

Setters are a perfect use case for string_view parameters. The setter itself is responsible for creating a copy of the passed view. However, do not store the string_view itself as a class member.

#include <string>
#include <string_view>
using namespace std;

struct Person {
  string name;
  void setName(string_view name_) {
    name = std::string{name_}; // create a copy here -> own the string
  }
};

int main() {
  Person p {"Peter"};
  p.setName("Secret Name");
}
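
To illustrate why the copy matters, here is a small sketch that reuses the Person struct from above. The helper nameFromBuffer is hypothetical: it passes a view of a local buffer that is destroyed when the helper returns, yet the Person keeps a valid name because setName copied the characters.

```cpp
#include <string>
#include <string_view>

struct Person {
  std::string name;
  void setName(std::string_view name_) {
    name = std::string{name_};  // copy: Person now owns the characters
  }
};

// Hypothetical helper: the local buffer dies at the end of the function.
void nameFromBuffer(Person& p) {
  std::string buffer{"Dr. Secret Name, PhD"};
  // Pass only the middle part of the buffer; nothing is copied at the call.
  p.setName(std::string_view{buffer}.substr(4, 11));
}  // buffer is destroyed here, p.name is unaffected
```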

View of a Temporary String

This is a very subtle mistake that might not always lead to a crash. Concatenating two string objects (greeting + welcome) creates a third, temporary string object, which allocates memory through the default allocator. Because a string rvalue can implicitly be converted to a string_view, initializing a view from it compiles without complaint. But the temporary is destroyed at the end of the full expression that created it, which leaves the string_view with a dangling pointer. Note that passing the temporary directly as a function argument is fine, because it lives until the call returns. The problem arises once the view is kept beyond that expression.

#include <string>
#include <string_view>
#include <iostream>
using namespace std;

void printMe(string_view sOutput) {
  cout << sOutput << endl;
}

int main() {
  string greeting{"Hello {{name}}, "};
  string welcome{"welcome to {{company}}"};
  printMe(greeting + welcome); // ok: the temporary lives until printMe returns

  // the temporary string is destroyed at the end of this statement
  string_view msg = greeting + welcome;
  printMe(msg); // oh no! msg dangles
}
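
A view only stays valid while some string object owns the characters. One way to make that explicit is to give the concatenated string a name before taking a view of it. A minimal sketch (joinedLength is a hypothetical function, used only to keep the example self-contained):

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical example function.
std::size_t joinedLength(const std::string& greeting,
                         const std::string& welcome) {
  // Dangerous: the temporary from operator+ would be destroyed at the end
  // of this statement, leaving `bad` dangling.
  // std::string_view bad = greeting + welcome;

  std::string owned = greeting + welcome;  // named object owns the characters
  std::string_view view = owned;           // valid for as long as `owned` lives
  return view.size();
}
```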

string_view as Return Value

Return Value has Function-Scope Lifetime

Most of the time, returning a string_view is a bad idea. As soon as the function-scope variable ret goes out of scope, the returned string_view becomes invalid and we are left with a dangling pointer. Compilers should be able to catch this mistake, though (Clang 11 emits a warning).

#include <string>
#include <string_view>
#include <vector>
#include <iostream>
using namespace std;

string_view concat(vector<string_view> vec) {
  std::string ret{};
  for(auto el : vec) {
    ret += el;
  }
  // function-scope string is returned as string_view
  return ret; // oh no
}

int main() {
  cout << concat({"A", "B"}) << endl; // prints garbage (undefined behavior)
}
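
The straightforward fix is to return the owning std::string, so the result carries its data with it. A sketch of a corrected concat follows; taking the parameter by const reference and calling reserve are optional tweaks added here, not part of the original function.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Returns an owning string, so the caller receives the data itself.
std::string concat(const std::vector<std::string_view>& vec) {
  std::size_t total = 0;
  for (auto el : vec) {
    total += el.size();
  }

  std::string ret;
  ret.reserve(total);  // at most one allocation instead of repeated growth
  for (auto el : vec) {
    ret += el;  // std::string::operator+= accepts string_view since C++17
  }
  return ret;  // moved or elided, never dangling
}
```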

Getter

Returning a string_view in a getter method is almost a guarantee for trouble. The string_view becomes invalid as soon as the underlying (member-)value is changed. Use const std::string& or std::string as return type instead.

#include <string>
#include <string_view>
#include <vector>
#include <iostream>
using namespace std;

struct Person {
  string name;
  string_view getName() const {
    return name;
  }
  void setName(string_view name_) {
    name = std::string{name_};
  }
};

int main() {
  string_view name{};
  Person p {"Peter"};
  name = p.getName();
  p.setName("I am overwriting your stuff!"); // oh no
  cout << name << endl; // prints garbage
}
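
A sketch of the safer alternative: the getter returns const std::string&, and a caller who needs the value later takes a copy. The reference itself stays valid as long as the Person lives, although it will reflect later changes to the member. renameAndReturnOld is a hypothetical helper added for illustration.

```cpp
#include <string>
#include <string_view>

struct Person {
  std::string name;
  const std::string& getName() const {
    return name;  // reference to the member, no copy yet
  }
  void setName(std::string_view name_) {
    name = std::string{name_};
  }
};

// Hypothetical helper: remembers the old name across a rename.
std::string renameAndReturnOld(Person& p, std::string_view newName) {
  std::string old = p.getName();  // explicit copy, owns its data
  p.setName(newName);             // may reallocate p.name
  return old;                     // still the previous name
}
```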

Return Value Lifetime Depends on Argument Lifetime

The goal is to extract all variables wrapped in {{}} from a string. Because the returned vector only contains string_views that reference the passed-in string, this works fine as long as that string outlives the vector.

#include <string>
#include <string_view>
#include <vector>
#include <iostream>
using namespace std;
using namespace std::literals;

// the returned string_views are substrings of the passed stringview
// => lifetime ok
vector<string_view> parseVars(string_view str) {
  vector<string_view> ret{};

  size_t lastStart = 0;
  for(
     size_t start = str.find("{{", lastStart);
     start != string::npos;
     start = str.find("{{", lastStart)
  ) {
     start += 2;
     size_t end = str.find("}}", start);
     if(end == string::npos) {
        return ret;
     }

     ret.push_back(str.substr(start, end - start));

     lastStart = end + 2;
  }
  
  return ret;
}

int main() {
   string greeting{"Hello {{name}}, "};
   string welcome{"welcome to {{company}}"};
   auto msg = greeting + welcome;
   for(auto var : parseVars(msg)) {
      cout << var << endl;
      // prints:
      // name
      // company
   }
}

Conclusion

String views are certainly helpful in some cases, especially when performance matters. Swapping const std::string& parameters for std::string_view is straightforward if you don’t use C-style APIs. Elsewhere, consider carefully whether you need string views, given their potential for pitfalls.

Comments and suggestions are welcome (mail at hiebl dot cc)