When to Use C++17 std::string_view?
C++17 brings plenty of fancy new features, many of which significantly improve the readability and expressiveness of code (if constexpr, std::optional).
However, there are features like std::string_view, which can be footguns in disguise.
This article gives a short overview of the possible (mis-)uses of this “new” class.
TL;DR
string_view can improve performance if used correctly, but provides many possibilities to shoot yourself in the foot.
If in doubt, stick with strings.
Function parameters:
- Do use string_view wherever you would use const string&
- Don’t assume a string_view is NUL-terminated
  - Construct a string, if NUL-termination is required
  - Use const char* otherwise
- Don’t store string_view past the lifetime of the function
  - If you need to store the string, construct a std::string
  - If the function requires a string_view with static lifetime, use a different parameter type (e.g. string_view*) to make mistakes harder (other suggestions welcome)
- Don’t use string_view& or const string_view& (the class is tiny)
Return types:
- Don’t use string_view, as it is hard for the callee to guarantee that the string_view’s data outlive the returned object
  - Exception: The returned string_view is a substring of a passed argument.
- Don’t use string_view in getters, not even in getters for constant members
Class members:
- Don’t use string_view, unless you are sure the underlying strings have a static lifetime
Static strings:
- Do initialize static strings with the sv literal suffix
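One reason the sv suffix can matter (an illustration, not necessarily the author's motivation): a string_view built from a plain const char* finds its length by scanning for the first NUL, while the sv literal takes the length from the literal itself. The names fromPointer and fromLiteral below are made up for this sketch.

```cpp
#include <string_view>

using namespace std::literals;

// Built from a const char*: the length is found by scanning for the first NUL.
constexpr std::string_view fromPointer{"abc\0def"};

// Built with the sv suffix: the length comes from the literal itself.
constexpr std::string_view fromLiteral = "abc\0def"sv;

static_assert(fromPointer.size() == 3);
static_assert(fromLiteral.size() == 7);
```

For literals without embedded NULs both forms yield the same view; the suffix merely avoids the length scan.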
Why a New String Class?
Consider the following C++ function, which accepts a std::string as the only argument:
#include <string>
#include <iostream>

// Standard C++
void printMe(const std::string& sOutput) {
    std::cout << sOutput << std::endl;
}

int main() {
    // works, calls function as expected
    printMe(std::string{"Look at me, I'm a modern string!"});
    // works, implicitly calls std::string constructor
    printMe("I'm a plain old C-String");
}
This function accepts C++ strings and plain old C-strings, which are implicitly converted to C++ strings. Without looking at the implementation of the function, we can tell the following things:
- The function wants a string (std::string)
- The function is not planning on modifying the string (const)
- At least in the function call, no copy of the string is created (&)
Note: Those are reasonable assumptions, which do not necessarily hold up. For example, the callee could keep a pointer to the referenced string and modify the underlying value at a different point in time. However, in a well-designed API we can expect those assumptions to be true.
The drawback of passing the string as a constant reference is that C-strings will be converted into C++ strings, which copies the string. This might not be a problem when the function is called infrequently with small strings. But as you can imagine, repeatedly calling the function with multi-GB strings can have a negative impact on the performance.
Pre-C++17 we are stuck with providing an overload for C-strings in order to prevent copying.
But C++17 brings string_view, which sets out to solve our problem.
By using a pass-by-value string_view, we can accept both C and C++ strings without copying them!
Let’s try it out by simply swapping our const std::string& for a std::string_view:
#include <string>
#include <string_view>
#include <iostream>

// Standard C++
void printMe(std::string_view sOutput) {
    std::cout << sOutput << std::endl;
}

int main() {
    // works, implicitly calls std::string::operator string_view
    printMe(std::string{"Look at me, I'm a modern string!"});
    // works, implicitly calls std::string_view constructor
    printMe("I'm a plain old C-String");
}
It doesn’t look like much has changed.
However, passing a C-string no longer copies the string implicitly.
Instead, a string_view is constructed, which usually is implemented using only two members:
“The class template basic_string_view describes an object that can refer to a constant contiguous sequence of char-like objects with the first element of the sequence at position zero. A typical implementation holds only two members: a pointer to constant CharT and a size.”
—cppreference.com
Based on that description, let’s draw a quick sketch of what an implementation could look like. Note: The white boxes show ownership, and the data types are implementation-dependent.
As you can see, the string_view does not own any of the actual underlying string; it just points to the first character and has a member that keeps track of the size.
This means that it is the caller’s responsibility to ensure the data outlives the function call.
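In code, the two-member layout described above might look roughly like the following. SimpleView and viewOf are hypothetical names for a simplified stand-in; the real std::string_view is a class template with a much larger interface.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, simplified stand-in for std::string_view (not the real class).
struct SimpleView {
    const char* data;  // points at the first character, owns nothing
    std::size_t size;  // number of characters in the view
};

// Viewing a C-string: only the length is computed, no characters are copied.
SimpleView viewOf(const char* s) {
    return SimpleView{s, std::strlen(s)};
}

// Viewing a std::string: the view points into the string's own buffer.
SimpleView viewOf(const std::string& s) {
    return SimpleView{s.data(), s.size()};
}
```

Neither overload allocates or copies characters, which is also why the view is only usable while the viewed buffer is alive.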
Let’s compare it to passing a string by reference:
To access our actual string, we must follow two pointers: the first one to follow the string reference, the second one to get to the actual data.
The string_view object, on the other hand, is small enough to be passed by value and will most likely be passed in CPU registers.
Even leaving aside the extra copies a string parameter may require, this difference is worth keeping in mind.
This, however, is not a guarantee for a performance improvement: If in doubt, benchmark!
The string object’s advantage is that it owns the data and is responsible for managing the memory containing the string.
This means we generally don’t have to worry about any lifetimes.
With the string_view, we have to be really careful to prevent the view from outliving the underlying data; otherwise we will be left with a dangling pointer.
Therefore, make sure that you know about the lifetimes or just use a plain string object.
Substrings
One of the things that string_view is really good at is substrings.
It allows us to view a string through a window of fixed position and size.
Let’s say instead of looking at the string “Hello World!”, we only want to reference the word “World”:
string_view provides the method substr, which returns another string_view for the given start position and length.
It can be used as a replacement for string in many C++ interfaces like iostream:
#include <string_view>
#include <iostream>

using namespace std::literals;

void printMe(std::string_view sOutput) {
    std::cout << sOutput << std::endl; // prints "World"
}

int main() {
    constexpr auto str = "Hello World!"sv;
    printMe(str.substr(6, 5)); // pass "World" as string view
}
C-style interfaces that require a NUL-terminated string should not be used with a string view.
string_view does not guarantee that the string it refers to is NUL-terminated, as seen in the example above.
Functions like printf will not know where the string_view’s string ends:
#include <string_view>
#include <cstdio>

using namespace std::literals;

void printMe(std::string_view sOutput) {
    // data() points to the first char element
    std::printf("%s\n", sOutput.data()); // oops, prints "World!"
}

int main() {
    constexpr auto str = "Hello World!"sv;
    printMe(str.substr(6, 5));
}
To fix this, we need to allocate a new string containing the data of the string_view, creating a copy:
#include <string>
#include <string_view>
#include <cstdio>

using namespace std::literals;

void printMe(std::string_view sOutput) {
    // copy is created, but NUL-terminated
    std::printf("%s\n", std::string{sOutput}.c_str()); // prints "World"
}

int main() {
    constexpr auto str = "Hello World!"sv;
    printMe(str.substr(6, 5));
}
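For the printf family specifically, a copy can sometimes be avoided by passing the length explicitly with the "%.*s" conversion. This is an alternative sketch, not part of the original article; formatInto is a hypothetical helper that writes into a buffer so the result can be inspected.

```cpp
#include <cstddef>
#include <cstdio>
#include <string_view>

// Prints at most sOutput.size() characters; "%.*s" takes the length as an
// int argument, so printf does not rely on a NUL terminator.
void printMe(std::string_view sOutput) {
    std::printf("%.*s\n", static_cast<int>(sOutput.size()), sOutput.data());
}

// Same idea with snprintf, writing into a caller-provided buffer
// (hypothetical helper). Returns the number of characters formatted.
int formatInto(char* buf, std::size_t bufSize, std::string_view sv) {
    return std::snprintf(buf, bufSize, "%.*s",
                         static_cast<int>(sv.size()), sv.data());
}
```

This only helps with APIs that accept an explicit length. It also assumes the view is no longer than INT_MAX and contains no embedded NUL characters, since printf would stop at the first one.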
Examples (Good and Bad)
string_view as Function Parameter
✅ Setter
Setters are a perfect example for string_views.
The setter itself is responsible for creating a copy of the passed view.
However, do not store the string_view itself as a class member.
#include <string>
#include <string_view>

using namespace std;

struct Person {
    string name;

    void setName(string_view name_) {
        name = std::string{name_}; // create a copy here -> own the string
    }
};

int main() {
    Person p {"Peter"};
    p.setName("Secret Name");
}
❌ View of a Temporary String
This is a very subtle mistake that might not always lead to a crash.
By concatenating two string objects (greeting + welcome), a third, temporary object is created, which allocates memory through the default allocator.
Because a string rvalue can implicitly be converted to a string view, all seems fine.
But the temporary is destroyed at the end of the full expression that created it.
Passing it directly to a function (printMe(greeting + welcome)) is therefore safe, because the temporary lives until the call returns; storing the view in a variable is not.
Once the statement that initializes the view ends, the string object containing the concatenated string is destructed.
This leaves the string_view with a dangling pointer.
#include <string>
#include <string_view>
#include <iostream>

using namespace std;

void printMe(string_view sOutput) {
    cout << sOutput << endl;
}

int main() {
    string greeting{"Hello {{name}}, "};
    string welcome{"welcome to {{company}}"};
    // the temporary string is destroyed at the end of this statement
    string_view message = greeting + welcome; // oh no!
    printMe(message); // undefined behavior: dangling view
}
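A hedged sketch of a safe variant: give the concatenated string a name, so that an owning object outlives every view of it. viewLength and safeUsage are hypothetical names; viewLength simply stands in for any function that takes a string_view.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Stands in for any function that consumes a view.
std::size_t viewLength(std::string_view sv) {
    return sv.size();
}

std::size_t safeUsage() {
    std::string greeting{"Hello {{name}}, "};
    std::string welcome{"welcome to {{company}}"};
    std::string message = greeting + welcome;  // named object owns the buffer
    std::string_view view = message;           // valid as long as message lives
    return viewLength(view);
}
```

The view stays valid here only because message is neither destroyed nor modified while view is in use.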
string_view as Return Value
❌ Return Value has Function-Scope Lifetime
Most of the time, returning a string_view is a bad idea.
As soon as the function-scope variable ret goes out of scope, the returned string_view becomes invalid and we are left with a dangling pointer.
Compilers should be able to catch this mistake, though (Clang 11 emits a warning).
#include <string>
#include <string_view>
#include <vector>
#include <iostream>

using namespace std;

string_view concat(vector<string_view> vec) {
    std::string ret{};
    for(auto el : vec) {
        ret += el;
    }
    // function-scope string is returned as string_view
    return ret; // oh no
}

int main() {
    cout << concat({"A", "B"}) << endl; // undefined behavior, may print garbage
}
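A minimal sketch of a fix, assuming the caller is fine with receiving an owning string: return std::string by value, so the result carries its own buffer. Taking the vector by const reference is an additional change made here to avoid copying it.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Returns an owning std::string, so the result stays valid after the
// function-scope variable ret is gone.
std::string concat(const std::vector<std::string_view>& vec) {
    std::string ret;
    for (std::string_view el : vec) {
        ret += el;  // appends the viewed characters to ret
    }
    return ret;  // moved or elided, the buffer is not copied again
}
```

The input can still consist of views, since they only need to stay valid for the duration of the call.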
❌ Getter
Returning a string_view in a getter method is almost a guarantee for trouble.
The string_view becomes invalid as soon as the underlying (member) value is changed.
Use const std::string& or std::string as return type instead.
#include <string>
#include <string_view>
#include <vector>
#include <iostream>

using namespace std;

struct Person {
    string name;

    string_view getName() const {
        return name;
    }

    void setName(string_view name_) {
        name = std::string{name_};
    }
};

int main() {
    string_view name{};
    Person p {"Peter"};
    name = p.getName();
    p.setName("I am overwriting your stuff!"); // oh no
    cout << name << endl; // undefined behavior, may print garbage
}
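The suggested alternative could look roughly like this. The sketch keeps the Person struct from above and only changes the getter's return type; a caller can then choose between holding a reference and taking a copy.

```cpp
#include <string>
#include <string_view>

struct Person {
    std::string name;

    // The reference refers to the member object itself, not to its current
    // buffer, so it survives reassignment of name.
    const std::string& getName() const {
        return name;
    }

    void setName(std::string_view name_) {
        name = std::string{name_};
    }
};
```

A const std::string& obtained from getName() observes the new value after setName(), while a std::string copy keeps the old one. The reference is of course still tied to the lifetime of the Person object.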
✅ Return Value Lifetime Depends on Argument Lifetime
The goal is to extract all variables in a string, which are wrapped in {{}}.
As the returned vector only contains string_views, which reference the passed-in string, this works fine as long as that string outlives the returned vector.
#include <string>
#include <string_view>
#include <vector>
#include <iostream>

using namespace std;
using namespace std::literals;

// the returned string_views are substrings of the passed string_view
// => lifetime ok
vector<string_view> parseVars(string_view str) {
    vector<string_view> ret{};
    size_t lastStart = 0;
    for(
        size_t start = str.find("{{", lastStart);
        start != string_view::npos;
        start = str.find("{{", lastStart)
    ) {
        start += 2;
        size_t end = str.find("}}", start);
        if(end == string_view::npos) {
            return ret;
        }
        ret.push_back(str.substr(start, end - start));
        lastStart = end + 2;
    }
    return ret;
}

int main() {
    string greeting{"Hello {{name}}, "};
    string welcome{"welcome to {{company}}"};
    auto msg = greeting + welcome;
    for(auto var : parseVars(msg)) {
        cout << var << endl;
        // prints:
        // name
        // company
    }
}
Conclusion
String views certainly are helpful in some cases, especially when performance matters.
Swapping out const std::string& parameters for std::string_views is straightforward if you don’t use C-style APIs.
In other places, carefully consider whether you need string views, given their potential for pitfalls.
Comments and suggestions are welcome (mail at hiebl dot cc)