Practical Type Punning in C++11 and higher
Type punning (treating an object as if it were an object of another type) is common in C++ code bases, yet most of the time it is technically undefined behavior.
A typical use case is reinterpreting the bytes received from a network socket as a POD struct.
This article tries to answer two questions: Why should I not use reinterpret_cast? And what should I use instead?
It is heavily inspired by Timur Doumler’s talk at CppCon 2019.
TL;DR
- reinterpret_cast leads to UB if you don’t respect the strict aliasing rule, the object lifetime rules, or the alignment requirements of the target type.
- Always prefer std::bit_cast and std::memcpy over reinterpret_cast.
  - To access the char[] of a type T, just use reinterpret_cast.
  - To interpret char[] as the type T: prefer std::memcpy.
  - Otherwise:
    - make sure the char[] buffer is correctly aligned for T (watch out for different heap and stack alignment),
    - (pre-C++20: use the placement-new operator on the buffer),
    - reinterpret_cast the buffer as T.
Why not use reinterpret_cast?
Using reinterpret_cast to interpret the bytes of an object A as an object B breaks several assumptions the C++ compiler makes about:
- The C++ object model (object lifetime rules, strict aliasing rules)
- The hardware architecture (alignment rules)
This section shows how reinterpret_cast can cause each of the listed problems.
🔪 Strict Aliasing
A C++ compiler is allowed to assume that two pointers of incompatible types, when dereferenced, never refer to the same memory (i.e. they do not point to the same chunk of memory).
By accessing an object through a pointer obtained via reinterpret_cast to an incompatible type, you break this assumption, which leads to undefined behavior.
Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias eachother.) —CellPerformance - Understanding Strict Aliasing
The following example shows how passing the same address as two different pointer types breaks the compiler’s assumptions.
In the init_vars function, the compiler assumes that a and b refer to different memory, so it may assume that the expression *b = 0.f does not change the value of *a.
int init_vars(int* a, float* b) {
*a = 42;
*b = 0.f; // 🔪 strict aliasing violation
// the compiler is allowed to assume 42 is returned
return *a;
}
int main() {
int a_and_b{12};
// the compiler is allowed to optimize this to `return 42`
return init_vars(&a_and_b, reinterpret_cast<float*>(&a_and_b));
}
If you reinterpret_cast a pointer to an incompatible type and access the object through it, you violate the strict aliasing rule, breaking the compiler’s assumptions.
Alignment
Depending on the target hardware architecture, objects of a given type are required to be placed at specific positions in memory.
For example, a 2-byte integer (uint16_t) could be required to start at a memory address divisible by 2.
So far so good, but what would happen if we violated those requirements? The Linux kernel documentation says the following about unaligned memory access:
Unaligned memory accesses occur when you try to read N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N != 0).
[…]
- Some architectures are able to perform unaligned memory accesses transparently, but there is usually a significant performance cost.
- Some architectures raise processor exceptions when unaligned accesses happen. The exception handler is able to correct the unaligned access, at significant cost to performance.
- Some architectures raise processor exceptions when unaligned accesses happen, but the exceptions do not contain enough information for the unaligned access to be corrected.
- Some architectures are not capable of unaligned memory access, but will silently perform a different memory access to the one that was requested, resulting in a subtle code bug that is hard to detect!
—kernel.org Documentation - Unaligned Memory Accesses
In short: It’s bad; don’t do it.
Let’s try it anyway!
The following C++14 code allocates a 128-byte buffer on the heap.
This heap allocation guarantees that the first byte of the buffer satisfies the alignment requirement of std::max_align_t, i.e. of every scalar type.
Therefore, we could access the value at location &buffer[0] as if it were a uint8_t, uint16_t, uint32_t or uint64_t without violating any alignment rules.
To force an unaligned memory access, let us interpret the bytes at index 1 and 2 as a uint16_t and then print the integer value:
#include <cstdint>
#include <memory>
#include <fmt/core.h>
int main() {
// create a heap buffer aligned with
// __STDCPP_DEFAULT_NEW_ALIGNMENT__
auto buffer = std::make_unique<uint8_t[]>(128);
buffer[0] = 0x00; // we don't care about this byte
buffer[1] = 0x01; // first byte of our uint16_t
buffer[2] = 0x02; // second byte of our uint16_t
buffer[3] = 0x03; // we don't care about this byte
// this is fine, uint8_t has a 1 byte alignment requirement
uint8_t* ptr8 = &buffer[1];
// uint16_t* points to unaligned memory address
uint16_t* ptr16 = reinterpret_cast<uint16_t*>(&buffer[1]);
// ⚡ unaligned memory access happens here
fmt::print("*ptr8={}, *ptr16={}\n", *ptr8, *ptr16);
return 0;
}
Using clang-15 with -O2 -Wall for amd64, the compiler does not emit any warnings, and no apparent runtime errors occur.
Only after enabling the Undefined Behavior Sanitizer (-fsanitize=undefined) does the error become apparent: UBSan reports a load of a misaligned address for the uint16_t at runtime.
⏲ Object Lifetime
In short: Don’t worry about it if you don’t use reinterpret_cast.
The talk this article is based on states that creating a byte array that provides storage for a type T does not start the lifetime of the T object. However, the C++20 standard draft seems to have been extended to allow exactly this:
An operation that begins the lifetime of an array of char, unsigned char, or std::byte implicitly creates objects within the region of storage occupied by the array. Any implicit or explicit invocation of a function named operator new or operator new[] implicitly creates objects in the returned region of storage and returns a pointer to a suitable created object. —C++20 standard draft [basic.memobj]
Furthermore, it gives the following example, which shows that the lifetime of an object can be started by allocating memory for it using malloc:
#include <cstdlib>
struct X { int a, b; };
X *make_x() {
// The call to std::malloc implicitly creates an object of type X
// and its subobjects a and b, and returns a pointer to that X object
// (or an object that is pointer-interconvertible ([basic.compound]) with it),
// in order to give the subsequent class member access operations
// defined behavior.
X *p = (X*)std::malloc(sizeof(struct X));
p->a = 1;
p->b = 2;
return p;
}
In my interpretation, this still means that you cannot start an object lifetime using reinterpret_cast.
What to do instead?
The only safe alternatives are std::memcpy and modern C++20 functions like std::bit_cast.
Conversion between T and U of the same size
The wrong, naïve solution would look like this:
#include <cstdint>
#include <fmt/core.h>
struct Foo { uint8_t x[4]; };
struct Bar { int32_t x; };
int main() {
Foo foo{{0x00, 0x01, 0x02, 0x03}};
Bar bar{12};
// [aliasing]: strict aliasing violation
// [alignment]: alignment violation
// [lifetime]: object lifetime violation
auto& as_float = reinterpret_cast<float&>(foo); // [alignment]
auto& as_uint = reinterpret_cast<uint32_t&>(foo); // [alignment]
auto& as_bar = reinterpret_cast<Bar&>(foo); // [alignment]
auto& as_foo = reinterpret_cast<Foo&>(bar); // [lifetime]
fmt::print(
"as_float={}, as_uint={}, as_bar.x={}, as_foo.x[0]={}\n",
as_float, as_uint, as_bar.x, as_foo.x[0] // [aliasing] on all four reads
);
return 0;
}
There are three violations with this solution:
- 🔪 The strict aliasing rule is violated, since we create and use references of incompatible types that refer to the same chunk of memory.
- The alignment requirements of the types might not be met: foo only requires a 1-byte alignment, while float, uint32_t and Bar typically require a 4-byte alignment.
- ⏲ The lifetime of the as_foo object was never started: no (trivial) constructor was ever called, and the lifetime was not implicitly started by the creation of a byte array.
The correct solution for C++20 would look like this:
#include <bit>
#include <cstdint>
#include <fmt/core.h>
struct Foo { uint8_t x[4]; };
struct Bar { int32_t x; };
int main() {
Foo foo{{0x00, 0x01, 0x02, 0x03}};
Bar bar{12};
auto as_float = std::bit_cast<float>(foo);
auto as_uint = std::bit_cast<uint32_t>(foo);
auto as_bar = std::bit_cast<Bar>(foo);
auto as_foo = std::bit_cast<Foo>(bar);
fmt::print(
"as_float={}, as_uint={}, as_bar.x={}, as_foo.x[0]={}\n",
as_float, as_uint, as_bar.x, as_foo.x[0]
);
return 0;
}
With std::bit_cast, the compiler tries to avoid copying the object if it fits all the alignment requirements.
For C++11, you can implement a bit_cast polyfill on your own, using std::memcpy (source):
#include <cstring>
#include <type_traits>
// std::enable_if_t is C++14, so C++11 spells out typename std::enable_if<...>::type
template <class To, class From>
typename std::enable_if<
sizeof(To) == sizeof(From) &&
std::is_trivially_copyable<From>::value &&
std::is_trivially_copyable<To>::value,
To
>::type
bit_cast(const From& src) noexcept {
static_assert(std::is_trivially_constructible<To>::value,
"This implementation additionally requires "
"destination type to be trivially constructible");
To dst; // object lifetime started
std::memcpy(&dst, &src, sizeof(To));
return dst;
}
As you can see, the target object is trivially default-constructed, which starts its lifetime and fulfills its alignment requirements. It is then filled by std::memcpy before being returned.
Even with this implementation, the compiler is able to omit the std::memcpy in many cases.
Conversion from T to char[]
In short: use std::bit_cast (or the polyfill) if you can, use reinterpret_cast if you must.
#include <cstdint>
#include <bit>
#include <array>
#include <fmt/core.h>
int main() {
uint64_t foo{5854636534293464452ULL};
auto bar = std::bit_cast<std::array<uint8_t, sizeof(foo)>>(foo);
for(size_t i = 0; i < sizeof(foo); ++i) {
fmt::print("bar[{}]=0x{:02x}\n", i, bar[i]);
}
return 0;
}
If profiling your program shows that the compiler was not able to optimize away the std::bit_cast (or polyfill) copy, you can consider reinterpret_cast:
#include <cstdint>
#include <fmt/core.h>
int main() {
uint64_t foo{5854636534293464452ULL};
uint8_t* bar = reinterpret_cast<uint8_t*>(&foo);
for(size_t i = 0; i < sizeof(foo); ++i) {
fmt::print("bar[{}]=0x{:02x}\n", i, bar[i]);
}
return 0;
}
What about
- 🔪 Strict Aliasing? Accessing an object through char, unsigned char (or std::byte) is exempt from the aliasing rules, and uint8_t is typically an alias for unsigned char.
- Alignment? uint8_t has a 1-byte alignment requirement.
- ⏲ Object Lifetime? I really don’t know, works fine tho™
If you are concerned about the object lifetime issue, don’t use reinterpret_cast.
Conversion from char[] to T
In short: use std::memcpy if you can, use reinterpret_cast on a properly aligned char[] buffer if you must.
The following example shows how an unaligned stack-based byte buffer is allocated (#1) and filled (#2) with data.
Then the foo object is trivially default-constructed (#3) (no code actually runs), before being filled with the bytes from the buffer (#4).
By explicitly constructing the object and using std::memcpy, neither the lifetime rules nor the alignment requirements are violated.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fmt/core.h>
struct Foo { uint32_t y; bool x; bool z; };
void recv(uint8_t* buf) {
buf[offsetof(Foo, x) + 0] = 0x01; // x
buf[offsetof(Foo, y) + 0] = 0xff; // y_3
buf[offsetof(Foo, y) + 1] = 0x00; // y_2
buf[offsetof(Foo, y) + 2] = 0x00; // y_1
buf[offsetof(Foo, y) + 3] = 0x00; // y_0
buf[offsetof(Foo, z) + 0] = 0x00; // z
}
int main() {
uint8_t stack_buffer[sizeof(Foo[10])]; // #1
// let's pretend we received this buffer via the network
recv(stack_buffer); // #2
// explicitly start the lifetime /w trivial ctor
Foo foo; // #3
// fill the foo with the bytes from the buffer
std::memcpy(&foo, stack_buffer, sizeof(foo)); // #4
fmt::print("foo=(x={}, y={}, z={})\n", foo.x, foo.y, foo.z);
return 0;
}
If for some reason you really don’t want to copy the bytes (e.g. for performance), you can interpret the bytes of the buffer as a new object directly, provided that the buffer is correctly aligned and that the object lifetime is started.
Let’s look at an example of interpreting bytes as a Foo.
What about Alignment?
If you use a new-allocated buffer for receiving bytes from the network, it is able to provide storage for the Foo object, since by definition the new allocation respects the __STDCPP_DEFAULT_NEW_ALIGNMENT__ alignment requirement (as long as alignof(Foo) does not exceed it).
A stack-allocated buffer provides no such guarantee: a stack-based char[] buffer might start at an odd memory address, not fulfilling the alignment requirements of our Foo object.
To force the stack-based buffer to fulfill those requirements, we can use the alignas keyword.
⏲ What about object lifetime?
Starting with P0593 (implicit creation of objects for low-level object manipulation), which was recently added to the C++20 standard, the lifetime of our Foo object is implicitly started once we allocate the stack- or heap-based buffer.
Pre-C++20, you must start the lifetime yourself by using placement new on the buffer, which executes the trivial constructor.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>
#include <fmt/core.h>
struct Foo { uint32_t y; bool x; bool z; };
void recv(uint8_t* buf) {
buf[offsetof(Foo, x) + 0] = 0x01; // x
buf[offsetof(Foo, y) + 0] = 0xff; // y_3
buf[offsetof(Foo, y) + 1] = 0x00; // y_2
buf[offsetof(Foo, y) + 2] = 0x00; // y_1
buf[offsetof(Foo, y) + 3] = 0x00; // y_0
buf[offsetof(Foo, z) + 0] = 0x00; // z
}
int main() {
// lifetime for the Foo objects is implicitly started here (C++20)
auto heap_buffer = std::make_unique<uint8_t[]>(sizeof(Foo[10]));
alignas(Foo[10]) uint8_t stack_buffer[sizeof(Foo[10])];
// let's pretend we received this buffer via the network
recv(heap_buffer.get());
recv(stack_buffer);
// explicitly start the lifetime /w placement new (< C++20)
Foo* h0 = new(heap_buffer.get()) Foo;
fmt::print("h0=(x={}, y={}, z={})\n", h0->x, h0->y, h0->z);
// lifetime implicitly created by the buffer allocation (C++20)
Foo* s0 = reinterpret_cast<Foo*>(stack_buffer);
fmt::print("s0=(x={}, y={}, z={})\n", s0->x, s0->y, s0->z);
return 0;
}
Conclusion
Why, C++, why?!
I want to go back to using reinterpret_cast without seeing UB everywhere.