Practical Type Punning in C++11 and higher
Type punning (treating an object as if it were another one) is common in C++ code bases, yet most of the time it technically is undefined behavior.
Typically it is used when reinterpreting bytes received from a network socket as a POD-struct.
This article tries to answer two questions: Why should I not use reinterpret_cast? And what should I use instead?
It is heavily inspired by Timur Doumler’s talk at CppCon 2019.
TL;DR
- reinterpret_cast leads to UB if you don’t respect the alignment requirements of the target type.
- Always prefer std::bit_cast and std::memcpy over reinterpret_cast
- To access the char[] of a type T, just use reinterpret_cast
- To interpret char[] as the type T: prefer std::memcpy
  - Otherwise:
    - Make sure the char[] buffer is correctly aligned for T (watch out for different heap and stack alignment)
    - (pre-C++20: use the placement-new operator on the buffer)
    - reinterpret_cast the buffer as T
Why not use reinterpret_cast?
Using reinterpret_cast to interpret the bytes of an object A as an object B breaks several assumptions the C++ compiler makes about:
- The C++ object model (object lifetime rules, strict aliasing rules)
- The hardware architecture (alignment rules)
This section shows how reinterpret_cast can cause each of the listed problems.
Strict Aliasing
A C++ compiler is allowed to assume that when de-referenced, two pointers of incompatible types do not have the same value (i.e. do not point to the same chunk of memory).
By using reinterpret_cast you break the compiler’s assumption, leading to undefined behavior.
Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias eachother.) —CellPerformance - Understanding Strict Aliasing
The following example shows how passing the same pointer value in two different pointer types breaks the compiler’s assumptions.
In the init_vars function, the compiler assumes that a and b reference different memory, allowing the compiler to assume that the *b = 0.f expression does not change the value of *a.
int init_vars(int* a, float* b) {
    *a = 42;
    *b = 0.f; // strict aliasing violation
    // the compiler is allowed to assume 42 is returned
    return *a;
}
int main() {
    int a_and_b{12};
    // an optimizing compiler may reduce this to `return 42`
    return init_vars(&a_and_b, reinterpret_cast<float*>(&a_and_b));
}
If you reinterpret_cast a pointer to an incompatible type and use it, you violate the strict aliasing rule, breaking the compiler’s assumptions.
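A byte-wise copy sidesteps the aliasing problem, because no pointer of the wrong type is ever dereferenced. The following is a minimal sketch of that idea; the helper name float_bits is hypothetical and not part of the article's examples, and it assumes a 32-bit float.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the article): returns the object
// representation of a float as a 32-bit unsigned integer.
inline std::uint32_t float_bits(float value) noexcept {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    // Copying bytes never dereferences a pointer of the wrong type,
    // so the strict aliasing rule is not involved.
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}
```

On a platform that uses IEEE-754 single precision for float, float_bits(1.0f) yields 0x3F800000.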
Alignment
Depending on the target hardware architecture, data types are required to be placed at specific positions in memory.
For example, a 2-byte integer (uint16_t) could be required to start at a memory address divisible by 2.
So far so good, but what would happen if we violated those requirements? The Linux kernel documentation says the following about unaligned memory access:
Unaligned memory accesses occur when you try to read N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N != 0).
[…]
- Some architectures are able to perform unaligned memory accesses transparently, but there is usually a significant performance cost.
- Some architectures raise processor exceptions when unaligned accesses happen. The exception handler is able to correct the unaligned access, at significant cost to performance.
- Some architectures raise processor exceptions when unaligned accesses happen, but the exceptions do not contain enough information for the unaligned access to be corrected.
- Some architectures are not capable of unaligned memory access, but will silently perform a different memory access to the one that was requested, resulting in a subtle code bug that is hard to detect!
—kernel.org Documentation - Unaligned Memory Accesses
In short: It’s bad; don’t do it.
Let’s try it anyway!
The following C++14 code allocates a 128-byte buffer on the heap.
This heap allocation guarantees that the first addressable byte of this allocation satisfies the alignment requirements of the largest known scalar value (std::max_align_t).
Therefore, we could access the value at location &buffer[0] as if it were a uint8_t, uint16_t, uint32_t or uint64_t without violating any alignment rules.
To force an unaligned memory access, let us interpret the bytes at index 1 and 2 as a uint16_t and then print the integer value:
#include <cstdint>
#include <memory>
#include <fmt/core.h>
int main() {
    // create a heap buffer aligned with
    // __STDCPP_DEFAULT_NEW_ALIGNMENT__
    auto buffer = std::make_unique<uint8_t[]>(128);
    buffer[0] = 0x00; // we don't care about this byte
    buffer[1] = 0x01; // first byte of our uint16_t
    buffer[2] = 0x02; // second byte of our uint16_t
    buffer[3] = 0x03; // we don't care about this byte
    // this is fine, uint8_t has a 1 byte alignment requirement
    uint8_t* ptr8 = &buffer[1];
    // uint16_t* points to unaligned memory address
    uint16_t* ptr16 = reinterpret_cast<uint16_t*>(&buffer[1]);
    // ⚡ unaligned memory access happens here
    fmt::print("*ptr8={}, *ptr16={}\n", *ptr8, *ptr16);
    return 0;
}
Using clang-15 with -O2 -Wall for amd64, the compiler does not emit any warnings.
No apparent runtime errors occur.
Only after enabling the Undefined Behavior Sanitizer (-fsanitize=undefined) does the error become apparent:
/app/example.cpp:17:48: runtime error: reference binding to misaligned address 0x55af847a9f21 for type 'uint16_t' (aka 'unsigned short'), which requires 2 byte alignment
0x55af847a9f21: note: pointer points here
00 00 00 00 01 02 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /app/example.cpp:17:48 in
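The same two bytes can be read without an unaligned access by copying them into a properly aligned local variable. This is only a sketch; the helper name load_u16 is hypothetical and does not appear in the original example.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads a uint16_t from any address, aligned or not.
inline std::uint16_t load_u16(const std::uint8_t* src) noexcept {
    std::uint16_t value;  // a local variable is always suitably aligned
    // memcpy has no alignment requirement on its source pointer
    std::memcpy(&value, src, sizeof(value));
    return value;
}
```

Called as load_u16(&buffer[1]) on the buffer above, it returns the bytes 0x01 and 0x02 combined in the byte order of the host, so the numeric result differs between little-endian and big-endian machines.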
Object Lifetime
In short: Don’t worry about it if you don’t use reinterpret_cast.
While the talk this article is based on states that creating a byte array that provides storage for a type T does not start the lifetime of the T object, it seems that the latest C++20 standard draft has been extended to allow for this.
An operation that begins the lifetime of an array of char, unsigned char, or std::byte implicitly creates objects within the region of storage occupied by the array. Any implicit or explicit invocation of a function named operator new or operator new[] implicitly creates objects in the returned region of storage and returns a pointer to a suitable created object.
—C++20 standard draft [basic.memobj]
Furthermore, it gives the following example, which shows that the lifetime of an object can be started by allocating memory for it using malloc:
#include <cstdlib>
struct X { int a, b; };
X *make_x() {
    // The call to std::malloc implicitly creates an object of type X
    // and its subobjects a and b, and returns a pointer to that X object
    // (or an object that is pointer-interconvertible ([basic.compound]) with it),
    // in order to give the subsequent class member access operations
    // defined behavior.
    X *p = (X*)std::malloc(sizeof(struct X));
    p->a = 1;
    p->b = 2;
    return p;
}
In my interpretation, this still means that you cannot start an object lifetime using reinterpret_cast.
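For language modes that predate implicit object creation, the lifetime can be started explicitly with placement new. The sketch below reuses the X struct from the standard's example; the helper name make_x_in is hypothetical, and the caller is assumed to pass storage that is large enough and suitably aligned for X.

```cpp
#include <new>

struct X { int a, b; };

// Hypothetical helper: `storage` must point to at least sizeof(X) bytes
// that are suitably aligned for X (an assumption the caller has to meet).
inline X* make_x_in(void* storage) {
    // Placement new starts the lifetime of an X inside the given storage
    // and returns a pointer to that new object.
    return new (storage) X{1, 2};
}
```

A caller could provide the storage with alignas(X) unsigned char storage[sizeof(X)]; and then use the returned pointer. Since X is trivially destructible, no explicit destructor call is needed before the storage goes away.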
What to do instead?
The only safe alternatives are std::memcpy and modern C++20 functions like std::bit_cast.
Conversion between T and U of the same size
The wrong naïve solution would look like this:
#include <cstdint>
#include <fmt/core.h>
struct Foo { uint8_t x[4]; };
struct Bar { int32_t x; };
int main() {
    Foo foo{{0x00, 0x01, 0x02, 0x03}};
    Bar bar{12};
    // [aliasing]:  strict aliasing violation
    // [alignment]: alignment violation
    // [lifetime]:  object lifetime violation
    auto& as_float = reinterpret_cast<float&>(foo);    // [alignment]
    auto& as_uint  = reinterpret_cast<uint32_t&>(foo); // [alignment]
    auto& as_bar   = reinterpret_cast<Bar&>(foo);      // [alignment]
    auto& as_foo   = reinterpret_cast<Foo&>(bar);      // [lifetime]
    fmt::print(
        "as_float={}, as_uint={}, as_bar.x={}, as_foo={}\n",
        as_float, as_uint, as_bar.x, as_foo.x[0] // [aliasing] on every access
    );
    return 0;
}
There are three violations with this solution:
- Strict aliasing: the rule is violated since we create and use incompatible pointers referring to the same memory chunk
- Alignment: the alignment requirements of the types might not be met. foo requires a 1-byte alignment, while float, uint32_t and Bar require a 4-byte alignment
- Object lifetime: the lifetime of the as_foo object was never started; no (trivial) constructor was ever called; the lifetime was not implicitly started by the creation of a byte array
The correct solution for C++20 would look like this:
#include <bit>
#include <cstdint>
#include <fmt/core.h>
struct Foo { uint8_t x[4]; };
struct Bar { int32_t x; };
int main() {
    Foo foo{{0x00, 0x01, 0x02, 0x03}};
    Bar bar{12};
    auto as_float = std::bit_cast<float>(foo);
    auto as_uint = std::bit_cast<uint32_t>(foo);
    auto as_bar = std::bit_cast<Bar>(foo);
    auto as_foo = std::bit_cast<Foo>(bar);
    fmt::print(
        "as_float={}, as_uint={}, as_bar.x={}, as_foo={}\n",
        as_float, as_uint, as_bar.x, as_foo.x[0]
    );
    return 0;
}
With std::bit_cast, the compiler tries to avoid copying the object if all alignment requirements are met.
For C++11 you can implement the bit_cast polyfill function on your own, using memcpy (source):
#include <cstring>      // std::memcpy
#include <type_traits>  // std::enable_if, std::is_trivially_copyable
// std::enable_if_t would be shorter, but it is only available since C++14
template <class To, class From>
typename std::enable_if<
    sizeof(To) == sizeof(From) &&
    std::is_trivially_copyable<From>::value &&
    std::is_trivially_copyable<To>::value,
    To
>::type
bit_cast(const From& src) noexcept {
    static_assert(std::is_trivially_constructible<To>::value,
        "This implementation additionally requires "
        "destination type to be trivially constructible");
    To dst; // object lifetime started
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}
As you can see, the target object is trivially constructed, ensuring the object lifetime is started and the alignment requirements are fulfilled, and then filled by std::memcpy before being returned.
Even with this implementation, the compiler is able to omit the std::memcpy in many cases.
Conversion from T to char[]
In short: use std::bit_cast (or polyfill) if you can, use reinterpret_cast if you must.
#include <array>
#include <bit>
#include <cstddef>
#include <cstdint>
#include <fmt/core.h>
int main() {
    uint64_t foo{5854636534293464452ULL};
    auto bar = std::bit_cast<std::array<uint8_t, sizeof(foo)>>(foo);
    for(size_t i = 0; i < sizeof(foo); ++i) {
        fmt::print("bar[{}]=0x{:02x}\n", i, bar[i]);
    }
    return 0;
}
If, after profiling your program, you realize the compiler was not able to optimize the std::bit_cast (or polyfill) operation, you can think about reinterpret_cast:
#include <cstddef>
#include <cstdint>
#include <fmt/core.h>
int main() {
    uint64_t foo{5854636534293464452ULL};
    uint8_t* bar = reinterpret_cast<uint8_t*>(&foo);
    for(size_t i = 0; i < sizeof(foo); ++i) {
        fmt::print("bar[{}]=0x{:02x}\n", i, bar[i]);
    }
    return 0;
}
What about
- Strict Aliasing? unsigned char* types are exempted from the aliasing rules
- Alignment? uint8_t has a 1-byte alignment requirement
- Object Lifetime? I really don’t know, works fine tho™️
If you are concerned about the object lifetime issue, don’t use reinterpret_cast.
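If std::bit_cast is not available and the object lifetime question is a concern, copying the object into a byte array is another option. The following is a minimal sketch under those assumptions; the function name to_bytes is hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: returns a copy of the object representation of `value`.
template <class T>
std::array<unsigned char, sizeof(T)> to_bytes(const T& value) noexcept {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types may be copied byte-wise");
    std::array<unsigned char, sizeof(T)> bytes;
    // The array is a real object of its own, so no aliasing, alignment
    // or lifetime question arises for the result.
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}
```

The order of the returned bytes follows the byte order of the host, just like in the reinterpret_cast version above.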
Conversion from char[] to T
In short: Use std::memcpy if you can, use reinterpret_cast on a properly aligned char[] buffer if you must.
The following example shows how an unaligned stack-based byte buffer is allocated (#1) and filled (#2) with data.
Then the foo object is trivially constructed (#3) (no code actually runs), before being filled with the bytes from the buffer (#4).
By explicitly constructing the object and using std::memcpy, no lifetime rules or alignment requirements are violated.
#include <cstddef>  // offsetof
#include <cstdint>
#include <cstring>  // std::memcpy
#include <fmt/core.h>
struct Foo { uint32_t y; bool x; bool z; };
void recv(uint8_t* buf) {
    buf[offsetof(Foo, x) + 0] = 0x01; // x
    buf[offsetof(Foo, y) + 0] = 0xff; // y_3
    buf[offsetof(Foo, y) + 1] = 0x00; // y_2
    buf[offsetof(Foo, y) + 2] = 0x00; // y_1
    buf[offsetof(Foo, y) + 3] = 0x00; // y_0
    buf[offsetof(Foo, z) + 0] = 0x00; // z
}
int main() {
    uint8_t stack_buffer[sizeof(Foo[10])]; // #1
    // let's pretend we received this buffer via the network
    recv(stack_buffer); // #2
    // explicitly start the lifetime w/ trivial ctor
    Foo foo; // #3
    // fill the foo with the bytes from the buffer
    std::memcpy(&foo, stack_buffer, sizeof(foo)); // #4
    fmt::print("foo=(x={}, y={}, z={})\n", foo.x, foo.y, foo.z);
    return 0;
}
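When several objects are packed into one received buffer, the same std::memcpy approach can be wrapped in a small helper that also checks the buffer bounds. This is a sketch only; the name read_at and its signature are hypothetical and not part of the article's example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies sizeof(T) bytes starting at `offset` into `out`.
// Returns false, leaving `out` untouched, if the read would leave the buffer.
template <class T>
bool read_at(const std::uint8_t* buf, std::size_t len,
             std::size_t offset, T& out) noexcept {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types may be filled byte-wise");
    // Written this way so that the check itself cannot overflow.
    if (offset > len || len - offset < sizeof(T)) {
        return false;
    }
    // The offset does not have to be aligned for T, memcpy does not care.
    std::memcpy(&out, buf + offset, sizeof(T));
    return true;
}
```

With the buffer from the example above, the second Foo could be read with read_at(stack_buffer, sizeof(stack_buffer), sizeof(Foo), foo).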
If for some reason you really don’t want to copy the bytes (e.g. for performance), you can interpret the bytes of the buffer as a new object directly, given that the buffer is correctly aligned and we make sure that the object lifetime is started.
Let’s look at an example of interpreting bytes as a Foo.
What about Alignment?
If you use a new-allocated buffer for receiving bytes from the network, it is able to provide storage for the Foo object, since by definition the new allocation respects the __STDCPP_DEFAULT_NEW_ALIGNMENT__ alignment requirement.
A stack-allocated buffer provides no such guarantee, meaning that a stack-based char[] buffer might start at an odd memory address, not fulfilling the alignment requirements of our Foo object.
To force the stack-based buffer to fulfill those requirements, we can use the alignas keyword.
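Whether a given buffer address is suitable for a type can also be checked at run time before casting. The sketch below assumes that converting a pointer to std::uintptr_t yields the numeric address, which holds on mainstream platforms but is implementation-defined; the helper name is_aligned_for is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

struct Foo { std::uint32_t y; bool x; bool z; };

// Hypothetical helper: true if `ptr` satisfies the alignment requirement of T.
// Assumes the pointer-to-integer conversion yields the numeric address.
template <class T>
bool is_aligned_for(const void* ptr) noexcept {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignof(T) == 0;
}
```

A buffer declared as alignas(Foo) std::uint8_t buffer[sizeof(Foo)]; passes the check at its first byte, while the address one byte further in does not.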
What about object lifetime?
Starting with P0593R3, which was recently added to the C++20 standard, the lifetime of our Foo object is implicitly started once we allocate the stack or heap based buffer.
Pre-C++20, you must start the lifetime yourself by using placement new on the buffer, which executes the trivial constructor.
#include <cstddef>  // offsetof
#include <cstdint>
#include <memory>   // std::make_unique
#include <new>      // placement new
#include <fmt/core.h>
struct Foo { uint32_t y; bool x; bool z; };
void recv(uint8_t* buf) {
    buf[offsetof(Foo, x) + 0] = 0x01; // x
    buf[offsetof(Foo, y) + 0] = 0xff; // y_3
    buf[offsetof(Foo, y) + 1] = 0x00; // y_2
    buf[offsetof(Foo, y) + 2] = 0x00; // y_1
    buf[offsetof(Foo, y) + 3] = 0x00; // y_0
    buf[offsetof(Foo, z) + 0] = 0x00; // z
}
int main() {
    // lifetime for the Foo objects is implicitly started here (C++20)
    auto heap_buffer = std::make_unique<uint8_t[]>(sizeof(Foo[10]));
    alignas(Foo[10]) uint8_t stack_buffer[sizeof(Foo[10])];
    // let's pretend we received this buffer via the network
    recv(heap_buffer.get());
    recv(stack_buffer);
    // explicitly start the lifetime w/ placement new (< C++20)
    // caveat: formally, default-initializing Foo leaves its members
    // indeterminate, even though the bytes are already in the buffer
    Foo* h0 = new(heap_buffer.get()) Foo;
    fmt::print("h0=(x={}, y={}, z={})\n", h0->x, h0->y, h0->z);
    // lifetime implicitly created by the buffer allocation (C++20)
    Foo* s0 = reinterpret_cast<Foo*>(stack_buffer);
    fmt::print("s0=(x={}, y={}, z={})\n", s0->x, s0->y, s0->z);
    return 0;
}
Conclusion
Why, C++, why?!
I want to go back to using reinterpret_cast without seeing UB everywhere.