Basti’s Buggy Blog

Poking at software.

Practical Type Punning in C++11 and higher

Type punning (treating an object as if it were another one) is common in C++ code bases, yet most of the time it technically is undefined behavior. Typically it is used when reinterpreting bytes received from a network socket as a POD-struct. This article tries to answer two questions: Why should I not use reinterpret_cast? and What to use instead? It is heavily inspired by Timur Doumler’s talk at CppCon 2019.

TL;DR

  • reinterpret_cast leads to UB if you don’t respect the alignment requirements of the target type.
  • Always prefer std::bit_cast and std::memcpy over reinterpret_cast
  • To access the bytes (char[]) of an object of type T, just use reinterpret_cast
  • To interpret char[] as the type T: prefer std::memcpy
    • Otherwise: Make sure the char[] buffer is correctly aligned for T (watch out for different heap and stack alignment)
    • (pre-C++20: use the placement-new operator on the buffer)
    • reinterpret_cast the buffer as T

Why not use reinterpret_cast?

Using reinterpret_cast to interpret the bytes of an object A as the object B breaks several assumptions the C++ compiler makes about:

  • The C++ object model (object lifetime rules, strict aliasing rules)
  • The hardware architecture (alignment rules)

This section tries to show how reinterpret_cast can cause all of the listed problems.

🪞 Strict Aliasing

A C++ compiler is allowed to assume that when de-referenced, two pointers of incompatible types do not have the same value (i.e. do not point to the same chunk of memory). By using reinterpret_cast you break the compiler’s assumption, leading to undefined behavior.

Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias eachother.)
CellPerformance - Understanding Strict Aliasing

The following example shows how passing the same pointer value in two different pointer types breaks the compiler’s assumptions. In the init_vars function, the compiler assumes that a and b reference different memory, allowing the compiler to assume that the *b = 0.f expression does not change the value of *a.

int init_vars(int* a, float* b) {
   *a = 42;
   *b = 0.f; // 🪞 strict aliasing violation
   // the compiler is allowed to assume 42 is returned
   return *a;
}

int main() {
   int a_and_b{12};
   // the compiler may optimize this to `return 42`
   return init_vars(&a_and_b, reinterpret_cast<float*>(&a_and_b));
}

If you reinterpret_cast a pointer to an incompatible type and use it, you violate the strict aliasing rule, breaking the compiler’s assumptions.
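For comparison, here is a minimal sketch of how the bits of a float can be inspected without ever creating two incompatible pointers to the same memory. The helper name float_bits and the demo function are made up for this illustration, and the expected bit pattern assumes an IEEE-754 float, which the static_assert checks.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper (not part of the original example): returns the bit
// pattern of a float by copying its bytes into a separate uint32_t object.
std::uint32_t float_bits(float value) {
   static_assert(sizeof(std::uint32_t) == sizeof(float),
      "this sketch assumes a 32-bit float");
   std::uint32_t bits;
   std::memcpy(&bits, &value, sizeof(bits)); // no aliasing pointers involved
   return bits;
}

void float_bits_demo() {
   static_assert(std::numeric_limits<float>::is_iec559,
      "the expected value below assumes IEEE-754 floats");
   assert(float_bits(1.0f) == 0x3f800000u);
}
```

Since bits and value are two distinct objects, the compiler’s aliasing assumptions hold and it remains free to optimize the copy.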

📏 Alignment

Depending on the target hardware architecture, data types are required to be placed at specific positions in memory. For example, a 2-byte integer (uint16_t) could be required to start at a memory address divisible by 2.
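To make the rule a bit more tangible, here is a small sketch that checks whether an address is a multiple of a given alignment. The helper is_aligned is a made-up name for this illustration, and the check relies on the (implementation-defined, but common) assumption that converting a pointer to std::uintptr_t yields its numeric address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if the address of `ptr` is divisible by
// `alignment` (assumes uintptr_t holds the plain numeric address).
bool is_aligned(const void* ptr, std::size_t alignment) {
   return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

void is_aligned_demo() {
   // force the array itself onto an even address
   alignas(2) unsigned char bytes[4] = {};
   assert(is_aligned(&bytes[0], 2));  // even address: fine for 2-byte types
   assert(!is_aligned(&bytes[1], 2)); // one byte further: odd address
}
```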

So far so good, but what would happen if we violated those requirements? The Linux kernel documentation says the following about unaligned memory access:

Unaligned memory accesses occur when you try to read N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N != 0).

[…]

  • Some architectures are able to perform unaligned memory accesses transparently, but there is usually a significant performance cost.
  • Some architectures raise processor exceptions when unaligned accesses happen. The exception handler is able to correct the unaligned access, at significant cost to performance.
  • Some architectures raise processor exceptions when unaligned accesses happen, but the exceptions do not contain enough information for the unaligned access to be corrected.
  • Some architectures are not capable of unaligned memory access, but will silently perform a different memory access to the one that was requested, resulting in a subtle code bug that is hard to detect!
kernel.org Documentation - Unaligned Memory Accesses

In short: It’s bad; don’t do it.

Let’s try it anyways 🙂!

The following C++14 code allocates a 128-byte buffer on the heap. The allocation function guarantees that the first byte of the buffer satisfies the alignment requirement of every scalar type that fits into it (up to alignof(std::max_align_t)). Therefore, we could access the value at location &buffer[0] as if it were a uint8_t, uint16_t, uint32_t or uint64_t without violating any alignment rules.

To force an unaligned memory access, let us interpret the bytes at index 1 and 2 as a uint16_t and then print the integer value:

#include <cstdint>
#include <memory>
#include <fmt/core.h>

int main() {
   // create a heap buffer aligned with
   // __STDCPP_DEFAULT_NEW_ALIGNMENT__
   auto buffer = std::make_unique<uint8_t[]>(128);

   buffer[0] = 0x00; // we don't care about this byte
   buffer[1] = 0x01; // first byte of our uint16_t
   buffer[2] = 0x02; // second byte of our uint16_t
   buffer[3] = 0x03; // we don't care about this byte

   // this is fine, uint8_t has a 1 byte alignment requirement
   uint8_t* ptr8 = &buffer[1];

   // uint16_t* points to unaligned memory address
   uint16_t* ptr16 = reinterpret_cast<uint16_t*>(&buffer[1]);

   // ⚡ unaligned memory access happens here
   fmt::print("*ptr8={}, *ptr16={}\n", *ptr8, *ptr16);

   return 0;
}

Using clang-15 with -O2 -Wall for amd64, the compiler does not emit any warnings. No apparent runtime errors occur. Only after enabling the Undefined Behavior Sanitizer (-fsanitize=undefined), the error becomes apparent:

/app/example.cpp:17:48: runtime error: reference binding to misaligned address 0x55af847a9f21 for type 'uint16_t' (aka 'unsigned short'), which requires 2 byte alignment
0x55af847a9f21: note: pointer points here
 00 00 00  00 01 02 03 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00
              ^ 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /app/example.cpp:17:48 in 
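For contrast, here is a sketch of how the same two bytes could be read without tripping the sanitizer: copy them into a properly aligned uint16_t with std::memcpy. The helper name load_u16 is an assumption made for this illustration. Which of the two values you get depends on the endianness of the target.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads a uint16_t from any address, aligned or not,
// by copying the bytes into a local (and therefore aligned) object.
std::uint16_t load_u16(const std::uint8_t* src) {
   std::uint16_t value;
   std::memcpy(&value, src, sizeof(value));
   return value;
}

void load_u16_demo() {
   const std::uint8_t buffer[4] = {0x00, 0x01, 0x02, 0x03};
   const std::uint16_t value = load_u16(&buffer[1]);
   // 0x0201 on little-endian targets, 0x0102 on big-endian ones
   assert(value == 0x0201 || value == 0x0102);
}
```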

⏲ Object Lifetime

In short: Don’t worry about it if you don’t use reinterpret_cast.

While the talk this article is based on states that creating a byte-array that provides storage for a type T does not start the lifetime for the T object, it seems like the latest C++20 standard draft has been extended to allow for this.

An operation that begins the lifetime of an array of char, unsigned char, or std::byte implicitly creates objects within the region of storage occupied by the array. Any implicit or explicit invocation of a function named operator new or operator new[] implicitly creates objects in the returned region of storage and returns a pointer to a suitable created object.
C++20 standard draft [basic.memobj]

Furthermore it gives the following example which shows that the lifetime of an object can be started by allocating memory for it using malloc:

#include <cstdlib>
struct X { int a, b; };
X *make_x() {
   // The call to std::malloc implicitly creates an object of type X
   // and its subobjects a and b, and returns a pointer to that X object
   // (or an object that is pointer-interconvertible ([basic.compound]) with it),
   // in order to give the subsequent class member access operations
   // defined behavior.
   X *p = (X*)std::malloc(sizeof(struct X));
   p->a = 1;
   p->b = 2;
   return p;
}

In my interpretation this still means that you cannot start an object lifetime using reinterpret_cast.
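If your code base has to compile as pre-C++20, where malloc does not implicitly create objects, the lifetime can be started explicitly with placement new instead. The following is only a sketch of that idea; make_x_explicit and the demo function are made-up names, and X mirrors the struct from the standard’s example.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

struct X { int a, b; };

// Sketch for pre-C++20 code: malloc only provides storage, placement new
// explicitly starts the lifetime of the X object inside that storage.
X* make_x_explicit() {
   void* storage = std::malloc(sizeof(X));
   if (storage == nullptr) {
      return nullptr; // allocation failed
   }
   X* p = new (storage) X; // lifetime of *p starts here
   p->a = 1;
   p->b = 2;
   return p;
}

void make_x_demo() {
   X* p = make_x_explicit();
   assert(p != nullptr && p->a == 1 && p->b == 2);
   std::free(p); // X is trivially destructible, no destructor call needed
}
```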

What to do instead?

The only safe alternatives are std::memcpy and modern C++20 functions like std::bit_cast.

Conversion between T ↔ U of the same size

The wrong naïve solution would look like this:

#include <cstdint>
#include <fmt/core.h>

struct Foo { uint8_t x[4]; };
struct Bar { int32_t x; };

int main() {
   Foo foo{{0x00, 0x01, 0x02, 0x03}};
   Bar bar{12};
   // 🪞: strict aliasing violation
   // 📏: alignment violation
   // ⏲: object lifetime violation
   auto& as_float = reinterpret_cast<float&>(foo); // 📏
   auto& as_uint = reinterpret_cast<uint32_t&>(foo); // 📏
   auto& as_bar = reinterpret_cast<Bar&>(foo); // 📏
   auto& as_foo = reinterpret_cast<Foo&>(bar); // ⏲
   fmt::print(
      "as_float={}, as_uint={}, as_bar.x={}, as_foo={}\n",
      as_float, as_uint, as_bar.x, as_foo.x[0] // 🪞, 🪞, 🪞, 🪞
   );
   return 0;
}

There are three violations with this solution:

  1. 🪞 The strict aliasing rule is violated since we create and use references of incompatible types referring to the same memory chunk
  2. 📏 The alignment requirements of the types might not be met. Foo only requires 1-byte alignment, while float, uint32_t and Bar typically require 4-byte alignment
  3. ⏲ The lifetime of the as_foo object was never started: no (trivial) constructor was ever called, and the lifetime was not implicitly started by the creation of a byte array

The correct solution for C++20 would look like this:

#include <bit>
#include <cstdint>
#include <fmt/core.h>

struct Foo { uint8_t x[4]; };
struct Bar { int32_t x; };

int main() {
   Foo foo{{0x00, 0x01, 0x02, 0x03}};
   Bar bar{12};
   auto as_float = std::bit_cast<float>(foo);
   auto as_uint = std::bit_cast<uint32_t>(foo);
   auto as_bar = std::bit_cast<Bar>(foo);
   auto as_foo = std::bit_cast<Foo>(bar);
   fmt::print(
      "as_float={}, as_uint={}, as_bar.x={}, as_foo={}\n",
      as_float, as_uint, as_bar.x, as_foo.x[0]
   );
   return 0;
}

std::bit_cast returns the converted object by value. In practice the compiler can usually optimize the copy away.

For C++11 you can implement the bit_cast polyfill function on your own, using a memcpy (source):

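#include <cstring>     // std::memcpy
#include <type_traits> // std::enable_if, std::is_trivially_copyable

template <class To, class From>
typename std::enable_if< // std::enable_if_t would require C++14
   sizeof(To) == sizeof(From) &&
   std::is_trivially_copyable<From>::value &&
   std::is_trivially_copyable<To>::value,
   To
>::type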
bit_cast(const From& src) noexcept {
   static_assert(std::is_trivially_constructible<To>::value,
      "This implementation additionally requires "
      "destination type to be trivially constructible");

   To dst; // object lifetime started
   std::memcpy(&dst, &src, sizeof(To));
   return dst;
}

As you can see, the target type is trivially-constructed, ensuring the object lifetime is started and the alignment requirements are fulfilled, and then filled by std::memcpy before being returned. Even with this implementation the compiler is able to omit the std::memcpy in many cases.

Conversion from T → char[]

In short: use std::bit_cast (or polyfill) if you can, use reinterpret_cast if you must.

#include <cstdint>
#include <bit>
#include <array>
#include <fmt/core.h>

int main() {
   uint64_t foo{5854636534293464452ULL};
   auto bar = std::bit_cast<std::array<uint8_t, sizeof(foo)>>(foo);
   for(size_t i = 0; i < sizeof(foo); ++i) {
      fmt::print("bar[{}]=0x{:02x}\n", i, bar[i]);
   }
   return 0;
}

If after profiling your program you realize the compiler was not able to optimize the std::bit_cast (or polyfill) operation, you can think about reinterpret_cast:

#include <cstdint>
#include <fmt/core.h>

int main() {
   uint64_t foo{5854636534293464452ULL};
   uint8_t* bar = reinterpret_cast<uint8_t*>(&foo);
   for(size_t i = 0; i < sizeof(foo); ++i) {
      fmt::print("bar[{}]=0x{:02x}\n", i, bar[i]);
   }
   return 0;
}

What about

  • 🪞 Strict Aliasing? Accesses through char, unsigned char and std::byte pointers are exempted from the aliasing rules (uint8_t is typically an alias for unsigned char)
  • 📏 Alignment? uint8_t has a 1-byte alignment requirement
  • ⏲ Object Lifetime? I really don’t know, works fine tho™️

If you are concerned about the object lifetime issue, don’t use reinterpret_cast.
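If you are stuck on C++11 and want to stay clear of the lifetime question, one option is to copy the bytes out instead of pointing into the object. The following sketch does this with std::memcpy; to_bytes is a made-up helper name, and the byte order you observe depends on the endianness of the target.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: returns a copy of the object representation of
// `value` as an array of bytes.
template <class T>
std::array<unsigned char, sizeof(T)> to_bytes(const T& value) {
   static_assert(std::is_trivially_copyable<T>::value,
      "to_bytes requires a trivially copyable type");
   std::array<unsigned char, sizeof(T)> bytes;
   std::memcpy(bytes.data(), &value, sizeof(T));
   return bytes;
}

void to_bytes_demo() {
   const auto bytes = to_bytes(std::uint16_t{0x0102});
   // byte order depends on the endianness of the target
   assert((bytes[0] == 0x01 && bytes[1] == 0x02) ||
          (bytes[0] == 0x02 && bytes[1] == 0x01));
}
```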

Conversion from char[] → T

In short: Use std::memcpy if you can, use reinterpret_cast on a properly aligned char[] buffer if you must.

The following example shows how a stack-based byte buffer without any alignment guarantee is allocated (#1) and filled (#2) with data. Then the foo object is trivially constructed (#3) (no code actually runs), before being filled with the bytes from the buffer (#4). By explicitly constructing the object and using std::memcpy, no lifetime rules or alignment requirements are violated.

#include <cstddef> // offsetof
#include <cstdint>
#include <cstring> // std::memcpy
#include <fmt/core.h>

struct Foo { uint32_t y; bool x; bool z; };

void recv(uint8_t* buf) {
   buf[offsetof(Foo, x) + 0] = 0x01; // x
   buf[offsetof(Foo, y) + 0] = 0xff; // y_3
   buf[offsetof(Foo, y) + 1] = 0x00; // y_2
   buf[offsetof(Foo, y) + 2] = 0x00; // y_1
   buf[offsetof(Foo, y) + 3] = 0x00; // y_0
   buf[offsetof(Foo, z) + 0] = 0x00; // z
}

int main() {
   uint8_t stack_buffer[sizeof(Foo[10])]; // #1

   // let's pretend we received this buffer via the network
   recv(stack_buffer); // #2

   // explicitly start the lifetime /w trivial ctor
   Foo foo; // #3
   // fill the foo with the bytes from the buffer
   std::memcpy(&foo, stack_buffer, sizeof(foo)); // #4
   fmt::print("foo=(x={}, y={}, z={})\n", foo.x, foo.y, foo.z);

   return 0;
}
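Real network buffers rarely contain exactly one object at offset 0, so the same idea can be wrapped into a small helper that also checks the buffer length. This is only a sketch: read_as is a made-up name, and the bounds check is an addition that the example above does not need because its buffer is large enough by construction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies sizeof(T) bytes starting at `offset` out of
// a received buffer into `out`. Returns false if the buffer is too short.
template <class T>
bool read_as(const std::uint8_t* buf, std::size_t buf_size,
             std::size_t offset, T& out) {
   static_assert(std::is_trivially_copyable<T>::value,
      "read_as requires a trivially copyable type");
   if (offset > buf_size || buf_size - offset < sizeof(T)) {
      return false;
   }
   std::memcpy(&out, buf + offset, sizeof(T)); // offset may be unaligned
   return true;
}

void read_as_demo() {
   const std::uint8_t buf[8] = {0, 0, 0, 0xAA, 0xAA, 0xAA, 0xAA, 0};
   std::uint32_t value = 0;
   assert(read_as(buf, sizeof(buf), 3, value)); // odd offset is fine
   assert(value == 0xAAAAAAAAu);                // same on any endianness
   assert(!read_as(buf, sizeof(buf), 6, value)); // only 2 bytes left
}
```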

If for some reason you really don’t want to copy the bytes (e.g. for performance), you can interpret the bytes of the buffer as a new object directly, provided that the buffer is correctly aligned and the object lifetime is started. Let’s look at an example of interpreting bytes as a Foo.

📏 What about Alignment? If you use a new-allocated buffer for receiving bytes from the network, it is able to provide storage for the Foo object, since by definition the new allocation respects the __STDCPP_DEFAULT_NEW_ALIGNMENT__ alignment requirement. A stack-allocated buffer provides no such guarantee, meaning that a stack-based char[] buffer might start at an odd memory address, not fulfilling the alignment requirements of our Foo object. To force the stack-based buffer to fulfill those requirements, we can use the alignas keyword.

⏲ What about object lifetime? Starting with P0593R3, which was recently added to the C++20 standard, the lifetime of our Foo object is implicitly started once we allocate the stack or heap based buffer. Pre-C++20, you must start the lifetime yourself by using placement new on the buffer, which executes the trivial constructor.

#include <cstddef> // offsetof
#include <cstdint>
#include <memory>
#include <new>     // placement new
#include <fmt/core.h>

struct Foo { uint32_t y; bool x; bool z; };

void recv(uint8_t* buf) {
   buf[offsetof(Foo, x) + 0] = 0x01; // x
   buf[offsetof(Foo, y) + 0] = 0xff; // y_3
   buf[offsetof(Foo, y) + 1] = 0x00; // y_2
   buf[offsetof(Foo, y) + 2] = 0x00; // y_1
   buf[offsetof(Foo, y) + 3] = 0x00; // y_0
   buf[offsetof(Foo, z) + 0] = 0x00; // z
}

int main() {
   // lifetime for the Foo objects is implicitly started here (C++20)
   auto heap_buffer = std::make_unique<uint8_t[]>(sizeof(Foo[10]));
   alignas(Foo[10]) uint8_t stack_buffer[sizeof(Foo[10])];

   // let's pretend we received this buffer via the network
   recv(heap_buffer.get());
   recv(stack_buffer);

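   // explicitly start the lifetime w/ placement new (< C++20)
   // note: formally, default-initialization leaves the members with
   // indeterminate values; mainstream compilers typically keep the bytes
   Foo* h0 = new(heap_buffer.get()) Foo;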
   fmt::print("h0=(x={}, y={}, z={})\n", h0->x, h0->y, h0->z);

   // lifetime implicitly created by the buffer allocation (C++20)
   Foo* s0 = reinterpret_cast<Foo*>(stack_buffer);
   fmt::print("s0=(x={}, y={}, z={})\n", s0->x, s0->y, s0->z);

   return 0;
}
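If you cannot control how a buffer is declared (so alignas is not an option), the standard library’s std::align can find the first suitably aligned position inside it. The wrapper first_aligned_for below is a made-up name, and the sketch only locates the address; it does not start any object lifetime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: returns the first address inside [buf, buf + size)
// that is suitably aligned for T and leaves room for one T, or nullptr
// if no such address exists.
template <class T>
void* first_aligned_for(unsigned char* buf, std::size_t size) {
   void* ptr = buf;
   std::size_t space = size;
   return std::align(alignof(T), sizeof(T), ptr, space);
}

void first_aligned_demo() {
   unsigned char raw[16] = {};
   // start one byte into the array to simulate an arbitrary start address
   void* p = first_aligned_for<std::uint32_t>(raw + 1, sizeof(raw) - 1);
   assert(p != nullptr);
   assert(reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0);
}
```

The bytes skipped in front of the returned address are wasted, which is why declaring the buffer with alignas is the simpler choice whenever you own the declaration.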

Conclusion

Why, C++, why?! I want to go back to using reinterpret_cast without seeing UB everywhere.
