The Curious Case of the Impossible Checksum
Naturally, as a software developer confronted with less-than-optimal software, my threshold for annoyance-driven re-implementation of proprietary apps is quite low. In this particular instance I wanted to use my phone to control the resistance of my stationary exercise bike (Kettler Racer S). Proprietary Android apps can automatically regulate the resistance of the trainer via Bluetooth according to an exercise plan you pick. To quote GitHub user “kaegi”, the usability of the aforementioned apps ranges from “bad-user experience” to “non-functional”. I decided to take matters into my own hands and implemented the Bluetooth protocol for controlling the trainer, which led me to uncover a surprisingly stupid implementation bug in the firmware.
TL;DR
- I re-implemented a serial RFCOMM Bluetooth protocol to control my exercise bike
- The protocol contains special characters to delimit the beginning and end of frames
- The frame payload has to be escaped to not contain those special characters
- The protocol has a checksum, which is wrongly escaped, leading to impossible checksums
- Varying an unused byte in the frame payload changes the checksum, which makes it possible to avoid those impossible checksums
The Bluetooth Protocol
By reverse-engineering the barely functional Android apps, I found out that the phone establishes a Bluetooth RFCOMM channel to the trainer and transmits an initial authentication sequence. In the jungle of Bluetooth protocols, RFCOMM is a reliable, serial data connection (basically a poor man’s TCP), which is relatively easy to implement. On Android, Bluetooth tracing can be enabled in the developer settings in order to create Wireshark-compatible PCAP dumps for analysis. Luckily, I didn’t have to go through the trouble of black-box analyzing the traffic, since weakly obfuscated Android apps and a Rust-based implementation of the Kettler protocol exist.
To keep it short, the protocol has two simple components: properties and methods. Properties hold variable-length, user-defined values; some are readable and writable, others are read-only. The method defines which action is taken on the property.
For example, to set the trainer’s target resistance, the phone needs to send a WRITE POWER_TARGET request with the desired power as the value.
To check if the command was successful, we can send a READ POWER_TARGET command and check if our desired value is set.
There is a long list of properties of which many are not supported on the trainer since they are meant for different equipment types (like a treadmill).
method | byte |
---|---|
READ | 0x01 |
WRITE | 0x02 |
ANSWER | 0x03 |
STATUS | 0x04 |
ERROR | 0x05 |
RESET | 0x06 |
property | bytes |
---|---|
AUTHENTIFICATION | 0x00 0x01 |
DEVICE_STATE | 0x00 0x06 |
RPM | 0x00 0x09 |
POWER_TARGET | 0x00 0x0a |
POWER_CURRENT | 0x00 0x0b |
… | … |
Each request/response follows the same binary format. Since this message is later packaged into a frame, we will call the request/response a frame payload. The structure of the frame payload is simple:
The leading two bytes describe which property the message is about, followed by one byte that defines the method of the message. To allow the user to send variable-length values, the fourth and fifth byte define the length of the user-defined value. Finally, the last bytes are the user-defined value.
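The layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_payload` is made up for this example, and both the byte order of the length field and the encoding of the power value (two bytes, big-endian) are assumptions that the description above does not confirm.

```python
import struct

# Method and property codes taken from the tables above.
METHOD_READ = 0x01
METHOD_WRITE = 0x02
PROPERTY_POWER_TARGET = bytes([0x00, 0x0A])


def build_payload(prop: bytes, method: int, value: bytes = b"") -> bytes:
    """Assemble property (2 bytes), method (1 byte), length (2 bytes), value."""
    if len(prop) != 2:
        raise ValueError("property must be exactly two bytes")
    # ASSUMPTION: the two length bytes are big-endian.
    return prop + bytes([method]) + struct.pack(">H", len(value)) + value


# ASSUMPTION: the target power is encoded as a 2-byte big-endian integer.
write_request = build_payload(PROPERTY_POWER_TARGET, METHOD_WRITE, struct.pack(">H", 150))
read_request = build_payload(PROPERTY_POWER_TARGET, METHOD_READ)
print(write_request.hex(" "))
print(read_request.hex(" "))
```

Under these assumptions the write request carries seven bytes (five header bytes plus the two value bytes), while the read request consists of the five header bytes only.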
Protocol-Framing
Up until now we have ignored the fact that the RFCOMM Bluetooth channel is a continuous data stream. Unlike Ethernet, for example, it has no built-in way of sending distinct packets (i.e. frames) of data. If a developer chooses to use a frame-based protocol, they need to implement the framing themselves. For some use cases framing might not even be needed.
In our case though, a very simple framing is used. In the continuous stream of data, a special character is used to signal the start of a frame. Likewise, another special character is used to signal the end of a frame. The frame payload can be placed in between those special characters.
This protocol uses the ASCII characters STX (0x02, Start of Text) and ETX (0x03, End of Text) as the special characters.
Additionally, a checksum is appended to the end of the frame, after the ETX character, to ensure the integrity of the message.
Detecting frames could hardly be simpler. Here is some pseudocode that ignores errors and the checksum:
while (stream.isOpen()) {
    // buffer for the frame payload
    frameBuffer = Buffer()
    // read the first byte
    b = stream.readByte()
    // wait for the beginning of a frame (STX)
    while (b != 0x02) {
        b = stream.readByte()
    }
    // collect bytes until we encounter the end of the frame (ETX)
    while ((b = stream.readByte()) != 0x03) {
        // store the payload byte in the frame buffer
        frameBuffer.append(b)
    }
    // TODO: checksum stuff here
    // hand over the collected payload, not the last byte read
    frameReceived(frameBuffer)
}
The code waits for the STX character to appear in the stream.
All following bytes are written to a buffer until an ETX character is received.
Finally, the frame is submitted via the frameReceived method and the procedure restarts.
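As a rough Python counterpart to the pseudocode, the sketch below splits an already received chunk of bytes into frames and also picks up the two checksum bytes that follow ETX. The name `split_frames` is hypothetical, and working on a complete `bytes` object instead of a live stream is a simplification made for this example.

```python
STX = b"\x02"
ETX = b"\x03"
CHECKSUM_LENGTH = 2  # two checksum bytes follow the ETX character


def split_frames(data: bytes):
    """Yield (escaped_payload, checksum) for every complete frame in data."""
    pos = 0
    while True:
        start = data.find(STX, pos)
        if start < 0:
            return  # no further frame start
        end = data.find(ETX, start + 1)
        if end < 0:
            return  # frame not finished yet
        checksum = data[end + 1:end + 1 + CHECKSUM_LENGTH]
        if len(checksum) < CHECKSUM_LENGTH:
            return  # checksum not fully received yet
        yield data[start + 1:end], checksum
        # continue searching behind the checksum of this frame
        pos = end + 1 + CHECKSUM_LENGTH


received = bytes.fromhex("ff 02 000a010000 03 abcd 02 11 03 1234 02 55")
for payload, checksum in split_frames(received):
    print(payload.hex(" "), "|", checksum.hex(" "))
```

Bytes before the first STX are skipped, and an incomplete trailing frame is simply left out, mirroring the "ignore errors" spirit of the pseudocode.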
Payload Escaping
As the frame payload might contain one of the special characters used for the framing, the offending characters have to be escaped.
For this, a third special ASCII character is chosen: DLE (0x10, Data Link Escape).
This character, followed by one of three possible other characters, forms an escape sequence which allows the transmission of special characters in an encoded way.
The following table shows all three inputs with their respective escape sequence:
unescaped input | escaped output |
---|---|
0x02 | 0x10 0x22 |
0x03 | 0x10 0x23 |
0x10 | 0x10 0x30 |
The three characters to be escaped, with their escape sequences
In the process of escaping, those input bytes are simply replaced by the respective escape sequence. After escaping, the frame payload does not contain any special characters that interfere with the framing anymore.
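A minimal Python sketch of this escaping, together with its inverse, could look as follows. The escape table is copied from above; the function names `escape` and `unescape` are chosen for this example only.

```python
DLE = 0x10
# Escape table from above: every special byte becomes DLE followed by (byte + 0x20).
ESCAPES = {0x02: 0x22, 0x03: 0x23, 0x10: 0x30}
UNESCAPES = {code: raw for raw, code in ESCAPES.items()}


def escape(payload: bytes) -> bytes:
    """Replace STX, ETX and DLE by their two-byte escape sequences."""
    out = bytearray()
    for b in payload:
        if b in ESCAPES:
            out += bytes([DLE, ESCAPES[b]])
        else:
            out.append(b)
    return bytes(out)


def unescape(escaped: bytes) -> bytes:
    """Reverse escape(); raises ValueError on an unknown or truncated sequence."""
    out = bytearray()
    it = iter(escaped)
    for b in it:
        if b == DLE:
            code = next(it, None)
            if code not in UNESCAPES:
                raise ValueError("invalid escape sequence")
            out.append(UNESCAPES[code])
        else:
            out.append(b)
    return bytes(out)


raw = bytes([0x00, 0x0A, 0x02, 0x00, 0x02, 0x10, 0x03])
print(escape(raw).hex(" "))
```

Since the WRITE method itself is encoded as 0x02, every write request contains at least one escape sequence.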
A payload is framed in three steps: first the payload is escaped, then the STX and ETX bytes are wrapped around the escaped payload, and finally the checksum is appended.
Checksum
So far everything has been quite straightforward. Now on to the more convoluted stuff: the checksum. To verify the integrity of the message on the application layer, a 16-bit CCITT CRC (generator polynomial 0x8408, the bit-reversed form of 0x1021) is used. It is meant to protect the integrity of the unescaped payload.
Before the payload bytes are escaped, the checksum is computed over all of them. The 2-byte result is appended to the frame after the ETX character.
Both the sender and the receiver calculate the checksum. If the receiver detects a difference between the two checksums, the message is corrupted and the corruption must be handled (e.g. by ignoring the message).
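For illustration, a bitwise CRC-16 with the reflected polynomial 0x8408 can be written in Python as shown below. Only the polynomial is given above; the initial value and the final XOR are assumptions and are therefore left as parameters. With both set to zero the function matches the variant commonly listed as CRC-16/KERMIT, which may or may not be the one the trainer uses.

```python
POLYNOMIAL = 0x8408  # reflected form of the CCITT polynomial 0x1021


def crc16(data: bytes, init: int = 0x0000, xor_out: int = 0x0000) -> int:
    """Bitwise, LSB-first CRC-16 over the unescaped payload bytes.

    ASSUMPTION: init and xor_out default to 0; the protocol's real values
    are not stated in the text above.
    """
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # shift right; apply the polynomial whenever a 1 bit falls out
            crc = (crc >> 1) ^ POLYNOMIAL if crc & 1 else crc >> 1
    return crc ^ xor_out


payload = bytes.fromhex("000a0200020096")
print(f"0x{crc16(payload):04x}")
```

The order in which the two checksum bytes are put on the wire is not specified above either, so it has to be taken from a captured frame.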
Impossible Checksums
I was quite happy with my implementation of this protocol, since it seemed to work pretty reliably. But after some time of playing around with sending random values, I noticed that setting specific target power values caused the Bluetooth connection to reset. I checked the logs of the received frames and saw some error messages: “Invalid Checksum”.
To be more specific, I observed two similar effects:
- Some power target values I received had invalid checksums
- Some power target values I sent were not accepted by the trainer
Receiving
It was quite odd to me that specific values reliably caused invalid checksums, leading me to believe there was a bug in my implementation of the CRC algorithm [1]. To have a basis for testing, I wrote down all received frames with faulty checksums and wrote some unit tests for them. I compared my calculated CRC with the CRC I received from the trainer.
I quickly noticed that all the checksums I had calculated (the expected ones) contained one of the special characters. Furthermore, the checksums the trainer actually sent seemed to be the escaped form of the expected checksums, truncated to two bytes.
This led me to a hypothesis about the inner workings of the trainer’s firmware: the checksum is not allowed to contain any special character, since it would interfere with the frame-detection algorithm. As a workaround for checksums containing special characters, the implementers decided to escape and truncate the checksums before appending them to the end of the frame.
While this is certainly odd, it is unlikely to cause issues if the sender and receiver both conform to this escaping. It somewhat reduces the number of available checksums, but for the purpose of an application-level checksum it does not seem too dangerous.
I adjusted my implementation to escape and truncate the calculated checksum before comparing it to the received one. After that I had no issues with receiving invalid checksums.
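The adjusted comparison can be sketched like this in Python. The helper names are made up for this example, and the byte order of the checksum (high byte first) is an assumption.

```python
DLE = 0x10
ESCAPES = {0x02: 0x22, 0x03: 0x23, 0x10: 0x30}


def wire_checksum(crc: int) -> bytes:
    """Escape the two checksum bytes, then cut the result back to two bytes."""
    # ASSUMPTION: the checksum is transmitted high byte first.
    escaped = bytearray()
    for b in crc.to_bytes(2, "big"):
        if b in ESCAPES:
            escaped += bytes([DLE, ESCAPES[b]])
        else:
            escaped.append(b)
    return bytes(escaped[:2])


def checksum_matches(calculated_crc: int, received: bytes) -> bool:
    """Compare the locally calculated CRC with the two bytes sent by the trainer."""
    return wire_checksum(calculated_crc) == received


for crc in (0x1234, 0x025A, 0x5A03):
    print(f"0x{crc:04x} -> {wire_checksum(crc).hex(' ')}")
```

Whenever a special character is present, the truncation throws away part of the checksum: in this sketch 0x025A becomes 10 22 (the second byte is lost entirely) and 0x5A03 becomes 5a 10 (it can no longer be told apart from 0x5A02 or 0x5A10).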
One problem fixed, one to go!
Sending
The second issue I had was that specific power target commands I sent to the trainer were simply ignored. After working on the first checksum issue, I suspected that it was going to be related to that. Once again I wrote down the bytes of the frames that caused issues and turned them into unit tests.
Sure enough - each of the calculated checksums contained one of the three special characters. “Can’t be too hard to fix it then, I’ll just escape and truncate the checksum, like I did when receiving the checksums.” (Spoiler: It was not that simple)
Even after implementing the escaping and truncation, the commands would not be accepted by the trainer. I was at a loss as to what to do next, since there seemed to be no way to get the trainer to accept the valid checksum. … Unless I could choose the checksum myself!
At this point I remembered the “unused” byte in the frame payload.
This led me to the idea that I could avoid having to deal with the faulty escaping by simply not using checksums that contain special characters: whenever I calculate a checksum that contains a special character, I try out different values for the unused byte until the checksum no longer contains any.
To my surprise it actually worked. In all of my test cases, replacing the unused 0x00 byte with 0x01 or 0x04 was sufficient to prevent the checksum from containing a special character. The Kotlin implementation of this idea was pretty straightforward.
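The search for a harmless checksum can be sketched in Python as follows. This is not the Kotlin code mentioned above, just an illustration of the same idea. The position of the unused byte is passed in as the hypothetical parameter `unused_index`, and the CRC start value of zero is an assumption.

```python
SPECIAL = {0x02, 0x03, 0x10}  # STX, ETX, DLE


def crc16(data: bytes, init: int = 0x0000) -> int:
    """Bitwise CRC-16, polynomial 0x8408 (ASSUMPTION: initial value 0)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc


def checksum_is_safe(crc: int) -> bool:
    """True if neither checksum byte is one of the special characters."""
    return not ({crc >> 8, crc & 0xFF} & SPECIAL)


def pick_unused_byte(payload: bytes, unused_index: int) -> bytes:
    """Return a copy of payload whose unused byte yields a safe checksum.

    unused_index is hypothetical: it must point at the byte the trainer ignores.
    """
    candidate = bytearray(payload)
    for value in range(256):  # 0x00 first, so safe payloads stay unchanged
        candidate[unused_index] = value
        if checksum_is_safe(crc16(bytes(candidate))):
            return bytes(candidate)
    raise ValueError("no value of the unused byte gives a safe checksum")


payload = bytes.fromhex("000a0200020096")
adjusted = pick_unused_byte(payload, unused_index=3)
print(adjusted.hex(" "), f"crc=0x{crc16(adjusted):04x}")
```

Because the loop starts at 0x00, a payload whose checksum is already free of special characters is returned unchanged (provided its unused byte is 0x00 to begin with).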
Conclusion
To me these bugs look like a rushed implementation during the time the whole QA department was on holidays. Someone probably applied a quick fix after noticing that the frame parsing algorithm wouldn’t work for certain CRCs.
I think that this also partly explains why the Bluetooth connection is so flaky when using the proprietary apps. Not one implementation I saw implemented a workaround for this CRC “anomaly”.
Two speculations:
- This bug probably exists on various Kettler devices which implement an RFCOMM interface
- Had Kettler published the description of the protocol, there would be many open source apps to control Kettler devices
[1] I hate implementing bit-level operations in Java/Kotlin 🤮