Atomics: the beauty is in the eye of the holder
Back when I was working at Radisys, one of the first things I had to do was implement a queue. Per the VOLTHA architecture, the OLT must stream its events to the VOLTHA core over gRPC, following a protobuf schema. Here's an excerpt from that schema:
service NativeEventsManagementService {
    // ...
    // Initiate the server streaming of the events
    rpc StreamEvents(google.protobuf.Empty) returns (stream Event);
}
The Radisys OLT needs to send a gRPC stream of errors/alerts about its health. Now, VOLTHA itself is written in Go, but the OLT firmware is written in C++.
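To set the stage, the C++ side of such a stream looks roughly like this. This is a minimal sketch, not the actual Radisys code; the service and message names come from the schema above, while the generated header and the drain_events helper are hypothetical:

#include <grpcpp/grpcpp.h>
#include "events.grpc.pb.h" // * hypothetical: generated from the schema above

// * hypothetical helper that pops the next event off our queue
bool drain_events(Event& out);

class NativeEventsServiceImpl final
    : public NativeEventsManagementService::Service {
    grpc::Status StreamEvents(grpc::ServerContext* /*ctx*/,
                              const google::protobuf::Empty* /*req*/,
                              grpc::ServerWriter<Event>* writer) override {
        Event event;
        while (drain_events(event)) {
            if (!writer->Write(event)) break; // * client disconnected
        }
        return grpc::Status::OK;
    }
};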
Disclaimer
An OLT is a pretty simple device. It doesn't really need a whole lot of performance engineering, and atomics were definitely overkill for this. I was young and dumb
and wanted to utilise modern C++ to its fullest.
So in the implementation of the queue, I did the following:
template <typename T>
class Queue {
    struct Node {
        T data;
        std::atomic<Node*> next{nullptr};
        explicit Node(const T& d) : data(d) {}
        // ...
    };

    std::atomic<Node*> head;
    std::atomic<Node*> tail;
};
The head and tail of the queue are both std::atomic<Node*>, initialised to the same pointer location (a dummy node). Now, for a quick overview of memory_order: essentially, the sequence of instructions you write isn't always what's actually executed. After you've written your instructions, the compiler as well as the CPU hardware reorder them to optimise for performance. This is called Out-of-Order Execution (Wiki). This is usually good, but sometimes we want to be more explicit about the order in which our instructions are executed. Per cppreference, we have the following paradigms:
- memory_order_relaxed: bare minimum guarantee in consistency, is atomic
- memory_order_acquire: consistency after acquiring
- memory_order_release: consistency before releasing
- memory_order_seq_cst: strict sequential consistency (Wiki)
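Before wiring this into the queue, here's a quick standalone illustration of the acquire/release pairing - a minimal sketch, assuming a single writer and a single reader:

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready{false};
int payload = 0;

void producer() {
    payload = 42;                                 // plain write
    ready.store(true, std::memory_order_release); // writes above can't sink below this
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {} // reads below can't hoist above this
    assert(payload == 42); // guaranteed: the acquire saw the release
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}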
With this, our enqueue
function becomes:
void enqueue(const T& data) {
    auto new_node = std::make_unique<Node>(data);
    Node* raw_node = new_node.release(); // * ownership transferred to queue
    Node* current_tail = tail.load(std::memory_order_relaxed);
    // * ┌------------------------------┘
    // * can be `relaxed` since `tail` is only modified by the producer
    current_tail->next.store(raw_node, std::memory_order_release);
    tail.store(raw_node, std::memory_order_release);
}
dequeue would have a similar implementation, but use std::memory_order_acquire instead of std::memory_order_release. Furthermore, reading from the head would be a std::memory_order_relaxed operation, as it's only modified by the consumer.
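For completeness, here's a sketch of what that dequeue could look like. It assumes head starts at the dummy node mentioned earlier, and that there's a single consumer:

bool dequeue(T& out) {
    Node* current_head = head.load(std::memory_order_relaxed);
    // * `relaxed`: only the consumer ever modifies `head`
    Node* next = current_head->next.load(std::memory_order_acquire);
    // * `acquire`: pairs with the producer's `release` store to `next`
    if (next == nullptr) {
        return false; // * queue is empty
    }
    out = next->data; // * safe: the acquire made the producer's writes visible
    head.store(next, std::memory_order_relaxed);
    delete current_head; // * retire the old dummy node
    return true;
}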
Looks good, right? There's a problem. The tail and head pointers work well when we have a single producer and a single consumer. But in our case, we have events coming in from multiple producers, which means they'll now have to contend to write to this queue. So we have to add a retry mechanism:
void enqueue(T data) {
    auto new_node = std::make_unique<Node>(data);
    Node* raw_node = new_node.release();
    while (true) {
        Node* current_tail = tail.load(std::memory_order_acquire);
        Node* next = current_tail->next.load(std::memory_order_acquire);
        if (next == nullptr) {
            // * `next` is null, try to link `raw_node` after the current tail
            if (current_tail->next.compare_exchange_strong(
                    next, raw_node,
                    std::memory_order_release, std::memory_order_relaxed)) {
                // * on success, swing `tail` to the newly linked node
                tail.compare_exchange_strong(
                    current_tail, raw_node,
                    std::memory_order_release, std::memory_order_relaxed);
                break;
            }
        }
    }
}
But this can get stuck: a thread links its node after the current tail, but before it can advance tail, another thread comes in wanting to modify tail itself. So we must handle the case where tail gets stuck pointing at a node whose next != nullptr, by helping advance it (this is essentially the "helping" trick from the Michael-Scott queue). Our while loop now becomes:
while (true) {
    Node* current_tail = tail.load(std::memory_order_acquire);
    Node* next = current_tail->next.load(std::memory_order_acquire);
    if (next == nullptr) {
        if (current_tail->next.compare_exchange_strong(
                next, raw_node,
                std::memory_order_release, std::memory_order_relaxed)) {
            // success: now update tail
            tail.compare_exchange_strong(
                current_tail, raw_node,
                std::memory_order_release, std::memory_order_relaxed);
            return;
        }
        // handle race condition: another producer linked first, retry
    } else {
        // advance the tail past the already-linked node
        tail.compare_exchange_strong(
            current_tail, next,
            std::memory_order_release, std::memory_order_relaxed);
    }
}
I'm not even going to pretend that I fully understand what's happening anymore. After this, I was hesitant to use atomics. Another reason was their inconsistent compatibility across hardware. This blog by ArangoDB delves into more detail, and does so quite well.
In cases where an atomic operation is not fully supported, a mutex lock is used as a fallback. Damn.
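You can actually observe this fallback from the standard library itself. A tiny check (the Big struct here is just an illustrative oversized payload):

#include <atomic>
#include <cstdio>

struct Big { char bytes[64]; }; // * too wide for the hardware's atomic instructions

int main() {
    std::printf("atomic<int> always lock-free: %d\n",
                std::atomic<int>::is_always_lock_free);
    std::atomic<Big> big{};
    // * with no wide-enough atomic instruction available, the
    // * library quietly falls back to locking
    std::printf("atomic<Big> lock-free: %d\n", big.is_lock_free());
}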
The other reason is what I like to call the illusion of performance. In theory, atomic operations are faster than mutex locks; std::atomic was introduced to enable lock-free programming, after all. In practice, though, contended atomics still bounce the same cache line between cores, and a plain mutex is often just as fast.
All of this charade to arrive at a rather predictable solution: just use std::mutex
instead.
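For reference, here's roughly what the boring version looks like - a minimal sketch, not the production code:

#include <mutex>
#include <optional>
#include <queue>

template <typename T>
class MutexQueue {
    std::queue<T> items;
    std::mutex mtx;

public:
    void enqueue(T data) {
        std::lock_guard<std::mutex> lock(mtx);
        items.push(std::move(data));
    }

    std::optional<T> dequeue() {
        std::lock_guard<std::mutex> lock(mtx);
        if (items.empty()) return std::nullopt;
        T out = std::move(items.front());
        items.pop();
        return out;
    }
};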
But wait! C++ is just so big that you can abandon a whole world of design and move on without thinking too much about it. Since then I've learnt its faults, and my experience wasn't unique.
Reflection
In my opinion, the beauty of C++ is its flexibility in design paradigms - you can be functional, you can follow classes, you can work with numerous abstractions if you don't want to get lost in implementation details.
If one assesses the trends in what we expect from a programming language, what do you see? Python and Rust are booming. They coexist. There are instances when
Gratitude
A big thanks to my colleagues from the Radisys OLT team, Praveen, Abhilash, Prateek, Tejal, Shiva and Thulasi.