March 27, 2026
5 min read

ZigBolt: Why We Built Our Own Aeron in Zig and Hit 20 Nanoseconds Per Message

#zigbolt
#zig
#hft
#low-latency
#messaging
#aeron
#ipc
#open-source

ZigBolt is an ultra-low-latency messaging system: lock-free ring buffers, zero-copy codecs, and a Raft cluster, all in pure Zig, all open source.

If you work in algorithmic trading or market making, you know the price of every microsecond. One extra context switch — and your order arrives second. One JVM GC pause — and the market maker on the other side has already updated the quote. In a world where money is measured in nanoseconds, messaging infrastructure isn't a boring pipe between services — it's a competitive advantage.

We built ZigBolt — a messaging system for high-frequency trading written in Zig. From scratch. No JVM, no garbage collector, no Media Driver, no XML configs. And we got 20 nanoseconds p50 latency on an SPSC ring buffer and 30 nanoseconds on IPC via shared memory.

This article covers why we needed it, how it works inside, and why Zig.


TL;DR

  • ZigBolt — open-source (MIT) messaging system for HFT in pure Zig
  • 20 ns p50 on SPSC, 30 ns p50 on IPC — faster than Aeron's published numbers
  • Zero-copy codec runs at 0 ns (compile-time generation, runtime is just a pointer cast)
  • No GC, no JVM, no Media Driver — the library embeds directly into your application
  • Raft cluster, archive, sequencer — all included
  • FFI bindings for Rust, Python, Go, TypeScript, C — work in whatever language you prefer

The Problem: Why Aeron Is Great But Not Enough

Aeron from Real Logic is the de facto standard for low-latency messaging in capital markets. Dozens of HFT firms use it, it's battle-tested, and it has excellent architecture. But Aeron has a fundamental problem, and its name is JVM.

JVM Safepoints: The Invisible Enemy

Even if you carefully put all data in off-heap memory, even if you disabled GC ergonomics and set GuaranteedSafepointInterval=300000 — the JVM still occasionally stops all threads at a safepoint. This isn't a bug, it's an architectural decision: the JVM needs safepoints for deoptimization, biased locking, and stack walking.

In practice it looks like this: your thread sends messages at p50 = 200 ns, and suddenly p99.9 spikes to 50 µs. For no apparent reason. Because one of the JVM threads decided it was time.

Media Driver: An Extra Hop

Aeron works through a Media Driver — a separate process (or embedded JVM) that routes messages between publisher and subscriber via shared memory. This gives nice isolation but adds at least one extra hop:

Aeron:    App → shm → Media Driver → shm → socket → NIC
ZigBolt:  App → ring buffer → io_uring → NIC

Each hop means extra nanoseconds, extra cache misses, extra unpredictability.

SBE: A Separate Build Step

Simple Binary Encoding — the standard FIX codec for financial messages. In the Aeron ecosystem, it's a separate Java utility that generates code from XML schemas. A separate dependency, a separate build step, a separate set of problems.


The Solution: ZigBolt

We asked ourselves: what if we took Aeron's best ideas — triple-buffered log, lock-free ring buffers, Raft cluster — and implemented them in a language that:

  1. Has no runtime overhead (no GC, no safepoints)
  2. Allows code generation at compile time (comptime)
  3. Trivially integrates with C libraries (DPDK, io_uring)
  4. Compiles to a ~100 KB binary

That language is Zig.


Architecture

┌──────────────────────────────────────────────────────────┐
│  Publisher/Subscriber API (typed generic wrappers)       │
├──────────────────────────────────────────────────────────┤
│  Transport Layer (channel factory & lifecycle)           │
├──────────────────────────────────────────────────────────┤
│  IPC Channel (shared memory)  │ UDP Channel (network)    │
├──────────────────────────────────────────────────────────┤
│  WireCodec (comptime, zero-copy) │ SBE Encoder/Decoder   │
├──────────────────────────────────────────────────────────┤
│  Ring Buffers (SPSC/MPSC) │ LogBuffer (triple-buffered)  │
├──────────────────────────────────────────────────────────┤
│  Archive (replay) │ Sequencer (total order) │ Raft (HA)  │
└──────────────────────────────────────────────────────────┘

Seven layers, each usable independently. Just need an SPSC ring buffer for IPC between two processes? Take it. Need a full cluster with Raft consensus and archiving? Also there.


Benchmarks: Numbers, Not Words


Real benchmark results over 10 million iterations (Apple Silicon / macOS):

SPSC Ring Buffer

Message Size   p50     p99     p99.9    Throughput
8 bytes        20 ns   30 ns   120 ns   42.8M msg/s
32 bytes       30 ns   50 ns   150 ns   28.5M msg/s
64 bytes       50 ns   60 ns   320 ns   17.6M msg/s
256 bytes      30 ns   50 ns   50 ns    29.5M msg/s

IPC Channel (shared memory)

Message Size   p50     p99      p99.9    Throughput
64 bytes       30 ns   40 ns    40 ns    35.7M msg/s
256 bytes      40 ns   40 ns    170 ns   27.4M msg/s
1024 bytes     90 ns   260 ns   900 ns   9.9M msg/s

LogBuffer (Aeron-style triple-buffered)

Message Size   p50     p99     p99.9    Throughput
32 bytes       30 ns   40 ns   320 ns   33.6M msg/s
64 bytes       30 ns   30 ns   160 ns   38.0M msg/s
256 bytes      30 ns   40 ns   60 ns    31.1M msg/s

WireCodec (comptime zero-copy)

Operation           Latency   Throughput
Encode (32 bytes)   0 ns      inlined memcpy
Decode (32 bytes)   ~0.4 ns   2.7 billion msg/s

Yes, you read that right: encoding takes zero nanoseconds. Because WireCodec(T) validates the struct at compile time and turns encode/decode into a plain @memcpy or pointer cast. Runtime overhead = zero.

For comparison: Aeron claims IPC RTT (round-trip) of ~250 ns. Our one-way latency is 30 ns. Even counting round-trip, we're 4x faster.


How It Works Inside

Lock-Free SPSC: Simplicity as Virtue


A single-producer single-consumer ring buffer is the simplest and fastest lock-free data structure. The writer moves head, the reader moves tail, no CAS needed — acquire/release atomics suffice.

The key trick is cache-line padding. If head and tail sit in the same cache line, every update to one counter invalidates the cache for the other core (false sharing). The fix:

// Head (write position) — on its own cache line
head: std.atomic.Value(usize) align(128) = .init(0),

// 128 bytes padding — guaranteed isolation
_pad0: [128 - @sizeOf(std.atomic.Value(usize))]u8 = @splat(0),

// Tail (read position) — on its own cache line
tail: std.atomic.Value(usize) align(128) = .init(0),

128 bytes, not 64 — because on Apple Silicon (and many ARM chips) the hardware prefetcher can work with pairs of cache lines. We play it safe.

WireCodec: Comptime Instead of Code Generation


In the Java/C++ world, binary codecs require a separate step: write a schema, run a code generator, get code, compile. In Zig, all of this happens at compile time:

const TickMsg = extern struct {
    // Fields ordered largest-first so the fixed layout has no padding holes
    price: i64,
    timestamp: u64,
    symbol_id: u32,
    quantity: u32,
    side: u8,
    _reserved: [7]u8, // explicit tail padding: total size is exactly 32 bytes
};

const Codec = WireCodec(TickMsg);

// Encode — just a 32-byte memcpy. Inlines to 1-2 instructions.
Codec.encode(&msg, buf[0..Codec.wire_size]);

// Decode — pointer cast. Zero copies.
const tick = Codec.decode(buf[0..Codec.wire_size]);

The Zig compiler validates at comptime:

  • The struct has an explicit layout with no padding holes
  • Size is a multiple of 8 bytes (alignment for SIMD)
  • All fields are primitive types or fixed-size arrays of them

If something's wrong — compile error, not a runtime exception at 3 AM in production.
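C has no comptime, but the same "validate at build time, cast at runtime" idea can be approximated with `_Static_assert`. A hedged sketch (struct layout and names are illustrative, not ZigBolt's):

```c
#include <stdint.h>
#include <string.h>

/* A 32-byte wire message; fields ordered so the layout has no padding holes. */
typedef struct {
    int64_t  price;
    uint64_t timestamp;
    uint32_t symbol_id;
    uint32_t quantity;
    uint8_t  side;
    uint8_t  _reserved[7];
} TickMsg;

/* The compile-time contract: a violation fails the build,
   not a runtime exception at 3 AM in production. */
_Static_assert(sizeof(TickMsg) == 32, "wire size must be exactly 32 bytes");
_Static_assert(sizeof(TickMsg) % 8 == 0, "wire size must be a multiple of 8");

/* Encode: a fixed-size memcpy the compiler can inline. */
static void tick_encode(const TickMsg *msg, uint8_t buf[sizeof(TickMsg)]) {
    memcpy(buf, msg, sizeof(TickMsg));
}

/* Decode: zero-copy view over the buffer. Assumes same-endianness hosts
   and an 8-byte-aligned buffer; real code would enforce both. */
static const TickMsg *tick_decode(const uint8_t buf[sizeof(TickMsg)]) {
    return (const TickMsg *)buf;
}
```

The difference from Zig is that `_Static_assert` can only check what you remember to assert, whereas `WireCodec(T)` can walk the type's fields generically at comptime.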

IPC via Shared Memory

Two processes map the same file in /dev/shm. Publisher writes to the ring buffer, subscriber reads. No sockets, no system calls on the hot path:

// Publisher
const channel = try IpcChannel.create("/market-data", .{
    .term_length = 1024 * 1024, // 1 MB
});
channel.publish(&msg_bytes, msg_type_id);

// Subscriber (another process)
const channel = try IpcChannel.open("/market-data", .{
    .term_length = 1024 * 1024,
});
const count = channel.poll(handler_fn, 10);

The entire path from publish() to handler_fn invocation in the subscriber — 30 nanoseconds for a 64-byte message.
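Under the hood this relies on two processes mapping the same named shared-memory object. A minimal POSIX sketch of that mechanism (`shm_map` is an illustrative helper, not the ZigBolt API):

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map (or create) a named shared-memory region, as publisher and
   subscriber would each do on their side. */
static void *shm_map(const char *name, size_t len, int create) {
    int flags = O_RDWR | (create ? O_CREAT : 0);
    int fd = shm_open(name, flags, 0600);
    if (fd < 0) return NULL;
    if (create && ftruncate(fd, (off_t)len) != 0) { close(fd); return NULL; }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

The system calls (`shm_open`, `ftruncate`, `mmap`) happen once at channel setup; after that, both sides work on plain memory, which is why the hot path contains no syscalls at all.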

NAK-Based Reliability for UDP

For network transport, ZigBolt uses receiver-driven retransmission. The receiver tracks gaps in sequence numbers via bitmap and sends NAK (negative acknowledgement) to the sender. Plus AIMD congestion control — TCP-like slow start and congestion avoidance — to avoid flooding the network.
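The gap-tracking side can be sketched with a sliding window plus a single bitmap word: arriving sequence numbers set bits, the contiguous prefix advances the window, and unset bits below the highest received bit become NAK candidates. Illustrative C with a deliberately tiny 64-entry window (ZigBolt's window and wire format will differ):

```c
#include <stdint.h>

#define WINDOW 64 /* one uint64_t bitmap: illustrative window size */

typedef struct {
    uint64_t next_expected; /* everything below this arrived contiguously */
    uint64_t bitmap;        /* bit i set => next_expected + i was received */
} GapTracker;

/* Record an arriving sequence number; out-of-window packets are ignored. */
static void tracker_receive(GapTracker *t, uint64_t seq) {
    if (seq < t->next_expected || seq >= t->next_expected + WINDOW) return;
    t->bitmap |= 1ULL << (seq - t->next_expected);
    /* slide the window forward over the contiguous prefix */
    while (t->bitmap & 1ULL) {
        t->bitmap >>= 1;
        t->next_expected++;
    }
}

/* Collect sequence numbers to NAK: holes below the highest received bit. */
static int tracker_naks(const GapTracker *t, uint64_t *out, int max) {
    int n = 0, highest = 63;
    while (highest >= 0 && !(t->bitmap & (1ULL << highest))) highest--;
    for (int i = 0; i < highest && n < max; i++)
        if (!(t->bitmap & (1ULL << i)))
            out[n++] = t->next_expected + i;
    return n;
}
```

Receiving 0, 1, 2, 4, 6 leaves `next_expected` at 3 and produces NAKs for 3 and 5; the sender retransmits exactly those.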

Raft Cluster: When You Need Consistency

For cases where losing a message is unacceptable (e.g., a matching engine), ZigBolt includes full Raft consensus:

  • Leader election with configurable timeout (150-300 ms)
  • Log replication — the leader replicates every message to followers
  • Write-ahead log with CRC32 validation and crash recovery
  • Snapshots — so the WAL doesn't grow forever
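The CRC-checked WAL idea above is easy to sketch: each record carries a checksum of its payload, and crash recovery replays a record only if the checksum still matches. Illustrative C, not ZigBolt's on-disk format (a real WAL uses a table-driven or hardware CRC and variable-length records):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (IEEE polynomial, reflected). Table-driven in real code. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)(-(int32_t)(crc & 1)));
    }
    return ~crc;
}

/* A WAL record: the header carries the checksum of the payload. */
typedef struct {
    uint32_t crc;
    uint32_t len;
    uint8_t  payload[256];
} WalRecord;

static void wal_seal(WalRecord *rec, const uint8_t *data, uint32_t len) {
    memcpy(rec->payload, data, len);
    rec->len = len;
    rec->crc = crc32_ieee(rec->payload, len);
}

/* On crash recovery: replay a record only if its checksum matches,
   so a torn write at the tail of the log is detected and discarded. */
static bool wal_valid(const WalRecord *rec) {
    return rec->len <= sizeof(rec->payload)
        && crc32_ieee(rec->payload, rec->len) == rec->crc;
}
```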

Archive: Record and Replay

All messages can be recorded to a segmented on-disk archive. Then — replay from any position by time or sequence number. Built-in LZ4-style compression with no external dependencies. Sparse index for fast lookup within segments.
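A sparse index is just a sorted array of (sequence, offset) pairs sampled every Nth message; lookup is a binary search for the last entry at or before the target, after which replay scans forward from that offset. A small C sketch (illustrative layout, not ZigBolt's segment format):

```c
#include <stddef.h>
#include <stdint.h>

/* One sparse-index entry: every Nth message records (sequence -> offset). */
typedef struct {
    uint64_t seq;
    uint64_t offset;
} IndexEntry;

/* Binary search for the last entry with entry.seq <= target.
   Returns -1 if target precedes the first indexed sequence;
   replay then scans forward from the returned entry's offset. */
static ptrdiff_t sparse_lookup(const IndexEntry *idx, size_t n, uint64_t target) {
    ptrdiff_t lo = 0, hi = (ptrdiff_t)n - 1, best = -1;
    while (lo <= hi) {
        ptrdiff_t mid = lo + (hi - lo) / 2;
        if (idx[mid].seq <= target) { best = mid; lo = mid + 1; }
        else hi = mid - 1;
    }
    return best;
}
```

The trade-off is index size versus scan length: indexing every message gives O(log n) exact lookup but a large index, while sampling every Nth bounds the forward scan to at most N records.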

Total-Order Sequencer

For market making across multiple venues, it's critical that all events have a global order. The sequencer takes N input streams and merges them into one, assigning monotonically increasing sequence numbers. Every participant sees the same sequence of events.
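At its core, a total-order sequencer is a single counter at the merge point: whichever input stream's event is ingested first gets the next global number, and that number is what every downstream consumer agrees on. A deliberately minimal C sketch (illustrative types, single-threaded for clarity; a real merge point needs synchronization or a single-threaded ingest loop):

```c
#include <stdint.h>

typedef struct {
    uint64_t next_seq; /* the single global counter */
} Sequencer;

typedef struct {
    uint64_t global_seq; /* assigned by the sequencer */
    uint32_t stream_id;  /* which venue/input stream the event came from */
    uint64_t payload;
} SequencedMsg;

/* Stamp an incoming event: arrival order at the merge point
   defines the one global order all participants see. */
static SequencedMsg sequencer_ingest(Sequencer *s, uint32_t stream_id,
                                     uint64_t payload) {
    SequencedMsg m = { s->next_seq++, stream_id, payload };
    return m;
}
```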


Why Zig, Not Rust/C/C++?

We chose between four candidates. Here's an honest comparison:

Criterion                    Zig                C/C++                       Rust                    Java (Aeron)
GC / runtime overhead        None               None                        None                    JVM safepoints, GC
Comptime code generation     Native             Macros/templates            proc macros             None
C interop (DPDK, io_uring)   Trivial @cImport   Native                      FFI/bindgen             JNI overhead
SIMD                         @Vector, built-in  Intrinsics                  packed_simd (unstable)  Vectorization hints
Cross-compilation            Built-in           CMake hell                  cargo target            N/A
Build time                   Seconds            Minutes (C++)               Minutes                 Seconds + JVM startup
Hidden control flow          None               Exceptions, implicit casts  Panics in unwrap        Exceptions

Zig gave us a unique combination: C-level performance + safety during development + comptime metaprogramming (codecs, lookup tables, protocol state machines — all generated at compile time) + trivial integration with DPDK, liburing, ef_vi via @cImport.

And a Zig binary weighs ~100 KB. Versus 20+ MB for a JVM-based solution.


Bindings: Work in Your Language

ZigBolt compiles to a shared library with a C ABI, and we have ready-made bindings for five languages:

TypeScript / Node.js

import { IpcChannel } from "@zigbolt/node";

const channel = IpcChannel.create({
  name: "/my-market-data",
  termLength: 1024 * 1024,
});

const msg = Buffer.from("BTC/USDT 42000.50", "utf-8");
channel.publish(msg, 1);

Rust

use zigbolt::IpcChannel;

let ch = IpcChannel::create("/my-channel", 64 * 1024).unwrap();
ch.publish(b"hello", 1).unwrap();

let sub = IpcChannel::open("/my-channel", 64 * 1024).unwrap();
sub.poll(|data, msg_type_id| {
    println!("got {} bytes, type={}", data.len(), msg_type_id);
}, 10);

Python

from zigbolt import IpcChannel

ch = IpcChannel.create("/market-data", term_length=1024*1024)
ch.publish(b"tick data here", msg_type_id=1)

Plus Go and plain C. The same shared memory channel is accessible from all languages simultaneously — publisher in Zig, subscriber in Python, monitoring in Go. They all read the same mmap region.


SBE Codec: FIX-Compatible Messages

For financial protocols, ZigBolt includes a full SBE (Simple Binary Encoding) codec with compile-time schemas. Built-in message types:

  • NewOrderSingle — order submission
  • ExecutionReport — execution report
  • MarketDataIncrementalRefresh — incremental market data update
  • MassQuote — mass quoting
  • Heartbeat — connectivity check
  • Logon — authentication

No external code generator, no XML. Everything is described with Zig structs and validated at compile time.


Wire Protocol: Aeron Compatibility

ZigBolt implements Aeron-compatible wire protocol flyweights:

  • DataHeaderFlyweight — data frames
  • StatusMessage — flow control
  • NAK — negative acknowledgement
  • Setup, RTT, Error — service frames

This means ZigBolt can coexist with existing Aeron infrastructure. Migration doesn't have to be a big bang.


What's Next

ZigBolt is currently at version 0.2.1. The core is stable, benchmarks are reproducible, bindings work. Coming soon:

  • io_uring backend — zero-copy network transport on Linux 6.0+ (IORING_OP_SEND_ZC)
  • DPDK / AF_XDP — kernel bypass for when every microsecond counts
  • Multi-Raft — sharding by instrument/strategy
  • Columnar archive — Apache Arrow/Parquet integration for analytics
  • Hugepage support — pre-faulted 2MB/1GB hugepages to minimize TLB misses

Try It

Build from source (zig build), run benchmarks (zig build bench), connect via FFI from any language. If you have Zig 0.15.1 and a couple of minutes — try the ping-pong benchmark and compare with your current solution.




Citation

@software{soloviov2026zigbolt,
  author = {Soloviov, Eugen},
  title = {ZigBolt: Why We Built Our Own Aeron in Zig and Hit 20 Nanoseconds Per Message},
  year = {2026},
  url = {https://marketmaker.cc/en/blog/post/zigbolt-zig-messaging-hft},
  version = {0.2.1},
  description = {How and why we built an ultra-low-latency messaging system for HFT from scratch in Zig. No JVM, no GC, no surprises.}
}

MarketMaker.cc Team

Quantitative Research & Strategy

