ZigBolt: Why We Built Our Own Aeron in Zig and Hit 20 Nanoseconds Per Message
Lock-free ring buffers, zero-copy codecs, Raft cluster — all in pure Zig, all open source.
If you work in algorithmic trading or market making, you know the price of every microsecond. One extra context switch — and your order arrives second. One JVM GC pause — and the market maker on the other side has already updated the quote. In a world where money is measured in nanoseconds, messaging infrastructure isn't a boring pipe between services — it's a competitive advantage.
We built ZigBolt — a messaging system for high-frequency trading written in Zig. From scratch. No JVM, no garbage collector, no Media Driver, no XML configs. And we got 20 nanoseconds p50 latency on an SPSC ring buffer and 30 nanoseconds on IPC via shared memory.
This article covers why we needed it, how it works inside, and why Zig.
TL;DR
- ZigBolt — open-source (MIT) messaging system for HFT in pure Zig
- 20 ns p50 on SPSC, 30 ns p50 on IPC — faster than Aeron's published numbers
- Zero-copy codec runs at 0 ns (compile-time generation, runtime is just a pointer cast)
- No GC, no JVM, no Media Driver — the library embeds directly into your application
- Raft cluster, archive, sequencer — all included
- FFI bindings for Rust, Python, Go, TypeScript, C — work in whatever language you prefer
The Problem: Why Aeron Is Great But Not Enough
Aeron from Real Logic is the de facto standard for low-latency messaging in capital markets. Dozens of HFT firms use it, it's battle-tested, and it has excellent architecture. But Aeron has a fundamental problem, and its name is JVM.
JVM Safepoints: The Invisible Enemy
Even if you carefully put all data in off-heap memory, even if you disabled GC ergonomics and set GuaranteedSafepointInterval=300000 — the JVM still occasionally stops all threads at a safepoint. This isn't a bug, it's an architectural decision: the JVM needs safepoints for deoptimization, biased locking, and stack walking.
In practice it looks like this: your thread sends messages at p50 = 200 ns, and suddenly p99.9 spikes to 50 µs. For no apparent reason. Because one of the JVM threads decided it was time.
Media Driver: An Extra Hop
Aeron works through a Media Driver — a separate process (or embedded JVM) that routes messages between publisher and subscriber via shared memory. This gives nice isolation but adds at least one extra hop:
Aeron: App → shm → Media Driver → shm → socket → NIC
ZigBolt: App → ring buffer → io_uring → NIC
Each hop means extra nanoseconds, extra cache misses, extra unpredictability.
SBE: A Separate Build Step
Simple Binary Encoding — the standard FIX codec for financial messages. In the Aeron ecosystem, it's a separate Java utility that generates code from XML schemas. A separate dependency, a separate build step, a separate set of problems.
The Solution: ZigBolt
We asked ourselves: what if we took Aeron's best ideas — triple-buffered log, lock-free ring buffers, Raft cluster — and implemented them in a language that:
- Has no runtime overhead (no GC, no safepoints)
- Allows code generation at compile time (comptime)
- Trivially integrates with C libraries (DPDK, io_uring)
- Compiles to a ~100 KB binary
That language is Zig.
Architecture
┌─────────────────────────────────────────────────────────┐
│ Publisher/Subscriber API (typed generic wrappers) │
├─────────────────────────────────────────────────────────┤
│ Transport Layer (channel factory & lifecycle) │
├─────────────────────────────────────────────────────────┤
│ IPC Channel (shared memory) │ UDP Channel (network) │
├─────────────────────────────────────────────────────────┤
│ WireCodec (comptime, zero-copy) │ SBE Encoder/Decoder │
├─────────────────────────────────────────────────────────┤
│ Ring Buffers (SPSC/MPSC) │ LogBuffer (triple-buffered) │
├─────────────────────────────────────────────────────────┤
│ Archive (replay) │ Sequencer (total order) │ Raft (HA) │
└─────────────────────────────────────────────────────────┘
Seven layers, each usable independently. Just need an SPSC ring buffer for IPC between two processes? Take it. Need a full cluster with Raft consensus and archiving? Also there.
Benchmarks: Numbers, Not Words

Real benchmark results over 10 million iterations (Apple Silicon / macOS):
SPSC Ring Buffer
| Message Size | p50 | p99 | p99.9 | Throughput |
|---|---|---|---|---|
| 8 bytes | 20 ns | 30 ns | 120 ns | 42.8M msg/s |
| 32 bytes | 30 ns | 50 ns | 150 ns | 28.5M msg/s |
| 64 bytes | 50 ns | 60 ns | 320 ns | 17.6M msg/s |
| 256 bytes | 30 ns | 50 ns | 50 ns | 29.5M msg/s |
IPC Channel (shared memory)
| Message Size | p50 | p99 | p99.9 | Throughput |
|---|---|---|---|---|
| 64 bytes | 30 ns | 40 ns | 40 ns | 35.7M msg/s |
| 256 bytes | 40 ns | 40 ns | 170 ns | 27.4M msg/s |
| 1024 bytes | 90 ns | 260 ns | 900 ns | 9.9M msg/s |
LogBuffer (Aeron-style triple-buffered)
| Message Size | p50 | p99 | p99.9 | Throughput |
|---|---|---|---|---|
| 32 bytes | 30 ns | 40 ns | 320 ns | 33.6M msg/s |
| 64 bytes | 30 ns | 30 ns | 160 ns | 38.0M msg/s |
| 256 bytes | 30 ns | 40 ns | 60 ns | 31.1M msg/s |
WireCodec (comptime zero-copy)
| Operation | Latency | Throughput |
|---|---|---|
| Encode (32 bytes) | 0 ns | inlined memcpy |
| Decode (32 bytes) | ~0.4 ns | 2.7 billion msg/s |
Yes, you read that right: encoding takes zero nanoseconds. Because WireCodec(T) validates the struct at compile time and turns encode/decode into a plain @memcpy or pointer cast. Runtime overhead = zero.
For comparison: Aeron's published IPC round-trip (RTT) latency is ~250 ns. Our one-way latency is 30 ns, so even doubled to a ~60 ns round trip, we're roughly 4x faster.
How It Works Inside
Lock-Free SPSC: Simplicity as Virtue

A single-producer single-consumer ring buffer is the simplest and fastest lock-free data structure. The writer moves head, the reader moves tail, no CAS needed — acquire/release atomics suffice.
The key trick is cache-line padding. If head and tail sit in the same cache line, every update to one counter invalidates the cache for the other core (false sharing). The fix:
// Head (write position) — on its own cache line
head: std.atomic.Value(usize) align(128) = .init(0),
// 128 bytes padding — guaranteed isolation
_pad0: [128 - @sizeOf(std.atomic.Value(usize))]u8 = @splat(0),
// Tail (read position) — on its own cache line
tail: std.atomic.Value(usize) align(128) = .init(0),
128 bytes, not 64: Apple Silicon uses 128-byte cache lines, and on x86 the adjacent-line prefetcher pulls cache lines in pairs, so 64-byte spacing can still cause false sharing. We play it safe.
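The fields above are only the counters; the full push/pop protocol can be sketched in a few dozen lines. This is a minimal illustrative SPSC over fixed-size `u64` slots (ZigBolt's buffer carries variable-length messages, and these names are ours, not its API), showing why acquire/release atomics suffice without CAS:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

const CAP: usize = 8; // power of two, so `idx & (CAP - 1)` wraps cheaply

pub struct Spsc {
    head: AtomicUsize, // next write index, advanced only by the producer
    tail: AtomicUsize, // next read index, advanced only by the consumer
    slots: [UnsafeCell<u64>; CAP],
}

// Safe because each slot is written by exactly one thread at a time:
// the producer before publishing head, the consumer after loading head.
unsafe impl Sync for Spsc {}

impl Spsc {
    pub fn new() -> Self {
        Spsc {
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
            slots: std::array::from_fn(|_| UnsafeCell::new(0)),
        }
    }

    pub fn try_push(&self, v: u64) -> bool {
        let head = self.head.load(Ordering::Relaxed); // we own head
        let tail = self.tail.load(Ordering::Acquire); // see consumer progress
        if head - tail == CAP {
            return false; // full
        }
        unsafe { *self.slots[head & (CAP - 1)].get() = v; }
        self.head.store(head + 1, Ordering::Release); // publish the slot
        true
    }

    pub fn try_pop(&self) -> Option<u64> {
        let tail = self.tail.load(Ordering::Relaxed); // we own tail
        let head = self.head.load(Ordering::Acquire); // see producer writes
        if head == tail {
            return None; // empty
        }
        let v = unsafe { *self.slots[tail & (CAP - 1)].get() };
        self.tail.store(tail + 1, Ordering::Release); // free the slot
        Some(v)
    }
}

fn main() {
    let q = Spsc::new();
    assert!(q.try_push(42));
    assert_eq!(q.try_pop(), Some(42));
    assert_eq!(q.try_pop(), None);
}
```

The indices grow monotonically and are only masked on access, so fullness is simply `head - tail == CAP`; no separate "count" field and no compare-and-swap anywhere on the path.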
WireCodec: Comptime Instead of Code Generation

In the Java/C++ world, binary codecs require a separate step: write a schema, run a code generator, get code, compile. In Zig, all of this happens at compile time:
const TickMsg = packed struct {
symbol_id: u32,
price: i64,
quantity: u32,
side: u8,
_reserved: u56 = 0, // 7 bytes of explicit padding; packed structs take integers, not arrays. Total: 32 bytes.
timestamp: u64,
};
const Codec = WireCodec(TickMsg);
// Encode — just a 32-byte memcpy. Inlines to 1-2 instructions.
Codec.encode(&msg, buf[0..Codec.wire_size]);
// Decode — pointer cast. Zero copies.
const tick = Codec.decode(buf[0..Codec.wire_size]);
The Zig compiler validates at comptime:
- The struct is packed (no padding holes)
- Size is a multiple of 8 bytes (alignment for SIMD)
- All fields are primitive types
If something's wrong — compile error, not a runtime exception at 3 AM in production.
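The same "fail the build, not production" idea exists outside Zig. Here is a rough Rust analogue of those comptime checks, using a hypothetical `QuoteMsg` (not a ZigBolt type): `const` assertions are evaluated by the compiler, so a layout mistake is a compile error:

```rust
// Hypothetical fixed-layout wire message; field names are illustrative.
#[repr(C, packed)]
struct QuoteMsg {
    instrument_id: u32,
    bid: i64,
    ask: i64,
    _reserved: [u8; 4], // explicit padding, no hidden holes
}

// Compile-time guards: fixed 24-byte wire size, multiple of 8 for SIMD.
// Change any field width and the build fails before anything ships.
const _: () = assert!(std::mem::size_of::<QuoteMsg>() == 24);
const _: () = assert!(std::mem::size_of::<QuoteMsg>() % 8 == 0);

fn main() {
    println!("wire size = {} bytes", std::mem::size_of::<QuoteMsg>());
}
```

Zig's comptime goes further (it can generate the encode/decode functions themselves, not just check sizes), but the failure mode is the same: the invalid schema never compiles.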
IPC via Shared Memory
Two processes map the same file in /dev/shm. Publisher writes to the ring buffer, subscriber reads. No sockets, no system calls on the hot path:
// Publisher
const channel = try IpcChannel.create("/market-data", .{
.term_length = 1024 * 1024, // 1 MB
});
channel.publish(&msg_bytes, msg_type_id);
// Subscriber (another process)
const channel = try IpcChannel.open("/market-data", .{
.term_length = 1024 * 1024,
});
const count = channel.poll(handler_fn, 10);
The entire path from publish() to handler_fn invocation in the subscriber — 30 nanoseconds for a 64-byte message.
NAK-Based Reliability for UDP
For network transport, ZigBolt uses receiver-driven retransmission. The receiver tracks gaps in sequence numbers via bitmap and sends NAK (negative acknowledgement) to the sender. Plus AIMD congestion control — TCP-like slow start and congestion avoidance — to avoid flooding the network.
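The gap-detection logic described above can be sketched compactly. This is an illustrative model, not ZigBolt's implementation: it tracks missing sequence numbers in a `BTreeSet` for clarity, where a real receiver would use a fixed bitmap over the in-flight window:

```rust
use std::collections::BTreeSet;

// Receiver-side gap tracker: detects holes in the sequence number space
// and reports which sequences should be NAKed back to the sender.
struct GapTracker {
    next_expected: u64,
    missing: BTreeSet<u64>, // stand-in for a bitmap over the window
}

impl GapTracker {
    fn new() -> Self {
        GapTracker { next_expected: 0, missing: BTreeSet::new() }
    }

    // Process one arriving packet; returns the sequences to NAK.
    fn on_packet(&mut self, seq: u64) -> Vec<u64> {
        if seq >= self.next_expected {
            // Everything between the expected and the arrived seq is a gap.
            let gaps: Vec<u64> = (self.next_expected..seq).collect();
            self.missing.extend(gaps.iter().copied());
            self.next_expected = seq + 1;
            gaps
        } else {
            // An old sequence arriving again is a retransmission filling a hole.
            self.missing.remove(&seq);
            Vec::new()
        }
    }
}

fn main() {
    let mut t = GapTracker::new();
    assert!(t.on_packet(0).is_empty());     // in order, nothing to NAK
    assert_eq!(t.on_packet(3), vec![1, 2]); // 1 and 2 were skipped: NAK them
    assert!(t.on_packet(1).is_empty());     // retransmission fills one hole
    assert_eq!(t.missing.len(), 1);         // only seq 2 still outstanding
}
```

Receiver-driven NAKs keep the sender stateless about individual receivers on the happy path: retransmission work is only done when a gap is actually observed.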
Raft Cluster: When You Need Consistency
For cases where losing a message is unacceptable (e.g., a matching engine), ZigBolt includes full Raft consensus:
- Leader election with configurable timeout (150-300 ms)
- Log replication — the leader replicates every message to followers
- Write-ahead log with CRC32 validation and crash recovery
- Snapshots — so the WAL doesn't grow forever
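The 150-300 ms election timeout above is randomized per node, which is what makes Raft elections converge: followers rarely time out simultaneously, so split votes are rare. A minimal sketch of that idea (a tiny xorshift PRNG stands in for a real RNG; function names are ours, not ZigBolt's):

```rust
// Deterministic toy PRNG; a real node would reseed per election round.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

// Pick an election timeout uniformly-ish in [150, 300] ms.
// Each node seeds differently, so timeouts are spread across the window.
fn election_timeout_ms(node_id: u64) -> u64 {
    let mut s = node_id.wrapping_mul(0x9E37_79B9_7F4A_7C15) | 1;
    150 + xorshift(&mut s) % 151
}

fn main() {
    for id in 0..5u64 {
        let t = election_timeout_ms(id);
        assert!((150..=300).contains(&t));
        println!("node {id}: timeout {t} ms");
    }
}
```

Whichever follower's timer fires first starts the election and usually wins before the others even notice the leader is gone.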
Archive: Record and Replay
All messages can be recorded to a segmented on-disk archive. Then — replay from any position by time or sequence number. Built-in LZ4-style compression with no external dependencies. Sparse index for fast lookup within segments.
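The sparse-index lookup works like this: the archive stores a checkpoint (sequence number, byte offset) every N messages, replay binary-searches for the last checkpoint at or before the target, then scans forward. A sketch under those assumptions (the tuple layout is illustrative, not ZigBolt's on-disk format):

```rust
// index: sorted (sequence, byte_offset) checkpoints, one per N messages.
// Returns the byte offset to start scanning from, or None if the target
// precedes the first checkpoint in this segment.
fn replay_start(index: &[(u64, u64)], target_seq: u64) -> Option<u64> {
    match index.binary_search_by_key(&target_seq, |&(seq, _)| seq) {
        Ok(i) => Some(index[i].1),      // exact checkpoint hit
        Err(0) => None,                 // before the first checkpoint
        Err(i) => Some(index[i - 1].1), // scan forward from the previous one
    }
}

fn main() {
    // Hypothetical segment: a checkpoint every 1000 messages.
    let idx = [(0, 0), (1000, 65_536), (2000, 131_072)];
    assert_eq!(replay_start(&idx, 1500), Some(65_536));
    assert_eq!(replay_start(&idx, 2000), Some(131_072));
    assert_eq!(replay_start(&idx, 42), Some(0));
}
```

The index stays tiny (one entry per N messages instead of one per message), at the cost of scanning at most N-1 records after the seek, which is a standard space/latency trade-off for append-only logs.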
Total-Order Sequencer
For market making across multiple venues, it's critical that all events have a global order. The sequencer takes N input streams and merges them into one, assigning monotonically increasing sequence numbers. Every participant sees the same sequence of events.
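The core of a total-order sequencer fits in a few lines. This is an illustrative model (names are ours, not ZigBolt's API): events from N venues arrive in any interleaving, and each accepted event is stamped with the next global sequence number, so every downstream consumer replays the identical order:

```rust
struct Sequencer {
    next_seq: u64,
}

impl Sequencer {
    fn new() -> Self {
        Sequencer { next_seq: 0 }
    }

    // Stamp an inbound event; the (seq, venue, payload) triple is what
    // gets published on the single merged output stream.
    fn stamp<'a>(&mut self, venue: &'a str, payload: u64) -> (u64, &'a str, u64) {
        let seq = self.next_seq;
        self.next_seq += 1;
        (seq, venue, payload)
    }
}

fn main() {
    let mut s = Sequencer::new();
    // Two venues interleave; the global order is whatever the sequencer saw.
    assert_eq!(s.stamp("venue_a", 42), (0, "venue_a", 42));
    assert_eq!(s.stamp("venue_b", 7), (1, "venue_b", 7));
    assert_eq!(s.stamp("venue_a", 43), (2, "venue_a", 43));
}
```

The hard part in production is not the counter but making the sequencer a single serialization point without making it a bottleneck, which is exactly where the lock-free MPSC ring buffer from the layer below comes in.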
Why Zig, Not Rust/C/C++?
We chose between four candidates. Here's an honest comparison:
| Criterion | Zig | C/C++ | Rust | Java (Aeron) |
|---|---|---|---|---|
| GC / runtime overhead | None | None | None | JVM safepoints, GC |
| Comptime code generation | Native | Macros/templates | proc macros | None |
| C interop (DPDK, io_uring) | Trivial @cImport | Native | FFI/bindgen | JNI overhead |
| SIMD | @Vector, built-in | Intrinsics | packed_simd (unstable) | Vectorization hints |
| Cross-compilation | Built-in | CMake hell | cargo target | N/A |
| Build time | Seconds | Minutes (C++) | Minutes | Seconds + JVM startup |
| Hidden control flow | None | Exceptions, implicit casts | Panics in unwrap | Exceptions |
Zig gave us a unique combination: C-level performance + safety during development + comptime metaprogramming (codecs, lookup tables, protocol state machines — all generated at compile time) + trivial integration with DPDK, liburing, ef_vi via @cImport.
And a Zig binary weighs ~100 KB. Versus 20+ MB for a JVM-based solution.
Bindings: Work in Your Language
ZigBolt compiles to a shared library with a C-ABI, and we have ready-made bindings for five languages:
TypeScript / Node.js
import { IpcChannel } from "@zigbolt/node";
const channel = IpcChannel.create({
name: "/my-market-data",
termLength: 1024 * 1024,
});
const msg = Buffer.from("BTC/USDT 42000.50", "utf-8");
channel.publish(msg, 1);
Rust
use zigbolt::IpcChannel;
let ch = IpcChannel::create("/my-channel", 64 * 1024).unwrap();
ch.publish(b"hello", 1).unwrap();
let sub = IpcChannel::open("/my-channel", 64 * 1024).unwrap();
sub.poll(|data, msg_type_id| {
println!("got {} bytes, type={}", data.len(), msg_type_id);
}, 10);
Python
from zigbolt import IpcChannel
ch = IpcChannel.create("/market-data", term_length=1024*1024)
ch.publish(b"tick data here", msg_type_id=1)
Plus Go and plain C. The same shared memory channel is accessible from all languages simultaneously — publisher in Zig, subscriber in Python, monitoring in Go. They all read the same mmap region.
SBE Codec: FIX-Compatible Messages
For financial protocols, ZigBolt includes a full SBE (Simple Binary Encoding) codec with compile-time schemas. Built-in message types:
- NewOrderSingle — order submission
- ExecutionReport — execution report
- MarketDataIncrementalRefresh — incremental market data update
- MassQuote — mass quoting
- Heartbeat — connectivity check
- Logon — authentication
No external code generator, no XML. Everything is described with Zig structs and validated at compile time.
Wire Protocol: Aeron Compatibility
ZigBolt implements Aeron-compatible wire protocol flyweights:
- DataHeaderFlyweight — data frames
- StatusMessage — flow control
- NAK — negative acknowledgement
- Setup, RTT, Error — service frames
This means ZigBolt can coexist with existing Aeron infrastructure. Migration doesn't have to be a big bang.
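For a sense of what a flyweight walks over: per Aeron's published protocol specification, the data frame header is 32 bytes, and a flyweight reads fields in place over the raw buffer instead of deserializing into an object. A hedged Rust sketch of that layout (field names are ours; consult the Aeron spec before relying on it):

```rust
// Aeron-style data frame header, 32 bytes, little-endian on the wire.
#[repr(C, packed)]
struct DataHeader {
    frame_length: i32,   // total frame length, including this header
    version: u8,
    flags: u8,           // begin/end-of-fragment flags
    frame_type: u16,     // 0x01 marks a data frame
    term_offset: i32,    // position of this frame within the term
    session_id: i32,
    stream_id: i32,
    term_id: i32,
    reserved_value: i64, // application-reserved trailer
}

// A flyweight only works if the layout is exactly what the wire expects.
const _: () = assert!(std::mem::size_of::<DataHeader>() == 32);

fn main() {
    println!("header size = {} bytes", std::mem::size_of::<DataHeader>());
}
```

Because the struct is a byte-for-byte image of the frame, "parsing" a header is a pointer cast plus field reads, which is the same zero-copy trick WireCodec uses for application messages.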
What's Next
ZigBolt is currently at version 0.2.1. The core is stable, benchmarks are reproducible, bindings work. Coming soon:
- io_uring backend — zero-copy network transport on Linux 6.0+ (IORING_OP_SEND_ZC)
- DPDK / AF_XDP — kernel bypass for when every microsecond counts
- Multi-Raft — sharding by instrument/strategy
- Columnar archive — Apache Arrow/Parquet integration for analytics
- Hugepage support — pre-faulted 2MB/1GB hugepages to minimize TLB misses
Try It
- Website: zigbolt-landing.vercel.app
- Docs: zigbolt-landing.vercel.app/getting-started/introduction/
- Source code: github.com/suenot/zigbolt
- License: MIT
Build from source (zig build), run benchmarks (zig build bench), connect via FFI from any language. If you have Zig 0.15.1 and a couple of minutes — try the ping-pong benchmark and compare with your current solution.
Links:
- ZigBolt Landing: zigbolt-landing.vercel.app
- GitHub: github.com/suenot/zigbolt
- Aeron (for comparison): github.com/real-logic/aeron | our Aeron overview
- Zig language: ziglang.org
- Marketmaker.cc: marketmaker.cc
Citation
@software{soloviov2026zigbolt,
author = {Soloviov, Eugen},
title = {ZigBolt: Why We Built Our Own Aeron in Zig and Hit 20 Nanoseconds Per Message},
year = {2026},
url = {https://marketmaker.cc/en/blog/post/zigbolt-zig-messaging-hft},
version = {0.2.1},
description = {How and why we built an ultra-low-latency messaging system for HFT from scratch in Zig. No JVM, no GC, no surprises.}
}
MarketMaker.cc Team
Quantitative research and strategy