4 parts

High-Performance Backtest Engines

How to build a backtest engine that runs hundreds of times faster without changing a single PnL number — data layout, caching, adaptive resolution, and architecture, from first speedups to production internals.

  1. The Backtest Speed Ladder: 298x on a Laptop CPU, Identical PnL to the Last Trade
    Jul 1, 2026 #algotrading

    Five implementations of the same 80-combo parameter sweep, all verified to produce identical PnL: pandas rolling.apply takes 69.9 s, NumPy 3.1 s, Numba 2.0 s, and parallel Numba 0.23 s. That is a measured 298x speedup on an Apple M2 Max with zero hardware changes, and still ~13x over a competent vectorized baseline. What each rung buys, why a GPU is not the missing piece, and where the real bottleneck in mass parameter search lives.

  2. Aggregated Parquet Cache: How to Speed Up Multi-Timeframe Backtests by Hundreds of Times
    Mar 16, 2026 #algotrading

    How to precompute higher-timeframe candles and indicators from one-minute candles, save them to Parquet, and reuse them for mass strategy testing without redundant recalculation.

  3. Adaptive Drill-Down: Backtest with Variable Granularity from Minutes to Raw Trades
    Mar 17, 2026 #algotrading

    How adaptive data granularity speeds up backtests and saves storage: drill-down from 1m to 1s, 100ms, and raw trades only where price moved significantly or volume spiked, not across the entire historical series.

  4. The IPC Tax: Put the Backtest Engine Behind a Socket and Lose 13% — Almost None of It to the Socket
    Jul 2, 2026 #algotrading

    We ported a numba backtest kernel line-for-line to Rust and called it across a process boundary four ways, with an equivalence gate confirming identical PnL to the last trade. Shipping the entire 1.2 MB price series through a Unix socket costs ~2 ms — about 0.1% of the job. JSON-encoding the same payload costs 1348x more than raw bytes, chatty per-combo calls re-ship the data 80 times, and a per-bar call pattern would pay 2.1 s of pure IPC on a 2.0 s job. The boundary is cheap; the tax is in how you cross it.