Bid-Ask Spread Modeling and Prediction with Machine Learning
The bid-ask spread is the single most important variable a market maker controls. Set it too wide and you lose flow to competitors. Set it too narrow and adverse selection eats your inventory alive. Traditional microstructure theory gives us elegant decompositions of the spread into its economic components. Machine learning gives us the tools to predict how those components shift in real time. This post bridges both worlds: we start with the classical theory, build up to Roll's implicit spread estimator, then move into gradient boosting and deep learning models that predict spreads from order book features. Along the way we flag the units, leakage, and benchmarking traps that quietly invalidate spread models in practice.
Why Spreads Matter for Market Makers
A market maker continuously quotes a bid price $P^b$ and an ask price $P^a$. The quoted spread is:
$$S = P^a - P^b$$
Every round-trip (buy at the maker's bid, sell at the maker's ask, both filled by takers) transfers up to $S$ from the takers to the maker, in theory. In practice, the maker earns less than $S$ because of adverse selection: some takers are informed and trade right before the price moves against the maker. The realized profit per round-trip is the realized spread, which equals the effective spread minus the price impact:
$$S^{\text{real}}_t = S^{\text{eff}}_t - \text{PI}_t$$
We measure all three quantities on the same full-spread basis (not half), so the identity is dimensionally consistent. The effective spread for a single trade is:
$$S^{\text{eff}}_t = 2\, d_t\, (P_t - M_t)$$
Here $d_t \in \{+1, -1\}$ is the trade direction (taker buy or sell), $P_t$ is the transaction price, and $M_t$ is the midpoint of the best bid and ask at the time of the trade. The price-impact term is defined symmetrically over a post-trade horizon $\tau$:
$$\text{PI}_t = 2\, d_t\, (M_{t+\tau} - M_t)$$
where $M_{t+\tau}$ is the mid a horizon $\tau$ after the trade. The horizon must be stated explicitly: common choices are 5 minutes in equities and 30 seconds in crypto, where prices reprice faster. Subtracting the impact from the effective spread leaves the realized spread, $S^{\text{real}}_t = 2\, d_t\, (P_t - M_{t+\tau})$, which is what the maker keeps after the market has moved.
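The identity above can be checked numerically. The sketch below is illustrative only: the function name `spread_components` and the sample prices are assumptions, and it takes the trade direction as +1 for a taker buy and -1 for a taker sell.

```python
import numpy as np


def spread_components(direction, price, mid, mid_future):
    """Per-trade effective spread, price impact and realized spread (full-spread basis).

    direction  : +1 for a taker buy, -1 for a taker sell
    price      : transaction price
    mid        : midpoint at the time of the trade
    mid_future : midpoint one horizon after the trade
    """
    d = np.asarray(direction, dtype=float)
    p = np.asarray(price, dtype=float)
    m = np.asarray(mid, dtype=float)
    m_future = np.asarray(mid_future, dtype=float)

    effective = 2.0 * d * (p - m)
    impact = 2.0 * d * (m_future - m)
    realized = effective - impact  # equals 2 * d * (p - m_future)
    return effective, impact, realized


# Made-up numbers: a taker buy and a taker sell, each followed by a rising mid.
eff, imp, real = spread_components(
    direction=[+1, -1],
    price=[100.05, 99.95],
    mid=[100.00, 100.00],
    mid_future=[100.03, 100.01],
)
```

In the first trade the mid moves against the maker, so only part of the effective spread is kept. In the second the mid moves in the maker's favor, so the realized spread exceeds the effective spread.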
A market maker who can predict the spread — and its components — in the next 1, 5, or 60 seconds can dynamically adjust quotes to maximize realized spread while maintaining fill rates.
The Three Components of the Spread

The market microstructure literature (Stoll 1978, Glosten and Milgrom 1985, Huang and Stoll 1997) decomposes the bid-ask spread into three economic components.
1. Order Processing Cost ($C_{\text{OP}}$)
This is the cost of providing the market-making service per filled side: the fee the maker actually pays plus technology infrastructure, regulatory compliance, and the opportunity cost of capital deployed. Demsetz (1968) and Tinic (1972) were the first to formalize this component.
The key distinction is who pays which fee. A maker quoting passively pays the maker fee $f_{\text{maker}}$ on its own fills, and on many venues $f_{\text{maker}}$ is a rebate, i.e. negative. It does not pay the taker fee $f_{\text{taker}}$ on those passive fills; the counterparty that crosses the spread pays $f_{\text{taker}}$. So the maker's per-side order-processing cost is:
$$C_{\text{OP}} = f_{\text{maker}} + c_{\text{infra}}$$
where $f_{\text{maker}}$ is signed (a rebate lowers $C_{\text{OP}}$ and can make it negative) and $c_{\text{infra}}$ covers connectivity, colocation, and compute. In modern electronic markets this component has shrunk dramatically: sub-penny in equities, a few basis points or a net rebate in crypto.
The taker fee $f_{\text{taker}}$ matters for a different reason: it sets a floor on how tight the full spread can profitably be, because a taker who crosses pays $f_{\text{taker}}$ on top of the spread. If you want a spread floor that keeps your quotes economically attractive relative to that taker cost, motivate it separately rather than folding $f_{\text{taker}}$ into the maker's own cost. Conflating the two double-counts a round-trip fee inside a single half-spread.
2. Inventory Holding Cost ($C_{\text{INV}}$)
When a market maker accumulates a directional position (long or short), they bear price risk. The inventory component compensates for this risk. Stoll (1978) and Amihud and Mendelson (1980) modeled this as a function of volatility and the maker's current inventory:
$$C_{\text{INV}} = f(\sigma, q)$$
where $\sigma$ is the asset's volatility and $q$ is the maker's current inventory. As inventory grows, the maker widens the spread on the side where they are exposed and narrows it on the other, a technique called inventory skewing.
3. Adverse Selection Cost ($C_{\text{AS}}$)
This is the most dangerous component. Informed traders, those with superior information about imminent price moves, systematically pick off stale quotes. The adverse selection cost equals the expected loss per trade to informed counterparties. Copeland and Galai (1983) modeled this as the value of a free option the maker gives to informed traders. Glosten and Milgrom (1985) formalized it as the Bayesian revision in the maker's beliefs after observing a trade, shown here for a taker buy (the sell side is symmetric):
$$C_{\text{AS}} = \mathbb{E}[V \mid \text{buy}] - \mathbb{E}[V]$$
where $V$ is the true fundamental value. In liquid markets, adverse selection can account for 30-60% of the total spread.
The Full Decomposition
The quoted half-spread can be written as:
$$\frac{S}{2} = C_{\text{OP}} + C_{\text{INV}} + C_{\text{AS}}$$
with $C_{\text{OP}}$ (order processing), $C_{\text{INV}}$ (inventory holding), and $C_{\text{AS}}$ (adverse selection) all expressed as per-side (half-spread) costs, which is what keeps the accounting consistent. Huang and Stoll (1997) proposed an econometric method to estimate these components from trade and quote data. The key insight: order processing costs create a fixed spread floor, inventory costs create a spread that varies with position and volatility, and adverse selection costs create a spread that varies with information asymmetry.
Roll's Implicit Spread Model

Before high-frequency data was widely available, Richard Roll (1984) proposed an elegant method to estimate the effective spread using only transaction prices. His insight: in an efficient market, the bid-ask bounce induces negative serial covariance in price changes, even when there is no new information.
The Model
Assume the fundamental value follows a random walk:
$$V_t = V_{t-1} + \epsilon_t$$
The observed transaction price bounces between bid and ask:
$$P_t = V_t + \frac{S}{2}\, d_t$$
where $d_t \in \{+1, -1\}$ with equal probability (i.e., buys and sells are equally likely). The price change is:
$$\Delta P_t = \epsilon_t + \frac{S}{2}\,(d_t - d_{t-1})$$
Computing the first-order autocovariance:
$$\operatorname{Cov}(\Delta P_t, \Delta P_{t-1}) = -\frac{S^2}{4}$$
The model is derived in price units: $S$ falls out of the autocovariance of price changes, not returns. This distinction is the single most common implementation error, and we keep the code faithful to it below.
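For readers who want the intermediate steps, the autocovariance can be worked out as follows. Write the price change as $\Delta P_t = \epsilon_t + \frac{S}{2}(d_t - d_{t-1})$ and assume, as the model does, that the innovations $\epsilon_t$ are serially uncorrelated and independent of the trade signs $d_t$, and that the $d_t$ are i.i.d. with mean 0 and variance 1.

```latex
\begin{aligned}
\operatorname{Cov}(\Delta P_t, \Delta P_{t-1})
  &= \operatorname{Cov}\!\left(\epsilon_t + \tfrac{S}{2}(d_t - d_{t-1}),\;
                              \epsilon_{t-1} + \tfrac{S}{2}(d_{t-1} - d_{t-2})\right) \\
  &= \tfrac{S^2}{4}\,\operatorname{Cov}\!\left(d_t - d_{t-1},\; d_{t-1} - d_{t-2}\right) \\
  &= \tfrac{S^2}{4}\,\bigl(-\operatorname{Var}(d_{t-1})\bigr) \\
  &= -\tfrac{S^2}{4}
\end{aligned}
```

The only term that survives is the shared $d_{t-1}$, which enters the two consecutive changes with opposite signs. That is the bid-ask bounce.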
The Roll Estimator
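Solving for $S$:
$$S = 2\sqrt{-\operatorname{Cov}(\Delta P_t, \Delta P_{t-1})}$$
When the sample autocovariance is positive (which happens frequently in practice due to noise or momentum), the square root has no real value and the estimator is undefined. A common fix is to use a signed root or, as the implementation in this post does, to set the estimate to zero:
$$\hat{S} = 2\sqrt{\max(-\hat{\gamma}_1,\ 0)}$$
where $\hat{\gamma}_1$ is the sample first-order autocovariance of price changes.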
Implementation in Python
The estimator returns a spread in price units. To express it in basis points we divide by the midprice once — because, unlike a return-space estimator, it has not already been divided by price:
import numpy as np
import pandas as pd


def roll_spread(prices: pd.Series, window: int = 200) -> pd.Series:
    """
    Rolling Roll (1984) spread estimator, in PRICE units.

    The model is P_t = V_t + (S/2) d_t with Cov(ΔP_t, ΔP_{t-1}) = -S^2/4,
    so S is recovered from the autocovariance of price CHANGES (diff),
    not returns (pct_change). Using returns rescales the estimate by the
    price level and is wrong by roughly that factor.

    Parameters
    ----------
    prices : pd.Series
        Transaction prices.
    window : int
        Rolling window size (number of price changes).

    Returns
    -------
    pd.Series
        Estimated spread per window, in price units.
    """
    dprice = prices.diff().dropna()
    # Lag-1 autocovariance of price changes inside each rolling window
    autocov = dprice.rolling(window).apply(
        lambda x: np.cov(x[:-1], x[1:])[0, 1], raw=True
    )
    # Positive autocovariance is truncated to a zero spread
    return 2.0 * np.sqrt(np.maximum(-autocov, 0.0))


trades = pd.read_parquet("trades.parquet")
trades["roll_spread"] = roll_spread(trades["price"], window=200)
trades["quoted_spread"] = trades["ask"] - trades["bid"]
trades["midprice"] = 0.5 * (trades["ask"] + trades["bid"])
# Both spreads are in price units, so each is divided by the midprice exactly once
trades["quoted_spread_bps"] = trades["quoted_spread"] / trades["midprice"] * 1e4
trades["roll_spread_bps"] = trades["roll_spread"] / trades["midprice"] * 1e4
A quick sanity check on a simulated series (a fundamental random walk near a price of 100 with a known true spread $S$) recovers approximately $S$ from price changes. The returns-based variant would return roughly $S/100$, off by the price level, and then dividing that by the midprice again to get bps compounds the error. If you prefer a return-space estimator, derive the model in log-price space and drop the second division by the midprice; pick one convention and make the code match the math.
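A minimal simulation of that sanity check might look like the sketch below. The true spread of 0.10, the innovation volatility, the sample size, and the helper name `spread_from_changes` are all illustrative assumptions, not values from the original check.

```python
import numpy as np


def spread_from_changes(changes):
    """Roll-style estimate 2 * sqrt(max(-autocov, 0)) from a series of changes."""
    changes = np.asarray(changes, dtype=float)
    autocov = np.cov(changes[:-1], changes[1:])[0, 1]
    return 2.0 * np.sqrt(max(-autocov, 0.0))


rng = np.random.default_rng(0)
n = 200_000
true_spread = 0.10  # assumed value, in price units

# Fundamental random walk near 100, plus a bid-ask bounce of +/- half the spread
fundamental = 100.0 + np.cumsum(rng.normal(0.0, 0.005, size=n))
direction = rng.choice([-1.0, 1.0], size=n)
prices = fundamental + 0.5 * true_spread * direction

# Correct: autocovariance of price changes
est_price_units = spread_from_changes(np.diff(prices))
# Common mistake: autocovariance of simple returns
est_return_units = spread_from_changes(np.diff(prices) / prices[:-1])
# Ratio of the two estimates is roughly the average price level
scale = est_price_units / est_return_units
```

Under these assumptions `est_price_units` should land close to `true_spread`, while `est_return_units` is smaller by a factor of roughly the price level, which `scale` makes visible.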
Limitations of Roll's Model
Roll's model assumes: (1) market efficiency, (2) no information asymmetry, (3) i.i.d. trade direction, and (4) constant spread. All of these are violated in practice. Harris (1990) showed that the estimator is severely biased due to Jensen's inequality when applied to noisy data. Despite these limitations, the Roll estimator remains useful as a quick baseline and is widely used in empirical finance research.
ML Features for Spread Prediction

To move beyond static models, we need features that capture the dynamic drivers of spread variation. Here is a taxonomy of features organized by the spread component they proxy for.
Order Book Features (Inventory & Adverse Selection)
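| Feature | Formula | Proxies For |
|---|---|---|
| Book imbalance | $(V^b_1 - V^a_1) / (V^b_1 + V^a_1)$ | Directional pressure |
| Weighted mid-price | $(P^a_1 V^b_1 + P^b_1 V^a_1) / (V^b_1 + V^a_1)$ | Short-term fair value |
| Depth ratio (levels 1-5) | $\sum_{i=1}^{5} V^b_i \,/\, \sum_{i=1}^{5} V^a_i$ | Multi-level supply/demand |
| Book pressure | $\sum_i V^b_i\, e^{-\lambda \lvert P^b_i - M \rvert} - \sum_i V^a_i\, e^{-\lambda \lvert P^a_i - M \rvert}$ | Distance-weighted pressure |
| Spread / tick ratio | $S / \text{tick size}$ | Tightness relative to minimum |
Here $V^b_i$ and $V^a_i$ are the bid and ask volumes at level $i$, $P^b_i$ and $P^a_i$ the corresponding prices, and $M$ the midprice. Book pressure uses an absolute distance-to-mid decay, $e^{-\lambda \lvert P_i - M \rvert}$, so volume sitting near the touch counts more than deep volume, and both sides are weighted by a positive, decreasing function. This avoids the structural sign bias of dividing volume by the signed distance (which is negative on the bid side, positive on the ask side, and blows up as a level approaches the mid). Pick $\lambda$ from the typical book depth, or replace the exponential with any positive weight that decreases in distance.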
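As a sketch of the distance-weighted book pressure described above: the function name `book_pressure`, the `decay` parameter, and the final normalization to [-1, 1] are assumptions added for illustration. The exponential weight on absolute distance to mid is the part taken from the text.

```python
import numpy as np


def book_pressure(bid_px, bid_vol, ask_px, ask_vol, decay):
    """Distance-weighted bid-minus-ask pressure, normalized to [-1, 1].

    Level 1 (the touch) is assumed to be the first element of each array.
    """
    bid_px = np.asarray(bid_px, dtype=float)
    bid_vol = np.asarray(bid_vol, dtype=float)
    ask_px = np.asarray(ask_px, dtype=float)
    ask_vol = np.asarray(ask_vol, dtype=float)

    mid = 0.5 * (bid_px[0] + ask_px[0])
    # Positive weights that decay with ABSOLUTE distance to mid on both sides
    w_bid = np.exp(-decay * np.abs(bid_px - mid))
    w_ask = np.exp(-decay * np.abs(ask_px - mid))

    bid_side = np.sum(w_bid * bid_vol)
    ask_side = np.sum(w_ask * ask_vol)
    return (bid_side - ask_side) / (bid_side + ask_side + 1e-12)


bid_px, ask_px = [99.9, 99.8], [100.1, 100.2]
# Same volume on both sides at the same distances: no pressure
balanced = book_pressure(bid_px, [5, 10], ask_px, [5, 10], decay=10.0)
# Same total volume, but bids sit near the touch and asks sit deep
near_bid = book_pressure(bid_px, [10, 1], ask_px, [1, 10], decay=10.0)
```

A symmetric book gives zero, and moving volume toward the touch on the bid side gives a positive value even though total depth is unchanged.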
Trade Flow Features (Adverse Selection)
| Feature | Formula | Proxies For |
|---|---|---|
| Trade imbalance | $\sum_i d_i v_i \,/\, \sum_i v_i$ | Net informed flow |
| VPIN | Volume-synchronized probability of informed trading | Toxicity |
| Kyle's lambda | Regression of $\Delta P_t$ on signed volume | Price impact per unit |
| Large-trade frequency | Count of trades above a size threshold in window | Institutional activity |
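Kyle's lambda in the table is a regression slope, which can be sketched as a plain OLS estimate over one window. The function name `kyle_lambda` and the synthetic data are assumptions; a production version would typically run this over rolling windows.

```python
import numpy as np


def kyle_lambda(price_changes, signed_volume):
    """OLS slope of price changes on signed volume (price impact per unit)."""
    dp = np.asarray(price_changes, dtype=float)
    q = np.asarray(signed_volume, dtype=float)
    q_centered = q - q.mean()
    denom = np.dot(q_centered, q_centered)
    if denom == 0.0:
        return float("nan")  # no variation in flow, slope is undefined
    return float(np.dot(q_centered, dp - dp.mean()) / denom)


# Synthetic window: impact of 0.002 per unit of signed volume plus noise
rng = np.random.default_rng(1)
signed_volume = rng.normal(0.0, 10.0, size=5_000)
price_changes = 0.002 * signed_volume + rng.normal(0.0, 0.01, size=5_000)
lam = kyle_lambda(price_changes, signed_volume)
```

A larger slope means each unit of signed flow moves the price more, which is the sense in which it proxies for adverse selection.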
Volatility Features (Inventory Cost)
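| Feature | Formula | Proxies For |
|---|---|---|
| Realized volatility | $\sqrt{\sum_{i=1}^{n} r_{t-i}^2}$ | Short-term risk |
| Garman-Klass vol | $\sqrt{\tfrac{1}{2}\,(\ln(H/L))^2 - (2\ln 2 - 1)\,(\ln(C/O))^2}$ | Range-based vol |
| Vol-of-vol | Rolling std of $\sigma_t$ | Regime uncertainty |
| Return autocorrelation | $\operatorname{Corr}(r_t, r_{t-1})$ | Momentum / mean-reversion |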
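The Garman-Klass row can be sketched as follows, using the standard per-bar formula and averaging the variance across bars before taking the square root. The function name and the sample bars are assumptions for illustration.

```python
import numpy as np


def garman_klass_vol(open_, high, low, close):
    """Garman-Klass volatility per bar from OHLC arrays (not annualized)."""
    o, h, l, c = (np.asarray(x, dtype=float) for x in (open_, high, low, close))
    range_term = 0.5 * np.log(h / l) ** 2
    drift_term = (2.0 * np.log(2.0) - 1.0) * np.log(c / o) ** 2
    var_per_bar = range_term - drift_term
    # Average variance across bars; guard against a slightly negative sample mean
    return float(np.sqrt(max(var_per_bar.mean(), 0.0)))


# A flat bar has zero range and zero volatility
flat = garman_klass_vol([100.0], [100.0], [100.0], [100.0])
# A bar with a 2% log range and an unchanged close
ranged = garman_klass_vol(
    [100.0], [100.0 * np.exp(0.01)], [100.0 * np.exp(-0.01)], [100.0]
)
```

Because it uses the high-low range, this estimator extracts more information per bar than close-to-close returns, which matters when the feature window is short.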
Market Regime Features
| Feature | Description | Proxies For |
|---|---|---|
| Time-of-day encoding | $\sin(2\pi t / 86400)$, $\cos(2\pi t / 86400)$ with $t$ in seconds since midnight | Intraday seasonality |
| Seconds since last trade | Time gap | Activity level |
| Cross-asset correlation | Rolling corr with index/BTC | Systematic risk |
| Funding rate (crypto) | Perp funding rate | Leveraged positioning |
Gradient Boosting for Spread Prediction
Gradient boosted trees (XGBoost, LightGBM, CatBoost) are the workhorse of tabular prediction in quantitative finance. They handle mixed feature types, capture nonlinear interactions, require minimal preprocessing, and train fast on millions of rows, provided the feature build itself is vectorized (as with the rolling return autocorrelation in the pipeline below, which uses a built-in rolling correlation instead of a per-window Python callback).
Problem Formulation
We frame spread prediction as a regression task. The target is the time-weighted average quoted spread over the next $\Delta$ seconds:
$$y_t = \frac{1}{\Delta}\int_{t}^{t+\Delta} S(u)\, du$$
In practice, we approximate this with the equal-weighted average spread over the next $h$ snapshots:
$$y_t \approx \frac{1}{h}\sum_{k=1}^{h} S_{t+k}$$
This target is a forward window of length $h$ (the `horizon` argument in the code), which means adjacent rows share overlapping future windows. That overlap leaks information across a naive train/validation split, so we handle it explicitly in the training code below.
Full Pipeline
import lightgbm as lgb
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build spread-prediction features from L2 order book snapshots."""
    f = pd.DataFrame(index=df.index)

    # Top-of-book imbalance and 5-level depth imbalance
    f["imb1"] = (df["bid_vol_1"] - df["ask_vol_1"]) / (
        df["bid_vol_1"] + df["ask_vol_1"] + 1e-9
    )
    bid_depth = df[[f"bid_vol_{i}" for i in range(1, 6)]].sum(axis=1)
    ask_depth = df[[f"ask_vol_{i}" for i in range(1, 6)]].sum(axis=1)
    f["depth_imb5"] = (bid_depth - ask_depth) / (bid_depth + ask_depth + 1e-9)

    # Current spread, in bps and as a log-transformed price-unit spread
    mid = 0.5 * (df["ask_1"] + df["bid_1"])
    f["spread_bps"] = (df["ask_1"] - df["bid_1"]) / mid * 1e4
    f["log_spread"] = np.log1p(df["ask_1"] - df["bid_1"])

    # Realized volatility of mid log-returns
    log_ret = np.log(mid / mid.shift(1))
    f["rvol_50"] = log_ret.rolling(50).std()
    f["rvol_200"] = log_ret.rolling(200).std()

    # Trade flow imbalance (only when trade columns are present)
    if "trade_sign" in df.columns and "trade_vol" in df.columns:
        signed_vol = df["trade_sign"] * df["trade_vol"]
        total_vol = df["trade_vol"].rolling(50).sum()
        f["tfi_50"] = signed_vol.rolling(50).sum() / (total_vol + 1e-9)

    # Lag-1 return autocorrelation via a vectorized rolling correlation
    lag1 = log_ret.shift(1)
    f["ret_autocorr"] = log_ret.rolling(100).corr(lag1)

    # Cyclical time-of-day encoding
    if isinstance(df.index, pd.DatetimeIndex):
        seconds = df.index.hour * 3600 + df.index.minute * 60 + df.index.second
        f["tod_sin"] = np.sin(2 * np.pi * seconds / 86400)
        f["tod_cos"] = np.cos(2 * np.pi * seconds / 86400)

    # Lagged spreads
    for lag in [1, 5, 10, 50]:
        f[f"spread_lag_{lag}"] = f["spread_bps"].shift(lag)

    return f.dropna()


def build_target(df: pd.DataFrame, horizon: int = 10) -> pd.Series:
    """Forward mean spread over the next `horizon` snapshots (in bps).

    target[t] = mean(spread_bps[t+1 .. t+horizon]). Note that consecutive
    targets share an overlapping forward window of length `horizon`, which
    is why the CV below purges a gap of `horizon` rows around each fold.
    """
    mid = 0.5 * (df["ask_1"] + df["bid_1"])
    spread_bps = (df["ask_1"] - df["bid_1"]) / mid * 1e4
    fwd = spread_bps.shift(-1).rolling(horizon).mean().shift(-(horizon - 1))
    return fwd


def purged_walk_forward(n: int, n_splits: int, horizon: int):
    """Expanding-window splits with a purge/embargo gap of `horizon` rows.

    Because each target spans `horizon` future snapshots, rows straddling a
    train/val boundary share overlapping target windows. Dropping a gap of
    `horizon` rows between train and validation removes that leakage
    (Lopez de Prado-style purging). Without it, validation R²/MAE are
    optimistically biased by the target overlap.
    """
    fold_size = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = fold_size * k
        val_start = train_end + horizon  # embargo gap
        val_end = val_start + fold_size
        if val_end > n:
            break
        train_idx = np.arange(0, train_end - horizon)  # purge gap
        val_idx = np.arange(val_start, val_end)
        yield train_idx, val_idx


def train_spread_model(features: pd.DataFrame, target: pd.Series, horizon: int = 10):
    """Train LightGBM with purged, embargoed walk-forward validation."""
    common = features.index.intersection(target.dropna().index)
    X = features.loc[common].reset_index(drop=True)
    y = target.loc[common].reset_index(drop=True)

    models, scores = [], []
    params = {
        "objective": "mae",
        "learning_rate": 0.05,
        "num_leaves": 63,
        "min_child_samples": 100,
        "subsample": 0.8,
        "subsample_freq": 1,  # row subsampling is ignored unless the frequency is > 0
        "colsample_bytree": 0.8,
        "reg_alpha": 0.1,
        "reg_lambda": 1.0,
        "verbose": -1,
    }

    for fold, (train_idx, val_idx) in enumerate(
        purged_walk_forward(len(X), n_splits=5, horizon=horizon)
    ):
        X_tr, X_val = X.iloc[train_idx], X.iloc[val_idx]
        y_tr, y_val = y.iloc[train_idx], y.iloc[val_idx]
        ds_tr = lgb.Dataset(X_tr, y_tr)
        ds_val = lgb.Dataset(X_val, y_val, reference=ds_tr)

        model = lgb.train(
            params,
            ds_tr,
            num_boost_round=2000,
            valid_sets=[ds_val],
            callbacks=[lgb.early_stopping(50), lgb.log_evaluation(200)],
        )

        preds = model.predict(X_val)
        mae = mean_absolute_error(y_val, preds)
        r2 = r2_score(y_val, preds)
        print(f"Fold {fold}: MAE={mae:.4f} bps, R²={r2:.4f}")
        models.append(model)
        scores.append({"mae": mae, "r2": r2})

    # The last fold's model has seen the most data
    return models[-1], scores
The crucial detail is the purge/embargo gap. The forward-mean target means consecutive rows overlap by up to horizon snapshots, so a plain TimeSeriesSplit lets validation rows share future windows with training rows — leaking the answer and inflating validation R². Dropping a gap of at least horizon rows on both sides of each fold boundary (Lopez de Prado-style purged k-fold) removes that bias. This applies just as much to the gradient-boosting pipeline as to deep learning, even though the leakage is more commonly discussed for sequence models.
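One way to convince yourself that a split is clean is to test directly whether any training target window shares a snapshot with any validation target window. The check below is a sketch with assumed names (`target_windows_overlap`) and toy index ranges, and it assumes row positions correspond to consecutive snapshots.

```python
import numpy as np


def target_windows_overlap(train_idx, val_idx, horizon):
    """True if any training row's forward window [t+1, t+horizon] shares a
    snapshot with any validation row's forward window."""
    offsets = np.arange(1, horizon + 1)
    train_future = (np.asarray(train_idx)[:, None] + offsets).ravel()
    val_future = (np.asarray(val_idx)[:, None] + offsets).ravel()
    return np.intersect1d(train_future, val_future).size > 0


horizon = 10

# Naive split: validation starts right where training ends
naive_train, naive_val = np.arange(0, 500), np.arange(500, 600)

# Purged split: drop `horizon` rows on each side of the boundary
purged_train = np.arange(0, 500 - horizon)
purged_val = np.arange(500 + horizon, 600)

naive_leaks = target_windows_overlap(naive_train, naive_val, horizon)
purged_leaks = target_windows_overlap(purged_train, purged_val, horizon)
```

The naive split reports an overlap because the last training rows and the first validation rows average over nearly the same future snapshots. The purged split does not.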
Feature Importance Analysis
One of the key advantages of tree-based models is interpretability. After training, inspect SHAP values to understand which features drive spread predictions:
import shap
# `model` is the booster returned by train_spread_model; `X_val` is a held-out feature frame
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_val)
shap.summary_plot(shap_values, X_val, max_display=15)
Typical findings across asset classes:
- Lagged spread ($S_{t-k}$, the `spread_lag_*` features) is almost always the most important feature, because spreads are highly autocorrelated. This is also why headline R² looks high: much of the score is just persistence, so always benchmark against an AR/EWMA baseline (more on this below).
- Realized volatility is the second most important — intraday volatility and spreads are strongly positively correlated, both contemporaneously and dynamically.
- Book imbalance matters most during volatile periods — it signals imminent directional moves.
- Trade flow imbalance captures short-term adverse selection — a burst of one-sided flow predicts spread widening.
- Time-of-day captures the U-shaped intraday pattern (wider at open/close, tighter midday).
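Since persistence dominates, a skill score against an EWMA forecast is a useful companion to raw MAE. The sketch below uses a one-step-ahead target for brevity (the pipeline above uses a forward mean over `horizon` snapshots), and the names `ewma_forecast`, `mae_skill`, and the smoothing factor `alpha` are assumptions.

```python
import numpy as np


def ewma_forecast(spread, alpha=0.1):
    """One-step-ahead EWMA forecast: forecast[t] uses only spread[:t]."""
    spread = np.asarray(spread, dtype=float)
    forecast = np.full_like(spread, np.nan)
    level = spread[0]
    for t in range(1, len(spread)):
        forecast[t] = level  # built from observations strictly before t
        level = alpha * spread[t] + (1.0 - alpha) * level
    return forecast


def mae_skill(y_true, y_model, y_baseline):
    """1 - MAE_model / MAE_baseline; positive means the model beats the baseline."""
    y_true, y_model, y_baseline = (
        np.asarray(a, dtype=float) for a in (y_true, y_model, y_baseline)
    )
    ok = ~(np.isnan(y_true) | np.isnan(y_model) | np.isnan(y_baseline))
    mae_model = np.mean(np.abs(y_true[ok] - y_model[ok]))
    mae_base = np.mean(np.abs(y_true[ok] - y_baseline[ok]))
    return float(1.0 - mae_model / mae_base)


spread = np.array([1.0, 2.0, 3.0, 2.0, 4.0])
baseline = ewma_forecast(spread, alpha=0.5)
perfect_skill = mae_skill(spread, spread, baseline)   # model equals the truth
no_skill = mae_skill(spread, baseline, baseline)      # model equals the baseline
```

A skill near zero means the model is mostly restating recent spreads, however high its raw R² looks.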
Hyperparameter Considerations
For spread prediction specifically:
- Use MAE or Huber loss rather than MSE. Spread distributions are right-skewed with occasional extreme outliers (during news events). MAE is more robust.
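- Set `min_child_samples` high (100+) to prevent the model from fitting to microstructure noise in individual snapshots.
- Use `subsample < 1.0` together with a positive `subsample_freq` to decorrelate trees and improve generalization across different volatility regimes. LightGBM ignores `subsample` while the bagging frequency is left at 0.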
Deep Learning Approaches
While gradient boosting excels on tabular features, deep learning can learn representations directly from raw order book data. Two architectures have proven effective for spread-related prediction tasks.
Architecture 1: CNN-LSTM for Order Book Snapshots
The DeepLOB architecture (Zhang et al. 2019) uses stacked small-kernel convolutions — and an Inception module — to extract spatial patterns across order book levels while preserving that spatial structure, followed by LSTM layers to model temporal dependencies. The important design choice is not to global-pool the level axis away before the recurrent layer: doing so collapses exactly the cross-level structure the convolutions are meant to capture.
For spread prediction, the input is a tensor of shape $(T, L, F)$:
- $T$ = number of time steps (e.g., 100 snapshots)
- $L$ = number of price levels (e.g., 10 bid + 10 ask = 20)
- $F$ = features per level (price, volume, order count)
The model below keeps the convolutional feature map over levels and flattens it into the LSTM input (input_size = 16 * L), rather than averaging the level dimension into 16 channel means:
import torch
import torch.nn as nn


class SpreadPredictor(nn.Module):
    """
    CNN-LSTM model for bid-ask spread prediction from L2 order book.

    Input: (batch, seq_len, n_levels, n_features)
    Output: (batch, 1), the predicted spread in bps
    """

    def __init__(
        self,
        n_levels: int = 20,
        n_features: int = 3,
        seq_len: int = 100,
        hidden_dim: int = 64,
        n_lstm_layers: int = 2,
        dropout: float = 0.2,
    ):
        super().__init__()
        self.seq_len = seq_len
        self.n_levels = n_levels

        # Convolutions run along the level axis; padding keeps its length at n_levels
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.BatchNorm1d(32),
            nn.LeakyReLU(0.1),
            nn.Conv1d(32, 16, kernel_size=3, padding=1),
            nn.BatchNorm1d(16),
            nn.LeakyReLU(0.1),
        )
        conv_out_dim = 16 * n_levels  # flattened (channels × levels)

        self.lstm = nn.LSTM(
            input_size=conv_out_dim,
            hidden_size=hidden_dim,
            num_layers=n_lstm_layers,
            batch_first=True,
            dropout=dropout,
        )
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, 32),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(32, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Parameters
        ----------
        x : Tensor of shape (batch, seq_len, n_levels, n_features)

        Returns
        -------
        Tensor of shape (batch, 1), the predicted spread
        """
        batch, T, L, F = x.shape
        # Fold time into the batch axis and put features on the channel axis
        x = x.reshape(batch * T, L, F).permute(0, 2, 1)  # (batch * T, F, L)
        x = self.conv(x)                                 # (batch * T, 16, L)
        x = x.reshape(batch, T, 16 * L)                  # (batch, T, 16 * L), levels kept
        lstm_out, _ = self.lstm(x)                       # (batch, T, hidden_dim)
        last_hidden = lstm_out[:, -1, :]                 # (batch, hidden_dim)
        return self.head(last_hidden)                    # (batch, 1)
If you do want pooling on the level axis to control parameter count, use a strided or learned pooling that retains more than a single position — not AdaptiveAvgPool1d(1), which averages every level into one number and throws away the spatial signal.
Architecture 2: Transformer Encoder
Transformers can capture long-range dependencies in order book sequences without the sequential bottleneck of LSTMs. For spread prediction, a lightweight transformer encoder works well:
class TransformerSpreadPredictor(nn.Module):
    """Transformer encoder for spread prediction from order book sequences."""

    def __init__(
        self,
        input_dim: int = 40,  # 20 levels * 2 features (price_offset, volume)
        d_model: int = 64,
        nhead: int = 4,
        n_layers: int = 3,
        seq_len: int = 100,
        dropout: float = 0.1,
    ):
        super().__init__()
        self.input_proj = nn.Linear(input_dim, d_model)
        # Learned positional encoding, one vector per time step
        self.pos_encoding = nn.Parameter(torch.randn(1, seq_len, d_model) * 0.02)

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=nhead,
            dim_feedforward=d_model * 4,
            dropout=dropout,
            batch_first=True,
            activation="gelu",
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        x: (batch, seq_len, input_dim), flattened order book snapshots
        """
        x = self.input_proj(x) + self.pos_encoding[:, : x.size(1), :]
        x = self.encoder(x)
        # Predict from the representation of the most recent time step
        return self.head(x[:, -1, :])
Training Considerations
- Normalization: Normalize prices as offsets from the midprice (in ticks or bps). Normalize volumes by their rolling mean. Raw prices and volumes cause training instability.
- Loss function: Use Huber loss with threshold $\delta$ on the prediction error $e$ to handle spread spikes:
  $$L_\delta(e) = \begin{cases} \tfrac{1}{2} e^2 & \text{if } \lvert e \rvert \le \delta \\ \delta\left(\lvert e \rvert - \tfrac{1}{2}\delta\right) & \text{otherwise} \end{cases}$$
- Window sampling and leakage: Use non-overlapping windows for training and, exactly as in the gradient-boosting pipeline, purge/embargo a gap of at least `horizon` snapshots between train and validation. Both the forward-mean target and overlapping input windows leak future information across split boundaries and inflate apparent performance.
- Online adaptation: In production, periodically fine-tune the model on recent data (last 1-2 hours) with a small learning rate. Market microstructure changes intraday, and a model trained on morning data may underperform in the afternoon.
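To make the Huber recommendation concrete, here is a NumPy sketch of the loss. The default `delta=1.0` is an assumption for illustration, not a value from the post. In a PyTorch training loop the built-in `torch.nn.HuberLoss` plays the same role.

```python
import numpy as np


def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return float(np.mean(np.where(abs_err <= delta, quadratic, linear)))


small_error = huber_loss([0.0], [0.5], delta=1.0)  # inside the quadratic zone
spike_error = huber_loss([0.0], [3.0], delta=1.0)  # inside the linear zone
```

A squared loss would charge 4.5 for the 3-unit miss, whereas Huber charges 2.5, so a handful of spread spikes pulls the fit around far less.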
When to Use Deep Learning vs. Gradient Boosting
| Criterion | Gradient Boosting | Deep Learning |
|---|---|---|
| Input type | Tabular features | Raw order book sequences |
| Training data size | Works with 100K+ rows | Needs 1M+ rows |
| Feature engineering | Manual (high effort, high control) | Learned (lower effort, less interpretable) |
| Inference latency | Single-digit µs with a compiled predictor; tens of µs from Python | Hundreds of µs on GPU |
| Interpretability | High (SHAP) | Low (attention maps) |
| Regime adaptation | Retrain / online update | Fine-tune on recent data |
| Short-horizon spread skill | Broadly comparable to DL | Edge grows on longer horizons / larger data |
We deliberately avoid quoting specific R² figures: spread forecast accuracy depends heavily on the horizon, the asset, and how much of the score is simply spread autocorrelation. A model can post an impressive raw R² while adding almost nothing over a one-line EWMA. Report skill above an AR/EWMA baseline on the same data, with the horizon stated, rather than a headline R². Likewise, treat latency numbers as implementation-dependent: a 2000-round, 63-leaf LightGBM model predicts a single row in tens of microseconds from Python and only reaches a few microseconds with a compiled/C++ predictor.
In practice, many production systems use a two-stage approach: a fast gradient boosting model for real-time quoting (latency-critical), and a deep learning model running asynchronously to adjust the boosting model's parameters or provide a secondary signal.
From Prediction to Quoting

A spread prediction is only valuable if it translates into better quotes. Here is a simplified quoting rule that uses the predicted spread:
def compute_quotes(
    mid: float,
    predicted_spread_bps: float,
    inventory: float,
    max_inventory: float,
    skew_factor: float = 0.5,
    min_spread_bps: float = 1.0,
) -> tuple[float, float]:
    """
    Compute bid/ask quotes from predicted spread and inventory.

    Parameters
    ----------
    mid : float
        Current midprice.
    predicted_spread_bps : float
        Model-predicted spread in basis points.
    inventory : float
        Current inventory (positive = long).
    max_inventory : float
        Maximum allowed inventory.
    skew_factor : float
        How aggressively to skew quotes toward inventory neutrality.
    min_spread_bps : float
        Minimum spread floor (covers order processing costs).

    Returns
    -------
    (bid, ask) : tuple[float, float]
    """
    spread_bps = max(predicted_spread_bps, min_spread_bps)
    half_spread = mid * spread_bps / 2e4  # bps -> price units, then halve

    # Clamp so the ratio really stays in [-1, 1] even if a limit is breached
    inv_ratio = max(-1.0, min(1.0, inventory / max_inventory))
    skew = skew_factor * inv_ratio * half_spread

    # A long position shifts both quotes down; a short position shifts both up
    bid = mid - half_spread - skew
    ask = mid + half_spread - skew
    return bid, ask
When inventory is long (`inventory > 0`), the skew lowers both the bid and the ask. From a single, consistent perspective: a lower ask makes it cheaper for takers to buy from us, which offloads our long inventory; a lower bid makes us less likely to be hit by sellers, slowing further accumulation. The predicted spread controls the overall width, widening when the model expects volatility or adverse selection and narrowing when conditions are calm.
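A worked example of the quoting rule, using illustrative inputs that are not taken from the post: a midprice of 100, a predicted spread of 10 bps, inventory at half of `max_inventory`, and the default `skew_factor` of 0.5.

```latex
\begin{aligned}
\text{half\_spread} &= 100 \times \frac{10}{2 \times 10^{4}} = 0.05 \\
\text{skew} &= 0.5 \times 0.5 \times 0.05 = 0.0125 \\
\text{bid} &= 100 - 0.05 - 0.0125 = 99.9375 \\
\text{ask} &= 100 + 0.05 - 0.0125 = 100.0375
\end{aligned}
```

The quoted width is still 0.10 (10 bps), but the whole quote pair sits 0.0125 below the mid, which favors selling the long position over adding to it.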
Evaluation and Backtesting

Spread Prediction Metrics
Beyond standard regression metrics (MAE, $R^2$), evaluate spread predictions with metrics that matter for market making:
- Skill over a baseline: Always report MAE/R² relative to an AR(1) or EWMA forecast of recent spreads. Because spreads are strongly persistent, the absolute score is dominated by autocorrelation; only the improvement over a trivial baseline reflects real predictive content.
- Directional accuracy: Does the model correctly predict whether the spread will widen or narrow? A model with mediocre MAE but high directional accuracy can still be profitable.
- Tail coverage: Does the model predict spread spikes? Compute MAE separately for the top 5% of spread values — this is where adverse selection losses concentrate.
- Calibration: Plot predicted vs. realized spread quantiles. A well-calibrated model's 90th percentile prediction should match the 90th percentile of realized spreads.
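Two of the metrics above, directional accuracy and tail MAE, can be sketched in a few lines. The function names, the choice to measure direction relative to the current spread, and the toy arrays are assumptions for illustration.

```python
import numpy as np


def directional_accuracy(current, realized, predicted):
    """Share of rows where the predicted move (wider / narrower than the
    current spread) matches the realized move. Rows with no realized move
    are skipped."""
    current, realized, predicted = (
        np.asarray(a, dtype=float) for a in (current, realized, predicted)
    )
    realized_dir = np.sign(realized - current)
    predicted_dir = np.sign(predicted - current)
    moved = realized_dir != 0
    if not moved.any():
        return float("nan")
    return float(np.mean(realized_dir[moved] == predicted_dir[moved]))


def tail_mae(realized, predicted, quantile=0.95):
    """MAE restricted to rows whose realized spread is at or above a quantile."""
    realized = np.asarray(realized, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    tail = realized >= np.quantile(realized, quantile)
    return float(np.mean(np.abs(realized[tail] - predicted[tail])))


acc = directional_accuracy(
    current=[5.0, 5.0, 5.0, 5.0],
    realized=[6.0, 4.0, 7.0, 5.0],
    predicted=[5.5, 4.5, 4.0, 6.0],
)
spike_mae = tail_mae([1.0, 2.0, 3.0, 100.0], [1.0, 2.0, 3.0, 40.0], quantile=0.75)
```

In the toy data the overall MAE is dominated by one spike that the model underestimates. `tail_mae` isolates exactly that row, which is the behavior you want to see before trusting a model during news events.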
PnL-Based Evaluation
Ultimately, the only metric that matters is realized PnL. Backtest the full loop:
- At each timestamp, predict the spread
- Compute quotes using the predicted spread + inventory skew
- Simulate fills against historical trades
- Track inventory, realized PnL, and Sharpe ratio
Compare against baselines: (a) constant spread (the time-series median), (b) EWMA of recent spreads, and (c) the Roll estimator.
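The loop above can be sketched as a deliberately simplified simulator. Everything here is an assumption made for illustration: one quote pair per historical trade, a fill whenever a trade prints at or through our quote, unit fill size, and no fees, latency, queue position, or inventory skew. A realistic backtest needs all of those.

```python
import numpy as np


def backtest(mids, trade_prices, trade_sides, spreads_bps, qty=1.0):
    """Mark-to-market equity curve for symmetric quotes around the mid.

    trade_sides: +1 for a taker buy (can lift our ask), -1 for a taker sell
    (can hit our bid). Quotes use the mid observed just before each trade.
    """
    cash, inventory = 0.0, 0.0
    equity = []
    for mid, px, side, s_bps in zip(mids, trade_prices, trade_sides, spreads_bps):
        half = mid * s_bps / 2e4
        bid, ask = mid - half, mid + half
        if side < 0 and px <= bid:    # taker sold at or through our bid: we buy
            cash -= bid * qty
            inventory += qty
        elif side > 0 and px >= ask:  # taker bought at or through our ask: we sell
            cash += ask * qty
            inventory -= qty
        equity.append(cash + inventory * mid)
    return np.asarray(equity), inventory


# One full round trip at a constant mid with a 10 bps quoted spread
equity, final_inventory = backtest(
    mids=[100.0, 100.0],
    trade_prices=[99.90, 100.10],
    trade_sides=[-1, +1],
    spreads_bps=[10.0, 10.0],
)
```

With a constant mid the round trip earns the full quoted spread of 0.10. Swapping `spreads_bps` between the model forecast and each baseline, on the same trade tape, gives a like-for-like PnL comparison.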
Conclusion
Spread modeling sits at the intersection of financial theory and applied ML. The classical decomposition into order processing, inventory, and adverse selection costs provides the economic intuition for why spreads vary. Roll's model gives an elegant baseline estimator from minimal data — as long as you compute it in price units. Gradient boosting models turn microstructure features into accurate short-horizon spread forecasts with low-latency inference. Deep learning architectures learn directly from raw order book data, capturing patterns that handcrafted features may miss — provided the architecture preserves cross-level structure rather than pooling it away.
For a production market-making system, the practical recommendation is layered:
- Use the Huang-Stoll decomposition offline to understand your spread components and calibrate risk limits
- Use Roll's estimator as a sanity check and for instruments where you lack order book data
- Deploy a LightGBM model for real-time spread prediction — it is fast, interpretable, and robust — with purged walk-forward validation and an AR/EWMA benchmark
- Run a CNN-LSTM or Transformer model in a secondary loop to detect regime changes and adjust the primary model
The spread is not a number — it is a signal. The better you model it (and the more honestly you measure that model), the more precisely you can price liquidity provision.
This post is part of the marketmaker.cc series on algorithmic market making and microstructure.
Authors
Trading-systems engineer
Trading-systems engineer building bots since 2017: cross-exchange arbitrage (connected up to 30 venues), cointegration-based pairs arbitrage across spot and futures, scalping, news and sentiment-driven strategies, trend algorithms, and portfolio management and balancing algorithms. Also builds sub-millisecond order execution, big-data warehouses, backtesting engines, AI agents, and trading interfaces (incl. open-source profitmaker.cc). Stack: JS/TS, Python, Rust/Zig/Go, DevOps, backend, frontend, architecture.