You ran a strategy through a backtest. You got PnL +42%, Sharpe 1.8, MaxDD -12%. The results look great. You launch the bot in production, and a month later you discover that the drawdown is already -28% and PnL is heading toward zero.
What went wrong? It is not a bug and not "a changed market." The issue is that you made a decision based on a single number — a single-point estimate. You learned that the strategy showed +42%, but you did not learn how much you can trust that number.
The Problem with Single-Point Estimates
A backtest on historical data is one run through one specific sequence of market events. The result depends on the order of trades: the same strategy with the same trades, but in a different order, can show an entirely different maximum drawdown.
Imagine 491 trades. Each trade is a random event with a certain return distribution. The historical backtest shows only one realization of this process. It is like rolling a die once and concluding that the die always lands on four.
What we actually need:
Not a point estimate, but an interval: "with 95% probability, the final PnL will be between X and Y"
Not a single maximum drawdown, but a distribution: "in the 5% worst scenarios, the drawdown exceeds Z%"
Not the mean, but the tails: what happens if luck is not on your side?
This is exactly what Monte Carlo bootstrap is for.
What Is Monte Carlo Bootstrap
Bootstrap is a resampling method proposed by Bradley Efron in 1979. The idea is elegant: if we have a data sample, we can generate thousands of "new" samples by randomly selecting elements from the original with replacement.
In the context of a backtest, it works like this:
You have an array of returns for each trade — for example, 491 values
You randomly select 491 values from this array with replacement — some trades will appear twice, some will not appear at all
You build an equity curve from this new sample
You repeat 10,000 times
You get a distribution of final metrics, not a single number
Each iteration is one "alternative scenario": what could have happened if the order and set of trades had been slightly different.
Implementation in 10 Lines
Here is a complete working implementation:
import numpy as np

def max_drawdown(equity_curve):
    """Calculate the maximum drawdown of an equity curve."""
    peak = np.maximum.accumulate(equity_curve)
    drawdown = (equity_curve - peak) / peak
    return drawdown.min()

trade_returns = [...]  # 491 values, e.g. [0.012, -0.005, 0.008, ...]

n_simulations = 10000
results = []
for _ in range(n_simulations):
    sampled = np.random.choice(trade_returns, size=len(trade_returns), replace=True)
    equity = np.cumprod(1 + sampled)
    results.append({
        "final_pnl": equity[-1] - 1,
        "max_dd": max_drawdown(equity),
        # sqrt(252) assumes roughly one trade per day; adjust to your trade frequency
        "sharpe": np.mean(sampled) / np.std(sampled) * np.sqrt(252)
    })
Execution time: ~2 seconds on a regular laptop. 10,000 alternative histories of your strategy.
Extracting Confidence Intervals
Now we have not one number, but a distribution. Here is how to extract useful information from it:
A fan chart provides an intuitive understanding of the spread of possible outcomes. A narrow fan means the strategy is stable. A wide fan means the result heavily depends on "luck" with the sequence of trades.
The fan chart (left) shows the spread of possible equity trajectories, and the histogram (right) shows the density distribution of final returns with highlighted confidence intervals (5%, 50%, 95%).
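The confidence intervals behind the histogram take only a few `np.percentile` calls. A minimal sketch: it assumes `results` is the list of per-simulation dicts built by the loop above (here a synthetic stand-in is generated so the snippet runs on its own).

```python
import numpy as np

# Stand-in for the `results` list produced by the bootstrap loop above
rng = np.random.default_rng(42)
results = [{"final_pnl": r, "max_dd": -abs(d)}
           for r, d in zip(rng.normal(0.4, 0.2, 10_000),
                           rng.normal(0.12, 0.04, 10_000))]

pnl = np.array([r["final_pnl"] for r in results])
dd = np.array([r["max_dd"] for r in results])

# 90% confidence interval: the 5th and 95th percentiles of the distribution
pnl_lo, pnl_med, pnl_hi = np.percentile(pnl, [5, 50, 95])
dd_worst = np.percentile(dd, 5)  # drawdown in the 5% worst scenarios

print(f"PnL 90% CI: [{pnl_lo:.1%}, {pnl_hi:.1%}], median {pnl_med:.1%}")
print(f"MaxDD, 5th percentile: {dd_worst:.1%}")
```

The same percentile arrays, computed at each point in time along the equity curves, are what a fan chart plots.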
Advanced Analysis: Probability of Ruin
Bootstrap allows you to answer a critical question: what is the probability that the strategy will lose X% of capital?
These metrics are impossible to obtain from a single backtest run. Yet they are critical for making the decision to launch a strategy.
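One way to estimate the probability of ruin is to count, across bootstrap iterations, how often the equity curve ever falls past a chosen loss threshold. A sketch under that assumption — the `ruin_level` of -30% is an illustrative value, not a recommendation:

```python
import numpy as np

def probability_of_ruin(trade_returns, ruin_level=-0.30,
                        n_simulations=10_000, seed=0):
    """Estimate the probability that the equity curve ever draws down
    past `ruin_level` (e.g. -0.30 = losing 30% from a peak).
    The threshold is an assumption — set it to your own risk limit."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(trade_returns)
    ruined = 0
    for _ in range(n_simulations):
        sampled = rng.choice(returns, size=len(returns), replace=True)
        equity = np.cumprod(1 + sampled)
        peak = np.maximum.accumulate(equity)
        if ((equity - peak) / peak).min() <= ruin_level:
            ruined += 1
    return ruined / n_simulations

# Toy example with synthetic per-trade returns
demo_returns = np.random.default_rng(1).normal(0.002, 0.02, 491)
print(f"P(ruin) = {probability_of_ruin(demo_returns):.2%}")
```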
For more on why deep drawdowns are mathematically dangerous and how return asymmetry works, read our article Loss-Profit Asymmetry.
When Classical Bootstrap Does Not Work
The method has limitations that are important to know.
Autocorrelation of Returns
Classical bootstrap assumes that trades are independent. In reality, this is often not the case — a strategy can have winning and losing streaks. If autocorrelation is significant, use block bootstrap:
import numpy as np
import pandas as pd

def block_bootstrap(returns, block_size=10, n_simulations=10000):
    """Bootstrap preserving local dependency structure."""
    # `returns` should be a NumPy array; max_drawdown is defined above
    n = len(returns)
    results = []
    for _ in range(n_simulations):
        starts = np.random.randint(0, n - block_size + 1, size=n // block_size + 1)
        sampled = np.concatenate([returns[s:s+block_size] for s in starts])[:n]
        equity = np.cumprod(1 + sampled)
        results.append({
            "final_pnl": equity[-1] - 1,
            "max_dd": max_drawdown(equity),
        })
    return pd.DataFrame(results)
Block bootstrap preserves local dependencies between consecutive trades, providing more realistic confidence intervals for MaxDD.
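To decide whether block bootstrap is needed at all, you can first estimate the lag-1 autocorrelation of trade returns. A minimal sketch; the 2/√n significance threshold is the usual rule of thumb for i.i.d. data, an assumption rather than a hard rule:

```python
import numpy as np

def lag1_autocorrelation(returns):
    """Sample autocorrelation of trade returns at lag 1."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()
    return float(np.dot(r[:-1], r[1:]) / np.dot(r, r))

# For i.i.d. returns, |rho| > 2/sqrt(n) is roughly significant at the 95% level
demo = np.random.default_rng(7).normal(0.001, 0.01, 491)
rho = lag1_autocorrelation(demo)
threshold = 2 / np.sqrt(len(demo))
print(f"lag-1 rho = {rho:.3f}, significance threshold = {threshold:.3f}")
```

If `rho` clears the threshold, prefer block bootstrap; otherwise the classical version is adequate.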
Market Non-Stationarity
Bootstrap works with the original trade distribution. If the market has structurally changed (e.g., volatility dropped or liquidity changed), historical trades may be unrepresentative. To account for this:
Use a rolling window: bootstrap only on the last N trades
Weight recent trades more heavily: weighted bootstrap
Split data by market regimes and bootstrap separately
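The second option can be sketched as a weighted bootstrap in which sampling probabilities decay exponentially with trade age. The `half_life` parameter (in trades) is an assumption to tune, not a canonical value:

```python
import numpy as np

def weighted_bootstrap(returns, half_life=100, n_simulations=10_000, seed=0):
    """Bootstrap that samples recent trades more often.
    `half_life` controls how fast old trades are down-weighted."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns)
    n = len(returns)
    age = np.arange(n)[::-1]             # 0 = most recent trade
    weights = 0.5 ** (age / half_life)   # exponential decay with age
    weights /= weights.sum()             # probabilities must sum to 1
    finals = np.empty(n_simulations)
    for i in range(n_simulations):
        sampled = rng.choice(returns, size=n, replace=True, p=weights)
        finals[i] = np.cumprod(1 + sampled)[-1] - 1
    return finals

demo = np.random.default_rng(3).normal(0.001, 0.01, 491)
finals = weighted_bootstrap(demo, n_simulations=2000)
print(f"weighted PnL 90% CI: [{np.percentile(finals, 5):.1%}, "
      f"{np.percentile(finals, 95):.1%}]")
```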
Small Number of Trades
Bootstrap is reliable when n > 30 trades. If you have 10 trades — no amount of resampling will help. 491 trades is an excellent sample; you can trust the results.
Comparison of Approaches to Backtest Robustness Assessment

| Method | What it provides | Complexity | Time | When to use |
| --- | --- | --- | --- | --- |
| Single backtest | One point estimate | Minimal | Seconds | Never as a final result |
| Walk-forward | Out-of-sample metrics | Medium | Minutes | To check for overfitting |
| Monte Carlo bootstrap | Confidence intervals | Minimal | ~2 sec | Always before production |
| Monte Carlo path | New price paths | High | Minutes-hours | For stress testing |
| Cross-validation | Average metrics across folds | Medium | Minutes | For parameter tuning |
Of these methods, Monte Carlo bootstrap is the only one that delivers a full distribution of outcomes at near-zero computational cost.
Checklist: Interpreting Results
Here is how we recommend interpreting Monte Carlo bootstrap results:
Launch in production if:
PnL at the 5th percentile is positive
MaxDD at the 5th percentile is acceptable for your risk appetite
Probability of ruin < 1%
Sharpe at the 5th percentile > 0.5
Needs work if:
PnL at the 5th percentile is near zero
MaxDD at the 5th percentile is significantly worse than at the 50th
Wide fan chart spread — the strategy is unstable
Do not launch if:
PnL at the 5th percentile is negative
Probability of ruin > 5%
Confidence interval for Sharpe includes 0
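The checklist above maps directly onto the bootstrap distributions. A sketch of that mapping — `dd_limit` stands in for your own risk appetite, and the thresholds mirror the checklist as conventions, not laws:

```python
import numpy as np

def bootstrap_verdict(pnl, max_dd, sharpe, p_ruin, dd_limit=-0.25):
    """Apply the checklist to arrays of per-simulation metrics.
    `dd_limit` (here -25%) is a placeholder for your risk appetite."""
    pnl5 = np.percentile(pnl, 5)
    dd5 = np.percentile(max_dd, 5)
    sharpe5, sharpe95 = np.percentile(sharpe, [5, 95])
    # Hard stops: negative 5th-percentile PnL, ruin too likely,
    # or the Sharpe confidence interval straddles zero
    if pnl5 < 0 or p_ruin > 0.05 or sharpe5 <= 0 <= sharpe95:
        return "do not launch"
    # All launch criteria satisfied simultaneously
    if dd5 >= dd_limit and p_ruin < 0.01 and sharpe5 > 0.5:
        return "launch"
    return "needs work"

# Synthetic example: a comfortably profitable bootstrap distribution
rng = np.random.default_rng(5)
pnl = rng.normal(0.4, 0.1, 10_000)
dd = rng.normal(-0.12, 0.02, 10_000)
sharpe = rng.normal(1.8, 0.3, 10_000)
print(bootstrap_verdict(pnl, dd, sharpe, p_ruin=0.002))
```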
Our Experience at marketmaker.cc
At marketmaker.cc, we develop our own backtest engine, and Monte Carlo bootstrap is an integral part of our pipeline. Every strategy goes through bootstrap automatically before being approved for live trading.
We integrated bootstrap directly into the backtest engine: after a run, you get not just the final PnL, but a complete report with confidence intervals, fan chart, probability of ruin, and a comparison of block vs. standard bootstrap. This takes an additional 2-3 seconds — a negligible price for understanding real risks.
From our experience: approximately 30% of strategies that look attractive by single-point estimate are filtered out after Monte Carlo bootstrap. Their 5th percentile PnL goes negative or MaxDD turns out to be unacceptable. Without bootstrap, these strategies would have gone to production and would have very likely resulted in losses.
Conclusion
Monte Carlo bootstrap is ~10 lines of code and ~2 seconds of computation. It transforms a single number from a backtest into a full distribution with confidence intervals. Few tools in quantitative analysis offer a better return on effort:
Minimal cost: implementation in 30 minutes
Maximum payoff: understanding of real strategy risks
No dependencies: only NumPy
If you are not yet using bootstrap — add it to your pipeline today. It is the only way to know how much you can trust your backtest results.
@software{soloviov2026montecarlobootstrap,
author = {Soloviov, Eugen},
title = {Monte Carlo Bootstrap: How to Get Confidence Intervals for a Backtest in 10 Lines of Code},
year = {2026},
url = {https://marketmaker.cc/ru/blog/post/monte-carlo-bootstrap-backtest},
version = {0.1.0},
description = {Why a single-point estimate from a backtest is a dangerous illusion. How Monte Carlo bootstrap in 2 seconds of computation gives you a 95\% confidence interval for PnL and MaxDD, and why this is a mandatory step before launching a strategy in production.}
}