You optimized a strategy. 12 separation parameters, 9 meta-parameters — 21 in total. A backtest over 25 months on a single pair shows PnL +3342% at MaxLev. The equity curve rises with almost no drawdowns. Sharpe above 3. Everything looks perfect.
You launch the bot. Two weeks later, the strategy loses 18% of capital. A month later — 34%. The parameters that "worked" on historical data turned out to be fitted to a specific sequence of market events. You didn't find a pattern — you memorized noise.
This is classic overfitting. And the only systematic way to detect it before going into production is Walk-Forward Optimization (WFO).
The Single Train/Test Split Trap
The standard approach: split data into 70% train and 30% test. Optimize on train, verify on test. If the result is positive — launch.
The problem: this is one test on one split. The result depends on where you draw the boundary. Shift the boundary by a month — and the out-of-sample PnL can change from +40% to -15%.
Three different splits — three different conclusions. Which one to trust? None of them. A single train/test split is the same single-point estimate whose problems we described in Monte Carlo Bootstrap. You need not one check, but a systematic series of checks on consecutive data segments.
This is exactly what Walk-Forward Optimization exists for.
What Is Walk-Forward Optimization
WFO is a procedure of sequential optimization and verification of a strategy on sliding (or expanding) data windows. The idea: simulate the real trading process where you periodically re-optimize parameters on available data and then trade until the next re-optimization.
Each "window" consists of two parts:
In-Sample (IS) — the period on which parameters are optimized
Out-of-Sample (OOS) — the period on which the found parameters are tested without fitting
The key property: OOS periods do not overlap and collectively cover a significant portion of the data. The resulting equity curve is built only from OOS segments — this is the honest evaluation of the strategy.
Anchored WFO (Expanding Window)
In anchored WFO, the start of the train period is fixed, and its end expands with each window:
Window 1: Train [2024-01] → Test [2024-04]
Window 2: Train [2024-01..04] → Test [2024-07] (growing train)
Window 3: Train [2024-01..07] → Test [2024-10]
Window 4: Train [2024-01..10] → Test [2025-01]
Window 5: Train [2024-01..2025-01] → Test [2025-04]
Advantages:
Each subsequent train period contains more data — optimization is more stable
Early patterns are not lost — they are always in the training set
Easier to implement
Disadvantages:
Old data may "dilute" current patterns
If the market has structurally changed — old data is harmful
Train period grows indefinitely, increasing optimization time
Rolling WFO (Sliding Window)
In rolling WFO, a fixed-length train period "slides" across the data:
Window 1: Train [2024-01..06] → Test [2024-07..09]
Window 2: Train [2024-04..09] → Test [2024-10..12]
Window 3: Train [2024-07..12] → Test [2025-01..03]
Window 4: Train [2024-10..2025-03] → Test [2025-04..06]
Window 5: Train [2025-01..06] → Test [2025-07..09]
Advantages:
Adapts to the current market regime
Constant optimization time
Old, irrelevant data does not affect results
Disadvantages:
Less data for training — higher variance of optimal parameters
Sensitive to window length selection
May "forget" rare but important events (flash crashes)
Combinatorial Purged Cross-Validation (CPCV)
An advanced method proposed by Marcos Lopez de Prado. Data is split into N groups, from which k are selected for testing. The key difference from standard cross-validation is purging (removing data at the train/test boundary) and embargo (an additional gap to prevent data leakage):
Number of combinations = C(N, k) = N! / (k! (N − k)!)
With N = 10, k = 2: 45 train/test combinations. Each combination produces an OOS result, and the final estimate is the average across all combinations.
from itertools import combinations

def cpcv_splits(n_groups: int, k_test: int):
    """
    Generate CPCV splits with purging.

    Args:
        n_groups: number of groups
        k_test: number of test groups in each split
    """
    groups = list(range(n_groups))
    splits = []
    for test_groups in combinations(groups, k_test):
        train_groups = [g for g in groups if g not in test_groups]
        splits.append({
            "train": train_groups,
            "test": list(test_groups),
            # the caller must drop these groups from train before optimizing
            "purge_groups": _get_purge_groups(train_groups, test_groups),
        })
    return splits

def _get_purge_groups(train, test):
    """Groups at the train/test boundary for purging."""
    purge = set()
    for t in test:
        if t - 1 in train:
            purge.add(t - 1)
        if t + 1 in train:
            purge.add(t + 1)
    return list(purge)
CPCV is better than rolling WFO when data is scarce, but computationally more expensive. For a strategy with 21 parameters and 25 months of data, we recommend starting with rolling WFO and using CPCV as an additional check.
Key WFO Parameters
Train Period Length
Too short a train — insufficient data for reliable optimization. Too long — old data dilutes current patterns.
Rule of thumb: the train should contain at least 200-300 trades. If the strategy makes 2 trades per day:
T_min = 300 trades / (2 trades/day) = 150 days ≈ 5 months
For crypto with its regime switches, we recommend no more than 6-12 months for the rolling window.
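The rule of thumb reduces to a one-line calculation. The helper below is a sketch; the defaults (`min_trades`, `trades_per_day`) are illustrative, not calibrated values:

```python
import math

def min_train_days(min_trades: int = 300, trades_per_day: float = 2.0) -> int:
    """Minimum train-period length in days to observe `min_trades` trades."""
    return math.ceil(min_trades / trades_per_day)

# 300 trades at 2 trades/day -> 150 days, roughly 5 months
```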
Test Period Length
The test period must be sufficient for a statistically significant evaluation, but not too long — otherwise parameters have time to degrade.
Rule: test = 20-33% of train. If train = 6 months, test = 1.5-2 months.
Overlap
In rolling WFO, windows can overlap. Overlap increases the number of OOS data points but introduces correlation between estimates:
Without overlap:
Train [01..06] → Test [07..09]
Train [07..12] → Test [01..03]
With 50% overlap:
Train [01..06] → Test [07..09]
Train [04..09] → Test [10..12]
Train [07..12] → Test [01..03]
Recommendation: 50% overlap on the train period — a good balance between the number of windows and independence of estimates.
Re-optimization Frequency
Determines how often you recalculate parameters. In the crypto market, the optimal frequency is every 1-3 months. More frequent re-optimization increases the risk of overfitting to noise; less frequent — the risk of parameter staleness.
Walk-Forward Efficiency Ratio and Degradation Rate
Walk-Forward Efficiency Ratio (WFER)
The key WFO metric — the ratio of OOS returns to IS returns:
WFER = PnL_OOS / PnL_IS
Interpretation:
WFER          Interpretation
> 0.8         Excellent robustness. Parameters transfer to new data.
0.5 — 0.8     Acceptable robustness. Strategy works but with degradation.
0.3 — 0.5     Borderline case. Partial overfitting is likely.
< 0.3         Overfitting. Parameters are fitted to IS data.
< 0           Strategy is unprofitable OOS. Complete overfitting or logic error.
If WFER < 0.5 — the strategy is most likely overfit. This is our primary filter.
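The table translates directly into a filter function. The boundary handling (closed vs open intervals) is my assumption, since the table gives open ranges:

```python
def classify_wfer(value: float) -> str:
    """Map a WFER value to the verdict bands from the table above.
    Boundary treatment (>= vs >) is an assumption, not specified in the table."""
    if value < 0:
        return "unprofitable OOS"
    if value < 0.3:
        return "overfit"
    if value < 0.5:
        return "borderline"
    if value <= 0.8:
        return "acceptable"
    return "excellent"
```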
Degradation Rate
Shows how quickly optimal parameters lose effectiveness over time:
Degradation rate = d(OOS PnL) / dt
In practice: divide the test period into sub-intervals and track PnL dynamics:
def degradation_rate(oos_returns: np.ndarray, n_subperiods: int = 4) -> float:
    """
    Estimate parameter degradation rate.

    Splits the OOS period into sub-intervals and computes the slope
    of a linear regression of PnL against sub-interval number.

    Returns:
        slope: negative = degradation, positive = improvement
    """
    chunk_size = len(oos_returns) // n_subperiods
    subperiod_pnls = []
    for i in range(n_subperiods):
        start = i * chunk_size
        end = start + chunk_size
        sub_pnl = np.sum(oos_returns[start:end])
        subperiod_pnls.append(sub_pnl)
    x = np.arange(n_subperiods)
    slope = np.polyfit(x, subperiod_pnls, 1)[0]
    return slope
If the degradation rate is strongly negative — parameters become stale quickly, and you need more frequent re-optimization or a shorter train period.
Full WFO Pipeline Implementation in Python
import numpy as np
import pandas as pd
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class WFOWindow:
    """A single walk-forward window."""
    window_id: int
    train_start: int          # train start index
    train_end: int            # train end index (exclusive)
    test_start: int           # test start index
    test_end: int             # test end index (exclusive)
    best_params: dict = field(default_factory=dict)
    is_pnl: float = 0.0       # in-sample PnL
    oos_pnl: float = 0.0      # out-of-sample PnL
    oos_returns: np.ndarray = field(default_factory=lambda: np.array([]))
    wfer: float = 0.0         # walk-forward efficiency ratio

@dataclass
class WFOResult:
    """Result of the entire WFO."""
    windows: List[WFOWindow]
    aggregate_oos_pnl: float
    aggregate_is_pnl: float
    wfer: float
    degradation_rate: float
    oos_equity: np.ndarray
    oos_sharpe: float
    oos_max_dd: float
    n_windows: int
    passed: bool              # whether the strategy passed the filter

class WalkForwardOptimizer:
    """
    Walk-Forward Optimization pipeline.

    Supports anchored (expanding) and rolling (sliding) modes.
    """

    def __init__(
        self,
        data: np.ndarray,
        optimize_fn: Callable,
        evaluate_fn: Callable,
        mode: str = "rolling",          # "rolling" or "anchored"
        train_size: int = 180,          # days
        test_size: int = 60,            # days
        step_size: int = 60,            # window step size, days
        min_trades: int = 30,           # min number of trades in OOS
        wfer_threshold: float = 0.5,    # WFER threshold for accept/reject
    ):
        self.data = data
        self.optimize_fn = optimize_fn
        self.evaluate_fn = evaluate_fn
        self.mode = mode
        self.train_size = train_size
        self.test_size = test_size
        self.step_size = step_size
        self.min_trades = min_trades
        self.wfer_threshold = wfer_threshold

    def generate_windows(self) -> List[WFOWindow]:
        """Generate walk-forward windows."""
        n = len(self.data)
        windows = []
        window_id = 0
        if self.mode == "rolling":
            start = 0
            while start + self.train_size + self.test_size <= n:
                w = WFOWindow(
                    window_id=window_id,
                    train_start=start,
                    train_end=start + self.train_size,
                    test_start=start + self.train_size,
                    test_end=min(start + self.train_size + self.test_size, n),
                )
                windows.append(w)
                start += self.step_size
                window_id += 1
        elif self.mode == "anchored":
            train_end = self.train_size
            while train_end + self.test_size <= n:
                w = WFOWindow(
                    window_id=window_id,
                    train_start=0,
                    train_end=train_end,
                    test_start=train_end,
                    test_end=min(train_end + self.test_size, n),
                )
                windows.append(w)
                train_end += self.step_size
                window_id += 1
        return windows

    def run(self) -> WFOResult:
        """Run the full WFO pipeline."""
        windows = self.generate_windows()
        all_oos_returns = []
        for w in windows:
            train_data = self.data[w.train_start:w.train_end]
            test_data = self.data[w.test_start:w.test_end]
            best_params, is_pnl = self.optimize_fn(train_data)
            w.best_params = best_params
            w.is_pnl = is_pnl
            oos_pnl, oos_returns = self.evaluate_fn(test_data, best_params)
            w.oos_pnl = oos_pnl
            w.oos_returns = oos_returns
            w.wfer = oos_pnl / is_pnl if is_pnl != 0 else 0.0
            all_oos_returns.extend(oos_returns)
        all_oos = np.array(all_oos_returns)
        oos_equity = np.cumprod(1 + all_oos)
        peak = np.maximum.accumulate(oos_equity)
        max_dd = ((oos_equity - peak) / peak).min()
        aggregate_is = sum(w.is_pnl for w in windows)
        aggregate_oos = sum(w.oos_pnl for w in windows)
        wfer = aggregate_oos / aggregate_is if aggregate_is != 0 else 0.0
        if np.std(all_oos) > 0:
            oos_sharpe = np.mean(all_oos) / np.std(all_oos) * np.sqrt(252)
        else:
            oos_sharpe = 0.0
        deg_rate = self._degradation_rate(windows)
        passed = wfer >= self.wfer_threshold and aggregate_oos > 0
        return WFOResult(
            windows=windows,
            aggregate_oos_pnl=aggregate_oos,
            aggregate_is_pnl=aggregate_is,
            wfer=wfer,
            degradation_rate=deg_rate,
            oos_equity=oos_equity,
            oos_sharpe=oos_sharpe,
            oos_max_dd=max_dd,
            n_windows=len(windows),
            passed=passed,
        )

    def _degradation_rate(self, windows: List[WFOWindow]) -> float:
        """Slope of OOS PnL across window numbers."""
        if len(windows) < 3:
            return 0.0
        pnls = [w.oos_pnl for w in windows]
        x = np.arange(len(pnls))
        slope = np.polyfit(x, pnls, 1)[0]
        return slope
Usage Example
import numpy as np
import pandas as pd

np.random.seed(42)
prices = 100 * np.cumprod(1 + np.random.normal(0.0002, 0.02, 750))
def my_optimize(train_data):
    """
    Optimize strategy on train data.

    Returns (best_params, is_pnl).
    """
    best_pnl = -np.inf
    best_params = {}
    for fast in range(5, 30, 5):
        for slow in range(20, 100, 10):
            if fast >= slow:
                continue
            pnl, _ = _run_strategy(train_data, fast, slow)
            if pnl > best_pnl:
                best_pnl = pnl
                best_params = {"fast": fast, "slow": slow}
    return best_params, best_pnl

def my_evaluate(test_data, params):
    """
    Evaluate strategy on test data with fixed parameters.

    Returns (oos_pnl, oos_returns).
    """
    pnl, returns = _run_strategy(test_data, params["fast"], params["slow"])
    return pnl, returns

def _run_strategy(data, fast_period, slow_period):
    """Simple MA crossover strategy."""
    fast_ma = pd.Series(data).rolling(fast_period).mean().values
    slow_ma = pd.Series(data).rolling(slow_period).mean().values
    position = 0
    returns = []
    for i in range(slow_period, len(data) - 1):
        if fast_ma[i] > slow_ma[i] and position <= 0:
            position = 1
        elif fast_ma[i] < slow_ma[i] and position >= 0:
            position = -1
        daily_ret = (data[i + 1] - data[i]) / data[i]
        returns.append(position * daily_ret)
    total_pnl = np.sum(returns)
    return total_pnl, np.array(returns)
wfo = WalkForwardOptimizer(
data=prices,
optimize_fn=my_optimize,
evaluate_fn=my_evaluate,
mode="rolling",
train_size=180, # 6 months
test_size=60, # 2 months
step_size=60, # step = test
)
result = wfo.run()
print(f"Windows: {result.n_windows}")
print(f"OOS PnL: {result.aggregate_oos_pnl:.4f}")
print(f"IS PnL: {result.aggregate_is_pnl:.4f}")
print(f"WFER: {result.wfer:.3f}")
print(f"OOS Sharpe: {result.oos_sharpe:.2f}")
print(f"OOS MaxDD: {result.oos_max_dd:.2%}")
print(f"Degradation: {result.degradation_rate:.5f}")
print(f"Passed: {result.passed}")
for w in result.windows:
    print(f"  Window {w.window_id}: IS={w.is_pnl:.4f} OOS={w.oos_pnl:.4f} "
          f"WFER={w.wfer:.2f} params={w.best_params}")
Interpreting Results: When to Trust, When to Reject
Strategy Passed WFO
A strategy passes when WFER >= 0.5 in every window and OOS PnL is positive and stable:
Parameters are similar between windows (fast = 10-15, slow = 50-60)
OOS PnL is positive in most windows
Degradation rate is close to zero
Strategy Failed WFO
Window 0: IS=0.2341 OOS=-0.0312 WFER=-0.13 params={'fast': 5, 'slow': 95}
Window 1: IS=0.1987 OOS=0.0089 WFER=0.04 params={'fast': 25, 'slow': 30}
Window 2: IS=0.2156 OOS=-0.0567 WFER=-0.26 params={'fast': 10, 'slow': 90}
Window 3: IS=0.1834 OOS=0.0234 WFER=0.13 params={'fast': 20, 'slow': 40}
→ Aggregate WFER: -0.07, IS is high, OOS is near zero → overfitting
Signs of overfitting:
High IS PnL, low/negative OOS PnL — classic overfitting
Parameters vary significantly between windows — no stable optimum
WFER < 0.3 in most windows — parameters don't transfer
Degradation rate is strongly negative — rapid degradation
More on parameter stability analysis — in the article Plateau analysis. If the optimum is "sharp" (drops steeply with small parameter changes) — this is an additional overfitting signal.
WFO Specifics for Cryptocurrencies
Cryptocurrencies create unique problems for WFO that don't exist in traditional markets.
Regime Switches
The crypto market switches between radically different regimes: bull trend, bear trend, sideways with high/low volatility. Parameters optimal in one regime can be unprofitable in another.
Solution: use rolling WFO (not anchored) with a 4-6 month window. This allows "forgetting" old regimes. Additionally — cluster data by volatility and run WFO separately for each cluster.
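One minimal way to implement the volatility-clustering step is quantile bucketing of rolling volatility (k-means or HMM regime models are heavier alternatives). The function name `volatility_regimes` and the defaults here are illustrative:

```python
import numpy as np
import pandas as pd

def volatility_regimes(returns: np.ndarray, window: int = 30,
                       n_regimes: int = 3) -> np.ndarray:
    """Label each bar with a volatility regime (0 = calm ... n_regimes-1 = turbulent)
    by bucketing rolling std into quantile bins. A deliberately simple sketch."""
    vol = pd.Series(returns).rolling(window).std().bfill()
    # interior quantile edges, e.g. [1/3, 2/3] for 3 regimes
    edges = np.quantile(vol, np.linspace(0, 1, n_regimes + 1)[1:-1])
    return np.digitize(vol, edges)
```

WFO can then be run separately on the bars of each regime label, or the labels can gate which windows are comparable.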
Short History
Most altcoins have less than 3 years of trading history. With train = 6 months and test = 2 months, you'll get only 4-5 windows — a statistically weak estimate.
Solution: use CPCV instead of or in addition to rolling WFO. CPCV generates more combinations from the same data. For 10 groups and k=2: 45 combinations instead of 4-5 windows.
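The combination count is a plain binomial coefficient, easy to check with the standard library:

```python
from math import comb

def cpcv_n_splits(n_groups: int, k_test: int) -> int:
    """Number of train/test combinations CPCV generates: C(N, k)."""
    return comb(n_groups, k_test)

# 10 groups, k = 2 -> 45 combinations instead of 4-5 rolling windows
```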
Structural Liquidity Changes
Crypto pair liquidity is non-stationary: a pair can be liquid for 6 months, then volumes drop 10x. Parameters optimized on a liquid market don't work on an illiquid one.
Solution: add a liquidity filter to the WFO pipeline. Exclude windows where average daily volume is below a threshold. Verify that liquidity in the test period is comparable to the train period.
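A sketch of such a filter, assuming you have a daily volume series per window. The minimum-volume threshold and the train/test comparability ratio (`max_ratio`) are illustrative assumptions:

```python
import numpy as np

def liquidity_ok(train_volume: np.ndarray, test_volume: np.ndarray,
                 min_daily_volume: float, max_ratio: float = 3.0) -> bool:
    """Accept a WFO window only if (a) average daily volume in both periods
    clears a minimum threshold and (b) train and test liquidity are comparable."""
    train_avg = float(np.mean(train_volume))
    test_avg = float(np.mean(test_volume))
    if min(train_avg, test_avg) < min_daily_volume:
        return False
    # reject windows where liquidity shifted by more than max_ratio x
    ratio = max(train_avg, test_avg) / max(min(train_avg, test_avg), 1e-12)
    return ratio <= max_ratio
```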
Funding Rate Impact
For leveraged futures strategies, funding rates can fundamentally change OOS results. A strategy shows +5% OOS over 2 months, but at 10x leverage, funding eats 3.6%.
Detailed analysis of funding impact — in our article Funding rates kill your leverage. Be sure to account for funding costs when evaluating OOS PnL in WFO.
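As a sketch of that accounting, the helper below subtracts a crude funding estimate from gross OOS PnL. All names and defaults (`funding_rate_8h`, `time_in_position`) are illustrative assumptions; with roughly 20% time in position it reproduces the 3.6% figure above:

```python
def funding_adjusted_pnl(gross_pnl: float, months: float,
                         funding_rate_8h: float = 0.0001,
                         leverage: float = 10.0,
                         time_in_position: float = 0.2) -> float:
    """Subtract an estimated funding cost from OOS PnL.
    Assumes 3 funding payments per day (every 8h) and a flat average rate."""
    periods_in_position = months * 30 * 3 * time_in_position
    funding_cost = funding_rate_8h * leverage * periods_in_position
    return gross_pnl - funding_cost

# +5% gross over 2 months at 10x, ~20% time in position:
# funding eats 3.6%, leaving ~1.4%
```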
Multi-Parameter Strategies: Why WFO Is Critical with 12+ Parameters
A strategy with 21 parameters (12 separation + 9 meta) on 25 months of data from a single pair is a model with a colossal search space.
The Curse of Dimensionality
The number of parameter combinations grows exponentially with the number of parameters:
Combinations = |P_1| × |P_2| × ... × |P_n|
If each of the 21 parameters takes at least 10 values:
10^21 combinations, i.e. a sextillion.
Even with Bayesian optimization (details in Coordinate Descent vs Bayesian), you explore a negligible fraction of the space. The probability that the found optimum is a noise artifact rather than a real pattern grows with the number of parameters.
Bonferroni Formula for Multiple Comparisons
If you test M parameter combinations, the probability of a false "discovery" (finding a good result by chance):
P(false discovery) = 1 − (1 − α)^M ≈ 1 − e^(−αM)
At α=0.05 and M=10000 tried combinations:
P ≈ 1 − e^(−500) ≈ 1.0
You're guaranteed to find "working" parameters — that are actually fitted to noise. Without WFO, you have no way to distinguish a real edge from a statistical artifact.
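The arithmetic is worth verifying directly; a two-line check of the formula above:

```python
def p_false_discovery(alpha: float, n_trials: int) -> float:
    """Probability of at least one spurious 'significant' result
    among n_trials independent tests at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_trials

# alpha = 0.05, M = 10_000 -> probability indistinguishable from 1.0
```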
Rule: Number of OOS Data Points vs Number of Parameters
A rule of thumb for trusting WFO results:
OOS trades / Parameters > 10
For 21 parameters, you need at least 210 OOS trades. If your WFO generates fewer — the result cannot be trusted.
The strategy with +3342% PnL@ML: 21 parameters, 25 months of data. Suppose 5 OOS windows of 60 days at 2 trades/day: 5 × 60 × 2 = 600 OOS trades. The ratio 600 / 21 ≈ 28.6 is acceptable, but only if WFER > 0.5.
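The ratio check is trivial to automate; `oos_coverage_ok` is an illustrative helper, not an established function:

```python
def oos_coverage_ok(n_oos_trades: int, n_params: int,
                    min_ratio: float = 10.0) -> bool:
    """Rule-of-thumb check: demand at least `min_ratio` OOS trades per parameter."""
    return n_oos_trades / n_params >= min_ratio

# 600 OOS trades for 21 parameters -> ratio ~28.6, passes the rule of thumb
```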
Integrating WFO with Optuna
In each WFO window, you need to optimize parameters. For 21 parameters, grid search is impossible, coordinate descent is inefficient. The optimal choice is Bayesian optimization via Optuna.
Important: inside WFO, optimize Sharpe, not PnL. PnL optimization finds parameters that maximize profit on a specific sequence of trades. Sharpe optimization finds parameters with the best return-to-risk ratio — they are more robust OOS.
Visualizing WFO Results
import matplotlib.pyplot as plt

def plot_wfo_results(result: WFOResult, data: np.ndarray):
    """Visualize Walk-Forward Optimization results."""
    fig, axes = plt.subplots(3, 1, figsize=(16, 14))

    ax = axes[0]
    ax.plot(result.oos_equity, color='#4FC3F7', linewidth=1.5)
    ax.axhline(1.0, color='#FF5252', linestyle='--', alpha=0.5, label='Break-even')
    ax.set_title(f'OOS Equity Curve (WFER={result.wfer:.2f}, Sharpe={result.oos_sharpe:.2f})')
    ax.set_ylabel('Equity')
    ax.legend()
    ax.grid(True, alpha=0.3)

    ax = axes[1]
    wfers = [w.wfer for w in result.windows]
    colors = ['#69F0AE' if w >= 0.5 else '#FFAB40' if w >= 0.3 else '#FF5252'
              for w in wfers]
    ax.bar(range(len(wfers)), wfers, color=colors, edgecolor='#1A237E', alpha=0.8)
    ax.axhline(0.5, color='#E040FB', linestyle='--', label='Threshold (0.5)')
    ax.axhline(0, color='gray', linestyle='-', alpha=0.3)
    ax.set_title('Walk-Forward Efficiency Ratio by Window')
    ax.set_xlabel('Window')
    ax.set_ylabel('WFER')
    ax.legend()

    ax = axes[2]
    x = np.arange(len(result.windows))
    width = 0.35
    ax.bar(x - width / 2, [w.is_pnl for w in result.windows],
           width, label='IS PnL', color='#7C4DFF', alpha=0.7)
    ax.bar(x + width / 2, [w.oos_pnl for w in result.windows],
           width, label='OOS PnL', color='#4FC3F7', alpha=0.7)
    ax.set_title('In-Sample vs Out-of-Sample PnL')
    ax.set_xlabel('Window')
    ax.set_ylabel('PnL')
    ax.legend()
    ax.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('wfo_results.png', dpi=150)
    plt.show()
Practical Recommendations
Checklist Before Launching a Strategy in Production
1. Run WFO (rolling + anchored)
Compare results of both modes. If rolling WFO fails but anchored passes — most likely the strategy only works on early data.
2. Check WFER for each window
Not just aggregate WFER, but each window individually. If 2 out of 6 windows have WFER < 0 — that's a problem, even if aggregate > 0.5.
3. Compare parameters between windows
If optimal parameters "jump" from window to window — there is no stable edge. Use Plateau analysis to verify optimum stability.
4. Check degradation rate
A strongly negative degradation rate = parameters lose effectiveness quickly. You need more frequent re-optimization or a strategy overhaul.
5. Apply Monte Carlo bootstrap to OOS results
Aggregate OOS PnL is also a single-point estimate. Apply Monte Carlo bootstrap to the array of OOS returns to obtain confidence intervals.
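A minimal sketch of that bootstrap step, assuming simple i.i.d. resampling of OOS returns (a block bootstrap is the stricter variant for autocorrelated series); the function name and defaults are illustrative:

```python
import numpy as np

def bootstrap_pnl_ci(oos_returns: np.ndarray, n_boot: int = 5000,
                     percentile: float = 5.0, seed: int = 42) -> float:
    """Resample OOS returns with replacement and return the given percentile
    of total PnL, a lower confidence bound on the single-point estimate."""
    rng = np.random.default_rng(seed)
    n = len(oos_returns)
    totals = np.empty(n_boot)
    for i in range(n_boot):
        sample = rng.choice(oos_returns, size=n, replace=True)
        totals[i] = np.sum(sample)
    return float(np.percentile(totals, percentile))

# accept only if the 5th percentile of bootstrapped OOS PnL is above zero
```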
6. Account for costs
OOS PnL must include commissions, slippage, and funding rates. A nice OOS PnL without costs is an illusion. More details — Funding rates kill your leverage.
Minimum Data Requirements
Number of parameters    Min OOS trades    Min WFO windows    Min data (2 trades/day)
2-5                     50                3                  ~6 months
6-10                    100               4                  ~12 months
11-15                   150               5                  ~18 months
16-21                   210               6                  ~24 months
22+                     300+              8+                 ~36+ months
The Strategy with 21 Parameters and 25 Months of Data
Let's return to the question from the beginning of the article: 21 parameters optimized on 25 months of data from a single pair. PnL@ML = +3342%. How to validate?
Step 1. Rolling WFO: train = 8 months, test = 2 months, step = 2 months. We get ~8 windows.
Step 2. Anchored WFO: first train = 8 months, test = 2 months. We get ~8 windows.
Step 3. CPCV: 10 groups of ~2.5 months, k = 2. We get 45 combinations.
Step 4. For each method, verify:
WFER >= 0.5?
Parameters stable between windows?
Degradation rate acceptable?
OOS trades / Parameters >= 10?
Step 5. Monte Carlo bootstrap on aggregate OOS returns. 5th percentile PnL > 0?
If any of these tests fails — the strategy with +3342% is most likely overfit. 21 parameters on 25 months of a single pair — this is an extremely high parameter-to-data ratio. Without passing WFO, there can be no trust.
We additionally recommend checking strategy efficiency accounting for PnL by active time — this will reveal what portion of the +3342% is due to time in position versus actual edge.
Conclusion
Walk-Forward Optimization is not optional — it is a necessity. It is the only method that systematically verifies parameter transferability to new data. A single train/test split is a lottery. A full backtest on all data is self-deception.
Key takeaways:
WFER < 0.5 = overfitting. If out-of-sample PnL is less than half of in-sample — the parameters are fitted.
Parameter stability matters more than the maximum. Parameters that yield +15% in every window are better than parameters that yield +40% in one and -10% in another.
Rolling WFO for crypto. Regime switches make anchored WFO less reliable. A rolling window of 4-6 months is the optimal balance.
More parameters — stricter requirements. 21 parameters require at least 210 OOS trades and 6+ WFO windows. Without this, the result cannot be verified.
WFO + Monte Carlo bootstrap + Plateau analysis — three layers of overfitting protection. Each layer catches what the others miss.
A strategy that passes WFO with WFER > 0.5 across all windows, stable parameters, and a positive 5th-percentile bootstrap — that is a strategy you can trust with real money. Everything else is curve fitting with a pretty equity curve.
@article{soloviov2026walkforwardoptimization,
author = {Soloviov, Eugen},
title = {Walk-Forward Optimization: The Only Honest Strategy Test},
year = {2026},
url = {https://marketmaker.cc/en/blog/post/walk-forward-optimization},
version = {0.1.0},
description = {Why a single train/test split does not protect against overfitting, how walk-forward optimization systematically verifies parameter robustness, and why a strategy with +3342\% PnL@ML on 21 parameters is a ticking time bomb without WFO.}
}