Article 6 in the "Backtests Without Illusions" series
You ran study.optimize(), Optuna found a parameter set with PnL +87%. You're excited and preparing the strategy for production. Two weeks of live trading later, PnL is around zero. What happened?
The optimizer found the tip of a needle in parameter space. The parameters are perfectly fitted to the historical sequence of trades — but the slightest deviation in market conditions destroys the entire construct. This is classic overfitting, and it could have been detected before launch.
In the previous article we compared coordinate descent with Bayesian optimization and showed why Optuna finds the optimum more efficiently. Today — the next step: how to make sure the found optimum is robust, rather than the result of fitting to noise.
Why Finding the "Best" Parameters Is Only Half the Work
Strategy parameter optimization is a search for a maximum in a multidimensional space. The problem is that maximums come in two types:
Plateau — a wide flat region where PnL is consistently high across parameter variations. Even if market conditions shift the effective parameters by 10-20%, the strategy will continue to profit.
Sharp peak — a narrow summit where PnL is high only at the exact parameter value. A shift of one step collapses profitability. This is almost certainly overfitting: the optimizer found an artifact of historical data, not a stable pattern.
A mountaineering metaphor: a plateau is a mountain tableland you can walk across safely. A sharp peak is the tip of a needle where you can only balance.
Sharp Peak vs Flat Plateau — Visual Intuition
Imagine a contour map where the axes are two strategy parameters and the color represents PnL. Two patterns are easy to distinguish visually:
Plateau (robust optimum):
Wide areas of the same color
Smooth transitions between PnL levels
Isolines far apart
Shifting from the optimum by +/-20% changes PnL by no more than 10%
Imagine a heatmap: in the center — a bright yellow rectangle roughly one-third the size of the entire map. The color gradually transitions to orange, then red toward the edges. The optimum is not a point, but a region.
Sharp peak (overfitting):
A narrow bright spot surrounded by cold colors
Abrupt transitions: a collapse right next to the optimum
Isolines compressed into tight rings
Shifting by +/-5% drops PnL by 50% or more
Imagine the same heatmap, but in the center — a tiny yellow dot immediately surrounded by blue and purple. A single "correct" parameter combination.
Parameter Sensitivity Analysis
One-Dimensional Analysis: PnL vs One Parameter
The simplest approach — fix all parameters except one and see how PnL depends on its value. Optuna provides plot_slice for this:
import optuna
from optuna.visualization import plot_slice

# objective() is your backtest wrapper: it runs the strategy with the
# suggested parameters and returns the PnL to maximize
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=500)

fig = plot_slice(study, params=["htf_entry_sell", "ltf_momentum", "stop_loss_pct"])
fig.show()
What to look for on a slice plot:
Robust parameter: the point cloud forms a wide horizontal band near the optimum. The best trials are spread across a wide range of parameter values.
Fragile parameter: the best trials are concentrated in a narrow range. Shifting the parameter by one or two steps — and profitability collapses.
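This visual check can be roughly quantified. Below is a sketch of a helper (`top_decile_spread` is a hypothetical name, not an Optuna API) that measures how widely the best 10% of trials are spread across a parameter's explored range — values near 1 suggest a plateau, values near 0 a needle:

```python
import numpy as np

def top_decile_spread(param_values, pnl_values):
    """Relative spread of one parameter among the top-10% trials.

    Near 1.0: the best trials cover the whole explored range (plateau).
    Near 0.0: the best trials cluster in a narrow column (fragile peak).
    """
    p = np.asarray(param_values, dtype=float)
    y = np.asarray(pnl_values, dtype=float)
    n_top = max(1, len(y) // 10)
    top_idx = np.argsort(y)[-n_top:]       # indices of the best trials
    top_range = p[top_idx].max() - p[top_idx].min()
    full_range = p.max() - p.min()
    return top_range / full_range if full_range > 0 else 0.0
```

Applied to a study, `param_values` and `pnl_values` would come from the completed trials (`t.params[name]` and `t.values[0]`).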
A contour plot shows the interaction of two parameters simultaneously. This is the key tool for plateau analysis, because parameters rarely act independently — entry and exit thresholds, timeframes, and position sizes are interconnected.
from optuna.visualization import plot_contour
fig = plot_contour(study, params=["htf_entry_sell", "htf_exit_buy"])
fig.show()
A contour plot for a robust parameter pair looks like a topographic map of a hilly plain: smooth wide isolines, large areas of the same color. A contour plot for a fragile pair — like a map of a volcanic cone: tight concentric rings around a single point.
For a strategy with 12 separation parameters, this gives C(12, 2) = 66 pairwise contour plots. You don't have to study them all — start with the parameters that Optuna rated as most important.
Optuna can estimate each parameter's contribution to the objective function:
from optuna.visualization import plot_param_importances
fig = plot_param_importances(study)
fig.show()
The parameter importance chart is a horizontal histogram. Parameters are ranked by their contribution to PnL variance in descending order. The top 3-4 parameters usually explain 70-80% of the variance.
Rule: if a parameter explains less than 2% of PnL variance, its value is practically irrelevant to the result — it's robust by definition. Focus plateau analysis on the top-5 most important parameters.
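A small helper makes this rule mechanical. The sketch below (`split_by_importance` is a hypothetical name) takes the dict returned by `optuna.importance.get_param_importances(study)` and splits parameters into a "focus" group for plateau analysis and an "ignore" group:

```python
def split_by_importance(importances, min_share=0.02, top_k=5):
    """Split parameters into a focus group (top-k by importance) and an
    ignore group (explains < min_share of objective variance).

    importances: {param_name: share}, e.g. the output of
    optuna.importance.get_param_importances(study).
    """
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    focus = [name for name, share in ranked[:top_k] if share >= min_share]
    ignore = [name for name, share in ranked if share < min_share]
    return focus, ignore
```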
The result of plot_slice is a grid of scatter plots. Each subplot shows the objective function value (PnL, Y-axis) against a single parameter value (X-axis). Points are individual trials. For a robust parameter, the best points (highest PnL) are distributed across a wide range of X. For a fragile one — grouped in a narrow column.
plot_contour — Two-Dimensional Contours
from optuna.visualization import plot_contour

important_pairs = [
    ["htf_entry_sell", "htf_entry_buy"],
    ["htf_entry_sell", "stop_loss_pct"],
    ["ltf_momentum_threshold", "take_profit_pct"],
]

for params in important_pairs:
    fig = plot_contour(study, params=params)
    fig.update_layout(title=f"Contour: {params[0]} vs {params[1]}")
    fig.show()
Each contour plot is a heatmap with two parameters on the axes. Color encodes the average PnL in a given region of parameter space. Yellow/green — high PnL, blue/purple — low. Isolines connect points with the same PnL.
fANOVA (functional ANOVA) decomposes the variance of the objective function across parameters and their interactions. This is more powerful than simple correlation because it accounts for nonlinear effects.
Quantitative Plateau Metrics
Visual assessment is subjective. We need numbers. Here are three metrics that formalize the concept of a "plateau."
Sensitivity Ratio
The ratio of PnL change to parameter change:
S_i = (ΔPnL / PnL_opt) / (Δp_i / p_i,opt)

where ΔPnL is the PnL drop when parameter p_i deviates from the optimum by Δp_i.

Interpretation:

S_i < 0.5 — parameter is robust: a 10% shift causes less than a 5% PnL drop
0.5 ≤ S_i < 2.0 — moderate sensitivity
S_i ≥ 2.0 — parameter is fragile: a 10% shift crashes PnL by 20% or more
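In code this is a two-point estimate — a minimal sketch, assuming both the optimal PnL and the optimal parameter value are nonzero (`sensitivity_ratio` is a hypothetical helper, not part of Optuna):

```python
def sensitivity_ratio(pnl_opt, p_opt, pnl_shifted, p_shifted):
    """S = (dPnL / PnL_opt) / (dp / p_opt).

    S < 0.5 robust, 0.5-2.0 moderate, >= 2.0 fragile.
    """
    rel_pnl_drop = abs(pnl_opt - pnl_shifted) / abs(pnl_opt)
    rel_param_shift = abs(p_opt - p_shifted) / abs(p_opt)
    return rel_pnl_drop / rel_param_shift
```

With Strategy A's numbers from the case study below (PnL +55% at 0.020, +49% at 0.025), this gives S ≈ 0.44.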
Plateau Width
The width of the parameter region within which PnL stays within X% of the optimum:

W_i^rel(X%) = width{p_i : PnL(p_i) ≥ (1 − X/100) · PnL_opt} / (p_i,max − p_i,min)

where the numerator is the width of the region in which PnL stays within X% of the optimum, and the denominator is the full search range of the parameter.
Interpretation:
W_i^rel(10%) > 0.3 — the plateau covers more than 30% of the range at the 10% threshold. Robust parameter.
W_i^rel(10%) < 0.05 — the plateau is narrower than 5% of the range. Red flag.
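A direct way to estimate this from trial data — a sketch that approximates the plateau from scattered trial points and assumes the best PnL is positive (`plateau_width` is a hypothetical name):

```python
import numpy as np

def plateau_width(param_values, pnl_values, search_lo, search_hi,
                  threshold_pct=10.0):
    """Relative plateau width W_rel: span of parameter values whose PnL
    stays within threshold_pct of the best PnL, divided by the full
    search range. Assumes the best PnL is positive.
    """
    p = np.asarray(param_values, dtype=float)
    y = np.asarray(pnl_values, dtype=float)
    cutoff = y.max() * (1 - threshold_pct / 100.0)
    on_plateau = p[y >= cutoff]           # parameter values near the optimum
    if len(on_plateau) < 2:
        return 0.0
    return (on_plateau.max() - on_plateau.min()) / (search_hi - search_lo)
```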
Robustness Score
A combined metric across all parameters:
R = ∏_{i=1..k} (W_i^rel(10%))^{w_i}

where w_i is the normalized importance of parameter i from fANOVA (Σ w_i = 1).
The product of weighted widths is a strict metric: if even one important parameter has a narrow plateau, R will be low. Unimportant parameters (with small wi) have almost no effect.
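The formula is a weighted geometric mean of plateau widths. A minimal sketch of what a compute_robustness_score()-style helper (called in the report code later in the article) computes at its core — the function name and signature here are illustrative:

```python
def robustness_score(plateau_widths, importances):
    """R = prod_i (W_i_rel)^(w_i), with fANOVA importances normalized
    so the weights sum to 1.

    plateau_widths, importances: {param_name: value} over the same keys.
    """
    total = sum(importances.values())
    score = 1.0
    for name, w_rel in plateau_widths.items():
        w_i = importances[name] / total
        score *= max(w_rel, 1e-9) ** w_i   # guard against zero-width plateaus
    return score
```

One narrow plateau on a high-importance parameter drags the whole product down — which is exactly the intended strictness.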
Let's examine three strategies with 12 separation parameters. Each strategy underwent Optuna optimization with 500 trials.
Strategy A (~55% PnL, ~500 trades, ~15% time)
Strategy A's parameters form a wide plateau. Take the key parameter htf_entry_sell:
Optimal value: 0.020
PnL at 0.015: +51% (7% drop)
PnL at 0.025: +49% (11% drop)
PnL at 0.010: +43% (22% drop)
PnL at 0.030: +41% (25% drop)
If you imagine this as a one-dimensional plot (X-axis — htf_entry_sell value, Y-axis — PnL), you'll see a gentle parabola with a flat top. The range 0.010-0.030 is the plateau, where PnL stays within 25% of the optimum.
Sensitivity ratio: S = 0.11 / 0.25 = 0.44 — robust.
Plateau width at the 10% threshold: from 0.013 to 0.027, W_rel = 0.014 / 0.04 = 35%.
Strategy B (~25% PnL, ~40 trades, ~5% time)
Strategy B is optimized on a small number of trades. Parameter htf_entry_sell:
Optimal value: 0.018
PnL at 0.015: +24% (4% drop)
PnL at 0.025: +9% (64% drop)
PnL at 0.012: +11% (56% drop)
On the plot — an asymmetric and steep curve. The plateau exists only in the narrow range 0.015-0.020. To the right of the optimum — a cliff.
Sensitivity ratio: S = 0.64 / 0.39 = 1.64 — moderate sensitivity, but with 40 trades this is a red flag. Small sample + narrow plateau = high probability of overfitting.
Plateau width at the 10% threshold: from 0.016 to 0.020, W_rel = 0.004 / 0.04 = 10%.
Strategy C (~300% PnL, ~400 trades, ~45% time)
Strategy C shows stunning PnL, but plateau analysis reveals problems:
Optimal value of htf_entry_sell: 0.022
PnL at 0.020: +295% (2% drop)
PnL at 0.025: +142% (53% drop)
PnL at 0.019: +128% (57% drop)
On the plot — a characteristic "needle": a very high peak at 0.022, sharp drop in all directions. The contour plot would show a bright spot immediately surrounded by cold colors.
Sensitivity ratio: S = 0.53 / 0.14 = 3.79 — fragile. Despite 400 trades, the strategy is excessively dependent on the exact value of a single parameter.
Plateau width at the 10% threshold: from 0.021 to 0.023, W_rel = 0.002 / 0.04 = 5%.
Summary Table

| Strategy   | PnL   | Trades | Sensitivity | Plateau width | Robustness score | Verdict              |
|------------|-------|--------|-------------|---------------|------------------|----------------------|
| Strategy A | +55%  | ~500   | 0.44        | 35%           | 0.148            | Robust               |
| Strategy B | +25%  | ~40    | 1.64        | 10%           | 0.032            | Check (small sample) |
| Strategy C | +300% | ~400   | 3.79        | 5%            | 0.008            | Overfitting          |
Paradox: Strategy C with PnL +300% has the worst robustness score. Strategy A with a "modest" +55% is the most robust. This is a typical plateau analysis result: impressive numbers often mask fragility.
Confidence intervals for each strategy can additionally be verified through Monte Carlo bootstrap — it will show PnL spread when resampling trades.
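A minimal sketch of such a bootstrap — it resamples individual trade PnLs with replacement and assumes trades are roughly independent, which is a simplification for serial strategies (`bootstrap_pnl_ci` is a hypothetical helper):

```python
import numpy as np

def bootstrap_pnl_ci(trade_pnls, n_boot=10_000, ci=0.90, seed=42):
    """Percentile-bootstrap confidence interval for total PnL.

    Resamples trades with replacement; serial dependence between
    trades is ignored (a simplification).
    """
    rng = np.random.default_rng(seed)
    trades = np.asarray(trade_pnls, dtype=float)
    # n_boot resampled equity outcomes, each the sum of len(trades) draws
    totals = rng.choice(trades, size=(n_boot, len(trades))).sum(axis=1)
    lo, hi = np.percentile(totals, [(1 - ci) / 2 * 100, (1 + ci) / 2 * 100])
    return lo, hi
```

A wide interval that straddles zero is another red flag, even when the point estimate looks impressive.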
3D Visualization and Heatmaps
For the most important parameter pairs, it's useful to build a 3D surface and heatmap. This provides intuitive understanding of the landscape shape.
import numpy as np
import matplotlib.pyplot as plt
import optuna
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 — registers the 3D projection
from scipy.interpolate import griddata


def plot_parameter_landscape(
    study: "optuna.Study",
    param_x: str,
    param_y: str,
    grid_size: int = 50,
):
    """Build a 3D surface plot and heatmap for a pair of parameters."""
    trials = [t for t in study.trials
              if t.state == optuna.trial.TrialState.COMPLETE]
    x_vals = np.array([t.params[param_x] for t in trials])
    y_vals = np.array([t.params[param_y] for t in trials])
    z_vals = np.array([t.values[0] for t in trials])

    # Interpolate scattered trial points onto a regular grid
    xi = np.linspace(x_vals.min(), x_vals.max(), grid_size)
    yi = np.linspace(y_vals.min(), y_vals.max(), grid_size)
    Xi, Yi = np.meshgrid(xi, yi)
    Zi = griddata((x_vals, y_vals), z_vals, (Xi, Yi), method='cubic')

    fig = plt.figure(figsize=(18, 7))

    # Left panel: 3D surface
    ax1 = fig.add_subplot(121, projection='3d')
    surf = ax1.plot_surface(Xi, Yi, Zi, cmap=cm.viridis, alpha=0.85,
                            edgecolor='none')
    ax1.set_xlabel(param_x)
    ax1.set_ylabel(param_y)
    ax1.set_zlabel('PnL, %')
    ax1.set_title('3D Parameter Landscape')
    fig.colorbar(surf, ax=ax1, shrink=0.5)

    # Right panel: heatmap with isolines and the optimum marked
    ax2 = fig.add_subplot(122)
    hm = ax2.pcolormesh(Xi, Yi, Zi, cmap=cm.viridis, shading='auto')
    contours = ax2.contour(Xi, Yi, Zi, levels=10, colors='white',
                           linewidths=0.8, alpha=0.7)
    ax2.clabel(contours, inline=True, fontsize=8, fmt='%.0f%%')
    best = study.best_trial
    ax2.scatter(best.params[param_x], best.params[param_y],
                color='red', s=100, marker='*', zorder=5, label='Optimum')
    ax2.set_xlabel(param_x)
    ax2.set_ylabel(param_y)
    ax2.set_title('Contour Heatmap')
    ax2.legend()
    fig.colorbar(hm, ax=ax2)

    plt.tight_layout()
    plt.savefig(f'landscape_{param_x}_vs_{param_y}.png', dpi=150)
    plt.show()
A 3D surface plot for a robust strategy resembles a table mountain — a flat top with gentle slopes. For a fragile strategy — a sharp peak, like the Matterhorn. The heatmap complements the 3D view, showing the same information in a top-down projection with isolines.
Red Flags: When Optimization Results Are Suspicious
Eight signs that optimization found overfitting rather than a real pattern:
1. Sensitivity Ratio > 2 for a Key Parameter
If PnL drops more than 20% with a 10% parameter shift — the optimum is fragile.
2. Plateau Width < 10% of the Search Range
If the "good" region occupies less than 10% of the explored range — the optimizer most likely found an artifact.
3. Top-3 Trials Yield PnL 2-3x Above the Median
If the best trials are outliers against the rest rather than the "hilltop" — it's not a plateau.
import numpy as np

completed = [t.values[0] for t in study.trials
             if t.state == optuna.trial.TrialState.COMPLETE]
top_3_mean = np.mean(sorted(completed, reverse=True)[:3])
median_pnl = np.median(completed)

outlier_ratio = top_3_mean / median_pnl
if outlier_ratio > 2.5:
    print(f"WARNING: Top trials are {outlier_ratio:.1f}x above median — "
          f"possible overfitting")
4. Low Trade Count (< 50) with High PnL
Small sample + high PnL = high variance in the estimate. Plateau analysis on 40 trades is unreliable in itself. For such strategies, Monte Carlo bootstrap is critical.
5. One "Magic" Parameter Combination
If the contour plot shows a single bright dot amidst a gray field — this isn't a strategy, it's a data-fitted combination.
6. Too Many Parameters
For 12 parameters with 10 values each, the search space contains 10^12 combinations. Optuna explores ~500. The probability of finding a "good" artifact in such a space is high. The more parameters, the stricter plateau analysis should be.
7. PnL Drops Sharply Out-of-Sample
If in-sample PnL is +87% and walk-forward shows +12% — the optimization fitted parameters to the training period. More about this in the Walk-Forward optimization article.
8. Parameters Are "Pinned" to Range Boundaries
If the optimal value coincides with the search grid boundary — the optimum may lie beyond the range. Expand the range and rerun the optimization.
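This check is easy to automate. A sketch (`boundary_pinned` is a hypothetical name) that flags parameters whose optimum lands within 2% of a search-range edge:

```python
def boundary_pinned(best_params, search_ranges, tol=0.02):
    """Return parameters whose optimum sits within tol (fraction of the
    range) of a search boundary — a hint the true optimum may lie
    outside the explored range.

    search_ranges: {param_name: (low, high)}.
    """
    pinned = []
    for name, value in best_params.items():
        lo, hi = search_ranges[name]
        span = hi - lo
        if span <= 0:
            continue
        if (value - lo) / span < tol or (hi - value) / span < tol:
            pinned.append(name)
    return pinned
```

With Optuna, `best_params` is `study.best_params` and the ranges are whatever was passed to `suggest_float`.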
Automated Plateau Analysis Report
Bringing it all together into a single report generated after each optimization:
import json
import numpy as np
import optuna
from datetime import datetime


def generate_plateau_report(
    study: "optuna.Study",
    strategy_name: str,
    n_trades: int,
    threshold_pct: float = 10.0,
) -> dict:
    """Generate a complete plateau analysis report."""
    # compute_robustness_score() aggregates the per-parameter metrics
    # described above (sensitivity, plateau width, fANOVA importance)
    robustness = compute_robustness_score(study, threshold_pct)
    red_flags = []

    # Red flag: high sensitivity among the most important parameters
    sorted_params = sorted(
        robustness["parameters"].items(),
        key=lambda x: x[1]["importance"],
        reverse=True,
    )
    for name, metrics in sorted_params[:3]:
        if metrics["sensitivity_ratio"] > 2.0:
            red_flags.append(
                f"High sensitivity for {name}: "
                f"S={metrics['sensitivity_ratio']:.2f}"
            )

    # Red flag: narrow plateau
    for name, metrics in robustness["parameters"].items():
        if metrics["plateau_width_rel"] < 0.05:
            red_flags.append(
                f"Narrow plateau for {name}: "
                f"W={metrics['plateau_width_rel']:.1%}"
            )

    # Red flag: top trials are outliers against the median
    all_values = sorted(
        [t.values[0] for t in study.trials
         if t.state == optuna.trial.TrialState.COMPLETE],
        reverse=True,
    )
    if len(all_values) > 10:
        top3 = np.mean(all_values[:3])
        med = np.median(all_values)
        if med > 0 and top3 / med > 2.5:
            red_flags.append(
                f"Top trials are outliers: "
                f"{top3:.1f} vs median {med:.1f} ({top3/med:.1f}x)"
            )

    # Red flag: small trade sample
    if n_trades < 50:
        red_flags.append(f"Low trade count: {n_trades}")

    report = {
        "strategy": strategy_name,
        "timestamp": datetime.now().isoformat(),
        "best_pnl": study.best_value,
        "n_trials": len(study.trials),
        "n_trades": n_trades,
        "robustness_score": robustness["robustness_score"],
        "verdict": robustness["verdict"],
        "red_flags": red_flags,
        "parameters": robustness["parameters"],
    }
    return report


report = generate_plateau_report(
    study, strategy_name="Strategy A", n_trades=491
)
print(json.dumps(report, indent=2, default=str))
Plateau analysis and walk-forward validation (WFO) are complementary methods:
Plateau analysis answers the question: "How stable is the optimum to small parameter shifts?" This is a check of parametric robustness.
Walk-forward answers the question: "Do the parameters work on data the optimizer hasn't seen?" This is a check of temporal robustness.
A strategy can pass plateau analysis (wide plateau) but fail walk-forward (market regime changed). And vice versa — it can pass walk-forward on fixed parameters but have a fragile optimum.
Recommendation: always use both methods. If a strategy passes both plateau analysis (R > 0.1) and walk-forward (PnL_OOS > 50% × PnL_IS) — this is a strong signal of robustness. More details in the Walk-Forward optimization article.
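The two rules of thumb combine into a single go/no-go gate — a sketch using the article's thresholds (`robustness_gate` is a hypothetical name):

```python
def robustness_gate(robustness_score, pnl_is, pnl_oos,
                    min_score=0.1, min_oos_ratio=0.5):
    """Combined check: plateau analysis (R > min_score) AND walk-forward
    (out-of-sample PnL retains at least min_oos_ratio of in-sample PnL).
    """
    passes_plateau = robustness_score > min_score
    passes_wfo = pnl_is > 0 and pnl_oos >= min_oos_ratio * pnl_is
    return passes_plateau and passes_wfo
```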
To assess PnL confidence intervals at each stage, apply Monte Carlo bootstrap. And for correctly comparing strategies with different active time, use the PnL per active time metric.
Recommendations
Before Optimization
Limit the number of parameters. The fewer parameters — the more reliable the plateau. 5-7 parameters is a reasonable maximum. 12 already requires heightened caution.
Set meaningful ranges. Don't set htf_entry_sell from 0.001 to 1.0 if the realistic range is 0.005 to 0.05. Unnecessarily wide ranges create the illusion of a plateau.
Use enough trials. For 12 parameters, a minimum of 300-500 trials. For reliable plateau analysis — 1000+.
During Optimization
Watch convergence. If Optuna continues finding significantly better solutions after 400 trials — the process hasn't converged, and plateau analysis will be unreliable.
Use pruning with caution. Aggressive pruning (MedianPruner) can cut trials that look bad in early steps but are important for building a complete landscape picture.
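The "watch convergence" advice can be made mechanical. A heuristic sketch (`has_converged` is a hypothetical name) that asks whether the running best value has improved meaningfully over the last N trials; the history can be built as a running maximum of `t.values[0]` over completed trials:

```python
def has_converged(best_value_history, window=100, rel_tol=0.01):
    """Heuristic convergence check: has the running best improved by
    more than rel_tol (relative) over the last `window` trials?

    best_value_history: running best objective value after each trial.
    """
    if len(best_value_history) <= window:
        return False
    recent_gain = best_value_history[-1] - best_value_history[-window - 1]
    base = abs(best_value_history[-window - 1]) or 1.0   # avoid div by zero
    return recent_gain / base < rel_tol
```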
After Optimization
Generate the plateau report automatically. Integrate generate_plateau_report() into the optimization pipeline. Don't rely on visual assessment — use numbers.
Check the top-5 parameters. If fANOVA shows that 3 parameters explain 80% of the variance — the remaining 9 can be checked less thoroughly.
Compare with the baseline strategy. If the strategy with default parameters (no optimization) shows +30%, and the optimized one +55% — the difference is only 25 pp, and the plateau is likely wide. If the default shows 0%, and the optimized one +300% — all profitability depends on precise parameter fitting.
Final check — walk-forward. Plateau analysis is a necessary but not sufficient condition for robustness. Always validate out-of-sample.
Conclusion
Parameter optimization is a powerful tool, but without plateau analysis it's a game of roulette. You don't know whether you've found a stable pattern or fitted the model to noise.
Three rules of plateau analysis:
Compute the robustness score. The product of weighted plateau widths gives a single number that summarizes the robustness of all parameters. R>0.1 — green light.
Sensitivity ratio < 1 for key parameters. If a 10% parameter shift causes less than 10% PnL drop — the parameter is robust. If more — be cautious.
Visualize contour plots. No metric can replace understanding the landscape shape. A flat table mountain — good. A sharp needle — bad.
Plateau analysis takes 5 minutes after optimization and can save weeks of unprofitable live trading. It's a mandatory step between study.optimize() and launching the bot.
@article{soloviov2026plateauanalysis,
author = {Soloviov, Eugen},
title = {Plateau Analysis: How to Distinguish a Robust Optimum from Overfitting},
year = {2026},
url = {https://marketmaker.cc/en/blog/post/plateau-analysis-overfitting},
version = {0.1.0},
description = {Why finding the best strategy parameters is only half the work. How to visually and quantitatively distinguish a stable plateau from a fragile peak, and why Optuna contour plots are a mandatory step before launching an optimized strategy into production.}
}