Disclaimer: The information provided in this article is for educational and informational purposes only and does not constitute financial, investment, or trading advice. Trading cryptocurrencies involves significant risk of loss.
You have a strategy with 12 parameters. Each parameter takes ~9 values. You want to find the combination that maximizes PnL with limited drawdown. How do you do it?
If your answer is "I iterate through all combinations" — you have a problem. If your answer is "I change one parameter at a time" — you have a different problem. This article is about what problems lurk behind each approach and how to solve them.
Why Exhaustive Search Is Impossible
The Curse of Dimensionality
Exhaustive search (grid search) tests every combination of values for every parameter. For two parameters with 9 values each, that's 9² = 81 runs — perfectly feasible. For three: 9³ = 729 — tolerable.
But for a real strategy with 12 parameters:
N_grid = 9¹² = 282,429,536,481
Two hundred eighty-two billion runs. Even if a single backtest takes 1 second (which is already optimistic), exhaustive search would take:
T = 282×10⁹ s / (3600 × 24 × 365) ≈ 8,950 years
This is exponential growth: each new parameter multiplies the search space by 9. Add a 13th parameter — and instead of 9,000 years you need 80,000.
In the article about Parquet cache we showed how precomputing timeframes and indicators speeds up a single backtest to ~1 second. But even at 0.1 seconds per run, exhaustive search of 12 parameters would require 895 years. Precomputation helps, but doesn't solve the fundamental problem of exponential growth.
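The arithmetic above is easy to verify in a couple of lines:

```python
# Sanity-check of the numbers above.
K, N = 12, 9                             # parameters and values per parameter
grid_runs = N ** K                       # exhaustive grid search
years = grid_runs / (3600 * 24 * 365)    # at 1 second per backtest
print(f"{grid_runs:,} runs ≈ {years:,.0f} years")  # 282,429,536,481 runs ≈ 8,956 years
```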
We need methods that explore the parameter space smarter than exhaustive search.
Coordinate Descent and OAT: Fast but Blind
Two Variants of the Same Idea
There are two related approaches — both optimize one parameter at a time, but differ in the number of passes:
OAT (One-at-a-Time) sweep — a single pass through all parameters. Iterate through values of the first parameter, fix the best, move to the second — and so on. Once. Fast and cheap.
Coordinate Descent — multi-pass. After optimizing the last parameter, return to the first and check whether the optimum has changed (since the context changed — other parameter values are now different). Repeat rounds until convergence. More expensive, but more precise — each round can refine the solution.
In practice, for backtests OAT is used more often: a single pass through 12 parameters — 96 runs. Coordinate descent with 3-5 rounds — 300-500 runs, which is already comparable to Optuna, but without its advantages.
For 12 parameters with ~8 values each:
N_OAT = K × N = 12 × 8 = 96 runs
Compare with 282×10⁹ for grid search. OAT is linear: O(K·N) instead of O(N^K). This is both its main advantage and its main problem.
```python
def oat_sweep(
    param_grid: dict[str, list],
    run_backtest_fn,
    initial_params: dict,
    metric: str = "effective_score",
) -> dict:
    """
    OAT sweep: single pass, optimizing one parameter at a time.

    param_grid: {"htf_entry_sell": [0.0, 0.005, ..., 0.05], ...}
    initial_params: starting values for all parameters
    metric: metric to optimize (effective_score recommended —
            PnL per active time extrapolated to a year)
    """
    best_params = initial_params.copy()
    best_score = run_backtest_fn(**best_params)[metric]
    for param_name, values in param_grid.items():
        param_best_val = best_params[param_name]
        param_best_score = best_score
        for val in values:
            candidate = best_params.copy()
            candidate[param_name] = val
            result = run_backtest_fn(**candidate)
            score = result[metric]
            if score > param_best_score:
                param_best_score = score
                param_best_val = val
        best_params[param_name] = param_best_val
        best_score = param_best_score
        print(f"{param_name}: best={param_best_val}, score={param_best_score:.4f}")
    return best_params
```
Which metric to choose for optimization? Instead of raw PnL or PnL@MaxLev, it is recommended to use effective score — PnL per active time extrapolated to a year. This metric accounts for time in position and allows correct comparison of strategies with different trading frequencies.
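As a rough illustration, an effective score of this kind could look like the sketch below. The function name and the simple linear extrapolation are assumptions, not the article's exact formula:

```python
# Hypothetical sketch: PnL per unit of active (in-position) time,
# extrapolated to a year. Exact scaling is an assumption.
def effective_score(pnl_pct: float, active_hours: float) -> float:
    """Annualize PnL by the time the strategy actually spent in positions."""
    hours_per_year = 24 * 365
    if active_hours <= 0:
        return 0.0
    return pnl_pct * hours_per_year / active_hours

# A strategy earning +5% while active 10% of the year scores higher
# than one earning +20% while active the whole year:
print(effective_score(5.0, 876.0))    # 50.0
print(effective_score(20.0, 8760.0))  # 20.0
```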
The Blind Spot: Parameter Interactions
OAT assumes that the effect of each parameter is additive — i.e., the optimal value of one parameter does not depend on the values of others. This assumption holds for some parameters, but breaks for coupled ones.
Additive vs Coupled Parameters
Before optimizing — it's useful to classify parameters:
Additive (independent) — the optimal value of one does not depend on the other. They can be optimized one at a time cheaply:
htf_entry_sell and htf_entry_buy — entry thresholds for different directions (sell/buy) on the same timeframe. The sell threshold filters short signals, the buy threshold — longs. They operate on non-overlapping subsets of trades.
tp_target and be_trigger — take-profit and breakeven, if they don't create conflicting exit conditions.
Coupled (interactive) — the optimal value of one depends on the other. Joint optimization is needed:
htf_entry_sell and mtf_entry_sell — thresholds for the same direction (sell) on different timeframes. HTF determines which signals reach MTF, and the MTF threshold determines filtering effectiveness. The HTF optimum shifts when MTF changes.
ltf_entry_sell, mtf_entry_sell, htf_entry_sell — the entire threshold chain for one direction.
partial_frac and tp_target — partial close size depends on the TP level.
Practical approach: first cheaply optimize additive parameters via OAT. Then optimize coupled groups via Optuna. This reduces the budget: instead of 12 parameters in Optuna, we send only 6-8 coupled ones, while the rest are already fixed.
Example: How OAT Misses an Interaction
Consider two coupled thresholds:
htf_entry_sell — threshold on the higher timeframe (sell direction)
mtf_entry_sell — threshold on the middle timeframe (sell direction)
OAT fixes mtf_entry_sell = 0.01 (initial value) and iterates through htf_entry_sell. Finds the best value: htf_entry_sell = 0.02. Fixes it and moves to the next parameter — never returns.
Here's what OAT missed:
| htf_entry_sell | mtf_entry_sell | PnL |
|---|---|---|
| 0.02 | 0.01 | +42% |
| 0.02 | 0.02 | +38% |
| 0.03 | 0.02 | +51% |
| 0.03 | 0.01 | +35% |
The combination (0.03, 0.02) yields PnL +51%, but OAT will never consider it because with fixed mtf_entry_sell = 0.01, the value htf_entry_sell = 0.03 yields only +35%. OAT got "stuck" in the local optimum (0.02, 0.01) and cannot see the global optimum (0.03, 0.02).
This is a classic problem: if the objective function landscape contains diagonal ridges (when the optimum of one parameter shifts as another changes), OAT misses them.
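The table above can be replayed in a few lines to show the mechanism — the PnL numbers are the illustrative ones from the text:

```python
# Toy reproduction of the table: PnL as a function of two coupled thresholds.
pnl = {
    (0.02, 0.01): 42, (0.02, 0.02): 38,
    (0.03, 0.02): 51, (0.03, 0.01): 35,
}
htf_values = [0.02, 0.03]
mtf_values = [0.01, 0.02]

# OAT: fix mtf = 0.01, pick the best htf, then fix htf and pick the best mtf.
mtf = 0.01
htf = max(htf_values, key=lambda h: pnl[(h, mtf)])  # 0.02 (+42 beats +35)
mtf = max(mtf_values, key=lambda m: pnl[(htf, m)])  # 0.01 (+42 beats +38)
print("OAT finds:  ", (htf, mtf), pnl[(htf, mtf)])  # (0.02, 0.01) 42

# Joint search over the full grid sees the diagonal optimum:
best = max(pnl, key=pnl.get)
print("joint finds:", best, pnl[best])              # (0.03, 0.02) 51
```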
Formalizing the Problem
Let f(θ1,θ2,…,θK) be the objective function (PnL). OAT finds a point where:
∂f/∂θ_i = 0 ∀i
But this is a necessary, not sufficient condition for a global optimum. If the Hessian matrix H_ij = ∂²f/∂θ_i∂θ_j has significant off-diagonal elements, OAT fails: it does not account for the cross-derivatives ∂²f/∂θ_i∂θ_j with i ≠ j.
For coupled parameters (thresholds of the same direction across multiple timeframes) — interactions are the rule, not the exception. The entry threshold on the higher timeframe determines which signals reach the middle one, and the threshold on the middle one determines filtering effectiveness on the lower. For additive parameters (different directions, independent filters) cross-derivatives are close to zero — and OAT works well.
Bayesian Optimization: Smart Search
The Idea
Instead of blind enumeration or greedy search, Bayesian optimization builds a surrogate model of the objective function and at each step selects the point where the expected improvement is maximum.
Algorithm:
Choose several random points, evaluate the objective function
Build a surrogate model (approximates f(θ) from observed points)
Find the point with maximum expected improvement (acquisition function)
Evaluate the objective function at that point
Update the surrogate model
Repeat steps 3-5
The key difference from OAT: Bayesian optimization considers all parameters simultaneously and can explore diagonal ridges in parameter space.
TPE (Tree-structured Parzen Estimator)
TPE is the default sampler in Optuna. Instead of modeling f(θ) directly, TPE models two distributions:
l(θ) — distribution of parameters where the objective function is better than threshold y∗
g(θ) — distribution of parameters where the objective function is worse than threshold y∗
TPE's acquisition function — the ratio:
EI(θ) ∝ l(θ) / g(θ)
TPE selects points where l(θ) is large (parameters similar to "good" ones) and g(θ) is small (parameters not similar to "bad" ones).
Why TPE is suitable for backtests:
Handles conditional dependencies between parameters
Does not require continuity of the objective function
Efficient with moderate budgets (100-1000 iterations)
Supports categorical and discrete parameters
Gaussian Process (GP)
An alternative to TPE — Gaussian Process. GP models f(θ) as a multivariate normal process and provides not only a value prediction, but also uncertainty at each point.
f(θ) ~ GP(m(θ), k(θ, θ′))
where m(θ) is the mean function and k(θ, θ′) is the covariance function (kernel).
GP works well when:
There are few parameters (up to 10-15)
The objective function is smooth
Each run is expensive (minutes, hours)
For backtests with a precomputed Parquet cache, where a single run takes ~1 second, TPE is usually preferred: it builds the model faster and scales better to 500+ iterations.
At ~1 second per backtest (with precomputed cache):
T_500 = 500 × 1 s ≈ 8 minutes
Eight minutes versus 8,950 years of exhaustive search. And TPE in 500 iterations finds combinations that OAT misses in 96, because it explores the parameter space simultaneously rather than one axis at a time.
Saving and Resuming a Study
```python
import optuna
from optuna.samplers import TPESampler

study = optuna.create_study(
    storage="sqlite:///optuna_study.db",  # persists trials across restarts
    study_name="strategy_v2",
    sampler=TPESampler(seed=42),
    direction="minimize",     # here the objective returns -PnL
    load_if_exists=True,      # continue if the study already exists
)
study.optimize(objective, n_trials=300)
# Later (or in another process) — resume the same study with 200 more trials:
study.optimize(objective, n_trials=200)
```
Adding Constraints
Not all parameter combinations are valid. For example, the exit threshold should not exceed the entry threshold:
Choosing a Sampler
Optuna supports several samplers. Each has its own strengths.
TPESampler (default)
sampler = optuna.samplers.TPESampler(
n_startup_trials=20, # random trials before modeling begins
seed=42,
)
Principle: Tree-structured Parzen Estimator
Strengths: good for mixed parameter types, scales to 1000+ iterations
Weaknesses: may be less efficient with strong parameter interactions
When to use: by default, if there's no reason to choose another
CmaEsSampler
sampler = optuna.samplers.CmaEsSampler(seed=42)
Principle: Covariance Matrix Adaptation Evolution Strategy — an evolutionary algorithm that adapts the covariance matrix
Strengths: excellent at finding interactions between continuous parameters, accounts for correlations
Weaknesses: does not support categorical parameters, requires more iterations for initialization
When to use: if all parameters are continuous and you suspect strong interactions
GPSampler
sampler = optuna.samplers.GPSampler(seed=42)
Principle: Gaussian Process with acquisition function
Strengths: best sample efficiency (fewer iterations for a good result), provides uncertainty estimates
Weaknesses: O(n³) in the number of trials — slow when n > 200
When to use: if a single backtest is expensive (minutes) and the budget is limited to 100-200 iterations
RandomSampler (baseline)
sampler = optuna.samplers.RandomSampler(seed=42)
Principle: uniform random sampling
Strengths: doesn't get stuck in local optima, full space coverage
Weaknesses: doesn't use previous results
When to use: as a baseline for comparison, or for exploratory analysis
QMCSampler
sampler = optuna.samplers.QMCSampler(seed=42)
Principle: Quasi-Monte Carlo (Sobol/Halton sequences) — fills the space more uniformly than a random sampler
Strengths: better space coverage than RandomSampler, reproducibility
Weaknesses: does not adapt to results
When to use: for the first 50-100 iterations before switching to TPE
Summary Table
| Sampler | Type | Interactions | Categorical | Best Budget |
|---|---|---|---|---|
| TPE | Bayesian | Partial | Yes | 100-1000 |
| CmaEs | Evolutionary | Yes | No | 200-2000 |
| GP | Bayesian | Yes | Limited | 50-200 |
| Random | Random | No | Yes | Any (baseline) |
| QMC | Quasi-random | No | No | 50-500 |
Practical Benchmark
```python
import optuna
import time

def benchmark_sampler(sampler, n_trials=300):
    """Compare samplers on the same task (objective returns -PnL)."""
    study = optuna.create_study(sampler=sampler, direction="minimize")
    start = time.time()
    study.optimize(objective, n_trials=n_trials, show_progress_bar=False)
    elapsed = time.time() - start
    return {
        "best_value": -study.best_value,  # convert back to PnL
        "elapsed_sec": elapsed,
        "best_trial": study.best_trial.number,
    }

samplers = {
    "TPE": optuna.samplers.TPESampler(seed=42),
    "CmaEs": optuna.samplers.CmaEsSampler(seed=42),
    "GP": optuna.samplers.GPSampler(seed=42),
    "Random": optuna.samplers.RandomSampler(seed=42),
    "QMC": optuna.samplers.QMCSampler(seed=42),
}
for name, sampler in samplers.items():
    result = benchmark_sampler(sampler, n_trials=300)
    print(f"{name:8s}: best PnL={result['best_value']:.2f}%, "
          f"found at trial #{result['best_trial']}, "
          f"time={result['elapsed_sec']:.1f}s")
```
Typical results for a strategy with 12 parameters:
| Sampler | Best PnL | Found at Iteration | Sampler Overhead |
|---|---|---|---|
| TPE | ~51% | ~180 | Low |
| CmaEs | ~49% | ~250 | Medium |
| GP | ~48% | ~90 | High when n > 200 |
| Random | ~42% | ~270 | Minimal |
| QMC | ~43% | ~200 | Minimal |
TPE and CmaEs consistently outperform random search by 15-20% in final PnL. GP finds good results earlier but hits a computational ceiling with a large number of iterations.
Multi-Objective Optimization: PnL vs MaxDD
Why a Single Criterion Is Not Enough
Maximizing PnL without drawdown constraints is a path to disaster. A strategy with PnL +80% and MaxDD -30% is, due to loss-profit asymmetry, significantly riskier than a strategy with PnL +50% and MaxDD -5%.
The optimization problem is actually multi-objective:
max_θ PnL(θ)  subject to  MaxDD(θ) → min
These goals conflict: aggressive parameters increase both PnL and drawdown. The solution is not a single point, but a Pareto front: a set of solutions where you cannot improve one metric without worsening the other.
The Pareto front gives multiple solutions. How to choose one?
```python
def select_from_pareto(
    pareto_trials: list,
    max_dd_limit: float = -5.0,
    min_pnl: float = 20.0,
) -> list:
    """
    Filter the Pareto front by constraints.

    max_dd_limit: maximum acceptable drawdown (e.g., -5%)
    min_pnl: minimum acceptable PnL (%)
    """
    filtered = []
    for trial in pareto_trials:
        pnl, max_dd = trial.values
        if max_dd >= max_dd_limit and pnl >= min_pnl:
            # Leverage that keeps the leveraged drawdown within ~50%, capped at 100x
            max_lev = min(50 / abs(max_dd), 100) if max_dd != 0 else 100
            pnl_at_max_lev = pnl * max_lev
            filtered.append({
                "trial": trial,
                "pnl": pnl,
                "max_dd": max_dd,
                "max_lev": max_lev,
                "pnl_at_max_lev": pnl_at_max_lev,
            })
    filtered.sort(key=lambda x: x["pnl_at_max_lev"], reverse=True)
    return filtered
```
Note: when calculating PnL at maximum leverage, you must account for funding rates, otherwise theoretically high leverage will turn into a loss on the real market. Additionally, the final PnL is a single-point estimate, and to assess result stability you need Monte Carlo bootstrap.
Example: Three Strategies on the Pareto Front
| Strategy | PnL | MaxDD | MaxLev | PnL@MaxLev | Trading time |
|---|---|---|---|---|---|
| Strategy A | ~55% | ~0.9% | ~55x | ~3025% | ~15% |
| Strategy B | ~25% | ~0.75% | ~66x | ~1650% | ~5% |
| Strategy C | ~300% | ~17% | ~3x | ~900% | ~45% |
Strategy C with an impressive PnL of +300% turns out to be the least attractive by PnL@MaxLev due to high drawdown. Strategy A leads in net leveraged return, but when accounting for PnL per active time, Strategy B may be preferable — 95% of free time can be filled with other strategies.
Contour Plots and Parameter Importance
Landscape Visualization
After optimization — visualization. Optuna provides built-in tools:
A contour plot builds a two-dimensional cross-section of the objective function for a pair of parameters. If the isolines are parallel to one of the axes, the parameters don't interact, and OAT would have found the same optimum. If the isolines are diagonal, there is interaction, and OAT will miss it.
```python
import optuna.visualization as vis

key_params = ["htf_entry_sell", "mtf_entry_sell", "ltf_entry_sell",
              "htf_entry_buy", "mtf_entry_buy", "ltf_entry_buy"]

# One contour plot per parameter pair — 15 plots for 6 parameters
for i, p1 in enumerate(key_params):
    for p2 in key_params[i + 1:]:
        fig = vis.plot_contour(study, params=[p1, p2])
        fig.write_image(f"contour_{p1}_vs_{p2}.png")
```
If a contour plot shows a plateau — a region where the objective function changes little — this is a good sign. A plateau means the result is robust to small parameter deviations. More about plateau analysis and its relationship to overfitting — in the upcoming article Plateau analysis.
Parameter Importance
```python
importance = optuna.importance.get_param_importances(study)
for param, imp in importance.items():
    print(f"{param:20s}: {imp:.4f}")
```
Parameters with importance < 0.01 can be fixed at their default value — this reduces the dimensionality of the problem and speeds up optimization. But be careful: low importance may also mean the parameter is important only in interaction with others. Verify through contour plots.
Precomputed Cache: Why 1 Second per Backtest Changes Everything
The speed of a single backtest determines which optimization method you can afford.
| Backtest Time | 96 OAT | 500 TPE | 2000 CmaEs |
|---|---|---|---|
| 60 seconds | 1.6 hours | 8.3 hours | 33 hours |
| 10 seconds | 16 minutes | 83 minutes | 5.5 hours |
| 1 second | 1.5 minutes | 8 minutes | 33 minutes |
| 0.1 seconds | 10 seconds | 50 seconds | 3.3 minutes |
At 60 seconds per backtest, 500 TPE iterations take 8 hours. Already tolerable, but iterating (changing the objective function, restarting) is expensive. At 1 second — 8 minutes, and you can run dozens of experiments per day.
This is precisely why precomputation into Parquet cache is not just a speed optimization, but an expansion of the space of available methods. Without cache you're limited to OAT or 100 GP iterations. With cache — you can afford 2000 CmaEs iterations or a full multi-objective NSGA-III.
```python
import pyarrow.parquet as pq
import time

t0 = time.time()
htf_pre = pq.read_table("cache/htf_indicators.parquet").to_pandas()
mtf_pre = pq.read_table("cache/mtf_indicators.parquet").to_pandas()
ltf_pre = pq.read_table("cache/ltf_indicators.parquet").to_pandas()
print(f"Cache loaded in {time.time() - t0:.2f}s")  # ~0.3s

t1 = time.time()
result = run_backtest(htf_pre, mtf_pre, ltf_pre, htf_entry_sell=0.02, ...)
print(f"Backtest in {time.time() - t1:.2f}s")  # ~1.0s
```
Practical Recommendations
When to Use OAT
OAT is justified in the following cases:
Exploratory analysis. You're just starting to explore a strategy and want to understand which parameters affect the result at all. 96 runs in 1.5 minutes — an excellent starting point.
Additive parameters. For parameters that operate on non-overlapping subsets of trades (sell vs buy directions, different instruments), OAT will give a correct result faster.
Very expensive backtest. If a single run takes 10+ minutes and cannot be sped up, OAT with 96 runs (16 hours) is preferable to 500 TPE iterations (3.5 days).
When to Use Optuna
Optuna is preferable in most cases:
More than 3 parameters. Interactions are practically guaranteed — OAT will miss the optimum.
Multi-timeframe strategies. Thresholds across different timeframes are almost always interconnected.
Final optimization. When the strategy has passed Monte Carlo bootstrap and you're confident in its robustness — Optuna will find the best parameters.
Multi-objective problems. PnL vs MaxDD vs trading time — OAT cannot solve this problem in principle.
Hybrid Approach: OAT for Additive + Optuna for Coupled
You don't have to choose between OAT and Optuna — it's better to combine them:
Classify parameters. Divide them into additive (independent) and coupled (interactive). Example for the 12 parameters above:
Additive: htf_entry_sell <-> htf_entry_buy, mtf_entry_sell <-> mtf_entry_buy, ltf_entry_sell <-> ltf_entry_buy (sell/buy are different directions and operate on non-overlapping trades)
Coupled group sell: htf_entry_sell, mtf_entry_sell, ltf_entry_sell (filtering chain HTF -> MTF -> LTF for sell signals)
Coupled group buy: htf_entry_buy, mtf_entry_buy, ltf_entry_buy
OAT for additive. Optimize the sell and buy groups independently. If sell parameters don't affect buy trades, OAT gives a correct result in minutes.
Optuna for coupled. Within each group (sell: 6 parameters, entry + exit) use TPE. 6 parameters instead of 12 — the budget is cut in half.
1. Precompute Parquet cache (once)
2. Classify parameters: additive vs coupled
3. OAT for additive (~50 runs, ~1 min) → fix
4. Optuna TPE for coupled groups (300 iterations x 2 groups, ~10 min)
5. Optuna NSGA-III for meta-parameters (500 iterations, ~8 min) → Pareto front
6. Contour plots → visualize interactions
7. Monte Carlo bootstrap of best points → confidence intervals
8. Walk-Forward → out-of-sample validation
Step 8 — walk-forward optimization — is critically important for protection against overfitting. More about this in the upcoming article Walk-Forward.
Optimization Pitfalls
Overfitting. The more parameters and the more precise the optimization, the higher the risk of fitting the strategy to historical data. 500 Optuna iterations with 12 parameters will find a combination that works perfectly on the training set but is useless on new data. Prefer solutions on plateaus (more about this in the upcoming Plateau analysis article).
Multiple comparisons problem. If you test 500 combinations, the probability of randomly finding a "good" result grows. Bonferroni correction or FDR (False Discovery Rate) control help, but the simpler approach is out-of-sample validation.
Insufficient budget. TPE with 50 iterations for 12 parameters is too few. The first 20 iterations are random (startup), leaving only 30 for modeling. Minimum budget: 10×K=120 iterations for 12 parameters, recommended: 30–50×K.
Freqtrade: How It Works in a Production Framework
Freqtrade — one of the most popular algotrading frameworks — uses Optuna under the hood through its Hyperopt module. Its experience confirms our recommendations:
Samplers: TPE (default), GP, CmaEs, NSGA-II, QMC — all available through configuration
Loss functions: 12 built-in loss functions, including ShortTradeDurHyperOptLoss, SharpeHyperOptLoss, MaxDrawDownHyperOptLoss
Multi-objective: support for NSGA-II and NSGA-III for simultaneous optimization of multiple metrics
Custom samplers: ability to plug in any Optuna-compatible sampler
A key lesson from the Freqtrade ecosystem: built-in loss functions cover typical scenarios, but for serious optimization you need a custom objective function that accounts for your strategy's specifics — active time, funding costs, adaptive drill-down for accurate fill simulation.
Conclusion
Coordinate descent (OAT) is a fast and intuitive method. For 12 parameters it requires only 96 runs and finishes in a minute and a half. But it is blind to parameter interactions — and in multi-timeframe strategies, interactions are almost always present.
Bayesian optimization through Optuna (TPE, GP, CmaEs) explores the parameter space as a whole. 500 iterations in 8 minutes — with a precomputed Parquet cache — find combinations invisible to OAT.
Multi-objective optimization (NSGA-III) transforms the problem of "maximize PnL" into the problem of "build a Pareto front of PnL vs MaxDD" — and provides a set of solutions with different risk-return tradeoffs.
But optimization is only part of the pipeline. The found parameters need to be validated through Monte Carlo bootstrap, corrected for funding rates, recalculated accounting for active time, and run through walk-forward validation. More on that in the upcoming articles of the series.
```bibtex
@article{soloviov2026optuna,
  author      = {Soloviov, Eugen},
  title       = {Coordinate Descent vs Bayesian Optimization: Which Finds Better Parameters},
  year        = {2026},
  url         = {https://marketmaker.cc/en/blog/post/optuna-vs-coordinate-descent},
  description = {Why exhaustive search is impossible for 12+ parameters, how coordinate descent misses interactions, and how Optuna with a TPE sampler finds in 500 iterations what OAT cannot find in 96.}
}
```