Signal Correlation: How Many Pairs to Monitor

You launch a strategy on 10 crypto pairs: BTC/USDT, ETH/USDT, SOL/USDT, AVAX/USDT, and six more alts. The logic seems bulletproof: if the strategy is active 5% of the time on one pair, then across 10 pairs at least one should be active $1 - 0.95^{10} = 40\%$ of the time. A fourfold increase in utilization.

In practice, utilization turns out to be 15-16%, not 40%. Your 10 pairs behave like 3. Capital sits idle, fill_efficiency drops, and the effective portfolio return ends up three times lower than projected.

The reason is signal correlation. And in cryptocurrencies, it is catastrophically high.

The Illusion of Diversification in Crypto

In traditional finance, diversification works because Apple stock and an oil ETF react to different factors. In the cryptocurrency market, everything is different.

BTC is the dominant factor. When Bitcoin drops 5%, ETH drops 6-8%, SOL drops 8-12%, altcoins drop 10-20%. Daily return correlations in the crypto market are consistently above 0.6, and during panic they approach 1.0.

But for us — algo traders — what matters is not price correlation, but signal correlation. If the strategy is based on momentum and BTC triggers an entry signal, there is a high probability that ETH and SOL will trigger an analogous signal in the same minute. All pairs enter long simultaneously, all exit simultaneously. Ten positions — but essentially one bet.

Why 10 Pairs ≠ 10x Diversification

Formal Statement

Suppose a strategy on each of $N$ pairs is active $p$ fraction of the time. If signals were completely independent, the probability that at least one pair is active:

$P(\geq 1\ \text{active}) = 1 - (1 - p)^N$

For Strategy B ( $p = 0.05$ , $N = 10$ ):

$P(\geq 1) = 1 - 0.95^{10} = 1 - 0.5987 \approx 40.1\%$

But signals are not independent. Cryptocurrencies move in sync — meaning signals emerge and extinguish in clusters.

Correlation Turns 10 Pairs Into 3

The intuition is this: if 10 pairs are correlated, they carry information from not 10 independent sources but, say, 3-4. We formalize this through effective_N:

$N_{eff} = \frac{N}{C_f}$

where $C_f$ is the correlation factor, reflecting the average pairwise signal correlation. When $C_f = 1$ the pairs are fully independent; when $C_f = N$ they are identical.

For crypto pairs, a typical $C_f \approx 3$ . Then:

$N_{eff} = \frac{10}{3} \approx 3.3$

$P(\geq 1) = 1 - 0.95^{3.3} \approx 1 - 0.844 = 15.6\%$

Not 40%, but 15.6%. A 2.5x difference. Fill efficiency drops accordingly, and with it the effective return of the entire portfolio (see PnL per Active Time).

Correlation in Crypto Markets

Crypto signal correlation matrix heatmap

BTC as the Dominant Factor

The crypto market has a pronounced factor structure. BTC explains 60-80% of the daily return variance for most altcoins. This is clearly visible through PCA (Principal Component Analysis):

import numpy as np
from sklearn.decomposition import PCA

def analyze_crypto_factor_structure(returns_matrix: np.ndarray, pair_names: list) -> dict:
    """
    PCA analysis of the factor structure of crypto returns.

    Args:
        returns_matrix: returns matrix [n_days x n_pairs]
        pair_names: list of pair names
    """
    pca = PCA()
    pca.fit(returns_matrix)

    explained = pca.explained_variance_ratio_
    cumulative = np.cumsum(explained)

    print("Factor structure:")
    for i, (var, cum) in enumerate(zip(explained[:5], cumulative[:5])):
        print(f"  PC{i+1}: {var:.1%} variance (cumulative: {cum:.1%})")

    loadings = pca.components_[0]
    print("\nPC1 loadings (BTC factor):")
    for name, load in sorted(zip(pair_names, loadings), key=lambda x: -abs(x[1])):
        print(f"  {name}: {load:.3f}")

    return {
        "explained_variance": explained,
        "n_effective_factors": int(np.searchsorted(cumulative, 0.90)) + 1,
        "pc1_loadings": dict(zip(pair_names, loadings)),
    }

Typical result for a portfolio of 10 crypto pairs:

Component	Explained Variance	Cumulative
PC1 (BTC)	65%	65%
PC2	12%	77%
PC3	8%	85%
PC4	5%	90%
PC5-PC10	10%	100%

Four factors explain 90% of variance. Out of 10 pairs, no more than 4 are "independent."

Signal Correlation vs. Price Correlation

Here is an important nuance. Price correlation and signal correlation are different things. BTC and ETH prices are correlated at 0.85, but the signals of a specific strategy may be correlated at 0.95 or at 0.50 — depending on the entry logic.

Example: an RSI overbought/oversold strategy. RSI on BTC crosses 30 (oversold) — enter long. ETH at the same moment may also be oversold (signal correlation ~0.90). Or it may not be, if ETH was falling more slowly (signal correlation ~0.40).

The correct approach is to measure the correlation of signals specifically, not price series:

import numpy as np
from itertools import combinations

def signal_correlation_matrix(
    signals: dict,  # {pair: np.array of 0/1 per minute}
    method: str = "pearson",
) -> np.ndarray:
    """
    Calculate the signal correlation matrix (binary: 0 = flat, 1 = in position).

    Args:
        signals: dictionary {pair_name: binary_signal_array}
        method: correlation method ("pearson", "jaccard")
    """
    pairs = sorted(signals.keys())
    n = len(pairs)
    corr_matrix = np.ones((n, n))

    for i, j in combinations(range(n), 2):
        s_i = signals[pairs[i]]
        s_j = signals[pairs[j]]

        if method == "pearson":
            corr = np.corrcoef(s_i, s_j)[0, 1]
        elif method == "jaccard":
            intersection = np.sum(s_i & s_j)
            union = np.sum(s_i | s_j)
            corr = intersection / union if union > 0 else 0
        else:
            raise ValueError(f"Unknown method: {method}")

        corr_matrix[i, j] = corr
        corr_matrix[j, i] = corr

    return corr_matrix, pairs


def estimate_correlation_factor(corr_matrix: np.ndarray) -> float:
    """
    Estimate correlation_factor from the signal correlation matrix.

    correlation_factor = 1 + (N-1) * mean_pairwise_correlation

    When correlation is 0 → C_f = 1 (all independent).
    When correlation is 1 → C_f = N (all identical).
    """
    n = corr_matrix.shape[0]
    upper_triangle = corr_matrix[np.triu_indices(n, k=1)]
    mean_corr = np.mean(upper_triangle)

    correlation_factor = 1 + (n - 1) * mean_corr
    return correlation_factor

Temporal Correlation: Calm vs. Panic

Correlation is not static. During calm periods, BTC and alts can diverge — ETH rises on Ethereum news, SOL on Solana news. In a crisis, everything collapses into a single factor: risk-on/risk-off.

def rolling_correlation_factor(
    signals: dict,
    window_days: int = 30,
    step_days: int = 7,
) -> list:
    """
    Rolling correlation_factor to detect regime changes.
    """
    pairs = sorted(signals.keys())
    minutes_per_day = 1440
    window = window_days * minutes_per_day
    step = step_days * minutes_per_day

    total_minutes = len(signals[pairs[0]])
    results = []

    for start in range(0, total_minutes - window, step):
        end = start + window
        window_signals = {p: signals[p][start:end] for p in pairs}

        corr_matrix, _ = signal_correlation_matrix(window_signals)
        cf = estimate_correlation_factor(corr_matrix)

        results.append({
            "start_minute": start,
            "end_minute": end,
            "correlation_factor": cf,
            "effective_n": len(pairs) / cf,
        })

    return results

Typical picture for 10 crypto pairs:

Market Regime	Average Signal Correlation	$C_f$	$N_{eff}$
Sideways (low vol)	0.15-0.25	2.4-3.3	3.0-4.2
Uptrend	0.25-0.40	3.3-4.6	2.2-3.0
Downtrend	0.30-0.50	3.7-5.5	1.8-2.7
Panic (crash)	0.60-0.90	6.4-9.1	1.1-1.6

During panic, 10 pairs compress to 1-2 effective ones. Precisely when diversification is needed most, it vanishes. This is the crypto analogue of the classic "correlations go to 1 in a crisis."

effective_N: The Key Concept

Formula and Derivation

The idea of effective_N is borrowed from statistics, where effective sample size accounts for autocorrelation of observations. For our purposes:

$N_{eff} = \frac{N}{1 + (N - 1) \cdot \bar{\rho}}$

where $\bar{\rho}$ is the average pairwise signal correlation. Simplified notation:

$N_{eff} = \frac{N}{C_f}, \quad C_f = 1 + (N - 1) \cdot \bar{\rho}$

Properties:

When $\bar{\rho} = 0$ : $C_f = 1$ , $N_{eff} = N$ — full independence
When $\bar{\rho} = 1$ : $C_f = N$ , $N_{eff} = 1$ — all pairs are identical
When $\bar{\rho} = 0.25$ and $N = 10$ : $C_f = 3.25$ , $N_{eff} = 3.08$

How to Estimate correlation_factor from Data

In practice, there are three approaches:

1. From the signal correlation matrix (exact).

Run the strategy on all pairs, obtain binary signals (0/1 for each minute), build the correlation matrix, compute $C_f$ using the formula above.

2. From PCA of price returns (approximate).

If signals strongly depend on price dynamics (momentum, mean-reversion), you can estimate $N_{eff}$ as the number of PCA components explaining 90% of variance.

3. From asset class heuristics (rough).

Asset Class	Typical $C_f$
Crypto (top-10)	2.5-4.0
Crypto (with DeFi/memecoins)	2.0-3.0
Forex (majors)	1.5-2.5
Stocks (single sector)	2.0-3.5
Stocks (cross-sector)	1.2-1.8

For a crypto portfolio of BTC, ETH, SOL, AVAX, MATIC, DOGE, DOT, LINK, UNI, ATOM, a safe estimate is $C_f \approx 3$ .

Slot Utilization Modeling

Formula for $P(\geq 1\ \text{active})$

The base formula accounting for correlation:

$P(\geq 1\ \text{active}) = 1 - (1 - p)^{N_{eff}}$

Table for different strategies and number of pairs ( $C_f = 3$ ):

Strategy	$p$ (trading time)	5 pairs ( $N_{eff}=1.7$ )	10 pairs ( $N_{eff}=3.3$ )	20 pairs ( $N_{eff}=6.7$ )	50 pairs ( $N_{eff}=16.7$ )
Strategy B	5%	8.2%	15.6%	29.1%	58.0%
Strategy A	15%	23.6%	41.8%	65.9%	92.8%
Strategy C	45%	67.1%	89.0%	98.8%	~100%

For Strategy B with 5% activity, you need 50 pairs just to have at least one active position half the time. And that doesn't even account for the fact that 50 crypto pairs are more strongly correlated than 10.

Multi-Slot Orchestrator

A real orchestrator manages multiple slots simultaneously. If you have 5 slots and 10 pairs, utilization is calculated differently:

$\text{utilization} = \frac{\min(E[\text{active}], \text{max\_slots})}{\text{max\_slots}}$

$E[\text{active}] = N_{eff} \cdot p$

def estimate_fill_efficiency(
    trading_time_pct: float,
    n_pairs: int,
    correlation_factor: float = 3.0,
    max_slots: int = 1,
) -> dict:
    """
    Analytical estimate of fill efficiency for a multi-slot orchestrator.

    Args:
        trading_time_pct: fraction of active time for one strategy on one pair
        n_pairs: number of trading pairs
        correlation_factor: signal correlation coefficient
        max_slots: maximum number of simultaneous positions

    Returns:
        dict with utilization metrics
    """
    effective_n = n_pairs / correlation_factor

    p_at_least_one = 1 - (1 - trading_time_pct) ** effective_n

    expected_active = effective_n * trading_time_pct

    utilization = min(expected_active, max_slots) / max_slots

    fill_efficiency = min(p_at_least_one, utilization)

    return {
        "effective_n": effective_n,
        "p_at_least_one": p_at_least_one,
        "expected_active": expected_active,
        "utilization": utilization,
        "fill_efficiency": fill_efficiency,
    }


configs = [
    ("Strategy B, 10 pairs, 1 slot", 0.05, 10, 3.0, 1),
    ("Strategy B, 10 pairs, 3 slots", 0.05, 10, 3.0, 3),
    ("Strategy B, 30 pairs, 1 slot", 0.05, 30, 3.0, 1),
    ("Strategy A, 10 pairs, 1 slot", 0.15, 10, 3.0, 1),
    ("Strategy C, 10 pairs, 1 slot", 0.45, 10, 3.0, 1),
    ("Strategy C, 10 pairs, 5 slots", 0.45, 10, 3.0, 5),
]

for name, p, n, cf, slots in configs:
    result = estimate_fill_efficiency(p, n, cf, slots)
    print(f"{name}:")
    print(f"  N_eff = {result['effective_n']:.1f}")
    print(f"  P(≥1 active) = {result['p_at_least_one']:.1%}")
    print(f"  E[active] = {result['expected_active']:.2f}")
    print(f"  fill_efficiency = {result['fill_efficiency']:.1%}")
    print()

Expected output:

Strategy B, 10 pairs, 1 slot:
  N_eff = 3.3
  P(≥1 active) = 15.6%
  E[active] = 0.17
  fill_efficiency = 15.6%

Strategy B, 10 pairs, 3 slots:
  N_eff = 3.3
  P(≥1 active) = 15.6%
  E[active] = 0.17
  fill_efficiency = 5.6%

Strategy B, 30 pairs, 1 slot:
  N_eff = 10.0
  P(≥1 active) = 40.1%
  E[active] = 0.50
  fill_efficiency = 40.1%

Strategy A, 10 pairs, 1 slot:
  N_eff = 3.3
  P(≥1 active) = 41.8%
  E[active] = 0.50
  fill_efficiency = 41.8%

Strategy C, 10 pairs, 1 slot:
  N_eff = 3.3
  P(≥1 active) = 89.0%
  E[active] = 1.50
  fill_efficiency = 89.0%

Strategy C, 10 pairs, 5 slots:
  N_eff = 3.3
  P(≥1 active) = 89.0%
  E[active] = 1.50
  fill_efficiency = 30.0%

Note: Strategy B with 3 slots and 10 pairs shows fill_efficiency of 5.6%. Three slots are pointless when the expected number of active pairs is only 0.17. Slots should be allocated proportionally to expected load.

Simulation from Real Data

The analytical model is an approximation. For accurate estimation, you need simulation on real signals:

import numpy as np

def simulate_fill_efficiency(
    all_signals: dict,       # {(strategy, pair): [(entry_min, exit_min), ...]}
    max_slots: int = 10,
    test_period_minutes: int = 750 * 24 * 60,  # 750 days
    priority_fn=None,        # priority function for position selection
) -> dict:
    """
    Simulate real slot load of the orchestrator.

    For each minute: count how many pairs want to enter a position,
    and how many slots are actually occupied (accounting for the limit).

    Args:
        all_signals: signals by pairs and strategies
        max_slots: maximum number of simultaneous positions
        test_period_minutes: length of the test period in minutes
        priority_fn: if None — FIFO; otherwise — ranking function
    """
    demand_timeline = np.zeros(test_period_minutes, dtype=np.int32)
    capped_timeline = np.zeros(test_period_minutes, dtype=np.int32)

    for signals in all_signals.values():
        for entry_min, exit_min in signals:
            if entry_min < test_period_minutes:
                end = min(exit_min, test_period_minutes)
                demand_timeline[entry_min:end] += 1

    capped_timeline = np.minimum(demand_timeline, max_slots)

    total_demand = np.sum(demand_timeline)
    total_filled = np.sum(capped_timeline)
    time_with_any_active = np.sum(demand_timeline > 0)

    fill_efficiency = np.mean(capped_timeline) / max_slots
    demand_fill_ratio = total_filled / total_demand if total_demand > 0 else 0
    time_utilization = time_with_any_active / test_period_minutes

    slot_distribution = {}
    for s in range(max_slots + 1):
        slot_distribution[s] = np.mean(capped_timeline == s)

    return {
        "fill_efficiency": fill_efficiency,
        "demand_fill_ratio": demand_fill_ratio,
        "time_utilization": time_utilization,
        "avg_demand": np.mean(demand_timeline),
        "avg_filled": np.mean(capped_timeline),
        "slot_distribution": slot_distribution,
        "overflow_pct": np.mean(demand_timeline > max_slots),
    }

Simulation on real data often shows even lower utilization than the analytical estimate, because it accounts for temporal clustering of signals: all pairs enter simultaneously in a cluster, creating overflow, and then all go silent, creating a void.

How Many Pairs to Monitor? Diminishing Returns Analysis

Effective N and diminishing returns curve

The key question: at what $N$ does adding one more pair stop noticeably increasing fill_efficiency?

import numpy as np

def diminishing_returns_analysis(
    trading_time_pct: float,
    correlation_factor: float = 3.0,
    max_pairs: int = 100,
    target_utilization: float = 0.80,
) -> dict:
    """
    Diminishing returns analysis from adding new pairs.
    """
    results = []
    target_n = None

    for n in range(1, max_pairs + 1):
        n_eff = n / correlation_factor
        p_active = 1 - (1 - trading_time_pct) ** n_eff
        marginal = 0
        if n > 1:
            prev_eff = (n - 1) / correlation_factor
            prev_p = 1 - (1 - trading_time_pct) ** prev_eff
            marginal = p_active - prev_p

        results.append({
            "n_pairs": n,
            "n_effective": n_eff,
            "p_at_least_one": p_active,
            "marginal_gain": marginal,
        })

        if target_n is None and p_active >= target_utilization:
            target_n = n

    return {
        "results": results,
        "target_n_for_utilization": target_n,
    }


analysis_b = diminishing_returns_analysis(0.05, correlation_factor=3.0, target_utilization=0.80)
print(f"Strategy B: need {analysis_b['target_n_for_utilization']} pairs for 80% P(≥1)")

for r in analysis_b["results"]:
    if r["n_pairs"] in [1, 3, 5, 10, 20, 30, 50, 80]:
        print(f"  N={r['n_pairs']:3d}: N_eff={r['n_effective']:.1f}, "
              f"P(≥1)={r['p_at_least_one']:.1%}, "
              f"marginal={r['marginal_gain']:.2%}")

Results for Strategy B ( $p = 0.05$ , $C_f = 3$ ):

$N$ pairs	$N_{eff}$	$P(\geq 1)$	Marginal gain
1	0.3	1.7%	—
3	1.0	5.0%	+1.7%
5	1.7	8.2%	+1.6%
10	3.3	15.6%	+1.4%
20	6.7	29.1%	+1.1%
30	10.0	40.1%	+0.9%
50	16.7	58.0%	+0.6%
80	26.7	74.5%	+0.4%

For Strategy B, reaching 80% single-slot utilization is impossible even with 100 pairs (you need ~96 pairs). This is a fundamental limitation: a strategy with 5% trading time is not suited for single-slot operation — it needs a portfolio approach with multiple strategies.

For Strategy A ( $p = 0.15$ , $C_f = 3$ ):

$N$ pairs	$N_{eff}$	$P(\geq 1)$	Marginal gain
5	1.7	23.6%	—
10	3.3	41.8%	+3.3%
20	6.7	65.9%	+2.1%
30	10.0	80.3%	+1.2%

Strategy A reaches 80% utilization at ~30 pairs. Marginal gain at the 30th pair is only +1.2%.

For Strategy C ( $p = 0.45$ , $C_f = 3$ ):

$N$ pairs	$N_{eff}$	$P(\geq 1)$
3	1.0	45.0%
5	1.7	67.1%
10	3.3	89.0%
15	5.0	95.0%

Strategy C with 45% trading time reaches 90% utilization at just 10 pairs. Adding more is pointless.

Edge Degradation Across Pairs

There is another factor that limits the number of pairs: edge degradation. A strategy developed and optimized on BTC/USDT may perform worse on less liquid alts.

Causes of degradation:

Liquidity: slippage on DOGE/USDT is several times higher than on BTC/USDT
Spreads: less liquid pairs have wider bid-ask spreads
Microstructure: order book patterns differ between pairs
Manipulation: low-liquidity alts are susceptible to pump-and-dump

def edge_decay_analysis(
    strategy_results: dict,  # {pair: {"pnl_per_day": float, "n_trades": int}}
    min_trades: int = 30,
) -> list:
    """
    Rank pairs by edge accounting for degradation.
    """
    ranked = []
    for pair, metrics in strategy_results.items():
        if metrics["n_trades"] < min_trades:
            continue
        ranked.append({
            "pair": pair,
            "pnl_per_day": metrics["pnl_per_day"],
            "n_trades": metrics["n_trades"],
            "sharpe": metrics.get("sharpe", 0),
        })

    ranked.sort(key=lambda x: x["pnl_per_day"], reverse=True)

    cumulative_pnl = []
    running_sum = 0
    for i, r in enumerate(ranked):
        running_sum += r["pnl_per_day"]
        avg = running_sum / (i + 1)
        cumulative_pnl.append({
            "n_pairs": i + 1,
            "last_added": r["pair"],
            "last_pnl_per_day": r["pnl_per_day"],
            "avg_pnl_per_day": avg,
        })

    return cumulative_pnl

Typical picture:

# pairs	Last Added	PnL/day of last	Average PnL/day
1	BTC/USDT	0.89%	0.89%
2	ETH/USDT	0.82%	0.86%
3	SOL/USDT	0.71%	0.81%
5	AVAX/USDT	0.55%	0.73%
8	DOT/USDT	0.31%	0.61%
10	DOGE/USDT	0.12%	0.53%

Adding the 10th pair lowers the portfolio's average PnL/day. By the 8th pair, the edge is already half that of the best. A balance is needed between fill_efficiency (grows with the number of pairs) and average edge (falls).

Optimal Number of Pairs: Unified Model

We combine fill_efficiency and edge decay into a single metric — expected portfolio PnL per day:

$\text{Portfolio PnL/day} = \text{avg\_edge}(N) \times \text{fill\_efficiency}(N) \times 365$

def optimal_pairs_count(
    pair_edges: list,           # PnL/day in descending order: [0.89, 0.82, 0.71, ...]
    trading_time_pct: float,
    correlation_factor: float = 3.0,
    max_slots: int = 1,
) -> dict:
    """
    Find the optimal number of pairs that maximizes portfolio PnL.
    """
    best_n = 0
    best_score = 0
    results = []

    for n in range(1, len(pair_edges) + 1):
        avg_edge = np.mean(pair_edges[:n])

        n_eff = n / correlation_factor
        p_active = 1 - (1 - trading_time_pct) ** n_eff
        expected_active = n_eff * trading_time_pct
        utilization = min(expected_active, max_slots) / max_slots
        fill_eff = min(p_active, utilization)

        portfolio_score = avg_edge * fill_eff * 365

        results.append({
            "n_pairs": n,
            "avg_edge": avg_edge,
            "fill_efficiency": fill_eff,
            "portfolio_annualized": portfolio_score,
        })

        if portfolio_score > best_score:
            best_score = portfolio_score
            best_n = n

    return {
        "optimal_n": best_n,
        "optimal_score": best_score,
        "results": results,
    }


edges = [0.89, 0.82, 0.71, 0.65, 0.55, 0.48, 0.40, 0.31, 0.22, 0.12,
         0.08, 0.05, 0.02, -0.01, -0.05]

opt = optimal_pairs_count(edges, trading_time_pct=0.15, correlation_factor=3.0)
print(f"Optimal number of pairs: {opt['optimal_n']}")
print(f"Portfolio annualized: {opt['optimal_score']:.1f}%")

for r in opt["results"]:
    print(f"  N={r['n_pairs']:2d}: avg_edge={r['avg_edge']:.2f}%, "
          f"fill_eff={r['fill_efficiency']:.1%}, "
          f"portfolio={r['portfolio_annualized']:.1f}%")

The optimum is typically found at the point where the marginal fill_efficiency from adding a pair no longer compensates for the decline in average edge. For a typical crypto portfolio:

Strategy B (5% time): optimum at 8-12 pairs
Strategy A (15% time): optimum at 6-10 pairs
Strategy C (45% time): optimum at 4-6 pairs

A paradox: the strategy with the lowest trading time benefits from the most pairs, yet fill_efficiency still remains low. The solution is not more pairs, but combining with other strategies (see Combo Strategies).

Cross-Pair Diversification: Strategies for Reducing Correlation

If you cannot increase the number of pairs indefinitely, you can reduce $C_f$ — that is, increase signal diversity.

Strategy 1: Mix of Liquid and DeFi Tokens

BTC, ETH, BNB are very strongly correlated. But UNI (DEX), AAVE (lending), CRV (stablecoins) may have their own drivers. Adding DeFi tokens reduces the average $\bar{\rho}$ from 0.35 to 0.20-0.25:

$C_f = 1 + 9 \times 0.20 = 2.8 \quad \text{(vs. 3.25 at } \bar{\rho} = 0.25\text{)}$

Strategy 2: Different Strategies on the Same Pairs

Instead of 10 pairs with one strategy — 5 pairs with two different strategies. If the strategies are based on different principles (momentum vs. mean-reversion), their signals can be anti-correlated:

Momentum enters long when price rises
Mean-reversion enters long when price falls

$\bar{\rho}_{\text{cross-strategy}} < 0 \implies C_f < 1 \implies N_{eff} > N$

This is the only way to get $N_{eff} > N$ — use strategies with negative signal correlation.

Strategy 3: Spot vs. Futures

Funding rate arbitrage and spot trading have a different correlation structure. Adding arbitrage strategies to the portfolio significantly reduces the overall $C_f$ , because arbitrage by definition exploits divergences, not convergences.

Practical Recommendations by Strategy Type

High-Frequency Low-Time Strategies (trading time < 10%)

Typical representative: Strategy B (5% time, 38 trades over 750 days).

Number of pairs: 10-15 (optimum for edge/fill balance)
Problem: fill_efficiency is low even with 15 pairs (~20-25%)
Solution: mandatory combination with other strategies in the orchestrator
$C_f$ for crypto: 2.5-3.5
Monitoring: rolling $C_f$ to adapt pair count to market regime

Medium-Time Strategies (trading time 10-30%)

Typical representative: Strategy A (15% time, 418 trades over 750 days).

Number of pairs: 6-10
Fill_efficiency at 10 pairs: ~40%
Solution: combining 2-3 such strategies fills 80%+ of the time
$C_f$ for crypto: 2.5-3.5
Focus: select pairs with maximum edge, don't chase quantity

High-Time Strategies (trading time > 30%)

Typical representative: Strategy C (45% time).

Number of pairs: 4-6
Fill_efficiency at 6 pairs: ~80%
Problem: overflow — multiple pairs active simultaneously, but fewer slots
Solution: increase max_slots or add pair prioritization
$C_f$ for crypto: 2.5-4.0 (higher due to long positions overlapping crises)

Summary Table

Parameter	Strategy B (5%)	Strategy A (15%)	Strategy C (45%)
Recommended $N$	10-15	6-10	4-6
$N_{eff}$ at $C_f=3$	3.3-5.0	2.0-3.3	1.3-2.0
Fill eff. (1 slot)	15-23%	32-42%	77-89%
Combination needed?	Mandatory	Desirable	No
Bottleneck	Few signals	Balance	Overflow

Connection to Other Metrics in the Series

This article is the eleventh in the "Backtests Without Illusions" series. Signal correlation directly impacts the metrics from previous articles:

PnL per Active Time: fill_efficiency is the key multiplier in the effective return formula. If you overestimate fill_efficiency by ignoring correlation, your portfolio PnL forecast will be overly optimistic.
Funding rates: with high correlation, positions open simultaneously — and funding costs grow linearly with the number of slots. Overflow + funding = accelerated capital burn.
Funding rate arbitrage: arbitrage strategies are a natural diversifier that reduces the portfolio's $C_f$ . Their signals are weakly correlated with momentum and mean-reversion strategies.
Combo strategies (next article): how to assemble a portfolio of strategies with different $p$ and $C_f$ to achieve 90%+ utilization. Cascade orchestration accounts for signal correlation when assigning priorities.

Conclusion

Diversification in crypto is not about the number of pairs. 10 correlated pairs yield the effect of 3-4 independent ones. During panic, even fewer.

Four takeaways:

Calculate effective_N, not N. For crypto pairs $C_f \approx 3$ . Ten pairs are ~3.3 effective ones. Plan fill_efficiency based on $N_{eff}$ , not $N$ .
Measure signal correlation, not price correlation. Price correlation is a proxy, not a substitute. Run the strategy on all pairs and compute the binary signal correlation matrix.
Account for edge degradation. More pairs means lower average PnL/day. The optimum is at the point where marginal fill_efficiency from a new pair still compensates for the edge decline.
Reduce $C_f$ , don't increase $N$ . Combining different strategies on the same pairs is more effective than one strategy on more pairs. Cross-strategy diversification can yield $N_{eff} > N$ .

Correlation factor is the hidden variable that determines how realistic your utilization and return forecasts are. Ignoring it means building a portfolio on illusions.

Useful Links

Citation

@article{soloviov2026signalcorrelation,
  author = {Soloviov, Eugen},
  title = {Signal Correlation: How Many Pairs to Monitor},
  year = {2026},
  url = {https://marketmaker.cc/ru/blog/post/signal-correlation-pairs},
  version = {0.1.0},
  description = {Why 10 crypto pairs don't provide 10x diversification, how to calculate effective\_N via correlation\_factor, and how many pairs you really need to monitor for 80-90\% orchestrator slot utilization.}
}

The reason is signal correlation. And in cryptocurrencies, it is catastrophically high.

The Illusion of Diversification in Crypto

In traditional finance, diversification works because Apple stock and an oil ETF react to different factors. In the cryptocurrency market, everything is different.

Why 10 Pairs ≠ 10x Diversification

Formal Statement

Suppose a strategy on each of $N$ pairs is active $p$ fraction of the time. If signals were completely independent, the probability that at least one pair is active:

$P(\geq 1\ \text{active}) = 1 - (1 - p)^N$

For Strategy B ( $p = 0.05$ , $N = 10$ ):

$P(\geq 1) = 1 - 0.95^{10} = 1 - 0.5987 \approx 40.1\%$

But signals are not independent. Cryptocurrencies move in sync — meaning signals emerge and extinguish in clusters.

Correlation Turns 10 Pairs Into 3

The intuition is this: if 10 pairs are correlated, they carry information from not 10 independent sources but, say, 3-4. We formalize this through effective_N:

$N_{eff} = \frac{N}{C_f}$

where $C_f$ is the correlation factor, reflecting the average pairwise signal correlation. When $C_f = 1$ the pairs are fully independent; when $C_f = N$ they are identical.

For crypto pairs, a typical $C_f \approx 3$ . Then:

$N_{eff} = \frac{10}{3} \approx 3.3$

$P(\geq 1) = 1 - 0.95^{3.3} \approx 1 - 0.844 = 15.6\%$

Not 40%, but 15.6%. A 2.5x difference. Fill efficiency drops accordingly, and with it the effective return of the entire portfolio (see PnL per Active Time).

Correlation in Crypto Markets

Crypto signal correlation matrix heatmap

BTC as the Dominant Factor

The crypto market has a pronounced factor structure. BTC explains 60-80% of the daily return variance for most altcoins. This is clearly visible through PCA (Principal Component Analysis):

import numpy as np
from sklearn.decomposition import PCA

def analyze_crypto_factor_structure(returns_matrix: np.ndarray, pair_names: list) -> dict:
    """
    PCA analysis of the factor structure of crypto returns.

    Args:
        returns_matrix: returns matrix [n_days x n_pairs]
        pair_names: list of pair names
    """
    pca = PCA()
    pca.fit(returns_matrix)

    explained = pca.explained_variance_ratio_
    cumulative = np.cumsum(explained)

    print("Factor structure:")
    for i, (var, cum) in enumerate(zip(explained[:5], cumulative[:5])):
        print(f"  PC{i+1}: {var:.1%} variance (cumulative: {cum:.1%})")

    loadings = pca.components_[0]
    print("\nPC1 loadings (BTC factor):")
    for name, load in sorted(zip(pair_names, loadings), key=lambda x: -abs(x[1])):
        print(f"  {name}: {load:.3f}")

    return {
        "explained_variance": explained,
        "n_effective_factors": int(np.searchsorted(cumulative, 0.90)) + 1,
        "pc1_loadings": dict(zip(pair_names, loadings)),
    }

Typical result for a portfolio of 10 crypto pairs:

Component	Explained Variance	Cumulative
PC1 (BTC)	65%	65%
PC2	12%	77%
PC3	8%	85%
PC4	5%	90%
PC5-PC10	10%	100%

Four factors explain 90% of variance. Out of 10 pairs, no more than 4 are "independent."

Signal Correlation vs. Price Correlation

The correct approach is to measure the correlation of signals specifically, not price series:

import numpy as np
from itertools import combinations

def signal_correlation_matrix(
    signals: dict,  # {pair: np.array of 0/1 per minute}
    method: str = "pearson",
) -> np.ndarray:
    """
    Calculate the signal correlation matrix (binary: 0 = flat, 1 = in position).

    Args:
        signals: dictionary {pair_name: binary_signal_array}
        method: correlation method ("pearson", "jaccard")
    """
    pairs = sorted(signals.keys())
    n = len(pairs)
    corr_matrix = np.ones((n, n))

    for i, j in combinations(range(n), 2):
        s_i = signals[pairs[i]]
        s_j = signals[pairs[j]]

        if method == "pearson":
            corr = np.corrcoef(s_i, s_j)[0, 1]
        elif method == "jaccard":
            intersection = np.sum(s_i & s_j)
            union = np.sum(s_i | s_j)
            corr = intersection / union if union > 0 else 0
        else:
            raise ValueError(f"Unknown method: {method}")

        corr_matrix[i, j] = corr
        corr_matrix[j, i] = corr

    return corr_matrix, pairs


def estimate_correlation_factor(corr_matrix: np.ndarray) -> float:
    """
    Estimate correlation_factor from the signal correlation matrix.

    correlation_factor = 1 + (N-1) * mean_pairwise_correlation

    When correlation is 0 → C_f = 1 (all independent).
    When correlation is 1 → C_f = N (all identical).
    """
    n = corr_matrix.shape[0]
    upper_triangle = corr_matrix[np.triu_indices(n, k=1)]
    mean_corr = np.mean(upper_triangle)

    correlation_factor = 1 + (n - 1) * mean_corr
    return correlation_factor

Temporal Correlation: Calm vs. Panic

Correlation is not static. During calm periods, BTC and alts can diverge — ETH rises on Ethereum news, SOL on Solana news. In a crisis, everything collapses into a single factor: risk-on/risk-off.

def rolling_correlation_factor(
    signals: dict,
    window_days: int = 30,
    step_days: int = 7,
) -> list:
    """
    Rolling correlation_factor to detect regime changes.
    """
    pairs = sorted(signals.keys())
    minutes_per_day = 1440
    window = window_days * minutes_per_day
    step = step_days * minutes_per_day

    total_minutes = len(signals[pairs[0]])
    results = []

    for start in range(0, total_minutes - window, step):
        end = start + window
        window_signals = {p: signals[p][start:end] for p in pairs}

        corr_matrix, _ = signal_correlation_matrix(window_signals)
        cf = estimate_correlation_factor(corr_matrix)

        results.append({
            "start_minute": start,
            "end_minute": end,
            "correlation_factor": cf,
            "effective_n": len(pairs) / cf,
        })

    return results

Typical picture for 10 crypto pairs:

Market Regime	Average Signal Correlation	$C_f$	$N_{eff}$
Sideways (low vol)	0.15-0.25	2.4-3.3	3.0-4.2
Uptrend	0.25-0.40	3.3-4.6	2.2-3.0
Downtrend	0.30-0.50	3.7-5.5	1.8-2.7
Panic (crash)	0.60-0.90	6.4-9.1	1.1-1.6

During panic, 10 pairs compress to 1-2 effective ones. Precisely when diversification is needed most, it vanishes. This is the crypto analogue of the classic "correlations go to 1 in a crisis."

effective_N: The Key Concept

Formula and Derivation

The idea of effective_N is borrowed from statistics, where effective sample size accounts for autocorrelation of observations. For our purposes:

$N_{eff} = \frac{N}{1 + (N - 1) \cdot \bar{\rho}}$

where $\bar{\rho}$ is the average pairwise signal correlation. Simplified notation:

$N_{eff} = \frac{N}{C_f}, \quad C_f = 1 + (N - 1) \cdot \bar{\rho}$

Properties:

When $\bar{\rho} = 0$ : $C_f = 1$ , $N_{eff} = N$ — full independence
When $\bar{\rho} = 1$ : $C_f = N$ , $N_{eff} = 1$ — all pairs are identical
When $\bar{\rho} = 0.25$ and $N = 10$ : $C_f = 3.25$ , $N_{eff} = 3.08$

How to Estimate correlation_factor from Data

In practice, there are three approaches:

1. From the signal correlation matrix (exact).

Run the strategy on all pairs, obtain binary signals (0/1 for each minute), build the correlation matrix, compute $C_f$ using the formula above.

2. From PCA of price returns (approximate).

If signals strongly depend on price dynamics (momentum, mean-reversion), you can estimate $N_{eff}$ as the number of PCA components explaining 90% of variance.

3. From asset class heuristics (rough).

Asset Class	Typical $C_f$
Crypto (top-10)	2.5-4.0
Crypto (with DeFi/memecoins)	2.0-3.0
Forex (majors)	1.5-2.5
Stocks (single sector)	2.0-3.5
Stocks (cross-sector)	1.2-1.8

For a crypto portfolio of BTC, ETH, SOL, AVAX, MATIC, DOGE, DOT, LINK, UNI, ATOM, a safe estimate is $C_f \approx 3$ .

Slot Utilization Modeling

Formula for $P(\geq 1\ \text{active})$

The base formula accounting for correlation:

$P(\geq 1\ \text{active}) = 1 - (1 - p)^{N_{eff}}$

Table for different strategies and number of pairs ( $C_f = 3$ ):

Strategy	$p$ (trading time)	5 pairs ( $N_{eff}=1.7$ )	10 pairs ( $N_{eff}=3.3$ )	20 pairs ( $N_{eff}=6.7$ )	50 pairs ( $N_{eff}=16.7$ )
Strategy B	5%	8.2%	15.6%	29.1%	58.0%
Strategy A	15%	23.6%	41.8%	65.9%	92.8%
Strategy C	45%	67.1%	89.0%	98.8%	~100%

Multi-Slot Orchestrator

A real orchestrator manages multiple slots simultaneously. If you have 5 slots and 10 pairs, utilization is calculated differently:

$\text{utilization} = \frac{\min(E[\text{active}], \text{max\_slots})}{\text{max\_slots}}$

$E[\text{active}] = N_{eff} \cdot p$

def estimate_fill_efficiency(
    trading_time_pct: float,
    n_pairs: int,
    correlation_factor: float = 3.0,
    max_slots: int = 1,
) -> dict:
    """
    Analytical estimate of fill efficiency for a multi-slot orchestrator.

    Args:
        trading_time_pct: fraction of active time for one strategy on one pair
        n_pairs: number of trading pairs
        correlation_factor: signal correlation coefficient
        max_slots: maximum number of simultaneous positions

    Returns:
        dict with utilization metrics
    """
    effective_n = n_pairs / correlation_factor

    p_at_least_one = 1 - (1 - trading_time_pct) ** effective_n

    expected_active = effective_n * trading_time_pct

    utilization = min(expected_active, max_slots) / max_slots

    fill_efficiency = min(p_at_least_one, utilization)

    return {
        "effective_n": effective_n,
        "p_at_least_one": p_at_least_one,
        "expected_active": expected_active,
        "utilization": utilization,
        "fill_efficiency": fill_efficiency,
    }


configs = [
    ("Strategy B, 10 pairs, 1 slot", 0.05, 10, 3.0, 1),
    ("Strategy B, 10 pairs, 3 slots", 0.05, 10, 3.0, 3),
    ("Strategy B, 30 pairs, 1 slot", 0.05, 30, 3.0, 1),
    ("Strategy A, 10 pairs, 1 slot", 0.15, 10, 3.0, 1),
    ("Strategy C, 10 pairs, 1 slot", 0.45, 10, 3.0, 1),
    ("Strategy C, 10 pairs, 5 slots", 0.45, 10, 3.0, 5),
]

for name, p, n, cf, slots in configs:
    result = estimate_fill_efficiency(p, n, cf, slots)
    print(f"{name}:")
    print(f"  N_eff = {result['effective_n']:.1f}")
    print(f"  P(≥1 active) = {result['p_at_least_one']:.1%}")
    print(f"  E[active] = {result['expected_active']:.2f}")
    print(f"  fill_efficiency = {result['fill_efficiency']:.1%}")
    print()

Expected output:

Strategy B, 10 pairs, 1 slot:
  N_eff = 3.3
  P(≥1 active) = 15.6%
  E[active] = 0.17
  fill_efficiency = 15.6%

Strategy B, 10 pairs, 3 slots:
  N_eff = 3.3
  P(≥1 active) = 15.6%
  E[active] = 0.17
  fill_efficiency = 5.6%

Strategy B, 30 pairs, 1 slot:
  N_eff = 10.0
  P(≥1 active) = 40.1%
  E[active] = 0.50
  fill_efficiency = 40.1%

Strategy A, 10 pairs, 1 slot:
  N_eff = 3.3
  P(≥1 active) = 41.8%
  E[active] = 0.50
  fill_efficiency = 41.8%

Strategy C, 10 pairs, 1 slot:
  N_eff = 3.3
  P(≥1 active) = 89.0%
  E[active] = 1.50
  fill_efficiency = 89.0%

Strategy C, 10 pairs, 5 slots:
  N_eff = 3.3
  P(≥1 active) = 89.0%
  E[active] = 1.50
  fill_efficiency = 30.0%

Simulation from Real Data

The analytical model is an approximation. For accurate estimation, you need simulation on real signals:

import numpy as np

def simulate_fill_efficiency(
    all_signals: dict,       # {(strategy, pair): [(entry_min, exit_min), ...]}
    max_slots: int = 10,
    test_period_minutes: int = 750 * 24 * 60,  # 750 days
    priority_fn=None,        # priority function for position selection
) -> dict:
    """
    Simulate real slot load of the orchestrator.

    For each minute: count how many pairs want to enter a position,
    and how many slots are actually occupied (accounting for the limit).

    Args:
        all_signals: signals by pairs and strategies
        max_slots: maximum number of simultaneous positions
        test_period_minutes: length of the test period in minutes
        priority_fn: if None — FIFO; otherwise — ranking function
    """
    demand_timeline = np.zeros(test_period_minutes, dtype=np.int32)
    capped_timeline = np.zeros(test_period_minutes, dtype=np.int32)

    for signals in all_signals.values():
        for entry_min, exit_min in signals:
            if entry_min < test_period_minutes:
                end = min(exit_min, test_period_minutes)
                demand_timeline[entry_min:end] += 1

    capped_timeline = np.minimum(demand_timeline, max_slots)

    total_demand = np.sum(demand_timeline)
    total_filled = np.sum(capped_timeline)
    time_with_any_active = np.sum(demand_timeline > 0)

    fill_efficiency = np.mean(capped_timeline) / max_slots
    demand_fill_ratio = total_filled / total_demand if total_demand > 0 else 0
    time_utilization = time_with_any_active / test_period_minutes

    slot_distribution = {}
    for s in range(max_slots + 1):
        slot_distribution[s] = np.mean(capped_timeline == s)

    return {
        "fill_efficiency": fill_efficiency,
        "demand_fill_ratio": demand_fill_ratio,
        "time_utilization": time_utilization,
        "avg_demand": np.mean(demand_timeline),
        "avg_filled": np.mean(capped_timeline),
        "slot_distribution": slot_distribution,
        "overflow_pct": np.mean(demand_timeline > max_slots),
    }

How Many Pairs to Monitor? Diminishing Returns Analysis

Effective N and diminishing returns curve

The key question: at what $N$ does adding one more pair stop noticeably increasing fill_efficiency?

import numpy as np

def diminishing_returns_analysis(
    trading_time_pct: float,
    correlation_factor: float = 3.0,
    max_pairs: int = 100,
    target_utilization: float = 0.80,
) -> dict:
    """
    Diminishing returns analysis from adding new pairs.
    """
    results = []
    target_n = None

    for n in range(1, max_pairs + 1):
        n_eff = n / correlation_factor
        p_active = 1 - (1 - trading_time_pct) ** n_eff
        marginal = 0
        if n > 1:
            prev_eff = (n - 1) / correlation_factor
            prev_p = 1 - (1 - trading_time_pct) ** prev_eff
            marginal = p_active - prev_p

        results.append({
            "n_pairs": n,
            "n_effective": n_eff,
            "p_at_least_one": p_active,
            "marginal_gain": marginal,
        })

        if target_n is None and p_active >= target_utilization:
            target_n = n

    return {
        "results": results,
        "target_n_for_utilization": target_n,
    }


analysis_b = diminishing_returns_analysis(0.05, correlation_factor=3.0, target_utilization=0.80)
print(f"Strategy B: need {analysis_b['target_n_for_utilization']} pairs for 80% P(≥1)")

for r in analysis_b["results"]:
    if r["n_pairs"] in [1, 3, 5, 10, 20, 30, 50, 80]:
        print(f"  N={r['n_pairs']:3d}: N_eff={r['n_effective']:.1f}, "
              f"P(≥1)={r['p_at_least_one']:.1%}, "
              f"marginal={r['marginal_gain']:.2%}")

Results for Strategy B ( $p = 0.05$ , $C_f = 3$ ):

$N$ pairs	$N_{eff}$	$P(\geq 1)$	Marginal gain
1	0.3	1.7%	—
3	1.0	5.0%	+1.7%
5	1.7	8.2%	+1.6%
10	3.3	15.6%	+1.4%
20	6.7	29.1%	+1.1%
30	10.0	40.1%	+0.9%
50	16.7	58.0%	+0.6%
80	26.7	74.5%	+0.4%

For Strategy A ( $p = 0.15$ , $C_f = 3$ ):

$N$ pairs	$N_{eff}$	$P(\geq 1)$	Marginal gain
5	1.7	23.6%	—
10	3.3	41.8%	+3.3%
20	6.7	65.9%	+2.1%
30	10.0	80.3%	+1.2%

Strategy A reaches 80% utilization at ~30 pairs. Marginal gain at the 30th pair is only +1.2%.

For Strategy C ( $p = 0.45$ , $C_f = 3$ ):

$N$ pairs	$N_{eff}$	$P(\geq 1)$
3	1.0	45.0%
5	1.7	67.1%
10	3.3	89.0%
15	5.0	95.0%

Strategy C with 45% trading time reaches 90% utilization at just 10 pairs. Adding more is pointless.

Edge Degradation Across Pairs

There is another factor that limits the number of pairs: edge degradation. A strategy developed and optimized on BTC/USDT may perform worse on less liquid alts.

Causes of degradation:

Liquidity: slippage on DOGE/USDT is several times higher than on BTC/USDT
Spreads: less liquid pairs have wider bid-ask spreads
Microstructure: order book patterns differ between pairs
Manipulation: low-liquidity alts are susceptible to pump-and-dump

def edge_decay_analysis(
    strategy_results: dict,  # {pair: {"pnl_per_day": float, "n_trades": int}}
    min_trades: int = 30,
) -> list:
    """
    Rank pairs by edge accounting for degradation.
    """
    ranked = []
    for pair, metrics in strategy_results.items():
        if metrics["n_trades"] < min_trades:
            continue
        ranked.append({
            "pair": pair,
            "pnl_per_day": metrics["pnl_per_day"],
            "n_trades": metrics["n_trades"],
            "sharpe": metrics.get("sharpe", 0),
        })

    ranked.sort(key=lambda x: x["pnl_per_day"], reverse=True)

    cumulative_pnl = []
    running_sum = 0
    for i, r in enumerate(ranked):
        running_sum += r["pnl_per_day"]
        avg = running_sum / (i + 1)
        cumulative_pnl.append({
            "n_pairs": i + 1,
            "last_added": r["pair"],
            "last_pnl_per_day": r["pnl_per_day"],
            "avg_pnl_per_day": avg,
        })

    return cumulative_pnl

Typical picture:

# pairs	Last Added	PnL/day of last	Average PnL/day
1	BTC/USDT	0.89%	0.89%
2	ETH/USDT	0.82%	0.86%
3	SOL/USDT	0.71%	0.81%
5	AVAX/USDT	0.55%	0.73%
8	DOT/USDT	0.31%	0.61%
10	DOGE/USDT	0.12%	0.53%

Optimal Number of Pairs: Unified Model

We combine fill_efficiency and edge decay into a single metric — expected portfolio PnL per day:

$\text{Portfolio PnL/day} = \text{avg\_edge}(N) \times \text{fill\_efficiency}(N) \times 365$

def optimal_pairs_count(
    pair_edges: list,           # PnL/day in descending order: [0.89, 0.82, 0.71, ...]
    trading_time_pct: float,
    correlation_factor: float = 3.0,
    max_slots: int = 1,
) -> dict:
    """
    Find the optimal number of pairs that maximizes portfolio PnL.
    """
    best_n = 0
    best_score = 0
    results = []

    for n in range(1, len(pair_edges) + 1):
        avg_edge = np.mean(pair_edges[:n])

        n_eff = n / correlation_factor
        p_active = 1 - (1 - trading_time_pct) ** n_eff
        expected_active = n_eff * trading_time_pct
        utilization = min(expected_active, max_slots) / max_slots
        fill_eff = min(p_active, utilization)

        portfolio_score = avg_edge * fill_eff * 365

        results.append({
            "n_pairs": n,
            "avg_edge": avg_edge,
            "fill_efficiency": fill_eff,
            "portfolio_annualized": portfolio_score,
        })

        if portfolio_score > best_score:
            best_score = portfolio_score
            best_n = n

    return {
        "optimal_n": best_n,
        "optimal_score": best_score,
        "results": results,
    }


edges = [0.89, 0.82, 0.71, 0.65, 0.55, 0.48, 0.40, 0.31, 0.22, 0.12,
         0.08, 0.05, 0.02, -0.01, -0.05]

opt = optimal_pairs_count(edges, trading_time_pct=0.15, correlation_factor=3.0)
print(f"Optimal number of pairs: {opt['optimal_n']}")
print(f"Portfolio annualized: {opt['optimal_score']:.1f}%")

for r in opt["results"]:
    print(f"  N={r['n_pairs']:2d}: avg_edge={r['avg_edge']:.2f}%, "
          f"fill_eff={r['fill_efficiency']:.1%}, "
          f"portfolio={r['portfolio_annualized']:.1f}%")

The optimum is typically found at the point where the marginal fill_efficiency from adding a pair no longer compensates for the decline in average edge. For a typical crypto portfolio:

Strategy B (5% time): optimum at 8-12 pairs
Strategy A (15% time): optimum at 6-10 pairs
Strategy C (45% time): optimum at 4-6 pairs

Cross-Pair Diversification: Strategies for Reducing Correlation

If you cannot increase the number of pairs indefinitely, you can reduce $C_f$ — that is, increase signal diversity.

Strategy 1: Mix of Liquid and DeFi Tokens

BTC, ETH, BNB are very strongly correlated. But UNI (DEX), AAVE (lending), CRV (stablecoins) may have their own drivers. Adding DeFi tokens reduces the average $\bar{\rho}$ from 0.35 to 0.20-0.25:

$C_f = 1 + 9 \times 0.20 = 2.8 \quad \text{(vs. 3.25 at } \bar{\rho} = 0.25\text{)}$

Strategy 2: Different Strategies on the Same Pairs

Momentum enters long when price rises
Mean-reversion enters long when price falls

$\bar{\rho}_{\text{cross-strategy}} < 0 \implies C_f < 1 \implies N_{eff} > N$

This is the only way to get $N_{eff} > N$ — use strategies with negative signal correlation.

Strategy 3: Spot vs. Futures

Practical Recommendations by Strategy Type

High-Frequency Low-Time Strategies (trading time < 10%)

Typical representative: Strategy B (5% time, 38 trades over 750 days).

Number of pairs: 10-15 (optimum for edge/fill balance)
Problem: fill_efficiency is low even with 15 pairs (~20-25%)
Solution: mandatory combination with other strategies in the orchestrator
$C_f$ for crypto: 2.5-3.5
Monitoring: rolling $C_f$ to adapt pair count to market regime

Medium-Time Strategies (trading time 10-30%)

Typical representative: Strategy A (15% time, 418 trades over 750 days).

Number of pairs: 6-10
Fill_efficiency at 10 pairs: ~40%
Solution: combining 2-3 such strategies fills 80%+ of the time
$C_f$ for crypto: 2.5-3.5
Focus: select pairs with maximum edge, don't chase quantity

High-Time Strategies (trading time > 30%)

Typical representative: Strategy C (45% time).

Number of pairs: 4-6
Fill_efficiency at 6 pairs: ~80%
Problem: overflow — multiple pairs active simultaneously, but fewer slots
Solution: increase max_slots or add pair prioritization
$C_f$ for crypto: 2.5-4.0 (higher due to long positions overlapping crises)

Summary Table

Parameter	Strategy B (5%)	Strategy A (15%)	Strategy C (45%)
Recommended $N$	10-15	6-10	4-6
$N_{eff}$ at $C_f=3$	3.3-5.0	2.0-3.3	1.3-2.0
Fill eff. (1 slot)	15-23%	32-42%	77-89%
Combination needed?	Mandatory	Desirable	No
Bottleneck	Few signals	Balance	Overflow

Connection to Other Metrics in the Series

This article is the eleventh in the "Backtests Without Illusions" series. Signal correlation directly impacts the metrics from previous articles:

PnL per Active Time: fill_efficiency is the key multiplier in the effective return formula. If you overestimate fill_efficiency by ignoring correlation, your portfolio PnL forecast will be overly optimistic.
Funding rates: with high correlation, positions open simultaneously — and funding costs grow linearly with the number of slots. Overflow + funding = accelerated capital burn.
Funding rate arbitrage: arbitrage strategies are a natural diversifier that reduces the portfolio's $C_f$ . Their signals are weakly correlated with momentum and mean-reversion strategies.
Combo strategies (next article): how to assemble a portfolio of strategies with different $p$ and $C_f$ to achieve 90%+ utilization. Cascade orchestration accounts for signal correlation when assigning priorities.

Conclusion

Diversification in crypto is not about the number of pairs. 10 correlated pairs yield the effect of 3-4 independent ones. During panic, even fewer.

Four takeaways:

Calculate effective_N, not N. For crypto pairs $C_f \approx 3$ . Ten pairs are ~3.3 effective ones. Plan fill_efficiency based on $N_{eff}$ , not $N$ .
Measure signal correlation, not price correlation. Price correlation is a proxy, not a substitute. Run the strategy on all pairs and compute the binary signal correlation matrix.
Account for edge degradation. More pairs means lower average PnL/day. The optimum is at the point where marginal fill_efficiency from a new pair still compensates for the edge decline.
Reduce $C_f$ , don't increase $N$ . Combining different strategies on the same pairs is more effective than one strategy on more pairs. Cross-strategy diversification can yield $N_{eff} > N$ .

Correlation factor is the hidden variable that determines how realistic your utilization and return forecasts are. Ignoring it means building a portfolio on illusions.

Useful Links

Citation

@article{soloviov2026signalcorrelation,
  author = {Soloviov, Eugen},
  title = {Signal Correlation: How Many Pairs to Monitor},
  year = {2026},
  url = {https://marketmaker.cc/ru/blog/post/signal-correlation-pairs},
  version = {0.1.0},
  description = {Why 10 crypto pairs don't provide 10x diversification, how to calculate effective\_N via correlation\_factor, and how many pairs you really need to monitor for 80-90\% orchestrator slot utilization.}
}

The Illusion of Diversification in Crypto

Why 10 Pairs ≠ 10x Diversification

Formal Statement

Correlation Turns 10 Pairs Into 3

Correlation in Crypto Markets

BTC as the Dominant Factor

Signal Correlation vs. Price Correlation

Temporal Correlation: Calm vs. Panic

effective_N: The Key Concept

Formula and Derivation

How to Estimate correlation_factor from Data

Slot Utilization Modeling

Formula for P(≥1 active)P(\geq 1\ \text{active})P(≥1 active)

Multi-Slot Orchestrator

Simulation from Real Data

How Many Pairs to Monitor? Diminishing Returns Analysis

Edge Degradation Across Pairs

Optimal Number of Pairs: Unified Model

Cross-Pair Diversification: Strategies for Reducing Correlation

Strategy 1: Mix of Liquid and DeFi Tokens

Strategy 2: Different Strategies on the Same Pairs

Strategy 3: Spot vs. Futures

Practical Recommendations by Strategy Type

High-Frequency Low-Time Strategies (trading time < 10%)

Medium-Time Strategies (trading time 10-30%)

High-Time Strategies (trading time > 30%)

Summary Table

Connection to Other Metrics in the Series

Conclusion

Useful Links

Citation

Read More

PnL by Active Time: The Metric That Changes Strategy Rankings

Multi-Symbol Validation: Test Your Strategy on All Pairs

Cascade Strategies: Priority Execution with Fallback Filling

बाज़ार से आगे रहें

The Illusion of Diversification in Crypto

Why 10 Pairs ≠ 10x Diversification

Formal Statement

Correlation Turns 10 Pairs Into 3

Correlation in Crypto Markets

BTC as the Dominant Factor

Signal Correlation vs. Price Correlation

Temporal Correlation: Calm vs. Panic

effective_N: The Key Concept

Formula and Derivation

How to Estimate correlation_factor from Data

Slot Utilization Modeling

Formula for P(≥1 active)P(\geq 1\ \text{active})P(≥1 active)

Multi-Slot Orchestrator

Simulation from Real Data

How Many Pairs to Monitor? Diminishing Returns Analysis

Edge Degradation Across Pairs

Optimal Number of Pairs: Unified Model

Cross-Pair Diversification: Strategies for Reducing Correlation

Strategy 1: Mix of Liquid and DeFi Tokens

Strategy 2: Different Strategies on the Same Pairs

Strategy 3: Spot vs. Futures

Practical Recommendations by Strategy Type

High-Frequency Low-Time Strategies (trading time < 10%)

Medium-Time Strategies (trading time 10-30%)

High-Time Strategies (trading time > 30%)

Summary Table

Connection to Other Metrics in the Series

Conclusion

Useful Links

Citation

Formula for $P(\geq 1\ \text{active})$

Formula for $P(\geq 1\ \text{active})$