May 21, 2025
5 min read

Distance Approach in Pairs Trading: Implementation and Analysis with Rust

pairs-trading
rust
algorithmic-trading
statistical-arbitrage
distance-approach

The Distance Approach in pairs trading owes its popularity to its simplicity and effectiveness. The technique identifies related asset pairs through statistical measures and trades on the divergence and convergence of their price relationship. This article analyzes both basic and advanced Distance Approach methodologies, with practical Rust implementations aimed at high-frequency traders, algorithmic developers, and quantitative programmers.

Theoretical Foundation of the Distance Approach

The Distance Approach establishes a framework for pairs trading based on normalized price movements between assets. At its core, the method uses Euclidean squared distance measurements to identify assets that historically move together and generates trading signals when their normalized price divergence exceeds statistically significant thresholds[2].

This approach consists of two primary stages:

  1. Pairs formation - identifying statistically related asset pairs
  2. Trading signal generation - creating entry and exit rules based on divergence

Mathematical Basis

The basic implementation uses the squared Euclidean distance between normalized price series. For two assets with normalized price time series X and Y, we calculate:

fn euclidean_squared_distance(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "Time series must have equal length");
    
    x.iter()
        .zip(y.iter())
        .map(|(xi, yi)| (xi - yi).powi(2))
        .sum()
}

This distance metric helps identify assets that historically move together, providing the foundation for statistical arbitrage opportunities[2].

Basic Distance Approach Implementation

Data Normalization

Before calculating distances, we must normalize price data to establish comparable scales. Min-max normalization is commonly applied:

fn min_max_normalize(prices: &[f64]) -> Vec<f64> {
    if prices.is_empty() {
        return Vec::new();
    }
    
    let min_price = prices.iter().fold(f64::INFINITY, |a, &b| a.min(b));
    let max_price = prices.iter().fold(f64::NEG_INFINITY, |a, &b| a.max(b));
    let range = max_price - min_price;
    
    if range.abs() < f64::EPSILON {
        return vec![0.5; prices.len()];
    }
    
    prices.iter()
        .map(|&price| (price - min_price) / range)
        .collect()
}

Finding the Closest Pairs

We identify potential pairs by calculating the squared Euclidean distance between all asset combinations and selecting those with the smallest distances:

#[derive(Debug, Clone)]
struct StockPair {
    stock1_idx: usize,
    stock2_idx: usize,
    distance: f64,
}

impl PartialEq for StockPair {
    fn eq(&self, other: &Self) -> bool {
        self.distance.eq(&other.distance)
    }
}

impl Eq for StockPair {}

impl PartialOrd for StockPair {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        self.distance.partial_cmp(&other.distance)
    }
}

impl Ord for StockPair {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        self.partial_cmp(other).unwrap_or(std::cmp::Ordering::Equal)
    }
}

use std::collections::BinaryHeap;

fn find_closest_pairs(normalized_prices: &[Vec<f64>], top_n: usize) -> Vec<StockPair> {
    let stock_count = normalized_prices.len();
    // Max-heap ordered by distance: the root is the farthest pair currently kept
    let mut pairs: BinaryHeap<StockPair> = BinaryHeap::new();
    
    for i in 0..stock_count {
        for j in (i+1)..stock_count {
            let distance = euclidean_squared_distance(&normalized_prices[i], &normalized_prices[j]);
            
            pairs.push(StockPair {
                stock1_idx: i,
                stock2_idx: j,
                distance,
            });
            
            // Evict the farthest pair so only the top N closest remain
            if pairs.len() > top_n {
                pairs.pop();
            }
        }
    }
    
    // Drain the heap and sort ascending by distance
    let mut result = pairs.into_vec();
    result.sort();
    result
}

Calculating Historical Volatility

Historical volatility calculation is crucial for setting appropriate trading thresholds:

fn calculate_spread_volatility(normalized_price1: &[f64], normalized_price2: &[f64]) -> f64 {
    assert_eq!(normalized_price1.len(), normalized_price2.len());
    
    // Calculate price spread
    let spread: Vec<f64> = normalized_price1.iter()
        .zip(normalized_price2.iter())
        .map(|(p1, p2)| p1 - p2)
        .collect();
    
    // Calculate mean of spread
    let mean = spread.iter().sum::<f64>() / spread.len() as f64;
    
    // Calculate standard deviation
    let variance = spread.iter()
        .map(|&x| (x - mean).powi(2))
        .sum::<f64>() / spread.len() as f64;
    
    variance.sqrt()
}

Advanced Selection Methods

Industry Group Filtering

Restricting pair selection to the same industry can enhance performance by selecting economically related assets:

fn find_industry_pairs(
    normalized_prices: &[Vec<f64>], 
    industry_codes: &[usize], 
    top_n_per_industry: usize
) -> Vec<StockPair> {
    // Group stocks by industry
    let mut industry_groups: std::collections::HashMap<usize, Vec<usize>> = std::collections::HashMap::new();
    
    for (idx, &code) in industry_codes.iter().enumerate() {
        industry_groups.entry(code).or_default().push(idx);
    }
    
    // Find closest pairs within each industry
    let mut all_pairs = Vec::new();
    
    for (_industry_code, stock_indices) in industry_groups {
        let mut industry_pairs = Vec::new();
        
        for i in 0..stock_indices.len() {
            for j in (i+1)..stock_indices.len() {
                let stock1_idx = stock_indices[i];
                let stock2_idx = stock_indices[j];
                let distance = euclidean_squared_distance(
                    &normalized_prices[stock1_idx], 
                    &normalized_prices[stock2_idx]
                );
                
                industry_pairs.push(StockPair {
                    stock1_idx,
                    stock2_idx,
                    distance,
                });
            }
        }
        
        // Sort pairs by distance
        industry_pairs.sort_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());
        
        // Take top N from each industry
        let top_pairs: Vec<StockPair> = industry_pairs.into_iter()
            .take(top_n_per_industry)
            .collect();
            
        all_pairs.extend(top_pairs);
    }
    
    all_pairs
}

Zero-Crossings Method

The zero-crossings approach identifies pairs with frequent convergence and divergence, potentially indicating more profitable trading opportunities:

fn count_zero_crossings(spread: &[f64]) -> usize {
    if spread.len() < 2 {
        return 0;
    }
    
    let mut count = 0;
    for i in 1..spread.len() {
        if (spread[i-1] < 0.0 && spread[i] >= 0.0) || 
           (spread[i-1] >= 0.0 && spread[i] < 0.0) {
            count += 1;
        }
    }
    
    count
}

fn find_zero_crossing_pairs(
    normalized_prices: &[Vec<f64>],
    top_distance_threshold: f64,
    min_crossings: usize
) -> Vec<StockPair> {
    let stock_count = normalized_prices.len();
    let mut qualifying_pairs = Vec::new();
    
    for i in 0..stock_count {
        for j in (i+1)..stock_count {
            let distance = euclidean_squared_distance(&normalized_prices[i], &normalized_prices[j]);
            
            // Only consider pairs with distance below threshold
            if distance < top_distance_threshold {
                // Calculate spread
                let spread: Vec<f64> = normalized_prices[i].iter()
                    .zip(normalized_prices[j].iter())
                    .map(|(p1, p2)| p1 - p2)
                    .collect();
                
                let crossings = count_zero_crossings(&spread);
                
                if crossings >= min_crossings {
                    qualifying_pairs.push(StockPair {
                        stock1_idx: i,
                        stock2_idx: j,
                        distance,
                    });
                }
            }
        }
    }
    
    // Sort by distance (StockPair could be extended to store the crossing
    // count if sorting by crossings is preferred)
    qualifying_pairs.sort_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());
    qualifying_pairs
}

Historical Standard Deviation Consideration

This method addresses a limitation of the basic approach by prioritizing pairs with higher spread volatility, which can increase profit potential:

fn find_highsd_pairs(
    normalized_prices: &[Vec<f64>],
    top_distance_count: usize,
    min_volatility: f64
) -> Vec<StockPair> {
    let stock_count = normalized_prices.len();
    let mut all_pairs = Vec::new();
    
    for i in 0..stock_count {
        for j in (i+1)..stock_count {
            let distance = euclidean_squared_distance(&normalized_prices[i], &normalized_prices[j]);
            
            // Spread volatility (calculate_spread_volatility builds the spread internally)
            let volatility = calculate_spread_volatility(&normalized_prices[i], &normalized_prices[j]);
            
            if volatility >= min_volatility {
                all_pairs.push(StockPair {
                    stock1_idx: i,
                    stock2_idx: j,
                    distance,
                });
            }
        }
    }
    
    // Sort by distance
    all_pairs.sort_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());
    
    // Take the N closest pairs among those meeting the volatility floor
    all_pairs.into_iter().take(top_distance_count).collect()
}

Advanced Approach: Pearson Correlation Method

The Pearson Correlation approach offers several advantages over the basic Distance Approach, focusing on return correlations rather than price distances[1].

Implementation in Rust

fn pearson_correlation(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "Arrays must have the same length");
    
    let n = x.len() as f64;
    let sum_x: f64 = x.iter().sum();
    let sum_y: f64 = y.iter().sum();
    let sum_xx: f64 = x.iter().map(|&val| val * val).sum();
    let sum_yy: f64 = y.iter().map(|&val| val * val).sum();
    let sum_xy: f64 = x.iter().zip(y.iter()).map(|(&xi, &yi)| xi * yi).sum();
    
    let numerator = n * sum_xy - sum_x * sum_y;
    let denominator = ((n * sum_xx - sum_x * sum_x) * (n * sum_yy - sum_y * sum_y)).sqrt();
    
    if denominator.abs() < f64::EPSILON {
        return 0.0;
    }
    
    numerator / denominator
}

struct PearsonPair {
    stock_idx: usize,
    comover_indices: Vec<usize>,
    correlations: Vec<f64>,
}

fn find_pearson_pairs(returns: &[Vec<f64>], top_n_comovers: usize) -> Vec<PearsonPair> {
    let stock_count = returns.len();
    let mut all_pairs = Vec::new();
    
    for i in 0..stock_count {
        let mut correlations = Vec::with_capacity(stock_count - 1);
        
        for j in 0..stock_count {
            if i == j {
                continue;
            }
            
            let correlation = pearson_correlation(&returns[i], &returns[j]).abs();
            correlations.push((j, correlation));
        }
        
        // Sort by correlation (highest first)
        correlations.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
        
        // Take top N comovers
        let top_comovers: Vec<(usize, f64)> = correlations.into_iter()
            .take(top_n_comovers)
            .collect();
            
        let (comover_indices, correlation_values): (Vec<usize>, Vec<f64>) = 
            top_comovers.into_iter().unzip();
            
        all_pairs.push(PearsonPair {
            stock_idx: i,
            comover_indices,
            correlations: correlation_values,
        });
    }
    
    all_pairs
}

Portfolio Formation and Beta Calculation

The Pearson approach creates portfolios of comovers for each stock, then calculates regression coefficients:

fn calculate_beta(stock_returns: &[f64], portfolio_returns: &[f64]) -> f64 {
    let cov_xy = covariance(stock_returns, portfolio_returns);
    let var_x = variance(portfolio_returns);
    
    if var_x.abs() < f64::EPSILON {
        return 0.0;
    }
    
    cov_xy / var_x
}

fn covariance(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len());
    
    let n = x.len() as f64;
    let mean_x: f64 = x.iter().sum::<f64>() / n;
    let mean_y: f64 = y.iter().sum::<f64>() / n;
    
    let sum_cov: f64 = x.iter()
        .zip(y.iter())
        .map(|(&xi, &yi)| (xi - mean_x) * (yi - mean_y))
        .sum();
    
    sum_cov / n
}

fn variance(x: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mean: f64 = x.iter().sum::<f64>() / n;
    
    let sum_var: f64 = x.iter()
        .map(|&xi| (xi - mean).powi(2))
        .sum();
    
    sum_var / n
}

Trading Signal Generation

The final step in both approaches is generating trading signals based on divergence thresholds:

#[derive(Debug, Clone, Copy, PartialEq)]
enum TradingSignal {
    Long,
    Short,
    Neutral,
}

struct TradePosition {
    stock1_idx: usize,
    stock2_idx: usize,
    signal: TradingSignal,
    entry_spread: f64,
    timestamp: usize,
}

fn generate_trading_signals(
    normalized_prices: &[Vec<f64>],
    pairs: &[StockPair], 
    threshold_multiplier: f64,
    volatilities: &[f64],
    current_time: usize
) -> Vec<TradePosition> {
    let mut positions = Vec::new();
    
    for (pair_idx, pair) in pairs.iter().enumerate() {
        let stock1_idx = pair.stock1_idx;
        let stock2_idx = pair.stock2_idx;
        
        // Calculate current spread
        let current_spread = normalized_prices[stock1_idx][current_time] - 
                             normalized_prices[stock2_idx][current_time];
                             
        let threshold = threshold_multiplier * volatilities[pair_idx];
        
        let signal = if current_spread > threshold {
            // Stock1 is overvalued relative to Stock2
            TradingSignal::Short
        } else if current_spread < -threshold {
            // Stock1 is undervalued relative to Stock2
            TradingSignal::Long
        } else {
            TradingSignal::Neutral
        };
        
        if signal != TradingSignal::Neutral {
            positions.push(TradePosition {
                stock1_idx,
                stock2_idx,
                signal,
                entry_spread: current_spread,
                timestamp: current_time,
            });
        }
    }
    
    positions
}
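The function above only opens positions. In the classic distance method, a position is closed when the spread converges back through zero (or at the end of the trading period). A minimal sketch of such an exit rule; should_close_position is an illustrative helper, not part of the article's API:

```rust
/// Illustrative exit rule: close once the spread has crossed zero
/// relative to its sign at entry (distance-method convergence exit).
fn should_close_position(entry_spread: f64, current_spread: f64) -> bool {
    entry_spread.signum() != current_spread.signum() || current_spread == 0.0
}

fn main() {
    // Entered short the spread at +0.12; it has now converged through zero
    assert!(should_close_position(0.12, -0.01));
    // Still diverged on the same side: hold
    assert!(!should_close_position(0.12, 0.05));
    // Entered long at -0.08; the spread reverted above zero
    assert!(should_close_position(-0.08, 0.02));
    println!("exit rule checks passed");
}
```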

Performance Optimization

For high-frequency trading systems, performance is critical. SIMD (Single Instruction, Multiple Data) instructions can significantly accelerate distance calculations:

#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Callers must verify AVX support at runtime first,
// e.g. with is_x86_feature_detected!("avx")
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn euclidean_distance_simd(x: &[f32], y: &[f32]) -> f32 {
    assert_eq!(x.len(), y.len());
    
    let mut sum = _mm256_setzero_ps();
    let chunks = x.len() / 8;
    
    for i in 0..chunks {
        let xi = _mm256_loadu_ps(x.as_ptr().add(i * 8));
        let yi = _mm256_loadu_ps(y.as_ptr().add(i * 8));
        
        let diff = _mm256_sub_ps(xi, yi);
        let squared = _mm256_mul_ps(diff, diff);
        sum = _mm256_add_ps(sum, squared);
    }
    
    // Reduce the 8 lanes, then handle the scalar tail
    let mut result = horizontal_sum_ps(sum);
    
    for i in (chunks * 8)..x.len() {
        result += (x[i] - y[i]).powi(2);
    }
    
    result.sqrt()
}

// Helper: sum the 8 f32 lanes of an AVX register
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn horizontal_sum_ps(v: __m256) -> f32 {
    let hi = _mm256_extractf128_ps(v, 1);
    let lo = _mm256_castps256_ps128(v);
    let sum128 = _mm_add_ps(hi, lo);                               // 4 lanes
    let sum64 = _mm_add_ps(sum128, _mm_movehl_ps(sum128, sum128)); // 2 lanes
    let sum32 = _mm_add_ss(sum64, _mm_shuffle_ps(sum64, sum64, 0x1)); // 1 lane
    _mm_cvtss_f32(sum32)
}

Asynchronous processing can further improve throughput, especially when dealing with multiple stock pairs:

use tokio::task;
use futures::future::join_all;

async fn process_pairs_async(
    normalized_prices: &[Vec<f64>],
    stock_count: usize,
    chunk_size: usize
) -> Vec<StockPair> {
    let mut tasks = Vec::new();
    
    // Share the price matrix across tasks instead of cloning it per task
    let shared_prices = std::sync::Arc::new(normalized_prices.to_vec());
    
    // Split work into chunks
    let chunks = (stock_count + chunk_size - 1) / chunk_size;
    
    for chunk in 0..chunks {
        let start = chunk * chunk_size;
        let end = std::cmp::min((chunk + 1) * chunk_size, stock_count);
        
        let prices_clone = std::sync::Arc::clone(&shared_prices);
        
        let task = task::spawn(async move {
            let mut pairs = Vec::new();
            
            for i in start..end {
                for j in (i+1)..stock_count {
                    let distance = euclidean_squared_distance(&prices_clone[i], &prices_clone[j]);
                    
                    pairs.push(StockPair {
                        stock1_idx: i,
                        stock2_idx: j,
                        distance,
                    });
                }
            }
            
            pairs
        });
        
        tasks.push(task);
    }
    
    // Await all tasks and combine results
    let results = join_all(tasks).await;
    
    let mut all_pairs = Vec::new();
    for result in results {
        if let Ok(pairs) = result {
            all_pairs.extend(pairs);
        }
    }
    
    // Sort by distance
    all_pairs.sort_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap_or(std::cmp::Ordering::Equal));
    all_pairs
}

Testing the Strategy Implementation

To evaluate our implementation, we need proper testing infrastructure:

#[cfg(test)]
mod tests {
    use super::*;
    
    #[test]
    fn test_normalization() {
        let prices = vec![10.0, 15.0, 12.0, 18.0, 20.0];
        let normalized = min_max_normalize(&prices);
        
        let expected = vec![0.0, 0.5, 0.2, 0.8, 1.0];
        
        for (a, b) in normalized.iter().zip(expected.iter()) {
            assert!((a - b).abs() < 0.001);
        }
    }
    
    #[test]
    fn test_euclidean_distance() {
        let x = vec![0.1, 0.2, 0.3, 0.4, 0.5];
        let y = vec![0.15, 0.22, 0.35, 0.38, 0.53];
        
        let distance = euclidean_squared_distance(&x, &y);
        let expected = 0.0067; // 0.05² + 0.02² + 0.05² + 0.02² + 0.03²
        
        assert!((distance - expected).abs() < 0.0001);
    }
    
    #[test]
    fn test_pearson_correlation() {
        let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
        let y = vec![5.0, 4.0, 3.0, 2.0, 1.0];
        
        let corr = pearson_correlation(&x, &y);
        let expected = -1.0; // Perfect negative correlation
        
        assert!((corr - expected).abs() < 0.0001);
    }
    
    // Integration tests would be implemented in tests/ directory
}

For integration testing, we would follow the Rust convention of placing tests in a separate tests directory at the project root[15][18].

Conclusion

The Distance Approach provides a robust framework for pairs trading, with both basic and advanced methodologies offering valuable statistical arbitrage opportunities. The basic approach, with its focus on Euclidean distance, offers simplicity and effectiveness, while the Pearson Correlation approach provides additional flexibility and potentially better divergence reversion characteristics.

Rust's performance characteristics make it an ideal language for implementing these computationally intensive strategies, especially with optimizations like SIMD and concurrent processing. The combination of statistical rigor and efficient implementation creates a powerful toolkit for algorithmic traders.

When implementing a pairs trading system, several considerations should be made:

  1. The trade-off between simplicity (basic approach) and enhanced statistical power (Pearson approach)
  2. The computational resources required for large-scale pair analysis
  3. Transaction costs, which can significantly impact profitability[3]
  4. The need for continuous monitoring and recalibration of pairs

By combining the Distance Approach with Rust's performance capabilities, traders can develop highly efficient and effective statistical arbitrage systems capable of operating at the speed and scale required for modern markets.

Citation

@software{soloviov2025distanceapproach,
  author = {Soloviov, Eugen},
  title = {Distance Approach in Pairs Trading: Implementation and Analysis with Rust},
  year = {2025},
  url = {https://marketmaker.cc/en/blog/post/distance-approach-pairs-trading},
  version = {0.1.0},
  description = {A comprehensive analysis of basic and advanced Distance Approach methodologies for pairs trading, with practical implementations in Rust tailored for high-frequency traders and algorithmic developers.}
}

References

  1. Hudson Thames - Introduction to Distance Approach in Pairs Trading Part II
  2. Hudson Thames - Distance Approach in Pairs Trading Part I
  3. Reddit - Pairs Trading is Too Good to Be True?
  4. GitHub - Kucoin Arbitrage
  5. docs.rs - Euclidean Distance in geo crate
  6. Simple Linear Regression in Rust
  7. GitHub - correlation_rust
  8. docs.rs - Cointegration in algolotl-ta
  9. GitHub - trading_engine_rust
  10. docs.rs - distances crate
  11. Reddit - Looking for stats crate for Dickey-Fuller
  12. crates.io - crypto-pair-trader
  13. w3resource - Rust Structs and Enums Exercise
  14. Rust Book - Test Organization
  15. Design Patterns in Rust
  16. GitHub - simd-euclidean
  17. Rust by Example - Integration Testing
  18. YouTube - Integration Testing in Rust
  19. Stack Overflow - Calculate Total Distance Between Multiple Points
  20. Databento - Pairs Trading Example
  21. Rust std - f64 Primitive
  22. Hudson & Thames - Distance Approach Documentation
  23. GitHub - trading-algorithms-rust
  24. docs.rs - linreg crate
  25. Rust Book - References and Borrowing
  26. Stack Overflow - How to Interpret adfuller Test Results
  27. lib.rs - arima crate
  28. Econometrics with R - Cointegration
  29. DolphinDB - adfuller Function
  30. docs.rs - arima crate (latest)
  31. Wikipedia - Cointegration

MarketMaker.cc Team

Quantitative Research and Strategy
