Realistic Backtesting: Transaction Costs, Slippage, and Walk-Forward Optimization
Abstract
Most strategies that look strong in backtests fail in live trading because of unrealistic assumptions about costs, slippage, and optimization methodology. This paper details our backtesting framework, designed to minimize the gap between simulated and live performance.
The Backtest-to-Live Gap
Common reasons strategies fail live:
| Factor | Typical Impact | Our Approach |
|---|---|---|
| Transaction costs | -20% to -50% of gross returns | Full cost modeling |
| Slippage | -10% to -30% of gross returns | Dynamic slippage |
| Market impact | Variable, often ignored | Size-based impact model |
| Overfitting | Strategy fails completely | Walk-forward testing |
| Look-ahead bias | Inflated win rates | Point-in-time data |
Transaction Cost Modeling
Fee Structure
We model complete exchange fee structures:
```python
from dataclasses import dataclass

@dataclass
class FeeModel:
    maker_fee: float = 0.001        # 0.1% for limit orders
    taker_fee: float = 0.002        # 0.2% for market orders
    funding_rate: float = 0.0001    # 0.01% per 8h funding period for perpetuals
    withdrawal_fee: float = 0.0005  # Network fees
```
Round-Trip Cost Calculation
```python
def calculate_round_trip_cost(
    entry_type: str,       # 'maker' or 'taker'
    exit_type: str,        # 'maker' or 'taker'
    position_size: float,  # notional size (not used by the fee math; costs are fractions of notional)
    funding_periods: int,  # number of 8h funding periods the position is held
    fee_model: FeeModel,
) -> float:
    """Calculate total round-trip transaction costs as a fraction of notional."""
    entry_fee = (
        fee_model.maker_fee if entry_type == 'maker' else fee_model.taker_fee
    )
    exit_fee = (
        fee_model.maker_fee if exit_type == 'maker' else fee_model.taker_fee
    )
    total_funding = fee_model.funding_rate * funding_periods
    return entry_fee + exit_fee + total_funding
```
Realistic Cost Assumptions
For crypto trading:
- Conservative: 0.5% round trip (taker + taker + 2% annual funding)
- Moderate: 0.3% round trip (maker + taker + 1% annual funding)
- Optimistic: 0.15% round trip (maker + maker, minimal funding)
We default to conservative assumptions.
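As a sanity check, the fee model reproduces these figures directly. The snippet below is a minimal sketch using the FeeModel defaults above; the one-day hold (three 8-hour funding periods) and the notional size are illustrative assumptions.

```python
# Sketch: taker-in, taker-out round trip held for ~1 day (3 funding periods),
# using the FeeModel defaults above. The notional size is illustrative.
fees = FeeModel()
cost = calculate_round_trip_cost(
    entry_type='taker',
    exit_type='taker',
    position_size=10_000.0,
    funding_periods=3,
    fee_model=fees,
)
print(f"Round-trip cost: {cost:.4%}")  # 0.2% + 0.2% + 3 × 0.01% = 0.43%
```

This lands close to the conservative 0.5% figure; longer holds add funding periods and push the total higher.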
Slippage Modeling
Slippage is the difference between expected and actual execution price.
Fixed Slippage (Naive)
Many backtests assume a fixed slippage rate (e.g., 0.1%). This is unrealistic, because slippage depends on:
- Order size relative to book depth
- Market volatility
- Time of day
- Order type
Dynamic Slippage Model
```python
def estimate_slippage(
    order_size: float,
    side: str,          # 'buy' or 'sell'
    orderbook: OrderBook,
    volatility: float,  # daily volatility as a fraction (e.g. 0.02 = 2%)
) -> float:
    """Estimate slippage (in price units) from order size and market conditions."""
    # Base slippage: half the bid-ask spread
    spread = orderbook.best_ask - orderbook.best_bid
    spread_slippage = spread / 2

    # Impact slippage: order size relative to liquidity in the top 10 book levels
    if side == 'buy':
        available_liquidity = sum(level.volume for level in orderbook.asks[:10])
    else:
        available_liquidity = sum(level.volume for level in orderbook.bids[:10])
    liquidity_ratio = order_size / available_liquidity
    impact_slippage = orderbook.mid_price * liquidity_ratio * 0.01

    # Volatility adjustment against a 2% daily-vol baseline
    vol_adjustment = 1 + (volatility / 0.02)

    return (spread_slippage + impact_slippage) * vol_adjustment
```
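The `OrderBook` type referenced above is not defined in this section. A minimal sketch of the shape the slippage model assumes (best bid/ask, mid price, and per-level volumes) might look like the following; the field names are illustrative rather than a fixed API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BookLevel:
    price: float
    volume: float

@dataclass
class OrderBook:
    bids: List[BookLevel]  # sorted best (highest) price first
    asks: List[BookLevel]  # sorted best (lowest) price first

    @property
    def best_bid(self) -> float:
        return self.bids[0].price

    @property
    def best_ask(self) -> float:
        return self.asks[0].price

    @property
    def mid_price(self) -> float:
        return (self.best_bid + self.best_ask) / 2
```

With a structure like this, `estimate_slippage` returns a slippage estimate in price units that grows with order size, spread, and volatility.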
Market Impact Model (Square Root Law)
For larger orders, we use the square root market impact model:
Impact = Y × σ × √(Q / V)
Where:
- σ = daily price volatility
- Q = order size
- V = average daily volume (in the same units as Q)
- Y = impact coefficient (~0.1 for crypto)
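As a worked example under assumed, uncalibrated inputs (3% daily volatility, a 50-unit order against 10,000 units of average daily volume, Y = 0.1):

```python
import math

# Sketch: square-root-law impact with illustrative, uncalibrated inputs
sigma = 0.03           # 3% daily volatility
order_size = 50        # order size, in base units
daily_volume = 10_000  # average daily volume, same units
Y = 0.1                # impact coefficient (~0.1 for crypto, per the text)

impact = Y * sigma * math.sqrt(order_size / daily_volume)
print(f"Estimated impact: {impact:.4%}")  # ~0.02% of price, i.e. roughly 2 bps
```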
Walk-Forward Optimization
Walk-forward testing prevents overfitting by simulating how the strategy would be developed and deployed in real time.
The Process
Timeline:
[----Train 1----][Test 1][----Train 2----][Test 2]...
1. Train on historical data (e.g., 6 months)
2. Optimize parameters on training set
3. Test on unseen future data (e.g., 1 month)
4. Slide window forward
5. Repeat
Implementation
```python
from typing import List

import pandas as pd

def walk_forward_test(
    strategy: Strategy,
    data: pd.DataFrame,
    train_months: int = 6,
    test_months: int = 1,
    overlap: bool = False,
) -> List[TestResult]:
    """Perform walk-forward backtesting."""
    results = []

    # Calculate window sizes (months approximated as 30 daily bars)
    total_days = len(data)
    train_days = train_months * 30
    test_days = test_months * 30

    current_start = 0
    while current_start + train_days + test_days <= total_days:
        # Define train/test windows
        train_end = current_start + train_days
        test_end = train_end + test_days
        train_data = data.iloc[current_start:train_end]
        test_data = data.iloc[train_end:test_end]

        # Optimize parameters on training data only
        best_params = strategy.optimize(train_data)

        # Evaluate on unseen out-of-sample data
        strategy.set_params(best_params)
        result = strategy.backtest(test_data)

        results.append(TestResult(
            train_period=(current_start, train_end),
            test_period=(train_end, test_end),
            params=best_params,
            performance=result,
        ))

        # Slide the window forward
        if overlap:
            current_start += test_days  # overlapping training windows
        else:
            current_start = test_end    # disjoint train/test segments

    return results
```
Walk-Forward Efficiency Ratio
A robust strategy should show similar performance in-sample and out-of-sample:
```python
import numpy as np

def walk_forward_efficiency(results: List[TestResult]) -> float:
    """
    Calculate the walk-forward efficiency (WFE) ratio.

    WFE = Average Test Sharpe / Average Train Sharpe

    Good:       WFE > 0.5
    Acceptable: WFE 0.3-0.5
    Poor:       WFE < 0.3 (likely overfit)

    Assumes each TestResult exposes `train_sharpe` and `test_sharpe`.
    """
    train_sharpes = [r.train_sharpe for r in results]
    test_sharpes = [r.test_sharpe for r in results]
    return np.mean(test_sharpes) / np.mean(train_sharpes)
```
Point-in-Time Data
The Problem
Many data sources retroactively update historical data:
- Earnings restatements
- Dividend adjustments
- Split adjustments
- Delisting handling
Using restated data creates look-ahead bias.
Our Approach
We maintain point-in-time databases:
- Data stored as it appeared at each moment
- No retroactive updates
- Survivorship-bias-free (includes delisted assets)
- Timestamps for all data points
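Mechanically, the key is that every record carries both the time it describes and the time it became known to us, and queries filter on the latter. The sketch below assumes a hypothetical table layout with `event_time` and `observed_time` columns; the schema is illustrative, not our production format.

```python
import pandas as pd

def as_of(table: pd.DataFrame, as_of_time: pd.Timestamp) -> pd.DataFrame:
    """Return, for each asset/event, the latest value known at `as_of_time`.

    Assumes columns: 'asset', 'event_time' (what the value describes) and
    'observed_time' (when the value became available).
    """
    known = table[table['observed_time'] <= as_of_time]
    return (
        known.sort_values('observed_time')
             .groupby(['asset', 'event_time'], as_index=False)
             .last()
    )
```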
Monte Carlo Simulation
To understand the distribution of possible outcomes, we run Monte Carlo simulations:
```python
def monte_carlo_simulation(
    strategy: Strategy,
    data: pd.DataFrame,
    num_simulations: int = 1000,
    shuffle_method: str = 'bootstrap',
) -> MonteCarloResults:
    """Run Monte Carlo simulation to estimate the performance distribution."""
    results = []

    for _ in range(num_simulations):
        if shuffle_method == 'bootstrap':
            # Resample rows with replacement
            sampled_data = data.sample(frac=1, replace=True)
        elif shuffle_method == 'block_bootstrap':
            # Resample contiguous blocks to preserve autocorrelation
            sampled_data = block_resample(data, block_size=20)

        result = strategy.backtest(sampled_data)
        results.append(result)

    sharpes = [r.sharpe for r in results]
    return MonteCarloResults(
        mean_sharpe=np.mean(sharpes),
        std_sharpe=np.std(sharpes),
        percentile_5_sharpe=np.percentile(sharpes, 5),
        percentile_95_sharpe=np.percentile(sharpes, 95),
        probability_positive=sum(1 for s in sharpes if s > 0) / num_simulations,
    )
```
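`block_resample` is used above but not defined in this section. A minimal sketch of a moving-block bootstrap, assuming `data` is a bar-level DataFrame longer than one block, could look like this:

```python
import numpy as np
import pandas as pd

def block_resample(data: pd.DataFrame, block_size: int = 20) -> pd.DataFrame:
    """Draw random contiguous blocks (with replacement) and concatenate them
    until the original length is reached, preserving short-range autocorrelation."""
    n = len(data)
    num_blocks = int(np.ceil(n / block_size))
    # Random block start positions; each block stays within bounds
    starts = np.random.randint(0, n - block_size + 1, size=num_blocks)
    blocks = [data.iloc[s:s + block_size] for s in starts]
    return pd.concat(blocks).iloc[:n].reset_index(drop=True)
```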
Our Backtesting Pipeline
Complete pipeline for strategy evaluation:
```python
def evaluate_strategy(strategy: Strategy) -> EvaluationReport:
    """Complete strategy evaluation pipeline."""
    # 1. Load point-in-time data
    data = load_pit_data(strategy.asset, strategy.timeframe)

    # 2. Walk-forward optimization
    wf_results = walk_forward_test(strategy, data)

    # 3. Apply realistic costs and slippage to each out-of-sample result
    for result in wf_results:
        result.apply_transaction_costs(FeeModel())
        result.apply_slippage_model(DynamicSlippage())

    # 4. Monte Carlo simulation
    mc_results = monte_carlo_simulation(strategy, data)

    # 5. Walk-forward efficiency ratio
    wfe = walk_forward_efficiency(wf_results)

    # 6. Generate report
    return EvaluationReport(
        walk_forward_results=wf_results,
        monte_carlo_results=mc_results,
        walk_forward_efficiency=wfe,
        recommendation=(
            'PASS'
            if wfe > 0.3 and mc_results.probability_positive > 0.7
            else 'FAIL'
        ),
    )
```
Conclusion
Realistic backtesting requires:
- Comprehensive cost modeling: All fees, funding, and spreads
- Dynamic slippage: Based on order size and market conditions
- Walk-forward testing: Prevents overfitting
- Point-in-time data: No look-ahead bias
- Monte Carlo simulation: Understand performance distribution
Strategies that pass our framework have a much higher probability of live trading success.
For more on our methodology, see our audit framework.