Case Study 1: Backtesting a Mean-Reversion Strategy Across 1,000 Markets
Objective
In this case study, we build a complete end-to-end backtest of a mean-reversion strategy applied to 1,000 simulated prediction markets. We will simulate realistic fills, model transaction costs for a Polymarket-like platform, generate a comprehensive performance report, and test whether the results are statistically significant.
The goal is not to find a profitable strategy (the data is simulated), but to demonstrate the complete methodology for evaluating any prediction market strategy rigorously.
Background
Mean reversion is the hypothesis that prices tend to return to a long-run average after temporary deviations. In prediction markets, this manifests as follows: when a market's price deviates significantly from its recent average (or from a model-based fair value), the deviation is more likely to reverse than to persist.
The economic rationale is that temporary price spikes in prediction markets are often caused by:

- Noise traders reacting to irrelevant information
- Temporary liquidity imbalances
- Overreaction to news that is quickly corrected
Strategy Definition
Name: Z-Score Mean Reversion
Parameters:
- lookback: Number of periods for rolling mean and standard deviation (default: 30)
- entry_threshold: Z-score magnitude to trigger entry (default: 1.5)
- exit_threshold: Z-score magnitude to trigger exit (default: 0.5)
- max_position: Maximum contracts per market (default: 50)
Rules:

1. Compute the rolling mean and standard deviation of the price over the lookback window.
2. Calculate the z-score: $z = (P_{current} - \mu_{rolling}) / \sigma_{rolling}$
3. If $z < -\text{entry\_threshold}$: BUY (price is unusually low, expect reversion up).
4. If $z > +\text{entry\_threshold}$: SELL (price is unusually high, expect reversion down).
5. Exit when $|z| < \text{exit\_threshold}$ (price has reverted).
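To make the rules concrete, here is a small standalone sketch (not part of the backtest itself) that computes the z-score on a toy price series; the prices and thresholds are illustrative:

```python
import pandas as pd

# Toy series: stable around 0.50, then a jump to 0.62 on the last bar
prices = pd.Series([0.50, 0.51, 0.49, 0.50, 0.52, 0.50, 0.49, 0.51, 0.50, 0.62])
lookback, entry_threshold = 5, 1.5

mu = prices.rolling(lookback).mean()
sigma = prices.rolling(lookback).std()
zscore = (prices - mu) / sigma

z = zscore.iloc[-1]         # z-score of the final bar, approx. 1.77
print(z > entry_threshold)  # True -> rule 4 fires: SELL, expect reversion down
```

Note that the current price is included in its own rolling window, matching how the strategy below computes its statistics.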
Step 1: Generate Simulated Market Data
We generate 1,000 markets with realistic characteristics:
```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

def generate_prediction_market_data(n_markets=1000,
                                    n_periods=200,
                                    mean_reversion_strength=0.05,
                                    noise_level=0.03,
                                    seed=42):
    """Generate simulated prediction market data with mild mean reversion."""
    np.random.seed(seed)
    all_data = []
    for market_id in range(n_markets):
        # True probability (fixed for each market)
        true_prob = np.random.uniform(0.15, 0.85)

        # Generate prices with mean reversion toward the true probability
        prices = np.zeros(n_periods)
        prices[0] = np.clip(true_prob + np.random.randn() * 0.1, 0.05, 0.95)
        for t in range(1, n_periods):
            # Mean-reverting process (discretized Ornstein-Uhlenbeck)
            reversion = mean_reversion_strength * (true_prob - prices[t-1])
            noise = noise_level * np.random.randn()
            prices[t] = np.clip(prices[t-1] + reversion + noise, 0.01, 0.99)

        # Generate bid/ask around the price
        spreads = np.random.uniform(0.02, 0.06, n_periods)
        bids = np.clip(prices - spreads / 2, 0.01, 0.98)
        asks = np.clip(prices + spreads / 2, 0.02, 0.99)
        asks = np.maximum(asks, bids + 0.01)  # ensure ask > bid

        # Generate volume (higher near 0.5, lower near the extremes)
        base_volume = 100 + 400 * (1 - 4 * (prices - 0.5)**2)
        volumes = np.maximum(base_volume + np.random.randn(n_periods) * 50, 10)

        # Generate displayed bid/ask sizes
        bid_sizes = np.random.exponential(50, n_periods)
        ask_sizes = np.random.exponential(50, n_periods)

        # Resolution: drawn according to the true probability
        resolution = 1 if np.random.random() < true_prob else 0

        # Create timestamps
        start_date = datetime(2024, 1, 1) + timedelta(days=int(np.random.randint(0, 30)))
        timestamps = [start_date + timedelta(hours=4*i) for i in range(n_periods)]

        for t in range(n_periods):
            all_data.append({
                'timestamp': timestamps[t],
                'market_id': f'MKT_{market_id:04d}',
                'last_price': round(prices[t], 4),
                'bid': round(bids[t], 4),
                'ask': round(asks[t], 4),
                'volume': round(volumes[t], 1),
                'bid_size': round(bid_sizes[t], 1),
                'ask_size': round(ask_sizes[t], 1),
                'true_prob': true_prob,
                'resolution': resolution,
            })

    df = pd.DataFrame(all_data)
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    return df

# Generate data
data = generate_prediction_market_data(n_markets=1000, n_periods=200)
print(f"Generated {len(data):,} rows across {data['market_id'].nunique()} markets")
print(f"Date range: {data['timestamp'].min()} to {data['timestamp'].max()}")
```
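As a sanity check on the generator, the discretized Ornstein-Uhlenbeck update can be verified independently: regressing the one-step price change on the gap to the long-run level should recover a slope near `mean_reversion_strength`. This sketch runs its own small simulation (one long path, not the 1,000-market dataset) purely to illustrate the idea:

```python
import numpy as np

rng = np.random.default_rng(0)
true_prob, kappa, noise_level = 0.6, 0.05, 0.03

# Same update rule as the generator: p_t = p_{t-1} + kappa*(mu - p_{t-1}) + noise
p = np.empty(5000)
p[0] = 0.3
for t in range(1, len(p)):
    step = kappa * (true_prob - p[t-1]) + noise_level * rng.standard_normal()
    p[t] = np.clip(p[t-1] + step, 0.01, 0.99)

# OLS slope of the one-step change on the gap to the long-run level
dp = np.diff(p)
gap = true_prob - p[:-1]
kappa_hat = np.polyfit(gap, dp, 1)[0]
print(round(kappa_hat, 3))  # should land near kappa = 0.05
```

If the estimated slope were near zero, the simulated prices would be a random walk and the strategy would have no edge to find even in principle.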
Step 2: Implement the Strategy
```python
class MeanReversionStrategy:
    """Z-score mean reversion strategy for prediction markets."""

    def __init__(self, lookback=30, entry_threshold=1.5,
                 exit_threshold=0.5, max_position=50):
        self.lookback = lookback
        self.entry_threshold = entry_threshold
        self.exit_threshold = exit_threshold
        self.max_position = max_position
        self.positions = {}  # market_id -> position size

    def generate_signals(self, market_data: pd.DataFrame) -> pd.Series:
        """Generate entry/exit signals for a single market's data."""
        prices = market_data['last_price']

        # Rolling statistics (using only the trailing lookback window)
        rolling_mean = prices.rolling(self.lookback, min_periods=self.lookback).mean()
        rolling_std = prices.rolling(self.lookback, min_periods=self.lookback).std()

        # Z-score
        zscore = (prices - rolling_mean) / rolling_std

        # Generate signals: +1 = buy, -1 = sell, 0 = no action
        signals = pd.Series(0, index=market_data.index)
        for i in range(self.lookback, len(signals)):
            z = zscore.iloc[i]
            market_id = market_data['market_id'].iloc[i]
            current_pos = self.positions.get(market_id, 0)
            if np.isnan(z):
                continue
            if current_pos == 0:
                # No position: check for entry
                if z < -self.entry_threshold:
                    signals.iloc[i] = 1   # Buy: unusually low, expect reversion up
                    self.positions[market_id] = self.max_position
                elif z > self.entry_threshold:
                    signals.iloc[i] = -1  # Sell: unusually high, expect reversion down
                    self.positions[market_id] = -self.max_position
            else:
                # Have a position: check for exit
                if abs(z) < self.exit_threshold:
                    signals.iloc[i] = int(-np.sign(current_pos))  # Exit
                    self.positions[market_id] = 0
        return signals
```
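The asymmetric thresholds (enter at $|z| > 1.5$, exit only once $|z| < 0.5$) form a hysteresis band that prevents churning in and out while the z-score hovers near the entry level. This self-contained sketch runs the same entry/exit state machine on a hand-made z-score path; the path and thresholds are illustrative:

```python
import numpy as np

def count_round_trips(zscores, entry_threshold, exit_threshold):
    """Count completed round trips for the entry/exit state machine."""
    pos, round_trips = 0, 0
    for z in zscores:
        if pos == 0 and abs(z) > entry_threshold:
            pos = -np.sign(z)  # fade the deviation
        elif pos != 0 and abs(z) < exit_threshold:
            pos, round_trips = 0, round_trips + 1
    return round_trips

# A z-score path that oscillates around the entry level before reverting
z_path = [0.0, 1.6, 1.4, 1.6, 1.4, 0.4, -1.6, -0.3]
print(count_round_trips(z_path, 1.5, 0.5))   # wide band: 2 round trips
print(count_round_trips(z_path, 1.5, 1.45))  # exit just below entry: 3 round trips
```

Each extra round trip pays the spread, impact, and fees twice, so the wider band directly reduces cost drag.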
Step 3: Run the Backtest with Fill Simulation
```python
def run_backtest(data, strategy, fee_rate=0.02, impact_coeff=0.1):
    """Run the backtest across all markets with simulated execution."""
    results = {'trades': []}
    initial_capital = 100000
    cash = initial_capital
    positions = {}  # market_id -> {'qty', 'entry_price', 'entry_time', 'side'}

    markets = data['market_id'].unique()
    for market_id in markets:
        mkt_data = data[data['market_id'] == market_id].reset_index(drop=True)
        if len(mkt_data) < strategy.lookback + 10:
            continue

        # Fresh strategy state per market so positions don't leak across markets
        strategy_instance = MeanReversionStrategy(
            lookback=strategy.lookback,
            entry_threshold=strategy.entry_threshold,
            exit_threshold=strategy.exit_threshold,
            max_position=strategy.max_position,
        )
        signals = strategy_instance.generate_signals(mkt_data)

        for i in range(len(mkt_data)):
            signal = signals.iloc[i]
            row = mkt_data.iloc[i]
            if signal == 0:
                continue

            pos = positions.get(market_id)
            qty = abs(strategy.max_position)

            if signal > 0 and pos is None:
                # Open long: lift the ask, paying square-root market impact
                impact = impact_coeff * np.sqrt(qty / max(row['ask_size'], 1))
                fill_price = min(row['ask'] + impact, 0.99)
                cost = qty * fill_price
                fee = cost * fee_rate
                if cash >= cost + fee:
                    cash -= cost + fee
                    positions[market_id] = {'qty': qty, 'entry_price': fill_price,
                                            'entry_time': row['timestamp'], 'side': 'long'}
            elif signal < 0 and pos is None:
                # Open short: hit the bid, receiving proceeds less impact and fee
                # (collateral requirements for shorts are ignored for simplicity)
                impact = impact_coeff * np.sqrt(qty / max(row['bid_size'], 1))
                fill_price = max(row['bid'] - impact, 0.01)
                proceeds = qty * fill_price
                cash += proceeds - proceeds * fee_rate
                positions[market_id] = {'qty': qty, 'entry_price': fill_price,
                                        'entry_time': row['timestamp'], 'side': 'short'}
            elif pos is not None:
                # Any signal while a position is open is an exit instruction
                if pos['side'] == 'long':
                    impact = impact_coeff * np.sqrt(pos['qty'] / max(row['bid_size'], 1))
                    fill_price = max(row['bid'] - impact, 0.01)
                    cash += pos['qty'] * fill_price * (1 - fee_rate)
                    gross = (fill_price - pos['entry_price']) * pos['qty']
                else:
                    impact = impact_coeff * np.sqrt(pos['qty'] / max(row['ask_size'], 1))
                    fill_price = min(row['ask'] + impact, 0.99)
                    cash -= pos['qty'] * fill_price * (1 + fee_rate)
                    gross = (pos['entry_price'] - fill_price) * pos['qty']
                # Net out entry and exit fees so trade PnL matches the cash ledger
                exit_fee = pos['qty'] * fill_price * fee_rate
                entry_fee = pos['qty'] * pos['entry_price'] * fee_rate
                results['trades'].append({
                    'market_id': market_id,
                    'entry_time': pos['entry_time'],
                    'exit_time': row['timestamp'],
                    'entry_price': pos['entry_price'],
                    'exit_price': fill_price,
                    'qty': pos['qty'],
                    'pnl': gross - exit_fee - entry_fee,
                    'side': pos['side'],
                })
                del positions[market_id]

    # Settle any positions still open when their markets resolve to 0 or 1
    for market_id, pos in list(positions.items()):
        mkt_data = data[data['market_id'] == market_id]
        exit_price = mkt_data['resolution'].iloc[0]
        if pos['side'] == 'long':
            gross = (exit_price - pos['entry_price']) * pos['qty']
            cash += pos['qty'] * exit_price
        else:
            gross = (pos['entry_price'] - exit_price) * pos['qty']
            cash -= pos['qty'] * exit_price
        # Resolution fee charged on winnings only; entry fee was paid at entry
        resolution_fee = fee_rate * gross if gross > 0 else 0
        cash -= resolution_fee
        entry_fee = pos['qty'] * pos['entry_price'] * fee_rate
        results['trades'].append({
            'market_id': market_id,
            'entry_time': pos['entry_time'],
            'exit_time': mkt_data['timestamp'].iloc[-1],
            'entry_price': pos['entry_price'],
            'exit_price': exit_price,
            'qty': pos['qty'],
            'pnl': gross - resolution_fee - entry_fee,
            'side': pos['side'],
        })

    results['final_equity'] = cash
    results['initial_capital'] = initial_capital
    results['total_return'] = (cash - initial_capital) / initial_capital
    return results

# Run the backtest
strategy = MeanReversionStrategy(lookback=30, entry_threshold=1.5,
                                 exit_threshold=0.5, max_position=50)
results = run_backtest(data, strategy)
print(f"Total trades: {len(results['trades'])}")
print(f"Initial capital: ${results['initial_capital']:,.2f}")
print(f"Final equity: ${results['final_equity']:,.2f}")
print(f"Total return: {results['total_return']:.2%}")
```
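The square-root impact model used inside `run_backtest` is worth examining in isolation: slippage scales with the square root of order size relative to displayed depth, so quadrupling size relative to depth only doubles the impact penalty. A standalone sketch with illustrative numbers:

```python
import numpy as np

def simulated_buy_fill(ask, ask_size, qty, impact_coeff=0.1):
    """Fill price for a buy: quoted ask plus square-root market impact."""
    impact = impact_coeff * np.sqrt(qty / max(ask_size, 1))
    return min(ask + impact, 0.99)  # capped at the contract's price ceiling

# Small order relative to displayed depth: modest slippage
print(round(simulated_buy_fill(0.52, ask_size=100, qty=25), 3))   # 0.57
# Order 4x the displayed depth: impact dominates the quoted price
print(round(simulated_buy_fill(0.52, ask_size=25, qty=100), 3))   # 0.72
```

In the second case the effective entry price is 20 cents above the quote, which on a contract that can never be worth more than $1 wipes out most plausible edges.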
Step 4: Compute Performance Metrics
```python
def compute_trade_metrics(trades):
    """Compute comprehensive metrics from a trade list."""
    if not trades:
        return {'error': 'No trades'}

    pnls = [t['pnl'] for t in trades]
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]

    metrics = {
        'total_trades': len(trades),
        'winning_trades': len(wins),
        'losing_trades': len(losses),
        'win_rate': len(wins) / len(trades),
        'total_pnl': sum(pnls),
        'avg_pnl': np.mean(pnls),
        'median_pnl': np.median(pnls),
        'std_pnl': np.std(pnls),
        'avg_win': np.mean(wins) if wins else 0,
        'avg_loss': np.mean(losses) if losses else 0,
        'max_win': max(pnls),
        'max_loss': min(pnls),
        'profit_factor': (sum(wins) / abs(sum(losses))
                          if losses else float('inf')),
        'expectancy': np.mean(pnls),  # expected P&L per trade
    }

    # Ratio of average win to average loss
    if losses:
        metrics['avg_win_loss_ratio'] = (
            abs(np.mean(wins)) / abs(np.mean(losses)) if wins else 0
        )
    else:
        metrics['avg_win_loss_ratio'] = float('inf')

    # Holding-period analysis
    holding_periods = []
    for t in trades:
        if 'entry_time' in t and 'exit_time' in t:
            hp = (t['exit_time'] - t['entry_time']).total_seconds() / 86400
            holding_periods.append(hp)
    if holding_periods:
        metrics['avg_holding_days'] = np.mean(holding_periods)
        metrics['median_holding_days'] = np.median(holding_periods)
        metrics['max_holding_days'] = max(holding_periods)

    return metrics

metrics = compute_trade_metrics(results['trades'])
print("\n=== PERFORMANCE METRICS ===")
for key, value in metrics.items():
    if isinstance(value, float):
        print(f"  {key:25s}: {value:>12.4f}")
    else:
        print(f"  {key:25s}: {value:>12}")
```
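Two of these metrics are worth pinning down with a worked example. Profit factor is gross wins divided by gross losses; expectancy is the mean P&L per trade. On an illustrative five-trade history:

```python
import numpy as np

pnls = [4.0, -2.0, 3.0, -1.0, 2.0]  # hypothetical per-trade P&L
wins = [p for p in pnls if p > 0]
losses = [p for p in pnls if p < 0]

profit_factor = sum(wins) / abs(sum(losses))  # 9.0 / 3.0 = 3.0
expectancy = np.mean(pnls)                    # 6.0 / 5  = 1.2
```

A profit factor above 1 (and a positive expectancy) is the bare minimum for viability; neither metric alone says anything about variance or significance, which is why Step 5 follows.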
Step 5: Test Statistical Significance
```python
def test_significance(trades, n_permutations=10000):
    """Test whether the strategy's results are statistically significant."""
    pnls = np.array([t['pnl'] for t in trades])

    # Actual performance. The sqrt(252) annualization treats each trade as one
    # daily return -- a rough convention, since trades arrive irregularly.
    actual_total = pnls.sum()
    actual_sharpe = (pnls.mean() / pnls.std() * np.sqrt(252)
                     if pnls.std() > 0 else 0)

    # Permutation test: randomly flip the sign of each trade's PnL to
    # simulate a strategy with no directional skill
    perm_totals = []
    perm_sharpes = []
    for _ in range(n_permutations):
        random_signs = np.random.choice([-1, 1], size=len(pnls))
        perm_pnl = pnls * random_signs
        perm_totals.append(perm_pnl.sum())
        if perm_pnl.std() > 0:
            perm_sharpes.append(perm_pnl.mean() / perm_pnl.std() * np.sqrt(252))
        else:
            perm_sharpes.append(0)
    perm_totals = np.array(perm_totals)
    perm_sharpes = np.array(perm_sharpes)

    # One-sided p-values: fraction of permutations at least as good as actual
    p_total = np.mean(perm_totals >= actual_total)
    p_sharpe = np.mean(perm_sharpes >= actual_sharpe)

    # Bootstrap 95% confidence interval for total PnL
    bootstrap_totals = []
    for _ in range(n_permutations):
        sample = np.random.choice(pnls, size=len(pnls), replace=True)
        bootstrap_totals.append(sample.sum())
    bootstrap_totals = np.array(bootstrap_totals)
    ci_lower = np.percentile(bootstrap_totals, 2.5)
    ci_upper = np.percentile(bootstrap_totals, 97.5)

    return {
        'actual_total_pnl': actual_total,
        'actual_sharpe': actual_sharpe,
        'permutation_p_value_total': p_total,
        'permutation_p_value_sharpe': p_sharpe,
        'significant_5pct': p_total < 0.05,
        'significant_1pct': p_total < 0.01,
        'bootstrap_ci_95': (ci_lower, ci_upper),
        'ci_contains_zero': ci_lower <= 0 <= ci_upper,
    }

sig_results = test_significance(results['trades'])
print("\n=== STATISTICAL SIGNIFICANCE ===")
for key, value in sig_results.items():
    print(f"  {key:35s}: {value}")
```
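The sign-flip permutation test deserves a standalone illustration. Under the null hypothesis of no directional skill, each trade's sign is arbitrary, so the observed total is compared against totals from randomly re-signed trades. This sketch uses synthetic per-trade P&Ls with a deliberately planted edge (the mean of 0.3 is an arbitrary choice for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)
pnls = rng.normal(0.3, 1.0, size=500)  # synthetic trades with a planted edge

actual_total = pnls.sum()
# 10,000 random sign assignments, one row per permutation
flips = rng.choice([-1, 1], size=(10000, len(pnls)))
perm_totals = (flips * pnls).sum(axis=1)

p_value = np.mean(perm_totals >= actual_total)
print(p_value < 0.05)  # the planted edge should be detected
```

Rerunning with a mean of 0 makes the p-value roughly uniform on [0, 1], which is exactly the behavior a calibrated test should show under the null.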
Step 6: Generate the Report
```python
def print_full_report(results, metrics, sig_results):
    """Print a comprehensive backtest report."""
    print("=" * 70)
    print("  BACKTEST REPORT: Mean Reversion Across 1,000 Markets")
    print("=" * 70)
    print("\n  Strategy:       Z-Score Mean Reversion")
    print("  Parameters:     lookback=30, entry=1.5, exit=0.5")
    print("  Markets Tested: 1,000")
    print("  Data Period:    Simulated 200 periods per market")
    print("\n  --- Capital Summary ---")
    print(f"  Initial Capital:    ${results['initial_capital']:>12,.2f}")
    print(f"  Final Equity:       ${results['final_equity']:>12,.2f}")
    print(f"  Total Return:       {results['total_return']:>12.2%}")
    print("\n  --- Trade Summary ---")
    print(f"  Total Trades:       {metrics['total_trades']:>12}")
    print(f"  Winning Trades:     {metrics['winning_trades']:>12}")
    print(f"  Losing Trades:      {metrics['losing_trades']:>12}")
    print(f"  Win Rate:           {metrics['win_rate']:>12.2%}")
    print("\n  --- Profitability ---")
    print(f"  Total P&L:          ${metrics['total_pnl']:>12,.2f}")
    print(f"  Average P&L/Trade:  ${metrics['avg_pnl']:>12,.4f}")
    print(f"  Profit Factor:      {metrics['profit_factor']:>12.3f}")
    print(f"  Expectancy:         ${metrics['expectancy']:>12,.4f}")
    print("\n  --- Win/Loss Analysis ---")
    print(f"  Average Win:        ${metrics['avg_win']:>12,.4f}")
    print(f"  Average Loss:       ${metrics['avg_loss']:>12,.4f}")
    print(f"  Max Win:            ${metrics['max_win']:>12,.4f}")
    print(f"  Max Loss:           ${metrics['max_loss']:>12,.4f}")
    print(f"  Avg Win/Loss Ratio: {metrics['avg_win_loss_ratio']:>12.3f}")
    print("\n  --- Statistical Significance ---")
    print(f"  Sharpe Ratio:       {sig_results['actual_sharpe']:>12.3f}")
    print(f"  P-value (total):    {sig_results['permutation_p_value_total']:>12.4f}")
    print(f"  P-value (Sharpe):   {sig_results['permutation_p_value_sharpe']:>12.4f}")
    print(f"  Significant (5%):   {str(sig_results['significant_5pct']):>12}")
    print(f"  Significant (1%):   {str(sig_results['significant_1pct']):>12}")
    ci = sig_results['bootstrap_ci_95']
    print(f"  95% CI Total PnL:   (${ci[0]:,.2f}, ${ci[1]:,.2f})")
    print(f"  CI Contains Zero:   {str(sig_results['ci_contains_zero']):>12}")
    print("=" * 70)

print_full_report(results, metrics, sig_results)
```
Key Takeaways from This Case Study
- Scale matters. Testing across 1,000 markets provides far more statistical power than testing on a single market. The large sample size lets us detect smaller effects and gives us higher confidence in the results.

- Costs are significant. The 2% fee rate and market impact meaningfully reduce gross returns. Many trades that appear profitable before costs become unprofitable once costs are applied.

- Fill simulation reveals reality. Using bid/ask prices with impact modeling instead of last-traded prices shows the true cost of execution. The gap can be substantial in thin markets.

- Statistical testing is essential. Without the permutation test and bootstrap confidence interval, we would not know whether the observed performance reflects a genuine edge or merely favorable randomness in the simulation.

- The methodology generalizes. The same framework (data generation or loading, strategy implementation, fill simulation, cost modeling, metrics computation, and significance testing) applies to any prediction market strategy on any platform.