Case Study 1: Backtesting a Mean-Reversion Strategy Across 1,000 Markets

Objective

In this case study, we build a complete end-to-end backtest of a mean-reversion strategy applied to 1,000 simulated prediction markets. We will simulate realistic fills, model transaction costs for a Polymarket-like platform, generate a comprehensive performance report, and test whether the results are statistically significant.

The goal is not to find a profitable strategy (the data is simulated), but to demonstrate the complete methodology for evaluating any prediction market strategy rigorously.

Background

Mean reversion is the hypothesis that prices tend to return to a long-run average after temporary deviations. In prediction markets, this manifests as follows: when a market's price deviates significantly from its recent average (or from a model-based fair value), the deviation is more likely to reverse than to persist.

The economic rationale is that temporary price spikes in prediction markets are often caused by:

- Noise traders reacting to irrelevant information
- Temporary liquidity imbalances
- Overreaction to news that is quickly corrected

Strategy Definition

Name: Z-Score Mean Reversion

Parameters:

- lookback: Number of periods for rolling mean and standard deviation (default: 30)
- entry_threshold: Z-score magnitude to trigger entry (default: 1.5)
- exit_threshold: Z-score magnitude to trigger exit (default: 0.5)
- max_position: Maximum contracts per market (default: 50)

Rules:

1. Compute rolling mean and standard deviation of the price over the lookback window.
2. Calculate z-score: $z = (P_{current} - \mu_{rolling}) / \sigma_{rolling}$
3. If $z < -\text{entry\_threshold}$: BUY (price is unusually low, expect reversion up).
4. If $z > +\text{entry\_threshold}$: SELL (price is unusually high, expect reversion down).
5. Exit when $|z| < \text{exit\_threshold}$ (price has reverted).
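To make the rules concrete, they can be traced by hand on a toy price series. The sketch below is illustrative only: the helper name `zscore_signals` and the short `lookback=5` are assumptions chosen so the example stays small, and it is not the implementation used later in this case study.

```python
import numpy as np
import pandas as pd

def zscore_signals(prices, lookback=5, entry_threshold=1.5, exit_threshold=0.5):
    """Return (zscore, position) for one price series.

    position is +1 (long), -1 (short) or 0 (flat) after each bar.
    Hypothetical helper for illustration only.
    """
    prices = pd.Series(prices, dtype=float)
    mean = prices.rolling(lookback).mean()
    # A flat window has zero std; map it to NaN so z is undefined, not inf
    std = prices.rolling(lookback).std().replace(0, np.nan)
    z = (prices - mean) / std

    position = 0
    positions = []
    for value in z:
        if not np.isnan(value):
            if position == 0:
                if value < -entry_threshold:
                    position = 1       # rule 3: unusually low, buy
                elif value > entry_threshold:
                    position = -1      # rule 4: unusually high, sell
            elif abs(value) < exit_threshold:
                position = 0           # rule 5: reverted, exit
        positions.append(position)
    return z, pd.Series(positions, index=prices.index)

prices = [0.50, 0.51, 0.49, 0.50, 0.50, 0.42, 0.44, 0.47, 0.48, 0.48]
z, pos = zscore_signals(prices)
print(pd.DataFrame({"price": prices, "z": z.round(2), "position": pos}))
```

On this series the drop to 0.42 gives a z-score of roughly -1.75, which opens a long position. The position is held on the next bar (z is roughly -0.80, still outside the exit band) and closed one bar later, when z has returned to about 0.11.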

Step 1: Generate Simulated Market Data

We generate 1,000 markets with realistic characteristics:

import numpy as np
import pandas as pd
from datetime import datetime, timedelta

def generate_prediction_market_data(n_markets=1000,
                                      n_periods=200,
                                      mean_reversion_strength=0.05,
                                      noise_level=0.03,
                                      seed=42):
    """Generate simulated prediction market data with mild mean reversion."""
    np.random.seed(seed)

    all_data = []

    for market_id in range(n_markets):
        # True probability (fixed for each market)
        true_prob = np.random.uniform(0.15, 0.85)

        # Generate prices with mean reversion toward true probability
        prices = np.zeros(n_periods)
        prices[0] = true_prob + np.random.randn() * 0.1
        prices[0] = np.clip(prices[0], 0.05, 0.95)

        for t in range(1, n_periods):
            # Mean-reverting process (Ornstein-Uhlenbeck discretized)
            reversion = mean_reversion_strength * (true_prob - prices[t-1])
            noise = noise_level * np.random.randn()
            prices[t] = prices[t-1] + reversion + noise
            prices[t] = np.clip(prices[t], 0.01, 0.99)

        # Generate bid/ask around price
        spreads = np.random.uniform(0.02, 0.06, n_periods)
        bids = prices - spreads / 2
        asks = prices + spreads / 2
        bids = np.clip(bids, 0.01, 0.98)
        asks = np.clip(asks, 0.02, 0.99)

        # Ensure ask > bid
        asks = np.maximum(asks, bids + 0.01)

        # Generate volume (higher near 0.5, lower near extremes)
        base_volume = 100 + 400 * (1 - 4 * (prices - 0.5)**2)
        volumes = np.maximum(base_volume + np.random.randn(n_periods) * 50, 10)

        # Generate bid/ask sizes
        bid_sizes = np.random.exponential(50, n_periods)
        ask_sizes = np.random.exponential(50, n_periods)

        # Resolution: based on true probability
        resolution = 1 if np.random.random() < true_prob else 0

        # Create timestamps
        start_date = datetime(2024, 1, 1) + timedelta(days=np.random.randint(0, 30))
        timestamps = [start_date + timedelta(hours=4*i) for i in range(n_periods)]

        for t in range(n_periods):
            all_data.append({
                'timestamp': timestamps[t],
                'market_id': f'MKT_{market_id:04d}',
                'last_price': round(prices[t], 4),
                'bid': round(bids[t], 4),
                'ask': round(asks[t], 4),
                'volume': round(volumes[t], 1),
                'bid_size': round(bid_sizes[t], 1),
                'ask_size': round(ask_sizes[t], 1),
                'true_prob': true_prob,
                'resolution': resolution,
            })

    df = pd.DataFrame(all_data)
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    return df

# Generate data
data = generate_prediction_market_data(n_markets=1000, n_periods=200)
print(f"Generated {len(data):,} rows across {data['market_id'].nunique()} markets")
print(f"Date range: {data['timestamp'].min()} to {data['timestamp'].max()}")
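Ignoring the clipping at 0.01 and 0.99, the price update in the generator is an AR(1) process, $P_t = P_{t-1} + \kappa(\theta - P_{t-1}) + \sigma \varepsilon_t$, with $\kappa$ = mean_reversion_strength, $\theta$ = true_prob, and $\sigma$ = noise_level. The sketch below works out what the default parameters imply for the speed and size of deviations. The function names are hypothetical, and the figures are approximations that hold only to the extent clipping rarely binds.

```python
import math

def ar1_half_life(kappa):
    """Periods for an expected deviation to halve when each step
    closes a fraction kappa of the remaining gap."""
    return math.log(0.5) / math.log(1.0 - kappa)

def ar1_stationary_std(kappa, sigma):
    """Long-run standard deviation of price around true_prob."""
    phi = 1.0 - kappa  # AR(1) coefficient
    return sigma / math.sqrt(1.0 - phi ** 2)

kappa, sigma = 0.05, 0.03  # generator defaults
half_life = ar1_half_life(kappa)
stationary_std = ar1_stationary_std(kappa, sigma)

print(f"half-life: {half_life:.1f} periods "
      f"({half_life * 4:.0f} hours at 4-hour bars)")
print(f"stationary std around true_prob: {stationary_std:.3f}")
```

With the defaults, a deviation is expected to halve in about 13.5 periods, which is shorter than the 30-period lookback, and prices wander around the true probability with a standard deviation of about 0.096. That is larger than the simulated spreads of 0.02 to 0.06, which is what leaves room for a reversion trade to cover its execution costs at all.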

Step 2: Implement the Strategy

class MeanReversionStrategy:
    """Z-Score mean reversion strategy for prediction markets."""

    def __init__(self, lookback=30, entry_threshold=1.5,
                 exit_threshold=0.5, max_position=50):
        self.lookback = lookback
        self.entry_threshold = entry_threshold
        self.exit_threshold = exit_threshold
        self.max_position = max_position
        self.positions = {}  # market_id -> position size

    def generate_signals(self, market_data: pd.DataFrame) -> pd.DataFrame:
        """Generate signals for a single market's data."""
        prices = market_data['last_price']

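        # Rolling statistics over the trailing window. The window includes the
        # current bar, so no future data is used.
        rolling_mean = prices.rolling(self.lookback, min_periods=self.lookback).mean()
        rolling_std = prices.rolling(self.lookback, min_periods=self.lookback).std()

        # Z-score; a zero standard deviation (flat prices) yields NaN, not inf
        zscore = (prices - rolling_mean) / rolling_std.replace(0, np.nan)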

        # Generate signals
        signals = pd.Series(0, index=market_data.index)

        for i in range(self.lookback, len(signals)):
            z = zscore.iloc[i]
            market_id = market_data['market_id'].iloc[i]
            current_pos = self.positions.get(market_id, 0)

            if np.isnan(z):
                continue

            if current_pos == 0:
                # No position: check for entry
                if z < -self.entry_threshold:
                    signals.iloc[i] = 1  # Buy
                    self.positions[market_id] = self.max_position
                elif z > self.entry_threshold:
                    signals.iloc[i] = -1  # Sell
                    self.positions[market_id] = -self.max_position
            else:
                # Have position: check for exit
                if abs(z) < self.exit_threshold:
                    signals.iloc[i] = -np.sign(current_pos)  # Exit
                    self.positions[market_id] = 0

        return signals
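One design choice in `generate_signals` deserves a note: the rolling window includes the current bar, so the price being scored also contributes to its own mean and standard deviation. A common variant scores the current price against the previous `lookback` bars only. The sketch below contrasts the two on a toy series; the helper names are hypothetical and this is a comparison, not a change to the strategy above.

```python
import math
import numpy as np
import pandas as pd

def zscore_inclusive(prices, lookback):
    """Z-score against a window that includes the current bar."""
    mean = prices.rolling(lookback).mean()
    std = prices.rolling(lookback).std()
    return (prices - mean) / std.replace(0, np.nan)

def zscore_lagged(prices, lookback):
    """Z-score against the previous `lookback` bars only."""
    mean = prices.rolling(lookback).mean().shift(1)
    std = prices.rolling(lookback).std().shift(1)
    return (prices - mean) / std.replace(0, np.nan)

prices = pd.Series([0.50, 0.51, 0.49, 0.50, 0.50, 0.42])
lookback = 5

inclusive = zscore_inclusive(prices, lookback).iloc[-1]
lagged = zscore_lagged(prices, lookback).iloc[-1]
# Largest |z| any point can reach inside its own sample of size n
bound = (lookback - 1) / math.sqrt(lookback)

print(f"inclusive z: {inclusive:.2f}  (bounded by {bound:.2f})")
print(f"lagged z:    {lagged:.2f}")
```

Both versions avoid look-ahead, but they scale differently. Because the inclusive z-score is computed inside its own sample, its magnitude cannot exceed $(n-1)/\sqrt{n}$, about 1.79 for $n = 5$ and about 5.29 for $n = 30$, whereas the lagged version is unbounded. Thresholds such as 1.5 therefore do not mean the same thing under the two definitions.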

Step 3: Run the Backtest with Fill Simulation

def run_backtest(data, strategy, fee_rate=0.02, impact_coeff=0.1):
    """Run backtest across all markets with realistic execution."""

    results = {
        'trades': [],
        'equity_snapshots': [],
    }

    initial_capital = 100000
    cash = initial_capital
    positions = {}  # market_id -> {'qty': int, 'entry_price': float, 'entry_time': datetime}

    markets = data['market_id'].unique()

    for market_id in markets:
        mkt_data = data[data['market_id'] == market_id].reset_index(drop=True)

        if len(mkt_data) < strategy.lookback + 10:
            continue

        strategy_instance = MeanReversionStrategy(
            lookback=strategy.lookback,
            entry_threshold=strategy.entry_threshold,
            exit_threshold=strategy.exit_threshold,
            max_position=strategy.max_position,
        )

        signals = strategy_instance.generate_signals(mkt_data)

        # Nonzero signals from generate_signals strictly alternate between
        # entry and exit, so track which one is expected next.
        expecting_entry = True

        for i in range(len(mkt_data)):
            signal = signals.iloc[i]
            row = mkt_data.iloc[i]

            if signal == 0:
                continue

            if expecting_entry:
                qty = abs(strategy.max_position)

                if signal > 0:  # Open long: buy at the ask plus impact
                    side = 'long'
                    impact = impact_coeff * np.sqrt(qty / max(row['ask_size'], 1))
                    fill_price = min(row['ask'] + impact, 0.99)
                    cost = qty * fill_price
                else:  # Open short: sell at the bid minus impact
                    side = 'short'
                    impact = impact_coeff * np.sqrt(qty / max(row['bid_size'], 1))
                    fill_price = max(row['bid'] - impact, 0.01)
                    # Assumption: a short is modeled as buying NO shares,
                    # which ties up (1 - price) per contract
                    cost = qty * (1 - fill_price)

                fee = cost * fee_rate

                # Skip the entry if there is not enough cash
                if cash >= cost + fee:
                    cash -= (cost + fee)
                    positions[market_id] = {
                        'qty': qty,
                        'entry_price': fill_price,
                        'entry_time': row['timestamp'],
                        'side': side,
                        'entry_fee': fee,
                    }

            elif market_id in positions:
                pos = positions[market_id]

                if pos['side'] == 'long':  # Close long: sell at the bid
                    impact = impact_coeff * np.sqrt(pos['qty'] / max(row['bid_size'], 1))
                    fill_price = max(row['bid'] - impact, 0.01)
                    proceeds = pos['qty'] * fill_price
                    pnl = (fill_price - pos['entry_price']) * pos['qty']
                else:  # Close short: buy back at the ask
                    impact = impact_coeff * np.sqrt(pos['qty'] / max(row['ask_size'], 1))
                    fill_price = min(row['ask'] + impact, 0.99)
                    proceeds = pos['qty'] * (1 - fill_price)
                    pnl = (pos['entry_price'] - fill_price) * pos['qty']

                fee = proceeds * fee_rate
                cash += (proceeds - fee)
                pnl -= fee + pos['entry_fee']  # Net of exit and entry fees

                results['trades'].append({
                    'market_id': market_id,
                    'entry_time': pos['entry_time'],
                    'exit_time': row['timestamp'],
                    'entry_price': pos['entry_price'],
                    'exit_price': fill_price,
                    'qty': pos['qty'],
                    'pnl': pnl,
                    'side': pos['side'],
                })

                del positions[market_id]

            expecting_entry = not expecting_entry

            # Snapshot equity after each executed signal: cash plus the
            # mark-to-market value of this market's open position. Positions
            # left open in earlier markets are settled at resolution below.
            open_pos = positions.get(market_id)
            position_value = 0.0
            if open_pos is not None:
                mark = (row['last_price'] if open_pos['side'] == 'long'
                        else 1 - row['last_price'])
                position_value = open_pos['qty'] * mark
            results['equity_snapshots'].append({
                'timestamp': row['timestamp'],
                'market_id': market_id,
                'equity': cash + position_value,
            })

    # Handle market resolutions for open positions
    for market_id, pos in list(positions.items()):
        mkt_data = data[data['market_id'] == market_id]
        exit_price = int(mkt_data['resolution'].iloc[0])  # Resolves to 0 or 1

        if pos['side'] == 'long':
            payout = pos['qty'] * exit_price
            pnl = (exit_price - pos['entry_price']) * pos['qty']
        else:
            payout = pos['qty'] * (1 - exit_price)
            pnl = (pos['entry_price'] - exit_price) * pos['qty']

        # Resolution fee is charged on winnings only
        fee = pnl * fee_rate if pnl > 0 else 0
        cash += payout - fee
        pnl -= fee + pos['entry_fee']  # Net of resolution and entry fees

        results['trades'].append({
            'market_id': market_id,
            'entry_time': pos['entry_time'],
            'exit_time': mkt_data['timestamp'].iloc[-1],
            'entry_price': pos['entry_price'],
            'exit_price': exit_price,
            'qty': pos['qty'],
            'pnl': pnl,
            'side': pos['side'],
        })

    results['final_equity'] = cash
    results['initial_capital'] = initial_capital
    results['total_return'] = (cash - initial_capital) / initial_capital

    return results

# Run the backtest
strategy = MeanReversionStrategy(lookback=30, entry_threshold=1.5,
                                  exit_threshold=0.5, max_position=50)
results = run_backtest(data, strategy)

print(f"Total trades: {len(results['trades'])}")
print(f"Initial capital: ${results['initial_capital']:,.2f}")
print(f"Final equity: ${results['final_equity']:,.2f}")
print(f"Total return: {results['total_return']:.2%}")
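The execution model in `run_backtest` is easier to reason about with one fill worked through in isolation. The sketch below reproduces the square-root impact and proportional fee arithmetic for a single order. The function name `simulate_fill` and the example quotes (bid 0.48, ask 0.52, 200 contracts available) are assumptions for illustration.

```python
import math

def simulate_fill(side, bid, ask, size_available, qty,
                  impact_coeff=0.1, fee_rate=0.02):
    """Return (fill_price, fee) for a marketable order.

    Hypothetical helper mirroring the backtest's cost model:
    cross the spread, add square-root impact, then charge a
    proportional fee on the traded notional.
    """
    impact = impact_coeff * math.sqrt(qty / max(size_available, 1))
    if side == "buy":
        fill_price = min(ask + impact, 0.99)
    else:
        fill_price = max(bid - impact, 0.01)
    fee = qty * fill_price * fee_rate
    return fill_price, fee

qty = 50
buy_price, buy_fee = simulate_fill("buy", 0.48, 0.52, 200, qty)
sell_price, sell_fee = simulate_fill("sell", 0.48, 0.52, 200, qty)

# Cost of buying and immediately selling at unchanged quotes
round_trip_cost = qty * (buy_price - sell_price) + buy_fee + sell_fee

print(f"buy fill:  {buy_price:.2f}  fee: ${buy_fee:.2f}")
print(f"sell fill: {sell_price:.2f}  fee: ${sell_fee:.2f}")
print(f"round-trip cost: ${round_trip_cost:.2f} "
      f"({round_trip_cost / qty:.2f} per contract)")
```

Under these assumed quotes, the order buys at 0.57 and sells at 0.43, so an immediate round trip loses $8.00 on 50 contracts, or 0.16 per contract. The mid price has to move by roughly that much in the position's favor before a trade breaks even, and the hurdle grows quickly when the displayed size is small relative to the order.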

Step 4: Compute Performance Metrics

def compute_trade_metrics(trades):
    """Compute comprehensive metrics from trade list."""
    if not trades:
        return {'error': 'No trades'}

    pnls = [t['pnl'] for t in trades]
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]

    metrics = {
        'total_trades': len(trades),
        'winning_trades': len(wins),
        'losing_trades': len(losses),
        'win_rate': len(wins) / len(trades) if trades else 0,
        'total_pnl': sum(pnls),
        'avg_pnl': np.mean(pnls),
        'median_pnl': np.median(pnls),
        'std_pnl': np.std(pnls),
        'avg_win': np.mean(wins) if wins else 0,
        'avg_loss': np.mean(losses) if losses else 0,
        'max_win': max(pnls) if pnls else 0,
        'max_loss': min(pnls) if pnls else 0,
        'profit_factor': (sum(wins) / abs(sum(losses))
                         if losses else float('inf')),
        'expectancy': np.mean(pnls),
    }

    # Win/loss ratio
    if losses:
        metrics['avg_win_loss_ratio'] = (
            abs(np.mean(wins)) / abs(np.mean(losses)) if wins else 0
        )
    else:
        metrics['avg_win_loss_ratio'] = float('inf')

    # Holding period analysis
    holding_periods = []
    for t in trades:
        if 'entry_time' in t and 'exit_time' in t:
            hp = (t['exit_time'] - t['entry_time']).total_seconds() / 86400
            holding_periods.append(hp)

    if holding_periods:
        metrics['avg_holding_days'] = np.mean(holding_periods)
        metrics['median_holding_days'] = np.median(holding_periods)
        metrics['max_holding_days'] = max(holding_periods)

    return metrics

metrics = compute_trade_metrics(results['trades'])
print("\n=== PERFORMANCE METRICS ===")
for key, value in metrics.items():
    if isinstance(value, float):
        print(f"  {key:25s}: {value:>12.4f}")
    else:
        print(f"  {key:25s}: {value:>12}")
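Several of these metrics are linked by simple identities, which makes them useful as a consistency check on the report. The sketch below uses a made-up list of eight trade PnLs (an assumption for illustration, not backtest output) to show how expectancy decomposes into win rate, loss rate, and average win and loss.

```python
import numpy as np

# Hypothetical per-trade PnLs, including one break-even trade
pnls = np.array([4.0, -2.0, 3.0, -5.0, 6.0, -1.0, 0.0, 2.0])
wins = pnls[pnls > 0]
losses = pnls[pnls < 0]

win_rate = len(wins) / len(pnls)
loss_rate = len(losses) / len(pnls)
profit_factor = wins.sum() / abs(losses.sum())
expectancy = pnls.mean()

# Expectancy rebuilt from its components
decomposed = win_rate * wins.mean() + loss_rate * losses.mean()

print(f"win rate:      {win_rate:.3f}")
print(f"loss rate:     {loss_rate:.3f}")
print(f"profit factor: {profit_factor:.3f}")
print(f"expectancy:    {expectancy:.3f}")
print(f"decomposed:    {decomposed:.3f}")
```

Two details carry over to `compute_trade_metrics`. Break-even trades count in the denominator of the win rate but are neither wins nor losses, so win rate and loss rate need not sum to one. Also, `expectancy` and `avg_pnl` are the same quantity there, the mean PnL per trade.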

Step 5: Test Statistical Significance

def test_significance(trades, n_permutations=10000):
    """Test if strategy results are statistically significant."""
    pnls = np.array([t['pnl'] for t in trades])

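    # Actual performance. The "Sharpe" here is a per-trade mean/std ratio; the
    # sqrt(252) factor is a conventional scaling only, since trade PnLs are
    # not daily returns. The scaling does not affect the permutation p-value.
    actual_total = pnls.sum()
    actual_sharpe = (pnls.mean() / pnls.std() * np.sqrt(252)
                     if pnls.std() > 0 else 0)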

    # Permutation test
    perm_totals = []
    perm_sharpes = []

    for _ in range(n_permutations):
        # Sign-flip null: each trade's net PnL is equally likely to be a
        # gain or a loss of the same size (distribution symmetric about zero)
        random_signs = np.random.choice([-1, 1], size=len(pnls))
        perm_pnl = pnls * random_signs

        perm_totals.append(perm_pnl.sum())
        if perm_pnl.std() > 0:
            perm_sharpes.append(
                perm_pnl.mean() / perm_pnl.std() * np.sqrt(252)
            )
        else:
            perm_sharpes.append(0)

    perm_totals = np.array(perm_totals)
    perm_sharpes = np.array(perm_sharpes)

    # P-values
    p_total = np.mean(perm_totals >= actual_total)
    p_sharpe = np.mean(perm_sharpes >= actual_sharpe)

    # Bootstrap confidence interval for total PnL
    bootstrap_totals = []
    for _ in range(n_permutations):
        sample = np.random.choice(pnls, size=len(pnls), replace=True)
        bootstrap_totals.append(sample.sum())

    bootstrap_totals = np.array(bootstrap_totals)
    ci_lower = np.percentile(bootstrap_totals, 2.5)
    ci_upper = np.percentile(bootstrap_totals, 97.5)

    return {
        'actual_total_pnl': actual_total,
        'actual_sharpe': actual_sharpe,
        'permutation_p_value_total': p_total,
        'permutation_p_value_sharpe': p_sharpe,
        'significant_5pct': p_total < 0.05,
        'significant_1pct': p_total < 0.01,
        'bootstrap_ci_95': (ci_lower, ci_upper),
        'ci_contains_zero': ci_lower <= 0 <= ci_upper,
    }

sig_results = test_significance(results['trades'])
print("\n=== STATISTICAL SIGNIFICANCE ===")
for key, value in sig_results.items():
    print(f"  {key:35s}: {value}")
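The sign-flip test can also be written without an explicit Python loop and with a seeded generator so that the p-value is reproducible. The sketch below is one way to do that under stated assumptions: the function name `sign_flip_p_value` is hypothetical, the two PnL samples are synthetic, and the "+1" correction is a common convention that keeps the estimated p-value from being exactly zero.

```python
import numpy as np

def sign_flip_p_value(pnls, n_permutations=10_000, seed=0):
    """One-sided p-value for total PnL > 0 under the null that
    trade PnLs are symmetric about zero. Hypothetical helper."""
    rng = np.random.default_rng(seed)
    pnls = np.asarray(pnls, dtype=float)
    # One row of random signs per permutation
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, len(pnls)))
    perm_totals = (signs * pnls).sum(axis=1)
    exceed = np.sum(perm_totals >= pnls.sum())
    # +1 in numerator and denominator counts the observed sample itself
    return (exceed + 1) / (n_permutations + 1)

# Synthetic per-trade PnLs: one sample with no edge, one with a clear edge
rng = np.random.default_rng(1)
no_edge = rng.normal(0.0, 1.0, size=400)
with_edge = rng.normal(0.3, 1.0, size=400)

p_no_edge = sign_flip_p_value(no_edge)
p_with_edge = sign_flip_p_value(with_edge)
print(f"p-value, no edge:   {p_no_edge:.4f}")
print(f"p-value, with edge: {p_with_edge:.4f}")
```

The sample with a positive mean should produce a very small p-value, while the zero-mean sample should not be reliably significant. Note that this test treats trades as independent; trades that overlap in time or share a market-wide shock would weaken that assumption.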

Step 6: Generate the Report

def print_full_report(results, metrics, sig_results):
    """Print a comprehensive backtest report."""

    print("=" * 70)
    print("  BACKTEST REPORT: Mean Reversion Across 1,000 Markets")
    print("=" * 70)

    print(f"\n  Strategy: Z-Score Mean Reversion")
    print(f"  Parameters: lookback=30, entry=1.5, exit=0.5")
    print(f"  Markets Tested: 1,000")
    print(f"  Data Period: Simulated 200 periods per market")

    print(f"\n  --- Capital Summary ---")
    print(f"  Initial Capital:     ${results['initial_capital']:>12,.2f}")
    print(f"  Final Equity:        ${results['final_equity']:>12,.2f}")
    print(f"  Total Return:        {results['total_return']:>12.2%}")

    print(f"\n  --- Trade Summary ---")
    print(f"  Total Trades:        {metrics['total_trades']:>12}")
    print(f"  Winning Trades:      {metrics['winning_trades']:>12}")
    print(f"  Losing Trades:       {metrics['losing_trades']:>12}")
    print(f"  Win Rate:            {metrics['win_rate']:>12.2%}")

    print(f"\n  --- Profitability ---")
    print(f"  Total P&L:           ${metrics['total_pnl']:>12,.2f}")
    print(f"  Average P&L/Trade:   ${metrics['avg_pnl']:>12,.4f}")
    print(f"  Profit Factor:       {metrics['profit_factor']:>12.3f}")
    print(f"  Expectancy:          ${metrics['expectancy']:>12,.4f}")

    print(f"\n  --- Win/Loss Analysis ---")
    print(f"  Average Win:         ${metrics['avg_win']:>12,.4f}")
    print(f"  Average Loss:        ${metrics['avg_loss']:>12,.4f}")
    print(f"  Max Win:             ${metrics['max_win']:>12,.4f}")
    print(f"  Max Loss:            ${metrics['max_loss']:>12,.4f}")
    print(f"  Avg Win/Loss Ratio:  {metrics['avg_win_loss_ratio']:>12.3f}")

    print(f"\n  --- Statistical Significance ---")
    print(f"  Sharpe Ratio:        {sig_results['actual_sharpe']:>12.3f}")
    print(f"  P-value (total):     {sig_results['permutation_p_value_total']:>12.4f}")
    print(f"  P-value (Sharpe):    {sig_results['permutation_p_value_sharpe']:>12.4f}")
    print(f"  Significant (5%):    {str(sig_results['significant_5pct']):>12}")
    print(f"  Significant (1%):    {str(sig_results['significant_1pct']):>12}")
    ci = sig_results['bootstrap_ci_95']
    print(f"  95% CI Total PnL:    (${ci[0]:,.2f}, ${ci[1]:,.2f})")
    print(f"  CI Contains Zero:    {str(sig_results['ci_contains_zero']):>12}")

    print("=" * 70)

print_full_report(results, metrics, sig_results)

Key Takeaways from This Case Study

Scale matters. Testing across 1,000 markets provides much more statistical power than testing on a single market. The large sample size lets us detect smaller effects and have higher confidence in the results.
Costs are significant. The 2% fee rate and market impact meaningfully reduce gross returns. Many trades that appear profitable before costs become unprofitable after costs are applied.
Fill simulation reveals reality. Using bid/ask prices with impact modeling instead of last-traded prices shows the true cost of execution. The gap can be substantial in thin markets.
Statistical testing is essential. Without the permutation test and bootstrap confidence interval, we would not know whether the observed performance reflects a genuine edge or just favorable randomness in the simulation.
The methodology generalizes. This same framework (data generation or loading, strategy implementation, fill simulation, cost modeling, metrics computation, and significance testing) applies to any prediction market strategy on any platform.