Exercises: Chapter 26
Section A: Conceptual Understanding (Exercises 1--10)
Exercise 1: Identifying Lookahead Bias
The following backtest code snippet processes prediction market data. Identify all instances of lookahead bias and explain how to fix each one.
```python
import pandas as pd
import numpy as np

df = pd.read_csv('market_data.csv')
df['resolution'] = df.groupby('market_id')['resolution'].transform('last')
df['zscore'] = (df['price'] - df['price'].mean()) / df['price'].std()

for i, row in df.iterrows():
    if row['zscore'] < -1.5 and row['resolution'] == 1:
        signal = 'BUY'
    elif row['zscore'] > 1.5 and row['resolution'] == 0:
        signal = 'SELL'
```
Exercise 2: Survivorship Bias Scenario
You download a dataset of 500 prediction markets from a platform. The dataset only includes markets that successfully resolved (YES or NO) and excludes 73 markets that were cancelled due to ambiguous resolution criteria. Your strategy specifically targets markets with unusual resolution criteria because they tend to be mispriced.
(a) Explain why your backtest results will be biased. (b) In which direction will the bias push your estimated returns? (c) Propose a method to correct for or mitigate this bias.
Exercise 3: Overfitting Analysis
A trader tests 50 different parameter combinations for a mean-reversion strategy on a single prediction market and selects the one with the highest Sharpe ratio. The best combination achieves an in-sample Sharpe of 3.2.
(a) Calculate the probability that at least one of the 50 tests would exceed a Sharpe of 2.0 purely by chance (assume returns are normally distributed with zero mean and unit variance, and that the sample has 100 observations). (b) What is the expected maximum Sharpe ratio across 50 independent tests under the null hypothesis? (c) How does this change the interpretation of the observed Sharpe of 3.2?
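A minimal sketch for checking parts (a) and (b), assuming that under the null the sample Sharpe ratio is approximately normal with standard error $1/\sqrt{T}$ for $T$ observations. The exercise does not say whether the Sharpe ratios are per-period or annualized, so the hypothetical `annualization` parameter covers both readings (252 would mean daily data annualized). The expected maximum uses the extreme-value approximation from Bailey and López de Prado's work on false strategies; all function names are illustrative.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329

def sharpe_se(n_obs, annualization=1.0):
    """Approximate standard error of the sample Sharpe ratio under H0 (zero mean)."""
    return math.sqrt(annualization / n_obs)

def prob_any_exceeds(threshold, n_tests, n_obs, annualization=1.0):
    """P(at least one of n_tests independent null strategies has Sharpe > threshold)."""
    p_single = 1.0 - NormalDist().cdf(threshold / sharpe_se(n_obs, annualization))
    return 1.0 - (1.0 - p_single) ** n_tests

def expected_max_sharpe(n_tests, n_obs, annualization=1.0):
    """Extreme-value approximation to E[max Sharpe] over n_tests null strategies."""
    z = NormalDist()
    return sharpe_se(n_obs, annualization) * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_tests)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_tests * math.e))
    )

for label, ann in (("per-period", 1.0), ("annualized, daily", 252.0)):
    print(label, round(prob_any_exceeds(2.0, 50, 100, ann), 4),
          round(expected_max_sharpe(50, 100, ann), 2))
```

Under the per-period reading, a Sharpe of 2.0 is about 20 standard errors from zero and essentially never occurs by chance. Under the annualized daily reading, the expected best-of-50 Sharpe is roughly 3.6, which is already above the observed 3.2.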
Exercise 4: Event-Driven vs. Vectorized
Explain why an event-driven backtesting architecture provides structural protection against lookahead bias, while a vectorized approach does not. Use a specific example involving a prediction market strategy that computes a rolling average.
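The contrast can be made concrete with a small plain-Python sketch (no backtesting library assumed; both function names are hypothetical). `trailing_means` consumes prices one event at a time, so it cannot see anything after the current step. `centered_means` indexes the full series the way a vectorized calculation can, and a centered window (the analogue of pandas `rolling(..., center=True)`) silently reads future prices.

```python
from collections import deque

def trailing_means(prices, window):
    """Event-driven style: the mean at step t uses only prices seen up to t."""
    seen = deque(maxlen=window)
    out = []
    for price in prices:               # prices arrive one event at a time
        seen.append(price)
        out.append(sum(seen) / window if len(seen) == window else None)
    return out

def centered_means(prices, window):
    """Vectorized style with full-array access: a centered window reads ahead."""
    half = window // 2
    out = []
    for t in range(len(prices)):
        segment = prices[max(0, t - half): t + half + 1]   # includes future prices
        out.append(sum(segment) / len(segment))
    return out
```

Changing only a future price alters the centered mean at earlier timestamps but never the trailing one. In an event-driven loop the future price simply has not been delivered yet, which is the structural protection the exercise asks about.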
Exercise 5: Fill Simulation Importance
A strategy backtested on a prediction market with average daily volume of 200 contracts shows a 45% annual return. The strategy trades 50 contracts per signal. Using the square-root impact model with $\sigma = 0.03$, $\beta = 0.5$, and $V = 200$, calculate the expected market impact cost per trade and determine whether the strategy remains profitable after accounting for this impact.
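The arithmetic can be sketched as follows, assuming the square-root model takes the form $\text{impact} = \beta \sigma \sqrt{Q/V}$ per contract, with $Q$ the order size; check this against the chapter's own definition. Profitability also depends on trading frequency and capital, which the exercise leaves open, so the hypothetical helper `annual_impact_drag` takes both as inputs.

```python
import math

def sqrt_impact(order_size, daily_volume, sigma, beta):
    """Assumed square-root impact per contract: beta * sigma * sqrt(Q / V)."""
    return beta * sigma * math.sqrt(order_size / daily_volume)

def annual_impact_drag(cost_per_trade, trades_per_year, capital):
    """Impact cost per year as a fraction of capital (inputs are hypothetical)."""
    return cost_per_trade * trades_per_year / capital

impact = sqrt_impact(order_size=50, daily_volume=200, sigma=0.03, beta=0.5)
cost_per_trade = impact * 50   # one side; double it if impact is paid on exit too
print(f"impact per contract: {impact:.4f}, per trade: {cost_per_trade:.3f}")
```

Under this assumed form the impact is 0.75 cents per contract, or 37.5 cents per 50-contract trade on each side. Whether that erases the 45% return depends on the trade count: with a hypothetical 1,000 dollars of capital, the round-trip drag reaches 45% at 600 trades per year.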
Exercise 6: Transaction Cost Breakdown
For a Polymarket trade with the following parameters, calculate the total transaction cost:
- Buy 100 YES contracts at an ask price of $0.62
- Bid price: $0.58
- Taker fee: 2%
- Expected holding period: 45 days
- Risk-free rate: 5% annually
Include: spread cost, trading fee, and opportunity cost.
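One way to organize the calculation, as a sketch under stated assumptions: the spread cost is the half-spread paid relative to the midpoint, the 2% taker fee is assumed to apply to the notional paid (fee bases differ across venues), and the opportunity cost is simple interest on the notional over the holding period with a 365-day year. Exit costs are omitted on the assumption that the position is held to resolution.

```python
def transaction_costs(qty, ask, bid, fee_rate, holding_days, rf_annual):
    """Break down the entry cost of a buy that crosses the spread."""
    mid = (ask + bid) / 2
    notional = ask * qty
    spread_cost = (ask - mid) * qty                          # half-spread paid to reach the ask
    fee = fee_rate * notional                                # assumption: fee charged on notional
    opportunity = notional * rf_annual * holding_days / 365  # simple interest on tied-up capital
    return {"spread": spread_cost, "fee": fee, "opportunity": opportunity,
            "total": spread_cost + fee + opportunity}

costs = transaction_costs(qty=100, ask=0.62, bid=0.58, fee_rate=0.02,
                          holding_days=45, rf_annual=0.05)
print({k: round(v, 3) for k, v in costs.items()})
```

With these assumptions the components come to about 2.00 (spread), 1.24 (fee), and 0.38 (opportunity), roughly 3.62 in total, or about 5.8% of the 62-dollar notional.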
Exercise 7: Walk-Forward Design
You have 3 years of daily prediction market data. Design a walk-forward testing scheme with:
- 6-month training windows
- 2-month test windows
- Rolling (not anchored) approach
(a) How many walk-forward steps will you have? (b) How much of the total data will be used for out-of-sample testing? (c) What is the minimum number of trades per window needed for statistical reliability?
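Parts (a) and (b) can be checked with a short window generator. This sketch works in whole months and assumes each step rolls forward by one test window, so the out-of-sample periods tile the data without overlapping.

```python
def rolling_walk_forward(total_months, train_months, test_months):
    """Return (train_start, train_end, test_end) month offsets for each rolling step."""
    steps = []
    start = 0
    while start + train_months + test_months <= total_months:
        train_end = start + train_months
        steps.append((start, train_end, train_end + test_months))
        start += test_months            # roll both windows forward by one test window
    return steps

steps = rolling_walk_forward(total_months=36, train_months=6, test_months=2)
oos_months = sum(test_end - train_end for _, train_end, test_end in steps)
print(len(steps), oos_months, round(oos_months / 36, 3))
```

This yields 15 steps, and 30 of the 36 months (about 83%) are used for out-of-sample testing; only the first training window is never tested.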
Exercise 8: Metric Interpretation
A strategy reports the following metrics:
- Win Rate: 72%
- Average Win: $0.04
- Average Loss: $0.12
- Profit Factor: 0.86
(a) Is this strategy profitable? Explain using the expectancy formula. (b) What does the combination of high win rate and low profit factor suggest about the strategy's risk profile? (c) How would you modify the strategy to improve its risk-adjusted performance?
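The expectancy check in part (a) is a few lines of arithmetic. In this sketch, profit factor is taken as gross wins divided by gross losses, which reproduces the reported 0.86, and the break-even win rate is $\text{avg loss} / (\text{avg win} + \text{avg loss})$.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Expected profit per trade."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss

def profit_factor(win_rate, avg_win, avg_loss):
    """Gross profit divided by gross loss, per trade on average."""
    return (win_rate * avg_win) / ((1 - win_rate) * avg_loss)

def breakeven_win_rate(avg_win, avg_loss):
    """Win rate at which expectancy is exactly zero."""
    return avg_loss / (avg_win + avg_loss)

e = expectancy(0.72, 0.04, 0.12)       # about -0.0048 per trade
pf = profit_factor(0.72, 0.04, 0.12)   # about 0.857
wr = breakeven_win_rate(0.04, 0.12)    # 0.75
```

The strategy loses about 0.48 cents per trade on average: it needs a 75% win rate to break even and wins only 72% of the time.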
Exercise 9: Statistical Power
Calculate the minimum number of trades needed to detect a Sharpe ratio of 0.8 with: (a) 80% power at the 5% significance level (b) 90% power at the 1% significance level (c) Discuss the practical implications for prediction market backtesting, where markets often have limited trading history.
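A sketch using the usual normal-approximation sample-size formula, $n = \left((z_{1-\alpha} + z_{1-\beta}) / SR\right)^2$, assuming i.i.d. trade returns, a one-sided test, and that 0.8 is the Sharpe ratio per trade. If 0.8 is instead an annualized figure, divide it by the square root of the number of trades per year before calling the (hypothetical) `min_trades` function.

```python
import math
from statistics import NormalDist

def min_trades(sr_per_trade, alpha=0.05, power=0.80, two_sided=False):
    """Trades needed for a z-test on mean return to detect sr_per_trade."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / sr_per_trade) ** 2)

print(min_trades(0.8), min_trades(0.8, alpha=0.01, power=0.90))
print(min_trades(0.8 / math.sqrt(252)))   # 0.8 read as annualized, one trade per day
```

Under the per-trade reading this gives 10 trades for part (a) and 21 for part (b). Read as an annualized Sharpe with one trade per trading day, part (a) needs roughly 2,400 trades, close to ten years of daily data, which few prediction markets ever provide.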
Exercise 10: Multiple Comparisons
You test 30 strategies on the same prediction market dataset. Five strategies show p-values below 0.05.
(a) Apply the Bonferroni correction. How many strategies remain significant? (b) Apply the Benjamini-Hochberg procedure with FDR = 0.10. How many strategies remain significant? (c) Which correction method is more appropriate for exploratory backtesting research, and why?
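The exercise does not list the five individual p-values, and both corrections depend on them, so this sketch runs on hypothetical values: five small p-values plus 25 larger ones for the remaining strategies. `bonferroni` compares each p-value to $\alpha/m$; `benjamini_hochberg` finds the largest rank $k$ with $p_{(k)} \le k \cdot q / m$ and rejects every hypothesis up to that rank.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, fdr=0.10):
    """Step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:
            k_max = rank                 # largest rank passing its threshold
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# Hypothetical p-values: the exercise only says five are below 0.05.
p_values = [0.001, 0.003, 0.008, 0.012, 0.045] + [0.1 + 0.8 * i / 24 for i in range(25)]
print(sum(bonferroni(p_values)), sum(benjamini_hochberg(p_values)))
```

With these made-up values Bonferroni keeps one strategy and Benjamini-Hochberg keeps four, which illustrates why the FDR approach is less conservative.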
Section B: Implementation (Exercises 11--20)
Exercise 11: Build a Data Handler
Implement a CSVDataHandler class that inherits from the DataHandler abstract base class defined in Section 26.3. The handler should:
- Load data from a CSV file with columns: timestamp, market_id, last_price, bid, ask, volume, bid_size, ask_size
- Support multiple markets simultaneously
- Enforce chronological ordering
- Prevent any possibility of lookahead (only emit data up to the current timestamp)
Exercise 12: Implement a Mean-Reversion Strategy
Implement a MeanReversionStrategy class that inherits from Strategy. The strategy should:
- Compute a rolling z-score of the price over a configurable lookback window
- Generate a BUY signal when z-score < -threshold
- Generate a SELL signal when z-score > +threshold
- Return no signal when abs(z-score) ≤ threshold
- Use only data available through the DataHandler.get_latest() method
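The signal logic can be sketched as a standalone function. The `Strategy` and `DataHandler` classes from Section 26.3 are not reproduced here, so the hypothetical `latest_prices` argument stands in for the price history that `get_latest()` would return; the sample standard deviation over the window is one reasonable choice.

```python
from statistics import mean, stdev

def zscore_signal(latest_prices, lookback=20, threshold=1.5):
    """Map the rolling z-score of the newest price to 'BUY', 'SELL', or None.

    latest_prices must contain only bars already delivered to the strategy,
    so nothing here looks ahead.
    """
    if len(latest_prices) < lookback:
        return None                      # not enough history yet
    window = latest_prices[-lookback:]
    sd = stdev(window)
    if sd == 0:
        return None                      # flat window: z-score undefined
    z = (window[-1] - mean(window)) / sd
    if z < -threshold:
        return "BUY"
    if z > threshold:
        return "SELL"
    return None                          # abs(z) <= threshold
```

Inside a `MeanReversionStrategy`, the same function would be called on each new bar with whatever history the data handler has released so far.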
Exercise 13: Build a Portfolio Manager
Implement a SimplePortfolio class that inherits from Portfolio. It should:
- Track positions in multiple markets simultaneously
- Enforce a maximum position size per market (configurable)
- Enforce a maximum total portfolio allocation (configurable)
- Convert signals to orders only when position limits allow
- Track realized and unrealized P&L
Exercise 14: Implement Fill Simulation
Extend the RealisticExecutionSimulator from Section 26.5 to support:
- A configurable "fill probability" that varies with order size relative to available liquidity
- Time-varying slippage (higher during volatile periods)
- A "queue position" model for limit orders (your order fills only after orders ahead of you in the queue)
Exercise 15: Cost Model for PredictIt
Implement a PredictItCostModel that models PredictIt's unique fee structure:
- 10% fee on profits per market (not per trade --- calculated at market resolution)
- 5% withdrawal fee
- $850 maximum position per market
- No fee on losing trades
Your model should correctly track cumulative P&L per market to calculate the profit fee at resolution.
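A stripped-down sketch of the bookkeeping, following the fee rules exactly as this exercise states them (a simplification of the real platform's terms). It is not wired into the chapter's cost-model interface, which is not reproduced here, and for brevity committed capital is released only when a market resolves.

```python
from collections import defaultdict

class PredictItCostModel:
    """Per-market bookkeeping for the simplified PredictIt fees in this exercise."""

    PROFIT_FEE = 0.10       # on net profit per market, charged at resolution
    WITHDRAWAL_FEE = 0.05   # on cash withdrawn from the account
    MAX_POSITION = 850.0    # dollars committed per market

    def __init__(self):
        self.committed = defaultdict(float)   # dollars committed per market
        self.pnl = defaultdict(float)         # cumulative realized P&L per market

    def open_position(self, market_id, dollars):
        if self.committed[market_id] + dollars > self.MAX_POSITION:
            raise ValueError(f"{market_id}: would exceed {self.MAX_POSITION:.0f} dollar cap")
        self.committed[market_id] += dollars

    def record_pnl(self, market_id, amount):
        """Add a realized gain (positive) or loss (negative) for this market."""
        self.pnl[market_id] += amount

    def resolve(self, market_id):
        """Close out a market; returns (net_pnl, fee). Losing markets pay no fee."""
        gross = self.pnl.pop(market_id, 0.0)
        self.committed.pop(market_id, None)
        fee = self.PROFIT_FEE * max(gross, 0.0)
        return gross - fee, fee

    def withdraw(self, amount):
        """Cash actually received after the withdrawal fee."""
        return amount * (1 - self.WITHDRAWAL_FEE)
```

Because the fee is computed on the market's cumulative P&L at resolution, a winning trade offset by a losing trade in the same market is taxed only on the net gain.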
Exercise 16: Vectorized Backtester
Implement a vectorized backtester that operates on pandas DataFrames for fast strategy screening. The backtester should:
- Accept a signal DataFrame (same index as the price data, values of +1, -1, 0)
- Apply configurable transaction costs
- Compute an equity curve
- Return a dictionary of performance metrics
- Include a warning if the signal appears to use future information (basic check: the correlation between the signal and future returns is suspiciously high)
Exercise 17: Walk-Forward with Cross-Validation
Extend the WalkForwardEngine from Section 26.7 to support purged k-fold cross-validation (a simpler relative of combinatorial purged cross-validation, CPCV):
- Within each training window, use k-fold cross-validation with a purge gap
- The purge gap prevents information leakage between train and validation folds
- Select parameters based on average cross-validated performance rather than single in-sample performance
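The index bookkeeping for the purged folds might look like the sketch below, applied to positions within one training window. It implements plain purged k-fold on contiguous folds; full CPCV would additionally evaluate combinations of held-out groups, and many implementations also add an embargo after each validation fold. The function name is hypothetical.

```python
def purged_kfold_splits(n_obs, n_folds, purge):
    """Yield (train_idx, val_idx) for contiguous folds with a purge gap.

    Training indices within `purge` steps of the validation fold are dropped,
    so labels or features that overlap the fold boundary cannot leak.
    """
    fold_size = n_obs // n_folds
    for k in range(n_folds):
        start = k * fold_size
        stop = n_obs if k == n_folds - 1 else start + fold_size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_obs)
                     if i < start - purge or i >= stop + purge]
        yield train_idx, val_idx
```

Averaging a parameter set's score over the `n_folds` validation folds gives the cross-validated performance used for selection.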
Exercise 18: Custom Performance Metric
Implement a "Prediction Market Efficiency" metric that captures how well a strategy exploits prediction market mispricing. Define it as:
$$PME = \frac{\text{Average Edge Captured}}{\text{Average Edge Available}}$$
Where "edge available" is the absolute difference between the market price and the true resolution probability, and "edge captured" is the profit earned relative to the edge available at the time of entry.
Exercise 19: Regime-Aware Backtester
Implement a regime-detection module that identifies different market regimes and reports strategy performance separately for each regime. Use a hidden Markov model with two states, which means choosing one regime split to model (e.g., low vs. high volatility, or trending vs. mean-reverting).
Exercise 20: Backtest Comparison Framework
Build a framework that can compare two strategies side-by-side:
- Run both on the same data
- Compute all metrics for each
- Perform a paired t-test to determine whether the difference in returns is statistically significant
- Generate a comparison report with overlaid equity curves
Section C: Analysis and Research (Exercises 21--30)
Exercise 21: Sharpe Ratio Distribution
Simulate 10,000 random strategies (random signals on random returns) and plot the distribution of backtest Sharpe ratios. What is the 95th percentile Sharpe for random strategies with 500 trades? How does this change with 100, 200, and 1000 trades?
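A pure-Python sketch of the simulation: each random strategy applies a random long/short signal to i.i.d. standard normal returns, so its true Sharpe is zero. Sharpe ratios here are per trade and not annualized, and the function names are hypothetical. With 10,000 strategies of 1,000 trades this takes a while in plain Python; NumPy would be much faster.

```python
import math
import random

def random_strategy_sharpe(n_trades, rng):
    """Per-trade Sharpe of a random +/-1 signal on i.i.d. N(0, 1) returns."""
    pnl = [rng.choice((-1, 1)) * rng.gauss(0.0, 1.0) for _ in range(n_trades)]
    mean = sum(pnl) / n_trades
    var = sum((x - mean) ** 2 for x in pnl) / (n_trades - 1)
    return mean / math.sqrt(var)

def sharpe_percentile(n_trades, n_strategies=10_000, q=0.95, seed=0):
    """Empirical q-quantile of Sharpe ratios across random strategies."""
    rng = random.Random(seed)
    sharpes = sorted(random_strategy_sharpe(n_trades, rng)
                     for _ in range(n_strategies))
    return sharpes[int(q * (n_strategies - 1))]
```

For large samples the 95th percentile should sit near $1.645/\sqrt{n}$, about 0.074 per trade for 500 trades, so it shrinks as the number of trades grows.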
Exercise 22: Impact of Spread on Strategy Viability
For a strategy with a gross Sharpe ratio of 1.5 that trades once per day, plot how the net Sharpe ratio declines as the spread increases from 0 to 10 cents. At what spread does the strategy become unprofitable? How does trading frequency affect this relationship?
Exercise 23: Optimal Walk-Forward Window Size
Using simulated data with a known signal embedded in noise, test walk-forward analysis with training windows of 30, 60, 90, 120, 180, and 365 days. Plot out-of-sample performance as a function of training window size. Is there an optimal window size? How does it relate to the signal's characteristics?
Exercise 24: Bootstrap Analysis of Drawdown
Generate 10,000 bootstrap samples of a strategy's return series and compute the maximum drawdown distribution. Report the 5th, 25th, 50th, 75th, and 95th percentiles. How does this compare to the single backtest drawdown? What does this tell you about drawdown uncertainty?
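A minimal i.i.d. bootstrap of the drawdown distribution, assuming `returns` holds per-period simple returns. Resampling individual returns destroys any autocorrelation, so a block bootstrap is a reasonable alternative when returns cluster.

```python
import random

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst

def bootstrap_drawdowns(returns, n_samples=10_000, seed=0):
    """Sorted max drawdowns of i.i.d. resamples (with replacement) of the returns."""
    rng = random.Random(seed)
    return sorted(max_drawdown(rng.choices(returns, k=len(returns)))
                  for _ in range(n_samples))

def percentiles(sorted_values, qs=(0.05, 0.25, 0.50, 0.75, 0.95)):
    """Nearest-rank percentiles of an already sorted list."""
    last = len(sorted_values) - 1
    return {q: sorted_values[round(q * last)] for q in qs}
```

Comparing the single historical drawdown with, say, the 95th percentile shows how much worse a different ordering of the same returns could have been.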
Exercise 25: The Bailey-López de Prado Minimum Backtest Length
Implement the Bailey-López de Prado formula for the minimum backtest length (MBL) needed to avoid false discoveries:
$$MBL \geq \frac{1}{SR^2} \left[ (z_\alpha + z_\beta)^2 + \frac{(z_\alpha + z_\beta)^4}{4} \hat{\gamma}_3^2 + \frac{(z_\alpha + z_\beta)^2}{4} (\hat{\gamma}_4 - 3) \right]$$
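Where $\hat{\gamma}_3$ and $\hat{\gamma}_4$ are the skewness and (non-excess) kurtosis of returns, so $\hat{\gamma}_4 - 3$ is the excess kurtosis, and $z_\alpha$ and $z_\beta$ are the standard normal quantiles for the chosen significance level and power. Calculate MBL for a strategy with Sharpe 1.0, skewness -0.5, and excess kurtosis 3.0 (that is, $\hat{\gamma}_4 = 6$), stating the significance level and power you use.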
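The expression can be evaluated directly. This sketch assumes a one-sided 5% significance level and 80% power (the exercise leaves both open) and takes excess kurtosis, $\hat{\gamma}_4 - 3$, as input. The result is a number of return observations at the same frequency as the Sharpe ratio.

```python
from statistics import NormalDist

def min_backtest_length(sr, skew, excess_kurt, alpha=0.05, power=0.80):
    """Evaluate the MBL expression; excess_kurt is gamma_4 - 3."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha) + std_normal.inv_cdf(power)   # z_alpha + z_beta
    return (z**2 + (z**4 / 4) * skew**2 + (z**2 / 4) * excess_kurt) / sr**2

print(round(min_backtest_length(1.0, 0.0, 0.0), 1),
      round(min_backtest_length(1.0, -0.5, 3.0), 1))
```

With these defaults the normal-returns baseline is about 6.2 observations, and the negative skew and fat tails raise it to about 13.2.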
Exercise 26: Transaction Cost Sensitivity Surface
Create a 3D surface plot showing strategy net return as a function of two cost parameters: spread (0--10 cents) and fee rate (0--5%). Identify the "break-even" contour where the strategy transitions from profitable to unprofitable.
Exercise 27: Stale Price Detection
Implement a stale price detection algorithm that identifies periods in prediction market data where the quoted price has not changed for an unusually long time. The algorithm should:
- Flag prices that have not moved in more than 2x the average inter-trade interval
- Distinguish between genuinely stale quotes and markets that legitimately trade at stable prices
- Adjust the execution simulator to use wider spreads during detected stale periods
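One possible sketch, assuming snapshots with a timestamp, last price, and traded volume. A flat price is labeled `stable` while contracts are still trading and `stale` only when trades have also stopped for longer than the threshold. The inter-trade interval is estimated from snapshots with nonzero volume, and the spread multiplier in the hypothetical `adjusted_spread` helper is an arbitrary illustrative choice.

```python
def classify_quotes(timestamps, prices, volumes, factor=2.0):
    """Label each snapshot 'live', 'stable', or 'stale'."""
    trade_times = [t for t, v in zip(timestamps, volumes) if v > 0]
    if len(trade_times) < 2:
        raise ValueError("need at least two trades to estimate the inter-trade interval")
    avg_gap = (trade_times[-1] - trade_times[0]) / (len(trade_times) - 1)
    limit = factor * avg_gap
    labels = []
    last_change = last_trade = timestamps[0]
    for i, t in enumerate(timestamps):
        if i > 0 and prices[i] != prices[i - 1]:
            last_change = t
        if volumes[i] > 0:
            last_trade = t
        if t - last_change <= limit:
            labels.append("live")
        elif t - last_trade <= limit:
            labels.append("stable")      # price flat, but contracts still trading
        else:
            labels.append("stale")       # price flat and no recent trades
    return labels

def adjusted_spread(base_spread, label, stale_multiplier=2.0):
    """Widen the simulated spread for snapshots flagged as stale."""
    return base_spread * stale_multiplier if label == "stale" else base_spread
```

An execution simulator could call `adjusted_spread` with each snapshot's label before computing fill prices.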
Exercise 28: Portfolio-Level Backtesting
Extend the backtesting framework to handle a portfolio of 50 prediction markets simultaneously. The portfolio should:
- Respect a total capital constraint
- Implement equal-weight and risk-parity allocation schemes
- Compute portfolio-level metrics (including cross-market correlation effects)
- Handle the fact that different markets have different resolution dates
Exercise 29: Slippage Model Calibration
Given a dataset of 1,000 actual prediction market executions (with order size, market conditions at time of order, and actual fill price), calibrate the parameters of the square-root impact model. Report the calibrated $\beta$ coefficient and its confidence interval. Compare the calibrated model's predictions to a simple constant-slippage model.
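A sketch of the calibration, assuming each execution records order size $Q$, daily volume $V$, volatility $\sigma$, and slippage measured as the adverse difference between the fill price and the pre-trade midpoint. With the assumed model form $\text{slippage} = \beta \sigma \sqrt{Q/V}$, $\beta$ is a least-squares fit through the origin. The confidence interval uses a normal approximation with homoskedastic errors, which is reasonable for 1,000 observations but optimistic if errors grow with order size.

```python
import math

def calibrate_sqrt_impact(sizes, volumes, sigmas, slippages, z=1.96):
    """Fit slippage = beta * sigma * sqrt(Q / V) by least squares through the origin.

    Returns (beta, (ci_low, ci_high), rmse).
    """
    xs = [s * math.sqrt(q / v) for q, v, s in zip(sizes, volumes, sigmas)]
    sxx = sum(x * x for x in xs)
    beta = sum(x * y for x, y in zip(xs, slippages)) / sxx
    resid = [y - beta * x for x, y in zip(xs, slippages)]
    n = len(xs)
    se = math.sqrt(sum(r * r for r in resid) / (n - 1) / sxx)   # homoskedastic SE
    rmse = math.sqrt(sum(r * r for r in resid) / n)
    return beta, (beta - z * se, beta + z * se), rmse

def constant_slippage_rmse(slippages):
    """In-sample RMSE of the benchmark that predicts the mean slippage for every order."""
    c = sum(slippages) / len(slippages)
    return math.sqrt(sum((y - c) ** 2 for y in slippages) / len(slippages))
```

Comparing `rmse` with `constant_slippage_rmse(slippages)` gives the requested comparison; evaluating both on a held-out subset of executions would be a fairer test than in-sample error.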
Exercise 30: End-to-End Backtest Pipeline
Build a complete end-to-end backtest pipeline that:
1. Downloads prediction market data from an API (or loads it from a provided CSV)
2. Cleans and validates the data
3. Implements a momentum strategy (buy markets whose prices have risen over the past N periods)
4. Runs walk-forward backtesting with parameter optimization
5. Applies the fill simulator and transaction cost model
6. Generates a full performance report with statistical significance tests
7. Produces a go/no-go recommendation for paper trading
This should be a single script that runs from start to finish and produces a PDF or HTML report.