Chapter 22 Key Takeaways
Core Concepts
- Statistical models serve as independent probability estimators. By building models that map observable features to event probabilities, you create a benchmark against which to evaluate prediction market prices. When your model's probability diverges significantly from the market price, a potential trading opportunity exists.
- Logistic regression is the workhorse for binary prediction markets. It naturally produces well-calibrated probability estimates between 0 and 1, is interpretable (coefficients are log-odds), and is relatively resistant to overfitting. Always start with logistic regression before considering more complex models (a minimal fitting sketch follows this list).
- Feature engineering is at least as important as model choice. Well-constructed features — polling averages, momentum indicators, volume signals, time-to-expiry effects, and cross-market information — drive model performance. A simple model with excellent features will outperform a complex model with poor features (see the feature-construction sketch below).
- Regularization prevents overfitting. Ridge (L2) shrinks coefficients toward zero; Lasso (L1) performs automatic feature selection by setting some coefficients exactly to zero; Elastic Net combines both. Always standardize features before applying regularization, and tune the regularization strength using time-respecting cross-validation (see the pipeline sketch below).
- Stationarity is a prerequisite for time series modeling. Prediction market prices are typically non-stationary, but price changes (first differences) or log-odds returns are approximately stationary. The augmented Dickey-Fuller (ADF) test provides a formal check; ACF and PACF plots reveal the autocorrelation structure (see the ADF sketch below).
- ARIMA captures short-term price dynamics. The Box-Jenkins methodology — identification via ACF/PACF, estimation, diagnostic checking — provides a disciplined approach to time series modeling. ARIMA is most useful for 1-to-5-step-ahead forecasts (see the ARIMA sketch below).
- GARCH captures volatility dynamics. Volatility clusters in prediction markets: volatile periods tend to persist. GARCH models quantify current volatility, which is essential for position sizing and risk management. Higher conditional volatility warrants smaller positions (see the GARCH(1,1) sketch below).
- Walk-forward validation is mandatory for time series data. Standard k-fold cross-validation creates lookahead bias by mixing past and future data. Walk-forward validation respects temporal ordering and provides realistic out-of-sample performance estimates (see the walk-forward loop below).
- Lookahead bias is the most dangerous error in backtesting. It can enter at every stage: feature computation, scaling, feature selection, hyperparameter tuning, and model selection. Every computation must use only data available at the time of prediction.
- Model selection uses information criteria and proper scoring rules. AIC and BIC balance fit and complexity. Log-loss and Brier score measure the quality of probability estimates. Calibration assessment ensures predicted probabilities match observed frequencies (see the scoring and calibration sketch below).
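The sketches below illustrate these concepts; all data, column names, and parameter values are hypothetical. First, a minimal scikit-learn fit of the logistic regression workhorse on a synthetic feature matrix `X` and outcome vector `y` (note that scikit-learn applies mild L2 regularization by default):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: 500 resolved markets, 3 features each (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_logit = 0.2 + X @ np.array([1.0, -0.5, 0.3])
y = (rng.random(500) < 1 / (1 + np.exp(-true_logit))).astype(int)

model = LogisticRegression()            # scikit-learn applies L2 regularization by default
model.fit(X, y)

p_hat = model.predict_proba(X)[:, 1]    # probability estimates in (0, 1)
log_odds_coefs = model.coef_[0]         # coefficients are log-odds effects
odds_ratios = np.exp(log_odds_coefs)    # e^beta_j, as in the odds ratio formula below
```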
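The feature families listed above (polling averages, momentum, volume, time to expiry) can be built with a few rolling and differencing operations. A sketch assuming a daily DataFrame with a DatetimeIndex and hypothetical columns `price`, `poll_share`, and `volume`:

```python
import numpy as np
import pandas as pd

def build_features(df: pd.DataFrame, expiry: pd.Timestamp) -> pd.DataFrame:
    """Construct lagged features from a daily market DataFrame (column names are illustrative)."""
    out = pd.DataFrame(index=df.index)
    out["poll_avg_7d"] = df["poll_share"].rolling(7).mean()   # smoothed polling average
    out["momentum_5d"] = df["price"].diff(5)                  # 5-day price momentum
    out["log_volume"] = np.log1p(df["volume"])                # dampened volume signal
    out["days_to_expiry"] = (expiry - df.index).days          # time-to-expiry effect
    # Shift everything by one day so each row only uses information
    # available before the prediction is made (guards against lookahead bias).
    return out.shift(1)
```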
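For the regularization recipe, a scikit-learn Pipeline keeps standardization and the elastic-net penalty together, and GridSearchCV with TimeSeriesSplit tunes the penalty strength on time-respecting folds, so the scaler is refit inside each training fold. Grid values are illustrative:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("scale", StandardScaler()),                     # refit within each training fold
    ("clf", LogisticRegression(penalty="elasticnet", solver="saga",
                               l1_ratio=0.5, max_iter=5000)),
])
param_grid = {
    "clf__C": [0.01, 0.1, 1.0, 10.0],    # inverse regularization strength
    "clf__l1_ratio": [0.0, 0.5, 1.0],    # 0 = ridge, 1 = lasso, in between = elastic net
}
search = GridSearchCV(pipe, param_grid,
                      cv=TimeSeriesSplit(n_splits=5),   # time-respecting folds
                      scoring="neg_log_loss")           # tune on a proper scoring rule
search.fit(X, y)    # X, y as in the logistic regression sketch above
```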
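A stationarity check, assuming `prices` is a pandas Series of market probabilities strictly between 0 and 1; the log-odds transform maps prices onto the real line before differencing:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

log_odds = np.log(prices / (1 - prices))   # prices assumed strictly inside (0, 1)
returns = log_odds.diff().dropna()         # log-odds changes, closer to stationary

adf_stat, p_value, *rest = adfuller(returns)
if p_value < 0.05:
    print(f"Unit root rejected (p = {p_value:.3f}): treat the series as stationary.")
else:
    print(f"Unit root not rejected (p = {p_value:.3f}): difference again or transform.")
# Autocorrelation structure: statsmodels.graphics.tsaplots.plot_acf / plot_pacf on `returns`.
```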
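A Box-Jenkins sketch on the same log-odds series. The order (1, 1, 1) is illustrative; in practice p and q come from the ACF/PACF plots, and the fitted summary's residual diagnostics guide refinement:

```python
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(log_odds, order=(1, 1, 1))   # order is illustrative; d = 1 differences internally
fit = model.fit()
print(fit.summary())                        # coefficient significance and residual diagnostics
forecast = fit.forecast(steps=5)            # short-horizon (1-to-5-step-ahead) forecasts
```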
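A GARCH(1,1) sketch using the `arch` package on the differenced series. Returns are scaled up because the optimizer tends to be better behaved when inputs are roughly in the 1-100 range; the one-step variance forecast then feeds position sizing:

```python
from arch import arch_model

am = arch_model(100 * returns, vol="GARCH", p=1, q=1, mean="Zero")  # scaled for numerical stability
res = am.fit(disp="off")
next_var = res.forecast(horizon=1).variance.iloc[-1, 0]   # one-step-ahead conditional variance
sigma_next = next_var ** 0.5
# Higher sigma_next -> smaller positions (see the Kelly sizing sketch further below).
```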
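A minimal expanding-window walk-forward loop. `make_model` is any factory returning a fresh estimator (for example, the regularized pipeline above); `X` and `y` are NumPy arrays in time order, and the window sizes are illustrative:

```python
import numpy as np
from sklearn.metrics import log_loss

def walk_forward(X, y, make_model, initial=250, step=25):
    """Fit on data up to t, predict the next block, then roll forward."""
    preds, actuals = [], []
    for end in range(initial, len(y) - step + 1, step):
        model = make_model()
        model.fit(X[:end], y[:end])                          # train strictly on the past
        p = model.predict_proba(X[end:end + step])[:, 1]     # predict the next block
        preds.append(p)
        actuals.append(y[end:end + step])
    preds, actuals = np.concatenate(preds), np.concatenate(actuals)
    return log_loss(actuals, preds), preds, actuals
```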
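Finally, scoring and calibration of the out-of-sample probabilities produced by the walk-forward loop. The ECE computed here is a simple unweighted version; a weighted variant would use per-bin counts:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss, log_loss

# y_true, p_hat: out-of-sample outcomes and probabilities from the walk-forward loop above.
print("log-loss:", log_loss(y_true, p_hat))
print("Brier:   ", brier_score_loss(y_true, p_hat))

frac_pos, mean_pred = calibration_curve(y_true, p_hat, n_bins=10)  # reliability diagram data
ece = np.abs(frac_pos - mean_pred).mean()    # simple unweighted expected calibration error
```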
Practical Rules of Thumb
- Start simple. A logistic regression with 3-5 well-chosen features often outperforms a complex model with 50 features on out-of-sample data.
- Expect modest R-squared values. In competitive markets, models that explain 5-15% of variance on out-of-sample data can still generate profitable trading signals.
- Use half-Kelly or less. The Kelly criterion provides optimal position sizing in theory, but estimation error in model probabilities makes full Kelly too aggressive. Half-Kelly or quarter-Kelly is more practical (a sizing sketch follows this list).
- Set trading thresholds above transaction costs. The minimum edge required for a trade should exceed the round-trip transaction cost (bid-ask spread plus fees) by a safety margin that reflects model uncertainty (see the threshold check below).
- Retrain models periodically. Relationships between features and outcomes can change over time (concept drift). A rolling window or periodic retraining schedule keeps the model current.
- Always compare to a naive baseline. An ARIMA model should beat the random walk forecast. A logistic regression should beat the historical base rate. If the model cannot beat simple baselines, the added complexity is not justified (a baseline comparison is sketched below).
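A fractional Kelly sizing sketch for a binary contract: if the model probability is p and the YES contract trades at price q, the full Kelly fraction for a long position works out to (p - q) / (1 - q), which is then scaled down to half- or quarter-Kelly. Numbers are illustrative:

```python
def kelly_fraction(p: float, q: float, multiplier: float = 0.5) -> float:
    """Fraction of bankroll for a long-YES position bought at price q.

    p: model probability of the event; q: market price; multiplier: 0.5 = half-Kelly.
    """
    edge = p - q
    if edge <= 0:
        return 0.0                      # no positive edge, no position
    f_full = edge / (1 - q)             # full Kelly for a binary contract paying 1
    return multiplier * f_full

size = kelly_fraction(p=0.62, q=0.55)   # illustrative: about 7.8% of bankroll at half-Kelly
```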
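A threshold check in the same spirit: act only when the model-versus-market gap clears round-trip costs plus an uncertainty buffer. All cost figures are illustrative placeholders:

```python
spread = 0.010     # bid-ask spread paid over a round trip (illustrative)
fees = 0.002       # exchange and settlement fees per round trip (illustrative)
buffer = 0.020     # safety margin for model uncertainty (illustrative)

min_edge = spread + fees + buffer

def should_trade(p_model: float, p_market: float) -> bool:
    """Trade only when the absolute edge exceeds costs plus the safety margin."""
    return abs(p_model - p_market) > min_edge
```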
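And a baseline comparison: score the model's out-of-sample probabilities against a constant forecast at the historical base rate; a Brier skill score above zero means the model adds value over that baseline. The analogous check for ARIMA is mean squared forecast error against the random-walk (last observed value) forecast:

```python
import numpy as np
from sklearn.metrics import brier_score_loss

# y_train: historical outcomes; y_true, p_hat: out-of-sample outcomes and model probabilities.
base_rate = y_train.mean()
brier_model = brier_score_loss(y_true, p_hat)
brier_base = brier_score_loss(y_true, np.full_like(p_hat, base_rate))

skill = 1.0 - brier_model / brier_base   # Brier skill score: > 0 beats the base-rate forecast
```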
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
| Using k-fold CV on time series | Overestimates model performance by 20-50% | Use walk-forward validation |
| Scaling features before splitting | Leaks test set information into training | Fit scalers within each fold |
| Too many features, too few observations | Severe overfitting | Use Lasso or manual feature selection |
| Ignoring transaction costs in backtest | Strategy looks profitable but loses money in practice | Include realistic transaction costs from the start |
| ARIMA on raw prices (not differenced) | Spurious regression, unreliable forecasts | Test for stationarity; difference as needed |
| Ignoring model calibration | Probability estimates do not match reality | Use reliability diagrams and ECE |
| Over-optimizing on training data | Model memorizes noise | Use proper train/validation/test splits |
Key Formulas
Logistic regression probability: $$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}$$
Odds ratio for coefficient $\beta_j$: $$OR_j = e^{\beta_j}$$
Log-loss (proper scoring rule): $$\text{Log-loss} = -\frac{1}{n}\sum_{i=1}^{n}[y_i \log(\hat{p}_i) + (1-y_i)\log(1-\hat{p}_i)]$$
Brier score: $$\text{Brier} = \frac{1}{n}\sum_{i=1}^{n}(\hat{p}_i - y_i)^2$$
AIC: $$\text{AIC} = -2\ell(\hat{\theta}) + 2k$$
GARCH(1,1) conditional variance: $$\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$
What to Remember
The central message of this chapter is that statistical modeling for prediction markets is an exercise in disciplined probability estimation. The goal is not to maximize in-sample fit or classification accuracy, but to produce well-calibrated, out-of-sample probability estimates that differ from market prices by more than the cost of trading. Every methodological choice — from feature engineering to regularization to validation — should serve this goal.