Chapter 22 Key Takeaways

Core Concepts

  1. Statistical models serve as independent probability estimators. By building models that map observable features to event probabilities, you create a benchmark against which to evaluate prediction market prices. When your model's probability diverges significantly from the market price, a potential trading opportunity exists.

  2. Logistic regression is the workhorse for binary prediction markets. It naturally produces well-calibrated probability estimates between 0 and 1, is interpretable (coefficients are log-odds), and is relatively resistant to overfitting. Always start with logistic regression before considering more complex models.

  3. Feature engineering is at least as important as model choice. Well-constructed features — polling averages, momentum indicators, volume signals, time-to-expiry effects, and cross-market information — drive model performance. A simple model with excellent features will outperform a complex model with poor features.

  4. Regularization prevents overfitting. Ridge (L2) shrinks coefficients toward zero; Lasso (L1) performs automatic feature selection by setting some coefficients exactly to zero; Elastic Net combines both. Always standardize features before applying regularization, and tune the regularization strength using time-respecting cross-validation (a minimal end-to-end sketch of this workflow follows this list).

  5. Stationarity is a prerequisite for time series modeling. Prediction market prices are typically non-stationary, but price changes (first differences) or log-odds returns are approximately stationary. The ADF test provides a formal check; ACF and PACF plots reveal the autocorrelation structure (see the time-series sketch after this list).

  6. ARIMA captures short-term price dynamics. The Box-Jenkins methodology — identification via ACF/PACF, estimation, diagnostic checking — provides a disciplined approach to time series modeling. ARIMA is most useful for 1-to-5-step-ahead forecasts.

  7. GARCH captures volatility dynamics. Volatility clusters in prediction markets: volatile periods tend to persist. GARCH models quantify current volatility, which is essential for position sizing and risk management. Higher conditional volatility warrants smaller positions.

  8. Walk-forward validation is mandatory for time series data. Standard k-fold cross-validation creates lookahead bias by mixing past and future data. Walk-forward validation respects temporal ordering and provides realistic out-of-sample performance estimates.

  9. Lookahead bias is the most dangerous error in backtesting. It can enter at every stage: feature computation, scaling, feature selection, hyperparameter tuning, and model selection. Every computation must use only data available at the time of prediction.

  10. Model selection uses information criteria and proper scoring rules. AIC and BIC balance fit and complexity. Log-loss and Brier score measure the quality of probability estimates. Calibration assessment ensures predicted probabilities match observed frequencies.
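
The modeling and validation points above (items 2, 4, 8, and 9) fit into a single workflow: fit the scaler and the regularized logistic regression inside each training fold, and let the cross-validation splits respect time order. The sketch below is a minimal illustration assuming scikit-learn; the synthetic feature matrix `X`, the outcome vector `y`, and the three candidate values of the regularization strength `C` are placeholders, not recommendations from the chapter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: X holds features known at prediction time, in time order;
# y holds 0/1 outcomes. Both are synthetic stand-ins for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (rng.random(500) < 0.5).astype(int)

# Scaler and model sit inside one pipeline, so standardization is re-fit on
# each training fold only; no test-fold information leaks into the scaling.
model = Pipeline([
    ("scale", StandardScaler()),
    ("logit", LogisticRegression(penalty="l1", solver="liblinear")),
])

# TimeSeriesSplit always trains on the past and validates on the future,
# approximating walk-forward validation; log-loss is the tuning criterion.
search = GridSearchCV(
    model,
    param_grid={"logit__C": [0.1, 1.0, 10.0]},  # inverse regularization strength
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_log_loss",
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```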

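Items 5 through 7 translate into a short time-series workflow: test for stationarity, fit ARIMA on the differenced prices for short-horizon forecasts, and fit GARCH(1,1) to obtain the conditional volatility used in position sizing. The sketch below assumes the statsmodels and arch packages; the synthetic price series and the ARIMA(1,1,1) order are illustrative choices only.

```python
import numpy as np
import pandas as pd
from arch import arch_model
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# Placeholder price history; in practice this is the contract's price series.
rng = np.random.default_rng(1)
prices = pd.Series(0.5 + np.cumsum(rng.normal(scale=0.01, size=300))).clip(0.01, 0.99)

# 1. Stationarity: ADF p-values for raw prices versus first differences.
print("ADF p-value, levels:     ", adfuller(prices)[1])
print("ADF p-value, differences:", adfuller(prices.diff().dropna())[1])

# 2. ARIMA(1,1,1): d=1 differences the prices internally; 1-to-5-step forecast.
arima_fit = ARIMA(prices, order=(1, 1, 1)).fit()
print(arima_fit.forecast(steps=5))

# 3. GARCH(1,1) on scaled price changes; the latest conditional volatility
#    is the quantity that feeds position sizing.
returns = 100 * prices.diff().dropna()
garch_fit = arch_model(returns, vol="Garch", p=1, q=1).fit(disp="off")
print(np.asarray(garch_fit.conditional_volatility)[-1])
```
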
Practical Rules of Thumb

  • Start simple. A logistic regression with 3-5 well-chosen features often outperforms a complex model with 50 features on out-of-sample data.

  • Expect modest R-squared values. In competitive markets, models that explain 5-15% of variance on out-of-sample data can still generate profitable trading signals.

  • Use half-Kelly or less. The Kelly criterion provides optimal position sizing in theory, but estimation error in model probabilities makes full Kelly too aggressive. Half-Kelly or quarter-Kelly is more practical (a sizing sketch follows this list).

  • Set trading thresholds above transaction costs. The minimum edge required for a trade should exceed the round-trip transaction cost (bid-ask spread plus fees) by a safety margin that reflects model uncertainty.

  • Retrain models periodically. Relationships between features and outcomes can change over time (concept drift). A rolling window or periodic retraining schedule keeps the model current.

  • Always compare to a naive baseline. An ARIMA model should beat the random walk forecast. A logistic regression should beat the historical base rate. If the model cannot beat simple baselines, the added complexity is not justified.

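To make the sizing and threshold bullets concrete: for a binary contract bought at market price q that pays 1 on a win, the full-Kelly fraction reduces to (p - q)/(1 - q), where p is the model probability. The helper below applies a half-Kelly multiplier and a cost-based minimum edge; the function name, the 2-cent round-trip cost, and the 1.5x safety margin are illustrative assumptions, not figures from the chapter.

```python
def position_fraction(p_model: float, q_market: float,
                      round_trip_cost: float = 0.02,
                      kelly_multiplier: float = 0.5) -> float:
    """Fraction of bankroll to commit to buying a binary contract at q_market,
    given model probability p_model. Cost and multiplier defaults are
    illustrative; returns 0.0 when the edge does not clear costs."""
    edge = p_model - q_market
    # Require the edge to exceed round-trip costs by a safety margin (assumed 1.5x).
    if edge <= round_trip_cost * 1.5:
        return 0.0
    # Full-Kelly fraction for buying at price q with payoff 1 on a win.
    kelly = (p_model - q_market) / (1.0 - q_market)
    return kelly_multiplier * kelly

# Example: model says 62%, market price is 55 cents, 2-cent round-trip cost.
print(position_fraction(0.62, 0.55))  # roughly 0.078 of bankroll at half-Kelly
```
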
Common Pitfalls

| Pitfall | Consequence | Prevention |
|---|---|---|
| Using k-fold CV on time series | Overestimates model performance by 20-50% | Use walk-forward validation |
| Scaling features before splitting | Leaks test set information into training | Fit scalers within each fold |
| Too many features, too few observations | Severe overfitting | Use Lasso or manual feature selection |
| Ignoring transaction costs in backtest | Strategy looks profitable but loses money in practice | Include realistic transaction costs from the start |
| ARIMA on raw prices (not differenced) | Spurious regression, unreliable forecasts | Test for stationarity; difference as needed |
| Ignoring model calibration | Probability estimates do not match reality | Use reliability diagrams and ECE |
| Over-optimizing on training data | Model memorizes noise | Use proper train/validation/test splits |

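The calibration row in the table above refers to reliability diagrams and ECE (expected calibration error). One way to compute ECE with equal-width probability bins is sketched below, assuming NumPy and, for the reliability diagram itself, scikit-learn's calibration_curve; the ten-bin default is a common convention rather than a requirement.

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE: the bin-count-weighted average gap between mean predicted
    probability and observed outcome frequency, using equal-width bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.searchsorted(edges[1:-1], y_prob)  # bin index 0 .. n_bins-1
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(y_true[in_bin].mean() - y_prob[in_bin].mean())
            ece += in_bin.mean() * gap  # weight = share of predictions in bin
    return ece

# The points for a reliability diagram come from the same binning, e.g. via
# sklearn.calibration.calibration_curve(y_true, y_prob, n_bins=10).
```
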
Key Formulas

Logistic regression probability: $$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}$$

Odds ratio for coefficient $\beta_j$: $$OR_j = e^{\beta_j}$$

Log-loss (proper scoring rule): $$\text{Log-loss} = -\frac{1}{n}\sum_{i=1}^{n}[y_i \log(\hat{p}_i) + (1-y_i)\log(1-\hat{p}_i)]$$

Brier score: $$\text{Brier} = \frac{1}{n}\sum_{i=1}^{n}(\hat{p}_i - y_i)^2$$

AIC: $$\text{AIC} = -2\ell(\hat{\theta}) + 2k$$

GARCH(1,1) conditional variance: $$\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$

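As a quick sanity check on these formulas, the snippet below evaluates log-loss and the Brier score both directly from the definitions and with their scikit-learn counterparts, and applies one step of the GARCH(1,1) variance recursion; every numeric value is made up for illustration.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

y = np.array([1, 0, 1, 1, 0])              # outcomes
p = np.array([0.8, 0.3, 0.6, 0.9, 0.2])    # model probabilities

# Log-loss and Brier score from their definitions vs. the sklearn equivalents.
print(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)), log_loss(y, p))
print(np.mean((p - y) ** 2), brier_score_loss(y, p))

# One GARCH(1,1) conditional-variance update with made-up parameter values.
omega, alpha, beta = 0.01, 0.05, 0.90
eps_prev_sq, sigma2_prev = 0.02**2, 0.03**2
print(omega + alpha * eps_prev_sq + beta * sigma2_prev)
```
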
What to Remember

The central message of this chapter is that statistical modeling for prediction markets is an exercise in disciplined probability estimation. The goal is not to maximize in-sample fit or classification accuracy, but to produce well-calibrated, out-of-sample probability estimates that differ from market prices by more than the cost of trading. Every methodological choice — from feature engineering to regularization to validation — should serve this goal.