Chapter 22 Further Reading
Foundational Textbooks
Regression and Classification
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer. The definitive reference for statistical learning. Chapters 3 (linear regression), 4 (logistic regression), and 7 (model assessment) are directly relevant to this chapter. Available free online from the authors.
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning (2nd ed.). Springer. A more accessible companion to the above. Chapters 4 and 6 cover logistic regression and regularization with clear explanations and R/Python examples. Also available free online.
- Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Wiley. Comprehensive treatment of logistic regression, multinomial models, and related techniques. Particularly strong on the interpretation of odds ratios and marginal effects.
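The references above frame logistic regression coefficients in terms of odds ratios. As a rough illustration, a fitted coefficient beta can be read as an exp(beta)-fold change in the odds per unit change in the feature. The sketch below uses synthetic data and hypothetical feature names, not anything from this chapter's dataset:

```python
# Minimal sketch: logistic regression coefficients read as odds ratios.
# Synthetic data; feature meanings (e.g. poll margin, days to resolution) are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))                      # two stand-in features
logits = 0.8 * X[:, 0] - 0.3 * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))   # simulated binary outcomes (YES/NO resolution)

model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(model.summary())
print("Odds ratios:", np.exp(model.params))      # exp(beta): multiplicative change in odds per unit of x
```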
Time Series Analysis
- Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press. The canonical graduate-level time series textbook. Covers ARIMA, state space models, and regime-switching models in rigorous detail. Chapters 3-5 (ARIMA) and 22 (Markov switching) are most relevant.
- Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts. An excellent, practical introduction to time series forecasting. Covers ARIMA, exponential smoothing, and forecast evaluation with modern Python/R implementations. Available free online at https://otexts.com/fpp3/.
- Tsay, R. S. (2010). Analysis of Financial Time Series (3rd ed.). Wiley. Focused on financial applications, with thorough coverage of GARCH models, volatility modeling, and non-linear time series. Chapters 3 and 4 are directly applicable to prediction market price analysis.
- Shumway, R. H., & Stoffer, D. S. (2017). Time Series Analysis and Its Applications (4th ed.). Springer. Accessible yet rigorous coverage of both classical and modern time series methods, including state space models and the Kalman filter.
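To connect the ARIMA material in Hamilton and in Hyndman & Athanasopoulos to code, the sketch below fits an ARIMA model with statsmodels on a simulated series. The order (1, 0, 0) and the simulated AR(1) data are illustrative assumptions, not recommendations for any particular market:

```python
# Minimal sketch: fit an ARIMA model and produce an out-of-sample forecast with statsmodels.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
n = 300
eps = rng.normal(scale=0.02, size=n)
series = np.zeros(n)
for t in range(1, n):                         # simulate an AR(1) stand-in for a price series
    series[t] = 0.9 * series[t - 1] + eps[t]

fit = ARIMA(series, order=(1, 0, 0)).fit()    # order=(p, d, q); chosen here for illustration only
print(fit.summary())
print(fit.forecast(steps=5))                  # five-step-ahead point forecasts
```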
Prediction Markets and Forecasting
- Wolfers, J., & Zitzewitz, E. (2004). "Prediction Markets." Journal of Economic Perspectives, 18(2), 107-126. Foundational survey of prediction markets covering efficiency, design, and applications. Provides the economic context for why statistical modeling of prediction markets is both possible and profitable.
- Manski, C. F. (2006). "Interpreting the Predictions of Prediction Markets." Economics Letters, 91(3), 425-429. Important paper on whether prediction market prices can be directly interpreted as probabilities. Relevant to understanding the relationship between model outputs and market prices.
- Arrow, K. J., Forsythe, R., Gorham, M., Hahn, R., Hanson, R., Ledyard, J. O., ... & Zitzewitz, E. (2008). "The Promise of Prediction Markets." Science, 320(5878), 877-878. Brief but influential advocacy for prediction markets from leading economists, providing context for why modeling these markets is a productive research area.
- Page, L. (2012). "'Are Prediction Markets Price Efficient?' Evidence from the ELI Exchange." Journal of Economic Behavior & Organization, 83(1), 5-14. Empirical analysis of prediction market efficiency, documenting the biases (favorite-longshot, etc.) that statistical models can potentially exploit.
Applied Statistical Methods for Prediction
- Silver, N. (2012). The Signal and the Noise: Why So Many Predictions Fail — but Some Don't. Penguin. Popular science treatment of forecasting with extensive discussion of election prediction models and the challenge of calibration. Provides intuition for the statistical concepts in this chapter.
- Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. Practical, Bayesian-influenced guide to regression modeling. The treatment of logistic regression (Chapters 5-6) is especially clear and applicable to prediction market modeling.
- Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics. Princeton University Press. Focuses on causal inference but provides excellent guidance on regression specifications, robust standard errors, and avoiding common pitfalls that are directly relevant to prediction market feature analysis.
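As a small illustration of the robust standard errors Angrist and Pischke emphasize, the sketch below compares classical and heteroskedasticity-robust (HC1) covariance estimates in statsmodels. The data are synthetic and deliberately heteroskedastic:

```python
# Minimal sketch: classical vs. heteroskedasticity-robust standard errors in statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 400
X = sm.add_constant(rng.normal(size=(n, 2)))                  # constant plus two regressors
# Noise variance grows with the first regressor, violating homoskedasticity.
y = X @ np.array([0.1, 0.5, -0.2]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))

classical = sm.OLS(y, X).fit()                # classical standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")     # heteroskedasticity-robust standard errors
print("Classical SEs:", classical.bse)
print("Robust SEs:   ", robust.bse)
```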
Regularization and Feature Selection
- Tibshirani, R. (1996). "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society: Series B, 58(1), 267-288. The original Lasso paper. Establishes the theoretical foundation for L1 regularization and its feature selection properties.
- Zou, H., & Hastie, T. (2005). "Regularization and Variable Selection via the Elastic Net." Journal of the Royal Statistical Society: Series B, 67(2), 301-320. Introduces Elastic Net, combining L1 and L2 penalties. Essential for understanding how to handle correlated features in prediction market models.
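The contrast between these two papers can be seen directly in scikit-learn, which exposes both penalties through LogisticRegression (the saga solver supports L1 and Elastic Net). The sketch below uses synthetic, nearly collinear features to illustrate the typical behavior: the L1 penalty tends to keep only some of a group of correlated features, while the Elastic Net spreads weight across them. The penalty strength C and l1_ratio are illustrative assumptions:

```python
# Minimal sketch: L1 vs. Elastic Net penalties on correlated features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 1000
base = rng.normal(size=n)
X = np.column_stack([base + 0.05 * rng.normal(size=n) for _ in range(5)])  # 5 nearly collinear features
y = (base + rng.normal(scale=0.5, size=n) > 0).astype(int)
X = StandardScaler().fit_transform(X)          # regularization assumes comparable feature scales

lasso = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000).fit(X, y)
enet = LogisticRegression(penalty="elasticnet", solver="saga", C=0.1,
                          l1_ratio=0.5, max_iter=5000).fit(X, y)
print("L1 coefficients:         ", lasso.coef_.round(3))  # often zeroes out most of the duplicates
print("Elastic Net coefficients:", enet.coef_.round(3))   # tends to share weight across duplicates
```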
GARCH and Volatility Modeling
- Bollerslev, T. (1986). "Generalized Autoregressive Conditional Heteroskedasticity." Journal of Econometrics, 31(3), 307-327. The foundational GARCH paper. Extends Engle's ARCH model to the generalized form used throughout this chapter and in practice.
- Engle, R. F. (2001). "GARCH 101: The Use of ARCH/GARCH Models in Applied Econometrics." Journal of Economic Perspectives, 15(4), 157-168. Accessible introduction to GARCH modeling by the inventor of ARCH, who received the Nobel Prize for this work. Excellent starting point for understanding volatility modeling.
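A minimal sketch of fitting a GARCH(1,1) model with the arch package (listed under Software Documentation below); the simulated returns are only a stand-in for whatever price-change series is being modeled:

```python
# Minimal sketch: GARCH(1,1) fit with the arch package on simulated returns.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(4)
returns = rng.normal(scale=1.0, size=1000)     # stand-in for daily price changes (in percent)

model = arch_model(returns, vol="GARCH", p=1, q=1, mean="Constant")
res = model.fit(disp="off")
print(res.summary())
print(res.conditional_volatility[-5:])         # most recent fitted conditional volatilities
```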
Model Validation and Evaluation
- Tashman, L. J. (2000). "Out-of-Sample Tests of Forecasting Accuracy: An Analysis and Review." International Journal of Forecasting, 16(4), 437-450. Comprehensive review of out-of-sample evaluation methods including walk-forward validation. Discusses the practical issues of forecast evaluation that are central to this chapter.
- Gneiting, T., & Raftery, A. E. (2007). "Strictly Proper Scoring Rules, Prediction, and Estimation." Journal of the American Statistical Association, 102(477), 359-378. Rigorous treatment of proper scoring rules including log-loss and Brier score. Establishes why these metrics are appropriate for evaluating probabilistic forecasts.
- Niculescu-Mizil, A., & Caruana, R. (2005). "Predicting Good Probabilities with Supervised Learning." Proceedings of the 22nd International Conference on Machine Learning, 625-632. Empirical study of calibration across different models. Shows that logistic regression tends to be well-calibrated, while other models (naive Bayes, SVMs, trees) often require post-hoc calibration.
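Both scoring rules discussed by Gneiting and Raftery are available in scikit-learn. The sketch below computes them for a handful of hypothetical forecast probabilities; in practice these would be out-of-sample predictions from a walk-forward evaluation of the kind Tashman reviews (scikit-learn's TimeSeriesSplit provides an expanding-window version of such splits):

```python
# Minimal sketch: log loss and Brier score for hypothetical probability forecasts.
import numpy as np
from sklearn.metrics import log_loss, brier_score_loss

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                       # realized outcomes
p_hat = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])        # forecast probabilities of the positive outcome

print("Log loss:   ", log_loss(y_true, p_hat))                    # mean negative log-likelihood
print("Brier score:", brier_score_loss(y_true, p_hat))            # mean squared error of probabilities
```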
Software Documentation
- scikit-learn: LogisticRegression. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html Official documentation for the primary logistic regression implementation used in this chapter.
- statsmodels: Time Series Analysis. https://www.statsmodels.org/stable/tsa.html Documentation for ARIMA, state space models, and time series diagnostics in Python.
- arch: ARCH/GARCH Models. https://arch.readthedocs.io/ Documentation for the arch package used for GARCH modeling in Python.
Online Resources
- Forecasting: Principles and Practice (free online textbook). https://otexts.com/fpp3/ Rob Hyndman's outstanding free textbook on forecasting, with practical exercises and R/Python code.
- FRED Economic Data (https://fred.stlouisfed.org/) and the Penn World Table. Free access to macroeconomic data useful as features in prediction market models.
- FiveThirtyEight Election Forecast Methodology. Historical documentation of Nate Silver's election forecasting methodology, which uses many of the techniques covered in this chapter (polling aggregation, logistic regression, simulation).
- Metaculus and Good Judgment Open. Forecasting platforms that provide historical calibration data useful for studying and practicing probability estimation.