Chapter 25 Further Reading
Foundational Papers
Forecast Combination
- Bates, J. M., & Granger, C. W. J. (1969). "The Combination of Forecasts." Operational Research Quarterly, 20(4), 451-468. The seminal paper that launched the forecast combination literature. Bates and Granger demonstrated that combining forecasts from different methods produces lower mean squared error than either method alone, and introduced the theoretical framework for optimal combination weights (see the sketch after this list).
- Clemen, R. T. (1989). "Combining Forecasts: A Review and Annotated Bibliography." International Journal of Forecasting, 5(4), 559-583. A comprehensive review of the first two decades of forecast combination research. Clemen's conclusion -- that "the results have been virtually unanimous: combining multiple forecasts leads to increased forecast accuracy" -- remains one of the strongest empirical generalizations in the field.
- Timmermann, A. (2006). "Forecast Combinations." Handbook of Economic Forecasting, Vol. 1, 135-196. A thorough modern treatment of forecast combination theory, covering optimal weights, Bayesian approaches, and the forecast combination puzzle. Essential reading for understanding why simple averages are so hard to beat.
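For two unbiased forecasts, the Bates-Granger optimal weight has a closed form in the error variances and covariance. Below is a minimal Python sketch that estimates the weight from historical errors; the function name and toy data are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def bates_granger_weight(e1, e2):
    """MSE-optimal weight on forecast 1, estimated from past forecast errors.

    For two unbiased forecasts f1, f2 with error (co)variances s11, s22, s12,
    the combination w*f1 + (1 - w)*f2 has minimum variance at
    w = (s22 - s12) / (s11 + s22 - 2*s12).
    """
    cov = np.cov(e1, e2)  # 2x2 sample covariance matrix of the errors
    s11, s22, s12 = cov[0, 0], cov[1, 1], cov[0, 1]
    return (s22 - s12) / (s11 + s22 - 2.0 * s12)

# Toy example: method 2 is twice as noisy, so method 1 gets most of the weight.
rng = np.random.default_rng(0)
e1 = rng.normal(0, 1.0, 200)  # historical errors of method 1
e2 = rng.normal(0, 2.0, 200)  # historical errors of method 2
print(f"weight on forecast 1: {bates_granger_weight(e1, e2):.2f}")  # near 0.8
```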
The Forecast Combination Puzzle
- Stock, J. H., & Watson, M. W. (2004). "Combination Forecasts of Output Growth in a Seven-Country Data Set." Journal of Forecasting, 23(6), 405-430. Demonstrates the forecast combination puzzle in macroeconomic forecasting: simple combination methods frequently outperform more sophisticated approaches across multiple countries and time periods.
- Smith, J., & Wallis, K. F. (2009). "A Simple Explanation of the Forecast Combination Puzzle." Oxford Bulletin of Economics and Statistics, 71(3), 331-355. Provides an elegant explanation of why equal-weight averaging is so robust: the estimation error in optimal weights is often large enough to offset the theoretical advantage of optimization (a small simulation after this list illustrates the effect).
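A short simulation makes the Smith-Wallis point concrete: with a short estimation window, sampling error in the estimated "optimal" weight can wipe out its theoretical edge over a plain average. The error covariance and window lengths below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, n_sims = 30, 200, 2000  # deliberately short training window
mse_eq = mse_opt = 0.0

for _ in range(n_sims):
    # Two forecasts with similar error variances and correlated errors:
    # the regime where equal weights are nearly optimal anyway.
    cov = np.array([[1.0, 0.5], [0.5, 1.2]])
    e = rng.multivariate_normal([0.0, 0.0], cov, n_train + n_test)
    e_tr, e_te = e[:n_train], e[n_train:]

    # Bates-Granger weight estimated on the short window.
    c = np.cov(e_tr.T)
    w = (c[1, 1] - c[0, 1]) / (c[0, 0] + c[1, 1] - 2 * c[0, 1])

    mse_opt += np.mean((w * e_te[:, 0] + (1 - w) * e_te[:, 1]) ** 2)
    mse_eq += np.mean((0.5 * (e_te[:, 0] + e_te[:, 1])) ** 2)

print(f"equal weights     MSE: {mse_eq / n_sims:.3f}")
print(f"estimated weights MSE: {mse_opt / n_sims:.3f}")  # typically no better
```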
Extremizing and Forecast Aggregation
- Baron, J., Mellers, B. A., Tetlock, P. E., Stone, E., & Ungar, L. H. (2014). "Two Reasons to Make Aggregated Probability Forecasts More Extreme." Decision Analysis, 11(2), 133-145. The theoretical and empirical case for extremizing. Provides two complementary justifications: the shared-information argument and the probing argument. Essential reading for anyone implementing extremizing in practice (a minimal implementation follows this list).
- Satopaa, V. A., Baron, J., Foster, D. P., Mellers, B. A., Tetlock, P. E., & Ungar, L. H. (2014). "Combining Multiple Probability Predictions Using a Simple Logit Model." International Journal of Forecasting, 30(2), 344-356. Introduces the logistic recalibration approach to finding the optimal extremizing factor. Shows that this approach outperforms simple extremizing and provides both bias correction and extremization simultaneously (see the second sketch after this list).
- Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown Publishing. The popular account of the Good Judgment Project, which demonstrated the power of extremized crowd forecasts in geopolitical prediction. Provides practical context for the theory covered in this chapter.
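A minimal sketch of extremizing in the spirit of Baron et al.: average the individual forecasts, then scale the result in log-odds space. The factor `a` and the function name are illustrative; the right factor depends on how much information the forecasters share:

```python
import numpy as np

def extremize(probs, a=2.0):
    """Average probability forecasts, then push the mean away from 0.5.

    probs: individual probability forecasts for a single binary event.
    a:     extremizing factor; a > 1 extremizes, a = 1 is a plain average.
    """
    p_bar = np.mean(probs)
    log_odds = np.log(p_bar / (1.0 - p_bar))
    return 1.0 / (1.0 + np.exp(-a * log_odds))

print(f"{extremize([0.7, 0.75, 0.8], a=2.0):.2f}")  # 0.90, vs. a mean of 0.75
```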
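Satopaa et al.'s logit-model idea can be approximated in a few lines: regress outcomes on the mean log-odds of the individual forecasts, so the fitted slope plays the role of the extremizing factor and the intercept absorbs bias. The synthetic forecast archive below is a stand-in assumption; in practice you would fit on resolved historical questions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic archive: 500 resolved questions, 5 forecasters per question.
rng = np.random.default_rng(2)
outcome = rng.random(500) < 0.5
signal = np.where(outcome[:, None], 0.65, 0.35)
probs = np.clip(signal + rng.normal(0.0, 0.1, (500, 5)), 0.01, 0.99)

# One feature per question: the mean log-odds across forecasters.
mean_logit = np.log(probs / (1.0 - probs)).mean(axis=1, keepdims=True)

model = LogisticRegression().fit(mean_logit, outcome)
print(f"fitted extremizing factor: {model.coef_[0, 0]:.2f}")  # > 1 here

def aggregate(p_row):
    """Recalibrated aggregate for a new question's forecasts."""
    z = np.log(p_row / (1.0 - p_row)).mean()
    return model.predict_proba([[z]])[0, 1]

print(f"{aggregate(np.array([0.7, 0.75, 0.8])):.2f}")
```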
Opinion Pooling
- Genest, C., & Zidek, J. V. (1986). "Combining Probability Distributions: A Critique and an Annotated Bibliography." Statistical Science, 1(1), 114-135. The classic review of linear and logarithmic pooling, covering axiomatic foundations, properties, and the relationship between the two approaches. Required reading for understanding the theoretical basis of probability combination (both pools are sketched in code after this list).
- Ranjan, R., & Gneiting, T. (2010). "Combining Probability Forecasts." Journal of the Royal Statistical Society: Series B, 72(1), 71-91. Introduces the beta-transformed linear pool, which generalizes both the linear pool and extremizing. Shows that the linear pool is generically miscalibrated and proposes corrections based on the beta transformation (see the second sketch after this list).
- Hora, S. C. (2004). "Probability Judgments for Continuous Quantities: Linear Combinations and Calibration." Management Science, 50(5), 597-604. Analyzes the calibration properties of linear combinations of probability forecasts and shows that combining individually well-calibrated forecasts generally does not preserve calibration.
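The two classical pools from Genest and Zidek differ in a way that is easiest to see in code: the linear pool averages probabilities, while the logarithmic pool takes a renormalized geometric mean and is therefore more extreme when forecasters agree. A minimal sketch for binary events, with equal weights by default:

```python
import numpy as np

def linear_pool(probs, w=None):
    """Weighted arithmetic mean of the probabilities."""
    probs = np.asarray(probs, dtype=float)
    w = np.full(len(probs), 1.0 / len(probs)) if w is None else np.asarray(w)
    return float(np.dot(w, probs))

def log_pool(probs, w=None):
    """Weighted geometric mean of the probabilities, renormalized."""
    probs = np.asarray(probs, dtype=float)
    w = np.full(len(probs), 1.0 / len(probs)) if w is None else np.asarray(w)
    num = np.prod(probs ** w)
    return float(num / (num + np.prod((1.0 - probs) ** w)))

p = [0.6, 0.7, 0.9]
print(f"linear pool: {linear_pool(p):.3f}")  # 0.733
print(f"log pool:    {log_pool(p):.3f}")     # 0.760, closer to the extreme
```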
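Ranjan and Gneiting's correction composes the linear pool with a Beta distribution function: with alpha = beta > 1 the transform extremizes symmetrically, and unequal parameters also correct bias. In the paper the parameters are fit by maximum likelihood on past forecast-outcome pairs; the values below are purely illustrative:

```python
import numpy as np
from scipy.stats import beta

def beta_transformed_pool(probs, a=2.0, b=2.0):
    """Beta CDF applied to the equally weighted linear pool."""
    return beta.cdf(np.mean(probs), a, b)

# The 0.733 linear pool from above is pushed outward to roughly 0.82.
print(f"{beta_transformed_pool([0.6, 0.7, 0.9]):.3f}")
```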
Stacking and Meta-Learning
- Wolpert, D. H. (1992). "Stacked Generalization." Neural Networks, 5(2), 241-259. The original paper on stacking. Wolpert introduces the concept of using a meta-learner to combine base model predictions and argues that this approach can reduce generalization error under broad conditions.
- Breiman, L. (1996). "Stacked Regressions." Machine Learning, 24(1), 49-64. Breiman's influential treatment of stacking, which emphasizes cross-validation to prevent overfitting and demonstrates the effectiveness of constrained regression as a meta-learner (see the sketch after this list).
- van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). "Super Learner." Statistical Applications in Genetics and Molecular Biology, 6(1). Introduces the Super Learner framework, which provides theoretical guarantees for cross-validated stacking: the ensemble performs asymptotically as well as the oracle-best weighted combination.
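A compact sketch of cross-validated stacking in the Wolpert/Breiman mold: build the level-one data from out-of-fold predictions, then fit a constrained meta-learner, here nonnegative least squares. The base models and toy data are arbitrary choices for illustration, not prescriptions from the papers:

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(0.0, 0.3, 300)

base_models = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4)]

# Level-one data: out-of-fold predictions, so the meta-learner never sees
# a base model's fit to its own training points (the key anti-overfitting step).
Z = np.zeros((len(y), len(base_models)))
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    for j, m in enumerate(base_models):
        Z[te, j] = m.fit(X[tr], y[tr]).predict(X[te])

# Breiman-style constrained regression: nonnegative least squares.
weights, _ = nnls(Z, y)
print("stacking weights:", np.round(weights, 2))

# Final ensemble: refit the base models on all data, combine with the weights.
preds = np.column_stack([m.fit(X, y).predict(X) for m in base_models])
print("ensemble MSE:", round(float(np.mean((preds @ weights - y) ** 2)), 3))
```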
Bayesian Model Averaging
- Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). "Bayesian Model Averaging: A Tutorial." Statistical Science, 14(4), 382-401. The definitive tutorial on BMA. Covers the theory, BIC approximation, and practical implementation (the BIC weighting is sketched after this list). Clear examples illustrate when BMA outperforms model selection and when it does not.
- Raftery, A. E., Gneiting, T., Balabdaoui, F., & Polakowski, M. (2005). "Using Bayesian Model Averaging to Calibrate Forecast Ensembles." Monthly Weather Review, 133(5), 1155-1174. Applies BMA to weather forecast ensembles, showing how BMA can produce calibrated probabilistic forecasts from ensembles of deterministic numerical weather prediction models.
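The BIC approximation in Hoeting et al. reduces to a two-line weight computation: each model's posterior probability is proportional to exp(-BIC/2). A minimal sketch with made-up BIC values:

```python
import numpy as np

def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Under the BIC approximation, p(M_k | data) is proportional to
    exp(-BIC_k / 2); subtracting the minimum BIC first avoids underflow.
    """
    bics = np.asarray(bics, dtype=float)
    w = np.exp(-(bics - bics.min()) / 2.0)
    return w / w.sum()

# Three candidate models: the BIC-best gets most, but not all, of the weight.
print(np.round(bma_weights([102.1, 104.3, 110.0]), 3))  # [0.74  0.246 0.014]
```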
Ensemble Diversity
- Krogh, A., & Vedelsby, J. (1995). "Neural Network Ensembles, Cross Validation, and Active Learning." Advances in Neural Information Processing Systems, 7, 231-238. Derives the ambiguity decomposition, which shows that ensemble error equals average individual error minus diversity. This fundamental result explains why diversity is the key ingredient in ensemble performance (verified numerically in the first sketch after this list).
- Kuncheva, L. I., & Whitaker, C. J. (2003). "Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy." Machine Learning, 51(2), 181-207. A comprehensive survey of diversity measures for classifier ensembles, including correlation, Q-statistic, disagreement, and double-fault measures. Analyzes the relationship between diversity and ensemble accuracy (two of the measures are sketched after this list).
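The Krogh-Vedelsby decomposition is an algebraic identity, so it can be checked numerically in a few lines; the toy ensemble below is an arbitrary assumption:

```python
import numpy as np

# Ambiguity decomposition for a squared-error ensemble with equal weights:
# (f_bar - y)^2 == mean_i (f_i - y)^2 - mean_i (f_i - f_bar)^2
rng = np.random.default_rng(4)
y = 1.0                                # the target value
f = rng.normal(1.0, 0.5, size=8)       # eight members' predictions
f_bar = f.mean()

ensemble_err = (f_bar - y) ** 2
avg_member_err = np.mean((f - y) ** 2)
diversity = np.mean((f - f_bar) ** 2)  # the "ambiguity" term

print(f"{ensemble_err:.6f} == {avg_member_err - diversity:.6f}")
```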
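Two of the pairwise measures Kuncheva and Whitaker survey, the Q-statistic and the disagreement measure, need only a contingency table of joint correctness. A minimal sketch; the example vectors are invented, and the Q-statistic is undefined when its denominator is zero:

```python
import numpy as np

def pairwise_diversity(correct_i, correct_j):
    """Q-statistic and disagreement from two 0/1 correctness vectors."""
    ci = np.asarray(correct_i, dtype=bool)
    cj = np.asarray(correct_j, dtype=bool)
    n11 = np.sum(ci & cj)    # both classifiers correct
    n00 = np.sum(~ci & ~cj)  # both wrong
    n10 = np.sum(ci & ~cj)   # only the first correct
    n01 = np.sum(~ci & cj)   # only the second correct
    q = (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)
    disagreement = (n01 + n10) / len(ci)
    return float(q), float(disagreement)

# Two accurate classifiers that err on different examples: Q = -1.0,
# disagreement = 0.5 -- a far more useful pair than two near-clones.
a = [1, 1, 1, 0, 1, 0, 1, 1]
b = [1, 0, 1, 1, 1, 1, 0, 1]
print(pairwise_diversity(a, b))
```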
Prediction Markets and Forecast Aggregation
- Manski, C. F. (2006). "Interpreting the Predictions of Prediction Markets." Economics Letters, 91(3), 425-429. Analyzes how to interpret prediction market prices as probability forecasts and the conditions under which the market price equals the mean belief of traders.
- Rothschild, D. (2009). "Forecasting Elections: Comparing Prediction Markets, Polls, and Their Biases." Public Opinion Quarterly, 73(5), 895-916. Compares prediction markets to polls as forecasting tools and discusses methods for combining them. Shows that markets and polls each have distinct biases that can be corrected through combination.
- Ungar, L., Mellers, B., Satopaa, V., Tetlock, P., & Baron, J. (2012). "The Good Judgment Project: A Large Scale Test of Different Methods of Combining Expert Judgments." AAAI Technical Report FS-12-06. Describes the aggregation methods used in the Good Judgment Project, including extremized weighted means, and demonstrates their superiority over simple averaging for geopolitical forecasting.
Textbooks and Surveys
- Dietterich, T. G. (2000). "Ensemble Methods in Machine Learning." International Workshop on Multiple Classifier Systems, 1-15. Springer. A highly accessible introduction to ensemble methods in machine learning, covering bagging, boosting, and stacking with clear intuitions for why they work.
- Zhou, Z.-H. (2012). Ensemble Methods: Foundations and Algorithms. CRC Press. A comprehensive textbook on ensemble methods covering theoretical foundations, major algorithms, and practical considerations. Excellent reference for the machine learning perspective on ensembles.
- Clemen, R. T., & Winkler, R. L. (1999). "Combining Probability Distributions from Experts in Risk Analysis." Risk Analysis, 19(2), 187-203. A practical guide to combining expert probability judgments, with special attention to the challenges of dependent experts and calibration.
Online Resources
- Metaculus Track Record: metaculus.com/questions/track-record — Real-world data on forecast aggregation performance, comparing community predictions to Metaculus's proprietary aggregation algorithm.
- Good Judgment Open: gjopen.com — A public forecasting platform where you can practice and observe forecast aggregation in action.
- scikit-learn Ensemble Methods: scikit-learn.org/stable/modules/ensemble.html — Documentation for implementing bagging, boosting, and stacking in Python.
- Forecast Combination Package (R): The ForecastComb R package implements many of the combination methods discussed in this chapter, including optimal weights, Bates-Granger, and various shrinkage approaches.