Chapter 23: Further Reading
Core Machine Learning Textbooks
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer. The definitive reference for statistical learning. Chapters on tree-based methods, boosting, and neural networks provide rigorous mathematical foundations. Available free online at https://hastie.su.domains/ElemStatLearn/.
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer. Excellent coverage of Bayesian approaches to machine learning, neural networks, and kernel methods. Particularly useful for understanding probability estimation from a Bayesian perspective.
- Murphy, K. P. (2022). Probabilistic Machine Learning: An Introduction. MIT Press. A modern, comprehensive treatment of ML with strong emphasis on probabilistic reasoning. Covers decision trees, ensemble methods, neural networks, and calibration within a unified probabilistic framework.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. The standard reference for deep learning theory and practice. Chapters on feedforward networks, regularization, and optimization are directly relevant to neural networks for tabular data. Available free at https://www.deeplearningbook.org/.
Gradient Boosting and XGBoost
- Chen, T., & Guestrin, C. (2016). "XGBoost: A Scalable Tree Boosting System." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. The original XGBoost paper. Describes the second-order approximation, regularization framework, and systems optimizations that make XGBoost effective and efficient. A minimal training sketch appears after this list.
- Ke, G., Meng, Q., Finley, T., et al. (2017). "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." Advances in Neural Information Processing Systems, 30. The LightGBM paper introducing GOSS (Gradient-based One-Side Sampling) and EFB (Exclusive Feature Bundling) for faster training.
- Friedman, J. H. (2001). "Greedy Function Approximation: A Gradient Boosting Machine." Annals of Statistics, 29(5), 1189-1232. The foundational paper on gradient boosting. Provides the theoretical framework that XGBoost and LightGBM build upon; the core stagewise update is restated after this list.
- Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2018). "CatBoost: Unbiased Boosting with Categorical Features." Advances in Neural Information Processing Systems, 31. Introduces CatBoost, another gradient boosting library with novel handling of categorical features and ordered boosting to reduce overfitting.
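A compact restatement of Friedman's stagewise procedure, for readers who want the core recursion before tackling the paper: at stage m a base learner is fit to the pseudo-residuals (the negative gradient of the loss), and the model takes a shrunken step in that direction. This is a simplified form that folds the line-search step into the shrinkage factor.

```latex
% Gradient boosting, stage m (after Friedman 2001):
% pseudo-residuals, base-learner fit, and the shrunken additive update.
\begin{aligned}
r_{im} &= -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}},
  \qquad i = 1, \dots, n, \\
h_m &\approx \arg\min_{h} \sum_{i=1}^{n} \bigl(r_{im} - h(x_i)\bigr)^2, \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x), \qquad 0 < \nu \le 1 \ \text{(shrinkage / learning rate)}.
\end{aligned}
```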
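For readers coming to the library from the paper, a minimal training sketch using XGBoost's scikit-learn interface. The data, parameter values, and variable names are illustrative, not taken from the paper.

```python
# Minimal, illustrative XGBoost training sketch (synthetic data, arbitrary parameters).
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))                      # synthetic features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)        # synthetic binary outcome
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

model = xgb.XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,   # shrinkage (eta)
    max_depth=4,
    reg_lambda=1.0,       # L2 regularization term in the paper's objective
)
model.fit(X_tr, y_tr)
probs = model.predict_proba(X_va)[:, 1]              # predicted probabilities
```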
Calibration
- Platt, J. (1999). "Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods." Advances in Large Margin Classifiers, 61-74. The original paper on Platt scaling. Shows how to fit a sigmoid function to convert SVM outputs into calibrated probabilities; the method applies broadly to any classifier that produces scores. A minimal sketch appears after this list.
- Niculescu-Mizil, A., & Caruana, R. (2005). "Predicting Good Probabilities with Supervised Learning." Proceedings of the 22nd International Conference on Machine Learning, 625-632. Empirically compares calibration across many classifiers (naive Bayes, logistic regression, SVMs, decision trees, boosted trees, random forests, neural networks). Establishes that boosted trees and random forests are typically miscalibrated and benefit from post-hoc calibration.
- Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). "On Calibration of Modern Neural Networks." Proceedings of the 34th International Conference on Machine Learning, 1321-1330. Shows that modern deep networks are often overconfident and introduces temperature scaling as a simple, effective calibration method; a sketch of the method appears after this list.
- Zadrozny, B., & Elkan, C. (2002). "Transforming Classifier Scores into Accurate Multiclass Probability Estimates." Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 694-699. Introduces isotonic regression for calibration and compares it with Platt scaling; a sketch appears after this list.
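A minimal sketch of Platt scaling, assuming held-out decision scores are available. Platt's original procedure fits the sigmoid by maximum likelihood with slightly smoothed targets; this sketch approximates it with (nearly) unregularized logistic regression on the score as a single feature.

```python
# Platt scaling sketch: fit p = 1 / (1 + exp(A*s + B)) on a held-out calibration set.
import numpy as np
from sklearn.linear_model import LogisticRegression

def platt_scale(scores_calib, y_calib, scores_new):
    """Map raw classifier scores to calibrated probabilities via a fitted sigmoid."""
    lr = LogisticRegression(C=1e6)   # large C approximates an unregularized fit
    lr.fit(np.asarray(scores_calib).reshape(-1, 1), y_calib)
    return lr.predict_proba(np.asarray(scores_new).reshape(-1, 1))[:, 1]
```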
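A minimal sketch of temperature scaling in the spirit of Guo et al., assuming validation-set logits are available: a single scalar T is chosen to minimize negative log-likelihood, and predictions are then made with softmax(logits / T). Accuracy is unchanged; only confidence is rescaled.

```python
# Temperature scaling sketch: tune one scalar T on validation logits.
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits_val, y_val):
    """logits_val: (n, k) array of validation logits; y_val: (n,) integer labels."""
    def nll(T):
        p = softmax(logits_val / T)
        return -np.mean(np.log(p[np.arange(len(y_val)), y_val] + 1e-12))
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

# At prediction time, use softmax(test_logits / fitted_T).
```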
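A minimal sketch of isotonic calibration in the spirit of Zadrozny and Elkan: a monotone, piecewise-constant map from scores to probabilities learned on a calibration set. It is more flexible than Platt scaling but needs more calibration data to avoid overfitting.

```python
# Isotonic calibration sketch: monotone map from raw scores to probabilities.
from sklearn.isotonic import IsotonicRegression

def isotonic_calibrate(scores_calib, y_calib, scores_new):
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(scores_calib, y_calib)   # learns a piecewise-constant, nondecreasing map
    return iso.predict(scores_new)
```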
SHAP and Model Interpretability
- Lundberg, S. M., & Lee, S. I. (2017). "A Unified Approach to Interpreting Model Predictions." Advances in Neural Information Processing Systems, 30. The foundational SHAP paper connecting Shapley values from game theory with model interpretability. Introduces SHAP values and their desirable properties.
- Lundberg, S. M., Erion, G., Chen, H., et al. (2020). "From Local Explanations to Global Understanding with Explainable AI for Trees." Nature Machine Intelligence, 2, 56-67. Introduces TreeSHAP, the fast exact algorithm for computing SHAP values in tree-based models. Essential reading for anyone using SHAP with XGBoost, LightGBM, or random forests; a usage sketch appears after this list.
- Molnar, C. (2022). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2nd ed.). A comprehensive, practitioner-friendly guide to interpretability methods including SHAP, LIME, partial dependence plots, and accumulated local effects. Available free at https://christophm.github.io/interpretable-ml-book/.
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135-1144. Introduces LIME (Local Interpretable Model-agnostic Explanations), an alternative to SHAP for local model interpretability.
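A minimal TreeSHAP usage sketch via the shap library. The fitted tree-ensemble `model` and feature matrix `X_valid` are placeholders standing in for objects trained elsewhere in the chapter.

```python
# TreeSHAP sketch: exact SHAP values for a fitted tree ensemble
# (`model` and `X_valid` are placeholder names).
import shap

explainer = shap.TreeExplainer(model)          # works with XGBoost, LightGBM, sklearn trees
shap_values = explainer.shap_values(X_valid)   # one attribution per feature per row
shap.summary_plot(shap_values, X_valid)        # global summary built from local explanations
```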
Neural Networks for Tabular Data
- Shwartz-Ziv, R., & Armon, A. (2022). "Tabular Data: Deep Learning is Not All You Need." Information Fusion, 81, 84-90. Demonstrates that well-tuned gradient boosting often matches or beats deep learning on tabular data. Essential reading for understanding when neural networks are and are not appropriate.
- Gorishniy, Y., Rubachev, I., Khrulkov, V., & Babenko, A. (2021). "Revisiting Deep Learning Models for Tabular Data." Advances in Neural Information Processing Systems, 34. Evaluates neural network architectures for tabular data, including MLP, ResNet-style, and attention-based models, and compares them with gradient boosting; a minimal MLP baseline is sketched after this list.
- Arik, S. O., & Pfister, T. (2021). "TabNet: Attentive Interpretable Tabular Learning." Proceedings of the AAAI Conference on Artificial Intelligence, 35(8), 6679-6687. Introduces TabNet, a neural network architecture specifically designed for tabular data with built-in feature selection and interpretability.
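For orientation, a minimal PyTorch MLP of the kind these papers use as a tabular baseline. The layer sizes and dropout rate are arbitrary; this is a sketch, not any paper's reference architecture.

```python
# Minimal tabular MLP baseline sketch in PyTorch (arbitrary sizes, binary classification).
import torch
import torch.nn as nn

class TabularMLP(nn.Module):
    def __init__(self, n_features, hidden=128, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, 1),            # one logit for binary classification
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

# Training would pair this with nn.BCEWithLogitsLoss and an Adam optimizer;
# predicted probabilities come from torch.sigmoid(model(x)).
```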
Hyperparameter Optimization
- Bergstra, J., & Bengio, Y. (2012). "Random Search for Hyper-Parameter Optimization." Journal of Machine Learning Research, 13, 281-305. Shows that random search is more efficient than grid search for hyperparameter optimization, especially when some hyperparameters matter much more than others; a sketch appears after this list.
- Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). "Optuna: A Next-Generation Hyperparameter Optimization Framework." Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2623-2631. The Optuna paper describing its pruning mechanisms, define-by-run API, and efficient search strategies; a minimal example appears after this list.
- Feurer, M., & Hutter, F. (2019). "Hyperparameter Optimization." Automated Machine Learning, Springer, 3-33. A comprehensive survey of hyperparameter optimization methods including grid search, random search, Bayesian optimization, and multi-fidelity methods.
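A minimal random-search sketch with scikit-learn, in the spirit of Bergstra and Bengio's argument that randomly sampled configurations cover the important hyperparameters more efficiently than a grid. The model, ranges, and budget are illustrative.

```python
# Random search sketch over a small random forest hyperparameter space.
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "n_estimators": randint(100, 1000),
    "max_depth": randint(2, 16),
    "min_samples_leaf": randint(1, 20),
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=50,                  # 50 randomly sampled configurations
    scoring="neg_log_loss",     # evaluate by probability quality
    cv=5,
    random_state=0,
)
# search.fit(X_train, y_train)  # placeholder training data
```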
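A minimal sketch of Optuna's define-by-run API: the search space is declared inside the objective function itself. The objective below is a stand-in placeholder rather than a real model fit.

```python
# Optuna define-by-run sketch (placeholder objective; swap in a real model fit).
import optuna

def objective(trial):
    learning_rate = trial.suggest_float("learning_rate", 1e-3, 0.3, log=True)
    max_depth = trial.suggest_int("max_depth", 2, 10)
    subsample = trial.suggest_float("subsample", 0.5, 1.0)
    # In practice: train a model with these values and return a validation log-loss.
    return (learning_rate - 0.05) ** 2 + 0.01 * max_depth + (1.0 - subsample) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```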
Prediction Markets and Forecasting
- Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown Publishers. Examines what distinguishes accurate forecasters, including calibration, belief updating, and the importance of probabilistic thinking. Directly relevant to understanding what good probability estimates look like.
- Wolfers, J., & Zitzewitz, E. (2004). "Prediction Markets." Journal of Economic Perspectives, 18(2), 107-126. A foundational survey of prediction markets, their design, and their accuracy relative to other forecasting methods.
- Gneiting, T., & Raftery, A. E. (2007). "Strictly Proper Scoring Rules, Prediction, and Estimation." Journal of the American Statistical Association, 102(477), 359-378. Rigorous treatment of proper scoring rules, including the Brier score and log loss, used to evaluate probability estimates. Essential for understanding why these metrics incentivize honest probability reporting; a short example appears after this list.
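As a quick reference, both scoring rules are available in scikit-learn. The outcomes and forecasts below are made-up numbers for illustration.

```python
# Proper scoring rules sketch: Brier score and log loss on illustrative forecasts.
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

y = np.array([1, 0, 1, 1, 0])                # illustrative binary outcomes
p = np.array([0.8, 0.3, 0.6, 0.9, 0.2])      # illustrative forecast probabilities

print(brier_score_loss(y, p))   # mean squared error of the probabilities
print(log_loss(y, p))           # average negative log-likelihood
```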
Software Documentation
- XGBoost Documentation (https://xgboost.readthedocs.io/). Official documentation with a parameter guide, tutorials, and API reference.
- LightGBM Documentation (https://lightgbm.readthedocs.io/). Official documentation for LightGBM.
- scikit-learn User Guide (https://scikit-learn.org/stable/user_guide.html). Comprehensive guide covering random forests, calibration (CalibratedClassifierCV), model selection, and evaluation metrics; a calibration sketch appears after this list.
- PyTorch Tutorials (https://pytorch.org/tutorials/). Official PyTorch tutorials for building and training neural networks.
- SHAP Documentation (https://shap.readthedocs.io/). Official SHAP library documentation with examples for TreeSHAP, DeepSHAP, and KernelSHAP.
- Optuna Documentation (https://optuna.readthedocs.io/). Official Optuna documentation for hyperparameter optimization.
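A minimal sketch of the CalibratedClassifierCV wrapper mentioned above, which handles the calibration-set splitting internally. The training data names are placeholders.

```python
# Post-hoc calibration sketch with scikit-learn's CalibratedClassifierCV.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

base = RandomForestClassifier(n_estimators=300, random_state=0)
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=5)  # or method="sigmoid"
# calibrated.fit(X_train, y_train)                 # placeholder training data
# probs = calibrated.predict_proba(X_new)[:, 1]    # calibrated probabilities
```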
Related Chapters
- Chapter 22: Logistic Regression and Probability Models — Logistic regression as the baseline model that ML methods seek to improve upon. Understanding logistic regression is a prerequisite for gradient boosting, which uses the logistic loss for binary classification.
- Chapter 24: Ensemble Methods and Model Stacking — Combining multiple ML models into ensembles that outperform individual models. Builds directly on the diverse set of models trained in this chapter.
- Chapter 25: Time Series Models for Prediction Markets — Specialized time-series methods (ARIMA, state-space models, recurrent neural networks) for sequential prediction market data.
- Chapter 20: Bayesian Inference for Prediction Markets — Bayesian approaches to probability estimation that complement the frequentist ML methods in this chapter.