Chapter 8 Further Reading: Supervised Learning — Regression


Regression Foundations

1. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2023). An Introduction to Statistical Learning with Applications in Python. Springer. The definitive introductory textbook on statistical learning, now available with Python code (previously R only). Chapters 3 (Linear Regression), 6 (Regularization), and 8 (Tree-Based Methods) cover the core algorithms from this chapter with the right balance of mathematical rigor and practical intuition. The free PDF is available at statlearning.com. If you read only one supplementary text for this course, make it this one.

2. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer. The advanced companion to ISLR, covering the same topics at greater mathematical depth. Not required for this course, but invaluable if you want to understand why Ridge and Lasso work, the geometric interpretation of regularization, or the statistical theory behind ensemble methods. Also freely available online. Recommended for students with quantitative backgrounds who want the full picture.

3. Montgomery, D. C., Peck, E. A., & Vining, G. G. (2021). Introduction to Linear Regression Analysis (6th ed.). Wiley. The standard graduate-level reference on linear regression, covering diagnostics, multicollinearity, influential observations, and model selection in exhaustive detail. More statistical than machine-learning-oriented, but provides the deepest treatment of the assumptions, violations, and remedies for linear models. Particularly strong on residual analysis and model diagnostics.


Regularization and Feature Selection

4. Tibshirani, R. (1996). "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society: Series B, 58(1), 267-288. The original paper introducing Lasso (L1) regularization. Tibshirani demonstrates how the L1 penalty produces sparse solutions — setting coefficients exactly to zero — and why this property makes Lasso a simultaneous model fitting and feature selection technique. A landmark paper that changed how practitioners approach high-dimensional regression. Readable even without deep statistical training.

5. Hoerl, A. E., & Kennard, R. W. (1970). "Ridge Regression: Biased Estimation for Nonorthogonal Problems." Technometrics, 12(1), 55-67. The paper that introduced Ridge (L2) regularization, demonstrating that deliberately introducing bias into coefficient estimates can reduce mean squared error when predictors are correlated. Published over fifty years ago, the core insight remains fundamental to modern machine learning. Provides historical context for why regularization exists and what problem it was designed to solve.

6. Zou, H., & Hastie, T. (2005). "Regularization and Variable Selection via the Elastic Net." Journal of the Royal Statistical Society: Series B, 67(2), 301-320. Introduces Elastic Net, which combines Ridge and Lasso penalties. Addresses Lasso's limitation when predictors are highly correlated (Lasso tends to select only one from a group of correlated features). Elastic Net encourages group selection, making it more robust in practice. The paper is accessible and the scikit-learn implementation is straightforward.


Tree-Based Methods and Gradient Boosting

7. Chen, T., & Guestrin, C. (2016). "XGBoost: A Scalable Tree Boosting System." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. The paper that introduced XGBoost, the algorithm that has dominated machine learning competitions on structured data for nearly a decade. Chen and Guestrin describe the algorithmic innovations — including regularized objective functions, sparsity-aware split finding, and cache-aware access patterns — that make XGBoost both accurate and computationally efficient. Essential reading for anyone deploying gradient boosting in production.

8. Breiman, L. (2001). "Random Forests." Machine Learning, 45(1), 5-32. Leo Breiman's foundational paper on Random Forests. Introduces the key ideas — bagging, random feature selection at each split, out-of-bag error estimation — with characteristic clarity. Breiman's practical orientation (he was a statistician who worked extensively with real data) makes the paper valuable for practitioners, not just theorists.
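Breiman's out-of-bag idea — using the samples each tree never saw during its bootstrap draw as a built-in validation set — is a one-flag feature in scikit-learn. The data below is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression problem: mostly signal, a little noise.
rng = np.random.default_rng(42)
X = rng.uniform(-2, 2, size=(500, 4))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

# oob_score=True scores each observation using only the trees whose
# bootstrap sample excluded it -- Breiman's out-of-bag error estimate,
# obtained without a separate holdout set.
forest = RandomForestRegressor(
    n_estimators=200,
    max_features="sqrt",   # random feature subset at each split
    oob_score=True,
    random_state=0,
).fit(X, y)

print(f"OOB R^2: {forest.oob_score_:.3f}")
```

The OOB score typically tracks cross-validated performance closely, which is why it is a cheap sanity check before committing to a full tuning run.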

9. Ke, G., Meng, Q., Finley, T., et al. (2017). "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." Advances in Neural Information Processing Systems, 30. Introduces LightGBM, a gradient boosting variant developed by Microsoft that achieves comparable accuracy to XGBoost with significantly faster training on large datasets. The leaf-wise tree growth strategy and histogram-based algorithms make LightGBM particularly well-suited for the kinds of large-scale tabular datasets common in business applications.


Time Series and Demand Forecasting

10. Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts. The most accessible and comprehensive textbook on time series forecasting, available freely at otexts.com/fpp3. Covers exponential smoothing, ARIMA, regression with time series errors, and the R/Python tools for implementing them. While this chapter introduces time series concepts for regression, Chapter 16 of this textbook will draw more heavily on Hyndman's framework. A must-read for anyone building forecasting models.

11. Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2020). "The M4 Competition: 100,000 Time Series and 61 Forecasting Methods." International Journal of Forecasting, 36(1), 54-74. A comprehensive analysis of the M4 forecasting competition, in which 61 methods were evaluated on 100,000 real-world time series. Key finding: hybrid methods that combine statistical models with machine learning consistently outperformed pure statistical or pure ML approaches. The paper provides empirical guidance on when and how to use different forecasting methods — directly relevant to the model comparison approach in this chapter's DemandForecaster.

12. Fildes, R., Ma, S., & Kolassa, S. (2022). "Retail Forecasting: Research and Practice." International Journal of Forecasting, 38(4), 1283-1318. A comprehensive review of demand forecasting in retail, bridging academic research and industry practice. Covers feature engineering for retail demand (promotions, weather, holidays), hierarchical forecasting, intermittent demand, and the organizational challenges of deploying forecasting systems. Directly relevant to the Athena storyline and Case Study 1 (Walmart).


Business Applications of Regression

13. Fisher, M., & Raman, A. (2010). The New Science of Retailing: How Analytics Are Transforming the Supply Chain and Improving Performance. Harvard Business Review Press. Two Harvard Business School professors make the case for data-driven inventory management, demand forecasting, and assortment planning. The book provides the business context for the regression techniques covered in this chapter — why better forecasts matter, how forecast accuracy translates to financial performance, and what organizational capabilities are required to act on model outputs.

14. Fader, P. S., & Hardie, B. G. S. (2009). "Probability Models for Customer-Base Analysis." Journal of Interactive Marketing, 23(1), 61-69. Introduces probabilistic models for customer lifetime value (CLV) prediction — one of the key regression applications listed in Section 8.1. Fader and Hardie's BG/NBD and Gamma-Gamma models provide a principled framework for predicting how long customers will remain active and how much they will spend. An alternative to the pure regression approach for CLV that offers stronger theoretical foundations.

15. Jiang, Z., & Seidmann, A. (2014). "Capacity Planning and Performance Contracting for Service Facilities." Decision Support Systems, 58, 31-42. Examines how regression-based demand forecasting feeds into capacity planning decisions — a direct extension of the "from forecast to action" framework in Section 8.11. Useful for understanding how forecast accuracy, safety margins, and service level targets interact in service operations, not just retail inventory.


Model Evaluation and Error Analysis

16. Shmueli, G. (2010). "To Explain or to Predict?" Statistical Science, 25(3), 289-310. A foundational paper on the distinction between explanatory modeling (understanding relationships) and predictive modeling (forecasting outcomes). Shmueli argues that the goals, methods, and evaluation criteria for the two tasks differ fundamentally — a distinction directly relevant to the caution in Section 8.12 about confusing correlation with causation. Required reading for anyone who builds regression models.

17. Gneiting, T. (2011). "Making and Evaluating Point Forecasts." Journal of the American Statistical Association, 106(494), 746-762. A rigorous treatment of how to evaluate point forecasts (single-number predictions) and when they are and are not sufficient. Gneiting argues that point forecasts should always be accompanied by uncertainty estimates — a principle illustrated by the Zillow case study, where the absence of uncertainty ranges contributed to overconfidence in iBuying pricing.
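Gneiting's point that a single-number forecast hides the uncertainty around it can be made concrete with quantile regression. One accessible route (an illustration, not the paper's method) is scikit-learn's gradient boosting with a quantile loss, fit once per quantile to produce a prediction band; the heteroscedastic toy data below is invented for the demonstration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data whose noise grows with x -- wider intervals at larger x.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(1000, 1))
y = 2 * X[:, 0] + rng.normal(scale=1 + 0.3 * X[:, 0])

# One model per quantile: the median as the point forecast, plus a
# 10th/90th percentile pair forming an 80% prediction interval.
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X, y)
    for q in (0.1, 0.5, 0.9)
}

X_new = np.array([[5.0]])
lo, mid, hi = (models[q].predict(X_new)[0] for q in (0.1, 0.5, 0.9))
print(f"point forecast {mid:.1f}, 80% interval [{lo:.1f}, {hi:.1f}]")
```

Reporting the interval alongside the point forecast is exactly the discipline whose absence the Zillow case study illustrates.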


Case Study Background

18. Hays, C. L. (2004). "What Walmart Knows About Customers' Habits." The New York Times, November 14, 2004. The original news report on Walmart's data mining capabilities, including the famous strawberry Pop-Tart discovery. While brief, the article provides the journalistic context for Case Study 1 and illustrates how early predictive analytics generated actionable business insights.

19. Parker, W. (2021). "Zillow Quits Home-Flipping Business, Cites Inability to Forecast Prices." The Wall Street Journal, November 2, 2021. The definitive breaking news story on Zillow's iBuying shutdown. Parker's reporting captures the internal turmoil, the financial scale of the losses, and CEO Rich Barton's acknowledgment that "the unpredictability in forecasting home prices far exceeds what we anticipated." Essential companion reading for Case Study 2.

20. Buchak, G., Grenadier, S., & Matvos, G. (2022). "iBuyers: Liquidity in Real Estate Markets." National Bureau of Economic Research Working Paper. An academic analysis of the iBuying business model, examining how algorithmic pricing affects real estate market dynamics, liquidity, and price discovery. The paper provides the theoretical framework for understanding why adverse selection was such a significant challenge for Zillow and its competitors.


Python and Tools

21. Géron, A. (2022). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (3rd ed.). O'Reilly Media. The most practical guide to implementing machine learning in Python. Part I covers regression, regularization, decision trees, and ensemble methods with working code. The progression from simple to complex models mirrors this chapter's approach. Géron's emphasis on practical considerations (feature scaling, cross-validation, hyperparameter tuning) makes this an ideal companion for the coding exercises.

22. scikit-learn Documentation. "Supervised Learning — Linear Models." scikit-learn.org. The official documentation for scikit-learn's regression implementations — LinearRegression, Ridge, Lasso, ElasticNet, and more. Includes mathematical descriptions, API references, and practical examples. Bookmark this page — you will reference it repeatedly when implementing regression models.

23. XGBoost Documentation. "XGBoost Parameters." xgboost.readthedocs.io. The official guide to XGBoost's parameters, including regularization controls (reg_alpha, reg_lambda), tree-building parameters (max_depth, min_child_weight), and learning parameters (learning_rate, n_estimators). Understanding these parameters is essential for tuning XGBoost models in practice.
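The parameters named in entry 23 fall into the three families the documentation describes. A common starting configuration might look like the dictionary below — the values are illustrative defaults to tune from, not recommendations:

```python
# Illustrative XGBoost starting parameters, grouped by family.
params = {
    # Regularization: L1 (reg_alpha) and L2 (reg_lambda) penalties
    # on leaf weights, from the paper's regularized objective.
    "reg_alpha": 0.0,
    "reg_lambda": 1.0,
    # Tree-building: depth limit and minimum child weight both
    # constrain how finely individual trees can partition the data.
    "max_depth": 6,
    "min_child_weight": 1,
    # Learning: a smaller learning_rate generally needs more
    # boosting rounds (n_estimators) to reach the same fit.
    "learning_rate": 0.1,
    "n_estimators": 500,
}

# With the scikit-learn wrapper these become keyword arguments, e.g.:
#   model = xgboost.XGBRegressor(**params).fit(X_train, y_train)
print(sorted(params))
```

A typical tuning workflow fixes the learning family first (small learning_rate, generous n_estimators with early stopping), then searches the tree-building and regularization families.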


Advanced Topics (Preview of Later Chapters)

24. Taylor, S. J., & Letham, B. (2018). "Forecasting at Scale." The American Statistician, 72(1), 37-45. The paper introducing Facebook Prophet, a time series forecasting tool designed for business analysts. Prophet handles seasonal effects, holidays, and trend changes automatically — features that require manual engineering in the regression approach from this chapter. Prophet will be covered in depth in Chapter 16.

25. Christoffersen, P. F., & Diebold, F. X. (1998). "Cointegration and Long-Horizon Forecasting." Journal of Business & Economic Statistics, 16(4), 450-458. An advanced treatment of long-horizon forecasting challenges, including the distinction between in-sample fit and out-of-sample predictive accuracy. Relevant for understanding why models that perform well on historical data often disappoint in production — a theme that connects the Zillow case study to the model evaluation framework in Chapter 11.


Each item in this reading list was selected because it directly supports concepts introduced in Chapter 8 and connects to the broader themes of the textbook. Items 10, 11, 12, and 24 anticipate the deeper time series treatment in Chapter 16. Items 16, 17, and 25 connect to the model evaluation framework in Chapter 11.