Chapter 9 Further Reading: Regression Analysis for Sports Modeling

The following annotated bibliography provides resources for deeper exploration of the regression analysis topics introduced in Chapter 9. Entries are organized by category and chosen for their relevance to sports modeling and betting applications.


Books: Statistical Foundations

1. James, Gareth, Witten, Daniela, Hastie, Trevor, Tibshirani, Robert, and Taylor, Jonathan. An Introduction to Statistical Learning with Applications in Python (ISLP). Springer, 2023. The standard introductory textbook for applied statistical learning, and the Python counterpart to the R-based second edition. Chapters 3 (linear regression), 4 (classification, including logistic regression), and 6 (linear model selection and regularization, including ridge regression and the lasso) are directly relevant to Chapter 9. The Python code examples and lab exercises make the concepts immediately applicable. This is the first reference we recommend for readers who want to deepen their regression foundations.

2. Hastie, Trevor, Tibshirani, Robert, and Friedman, Jerome. The Elements of Statistical Learning. Springer, 2009 (2nd edition). The more advanced companion to ISLP. Chapters 3 and 4 provide rigorous mathematical treatment of linear and logistic regression, including the geometry of least squares, distributional theory, and the bias-variance decomposition. Recommended for readers with strong mathematical backgrounds who want to understand the "why" behind the methods.

3. Kutner, Michael H., Nachtsheim, Christopher J., Neter, John, and Li, William. Applied Linear Statistical Models. McGraw-Hill, 2005 (5th edition). A comprehensive reference for regression diagnostics, including detailed treatment of residual analysis, influence diagnostics (Cook's distance, DFBETAS), multicollinearity (VIF, condition numbers), and remedial measures. Particularly valuable for the diagnostic work emphasized in Chapter 9.

4. Agresti, Alan. Categorical Data Analysis. Wiley, 2013 (3rd edition). The definitive reference on logistic regression and related methods for categorical outcomes. Chapters on binary regression, model diagnostics for logistic models, and overdispersion are directly relevant for sports win probability models.


Books: Sports Analytics Applications

5. Winston, Wayne L., Nestler, Scott, and Pelechrinis, Konstantinos. Mathletics: How Gamblers, Managers, and Fans Use Mathematics in Sports. Princeton University Press, 2022 (2nd edition). Contains chapters on regression-based sports models with accessible worked examples from football, basketball, and baseball. The treatment of point spread modeling and the use of regression for rating systems is particularly aligned with Chapter 9's approach.

6. Albert, Jim, Glickman, Mark E., Swartz, Tim B., and Koning, Ruud H., eds. Handbook of Statistical Methods and Analyses in Sports. Chapman & Hall/CRC, 2017. An academic handbook with chapters by leading sports statisticians. The chapters on regression models for game outcomes, team rating systems, and predictive model evaluation are directly relevant. Each chapter includes references to key papers in the field.

7. Severini, Thomas A. Analytic Methods in Sports: Using Mathematics and Statistics to Understand Data from Baseball, Football, Basketball, and Other Sports. CRC Press, 2020 (2nd edition). A mathematically rigorous treatment of sports analytics methods including linear models for run/point estimation, logistic models for win probability, and model validation. The textbook style with exercises makes it a natural companion to Chapter 9.


Academic Papers

8. Harville, David A. "Predictions for National Football League Games via Linear-Model Methodology." Journal of the American Statistical Association, 75(371), 1980, pp. 516-524. One of the earliest academic papers applying regression methods to NFL game prediction. Harville's linear model approach to team ratings and game predictions established a framework that remains influential. Essential historical reading for understanding the intellectual lineage of modern sports regression models.

9. Stern, Hal S. "On the Probability of Winning a Football Game." The American Statistician, 45(2), 1991, pp. 116-123. A compact, elegant paper that models the margin of victory in NFL games as approximately normally distributed around the point spread. Stern demonstrates how to convert point-spread predictions into win probabilities, a technique used throughout Chapter 9. The paper's clarity makes it accessible to students.
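
The spread-to-probability conversion described in this entry can be sketched in a few lines. This is a minimal illustration, assuming the margin of victory is normally distributed around the predicted spread; the standard deviation of 13.86 points is the figure Stern reports for his NFL sample and should be re-estimated for other leagues or eras. The function name win_probability is our own, not from the paper.

```python
from math import erf, sqrt

# Assumption: margin of victory ~ Normal(spread, SIGMA).
# 13.86 points is the standard deviation reported by Stern for his NFL data;
# treat it as a starting value, not a constant of nature.
SIGMA = 13.86


def win_probability(spread: float, sigma: float = SIGMA) -> float:
    """Probability that a team expected to win by `spread` points wins outright."""
    z = spread / sigma
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


if __name__ == "__main__":
    for s in (0.0, 3.0, 7.0, -3.0):
        print(f"predicted margin {s:+5.1f} -> win probability {win_probability(s):.3f}")
```

A predicted margin of zero maps to a probability of 0.5, and the probabilities for a margin of +s and -s sum to one, which is a quick sanity check on any implementation.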

10. Lopez, Michael J. and Matthews, Gregory J. "Building an NCAA Men's Basketball Predictive Model and Quantifying Its Success." Journal of Quantitative Analysis in Sports, 11(1), 2015, pp. 5-12. Demonstrates the construction and evaluation of a logistic regression model for predicting NCAA tournament outcomes. The paper's emphasis on out-of-sample evaluation and comparison against betting markets makes it directly relevant to the betting application sections of Chapter 9.

11. Manner, Hans. "Modeling and Forecasting the Outcomes of NBA Basketball Games." Journal of Quantitative Analysis in Sports, 12(1), 2016, pp. 31-41. Compares multiple regression approaches (OLS, ordered probit, random effects) for NBA game prediction. The paper provides a useful benchmark for the NBA logistic regression model developed in Case Study 2 and discusses the challenges of forecasting in basketball.

12. Pelechrinis, Konstantinos and Papalexakis, Evangelos. "The Anatomy of American Football: Evidence from 7 Years of NFL Game Data." PLoS ONE, 11(12), 2016. Uses regression analysis on NFL play-by-play data to decompose the factors driving game outcomes. The feature engineering approach (EPA-based metrics, situational variables) directly informed the modeling methodology in Chapter 9's case studies.


Applied Tutorials and Blog Posts

13. Football Outsiders (footballoutsiders.com). "Methods to Our Madness" (methodology section). The methodology documentation for DVOA (Defense-adjusted Value Over Average), one of the most widely cited team efficiency metrics in football analytics. Understanding how DVOA is constructed (essentially as a regression-adjusted efficiency metric) provides context for the EPA-based features used in Chapter 9.

14. FiveThirtyEight. "How Our NFL Predictions Work" (2015, updated annually). A detailed explanation of FiveThirtyEight's Elo-based NFL prediction system, which incorporates adjustments for home-field advantage, rest, travel, and quarterback quality. While Elo is not a regression model per se, the feature engineering decisions and evaluation methodology are highly relevant.
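
To see why an Elo system sits so close to the logistic models of Chapter 9, it helps to write out the standard Elo expected-score formula. The sketch below is generic Elo, not FiveThirtyEight's implementation; the home_advantage parameter and the ratings in the example are hypothetical values chosen only for illustration.

```python
from math import exp, log


def elo_win_probability(rating_home: float, rating_away: float,
                        home_advantage: float = 0.0) -> float:
    """Standard Elo expected score for the home team (base 10, scale 400)."""
    diff = rating_home + home_advantage - rating_away
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def logistic_win_probability(diff: float) -> float:
    """The same curve written as a logistic function with a fixed slope."""
    beta = log(10.0) / 400.0  # slope implied by the Elo scale
    return 1.0 / (1.0 + exp(-beta * diff))


if __name__ == "__main__":
    # Hypothetical ratings and a hypothetical 50-point home-field adjustment.
    p = elo_win_probability(1550.0, 1500.0, home_advantage=50.0)
    print(f"home win probability: {p:.3f}")
    print(f"logistic form:        {logistic_win_probability(100.0):.3f}")
```

In other words, Elo's win probability is a logistic curve in the rating difference with the slope fixed in advance rather than estimated from data, which is the main structural difference from a fitted logistic regression.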

15. Inpredictable (inpredictable.com). "NFL Regression Modeling Series." A series of blog posts by an independent sports analyst walking through the process of building NFL regression models, from feature selection through backtesting. The posts use R and provide code, covering many of the same topics as Chapter 9 with a practical emphasis.


Software Documentation and Tutorials

16. scikit-learn Documentation: Linear Models Module. The official documentation for sklearn.linear_model, including LinearRegression, LogisticRegression, Ridge, Lasso, ElasticNet, and RidgeCV/LassoCV. The user guide sections on regularization paths and cross-validation strategies are essential references for the programming exercises in Chapter 9.
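
As a quick orientation to the estimators named in this entry, the following sketch fits RidgeCV and LassoCV on synthetic data. The data, the coefficient values, and the alpha grid are made up for illustration; only the scikit-learn classes themselves come from the documentation.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: only the first two of five features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=1.0, size=200)

# Standardize before penalizing so the penalty treats features comparably.
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
ridge.fit(X, y)
lasso.fit(X, y)

ridge_model = ridge.named_steps["ridgecv"]
lasso_model = lasso.named_steps["lassocv"]

print("ridge alpha:", ridge_model.alpha_)
print("ridge coefficients:", np.round(ridge_model.coef_, 3))
print("lasso alpha:", lasso_model.alpha_)
print("lasso coefficients:", np.round(lasso_model.coef_, 3))
```

Both estimators choose the penalty strength by cross-validation and expose it as alpha_ after fitting. With time-ordered sports data, a chronological splitter such as TimeSeriesSplit is usually a safer choice for cv than the shuffled default used here.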

17. statsmodels Documentation: Regression and Linear Models. The documentation for statsmodels.api.OLS and statsmodels.api.Logit, including summary tables, diagnostic tests (Breusch-Pagan, Durbin-Watson, Jarque-Bera), and influence measures (Cook's distance, leverage). Statsmodels provides the inferential statistics (p-values, confidence intervals) that scikit-learn deliberately omits.

18. Seaborn and Matplotlib: Regression Diagnostic Visualization. The Seaborn residplot and regplot functions, combined with scipy.stats.probplot for Q-Q plots, provide the visualization toolkit for the diagnostic workflows in Chapter 9. The Matplotlib documentation for subplot layouts is essential for building the four-panel diagnostic plots.
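
A four-panel layout built from these functions might look like the sketch below. The choice of panels (fit, residuals, Q-Q plot, residual histogram) is an assumption about what a typical diagnostic grid contains and may differ from the exact panels used in Chapter 9; the data are synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats

# Synthetic data for a single-predictor regression.
rng = np.random.default_rng(2)
x = rng.normal(size=120)
y = 0.5 + 1.5 * x + rng.normal(scale=0.8, size=120)

# Least-squares fit, used to compute residuals for the Q-Q plot and histogram.
slope, intercept = np.polyfit(x, y, deg=1)
resid = y - (intercept + slope * x)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

sns.regplot(x=x, y=y, ax=axes[0, 0])            # scatter with fitted line
axes[0, 0].set_title("Fit")

sns.residplot(x=x, y=y, ax=axes[0, 1])          # residuals from a linear fit of y on x
axes[0, 1].set_title("Residuals vs. predictor")

stats.probplot(resid, dist="norm", plot=axes[1, 0])  # normal Q-Q plot
axes[1, 0].set_title("Normal Q-Q")

axes[1, 1].hist(resid, bins=20)
axes[1, 1].set_title("Residual histogram")

fig.tight_layout()
# fig.savefig("diagnostics.png")  # uncomment to write the figure to disk
```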


Data Sources

19. nflfastR / nflverse (nflverse.com). An open-source R package (with Python wrappers via nfl_data_py) that provides play-by-play NFL data going back to 1999, including EPA calculations, win probability estimates, and game-level aggregations. This is the primary data source for NFL regression modeling and the data that powers many of the features in Chapter 9's case studies. Updated weekly during the season.
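
Turning play-by-play rows into game-level features is mostly a grouping exercise. The sketch below aggregates EPA per play by game and offense using the nflverse column names game_id, posteam, and epa. The helper name team_game_epa and the toy rows are made up for illustration; real data would come from nfl_data_py, as noted in the comment.

```python
import pandas as pd


def team_game_epa(pbp: pd.DataFrame) -> pd.DataFrame:
    """Offensive plays and mean EPA per play for each team in each game."""
    # Rows without a possession team or an EPA value (e.g. timeouts) are dropped.
    plays = pbp.dropna(subset=["posteam", "epa"])
    return (
        plays.groupby(["game_id", "posteam"], as_index=False)
             .agg(plays=("epa", "size"), epa_per_play=("epa", "mean"))
    )


# With network access, real data could be loaded instead of the toy frame:
#   import nfl_data_py as nfl
#   pbp = nfl.import_pbp_data([2023])
# The rows below are invented purely to exercise the function.
toy = pd.DataFrame(
    {
        "game_id": ["2023_01_DET_KC"] * 5,
        "posteam": ["DET", "DET", "KC", "KC", None],
        "epa": [0.5, -0.1, 0.2, 0.4, None],
    }
)

summary = team_game_epa(toy)
print(summary)
```

The same pattern extends to defensive EPA by grouping on defteam, and to rolling averages by sorting on game date before aggregating.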

20. Basketball Reference (basketball-reference.com) and NBA API. Basketball Reference provides comprehensive NBA team and player statistics. The nba_api Python package accesses the NBA's official statistics API programmatically. These sources provide the net rating, pace, and schedule data used in the NBA logistic regression case study.


How to Use This Reading List

For readers working through this textbook sequentially, the following prioritization is suggested:

  • Start with: James et al. (entry 1) chapters 3, 4, and 6 for a thorough statistical grounding.
  • Go deeper on diagnostics: Kutner et al. (entry 3) for the most complete treatment of residual analysis and influence measures.
  • For sports-specific applications: Winston (entry 5) and Albert et al. (entry 6) for sports modeling context.
  • For programming implementation: scikit-learn docs (entry 16) and statsmodels docs (entry 17).
  • For data: nflfastR/nflverse (entry 19) for NFL and basketball-reference/nba_api (entry 20) for NBA.
  • For historical perspective: Harville (entry 8) and Stern (entry 9) to see how the field began.

These resources will be referenced again in later chapters as we build upon the regression foundations established here.