Chapter 15: Further Reading - NFL Modeling
Academic Papers and Research
- Burke, Brian. "Expected Points and Expected Points Added." Advanced Football Analytics (2009). The foundational explanation of the EPA framework that underpins modern NFL analytics. Describes how expected points are calculated from historical play-by-play data and how EPA captures the value of individual plays.
- Yurko, Ronald, Samuel Ventura, and Maksim Horowitz. "nflWAR: A Reproducible Method for Offensive Player Evaluation in Football." Journal of Quantitative Analysis in Sports (2019). Presents a framework for attributing team-level performance to individual players using play-by-play data, with applications to quarterback evaluation and player valuation.
- Lopez, Michael J., Gregory J. Matthews, and Benjamin S. Baumer. "How Often Does the Best Team Win? A Unified Approach to Understanding Randomness in North American Sport." Annals of Applied Statistics (2018). Compares the role of luck versus skill across the NFL, NBA, MLB, and NHL. Provides critical context for why NFL outcomes are so difficult to predict and why small sample sizes are a particular problem in football.
- Boulier, Bryan L., and Herman O. Stekler. "Predicting the Outcomes of National Football League Games." International Journal of Forecasting (2003). An early systematic study of NFL prediction models comparing different approaches to forecasting game outcomes and evaluating them against point spreads.
- Stern, Hal. "On the Probability of Winning a Football Game." The American Statistician (1991). Classic paper showing that the favorite's margin of victory is approximately normally distributed around the point spread, with a standard deviation of about 14 points. The approximation is still a common baseline for converting spreads into win probabilities.
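
Stern's normal approximation translates directly into a spread-to-win-probability conversion. The sketch below assumes the favorite's margin of victory is normal with mean equal to the point spread and uses the rounded 14-point standard deviation quoted in the entry; the function name and default value are illustrative and are not taken from the paper.

```python
from statistics import NormalDist

# Assumption: rounded standard deviation (in points) of the final margin
# around the point spread, as quoted in the Stern entry.
SIGMA_POINTS = 14.0


def favorite_win_probability(spread: float, sigma: float = SIGMA_POINTS) -> float:
    """Probability that a team favored by `spread` points wins outright.

    Models the favorite's margin as Normal(mean=spread, sd=sigma) and
    returns the probability that the margin is greater than zero.
    """
    margin = NormalDist(mu=spread, sigma=sigma)
    return 1.0 - margin.cdf(0.0)


if __name__ == "__main__":
    for spread in (0.0, 3.0, 7.0, 10.0):
        prob = favorite_win_probability(spread)
        print(f"favored by {spread:4.1f}: win probability {prob:.3f}")
```

Under these assumptions a pick'em game comes out at exactly 50 percent and a 7-point favorite at roughly 69 percent. Ties and the discreteness of football scores are ignored, so the output is a rough baseline rather than a calibrated forecast.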
Books
- Winston, Wayne. "Mathletics: How Gamblers, Managers, and Sports Enthusiasts Use Mathematics in Baseball, Basketball, and Football." Princeton University Press (2009). Accessible introduction to sports analytics with strong chapters on NFL statistical modeling and the mathematics of point spreads.
- Carroll, Bob, Pete Palmer, and John Thorn. "The Hidden Game of Football." Warner Books (1988); reissued by University of Chicago Press (2023). A pioneering work in football analytics that introduced many concepts that later became standard, including efficiency-based team evaluation.
- Alamar, Benjamin. "Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers." Columbia University Press (2013). Provides a broad framework for applying analytics to sports decision-making, with relevant sections on football player evaluation and game strategy.
Data Sources
- nflfastR / nfl_data_py. Open-source play-by-play data for all NFL games since 1999, with pre-calculated EPA, win probability, and completion probability. Available at https://github.com/nflverse/nflfastR (R) and https://github.com/nflverse/nfl_data_py (Python). The essential starting point for any NFL modeling project.
- Pro Football Reference. Comprehensive historical statistics including game logs, player stats, team rankings, and draft data. Available at https://www.pro-football-reference.com/. Useful for cross-referencing and for historical analysis predating the play-by-play data era.
- Football Outsiders (DVOA). Defense-adjusted Value Over Average ratings for every NFL team and player. Provides opponent-adjusted efficiency metrics that can serve as benchmarks or inputs to custom models. Originally published at https://www.footballoutsiders.com/; Football Outsiders ceased operations in 2023, and DVOA has since been published by FTN.
- ESPN's Total QBR (Total Quarterback Rating). A proprietary quarterback evaluation metric, distinct from the NFL's traditional passer rating, that attempts to isolate quarterback performance from supporting cast effects. While the full methodology is not public, QBR values provide a useful cross-check for quarterback-level analysis.
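
As a small illustration of working with the play-by-play data described in the nflfastR / nfl_data_py entry, the sketch below averages EPA per offensive play by team. The column names `posteam` and `epa` follow the nflverse play-by-play schema; the sample rows are made up for the example, and real data would typically arrive as a pandas DataFrame (for instance from `nfl_data_py.import_pbp_data`) rather than a list of dicts.

```python
import math
from collections import defaultdict


def _is_missing(value) -> bool:
    """Treat None and NaN as missing, since both can appear in exported rows."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def epa_per_play(plays):
    """Return {team: mean EPA per play} for the offense on each play.

    `plays` is an iterable of dicts with at least the keys `posteam`
    (possession team) and `epa` (expected points added on the play).
    Rows missing either value are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for play in plays:
        team = play.get("posteam")
        epa = play.get("epa")
        if _is_missing(team) or _is_missing(epa):
            continue
        totals[team] += epa
        counts[team] += 1
    return {team: totals[team] / counts[team] for team in totals}


if __name__ == "__main__":
    # Hypothetical rows, not real game data.
    sample_plays = [
        {"posteam": "KC", "epa": 0.5},
        {"posteam": "KC", "epa": -0.1},
        {"posteam": "BUF", "epa": 0.2},
        {"posteam": None, "epa": float("nan")},  # e.g. a timeout row
    ]
    for team, value in sorted(epa_per_play(sample_plays).items()):
        print(f"{team}: {value:+.3f} EPA/play")
```

In practice one would usually filter to pass and run plays before averaging, since special teams and penalty rows can distort a team's offensive number.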
Online Resources and Communities
- Lee Sharpe's NFL Data Repository. Curated datasets including game-level data, schedules, and draft picks in clean, analysis-ready formats. Available on GitHub. A convenient complement to play-by-play data.
- Open Source Football (blog). Tutorials and analyses using nflfastR data. Covers topics from basic EPA calculations to advanced modeling techniques. An excellent resource for learning to work with NFL play-by-play data in R and Python.
- The Athletic's NFL Analytics Coverage. Regular articles applying advanced metrics to NFL analysis, including pieces by Ben Baldwin, Nate Tice, and others. Subscription required but provides high-quality applied analytics content.
- PFF (Pro Football Focus). Grades every NFL player on every play. While the grades are subjective, PFF's charting data on pressures, separation, and coverage assignments is valuable and hard to obtain elsewhere. Subscription required for detailed data access.
Betting Market Resources
- Unabated. A line-shopping and odds-analysis platform that tracks opening lines, line movement, and market consensus across major sportsbooks. Essential for serious NFL bettors who need to identify the best available number.
- Pinnacle Sports Betting Resources. Pinnacle's editorial content on NFL betting covers market efficiency, closing line value, and bankroll management. Pinnacle is a reduced-juice sportsbook, and its closing lines are widely regarded as among the sharpest in the market.
- Spanky's NFL Picks (historical archive). One of the longest-running documented NFL betting records, providing a case study in the difficulty of long-term profitability against the spread even with a systematic approach.
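
Closing line value, mentioned in the Pinnacle entry, compares the price a bettor obtained with the market's closing price. The sketch below is one simple way to express that comparison: it converts American odds to implied probabilities, removes the bookmaker's margin by proportional normalization (one of several possible methods), and reports the gap between the no-vig closing probability and the probability implied by the bet price. All function names are hypothetical, and definitions of closing line value vary between sources.

```python
def implied_probability(american_odds: int) -> float:
    """Break-even win probability implied by American odds (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)


def no_vig_probabilities(odds_a: int, odds_b: int) -> tuple[float, float]:
    """Rescale both sides' implied probabilities so they sum to 1.

    Assumption: the bookmaker's margin is spread proportionally across
    the two outcomes. Other vig-removal methods exist.
    """
    prob_a = implied_probability(odds_a)
    prob_b = implied_probability(odds_b)
    total = prob_a + prob_b
    return prob_a / total, prob_b / total


def closing_line_value(bet_odds: int, closing_odds: int, closing_odds_other: int) -> float:
    """No-vig closing probability of the side bet, minus the bet's break-even probability.

    Positive values mean the bet was placed at a better price than the
    fair closing price implies.
    """
    fair_close, _ = no_vig_probabilities(closing_odds, closing_odds_other)
    return fair_close - implied_probability(bet_odds)


if __name__ == "__main__":
    # Hypothetical example: bet placed at +100, market closes -120 / +100.
    clv = closing_line_value(bet_odds=100, closing_odds=-120, closing_odds_other=100)
    print(f"closing line value: {clv:+.4f}")
```

A single positive number says little on its own; the quantity is usually tracked as an average over many bets.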
Methodological References
- Glickman, Mark E., and Hal S. Stern. "A State-Space Model for National Football League Scores." Journal of the American Statistical Association (1998). Introduces a Bayesian state-space framework for modeling NFL scores that elegantly handles the evolution of team quality over time. A technically demanding but highly rewarding read for those interested in Bayesian approaches to NFL prediction.
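
To give a feel for the state-space idea, the sketch below tracks a single team's strength with a scalar Kalman filter: strength drifts as a random walk between games, and each game yields a noisy observation of it. This is a deliberately simplified stand-in, not the model in the paper, which estimates all teams jointly in a fully Bayesian framework. The class and function names, the variance values, and the use of an opponent-adjusted margin as the observation are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StrengthEstimate:
    mean: float      # current best guess of team strength, in points
    variance: float  # uncertainty about that guess


def update_strength(
    prior: StrengthEstimate,
    observed_margin: float,
    process_var: float,
    obs_var: float,
) -> StrengthEstimate:
    """One predict-and-update step of a local-level (random walk) Kalman filter.

    process_var: how much true strength is assumed to drift between games.
    obs_var: how noisy a single game's margin is as a reading of strength.
    """
    # Predict: strength may have drifted since the last game.
    predicted_var = prior.variance + process_var
    # Update: blend the prediction with the new observation.
    gain = predicted_var / (predicted_var + obs_var)
    new_mean = prior.mean + gain * (observed_margin - prior.mean)
    new_var = (1.0 - gain) * predicted_var
    return StrengthEstimate(new_mean, new_var)


if __name__ == "__main__":
    # Hypothetical opponent-adjusted margins for one team, in points.
    margins = [10.0, -3.0, 7.0, 14.0]
    estimate = StrengthEstimate(mean=0.0, variance=36.0)
    for week, margin in enumerate(margins, start=1):
        estimate = update_strength(estimate, margin, process_var=1.0, obs_var=196.0)
        print(f"week {week}: strength {estimate.mean:+.2f} (var {estimate.variance:.1f})")
```

Because a single game is a very noisy observation (an observation variance of 196 corresponds to a 14-point standard deviation), the estimate moves only a fraction of the way toward each new result.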