Chapter 15: Key Takeaways - NFL Modeling
-
EPA/play is the foundational metric for NFL modeling. Expected Points Added per play captures offensive and defensive efficiency in a way that accounts for down, distance, and field position. It is the single most predictive per-play statistic available and should be the backbone of any NFL spread or totals model.
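As a minimal sketch of the definition above: a play's EPA is the expected points after the play minus the expected points before it, and EPA/play is the average over plays. The function name and the expected-points values are hypothetical; a real model would take them from a fitted expected-points model, which also handles scoring plays and changes of possession.

```python
from statistics import mean

def epa_per_play(plays):
    """Average Expected Points Added over a list of plays.

    Each play is a (ep_before, ep_after) pair of expected-points values,
    both from the offense's perspective.
    """
    if not plays:
        raise ValueError("need at least one play")
    return mean(after - before for before, after in plays)

# Hypothetical three-play sequence (illustrative values, not real data).
drive = [(0.9, 1.6), (1.6, 1.2), (1.2, 2.4)]
drive_epa = epa_per_play(drive)
```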
-
Small sample sizes are the NFL's defining analytical challenge. With only 17 regular-season games per team, many statistics do not stabilize within a single season. Turnover rate, red-zone efficiency, and third-down conversion rate are particularly noisy. Modelers should work with play-level data rather than game-level data to extract signal, and they should apply Bayesian shrinkage or regression to the mean to unstable metrics.
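One common way to implement the shrinkage described above is a weighted average of the observed rate and the league mean, where the prior carries the weight of a fixed number of pseudo-observations. The sketch below assumes that form. The prior weight of 30 and the rates are illustrative assumptions, not fitted values.

```python
def shrink_toward_mean(observed, n, prior_mean, prior_weight):
    """Shrink an observed rate toward a prior mean.

    n            -- number of observations behind `observed`
    prior_weight -- pseudo-observations assigned to the prior (an assumption
                    to be tuned; larger values mean stronger shrinkage)
    """
    if n < 0 or prior_weight < 0 or n + prior_weight == 0:
        raise ValueError("n and prior_weight must be non-negative and not both zero")
    return (n * observed + prior_weight * prior_mean) / (n + prior_weight)

# Hypothetical team: 8 touchdowns on 10 red-zone trips, league rate 55%.
shrunk_rate = shrink_toward_mean(observed=0.80, n=10, prior_mean=0.55, prior_weight=30)
```

With these inputs the estimate moves most of the way back toward the league rate, reflecting how little 10 trips say about true red-zone ability.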
-
Early-down efficiency is more predictive than all-down efficiency. Filtering to first and second down plays removes the high-variance, situation-dependent noise of third-down plays. Early-down EPA/play and success rate are more stable year-over-year and more predictive of future game outcomes.
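The early-down filter can be sketched as follows. The play records and field names (`down`, `epa`) are hypothetical. Success is defined here as EPA greater than zero, which is one common convention and an assumption on our part rather than the only possible definition.

```python
def early_down_summary(plays):
    """Return (EPA/play, success rate) using first- and second-down plays only."""
    early = [p for p in plays if p["down"] in (1, 2)]
    if not early:
        raise ValueError("no early-down plays")
    epa = sum(p["epa"] for p in early) / len(early)
    # Assumed definition: a play is a success when its EPA is positive.
    success_rate = sum(p["epa"] > 0 for p in early) / len(early)
    return epa, success_rate

# Illustrative plays; the third-down play is excluded by the filter.
sample_plays = [
    {"down": 1, "epa": 0.50},
    {"down": 2, "epa": -0.25},
    {"down": 3, "epa": 2.00},
    {"down": 1, "epa": 0.25},
]
early_epa, early_success = early_down_summary(sample_plays)
```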
-
Opponent adjustment is essential. Raw efficiency metrics are biased by schedule strength. An iterative opponent-adjustment process that accounts for the quality of each team's opponents produces ratings that are significantly more predictive than raw averages.
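One simple form of iterative opponent adjustment sets each team's rating to its average margin plus the average rating of its opponents, and repeats until the ratings settle. The sketch below assumes that scheme, in the style of a simple rating system. It is not necessarily the procedure used elsewhere in this book, and the three-team schedule is a toy example.

```python
from collections import defaultdict

def opponent_adjusted_ratings(games, iterations=50):
    """games: iterable of (team, opponent, margin), margin from `team`'s side.

    The margin can be in points or in EPA/play; the scheme is the same.
    """
    results = defaultdict(list)
    for team, opponent, margin in games:
        results[team].append((opponent, margin))
        results[opponent].append((team, -margin))

    ratings = {team: 0.0 for team in results}
    for _ in range(iterations):
        # New rating = average of (margin + opponent's current rating).
        updated = {
            team: sum(m + ratings[opp] for opp, m in res) / len(res)
            for team, res in results.items()
        }
        # Re-center so the league average stays at zero.
        league_avg = sum(updated.values()) / len(updated)
        ratings = {team: r - league_avg for team, r in updated.items()}
    return ratings

# Toy round-robin with EPA/play margins (illustrative values).
toy_games = [("A", "B", 0.10), ("B", "C", 0.10), ("A", "C", 0.20)]
toy_ratings = opponent_adjusted_ratings(toy_games)
```

This plain iteration converges on the toy schedule. Unusual schedule structures may need damping or a direct least-squares solve instead.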
-
Home-field advantage has declined significantly. The traditional 3-point home-field advantage has shrunk to approximately 1.5-2 points in recent seasons. Models that use the historical figure will systematically overvalue home teams.
-
Key numbers (3 and 7) are uniquely important in NFL spread betting. Because scores are built mostly from field goals and touchdowns, final margins cluster around 3 and 7. The difference between a line of 2.5 and 3.5 is far more consequential than the difference between 4.5 and 5.5. Teasers that cross both key numbers (for example, a six-point teaser that moves a favorite from -7.5 to -1.5) have historically shown positive expected value.
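The value of a half point can be measured directly from a sample of final margins: it is the share of games that land between the two lines. The sketch below does this on a small made-up list of margins, which is purely illustrative and exaggerates the clustering. It also computes the per-leg win rate a teaser needs to break even, assuming a two-leg teaser priced at -110. Actual teaser prices vary by book.

```python
def cover_rate(margins, spread):
    """Share of games the favorite wins by more than `spread` points."""
    return sum(m > spread for m in margins) / len(margins)

def half_point_cost(margins, low, high):
    """Drop in the favorite's cover rate when the line moves from low to high."""
    return cover_rate(margins, low) - cover_rate(margins, high)

def teaser_leg_breakeven(american_odds=-110, legs=2):
    """Per-leg win probability needed for a teaser to break even."""
    if american_odds >= 0:
        raise ValueError("this sketch only handles negative American odds")
    risk = abs(american_odds)
    parlay_breakeven = risk / (risk + 100)
    return parlay_breakeven ** (1 / legs)

# Made-up favorite margins for illustration only (not historical data).
TOY_MARGINS = [3, 3, 3, 7, 7, 10, 4, 1, -3, 14, 6, 3, -7, 17, 2, 5, 7, -1, 21, 3]

cost_through_3 = half_point_cost(TOY_MARGINS, 2.5, 3.5)
cost_through_5 = half_point_cost(TOY_MARGINS, 4.5, 5.5)
leg_breakeven = teaser_leg_breakeven()
```

Running the same functions on real historical margins is what would show whether a given teaser leg clears the break-even rate.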
-
Quarterback injuries dominate line movement. The typical impact of losing a starting quarterback ranges from 2 to 7 points depending on the quality gap between the starter and backup. A systematic, Bayesian approach to estimating this gap is more accurate than ad hoc adjustments.
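A minimal sketch of a Bayesian estimate of the starter-to-backup gap, assuming a normal prior on the gap and normally distributed per-game evidence (a standard normal-normal update). All numbers are illustrative assumptions: a prior gap of -4 points and a noisy three-game sample suggesting -1.

```python
def posterior_normal(prior_mean, prior_sd, sample_mean, sample_sd, n):
    """Posterior mean and sd for a normal mean with known per-game sd."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sample_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * sample_mean)
    return post_mean, post_var ** 0.5

# Hypothetical inputs: the prior says the backup costs 4 points (sd 1.5);
# three starts imply only 1 point, but single games are noisy (sd 6).
gap_mean, gap_sd = posterior_normal(
    prior_mean=-4.0, prior_sd=1.5, sample_mean=-1.0, sample_sd=6.0, n=3
)
```

Because three games carry little information, the posterior stays close to the prior, which is the behavior an ad hoc adjustment tends to get wrong.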
-
Preseason priors matter more in the NFL than in any other major sport. Because the season is short, early-season predictions depend heavily on preseason information. Previous season performance (regressed toward the mean), roster changes, and market-derived win totals are all valuable prior inputs.
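As a rough sketch of how a preseason prior might be built and then phased out, the code below regresses last season's rating toward zero, applies a roster adjustment, and blends the result with the in-season rating using a weight that decays with games played. The regression factor, the roster adjustment, and the six-game prior weight are all hypothetical values chosen for illustration.

```python
def preseason_prior(last_season_rating, regression=0.5, roster_adjustment=0.0):
    """Regress last season's rating toward the league mean of zero, then adjust."""
    return last_season_rating * (1.0 - regression) + roster_adjustment

def blended_rating(prior, in_season, games_played, prior_games=6.0):
    """Weight the prior as if it were worth `prior_games` games of evidence."""
    weight = prior_games / (prior_games + games_played)
    return weight * prior + (1.0 - weight) * in_season

# Hypothetical team: +6.0 last season, a net roster loss worth 1 point.
prior = preseason_prior(6.0, regression=0.5, roster_adjustment=-1.0)
week_0 = blended_rating(prior, in_season=0.0, games_played=0)
week_4 = blended_rating(prior, in_season=8.0, games_played=3)
```

A market-derived win total could enter as another term in `preseason_prior`; it is omitted here to keep the sketch short.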
-
The NFL closing line is highly efficient but not unbeatable. The closing spread predicts final margins with an RMSE of approximately 13.5 points, which is remarkably good given how much of any single game's margin is random. However, edges of 1-2 points do exist, particularly around injury mispricing, weather effects on totals, and early-season model uncertainty.
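The RMSE figure above is the root mean squared difference between the margin implied by the spread and the actual final margin. A minimal sketch of that calculation follows. The four games are made-up values that will not reproduce the 13.5-point figure.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual margins."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Expected home margin implied by the closing spread vs. actual home margin
# (illustrative numbers only).
closing_margin = [3.0, -7.0, 1.0, 6.0]
actual_margin = [10, -3, -13, 6]
closing_rmse = rmse(closing_margin, actual_margin)
```

Computing the same statistic for a model's predictions on the same games gives a direct comparison against the market.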
-
Turnovers should be modeled with extreme caution. Fumble recovery is essentially random, and interception rates have low year-over-year stability. Including raw turnover data as a model feature leads to overfitting. Use turnover-adjusted metrics or exclude turnovers entirely from predictive features.
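One way to build a turnover-adjusted metric is to replace fumbles actually lost with the number expected from total fumbles. The sketch below assumes a 50% recovery rate as a stand-in for "essentially random"; the true league rate should be estimated from data. The function names are hypothetical.

```python
def expected_fumbles_lost(fumbles, recovery_rate=0.5):
    """Fumbles a team would expect to lose given how often it fumbled.

    recovery_rate is the assumed chance the fumbling team recovers.
    """
    if not 0.0 <= recovery_rate <= 1.0:
        raise ValueError("recovery_rate must be between 0 and 1")
    return fumbles * (1.0 - recovery_rate)

def fumble_luck(fumbles, fumbles_lost, recovery_rate=0.5):
    """Negative values mean the team lost fewer fumbles than expected."""
    return fumbles_lost - expected_fumbles_lost(fumbles, recovery_rate)

# Hypothetical team: 20 fumbles, only 5 lost.
luck = fumble_luck(fumbles=20, fumbles_lost=5)
```

A team with strongly negative luck is a candidate for regression: its turnover margin is likely to look worse going forward even if nothing about the team changes.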
-
Pace interacts with efficiency to determine scoring. A totals model must account for both how well teams move the ball and how many opportunities they create. When two high-pace teams meet, the game contains more plays and possessions, so it will produce more total scoring opportunities than the teams' efficiency alone would suggest.
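A deliberately simplified sketch of the pace-times-efficiency idea: expected points are plays run multiplied by points per play, and plays run depend on how quickly each offense snaps the ball. The code assumes each offense holds the ball for half of a 3,600-second game and ignores clock stoppages, both of which are simplifying assumptions. All inputs are illustrative.

```python
GAME_SECONDS = 3600

def projected_plays(seconds_per_play, possession_share=0.5):
    """Plays an offense runs given its pace and its share of the clock."""
    return GAME_SECONDS * possession_share / seconds_per_play

def projected_total(sec_per_play_a, ppp_a, sec_per_play_b, ppp_b):
    """Projected combined score: plays times points per play, summed over teams."""
    points_a = projected_plays(sec_per_play_a) * ppp_a
    points_b = projected_plays(sec_per_play_b) * ppp_b
    return points_a + points_b

# Same efficiency (0.35 points per play), different pace.
slow_total = projected_total(30.0, 0.35, 30.0, 0.35)
fast_total = projected_total(25.0, 0.35, 25.0, 0.35)
```

Holding efficiency fixed, the faster pairing projects to a noticeably higher total, which is the effect a totals model needs to capture.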
-
Honest evaluation against the closing line is the ultimate test. A model that cannot consistently identify edges relative to the closing spread is not generating betting value, regardless of how well it explains past results. Track every prediction before game time and compare rigorously to the market.
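Tracking against the market can be as simple as logging the spread at which each bet was placed and the closing spread for the same side, then averaging the difference. The sketch below assumes spreads are written from the bettor's side (negative for a favorite), so a positive difference means the bet beat the close. The bet log is made up.

```python
def closing_line_value(bet_spread, closing_spread):
    """Points of value relative to the close, from the bettor's side.

    Taking +3.5 when the line closes +2.5 is +1.0; taking -2.5 when it
    closes -3.5 is also +1.0.
    """
    return bet_spread - closing_spread

def average_clv(bets):
    """Mean closing line value over (bet_spread, closing_spread) pairs."""
    if not bets:
        raise ValueError("no bets logged")
    return sum(closing_line_value(b, c) for b, c in bets) / len(bets)

# Hypothetical bet log, recorded before kickoff.
bet_log = [(3.5, 2.5), (-2.5, -3.5), (-7.0, -6.5)]
mean_clv = average_clv(bet_log)
```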