Chapter 15 Quiz: NFL Modeling
Test your understanding of NFL-specific modeling concepts, data, and market structure.
Question 1. What does EPA stand for in the NFL analytics context, and what baseline does it measure against?
Answer
EPA stands for Expected Points Added. It measures the change in expected points on a given play relative to the expected points at the start of that play, based on down, distance, field position, and other game-state variables. The baseline is the league-average expected points for a given game state.
Question 2. Why is the NFL's small sample size (17 regular-season games) a particular challenge for bettors compared to MLB or NBA?
Answer
With only 17 games, many team-level statistics do not stabilize within a single season. Metrics like turnover rate, red-zone efficiency, and third-down conversion rate carry significant noise, making it difficult to distinguish true team quality from random variation. Bettors must rely on more granular play-level data or priors from previous seasons to compensate.
Question 3. What are the two most common final margins of victory in NFL games, and why are they called "key numbers"?
Answer
The two most common margins are 3 (field goal) and 7 (touchdown). They are called key numbers because the NFL's scoring structure (3 for a field goal, 7 for a touchdown with extra point) causes game outcomes to cluster around these margins. This clustering makes the difference between a spread of 2.5 and 3.5 far more consequential than the difference between, say, 4.5 and 5.5.
Question 4. A team's offensive DVOA is +15.2% and their defensive DVOA is -10.8% (negative is good for defense). What does this imply about the team's overall quality?
Answer
This team is above average on both sides of the ball. Their offense performs 15.2% better than the league-average offense after adjusting for opponent and situation, and their defense holds opponents to 10.8% less per-play value than a league-average defense would. The combined DVOA of roughly +26% (15.2% minus -10.8%, before special teams) suggests this is an elite team, likely among the top few in the league.
Question 5. What is "garbage time" in the context of NFL analytics, and how should a modeler handle it?
Answer
Garbage time refers to plays that occur when the game's outcome is effectively decided, typically when one team has a large lead in the fourth quarter. Common definitions include a lead of 28 or more points in the fourth quarter, or situations where win probability exceeds 95%. Modelers should either exclude garbage-time plays from efficiency calculations or down-weight them, as they do not reflect a team's true competitive performance. Defenses may play prevent schemes and offenses may run conservative or experimental plays.
Question 6. Explain the concept of Success Rate in NFL analytics. How is a "successful" play defined on each down?
Answer
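The down-by-down thresholds in this answer translate directly into a small classifier. A minimal Python sketch, assuming the 50% / 70% / 100% definition used here; the function names and the sample plays are illustrative and not taken from any library:

```python
# Thresholds: share of yards-to-go that must be gained, keyed by down.
# These follow the 50% / 70% / 100% definition given in this answer.
SUCCESS_THRESHOLDS = {1: 0.5, 2: 0.7, 3: 1.0, 4: 1.0}


def is_successful(down: int, yards_to_go: float, yards_gained: float) -> bool:
    """Return True if the play meets the success threshold for its down."""
    return yards_gained >= SUCCESS_THRESHOLDS[down] * yards_to_go


def success_rate(plays) -> float:
    """plays: iterable of (down, yards_to_go, yards_gained) tuples."""
    plays = list(plays)
    if not plays:
        return 0.0
    return sum(is_successful(*play) for play in plays) / len(plays)


# Hypothetical plays, for illustration only.
sample_plays = [(1, 10, 5), (2, 5, 3), (3, 2, 2), (1, 10, 4)]
rate = success_rate(sample_plays)
print(f"Success rate: {rate:.0%}")
```

Other sources use different first- and second-down cutoffs, so the thresholds are kept in a single dictionary to make them easy to change.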
Success Rate measures the percentage of plays that achieve a positive expected outcome. A play is typically considered successful if it gains: 50% of yards to go on first down, 70% of yards to go on second down, or 100% of yards to go on third or fourth down. This metric is considered more stable than EPA/play because it is less influenced by explosive plays and turnovers.
Question 7. What is the typical home-field advantage in the NFL expressed in points, and how has it changed over the past decade?
Answer
Historically, NFL home-field advantage was approximately 3 points. Over the past decade, it has declined significantly to approximately 1.5 to 2 points, with some analyses suggesting it dropped even further during and after the 2020 season. Possible explanations include improved travel conditions, better preparation technology, and increased noise-canceling communication in helmets.
Question 8. You are building a spread prediction model. Should you include turnovers as a predictive feature? Justify your answer.
Answer
Turnovers should be treated with extreme caution. Fumble recovery rate is essentially random (approximately 50/50 regardless of team quality) and interception rate, while somewhat skill-based for quarterbacks, has low year-over-year correlation. Including raw turnover counts as a feature will likely cause overfitting to noise. A better approach is to use turnover-adjusted metrics (e.g., expected turnovers based on interceptable passes thrown) or to exclude turnovers entirely and let the model focus on more stable efficiency metrics.
Question 9. What is the nflfastR package, and why is it valuable for NFL modeling?
Answer
nflfastR is an open-source R package (with Python equivalents like nfl_data_py) that provides cleaned, play-by-play NFL data going back to 1999. It includes pre-calculated EPA values, win probability, completion probability, and other advanced metrics for every play. It is valuable because it provides free, comprehensive, research-grade data that previously required expensive proprietary sources, democratizing access to the raw material needed for serious NFL modeling.
Question 10. Describe how you would build a preseason prior for an NFL team rating system. What information would you incorporate?
Answer
A preseason prior should incorporate: (1) previous season's performance metrics (efficiency stats, not win-loss record), regressed toward the mean by roughly 30-40%; (2) offseason roster changes, particularly at quarterback; (3) draft capital added (draft pick value); (4) coaching changes; (5) market-derived information such as preseason win totals from sportsbooks. The prior should be weighted heavily in Weeks 1-4 and gradually replaced by current-season data as the sample grows.
Question 11. What is the approximate standard deviation of an NFL final score margin, and why does this matter for spread betting?
Answer
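To see how a point edge compares with a margin standard deviation of roughly 13.5 points, one can convert the edge into a cover probability. The sketch below assumes, purely for illustration, that final margins are normally distributed around the true spread; real NFL margins cluster on key numbers, so the output is approximate. The function name is hypothetical.

```python
import math


def cover_probability(edge_points: float, margin_sd: float = 13.5) -> float:
    """Probability of covering when the true spread differs from the posted
    line by edge_points, under an assumed normal margin distribution."""
    z = edge_points / margin_sd
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for edge in (0.5, 1.0, 2.0):
    print(f"{edge:.1f}-point edge -> cover probability {cover_probability(edge):.3f}")
```

Under these assumptions a 1-point edge covers roughly 53% of the time and a 2-point edge roughly 56%, which shows how thin the margin is relative to a breakeven rate just above 52% at standard -110 pricing.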
The standard deviation of NFL final score margins is approximately 13.5 to 14 points. This matters because it defines the uncertainty envelope around any point spread prediction. Even a model that perfectly estimates the true spread would see substantial variation in outcomes. To generate betting value, a model needs to identify edges of at least 1-2 points consistently, which is a small fraction of one standard deviation, so the edge only shows up over a large number of bets.
Question 12. Explain the difference between a team's pythagorean win expectation and their actual win-loss record. Why is this useful for modeling?
Answer
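The Pythagorean formula in this answer is simple to compute. A minimal Python sketch; the point totals and win count belong to a made-up team, and the function names are illustrative:

```python
def pythagorean_win_pct(points_for: float, points_against: float,
                        exponent: float = 2.37) -> float:
    """Win% = PF^exp / (PF^exp + PA^exp), with the NFL exponent 2.37."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def expected_wins(points_for: float, points_against: float,
                  games: int = 17, exponent: float = 2.37) -> float:
    """Pythagorean wins over a season of the given length."""
    return games * pythagorean_win_pct(points_for, points_against, exponent)


# Hypothetical team: 420 points scored, 350 allowed, 12 actual wins.
actual_wins = 12
pyth_wins = expected_wins(420, 350)
overperformance = actual_wins - pyth_wins
print(f"Pythagorean wins: {pyth_wins:.1f}, overperformance: {overperformance:+.1f}")
```

A positive overperformance value flags a team whose record outran its scoring margin, which is the regression candidate this answer describes.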
Pythagorean win expectation estimates the number of wins a team "should" have based on their total points scored and allowed, using the formula: Win% = PF^2.37 / (PF^2.37 + PA^2.37). Teams that significantly outperform their pythagorean expectation (more actual wins than expected) tend to regress the following season, as outperformance is often driven by unsustainable factors like a disproportionate record in close games. This makes pythagorean wins a better predictor of future performance than actual wins.
Question 13. How should a modeler account for weather effects in NFL totals predictions?
Answer
Wind is typically the most impactful weather variable: speeds above 15 mph reduce passing efficiency and field goal accuracy, and in extreme cases are worth 2-4 points off the total. Temperature has a modest effect below freezing. Precipitation (rain or snow) has a smaller but measurable effect. A modeler should collect historical weather data for outdoor stadiums, estimate the scoring impact per unit of each weather variable, and apply the adjustment to the predicted total. Indoor and dome games serve as the control group.
Question 14. What is a "reverse line movement" and what might it indicate about sharp action on an NFL game?
Answer
Reverse line movement occurs when the point spread moves in the opposite direction of the public betting percentages. For example, if 75% of spread bets are on Team A, but the line moves in favor of Team B, this suggests that the smaller percentage of bets on Team B represents larger-dollar wagers from sharp bettors. Sportsbooks adjust lines based on liability, and sharp money carries more weight than public money. Reverse line movement is often used as a signal of informed betting activity.
Question 15. Why might a modeler use early-down EPA/play rather than all-down EPA/play as a primary feature?
Answer
Early-down (first and second down) EPA/play is more predictive of future performance because it is less influenced by game script and situational variance. Third-down performance is heavily dependent on the down-and-distance created by first and second down plays and involves higher variance (blitzes, coverage adjustments). By focusing on early downs, the modeler captures a purer signal of offensive and defensive efficiency that is less contaminated by situation-dependent noise.
Question 16. A team is 8-2 ATS (against the spread) through 10 games. Should you conclude they are being undervalued by the market?
Answer
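The chance of an 8-2 or better record arising from coin-flip outcomes can be checked directly with a binomial tail sum. A minimal Python sketch; the function name is illustrative, and the 50% cover rate is the base-rate assumption used in this answer:

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


p_value = binomial_tail(8, 10)
# Expected number of teams at 8-2 or better by chance alone, if every one
# of the 32 teams covers at a true 50% rate.
expected_lucky_teams = 32 * p_value
print(f"P(8+ covers in 10) = {p_value:.4f}")
print(f"Expected teams at 8-2 or better by chance: {expected_lucky_teams:.2f}")
```

The second figure is the multiple-comparisons point: scanning all 32 teams for a hot ATS record will usually turn one up even when no team is mispriced.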
No. An 8-2 ATS record in 10 games is weak evidence on its own. Assuming a 50% base rate for covering the spread, the probability of going 8-2 or better by chance alone is approximately 5.5% (sum of binomial probabilities for 8, 9, and 10 successes in 10 trials at p=0.50, or 56/1024). That falls just short of significance at the 5% level, and with 32 teams in the league one or two would be expected to reach 8-2 or better by chance in any case. You would need a sustained ATS record over many more games (typically 200+ bets) to establish statistical significance with reasonable confidence.
Question 17. Describe the concept of "regression to the mean" as it applies to NFL interception rate for a quarterback.
Answer
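One simple way to apply regression to the mean is to pull the observed rate toward the league average by a fixed weight. The sketch below uses the year-over-year correlation as that weight, which is a common simplification and an assumption here rather than a method the chapter prescribes; a fuller Bayesian treatment would also account for the number of pass attempts. The function name is illustrative.

```python
def regress_to_mean(observed: float, league_mean: float, reliability: float) -> float:
    """Blend an observed rate with the league mean.

    reliability is the weight on the observed value (0 = all league mean,
    1 = all observed). Here it is set to the year-over-year correlation.
    """
    return league_mean + reliability * (observed - league_mean)


# Values from this answer: a 4% interception rate, a 2.5% league average,
# and a year-over-year correlation in the middle of the 0.30-0.40 range.
projected_int_rate = regress_to_mean(observed=0.04, league_mean=0.025, reliability=0.35)
print(f"Projected interception rate: {projected_int_rate:.2%}")
```

With these inputs the projection lands much closer to the league average than to the observed 4%, which is the behavior the low correlation implies.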
Interception rate has a relatively low year-over-year correlation (approximately r = 0.30-0.40), meaning a large portion of a quarterback's interception rate in a given season is driven by randomness rather than skill. A quarterback who throws interceptions on 4% of attempts in one season is likely to regress toward the league average (approximately 2.5%) the following season. A modeler should apply a Bayesian shrinkage or regression factor when using interception rate as a feature, blending the observed rate with the league mean.
Question 18. What is the typical breakeven win rate for a standard -110 point spread bet, and how does this affect the minimum edge required?
Answer
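At American odds, both the breakeven win rate and the expected value of a given win probability follow from the risk-to-win ratio. A minimal Python sketch of that arithmetic; the function names are illustrative and not from any betting library:

```python
def breakeven_probability(american_odds: int) -> float:
    """Win probability at which a bet at these odds has zero expected value."""
    if american_odds < 0:
        risk, win = -american_odds, 100
    else:
        risk, win = 100, american_odds
    return risk / (risk + win)


def ev_per_dollar_risked(win_prob: float, american_odds: int) -> float:
    """Expected profit per dollar staked at the given true win probability."""
    if american_odds < 0:
        payout_ratio = 100 / -american_odds  # profit per dollar risked on a win
    else:
        payout_ratio = american_odds / 100
    return win_prob * payout_ratio - (1 - win_prob)


breakeven = breakeven_probability(-110)
edge_ev = ev_per_dollar_risked(breakeven + 0.02, -110)
print(f"Breakeven at -110: {breakeven:.4f}")
print(f"EV per dollar risked with a 2-point win-rate edge: {edge_ev:.4f}")
```

Keeping the odds as a parameter makes it easy to see how the required win rate shifts at reduced-juice prices such as -105.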
At -110 odds, you risk $110 to win $100, giving a breakeven win rate of 110/210 = 52.38%. This means a bettor must win more than 52.38% of their spread bets to be profitable long term. A model that identifies a 2% edge (54.38% true win probability) generates approximately 3.8 cents of expected profit per dollar risked (0.5438 × 100/110 − 0.4562 ≈ 0.038), which is a meaningful edge in the NFL market but requires disciplined bankroll management to realize over the high variance of a single season.
Question 19. How would you model the impact of a starting quarterback injury on the point spread?
Answer
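The quarterback adjustment reduces to a per-play gap scaled by dropback volume. A minimal Python sketch; the EPA/play figures and the pre-injury margin are hypothetical, and the function name is illustrative:

```python
def qb_injury_adjustment(starter_epa_per_play: float,
                         backup_epa_per_play: float,
                         dropbacks_per_game: float = 35.0) -> float:
    """Estimated points per game lost when the backup replaces the starter."""
    return (starter_epa_per_play - backup_epa_per_play) * dropbacks_per_game


# Hypothetical inputs: the starter is worth 0.15 EPA/play, the backup 0.03.
points_lost = qb_injury_adjustment(0.15, 0.03)

# Apply the adjustment to a hypothetical pre-injury projected margin of
# +6.5 points for the affected team.
pre_injury_margin = 6.5
adjusted_margin = pre_injury_margin - points_lost
print(f"Points lost: {points_lost:.1f}, adjusted margin: {adjusted_margin:+.1f}")
```

This treats the gap as constant across all dropbacks and ignores knock-on effects such as changes in play calling, so it is a first-order estimate only.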
First, estimate the performance gap between the starter and the backup using historical EPA/play data, preseason projections, or draft capital as a proxy. The typical gap between a starting NFL quarterback and their backup is approximately 0.10-0.15 EPA/play. Multiply this gap by the expected number of dropbacks per game (approximately 35) to estimate the point impact: a 0.10-0.15 gap works out to roughly 3.5-5 points, and the full range is typically 3-7 points depending on the quality of the starter. This adjustment should be applied to the team's power rating and reflected in the predicted spread.
Question 20. What role does pace of play (plays per game) have in NFL totals modeling?
Answer
Pace directly affects total scoring opportunity. A team that runs 70 plays per game creates more scoring chances than a team running 58 plays per game. When two up-tempo teams meet, the expected total should be higher than efficiency alone would suggest, because more plays means more opportunities for both offenses. Pace should be modeled as a multiplicative factor with efficiency: Expected Points = EPA/play × Number of Plays. A game with a pace mismatch (one fast team, one slow team) requires modeling the interaction, as the faster team cannot fully dictate pace without the ball.
Question 21. Explain the concept of a "look-ahead line" in NFL betting. Why do sportsbooks release lines for the following week before the current week's games are played?
Answer
A look-ahead line is a point spread released approximately one week before the game, before the current week's results are known. Sportsbooks release these lines to gauge market sentiment and attract early sharp action that helps them set more accurate opening lines. For bettors, look-ahead lines can occasionally offer value because they do not account for injuries, rest situations, or momentum changes from the intervening week's games. However, limits are typically very low on look-ahead lines.
Question 22. Why is the NFL red zone (inside the opponent's 20-yard line) particularly noisy from a statistical perspective?
Answer
Red zone performance is noisy because: (1) sample sizes are small, as teams may have only 3-5 red zone possessions per game; (2) the compressed field changes offensive and defensive dynamics significantly; (3) touchdown-versus-field-goal outcomes are high-leverage binary events with significant variance; (4) play calling in the red zone is more game-plan-specific and opponent-dependent. Red zone touchdown rate has low year-over-year stability (r approximately 0.20-0.30), making it a poor standalone predictor despite its obvious importance to scoring.
Question 23. How would you construct an NFL power rating that combines offensive and defensive components?
Answer
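The core of an EPA-based power rating, net efficiency scaled to points and then differenced with a home-field term, fits in a few lines. A minimal Python sketch under stated assumptions: the EPA/play inputs are hypothetical and taken as already opponent-adjusted (the iterative adjustment is not shown), the home-field value is an assumed midpoint of the 1.5-2 point range, and all names are illustrative.

```python
from dataclasses import dataclass

PLAYS_PER_GAME = 64.0      # approximate average plays per game
HOME_FIELD_POINTS = 1.75   # assumed midpoint of the 1.5-2 point range


@dataclass
class TeamRating:
    off_epa_per_play: float  # opponent-adjusted offensive EPA/play
    def_epa_per_play: float  # opponent-adjusted EPA/play allowed (lower is better)

    def power(self) -> float:
        """Net EPA/play converted to a points-per-game scale."""
        return (self.off_epa_per_play - self.def_epa_per_play) * PLAYS_PER_GAME


def predicted_home_margin(home: TeamRating, away: TeamRating,
                          home_field: float = HOME_FIELD_POINTS) -> float:
    """Home power minus away power plus home-field advantage.

    A positive value means the home team is favored by that many points;
    the posted home spread would be the negative of this number.
    """
    return home.power() - away.power() + home_field


# Hypothetical teams for illustration.
home = TeamRating(off_epa_per_play=0.08, def_epa_per_play=-0.04)
away = TeamRating(off_epa_per_play=0.02, def_epa_per_play=0.03)
margin = predicted_home_margin(home, away)
print(f"Predicted home margin: {margin:+.2f}")
```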
A standard approach: (1) Calculate offensive EPA/play for each team, adjusted for opponent defensive quality; (2) Calculate defensive EPA/play allowed, adjusted for opponent offensive quality; (3) Combine these into a net rating: Power = (Off EPA/play - Def EPA/play); (4) Convert to a points-per-game scale by multiplying by average plays per game (approximately 64); (5) Add a home-field adjustment (approximately +1.5 to +2 points); (6) The predicted spread equals Home Power Rating - Away Power Rating + Home Field Advantage. Iterate the opponent adjustments until ratings converge.
Question 24. What is a "teaser" bet, and why is it particularly popular in NFL betting?
Answer
A teaser allows the bettor to adjust the point spread by a fixed number of points (typically 6, 6.5, or 7) in their favor on two or more selections, all of which must win for the bet to pay. Teasers are popular in the NFL because of key numbers: a 6-point teaser on a -7.5 favorite crosses both 3 and 7, converting a moderately likely cover into a highly likely one. Historical analysis shows that properly constructed NFL teasers crossing key numbers can achieve win rates above the breakeven threshold, making them one of the few structured bet types with a theoretical edge.
Question 25. Describe a complete workflow for producing NFL game predictions for a full weekly slate, from data collection through final output.