Chapter 18 Exercises: Modeling the NHL

Part A: Conceptual Questions (Exercises 1--8)

Exercise 1. Explain what an expected goals (xG) model measures and why it is more useful than raw shot counts for predicting future NHL team performance. Describe the key features that drive an xG model and rank them by approximate importance.

Exercise 2. Define Corsi and Fenwick. Explain why shot attempts serve as a proxy for puck possession in hockey when no official possession statistic exists. Under what conditions would Corsi and Fenwick give meaningfully different readings for the same team?

Exercise 3. PDO is defined as the sum of a team's shooting percentage and save percentage ($\text{PDO} = \text{Sh\%} + \text{Sv\%}$). Explain why PDO is considered the most mean-reverting metric in hockey. A team has a PDO of 104.5% through 20 games. What does this imply about its future performance, and how would a bettor exploit this information?

Exercise 4. Describe the "turtling effect" in the NHL. How do teams' shot attempt rates change when they are leading by 1, 2, or 3+ goals? Explain why raw Corsi percentages must be score-adjusted to accurately reflect team quality, and describe the general methodology for making this adjustment.

Exercise 5. Goals Saved Above Expected (GSAx) requires heavy regression to the mean for goaltenders. Explain why goaltender metrics are so noisy compared to skater metrics. What is the approximate number of shots a goaltender must face before GSAx per shot carries 50% weight versus the league-average prior?

Exercise 6. Explain the overtime and shootout structure in the NHL regular season and its implications for three betting markets: moneyline, puck line (+/- 1.5), and totals. Why does a team's overtime probability act as a "wedge" between its moneyline win probability and its puck line cover probability?

Exercise 7. Describe the back-to-back fatigue effect in the NHL. Quantify the approximate impact on win rate, goals against, and save percentage. Under what conditions is the market most likely to underadjust for back-to-back fatigue?

Exercise 8. Compare the predictive power of CF% (Corsi For percentage) versus xG differential for forecasting future NHL standings points over 20-game windows. Under what circumstances is CF% more useful than xG, and vice versa? Explain why a complete betting model should incorporate both.

Part B: Calculation Problems (Exercises 9--15)

Exercise 9. A team has the following even-strength shot attempt data over a game:

| Category | For | Against |
|---|---|---|
| Shots on goal | 28 | 24 |
| Missed shots | 8 | 6 |
| Blocked shots | 10 | 12 |

Calculate: (a) Corsi For (CF), Corsi Against (CA), and CF%; (b) Fenwick For (FF), Fenwick Against (FA), and FF%; (c) If the team scored 2 goals on 28 shots on goal and allowed 1 goal on 24 shots, compute PDO.
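As a starting point, the definitions behind this exercise can be sketched in Python. The function name `shot_metrics` and the sample counts are hypothetical and deliberately differ from the exercise's data. The sketch assumes that Corsi counts every shot attempt (on goal, missed, and blocked), that Fenwick excludes blocked attempts, that the "For" blocked-shot count means the team's own attempts that were blocked, and that PDO is reported on the 100-point scale used in Exercise 3.

```python
def shot_metrics(sog_for, missed_for, blocked_for,
                 sog_against, missed_against, blocked_against,
                 goals_for, goals_against):
    """Return Corsi, Fenwick, and PDO figures from shot-attempt counts."""
    # Corsi: all shot attempts (on goal + missed + blocked).
    cf = sog_for + missed_for + blocked_for
    ca = sog_against + missed_against + blocked_against
    # Fenwick: unblocked shot attempts only.
    ff = sog_for + missed_for
    fa = sog_against + missed_against
    # PDO: shooting percentage plus save percentage, both on shots on goal.
    sh_pct = goals_for / sog_for
    sv_pct = 1 - goals_against / sog_against
    return {
        "CF": cf, "CA": ca, "CF%": 100 * cf / (cf + ca),
        "FF": ff, "FA": fa, "FF%": 100 * ff / (ff + fa),
        "PDO": 100 * (sh_pct + sv_pct),
    }


# Hypothetical counts, not the exercise's table.
metrics = shot_metrics(30, 10, 10, 25, 10, 15, goals_for=3, goals_against=2)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
```

Swapping in the counts from the table reproduces parts (a) through (c).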

Exercise 10. A goaltender has faced 1,200 shots this season with the following results: 108 goals allowed, xGA (expected goals against from the xG model) = 124.5. Compute: (a) save percentage, (b) expected save percentage, (c) GSAx total, (d) GSAx per shot, and (e) GSAx per game (assuming 40 games played).

Exercise 11. Apply the goaltender regression formula to the goaltender from Exercise 10:

$$\text{Projected GSAx/shot} = \frac{n}{n + k} \times \text{observed} + \frac{k}{n + k} \times 0$$

with $k = 3{,}000$ shots. Then project the goaltender's GSAx over the next 1,500 shots. Compare the raw GSAx per shot to the regressed rate and explain the magnitude of the regression.
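The shrinkage formula above can be sketched as a small Python helper. The names `regress_to_mean` and `project_gsax` are hypothetical, the prior is taken to be zero (league average) as in the formula, and the sample goaltender is invented rather than taken from Exercise 10.

```python
def regress_to_mean(observed_rate, n_shots, k=3000.0, prior=0.0):
    """Shrink an observed per-shot rate toward the prior.

    The observed rate gets weight n / (n + k); the prior gets the rest.
    """
    weight = n_shots / (n_shots + k)
    return weight * observed_rate + (1 - weight) * prior


def project_gsax(observed_gsax, n_shots, future_shots, k=3000.0):
    """Project total GSAx over a future workload using the regressed rate."""
    observed_rate = observed_gsax / n_shots
    return regress_to_mean(observed_rate, n_shots, k) * future_shots


# Hypothetical goaltender: +10.0 GSAx on 1,000 shots, projected over 2,000 shots.
raw_rate = 10.0 / 1000
print(f"raw rate:       {raw_rate:.4f} per shot")
print(f"regressed rate: {regress_to_mean(raw_rate, 1000):.4f} per shot")
print(f"projection:     {project_gsax(10.0, 1000, 2000):.2f} goals")
```

The ratio of the regressed rate to the raw rate is simply the weight $n / (n + k)$, which is a convenient way to describe the magnitude of the regression.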

Exercise 12. Two teams are projected as follows for an upcoming game: Team A (home) has xGF/60 = 2.80 and xGA/60 = 2.25 at even strength. Team B (away) has xGF/60 = 2.50 and xGA/60 = 2.60. Assuming 48 minutes of even-strength play and a league-average xG/60 of 2.50, compute the expected even-strength goals for each team using the geometric mean method from the chapter.

Exercise 13. Using a Poisson model, compute the following for a game where Team A is projected at 3.2 goals and Team B at 2.6 goals:

(a) P(Team A wins in regulation)

(b) P(game goes to overtime)

(c) P(Team A wins overall, assuming 52% OT win probability for the home team)

(d) P(Team A covers -1.5 puck line)

(e) P(total goals > 5.5)
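One way to organize these calculations is a score grid built from two independent Poisson distributions. The sketch below makes several assumptions that should be checked against the chapter: the two teams' goal counts are independent, the grid is truncated at `max_goals`, a regulation tie goes to overtime, and the overtime or shootout winner is credited with exactly one extra goal in the final total. The function names and the sample inputs are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


def game_probabilities(lam_a, lam_b, ot_win_a=0.5, total_line=5.5, max_goals=20):
    """Sum an independent-Poisson score grid into market probabilities."""
    out = {"reg_win_a": 0.0, "reg_win_b": 0.0, "overtime": 0.0,
           "cover_a": 0.0, "over": 0.0}
    for i in range(max_goals + 1):          # Team A regulation goals
        for j in range(max_goals + 1):      # Team B regulation goals
            p = poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b)
            if i > j:
                out["reg_win_a"] += p
            elif i < j:
                out["reg_win_b"] += p
            else:
                out["overtime"] += p
            # Covering -1.5 requires a regulation win by two or more goals.
            if i - j >= 2:
                out["cover_a"] += p
            # A tied game adds one goal for the OT or shootout winner.
            final_total = i + j + (1 if i == j else 0)
            if final_total > total_line:
                out["over"] += p
    # Overall win: regulation win plus a share of the overtime games.
    out["win_a"] = out["reg_win_a"] + ot_win_a * out["overtime"]
    return out


# Hypothetical projections, not the exercise's inputs.
for name, value in game_probabilities(3.0, 2.5, ot_win_a=0.52).items():
    print(f"{name}: {value:.3f}")
```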

Exercise 14. A team's power play operates at 8.5 xGF per 60 minutes, and they average 3.2 power play opportunities per game with an average duration of 1.8 minutes per opportunity. The opposing penalty kill allows 7.0 xGA per 60 minutes (league average is 7.5 xGA/60). Compute the team's expected power play goals for this game.

Exercise 15. A bettor's model projects Team A at 55.2% win probability (including overtime). The market offers Team A at $-125$ moneyline. Calculate: (a) the implied probability at $-125$ (no-vig), (b) the edge, (c) the expected value of a \$100 bet, and (d) the Kelly criterion optimal fraction assuming a \$5,000 bankroll.
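The conversions this exercise relies on can be sketched as follows. All function names are hypothetical, and the example price and probability are invented rather than taken from the exercise. Two assumptions are worth flagging: `implied_probability` converts a single quoted price, so it still contains any bookmaker margin (removing the vig requires the price on the other side as well), and `kelly_fraction` is clamped at zero so that a negative edge yields no bet.

```python
def american_to_decimal(odds):
    """Convert American odds (e.g. -150 or +130) to decimal odds."""
    if odds > 0:
        return 1 + odds / 100
    return 1 + 100 / abs(odds)


def implied_probability(odds):
    """Break-even win probability for a single quoted price."""
    return 1 / american_to_decimal(odds)


def expected_value(p_win, odds, stake=100.0):
    """Expected profit of a bet with win probability p_win."""
    b = american_to_decimal(odds) - 1       # net profit per unit staked
    return stake * (p_win * b - (1 - p_win))


def kelly_fraction(p_win, odds):
    """Kelly stake as a fraction of bankroll, floored at zero."""
    b = american_to_decimal(odds) - 1
    return max(0.0, (p_win * b - (1 - p_win)) / b)


# Hypothetical example: model says 60% on a +100 price, $2,000 bankroll.
p_model, price, bankroll = 0.60, 100, 2000
print(f"implied probability: {implied_probability(price):.3f}")
print(f"edge:                {p_model - implied_probability(price):.3f}")
print(f"EV of a $100 bet:    {expected_value(p_model, price):.2f}")
print(f"Kelly stake:         {kelly_fraction(p_model, price) * bankroll:.2f}")
```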

Part C: Programming Exercises (Exercises 16--20)

Exercise 16. Implement an xG model using logistic regression in Python. The model should:

- Accept a DataFrame of shots with features: distance, angle, shot type, strength state, is_rebound, is_rush
- Use sklearn's ColumnTransformer for numeric scaling and categorical encoding
- Apply isotonic calibration for well-calibrated probability outputs
- Return evaluation metrics: log loss, Brier score, and AUC-ROC

Generate synthetic shot data with realistic distributions and demonstrate the model's performance.

Exercise 17. Build a ShotMetricsCalculator class in Python that computes:

- Raw Corsi (CF, CA, CF%)
- Raw Fenwick (FF, FA, FF%)
- PDO (shooting % + save %)
- Score-adjusted Corsi using the score-state multipliers from the chapter
- Venue-adjusted Corsi (home ice advantage adjustment)

Test the calculator on synthetic play-by-play data for at least three teams.

Exercise 18. Implement a GoaltenderRegressor class that:

- Accepts a goaltender's observed GSAx and shots faced
- Applies Bayesian regression with a configurable regression constant
- Projects future GSAx with confidence intervals
- Compares two goaltenders' projected per-game impact

Demonstrate with at least four goaltender profiles: an elite starter, an average starter, a struggling veteran, and a hot backup with a small sample.

Exercise 19. Write a complete NHL game projection function that combines:

- Even-strength xG projection (geometric mean method)
- Power play and penalty kill contributions
- Goaltender GSAx adjustment (regressed)
- Back-to-back fatigue adjustment
- Home ice advantage

The function should output projected goals for each team, win probability (Poisson-based), puck line probability, and over/under probability for a given total.

Exercise 20. Create an NHL market analysis tool that processes a synthetic dataset of 1,200 games and reports:

- Puck line cover rates bucketed by moneyline favorite strength
- Back-to-back performance impact (home B2B, away B2B, both, neither)
- Over/under results by total line (5.5, 6.0, 6.5)
- Backup goaltender effect on game totals and moneyline results
- Home ice advantage by month

Output results in formatted tables and generate at least one visualization.

Part D: Analysis Exercises (Exercises 21--25)

Exercise 21. Consider two NHL teams with the following season-to-date metrics (30 games each):

| Metric | Team X | Team Y |
|---|---|---|
| Record | 20-8-2 | 14-12-4 |
| Points | 42 | 32 |
| CF% (5v5) | 48.5% | 53.8% |
| xGF% (5v5) | 47.2% | 55.1% |
| PDO | 103.8% | 97.5% |
| GSAx (team) | +12.5 | -6.2 |

(a) Which team's record is more likely to be sustainable? Justify your answer using the underlying metrics.

(b) Project each team's points pace over the remaining 52 games under the assumption that PDO regresses to 100%.

(c) Identify the specific betting angle this analysis suggests.

Exercise 22. A team's starting goaltender is confirmed out for tonight's game, replaced by a backup with the following profile: 400 shots faced, GSAx of $-3.0$, save percentage of .895. The starter has faced 1,500 shots with a GSAx of +15.0. Using a regression constant of $k = 3{,}000$ for both goaltenders:

(a) Compute regressed GSAx per shot for both goaltenders.

(b) Estimate the per-game goal differential between the starter and backup assuming 30 shots per game.

(c) Determine how much the moneyline should move when the backup is announced.

(d) If the actual moneyline movement is only 5 cents, identify the betting opportunity.

Exercise 23. Analyze the puck line market for a game where your model projects Team A at 60% win probability and 3.1 expected goals vs. Team B at 2.5 expected goals.

(a) Using a Poisson model, compute P(Team A wins by 2+ goals in regulation).

(b) Compute P(overtime). Note that a game decided in overtime or a shootout always ends with a one-goal margin, so it can never produce the 2+ goal margin needed to cover -1.5.

(c) If the puck line for Team A -1.5 is priced at +185 (implied 35.1%), determine whether there is value.

(d) Compare the edge on the moneyline vs. the puck line.

Exercise 24. A team is playing the second game of a road back-to-back after an overtime loss the previous night. Research suggests:

- Back-to-back games reduce win probability by approximately 5-7 percentage points
- Road back-to-backs are worse than home back-to-backs by an additional 2 percentage points
- Overtime games the previous night add approximately 1 percentage point of additional fatigue

If the team's pre-adjustment win probability was 52%, compute the adjusted probability. Then determine the fair moneyline and compare to a hypothetical market price of $-110$.
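Converting an adjusted probability into a fair (no-margin) moneyline is the mechanical step here, and a minimal sketch of it follows. The function name `fair_moneyline` is hypothetical, and the example inputs are invented rather than taken from the exercise. The sketch assumes the fatigue adjustments are applied additively in percentage points, which is one reading of the bullet list above.

```python
def fair_moneyline(p_win):
    """Return the American odds at which a bet on p_win breaks even."""
    if not 0 < p_win < 1:
        raise ValueError("p_win must be strictly between 0 and 1")
    if p_win >= 0.5:
        # Favorites: negative price, risk |odds| to win 100.
        return -100 * p_win / (1 - p_win)
    # Underdogs: positive price, risk 100 to win odds.
    return 100 * (1 - p_win) / p_win


# Hypothetical example: a 58% team with a single 3-point penalty.
base_probability = 0.58
penalty_points = 3                          # assumed additive adjustment
adjusted = base_probability - penalty_points / 100
print(f"adjusted probability: {adjusted:.3f}")
print(f"fair moneyline:       {fair_moneyline(adjusted):+.0f}")
```

Comparing the fair price with the market price then shows which side, if either, is offered at better than break-even odds.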

Exercise 25. Build a regression analysis of the relationship between first-half xG differential (games 1--41) and second-half points earned (games 42--82) for a hypothetical league of 32 teams. Use synthetic data where first-half xG differential explains 40% of the variance in second-half points. Demonstrate that xG differential is a better predictor than actual goal differential (which should explain only 25% of variance). Discuss implications for midseason betting.
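A possible building block for the synthetic data is a generator that hits a target population $R^2$ between one predictor and the outcome. The sketch below uses only the standard library and works in standardized units (mean 0, variance 1), so the values still need rescaling to xG differential and points. The function names are hypothetical. With only 32 teams the sample $R^2$ will scatter noticeably around the target, so averaging over many simulated leagues is one way to make the comparison convincing.

```python
import random


def simulate_pair(n, target_r2, seed=0):
    """Draw (x, y) pairs whose population R^2 equals target_r2."""
    rng = random.Random(seed)
    signal = target_r2 ** 0.5               # correlation between x and y
    noise = (1 - target_r2) ** 0.5          # scale of the unexplained part
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [signal * xi + noise * rng.gauss(0, 1) for xi in x]
    return x, y


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# One 32-team league versus the average over many simulated leagues.
x, y = simulate_pair(32, 0.40, seed=1)
print(f"single league R^2: {r_squared(x, y):.3f}")
many = [r_squared(*simulate_pair(32, 0.40, seed=s)) for s in range(500)]
print(f"mean R^2 over 500 leagues: {sum(many) / len(many):.3f}")
```

Extending this to two predictors with different $R^2$ values against the same outcome requires generating the outcome first and then deriving each predictor from it (or sharing a latent team-strength variable), which is left as part of the exercise.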

Part E: Research Exercises (Exercises 26--30)

Exercise 26. Research the evolution of NHL analytics from the introduction of Corsi in the mid-2000s through the modern xG era. Identify three major methodological advances and assess how each has affected the efficiency of NHL betting markets. Has the proliferation of public xG models reduced the edges available to quantitative bettors?

Exercise 27. Investigate the historical profitability of betting against teams with extreme PDO values (above 103% or below 97%) in the first 20 games of the season. Find academic or industry evidence on the magnitude and sustainability of this strategy. Discuss the minimum PDO threshold and game-count filter that optimizes the risk-reward tradeoff.

Exercise 28. Study the impact of goaltender announcements on NHL betting lines. Research the typical timing of confirmations (1--2 hours pre-game), the magnitude of line movement for starter-to-backup switches, and whether the market consistently underadjusts. Propose a specific strategy for exploiting the goaltender announcement window.

Exercise 29. Analyze the structure and efficiency of the NHL puck line market. Research historical cover rates for favorites at -1.5 across different moneyline probability ranges. Determine whether the puck line market is more or less efficient than the moneyline market, and identify the specific conditions (if any) where systematic puck line value exists.

Exercise 30. Design a complete NHL betting system for a full 82-game season. Your system should integrate: (a) team-level xG and shot metrics, (b) goaltender evaluation with proper regression, (c) special teams modeling, (d) back-to-back and schedule adjustments, (e) a Poisson-based probability engine, and (f) market comparison for edge identification. Document the data pipeline, model architecture, bet selection criteria (minimum edge threshold), staking plan, and evaluation framework. Address how the system handles the preseason cold-start problem and in-season roster changes.