24 min read

> "Hockey is a unique sport in the sense that you need each and every guy helping each other and pulling in the same direction to be successful."

Learning Objectives

Build an expected goals (xG) model from shot-level data using logistic regression
Calculate and interpret Corsi, Fenwick, and other shot-based possession metrics with score and venue adjustments
Evaluate goaltenders using Goals Saved Above Expected (GSAx) and apply proper regression to the mean
Model score effects and game state including power plays, empty nets, and the turtling effect
Identify and exploit NHL-specific market patterns including puck line value, back-to-back fatigue, and home ice advantage

In This Chapter

Chapter Overview
18.1 Expected Goals (xG) Models
18.2 Corsi, Fenwick, and Shot-Based Metrics
18.3 Goaltender Evaluation and Adjustment
18.4 Score Effects and Game State
18.5 NHL Betting Market Patterns
18.6 Chapter Summary
What's Next

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 18: Modeling the NHL

"Hockey is a unique sport in the sense that you need each and every guy helping each other and pulling in the same direction to be successful." --- Wayne Gretzky

Chapter Overview

Hockey is the hardest major North American sport to model. The reasons are fundamental: games are low-scoring (averaging roughly 6 total goals), the puck moves at speeds exceeding 100 mph on shots, play is continuous with on-the-fly substitutions, and the chaotic interactions of ten skaters and two goaltenders on a confined ice surface produce a level of randomness that resists simple statistical reduction. In no other sport does the gap between process and outcome create so much noise.

And yet, precisely because of this noise, hockey presents some of the most persistent and exploitable betting market inefficiencies in all of sports. When outcomes are random, the public --- and sometimes the market --- overreacts to results and underweights the underlying process. A team that dominates shot generation and expected goals but has been "unlucky" (scoring on only 6% of its shots instead of the expected 9%) will have a poor record that depresses its price. When that team's shooting percentage inevitably regresses toward the mean, its results improve, but the market has already moved on. The bettor who understood the process, not just the results, captured value at the discounted price.

This chapter provides the analytical framework to exploit that disconnect. We begin with expected goals (xG) models --- the cornerstone of modern hockey analytics --- which quantify the quality of every shot based on its location, type, and context. We then examine Corsi, Fenwick, and other shot-based metrics that measure team-level possession and territorial dominance. Goaltender evaluation follows, because the difference between an elite and a replacement-level goaltender can swing a team's expected goals against by a full goal per game. Score effects and game state analysis address the fact that teams play fundamentally differently when leading versus trailing. Finally, we survey NHL betting market patterns that create actionable edges.

In this chapter, you will learn to: - Build an xG model from scratch using logistic regression on shot-level features - Calculate score-and-venue-adjusted Corsi and Fenwick to measure true team quality - Evaluate goaltenders using GSAx and determine appropriate regression coefficients - Quantify score effects and adjust shot metrics for game state - Exploit puck line value, back-to-back fatigue, and other NHL betting patterns

18.1 Expected Goals (xG) Models

What xG Measures

Expected goals (xG) is the probability that a given shot results in a goal, estimated from the characteristics of the shot itself. An unscreened wrist shot from the blue line might have an xG of 0.02 (a 2% chance of scoring), while a one-timer from the low slot on a cross-ice pass might have an xG of 0.35 (a 35% chance). By summing the xG values of all shots in a game, we obtain a team's expected goals --- a measure of how many goals a team "deserved" to score based on the quality of its chances, independent of whether the puck actually went in.

The power of xG lies in its ability to separate the signal (shot quality and shot volume) from the noise (whether each individual shot happened to beat the goaltender). A team that generates 3.2 xG per game but only scores 2.4 actual goals is not a bad team --- it is an unlucky team whose shooting percentage is below expected. Over time, that discrepancy will close.

For bettors, xG provides the single most useful predictor of future team performance in the NHL. Teams that consistently out-xG their opponents are structurally superior, regardless of their current win-loss record.

Building an xG Model from Shot Data

An xG model is fundamentally a binary classification problem: given a shot, predict whether it is a goal (1) or not (0). The standard approach uses logistic regression, though more complex models (gradient boosting, neural networks) can capture non-linear interactions.

Key Features

The features that drive xG models, roughly in order of importance:

Shot distance: The distance from the shot location to the center of the goal. This is the single most important predictor. Shots from within 30 feet have dramatically higher conversion rates than shots from beyond 50 feet.
Shot angle: The angle between the shot location and the goal line. Shots from directly in front have higher xG than shots from sharp angles along the boards.
Shot type: Wrist shots, snap shots, slap shots, deflections, tip-ins, and backhands all have different base conversion rates. Deflections and tip-ins from close range are the highest-xG shot types.
Game state (manpower): Shots on the power play convert at higher rates because the defense is disadvantaged. Conversely, shorthanded shots convert at lower rates (but still occur). Even-strength, 5v4, 4v5, 5v3, and other situations all have different base rates.
Rebound: Whether the shot follows a previous shot within a short time window (typically 2-3 seconds). Rebounds have significantly higher xG because the goaltender is often out of position.
Rush vs. settled: Whether the shot comes in transition (rush) or in the offensive zone during a set play. Rush chances generally have higher xG.
Time since last event: Captures whether the shot comes from sustained offensive zone pressure or a quick counter-attack.
Score state: Teams trailing tend to take more shots but from lower-quality locations. Teams leading tend to take fewer but more selective shots. This is related to the "score effects" we discuss in Section 18.4.

The mathematical model for logistic regression xG is:

$$P(\text{goal} | \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k)}}$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_k)$ is the vector of shot features and $\boldsymbol{\beta}$ is the vector of learned coefficients.

The loss function for fitting is the binary cross-entropy:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{p}_i) + (1-y_i) \log(1-\hat{p}_i) \right]$$

where $y_i \in \{0, 1\}$ is the actual outcome and $\hat{p}_i$ is the predicted probability.

Python Code: Building an xG Model

"""
NHL Expected Goals (xG) Model

Builds an xG model from shot-level data using logistic regression.
Features include shot distance, angle, type, game state, and
contextual variables.

Requirements:
    pip install pandas numpy scikit-learn matplotlib
"""

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import (
    log_loss,
    brier_score_loss,
    roc_auc_score,
    calibration_curve,
)
from sklearn.calibration import CalibratedClassifierCV
import matplotlib.pyplot as plt


class XGModel:
    """
    Expected Goals (xG) model for NHL shot data.

    Uses logistic regression with engineered features to
    predict the probability that any given shot results in
    a goal. The model is calibrated to produce well-calibrated
    probability estimates suitable for aggregation.

    Attributes:
        model: The fitted sklearn Pipeline.
        feature_names: List of feature column names.
        is_fitted: Whether the model has been trained.
    """

    NUMERIC_FEATURES = [
        "shot_distance",
        "shot_angle",
        "time_since_last_event",
        "x_coordinate",
        "y_coordinate",
    ]

    CATEGORICAL_FEATURES = [
        "shot_type",
        "strength_state",
        "is_rebound",
        "is_rush",
    ]

    def __init__(self, calibrate: bool = True):
        """
        Args:
            calibrate: Whether to apply isotonic calibration
                after fitting. Recommended for probability
                estimation (as opposed to pure classification).
        """
        self.calibrate = calibrate
        self.model = None
        self.is_fitted = False

    @staticmethod
    def compute_shot_distance(x: float, y: float) -> float:
        """
        Compute shot distance from goal center.

        Assumes the goal is at coordinates (89, 0) on a standard
        NHL coordinate system where center ice is (0, 0).

        Args:
            x: X-coordinate of the shot (along the length of ice).
            y: Y-coordinate of the shot (across the width).

        Returns:
            Distance in feet from the shot to the goal center.
        """
        goal_x, goal_y = 89.0, 0.0
        return np.sqrt((x - goal_x) ** 2 + (y - goal_y) ** 2)

    @staticmethod
    def compute_shot_angle(x: float, y: float) -> float:
        """
        Compute the angle of the shot relative to the goal.

        The angle is 0 degrees when directly in front and
        increases toward 90 degrees at the goal line.

        Args:
            x: X-coordinate of the shot.
            y: Y-coordinate of the shot.

        Returns:
            Shot angle in degrees.
        """
        goal_x = 89.0
        if x >= goal_x:
            return 90.0  # Behind or at the goal line
        return np.degrees(np.arctan(abs(y) / (goal_x - x)))

    def prepare_features(self, df: pd.DataFrame) -> pd.DataFrame:
        """
        Engineer features from raw shot data.

        Expects a DataFrame with at least the following columns:
        - x_coordinate, y_coordinate: Shot location
        - shot_type: Type of shot (wrist, slap, snap, etc.)
        - strength_state: Manpower (5v5, 5v4, 4v5, etc.)
        - seconds_since_last_event: Time since previous event
        - is_rebound: Boolean, shot within 3 sec of prior shot
        - is_rush: Boolean, shot from a rush/transition

        Args:
            df: Raw shot-level DataFrame.

        Returns:
            DataFrame with engineered features ready for modeling.
        """
        features = df.copy()

        # Compute distance and angle if not already present
        if "shot_distance" not in features.columns:
            features["shot_distance"] = features.apply(
                lambda r: self.compute_shot_distance(
                    r["x_coordinate"], r["y_coordinate"]
                ),
                axis=1,
            )
        if "shot_angle" not in features.columns:
            features["shot_angle"] = features.apply(
                lambda r: self.compute_shot_angle(
                    r["x_coordinate"], r["y_coordinate"]
                ),
                axis=1,
            )

        # Rename for consistency
        if "seconds_since_last_event" in features.columns:
            features["time_since_last_event"] = features[
                "seconds_since_last_event"
            ]

        # Convert boolean flags to string for categorical encoding
        for col in ["is_rebound", "is_rush"]:
            if col in features.columns:
                features[col] = features[col].astype(str)

        return features

    def build_pipeline(self) -> Pipeline:
        """
        Construct the sklearn preprocessing and modeling pipeline.

        Returns:
            An unfitted sklearn Pipeline.
        """
        preprocessor = ColumnTransformer(
            transformers=[
                ("num", StandardScaler(), self.NUMERIC_FEATURES),
                (
                    "cat",
                    OneHotEncoder(handle_unknown="ignore", sparse_output=False),
                    self.CATEGORICAL_FEATURES,
                ),
            ]
        )

        base_model = LogisticRegression(
            C=1.0,
            penalty="l2",
            solver="lbfgs",
            max_iter=1000,
            class_weight="balanced",
        )

        if self.calibrate:
            calibrated = CalibratedClassifierCV(
                base_model, method="isotonic", cv=5
            )
            pipeline = Pipeline([
                ("preprocessor", preprocessor),
                ("classifier", calibrated),
            ])
        else:
            pipeline = Pipeline([
                ("preprocessor", preprocessor),
                ("classifier", base_model),
            ])

        return pipeline

    def fit(self, X: pd.DataFrame, y: pd.Series) -> None:
        """
        Fit the xG model on training data.

        Args:
            X: DataFrame of shot features.
            y: Series of binary outcomes (1 = goal, 0 = no goal).
        """
        X_prepared = self.prepare_features(X)
        self.model = self.build_pipeline()
        self.model.fit(
            X_prepared[self.NUMERIC_FEATURES + self.CATEGORICAL_FEATURES], y
        )
        self.is_fitted = True
        print(f"xG model fitted on {len(y)} shots ({y.sum()} goals)")

    def predict_xg(self, X: pd.DataFrame) -> np.ndarray:
        """
        Predict xG (goal probability) for each shot.

        Args:
            X: DataFrame of shot features.

        Returns:
            Array of xG values (probabilities between 0 and 1).
        """
        if not self.is_fitted:
            raise RuntimeError("Model must be fitted before predicting.")
        X_prepared = self.prepare_features(X)
        features = X_prepared[self.NUMERIC_FEATURES + self.CATEGORICAL_FEATURES]
        return self.model.predict_proba(features)[:, 1]

    def evaluate(
        self, X: pd.DataFrame, y: pd.Series
    ) -> dict[str, float]:
        """
        Evaluate model performance on test data.

        Args:
            X: Test features.
            y: Test labels.

        Returns:
            Dictionary with log_loss, brier_score, and AUC.
        """
        xg_pred = self.predict_xg(X)

        return {
            "log_loss": round(log_loss(y, xg_pred), 4),
            "brier_score": round(brier_score_loss(y, xg_pred), 4),
            "auc_roc": round(roc_auc_score(y, xg_pred), 4),
            "mean_xg": round(xg_pred.mean(), 4),
            "actual_goal_rate": round(y.mean(), 4),
        }

    def team_xg_summary(
        self, shots_df: pd.DataFrame
    ) -> pd.DataFrame:
        """
        Aggregate shot-level xG to team-game level.

        Args:
            shots_df: DataFrame of shots with columns for
                'game_id', 'shooting_team', and shot features.

        Returns:
            DataFrame with game_id, team, total xG, actual goals,
            and the xG difference (luck measure).
        """
        xg_values = self.predict_xg(shots_df)
        shots_df = shots_df.copy()
        shots_df["xg"] = xg_values

        summary = (
            shots_df.groupby(["game_id", "shooting_team"])
            .agg(
                total_xg=("xg", "sum"),
                shots=("xg", "count"),
                goals=("is_goal", "sum"),
            )
            .reset_index()
        )
        summary["xg_diff"] = summary["goals"] - summary["total_xg"]
        return summary.round(3)


def generate_synthetic_shot_data(n_shots: int = 10000) -> pd.DataFrame:
    """
    Generate realistic synthetic NHL shot data for demonstration.

    The synthetic data mimics real NHL shot distributions:
    - Most shots come from 20-50 feet
    - Goal rate is approximately 9% overall
    - Close shots and rebounds convert at higher rates

    Args:
        n_shots: Number of shots to generate.

    Returns:
        DataFrame with shot features and outcomes.
    """
    np.random.seed(42)

    # Generate shot locations (offensive zone)
    x = np.random.uniform(25, 89, n_shots)
    y = np.random.uniform(-42, 42, n_shots)

    # Compute derived features
    distance = np.sqrt((x - 89) ** 2 + y ** 2)
    angle = np.degrees(np.arctan(np.abs(y) / np.maximum(89 - x, 0.1)))

    shot_types = np.random.choice(
        ["wrist", "slap", "snap", "backhand", "deflection", "tip"],
        n_shots,
        p=[0.40, 0.15, 0.20, 0.08, 0.10, 0.07],
    )

    strength = np.random.choice(
        ["5v5", "5v4", "4v5", "5v3", "4v4"],
        n_shots,
        p=[0.72, 0.14, 0.08, 0.02, 0.04],
    )

    is_rebound = np.random.binomial(1, 0.08, n_shots).astype(str)
    is_rush = np.random.binomial(1, 0.15, n_shots).astype(str)
    time_since = np.random.exponential(15, n_shots)

    # Generate outcomes with realistic probabilities
    # Base probability depends primarily on distance
    base_prob = 0.35 * np.exp(-0.05 * distance)

    # Adjustments
    for i in range(n_shots):
        if shot_types[i] in ("deflection", "tip"):
            base_prob[i] *= 1.8
        if shot_types[i] == "backhand":
            base_prob[i] *= 0.7
        if is_rebound[i] == "True":
            base_prob[i] *= 2.0
        if strength[i] == "5v4":
            base_prob[i] *= 1.3
        if strength[i] == "4v5":
            base_prob[i] *= 0.6

    base_prob = np.clip(base_prob, 0.01, 0.60)
    goals = np.random.binomial(1, base_prob)

    teams = np.random.choice(
        ["BOS", "TOR", "TBL", "FLA", "NYR", "CAR", "COL", "DAL",
         "VGK", "EDM", "WPG", "VAN", "MIN", "STL", "NSH", "LAK"],
        n_shots,
    )
    game_ids = np.random.randint(20001, 21300, n_shots)

    return pd.DataFrame({
        "game_id": game_ids,
        "shooting_team": teams,
        "x_coordinate": x,
        "y_coordinate": y,
        "shot_distance": distance,
        "shot_angle": angle,
        "shot_type": shot_types,
        "strength_state": strength,
        "is_rebound": is_rebound,
        "is_rush": is_rush,
        "seconds_since_last_event": time_since,
        "is_goal": goals,
    })


# --- Example Usage ---
if __name__ == "__main__":
    # Generate synthetic data
    shots = generate_synthetic_shot_data(n_shots=20000)
    print(f"Generated {len(shots)} shots, {shots['is_goal'].sum()} goals "
          f"({shots['is_goal'].mean():.1%} rate)")

    # Split into train/test
    X = shots.drop(columns=["is_goal"])
    y = shots["is_goal"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42
    )

    # Build and fit the model
    xg_model = XGModel(calibrate=True)
    xg_model.fit(X_train, y_train)

    # Evaluate
    metrics = xg_model.evaluate(X_test, y_test)
    print("\n=== Model Evaluation ===")
    for metric, value in metrics.items():
        print(f"  {metric}: {value}")

    # Team-level xG summary
    test_with_goals = X_test.copy()
    test_with_goals["is_goal"] = y_test
    team_summary = xg_model.team_xg_summary(test_with_goals)
    team_agg = (
        team_summary.groupby("shooting_team")
        .agg(
            games=("game_id", "nunique"),
            total_xg=("total_xg", "sum"),
            total_goals=("goals", "sum"),
            total_shots=("shots", "sum"),
        )
        .reset_index()
    )
    team_agg["xg_per_game"] = (team_agg["total_xg"] / team_agg["games"]).round(2)
    team_agg["goals_per_game"] = (team_agg["total_goals"] / team_agg["games"]).round(2)
    team_agg["luck"] = (team_agg["goals_per_game"] - team_agg["xg_per_game"]).round(2)

    print("\n=== Team xG Summary (Test Set) ===")
    print(team_agg.sort_values("xg_per_game", ascending=False).to_string(index=False))

Interpreting xG for Betting

The key outputs of an xG model for betting are:

Team xGF (expected goals for) per game: Higher is better. Measures offensive quality.
Team xGA (expected goals against) per game: Lower is better. Measures defensive quality.
xG differential (xGF - xGA): The most important single number. Positive means the team is creating more quality chances than it allows.
xG-to-actual-goals gap: When a team's actual goals diverge significantly from its xG (in either direction), regression is likely. This is the primary signal for finding mispriced teams.

A team with an xG differential of +0.5 per game (e.g., 3.0 xGF and 2.5 xGA) is a strong team that, over a full 82-game season, "deserves" to outscore opponents by roughly 41 goals. If that team's actual goal differential is only +15 due to poor shooting percentage and/or bad goaltending luck, the market will undervalue them based on their win-loss record.

Market Insight: The correlation between xG differential and points earned in the subsequent 20-game window is approximately 0.55-0.65, compared to only 0.35-0.45 for actual goal differential. This means xG is a better predictor of future results than past results are. In the NHL, the process is more predictive than the outcome --- a fact that the betting market consistently underweights.

18.2 Corsi, Fenwick, and Shot-Based Metrics

Shot Attempts as a Proxy for Possession

Unlike basketball or soccer, hockey does not have a direct "possession" statistic tracked by the league. Instead, hockey analysts use shot attempts as a proxy for territorial control. The logic is straightforward: a team that is spending more time in the offensive zone will generate more shot attempts, while a team pinned in its own zone will generate fewer. Shot attempt differential, therefore, approximates puck possession time.

Corsi

Corsi counts all shot attempts: shots on goal, missed shots, and blocked shots. The team-level metric Corsi For percentage (CF%) is:

$$\text{CF\%} = \frac{\text{Shot attempts for}}{\text{Shot attempts for} + \text{Shot attempts against}} \times 100$$

A CF% of 55% means the team directed 55% of all shot attempts in a game, indicating strong territorial dominance. League average is, by definition, 50%.

Corsi is valuable because it is high-volume (a typical NHL game features 100-120 total shot attempts) and stabilizes quickly. After 20-25 games, a team's CF% becomes a reliable indicator of underlying quality.

Fenwick

Fenwick is a refinement of Corsi that excludes blocked shots. The rationale: a shot that is blocked never reaches the goaltender and may reflect the defensive team's shot-blocking ability more than the offensive team's shot generation. Fenwick For percentage (FF%) is:

$$\text{FF\%} = \frac{\text{Unblocked shot attempts for}}{\text{Unblocked shot attempts for} + \text{Unblocked shot attempts against}} \times 100$$

In practice, Corsi and Fenwick are highly correlated ($r > 0.95$), and both are useful predictors. Some analysts prefer Fenwick; others prefer Corsi for its larger sample size.

PDO: The Luck Metric

PDO is the sum of a team's shooting percentage and save percentage at even strength:

$$\text{PDO} = \text{Sh\%} + \text{Sv\%}$$

League average PDO is, by definition, 100.0% (approximately 9% shooting + 91% save). PDO is the single most mean-reverting metric in hockey. Teams with PDO above 102% are almost certainly benefiting from unsustainable good luck (hot shooting and/or hot goaltending), while teams below 98% are likely unlucky.

The regression properties of PDO are so strong that simply betting against teams with extreme PDO values has been historically profitable. A team with a 104% PDO early in the season is virtually guaranteed to regress, which means its win rate will decline even if its underlying play quality remains constant.

Score and Venue Adjustments

Raw Corsi and Fenwick are biased by two factors that must be corrected:

Score effects: Teams with a lead tend to play more defensively (reducing their own shot attempts and conceding more), while trailing teams press harder (increasing their shot attempts). Raw CF% therefore overvalues teams that spend more time trailing and undervalues teams that frequently hold leads. See Section 18.4 for details.
Venue effects: Home teams generate approximately 1-3% more shot attempts than away teams, even after controlling for team quality. This home-ice shot advantage should be adjusted out when evaluating team quality.

The standard approach is to compute score-and-venue-adjusted Corsi (sometimes called "adjusted CF%" or "expected CF%") by applying multiplicative adjustments for each game state and venue.

Python Code: Shot Metrics Calculator

"""
NHL Shot-Based Metrics Calculator

Computes Corsi, Fenwick, PDO, and score/venue-adjusted
versions from play-by-play data.

Requirements:
    pip install pandas numpy
"""

import pandas as pd
import numpy as np
from typing import Optional


class ShotMetricsCalculator:
    """
    Computes NHL shot-based possession and luck metrics.

    Handles raw calculations, score adjustments, and venue
    adjustments to produce accurate team-level metrics for
    use in betting models.

    Attributes:
        score_adjustments: Dictionary mapping score states to
            shot attempt adjustment factors.
    """

    # Score state adjustments (multiplicative factors)
    # Derived from league-wide data: how shot rates change by score state
    # Format: {goal_differential: (CF_adjustment, CA_adjustment)}
    # Trailing teams shoot more; leading teams shoot less
    SCORE_ADJUSTMENTS = {
        -3: (1.15, 0.82),  # Trailing by 3+
        -2: (1.10, 0.87),  # Trailing by 2
        -1: (1.06, 0.93),  # Trailing by 1
         0: (1.00, 1.00),  # Tied
         1: (0.93, 1.06),  # Leading by 1
         2: (0.87, 1.10),  # Leading by 2
         3: (0.82, 1.15),  # Leading by 3+
    }

    # Home ice adjustment: home teams generate ~2% more shots
    HOME_ADJUSTMENT = 1.02
    AWAY_ADJUSTMENT = 0.98

    def __init__(self):
        pass

    def compute_raw_corsi(
        self, pbp: pd.DataFrame, team: str
    ) -> dict[str, float]:
        """
        Compute raw Corsi metrics for a team from play-by-play data.

        Args:
            pbp: Play-by-play DataFrame with columns:
                - event_type: 'shot', 'missed_shot', 'blocked_shot', 'goal'
                - event_team: Team that generated the event
                - game_id: Unique game identifier
                - strength_state: e.g., '5v5', '5v4'
            team: Team abbreviation to compute metrics for.

        Returns:
            Dictionary with CF (Corsi For), CA (Corsi Against),
            CF% and raw shot counts.
        """
        # Filter to even strength (5v5) for standard analysis
        es = pbp[pbp["strength_state"] == "5v5"].copy()

        shot_events = ["shot", "missed_shot", "blocked_shot", "goal"]
        all_shots = es[es["event_type"].isin(shot_events)]

        cf = len(all_shots[all_shots["event_team"] == team])
        ca = len(all_shots[all_shots["event_team"] != team])
        total = cf + ca

        cf_pct = (cf / total * 100) if total > 0 else 50.0

        return {
            "corsi_for": cf,
            "corsi_against": ca,
            "cf_pct": round(cf_pct, 1),
            "total_corsi_events": total,
        }

    def compute_raw_fenwick(
        self, pbp: pd.DataFrame, team: str
    ) -> dict[str, float]:
        """
        Compute raw Fenwick (unblocked shot attempts) metrics.

        Args:
            pbp: Play-by-play DataFrame.
            team: Team abbreviation.

        Returns:
            Dictionary with FF, FA, and FF%.
        """
        es = pbp[pbp["strength_state"] == "5v5"].copy()

        # Fenwick excludes blocked shots
        fenwick_events = ["shot", "missed_shot", "goal"]
        all_fenwick = es[es["event_type"].isin(fenwick_events)]

        ff = len(all_fenwick[all_fenwick["event_team"] == team])
        fa = len(all_fenwick[all_fenwick["event_team"] != team])
        total = ff + fa

        ff_pct = (ff / total * 100) if total > 0 else 50.0

        return {
            "fenwick_for": ff,
            "fenwick_against": fa,
            "ff_pct": round(ff_pct, 1),
        }

    def compute_pdo(
        self, pbp: pd.DataFrame, team: str
    ) -> dict[str, float]:
        """
        Compute PDO (shooting% + save%) for a team.

        Args:
            pbp: Play-by-play DataFrame.
            team: Team abbreviation.

        Returns:
            Dictionary with shooting%, save%, and PDO.
        """
        es = pbp[pbp["strength_state"] == "5v5"].copy()

        shots_for = es[
            (es["event_team"] == team)
            & (es["event_type"].isin(["shot", "goal"]))
        ]
        goals_for = len(shots_for[shots_for["event_type"] == "goal"])
        total_shots_for = len(shots_for)

        shots_against = es[
            (es["event_team"] != team)
            & (es["event_type"].isin(["shot", "goal"]))
        ]
        goals_against = len(shots_against[shots_against["event_type"] == "goal"])
        total_shots_against = len(shots_against)

        sh_pct = (goals_for / total_shots_for * 100) if total_shots_for > 0 else 9.0
        sv_pct = (
            (1 - goals_against / total_shots_against) * 100
            if total_shots_against > 0
            else 91.0
        )
        pdo = sh_pct + sv_pct

        return {
            "shooting_pct": round(sh_pct, 2),
            "save_pct": round(sv_pct, 2),
            "pdo": round(pdo, 2),
        }

    def compute_score_adjusted_corsi(
        self,
        pbp: pd.DataFrame,
        team: str,
        is_home: bool,
    ) -> dict[str, float]:
        """
        Compute score-and-venue-adjusted Corsi.

        Adjusts each shot attempt based on the score state at
        the time it occurred, normalizing to the tied-game rate.
        Also applies a home/away venue adjustment.

        Args:
            pbp: Play-by-play DataFrame with 'score_diff_for_team'
                column (team's goal lead at time of event).
            team: Team abbreviation.
            is_home: Whether the team is the home team.

        Returns:
            Dictionary with adjusted CF, CA, and CF%.
        """
        es = pbp[pbp["strength_state"] == "5v5"].copy()
        shot_events = ["shot", "missed_shot", "blocked_shot", "goal"]
        all_shots = es[es["event_type"].isin(shot_events)].copy()

        # Cap score differential at +/- 3
        all_shots["score_state"] = all_shots["score_diff_for_team"].clip(-3, 3)

        adjusted_cf = 0.0
        adjusted_ca = 0.0

        for _, shot in all_shots.iterrows():
            score_state = int(shot["score_state"])

            if shot["event_team"] == team:
                # This is a "for" event
                cf_adj, _ = self.SCORE_ADJUSTMENTS.get(
                    score_state, (1.0, 1.0)
                )
                # Divide by the adjustment to normalize: if trailing
                # teams shoot 10% more, we divide by 1.10 to get the
                # "tied-equivalent" shot count
                adjusted_cf += 1.0 / cf_adj
            else:
                # This is an "against" event
                _, ca_adj = self.SCORE_ADJUSTMENTS.get(
                    score_state, (1.0, 1.0)
                )
                adjusted_ca += 1.0 / ca_adj

        # Venue adjustment
        venue_adj = self.HOME_ADJUSTMENT if is_home else self.AWAY_ADJUSTMENT
        adjusted_cf /= venue_adj
        adjusted_ca *= venue_adj  # Opponent gets reverse adjustment

        total = adjusted_cf + adjusted_ca
        adj_cf_pct = (adjusted_cf / total * 100) if total > 0 else 50.0

        return {
            "adj_corsi_for": round(adjusted_cf, 1),
            "adj_corsi_against": round(adjusted_ca, 1),
            "adj_cf_pct": round(adj_cf_pct, 1),
        }

    def team_season_summary(
        self,
        all_pbp: pd.DataFrame,
        games: pd.DataFrame,
    ) -> pd.DataFrame:
        """
        Compute season-long adjusted metrics for all teams.

        Args:
            all_pbp: Full season play-by-play data.
            games: DataFrame with game_id, home_team, away_team.

        Returns:
            DataFrame with one row per team and columns for
            raw and adjusted Corsi, Fenwick, and PDO.
        """
        results = []

        teams = set(games["home_team"].unique()) | set(games["away_team"].unique())

        for team in sorted(teams):
            # Get all games for this team
            team_games = games[
                (games["home_team"] == team) | (games["away_team"] == team)
            ]

            total_adj_cf = 0.0
            total_adj_ca = 0.0
            total_raw_cf = 0
            total_raw_ca = 0
            n_games = 0

            for _, game in team_games.iterrows():
                game_pbp = all_pbp[all_pbp["game_id"] == game["game_id"]]
                is_home = game["home_team"] == team

                raw = self.compute_raw_corsi(game_pbp, team)
                adj = self.compute_score_adjusted_corsi(game_pbp, team, is_home)

                total_raw_cf += raw["corsi_for"]
                total_raw_ca += raw["corsi_against"]
                total_adj_cf += adj["adj_corsi_for"]
                total_adj_ca += adj["adj_corsi_against"]
                n_games += 1

            pdo = self.compute_pdo(
                all_pbp[all_pbp["game_id"].isin(team_games["game_id"])],
                team,
            )

            raw_cf_pct = (
                total_raw_cf / (total_raw_cf + total_raw_ca) * 100
                if (total_raw_cf + total_raw_ca) > 0
                else 50.0
            )
            adj_cf_pct = (
                total_adj_cf / (total_adj_cf + total_adj_ca) * 100
                if (total_adj_cf + total_adj_ca) > 0
                else 50.0
            )

            results.append({
                "team": team,
                "games": n_games,
                "raw_cf_pct": round(raw_cf_pct, 1),
                "adj_cf_pct": round(adj_cf_pct, 1),
                "pdo": pdo["pdo"],
                "sh_pct": pdo["shooting_pct"],
                "sv_pct": pdo["save_pct"],
            })

        return pd.DataFrame(results).sort_values("adj_cf_pct", ascending=False)


# --- Example Usage ---
if __name__ == "__main__":
    # Generate synthetic play-by-play data
    np.random.seed(42)
    n_events = 50000

    synthetic_pbp = pd.DataFrame({
        "game_id": np.random.randint(20001, 20100, n_events),
        "event_type": np.random.choice(
            ["shot", "missed_shot", "blocked_shot", "goal"],
            n_events,
            p=[0.45, 0.20, 0.20, 0.15],
        ),
        "event_team": np.random.choice(
            ["BOS", "TOR", "TBL", "NYR", "CAR", "COL"],
            n_events,
        ),
        "strength_state": np.random.choice(
            ["5v5", "5v4", "4v5", "4v4"],
            n_events,
            p=[0.75, 0.12, 0.08, 0.05],
        ),
        "score_diff_for_team": np.random.choice(
            [-3, -2, -1, 0, 1, 2, 3],
            n_events,
            p=[0.03, 0.08, 0.18, 0.42, 0.18, 0.08, 0.03],
        ),
    })

    calc = ShotMetricsCalculator()

    # Compute metrics for Boston
    raw_corsi = calc.compute_raw_corsi(synthetic_pbp, "BOS")
    raw_fenwick = calc.compute_raw_fenwick(synthetic_pbp, "BOS")
    pdo = calc.compute_pdo(synthetic_pbp, "BOS")
    adj_corsi = calc.compute_score_adjusted_corsi(synthetic_pbp, "BOS", is_home=True)

    print("=== Boston Bruins: Shot Metrics ===")
    print(f"Raw Corsi:     CF={raw_corsi['corsi_for']}, CA={raw_corsi['corsi_against']}, CF%={raw_corsi['cf_pct']}")
    print(f"Raw Fenwick:   FF={raw_fenwick['fenwick_for']}, FA={raw_fenwick['fenwick_against']}, FF%={raw_fenwick['ff_pct']}")
    print(f"PDO:           Sh%={pdo['shooting_pct']}, Sv%={pdo['save_pct']}, PDO={pdo['pdo']}")
    print(f"Adj Corsi:     CF={adj_corsi['adj_corsi_for']}, CA={adj_corsi['adj_corsi_against']}, CF%={adj_corsi['adj_cf_pct']}")

When Shot Quantity Versus Shot Quality Matters

A persistent debate in hockey analytics concerns whether shot volume (Corsi/Fenwick) or shot quality (xG) is more useful for prediction. The answer depends on the context:

Team-level, long-term prediction: CF% and xG differential are similarly predictive of future standings points ($r \approx 0.55$-$0.65$ over 20+ game windows). CF% has the advantage of larger sample sizes and faster stabilization.
Game-level prediction: xG is more informative because it captures the quality dimension that raw shot counts miss. Two teams can both have CF% of 52%, but one may generate high-danger chances from the slot while the other takes low-percentage shots from the perimeter.
Betting applications: Use both. CF% as a quick filter for team quality (is this team good at controlling play?), and xG for finer-grained game-level predictions (how many goals should we expect?).

Common Pitfall: Do not treat Corsi as a standalone metric. A team with 55% CF% but 48% xGF% is generating lots of shots from low-quality locations --- quantity without quality. Modern analytics has moved decisively toward xG-based models, with shot metrics serving as supplementary evidence.

18.3 Goaltender Evaluation and Adjustment

The Goaltender Problem

Goaltenders are the most impactful individual position in team sports, yet they are also the hardest to evaluate accurately. The traditional metric, save percentage (Sv%), is extraordinarily noisy: the difference between a .920 (good) and .910 (below average) goaltender is one additional save per 100 shots. With a typical goaltender facing 1,500-2,000 shots per season, the expected difference is only 15-20 goals over 60+ starts --- easily swamped by the quality of shots faced and defensive play in front of the goaltender.

For bettors, this creates a critical problem: the market often misprices goaltenders based on recent save percentage, which is heavily influenced by luck and shot quality. A goaltender with a .935 Sv% over a 15-game stretch may be playing well, or he may simply be facing easy shots. A goaltender with a .900 Sv% may be genuinely struggling, or he may be facing an unusually difficult workload.

Goals Saved Above Expected (GSAx)

Goals Saved Above Expected (GSAx) solves this problem by comparing a goaltender's actual goals allowed to the goals he would be "expected" to allow based on the xG model applied to the shots he faced:

$$\text{GSAx} = \text{xGA} - \text{Actual GA}$$

where xGA is the sum of xG values of all shots faced by the goaltender. Positive GSAx means the goaltender saved more goals than expected (good); negative GSAx means he allowed more than expected (bad).

GSAx adjusts for shot quality automatically. A goaltender facing 30 shots per game from the slot will have a lower raw save percentage than one facing 30 shots per game from the point, but GSAx will accurately reflect their relative skill.

Goaltender Consistency and Sample Size

Even GSAx requires substantial sample sizes to stabilize. The challenge is fundamental: goals are rare events (only about 9% of shots score), so the variance on any individual shot outcome is high.

Approximate stabilization points for goaltender metrics:

Metric	Shots to Stabilize	Games (approx.)
Save percentage	~4,000-6,000	3-4 full seasons
GSAx per shot	~2,500-3,500	2-3 full seasons
High-danger Sv%	~1,000-1,500	1-2 seasons
Rebound control	~800-1,200	1-1.5 seasons

These stabilization periods are dramatically longer than for skater metrics. Within a single season, a goaltender's GSAx contains substantial noise, and heavy regression to the mean is necessary.

Regression to the Mean for Goaltenders

Regression to the mean is critical for goaltender evaluation. The general framework:

$$\text{Projected GSAx/shot} = \frac{n}{n + k} \times \text{observed GSAx/shot} + \frac{k}{n + k} \times \text{prior (league average)}$$

where $n$ is the number of shots faced and $k$ is the regression constant (approximately 2,500-3,500 shots for GSAx).

For a goaltender who has faced 1,000 shots with a GSAx of +8.0 (approximately +0.008 per shot):

$$\text{Projected} = \frac{1000}{1000 + 3000} \times 0.008 + \frac{3000}{4000} \times 0.000 = 0.25 \times 0.008 = 0.002 \text{ per shot}$$

Over 1,500 future shots, this projects to a GSAx of only $+3.0$ --- substantially less than the observed $+8.0$. The heavy regression reflects the enormous noise in goaltender performance.

Python Code: Goaltender Evaluation

"""
NHL Goaltender Evaluation Module

Computes GSAx, applies regression to the mean, and
projects goaltender performance for betting models.

Requirements:
    pip install pandas numpy
"""

import pandas as pd
import numpy as np
from dataclasses import dataclass


@dataclass
class GoaltenderProfile:
    """
    Comprehensive goaltender profile for betting analysis.

    Attributes:
        name: Goaltender name.
        team: Current team abbreviation.
        shots_faced: Total shots faced this season.
        goals_against: Actual goals allowed.
        xga: Expected goals against (from xG model).
        gsax: Goals Saved Above Expected.
        games_played: Number of games started.
    """
    name: str
    team: str
    shots_faced: int
    goals_against: int
    xga: float
    gsax: float
    games_played: int

    @property
    def save_pct(self) -> float:
        """Raw save percentage."""
        if self.shots_faced == 0:
            return 0.0
        return round(1 - self.goals_against / self.shots_faced, 4)

    @property
    def xsv_pct(self) -> float:
        """Expected save percentage (based on xG of shots faced)."""
        if self.shots_faced == 0:
            return 0.0
        return round(1 - self.xga / self.shots_faced, 4)

    @property
    def gsax_per_shot(self) -> float:
        """GSAx per shot faced."""
        if self.shots_faced == 0:
            return 0.0
        return round(self.gsax / self.shots_faced, 5)

    @property
    def gsax_per_game(self) -> float:
        """GSAx per game played."""
        if self.games_played == 0:
            return 0.0
        return round(self.gsax / self.games_played, 2)


class GoaltenderRegressor:
    """
    Applies Bayesian regression to goaltender metrics to
    produce stabilized projections.

    The regression framework shrinks observed performance
    toward the league-average prior based on sample size.

    Attributes:
        regression_constant: Number of shots needed for
            the observed GSAx to carry 50% weight.
        prior_gsax_per_shot: The league-average GSAx per shot
            (by definition, 0.0 for an average goaltender).
    """

    def __init__(
        self,
        regression_constant: float = 3000.0,
        prior_gsax_per_shot: float = 0.0,
    ):
        self.regression_constant = regression_constant
        self.prior_gsax_per_shot = prior_gsax_per_shot

    def regressed_gsax_per_shot(
        self, observed_gsax_per_shot: float, shots_faced: int
    ) -> float:
        """
        Compute the regression-adjusted GSAx per shot.

        Args:
            observed_gsax_per_shot: Raw GSAx per shot from data.
            shots_faced: Number of shots in the sample.

        Returns:
            Regressed GSAx per shot (closer to prior with small samples).
        """
        weight = shots_faced / (shots_faced + self.regression_constant)
        regressed = (
            weight * observed_gsax_per_shot
            + (1 - weight) * self.prior_gsax_per_shot
        )
        return round(regressed, 5)

    def project_gsax(
        self,
        goalie: GoaltenderProfile,
        future_shots: int = 1500,
    ) -> dict[str, float]:
        """
        Project a goaltender's future GSAx.

        Args:
            goalie: The goaltender's current profile.
            future_shots: Number of future shots to project over.

        Returns:
            Dictionary with regressed rate, projected GSAx,
            and confidence interval.
        """
        regressed_rate = self.regressed_gsax_per_shot(
            goalie.gsax_per_shot, goalie.shots_faced
        )
        projected_gsax = regressed_rate * future_shots

        # Confidence interval using binomial approximation
        # Standard error of GSAx per shot ~ sqrt(p*(1-p)/n)
        # where p is approximately the goal rate (~0.09)
        se_per_shot = np.sqrt(0.09 * 0.91 / max(goalie.shots_faced, 1))
        weight = goalie.shots_faced / (goalie.shots_faced + self.regression_constant)

        projected_se = weight * se_per_shot * future_shots
        ci_lower = projected_gsax - 1.96 * projected_se
        ci_upper = projected_gsax + 1.96 * projected_se

        return {
            "observed_gsax_per_shot": goalie.gsax_per_shot,
            "regressed_gsax_per_shot": regressed_rate,
            "regression_weight": round(
                goalie.shots_faced
                / (goalie.shots_faced + self.regression_constant),
                3,
            ),
            "projected_gsax": round(projected_gsax, 1),
            "ci_lower": round(ci_lower, 1),
            "ci_upper": round(ci_upper, 1),
        }

    def compare_goaltenders(
        self,
        goalie_a: GoaltenderProfile,
        goalie_b: GoaltenderProfile,
        shots_per_game: float = 30.0,
    ) -> dict[str, float]:
        """
        Compare two goaltenders' projected impact per game.

        Useful for evaluating matchups: how many goals per game
        does Goalie A save compared to Goalie B?

        Args:
            goalie_a: First goaltender.
            goalie_b: Second goaltender.
            shots_per_game: Average shots faced per game.

        Returns:
            Dictionary with per-game GSAx projections for each
            goaltender and the differential.
        """
        rate_a = self.regressed_gsax_per_shot(
            goalie_a.gsax_per_shot, goalie_a.shots_faced
        )
        rate_b = self.regressed_gsax_per_shot(
            goalie_b.gsax_per_shot, goalie_b.shots_faced
        )

        gsax_per_game_a = rate_a * shots_per_game
        gsax_per_game_b = rate_b * shots_per_game
        differential = gsax_per_game_a - gsax_per_game_b

        return {
            "goalie_a": goalie_a.name,
            "goalie_a_gsax_per_game": round(gsax_per_game_a, 3),
            "goalie_b": goalie_b.name,
            "goalie_b_gsax_per_game": round(gsax_per_game_b, 3),
            "differential_per_game": round(differential, 3),
            "differential_interpretation": (
                f"{goalie_a.name} saves {abs(differential):.2f} more goals per game"
                if differential > 0
                else f"{goalie_b.name} saves {abs(differential):.2f} more goals per game"
            ),
        }


# --- Worked Example ---
if __name__ == "__main__":
    # Define goaltender profiles
    elite_goalie = GoaltenderProfile(
        name="Igor Shesterkin",
        team="NYR",
        shots_faced=1800,
        goals_against=140,
        xga=165.0,
        gsax=25.0,
        games_played=58,
    )

    average_goalie = GoaltenderProfile(
        name="Average Starter",
        team="AVG",
        shots_faced=1600,
        goals_against=148,
        xga=148.0,
        gsax=0.0,
        games_played=52,
    )

    struggling_goalie = GoaltenderProfile(
        name="Struggling Goalie",
        team="STR",
        shots_faced=900,
        goals_against=95,
        xga=82.0,
        gsax=-13.0,
        games_played=30,
    )

    backup_hot_streak = GoaltenderProfile(
        name="Hot Backup",
        team="HOT",
        shots_faced=400,
        goals_against=28,
        xga=38.0,
        gsax=10.0,
        games_played=14,
    )

    regressor = GoaltenderRegressor(regression_constant=3000)

    print("=== Goaltender Profiles ===")
    for g in [elite_goalie, average_goalie, struggling_goalie, backup_hot_streak]:
        print(f"\n{g.name} ({g.team}):")
        print(f"  Sv%: {g.save_pct:.3f} | xSv%: {g.xsv_pct:.3f}")
        print(f"  GSAx: {g.gsax:+.1f} | GSAx/shot: {g.gsax_per_shot:+.5f}")
        print(f"  GSAx/game: {g.gsax_per_game:+.2f}")

        proj = regressor.project_gsax(g, future_shots=1500)
        print(f"  Regression weight: {proj['regression_weight']:.1%}")
        print(f"  Regressed GSAx/shot: {proj['regressed_gsax_per_shot']:+.5f}")
        print(f"  Projected GSAx (1500 shots): {proj['projected_gsax']:+.1f} "
              f"[{proj['ci_lower']:+.1f}, {proj['ci_upper']:+.1f}]")

    # Compare elite vs average
    print("\n=== Goaltender Comparison ===")
    comp = regressor.compare_goaltenders(elite_goalie, average_goalie)
    for k, v in comp.items():
        print(f"  {k}: {v}")

    # Key insight: the hot backup
    print("\n=== Regression Warning: Hot Backup ===")
    comp_backup = regressor.compare_goaltenders(backup_hot_streak, elite_goalie)
    for k, v in comp_backup.items():
        print(f"  {k}: {v}")
    print("  NOTE: Despite a higher raw GSAx/game, the backup's small sample")
    print("  means heavy regression. The elite goalie projects better going forward.")

Goaltending as a Source of Market Mispricing

Goaltender evaluation is arguably the single largest source of mispricing in NHL betting markets. The market tends to:

Overweight recent save percentage: A goaltender with a .940 Sv% over his last 10 games will have his team's line shortened significantly. But .940 is unsustainable for almost any goaltender --- even the best in the world average .920-.925 over a full season. The regression is inevitable.
Underweight backup-to-starter differences: The difference between a team's starter and backup goaltender is often 0.3-0.5 expected goals per game, but lines frequently move by less than this when a backup is announced.
Ignore shot quality context: A goaltender posting a .900 Sv% behind a poor defensive team may actually be performing well (high GSAx) because he faces an unusually high volume of high-danger shots. The market sees .900 and sells.

Market Insight: Track goaltender announcements closely. When a team's backup is confirmed (typically 1-2 hours before game time), re-run your model with the backup's regressed GSAx rate. If the line has not moved sufficiently to account for the downgrade, bet the other side.

18.4 Score Effects and Game State

The Turtling Effect

"Turtling" is the colloquial term for the tendency of NHL teams to play more defensively when holding a lead. The effect is statistically dramatic:

Tied: Teams generate approximately 50/50 shot attempts (by definition, on average).
Leading by 1: The leading team's CF% drops to approximately 47%. They concede more shots but from lower-quality locations.
Leading by 2: CF% drops further to approximately 43-44%.
Leading by 3+: CF% can drop below 40%.

Conversely, trailing teams press aggressively, generating more shot attempts from higher-quality locations.

This has profound implications for analysis. A team's raw CF% or xG differential is heavily contaminated by the score states it played in. A team that holds many leads (because it has good goaltending or scores efficiently) will have a suppressed raw CF% that makes it look worse than it truly is at controlling play. Score adjustment corrects for this.

Empty Net Adjustments

When a team is trailing by 1 or 2 goals in the final minutes of a game, it typically pulls its goaltender for an extra attacker (the "empty net" situation). This creates a 6v5 manpower advantage that generates a high volume of shot attempts --- but also leads to frequent empty-net goals that inflate the leading team's scoring statistics.

For modeling purposes, empty-net events should be: 1. Excluded from shot metrics (Corsi, Fenwick) and xG calculations, because they represent a fundamentally different game state. 2. Included in final score calculations for betting purposes (since bets settle on actual final scores including empty-net goals). 3. Modeled separately when projecting game totals, because the probability and timing of empty-net situations depend on the score state.

Power Play and Penalty Kill Impact

Special teams (power play and penalty kill) operate under fundamentally different dynamics than even-strength play. The 5v4 power play creates approximately 60-65% of shot attempts and a significantly higher shooting percentage due to defensive coverage breakdowns.

Key metrics for special teams evaluation:

Power play percentage (PP%): Goals scored per power play opportunity. League average is approximately 20-22%.
Penalty kill percentage (PK%): Percentage of penalty kills without allowing a goal. League average is approximately 78-80%.
Power play xG per 60 minutes (PP xG/60): Expected goals generated per 60 minutes of power play time. More process-oriented than raw PP%.
Penalty kill xGA per 60 minutes (PK xGA/60): Expected goals allowed per 60 minutes of penalty killing.

The impact on game outcomes is significant: teams average approximately 3-4 power play opportunities per game. A team with a 28% PP% (elite) facing a team with a 75% PK% (poor) gains roughly 0.3-0.4 expected goals per game from special teams alone.

Python Code: Score Effects and Game State Model

"""
NHL Score Effects and Game State Model

Quantifies how teams play differently based on score state,
models empty net impact, and integrates power play effects
into game projections.

Requirements:
    pip install pandas numpy scipy
"""

import pandas as pd
import numpy as np
from scipy.stats import poisson
from dataclasses import dataclass


@dataclass
class TeamGameState:
    """
    Captures a team's performance across different game states.

    Attributes:
        team: Team abbreviation.
        ev_xgf_per60: Even-strength xGF per 60 minutes (score-adjusted).
        ev_xga_per60: Even-strength xGA per 60 minutes (score-adjusted).
        pp_xgf_per60: Power play xGF per 60 minutes.
        pk_xga_per60: Penalty kill xGA per 60 minutes.
        pp_opportunities_per_game: Average power play opportunities per game.
        pk_times_per_game: Average times shorthanded per game.
        pp_minutes_per_opp: Average minutes per power play opportunity.
    """
    team: str
    ev_xgf_per60: float
    ev_xga_per60: float
    pp_xgf_per60: float
    pk_xga_per60: float
    pp_opportunities_per_game: float
    pk_times_per_game: float
    pp_minutes_per_opp: float = 1.8  # ~1.8 minutes average PP duration


class ScoreEffectsModel:
    """
    Models how game flow changes based on score state and
    integrates special teams into game projections.
    """

    # Shot rate multipliers by score state (for the team in that state)
    # Score state is from the perspective of the team we're modeling
    SHOT_RATE_MULTIPLIERS = {
        -3: {"xgf_mult": 1.20, "xga_mult": 0.80},  # Trailing by 3+
        -2: {"xgf_mult": 1.12, "xga_mult": 0.88},
        -1: {"xgf_mult": 1.06, "xga_mult": 0.94},
         0: {"xgf_mult": 1.00, "xga_mult": 1.00},  # Tied
         1: {"xgf_mult": 0.94, "xga_mult": 1.06},
         2: {"xgf_mult": 0.88, "xga_mult": 1.12},
         3: {"xgf_mult": 0.80, "xga_mult": 1.20},  # Leading by 3+
    }

    def expected_even_strength_goals(
        self,
        team: TeamGameState,
        opponent: TeamGameState,
        ev_minutes: float = 48.0,
    ) -> dict[str, float]:
        """
        Compute expected even-strength goals for both teams.

        Args:
            team: Our team's game state profile.
            opponent: Opposing team's game state profile.
            ev_minutes: Expected even-strength minutes (typically 46-50).

        Returns:
            Dictionary with expected 5v5 goals for each team.
        """
        # Team's xGF is driven by their offense vs opponent's defense
        # We use the geometric mean of the team's xGF rate and
        # the opponent's xGA rate, normalized to league average
        league_avg_xg_per60 = 2.5  # Approximate league average

        team_off_factor = team.ev_xgf_per60 / league_avg_xg_per60
        opp_def_factor = opponent.ev_xga_per60 / league_avg_xg_per60

        team_xgf_per60 = league_avg_xg_per60 * np.sqrt(team_off_factor * opp_def_factor)

        opp_off_factor = opponent.ev_xgf_per60 / league_avg_xg_per60
        team_def_factor = team.ev_xga_per60 / league_avg_xg_per60

        opp_xgf_per60 = league_avg_xg_per60 * np.sqrt(opp_off_factor * team_def_factor)

        team_ev_goals = team_xgf_per60 * ev_minutes / 60.0
        opp_ev_goals = opp_xgf_per60 * ev_minutes / 60.0

        return {
            "team_ev_xg": round(team_ev_goals, 2),
            "opponent_ev_xg": round(opp_ev_goals, 2),
        }

    def expected_special_teams_goals(
        self,
        team: TeamGameState,
        opponent: TeamGameState,
    ) -> dict[str, float]:
        """
        Compute expected goals from power plays and penalty kills.

        Args:
            team: Our team's game state profile.
            opponent: Opposing team's game state profile.

        Returns:
            Dictionary with expected PP goals for and PK goals against.
        """
        # Team's power play goals
        team_pp_minutes = (
            team.pp_opportunities_per_game * team.pp_minutes_per_opp
        )
        team_pp_goals = team.pp_xgf_per60 * team_pp_minutes / 60.0

        # Adjust for opponent's PK quality
        league_avg_pk_xga = 7.5  # per 60 minutes
        opp_pk_factor = opponent.pk_xga_per60 / league_avg_pk_xga
        team_pp_goals *= opp_pk_factor

        # Opponent's power play goals against our PK
        opp_pp_minutes = (
            opponent.pp_opportunities_per_game * opponent.pp_minutes_per_opp
        )
        opp_pp_goals = opponent.pp_xgf_per60 * opp_pp_minutes / 60.0

        league_avg_pp_xgf = 7.5  # per 60 minutes
        team_pk_factor = team.pk_xga_per60 / league_avg_pk_xga
        opp_pp_goals *= team_pk_factor

        return {
            "team_pp_goals": round(team_pp_goals, 2),
            "opponent_pp_goals": round(opp_pp_goals, 2),
        }

    def project_game_total(
        self,
        team: TeamGameState,
        opponent: TeamGameState,
        ev_minutes: float = 48.0,
        empty_net_goals: float = 0.25,
    ) -> dict[str, float]:
        """
        Project total goals for a game, combining all sources.

        Args:
            team: Home team's game state profile.
            opponent: Away team's game state profile.
            ev_minutes: Expected even-strength minutes.
            empty_net_goals: Expected empty-net goals (both teams combined).

        Returns:
            Comprehensive projection with goals by source.
        """
        ev = self.expected_even_strength_goals(team, opponent, ev_minutes)
        st = self.expected_special_teams_goals(team, opponent)

        team_total = ev["team_ev_xg"] + st["team_pp_goals"]
        opp_total = ev["opponent_ev_xg"] + st["opponent_pp_goals"]
        game_total = team_total + opp_total + empty_net_goals

        return {
            "team_projected_goals": round(team_total, 2),
            "opponent_projected_goals": round(opp_total, 2),
            "empty_net_goals": empty_net_goals,
            "projected_total": round(game_total, 2),
            "breakdown": {
                "team_ev": ev["team_ev_xg"],
                "team_pp": st["team_pp_goals"],
                "opp_ev": ev["opponent_ev_xg"],
                "opp_pp": st["opponent_pp_goals"],
            },
        }

    def win_probability_from_projection(
        self,
        team_goals: float,
        opponent_goals: float,
        max_goals: int = 12,
    ) -> dict[str, float]:
        """
        Derive win probability from goal projections using Poisson.

        Hockey goals are well-modeled by the Poisson distribution
        (low-scoring, approximately independent events).

        Args:
            team_goals: Projected goals for our team.
            opponent_goals: Projected goals for opponent.
            max_goals: Maximum goals to consider.

        Returns:
            Dictionary with regulation and overall win probabilities.
        """
        team_pmf = poisson.pmf(np.arange(max_goals + 1), team_goals)
        opp_pmf = poisson.pmf(np.arange(max_goals + 1), opponent_goals)

        joint = np.outer(team_pmf, opp_pmf)

        reg_win = np.sum(np.tril(joint, k=-1))
        reg_loss = np.sum(np.triu(joint, k=1))
        reg_tie = np.sum(np.diag(joint))

        # Overtime/shootout: approximately 52% for the better team
        # (slight home advantage, slight edge for the team with more xG)
        if team_goals >= opponent_goals:
            ot_win_pct = 0.52
        else:
            ot_win_pct = 0.48

        total_win = reg_win + reg_tie * ot_win_pct
        total_loss = reg_loss + reg_tie * (1 - ot_win_pct)

        return {
            "regulation_win": round(reg_win, 4),
            "regulation_loss": round(reg_loss, 4),
            "overtime_probability": round(reg_tie, 4),
            "overall_win": round(total_win, 4),
            "overall_loss": round(total_loss, 4),
        }


# --- Worked Example ---
if __name__ == "__main__":
    # Define two teams
    strong_team = TeamGameState(
        team="COL",
        ev_xgf_per60=2.85,
        ev_xga_per60=2.20,
        pp_xgf_per60=8.5,
        pk_xga_per60=6.8,
        pp_opportunities_per_game=3.2,
        pk_times_per_game=2.8,
    )

    weak_team = TeamGameState(
        team="CHI",
        ev_xgf_per60=2.15,
        ev_xga_per60=2.75,
        pp_xgf_per60=6.5,
        pk_xga_per60=8.2,
        pp_opportunities_per_game=2.8,
        pk_times_per_game=3.5,
    )

    model = ScoreEffectsModel()

    print("=== Game Projection: Colorado vs Chicago ===\n")

    projection = model.project_game_total(strong_team, weak_team)
    print("Goal Projection:")
    for k, v in projection.items():
        if k != "breakdown":
            print(f"  {k}: {v}")
    print(f"  Breakdown: {projection['breakdown']}")

    win_prob = model.win_probability_from_projection(
        projection["team_projected_goals"],
        projection["opponent_projected_goals"],
    )
    print("\nWin Probability:")
    for k, v in win_prob.items():
        print(f"  {k}: {v:.1%}" if isinstance(v, float) else f"  {k}: {v}")

    # Convert to implied moneyline
    def prob_to_american(p: float) -> int:
        if p >= 0.5:
            return round(-100 * p / (1 - p))
        return round(100 * (1 - p) / p)

    print(f"\nFair Moneyline:")
    print(f"  COL: {prob_to_american(win_prob['overall_win'])}")
    print(f"  CHI: {prob_to_american(win_prob['overall_loss'])}")

The Overtime and Shootout Problem

NHL regular-season games that are tied after 60 minutes go to a 5-minute 3v3 overtime, followed by a shootout if still tied. This structure has important implications for betting:

Puck line (+/- 1.5): A regulation tie counts as the underdog covering +1.5 but the favorite not covering -1.5. Since approximately 23-26% of NHL games reach overtime, the puck line is heavily influenced by the overtime probability --- not just the regulation win probability.
Totals: Overtime and shootout goals count toward the total in most sportsbooks (though some books offer "regulation only" totals). A game that goes to overtime will typically produce 1-2 additional goal attempts, slightly pushing the total higher.
Moneyline: The moneyline settles on the final result including overtime/shootout, so the model must account for the approximately 50/50 nature of overtime outcomes.

Common Pitfall: Many bettors fail to distinguish between regulation and overtime when analyzing puck line value. The puck line is fundamentally a bet on whether the game will be decided in regulation by 2+ goals. A team with a 60% win probability might only have a 40% chance of winning by 2+ goals in regulation because overtime siphons off many close games.

18.5 NHL Betting Market Patterns

Puck Line Value Analysis

The NHL puck line is analogous to the MLB run line: the favorite is listed at -1.5 and the underdog at +1.5. Unlike baseball's run line, the puck line in hockey is a more extreme proposition because the average total score (approximately 6 goals) is much lower than in baseball (approximately 8-9 runs).

Key relationships:

A team with a 60% moneyline win probability covers -1.5 only about 33-38% of the time.
A team with a 55% win probability covers -1.5 only about 28-32% of the time.
The underdog at +1.5 covers approximately 62-72% of the time for standard matchups.

The puck line market offers value in two scenarios:

Underdog +1.5 at plus-money (rare): When a significant underdog's +1.5 line is priced at +100 or better, the implied probability (50% or less) is often well below the true probability of the underdog covering (which is typically 60-65% even for large underdogs).
Favorite -1.5 at long odds in blowout-prone matchups: When an elite team plays a significantly weaker opponent, the -1.5 price (often +180 to +220) can offer value if the true blowout probability exceeds the implied probability.

Totals Market Efficiency

NHL totals are typically set at 5.5, 6.0, or 6.5 goals. Because the total goals are low-count discrete events, the distinction between 5.5 and 6.5 is significant. The probability of exactly 6 total goals in a game is approximately 15-18%, so moving the line from 5.5 to 6.5 shifts the over/under probabilities by a large amount.

Historical analysis suggests that NHL totals markets are moderately efficient but exhibit a slight "over" bias among public bettors. This mirrors patterns in other sports: the public prefers action (overs) to stalemate (unders). However, the magnitude of the bias is small (1-2% ROI on unders in aggregate) and has decreased over time.

The most exploitable feature of NHL totals is their sensitivity to goaltender announcements. When a team's backup goaltender is confirmed (replacing a significantly better starter), the total should increase. The market often adjusts the moneyline adequately but underadjusts the total, creating a window for over bets.

Home Ice Advantage

Home ice advantage in the NHL is real but modest compared to other sports. Home teams win approximately 54-55% of regular-season games, and the advantage comes from:

Last change: The home team gets the final line change, allowing favorable matchups.
Familiarity with ice and boards: Home teams know the quirks of their own rink.
Crowd energy: Some evidence of crowd influence on referee behavior (more power plays for the home team).
Travel and rest: Away teams often face the cumulative fatigue of road trips.

For betting, home ice advantage translates to approximately a +0.08 to +0.12 expected goal differential (roughly 3-5 cents on the moneyline). The market generally prices this correctly, but value can emerge when home ice advantage is amplified (e.g., high altitude in Denver/Colorado) or diminished (e.g., a home team at the end of a long homestand facing a rested opponent).

Back-to-Back Fatigue

Back-to-back games (playing on consecutive days) are a significant factor in NHL performance and betting. Teams playing the second game of a back-to-back show measurable performance degradation:

Win rate: Drops from approximately 50% to 43-45% on the second night of a back-to-back.
Goals against: Increases by approximately 0.2-0.3 goals per game.
Save percentage: Drops by approximately 3-5 points (0.003-0.005), often because a backup goaltender starts the second game.

The market adjusts for back-to-backs, but the adjustment is often insufficient, particularly when: - The team must travel between games (a "road back-to-back"). - The previous game went to overtime (adding extra fatigue). - The team starts its backup goaltender, and the backup is significantly worse than the starter.

Python Code: NHL Market Pattern Analysis

"""
NHL Betting Market Pattern Analysis

Analyzes historical NHL betting data to identify puck line
value, totals patterns, home ice advantage, and back-to-back
fatigue effects.

Requirements:
    pip install pandas numpy matplotlib seaborn
"""

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from typing import Optional


class NHLMarketAnalyzer:
    """
    Analyzes NHL betting market patterns from historical data.

    Expects a DataFrame with columns:
        - game_id: Unique game identifier
        - date: Game date
        - home_team, away_team: Team abbreviations
        - home_goals, away_goals: Final scores (incl. OT/SO)
        - regulation_home_goals, regulation_away_goals: Goals in 60 min
        - went_to_ot: Boolean
        - home_ml_close: Home moneyline at close
        - away_ml_close: Away moneyline at close
        - total_close: Closing total line
        - home_b2b: Boolean, home team on back-to-back
        - away_b2b: Boolean, away team on back-to-back
        - home_goalie, away_goalie: Starting goaltender names
        - home_starter_is_backup: Boolean

    Attributes:
        data: The historical betting DataFrame.
    """

    def __init__(self, data: pd.DataFrame):
        self.data = data.copy()
        self._preprocess()

    def _preprocess(self) -> None:
        """Add derived columns for analysis."""
        df = self.data

        # Results
        df["home_win"] = df["home_goals"] > df["away_goals"]
        df["total_goals"] = df["home_goals"] + df["away_goals"]
        df["home_margin"] = df["home_goals"] - df["away_goals"]

        # Puck line results
        df["home_covers_minus_1_5"] = df["home_margin"] >= 2
        df["away_covers_plus_1_5"] = df["home_margin"] <= 1

        # Totals results
        df["over_hit"] = df["total_goals"] > df["total_close"]
        df["under_hit"] = df["total_goals"] < df["total_close"]

        # Determine favorite
        df["home_is_fav"] = np.where(
            df["home_ml_close"] < 0,
            df["home_ml_close"].abs() > df["away_ml_close"].abs(),
            False,
        )

    @staticmethod
    def _calc_profit(odds: int, won: bool) -> float:
        """Calculate profit on a $100 flat bet."""
        if won:
            if odds > 0:
                return 100 * (odds / 100)
            else:
                return 100 * (100 / abs(odds))
        return -100.0

    def puck_line_analysis(self) -> pd.DataFrame:
        """
        Analyze puck line (+/- 1.5) profitability by matchup type.

        Returns:
            DataFrame with puck line cover rates bucketed by
            moneyline favorite status.
        """
        df = self.data.copy()

        # Compute implied probability from home moneyline
        df["home_implied"] = df["home_ml_close"].apply(
            lambda x: abs(x) / (abs(x) + 100) if x < 0 else 100 / (x + 100)
        )

        # Bucket by strength of favorite
        bins = [0.30, 0.45, 0.50, 0.55, 0.60, 0.70, 0.80]
        labels = ["30-45%", "45-50%", "50-55%", "55-60%", "60-70%", "70-80%"]
        df["strength_bucket"] = pd.cut(
            df["home_implied"], bins=bins, labels=labels
        )

        summary = (
            df.groupby("strength_bucket", observed=True)
            .agg(
                games=("game_id", "count"),
                home_win_pct=("home_win", "mean"),
                home_covers_pct=("home_covers_minus_1_5", "mean"),
                away_covers_pct=("away_covers_plus_1_5", "mean"),
                ot_pct=("went_to_ot", "mean"),
            )
            .reset_index()
        )
        return summary.round(3)

    def back_to_back_analysis(self) -> pd.DataFrame:
        """
        Analyze the impact of back-to-back games.

        Returns:
            DataFrame comparing performance in B2B vs rest situations.
        """
        df = self.data.copy()

        scenarios = {
            "Neither B2B": (~df["home_b2b"]) & (~df["away_b2b"]),
            "Home B2B only": df["home_b2b"] & (~df["away_b2b"]),
            "Away B2B only": (~df["home_b2b"]) & df["away_b2b"],
            "Both B2B": df["home_b2b"] & df["away_b2b"],
        }

        results = []
        for label, mask in scenarios.items():
            subset = df[mask]
            if len(subset) == 0:
                continue
            results.append({
                "scenario": label,
                "games": len(subset),
                "home_win_pct": round(subset["home_win"].mean(), 3),
                "avg_total": round(subset["total_goals"].mean(), 2),
                "avg_home_goals": round(subset["home_goals"].mean(), 2),
                "avg_away_goals": round(subset["away_goals"].mean(), 2),
            })

        return pd.DataFrame(results)

    def totals_pattern_analysis(self) -> pd.DataFrame:
        """
        Analyze over/under results by total line and context.

        Returns:
            DataFrame with over/under profitability by line value.
        """
        df = self.data.copy()

        summary = (
            df.groupby("total_close")
            .agg(
                games=("game_id", "count"),
                avg_actual=("total_goals", "mean"),
                over_pct=("over_hit", "mean"),
                under_pct=("under_hit", "mean"),
            )
            .reset_index()
        )
        summary["line_vs_actual"] = summary["avg_actual"] - summary["total_close"]
        return summary.round(3)

    def home_ice_advantage_trend(self) -> pd.DataFrame:
        """
        Analyze home ice advantage over time.

        Returns:
            DataFrame with home win percentage by season segment.
        """
        df = self.data.copy()
        df["month"] = pd.to_datetime(df["date"]).dt.month

        monthly = (
            df.groupby("month")
            .agg(
                games=("game_id", "count"),
                home_win_pct=("home_win", "mean"),
                avg_home_goals=("home_goals", "mean"),
                avg_away_goals=("away_goals", "mean"),
            )
            .reset_index()
        )
        monthly["home_advantage_goals"] = (
            monthly["avg_home_goals"] - monthly["avg_away_goals"]
        )
        return monthly.round(3)

    def goaltender_backup_effect(self) -> dict[str, float]:
        """
        Quantify the impact of backup goaltenders on game outcomes.

        Returns:
            Dictionary with performance comparison between
            starter and backup games.
        """
        df = self.data.copy()

        starter_games = df[~df["home_starter_is_backup"]]
        backup_games = df[df["home_starter_is_backup"]]

        return {
            "starter_games": len(starter_games),
            "backup_games": len(backup_games),
            "starter_home_win_pct": round(starter_games["home_win"].mean(), 3),
            "backup_home_win_pct": round(backup_games["home_win"].mean(), 3),
            "starter_avg_goals_against": round(
                starter_games["away_goals"].mean(), 2
            ),
            "backup_avg_goals_against": round(
                backup_games["away_goals"].mean(), 2
            ),
            "starter_avg_total": round(starter_games["total_goals"].mean(), 2),
            "backup_avg_total": round(backup_games["total_goals"].mean(), 2),
        }

    def comprehensive_report(self) -> None:
        """Print a comprehensive market analysis report."""
        print("=" * 60)
        print("NHL BETTING MARKET PATTERN REPORT")
        print("=" * 60)

        print("\n--- Puck Line Analysis ---")
        print(self.puck_line_analysis().to_string(index=False))

        print("\n--- Back-to-Back Analysis ---")
        print(self.back_to_back_analysis().to_string(index=False))

        print("\n--- Totals by Line ---")
        print(self.totals_pattern_analysis().to_string(index=False))

        print("\n--- Home Ice Advantage by Month ---")
        print(self.home_ice_advantage_trend().to_string(index=False))

        print("\n--- Goaltender Backup Effect ---")
        backup = self.goaltender_backup_effect()
        for k, v in backup.items():
            print(f"  {k}: {v}")


# --- Example with Synthetic Data ---
if __name__ == "__main__":
    np.random.seed(42)
    n_games = 1500

    # Generate synthetic NHL game data
    home_goals = np.random.poisson(2.9, n_games)
    away_goals = np.random.poisson(2.7, n_games)

    # Some games go to OT (tied in regulation)
    regulation_tied = home_goals == away_goals
    went_to_ot = regulation_tied.copy()
    # In OT, one team scores
    for i in range(n_games):
        if went_to_ot[i]:
            if np.random.random() < 0.54:  # Slight home advantage in OT
                home_goals[i] += 1
            else:
                away_goals[i] += 1

    synthetic = pd.DataFrame({
        "game_id": range(n_games),
        "date": pd.date_range("2024-10-01", periods=n_games, freq="D"),
        "home_team": np.random.choice(["BOS", "TOR", "NYR", "COL", "FLA"], n_games),
        "away_team": np.random.choice(["CHI", "ANA", "SJS", "CBJ", "MTL"], n_games),
        "home_goals": home_goals,
        "away_goals": away_goals,
        "regulation_home_goals": home_goals - np.where(went_to_ot, 1, 0) * (home_goals > away_goals).astype(int),
        "regulation_away_goals": away_goals - np.where(went_to_ot, 1, 0) * (away_goals > home_goals).astype(int),
        "went_to_ot": went_to_ot,
        "home_ml_close": np.random.choice([-180, -150, -130, -110, 100, 120, 140], n_games),
        "away_ml_close": np.random.choice([160, 130, 110, -110, -100, -120, -140], n_games),
        "total_close": np.random.choice([5.5, 6.0, 6.5], n_games, p=[0.30, 0.40, 0.30]),
        "home_b2b": np.random.binomial(1, 0.12, n_games).astype(bool),
        "away_b2b": np.random.binomial(1, 0.14, n_games).astype(bool),
        "home_goalie": "Goalie_H",
        "away_goalie": "Goalie_A",
        "home_starter_is_backup": np.random.binomial(1, 0.25, n_games).astype(bool),
    })

    analyzer = NHLMarketAnalyzer(synthetic)
    analyzer.comprehensive_report()

Building a Complete NHL Prediction System

A production-ready NHL prediction system integrates all the components discussed in this chapter:

Team-level xG model: Generates expected goals for and against per 60 minutes, score-adjusted.
Goaltender adjustment: Applies regressed GSAx to the team's xGA based on the confirmed starting goaltender.
Special teams model: Adds expected power play and penalty kill contributions based on each team's rates and the opponent's opposing rates.
Game state model: Accounts for the expected distribution of score states throughout the game, empty net probability, and overtime likelihood.
Back-to-back and rest adjustment: Modifies expected goals based on rest days and travel.
Poisson model for probabilities: Converts expected goals into win probabilities, puck line probabilities, and totals probabilities.

The final output is a set of fair odds for every available market: moneyline, puck line, totals, period lines, and team totals. Comparing these fair odds to the market prices reveals where edges exist.

Real-World Application: The NHL betting market is smaller and less liquid than the NFL or NBA markets, which creates both opportunities and constraints. Lines move more sharply on small bets, limits are lower, and some books are reluctant to offer large player props. However, the smaller market also means less sharp competition: fewer syndicates and modeling groups focus primarily on hockey, leaving more inefficiency for the prepared individual bettor.

18.6 Chapter Summary

Key Concepts

Expected goals (xG) is the cornerstone of modern NHL analytics and the most predictive single metric for team-level performance. An xG model assigns a goal probability to each shot based on distance, angle, shot type, and context.
Corsi (all shot attempts) and Fenwick (unblocked shot attempts) measure territorial control and serve as proxies for puck possession. They stabilize quickly and are useful supplements to xG.
PDO (shooting% + save%) is the most mean-reverting metric in hockey. Teams with extreme PDO values are prime regression candidates, and the market often underweights this regression.
Score effects dramatically alter team behavior: leading teams turtle (suppressing their shot generation) while trailing teams press. Raw shot metrics must be score-adjusted to accurately reflect team quality.
Goals Saved Above Expected (GSAx) is the proper metric for goaltender evaluation, as it adjusts for the quality of shots faced. Raw save percentage is heavily contaminated by shot quality and sample noise.
Goaltender performance requires heavy regression to the mean due to small sample sizes. Even after a full season, a goaltender's GSAx contains substantial noise, and projection should weight the prior heavily.
The puck line (+/- 1.5) is one of the most interesting NHL betting markets because the relationship between win probability and puck line probability is non-linear, with overtime probability acting as a wedge.
Back-to-back games create measurable performance degradation, particularly when combined with travel and backup goaltender starts. The market often underadjusts for these factors.
Home ice advantage in the NHL is modest (~54-55% win rate) and primarily driven by the last-change rule, familiarity, and travel effects. The market generally prices this correctly.
NHL totals markets are sensitive to goaltender announcements and exhibit a slight public bias toward overs, though this bias has decreased over time.

Key Formulas

Formula	Description
$P(\text{goal} \mid \mathbf{x}) = \frac{1}{1+e^{-\mathbf{x}\boldsymbol{\beta}}}$	Logistic regression xG model
$\text{CF\%} = \frac{\text{CF}}{\text{CF} + \text{CA}} \times 100$	Corsi For percentage
$\text{FF\%} = \frac{\text{FF}}{\text{FF} + \text{FA}} \times 100$	Fenwick For percentage
$\text{PDO} = \text{Sh\%} + \text{Sv\%}$	Luck metric (mean = 100%)
$\text{GSAx} = \text{xGA} - \text{Actual GA}$	Goals Saved Above Expected
$\hat{\theta} = \frac{n}{n+k}\theta_{\text{obs}} + \frac{k}{n+k}\theta_{\text{prior}}$	Bayesian regression for goaltenders
$P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$	Poisson PMF for goal scoring

Key Code Patterns

xG model (XGModel): A scikit-learn pipeline that preprocesses shot features, fits logistic regression with isotonic calibration, and evaluates using log loss, Brier score, and AUC. The model aggregates shot-level predictions to team-game-level xG summaries.
Shot metrics calculator (ShotMetricsCalculator): Computes raw and score/venue-adjusted Corsi and Fenwick, plus PDO, from play-by-play data. Demonstrates the importance of score adjustment for accurate team evaluation.
Goaltender evaluation (GoaltenderRegressor): Applies Bayesian regression to goaltender GSAx, properly handling the extreme sample-size issues inherent in goaltender evaluation. Includes goaltender comparison functionality.
Score effects model (ScoreEffectsModel): Integrates even-strength and special teams projections, accounts for empty-net goals, and derives win and totals probabilities from goal projections using the Poisson distribution.
Market pattern analyzer (NHLMarketAnalyzer): Analyzes puck line value, back-to-back effects, totals patterns, home ice advantage, and goaltender backup impact from historical betting data.

Decision Framework: NHL Betting Checklist

START: An NHL game is on tonight's card.

1. Confirm starting goaltenders.
   - Check team announcements (typically 1-2 hours pre-game).
   - Look up each goaltender's regressed GSAx rate.
   - Calculate the goaltender impact differential.

2. Assess team quality.
   - Review score-adjusted CF% and xG differential for both teams.
   - Check PDO: is either team due for regression?
   - Consider recent form vs. season-long metrics (weight season-long).

3. Check game context.
   - Is either team on a back-to-back?
   - What is the travel situation?
   - Any key injuries beyond the goaltender?

4. Evaluate special teams.
   - Compare PP% vs opponent PK% (and vice versa).
   - Estimate expected penalty differential from recent trends.

5. Run the model.
   - Compute expected goals for each team (EV + ST + adjustments).
   - Derive Poisson-based win probability.
   - Compute puck line and totals probabilities.

6. Compare to market.
   - Calculate fair odds for moneyline, puck line, total.
   - Identify markets where model edge exceeds the vig.
   - Pay special attention to puck line and totals, which
     are often less efficient than the moneyline.

7. Size and place bets.
   - Apply Kelly criterion or fractional Kelly.
   - Record the bet with full rationale.
   - Note: NHL limits are lower than NFL/NBA; manage
     exposure across books.

What's Next

In Chapter 19: Modeling Soccer, we turn to the world's most popular sport. Soccer shares hockey's low-scoring nature and Poisson-based modeling framework, but introduces unique challenges: the global breadth of leagues and competitions, the prominence of the draw as a distinct outcome, the Asian handicap market, and the enormous scale of the international betting market that makes soccer the most liquid and most efficiently priced sport for bettors. We will build xG models for soccer (which predated hockey's adoption of the concept), model the three-way market (home/draw/away), and explore the massive derivatives market including correct score, both teams to score, and in-play betting.

This chapter is part of The Sports Betting Textbook, a comprehensive guide to quantitative sports betting. All code examples use Python 3.11+ and are available in the companion repository.