Chapter 21: Win Probability Models

Introduction

Win probability models transform how we understand and analyze football games. Rather than waiting until the final whistle to know the outcome, these models estimate the likelihood of each team winning at any moment during the game. This chapter develops win probability models from first principles, progressing from simple state-based models to sophisticated machine learning approaches.

Win probability (WP) answers a fundamental question: "Given the current game situation, what is the probability that each team will win?" The answer depends on numerous factors including score, time remaining, field position, down and distance, and team strength differentials.

Learning Objectives:

By the end of this chapter, you will be able to:

- Understand the theoretical foundations of win probability models
- Build win probability models using logistic regression
- Incorporate game state features (score, time, field position)
- Adjust for team strength differentials
- Evaluate and calibrate win probability predictions
- Apply WP models to in-game decision analysis


21.1 Foundations of Win Probability

What Win Probability Represents

Win probability is a conditional probability:

$$WP = P(\text{Team A wins} | \text{Game State})$$

Where game state includes:

- Current score differential
- Time remaining
- Field position (yard line)
- Down and distance
- Possession indicator
- Timeouts remaining
- Team strength differential

The State Space of Football

Football can be modeled as a finite state machine where each play transitions the game from one state to another:

from dataclasses import dataclass
import numpy as np

@dataclass
class GameState:
    """
    Complete representation of a football game state.
    """
    # Score
    home_score: int
    away_score: int

    # Time
    quarter: int
    seconds_remaining: int  # Seconds remaining in the current quarter

    # Field position
    yard_line: int  # 1-99, measured from the offense's own goal line (higher = closer to scoring)
    down: int       # 1-4
    distance: int   # Yards to first down

    # Possession
    home_has_ball: bool

    # Resources
    home_timeouts: int
    away_timeouts: int

    # Team strength (optional)
    home_pregame_wp: float = 0.5

    @property
    def score_differential(self) -> int:
        """Score differential from home team perspective."""
        return self.home_score - self.away_score

    @property
    def possession_score_diff(self) -> int:
        """Score differential from possessing team's perspective."""
        if self.home_has_ball:
            return self.home_score - self.away_score
        return self.away_score - self.home_score

    @property
    def game_seconds_remaining(self) -> int:
        """Total seconds remaining in game."""
        quarters_remaining = 4 - self.quarter
        return quarters_remaining * 900 + self.seconds_remaining

    @property
    def is_red_zone(self) -> bool:
        """Is the offense in the red zone?"""
        return self.yard_line >= 80

    @property
    def is_scoring_position(self) -> bool:
        """Is the offense in field goal range?"""
        return self.yard_line >= 60


class GameStateEncoder:
    """
    Encode game state into feature vector for ML models.
    """

    def encode(self, state: GameState) -> np.ndarray:
        """
        Convert game state to feature vector.

        Features:
        1. Score differential (from home perspective)
        2. Game seconds remaining (normalized)
        3. Field position (normalized)
        4. Down (one-hot: 4 features)
        5. Distance to first down (normalized)
        6. Possession indicator
        7. Timeout differential
        8. Pregame win probability
        """
        # Normalized features
        score_diff = state.score_differential / 28  # Normalize by ~4 TDs
        time_remaining = state.game_seconds_remaining / 3600  # Normalize by game length
        field_pos = state.yard_line / 100
        distance_norm = min(state.distance, 20) / 20

        # One-hot encode down
        down_features = [0, 0, 0, 0]
        if 1 <= state.down <= 4:
            down_features[state.down - 1] = 1

        # Possession and timeouts
        possession = 1 if state.home_has_ball else 0
        timeout_diff = (state.home_timeouts - state.away_timeouts) / 3

        features = [
            score_diff,
            time_remaining,
            field_pos,
            *down_features,
            distance_norm,
            possession,
            timeout_diff,
            state.home_pregame_wp
        ]

        return np.array(features)

21.2 Building a Logistic Regression Win Probability Model

The Logistic Model

Win probability naturally fits a logistic regression framework:

$$WP = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + ... + \beta_n x_n)}}$$
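Plugging in numbers makes the mapping from logit to probability concrete. The coefficients below are purely illustrative, not fitted values:

```python
import numpy as np

# Illustrative (not fitted) coefficients: intercept 0, 0.12 per point of lead
beta_0, beta_score = 0.0, 0.12
score_diff = 7  # home team up by a touchdown

logit = beta_0 + beta_score * score_diff
wp = 1 / (1 + np.exp(-logit))
print(f"{wp:.3f}")  # ≈ 0.698
```

A 7-point lead contributes a logit of 0.84, which the sigmoid maps to roughly a 70% win probability.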

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from typing import Dict, List

class LogisticWinProbabilityModel:
    """
    Win probability model using logistic regression.

    Simple, interpretable baseline model using key game state features.
    """

    def __init__(self):
        self.model = LogisticRegression(
            C=1.0,
            max_iter=1000,
            solver='lbfgs'
        )
        self.scaler = StandardScaler()
        self.feature_names = [
            'score_diff',
            'time_remaining',
            'field_position',
            'down_1', 'down_2', 'down_3', 'down_4',
            'distance',
            'possession',
            'timeout_diff',
            'pregame_wp'
        ]
        self.is_fitted = False

    def prepare_features(self, plays: pd.DataFrame) -> np.ndarray:
        """
        Extract features from play-by-play data.

        Parameters:
        -----------
        plays : pd.DataFrame
            Play-by-play data with game state columns

        Returns:
        --------
        np.ndarray : Feature matrix
        """
        features = []

        for _, play in plays.iterrows():
            state = GameState(
                home_score=play['home_score'],
                away_score=play['away_score'],
                quarter=play['quarter'],
                seconds_remaining=play['seconds_remaining'],
                yard_line=play['yard_line'],
                down=play['down'],
                distance=play['distance'],
                home_has_ball=play['home_possession'],
                home_timeouts=play.get('home_timeouts', 3),
                away_timeouts=play.get('away_timeouts', 3),
                home_pregame_wp=play.get('pregame_wp', 0.5)
            )

            encoder = GameStateEncoder()
            features.append(encoder.encode(state))

        return np.array(features)

    def train(self, plays: pd.DataFrame, outcome_col: str = 'home_win'):
        """
        Train the win probability model.

        Parameters:
        -----------
        plays : pd.DataFrame
            Training data with game state and outcomes
        outcome_col : str
            Column indicating if home team won
        """
        X = self.prepare_features(plays)
        y = plays[outcome_col].values

        # Scale features
        X_scaled = self.scaler.fit_transform(X)

        # Train model
        self.model.fit(X_scaled, y)
        self.is_fitted = True

        # Store coefficients for interpretation
        self.coefficients = dict(zip(self.feature_names, self.model.coef_[0]))

    def predict(self, plays: pd.DataFrame) -> np.ndarray:
        """
        Predict win probability for game states.

        Returns:
        --------
        np.ndarray : Win probabilities for home team
        """
        if not self.is_fitted:
            raise ValueError("Model must be trained first")

        X = self.prepare_features(plays)
        X_scaled = self.scaler.transform(X)

        return self.model.predict_proba(X_scaled)[:, 1]

    def predict_single(self, state: GameState) -> float:
        """
        Predict win probability for a single game state.
        """
        if not self.is_fitted:
            raise ValueError("Model must be trained first")

        encoder = GameStateEncoder()
        X = encoder.encode(state).reshape(1, -1)
        X_scaled = self.scaler.transform(X)

        return self.model.predict_proba(X_scaled)[0, 1]

    def get_feature_importance(self) -> pd.DataFrame:
        """
        Get feature importance from coefficients.
        """
        return pd.DataFrame({
            'feature': self.feature_names,
            'coefficient': self.model.coef_[0],
            'abs_importance': np.abs(self.model.coef_[0])
        }).sort_values('abs_importance', ascending=False)
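As a sketch of how this pipeline behaves end to end, the snippet below fits the same scaler-plus-logistic-regression combination directly on synthetic play states. All data and the outcome-generating rule are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic "plays": score differential, fraction of game remaining, field position
score_diff = rng.integers(-21, 22, n)
time_left = rng.uniform(0, 1, n)
field_pos = rng.uniform(0, 1, n)
X = np.column_stack([score_diff, time_left, field_pos])

# Invented outcome rule: leading teams win more often, especially late
logit = 0.25 * score_diff * (1.5 - time_left)
y = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))).astype(int)

scaler = StandardScaler()
model = LogisticRegression(max_iter=1000)
model.fit(scaler.fit_transform(X), y)

# A 7-point lead with 10% of the game left should be a strong favorite
probe = scaler.transform([[7, 0.1, 0.5]])
wp = model.predict_proba(probe)[0, 1]
print(f"WP up 7 late: {wp:.2f}")
```

Even though the generating rule has a score-time interaction the linear model cannot represent exactly, the fitted model still assigns a clear edge to the leading team.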

Feature Engineering for Win Probability

Effective features capture game dynamics:

class WinProbabilityFeatureEngineer:
    """
    Engineer features for win probability models.
    """

    def __init__(self):
        pass

    def create_features(self, plays: pd.DataFrame) -> pd.DataFrame:
        """
        Create comprehensive feature set.

        Features include:
        - Basic game state
        - Interaction terms
        - Non-linear transformations
        - Expected points adjustments
        """
        df = plays.copy()

        # Basic features
        df['score_diff'] = df['home_score'] - df['away_score']
        df['poss_score_diff'] = df.apply(
            lambda x: x['score_diff'] if x['home_possession'] else -x['score_diff'],
            axis=1
        )

        # Time features
        df['game_seconds'] = (4 - df['quarter']) * 900 + df['seconds_remaining']
        df['game_pct_remaining'] = df['game_seconds'] / 3600

        # Field position
        df['field_position_pct'] = df['yard_line'] / 100
        df['is_red_zone'] = (df['yard_line'] >= 80).astype(int)
        df['is_fg_range'] = (df['yard_line'] >= 60).astype(int)

        # Down and distance
        df['down_distance'] = df['down'] * df['distance']  # Interaction
        df['third_long'] = ((df['down'] == 3) & (df['distance'] >= 8)).astype(int)
        df['fourth_down'] = (df['down'] == 4).astype(int)

        # Score-time interactions
        df['score_time_interaction'] = df['score_diff'] * df['game_pct_remaining']
        df['score_per_time'] = df['score_diff'] / (df['game_pct_remaining'] + 0.01)

        # Possession value
        df['possession_value'] = df['poss_score_diff'] + df['field_position_pct'] * 3

        # Urgency indicators
        df['trailing_late'] = (
            (df['poss_score_diff'] < 0) &
            (df['game_seconds'] < 600)
        ).astype(int)

        df['leading_late'] = (
            (df['poss_score_diff'] > 0) &
            (df['game_seconds'] < 600)
        ).astype(int)

        # Timeout differential
        if 'home_timeouts' in df.columns:
            df['timeout_diff'] = df['home_timeouts'] - df['away_timeouts']
        else:
            df['timeout_diff'] = 0

        return df

21.3 Advanced Win Probability Models

Gradient Boosting Win Probability

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

class GradientBoostingWPModel:
    """
    Win probability model using gradient boosting.

    Captures non-linear relationships and interactions automatically.
    """

    def __init__(self,
                 n_estimators: int = 100,
                 max_depth: int = 4,
                 learning_rate: float = 0.1):
        self.model = GradientBoostingClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            learning_rate=learning_rate,
            random_state=42
        )
        self.feature_engineer = WinProbabilityFeatureEngineer()
        self.feature_cols = None
        self.is_fitted = False

    def train(self,
             plays: pd.DataFrame,
             outcome_col: str = 'home_win',
             cv: int = 5):
        """
        Train the model with cross-validation.
        """
        # Engineer features
        df = self.feature_engineer.create_features(plays)

        # Select feature columns
        self.feature_cols = [
            'score_diff', 'game_pct_remaining', 'field_position_pct',
            'down', 'distance', 'is_red_zone', 'is_fg_range',
            'score_time_interaction', 'trailing_late', 'leading_late',
            'timeout_diff', 'pregame_wp'
        ]

        # Filter to available columns
        self.feature_cols = [c for c in self.feature_cols if c in df.columns]

        X = df[self.feature_cols].values
        y = plays[outcome_col].values

        # Cross-validation
        cv_scores = cross_val_score(self.model, X, y, cv=cv, scoring='roc_auc')
        print(f"CV AUC: {cv_scores.mean():.4f} (+/- {cv_scores.std()*2:.4f})")

        # Train on full data
        self.model.fit(X, y)
        self.is_fitted = True

    def predict(self, plays: pd.DataFrame) -> np.ndarray:
        """Predict win probabilities."""
        if not self.is_fitted:
            raise ValueError("Model must be trained first")

        df = self.feature_engineer.create_features(plays)
        X = df[self.feature_cols].values

        return self.model.predict_proba(X)[:, 1]

    def get_feature_importance(self) -> pd.DataFrame:
        """Get feature importance."""
        return pd.DataFrame({
            'feature': self.feature_cols,
            'importance': self.model.feature_importances_
        }).sort_values('importance', ascending=False)
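A self-contained illustration of why boosted trees help here: the value of a lead depends on time remaining, an interaction a linear model misses but shallow trees pick up automatically. The data and generating rule are invented for the example:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
n = 3000
score_diff = rng.integers(-21, 22, n)
time_left = rng.uniform(0, 1, n)
X = np.column_stack([score_diff, time_left])

# Non-linear truth: the lead matters more as time runs out
logit = 0.25 * score_diff / (time_left + 0.1)
y = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))).astype(int)

gbm = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=42)
gbm.fit(X, y)

# Same 3-point lead, early vs late: the boosted model separates the two
early = gbm.predict_proba([[3, 0.9]])[0, 1]
late = gbm.predict_proba([[3, 0.05]])[0, 1]
print(late > early)
```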

Neural Network Win Probability

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from typing import List

class NeuralWPModel(nn.Module):
    """
    Neural network win probability model.

    Architecture:
    - Input layer: game state features
    - Hidden layers with dropout
    - Output: probability via sigmoid
    """

    def __init__(self, input_dim: int, hidden_dims: List[int] = [64, 32]):
        super().__init__()

        layers = []
        prev_dim = input_dim

        for hidden_dim in hidden_dims:
            layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.ReLU(),
                nn.Dropout(0.2),
                nn.BatchNorm1d(hidden_dim)
            ])
            prev_dim = hidden_dim

        layers.append(nn.Linear(prev_dim, 1))
        layers.append(nn.Sigmoid())

        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)


class NeuralWPTrainer:
    """
    Train neural network win probability model.
    """

    def __init__(self, input_dim: int):
        self.model = NeuralWPModel(input_dim)
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=0.001)
        self.criterion = nn.BCELoss()

    def train(self,
             X_train: np.ndarray,
             y_train: np.ndarray,
             epochs: int = 50,
             batch_size: int = 256):
        """Train the model."""
        dataset = TensorDataset(
            torch.FloatTensor(X_train),
            torch.FloatTensor(y_train).unsqueeze(1)
        )
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

        self.model.train()

        for epoch in range(epochs):
            total_loss = 0

            for X_batch, y_batch in loader:
                self.optimizer.zero_grad()
                predictions = self.model(X_batch)
                loss = self.criterion(predictions, y_batch)
                loss.backward()
                self.optimizer.step()
                total_loss += loss.item()

            if (epoch + 1) % 10 == 0:
                print(f"Epoch {epoch+1}/{epochs}, Loss: {total_loss/len(loader):.4f}")

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict win probabilities."""
        self.model.eval()
        with torch.no_grad():
            X_tensor = torch.FloatTensor(X)
            predictions = self.model(X_tensor)
        return predictions.numpy().flatten()

21.4 Model Calibration and Evaluation

Calibration Analysis

A well-calibrated model's predictions match observed frequencies:

class WPCalibrationAnalyzer:
    """
    Analyze and improve win probability calibration.
    """

    def __init__(self, n_bins: int = 10):
        self.n_bins = n_bins

    def calculate_calibration(self,
                             predictions: np.ndarray,
                             outcomes: np.ndarray) -> pd.DataFrame:
        """
        Calculate calibration statistics.

        Parameters:
        -----------
        predictions : np.ndarray
            Predicted probabilities
        outcomes : np.ndarray
            Binary outcomes (0/1)

        Returns:
        --------
        pd.DataFrame : Calibration by bin
        """
        bins = np.linspace(0, 1, self.n_bins + 1)
        calibration = []

        for i in range(self.n_bins):
            mask = (predictions >= bins[i]) & (predictions < bins[i+1])
            if mask.sum() > 0:
                calibration.append({
                    'bin': f'{bins[i]:.1f}-{bins[i+1]:.1f}',
                    'bin_midpoint': (bins[i] + bins[i+1]) / 2,
                    'predicted_mean': predictions[mask].mean(),
                    'actual_mean': outcomes[mask].mean(),
                    'count': mask.sum(),
                    'calibration_error': predictions[mask].mean() - outcomes[mask].mean()
                })

        return pd.DataFrame(calibration)

    def calculate_metrics(self,
                         predictions: np.ndarray,
                         outcomes: np.ndarray) -> Dict:
        """
        Calculate comprehensive calibration metrics.
        """
        from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

        calibration = self.calculate_calibration(predictions, outcomes)

        # Expected Calibration Error (ECE)
        weights = calibration['count'] / calibration['count'].sum()
        ece = (weights * calibration['calibration_error'].abs()).sum()

        # Maximum Calibration Error
        mce = calibration['calibration_error'].abs().max()

        return {
            'brier_score': brier_score_loss(outcomes, predictions),
            'log_loss': log_loss(outcomes, predictions),
            'auc': roc_auc_score(outcomes, predictions),
            'ece': ece,
            'mce': mce
        }

    def plot_calibration(self,
                        predictions: np.ndarray,
                        outcomes: np.ndarray,
                        ax=None):
        """
        Plot calibration curve.
        """
        import matplotlib.pyplot as plt

        if ax is None:
            fig, ax = plt.subplots(figsize=(8, 8))

        calibration = self.calculate_calibration(predictions, outcomes)

        # Perfect calibration line
        ax.plot([0, 1], [0, 1], 'k--', label='Perfect Calibration')

        # Actual calibration
        ax.plot(calibration['predicted_mean'],
               calibration['actual_mean'],
               'bo-', label='Model')

        ax.set_xlabel('Predicted Probability')
        ax.set_ylabel('Observed Frequency')
        ax.set_title('Win Probability Calibration')
        ax.legend()
        ax.grid(True, alpha=0.3)

        return ax
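The ECE calculation can be exercised on synthetic data. Below, one predictor is calibrated by construction (outcomes drawn at exactly the stated probability) and another is made overconfident by doubling its logits; this is a condensed, standalone version of the binning-and-weighting logic above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Ground truth: outcomes drawn at the stated probability,
# so p_good is well calibrated by construction
p_good = rng.uniform(0.02, 0.98, n)
y = (rng.uniform(0, 1, n) < p_good).astype(int)

# An overconfident predictor: doubled logits push probabilities toward 0/1
logit = np.log(p_good / (1 - p_good))
p_over = 1 / (1 + np.exp(-2 * logit))

def ece(p, outcomes, n_bins=10):
    """Expected calibration error: bin-weighted |predicted - observed|."""
    edges = np.linspace(0, 1, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p >= lo) & (p < hi)
        if mask.any():
            total += mask.mean() * abs(p[mask].mean() - outcomes[mask].mean())
    return total

print(ece(p_good, y) < ece(p_over, y))  # True
```

Note that both predictors rank games identically (the transform is monotone), so AUC cannot tell them apart; only calibration metrics expose the difference.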

Isotonic Calibration

from sklearn.isotonic import IsotonicRegression

class CalibratedWPModel:
    """
    Wrapper that adds isotonic calibration to any WP model.
    """

    def __init__(self, base_model):
        self.base_model = base_model
        self.calibrator = IsotonicRegression(out_of_bounds='clip')
        self.is_calibrated = False

    def calibrate(self, val_plays: pd.DataFrame, outcome_col: str = 'home_win'):
        """
        Fit isotonic calibration on validation set.
        """
        raw_predictions = self.base_model.predict(val_plays)
        outcomes = val_plays[outcome_col].values

        self.calibrator.fit(raw_predictions, outcomes)
        self.is_calibrated = True

    def predict(self, plays: pd.DataFrame) -> np.ndarray:
        """
        Get calibrated predictions.
        """
        raw_predictions = self.base_model.predict(plays)

        if self.is_calibrated:
            return self.calibrator.predict(raw_predictions)
        return raw_predictions
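A minimal end-to-end sketch of why this helps, using simulated outcomes and a deliberately overconfident raw score. (Fitting and evaluating on the same data, as here, means isotonic regression can never do worse than the identity map under the Brier score; in practice you would fit the calibrator on a held-out validation set, as `calibrate` does.)

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(2)
n = 4000

# True win probabilities and simulated outcomes
p_true = rng.uniform(0.02, 0.98, n)
y = (rng.uniform(0, 1, n) < p_true).astype(int)

# A miscalibrated "model": overconfident (doubled logits)
logit = np.log(p_true / (1 - p_true))
p_raw = 1 / (1 + np.exp(-2 * logit))

# Isotonic regression learns a monotone map from raw scores to outcomes
iso = IsotonicRegression(out_of_bounds='clip')
p_cal = iso.fit(p_raw, y).predict(p_raw)

# Brier score improves after calibration
brier_raw = np.mean((p_raw - y) ** 2)
brier_cal = np.mean((p_cal - y) ** 2)
print(brier_cal < brier_raw)  # True
```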

21.5 Win Probability Added (WPA)

Calculating Play Impact

Win Probability Added measures each play's impact on winning:

class WPACalculator:
    """
    Calculate Win Probability Added for each play.

    WPA = WP_after - WP_before
    """

    def __init__(self, wp_model):
        self.wp_model = wp_model

    def calculate_wpa(self, plays: pd.DataFrame) -> pd.DataFrame:
        """
        Calculate WPA for all plays.

        Parameters:
        -----------
        plays : pd.DataFrame
            Play-by-play data with before/after states

        Returns:
        --------
        pd.DataFrame : Plays with WPA added
        """
        df = plays.copy()

        # Get WP before each play
        df['wp_before'] = self.wp_model.predict(plays)

        # Calculate WP after (need after-play state)
        if 'wp_after' not in df.columns:
            # Shift to get next play's WP as current play's WP_after
            df['wp_after'] = df.groupby('game_id')['wp_before'].shift(-1)

            # Handle game-ending plays
            df.loc[df['wp_after'].isna(), 'wp_after'] = df.loc[
                df['wp_after'].isna(), 'home_win'
            ].astype(float)

        # Calculate WPA from home team perspective
        df['wpa_home'] = df['wp_after'] - df['wp_before']

        # WPA from perspective of possessing team
        df['wpa'] = df.apply(
            lambda x: x['wpa_home'] if x['home_possession'] else -x['wpa_home'],
            axis=1
        )

        return df

    def aggregate_player_wpa(self, plays_with_wpa: pd.DataFrame) -> pd.DataFrame:
        """
        Aggregate WPA by player.
        """
        player_wpa = plays_with_wpa.groupby(['player_id', 'player_name']).agg({
            'wpa': ['sum', 'mean', 'count'],
            'wpa_home': 'sum'
        }).round(4)

        player_wpa.columns = ['total_wpa', 'avg_wpa', 'plays', 'home_wpa']

        return player_wpa.sort_values('total_wpa', ascending=False)
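The shift-and-backfill logic inside `calculate_wpa` can be seen in miniature with invented WP numbers:

```python
import pandas as pd

plays = pd.DataFrame({
    'game_id':   [1, 1, 1, 2, 2],
    'wp_before': [0.50, 0.55, 0.80, 0.50, 0.30],
    'home_win':  [1, 1, 1, 0, 0],
})

# WP after each play is the next play's WP before it;
# each game's final play resolves to the actual outcome (0 or 1)
plays['wp_after'] = plays.groupby('game_id')['wp_before'].shift(-1)
last = plays['wp_after'].isna()
plays.loc[last, 'wp_after'] = plays.loc[last, 'home_win'].astype(float)

plays['wpa_home'] = plays['wp_after'] - plays['wp_before']
print(plays['wpa_home'].round(2).tolist())  # [0.05, 0.25, 0.2, -0.2, -0.3]
```

A useful sanity check: within each game, WPA sums to the final outcome minus the opening WP (+0.5 for game 1, -0.5 for game 2).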

Expected Points Added Integration

class EPAWPAAnalyzer:
    """
    Combine EPA and WPA for comprehensive play analysis.
    """

    def __init__(self, wp_model, ep_model):
        self.wp_model = wp_model
        self.ep_model = ep_model

    def analyze_play(self, play: pd.Series) -> Dict:
        """
        Comprehensive play analysis with EPA and WPA.
        """
        # Get EPA (context-neutral value)
        epa = self.ep_model.calculate_epa(play)

        # Get WPA (context-dependent value)
        wp_before = play.get('wp_before', 0.5)
        wp_after = play.get('wp_after', 0.5)
        wpa = wp_after - wp_before

        # Leverage index: how much does this situation magnify play impact?
        leverage = abs(wpa) / (abs(epa) + 0.01)  # +0.01 guards against division by zero

        return {
            'epa': epa,
            'wpa': wpa,
            'wp_before': wp_before,
            'wp_after': wp_after,
            'leverage_index': leverage,
            'is_high_leverage': leverage > 1.5
        }

21.6 Applications of Win Probability

In-Game Decision Analysis

class DecisionAnalyzer:
    """
    Use WP models to analyze in-game decisions.
    """

    def __init__(self, wp_model):
        self.wp_model = wp_model

    def analyze_fourth_down(self,
                           state: GameState,
                           conversion_prob: float,
                           fg_success_prob: float = None) -> Dict:
        """
        Analyze fourth down decision using WP.

        Options:
        1. Go for it (convert or turnover on downs)
        2. Punt (surrender possession, better field position for opponent)
        3. Kick field goal (if in range)
        """
        current_wp = self.wp_model.predict_single(state)

        results = {'current_wp': current_wp}

        # Option 1: Go for it
        # If convert: new first down
        convert_state = GameState(
            home_score=state.home_score,
            away_score=state.away_score,
            quarter=state.quarter,
            seconds_remaining=state.seconds_remaining - 5,
            yard_line=min(state.yard_line + state.distance, 99),
            down=1,
            distance=10,
            home_has_ball=state.home_has_ball,
            home_timeouts=state.home_timeouts,
            away_timeouts=state.away_timeouts,
            home_pregame_wp=state.home_pregame_wp
        )

        # If fail: turnover on downs
        fail_state = GameState(
            home_score=state.home_score,
            away_score=state.away_score,
            quarter=state.quarter,
            seconds_remaining=state.seconds_remaining - 5,
            yard_line=100 - state.yard_line,
            down=1,
            distance=10,
            home_has_ball=not state.home_has_ball,
            home_timeouts=state.home_timeouts,
            away_timeouts=state.away_timeouts,
            home_pregame_wp=state.home_pregame_wp
        )

        wp_convert = self.wp_model.predict_single(convert_state)
        wp_fail = self.wp_model.predict_single(fail_state)

        go_for_it_wp = conversion_prob * wp_convert + (1 - conversion_prob) * wp_fail

        results['go_for_it'] = {
            'expected_wp': go_for_it_wp,
            'wp_gain': go_for_it_wp - current_wp,
            'conversion_prob': conversion_prob
        }

        # Option 2: Punt
        # Assume a 40-yard net punt; a punt into the end zone is a
        # touchback, giving the opponent the ball at its own 20
        punt_net = 40
        landing_spot = state.yard_line + punt_net
        receiving_yard_line = 20 if landing_spot >= 100 else 100 - landing_spot
        punt_state = GameState(
            home_score=state.home_score,
            away_score=state.away_score,
            quarter=state.quarter,
            seconds_remaining=state.seconds_remaining - 5,
            yard_line=receiving_yard_line,
            down=1,
            distance=10,
            home_has_ball=not state.home_has_ball,
            home_timeouts=state.home_timeouts,
            away_timeouts=state.away_timeouts,
            home_pregame_wp=state.home_pregame_wp
        )

        punt_wp = self.wp_model.predict_single(punt_state)

        results['punt'] = {
            'expected_wp': punt_wp,
            'wp_gain': punt_wp - current_wp,
            'expected_net': punt_net
        }

        # Option 3: Field goal (if in range)
        if fg_success_prob is not None and state.yard_line >= 60:
            # If make: score 3 points, kick off
            make_state = GameState(
                home_score=state.home_score + (3 if state.home_has_ball else 0),
                away_score=state.away_score + (0 if state.home_has_ball else 3),
                quarter=state.quarter,
                seconds_remaining=state.seconds_remaining - 5,
                yard_line=25,  # Touchback on kickoff
                down=1,
                distance=10,
                home_has_ball=not state.home_has_ball,
                home_timeouts=state.home_timeouts,
                away_timeouts=state.away_timeouts,
                home_pregame_wp=state.home_pregame_wp
            )

            wp_make = self.wp_model.predict_single(make_state)
            wp_miss = self.wp_model.predict_single(fail_state)  # Simplification: treated like a failed 4th down (a real miss spots the ball at the kick)

            fg_wp = fg_success_prob * wp_make + (1 - fg_success_prob) * wp_miss

            results['field_goal'] = {
                'expected_wp': fg_wp,
                'wp_gain': fg_wp - current_wp,
                'success_prob': fg_success_prob
            }

        # Recommendation
        options = {k: v['expected_wp'] for k, v in results.items()
                  if isinstance(v, dict) and 'expected_wp' in v}

        results['recommendation'] = max(options, key=options.get)
        results['wp_gain_vs_punt'] = results['go_for_it']['expected_wp'] - results['punt']['expected_wp']

        return results
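At its core, the go-for-it comparison is a probability-weighted average of the branch WPs. With illustrative (assumed, not model-fitted) numbers:

```python
# Hypothetical 4th-and-2: the WP model says 0.62 after converting,
# 0.44 after a turnover on downs, and 0.50 after a punt
conversion_prob = 0.55
wp_convert, wp_fail, wp_punt = 0.62, 0.44, 0.50

go_for_it_wp = conversion_prob * wp_convert + (1 - conversion_prob) * wp_fail
print(round(go_for_it_wp, 3), go_for_it_wp > wp_punt)  # 0.539 True
```

Here going for it is worth about 3.9 points of win probability over punting, so the analyzer would recommend it even though the conversion is barely better than a coin flip.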

Summary

This chapter covered building and applying win probability models:

Key Concepts:

1. Game State Representation: Capturing all relevant factors affecting win probability
2. Logistic Regression: Simple, interpretable baseline model
3. Advanced Models: Gradient boosting and neural networks for improved accuracy
4. Calibration: Ensuring predictions match observed frequencies
5. Win Probability Added: Measuring play impact in context
6. Decision Analysis: Using WP for in-game strategy

Best Practices:

- Start with interpretable logistic regression before complex models
- Always validate calibration, not just discrimination
- Consider both EPA (context-free) and WPA (context-dependent)
- Use isotonic regression for post-hoc calibration
- Account for team strength in pregame probability

Next Steps: The next chapter applies machine learning more broadly to college football analytics, covering additional prediction tasks and advanced techniques.