Chapter 21: Win Probability Models

Introduction
Win probability models transform how we understand and analyze football games. Rather than waiting until the final whistle to know the outcome, these models estimate the likelihood of each team winning at any moment during the game. This chapter develops win probability models from first principles, progressing from simple state-based models to sophisticated machine learning approaches.
Win probability (WP) answers a fundamental question: "Given the current game situation, what is the probability that each team will win?" The answer depends on numerous factors including score, time remaining, field position, down and distance, and team strength differentials.
Learning Objectives:
By the end of this chapter, you will be able to:

- Understand the theoretical foundations of win probability models
- Build win probability models using logistic regression
- Incorporate game state features (score, time, field position)
- Adjust for team strength differentials
- Evaluate and calibrate win probability predictions
- Apply WP models to in-game decision analysis
21.1 Foundations of Win Probability
What Win Probability Represents
Win probability is a conditional probability:
$$WP = P(\text{Team A wins} | \text{Game State})$$
Where game state includes:

- Current score differential
- Time remaining
- Field position (yard line)
- Down and distance
- Possession indicator
- Timeouts remaining
- Team strength differential
The State Space of Football
Football can be modeled as a finite state machine where each play transitions the game from one state to another:
from dataclasses import dataclass
from typing import Optional
import numpy as np


@dataclass
class GameState:
    """
    Complete representation of a football game state.
    """
    # Score
    home_score: int
    away_score: int

    # Time
    quarter: int
    seconds_remaining: int  # Seconds remaining in the current quarter

    # Field position
    yard_line: int  # 1-99, from perspective of team with ball
    down: int  # 1-4
    distance: int  # Yards to first down

    # Possession
    home_has_ball: bool

    # Resources
    home_timeouts: int
    away_timeouts: int

    # Team strength (optional)
    home_pregame_wp: float = 0.5

    @property
    def score_differential(self) -> int:
        """Score differential from home team perspective."""
        return self.home_score - self.away_score

    @property
    def possession_score_diff(self) -> int:
        """Score differential from possessing team's perspective."""
        if self.home_has_ball:
            return self.home_score - self.away_score
        return self.away_score - self.home_score

    @property
    def game_seconds_remaining(self) -> int:
        """Total seconds remaining in game."""
        quarters_remaining = 4 - self.quarter
        return quarters_remaining * 900 + self.seconds_remaining

    @property
    def is_red_zone(self) -> bool:
        """Is the offense in the red zone?"""
        return self.yard_line >= 80

    @property
    def is_scoring_position(self) -> bool:
        """Is the offense in field goal range?"""
        return self.yard_line >= 60


class GameStateEncoder:
    """
    Encode game state into feature vector for ML models.
    """

    def encode(self, state: GameState) -> np.ndarray:
        """
        Convert game state to feature vector.

        Features:
        1. Score differential (from home perspective)
        2. Game seconds remaining (normalized)
        3. Field position (normalized)
        4. Down (one-hot: 4 features)
        5. Distance to first down (normalized)
        6. Possession indicator
        7. Timeout differential
        8. Pregame win probability
        """
        # Normalized features
        score_diff = state.score_differential / 28  # Normalize by ~4 TDs
        time_remaining = state.game_seconds_remaining / 3600  # Normalize by game length
        field_pos = state.yard_line / 100
        distance_norm = min(state.distance, 20) / 20

        # One-hot encode down
        down_features = [0, 0, 0, 0]
        if 1 <= state.down <= 4:
            down_features[state.down - 1] = 1

        # Possession and timeouts
        possession = 1 if state.home_has_ball else 0
        timeout_diff = (state.home_timeouts - state.away_timeouts) / 3

        features = [
            score_diff,
            time_remaining,
            field_pos,
            *down_features,
            distance_norm,
            possession,
            timeout_diff,
            state.home_pregame_wp
        ]

        return np.array(features)
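As a sanity check on those normalization choices, the arithmetic can be reproduced by hand. Here is a standalone sketch for one hypothetical state (home up 3, ball at yard_line 65, 3rd-and-6, 2:00 left in the fourth quarter; all values assumed for illustration):

```python
# Hypothetical state: home up 3, yard_line 65, 3rd-and-6,
# 120 seconds left in the 4th quarter.
score_diff = 3 / 28                      # normalize by ~4 TDs
game_seconds = (4 - 4) * 900 + 120       # quarters remaining * 900 + quarter clock
time_remaining = game_seconds / 3600     # fraction of game remaining
field_pos = 65 / 100
distance_norm = min(6, 20) / 20

down_features = [0, 0, 0, 0]             # one-hot encode the down
down_features[3 - 1] = 1                 # 3rd down

features = [score_diff, time_remaining, field_pos, *down_features, distance_norm]
```

With the possession indicator, timeout differential, and pregame WP appended, this matches the 11-feature vector the encoder produces.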
21.2 Building a Logistic Regression Win Probability Model
The Logistic Model
Win probability naturally fits a logistic regression framework:
$$WP = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + ... + \beta_n x_n)}}$$
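To make the logistic form concrete, here is a minimal sketch that evaluates the formula by hand with made-up coefficients (illustrative only, not fitted values):

```python
import math

def logistic_wp(features, betas, intercept):
    """Evaluate WP = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    z = intercept + sum(b * x for b, x in zip(betas, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical state: home up 7 (score_diff = 7/28 = 0.25), half the game
# remaining (time = 0.5), home has the ball (possession = 1). Coefficients
# are invented for illustration.
wp = logistic_wp([0.25, 0.5, 1.0], betas=[4.0, -0.5, 0.3], intercept=0.1)
```

Whatever the coefficients, the sigmoid maps the linear score onto (0, 1), so the output is always a valid probability.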
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.calibration import calibration_curve
from typing import Dict, List, Tuple


class LogisticWinProbabilityModel:
    """
    Win probability model using logistic regression.

    Simple, interpretable baseline model using key game state features.
    """

    def __init__(self):
        self.model = LogisticRegression(
            C=1.0,
            max_iter=1000,
            solver='lbfgs'
        )
        self.scaler = StandardScaler()
        self.feature_names = [
            'score_diff',
            'time_remaining',
            'field_position',
            'down_1', 'down_2', 'down_3', 'down_4',
            'distance',
            'possession',
            'timeout_diff',
            'pregame_wp'
        ]
        self.is_fitted = False

    def prepare_features(self, plays: pd.DataFrame) -> np.ndarray:
        """
        Extract features from play-by-play data.

        Parameters:
        -----------
        plays : pd.DataFrame
            Play-by-play data with game state columns

        Returns:
        --------
        np.ndarray : Feature matrix
        """
        encoder = GameStateEncoder()
        features = []

        for _, play in plays.iterrows():
            state = GameState(
                home_score=play['home_score'],
                away_score=play['away_score'],
                quarter=play['quarter'],
                seconds_remaining=play['seconds_remaining'],
                yard_line=play['yard_line'],
                down=play['down'],
                distance=play['distance'],
                home_has_ball=play['home_possession'],
                home_timeouts=play.get('home_timeouts', 3),
                away_timeouts=play.get('away_timeouts', 3),
                home_pregame_wp=play.get('pregame_wp', 0.5)
            )
            features.append(encoder.encode(state))

        return np.array(features)

    def train(self, plays: pd.DataFrame, outcome_col: str = 'home_win'):
        """
        Train the win probability model.

        Parameters:
        -----------
        plays : pd.DataFrame
            Training data with game state and outcomes
        outcome_col : str
            Column indicating if home team won
        """
        X = self.prepare_features(plays)
        y = plays[outcome_col].values

        # Scale features
        X_scaled = self.scaler.fit_transform(X)

        # Train model
        self.model.fit(X_scaled, y)
        self.is_fitted = True

        # Store coefficients for interpretation
        self.coefficients = dict(zip(self.feature_names, self.model.coef_[0]))

    def predict(self, plays: pd.DataFrame) -> np.ndarray:
        """
        Predict win probability for game states.

        Returns:
        --------
        np.ndarray : Win probabilities for home team
        """
        if not self.is_fitted:
            raise ValueError("Model must be trained first")

        X = self.prepare_features(plays)
        X_scaled = self.scaler.transform(X)
        return self.model.predict_proba(X_scaled)[:, 1]

    def predict_single(self, state: GameState) -> float:
        """
        Predict win probability for a single game state.
        """
        if not self.is_fitted:
            raise ValueError("Model must be trained first")

        encoder = GameStateEncoder()
        X = encoder.encode(state).reshape(1, -1)
        X_scaled = self.scaler.transform(X)
        return self.model.predict_proba(X_scaled)[0, 1]

    def get_feature_importance(self) -> pd.DataFrame:
        """
        Get feature importance from coefficients.
        """
        return pd.DataFrame({
            'feature': self.feature_names,
            'coefficient': self.model.coef_[0],
            'abs_importance': np.abs(self.model.coef_[0])
        }).sort_values('abs_importance', ascending=False)
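Before trusting the wrapper on real play-by-play data, it helps to smoke-test the approach on synthetic data where the right answer is known. A minimal sketch (plain scikit-learn, with a toy generative process of my own construction) checks that the fit recovers a positive coefficient on the lead:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy generative process (assumed, not real data): the bigger the lead and
# the less time remaining, the more likely the leading team is to win.
rng = np.random.default_rng(42)
n = 4000
score_diff = rng.integers(-21, 22, size=n).astype(float)
time_frac = rng.uniform(size=n)                  # fraction of game remaining
logit = 0.18 * score_diff / (time_frac + 0.1)    # leads matter more late
win_prob = 1 / (1 + np.exp(-logit))
y = (rng.uniform(size=n) < win_prob).astype(int)

X = np.column_stack([score_diff, time_frac])
model = LogisticRegression(max_iter=1000).fit(X, y)
```

A positive score-differential coefficient and predictions confined to [0, 1] are the minimum bar any WP model should clear.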
Feature Engineering for Win Probability
Effective features capture game dynamics:
class WinProbabilityFeatureEngineer:
    """
    Engineer features for win probability models.
    """

    def create_features(self, plays: pd.DataFrame) -> pd.DataFrame:
        """
        Create comprehensive feature set.

        Features include:
        - Basic game state
        - Interaction terms
        - Non-linear transformations
        - Expected points adjustments
        """
        df = plays.copy()

        # Basic features
        df['score_diff'] = df['home_score'] - df['away_score']
        df['poss_score_diff'] = df.apply(
            lambda x: x['score_diff'] if x['home_possession'] else -x['score_diff'],
            axis=1
        )

        # Time features
        df['game_seconds'] = (4 - df['quarter']) * 900 + df['seconds_remaining']
        df['game_pct_remaining'] = df['game_seconds'] / 3600

        # Field position
        df['field_position_pct'] = df['yard_line'] / 100
        df['is_red_zone'] = (df['yard_line'] >= 80).astype(int)
        df['is_fg_range'] = (df['yard_line'] >= 60).astype(int)

        # Down and distance
        df['down_distance'] = df['down'] * df['distance']  # Interaction
        df['third_long'] = ((df['down'] == 3) & (df['distance'] >= 8)).astype(int)
        df['fourth_down'] = (df['down'] == 4).astype(int)

        # Score-time interactions
        df['score_time_interaction'] = df['score_diff'] * df['game_pct_remaining']
        df['score_per_time'] = df['score_diff'] / (df['game_pct_remaining'] + 0.01)

        # Possession value
        df['possession_value'] = df['poss_score_diff'] + df['field_position_pct'] * 3

        # Urgency indicators
        df['trailing_late'] = (
            (df['poss_score_diff'] < 0) &
            (df['game_seconds'] < 600)
        ).astype(int)
        df['leading_late'] = (
            (df['poss_score_diff'] > 0) &
            (df['game_seconds'] < 600)
        ).astype(int)

        # Timeout differential
        if 'home_timeouts' in df.columns:
            df['timeout_diff'] = df['home_timeouts'] - df['away_timeouts']
        else:
            df['timeout_diff'] = 0

        return df
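To see the engineered columns concretely, a few of the key transformations can be applied to a two-row toy frame (values assumed; home has the ball in both rows, so the possession-adjusted differential equals score_diff and is omitted here):

```python
import pandas as pd

# Two hypothetical 4th-quarter plays: home down 3 with 5:00 left,
# then home up 3 with 1:30 left.
plays = pd.DataFrame({
    'home_score': [14, 20],
    'away_score': [17, 17],
    'quarter': [4, 4],
    'seconds_remaining': [300, 90],
    'down': [3, 1],
    'distance': [9, 10],
})

plays['score_diff'] = plays['home_score'] - plays['away_score']
plays['game_seconds'] = (4 - plays['quarter']) * 900 + plays['seconds_remaining']
plays['third_long'] = ((plays['down'] == 3) & (plays['distance'] >= 8)).astype(int)
plays['trailing_late'] = (
    (plays['score_diff'] < 0) & (plays['game_seconds'] < 600)
).astype(int)
```

The first row is flagged both third_long (3rd-and-9) and trailing_late (down 3 inside ten minutes); the second row triggers neither.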
21.3 Advanced Win Probability Models
Gradient Boosting Win Probability
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score


class GradientBoostingWPModel:
    """
    Win probability model using gradient boosting.

    Captures non-linear relationships and interactions automatically.
    """

    def __init__(self,
                 n_estimators: int = 100,
                 max_depth: int = 4,
                 learning_rate: float = 0.1):
        self.model = GradientBoostingClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            learning_rate=learning_rate,
            random_state=42
        )
        self.feature_engineer = WinProbabilityFeatureEngineer()
        self.feature_cols = None
        self.is_fitted = False

    def train(self,
              plays: pd.DataFrame,
              outcome_col: str = 'home_win',
              cv: int = 5):
        """
        Train the model with cross-validation.
        """
        # Engineer features
        df = self.feature_engineer.create_features(plays)

        # Select feature columns
        self.feature_cols = [
            'score_diff', 'game_pct_remaining', 'field_position_pct',
            'down', 'distance', 'is_red_zone', 'is_fg_range',
            'score_time_interaction', 'trailing_late', 'leading_late',
            'timeout_diff', 'pregame_wp'
        ]

        # Filter to available columns
        self.feature_cols = [c for c in self.feature_cols if c in df.columns]

        X = df[self.feature_cols].values
        y = plays[outcome_col].values

        # Cross-validation
        cv_scores = cross_val_score(self.model, X, y, cv=cv, scoring='roc_auc')
        print(f"CV AUC: {cv_scores.mean():.4f} (+/- {cv_scores.std()*2:.4f})")

        # Train on full data
        self.model.fit(X, y)
        self.is_fitted = True

    def predict(self, plays: pd.DataFrame) -> np.ndarray:
        """Predict win probabilities."""
        if not self.is_fitted:
            raise ValueError("Model must be trained first")

        df = self.feature_engineer.create_features(plays)
        X = df[self.feature_cols].values
        return self.model.predict_proba(X)[:, 1]

    def get_feature_importance(self) -> pd.DataFrame:
        """Get feature importance."""
        return pd.DataFrame({
            'feature': self.feature_cols,
            'importance': self.model.feature_importances_
        }).sort_values('importance', ascending=False)
Neural Network Win Probability
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class NeuralWPModel(nn.Module):
    """
    Neural network win probability model.

    Architecture:
    - Input layer: game state features
    - Hidden layers with dropout
    - Output: probability via sigmoid
    """

    def __init__(self, input_dim: int, hidden_dims: List[int] = [64, 32]):
        super().__init__()

        layers = []
        prev_dim = input_dim
        for hidden_dim in hidden_dims:
            layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.ReLU(),
                nn.Dropout(0.2),
                nn.BatchNorm1d(hidden_dim)
            ])
            prev_dim = hidden_dim

        layers.append(nn.Linear(prev_dim, 1))
        layers.append(nn.Sigmoid())

        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)


class NeuralWPTrainer:
    """
    Train neural network win probability model.
    """

    def __init__(self, input_dim: int):
        self.model = NeuralWPModel(input_dim)
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=0.001)
        self.criterion = nn.BCELoss()

    def train(self,
              X_train: np.ndarray,
              y_train: np.ndarray,
              epochs: int = 50,
              batch_size: int = 256):
        """Train the model."""
        dataset = TensorDataset(
            torch.FloatTensor(X_train),
            torch.FloatTensor(y_train).unsqueeze(1)
        )
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

        self.model.train()
        for epoch in range(epochs):
            total_loss = 0
            for X_batch, y_batch in loader:
                self.optimizer.zero_grad()
                predictions = self.model(X_batch)
                loss = self.criterion(predictions, y_batch)
                loss.backward()
                self.optimizer.step()
                total_loss += loss.item()

            if (epoch + 1) % 10 == 0:
                print(f"Epoch {epoch+1}/{epochs}, Loss: {total_loss/len(loader):.4f}")

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict win probabilities."""
        self.model.eval()
        with torch.no_grad():
            X_tensor = torch.FloatTensor(X)
            predictions = self.model(X_tensor)
        return predictions.numpy().flatten()
21.4 Model Calibration and Evaluation
Calibration Analysis
A well-calibrated model's predictions match observed frequencies:
class WPCalibrationAnalyzer:
    """
    Analyze and improve win probability calibration.
    """

    def __init__(self, n_bins: int = 10):
        self.n_bins = n_bins

    def calculate_calibration(self,
                              predictions: np.ndarray,
                              outcomes: np.ndarray) -> pd.DataFrame:
        """
        Calculate calibration statistics.

        Parameters:
        -----------
        predictions : np.ndarray
            Predicted probabilities
        outcomes : np.ndarray
            Binary outcomes (0/1)

        Returns:
        --------
        pd.DataFrame : Calibration by bin
        """
        bins = np.linspace(0, 1, self.n_bins + 1)
        calibration = []

        for i in range(self.n_bins):
            if i == self.n_bins - 1:
                # Include predictions exactly equal to 1.0 in the final bin
                mask = (predictions >= bins[i]) & (predictions <= bins[i+1])
            else:
                mask = (predictions >= bins[i]) & (predictions < bins[i+1])
            if mask.sum() > 0:
                calibration.append({
                    'bin': f'{bins[i]:.1f}-{bins[i+1]:.1f}',
                    'bin_midpoint': (bins[i] + bins[i+1]) / 2,
                    'predicted_mean': predictions[mask].mean(),
                    'actual_mean': outcomes[mask].mean(),
                    'count': mask.sum(),
                    'calibration_error': predictions[mask].mean() - outcomes[mask].mean()
                })

        return pd.DataFrame(calibration)

    def calculate_metrics(self,
                          predictions: np.ndarray,
                          outcomes: np.ndarray) -> Dict:
        """
        Calculate comprehensive calibration metrics.
        """
        from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

        calibration = self.calculate_calibration(predictions, outcomes)

        # Expected Calibration Error (ECE)
        weights = calibration['count'] / calibration['count'].sum()
        ece = (weights * calibration['calibration_error'].abs()).sum()

        # Maximum Calibration Error (MCE)
        mce = calibration['calibration_error'].abs().max()

        return {
            'brier_score': brier_score_loss(outcomes, predictions),
            'log_loss': log_loss(outcomes, predictions),
            'auc': roc_auc_score(outcomes, predictions),
            'ece': ece,
            'mce': mce
        }

    def plot_calibration(self,
                         predictions: np.ndarray,
                         outcomes: np.ndarray,
                         ax=None):
        """
        Plot calibration curve.
        """
        import matplotlib.pyplot as plt

        if ax is None:
            fig, ax = plt.subplots(figsize=(8, 8))

        calibration = self.calculate_calibration(predictions, outcomes)

        # Perfect calibration line
        ax.plot([0, 1], [0, 1], 'k--', label='Perfect Calibration')

        # Actual calibration
        ax.plot(calibration['predicted_mean'],
                calibration['actual_mean'],
                'bo-', label='Model')

        ax.set_xlabel('Predicted Probability')
        ax.set_ylabel('Observed Frequency')
        ax.set_title('Win Probability Calibration')
        ax.legend()
        ax.grid(True, alpha=0.3)

        return ax
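The binned ECE computation above can be exercised end-to-end on synthetic data. A compact standalone version (the function and toy data are my own) confirms that predictions which are calibrated by construction score near zero:

```python
import numpy as np

def expected_calibration_error(predictions, outcomes, n_bins=10):
    """Count-weighted mean |predicted - observed| over probability bins."""
    bins = np.linspace(0, 1, n_bins + 1)
    n = len(predictions)
    ece = 0.0
    for i in range(n_bins):
        # Nudge the last edge so predictions exactly equal to 1.0 are binned
        upper = bins[i + 1] + (1e-9 if i == n_bins - 1 else 0.0)
        mask = (predictions >= bins[i]) & (predictions < upper)
        if mask.sum():
            gap = abs(predictions[mask].mean() - outcomes[mask].mean())
            ece += (mask.sum() / n) * gap
    return ece

# Synthetic, perfectly calibrated predictions: outcomes drawn at the stated rates.
rng = np.random.default_rng(0)
p = rng.uniform(size=5000)
y = (rng.uniform(size=5000) < p).astype(float)
ece = expected_calibration_error(p, y)   # small by construction
```

The residual ECE here is pure sampling noise; a real model's ECE folds in genuine miscalibration on top of that noise, which is why bin counts matter when interpreting it.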
Isotonic Calibration
from sklearn.isotonic import IsotonicRegression


class CalibratedWPModel:
    """
    Wrapper that adds isotonic calibration to any WP model.
    """

    def __init__(self, base_model):
        self.base_model = base_model
        self.calibrator = IsotonicRegression(out_of_bounds='clip')
        self.is_calibrated = False

    def calibrate(self, val_plays: pd.DataFrame, outcome_col: str = 'home_win'):
        """
        Fit isotonic calibration on a held-out validation set.
        """
        raw_predictions = self.base_model.predict(val_plays)
        outcomes = val_plays[outcome_col].values
        self.calibrator.fit(raw_predictions, outcomes)
        self.is_calibrated = True

    def predict(self, plays: pd.DataFrame) -> np.ndarray:
        """
        Get calibrated predictions.
        """
        raw_predictions = self.base_model.predict(plays)
        if self.is_calibrated:
            return self.calibrator.predict(raw_predictions)
        return raw_predictions
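On a toy example, the fitted isotonic map is easy to inspect. Here the raw scores and outcomes are invented to show the monotone, clipped behavior of the calibrator:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Invented raw scores and outcomes: everything below ~0.6 lost, above won.
raw = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
observed = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

iso = IsotonicRegression(out_of_bounds='clip')
iso.fit(raw, observed)

# Predictions interpolate between the fitted points and clip outside
# [0.1, 0.9]; the mapping is monotone, so rankings are preserved.
calibrated = iso.predict(np.array([0.2, 0.6, 0.95]))
```

Because isotonic regression only reshapes probabilities monotonically, it can fix systematic over- or under-confidence but cannot change which of two states the model ranks higher.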
21.5 Win Probability Added (WPA)
Calculating Play Impact
Win Probability Added measures each play's impact on winning:
class WPACalculator:
    """
    Calculate Win Probability Added for each play.

    WPA = WP_after - WP_before
    """

    def __init__(self, wp_model):
        self.wp_model = wp_model

    def calculate_wpa(self, plays: pd.DataFrame) -> pd.DataFrame:
        """
        Calculate WPA for all plays.

        Parameters:
        -----------
        plays : pd.DataFrame
            Play-by-play data with before/after states

        Returns:
        --------
        pd.DataFrame : Plays with WPA added
        """
        df = plays.copy()

        # Get WP before each play
        df['wp_before'] = self.wp_model.predict(plays)

        # Calculate WP after (need after-play state)
        if 'wp_after' not in df.columns:
            # Shift to get next play's WP as current play's WP_after
            df['wp_after'] = df.groupby('game_id')['wp_before'].shift(-1)

            # Handle game-ending plays: WP_after resolves to the outcome
            df.loc[df['wp_after'].isna(), 'wp_after'] = df.loc[
                df['wp_after'].isna(), 'home_win'
            ].astype(float)

        # Calculate WPA from home team perspective
        df['wpa_home'] = df['wp_after'] - df['wp_before']

        # WPA from perspective of possessing team
        df['wpa'] = df.apply(
            lambda x: x['wpa_home'] if x['home_possession'] else -x['wpa_home'],
            axis=1
        )

        return df

    def aggregate_player_wpa(self, plays_with_wpa: pd.DataFrame) -> pd.DataFrame:
        """
        Aggregate WPA by player.
        """
        player_wpa = plays_with_wpa.groupby(['player_id', 'player_name']).agg({
            'wpa': ['sum', 'mean', 'count'],
            'wpa_home': 'sum'
        }).round(4)

        player_wpa.columns = ['total_wpa', 'avg_wpa', 'plays', 'home_wpa']
        return player_wpa.sort_values('total_wpa', ascending=False)
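The shift-based bookkeeping has a useful property worth checking on a toy sequence: because WPA telescopes, a game's home WPA sums to the final outcome minus the opening WP. A standalone sketch with invented WP values:

```python
import pandas as pd

# Three plays from one hypothetical game the home team won.
df = pd.DataFrame({
    'game_id': [1, 1, 1],
    'wp_before': [0.50, 0.62, 0.58],
    'home_win': [1, 1, 1],
})

# Next play's WP is this play's WP_after; the final play resolves to the outcome.
df['wp_after'] = df.groupby('game_id')['wp_before'].shift(-1)
df.loc[df['wp_after'].isna(), 'wp_after'] = df.loc[
    df['wp_after'].isna(), 'home_win'
].astype(float)

df['wpa_home'] = df['wp_after'] - df['wp_before']
# Telescoping sum: 1.0 (home won) - 0.50 (opening WP) = 0.50
```

This invariant is a handy unit test for a WPA pipeline: if per-game WPA does not sum to outcome minus opening WP, the shift or game-boundary handling is wrong.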
Expected Points Added Integration
class EPAWPAAnalyzer:
    """
    Combine EPA and WPA for comprehensive play analysis.
    """

    def __init__(self, wp_model, ep_model):
        self.wp_model = wp_model
        self.ep_model = ep_model

    def analyze_play(self, play: pd.Series) -> Dict:
        """
        Comprehensive play analysis with EPA and WPA.
        """
        # Get EPA (context-neutral value)
        epa = self.ep_model.calculate_epa(play)

        # Get WPA (context-dependent value)
        wp_before = play.get('wp_before', 0.5)
        wp_after = play.get('wp_after', 0.5)
        wpa = wp_after - wp_before

        # Leverage index: how much does this situation magnify play impact?
        leverage = abs(wpa) / (abs(epa) + 0.01) if epa != 0 else 1.0

        return {
            'epa': epa,
            'wpa': wpa,
            'wp_before': wp_before,
            'wp_after': wp_after,
            'leverage_index': leverage,
            'is_high_leverage': leverage > 1.5
        }
21.6 Applications of Win Probability
In-Game Decision Analysis
class DecisionAnalyzer:
    """
    Use WP models to analyze in-game decisions.
    """

    def __init__(self, wp_model):
        self.wp_model = wp_model

    def analyze_fourth_down(self,
                            state: GameState,
                            conversion_prob: float,
                            fg_success_prob: float = None) -> Dict:
        """
        Analyze a fourth down decision using WP.

        Options:
        1. Go for it (convert or turnover on downs)
        2. Punt (surrender possession, better field position for opponent)
        3. Kick field goal (if in range)
        """
        current_wp = self.wp_model.predict_single(state)
        results = {'current_wp': current_wp}

        # Option 1: Go for it
        # If convert: new first down
        convert_state = GameState(
            home_score=state.home_score,
            away_score=state.away_score,
            quarter=state.quarter,
            seconds_remaining=state.seconds_remaining - 5,
            yard_line=min(state.yard_line + state.distance, 99),
            down=1,
            distance=10,
            home_has_ball=state.home_has_ball,
            home_timeouts=state.home_timeouts,
            away_timeouts=state.away_timeouts,
            home_pregame_wp=state.home_pregame_wp
        )

        # If fail: turnover on downs at the same spot
        fail_state = GameState(
            home_score=state.home_score,
            away_score=state.away_score,
            quarter=state.quarter,
            seconds_remaining=state.seconds_remaining - 5,
            yard_line=100 - state.yard_line,
            down=1,
            distance=10,
            home_has_ball=not state.home_has_ball,
            home_timeouts=state.home_timeouts,
            away_timeouts=state.away_timeouts,
            home_pregame_wp=state.home_pregame_wp
        )

        wp_convert = self.wp_model.predict_single(convert_state)
        wp_fail = self.wp_model.predict_single(fail_state)

        go_for_it_wp = conversion_prob * wp_convert + (1 - conversion_prob) * wp_fail

        results['go_for_it'] = {
            'expected_wp': go_for_it_wp,
            'wp_gain': go_for_it_wp - current_wp,
            'conversion_prob': conversion_prob
        }

        # Option 2: Punt
        # Assume a 40-yard net punt, capped so a touchback leaves the
        # opponent no deeper than their own 20
        punt_distance = min(40, 80 - state.yard_line)
        punt_state = GameState(
            home_score=state.home_score,
            away_score=state.away_score,
            quarter=state.quarter,
            seconds_remaining=state.seconds_remaining - 5,
            yard_line=100 - (state.yard_line + punt_distance),
            down=1,
            distance=10,
            home_has_ball=not state.home_has_ball,
            home_timeouts=state.home_timeouts,
            away_timeouts=state.away_timeouts,
            home_pregame_wp=state.home_pregame_wp
        )

        punt_wp = self.wp_model.predict_single(punt_state)

        results['punt'] = {
            'expected_wp': punt_wp,
            'wp_gain': punt_wp - current_wp,
            'expected_net': punt_distance
        }

        # Option 3: Field goal (if in range)
        if fg_success_prob is not None and state.yard_line >= 60:
            # If make: score 3 points, then kick off to the opponent
            make_state = GameState(
                home_score=state.home_score + (3 if state.home_has_ball else 0),
                away_score=state.away_score + (0 if state.home_has_ball else 3),
                quarter=state.quarter,
                seconds_remaining=state.seconds_remaining - 5,
                yard_line=25,  # Touchback on kickoff
                down=1,
                distance=10,
                home_has_ball=not state.home_has_ball,
                home_timeouts=state.home_timeouts,
                away_timeouts=state.away_timeouts,
                home_pregame_wp=state.home_pregame_wp
            )

            wp_make = self.wp_model.predict_single(make_state)
            wp_miss = self.wp_model.predict_single(fail_state)  # Same spot as a failed 4th down

            fg_wp = fg_success_prob * wp_make + (1 - fg_success_prob) * wp_miss

            results['field_goal'] = {
                'expected_wp': fg_wp,
                'wp_gain': fg_wp - current_wp,
                'success_prob': fg_success_prob
            }

        # Recommendation: pick the option with the highest expected WP
        options = {k: v['expected_wp'] for k, v in results.items()
                   if isinstance(v, dict) and 'expected_wp' in v}
        results['recommendation'] = max(options, key=options.get)
        results['wp_gain_vs_punt'] = results['go_for_it']['expected_wp'] - results['punt']['expected_wp']

        return results
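Setting the go-for-it and punt expectations equal yields a break-even conversion probability, which is often the most communicable output of this analysis. A minimal sketch (the WP inputs are illustrative numbers, not model output):

```python
def breakeven_conversion_prob(wp_convert, wp_fail, wp_punt):
    """Solve p * wp_convert + (1 - p) * wp_fail = wp_punt for p."""
    return (wp_punt - wp_fail) / (wp_convert - wp_fail)

# Illustrative values: converting is worth 0.58 WP, failing 0.44, punting 0.50.
p_star = breakeven_conversion_prob(wp_convert=0.58, wp_fail=0.44, wp_punt=0.50)
# Go for it whenever the offense's conversion probability exceeds p_star.
```

Framing the decision as a threshold on conversion probability lets coaches compare p_star against historical conversion rates for the given distance rather than reasoning about raw WP numbers.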
Summary
This chapter covered building and applying win probability models:
Key Concepts:

1. Game State Representation: Capturing all relevant factors affecting win probability
2. Logistic Regression: Simple, interpretable baseline model
3. Advanced Models: Gradient boosting and neural networks for improved accuracy
4. Calibration: Ensuring predictions match observed frequencies
5. Win Probability Added: Measuring play impact in context
6. Decision Analysis: Using WP for in-game strategy
Best Practices:

- Start with interpretable logistic regression before complex models
- Always validate calibration, not just discrimination
- Consider both EPA (context-free) and WPA (context-dependent)
- Use isotonic regression for post-hoc calibration
- Account for team strength in pregame probability
Next Steps: The next chapter applies machine learning more broadly to college football analytics, covering additional prediction tasks and advanced techniques.