Chapter 21: Key Takeaways - Win Probability Models

Quick Reference Summary

This chapter covered building, calibrating, and applying win probability models for college football.


Core Concepts

Game State Features

Feature            | Description                | Typical Range
Score Differential | Home score - Away score    | -35 to +35
Time Remaining     | Seconds left in game       | 0 to 3600
Field Position     | Yard line (offense's view) | 1 to 99
Down               | Current down               | 1 to 4
Distance           | Yards to first down        | 1 to 99
Possession         | Which team has the ball    | Home / Away
Timeouts           | Remaining for each team    | 0 to 3
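A minimal sketch of deriving these features from raw play-by-play. The raw column names (home_points, away_points, seconds_remaining, yard_line, offense_is_home) are assumptions about your data feed; adjust them to your source.

```python
import pandas as pd

def build_game_state(pbp: pd.DataFrame) -> pd.DataFrame:
    """Derive the game-state features in the table above.

    Assumes the raw feed already has 'down' and 'distance' columns;
    the other raw column names here are illustrative.
    """
    df = pbp.copy()
    df['score_diff'] = df['home_points'] - df['away_points']   # -35 to +35
    df['time_remaining'] = df['seconds_remaining']             # 0 to 3600
    df['field_position'] = df['yard_line']                     # 1 to 99, offense's view
    df['possession'] = df['offense_is_home'].astype(int)       # 1 = home has ball
    return df[['score_diff', 'time_remaining', 'field_position',
               'down', 'distance', 'possession']]
```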

Key Metrics

Metric      | Formula                              | Good Value
Brier Score | Mean((pred - outcome)²)              | < 0.20
ECE         | Weighted avg calibration error       | < 0.05
AUC         | Area under ROC curve                 | > 0.80
Log Loss    | -Σ[y·log(p) + (1-y)·log(1-p)] / n    | < 0.50
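These metrics can be computed directly with scikit-learn, a sketch against the target values in the table:

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

def evaluate_wp(predictions: np.ndarray, outcomes: np.ndarray) -> dict:
    """Score WP predictions against binary outcomes (1 = home win)."""
    return {
        'brier': brier_score_loss(outcomes, predictions),  # target < 0.20
        'log_loss': log_loss(outcomes, predictions),       # target < 0.50
        'auc': roc_auc_score(outcomes, predictions),       # target > 0.80
    }
```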

Essential Formulas

Logistic Win Probability

WP = 1 / (1 + exp(-z))

Where: z = β₀ + β₁·score_diff + β₂·time + β₃·field_pos + ...
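A worked instance of the formula, with hypothetical coefficients chosen purely for illustration (a fitted model would supply its own):

```python
import math

def logistic_wp(z: float) -> float:
    """Map the linear score z to a win probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative coefficients only -- not fitted values
beta0, beta1 = 0.0, 0.08          # intercept, weight per point of score_diff
z = beta0 + beta1 * 7             # home team leads by 7
wp = logistic_wp(z)               # ≈ 0.64
```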

Win Probability Added

WPA = WP_after - WP_before

Total game WPA = Final WP - Initial WP

Expected Calibration Error

ECE = Σ (n_bin / n_total) × |predicted_avg - actual_avg|

Pregame WP from Spread

WP = 1 / (1 + 10^(-spread / 16))

Example: 7-point favorite
WP = 1 / (1 + 10^(-7/16)) ≈ 0.73
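A sketch of the spread-to-WP conversion. The scaling constant is an assumption: it should be tuned against historical results, and 16 is chosen here so a 7-point favorite lands near the familiar ~72-73% win rate.

```python
def pregame_wp_from_spread(spread: float, scale: float = 16.0) -> float:
    """Pregame home win probability from the point spread.

    spread > 0 means the home team is favored. The scale constant
    is a calibration knob; fit it to your own historical data.
    """
    return 1.0 / (1.0 + 10 ** (-spread / scale))
```

For a pick'em game (spread 0) this returns exactly 0.5, and it is symmetric: a 7-point underdog gets one minus the favorite's probability.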

Code Patterns

Basic Win Probability Model

import pandas as pd
from sklearn.linear_model import LogisticRegression

def train_wp_model(plays: pd.DataFrame) -> LogisticRegression:
    """Train a basic WP model.

    Assumes 'possession' is already encoded numerically
    (e.g., 1 = home has the ball) and 'home_win' is 0/1.
    """
    features = ['score_diff', 'time_remaining', 'field_position',
                'down', 'distance', 'possession']

    X = plays[features].values
    y = plays['home_win'].values

    model = LogisticRegression(max_iter=1000)  # raise cap so the solver converges
    model.fit(X, y)
    return model

WPA Calculation

def calculate_wpa(plays: pd.DataFrame, wp_model,
                  features: list) -> pd.DataFrame:
    """Calculate WPA for each play, from the home team's perspective."""
    df = plays.copy()

    # WP before each play (probability of a home win)
    df['wp_before'] = wp_model.predict_proba(df[features])[:, 1]

    # WP after a play is the next play's WP before, within the same game
    df['wp_after'] = df.groupby('game_id')['wp_before'].shift(-1)

    # Game-ending plays: WP resolves to the actual outcome (0 or 1)
    df['wp_after'] = df['wp_after'].fillna(df['home_win'])

    # Calculate WPA
    df['wpa'] = df['wp_after'] - df['wp_before']

    return df

Calibration Check

import numpy as np
import pandas as pd

def check_calibration(predictions: np.ndarray,
                      outcomes: np.ndarray,
                      n_bins: int = 10) -> dict:
    """Check model calibration; ECE weights each bin by its size."""
    bins = np.linspace(0, 1, n_bins + 1)
    calibration = []

    for i in range(n_bins):
        # Include predictions equal to 1.0 in the final bin
        upper = bins[i + 1] if i < n_bins - 1 else 1.0 + 1e-9
        mask = (predictions >= bins[i]) & (predictions < upper)
        if mask.sum() > 0:
            pred_mean = predictions[mask].mean()
            actual_mean = outcomes[mask].mean()
            calibration.append({
                'bin': f'{bins[i]:.1f}-{bins[i+1]:.1f}',
                'count': int(mask.sum()),
                'predicted': pred_mean,
                'actual': actual_mean,
                'error': abs(pred_mean - actual_mean)
            })

    df = pd.DataFrame(calibration)
    # ECE = Σ (n_bin / n_total) × |predicted_avg - actual_avg|
    ece = (df['count'] / df['count'].sum() * df['error']).sum()

    return {'calibration': df, 'ece': ece}

Model Comparison

Logistic Regression

Pros                       | Cons
Interpretable coefficients | Limited non-linear capture
Fast training              | May underfit complex patterns
Good baseline              | Requires manual features

Gradient Boosting

Pros                              | Cons
Captures non-linear relationships | Less interpretable
Automatic feature interactions    | Risk of overfitting
Often best accuracy               | Slower training

Neural Network

Pros                       | Cons
Most flexible              | Hardest to interpret
Can learn complex patterns | Requires most data
End-to-end learning        | Training complexity

Fourth Down Decision Framework

Decision Options

  1. Go for it: Convert and keep the ball, or turn it over on downs
  2. Punt: Give up possession but push the opponent back
  3. Field Goal: Attempt three points, if in range

Expected WP Calculation

E[WP_go] = P(convert) × WP_convert + (1-P(convert)) × WP_fail

E[WP_fg] = P(make) × WP_make + (1-P(make)) × WP_miss

E[WP_punt] = WP_after_punt

Break-Even Analysis

Go for it when: E[WP_go] > E[WP_punt]

Break-even rate: P* where E[WP_go] = E[WP_punt]
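The expected-WP comparison and break-even rate above reduce to a few lines. The WP inputs would come from your WP model and the probabilities from historical conversion and kicking rates; the numbers in the usage note are hypothetical.

```python
def fourth_down_decision(p_convert, wp_convert, wp_fail,
                         p_make, wp_make, wp_miss, wp_punt):
    """Return the option with the highest expected WP, plus all three."""
    ev = {
        'go': p_convert * wp_convert + (1 - p_convert) * wp_fail,
        'fg': p_make * wp_make + (1 - p_make) * wp_miss,
        'punt': wp_punt,
    }
    return max(ev, key=ev.get), ev

def break_even_rate(wp_convert, wp_fail, wp_punt):
    """P* solving P*·WP_convert + (1-P*)·WP_fail = WP_punt."""
    return (wp_punt - wp_fail) / (wp_convert - wp_fail)
```

For example, with WP_convert = 0.60, WP_fail = 0.30, and WP_punt = 0.45 (made-up values), the break-even conversion rate is 50%: go for it whenever the offense converts more often than that.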

Calibration Guidelines

Interpreting Calibration Curves

Pattern              | Issue                     | Fix
Curve below diagonal | Overconfident             | Apply calibration
Curve above diagonal | Underconfident            | Apply calibration
S-curve deviation    | Non-linear miscalibration | Isotonic regression
Consistent offset    | Systematic bias           | Retrain or adjust

Calibration Methods

  1. Platt Scaling: Logistic regression on predictions
  2. Isotonic Regression: Non-parametric mapping
  3. Temperature Scaling: Divide logits by constant
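The first two methods can be sketched with scikit-learn. Both must be fit on held-out validation predictions, never the training set, or the correction will just memorize training error.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

def platt_scale(val_preds, val_outcomes):
    """Platt scaling: logistic regression on the raw predictions.
    Returns a function mapping raw WP to calibrated WP."""
    lr = LogisticRegression()
    lr.fit(np.asarray(val_preds).reshape(-1, 1), val_outcomes)
    return lambda p: lr.predict_proba(np.asarray(p).reshape(-1, 1))[:, 1]

def isotonic_scale(val_preds, val_outcomes):
    """Isotonic regression: monotone, non-parametric mapping."""
    iso = IsotonicRegression(out_of_bounds='clip')
    iso.fit(val_preds, val_outcomes)
    return iso.predict
```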

Common Pitfalls

1. Ignoring Time-Score Interaction

Wrong:

features = ['score_diff', 'time_remaining']  # No interaction

Right:

# Interaction term: the value of a lead depends on time remaining
df['score_time'] = df['score_diff'] * df['time_remaining']
features = ['score_diff', 'time_remaining', 'score_time']

2. Poor End-of-Game Handling

Wrong:

# Missing data for final plays
df['wp_after'] = df['wp_before'].shift(-1)  # NaN at end

Right:

# Shift within each game so the last play of one game
# doesn't borrow WP from the next game's first play
df['wp_after'] = df.groupby('game_id')['wp_before'].shift(-1)
df.loc[df['wp_after'].isna(), 'wp_after'] = df['home_win']

3. Ignoring Pregame Probability

Wrong:

# Start both teams at 50%
initial_wp = 0.5

Right:

# Incorporate team strength
initial_wp = calculate_pregame_wp(home_team, away_team)

Evaluation Checklist

Before Training

  • [ ] Clean play-by-play data
  • [ ] Define game state features
  • [ ] Determine outcome variable
  • [ ] Split train/validation/test

Model Training

  • [ ] Feature engineering complete
  • [ ] Cross-validation performed
  • [ ] Baseline comparison done
  • [ ] Hyperparameters tuned

Calibration

  • [ ] Calibration curve plotted
  • [ ] ECE calculated
  • [ ] Brier score measured
  • [ ] Calibration applied if needed

Validation

  • [ ] Out-of-sample testing
  • [ ] Extreme case analysis
  • [ ] Visual inspection of predictions
  • [ ] Domain expert review

Quick Reference Tables

Approximate WP by Situation

Score Diff | Time Remaining | WP (Leading Team)
+7         | 15:00 Q4       | ~75%
+7         | 5:00 Q4        | ~85%
+7         | 2:00 Q4        | ~90%
+14        | 10:00 Q4       | ~92%
+14        | 2:00 Q4        | ~98%

Typical WPA Values

Event                  | WPA Range
Touchdown (close game) | +0.15 to +0.30
Field goal             | +0.03 to +0.10
Interception           | -0.08 to -0.20
Fumble                 | -0.05 to -0.15
First down             | +0.01 to +0.05
Punt                   | -0.01 to -0.03

Next Steps

After mastering win probability models, proceed to:

  • Chapter 22: Machine Learning Applications
  • Chapter 23: Network Analysis in Football
  • Chapter 27: Building a Complete Analytics System