# Chapter 21: Key Takeaways - Win Probability Models

## Quick Reference Summary

This chapter covered building, calibrating, and applying win probability models for college football.

## Core Concepts

### Game State Features
| Feature | Description | Typical Range |
|---|---|---|
| Score Differential | Home - Away score | -35 to +35 |
| Time Remaining | Seconds left in game | 0 to 3600 |
| Field Position | Yard line (offense view) | 1 to 99 |
| Down | Current down | 1 to 4 |
| Distance | Yards to first down | 1 to 99 |
| Possession | Which team has ball | Home/Away |
| Timeouts | Remaining for each team | 0 to 3 |
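The features above have to be derived from raw play-by-play columns before modeling. A minimal sketch, assuming hypothetical raw column names (`home_score`, `quarter`, `clock_seconds`, `yardline`, `offense_home`) that will differ by data source:

```python
import pandas as pd

def build_game_state(pbp: pd.DataFrame) -> pd.DataFrame:
    """Derive the game-state features from raw play-by-play columns.

    Assumes hypothetical raw columns: home_score, away_score, quarter,
    clock_seconds (seconds left in the quarter), yardline (offense's
    perspective), down, distance, offense_home (bool).
    """
    df = pbp.copy()
    df['score_diff'] = df['home_score'] - df['away_score']
    # 4 quarters of 900 seconds: quarters still to play plus current clock
    df['time_remaining'] = (4 - df['quarter']) * 900 + df['clock_seconds']
    df['field_position'] = df['yardline']
    # Encode possession numerically: 1 = home has the ball
    df['possession'] = df['offense_home'].astype(int)
    return df[['score_diff', 'time_remaining', 'field_position',
               'down', 'distance', 'possession']]
```

Encoding possession as 0/1 keeps every feature numeric, which the sklearn models below require.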
### Key Metrics
| Metric | Formula | Good Value |
|---|---|---|
| Brier Score | Mean((pred - outcome)²) | < 0.20 |
| ECE | Weighted avg calibration error | < 0.05 |
| AUC | Area under ROC curve | > 0.80 |
| Log Loss | -Σ[y·log(p) + (1-y)·log(1-p)]/n | < 0.50 |
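All four metrics can be computed in a few lines; a sketch using sklearn's built-in scorers plus a direct ECE implementation matching the formula below:

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

def score_wp_model(pred: np.ndarray, outcome: np.ndarray,
                   n_bins: int = 10) -> dict:
    """Compute Brier score, ECE, AUC, and log loss for WP predictions."""
    # ECE: bin predictions, weight each bin's |pred - actual| gap
    # by the share of predictions falling in that bin
    bin_idx = np.clip((pred * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            ece += mask.mean() * abs(pred[mask].mean() - outcome[mask].mean())
    return {'brier': brier_score_loss(outcome, pred),
            'ece': ece,
            'auc': roc_auc_score(outcome, pred),
            'log_loss': log_loss(outcome, pred)}
```

Comparing each value against the "Good Value" column gives a quick health check on a fitted model.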
## Essential Formulas

### Logistic Win Probability

WP = 1 / (1 + exp(-z))

Where: z = β₀ + β₁·score_diff + β₂·time + β₃·field_pos + ...

### Win Probability Added

WPA = WP_after - WP_before

Total game WPA = Final WP (1 if the home team won, 0 if not) - Initial (pregame) WP, since per-play WPA telescopes across the game

### Expected Calibration Error

ECE = Σ (n_bin / n_total) × |predicted_avg - actual_avg|

### Pregame WP from Spread

WP = 1 / (1 + 10^(-spread / K))

Where K is a scale constant fit to historical results; K ≈ 17 matches the typical win rate of a touchdown favorite.

Example: 7-point favorite

WP = 1 / (1 + 10^(-7/17)) ≈ 0.72
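The spread-to-probability conversion is one line of code. Note the scale constant is an assumption to be fit to your own historical data: a divisor of 10 would map a 7-point favorite to roughly 0.83, so a value near 17 better matches the ~0.72 figure quoted for touchdown favorites.

```python
import math

def pregame_wp(spread: float, k: float = 17.0) -> float:
    """Pregame home win probability from the point spread.

    spread > 0 means the home team is favored. k is a tunable
    scale constant (assumed here; fit it to historical results).
    """
    return 1.0 / (1.0 + 10.0 ** (-spread / k))
```

By construction `pregame_wp(0)` is exactly 0.5, and the home and away probabilities for any spread sum to 1.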
## Code Patterns

### Basic Win Probability Model

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def train_wp_model(plays: pd.DataFrame) -> LogisticRegression:
    """Train a basic WP model on game-state features."""
    features = ['score_diff', 'time_remaining', 'field_position',
                'down', 'distance', 'possession']
    X = plays[features].values
    y = plays['home_win'].values
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model
```
### WPA Calculation

```python
def calculate_wpa(plays: pd.DataFrame, wp_model,
                  features: list) -> pd.DataFrame:
    """Calculate WPA for each play.

    `features` must be the same column list used to train wp_model.
    """
    df = plays.copy()
    # WP before each play, from the home team's perspective
    df['wp_before'] = wp_model.predict_proba(df[features])[:, 1]
    # WP after a play is the next play's WP before, within the same game
    df['wp_after'] = df.groupby('game_id')['wp_before'].shift(-1)
    # Game-ending plays resolve to the actual outcome (1 or 0)
    df['wp_after'] = df['wp_after'].fillna(df['home_win'])
    df['wpa'] = df['wp_after'] - df['wp_before']
    return df
```
### Calibration Check

```python
import numpy as np
import pandas as pd

def check_calibration(predictions: np.ndarray,
                      outcomes: np.ndarray,
                      n_bins: int = 10) -> dict:
    """Check model calibration and compute ECE."""
    bins = np.linspace(0, 1, n_bins + 1)
    calibration = []
    for i in range(n_bins):
        # Make the top bin inclusive so predictions of exactly 1.0 count
        hi = bins[i + 1] if i < n_bins - 1 else 1.0 + 1e-9
        mask = (predictions >= bins[i]) & (predictions < hi)
        if mask.sum() > 0:
            pred_mean = predictions[mask].mean()
            actual_mean = outcomes[mask].mean()
            calibration.append({
                'bin': f'{bins[i]:.1f}-{bins[i+1]:.1f}',
                'count': int(mask.sum()),
                'predicted': pred_mean,
                'actual': actual_mean,
                'error': abs(pred_mean - actual_mean)
            })
    df = pd.DataFrame(calibration)
    # ECE weights each bin's error by its share of predictions,
    # matching the formula above
    ece = (df['error'] * df['count']).sum() / df['count'].sum()
    return {'calibration': df, 'ece': ece}
```
## Model Comparison

### Logistic Regression
| Pros | Cons |
|---|---|
| Interpretable coefficients | Limited non-linear capture |
| Fast training | May underfit complex patterns |
| Good baseline | Requires manual features |
### Gradient Boosting
| Pros | Cons |
|---|---|
| Captures non-linear relationships | Less interpretable |
| Automatic feature interactions | Risk of overfitting |
| Often best accuracy | Slower training |
### Neural Network
| Pros | Cons |
|---|---|
| Most flexible | Hardest to interpret |
| Can learn complex patterns | Requires most data |
| End-to-end learning | Training complexity |
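The logistic-versus-boosting trade-off is easy to measure empirically: fit both on the same split and compare held-out Brier scores. A minimal sketch (the data and split here are illustrative, not from the chapter):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

def compare_wp_models(X: np.ndarray, y: np.ndarray) -> dict:
    """Fit both baseline model families; return held-out Brier scores."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=0)
    models = {
        'logistic': LogisticRegression(max_iter=1000),
        'gbm': GradientBoostingClassifier(random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        # Brier score on probabilities, not hard class predictions
        p = model.predict_proba(X_te)[:, 1]
        scores[name] = brier_score_loss(y_te, p)
    return scores
```

On real play-by-play data, expect the boosted model to win on raw accuracy while the logistic model stays easier to explain.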
## Fourth Down Decision Framework

### Decision Options
- Go for it: Convert or turnover on downs
- Punt: Better field position for opponent
- Field Goal: If in range
### Expected WP Calculation
E[WP_go] = P(convert) × WP_convert + (1-P(convert)) × WP_fail
E[WP_fg] = P(make) × WP_make + (1-P(make)) × WP_miss
E[WP_punt] = WP_after_punt
### Break-Even Analysis

Go for it when: E[WP_go] > E[WP_punt]

Break-even rate: P* such that E[WP_go] = E[WP_punt], which solves to P* = (WP_punt - WP_fail) / (WP_convert - WP_fail)
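The expected-value comparison above translates directly into code. The WP inputs in the usage example are hypothetical placeholders; in practice each comes from the fitted WP model evaluated at the resulting game state:

```python
def fourth_down_decision(p_convert, wp_convert, wp_fail,
                         p_make, wp_make, wp_miss,
                         wp_punt):
    """Expected WP for each fourth-down option, per the formulas above.

    Returns (expected values, best option, break-even conversion rate).
    """
    ev = {
        'go': p_convert * wp_convert + (1 - p_convert) * wp_fail,
        'field_goal': p_make * wp_make + (1 - p_make) * wp_miss,
        'punt': wp_punt,
    }
    best = max(ev, key=ev.get)
    # Conversion rate at which going for it exactly ties punting
    breakeven = (wp_punt - wp_fail) / (wp_convert - wp_fail)
    return ev, best, breakeven
```

For example, with a 50% conversion chance and the (made-up) states WP_convert = 0.60, WP_fail = 0.30, WP_punt = 0.42, the break-even rate is 40%, so going for it beats punting.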
## Calibration Guidelines

### Interpreting Calibration Curves
| Pattern | Issue | Fix |
|---|---|---|
| Curve below diagonal | Overconfident | Apply calibration |
| Curve above diagonal | Underconfident | Apply calibration |
| S-curve deviation | Non-linear miscalibration | Isotonic regression |
| Consistent offset | Systematic bias | Retrain or adjust |
### Calibration Methods
- Platt Scaling: Logistic regression on predictions
- Isotonic Regression: Non-parametric mapping
- Temperature Scaling: Divide logits by constant
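The first two methods can be fit with stock sklearn components. A hedged sketch: both calibrators should be fit on held-out predictions, never on the training set, and the log-odds transform used for Platt scaling is one common convention rather than the only one:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

def fit_calibrators(raw_wp: np.ndarray, outcomes: np.ndarray):
    """Fit Platt scaling and isotonic regression on held-out predictions.

    Returns two callables mapping raw probabilities to calibrated ones.
    """
    eps = 1e-6

    def to_logit(p):
        p = np.clip(p, eps, 1 - eps)
        return np.log(p / (1 - p))

    # Platt scaling: logistic regression on the raw prediction,
    # here on the log-odds scale
    platt = LogisticRegression().fit(to_logit(raw_wp).reshape(-1, 1), outcomes)
    # Isotonic regression: monotone, non-parametric remapping
    iso = IsotonicRegression(out_of_bounds='clip').fit(raw_wp, outcomes)

    def platt_fn(p):
        return platt.predict_proba(to_logit(p).reshape(-1, 1))[:, 1]

    return platt_fn, iso.predict
```

Platt scaling is the safer choice with little calibration data; isotonic regression needs more observations but can fix the S-curve deviations noted in the table above.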
## Common Pitfalls

### 1. Ignoring Time-Score Interaction

Wrong:

```python
features = ['score_diff', 'time_remaining']  # No interaction
```

Right:

```python
df['score_time'] = df['score_diff'] * df['time_remaining']
features = ['score_diff', 'time_remaining', 'score_time']
```
### 2. Poor End-of-Game Handling

Wrong:

```python
# Ungrouped shift leaks WP across game boundaries
# and leaves NaN on each game's final play
df['wp_after'] = df['wp_before'].shift(-1)
```

Right:

```python
df['wp_after'] = df.groupby('game_id')['wp_before'].shift(-1)
df['wp_after'] = df['wp_after'].fillna(df['home_win'])
```
### 3. Ignoring Pregame Probability

Wrong:

```python
# Start both teams at 50%
initial_wp = 0.5
```

Right:

```python
# Incorporate team strength
initial_wp = calculate_pregame_wp(home_team, away_team)
```
## Evaluation Checklist

### Before Training
- [ ] Clean play-by-play data
- [ ] Define game state features
- [ ] Determine outcome variable
- [ ] Split train/validation/test
### Model Training
- [ ] Feature engineering complete
- [ ] Cross-validation performed
- [ ] Baseline comparison done
- [ ] Hyperparameters tuned
### Calibration
- [ ] Calibration curve plotted
- [ ] ECE calculated
- [ ] Brier score measured
- [ ] Calibration applied if needed
### Validation
- [ ] Out-of-sample testing
- [ ] Extreme case analysis
- [ ] Visual inspection of predictions
- [ ] Domain expert review
## Quick Reference Tables

### Approximate WP by Situation
| Score Diff | Time Remaining | WP (Leading Team) |
|---|---|---|
| +7 | 15:00 Q4 | ~75% |
| +7 | 5:00 Q4 | ~85% |
| +7 | 2:00 Q4 | ~90% |
| +14 | 10:00 Q4 | ~92% |
| +14 | 2:00 Q4 | ~98% |
### Typical WPA Values
| Event | WPA Range |
|---|---|
| Touchdown (close game) | +0.15 to +0.30 |
| Field goal | +0.03 to +0.10 |
| Interception | -0.08 to -0.20 |
| Fumble | -0.05 to -0.15 |
| First down | +0.01 to +0.05 |
| Punt | -0.01 to -0.03 |
## Next Steps

After mastering win probability models, proceed to:

- Chapter 22: Machine Learning Applications
- Chapter 23: Network Analysis in Football
- Chapter 27: Building a Complete Analytics System