# Chapter 21: Key Takeaways - Win Probability Models

## Quick Reference Summary

This chapter covered building, calibrating, and applying win probability models for college football.

## Core Concepts

### Game State Features
| Feature | Description | Typical Range |
|---|---|---|
| Score Differential | Home - Away score | -35 to +35 |
| Time Remaining | Seconds left in game | 0 to 3600 |
| Field Position | Yard line (offense view) | 1 to 99 |
| Down | Current down | 1 to 4 |
| Distance | Yards to first down | 1 to 99 |
| Possession | Which team has ball | Home/Away |
| Timeouts | Remaining for each team | 0 to 3 |
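The features above have to be derived from raw play-by-play columns before modeling. A minimal sketch, assuming hypothetical raw column names (`home_score`, `quarter`, `clock_seconds`, `yardline`, `offense_home`) that will differ by data source:

```python
import pandas as pd

def build_game_state(pbp: pd.DataFrame) -> pd.DataFrame:
    """Derive the game-state features from raw play-by-play columns.

    Assumes hypothetical raw columns: home_score, away_score, quarter,
    clock_seconds (seconds left in the quarter), yardline (offense's
    perspective), down, distance, offense_home (bool).
    """
    df = pbp.copy()
    df['score_diff'] = df['home_score'] - df['away_score']
    # 4 quarters of 900 seconds: quarters still to play plus current clock
    df['time_remaining'] = (4 - df['quarter']) * 900 + df['clock_seconds']
    df['field_position'] = df['yardline']
    # Encode possession numerically: 1 = home has the ball
    df['possession'] = df['offense_home'].astype(int)
    return df[['score_diff', 'time_remaining', 'field_position',
               'down', 'distance', 'possession']]
```

Encoding possession as 0/1 keeps every feature numeric, which the sklearn models below require.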
### Key Metrics
| Metric | Formula | Good Value |
|---|---|---|
| Brier Score | Mean((pred - outcome)²) | < 0.20 |
| ECE | Weighted avg calibration error | < 0.05 |
| AUC | Area under ROC curve | > 0.80 |
| Log Loss | -Σ[y·log(p) + (1-y)·log(1-p)]/n | < 0.50 |
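All four metrics can be computed in a few lines; a sketch using sklearn's built-in scorers plus a direct ECE implementation matching the formula below:

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

def score_wp_model(pred: np.ndarray, outcome: np.ndarray,
                   n_bins: int = 10) -> dict:
    """Compute Brier score, ECE, AUC, and log loss for WP predictions."""
    # ECE: bin predictions, weight each bin's |pred - actual| gap
    # by the share of predictions falling in that bin
    bin_idx = np.clip((pred * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            ece += mask.mean() * abs(pred[mask].mean() - outcome[mask].mean())
    return {'brier': brier_score_loss(outcome, pred),
            'ece': ece,
            'auc': roc_auc_score(outcome, pred),
            'log_loss': log_loss(outcome, pred)}
```

Comparing each value against the "Good Value" column gives a quick health check on a fitted model.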
## Essential Formulas

### Logistic Win Probability

WP = 1 / (1 + exp(-z))

Where: z = β₀ + β₁·score_diff + β₂·time + β₃·field_pos + ...

### Win Probability Added

WPA = WP_after - WP_before

Total game WPA = Final WP (1 if the home team won, 0 if not) - Initial (pregame) WP, since per-play WPA telescopes across the game

### Expected Calibration Error

ECE = Σ (n_bin / n_total) × |predicted_avg - actual_avg|

### Pregame WP from Spread

WP = 1 / (1 + 10^(-spread / K))

Where K is a scale constant fit to historical results; K ≈ 17 matches the typical win rate of a touchdown favorite.

Example: 7-point favorite

WP = 1 / (1 + 10^(-7/17)) ≈ 0.72
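The spread-to-probability conversion is one line of code. Note the scale constant is an assumption to be fit to your own historical data: a divisor of 10 would map a 7-point favorite to roughly 0.83, so a value near 17 better matches the ~0.72 figure quoted for touchdown favorites.

```python
import math

def pregame_wp(spread: float, k: float = 17.0) -> float:
    """Pregame home win probability from the point spread.

    spread > 0 means the home team is favored. k is a tunable
    scale constant (assumed here; fit it to historical results).
    """
    return 1.0 / (1.0 + 10.0 ** (-spread / k))
```

By construction `pregame_wp(0)` is exactly 0.5, and the home and away probabilities for any spread sum to 1.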
## Code Patterns

### Basic Win Probability Model

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def train_wp_model(plays: pd.DataFrame) -> LogisticRegression:
    """Train a basic WP model on game-state features."""
    features = ['score_diff', 'time_remaining', 'field_position',
                'down', 'distance', 'possession']
    X = plays[features].values
    y = plays['home_win'].values
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model
```
### WPA Calculation

```python
def calculate_wpa(plays: pd.DataFrame, wp_model,
                  features: list) -> pd.DataFrame:
    """Calculate WPA for each play.

    `features` must be the same column list used to train wp_model.
    """
    df = plays.copy()
    # WP before each play, from the home team's perspective
    df['wp_before'] = wp_model.predict_proba(df[features])[:, 1]
    # WP after a play is the next play's WP before, within the same game
    df['wp_after'] = df.groupby('game_id')['wp_before'].shift(-1)
    # Game-ending plays resolve to the actual outcome (1 or 0)
    df['wp_after'] = df['wp_after'].fillna(df['home_win'])
    df['wpa'] = df['wp_after'] - df['wp_before']
    return df
```
### Calibration Check

```python
import numpy as np
import pandas as pd

def check_calibration(predictions: np.ndarray,
                      outcomes: np.ndarray,
                      n_bins: int = 10) -> dict:
    """Check model calibration and compute ECE."""
    bins = np.linspace(0, 1, n_bins + 1)
    calibration = []
    for i in range(n_bins):
        # Make the top bin inclusive so predictions of exactly 1.0 count
        hi = bins[i + 1] if i < n_bins - 1 else 1.0 + 1e-9
        mask = (predictions >= bins[i]) & (predictions < hi)
        if mask.sum() > 0:
            pred_mean = predictions[mask].mean()
            actual_mean = outcomes[mask].mean()
            calibration.append({
                'bin': f'{bins[i]:.1f}-{bins[i+1]:.1f}',
                'count': int(mask.sum()),
                'predicted': pred_mean,
                'actual': actual_mean,
                'error': abs(pred_mean - actual_mean)
            })
    df = pd.DataFrame(calibration)
    # ECE weights each bin's error by its share of predictions,
    # matching the formula above
    ece = (df['error'] * df['count']).sum() / df['count'].sum()
    return {'calibration': df, 'ece': ece}
```
## Model Comparison

### Logistic Regression
| Pros | Cons |
|---|---|
| Interpretable coefficients | Limited non-linear capture |
| Fast training | May underfit complex patterns |
| Good baseline | Requires manual features |
### Gradient Boosting
| Pros | Cons |
|---|---|
| Captures non-linear relationships | Less interpretable |
| Automatic feature interactions | Risk of overfitting |
| Often best accuracy | Slower training |
### Neural Network
| Pros | Cons |
|---|---|
| Most flexible | Hardest to interpret |
| Can learn complex patterns | Requires most data |
| End-to-end learning | Training complexity |
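The logistic-versus-boosting trade-off is easy to measure empirically: fit both on the same split and compare held-out Brier scores. A minimal sketch (the data and split here are illustrative, not from the chapter):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

def compare_wp_models(X: np.ndarray, y: np.ndarray) -> dict:
    """Fit both baseline model families; return held-out Brier scores."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=0)
    models = {
        'logistic': LogisticRegression(max_iter=1000),
        'gbm': GradientBoostingClassifier(random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        # Brier score on probabilities, not hard class predictions
        p = model.predict_proba(X_te)[:, 1]
        scores[name] = brier_score_loss(y_te, p)
    return scores
```

On real play-by-play data, expect the boosted model to win on raw accuracy while the logistic model stays easier to explain.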
## Fourth Down Decision Framework

### Decision Options
- Go for it: Convert or turnover on downs
- Punt: Better field position for opponent
- Field Goal: If in range
### Expected WP Calculation
E[WP_go] = P(convert) × WP_convert + (1-P(convert)) × WP_fail
E[WP_fg] = P(make) × WP_make + (1-P(make)) × WP_miss
E[WP_punt] = WP_after_punt
### Break-Even Analysis

Go for it when: E[WP_go] > E[WP_punt]

Break-even rate: P* such that E[WP_go] = E[WP_punt], which solves to P* = (WP_punt - WP_fail) / (WP_convert - WP_fail)
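The expected-value comparison above translates directly into code. The WP inputs in the usage example are hypothetical placeholders; in practice each comes from the fitted WP model evaluated at the resulting game state:

```python
def fourth_down_decision(p_convert, wp_convert, wp_fail,
                         p_make, wp_make, wp_miss,
                         wp_punt):
    """Expected WP for each fourth-down option, per the formulas above.

    Returns (expected values, best option, break-even conversion rate).
    """
    ev = {
        'go': p_convert * wp_convert + (1 - p_convert) * wp_fail,
        'field_goal': p_make * wp_make + (1 - p_make) * wp_miss,
        'punt': wp_punt,
    }
    best = max(ev, key=ev.get)
    # Conversion rate at which going for it exactly ties punting
    breakeven = (wp_punt - wp_fail) / (wp_convert - wp_fail)
    return ev, best, breakeven
```

For example, with a 50% conversion chance and the (made-up) states WP_convert = 0.60, WP_fail = 0.30, WP_punt = 0.42, the break-even rate is 40%, so going for it beats punting.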
## Calibration Guidelines

### Interpreting Calibration Curves
| Pattern | Issue | Fix |
|---|---|---|
| Curve below diagonal | Overconfident | Apply calibration |
| Curve above diagonal | Underconfident | Apply calibration |
| S-curve deviation | Non-linear miscalibration | Isotonic regression |
| Consistent offset | Systematic bias | Retrain or adjust |
### Calibration Methods
- Platt Scaling: Logistic regression on predictions
- Isotonic Regression: Non-parametric mapping
- Temperature Scaling: Divide logits by constant
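The first two methods can be fit with stock sklearn components. A hedged sketch: both calibrators should be fit on held-out predictions, never on the training set, and the log-odds transform used for Platt scaling is one common convention rather than the only one:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

def fit_calibrators(raw_wp: np.ndarray, outcomes: np.ndarray):
    """Fit Platt scaling and isotonic regression on held-out predictions.

    Returns two callables mapping raw probabilities to calibrated ones.
    """
    eps = 1e-6

    def to_logit(p):
        p = np.clip(p, eps, 1 - eps)
        return np.log(p / (1 - p))

    # Platt scaling: logistic regression on the raw prediction,
    # here on the log-odds scale
    platt = LogisticRegression().fit(to_logit(raw_wp).reshape(-1, 1), outcomes)
    # Isotonic regression: monotone, non-parametric remapping
    iso = IsotonicRegression(out_of_bounds='clip').fit(raw_wp, outcomes)

    def platt_fn(p):
        return platt.predict_proba(to_logit(p).reshape(-1, 1))[:, 1]

    return platt_fn, iso.predict
```

Platt scaling is the safer choice with little calibration data; isotonic regression needs more observations but can fix the S-curve deviations noted in the table above.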
## Common Pitfalls

### 1. Ignoring Time-Score Interaction

Wrong:

```python
features = ['score_diff', 'time_remaining']  # No interaction
```

Right:

```python
df['score_time'] = df['score_diff'] * df['time_remaining']
features = ['score_diff', 'time_remaining', 'score_time']
```
### 2. Poor End-of-Game Handling

Wrong:

```python
# Ungrouped shift leaks WP across game boundaries
# and leaves NaN on each game's final play
df['wp_after'] = df['wp_before'].shift(-1)
```

Right:

```python
df['wp_after'] = df.groupby('game_id')['wp_before'].shift(-1)
df['wp_after'] = df['wp_after'].fillna(df['home_win'])
```
### 3. Ignoring Pregame Probability

Wrong:

```python
# Start both teams at 50%
initial_wp = 0.5
```

Right:

```python
# Incorporate team strength
initial_wp = calculate_pregame_wp(home_team, away_team)
```
## Evaluation Checklist

### Before Training
- [ ] Clean play-by-play data
- [ ] Define game state features
- [ ] Determine outcome variable
- [ ] Split train/validation/test
### Model Training
- [ ] Feature engineering complete
- [ ] Cross-validation performed
- [ ] Baseline comparison done
- [ ] Hyperparameters tuned
### Calibration
- [ ] Calibration curve plotted
- [ ] ECE calculated
- [ ] Brier score measured
- [ ] Calibration applied if needed
### Validation
- [ ] Out-of-sample testing
- [ ] Extreme case analysis
- [ ] Visual inspection of predictions
- [ ] Domain expert review
## Quick Reference Tables

### Approximate WP by Situation
| Score Diff | Time Remaining | WP (Leading Team) |
|---|---|---|
| +7 | 15:00 Q4 | ~75% |
| +7 | 5:00 Q4 | ~85% |
| +7 | 2:00 Q4 | ~90% |
| +14 | 10:00 Q4 | ~92% |
| +14 | 2:00 Q4 | ~98% |
### Typical WPA Values
| Event | WPA Range |
|---|---|
| Touchdown (close game) | +0.15 to +0.30 |
| Field goal | +0.03 to +0.10 |
| Interception | -0.08 to -0.20 |
| Fumble | -0.05 to -0.15 |
| First down | +0.01 to +0.05 |
| Punt | -0.01 to -0.03 |
## Next Steps

After mastering win probability models, proceed to:

- Chapter 22: Machine Learning Applications
- Chapter 23: Network Analysis in Football
- Chapter 27: Building a Complete Analytics System