Chapter 21: Key Takeaways - In-Game Win Probability

Core Concepts Summary

1. Win Probability Definition

  • Definition: The probability that a team wins given the current game state
  • Key Inputs: Score differential, time remaining, possession, team strength, home court
  • Range: 0.0 (certain loss) to 1.0 (certain win)
  • Starting Point: Typically ~0.50-0.55 for home team at tip-off

2. Win Probability Added (WPA)

  • Definition: Change in win probability caused by a single play
  • Formula: WPA = WP_after - WP_before
  • Range: Typically -0.15 to +0.15 per play (higher in clutch)
  • Team WPA: Sum equals final WP - initial WP (~0.5 for winners)

3. Leverage Index (LI)

  • Definition: Importance of current game situation relative to average
  • Formula: LI = Expected WP swing / Average expected WP swing
  • Baseline: LI = 1.0 is average importance
  • High Leverage: LI > 3.0 indicates critical moments

4. Model Calibration

  • Definition: When predicted probabilities match actual outcomes
  • Well-Calibrated: Situations predicted at 70% are actually won ~70% of the time
  • Brier Score: Measures combined calibration and discrimination (0-1, lower is better)
  • ECE: Expected Calibration Error measures average calibration gap

Essential Formulas

Win Probability (Simplified Normal CDF Model)

WP = Phi(effective_lead / sqrt(variance * time_remaining))

Where:
- effective_lead = score_diff + possession_value + home_advantage
- possession_value = ~1.0-1.1 points
- home_advantage = ~3.0-3.5 points
- variance = ~0.068 points^2 per second per team (scoring-margin variance rate)
- Phi = standard normal cumulative distribution function
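A minimal sketch of this model in Python. The default parameter values are the chapter's rough estimates above, not fitted constants:

```python
import math

def win_probability(score_diff, seconds_remaining, has_possession=False,
                    is_home=False, poss_value=1.0, home_adv=3.0,
                    variance_rate=0.068):
    """Simplified normal-CDF win probability from the home team's view."""
    if seconds_remaining <= 0:
        # Game over: probabilities are degenerate
        return 1.0 if score_diff > 0 else (0.5 if score_diff == 0 else 0.0)
    effective_lead = (score_diff
                      + (poss_value if has_possession else 0.0)
                      + (home_adv if is_home else 0.0))
    sd = math.sqrt(variance_rate * seconds_remaining)
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * (1.0 + math.erf(effective_lead / sd / math.sqrt(2)))
```

With these defaults, a tied game with no modifiers returns exactly 0.5, and the same lead is worth more as the clock runs down.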

Seconds Remaining Calculation

For regulation (quarters 1-4):
seconds_remaining = (4 - period) * 720 + clock_seconds

Where:
- period = current quarter (1-4)
- clock_seconds = minutes * 60 + seconds on game clock
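The same calculation as a small helper (NBA regulation: four 720-second quarters):

```python
def seconds_remaining(period, clock_minutes, clock_seconds):
    """Seconds left in regulation (NBA: four 12-minute quarters)."""
    clock = clock_minutes * 60 + clock_seconds
    return (4 - period) * 720 + clock
```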

Brier Score

Brier Score = (1/n) * sum((predicted_i - actual_i)^2)

Where:
- predicted_i = predicted win probability for situation i
- actual_i = actual outcome (1 if win, 0 if loss)
- n = number of predictions

Interpretation:
- 0.00 = perfect predictions
- 0.25 = always predicting 0.5 (baseline)
- Lower is better
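The formula translates directly to code:

```python
def brier_score(predicted, actual):
    """Mean squared difference between predicted WP and the 0/1 outcome."""
    pairs = list(zip(predicted, actual))
    return sum((p - a) ** 2 for p, a in pairs) / len(pairs)
```

Always predicting 0.5 yields exactly the 0.25 baseline regardless of outcomes.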

Win Probability Added

WPA = WP_after - WP_before

For a game winner:
Total Team WPA = 1.0 - initial_WP = ~0.5 (starting from neutral)

For individual play:
WPA accounts for context (time, score, leverage)
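The telescoping property is easy to verify in code: summing per-play WPA over a game's WP trajectory recovers final WP minus initial WP.

```python
def wpa(wp_before, wp_after):
    """Win Probability Added by a single play."""
    return wp_after - wp_before

def team_wpa(wp_trajectory):
    """Sum of per-play WPA; telescopes to final WP minus initial WP."""
    return sum(wpa(a, b) for a, b in zip(wp_trajectory, wp_trajectory[1:]))
```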

Leverage Index

LI = Expected WP swing in situation / Average expected WP swing

Rules of thumb:
- LI < 0.5: Low leverage (early game, blowout)
- LI = 1.0: Average leverage
- LI = 2-3: High leverage (close game, late)
- LI > 5: Extreme leverage (clutch moments)
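One way to make the LI definition concrete: estimate the expected WP swing of the next possession under the simplified normal-CDF model and divide by an assumed league-average swing. The outcome probabilities, possession length, and `avg_swing` below are illustrative placeholders, not fitted values.

```python
import math

def _wp(score_diff, secs, var=0.068):
    # Simplified normal-CDF win probability (no possession/home terms)
    if secs <= 0:
        return 1.0 if score_diff > 0 else (0.5 if score_diff == 0 else 0.0)
    return 0.5 * (1.0 + math.erf(score_diff / math.sqrt(var * secs) / math.sqrt(2)))

def leverage_index(score_diff, secs, avg_swing=0.05, poss_len=15):
    """Rough LI: expected |WP change| over stylized next-possession outcomes."""
    wp_now = _wp(score_diff, secs)
    next_secs = max(secs - poss_len, 0)
    outcomes = [(2, 0.35), (3, 0.12), (0, 0.53)]  # (points scored, rough probability)
    swing = sum(p * abs(_wp(score_diff + pts, next_secs) - wp_now)
                for pts, p in outcomes)
    return swing / avg_swing
```

Under these assumptions a tied game with 30 seconds left lands roughly in the 4-6 range, while a 20-point fourth-quarter lead sits near zero, in line with the rules of thumb above.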

Logistic Regression Model

WP = 1 / (1 + exp(-z))

Where:
z = beta_0 + beta_1*score_diff + beta_2*f(time) + beta_3*possession + ...

Common time transformations:
- log(seconds_remaining + 1)
- sqrt(seconds_remaining)
- seconds_remaining / 2880 (normalized)
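A sketch of fitting such a model with scikit-learn. The synthetic data generator, feature set, and coefficients are illustrative assumptions, not the chapter's dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
score_diff = rng.normal(0, 8, n)      # home score differential
secs = rng.uniform(1, 2880, n)        # seconds remaining
possession = rng.choice([-1, 1], n)   # -1 away ball, +1 home ball

X = np.column_stack([
    score_diff,
    np.log(secs + 1),                 # log time transform
    score_diff / np.sqrt(secs),       # score-time interaction
    possession,
])

# Synthetic labels: a bigger, later lead wins more often
true_z = 0.15 * score_diff + 1.2 * score_diff / np.sqrt(secs) + 0.05 * possession
y = (rng.random(n) < 1 / (1 + np.exp(-true_z))).astype(int)

model = LogisticRegression(C=1.0)     # L2-regularized by default
model.fit(X, y)
wp = model.predict_proba(X)[:, 1]     # win probabilities in [0, 1]
```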

Expected Calibration Error

ECE = sum over bins b of (n_b / N) * |accuracy_b - confidence_b|

Where:
- n_b = samples in bin b
- N = total samples
- accuracy_b = actual win rate in bin b
- confidence_b = average predicted probability in bin b
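A direct implementation of this formula with equal-width bins:

```python
import numpy as np

def expected_calibration_error(predicted, actual, n_bins=10):
    """ECE: sample-weighted average gap between bin confidence and accuracy."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    N = len(predicted)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Last bin is closed on the right so predictions of 1.0 are counted
        mask = (predicted >= lo) & ((predicted < hi) if hi < 1.0 else (predicted <= hi))
        n_b = mask.sum()
        if n_b == 0:
            continue
        accuracy_b = actual[mask].mean()
        confidence_b = predicted[mask].mean()
        ece += (n_b / N) * abs(accuracy_b - confidence_b)
    return ece
```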

Implementation Checklist

Building a Win Probability Model

  • [ ] Data Collection

  • [ ] Gather play-by-play data (1000+ games recommended)
  • [ ] Include: game_id, period, clock, scores, events
  • [ ] Calculate seconds remaining
  • [ ] Determine possession for each event
  • [ ] Create binary outcome variable (home_win)

  • [ ] Feature Engineering

  • [ ] Score differential (primary feature)
  • [ ] Time remaining transformations (log, sqrt)
  • [ ] Possession indicator (-1, 0, 1)
  • [ ] Score-time interaction terms
  • [ ] Quarter/period indicators
  • [ ] Clutch situation flag
  • [ ] Optional: team strength adjustment

  • [ ] Model Training

  • [ ] Choose algorithm (logistic regression recommended for interpretability)
  • [ ] Use time-series cross-validation
  • [ ] Apply regularization (L2) to prevent overfitting
  • [ ] Calculate Brier score on held-out data

  • [ ] Calibration

  • [ ] Create calibration curve (predicted vs actual)
  • [ ] Calculate Expected Calibration Error
  • [ ] Apply Platt scaling if needed
  • [ ] Verify calibration across all probability bins

  • [ ] Validation

  • [ ] Temporal validation (train on past, test on future)
  • [ ] Stratified evaluation (by quarter, score differential)
  • [ ] Compare to baseline models
  • [ ] Check edge cases (overtime, large leads)

  • [ ] Deployment

  • [ ] Build prediction API
  • [ ] Handle real-time updates
  • [ ] Monitor calibration drift
  • [ ] Document model assumptions
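The time-series cross-validation step in the checklist can be sketched as follows; the synthetic features and labels stand in for real play-by-play rows, which must be in chronological order.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(1)
X = rng.normal(0, 1, (2000, 3))   # rows assumed chronologically ordered
y = (rng.random(2000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)

scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    # Each fold trains only on earlier rows and tests on later ones,
    # so no future information leaks into training.
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    probs = model.predict_proba(X[test_idx])[:, 1]
    scores.append(brier_score_loss(y[test_idx], probs))

mean_brier = float(np.mean(scores))   # should beat the 0.25 baseline
```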

Common Pitfalls to Avoid

1. Data Leakage

Problem: Using future information to predict the current state
Solution: Use time-series cross-validation; never use the game outcome as a feature

2. Ignoring Calibration

Problem: Model has good accuracy but poor probability estimates
Solution: Always evaluate calibration; apply post-hoc calibration techniques

3. Overcomplicating Features

Problem: Too many features lead to overfitting
Solution: Start simple (score, time, possession); add features incrementally

4. Misinterpreting WPA

Problem: Using WPA to predict future performance
Solution: WPA describes past impact; use for narrative, not projection

5. Ignoring Context in Leverage

Problem: Treating all high-WPA plays as equally skillful
Solution: Normalize by leverage index; consider shot difficulty

6. Small Sample Overconfidence

Problem: Drawing conclusions from single games or few plays
Solution: Report confidence intervals; aggregate over many observations


Quick Reference Tables

Benchmark Brier Scores

Model                   Brier Score   Quality
Perfect                 0.000         Ideal (impossible)
Strong Model            0.10-0.15     Excellent
Good Model              0.15-0.20     Good
Weak Model              0.20-0.25     Marginal
Baseline (always 0.5)   0.250         Poor
Random                  0.333         Worthless

Leverage Index Reference

Situation                 Approximate LI
Start of game             0.5-0.8
Up 10, start of Q2        0.6-0.9
Tied, end of Q3           1.5-2.0
Down 3, 2 min left        2.5-3.5
Tied, 30 sec left         4.0-6.0
Tied, final possession    5.0-8.0
Up 20, Q4                 0.1-0.3

Win Probability Guidelines

Game State                 Approximate Home WP
Tip-off                    52-55%
Up 5 at half               72-75%
Up 10 at half              85-88%
Up 15 at half              92-95%
Down 5, 5 min left         25-30%
Down 10, 5 min left        8-12%
Up 3, 30 sec left, ball    90-95%
Tied, 10 sec left, ball    55-60%

Feature Importance (Typical)

Feature                    Relative Importance
Score differential         1.00 (baseline)
Time remaining             0.60-0.70
Score x Time interaction   0.30-0.40
Possession                 0.15-0.25
Home court                 0.10-0.15
Team strength              0.10-0.20

Decision Frameworks

Framework 1: Interpreting Win Probability

1. Check current WP estimate
2. Compare to pre-game expectations
3. Identify key swings (when did WP change most?)
4. Calculate improbability if underdog wins
5. Context: Is this a meaningful probability shift?

Framework 2: Evaluating Player WPA

1. Calculate total WPA for game/season
2. Normalize by possessions played
3. Separate by leverage tier (high vs low)
4. Compare to expected WPA given opportunities
5. Caveat: WPA is descriptive, not predictive

Framework 3: Model Calibration Check

1. Bin predictions into 10 groups (0-10%, 10-20%, etc.)
2. Calculate actual win rate in each bin
3. Plot predicted vs actual (should be diagonal)
4. Identify over/under-confident regions
5. Apply recalibration if needed
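Steps 1-4 of this framework can be sketched with scikit-learn's calibration_curve; the synthetic, deliberately overconfident predictions below are illustrative only.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(2)
true_p = rng.uniform(0, 1, 5000)                       # true win probabilities
outcomes = (rng.random(5000) < true_p).astype(int)     # simulated results
predicted = np.clip(0.5 + 1.3 * (true_p - 0.5), 0, 1)  # overconfident model

# Bin predictions into 10 groups; compare actual win rate to bin confidence
actual_rate, predicted_avg = calibration_curve(outcomes, predicted, n_bins=10)
# A well-calibrated model tracks the diagonal; here the extreme bins will
# show predicted_avg further from 0.5 than actual_rate (overconfidence).
```

Platt scaling or isotonic regression can then pull the off-diagonal bins back toward the diagonal.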

Framework 4: Building Real-Time WP System

1. Train model on historical data
2. Set up data pipeline for live game feed
3. Calculate WP after each play
4. Store WP trajectory for visualization
5. Calculate WPA for significant events
6. Monitor for calibration drift

Key Insights Summary

  1. Score differential is king: Accounts for 50-60% of model predictive power

  2. Time transforms matter: Log and sqrt transformations capture non-linear decay

  3. Possession is worth ~1 point: Important to include, especially late game

  4. Calibration trumps accuracy: Well-calibrated 70% is better than miscalibrated 75%

  5. Leverage varies 100x: From 0.1 in blowouts to 10+ in crunch time

  6. WPA is retrospective: Great for storytelling, poor for prediction

  7. Single games have huge variance: Even 90% WP situations lose 10% of the time

  8. Simple models often win: Logistic regression competes with complex ML

  9. Temporal validation is essential: Prevents data leakage and overfit

  10. Context always matters: Same WP swing means different things in different situations


Application Scenarios

Scenario 1: Broadcasting Win Probability

  1. Build model with ~150ms inference time
  2. Display WP after each possession
  3. Highlight plays with WPA > 0.10
  4. Show "win probability graph" during breaks
  5. Calculate "comeback improbability" for late leads

Scenario 2: Coaching Decision Support

  1. Calculate WP for each strategic option
  2. Compare: foul vs defend, 2 vs 3, timeout vs play
  3. Present as "this choice gives X% better WP"
  4. Track decision quality over time
  5. Adjust for personnel and matchups

Scenario 3: Player Evaluation

  1. Calculate season WPA for each player
  2. Normalize by minutes and possessions
  3. Separate clutch WPA (LI > 2) from regular
  4. Compare to expected WPA given shot/play quality
  5. Use alongside other metrics (not in isolation)

Scenario 4: Fan Engagement

  1. Show live WP during games
  2. Create "nail-biter index" (time spent near 50%)
  3. Rank most improbable wins
  4. Identify "plays of the game" by WPA
  5. Compare current game to historical context

Tools and Resources

  • Python: scikit-learn, XGBoost, statsmodels
  • R: hoopR, tidyverse, mgcv
  • Visualization: Matplotlib, Plotly, D3.js

Data Sources

  • NBA API (official play-by-play)
  • Basketball-Reference (historical data)
  • Second Spectrum (proprietary tracking)
  • ESPN API (real-time feeds)

Key Metrics to Track

  • Brier Score (overall model quality)
  • ECE (calibration quality)
  • WPA distribution (player/team evaluation)
  • Leverage distribution (game excitement)
  • Calibration drift (production monitoring)

Reference Implementations

  • ESPN Win Probability
  • FiveThirtyEight NBA model
  • Inpredictable (Mike Beuoy)
  • Cleaning the Glass

Summary Equations Card

Win Probability (Normal CDF):
WP = Phi((score_diff + poss_value + home_adv) / sqrt(var * time))

Brier Score:
BS = (1/n) * sum((pred - actual)^2)

Win Probability Added:
WPA = WP_after - WP_before

Leverage Index:
LI = E[WP_swing_current] / E[WP_swing_average]

Seconds Remaining:
sec = (4 - quarter) * 720 + clock_seconds

Effective Lead:
eff_lead = score_diff + (~1.0 if possession) + (~3.5 if home)

Expected Points per Possession:
EPP = ~1.0 to 1.1 points

Calibration Error (per bin):
CE_b = |actual_rate_b - predicted_avg_b|

ECE:
ECE = sum((n_b/N) * CE_b)