Chapter 21: Key Takeaways - In-Game Win Probability
Core Concepts Summary
1. Win Probability Definition
- Definition: The probability that a team wins given the current game state
- Key Inputs: Score differential, time remaining, possession, team strength, home court
- Range: 0.0 (certain loss) to 1.0 (certain win)
- Starting Point: Typically ~0.50-0.55 for home team at tip-off
2. Win Probability Added (WPA)
- Definition: Change in win probability caused by a single play
- Formula: WPA = WP_after - WP_before
- Range: Typically -0.15 to +0.15 per play (higher in clutch)
- Team WPA: Play-level WPA sums to final WP minus initial WP (≈ +0.5 for a winner starting near 50%)
3. Leverage Index (LI)
- Definition: Importance of current game situation relative to average
- Formula: LI = Expected WP swing / Average expected WP swing
- Baseline: LI = 1.0 is average importance
- High Leverage: LI > 3.0 indicates critical moments
4. Model Calibration
- Definition: When predicted probabilities match actual outcomes
- Well-Calibrated: Situations assigned ~70% win probability are actually won ~70% of the time
- Brier Score: Measures combined calibration and discrimination (0-1, lower is better)
- ECE: Expected Calibration Error measures average calibration gap
Essential Formulas
Win Probability (Simplified Normal CDF Model)
WP = Phi(effective_lead / sqrt(variance * time_remaining))
Where:
- effective_lead = score_diff + possession_value + home_advantage
- possession_value = ~1.0-1.1 points
- home_advantage = ~3.0-3.5 points
- variance = ~0.068 points² per second (variance rate of the score differential)
- Phi = standard normal cumulative distribution function
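As a sketch, the model above translates directly into a few lines of Python. The parameter defaults below are the rough values quoted in this chapter, not fitted constants, and `win_probability` is a name chosen for this example.

```python
from scipy.stats import norm

def win_probability(score_diff, seconds_remaining, has_possession=False,
                    is_home=False, possession_value=1.0, home_advantage=3.0,
                    variance_per_second=0.068):
    """Simplified normal-CDF win probability from the leading team's view."""
    if seconds_remaining <= 0:
        # game over (a tie would mean overtime, treated here as a coin flip)
        return 1.0 if score_diff > 0 else (0.5 if score_diff == 0 else 0.0)
    effective_lead = score_diff
    if has_possession:
        effective_lead += possession_value
    if is_home:
        effective_lead += home_advantage
    sd = (variance_per_second * seconds_remaining) ** 0.5
    return float(norm.cdf(effective_lead / sd))

# Up 5 with the ball and 6 minutes (360 s) left, neutral court
print(round(win_probability(5, 360, has_possession=True), 3))
```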
Seconds Remaining Calculation
For regulation (quarters 1-4):
seconds_remaining = (4 - period) * 720 + clock_seconds
Where:
- period = current quarter (1-4)
- clock_seconds = minutes * 60 + seconds on game clock
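A direct translation of the formula, with the caveat that it only covers regulation; overtime periods (300 seconds each in the NBA) would need their own branch.

```python
def seconds_remaining(period, clock_minutes, clock_seconds):
    """Seconds left in regulation for periods 1-4 (720 s per quarter)."""
    return (4 - period) * 720 + clock_minutes * 60 + clock_seconds

assert seconds_remaining(1, 12, 0) == 2880  # opening tip
assert seconds_remaining(4, 0, 30) == 30    # 30 s left in Q4
```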
Brier Score
Brier Score = (1/n) * sum((predicted_i - actual_i)^2)
Where:
- predicted_i = predicted win probability for situation i
- actual_i = actual outcome (1 if win, 0 if loss)
- n = number of predictions
Interpretation:
- 0.00 = perfect predictions
- 0.25 = always predicting 0.5 (baseline)
- Lower is better
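A minimal implementation; it computes the same value as `sklearn.metrics.brier_score_loss` (which takes `(actual, predicted)` argument order).

```python
import numpy as np

def brier_score(predicted, actual):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean((predicted - actual) ** 2))

# Always predicting 0.5 on a balanced set lands exactly on the 0.25 baseline
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.25
```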
Win Probability Added
WPA = WP_after - WP_before
For a game winner:
Total Team WPA = 1.0 - initial_WP = ~0.5 (starting from neutral)
For individual play:
WPA accounts for context (time, score, leverage)
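Because WPA is just a first difference, the telescoping property above falls out immediately. A toy trajectory with made-up WP values:

```python
import numpy as np

def wpa_per_play(wp_series):
    """Per-play WPA from a chronological WP series (pre-game value first)."""
    return np.diff(np.asarray(wp_series, dtype=float))

wp = [0.53, 0.58, 0.49, 0.71, 1.00]  # illustrative trajectory; home team wins
print(wpa_per_play(wp))              # [ 0.05 -0.09  0.22  0.29]
print(wpa_per_play(wp).sum())        # 0.47 == 1.00 - 0.53 (telescopes)
```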
Leverage Index
LI = Expected WP swing in situation / Average expected WP swing
Rules of thumb:
- LI < 0.5: Low leverage (early game, blowout)
- LI = 1.0: Average leverage
- LI = 2-3: High leverage (close game, late)
- LI > 5: Extreme leverage (clutch moments)
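One way to make LI concrete is to estimate the expected WP swing by enumerating plausible next-possession outcomes under the `win_probability()` sketch above. Everything here is an illustrative assumption: the outcome distribution, the 15-second possession, and the 0.02 league-average swing placeholder, which you would estimate empirically from many game states.

```python
POSSESSION_LENGTH = 15  # assumed average possession length, in seconds

def expected_wp_swing(score_diff, secs):
    """Probability-weighted |delta WP| over toy next-possession outcomes."""
    outcomes = [(0, 0.47), (2, 0.42), (3, 0.11)]  # (points, probability) - toy values
    wp_now = win_probability(score_diff, secs)
    t_next = max(secs - POSSESSION_LENGTH, 0)
    return sum(p * abs(win_probability(score_diff + pts, t_next) - wp_now)
               for pts, p in outcomes)

def leverage_index(score_diff, secs, average_swing=0.02):
    return expected_wp_swing(score_diff, secs) / average_swing

print(round(leverage_index(0, 30), 1))    # tied, 30 s left: well above 1
print(round(leverage_index(20, 600), 2))  # up 20 in Q4: near zero
```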
Logistic Regression Model
WP = 1 / (1 + exp(-z))
Where:
z = beta_0 + beta_1*score_diff + beta_2*f(time) + beta_3*possession + ...
Common time transformations:
- log(seconds_remaining + 1)
- sqrt(seconds_remaining)
- seconds_remaining / 2880 (normalized)
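A minimal scikit-learn sketch of this specification. The play-by-play column names (`score_diff`, `seconds_remaining`, `possession`, `home_win`) are placeholders for whatever your data pipeline produces.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def build_features(pbp: pd.DataFrame) -> pd.DataFrame:
    """Assemble the feature set described above from play-by-play rows."""
    X = pd.DataFrame(index=pbp.index)
    X["score_diff"] = pbp["score_diff"]
    X["log_time"] = np.log(pbp["seconds_remaining"] + 1)
    X["norm_time"] = pbp["seconds_remaining"] / 2880
    X["possession"] = pbp["possession"]  # -1 away, 0 dead ball, 1 home
    # interaction term: a given lead matters more as the clock runs down
    X["diff_per_sqrt_time"] = pbp["score_diff"] / (np.sqrt(pbp["seconds_remaining"]) + 1)
    return X

model = LogisticRegression(C=1.0, max_iter=1000)  # L2-regularized by default
# model.fit(build_features(train_pbp), train_pbp["home_win"])
# wp_home = model.predict_proba(build_features(live_pbp))[:, 1]
```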
Expected Calibration Error
ECE = sum over bins b of (n_b / N) * |accuracy_b - confidence_b|
Where:
- n_b = samples in bin b
- N = total samples
- accuracy_b = actual win rate in bin b
- confidence_b = average predicted probability in bin b
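The same formula in code, with the bin count as a parameter:

```python
import numpy as np

def expected_calibration_error(predicted, actual, n_bins=10):
    """Bin-weighted average gap between confidence and accuracy."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        mask = (predicted >= edges[i]) & (predicted < edges[i + 1])
        if i == n_bins - 1:  # fold predictions of exactly 1.0 into the top bin
            mask |= predicted == 1.0
        if mask.any():  # mask.mean() is n_b / N
            ece += mask.mean() * abs(actual[mask].mean() - predicted[mask].mean())
    return ece

# perfectly calibrated 70% forecasts: confidence == accuracy, so ECE == 0
print(expected_calibration_error([0.7] * 10, [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]))
```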
Implementation Checklist
Building a Win Probability Model
- [ ] Data Collection
  - [ ] Gather play-by-play data (1,000+ games recommended)
  - [ ] Include: game_id, period, clock, scores, events
  - [ ] Calculate seconds remaining
  - [ ] Determine possession for each event
  - [ ] Create binary outcome variable (home_win)
- [ ] Feature Engineering
  - [ ] Score differential (primary feature)
  - [ ] Time remaining transformations (log, sqrt)
  - [ ] Possession indicator (-1, 0, 1)
  - [ ] Score-time interaction terms
  - [ ] Quarter/period indicators
  - [ ] Clutch situation flag
  - [ ] Optional: team strength adjustment
- [ ] Model Training (see the training sketch after this checklist)
  - [ ] Choose algorithm (logistic regression recommended for interpretability)
  - [ ] Use time-series cross-validation
  - [ ] Apply regularization (L2) to prevent overfitting
  - [ ] Calculate Brier score on held-out data
- [ ] Calibration
  - [ ] Create calibration curve (predicted vs actual)
  - [ ] Calculate Expected Calibration Error
  - [ ] Apply Platt scaling if needed
  - [ ] Verify calibration across all probability bins
- [ ] Validation
  - [ ] Temporal validation (train on past, test on future)
  - [ ] Stratified evaluation (by quarter, score differential)
  - [ ] Compare to baseline models
  - [ ] Check edge cases (overtime, large leads)
- [ ] Deployment
  - [ ] Build prediction API
  - [ ] Handle real-time updates
  - [ ] Monitor calibration drift
  - [ ] Document model assumptions
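A compact sketch of the training and validation steps above, assuming a chronologically ordered feature matrix `X` and label vector `y` as NumPy arrays. Note that `TimeSeriesSplit` splits by row; in production you would split on game boundaries so plays from one game never straddle train and test.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import TimeSeriesSplit

def temporal_cv_brier(X, y, n_splits=5):
    """Train on past folds, score each future fold with the Brier score."""
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = LogisticRegression(max_iter=1000)  # L2-regularized by default
        model.fit(X[train_idx], y[train_idx])
        wp = model.predict_proba(X[test_idx])[:, 1]
        scores.append(brier_score_loss(y[test_idx], wp))
    return scores  # drift across folds hints at regime change or leakage
```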
Common Pitfalls to Avoid
1. Data Leakage
- Problem: Using future information to predict the current state
- Solution: Use time-series cross-validation; never use the game outcome as a feature
2. Ignoring Calibration
- Problem: The model has good accuracy but poor probability estimates
- Solution: Always evaluate calibration; apply post-hoc calibration techniques
3. Overcomplicating Features
- Problem: Too many features lead to overfitting
- Solution: Start simple (score, time, possession); add features incrementally
4. Misinterpreting WPA
- Problem: Using WPA to predict future performance
- Solution: WPA describes past impact; use it for narrative, not projection
5. Ignoring Context in Leverage
- Problem: Treating all high-WPA plays as equally skillful
- Solution: Normalize by leverage index; consider shot difficulty
6. Small Sample Overconfidence
- Problem: Drawing conclusions from single games or a few plays
- Solution: Report confidence intervals; aggregate over many observations
Quick Reference Tables
Benchmark Brier Scores
| Model | Brier Score | Quality |
|---|---|---|
| Perfect | 0.000 | Ideal (impossible) |
| Strong Model | 0.10-0.15 | Excellent |
| Good Model | 0.15-0.20 | Good |
| Weak Model | 0.20-0.25 | Marginal |
| Baseline (always 0.5) | 0.250 | Poor |
| Random (uniform guesses) | 0.333 | Worthless |
Leverage Index Reference
| Situation | Approximate LI |
|---|---|
| Up 20, Q4 | 0.1-0.3 |
| Start of game | 0.5-0.8 |
| Up 10, start of Q2 | 0.6-0.9 |
| Tied, end of Q3 | 1.5-2.0 |
| Down 3, 2 min left | 2.5-3.5 |
| Tied, 30 sec left | 4.0-6.0 |
| Tied, final possession | 5.0-8.0 |
Win Probability Guidelines
| Game State | Approximate Home WP |
|---|---|
| Tip-off | 52-55% |
| Up 5 at half | 72-75% |
| Up 10 at half | 85-88% |
| Up 15 at half | 92-95% |
| Down 5, 5 min left | 25-30% |
| Down 10, 5 min left | 8-12% |
| Up 3, 30 sec left, ball | 90-95% |
| Tied, 10 sec left, ball | 55-60% |
Feature Importance (Typical)
| Feature | Relative Importance |
|---|---|
| Score differential | 1.00 (baseline) |
| Time remaining | 0.60-0.70 |
| Score x Time interaction | 0.30-0.40 |
| Possession | 0.15-0.25 |
| Home court | 0.10-0.15 |
| Team strength | 0.10-0.20 |
Decision Frameworks
Framework 1: Interpreting Win Probability
1. Check current WP estimate
2. Compare to pre-game expectations
3. Identify key swings (when did WP change most?)
4. Calculate improbability if underdog wins
5. Context: Is this a meaningful probability shift?
Framework 2: Evaluating Player WPA
1. Calculate total WPA for game/season
2. Normalize by possessions played
3. Separate by leverage tier (high vs low)
4. Compare to expected WPA given opportunities
5. Caveat: WPA is descriptive, not predictive
Framework 3: Model Calibration Check
1. Bin predictions into 10 groups (0-10%, 10-20%, etc.)
2. Calculate actual win rate in each bin
3. Plot predicted vs actual (should be diagonal)
4. Identify over/under-confident regions
5. Apply recalibration if needed
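Framework 3 maps onto scikit-learn's `calibration_curve` almost one-for-one; `y` and `wp` below stand for held-out outcomes and predicted probabilities.

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

def plot_calibration(y, wp, n_bins=10):
    """Predicted vs. actual win rate; a calibrated model hugs the diagonal."""
    frac_wins, mean_pred = calibration_curve(y, wp, n_bins=n_bins)
    plt.plot(mean_pred, frac_wins, "o-", label="model")
    plt.plot([0, 1], [0, 1], "--", label="perfect calibration")
    plt.xlabel("Predicted win probability")
    plt.ylabel("Actual win rate")
    plt.legend()
    plt.show()
```

If the curve bows consistently off the diagonal, `sklearn.calibration.CalibratedClassifierCV(method="sigmoid")` applies Platt scaling, the post-hoc fix mentioned in the checklist.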
Framework 4: Building Real-Time WP System
1. Train model on historical data
2. Set up data pipeline for live game feed
3. Calculate WP after each play
4. Store WP trajectory for visualization
5. Calculate WPA for significant events
6. Monitor for calibration drift
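A skeleton of the live loop, with heavy caveats: `feed`, the event fields, and `build_features_from_event` are all hypothetical stand-ins for your actual data pipeline.

```python
def run_live_wp(model, feed, pregame_wp=0.55):
    """Update WP after each event; record the trajectory and per-play WPA."""
    trajectory = []
    prev_wp = pregame_wp
    for event in feed:  # e.g., a generator yielding parsed live events
        x = build_features_from_event(event)  # hypothetical helper
        wp = float(model.predict_proba(x)[:, 1][0])
        trajectory.append({"event_id": event["event_id"],
                           "wp": wp, "wpa": wp - prev_wp})
        prev_wp = wp
    return trajectory
```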
Key Insights Summary
- Score differential is king: Accounts for 50-60% of model predictive power
- Time transforms matter: Log and sqrt transformations capture non-linear decay
- Possession is worth ~1 point: Important to include, especially late in games
- Calibration trumps accuracy: A well-calibrated 70% is better than a miscalibrated 75%
- Leverage varies 100x: From 0.1 in blowouts to 10+ in crunch time
- WPA is retrospective: Great for storytelling, poor for prediction
- Single games have huge variance: Even 90% WP situations lose 10% of the time
- Simple models often win: Logistic regression competes with complex ML
- Temporal validation is essential: Prevents data leakage and overfitting
- Context always matters: The same WP swing means different things in different situations
Application Scenarios
Scenario 1: Broadcasting Win Probability
- Build model with ~150ms inference time
- Display WP after each possession
- Highlight plays with WPA > 0.10
- Show "win probability graph" during breaks
- Calculate "comeback improbability" for late leads
Scenario 2: Coaching Decision Support
- Calculate WP for each strategic option
- Compare: foul vs defend, 2 vs 3, timeout vs play
- Present as "this choice gives X% better WP"
- Track decision quality over time
- Adjust for personnel and matchups
Scenario 3: Player Evaluation
- Calculate season WPA for each player
- Normalize by minutes and possessions
- Separate clutch WPA (LI > 2) from regular
- Compare to expected WPA given shot/play quality
- Use alongside other metrics (not in isolation)
Scenario 4: Fan Engagement
- Show live WP during games
- Create "nail-biter index" (time spent near 50%)
- Rank most improbable wins
- Identify "plays of the game" by WPA
- Compare current game to historical context
Tools and Resources
Recommended Software
- Python: scikit-learn, XGBoost, statsmodels
- R: hoopR, tidyverse, mgcv
- Visualization: Matplotlib, Plotly, D3.js
Data Sources
- NBA API (official play-by-play)
- Basketball-Reference (historical data)
- Second Spectrum (proprietary tracking)
- ESPN API (real-time feeds)
Key Metrics to Track
- Brier Score (overall model quality)
- ECE (calibration quality)
- WPA distribution (player/team evaluation)
- Leverage distribution (game excitement)
- Calibration drift (production monitoring)
Reference Implementations
- ESPN Win Probability
- FiveThirtyEight NBA model
- Inpredictable (Mike Beuoy)
- Cleaning the Glass
Summary Equations Card
Win Probability (Normal CDF):
WP = Phi((score_diff + poss_value + home_adv) / sqrt(var * time))
Brier Score:
BS = (1/n) * sum((pred - actual)^2)
Win Probability Added:
WPA = WP_after - WP_before
Leverage Index:
LI = E[WP_swing_current] / E[WP_swing_average]
Seconds Remaining:
sec = (4 - quarter) * 720 + clock_seconds
Effective Lead:
eff_lead = score_diff + (~1.0 if possession) + (~3.5 if home)
Expected Points per Possession:
EPP = ~1.0 to 1.1 points
Calibration Error (per bin):
CE_b = |actual_rate_b - predicted_avg_b|
ECE:
ECE = sum((n_b/N) * CE_b)