Chapter 21: Exercises - In-Game Win Probability
Section 21.1-21.2: Foundations and Data
Exercise 21.1 (Basic)
Define win probability in the context of basketball analytics. What does a 65% win probability mean at halftime?
Exercise 21.2 (Basic)
List the five key inputs to a standard win probability model. Explain why each factor affects the probability of winning.
Exercise 21.3 (Intermediate)
Given a game in the third quarter:
- Home team leads by 8 points
- 6 minutes remaining in the quarter (18 minutes total remaining)
- Away team has possession
Using intuition (not a formal model), estimate the home team's win probability. Explain your reasoning.
Exercise 21.4 (Intermediate)
Calculate the total seconds remaining in a game given:
- Quarter 3
- Clock shows 4:32
- Regulation game (12-minute quarters)
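The clock arithmetic in this exercise can be sketched as a small helper. This is only one possible approach: the function name `seconds_remaining` and its parameters are hypothetical, and it assumes a regulation game with no overtime. The usage line deliberately uses different inputs than the exercise.

```python
def seconds_remaining(quarter: int, clock: str, quarter_minutes: int = 12,
                      total_quarters: int = 4) -> int:
    """Seconds left in regulation, given the quarter and a clock string "M:SS".

    Assumption: regulation only (no overtime periods).
    """
    if not 1 <= quarter <= total_quarters:
        raise ValueError("quarter must be within regulation")
    minutes, seconds = clock.split(":")
    left_in_quarter = int(minutes) * 60 + int(seconds)
    # Quarters that have not started yet each contribute a full quarter.
    full_quarters_left = total_quarters - quarter
    return left_in_quarter + full_quarters_left * quarter_minutes * 60


# Example with different inputs than the exercise: quarter 2, 7:15 on the clock.
print(seconds_remaining(2, "7:15"))
```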
Exercise 21.5 (Advanced)
Design a data preprocessing pipeline for win probability model training. Specify:
- Required columns
- Data transformations
- How to handle missing values
- How to create the target variable
Section 21.3: Feature Engineering
Exercise 21.6 (Basic)
Explain why score differential and time remaining interact in win probability models. Give an example where the same score differential has very different implications.
Exercise 21.7 (Basic)
A team has possession with 2 minutes remaining. Approximately how many points is that possession worth in expectation? How should this affect win probability?
Exercise 21.8 (Intermediate)
Create feature transformations for time remaining that capture the non-linear relationship with win probability:
a) Log transformation
b) Square root transformation
c) Explain when each is appropriate
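As a starting point for parts (a) and (b), the transformations might be computed as below. The names `time_features` and `REGULATION_SECONDS` are hypothetical, and using `log1p` (that is, log(1 + t)) rather than a plain logarithm is an assumption made so the feature stays finite when the clock reaches zero. Part (c) is left to the reader.

```python
import math

REGULATION_SECONDS = 48 * 60  # assumed regulation length: four 12-minute quarters


def time_features(seconds_left: float) -> dict:
    """Return a few candidate time features for a win probability model."""
    seconds_left = max(0.0, float(seconds_left))
    return {
        # Fraction of regulation still to play (linear baseline).
        "time_frac": seconds_left / REGULATION_SECONDS,
        # Square root compresses early-game time, stretches late-game time.
        "time_sqrt": math.sqrt(seconds_left),
        # log(1 + t): stronger compression, and finite at t = 0.
        "time_log": math.log1p(seconds_left),
    }


print(time_features(600))
```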
Exercise 21.9 (Intermediate)
Team A is the pre-game favorite by 7 points. They trail by 2 points at halftime. How would you incorporate team strength into the win probability calculation?
Exercise 21.10 (Advanced)
Design interaction features that capture:
a) Score differential relative to time remaining
b) The "running out of time" effect for trailing teams
c) Garbage time situations
Section 21.4: Logistic Regression Models
Exercise 21.11 (Basic)
Write the logistic regression formula for win probability with two features: score_diff and time_remaining.
Exercise 21.12 (Basic)
If a logistic regression model has:
- Intercept (beta_0) = 0.5
- Score_diff coefficient (beta_1) = 0.15
- Time_remaining_sqrt coefficient (beta_2) = -0.02

Calculate the win probability when score_diff = 5 and time_remaining = 400 seconds. Note that beta_2 multiplies the square root of time_remaining.
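The calculation has two steps: form the linear predictor, then pass it through the logistic (sigmoid) function. A minimal sketch follows, assuming the time coefficient applies to the square root of seconds remaining, as the coefficient name in the exercise suggests. The function name `win_probability` is hypothetical, and the coefficients in the usage line are illustrative rather than those of the exercise.

```python
import math


def win_probability(score_diff: float, seconds_left: float,
                    beta_0: float, beta_1: float, beta_2: float) -> float:
    """Logistic win probability with a square-root time feature (assumed form)."""
    # Linear predictor (log-odds of winning).
    z = beta_0 + beta_1 * score_diff + beta_2 * math.sqrt(seconds_left)
    # Sigmoid maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative coefficients only: a tied game with no intercept or time effect.
print(win_probability(score_diff=0, seconds_left=100,
                      beta_0=0.0, beta_1=0.1, beta_2=0.0))
```

Because the sigmoid is bounded between 0 and 1, the output is always a valid probability regardless of how large the score differential becomes.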
Exercise 21.13 (Intermediate)
Explain why logistic regression is preferred over linear regression for win probability modeling. What problem does it solve?
Exercise 21.14 (Intermediate)
A win probability model predicts 0.72 when score_diff = 10, time_remaining = 600, possession = 1. If the team scores 3 points on this possession:
- What is the new score_diff?
- What is the approximate new time_remaining?
- What would you expect the new win probability to be?
Exercise 21.15 (Advanced)
Implement polynomial feature expansion for a win probability model. Include:
- Score squared
- Time-score interactions
- Cross-validation to determine optimal degree
Section 21.5: Model Calibration
Exercise 21.16 (Basic)
Define calibration in the context of probability models. What does it mean for a model to be "well-calibrated"?
Exercise 21.17 (Basic)
A model predicts win probability of 0.80 for 500 different game situations. In those situations, the team actually won 375 times. Calculate:
a) The actual win rate
b) Is the model well-calibrated at this probability level?
c) Is the model over-confident or under-confident?
Exercise 21.18 (Intermediate)
Calculate the Brier Score for the following predictions:

| Predicted WP | Actual Outcome (1=win) |
|--------------|------------------------|
| 0.80         | 1                      |
| 0.60         | 0                      |
| 0.55         | 1                      |
| 0.30         | 0                      |
| 0.75         | 1                      |
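The Brier Score is the mean squared difference between predicted probabilities and binary outcomes. A short sketch for checking a hand calculation is given below; the function name `brier_score` is hypothetical, and the usage line uses made-up predictions rather than the table above.

```python
def brier_score(predictions, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predictions) != len(outcomes) or len(predictions) == 0:
        raise ValueError("predictions and outcomes must be non-empty and equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Made-up example: two confident, correct predictions give a low (good) score.
print(brier_score([0.9, 0.2], [1, 0]))
```

Lower is better: a perfect forecaster scores 0, and always predicting 0.5 scores 0.25.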
Exercise 21.19 (Intermediate)
Explain the difference between Platt scaling and isotonic regression for model calibration. When would you use each?
Exercise 21.20 (Advanced)
Design a calibration evaluation that tests model performance across:
- Different time periods (quarters)
- Different score differentials
- Home vs. away games

Include specific metrics and visualization approaches.
Section 21.6-21.7: Applications and WPA
Exercise 21.21 (Basic)
Define Win Probability Added (WPA). What does a WPA of +0.15 mean for a play?
Exercise 21.22 (Basic)
Before a made three-pointer, win probability was 45%. After, it was 62%. Calculate the WPA for this shot.
Exercise 21.23 (Intermediate)
A player has the following plays in a game:

| Play       | WP Before | WP After |
|------------|-----------|----------|
| Made 3PT   | 50%       | 58%      |
| Turnover   | 62%       | 55%      |
| Made layup | 48%       | 54%      |
| Missed FT  | 72%       | 70%      |
| Block      | 35%       | 42%      |
Calculate the player's total WPA for the game.
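Per-play WPA is the win probability after the play minus the win probability before it, and a game total is the sum over the player's plays. The sketch below illustrates that bookkeeping. The names `wpa` and `total_wpa` are hypothetical, probabilities are expressed as fractions rather than percentages, and the sample plays are invented rather than taken from the table.

```python
def wpa(wp_before: float, wp_after: float) -> float:
    """Win Probability Added by a single play."""
    return wp_after - wp_before


def total_wpa(plays) -> float:
    """Sum WPA over an iterable of (wp_before, wp_after) pairs."""
    return sum(wpa(before, after) for before, after in plays)


# Invented plays: one positive swing and one negative swing.
sample_plays = [(0.40, 0.47), (0.47, 0.44)]
print(round(total_wpa(sample_plays), 4))
```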
Exercise 21.24 (Intermediate)
Explain why WPA has high variance as a player evaluation metric. What are two limitations of using season WPA to evaluate players?
Exercise 21.25 (Advanced)
Design a WPA attribution system for assists. When Player A passes to Player B who scores, how should the WPA be divided between them? Consider different scenarios (open vs. difficult shot, primary vs. secondary assist).
Section 21.8: Leverage Index
Exercise 21.26 (Basic)
Define Leverage Index (LI). What does an LI of 3.0 mean?
Exercise 21.27 (Basic)
Rank the following situations from lowest to highest leverage:
a) Tie game, 30 seconds left
b) Up 20 points, 5 minutes left
c) Down 3 points, 2 minutes left
d) Up 5 points, start of 3rd quarter
Exercise 21.28 (Intermediate)
Calculate an approximate leverage index for:
- Tie game, 60 seconds remaining
- Expected WP swing on a made/missed shot: 25%
- Average expected WP swing per possession: 2%
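The inputs above suggest treating leverage index as a ratio: the expected WP swing in the current situation divided by the average swing per possession. Under that assumed definition, a minimal sketch looks like this. The function name `leverage_index` is hypothetical, both swings are given in percentage points, and the usage line uses different numbers than the exercise.

```python
def leverage_index(situation_swing: float, average_swing: float) -> float:
    """Ratio of this situation's expected WP swing to the average swing.

    Assumption: both swings use the same units (here, percentage points).
    """
    if average_swing <= 0:
        raise ValueError("average_swing must be positive")
    return situation_swing / average_swing


# A situation with a 6-point swing against a 2-point average.
print(leverage_index(6.0, 2.0))
```

An LI of 1.0 under this definition marks an average-importance possession; values above 1.0 mark possessions that matter more than average.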
Exercise 21.29 (Intermediate)
How should leverage index inform:
a) Player substitution decisions
b) Timeout usage
c) Shot selection
Exercise 21.30 (Advanced)
Create a leverage index heatmap function that visualizes LI across:
- Score differential (-30 to +30)
- Time remaining (0 to 48 minutes)

Include code to generate the visualization.
Section 21.9: Comeback Probability
Exercise 21.31 (Basic)
A team is down 15 points with 8 minutes remaining. Using historical data, give a rough estimate of their comeback probability and justify the range you chose.
Exercise 21.32 (Basic)
Why is "momentum" difficult to detect statistically in win probability data?
Exercise 21.33 (Intermediate)
Analyze the following potential comeback:
- Down 12 points at end of 3rd quarter
- Win probability model shows 8% chance
- The team won
Calculate the "improbability" of this comeback. How should we interpret single-game comebacks in the context of sample sizes?
Exercise 21.34 (Intermediate)
Design a study to test whether "momentum" carries predictive value beyond current game state. Specify:
- What data you would collect
- Your hypothesis
- Statistical test to use
Exercise 21.35 (Advanced)
Build a comeback probability table using simulations:
- For deficits of 5, 10, 15, 20 points
- At times of 12, 8, 4, 2 minutes remaining
- Using realistic possession parameters

Compare your results to historical NBA data.
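One way to get started is a simple Monte Carlo simulation in which both teams draw points per possession from the same distribution. Everything numeric below is an illustrative assumption, not calibrated data: the pace of 100 possessions per team per 48 minutes, the points-per-possession weights, equal team strength, and counting a tie at the end of regulation as half a win. The names `comeback_probability` and `comeback_table` are hypothetical.

```python
import random

# Assumed points-per-possession distribution (mean of 1.10 points).
POINTS = [0, 1, 2, 3]
WEIGHTS = [0.50, 0.04, 0.32, 0.14]
POSSESSIONS_PER_MINUTE = 100 / 48  # assumed pace, per team


def comeback_probability(deficit, minutes_left, n_sims=5000, seed=0):
    """Estimate the trailing team's win probability by simulation."""
    rng = random.Random(seed)  # seeded for reproducible estimates
    n_poss = round(minutes_left * POSSESSIONS_PER_MINUTE)
    wins = 0.0
    for _ in range(n_sims):
        trailing = sum(rng.choices(POINTS, WEIGHTS, k=n_poss))
        leading = sum(rng.choices(POINTS, WEIGHTS, k=n_poss))
        margin = trailing - leading - deficit
        if margin > 0:
            wins += 1.0
        elif margin == 0:
            wins += 0.5  # assumption: overtime treated as a coin flip
    return wins / n_sims


def comeback_table(deficits=(5, 10, 15, 20), minutes=(12, 8, 4, 2), n_sims=5000):
    """Map (deficit, minutes_left) to an estimated comeback probability."""
    return {(d, m): comeback_probability(d, m, n_sims)
            for d in deficits for m in minutes}


print(comeback_probability(deficit=5, minutes_left=12))
```

This sketch ignores fouling strategy, pace changes late in games, and team strength, so its estimates for large late deficits should be expected to differ from historical frequencies; explaining those gaps is part of the comparison the exercise asks for.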
Section 21.10-21.11: Broadcast and Implementation
Exercise 21.36 (Basic)
List three ways win probability is used in NBA broadcasts to enhance viewer experience.
Exercise 21.37 (Basic)
A broadcast shows win probability jumping from 55% to 78% after a single play. What type of play likely caused this? At what point in the game did it occur?
Exercise 21.38 (Intermediate)
Design a win probability alert system that notifies when:
- Probability crosses 90% (game essentially decided)
- Probability changes by more than 15 percentage points on a single play
- An improbable comeback occurs (the eventual winner was below 25% at some point)
Exercise 21.39 (Intermediate)
Create a "game excitement score" based on win probability data. Possible factors:
- Lead changes
- WP swings
- Time spent in close situations
- Final outcome vs. peak probability
Exercise 21.40 (Advanced)
Implement a complete win probability model in Python using sklearn. Include:
- Feature engineering
- Model training
- Calibration check
- Real-time prediction function
Applied Projects
Exercise 21.41 (Project)
Build a win probability model using historical play-by-play data:
a) Collect at least 1,000 games of data
b) Engineer relevant features
c) Train a logistic regression model
d) Evaluate calibration
e) Compare to existing public models
Exercise 21.42 (Project)
Analyze the most improbable wins in NBA history:
a) Find games where the winner was at <5% WP at some point
b) Identify the play(s) that swung the game
c) Calculate total WPA for key players
d) Create visualizations
Exercise 21.43 (Project)
Create a real-time win probability dashboard that:
a) Updates with each play
b) Shows current WP
c) Displays WP graph over game
d) Highlights high-WPA plays
e) Provides comeback probability
Exercise 21.44 (Project)
Evaluate whether different models (logistic regression, random forest, neural network) produce significantly different win probabilities:
a) Train multiple model types
b) Compare predictions
c) Analyze calibration differences
d) Determine if model choice matters practically
Exercise 21.45 (Project)
Use WPA to identify the most "clutch" players:
a) Calculate career WPA per game
b) Calculate high-leverage WPA
c) Analyze year-to-year consistency
d) Determine if "clutch" is a persistent skill
Challenge Problems
Challenge 21.1
Derive the standard deviation of score differential as a function of time remaining, given that each possession is approximately independent with standard deviation of ~10 points per possession.
Challenge 21.2
Build a win probability model that incorporates:
- Current game state
- Player injury status
- Recent team performance
- Betting market information

Compare to a model using only game state.
Challenge 21.3
Design an experiment to test whether live win probability displays affect game outcomes (e.g., do trailing teams with visible low WP try harder or give up?).
Challenge 21.4
Create a "championship probability" model that updates throughout the playoffs based on series position, home court, and individual game win probabilities.
Challenge 21.5
Develop an attribution model for team win probability that decomposes:
- Individual player contributions
- Coaching decisions (timeouts, substitutions)
- Matchup effects
- Random variance