Chapter 27: Case Study 1 - Building an Automated Shot Quality Model Using Tracking Data
Introduction
Shot quality models estimate the probability that a shot will be made based on various contextual factors. While simple models use only shot distance and type, advanced models leverage tracking data to incorporate defender positions, shooter movement, and other spatial features. This case study walks through building a comprehensive shot quality model using computer-vision-derived tracking data.
Part 1: Problem Definition
Objective
Create a model that predicts P(Make) for any shot attempt, using:
- Shooter position and movement
- Defender positions and distances
- Game context (score, time, etc.)
Data Sources
- NBA tracking data (25 fps)
- Play-by-play (shot outcomes)
- Shot location data
Evaluation Metrics
- Log Loss (probabilistic accuracy; sensitive to calibration)
- AUC-ROC (discrimination)
- Calibration curves
- Points Added vs. baseline
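The first two metrics can be computed from a list of outcomes (1 = make, 0 = miss) and predicted probabilities. Below is a minimal stdlib sketch that follows the standard definitions. The function names are illustrative. The pairwise AUC loop is O(n²), which is fine for demonstration, but a season of shots calls for a rank-based implementation or a library such as scikit-learn.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def auc_roc(y_true, p_pred):
    """Probability that a random make gets a higher P(Make) than a random miss.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    makes = [p for y, p in zip(y_true, p_pred) if y == 1]
    misses = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = 0.0
    for pm in makes:
        for pn in misses:
            if pm > pn:
                wins += 1.0
            elif pm == pn:
                wins += 0.5
    return wins / (len(makes) * len(misses))
```

Predicting 0.5 for every shot gives a log loss of ln 2 ≈ 0.693 and an AUC of 0.5. These are useful reference points for the values reported in Part 3.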
Part 2: Feature Engineering from Tracking Data
Shooter Features
```python
def extract_shooter_features(tracking_df, shot_time, shooter_id):
    """
    Extract features about the shooter at shot time.

    Helpers (get_position, calculate_distance, calculate_angle, calculate_speed,
    is_in_paint, is_corner_three, distance_to_three_line) and the BASKET_POS
    constant are assumed to be defined elsewhere in the pipeline.
    """
    # Get shooter position at shot time
    shooter_pos = get_position(tracking_df, shot_time, shooter_id)

    # Speed over the last 1 second; computed once and reused below
    shooter_speed = calculate_speed(tracking_df, shooter_id, shot_time, window=1.0)

    features = {
        # Location features
        'distance_to_basket': calculate_distance(shooter_pos, BASKET_POS),
        'shot_angle': calculate_angle(shooter_pos, BASKET_POS),
        # Movement features (last 1 second)
        'shooter_speed': shooter_speed,
        'is_moving': shooter_speed > 3.0,  # mph threshold; must match calculate_speed's units
        # Positioning
        'in_paint': is_in_paint(shooter_pos),
        'corner_three': is_corner_three(shooter_pos),
        'distance_from_three_line': distance_to_three_line(shooter_pos),
    }
    return features
```
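The shooter features depend on court-geometry helpers that the chapter does not define. The sketch below implements several of them under stated assumptions:

- Coordinates are in feet on a 94 × 50 ft court, with shots at the left basket, whose center sits 5.25 ft from the baseline.
- NBA dimensions apply: the three-point line is 23.75 ft from the basket center on the arc and 22 ft in the corners, and the lane is 16 ft wide and 19 ft deep.

The constants, and the sign convention for distance to the three-point line, are assumptions to adapt to your tracking provider's coordinate system.

```python
import math

# Assumed coordinate system: feet, origin at a court corner, left basket.
BASKET_POS = (5.25, 25.0)   # hoop center: 5.25 ft from baseline, mid-width
THREE_ARC_RADIUS = 23.75    # NBA arc distance from hoop center
CORNER_THREE_DIST = 22.0    # NBA corner distance from hoop center (lateral)
# Depth from hoop center where the straight corner line meets the arc (~8.95 ft)
CORNER_BREAK = math.sqrt(THREE_ARC_RADIUS ** 2 - CORNER_THREE_DIST ** 2)


def calculate_distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def calculate_angle(shooter_pos, basket_pos=BASKET_POS):
    """Degrees off the basket's center line: 0 = straight on, +/-90 = along the baseline."""
    dx = shooter_pos[0] - basket_pos[0]  # toward half court
    dy = shooter_pos[1] - basket_pos[1]  # toward a sideline
    return math.degrees(math.atan2(dy, dx))


def is_in_paint(pos):
    """Lane: 16 ft wide, extending 19 ft from the baseline."""
    return 0.0 <= pos[0] <= 19.0 and abs(pos[1] - BASKET_POS[1]) <= 8.0


def distance_to_three_line(pos):
    """Approximate signed distance in feet: positive beyond the line, negative inside."""
    dx = pos[0] - BASKET_POS[0]
    dy = abs(pos[1] - BASKET_POS[1])
    if dx <= CORNER_BREAK:
        # Straight corner segment: compare the lateral offset with 22 ft
        return dy - CORNER_THREE_DIST
    return calculate_distance(pos, BASKET_POS) - THREE_ARC_RADIUS


def is_corner_three(pos):
    """Beyond the line and in the straight corner section."""
    return pos[0] - BASKET_POS[0] <= CORNER_BREAK and distance_to_three_line(pos) > 0
```

Inside the arc, the magnitude is only approximate near the corner break. The sign, however, is always correct, which is what the `corner_three` flag relies on.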
Defender Features
```python
def extract_defender_features(tracking_df, shot_time, shooter_id):
    """
    Extract features about the nearest defenders.
    """
    shooter_pos = get_position(tracking_df, shot_time, shooter_id)

    # Find all defenders on the floor at shot time
    defenders = get_defensive_players(tracking_df, shot_time)

    # Distances from the shooter to every defender, nearest first
    # (assumes at least two defenders are tracked)
    distances = sorted(calculate_distance(shooter_pos, d['position'])
                       for d in defenders)

    features = {
        # Closest defender
        'closest_defender_dist': distances[0],
        # Second closest (help defense)
        'second_defender_dist': distances[1],
        # Number of defenders within range
        'defenders_within_4ft': sum(d < 4 for d in distances),
        'defenders_within_6ft': sum(d < 6 for d in distances),
        # Rim protection
        'rim_protector_present': any_defender_near_rim(tracking_df, shot_time),
        'rim_protector_distance': closest_defender_to_rim(...),
    }
    return features
```
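The defender features rely on helpers that are not shown. The following self-contained sketch computes the same proximity features directly from raw (x, y) positions in feet, assuming the basket sits at (5.25, 25). The function name `defender_proximity` and the 6-foot rim-protector cutoff are hypothetical choices, not values taken from this model. If fewer than two defenders are tracked, the missing distances are padded with infinity so the features stay defined.

```python
import math


def defender_proximity(shooter_pos, defender_positions, basket_pos=(5.25, 25.0),
                       rim_radius_ft=6.0):
    """Proximity features from raw (x, y) positions in feet.

    rim_radius_ft is a hypothetical cutoff for "rim protector present".
    """
    dists = sorted(math.dist(shooter_pos, d) for d in defender_positions)
    dists += [math.inf] * max(0, 2 - len(dists))  # guard against < 2 defenders
    closest_rim = min((math.dist(basket_pos, d) for d in defender_positions),
                      default=math.inf)
    return {
        'closest_defender_dist': dists[0],
        'second_defender_dist': dists[1],
        'defenders_within_4ft': sum(d < 4 for d in dists),
        'defenders_within_6ft': sum(d < 6 for d in dists),
        'rim_protector_present': closest_rim <= rim_radius_ft,
        'rim_protector_distance': closest_rim,
    }
```

Sorting once gives both the closest and second-closest defender. Padding with `math.inf` leaves the within-range counts unchanged, because `inf < 4` is false.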
Touch and Possession Features
```python
def extract_possession_features(tracking_df, shot_time, shooter_id):
    """
    Features about how the shooter received the ball.
    """
    # Find when the shooter received the ball
    catch_time = find_catch_time(tracking_df, shot_time, shooter_id)
    touch_time = shot_time - catch_time  # seconds

    features = {
        # Time since catch
        'touch_time': touch_time,
        'catch_and_shoot': touch_time < 0.5,
        # Dribbles before shot
        'dribbles_before_shot': count_dribbles(tracking_df, catch_time, shot_time),
        # Pass type received
        'received_from_drive': was_drive_kick(tracking_df, catch_time),
        'received_from_post': was_post_pass(tracking_df, catch_time),
    }
    return features
```
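The possession features depend on `find_catch_time` and `count_dribbles`, which are not shown. The sketch below illustrates one possible approach, using a simplified, hypothetical frame format instead of the `tracking_df` above: a time-ordered list of `(time_s, ball_handler_id, ball_z_ft)` tuples, sampled for example at 25 fps.

- The catch is the first frame of the shooter's uninterrupted possession before the shot.
- A dribble is counted each time the ball drops below an assumed height of 1 foot.

```python
def find_catch_time(frames, shot_time, shooter_id):
    """Walk back from the shot to the start of the shooter's current possession.

    frames: time-ordered (time_s, ball_handler_id, ball_z_ft) tuples;
    ball_handler_id is None while the ball is in the air (hypothetical format).
    """
    catch_time = shot_time
    for t, handler, _ in reversed(frames):
        if t > shot_time:
            continue
        if handler != shooter_id:
            break  # someone else (or nobody) had the ball: possession starts after this
        catch_time = t
    return catch_time


def count_dribbles(frames, catch_time, shot_time, floor_ft=1.0):
    """Count downward crossings of a low height threshold (floor_ft is an assumption)."""
    dribbles = 0
    below = False
    for t, _, z in frames:
        if not catch_time <= t <= shot_time:
            continue
        if z < floor_ft and not below:
            dribbles += 1  # ball just dropped near the floor
        below = z < floor_ft
    return dribbles
```

Real tracking feeds include noise and brief losses of possession. A production version would therefore smooth the ball height and tolerate short gaps before ending a possession.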
Part 3: Model Development
Dataset
- Training: 2018-2020 seasons (~300,000 shots)
- Validation: 2020-21 season (~80,000 shots)
- Test: 2021-22 season (~80,000 shots)
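Splitting by season, rather than shuffling shots at random, keeps the validation and test seasons out of training entirely. The reported metrics therefore reflect performance on seasons the model has never seen. Below is a minimal sketch. The `season` field, the function name, and the season labels in the usage note are illustrative assumptions.

```python
def split_by_season(shots, train_seasons, valid_season, test_season):
    """Chronological split of shot records (dicts with a 'season' key)."""
    if valid_season in train_seasons or test_season in train_seasons:
        raise ValueError("evaluation season overlaps the training seasons")
    train = [s for s in shots if s['season'] in train_seasons]
    valid = [s for s in shots if s['season'] == valid_season]
    test = [s for s in shots if s['season'] == test_season]
    return train, valid, test
```

For example, `split_by_season(shots, {'2018-19', '2019-20'}, '2020-21', '2021-22')` returns three disjoint lists.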
Feature List (Final)
| Category | Features | Count |
|---|---|---|
| Location | Distance, angle, zone | 8 |
| Shooter movement | Speed, direction, settled | 5 |
| Defender proximity | Closest, second closest, contest | 7 |
| Help defense | Paint defenders, rim protection | 4 |
| Possession | Touch time, dribbles, pass type | 6 |
| Context | Time, score, quarter | 5 |
| Total | | 35 |
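Before training, the per-shot dictionaries from the separate extractors have to be merged into a single numeric row with a fixed column order, so that every shot lines up with the same model inputs. Below is a minimal sketch, assuming the extractors return plain dicts as in Part 2. `to_feature_row` is a hypothetical helper name. It rejects duplicate or missing feature names rather than silently filling them.

```python
def to_feature_row(*feature_dicts, feature_order):
    """Merge per-shot feature dicts and return values in a fixed column order.

    Booleans (e.g. 'catch_and_shoot') become 0.0/1.0 so every column is numeric.
    """
    merged = {}
    for d in feature_dicts:
        duplicates = merged.keys() & d.keys()
        if duplicates:
            raise ValueError(f"duplicate feature names: {sorted(duplicates)}")
        merged.update(d)
    missing = [name for name in feature_order if name not in merged]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(merged[name]) for name in feature_order]
```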
Model Selection
Comparing approaches:
| Model | Log Loss | AUC-ROC | Calibration |
|---|---|---|---|
| Logistic Regression | 0.612 | 0.682 | Good |
| Random Forest | 0.598 | 0.705 | Fair |
| XGBoost | 0.589 | 0.718 | Good |
| Neural Network | 0.592 | 0.712 | Fair |
Selected: XGBoost, which had the lowest log loss and highest AUC-ROC of the four models while remaining well calibrated.
Hyperparameter Tuning
```python
best_params = {
    'max_depth': 6,
    'learning_rate': 0.05,
    'n_estimators': 300,
    'min_child_weight': 10,
    'subsample': 0.8,
    'colsample_bytree': 0.8
}
```
Part 4: Results
Overall Performance
- Log Loss: 0.589 (vs. 0.652 for distance-only model)
- AUC-ROC: 0.718 (vs. 0.659 for distance-only)
- Calibration: Well-calibrated across the probability range
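A claim of good calibration can be checked with a reliability table: bin shots by predicted P(Make), then compare the average prediction in each bin with the observed make rate. Below is a minimal stdlib sketch. Ten equal-width bins is an assumed default, and `calibration_table` is a hypothetical name.

```python
def calibration_table(y_true, p_pred, n_bins=10):
    """Rows of (bin_low, bin_high, n_shots, mean_predicted, observed_make_rate)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    rows = []
    for i, b in enumerate(bins):
        if not b:
            continue  # skip empty bins
        mean_p = sum(p for _, p in b) / len(b)
        make_rate = sum(y for y, _ in b) / len(b)
        rows.append((i / n_bins, (i + 1) / n_bins, len(b), mean_p, make_rate))
    return rows
```

In a well-calibrated model, `mean_predicted` and `observed_make_rate` stay close in every well-populated bin. Plotting them against each other produces the calibration curve listed among the evaluation metrics.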
Feature Importance
| Rank | Feature | Importance |
|---|---|---|
| 1 | distance_to_basket | 0.182 |
| 2 | closest_defender_dist | 0.124 |
| 3 | touch_time | 0.098 |
| 4 | shooter_speed | 0.076 |
| 5 | shot_angle | 0.068 |
| 6 | defenders_within_4ft | 0.055 |
| 7 | rim_protector_present | 0.048 |
| 8 | catch_and_shoot | 0.042 |
Key Insights
- Defender distance matters most after location: a wide-open 3 is worth more than a contested one
- Touch time is predictive: catch-and-shoot attempts have a ~5% higher make probability
- Movement matters: stationary shooters convert at a higher rate than moving shooters
- Rim protection deters attempts: fewer shots are taken near the rim when a rim protector is present
Part 5: Applications
Shot Selection Analysis
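Calculate "Points Added" for each player by summing over all of that player's shots:
Points Added = sum over shots of (Actual Points - Expected Points)
Expected Points = P(Make) × Point Value
Because Expected Points depends only on shot context, Points Added measures how far a player's scoring exceeds the quality of the shots they took.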
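The Points Added formulas translate directly into code. Below is a minimal sketch. It assumes each shot record carries hypothetical fields `player`, `p_make`, `point_value` (2 or 3) and `made`, and it ignores free throws and and-ones.

```python
from collections import defaultdict


def points_added(shots):
    """Sum of (actual points - expected points) per player."""
    totals = defaultdict(float)
    for s in shots:
        expected = s['p_make'] * s['point_value']
        actual = s['point_value'] if s['made'] else 0
        totals[s['player']] += actual - expected
    return dict(totals)
```

For example, a made three with P(Make) = 0.40 adds 3 - 1.2 = +1.8 points, and a missed two with P(Make) = 0.50 adds -1.0.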
Most Points Added (2021-22):
1. Stephen Curry: +145 points
2. Kevin Durant: +112 points
3. Devin Booker: +98 points
Fewest Points Added:
1. Russell Westbrook: -89 points
2. ...
Defensive Impact
Measure how defenders affect shot quality:
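Defensive Impact = Opponent xPts per shot with the player contesting - Opponent xPts per shot without
Best Shot Contest Defenders (more negative = lower opponent shot quality):
1. Rudy Gobert: -0.08 xPts per shot contested
2. Bam Adebayo: -0.06 xPts per shot contested
3. Draymond Green: -0.05 xPts per shot contested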
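A minimal sketch of the Defensive Impact calculation, under an explicit assumption: "with player" means shots on which that player was the closest defender at release, and "without" means all other shots in the sample. Each record has a hypothetical `contester` field and an `xpts` value (P(Make) × point value). A fuller analysis would also control for matchups and lineups.

```python
from collections import defaultdict


def defensive_impact(shots):
    """Per-defender xPts per contested shot minus xPts per shot on all other shots.

    Negative values mean opponents generate lower-quality shots against that defender.
    """
    total = sum(s['xpts'] for s in shots)
    n = len(shots)
    by_defender = defaultdict(list)
    for s in shots:
        by_defender[s['contester']].append(s['xpts'])

    impact = {}
    for defender, xs in by_defender.items():
        n_without = n - len(xs)
        if n_without == 0:
            continue  # no comparison group available
        with_mean = sum(xs) / len(xs)
        without_mean = (total - sum(xs)) / n_without
        impact[defender] = with_mean - without_mean
    return impact
```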
Real-Time Applications
- Live shot quality display during broadcasts
- Coaching feedback on shot selection
- Player development tracking
Part 6: Limitations and Future Work
Limitations
- No shooter skill adjustment: the model assigns the same xPts to Curry and to an average shooter
- Limited pre-shot actions: the features don't capture how the full play developed
- No fatigue/game-state interaction: shots early and late in games may behave differently
- Tracking data availability: tracking data is not available at all levels of play
Future Improvements
- Add shooter-specific adjustments
- Incorporate play-type context
- Model shot-fake (pump-fake) scenarios
- Include time-varying fatigue effects
Conclusion
Tracking data enables shot quality models that substantially outperform a distance-only baseline (log loss 0.589 vs. 0.652; AUC-ROC 0.718 vs. 0.659). Beyond shot location, key factors include defender proximity, shooter movement, and touch time. These models have practical applications for shot selection analysis, defensive evaluation, and real-time broadcasting.
Exercises
Exercise 1
Calculate the expected points for the following shots using a simplified model:
- 15-foot jumper, open (4+ feet from defender)
- 15-foot jumper, contested (2 feet from defender)
- 3-pointer, catch-and-shoot, open
- 3-pointer, off-dribble, contested
Exercise 2
Design features to capture "shot difficulty" that aren't already included in the model.
Exercise 3
How would you modify this model for college basketball where tracking data is less available?