Chapter 27: Case Study 1 - Building an Automated Shot Quality Model Using Tracking Data

Introduction

Shot quality models estimate the probability that a shot will be made based on various contextual factors. While simple models use only shot distance and type, advanced models leverage tracking data to incorporate defender positions, shooter movement, and other spatial features. This case study walks through building a comprehensive shot quality model using computer vision-derived tracking data.

Part 1: Problem Definition

Objective

Create a model that predicts P(Make) for any shot attempt, using:

  • Shooter position and movement
  • Defender positions and distances
  • Game context (score, time, etc.)

Data Sources

  • NBA tracking data (25 fps)
  • Play-by-play (shot outcomes)
  • Shot location data

Evaluation Metrics

  • Log Loss (penalizes confident wrong predictions; sensitive to calibration)
  • AUC-ROC (discrimination)
  • Calibration curves
  • Points Added vs. baseline
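To make the first three metrics concrete, here is a minimal plain-Python sketch of each. The function names are illustrative, not part of any library. The AUC loop compares every make against every miss, which is fine for illustration but quadratic; at the scale of a full season you would use a sort-based version or library routines such as scikit-learn's `log_loss`, `roc_auc_score`, and `calibration_curve`.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def auc_roc(y_true, p_pred):
    """Probability that a random make gets a higher P(Make) than a random miss (ties count 1/2)."""
    makes = [p for y, p in zip(y_true, p_pred) if y == 1]
    misses = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = 0.0
    for pm in makes:
        for pn in misses:
            if pm > pn:
                wins += 1.0
            elif pm == pn:
                wins += 0.5
    return wins / (len(makes) * len(misses))


def calibration_table(y_true, p_pred, n_bins=10):
    """Per non-empty bin: (mean predicted P(Make), observed make rate, shot count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((y, p))
    table = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            table.append((mean_pred, observed, len(b)))
    return table
```

A calibration curve is just the rows of `calibration_table` plotted as mean predicted probability against observed make rate; a well-calibrated model sits near the diagonal.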

Part 2: Feature Engineering from Tracking Data

Shooter Features

def extract_shooter_features(tracking_df, shot_time, shooter_id):
    """
    Extract features about the shooter at shot time.

    Assumes the helpers (get_position, calculate_distance, calculate_angle,
    calculate_speed, is_in_paint, is_corner_three, distance_to_three_line)
    and the BASKET_POS constant are defined elsewhere; distances are in feet
    and calculate_speed returns mph.
    """
    # Get shooter position at shot time
    shooter_pos = get_position(tracking_df, shot_time, shooter_id)

    # Average speed over the last 1 second (computed once, reused below)
    speed_mph = calculate_speed(tracking_df, shooter_id, shot_time, window=1.0)

    features = {
        # Location features
        'distance_to_basket': calculate_distance(shooter_pos, BASKET_POS),
        'shot_angle': calculate_angle(shooter_pos, BASKET_POS),

        # Movement features (last 1 second)
        'shooter_speed': speed_mph,
        'is_moving': speed_mph > 3.0,  # threshold in mph

        # Positioning
        'in_paint': is_in_paint(shooter_pos),
        'corner_three': is_corner_three(shooter_pos),
        'distance_from_three_line': distance_to_three_line(shooter_pos)
    }

    return features
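The shooter features depend on geometry helpers the chapter does not define. Below is a minimal, hypothetical sketch of three of them, assuming court coordinates in feet on a 94 x 50 ft court with the left rim center at (5.25, 25). `speed_from_positions` is a simplified stand-in for `calculate_speed`: it takes a list of the shooter's per-frame (x, y) positions rather than a tracking DataFrame, and uses the 25 fps rate from the data description.

```python
import math

FPS = 25                          # frames per second, per the data description
BASKET_POS = (5.25, 25.0)         # assumed: left rim center in feet, 94 x 50 ft court
FT_PER_SEC_TO_MPH = 3600 / 5280   # 1 ft/s expressed in mph


def calculate_distance(p, q):
    """Euclidean distance in feet between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def calculate_angle(shooter_pos, basket_pos=BASKET_POS):
    """Degrees off the baseline-to-baseline axis: 0 = straight on, +/-90 = along the baseline."""
    dx = shooter_pos[0] - basket_pos[0]
    dy = shooter_pos[1] - basket_pos[1]
    return math.degrees(math.atan2(dy, dx))


def speed_from_positions(positions, window=1.0, fps=FPS):
    """Average speed in mph over the last `window` seconds of per-frame (x, y) positions."""
    n = max(1, int(round(window * fps)))
    recent = positions[-(n + 1):]   # n frame-to-frame steps need n + 1 frames
    path_ft = sum(calculate_distance(a, b) for a, b in zip(recent, recent[1:]))
    elapsed = (len(recent) - 1) / fps
    return (path_ft / elapsed) * FT_PER_SEC_TO_MPH if elapsed > 0 else 0.0
```

Measuring path length frame by frame, rather than straight-line displacement, keeps a shooter who drifts and stops inside the window from looking slower than they were.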

Defender Features

def extract_defender_features(tracking_df, shot_time, shooter_id):
    """
    Extract features about the nearest defenders (distances in feet).
    """
    shooter_pos = get_position(tracking_df, shot_time, shooter_id)

    # Find all defenders on the floor at shot time
    defenders = get_defensive_players(tracking_df, shot_time)

    # Distances from the shooter to every defender, nearest first
    distances = sorted(calculate_distance(shooter_pos, d['position'])
                       for d in defenders)

    features = {
        # Closest defender
        'closest_defender_dist': distances[0],

        # Second closest (help defense)
        'second_defender_dist': distances[1],

        # Number of defenders within range
        'defenders_within_4ft': sum(d < 4 for d in distances),
        'defenders_within_6ft': sum(d < 6 for d in distances),

        # Rim protection (closest_defender_to_rim assumed to take the
        # same arguments as any_defender_near_rim)
        'rim_protector_present': any_defender_near_rim(tracking_df, shot_time),
        'rim_protector_distance': closest_defender_to_rim(tracking_df, shot_time)
    }

    return features
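The proximity part of the defender features reduces to sorting distances. A minimal sketch, assuming positions are already extracted as (x, y) tuples in feet; `defender_proximity` is a hypothetical helper, and the infinity fallback for fewer than two tracked defenders is a design assumption not stated in the chapter.

```python
import math


def defender_proximity(shooter_pos, defender_positions, radii=(4, 6)):
    """Closest and second-closest defender distances plus counts within each radius (feet)."""
    dists = sorted(math.dist(shooter_pos, d) for d in defender_positions)
    features = {
        'closest_defender_dist': dists[0],
        # With fewer than two defenders tracked, fall back to infinity
        'second_defender_dist': dists[1] if len(dists) > 1 else math.inf,
    }
    for r in radii:
        features[f'defenders_within_{r}ft'] = sum(d < r for d in dists)
    return features
```

For example, `defender_proximity((0, 0), [(3, 4), (0, 2), (10, 0), (0, 5.5), (20, 20)])` reports a closest defender at 2 ft, a second at 5 ft, one defender within 4 ft, and three within 6 ft.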

Touch and Possession Features

def extract_possession_features(tracking_df, shot_time, shooter_id):
    """
    Features about how the shooter received the ball (times in seconds).
    """
    # Find when the shooter received the ball
    catch_time = find_catch_time(tracking_df, shot_time, shooter_id)
    touch_time = shot_time - catch_time

    features = {
        # Time since catch
        'touch_time': touch_time,
        'catch_and_shoot': touch_time < 0.5,

        # Dribbles between the catch and the shot
        'dribbles_before_shot': count_dribbles(tracking_df, catch_time, shot_time),

        # Type of pass received
        'received_from_drive': was_drive_kick(tracking_df, catch_time),
        'received_from_post': was_post_pass(tracking_df, catch_time)
    }

    return features
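`count_dribbles` is not defined in the chapter. One simple way to approximate it, assuming the tracking data includes ball height in feet for each frame between catch and release, is to count how often the ball drops below a near-floor threshold. The 1.0 ft threshold and the function below are hypothetical; a real implementation would also need to rule out bounce passes and loose balls.

```python
def count_dribbles(ball_heights, floor_threshold=1.0):
    """Count dribbles as downward crossings of `floor_threshold` (feet) in per-frame ball height."""
    if not ball_heights:
        return 0
    count = 0
    above = ball_heights[0] >= floor_threshold
    for z in ball_heights[1:]:
        now_above = z >= floor_threshold
        if above and not now_above:
            count += 1  # ball just dropped near the floor
        above = now_above
    return count
```

Counting only the downward crossing means a ball that lingers near the floor for several frames still registers as a single dribble.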

Part 3: Model Development

Dataset

  • Training: 2018-2020 seasons (~300,000 shots)
  • Validation: 2020-21 season (~80,000 shots)
  • Test: 2021-22 season (~80,000 shots)
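The split is by season rather than random, so the model is always evaluated on games played after the ones it was trained on. A minimal sketch, assuming each shot record is a dict with a `'season'` string in "YYYY-YY" form; the function and constant names are hypothetical, and everything earlier than the validation season is routed to training, so the caller is expected to pre-filter to the intended training window.

```python
VALIDATION_SEASON = '2020-21'
TEST_SEASON = '2021-22'


def split_by_season(shots):
    """Route shot records into train/validation/test by season; later seasons are dropped."""
    splits = {'train': [], 'validation': [], 'test': []}
    for shot in shots:
        season = shot['season']
        if season == TEST_SEASON:
            splits['test'].append(shot)
        elif season == VALIDATION_SEASON:
            splits['validation'].append(shot)
        elif season < VALIDATION_SEASON:  # "YYYY-YY" strings sort chronologically
            splits['train'].append(shot)
    return splits
```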

Feature List (Final)

Category             Features                            Count
-------------------  ----------------------------------  -----
Location             Distance, angle, zone                   8
Shooter movement     Speed, direction, settled               5
Defender proximity   Closest, second closest, contest        7
Help defense         Paint defenders, rim protection         4
Possession           Touch time, dribbles, pass type         6
Context              Time, score, quarter                    5
Total                                                       35

Model Selection

Comparing approaches:

Model                 Log Loss   AUC-ROC   Calibration
--------------------  --------   -------   -----------
Logistic Regression   0.612      0.682     Good
Random Forest         0.598      0.705     Fair
XGBoost               0.589      0.718     Good
Neural Network        0.592      0.712     Fair

Selected: XGBoost, which had the lowest log loss and highest AUC-ROC of the four models while remaining well calibrated.

Hyperparameter Tuning

best_params = {
    'max_depth': 6,
    'learning_rate': 0.05,
    'n_estimators': 300,
    'min_child_weight': 10,
    'subsample': 0.8,
    'colsample_bytree': 0.8
}

Part 4: Results

Overall Performance

  • Log Loss: 0.589 (vs. 0.652 for distance-only model)
  • AUC-ROC: 0.718 (vs. 0.659 for distance-only)
  • Calibration: Well-calibrated across probability range

Feature Importance

Rank   Feature                  Importance
----   ----------------------   ----------
1      distance_to_basket       0.182
2      closest_defender_dist    0.124
3      touch_time               0.098
4      shooter_speed            0.076
5      shot_angle               0.068
6      defenders_within_4ft     0.055
7      rim_protector_present    0.048
8      catch_and_shoot          0.042

Key Insights

  1. Defender distance matters most after location: A wide-open 3 is worth more than a contested one
  2. Touch time is predictive: Catch-and-shoot has ~5% higher make probability
  3. Movement is important: Shooters who are stationary at release make a higher share of their shots than moving shooters
  4. Rim protection deters attempts: Fewer shots near protected rim

Part 5: Applications

Shot Selection Analysis

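Calculate "Points Added" for each player. Because the model does not adjust for shooter skill, Points Added measures how many more (or fewer) points a player scored than the model expected from the same shots, so it reflects shot-making as well as shot selection: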

Points Added = sum(Actual Points - Expected Points per Shot)
Expected Points = P(Make) × Point Value
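The two formulas translate directly into code. A minimal sketch, assuming each shot is represented as a `(made, point_value, p_make)` tuple; the tuple layout and function names are illustrative, and free throws from and-ones are not included.

```python
def expected_points(p_make, point_value):
    """xPts for one shot: P(Make) x point value (2 or 3)."""
    return p_make * point_value


def points_added(shots):
    """Sum of (actual points - expected points) over a player's shots.

    Each shot is a (made: bool, point_value: int, p_make: float) tuple.
    """
    return sum((point_value if made else 0) - expected_points(p_make, point_value)
               for made, point_value, p_make in shots)
```

For instance, a made three with P(Make) = 0.4, a missed two with P(Make) = 0.5, and a made two with P(Make) = 0.6 produce 5 actual points against 3.4 expected, for +1.6 Points Added.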

Top Shot Selectors (2021-22):

  1. Stephen Curry: +145 points
  2. Kevin Durant: +112 points
  3. Devin Booker: +98 points

Worst Shot Selectors:

  1. Russell Westbrook: -89 points
  2. ...

Defensive Impact

Measure how defenders affect shot quality:

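Defensive Impact = Opponent xPts per shot with the player - Opponent xPts per shot without the player

Negative values mean the player lowers the quality of the shots opponents take.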

Best Shot Contest Defenders:

  1. Rudy Gobert: -0.08 xPts per shot contested
  2. Bam Adebayo: -0.06 xPts per shot contested
  3. Draymond Green: -0.05 xPts per shot contested
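The Defensive Impact formula is a difference of two averages. A minimal sketch, assuming each opponent shot is a `(x_pts, with_player)` tuple where the caller decides what "with the player" means (for example, on the court or the primary contester); the names are hypothetical.

```python
def defensive_impact(shots):
    """Mean opponent xPts per shot with the player minus without (negative = better defense).

    Each shot is an (x_pts: float, with_player: bool) tuple.
    """
    with_player = [x for x, on in shots if on]
    without_player = [x for x, on in shots if not on]
    return (sum(with_player) / len(with_player)
            - sum(without_player) / len(without_player))
```

Averaging per shot, rather than summing, keeps the measure comparable between players who face very different numbers of attempts.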

Real-Time Applications

  • Live shot quality display during broadcasts
  • Coaching feedback on shot selection
  • Player development tracking

Part 6: Limitations and Future Work

Limitations

  1. No shooter skill adjustment: The model assigns the same xPts to Curry and an average shooter taking an identical shot
  2. Limited pre-shot actions: The features don't capture how the play developed before the shot
  3. No fatigue/game state interaction: Shot-making may differ early vs. late in games
  4. Tracking data availability: Tracking data is not available at all levels of play

Future Improvements

  1. Add shooter-specific adjustments
  2. Incorporate play-type context
  3. Model shot fake and pump fake scenarios
  4. Include time-varying fatigue effects

Conclusion

Tracking data enables shot quality models that clearly outperform a distance-only baseline (log loss 0.589 vs. 0.652; AUC-ROC 0.718 vs. 0.659). Key factors include defender proximity, shooter movement, and touch time. These models have practical applications for shot selection analysis, defensive evaluation, and real-time broadcasting.

Exercises

Exercise 1

Calculate the expected points for the following shots using a simplified model:

  • 15-foot jumper, open (4+ feet from defender)
  • 15-foot jumper, contested (2 feet from defender)
  • 3-pointer, catch-and-shoot, open
  • 3-pointer, off-dribble, contested
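One way to set up Exercise 1 is a small logistic model in shot distance, closest-defender distance, and a catch-and-shoot flag. The coefficients below are made-up placeholders chosen only so the code runs and the effects point in the directions described in this chapter; they are not fitted to any data and will not reproduce the chapter's model, so replace them with your own estimates. The 6 ft cap on defender distance is also an assumption.

```python
import math

# Placeholder coefficients for illustration only; NOT fitted to any data.
COEF = {
    'intercept': 0.3,
    'distance_ft': -0.05,          # longer shots are harder
    'closest_defender_ft': 0.08,   # more space helps, capped at 6 ft below
    'catch_and_shoot': 0.15,
}


def p_make(distance_ft, defender_ft, catch_and_shoot):
    """Logistic P(Make) from the placeholder coefficients."""
    z = (COEF['intercept']
         + COEF['distance_ft'] * distance_ft
         + COEF['closest_defender_ft'] * min(defender_ft, 6.0)
         + COEF['catch_and_shoot'] * catch_and_shoot)
    return 1 / (1 + math.exp(-z))


def expected_points(distance_ft, defender_ft, catch_and_shoot, point_value):
    """xPts = P(Make) x point value."""
    return point_value * p_make(distance_ft, defender_ft, catch_and_shoot)
```

For the open 15-footer, call `expected_points(15, 4, 0, 2)`; for the 3-pointers, choose a distance behind the line and pass `catch_and_shoot=1` or `0`.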

Exercise 2

Design features to capture "shot difficulty" that aren't already included in the model.

Exercise 3

How would you modify this model for college basketball where tracking data is less available?