Chapter 27: Case Study 1 - Building an Automated Shot Quality Model Using Tracking Data

Introduction

Shot quality models estimate the probability that a shot will be made based on various contextual factors. While simple models use only shot distance and type, advanced models leverage tracking data to incorporate defender positions, shooter movement, and other spatial features. This case study walks through building a comprehensive shot quality model using computer vision-derived tracking data.

Part 1: Problem Definition

Objective

Create a model that predicts P(Make) for any shot attempt, using:

Shooter position and movement
Defender positions and distances
Game context (score, time, etc.)

Data Sources

NBA tracking data (25 fps)
Play-by-play (shot outcomes)
Shot location data

Evaluation Metrics

Log Loss (overall probabilistic accuracy; penalizes poor calibration)
AUC-ROC (discrimination)
Calibration curves
Points Added vs. baseline
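The three numeric metrics above can be computed directly from predicted probabilities and observed outcomes. The sketch below uses only the standard library; the function names are illustrative and not from any particular package. The pairwise AUC loop is O(n²), so it is only suitable for small samples.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def auc_roc(y_true, p_pred):
    """Share of (make, miss) pairs where the make got the higher P(Make); ties count 0.5."""
    makes = [p for y, p in zip(y_true, p_pred) if y == 1]
    misses = [p for y, p in zip(y_true, p_pred) if y == 0]
    score = sum(1.0 if m > n else 0.5 if m == n else 0.0
                for m in makes for n in misses)
    return score / (len(makes) * len(misses))


def calibration_table(y_true, p_pred, n_bins=10):
    """(mean predicted P(Make), observed make rate, shot count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    return [
        (sum(p for _, p in b) / len(b), sum(y for y, _ in b) / len(b), len(b))
        for b in bins if b
    ]
```

A well-calibrated model has mean predicted P(Make) close to the observed make rate in every bin. For the full shot dataset, scikit-learn's `log_loss`, `roc_auc_score`, and `calibration_curve` cover the same ground more efficiently.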

Part 2: Feature Engineering from Tracking Data

Shooter Features

def extract_shooter_features(tracking_df, shot_time, shooter_id):
    """
    Extract features about the shooter at shot time.
    """
    # Get shooter position at shot time
    shooter_pos = get_position(tracking_df, shot_time, shooter_id)

    # Speed over the last 1 second, computed once and reused for is_moving
    shooter_speed = calculate_speed(tracking_df, shooter_id, shot_time, window=1.0)

    features = {
        # Location features
        'distance_to_basket': calculate_distance(shooter_pos, BASKET_POS),
        'shot_angle': calculate_angle(shooter_pos, BASKET_POS),

        # Movement features (last 1 second)
        'shooter_speed': shooter_speed,
        'is_moving': shooter_speed > 3.0,  # threshold in mph

        # Positioning
        'in_paint': is_in_paint(shooter_pos),
        'corner_three': is_corner_three(shooter_pos),
        'distance_from_three_line': distance_to_three_line(shooter_pos)
    }

    return features
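The helpers called above are not defined in this chapter. As a hypothetical illustration, here are simplified versions of several of them, assuming court coordinates in feet with the baseline at x = 0 and y running across the 50 ft width. The speed helper takes a plain list of (t, x, y) samples rather than the `tracking_df` signature used above, and the paint and corner-three checks are approximations based on NBA court dimensions.

```python
import math

# Hypothetical coordinate system: feet, x measured from the baseline toward
# half court, y across the 50 ft court width. The NBA rim center sits
# 5.25 ft in from the baseline at mid-width.
BASKET_POS = (5.25, 25.0)
FT_PER_SEC_TO_MPH = 3600 / 5280  # 88 ft/s == 60 mph


def calculate_distance(p, q):
    """Euclidean distance in feet between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def calculate_angle(shooter_pos, basket_pos=BASKET_POS):
    """Angle in degrees: 0 = straight on, +/-90 = parallel to the baseline."""
    dx = shooter_pos[0] - basket_pos[0]
    dy = shooter_pos[1] - basket_pos[1]
    return math.degrees(math.atan2(dy, dx))


def calculate_speed(samples, shot_time, window=1.0):
    """Average speed in mph over the `window` seconds ending at shot_time.

    `samples` is a time-sorted list of (t, x, y) tuples for a single player.
    """
    recent = [s for s in samples if shot_time - window <= s[0] <= shot_time]
    if len(recent) < 2:
        return 0.0
    path = sum(calculate_distance(a[1:], b[1:]) for a, b in zip(recent, recent[1:]))
    elapsed = recent[-1][0] - recent[0][0]
    return path / elapsed * FT_PER_SEC_TO_MPH if elapsed > 0 else 0.0


def is_in_paint(pos):
    """NBA lane: 16 ft wide, extending 19 ft from the baseline."""
    return pos[0] <= 19.0 and abs(pos[1] - BASKET_POS[1]) <= 8.0


def is_corner_three(pos):
    """Approximation: at least 22 ft to the side, within 14 ft of the baseline."""
    return pos[0] <= 14.0 and abs(pos[1] - BASKET_POS[1]) >= 22.0
```

Summing frame-to-frame distances (rather than taking start-to-end displacement) means a shooter who relocates and stops still registers as having moved during the window.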

Defender Features

def extract_defender_features(tracking_df, shot_time, shooter_id):
    """
    Extract features about nearest defenders.
    """
    shooter_pos = get_position(tracking_df, shot_time, shooter_id)

    # Find all defenders
    defenders = get_defensive_players(tracking_df, shot_time)

    # Distances to all defenders, sorted closest first
    distances = sorted(calculate_distance(shooter_pos, d['position'])
                       for d in defenders)

    features = {
        # Closest defender
        'closest_defender_dist': distances[0],

        # Second closest (help defense)
        'second_defender_dist': distances[1],

        # Number of defenders within range
        'defenders_within_4ft': sum(d < 4 for d in distances),
        'defenders_within_6ft': sum(d < 6 for d in distances),

        # Rim protection
        'rim_protector_present': any_defender_near_rim(tracking_df, shot_time),
        'rim_protector_distance': closest_defender_to_rim(...)
    }

    return features
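A self-contained version of the distance features, assuming defender positions are already available as (x, y) tuples in feet. The 6 ft rim-protector cutoff is a hypothetical choice (the chapter does not specify one), and this sketch simply measures the defender closest to the rim center, which may also be the contesting defender on shots near the basket.

```python
import math


def defender_features(shooter_pos, defender_positions, basket_pos=(5.25, 25.0),
                      rim_radius=6.0):
    """Distance-based defender features from (x, y) positions in feet.

    rim_radius is a hypothetical cutoff for counting a defender as a rim
    protector.
    """
    dists = sorted(math.dist(shooter_pos, d) for d in defender_positions)
    closest_to_rim = min(math.dist(basket_pos, d) for d in defender_positions)
    return {
        "closest_defender_dist": dists[0],
        "second_defender_dist": dists[1],
        "defenders_within_4ft": sum(d < 4 for d in dists),
        "defenders_within_6ft": sum(d < 6 for d in dists),
        "rim_protector_present": closest_to_rim <= rim_radius,
        "rim_protector_distance": closest_to_rim,
    }
```

Usage: `defender_features((20, 25), [(23, 25), (20, 30), (6, 25), (40, 10), (35, 40)])` reports a closest defender at 3 ft, a second at 5 ft, and a rim protector 0.75 ft from the rim center.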

Touch and Possession Features

def extract_possession_features(tracking_df, shot_time, shooter_id):
    """
    Features about how shooter received the ball.
    """
    # Find when shooter received ball
    catch_time = find_catch_time(tracking_df, shot_time, shooter_id)

    touch_time = shot_time - catch_time

    features = {
        # Time since catch
        'touch_time': touch_time,
        'catch_and_shoot': touch_time < 0.5,

        # Dribbles before shot
        'dribbles_before_shot': count_dribbles(tracking_df, catch_time, shot_time),

        # Pass type received
        'received_from_drive': was_drive_kick(tracking_df, catch_time),
        'received_from_post': was_post_pass(tracking_df, catch_time)
    }

    return features
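`find_catch_time` and `count_dribbles` are also left undefined above. One possible approach is sketched here under explicit assumptions: each frame is a hypothetical (t, holder_id, ball_z) tuple, possession stays attributed to the ball handler while the ball bounces, and a dribble is counted each time the ball height drops below an assumed 1 ft threshold.

```python
def find_catch_time(frames, shot_time, shooter_id):
    """Start time of the shooter's final uninterrupted possession before the shot.

    `frames` is a time-sorted list of (t, holder_id, ball_z) tuples; holder_id
    is None while the ball is loose or in the air.
    """
    catch_time = shot_time
    for t, holder, _ in reversed(frames):
        if t > shot_time:
            continue  # ignore frames after the release
        if holder != shooter_id:
            break  # reached the pass (or loose ball) before the catch
        catch_time = t
    return catch_time


def count_dribbles(frames, catch_time, shot_time, bounce_height=1.0):
    """Count downward crossings of ball height through bounce_height (feet)."""
    heights = [z for t, _, z in frames if catch_time <= t <= shot_time]
    return sum(1 for prev, cur in zip(heights, heights[1:])
               if prev >= bounce_height > cur)
```

Real tracking feeds sample at 25 fps, so a bounce spans several frames; counting threshold crossings rather than individual low frames keeps each bounce from being counted more than once.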

Part 3: Model Development

Dataset

Training: 2018-2020 seasons (~300,000 shots)
Validation: 2020-21 season (~80,000 shots)
Test: 2021-22 season (~80,000 shots)
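Splitting by whole seasons, rather than shuffling individual shots, keeps every validation and test shot unseen during training. A minimal sketch, assuming each shot record carries a `season` label; the mapping reads "2018-2020" as the 2018-19 and 2019-20 seasons, which is an assumption.

```python
from collections import defaultdict


def split_by_season(shots, season_to_split):
    """Assign whole seasons to train/validation/test instead of shuffling shots.

    `shots` are dicts with a 'season' key; seasons missing from
    `season_to_split` are dropped.
    """
    splits = defaultdict(list)
    for shot in shots:
        split = season_to_split.get(shot["season"])
        if split is not None:
            splits[split].append(shot)
    return dict(splits)


# Hypothetical season labels for the split described above.
SEASON_TO_SPLIT = {
    "2018-19": "train",
    "2019-20": "train",
    "2020-21": "validation",
    "2021-22": "test",
}
```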

Feature List (Final)

Category	Features	Count
Location	Distance, angle, zone	8
Shooter movement	Speed, direction, settled	5
Defender proximity	Closest, second closest, contest	7
Help defense	Paint defenders, rim protection	4
Possession	Touch time, dribbles, pass type	6
Context	Time, score, quarter	5
Total		35

Model Selection

Comparing approaches:

Model	Log Loss	AUC-ROC	Calibration
Logistic Regression	0.612	0.682	Good
Random Forest	0.598	0.705	Fair
XGBoost	0.589	0.718	Good
Neural Network	0.592	0.712	Fair

Selected: XGBoost, which had the lowest log loss and highest AUC-ROC of the four models while keeping good calibration.

Hyperparameter Tuning

best_params = {
    'max_depth': 6,
    'learning_rate': 0.05,
    'n_estimators': 300,
    'min_child_weight': 10,
    'subsample': 0.8,
    'colsample_bytree': 0.8
}

Part 4: Results

Overall Performance

Log Loss: 0.589 (vs. 0.652 for distance-only model)
AUC-ROC: 0.718 (vs. 0.659 for distance-only)
Calibration: Well-calibrated across probability range

Feature Importance

Rank	Feature	Importance
1	distance_to_basket	0.182
2	closest_defender_dist	0.124
3	touch_time	0.098
4	shooter_speed	0.076
5	shot_angle	0.068
6	defenders_within_4ft	0.055
7	rim_protector_present	0.048
8	catch_and_shoot	0.042

Key Insights

Defender distance matters most after location: A wide-open 3 is worth more than a contested one
Touch time is predictive: Catch-and-shoot has ~5% higher make probability
Movement is important: Stationary shooters make their shots at a higher rate
Rim protection deters attempts: Fewer shots near protected rim

Part 5: Applications

Shot Selection Analysis

Calculate "Points Added" for each player, the points scored above what the model expected from the shots that player took:

Points Added = sum over a player's shots of (Actual Points - Expected Points)
Expected Points = P(Make) × Point Value
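The two formulas translate directly into code. In this sketch the shot record fields (`shooter`, `p_make`, `point_value`, `made`) are hypothetical names, and free throws and and-one bonuses are ignored.

```python
from collections import defaultdict


def expected_points(p_make, point_value):
    """Expected Points = P(Make) x Point Value."""
    return p_make * point_value


def points_added(shots):
    """Sum of (actual points - expected points) for each shooter."""
    totals = defaultdict(float)
    for s in shots:
        actual = s["point_value"] if s["made"] else 0
        totals[s["shooter"]] += actual - expected_points(s["p_make"], s["point_value"])
    return dict(totals)
```

For example, a made three with P(Make) = 0.40 adds 3 - 1.2 = +1.8 points, while a missed two with P(Make) = 0.50 subtracts 1.0.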

Highest Points Added (2021-22):

1. Stephen Curry: +145 points
2. Kevin Durant: +112 points
3. Devin Booker: +98 points

Lowest Points Added:

1. Russell Westbrook: -89 points
2. ...

Defensive Impact

Measure how defenders affect shot quality:

Defensive Impact = Opponent xPts with Player - Opponent xPts without
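A minimal sketch of the with/without comparison. The source does not say how "with" is determined (on court, or the player being the contesting defender); here each shot simply lists the defender ids that count as "with", and both sides are averaged per shot to match the per-shot units reported. Field names are hypothetical.

```python
def defensive_impact(shots, defender_id):
    """Opponent xPts per shot with the defender minus without.

    Each shot dict has 'xpts' (P(Make) x point value) and 'defenders' (a set
    of ids counted as 'with'). Negative values mean opponents got worse shots.
    Returns None if either group is empty.
    """
    with_p = [s["xpts"] for s in shots if defender_id in s["defenders"]]
    without = [s["xpts"] for s in shots if defender_id not in s["defenders"]]
    if not with_p or not without:
        return None
    return sum(with_p) / len(with_p) - sum(without) / len(without)
```

Averaging per shot, rather than summing, keeps defenders with very different playing time comparable.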

Best Shot Contest Defenders:

1. Rudy Gobert: -0.08 xPts per shot contested
2. Bam Adebayo: -0.06 xPts per shot contested
3. Draymond Green: -0.05 xPts per shot contested

Real-Time Applications

Live shot quality display during broadcasts
Coaching feedback on shot selection
Player development tracking

Part 6: Limitations and Future Work

Limitations

No shooter skill adjustment: The model assigns the same xPts to Curry and to an average shooter
Limited pre-shot actions: The model does not capture how the full play developed
No fatigue/game state interaction: Shots early and late in games may behave differently
Tracking data availability: Tracking data is not available at all levels of play

Future Improvements

Add shooter-specific adjustments
Incorporate play-type context
Model shot fake and pump fake scenarios
Include time-varying fatigue effects

Conclusion

Tracking data enables shot quality models that clearly outperform simpler baselines: the XGBoost model reached a log loss of 0.589 and an AUC-ROC of 0.718, versus 0.652 and 0.659 for the distance-only model. Key factors include defender proximity, shooter movement, and touch time. These models have practical applications for shot selection analysis, defensive evaluation, and real-time broadcasting.

Exercises

Exercise 1

Calculate the expected points for the following shots using a simplified model:

15-foot jumper, open (4+ feet from defender)
15-foot jumper, contested (2 feet from defender)
3-pointer, catch-and-shoot, open
3-pointer, off-dribble, contested

Exercise 2

Design features to capture "shot difficulty" that aren't already included in the model.

Exercise 3

How would you modify this model for college basketball where tracking data is less available?