Case Study 2: Building an EPV Model from Tracking Data

Introduction

This case study walks through the process of building an Expected Possession Value model using player tracking data, from data preparation through model deployment.

Data Requirements

Tracking Data Structure

Each frame (25 per second) includes:

  • Ball position (x, y, z)
  • All 10 players' positions (x, y)
  • Game clock, shot clock
  • Event labels (pass, shot, turnover)
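One way to hold a frame in memory is sketched below. The field names, the split of the 10 players into offensive and defensive lists, and the helper frames_to_seconds are illustrative assumptions, not a schema from any tracking vendor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FRAMES_PER_SECOND = 25  # sampling rate stated for the tracking feed


@dataclass
class Frame:
    """One tracking frame; field names are illustrative, not a vendor schema."""
    ball_xyz: Tuple[float, float, float]    # ball position (x, y, z)
    offense_xy: List[Tuple[float, float]]   # offensive players' (x, y) positions
    defense_xy: List[Tuple[float, float]]   # defensive players' (x, y) positions
    game_clock: float                       # seconds remaining in the period
    shot_clock: float                       # seconds remaining on the shot clock
    event: Optional[str] = None             # "pass", "shot", "turnover", or None

    def __post_init__(self) -> None:
        # Every frame is expected to carry positions for all 10 players.
        if len(self.offense_xy) + len(self.defense_xy) != 10:
            raise ValueError("expected positions for all 10 players")


def frames_to_seconds(n_frames: int) -> float:
    """Convert a frame count to elapsed seconds at the stated sampling rate."""
    return n_frames / FRAMES_PER_SECOND
```

Validating the player count at construction time is a design choice that surfaces dropped players early, before they silently distort spacing or defender-distance features.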

Feature Engineering

Spatial Features:

  • Ball distance to basket
  • Closest defender distance
  • Spacing metrics (team spread)
  • Lane penetration indicators

Temporal Features:

  • Shot clock remaining
  • Time in half-court
  • Possession phase indicators

Contextual Features:

  • Score differential
  • Quarter
  • Home/away
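The temporal features can be derived directly from the per-frame series. The sketch below is a minimal illustration under stated assumptions: the function name temporal_features, the half_court_x parameter, and the convention that the offense attacks toward smaller x are all hypothetical and would need to match the actual coordinate system.

```python
from typing import Dict, List

FRAMES_PER_SECOND = 25  # sampling rate of the tracking feed


def temporal_features(ball_x: List[float], shot_clock: List[float],
                      half_court_x: float) -> Dict[str, float]:
    """Temporal features for the latest frame of a possession.

    ball_x and shot_clock hold one value per frame observed so far.
    Assumed convention: the offense attacks toward smaller x, so the ball
    is in the half-court whenever ball_x < half_court_x.
    """
    frames_in_half_court = sum(1 for x in ball_x if x < half_court_x)
    return {
        "shot_clock_remaining": shot_clock[-1],
        "time_in_half_court": frames_in_half_court / FRAMES_PER_SECOND,
        # Simple phase flag: 1.0 while the ball is in the half-court.
        "in_half_court_phase": 1.0 if ball_x[-1] < half_court_x else 0.0,
    }


# Made-up series: five frames, the last three past the half-court line.
example = temporal_features(
    ball_x=[60.0, 50.0, 40.0, 30.0, 20.0],
    shot_clock=[24.0, 23.96, 23.92, 23.88, 23.84],
    half_court_x=47.0,
)
```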

Model Architecture

Target Variable

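Binary outcome: did the possession score? The predicted scoring probability is then weighted by the points scored to give an expected value.

Or continuous: points scored on the possession. The training code in Step 2 uses this form, fitting a regressor with squared-error loss.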
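The two target definitions can be related with a small worked example. The sketch below takes one reading of "weight by points" (scoring probability times average points on scoring possessions) and shows that it recovers the same expected value as the continuous target. The function names and the possession outcomes are made up for illustration.

```python
from typing import List, Tuple


def make_targets(points: List[int]) -> Tuple[List[int], List[float]]:
    """Return (binary scored flag, continuous points) for each possession."""
    scored = [1 if p > 0 else 0 for p in points]
    return scored, [float(p) for p in points]


def expected_points_from_binary(points: List[int]) -> float:
    """P(score) times the mean points on scoring possessions."""
    scored, _ = make_targets(points)
    n_scored = sum(scored)
    if n_scored == 0:
        return 0.0
    p_score = n_scored / len(points)
    mean_points_if_scored = sum(points) / n_scored
    return p_score * mean_points_if_scored


possession_points = [0, 2, 3, 0, 2, 0, 1, 0]  # made-up outcomes
binary_route = expected_points_from_binary(possession_points)
continuous_route = sum(possession_points) / len(possession_points)
```

On aggregate data the two routes agree by construction. They can differ once a model is involved, since a classifier and a regressor make different errors.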

Model Selection

Gradient Boosted Trees (XGBoost):

  • Handles non-linear relationships
  • Feature importance interpretable
  • Fast inference for real-time applications

Neural Network Alternative:

  • Can capture complex spatial patterns
  • Requires more data
  • Better for sequence modeling

Implementation Steps

Step 1: Data Preprocessing

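# Pseudocode for data preparation; the helper functions are placeholders
# Spatial, temporal, and contextual features are added in turn
features = extract_spatial_features(tracking_data)
features = add_temporal_features(features, game_data)
features = add_contextual_features(features, game_state)
# One outcome label per possession, to be aligned with the feature rows
labels = get_possession_outcomes(possession_data)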
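The training step in Step 2 expects X_train and y_train, but the case study does not say how the split is made. One reasonable choice, sketched below as an assumption, is to split by game so that highly correlated frames from the same game never appear on both sides. The function split_by_game and the game identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_by_game(game_ids: Sequence[str], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) row indices with no game in both sets."""
    unique_games = sorted(set(game_ids))
    rng = random.Random(seed)          # seeded for a reproducible split
    rng.shuffle(unique_games)
    n_test = max(1, round(len(unique_games) * test_fraction))
    test_games = set(unique_games[:n_test])
    train_idx = [i for i, g in enumerate(game_ids) if g not in test_games]
    test_idx = [i for i, g in enumerate(game_ids) if g in test_games]
    return train_idx, test_idx


# Made-up example: two feature rows per game, five games.
game_ids = ["g1", "g1", "g2", "g2", "g3", "g3", "g4", "g4", "g5", "g5"]
train_idx, test_idx = split_by_game(game_ids, test_fraction=0.2, seed=0)
```

A random row-level split would likely overstate out-of-sample accuracy, because consecutive frames 1/25 of a second apart are nearly identical.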

Step 2: Model Training

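# Train a gradient boosted regressor on points scored per possession
from xgboost import XGBRegressor

# X_train, y_train: features and labels from Step 1, training portion only
model = XGBRegressor(
    n_estimators=500,              # number of boosting rounds
    max_depth=8,                   # maximum depth of each tree
    learning_rate=0.05,            # shrinkage applied to each round
    objective='reg:squarederror'   # squared-error loss on points scored
)
model.fit(X_train, y_train)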

Step 3: Validation

  • Out-of-sample prediction error
  • Calibration: predicted EPV should match realized points
  • Feature importance analysis
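The first two checks can be sketched in a few lines. The binned calibration error below is one common definition (count-weighted mean absolute gap between predicted and realized points per bin); it is an assumption and not necessarily the definition behind the figures reported in this case study. All function names are illustrative.

```python
import math
from typing import List, Tuple

Bin = Tuple[float, float, int]  # (mean predicted, mean realized, count)


def rmse(predicted: List[float], actual: List[float]) -> float:
    """Root mean squared error between predicted EPV and realized points."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def calibration_bins(predicted: List[float], actual: List[float],
                     n_bins: int = 5) -> List[Bin]:
    """Sort by prediction and summarize equal-count bins."""
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    bins: List[Bin] = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than rows
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_actual = sum(a for _, a in chunk) / len(chunk)
        bins.append((mean_pred, mean_actual, len(chunk)))
    return bins


def calibration_error(bins: List[Bin]) -> float:
    """Count-weighted mean absolute gap between predicted and realized."""
    total = sum(count for _, _, count in bins)
    return sum(count * abs(mp - ma) for mp, ma, count in bins) / total


# Made-up predictions and outcomes.
preds = [1.0, 1.0, 1.0, 1.0]
points = [0.0, 2.0, 0.0, 2.0]
error = rmse(preds, points)
overall_gap = calibration_error(calibration_bins(preds, points, n_bins=1))
```

The example illustrates why both metrics matter: the predictions are perfectly calibrated on average while the per-possession error remains large, which is typical for a noisy outcome like points on a single possession.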

Step 4: Deployment

  • Real-time inference during games
  • Historical analysis for player evaluation
  • Coaching applications

Results and Interpretation

Model Performance

Metric            | Value
RMSE              | 0.85 points
R-squared         | 0.35
Calibration Error | 0.02

Key Feature Importance

  1. Ball distance to basket (25%)
  2. Closest defender distance (20%)
  3. Shot clock remaining (15%)
  4. Team spacing (12%)
  5. Ball handler quality (10%)

Applications

Player Evaluation

  • EPV Added per possession
  • Decision quality vs. execution quality
  • Role-specific EPV contributions
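The case study does not define EPV Added, so the sketch below assumes one plausible definition: each action is credited with the change in model EPV from just before to just after it, and a player's credits are averaged over the possessions in which that player acted. The Action layout, player labels, and EPV values are all made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (player_id, epv_before_action, epv_after_action); hypothetical layout.
Action = Tuple[str, float, float]


def epv_added_per_possession(possessions: List[List[Action]]) -> Dict[str, float]:
    """Average EPV change credited to each player per possession acted in."""
    total_added: Dict[str, float] = defaultdict(float)
    possessions_involved: Dict[str, int] = defaultdict(int)
    for actions in possessions:
        # Count each player once per possession, however many actions they take.
        for player_id in {player_id for player_id, _, _ in actions}:
            possessions_involved[player_id] += 1
        for player_id, before, after in actions:
            total_added[player_id] += after - before
    return {p: total_added[p] / possessions_involved[p] for p in total_added}


# Two made-up possessions.
example_possessions = [
    [("A", 1.0, 1.2), ("B", 1.2, 1.1)],
    [("A", 0.9, 1.3)],
]
added = epv_added_per_possession(example_possessions)
```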

Game Strategy

  • Optimal shot selection thresholds
  • Play design evaluation
  • Defensive scheme analysis

Conclusion

Building an EPV model requires significant data infrastructure and modeling expertise, but provides powerful insights into basketball value creation at a granular level.


Technical Appendix

Feature Definitions

Feature          | Description               | Calculation
ball_dist_basket | Euclidean distance        | sqrt((x-0)^2 + (y-0)^2)
closest_def_dist | Minimum defender distance | min(dist to each defender)
team_spread      | Convex hull area          | Area of polygon enclosing offensive players
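The three definitions above can be sketched with the standard library alone. The sketch assumes the basket sits at the origin (as the distance formula implies) and that defender distance is measured from the ball; the hull uses Andrew's monotone chain and the area uses the shoelace formula, which are swapped in here as standard techniques rather than confirmed details of the original model.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ball_dist_basket(ball: Point) -> float:
    """Euclidean distance from the ball to the basket at (0, 0)."""
    return math.hypot(ball[0], ball[1])


def closest_def_dist(ball: Point, defenders: List[Point]) -> float:
    """Distance from the ball to the nearest defender (assumed reference point)."""
    return min(math.dist(ball, d) for d in defenders)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def team_spread(offense: List[Point]) -> float:
    """Area of the convex hull of the offensive players (shoelace formula)."""
    hull = convex_hull(offense)
    n = len(hull)
    if n < 3:
        return 0.0
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Made-up positions: four players on a 4 x 4 square, one inside it.
offense = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (2.0, 2.0)]
spread = team_spread(offense)
```

The interior player does not change the spread, which is the intended behavior of a hull-based spacing metric: it measures the area the offense stretches over, not how players are distributed inside it.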