Case Study 2: Building an EPV Model from Tracking Data
Introduction
This case study walks through the process of building an Expected Possession Value model using player tracking data, from data preparation through model deployment.
Data Requirements
Tracking Data Structure
Each frame (25 per second) includes:
- Ball position (x, y, z)
- All 10 players' positions (x, y)
- Game clock, shot clock
- Event labels (pass, shot, turnover)
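As a minimal sketch, one frame could be held in a small record type like the one below. The class name `Frame`, its field names, and the helper `frames_in` are illustrative assumptions rather than any provider's actual schema; only the field contents and the 25-frames-per-second rate come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    """One tracking snapshot (hypothetical schema; field names are illustrative)."""
    ball: Tuple[float, float, float]    # ball (x, y, z)
    players: List[Tuple[float, float]]  # ten (x, y) player positions
    game_clock: float                   # seconds remaining in the period
    shot_clock: float                   # seconds remaining on the shot clock
    event: Optional[str] = None         # "pass", "shot", "turnover", or None

    def __post_init__(self) -> None:
        # Reject incomplete frames instead of guessing the missing positions.
        if len(self.players) != 10:
            raise ValueError(f"expected 10 players, got {len(self.players)}")


FRAMES_PER_SECOND = 25  # sampling rate stated in the text


def frames_in(seconds: float) -> int:
    """Number of frames that cover a span of game time."""
    return round(seconds * FRAMES_PER_SECOND)
```

At this rate a full 24-second shot clock spans `frames_in(24)` = 600 frames, which gives a sense of how many rows a single possession contributes.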
Feature Engineering
Spatial Features:
- Ball distance to basket
- Closest defender distance
- Spacing metrics (team spread)
- Lane penetration indicators
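The two distance features can be sketched in a few lines. This assumes the basket sits at the origin (as in the Technical Appendix) and that the closest-defender distance is measured from the ball's (x, y) position; the source does not say whether the reference point is the ball or the ball handler, so treat that choice as an assumption. The function names mirror the appendix feature names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ball_dist_basket(ball: Point, basket: Point = (0.0, 0.0)) -> float:
    """Euclidean distance from the ball to the basket (basket assumed at the origin)."""
    return math.dist(ball, basket)


def closest_def_dist(ball: Point, defenders: List[Point]) -> float:
    """Distance from the ball to the nearest defender (reference point is an assumption)."""
    if not defenders:
        raise ValueError("at least one defender position is required")
    return min(math.dist(ball, d) for d in defenders)
```

Both functions work on a single frame; applying them across a possession gives one value per frame.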
Temporal Features:
- Shot clock remaining
- Time in half-court
- Possession phase indicators
Contextual Features:
- Score differential
- Quarter
- Home/away
Model Architecture
Target Variable
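Two formulations are possible:
- Binary: did the possession score? The predicted scoring probability is then weighted by points.
- Continuous: points scored on the possession. The XGBRegressor setup in Step 2 uses this formulation.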
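Both targets can be illustrated with a short sketch. `frame_labels` builds the continuous target by giving every frame the points its possession eventually scored, and `expected_points` shows one reading of "weight by points": combining per-outcome probabilities into an expected value, the sum of k × P(points = k). Both function names and the input layout are assumptions made for illustration.

```python
from typing import Dict, List


def frame_labels(frame_possession_ids: List[int],
                 possession_points: Dict[int, int]) -> List[int]:
    """Continuous target: each frame inherits its possession's final point total."""
    return [possession_points[pid] for pid in frame_possession_ids]


def expected_points(outcome_probs: Dict[int, float]) -> float:
    """Expected value of a possession: sum over outcomes of points * probability."""
    total = sum(outcome_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(points * prob for points, prob in outcome_probs.items())
```

For example, a possession with a 50% chance of scoring nothing, 30% of scoring two, and 20% of scoring three is worth 0.3 × 2 + 0.2 × 3 = 1.2 expected points.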
Model Selection
Gradient Boosted Trees (XGBoost):
- Handles non-linear relationships
- Interpretable feature importances
- Fast inference for real-time applications
Neural Network Alternative:
- Can capture complex spatial patterns
- Requires more data
- Better suited to sequence modeling
Implementation Steps
Step 1: Data Preprocessing
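# Pseudocode for data preparation. The four helper functions are placeholders
# for project-specific code, not library calls.
features = extract_spatial_features(tracking_data)        # distances, spacing, lane penetration
features = add_temporal_features(features, game_data)     # shot clock, time in half-court, phase
features = add_contextual_features(features, game_state)  # score differential, quarter, home/away
labels = get_possession_outcomes(possession_data)         # outcome of each possession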
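Before training, the rows need to be split into training and test sets. Consecutive frames from the same possession are nearly identical, so a purely random row split would put near-duplicates on both sides and inflate out-of-sample scores. One way to avoid this, sketched below, is to split by game. The function `split_by_game` and the `game_ids` input (one game identifier per feature row) are assumptions and not part of the original pipeline.

```python
import random
from typing import List, Sequence, Tuple


def split_by_game(game_ids: Sequence[str], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) such that no game appears on both sides."""
    games = sorted(set(game_ids))
    if len(games) < 2:
        raise ValueError("need at least two games to form a split")
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(games)
    # Hold out at least one game, but never all of them.
    n_test = min(len(games) - 1, max(1, round(len(games) * test_fraction)))
    test_games = set(games[:n_test])
    train_idx = [i for i, g in enumerate(game_ids) if g not in test_games]
    test_idx = [i for i, g in enumerate(game_ids) if g in test_games]
    return train_idx, test_idx
```

The returned index lists can then be used to select the rows that become `X_train`, `y_train` and their test counterparts.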
Step 2: Model Training
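# Train a gradient boosted regressor on the continuous target (points per possession)
from xgboost import XGBRegressor

model = XGBRegressor(
    n_estimators=500,              # number of boosting rounds
    max_depth=8,                   # maximum depth of each tree
    learning_rate=0.05,            # shrinkage applied to each tree's contribution
    objective='reg:squarederror',  # squared-error regression loss
)
# X_train, y_train: training portion of the features and labels from Step 1
model.fit(X_train, y_train)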
Step 3: Validation
- Out-of-sample prediction error
- Calibration: predicted EPV should match realized points
- Feature importance analysis
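The first two checks can be sketched with plain Python. The source does not define its calibration error, so the version here is one common choice and should be read as an assumption: sort rows by predicted EPV, cut them into equal-count bins, and take the count-weighted mean absolute gap between mean predicted and mean realized points.

```python
import math
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in points."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of outcome variance explained, relative to predicting the mean."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def calibration_table(y_true: Sequence[float], y_pred: Sequence[float],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Per-bin (mean predicted, mean realized, count), bins ordered by prediction."""
    # Sort on the prediction only, so ties are not ordered by the outcome.
    pairs = sorted(zip(y_pred, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            rows.append((sum(p for p, _ in chunk) / len(chunk),
                         sum(t for _, t in chunk) / len(chunk),
                         len(chunk)))
    return rows


def calibration_error(y_true: Sequence[float], y_pred: Sequence[float],
                      n_bins: int = 5) -> float:
    """Count-weighted mean absolute gap between predicted and realized points."""
    rows = calibration_table(y_true, y_pred, n_bins)
    total = sum(count for _, _, count in rows)
    return sum(abs(p - t) * count for p, t, count in rows) / total
```

A model can be well calibrated and still have a modest R-squared: predicting the league-average value for every frame matches realized points on average but explains none of the variance, which is why both checks appear in the list above.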
Step 4: Deployment
- Real-time inference during games
- Historical analysis for player evaluation
- Coaching applications
Results and Interpretation
Model Performance
| Metric | Value |
|---|---|
| RMSE | 0.85 points |
| R-squared | 0.35 |
| Calibration Error | 0.02 |
Key Feature Importance
- Ball distance to basket (25%)
- Closest defender distance (20%)
- Shot clock remaining (15%)
- Team spacing (12%)
- Ball handler quality (10%)
Applications
Player Evaluation
- EPV Added per possession
- Decision quality vs. execution quality
- Role-specific EPV contributions
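The source does not spell out how EPV Added is computed. A minimal sketch under one plausible definition follows: credit each action with the change in model EPV from just before to just after it, sum those changes per player, and divide by the number of distinct possessions in which the player acted. The `Action` record and its fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Action(NamedTuple):
    """One on-ball action with the model's EPV before and after (hypothetical record)."""
    player: str
    possession_id: int
    epv_before: float
    epv_after: float


def epv_added_per_possession(actions: Iterable[Action]) -> Dict[str, float]:
    """Sum of EPV changes per player, divided by that player's distinct possessions."""
    total = defaultdict(float)
    possessions = defaultdict(set)
    for a in actions:
        total[a.player] += a.epv_after - a.epv_before
        possessions[a.player].add(a.possession_id)
    return {player: total[player] / len(possessions[player]) for player in total}
```

Separating decision quality from execution quality would need an extra input, such as the EPV of the best available alternative at the moment of each decision, which this sketch does not model.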
Game Strategy
- Optimal shot selection thresholds
- Play design evaluation
- Defensive scheme analysis
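A shot selection threshold can be sketched as a break-even comparison: a shot is worth taking when its expected points exceed the EPV of continuing the possession. This simplified version ignores fouls and offensive rebounds, and both function names are illustrative assumptions.

```python
def break_even_make_prob(shot_value: int, epv_if_continue: float) -> float:
    """Make probability at which shooting and continuing have equal expected value."""
    return epv_if_continue / shot_value


def should_shoot(make_prob: float, shot_value: int, epv_if_continue: float) -> bool:
    """True when the shot's expected points exceed the EPV of playing on."""
    return make_prob * shot_value > epv_if_continue
```

With a continuation EPV of 1.05, a three-pointer breaks even at a 35% make probability, while a two-pointer needs 52.5%.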
Conclusion
Building an EPV model requires significant data infrastructure and modeling expertise, but in return it provides a granular view of how value is created within a basketball possession.
Technical Appendix
Feature Definitions
| Feature | Description | Calculation |
|---|---|---|
| ball_dist_basket | Euclidean distance from ball to basket | sqrt(x^2 + y^2), with (x, y) the ball position and the basket at the origin |
| closest_def_dist | Minimum defender distance | min(dist to each defender) |
| team_spread | Convex hull area | Area of the convex hull enclosing the offensive players' positions |
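The team_spread calculation can be sketched without external libraries. The version below builds the convex hull with Andrew's monotone chain algorithm and measures its area with the shoelace formula. This is one standard way to compute a hull area, not necessarily the method used in the original pipeline.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would create a clockwise or straight turn.
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def team_spread(offense: List[Point]) -> float:
    """Area of the convex hull of the offensive players (shoelace formula)."""
    hull = convex_hull(offense)
    if len(hull) < 3:
        return 0.0  # fewer than three distinct, non-collinear points
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

Four players at the corners of a 10-by-10 square with a fifth in the middle give an area of 100 square units, since the interior player does not change the hull.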