Chapter 26: Case Study 2 - Building a Draft Success Prediction Model with Gradient Boosting
Introduction
The NBA Draft presents a classic machine learning challenge: predict future success from limited historical data with high stakes. This case study walks through building a complete draft prediction model using gradient boosting, from feature engineering through deployment-ready evaluation.
Part 1: Problem Framing
Definition of Success
"Success" can be defined multiple ways. We chose a multi-target approach:
- Binary: Starter-level player (Top 150 in VORP over first 5 seasons)
- Continuous: Career Win Shares (through age 28 or 5 seasons)
- Categorical: Tier (All-Star, Starter, Rotation, Bust)
Data Collection
Training data: Draft classes 2005-2018 (sufficient career data) Test data: Draft classes 2019-2020 (holdout) Total prospects: 842 players with complete data
Part 2: Feature Engineering
Statistical Features (College)
college_features = {
# Volume stats (per 100 possessions)
'pts_per_100', 'reb_per_100', 'ast_per_100',
'stl_per_100', 'blk_per_100', 'tov_per_100',
# Efficiency
'ts_pct', 'efg_pct', 'ft_pct', 'three_pt_pct',
# Usage and role
'usage_rate', 'ast_ratio', 'tov_ratio',
# Advanced
'bpm', 'obpm', 'dbpm', 'ws_per_40'
}
Physical Features
physical_features = {
'height_no_shoes', 'weight', 'wingspan',
'standing_reach', 'body_fat_pct',
'max_vertical', 'lane_agility', 'three_quarter_sprint'
}
Derived Features
def engineer_draft_features(df):
# Wingspan advantage
df['wingspan_ratio'] = df['wingspan'] / df['height_no_shoes']
# Age adjustment (younger = more projection)
df['age_bonus'] = np.maximum(0, 22 - df['age'])
# Conference adjustment
df['conf_adj_pts'] = df['pts_per_100'] * df['conference_factor']
# Production vs draft position expectation
df['production_over_expected'] = df['bpm'] - df['expected_bpm_for_pick']
# Physical profile score
df['athletic_composite'] = (
df['max_vertical_pct'] * 0.35 +
df['agility_pct'] * 0.35 +
df['sprint_pct'] * 0.30
)
return df
Final feature count: 42 features
Part 3: Model Development
Train-Test Split
# Temporal split (no data leakage)
train = df[df['draft_year'] <= 2018] # 720 players
test = df[df['draft_year'].isin([2019, 2020])] # 122 players
Baseline Models
| Model | Accuracy (Starter Binary) | AUC-ROC |
|---|---|---|
| Logistic Regression | 68.2% | 0.71 |
| Random Forest | 71.5% | 0.74 |
| XGBoost | 73.8% | 0.77 |
| Gradient Boosting | 74.2% | 0.78 |
Hyperparameter Tuning
Using 5-fold cross-validation with time-based splits:
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
param_grid = {
'n_estimators': [100, 200, 300],
'max_depth': [3, 4, 5, 6],
'learning_rate': [0.05, 0.1, 0.15],
'min_samples_leaf': [5, 10, 20],
'subsample': [0.8, 0.9, 1.0]
}
# Time-based CV
tscv = TimeSeriesSplit(n_splits=5)
grid_search = GridSearchCV(
GradientBoostingClassifier(),
param_grid,
cv=tscv,
scoring='roc_auc'
)
Optimal parameters: - n_estimators: 200 - max_depth: 4 - learning_rate: 0.1 - min_samples_leaf: 10 - subsample: 0.9
Part 4: Model Evaluation
Classification Performance (Binary: Starter)
Confusion Matrix (Test Set):
Predicted
Starter Not-Starter
Actual Starter 28 12
Not-Starter 15 67
Metrics: - Accuracy: 77.9% - Precision: 65.1% - Recall: 70.0% - F1 Score: 67.5% - AUC-ROC: 0.79
Continuous Performance (Career Win Shares)
- MAE: 11.2 WS
- RMSE: 15.8 WS
- R-squared: 0.42
Tier Prediction (Multi-class)
| Tier | Precision | Recall | F1 |
|---|---|---|---|
| All-Star | 45% | 50% | 47% |
| Starter | 52% | 58% | 55% |
| Rotation | 61% | 55% | 58% |
| Bust | 72% | 68% | 70% |
The model is best at identifying busts, worst at identifying All-Stars (as expected given rarity).
Part 5: Feature Importance
Top 15 Features
| Rank | Feature | Importance |
|---|---|---|
| 1 | age_at_draft | 0.142 |
| 2 | bpm | 0.098 |
| 3 | ft_pct | 0.087 |
| 4 | wingspan_ratio | 0.072 |
| 5 | conf_adj_pts | 0.068 |
| 6 | ast_ratio | 0.055 |
| 7 | athletic_composite | 0.052 |
| 8 | usage_rate | 0.048 |
| 9 | ts_pct | 0.045 |
| 10 | standing_reach | 0.042 |
| 11 | games_played | 0.038 |
| 12 | stl_per_100 | 0.035 |
| 13 | max_vertical | 0.032 |
| 14 | tov_ratio | 0.028 |
| 15 | three_pt_pct | 0.025 |
Key Insights
- Age dominates: Younger producers are significantly more likely to succeed
- FT% matters more than 3PT%: Free throw shooting is a better predictor of NBA shooting
- Length (wingspan ratio) is crucial: Physical tools matter beyond just athleticism
- Playmaking indicators: Assist ratio predicts beyond just points
Part 6: Model Interpretation
SHAP Analysis
For a specific prediction (Zion Williamson, 2019):
| Feature | Value | SHAP Contribution |
|---|---|---|
| age_at_draft | 18.8 | +0.18 |
| bpm | +12.2 | +0.22 |
| wingspan_ratio | 1.04 | +0.08 |
| athletic_composite | 95th pct | +0.12 |
| ft_pct | 64.0% | -0.15 |
| conference | SEC (1.04) | +0.05 |
Prediction: 85% probability of starter, 42% probability of All-Star
Actual outcome: All-Star (model correctly identified high upside)
Partial Dependence
Key relationships discovered: - Age effect is strongly non-linear (drops sharply after 21) - FT% has threshold effect (below 70% is red flag) - BPM effect is approximately linear - Wingspan ratio has minimum threshold (~1.03)
Part 7: Deployment Considerations
Model Card
Model: GradientBoostingClassifier for NBA Draft Success Prediction Version: 1.0 Training Data: NCAA players drafted 2005-2018 Features: 42 (college stats, physical measurements, derived features) Outputs: Starter probability, tier prediction, projected Win Shares Limitations: International players excluded, recent rule changes may affect validity Fairness: Tested for bias across conferences (none detected) Update Schedule: Retrain annually with new outcomes
Production Pipeline
class DraftPredictionPipeline:
def __init__(self, model_path):
self.model = load_model(model_path)
self.scaler = load_scaler(model_path)
self.feature_pipeline = self._setup_feature_pipeline()
def predict(self, prospect_data):
# Feature engineering
features = self.feature_pipeline.transform(prospect_data)
# Scale
features_scaled = self.scaler.transform(features)
# Predict
starter_prob = self.model.predict_proba(features_scaled)[0, 1]
tier_pred = self.tier_model.predict(features_scaled)[0]
ws_pred = self.regression_model.predict(features_scaled)[0]
# Confidence interval
ws_lower = ws_pred - 1.96 * self.model_std_error
ws_upper = ws_pred + 1.96 * self.model_std_error
return {
'starter_probability': starter_prob,
'predicted_tier': tier_pred,
'projected_win_shares': ws_pred,
'ws_confidence_interval': (ws_lower, ws_upper)
}
Part 8: Lessons Learned
What Worked
- Age as dominant feature: Confirmed conventional wisdom with data
- FT% over 3PT%: Better predictor of NBA shooting
- Conference adjustment: Essential for fair comparison
- Gradient boosting: Handled non-linear relationships well
What Didn't Work
- Personality/character features: Couldn't reliably quantify
- Injury history: Insufficient data
- Team context: Hard to isolate individual contribution
Recommendations
- Use model as starting point, not final answer
- Weight human scouting for non-quantifiable traits
- Update model as more tracking data becomes available
- Account for uncertainty in all predictions
Conclusion
A gradient boosting model can meaningfully predict draft success, achieving 78% AUC-ROC on held-out draft classes. Key predictive features include age, production metrics, and physical measurements. However, the model should complement rather than replace traditional scouting, particularly for assessing intangible qualities the data cannot capture.
Exercises
Exercise 1
Implement the model and reproduce results on the 2019-2020 test set. Calculate all metrics.
Exercise 2
Add tracking data features (if available) and measure improvement.
Exercise 3
Build a "bust probability" model specifically and evaluate against draft position.
Exercise 4
Create an interactive tool that shows SHAP contributions for any prospect.