Chapter 26: Case Study 2 - Building a Draft Success Prediction Model with Gradient Boosting

Introduction

The NBA Draft presents a classic machine learning challenge: predict future success from limited historical data with high stakes. This case study walks through building a complete draft prediction model using gradient boosting, from feature engineering through deployment-ready evaluation.

Part 1: Problem Framing

Definition of Success

"Success" can be defined multiple ways. We chose a multi-target approach:

  1. Binary: Starter-level player (Top 150 in VORP over first 5 seasons)
  2. Continuous: Career Win Shares (through age 28 or 5 seasons)
  3. Categorical: Tier (All-Star, Starter, Rotation, Bust)
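The three targets above can all be derived from the same career-outcome table. A minimal labeling sketch in plain Python; the tier cutoffs and column names (`vorp_rank_5yr`, `career_ws`, `all_star`) here are illustrative assumptions, not the chapter's exact rules:

```python
def label_prospect(vorp_rank_5yr, career_ws, all_star):
    """Derive the three training targets for one drafted player.

    vorp_rank_5yr: league-wide VORP rank over first 5 seasons
    career_ws: Win Shares through age 28 or 5 seasons
    all_star: whether the player made an All-Star team
    """
    is_starter = vorp_rank_5yr <= 150  # Binary target: Top 150 in VORP

    # Categorical tier (cutoffs are illustrative)
    if all_star:
        tier = 'All-Star'
    elif is_starter:
        tier = 'Starter'
    elif career_ws >= 5.0:
        tier = 'Rotation'
    else:
        tier = 'Bust'

    return {'starter': is_starter, 'win_shares': career_ws, 'tier': tier}
```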

Data Collection

Training data: Draft classes 2005-2018 (sufficient career data)
Test data: Draft classes 2019-2020 (holdout)
Total prospects: 842 players with complete data

Part 2: Feature Engineering

Statistical Features (College)

college_features = {
    # Volume stats (per 100 possessions)
    'pts_per_100', 'reb_per_100', 'ast_per_100',
    'stl_per_100', 'blk_per_100', 'tov_per_100',

    # Efficiency
    'ts_pct', 'efg_pct', 'ft_pct', 'three_pt_pct',

    # Usage and role
    'usage_rate', 'ast_ratio', 'tov_ratio',

    # Advanced
    'bpm', 'obpm', 'dbpm', 'ws_per_40'
}

Physical Features

physical_features = {
    'height_no_shoes', 'weight', 'wingspan',
    'standing_reach', 'body_fat_pct',
    'max_vertical', 'lane_agility', 'three_quarter_sprint'
}

Derived Features

import numpy as np

def engineer_draft_features(df):
    # Wingspan advantage: wingspan relative to barefoot height
    df['wingspan_ratio'] = df['wingspan'] / df['height_no_shoes']

    # Age adjustment (younger = more projection)
    df['age_bonus'] = np.maximum(0, 22 - df['age'])

    # Conference adjustment (conference_factor is precomputed per conference)
    df['conf_adj_pts'] = df['pts_per_100'] * df['conference_factor']

    # Production vs draft position expectation
    df['production_over_expected'] = df['bpm'] - df['expected_bpm_for_pick']

    # Physical profile score (percentile inputs; weights sum to 1.0)
    df['athletic_composite'] = (
        df['max_vertical_pct'] * 0.35 +
        df['agility_pct'] * 0.35 +
        df['sprint_pct'] * 0.30
    )

    return df

Final feature count: 42 features

Part 3: Model Development

Train-Test Split

# Temporal split (no data leakage)
train = df[df['draft_year'] <= 2018]  # 720 players
test = df[df['draft_year'].isin([2019, 2020])]  # 122 players

Baseline Models

Model                Accuracy (Starter Binary)  AUC-ROC
Logistic Regression  68.2%                      0.71
Random Forest        71.5%                      0.74
XGBoost              73.8%                      0.77
Gradient Boosting    74.2%                      0.78
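A comparison table like this comes from one small loop over candidate models. A sketch on synthetic data standing in for the real 42-feature matrix; note it uses a random split for brevity, whereas the chapter's actual evaluation uses the temporal split shown above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in for the real prospect matrix (842 players x 42 features)
X, y = make_classification(n_samples=842, n_features=42, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=42)

baselines = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Random Forest': RandomForestClassifier(random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in baselines.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    results[name] = (acc, auc)
    print(f'{name:20s} acc={acc:.3f} auc={auc:.3f}')
```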

Hyperparameter Tuning

Using 5-fold cross-validation with time-based splits:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 4, 5, 6],
    'learning_rate': [0.05, 0.1, 0.15],
    'min_samples_leaf': [5, 10, 20],
    'subsample': [0.8, 0.9, 1.0]
}

# Time-based CV: rows must be sorted by draft_year for the splits to be temporal
tscv = TimeSeriesSplit(n_splits=5)
grid_search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    cv=tscv,
    scoring='roc_auc'
)

Optimal parameters:

  • n_estimators: 200
  • max_depth: 4
  • learning_rate: 0.1
  • min_samples_leaf: 10
  • subsample: 0.9

Part 4: Model Evaluation

Classification Performance (Binary: Starter)

Confusion Matrix (Test Set):

                 Predicted
              Starter  Not-Starter
Actual  Starter    28         12
    Not-Starter    15         67

Metrics:

  • Accuracy: 77.9%
  • Precision: 65.1%
  • Recall: 70.0%
  • F1 Score: 67.5%
  • AUC-ROC: 0.79
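Each metric follows directly from the four cells of the confusion matrix; a quick check in plain Python:

```python
# Cells of the test-set confusion matrix
tp, fn = 28, 12   # actual starters: predicted starter / predicted not
fp, tn = 15, 67   # actual non-starters: predicted starter / predicted not

accuracy  = (tp + tn) / (tp + fn + fp + tn)   # correct / all
precision = tp / (tp + fp)                    # of predicted starters, how many were
recall    = tp / (tp + fn)                    # of actual starters, how many found
f1        = 2 * precision * recall / (precision + recall)

print(f'Accuracy {accuracy:.1%}, Precision {precision:.1%}, '
      f'Recall {recall:.1%}, F1 {f1:.1%}')
# → Accuracy 77.9%, Precision 65.1%, Recall 70.0%, F1 67.5%
```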

Continuous Performance (Career Win Shares)

  • MAE: 11.2 WS
  • RMSE: 15.8 WS
  • R-squared: 0.42
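All three regression metrics can be computed from the prediction residuals; a self-contained sketch with toy Win Shares arrays standing in for the real test set:

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return (MAE, RMSE, R-squared) for a set of predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    resid = y_true - y_pred
    mae = np.abs(resid).mean()                         # mean absolute error
    rmse = np.sqrt((resid ** 2).mean())                # root mean squared error
    ss_res = (resid ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot                           # variance explained
    return mae, rmse, r2

# Toy data: actual career Win Shares vs model projections
mae, rmse, r2 = regression_report([40, 5, 22, 0, 15], [28, 12, 20, 6, 18])
```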

Tier Prediction (Multi-class)

Tier      Precision  Recall  F1
All-Star  45%        50%     47%
Starter   52%        58%     55%
Rotation  61%        55%     58%
Bust      72%        68%     70%

The model is best at identifying busts, worst at identifying All-Stars (as expected given rarity).

Part 5: Feature Importance

Top 15 Features

Rank  Feature             Importance
1     age_at_draft        0.142
2     bpm                 0.098
3     ft_pct              0.087
4     wingspan_ratio      0.072
5     conf_adj_pts        0.068
6     ast_ratio           0.055
7     athletic_composite  0.052
8     usage_rate          0.048
9     ts_pct              0.045
10    standing_reach      0.042
11    games_played        0.038
12    stl_per_100         0.035
13    max_vertical        0.032
14    tov_ratio           0.028
15    three_pt_pct        0.025
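With a fitted sklearn model, a ranking like the one above comes from the `feature_importances_` attribute paired with the feature names. A sketch using a hypothetical subset of the values (in practice the dict is built via `dict(zip(feature_names, model.feature_importances_))`):

```python
# Hypothetical subset of (feature, importance) pairs from a fitted model
importances = {
    'age_at_draft': 0.142, 'bpm': 0.098, 'ft_pct': 0.087,
    'wingspan_ratio': 0.072, 'conf_adj_pts': 0.068,
}

# Sort by importance, highest first, and print a ranked table
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
for rank, (feature, imp) in enumerate(ranked, start=1):
    print(f'{rank:>2}  {feature:<16} {imp:.3f}')
```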

Key Insights

  1. Age dominates: Younger producers are significantly more likely to succeed
  2. FT% matters more than 3PT%: Free throw shooting is a better predictor of NBA shooting
  3. Length (wingspan ratio) is crucial: Physical tools matter beyond just athleticism
  4. Playmaking indicators: Assist ratio predicts beyond just points

Part 6: Model Interpretation

SHAP Analysis

For a specific prediction (Zion Williamson, 2019):

Feature             Value       SHAP Contribution
age_at_draft        18.8        +0.18
bpm                 +12.2       +0.22
wingspan_ratio      1.04        +0.08
athletic_composite  95th pct    +0.12
ft_pct              64.0%       -0.15
conference          SEC (1.04)  +0.05

Prediction: 85% probability of starter, 42% probability of All-Star

Actual outcome: All-Star (model correctly identified high upside)
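SHAP values are additive: the dataset-average base value plus the per-feature contributions reconstructs the model's output, in log-odds for a classifier. A sketch checking that property for the contributions above; the base value here is a hypothetical figure chosen so the result matches the 85% starter probability:

```python
import math

base_log_odds = 1.24  # hypothetical dataset-average log-odds (the SHAP base value)
contributions = {
    'age_at_draft': +0.18, 'bpm': +0.22, 'wingspan_ratio': +0.08,
    'athletic_composite': +0.12, 'ft_pct': -0.15, 'conference': +0.05,
}

# Additivity: prediction (log-odds) = base value + sum of SHAP contributions
log_odds = base_log_odds + sum(contributions.values())

# Convert log-odds back to a probability with the logistic function
probability = 1 / (1 + math.exp(-log_odds))
```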

Partial Dependence

Key relationships discovered:

  • Age effect is strongly non-linear (drops sharply after 21)
  • FT% has a threshold effect (below 70% is a red flag)
  • BPM effect is approximately linear
  • Wingspan ratio has a minimum threshold (~1.03)
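One-dimensional partial dependence is simple to compute by hand: sweep one feature over a grid, hold each row's other features fixed, and average the model's predictions at each grid value. A sketch with a toy model that mimics the age finding (sklearn's `PartialDependenceDisplay` automates this for fitted estimators):

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction at each grid value of `feature`, others held fixed."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

# Toy model mimicking the non-linear age effect: sharp drop after 21
def toy_model(row):
    return 0.6 if row['age'] <= 21 else 0.3

rows = [{'age': 19, 'bpm': 8.0}, {'age': 22, 'bpm': 5.5}]
curve = partial_dependence(toy_model, rows, 'age', [19, 20, 21, 22, 23])
```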

Part 7: Deployment Considerations

Model Card

Model: GradientBoostingClassifier for NBA Draft Success Prediction
Version: 1.0
Training Data: NCAA players drafted 2005-2018
Features: 42 (college stats, physical measurements, derived features)
Outputs: Starter probability, tier prediction, projected Win Shares
Limitations: International players excluded; recent rule changes may affect validity
Fairness: Tested for bias across conferences (none detected)
Update Schedule: Retrain annually with new outcomes

Production Pipeline

class DraftPredictionPipeline:
    def __init__(self, model_paths, ws_std_error):
        # model_paths: dict of saved-artifact paths ('starter', 'tier', 'ws', 'scaler')
        self.model = load_model(model_paths['starter'])
        self.tier_model = load_model(model_paths['tier'])
        self.regression_model = load_model(model_paths['ws'])
        self.scaler = load_scaler(model_paths['scaler'])
        # Residual standard error of the WS regressor, estimated on validation data
        self.model_std_error = ws_std_error
        self.feature_pipeline = self._setup_feature_pipeline()

    def predict(self, prospect_data):
        # Feature engineering
        features = self.feature_pipeline.transform(prospect_data)

        # Scale
        features_scaled = self.scaler.transform(features)

        # Predict with all three models
        starter_prob = self.model.predict_proba(features_scaled)[0, 1]
        tier_pred = self.tier_model.predict(features_scaled)[0]
        ws_pred = self.regression_model.predict(features_scaled)[0]

        # Approximate 95% interval, assuming normally distributed residuals
        ws_lower = ws_pred - 1.96 * self.model_std_error
        ws_upper = ws_pred + 1.96 * self.model_std_error

        return {
            'starter_probability': starter_prob,
            'predicted_tier': tier_pred,
            'projected_win_shares': ws_pred,
            'ws_confidence_interval': (ws_lower, ws_upper)
        }

Part 8: Lessons Learned

What Worked

  1. Age as dominant feature: Confirmed conventional wisdom with data
  2. FT% over 3PT%: Better predictor of NBA shooting
  3. Conference adjustment: Essential for fair comparison
  4. Gradient boosting: Handled non-linear relationships well

What Didn't Work

  1. Personality/character features: Couldn't reliably quantify
  2. Injury history: Insufficient data
  3. Team context: Hard to isolate individual contribution

Recommendations

  1. Use model as starting point, not final answer
  2. Weight human scouting for non-quantifiable traits
  3. Update model as more tracking data becomes available
  4. Account for uncertainty in all predictions

Conclusion

A gradient boosting model can meaningfully predict draft success, achieving an AUC-ROC of 0.79 on the held-out 2019-2020 draft classes. Key predictive features include age, production metrics, and physical measurements. However, the model should complement rather than replace traditional scouting, particularly for assessing intangible qualities the data cannot capture.

Exercises

Exercise 1

Implement the model and reproduce results on the 2019-2020 test set. Calculate all metrics.

Exercise 2

Add tracking data features (if available) and measure improvement.

Exercise 3

Build a "bust probability" model specifically and evaluate against draft position.

Exercise 4

Create an interactive tool that shows SHAP contributions for any prospect.