Chapter 26: Case Study 2 - Building a Draft Success Prediction Model with Gradient Boosting

Introduction

The NBA Draft presents a classic machine learning challenge: predict future success from limited historical data with high stakes. This case study walks through building a complete draft prediction model using gradient boosting, from feature engineering through deployment-ready evaluation.

Part 1: Problem Framing

Definition of Success

"Success" can be defined multiple ways. We chose a multi-target approach:

  1. Binary: Starter-level player (Top 150 in VORP over first 5 seasons)
  2. Continuous: Career Win Shares (through age 28 or 5 seasons)
  3. Categorical: Tier (All-Star, Starter, Rotation, Bust)
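The three targets above can all be derived from the same career-outcome table. A minimal labeling sketch in plain Python; the tier cutoffs and column names (`vorp_rank_5yr`, `career_ws`, `all_star`) here are illustrative assumptions, not the chapter's exact rules:

```python
def label_prospect(vorp_rank_5yr, career_ws, all_star):
    """Derive the three training targets for one drafted player.

    vorp_rank_5yr: league-wide VORP rank over first 5 seasons
    career_ws: Win Shares through age 28 or 5 seasons
    all_star: whether the player made an All-Star team
    """
    is_starter = vorp_rank_5yr <= 150  # Binary target: Top 150 in VORP

    # Categorical tier (cutoffs are illustrative)
    if all_star:
        tier = 'All-Star'
    elif is_starter:
        tier = 'Starter'
    elif career_ws >= 5.0:
        tier = 'Rotation'
    else:
        tier = 'Bust'

    return {'starter': is_starter, 'win_shares': career_ws, 'tier': tier}
```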

Data Collection

Training data: Draft classes 2005-2018 (sufficient career data)
Test data: Draft classes 2019-2020 (holdout)
Total prospects: 842 players with complete data

Part 2: Feature Engineering

Statistical Features (College)

college_features = {
    # Volume stats (per 100 possessions)
    'pts_per_100', 'reb_per_100', 'ast_per_100',
    'stl_per_100', 'blk_per_100', 'tov_per_100',

    # Efficiency
    'ts_pct', 'efg_pct', 'ft_pct', 'three_pt_pct',

    # Usage and role
    'usage_rate', 'ast_ratio', 'tov_ratio',

    # Advanced
    'bpm', 'obpm', 'dbpm', 'ws_per_40'
}

Physical Features

physical_features = {
    'height_no_shoes', 'weight', 'wingspan',
    'standing_reach', 'body_fat_pct',
    'max_vertical', 'lane_agility', 'three_quarter_sprint'
}

Derived Features

import numpy as np

def engineer_draft_features(df):
    # Wingspan advantage: wingspan relative to barefoot height
    df['wingspan_ratio'] = df['wingspan'] / df['height_no_shoes']

    # Age adjustment (younger = more projection)
    df['age_bonus'] = np.maximum(0, 22 - df['age'])

    # Conference adjustment (conference_factor is precomputed per conference)
    df['conf_adj_pts'] = df['pts_per_100'] * df['conference_factor']

    # Production vs draft position expectation
    df['production_over_expected'] = df['bpm'] - df['expected_bpm_for_pick']

    # Physical profile score (percentile inputs; weights sum to 1.0)
    df['athletic_composite'] = (
        df['max_vertical_pct'] * 0.35 +
        df['agility_pct'] * 0.35 +
        df['sprint_pct'] * 0.30
    )

    return df

Final feature count: 42 features

Part 3: Model Development

Train-Test Split

# Temporal split (no data leakage)
train = df[df['draft_year'] <= 2018]  # 720 players
test = df[df['draft_year'].isin([2019, 2020])]  # 122 players

Baseline Models

Model                Accuracy (Starter Binary)  AUC-ROC
Logistic Regression  68.2%                      0.71
Random Forest        71.5%                      0.74
XGBoost              73.8%                      0.77
Gradient Boosting    74.2%                      0.78
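A comparison table like this comes from one small loop over candidate models. A sketch on synthetic data standing in for the real 42-feature matrix; note it uses a random split for brevity, whereas the chapter's actual evaluation uses the temporal split shown above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in for the real prospect matrix (842 players x 42 features)
X, y = make_classification(n_samples=842, n_features=42, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=42)

baselines = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Random Forest': RandomForestClassifier(random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in baselines.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    results[name] = (acc, auc)
    print(f'{name:20s} acc={acc:.3f} auc={auc:.3f}')
```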

Hyperparameter Tuning

Using 5-fold cross-validation with time-based splits:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 4, 5, 6],
    'learning_rate': [0.05, 0.1, 0.15],
    'min_samples_leaf': [5, 10, 20],
    'subsample': [0.8, 0.9, 1.0]
}

# Time-based CV: rows must be sorted by draft_year for the splits to be temporal
tscv = TimeSeriesSplit(n_splits=5)
grid_search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    cv=tscv,
    scoring='roc_auc'
)

Optimal parameters:

  • n_estimators: 200
  • max_depth: 4
  • learning_rate: 0.1
  • min_samples_leaf: 10
  • subsample: 0.9

Part 4: Model Evaluation

Classification Performance (Binary: Starter)

Confusion Matrix (Test Set):

                 Predicted
              Starter  Not-Starter
Actual  Starter    28         12
    Not-Starter    15         67

Metrics:

  • Accuracy: 77.9%
  • Precision: 65.1%
  • Recall: 70.0%
  • F1 Score: 67.5%
  • AUC-ROC: 0.79
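Each metric follows directly from the four cells of the confusion matrix; a quick check in plain Python:

```python
# Cells of the test-set confusion matrix
tp, fn = 28, 12   # actual starters: predicted starter / predicted not
fp, tn = 15, 67   # actual non-starters: predicted starter / predicted not

accuracy  = (tp + tn) / (tp + fn + fp + tn)   # correct / all
precision = tp / (tp + fp)                    # of predicted starters, how many were
recall    = tp / (tp + fn)                    # of actual starters, how many found
f1        = 2 * precision * recall / (precision + recall)

print(f'Accuracy {accuracy:.1%}, Precision {precision:.1%}, '
      f'Recall {recall:.1%}, F1 {f1:.1%}')
# → Accuracy 77.9%, Precision 65.1%, Recall 70.0%, F1 67.5%
```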

Continuous Performance (Career Win Shares)

  • MAE: 11.2 WS
  • RMSE: 15.8 WS
  • R-squared: 0.42
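All three regression metrics can be computed from the prediction residuals; a self-contained sketch with toy Win Shares arrays standing in for the real test set:

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return (MAE, RMSE, R-squared) for a set of predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    resid = y_true - y_pred
    mae = np.abs(resid).mean()                         # mean absolute error
    rmse = np.sqrt((resid ** 2).mean())                # root mean squared error
    ss_res = (resid ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot                           # variance explained
    return mae, rmse, r2

# Toy data: actual career Win Shares vs model projections
mae, rmse, r2 = regression_report([40, 5, 22, 0, 15], [28, 12, 20, 6, 18])
```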

Tier Prediction (Multi-class)

Tier      Precision  Recall  F1
All-Star  45%        50%     47%
Starter   52%        58%     55%
Rotation  61%        55%     58%
Bust      72%        68%     70%

The model is best at identifying busts, worst at identifying All-Stars (as expected given rarity).

Part 5: Feature Importance

Top 15 Features

Rank  Feature             Importance
1     age_at_draft        0.142
2     bpm                 0.098
3     ft_pct              0.087
4     wingspan_ratio      0.072
5     conf_adj_pts        0.068
6     ast_ratio           0.055
7     athletic_composite  0.052
8     usage_rate          0.048
9     ts_pct              0.045
10    standing_reach      0.042
11    games_played        0.038
12    stl_per_100         0.035
13    max_vertical        0.032
14    tov_ratio           0.028
15    three_pt_pct        0.025
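With a fitted sklearn model, a ranking like the one above comes from the `feature_importances_` attribute paired with the feature names. A sketch using a hypothetical subset of the values (in practice the dict is built via `dict(zip(feature_names, model.feature_importances_))`):

```python
# Hypothetical subset of (feature, importance) pairs from a fitted model
importances = {
    'age_at_draft': 0.142, 'bpm': 0.098, 'ft_pct': 0.087,
    'wingspan_ratio': 0.072, 'conf_adj_pts': 0.068,
}

# Sort by importance, highest first, and print a ranked table
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
for rank, (feature, imp) in enumerate(ranked, start=1):
    print(f'{rank:>2}  {feature:<16} {imp:.3f}')
```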

Key Insights

  1. Age dominates: Younger producers are significantly more likely to succeed
  2. FT% matters more than 3PT%: Free throw shooting is a better predictor of NBA shooting
  3. Length (wingspan ratio) is crucial: Physical tools matter beyond just athleticism
  4. Playmaking indicators: Assist ratio predicts beyond just points

Part 6: Model Interpretation

SHAP Analysis

For a specific prediction (Zion Williamson, 2019):

Feature             Value       SHAP Contribution
age_at_draft        18.8        +0.18
bpm                 +12.2       +0.22
wingspan_ratio      1.04        +0.08
athletic_composite  95th pct    +0.12
ft_pct              64.0%       -0.15
conference          SEC (1.04)  +0.05

Prediction: 85% probability of starter, 42% probability of All-Star

Actual outcome: All-Star (model correctly identified high upside)
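SHAP values are additive: the dataset-average base value plus the per-feature contributions reconstructs the model's output, in log-odds for a classifier. A sketch checking that property for the contributions above; the base value here is a hypothetical figure chosen so the result matches the 85% starter probability:

```python
import math

base_log_odds = 1.24  # hypothetical dataset-average log-odds (the SHAP base value)
contributions = {
    'age_at_draft': +0.18, 'bpm': +0.22, 'wingspan_ratio': +0.08,
    'athletic_composite': +0.12, 'ft_pct': -0.15, 'conference': +0.05,
}

# Additivity: prediction (log-odds) = base value + sum of SHAP contributions
log_odds = base_log_odds + sum(contributions.values())

# Convert log-odds back to a probability with the logistic function
probability = 1 / (1 + math.exp(-log_odds))
```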

Partial Dependence

Key relationships discovered:

  • Age effect is strongly non-linear (drops sharply after 21)
  • FT% has a threshold effect (below 70% is a red flag)
  • BPM effect is approximately linear
  • Wingspan ratio has a minimum threshold (~1.03)
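One-dimensional partial dependence is simple to compute by hand: sweep one feature over a grid, hold each row's other features fixed, and average the model's predictions at each grid value. A sketch with a toy model that mimics the age finding (sklearn's `PartialDependenceDisplay` automates this for fitted estimators):

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction at each grid value of `feature`, others held fixed."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

# Toy model mimicking the non-linear age effect: sharp drop after 21
def toy_model(row):
    return 0.6 if row['age'] <= 21 else 0.3

rows = [{'age': 19, 'bpm': 8.0}, {'age': 22, 'bpm': 5.5}]
curve = partial_dependence(toy_model, rows, 'age', [19, 20, 21, 22, 23])
```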

Part 7: Deployment Considerations

Model Card

Model: GradientBoostingClassifier for NBA Draft Success Prediction
Version: 1.0
Training Data: NCAA players drafted 2005-2018
Features: 42 (college stats, physical measurements, derived features)
Outputs: Starter probability, tier prediction, projected Win Shares
Limitations: International players excluded; recent rule changes may affect validity
Fairness: Tested for bias across conferences (none detected)
Update Schedule: Retrain annually with new outcomes

Production Pipeline

class DraftPredictionPipeline:
    def __init__(self, model_paths, ws_std_error):
        # model_paths: dict of saved-artifact paths ('starter', 'tier', 'ws', 'scaler')
        self.model = load_model(model_paths['starter'])
        self.tier_model = load_model(model_paths['tier'])
        self.regression_model = load_model(model_paths['ws'])
        self.scaler = load_scaler(model_paths['scaler'])
        # Residual standard error of the WS regressor, estimated on validation data
        self.model_std_error = ws_std_error
        self.feature_pipeline = self._setup_feature_pipeline()

    def predict(self, prospect_data):
        # Feature engineering
        features = self.feature_pipeline.transform(prospect_data)

        # Scale
        features_scaled = self.scaler.transform(features)

        # Predict with all three models
        starter_prob = self.model.predict_proba(features_scaled)[0, 1]
        tier_pred = self.tier_model.predict(features_scaled)[0]
        ws_pred = self.regression_model.predict(features_scaled)[0]

        # Approximate 95% interval, assuming normally distributed residuals
        ws_lower = ws_pred - 1.96 * self.model_std_error
        ws_upper = ws_pred + 1.96 * self.model_std_error

        return {
            'starter_probability': starter_prob,
            'predicted_tier': tier_pred,
            'projected_win_shares': ws_pred,
            'ws_confidence_interval': (ws_lower, ws_upper)
        }

Part 8: Lessons Learned

What Worked

  1. Age as dominant feature: Confirmed conventional wisdom with data
  2. FT% over 3PT%: Better predictor of NBA shooting
  3. Conference adjustment: Essential for fair comparison
  4. Gradient boosting: Handled non-linear relationships well

What Didn't Work

  1. Personality/character features: Couldn't reliably quantify
  2. Injury history: Insufficient data
  3. Team context: Hard to isolate individual contribution

Recommendations

  1. Use model as starting point, not final answer
  2. Weight human scouting for non-quantifiable traits
  3. Update model as more tracking data becomes available
  4. Account for uncertainty in all predictions

Conclusion

A gradient boosting model can meaningfully predict draft success, achieving an AUC-ROC of 0.79 on the held-out 2019-2020 draft classes. Key predictive features include age, production metrics, and physical measurements. However, the model should complement rather than replace traditional scouting, particularly for assessing intangible qualities the data cannot capture.

Exercises

Exercise 1

Implement the model and reproduce results on the 2019-2020 test set. Calculate all metrics.

Exercise 2

Add tracking data features (if available) and measure improvement.

Exercise 3

Build a "bust probability" model specifically and evaluate against draft position.

Exercise 4

Create an interactive tool that shows SHAP contributions for any prospect.