Chapter 26: Machine Learning in Basketball - Exercises

Section A: ML Fundamentals (Exercises 1-10)

Exercise 1: Supervised vs Unsupervised Learning

For each basketball problem, classify as supervised or unsupervised and justify:
a) Predicting whether a player will make an All-Star team
b) Discovering natural groupings of playing styles
c) Estimating a player's market value from statistics
d) Identifying unusual game patterns for scouting

Exercise 2: Feature Engineering

Create 10 engineered features from basic box score statistics (points, rebounds, assists, FG%, etc.) that might improve player evaluation models. For each feature:
a) Define the calculation
b) Explain basketball rationale
c) Discuss potential limitations
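As a starting point, the sketch below computes three widely used derived features from season totals. The dictionary keys (`pts`, `fga`, `fta`, `ast`, `tov`, `min`) and the sample numbers are illustrative assumptions, not taken from any particular dataset.

```python
def true_shooting_pct(pts, fga, fta):
    """Points per shooting possession, halved.

    Uses the conventional 0.44 weight on free-throw attempts.
    """
    attempts = 2 * (fga + 0.44 * fta)
    return pts / attempts if attempts > 0 else None


def per_36(stat, minutes):
    """Scale a counting stat to a per-36-minute rate."""
    return 36 * stat / minutes if minutes > 0 else None


def ast_to_tov(ast, tov):
    """Assist-to-turnover ratio; undefined when there are no turnovers."""
    return ast / tov if tov > 0 else None


# Hypothetical season totals for one player (made-up numbers).
player = {"pts": 1500, "fga": 1100, "fta": 400, "ast": 300, "tov": 150, "min": 2400}

features = {
    "ts_pct": true_shooting_pct(player["pts"], player["fga"], player["fta"]),
    "pts_per_36": per_36(player["pts"], player["min"]),
    "ast_to_tov": ast_to_tov(player["ast"], player["tov"]),
}
print(features)
```

Each helper returns `None` rather than dividing by zero, which is one of the limitations part (c) asks you to discuss: ratio features are undefined or unstable for low-volume players.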

Exercise 3: Train-Test Split Design

For a model predicting player performance next season:
a) Why is random splitting problematic?
b) Design an appropriate validation scheme
c) How would you handle players who span multiple seasons?
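One possible scheme for part (b) is a split by season, so the model never trains on data from the future. The sketch below is a minimal illustration, assuming each row is a player-season with a `season` field; the rows and the `ws` target are made up.

```python
def split_by_season(rows, first_test_season):
    """Train on seasons before `first_test_season`; test on that season and later."""
    train = [r for r in rows if r["season"] < first_test_season]
    test = [r for r in rows if r["season"] >= first_test_season]
    return train, test


# Made-up player-season rows; "ws" stands in for whatever target is predicted.
rows = [
    {"player": "A", "season": 2020, "ws": 4.2},
    {"player": "A", "season": 2021, "ws": 5.0},
    {"player": "A", "season": 2022, "ws": 5.6},
    {"player": "B", "season": 2021, "ws": 1.1},
    {"player": "B", "season": 2022, "ws": 2.3},
    {"player": "C", "season": 2022, "ws": 0.8},
]

train, test = split_by_season(rows, first_test_season=2022)
print(len(train), "training rows;", len(test), "test rows")

# Players who appear on both sides of the split (relevant to part c).
overlap = {r["player"] for r in train} & {r["player"] for r in test}
print("players in both sets:", sorted(overlap))
```

Whether that overlap is acceptable depends on the deployment scenario. Forecasting next season for known players is a different task from evaluating players the model has never seen.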

Exercise 4: Handling Missing Data

A dataset has 15% missing values for three-point percentage (some players don't attempt threes). Compare approaches:
a) Remove rows with missing values
b) Impute with mean/median
c) Impute with zero
d) Create a separate "attempts threes" feature

Which is most appropriate and why?
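To make option (d) concrete, here is one way it might be encoded. It is a sketch for comparison, not a recommended answer; the function name and the `fg3m`/`fg3a` arguments (threes made and attempted) are hypothetical.

```python
def encode_three_point(fg3m, fg3a):
    """Return (attempts_threes, fg3_pct) for one player.

    When there are no attempts the percentage is set to a 0.0 placeholder,
    and the indicator lets a model tell that placeholder apart from a
    player who attempted threes and missed them all.
    """
    if fg3a == 0:
        return 0, 0.0
    return 1, fg3m / fg3a


print(encode_three_point(0, 0))     # no attempts: percentage is undefined
print(encode_three_point(0, 10))    # attempted threes and missed them all
print(encode_three_point(50, 125))  # regular three-point shooter
```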

Exercise 5: Feature Scaling

Explain why feature scaling matters for:
a) K-means clustering
b) Neural networks
c) Random forests

Which algorithms require scaling and which don't?
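The distance-based case in part (a) can be explored numerically. The sketch below compares Euclidean distances between three made-up players before and after z-score standardization; the feature choice (minutes and FG%) and all values are assumptions for illustration.

```python
import math
import statistics


def standardize(column):
    """Z-score a list: subtract the mean, divide by the population std."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(x - mean) / std for x in column]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up data: season minutes (thousands scale) and FG% (0-1 scale).
minutes = [2400, 2500, 1200]
fg_pct = [0.40, 0.60, 0.41]

raw = list(zip(minutes, fg_pct))
scaled = list(zip(standardize(minutes), standardize(fg_pct)))

# Distances from player 0 to players 1 and 2, before and after scaling.
raw_d01, raw_d02 = euclidean(raw[0], raw[1]), euclidean(raw[0], raw[2])
scaled_d01, scaled_d02 = euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2])
print("raw:   ", raw_d01, raw_d02)
print("scaled:", scaled_d01, scaled_d02)
```

On the raw data the distances are driven almost entirely by minutes, because that column has by far the larger numeric range. After standardization both features contribute on a comparable scale.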

Exercise 6: Class Imbalance

You're predicting whether players will be inducted into the Hall of Fame (2% positive rate). Discuss:
a) Why accuracy is a poor metric
b) Three techniques to handle imbalance
c) Appropriate evaluation metrics
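For part (a), it can help to score a trivial baseline. The sketch below evaluates a "model" that never predicts induction on 1,000 hypothetical players with the 2% positive rate from the exercise; the helper name `confusion_counts` is illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


# 1,000 hypothetical players, 2% of whom are Hall of Famers.
y_true = [1] * 20 + [0] * 980
# A baseline that predicts no one is ever inducted.
y_pred = [0] * 1000

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else None
precision = tp / (tp + fp) if (tp + fp) else None  # undefined: no positive predictions

print(f"accuracy={accuracy:.2f}, recall={recall}, precision={precision}")
```

The baseline scores 98% accuracy while identifying none of the inductees, which is the gap that the metrics in part (c) are meant to expose.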

Exercise 7: Cross-Validation Implementation

Design a 5-fold cross-validation scheme for predicting Win Shares. Include:
a) How to split the data
b) How to handle time-based concerns
c) Metrics to calculate for each fold
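One way to respect time ordering in parts (a) and (b) is an expanding-window scheme: each fold tests on a single season and trains on every season before it. The sketch below only generates the fold boundaries; the function name and the season range are assumptions for illustration.

```python
def expanding_window_folds(seasons, n_folds):
    """Yield (train_seasons, test_season) pairs.

    The last `n_folds` distinct seasons each serve as a test season once,
    with all earlier seasons as training data.
    """
    ordered = sorted(set(seasons))
    if len(ordered) <= n_folds:
        raise ValueError("need more distinct seasons than folds")
    for test_season in ordered[-n_folds:]:
        train_seasons = [s for s in ordered if s < test_season]
        yield train_seasons, test_season


# Hypothetical data covering the 2014 through 2023 seasons.
folds = list(expanding_window_folds(range(2014, 2024), n_folds=5))
for train_seasons, test_season in folds:
    print(f"train {train_seasons[0]}-{train_seasons[-1]} -> test {test_season}")
```

Unlike shuffled k-fold, the training sets here grow from fold to fold, so per-fold metrics are not computed on equally sized training data. That is worth noting when you report results for part (c).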

Exercise 8: Regularization

Compare L1 (Lasso) and L2 (Ridge) regularization for a player salary prediction model:
a) How does each affect coefficients?
b) When would you prefer L1?
c) What is elastic net and when is it useful?
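For intuition on part (a), the special case of an orthonormal design (X'X = I) has closed-form solutions. The sketch below applies them to made-up least-squares coefficients. It assumes the objectives 0.5*||y - Xb||^2 + 0.5*lam*||b||^2 for ridge and 0.5*||y - Xb||^2 + lam*||b||_1 for lasso; other parameterizations rescale `lam`.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge under an orthonormal design: every coefficient is scaled by 1/(1+lam)."""
    return [b / (1.0 + lam) for b in beta_ols]


def lasso_shrink(beta_ols, lam):
    """Lasso under an orthonormal design: soft thresholding.

    Each coefficient moves toward zero by `lam`; any coefficient whose
    magnitude is at most `lam` becomes exactly zero.
    """
    shrunk = []
    for b in beta_ols:
        magnitude = abs(b) - lam
        if magnitude <= 0:
            shrunk.append(0.0)
        else:
            shrunk.append(magnitude if b > 0 else -magnitude)
    return shrunk


# Made-up least-squares coefficients for three standardized features.
beta_ols = [4.0, -0.3, 1.5]
lam = 0.5

ridge = ridge_shrink(beta_ols, lam)
lasso = lasso_shrink(beta_ols, lam)
print("ridge:", ridge)
print("lasso:", lasso)
```

With real, correlated salary predictors these formulas no longer hold exactly, but the qualitative contrast between proportional shrinkage and thresholding to zero is the point to carry into your answer.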

Exercise 9: Overfitting Detection

Your model achieves 95% accuracy on training data but 65% on test data.
a) Diagnose the problem
b) List 5 strategies to address overfitting
c) How would you know if you've fixed it?

Exercise 10: Pipeline Design

Design a complete ML pipeline for predicting player career length:
a) Data collection and cleaning
b) Feature engineering
c) Model selection
d) Training and validation
e) Deployment considerations


Section B: Supervised Learning (Exercises 11-18)

Exercise 11: Linear Regression

Build a linear regression model for predicting PER from basic statistics. Show:
a) Model specification
b) Coefficient interpretation
c) R-squared calculation
d) Residual analysis
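Before moving to several predictors, the single-predictor case shows the mechanics of parts (b) through (d). The sketch below fits ordinary least squares by hand on made-up data; the predictor (points per game) and all values are assumptions, not real PER figures.

```python
import statistics


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(y, y_hat):
    """1 - SS_res / SS_tot."""
    y_bar = statistics.fmean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up data: points per game (x) and PER (y) for five players.
ppg = [8, 12, 16, 20, 24]
per = [10, 13, 15, 19, 23]

intercept, slope = fit_simple_ols(ppg, per)
fitted = [intercept + slope * x for x in ppg]
residuals = [yi - fi for yi, fi in zip(per, fitted)]

print("intercept:", intercept, "slope:", slope)
print("R-squared:", r_squared(per, fitted))
print("residuals:", residuals)
```

The slope reads as the expected change in PER per additional point per game in this toy data. With an intercept in the model the residuals sum to zero, which is a quick sanity check before the residual analysis in part (d).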

Exercise 12: Logistic Regression

Design a logistic regression model to predict whether a player makes the playoffs:
a) Select appropriate features
b) Interpret odds ratios
c) Calculate predicted probabilities
d) Evaluate using ROC-AUC
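Parts (b) and (c) rest on the link between coefficients, odds, and probabilities. The sketch below uses hypothetical coefficients (an intercept and one feature, `team_net_rating`) that were not estimated from real data; it shows that a one-unit increase in the feature multiplies the odds by exp(coefficient).

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Hypothetical fitted coefficients (illustrative values only).
intercept = -1.0
coef_team_net_rating = 0.2


def predict_proba(team_net_rating):
    """Predicted playoff probability for a given team net rating."""
    return sigmoid(intercept + coef_team_net_rating * team_net_rating)


odds_ratio = math.exp(coef_team_net_rating)
p5 = predict_proba(5.0)
p6 = predict_proba(6.0)

print("odds ratio per +1 net rating:", odds_ratio)
print("P(playoffs | net rating 5):", p5)
print("P(playoffs | net rating 6):", p6)
print("ratio of odds:", odds(p6) / odds(p5))
```

The odds ratio is constant across the feature's range, while the change in probability is not. This distinction is useful when interpreting coefficients for a non-technical reader.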

Exercise 13: Decision Trees

Build a decision tree for classifying players by position:
a) Select features
b) Explain how splits are determined
c) Discuss tree depth and pruning
d) Interpret the resulting tree

Exercise 14: Random Forests

Compare a single decision tree to a random forest for draft pick success prediction:
a) How does random forest improve on single trees?
b) What hyperparameters matter most?
c) How do you interpret feature importance?

Exercise 15: Gradient Boosting

Implement gradient boosting for predicting game margins:
a) Explain the boosting process
b) Compare to random forests
c) Tune key hyperparameters
d) Analyze learning curves

Exercise 16: Neural Networks

Design a neural network for player comparison:
a) Architecture (layers, neurons)
b) Activation functions
c) Training process
d) When would neural nets outperform simpler models?

Exercise 17: Model Comparison

Compare multiple models for predicting All-NBA selection:
a) Logistic regression
b) Random forest
c) Gradient boosting
d) Neural network

Use appropriate metrics and explain which you'd deploy and why.

Exercise 18: Ensemble Methods

Create an ensemble combining multiple prediction models:
a) Design the ensemble architecture
b) Determine optimal weights
c) Evaluate improvement over individual models


Section C: Unsupervised Learning (Exercises 19-26)

Exercise 19: K-Means Clustering

Apply K-means to discover player archetypes:
a) Select and prepare features
b) Determine optimal K (elbow method, silhouette)
c) Interpret the resulting clusters
d) Name and characterize each cluster
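To see what the algorithm does before reaching for a library, here is a minimal sketch of Lloyd's algorithm (the standard K-means iteration) with a within-cluster sum of squares for the elbow method. The two-feature points are made up and assumed to be already standardized, and the initial centroids are passed in explicitly so the run is deterministic.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # leave an empty cluster where it was
        else:
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances (the elbow-method quantity)."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


# Made-up standardized features, e.g. (usage rate, assist rate).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]

labels2, centroids2 = kmeans(points, [points[0], points[3]])
labels1, centroids1 = kmeans(points, [points[0]])

print("K=2 labels:", labels2)
print("inertia K=1:", inertia(points, labels1, centroids1))
print("inertia K=2:", inertia(points, labels2, centroids2))
```

Plotting inertia against K for a range of values gives the elbow curve asked for in part (b). On real data you would also repeat the run from several initializations, since the result depends on the starting centroids.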

Exercise 20: Hierarchical Clustering

Use hierarchical clustering on the same player data:
a) Choose linkage method and justify
b) Create and interpret dendrogram
c) Compare results to K-means

Exercise 21: Dimensionality Reduction with PCA

Apply PCA to player statistics:
a) Prepare and scale data
b) Determine number of components
c) Interpret principal components
d) Visualize players in reduced space

Exercise 22: t-SNE Visualization

Use t-SNE to visualize player similarities:
a) Prepare data appropriately
b) Choose perplexity parameter
c) Interpret the visualization
d) Compare to PCA visualization

Exercise 23: Clustering Team Playing Styles

Apply clustering to team-level statistics:
a) Select appropriate features
b) Determine number of clusters
c) Characterize each team style
d) Track how teams move between clusters over time

Exercise 24: Anomaly Detection

Use unsupervised methods to detect unusual games:
a) Define "unusual" in basketball context
b) Apply isolation forest or similar method
c) Interpret flagged anomalies
d) Discuss false positive management

Exercise 25: Association Rules

Apply association rule mining to play-by-play data:
a) Define transactions and items
b) Find frequent itemsets
c) Generate and interpret rules
d) Discuss basketball applications

Exercise 26: Gaussian Mixture Models

Compare GMM to K-means for player clustering:
a) How do the assumptions differ?
b) When would GMM be preferred?
c) Interpret soft cluster assignments
d) Apply to real player data


Section D: Advanced Topics (Exercises 27-32)

Exercise 27: Time Series Analysis

Model a player's performance trajectory:
a) Identify trends and seasonality
b) Handle career interruptions
c) Forecast future performance
d) Quantify prediction uncertainty

Exercise 28: Sequence Modeling

Use sequence models for play prediction:
a) Prepare play-by-play data as sequences
b) Apply RNN/LSTM architecture
c) Predict next action given history
d) Evaluate prediction quality

Exercise 29: Transfer Learning

Apply transfer learning for basketball analysis:
a) Identify source and target domains
b) What knowledge transfers?
c) Fine-tune pre-trained models
d) Evaluate improvement

Exercise 30: Interpretable ML

Make a "black box" model interpretable:
a) Apply SHAP values
b) Create partial dependence plots
c) Generate local explanations
d) Present to non-technical audience

Exercise 31: AutoML Application

Use AutoML tools for a basketball prediction task:
a) Frame the problem appropriately
b) Apply AutoML (e.g., auto-sklearn)
c) Compare to manual model building
d) Discuss pros and cons

Exercise 32: Production Deployment

Design a production ML system for draft prediction:
a) Data pipeline design
b) Model versioning
c) Monitoring and retraining
d) Handling model drift


Section E: Applied Projects (Exercises 33-40)

Exercise 33: Player Comparison System

Build a complete player comparison system using:
a) Similarity metrics
b) Clustering
c) Visualization
d) Interactive interface design

Exercise 34: Draft Model

Create an end-to-end draft prediction model:
a) Feature engineering from college stats
b) Model selection and training
c) Validation on historical drafts
d) Uncertainty quantification

Exercise 35: Lineup Optimization

Build a lineup optimization system:
a) Define objective function
b) Feature engineering for lineups
c) Model lineup performance
d) Optimize given constraints

Exercise 36: Injury Prediction

Develop an injury risk model:
a) Feature engineering from load data
b) Handle class imbalance
c) Evaluate with appropriate metrics
d) Discuss ethical considerations

Exercise 37: Shot Selection Analysis

Build a shot quality model:
a) Features from shot location and context
b) Predict make probability
c) Analyze shot selection quality
d) Identify improvement opportunities

Exercise 38: Trade Evaluation

Create a trade evaluation system:
a) Project player values
b) Account for contract implications
c) Quantify uncertainty
d) Present recommendations

Exercise 39: Career Trajectory Modeling

Model career trajectories for different player types:
a) Cluster players by playing style
b) Model typical trajectories per cluster
c) Predict individual futures
d) Identify outlier trajectories

Exercise 40: Real-Time Analysis

Design a real-time ML system for in-game analytics:
a) Streaming data pipeline
b) Low-latency prediction
c) Visualization for coaches
d) Post-game batch analysis


Answer Key Guidelines

Code exercises should include working implementations with comments. Conceptual exercises should demonstrate understanding of methods and appropriate application. Applied projects should include full methodology and interpretation.