Chapter 26: Machine Learning in Basketball - Exercises
Section A: ML Fundamentals (Questions 1-10)
Exercise 1: Supervised vs Unsupervised Learning
For each basketball problem, classify as supervised or unsupervised and justify: a) Predicting whether a player will make an All-Star team b) Discovering natural groupings of playing styles c) Estimating a player's market value from statistics d) Identifying unusual game patterns for scouting
Exercise 2: Feature Engineering
Create 10 engineered features from basic box score statistics (points, rebounds, assists, FG%, etc.) that might improve player evaluation models. For each feature: a) Define the calculation b) Explain basketball rationale c) Discuss potential limitations
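As a starting point, here is a minimal sketch of two widely used engineered features: true shooting percentage and assist-to-turnover ratio. The dictionary keys (`pts`, `fga`, `fta`, `ast`, `tov`) and the sample stat line are hypothetical and not drawn from a real dataset.

```python
def true_shooting_pct(pts, fga, fta):
    """True shooting percentage: PTS / (2 * (FGA + 0.44 * FTA))."""
    attempts = 2 * (fga + 0.44 * fta)
    # Guard against players with no shot attempts.
    return pts / attempts if attempts > 0 else None


def assist_to_turnover(ast, tov):
    """Assists per turnover; None when the player has no turnovers."""
    return ast / tov if tov > 0 else None


# Hypothetical season totals for one player (illustrative numbers only).
player = {"pts": 1500, "fga": 1100, "fta": 400, "ast": 300, "tov": 150}

ts = true_shooting_pct(player["pts"], player["fga"], player["fta"])
ast_tov = assist_to_turnover(player["ast"], player["tov"])
print(f"TS%: {ts:.3f}, AST/TOV: {ast_tov:.2f}")
```

Both helpers return None instead of dividing by zero, which is itself a limitation worth discussing in part (c): ratio features are undefined or unstable for low-volume players.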
Exercise 3: Train-Test Split Design
For a model predicting player performance next season: a) Why is random splitting problematic? b) Design an appropriate validation scheme c) How would you handle players who span multiple seasons?
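One possible validation scheme for part (b) is a season-based split, sketched below under the assumption that each row is one player-season. The field names (`player`, `season`, `ws`) and all values are hypothetical.

```python
def split_by_season(rows, last_train_season):
    """Train on seasons up to the cutoff, test on everything after it."""
    train = [r for r in rows if r["season"] <= last_train_season]
    test = [r for r in rows if r["season"] > last_train_season]
    return train, test


# Hypothetical player-season rows (illustrative values only).
rows = [
    {"player": "A", "season": 2019, "ws": 5.1},
    {"player": "A", "season": 2020, "ws": 6.3},
    {"player": "A", "season": 2021, "ws": 5.8},
    {"player": "B", "season": 2020, "ws": 2.0},
    {"player": "B", "season": 2021, "ws": 3.4},
]

train, test = split_by_season(rows, last_train_season=2020)
print(len(train), "training rows;", len(test), "test rows")
```

Because the cutoff is on season rather than on row, a player who spans several seasons contributes only past seasons to training and only future seasons to testing, so no future information leaks backward.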
Exercise 4: Handling Missing Data
A dataset has 15% missing values for three-point percentage (some players don't attempt threes). Compare approaches: a) Remove rows with missing values b) Impute with mean/median c) Impute with zero d) Create a separate "attempts threes" feature
Which is most appropriate and why?
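A minimal sketch of option (d), combined with a zero fill so the percentage column stays numeric. The field names `fg3m` (threes made) and `fg3a` (threes attempted) and the sample rows are hypothetical; this is one reasonable design, not the only defensible answer.

```python
def add_three_point_features(players):
    """Add an 'attempts_threes' indicator and a zero-filled 3P% column."""
    out = []
    for p in players:
        attempts = p["fg3a"]
        has_attempts = attempts > 0
        out.append({
            **p,
            # 1 if the player took at least one three, else 0.
            "attempts_threes": int(has_attempts),
            # The 0.0 fill is a placeholder; the indicator tells the model
            # that this zero means "no attempts", not "missed every shot".
            "fg3_pct": p["fg3m"] / attempts if has_attempts else 0.0,
        })
    return out


# Hypothetical rows (illustrative values only).
players = [
    {"name": "Shooter", "fg3m": 150, "fg3a": 400},
    {"name": "Center", "fg3m": 0, "fg3a": 0},
]
features = add_three_point_features(players)
for row in features:
    print(row["name"], row["attempts_threes"], round(row["fg3_pct"], 3))
```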
Exercise 5: Feature Scaling
Explain why feature scaling matters for: a) K-means clustering b) Neural networks c) Random forests
Which algorithms require scaling and which don't?
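To make the distance-based case concrete, the sketch below compares Euclidean distances between three hypothetical players before and after z-score scaling. The two columns (minutes played and FG%) and all values are invented for illustration.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each column to mean 0 and (population) std 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Columns: minutes played, field goal percentage (hypothetical values).
players = [
    [2500.0, 0.45],  # player A
    [2400.0, 0.60],  # player B: similar minutes, very different FG%
    [1200.0, 0.46],  # player C: very different minutes, similar FG%
]

raw_ab = math.dist(players[0], players[1])
raw_ac = math.dist(players[0], players[2])

scaled = zscore_columns(players)
scaled_ab = math.dist(scaled[0], scaled[1])
scaled_ac = math.dist(scaled[0], scaled[2])

print(f"raw:    A-B = {raw_ab:.2f}, A-C = {raw_ac:.2f}")
print(f"scaled: A-B = {scaled_ab:.2f}, A-C = {scaled_ac:.2f}")
```

On the raw data the minutes column dominates, so A looks far closer to B than to C. After scaling, the large FG% gap between A and B counts as much as the minutes gap between A and C.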
Exercise 6: Class Imbalance
You're predicting whether players will be inducted into the Hall of Fame (2% positive rate). Discuss: a) Why accuracy is a poor metric b) Three techniques to handle imbalance c) Appropriate evaluation metrics
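The sketch below illustrates part (a) with a trivial classifier that predicts "not inducted" for every player. The labels are synthetic and constructed to match the 2% positive rate stated in the exercise.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives that the model actually finds."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return true_pos / positives if positives else 0.0


# Synthetic labels: 2 Hall of Famers among 100 players (2% positive rate).
y_true = [1] * 2 + [0] * 98
# A "model" that never predicts induction.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, recall = {rec:.2f}")
```

The classifier scores 98% accuracy while identifying none of the inductees, which is why metrics that focus on the positive class are needed here.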
Exercise 7: Cross-Validation Implementation
Design a 5-fold cross-validation scheme for predicting Win Shares. Include: a) How to split the data b) How to handle time-based concerns c) Metrics to calculate for each fold
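One way to address part (b) is an expanding-window scheme, where each fold trains on all earlier seasons and tests on the next one. The sketch below only generates the fold boundaries; the helper name `expanding_window_folds` and the season range are hypothetical.

```python
def expanding_window_folds(seasons, n_folds):
    """Return (train_seasons, test_season) pairs in chronological order.

    Each fold tests on one season and trains on every season before it.
    """
    ordered = sorted(set(seasons))
    if len(ordered) <= n_folds:
        raise ValueError("need more distinct seasons than folds")
    first_test = len(ordered) - n_folds
    return [(ordered[:i], ordered[i]) for i in range(first_test, len(ordered))]


# Hypothetical data covering the 2015 through 2024 seasons.
folds = expanding_window_folds(range(2015, 2025), n_folds=5)
for train_seasons, test_season in folds:
    print(f"train {train_seasons[0]}-{train_seasons[-1]} -> test {test_season}")
```

Per-fold metrics such as RMSE or MAE on Win Shares can then be computed on each test season and averaged, keeping in mind that later folds have more training data than earlier ones.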
Exercise 8: Regularization
Compare L1 (Lasso) and L2 (Ridge) regularization for a player salary prediction model: a) How does each affect coefficients? b) When would you prefer L1? c) What is elastic net and when is it useful?
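For part (a), the contrast is easiest to see in the special case of orthonormal features, where both penalties have textbook closed forms in terms of the unpenalized OLS coefficients. The sketch assumes the objectives ½‖y − Xb‖² + λ‖b‖₁ (Lasso) and ½‖y − Xb‖² + (λ/2)‖b‖² (Ridge); the coefficient values are hypothetical, and real salary features will not be orthonormal.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge (orthonormal case): scale every coefficient by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in beta_ols]


def lasso_soft_threshold(beta_ols, lam):
    """Lasso (orthonormal case): shrink toward zero by lam, clipping at zero."""
    out = []
    for b in beta_ols:
        magnitude = max(abs(b) - lam, 0.0)
        if magnitude == 0.0:
            out.append(0.0)
        else:
            out.append(magnitude if b > 0 else -magnitude)
    return out


# Hypothetical OLS coefficients for three standardized features.
beta_ols = [3.0, -0.4, 0.2]
lam = 0.5

ridge = ridge_shrink(beta_ols, lam)
lasso = lasso_soft_threshold(beta_ols, lam)
print("ridge:", [round(b, 3) for b in ridge])
print("lasso:", lasso)
```

Ridge shrinks every coefficient proportionally but leaves them all nonzero, while Lasso sets the two small coefficients exactly to zero, which is the feature-selection behavior part (b) asks about.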
Exercise 9: Overfitting Detection
Your model achieves 95% accuracy on training data but 65% on test data. a) Diagnose the problem b) List 5 strategies to address overfitting c) How would you know if you've fixed it?
Exercise 10: Pipeline Design
Design a complete ML pipeline for predicting player career length: a) Data collection and cleaning b) Feature engineering c) Model selection d) Training and validation e) Deployment considerations
Section B: Supervised Learning (Questions 11-18)
Exercise 11: Linear Regression
Build a linear regression model for predicting PER from basic statistics. Show: a) Model specification b) Coefficient interpretation c) R-squared calculation d) Residual analysis
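A minimal single-feature sketch of parts (a), (c), and (d), using ordinary least squares computed by hand. The data (points per game against PER) are invented for illustration; a full answer would use several box score statistics and a library fit.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: points per game (x) and PER (y) for five players.
ppg = [8.0, 12.0, 16.0, 20.0, 24.0]
per = [10.5, 13.0, 15.8, 18.1, 21.6]

intercept, slope = fit_simple_ols(ppg, per)
predicted = [intercept + slope * xi for xi in ppg]
residuals = [yi - pi for yi, pi in zip(per, predicted)]
r2 = r_squared(per, predicted)
print(f"PER = {intercept:.2f} + {slope:.4f} * PPG, R^2 = {r2:.4f}")
```

The slope reads as the expected change in PER per additional point per game in this toy data. The residuals sum to zero by construction when an intercept is fitted, so residual analysis should look at their pattern rather than their total.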
Exercise 12: Logistic Regression
Design a logistic regression model to predict whether a player's team makes the playoffs: a) Select appropriate features b) Interpret odds ratios c) Calculate predicted probabilities d) Evaluate using ROC-AUC
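For parts (b) and (c), the sketch below shows how odds ratios and predicted probabilities follow from fitted coefficients. The coefficients and feature names (`net_rating`, `star_players`) are hypothetical placeholders, not the output of a real fit.

```python
import math


def sigmoid(z):
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted coefficients on the log-odds scale.
intercept = -1.0
coefs = {"net_rating": 0.30, "star_players": 0.80}

# Odds ratio: multiplicative change in the odds per one-unit feature increase.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

# Hypothetical feature values for one team.
features = {"net_rating": 2.0, "star_players": 1.0}
log_odds = intercept + sum(coefs[k] * features[k] for k in coefs)
probability = sigmoid(log_odds)

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.2f}")
print(f"predicted playoff probability = {probability:.3f}")
```

An odds ratio above 1 means the feature raises the odds of the positive class with the other features held fixed; it is a change in odds, not in probability.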
Exercise 13: Decision Trees
Build a decision tree for classifying players by position: a) Select features b) Explain how splits are determined c) Discuss tree depth and pruning d) Interpret the resulting tree
Exercise 14: Random Forests
Compare a single decision tree to a random forest for draft pick success prediction: a) How does random forest improve on single trees? b) What hyperparameters matter most? c) How do you interpret feature importance?
Exercise 15: Gradient Boosting
Implement gradient boosting for predicting game margins: a) Explain the boosting process b) Compare to random forests c) Tune key hyperparameters d) Analyze learning curves
Exercise 16: Neural Networks
Design a neural network for player comparison: a) Architecture (layers, neurons) b) Activation functions c) Training process d) When would neural nets outperform simpler models?
Exercise 17: Model Comparison
Compare multiple models for predicting All-NBA selection: a) Logistic regression b) Random forest c) Gradient boosting d) Neural network
Use appropriate metrics and explain which you'd deploy and why.
Exercise 18: Ensemble Methods
Create an ensemble combining multiple prediction models: a) Design the ensemble architecture b) Determine optimal weights c) Evaluate improvement over individual models
Section C: Unsupervised Learning (Questions 19-26)
Exercise 19: K-Means Clustering
Apply K-means to discover player archetypes: a) Select and prepare features b) Determine optimal K (elbow method, silhouette) c) Interpret the resulting clusters d) Name and characterize each cluster
Exercise 20: Hierarchical Clustering
Use hierarchical clustering on the same player data: a) Choose linkage method and justify b) Create and interpret dendrogram c) Compare results to K-means
Exercise 21: Dimensionality Reduction with PCA
Apply PCA to player statistics: a) Prepare and scale data b) Determine number of components c) Interpret principal components d) Visualize players in reduced space
Exercise 22: t-SNE Visualization
Use t-SNE to visualize player similarities: a) Prepare data appropriately b) Choose perplexity parameter c) Interpret the visualization d) Compare to PCA visualization
Exercise 23: Clustering Team Playing Styles
Apply clustering to team-level statistics: a) Select appropriate features b) Determine number of clusters c) Characterize each team style d) Track how teams move between clusters over time
Exercise 24: Anomaly Detection
Use unsupervised methods to detect unusual games: a) Define "unusual" in basketball context b) Apply isolation forest or similar method c) Interpret flagged anomalies d) Discuss false positive management
Exercise 25: Association Rules
Apply association rule mining to play-by-play data: a) Define transactions and items b) Find frequent itemsets c) Generate and interpret rules d) Discuss basketball applications
Exercise 26: Gaussian Mixture Models
Compare GMM to K-means for player clustering: a) How do the assumptions differ? b) When would GMM be preferred? c) Interpret soft cluster assignments d) Apply to real player data
Section D: Advanced Topics (Questions 27-32)
Exercise 27: Time Series Analysis
Model a player's performance trajectory: a) Identify trends and seasonality b) Handle career interruptions c) Forecast future performance d) Quantify prediction uncertainty
Exercise 28: Sequence Modeling
Use sequence models for play prediction: a) Prepare play-by-play data as sequences b) Apply RNN/LSTM architecture c) Predict next action given history d) Evaluate prediction quality
Exercise 29: Transfer Learning
Apply transfer learning for basketball analysis: a) Identify source and target domains b) What knowledge transfers? c) Fine-tune pre-trained models d) Evaluate improvement
Exercise 30: Interpretable ML
Make a "black box" model interpretable: a) Apply SHAP values b) Create partial dependence plots c) Generate local explanations d) Present to non-technical audience
Exercise 31: AutoML Application
Use AutoML tools for a basketball prediction task: a) Frame the problem appropriately b) Apply AutoML (e.g., auto-sklearn) c) Compare to manual model building d) Discuss pros and cons
Exercise 32: Production Deployment
Design a production ML system for draft prediction: a) Data pipeline design b) Model versioning c) Monitoring and retraining d) Handling model drift
Section E: Applied Projects (Questions 33-40)
Exercise 33: Player Comparison System
Build a complete player comparison system using: a) Similarity metrics b) Clustering c) Visualization d) Interactive interface design
Exercise 34: Draft Model
Create an end-to-end draft prediction model: a) Feature engineering from college stats b) Model selection and training c) Validation on historical drafts d) Uncertainty quantification
Exercise 35: Lineup Optimization
Build a lineup optimization system: a) Define objective function b) Feature engineering for lineups c) Model lineup performance d) Optimize given constraints
Exercise 36: Injury Prediction
Develop an injury risk model: a) Feature engineering from load data b) Handle class imbalance c) Evaluate with appropriate metrics d) Discuss ethical considerations
Exercise 37: Shot Selection Analysis
Build a shot quality model: a) Features from shot location and context b) Predict make probability c) Analyze shot selection quality d) Identify improvement opportunities
Exercise 38: Trade Evaluation
Create a trade evaluation system: a) Project player values b) Account for contract implications c) Quantify uncertainty d) Present recommendations
Exercise 39: Career Trajectory Modeling
Model career trajectories for different player types: a) Cluster players by playing style b) Model typical trajectories per cluster c) Predict individual futures d) Identify outlier trajectories
Exercise 40: Real-Time Analysis
Design a real-time ML system for in-game analytics: a) Streaming data pipeline b) Low-latency prediction c) Visualization for coaches d) Post-game batch analysis
Answer Key Guidelines
Code exercises should include working implementations with comments. Conceptual exercises should demonstrate understanding of methods and appropriate application. Applied projects should include full methodology and interpretation.