Chapter 26: Machine Learning in Basketball - Quiz
Instructions
This quiz contains 25 questions (20 multiple choice and 5 short answer) covering the key concepts from Chapter 26. For multiple-choice questions, select the best answer; answer short-answer questions in 2-4 sentences.
Multiple Choice Questions
Question 1
Which type of machine learning is used when you have labeled examples (e.g., player statistics and whether each player made an All-Star team)?
- A) Unsupervised learning
- B) Supervised learning
- C) Reinforcement learning
- D) Transfer learning
Question 2
K-means clustering requires which of the following inputs?
- A) Target labels for each observation
- B) The number of clusters (K)
- C) A loss function to minimize
- D) Pre-trained weights
Question 3
Which algorithm is MOST sensitive to feature scaling?
- A) Decision trees
- B) Random forests
- C) K-nearest neighbors
- D) XGBoost
Question 4
The "elbow method" is used to determine:
- A) Optimal learning rate
- B) Optimal number of clusters
- C) Optimal regularization strength
- D) Optimal tree depth
Question 5
When training data accuracy is 95% but test accuracy is 60%, the model is likely:
- A) Underfitting
- B) Overfitting
- C) Well-generalized
- D) Biased
Question 6
L1 regularization (Lasso) differs from L2 (Ridge) by:
- A) Producing exactly zero coefficients
- B) Being computationally faster
- C) Working only on regression problems
- D) Requiring more training data
Question 7
For a classification problem with 98% negative class and 2% positive class, accuracy is a poor metric because:
- A) It's computationally expensive
- B) A model predicting all negatives achieves 98% accuracy
- C) It doesn't work with probabilities
- D) It requires balanced data
Question 8
Principal Component Analysis (PCA) is primarily used for:
- A) Classification
- B) Regression
- C) Dimensionality reduction
- D) Clustering
Question 9
In a random forest, "bagging" refers to:
- A) Removing outliers
- B) Training trees on bootstrap samples
- C) Pruning weak trees
- D) Averaging predictions
Question 10
The gradient in "gradient boosting" refers to:
- A) Learning rate
- B) The slope of the loss function
- C) Tree depth
- D) Number of iterations
Question 11
Cross-validation is used to:
- A) Speed up training
- B) Estimate model generalization performance
- C) Reduce overfitting automatically
- D) Select features
Question 12
Which statement about neural networks is TRUE?
- A) They always outperform simpler models
- B) They require large amounts of data to train effectively
- C) They're inherently interpretable
- D) They can't handle missing data
Question 13
SHAP values are used for:
- A) Feature scaling
- B) Model interpretation
- C) Data cleaning
- D) Hyperparameter tuning
Question 14
t-SNE is primarily used for:
- A) Prediction
- B) Visualization of high-dimensional data
- C) Feature selection
- D) Outlier detection
Question 15
The silhouette score measures:
- A) Model accuracy
- B) Feature importance
- C) Cluster quality
- D) Prediction confidence
Question 16
When would you use a Recurrent Neural Network (RNN) instead of a feedforward network?
- A) For image classification
- B) For sequential/time-series data
- C) For tabular data
- D) For small datasets
Question 17
Feature engineering is the process of:
- A) Removing features from the model
- B) Creating new features from existing data
- C) Normalizing feature values
- D) Selecting the best model
Question 18
Hyperparameter tuning typically uses:
- A) Gradient descent
- B) Grid search or random search
- C) Feature scaling
- D) Data augmentation
Question 19
Model ensembling typically:
- A) Reduces variance and improves generalization
- B) Speeds up training time
- C) Simplifies interpretation
- D) Reduces data requirements
Question 20
The "bias-variance tradeoff" refers to:
- A) Trading accuracy for speed
- B) The tradeoff between underfitting and overfitting
- C) Choosing between different loss functions
- D) Balancing training and test data sizes
Short Answer Questions
Question 21
Explain why you might choose a random forest over a neural network for predicting which draft picks will become NBA starters. Consider data availability, interpretability, and performance.
Your Answer:
Question 22
You're building a model to cluster NBA players into playing style archetypes. Describe your approach including feature selection, algorithm choice, and how you would validate the resulting clusters.
Your Answer:
Question 23
Explain the difference between correlation and causation in the context of a machine learning model that finds that players with more tattoos have higher scoring averages. What would you caution a team about before acting on this finding?
Your Answer:
Question 24
A model predicting injury risk has 90% accuracy but fails to identify 60% of actual injuries. Explain what metrics should have been used instead of accuracy and why this model is problematic.
Your Answer:
Question 25
Describe a specific scenario where simpler statistical methods (like linear regression) would be preferred over complex machine learning approaches for a basketball analytics problem.
Your Answer:
Answer Key
Multiple Choice Answers
1. B - Supervised learning uses labeled examples to learn a mapping from inputs to outputs.
2. B - K-means requires specifying the number of clusters K before running the algorithm.
3. C - K-nearest neighbors uses distance calculations, which are directly affected by feature scales.
4. B - The elbow method plots within-cluster variance (or variance explained) against K and looks for the point where adding clusters stops helping.
5. B - High training accuracy with low test accuracy is the classic sign of overfitting.
6. A - L1 regularization can drive coefficients exactly to zero, performing feature selection.
7. B - With severe class imbalance, a naive model predicting the majority class achieves high accuracy.
8. C - PCA projects data onto a lower-dimensional space while preserving variance.
9. B - Bagging (Bootstrap AGGregatING) trains each tree on a random sample drawn with replacement.
10. B - Gradient boosting fits each new model to the negative gradient (the pseudo-residuals) of the loss function.
11. B - Cross-validation provides an estimate of how well the model generalizes to unseen data (see the sketch after this list).
12. B - Neural networks typically require large datasets to learn complex patterns effectively.
13. B - SHAP (SHapley Additive exPlanations) values explain individual predictions.
14. B - t-SNE is a visualization technique for projecting high-dimensional data into 2D or 3D.
15. C - The silhouette score measures how similar points are to their own cluster versus other clusters.
16. B - RNNs are designed for sequential data where order matters (e.g., play sequences).
17. B - Feature engineering creates new, informative features from raw data.
18. B - Hyperparameters are typically tuned with grid search, random search, or Bayesian optimization.
19. A - Ensembles combine multiple models, typically reducing variance and improving generalization.
20. B - The bias-variance tradeoff balances model simplicity (high bias) against flexibility (high variance).
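The short sketch below, on synthetic data with placeholder features, illustrates the ideas behind answers 5, 11, and 18: a large gap between train and test accuracy as a sign of overfitting, cross-validation as a generalization estimate, and grid search for hyperparameter tuning. It is a minimal illustration, not a recommended modeling pipeline.

```python
# Minimal sketch: train/test gap, cross-validation, and grid search.
# Features and labels are randomly generated placeholders, not real NBA data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                          # stand-in for per-game stats
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)    # synthetic binary label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A large train/test accuracy gap signals overfitting (Question 5).
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
print("train acc:", model.score(X_train, y_train))
print("test acc :", model.score(X_test, y_test))

# Cross-validation estimates generalization performance (Question 11).
print("5-fold CV acc:", cross_val_score(model, X_train, y_train, cv=5).mean())

# Grid search tunes hyperparameters over a small grid (Question 18).
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
    cv=5,
)
grid.fit(X_train, y_train)
print("best params:", grid.best_params_)
```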
Short Answer Rubric
Question 21 - Should mention: (1) limited draft data makes neural networks prone to overfitting, (2) random forests handle small datasets better, (3) feature importance aids interpretation for scouts, (4) similar performance with less complexity.
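As a rough illustration of the Question 21 rubric, the sketch below trains a random forest on a small synthetic "draft pick" dataset and reads off feature importances. The feature names are hypothetical placeholders, not a recommended scouting feature set.

```python
# Minimal sketch for Question 21: random forest feature importance on a small,
# synthetic draft dataset. Feature names and data are hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
features = ["college_bpm", "age_at_draft", "wingspan_in", "usage_rate", "three_pt_pct"]
X = pd.DataFrame(rng.normal(size=(200, len(features))), columns=features)
y = (X["college_bpm"] - 0.5 * X["age_at_draft"] + rng.normal(size=200) > 0).astype(int)

# With only a few hundred draft picks, a depth-capped forest is less prone to
# overfitting than a large neural network.
forest = RandomForestClassifier(n_estimators=300, max_depth=4, random_state=1).fit(X, y)

# Feature importances give scouts a rough, interpretable ranking of the inputs.
for name, imp in sorted(zip(features, forest.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```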
Question 22 - Should cover: (1) features like scoring volume, shot distribution, defensive stats, (2) K-means or GMM algorithms, (3) elbow/silhouette methods for K selection, (4) validation through expert review and cluster stability analysis.
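A minimal sketch of the workflow the Question 22 rubric describes, assuming the style features have already been assembled: scale the features, fit K-means for several values of K, and compare inertia (elbow) and silhouette scores. The data and feature meanings below are synthetic placeholders.

```python
# Minimal sketch for Question 22: scale features, try several K values,
# and compare inertia (elbow) and silhouette score. Data is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Columns might represent e.g. 3PA rate, rim attempts, assist rate, rebound rate, block rate.
X = rng.normal(size=(400, 5))
X_scaled = StandardScaler().fit_transform(X)   # K-means is distance-based, so scale first

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=2).fit(X_scaled)
    sil = silhouette_score(X_scaled, km.labels_)
    print(f"K={k}  inertia={km.inertia_:.1f}  silhouette={sil:.3f}")
```

In practice the numeric diagnostics would be followed by the qualitative validation the rubric mentions: do the clusters correspond to archetypes a coach or scout recognizes, and are they stable across seasons?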
Question 23 - Should explain: (1) correlation doesn't imply causation, (2) likely confounding variables (age, culture, position), (3) model captures correlation, not causal mechanism, (4) using this for decisions would be inappropriate.
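To make the confounding point in the Question 23 rubric concrete, the toy simulation below generates data in which a hidden third variable drives both quantities, producing a strong correlation with no causal link. All variables are purely illustrative.

```python
# Toy simulation for Question 23: a hidden confounder drives both quantities,
# so they correlate even though neither causes the other. Data is synthetic.
import numpy as np

rng = np.random.default_rng(3)
confounder = rng.normal(size=5000)                  # e.g., age, era, or role
tattoos = 2 * confounder + rng.normal(size=5000)    # driven by the confounder
scoring = 5 * confounder + rng.normal(size=5000)    # also driven by the confounder

print("corr(tattoos, scoring):", np.corrcoef(tattoos, scoring)[0, 1])
# The correlation is strong, yet changing tattoos would do nothing to scoring.
```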
Question 24 - Should discuss: (1) precision, recall, and F1 score as better metrics, (2) missing 60% of injuries means low recall, (3) for medical predictions, false negatives are costly, (4) should optimize for recall/sensitivity.
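The sketch below uses made-up labels chosen to match the numbers in Question 24: the model reaches 90% accuracy while catching only 4 of 10 injuries, and recall and F1 expose the problem that accuracy hides.

```python
# Minimal sketch for Question 24: accuracy looks fine on imbalanced data while
# recall reveals that most injuries are missed. Labels below are made up.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 100 player-seasons: 10 actual injuries; the model flags 8 players and
# catches only 4 of the 10 real injuries.
y_true = np.array([1] * 10 + [0] * 90)
y_pred = np.array([1] * 4 + [0] * 6 + [1] * 4 + [0] * 86)

print("accuracy :", accuracy_score(y_true, y_pred))   # 0.90, despite missed injuries
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))     # 0.40: 6 of 10 injuries missed
print("f1       :", f1_score(y_true, y_pred))
```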
Question 25 - Examples include: (1) predicting PER from box score stats (interpretable coefficients), (2) small sample sizes where complex models overfit, (3) when coefficient interpretation is the goal, (4) baseline models before adding complexity.
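One concrete version of the Question 25 scenario is sketched below: a plain linear regression as an interpretable baseline for a box-score outcome, fit on synthetic data with placeholder feature names standing in for something like a PER prediction.

```python
# Minimal sketch for Question 25: an interpretable linear-regression baseline.
# Data and feature names are synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
features = ["minutes", "usage_rate", "ts_pct", "ast_rate"]
X = pd.DataFrame(rng.normal(size=(150, len(features))), columns=features)
# Synthetic target standing in for a box-score composite such as PER.
y = 2.0 * X["usage_rate"] + 1.5 * X["ts_pct"] + rng.normal(size=150)

reg = LinearRegression().fit(X, y)
# Coefficients are directly interpretable: the change in the target per unit
# change in each feature, which is often the whole point of the analysis.
for name, coef in zip(features, reg.coef_):
    print(f"{name}: {coef:+.2f}")
```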