Chapter 26: Machine Learning in Basketball - Quiz

Instructions

This quiz contains 25 questions covering the key concepts from Chapter 26: 20 multiple-choice questions and 5 short-answer questions. Select the best answer for each multiple-choice question; answer short-answer questions in 2-4 sentences.


Multiple Choice Questions

Question 1

Which type of machine learning is used when you have labeled examples (e.g., player statistics and whether the player made an All-Star team)?

- A) Unsupervised learning
- B) Supervised learning
- C) Reinforcement learning
- D) Transfer learning

Question 2

K-means clustering requires which of the following inputs?

- A) Target labels for each observation
- B) The number of clusters (K)
- C) A loss function to minimize
- D) Pre-trained weights

Question 3

Which algorithm is MOST sensitive to feature scaling?

- A) Decision trees
- B) Random forests
- C) K-nearest neighbors
- D) XGBoost

Question 4

The "elbow method" is used to determine:

- A) Optimal learning rate
- B) Optimal number of clusters
- C) Optimal regularization strength
- D) Optimal tree depth

Question 5

When training accuracy is 95% but test accuracy is 60%, the model is likely:

- A) Underfitting
- B) Overfitting
- C) Well-generalized
- D) Biased

Question 6

L1 regularization (Lasso) differs from L2 (Ridge) by:

- A) Producing exactly zero coefficients
- B) Being computationally faster
- C) Working only on regression problems
- D) Requiring more training data

Question 7

For a classification problem with 98% negative class and 2% positive class, accuracy is a poor metric because:

- A) It's computationally expensive
- B) A model predicting all negatives achieves 98% accuracy
- C) It doesn't work with probabilities
- D) It requires balanced data

Question 8

Principal Component Analysis (PCA) is primarily used for:

- A) Classification
- B) Regression
- C) Dimensionality reduction
- D) Clustering

Question 9

In a random forest, "bagging" refers to:

- A) Removing outliers
- B) Training trees on bootstrap samples
- C) Pruning weak trees
- D) Averaging predictions

Question 10

The gradient in "gradient boosting" refers to:

- A) Learning rate
- B) The slope of the loss function
- C) Tree depth
- D) Number of iterations

Question 11

Cross-validation is used to:

- A) Speed up training
- B) Estimate model generalization performance
- C) Reduce overfitting automatically
- D) Select features

Question 12

Which statement about neural networks is TRUE?

- A) They always outperform simpler models
- B) They require large amounts of data to train effectively
- C) They're inherently interpretable
- D) They can't handle missing data

Question 13

SHAP values are used for:

- A) Feature scaling
- B) Model interpretation
- C) Data cleaning
- D) Hyperparameter tuning

Question 14

t-SNE is primarily used for:

- A) Prediction
- B) Visualization of high-dimensional data
- C) Feature selection
- D) Outlier detection

Question 15

The silhouette score measures:

- A) Model accuracy
- B) Feature importance
- C) Cluster quality
- D) Prediction confidence

Question 16

When would you use a Recurrent Neural Network (RNN) instead of a feedforward network?

- A) For image classification
- B) For sequential/time-series data
- C) For tabular data
- D) For small datasets

Question 17

Feature engineering is the process of:

- A) Removing features from the model
- B) Creating new features from existing data
- C) Normalizing feature values
- D) Selecting the best model

Question 18

Hyperparameter tuning typically uses:

- A) Gradient descent
- B) Grid search or random search
- C) Feature scaling
- D) Data augmentation

Question 19

Model ensembling typically:

- A) Reduces variance and improves generalization
- B) Speeds up training time
- C) Simplifies interpretation
- D) Reduces data requirements

Question 20

The "bias-variance tradeoff" refers to:

- A) Trading accuracy for speed
- B) The tradeoff between underfitting and overfitting
- C) Choosing between different loss functions
- D) Balancing training and test data sizes


Short Answer Questions

Question 21

Explain why you might choose a random forest over a neural network for predicting which draft picks will become NBA starters. Consider data availability, interpretability, and performance.

Your Answer:





Question 22

You're building a model to cluster NBA players into playing style archetypes. Describe your approach including feature selection, algorithm choice, and how you would validate the resulting clusters.

Your Answer:





Question 23

Explain the difference between correlation and causation in the context of a machine learning model that finds that players with more tattoos have higher scoring averages. How would you caution a team against acting on this finding?

Your Answer:





Question 24

A model predicting injury risk has 90% accuracy but fails to identify 60% of actual injuries. Explain what metrics should have been used instead of accuracy and why this model is problematic.

Your Answer:





Question 25

Describe a specific scenario where simpler statistical methods (like linear regression) would be preferred over complex machine learning approaches for a basketball analytics problem.

Your Answer:






Answer Key

Multiple Choice Answers

  1. B - Supervised learning uses labeled examples to learn a mapping from inputs to outputs.

  2. B - K-means requires specifying the number of clusters K before running the algorithm.

  3. C - K-nearest neighbors uses distance calculations, which are directly affected by feature scales.
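A minimal sketch of why scaling matters for KNN, using two hypothetical features on very different scales (values and ranges assumed for illustration):

```python
import math

# Two players described by (minutes played, 3P%) -- scales differ by ~3000x
a = (2500.0, 0.35)
b = (1200.0, 0.42)

# Raw Euclidean distance is dominated almost entirely by minutes played;
# the 3P% difference contributes virtually nothing
raw_dist = math.dist(a, b)

# Min-max scale each feature to [0, 1] (assumed ranges: 0-3000 minutes, 0-1 for 3P%)
def scale(p):
    return (p[0] / 3000.0, p[1] / 1.0)

# After scaling, both features influence the distance KNN computes
scaled_dist = math.dist(scale(a), scale(b))
```

Tree-based methods (A, B, D) split on one feature at a time, so monotonic rescaling leaves their splits unchanged.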

  4. B - The elbow method plots within-cluster variance (inertia) against K to find the optimal number of clusters.
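As an illustration of the quantity the elbow method plots, here is a toy inertia (within-cluster sum of squared distances) computation on hypothetical 1-D data; the points and centers are assumed:

```python
# Toy 1-D data with two obvious groups (values assumed for illustration)
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]

def inertia(pts, centers):
    # each point contributes its squared distance to its nearest center
    return sum(min((p - c) ** 2 for c in centers) for p in pts)

# K = 1: a single center (the mean) fits the spread-out data poorly
k1 = inertia(points, [sum(points) / len(points)])
# K = 2: one center per natural group -- inertia drops sharply (the "elbow")
k2 = inertia(points, [1.0, 5.0])
```

Plotting inertia for K = 1, 2, 3, ... and looking for where the curve bends is the elbow method.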

  5. B - High training accuracy with low test accuracy is the classic sign of overfitting.

  6. A - L1 regularization can drive coefficients exactly to zero, performing feature selection.
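The mechanism behind those exact zeros is soft-thresholding, the coefficient update used by coordinate-descent Lasso solvers. A minimal sketch (values illustrative):

```python
# Soft-thresholding: any coefficient whose signal falls within [-lam, lam]
# is set EXACTLY to zero, which is how L1 performs feature selection;
# stronger signals are shrunk toward zero but survive.
def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

weak = soft_threshold(0.3, 0.5)    # weak signal -> exactly 0.0
strong = soft_threshold(2.0, 0.5)  # strong signal -> shrunk, not zeroed
```

L2's quadratic penalty, by contrast, shrinks coefficients smoothly and never produces exact zeros.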

  7. B - With severe class imbalance, a naive model predicting majority class achieves high accuracy.
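A quick sketch with hypothetical labels matching the 98%/2% split shows how misleading accuracy is here:

```python
# 98 negatives, 2 positives -- mirrors the class split in Question 7
labels = [0] * 98 + [1] * 2
preds = [0] * 100  # naive model: always predict the majority class

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
# recall on the positive class: fraction of actual positives found
recall = sum(1 for p, y in zip(preds, labels) if y == 1 and p == 1) / labels.count(1)
```

The naive model scores 98% accuracy while finding zero positives, which is why imbalanced problems call for recall, precision, or AUC instead.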

  8. C - PCA projects data onto lower-dimensional space while preserving variance.

  9. B - Bagging (Bootstrap AGGregatING) trains each tree on a random sample with replacement.

  10. B - Gradient boosting fits each new model to the negative gradient of the loss function (the pseudo-residuals).

  11. B - Cross-validation provides an estimate of how well the model generalizes to unseen data.
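A minimal sketch of the k-fold split underlying cross-validation (index logic written from scratch for illustration):

```python
# Generate k train/test index splits; each observation lands in the test
# fold exactly once, so every point is used for both fitting and evaluation
# across the k rounds.
def kfold_indices(n, k):
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        folds.append((train, test))
        start += size
    return folds

splits = kfold_indices(10, 3)  # fold sizes 4, 3, 3
```

Averaging the test-fold scores across the k rounds gives the generalization estimate.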

  12. B - Neural networks typically require large datasets to learn complex patterns effectively.

  13. B - SHAP (SHapley Additive exPlanations) values explain individual predictions.

  14. B - t-SNE is a visualization technique for high-dimensional data in 2D or 3D.

  15. C - Silhouette score measures how similar points are to their own cluster vs. other clusters.
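For a single point, the silhouette is s = (b - a) / max(a, b), where a is the mean distance to the point's own cluster and b is the mean distance to the nearest other cluster. A toy 1-D sketch (data assumed):

```python
# Silhouette for one point: a = mean distance to its own cluster (the point
# itself excluded), b = mean distance to the nearest other cluster.
def silhouette_point(p, own, other):
    a = sum(abs(p - q) for q in own) / len(own)
    b = sum(abs(p - q) for q in other) / len(other)
    return (b - a) / max(a, b)

# Well-separated clusters -> score near +1
s = silhouette_point(1.0, [1.2, 0.8], [5.0, 5.3])
```

Scores near +1 indicate tight, well-separated clusters; scores near 0 or below suggest the point sits between clusters or was misassigned.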

  16. B - RNNs are designed for sequential data where order matters (e.g., play sequences).

  17. B - Feature engineering creates new informative features from raw data.

  18. B - Hyperparameters are typically tuned using grid search, random search, or Bayesian optimization.

  19. A - Ensembles combine multiple models, typically reducing variance and improving generalization.

  20. B - Bias-variance tradeoff balances model simplicity (high bias) vs. flexibility (high variance).

Short Answer Rubric

Question 21 - Should mention: (1) limited draft data makes neural networks prone to overfitting, (2) random forests handle small datasets better, (3) feature importance aids interpretation for scouts, (4) similar performance with less complexity.

Question 22 - Should cover: (1) features like scoring volume, shot distribution, defensive stats, (2) K-means or GMM algorithms, (3) elbow/silhouette methods for K selection, (4) validation through expert review and cluster stability analysis.

Question 23 - Should explain: (1) correlation doesn't imply causation, (2) likely confounding variables (age, culture, position), (3) model captures correlation, not causal mechanism, (4) using this for decisions would be inappropriate.

Question 24 - Should discuss: (1) precision, recall, and F1 score as better metrics, (2) missing 60% of injuries means low recall, (3) for medical predictions, false negatives are costly, (4) should optimize for recall/sensitivity.
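For reference, here is how those metrics separate on a hypothetical confusion matrix consistent with the Question 24 scenario (counts assumed: 100 players, 10 true injuries, the model catches 4):

```python
# Assumed confusion-matrix counts matching the Q24 scenario
tp, fn = 4, 6     # 10 actual injuries; 6 missed (60% of injuries)
fp, tn = 4, 86    # chosen so overall accuracy works out to 90%

accuracy = (tp + tn) / (tp + tn + fp + fn)          # looks fine at 0.90
recall = tp / (tp + fn)                             # 0.40 -- the real problem
precision = tp / (tp + fp)                          # 0.50
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```

Accuracy hides the failure because the 90 healthy players dominate the count; recall exposes it directly.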

Question 25 - Examples include: (1) predicting PER from box score stats (interpretable coefficients), (2) small sample sizes where complex models overfit, (3) when coefficient interpretation is the goal, (4) baseline models before adding complexity.