Chapter 22: Quiz - Machine Learning Applications

Instructions

Choose the best answer for each question. Questions cover ML fundamentals, supervised learning, ensemble methods, clustering, neural networks, and evaluation.


Section 1: Machine Learning Fundamentals (Questions 1-8)

Question 1

Which ML task type is appropriate for predicting whether a team will win or lose?

A) Regression B) Classification C) Clustering D) Dimensionality reduction

Question 2

The main advantage of machine learning over traditional statistics for football analytics is:

A) ML requires less data B) ML automatically discovers complex patterns and interactions C) ML models are always more accurate D) ML doesn't require domain knowledge

Question 3

Data leakage in football ML occurs when:

A) The model uses too many features B) Future information is used to predict past outcomes C) The training set is too small D) The model is overfit

Question 4

For game outcome prediction, which train/test split strategy is most appropriate?

A) Random 80/20 split B) Stratified random split C) Temporal split (train on past, test on future) D) Leave-one-out cross-validation

Question 5

Which feature would likely cause data leakage for predicting game outcomes?

A) Home team's Elo rating before the game B) Season-ending team record C) Opponent's defensive ranking entering the game D) Home field advantage indicator

Question 6

The curse of dimensionality in football ML refers to:

A) Having too many games to process B) Performance degradation when there are too many features relative to samples C) Difficulty in storing large datasets D) Slow model training times

Question 7

Feature engineering in football analytics:

A) Is unnecessary with modern ML algorithms B) Converts raw data into meaningful predictive signals C) Always improves model performance D) Should be avoided to prevent overfitting

Question 8

Which statement about football ML is TRUE?

A) More complex models always perform better B) Ensemble methods never outperform single models C) Domain knowledge should guide feature creation D) Neural networks are always the best choice


Section 2: Supervised Learning (Questions 9-17)

Question 9

Logistic regression is appropriate for game prediction because:

A) It outputs exact point spreads B) It naturally produces probabilities between 0 and 1 C) It requires the least amount of data D) It captures all non-linear relationships

Question 10

Random Forest classifiers work by:

A) Training a single decision tree with many features B) Training multiple trees on random subsets of data and features C) Using random weights for each prediction D) Randomly selecting which games to predict

Question 11

Gradient boosting improves on random forests by:

A) Training trees in parallel rather than sequence B) Sequentially training trees to correct previous errors C) Using larger trees D) Requiring less hyperparameter tuning

Question 12

The learning rate parameter in XGBoost controls:

A) How fast the model makes predictions B) How much each tree contributes to the final prediction C) The number of trees to train D) The depth of each tree

Question 13

Which XGBoost parameter helps prevent overfitting?

A) n_estimators B) objective C) max_depth D) random_state

Question 14

For predicting point spreads (continuous), which model type is appropriate?

A) Logistic regression B) Random forest classifier C) Gradient boosting regressor D) K-means clustering

Question 15

Early stopping in model training:

A) Stops training when validation error starts increasing B) Stops training after a fixed number of epochs C) Stops training when accuracy reaches 100% D) Is only used for neural networks

Question 16

The subsample parameter in XGBoost:

A) Determines test set size B) Controls fraction of samples used per tree C) Sets the number of features to use D) Defines output sample rate

Question 17

Which regularization approach is used by Ridge regression?

A) L1 (sum of absolute coefficients) B) L2 (sum of squared coefficients) C) Dropout D) Early stopping


Section 3: Ensemble Methods (Questions 18-24)

Question 18

Voting ensembles combine multiple models by:

A) Training one model on predictions of others B) Averaging or voting on individual model predictions C) Selecting the best-performing model D) Concatenating model outputs

Question 19

Stacking differs from voting by:

A) Using fewer base models B) Training a meta-learner on base model predictions C) Only working with neural networks D) Requiring identical base models

Question 20

For soft voting, ensemble predictions are calculated using:

A) Majority vote of predicted classes B) Average of predicted probabilities C) Maximum probability across models D) Minimum probability across models

Question 21

When building weighted ensembles, weights should be based on:

A) Model complexity B) Training time C) Validation set performance D) Random assignment

Question 22

The primary benefit of ensemble methods is:

A) Faster training time B) Better interpretability C) Reduced variance and improved generalization D) Simpler model architecture

Question 23

Which statement about ensemble diversity is TRUE?

A) All base models should use the same algorithm B) Diversity among base models improves ensemble performance C) Only two models are needed for effective ensembles D) Correlated models make better ensembles

Question 24

A stacking ensemble with passthrough=True:

A) Skips the meta-learner B) Includes original features with base model predictions C) Uses base models in parallel only D) Prevents overfitting automatically


Section 4: Unsupervised Learning (Questions 25-29)

Question 25

K-means clustering requires:

A) Labeled training data B) Pre-specified number of clusters C) Normally distributed features D) Binary outcomes

Question 26

The silhouette score measures:

A) Number of clusters B) How well samples fit their assigned cluster vs. others C) Training time D) Feature importance

Question 27

For discovering player archetypes, which technique is appropriate?

A) Logistic regression B) K-means or GMM clustering C) Linear regression D) Decision trees

Question 28

The elbow method helps determine:

A) Optimal regularization strength B) Optimal number of clusters C) Optimal learning rate D) Optimal train/test split

Question 29

Before clustering player statistics, you should:

A) Remove all outliers B) Standardize/scale features C) Use raw statistics directly D) Reduce to one feature


Section 5: Neural Networks (Questions 30-33)

Question 30

In a feed-forward neural network for classification:

A) The output layer uses linear activation B) The output layer uses sigmoid or softmax activation C) Hidden layers use sigmoid activation D) No activation functions are used

Question 31

Dropout regularization:

A) Removes features during training B) Randomly deactivates neurons during training C) Reduces the number of layers D) Only applies to recurrent networks

Question 32

Batch normalization in neural networks:

A) Reduces batch size B) Normalizes layer inputs to stabilize training C) Is only used in image models D) Replaces activation functions

Question 33

LSTM networks are appropriate for football applications involving:

A) Single game prediction B) Sequential data like play-by-play C) Player clustering D) Feature selection


Section 6: Model Evaluation (Questions 34-40)

Question 34

The Brier score for probabilistic predictions:

A) Ranges from -1 to 1 B) Is the mean squared error between predictions and outcomes C) Should be maximized D) Only applies to regression

Question 35

A model with AUC = 0.85 means:

A) It predicts 85% of games correctly B) It ranks a randomly chosen positive example above a randomly chosen negative one 85% of the time C) 85% of predictions are calibrated D) The false positive rate is 15%

Question 36

Expected Calibration Error (ECE) measures:

A) How well probabilities match observed frequencies B) Model training efficiency C) Feature correlation D) Overfitting degree

Question 37

A calibrated model predicting 70% win probability should:

A) Always be correct B) Win about 70% of the time across similar predictions C) Never be wrong D) Have exactly 70% accuracy overall

Question 38

When comparing models, you should primarily consider:

A) Training speed B) Performance on held-out test data C) Number of parameters D) Model popularity

Question 39

For football game prediction, which metric combination is most informative?

A) Accuracy only B) Accuracy, AUC, and Brier score C) Training loss only D) F1 score only

Question 40

Isotonic regression calibration:

A) Retrains the entire model B) Maps raw predictions to better-calibrated probabilities C) Always improves accuracy D) Requires labeled clusters


Answer Key

Section 1: Fundamentals

  1. B - Classification (binary outcome)
  2. B - Automatic pattern discovery
  3. B - Future information used for past predictions
  4. C - Temporal split preserves time ordering
  5. B - Season-ending record uses future information
  6. B - Too many features relative to samples
  7. B - Converts raw data to predictive signals
  8. C - Domain knowledge guides feature creation
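The split and leakage answers (Questions 3-5) can be sketched in a few lines. This is a minimal illustration with a made-up schedule; the `season`/`week` dicts stand in for a real game table.

```python
# Hedged sketch of a temporal split (Q4): train strictly on past seasons,
# test on the most recent one. A random 80/20 split would let the model
# "see the future" -- the leakage described in Q3 and Q5.
games = [{"season": s, "week": w} for s in range(2018, 2024) for w in range(1, 18)]

train = [g for g in games if g["season"] < 2023]   # past seasons only
test = [g for g in games if g["season"] == 2023]   # held-out future

# No training game comes after any test game.
assert max(g["season"] for g in train) < min(g["season"] for g in test)
```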

Section 2: Supervised Learning

  9. B - Outputs probabilities between 0 and 1
  10. B - Multiple trees on random subsets
  11. B - Sequential training to correct errors
  12. B - Controls each tree's contribution
  13. C - max_depth limits tree complexity
  14. C - Gradient boosting regressor for continuous targets
  15. A - Stops when validation error increases
  16. B - Fraction of samples per tree
  17. B - L2 regularization
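The boosting answers (Questions 11-12) can be demonstrated with a dependency-free toy: each stage fits the current residuals (here just their mean, standing in for a real tree) and is shrunk by the learning rate before joining the running prediction. The targets are invented for illustration.

```python
# Toy gradient boosting (Q11, Q12): stages are trained sequentially on the
# residuals of everything before them, scaled by the learning rate.
y = [3.0, 5.0, 7.0, 9.0]          # toy point-spread targets
learning_rate = 0.5               # each stage's contribution (Q12)
pred = [0.0] * len(y)

for _ in range(20):               # n_estimators: sequential stages
    residuals = [yi - pi for yi, pi in zip(y, pred)]
    stage = sum(residuals) / len(residuals)        # trivial "tree"
    pred = [pi + learning_rate * stage for pi in pred]
# pred converges toward the best constant fit, mean(y) = 6.0
```

A smaller learning rate would need more stages to converge, which is exactly the trade-off tuned in XGBoost.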

Section 3: Ensemble Methods

  18. B - Averaging or voting on predictions
  19. B - Trains meta-learner on base predictions
  20. B - Average of probabilities
  21. C - Based on validation performance
  22. C - Reduced variance, better generalization
  23. B - Diversity improves performance
  24. B - Includes original features with predictions
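Question 20's distinction is easy to see in code: soft voting averages class probabilities, which can disagree with a majority vote of predicted labels. The model probabilities below are invented to force that disagreement.

```python
# Hedged sketch of soft vs. hard voting (Q20).
model_probs = [
    [0.90, 0.10],   # model 1: P(home win), P(home loss)
    [0.40, 0.60],   # model 2
    [0.45, 0.55],   # model 3
]

n = len(model_probs)
soft = [sum(p[c] for p in model_probs) / n for c in range(2)]
soft_pick = max(range(2), key=lambda c: soft[c])          # argmax of averaged probs

hard_votes = [max(range(2), key=lambda c: p[c]) for p in model_probs]
hard_pick = max(set(hard_votes), key=hard_votes.count)    # majority label
# soft_pick == 0 (confident model 1 dominates), hard_pick == 1 (two votes)
```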

Section 4: Unsupervised Learning

  25. B - Pre-specified number of clusters
  26. B - Cluster fit quality measure
  27. B - K-means or GMM clustering
  28. B - Optimal number of clusters
  29. B - Standardize/scale features
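Question 29's preprocessing step is a short sketch: z-score each stat before clustering so large-scale features (passing yards) don't dominate small-scale ones (touchdowns) in Euclidean distance. The player values are made up for illustration.

```python
# Hedged sketch for Q29: standardize features before K-means.
import statistics

passing_yards = [4000.0, 3500.0, 4800.0, 2900.0]
touchdowns = [30.0, 22.0, 38.0, 18.0]

def standardize(col):
    mu, sd = statistics.mean(col), statistics.pstdev(col)
    return [(x - mu) / sd for x in col]

scaled = [standardize(col) for col in (passing_yards, touchdowns)]
# Each scaled column now has mean ~0 and standard deviation ~1.
```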

Section 5: Neural Networks

  30. B - Sigmoid or softmax for classification
  31. B - Randomly deactivates neurons
  32. B - Normalizes layer inputs
  33. B - Sequential data like play-by-play
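Question 31's mechanism can be sketched directly. This is the "inverted" dropout variant: during training each activation is zeroed with probability p and survivors are scaled by 1/(1-p) so the expected activation is unchanged, while inference leaves activations untouched.

```python
# Hedged sketch of inverted dropout (Q31).
import random

def dropout(activations, p, training, rng):
    if not training:                 # inference: no neurons dropped
        return list(activations)
    keep = 1.0 - p
    # Keep each neuron with probability (1 - p), scaling survivors by 1/keep.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
out = dropout([1.0] * 1000, p=0.5, training=True, rng=rng)
# Roughly half the activations are 0.0; survivors are scaled to 2.0.
```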

Section 6: Model Evaluation

  34. B - Mean squared error of predictions
  35. B - Ranks a random positive higher 85% of the time
  36. A - Probability-frequency alignment
  37. B - Wins about 70% across similar predictions
  38. B - Performance on held-out test data
  39. B - Accuracy, AUC, and Brier score
  40. B - Maps to better-calibrated probabilities
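Question 34's definition fits in one line: the Brier score is the mean squared error between predicted win probabilities and binary outcomes, so lower is better (always predicting 0.5 scores exactly 0.25). The predictions below are illustrative values, not real games.

```python
# Hedged sketch of the Brier score (Q34).
probs = [0.9, 0.7, 0.8, 0.3]     # predicted home-win probabilities
outcomes = [1, 1, 0, 0]          # actual results (1 = home win)

brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
# brier == (0.01 + 0.09 + 0.64 + 0.09) / 4 == 0.2075
```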

Scoring Guide

  • 35-40 correct: Excellent! Ready for production ML systems
  • 28-34 correct: Good understanding, review specific areas
  • 21-27 correct: Solid foundation, more practice needed
  • Below 21: Review chapter material before proceeding