Chapter 7: Quiz

Instructions

This quiz assesses your understanding of Expected Goals (xG) models covered in Chapter 7. Select the best answer for each question. Answers are provided at the end.


Section A: Fundamentals (Questions 1-8)

Question 1

What does an xG value of 0.15 for a shot represent?

A) The shot was taken from 15 meters away
B) 15% of similar shots historically resulted in goals
C) The player has a 15% career conversion rate
D) The team created 0.15 total expected goals in the match


Question 2

Which statement best describes why xG was developed?

A) To replace all existing soccer statistics
B) To quantify chance quality and reduce noise from goal-scoring randomness
C) To eliminate the need for match observation
D) To predict exactly how many goals will be scored


Question 3

In most xG models, which feature has the highest predictive importance?

A) Body part used
B) Time remaining in match
C) Distance to goal
D) Assist type


Question 4

A player has scored 15 goals from 12.5 xG over a season. Which interpretation is most appropriate?

A) The player is definitively an elite finisher
B) The player has overperformed xG and may regress toward expected values
C) The xG model is poorly calibrated
D) The player's xG value should be adjusted upward


Question 5

Why are headers generally assigned lower xG than foot shots from similar positions?

A) Headers are taken from farther distances on average
B) Headers are inherently harder to direct accurately
C) Goalkeepers are better at saving headers
D) Data providers penalize aerial play


Question 6

What is the primary mathematical technique used in basic xG models?

A) Linear regression
B) Logistic regression
C) K-means clustering
D) Random forest classification


Question 7

Match A: Team wins 3-1 with xG of 1.2 vs 0.9
Match B: Team wins 3-1 with xG of 2.8 vs 1.3

Which statement is most accurate?

A) Both wins are equally dominant
B) Match A shows more clinical finishing; Match B shows better chance creation
C) xG cannot distinguish between these matches
D) Match B was luckier


Question 8

A penalty kick is typically assigned an xG of approximately:

A) 0.50
B) 0.65
C) 0.76
D) 0.95


Section B: Model Building and Evaluation (Questions 9-15)

Question 9

If an xG model has a log loss of 0.28, what does this indicate?

A) The model is poorly calibrated
B) The model has reasonable predictive performance for xG
C) 28% of shots are predicted incorrectly
D) The model requires more training data


Question 10

A calibration curve shows predicted probabilities on the x-axis and actual outcome rates on the y-axis. A well-calibrated model should:

A) Have a steep upward slope
B) Follow the diagonal line y = x closely
C) Show a horizontal line
D) Minimize the area under the curve


Question 11

ROC AUC measures:

A) The accuracy of probability estimates
B) The model's ability to rank shots by quality
C) The total variance explained
D) The number of correct goal predictions


Question 12

When training an xG model, why should you use cross-validation?

A) To maximize the training set size
B) To ensure reliable performance estimates and detect overfitting
C) To eliminate the need for a test set
D) To improve feature importance calculations


Question 13

Which evaluation scenario indicates poor model generalization?

A) Similar log loss on training and validation sets
B) Higher ROC AUC on the test set than training set
C) Significantly higher log loss on validation set than training set
D) Calibration curve close to the diagonal


Question 14

A gradient boosting xG model shows feature importances of: distance (0.35), angle (0.25), body part (0.15), x-coordinate (0.12), y-coordinate (0.08), shot type (0.05). What does this suggest?

A) Only distance and angle should be used
B) Location-based features dominate, but other factors contribute meaningfully
C) The model is overfitting to body part
D) Shot type should be removed from the model


Question 15

What is the Brier score?

A) The area under the ROC curve
B) The mean squared error between predictions and outcomes
C) The negative log probability of outcomes
D) The correlation between xG and goals


Section C: Interpretation and Application (Questions 16-22)

Question 16

A team's season totals show 55 goals from 62.4 xG. Which conclusion is best supported?

A) The team has poor finishing
B) The team may have experienced some bad luck in front of goal
C) Their xG model is overestimating chance quality
D) The team should replace their strikers


Question 17

Expected Assists (xA) measures:

A) The number of assists a player should have based on historical rates
B) The xG of shots a player creates for teammates
C) The quality of passes received before shooting
D) The probability of a pass being completed


Question 18

Post-shot xG (PSxG) differs from pre-shot xG by incorporating:

A) Player finishing skill
B) Shot placement within the goal frame
C) Defensive pressure information
D) Goalkeeper positioning data


Question 19

Which application is LEAST appropriate for xG analysis?

A) Evaluating whether a team's recent form is sustainable
B) Comparing finishing skill between two players with 15 shots each
C) Assessing which team created better chances in a match
D) Identifying teams that may regress in goal-scoring


Question 20

A goalkeeper faces 120 shots totaling 15.0 PSxG and concedes 12 goals. Their "goals prevented" metric equals:

A) -3.0 (prevented 3 goals)
B) +3.0 (conceded 3 more than expected)
C) 14.0 (total goals conceded)
D) 0.80 (conversion rate)


Question 21

When using xG for match prediction via Poisson simulation, what assumption is typically made?

A) Goals follow a normal distribution
B) The two teams' goals are independent events
C) All shots have equal xG
D) Home advantage doesn't exist


Question 22

A striker has consistently outperformed xG by 20% over 5 seasons. This most strongly suggests:

A) The xG model is broken for this player
B) Random variation over 5 seasons
C) Genuine above-average finishing skill
D) The player only takes easy shots


Section D: Limitations and Critical Thinking (Questions 23-30)

Question 23

Different xG providers (StatsBomb, Opta, Understat) often produce different xG values for the same shot because:

A) They use different measurement units
B) They have different feature sets, training data, and methodologies
C) Some providers are consistently wrong
D) They apply different rounding rules


Question 24

Which factor is typically NOT included in standard event-data xG models?

A) Distance to goal
B) Body part used
C) Number of defenders between shooter and goal
D) Shot type (volley, placed, etc.)


Question 25

A manager claims their team's xG is misleading because "we only take shots when we know we can score." This criticism:

A) Is completely valid and invalidates xG for their team
B) Conflates shot selection with finishing skill
C) Suggests the team should shoot more often
D) Indicates the xG model needs recalibration


Question 26

Why might a team's actual points differ significantly from their "expected points" based on xG?

A) Set pieces aren't captured well by xG
B) Variance in conversion rates, own goals, red cards, and factors outside xG
C) The team plays in a league with different rules
D) xG cannot be converted to points


Question 27

When communicating xG to a general audience, which practice is most appropriate?

A) Report xG to three decimal places for precision
B) Always state that xG perfectly predicts goal outcomes
C) Round to one decimal and acknowledge uncertainty
D) Avoid mentioning limitations to prevent confusion


Question 28

A model trained on Premier League data is applied to MLS matches. Performance decreases. The most likely explanation is:

A) MLS uses different goal dimensions
B) Different playing styles, refereeing, and pitch conditions affect generalization
C) The model has too many features
D) MLS goalkeepers are better


Question 29

Which statement about xG variance is most accurate?

A) A single shot with 0.30 xG will convert exactly 30% of the time
B) Over a large sample, actual goals converge toward total xG
C) High-xG shots always convert; low-xG shots never do
D) xG variance decreases for individual shots


Question 30

The "regression to the mean" principle in xG analysis suggests that:

A) All players will eventually have identical xG values
B) Extreme over/underperformance tends to normalize over time
C) Mean xG increases over a season
D) Regression models should always be used instead of classification


Answer Key

  1. B - xG represents the probability of a goal based on historical conversion rates for similar shots
  2. B - xG quantifies chance quality to reduce noise from goal-scoring randomness
  3. C - Distance to goal is typically the most important feature
  4. B - Slight overperformance suggests potential regression, not definitive finishing skill
  5. B - Headers are harder to direct accurately due to the body position required
  6. B - Logistic regression is the standard technique for probability prediction
  7. B - Match A shows clinical finishing; Match B shows better chance creation
  8. C - Penalties convert at approximately 76% (some models use slightly different values)
  9. B - Log loss of 0.28 indicates reasonable performance (baseline is ~0.35)
  10. B - Perfect calibration means the calibration curve follows y = x
  11. B - ROC AUC measures discriminative ability (ranking shots by quality)
  12. B - Cross-validation provides reliable estimates and detects overfitting
  13. C - Much higher validation loss indicates overfitting
  14. B - Location features dominate but other features contribute meaningful information
  15. B - Brier score is the mean squared error between predictions and binary outcomes
  16. B - Underperforming xG by about 7 goals (62.4 - 55 = 7.4) suggests some bad luck
  17. B - xA is the sum of xG from shots a player created
  18. B - PSxG incorporates where the shot was placed within the goal frame
  19. B - 15 shots is far too small for reliable finishing skill comparison
  20. A - Goals prevented = PSxG - goals conceded, which gives 3.0 goals prevented (option A writes this as -3.0, i.e. goals conceded relative to PSxG)
  21. B - Poisson simulation assumes independence between teams' goal-scoring
  22. C - Five seasons of consistent overperformance provide strong evidence of genuine finishing skill
  23. B - Providers differ in features, training data, and methodology
  24. C - Defender positions require tracking data, not available in standard event data
  25. B - The criticism conflates shot selection (choosing when to shoot) with finishing skill
  26. B - Many factors outside xG affect actual outcomes
  27. C - Round appropriately and acknowledge uncertainty for general audiences
  28. B - Different leagues have different styles affecting model transfer
  29. B - Law of large numbers: actual goals converge to expected over many shots
  30. B - Extreme performance (over/under) tends to normalize over larger samples
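
Several answers above rest on short calculations (log loss in 9, Brier score in 15, goals prevented in 20, independent Poisson goals in 21). The sketch below is one way to check that arithmetic by hand. It is not the chapter's own code: the function names, the toy shot data, and the 10% base conversion rate are all assumptions chosen for illustration.

```python
import math


def log_loss(probs, outcomes):
    """Mean negative log-likelihood of binary outcomes (1 = goal, 0 = no goal)."""
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for p, y in zip(probs, outcomes)
    )
    return -total / len(probs)


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and binary outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def goals_prevented(psxg_faced, goals_conceded):
    """Positive values mean the goalkeeper conceded fewer goals than expected."""
    return psxg_faced - goals_conceded


def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals are Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def win_draw_loss(xg_for, xg_against, max_goals=10):
    """Outcome probabilities, assuming the two teams' goal counts are
    independent Poisson variables with means equal to their xG totals.
    Scorelines above max_goals are ignored (a negligible tail for typical xG)."""
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, xg_for) * poisson_pmf(j, xg_against)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Hypothetical toy data: 10 shots, 1 goal, every shot predicted at 0.10.
outcomes = [1] + [0] * 9
constant_baseline = [0.10] * 10

print(round(log_loss(constant_baseline, outcomes), 3))     # 0.325
print(round(brier_score(constant_baseline, outcomes), 3))  # 0.09
print(goals_prevented(20.0, 18))                           # 2.0 (made-up figures)
print(win_draw_loss(1.5, 1.0))                             # (win, draw, loss)
```

Under these assumptions, a model that predicts the base rate for every shot scores a log loss of about 0.325 at a 10% conversion rate, and about 0.347 at 11%, which is in line with the ~0.35 baseline quoted in answer 9. A model at 0.28 is therefore adding information beyond the base rate.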

Scoring Guide

Score    Performance Level
27-30    Excellent - mastery of xG concepts
23-26    Good - solid understanding with minor gaps
18-22    Satisfactory - core concepts understood, review advanced topics
13-17    Needs Improvement - review chapter before proceeding
0-12     Insufficient - reread chapter and complete exercises