Chapter 7: Quiz
Instructions
This quiz assesses your understanding of Expected Goals (xG) models covered in Chapter 7. Select the best answer for each question. Answers are provided at the end.
Section A: Fundamentals (Questions 1-8)
Question 1
What does an xG value of 0.15 for a shot represent?
A) The shot was taken from 15 meters away
B) 15% of similar shots historically resulted in goals
C) The player has a 15% career conversion rate
D) The team created 0.15 total expected goals in the match
Question 2
Which statement best describes why xG was developed?
A) To replace all existing soccer statistics
B) To quantify chance quality and reduce noise from goal-scoring randomness
C) To eliminate the need for match observation
D) To predict exactly how many goals will be scored
Question 3
In most xG models, which feature has the highest predictive importance?
A) Body part used
B) Time remaining in match
C) Distance to goal
D) Assist type
Question 4
A player has scored 12 goals from 11.5 xG over a season. Which interpretation is most appropriate?
A) The player is definitively an elite finisher
B) The player has overperformed xG and may regress toward expected values
C) The xG model is poorly calibrated
D) The player's xG value should be adjusted upward
Question 5
Why are headers generally assigned lower xG than foot shots from similar positions?
A) Headers are taken from farther distances on average
B) Headers are inherently harder to direct accurately
C) Goalkeepers are better at saving headers
D) Data providers penalize aerial play
Question 6
What is the primary mathematical technique used in basic xG models?
A) Linear regression
B) Logistic regression
C) K-means clustering
D) Random forest classification
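To illustrate how a simple model can turn shot features into a goal probability, here is a minimal sketch that passes distance and goal-mouth angle through a logistic function. The coefficients, the coordinate convention (meters out from the goal line and meters from the goal's center), and the names `goal_angle` and `shot_xg` are hypothetical. A real model would estimate its coefficients from historical shot outcomes.

```python
import math

# Hypothetical coefficients for illustration only; a real model would fit
# these to historical shot outcomes (e.g., by maximum likelihood).
INTERCEPT = -1.0
COEF_DISTANCE = -0.10  # per meter: farther shots get lower xG
COEF_ANGLE = 1.2       # per radian of visible goal mouth

GOAL_WIDTH = 7.32  # meters between the posts


def goal_angle(x: float, y: float) -> float:
    """Angle (radians) subtended by the goal mouth from a shot at (x, y).

    Assumed convention: x = meters out from the goal line (x > 0),
    y = meters left/right of the goal's center.
    """
    half = GOAL_WIDTH / 2
    return math.atan2(GOAL_WIDTH * x, x * x + y * y - half * half)


def shot_xg(x: float, y: float) -> float:
    """Goal probability from a logistic function of distance and angle."""
    distance = math.hypot(x, y)
    z = INTERCEPT + COEF_DISTANCE * distance + COEF_ANGLE * goal_angle(x, y)
    return 1.0 / (1.0 + math.exp(-z))


# Penalty-spot distance, same distance but wide, and a central long shot.
for x, y in [(11.0, 0.0), (11.0, 15.0), (25.0, 0.0)]:
    print(f"shot at ({x}, {y}): xG = {shot_xg(x, y):.2f}")
```

The logistic link keeps every output strictly between 0 and 1, so it can be read directly as a probability.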
Question 7
Match A: Team wins 3-1 with xG of 1.2 vs 0.9
Match B: Team wins 3-1 with xG of 2.8 vs 1.3
Which statement is most accurate?
A) Both wins are equally dominant
B) Match A shows more clinical finishing; Match B shows better chance creation
C) xG cannot distinguish between these matches
D) Match B was luckier
Question 8
A penalty kick is typically assigned an xG of approximately:
A) 0.50
B) 0.65
C) 0.76
D) 0.95
Section B: Model Building and Evaluation (Questions 9-15)
Question 9
If an xG model has a log loss of 0.28, what does this indicate?
A) The model is poorly calibrated
B) The model has reasonable predictive performance for xG
C) 28% of shots are predicted incorrectly
D) The model requires more training data
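A log loss value only means something relative to a baseline. The minimal sketch below computes log loss directly and compares it with the log loss of a naive model that predicts the average conversion rate for every shot. The base rates used (9% to 11%) are illustrative assumptions, not figures from the chapter.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes (1 = goal, 0 = no goal)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def baseline_log_loss(base_rate):
    """Log loss of predicting base_rate for every shot, on data whose
    actual goal rate equals base_rate."""
    p = base_rate
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


for rate in (0.09, 0.10, 0.11):
    print(f"base rate {rate:.2f}: baseline log loss {baseline_log_loss(rate):.3f}")
```

With these base rates the baseline comes out at about 0.303, 0.325, and 0.347, so a model needs to sit clearly below the relevant baseline before it is adding information.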
Question 10
A calibration curve shows predicted probabilities on the x-axis and actual outcome rates on the y-axis. A well-calibrated model should:
A) Have a steep upward slope
B) Follow the diagonal line y = x closely
C) Show a horizontal line
D) Minimize the area under the curve
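The calibration curve described in this question can be built by binning shots by predicted xG and comparing each bin's mean prediction with its observed goal rate. Below is a minimal standard-library sketch using equal-width bins. The function name and the choice of ten bins are assumptions. scikit-learn's `sklearn.calibration.calibration_curve` with `strategy="uniform"` performs a similar binning.

```python
def calibration_table(y_true, y_prob, n_bins=10):
    """Group shots into equal-width probability bins and return, for each
    non-empty bin, (mean predicted xG, observed goal rate, shot count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    rows = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            obs_rate = sum(y for y, _ in b) / len(b)
            rows.append((mean_pred, obs_rate, len(b)))
    return rows


# Toy data: ten 0.10-xG shots (1 goal) and ten 0.70-xG shots (7 goals).
outcomes = [0] * 9 + [1] + [1] * 7 + [0] * 3
xg = [0.1] * 10 + [0.7] * 10
for mean_pred, obs_rate, n in calibration_table(outcomes, xg):
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  (n={n})")
```

Plotting observed rate against mean prediction for each row gives the calibration curve. On real data, sparsely populated high-xG bins are noisy, which is why bin counts are worth reporting alongside the curve.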
Question 11
ROC AUC measures:
A) The accuracy of probability estimates
B) The model's ability to rank shots by quality
C) The total variance explained
D) The number of correct goal predictions
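The sketch below computes ROC AUC from its pairwise definition: the share of (goal, non-goal) pairs in which the goal received the higher xG, with ties counted as half. It is a brute-force version (cost proportional to goals times non-goals) intended only for illustration. The shot data are made up.

```python
def roc_auc(y_true, y_prob):
    """ROC AUC via its pairwise definition: the share of (goal, non-goal)
    pairs in which the goal has the higher xG, counting ties as half."""
    goals = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_goals = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not goals or not non_goals:
        raise ValueError("need at least one goal and one non-goal")
    score = 0.0
    for g in goals:
        for n in non_goals:
            if g > n:
                score += 1.0
            elif g == n:
                score += 0.5
    return score / (len(goals) * len(non_goals))


outcomes = [1, 0, 0, 1, 0]
xg = [0.40, 0.05, 0.30, 0.20, 0.10]
print(roc_auc(outcomes, xg))
# Halving every prediction ruins calibration but keeps the ordering,
# so the AUC is unchanged.
print(roc_auc(outcomes, [p / 2 for p in xg]))
```

Because only the ordering of predictions matters, AUC is insensitive to any monotone rescaling of the probabilities.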
Question 12
When training an xG model, why should you use cross-validation?
A) To maximize the training set size
B) To ensure reliable performance estimates and detect overfitting
C) To eliminate the need for a test set
D) To improve feature importance calculations
Question 13
Which evaluation scenario indicates poor model generalization?
A) Similar log loss on the training and validation sets
B) Higher ROC AUC on the test set than on the training set
C) Significantly higher log loss on the validation set than on the training set
D) Calibration curve close to the diagonal
Question 14
A gradient boosting xG model reports the following feature importances: distance (0.35), angle (0.25), body part (0.15), x-coordinate (0.12), y-coordinate (0.08), shot type (0.05). What does this suggest?
A) Only distance and angle should be used
B) Location-based features dominate, but other factors contribute meaningfully
C) The model is overfitting to body part
D) Shot type should be removed from the model
Question 15
What is Brier score?
A) The area under the ROC curve
B) The mean squared error between predictions and outcomes
C) The negative log probability of outcomes
D) The correlation between xG and goals
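As a worked example, the function below computes the Brier score for three hypothetical shots. Lower is better, and a score of 0 would require assigning probability 1 to every goal and 0 to every miss. The shot values are illustrative and do not come from the chapter.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted xG and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Three hypothetical shots: a scored penalty, a missed long shot, a scored 0.30 chance.
outcomes = [1, 0, 1]
xg = [0.76, 0.05, 0.30]
print(round(brier_score(outcomes, xg), 4))
```

The scored 0.30 chance contributes most of the error, (0.30 - 1)^2 = 0.49. This shows how unlikely goals weigh on the score.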
Section C: Interpretation and Application (Questions 16-22)
Question 16
A team's season totals show 55 goals from 62.4 xG. Which conclusion is best supported?
A) The team has poor finishing
B) The team may have experienced some bad luck in front of goal
C) The xG model is overestimating their chance quality
D) The team should replace their strikers
Question 17
Expected Assists (xA) measures:
A) The number of assists a player should have based on historical rates
B) The xG of shots a player creates for teammates
C) The quality of passes received before shooting
D) The probability of a pass being completed
Question 18
Post-shot xG (PSxG) differs from pre-shot xG by incorporating:
A) Player finishing skill
B) Shot placement within the goal frame
C) Defensive pressure information
D) Goalkeeper positioning data
Question 19
Which application is LEAST appropriate for xG analysis?
A) Evaluating whether a team's recent form is sustainable
B) Comparing finishing skill between two players with 15 shots each
C) Assessing which team created better chances in a match
D) Identifying teams that may regress in goal-scoring
Question 20
A goalkeeper faces 120 shots totaling 15.0 PSxG and concedes 12 goals. Their "goals prevented" metric equals:
A) +3.0 (prevented 3 goals)
B) -3.0 (conceded 3 more than expected)
C) 12.0 (total goals conceded)
D) 0.80 (ratio of goals conceded to PSxG)
Question 21
When using xG for match prediction via Poisson simulation, what assumption is typically made?
A) Goals follow a normal distribution
B) The two teams' goals are independent events
C) All shots have equal xG
D) Home advantage doesn't exist
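Here is a minimal standard-library sketch of the kind of simulation this question describes. Each team's goal count is drawn from a Poisson distribution whose mean is that team's match xG, and the two draws are made separately on each simulated match. Treating match xG as a Poisson mean is a common simplification; a shot-by-shot version would sum one Bernoulli trial per shot instead. The Knuth-style sampler, the function names, the seed, and the simulation count are all illustrative assumptions.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method
    (adequate for the small means typical of match xG)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_match(home_xg, away_xg, n_sims=100_000, seed=0):
    """Estimate (home win, draw, away win) probabilities, treating each
    team's goals as a separate Poisson draw with mean equal to its xG."""
    rng = random.Random(seed)
    home = draw = away = 0
    for _ in range(n_sims):
        h = poisson_sample(home_xg, rng)
        a = poisson_sample(away_xg, rng)  # drawn without reference to h
        if h > a:
            home += 1
        elif h == a:
            draw += 1
        else:
            away += 1
    return home / n_sims, draw / n_sims, away / n_sims


# Match B from Question 7: xG of 2.8 vs 1.3.
print(simulate_match(2.8, 1.3))
```

If a library is available, NumPy's `Generator.poisson` can replace the hand-written sampler and vectorize the loop.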
Question 22
A striker has consistently outperformed xG by 20% over 5 seasons. This most strongly suggests:
A) The xG model is broken for this player
B) Random variation over 5 seasons
C) Genuine above-average finishing skill
D) The player only takes easy shots
Section D: Limitations and Critical Thinking (Questions 23-30)
Question 23
Different xG providers (StatsBomb, Opta, Understat) often produce different xG values for the same shot because:
A) They use different measurement units
B) They have different feature sets, training data, and methodologies
C) Some providers are consistently wrong
D) They apply different rounding rules
Question 24
Which factor is typically NOT included in standard event-data xG models?
A) Distance to goal
B) Body part used
C) Number of defenders between shooter and goal
D) Shot type (volley, placed, etc.)
Question 25
A manager claims their team's xG is misleading because "we only take shots when we know we can score." This criticism:
A) Is completely valid and invalidates xG for their team
B) Conflates shot selection with finishing skill
C) Suggests the team should shoot more often
D) Indicates the xG model needs recalibration
Question 26
Why might a team's actual points differ significantly from their "expected points" based on xG?
A) Set pieces aren't captured well by xG
B) Variance in conversion rates, own goals, red cards, and other factors outside xG
C) The team plays in a league with different rules
D) xG cannot be converted to points
Question 27
When communicating xG to a general audience, which practice is most appropriate?
A) Report xG to three decimal places for precision
B) Always state that xG perfectly predicts goal outcomes
C) Round to one decimal place and acknowledge uncertainty
D) Avoid mentioning limitations to prevent confusion
Question 28
A model trained on Premier League data is applied to MLS matches. Performance decreases. The most likely explanation is:
A) MLS uses different goal dimensions
B) Different playing styles, refereeing, and pitch conditions affect generalization
C) The model has too many features
D) MLS goalkeepers are better
Question 29
Which statement about xG variance is most accurate?
A) A single shot with 0.30 xG will convert exactly 30% of the time
B) Over a large sample, actual goals converge toward total xG
C) High-xG shots always convert; low-xG shots never do
D) xG variance decreases for individual shots
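The simulation below treats every shot as an independent trial that scores with probability equal to its xG. It shows what "converging toward total xG" means in practice: the ratio of goals to xG tightens toward 1 as shots accumulate, while the absolute gap in goals does not shrink to zero. The fixed 0.30 xG per shot, the seed, and the function name are illustrative assumptions.

```python
import random


def simulate_goals(shot_xg, rng):
    """Count goals when each shot scores independently with probability equal to its xG."""
    return sum(1 for p in shot_xg if rng.random() < p)


rng = random.Random(7)  # fixed seed so the run is repeatable
for n_shots in (10, 100, 1_000, 100_000):
    shots = [0.30] * n_shots  # every shot worth 0.30 xG
    goals = simulate_goals(shots, rng)
    total_xg = 0.30 * n_shots
    print(f"{n_shots:>7} shots: {goals:>6} goals vs {total_xg:>8.1f} xG "
          f"(ratio {goals / total_xg:.3f}, gap {goals - total_xg:+.1f})")
```

The typical absolute gap grows roughly with the square root of the number of shots, but the total xG grows linearly, so the relative gap keeps shrinking.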
Question 30
The "regression to the mean" principle in xG analysis suggests that:
A) All players will eventually have identical xG values
B) Extreme over/underperformance tends to normalize over time
C) Mean xG increases over a season
D) Regression models should always be used instead of classification
Answer Key
1. B - xG represents the probability of a goal based on historical conversion rates for similar shots
2. B - xG quantifies chance quality to reduce noise from goal-scoring randomness
3. C - Distance to goal is typically the most important feature
4. B - Slight overperformance suggests potential regression, not definitive finishing skill
5. B - Headers are harder to direct accurately due to the body position required
6. B - Logistic regression is the standard technique for predicting the probability of a binary outcome (goal or no goal)
7. B - Match A shows clinical finishing; Match B shows better chance creation
8. C - Penalties convert at approximately 76% (some models use slightly different values)
9. B - Log loss of 0.28 indicates reasonable performance (a baseline that predicts the average conversion rate for every shot scores roughly 0.33-0.35, depending on that rate)
10. B - Perfect calibration means the calibration curve follows y = x
11. B - ROC AUC measures discriminative ability (ranking shots by quality)
12. B - Cross-validation provides reliable estimates and detects overfitting
13. C - Much higher validation loss than training loss indicates overfitting
14. B - Location features (distance, angle, and coordinates, totaling 0.80) dominate, but other features contribute meaningful information
15. B - Brier score is the mean squared error between predictions and binary outcomes
16. B - Underperforming xG by 7.4 goals suggests some bad luck
17. B - xA is the sum of xG from shots a player created
18. B - PSxG incorporates where the shot was placed within the goal frame
19. B - 15 shots is far too small a sample for a reliable finishing skill comparison
20. A - Goals prevented = PSxG - goals conceded = 15.0 - 12 = +3.0 (3 goals prevented)
21. B - Poisson simulation assumes independence between the two teams' goal-scoring
22. C - Five seasons of consistent outperformance provide strong evidence of genuine finishing skill
23. B - Providers differ in features, training data, and methodology
24. C - Defender positions require tracking data, which standard event data does not include
25. B - The criticism conflates shot selection (choosing when to shoot) with finishing skill; xG already rewards good shot selection through higher per-shot values
26. B - Many factors outside xG affect actual outcomes
27. C - Round to one decimal place and acknowledge uncertainty for general audiences
28. B - Different leagues have different styles affecting model transfer
29. B - Law of large numbers: over many shots, the ratio of goals to xG approaches 1, even though the absolute difference does not shrink to zero
30. B - Extreme performance (over/under) tends to normalize over larger samples
Scoring Guide
| Score | Performance Level |
|---|---|
| 27-30 | Excellent - mastery of xG concepts |
| 23-26 | Good - solid understanding with minor gaps |
| 18-22 | Satisfactory - core concepts understood, review advanced topics |
| 13-17 | Needs Improvement - review chapter before proceeding |
| 0-12 | Insufficient - reread chapter and complete exercises |