Chapter 15 Self-Assessment Quiz

Test your understanding of player performance metrics. Select the best answer for each question. A score of 70% or higher (at least 18 of 25 correct) indicates readiness to proceed to Chapter 16.


1. Which of the following is the standard normalization unit used in soccer analytics for comparing player statistics across different playing times?

(a) Per match (b) Per 90 minutes (c) Per 60 minutes (d) Per season

Answer **(b) Per 90 minutes.** The soccer analytics community has adopted per-90-minute normalization as the standard because it approximates one full match and allows comparison regardless of actual minutes played. See Section 15.3.1.

2. A forward has scored 8 goals in 1,800 minutes. What is their goals per 90?

(a) 0.36 (b) 0.40 (c) 0.44 (d) 0.50

Answer **(b) 0.40.** Goals per 90 = (8 / 1800) * 90 = 0.40. See Section 15.3.1.
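
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `per_90` is an illustrative choice, not something defined in the chapter.

```python
def per_90(total: float, minutes: float) -> float:
    """Convert a raw count into a per-90-minute rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total / minutes * 90


# The forward from Question 2: 8 goals in 1,800 minutes.
goals_per_90 = per_90(8, 1800)
print(f"{goals_per_90:.2f}")  # 0.40
```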

3. What is the primary advantage of PSxG - GA over raw save percentage for evaluating goalkeepers?

(a) It is easier to compute (b) It accounts for the difficulty and placement of shots faced (c) It includes the goalkeeper's distribution ability (d) It penalizes goalkeepers who face fewer shots

Answer **(b) It accounts for the difficulty and placement of shots faced.** PSxG (Post-Shot Expected Goals) considers not just shot location but also the trajectory of the ball, making it a more informative measure of shot-stopping ability than raw save percentage. See Section 15.2.1.

4. Why is a minimum minutes threshold applied when computing per-90 leaderboards?

(a) To exclude goalkeepers from outfield rankings (b) To reduce computational cost (c) To prevent small-sample-size artifacts from inflating rates (d) To comply with league data regulations

Answer **(c) To prevent small-sample-size artifacts from inflating rates.** A player who plays only 45 minutes and scores once would have a per-90 rate of 2.0 goals, which is statistically meaningless. Minimum thresholds ensure rates are based on sufficient data. See Section 15.3.2.
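
A small sketch shows how the threshold changes a leaderboard. The player records and the 900-minute cutoff are assumptions made for illustration; the chapter's own threshold may differ.

```python
# Hypothetical sample data; names and numbers are illustrative only.
players = [
    {"name": "Player A", "goals": 1, "minutes": 45},
    {"name": "Player B", "goals": 15, "minutes": 2700},
    {"name": "Player C", "goals": 8, "minutes": 1800},
]

MIN_MINUTES = 900  # assumed cutoff, not a value taken from the chapter


def goals_per_90(player):
    """Goals per 90 minutes for one player record."""
    return player["goals"] / player["minutes"] * 90


def leaderboard(players, min_minutes):
    """Rank players by goals per 90, keeping those at or above the cutoff."""
    eligible = [p for p in players if p["minutes"] >= min_minutes]
    return sorted(eligible, key=goals_per_90, reverse=True)


unfiltered = leaderboard(players, 0)
filtered = leaderboard(players, MIN_MINUTES)
print([p["name"] for p in unfiltered])  # Player A leads on 45 minutes of data
print([p["name"] for p in filtered])    # Player A no longer appears
```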

5. In Bayesian shrinkage for per-90 metrics, the shrunk estimate is pulled toward:

(a) Zero (b) The player's career average (c) The population average (d) The maximum observed value

Answer **(c) The population average.** Bayesian shrinkage adjusts a player's observed rate toward the population mean; the smaller the player's sample (minutes played), the stronger the pull toward that mean. See Section 15.3.3.

6. Using the Bayesian shrinkage formula with kappa = 900, what weight is assigned to a player's own data if they have played 900 minutes?

(a) 0.25 (b) 0.50 (c) 0.75 (d) 1.00

Answer **(b) 0.50.** The weight w = n / (n + kappa) = 900 / (900 + 900) = 0.50. At kappa minutes, the player's data and the population prior receive equal weight. See Section 15.3.3.
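
The weight calculation can be sketched as follows. The weight formula is the one quoted in the answer; combining the two rates as a weighted average is the usual form of shrinkage and is assumed here rather than copied from Section 15.3.3. The observed rate of 0.60 and population mean of 0.30 are made-up inputs.

```python
def shrinkage_weight(minutes: float, kappa: float) -> float:
    """Weight on the player's own data: w = n / (n + kappa)."""
    return minutes / (minutes + kappa)


def shrunk_rate(observed: float, population_mean: float,
                minutes: float, kappa: float) -> float:
    """Weighted average of observed rate and population mean (assumed form)."""
    w = shrinkage_weight(minutes, kappa)
    return w * observed + (1 - w) * population_mean


w = shrinkage_weight(900, 900)
estimate = shrunk_rate(observed=0.60, population_mean=0.30,
                       minutes=900, kappa=900)
print(w)                  # 0.5
print(f"{estimate:.2f}")  # 0.45, halfway between 0.60 and 0.30
```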

7. According to the general age-performance framework, at what age range do physical attributes (sprint speed, distance covered) typically peak in soccer?

(a) 20-22 (b) 24-26 (c) 28-30 (d) 30-32

Answer **(b) 24-26.** Physical metrics tend to peak earlier than technical or tactical metrics, with sprint speed and endurance typically peaking in the mid-twenties. See Section 15.4.1.

8. What is survivorship bias in the context of aging curves?

(a) Players who score more goals survive longer in the league (b) Only players good enough to maintain contracts are observed at older ages, making the decline appear shallower (c) Younger players are more likely to be injured (d) Teams prefer to sign younger players, biasing the data

Answer **(b) Only players good enough to maintain contracts are observed at older ages, making the decline appear shallower.** Weaker players from a cohort retire or drop to lower leagues, so the observed average at older ages is inflated by a selection effect. See Section 15.4.3.

9. The coefficient of variation (CV) is defined as:

(a) The standard deviation divided by the median (b) The mean divided by the standard deviation (c) The standard deviation divided by the mean (d) The variance divided by the mean

Answer **(c) The standard deviation divided by the mean.** CV = sigma / x_bar. It measures relative variability and is useful for comparing consistency across players with different average performance levels. See Section 15.5.3.
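
As a quick illustration, the sketch below computes CV for two hypothetical players with the same average output. It uses the sample standard deviation from Python's `statistics` module, which is an assumption; the chapter may use the population form instead.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / mu


# Hypothetical match ratings with identical means but different spread.
steady = [6.0, 7.0, 8.0]
streaky = [4.0, 7.0, 10.0]

cv_steady = coefficient_of_variation(steady)
cv_streaky = coefficient_of_variation(streaky)
print(f"{cv_steady:.3f} {cv_streaky:.3f}")  # 0.143 0.429
```

The lower CV marks the more consistent player even though both average 7.0.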

10. What does an EWMA smoothing parameter alpha close to 1.0 produce?

(a) A very smooth, slowly-changing average that emphasizes long-term trends (b) A very responsive average that closely tracks recent observations (c) An average that equally weights all observations (d) An average that ignores the most recent observation

Answer **(b) A very responsive average that closely tracks recent observations.** A high alpha gives more weight to the most recent data point, making the EWMA react quickly to changes. A low alpha produces a smoother estimate. See Section 15.5.2.
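
The effect of alpha can be seen in a short sketch. It assumes the common recursive form of the EWMA, seeded with the first observation; Section 15.5.2 may initialize it differently. The input series is invented.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average, seeded with the first value.

    Assumed recursion: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
    """
    if not values:
        raise ValueError("values must not be empty")
    smoothed = values[0]
    for x in values[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed


# Three quiet matches followed by one strong match (hypothetical data).
series = [0.0, 0.0, 0.0, 1.0]

responsive = ewma(series, alpha=0.9)  # jumps almost all the way to 1.0
smooth = ewma(series, alpha=0.1)      # barely moves off 0.0
print(f"{responsive:.1f} {smooth:.1f}")  # 0.9 0.1
```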

11. What is the purpose of z-score standardization in player profiling?

(a) To convert all metrics to goals (b) To place metrics on a common scale for comparison and combination (c) To remove outliers from the data (d) To adjust for playing time

Answer **(b) To place metrics on a common scale for comparison and combination.** Since different metrics are measured in different units and have different ranges, z-scores transform them all to a common scale (mean 0, standard deviation 1), enabling cross-metric comparison and weighted aggregation. See Section 15.6.2.

12. A player has a z-score of -1.5 for "errors leading to goals." Is this good or bad?

(a) Bad -- it means the player makes many errors (b) Good -- it means the player makes fewer errors than average (c) Neutral -- z-scores near zero are average (d) Cannot be determined without more context

Answer **(b) Good -- it means the player makes fewer errors than average.** For a negative metric like errors, a negative z-score indicates below-average frequency of errors, which is desirable. When building profiles, such metrics should be inverted so that higher values always indicate better performance. See Section 15.6.3.
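
Standardization and inversion can be sketched together. The `higher_is_better` flag is a hypothetical parameter introduced for this example, and the population standard deviation is an assumed choice; the error counts are made up.

```python
from statistics import mean, pstdev


def z_scores(values, higher_is_better=True):
    """Standardize to mean 0 and standard deviation 1.

    When higher_is_better is False the sign is flipped, so that larger
    z-scores always indicate better performance.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    sign = 1 if higher_is_better else -1
    return [sign * (v - mu) / sigma for v in values]


# Hypothetical "errors leading to goals" counts for five players.
errors = [1, 2, 3, 4, 5]

raw = z_scores(errors)
oriented = z_scores(errors, higher_is_better=False)
print(f"{raw[0]:.2f} {oriented[0]:.2f}")  # -1.41 1.41
```

The player with the fewest errors has a negative raw z-score but a positive one after inversion, which is the orientation a profile or radar chart expects.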

13. On a radar chart displaying percentile ranks, what does a perfectly circular shape indicate?

(a) The player is the best in all categories (b) The player is equally ranked across all categories relative to the peer group (c) The data has not been properly standardized (d) The player has no weaknesses

Answer **(b) The player is equally ranked across all categories relative to the peer group.** A circular shape means the same percentile in every metric. This does not mean the player is good or bad -- just equally positioned across dimensions. A perfect circle at the 90th percentile would indicate elite-and-balanced; at the 30th percentile, below-average-and-balanced. See Section 15.6.3.

14. Cosine similarity between two player vectors measures:

(a) The Euclidean distance between them (b) The angle between the vectors, ignoring magnitude (c) The product of their magnitudes (d) The difference in their mean z-scores

Answer **(b) The angle between the vectors, ignoring magnitude.** Cosine similarity captures the directional alignment of two profiles. Two players with identical shapes of strengths and weaknesses will have cosine similarity near 1.0, even if one is objectively better across all metrics. See Section 15.7.2.

15. When should you prefer Euclidean distance over cosine similarity for player comparison?

(a) When you want to match playing style only (b) When you want to match both playing style and performance level (c) When comparing players across different leagues (d) When the data has not been standardized

Answer **(b) When you want to match both playing style and performance level.** Euclidean distance is sensitive to both the shape and magnitude of the profile vector, so it finds players who are similar in both what they do and how well they do it. See Section 15.7.3.
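
The contrast between the two measures can be illustrated with two invented profile vectors that share the same shape but differ in magnitude. This is a sketch with made-up numbers, not data from the chapter.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (ignores magnitude)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance (sensitive to shape and magnitude)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical z-score profiles: player_b is player_a scaled by two.
player_a = [1.0, 2.0, 0.5]
player_b = [2.0, 4.0, 1.0]

similarity = cosine_similarity(player_a, player_b)
distance = euclidean_distance(player_a, player_b)
print(f"{similarity:.2f} {distance:.2f}")  # 1.00 2.29
```

Cosine similarity treats the two players as a perfect style match, while the Euclidean distance registers the gap in performance level.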

16. In K-Means clustering of player metrics, each cluster center represents:

(a) The best player in the cluster (b) The median player in the cluster (c) A prototypical player profile (archetype) for that cluster (d) The most recently added player to the cluster

Answer **(c) A prototypical player profile (archetype) for that cluster.** The cluster center (centroid) is the mean of all player vectors assigned to that cluster and can be interpreted as the average or prototypical profile for that player type. See Section 15.7.5.
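
The centroid calculation itself is just a per-metric mean. The sketch below shows only that step for one already-assigned cluster, not the full K-Means algorithm, and the two-metric player vectors are hypothetical.

```python
def centroid(vectors):
    """Mean of each metric across all player vectors in one cluster."""
    if not vectors:
        raise ValueError("cluster must contain at least one vector")
    n = len(vectors)
    return [sum(dimension) / n for dimension in zip(*vectors)]


# Hypothetical cluster of three players described by two z-scored metrics.
cluster = [
    [1.0, 0.0],
    [3.0, 2.0],
    [2.0, 1.0],
]

center = centroid(cluster)
print(center)  # [2.0, 1.0]
```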

17. Which of the following is NOT a common archetype that emerges from clustering forwards?

(a) Poacher (b) Complete forward (c) Regista (d) Pressing forward

Answer **(c) Regista.** A regista is a deep-lying playmaking midfielder role, not a forward archetype. Poacher, complete forward, and pressing forward are all common forward archetypes identified through clustering. See Section 15.7.5.

18. When constructing a comparison group for player percentile rankings, which approach is most methodologically sound?

(a) Compare all outfield players regardless of position (b) Compare all players in the same position at the same league level (c) Compare players from the same team only (d) Compare players with the same nationality

Answer **(b) Compare all players in the same position at the same league level.** Filtering by position accounts for role-specific expectations, and filtering by league level ensures comparable competition quality. Comparing across positions or within a single team produces misleading rankings. See Section 15.1.4.
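
A minimal sketch of building such a comparison group and ranking within it follows. The records, the field names, and the percentile definition (share of peers with a strictly lower value, with the player counted in the group) are all assumptions made for illustration.

```python
# Hypothetical player records; field names are illustrative.
players = [
    {"name": "F1", "position": "FW", "league": "Tier 1", "xg90": 0.55},
    {"name": "F2", "position": "FW", "league": "Tier 1", "xg90": 0.35},
    {"name": "F3", "position": "FW", "league": "Tier 1", "xg90": 0.20},
    {"name": "F4", "position": "FW", "league": "Tier 1", "xg90": 0.45},
    {"name": "D1", "position": "DF", "league": "Tier 1", "xg90": 0.05},
    {"name": "F5", "position": "FW", "league": "Tier 2", "xg90": 0.70},
]


def peer_group(players, position, league):
    """Keep only players in the same position and league level."""
    return [p for p in players
            if p["position"] == position and p["league"] == league]


def percentile_rank(value, peer_values):
    """Percentage of peers whose value is strictly below `value`."""
    if not peer_values:
        raise ValueError("peer group is empty")
    below = sum(1 for v in peer_values if v < value)
    return 100 * below / len(peer_values)


peers = peer_group(players, "FW", "Tier 1")
peer_values = [p["xg90"] for p in peers]
rank_f2 = percentile_rank(0.35, peer_values)
print(len(peers), rank_f2)  # 4 25.0
```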

19. A club uses the "buy low, sell high" transfer strategy based on age curves. Which purchase-and-sale ages best exploit this approach?

(a) Buy at 18, sell at 22 (b) Buy at 20-23, sell at 25-28 (c) Buy at 27, sell at 30 (d) Buy at 25, sell at 32

Answer **(b) Buy at 20-23, sell at 25-28.** Purchasing before the peak (when fees are lower) and selling at or near the peak (when value is highest) maximizes the difference between purchase price and sale price. This strategy is used by clubs like Brentford and Brighton. See Section 15.4.4.

20. Why are non-penalty metrics (e.g., npG/90, npxG/90) preferred over total metrics when evaluating strikers?

(a) Penalties are too rare to measure (b) Penalty conversion is purely random (c) Penalty-taking is often assigned based on seniority rather than skill, and penalties inflate open-play scoring metrics (d) Non-penalty metrics are easier to compute

Answer **(c) Penalty-taking is often assigned based on seniority rather than skill, and penalties inflate open-play scoring metrics.** A designated penalty taker receives a scoring boost unrelated to their open-play ability, distorting comparisons with non-penalty takers. Non-penalty metrics isolate the contributions that reflect true from-play scoring talent. See Section 15.2.4.

21. In PCA-based player visualization, what does proximity between two points on a 2D PCA map indicate?

(a) The players are on the same team (b) The players have similar metric profiles in the reduced space (c) The players have the same age (d) The players play the same number of minutes

Answer **(b) The players have similar metric profiles in the reduced space.** PCA preserves the major sources of variance from the original high-dimensional metric space. Players who are close together in the PCA projection have similar profiles across the metrics that contribute most to the principal components. See Section 15.7.6.

22. Which of the following is a limitation of composite performance indices (CPIs)?

(a) They require too much data to compute (b) They compress multi-dimensional information into a single number, losing nuance about how a player contributes (c) They are only applicable to goalkeepers (d) They cannot be compared across seasons

Answer **(b) They compress multi-dimensional information into a single number, losing nuance about how a player contributes.** While useful for initial ranking and screening, a single number cannot reveal whether a player is a prolific scorer, a creative genius, or a defensive warrior. Final evaluations should always examine the full profile. See Section 15.6.4.

23. The delta method for constructing aging curves computes:

(a) The average performance level at each age (b) The average year-over-year change in performance for players transitioning between consecutive ages (c) The peak age using regression (d) The retirement probability at each age

Answer **(b) The average year-over-year change in performance for players transitioning between consecutive ages.** The delta method uses within-player changes (e.g., how a player's metric changed from age 26 to 27) and averages these deltas across many players at each age transition, which helps control for selection effects. See Section 15.4.1.
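
The averaging of within-player deltas can be sketched as follows. The data layout (a dictionary of age-to-metric values per player) and all numbers are hypothetical, and the sketch leaves out refinements such as weighting by minutes.

```python
from collections import defaultdict

# Hypothetical per-90 values keyed by player, then by age.
seasons = {
    "P1": {25: 0.40, 26: 0.50, 27: 0.45},
    "P2": {26: 0.30, 27: 0.35},
    "P3": {25: 0.20, 27: 0.30},  # no consecutive ages, so no deltas
}


def delta_curve(seasons):
    """Average change from age a to age a + 1 across players seen at both."""
    deltas = defaultdict(list)
    for by_age in seasons.values():
        for age, value in by_age.items():
            if age + 1 in by_age:
                deltas[age].append(by_age[age + 1] - value)
    return {age: sum(d) / len(d) for age, d in sorted(deltas.items())}


curve = delta_curve(seasons)
for age, change in curve.items():
    print(f"{age}->{age + 1}: {change:+.2f}")
```

Only players observed at both ages contribute to a given transition, which is why the method compares each player with themselves rather than with a changing population.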

24. A player has a form index of 1.15 (EWMA / baseline). This means:

(a) The player's recent form is 15% above their long-term average (b) The player's recent form is 15% below their long-term average (c) The player has played 15% more minutes recently (d) The player ranks in the 15th percentile

Answer **(a) The player's recent form is 15% above their long-term average.** A form index above 1.0 indicates that recent performance (measured by EWMA) exceeds the player's own expanding baseline average. A value of 1.15 means 15% above baseline. See Section 15.5.1.
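
A form index of this kind can be sketched as below. It assumes the recursive EWMA seeded with the first observation and a baseline equal to the plain mean of all observations to date, including the latest one; Section 15.5.1 may define either piece differently. The series is invented.

```python
def ewma(values, alpha):
    """EWMA seeded with the first value (assumed recursive form)."""
    if not values:
        raise ValueError("values must not be empty")
    smoothed = values[0]
    for x in values[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed


def form_index(values, alpha):
    """Ratio of recent form (EWMA) to the expanding-mean baseline."""
    baseline = sum(values) / len(values)
    if baseline == 0:
        raise ValueError("form index is undefined for a zero baseline")
    return ewma(values, alpha) / baseline


# Hypothetical match-by-match metric with a strong latest match.
index = form_index([1.0, 1.0, 1.0, 2.0], alpha=0.5)
print(f"{index:.2f}")  # 1.20, i.e. recent form 20% above baseline
```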

25. When using a player similarity model for recruitment, which step is essential for validating the results before making a signing recommendation?

(a) Verifying that the similarity score exceeds 0.95 (b) Cross-referencing with video analysis and scouting reports (c) Ensuring the player has a Wikipedia page (d) Confirming the player's social media following is large enough

Answer **(b) Cross-referencing with video analysis and scouting reports.** Statistical similarity is a starting point, not an endpoint. Metrics cannot capture all aspects of a player's game (e.g., leadership, decision-making under pressure, injury risk). Scouting and video verification are essential final steps in any recruitment pipeline. See Section 15.7.7.

Scoring Guide

| Score | Assessment |
|---|---|
| 22-25 correct (88-100%) | Excellent -- ready to advance |
| 18-21 correct (72-84%) | Good -- review missed topics |
| 14-17 correct (56-68%) | Needs improvement -- re-read relevant sections |
| Below 14 (below 56%) | Re-study the chapter thoroughly before proceeding |