Chapter 15 Exercises: Player Performance Metrics
Part A: Foundations of Player Evaluation (Problems 1-8)
1. ⭐ A midfielder has recorded the following season statistics: 4 goals, 7 assists, 2,430 minutes played. Compute the player's goals per 90, assists per 90, and goal contributions (G+A) per 90.
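For checking the arithmetic in Problem 1, a minimal sketch of per-90 scaling is shown here. The function name `per_90` and the example player are hypothetical and deliberately differ from the exercise's numbers.

```python
def per_90(total: float, minutes: float) -> float:
    """Scale a season total to a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90.0 / minutes


# Hypothetical player (not the one in Problem 1): 6 goals, 3 assists, 1,800 minutes.
goals_p90 = per_90(6, 1800)
assists_p90 = per_90(3, 1800)
ga_p90 = per_90(6 + 3, 1800)  # goal contributions (G+A) per 90
```

Note that G+A per 90 is simply the sum of the two individual rates, because both share the same minutes denominator.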
2. ⭐ Explain in your own words why raw goal totals are an unreliable way to compare a striker who played 3,200 minutes with a striker who played 1,600 minutes. What normalization would you apply, and what caveats remain even after normalization?
3. ⭐ A centre-back has the following statistics, shown alongside the league figures for all centre-backs (n = 80):
- Tackles per 90: 2.4 (league mean 1.9, std 0.6)
- Aerial win %: 68% (league mean 60%, std 8%)
- Progressive passes per 90: 7.1 (league mean 4.2, std 1.1)
Compute the z-score for each metric. Which metric is the player's strongest relative to the peer group?
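The z-score calculation in Problem 3 can be sketched as a one-line helper. The name `z_score` and the sample values are illustrative assumptions, not the exercise's data.

```python
def z_score(value: float, mean: float, std: float) -> float:
    """Number of standard deviations that `value` lies above the peer-group mean."""
    if std <= 0:
        raise ValueError("std must be positive")
    return (value - mean) / std


# Hypothetical metric: 3.0 per 90 against a peer mean of 2.0 and std of 0.5.
example_z = z_score(3.0, 2.0, 0.5)
```

Because z-scores are unit-free, they allow a percentage metric and a per-90 count to be compared on the same scale.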
4. ⭐⭐ You are given a dataset of 200 forwards in the top five European leagues. You want to create a percentile-based scouting report. Describe the steps you would take to: (a) filter the dataset, (b) choose metrics, (c) compute percentiles, and (d) visualize the output. What minimum-minutes threshold would you use for a full season, and why?
5. ⭐⭐ Consider two goalkeepers:
- GK A: 120 shots on target faced, 90 saves, PSxG = 105.2, GA = 30
- GK B: 95 shots on target faced, 72 saves, PSxG = 78.6, GA = 23
Compute save percentage and PSxG - GA for both. Which goalkeeper is the better shot-stopper according to each metric? Discuss why the two metrics might give different answers.
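A small sketch of the two goalkeeper metrics in Problem 5 follows. It assumes the usual definitions (save percentage as saves divided by shots on target faced, and PSxG minus goals against, where a positive value means fewer goals conceded than expected). The function names and the example goalkeeper are hypothetical.

```python
def save_percentage(saves: int, shots_on_target_faced: int) -> float:
    """Share of shots on target that were saved, as a fraction between 0 and 1."""
    if shots_on_target_faced <= 0:
        raise ValueError("shots_on_target_faced must be positive")
    return saves / shots_on_target_faced


def psxg_minus_ga(psxg: float, goals_against: int) -> float:
    """Post-shot expected goals minus goals conceded; positive means above expectation."""
    return psxg - goals_against


# Hypothetical goalkeeper (not GK A or GK B): 100 shots faced, 70 saves, PSxG 32.5, GA 30.
example_save_pct = save_percentage(70, 100)
example_psxg_diff = psxg_minus_ga(32.5, 30)
```

Save percentage treats every shot on target as equally difficult, whereas PSxG - GA adjusts for shot quality, which is one reason the two metrics can rank goalkeepers differently.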
6. ⭐ Define the following terms in 1-2 sentences each: (a) Per-90 normalization (b) Coefficient of variation (c) Cosine similarity (d) Player archetype (e) Bayesian shrinkage
7. ⭐⭐ A winger has the following match-level goal involvement (G+A) over 10 matches: [0, 2, 0, 0, 3, 0, 0, 0, 1, 0]. A second winger has: [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]. Both have 6 total goal involvements. Compute the standard deviation and coefficient of variation for each. Which player is more consistent, and why might a coach prefer the consistent player?
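One way to compute the consistency measures in Problem 7 is sketched here with the standard library. It assumes the population standard deviation (`statistics.pstdev`). If the chapter uses the sample standard deviation instead, swap in `statistics.stdev`. The two short series are made up for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unit-free spread)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / m


# Hypothetical match-level outputs with the same total but different spread.
streaky = [0, 2, 0, 2]
steady = [1, 1, 1, 1]
cv_streaky = coefficient_of_variation(streaky)
cv_steady = coefficient_of_variation(steady)
```

A lower coefficient of variation indicates output that is spread more evenly across matches.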
8. ⭐⭐ Explain the difference between a volume metric and an efficiency metric. Give two examples of each for a striker. When might a high-volume, low-efficiency player be preferable to a low-volume, high-efficiency player?
Part B: Playing Time and Age Curves (Problems 9-16)
9. ⭐ A substitute forward has played 180 minutes across 6 appearances and scored 2 goals. Compute his goals per 90. Explain why this figure is misleading, and describe two methods for addressing the problem.
10. ⭐⭐ Apply Bayesian shrinkage to the following scenario. The league-average goals per 90 for strikers is 0.35. A young striker has scored 3 goals in 450 minutes (goals per 90 = 0.60). Using a shrinkage parameter $\kappa = 900$, compute the shrunk estimate. Compare this with the raw per-90 figure.
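The sketch below shows one common form of shrinkage for Problem 10: a weighted average of the raw rate and the league prior, with $\kappa$ acting as "prior minutes". This form is an assumption, so check it against the definition given in the chapter. The function name and the example values are hypothetical.

```python
def shrunk_rate(raw_rate: float, minutes: float, prior_rate: float, kappa: float) -> float:
    """Pull a raw per-90 rate toward the prior; more minutes means less shrinkage.

    Assumed form: (minutes * raw + kappa * prior) / (minutes + kappa).
    """
    if minutes < 0 or kappa < 0 or minutes + kappa == 0:
        raise ValueError("minutes and kappa must be non-negative and not both zero")
    return (minutes * raw_rate + kappa * prior_rate) / (minutes + kappa)


# Hypothetical case: raw 0.90 per 90 over 300 minutes, prior 0.30, kappa 600.
example_shrunk = shrunk_rate(0.90, 300, 0.30, 600)
```

With `minutes` equal to zero the estimate is exactly the prior, and as minutes grow large it approaches the raw rate.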
11. ⭐⭐ A player has the following season-by-season non-penalty goals per 90:
| Age | npG/90 |
|---|---|
| 21 | 0.22 |
| 22 | 0.28 |
| 23 | 0.35 |
| 24 | 0.41 |
| 25 | 0.44 |
| 26 | 0.48 |
| 27 | 0.45 |
| 28 | 0.42 |
| 29 | 0.38 |
(a) Compute the year-over-year deltas. (b) At what age does this player peak? (c) Fit a quadratic model $y = \beta_0 + \beta_1 a + \beta_2 a^2$ and find the predicted peak age algebraically.
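For part (c), once the coefficients have been fitted, the predicted peak age follows from setting the derivative of the quadratic to zero. This is standard calculus and gives a maximum only when $\beta_2 < 0$ (a concave curve):

$$
\frac{dy}{da} = \beta_1 + 2\beta_2 a = 0
\quad\Longrightarrow\quad
a^{*} = -\frac{\beta_1}{2\beta_2}
$$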
12. ⭐⭐⭐ Explain survivorship bias in the context of aging curves. Suppose you observe that the average goals per 90 for 33-year-old strikers in the Premier League is 0.28, while for 32-year-olds it is 0.30. Can you conclude that the typical decline from 32 to 33 is 0.02 goals per 90? Why or why not?
13. ⭐⭐ A club is considering two transfer targets:
- Player A: Age 22, current performance level 70 (on a 0-100 scale)
- Player B: Age 28, current performance level 82

Assume a generic, symmetric aging curve that peaks at age 27, with performance 2 points lower for each year away from the peak. Estimate each player's remaining career value (the sum of performance levels from current age through age 35). Which player provides more total value? How does a 10% annual discount rate change the comparison?
14. ⭐⭐⭐ Write a Python function that takes a DataFrame with columns [player_id, season, age, minutes, goals_per90] and returns a DataFrame showing the average year-over-year delta in goals per 90 at each age, using only player-seasons with at least 900 minutes.
15. ⭐⭐ A data analyst presents the following claim: "Our aging curve shows that centre-backs peak at age 31 for interceptions per 90." Provide at least three possible explanations for this unusually late peak, at least one of which involves a methodological concern.
16. ⭐⭐⭐ Discuss how substitution patterns can bias per-90 metrics. Design a simple analysis (describe the steps, no code required) that would test whether substitute appearances systematically inflate a forward's per-90 goal rate compared to starter appearances.
Part C: Profiles, Form, and Similarity (Problems 17-26)
17. ⭐⭐ A midfielder's match ratings over 8 consecutive matches are: [9.2, 8.8, 9.5, 9.1, 8.4, 7.9, 8.2, 8.0]. Compute: (a) the 3-match rolling average after each match (starting from match 3), and (b) the EWMA with $\alpha = 0.3$, starting from the first value. Is the player's form improving or declining?
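The two smoothers in Problem 17 can be sketched as follows. The EWMA is assumed to use the recursion $S_t = \alpha x_t + (1 - \alpha) S_{t-1}$ with $S_1 = x_1$, which matches "starting from the first value". The function names and the short ratings list are hypothetical.

```python
def rolling_mean(values, window):
    """Trailing mean over `window` values, reported once the window is full."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def ewma(values, alpha):
    """Exponentially weighted moving average seeded with the first observation."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical ratings (not the series from Problem 17).
ratings = [6.0, 7.0, 8.0, 7.0]
rolling_3 = rolling_mean(ratings, 3)
ewma_half = ewma(ratings, 0.5)
```

A larger `alpha` makes the EWMA react faster to recent matches, while a smaller `alpha` gives a smoother series that is slower to reflect a change in form.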
18. ⭐⭐⭐ You have standardized metrics for five attacking midfielders:
| Player | npxG_z | xA_z | Key Passes_z | Prog Passes_z | Dribbles_z | Pressures_z |
|---|---|---|---|---|---|---|
| A | 1.2 | 0.8 | 0.6 | 0.3 | -0.2 | 0.1 |
| B | 0.3 | 1.5 | 1.8 | 1.2 | 0.5 | -0.3 |
| C | 1.5 | 0.2 | -0.1 | -0.3 | 1.3 | 0.8 |
| D | -0.1 | 0.4 | 0.3 | 1.6 | -0.5 | 1.4 |
| E | 0.9 | 0.9 | 0.7 | 0.5 | 0.3 | 0.2 |
(a) Compute the cosine similarity between Player A and each of the other four players. (b) Which player is the best stylistic match for Player A? (c) Which player is most different?
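A minimal cosine similarity helper for Problem 18 is sketched here using only the standard library. The function name and the toy vectors are illustrative and are not taken from the table above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy profiles: same shape at different scale, unrelated, and mirrored.
same_shape = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, -1.0], [-1.0, 1.0])
```

The first toy pair illustrates that cosine similarity ignores overall magnitude: a profile and a scaled-up copy of it are treated as identical in style.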
19. ⭐⭐ Sketch (describe verbally or draw) the radar charts for Players B and C from Problem 18. How do their profiles differ in character? What positional sub-roles would you assign to each?
20. ⭐⭐⭐ Implement a weighted Euclidean distance function where the weights for the six metrics in Problem 18 are: npxG (0.25), xA (0.20), Key Passes (0.15), Prog Passes (0.15), Dribbles (0.10), Pressures (0.15). Compute the weighted distance between Player A and all others. Does the ranking change compared to cosine similarity?
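As a starting point for Problem 20, the sketch below uses one common convention for weighted Euclidean distance, $d = \sqrt{\sum_i w_i (a_i - b_i)^2}$. This placement of the weights inside the square root is an assumption, and other conventions exist. The function name and the toy inputs are hypothetical.

```python
import math


def weighted_euclidean(a, b, weights):
    """Distance with per-metric weights applied to the squared differences."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b, and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


# Toy example: with unit weights this reduces to ordinary Euclidean distance.
unweighted = weighted_euclidean([0.0, 0.0], [3.0, 4.0], [1.0, 1.0])
down_weighted = weighted_euclidean([0.0, 0.0], [3.0, 4.0], [0.5, 0.5])
```

Unlike cosine similarity, this distance is sensitive to magnitude, so two players with the same profile shape but different overall levels will not be at distance zero.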
21. ⭐⭐ Explain the difference between cosine similarity and Euclidean distance in the context of player comparison. Give a scenario where cosine similarity would be preferable and a scenario where Euclidean distance would be preferable.
22. ⭐⭐⭐ You run K-Means clustering on a dataset of 300 Premier League midfielders using 8 metrics. You try k = 3, 4, 5, 6, 7, and 8 clusters.
(a) How would you use the elbow method to choose the optimal k? (b) How would you use the silhouette score to choose k? (c) After choosing k = 5, you examine the cluster centers. Describe how you would label each cluster as a midfield archetype (e.g., "deep-lying playmaker," "box-to-box," etc.).
23. ⭐⭐ A scout asks: "Find me a left-footed centre-back under 24 who plays similarly to Virgil van Dijk." Describe the full pipeline you would use to answer this question, including data preparation, metric selection, similarity computation, and output format.
24. ⭐⭐⭐⭐ Using PCA, you reduce 10 metrics for 500 forwards to 2 principal components. PC1 explains 35% of variance and loads most heavily on npxG, shots, and goals (a "scoring" dimension). PC2 explains 20% and loads on xA, key passes, and progressive passes (a "creative" dimension).
(a) A player has PC1 = 2.1, PC2 = 1.8. Describe this player's profile in words. (b) A second player has PC1 = -0.5, PC2 = 2.5. Describe this profile. (c) How would you use these two components to find the "most complete" forwards? Define "complete" mathematically.
25. ⭐⭐⭐ A player's form index (EWMA / baseline) over 20 matches oscillates between 0.85 and 1.15. Another player's form index stays between 0.95 and 1.05. Propose a metric that captures the stability of form and compute it for both players. What are the tactical implications of each pattern?
26. ⭐⭐⭐⭐ Design a composite performance index (CPI) for Premier League strikers. Specify: (a) the 6-8 metrics you would include and why, (b) how you would standardize them, (c) how you would determine the weights (expert, PCA, or regression-based), and (d) how you would validate that the CPI is meaningful (e.g., correlation with market value, correlation with team points, stability across seasons). Write pseudocode for the entire pipeline.
Part D: Integration and Critical Thinking (Problems 27-32)
27. ⭐⭐⭐ A data-driven club has built a player similarity model and identified that Player X (age 22, playing in the Belgian league) has a 0.94 cosine similarity to a departing key player. The scout watches three matches and reports that Player X "looks nothing like" the departing player. How do you reconcile the statistical similarity with the scouting disagreement? List at least four possible explanations.
28. ⭐⭐⭐⭐ Critically evaluate the following statement: "Player A's composite performance index is 10.2, while Player B's is 9.6. Therefore Player A is a better player." Identify at least five assumptions or limitations embedded in this comparison.
29. ⭐⭐⭐ A journalist publishes a per-90 leaderboard for the Premier League mid-season (after 19 matches). The top 5 list for "chances created per 90" includes two players with fewer than 500 minutes. Explain the statistical problem. Propose a revised leaderboard methodology that is more robust.
30. ⭐⭐⭐⭐ Discuss the ethical considerations of player performance metrics. Consider: (a) How might over-reliance on metrics disadvantage certain playing styles or player demographics? (b) What happens when players learn which metrics are used in contract negotiations and optimize for those metrics at the expense of team performance? (c) How should clubs communicate metric-based evaluations to players?
31. ⭐⭐⭐ You are tasked with building a player evaluation system for a club that plays in a league with limited data availability (no tracking data, only basic event data). Which of the methods in this chapter can you still apply? Which ones become impossible or unreliable? Design a simplified evaluation framework using only basic event data (goals, assists, shots, passes, tackles, fouls, cards, minutes).
32. ⭐⭐⭐⭐ Capstone Project: Using the concepts from this chapter, design a complete player recruitment pipeline for a mid-table Premier League club looking to sign a right winger under 25 years old for under 20 million euros. Describe each stage: data collection, metric selection, per-90 normalization, age curve projection, similarity matching, shortlist generation, and final recommendation. Identify the limitations of your approach at each stage.