Chapter 5 Exercises: Introduction to Soccer Metrics
Part A: Conceptual Understanding (Problems 1--10)
Problem 1 ★
List three traditional soccer statistics and, for each, identify one specific limitation discussed in Section 5.1. Explain why each limitation matters for player evaluation.
Problem 2 ★
A pundit states: "Player X completed 91% of his passes, while Player Y completed only 78%. Clearly, Player X is the better passer." Identify at least two reasons why this conclusion may be flawed.
Problem 3 ★
Define the following terms in your own words:
- (a) Counting statistic
- (b) Rate statistic
- (c) Per-90 normalization
- (d) Stabilization point
- (e) Intraclass correlation coefficient (ICC)
Problem 4 ★★
Explain the signal-to-noise framework. Write out the decomposition equation and describe each component. Give a soccer-specific example illustrating how random noise can distort a player's observed statistics.
Problem 5 ★★
A sporting director tells you: "I only want to look at players who scored at least 15 goals last season." Using concepts from this chapter, explain the potential problems with this filtering criterion. What alternative approach would you suggest?
Problem 6 ★★
Classify each of the following as descriptive, predictive, or prescriptive:
- (a) A team scored 65 goals last season.
- (b) A model estimates that a team will score 58 goals next season.
- (c) A report recommends signing a striker because they would add 5 goals to the team's output.
- (d) A player's xG map showing the location of all shots taken.
- (e) A probability distribution for a team's final league position.
Problem 7 ★★
Why does goal conversion rate take 35--40+ matches to stabilize, while pass completion percentage stabilizes in 6--8 matches? Use the concepts of event frequency and variance to explain.
Problem 8 ★★★
Consider two metrics for evaluating centre-backs:
- Metric A: Tackles per 90 minutes
- Metric B: Percentage of opponent dribbles successfully tackled

Evaluate each metric against the five desirable properties of a good metric (validity, reliability, discrimination, interpretability, actionability). Which metric would you prefer for a scouting report, and why?
Problem 9 ★★★
A coach asks: "Why should I trust your xG model when it says we deserved to win 2.1 to 0.8, but we actually lost 0--1?" Write a response (150--200 words) that addresses the coach's concern while building trust in the metric.
Problem 10 ★★★
Explain the concept of over-adjustment. Provide a hypothetical example where adjusting for too many context factors could remove meaningful signal from a soccer metric.
Part B: Computational Problems (Problems 11--20)
Problem 11 ★
A midfielder has 7 assists in 2,100 minutes of play.
- (a) Calculate their assists per 90 minutes.
- (b) Calculate the number of full 90-minute matches equivalent to their playing time.
- (c) Explain why this sample size might or might not be sufficient for a reliable per-90 estimate.
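As a reminder of the arithmetic behind per-90 normalization, here is a minimal sketch. The helper name `rate_per_90` and the example player are hypothetical and are not the midfielder from this problem.

```python
def rate_per_90(count: float, minutes: float) -> float:
    """Scale a season total to a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count / minutes * 90


# Hypothetical player (not the one in Problem 11): 5 goals in 1,350 minutes.
rate = rate_per_90(5, 1350)
full_matches = 1350 / 90  # playing time expressed as full 90-minute matches

print(f"{rate:.3f} per 90 over {full_matches:.0f} full-match equivalents")
```

The same two quantities (the rate and the number of full-match equivalents) are what parts (a) and (b) ask for.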
Problem 12 ★★
The following table shows a striker's statistics in two different leagues:
| League | Goals | Minutes | League Avg Goals/Game |
|---|---|---|---|
| League A | 18 | 2,800 | 2.65 |
| League B | 14 | 2,500 | 3.10 |
- (a) Compute the goals per 90 in each league.
- (b) Propose a simple league-strength adjustment and apply it. Which of the striker's two league performances is more impressive after adjustment?
Problem 13 ★★
A player has the following per-match tackle counts over 10 matches: [4, 2, 5, 3, 4, 6, 2, 3, 5, 4].
- (a) Compute the split-half reliability (odd matches vs. even matches).
- (b) Apply the Spearman-Brown formula to estimate full-sample reliability.
- (c) Interpret the result: is this metric stable for this player?
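The mechanics of a split-half calculation can be sketched as follows. This is an illustration under stated assumptions: the two halves are formed from odd-numbered and even-numbered matches, consecutive matches are paired, and the halves are compared with a Pearson correlation. The function names and the data series are hypothetical and differ from the tackle counts above.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half: float) -> float:
    """Step a half-sample correlation up to the full sample length."""
    return 2 * r_half / (1 + r_half)


# Hypothetical per-match values, listed in match order (match 1 first).
values = [1, 1, 2, 3, 3, 2, 4, 4]
odd_matches = values[0::2]   # matches 1, 3, 5, 7
even_matches = values[1::2]  # matches 2, 4, 6, 8

r_half = pearson(odd_matches, even_matches)
r_full = spearman_brown(r_half)
print(f"split-half r = {r_half:.3f}, Spearman-Brown = {r_full:.3f}")
```

With only a handful of paired matches the correlation itself is a noisy estimate, which is worth keeping in mind when interpreting part (c).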
Problem 14 ★★
A team has 58% average possession. One of their midfielders records 85 passes per 90 and 2.3 tackles per 90.
- (a) Compute the possession-adjusted passes per 90.
- (b) Compute the possession-adjusted tackles per 90.
- (c) Explain why these adjustments move in opposite directions.
Problem 15 ★★
Using the opponent-adjustment formula from Section 5.4.2, compute the adjusted goals for a team that scored 3 goals against an opponent that concedes an average of 0.9 goals per match. The league average goals conceded is 1.30 per match. Interpret the result.
Problem 16 ★★★
A goalkeeper has the following save percentages over 6 seasons: [0.72, 0.68, 0.74, 0.70, 0.69, 0.73]. Using year-over-year correlation, estimate the stability of save percentage. Show your calculation.
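One way to set up a year-over-year correlation is to pair each season with the season that follows it and correlate the pairs. The sketch below assumes that pairing and a Pearson correlation; the function names and the example values are hypothetical and are not the save percentages given above.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def year_over_year_r(seasons: list[float]) -> float:
    """Correlate season t with season t+1 across consecutive pairs."""
    current = seasons[:-1]  # seasons 1 .. n-1
    following = seasons[1:]  # seasons 2 .. n
    return pearson(current, following)


# Hypothetical season-level values for one player.
example = [0.70, 0.72, 0.71, 0.74, 0.73]
r = year_over_year_r(example)
print(f"year-over-year r = {r:.3f} from {len(example) - 1} season pairs")
```

Note that n seasons give only n - 1 pairs, so an estimate from a single player is very imprecise; in practice the pairs are usually pooled across many players.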
Problem 17 ★★★
You are given the following data for 5 players over a season:
| Player | Minutes | Shots | Goals | xG |
|---|---|---|---|---|
| A | 2,700 | 85 | 14 | 11.2 |
| B | 1,800 | 62 | 10 | 8.5 |
| C | 3,100 | 110 | 12 | 14.8 |
| D | 900 | 30 | 8 | 5.1 |
| E | 2,400 | 78 | 9 | 10.3 |
- (a) Compute goals per 90, xG per 90, and shots per 90 for each player.
- (b) Compute each player's goals minus xG (over/underperformance).
- (c) Which player(s) would you flag for further investigation and why?
- (d) For which player(s) should per-90 rates be treated with caution?
Problem 18 ★★★
Write a Python function that takes a DataFrame with columns ["player", "match", "metric_value"] and returns the split-half reliability and Spearman-Brown reliability for each player. The function should split matches into odd and even groups.
Problem 19 ★★★
A team plays 38 league matches with the following game-state breakdown:
| Game State | Minutes | Goals Scored |
|---|---|---|
| Losing (GS < 0) | 650 | 12 |
| Level (GS = 0) | 1,400 | 22 |
| Winning (GS > 0) | 1,370 | 18 |
The league-average game-state distribution is: Losing 30%, Level 40%, Winning 30%.
- (a) Compute the team's scoring rate (goals per 90) in each game state.
- (b) Compute the game-state adjusted total goals using the league-average distribution.
- (c) Compare the raw and adjusted totals. What does the difference tell you about this team?
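A game-state adjustment of this kind can be sketched as a reweighting: keep the team's scoring rate in each state, but apply it to the minutes the team would have spent in that state under the league-average distribution. This is an assumed formulation that may differ in detail from the chapter's; the function name and the numbers are hypothetical and are not the team in the table above.

```python
def game_state_adjusted_goals(
    minutes: dict[str, float],
    goals: dict[str, float],
    league_share: dict[str, float],
) -> float:
    """Reweight state-specific scoring rates to a reference time distribution."""
    total_minutes = sum(minutes.values())
    adjusted = 0.0
    for state, state_minutes in minutes.items():
        rate_per_90 = goals[state] / state_minutes * 90
        # Minutes the team would spend in this state under the league-average split.
        reference_minutes = league_share[state] * total_minutes
        adjusted += rate_per_90 * reference_minutes / 90
    return adjusted


# Hypothetical team: it spends less time losing than the reference distribution.
minutes = {"losing": 600, "level": 1200, "winning": 1200}
goals = {"losing": 10, "level": 20, "winning": 10}
league_share = {"losing": 0.30, "level": 0.40, "winning": 0.30}

raw_total = sum(goals.values())
adjusted_total = game_state_adjusted_goals(minutes, goals, league_share)
print(f"raw = {raw_total}, adjusted = {adjusted_total:.1f}")
```

In this hypothetical case the adjusted total (42.5) exceeds the raw total (40) because the team scores at a lower rate when winning and spends more time winning than the reference distribution assumes.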
Problem 20 ★★★★
The ICC for a new pressing metric is 0.35.
- (a) Calculate the stabilization point $n^*$ (in seasons).
- (b) If each season consists of 34 matches, how many matches are needed for stabilization?
- (c) A scout wants to evaluate a player based on 15 matches of data. Is this sufficient? Justify your answer quantitatively.
- (d) Propose two ways to increase the ICC of this metric without collecting more data.
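The link between ICC and sample size can be sketched with the Spearman-Brown relation, under which the reliability of an average over n units is n·ICC / (1 + (n - 1)·ICC). The sketch below assumes that the stabilization point is the n at which this reliability reaches 0.5, which gives n* = (1 - ICC) / ICC; if the chapter defines stabilization differently, the target should be changed accordingly. Function names and the example ICC are hypothetical.

```python
def reliability(n_units: float, icc: float) -> float:
    """Reliability of a mean over n_units, given single-unit ICC (Spearman-Brown)."""
    return n_units * icc / (1 + (n_units - 1) * icc)


def stabilization_point(icc: float, target: float = 0.5) -> float:
    """Number of units needed for the reliability to reach `target`.

    Assumption: 'stabilized' means reliability == target (0.5 by default).
    """
    if not 0 < icc < 1 or not 0 < target < 1:
        raise ValueError("icc and target must lie strictly between 0 and 1")
    return target * (1 - icc) / (icc * (1 - target))


# Hypothetical metric with ICC = 0.20 per season (not the 0.35 in Problem 20).
n_star = stabilization_point(0.20)
print(f"n* = {n_star:.1f} seasons, reliability there = {reliability(n_star, 0.20):.2f}")
```

The `reliability` function can also be evaluated at a fractional number of units, which is one way to approach a question such as part (c).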
Part C: Application and Analysis (Problems 21--26)
Problem 21 ★★
You are building a player comparison tool for a scouting department. The tool will display per-90 statistics for forwards in a European league. Write a list of 5 design decisions you need to make (e.g., minimum minutes threshold) and justify each one using concepts from this chapter.
Problem 22 ★★★
A journalist writes: "Team X's defence is the worst in the league --- they have conceded 55 goals, more than any other team." Using at least three concepts from this chapter, write a 200-word rebuttal explaining why this conclusion may be premature.
Problem 23 ★★★
Design a metric called "Progressive Passing Index" (PPI) that measures a midfielder's ability to advance the ball through passing. Specify:
- (a) What events are included (e.g., passes that advance the ball at least X yards toward goal).
- (b) What denominator you would use and why.
- (c) What context adjustments you would apply.
- (d) How you would validate the metric using the three-pillar framework.
Problem 24 ★★★
You have been asked to present a scouting report to a head coach who is skeptical of analytics. The report recommends signing a left-winger based on xG-assisted, progressive carries, and defensive pressures. Outline your presentation strategy (300 words max), drawing on Section 5.6.
Problem 25 ★★★★
A club's analytics department has developed a new "Defensive Impact Score" (DIS) for centre-backs. The metric combines tackles won, interceptions, aerial duels won, and blocks into a single composite score. The department claims the metric "captures overall defensive quality."
Critique this claim. Specifically address:
- (a) What aspects of defending might be missing from this metric?
- (b) How would you test whether the metric actually measures "defensive quality"?
- (c) What are the risks of using a composite score vs. examining components individually?
- (d) Propose a validation study design.
Problem 26 ★★★★
Two strikers are candidates for a transfer. Their statistics are:
| Metric | Striker X | Striker Y |
|---|---|---|
| Age | 24 | 29 |
| Goals per 90 | 0.42 | 0.55 |
| xG per 90 | 0.48 | 0.50 |
| Minutes played | 2,600 | 2,400 |
| League | Eredivisie | Premier League |
| Team possession | 62% | 48% |
| Avg opponent rank | 8th | 10th |
Write a thorough analysis (400 words) comparing these two strikers. Address per-90 reliability, league adjustment, possession adjustment, age trajectory, and xG over/underperformance. Make a recommendation and justify it.
Part D: Programming Challenges (Problems 27--30)
Problem 27 ★★
Write a Python function per_90(values: list[float], minutes: list[float]) -> list[float | None] that computes per-90 rates. The function should:
- Handle division by zero gracefully.
- Issue a warning if any player has fewer than 450 minutes.
- Return None in place of the rate for any player below a specified minimum-minutes threshold (default 0).
Problem 28 ★★★
Write a Python function that performs opponent adjustment on a team's goals-scored data. The function should accept:
- A list of goals scored per match.
- A list of opponent average goals conceded per match.
- The league average goals conceded per match.

It should return the list of opponent-adjusted goals per match and the season total.
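As a starting point, the interface described above might look like the sketch below. The adjustment used here is an assumption: each match's goals are scaled by the ratio of the league-average goals conceded to the opponent's average goals conceded. The formula in Section 5.4.2 should take precedence if it differs. The function name and the data are hypothetical.

```python
def opponent_adjusted_goals(
    goals: list[float],
    opponent_avg_conceded: list[float],
    league_avg_conceded: float,
) -> tuple[list[float], float]:
    """Scale each match's goals by league average / opponent average conceded.

    Assumed multiplicative form; goals against stingy defences are scaled up,
    goals against leaky defences are scaled down.
    """
    if len(goals) != len(opponent_avg_conceded):
        raise ValueError("goals and opponent_avg_conceded must have equal length")
    adjusted = []
    for scored, opp_avg in zip(goals, opponent_avg_conceded):
        if opp_avg <= 0:
            raise ValueError("opponent average goals conceded must be positive")
        adjusted.append(scored * league_avg_conceded / opp_avg)
    return adjusted, sum(adjusted)


# Hypothetical three-match sample.
per_match, season_total = opponent_adjusted_goals(
    goals=[2, 1, 0],
    opponent_avg_conceded=[1.0, 2.0, 1.5],
    league_avg_conceded=1.5,
)
print(per_match, season_total)
```

Raising an error for a non-positive opponent average is one design choice; early in a season, when an opponent may not have conceded yet, a fallback to the league average may be more practical.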
Problem 29 ★★★★
Write a Python program that simulates 1,000 seasons for a striker with a true conversion rate of 0.13 and 90 shots per season. For each simulated season, compute:
- Raw conversion rate
- Goals scored

Then produce:
- A histogram of simulated goals with the true expected value marked.
- The central 90% interval (5th to 95th percentile) of simulated goals.
- The probability that the striker scores 15+ goals (a "great season") by chance.
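The core of such a simulation can be sketched with the standard library alone. The sketch assumes each shot is an independent Bernoulli trial with the same conversion probability, uses a fixed seed for reproducibility, and takes a simple sorted-index percentile; the histogram is left out. The function name is hypothetical.

```python
import random


def simulate_goals(n_seasons: int, shots: int, conversion: float, seed: int = 0) -> list[int]:
    """Simulate goals per season, treating each shot as an independent trial."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < conversion for _ in range(shots))
        for _ in range(n_seasons)
    ]


goals = simulate_goals(n_seasons=1000, shots=90, conversion=0.13)
conversion_rates = [g / 90 for g in goals]  # raw conversion rate per season

expected_goals = 90 * 0.13
mean_goals = sum(goals) / len(goals)

ordered = sorted(goals)
lower = ordered[int(0.05 * len(ordered))]       # approx. 5th percentile
upper = ordered[int(0.95 * len(ordered)) - 1]   # approx. 95th percentile

p_great_season = sum(g >= 15 for g in goals) / len(goals)

print(f"expected = {expected_goals:.1f}, simulated mean = {mean_goals:.2f}")
print(f"central 90% interval: {lower} to {upper} goals")
print(f"P(15+ goals) = {p_great_season:.3f}")
```

The list `goals` can be passed directly to a plotting library to produce the histogram, with a vertical line at `expected_goals`.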
Problem 30 ★★★★
Build a complete metric validation pipeline in Python. The pipeline should:
1. Generate synthetic player-match data (at least 20 players, 30 matches each).
2. Compute a per-90 metric for each player.
3. Perform split-half reliability analysis.
4. Compute the ICC.
5. Test predictive validity (first half predicting second half).
6. Report all results in a formatted summary table.
Use the code structure from the chapter examples as a starting point.