Chapter 16 Exercises
Part A: Conceptual Questions
Exercise 16.1 (Difficulty: *) List and briefly define the six primary dimensions used to characterize a team's playing style (Section 16.1.1). For each dimension, give one example metric that could be used to measure it.
Exercise 16.2 (Difficulty: *) Explain why actual league points are a noisy indicator of team quality. Identify at least three distinct sources of randomness that cause actual points to deviate from "true" quality.
Exercise 16.3 (Difficulty: *) Define the Lineup Stability Index (LSI). A team makes exactly 2 changes to their starting XI between every consecutive match in a 38-match season. What is their LSI?
Exercise 16.4 (Difficulty: **) Explain the difference between the basic independent Poisson model and the Dixon-Coles model for match outcomes. Why does the correction factor $\rho$ tend to be negative? What types of scorelines does this affect?
Exercise 16.5 (Difficulty: **) A new signing from a very different tactical system (e.g., a player moving from a deep-block, counterattacking team to a high-pressing, possession team) is expected to have a lower integration rate $\kappa$ in the chemistry integration curve $C_{ij}(t) = C_{ij}^{\infty}(1 - e^{-\kappa t})$. Explain intuitively why this is the case. What could a coaching staff do to accelerate integration?
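To build intuition for Exercise 16.5, the integration curve can be evaluated directly. The sketch below is only an illustration: the function names and the $\kappa$ values in the usage note are made up, and the unit of $t$ (matches, weeks, shared minutes) is whatever Section 16.4 uses.

```python
import math

def chemistry(t, c_inf, kappa):
    """Pairwise chemistry C_ij(t) = C_inf * (1 - exp(-kappa * t))."""
    return c_inf * (1.0 - math.exp(-kappa * t))

def time_to_fraction(fraction, kappa):
    """Time needed to reach a given fraction (0 < fraction < 1) of the ceiling C_inf.

    Solving fraction = 1 - exp(-kappa * t) for t gives t = -ln(1 - fraction) / kappa.
    """
    return -math.log(1.0 - fraction) / kappa

# Hypothetical integration rates: halving kappa doubles the time to any target level.
slow = time_to_fraction(0.9, kappa=0.05)
fast = time_to_fraction(0.9, kappa=0.10)
```

Because the time to reach any fixed fraction of $C_{ij}^{\infty}$ scales as $1/\kappa$, a player whose $\kappa$ is half as large needs twice as long to reach the same chemistry level.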
Exercise 16.6 (Difficulty: **) Describe the concept of "style drift" and explain how you would detect a significant tactical shift mid-season. What events might cause sudden style drift?
Exercise 16.7 (Difficulty: **) Why is it important to weight score-state analysis by minutes spent in each state? Construct a hypothetical example where unweighted analysis would give misleading results.
Exercise 16.8 (Difficulty: ***) Critically evaluate the Squad Depth Index (SDI) formula from Section 16.3.2. What are its limitations? Propose one modification that would improve it.
Part B: Computational Problems
Exercise 16.9 (Difficulty: *) A team creates 1.8 xG and concedes 0.9 xG in a match. Using the Skellam distribution (or simulation), compute the expected points for this team. Show your work.
Exercise 16.10 (Difficulty: *) Compute the result probabilities (win/draw/loss) for a match where the home team has $\lambda_H = 1.5$ and the away team has $\lambda_A = 1.2$, using the independent Poisson model. Truncate at 6 goals per team.
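As a possible starting point for Exercises 16.9 and 16.10, the sketch below sums an independent Poisson score grid into win/draw/loss probabilities and converts them to expected points. It assumes each team's goal count is Poisson with mean equal to its scoring rate (for Exercise 16.9, its xG total); the function names are illustrative, not taken from the chapter's code.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def result_probabilities(lam_home, lam_away, max_goals=6):
    """Return (win, draw, loss) for the home side, truncating at max_goals per team."""
    win = draw = loss = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                win += p
            elif h == a:
                draw += p
            else:
                loss += p
    return win, draw, loss

def expected_points(win, draw, loss):
    """Expected points with 3 for a win and 1 for a draw.

    Dividing by the total renormalizes the small mass lost to truncation.
    """
    return (3.0 * win + draw) / (win + draw + loss)
```

Calling `result_probabilities(1.5, 1.2)` gives the three probabilities for Exercise 16.10. The truncated probabilities sum to slightly less than 1, so report whether you renormalized them.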
Exercise 16.11 (Difficulty: **) A team has the following xG and actual goals over 5 matches:
| Match | xG For | xG Against | Goals For | Goals Against |
|---|---|---|---|---|
| 1 | 2.1 | 0.8 | 3 | 0 |
| 2 | 1.4 | 1.6 | 0 | 2 |
| 3 | 0.9 | 0.7 | 1 | 1 |
| 4 | 1.8 | 1.1 | 2 | 1 |
| 5 | 1.3 | 1.9 | 2 | 2 |
Compute the cumulative actual points and cumulative expected points after all 5 matches. What is the Luck Index?
Exercise 16.12 (Difficulty: **) Team A has an Elo rating of 1750. Team B has an Elo rating of 1620. Team A is playing at home (home advantage = 65 Elo points). (a) Compute the expected result for Team A. (b) If Team A wins 2-1, compute the updated Elo ratings for both teams using $K = 30$ with goal difference scaling $K_{\text{adj}} = K \cdot \ln(|\Delta G| + 1)$.
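The two steps in Exercise 16.12 can be sketched as follows. This assumes the standard logistic Elo expectation on a 400-point scale, which may differ in detail from the form given in Section 16.6.2; the function names are illustrative.

```python
import math

def elo_expected(rating_home, rating_away, home_advantage=0.0):
    """Expected result (between 0 and 1) for the home team, standard 400-point scale."""
    diff = rating_home + home_advantage - rating_away
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def elo_update(rating_home, rating_away, goals_home, goals_away,
               k=30.0, home_advantage=0.0):
    """Return updated (home, away) ratings using K_adj = K * ln(|goal diff| + 1)."""
    expected = elo_expected(rating_home, rating_away, home_advantage)
    if goals_home > goals_away:
        actual = 1.0
    elif goals_home == goals_away:
        actual = 0.5
    else:
        actual = 0.0
    # As written, this scaling gives K_adj = 0 for a draw, so draws leave ratings unchanged.
    k_adj = k * math.log(abs(goals_home - goals_away) + 1)
    delta = k_adj * (actual - expected)
    return rating_home + delta, rating_away - delta
```

The update is zero-sum: whatever the home team gains, the away team loses, so the two ratings always add up to the same total before and after the match.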
Exercise 16.13 (Difficulty: **) Given the following style vectors (z-scores) for three teams:
| Dimension | Team X | Team Y | Team Z |
|---|---|---|---|
| Possession | 1.8 | -1.2 | 0.3 |
| Pressing | 1.5 | -0.8 | 1.1 |
| Directness | -1.0 | 1.5 | 0.2 |
| Width | 0.8 | 0.3 | -0.5 |
| Def. Height | 1.6 | -1.5 | 0.0 |
Compute the Euclidean distance between each pair of teams. Which two teams are most similar in style?
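A minimal helper for the pairwise comparison in Exercise 16.13 might look like the sketch below. The function names and the two-dimensional example vectors are made up for illustration; substitute the five-dimensional vectors from the table.

```python
import math
from itertools import combinations

def pairwise_style_distances(style_vectors):
    """Return {(team_a, team_b): Euclidean distance} for every pair of teams."""
    return {
        (a, b): math.dist(style_vectors[a], style_vectors[b])
        for a, b in combinations(sorted(style_vectors), 2)
    }

def most_similar_pair(style_vectors):
    """Return the pair of teams with the smallest style distance."""
    distances = pairwise_style_distances(style_vectors)
    return min(distances, key=distances.get)

# Illustrative z-score vectors (made-up values, not the exercise data).
example = {"P": [0.0, 0.0], "Q": [3.0, 4.0], "R": [0.5, 0.0]}
distances = pairwise_style_distances(example)
closest = most_similar_pair(example)
```

Because the inputs are z-scores, every dimension contributes on the same scale. With raw metrics, the dimension with the largest numeric range would dominate the distance.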
Exercise 16.14 (Difficulty: ***) Implement the Dixon-Coles correction factor $\tau$ for the case $\lambda_H = 1.3$, $\lambda_A = 0.9$, and $\rho = -0.05$. Compute the adjusted probability of a 0-0 draw and a 1-1 draw, and compare with the unadjusted (independent Poisson) probabilities.
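For Exercise 16.14, the sketch below uses the correction factor in the form given in the original Dixon and Coles paper, where only the four scorelines 0-0, 0-1, 1-0, and 1-1 are adjusted. The chapter's notation may differ, and the function names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def dixon_coles_tau(home_goals, away_goals, lam_home, lam_away, rho):
    """Low-score correction factor; equals 1 outside the four low-scoring cells."""
    if home_goals == 0 and away_goals == 0:
        return 1.0 - lam_home * lam_away * rho
    if home_goals == 0 and away_goals == 1:
        return 1.0 + lam_home * rho
    if home_goals == 1 and away_goals == 0:
        return 1.0 + lam_away * rho
    if home_goals == 1 and away_goals == 1:
        return 1.0 - rho
    return 1.0

def scoreline_probability(home_goals, away_goals, lam_home, lam_away, rho=0.0):
    """Dixon-Coles scoreline probability; rho=0 recovers the independent Poisson model."""
    independent = poisson_pmf(home_goals, lam_home) * poisson_pmf(away_goals, lam_away)
    return dixon_coles_tau(home_goals, away_goals, lam_home, lam_away, rho) * independent
```

With a negative $\rho$, the factor exceeds 1 for 0-0 and 1-1 and falls below 1 for 0-1 and 1-0. The total probability across those four cells is unchanged, so the correction only moves mass between them.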
Exercise 16.15 (Difficulty: ***) A team plays 6 matches with the following starting XIs (represented as sets of player IDs):
| Match | Starting XI |
|---|---|
| 1 | {1,2,3,4,5,6,7,8,9,10,11} |
| 2 | {1,2,3,4,5,6,7,8,9,10,12} |
| 3 | {1,2,3,4,5,6,7,8,9,12,13} |
| 4 | {1,2,3,4,5,14,7,8,9,10,11} |
| 5 | {1,2,3,4,5,14,7,8,9,10,11} |
| 6 | {1,2,3,4,5,6,7,8,15,10,11} |
Compute the Lineup Stability Index.
Exercise 16.16 (Difficulty: ***) Write a Python function that takes an array of team Elo ratings and a fixture list, and computes the Fixture Difficulty Rating (FDR) for each team over their next $n$ matches. Include home advantage adjustment.
Part C: Applied Analysis
Exercise 16.17 (Difficulty: **) Using the provided example-01-team-style.py code, generate style fingerprints for all 20 teams in a league. Create a radar chart for the top 4 teams and the bottom 4 teams. Describe the stylistic differences you observe between the groups.
Exercise 16.18 (Difficulty: **) Using the expected points model from example-02-expected-points.py, construct an xPts table for a full season. Identify the 3 teams with the largest positive Luck Index and the 3 with the largest negative Luck Index. Research whether these teams' subsequent season performance was consistent with regression toward xPts.
Exercise 16.19 (Difficulty: ***) Analyze a team that changed managers mid-season. Compute style fingerprints for the periods before and after the change. Quantify the style drift using the Euclidean distance metric. Which dimensions changed most?
Exercise 16.20 (Difficulty: ***) Using the season simulation code from example-03-season-simulation.py, run a simulation at three points in the season: after matchweek 10, matchweek 20, and matchweek 30. For the eventual champions, plot how their title probability evolved over the season. At what matchweek did their probability first exceed 50%? 75%? 95%?
Exercise 16.21 (Difficulty: ***) Compute the Squad Depth Index for each position group for a team of your choice. Identify the weakest position group and propose a signing target using the player similarity framework from Chapter 15 (find a player from a similar tactical system who would improve the SDI for that position).
Exercise 16.22 (Difficulty: ***) Perform a score-state analysis for one team across a full season. Compute possession percentage, PPDA, and xG per minute for each score state (winning, drawing, losing). Is the team tactically flexible or rigid? Support your answer with numbers.
Exercise 16.23 (Difficulty: ****) Build a fixture congestion model for a team competing in both the league and a cup competition. Compute congestion metrics for each match and correlate with performance metrics (xG, xGA, distance covered). Does the team show statistically significant performance degradation under congestion?
Exercise 16.24 (Difficulty: ****) Construct a full Dixon-Coles model for a league season. Estimate attack and defense parameters for all teams, the home advantage parameter, and the correlation parameter $\rho$. Use maximum likelihood estimation with time-decay weighting. Compare the predicted vs. actual league table.
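The time-decay weighting in Exercise 16.24 is commonly implemented as an exponential down-weighting of older matches inside the log-likelihood. The sketch below shows only that weighting step, under stated assumptions: the decay symbol `xi`, its default value, and the use of days as the time unit are illustrative choices, not values from the chapter.

```python
import math

def time_decay_weights(days_ago, xi=0.002):
    """Exponential weights exp(-xi * t); a match played today has weight 1."""
    return [math.exp(-xi * d) for d in days_ago]

def weighted_log_likelihood(match_log_likes, days_ago, xi=0.002):
    """Sum of per-match log-likelihoods, each scaled by its time-decay weight.

    match_log_likes holds log P(observed score | parameters) for each match,
    computed elsewhere from the Dixon-Coles scoreline probabilities.
    """
    weights = time_decay_weights(days_ago, xi)
    return sum(w * ll for w, ll in zip(weights, match_log_likes))

def half_life_days(xi):
    """Age at which a match's weight has dropped to one half."""
    return math.log(2.0) / xi
```

Setting `xi=0` recovers the unweighted likelihood. Note that `xi` cannot be estimated by maximizing this weighted likelihood itself, since the weights rescale the objective; it is usually chosen by out-of-sample predictive performance.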
Part D: Research and Extension
Exercise 16.25 (Difficulty: ***) Research the concept of "game states" beyond simple win/draw/loss score states. Some analysts define states by goal difference (e.g., winning by 2+, winning by 1, drawing, losing by 1, losing by 2+). Design a more granular game-state framework that also incorporates time remaining (e.g., "losing by 1 with 10 minutes left" vs. "losing by 1 with 60 minutes left"). What additional insights does this granularity provide?
Exercise 16.26 (Difficulty: ***) The Elo rating system described in Section 16.6.2 treats all wins equally (modulo goal difference). Design an extension that incorporates xG into the Elo update. How would you modify the expected result and actual result components? What are the advantages and disadvantages of an "xG-Elo" system?
Exercise 16.27 (Difficulty: ****) Team chemistry is modeled as a pairwise phenomenon in Section 16.4, but some combinations involve 3+ players (e.g., a triangle of two center-backs and a defensive midfielder). Propose a mathematical framework for measuring higher-order chemistry (triadic, tetradic). How would you estimate the parameters of such a model, given that the number of possible combinations grows combinatorially?
Exercise 16.28 (Difficulty: ****) Season simulations assume that team strength is fixed (or evolves independently of results). In reality, there are feedback effects: winning streaks boost confidence, losing streaks damage morale, and being near the top or bottom of the table changes incentives. Design a simulation model that incorporates these feedback effects. How would you calibrate the strength of the feedback from historical data?
Exercise 16.29 (Difficulty: ****) The current squad balance framework focuses on within-team analysis. Extend it to a league-wide comparative framework: define a "market efficiency" metric that measures whether a team's spending is optimally distributed across positions relative to the marginal value of improvement at each position. Hint: use the concept of marginal xG contribution per position.
Exercise 16.30 (Difficulty: *****) Implement a full Bayesian version of the Dixon-Coles model using PyMC or Stan. Place priors on attack/defense parameters, the home advantage, and $\rho$. Compare the posterior distributions with the maximum likelihood point estimates. How do the Bayesian credible intervals for season predictions compare with the frequentist simulation intervals?
Exercise 16.31 (Difficulty: *****) Design a "team DNA" metric that captures the long-term stylistic identity of a club across multiple managers. For example, Barcelona has maintained a possession-based identity across decades despite managerial changes. Operationalize this concept by computing style fingerprints across 10+ seasons and measuring the persistence of style dimensions. Which clubs have the strongest "DNA"? Which are most influenced by their current manager?
Exercise 16.32 (Difficulty: *****) Build an end-to-end season prediction system that combines: (a) Dixon-Coles team strengths with time decay, (b) squad depth adjustments for fixture congestion, (c) fixture difficulty ratings, (d) team chemistry adjustments for teams with significant transfer activity, and (e) score-state-dependent performance models. Validate on at least 3 historical seasons and report calibration metrics. Compare with a baseline model that uses only current league standings.