Chapter 19 Exercises: Modeling Soccer

Part A: Foundational Concepts (Exercises 1-6)

Exercise 1. Explain why the independent Poisson model is a natural starting point for modeling soccer scores. Identify at least three empirical properties of soccer goal data that are consistent with the Poisson distribution, and one important property that violates it.

Exercise 2. In the Dixon-Coles model, the correlation parameter rho is typically small and negative. Derive the effect of rho = -0.10 on the probability of a 0-0 draw when lambda = 1.3 and mu = 1.1. Compare this to the independent Poisson probability and explain intuitively why the adjustment goes in this direction.
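
As a starting point for checking the derivation in Exercise 2, here is a minimal sketch of the low-score correction. It assumes the chapter uses the standard Dixon-Coles factor tau, with lambda as the home scoring rate and mu as the away scoring rate; the function names are illustrative and not taken from the chapter.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of exactly k goals for a given scoring rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def dc_tau(x, y, lam, mu, rho):
    """Dixon-Coles correction factor; only the four low scores are adjusted."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def dc_score_prob(x, y, lam, mu, rho):
    """Probability of the score x-y (home-away) under the adjusted model."""
    return dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)


independent_00 = poisson_pmf(0, 1.3) * poisson_pmf(0, 1.1)
adjusted_00 = dc_score_prob(0, 0, 1.3, 1.1, rho=-0.10)
print(f"independent: {independent_00:.4f}, adjusted: {adjusted_00:.4f}")
```

Setting rho = 0 recovers the independent Poisson model, which gives a quick sanity check on any hand calculation.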

Exercise 3. The home team has attack strength alpha = 1.25, and the away team has defense strength beta = 0.95. The home advantage parameter is gamma = 1.30. Calculate the expected number of goals for the home team, lambda = alpha * beta * gamma. Then compute the Poisson probability of the home team scoring exactly 0, 1, 2, and 3 goals. What is the probability of scoring 4 or more?
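
The arithmetic in Exercise 3 can be verified with a short sketch like the one below. It assumes the multiplicative form lambda = alpha * beta * gamma with no separate league baseline term; the helper names are hypothetical.

```python
import math


def expected_goals(alpha, beta, gamma):
    """Home scoring rate under the assumed multiplicative form."""
    return alpha * beta * gamma


def poisson_pmf(k, rate):
    """Poisson probability of exactly k goals."""
    return math.exp(-rate) * rate**k / math.factorial(k)


lam = expected_goals(alpha=1.25, beta=0.95, gamma=1.30)
exact = [poisson_pmf(k, lam) for k in range(4)]  # P(0), P(1), P(2), P(3)
four_or_more = 1.0 - sum(exact)  # complement of scoring 0 to 3 goals

print(f"lambda = {lam:.5f}")
for k, p in enumerate(exact):
    print(f"P({k} goals) = {p:.4f}")
print(f"P(4+ goals) = {four_or_more:.4f}")
```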

Exercise 4. Explain the concept of time-decay weighting in the Dixon-Coles model. If the decay parameter xi = 0.0019 per day, calculate the weight assigned to a match played (a) 90 days ago, (b) 180 days ago, (c) 365 days ago, and (d) 730 days ago. Why might different leagues require different decay rates?
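
For Exercise 4, a minimal sketch of the weighting is shown below. It assumes the usual exponential form, weight = exp(-xi * t), with t measured in days since the match; the function names are illustrative.

```python
import math


def decay_weight(days_ago, xi=0.0019):
    """Exponential time-decay weight for a match played days_ago days ago."""
    return math.exp(-xi * days_ago)


def half_life_days(xi=0.0019):
    """Number of days after which a match counts half as much as one played today."""
    return math.log(2.0) / xi


for days in (90, 180, 365, 730):
    print(f"{days:>4} days ago: weight = {decay_weight(days):.4f}")
print(f"half-life: {half_life_days():.1f} days")
```

The half-life is often an easier quantity to reason about than xi itself when comparing decay rates across leagues.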

Exercise 5. Define expected goals (xG) and explain why xG differential is a better predictor of future team performance than actual goal differential, particularly in the first 10 matches of a season. Provide a hypothetical example where the two metrics give opposite assessments of a team.

Exercise 6. A shot is taken from coordinates (108, 35) on a pitch of dimensions 120 x 80 (in yards), where the goal center is at (120, 40). Calculate the distance to the center of the goal and the angle subtended by the goalposts (goal width 7.32 meters, i.e. 8 yards, so the posts sit at (120, 36) and (120, 44)). Explain why both features are important for an xG model.
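
The geometry in Exercise 6 can be sketched as follows. This assumes the 120 x 80 coordinates are in yards, so the 7.32-meter goal is 8 units wide and the posts lie 4 units either side of the goal center; those post positions and the function name are assumptions made for illustration.

```python
import math


def shot_geometry(x, y, goal_x=120.0, goal_y=40.0, goal_width=8.0):
    """Return (distance to goal center, angle subtended by the posts in radians)."""
    distance = math.hypot(goal_x - x, goal_y - y)
    # Vectors from the shot location to each post.
    ax, ay = goal_x - x, (goal_y - goal_width / 2) - y
    bx, by = goal_x - x, (goal_y + goal_width / 2) - y
    # Angle between the two vectors from their cross and dot products.
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    angle = math.atan2(abs(cross), dot)
    return distance, angle


distance, angle = shot_geometry(108, 35)
print(f"distance = {distance:.2f}, angle = {angle:.4f} rad ({math.degrees(angle):.1f} deg)")
```

Using atan2 on the cross and dot products keeps the angle well defined for shots from wide or very close positions, where a simple arctangent difference is easier to get wrong.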

Part B: Model Building and Estimation (Exercises 7-12)

Exercise 7. Write pseudocode for the maximum likelihood estimation procedure in the Dixon-Coles model. Include the constraint that the average attack strength equals 1. Explain what happens to the optimization if this identifiability constraint is removed.

Exercise 8. An xG model is trained on 50,000 shots and achieves a Brier score of 0.082. The baseline Brier score (predicting the average goal rate of 10.5% for every shot) is 0.094. Calculate the Brier skill score and interpret the result. Is this a good xG model? Compare to the theoretical minimum Brier score.
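
The quantities in Exercise 8 can be checked with a short sketch. It uses the standard definition of the Brier skill score, 1 - BS_model / BS_reference, and the fact that a constant forecast equal to the base rate p has Brier score p * (1 - p); the function names are illustrative.

```python
def baseline_brier(base_rate):
    """Brier score of always predicting the base rate for a binary outcome."""
    return base_rate * (1.0 - base_rate)


def brier_skill_score(model_brier, reference_brier):
    """Fractional improvement of the model over the reference forecast."""
    return 1.0 - model_brier / reference_brier


reference = baseline_brier(0.105)
bss = brier_skill_score(0.082, 0.094)
print(f"baseline Brier = {reference:.4f}, skill score = {bss:.4f}")
```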

Exercise 9. Implement a simple xG model using only two features: distance to goal and shot angle. Write the logistic regression equation explicitly. Using sample coefficients beta_0 = 1.2, beta_1 = -0.08 (distance), beta_2 = 3.5 (angle), calculate the xG for (a) a penalty kick at 12 yards with angle 0.45 radians, (b) a header from 8 yards with angle 0.35 radians, and (c) a long-range shot from 28 yards with angle 0.10 radians.
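
A minimal sketch of the two-feature model in Exercise 9 follows. It assumes the logistic form xG = 1 / (1 + exp(-(beta_0 + beta_1 * distance + beta_2 * angle))) with distance in yards and angle in radians, using the sample coefficients from the exercise; the function name is hypothetical.

```python
import math


def xg_logistic(distance, angle, beta_0=1.2, beta_1=-0.08, beta_2=3.5):
    """Goal probability from a two-feature logistic regression."""
    linear = beta_0 + beta_1 * distance + beta_2 * angle
    return 1.0 / (1.0 + math.exp(-linear))


shots = {
    "penalty": (12, 0.45),
    "header": (8, 0.35),
    "long-range": (28, 0.10),
}
for label, (distance, angle) in shots.items():
    print(f"{label}: xG = {xg_logistic(distance, angle):.3f}")
```

Because the model only sees distance and angle, the header in case (b) is treated like any other shot from that position, which is worth commenting on in your answer.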

Exercise 10. The Dixon-Coles model uses a log-link parameterization where lambda = exp(log_alpha_home + log_beta_away + log_gamma). Explain why the log-link is preferred over the multiplicative form lambda = alpha * beta * gamma for numerical optimization. What constraints does the log-link automatically satisfy?

Exercise 11. You have xG data for two teams over 15 matches. Team A has scored 18 goals from 14.2 xG. Team B has scored 8 goals from 13.8 xG. Calculate the goals-minus-xG difference for each team and explain the concept of regression to the mean in this context. Which team would you expect to improve their goal-scoring output, and by approximately how much over the next 15 matches?

Exercise 12. Describe how you would calibrate an xG model. What does it mean for an xG model to be well-calibrated? Sketch a calibration plot for a hypothetical model that overestimates goal probability for high-xG chances and underestimates it for low-xG chances. How would you fix this miscalibration?

Part C: Asian Handicap Markets (Exercises 13-18)

Exercise 13. Explain the difference between whole-ball, half-ball, and quarter-ball Asian handicaps. For a match where you back the home team at AH -0.75 at odds 1.95 and the match ends 1-0, compute the exact profit or loss on a 100-unit stake.
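
The settlement rules behind Exercise 13 can be sketched as below. This assumes the conventional treatment in which a quarter-ball line is split into two equal stakes on the adjacent half-ball and whole-ball lines; the function names are illustrative, and margin means goals scored by the backed team minus goals conceded.

```python
def settle_single_line(line, odds, margin, stake):
    """Profit or loss on a whole-ball or half-ball line for the backed team."""
    adjusted = margin + line
    if adjusted > 0:
        return stake * (odds - 1.0)  # win
    if adjusted < 0:
        return -stake  # loss
    return 0.0  # push: stake returned


def settle_asian_handicap(line, odds, margin, stake=1.0):
    """Profit or loss on any Asian handicap line, splitting quarter-ball lines."""
    quarters = round(line * 4)
    if quarters % 2 != 0:
        # Quarter-ball line: half the stake on each neighbouring line.
        lower = (quarters - 1) / 4
        upper = (quarters + 1) / 4
        return (settle_single_line(lower, odds, margin, stake / 2)
                + settle_single_line(upper, odds, margin, stake / 2))
    return settle_single_line(line, odds, margin, stake)


# Home -0.75 at 1.95, match ends 1-0 (margin +1 for the backed home side).
print(settle_asian_handicap(-0.75, 1.95, margin=1, stake=100))
```

The same function can be reused for an away bet by passing the margin from the away side's perspective, for example margin=-1 when the home team wins by one goal.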

Exercise 14. Your Dixon-Coles model produces the following 1X2 probabilities: Home 42%, Draw 27%, Away 31%. The market offers Home -0.25 at odds 1.92. Determine whether this represents positive expected value by computing the expected PnL per unit staked on the home side. Show your work step by step.
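
For Exercise 14, the expected-value calculation can be sketched as follows. It relies on the fact that a -0.25 line depends only on the 1X2 outcome: a home win pays in full, a draw loses half the stake (the -0.5 half loses, the 0 half pushes), and an away win loses the full stake. The function name is hypothetical.

```python
def ev_home_minus_quarter(p_home, p_draw, p_away, odds):
    """Expected profit per unit staked on Home -0.25 at the given decimal odds."""
    win_return = odds - 1.0
    return p_home * win_return + p_draw * (-0.5) + p_away * (-1.0)


ev = ev_home_minus_quarter(p_home=0.42, p_draw=0.27, p_away=0.31, odds=1.92)
print(f"expected PnL per unit staked: {ev:+.4f}")
```

A positive result would indicate value under the model; lines beyond plus or minus 0.5 need the full goal-margin distribution rather than 1X2 probabilities alone.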

Exercise 15. A match has the following Asian handicap lines from a sharp bookmaker: Home -0.5 at 2.02, Away +0.5 at 1.88. Calculate the implied probabilities (removing the overround proportionally). Then compare these to your model's 1X2 probabilities of Home 48%, Draw 26%, Away 26%. Is there a discrepancy? What does it suggest?
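
The proportional overround removal in Exercise 15 can be sketched as below. It takes the reciprocal of each decimal price and rescales so the probabilities sum to one; the function name is illustrative.

```python
def remove_overround(decimal_odds):
    """Return (normalized implied probabilities, overround) for a set of prices."""
    raw = [1.0 / price for price in decimal_odds]
    total = sum(raw)
    return [p / total for p in raw], total - 1.0


(fair_home, fair_away), overround = remove_overround([2.02, 1.88])
print(f"Home -0.5: {fair_home:.4f}, Away +0.5: {fair_away:.4f}, overround: {overround:.4f}")

# The model's comparable numbers: Home -0.5 wins only on a home win,
# Away +0.5 wins on a draw or an away win.
model_home, model_away = 0.48, 0.26 + 0.26
print(f"model: {model_home:.4f} vs {model_away:.4f}")
```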

Exercise 16. Create a complete payoff table for a bet on Away +0.75 at odds 2.00 for every possible match outcome category (home win by 2+, home win by 1, draw, away win). Remember that a quarter-ball line splits into two bets of half the stake each on the adjacent lines (here +0.5 and +1.0). Express the payoff as a fraction of the total stake.

Exercise 17. Professional soccer bettors prefer Asian handicap markets over 1X2 markets. List four reasons for this preference. Then explain a scenario where a 1X2 bet would actually be preferable to an Asian handicap bet.

Exercise 18. You observe the following Asian handicap line movements over 24 hours for a Premier League match: Opening: Home -0.5 at 1.95, After 6 hours: Home -0.75 at 1.95, After 12 hours: Home -0.75 at 2.00, Closing: Home -1.0 at 1.95. Interpret this sequence of movements. What does each shift tell you about sharp money flow? Would you bet with or against the closing line, and why?

Part D: League-Specific and Tournament Modeling (Exercises 19-24)

Exercise 19. The English Premier League averages 2.75 goals per game while the Italian Serie A averages 2.55 goals per game. Explain how you would adjust your Dixon-Coles model when applying it across these two leagues. If a team transfers from Serie A to the Premier League (hypothetically), how would you initialize their attack and defense parameters?

Exercise 20. Three teams are promoted from the Championship to the Premier League. Describe four different methods for initializing their parameters in your model. For each method, discuss its advantages and the data requirements. Which method would you use if you had access to full xG data from the Championship?

Exercise 21. Home advantage varies significantly across leagues: Premier League approximately 1.28, Bundesliga 1.30, MLS 1.42, and the Turkish Super Lig approximately 1.55. Propose three explanations for why home advantage is stronger in some leagues than others. How would you model a match between teams from different leagues at a neutral venue?

Exercise 22. International soccer tournaments (World Cup, Euros) present unique modeling challenges. List five specific challenges and for each, describe how you would address it in your model. Pay particular attention to the sample-size problem: a national team such as Brazil may play only about 8-10 competitive matches per year, far fewer than a club side.

Exercise 23. The concept of squad rotation is important in European soccer, where top clubs may play 50-60 matches per season across domestic league, domestic cup, and European competitions. Explain how squad rotation affects match prediction. Design a feature that captures the degree to which a team is likely to rotate its squad for a given match.

Exercise 24. You are building a model for the Argentine Primera Division. This league uses promotion and relegation, with relegation decided by a multi-year points-average table rather than the current season alone, and it has a tournament format that has changed multiple times. Describe the specific challenges this league presents compared to the Premier League and how you would adapt your model.

Part E: Advanced Applications and Betting Strategy (Exercises 25-30)

Exercise 25. Build a simulation-based approach to convert Dixon-Coles 1X2 probabilities into Asian handicap probabilities. Given P(Home) = 0.45, P(Draw) = 0.28, P(Away) = 0.27, and using a Poisson-based score simulation with lambda = 1.55 and mu = 1.15, run 50,000 simulations and compute the fair AH line and odds for the home team. Compare the simulated results with the analytical approach.
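
One possible skeleton for Exercise 25 is sketched below, using only the standard library. It assumes independent Poisson scoring (no Dixon-Coles correction) and computes the goal-margin distribution both exactly and by simulation; the function names, the truncation at 12 goals, and the random seed are all illustrative choices.

```python
import math
import random
from collections import Counter


def poisson_pmf(k, rate):
    """Poisson probability of exactly k goals."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def margin_distribution(lam, mu, max_goals=12):
    """Exact P(home goals - away goals = d), truncated at max_goals per side."""
    dist = Counter()
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            dist[x - y] += poisson_pmf(x, lam) * poisson_pmf(y, mu)
    return dist


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates such as goal counts."""
    threshold = math.exp(-rate)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_margins(lam, mu, n_sims=50_000, seed=0):
    """Simulated goal-margin frequencies from n_sims independent matches."""
    rng = random.Random(seed)
    counts = Counter(
        sample_poisson(lam, rng) - sample_poisson(mu, rng) for _ in range(n_sims)
    )
    return {d: c / n_sims for d, c in counts.items()}


def outcome_probs(dist):
    """Collapse a margin distribution into (home win, draw, away win)."""
    home = sum(p for d, p in dist.items() if d > 0)
    draw = dist.get(0, 0.0)
    away = sum(p for d, p in dist.items() if d < 0)
    return home, draw, away


exact = margin_distribution(1.55, 1.15)
simulated = simulate_margins(1.55, 1.15)
print("exact:    ", [round(p, 4) for p in outcome_probs(exact)])
print("simulated:", [round(p, 4) for p in outcome_probs(simulated)])
```

From either margin distribution, the fair price for a given handicap follows by combining the win, push, and loss probabilities for that line. The fair line is the one whose fair odds are closest to even money.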

Exercise 26. Design a complete soccer betting system that combines Dixon-Coles ratings, xG-based adjustments, and Asian handicap analysis. Specify: (a) the data sources required, (b) the model update frequency, (c) the criteria for identifying value bets, (d) a bankroll management strategy, and (e) how you would evaluate the system's performance over a season.

Exercise 27. The xG regression trade involves betting against teams that are significantly outperforming their xG. Design a backtest for this strategy using the following protocol: after matchweek 10, identify all teams whose actual goals exceed their xG by more than 30%. Bet against these teams (on the opponent or the under) for the next 5 matchweeks. Specify how you would measure profitability and assess statistical significance.

Exercise 28. The draw is often mispriced in European soccer markets because recreational bettors tend to avoid it. Using historical data showing that draws occur in approximately 26% of matches but the average 1X2 draw odds imply only 23%, calculate the expected value of a blind draw-betting strategy. Then explain why this simplistic edge likely does not survive transaction costs and discuss how you would identify which specific matches offer genuine draw value.
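
The headline number in Exercise 28 can be checked with a short sketch. It assumes the 23% figure is the raw reciprocal of the offered draw price, so the decimal odds are 1 / 0.23, and that stakes are flat; the function name is hypothetical.

```python
def flat_stake_ev(true_prob, implied_prob):
    """Expected profit per unit staked when odds are 1 / implied_prob."""
    decimal_odds = 1.0 / implied_prob
    return true_prob * decimal_odds - 1.0


ev = flat_stake_ev(true_prob=0.26, implied_prob=0.23)
print(f"expected value per unit staked: {ev:+.4f}")
```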

Exercise 29. Implement a Bayesian updating scheme for Dixon-Coles parameters that updates after each matchweek rather than re-estimating from scratch. Describe the prior distribution you would use for each parameter (attack, defense, home advantage, rho) and explain how the posterior is computed after observing a new set of match results. What are the computational advantages over full re-estimation?

Exercise 30. You manage a bankroll dedicated to soccer betting across five major European leagues (Premier League, La Liga, Bundesliga, Serie A, Ligue 1). Design a season-long portfolio strategy that accounts for: (a) the different margins in each league's market, (b) the different model accuracy achievable in each league, (c) the interaction between simultaneous matches, (d) the optimal allocation of bankroll across leagues, and (e) a drawdown limit protocol. Specify concrete thresholds and percentages.