Chapter 7: Exercises

Overview

These exercises reinforce the concepts from Chapter 7 on Expected Goals (xG) models. They progress from fundamental understanding through practical implementation to advanced analysis. Complete solutions are available in the code/exercise-solutions.py file.


Part A: Conceptual Understanding (Questions 1-6)

Exercise 1: xG Fundamentals

Difficulty: Basic

Explain in your own words:
a) What problem does xG solve that traditional shot statistics cannot?
b) Why is a shot from 8 meters with a narrow angle different from one at 12 meters with a wide angle, even if their conversion rates are similar?
c) What does an xG value of 0.25 mean in practical terms?


Exercise 2: Feature Importance Ranking

Difficulty: Basic

Rank the following xG model features from most to least important, based on typical feature importance scores. Justify your ranking:

  • Shot body part (foot vs. head)
  • Distance to goal
  • Angle to goal
  • Time remaining in match
  • Current score differential
  • Assist type (through ball, cross, etc.)

Exercise 3: Interpreting Match xG

Difficulty: Basic

Consider a match with the following statistics:

  • Team A: 1.8 xG, 3 goals scored
  • Team B: 2.2 xG, 1 goal scored

a) Which team created better chances?
b) Which team was more efficient at converting chances?
c) If this match were replayed 1000 times with the same chances, in approximately what percentage of replays would Team A win?
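For part (c), a common shortcut treats each team's goals as an independent Poisson variable whose mean is that team's xG. This is an approximation (each real shot is its own Bernoulli trial, and summing xG discards that detail), but it gives a quick estimate. A minimal sketch that sums the probability of every scoreline up to an assumed cap of 10 goals per team:

```python
import math


def poisson_pmf(k, mu):
    """Probability of exactly k goals when goals ~ Poisson(mu)."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def match_outcome_probs(xg_a, xg_b, max_goals=10):
    """Return (P(A wins), P(draw), P(B wins)) assuming independent Poisson goals.

    Scorelines above max_goals are ignored; their probability is negligible
    for typical match xG values.
    """
    win_a = draw = win_b = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, xg_a) * poisson_pmf(b, xg_b)
            if a > b:
                win_a += p
            elif a == b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


p_a, p_draw, p_b = match_outcome_probs(1.8, 2.2)
print(f"Team A win: {p_a:.1%}, draw: {p_draw:.1%}, Team B win: {p_b:.1%}")
```

Multiplying the first probability by 1000 gives the approximate number of replays Team A would win under this model.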


Exercise 4: Model Evaluation Metrics

Difficulty: Intermediate

Explain:
a) The difference between log loss and accuracy for xG models
b) The difference between ROC AUC and precision-recall AUC
c) Why a model might have excellent ROC AUC but poor calibration


Exercise 5: xG Limitations

Difficulty: Intermediate

For each scenario, explain why standard xG models might produce misleading conclusions:

a) A penalty shootout specialist takes 30 penalties in a season
b) A striker who only plays against weak opposition
c) A team that scores 90% of their goals from set pieces
d) A goalkeeper who faces primarily long-range shots


Exercise 6: Descriptive vs. Predictive xG

Difficulty: Intermediate

Explain the tension between using xG for:

  • Describing what happened in a match
  • Predicting future performance

When would you prioritize one interpretation over the other? Give specific examples.


Part B: Distance and Angle Calculations (Questions 7-12)

Exercise 7: Distance Calculation

Difficulty: Basic

Calculate the distance to the goal center (x=120, y=40 in StatsBomb coordinates) for shots taken from:
a) (108, 40)
b) (115, 35)
c) (100, 55)
d) (112, 40)

Show your work using the Euclidean distance formula.
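A minimal Python sketch of the Euclidean distance calculation. StatsBomb's 120 × 80 pitch is nominally measured in yards; like the exercise, the sketch simply treats one coordinate unit as one meter.

```python
import math

GOAL_CENTER = (120.0, 40.0)  # goal centre in StatsBomb coordinates, as in the exercise


def shot_distance(x, y, goal=GOAL_CENTER):
    """Euclidean distance sqrt(dx^2 + dy^2) from (x, y) to the goal centre."""
    return math.hypot(goal[0] - x, goal[1] - y)


for shot in [(108, 40), (115, 35), (100, 55), (112, 40)]:
    print(shot, round(shot_distance(*shot), 2))
```

For the four shots this gives 12.0, 7.07, 25.0, and 8.0 units respectively.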


Exercise 8: Angle Calculation

Difficulty: Intermediate

Using the goal post coordinates (left post at y=36.34, right post at y=43.66, i.e. a 7.32m goal centered at y=40), calculate the shot angle (the angle subtended by the two posts) in degrees for:
a) A shot from (110, 40), directly in front of goal
b) A shot from (110, 50), to the right of center
c) A shot from (105, 35), to the left at a wider position
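One way to compute the angle is to take the difference between the directions from the shot to each post with `math.atan2`. The sketch below assumes the shot is taken in front of the goal line (x < 120), so both directions point toward positive x and the difference needs no wrap-around handling.

```python
import math

LEFT_POST = (120.0, 36.34)
RIGHT_POST = (120.0, 43.66)


def shot_angle_deg(x, y, left=LEFT_POST, right=RIGHT_POST):
    """Angle (degrees) subtended at (x, y) by the two goal posts; assumes x < 120."""
    to_left = math.atan2(left[1] - y, left[0] - x)
    to_right = math.atan2(right[1] - y, right[0] - x)
    return math.degrees(abs(to_right - to_left))


for shot in [(110, 40), (110, 50), (105, 35)]:
    print(shot, round(shot_angle_deg(*shot), 1))
```

The central shot from (110, 40) subtends roughly 40.2°; moving sideways to (110, 50) narrows the angle even though the distance to the goal line is unchanged.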


Exercise 9: Distance-Angle Relationship

Difficulty: Intermediate

Write a function that, given a distance and angle, determines whether the shot is:

  • "Central close" (distance < 12m, angle > 25°)
  • "Central medium" (12m ≤ distance < 20m, angle > 20°)
  • "Wide close" (distance < 12m, angle ≤ 25°)
  • "Wide medium" (12m ≤ distance < 20m, angle ≤ 20°)
  • "Long range" (distance ≥ 20m)

Test your function on the shots from Exercises 7-8.
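A minimal sketch of the classifier, checking the long-range case first so the remaining branches only need to split on the 12m boundary and the angle thresholds listed above:

```python
def classify_shot(distance, angle):
    """Bucket a shot using Exercise 9's thresholds (distance in m, angle in degrees)."""
    if distance >= 20:
        return "Long range"
    if distance < 12:
        return "Central close" if angle > 25 else "Wide close"
    # Remaining case: 12 <= distance < 20
    return "Central medium" if angle > 20 else "Wide medium"


print(classify_shot(10, 40.2), classify_shot(25, 15))
```

Feeding it the distances and angles computed in Exercises 7-8 covers every category except "Central medium" and "Wide medium", so consider adding a shot from around 15m to exercise those branches.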


Exercise 10: Visualizing Shot Zones

Difficulty: Intermediate

Create a visualization that:
a) Divides the attacking third into zones based on distance (0-6m, 6-12m, 12-18m, 18-25m, 25m+)
b) Colors each zone according to typical conversion rate
c) Overlays the goal posts for reference


Exercise 11: Expected Goals by Zone

Difficulty: Intermediate

Using StatsBomb open data from the 2018 World Cup:
a) Calculate the average xG for shots in each zone from Exercise 10
b) Calculate the actual conversion rate in each zone
c) Compare your zone-based estimates to StatsBomb's xG values


Exercise 12: Optimal Shooting Position

Difficulty: Advanced

Because xG depends jointly on distance (closer is better) and angle (wider is better), find the position(s) on the pitch that maximize expected goals by:
a) Writing a function that estimates xG from position using distance and angle
b) Creating a heatmap showing estimated xG across the attacking third
c) Identifying the optimal shooting position (highest xG) at different distances from goal
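A minimal sketch of parts (a) to (c), assuming a logistic model on distance and angle. The coefficients below are hypothetical placeholders chosen only to produce a plausible surface, not fitted values; replace them with your own from part (a). The grid dictionary can be passed to any heatmap plotting routine.

```python
import math

# Hypothetical coefficients for illustration only; angle is in radians.
INTERCEPT, B_DIST, B_ANGLE = -1.0, -0.1, 2.0
GOAL_X, LEFT_Y, RIGHT_Y = 120.0, 36.34, 43.66


def xg_from_position(x, y):
    """Logistic xG estimate from distance and post-to-post angle (x <= 120 assumed)."""
    dist = math.hypot(GOAL_X - x, 40.0 - y)
    angle = abs(math.atan2(RIGHT_Y - y, GOAL_X - x) - math.atan2(LEFT_Y - y, GOAL_X - x))
    z = INTERCEPT + B_DIST * dist + B_ANGLE * angle
    return 1 / (1 + math.exp(-z))


def xg_grid(step=2):
    """Estimated xG at grid points across the attacking third (x from 80 to 118)."""
    return {(x, y): xg_from_position(x, y)
            for x in range(80, 120, step) for y in range(0, 81, step)}


def best_position_at_distance(d, n=181):
    """Scan n points on a half-circle of radius d around the goal centre; return the best."""
    best = None
    for i in range(n):
        theta = math.radians(-90 + 180 * i / (n - 1))
        x, y = GOAL_X - d * math.cos(theta), 40.0 + d * math.sin(theta)
        value = xg_from_position(x, y)
        if best is None or value > best[2]:
            best = (x, y, value)
    return best


for d in (6, 12, 18):
    x, y, value = best_position_at_distance(d)
    print(f"d={d}: best at ({x:.1f}, {y:.1f}), xG={value:.3f}")
```

Every point on a circle around the goal centre has the same distance, so along each arc only the angle varies; for a symmetric model like this one the central point comes out best at every distance.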


Part C: Building xG Models (Questions 13-18)

Exercise 13: Simple Logistic Regression

Difficulty: Intermediate

Using World Cup 2018 data:
a) Build a logistic regression model using only distance as a feature
b) Report the coefficient and intercept
c) Calculate xG for shots at distances of 5m, 10m, 15m, and 25m
d) Plot the xG curve from 0-40m
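scikit-learn's `LogisticRegression` is the usual tool here; note that it applies L2 regularization by default, so its coefficients will differ slightly from an unpenalized fit. To stay dependency-free, the sketch below fits the one-feature model by Newton-Raphson in plain Python. The shots are synthetic, drawn from an assumed curve (intercept 1.0, slope -0.15 per meter), and are not World Cup data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1 + ez)


def fit_logistic_1d(xs, ys, iters=25):
    """Unpenalized fit of P(goal) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            g0 += y - p              # gradient of the log-likelihood
            g1 += (y - p) * x
            w = p * (1 - p)          # entries of X^T W X
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: (X^T W X)^-1 @ gradient
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Synthetic shots for illustration only (not World Cup data).
random.seed(0)
dists = [random.uniform(2, 40) for _ in range(2000)]
goals = [1 if random.random() < sigmoid(1.0 - 0.15 * d) else 0 for d in dists]

b0, b1 = fit_logistic_1d(dists, goals)
print(f"intercept={b0:.3f}, coefficient={b1:.3f}")
for d in (5, 10, 15, 25):
    print(f"{d} m: xG = {sigmoid(b0 + b1 * d):.3f}")
```

With real data, swap `dists` and `goals` for the World Cup shot distances and outcomes; the recovered slope should be negative, since xG falls with distance.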


Exercise 14: Multi-Feature Model

Difficulty: Intermediate

Extend Exercise 13 by:
a) Adding angle and body part as features
b) Comparing log loss between the simple and extended model
c) Interpreting the coefficients: which features have the strongest effect?
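For part (b), log loss is the mean binary cross-entropy between outcomes and predicted probabilities; lower is better. A minimal sketch (scikit-learn's `sklearn.metrics.log_loss` computes the same quantity), with probabilities clipped so a confident wrong prediction cannot produce log(0):

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(log_loss([1, 0, 0], [0.8, 0.1, 0.2]))
```

Evaluate both models on the same held-out shots so the comparison is fair.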


Exercise 15: Feature Engineering

Difficulty: Intermediate

Create the following derived features and test whether they improve model performance:
a) log_distance: Natural log of distance
b) distance_squared: Square of distance
c) angle_distance_interaction: Angle × Distance
d) is_header: Binary indicator for headers
e) is_close_range: Binary indicator for shots within 10m

Report the improvement in log loss and ROC AUC.
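A minimal sketch of the derived features for a single shot. It assumes headers are labelled with the string "Head" (as in StatsBomb's shot body-part names; check your data) and reads "within 10m" as distance ≤ 10.

```python
import math


def derived_features(distance, angle, body_part):
    """Exercise 15's derived features for one shot (distance in m, angle in degrees)."""
    return {
        "log_distance": math.log(distance),            # requires distance > 0
        "distance_squared": distance ** 2,
        "angle_distance_interaction": angle * distance,
        "is_header": 1 if body_part == "Head" else 0,  # assumed label for headers
        "is_close_range": 1 if distance <= 10 else 0,
    }


print(derived_features(12.0, 30.0, "Right Foot"))
```

Because `log_distance` and `distance_squared` are nonlinear in distance, they let a logistic model bend its xG curve without switching to a tree-based model.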


Exercise 16: Gradient Boosting Implementation

Difficulty: Advanced

Build a gradient boosting xG model:
a) Use at least 6 features (distance, angle, body part, shot type, x-coordinate, y-coordinate)
b) Tune hyperparameters using 5-fold cross-validation
c) Compare performance to logistic regression
d) Analyze feature importances


Exercise 17: Model Calibration

Difficulty: Advanced

For your gradient boosting model:
a) Create a calibration curve comparing predicted probabilities to actual outcomes
b) Identify any regions of miscalibration
c) Apply Platt scaling or isotonic regression to improve calibration
d) Report the change in Brier score after calibration
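A minimal sketch of parts (a) and (d): equal-width probability bins for the calibration curve and the Brier score (mean squared error between probability and outcome). scikit-learn offers `sklearn.calibration.calibration_curve` and `sklearn.metrics.brier_score_loss` for the same jobs, and `CalibratedClassifierCV` with `method="sigmoid"` (Platt) or `method="isotonic"` for part (c).

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_table(y_true, y_prob, n_bins=10):
    """(mean predicted, observed rate, count) for each non-empty equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    table = []
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            table.append((mean_pred, observed, len(b)))
    return table
```

A well-calibrated model has `mean_pred` close to `observed` in every bin; bins with few shots are noisy, so report the counts alongside the rates.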


Exercise 18: Cross-Competition Validation

Difficulty: Advanced

Test model generalization:
a) Train an xG model on World Cup 2018 data
b) Evaluate on Women's World Cup 2019 data (or another available competition)
c) Compare performance metrics between in-sample and out-of-sample evaluation
d) Discuss reasons for any performance degradation


Part D: Applying xG Analysis (Questions 19-24)

Exercise 19: Player Finishing Analysis

Difficulty: Intermediate

Using World Cup 2018 data:
a) Identify all players with at least 10 shots
b) Calculate goals, xG, and goals-minus-xG for each
c) Rank players by "finishing skill" (goals/xG ratio)
d) Discuss the reliability of these rankings given sample sizes
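A minimal aggregation sketch, assuming the shots have already been extracted into hypothetical `(player, xg, is_goal)` tuples (for StatsBomb data, the xG would come from each shot's `statsbomb_xg` value). Players with zero total xG are skipped to avoid dividing by zero.

```python
from collections import defaultdict


def finishing_table(shots, min_shots=10):
    """Per-player shots, goals, xG, goals-minus-xG and goals/xG, sorted by goals/xG."""
    agg = defaultdict(lambda: {"shots": 0, "goals": 0, "xg": 0.0})
    for player, xg, is_goal in shots:
        row = agg[player]
        row["shots"] += 1
        row["goals"] += int(is_goal)
        row["xg"] += xg
    table = []
    for player, row in agg.items():
        if row["shots"] >= min_shots and row["xg"] > 0:
            table.append({
                "player": player,
                **row,
                "g_minus_xg": row["goals"] - row["xg"],
                "g_per_xg": row["goals"] / row["xg"],
            })
    return sorted(table, key=lambda r: r["g_per_xg"], reverse=True)
```

With only 10 to 30 shots per player, a couple of goals either way swings the ratio sharply, which is the point part (d) asks you to discuss.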


Exercise 20: Team Shot Profile

Difficulty: Intermediate

For France and Croatia in the 2018 World Cup:
a) Calculate total shots, total xG, and xG per shot
b) Create shot maps showing location and xG for each team
c) Compare their shot profiles: which team took higher quality chances?
d) Analyze differences in shot zones (inside box vs. outside, central vs. wide)


Exercise 21: Match Analysis Deep Dive

Difficulty: Intermediate

Select the World Cup Final (France 4-2 Croatia):
a) Plot a timeline of xG accumulation for both teams
b) Identify the highest xG chance for each team
c) Calculate the probability France would win given the xG created (using Poisson simulation)
d) Discuss whether the actual scoreline was "deserved"
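For part (a), the timeline is a step function of cumulative xG against match minute. A minimal sketch, assuming shots are given as hypothetical `(minute, team, xg)` tuples (with StatsBomb events these would come from each shot's `minute`, `team.name`, and `shot.statsbomb_xg`):

```python
def xg_timeline(shots, team):
    """Step-function points (minute, cumulative xG) for one team, starting at (0, 0.0)."""
    points = [(0, 0.0)]
    total = 0.0
    for minute, shot_team, xg in sorted(shots, key=lambda s: s[0]):
        if shot_team == team:
            total += xg
            points.append((minute, total))
    return points


example = [(10, "France", 0.1), (5, "Croatia", 0.2), (30, "France", 0.3)]  # illustrative values
print(xg_timeline(example, "France"))
```

Plotting the points with matplotlib's `step(..., where="post")` keeps each team's total flat until its next shot; append a final `(90, total)` point so both lines extend to full time.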


Exercise 22: Goalkeeper Evaluation

Difficulty: Advanced

For a selected goalkeeper with at least 20 shots faced:
a) Calculate total xG conceded and actual goals conceded
b) Compute "goals saved above expected" (xG conceded minus goals conceded)
c) Break down by shot zone to identify strengths and weaknesses
d) Discuss limitations of this analysis without post-shot xG


Exercise 23: Chance Creation Analysis

Difficulty: Advanced

Analyze which players create the best chances for teammates:
a) Identify all passes that immediately precede shots
b) Sum the xG of shots created by each passer (Expected Assists / xA)
c) Compare xA to actual assists
d) Identify the top 5 chance creators by xA
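A minimal sketch of parts (a) and (b), assuming events shaped like StatsBomb's JSON, where a shot's `shot.key_pass_id` holds the `id` of the pass that set it up. Verify the field names against your own data before relying on them.

```python
from collections import defaultdict


def expected_assists(events):
    """Sum shot xG per passer by linking each shot to its key pass; sorted descending."""
    passer_by_pass_id = {
        e["id"]: e["player"]["name"]
        for e in events
        if e["type"]["name"] == "Pass"
    }
    xa = defaultdict(float)
    for e in events:
        if e["type"]["name"] != "Shot":
            continue
        key_pass = e["shot"].get("key_pass_id")    # absent for unassisted shots
        if key_pass in passer_by_pass_id:
            xa[passer_by_pass_id[key_pass]] += e["shot"]["statsbomb_xg"]
    return sorted(xa.items(), key=lambda kv: kv[1], reverse=True)
```

Taking the first five entries of the result answers part (d).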


Exercise 24: xG Rolling Average

Difficulty: Intermediate

For a team of your choice with multiple matches:
a) Calculate xG created and xG conceded per match
b) Compute 3-match rolling averages
c) Identify trends in chance creation/prevention over the tournament
d) Visualize the rolling xG with actual goals overlaid
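A minimal sketch of a trailing 3-match rolling mean; pandas users can get the same result from `Series.rolling(3).mean()`, which yields NaN where this returns `None`.

```python
def rolling_mean(values, window=3):
    """Trailing rolling mean; the first window - 1 entries are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


xg_for = [1.2, 0.8, 2.1, 1.5, 0.9]  # illustrative values, not real match data
print(rolling_mean(xg_for))
```

A short tournament gives only a handful of points, so treat any trend in the rolling line cautiously.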


Part E: Model Comparison and Evaluation (Questions 25-28)

Exercise 25: Benchmark Comparison

Difficulty: Intermediate

Compare three xG estimation approaches:
a) Simple distance-only logistic regression
b) Multi-feature gradient boosting (your model)
c) StatsBomb xG values (provided in data)

Report log loss, ROC AUC, and Brier score for each. Which performs best?
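ROC AUC equals the probability that a randomly chosen goal receives a higher prediction than a randomly chosen non-goal, counting ties as half. A minimal sketch of that pairwise (Mann-Whitney) formulation; `sklearn.metrics.roc_auc_score` returns the same value. It is O(goals × non-goals), which is fine for a single tournament's shots.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via pairwise comparison of goals and non-goals (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one goal and one non-goal")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

AUC only measures ranking, which is why it should be read alongside log loss and Brier score, both of which also penalize miscalibrated probabilities.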


Exercise 26: Lift Analysis

Difficulty: Intermediate

For your best model:
a) Divide predictions into deciles (10 groups by xG)
b) Calculate actual conversion rate in each decile
c) Compute lift (actual rate / baseline rate) for each decile
d) Create a lift chart visualization

As a rule of thumb, the top decile of a good model should have lift above 2.0.
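A minimal sketch of parts (a) to (c): sort shots by predicted xG, split them into ten near-equal groups, and divide each group's conversion rate by the overall rate.

```python
def decile_lift(y_true, y_prob, n_groups=10):
    """Lift per group after sorting by predicted xG (group 0 = highest xG).

    Lift = conversion rate within the group / overall conversion rate.
    """
    if len(y_true) < n_groups or sum(y_true) == 0:
        raise ValueError("need at least n_groups shots and at least one goal")
    pairs = sorted(zip(y_prob, y_true), key=lambda t: t[0], reverse=True)
    baseline = sum(y_true) / len(y_true)
    n = len(pairs)
    lifts = []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        rate = sum(y for _, y in group) / len(group)
        lifts.append(rate / baseline)
    return lifts
```

A lift of 1.0 means a group converts at the overall rate; the chart in part (d) is simply these values plotted from group 0 to group 9.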


Exercise 27: Residual Analysis

Difficulty: Advanced

Examine model residuals:
a) Calculate residuals (actual outcome - predicted probability) for all shots
b) Group residuals by distance, angle, and body part
c) Identify any systematic patterns suggesting missing features
d) Propose additional features that might address the patterns


Exercise 28: Confidence Intervals

Difficulty: Advanced

Quantify uncertainty in xG predictions:
a) Use bootstrap sampling (1000 iterations) to estimate model uncertainty
b) For a sample of shots, compute 95% confidence intervals for xG
c) Visualize how confidence interval width varies with predicted xG
d) Discuss implications for communicating xG to non-technical audiences
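A minimal percentile-bootstrap sketch. The helper resamples shots with replacement, refits a model on each resample, and records one prediction. To keep it self-contained, the "model" here is a hypothetical toy stand-in (the conversion rate of shots under 12m) applied to synthetic data; in the exercise, `fit_and_predict` would refit your logistic or gradient boosting model and return xG for a fixed shot.

```python
import random


def bootstrap_ci(data, fit_and_predict, n_iter=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a model's prediction.

    fit_and_predict(sample) must refit on the resampled shots and return the
    prediction of interest.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        fit_and_predict([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_iter)
    )
    lower = estimates[int(n_iter * alpha / 2)]
    upper = estimates[int(n_iter * (1 - alpha / 2)) - 1]
    return lower, upper


def close_range_rate(sample):
    """Toy stand-in for a fitted model: conversion rate of shots under 12 m."""
    close = [goal for dist, goal in sample if dist < 12]
    return sum(close) / len(close) if close else 0.0


# Synthetic (distance, goal) pairs for illustration only.
rng = random.Random(1)
shots = []
for _ in range(500):
    d = rng.uniform(3, 35)
    shots.append((d, 1 if rng.random() < (0.35 if d < 12 else 0.05) else 0))

print(bootstrap_ci(shots, close_range_rate))
```

Refitting a real model 1000 times is the expensive part; fixing the seed keeps the intervals reproducible between runs.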


Part F: Simulation and Prediction (Questions 29-32)

Exercise 29: Basic Match Simulation

Difficulty: Intermediate

Using Poisson distributions:
a) Write a function that simulates a match outcome given home and away xG
b) Run 10,000 simulations for a match with home xG = 1.5, away xG = 1.2
c) Calculate probabilities of home win, draw, and away win
d) Generate the distribution of most likely scorelines
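Python's `random` module has no Poisson sampler, so this minimal sketch uses Knuth's multiplication method, which is simple and adequate for match-sized means. It assumes the two teams' goal counts are independent.

```python
import math
import random
from collections import Counter


def poisson_sample(mu, rng):
    """Draw a Poisson(mu) count with Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_match(home_xg, away_xg, rng):
    """One simulated scoreline (home goals, away goals)."""
    return poisson_sample(home_xg, rng), poisson_sample(away_xg, rng)


def simulate_many(home_xg, away_xg, n=10_000, seed=42):
    """Home/draw/away probabilities and the five most common scorelines."""
    rng = random.Random(seed)
    scores = Counter(simulate_match(home_xg, away_xg, rng) for _ in range(n))
    home = sum(c for (h, a), c in scores.items() if h > a) / n
    draw = sum(c for (h, a), c in scores.items() if h == a) / n
    away = sum(c for (h, a), c in scores.items() if h < a) / n
    return home, draw, away, scores.most_common(5)


home, draw, away, top = simulate_many(1.5, 1.2)
print(f"home={home:.3f}, draw={draw:.3f}, away={away:.3f}")
print(top)
```
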


Exercise 30: Season Points Simulation

Difficulty: Intermediate

For a hypothetical team with 1.7 xG/match and 1.2 xGA/match:
a) Simulate 1000 full 38-match seasons
b) Calculate the mean, standard deviation, and percentiles of total points
c) Estimate the probability of finishing with 70+ points (a typical Champions League qualification total)
d) Estimate the probability of finishing below 35 points (typically relegation territory)
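A minimal sketch, assuming every match uses the same 1.7 xG and 1.2 xGA regardless of opponent or venue, with independent Poisson goals drawn by Knuth's method. `statistics.quantiles` supplies the percentiles.

```python
import math
import random
import statistics


def poisson_sample(mu, rng):
    """Knuth's method for drawing a Poisson(mu) count; adequate for small mu."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_season_points(xg_for, xg_against, rng, matches=38):
    """Total points for one season of independent Poisson matches."""
    points = 0
    for _ in range(matches):
        scored = poisson_sample(xg_for, rng)
        conceded = poisson_sample(xg_against, rng)
        points += 3 if scored > conceded else 1 if scored == conceded else 0
    return points


rng = random.Random(7)
seasons = [simulate_season_points(1.7, 1.2, rng) for _ in range(1000)]

mean_pts = statistics.mean(seasons)
sd_pts = statistics.stdev(seasons)
deciles = statistics.quantiles(seasons, n=10)   # 10th, 20th, ..., 90th percentiles
p_70_plus = sum(p >= 70 for p in seasons) / len(seasons)
p_below_35 = sum(p < 35 for p in seasons) / len(seasons)
print(f"mean={mean_pts:.1f}, sd={sd_pts:.1f}, deciles={deciles}")
print(f"P(70+)={p_70_plus:.3f}, P(<35)={p_below_35:.3f}")
```

The spread of the simulated totals shows how much of a season's points tally can come from finishing variance alone, even with fixed underlying chance quality.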


Exercise 31: Monte Carlo Match Prediction

Difficulty: Advanced

Create a full match prediction system:
a) Take historical xG/xGA averages for two teams
b) Apply a home advantage adjustment (+10% xG for the home team)
c) Generate a scoreline probability matrix using Poisson
d) Calculate implied fair betting odds (1/probability) for each outcome
e) Compare to actual bookmaker odds for a recent match, bearing in mind that bookmaker odds include a margin
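A minimal sketch of parts (b) to (d). It takes each side's expected goals as inputs; how you build those from xG/xGA averages (for example, averaging one team's xG with the opponent's xGA) is a modeling choice the exercise leaves open, and the 1.5 and 1.2 below are illustrative. Probabilities are renormalized to account for the mass dropped by capping the matrix at 8 goals.

```python
import math


def poisson_pmf(k, mu):
    """P(exactly k goals) under Poisson(mu)."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def scoreline_matrix(home_xg, away_xg, home_boost=1.10, max_goals=8):
    """matrix[i][j] = P(home scores i, away scores j), independent Poisson goals.

    home_boost applies the exercise's +10% home-advantage adjustment.
    """
    home_mu = home_xg * home_boost
    return [[poisson_pmf(i, home_mu) * poisson_pmf(j, away_xg)
             for j in range(max_goals + 1)]
            for i in range(max_goals + 1)]


def fair_odds(matrix):
    """Decimal odds (1/probability) for home, draw, away, with no bookmaker margin."""
    home = sum(p for i, row in enumerate(matrix) for j, p in enumerate(row) if i > j)
    draw = sum(matrix[i][i] for i in range(len(matrix)))
    away = sum(p for i, row in enumerate(matrix) for j, p in enumerate(row) if i < j)
    total = home + draw + away   # renormalise mass lost by capping at max_goals
    return {"home": total / home, "draw": total / draw, "away": total / away}


matrix = scoreline_matrix(1.5, 1.2)
print(fair_odds(matrix))
```

For fair odds the implied probabilities (1/odds) sum to exactly 1; a bookmaker's odds sum to more than 1, and that excess is the margin to strip out before comparing in part (e).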


Exercise 32: Tournament Simulation

Difficulty: Advanced

Simulate the World Cup knockout rounds:
a) Use average xG/xGA from the group stage for each team
b) Simulate each knockout match using your Poisson model, resolving draws with a tiebreak rule (for example, a 50/50 penalty shootout)
c) Run 10,000 tournament simulations
d) Calculate the probability of winning the tournament for each team
e) Compare your predicted winner probabilities to the pre-tournament favorites


Part G: Advanced Topics (Questions 33-35)

Exercise 33: Expected Threat (xT) Grid

Difficulty: Advanced

Create a simplified Expected Threat model:
a) Divide the pitch into a 12×8 grid (96 zones)
b) For each zone, calculate the probability of a goal being scored from actions starting there
c) Visualize the xT grid as a heatmap
d) Compare your xT values in attacking zones to typical xG values
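A minimal sketch of parts (a) and (b), assuming you have already labelled each action with whether it led to a goal (what counts as "leading to" a goal, such as within the same possession, is your modeling choice). This is a direct empirical rate per zone, not the iterative move-and-shoot formulation usually associated with full xT.

```python
from collections import defaultdict

COLS, ROWS = 12, 8                      # 12 x 8 grid over the 120 x 80 StatsBomb pitch
CELL_W, CELL_H = 120 / COLS, 80 / ROWS


def zone_of(x, y):
    """Map pitch coordinates to (column, row), clamping points on the far edges."""
    col = min(int(x // CELL_W), COLS - 1)
    row = min(int(y // CELL_H), ROWS - 1)
    return col, row


def goal_probability_grid(actions):
    """actions: iterable of (x, y, led_to_goal). Returns {zone: P(goal)} for visited zones."""
    counts = defaultdict(lambda: [0, 0])    # zone -> [goals, actions]
    for x, y, led_to_goal in actions:
        c = counts[zone_of(x, y)]
        c[0] += int(led_to_goal)
        c[1] += 1
    return {zone: goals / n for zone, (goals, n) in counts.items()}
```

Zones with no recorded actions are absent from the result; fill them with 0 (or leave them blank) when building the 12×8 heatmap.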


Exercise 34: Post-Shot xG Approximation

Difficulty: Advanced

Without actual shot placement data, approximate post-shot xG:
a) For goals, assign high post-shot xG (0.7-0.9), on the assumption that they were difficult to save
b) For saves, estimate based on shot xG and whether it required a "great save"
c) Analyze how post-shot xG differs from pre-shot xG
d) Discuss what data would be needed for a proper PSxG model


Exercise 35: Neural Network xG Model

Difficulty: Expert

Build a neural network xG model:
a) Design an architecture with 2-3 hidden layers
b) Include appropriate regularization (dropout, early stopping)
c) Train on World Cup data with a validation split
d) Compare performance to gradient boosting
e) Discuss trade-offs between neural networks and tree-based models for xG


Submission Guidelines

For programming exercises:

  • Include well-commented code with docstrings
  • Generate all requested visualizations
  • Report numerical results to 2-3 decimal places
  • Include brief interpretations of results

For conceptual questions:

  • Provide clear, structured answers
  • Reference specific examples where appropriate
  • Acknowledge limitations and uncertainty


Grading Rubric

Category                  | Weight | Criteria
Conceptual Understanding  | 20%    | Accurate explanations, addresses nuances
Technical Implementation  | 35%    | Correct code, appropriate methods
Analysis Quality          | 25%    | Meaningful insights, proper interpretation
Visualization             | 10%    | Clear, informative, properly labeled
Communication             | 10%    | Well-structured, concise, professional