Chapter 8: Exercises

Overview

These exercises reinforce concepts from Chapter 8 on Expected Assists (xA) and chance creation analysis. They progress from conceptual understanding through implementation to advanced applications. Solutions are available in code/exercise-solutions.py.


Part A: Conceptual Understanding (Questions 1-6)

Exercise 1: xA Fundamentals

Difficulty: Basic

a) Explain in your own words what Expected Assists (xA) measures and how it differs from traditional assists.

b) A player creates three key passes with the following shot outcomes:

  • Shot 1: xG = 0.25, Saved
  • Shot 2: xG = 0.40, Goal
  • Shot 3: xG = 0.10, Off target

Calculate their total xA and actual assists. Explain any difference.
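
As a reference for the arithmetic, here is a minimal sketch of the standard bookkeeping, in which each key pass is credited with the xG of the shot it created and only goals count as assists. The values are invented toy data (deliberately not the ones in part b), and the name `summarize_creation` is hypothetical.

```python
# Toy data (hypothetical, not the values from part b): one dict per key pass.
key_passes = [
    {"shot_xg": 0.30, "outcome": "Goal"},
    {"shot_xg": 0.05, "outcome": "Blocked"},
    {"shot_xg": 0.15, "outcome": "Saved"},
    {"shot_xg": 0.50, "outcome": "Off target"},
]


def summarize_creation(passes):
    """Return (total xA, actual assists) for a list of key passes."""
    # xA credit for a key pass is the xG of the shot it led to.
    total_xa = sum(p["shot_xg"] for p in passes)
    # An assist is only recorded when the resulting shot is a goal.
    assists = sum(p["outcome"] == "Goal" for p in passes)
    return round(total_xa, 2), assists


total_xa, assists = summarize_creation(key_passes)
print(f"xA = {total_xa:.2f}, assists = {assists}")
```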

c) Why might a player have high xA but few actual assists?


Exercise 2: Chance Creation Types

Difficulty: Basic

Rank the following pass types by their typical xA per pass (highest to lowest) and explain your reasoning:

  • Cross from wide area
  • Through ball behind defense
  • Cutback from byline
  • Simple square pass in the box
  • Long diagonal switch of play

Exercise 3: SCA vs. xA

Difficulty: Basic

Explain the difference between:

  a) Expected Assists (xA) and Shot-Creating Actions (SCA)
  b) Shot-Creating Actions (SCA) and Goal-Creating Actions (GCA)
  c) Primary assists and secondary assists

When would you use each metric?


Exercise 4: Interpreting xA Data

Difficulty: Intermediate

  • Player A: 15 matches, 8 assists, 8.2 xA
  • Player B: 15 matches, 4 assists, 9.8 xA

  a) Which player created better quality chances?
  b) Which player's teammates finished chances better relative to expectation (through skill or luck)?
  c) Which player would you expect to have more assists in the next 15 matches?
  d) What additional context would help evaluate these players?


Exercise 5: Position-Adjusted Analysis

Difficulty: Intermediate

Why is comparing raw xA totals between a central midfielder and a winger problematic? Describe at least three factors that affect xA accumulation differently by position.


Exercise 6: Limitations Identification

Difficulty: Intermediate

For each scenario, explain why xA might provide a misleading picture:

  a) A player whose team plays exclusively counter-attacking football
  b) A designated set-piece taker
  c) A deep-lying playmaker who specializes in switching play
  d) A player whose teammates are exceptionally clinical finishers


Part B: Data Analysis (Questions 7-12)

Exercise 7: Calculating xA from Raw Data

Difficulty: Intermediate

Using World Cup 2018 data from StatsBomb:

  a) Load all shot events with their key pass information
  b) Calculate total xA for each player who recorded at least one key pass
  c) Identify the top 5 players by total xA
  d) Compare their xA to their actual assists

# Starter code
from statsbombpy import sb
import pandas as pd

matches = sb.matches(competition_id=43, season_id=3)
# Your code here...
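
One possible shape for parts b) to d) is sketched below on a small invented event table, so it runs without downloading anything. The column names (`shot_key_pass_id`, `shot_statsbomb_xg`, `shot_outcome`) are assumed to mirror the flattened columns that statsbombpy returns; check them against your own events frame before reusing the logic.

```python
import pandas as pd

# Toy event table; column names mimic statsbombpy's flattened output (assumed).
events = pd.DataFrame({
    "id": ["p1", "p2", "p3", "s1", "s2", "s3"],
    "type": ["Pass", "Pass", "Pass", "Shot", "Shot", "Shot"],
    "player": ["Ana", "Ana", "Ben", "Cal", "Cal", "Dee"],
    "shot_key_pass_id": [None, None, None, "p1", "p2", "p3"],
    "shot_statsbomb_xg": [None, None, None, 0.20, 0.35, 0.10],
    "shot_outcome": [None, None, None, "Saved", "Goal", "Goal"],
})

# Keep only shots that have a recorded key pass.
shots = events[events["type"] == "Shot"].dropna(subset=["shot_key_pass_id"])
passes = events.loc[events["type"] == "Pass", ["id", "player"]]

# Join each assisted shot to the pass that created it.
linked = shots.merge(passes, left_on="shot_key_pass_id", right_on="id",
                     suffixes=("_shot", "_pass"))

# Credit the passer with the shot's xG; an assist is a key pass whose shot scored.
xa_table = (
    linked.assign(assist=linked["shot_outcome"].eq("Goal"))
          .groupby("player_pass")
          .agg(xA=("shot_statsbomb_xg", "sum"), assists=("assist", "sum"))
          .sort_values("xA", ascending=False)
)
print(xa_table.head(5))
```

With real data, `events` would be the concatenation of the per-match event frames for every match in `matches`.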

Exercise 8: Key Pass Analysis

Difficulty: Intermediate

For a single match (France vs Croatia, match_id=7298; confirm the ID against the output of sb.matches before loading events):

  a) Identify all key passes (passes that led to shots)
  b) Calculate what percentage of each team's passes were key passes
  c) Analyze the location of key passes (start and end positions)
  d) Determine the average xG generated per key pass for each team


Exercise 9: Pass Type Breakdown

Difficulty: Intermediate

Using World Cup data:

  a) Categorize key passes into: through balls, crosses, cutbacks, and other
  b) Calculate the total xA generated by each category
  c) Calculate the average xG per key pass for each category
  d) Visualize the xA contribution by pass type


Exercise 10: Player Creativity Profile

Difficulty: Intermediate

Choose one player with significant playing time and create a "creativity profile" including:

  a) Total passes, key passes, and key pass percentage
  b) xA total and xA per 90 minutes
  c) Breakdown of key passes by type (crosses, through balls, etc.)
  d) Location heatmap of where their key passes end
  e) Comparison to positional averages


Exercise 11: Team Comparison

Difficulty: Intermediate

Compare two teams' chance creation patterns:

  a) Total xA and xA per match
  b) Number of key passes and conversion rate to goals
  c) Preferred methods of chance creation (cross-heavy vs. through ball-focused)
  d) Discuss what the differences suggest about playing styles


Exercise 12: Time-Based Analysis

Difficulty: Advanced

Analyze how xA accumulation changes during matches:

  a) Divide matches into 15-minute periods
  b) Calculate average xA created per period across all matches
  c) Identify when teams create the most/best chances
  d) Discuss possible explanations for the pattern
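
The binning in parts a) and b) can be sketched with `pd.cut`, shown here on invented data. As a simplifying assumption, everything from minute 75 onward (including stoppage time) falls in a single "75+" bucket, and first-half stoppage time is not separated from the start of the second half; the column names are hypothetical.

```python
import pandas as pd

# Toy key passes: match, minute of the pass, and xG of the resulting shot.
key_passes = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2, 2],
    "minute": [3, 44, 88, 17, 61, 93],
    "shot_xg": [0.10, 0.30, 0.25, 0.05, 0.40, 0.20],
})

# Left-closed 15-minute bins; the last bin absorbs stoppage and extra time.
bins = [0, 15, 30, 45, 60, 75, 200]
labels = ["0-15", "15-30", "30-45", "45-60", "60-75", "75+"]
key_passes["period"] = pd.cut(key_passes["minute"], bins=bins,
                              labels=labels, right=False)

# Average xA per period across matches (empty periods count as zero).
n_matches = key_passes["match_id"].nunique()
xa_per_period = (
    key_passes.groupby("period", observed=False)["shot_xg"].sum() / n_matches
)
print(xa_per_period)
```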


Part C: Model Building (Questions 13-18)

Exercise 13: Linking Passes to Shots

Difficulty: Intermediate

Write a function that links passes to subsequent shots using temporal proximity:

  a) Handle cases where StatsBomb's key_pass_id isn't available
  b) Use a configurable time threshold (default 10 seconds)
  c) Validate your linkage against StatsBomb's provided relationships
  d) Report the accuracy of your linking method
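
One way to approach the linkage is a backward as-of join, sketched below with `pandas.merge_asof` on invented data. It assumes a numeric column `t` holding seconds since the start of the match (StatsBomb timestamps restart each period, so that column would need to be built first); `link_passes_to_shots`, `linked_pass_id`, and the toy column layout are hypothetical names.

```python
import pandas as pd


def link_passes_to_shots(passes, shots, max_gap=10.0):
    """Attach to each shot the latest same-team pass at most max_gap seconds earlier."""
    passes = passes.rename(columns={"id": "linked_pass_id"}).sort_values("t")
    shots = shots.sort_values("t")  # merge_asof requires both sides sorted on the key
    return pd.merge_asof(
        shots,
        passes[["match_id", "team", "t", "linked_pass_id"]],
        on="t", by=["match_id", "team"],
        direction="backward", tolerance=max_gap,
    )


# Toy data: t is seconds from kick-off (an assumption of this sketch).
passes = pd.DataFrame({
    "id": ["p1", "p2", "p3"], "match_id": [1, 1, 1],
    "team": ["FRA", "CRO", "FRA"], "t": [100.0, 130.0, 300.0],
})
shots = pd.DataFrame({
    "id": ["s1", "s2", "s3"], "match_id": [1, 1, 1],
    "team": ["FRA", "CRO", "FRA"], "t": [104.0, 150.0, 305.5],
    "shot_key_pass_id": ["p1", "p2", "p3"],  # provider's own linkage, for validation
})

linked = link_passes_to_shots(passes, shots)
# Share of shots where the time-based link agrees with the provided key pass.
accuracy = (linked["linked_pass_id"] == linked["shot_key_pass_id"]).mean()
print(linked[["id", "linked_pass_id", "shot_key_pass_id"]], accuracy)
```

In this toy case the second shot comes 20 seconds after its key pass, so the default threshold misses it; that kind of miss is what part d) asks you to quantify.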


Exercise 14: xA from Shot xG

Difficulty: Intermediate

Implement the standard xA calculation:

  a) For each shot, identify the assisting pass
  b) Credit the passer with the shot's xG value
  c) Aggregate by player to get total xA
  d) Calculate xA per 90 minutes for each player

Compare your calculated xA to any pre-computed values in the data.
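
For part d), the per-90 conversion is a simple rescaling by minutes played. A minimal sketch follows; the helper name `per_90` is hypothetical and the numbers are invented.

```python
def per_90(total, minutes):
    """Scale a cumulative total (e.g. xA) to a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90 / minutes


# Toy example: 2.4 xA accumulated over 540 minutes (six full matches).
xa_p90 = per_90(2.4, 540)
print(f"{xa_p90:.2f} xA per 90")
```

Rates computed from very few minutes are noisy, which is why later exercises apply a minimum-minutes filter before ranking players.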


Exercise 15: Feature Engineering for Passes

Difficulty: Intermediate

Create features for predicting whether a pass will lead to a shot:

  a) Start and end coordinates
  b) Pass distance and direction
  c) Whether pass enters the penalty box
  d) Pass type (cross, through ball, etc.)
  e) Game context (score differential, time remaining)

Which features do you expect to be most predictive?
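
The geometric features in a) to c) can be derived from the start and end coordinates alone. The sketch below assumes StatsBomb's 120 x 80 pitch with the attacking penalty area spanning x from 102 to 120 and y from 18 to 62; the function and key names are hypothetical.

```python
import math

# Attacking penalty area on a 120 x 80 StatsBomb-style pitch (assumed layout).
BOX_X_MIN, BOX_Y_MIN, BOX_Y_MAX = 102.0, 18.0, 62.0


def in_box(x, y):
    """True if the point lies inside the attacking penalty area."""
    return x >= BOX_X_MIN and BOX_Y_MIN <= y <= BOX_Y_MAX


def pass_features(start, end):
    """Build geometric features from (x, y) start and end locations."""
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0
    return {
        "start_x": x0, "start_y": y0, "end_x": x1, "end_y": y1,
        "distance": math.hypot(dx, dy),   # straight-line pass length
        "angle": math.atan2(dy, dx),      # radians; 0 means straight toward goal
        # Only passes that start outside the box and end inside it "enter" it.
        "enters_box": in_box(x1, y1) and not in_box(x0, y0),
    }


features = pass_features((90.0, 40.0), (106.0, 52.0))
print(features)
```

Categorical features such as pass type and game context would be added alongside these and encoded before model fitting.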


Exercise 16: Assist Probability Model

Difficulty: Advanced

Build a model predicting the probability that a pass results in an assist:

  a) Prepare training data from multiple matches
  b) Engineer relevant features (from Exercise 15)
  c) Train a logistic regression model and a gradient boosting model
  d) Evaluate both using log loss and ROC AUC
  e) Compare the model's output to using shot xG as the xA value


Exercise 17: SCA Calculator

Difficulty: Advanced

Implement a Shot-Creating Actions calculator:

  a) Identify all shots in the event stream
  b) For each shot, find the two preceding actions by the same team
  c) Classify each action type (pass, dribble, foul won, etc.)
  d) Aggregate SCA by player
  e) Separate primary SCA (immediately before shot) from secondary SCA
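
The steps above can be sketched as a backward scan over a time-ordered event list. This is a simplified reading of SCA: the set of qualifying action types is an assumption, possession changes between the actions and the shot are ignored, and the events are invented toy data.

```python
from collections import Counter

# Action types treated as shot-creating in this sketch (an assumption).
CREATING_TYPES = {"Pass", "Dribble", "Foul Won", "Shot"}


def shot_creating_actions(events, lookback=2):
    """Return (primary, secondary) SCA counts per player.

    events must be ordered in time; each is a dict with team, player, type.
    """
    primary, secondary = Counter(), Counter()
    for i, ev in enumerate(events):
        if ev["type"] != "Shot":
            continue
        found = 0
        # Walk backwards from the shot, keeping same-team creating actions only.
        for prev in reversed(events[:i]):
            if prev["team"] != ev["team"] or prev["type"] not in CREATING_TYPES:
                continue
            found += 1
            target = primary if found == 1 else secondary
            target[prev["player"]] += 1
            if found == lookback:
                break
    return primary, secondary


events = [
    {"team": "FRA", "player": "Ana", "type": "Pass"},
    {"team": "FRA", "player": "Ben", "type": "Dribble"},
    {"team": "CRO", "player": "Zed", "type": "Pressure"},
    {"team": "FRA", "player": "Ben", "type": "Pass"},
    {"team": "FRA", "player": "Cal", "type": "Shot"},
    {"team": "CRO", "player": "Zed", "type": "Pass"},
    {"team": "CRO", "player": "Yan", "type": "Shot"},
]
primary, secondary = shot_creating_actions(events)
print(primary, secondary)
```

A fuller version would stop the scan at a change of possession or after a time limit, so that actions from an earlier attack are not credited.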


Exercise 18: xA Model Validation

Difficulty: Advanced

Validate your xA calculations:

  a) Split data into training and test sets by match
  b) Calculate xA for test set players
  c) Compare total xA to actual assists at the season level
  d) Calculate correlation between xA and assists
  e) Assess calibration: do high xA players get more assists?


Part D: Applications (Questions 19-24)

Exercise 19: Scouting Report

Difficulty: Intermediate

Create a scouting report for creative midfielders:

  a) Filter to players with >900 minutes as midfielders (lower this threshold for a single tournament such as the World Cup, where seven matches cannot add up to 900 minutes)
  b) Calculate xA per 90, key passes per 90, and through balls per 90
  c) Create a composite "creativity score"
  d) Rank the top 10 most creative midfielders
  e) Write brief scouting notes on the top 3
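
One simple option for the composite score in part c) is an equal-weight average of z-scores, sketched below on invented data. Equal weighting is an assumption, not a recommendation, and the column names are hypothetical.

```python
import pandas as pd

# Toy per-90 metrics for four midfielders (invented values).
mids = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "minutes": [1200, 1000, 950, 400],
    "xa_p90": [0.30, 0.20, 0.10, 0.60],
    "key_passes_p90": [2.5, 2.0, 1.5, 4.0],
    "through_balls_p90": [0.40, 0.20, 0.00, 0.90],
}).set_index("player")

metrics = ["xa_p90", "key_passes_p90", "through_balls_p90"]

# Apply the minutes filter first so small samples do not distort the scaling.
eligible = mids[mids["minutes"] > 900]

# Standardize each metric, then average the z-scores with equal weights.
z = (eligible[metrics] - eligible[metrics].mean()) / eligible[metrics].std(ddof=0)
ranking = z.mean(axis=1).sort_values(ascending=False).rename("creativity_score")
print(ranking.head(10))
```

Player D posts the highest raw rates but is excluded by the minutes filter, which illustrates why the filter comes before the ranking.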


Exercise 20: Partnership Analysis

Difficulty: Intermediate

Identify the most effective passer-shooter partnerships:

  a) Count connections between each passer-shooter pair
  b) Calculate total xG generated by each partnership
  c) Calculate goals scored by each partnership
  d) Identify partnerships that over/underperformed expectations
  e) Visualize the top 10 partnerships


Exercise 21: Set Piece xA

Difficulty: Intermediate

Analyze set piece contributions to xA:

  a) Separate xA from open play vs. set pieces (corners, free kicks)
  b) Identify the top set piece xA creators
  c) Calculate what percentage of each player's xA comes from set pieces
  d) Discuss implications for player evaluation


Exercise 22: Cross Analysis

Difficulty: Intermediate

Deep dive into crossing effectiveness:

  a) Calculate total crosses and cross success rate for each team
  b) Determine what percentage of crosses lead to shots
  c) Calculate average xG when a cross leads to a shot
  d) Identify the most effective crossers (xA from crosses)
  e) Compare crossing styles: early vs. byline crosses


Exercise 23: Through Ball Analysis

Difficulty: Advanced

Analyze through ball effectiveness:

  a) Identify all through ball attempts
  b) Calculate success rate, shot rate, and goal rate
  c) Determine average xG when through balls lead to shots
  d) Identify the best through ball passers
  e) Map where successful through balls start and end


Exercise 24: Creativity Under Pressure

Difficulty: Advanced

Analyze how chance creation varies by game state:

  a) Calculate xA created while winning, drawing, and losing
  b) Determine if players create better chances under pressure (losing)
  c) Identify players who excel at creating chances when trailing
  d) Discuss tactical implications


Part E: Visualization (Questions 25-28)

Exercise 25: xA Timeline

Difficulty: Intermediate

Create a match timeline visualization:

  a) Plot cumulative xA over time for both teams
  b) Mark when key passes occurred (with xG values)
  c) Highlight assists with special markers
  d) Add context annotations (goals, substitutions)


Exercise 26: Key Pass Map

Difficulty: Intermediate

Create a key pass visualization:

  a) Draw a soccer pitch
  b) Plot arrows from pass start to pass end for all key passes
  c) Color by xG generated (darker = higher xG)
  d) Size by outcome (goal, shot on target, shot off target)
  e) Separate by team for comparison


Exercise 27: Creativity Radar

Difficulty: Advanced

Create a radar chart comparing player creativity:

  a) Include: xA per 90, key passes per 90, through balls per 90, crosses per 90, progressive passes per 90
  b) Normalize each metric to 0-100 scale based on positional percentiles
  c) Compare 2-3 players on the same chart
  d) Add league average for reference
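
The normalization in part b) can be sketched with a grouped percentile rank, shown here on invented data with hypothetical column names. Note that `rank(pct=True)` assigns the lowest player in a group 100/n rather than 0, so the scale is only approximately 0-100 for small groups.

```python
import pandas as pd

# Toy per-90 metrics (invented values) for two positional groups.
players = pd.DataFrame({
    "player": ["A", "B", "C", "D", "E", "F"],
    "position": ["MF", "MF", "MF", "MF", "FW", "FW"],
    "xa_p90": [0.10, 0.20, 0.30, 0.40, 0.15, 0.25],
    "key_passes_p90": [1.0, 2.5, 2.0, 3.0, 1.2, 0.8],
})
metrics = ["xa_p90", "key_passes_p90"]

# Percentile rank of each metric within the player's own position group.
pct = players.groupby("position")[metrics].rank(pct=True) * 100

# Re-attach identifiers; these percentile columns are what the radar would plot.
radar = players[["player", "position"]].join(pct)
print(radar)
```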


Exercise 28: xA vs Assists Scatter

Difficulty: Intermediate

Create a scatter plot of xA vs. actual assists:

  a) Plot each player with sufficient playing time
  b) Add a diagonal reference line (xA = Assists)
  c) Color by position or team
  d) Label notable over/underperformers
  e) Add regression line and R² value


Part F: Advanced Analysis (Questions 29-32)

Exercise 29: xA Stability Analysis

Difficulty: Advanced

Analyze how stable xA is across time:

  a) Calculate xA per 90 for each player in the first half of the tournament
  b) Calculate xA per 90 for the same players in the second half
  c) Compute correlation between first-half and second-half xA per 90
  d) Compare stability to actual assists
  e) Discuss implications for sample size requirements
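
Parts c) and d) reduce to two split-half correlations. The sketch below uses `Series.corr` (Pearson by default) on invented numbers that only illustrate the mechanics; they are not results from any dataset, and the column names are hypothetical.

```python
import pandas as pd

# Invented split-half rates for four players (illustration only, not real results).
halves = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "xa_p90_first": [0.10, 0.20, 0.30, 0.40],
    "xa_p90_second": [0.15, 0.18, 0.35, 0.38],
    "assists_p90_first": [0.0, 0.5, 0.0, 0.3],
    "assists_p90_second": [0.4, 0.0, 0.2, 0.3],
})

# Pearson correlation between the two halves, once per metric.
r_xa = halves["xa_p90_first"].corr(halves["xa_p90_second"])
r_assists = halves["assists_p90_first"].corr(halves["assists_p90_second"])
print(f"xA split-half r = {r_xa:.2f}, assists split-half r = {r_assists:.2f}")
```

With only a handful of players or minutes per half, either correlation can swing widely, which is the point of part e).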


Exercise 30: xA Chain Analysis

Difficulty: Advanced

Extend beyond single-pass analysis:

  a) Identify the sequence of 3 passes before each shot
  b) Calculate "chain xA" crediting all three passers
  c) Weight contributions (e.g., 60% final pass, 30% second-to-last, 10% third-to-last)
  d) Compare player rankings between standard xA and chain xA
  e) Discuss which approach is more useful
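
The weighting in part c) can be sketched as follows, using the example 60/30/10 split. When a chain has fewer than three passes, this sketch renormalizes the available weights so the full shot xG is still distributed; that is a design choice assumed here, not something the exercise prescribes. The data and names are invented.

```python
from collections import defaultdict

# Weights for the final, second-to-last, and third-to-last pass (from part c).
WEIGHTS = (0.6, 0.3, 0.1)


def chain_xa(chains):
    """Distribute each shot's xG over up to three preceding passers.

    chains: list of (shot_xg, passers), passers ordered from the final pass backwards.
    """
    credit = defaultdict(float)
    for shot_xg, passers in chains:
        used = WEIGHTS[:len(passers)]
        scale = sum(used)  # renormalize when the chain is shorter than three passes
        for weight, player in zip(used, passers):
            credit[player] += shot_xg * weight / scale
    return dict(credit)


# Toy chains: a three-pass move ending in a 0.50 xG shot, and a one-pass move.
chains = [
    (0.50, ["Ana", "Ben", "Cal"]),
    (0.20, ["Ben"]),
]
chain_credit = chain_xa(chains)
print(chain_credit)
```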


Exercise 31: Opponent-Adjusted xA

Difficulty: Expert

Create opponent-adjusted xA:

  a) Calculate each team's xGA (expected goals against) as a measure of defensive quality
  b) Adjust each player's xA based on opponent strength
  c) Compare raw xA rankings to adjusted xA rankings
  d) Identify players who look better/worse after adjustment
  e) Discuss the validity of this approach
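
One of several possible adjustments for part b) is a ratio scaling: multiply the xA created against each opponent by the league-average xGA divided by that opponent's xGA, so chances created against leaky defenses count for less. The sketch below assumes this particular scheme; the function name and all numbers are invented.

```python
def opponent_adjusted_xa(contributions, opponent_xga, league_avg_xga):
    """Scale xA by opponent defensive strength.

    contributions: list of (opponent, xA) pairs for one player.
    opponent_xga: xGA per match for each opponent (higher = weaker defense).
    """
    return sum(
        xa * league_avg_xga / opponent_xga[opponent]
        for opponent, xa in contributions
    )


# Toy inputs: team A defends well (0.8 xGA per match), team B poorly (1.6).
opponent_xga = {"A": 0.8, "B": 1.6}
contributions = [("A", 0.4), ("B", 0.4)]
adjusted = opponent_adjusted_xa(contributions, opponent_xga, league_avg_xga=1.2)
print(f"raw xA = 0.80, adjusted xA = {adjusted:.2f}")
```

One caveat worth raising in part e): a team's xGA includes the chances conceded to the player being adjusted, which matters in a short tournament.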


Exercise 32: xA Projection

Difficulty: Expert

Build a model to project future xA:

  a) Use player characteristics (age, position, historical xA) as features
  b) Predict next-period xA based on current-period xA
  c) Account for regression to the mean
  d) Validate projections against actual outcomes
  e) Discuss confidence intervals and uncertainty
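
A simple baseline for part c) is to shrink each player's observed rate toward a prior (for example the positional average), with less shrinkage as minutes grow. In the sketch below, `prior_minutes` is a hypothetical tuning constant that would have to be estimated from data, and the numbers are invented.

```python
def shrunk_rate(observed_p90, minutes, prior_p90, prior_minutes=900):
    """Blend an observed per-90 rate with a prior, weighted by minutes played.

    prior_minutes acts like 'phantom' minutes at the prior rate (assumed value).
    """
    if minutes < 0 or prior_minutes <= 0:
        raise ValueError("minutes must be >= 0 and prior_minutes > 0")
    total_weight = minutes + prior_minutes
    return (minutes * observed_p90 + prior_minutes * prior_p90) / total_weight


# Toy example: 0.40 xA per 90 over 900 minutes, positional average 0.20.
projection = shrunk_rate(0.40, 900, prior_p90=0.20)
print(f"projected xA per 90 = {projection:.2f}")
```

A player with no minutes is projected at the prior, and a player with many minutes stays close to the observed rate; a fitted model in part b) should at least beat this baseline.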


Submission Guidelines

For programming exercises:

  • Include complete, runnable code with comments
  • Generate all requested visualizations
  • Report numerical results clearly
  • Provide brief interpretations

For conceptual questions:

  • Write clear, structured answers
  • Use specific examples where helpful
  • Acknowledge limitations and uncertainties


Grading Rubric

Category                  Weight  Criteria
Conceptual Understanding  20%     Accurate explanations, nuanced thinking
Data Analysis             30%     Correct calculations, appropriate methods
Implementation            25%     Working code, efficient approaches
Interpretation            15%     Meaningful insights, proper context
Presentation              10%     Clear visualizations, organized output