Chapter 8: Exercises
Overview
These exercises reinforce concepts from Chapter 8 on Expected Assists (xA) and chance creation analysis. They progress from conceptual understanding through implementation to advanced applications. Solutions are available in code/exercise-solutions.py.
Part A: Conceptual Understanding (Questions 1-6)
Exercise 1: xA Fundamentals
Difficulty: Basic
a) Explain in your own words what Expected Assists (xA) measures and how it differs from traditional assists.
b) A player creates three key passes with the following shot outcomes:
- Shot 1: xG = 0.25, Saved
- Shot 2: xG = 0.40, Goal
- Shot 3: xG = 0.10, Off target
Calculate their total xA and actual assists. Explain any difference.
c) Why might a player have high xA but few actual assists?
Exercise 2: Chance Creation Types
Difficulty: Basic
Rank the following pass types by their typical xA per pass (highest to lowest) and explain your reasoning:
- Cross from wide area
- Through ball behind defense
- Cutback from byline
- Simple square pass in the box
- Long diagonal switch of play
Exercise 3: SCA vs. xA
Difficulty: Basic
Explain the difference between:
a) Expected Assists (xA) and Shot-Creating Actions (SCA)
b) Shot-Creating Actions (SCA) and Goal-Creating Actions (GCA)
c) Primary assists and secondary assists
When would you use each metric?
Exercise 4: Interpreting xA Data
Difficulty: Intermediate
Player A: 15 matches, 8 assists, 8.2 xA
Player B: 15 matches, 4 assists, 9.8 xA
a) Which player created better quality chances?
b) Which player had luckier teammates?
c) Which player would you expect to have more assists in the next 15 matches?
d) What additional context would help evaluate these players?
Exercise 5: Position-Adjusted Analysis
Difficulty: Intermediate
Why is comparing raw xA totals between a central midfielder and a winger problematic? Describe at least three factors that affect xA accumulation differently by position.
Exercise 6: Limitations Identification
Difficulty: Intermediate
For each scenario, explain why xA might provide a misleading picture:
a) A player whose team plays exclusively counter-attacking football
b) A designated set-piece taker
c) A deep-lying playmaker who specializes in switching play
d) A player whose teammates are exceptionally clinical finishers
Part B: Data Analysis (Questions 7-12)
Exercise 7: Calculating xA from Raw Data
Difficulty: Intermediate
Using World Cup 2018 data from StatsBomb:
a) Load all shot events with their key pass information
b) Calculate total xA for each player who recorded at least one key pass
c) Identify the top 5 players by total xA
d) Compare their xA to their actual assists
```python
# Starter code
from statsbombpy import sb
import pandas as pd

# FIFA World Cup 2018 in the StatsBomb open data
matches = sb.matches(competition_id=43, season_id=3)

# Your code here...
```
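One possible starting point for parts (b) and (d) is sketched below. It is not the reference solution. It assumes each event has been converted to a plain dict whose keys follow statsbombpy's flattened column names (`id`, `type`, `player`, `shot_key_pass_id`, `shot_statsbomb_xg`, `shot_outcome`), and that missing values are `None` rather than pandas `NaN`. Check these names against your own data.

```python
from collections import defaultdict


def xa_and_assists(events):
    """Credit each shot's xG to the player who made its key pass.

    Assumes statsbombpy-style keys: passes are found by event id, and
    shots point back to them through shot_key_pass_id.
    """
    passer_of = {ev["id"]: ev["player"] for ev in events if ev.get("type") == "Pass"}
    xa = defaultdict(float)
    assists = defaultdict(int)
    for ev in events:
        if ev.get("type") != "Shot" or ev.get("shot_key_pass_id") is None:
            continue  # only assisted shots contribute to xA
        passer = passer_of.get(ev["shot_key_pass_id"])
        if passer is None:
            continue  # key pass not present in this event list
        xa[passer] += ev.get("shot_statsbomb_xg") or 0.0
        if ev.get("shot_outcome") == "Goal":
            assists[passer] += 1  # a key pass that led to a goal is an assist
    return dict(xa), dict(assists)
```

Because xA is summed from the shot side, a passer gets credit whether or not the shot is converted. Assists only count goals, which is exactly the gap part (d) asks you to discuss.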
Exercise 8: Key Pass Analysis
Difficulty: Intermediate
For a single match (France vs Croatia, the 2018 World Cup final; look up its match_id in the output of sb.matches() rather than hard-coding it):
a) Identify all key passes (passes that led to shots)
b) Calculate what percentage of each team's passes were key passes
c) Analyze the location of key passes (start and end positions)
d) Determine the average xG generated per key pass for each team
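Parts (b) and (d) can be sketched as follows. This assumes the same statsbombpy-style event dicts as above, plus a `team` key. It also treats a key pass as any pass whose id appears in some shot's `shot_key_pass_id`, which is one definition among several.

```python
from collections import defaultdict


def key_pass_summary(events):
    """Per team: key-pass share of all passes and mean xG per key pass."""
    # Map each key pass id to the xG of the shot it set up
    shot_xg_by_pass = {}
    for ev in events:
        if ev.get("type") == "Shot" and ev.get("shot_key_pass_id") is not None:
            shot_xg_by_pass[ev["shot_key_pass_id"]] = ev.get("shot_statsbomb_xg") or 0.0
    passes = defaultdict(int)
    key_xg = defaultdict(list)
    for ev in events:
        if ev.get("type") != "Pass":
            continue
        passes[ev["team"]] += 1
        if ev["id"] in shot_xg_by_pass:
            key_xg[ev["team"]].append(shot_xg_by_pass[ev["id"]])
    summary = {}
    for team, n in passes.items():
        xgs = key_xg[team]
        summary[team] = {
            "passes": n,
            "key_passes": len(xgs),
            "key_pass_pct": 100 * len(xgs) / n,
            "xg_per_key_pass": sum(xgs) / len(xgs) if xgs else 0.0,
        }
    return summary
```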
Exercise 9: Pass Type Breakdown
Difficulty: Intermediate
Using World Cup data:
a) Categorize key passes into: through balls, crosses, cutbacks, and other
b) Calculate the total xA generated by each category
c) Calculate the average xG per key pass for each category
d) Visualize the xA contribution by pass type
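Parts (a) to (c) can be sketched as below, assuming key passes arrive as `(pass_event, shot_xg)` pairs. The flag names `pass_technique == "Through Ball"`, `pass_through_ball`, `pass_cut_back` and `pass_cross` are assumptions modeled on statsbombpy's flattened columns; older and newer StatsBomb data may encode through balls differently. When a pass carries more than one flag, it needs a single category, and the precedence chosen here is a design decision, not a standard.

```python
from collections import defaultdict


def pass_category(ev):
    """Assign exactly one category per key pass (precedence is a design choice)."""
    if ev.get("pass_technique") == "Through Ball" or ev.get("pass_through_ball"):
        return "through ball"
    if ev.get("pass_cut_back"):
        return "cutback"
    if ev.get("pass_cross"):
        return "cross"
    return "other"


def xa_by_category(key_passes):
    """key_passes: iterable of (pass_event, shot_xg) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ev, xg in key_passes:
        cat = pass_category(ev)
        totals[cat] += xg
        counts[cat] += 1
    return {
        cat: {"xa": totals[cat], "n": counts[cat], "xg_per_key_pass": totals[cat] / counts[cat]}
        for cat in totals
    }
```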
Exercise 10: Player Creativity Profile
Difficulty: Intermediate
Choose one player with significant playing time and create a "creativity profile" including:
a) Total passes, key passes, and key pass percentage
b) xA total and xA per 90 minutes
c) Breakdown of key passes by type (crosses, through balls, etc.)
d) Location heatmap of where their key passes end
e) Comparison to positional averages
Exercise 11: Team Comparison
Difficulty: Intermediate
Compare two teams' chance creation patterns:
a) Total xA and xA per match
b) Number of key passes and conversion rate to goals
c) Preferred methods of chance creation (cross-heavy vs. through ball-focused)
d) Discuss what the differences suggest about playing styles
Exercise 12: Time-Based Analysis
Difficulty: Advanced
Analyze how xA accumulation changes during matches:
a) Divide matches into 15-minute periods
b) Calculate average xA created per period across all matches
c) Identify when teams create the most/best chances
d) Discuss possible explanations for the pattern
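Assigning events to 15-minute periods is less trivial than `minute // 15`. StatsBomb minutes are match-cumulative: the second half starts at 45, and first-half stoppage time produces minutes above 45 that still belong to period 1. The sketch below uses the `period` field to keep stoppage time in the last bin of its own half. It assumes penalty-shootout events (period 5) have already been filtered out.

```python
LABELS = ["0-15", "15-30", "30-45+", "45-60", "60-75", "75-90+"]


def period_bin(minute, period):
    """Map a StatsBomb-style (minute, period) pair to a 15-minute bin label."""
    if period >= 3:
        return "extra time"  # periods 3 and 4; shoot-outs assumed removed
    idx = minute // 15
    if period == 1:
        idx = min(idx, 2)  # first-half stoppage time stays in 30-45+
    else:
        idx = min(max(idx, 3), 5)  # second half uses bins 45-60 .. 75-90+
    return LABELS[idx]
```

For part (b), sum each bin's xA and divide by the number of matches, not by the number of events in the bin.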
Part C: Model Building (Questions 13-18)
Exercise 13: Linking Passes to Shots
Difficulty: Intermediate
Write a function that links passes to subsequent shots using temporal proximity:
a) Handle cases where StatsBomb's key_pass_id isn't available
b) Use a configurable time threshold (default 10 seconds)
c) Validate your linkage against StatsBomb's provided relationships
d) Report the accuracy of your linking method
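A minimal version of the temporal linker and its validation might look like this. It assumes chronologically ordered event dicts with `id`, `type`, `team`, `period`, `minute`, `second` and (for validation only) `shot_key_pass_id`. It deliberately ignores possession changes: a pass that is intercepted and then won back can still be linked, and a shot after a solo dribble may be linked to an unrelated earlier pass. For that reason it reports precision as well as recall rather than a single accuracy figure.

```python
def link_shots_to_passes(events, max_gap=10.0):
    """Link each shot to the most recent same-team pass within max_gap seconds."""
    links = {}
    last_pass = {}  # (team, period) -> (time in seconds, pass id)
    for ev in events:
        t = ev["minute"] * 60 + ev["second"]
        key = (ev["team"], ev["period"])
        if ev["type"] == "Pass":
            last_pass[key] = (t, ev["id"])
        elif ev["type"] == "Shot" and key in last_pass:
            t_pass, pass_id = last_pass[key]
            if t - t_pass <= max_gap:
                links[ev["id"]] = pass_id
    return links


def linking_scores(links, events):
    """Precision and recall of our links against StatsBomb's shot_key_pass_id."""
    truth = {
        ev["id"]: ev["shot_key_pass_id"]
        for ev in events
        if ev.get("type") == "Shot" and ev.get("shot_key_pass_id") is not None
    }
    correct = sum(1 for shot, p in links.items() if truth.get(shot) == p)
    precision = correct / len(links) if links else float("nan")
    recall = correct / len(truth) if truth else float("nan")
    return precision, recall
```

Restricting the search to events that share StatsBomb's `possession` number is one straightforward way to tighten the linker.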
Exercise 14: xA from Shot xG
Difficulty: Intermediate
Implement the standard xA calculation:
a) For each shot, identify the assisting pass
b) Credit the passer with the shot's xG value
c) Aggregate by player to get total xA
d) Calculate xA per 90 minutes for each player
Compare your calculated xA to any pre-computed values in the data.
Exercise 15: Feature Engineering for Passes
Difficulty: Intermediate
Create features for predicting whether a pass will lead to a shot:
a) Start and end coordinates
b) Pass distance and direction
c) Whether pass enters the penalty box
d) Pass type (cross, through ball, etc.)
e) Game context (score differential, time remaining)
Which features do you expect to be most predictive?
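The geometric features in parts (a) to (c) can be sketched as below. This assumes StatsBomb's 120 x 80 pitch with the attacking team moving towards x = 120, a penalty area spanning x ≥ 102 and 18 ≤ y ≤ 62, and the goal centre at (120, 40). Start and end points would come from the `location` and `pass_end_location` columns, which is also an assumption about your data frame.

```python
import math

# StatsBomb-style pitch: 120 x 80, attacking towards x = 120 (assumed)
BOX_X, BOX_Y_MIN, BOX_Y_MAX = 102.0, 18.0, 62.0
GOAL = (120.0, 40.0)


def in_box(x, y):
    return x >= BOX_X and BOX_Y_MIN <= y <= BOX_Y_MAX


def pass_features(start, end):
    """Geometric features for one pass given (x, y) start and end points."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    return {
        "start_x": start[0], "start_y": start[1],
        "end_x": end[0], "end_y": end[1],
        "distance": math.hypot(dx, dy),
        "angle": math.atan2(dy, dx),  # radians; 0 = straight towards the goal line
        "progress": dx,  # ground gained towards the goal line, in pitch units
        "end_dist_to_goal": math.hypot(GOAL[0] - end[0], GOAL[1] - end[1]),
        "enters_box": in_box(*end) and not in_box(*start),
    }
```

Pass type and game context (parts d and e) come from other event fields and from running match state, so they are left out here.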
Exercise 16: Assist Probability Model
Difficulty: Advanced
Build a model predicting the probability that a pass results in an assist:
a) Prepare training data from multiple matches
b) Engineer relevant features (from Exercise 15)
c) Train a logistic regression and gradient boosting model
d) Evaluate using log loss and ROC AUC
e) Compare to using shot xG as the xA value
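Both evaluation metrics in part (d) are short enough to write out, which helps make clear what they reward. The sketch below is dependency-free. In practice, scikit-learn's `sklearn.metrics.log_loss` and `sklearn.metrics.roc_auc_score` are the usual choices. The AUC here uses the pairwise (Mann-Whitney) definition, which is quadratic in the number of passes and only suitable for small checks.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def roc_auc(y_true, scores):
    """Probability a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Log loss penalises poorly calibrated probabilities, while AUC only measures ranking. A model can therefore score well on one and poorly on the other.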
Exercise 17: SCA Calculator
Difficulty: Advanced
Implement a Shot-Creating Actions calculator:
a) Identify all shots in the event stream
b) For each shot, find the two preceding actions by the same team
c) Classify each action type (pass, dribble, foul won, etc.)
d) Aggregate SCA by player
e) Separate primary SCA (immediately before shot) from secondary SCA
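A simplified SCA counter is sketched below. It assumes chronologically ordered event dicts with `type`, `team`, `player` and a StatsBomb-style `possession` number. It credits up to two qualifying actions by the shooting team within the same possession. The set of qualifying event types is an assumption: published SCA definitions also count some defensive actions, and count only successful dribbles, which this sketch does not check.

```python
from collections import Counter

SCA_TYPES = {"Pass", "Dribble", "Foul Won", "Shot"}  # assumed qualifying actions


def shot_creating_actions(events, window=2):
    """Return (primary, secondary) Counters of SCA per player."""
    primary, secondary = Counter(), Counter()
    for i, shot in enumerate(events):
        if shot["type"] != "Shot":
            continue
        found = 0
        # Walk backwards from the shot until the possession changes
        for j in range(i - 1, -1, -1):
            ev = events[j]
            if ev["possession"] != shot["possession"] or found == window:
                break
            if ev["team"] == shot["team"] and ev["type"] in SCA_TYPES:
                target = primary if found == 0 else secondary
                target[ev["player"]] += 1
                found += 1
    return primary, secondary
```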
Exercise 18: xA Model Validation
Difficulty: Advanced
Validate your xA calculations:
a) Split data into training and test sets by match
b) Calculate xA for test set players
c) Compare total xA to actual assists at the season (or tournament) level
d) Calculate correlation between xA and assists
e) Assess calibration: group players by xA and check whether each group's average assists match its average xA
Part D: Applications (Questions 19-24)
Exercise 19: Scouting Report
Difficulty: Intermediate
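Create a scouting report for creative midfielders:
a) Filter to players with >900 minutes as midfielders (a World Cup team plays at most seven matches, so even with extra time in every match no player reaches 900 minutes; use a longer competition or lower the threshold)
b) Calculate xA per 90, key passes per 90, and through balls per 90
c) Create a composite "creativity score"
d) Rank the top 10 most creative midfielders
e) Write brief scouting notes on the top 3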
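One simple choice for the composite score in part (c) is an average of per-metric z-scores, sketched below. The metric names (`xa_p90`, `key_passes_p90`, `through_balls_p90`) are hypothetical keys, and equal weighting is an assumption. Because z-scores are relative to the filtered group, the same player can score differently in a different pool.

```python
from statistics import mean, pstdev


def creativity_scores(players, metrics=("xa_p90", "key_passes_p90", "through_balls_p90"), weights=None):
    """Rank players by a weighted average of per-metric z-scores.

    `players` maps name -> dict of per-90 metrics (hypothetical keys).
    """
    weights = weights or {m: 1.0 for m in metrics}
    z = {name: 0.0 for name in players}
    for m in metrics:
        values = [p[m] for p in players.values()]
        mu, sd = mean(values), pstdev(values)
        for name, p in players.items():
            z[name] += weights[m] * ((p[m] - mu) / sd if sd else 0.0)
    total_w = sum(weights[m] for m in metrics)
    return sorted(((name, s / total_w) for name, s in z.items()), key=lambda t: t[1], reverse=True)
```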
Exercise 20: Partnership Analysis
Difficulty: Intermediate
Identify the most effective passer-shooter partnerships:
a) Count connections between each passer-shooter pair
b) Calculate total xG generated by each partnership
c) Calculate goals scored by each partnership
d) Identify partnerships that over/underperformed expectations
e) Visualize the top 10 partnerships
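Parts (a) to (d) reduce to a group-by on (passer, shooter) pairs. The sketch below assumes the assisted shots have already been flattened into `(passer, shooter, xg, is_goal)` tuples, for example with the key-pass linkage from Exercise 7. It uses goals minus xG as a rough over/underperformance measure. With only a handful of shots per pair, that difference is mostly noise.

```python
from collections import defaultdict


def partnerships(assisted_shots):
    """Aggregate (passer, shooter, xg, is_goal) tuples by pair, sorted by total xG."""
    agg = defaultdict(lambda: {"shots": 0, "xg": 0.0, "goals": 0})
    for passer, shooter, xg, is_goal in assisted_shots:
        row = agg[(passer, shooter)]
        row["shots"] += 1
        row["xg"] += xg
        row["goals"] += int(is_goal)
    rows = [dict(pair=pair, **vals, diff=vals["goals"] - vals["xg"]) for pair, vals in agg.items()]
    return sorted(rows, key=lambda r: r["xg"], reverse=True)
```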
Exercise 21: Set Piece xA
Difficulty: Intermediate
Analyze set piece contributions to xA:
a) Separate xA from open play vs. set pieces (corners, free kicks)
b) Identify the top set piece xA creators
c) Calculate what percentage of each player's xA comes from set pieces
d) Discuss implications for player evaluation
Exercise 22: Cross Analysis
Difficulty: Intermediate
Deep dive into crossing effectiveness:
a) Calculate total crosses and cross success rate (share of crosses reaching a teammate) for each team
b) Determine what percentage of crosses lead to shots
c) Calculate average xG when a cross leads to a shot
d) Identify the most effective crossers (xA from crosses)
e) Compare crossing styles: early vs. byline crosses
Exercise 23: Through Ball Analysis
Difficulty: Advanced
Analyze through ball effectiveness:
a) Identify all through ball attempts
b) Calculate success rate, shot rate, and goal rate
c) Determine average xG when through balls lead to shots
d) Identify the best through ball passers
e) Map where successful through balls start and end
Exercise 24: Creativity Under Pressure
Difficulty: Advanced
Analyze how chance creation varies by game state:
a) Calculate xA created while winning, drawing, and losing
b) Determine if players create better chances under pressure (losing)
c) Identify players who excel at creating chances when trailing
d) Discuss tactical implications
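Part (a) needs the score as it stood before each event. The sketch below walks a chronologically ordered event list and labels each event from the acting team's point of view. It assumes statsbombpy-style keys (`team`, `type`, `shot_outcome`) and detects goals only as shots with outcome `"Goal"`. Own goals, which StatsBomb records as separate event types, are ignored here. Penalty-shootout events should be removed first, or their goals would change the state.

```python
def label_game_state(events):
    """Return copies of events with a 'state' key: winning, drawing or losing."""
    score = {}
    labelled = []
    for ev in events:
        team = ev["team"]
        mine = score.get(team, 0)
        theirs = sum(v for t, v in score.items() if t != team)
        state = "winning" if mine > theirs else "losing" if mine < theirs else "drawing"
        labelled.append({**ev, "state": state})  # state *before* this event
        if ev["type"] == "Shot" and ev.get("shot_outcome") == "Goal":
            score[team] = mine + 1
    return labelled
```

Grouping key-pass xG by the `state` of each pass then gives the xA split.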
Part E: Visualization (Questions 25-28)
Exercise 25: xA Timeline
Difficulty: Intermediate
Create a match timeline visualization:
a) Plot cumulative xA over time for both teams
b) Mark when key passes occurred (with xG values)
c) Highlight assists with special markers
d) Add context annotations (goals, substitutions)
Exercise 26: Key Pass Map
Difficulty: Intermediate
Create a key pass visualization:
a) Draw a soccer pitch
b) Plot arrows from pass start to pass end for all key passes
c) Color by xG generated (darker = higher xG)
d) Size by outcome (goal, shot on target, shot off target)
e) Separate by team for comparison
Exercise 27: Creativity Radar
Difficulty: Advanced
Create a radar chart comparing player creativity:
a) Include: xA per 90, key passes per 90, through balls per 90, crosses per 90, progressive passes per 90
b) Normalize each metric to 0-100 scale based on positional percentiles
c) Compare 2-3 players on the same chart
d) Add league average for reference
Exercise 28: xA vs Assists Scatter
Difficulty: Intermediate
Create a scatter plot of xA vs. actual assists:
a) Plot each player with sufficient playing time
b) Add a diagonal reference line (xA = Assists)
c) Color by position or team
d) Label notable over/underperformers
e) Add regression line and R² value
Part F: Advanced Analysis (Questions 29-32)
Exercise 29: xA Stability Analysis
Difficulty: Advanced
Analyze how stable xA is across time:
a) Calculate xA per 90 for each player in the first half of the tournament
b) Calculate xA per 90 for the same players in the second half
c) Compute correlation between first-half and second-half xA per 90
d) Compare stability to actual assists
e) Discuss implications for sample size requirements
Exercise 30: xA Chain Analysis
Difficulty: Advanced
Extend beyond single-pass analysis:
a) Identify the sequence of 3 passes before each shot
b) Calculate "chain xA" crediting all three passers
c) Weight contributions (e.g., 60% final pass, 30% second-to-last, 10% third-to-last)
d) Compare player rankings between standard xA and chain xA
e) Discuss which approach is more useful
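The weighting in part (c) can be sketched as below, using the exercise's example weights of 60/30/10. Each chain is assumed to be `(passers, xg)` with passers ordered from the final pass backwards. One design decision has to be made for chains with fewer than three passes: this sketch renormalises the weights so that each shot's full xG is still distributed. Leaving the missing share unassigned is an equally defensible alternative.

```python
from collections import defaultdict

WEIGHTS = (0.6, 0.3, 0.1)  # final pass first, as in the exercise's example


def chain_xa(chains, weights=WEIGHTS):
    """Split each shot's xG across up to three preceding passers."""
    credit = defaultdict(float)
    for passers, xg in chains:
        used = weights[: len(passers)]
        total = sum(used)
        for player, w in zip(passers, used):
            credit[player] += xg * w / total  # renormalised share
    return dict(credit)
```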
Exercise 31: Opponent-Adjusted xA
Difficulty: Expert
Create opponent-adjusted xA:
a) Calculate each team's xGA (expected goals against) as a measure of defensive quality
b) Adjust each player's xA based on opponent strength
c) Compare raw xA rankings to adjusted xA rankings
d) Identify players who look better/worse after adjustment
e) Discuss the validity of this approach
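One simple multiplicative adjustment for part (b) scales each match's xA by the competition-average xGA divided by that opponent's xGA. Chances created against stingy defences therefore count for more. This is only one possible scheme, sketched under the assumption that `opp_xga_per_match` holds each team's average xG conceded per match. A caveat relevant to part (e): in a short tournament, a team's xGA is itself shaped by the opponents it happened to face.

```python
def opponent_adjusted_xa(match_xa, opp_xga_per_match):
    """Sum of per-match xA scaled by (average xGA / opponent xGA).

    match_xa: list of (opponent, xa) pairs for one player.
    opp_xga_per_match: opponent -> average xG conceded per match.
    """
    league_avg = sum(opp_xga_per_match.values()) / len(opp_xga_per_match)
    return sum(xa * league_avg / opp_xga_per_match[opp] for opp, xa in match_xa)
```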
Exercise 32: xA Projection
Difficulty: Expert
Build a model to project future xA:
a) Use player characteristics (age, position, historical xA) as features
b) Predict next-period xA based on current-period xA
c) Account for regression to the mean
d) Validate projections against actual outcomes
e) Discuss confidence intervals and uncertainty
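A common way to build in regression to the mean (part c) is to shrink each observed rate towards a prior, such as the positional average. The prior is weighted as if it were a fixed number of extra minutes. In the sketch below, `prior_minutes` is a hypothetical tuning constant: the 900-minute default is a placeholder, not an established value, and should be fitted on held-out data as part of (d).

```python
def shrunk_rate(observed_per90, minutes, prior_per90, prior_minutes=900.0):
    """Regress an observed per-90 rate towards a prior rate.

    Equivalent to adding `prior_minutes` of prior-average play to the sample,
    so small samples move strongly towards the prior and large ones barely move.
    """
    return (observed_per90 * minutes + prior_per90 * prior_minutes) / (minutes + prior_minutes)
```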
Submission Guidelines
For programming exercises:
- Include complete, runnable code with comments
- Generate all requested visualizations
- Report numerical results clearly
- Provide brief interpretations
For conceptual questions:
- Write clear, structured answers
- Use specific examples where helpful
- Acknowledge limitations and uncertainties
Grading Rubric
| Category | Weight | Criteria |
|---|---|---|
| Conceptual Understanding | 20% | Accurate explanations, nuanced thinking |
| Data Analysis | 30% | Correct calculations, appropriate methods |
| Implementation | 25% | Working code, efficient approaches |
| Interpretation | 15% | Meaningful insights, proper context |
| Presentation | 10% | Clear visualizations, organized output |