Case Study 2: Analyzing Team Chance Creation Patterns
Overview
This case study examines how different teams create chances using Expected Assists (xA) and Shot-Creating Actions. Throughout, the xA of a key pass is taken to be the StatsBomb xG of the shot it sets up. By comparing the chance creation methods of World Cup 2018 participants, we identify tactical patterns and what makes teams effective at generating scoring opportunities.
Learning Objectives:
- Compare chance creation patterns across different teams
- Identify tactical signatures in how teams create chances
- Analyze the relationship between creation method and effectiveness
- Understand set piece vs. open play contributions
The Question
The sporting director of your club wants to understand:
- How do the most successful World Cup teams create their chances?
- What's the balance between different creation methods (crosses, through balls, set pieces)?
- Which creation methods are most efficient at generating high-quality chances?
- What can we learn about tactical approaches from chance creation data?
Part 1: Team-Level xA Analysis
1.1 Calculating Team Chance Creation
import pandas as pd
import numpy as np
from statsbombpy import sb
import matplotlib.pyplot as plt
from collections import defaultdict


def load_team_chance_creation():
    """
    Load and calculate team-level chance creation metrics.
    """
    # World Cup 2018 in the StatsBomb open data
    matches = sb.matches(competition_id=43, season_id=3)

    team_data = defaultdict(lambda: {
        'matches': 0,
        'total_xa': 0,
        'key_passes': 0,
        'crosses': 0,
        'through_balls': 0,
        'cutbacks': 0,
        'set_pieces': 0,
        'open_play': 0,
        'goals': 0,
        'shots': 0
    })
    all_xa_records = []

    def flag_set(value):
        # Flag columns hold True or NaN; NaN is truthy, so test explicitly
        return pd.notna(value) and bool(value)

    for _, match_row in matches.iterrows():
        match_id = match_row['match_id']
        events = sb.events(match_id=match_id)
        # Period 5 is the penalty shootout; exclude it from shots and goals
        shots = events[(events['type'] == 'Shot') & (events['period'] < 5)]
        passes = events[events['type'] == 'Pass']

        # Process each team in the match
        for team in [match_row['home_team'], match_row['away_team']]:
            team_data[team]['matches'] += 1

            # Get team shots
            team_shots = shots[shots['team'] == team]
            team_data[team]['shots'] += len(team_shots)
            team_data[team]['goals'] += (team_shots['shot_outcome'] == 'Goal').sum()

            # Analyze key passes
            for _, shot in team_shots.iterrows():
                kp_id = shot.get('shot_key_pass_id')
                shot_xg = shot.get('shot_statsbomb_xg', 0)
                if pd.isna(kp_id):
                    continue
                kp_rows = passes[passes['id'] == kp_id]
                if len(kp_rows) == 0:
                    continue
                kp = kp_rows.iloc[0]

                team_data[team]['key_passes'] += 1
                team_data[team]['total_xa'] += shot_xg

                # Classify pass type
                if flag_set(kp.get('pass_cross')):
                    team_data[team]['crosses'] += 1
                    pass_type = 'cross'
                elif flag_set(kp.get('pass_through_ball')):
                    team_data[team]['through_balls'] += 1
                    pass_type = 'through_ball'
                elif flag_set(kp.get('pass_cut_back')):
                    team_data[team]['cutbacks'] += 1
                    pass_type = 'cutback'
                else:
                    pass_type = 'regular'

                # Set piece vs open play, judged by how the key pass was taken.
                # (Shots taken directly from a free kick, corner, or penalty
                # have no key pass, so the shot's own type cannot be used.)
                if kp.get('pass_type') in ['Free Kick', 'Corner']:
                    team_data[team]['set_pieces'] += 1
                    is_set_piece = True
                else:
                    team_data[team]['open_play'] += 1
                    is_set_piece = False

                all_xa_records.append({
                    'team': team,
                    'match_id': match_id,
                    'pass_type': pass_type,
                    'shot_xg': shot_xg,
                    'is_set_piece': is_set_piece,
                    'goal': shot['shot_outcome'] == 'Goal'
                })

    # Convert to DataFrame
    team_df = pd.DataFrame.from_dict(team_data, orient='index')
    team_df = team_df.reset_index().rename(columns={'index': 'team'})
    xa_df = pd.DataFrame(all_xa_records)
    return team_df, xa_df


team_stats, xa_records = load_team_chance_creation()
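Flag columns such as pass_cross are assumed here to hold either True or a missing value (NaN) rather than True/False. Because NaN is truthy in Python, a bare truthiness check would treat a missing flag as set. The sketch below illustrates the pitfall; the helper name `is_flag_set` is illustrative and not part of statsbombpy.

```python
import math


def is_flag_set(value):
    """Return True only for a genuinely set flag; None and NaN count as unset."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return bool(value)


missing = float("nan")

# A naive truthiness check treats the missing flag as set
naive_result = bool(missing)

# The explicit check does not
safe_result = is_flag_set(missing)

print(naive_result, safe_result, is_flag_set(True), is_flag_set(None))
```

Any check of the form `if row.get('some_flag'):` on event data is worth auditing for this behaviour.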
1.2 Team xA Rankings
def analyze_team_xa_rankings(team_df):
    """
    Rank teams by various chance creation metrics.
    """
    df = team_df.copy()

    # Calculate per-match metrics
    df['xa_per_match'] = df['total_xa'] / df['matches']
    df['key_passes_per_match'] = df['key_passes'] / df['matches']
    df['goals_per_match'] = df['goals'] / df['matches']

    # Calculate efficiency
    df['xa_per_key_pass'] = df['total_xa'] / df['key_passes'].replace(0, np.nan)
    df['conversion'] = df['goals'] / df['total_xa'].replace(0, np.nan)

    print("=" * 70)
    print("TEAM CHANCE CREATION RANKINGS")
    print("=" * 70)

    # By total xA created
    print("\nTop 10 by Total xA Created:")
    print("-" * 50)
    top_xa = df.nlargest(10, 'total_xa')[['team', 'matches', 'total_xa',
                                          'key_passes', 'goals']]
    top_xa['total_xa'] = top_xa['total_xa'].round(2)
    print(top_xa.to_string(index=False))

    # By xA per match
    print("\n\nTop 10 by xA per Match:")
    print("-" * 50)
    # Filter to teams with multiple matches
    multi_match = df[df['matches'] >= 3]
    top_xa_pm = multi_match.nlargest(10, 'xa_per_match')[
        ['team', 'matches', 'xa_per_match', 'goals_per_match']
    ]
    top_xa_pm['xa_per_match'] = top_xa_pm['xa_per_match'].round(2)
    top_xa_pm['goals_per_match'] = top_xa_pm['goals_per_match'].round(2)
    print(top_xa_pm.to_string(index=False))

    # By efficiency (xA per key pass)
    print("\n\nTop 10 by xA Efficiency (per Key Pass):")
    print("-" * 50)
    efficient = df[df['key_passes'] >= 20].nlargest(10, 'xa_per_key_pass')[
        ['team', 'key_passes', 'xa_per_key_pass']
    ]
    efficient['xa_per_key_pass'] = efficient['xa_per_key_pass'].round(3)
    print(efficient.to_string(index=False))

    return df


team_analysis = analyze_team_xa_rankings(team_stats)
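The per-match and per-key-pass ratios can be checked on a small table before running them on tournament data. This is a sketch on made-up numbers (the `toy` frame and its values are invented for illustration); it shows how replacing zero key passes with NaN avoids a division by zero.

```python
import numpy as np
import pandas as pd

# Invented figures for three hypothetical teams
toy = pd.DataFrame({
    "team": ["A", "B", "C"],
    "matches": [7, 3, 3],
    "total_xa": [8.4, 3.0, 0.0],
    "key_passes": [70, 30, 0],
})

# Volume: xA created per match played
toy["xa_per_match"] = toy["total_xa"] / toy["matches"]

# Efficiency: xA per key pass, with zero denominators turned into NaN
toy["xa_per_key_pass"] = toy["total_xa"] / toy["key_passes"].replace(0, np.nan)

print(toy.round(3).to_string(index=False))
```

Team C ends up with a missing efficiency value rather than an error or an infinite value, so it simply carries no efficiency figure into later rankings.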
Part 2: Chance Creation Methods
2.1 Cross vs. Through Ball Analysis
def analyze_creation_methods(team_df, xa_df):
    """
    Compare different chance creation methods.
    """
    print("\n" + "=" * 70)
    print("CHANCE CREATION METHODS ANALYSIS")
    print("=" * 70)

    # Overall method breakdown
    method_xa = xa_df.groupby('pass_type').agg({
        'shot_xg': ['sum', 'count', 'mean'],
        'goal': 'sum'
    })
    method_xa.columns = ['total_xa', 'count', 'xa_per_pass', 'goals']
    method_xa = method_xa.sort_values('total_xa', ascending=False)

    print("\nOverall xA by Pass Type:")
    print("-" * 60)
    for pass_type, row in method_xa.iterrows():
        pct = row['total_xa'] / method_xa['total_xa'].sum() * 100
        print(f" {pass_type:15} | {row['count']:4.0f} passes | "
              f"{row['total_xa']:.2f} xA ({pct:.1f}%) | "
              f"{row['xa_per_pass']:.3f} xA/pass")

    # Team-level cross reliance
    print("\n\nTeam Cross Reliance (% of key passes that are crosses):")
    print("-" * 50)
    df = team_df.copy()
    df['cross_pct'] = df['crosses'] / df['key_passes'] * 100
    df = df.sort_values('cross_pct', ascending=False)
    for _, row in df.head(10).iterrows():
        bar = "█" * int(row['cross_pct'] / 5)
        print(f" {row['team']:20} | {row['cross_pct']:7.1f}% | {bar}")

    # Team through ball usage
    print("\n\nTeam Through Ball Usage:")
    print("-" * 50)
    df['tb_pct'] = df['through_balls'] / df['key_passes'] * 100
    df = df.sort_values('tb_pct', ascending=False)
    for _, row in df.head(10).iterrows():
        bar = "█" * int(row['tb_pct'] / 2)
        print(f" {row['team']:20} | {row['tb_pct']:7.1f}% | {bar}")

    return method_xa


method_analysis = analyze_creation_methods(team_stats, xa_records)
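The method breakdown is a grouped aggregation over key-pass records. The sketch below reproduces it on invented data using pandas named aggregation, which labels the output columns directly instead of renaming them afterwards; `toy_xa` and the `xa_share` column are illustrative names.

```python
import pandas as pd

# Invented key-pass records: pass type, xG of the resulting shot, goal or not
toy_xa = pd.DataFrame({
    "pass_type": ["cross", "cross", "through_ball", "regular", "regular"],
    "shot_xg": [0.05, 0.15, 0.30, 0.10, 0.20],
    "goal": [False, True, True, False, False],
})

# One row per pass type, with explicitly named output columns
summary = toy_xa.groupby("pass_type").agg(
    total_xa=("shot_xg", "sum"),
    count=("shot_xg", "count"),
    xa_per_pass=("shot_xg", "mean"),
    goals=("goal", "sum"),
)

# Share of all xA contributed by each pass type
summary["xa_share"] = summary["total_xa"] / summary["total_xa"].sum()

print(summary.round(3))
```

With samples this small the per-pass averages are dominated by individual shots, which is worth keeping in mind when reading the through-ball row in a single tournament.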
2.2 Set Piece vs. Open Play
def analyze_set_pieces_vs_open_play(team_df, xa_df):
    """
    Compare set piece and open play chance creation.
    """
    print("\n" + "=" * 70)
    print("SET PIECE VS OPEN PLAY ANALYSIS")
    print("=" * 70)

    # Overall breakdown
    set_piece_xa = xa_df[xa_df['is_set_piece']]['shot_xg'].sum()
    open_play_xa = xa_df[~xa_df['is_set_piece']]['shot_xg'].sum()
    total_xa = set_piece_xa + open_play_xa
    print("\nOverall xA Distribution:")
    print(f" Set Pieces: {set_piece_xa:.2f} xA ({set_piece_xa/total_xa:.1%})")
    print(f" Open Play: {open_play_xa:.2f} xA ({open_play_xa/total_xa:.1%})")

    # xA per chance
    sp_count = xa_df['is_set_piece'].sum()
    op_count = (~xa_df['is_set_piece']).sum()
    print("\nxA per Key Pass:")
    print(f" Set Pieces: {set_piece_xa/sp_count:.3f}")
    print(f" Open Play: {open_play_xa/op_count:.3f}")

    # Team-level set piece reliance
    print("\n\nTeam Set Piece Reliance (xA from set pieces):")
    print("-" * 50)
    # Label each record by phase, then pivot; reindex guarantees both
    # columns exist even if one phase never occurs in the data
    phase = xa_df['is_set_piece'].map({True: 'set_piece_xa', False: 'open_play_xa'})
    team_sp = (
        xa_df.groupby(['team', phase])['shot_xg'].sum()
        .unstack(fill_value=0)
        .reindex(columns=['open_play_xa', 'set_piece_xa'], fill_value=0)
    )
    team_sp['total_xa'] = team_sp['open_play_xa'] + team_sp['set_piece_xa']
    team_sp['sp_pct'] = team_sp['set_piece_xa'] / team_sp['total_xa'] * 100
    team_sp = team_sp.sort_values('sp_pct', ascending=False)
    for team, row in team_sp.head(10).iterrows():
        bar = "█" * int(row['sp_pct'] / 3)
        print(f" {team:20} | {row['sp_pct']:7.1f}% | "
              f"SP: {row['set_piece_xa']:.2f}, OP: {row['open_play_xa']:.2f} | {bar}")

    return team_sp


set_piece_analysis = analyze_set_pieces_vs_open_play(team_stats, xa_records)
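When splitting xA by phase per team, it helps to guarantee that both the open-play and set-piece columns exist even if one phase never occurs in the data. The following is a minimal sketch on invented records in which no key pass comes from a set piece; `toy_xa` and the `phase` column are illustrative names.

```python
import pandas as pd

# Invented records with no set-piece key passes at all
toy_xa = pd.DataFrame({
    "team": ["A", "A", "B"],
    "is_set_piece": [False, False, False],
    "shot_xg": [0.10, 0.20, 0.05],
})

# Turn the boolean into a readable phase label before pivoting
toy_xa["phase"] = toy_xa["is_set_piece"].map(
    {True: "set_piece_xa", False: "open_play_xa"}
)

split = (
    toy_xa.groupby(["team", "phase"])["shot_xg"].sum()
    .unstack(fill_value=0)
    # Force both columns to be present, filling a missing phase with 0
    .reindex(columns=["open_play_xa", "set_piece_xa"], fill_value=0)
)

print(split)
```

Without the `reindex` step the pivoted table here would have a single column, and any code that assumes two columns would fail.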
Part 3: Tactical Signatures
3.1 Identifying Tactical Patterns
def identify_tactical_signatures(team_df, xa_df):
    """
    Identify distinct tactical approaches to chance creation.
    """
    print("\n" + "=" * 70)
    print("TACTICAL SIGNATURES")
    print("=" * 70)

    df = team_df[team_df['matches'] >= 3].copy()

    # Calculate percentages
    df['cross_pct'] = df['crosses'] / df['key_passes']
    df['tb_pct'] = df['through_balls'] / df['key_passes']
    df['sp_pct'] = df['set_pieces'] / df['key_passes']
    # Computed here so the function also works on the raw team_stats frame,
    # which does not carry this column
    df['xa_per_key_pass'] = df['total_xa'] / df['key_passes'].replace(0, np.nan)

    # Classify tactical style (the first matching rule wins)
    def classify_style(row):
        if row['cross_pct'] > 0.35:
            return 'Cross-Heavy'
        elif row['tb_pct'] > 0.15:
            return 'Penetrative'
        elif row['sp_pct'] > 0.25:
            return 'Set-Piece Dependent'
        else:
            return 'Balanced'

    df['tactical_style'] = df.apply(classify_style, axis=1)

    print("\nTactical Style Classification:")
    print("-" * 60)
    for style in ['Cross-Heavy', 'Penetrative', 'Set-Piece Dependent', 'Balanced']:
        teams = df[df['tactical_style'] == style]
        print(f"\n{style} ({len(teams)} teams):")
        for _, row in teams.iterrows():
            print(f" {row['team']:20} | "
                  f"Cross: {row['cross_pct']:.0%}, "
                  f"TB: {row['tb_pct']:.0%}, "
                  f"SP: {row['sp_pct']:.0%}")

    # Effectiveness by style
    print("\n\nEffectiveness by Tactical Style:")
    print("-" * 50)
    style_stats = df.groupby('tactical_style').agg({
        'team': 'count',
        'total_xa': 'mean',
        'goals': 'mean',
        'xa_per_key_pass': 'mean'
    }).rename(columns={'team': 'n_teams', 'total_xa': 'avg_xa',
                       'goals': 'avg_goals', 'xa_per_key_pass': 'avg_efficiency'})
    print(style_stats.round(2).to_string())

    return df


tactical_analysis = identify_tactical_signatures(team_stats, xa_records)
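The style classifier checks its thresholds in a fixed order, so a team that exceeds more than one threshold is labelled by the first rule it matches. The standalone sketch below uses the same thresholds but takes three plain numbers instead of a DataFrame row; that signature is an illustrative simplification.

```python
def classify_style(cross_pct, tb_pct, sp_pct):
    """Label a team by the first threshold it exceeds (shares given as fractions)."""
    if cross_pct > 0.35:
        return "Cross-Heavy"
    elif tb_pct > 0.15:
        return "Penetrative"
    elif sp_pct > 0.25:
        return "Set-Piece Dependent"
    return "Balanced"


# Exceeds all three thresholds: the cross rule is checked first and wins
print(classify_style(0.40, 0.20, 0.30))

# Same through-ball and set-piece shares, but fewer crosses
print(classify_style(0.30, 0.20, 0.30))

# Values exactly on a threshold do not trigger it (comparisons are strict)
print(classify_style(0.35, 0.15, 0.25))
```

Because of this ordering, the "Set-Piece Dependent" label can only apply to teams that are neither cross-heavy nor penetrative, which affects how the group averages should be read.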
3.2 Deep Dive: Top Teams
def deep_dive_top_teams(team_df, xa_df, n_teams=4):
    """
    Detailed analysis of top teams' chance creation.
    """
    # Semi-finalists, hard-coded; n_teams limits how many are reported
    top_teams = ['France', 'Croatia', 'Belgium', 'England'][:n_teams]

    print("\n" + "=" * 70)
    print("DEEP DIVE: WORLD CUP SEMI-FINALISTS")
    print("=" * 70)

    for team in top_teams:
        team_xa = xa_df[xa_df['team'] == team]
        team_row = team_df[team_df['team'] == team].iloc[0]

        print(f"\n{'-' * 60}")
        print(f"{team}")
        print(f"{'-' * 60}")
        print(f"\nOverall: {team_row['total_xa']:.2f} xA from {team_row['key_passes']} key passes")
        print(f"Goals: {team_row['goals']} | Matches: {team_row['matches']}")
        print(f"xA per match: {team_row['total_xa']/team_row['matches']:.2f}")

        # Pass type breakdown
        type_breakdown = team_xa.groupby('pass_type').agg({
            'shot_xg': ['sum', 'count']
        })
        type_breakdown.columns = ['xa', 'count']
        print("\nBy Pass Type:")
        for pass_type, row in type_breakdown.iterrows():
            pct = row['xa'] / team_xa['shot_xg'].sum() * 100
            print(f" {pass_type:15}: {row['xa']:.2f} xA ({pct:.0f}%)")

        # Set piece contribution
        sp_xa = team_xa[team_xa['is_set_piece']]['shot_xg'].sum()
        op_xa = team_xa[~team_xa['is_set_piece']]['shot_xg'].sum()
        print("\nSet Pieces vs Open Play:")
        print(f" Set Pieces: {sp_xa:.2f} xA ({sp_xa/team_row['total_xa']:.0%})")
        print(f" Open Play: {op_xa:.2f} xA ({op_xa/team_row['total_xa']:.0%})")

        # Efficiency
        print("\nEfficiency:")
        print(f" xA per key pass: {team_row['total_xa']/team_row['key_passes']:.3f}")


deep_dive_top_teams(team_stats, xa_records)
Part 4: Visualization
4.1 Team Comparison Chart
def create_team_comparison_visualization(team_df, top_n=12):
    """
    Create visualization comparing team chance creation.
    """
    # Filter to teams with multiple matches
    df = team_df[team_df['matches'] >= 3].copy()
    # Sort by total xA
    df = df.nlargest(top_n, 'total_xa')

    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # 1. Total xA bar chart
    ax1 = axes[0, 0]
    bars = ax1.barh(df['team'], df['total_xa'], color='#1E3A8A', alpha=0.8)
    ax1.set_xlabel('Total xA')
    ax1.set_title('Total Expected Assists Created')
    ax1.invert_yaxis()

    # 2. xA per match
    ax2 = axes[0, 1]
    df['xa_pm'] = df['total_xa'] / df['matches']
    bars2 = ax2.barh(df['team'], df['xa_pm'], color='#059669', alpha=0.8)
    ax2.set_xlabel('xA per Match')
    ax2.set_title('xA per Match')
    ax2.invert_yaxis()

    # 3. Creation method breakdown (stacked)
    ax3 = axes[1, 0]
    df['other'] = df['key_passes'] - df['crosses'] - df['through_balls'] - df['cutbacks']
    bottom = np.zeros(len(df))
    for col, label, color in [
        ('crosses', 'Crosses', '#DC2626'),
        ('through_balls', 'Through Balls', '#1E3A8A'),
        ('cutbacks', 'Cutbacks', '#059669'),
        ('other', 'Other', '#6B7280')
    ]:
        ax3.barh(df['team'], df[col], left=bottom, label=label, alpha=0.8, color=color)
        bottom += df[col].values
    ax3.set_xlabel('Key Passes')
    ax3.set_title('Key Pass Type Distribution')
    ax3.legend(loc='lower right')
    ax3.invert_yaxis()

    # 4. Efficiency scatter
    ax4 = axes[1, 1]
    df['efficiency'] = df['total_xa'] / df['key_passes']
    colors = ['#DC2626' if t in ['France', 'Croatia', 'Belgium', 'England']
              else '#6B7280' for t in df['team']]
    ax4.scatter(df['key_passes'], df['efficiency'], s=df['total_xa']*50,
                c=colors, alpha=0.7, edgecolors='black')
    for _, row in df.iterrows():
        ax4.annotate(row['team'], (row['key_passes'], row['efficiency']),
                     fontsize=8, ha='center', va='bottom')
    ax4.set_xlabel('Total Key Passes')
    ax4.set_ylabel('xA per Key Pass')
    ax4.set_title('Efficiency vs Volume (size = total xA)')
    ax4.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('team_chance_creation_comparison.png', dpi=150, bbox_inches='tight')
    plt.show()
    return fig


fig = create_team_comparison_visualization(team_analysis)
Part 5: Key Findings and Recommendations
5.1 Summary of Findings
def summarize_findings(team_df, xa_df):
    """
    Summarize key findings from the analysis.
    """
    print("\n" + "=" * 70)
    print("KEY FINDINGS")
    print("=" * 70)

    # Overall tournament patterns
    total_xa = xa_df['shot_xg'].sum()
    total_key_passes = len(xa_df)
    print("\n1. OVERALL PATTERNS:")
    print(f" - Total xA created: {total_xa:.1f}")
    print(f" - Average xA per key pass: {total_xa/total_key_passes:.3f}")

    # Method effectiveness (report both figures; let the data decide which is higher)
    cross_xa_per = xa_df[xa_df['pass_type']=='cross']['shot_xg'].mean()
    tb_xa_per = xa_df[xa_df['pass_type']=='through_ball']['shot_xg'].mean()
    print("\n2. CREATION METHOD EFFECTIVENESS:")
    print(f" - xG per chance: through balls {tb_xa_per:.3f} vs crosses {cross_xa_per:.3f}")

    # Set piece importance
    sp_total = xa_df[xa_df['is_set_piece']]['shot_xg'].sum()
    print("\n3. SET PIECE IMPORTANCE:")
    print(f" - Set pieces account for {sp_total/total_xa:.1%} of total xA")

    # Top team characteristics
    finalists = team_df[team_df['team'].isin(['France', 'Croatia'])]
    print("\n4. FINALIST CHARACTERISTICS:")
    print(f" - France: {finalists[finalists['team']=='France']['total_xa'].values[0]:.2f} xA")
    print(f" - Croatia: {finalists[finalists['team']=='Croatia']['total_xa'].values[0]:.2f} xA")

    print("\n5. TACTICAL INSIGHTS:")
    print(" - Teams vary significantly in creation methods")
    print(" - No single 'best' approach - multiple styles can succeed")
    print(" - Balance between volume and quality matters")


summarize_findings(team_analysis, xa_records)
5.2 Recommendations
## Recommendations for the Club
Based on our analysis of World Cup 2018 chance creation patterns:
### 1. Tactical Considerations
- **Cross-Heavy Teams** need target players who excel aerially
- **Penetrative Teams** require mobile attackers who make intelligent runs
- **Set-Piece Dependent** teams should invest in delivery specialists
### 2. Recruitment Implications
- When scouting chance creators, consider what creation style fits your tactics
- A player with high xA from crosses may not translate to a through-ball-focused system
- Set-piece specialists provide "guaranteed" xA but may have limited open-play contribution
### 3. Performance Benchmarks
- Elite teams create 1.5+ xA per match
- Top creative players contribute 0.2+ xA per 90
- Efficient creation = 0.12+ xA per key pass
### 4. Balance Recommendations
- Don't over-rely on one creation method (diversification reduces predictability)
- Invest in set-piece coaching (accounts for 25-35% of xA)
- Match creative player profiles to tactical system requirements
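The team-level benchmarks listed under Performance Benchmarks can be turned into a simple check. This is a sketch only: the function name `check_benchmarks` and the constant names are hypothetical, and the thresholds are copied from the list above rather than derived from the data.

```python
# Thresholds taken from the benchmark list above
XA_PER_MATCH_BENCHMARK = 1.5
XA_PER_KEY_PASS_BENCHMARK = 0.12


def check_benchmarks(total_xa, matches, key_passes):
    """Compare a team's xA volume and efficiency against the two benchmarks."""
    xa_per_match = total_xa / matches if matches else None
    xa_per_key_pass = total_xa / key_passes if key_passes else None
    return {
        "xa_per_match": xa_per_match,
        "xa_per_key_pass": xa_per_key_pass,
        "elite_volume": xa_per_match is not None
        and xa_per_match >= XA_PER_MATCH_BENCHMARK,
        "efficient": xa_per_key_pass is not None
        and xa_per_key_pass >= XA_PER_KEY_PASS_BENCHMARK,
    }


# Invented totals for two hypothetical teams
strong = check_benchmarks(total_xa=10.5, matches=7, key_passes=80)
weak = check_benchmarks(total_xa=3.0, matches=3, key_passes=30)

print(strong)
print(weak)
```

Applied to each row of the team table, a check like this makes it easy to see which teams clear the volume bar, the efficiency bar, both, or neither.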
Conclusions
This case study demonstrated how team-level xA analysis reveals tactical patterns:
- Teams vary dramatically in how they create chances - from cross-heavy to penetrative to set-piece dependent
- Different methods differ in efficiency - through balls generate higher xG per chance but are harder to execute
- Set pieces matter significantly - contributing roughly a quarter of all xA
- Success comes through multiple routes - the finalists used different tactical approaches
- Context matters for scouting - a player's xA profile should match the team's tactical needs
Code Files
Complete implementation available in:
- code/case-study-code.py - Full analysis pipeline
- code/example-02-team-analysis.py - Team-level methods
- code/example-03-visualization.py - Visualization techniques