Case Study 2: Analyzing Team Chance Creation Patterns

Overview

This case study uses Expected Assists (xA) and Shot-Creating Actions to examine how different teams create chances. Throughout, a key pass's xA is taken to be the StatsBomb xG of the shot it sets up. By comparing the chance creation methods of World Cup 2018 participants, we identify tactical patterns and what makes teams effective at generating scoring opportunities.

Learning Objectives:

  - Compare chance creation patterns across different teams
  - Identify tactical signatures in how teams create chances
  - Analyze the relationship between creation method and effectiveness
  - Understand set piece vs. open play contributions


The Question

The sporting director of your club wants to understand:

  1. How do the most successful World Cup teams create their chances?
  2. What's the balance between different creation methods (crosses, through balls, set pieces)?
  3. Which creation methods are most efficient at generating high-quality chances?
  4. What can we learn about tactical approaches from chance creation data?

Part 1: Team-Level xA Analysis

1.1 Calculating Team Chance Creation
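
Before the full loader, the core attribution step can be sketched on toy data. This is a minimal illustration, assuming each shot row carries the id of its key pass (as `shot_key_pass_id` does in the StatsBomb events used here) and that a pass's xA equals the xG of the shot it sets up. The `toy_passes` and `toy_shots` frames and all their values are invented for illustration.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy_passes = pd.DataFrame({
    "id": ["p1", "p2", "p3"],
    "team": ["A", "A", "B"],
})
toy_shots = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "shot_key_pass_id": ["p1", "p2", "p3", None],  # last shot was unassisted
    "shot_statsbomb_xg": [0.10, 0.35, 0.05, 0.20],
})

# Keep only assisted shots, then attach each shot's xG to its key pass.
assisted = toy_shots.dropna(subset=["shot_key_pass_id"])
key_passes = toy_passes.merge(
    assisted[["shot_key_pass_id", "shot_statsbomb_xg"]],
    left_on="id", right_on="shot_key_pass_id", how="inner",
)

# Team xA is the summed xG of assisted shots; unassisted shots add nothing.
team_xa = key_passes.groupby("team")["shot_statsbomb_xg"].sum()
print(team_xa)
```

The merge is a vectorized stand-in for the per-shot lookup in the loader that follows; the loader's explicit loop is easier to extend with per-pass classification.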

import pandas as pd
import numpy as np
from statsbombpy import sb
import matplotlib.pyplot as plt
from collections import defaultdict

def load_team_chance_creation():
    """
    Load and calculate team-level chance creation metrics.
    """
    matches = sb.matches(competition_id=43, season_id=3)

    team_data = defaultdict(lambda: {
        'matches': 0,
        'total_xa': 0,
        'key_passes': 0,
        'crosses': 0,
        'through_balls': 0,
        'cutbacks': 0,
        'set_pieces': 0,
        'open_play': 0,
        'goals': 0,
        'shots': 0
    })

    all_xa_records = []

    for _, match_row in matches.iterrows():
        match_id = match_row['match_id']
        events = sb.events(match_id=match_id)

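        # Exclude penalty shootouts (period 5 in StatsBomb data) so that
        # shootout kicks are not counted as shots or goals
        shots = events[(events['type'] == 'Shot') & (events['period'] <= 4)]
        passes = events[events['type'] == 'Pass']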

        # Process each team in the match
        for team in [match_row['home_team'], match_row['away_team']]:
            team_data[team]['matches'] += 1

            # Get team shots
            team_shots = shots[shots['team'] == team]
            team_data[team]['shots'] += len(team_shots)
            team_data[team]['goals'] += (team_shots['shot_outcome'] == 'Goal').sum()

            # Analyze key passes
            for _, shot in team_shots.iterrows():
                kp_id = shot.get('shot_key_pass_id')
                shot_type = shot.get('shot_type', 'Open Play')
                shot_xg = shot.get('shot_statsbomb_xg', 0)

                if pd.notna(kp_id):
                    kp = passes[passes['id'] == kp_id]
                    if len(kp) > 0:
                        kp = kp.iloc[0]

                        team_data[team]['key_passes'] += 1
                        team_data[team]['total_xa'] += shot_xg

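                        # Classify pass type. Flag columns hold True or NaN,
                        # and NaN is truthy in Python, so compare against
                        # True explicitly (NaN == True evaluates to False).
                        if kp.get('pass_cross') == True:
                            team_data[team]['crosses'] += 1
                            pass_type = 'cross'
                        elif kp.get('pass_through_ball') == True:
                            team_data[team]['through_balls'] += 1
                            pass_type = 'through_ball'
                        elif kp.get('pass_cut_back') == True:
                            team_data[team]['cutbacks'] += 1
                            pass_type = 'cutback'
                        else:
                            pass_type = 'regular'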

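                        # Set piece vs open play. Shots taken directly from a
                        # dead ball rarely have a key pass, so also check the
                        # key pass's own restart type (assumes StatsBomb's
                        # 'Free Kick' and 'Corner' pass_type values). Second-
                        # phase set-piece chances still count as open play.
                        if (shot_type in ['Free Kick', 'Corner', 'Penalty']
                                or kp.get('pass_type') in ['Free Kick', 'Corner']):
                            team_data[team]['set_pieces'] += 1
                            is_set_piece = True
                        else:
                            team_data[team]['open_play'] += 1
                            is_set_piece = False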

                        all_xa_records.append({
                            'team': team,
                            'match_id': match_id,
                            'pass_type': pass_type,
                            'shot_xg': shot_xg,
                            'is_set_piece': is_set_piece,
                            'goal': shot['shot_outcome'] == 'Goal'
                        })

    # Convert to DataFrame
    team_df = pd.DataFrame.from_dict(team_data, orient='index')
    team_df = team_df.reset_index().rename(columns={'index': 'team'})

    xa_df = pd.DataFrame(all_xa_records)

    return team_df, xa_df

team_stats, xa_records = load_team_chance_creation()
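
A caution when classifying passes by flag columns such as `pass_cross`: if a missing flag is loaded as NaN rather than False (an assumption about how the events frame is populated, worth verifying on your own data), a bare truthiness test treats it as set, because NaN is truthy in Python. The sketch below shows the pitfall and one defensive check. `flag_is_set` is a hypothetical helper written for this illustration, not part of statsbombpy.

```python
import pandas as pd

def flag_is_set(value):
    """Return True only for a real truthy flag; NaN and None count as unset."""
    return bool(pd.notna(value) and value)

# A toy key pass row: one flag missing (NaN), one flag set.
toy_kp = pd.Series({"pass_cross": float("nan"), "pass_through_ball": True})

naive_cross = bool(toy_kp.get("pass_cross"))       # truthiness of NaN
safe_cross = flag_is_set(toy_kp.get("pass_cross"))
safe_through = flag_is_set(toy_kp.get("pass_through_ball"))

print(naive_cross, safe_cross, safe_through)
```

With the naive test, a pass with no cross flag would be counted as a cross, which would inflate every team's cross share.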

1.2 Team xA Rankings

def analyze_team_xa_rankings(team_df):
    """
    Rank teams by various chance creation metrics.
    """
    df = team_df.copy()

    # Calculate per-match metrics
    df['xa_per_match'] = df['total_xa'] / df['matches']
    df['key_passes_per_match'] = df['key_passes'] / df['matches']
    df['goals_per_match'] = df['goals'] / df['matches']

    # Calculate efficiency
    df['xa_per_key_pass'] = df['total_xa'] / df['key_passes'].replace(0, np.nan)
    df['conversion'] = df['goals'] / df['total_xa'].replace(0, np.nan)

    print("=" * 70)
    print("TEAM CHANCE CREATION RANKINGS")
    print("=" * 70)

    # By total xA created
    print("\nTop 10 by Total xA Created:")
    print("-" * 50)
    top_xa = df.nlargest(10, 'total_xa')[['team', 'matches', 'total_xa',
                                           'key_passes', 'goals']]
    top_xa['total_xa'] = top_xa['total_xa'].round(2)
    print(top_xa.to_string(index=False))

    # By xA per match
    print("\n\nTop 10 by xA per Match:")
    print("-" * 50)
    # Filter to teams with multiple matches
    multi_match = df[df['matches'] >= 3]
    top_xa_pm = multi_match.nlargest(10, 'xa_per_match')[
        ['team', 'matches', 'xa_per_match', 'goals_per_match']
    ]
    top_xa_pm['xa_per_match'] = top_xa_pm['xa_per_match'].round(2)
    top_xa_pm['goals_per_match'] = top_xa_pm['goals_per_match'].round(2)
    print(top_xa_pm.to_string(index=False))

    # By efficiency (xA per key pass)
    print("\n\nTop 10 by xA Efficiency (per Key Pass):")
    print("-" * 50)
    efficient = df[df['key_passes'] >= 20].nlargest(10, 'xa_per_key_pass')[
        ['team', 'key_passes', 'xa_per_key_pass']
    ]
    efficient['xa_per_key_pass'] = efficient['xa_per_key_pass'].round(3)
    print(efficient.to_string(index=False))

    return df

team_analysis = analyze_team_xa_rankings(team_stats)

Part 2: Chance Creation Methods

2.1 Cross vs. Through Ball Analysis

def analyze_creation_methods(team_df, xa_df):
    """
    Compare different chance creation methods.
    """
    print("\n" + "=" * 70)
    print("CHANCE CREATION METHODS ANALYSIS")
    print("=" * 70)

    # Overall method breakdown
    method_xa = xa_df.groupby('pass_type').agg({
        'shot_xg': ['sum', 'count', 'mean'],
        'goal': 'sum'
    })
    method_xa.columns = ['total_xa', 'count', 'xa_per_pass', 'goals']
    method_xa = method_xa.sort_values('total_xa', ascending=False)

    print("\nOverall xA by Pass Type:")
    print("-" * 60)
    for pass_type, row in method_xa.iterrows():
        pct = row['total_xa'] / method_xa['total_xa'].sum() * 100
        print(f"  {pass_type:15} | {row['count']:4.0f} passes | "
              f"{row['total_xa']:.2f} xA ({pct:.1f}%) | "
              f"{row['xa_per_pass']:.3f} xA/pass")

    # Team-level cross reliance
    print("\n\nTeam Cross Reliance (% of key passes that are crosses):")
    print("-" * 50)

    df = team_df.copy()
    df['cross_pct'] = df['crosses'] / df['key_passes'] * 100
    df = df.sort_values('cross_pct', ascending=False)

    for _, row in df.head(10).iterrows():
        bar = "█" * int(row['cross_pct'] / 5)
        print(f"  {row['team']:20} | {row['cross_pct']:7.1f}% | {bar}")

    # Team through ball usage
    print("\n\nTeam Through Ball Usage:")
    print("-" * 50)

    df['tb_pct'] = df['through_balls'] / df['key_passes'] * 100
    df = df.sort_values('tb_pct', ascending=False)

    for _, row in df.head(10).iterrows():
        bar = "█" * int(row['tb_pct'] / 2)
        print(f"  {row['team']:20} | {row['tb_pct']:7.1f}% | {bar}")

    return method_xa

method_analysis = analyze_creation_methods(team_stats, xa_records)

2.2 Set Piece vs. Open Play

def analyze_set_pieces_vs_open_play(team_df, xa_df):
    """
    Compare set piece and open play chance creation.
    """
    print("\n" + "=" * 70)
    print("SET PIECE VS OPEN PLAY ANALYSIS")
    print("=" * 70)

    # Overall breakdown
    set_piece_xa = xa_df[xa_df['is_set_piece']]['shot_xg'].sum()
    open_play_xa = xa_df[~xa_df['is_set_piece']]['shot_xg'].sum()
    total_xa = set_piece_xa + open_play_xa

    print(f"\nOverall xA Distribution:")
    print(f"  Set Pieces: {set_piece_xa:.2f} xA ({set_piece_xa/total_xa:.1%})")
    print(f"  Open Play:  {open_play_xa:.2f} xA ({open_play_xa/total_xa:.1%})")

    # xA per chance
    sp_count = xa_df['is_set_piece'].sum()
    op_count = (~xa_df['is_set_piece']).sum()

    print(f"\nxA per Key Pass:")
    print(f"  Set Pieces: {set_piece_xa/sp_count:.3f}")
    print(f"  Open Play:  {open_play_xa/op_count:.3f}")

    # Team-level set piece reliance
    print("\n\nTeam Set Piece Reliance (xA from set pieces):")
    print("-" * 50)

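    team_sp = xa_df.groupby(['team', 'is_set_piece'])['shot_xg'].sum().unstack(fill_value=0)
    # Guarantee both columns exist, in a fixed order, even if the data
    # contains only set-piece or only open-play key passes
    team_sp = team_sp.reindex(columns=[False, True], fill_value=0)
    team_sp.columns = ['open_play_xa', 'set_piece_xa']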
    team_sp['total_xa'] = team_sp['open_play_xa'] + team_sp['set_piece_xa']
    team_sp['sp_pct'] = team_sp['set_piece_xa'] / team_sp['total_xa'] * 100

    team_sp = team_sp.sort_values('sp_pct', ascending=False)

    for team, row in team_sp.head(10).iterrows():
        bar = "█" * int(row['sp_pct'] / 3)
        print(f"  {team:20} | {row['sp_pct']:7.1f}% | "
              f"SP: {row['set_piece_xa']:.2f}, OP: {row['open_play_xa']:.2f} | {bar}")

    return team_sp

set_piece_analysis = analyze_set_pieces_vs_open_play(team_stats, xa_records)

Part 3: Tactical Signatures

3.1 Identifying Tactical Patterns

def identify_tactical_signatures(team_df, xa_df):
    """
    Identify distinct tactical approaches to chance creation.
    """
    print("\n" + "=" * 70)
    print("TACTICAL SIGNATURES")
    print("=" * 70)

    df = team_df[team_df['matches'] >= 3].copy()

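    # Calculate percentages
    df['cross_pct'] = df['crosses'] / df['key_passes']
    df['tb_pct'] = df['through_balls'] / df['key_passes']
    df['sp_pct'] = df['set_pieces'] / df['key_passes']
    # Efficiency is needed for the style summary below; compute it here
    # because the raw team_stats frame does not carry this column
    df['xa_per_key_pass'] = df['total_xa'] / df['key_passes'].replace(0, np.nan)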

    # Classify tactical style
    def classify_style(row):
        if row['cross_pct'] > 0.35:
            return 'Cross-Heavy'
        elif row['tb_pct'] > 0.15:
            return 'Penetrative'
        elif row['sp_pct'] > 0.25:
            return 'Set-Piece Dependent'
        else:
            return 'Balanced'

    df['tactical_style'] = df.apply(classify_style, axis=1)

    print("\nTactical Style Classification:")
    print("-" * 60)

    for style in ['Cross-Heavy', 'Penetrative', 'Set-Piece Dependent', 'Balanced']:
        teams = df[df['tactical_style'] == style]
        print(f"\n{style} ({len(teams)} teams):")
        for _, row in teams.iterrows():
            print(f"  {row['team']:20} | "
                  f"Cross: {row['cross_pct']:.0%}, "
                  f"TB: {row['tb_pct']:.0%}, "
                  f"SP: {row['sp_pct']:.0%}")

    # Effectiveness by style
    print("\n\nEffectiveness by Tactical Style:")
    print("-" * 50)

    style_stats = df.groupby('tactical_style').agg({
        'team': 'count',
        'total_xa': 'mean',
        'goals': 'mean',
        'xa_per_key_pass': 'mean'
    }).rename(columns={'team': 'n_teams', 'total_xa': 'avg_xa',
                       'goals': 'avg_goals', 'xa_per_key_pass': 'avg_efficiency'})

    print(style_stats.round(2).to_string())

    return df

tactical_analysis = identify_tactical_signatures(team_stats, xa_records)
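
The classification above is order-dependent: a team that exceeds more than one threshold receives the first matching label. A minimal sketch of the same rule set using `np.select`, which makes that precedence explicit. The thresholds are those used in `classify_style`; the `toy` frame and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy shares, invented for illustration only.
toy = pd.DataFrame({
    "team": ["A", "B", "C", "D"],
    "cross_pct": [0.40, 0.20, 0.10, 0.20],
    "tb_pct":    [0.20, 0.18, 0.05, 0.05],
    "sp_pct":    [0.10, 0.30, 0.30, 0.10],
})

# Conditions are checked in order, mirroring the if/elif chain.
conditions = [
    toy["cross_pct"] > 0.35,
    toy["tb_pct"] > 0.15,
    toy["sp_pct"] > 0.25,
]
labels = ["Cross-Heavy", "Penetrative", "Set-Piece Dependent"]

# np.select returns, per row, the label of the first condition that holds.
toy["tactical_style"] = np.select(conditions, labels, default="Balanced")
print(toy[["team", "tactical_style"]])
```

Team A exceeds both the cross and through-ball thresholds but is labeled Cross-Heavy, and team B exceeds both the through-ball and set-piece thresholds but is labeled Penetrative. Reordering the conditions would change those labels, so the style counts should be read with that in mind.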

3.2 Deep Dive: Top Teams

def deep_dive_top_teams(team_df, xa_df, n_teams=4):
    """
    Detailed analysis of top teams' chance creation.
    """
    # Hard-coded 2018 semi-finalists (France, Croatia, Belgium, England),
    # truncated to the first n_teams entries
    top_teams = ['France', 'Croatia', 'Belgium', 'England'][:n_teams]

    print("\n" + "=" * 70)
    print("DEEP DIVE: WORLD CUP SEMI-FINALISTS")
    print("=" * 70)

    for team in top_teams:
        team_xa = xa_df[xa_df['team'] == team]
        team_row = team_df[team_df['team'] == team].iloc[0]

        print(f"\n{'-' * 60}")
        print(f"{team}")
        print(f"{'-' * 60}")

        print(f"\nOverall: {team_row['total_xa']:.2f} xA from {team_row['key_passes']} key passes")
        print(f"Goals: {team_row['goals']} | Matches: {team_row['matches']}")
        print(f"xA per match: {team_row['total_xa']/team_row['matches']:.2f}")

        # Pass type breakdown
        type_breakdown = team_xa.groupby('pass_type').agg({
            'shot_xg': ['sum', 'count']
        })
        type_breakdown.columns = ['xa', 'count']

        print("\nBy Pass Type:")
        for pass_type, row in type_breakdown.iterrows():
            pct = row['xa'] / team_xa['shot_xg'].sum() * 100
            print(f"  {pass_type:15}: {row['xa']:.2f} xA ({pct:.0f}%)")

        # Set piece contribution
        sp_xa = team_xa[team_xa['is_set_piece']]['shot_xg'].sum()
        op_xa = team_xa[~team_xa['is_set_piece']]['shot_xg'].sum()

        print("\nSet Pieces vs Open Play:")
        print(f"  Set Pieces: {sp_xa:.2f} xA ({sp_xa/team_row['total_xa']:.0%})")
        print(f"  Open Play:  {op_xa:.2f} xA ({op_xa/team_row['total_xa']:.0%})")

        # Efficiency
        print(f"\nEfficiency:")
        print(f"  xA per key pass: {team_row['total_xa']/team_row['key_passes']:.3f}")

deep_dive_top_teams(team_stats, xa_records)

Part 4: Visualization

4.1 Team Comparison Chart

def create_team_comparison_visualization(team_df, top_n=12):
    """
    Create visualization comparing team chance creation.
    """
    # Filter to teams with multiple matches
    df = team_df[team_df['matches'] >= 3].copy()

    # Sort by total xA
    df = df.nlargest(top_n, 'total_xa')

    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # 1. Total xA bar chart
    ax1 = axes[0, 0]
    bars = ax1.barh(df['team'], df['total_xa'], color='#1E3A8A', alpha=0.8)
    ax1.set_xlabel('Total xA')
    ax1.set_title('Total Expected Assists Created')
    ax1.invert_yaxis()

    # 2. xA per match
    ax2 = axes[0, 1]
    df['xa_pm'] = df['total_xa'] / df['matches']
    bars2 = ax2.barh(df['team'], df['xa_pm'], color='#059669', alpha=0.8)
    ax2.set_xlabel('xA per Match')
    ax2.set_title('xA per Match')
    ax2.invert_yaxis()

    # 3. Creation method breakdown (stacked)
    ax3 = axes[1, 0]
    df['other'] = df['key_passes'] - df['crosses'] - df['through_balls'] - df['cutbacks']

    bottom = np.zeros(len(df))
    for col, label, color in [
        ('crosses', 'Crosses', '#DC2626'),
        ('through_balls', 'Through Balls', '#1E3A8A'),
        ('cutbacks', 'Cutbacks', '#059669'),
        ('other', 'Other', '#6B7280')
    ]:
        ax3.barh(df['team'], df[col], left=bottom, label=label, alpha=0.8, color=color)
        bottom += df[col].values

    ax3.set_xlabel('Key Passes')
    ax3.set_title('Key Pass Type Distribution')
    ax3.legend(loc='lower right')
    ax3.invert_yaxis()

    # 4. Efficiency scatter
    ax4 = axes[1, 1]
    df['efficiency'] = df['total_xa'] / df['key_passes']
    colors = ['#DC2626' if t in ['France', 'Croatia', 'Belgium', 'England']
              else '#6B7280' for t in df['team']]

    ax4.scatter(df['key_passes'], df['efficiency'], s=df['total_xa']*50,
                c=colors, alpha=0.7, edgecolors='black')

    for _, row in df.iterrows():
        ax4.annotate(row['team'], (row['key_passes'], row['efficiency']),
                    fontsize=8, ha='center', va='bottom')

    ax4.set_xlabel('Total Key Passes')
    ax4.set_ylabel('xA per Key Pass')
    ax4.set_title('Efficiency vs Volume (size = total xA)')
    ax4.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('team_chance_creation_comparison.png', dpi=150, bbox_inches='tight')
    plt.show()

    return fig

fig = create_team_comparison_visualization(team_analysis)

Part 5: Key Findings and Recommendations

5.1 Summary of Findings

def summarize_findings(team_df, xa_df):
    """
    Summarize key findings from the analysis.
    """
    print("\n" + "=" * 70)
    print("KEY FINDINGS")
    print("=" * 70)

    # Overall tournament patterns
    total_xa = xa_df['shot_xg'].sum()
    total_key_passes = len(xa_df)

    print("\n1. OVERALL PATTERNS:")
    print(f"   - Total xA created: {total_xa:.1f}")
    print(f"   - Average xA per key pass: {total_xa/total_key_passes:.3f}")

    # Method effectiveness
    cross_xa_per = xa_df[xa_df['pass_type']=='cross']['shot_xg'].mean()
    tb_xa_per = xa_df[xa_df['pass_type']=='through_ball']['shot_xg'].mean()

    print("\n2. CREATION METHOD EFFECTIVENESS:")
    # Report both values rather than asserting which method is higher
    print(f"   - xA per key pass: through balls {tb_xa_per:.3f}, crosses {cross_xa_per:.3f}")

    # Set piece importance
    sp_total = xa_df[xa_df['is_set_piece']]['shot_xg'].sum()
    print(f"\n3. SET PIECE IMPORTANCE:")
    print(f"   - Set pieces account for {sp_total/total_xa:.1%} of total xA")

    # Top team characteristics
    finalists = team_df[team_df['team'].isin(['France', 'Croatia'])]
    print("\n4. FINALIST CHARACTERISTICS:")
    print(f"   - France: {finalists[finalists['team']=='France']['total_xa'].values[0]:.2f} xA")
    print(f"   - Croatia: {finalists[finalists['team']=='Croatia']['total_xa'].values[0]:.2f} xA")

    print("\n5. TACTICAL INSIGHTS:")
    print("   - Teams vary significantly in creation methods")
    print("   - No single 'best' approach - multiple styles can succeed")
    print("   - Balance between volume and quality matters")

    return

summarize_findings(team_analysis, xa_records)

5.2 Recommendations

## Recommendations for the Club

Based on our analysis of World Cup 2018 chance creation patterns:

### 1. Tactical Considerations
- **Cross-Heavy Teams** need target players who excel aerially
- **Penetrative Teams** require mobile attackers who make intelligent runs
- **Set-Piece Dependent** teams should invest in delivery specialists

### 2. Recruitment Implications
- When scouting chance creators, consider what creation style fits your tactics
- A player with high xA from crosses may not translate to a through-ball-focused system
- Set-piece specialists provide "guaranteed" xA but may have limited open-play contribution

### 3. Performance Benchmarks
- Elite teams create 1.5+ xA per match
- Top creative players contribute 0.2+ xA per 90
- Efficient creation = 0.12+ xA per key pass

### 4. Balance Recommendations
- Don't over-rely on one creation method (diversification reduces predictability)
- Invest in set-piece coaching (set pieces contribute roughly a quarter of all xA in this analysis)
- Match creative player profiles to tactical system requirements
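
The team-level figures under Performance Benchmarks can be applied mechanically to a table of team totals. A minimal sketch, assuming a frame with `matches`, `total_xa`, and `key_passes` columns like the one built in Part 1. The thresholds are the 1.5 xA per match and 0.12 xA per key pass values listed above; the `toy_teams` frame and the flag column names are invented for illustration.

```python
import pandas as pd

# Thresholds taken from the Performance Benchmarks list.
XA_PER_MATCH_ELITE = 1.5
XA_PER_KEY_PASS_EFFICIENT = 0.12

# Toy team totals, invented for illustration only.
toy_teams = pd.DataFrame({
    "team": ["A", "B"],
    "matches": [7, 3],
    "total_xa": [11.2, 2.7],
    "key_passes": [80, 30],
})

# Volume and efficiency rates.
toy_teams["xa_per_match"] = toy_teams["total_xa"] / toy_teams["matches"]
toy_teams["xa_per_key_pass"] = toy_teams["total_xa"] / toy_teams["key_passes"]

# Flag each team against the two benchmarks.
toy_teams["elite_volume"] = toy_teams["xa_per_match"] >= XA_PER_MATCH_ELITE
toy_teams["efficient"] = toy_teams["xa_per_key_pass"] >= XA_PER_KEY_PASS_EFFICIENT
print(toy_teams[["team", "xa_per_match", "xa_per_key_pass", "elite_volume", "efficient"]])
```

With only a handful of matches per team in a tournament, flags like these are noisy and are better treated as prompts for closer review than as verdicts.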

Conclusions

This case study demonstrated how team-level xA analysis reveals tactical patterns:

  1. Teams vary dramatically in how they create chances - from cross-heavy to penetrative to set-piece dependent
  2. Different methods have different efficiency - through balls generate higher xG but are harder to execute
  3. Set pieces matter significantly - contributing roughly a quarter of all xA
  4. Success comes through multiple routes - the finalists used different tactical approaches
  5. Context matters for scouting - a player's xA profile should match the team's tactical needs

Code Files

Complete implementation available in:

  - code/case-study-code.py - Full analysis pipeline
  - code/example-02-team-analysis.py - Team-level methods
  - code/example-03-visualization.py - Visualization techniques