Chapter 8: Expected Assists and Chance Creation

Learning Objectives

By the end of this chapter, you will be able to:

  1. Define Expected Assists (xA) and explain its relationship to xG
  2. Calculate xA from shot and pass data
  3. Identify different types of chance-creating actions and their value
  4. Evaluate player creativity using xA and related metrics
  5. Build xA models and shot-creating action frameworks
  6. Apply chance creation analysis to player scouting and recruitment
  7. Distinguish between primary and secondary assists in analytics
  8. Recognize the limitations and proper interpretation of xA metrics

8.1 Introduction to Chance Creation Analytics

While Chapter 7 explored Expected Goals (xG) to measure shot quality, this chapter shifts focus to the actions that create those shots. Expected Assists (xA) and related chance creation metrics answer a fundamental question: Who creates the opportunities that lead to goals?

8.1.1 The Creativity Problem

Traditional assist statistics suffer from significant limitations:

Dependency on conversion: A player who creates five excellent chances that teammates fail to convert receives zero assists, while another player whose routine pass happens to precede a long-range screamer receives full credit. This creates enormous noise in assist totals. Over a single season, the correlation between a player's assists and the underlying quality of their chance creation is surprisingly low, roughly 0.5-0.6. Squaring that correlation gives an r-squared of only about 0.25-0.36: chance creation explains roughly a quarter to a third of the variance in assist totals, and the remaining two-thirds or more comes from factors outside the passer's control, such as teammates' finishing.
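The size of this noise is easy to underestimate. A minimal simulation makes it visible. It assumes each key pass converts independently with probability equal to the xG of the resulting shot (a simplification), and the 60-chance, 0.12-xG season profile is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical creator: 60 key passes in a season, each creating a 0.12 xG shot
chance_xg = np.full(60, 0.12)
season_xa = chance_xg.sum()  # 7.2 xA

# Simulate 10,000 seasons; each chance becomes a goal with probability = its xG
n_seasons = 10_000
converted = rng.random((n_seasons, chance_xg.size)) < chance_xg
assists = converted.sum(axis=1)

low, high = np.percentile(assists, [5, 95])
print(f"xA per season: {season_xa:.1f}")
print(f"Mean assists: {assists.mean():.2f}")
print(f"90% of simulated seasons: {low:.0f} to {high:.0f} assists")
```

Every simulated season has exactly the same 7.2 xA of creative output, yet the printed range spans several assists either side of that figure. This is why a single season of assists says little about a creator on its own.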

Binary attribution: Assists are all-or-nothing. A perfectly weighted through ball that puts an attacker one-on-one with the goalkeeper receives the same credit as a simple square pass that happens to precede a goal. In a typical Premier League season, roughly 25% of all assists come from passes that would be considered routine in any other context--they became assists only because of exceptional finishing by the shooter.

Ignoring non-assist contributions: Many valuable chance-creating actions don't directly precede shots. A player who consistently plays progressive passes into dangerous areas, forcing defenses to scramble, receives no statistical credit under the traditional assist framework.

Arbitrary definition boundaries: Different competitions and data providers define assists differently. Some count only the final pass before a goal; others credit the passer even if the scorer dribbles past two defenders before shooting. Corner kicks and free kicks that lead directly to goals may or may not count as assists depending on the competition's rules. This inconsistency makes cross-competition comparisons unreliable.

Intuition: Imagine two playmakers. Player A threads a perfect through ball that puts the striker one-on-one with the goalkeeper, but the striker blazes it over. Player B plays a simple five-yard pass, and the striker scores a worldie from 30 yards. Traditional assists credit Player B with 1 and Player A with 0. xA corrects this by crediting the quality of the chance created, not the outcome.

8.1.2 The Solution: Expected Assists

Expected Assists addresses these problems by crediting the passer with the xG of the resulting shot, regardless of whether it converts. This approach:

  • Removes conversion dependency: Good chances created count equally whether scored or missed
  • Weights by quality: A pass creating a 0.35 xG chance receives more credit than one creating a 0.05 xG opportunity
  • Enables stable measurement: xA totals stabilize faster than actual assists due to reduced noise
# Conceptual xA calculation
# For each pass that leads to a shot:
#   xA = xG of the resulting shot

# Example: A player makes 3 key passes in a match
key_passes = [
    {'shot_xg': 0.35, 'converted': True},   # Goal
    {'shot_xg': 0.22, 'converted': False},  # Saved
    {'shot_xg': 0.08, 'converted': False},  # Missed
]

total_xA = sum(kp['shot_xg'] for kp in key_passes)
actual_assists = sum(1 for kp in key_passes if kp['converted'])

print(f"Expected Assists: {total_xA:.2f}")  # 0.65 xA
print(f"Actual Assists: {actual_assists}")   # 1 assist

The intuition is straightforward: if a player delivers a pass that creates a shot with 0.35 xG, the passer receives 0.35 xA regardless of whether the shot converts. Over time, players who consistently create high-quality chances will accumulate high xA, and their actual assist totals will tend to converge toward their xA.

8.1.3 Definition and Relationship to xG

xA and xG are mathematically linked: the xA credited to a passer is exactly equal to the xG of the shot that resulted from their pass. This means:

  • The sum of all xA in a match equals the sum of all xG from assisted shots
  • A player's xA depends on both the quality of their pass AND the position from which the recipient shoots
  • xA inherits the strengths and weaknesses of the underlying xG model

This relationship has an important implication: if the xG model underestimates shots from certain positions (say, close-range headers from corners), it will also undervalue the passes that created those headers. Improvements to xG models automatically improve xA accuracy.
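Because xA is just the xG of assisted shots re-attributed to passers, both properties can be checked in a few lines. The sketch below uses a hypothetical shot table with invented values and column names (`passer`, `body_part`, `xg`). The 1.2 multiplier on headers stands in for a recalibrated xG model and is purely illustrative:

```python
import pandas as pd

# Hypothetical shots; passer is None when the shot was unassisted
shots = pd.DataFrame({
    "passer":    ["Ana", "Ana", "Bea", None, "Bea"],
    "body_part": ["Head", "Right Foot", "Left Foot", "Right Foot", "Head"],
    "xg":        [0.10, 0.30, 0.20, 0.05, 0.15],
})

def xa_by_passer(df: pd.DataFrame, xg_col: str = "xg") -> pd.Series:
    """Credit each passer with the xG of the shots their passes created."""
    return df.dropna(subset=["passer"]).groupby("passer")[xg_col].sum()

xa = xa_by_passer(shots)
assisted_xg = shots.loc[shots["passer"].notna(), "xg"].sum()
# Identity: total xA equals the total xG of assisted shots
assert abs(xa.sum() - assisted_xg) < 1e-9

# Illustrative recalibration: suppose the old model under-rated headers by 20%
shots["xg_recal"] = shots["xg"].where(shots["body_part"] != "Head", shots["xg"] * 1.2)
print(pd.DataFrame({"xA_old": xa, "xA_new": xa_by_passer(shots, "xg_recal")}))
```

Both passers who set up headers gain xA as soon as the xG values change, without any change to how xA itself is computed.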

Real-World Example: In the 2018-19 Premier League, Eden Hazard recorded 15 assists from 7.8 xA, meaning his teammates converted his chances at a rate far above expectation. The following season, after Hazard left for Real Madrid, Chelsea's attackers regressed significantly in their finishing. That regression suggests the 2018-19 surplus of assists over xA reflected teammates' finishing more than Hazard's creativity.

8.1.4 Terminology and Definitions

Before proceeding, let us establish key terminology:

Key Pass: A pass that directly leads to a shot attempt, regardless of outcome. Also called "shot assist" in some systems. Note that a key pass is defined by outcome (did a shot follow?), not by the quality or intention of the pass itself.

Assist: A pass that directly leads to a goal. A subset of key passes.

Expected Assist (xA): The sum of xG values for all shots created by a player's passes. Sometimes called "xG Assisted" or "xGa" in different systems; do not confuse the latter with xGA, which conventionally means expected goals against.

Shot-Creating Action (SCA): Any offensive action (pass, dribble, foul drawn, shot) in the two actions preceding a shot attempt. This broader definition, popularized by FBref, captures contributions beyond the final pass.

Goal-Creating Action (GCA): Any offensive action in the two actions preceding a goal.

Secondary Assist: The pass before the assist pass, sometimes called "hockey assist" or "pre-assist." This captures the penultimate contribution to a goal.
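To make these definitions concrete, the sketch below labels the passes in a single hypothetical attacking sequence. It assumes a simplified, chronological event list ending in the shot, treats the last pass before the shot as the key pass (and the assist, if the shot scores) and the pass before that as the secondary assist. Real providers apply extra rules, for example about intervening touches, that are not modeled here:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    player: str
    kind: str                      # "Pass", "Dribble", "Shot", ...
    outcome: Optional[str] = None  # for shots: "Goal", "Saved", ...


def label_passes(sequence: list[Event]) -> list[tuple[str, str]]:
    """Label key pass/assist and secondary assist in a sequence ending in a shot."""
    shot = sequence[-1]
    if shot.kind != "Shot":
        raise ValueError("sequence must end with a shot")
    passes = [e for e in sequence[:-1] if e.kind == "Pass"]
    labels: list[tuple[str, str]] = []
    if not passes:
        return labels  # unassisted shot
    is_goal = shot.outcome == "Goal"
    # Last pass before the shot: always a key pass, and an assist if the shot scores
    labels.append((passes[-1].player, "assist" if is_goal else "key pass"))
    # The pass before that is a secondary assist, which only applies to goals
    if is_goal and len(passes) >= 2:
        labels.append((passes[-2].player, "secondary assist"))
    return labels


goal_move = [Event("A", "Pass"), Event("B", "Pass"), Event("C", "Shot", "Goal")]
saved_move = [Event("A", "Pass"), Event("C", "Dribble"), Event("C", "Shot", "Saved")]
print(label_passes(goal_move))   # [('B', 'assist'), ('A', 'secondary assist')]
print(label_passes(saved_move))  # [('A', 'key pass')]
```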

Callout: xA vs. Assists -- Why the Distinction Matters

Consider two creative midfielders over a season:

  • Player A: 180 key passes, 8.5 xA, 12 actual assists
  • Player B: 120 key passes, 9.2 xA, 5 actual assists

Traditional statistics favor Player A (12 assists vs. 5). But xA tells a different story: Player B actually created higher-quality chances per key pass (0.077 xA per key pass vs. 0.047), and his low assist total is likely due to poor finishing by teammates. Player B is probably the better creative player, despite having fewer assists.
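The comparison in this callout is simple arithmetic. A short pandas sketch using the callout's figures also computes assists minus xA, a common (if noisy) signal of how much teammates' finishing helped or hurt each creator:

```python
import pandas as pd

creators = pd.DataFrame(
    {"key_passes": [180, 120], "xA": [8.5, 9.2], "assists": [12, 5]},
    index=["Player A", "Player B"],
)
creators["xA_per_key_pass"] = creators["xA"] / creators["key_passes"]
# Positive: teammates finished above expectation; negative: below
creators["assists_minus_xA"] = creators["assists"] - creators["xA"]
print(creators.round(3))
```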


8.2 Calculating Expected Assists

8.2.1 Basic xA Methodology

The fundamental xA calculation is straightforward:

import pandas as pd
import numpy as np
from statsbombpy import sb

def calculate_player_xa(events_df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate Expected Assists for each player.

    Parameters
    ----------
    events_df : pd.DataFrame
        Match event data including shots and passes

    Returns
    -------
    pd.DataFrame
        Player-level xA summary
    """
    # Identify shots and their preceding passes
    shots = events_df[events_df['type'] == 'Shot'].copy()

    # Get the pass that led to each shot (if any)
    xa_data = []

    for _, shot in shots.iterrows():
        # StatsBomb provides key_pass_id for shots
        if pd.notna(shot.get('shot_key_pass_id')):
            key_pass_id = shot['shot_key_pass_id']

            # Find the key pass
            key_pass = events_df[events_df['id'] == key_pass_id]

            if len(key_pass) > 0:
                passer = key_pass.iloc[0]['player']
                team = key_pass.iloc[0]['team']

                xa_data.append({
                    'passer': passer,
                    'team': team,
                    'shot_xg': shot['shot_statsbomb_xg'],
                    'goal': shot['shot_outcome'] == 'Goal',
                    'shot_player': shot['player']
                })

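    xa_df = pd.DataFrame(xa_data)
    if xa_df.empty:
        # No assisted shots: return an empty summary with the expected columns
        return pd.DataFrame(columns=['passer', 'team', 'xA', 'assists',
                                     'key_passes', 'xA_per_key_pass'])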

    # Aggregate by player
    player_xa = xa_df.groupby(['passer', 'team']).agg({
        'shot_xg': 'sum',
        'goal': 'sum',
        'shot_player': 'count'
    }).rename(columns={
        'shot_xg': 'xA',
        'goal': 'assists',
        'shot_player': 'key_passes'
    }).reset_index()

    player_xa['xA_per_key_pass'] = player_xa['xA'] / player_xa['key_passes']

    return player_xa.sort_values('xA', ascending=False)
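The row-by-row loop above is easy to read but slow on a full season of events. A vectorized alternative, sketched here under the same assumption of StatsBomb-style columns (`id`, `type`, `player`, `team`, `shot_key_pass_id`, `shot_statsbomb_xg`, `shot_outcome`), joins each shot to its key pass with `DataFrame.merge`:

```python
import pandas as pd

def calculate_player_xa_merge(events_df: pd.DataFrame) -> pd.DataFrame:
    """Vectorized xA: join each shot to its key pass, then aggregate by passer."""
    shots = events_df.loc[
        (events_df["type"] == "Shot") & events_df["shot_key_pass_id"].notna(),
        ["shot_key_pass_id", "shot_statsbomb_xg", "shot_outcome"],
    ]
    passes = events_df.loc[events_df["type"] == "Pass", ["id", "player", "team"]]

    # Inner join drops shots whose key pass is missing from the event table
    linked = shots.merge(passes, left_on="shot_key_pass_id", right_on="id", how="inner")
    linked["goal"] = linked["shot_outcome"] == "Goal"

    player_xa = (
        linked.groupby(["player", "team"])
        .agg(xA=("shot_statsbomb_xg", "sum"),
             assists=("goal", "sum"),
             key_passes=("id", "count"))
        .reset_index()
        .rename(columns={"player": "passer"})
    )
    player_xa["xA_per_key_pass"] = player_xa["xA"] / player_xa["key_passes"]
    return player_xa.sort_values("xA", ascending=False)
```

The output columns mirror the loop version, so the two can be cross-checked on the same match.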

8.2.2 Linking Passes to Shots

The critical step in xA calculation is correctly linking passes to subsequent shots. Different data providers handle this differently:

StatsBomb: Provides shot_key_pass_id directly linking shots to their assist passes. Also includes pass_shot_assist boolean on passes. This is the most reliable approach because the linkage is established by the data collection team who watched the match.

Opta/Stats Perform: Uses assisted flag on shots and requires temporal matching to identify the assisting pass. The matching process can be error-prone, particularly when multiple passes occur in quick succession.

Wyscout: Provides assist_id linking goals to assists; key passes must be identified separately. Wyscout also classifies passes as "smart passes," which provides additional context for creativity analysis.

def link_passes_to_shots_temporal(events_df: pd.DataFrame,
                                   time_threshold: float = 12.0) -> pd.DataFrame:
    """
    Link passes to subsequent shots using temporal proximity.

    Use when direct linkage isn't available in the data.

    Parameters
    ----------
    events_df : pd.DataFrame
        Event data with match_id, period, minute, second, team, player,
        type, id and pass_outcome columns
    time_threshold : float
        Maximum seconds between the pass and the shot

    Returns
    -------
    pd.DataFrame
        Shots with linked key pass information (None when unassisted)
    """
    # Sort within each match and period: StatsBomb's minute keeps counting
    # through first-half stoppage time, so sorting on minute alone would
    # interleave the end of the first half with the start of the second
    events = events_df.sort_values(['match_id', 'period', 'minute', 'second']).copy()

    # Create timestamp in seconds
    events['timestamp'] = events['minute'] * 60 + events['second'].fillna(0)

    shots = events[events['type'] == 'Shot'].copy()
    passes = events[(events['type'] == 'Pass') &
                    (events['pass_outcome'].isna())]  # Successful passes

    linked_shots = []

    for _, shot in shots.iterrows():
        shot_time = shot['timestamp']

        # Passes by the same team, in the same match and period, within the window
        candidate_passes = passes[
            (passes['match_id'] == shot['match_id']) &
            (passes['period'] == shot['period']) &
            (passes['team'] == shot['team']) &
            (passes['timestamp'] < shot_time) &
            (passes['timestamp'] > shot_time - time_threshold)
        ]

        shot_data = shot.to_dict()
        shot_data['key_pass_player'] = None
        shot_data['key_pass_id'] = None

        if len(candidate_passes) > 0:
            # Events are sorted, so the last candidate is the most recent pass
            key_pass = candidate_passes.iloc[-1]
            # A shooter cannot assist their own shot: treat it as unassisted
            if key_pass['player'] != shot['player']:
                shot_data['key_pass_player'] = key_pass['player']
                shot_data['key_pass_id'] = key_pass['id']

        linked_shots.append(shot_data)

    return pd.DataFrame(linked_shots)
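pandas also offers `pd.merge_asof`, which performs this "most recent earlier match" join without a Python loop. The sketch below assumes StatsBomb-style columns (`match_id`, `period`, `minute`, `second`, `team`, `type`, `pass_outcome`, `player`, `id`). It attaches the most recent successful same-team pass strictly before each shot within the window, matching only within the same match and period. It does not check whether the opponent touched the ball in between, and it does not special-case a shooter's own pass:

```python
import pandas as pd

def link_passes_to_shots_asof(events_df: pd.DataFrame,
                              time_threshold: float = 12.0) -> pd.DataFrame:
    """Attach the latest same-team successful pass within time_threshold s to each shot."""
    events = events_df.copy()
    # Float keys so that a float tolerance is accepted by merge_asof
    events["timestamp"] = (events["minute"] * 60 + events["second"].fillna(0)).astype(float)

    shots = events[events["type"] == "Shot"].sort_values("timestamp")
    passes = (
        events[(events["type"] == "Pass") & events["pass_outcome"].isna()]
        [["match_id", "period", "team", "timestamp", "player", "id"]]
        .rename(columns={"player": "key_pass_player", "id": "key_pass_id"})
        .sort_values("timestamp")
    )

    return pd.merge_asof(
        shots, passes,
        on="timestamp",
        by=["match_id", "period", "team"],  # never match across matches, halves or teams
        direction="backward",               # only passes before the shot
        tolerance=time_threshold,           # at most this many seconds earlier
        allow_exact_matches=False,          # strictly earlier than the shot
    )
```

Shots with no qualifying pass come back with NaN in `key_pass_player` and `key_pass_id`.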

8.2.3 Handling Edge Cases

Several scenarios require special handling:

Unassisted shots: Shots following dribbles, interceptions, or direct play from goalkeeper have no key pass. These contribute zero xA but should be tracked separately. In most leagues, roughly 30-40% of shots are unassisted.

Own goals: Own goals should not credit the passer with xA. An own goal is not a shot by the attacking team, so there is no assisted shot for the xG model to value, and the outcome was produced by a defender's action rather than by a chance the passer created.

Set pieces: Free kick and corner takers receive xA for resulting shots, which can inflate their totals significantly. A designated corner taker might accumulate 2-3 xA per season purely from set-piece delivery. When evaluating open-play creativity, analysts should separate set-piece xA from open-play xA.

Second balls: When a shot is saved and another player shoots the rebound, whether the original passer receives credit depends on the system. Most xA implementations credit only the pass immediately preceding a shot, and the save breaks that link: the original passer is credited for the first shot only, and the rebound is classified as unassisted unless a teammate passes the loose ball to the shooter.

Dribbles preceding shots: When a player dribbles past defenders and then shoots, no pass is involved. Some systems attribute the xA to the player who passed to the dribbler; others treat the shot as unassisted. The choice depends on whether you want to measure direct chance creation (pass-to-shot) or broader chance involvement.

Common Pitfall: Set-piece takers (corner kick and free kick specialists) can accumulate large xA totals simply through volume. When evaluating creativity, always separate open-play xA from set-piece xA. A player with 8.0 total xA might have only 4.0 from open play if they take all corners and free kicks -- which tells a very different story about their creative ability.

def calculate_xa_with_context(events_df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate xA with additional context about assist types.
    """
    shots = events_df[events_df['type'] == 'Shot'].copy()

    xa_records = []

    for _, shot in shots.iterrows():
        record = {
            'shooter': shot['player'],
            'team': shot['team'],
            'shot_xg': shot.get('shot_statsbomb_xg', 0),
            'goal': shot['shot_outcome'] == 'Goal',
            'shot_type': shot.get('shot_type', 'Open Play'),
        }

        # Identify assist context
        key_pass_id = shot.get('shot_key_pass_id')

        if pd.isna(key_pass_id):
            record['assist_type'] = 'unassisted'
            record['passer'] = None
        else:
            key_pass = events_df[events_df['id'] == key_pass_id]
            if len(key_pass) > 0:
                kp = key_pass.iloc[0]
                record['passer'] = kp['player']

                # Classify assist type. StatsBomb flag columns hold True or NaN,
                # and NaN is truthy, so compare against True explicitly.
                if kp.get('pass_cross') == True:
                    record['assist_type'] = 'cross'
                elif kp.get('pass_through_ball') == True:
                    record['assist_type'] = 'through_ball'
                elif kp.get('pass_cut_back') == True:
                    record['assist_type'] = 'cutback'
                else:
                    record['assist_type'] = 'regular_pass'
            else:
                record['assist_type'] = 'unknown'
                record['passer'] = None

        xa_records.append(record)

    return pd.DataFrame(xa_records)
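The frame returned above has one row per shot. A pivot table turns it into a per-passer profile of how xA was created. The sketch assumes only the `passer`, `assist_type` and `shot_xg` columns produced by that function, and the sample rows are invented:

```python
import pandas as pd

# Invented rows in the shape returned by calculate_xa_with_context
shot_context = pd.DataFrame({
    "passer": ["Ana", "Ana", "Ana", "Bea", None],
    "assist_type": ["cross", "cross", "through_ball", "cutback", "unassisted"],
    "shot_xg": [0.04, 0.06, 0.25, 0.18, 0.09],
})

xa_by_type = (
    shot_context.dropna(subset=["passer"])  # unassisted shots carry no xA
    .pivot_table(index="passer", columns="assist_type",
                 values="shot_xg", aggfunc="sum", fill_value=0.0)
)
xa_by_type["total_xA"] = xa_by_type.sum(axis=1)
print(xa_by_type.round(2))
```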

8.3 Key Pass Analysis and Creativity Metrics

Understanding chance creation requires looking beyond the simple xA number to examine the types and qualities of passes that create shooting opportunities.

8.3.1 Key Pass Volume and Quality

A player's creativity profile can be decomposed into two dimensions: how many chances they create (volume) and how good those chances are (quality). These dimensions are somewhat independent--a player can create many low-quality chances (e.g., a winger who crosses constantly but inaccurately) or fewer high-quality ones (e.g., a playmaker who picks the perfect through ball once per match).

def analyze_key_pass_profile(events_df: pd.DataFrame, player_name: str,
                              minutes_played: float) -> dict:
    """
    Create a detailed key pass profile for a player.
    """
    shots = events_df[events_df['type'] == 'Shot'].copy()
    passes = events_df[events_df['type'] == 'Pass']

    # Find all key passes by this player
    key_pass_records = []
    for _, shot in shots.iterrows():
        kp_id = shot.get('shot_key_pass_id')
        if pd.isna(kp_id):
            continue
        kp = passes[passes['id'] == kp_id]
        if len(kp) > 0 and kp.iloc[0]['player'] == player_name:
            key_pass_records.append({
                'shot_xg': shot.get('shot_statsbomb_xg', 0),
                'goal': shot['shot_outcome'] == 'Goal',
                # Flag columns hold True or NaN, and NaN is truthy, so compare explicitly
                'is_cross': kp.iloc[0].get('pass_cross') == True,
                'is_through_ball': kp.iloc[0].get('pass_through_ball') == True,
                'is_cutback': kp.iloc[0].get('pass_cut_back') == True,
            })

    if not key_pass_records:
        return {'player': player_name, 'key_passes': 0}

    kp_df = pd.DataFrame(key_pass_records)
    per90_factor = 90 / minutes_played

    return {
        'player': player_name,
        'key_passes': len(kp_df),
        'key_passes_per90': len(kp_df) * per90_factor,
        'total_xa': kp_df['shot_xg'].sum(),
        'xa_per90': kp_df['shot_xg'].sum() * per90_factor,
        'xa_per_key_pass': kp_df['shot_xg'].mean(),
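        # Proxy for "big chances": key passes creating a shot above 0.30 xG
        'big_chances_created': (kp_df['shot_xg'] > 0.30).sum(),
        'assists': kp_df['goal'].sum(),
        'cross_pct': kp_df['is_cross'].mean(),
        'through_ball_pct': kp_df['is_through_ball'].mean(),
        'cutback_pct': kp_df['is_cutback'].mean(),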
    }
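The two dimensions multiply: xA per 90 equals key passes per 90 times xA per key pass. The sketch below checks this with two invented season profiles, a high-volume wide creator and a low-volume line-breaker (the numbers are illustrative, not real players):

```python
import pandas as pd

profiles = pd.DataFrame(
    {"key_passes": [95, 40], "xA": [3.8, 4.4], "minutes": [2700, 2250]},
    index=["wide_creator", "line_breaker"],  # invented profiles
)
per90 = 90 / profiles["minutes"]
profiles["kp_per90"] = profiles["key_passes"] * per90           # volume
profiles["xa_per_kp"] = profiles["xA"] / profiles["key_passes"]  # quality
profiles["xa_per90"] = profiles["kp_per90"] * profiles["xa_per_kp"]

# The volume x quality product reproduces the direct per-90 rate
assert (profiles["xa_per90"] - profiles["xA"] * per90).abs().max() < 1e-9
print(profiles.round(3))
```

Despite creating well under half as many key passes, the line-breaker ends up with the higher xA per 90 because each chance is worth much more.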

8.3.2 Creativity Typologies

Based on key pass analysis, creative players tend to fall into several archetypes:

The Wide Creator: Generates xA primarily through crosses from wide positions. These players produce a high volume of key passes but with relatively low xA per key pass (typically 0.03-0.05). Examples include traditional wingers and overlapping full-backs. Their total xA can be high purely through volume.

The Line-Breaking Playmaker: Creates chances through incisive through balls and passes between defensive lines. Lower volume of key passes but much higher xA per key pass (0.08-0.15). These players create genuine "big chances" that put attackers in one-on-one situations.

The Progressive Distributor: Creates chances indirectly, through progressive passes that move the ball into dangerous zones. They may not always deliver the killer final pass but consistently advance play to positions where chances emerge. Their xA may be moderate, but their broader contribution to attacking sequences is significant.

The Set-Piece Specialist: Accumulates a substantial portion of xA from corner kicks, free kicks, and throw-ins. When evaluating open-play creativity, it is essential to separate set-piece xA from the total.
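These archetypes are descriptive rather than a formal taxonomy. One way to operationalize them is a simple rule-based tagger. The sketch below is purely illustrative: every threshold in it (the 40% set-piece share, the 0.08 xA-per-key-pass cut, the 50% cross share, the 8 progressive passes per 90) is a hypothetical choice loosely based on figures quoted in this chapter, not an established standard:

```python
def tag_creator(xa_per_key_pass: float, cross_share: float,
                set_piece_xa_share: float, prog_passes_per90: float) -> str:
    """Assign a rough creativity archetype (hypothetical thresholds).

    Rules are checked in order, so a player gets the first archetype they match.
    """
    if set_piece_xa_share >= 0.40:
        return "Set-Piece Specialist"
    if xa_per_key_pass >= 0.08:
        return "Line-Breaking Playmaker"
    if cross_share >= 0.50:
        return "Wide Creator"
    if prog_passes_per90 >= 8:
        return "Progressive Distributor"
    return "Unclassified"


print(tag_creator(0.04, 0.65, 0.10, 4.0))  # Wide Creator
print(tag_creator(0.11, 0.05, 0.05, 7.0))  # Line-Breaking Playmaker
print(tag_creator(0.05, 0.20, 0.55, 5.0))  # Set-Piece Specialist
print(tag_creator(0.06, 0.20, 0.10, 9.0))  # Progressive Distributor
```

The rule order encodes a judgment: heavy set-piece dependence is flagged first because it distorts every other open-play indicator.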

Callout: Separating Open-Play and Set-Piece xA

For accurate creativity assessment, always split xA into components:

  • Open-play xA: Reflects genuine creative ability in dynamic situations
  • Corner xA: Partially reflects delivery quality, partially set-piece design
  • Free kick xA: Similar to corner xA but from different positions
  • Throw-in xA: Rare but relevant for some teams

A player with 8.0 total xA but 3.0 from set pieces has an open-play xA of 5.0, which may tell a very different story than the headline number.
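With StatsBomb-style data, the split can be made from the key pass's `pass_type` field, whose values include `Corner`, `Free Kick` and `Throw-in`. Treating every other key pass as open play is a simplifying assumption of this sketch. The input is assumed to hold one row per assisted shot with `passer`, `pass_type` (empty for ordinary open-play passes) and `shot_xg` columns, and the sample rows are invented:

```python
import pandas as pd

SET_PIECE_TYPES = {"Corner": "corner_xA", "Free Kick": "free_kick_xA", "Throw-in": "throw_in_xA"}

def split_xa_by_phase(assisted_shots: pd.DataFrame) -> pd.DataFrame:
    """Split each passer's xA into open-play and set-piece components."""
    df = assisted_shots.copy()
    # Anything not in the mapping (including missing values) counts as open play
    df["phase"] = df["pass_type"].map(SET_PIECE_TYPES).fillna("open_play_xA")
    split = df.pivot_table(index="passer", columns="phase", values="shot_xg",
                           aggfunc="sum", fill_value=0.0)
    split["total_xA"] = split.sum(axis=1)
    return split


shots = pd.DataFrame({
    "passer": ["Ana", "Ana", "Ana", "Bea"],
    "pass_type": [None, "Corner", "Corner", "Free Kick"],
    "shot_xg": [0.30, 0.05, 0.08, 0.12],
})
print(split_xa_by_phase(shots).round(2))
```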


8.4 Types of Chance-Creating Actions

Not all chances are created equal, and understanding the types of actions that create shots helps contextualize player creativity.

8.4.1 Through Balls

Through balls--passes played into space behind the defense for an attacker to run onto--create some of the highest-quality chances:

def analyze_through_balls(events_df: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze through ball effectiveness.
    """
    passes = events_df[events_df['type'] == 'Pass'].copy()
    through_balls = passes[passes['pass_through_ball'] == True]

    # Link to resulting shots
    shots = events_df[events_df['type'] == 'Shot']

    tb_stats = []
    for _, tb in through_balls.iterrows():
        # Find if this through ball led to a shot
        resulting_shot = shots[
            shots.get('shot_key_pass_id') == tb['id']
        ]

        tb_record = {
            'passer': tb['player'],
            'team': tb['team'],
            'successful': pd.isna(tb.get('pass_outcome')),
            'led_to_shot': len(resulting_shot) > 0,
            'shot_xg': resulting_shot['shot_statsbomb_xg'].iloc[0] if len(resulting_shot) > 0 else 0,
            'goal': (len(resulting_shot) > 0 and
                    resulting_shot['shot_outcome'].iloc[0] == 'Goal')
        }
        tb_stats.append(tb_record)

    tb_df = pd.DataFrame(tb_stats)

    # Summary statistics
    print("Through Ball Analysis:")
    print(f"Total through balls: {len(tb_df)}")
    print(f"Successful: {tb_df['successful'].sum()} ({tb_df['successful'].mean():.1%})")
    print(f"Led to shot: {tb_df['led_to_shot'].sum()} ({tb_df['led_to_shot'].mean():.1%})")
    print(f"Total xA from through balls: {tb_df['shot_xg'].sum():.2f}")
    print(f"Average xG when shot results: {tb_df[tb_df['led_to_shot']]['shot_xg'].mean():.3f}")

    return tb_df

Through balls typically account for only 2-3% of all passes but generate 8-12% of xA, reflecting their high value. The completion rate for through balls is significantly lower than for regular passes (typically 40-50% vs. 80-85%), making them a high-risk, high-reward action. The best creative players in the world are distinguished not just by attempting more through balls, but by completing them at higher rates.
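Claims of the form "X% of passes but Y% of xA" can be checked with a share comparison. The sketch assumes a hypothetical pass-level table with a `pass_category` label and an `xa` column holding the xG of the resulting shot (0 for passes that did not lead to a shot). The labels and numbers are invented:

```python
import pandas as pd

passes = pd.DataFrame({
    "pass_category": ["through_ball"] * 2 + ["cross"] * 8 + ["regular"] * 90,
    "xa": [0.25, 0.0]
          + [0.05, 0.0, 0.0, 0.03, 0.0, 0.0, 0.0, 0.0]
          + [0.0] * 85 + [0.02] * 5,
})

shares = passes.groupby("pass_category").agg(n=("xa", "count"), xa=("xa", "sum"))
shares["pass_share"] = shares["n"] / shares["n"].sum()
shares["xa_share"] = shares["xa"] / shares["xa"].sum()
# Ratio > 1 means the category contributes more xA than its share of passes
shares["leverage"] = shares["xa_share"] / shares["pass_share"]
print(shares.round(3))
```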

Real-World Application: When scouting creative midfielders, through ball frequency and success rate are among the most valued metrics. Players who attempt 3+ through balls per 90 minutes with a success rate above 35% are exceptionally rare -- typically fewer than 20 players across Europe's top five leagues in any given season. This scarcity partly explains why elite playmakers command premium transfer fees.
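A screen like the one described can be written as a pandas filter. The sketch assumes a player-season table with hypothetical columns `minutes`, `through_balls` and `through_balls_completed`, and adds a 900-minute floor (an arbitrary choice) so that small samples do not produce extreme per-90 rates:

```python
import pandas as pd

def through_ball_screen(players: pd.DataFrame, min_minutes: int = 900,
                        min_per90: float = 3.0, min_success: float = 0.35) -> pd.DataFrame:
    """Return players meeting through-ball volume and success-rate thresholds."""
    df = players[players["minutes"] >= min_minutes].copy()
    df["tb_per90"] = df["through_balls"] / df["minutes"] * 90
    # Players with zero attempts get NaN here and fail the comparison below
    df["tb_success"] = df["through_balls_completed"] / df["through_balls"]
    mask = (df["tb_per90"] >= min_per90) & (df["tb_success"] > min_success)
    return df.loc[mask].sort_values("tb_per90", ascending=False)


players = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "minutes": [2700, 2700, 2700, 500],
    "through_balls": [100, 120, 60, 40],
    "through_balls_completed": [40, 30, 30, 20],
})
print(through_ball_screen(players)[["player", "tb_per90", "tb_success"]])
```

Here B fails on success rate, C on volume, and D never qualifies because of the minutes floor.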

8.4.2 Crosses

Crosses--passes from wide areas into the penalty box--are a traditional source of chances:

def analyze_crosses(events_df: pd.DataFrame) -> dict:
    """
    Analyze cross effectiveness and xA contribution.
    """
    passes = events_df[events_df['type'] == 'Pass'].copy()
    crosses = passes[passes['pass_cross'] == True]

    # Success rate
    successful_crosses = crosses[crosses['pass_outcome'].isna()]

    # Crosses leading to shots
    shots = events_df[events_df['type'] == 'Shot']

    crosses_to_shots = 0
    cross_xa = 0

    for _, cross in crosses.iterrows():
        shot = shots[shots.get('shot_key_pass_id') == cross['id']]
        if len(shot) > 0:
            crosses_to_shots += 1
            cross_xa += shot['shot_statsbomb_xg'].iloc[0]

    analysis = {
        'total_crosses': len(crosses),
        'successful': len(successful_crosses),
        'success_rate': len(successful_crosses) / len(crosses) if len(crosses) > 0 else 0,
        'led_to_shot': crosses_to_shots,
        'shot_rate': crosses_to_shots / len(crosses) if len(crosses) > 0 else 0,
        'total_xa': cross_xa,
        'xa_per_cross': cross_xa / len(crosses) if len(crosses) > 0 else 0
    }

    return analysis

Crosses have low individual xA (typically 0.02-0.04 per cross) but their volume means they contribute significantly to team chance creation. Teams vary dramatically in crossing frequency and effectiveness. In a typical Premier League season, the most cross-heavy team may attempt roughly 25 crosses per match, while the least cross-dependent may attempt only 10-12. Yet the relationship between crossing volume and goals scored is surprisingly weak, suggesting that crossing-heavy strategies are not necessarily optimal.

The analytics community has debated the value of crosses extensively. Research consistently shows that cutbacks from similar positions generate higher xG per pass than crosses, but crosses remain a staple of professional tactics because they can be executed quickly against organized defenses and exploit aerial mismatches.

8.4.3 Cutbacks

Cutbacks--passes from the byline back into the penalty area--create high-quality chances:

def identify_cutbacks(events_df: pd.DataFrame) -> pd.DataFrame:
    """
    Identify cutback passes and their effectiveness.

    Cutbacks are passes from near the byline (x > 114 in StatsBomb's
    120 x 80 coordinates) passing backward (end_x < start_x) into the
    penalty area.
    """
    passes = events_df[events_df['type'] == 'Pass'].copy()

    # Extract coordinates
    passes['start_x'] = passes['location'].apply(
        lambda x: x[0] if isinstance(x, list) else None
    )
    passes['start_y'] = passes['location'].apply(
        lambda x: x[1] if isinstance(x, list) else None
    )
    passes['end_x'] = passes['pass_end_location'].apply(
        lambda x: x[0] if isinstance(x, list) else None
    )
    passes['end_y'] = passes['pass_end_location'].apply(
        lambda x: x[1] if isinstance(x, list) else None
    )

    passes = passes.dropna(subset=['start_x', 'end_x'])

    # Cutback criteria
    cutbacks = passes[
        (passes['start_x'] > 114) &  # Near byline
        (passes['end_x'] < passes['start_x']) &  # Passing backward
        (passes['end_x'] > 102) &  # Into penalty area
        (passes['end_y'] > 18) & (passes['end_y'] < 62)  # Within box width
    ]

    print(f"Identified {len(cutbacks)} cutback passes")

    return cutbacks

Cutbacks generate some of the highest xG-per-shot values because they often find attackers unmarked facing the goal. A successful cutback typically creates a shot with 0.15-0.25 xG, compared to 0.03-0.06 for a cross. This has led many analytically-minded teams to prioritize cutback deliveries over traditional crosses.

Best Practice: When analyzing chance creation methods, always compare efficiency (xA per attempt or per key pass) alongside total xA (volume). A winger who delivers 100 crosses generating 4.0 xA (0.04 xA per cross) is less efficient than one who delivers 40 crosses generating 3.2 xA (0.08 xA per cross), even though the first has higher total output. The efficient crosser may be making better decisions about when to cross.
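The efficiency comparison reduces to one division per player. A small helper, with the example's two wingers as input (the helper name is illustrative), guards against players with no attempts:

```python
def xa_per_attempt(xa: float, attempts: int) -> float:
    """Efficiency: expected assists generated per delivery attempted."""
    return xa / attempts if attempts else 0.0


crossers = {"Winger 1": (4.0, 100), "Winger 2": (3.2, 40)}  # (xA, crosses)
for name, (xa, crosses) in crossers.items():
    print(f"{name}: {xa:.1f} xA from {crosses} crosses = "
          f"{xa_per_attempt(xa, crosses):.3f} xA per cross")
```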

8.4.4 Progressive Passes and Their Relationship to xA

Not all valuable passes directly create shots. Progressive passes move the ball significantly toward goal:

def identify_progressive_passes(passes_df: pd.DataFrame,
                                 progression_threshold: float = 12.0) -> pd.DataFrame:
    """
    Identify progressive passes that advance play toward goal.

    Parameters
    ----------
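    passes_df : pd.DataFrame
        Pass data with start_x, end_x and pass_outcome columns
    progression_threshold : float
        Minimum forward progression in pitch units (yards on StatsBomb's
        120 x 80 pitch, where x = 60 is the halfway line)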

    Returns
    -------
    pd.DataFrame
        Progressive passes
    """
    df = passes_df.copy()

    # Calculate progression
    df['progression'] = df['end_x'] - df['start_x']

    # Additional criteria: must be successful
    df['successful'] = df['pass_outcome'].isna()

    # Progressive pass definition
    progressive = df[
        (df['successful']) &
        (df['progression'] >= progression_threshold) &
        (df['end_x'] >= 60)  # Must end in opponent's half
    ]

    return progressive

While progressive passes don't directly appear in xA, tracking them helps identify players who consistently advance play into dangerous positions. Research has shown that progressive passing volume correlates moderately with xA (r = 0.4-0.5), suggesting that players who progress the ball well also tend to create more chances. However, the two skills are distinct: some players excel at progressing the ball into the final third but lack the vision to deliver the killer final pass, while others are poor progressors but deadly in the final third.

The relationship between progressive passing and xA is particularly important for evaluating central midfielders. A midfielder who completes 8-10 progressive passes per 90 minutes is significantly advancing their team's attacking potential, even if their direct xA is modest.
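Both the per-90 benchmark and the correlation quoted above are straightforward to compute on your own data. In this sketch the player table and its columns (`minutes`, `progressive_passes`, `xA`) are hypothetical. With so few invented rows, the resulting coefficient only illustrates the calculation and says nothing about the real relationship:

```python
import pandas as pd

players = pd.DataFrame({
    "minutes": [2500, 3000, 1800, 2700, 2200],
    "progressive_passes": [250, 180, 190, 120, 260],
    "xA": [6.1, 3.9, 4.4, 2.0, 5.2],
})

per90 = 90 / players["minutes"]
players["prog_per90"] = players["progressive_passes"] * per90
players["xa_per90"] = players["xA"] * per90

# Pearson correlation (the Series.corr default) between per-90 rates
r = players["prog_per90"].corr(players["xa_per90"])
print(players.round(2))
print(f"r = {r:.2f}")
```

Correlating per-90 rates rather than season totals keeps playing time from inflating the relationship, since players with more minutes accumulate more of both.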

Callout: Progressive Passes as a Leading Indicator

Progressive passing metrics can serve as a leading indicator of future xA production. When a player joins a new team and plays in a more advanced role, their progressive passing numbers from a previous season can help predict how much xA they will generate. This is because progressive passing measures a fundamental skill (ability to identify and execute forward passes under pressure) that translates across tactical systems.

8.4.5 Smart Passes and Switch of Play

Some data providers track specific pass types that indicate tactical intelligence:

Smart passes: Unexpected passes that break defensive lines. Wyscout defines these as creative and penetrative passes that attempt to break the opposition's lines. They have a lower completion rate than average passes but create higher-quality chances when successful.

Switches: Long passes that change the point of attack. While switches rarely create immediate shooting opportunities, they can exploit defensive imbalances and create space for subsequent attacking actions.

Ground duels won leading to passes: Winning the ball and immediately creating a chance demonstrates both physical and technical quality.
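Where a provider does not tag switches directly, pass coordinates allow a rough geometric approximation. The rule below is an assumption for illustration only: a completed pass that travels at least 40 units across the pitch's width (half of StatsBomb's 80-unit width). It is not a standard definition:

```python
import pandas as pd


def flag_switches(passes: pd.DataFrame, min_lateral: float = 40.0) -> pd.Series:
    """Heuristic switch-of-play flag: completed passes with large lateral travel.

    Assumes start_y/end_y columns on a pitch 80 units wide; the default
    threshold is hypothetical.
    """
    lateral = (passes["end_y"] - passes["start_y"]).abs()
    completed = passes["pass_outcome"].isna()
    return completed & (lateral >= min_lateral)


passes = pd.DataFrame({
    "start_y": [10.0, 10.0, 5.0],
    "end_y": [60.0, 30.0, 75.0],
    "pass_outcome": [None, None, "Incomplete"],
})
print(flag_switches(passes).tolist())  # [True, False, False]
```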

def analyze_pass_types(events_df: pd.DataFrame) -> pd.DataFrame:
    """
    Break down chance creation by pass type.
    """
    # Get shots with their key passes
    shots = events_df[events_df['type'] == 'Shot'].copy()
    passes = events_df[events_df['type'] == 'Pass']

    pass_type_xa = {
        'through_ball': {'count': 0, 'xa': 0, 'goals': 0},
        'cross': {'count': 0, 'xa': 0, 'goals': 0},
        'cutback': {'count': 0, 'xa': 0, 'goals': 0},
        'regular': {'count': 0, 'xa': 0, 'goals': 0},
        'set_piece': {'count': 0, 'xa': 0, 'goals': 0}
    }

    for _, shot in shots.iterrows():
        kp_id = shot.get('shot_key_pass_id')
        if pd.isna(kp_id):
            continue

        kp = passes[passes['id'] == kp_id]
        if len(kp) == 0:
            continue

        kp = kp.iloc[0]
        xg = shot.get('shot_statsbomb_xg', 0)
        goal = shot['shot_outcome'] == 'Goal'

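        # Classify pass type. Set pieces are checked first because corners are
        # often also flagged as crosses. The key pass's own pass_type marks a
        # set-piece delivery; shot_type 'Corner'/'Free Kick' only marks direct
        # shots, which have no key pass. Flag columns hold True or NaN (NaN is
        # truthy), so they are compared against True explicitly.
        if kp.get('pass_type') in ('Corner', 'Free Kick'):
            pass_type = 'set_piece'
        elif kp.get('pass_through_ball') == True:
            pass_type = 'through_ball'
        elif kp.get('pass_cross') == True:
            pass_type = 'cross'
        elif kp.get('pass_cut_back') == True:
            pass_type = 'cutback'
        else:
            pass_type = 'regular'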

        pass_type_xa[pass_type]['count'] += 1
        pass_type_xa[pass_type]['xa'] += xg
        pass_type_xa[pass_type]['goals'] += int(goal)

    # Create summary DataFrame
    summary = pd.DataFrame(pass_type_xa).T
    summary['xa_per_pass'] = summary['xa'] / summary['count'].replace(0, np.nan)

    return summary

8.5 Shot-Creating Actions (SCA) Framework

While xA focuses on the final pass, the Shot-Creating Actions framework from FBref/StatsBomb broadens the view to include all actions contributing to shot creation.

8.5.1 Defining Shot-Creating Actions

An SCA is any of the following among the two offensive actions directly preceding a shot:

  1. Pass leading to shot (the traditional key pass), from open play or a dead ball
  2. Pass leading to another pass leading to shot (the "second assist")
  3. Dribble (take-on) leading to shot
  4. Shot leading to another shot (for example, a rebound after a save)
  5. Foul drawn leading to shot (winning a free kick in a dangerous area)
  6. Defensive action leading to shot (an interception or tackle that starts a counter)

The two-action window captures the reality that many goals involve rapid sequences: a pass, a touch, and a shot. By looking at the two preceding actions rather than just one, SCA credits players whose contributions are one step removed from the final pass.

def calculate_sca(events_df: pd.DataFrame, n_actions: int = 2) -> pd.DataFrame:
    """
    Calculate Shot-Creating Actions for each player.

    Parameters
    ----------
    events_df : pd.DataFrame
        Event data
    n_actions : int
        Number of preceding actions to consider (default 2)

    Returns
    -------
    pd.DataFrame
        Player SCA statistics
    """
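    # Sort into event order within each match. 'match_id' and 'index' are
    # the StatsBomb columns for this; sorting on minute and second alone
    # interleaves events from different matches.
    events = events_df.sort_values(['match_id', 'index']).reset_index(drop=True)

    shots = events[events['type'] == 'Shot']
    sca_records = []

    for shot_idx in shots.index:
        shot = events.loc[shot_idx]

        # Preceding offensive actions by the shooting team in the same match
        # and possession. Carries are left out (FBref does not count them);
        # otherwise the shooter's own carry would often fill the window.
        # StatsBomb logs tackles as 'Duel' events rather than 'Tackle'.
        preceding = events[
            (events.index < shot_idx) &
            (events['match_id'] == shot['match_id']) &
            (events['possession'] == shot['possession']) &
            (events['team'] == shot['team']) &
            (events['type'].isin(['Pass', 'Dribble', 'Shot', 'Foul Won',
                                   'Interception', 'Duel']))
        ].tail(n_actions)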

        for _, action in preceding.iterrows():
            sca_records.append({
                'player': action['player'],
                'team': action['team'],
                'action_type': action['type'],
                'shot_xg': shot.get('shot_statsbomb_xg', 0),
                'goal': shot['shot_outcome'] == 'Goal',
                'is_primary': action.name == preceding.index[-1]  # Last action
            })

    sca_df = pd.DataFrame(sca_records)

    # Aggregate by player
    player_sca = sca_df.groupby(['player', 'team']).agg({
        'shot_xg': ['count', 'sum'],
        'goal': 'sum',
        'is_primary': 'sum'
    })
    player_sca.columns = ['SCA', 'SCA_xG', 'GCA', 'primary_SCA']
    player_sca = player_sca.reset_index()

    return player_sca.sort_values('SCA', ascending=False)

8.5.2 Goal-Creating Actions (GCA)

GCA narrows the focus to actions preceding goals rather than all shots:

def calculate_gca(events_df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate Goal-Creating Actions.
    """
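    # Sort into event order within each match ('match_id' and 'index' are
    # the StatsBomb columns); minute and second alone mix matches together.
    events = events_df.sort_values(['match_id', 'index']).reset_index(drop=True)

    goals = events[(events['type'] == 'Shot') &
                   (events['shot_outcome'] == 'Goal')]

    gca_records = []

    for goal_idx in goals.index:
        goal = events.loc[goal_idx]

        # Two preceding offensive actions by the scoring team in the same
        # match and possession; carries are excluded, and StatsBomb logs
        # tackles as 'Duel' events
        preceding = events[
            (events.index < goal_idx) &
            (events['match_id'] == goal['match_id']) &
            (events['possession'] == goal['possession']) &
            (events['team'] == goal['team']) &
            (events['type'].isin(['Pass', 'Dribble', 'Shot', 'Foul Won',
                                   'Interception', 'Duel']))
        ].tail(2)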

        for _, action in preceding.iterrows():
            gca_records.append({
                'player': action['player'],
                'team': action['team'],
                'action_type': action['type'],
            })

    gca_df = pd.DataFrame(gca_records)

    # Count by player
    player_gca = gca_df.groupby(['player', 'team']).size().reset_index(name='GCA')

    return player_gca.sort_values('GCA', ascending=False)

8.5.3 SCA vs. xA: When to Use Each

| Metric | Best For | Limitations |
|---|---|---|
| xA | Direct creativity measurement, comparing finishers' service | Misses build-up contributions |
| SCA | Holistic involvement in attacking moves | Can over-credit peripheral involvement |
| GCA | Measuring contribution to actual goals | Small sample size issues |
| Key Passes | Raw volume of chance creation | No quality weighting |

For player evaluation, using multiple metrics together provides the most complete picture. A player who ranks highly on xA per 90, SCA per 90, and progressive passes per 90 is almost certainly a genuinely creative player. One who ranks highly on only one of these metrics may have a more limited creative profile.

Advanced: FBref reports both SCA and GCA at the player level, broken down by action type (pass live, pass dead, dribble, shot, foul drawn, defensive action). This granular breakdown lets you distinguish between players who create through passing versus dribbling versus drawing fouls -- each indicating a different creative profile.
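To reproduce that kind of breakdown from event data, the per-action records built inside `calculate_sca` can be cross-tabulated by player and action type. The sketch below is a minimal version under stated assumptions: `sca_breakdown_by_type` is a hypothetical helper, the input is assumed to have one row per credited action with `player` and `action_type` columns, and raw event labels are kept rather than mapped onto FBref's categories.

```python
import pandas as pd


def sca_breakdown_by_type(sca_records: pd.DataFrame) -> pd.DataFrame:
    """Count each player's SCAs by action type, with the share of each type.

    Assumes one row per credited action with 'player' and 'action_type'
    columns (a hypothetical input layout). Raw event labels are used;
    they are not mapped onto FBref's categories.
    """
    counts = pd.crosstab(sca_records['player'], sca_records['action_type'])
    # Fraction of each player's SCAs that came from each action type
    shares = counts.div(counts.sum(axis=1), axis=0).add_suffix('_share')
    result = counts.join(shares)
    result['total'] = counts.sum(axis=1)
    return result


# Toy records: player A creates mostly by passing, player B by drawing fouls
records = pd.DataFrame({
    'player': ['A', 'A', 'A', 'B'],
    'action_type': ['Pass', 'Pass', 'Dribble', 'Foul Won'],
})
breakdown = sca_breakdown_by_type(records)
print(breakdown)
```

The share columns make creative profiles comparable across players with very different SCA volumes.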


8.6 xA vs Actual Assists: Regression to the Mean

8.6.1 Understanding the Divergence

The gap between a player's actual assists and their xA provides insight into finishing quality around them and potential regression. Like goals vs. xG, assists vs. xA tend to regress toward expected values over time.

def analyze_xa_performance(player_data: pd.DataFrame,
                            min_xa: float = 3.0) -> pd.DataFrame:
    """
    Analyze which players over/underperform their xA.

    Parameters
    ----------
    player_data : pd.DataFrame
        Player xA and assist data
    min_xa : float
        Minimum xA for inclusion

    Returns
    -------
    pd.DataFrame
        xA performance analysis
    """
    df = player_data[player_data['xA'] >= min_xa].copy()

    df['assist_vs_xa'] = df['assists'] - df['xA']
    df['assist_conversion_rate'] = df['assists'] / df['xA']

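    # Identify over/underperformers. include_lowest=True keeps players
    # with zero assists (a ratio of exactly 0) in the lowest bin.
    df['performance_category'] = pd.cut(
        df['assist_conversion_rate'],
        bins=[0, 0.7, 0.9, 1.1, 1.3, float('inf')],
        labels=['Very Unlucky', 'Unlucky', 'Expected',
                'Lucky', 'Very Lucky'],
        include_lowest=True
    )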

    return df.sort_values('assist_vs_xa', ascending=False)

8.6.2 Why Regression Happens

The regression of assists toward xA occurs because the gap between actual assists and xA is driven primarily by teammate finishing variance, which is largely random over typical sample sizes. Several factors contribute:

Teammate finishing streaks. If a player's teammates happen to be in a hot finishing streak, more key passes will be converted to assists than xA would predict. This is not sustainable.

Shot quality within xG bands. xG provides a probability, but not all 0.20 xG shots are identical. Some are well-struck and heading for the corner; others are scuffed and easily saved. This within-band variation creates noise in the assist-to-xA ratio.

Sample size. Most creative players create 40-80 key passes per season. With conversion rates of 10-15%, this means only 4-12 of those passes become assists. The statistical noise in such small samples is enormous.

Research by multiple analytics groups has found that the year-on-year correlation of assists-minus-xA is approximately 0.10-0.15, indicating that nearly all the over/underperformance is noise rather than signal. By contrast, the year-on-year correlation of xA itself is much higher (0.50-0.65), confirming that chance creation quality is a more stable skill than assist conversion.
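The mechanism behind these correlations can be illustrated with a toy simulation. In the sketch below, each hypothetical player has a fixed key-pass volume and chance quality, and teammates convert every chance with probability exactly equal to its xG, so there is no finishing skill anywhere in the model. The distributions and parameters are illustrative assumptions, and the resulting coefficients are not a reproduction of the figures quoted above.

```python
import numpy as np

rng = np.random.default_rng(7)
n_players = 400

# Hypothetical "true" creative level: key passes per season and mean xG per key pass
true_kp = rng.integers(30, 100, size=n_players)
true_quality = rng.uniform(0.06, 0.18, size=n_players)


def simulate_season():
    """One season of xA and assists per player, with no finishing skill at all."""
    xa = np.empty(n_players)
    assists = np.empty(n_players)
    for i in range(n_players):
        n_kp = rng.poisson(true_kp[i])  # volume fluctuates around the true level
        # Beta(2, 2/q - 2) has mean q, so chance quality centres on true_quality
        xg = rng.beta(2, 2 / true_quality[i] - 2, size=n_kp)
        xa[i] = xg.sum()
        # Teammates convert each chance with probability equal to its xG
        assists[i] = (rng.random(n_kp) < xg).sum()
    return xa, assists


xa_1, assists_1 = simulate_season()
xa_2, assists_2 = simulate_season()

r_xa = np.corrcoef(xa_1, xa_2)[0, 1]
r_luck = np.corrcoef(assists_1 - xa_1, assists_2 - xa_2)[0, 1]
print(f"Year-on-year r of xA:           {r_xa:.2f}")
print(f"Year-on-year r of assists - xA: {r_luck:.2f}")
```

Because finishing is pure chance here, assists minus xA carries no information from one season to the next, while xA stays correlated because the underlying volume and chance quality persist.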

8.6.3 Convergence Timelines and Practical Expectations

A natural question is: how quickly do actual assists converge toward xA? The answer depends on the volume of key passes a player produces. For a highly creative midfielder generating roughly 100 key passes per season, the assists-to-xA ratio typically stabilizes within a single full season. For less prolific creators who produce only 30-50 key passes, meaningful convergence may require 18-24 months of data.

This has important implications for mid-season evaluation. If a player is significantly over- or underperforming their xA halfway through a campaign, there is strong statistical reason to expect regression in the second half, but the magnitude of that regression is uncertain because the sample is still small. Analysts should resist the temptation to declare a player "lucky" or "unlucky" based on fewer than 40 key passes; instead, they should flag the divergence as a monitoring item and revisit it once more data has accumulated.

In transfer contexts, the safest approach is to weight xA more heavily than actual assists whenever fewer than two full seasons of data are available, since xA provides a more stable signal of underlying creative quality.
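A rough way to see why volume matters is to treat each key pass as a shot of the same xG, converted at exactly that rate. Assists are then binomial, and the spread of the assists-to-xA ratio from finishing luck alone falls with the square root of the number of key passes. The helper `ratio_noise` and its default of 0.12 xG per key pass (in line with the 10-15% conversion rates cited earlier) are assumptions for illustration.

```python
import math


def ratio_noise(n_key_passes: int, mean_xg_per_kp: float = 0.12) -> float:
    """Approximate 1-SD spread of assists / xA caused by finishing luck alone.

    Simplifying assumptions: every key pass creates a shot of the same xG
    (mean_xg_per_kp), and teammates finish at exactly that rate, so assists
    are Binomial(n, p) with mean n * p = xA. The SD of assists / xA is then
    sqrt(n * p * (1 - p)) / (n * p) = sqrt((1 - p) / (n * p)).
    """
    p = mean_xg_per_kp
    return math.sqrt((1 - p) / (n_key_passes * p))


for n in (30, 50, 100, 200):
    print(f"{n:>3} key passes: assists/xA = 1.00 +/- {ratio_noise(n):.2f}")
```

Under these assumptions the one-standard-deviation band is roughly ±0.49 at 30 key passes and ±0.27 at 100, and doubling volume narrows it only by a factor of about 1/√2. Real chances vary in quality, so treat these figures as a rough guide rather than a precise estimate.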

Callout: Using xA to Set Realistic Assist Targets

When clubs sign a creative player, setting an assist target based on prior actual assists is a common mistake. A more robust approach is to project assists from xA:

  1. Calculate the player's open-play xA per 90 from the last two seasons
  2. Estimate the minutes they will play in the new environment
  3. Multiply to get projected xA for the season
  4. Apply a conversion factor of 0.95-1.05 (assuming league-average finishing by new teammates)

This method avoids inflated expectations from seasons where teammates over-converted, and provides a defensible baseline for performance evaluation.
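The four steps in the callout translate directly into arithmetic. The sketch below uses a hypothetical helper, `project_assists`, and made-up inputs purely to show the calculation; the 0.95-1.05 range is the one suggested in the callout.

```python
def project_assists(open_play_xa: float, minutes_played: float,
                    projected_minutes: float,
                    conversion_factor: float = 1.0) -> dict:
    """Project next season's assists from prior open-play xA.

    open_play_xa and minutes_played cover the last two seasons;
    projected_minutes is the estimate for the new environment;
    conversion_factor is the finishing adjustment (1.0 = league average).
    """
    xa_per90 = open_play_xa / (minutes_played / 90)         # step 1
    projected_xa = xa_per90 * (projected_minutes / 90)        # steps 2-3
    return {
        'xa_per90': xa_per90,
        'projected_xa': projected_xa,
        'projected_assists': projected_xa * conversion_factor,  # step 4
    }


# Hypothetical player: 12.0 open-play xA in 5,400 minutes, ~2,700 minutes expected
low = project_assists(12.0, 5400, 2700, conversion_factor=0.95)
high = project_assists(12.0, 5400, 2700, conversion_factor=1.05)
print(f"xA/90: {low['xa_per90']:.2f}, projected xA: {low['projected_xa']:.1f}")
print(f"Assist target range: {low['projected_assists']:.1f}-{high['projected_assists']:.1f}")
```

Reporting a range rather than a single number keeps the target honest about the finishing uncertainty the club cannot control.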

Callout: The Regression Trap in Transfer Markets

A common mistake in player recruitment is signing a player based on a high assist season without examining the underlying xA. If a player recorded 14 assists from 7.0 xA, they benefited from exceptional teammate finishing. Expecting 14 assists in the next season (or at a new club) is unrealistic--the player's true creation level is closer to 7 assists. Clubs that understand this distinction gain a significant edge in the transfer market.


8.7 Player Creativity Profiles

8.7.1 Building Comprehensive Profiles

Raw xA totals require context to be meaningful:

def contextualize_xa(player_xa: pd.DataFrame,
                      player_minutes: pd.DataFrame) -> pd.DataFrame:
    """
    Add context to xA statistics.
    """
    df = player_xa.merge(player_minutes, on=['player', 'team'])

    # Per-90 statistics
    df['xA_per90'] = df['xA'] / (df['minutes'] / 90)
    df['key_passes_per90'] = df['key_passes'] / (df['minutes'] / 90)

    # Efficiency metrics
    # Replace zero xA with NaN to avoid infinite ratios
    df['assists_per_xA'] = df['assists'] / df['xA'].replace(0, np.nan)
    df['conversion_diff'] = df['assists'] - df['xA']

    return df.sort_values('xA_per90', ascending=False)

Key contextual factors:

Position: Attacking midfielders and wingers naturally accumulate more xA than defenders. A center-back with 0.05 xA per 90 is highly creative for the position, while an attacking midfielder with the same number would be below average.

Team style: Players in possession-heavy teams have more passing opportunities. A player at Manchester City or Barcelona will naturally attempt more passes in advanced areas than one at a counter-attacking team, inflating their raw xA.

Teammates: Playing with clinical finishers converts more xA to assists, but the quality of teammates also affects the types of chances available. Playing alongside a world-class striker who makes intelligent runs creates more through-ball opportunities.

8.7.2 Identifying Creative Players

Beyond raw xA, we can identify creativity through pass characteristics:

from collections import defaultdict

def analyze_creativity_profile(events_df: pd.DataFrame,
                                player: str) -> dict:
    """
    Build a creativity profile for a specific player.

    Assumes StatsBomb-style columns, where boolean flags such as
    'pass_through_ball' are True or NaN.
    """
    player_passes = events_df[
        (events_df['type'] == 'Pass') &
        (events_df['player'] == player)
    ].copy()

    # xG of each shot, keyed by the id of the pass that set it up
    shots = events_df[events_df['type'] == 'Shot']
    xg_by_key_pass = (
        shots.dropna(subset=['shot_key_pass_id'])
             .set_index('shot_key_pass_id')['shot_statsbomb_xg']
             .fillna(0)
    )

    # This player's key passes and the xA each one generated
    key_passes = player_passes[player_passes['id'].isin(xg_by_key_pass.index)]
    key_pass_xa = key_passes['id'].map(xg_by_key_pass)

    def count_flag(col: str) -> int:
        # Missing flags are NaN, so compare with True; absent columns count as 0
        if col not in player_passes.columns:
            return 0
        return int((player_passes[col] == True).sum())

    profile = {
        'player': player,
        'total_passes': len(player_passes),
        # A missing pass_outcome means the pass was completed in StatsBomb data
        'successful_passes': int(player_passes['pass_outcome'].isna().sum()),
        'key_passes': len(key_passes),
        'total_xa': float(key_pass_xa.sum()),
        # Pass type breakdown
        'through_balls': count_flag('pass_through_ball'),
        'crosses': count_flag('pass_cross'),
        # 'pass_progressive' is not a native StatsBomb field; it is counted
        # only if a progressive-pass flag has been derived beforehand
        'progressive_passes': count_flag('pass_progressive'),
    }

    # Sum xA by where the key passes ended (StatsBomb 120 x 80 pitch)
    xa_by_zone = defaultdict(float)
    for end_loc, xa in zip(key_passes['pass_end_location'], key_pass_xa):
        if not isinstance(end_loc, list):
            continue
        end_x, end_y = end_loc[0], end_loc[1]
        if end_x > 102 and 18 < end_y < 62:
            xa_by_zone['penalty_area'] += xa
        elif end_x > 80:
            xa_by_zone['final_third'] += xa
        else:
            xa_by_zone['midfield'] += xa
    profile['xa_by_zone'] = dict(xa_by_zone)

    return profile

8.7.3 Comparing Playmakers Across Leagues

When comparing playmakers across different leagues, several adjustments are necessary:

League tempo adjustment. Leagues with more possessions per match (e.g., the Bundesliga) naturally produce more passing actions than slower-paced leagues. Normalizing by possessions rather than minutes can provide fairer comparisons.

Defensive quality adjustment. Creating chances against Serie A defenses may be more difficult than against less organized leagues. While no perfect adjustment exists, comparing a player's xA to the league average for their position provides some control.

Age curves. Creative passing peaks at a different age than physical attributes. Research suggests that xA per 90 tends to peak between ages 25-29 for most creative midfielders, with a gradual decline thereafter. Young players (21-23) may show raw creative talent that has not yet been refined.
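The league tempo adjustment can be sketched by dividing xA by the possessions a player's team had while they were on the pitch, rather than by minutes. This is a minimal version under assumptions: the helper name and the `team_possessions_on_pitch` column are hypothetical, and possession counts would have to be derived from event data (for example, from a possession identifier) beforehand.

```python
import pandas as pd


def add_possession_adjusted_xa(df: pd.DataFrame) -> pd.DataFrame:
    """Add xA per 100 team possessions alongside xA per 90.

    Assumes hypothetical columns 'xA', 'minutes' and 'team_possessions_on_pitch'
    (possessions the player's team had while the player was on the pitch).
    """
    out = df.copy()
    out['xA_per90'] = out['xA'] / (out['minutes'] / 90)
    out['xA_per100_poss'] = 100 * out['xA'] / out['team_possessions_on_pitch']
    return out


# Same xA per 90, but player B's league produces fewer possessions per match
players = pd.DataFrame({
    'player': ['A', 'B'],
    'xA': [6.0, 6.0],
    'minutes': [2700, 2700],
    'team_possessions_on_pitch': [3000, 2400],
})
adjusted = add_possession_adjusted_xa(players)
print(adjusted[['player', 'xA_per90', 'xA_per100_poss']])
```

Both players show 0.20 xA per 90, but player B generated the same xA from fewer possessions (0.25 versus 0.20 per 100), which the per-90 view hides.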

def compare_playmakers_cross_league(player_data: pd.DataFrame) -> pd.DataFrame:
    """
    Compare creative players across leagues with normalization.
    """
    df = player_data.copy()

    # Calculate league averages for the position
    league_avg = df.groupby(['league', 'position']).agg({
        'xA_per90': 'mean',
        'key_passes_per90': 'mean',
    }).rename(columns={
        'xA_per90': 'league_avg_xA_per90',
        'key_passes_per90': 'league_avg_kp_per90'
    })

    df = df.merge(league_avg, on=['league', 'position'])

    # Calculate above-average metrics
    df['xA_above_avg'] = df['xA_per90'] - df['league_avg_xA_per90']
    df['xA_percentile'] = df.groupby(['league', 'position'])['xA_per90'].rank(pct=True)

    return df.sort_values('xA_above_avg', ascending=False)

Common Pitfall: When comparing xA between players in different teams or leagues, raw numbers can be misleading. A creative midfielder at Manchester City will have more passing opportunities (and therefore more potential key passes) than one at a relegation-threatened club. Always use per-90 rates and consider the team context when making cross-team comparisons.


8.8 Team-Level Chance Creation Analysis

8.8.1 Team Chance Creation Profiles

Compare how teams create chances:

def analyze_team_chance_creation(match_data: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze team-level chance creation patterns.
    """
    team_stats = match_data.groupby('team').agg({
        'total_xa': 'mean',
        'key_passes': 'mean',
        'crosses': 'mean',
        'through_balls': 'mean',
        'cutbacks': 'mean',
        'progressive_passes': 'mean'
    }).round(2)

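    # Ratios to key passes; these are true shares only if 'crosses' and
    # 'through_balls' count key passes of that type rather than all attempts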
    team_stats['cross_share'] = (
        team_stats['crosses'] /
        team_stats['key_passes']
    ).round(2)

    team_stats['through_ball_share'] = (
        team_stats['through_balls'] /
        team_stats['key_passes']
    ).round(2)

    return team_stats.sort_values('total_xa', ascending=False)

Teams with high xA but low goals may have finishing problems; teams with low xA but competitive goal totals may be over-relying on individual brilliance or set-piece efficiency.

8.8.2 Chance Creation Distribution

Analyzing how evenly xA is distributed across a squad reveals tactical dependency:

Concentrated creation: If one player accounts for more than 30% of a team's open-play xA, the team is heavily dependent on that individual. Injury or transfer of that player would significantly impact attacking output.

Distributed creation: Teams where multiple players contribute 10-15% of total xA are more robust to individual absences. These teams typically have more varied attacking patterns and are harder to defend against.

def analyze_creation_distribution(team_xa_data: pd.DataFrame, team: str) -> dict:
    """
    Analyze how evenly xA is distributed across a team's squad.
    """
    team_data = team_xa_data[team_xa_data['team'] == team].sort_values('xA', ascending=False)
    total_xa = team_data['xA'].sum()

    # Concentration metrics
    team_data['xa_share'] = team_data['xA'] / total_xa
    team_data['cumulative_share'] = team_data['xa_share'].cumsum()

    # How many players account for 50% of xA?
    # Players strictly below 50% cumulative share, plus the one who crosses it
    players_for_50pct = (team_data['cumulative_share'] < 0.50).sum() + 1

    # Herfindahl index (measure of concentration)
    herfindahl = (team_data['xa_share'] ** 2).sum()

    return {
        'team': team,
        'total_xa': total_xa,
        'top_creator_share': team_data['xa_share'].iloc[0],
        'top_creator': team_data['passer'].iloc[0],
        'players_for_50pct': players_for_50pct,
        'herfindahl_index': herfindahl,
        'creative_depth': 1 / herfindahl  # Effective number of creators
    }
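The Herfindahl index and its reciprocal are easy to sanity-check by hand. The sketch below compares two hypothetical five-player creative cores whose xA shares sum to 1; `concentration` is an illustrative helper mirroring the last two fields returned above.

```python
import numpy as np


def concentration(shares) -> tuple:
    """Return the Herfindahl index and effective number of creators for xA shares."""
    shares = np.asarray(shares, dtype=float)
    h = float((shares ** 2).sum())
    return h, 1 / h


# Hypothetical squads whose xA shares each sum to 1
concentrated = [0.40, 0.20, 0.20, 0.10, 0.10]
even = [0.20, 0.20, 0.20, 0.20, 0.20]
for label, s in [('concentrated', concentrated), ('even', even)]:
    h, n_eff = concentration(s)
    print(f"{label:>12}: H = {h:.2f}, effective creators = {n_eff:.2f}")
```

The concentrated squad gives H = 0.26 (about 3.85 effective creators), against H = 0.20 and exactly five effective creators for the even split.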

8.8.3 Partnership Analysis

Analyze creative partnerships between passers and shooters:

def analyze_partnerships(events_df: pd.DataFrame,
                          min_connections: int = 5) -> pd.DataFrame:
    """
    Identify effective passer-shooter partnerships.
    """
    shots = events_df[events_df['type'] == 'Shot'].copy()
    passes = events_df[events_df['type'] == 'Pass']

    partnerships = []

    for _, shot in shots.iterrows():
        kp_id = shot.get('shot_key_pass_id')
        if pd.isna(kp_id):
            continue

        kp = passes[passes['id'] == kp_id]
        if len(kp) == 0:
            continue

        partnerships.append({
            'passer': kp.iloc[0]['player'],
            'shooter': shot['player'],
            'team': shot['team'],
            'xg': shot.get('shot_statsbomb_xg', 0),
            'goal': shot['shot_outcome'] == 'Goal'
        })

    partner_df = pd.DataFrame(partnerships)

    # Aggregate by partnership
    partnership_stats = partner_df.groupby(
        ['passer', 'shooter', 'team']
    ).agg({
        'xg': ['count', 'sum'],
        'goal': 'sum'
    })
    partnership_stats.columns = ['connections', 'total_xg', 'goals']
    partnership_stats = partnership_stats.reset_index()

    # Filter to meaningful partnerships
    partnership_stats = partnership_stats[
        partnership_stats['connections'] >= min_connections
    ]

    partnership_stats['xg_per_connection'] = (
        partnership_stats['total_xg'] /
        partnership_stats['connections']
    )

    return partnership_stats.sort_values('total_xg', ascending=False)

Strong partnerships--where specific passer-shooter combinations generate unusually high xG--indicate tactical patterns or natural chemistry that coaches can encourage. When a key partnership is broken (through injury or transfer), the impact on xA production can be significant.


8.9 Building xA Models

While we typically use the shot's xG as the xA value, we can also build models that predict assist probability directly.

8.9.1 Pass-Based Assist Probability

Model the probability that a pass results in an assist:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss, roc_auc_score

def build_xa_model(passes_df: pd.DataFrame) -> tuple:
    """
    Build a model predicting assist probability from pass features.

    Parameters
    ----------
    passes_df : pd.DataFrame
        Pass data with features and assist outcome

    Returns
    -------
    tuple
        (model, feature_names, metrics)
    """
    df = passes_df.copy()

    # Feature engineering
    df['start_x'] = df['location'].apply(
        lambda x: x[0] if isinstance(x, list) else None
    )
    df['start_y'] = df['location'].apply(
        lambda x: x[1] if isinstance(x, list) else None
    )
    df['end_x'] = df['pass_end_location'].apply(
        lambda x: x[0] if isinstance(x, list) else None
    )
    df['end_y'] = df['pass_end_location'].apply(
        lambda x: x[1] if isinstance(x, list) else None
    )

    df = df.dropna(subset=['start_x', 'end_x'])

    # Calculate derived features
    df['pass_distance'] = np.sqrt(
        (df['end_x'] - df['start_x'])**2 +
        (df['end_y'] - df['start_y'])**2
    )
    df['end_distance_to_goal'] = np.sqrt(
        (120 - df['end_x'])**2 + (40 - df['end_y'])**2
    )
    df['progression'] = df['end_x'] - df['start_x']
    df['into_box'] = ((df['end_x'] > 102) &
                       (df['end_y'] > 18) &
                       (df['end_y'] < 62)).astype(int)

    # Pass type features
    df['is_cross'] = df['pass_cross'].fillna(False).astype(int)
    df['is_through_ball'] = df['pass_through_ball'].fillna(False).astype(int)

    # Target: did this pass lead to a goal?
    df['is_assist'] = df['pass_goal_assist'].fillna(False).astype(int)

    # Features
    feature_cols = [
        'start_x', 'start_y', 'end_x', 'end_y',
        'pass_distance', 'end_distance_to_goal', 'progression',
        'into_box', 'is_cross', 'is_through_ball'
    ]

    X = df[feature_cols].values
    y = df['is_assist'].values

    # Split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # Train model
    model = GradientBoostingClassifier(
        n_estimators=100, max_depth=4,
        min_samples_leaf=20, random_state=42
    )
    model.fit(X_train, y_train)

    # Evaluate
    y_pred = model.predict_proba(X_test)[:, 1]
    metrics = {
        'log_loss': log_loss(y_test, y_pred),
        'roc_auc': roc_auc_score(y_test, y_pred)
    }

    return model, feature_cols, metrics

8.9.2 xA vs. xG-Based xA: Philosophical Considerations

There is a philosophical question about how to calculate xA:

xG-based xA: Credit the passer with the shot's xG value.
  • Pros: Accounts for shot quality created
  • Cons: Does not capture pass difficulty or quality

Model-based xA: Predict assist probability from pass features.
  • Pros: Values the pass itself, independent of shot execution
  • Cons: Does not distinguish a pass creating a tap-in vs. a difficult shot

Both approaches have merit; many analysts use xG-based xA for its simplicity and direct connection to goal-scoring. The model-based approach is more useful when you want to evaluate the pass itself rather than the resulting opportunity.

Intuition: The distinction between xG-based xA and model-based xA is analogous to valuing a real estate agent by the price of homes they sell (outcome-based) versus the quality of their salesmanship (process-based). The xG-based approach says "this player creates chances worth X goals." The model-based approach says "this player makes passes that are Y difficult and Z valuable." Both perspectives provide useful information.


8.10 Limitations and Alternatives

8.10.1 Sample Size Considerations

xA requires substantial sample sizes for reliability:

def estimate_xa_sample_requirements(per_match_sd: float = 0.25,
                                    margin: float = 0.05,
                                    apps_per_season: int = 30) -> float:
    """
    Estimate sample sizes needed for reliable xA conclusions.

    Uses the normal approximation n = (1.96 * sigma / margin) ** 2, where
    sigma is the match-to-match standard deviation of a player's xA per 90.
    The default of 0.25 is an illustrative assumption for a creative
    midfielder averaging around 0.20 xA per 90, not a measured value.
    """
    # 90-minute samples needed for a 95% CI of +/- margin
    required_90s = (1.96 * per_match_sd / margin) ** 2

    print("Sample Size Requirements for xA Analysis:")
    print("-" * 50)
    print(f"To estimate true xA/90 within +/-{margin} (95% CI):")
    print(f"  Required: ~{required_90s:.0f} 90-minute equivalents")
    print(f"  Approximately: {required_90s * 90 / 60:.0f} hours of play")
    print(f"  About: {required_90s / apps_per_season:.1f} full seasons "
          f"(at ~{apps_per_season} full appearances)")
    return required_90s

8.10.2 Contextual Factors

Several factors affect xA interpretation:

Teammate quality: Playing with fast forwards who make intelligent runs inflates xA, because they get into better shooting positions; clinical finishing, by contrast, inflates actual assists rather than xA. A creative midfielder's xA can drop noticeably if a key striker is injured and replaced by a less mobile alternative.

Team tactics: High-possession teams create more passing opportunities; counter-attacking teams may create fewer but higher-quality chances. Normalizing by possession can help, but does not fully address the issue.

Opposition strength: xA accumulated against weak opponents may not transfer to tougher competition. A player in a weaker league may appear more creative than they truly are.

Set pieces: Designated set-piece takers accumulate "easy" xA from crosses and free kicks. Always examine open-play xA separately.
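Separating open-play from set-piece xA is straightforward once each key pass carries its pass type. The sketch below assumes one row per key pass with `player`, `pass_type` and `xa` columns; the set of dead-ball labels follows StatsBomb's `pass_type` naming as an assumption and should be checked against the data in use.

```python
import pandas as pd

# Assumed StatsBomb 'pass_type' labels for dead-ball deliveries; verify against your data
SET_PIECE_PASS_TYPES = {'Corner', 'Free Kick', 'Throw-in'}


def split_open_play_xa(key_passes: pd.DataFrame) -> pd.DataFrame:
    """Split each player's xA into open-play and set-piece parts.

    Assumes one row per key pass with 'player', 'pass_type' (missing for
    ordinary open-play passes) and 'xa' (the xG of the resulting shot).
    """
    phase = key_passes['pass_type'].isin(SET_PIECE_PASS_TYPES).map(
        {True: 'set_piece_xA', False: 'open_play_xA'})
    out = (key_passes.assign(phase=phase)
           .groupby(['player', 'phase'])['xa'].sum()
           .unstack(fill_value=0.0)
           .reindex(columns=['open_play_xA', 'set_piece_xA'], fill_value=0.0))
    # Share of each player's xA that came from dead-ball deliveries
    out['set_piece_share'] = out['set_piece_xA'] / out.sum(axis=1)
    return out


key_passes = pd.DataFrame({
    'player': ['A', 'A', 'A', 'B'],
    'pass_type': [None, 'Corner', 'Corner', None],
    'xa': [0.30, 0.10, 0.10, 0.20],
})
split = split_open_play_xa(key_passes)
print(split)
```

A high `set_piece_share` flags a designated taker whose headline xA overstates their open-play creativity.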

8.10.3 What xA Does Not Capture

Several creative contributions escape xA measurement:

  • Passes before the key pass: Build-up play that creates space for the final ball
  • Decoy runs: Movement that creates space for teammates--entirely invisible to event data
  • Hold-up play: Retaining possession to allow teammates to advance positions
  • Drawing defenders: Dribbles that attract defenders, opening passing lanes for others
  • Decision-making quality: Choosing not to pass when a better option emerges is a creative skill that no metric captures

8.10.4 Alternatives: xGChain and xGBuildup

To address xA's limitation of only crediting the final pass, several extended metrics have been developed:

xGChain credits every player involved in a possession sequence that ends in a shot. If a possession involves passes from Players A, B, C, and D before a shot with 0.20 xG, each player receives 0.20 xGChain. This metric captures total involvement in chance-creating possessions.

xGBuildup is similar to xGChain but excludes the shooter and the assister. This specifically isolates build-up contributions from the final actions, making it ideal for evaluating deep-lying playmakers and ball-progressing defenders.

The relationship between xA, xGChain, and xGBuildup is hierarchical in terms of what each metric captures. xA is the narrowest, crediting only the final pass. xGChain is the broadest, crediting every player in the possession. xGBuildup sits in between, specifically targeting the players who contributed to the build-up but did not deliver the final ball or take the shot. When evaluating a central midfielder, comparing their xA rank to their xGBuildup rank is highly informative. A midfielder who ranks much higher in xGBuildup than in xA is primarily a facilitator--someone who gets the ball into dangerous areas but relies on others for the final action. Conversely, a midfielder who ranks higher in xA than xGBuildup is a "final-ball" specialist who may not contribute as much to the overall build-up but is deadly when the opportunity for a key pass arises. The most complete creative midfielders rank highly in both metrics, indicating that they contribute throughout the attacking sequence from build-up through the final pass.
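The rank comparison described above can be sketched with percentile ranks. The helper `creative_role`, its column names, and the thresholds (a 0.25 percentile gap, and 0.75 on both ranks for a "complete" label) are hypothetical choices for illustration rather than established cut-offs; in practice the comparison should be made within a single position group.

```python
import pandas as pd


def creative_role(df: pd.DataFrame, gap: float = 0.25,
                  complete_pct: float = 0.75) -> pd.DataFrame:
    """Label players by comparing their xA and xGBuildup percentile ranks.

    Assumes hypothetical per-90 columns 'xA_per90' and 'xGBuildup_per90'.
    The gap and complete_pct thresholds are illustrative cut-offs.
    """
    out = df.copy()
    out['xA_pct'] = out['xA_per90'].rank(pct=True)
    out['buildup_pct'] = out['xGBuildup_per90'].rank(pct=True)
    diff = out['buildup_pct'] - out['xA_pct']
    out['role'] = 'balanced'
    out.loc[diff >= gap, 'role'] = 'facilitator'
    out.loc[diff <= -gap, 'role'] = 'final-ball specialist'
    # Players ranked highly on both metrics override the gap-based labels
    both_high = (out['xA_pct'] >= complete_pct) & (out['buildup_pct'] >= complete_pct)
    out.loc[both_high, 'role'] = 'complete'
    return out


mids = pd.DataFrame({
    'player': ['P1', 'P2', 'P3', 'P4'],
    'xA_per90': [0.30, 0.10, 0.25, 0.05],
    'xGBuildup_per90': [0.50, 0.45, 0.10, 0.05],
})
roles = creative_role(mids)
print(roles[['player', 'xA_pct', 'buildup_pct', 'role']])
```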

def calculate_xg_chain(events_df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate xGChain: credit all players in a possession leading to a shot.
    """
    shots = events_df[events_df['type'] == 'Shot']

    chain_records = []

    for _, shot in shots.iterrows():
        xg = shot.get('shot_statsbomb_xg', 0)
        team = shot['team']
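        possession_id = shot.get('possession')

        if pd.isna(possession_id):
            continue

        # Actions by the same team in this possession. StatsBomb possession
        # numbers restart in every match, so match_id is filtered as well.
        possession_events = events_df[
            (events_df['match_id'] == shot['match_id']) &
            (events_df['possession'] == possession_id) &
            (events_df['team'] == team)
        ]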

        # Credit each unique player
        players_involved = possession_events['player'].dropna().unique()

        for player in players_involved:
            chain_records.append({
                'player': player,
                'team': team,
                'xg_chain': xg
            })

    chain_df = pd.DataFrame(chain_records)
    return chain_df.groupby(['player', 'team'])['xg_chain'].sum().reset_index()
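xGBuildup can be sketched by reusing the xGChain logic and dropping the two final contributors to each shot. This is a simplified, per-shot reading of the description in Section 8.10.4 (Understat's own implementation may draw the line differently), and it assumes StatsBomb-style columns, including `shot_key_pass_id` to identify the assister. The helper name `calculate_xg_buildup` is hypothetical.

```python
import pandas as pd


def calculate_xg_buildup(events_df: pd.DataFrame) -> pd.DataFrame:
    """
    Sketch of xGBuildup: xGChain credit minus the shooter and the key passer.

    Assumes StatsBomb-style columns: 'id', 'match_id', 'possession', 'team',
    'player', 'type', 'shot_key_pass_id' and 'shot_statsbomb_xg'.
    """
    shots = events_df[events_df['type'] == 'Shot']
    player_by_event_id = events_df.set_index('id')['player']
    records = []

    for _, shot in shots.iterrows():
        if pd.isna(shot.get('possession')):
            continue
        possession_events = events_df[
            (events_df['match_id'] == shot['match_id']) &
            (events_df['possession'] == shot['possession']) &
            (events_df['team'] == shot['team'])
        ]
        # Drop the final two contributors: the shooter and (if any) the key passer
        excluded = {shot['player']}
        kp_id = shot.get('shot_key_pass_id')
        if pd.notna(kp_id) and kp_id in player_by_event_id.index:
            excluded.add(player_by_event_id[kp_id])

        for player in possession_events['player'].dropna().unique():
            if player not in excluded:
                records.append({'player': player, 'team': shot['team'],
                                'xg_buildup': shot.get('shot_statsbomb_xg', 0)})

    result = pd.DataFrame(records, columns=['player', 'team', 'xg_buildup'])
    return result.groupby(['player', 'team'], as_index=False)['xg_buildup'].sum()


# Toy possession: A -> B -> C (key pass) -> D shoots with 0.2 xG
events = pd.DataFrame({
    'id': ['e1', 'e2', 'e3', 'e4'],
    'match_id': [1, 1, 1, 1],
    'possession': [5, 5, 5, 5],
    'team': ['X', 'X', 'X', 'X'],
    'player': ['A', 'B', 'C', 'D'],
    'type': ['Pass', 'Pass', 'Pass', 'Shot'],
    'shot_key_pass_id': [None, None, None, 'e3'],
    'shot_statsbomb_xg': [None, None, None, 0.2],
})
buildup = calculate_xg_buildup(events)
print(buildup)
```

In the toy possession, A and B each receive 0.2 xGBuildup, while C (the assister) and D (the shooter) receive nothing, which is the distinction that makes the metric useful for deep-lying players.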

Callout: Choosing the Right Creativity Metric

| Metric | Best For | Key Limitation |
|---|---|---|
| xA | Evaluating final-ball quality | Ignores build-up |
| SCA/90 | Overall attacking involvement | Over-credits peripheral actions |
| xGChain | Total involvement in attacking possessions | Credits everyone equally |
| xGBuildup | Build-up play contribution | Ignores final-third actions |
| Progressive Passes | Ball progression ability | No direct link to goals |

The best approach uses multiple metrics together. A player who ranks highly on xA AND xGBuildup AND progressive passes is a truly complete creative player.


8.11 Applications of Chance Creation Analysis

8.11.1 Player Scouting

xA and chance creation metrics are essential for scouting creative players:

def scout_creative_players(player_data: pd.DataFrame,
                            position_filter: list = None,
                            min_minutes: int = 900) -> pd.DataFrame:
    """
    Identify top creative players for scouting.

    Parameters
    ----------
    player_data : pd.DataFrame
        Player statistics including xA
    position_filter : list, optional
        Positions to include
    min_minutes : int
        Minimum playing time

    Returns
    -------
    pd.DataFrame
        Scouting shortlist
    """
    df = player_data[player_data['minutes'] >= min_minutes].copy()

    if position_filter:
        df = df[df['position'].isin(position_filter)]

    # Calculate scouting metrics
    df['xA_per90'] = df['xA'] / (df['minutes'] / 90)
    df['key_passes_per90'] = df['key_passes'] / (df['minutes'] / 90)
    df['progressive_passes_per90'] = df['progressive_passes'] / (df['minutes'] / 90)
    df['through_balls_per90'] = df['through_balls'] / (df['minutes'] / 90)

    # Composite creativity score: a weighted sum with illustrative weights
    # (not a calibrated model); tune the weights before real scouting use
    df['creativity_score'] = (
        df['xA_per90'] * 10 +          # Primary metric
        df['key_passes_per90'] * 0.5 +  # Volume of chances
        df['through_balls_per90'] * 2   # High-value pass type
    )

    # Rank players
    scouting_cols = [
        'player', 'team', 'position', 'age', 'minutes',
        'xA', 'xA_per90', 'key_passes_per90',
        'creativity_score'
    ]

    return df[scouting_cols].sort_values('creativity_score', ascending=False)

8.11.2 Team Tactical Analysis

As noted in Section 8.8.1, comparing a team's xA with its goals helps separate finishing problems from reliance on individual brilliance. Comparing a team's xA distribution across match situations (leading, drawing, trailing) adds a further dimension, showing how its creativity changes under pressure.
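A minimal sketch of the game-state split, assuming one row per assisted shot with the score just before the shot: the helper and its column names (`team_score`, `opp_score`, `xg`) are hypothetical. It returns raw totals; because teams spend very different amounts of time leading, drawing and trailing, these should be divided by the minutes spent in each state before being compared.

```python
import numpy as np
import pandas as pd


def xa_by_game_state(assisted_shots: pd.DataFrame) -> pd.DataFrame:
    """Total key passes and xA per team, split by the score at the time of the shot.

    Assumes hypothetical columns: 'team', 'xg' (the shot's xG, i.e. the key
    pass's xA) and 'team_score' / 'opp_score' (score just before the shot).
    """
    diff = assisted_shots['team_score'] - assisted_shots['opp_score']
    state = np.select([diff > 0, diff < 0], ['leading', 'trailing'], default='drawing')
    return (assisted_shots.assign(game_state=state)
            .groupby(['team', 'game_state'])['xg']
            .agg(key_passes='count', xA='sum')
            .reset_index())


shots = pd.DataFrame({
    'team': ['X', 'X', 'X', 'X'],
    'xg': [0.10, 0.30, 0.20, 0.05],
    'team_score': [0, 1, 0, 1],
    'opp_score': [0, 0, 1, 1],
})
by_state = xa_by_game_state(shots)
print(by_state)
```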


8.12 Summary

Expected Assists and chance creation metrics provide essential tools for understanding how teams and players create goal-scoring opportunities:

  1. xA quantifies creativity by crediting passers with the xG of shots they create, removing dependency on teammate finishing
  2. The relationship between xA and xG is direct: xA equals the xG of the resulting shot, inheriting the strengths and weaknesses of the underlying xG model
  3. Key pass analysis reveals different creative styles: through-ball playmakers, crossing specialists, and progressive distributors
  4. Different pass types (through balls, crosses, cutbacks) have varying xA contributions and indicate different creative styles
  5. Shot-Creating Actions (SCA) broaden the view beyond the final pass to include all contributions to shot creation
  6. Assists regress toward xA because the gap is driven primarily by teammate finishing variance, which is largely random
  7. Player creativity profiles should incorporate multiple metrics (xA, SCA, progressive passes, xGBuildup) for a complete picture
  8. Cross-league comparisons require normalization for league tempo, defensive quality, and playing style
  9. Team-level analysis reveals chance creation concentration, tactical dependencies, and partnership dynamics
  10. Context matters for interpreting xA: position, team style, teammates, and set-piece duties all affect accumulation
  11. Alternatives like xGChain and xGBuildup address xA's limitation of only crediting the final pass
  12. Sample size requirements mean xA conclusions require substantial data (multiple seasons for individuals)

The next chapter explores Expected Threat (xT), which extends beyond passes-before-shots to value all actions that move the ball toward goal.


Key Formulas

Basic xA Calculation: $$xA_{player} = \sum_{i=1}^{n} xG_i$$

Where $xG_i$ is the expected goals value of shot $i$ that resulted from the player's key pass.

xA per 90: $$xA_{90} = \frac{xA_{total}}{Minutes / 90}$$

Assist vs xA Difference: $$\Delta_{assists} = Assists - xA$$

Positive values indicate over-performance; negative values indicate under-performance.

xA per Key Pass: $$xA_{\text{per KP}} = \frac{xA_{total}}{\text{Key Passes}}$$

This measures the average quality of chances created per opportunity.

Herfindahl Index for Creation Concentration: $$H = \sum_{i=1}^{n} s_i^2$$

Where $s_i$ is player $i$'s share of total team xA. Higher values indicate more concentrated creation.


References

  1. StatsBomb (2020). "Expected Assists: A Better Way to Measure Creativity." StatsBomb Blog.
  2. Caley, M. (2015). "Expected Assists and Shot Quality." Cartilage Free Captain.
  3. FBref (2021). "Shot-Creating Actions Methodology." Sports Reference.
  4. Decroos, T. et al. (2019). "Actions Speak Louder than Goals." KDD Conference.
  5. Spearman, W. (2018). "Beyond Expected Goals." MIT Sloan Sports Analytics Conference.
  6. Understat (2021). "xGChain and xGBuildup Methodology." Understat.com.
  7. Singh, K. (2019). "Introducing Expected Threat (xT)." Karun.in.