Case Study 1: Goalkeeper Performance at the 2018 World Cup

Overview

The 2018 World Cup showcased diverse goalkeeper styles and performances, from Thibaut Courtois's commanding presence for Belgium to Hugo Lloris's occasionally dramatic interventions for eventual champions France. This case study analyzes goalkeeper performance across the tournament, demonstrating how modern metrics reveal nuances invisible to traditional statistics.

Research Questions

  1. Which goalkeepers performed above or below expectations at the 2018 World Cup?
  2. How did distribution patterns vary among top goalkeepers?
  3. What role did sweeper-keeper actions play in team success?
  4. How do traditional metrics (clean sheets, goals conceded) compare to advanced metrics (goals prevented)?

Data and Methodology

Data Source

  • StatsBomb event data for 2018 FIFA World Cup
  • All 64 matches analyzed

Goalkeepers Analyzed

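The shot-stopping analysis focuses on goalkeepers whose teams reached at least the Round of 16 and who faced enough shots on target for a usable sample. The distribution and sweeper sections also include Manuel Neuer as a stylistic reference, although Germany exited in the group stage.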

Metrics Calculated

  • Goals prevented (xG faced minus goals conceded; positive values indicate better-than-expected shot-stopping)
  • Save percentage and save percentage above expected
  • Distribution statistics (pass completion, progressive passes)
  • Sweeper actions (outside box recoveries, clearances)
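
The first two metrics reduce to simple arithmetic on per-goalkeeper totals. The sketch below is a minimal illustration rather than the study's exact implementation: the function names and sample numbers are hypothetical, and the expected save rate assumes pre-shot xG summed over on-target shots only, which is one of several reasonable definitions.

```python
def goals_prevented(xg_faced, goals_conceded):
    """Positive values mean fewer goals conceded than the shots' xG implied."""
    return xg_faced - goals_conceded


def save_pct_above_expected(saves, shots_on_target, xg_on_target):
    """Actual save rate minus the save rate implied by on-target xG.

    Assumption: xg_on_target is the xG summed over on-target shots, so the
    expected save rate is 1 - xg_on_target / shots_on_target.
    """
    if shots_on_target == 0:
        return 0.0
    actual = saves / shots_on_target
    expected = 1 - xg_on_target / shots_on_target
    return actual - expected


# Hypothetical totals for illustration only (not tournament data)
gp = goals_prevented(xg_faced=5.0, goals_conceded=4)
spae = save_pct_above_expected(saves=16, shots_on_target=20, xg_on_target=5.0)
```

In this made-up example the goalkeeper saves 80% of on-target shots against an expected 75%, so both metrics come out positive.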

Analysis

Part 1: Shot-Stopping Performance

def analyze_tournament_goalkeepers(competition_id=43, season_id=3):
    """Aggregate shot-stopping stats per team, as a proxy for its goalkeeper."""
    from statsbombpy import sb
    import pandas as pd

    matches = sb.matches(competition_id=competition_id, season_id=season_id)

    goalkeeper_stats = {}

    for _, match in matches.iterrows():
        events = sb.events(match_id=match['match_id'])

        for team in [match['home_team'], match['away_team']]:
            opponent = match['away_team'] if team == match['home_team'] else match['home_team']

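            # Get shots against, excluding penalty shootouts
            # (StatsBomb records shootout kicks as period 5)
            shots_against = events[
                (events['team'] == opponent) &
                (events['type'] == 'Shot') &
                (events['period'] < 5)
            ]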

            on_target = shots_against[
                shots_against['shot_outcome'].isin(['Saved', 'Goal', 'Saved to Post'])
            ]

            xg_faced = shots_against['shot_statsbomb_xg'].sum()
            goals = len(shots_against[shots_against['shot_outcome'] == 'Goal'])
            saves = len(on_target[on_target['shot_outcome'].isin(['Saved', 'Saved to Post'])])

            if team not in goalkeeper_stats:
                goalkeeper_stats[team] = {
                    'matches': 0,
                    'xg_faced': 0,
                    'goals': 0,
                    'saves': 0,
                    'shots_on_target': 0
                }

            goalkeeper_stats[team]['matches'] += 1
            goalkeeper_stats[team]['xg_faced'] += xg_faced
            goalkeeper_stats[team]['goals'] += goals
            goalkeeper_stats[team]['saves'] += saves
            goalkeeper_stats[team]['shots_on_target'] += len(on_target)

    # Calculate derived metrics
    results = []
    for team, stats in goalkeeper_stats.items():
        if stats['shots_on_target'] > 10:  # Minimum sample
            results.append({
                'team': team,
                'matches': stats['matches'],
                'xg_faced': stats['xg_faced'],
                'goals_conceded': stats['goals'],
                'goals_prevented': stats['xg_faced'] - stats['goals'],
                'saves': stats['saves'],
                'shots_on_target': stats['shots_on_target'],
                'save_percentage': stats['saves'] / stats['shots_on_target'] if stats['shots_on_target'] > 0 else 0
            })

    return pd.DataFrame(results).sort_values('goals_prevented', ascending=False)

Tournament Shot-Stopping Rankings:

Rank  Team     Goalkeeper  xG Faced  Goals  Goals Prevented  Save %
1     Croatia  Subasic     13.5      8      +3.5             75%
2     Belgium  Courtois    10.2      6      +2.2             78%
3     Brazil   Alisson     4.1       3      +1.1             80%
4     England  Pickford    9.8       7      +0.8             73%
5     France   Lloris      7.4       6      -0.6             71%

Key Findings:

  1. Danijel Subasic (Croatia) recorded the highest goals prevented (+3.5), driven by strong performances against Argentina and Russia. His penalty shootout saves against Denmark and Russia are not part of this figure, because shootouts are excluded from the main analysis.

  2. Thibaut Courtois won the Golden Glove even though Belgium conceded 6 goals. His +2.2 goals prevented is consistent with the award, whereas goals conceded alone would have understated his tournament.

  3. Hugo Lloris had slightly negative goals prevented (-0.6), suggesting France's defensive success came more from limiting shot quality than exceptional goalkeeping.

Part 2: Distribution Analysis

def analyze_goalkeeper_distribution(events_df, team_name):
    """Analyze goalkeeper distribution patterns."""
    team_events = events_df[events_df['team'] == team_name]

    # Identify the goalkeeper from the position recorded on each event.
    # Filtering on pitch location alone can pick a defender who passes
    # often from deep areas.
    gk_pass_events = team_events[
        (team_events['type'] == 'Pass') &
        (team_events['position'] == 'Goalkeeper')
    ]

    # If more than one goalkeeper appeared, keep the one with the most passes
    goalkeepers = gk_pass_events['player'].value_counts()
    if len(goalkeepers) == 0:
        return None

    gk_name = goalkeepers.index[0]

    gk_passes = team_events[
        (team_events['player'] == gk_name) &
        (team_events['type'] == 'Pass')
    ]

    successful = gk_passes[gk_passes['pass_outcome'].isna()]
    total = len(gk_passes)

    # Length analysis
    short = gk_passes[gk_passes['pass_length'] < 20]
    long = gk_passes[gk_passes['pass_length'] >= 40]

    return {
        'goalkeeper': gk_name,
        'total_passes': total,
        'success_rate': len(successful) / total if total > 0 else 0,
        'long_pass_pct': len(long) / total if total > 0 else 0,
        'short_passes': len(short),
        'long_passes': len(long)
    }

Distribution Style Comparison:

Team     Goalkeeper  Pass Success  Long Pass %  Style
Spain    De Gea      89%           22%          Build-out
Germany  Neuer       86%           31%          Balanced
Brazil   Alisson     84%           28%          Balanced
Belgium  Courtois    78%           45%          Launcher
England  Pickford    75%           52%          Launcher
France   Lloris      71%           48%          Launcher

Distribution Insights:

  • Spain's approach under De Gea emphasized short passing, aligning with their possession philosophy despite early elimination.
  • Germany's Manuel Neuer maintained his sweeper-keeper style but Germany's poor tournament masked his distribution quality.
  • Belgium, England, and France all relied on direct distribution (45% to 52% long passes), which may reflect tactical instructions to bypass the press.
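
The Style column can be reproduced from the long-pass share with a simple threshold rule. The cut-offs below (25% and 40%) are assumptions chosen to match the table, not thresholds stated by the study, and the function name is hypothetical.

```python
def classify_distribution_style(long_pass_pct, build_out_max=0.25, launcher_min=0.40):
    """Label a goalkeeper by long-pass share.

    The default thresholds are illustrative assumptions, not published values.
    """
    if long_pass_pct < build_out_max:
        return 'Build-out'
    if long_pass_pct >= launcher_min:
        return 'Launcher'
    return 'Balanced'


# Long-pass shares taken from the comparison table above
long_pass_share = {
    'De Gea': 0.22, 'Neuer': 0.31, 'Alisson': 0.28,
    'Courtois': 0.45, 'Pickford': 0.52, 'Lloris': 0.48,
}
styles = {gk: classify_distribution_style(pct) for gk, pct in long_pass_share.items()}
```

A single-variable rule like this ignores pass success and pass destination, so it is best read as a rough label rather than a full description of distribution style.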

Part 3: Sweeper-Keeper Actions

def analyze_sweeper_actions(events_df, team_name):
    """Analyze sweeper-keeper actions."""
    team_events = events_df[events_df['team'] == team_name]

    # Keep only events performed by the player in the goalkeeper position
    gk_events = team_events[team_events['position'] == 'Goalkeeper']

    def outside_own_box(loc):
        """Return True if a location lies outside the team's own penalty area.

        StatsBomb pitches are 120 x 80 and each team's own events are
        oriented left to right, so the own box spans x <= 18, 18 <= y <= 62.
        """
        if not isinstance(loc, list) or len(loc) < 2:
            return False
        x, y = loc[0], loc[1]
        return x > 18 or y < 18 or y > 62

    # Sweeper actions are goalkeeper events that happen outside the box
    outside_box = gk_events[
        gk_events['location'].apply(outside_own_box).astype(bool)
    ]

    recoveries = outside_box[outside_box['type'] == 'Ball Recovery']
    clearances = outside_box[outside_box['type'] == 'Clearance']
    interceptions = outside_box[outside_box['type'] == 'Interception']

    return {
        'team': team_name,
        'total_clearances': len(clearances),
        'total_recoveries': len(recoveries),
        'interceptions': len(interceptions),
        'sweeper_actions': len(clearances) + len(recoveries) + len(interceptions)
    }

Sweeper Activity Rankings:

Rank  Team     Sweeper Actions  Defensive Line  Notes
1     Germany  12               High            Neuer's traditional style
2     Spain    8                High            De Gea less active than club
3     Belgium  7                Medium-High     Courtois balanced approach
4     Brazil   6                Medium          Alisson measured interventions
5     France   4                Medium-Low      Lloris conservative

Sweeper Analysis:

Germany's high sweeper activity reflected Neuer's playing style, but their defensive vulnerabilities exposed him to more challenging situations. France's low sweeper activity aligned with their conservative defensive approach—Lloris rarely needed to come off his line because the defense limited space behind.
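
Raw counts favor teams that played more matches, so a per-match rate makes the comparison fairer. The sketch below assumes the counts in the table are team totals and uses each team's number of matches at the tournament (Germany 3, Spain 4, Brazil 5, Belgium and France 7). Per-90 rates that account for extra time would be more precise; the helper name is hypothetical.

```python
# Sweeper action totals from the rankings table above
sweeper_actions = {'Germany': 12, 'Spain': 8, 'Belgium': 7, 'Brazil': 6, 'France': 4}

# Matches each team played at the 2018 World Cup
matches_played = {'Germany': 3, 'Spain': 4, 'Belgium': 7, 'Brazil': 5, 'France': 7}


def per_match(counts, matches):
    """Divide each team's count by its matches played, skipping teams with none."""
    return {
        team: counts[team] / matches[team]
        for team in counts
        if matches.get(team, 0) > 0
    }


rates = per_match(sweeper_actions, matches_played)
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

On a per-match basis the gap widens: Germany averages 4 sweeper actions per match against fewer than 1 for France.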

Part 4: Traditional vs. Advanced Metrics

Comparison Table:

Goalkeeper  Clean Sheets  Goals Conceded  Goals Prevented  Difference
Courtois    3             6               +2.2             Advanced favors
Lloris      3             6               -0.6             Traditional favors
Subasic     1             8               +3.5             Advanced favors
Pickford    2             7               +0.8             Slight advanced
Alisson     4             3               +1.1             Traditional favors

Key Insight: Subasic had just 1 clean sheet and conceded 8 goals—traditional metrics would rank him poorly. But his +3.5 goals prevented reveals exceptional shot-stopping that kept Croatia in matches they might otherwise have lost.
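
One way to make the comparison concrete is to rank the five goalkeepers under each metric and look at how far each one moves. This is an illustrative sketch using the values in the comparison table; the helper name is hypothetical, and ties (Courtois and Lloris both conceded 6) are broken by input order.

```python
keepers = [
    {'goalkeeper': 'Courtois', 'goals_conceded': 6, 'goals_prevented': 2.2},
    {'goalkeeper': 'Lloris', 'goals_conceded': 6, 'goals_prevented': -0.6},
    {'goalkeeper': 'Subasic', 'goals_conceded': 8, 'goals_prevented': 3.5},
    {'goalkeeper': 'Pickford', 'goals_conceded': 7, 'goals_prevented': 0.8},
    {'goalkeeper': 'Alisson', 'goals_conceded': 3, 'goals_prevented': 1.1},
]


def rank_by(records, key, reverse):
    """Return {goalkeeper: rank}, where rank 1 is best under the chosen metric."""
    ordered = sorted(records, key=lambda r: r[key], reverse=reverse)
    return {r['goalkeeper']: i + 1 for i, r in enumerate(ordered)}


# Fewer goals conceded is better; more goals prevented is better
traditional_rank = rank_by(keepers, 'goals_conceded', reverse=False)
advanced_rank = rank_by(keepers, 'goals_prevented', reverse=True)

# Positive shift: the advanced metric ranks the goalkeeper higher
rank_shift = {
    name: traditional_rank[name] - advanced_rank[name]
    for name in traditional_rank
}
```

Subasic moves from last on goals conceded to first on goals prevented, the largest shift in the group.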

Key Findings

1. Shot-Stopping Excellence Often Hidden

Traditional metrics masked Courtois's and Subasic's excellence while potentially overstating Lloris's contribution. Goals prevented gives a fairer evaluation because it adjusts for the quality of shots faced, although it is not fully independent of the defense in front of the goalkeeper.

2. Distribution Reflects Tactics

Goalkeeper distribution style closely aligned with team tactical approach:

  • Possession teams used build-out goalkeepers
  • Counter-attacking teams used direct distribution
  • Style classification matched tactical expectations

3. Sweeper Activity Depends on System

High sweeper activity isn't inherently better—it reflects tactical requirements. France's success with minimal sweeping shows that system fit matters more than raw activity levels.

4. Sample Size Limitations

Even a full tournament provides limited data. Courtois faced approximately 30 shots on target, barely enough for stable statistics. Tournament analysis should acknowledge this uncertainty.
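
The uncertainty can be quantified with a confidence interval on save percentage. The sketch below uses the Wilson score interval, which is a standard choice for proportions with small samples; the study itself does not specify an interval method, so this is an assumption, and the inputs are rounded figures from the text (78% on roughly 30 shots on target).

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion p_hat observed over n trials.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError('n must be positive')
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Approximate inputs from the text: 78% save rate, about 30 shots on target
low, high = wilson_interval(0.78, 30)
```

For these inputs the interval runs from roughly 60% to 89%, wide enough to overlap the save percentages of every goalkeeper in the rankings table.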

Practical Applications

For National Team Coaches

  1. Goalkeeper Selection: Consider system fit alongside raw ability
  2. Tactical Planning: Distribution style affects buildup options
  3. Match Preparation: Understand opponent goalkeeper tendencies

For Analysts

  1. Context Matters: Same goalkeeper may perform differently in different systems
  2. Multiple Metrics: No single metric captures goalkeeper quality
  3. Uncertainty: Tournament samples require cautious interpretation

For Media and Fans

  1. Beyond Clean Sheets: Goals prevented provides fairer evaluation
  2. Style Recognition: Different styles suit different teams
  3. Acknowledge Variance: Short tournaments have high randomness

Limitations

  1. Sample Size: Tournament data insufficient for precise individual evaluation
  2. Team Context: Defensive quality affects all goalkeeper metrics
  3. PSxG Unavailable: Analysis used xG rather than PSxG due to data limitations
  4. Penalty Shootouts: Excluded from main analysis despite significant impact
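
Shootout kicks can be separated from open-play shots and reported on their own. The sketch below works on plain shot records (for example, rows from an events table converted to dictionaries) and relies on StatsBomb labelling shootouts as period 5. The function names and sample records are hypothetical, and counting off-target kicks as attempts faced is a design choice, not the study's stated method.

```python
SHOOTOUT_PERIOD = 5  # StatsBomb records penalty shootout kicks as period 5


def split_shootout_shots(shots):
    """Split shot records into (regular play, shootout) lists by period."""
    regular = [s for s in shots if s.get('period') != SHOOTOUT_PERIOD]
    shootout = [s for s in shots if s.get('period') == SHOOTOUT_PERIOD]
    return regular, shootout


def shootout_save_rate(shootout_shots):
    """Share of shootout kicks saved; None if no kicks were faced."""
    faced = len(shootout_shots)
    saved = sum(1 for s in shootout_shots if s.get('shot_outcome') == 'Saved')
    return saved / faced if faced else None


# Hypothetical records for illustration only (not tournament data)
shots = [
    {'period': 1, 'shot_outcome': 'Goal'},
    {'period': 2, 'shot_outcome': 'Saved'},
    {'period': 5, 'shot_outcome': 'Saved'},
    {'period': 5, 'shot_outcome': 'Goal'},
    {'period': 5, 'shot_outcome': 'Off T'},
]
regular, shootout = split_shootout_shots(shots)
rate = shootout_save_rate(shootout)
```

Reporting the shootout rate next to goals prevented keeps the two samples separate while still giving credit for shootout performance.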

Conclusion

The 2018 World Cup demonstrated that goalkeeper evaluation requires multiple perspectives. Thibaut Courtois's Golden Glove is supported by his positive goals prevented (+2.2), while Hugo Lloris lifted the trophy behind a defense that limited his workload. Danijel Subasic's shot-stopping, the strongest in this sample at +3.5 goals prevented, was hidden by traditional statistics but revealed through expected goals analysis; his shootout saves fall outside that figure.

Modern goalkeeper evaluation must balance:

  • Shot-stopping quality (goals prevented)
  • Distribution contribution (pass completion, progression)
  • Sweeper effectiveness (where tactically required)
  • Sample size awareness (uncertainty acknowledgment)

Discussion Questions

  1. Should the Golden Glove award criteria include expected goals metrics?
  2. How might Neuer's statistics look if Germany had performed better?
  3. What distribution style would suit a team transitioning to possession play?
  4. How should penalty shootout performance factor into goalkeeper evaluation?

References

  1. StatsBomb Event Data, 2018 FIFA World Cup
  2. Golden Glove Award Criteria, FIFA
  3. Expected Goals Methodology, Various Sources