Case Study 1: Goalkeeper Performance at the 2018 World Cup
Overview
The 2018 World Cup showcased diverse goalkeeper styles and performances, from Thibaut Courtois's commanding presence for Belgium to Hugo Lloris's occasionally dramatic interventions for eventual champions France. This case study analyzes goalkeeper performance across the tournament, demonstrating how modern metrics reveal nuances invisible to traditional statistics.
Research Questions
- Which goalkeepers performed above or below expectations at the 2018 World Cup?
- How did distribution patterns vary among top goalkeepers?
- What role did sweeper-keeper actions play in team success?
- How do traditional metrics (clean sheets, goals conceded) compare to advanced metrics (goals prevented)?
Data and Methodology
Data Source
- StatsBomb event data for 2018 FIFA World Cup
- All 64 matches analyzed
Goalkeepers Analyzed
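Focus on goalkeepers whose teams reached at least the Round of 16 and who faced enough shots on target for a usable shot-stopping sample (more than 10 in the code below). Germany's Manuel Neuer is included in the distribution and sweeper comparisons despite Germany's group-stage exit.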
Metrics Calculated
- Goals prevented (xG faced minus goals conceded; positive values indicate better-than-expected shot-stopping)
- Save percentage and save percentage above expected
- Distribution statistics (pass completion, progressive passes)
- Sweeper actions (outside box recoveries, clearances)
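The shot-stopping metrics in this list reduce to simple arithmetic. The sketch below is a minimal illustration, not the case study's exact method: `shot_stopping_summary` is a hypothetical helper, the input numbers are made up, and "save percentage above expected" is assumed to mean actual save percentage minus the save percentage implied by the xG of on-target shots.

```python
def shot_stopping_summary(xg_faced, xg_on_target, goals_conceded, shots_on_target):
    """Return goals prevented and save-percentage metrics for one goalkeeper.

    Hypothetical helper: the names and the 'above expected' definition are
    assumptions made for illustration.
    """
    if shots_on_target <= 0:
        raise ValueError("shots_on_target must be positive")
    saves = shots_on_target - goals_conceded
    save_pct = saves / shots_on_target
    # Share of on-target shots an average goalkeeper would be expected to stop
    expected_save_pct = 1 - xg_on_target / shots_on_target
    return {
        "goals_prevented": xg_faced - goals_conceded,
        "save_pct": save_pct,
        "expected_save_pct": expected_save_pct,
        "save_pct_above_expected": save_pct - expected_save_pct,
    }


# Illustrative inputs only (not tournament data)
summary = shot_stopping_summary(
    xg_faced=5.0, xg_on_target=4.0, goals_conceded=3, shots_on_target=10
)
print(summary)
```

With these inputs the goalkeeper concedes 3 goals from 5.0 xG, so goals prevented is positive, and the actual save percentage sits above the expected one.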
Analysis
Part 1: Shot-Stopping Performance
from statsbombpy import sb
import pandas as pd


def analyze_tournament_goalkeepers(competition_id=43, season_id=3):
    """Aggregate shot-stopping statistics per team for the World Cup.

    Stats are keyed by team, so a team that used more than one
    goalkeeper has those keepers' numbers pooled.
    """
    matches = sb.matches(competition_id=competition_id, season_id=season_id)
    goalkeeper_stats = {}
    for _, match in matches.iterrows():
        events = sb.events(match_id=match['match_id'])
        for team in [match['home_team'], match['away_team']]:
            opponent = match['away_team'] if team == match['home_team'] else match['home_team']
            # Shots against, excluding penalty shootouts (StatsBomb period 5)
            shots_against = events[
                (events['team'] == opponent) &
                (events['type'] == 'Shot') &
                (events['period'] < 5)
            ]
            # On-target shots are those that were saved or scored
            on_target = shots_against[
                shots_against['shot_outcome'].isin(['Saved', 'Goal', 'Saved to Post'])
            ]
            xg_faced = shots_against['shot_statsbomb_xg'].sum()
            goals = len(shots_against[shots_against['shot_outcome'] == 'Goal'])
            saves = len(on_target[on_target['shot_outcome'].isin(['Saved', 'Saved to Post'])])
            if team not in goalkeeper_stats:
                goalkeeper_stats[team] = {
                    'matches': 0,
                    'xg_faced': 0,
                    'goals': 0,
                    'saves': 0,
                    'shots_on_target': 0
                }
            goalkeeper_stats[team]['matches'] += 1
            goalkeeper_stats[team]['xg_faced'] += xg_faced
            goalkeeper_stats[team]['goals'] += goals
            goalkeeper_stats[team]['saves'] += saves
            goalkeeper_stats[team]['shots_on_target'] += len(on_target)
    # Calculate derived metrics for teams with a minimum sample
    results = []
    for team, stats in goalkeeper_stats.items():
        if stats['shots_on_target'] > 10:  # Minimum sample
            results.append({
                'team': team,
                'matches': stats['matches'],
                'xg_faced': stats['xg_faced'],
                'goals_conceded': stats['goals'],
                'goals_prevented': stats['xg_faced'] - stats['goals'],
                'saves': stats['saves'],
                'shots_on_target': stats['shots_on_target'],
                'save_percentage': stats['saves'] / stats['shots_on_target']
            })
    return pd.DataFrame(results).sort_values('goals_prevented', ascending=False)
Tournament Shot-Stopping Rankings:
| Rank | Team | Goalkeeper | xG Faced | Goals | Goals Prevented | Save % |
|---|---|---|---|---|---|---|
| 1 | Croatia | Subasic | 13.5 | 8 | +3.5 | 75% |
| 2 | Belgium | Courtois | 10.2 | 6 | +2.2 | 78% |
| 3 | Brazil | Alisson | 4.1 | 3 | +1.1 | 80% |
| 4 | England | Pickford | 9.8 | 7 | +0.8 | 73% |
| 5 | France | Lloris | 7.4 | 6 | -0.6 | 71% |
Key Findings:
- Danijel Subasic (Croatia) recorded the highest goals prevented (+3.5), driven by strong performances against Argentina and Russia. His penalty shootout saves are not part of this figure, because shootouts are excluded from the main analysis.
- Thibaut Courtois won the Golden Glove despite Belgium conceding 6 goals. His +2.2 goals prevented supports the award, whereas goals conceded alone would have understated his tournament.
- Hugo Lloris had slightly negative goals prevented (-0.6), suggesting France's defensive success came more from limiting shot quality than from exceptional goalkeeping.
Part 2: Distribution Analysis
def analyze_goalkeeper_distribution(events_df, team_name):
    """Analyze goalkeeper distribution patterns for one team."""
    team_events = events_df[events_df['team'] == team_name]
    # Goalkeeper passes, identified via the 'position' column of the events
    keeper_passes = team_events[
        (team_events['type'] == 'Pass') &
        (team_events['position'] == 'Goalkeeper')
    ]
    goalkeepers = keeper_passes['player'].value_counts()
    if len(goalkeepers) == 0:
        return None
    # If more than one goalkeeper played, keep the one with the most passes
    gk_name = goalkeepers.index[0]
    gk_passes = keeper_passes[keeper_passes['player'] == gk_name]
    # StatsBomb leaves pass_outcome empty for completed passes
    successful = gk_passes[gk_passes['pass_outcome'].isna()]
    total = len(gk_passes)
    # Length analysis (pass_length is in pitch units, roughly yards)
    short_passes = gk_passes[gk_passes['pass_length'] < 20]
    long_passes = gk_passes[gk_passes['pass_length'] >= 40]
    return {
        'goalkeeper': gk_name,
        'total_passes': total,
        'success_rate': len(successful) / total if total > 0 else 0,
        'long_pass_pct': len(long_passes) / total if total > 0 else 0,
        'short_passes': len(short_passes),
        'long_passes': len(long_passes)
    }
Distribution Style Comparison:
| Team | Goalkeeper | Pass Success | Long Pass % | Style |
|---|---|---|---|---|
| Spain | De Gea | 89% | 22% | Build-out |
| Germany | Neuer | 86% | 31% | Balanced |
| Brazil | Alisson | 84% | 28% | Balanced |
| Belgium | Courtois | 78% | 45% | Launcher |
| England | Pickford | 75% | 52% | Launcher |
| France | Lloris | 71% | 48% | Launcher |
Distribution Insights:
- Spain's distribution through De Gea emphasized short passing, consistent with their possession philosophy, despite their Round of 16 exit.
- Germany's Manuel Neuer maintained his sweeper-keeper style, but Germany's poor tournament masked his distribution quality.
- England and France both used direct distribution, suggesting tactical instructions to bypass the press.
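The style labels in the comparison table can be reproduced from the long pass share alone. The sketch below is one way to do that: `classify_distribution_style` and its cut-offs (below 25% for build-out, 40% and above for launcher) are hypothetical, chosen only because they match the labels in the table, and are not thresholds stated by the analysis.

```python
def classify_distribution_style(long_pass_pct, build_out_max=0.25, launcher_min=0.40):
    """Label a goalkeeper's distribution style from the share of long passes.

    The cut-offs are illustrative assumptions, not values from the case study.
    """
    if not 0 <= long_pass_pct <= 1:
        raise ValueError("long_pass_pct must be a fraction between 0 and 1")
    if long_pass_pct < build_out_max:
        return "Build-out"
    if long_pass_pct >= launcher_min:
        return "Launcher"
    return "Balanced"


# Long pass shares taken from the comparison table above
long_pass_share = {
    "De Gea": 0.22,
    "Neuer": 0.31,
    "Alisson": 0.28,
    "Courtois": 0.45,
    "Pickford": 0.52,
    "Lloris": 0.48,
}
styles = {name: classify_distribution_style(pct) for name, pct in long_pass_share.items()}
for name, style in styles.items():
    print(f"{name}: {style}")
```

A single-variable rule like this ignores pass success and pass destination, so it is best read as a first-pass label rather than a full style profile.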
Part 3: Sweeper-Keeper Actions
def analyze_sweeper_actions(events_df, team_name):
    """Count goalkeeper defensive actions outside the team's own penalty area."""
    team_events = events_df[events_df['team'] == team_name]

    def outside_own_box(loc):
        # StatsBomb pitch is 120 x 80 and each team's events are recorded
        # attacking left to right, so the own penalty area spans
        # x <= 18 and 18 <= y <= 62.
        if not isinstance(loc, list):
            return False
        return loc[0] > 18 or loc[1] < 18 or loc[1] > 62

    # Goalkeeper events outside the box
    gk_events = team_events[
        (team_events['position'] == 'Goalkeeper') &
        team_events['location'].apply(outside_own_box)
    ]
    # Sweeper-type actions among those events
    recoveries = gk_events[gk_events['type'] == 'Ball Recovery']
    clearances = gk_events[gk_events['type'] == 'Clearance']
    interceptions = gk_events[gk_events['type'] == 'Interception']
    return {
        'team': team_name,
        'total_clearances': len(clearances),
        'total_recoveries': len(recoveries),
        'interceptions': len(interceptions),
        'sweeper_actions': len(clearances) + len(recoveries) + len(interceptions)
    }
Sweeper Activity Rankings:
| Rank | Team | Sweeper Actions | Defensive Line | Notes |
|---|---|---|---|---|
| 1 | Germany | 12 | High | Neuer's traditional style |
| 2 | Spain | 8 | High | De Gea less active than at club level |
| 3 | Belgium | 7 | Medium-High | Courtois balanced approach |
| 4 | Brazil | 6 | Medium | Alisson measured interventions |
| 5 | France | 4 | Medium-Low | Lloris conservative |
Sweeper Analysis:
Germany's high sweeper activity reflected Neuer's playing style, but the team's defensive vulnerabilities also exposed him to more challenging situations. France's low sweeper activity aligned with their conservative defensive approach: Lloris rarely needed to come off his line because the defense limited the space behind it.
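Counting sweeper actions depends on deciding what "outside the box" means in event coordinates. The sketch below assumes StatsBomb's 120 x 80 pitch, where each team's events are recorded attacking left to right, so a goalkeeper's own penalty area covers x up to 18 and y between 18 and 62. The helper names and the sample events are hypothetical.

```python
OWN_BOX_MAX_X = 18.0               # depth of the penalty area from the own goal line
BOX_MIN_Y, BOX_MAX_Y = 18.0, 62.0  # lateral edges of the penalty area
SWEEPER_TYPES = {"Clearance", "Ball Recovery", "Interception"}


def is_outside_own_box(location):
    """Return True if an [x, y] location lies outside the team's own penalty area."""
    if not isinstance(location, (list, tuple)) or len(location) < 2:
        return False
    x, y = location[0], location[1]
    return x > OWN_BOX_MAX_X or y < BOX_MIN_Y or y > BOX_MAX_Y


def count_sweeper_actions(events):
    """Count goalkeeper clearances, recoveries and interceptions outside the box."""
    return sum(
        1
        for event in events
        if event.get("position") == "Goalkeeper"
        and event.get("type") in SWEEPER_TYPES
        and is_outside_own_box(event.get("location"))
    )


# Made-up events for illustration
sample_events = [
    {"position": "Goalkeeper", "type": "Clearance", "location": [25.0, 40.0]},
    {"position": "Goalkeeper", "type": "Ball Recovery", "location": [10.0, 40.0]},
    {"position": "Center Back", "type": "Clearance", "location": [30.0, 40.0]},
    {"position": "Goalkeeper", "type": "Pass", "location": [30.0, 40.0]},
    {"position": "Goalkeeper", "type": "Interception", "location": [12.0, 10.0]},
]
sweeper_count = count_sweeper_actions(sample_events)
print(sweeper_count)
```

Only the first and last sample events qualify: the recovery happens inside the box, the second clearance belongs to an outfield player, and the pass is not a defensive action.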
Part 4: Traditional vs. Advanced Metrics
Comparison Table:
| Goalkeeper | Clean Sheets | Goals Conceded | Goals Prevented | Favored By |
|---|---|---|---|---|
| Courtois | 3 | 6 | +2.2 | Advanced metrics |
| Lloris | 3 | 6 | -0.6 | Traditional metrics |
| Subasic | 1 | 8 | +3.5 | Advanced metrics |
| Pickford | 2 | 7 | +0.8 | Advanced metrics (slightly) |
| Alisson | 4 | 3 | +1.1 | Traditional metrics |
Key Insight: Subasic had just 1 clean sheet and conceded 8 goals, so traditional metrics would rank him poorly. His +3.5 goals prevented, however, reveals exceptional shot-stopping that kept Croatia in matches they might otherwise have lost.
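The gap between the two views can be made explicit by ranking the same goalkeepers twice. The sketch below uses the values from the comparison table; the `competition_rank` helper and the use of goals conceded as the single "traditional" yardstick are simplifying assumptions for illustration.

```python
# Values copied from the comparison table above
keepers = [
    {"name": "Courtois", "goals_conceded": 6, "goals_prevented": 2.2},
    {"name": "Lloris", "goals_conceded": 6, "goals_prevented": -0.6},
    {"name": "Subasic", "goals_conceded": 8, "goals_prevented": 3.5},
    {"name": "Pickford", "goals_conceded": 7, "goals_prevented": 0.8},
    {"name": "Alisson", "goals_conceded": 3, "goals_prevented": 1.1},
]


def competition_rank(rows, key, higher_is_better):
    """Rank rows so that 1 is best; tied values share the same rank."""
    ranks = {}
    for row in rows:
        if higher_is_better:
            better = sum(1 for other in rows if other[key] > row[key])
        else:
            better = sum(1 for other in rows if other[key] < row[key])
        ranks[row["name"]] = better + 1
    return ranks


traditional = competition_rank(keepers, "goals_conceded", higher_is_better=False)
advanced = competition_rank(keepers, "goals_prevented", higher_is_better=True)

# Positive shift: the goalkeeper looks better under goals prevented
rank_shift = {name: traditional[name] - advanced[name] for name in traditional}
for name, shift in sorted(rank_shift.items(), key=lambda item: -item[1]):
    print(f"{name}: traditional #{traditional[name]}, advanced #{advanced[name]}, shift {shift:+d}")
```

Subasic climbs from last to first when the yardstick changes, while Lloris and Alisson drop, which matches the direction of the table's final column for those three.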
Key Findings
1. Shot-Stopping Excellence Often Hidden
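Traditional metrics masked Courtois's and Subasic's excellence while potentially overstating Lloris's contribution. Goals prevented provides a fairer evaluation because it depends less on the quality of the defense in front of the goalkeeper, although it does not remove that dependence entirely.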
2. Distribution Reflects Tactics
Goalkeeper distribution style closely aligned with team tactical approach:
- Possession teams used build-out goalkeepers
- Counter-attacking teams used direct distribution
- Style classification matched tactical expectations
3. Sweeper Activity Depends on System
High sweeper activity isn't inherently better—it reflects tactical requirements. France's success with minimal sweeping shows that system fit matters more than raw activity levels.
4. Sample Size Limitations
Even a full tournament provides limited data. Courtois faced approximately 30 shots on target, which is too few for save percentage or goals prevented to stabilize. Tournament analysis should acknowledge this uncertainty.
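One way to put a number on this uncertainty is a confidence interval for save percentage. The sketch below uses the Wilson score interval, which is a choice made here for illustration rather than a method used in the analysis, and the input of 24 saves from 30 shots on target is a round illustrative figure, not a measured value.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half_width, centre + half_width


# Illustrative: 24 saves from 30 shots on target (an 80% save percentage)
low, high = wilson_interval(successes=24, trials=30)
print(f"95% interval for save percentage: {low:.2f} to {high:.2f}")
```

An 80% save percentage on 30 shots is compatible with a true rate anywhere from roughly 63% to 90%, a range wide enough to cover every goalkeeper in the shot-stopping table.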
Practical Applications
For National Team Coaches
- Goalkeeper Selection: Consider system fit alongside raw ability
- Tactical Planning: Distribution style affects buildup options
- Match Preparation: Understand opponent goalkeeper tendencies
For Analysts
- Context Matters: Same goalkeeper may perform differently in different systems
- Multiple Metrics: No single metric captures goalkeeper quality
- Uncertainty: Tournament samples require cautious interpretation
For Media and Fans
- Beyond Clean Sheets: Goals prevented provides fairer evaluation
- Style Recognition: Different styles suit different teams
- Acknowledge Variance: Short tournaments have high randomness
Limitations
- Sample Size: Tournament data insufficient for precise individual evaluation
- Team Context: Defensive quality affects all goalkeeper metrics
- PSxG Unavailable: Analysis used xG rather than PSxG due to data limitations
- Penalty Shootouts: Excluded from main analysis despite significant impact
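Keeping shootouts out of the main analysis requires separating those kicks from in-play shots. The sketch below relies on StatsBomb's convention that period 5 denotes a penalty shootout; the helper names and the sample shots are hypothetical.

```python
SHOOTOUT_PERIOD = 5  # StatsBomb: 1-2 regulation, 3-4 extra time, 5 shootout


def split_shootout_shots(shots):
    """Separate in-play shots from penalty shootout kicks."""
    in_play = [s for s in shots if s.get("period") != SHOOTOUT_PERIOD]
    shootout = [s for s in shots if s.get("period") == SHOOTOUT_PERIOD]
    return in_play, shootout


def shootout_save_rate(shootout_shots):
    """Share of shootout kicks the goalkeeper saved, or None if none were faced."""
    faced = len(shootout_shots)
    if faced == 0:
        return None
    saved = sum(1 for s in shootout_shots if s.get("shot_outcome") == "Saved")
    return saved / faced


# Made-up shots faced by one goalkeeper in a single match
shots_faced = [
    {"period": 1, "shot_outcome": "Goal"},
    {"period": 2, "shot_outcome": "Saved"},
    {"period": 4, "shot_outcome": "Off T"},
    {"period": 5, "shot_outcome": "Saved"},
    {"period": 5, "shot_outcome": "Goal"},
    {"period": 5, "shot_outcome": "Saved"},
    {"period": 5, "shot_outcome": "Off T"},
]
in_play, shootout = split_shootout_shots(shots_faced)
save_rate = shootout_save_rate(shootout)
print(len(in_play), len(shootout), save_rate)
```

Reporting the shootout figure separately keeps goals prevented comparable across goalkeepers whose teams never reached a shootout, while still recording the skill. Kicks that miss the target count as faced but not saved here, which is a design choice that could reasonably go the other way.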
Conclusion
The 2018 World Cup demonstrated that goalkeeper evaluation requires multiple perspectives. Thibaut Courtois's Golden Glove was supported by his positive goals prevented, while Hugo Lloris lifted the trophy behind a champion's defense that limited his workload. Danijel Subasic's shot-stopping was hidden by traditional statistics but revealed through expected goals analysis, and his shootout heroics fall outside that analysis altogether.
Modern goalkeeper evaluation must balance:
- Shot-stopping quality (goals prevented)
- Distribution contribution (pass completion, progression)
- Sweeper effectiveness (where tactically required)
- Sample size awareness (uncertainty acknowledgment)
Discussion Questions
- Should the Golden Glove award criteria include expected goals metrics?
- How might Neuer's statistics look if Germany had performed better?
- What distribution style would suit a team transitioning to possession play?
- How should penalty shootout performance factor into goalkeeper evaluation?
References
- StatsBomb Event Data, 2018 FIFA World Cup
- Golden Glove Award Criteria, FIFA
- Expected Goals Methodology, Various Sources