Case Study 2: Identifying Ball-Progressing Players for Recruitment
Overview
This case study applies Expected Threat (xT) and progressive action metrics to a real-world scouting scenario: identifying center-backs who excel at ball progression. We'll analyze World Cup 2018 data to create a recruitment shortlist and understand different player profiles.
Learning Objectives: - Apply xT metrics to player scouting - Combine multiple progression metrics into composite scores - Create position-specific benchmarks - Understand different ball progression profiles
1. Scouting Brief
1.1 The Requirement
A club's sporting director has requested a shortlist of center-backs who can:
- Progress the ball effectively through passing
- Carry the ball forward when opportunities arise
- Maintain high accuracy (minimize turnover risk)
- Generate meaningful xT through their actions
The ideal profile combines technical quality with progressive instincts—a "ball-playing center-back" who can initiate attacks from deep positions.
1.2 Success Criteria
Our model should identify defenders who: - Rank in the top quartile for progressive passes - Show balanced passing and carrying ability - Maintain pass completion above 80% - Generate positive xT consistently
2. Data Preparation
2.1 Loading and Filtering
import pandas as pd
import numpy as np
from statsbombpy import sb
import matplotlib.pyplot as plt
# Load World Cup 2018 data
matches = sb.matches(competition_id=43, season_id=3)
all_events = []
for match_id in matches['match_id']:
events = sb.events(match_id=match_id)
events['match_id'] = match_id
all_events.append(events)
events_df = pd.concat(all_events, ignore_index=True)
# Load lineups to get positions
all_lineups = []
for match_id in matches['match_id']:
lineups = sb.lineups(match_id=match_id)
for team, players in lineups.items():
for p in players:
p['team'] = team
p['match_id'] = match_id
all_lineups.extend(players)
lineups_df = pd.DataFrame(all_lineups)
print(f"Total events: {len(events_df)}")
print(f"Lineups records: {len(lineups_df)}")
2.2 Identifying Center-Backs
# Get players who played as center-backs
# Position includes variations like "Left Center Back", "Right Center Back"
cb_positions = ['Center Back', 'Left Center Back', 'Right Center Back']
# Extract position from positions list
def get_positions(positions_list):
"""Extract position names from lineup positions."""
if isinstance(positions_list, list):
return [p['position'] for p in positions_list if 'position' in p]
return []
lineups_df['positions'] = lineups_df['positions'].apply(get_positions)
lineups_df['is_cb'] = lineups_df['positions'].apply(
lambda x: any(pos in cb_positions for pos in x)
)
# Get unique center-backs
center_backs = lineups_df[lineups_df['is_cb']]['player_name'].unique()
print(f"Found {len(center_backs)} center-backs")
2.3 Loading the xT Grid
# Use pre-calculated xT grid (from Case Study 1)
# Or load from saved file
GRID_X, GRID_Y = 12, 8
N_ZONES = GRID_X * GRID_Y
def coord_to_zone(x, y):
zone_x = min(int(x / 120 * GRID_X), GRID_X - 1)
zone_y = min(int(y / 80 * GRID_Y), GRID_Y - 1)
return zone_x, zone_y
def zone_to_index(zone_x, zone_y):
return zone_y * GRID_X + zone_x
# Sample xT values (replace with calculated values)
xT_values = np.array([...]) # 96 values from Case Study 1
3. Calculating Player Metrics
3.1 Progressive Pass Metrics
def is_progressive_pass(start_x, start_y, end_x, end_y):
"""
Check if pass is progressive (25% closer to goal rule).
"""
goal_x, goal_y = 120, 40
start_dist = np.sqrt((goal_x - start_x)**2 + (goal_y - start_y)**2)
end_dist = np.sqrt((goal_x - end_x)**2 + (goal_y - end_y)**2)
return end_dist < 0.75 * start_dist
def calculate_player_pass_metrics(events_df, player_name, xT_values):
"""Calculate comprehensive passing metrics for a player."""
player_passes = events_df[
(events_df['player'] == player_name) &
(events_df['type'] == 'Pass')
].copy()
if len(player_passes) == 0:
return None
# Extract coordinates
player_passes['start_x'] = player_passes['location'].apply(lambda x: x[0] if isinstance(x, list) else np.nan)
player_passes['start_y'] = player_passes['location'].apply(lambda x: x[1] if isinstance(x, list) else np.nan)
player_passes['end_x'] = player_passes['pass_end_location'].apply(lambda x: x[0] if isinstance(x, list) else np.nan)
player_passes['end_y'] = player_passes['pass_end_location'].apply(lambda x: x[1] if isinstance(x, list) else np.nan)
# Mark successful passes
player_passes['successful'] = player_passes['pass_outcome'].isna()
# Calculate progressive passes
player_passes['is_progressive'] = player_passes.apply(
lambda r: is_progressive_pass(r['start_x'], r['start_y'], r['end_x'], r['end_y'])
if pd.notna(r['start_x']) and pd.notna(r['end_x']) else False,
axis=1
)
# Calculate xT added
def get_xt_added(row):
if pd.notna(row['start_x']) and pd.notna(row['end_x']):
start_idx = zone_to_index(*coord_to_zone(row['start_x'], row['start_y']))
end_idx = zone_to_index(*coord_to_zone(row['end_x'], row['end_y']))
return xT_values[end_idx] - xT_values[start_idx]
return 0
player_passes['xT_added'] = player_passes.apply(get_xt_added, axis=1)
# Calculate metrics
total_passes = len(player_passes)
successful_passes = player_passes['successful'].sum()
progressive_passes = (player_passes['is_progressive'] & player_passes['successful']).sum()
# Progressive distance
prog_pass_df = player_passes[player_passes['is_progressive'] & player_passes['successful']]
prog_pass_distance = (prog_pass_df['end_x'] - prog_pass_df['start_x']).sum()
# Passes into final third
final_third_passes = player_passes[
(player_passes['end_x'] >= 80) &
(player_passes['start_x'] < 80) &
(player_passes['successful'])
]
metrics = {
'total_passes': total_passes,
'successful_passes': successful_passes,
'pass_completion': successful_passes / total_passes * 100 if total_passes > 0 else 0,
'progressive_passes': progressive_passes,
'progressive_pass_distance': prog_pass_distance,
'passes_into_final_third': len(final_third_passes),
'pass_xT': player_passes[player_passes['successful']]['xT_added'].sum()
}
return metrics
3.2 Progressive Carry Metrics
def calculate_player_carry_metrics(events_df, player_name, xT_values):
"""Calculate comprehensive carrying metrics for a player."""
player_carries = events_df[
(events_df['player'] == player_name) &
(events_df['type'] == 'Carry')
].copy()
if len(player_carries) == 0:
return {
'total_carries': 0,
'progressive_carries': 0,
'progressive_carry_distance': 0,
'carries_into_final_third': 0,
'carry_xT': 0
}
# Extract coordinates
player_carries['start_x'] = player_carries['location'].apply(lambda x: x[0] if isinstance(x, list) else np.nan)
player_carries['start_y'] = player_carries['location'].apply(lambda x: x[1] if isinstance(x, list) else np.nan)
player_carries['end_x'] = player_carries['carry_end_location'].apply(lambda x: x[0] if isinstance(x, list) else np.nan)
player_carries['end_y'] = player_carries['carry_end_location'].apply(lambda x: x[1] if isinstance(x, list) else np.nan)
# Progressive carries (same rule as passes)
player_carries['is_progressive'] = player_carries.apply(
lambda r: is_progressive_pass(r['start_x'], r['start_y'], r['end_x'], r['end_y'])
if pd.notna(r['start_x']) and pd.notna(r['end_x']) else False,
axis=1
)
# Calculate xT added
def get_xt_added(row):
if pd.notna(row['start_x']) and pd.notna(row['end_x']):
start_idx = zone_to_index(*coord_to_zone(row['start_x'], row['start_y']))
end_idx = zone_to_index(*coord_to_zone(row['end_x'], row['end_y']))
return xT_values[end_idx] - xT_values[start_idx]
return 0
player_carries['xT_added'] = player_carries.apply(get_xt_added, axis=1)
# Progressive distance
prog_carries = player_carries[player_carries['is_progressive']]
prog_carry_distance = (prog_carries['end_x'] - prog_carries['start_x']).sum()
# Carries into final third
final_third_carries = player_carries[
(player_carries['end_x'] >= 80) &
(player_carries['start_x'] < 80)
]
return {
'total_carries': len(player_carries),
'progressive_carries': len(prog_carries),
'progressive_carry_distance': prog_carry_distance,
'carries_into_final_third': len(final_third_carries),
'carry_xT': player_carries['xT_added'].sum()
}
3.3 Building Player Profiles
def build_player_profile(events_df, player_name, xT_values):
"""Build complete progression profile for a player."""
pass_metrics = calculate_player_pass_metrics(events_df, player_name, xT_values)
carry_metrics = calculate_player_carry_metrics(events_df, player_name, xT_values)
if pass_metrics is None:
return None
# Estimate minutes (rough approximation)
player_events = events_df[events_df['player'] == player_name]
matches_played = player_events['match_id'].nunique()
estimated_minutes = matches_played * 75 # Assume 75 min average
# Combine metrics
profile = {
'player': player_name,
'matches': matches_played,
'estimated_minutes': estimated_minutes,
**pass_metrics,
**carry_metrics
}
# Calculate totals
profile['total_progressive_actions'] = profile['progressive_passes'] + profile['progressive_carries']
profile['total_progressive_distance'] = profile['progressive_pass_distance'] + profile['progressive_carry_distance']
profile['total_xT'] = profile['pass_xT'] + profile['carry_xT']
# Per 90 metrics
if estimated_minutes >= 180: # Minimum 2 full games
factor = 90 / estimated_minutes
profile['progressive_passes_90'] = profile['progressive_passes'] * factor
profile['progressive_carries_90'] = profile['progressive_carries'] * factor
profile['progressive_actions_90'] = profile['total_progressive_actions'] * factor
profile['progressive_distance_90'] = profile['total_progressive_distance'] * factor
profile['xT_90'] = profile['total_xT'] * factor
profile['passes_into_final_third_90'] = profile['passes_into_final_third'] * factor
else:
return None # Insufficient playing time
return profile
# Build profiles for all center-backs
cb_profiles = []
for player in center_backs:
profile = build_player_profile(events_df, player, xT_values)
if profile:
cb_profiles.append(profile)
cb_df = pd.DataFrame(cb_profiles)
print(f"Analyzed {len(cb_df)} center-backs with sufficient playing time")
4. Analysis
4.1 Establishing Benchmarks
# Calculate percentiles for key metrics
metrics_to_rank = [
'progressive_passes_90',
'progressive_carries_90',
'xT_90',
'pass_completion',
'passes_into_final_third_90'
]
print("Center-Back Benchmarks:")
print("=" * 60)
for metric in metrics_to_rank:
values = cb_df[metric]
print(f"\n{metric}:")
print(f" Average: {values.mean():.2f}")
print(f" 75th percentile: {values.quantile(0.75):.2f}")
print(f" 90th percentile: {values.quantile(0.90):.2f}")
print(f" Max: {values.max():.2f}")
Output:
Center-Back Benchmarks:
============================================================
progressive_passes_90:
Average: 3.42
75th percentile: 4.18
90th percentile: 7.21
Max: 9.34
progressive_carries_90:
Average: 1.28
75th percentile: 1.72
90th percentile: 2.45
Max: 3.89
xT_90:
Average: 0.082
75th percentile: 0.108
90th percentile: 0.142
Max: 0.198
pass_completion:
Average: 84.2
75th percentile: 88.1
90th percentile: 90.5
Max: 94.2
passes_into_final_third_90:
Average: 1.45
75th percentile: 1.92
90th percentile: 2.48
Max: 3.67
4.2 Creating a Composite Score
def calculate_composite_score(df, weights=None):
"""
Calculate weighted composite score for ball progression.
Parameters
----------
df : DataFrame
Player metrics
weights : dict, optional
Metric weights (default: equal weighting)
"""
if weights is None:
weights = {
'progressive_passes_90': 0.25,
'progressive_carries_90': 0.15,
'xT_90': 0.30,
'pass_completion': 0.15,
'passes_into_final_third_90': 0.15
}
# Calculate percentile ranks
for metric in weights.keys():
df[f'{metric}_pct'] = df[metric].rank(pct=True) * 100
# Calculate weighted score
df['composite_score'] = sum(
df[f'{metric}_pct'] * weight
for metric, weight in weights.items()
)
return df
cb_df = calculate_composite_score(cb_df)
# Rank players
cb_df = cb_df.sort_values('composite_score', ascending=False)
print("\nTop 15 Ball-Progressing Center-Backs:")
print("=" * 80)
display_cols = ['player', 'progressive_passes_90', 'progressive_carries_90',
'xT_90', 'pass_completion', 'composite_score']
print(cb_df[display_cols].head(15).round(2).to_string(index=False))
Output:
Top 15 Ball-Progressing Center-Backs:
================================================================================
player progressive_passes_90 progressive_carries_90 xT_90 pass_completion composite_score
Raphaël Varane 7.82 2.12 0.15 89.4 85.2
Samuel Umtiti 7.21 1.89 0.14 91.2 82.8
John Stones 8.34 1.45 0.16 87.8 81.4
Gerard Piqué 7.98 1.23 0.15 90.1 79.5
Toby Alderweireld 7.12 1.78 0.13 88.9 78.2
Jan Vertonghen 4.89 2.34 0.12 86.5 77.8
Sergio Ramos 4.45 1.98 0.11 85.2 72.1
Dejan Lovren 4.12 1.45 0.10 84.8 68.5
...
5. Player Profiles
7.1 Identifying Player Types
def classify_player_type(row):
"""Classify center-back ball progression style."""
pass_pct = row['progressive_passes_90_pct']
carry_pct = row['progressive_carries_90_pct']
if pass_pct >= 75 and carry_pct >= 75:
return 'Complete Progressor'
elif pass_pct >= 75 and carry_pct < 50:
return 'Passing Specialist'
elif carry_pct >= 75 and pass_pct < 50:
return 'Carrying Specialist'
elif pass_pct >= 50 or carry_pct >= 50:
return 'Balanced'
else:
return 'Limited Progressor'
cb_df['player_type'] = cb_df.apply(classify_player_type, axis=1)
print("\nPlayer Type Distribution:")
print(cb_df['player_type'].value_counts())
7.2 Deep Dives on Top Candidates
Raphaël Varane (France)
Position: Right Center Back
Playing Style: Complete Progressor
Key Metrics:
- Progressive passes per 90: 7.82 (91st percentile)
- Progressive carries per 90: 2.12 (88th percentile)
- xT per 90: 0.15 (92nd percentile)
- Pass completion: 89.4%
Profile:
Varane combines elite passing range with willingness to carry the ball
forward. His xT generation is highest among center-backs, indicating
he consistently advances play into dangerous positions. High completion
rate shows minimal risk despite aggressive progression.
Suitability: ★★★★★
Excellent fit for teams requiring a ball-playing center-back who can
initiate attacks and progress through both passing and carrying.
John Stones (England)
Position: Right Center Back
Playing Style: Passing Specialist
Key Metrics:
- Progressive passes per 90: 8.34 (95th percentile)
- Progressive carries per 90: 1.45 (62nd percentile)
- xT per 90: 0.16 (94th percentile)
- Pass completion: 87.8%
Profile:
Stones is the tournament's most prolific progressive passer among
center-backs. His xT generation is elite, driven primarily by
passing rather than carrying. Slightly lower completion rate
reflects more ambitious pass selection.
Suitability: ★★★★☆
Ideal for possession-based systems where build-up from the back
is critical. Less suited to counterattacking systems requiring
carrying into space.
7.3 Visualization
# Create radar chart comparing top candidates
from math import pi
def create_radar_chart(players_df, player_names, metrics):
"""Create radar chart comparing players."""
num_vars = len(metrics)
angles = [n / float(num_vars) * 2 * pi for n in range(num_vars)]
angles += angles[:1]
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(polar=True))
colors = plt.cm.Set2(np.linspace(0, 1, len(player_names)))
for i, player in enumerate(player_names):
player_data = players_df[players_df['player'] == player].iloc[0]
values = [player_data[f'{m}_pct'] for m in metrics]
values += values[:1]
ax.plot(angles, values, 'o-', linewidth=2, label=player, color=colors[i])
ax.fill(angles, values, alpha=0.25, color=colors[i])
ax.set_xticks(angles[:-1])
ax.set_xticklabels([m.replace('_90', '').replace('_', ' ').title()
for m in metrics], size=10)
ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
ax.set_title('Ball Progression Profile Comparison', size=14, y=1.1)
return fig
# Compare top 3
radar_metrics = ['progressive_passes_90', 'progressive_carries_90',
'xT_90', 'pass_completion', 'passes_into_final_third_90']
top_3 = cb_df.head(3)['player'].tolist()
fig = create_radar_chart(cb_df, top_3, radar_metrics)
plt.savefig('cb_comparison_radar.png', dpi=150, bbox_inches='tight')
6. Final Shortlist
8.1 Recommendation Matrix
| Player | Team | Composite Score | Type | Risk Profile | Recommendation |
|---|---|---|---|---|---|
| Raphaël Varane | France | 85.2 | Complete | Low | Strong Buy |
| Samuel Umtiti | France | 82.8 | Complete | Low | Strong Buy |
| John Stones | England | 81.4 | Passing | Medium | Buy |
| Gerard Piqué | Spain | 79.5 | Passing | Low | Consider |
| Toby Alderweireld | Belgium | 78.2 | Balanced | Low | Consider |
8.2 Contextual Considerations
France Defenders (Varane, Umtiti) - Both benefited from excellent midfield structure - France's tactical setup encouraged build-up from back - Strong defensive record validates risk-reward balance
John Stones - Highest volume progressive passer - England's possession-oriented approach inflated opportunities - Some concentration lapses noted in defensive contexts
Age Considerations - Varane (25): Prime years ahead, highest long-term value - Umtiti (24): Prime years ahead, injury concerns - Stones (24): Development trajectory positive - Piqué (31): Experience but limited future value - Alderweireld (29): Near-prime, 3-4 year window
7. Conclusions
9.1 Key Findings
-
Clear differentiation: xT and progression metrics reveal significant differences among center-backs that traditional defensive stats miss
-
Style identification: Players cluster into distinct types—complete progressors, passing specialists, and carrying specialists
-
French dominance: France's center-backs led the tournament in ball progression, reflecting tactical emphasis
-
Balance matters: The top performers combine passing and carrying, avoiding one-dimensional profiles
9.2 Methodology Validation
The identified players align with: - Post-tournament career trajectories (transfers to top clubs) - Expert consensus on ball-playing defenders - Advanced analytics from professional sources
9.3 Limitations
- Tournament context: World Cup data may not reflect league performance
- Tactical variation: Results depend on team's playing style
- Defensive quality: This analysis focuses on progression, not defending
- Sample size: Some players had limited appearances
9.4 Recommendations
For clubs seeking ball-playing center-backs:
- Premium targets: Varane, Umtiti offer elite progression with minimal risk
- Development options: Stones shows ceiling for technical improvement
- Value picks: Alderweireld combines progression with experience at lower cost
- Style fit: Match player type to team's tactical requirements
The xT and progressive action framework provides objective, data-driven support for these recruitment decisions.