Nov 27, 2025
# Expected Goals (xG) Analysis
## Overview
Expected Goals (xG) is a statistical metric that measures the quality of a goal-scoring chance. It assigns a probability value between 0 and 1 to each shot, representing the likelihood that an average player would score from that position and situation.
## Key Factors in xG Calculation
### Shot Location
- Distance from goal
- Angle to goal
- Location within the box
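
As a rough illustration of the location features listed above, the sketch below derives a distance and a goal-mouth angle from pitch coordinates. The coordinate system and the function name `shot_geometry` are assumptions made for this example, and the opening angle between the posts is only one common way to define "angle"; it may not match the `angle` column used in the sample data later in this article.

```python
import math

GOAL_WIDTH_M = 7.32  # regulation goal width in metres


def shot_geometry(x, y):
    """Return (distance_m, angle_deg) for a shot taken at (x, y).

    Assumed (hypothetical) coordinate system: the centre of the goal is at
    (0, 0), x is the distance from the goal line in metres (x > 0), and y is
    the lateral offset from the centre of the goal in metres.
    """
    # Straight-line distance to the centre of the goal
    distance = math.hypot(x, y)
    # Angle subtended by the two posts as seen from the shot location
    to_near_post = math.atan2(GOAL_WIDTH_M / 2 - y, x)
    to_far_post = math.atan2(-GOAL_WIDTH_M / 2 - y, x)
    angle_deg = math.degrees(to_near_post - to_far_post)
    return distance, angle_deg


# Central shot from 11 m compared with a shot from the same depth, 10 m wide
central = shot_geometry(11.0, 0.0)
wide = shot_geometry(11.0, 10.0)
print(f"Central: distance={central[0]:.1f} m, angle={central[1]:.1f} deg")
print(f"Wide:    distance={wide[0]:.1f} m, angle={wide[1]:.1f} deg")
```

The wide shot sees a narrower slice of the goal, which is why angle carries information that distance alone does not.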
### Shot Type
- Header vs. foot
- Weak foot vs. strong foot
- Set piece vs. open play
### Situation Context
- Number of defenders between shot and goal
- Goalkeeper position
- Type of assist (through ball, cross, etc.)
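
Shot type and situation factors are mostly categorical, so they usually need encoding before a model can use them. The sketch below shows one possible approach with pandas; the column names (`body_part`, `assist_type`, `defenders_in_path`) and their values are hypothetical and not tied to any particular data provider.

```python
import pandas as pd

# Hypothetical raw shot records; names and values are illustrative only
raw_shots = pd.DataFrame({
    "body_part": ["foot", "head", "foot"],
    "assist_type": ["through_ball", "cross", "none"],
    "defenders_in_path": [1, 3, 0],
})

# One-hot encode the assist type into 0/1 indicator columns
features = pd.get_dummies(raw_shots, columns=["assist_type"], dtype=int)

# Turn the body part into a single binary "header" flag
features["header"] = (raw_shots["body_part"] == "head").astype(int)
features = features.drop(columns=["body_part"])

print(features)
```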
## Python Implementation
```python
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Sample shot data
shot_data = pd.DataFrame({
    'distance': [10, 20, 15, 8, 25, 12, 18, 6, 22, 14],
    'angle': [45, 30, 60, 15, 40, 50, 35, 10, 25, 55],
    'header': [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    'big_chance': [1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
    'defenders': [1, 3, 2, 0, 4, 2, 3, 1, 3, 2],
    'goal': [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]
})

# Calculate basic xG using a hand-tuned logistic heuristic (not a fitted model)
def calculate_basic_xg(distance, angle):
    """
    Simple xG heuristic based on distance and angle
    """
    # Distance factor (closer = higher xG)
    distance_factor = 1 / (1 + distance / 10)
    # Angle factor (this formula treats 45 degrees as the most favorable angle)
    angle_factor = np.cos(np.radians(angle - 45))
    # Combine factors
    xg = distance_factor * angle_factor
    # Apply logistic transformation to squash the score into (0, 1)
    xg = 1 / (1 + np.exp(-5 * (xg - 0.5)))
    return np.clip(xg, 0, 1)

# Apply basic xG calculation
shot_data['xG_basic'] = shot_data.apply(
    lambda row: calculate_basic_xg(row['distance'], row['angle']),
    axis=1
)

# Advanced xG model using machine learning
def train_xg_model(data):
    """
    Train a Random Forest model for xG prediction
    """
    features = ['distance', 'angle', 'header', 'big_chance', 'defenders']
    X = data[features]
    y = data['goal']
    # Split data (X_test and y_test are held out and not used for fitting)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )
    # Train model
    model = RandomForestClassifier(
        n_estimators=100,
        max_depth=5,
        random_state=42
    )
    model.fit(X_train, y_train)
    # Predict probabilities for every shot. This includes the training rows,
    # so these values are partly in-sample and will look better than
    # predictions on unseen shots.
    xg_predictions = model.predict_proba(X)[:, 1]
    return model, xg_predictions

# Train model and get xG predictions
model, shot_data['xG_ml'] = train_xg_model(shot_data)

# Player-level xG analysis
player_shots = pd.DataFrame({
    'player': ['Player A', 'Player A', 'Player B', 'Player B', 'Player C'],
    'xG': [0.15, 0.42, 0.08, 0.65, 0.22],
    'goal': [0, 1, 0, 1, 0]
})

# Calculate xG performance metrics
player_summary = player_shots.groupby('player').agg({
    'xG': 'sum',
    'goal': 'sum'
}).reset_index()
player_summary.columns = ['Player', 'Total_xG', 'Actual_Goals']
player_summary['xG_Performance'] = (
    player_summary['Actual_Goals'] - player_summary['Total_xG']
)
print("Player xG Performance:")
print(player_summary)
# Visualize xG performance
plt.figure(figsize=(10, 6))
x = np.arange(len(player_summary))
width = 0.35
plt.bar(x - width/2, player_summary['Total_xG'], width,
        label='Expected Goals (xG)', alpha=0.8)
plt.bar(x + width/2, player_summary['Actual_Goals'], width,
        label='Actual Goals', alpha=0.8)
plt.xlabel('Player')
plt.ylabel('Goals')
plt.title('Expected vs Actual Goals by Player')
plt.xticks(x, player_summary['Player'])
plt.legend()
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('xg_performance.png', dpi=300, bbox_inches='tight')
plt.show()

# Feature importance
feature_importance = pd.DataFrame({
    'feature': ['distance', 'angle', 'header', 'big_chance', 'defenders'],
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)
print("\nFeature Importance in xG Model:")
print(feature_importance)
```
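
The Random Forest example above predicts on rows it was also trained on, so it does not show how well the probabilities hold up on unseen shots. A minimal sketch of how xG predictions might be scored on held-out data follows, using the Brier score and log loss computed directly with NumPy. The outcome and prediction arrays are hypothetical values invented for this example, not results from the model above.

```python
import numpy as np

# Hypothetical held-out shots: 1 = goal, 0 = no goal, with model xG values
y_true = np.array([1, 0, 0, 1, 0])
xg_pred = np.array([0.7, 0.1, 0.3, 0.4, 0.2])


def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return float(np.mean((p - y) ** 2))


def log_loss(y, p, eps=1e-15):
    """Average negative log-likelihood; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Naive baseline: give every shot the overall conversion rate
baseline = np.full(len(y_true), y_true.mean())

print(f"Model Brier score:    {brier_score(y_true, xg_pred):.3f}")
print(f"Baseline Brier score: {brier_score(y_true, baseline):.3f}")
print(f"Model log loss:       {log_loss(y_true, xg_pred):.3f}")
```

Lower is better for both metrics. A model that cannot beat the constant-rate baseline on held-out shots is not adding information about chance quality.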
## R Implementation
```r
library(tidyverse)
library(randomForest)
library(ggplot2)

# Sample shot data
shot_data <- data.frame(
  distance = c(10, 20, 15, 8, 25, 12, 18, 6, 22, 14),
  angle = c(45, 30, 60, 15, 40, 50, 35, 10, 25, 55),
  header = c(0, 0, 1, 0, 0, 1, 0, 0, 0, 1),
  big_chance = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 0),
  defenders = c(1, 3, 2, 0, 4, 2, 3, 1, 3, 2),
  goal = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 0)
)

# Calculate basic xG using a hand-tuned logistic heuristic (not a fitted model)
calculate_basic_xg <- function(distance, angle) {
  # Distance factor (closer = higher xG)
  distance_factor <- 1 / (1 + distance / 10)
  # Angle factor (this formula treats 45 degrees as the most favorable angle)
  angle_factor <- cos((angle - 45) * pi / 180)
  # Combine factors
  xg <- distance_factor * angle_factor
  # Apply logistic transformation to squash the score into (0, 1)
  xg <- 1 / (1 + exp(-5 * (xg - 0.5)))
  # Clip between 0 and 1
  xg <- pmax(0, pmin(1, xg))
  return(xg)
}

# Apply basic xG calculation
shot_data <- shot_data %>%
  mutate(xG_basic = calculate_basic_xg(distance, angle))

# Train Random Forest xG model (seed set so results are reproducible)
set.seed(42)
xg_model <- randomForest(
  as.factor(goal) ~ distance + angle + header + big_chance + defenders,
  data = shot_data,
  ntree = 100,
  importance = TRUE
)

# Get xG predictions (in-sample: these are the same rows the model was trained on)
shot_data$xG_ml <- predict(xg_model, shot_data, type = "prob")[, 2]

# Player-level xG analysis
player_shots <- data.frame(
  player = c("Player A", "Player A", "Player B", "Player B", "Player C"),
  xG = c(0.15, 0.42, 0.08, 0.65, 0.22),
  goal = c(0, 1, 0, 1, 0)
)

# Calculate player xG performance
player_summary <- player_shots %>%
  group_by(player) %>%
  summarise(
    Total_xG = sum(xG),
    Actual_Goals = sum(goal)
  ) %>%
  mutate(xG_Performance = Actual_Goals - Total_xG)
print("Player xG Performance:")
print(player_summary)

# Visualize xG performance
xg_plot <- player_summary %>%
  pivot_longer(cols = c(Total_xG, Actual_Goals),
               names_to = "Metric",
               values_to = "Goals") %>%
  ggplot(aes(x = player, y = Goals, fill = Metric)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.8) +
  labs(
    title = "Expected vs Actual Goals by Player",
    x = "Player",
    y = "Goals",
    fill = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    legend.position = "bottom"
  )
print(xg_plot)

# Feature importance
importance_df <- data.frame(
  feature = rownames(importance(xg_model)),
  importance = importance(xg_model)[, "MeanDecreaseGini"]
) %>%
  arrange(desc(importance))
print("Feature Importance in xG Model:")
print(importance_df)

# Visualize feature importance
importance_plot <- ggplot(importance_df,
                          aes(x = reorder(feature, importance),
                              y = importance)) +
  geom_bar(stat = "identity", fill = "steelblue", alpha = 0.8) +
  coord_flip() +
  labs(
    title = "Feature Importance in xG Model",
    x = "Feature",
    y = "Importance"
  ) +
  theme_minimal()
print(importance_plot)
```
## Interpretation Guidelines
### Player Evaluation
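- **xG Performance > 0**: Player has scored more than expected; this may reflect clinical finishing, but over a small number of shots it can simply be variance
- **xG Performance < 0**: Player has scored fewer than expected; this may reflect poor finishing, strong goalkeeping, or bad luck
- **xG Performance ≈ 0**: Player is scoring at about the rate the chances suggest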
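
To judge whether a gap between goals and xG is meaningful, it can help to look at the full distribution of goals implied by a player's shots. The sketch below computes that distribution exactly, under the simplifying assumption that shots are independent events with success probability equal to their xG. The function name `goal_distribution` is introduced here for illustration, and the inputs are Player A's two shots from the sample data above.

```python
def goal_distribution(xg_values):
    """Exact distribution of total goals from a list of shot xG values.

    Assumes each shot is an independent Bernoulli trial with success
    probability equal to its xG. Returns a list where index k holds
    P(exactly k goals).
    """
    dist = [1.0]  # before any shot: probability 1 of having 0 goals
    for p in xg_values:
        updated = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            updated[goals] += prob * (1 - p)   # shot missed
            updated[goals + 1] += prob * p     # shot scored
        dist = updated
    return dist


# Player A: shots worth 0.15 and 0.42 xG (0.57 total), 1 goal scored
dist = goal_distribution([0.15, 0.42])
p_at_least_one = 1 - dist[0]

for goals, prob in enumerate(dist):
    print(f"P({goals} goals) = {prob:.3f}")
print(f"P(at least 1 goal) = {p_at_least_one:.3f}")  # about 0.507
```

Here Player A's xG Performance of +0.43 corresponds to an outcome that happens about half the time, so on two shots it says very little about finishing ability.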
### Team Analysis
- High team xG but low actual goals suggests poor finishing, strong opposition goalkeeping, or bad luck
- Low team xG but high goals suggests clinical finishing or luck
- Sustained xG overperformance is difficult to maintain
### Match Analysis
- Compare xG between teams to assess match dominance
- xG can reveal "unlucky" losses where a team created the better chances
- Useful for identifying sustainable performance vs. variance
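
One way to turn two teams' shot lists into a statement about who "deserved" a result is to simulate the match many times. The following is a minimal Monte Carlo sketch, assuming shots are independent and each is scored with probability equal to its xG. The function name `simulate_match` and both shot lists are hypothetical.

```python
import random


def simulate_match(home_xg, away_xg, n_sims=10_000, seed=0):
    """Estimate (home win, draw, away win) probabilities from shot xG lists.

    Assumes every shot is an independent event that is scored with
    probability equal to its xG value.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        home_goals = sum(rng.random() < p for p in home_xg)
        away_goals = sum(rng.random() < p for p in away_xg)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / n_sims, draws / n_sims, away_wins / n_sims


# Hypothetical match: home side created 1.00 xG, away side 0.20 xG
home_shots = [0.45, 0.30, 0.12, 0.08, 0.05]
away_shots = [0.10, 0.07, 0.03]

p_home, p_draw, p_away = simulate_match(home_shots, away_shots)
print(f"Home win: {p_home:.2f}, Draw: {p_draw:.2f}, Away win: {p_away:.2f}")
```

Even with a clear xG advantage, the away side still wins a share of the simulations, which is the sense in which a loss can be called "unlucky".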
## Limitations
1. **Shooter Skill**: Standard xG assumes an average finisher, so it doesn't account for the shooter's skill level
2. **Defensive Pressure**: Immediate pressure on the shooter is difficult to quantify
3. **Goalkeeper Quality**: Treats all goalkeepers as average
4. **Match Situation**: Doesn't consider score, time remaining, or tactical context
## Best Practices
- Use xG over multiple matches (minimum 10-15 games) for reliability
- Combine with qualitative analysis
- Consider xG per shot and xG per 90 minutes
- Track trends over time rather than single match values
- Use xG alongside actual goals for comprehensive evaluation
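
The rate and trend metrics mentioned above can be computed in a few lines of pandas. The sketch below uses hypothetical per-match totals for a single player; the column names and the three-match rolling window are choices made for this example only.

```python
import pandas as pd

# Hypothetical per-match totals for one player
matches = pd.DataFrame({
    "match": [1, 2, 3, 4],
    "minutes": [90, 60, 90, 30],
    "shots": [3, 2, 4, 1],
    "xG": [0.45, 0.20, 0.80, 0.05],
})

# Average chance quality: total xG divided by total shots
xg_per_shot = matches["xG"].sum() / matches["shots"].sum()

# Volume adjusted for playing time: total xG scaled to 90 minutes
xg_per_90 = matches["xG"].sum() / matches["minutes"].sum() * 90

# Trend: rolling mean of match xG over the last three matches
matches["xG_rolling_3"] = matches["xG"].rolling(window=3, min_periods=1).mean()

print(f"xG per shot: {xg_per_shot:.2f}")
print(f"xG per 90:   {xg_per_90:.2f}")
print(matches[["match", "xG", "xG_rolling_3"]])
```

Aggregating totals before dividing, rather than averaging per-match ratios, keeps short substitute appearances from distorting the per-90 figure.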