Expected Goals (xG) Analysis

Beginner | 10 min read | Nov 27, 2025

## Overview

Expected Goals (xG) is a statistical metric that measures the quality of a goal-scoring chance. It assigns each shot a probability between 0 and 1, representing the likelihood that an average player would score from that position and situation.

## Key Factors in xG Calculation

### Shot Location
- Distance from goal
- Angle to goal
- Location within the box

### Shot Type
- Header vs. foot
- Weak foot vs. strong foot
- Set piece vs. open play

### Situation Context
- Number of defenders between shot and goal
- Goalkeeper position
- Type of assist (through ball, cross, etc.)

## Python Implementation

```python
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Sample shot data (toy values, far too small for a real model)
shot_data = pd.DataFrame({
    'distance': [10, 20, 15, 8, 25, 12, 18, 6, 22, 14],
    'angle': [45, 30, 60, 15, 40, 50, 35, 10, 25, 55],
    'header': [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    'big_chance': [1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
    'defenders': [1, 3, 2, 0, 4, 2, 3, 1, 3, 2],
    'goal': [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]
})

# Calculate basic xG using a logistic function
def calculate_basic_xg(distance, angle):
    """
    Simple xG model based on distance and angle
    """
    # Distance factor (closer = higher xG)
    distance_factor = 1 / (1 + distance / 10)

    # Angle factor (peaks at 45 degrees, which this toy model treats as central)
    angle_factor = np.cos(np.radians(angle - 45))

    # Combine factors
    xg = distance_factor * angle_factor

    # Apply logistic transformation
    xg = 1 / (1 + np.exp(-5 * (xg - 0.5)))

    # The logistic output already lies in (0, 1); the clip is only a safeguard
    return np.clip(xg, 0, 1)

# Apply basic xG calculation (the function is vectorized, so pass whole columns)
shot_data['xG_basic'] = calculate_basic_xg(shot_data['distance'], shot_data['angle'])

# Advanced xG model using machine learning
def train_xg_model(data):
    """
    Train a Random Forest model for xG prediction
    """
    features = ['distance', 'angle', 'header', 'big_chance', 'defenders']
    X = data[features]
    y = data['goal']

    # Split data; stratify so both outcomes appear in the small training set.
    # X_test and y_test are held out for evaluation and are not used below.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )

    # Train model
    model = RandomForestClassifier(
        n_estimators=100,
        max_depth=5,
        random_state=42
    )
    model.fit(X_train, y_train)

    # Predict probabilities for every shot. This includes the training rows,
    # so these values are optimistic compared with unseen shots.
    xg_predictions = model.predict_proba(X)[:, 1]

    return model, xg_predictions

# Train model and get xG predictions
model, shot_data['xG_ml'] = train_xg_model(shot_data)

# Player-level xG analysis
player_shots = pd.DataFrame({
    'player': ['Player A', 'Player A', 'Player B', 'Player B', 'Player C'],
    'xG': [0.15, 0.42, 0.08, 0.65, 0.22],
    'goal': [0, 1, 0, 1, 0]
})

# Calculate xG performance metrics
player_summary = player_shots.groupby('player').agg({
    'xG': 'sum',
    'goal': 'sum'
}).reset_index()

player_summary.columns = ['Player', 'Total_xG', 'Actual_Goals']
player_summary['xG_Performance'] = (
    player_summary['Actual_Goals'] - player_summary['Total_xG']
)

print("Player xG Performance:")
print(player_summary)

# Visualize xG performance
plt.figure(figsize=(10, 6))
x = np.arange(len(player_summary))
width = 0.35

plt.bar(x - width/2, player_summary['Total_xG'], width,
        label='Expected Goals (xG)', alpha=0.8)
plt.bar(x + width/2, player_summary['Actual_Goals'], width,
        label='Actual Goals', alpha=0.8)

plt.xlabel('Player')
plt.ylabel('Goals')
plt.title('Expected vs Actual Goals by Player')
plt.xticks(x, player_summary['Player'])
plt.legend()
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('xg_performance.png', dpi=300, bbox_inches='tight')
plt.show()

# Feature importance
feature_importance = pd.DataFrame({
    'feature': ['distance', 'angle', 'header', 'big_chance', 'defenders'],
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nFeature Importance in xG Model:")
print(feature_importance)
```

## R Implementation

```r
library(tidyverse)    # attaches dplyr, tidyr and ggplot2
library(randomForest)

# Sample shot data (toy values, far too small for a real model)
shot_data <- data.frame(
  distance = c(10, 20, 15, 8, 25, 12, 18, 6, 22, 14),
  angle = c(45, 30, 60, 15, 40, 50, 35, 10, 25, 55),
  header = c(0, 0, 1, 0, 0, 1, 0, 0, 0, 1),
  big_chance = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 0),
  defenders = c(1, 3, 2, 0, 4, 2, 3, 1, 3, 2),
  goal = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 0)
)

# Calculate basic xG
calculate_basic_xg <- function(distance, angle) {
  # Distance factor (closer = higher xG)
  distance_factor <- 1 / (1 + distance / 10)

  # Angle factor (peaks at 45 degrees, which this toy model treats as central)
  angle_factor <- cos((angle - 45) * pi / 180)

  # Combine factors
  xg <- distance_factor * angle_factor

  # Apply logistic transformation
  xg <- 1 / (1 + exp(-5 * (xg - 0.5)))

  # Clip between 0 and 1 (a safeguard; the logistic output is already in range)
  xg <- pmax(0, pmin(1, xg))

  return(xg)
}

# Apply basic xG calculation
shot_data <- shot_data %>%
  mutate(xG_basic = calculate_basic_xg(distance, angle))

# Train Random Forest xG model (seed fixed so the forest is reproducible)
set.seed(42)
xg_model <- randomForest(
  as.factor(goal) ~ distance + angle + header + big_chance + defenders,
  data = shot_data,
  ntree = 100,
  importance = TRUE
)

# Get xG predictions (in-sample, so optimistic compared with unseen shots)
shot_data$xG_ml <- predict(xg_model, shot_data, type = "prob")[, 2]

# Player-level xG analysis
player_shots <- data.frame(
  player = c("Player A", "Player A", "Player B", "Player B", "Player C"),
  xG = c(0.15, 0.42, 0.08, 0.65, 0.22),
  goal = c(0, 1, 0, 1, 0)
)

# Calculate player xG performance
player_summary <- player_shots %>%
  group_by(player) %>%
  summarise(
    Total_xG = sum(xG),
    Actual_Goals = sum(goal)
  ) %>%
  mutate(xG_Performance = Actual_Goals - Total_xG)

print("Player xG Performance:")
print(player_summary)

# Visualize xG performance
xg_plot <- player_summary %>%
  pivot_longer(cols = c(Total_xG, Actual_Goals),
               names_to = "Metric",
               values_to = "Goals") %>%
  ggplot(aes(x = player, y = Goals, fill = Metric)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.8) +
  labs(
    title = "Expected vs Actual Goals by Player",
    x = "Player",
    y = "Goals",
    fill = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    legend.position = "bottom"
  )

print(xg_plot)

# Feature importance
importance_df <- data.frame(
  feature = rownames(importance(xg_model)),
  importance = importance(xg_model)[, "MeanDecreaseGini"]
) %>%
  arrange(desc(importance))

print("Feature Importance in xG Model:")
print(importance_df)

# Visualize feature importance
importance_plot <- ggplot(importance_df,
                          aes(x = reorder(feature, importance), y = importance)) +
  geom_bar(stat = "identity", fill = "steelblue", alpha = 0.8) +
  coord_flip() +
  labs(
    title = "Feature Importance in xG Model",
    x = "Feature",
    y = "Importance"
  ) +
  theme_minimal()

print(importance_plot)
```

## Interpretation Guidelines

Here, xG Performance is actual goals minus total xG, as computed in the code above.

### Player Evaluation
- **xG Performance > 0**: Player is scoring more than expected (clinical finishing, or favorable variance over a small sample)
- **xG Performance < 0**: Player is scoring less than expected (poor finishing, or unfavorable variance over a small sample)
- **xG Performance ≈ 0**: Player is performing as expected

### Team Analysis
- High team xG but low actual goals suggests poor finishing or bad luck
- Low team xG but high actual goals suggests clinical finishing or good luck
- Large xG overperformance is rarely sustained over long periods

### Match Analysis
- Compare xG between teams to assess which side created the better chances
- xG can reveal "unlucky" losses in which the losing team created the better chances
- Useful for separating sustainable performance from variance

## Limitations

1. **Context Blind**: xG is defined for an average player, so it doesn't account for the shooter's skill level
2. **Defensive Pressure**: Immediate pressure on the shooter is difficult to quantify
3. **Goalkeeper Quality**: Treats all goalkeepers as average
4. **Match Situation**: Doesn't consider score, time remaining, or tactical context

## Best Practices

- Use xG over multiple matches (minimum 10-15 games) for reliability
- Combine with qualitative analysis
- Consider xG per shot and xG per 90 minutes
- Track trends over time rather than single-match values
- Use xG alongside actual goals for comprehensive evaluation
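The per-shot and per-90 rates mentioned under Best Practices can be sketched as follows. This is a minimal illustration, not a definitive implementation. The shot values repeat the `player_shots` sample from the Python section. The `minutes_played` figures and the `summarise_xg_rates` helper are hypothetical, invented only to make the arithmetic concrete.

```python
import pandas as pd

# Shot-level sample (same values as the player_shots table in the Python section)
player_shots = pd.DataFrame({
    'player': ['Player A', 'Player A', 'Player B', 'Player B', 'Player C'],
    'xG': [0.15, 0.42, 0.08, 0.65, 0.22],
    'goal': [0, 1, 0, 1, 0],
})

# Hypothetical minutes played, invented purely for illustration
minutes_played = pd.Series(
    {'Player A': 180, 'Player B': 90, 'Player C': 45}, name='minutes'
)

def summarise_xg_rates(shots, minutes):
    """Aggregate shots per player and derive per-shot and per-90 xG rates."""
    summary = shots.groupby('player').agg(
        shots=('xG', 'size'),      # number of shots taken
        total_xg=('xG', 'sum'),    # cumulative chance quality
        goals=('goal', 'sum'),     # goals actually scored
    )
    # Attach minutes by player name (the index of both objects)
    summary = summary.join(minutes)

    # Average chance quality per attempt
    summary['xg_per_shot'] = summary['total_xg'] / summary['shots']
    # Chance quality normalised to a full 90-minute match
    summary['xg_per_90'] = summary['total_xg'] / summary['minutes'] * 90
    return summary

xg_rates = summarise_xg_rates(player_shots, minutes_played)
print(xg_rates.round(3))
```

Per-90 rates inflate quickly when minutes are low, so in practice it may be sensible to apply a minimum-minutes threshold before comparing players; the cutoff is a judgment call.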
