Expected Goals (xG) Analysis

Beginner 10 min read 1 views Nov 27, 2025
## Overview

Expected Goals (xG) is a statistical metric that measures the quality of a goal-scoring chance. It assigns each shot a probability between 0 and 1, representing the likelihood that an average player would score from that position and situation.

## Key Factors in xG Calculation

### Shot Location

- Distance from goal
- Angle to goal
- Location within the box

### Shot Type

- Header vs. foot
- Weak foot vs. strong foot
- Set piece vs. open play

### Situation Context

- Number of defenders between the shooter and the goal
- Goalkeeper position
- Type of assist (through ball, cross, etc.)

## Python Implementation

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Sample shot data (ten illustrative rows; far too few to fit a real model)
shot_data = pd.DataFrame({
    'distance': [10, 20, 15, 8, 25, 12, 18, 6, 22, 14],
    'angle': [45, 30, 60, 15, 40, 50, 35, 10, 25, 55],
    'header': [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    'big_chance': [1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
    'defenders': [1, 3, 2, 0, 4, 2, 3, 1, 3, 2],
    'goal': [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]
})

FEATURES = ['distance', 'angle', 'header', 'big_chance', 'defenders']


# Calculate basic xG using a logistic function
def calculate_basic_xg(distance, angle):
    """Simple xG model based on distance and angle."""
    # Distance factor (closer = higher xG)
    distance_factor = 1 / (1 + distance / 10)

    # Angle factor (central = higher xG); this toy model treats 45 degrees as central
    angle_factor = np.cos(np.radians(angle - 45))

    # Combine factors
    xg = distance_factor * angle_factor

    # Apply logistic transformation (output already lies strictly between 0 and 1)
    xg = 1 / (1 + np.exp(-5 * (xg - 0.5)))

    return np.clip(xg, 0, 1)


# Apply basic xG calculation (the function is vectorized, so no row-wise apply is needed)
shot_data['xG_basic'] = calculate_basic_xg(shot_data['distance'], shot_data['angle'])


# Advanced xG model using machine learning
def train_xg_model(data):
    """Train a Random Forest model for xG prediction."""
    X = data[FEATURES]
    y = data['goal']

    # Hold out 30% of shots; stratify so both outcomes appear in each split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )

    # Train model
    model = RandomForestClassifier(
        n_estimators=100,
        max_depth=5,
        random_state=42
    )
    model.fit(X_train, y_train)

    # Held-out accuracy (only three shots here, so treat it as a smoke test)
    print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")

    # Predict goal probability for every shot; this includes the training rows,
    # so the values are optimistic
    xg_predictions = model.predict_proba(X)[:, 1]

    return model, xg_predictions


# Train model and get xG predictions
model, xg_ml = train_xg_model(shot_data)
shot_data['xG_ml'] = xg_ml

# Player-level xG analysis
player_shots = pd.DataFrame({
    'player': ['Player A', 'Player A', 'Player B', 'Player B', 'Player C'],
    'xG': [0.15, 0.42, 0.08, 0.65, 0.22],
    'goal': [0, 1, 0, 1, 0]
})

# Calculate xG performance metrics
player_summary = player_shots.groupby('player').agg({
    'xG': 'sum',
    'goal': 'sum'
}).reset_index()

player_summary.columns = ['Player', 'Total_xG', 'Actual_Goals']
player_summary['xG_Performance'] = (
    player_summary['Actual_Goals'] - player_summary['Total_xG']
)

print("Player xG Performance:")
print(player_summary)

# Visualize xG performance
plt.figure(figsize=(10, 6))
x = np.arange(len(player_summary))
width = 0.35

plt.bar(x - width/2, player_summary['Total_xG'], width,
        label='Expected Goals (xG)', alpha=0.8)
plt.bar(x + width/2, player_summary['Actual_Goals'], width,
        label='Actual Goals', alpha=0.8)

plt.xlabel('Player')
plt.ylabel('Goals')
plt.title('Expected vs Actual Goals by Player')
plt.xticks(x, player_summary['Player'])
plt.legend()
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('xg_performance.png', dpi=300, bbox_inches='tight')
plt.show()

# Feature importance
feature_importance = pd.DataFrame({
    'feature': FEATURES,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nFeature Importance in xG Model:")
print(feature_importance)
```

## R Implementation

```r
library(tidyverse)     # loads dplyr, tidyr, and ggplot2
library(randomForest)

set.seed(42)  # make the random forest reproducible

# Sample shot data (ten illustrative rows; far too few to fit a real model)
shot_data <- data.frame(
  distance = c(10, 20, 15, 8, 25, 12, 18, 6, 22, 14),
  angle = c(45, 30, 60, 15, 40, 50, 35, 10, 25, 55),
  header = c(0, 0, 1, 0, 0, 1, 0, 0, 0, 1),
  big_chance = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 0),
  defenders = c(1, 3, 2, 0, 4, 2, 3, 1, 3, 2),
  goal = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 0)
)

# Calculate basic xG
calculate_basic_xg <- function(distance, angle) {
  # Distance factor (closer = higher xG)
  distance_factor <- 1 / (1 + distance / 10)

  # Angle factor (central = higher xG); this toy model treats 45 degrees as central
  angle_factor <- cos((angle - 45) * pi / 180)

  # Combine factors
  xg <- distance_factor * angle_factor

  # Apply logistic transformation
  xg <- 1 / (1 + exp(-5 * (xg - 0.5)))

  # Clip between 0 and 1
  xg <- pmax(0, pmin(1, xg))

  return(xg)
}

# Apply basic xG calculation
shot_data <- shot_data %>%
  mutate(xG_basic = calculate_basic_xg(distance, angle))

# Train Random Forest xG model
xg_model <- randomForest(
  as.factor(goal) ~ distance + angle + header + big_chance + defenders,
  data = shot_data,
  ntree = 100,
  importance = TRUE
)

# Get xG predictions: probability of class "1" (goal).
# These are in-sample predictions on the training rows, so they are optimistic.
shot_data$xG_ml <- predict(xg_model, shot_data, type = "prob")[, "1"]

# Player-level xG analysis
player_shots <- data.frame(
  player = c("Player A", "Player A", "Player B", "Player B", "Player C"),
  xG = c(0.15, 0.42, 0.08, 0.65, 0.22),
  goal = c(0, 1, 0, 1, 0)
)

# Calculate player xG performance
player_summary <- player_shots %>%
  group_by(player) %>%
  summarise(
    Total_xG = sum(xG),
    Actual_Goals = sum(goal)
  ) %>%
  mutate(xG_Performance = Actual_Goals - Total_xG)

print("Player xG Performance:")
print(player_summary)

# Visualize xG performance
xg_plot <- player_summary %>%
  pivot_longer(cols = c(Total_xG, Actual_Goals),
               names_to = "Metric",
               values_to = "Goals") %>%
  ggplot(aes(x = player, y = Goals, fill = Metric)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.8) +
  labs(
    title = "Expected vs Actual Goals by Player",
    x = "Player",
    y = "Goals",
    fill = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    legend.position = "bottom"
  )

print(xg_plot)

# Feature importance
importance_df <- data.frame(
  feature = rownames(importance(xg_model)),
  importance = importance(xg_model)[, "MeanDecreaseGini"]
) %>%
  arrange(desc(importance))

print("Feature Importance in xG Model:")
print(importance_df)

# Visualize feature importance
importance_plot <- ggplot(importance_df,
                          aes(x = reorder(feature, importance), y = importance)) +
  geom_bar(stat = "identity", fill = "steelblue", alpha = 0.8) +
  coord_flip() +
  labs(
    title = "Feature Importance in xG Model",
    x = "Feature",
    y = "Importance"
  ) +
  theme_minimal()

print(importance_plot)
```

## Interpretation Guidelines

### Player Evaluation

- **xG Performance > 0**: Player is scoring more than expected (clinical finishing, or a favourable run over a small sample)
- **xG Performance < 0**: Player is scoring less than expected (poor finishing, or an unlucky run over a small sample)
- **xG Performance ≈ 0**: Player is scoring about as often as the chances suggest

### Team Analysis

- High team xG but low actual goals suggests poor finishing or bad luck
- Low team xG but high actual goals suggests clinical finishing or good luck
- Sustained xG overperformance is difficult to maintain

### Match Analysis

- Compare xG between teams to assess which side created the better chances
- xG can reveal "unlucky" losses where the losing team created the better chances
- Useful for separating sustainable performance from variance

## Limitations

1. **Context Blind**: Most xG models do not account for the individual shooter's skill level
2. **Defensive Pressure**: Immediate pressure on the shooter is difficult to quantify
3. **Goalkeeper Quality**: All goalkeepers are treated as average
4. **Match Situation**: Score, time remaining, and tactical context are typically not considered

## Best Practices

- Use xG over multiple matches (minimum 10-15 games) for reliability
- Combine with qualitative analysis
- Consider xG per shot and xG per 90 minutes
- Track trends over time rather than single-match values
- Use xG alongside actual goals for a comprehensive evaluation
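
The rate metrics named under Best Practices, and the distinction between finishing skill and variance, can be sketched in a few lines. The block below is a minimal illustration, not a definitive method: it reuses the player shot values from this article, the `minutes_played` figures are hypothetical (the original data has no minutes), and `prob_at_least` is an illustrative helper that assumes each shot is an independent trial with success probability equal to its xG.

```python
import numpy as np
import pandas as pd

# Shot-level xG values reused from the player example in this article
player_shots = pd.DataFrame({
    'player': ['Player A', 'Player A', 'Player B', 'Player B', 'Player C'],
    'xG': [0.15, 0.42, 0.08, 0.65, 0.22],
    'goal': [0, 1, 0, 1, 0],
})

# Hypothetical minutes played (not in the original data; illustration only)
minutes_played = pd.Series({'Player A': 180, 'Player B': 90, 'Player C': 45})

# Per-player totals, indexed by player name
summary = player_shots.groupby('player').agg(
    shots=('xG', 'size'),
    total_xg=('xG', 'sum'),
    goals=('goal', 'sum'),
)

# Rate metrics: average chance quality and xG volume per 90 minutes
summary['xg_per_shot'] = summary['total_xg'] / summary['shots']
summary['xg_per_90'] = summary['total_xg'] / minutes_played * 90


def prob_at_least(xg_values, goals, n_sims=20_000, seed=0):
    """Estimate P(simulated goals >= goals), treating shots as independent."""
    rng = np.random.default_rng(seed)
    xg = np.asarray(xg_values, dtype=float)
    # Each row is one simulated replay of the same set of shots
    simulated = (rng.random((n_sims, xg.size)) < xg).sum(axis=1)
    return float((simulated >= goals).mean())


# How often would an average finisher match or beat the actual goal tally?
summary['p_at_least_actual'] = [
    prob_at_least(player_shots.loc[player_shots['player'] == name, 'xG'], goals)
    for name, goals in summary['goals'].items()
]

print(summary.round(3))
```

Under the independence assumption, Player A's exact probability of scoring at least once from these two shots is 1 - 0.85 × 0.58 ≈ 0.51, so one goal from 0.57 xG is entirely consistent with average finishing. A value close to 0 over a much larger shot sample would be stronger evidence of genuine overperformance.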
