Goals Saved Above Expected (GSAx)
Goals Saved Above Expected (GSAx) measures how many goals a goaltender has prevented relative to what an expected-goals model predicts from the shots that goaltender faced. A positive value means fewer goals allowed than expected; a negative value means more. Because it accounts for shot quality, GSAx gives teams, analysts, and fans a more context-aware view of goaltending than traditional counting statistics such as save percentage or goals-against average.
Understanding Goals Saved Above Expected (GSAx)
In the modern NHL, GSAx has become a standard tool for front offices, coaching staffs, and analytics departments. It grew out of dissatisfaction with traditional hockey statistics, which often fail to account for context, quality of competition, team effects, and random variation. By fitting statistical and machine learning models to shot data, analysts can estimate how dangerous each shot was and separate a goaltender's contribution from that of the defense more accurately.
The foundation of this analysis is the play-by-play data collected from every NHL game since the 2007-08 season. Each event (a shot attempt, faceoff, hit, or penalty, for example) is recorded with location coordinates, a timestamp, game state information (score, time remaining, manpower situation), and the players on the ice. This granular data makes it possible to build models that estimate expected outcomes and to compare actual results against those baselines.
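As a minimal sketch of how location coordinates feed such a model, the snippet below turns a shot's (x, y) position into the two features most expected-goals models start from: distance to the net and shot angle. It assumes the common rink coordinate convention (feet, center ice at the origin, goal line at x = ±89); the function name shot_features and the constant NET_X are illustrative and not part of any library.

```python
import math

# Assumption: coordinates are in feet with center ice at (0, 0) and the
# goal line at x = +/-89, which is the usual NHL play-by-play convention.
NET_X = 89.0


def shot_features(x, y):
    """Return (distance_ft, angle_deg) for a shot taken at (x, y).

    abs(x) folds both ends of the rink onto one attacking net, so this
    only makes sense for shots taken in the attacking half of the ice.
    """
    dx = NET_X - abs(x)          # distance along the length of the rink
    dy = abs(y)                  # lateral offset from the middle of the net
    distance = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx))  # 0 = straight on
    return distance, angle


# A shot from the slot, 30 ft straight out from the net
print(shot_features(59.0, 0.0))
# A sharp-angle shot near the goal line at the other end of the rink
print(shot_features(-79.0, 10.0))
```

A real model would add many more inputs (shot type, rebounds, game state), but distance and angle alone already explain much of the difference between a dangerous shot and a harmless one.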
Key Formulas and Metrics
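Expected Goals (xG) = Σ_i P(Goal | features of shot i)
Goals Above Expected (GAx) = Actual Goals - Expected Goals
Goals Saved Above Expected (GSAx) = Expected Goals Against - Actual Goals Against
Shooting Talent = GAx / Shots
xG Per 60 = (xG / TOI in minutes) × 60
Shot Quality = xG / Shot Attempts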
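For a goaltender, the same expected-goals idea is applied to shots faced: GSAx is the sum of xG on those shots minus the goals actually allowed. The sketch below shows that arithmetic with pandas on a small made-up table. The column names (goalie, xg, is_goal), the helper goals_saved_above_expected, and the xG values are all illustrative assumptions, not output from a real model or data set.

```python
import pandas as pd


def goals_saved_above_expected(shots, goalie_col="goalie",
                               xg_col="xg", goal_col="is_goal"):
    """Aggregate per-shot xG and outcomes into GSAx per goaltender.

    GSAx = expected goals against - actual goals against.
    """
    summary = shots.groupby(goalie_col).agg(
        shots_faced=(goal_col, "count"),
        xga=(xg_col, "sum"),   # expected goals against
        ga=(goal_col, "sum"),  # actual goals against
    )
    summary["gsax"] = summary["xga"] - summary["ga"]
    return summary.reset_index()


# Toy data: two goalies face identical shot quality (0.70 xG in total).
toy_shots = pd.DataFrame({
    "goalie": ["Goalie A"] * 4 + ["Goalie B"] * 4,
    "xg": [0.10, 0.30, 0.05, 0.25, 0.10, 0.30, 0.05, 0.25],
    "is_goal": [0, 0, 0, 0, 0, 1, 0, 0],
})

summary = goals_saved_above_expected(toy_shots)
print(summary.round(2).to_string(index=False))
```

Goalie A stops everything and finishes at about +0.70; Goalie B allows one goal on the same workload and finishes at about -0.30. In practice the input would usually be restricted to unblocked shot attempts or shots on goal, depending on how the xG model was trained.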
Python Implementation
import numpy as np
import pandas as pd
from hockey_scraper import scrape_games

# Scrape play-by-play for the first 100 regular-season games of 2023-24.
# The second argument (True) also requests shift data.
season_games = list(range(2023020001, 2023020101))
pbp_data = scrape_games(season_games, True, data_format='Pandas')

# Extract play-by-play and shifts data
pbp = pbp_data['pbp']
shifts = pbp_data['shifts']

# Assumption: on-ice skaters are listed in homePlayer1..6 / awayPlayer1..6 and
# names are stored in upper case (e.g. 'CONNOR MCDAVID'). Check pbp.columns
# for your hockey_scraper version before relying on these names.
HOME_ON_ICE = [f'homePlayer{i}' for i in range(1, 7)]
AWAY_ON_ICE = [f'awayPlayer{i}' for i in range(1, 7)]
SHOT_ATTEMPTS = ['SHOT', 'MISS', 'BLOCK', 'GOAL']


def upper(series):
    """Upper-case a column so name matching is case-insensitive."""
    return series.astype(str).str.upper()


def on_ice(pbp_df, columns, name):
    """Boolean mask: True where `name` appears in any of the given columns."""
    return pbp_df[columns].apply(upper).eq(name).any(axis=1)


# Calculate player-level metrics
def calculate_player_metrics(pbp_df, shifts_df, player_name):
    name = player_name.upper()

    # Filter events where the player was on the ice
    home_mask = on_ice(pbp_df, HOME_ON_ICE, name)
    away_mask = on_ice(pbp_df, AWAY_ON_ICE, name)
    player_events = pbp_df[home_mask | away_mask].copy()

    # Determine the player's team for each event
    player_events['Player_Team'] = np.where(
        home_mask[home_mask | away_mask],
        player_events['Home_Team'],
        player_events['Away_Team'],
    )

    # Shot attempts (Corsi)
    shot_events = player_events[player_events['Event'].isin(SHOT_ATTEMPTS)]
    corsi_for = int((shot_events['Ev_Team'] == shot_events['Player_Team']).sum())
    corsi_against = len(shot_events) - corsi_for
    attempts = corsi_for + corsi_against
    corsi_pct = corsi_for / attempts * 100 if attempts > 0 else 0

    # Time on ice (shift durations are in seconds)
    player_shifts = shifts_df[upper(shifts_df['Player']) == name]
    toi_minutes = player_shifts['Duration'].sum() / 60

    # Per-60 rates
    cf_60 = corsi_for / toi_minutes * 60 if toi_minutes > 0 else 0
    ca_60 = corsi_against / toi_minutes * 60 if toi_minutes > 0 else 0

    # Goals (p1 is the scorer) and assists (p2 and p3)
    goal_events = pbp_df[pbp_df['Event'] == 'GOAL']
    goals = int((upper(goal_events['p1_name']) == name).sum())
    assists = int(((upper(goal_events['p2_name']) == name) |
                   (upper(goal_events['p3_name']) == name)).sum())

    return {
        'player': player_name,
        'toi_minutes': round(toi_minutes, 2),
        'corsi_for': corsi_for,
        'corsi_against': corsi_against,
        'corsi_pct': round(corsi_pct, 2),
        'cf_60': round(cf_60, 2),
        'ca_60': round(ca_60, 2),
        'goals': goals,
        'assists': assists,
        'points': goals + assists,
    }


# Analyze multiple players
players = ['Connor McDavid', 'Auston Matthews', 'Cale Makar']
results = [calculate_player_metrics(pbp, shifts, player) for player in players]

# Convert to DataFrame
metrics_df = pd.DataFrame(results)
print("Player Analytics Summary:")
print(metrics_df.to_string(index=False))
# Advanced: Calculate team-level aggregates
def team_analytics(pbp_df, team_name):
    # Keep every event from games the team played in
    team_events = pbp_df[
        (pbp_df['Home_Team'] == team_name) |
        (pbp_df['Away_Team'] == team_name)
    ].copy()

    # Team shot attempts
    team_shots = team_events[team_events['Event'].isin(['SHOT', 'MISS', 'BLOCK', 'GOAL'])]
    shots_for = len(team_shots[team_shots['Ev_Team'] == team_name])
    shots_against = len(team_shots[team_shots['Ev_Team'] != team_name])
    total_shots = shots_for + shots_against

    # Team goals
    goal_events = team_events[team_events['Event'] == 'GOAL']
    goals_for = len(goal_events[goal_events['Ev_Team'] == team_name])
    goals_against = len(goal_events[goal_events['Ev_Team'] != team_name])

    return {
        'team': team_name,
        'shots_for': shots_for,
        'shots_against': shots_against,
        # Share of all shot attempts taken by this team (guarded against 0/0)
        'shot_pct': round(shots_for / total_shots * 100, 2) if total_shots > 0 else 0,
        'goals_for': goals_for,
        'goals_against': goals_against,
        'goal_diff': goals_for - goals_against,
    }


# Example team analysis
team_stats = team_analytics(pbp, 'EDM')
print("\nTeam Analytics:")
print(team_stats)
R Implementation
library(hockeyR)
library(tidyverse)

# Load play-by-play data. hockeyR labels a season by the year in which it
# ends, so 2023 is assumed here to mean the 2022-23 season.
pbp_data <- load_pbp(2023)

# Assumption: the column names used below (home_on_*, event_team_abbr,
# event_player_*_name) match your hockeyR version; check names(pbp_data).
home_cols <- paste0("home_on_", 1:6)
away_cols <- paste0("away_on_", 1:6)

# Calculate player-level metrics
calculate_player_stats <- function(pbp, player_name) {
  is_player <- function(x) str_detect(x, fixed(player_name))

  # Filter for player on-ice events and tag the player's team
  player_pbp <- pbp %>%
    mutate(
      on_home = coalesce(if_any(all_of(home_cols), is_player), FALSE),
      on_away = coalesce(if_any(all_of(away_cols), is_player), FALSE)
    ) %>%
    filter(on_home | on_away) %>%
    mutate(player_team = if_else(on_home, home_abbreviation, away_abbreviation))

  # Calculate Corsi (shot attempts)
  corsi_events <- player_pbp %>%
    filter(event_type %in% c("SHOT", "MISSED_SHOT", "BLOCKED_SHOT", "GOAL"))

  # Compare abbreviations with abbreviations (event_team holds the full name)
  corsi_for <- corsi_events %>%
    filter(event_team_abbr == player_team) %>%
    nrow()

  corsi_against <- corsi_events %>%
    filter(event_team_abbr != player_team) %>%
    nrow()

  corsi_pct <- if ((corsi_for + corsi_against) > 0) {
    (corsi_for / (corsi_for + corsi_against)) * 100
  } else {
    0
  }

  # Count goals (player 1 is the scorer) and assists (players 2 and 3)
  goals <- pbp %>%
    filter(event_type == "GOAL",
           event_player_1_name == player_name) %>%
    nrow()

  assists <- pbp %>%
    filter(event_type == "GOAL",
           event_player_2_name == player_name |
             event_player_3_name == player_name) %>%
    nrow()

  # Return metrics
  tibble(
    player = player_name,
    corsi_for = corsi_for,
    corsi_against = corsi_against,
    corsi_pct = round(corsi_pct, 2),
    goals = goals,
    assists = assists,
    points = goals + assists
  )
}

# Analyze multiple players.
# hockeyR typically stores names dot-separated (e.g. "Connor.McDavid");
# adjust these strings if your data uses a different format.
players <- c("Connor.McDavid", "Auston.Matthews", "Cale.Makar")
player_stats <- map_df(players, ~calculate_player_stats(pbp_data, .x))

cat("Player Statistics:\n")
print(player_stats)
# Team-level analysis
team_stats <- pbp_data %>%
  filter(event_type %in% c("SHOT", "GOAL", "MISSED_SHOT", "BLOCKED_SHOT")) %>%
  group_by(event_team) %>%
  summarize(
    shot_attempts = n(),
    # Goals are shots on goal too, so count both event types
    shots_on_goal = sum(event_type %in% c("SHOT", "GOAL")),
    goals = sum(event_type == "GOAL"),
    shooting_pct = (goals / shots_on_goal) * 100,
    .groups = "drop"
  ) %>%
  arrange(desc(shot_attempts))

cat("\nTeam Statistics:\n")
print(team_stats)
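# Advanced: Zone-based analysis
# Assumption: x_fixed is oriented so the home team always attacks toward
# positive x. Flipping the sign for the away team puts every shot in the
# shooting team's own frame of reference (blue lines sit at x = +/-25).
zone_analysis <- pbp_data %>%
  filter(event_type %in% c("SHOT", "GOAL")) %>%
  mutate(
    attack_x = if_else(event_team_type == "home", x_fixed, -x_fixed),
    zone = case_when(
      attack_x > 25 ~ "Offensive",
      attack_x < -25 ~ "Defensive",
      TRUE ~ "Neutral"
    )
  ) %>%
  group_by(event_team, zone) %>%
  summarize(
    shots = n(),
    goals = sum(event_type == "GOAL"),
    .groups = "drop"
  )

cat("\nZone-Based Shooting:\n")
print(zone_analysis)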
Practical Applications in NHL Teams
NHL front offices and coaching staffs use GSAx in goaltender evaluation, contract negotiations, and strategic decision-making. During the draft process, scouts combine traditional scouting reports with analytics to identify undervalued prospects who excel in metrics that predict NHL success. For free agent signings, teams compare asking prices to projected on-ice value using regression models trained on historical data.
In-game coaching decisions also benefit from this analysis. Teams track real-time metrics to optimize line matching, identify which players perform best against specific opponents, and determine when to pull the goalie for an extra attacker. Between games, video coaches use these metrics to prepare scouting reports, highlighting opponent tendencies in faceoff formations, power play entries, and defensive zone coverage schemes.
Player development programs incorporate this analysis to provide individualized feedback. Young players receive detailed reports showing their strengths and weaknesses relative to NHL benchmarks, with specific drills designed to address deficiencies. Veteran players use the data to adapt their game as they age, potentially extending careers by maximizing efficiency in advantageous situations while avoiding unfavorable matchups.
Key Takeaways
- Goals Saved Above Expected (GSAx) provides objective, data-driven insights beyond traditional counting statistics
- Modern analytics use play-by-play data with location coordinates and game state context
- Python's hockey_scraper and R's hockeyR packages enable easy access to NHL data
- Rate statistics (per 60 minutes) allow fair comparisons across players with different ice time
- Relative metrics isolate individual contributions by controlling for team effects
- NHL teams use these analytics for player evaluation, contract negotiations, and strategy
- Combining multiple metrics provides a more complete picture than any single statistic
- Understanding context (score effects, quality of competition, zone starts) is essential for accurate analysis
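The point about rate statistics can be made concrete with a short sketch. It applies the per-60 formula from the Key Formulas section to GSAx so that a starter and a backup with very different workloads can be compared. The function name per_60 and the two goalie lines are hypothetical numbers chosen only to show the arithmetic.

```python
def per_60(total, toi_minutes):
    """Convert a cumulative total into a rate per 60 minutes of ice time."""
    if toi_minutes <= 0:
        return 0.0  # no ice time, so no meaningful rate
    return total / toi_minutes * 60.0


# Hypothetical season lines: (GSAx, time on ice in minutes)
starter = (6.0, 1800.0)
backup = (4.0, 900.0)

print(f"Starter GSAx/60: {per_60(*starter):.3f}")
print(f"Backup  GSAx/60: {per_60(*backup):.3f}")
```

The starter has the larger raw total, but the backup saved more goals above expected per 60 minutes played. With small samples the rate is noisy, so it is best read alongside the minutes behind it.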