Introduction to Hockey Analytics
Hockey analytics has revolutionized how NHL teams evaluate players, make strategic decisions, and build championship rosters. This field combines traditional statistics with advanced metrics derived from play-by-play data, tracking systems, and mathematical modeling to provide deeper insights beyond conventional box scores. From shot-based metrics like Corsi and Fenwick to player value models like WAR and GAR, modern analytics offers a comprehensive framework for understanding hockey performance at individual, line, and team levels.
Understanding Hockey Analytics
The evolution of hockey analytics began in the early 2000s, when pioneers like Vic Ferrari, Gabriel Desjardins, and Alan Ryder started developing metrics from publicly available play-by-play data. These early adopters recognized that traditional statistics like plus/minus and goals had significant limitations and didn't capture the full picture of player contributions. They developed foundational concepts like Corsi (all shot attempts) and Fenwick (unblocked shot attempts) to measure puck possession and territorial control, which proved to be better predictors of future success than many traditional metrics.
Today, hockey analytics encompasses multiple interconnected domains. Shot-based metrics quantify offensive and defensive efficiency through volume (Corsi, Fenwick) and quality (expected goals). Possession indicators track zone entries, exits, and time to measure territorial advantage. Player evaluation models like WAR, RAPM, and GAR estimate individual value while controlling for teammates and competition. Goaltending analytics separate luck from skill using Goals Saved Above Expected. Special teams analysis optimizes power play and penalty kill strategies. Each domain provides complementary insights that together create comprehensive player and team evaluations.
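As a minimal sketch of the goaltending idea above, Goals Saved Above Expected can be computed as the expected goals on the shots a goalie faced minus the goals actually allowed. The function name and the per-shot xG values below are illustrative assumptions, not output from any particular public model.

```python
def goals_saved_above_expected(shot_xg, goals_allowed):
    """GSAx = total expected goals on shots faced minus goals actually allowed."""
    return sum(shot_xg) - goals_allowed


# Made-up per-shot xG values for five shots faced in one game (illustration only)
shots_faced_xg = [0.05, 0.25, 0.5, 0.125, 0.075]

gsax = goals_saved_above_expected(shots_faced_xg, goals_allowed=0)
print(f"xG against: {sum(shots_faced_xg):.2f}  GSAx: {gsax:+.2f}")
```

A positive value means the goalie allowed fewer goals than the shot quality predicted; single-game values are noisy, so the number is usually read over large samples.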
The NHL's deployment of puck and player tracking technology in 2020-21 marked a transformative advancement. These systems capture position data 20 times per second, recording puck location, puck speed, player coordinates, skating velocity, and player orientation throughout games. This granular spatial data enables analysis that was previously impossible: skating efficiency metrics, optimal passing angles, defensive coverage quality, and pre-shot movement. Combined with machine learning and predictive modeling, tracking data helps teams optimize line combinations, identify undervalued players, and develop strategic advantages through data-driven decision making.
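To illustrate how skating velocity can be derived from sampled coordinates, the sketch below differences consecutive (x, y) positions at the 20 samples per second mentioned above. The coordinate units (feet) and the sample track are assumptions made for the example; the actual tracking feed format is not described here.

```python
import math

SAMPLE_RATE_HZ = 20  # samples per second, as stated in the text


def speeds_from_positions(positions, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return the speed (distance units per second) between consecutive (x, y) samples."""
    dt = 1.0 / sample_rate_hz
    return [
        math.dist(p0, p1) / dt
        for p0, p1 in zip(positions, positions[1:])
    ]


# Hypothetical skater coordinates in feet, one sample every 1/20 of a second
track = [(0.0, 0.0), (0.9, 1.2), (1.8, 2.4), (3.0, 4.0)]
speeds = speeds_from_positions(track)
print([round(s, 1) for s in speeds])
```

Raw differencing amplifies sensor noise, so a real pipeline would likely smooth the positions before computing speed.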
Key Components
- Shot-Based Metrics: Corsi counts all shot attempts (shots on goal + missed shots + blocked shots), Fenwick excludes blocked shots, and expected goals (xG) weights each shot by probability of scoring based on distance, angle, type, and context.
- Possession Indicators: Zone entries (especially controlled entries with possession), zone exits (clean exits vs. turnovers), and offensive zone time quantify territorial control and puck possession better than raw shot counts.
- Player Value Models: WAR (Wins Above Replacement), RAPM (Regularized Adjusted Plus-Minus), and GAR (Goals Above Replacement) estimate individual contributions while accounting for teammates, competition, and ice time.
- Goaltending Analytics: Save percentage by location and danger level, Goals Saved Above Expected (GSAx), and quality start percentage separate goaltender skill from team defensive performance and random variance.
- Context Adjustments: Score effects (teams trailing generate more shots), venue effects (home ice advantage), quality of competition/teammates (WOWY analysis), and strength state (5v5, power play, penalty kill) provide essential context.
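The relationship between shots on goal, Fenwick, and Corsi described in the list above can be sketched as nested sets of event types. The event labels here follow the ones used in the Python example later in this section; other data sources use different codes, so treat them as an assumption.

```python
from collections import Counter

# Each metric adds one event type to the previous one
SHOTS_ON_GOAL = {"SHOT", "GOAL"}             # goals also count as shots on goal
FENWICK_EVENTS = SHOTS_ON_GOAL | {"MISS"}    # unblocked attempts
CORSI_EVENTS = FENWICK_EVENTS | {"BLOCK"}    # all attempts


def shot_attempt_summary(events):
    """Count shots on goal, Fenwick, and Corsi from a list of event labels."""
    counts = Counter(events)
    return {
        "shots_on_goal": sum(counts[e] for e in SHOTS_ON_GOAL),
        "fenwick": sum(counts[e] for e in FENWICK_EVENTS),
        "corsi": sum(counts[e] for e in CORSI_EVENTS),
    }


# A short made-up event sequence for one team; HIT and FAC are ignored
events = ["SHOT", "MISS", "BLOCK", "GOAL", "SHOT", "BLOCK", "HIT", "FAC"]
summary = shot_attempt_summary(events)
print(summary)
```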
Mathematical Foundation
Shot Differential = Shots For - Shots Against
Corsi For % = (Corsi For) / (Corsi For + Corsi Against) × 100
Expected Goals = Σ P(Goal | Distance, Angle, Type, Rush, Traffic, Rebounds)
PDO = (Shooting % + Save %) × 1000, with both rates expressed as proportions (e.g., 0.085 + 0.915)
These formulas represent core concepts in hockey analytics. Shot differential measures territorial control. Corsi% quantifies possession balance and typically regresses toward 50% over time. Expected goals predict scoring based on shot characteristics. PDO (shooting percentage plus save percentage) tends to regress to 1000, helping identify luck-driven results.
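The formulas above translate directly into a few small functions. This is a sketch under stated assumptions: the logistic form of the xG model and its coefficients are placeholders chosen for illustration, not fitted values, and real models use many more features than distance and angle.

```python
import math


def corsi_for_pct(corsi_for, corsi_against):
    """Corsi For % = CF / (CF + CA) * 100."""
    total = corsi_for + corsi_against
    return 100.0 * corsi_for / total if total else float("nan")


def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO on the 1000 scale: (shooting proportion + save proportion) * 1000."""
    shooting = goals_for / shots_for
    save = 1.0 - goals_against / shots_against
    return (shooting + save) * 1000.0


def shot_xg(distance_ft, angle_deg, intercept=-1.0, b_dist=-0.05, b_angle=-0.02):
    """Toy logistic P(goal); the coefficients are placeholders, not fitted values."""
    z = intercept + b_dist * distance_ft + b_angle * abs(angle_deg)
    return 1.0 / (1.0 + math.exp(-z))


# Expected goals is the sum of per-shot probabilities
shots = [(12.0, 10.0), (35.0, 40.0), (55.0, 5.0)]  # (distance in feet, angle in degrees)
team_xg = sum(shot_xg(d, a) for d, a in shots)

print(f"CF%: {corsi_for_pct(55, 45):.1f}")
print(f"PDO: {pdo(10, 100, 8, 100):.0f}")
print(f"xG:  {team_xg:.2f}")
```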
Python Implementation
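import pandas as pd
import hockey_scraper

# Scrape NHL play-by-play data for the 2023-24 season.
# scrape_date_range expects 'YYYY-MM-DD' date strings; with data_format='Pandas'
# it returns a dict whose 'pbp' entry holds the play-by-play DataFrame.
scraped = hockey_scraper.scrape_date_range('2023-10-01', '2024-04-15', False, data_format='Pandas')
games = scraped['pbp']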

def calculate_team_corsi(pbp_data, team_name, strength="5x5"):
    """
    Calculate Corsi metrics for a team at a given strength state.
    Returns Corsi For, Corsi Against, Corsi For %, and Corsi For per 60.
    """
    shot_events = ['SHOT', 'GOAL', 'MISS', 'BLOCK']

    # Keep only games involving this team
    team_games = pbp_data[
        (pbp_data['Home_Team'] == team_name) |
        (pbp_data['Away_Team'] == team_name)
    ]

    # Approximate minutes played at this strength: credit the gap before each
    # event to the strength state recorded on that event. Assumes rows are in
    # game order and Seconds_Elapsed is seconds into the period.
    gaps = team_games.groupby(['Game_Id', 'Period'])['Seconds_Elapsed'].diff().fillna(0)
    toi_minutes = gaps[team_games['Strength'] == strength].sum() / 60

    # Shot attempts at the specified strength state
    attempts = team_games[
        (team_games['Strength'] == strength) &
        (team_games['Event'].isin(shot_events))
    ]

    # Attempts are credited to Ev_Team; check how your data source assigns
    # Ev_Team on blocked shots before relying on the for/against split.
    corsi_for = int((attempts['Ev_Team'] == team_name).sum())
    corsi_against = int((attempts['Ev_Team'] != team_name).sum())

    # Calculate percentages
    total_corsi = corsi_for + corsi_against
    corsi_pct = (corsi_for / total_corsi * 100) if total_corsi > 0 else float('nan')

    # Per 60 minutes
    corsi_for_60 = (corsi_for / toi_minutes) * 60 if toi_minutes > 0 else float('nan')

    return {
        'team': team_name,
        'corsi_for': corsi_for,
        'corsi_against': corsi_against,
        'corsi_pct': round(corsi_pct, 2),
        'corsi_for_60': round(corsi_for_60, 2),
        'corsi_rel': round(corsi_pct - 50, 2)  # Difference from the 50% break-even mark
    }

# Calculate for multiple teams
teams = ['COL', 'TOR', 'BOS', 'TBL', 'CAR']
results = [calculate_team_corsi(games, team) for team in teams]
results_df = pd.DataFrame(results)
print(results_df.sort_values('corsi_pct', ascending=False))

# Advanced: player-level shot involvement with a zone-start proxy
def calculate_player_corsi_with_context(pbp_data, player_name):
    """
    Count 5v5 shot attempts in which the player is listed as p1 or p2, plus a
    rough zone-start proxy based on the faceoffs the player took.

    This measures event involvement, not on-ice Corsi, and it does not adjust
    for quality of competition; both require on-ice player data.
    """
    player_events = pbp_data[
        (pbp_data['p1_name'] == player_name) |
        (pbp_data['p2_name'] == player_name)
    ]

    # Filter 5v5 and shot events
    shot_events = ['SHOT', 'GOAL', 'MISS', 'BLOCK']
    player_5v5 = player_events[player_events['Strength'] == '5x5']
    corsi_for = len(player_5v5[player_5v5['Event'].isin(shot_events)])

    # Zone-start proxy: offensive-zone vs defensive-zone faceoffs the player took.
    # Ev_Zone is assumed to be recorded relative to the event team, so this is
    # only approximate for faceoffs credited to the opponent.
    faceoffs = player_events[player_events['Event'] == 'FAC']
    oz_starts = len(faceoffs[faceoffs['Ev_Zone'] == 'Off'])
    dz_starts = len(faceoffs[faceoffs['Ev_Zone'] == 'Def'])
    total_starts = oz_starts + dz_starts
    zone_start_pct = (oz_starts / total_starts * 100) if total_starts > 0 else 50

    return {
        'player': player_name,
        'corsi_for': corsi_for,
        'zone_start_pct': round(zone_start_pct, 1)
    }

R Implementation
library(tidyverse)
library(hockeyR)

# Load NHL play-by-play data for the 2023-24 season
# (an integer season is read as the end year, so the season is spelled out here)
pbp <- load_pbp("2023-24")

# Calculate team Corsi metrics at 5v5
team_corsi <- pbp %>%
  filter(strength_state == "5v5") %>%
  filter(event_type %in% c("SHOT", "GOAL", "MISSED_SHOT", "BLOCKED_SHOT")) %>%
  group_by(event_team) %>%
  summarise(
    corsi_for = n(),
    shots_on_goal = sum(event_type %in% c("SHOT", "GOAL")),
    games = n_distinct(game_id),
    goals = sum(event_type == "GOAL"),
    .groups = "drop"
  ) %>%
  mutate(
    corsi_per_game = corsi_for / games,
    # Shooting percentage uses shots on goal, not all shot attempts
    shooting_pct = (goals / shots_on_goal) * 100
  ) %>%
  arrange(desc(corsi_for))

# Calculate Corsi percentage (requires against calculation)
shot_attempts <- pbp %>%
  filter(strength_state == "5v5") %>%
  filter(event_type %in% c("SHOT", "GOAL", "MISSED_SHOT", "BLOCKED_SHOT")) %>%
  mutate(
    team_for = event_team,
    team_against = ifelse(event_team == home_team, away_team, home_team)
  )

team_corsi_pct <- shot_attempts %>%
  count(team_for, name = "corsi_for") %>%
  left_join(
    shot_attempts %>% count(team_against, name = "corsi_against"),
    by = c("team_for" = "team_against")
  ) %>%
  mutate(
    corsi_pct = (corsi_for / (corsi_for + corsi_against)) * 100,
    corsi_rel = corsi_pct - 50  # Difference from the 50% break-even mark
  ) %>%
  arrange(desc(corsi_pct))

# Visualize Corsi leaders
ggplot(team_corsi_pct %>% head(10),
       aes(x = reorder(team_for, corsi_pct), y = corsi_pct)) +
  geom_col(aes(fill = corsi_pct), show.legend = FALSE) +
  geom_hline(yintercept = 50, linetype = "dashed", color = "red", alpha = 0.7) +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  coord_flip() +
  labs(
    title = "NHL Team Corsi For % Leaders (5v5)",
    subtitle = "2023-24 Season",
    x = NULL,
    y = "Corsi For %"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    axis.text = element_text(size = 10)
  )
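
# Player-level individual shot attempts (events credited to event_player_1).
# This is not on-ice Corsi; also check whether your data lists the blocker or
# the shooter first on blocked shots before interpreting the counts.
player_corsi <- pbp %>%
  filter(strength_state == "5v5") %>%
  filter(!is.na(event_player_1)) %>%
  filter(event_type %in% c("SHOT", "GOAL", "MISSED_SHOT", "BLOCKED_SHOT")) %>%
  group_by(event_player_1) %>%
  summarise(
    corsi = n(),
    shots_on_goal = sum(event_type %in% c("SHOT", "GOAL")),
    goals = sum(event_type == "GOAL"),
    teams = paste(unique(event_team), collapse = ","),
    .groups = "drop"
  ) %>%
  filter(corsi >= 50) %>%
  # Shooting percentage uses shots on goal, not all shot attempts
  mutate(shooting_pct = (goals / shots_on_goal) * 100) %>%
  arrange(desc(corsi))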
NHL Application
NHL teams employ dedicated analytics departments that leverage these metrics for comprehensive player evaluation, strategic planning, and roster construction. The Carolina Hurricanes famously built a perennial contender by targeting players with elite possession metrics (Corsi, Fenwick) who were undervalued in traditional markets. The Toronto Maple Leafs use advanced analytics to optimize lineup combinations, identify buy-low trade candidates, and inform contract negotiations in arbitration cases. The Tampa Bay Lightning integrated tracking data with biomechanical analysis to manage player workload and prevent injuries during their back-to-back championship runs.
During games, coaching staffs monitor real-time analytics dashboards displaying shot attempt differentials, expected goals, zone time, and line matchup performance. This information guides critical decisions: which defensive pairs to deploy against opponent top lines, when to pull the goalie based on win probability models, and how to adjust forechecking pressure based on possession trends. Analytics also inform systems and strategies: teams analyze thousands of zone entries to determine optimal dump-and-chase versus controlled entry rates, study defensive coverage patterns to minimize high-danger chances, and optimize power play formations based on expected goals added by different personnel groupings and ice configurations.
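The zone-entry comparison mentioned above can be sketched as a simple rate: shot attempts generated per entry, split by entry type. The record layout and the sample rows are hypothetical and exist only to show the calculation; they are not real tracking results.

```python
from collections import defaultdict


def attempts_per_entry(entries):
    """Average shot attempts per zone entry, grouped by entry type.

    entries: iterable of (entry_type, shot_attempts_generated) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # entry_type -> [entry count, attempts]
    for entry_type, attempts in entries:
        totals[entry_type][0] += 1
        totals[entry_type][1] += attempts
    return {kind: attempts / count for kind, (count, attempts) in totals.items()}


# Made-up tracked entries, for illustration only
sample_entries = [
    ("controlled", 1), ("controlled", 0), ("dump", 0),
    ("dump", 1), ("controlled", 2), ("dump", 0),
]
rates = attempts_per_entry(sample_entries)
print({kind: round(rate, 2) for kind, rate in rates.items()})
```

With real data the same grouping would typically be run per team or per line, over enough entries for the rates to stabilize.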
Interpreting the Results
| Metric | Elite Range | Average Range | Poor Range | Context |
|---|---|---|---|---|
| Corsi For % (Team) | 52%+ | 48-52% | <48% | Colorado averaged 53.2% in 2023-24; Carolina consistently leads league |
| xG For % (Team) | 53%+ | 48-52% | <48% | Better predictor than Corsi; accounts for shot quality and location |
| Individual xG/60 (5v5) | 0.8+ | 0.4-0.7 | <0.4 | Matthews, MacKinnon, Kucherov exceed 1.0; elite goal scorers generate quality |
| PDO (Sh%+Sv%) | N/A | 995-1005 | N/A | Extreme values (>1020 or <980) indicate luck; regresses to 1000 |
| Goals Above Replacement | 20+ | 5-15 | <5 | Hart Trophy finalists typically 25+ GAR; Replacement level = 0 GAR |
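
The thresholds in the table can be applied mechanically, which is useful for flagging teams in a larger report. This sketch encodes only the team Corsi For % ranges and the PDO cutoffs listed above; the function names and label strings are assumptions made for the example.

```python
def classify_team_cf_pct(cf_pct):
    """Map a team Corsi For % to the table's elite / average / poor ranges."""
    if cf_pct >= 52.0:
        return "elite"
    if cf_pct >= 48.0:
        return "average"
    return "poor"


def pdo_flag(pdo_value):
    """Flag PDO values outside the 980-1020 band as likely luck-driven."""
    if pdo_value > 1020:
        return "unusually high, likely to regress"
    if pdo_value < 980:
        return "unusually low, likely to regress"
    return "within normal range"


for cf, p in [(53.2, 1003), (49.5, 1027), (46.1, 975)]:
    print(cf, classify_team_cf_pct(cf), "|", p, pdo_flag(p))
```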
Key Takeaways
- Hockey analytics combines traditional statistics with advanced metrics to provide comprehensive evaluation beyond box scores, measuring possession, shot quality, and individual value while controlling for context.
- Core analytical concepts include shot-based metrics (Corsi, Fenwick, xG), possession indicators (zone entries/exits), player value models (WAR, GAR, RAPM), and goaltending analytics (GSAx).
- NHL teams use analytics across all operations: player evaluation, trade decisions, lineup optimization, in-game strategy, contract negotiations, draft selection, and player development programs.
- Effective analysis requires understanding context including score effects, quality of competition/teammates, zone starts, and sample size considerations when interpreting metrics and making decisions.
- Public platforms like Natural Stat Trick, Evolving Hockey, and MoneyPuck democratize sophisticated analytics, while tracking data since 2020-21 enables spatial analysis previously impossible.