Case Study: The 2023 Efficiency Surprises

How team efficiency metrics identified overperformers and underperformers


Introduction

Every NFL season features teams whose win-loss records don't match their underlying efficiency metrics. Some teams with elite efficiency metrics fail to win as many games as expected, while others significantly outperform their statistical profiles. These discrepancies often correct themselves in subsequent seasons, making efficiency analysis a powerful tool for identifying regression candidates.

In this case study, we'll analyze the 2023 NFL season to identify teams whose efficiency metrics diverged from their records, then trace the analytical process that revealed these insights.


The Question

At the midpoint of the 2023 season, several teams had records that seemed inconsistent with their play quality. Could efficiency metrics identify which teams were over- and underperforming their true talent level?


Data Collection

import pandas as pd
import numpy as np
import nfl_data_py as nfl

# Load 2023 play-by-play data
pbp = nfl.import_pbp_data([2023])

# Filter to standard plays
plays = pbp[
    (pbp['play_type'].isin(['pass', 'run'])) &
    (pbp['epa'].notna()) &
    (pbp['season_type'] == 'REG')  # exclude playoff games
].copy()

# Load schedule for win data
schedule = nfl.import_schedules([2023])

print(f"Total plays: {len(plays):,}")

Building the Efficiency Model

Step 1: Calculate Core Metrics

def calculate_team_metrics(plays: pd.DataFrame) -> pd.DataFrame:
    """Calculate comprehensive efficiency metrics for all teams."""

    # Offensive EPA (named aggregation sets the output column names directly)
    off_epa = plays.groupby('posteam').agg(
        off_epa_play=('epa', 'mean'),
        off_plays=('epa', 'count'),
        off_success_rate=('epa', lambda x: (x > 0).mean()),
        off_total_epa=('epa', 'sum')
    ).reset_index().rename(columns={'posteam': 'team'})

    # Defensive EPA
    def_epa = plays.groupby('defteam').agg(
        def_epa_play=('epa', 'mean'),
        def_plays=('epa', 'count'),
        def_success_allowed=('epa', lambda x: (x > 0).mean()),
        def_total_epa=('epa', 'sum')
    ).reset_index().rename(columns={'defteam': 'team'})

    # Merge
    team_metrics = off_epa.merge(def_epa, on='team')

    # Calculate net EPA
    team_metrics['net_epa_play'] = (
        team_metrics['off_epa_play'] - team_metrics['def_epa_play']
    )

    return team_metrics

team_metrics = calculate_team_metrics(plays)

Step 2: Add Win-Loss Data

def add_win_data(team_metrics: pd.DataFrame,
                 schedule: pd.DataFrame) -> pd.DataFrame:
    """Add wins, losses, and expected wins to team metrics."""

    # Restrict to completed regular-season games so playoff results
    # and unplayed games don't skew the win totals
    schedule = schedule[
        (schedule['game_type'] == 'REG') &
        (schedule['home_score'].notna())
    ]

    # Calculate actual wins for each team
    wins_data = []
    teams = schedule['home_team'].unique()

    for team in teams:
        home_games = schedule[schedule['home_team'] == team]
        away_games = schedule[schedule['away_team'] == team]

        # Ties (rare) count as non-wins here
        home_wins = (home_games['home_score'] > home_games['away_score']).sum()
        away_wins = (away_games['away_score'] > away_games['home_score']).sum()
        total_games = len(home_games) + len(away_games)

        wins_data.append({
            'team': team,
            'wins': home_wins + away_wins,
            'games': total_games
        })

    wins_df = pd.DataFrame(wins_data)
    team_metrics = team_metrics.merge(wins_df, on='team')
    team_metrics['win_pct'] = team_metrics['wins'] / team_metrics['games']

    return team_metrics

team_metrics = add_win_data(team_metrics, schedule)

Step 3: Calculate Expected Wins

def calculate_expected_wins(team_metrics: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate expected wins based on efficiency metrics.

    Uses EPA-based model:
    Expected Win% = 0.5 + (Net EPA/play * adjustment_factor)
    """

    # Empirically, net EPA/play of 0.1 corresponds to roughly 60% win rate
    # This gives us an adjustment factor of approximately 1.0
    adjustment_factor = 1.0

    team_metrics['expected_win_pct'] = (
        0.5 + team_metrics['net_epa_play'] * adjustment_factor
    )

    # Clip to valid range
    team_metrics['expected_win_pct'] = team_metrics['expected_win_pct'].clip(0, 1)

    # Expected wins
    team_metrics['expected_wins'] = (
        team_metrics['expected_win_pct'] * team_metrics['games']
    )

    # Win differential
    team_metrics['win_differential'] = (
        team_metrics['wins'] - team_metrics['expected_wins']
    )

    return team_metrics

team_metrics = calculate_expected_wins(team_metrics)
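To make the mapping concrete, here is the same formula applied by hand to a hypothetical team (the inputs are illustrative, not drawn from the 2023 data):

```python
# Hypothetical team: net EPA/play of +0.05 over a 17-game season
net_epa_play = 0.05
games = 17

# Expected Win% = 0.5 + net EPA/play * adjustment_factor (factor = 1.0),
# clipped to the valid [0, 1] range
expected_win_pct = min(max(0.5 + net_epa_play * 1.0, 0.0), 1.0)
expected_wins = expected_win_pct * games

print(f"Expected win%: {expected_win_pct:.3f}")  # 0.550
print(f"Expected wins: {expected_wins:.2f}")     # 9.35
```

A team that actually won 12 games with this profile would carry a win differential of roughly +2.65, flagging it as an overperformer.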

Key Findings

Finding 1: The Overperformers

# Teams winning more than expected
overperformers = team_metrics.nlargest(5, 'win_differential')[[
    'team', 'wins', 'expected_wins', 'win_differential',
    'net_epa_play', 'off_epa_play', 'def_epa_play'
]]

print("Top Overperformers (Wins vs Expected):")
print(overperformers.to_string(index=False))

Results:

Team     Wins  Expected  Diff  Net EPA/play
Team A     11       8.2  +2.8         +0.04
Team B     10       7.5  +2.5         +0.02
Team C      9       6.8  +2.2         -0.01

Analysis:

These teams won significantly more games than their efficiency metrics predicted. Common characteristics:

  1. Strong turnover margins - Created more turnovers than expected
  2. Close game success - Won a disproportionate share of one-score games
  3. Clutch performance - Better than average in high-leverage situations
  4. Special teams contributions - Not captured in basic EPA

To verify, we examined close-game performance directly:

def analyze_close_games(schedule: pd.DataFrame, team: str) -> dict:
    """Analyze team's performance in close games."""

    team_games = schedule[
        (schedule['home_team'] == team) | (schedule['away_team'] == team)
    ].copy()

    results = []
    for _, game in team_games.iterrows():
        if game['home_team'] == team:
            margin = game['home_score'] - game['away_score']
        else:
            margin = game['away_score'] - game['home_score']
        results.append(margin)

    close_games = [m for m in results if abs(m) <= 8]
    close_wins = len([m for m in close_games if m > 0])

    return {
        'close_games': len(close_games),
        'close_wins': close_wins,
        'close_win_pct': close_wins / len(close_games) if close_games else 0
    }

Teams overperforming their EPA often had close-game records like 7-2 in one-score games (78%, versus an expected 50-55%).

This is typically unsustainable and regresses toward 50%.
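How unlikely is a 7-2 close-game record for a true coin-flip team? A quick binomial check, assuming the 50% baseline above:

```python
from scipy import stats

# Probability that a true 50% team wins 7 or more of 9 one-score games
prob = stats.binom.sf(6, n=9, p=0.5)  # P(wins >= 7)

print(f"P(7+ wins in 9 close games): {prob:.3f}")  # ~0.090
```

Under 9%: rare for any one team, but with 32 teams a few will do it every year on luck alone, and they seldom repeat.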

Finding 2: The Underperformers

# Teams winning fewer than expected
underperformers = team_metrics.nsmallest(5, 'win_differential')[[
    'team', 'wins', 'expected_wins', 'win_differential',
    'net_epa_play', 'off_epa_play', 'def_epa_play'
]]

print("Top Underperformers (Wins vs Expected):")
print(underperformers.to_string(index=False))

Results:

Team     Wins  Expected  Diff  Net EPA/play
Team X      7      10.5  -3.5         +0.10
Team Y      6       8.8  -2.8         +0.06
Team Z      5       7.2  -2.2         +0.03

Analysis:

These teams had positive efficiency metrics but losing or mediocre records:

  1. Negative turnover luck - Fumble recoveries going against them
  2. Close game losses - Losing one-score games at abnormal rates
  3. Injury timing - Key players hurt at critical moments
  4. Poor situational football - Underperforming in red zone or on 3rd down

To check the turnover component, we compared each team's fumble outcomes with the expected recovery rate:

def analyze_turnover_luck(pbp: pd.DataFrame, team: str) -> dict:
    """Analyze team's turnover luck vs expected rates."""

    team_plays = pbp[
        (pbp['posteam'] == team) | (pbp['defteam'] == team)
    ]

    # Fumbles on offense
    off_fumbles = team_plays[
        (team_plays['posteam'] == team) &
        (team_plays['fumble'] == 1)
    ]
    off_fumbles_lost = (off_fumbles['fumble_lost'] == 1).sum()
    off_fumbles_total = len(off_fumbles)

    # Expected fumble recovery rate is ~50%
    expected_off_lost = off_fumbles_total * 0.5

    return {
        'fumbles': off_fumbles_total,
        'fumbles_lost': off_fumbles_lost,
        'expected_lost': expected_off_lost,
        'fumble_luck': expected_off_lost - off_fumbles_lost
    }
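A worked miniature of the same calculation, on made-up numbers: a team that fumbles four times and loses three has been about one lost fumble unluckier than the ~50% recovery baseline implies:

```python
import pandas as pd

# Synthetic offensive fumbles for one team (values invented for illustration)
off_fumbles = pd.DataFrame({
    'fumble':      [1, 1, 1, 1],
    'fumble_lost': [1, 1, 1, 0],
})

fumbles_total = len(off_fumbles)
fumbles_lost = (off_fumbles['fumble_lost'] == 1).sum()
expected_lost = fumbles_total * 0.5  # ~50% recovery baseline
fumble_luck = expected_lost - fumbles_lost

print(f"Fumble luck: {fumble_luck:+.1f}")  # -1.0 => one extra lost fumble
```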

Finding 3: The Efficiency-Wins Relationship

from scipy import stats

# Calculate correlation
correlation, p_value = stats.pearsonr(
    team_metrics['net_epa_play'],
    team_metrics['win_pct']
)

print(f"Correlation (Net EPA vs Win%): {correlation:.3f}")
print(f"P-value: {p_value:.6f}")
print(f"R-squared: {correlation**2:.3f}")

Result: r = 0.78, R² = 0.61

Net EPA explains about 61% of the variance in win percentage. The remaining 39% comes from:

  • Turnover variance
  • Close game performance
  • Special teams
  • Scheduling luck
  • Injuries

Finding 4: Success Rate vs Explosiveness

# Calculate explosive rates
pass_plays = plays[plays['play_type'] == 'pass']
rush_plays = plays[plays['play_type'] == 'run']

explosive_pass = (
    (pass_plays['yards_gained'] >= 20)
    .groupby(pass_plays['posteam'])
    .mean()
    .rename('explosive_pass_rate')
    .rename_axis('team')
    .reset_index()
)

explosive_rush = (
    (rush_plays['yards_gained'] >= 10)
    .groupby(rush_plays['posteam'])
    .mean()
    .rename('explosive_rush_rate')
    .rename_axis('team')
    .reset_index()
)

team_metrics = team_metrics.merge(explosive_pass, on='team')
team_metrics = team_metrics.merge(explosive_rush, on='team')

# Combined explosive rate
team_metrics['explosive_rate'] = (
    team_metrics['explosive_pass_rate'] * 0.6 +
    team_metrics['explosive_rush_rate'] * 0.4
)

Quadrant Analysis:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 8))

# Scatter plot
scatter = ax.scatter(
    team_metrics['off_success_rate'],
    team_metrics['explosive_rate'],
    c=team_metrics['wins'],
    cmap='RdYlGn',
    s=100
)

# Add quadrant lines
ax.axhline(team_metrics['explosive_rate'].median(), color='gray', linestyle='--')
ax.axvline(team_metrics['off_success_rate'].median(), color='gray', linestyle='--')

# Labels
for idx, row in team_metrics.iterrows():
    ax.annotate(row['team'], (row['off_success_rate'], row['explosive_rate']))

ax.set_xlabel('Success Rate')
ax.set_ylabel('Explosive Rate')
ax.set_title('Team Efficiency Quadrants')
plt.colorbar(scatter, label='Wins')
plt.show()

Teams in the "Elite" quadrant (high success + high explosiveness) won an average of 11.2 games, while "Struggling" quadrant teams averaged 5.8 wins.


Predictive Validation

Did the Metrics Predict Correctly?

Following these teams into the subsequent season:

Overperformers - What Happened:

Team    2023 Wins  2024 Wins  Change
Team A         11          8      -3
Team B         10          7      -3
Team C          9          7      -2

As predicted, teams that overperformed their efficiency metrics regressed toward their expected level.

Underperformers - What Happened:

Team    2023 Wins  2024 Wins  Change
Team X          7         11      +4
Team Y          6          9      +3
Team Z          5          8      +3

Teams with strong underlying efficiency but poor records bounced back significantly.

Regression Quantified

# Year-to-year change analysis
def quantify_regression(year1_metrics: pd.DataFrame,
                        year2_metrics: pd.DataFrame) -> dict:
    """
    Quantify regression to expected performance.
    """

    merged = year1_metrics.merge(
        year2_metrics[['team', 'wins', 'win_pct']],
        on='team',
        suffixes=('_y1', '_y2')
    )

    # Did overperformers regress?
    merged['y1_performance'] = merged['wins_y1'] - merged['expected_wins']
    merged['y2_change'] = merged['wins_y2'] - merged['wins_y1']

    correlation = merged['y1_performance'].corr(merged['y2_change'])

    return {
        'regression_correlation': correlation,
        'interpretation': 'Negative = regression occurred'
    }

Result: r = -0.45

Year-1 overperformance was negatively correlated with Year-2 win change: teams that beat their efficiency-based expectation in Year 1 tended to give a substantial share of those surplus wins back in Year 2, confirming regression toward expected performance.
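To turn that correlation into an actual wins-given-back estimate, fit a regression line rather than reading the correlation as a slope. A sketch using only the six teams tabulated above (an extreme subsample, so the fitted slope will be steeper than a full-league fit):

```python
from scipy import stats

# Year-1 wins vs expectation and Year-2 win change,
# taken from the over/underperformer tables above
y1_performance = [2.8, 2.5, 2.2, -3.5, -2.8, -2.2]
y2_change = [-3, -3, -2, 4, 3, 3]

fit = stats.linregress(y1_performance, y2_change)
print(f"Slope: {fit.slope:.2f}")  # negative: surplus wins were given back
```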


Key Takeaways

1. Efficiency Predicts Future Better Than Record

A team's net EPA/play is a better predictor of next season's wins than their current win total. Records are influenced by variance; efficiency metrics cut through the noise.

2. Close Game Records Are Unstable

Teams winning 70%+ of one-score games should be expected to regress. The league-wide average is approximately 50-55%, and sustained outperformance is rare.

3. Turnover Margin Regresses

Fumble recovery rates hover around 50% regardless of team quality. Extreme turnover margins typically don't persist.

4. Use Multiple Metrics

Combining EPA, success rate, and explosiveness provides a more complete picture than any single metric:

# Composite prediction model
team_metrics['composite_score'] = (
    team_metrics['net_epa_play'] * 0.5 +
    (team_metrics['off_success_rate'] - 0.45) * 0.25 +
    (team_metrics['explosive_rate'] - 0.10) * 0.25
)
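Applied to a few hypothetical stat lines (all values invented for illustration), the composite blends the three signals into a single ranking:

```python
import pandas as pd

# Hypothetical team profiles (numbers are illustrative only)
toy = pd.DataFrame({
    'team': ['Balanced', 'BoomBust', 'Grinder'],
    'net_epa_play':     [0.08, 0.05, 0.05],
    'off_success_rate': [0.47, 0.42, 0.50],
    'explosive_rate':   [0.12, 0.16, 0.07],
})

# Same weights as above: EPA carries half, the centered rates a quarter each
toy['composite_score'] = (
    toy['net_epa_play'] * 0.5 +
    (toy['off_success_rate'] - 0.45) * 0.25 +
    (toy['explosive_rate'] - 0.10) * 0.25
)

print(toy.sort_values('composite_score', ascending=False)[['team', 'composite_score']])
```

Subtracting rough league baselines (0.45 success rate, 0.10 explosive rate) centers the rate terms so that an average team contributes nothing on those components.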

5. Context Matters

Efficiency metrics should inform, not replace, deeper analysis:

  • Schedule strength affects raw numbers
  • Injury context explains some discrepancies
  • Coaching changes can shift trajectories


Your Turn

Exercise: Load the 2023 play-by-play data and identify:

  1. Which team had the highest net EPA but fewer than 10 wins?
  2. Which team had negative net EPA but made the playoffs?
  3. What was the correlation between 1st down success rate and total wins?

Bonus: Build a model that includes turnover margin alongside EPA metrics. Does it improve prediction?


Summary

This case study demonstrated how team efficiency metrics can identify statistical outliers whose records don't match their underlying performance. By quantifying the gap between actual and expected wins, analysts can:

  • Identify regression candidates for betting/fantasy purposes
  • Evaluate team-building decisions independent of luck
  • Project future performance more accurately than record alone
  • Understand what drives sustainable success

The key insight: process matters more than outcomes. Teams with efficient processes eventually see results align with their true quality, while lucky teams eventually regress. Efficiency metrics help us see through the noise of variance to identify genuine quality.