Case Study: The Curious Case of the Turnover Machine

Scenario

The Green Bay Packers defense had a remarkable 2023 season statistically. They led the league in interceptions (25) and finished top-5 in total turnovers (32). Local media declared them an "elite, ball-hawking defense." The GM is considering extending multiple defensive players to premium contracts based on this performance.

However, the analytics department has concerns. Their EPA-based metrics tell a different story, and they've been asked to present a full evaluation to ownership: Is this defense truly elite, or are they benefiting from unsustainable turnover luck?

Team Defensive Stats:
  - 25 interceptions (1st in NFL)
  - 32 total turnovers (T-3rd)
  - -0.02 EPA per play allowed (16th)
  - 45.2% success rate allowed (19th)
  - 240.1 pass yards per game allowed (18th)


Data Gathering

import nfl_data_py as nfl
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

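# Load 2023 data (the file includes playoff games, so keep the regular season only)
pbp = nfl.import_pbp_data([2023])
pbp = pbp[pbp['season_type'] == 'REG']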

# Filter to Packers defense
team = 'GB'
gb_defense = pbp[pbp['defteam'] == team]

# Basic stats
passes_against = gb_defense[gb_defense['pass_attempt'] == 1]
runs_against = gb_defense[gb_defense['rush_attempt'] == 1]

print("Green Bay Packers Defense 2023")
print("=" * 50)
print(f"Total Plays Against: {len(gb_defense[gb_defense['play_type'].isin(['pass', 'run'])])}")
print(f"Interceptions: {passes_against['interception'].sum()}")
print(f"Fumbles Recovered: {gb_defense['fumble_lost'].sum()}")

Analysis 1: Turnover Context

First, let's examine whether these turnovers were earned or fortunate:

# Interception analysis
interceptions = passes_against[passes_against['interception'] == 1]

int_context = {
    'total': len(interceptions),
    'avg_air_yards': interceptions['air_yards'].mean(),
    # Baseline: average EPA on passes that were NOT intercepted
    'avg_epa_non_int_passes': passes_against[passes_against['interception'] == 0]['epa'].mean(),
    'deep_ints': (interceptions['air_yards'] >= 15).sum(),
    'short_ints': (interceptions['air_yards'] <= 5).sum(),
    # Tipped-ball INTs are not in play-by-play data; this needs charting data
    'tipped_ints': None
}

print("Interception Analysis:")
print(f"  Total INTs: {int_context['total']}")
print(f"  Average Air Yards: {int_context['avg_air_yards']:.1f}")
print(f"  Deep (15+ yards): {int_context['deep_ints']}")
print(f"  Short (0-5 yards): {int_context['short_ints']}")

# Compare to league
all_passes = pbp[pbp['pass_attempt'] == 1]
league_int_rate = all_passes['interception'].mean()
gb_int_rate = passes_against['interception'].mean()

print(f"\nInterception Rate: {gb_int_rate:.2%} (League: {league_int_rate:.2%})")
print(f"  -> {gb_int_rate/league_int_rate:.1f}x league average")

Finding: Green Bay's interception rate was 1.8x the league average. This is a red flag - such extreme outperformance rarely persists.
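
One way to put a number on "rarely" is to ask how often a league-average defense would reach 25 interceptions by chance alone. The sketch below is illustrative only: it treats every pass as an independent trial at the league rate, and the 600 pass attempts faced is a hypothetical round number, not a figure from the data. The helper `binom_tail` is introduced here for the example.

```python
from math import comb

def binom_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical inputs for illustration only
n_attempts = 600                             # assumed pass attempts faced
gb_ints = 25                                 # from the scenario
league_rate = (gb_ints / n_attempts) / 1.8   # implied by "1.8x league average"

# Chance that a single league-average defense reaches 25+ INTs
p_single = binom_tail(gb_ints, n_attempts, league_rate)

# Chance that at least one of 32 league-average defenses does so
p_any_of_32 = 1 - (1 - p_single) ** 32

print(f"P(one average defense gets >= {gb_ints} INTs): {p_single:.4f}")
print(f"P(at least one of 32 average defenses does):  {p_any_of_32:.3f}")
```

The second probability is the more relevant one: a season like this is unlikely for any given team, but with 32 teams it is far less surprising that some average defense posts it. Real passes are not independent and real defenses are not all average, so treat the output as a rough sanity check rather than a formal test.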


Analysis 2: Non-Turnover Performance

What happens when we remove turnovers from the equation?

# EPA analysis excluding turnovers
all_plays = gb_defense[gb_defense['play_type'].isin(['pass', 'run'])]
turnover_plays = all_plays[(all_plays['interception'] == 1) | (all_plays['fumble_lost'] == 1)]
non_turnover = all_plays[~((all_plays['interception'] == 1) | (all_plays['fumble_lost'] == 1))]

print("EPA Analysis:")
print(f"  With turnovers:    {all_plays['epa'].mean():.3f}")
print(f"  Without turnovers: {non_turnover['epa'].mean():.3f}")
print(f"  Turnovers alone:   {turnover_plays['epa'].mean():.3f}")

# What percentage of their "good" defense is turnovers?
total_negative_epa = all_plays[all_plays['epa'] < 0]['epa'].sum()
turnover_negative_epa = turnover_plays['epa'].sum()

print(f"\nTurnover contribution to negative EPA: {turnover_negative_epa/total_negative_epa:.1%}")

Finding: When we exclude turnovers, Green Bay's defense allowed +0.05 EPA/play, which would rank 26th in the league. Turnovers alone lifted them from a bottom-quarter unit to a league-average one (-0.02 EPA/play, 16th); the "elite" label rests entirely on takeaways.
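
The code above computes Green Bay's EPA with and without turnovers but not the league ranks quoted in the finding. A minimal sketch of that ranking step follows, assuming a play-by-play frame with the same `defteam`, `epa`, `interception`, and `fumble_lost` columns. The function name `rank_defenses` and the three-team toy data are hypothetical, used only to show the mechanics.

```python
import pandas as pd

def rank_defenses(plays: pd.DataFrame) -> pd.DataFrame:
    """Rank defenses by EPA/play allowed, with and without turnover plays."""
    is_turnover = (plays["interception"] == 1) | (plays["fumble_lost"] == 1)
    out = pd.DataFrame({
        "epa_with_to": plays.groupby("defteam")["epa"].mean(),
        "epa_without_to": plays[~is_turnover].groupby("defteam")["epa"].mean(),
    })
    # Lower EPA allowed is better, so rank ascending (1 = best)
    out["rank_with_to"] = out["epa_with_to"].rank(method="min")
    out["rank_without_to"] = out["epa_without_to"].rank(method="min")
    return out

# Toy data: team AAA looks best only because of one interception
toy = pd.DataFrame({
    "defteam": ["AAA"] * 4 + ["BBB"] * 4 + ["CCC"] * 4,
    "epa": [0.3, 0.2, 0.1, -4.0, 0.0, -0.1, 0.1, -0.2, 0.2, 0.1, 0.0, 0.1],
    "interception": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "fumble_lost": [0] * 12,
})

ranks = rank_defenses(toy)
print(ranks)
```

On real data the input would be the league-wide pass and run plays, for example `rank_defenses(pbp[pbp['play_type'].isin(['pass', 'run'])])`, and Green Bay's row would be read with `ranks.loc['GB']`.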


Analysis 3: Historical Turnover Regression

Let's examine whether turnover rates persist:

# Load multiple years
multi_year = nfl.import_pbp_data([2021, 2022, 2023])

# Calculate team INT rates by year
def calc_int_rate(df, year):
    passes = df[(df['pass_attempt'] == 1) & (df['season'] == year)]
    return passes.groupby('defteam')['interception'].mean()

int_2021 = calc_int_rate(multi_year, 2021)
int_2022 = calc_int_rate(multi_year, 2022)
int_2023 = calc_int_rate(multi_year, 2023)

# Year-to-year correlation
corr_21_22 = int_2021.corr(int_2022)
corr_22_23 = int_2022.corr(int_2023)

print("Year-to-Year INT Rate Correlation:")
print(f"  2021 to 2022: r = {corr_21_22:.3f}")
print(f"  2022 to 2023: r = {corr_22_23:.3f}")

# Green Bay specifically
print(f"\nGreen Bay INT Rate:")
print(f"  2021: {int_2021.get('GB', 0):.2%}")
print(f"  2022: {int_2022.get('GB', 0):.2%}")
print(f"  2023: {int_2023.get('GB', 0):.2%}")

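Finding: Year-to-year INT rate correlation is approximately 0.25. Under a simple model of stable team skill plus independent season-level noise, that means only about a quarter of the variance in a single season's INT rate reflects repeatable skill; the other 75% does not carry over. Green Bay's 2023 INT rate is a statistical outlier unlikely to persist.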
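
The link between a year-to-year correlation and the share of variance that persists can be checked with a small simulation. This is a sketch under a deliberately simple assumption: each team has a fixed skill level, and each season adds independent noise. The team count, seed, and variable names are all illustrative choices, not values from the data.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

random.seed(0)
n_teams = 20_000      # far more than 32, to keep sampling error small
skill_share = 0.25    # assumed share of single-season variance due to skill

# Each simulated team keeps the same skill in both seasons
skill = [random.gauss(0, sqrt(skill_share)) for _ in range(n_teams)]
# Each season adds fresh, independent noise on top of that skill
year1 = [s + random.gauss(0, sqrt(1 - skill_share)) for s in skill]
year2 = [s + random.gauss(0, sqrt(1 - skill_share)) for s in skill]

r = pearson(year1, year2)
print(f"Simulated year-to-year r: {r:.3f}")
print(f"r squared:                {r * r:.3f}")
```

Under this model the simulated correlation lands close to the assumed skill share, which is why r itself (not r squared) is read as the persistent fraction. The squared value answers a different question: how much of next year's variance is predictable from this year's observed rate. With only 32 real teams, a single season-pair estimate of r is itself quite noisy.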


Analysis 4: Coverage Quality Without Turnovers

How well did Green Bay actually cover receivers?

# Coverage metrics (excluding sacks)
coverage_plays = passes_against[passes_against['sack'] == 0]

coverage_metrics = {
    'comp_pct_allowed': coverage_plays['complete_pass'].mean(),
    'yards_per_target': coverage_plays['yards_gained'].mean(),
    'deep_comp_pct': coverage_plays[coverage_plays['air_yards'] >= 15]['complete_pass'].mean(),
    'yac_allowed': coverage_plays[coverage_plays['complete_pass'] == 1]['yards_after_catch'].mean(),
    'pass_epa_non_int': coverage_plays[coverage_plays['interception'] == 0]['epa'].mean()
}

# League comparison
all_coverage = pbp[(pbp['pass_attempt'] == 1) & (pbp['sack'] == 0)]
league_comp_pct = all_coverage['complete_pass'].mean()
league_yac = all_coverage[all_coverage['complete_pass'] == 1]['yards_after_catch'].mean()

print("Coverage Quality (excluding sacks):")
print(f"  Comp % Allowed:    {coverage_metrics['comp_pct_allowed']:.1%} (League: {league_comp_pct:.1%})")
print(f"  Yards/Target:      {coverage_metrics['yards_per_target']:.1f}")
print(f"  Deep Comp %:       {coverage_metrics['deep_comp_pct']:.1%}")
print(f"  YAC Allowed:       {coverage_metrics['yac_allowed']:.1f} (League: {league_yac:.1f})")
print(f"  Non-INT Pass EPA:  {coverage_metrics['pass_epa_non_int']:.3f}")

Finding: Green Bay allowed a HIGHER completion percentage than league average and gave up significant YAC. Their coverage was actually below average - they just got lucky on turnovers.


Analysis 5: Pass Rush Analysis

Maybe their pass rush created turnover opportunities?

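# Pass rush metrics
# Play-by-play data has no charted pressures, so sacks, QB hits, and scrambles
# serve as a rough proxy. Scrambles are logged as rushes, so the pressure proxy
# is computed over all dropbacks rather than over pass attempts.
dropbacks = gb_defense[gb_defense['qb_dropback'] == 1]
pass_rush = {
    'sack_rate': passes_against['sack'].mean(),
    'qb_hit_rate': passes_against['qb_hit'].mean(),
    'pressure_rate': ((dropbacks['sack'] == 1) |
                      (dropbacks['qb_hit'] == 1) |
                      (dropbacks['qb_scramble'] == 1)).mean()
}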

# League comparison
league_sack_rate = all_passes['sack'].mean()

print("Pass Rush Analysis:")
print(f"  Sack Rate:     {pass_rush['sack_rate']:.1%} (League: {league_sack_rate:.1%})")
print(f"  QB Hit Rate:   {pass_rush['qb_hit_rate']:.1%}")
print(f"  Pressure Rate: {pass_rush['pressure_rate']:.1%}")

# INTs on pressured vs clean plays
# Sacks are excluded: a sack can never end in an interception, so counting
# sacks as "pressured" would understate the INT rate under pressure.
throws = passes_against[passes_against['sack'] == 0].copy()
throws['pressured'] = throws['qb_hit'] == 1

int_on_pressure = throws.loc[throws['pressured'], 'interception'].mean()
int_on_clean = throws.loc[~throws['pressured'], 'interception'].mean()

print(f"\nINT Rate by Pressure:")
print(f"  On pressure:    {int_on_pressure:.2%}")
print(f"  Clean pocket:   {int_on_clean:.2%}")

Finding: Green Bay's pass rush was average. Critically, most of their interceptions came on clean pocket plays - meaning the turnovers weren't generated by pressure forcing bad throws.


Analysis 6: Regression Projection

What should we expect next year?

# Regression to mean calculation
def regress_int_rate(observed_rate, league_rate, sample_size, regression_constant=50):
    """
    Regress observed INT rate toward league mean.
    Higher regression_constant = more regression.
    """
    regressed = (
        (observed_rate * sample_size + league_rate * regression_constant) /
        (sample_size + regression_constant)
    )
    return regressed

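observed = gb_int_rate
league = league_int_rate
sample = len(passes_against)

# The default constant of 50 barely moves a sample of several hundred passes.
# Instead, pick k so the weight on the observed rate matches the year-to-year
# correlation from Analysis 3: n / (n + k) = r  =>  k = n * (1 - r) / r
r_year_to_year = 0.25  # approximate value from Analysis 3
k = sample * (1 - r_year_to_year) / r_year_to_year

projected = regress_int_rate(observed, league, sample, regression_constant=k)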

print("Regression Projection:")
print(f"  2023 Observed INT Rate: {observed:.2%}")
print(f"  League Average:         {league:.2%}")
print(f"  Projected 2024:         {projected:.2%}")
print(f"  Expected INTs (500 PA): {projected * 500:.1f}")

# Expected EPA change from INT regression
current_int_epa = turnover_plays[turnover_plays['interception'] == 1]['epa'].mean()
int_reduction = (observed - projected) * sample
epa_loss = int_reduction * abs(current_int_epa)

print(f"\nExpected EPA Loss from INT Regression: {epa_loss:.1f} total")
print(f"  -> Approximately {epa_loss/sample:.3f} EPA/play worse")

Finding: Regression analysis suggests Green Bay should expect approximately 15-17 interceptions next year instead of 25 - a significant decline that will expose their underlying coverage weaknesses.
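
The 15-17 range can be reproduced from figures quoted in this case study: 25 interceptions, a rate 1.8x the league average, and a year-to-year correlation of about 0.25. The sketch below assumes the weight placed on the observed rate equals that correlation, which is one common way to set the amount of shrinkage. The 600 pass attempts is a hypothetical round number; the projected interception count does not depend on it.

```python
# Inputs taken from the case study text, except n_attempts (hypothetical)
gb_ints = 25
rate_vs_league = 1.8      # Green Bay INT rate relative to league average
r_year_to_year = 0.25     # approximate persistence of team INT rate
n_attempts = 600          # assumed pass attempts faced

observed_rate = gb_ints / n_attempts
league_rate = observed_rate / rate_vs_league

# Regression constant k chosen so that n / (n + k) equals r
k = n_attempts * (1 - r_year_to_year) / r_year_to_year

# Weighted average of the observed rate and the league rate
projected_rate = (observed_rate * n_attempts + league_rate * k) / (n_attempts + k)
expected_ints = projected_rate * n_attempts

print(f"League-average INTs on {n_attempts} attempts: {league_rate * n_attempts:.1f}")
print(f"Projected INTs after regression:          {expected_ints:.1f}")
```

With these inputs the projection works out to roughly 16.7 interceptions: the league-average total of about 13.9 plus one quarter of the 11.1-interception surplus. A larger or smaller assumed correlation moves the projection proportionally between 13.9 and 25.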


Visualization

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.suptitle('Green Bay Defense: Turnover Luck vs Skill', fontsize=14, fontweight='bold')

# Plot 1: EPA with vs without turnovers
ax1 = axes[0, 0]
categories = ['With TOs', 'Without TOs', 'League Avg']
epa_values = [all_plays['epa'].mean(), non_turnover['epa'].mean(),
              pbp[pbp['play_type'].isin(['pass', 'run'])]['epa'].mean()]
colors = ['green', 'red', 'gray']
ax1.bar(categories, epa_values, color=colors, alpha=0.7)
ax1.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
ax1.set_ylabel('EPA per Play Allowed')
ax1.set_title('EPA Allowed: Turnover Impact')

# Plot 2: Year-over-year INT rate
ax2 = axes[0, 1]
years = ['2021', '2022', '2023', '2024 Proj']
# Use NaN (not a made-up rate) if a season is missing, so no fake point is plotted
rates = [int_2021.get('GB', np.nan)*100, int_2022.get('GB', np.nan)*100,
         gb_int_rate*100, projected*100]
ax2.plot(years, rates, 'o-', markersize=10, linewidth=2)
ax2.axhline(y=league_int_rate*100, color='gray', linestyle='--', label='League Avg')
ax2.set_ylabel('INT Rate (%)')
ax2.set_title('Green Bay INT Rate by Year')
ax2.legend()

# Plot 3: Coverage quality comparison
ax3 = axes[1, 0]
metrics_labels = ['Comp %', 'Yards/Target', 'YAC']
gb_values = [coverage_metrics['comp_pct_allowed']*100,
             coverage_metrics['yards_per_target'],
             coverage_metrics['yac_allowed']]
league_values = [league_comp_pct*100,
                 all_coverage['yards_gained'].mean(),
                 league_yac]

x = np.arange(len(metrics_labels))
width = 0.35
ax3.bar(x - width/2, gb_values, width, label='Green Bay')
ax3.bar(x + width/2, league_values, width, label='League', alpha=0.6)
ax3.set_xticks(x)
ax3.set_xticklabels(metrics_labels)
ax3.set_title('Coverage Quality (lower is better)')
ax3.legend()

# Plot 4: Turnover-adjusted ranking
ax4 = axes[1, 1]
rankings = ['With TOs\nRank', 'Without TOs\nRank']
ranks = [16, 26]  # Example ranks
colors = ['green', 'red']
ax4.barh(rankings, ranks, color=colors, alpha=0.7)
ax4.set_xlabel('Defensive Rank (1 = Best)')
ax4.set_title('Defensive Ranking Change')
ax4.set_xlim(0, 32)

plt.tight_layout()
plt.savefig('turnover_luck_analysis.png', dpi=300, bbox_inches='tight')
plt.close()

Conclusions

Assessment

The Green Bay defense is NOT elite.

Evidence:

  1. Turnover-adjusted EPA ranks 26th, not 16th
  2. INT rate at 1.8x league average is unsustainable (r = 0.25 year-to-year)
  3. Coverage quality was below average (high comp %, high YAC)
  4. Pass rush was average, didn't create turnover opportunities
  5. Historical regression suggests 15-17 INTs next year, not 25

Contract Recommendations

Player | Recommendation | Rationale
CB1    | Caution        | Benefited from team INT rate, coverage was average
S1     | Moderate       | Ball skills may persist somewhat, but expect regression
EDGE   | Market value   | Pass rush was average, not responsible for INTs
LB     | Caution        | Run defense was below average, turnovers masked issues

Financial Implications

If Green Bay extends at "elite" rates:
  - Total investment risk: ~$45M over contract
  - Expected performance decline: Regression to ~20th ranked defense
  - Value mismatch: Paying top-10 money for 20th ranked performance

Recommended approach:

  1. Do NOT pay premium for turnover-driven performance
  2. Short-term deals with incentives tied to coverage metrics
  3. Invest in proven coverage and pass rush


Discussion Questions

  1. How would your analysis change if you had access to PFF coverage grades?

  2. Are there any circumstances where high INT rates ARE sustainable?

  3. How should teams balance turnover potential vs coverage consistency in free agency?

  4. What metrics would better predict INT rate persistence?

  5. How does this analysis apply to evaluating DBs in the draft?


Key Lessons

  1. Turnovers are noisy - treat extreme rates with skepticism
  2. Look at non-turnover performance for true quality assessment
  3. Historical regression is your friend - use it
  4. Premium contracts require sustainable performance - not luck
  5. Analytics can save millions in avoided bad contracts