Case Study: The Curious Case of the Turnover Machine
Scenario
The Green Bay Packers defense had a remarkable 2023 season statistically. They led the league in interceptions (25) and finished top-5 in total turnovers (32). Local media declared them an "elite, ball-hawking defense." The GM is considering extending multiple defensive players to premium contracts based on this performance.
However, the analytics department has concerns. Their EPA-based metrics tell a different story, and they've been asked to present a full evaluation to ownership: Is this defense truly elite, or are they benefiting from unsustainable turnover luck?
Team Defensive Stats:
- 25 interceptions (1st in NFL)
- 32 total turnovers (T-3rd)
- -0.02 EPA per play allowed (16th)
- 45.2% success rate allowed (19th)
- 240.1 pass yards per game allowed (18th)
Data Gathering
import nfl_data_py as nfl
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load 2023 data
pbp = nfl.import_pbp_data([2023])
# Filter to Packers defense
team = 'GB'
gb_defense = pbp[pbp['defteam'] == team]
# Basic stats
passes_against = gb_defense[gb_defense['pass_attempt'] == 1]
runs_against = gb_defense[gb_defense['rush_attempt'] == 1]
print("Green Bay Packers Defense 2023")
print("=" * 50)
print(f"Total Plays Against: {len(gb_defense[gb_defense['play_type'].isin(['pass', 'run'])])}")
print(f"Interceptions: {passes_against['interception'].sum()}")
print(f"Fumbles Recovered: {gb_defense['fumble_lost'].sum()}")
Analysis 1: Turnover Context
First, let's examine whether these turnovers were earned or fortunate:
# Interception analysis
interceptions = passes_against[passes_against['interception'] == 1]
int_context = {
    'total': len(interceptions),
    'avg_air_yards': interceptions['air_yards'].mean(),
    # Baseline for comparison: average EPA on passes that were NOT intercepted
    'avg_pass_epa_non_int': passes_against[passes_against['interception'] == 0]['epa'].mean(),
    'deep_ints': (interceptions['air_yards'] >= 15).sum(),
    'short_ints': (interceptions['air_yards'] <= 5).sum(),
    'tipped_ints': None  # Not available in play-by-play data; would need charting data
}
print("Interception Analysis:")
print(f" Total INTs: {int_context['total']}")
print(f" Average Air Yards: {int_context['avg_air_yards']:.1f}")
print(f" Deep (15+ yards): {int_context['deep_ints']}")
print(f" Short (0-5 yards): {int_context['short_ints']}")
# Compare to league
all_passes = pbp[pbp['pass_attempt'] == 1]
league_int_rate = all_passes['interception'].mean()
gb_int_rate = passes_against['interception'].mean()
print(f"\nInterception Rate: {gb_int_rate:.2%} (League: {league_int_rate:.2%})")
print(f" -> {gb_int_rate/league_int_rate:.1f}x league average")
Finding: Green Bay's interception rate was 1.8x the league average. This is a red flag - such extreme outperformance rarely persists.
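One way to put the "red flag" in context is to ask how often a league-average defense would reach 25 interceptions by chance alone. The sketch below treats each pass attempt as an independent coin flip at the league rate, which is a simplification. The figure of 580 pass attempts faced is an illustrative assumption, not a number from the data; the league rate is backed out of the 1.8x ratio stated above.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_attempts = 580      # ASSUMPTION: illustrative number of pass attempts faced
observed_ints = 25    # from the scenario
# League-average rate implied by "1.8x the league average"
league_rate = observed_ints / 1.8 / n_attempts

# Chance that one truly average defense still reaches 25+ interceptions
p_one_team = binom_tail(observed_ints, n_attempts, league_rate)
# Chance that at least one of 32 average defenses does (teams treated as independent)
p_any_team = 1 - (1 - p_one_team) ** 32

print(f"P(one average defense gets >= 25 INTs): {p_one_team:.4f}")
print(f"P(at least one of 32 does):             {p_any_team:.3f}")
```

The second number is the more relevant one for a league leader: with 32 defenses, some team will usually look like an outlier even if every defense had identical underlying skill.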
Analysis 2: Non-Turnover Performance
What happens when we remove turnovers from the equation?
# EPA analysis excluding turnovers
all_plays = gb_defense[gb_defense['play_type'].isin(['pass', 'run'])]
turnover_plays = all_plays[(all_plays['interception'] == 1) | (all_plays['fumble_lost'] == 1)]
non_turnover = all_plays[~((all_plays['interception'] == 1) | (all_plays['fumble_lost'] == 1))]
print("EPA Analysis:")
print(f" With turnovers: {all_plays['epa'].mean():.3f}")
print(f" Without turnovers: {non_turnover['epa'].mean():.3f}")
print(f" Turnovers alone: {turnover_plays['epa'].mean():.3f}")
# What percentage of their "good" defense is turnovers?
total_negative_epa = all_plays[all_plays['epa'] < 0]['epa'].sum()
turnover_negative_epa = turnover_plays['epa'].sum()
print(f"\nTurnover contribution to negative EPA: {turnover_negative_epa/total_negative_epa:.1%}")
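Finding: When we exclude turnovers, Green Bay's defense allowed +0.05 EPA/play, which would rank 26th in the league. Turnovers account for the entire gap between the league-average overall mark (-0.02 EPA/play, 16th) and that bottom-tier figure; nothing in the down-to-down performance supports the "elite" label.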
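A ranking with turnovers removed is only meaningful if turnovers are removed for every defense, not just Green Bay's. The sketch below shows one way to compute both rankings side by side. The function name `turnover_adjusted_ranks` and the three-team toy data are hypothetical and exist only to exercise the logic; the column names match the play-by-play columns already used in this case study.

```python
import pandas as pd


def turnover_adjusted_ranks(plays: pd.DataFrame) -> pd.DataFrame:
    """Rank defenses by EPA/play allowed, with and without turnover plays."""
    is_turnover = (plays["interception"] == 1) | (plays["fumble_lost"] == 1)
    with_to = plays.groupby("defteam")["epa"].mean()
    without_to = plays[~is_turnover].groupby("defteam")["epa"].mean()
    out = pd.DataFrame({"epa_with_to": with_to, "epa_without_to": without_to})
    # Lower EPA allowed is better, so ascending rank 1 = best defense
    out["rank_with_to"] = out["epa_with_to"].rank(method="min").astype(int)
    out["rank_without_to"] = out["epa_without_to"].rank(method="min").astype(int)
    return out.sort_values("rank_without_to")


# Toy data (made up, three hypothetical teams) just to exercise the function
toy = pd.DataFrame({
    "defteam":      ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC", "CCC"],
    "epa":          [0.4, 0.3, -4.5, 0.1, -0.2, 0.0, 0.5, 0.2, -0.1],
    "interception": [0, 0, 1, 0, 0, 0, 0, 0, 0],
    "fumble_lost":  [0, 0, 0, 0, 0, 0, 0, 0, 0],
})
ranks = turnover_adjusted_ranks(toy)
print(ranks)
```

With real data, the input would be the full-league play-by-play filtered to pass and run plays, so that Green Bay's turnover-free EPA is compared against 31 other turnover-free figures.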
Analysis 3: Historical Turnover Regression
Let's examine whether turnover rates persist:
# Load multiple years
multi_year = nfl.import_pbp_data([2021, 2022, 2023])
# Calculate team INT rates by year
def calc_int_rate(df, year):
    # INT rate allowed per pass attempt, by defense, for a single season
    passes = df[(df['pass_attempt'] == 1) & (df['season'] == year)]
    return passes.groupby('defteam')['interception'].mean()
int_2021 = calc_int_rate(multi_year, 2021)
int_2022 = calc_int_rate(multi_year, 2022)
int_2023 = calc_int_rate(multi_year, 2023)
# Year-to-year correlation
corr_21_22 = int_2021.corr(int_2022)
corr_22_23 = int_2022.corr(int_2023)
print("Year-to-Year INT Rate Correlation:")
print(f" 2021 to 2022: r = {corr_21_22:.3f}")
print(f" 2022 to 2023: r = {corr_22_23:.3f}")
# Green Bay specifically
print(f"\nGreen Bay INT Rate:")
print(f" 2021: {int_2021.get('GB', 0):.2%}")
print(f" 2022: {int_2022.get('GB', 0):.2%}")
print(f" 2023: {int_2023.get('GB', 0):.2%}")
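Finding: Year-to-year INT rate correlation is approximately 0.25. If each season's rate is treated as stable skill plus independent noise, only about a quarter of the team-to-team variance in INT rate reflects persistent skill, and roughly 75% does not carry over. Green Bay's 2023 INT rate is a statistical outlier unlikely to persist.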
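A correlation computed from only 32 teams is itself a noisy estimate, so it helps to attach an interval to it. The sketch below uses the standard Fisher z-transformation to approximate a 95% confidence interval; the helper name `fisher_ci` is hypothetical, and the inputs (r = 0.25, 32 teams) are the values quoted in the finding rather than numbers recomputed from data.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson r via Fisher's z-transform."""
    z = math.atanh(r)               # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)     # standard error of z for n paired observations
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lo, hi = fisher_ci(0.25, 32)
print(f"r = 0.25 with 32 teams: 95% CI roughly ({lo:.2f}, {hi:.2f})")
```

The interval is wide and includes zero, which is why pooling several season pairs (as the two correlations printed above begin to do) gives a steadier read on persistence than any single pair.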
Analysis 4: Coverage Quality Without Turnovers
How well did Green Bay actually cover receivers?
# Coverage metrics (excluding sacks)
coverage_plays = passes_against[passes_against['sack'] == 0]
coverage_metrics = {
    'comp_pct_allowed': coverage_plays['complete_pass'].mean(),
    'yards_per_target': coverage_plays['yards_gained'].mean(),
    'deep_comp_pct': coverage_plays[coverage_plays['air_yards'] >= 15]['complete_pass'].mean(),
    'yac_allowed': coverage_plays[coverage_plays['complete_pass'] == 1]['yards_after_catch'].mean(),
    'pass_epa_non_int': coverage_plays[coverage_plays['interception'] == 0]['epa'].mean()
}
# League comparison
all_coverage = pbp[(pbp['pass_attempt'] == 1) & (pbp['sack'] == 0)]
league_comp_pct = all_coverage['complete_pass'].mean()
league_yac = all_coverage[all_coverage['complete_pass'] == 1]['yards_after_catch'].mean()
print("Coverage Quality (excluding turnovers):")
print(f" Comp % Allowed: {coverage_metrics['comp_pct_allowed']:.1%} (League: {league_comp_pct:.1%})")
print(f" Yards/Target: {coverage_metrics['yards_per_target']:.1f}")
print(f" Deep Comp %: {coverage_metrics['deep_comp_pct']:.1%}")
print(f" YAC Allowed: {coverage_metrics['yac_allowed']:.1f} (League: {league_yac:.1f})")
print(f" Non-INT Pass EPA: {coverage_metrics['pass_epa_non_int']:.3f}")
Finding: Green Bay allowed a HIGHER completion percentage than league average and gave up significant YAC. Their coverage was actually below average - they just got lucky on turnovers.
Analysis 5: Pass Rush Analysis
Maybe their pass rush created turnover opportunities?
# Pass rush metrics
pass_rush = {
    'sack_rate': passes_against['sack'].mean(),
    'qb_hit_rate': passes_against['qb_hit'].mean(),
    # Proxy only: hurries are not in play-by-play data, so a true
    # pressure rate would need charting data
    'pressure_rate': ((passes_against['sack'] == 1) |
                      (passes_against['qb_hit'] == 1) |
                      (passes_against['qb_scramble'] == 1)).mean()
}
# League comparison
league_sack_rate = all_passes['sack'].mean()
print("Pass Rush Analysis:")
print(f" Sack Rate: {pass_rush['sack_rate']:.1%} (League: {league_sack_rate:.1%})")
print(f" QB Hit Rate: {pass_rush['qb_hit_rate']:.1%}")
print(f" Pressure Rate: {pass_rush['pressure_rate']:.1%}")
# INTs on pressured vs clean plays
# A sack cannot end in an interception, so compare only plays where a pass was thrown
throws = passes_against[passes_against['sack'] == 0].copy()
throws['pressured'] = throws['qb_hit'] == 1  # QB hits as a pressure proxy
int_on_pressure = throws.loc[throws['pressured'], 'interception'].mean()
int_on_clean = throws.loc[~throws['pressured'], 'interception'].mean()
print(f"\nINT Rate by Pressure:")
print(f" On pressure: {int_on_pressure:.2%}")
print(f" Clean pocket: {int_on_clean:.2%}")
Finding: Green Bay's pass rush was average. Critically, most of their interceptions came on clean pocket plays - meaning the turnovers weren't generated by pressure forcing bad throws.
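The share of interceptions that came from clean pockets and the interception rate on clean-pocket throws answer different questions. Because most throws come from a clean pocket, most interceptions will too, even if pressure raises the per-throw rate. The sketch below separates the two; the helper `int_split` and all four counts are hypothetical, chosen only to illustrate the base-rate effect.

```python
def int_split(clean_throws: int, clean_ints: int,
              pressured_throws: int, pressured_ints: int) -> dict:
    """Separate the share of INTs by pocket type from the INT rate by pocket type."""
    total_ints = clean_ints + pressured_ints
    return {
        "clean_int_share": clean_ints / total_ints,            # of all INTs, fraction from clean pockets
        "clean_int_rate": clean_ints / clean_throws,           # INTs per clean-pocket throw
        "pressured_int_rate": pressured_ints / pressured_throws,  # INTs per pressured throw
    }


# HYPOTHETICAL counts, not Green Bay's actual numbers
result = int_split(clean_throws=450, clean_ints=17,
                   pressured_throws=100, pressured_ints=8)

for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

In this made-up example most interceptions come from clean pockets, yet the rate under pressure is about twice as high. Checking both numbers guards against reading a high clean-pocket share as evidence that pressure played no role.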
Analysis 6: Regression Projection
What should we expect next year?
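# Regression to mean calculation
def regress_int_rate(observed_rate, league_rate, sample_size, regression_constant=1500):
    """
    Regress observed INT rate toward league mean.
    Higher regression_constant = more regression.

    The default of 1500 is an assumption: it gives one season of roughly
    500-600 pass plays about a quarter of the weight, in line with the
    ~0.25 year-to-year correlation found in Analysis 3.
    """
    regressed = (
        (observed_rate * sample_size + league_rate * regression_constant) /
        (sample_size + regression_constant)
    )
    return regressed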
observed = gb_int_rate
league = league_int_rate
sample = len(passes_against)
projected = regress_int_rate(observed, league, sample)
print("Regression Projection:")
print(f" 2023 Observed INT Rate: {observed:.2%}")
print(f" League Average: {league:.2%}")
print(f" Projected 2024: {projected:.2%}")
print(f" Expected INTs (500 PA): {projected * 500:.1f}")
# Expected EPA change from INT regression
current_int_epa = turnover_plays[turnover_plays['interception'] == 1]['epa'].mean()
int_reduction = (observed - projected) * sample
epa_loss = int_reduction * abs(current_int_epa)
print(f"\nExpected EPA Loss from INT Regression: {epa_loss:.1f} total")
print(f" -> Approximately {epa_loss/sample:.3f} EPA/play worse")
Finding: Regression analysis suggests Green Bay should expect approximately 15-17 interceptions next year instead of 25 - a significant decline that will expose their underlying coverage weaknesses.
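The 15-17 range can be sanity-checked with back-of-the-envelope arithmetic that ties the projection to the year-to-year correlation. The sketch below assumes the observed rate keeps a weight equal to r (about 0.25) and the league average gets the rest. It also shows the padding constant k that gives the same weight in a formula of the form (observed * n + league * k) / (n + k). The 580 pass attempts and both helper names are illustrative assumptions.

```python
def regression_constant_from_r(sample_size: float, r: float) -> float:
    """Padding k such that sample_size / (sample_size + k) equals r."""
    return sample_size * (1 - r) / r


def project_ints(observed_ints: float, league_ints: float, r: float) -> float:
    """Shrink the observed total toward the league expectation, keeping weight r."""
    return league_ints + r * (observed_ints - league_ints)


observed_ints = 25                  # from the scenario
league_ints = observed_ints / 1.8   # implied by the 1.8x league-average finding
r = 0.25                            # approximate year-to-year correlation
n = 580                             # ASSUMPTION: illustrative pass attempts faced

k = regression_constant_from_r(n, r)
projected = project_ints(observed_ints, league_ints, r)

# Same projection via the padded-sample form, expressed as INTs over n attempts
padded = (observed_ints / n * n + league_ints / n * k) / (n + k) * n

print(f"Regression constant k: {k:.0f}")
print(f"Projected INTs over {n} attempts: {projected:.1f}")
```

Under these assumptions the projection lands inside the 15-17 range, and the implied constant is on the order of three seasons' worth of pass attempts. A much smaller constant would leave the projection close to the observed 25.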
Visualization
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.suptitle('Green Bay Defense: Turnover Luck vs Skill', fontsize=14, fontweight='bold')
# Plot 1: EPA with vs without turnovers
ax1 = axes[0, 0]
categories = ['With TOs', 'Without TOs', 'League Avg']
epa_values = [all_plays['epa'].mean(), non_turnover['epa'].mean(),
pbp[pbp['play_type'].isin(['pass', 'run'])]['epa'].mean()]
colors = ['green', 'red', 'gray']
ax1.bar(categories, epa_values, color=colors, alpha=0.7)
ax1.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
ax1.set_ylabel('EPA per Play Allowed')
ax1.set_title('EPA Allowed: Turnover Impact')
# Plot 2: Year-over-year INT rate
ax2 = axes[0, 1]
years = ['2021', '2022', '2023', '2024 Proj']
# Use NaN (not a made-up default) if a season is missing, so no fake point is plotted
rates = [int_2021.get('GB', np.nan)*100, int_2022.get('GB', np.nan)*100,
         gb_int_rate*100, projected*100]
ax2.plot(years, rates, 'o-', markersize=10, linewidth=2)
ax2.axhline(y=league_int_rate*100, color='gray', linestyle='--', label='League Avg')
ax2.set_ylabel('INT Rate (%)')
ax2.set_title('Green Bay INT Rate by Year')
ax2.legend()
# Plot 3: Coverage quality comparison
ax3 = axes[1, 0]
metrics_labels = ['Comp %', 'Yards/Target', 'YAC']
gb_values = [coverage_metrics['comp_pct_allowed']*100,
coverage_metrics['yards_per_target'],
coverage_metrics['yac_allowed']]
league_values = [league_comp_pct*100,
all_coverage['yards_gained'].mean(),
league_yac]
x = np.arange(len(metrics_labels))
width = 0.35
ax3.bar(x - width/2, gb_values, width, label='Green Bay')
ax3.bar(x + width/2, league_values, width, label='League', alpha=0.6)
ax3.set_xticks(x)
ax3.set_xticklabels(metrics_labels)
ax3.set_title('Coverage Quality (lower is better)')
ax3.legend()
# Plot 4: Turnover-adjusted ranking
ax4 = axes[1, 1]
rankings = ['With TOs\nRank', 'Without TOs\nRank']
ranks = [16, 26]  # Illustrative ranks taken from the findings above, not computed here
colors = ['green', 'red']
ax4.barh(rankings, ranks, color=colors, alpha=0.7)
ax4.set_xlabel('Defensive Rank (1 = Best)')
ax4.set_title('Defensive Ranking Change')
ax4.set_xlim(0, 32)
plt.tight_layout()
plt.savefig('turnover_luck_analysis.png', dpi=300, bbox_inches='tight')
plt.close()
Conclusions
Assessment
The Green Bay defense is NOT elite.
Evidence:
1. Turnover-adjusted EPA ranks 26th, not 16th
2. INT rate at 1.8x league average is unsustainable (r = 0.25 year-to-year)
3. Coverage quality was below average (high comp %, high YAC)
4. Pass rush was average and didn't create turnover opportunities
5. Historical regression suggests 15-17 INTs next year, not 25
Contract Recommendations
| Player | Recommendation | Rationale |
|---|---|---|
| CB1 | Caution | Benefited from team INT rate, coverage was average |
| S1 | Moderate | Ball skills may persist somewhat, but expect regression |
| EDGE | Market value | Pass rush was average, not responsible for INTs |
| LB | Caution | Run defense was below average, turnovers masked issues |
Financial Implications
If Green Bay extends at "elite" rates:
- Total investment risk: ~$45M over contract
- Expected performance decline: Regression to ~20th ranked defense
- Value mismatch: Paying top-10 money for 20th ranked performance

Recommended approach:
1. Do NOT pay premium for turnover-driven performance
2. Short-term deals with incentives tied to coverage metrics
3. Invest in proven coverage and pass rush
Discussion Questions
- How would your analysis change if you had access to PFF coverage grades?
- Are there any circumstances where high INT rates ARE sustainable?
- How should teams balance turnover potential vs coverage consistency in free agency?
- What metrics would better predict INT rate persistence?
- How does this analysis apply to evaluating DBs in the draft?
Key Lessons
- Turnovers are noisy - treat extreme rates with skepticism
- Look at non-turnover performance for true quality assessment
- Historical regression is your friend - use it
- Premium contracts require sustainable performance - not luck
- Analytics can save millions in avoided bad contracts