In This Chapter
- Learning Objectives
- Introduction: The Challenge of Defensive Analytics
- Team-Level Defensive Metrics
- Pass Rush Analytics
- Coverage Analytics
- Run Defense Analytics
- Situational Defense
- Turnover Analysis
- Opponent Adjustment
- Individual Defensive Evaluation
- Building a Defensive Evaluation System
- Common Pitfalls in Defensive Analysis
- Summary
- Preview: Chapter 11
Chapter 10: Defensive Analytics
Learning Objectives
By the end of this chapter, you will be able to:
- Calculate and interpret EPA allowed and defensive success rate
- Understand the challenges of individual defensive attribution
- Apply pass rush and coverage metrics appropriately
- Evaluate run defense with EPA-based approaches
- Separate team from individual defensive performance
- Account for opponent and situation adjustments
- Build comprehensive defensive evaluation systems
Introduction: The Challenge of Defensive Analytics
Defensive analytics presents unique challenges that make it arguably the most difficult area in football evaluation. Unlike offense, where credit flows toward the player with the ball, defensive success involves 11 players working in coordination, with outcomes often determined by the weakest link rather than the strongest performer.
Why Defense Is Hard to Measure
The Attribution Problem
On a 6-yard completion:
- Did the coverage defender fail?
- Did the pass rusher not get there fast enough?
- Did the linebacker lose the receiver in zone?
- Did the defensive call give up the play?
- Was the quarterback simply excellent?
Unlike a receiver who clearly catches the ball, defensive "success" is often the absence of offensive success. This negative space is inherently harder to measure and attribute.
The Interdependence Challenge
Defensive positions work together more tightly than offensive positions:
Offense: QB throws to WR → Individual credit easier
Defense: DB covers WR while DL rushes QB while LB drops → Shared responsibility
A cornerback's success rate depends heavily on:
- How quickly the pass rush arrives
- How long the QB has to throw
- Whether safety help is available
- The route combinations he faces
- The offensive scheme he opposes
The Variance Problem
Defensive statistics are noisier than offensive stats:
- Fewer opportunities per player (passes defended, interceptions)
- Higher randomness in outcomes (tipped passes, fumble recoveries)
- Dependent on offensive decisions (who gets targeted)
This means defensive metrics require larger samples to stabilize and carry more uncertainty.
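To make the sample-size point concrete, the sketch below uses the binomial standard error to show how the uncertainty around an observed rate shrinks as opportunities accumulate. The function name and the 2.5% interception rate are illustrative assumptions, not values taken from the data.

```python
import math

def rate_standard_error(rate: float, n: int) -> float:
    """Binomial standard error of an observed rate over n opportunities."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(rate * (1 - rate) / n)

# Hypothetical 2.5% interception rate observed at different sample sizes
for n in (100, 500, 2000):
    margin = 1.96 * rate_standard_error(0.025, n)  # approximate 95% margin
    print(f"n={n:>4}: 2.5% +/- {margin:.1%}")
```

The margin only halves when the sample quadruples, which is why per-player defensive rates built on a few dozen opportunities remain so uncertain.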
Team-Level Defensive Metrics
EPA Allowed
The foundation of modern defensive evaluation is EPA allowed per play - the same metric we use offensively, but from the opponent's perspective.
import nfl_data_py as nfl
import pandas as pd
import numpy as np

# Load data
pbp = nfl.import_pbp_data([2023])

# Calculate defensive EPA (from opponent's perspective)
def calculate_defensive_epa(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate EPA allowed by each defense.
    Lower (more negative) is better.
    """
    # Regular plays only
    plays = pbp[
        (pbp['play_type'].isin(['pass', 'run'])) &
        (pbp['epa'].notna())
    ].copy()

    # EPA allowed is opponent's EPA
    defense_epa = (plays
        .groupby('defteam')
        .agg(
            plays_against=('epa', 'count'),
            total_epa_allowed=('epa', 'sum'),
            epa_per_play=('epa', 'mean'),
            success_rate_allowed=('epa', lambda x: (x > 0).mean())
        )
        .sort_values('epa_per_play')
    )
    return defense_epa

defense_stats = calculate_defensive_epa(pbp)
print("EPA Allowed Rankings (lower is better):")
print(defense_stats.head(10).round(3).to_string())
Interpreting Defensive EPA:
| EPA/Play | Interpretation |
|---|---|
| < -0.10 | Elite defense |
| -0.10 to -0.05 | Above average |
| -0.05 to 0.00 | Average |
| 0.00 to 0.05 | Below average |
| > 0.05 | Poor defense |
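The table can be turned into a small lookup helper. This is a minimal sketch: the function name `classify_defense` is hypothetical, and the handling of values that fall exactly on a boundary is an assumption, since the table does not specify it.

```python
def classify_defense(epa_per_play: float) -> str:
    """Map EPA allowed per play to the tiers in the table above."""
    if epa_per_play < -0.10:
        return "Elite defense"
    if epa_per_play < -0.05:
        return "Above average"
    if epa_per_play < 0.00:
        return "Average"
    if epa_per_play <= 0.05:
        return "Below average"
    return "Poor defense"

# Example values chosen for illustration only
for value in (-0.12, -0.07, -0.02, 0.03, 0.08):
    print(f"{value:+.2f} -> {classify_defense(value)}")
```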
Defensive Success Rate
Success rate allowed measures how often the offense achieves positive EPA against a defense:
def defensive_success_rate(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate success rate allowed by defense.
    Lower is better - offense succeeding less often.
    """
    plays = pbp[
        (pbp['play_type'].isin(['pass', 'run'])) &
        (pbp['epa'].notna())
    ]
    success_allowed = (plays
        .groupby('defteam')
        .agg(
            plays=('epa', 'count'),
            success_allowed=('epa', lambda x: (x > 0).mean()),
            explosive_allowed=('epa', lambda x: (x > 1.0).mean()),
            negative_plays=('epa', lambda x: (x < -0.5).mean())
        )
        .sort_values('success_allowed')
    )
    return success_allowed
Success Rate vs EPA:
- Success rate measures consistency of stops
- EPA captures magnitude of plays
A defense that allows a 40% success rate but gives up big plays whenever the offense does succeed can have worse EPA than one that allows a 45% success rate but limits the damage on each success.
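A toy comparison makes the distinction visible. The per-play EPA values below are invented for illustration, and `summarize` is a hypothetical helper, not part of any library.

```python
import numpy as np

# Invented per-play EPA values: 20 plays run against each defense
defense_a = np.array([1.2] * 8 + [-0.5] * 12)  # 40% success, big gains allowed
defense_b = np.array([0.6] * 9 + [-0.5] * 11)  # 45% success, damage limited

def summarize(epa: np.ndarray) -> dict:
    """Return success rate allowed and EPA per play for one defense."""
    return {
        'success_rate_allowed': float((epa > 0).mean()),
        'epa_per_play': float(epa.mean()),
    }

print("Defense A:", summarize(defense_a))
print("Defense B:", summarize(defense_b))
```

Defense A wins on success rate but loses clearly on EPA per play, which is why the two metrics are best read together.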
Pass Defense vs Run Defense
Defenses often specialize, excelling against one attack while vulnerable to another:
def split_defense_analysis(pbp: pd.DataFrame) -> pd.DataFrame:
    """Analyze pass and run defense separately."""
    plays = pbp[
        (pbp['play_type'].isin(['pass', 'run'])) &
        (pbp['epa'].notna())
    ]

    # Pass defense
    pass_def = (plays[plays['play_type'] == 'pass']
        .groupby('defteam')
        .agg(
            pass_plays=('epa', 'count'),
            pass_epa_allowed=('epa', 'mean'),
            pass_success_allowed=('epa', lambda x: (x > 0).mean())
        )
    )

    # Run defense
    run_def = (plays[plays['play_type'] == 'run']
        .groupby('defteam')
        .agg(
            run_plays=('epa', 'count'),
            run_epa_allowed=('epa', 'mean'),
            run_success_allowed=('epa', lambda x: (x > 0).mean())
        )
    )

    # Combine
    defense = pass_def.join(run_def)
    defense['balance'] = defense['pass_epa_allowed'] - defense['run_epa_allowed']
    return defense

split_defense = split_defense_analysis(pbp)

# Pass defense specialists (better vs pass)
print("Pass Defense Specialists:")
print(split_defense.nsmallest(5, 'pass_epa_allowed')[['pass_epa_allowed', 'run_epa_allowed']].round(3).to_string())

# Run defense specialists
print("\nRun Defense Specialists:")
print(split_defense.nsmallest(5, 'run_epa_allowed')[['pass_epa_allowed', 'run_epa_allowed']].round(3).to_string())
Pass Rush Analytics
Traditional Metrics and Their Limitations
Sacks are the most visible pass rush statistic but are problematic:
- High variance (sacks are rare, occurring on only about 6-7% of dropbacks league-wide)
- Dependent on coverage quality
- Affected by QB mobility
- Influenced by offensive line quality
Pressures (sacks + QB hits + hurries) are more informative but still limited:
- "Hurry" definitions vary
- No standard tracking in public data
- Still doesn't capture how quickly pressure arrives
Pressure Rate Estimation
Without charting data, we can estimate pressure from play-by-play:
def estimate_pressure_rate(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Estimate defensive pressure rate from outcomes.
    Sacks, QB hits, and scrambles indicate pressure.
    """
    # Filter on dropbacks rather than pass attempts: scrambles are recorded
    # as runs, so a pass_attempt filter would drop them entirely
    passes = pbp[pbp['qb_dropback'] == 1].copy()

    # Define pressure indicators
    passes['pressured'] = (
        (passes['sack'] == 1) |
        (passes['qb_hit'] == 1) |
        (passes['qb_scramble'] == 1)
    )

    pressure_analysis = (passes
        .groupby('defteam')
        .agg(
            dropbacks=('qb_dropback', 'count'),
            sacks=('sack', 'sum'),
            qb_hits=('qb_hit', 'sum'),
            scrambles=('qb_scramble', 'sum'),
            pressured=('pressured', 'sum')
        )
    )
    pressure_analysis['sack_rate'] = pressure_analysis['sacks'] / pressure_analysis['dropbacks']
    pressure_analysis['pressure_rate'] = pressure_analysis['pressured'] / pressure_analysis['dropbacks']

    # EPA when pressuring vs not (columns after unstack are False, True)
    pressure_epa = (passes
        .groupby(['defteam', 'pressured'])
        ['epa']
        .mean()
        .unstack()
    )
    pressure_epa.columns = ['no_pressure_epa', 'pressure_epa']
    return pressure_analysis.join(pressure_epa)

pressure_stats = estimate_pressure_rate(pbp)
print("Defensive Pressure Rankings:")
print(pressure_stats.nlargest(10, 'sack_rate')[['sack_rate', 'pressure_rate']].round(3).to_string())
Pass Rush Efficiency
Sack rate alone doesn't capture pass rush quality. We also need to consider what happens on the dropbacks that don't end in a sack:
def pass_rush_efficiency(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Evaluate pass rush efficiency beyond sacks.
    """
    passes = pbp[pbp['pass_attempt'] == 1].copy()

    # Attempts that were actually thrown (not sacked)
    non_sacks = passes[passes['sack'] == 0]

    rush_efficiency = (passes
        .groupby('defteam')
        .agg(
            dropbacks=('pass_attempt', 'count'),
            sacks=('sack', 'sum'),
            interceptions=('interception', 'sum'),
            pass_epa_allowed=('epa', 'mean')
        )
    )

    # Completion % and yards per attempt allowed (when not sacked)
    comp_allowed = (non_sacks
        .groupby('defteam')
        .agg(
            comp_pct_allowed=('complete_pass', 'mean'),
            # yards_gained, not air_yards: air yards measure target depth
            yards_per_attempt=('yards_gained', 'mean')
        )
    )
    return rush_efficiency.join(comp_allowed)
Individual Pass Rusher Evaluation
Individual pass rusher stats require charting data (PFF, SIS) but we can analyze defensive line impact through team splits:
def estimate_individual_rush_impact(pbp: pd.DataFrame, team: str) -> dict:
    """
    Estimate pass rush impact through available proxies.
    Note: True individual attribution requires film charting.
    """
    team_passes = pbp[(pbp['pass_attempt'] == 1) & (pbp['defteam'] == team)]

    # Analyze by game (proxy for lineup changes)
    game_analysis = (team_passes
        .groupby('game_id')
        .agg(
            sack_rate=('sack', 'mean'),
            pressure_rate=('pressured', 'mean') if 'pressured' in team_passes.columns else ('sack', 'mean'),
            pass_epa_allowed=('epa', 'mean')
        )
    )
    return {
        'team': team,
        'games': len(game_analysis),
        'sack_rate_mean': game_analysis['sack_rate'].mean(),
        'sack_rate_std': game_analysis['sack_rate'].std(),
        'game_to_game_variance': game_analysis['sack_rate'].var()
    }
Coverage Analytics
The Coverage Evaluation Challenge
Coverage is perhaps the hardest area to evaluate in football:
- Target-based metrics are biased: Good corners get targeted less
- Completion percentage allowed is noisy: Sample sizes are small
- Yards per coverage snap is hidden: Not in standard data
- Scheme effects dominate: Zone vs man affects individual stats
Available Coverage Metrics
From standard play-by-play data, we can analyze team-level coverage:
def coverage_analysis(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze team coverage metrics from PBP data.
    """
    passes = pbp[
        (pbp['pass_attempt'] == 1) &
        (pbp['sack'] == 0)  # Exclude sacks for coverage analysis
    ]
    coverage = (passes
        .groupby('defteam')
        .agg(
            targets=('pass_attempt', 'count'),
            completions=('complete_pass', 'sum'),
            interceptions=('interception', 'sum'),
            yards_allowed=('yards_gained', 'sum'),
            air_yards_allowed=('air_yards', 'sum'),
            tds_allowed=('pass_touchdown', 'sum'),
            pass_epa=('epa', 'mean')
        )
    )
    coverage['comp_pct_allowed'] = coverage['completions'] / coverage['targets']
    coverage['yards_per_target'] = coverage['yards_allowed'] / coverage['targets']
    coverage['int_rate'] = coverage['interceptions'] / coverage['targets']
    coverage['td_rate_allowed'] = coverage['tds_allowed'] / coverage['targets']

    # Passer rating allowed (team level). The NFL formula caps each of the
    # four components to the range 0 to 2.375 before they are combined.
    comp_part = ((coverage['comp_pct_allowed'] - 0.3) * 5).clip(0, 2.375)
    yards_part = ((coverage['yards_per_target'] - 3) * 0.25).clip(0, 2.375)
    td_part = (coverage['td_rate_allowed'] * 20).clip(0, 2.375)
    int_part = (2.375 - coverage['int_rate'] * 25).clip(0, 2.375)
    coverage['passer_rating_allowed'] = (
        (comp_part + yards_part + td_part + int_part) / 6 * 100
    )
    return coverage

coverage_stats = coverage_analysis(pbp)
print("Coverage Rankings (by Passer Rating Allowed):")
print(coverage_stats.nsmallest(10, 'passer_rating_allowed')[
    ['comp_pct_allowed', 'yards_per_target', 'int_rate', 'passer_rating_allowed']
].round(3).to_string())
Deep Pass Defense
Defending deep passes is particularly valuable given their high EPA:
def deep_pass_defense(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze defense against deep passes (15+ air yards).
    """
    deep_passes = pbp[
        (pbp['pass_attempt'] == 1) &
        (pbp['sack'] == 0) &
        (pbp['air_yards'] >= 15)
    ]
    deep_defense = (deep_passes
        .groupby('defteam')
        .agg(
            deep_targets=('pass_attempt', 'count'),
            deep_completions=('complete_pass', 'sum'),
            deep_epa=('epa', 'mean'),
            deep_ints=('interception', 'sum')
        )
    )
    deep_defense['deep_comp_pct'] = deep_defense['deep_completions'] / deep_defense['deep_targets']
    deep_defense['deep_int_rate'] = deep_defense['deep_ints'] / deep_defense['deep_targets']
    return deep_defense

deep_def = deep_pass_defense(pbp)
print("Deep Pass Defense (lower comp % better):")
print(deep_def.nsmallest(10, 'deep_comp_pct')[['deep_targets', 'deep_comp_pct', 'deep_epa']].round(3).to_string())
Yards After Catch Allowed
YAC allowed reflects tackling and pursuit:
def yac_defense(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze yards after catch allowed.
    """
    completions = pbp[
        (pbp['complete_pass'] == 1) &
        (pbp['yards_after_catch'].notna())
    ]
    yac_allowed = (completions
        .groupby('defteam')
        .agg(
            receptions_allowed=('complete_pass', 'count'),
            total_yac_allowed=('yards_after_catch', 'sum'),
            avg_yac_allowed=('yards_after_catch', 'mean'),
            explosive_yac=('yards_after_catch', lambda x: (x >= 15).mean())
        )
        .sort_values('avg_yac_allowed')
    )
    return yac_allowed

yac_def = yac_defense(pbp)
print("YAC Allowed (lower is better):")
print(yac_def.head(10).round(2).to_string())
Run Defense Analytics
Yards Per Carry Allowed
Basic but still useful:
def run_defense_basic(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Basic run defense metrics.
    """
    runs = pbp[pbp['rush_attempt'] == 1]
    run_def = (runs
        .groupby('defteam')
        .agg(
            rush_attempts_faced=('rush_attempt', 'count'),
            yards_allowed=('yards_gained', 'sum'),
            ypc_allowed=('yards_gained', 'mean'),
            rush_epa_allowed=('epa', 'mean'),
            rush_success_allowed=('epa', lambda x: (x > 0).mean())
        )
        .sort_values('ypc_allowed')
    )
    return run_def
Stuff Rate (Defensive Perspective)
Stuff rate measures the defense's ability to stop runs at or behind the line:
def defensive_stuff_rate(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate defensive stuff rate (stopping runs at/behind LOS).
    """
    runs = pbp[pbp['rush_attempt'] == 1]
    stuff_analysis = (runs
        .groupby('defteam')
        .agg(
            attempts_faced=('rush_attempt', 'count'),
            stuffs=('yards_gained', lambda x: (x <= 0).sum()),
            negative_plays=('yards_gained', lambda x: (x < 0).sum()),
            tfl_yards=('yards_gained', lambda x: x[x < 0].sum())
        )
    )
    stuff_analysis['stuff_rate'] = stuff_analysis['stuffs'] / stuff_analysis['attempts_faced']
    stuff_analysis['negative_rate'] = stuff_analysis['negative_plays'] / stuff_analysis['attempts_faced']
    return stuff_analysis.sort_values('stuff_rate', ascending=False)

defensive_stuffs = defensive_stuff_rate(pbp)
print("Best Run Stuffing Defenses:")
print(defensive_stuffs.head(10)[['stuff_rate', 'negative_rate']].round(3).to_string())
Explosive Run Prevention
Preventing big runs is critical:
def explosive_run_prevention(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Measure ability to prevent explosive runs.
    """
    runs = pbp[pbp['rush_attempt'] == 1]
    explosive = (runs
        .groupby('defteam')
        .agg(
            attempts=('rush_attempt', 'count'),
            runs_10_plus=('yards_gained', lambda x: (x >= 10).sum()),
            runs_20_plus=('yards_gained', lambda x: (x >= 20).sum())
        )
    )
    explosive['rate_10_plus'] = explosive['runs_10_plus'] / explosive['attempts']
    explosive['rate_20_plus'] = explosive['runs_20_plus'] / explosive['attempts']
    return explosive.sort_values('rate_10_plus')

explosive_prevention = explosive_run_prevention(pbp)
print("Best at Preventing Explosive Runs:")
print(explosive_prevention.head(10)[['rate_10_plus', 'rate_20_plus']].round(3).to_string())
Situational Defense
Third Down Defense
Third down defense often determines game outcomes:
def third_down_defense(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze third down defensive performance.
    """
    # Restrict to scrimmage plays so penalties and other non-plays
    # don't inflate the number of third downs faced
    third_downs = pbp[
        (pbp['down'] == 3) &
        (pbp['play_type'].isin(['pass', 'run']))
    ]
    third_def = (third_downs
        .groupby('defteam')
        .agg(
            third_downs_faced=('play_id', 'count'),
            conversions_allowed=('third_down_converted', 'sum'),
            epa_allowed=('epa', 'mean')
        )
    )
    third_def['conversion_rate_allowed'] = (
        third_def['conversions_allowed'] / third_def['third_downs_faced']
    )

    # Split by distance
    short = third_downs[third_downs['ydstogo'] <= 3]
    medium = third_downs[(third_downs['ydstogo'] > 3) & (third_downs['ydstogo'] <= 7)]
    long = third_downs[third_downs['ydstogo'] > 7]
    short_conv = short.groupby('defteam')['third_down_converted'].mean()
    medium_conv = medium.groupby('defteam')['third_down_converted'].mean()
    long_conv = long.groupby('defteam')['third_down_converted'].mean()
    third_def['short_conv_allowed'] = short_conv
    third_def['medium_conv_allowed'] = medium_conv
    third_def['long_conv_allowed'] = long_conv
    return third_def.sort_values('conversion_rate_allowed')

third_down_def = third_down_defense(pbp)
print("Third Down Defense (lower conversion rate better):")
print(third_down_def.head(10)[['conversion_rate_allowed', 'short_conv_allowed', 'long_conv_allowed']].round(3).to_string())
Red Zone Defense
Limiting touchdowns in the red zone is critical:
def red_zone_defense(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze red zone defensive performance.
    """
    red_zone = pbp[
        (pbp['yardline_100'] <= 20) &
        (pbp['play_type'].isin(['pass', 'run']))
    ]
    rz_def = (red_zone
        .groupby('defteam')
        .agg(
            rz_plays=('play_id', 'count'),
            tds_allowed=('touchdown', 'sum'),
            epa_allowed=('epa', 'mean')
        )
    )

    # TD rate per trip (approximate)
    rz_trips = red_zone.groupby(['defteam', 'game_id', 'drive'])['touchdown'].max()
    rz_trip_summary = rz_trips.groupby('defteam').agg(['count', 'sum'])
    rz_trip_summary.columns = ['rz_trips', 'rz_tds']
    rz_def = rz_def.join(rz_trip_summary)
    rz_def['td_rate'] = rz_def['rz_tds'] / rz_def['rz_trips']
    return rz_def.sort_values('td_rate')

rz_defense = red_zone_defense(pbp)
print("Red Zone Defense (lower TD rate better):")
print(rz_defense.head(10)[['rz_trips', 'td_rate', 'epa_allowed']].round(3).to_string())
Late and Close Defense
Performance when games are competitive:
def late_close_defense(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze defense in close games (4th quarter, within 8 points).
    """
    late_close = pbp[
        (pbp['qtr'] == 4) &
        (abs(pbp['score_differential']) <= 8) &
        (pbp['play_type'].isin(['pass', 'run']))
    ]
    clutch_def = (late_close
        .groupby('defteam')
        .agg(
            plays=('play_id', 'count'),
            epa_allowed=('epa', 'mean'),
            success_allowed=('epa', lambda x: (x > 0).mean())
        )
        .query('plays >= 30')
        .sort_values('epa_allowed')
    )
    return clutch_def

clutch_defense = late_close_defense(pbp)
print("Clutch Defense (4th quarter, close games):")
print(clutch_defense.head(10).round(3).to_string())
Turnover Analysis
Turnover Luck vs Skill
Turnovers are high-variance events with significant luck components:
def turnover_analysis(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze turnovers generated by defense.
    """
    plays = pbp[pbp['play_type'].isin(['pass', 'run'])]

    # fumble_lost marks fumbles the offense lost, i.e. fumbles the defense
    # recovered. It does not count forced fumbles the offense kept.
    turnovers = (plays
        .groupby('defteam')
        .agg(
            plays=('play_id', 'count'),
            interceptions=('interception', 'sum'),
            fumbles_recovered=('fumble_lost', 'sum')
        )
    )
    turnovers['total_turnovers'] = turnovers['interceptions'] + turnovers['fumbles_recovered']
    turnovers['turnover_rate'] = turnovers['total_turnovers'] / turnovers['plays']
    turnovers['int_rate'] = turnovers['interceptions'] / turnovers['plays']
    turnovers['fumble_rate'] = turnovers['fumbles_recovered'] / turnovers['plays']

    # Calculate league average for context
    league_int_rate = plays['interception'].mean()
    league_fumble_rate = plays['fumble_lost'].mean()
    turnovers['int_vs_expected'] = turnovers['interceptions'] - (turnovers['plays'] * league_int_rate)
    turnovers['fumble_vs_expected'] = turnovers['fumbles_recovered'] - (turnovers['plays'] * league_fumble_rate)
    return turnovers.sort_values('turnover_rate', ascending=False)

turnover_stats = turnover_analysis(pbp)
print("Turnover Generation:")
print(turnover_stats.head(10)[['interceptions', 'fumbles_recovered', 'turnover_rate']].round(3).to_string())
Interception Quality
Not all interceptions are equal:
def interception_quality(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze quality of interceptions (context matters).
    """
    interceptions = pbp[pbp['interception'] == 1]
    int_quality = (interceptions
        .groupby('defteam')
        .agg(
            total_ints=('interception', 'count'),
            avg_air_yards=('air_yards', 'mean'),  # Were they deep or shallow?
            epa_swing=('epa', 'mean'),  # Impact of the INT
            return_yards=('return_yards', 'mean') if 'return_yards' in interceptions.columns else ('interception', 'count')
        )
    )
    return int_quality

int_qual = interception_quality(pbp)
print("Interception Quality:")
print(int_qual.head(10).round(2).to_string())
Opponent Adjustment
Why Opponent Adjustment Matters
Raw defensive stats are heavily influenced by opponent quality:
- A defense facing Kansas City will allow more EPA than one facing a rebuilding team
- Schedule strength varies significantly
DVOA-Style Adjustment
def opponent_adjusted_defense(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Adjust defensive metrics for opponent quality.
    """
    plays = pbp[pbp['play_type'].isin(['pass', 'run'])]

    # Calculate offensive quality (EPA per play)
    off_quality = plays.groupby('posteam')['epa'].mean()

    # Merge opponent quality into plays
    plays = plays.merge(
        off_quality.rename('opp_offense_quality'),
        left_on='posteam',
        right_index=True
    )

    # Calculate raw and adjusted defensive EPA
    raw_def = plays.groupby('defteam')['epa'].mean()

    # Expected EPA based on opponents faced
    expected_epa = plays.groupby('defteam')['opp_offense_quality'].mean()
    adjusted = pd.DataFrame({
        'raw_epa_allowed': raw_def,
        'expected_epa': expected_epa,
        'adjusted_epa': raw_def - expected_epa
    })

    # Negative adjusted EPA = better than expected
    adjusted['rank_raw'] = adjusted['raw_epa_allowed'].rank()
    adjusted['rank_adjusted'] = adjusted['adjusted_epa'].rank()
    adjusted['rank_change'] = adjusted['rank_raw'] - adjusted['rank_adjusted']
    return adjusted.sort_values('adjusted_epa')

adj_defense = opponent_adjusted_defense(pbp)
print("Opponent-Adjusted Defense:")
print(adj_defense.head(10).round(3).to_string())
print("\nBiggest rank improvers after adjustment:")
print(adj_defense.nlargest(5, 'rank_change')[['raw_epa_allowed', 'adjusted_epa', 'rank_change']].round(3).to_string())
Individual Defensive Evaluation
The Limitation of Public Data
Individual defensive metrics from standard PBP data are extremely limited:
- No assignment data (who was supposed to cover whom)
- No tracking data (positioning, speed, angles)
- Limited to counting stats (tackles, sacks)
For true individual evaluation, services like PFF or SIS that chart every play are necessary.
What We Can Analyze
Tackles (available in some data):
from typing import Optional

def tackle_analysis(pbp: pd.DataFrame) -> Optional[pd.DataFrame]:
    """
    Analyze tackle data if available.
    Note: Tackle stats don't indicate defensive quality well.
    Returns None when the tackle-for-loss column is missing.
    """
    # Tackle data often requires additional data sources
    # This is a framework assuming tackle data is available
    # (nflfastR-style play-by-play names the flag 'tackled_for_loss')
    if 'tackled_for_loss' in pbp.columns:
        tfl_analysis = (pbp
            .groupby('defteam')
            .agg(
                plays=('play_id', 'count'),
                tfls=('tackled_for_loss', 'sum')
            )
        )
        tfl_analysis['tfl_rate'] = tfl_analysis['tfls'] / tfl_analysis['plays']
        return tfl_analysis
    return None
Pass Breakups and Interceptions by Position
If position data is available:
def secondary_ballhawk_stats(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze ball-hawking ability of secondary.
    Requires defender identification in data.
    """
    # Framework for when defender data is available
    passes = pbp[(pbp['pass_attempt'] == 1) & (pbp['sack'] == 0)]
    team_ballhawks = (passes
        .groupby('defteam')
        .agg(
            targets_faced=('pass_attempt', 'count'),
            interceptions=('interception', 'sum'),
            # Falls back to interceptions when no pass_defended column exists
            pass_defended=('pass_defended', 'sum') if 'pass_defended' in passes.columns else ('interception', 'sum')
        )
    )
    team_ballhawks['int_rate'] = team_ballhawks['interceptions'] / team_ballhawks['targets_faced']
    return team_ballhawks
Building a Defensive Evaluation System
Comprehensive Defensive Evaluator
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DefensiveReport:
    """Complete defensive evaluation report."""
    team: str
    season: int
    # Overall
    epa_per_play: float
    success_rate_allowed: float
    overall_rank: int
    # Pass defense
    pass_epa_allowed: float
    sack_rate: float
    comp_pct_allowed: float
    yards_per_target: float
    pass_rank: int
    # Run defense
    run_epa_allowed: float
    ypc_allowed: float
    stuff_rate: float
    run_rank: int
    # Situational
    third_down_conv_allowed: float
    red_zone_td_rate: float
    # Turnovers
    turnover_rate: float
    int_rate: float
    # Assessment
    strengths: List[str]
    weaknesses: List[str]
class DefensiveEvaluator:
    """Comprehensive defensive evaluation system."""

    def __init__(self, pbp: pd.DataFrame, season: int = 2023):
        self.pbp = pbp
        self.season = season
        self.plays = pbp[pbp['play_type'].isin(['pass', 'run'])].copy()
        self.passes = pbp[pbp['pass_attempt'] == 1].copy()
        self.rushes = pbp[pbp['rush_attempt'] == 1].copy()
        self._calculate_league_averages()

    def _calculate_league_averages(self):
        """Calculate league averages for comparison."""
        self.league_epa = self.plays['epa'].mean()
        self.league_pass_epa = self.passes['epa'].mean()
        self.league_run_epa = self.rushes['epa'].mean()
        self.league_sack_rate = self.passes['sack'].mean()
        self.league_ypc = self.rushes['yards_gained'].mean()
    def evaluate_team(self, team: str) -> DefensiveReport:
        """Generate comprehensive defensive evaluation."""
        team_plays = self.plays[self.plays['defteam'] == team]
        team_passes = self.passes[self.passes['defteam'] == team]
        team_rushes = self.rushes[self.rushes['defteam'] == team]

        # Overall metrics
        epa_per_play = team_plays['epa'].mean()
        success_allowed = (team_plays['epa'] > 0).mean()

        # Pass defense
        pass_epa = team_passes['epa'].mean()
        sack_rate = team_passes['sack'].mean()
        non_sacks = team_passes[team_passes['sack'] == 0]
        comp_pct = non_sacks['complete_pass'].mean() if len(non_sacks) > 0 else 0
        yards_per_target = non_sacks['yards_gained'].mean() if len(non_sacks) > 0 else 0

        # Run defense
        run_epa = team_rushes['epa'].mean()
        ypc = team_rushes['yards_gained'].mean()
        stuff_rate = (team_rushes['yards_gained'] <= 0).mean()

        # Situational
        third_downs = team_plays[team_plays['down'] == 3]
        third_conv = third_downs['third_down_converted'].mean() if len(third_downs) > 0 else 0
        red_zone = team_plays[team_plays['yardline_100'] <= 20]
        rz_td_rate = red_zone['touchdown'].mean() if len(red_zone) > 0 else 0

        # Turnovers (compare to 1 first: the raw columns are floats,
        # and the | operator is not defined for float Series)
        turnover_rate = (
            (team_plays['interception'] == 1) | (team_plays['fumble_lost'] == 1)
        ).mean()
        int_rate = team_passes['interception'].mean()

        # Rankings
        all_teams_epa = self.plays.groupby('defteam')['epa'].mean()
        overall_rank = all_teams_epa.rank().loc[team]
        all_pass_epa = self.passes.groupby('defteam')['epa'].mean()
        pass_rank = all_pass_epa.rank().loc[team]
        all_run_epa = self.rushes.groupby('defteam')['epa'].mean()
        run_rank = all_run_epa.rank().loc[team]

        # Identify strengths and weaknesses.
        # EPA averages can sit near zero or be negative, so a multiplicative
        # threshold (league * 0.8) misbehaves; use an absolute margin instead.
        # The 0.05 EPA/play margin is an illustrative choice.
        margin = 0.05
        strengths = []
        weaknesses = []
        if pass_epa < self.league_pass_epa - margin:
            strengths.append("Elite pass defense")
        elif pass_epa > self.league_pass_epa + margin:
            weaknesses.append("Poor pass defense")
        if run_epa < self.league_run_epa - margin:
            strengths.append("Elite run defense")
        elif run_epa > self.league_run_epa + margin:
            weaknesses.append("Poor run defense")
        if sack_rate > self.league_sack_rate * 1.3:
            strengths.append("Strong pass rush")
        elif sack_rate < self.league_sack_rate * 0.7:
            weaknesses.append("Weak pass rush")
        if stuff_rate > 0.22:
            strengths.append("Excellent run stuffing")
        elif stuff_rate < 0.16:
            weaknesses.append("Weak at line of scrimmage")

        return DefensiveReport(
            team=team,
            season=self.season,
            epa_per_play=epa_per_play,
            success_rate_allowed=success_allowed,
            overall_rank=int(overall_rank),
            pass_epa_allowed=pass_epa,
            sack_rate=sack_rate,
            comp_pct_allowed=comp_pct,
            yards_per_target=yards_per_target,
            pass_rank=int(pass_rank),
            run_epa_allowed=run_epa,
            ypc_allowed=ypc,
            stuff_rate=stuff_rate,
            run_rank=int(run_rank),
            third_down_conv_allowed=third_conv,
            red_zone_td_rate=rz_td_rate,
            turnover_rate=turnover_rate,
            int_rate=int_rate,
            strengths=strengths,
            weaknesses=weaknesses
        )
    def generate_report(self, team: str) -> str:
        """Generate text report for team defense."""
        r = self.evaluate_team(team)
        lines = [
            f"\n{'='*60}",
            f"DEFENSIVE EVALUATION: {team}",
            f"Season: {self.season}",
            f"{'='*60}",
            "",
            f"OVERALL: Rank #{r.overall_rank} of 32",
            f" EPA/Play Allowed: {r.epa_per_play:+.3f} (League: {self.league_epa:+.3f})",
            f" Success Rate Allowed: {r.success_rate_allowed:.1%}",
            "",
            f"PASS DEFENSE: Rank #{r.pass_rank}",
            f" Pass EPA Allowed: {r.pass_epa_allowed:+.3f}",
            f" Sack Rate: {r.sack_rate:.1%}",
            f" Comp % Allowed: {r.comp_pct_allowed:.1%}",
            f" Yards/Target: {r.yards_per_target:.1f}",
            "",
            f"RUN DEFENSE: Rank #{r.run_rank}",
            f" Run EPA Allowed: {r.run_epa_allowed:+.3f}",
            f" YPC Allowed: {r.ypc_allowed:.2f}",
            f" Stuff Rate: {r.stuff_rate:.1%}",
            "",
            "SITUATIONAL",
            f" 3rd Down Conv: {r.third_down_conv_allowed:.1%}",
            f" Red Zone TD Rate: {r.red_zone_td_rate:.1%}",
            "",
            "TURNOVERS",
            f" Turnover Rate: {r.turnover_rate:.2%}",
            f" INT Rate: {r.int_rate:.2%}",
            "",
            "ASSESSMENT",
            f" Strengths: {', '.join(r.strengths) if r.strengths else 'None identified'}",
            f" Weaknesses: {', '.join(r.weaknesses) if r.weaknesses else 'None identified'}",
            f"{'='*60}"
        ]
        return "\n".join(lines)
    def rank_all_teams(self) -> pd.DataFrame:
        """Rank all teams by defensive performance."""
        results = []
        for team in sorted(self.plays['defteam'].unique()):
            try:
                report = self.evaluate_team(team)
                results.append({
                    'team': team,
                    'epa_allowed': report.epa_per_play,
                    'overall_rank': report.overall_rank,
                    'pass_epa': report.pass_epa_allowed,
                    'pass_rank': report.pass_rank,
                    'run_epa': report.run_epa_allowed,
                    'run_rank': report.run_rank,
                    'sack_rate': report.sack_rate,
                    'stuff_rate': report.stuff_rate,
                    'turnover_rate': report.turnover_rate
                })
            except (KeyError, ValueError):
                # Skip teams with missing data (no rank entry or NaN rank)
                continue
        return pd.DataFrame(results).sort_values('overall_rank')
Common Pitfalls in Defensive Analysis
1. Over-relying on Turnovers
Turnovers are high-variance and largely random:
def turnover_stability_test(years: list) -> float:
    """
    Test year-to-year turnover correlation.
    Spoiler: It's low.
    """
    pbp = nfl.import_pbp_data(years)
    yearly_turnovers = {}
    for year in years:
        year_plays = pbp[(pbp['season'] == year) & (pbp['play_type'].isin(['pass', 'run']))]
        turnovers = year_plays.groupby('defteam').apply(
            lambda x: ((x['interception'] == 1) | (x['fumble_lost'] == 1)).mean()
        )
        yearly_turnovers[year] = turnovers

    # Correlate consecutive years
    correlations = []
    for i in range(len(years) - 1):
        common = yearly_turnovers[years[i]].index.intersection(yearly_turnovers[years[i+1]].index)
        corr = yearly_turnovers[years[i]][common].corr(yearly_turnovers[years[i+1]][common])
        correlations.append(corr)
    return np.mean(correlations)

# Typically r < 0.30 - turnovers don't persist well
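One practical consequence of a low year-to-year correlation is that a projection should pull an observed turnover rate most of the way back toward the league average. The sketch below uses the correlation itself as the shrinkage weight, which is a common rule of thumb rather than a method taken from this chapter; the function name and all input values are hypothetical.

```python
def regress_to_mean(observed_rate: float, league_rate: float, r: float) -> float:
    """Shrink an observed rate toward the league average.

    r is the year-to-year correlation, used here as the weight on the
    observed deviation (an assumed rule of thumb, not a fitted model).
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    return league_rate + r * (observed_rate - league_rate)

# Hypothetical inputs: a defense forcing turnovers on 3.5% of plays,
# a 2.5% league average, and r = 0.30
projected = regress_to_mean(0.035, 0.025, 0.30)
print(f"Projected turnover rate: {projected:.2%}")
```

With these inputs only 30% of the gap to the league average is carried forward, so most of the apparent edge is treated as luck.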
2. Confusing Correlation with Causation
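A defense that faces many passes may appear to have a better run defense in raw yardage simply because its team is ahead and opponents have abandoned the run. The low rushing total reflects how few carries it faced, not how well it stopped them.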
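A toy comparison shows how volume can masquerade as quality. The two defenses and all of their numbers are invented for illustration.

```python
import pandas as pd

# Invented single-game numbers for two hypothetical defenses
toy = pd.DataFrame({
    'defteam': ['X', 'Y'],
    'carries_faced': [15, 30],
    'rush_yards_allowed': [75, 120],
})
toy['ypc_allowed'] = toy['rush_yards_allowed'] / toy['carries_faced']

# Rank the same two defenses by volume and by per-carry efficiency
best_by_volume = toy.loc[toy['rush_yards_allowed'].idxmin(), 'defteam']
best_by_efficiency = toy.loc[toy['ypc_allowed'].idxmin(), 'defteam']

print(toy.to_string(index=False))
print(f"Fewest rushing yards allowed: {best_by_volume}")
print(f"Lowest yards per carry allowed: {best_by_efficiency}")
```

Defense X looks better on total yards only because it faced half as many carries; per carry, Defense Y was the stronger unit.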
3. Ignoring Game Script
Defenses that build leads face different offensive strategies:
def game_script_adjusted_defense(pbp: pd.DataFrame) -> pd.DataFrame:
    """
    Analyze defense in neutral game scripts only.
    """
    neutral = pbp[
        (pbp['play_type'].isin(['pass', 'run'])) &
        (abs(pbp['score_differential']) <= 7)
    ]
    neutral_def = (neutral
        .groupby('defteam')
        .agg(
            plays=('epa', 'count'),
            epa_allowed=('epa', 'mean'),
            pass_rate_faced=('pass_attempt', 'mean')
        )
    )
    return neutral_def
4. Small Sample Individual Stats
Individual defensive metrics require large samples to stabilize. A corner who allows 3 TDs on 30 targets isn't necessarily bad - sample size is too small.
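One way to quantify "too small" is a confidence interval on the rate. The sketch below applies the Wilson score interval to the 3-TDs-on-30-targets example; the function name is hypothetical, and the choice of a 95% interval (z = 1.96) is an assumption.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# The corner from the example: 3 TDs allowed on 30 targets
low, high = wilson_interval(3, 30)
print(f"TD rate allowed: 10.0% (95% interval {low:.1%} to {high:.1%})")
```

An interval this wide is consistent with both a very good and a very poor coverage player, which is the practical meaning of an unstable sample.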
Summary
Key Concepts
- Defensive EPA Allowed is the primary team-level metric
- Pass/Run splits reveal defensive strengths and weaknesses
- Pass rush and coverage interact - neither exists independently
- Turnovers are largely random - don't overweight them
- Individual attribution requires film - PBP data is insufficient
Metric Hierarchy
| Metric | What It Measures | Data Needed |
|---|---|---|
| EPA Allowed | Overall quality | Standard PBP |
| Success Rate | Consistency | Standard PBP |
| Sack Rate | Pass rush | Standard PBP |
| Coverage stats | Individual coverage | Film charting |
| Win rate | Individual quality | Tracking data |
Best Practices
- Use EPA over traditional stats (yards, points)
- Adjust for opponent quality
- Split by pass/run
- Consider game script
- Respect sample size limitations
- Use film services for individual evaluation
Preview: Chapter 11
Next, we'll explore Special Teams Analytics - the often-neglected third phase that can swing close games. We'll examine kicking, punting, and return game evaluation using EPA frameworks.