Case Study: Why Did the Offense Struggle?
"The numbers never lie, but they don't always tell the whole story at first glance."
Executive Summary
In this case study, you'll use exploratory data analysis to diagnose why a team's offense underperformed expectations. You'll work through a systematic EDA process to identify root causes, moving from high-level symptoms to specific actionable insights.
Skills Applied:
- Systematic EDA workflow
- Comparative analysis
- Split analysis
- Visual storytelling
- Insight communication
The Scenario
The Jacksonville Jaguars finished the 2023 season with an offense that appeared mediocre despite having what many considered talented personnel. The coaching staff wants to understand:
- Where specifically did the offense underperform?
- What patterns explain the underperformance?
- What should be addressed in the offseason?
Your task is to conduct a thorough EDA to answer these questions.
Part 1: Establishing the Baseline
Step 1.1: Load and Prepare Data
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import nfl_data_py as nfl

# Load 2023 play-by-play data
pbp = nfl.import_pbp_data([2023])

# Filter to offensive plays
plays = pbp.query("play_type.isin(['pass', 'run'])").copy()

# Target team
TEAM = 'JAX'
```
Step 1.2: High-Level Performance
First, establish where Jacksonville ranks among all teams:
```python
def calculate_team_rankings(plays: pd.DataFrame) -> pd.DataFrame:
    """Calculate comprehensive team offensive rankings."""
    team_stats = (
        plays
        .groupby('posteam')
        .agg(
            plays=('play_id', 'count'),
            epa_per_play=('epa', 'mean'),
            success_rate=('success', 'mean'),
            pass_rate=('pass', 'mean'),
            # Explosive play: a pass of 20+ yards or a run of 10+ yards
            explosive_rate=('yards_gained', lambda x: (
                ((plays.loc[x.index, 'pass'] == 1) & (x >= 20)) |
                ((plays.loc[x.index, 'rush'] == 1) & (x >= 10))
            ).mean()),
            # Interceptions plus lost fumbles
            turnover_plays=('interception', lambda x: (
                x.sum() + plays.loc[x.index, 'fumble_lost'].sum()
            ))
        )
        .reset_index()
    )

    # Add rankings (1 = best)
    team_stats['epa_rank'] = team_stats['epa_per_play'].rank(ascending=False)
    team_stats['success_rank'] = team_stats['success_rate'].rank(ascending=False)
    return team_stats


team_rankings = calculate_team_rankings(plays)
jax_rank = team_rankings.query("posteam == 'JAX'").iloc[0]

print("Jacksonville 2023 Offensive Rankings:")
print(f"  EPA per Play: {jax_rank['epa_per_play']:.3f} (Rank: {int(jax_rank['epa_rank'])})")
print(f"  Success Rate: {jax_rank['success_rate']:.1%} (Rank: {int(jax_rank['success_rank'])})")
print(f"  Pass Rate: {jax_rank['pass_rate']:.1%}")
```
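The lambdas in `calculate_team_rankings` reach back into the full frame through `plays.loc[x.index, ...]`. The sketch below takes a different route. It precomputes per-play flags, so every aggregation becomes a plain column reduction.

- It assumes the same nflfastR-style columns (`pass`, `rush`, `yards_gained`, `interception`, `fumble_lost`).
- The toy rows are invented.
- The helper name `add_play_flags` is hypothetical.
- It counts a play with both an interception and a lost fumble once, rather than twice.

```python
import pandas as pd

# Toy play-by-play rows (invented values) using nflfastR-style column names
toy = pd.DataFrame({
    'posteam':      ['JAX', 'JAX', 'JAX', 'KC', 'KC', 'KC'],
    'pass':         [1, 0, 1, 1, 0, 0],
    'rush':         [0, 1, 0, 0, 1, 1],
    'yards_gained': [25, 12, 5, 8, 3, 15],
    'interception': [0, 0, 1, 0, 0, 0],
    'fumble_lost':  [0, 0, 0, 0, 1, 0],
})


def add_play_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-play explosive and turnover indicator columns (hypothetical helper)."""
    # Explosive: pass of 20+ yards or run of 10+ yards
    explosive = (
        ((df['pass'] == 1) & (df['yards_gained'] >= 20))
        | ((df['rush'] == 1) & (df['yards_gained'] >= 10))
    )
    # Turnover play: interception or lost fumble (counted once per play)
    turnover = (df['interception'] == 1) | (df['fumble_lost'] == 1)
    return df.assign(explosive=explosive.astype(int), turnover=turnover.astype(int))


summary = (
    add_play_flags(toy)
    .groupby('posteam')
    .agg(explosive_rate=('explosive', 'mean'), turnover_plays=('turnover', 'sum'))
)
print(summary)
```

With real data, `add_play_flags(plays)` can feed the same `groupby('posteam')` used above.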
Step 1.3: Visualize League Context
```python
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# EPA distribution with JAX highlighted
teams = team_rankings.sort_values('epa_per_play', ascending=True)
colors = ['#006778' if t == 'JAX' else 'gray' for t in teams['posteam']]
axes[0].barh(teams['posteam'], teams['epa_per_play'], color=colors)
axes[0].axvline(0, color='black', linestyle='-', linewidth=0.5)
axes[0].set_xlabel('EPA per Play')
axes[0].set_title('2023 Offensive EPA Rankings')

# EPA vs Success Rate scatter
axes[1].scatter(
    team_rankings['epa_per_play'],
    team_rankings['success_rate'],
    c='gray', s=100, alpha=0.6
)

# Highlight Jacksonville
jax_data = team_rankings.query("posteam == 'JAX'")
axes[1].scatter(
    jax_data['epa_per_play'],
    jax_data['success_rate'],
    c='#006778', s=150, label='JAX', zorder=5
)
axes[1].annotate('JAX', (jax_data['epa_per_play'].values[0],
                         jax_data['success_rate'].values[0]),
                 xytext=(5, 5), textcoords='offset points')

# Add quadrant lines at the league medians
axes[1].axhline(team_rankings['success_rate'].median(), color='gray', linestyle='--', alpha=0.5)
axes[1].axvline(team_rankings['epa_per_play'].median(), color='gray', linestyle='--', alpha=0.5)
axes[1].set_xlabel('EPA per Play')
axes[1].set_ylabel('Success Rate')
axes[1].set_title('EPA vs Success Rate by Team')

plt.tight_layout()
plt.savefig('jax_league_context.png', dpi=150, bbox_inches='tight')
```
Observation: Jacksonville ranked [X] in EPA per play, below expectations for their talent level. The next step is to understand why.
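A rank alone does not show how far a team sits from the pack. For example, 20th and 25th can be nearly tied or far apart. This sketch expresses one team's value as a league z-score and percentile.

- It assumes a frame shaped like `team_rankings`, with `posteam` and `epa_per_play` columns.
- The team codes and values are invented.
- The helper `league_position` is hypothetical.

```python
import pandas as pd

# Invented team-level EPA per play values
team_stats = pd.DataFrame({
    'posteam': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
    'epa_per_play': [0.10, 0.05, 0.00, -0.05, -0.10],
})


def league_position(stats: pd.DataFrame, team: str, col: str = 'epa_per_play') -> dict:
    """Express one team's value as a z-score and percentile within the league."""
    values = stats[col]
    team_value = stats.loc[stats['posteam'] == team, col].iloc[0]
    z = (team_value - values.mean()) / values.std()  # sample std (ddof=1)
    pct = (values < team_value).mean()              # share of teams strictly below
    return {'value': team_value, 'z_score': z, 'percentile': pct}


print(league_position(team_stats, 'DDD'))
```

On the real data, `league_position(team_rankings, 'JAX')` would give the magnitude behind the [X] rank.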
Part 2: Breaking Down Performance
Step 2.1: Pass vs Run Split
```python
def analyze_pass_run_split(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Compare pass and run efficiency for a team vs league."""
    results = []
    # `pass` is a Python keyword, so a query string like "pass == 1" fails to
    # parse; boolean masks on the indicator columns avoid the problem.
    for play_type, flag_col in [('pass', 'pass'), ('run', 'rush')]:
        # League stats
        league_plays = plays[plays[flag_col] == 1]
        league_epa = league_plays['epa'].mean()
        league_success = league_plays['success'].mean()

        # Team stats
        team_plays = league_plays[league_plays['posteam'] == team]
        team_epa = team_plays['epa'].mean()
        team_success = team_plays['success'].mean()

        results.append({
            'play_type': play_type,
            'team_epa': team_epa,
            'league_epa': league_epa,
            'epa_vs_avg': team_epa - league_epa,
            'team_success': team_success,
            'league_success': league_success,
            'plays': len(team_plays)
        })
    return pd.DataFrame(results)


pass_run = analyze_pass_run_split(plays, 'JAX')
print(pass_run.to_string(index=False))
```
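The `epa_vs_avg` column is a point estimate. EPA is noisy play to play, so a gap measured over a few hundred plays can partly be chance. A minimal percentile-bootstrap sketch puts rough bounds on the gap.

- It resamples the team's plays with replacement.
- It treats the league mean as fixed.
- The EPA values are simulated, not real.
- The function name `bootstrap_gap_ci` is hypothetical.

```python
import numpy as np


def bootstrap_gap_ci(team_epa, league_mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for (team mean EPA - league mean EPA)."""
    rng = np.random.default_rng(seed)
    team_epa = np.asarray(team_epa, dtype=float)
    team_epa = team_epa[~np.isnan(team_epa)]  # drop plays with missing EPA
    # Resample the team's plays with replacement, n_boot times
    idx = rng.integers(0, len(team_epa), size=(n_boot, len(team_epa)))
    boot_means = team_epa[idx].mean(axis=1)
    gaps = boot_means - league_mean
    lo, hi = np.quantile(gaps, [alpha / 2, 1 - alpha / 2])
    return team_epa.mean() - league_mean, lo, hi


# Simulated EPA for 500 team pass plays (invented distribution)
sim = np.random.default_rng(1).normal(loc=-0.02, scale=1.3, size=500)
gap, lo, hi = bootstrap_gap_ci(sim, league_mean=0.05)
print(f"gap={gap:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

An interval that comfortably excludes zero is stronger evidence of a real pass or run problem than the point gap alone. To apply it, pass the team's pass-play EPA, e.g. `plays.loc[(plays['pass'] == 1) & (plays['posteam'] == 'JAX'), 'epa']`, together with the league mean from `pass_run`.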
Step 2.2: Down-by-Down Analysis
```python
def analyze_by_down(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Analyze efficiency by down."""
    team_plays = plays.query(f"posteam == '{team}'")
    league_plays = plays

    results = []
    for down in [1, 2, 3, 4]:
        team_down = team_plays.query(f"down == {down}")
        league_down = league_plays.query(f"down == {down}")

        results.append({
            'down': down,
            'team_epa': team_down['epa'].mean(),
            'league_epa': league_down['epa'].mean(),
            'team_success': team_down['success'].mean(),
            'league_success': league_down['success'].mean(),
            'team_pass_rate': team_down['pass'].mean(),
            'plays': len(team_down)
        })
    return pd.DataFrame(results)


by_down = analyze_by_down(plays, 'JAX')
print(by_down.to_string(index=False))
```
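The loop in `analyze_by_down` runs two queries per down. A sketch of an equivalent approach uses one `groupby` for the team and one for the league, then joins the results on `down`.

- It covers only the EPA and success columns.
- It assumes the same `posteam`, `down`, `epa`, and `success` columns.
- The toy rows are invented.
- The helper name `by_down_gap` is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    'posteam': ['JAX', 'JAX', 'JAX', 'KC', 'KC', 'KC'],
    'down':    [1, 1, 3, 1, 3, 3],
    'epa':     [0.4, -0.2, -0.6, 0.2, 0.5, -0.1],
    'success': [1, 0, 0, 1, 1, 0],
})


def by_down_gap(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Team vs league EPA and success rate by down, plus the EPA gap."""
    metrics = {'epa': 'mean', 'success': 'mean'}
    league = plays.groupby('down').agg(metrics).add_prefix('league_')
    team_df = (
        plays[plays['posteam'] == team]
        .groupby('down')
        .agg(metrics)
        .add_prefix('team_')
    )
    # Keep only downs the team actually played, matched to league averages
    out = team_df.join(league, how='left')
    out['epa_gap'] = out['team_epa'] - out['league_epa']
    return out.reset_index()


print(by_down_gap(toy, 'JAX'))
```

Sorting the real output by `epa_gap` points directly at the downs with the largest shortfall.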
Step 2.3: Visualize Down Performance
```python
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

x = by_down['down']
width = 0.35

# EPA by down
axes[0].bar(x - width/2, by_down['team_epa'], width, label='JAX', color='#006778')
axes[0].bar(x + width/2, by_down['league_epa'], width, label='League', color='gray')
axes[0].axhline(0, color='black', linewidth=0.5)
axes[0].set_xlabel('Down')
axes[0].set_ylabel('EPA per Play')
axes[0].set_title('EPA by Down: JAX vs League')
axes[0].legend()
axes[0].set_xticks([1, 2, 3, 4])

# Success rate by down
axes[1].bar(x - width/2, by_down['team_success'], width, label='JAX', color='#006778')
axes[1].bar(x + width/2, by_down['league_success'], width, label='League', color='gray')
axes[1].set_xlabel('Down')
axes[1].set_ylabel('Success Rate')
axes[1].set_title('Success Rate by Down: JAX vs League')
axes[1].legend()
axes[1].set_xticks([1, 2, 3, 4])

plt.tight_layout()
plt.savefig('jax_by_down.png', dpi=150, bbox_inches='tight')
```
Key Finding: [Identify which downs show the biggest gaps]
Part 3: Situational Deep Dives
Step 3.1: Red Zone Analysis
```python
def analyze_red_zone(plays: pd.DataFrame, team: str) -> dict:
    """Analyze red zone performance (snaps inside the opponent's 20)."""
    rz = plays.query("yardline_100 <= 20")
    team_rz = rz.query(f"posteam == '{team}'")

    return {
        'rz_plays': len(team_rz),
        'rz_td_rate': team_rz['touchdown'].mean(),  # touchdowns per red-zone play
        'rz_epa': team_rz['epa'].mean(),
        'rz_success': team_rz['success'].mean(),
        'rz_pass_rate': team_rz['pass'].mean(),
        'league_td_rate': rz['touchdown'].mean(),
        'league_rz_epa': rz['epa'].mean()
    }


rz_stats = analyze_red_zone(plays, 'JAX')
print("Red Zone Analysis:")
for key, value in rz_stats.items():
    print(f"  {key}: {value:.3f}" if isinstance(value, float) else f"  {key}: {value}")
```
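Because `rz_td_rate` averages `touchdown` over red-zone snaps, it is a per-play rate. The red-zone TD percentage usually quoted is per trip, so the two are not directly comparable. A sketch of the per-trip version:

- It treats a trip as a unique `(game_id, drive)` pair with at least one snap inside the 20. This assumes `drive` numbers drives uniquely within a game.
- With real data, you may also want to confirm that the scoring team was the offense (nflfastR carries a `td_team` column for this).
- The toy rows are invented.
- The helper name `rz_trip_td_rate` is hypothetical.

```python
import pandas as pd

# Toy red-zone snaps (invented): three JAX trips across two games
rz = pd.DataFrame({
    'game_id':      ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'drive':        [3, 3, 3, 7, 2, 2],
    'posteam':      ['JAX'] * 6,
    'yardline_100': [18, 12, 4, 15, 9, 3],
    'touchdown':    [0, 0, 1, 0, 0, 0],
})


def rz_trip_td_rate(plays: pd.DataFrame, team: str) -> tuple:
    """Return (number of red-zone trips, share of trips that produced a touchdown)."""
    team_rz = plays[(plays['posteam'] == team) & (plays['yardline_100'] <= 20)]
    # One row per trip: 1 if any snap on that drive scored a touchdown
    trips = team_rz.groupby(['game_id', 'drive'])['touchdown'].max()
    return len(trips), trips.mean()


n_trips, td_rate = rz_trip_td_rate(rz, 'JAX')
print(n_trips, td_rate)
```

In the toy data, one touchdown in six snaps is 1/6 per play but 1/3 per trip. The gap between the two definitions matters when comparing against published red-zone numbers.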
Step 3.2: Third Down Efficiency
```python
def analyze_third_down(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Detailed third down analysis by distance."""
    third = plays.query("down == 3")

    # Bin by yards to go: (min, max, label), inclusive on both ends
    bins = [(1, 3, 'short'), (4, 6, 'medium'), (7, 10, 'long'), (11, 99, 'very_long')]

    results = []
    for min_d, max_d, label in bins:
        team_plays = third.query(f"posteam == '{team}' and {min_d} <= ydstogo <= {max_d}")
        league_plays = third.query(f"{min_d} <= ydstogo <= {max_d}")

        results.append({
            'distance': label,
            'team_conv_rate': team_plays['first_down'].mean(),
            'league_conv_rate': league_plays['first_down'].mean(),
            'team_epa': team_plays['epa'].mean(),
            'plays': len(team_plays)
        })
    return pd.DataFrame(results)


third_down = analyze_third_down(plays, 'JAX')
print(third_down.to_string(index=False))
```
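Each third-down distance bin may hold only a few dozen team snaps, so a conversion rate can swing several points on a handful of plays. One standard way to show that uncertainty is the Wilson score interval for a binomial proportion.

- This is a sketch.
- The function name `wilson_interval` is hypothetical.
- The example counts are invented.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (e.g. third-down conversions)."""
    if n == 0:
        return (float('nan'), float('nan'))
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - half, center + half)


# Example: 12 conversions on 30 third-and-long attempts (invented numbers)
lo, hi = wilson_interval(12, 30)
print(f"40.0% conversion, 95% CI {lo:.1%} to {hi:.1%}")
```

To apply it to a row of `third_down`, use `successes = team_plays['first_down'].sum()` and `n = len(team_plays)` inside `analyze_third_down`. If the interval is wide enough to contain the league rate, that bin alone is weak evidence of a problem.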
Step 3.3: Late and Close Situations
```python
def analyze_clutch(plays: pd.DataFrame, team: str) -> dict:
    """Analyze performance in close, late-game situations."""
    # Close game in 4th quarter
    clutch = plays.query(
        "qtr == 4 and abs(score_differential) <= 8"
    )
    team_clutch = clutch.query(f"posteam == '{team}'")

    return {
        'clutch_plays': len(team_clutch),
        'clutch_epa': team_clutch['epa'].mean(),
        'clutch_success': team_clutch['success'].mean(),
        'league_clutch_epa': clutch['epa'].mean(),
        'overall_epa': plays.query(f"posteam == '{team}'")['epa'].mean()
    }


clutch_stats = analyze_clutch(plays, 'JAX')
print("Clutch Situations (4Q, within 8 points):")
for key, value in clutch_stats.items():
    print(f"  {key}: {value:.3f}" if isinstance(value, float) else f"  {key}: {value}")
```
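A team's clutch EPA is easier to judge next to the league-wide pattern, since every offense's efficiency may shift in late, close games. A sketch:

- It computes clutch EPA per play minus overall EPA per play for every team.
- It reuses the same 4th-quarter, within-8-points definition.
- The toy rows are invented.
- The helper `clutch_gap_by_team` is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    'posteam': ['JAX'] * 4 + ['KC'] * 4,
    'qtr': [1, 4, 4, 2, 1, 4, 4, 3],
    'score_differential': [0, -3, 14, 7, 3, 7, -2, 0],
    'epa': [0.2, -0.5, 0.3, 0.0, 0.1, 0.4, 0.2, -0.1],
})


def clutch_gap_by_team(plays: pd.DataFrame) -> pd.Series:
    """Clutch (4Q, within 8) EPA per play minus overall EPA per play, per team."""
    is_clutch = (plays['qtr'] == 4) & (plays['score_differential'].abs() <= 8)
    overall = plays.groupby('posteam')['epa'].mean()
    clutch = plays[is_clutch].groupby('posteam')['epa'].mean()
    # Teams with no clutch snaps get NaN and sort to the end
    return (clutch - overall).sort_values()


print(clutch_gap_by_team(toy))
```

On the full `plays` frame, JAX's position in this sorted series shows whether its late-game drop-off, if any, is unusual or simply typical.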
Part 4: Player-Level Analysis
Step 4.1: Quarterback Performance
```python
def analyze_qb(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Analyze quarterback performance."""
    # Boolean mask instead of query(): `pass` is a Python keyword, so
    # "pass == 1" cannot be parsed as a query expression
    passes = plays[(plays['pass'] == 1) & (plays['posteam'] == team)]

    qb_stats = (
        passes
        .groupby('passer_player_name')
        .agg(
            dropbacks=('pass', 'count'),
            epa_per_play=('epa', 'mean'),
            cpoe=('cpoe', 'mean'),
            success_rate=('success', 'mean'),
            air_yards_avg=('air_yards', 'mean'),
            yac_avg=('yards_after_catch', 'mean'),
            sack_rate=('sack', 'mean'),
            interceptions=('interception', 'sum')
        )
        .reset_index()
        .query("dropbacks >= 50")
    )
    return qb_stats


jax_qb = analyze_qb(plays, 'JAX')
print(jax_qb.to_string(index=False))
```
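`air_yards_avg` collapses the whole passing game into one number. Splitting attempts by depth of target can show where efficiency falls off. A sketch:

- It buckets `air_yards` with `pd.cut`.
- The bucket boundaries (0, 9, 19 yards) are an illustrative choice, not a standard.
- The toy rows are invented.
- The helper `epa_by_depth` is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    'air_yards': [-2, 3, 8, 12, 18, 25, 40, None],
    'epa':       [-0.3, 0.1, 0.4, 0.6, -0.8, 1.5, -0.9, -1.2],
})


def epa_by_depth(passes: pd.DataFrame) -> pd.DataFrame:
    """Attempts and EPA per attempt by depth-of-target bucket."""
    depth = pd.cut(
        passes['air_yards'],
        bins=[-float('inf'), 0, 9, 19, float('inf')],  # right-inclusive bins
        labels=['at_or_behind_los', 'short', 'intermediate', 'deep'],
    )
    return (
        passes.assign(depth=depth)
        .dropna(subset=['depth'])  # plays without air yards (e.g. sacks) are dropped
        .groupby('depth', observed=False)
        .agg(attempts=('epa', 'size'), epa_per_att=('epa', 'mean'))
    )


print(epa_by_depth(toy))
```

Running `epa_by_depth` on the team's pass plays, and again on the league's, gives a depth-by-depth comparison to set beside the QB table.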
Step 4.2: Receiver Usage
```python
def analyze_receivers(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Analyze receiver efficiency and usage."""
    # Boolean mask instead of query(): `pass` is a Python keyword
    team_passes = plays[(plays['pass'] == 1) & (plays['posteam'] == team)]

    receiver_stats = (
        team_passes
        .query("receiver_player_name.notna()")
        .groupby('receiver_player_name')
        .agg(
            targets=('pass', 'count'),
            receptions=('complete_pass', 'sum'),
            yards=('yards_gained', 'sum'),
            epa_total=('epa', 'sum'),
            epa_per_target=('epa', 'mean'),
            avg_depth=('air_yards', 'mean'),
            yac_avg=('yards_after_catch', 'mean')
        )
        .reset_index()
        .query("targets >= 20")
        .assign(catch_rate=lambda x: x['receptions'] / x['targets'])
        .sort_values('targets', ascending=False)
    )
    return receiver_stats


jax_receivers = analyze_receivers(plays, 'JAX')
print(jax_receivers.head(10).to_string(index=False))
```
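Raw target counts do not show how concentrated the passing game was. Target share (a receiver's targets divided by all team targets) and a simple concentration index (the sum of squared shares) make that explicit.

- This is a sketch.
- The receiver names are invented.
- The helper `target_shares` is hypothetical.
- It uses every targeted pass as the denominator, not just receivers above the 20-target cutoff.

```python
import pandas as pd

# Toy targeted passes (invented receiver names); None = no recorded target
passes = pd.DataFrame({
    'receiver_player_name': ['A.One', 'A.One', 'B.Two', 'A.One',
                             'C.Three', None, 'B.Two', 'A.One'],
})


def target_shares(passes: pd.DataFrame) -> pd.Series:
    """Each receiver's share of all passes with a named target."""
    targeted = passes['receiver_player_name'].dropna()
    return targeted.value_counts(normalize=True)


shares = target_shares(passes)
# Concentration: 1/k for k equally used receivers, 1.0 if one player gets every target
concentration = (shares ** 2).sum()
print(shares)
print(f"concentration index: {concentration:.3f}")
```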
Step 4.3: Running Back Efficiency
```python
def analyze_rushers(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Analyze rusher efficiency."""
    team_runs = plays.query(f"rush == 1 and posteam == '{team}'")

    rusher_stats = (
        team_runs
        .query("rusher_player_name.notna()")
        .groupby('rusher_player_name')
        .agg(
            carries=('rush', 'count'),
            yards=('yards_gained', 'sum'),
            epa_total=('epa', 'sum'),
            epa_per_carry=('epa', 'mean'),
            success_rate=('success', 'mean'),
            explosive_rate=('yards_gained', lambda x: (x >= 10).mean())
        )
        .reset_index()
        .query("carries >= 20")
        .assign(ypc=lambda x: x['yards'] / x['carries'])
        .sort_values('carries', ascending=False)
    )
    return rusher_stats


jax_rushers = analyze_rushers(plays, 'JAX')
print(jax_rushers.to_string(index=False))
```
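Per-back numbers mix the runner with the blocking in front of him. Splitting carries by run direction is one rough way to see whether the problem is concentrated on one side. A sketch:

- It assumes a `run_location` column with values such as `left`, `middle`, `right`, as found in nflfastR play-by-play. Confirm the column exists in your data.
- The toy rows are invented.
- The helper name `rushing_by_direction` is hypothetical.

```python
import pandas as pd

runs = pd.DataFrame({
    'run_location': ['left', 'left', 'middle', 'middle', 'middle', 'right', None],
    'epa':          [0.3, -0.4, -0.2, 0.1, -0.5, 0.8, -0.1],
    'success':      [1, 0, 0, 1, 0, 1, 0],
})


def rushing_by_direction(runs: pd.DataFrame) -> pd.DataFrame:
    """Carries, EPA per carry, and success rate by run direction."""
    return (
        runs.dropna(subset=['run_location'])  # rows without a recorded direction are dropped
        .groupby('run_location')
        .agg(
            carries=('epa', 'size'),
            epa_per_carry=('epa', 'mean'),
            success_rate=('success', 'mean'),
        )
    )


print(rushing_by_direction(runs))
```

Small per-direction samples are common, so the interval caveats from the third-down section apply here too.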
Part 5: Synthesis and Visualization
Step 5.1: Create Summary Dashboard
```python
def create_diagnosis_dashboard(plays: pd.DataFrame, team: str):
    """Create 4-panel diagnosis dashboard."""
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # .copy() so adding field_zone below does not write into a view of `plays`
    team_plays = plays.query(f"posteam == '{team}'").copy()

    # Panel 1: EPA by play type vs league
    pass_run = team_plays.groupby('play_type')['epa'].mean()
    league_pass_run = plays.groupby('play_type')['epa'].mean()

    x = np.arange(2)
    axes[0, 0].bar(x - 0.2, [pass_run.get('pass', 0), pass_run.get('run', 0)],
                   0.4, label=team, color='#006778')
    axes[0, 0].bar(x + 0.2, [league_pass_run.get('pass', 0), league_pass_run.get('run', 0)],
                   0.4, label='League', color='gray')
    axes[0, 0].set_xticks(x)
    axes[0, 0].set_xticklabels(['Pass', 'Run'])
    axes[0, 0].set_ylabel('EPA per Play')
    axes[0, 0].set_title('EPA by Play Type')
    axes[0, 0].legend()
    axes[0, 0].axhline(0, color='black', linewidth=0.5)

    # Panel 2: EPA by down
    down_epa = team_plays.groupby('down')['epa'].mean()
    league_down_epa = plays.groupby('down')['epa'].mean()

    downs = [1, 2, 3, 4]
    axes[0, 1].bar([d - 0.2 for d in downs], [down_epa.get(d, 0) for d in downs],
                   0.4, label=team, color='#006778')
    axes[0, 1].bar([d + 0.2 for d in downs], [league_down_epa.get(d, 0) for d in downs],
                   0.4, label='League', color='gray')
    axes[0, 1].set_xticks(downs)
    axes[0, 1].set_xlabel('Down')
    axes[0, 1].set_ylabel('EPA per Play')
    axes[0, 1].set_title('EPA by Down')
    axes[0, 1].legend()
    axes[0, 1].axhline(0, color='black', linewidth=0.5)

    # Panel 3: Weekly EPA trend
    weekly = team_plays.groupby('week')['epa'].mean()
    season_avg = team_plays['epa'].mean()
    axes[1, 0].plot(weekly.index, weekly.values, marker='o', color='#006778')
    axes[1, 0].axhline(0, color='gray', linestyle='--', alpha=0.5)
    axes[1, 0].axhline(season_avg, color='#006778', linestyle='--',
                       label=f'Season Avg: {season_avg:.3f}')
    axes[1, 0].set_xlabel('Week')
    axes[1, 0].set_ylabel('EPA per Play')
    axes[1, 0].set_title('Weekly EPA Trend')
    axes[1, 0].legend()

    # Panel 4: Success rate by field position
    # yardline_100 = yards from the opponent's end zone, so (80, 100] is inside the own 20
    zone_bins = [0, 20, 40, 60, 80, 100]
    zone_labels = ['Red Zone', '20-40', '40-60', '60-80', 'Own 20']
    team_plays['field_zone'] = pd.cut(team_plays['yardline_100'],
                                      bins=zone_bins, labels=zone_labels)
    # observed=False keeps all five zones so the bars line up with x = np.arange(5)
    zone_success = team_plays.groupby('field_zone', observed=False)['success'].mean()

    league_zone = plays.copy()
    league_zone['field_zone'] = pd.cut(league_zone['yardline_100'],
                                       bins=zone_bins, labels=zone_labels)
    league_zone_success = league_zone.groupby('field_zone', observed=False)['success'].mean()

    x = np.arange(5)
    axes[1, 1].bar(x - 0.2, zone_success.values, 0.4, label=team, color='#006778')
    axes[1, 1].bar(x + 0.2, league_zone_success.values, 0.4, label='League', color='gray')
    axes[1, 1].set_xticks(x)
    axes[1, 1].set_xticklabels(zone_labels, rotation=45)
    axes[1, 1].set_ylabel('Success Rate')
    axes[1, 1].set_title('Success Rate by Field Position')
    axes[1, 1].legend()

    plt.suptitle(f'{team} 2023 Offensive Diagnosis', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.savefig(f'{team.lower()}_diagnosis.png', dpi=150, bbox_inches='tight')
    return fig


create_diagnosis_dashboard(plays, 'JAX')
```
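Weekly EPA in Panel 3 bounces with opponent and game script, so a mid-season shift can be hard to see. A play-weighted rolling mean smooths it. It divides summed EPA by summed plays over the window, which avoids overweighting low-volume games. A sketch:

- The 3-game window is an arbitrary choice.
- The toy weekly totals are invented.
- The helper `rolling_epa` is hypothetical.

```python
import pandas as pd

# Invented weekly totals; week 3 has fewer plays
weekly = pd.DataFrame({
    'week':    [1, 2, 3, 4, 5],
    'epa_sum': [6.0, -3.0, 1.5, -6.0, 3.0],
    'plays':   [60, 60, 50, 60, 60],
})


def rolling_epa(weekly: pd.DataFrame, window: int = 3) -> pd.Series:
    """Play-weighted rolling EPA per play over `window` consecutive games (rows).

    Rolling over rows means a bye week is skipped rather than counted as a gap.
    """
    w = weekly.sort_values('week')
    num = w['epa_sum'].rolling(window, min_periods=1).sum()
    den = w['plays'].rolling(window, min_periods=1).sum()
    return pd.Series((num / den).values, index=w['week'], name='rolling_epa')


print(rolling_epa(weekly))
```

With real data, build the input as `team_plays.groupby('week').agg(epa_sum=('epa', 'sum'), plays=('epa', 'count')).reset_index()`, then plot the result next to the raw weekly line.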
Part 6: Conclusions and Recommendations
Step 6.1: Summary of Findings
Based on the EDA, document your key findings:
DIAGNOSIS SUMMARY: Jacksonville Jaguars 2023 Offense
=====================================================
FINDING 1: [Primary Issue]
- Evidence: [Specific metrics]
- Impact: [How much did this hurt the offense?]
FINDING 2: [Secondary Issue]
- Evidence: [Specific metrics]
- Impact: [Quantified impact]
FINDING 3: [Tertiary Issue]
- Evidence: [Specific metrics]
- Impact: [Quantified impact]
BRIGHT SPOTS:
- [What worked well?]
- [Areas of above-average performance]
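The "Impact" lines ask for a quantified effect. One simple approximation multiplies the team's EPA-per-play gap to the league by the team's own play volume in that split, giving the total EPA gained or lost versus a league-average rate.

- This is a sketch.
- The split numbers are invented.
- The helper `epa_impact` is hypothetical.
- The splits overlap (third-down snaps are also passes or runs), so these impacts rank problems but should not be summed.

```python
import pandas as pd

# Invented split-level numbers for illustration only
splits = pd.DataFrame({
    'split':      ['pass', 'run', 'red_zone', 'third_down'],
    'team_epa':   [0.02, -0.10, -0.05, -0.20],
    'league_epa': [0.05, -0.08, 0.00, -0.10],
    'plays':      [600, 400, 120, 200],
})


def epa_impact(splits: pd.DataFrame) -> pd.DataFrame:
    """EPA gained (+) or lost (-) vs a league-average rate over the team's own volume."""
    out = splits.assign(epa_gap=splits['team_epa'] - splits['league_epa'])
    out['epa_impact'] = out['epa_gap'] * out['plays']
    return out.sort_values('epa_impact')  # largest shortfall first


print(epa_impact(splits))
```

In this toy table, a modest third-down gap on 200 snaps costs more than a smaller passing gap on 600 dropbacks. Weighing both rate and volume like this can help order FINDING 1 through 3.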
Step 6.2: Recommendations
RECOMMENDATIONS FOR 2024 OFFSEASON
===================================
PRIORITY 1: [Address primary weakness]
- Specific action items
- Expected impact
PRIORITY 2: [Address secondary weakness]
- Specific action items
- Expected impact
PRIORITY 3: [Optimize strength]
- Specific action items
- Expected impact
Discussion Questions
- Data Limitations: What context is missing from play-by-play data that would help explain performance? (Injuries, scheme changes, opponent quality)
- Sample Size: For situational analysis (red zone, third down), when do we have enough plays to trust the numbers?
- Causation: How do we distinguish between "the QB played poorly," "the receivers didn't get open," and "the play-calling was bad"?
- Actionability: Which of your findings can the team actually address, and which are baked into personnel?
Extension: Build Your Own Diagnosis
Apply this same framework to another team:
1. Choose a team that underperformed or overperformed expectations.
2. Run through the complete analysis.
3. Identify 3 key findings.
4. Create a summary dashboard.
5. Write a one-page executive summary for the front office.