Case Study: Why Did the Offense Struggle?

"The numbers never lie, but they don't always tell the whole story at first glance."

Executive Summary

In this case study, you'll use exploratory data analysis (EDA) to diagnose why a team's offense underperformed expectations. You'll work through a systematic EDA process to identify root causes, moving from high-level symptoms to specific, actionable insights.

Skills Applied:

  - Systematic EDA workflow
  - Comparative analysis
  - Split analysis
  - Visual storytelling
  - Insight communication


The Scenario

The Jacksonville Jaguars finished the 2023 season with an offense that appeared mediocre despite having what many considered talented personnel. The coaching staff wants to understand:

  1. Where specifically did the offense underperform?
  2. What patterns explain the underperformance?
  3. What should be addressed in the offseason?

Your task is to conduct a thorough EDA to answer these questions.


Part 1: Establishing the Baseline

Step 1.1: Load and Prepare Data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import nfl_data_py as nfl

# Load 2023 data
pbp = nfl.import_pbp_data([2023])

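# Filter to regular-season offensive plays (pass and run), so playoff games
# do not skew the league baseline
plays = pbp[
    (pbp['season_type'] == 'REG') & pbp['play_type'].isin(['pass', 'run'])
].copy()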

# Target team
TEAM = 'JAX'

Step 1.2: High-Level Performance

First, establish where Jacksonville ranks among all teams:

def calculate_team_rankings(plays: pd.DataFrame) -> pd.DataFrame:
    """Calculate comprehensive team offensive rankings."""

    # Flag explosive plays (20+ yard passes, 10+ yard runs) and turnovers per
    # play first, so the aggregation below needs only column names
    flagged = plays.assign(
        explosive=(
            ((plays['pass'] == 1) & (plays['yards_gained'] >= 20)) |
            ((plays['rush'] == 1) & (plays['yards_gained'] >= 10))
        ),
        turnover=plays['interception'].fillna(0) + plays['fumble_lost'].fillna(0),
    )

    team_stats = (
        flagged
        .groupby('posteam')
        .agg(
            plays=('play_id', 'count'),
            epa_per_play=('epa', 'mean'),
            success_rate=('success', 'mean'),
            pass_rate=('pass', 'mean'),
            explosive_rate=('explosive', 'mean'),
            turnover_plays=('turnover', 'sum'),
        )
        .reset_index()
    )

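    # Add rankings (method='min' keeps tied teams on a whole-number rank,
    # so the int() conversion when printing does not truncate a .5 rank)
    team_stats['epa_rank'] = team_stats['epa_per_play'].rank(ascending=False, method='min')
    team_stats['success_rank'] = team_stats['success_rate'].rank(ascending=False, method='min')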

    return team_stats

team_rankings = calculate_team_rankings(plays)
# Use the TEAM constant from Step 1.1 rather than repeating the literal
jax_rank = team_rankings.query("posteam == @TEAM").iloc[0]

print(f"Jacksonville 2023 Offensive Rankings:")
print(f"  EPA per Play: {jax_rank['epa_per_play']:.3f} (Rank: {int(jax_rank['epa_rank'])})")
print(f"  Success Rate: {jax_rank['success_rate']:.1%} (Rank: {int(jax_rank['success_rank'])})")
print(f"  Pass Rate: {jax_rank['pass_rate']:.1%}")

Step 1.3: Visualize League Context

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# EPA distribution with JAX highlighted
teams = team_rankings.sort_values('epa_per_play', ascending=True)
colors = ['#006778' if t == 'JAX' else 'gray' for t in teams['posteam']]

axes[0].barh(teams['posteam'], teams['epa_per_play'], color=colors)
axes[0].axvline(0, color='black', linestyle='-', linewidth=0.5)
axes[0].set_xlabel('EPA per Play')
axes[0].set_title('2023 Offensive EPA Rankings')

# EPA vs Success Rate scatter
axes[1].scatter(
    team_rankings['epa_per_play'],
    team_rankings['success_rate'],
    c='gray', s=100, alpha=0.6
)

# Highlight Jacksonville
jax_data = team_rankings.query("posteam == 'JAX'")
axes[1].scatter(
    jax_data['epa_per_play'],
    jax_data['success_rate'],
    c='#006778', s=150, label='JAX', zorder=5
)
axes[1].annotate('JAX', (jax_data['epa_per_play'].values[0],
                          jax_data['success_rate'].values[0]),
                 xytext=(5, 5), textcoords='offset points')

# Add quadrant lines
axes[1].axhline(team_rankings['success_rate'].median(), color='gray', linestyle='--', alpha=0.5)
axes[1].axvline(team_rankings['epa_per_play'].median(), color='gray', linestyle='--', alpha=0.5)

axes[1].set_xlabel('EPA per Play')
axes[1].set_ylabel('Success Rate')
axes[1].set_title('EPA vs Success Rate by Team')

plt.tight_layout()
plt.savefig('jax_league_context.png', dpi=150, bbox_inches='tight')

Observation: Jacksonville ranked [X] in EPA per play, below expectations for their talent level. The next step is to understand why.


Part 2: Breaking Down Performance

Step 2.1: Pass vs Run Split

def analyze_pass_run_split(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Compare pass and run efficiency for a team vs league."""

    results = []

    for play_type, filter_expr in [('pass', 'pass == 1'), ('run', 'rush == 1')]:
        # Team stats
        team_plays = plays.query(f"posteam == '{team}' and {filter_expr}")
        team_epa = team_plays['epa'].mean()
        team_success = team_plays['success'].mean()
        team_n = len(team_plays)

        # League stats
        league_plays = plays.query(filter_expr)
        league_epa = league_plays['epa'].mean()
        league_success = league_plays['success'].mean()

        results.append({
            'play_type': play_type,
            'team_epa': team_epa,
            'league_epa': league_epa,
            'epa_vs_avg': team_epa - league_epa,
            'team_success': team_success,
            'league_success': league_success,
            'plays': team_n
        })

    return pd.DataFrame(results)

pass_run = analyze_pass_run_split(plays, 'JAX')
print(pass_run.to_string(index=False))

Step 2.2: Down-by-Down Analysis

def analyze_by_down(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Analyze efficiency by down."""

    team_plays = plays.query(f"posteam == '{team}'")
    league_plays = plays

    results = []
    for down in [1, 2, 3, 4]:
        team_down = team_plays.query(f"down == {down}")
        league_down = league_plays.query(f"down == {down}")

        results.append({
            'down': down,
            'team_epa': team_down['epa'].mean(),
            'league_epa': league_down['epa'].mean(),
            'team_success': team_down['success'].mean(),
            'league_success': league_down['success'].mean(),
            'team_pass_rate': team_down['pass'].mean(),
            'plays': len(team_down)
        })

    return pd.DataFrame(results)

by_down = analyze_by_down(plays, 'JAX')
print(by_down.to_string(index=False))

Step 2.3: Visualize Down Performance

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

x = by_down['down']
width = 0.35

# EPA by down
axes[0].bar(x - width/2, by_down['team_epa'], width, label='JAX', color='#006778')
axes[0].bar(x + width/2, by_down['league_epa'], width, label='League', color='gray')
axes[0].axhline(0, color='black', linewidth=0.5)
axes[0].set_xlabel('Down')
axes[0].set_ylabel('EPA per Play')
axes[0].set_title('EPA by Down: JAX vs League')
axes[0].legend()
axes[0].set_xticks([1, 2, 3, 4])

# Success rate by down
axes[1].bar(x - width/2, by_down['team_success'], width, label='JAX', color='#006778')
axes[1].bar(x + width/2, by_down['league_success'], width, label='League', color='gray')
axes[1].set_xlabel('Down')
axes[1].set_ylabel('Success Rate')
axes[1].set_title('Success Rate by Down: JAX vs League')
axes[1].legend()
axes[1].set_xticks([1, 2, 3, 4])

plt.tight_layout()
plt.savefig('jax_by_down.png', dpi=150, bbox_inches='tight')

Key Finding: [Identify which downs show the biggest gaps]
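
One way to fill in this finding is to compute the team-minus-league gap for each down and sort by it. The sketch below assumes a table with the same columns that analyze_by_down returns; the helper name rank_down_gaps and the numbers in example_by_down are made up for illustration and are not real Jacksonville results.

```python
import pandas as pd

def rank_down_gaps(by_down: pd.DataFrame) -> pd.DataFrame:
    """Add team-minus-league gaps and sort so the worst EPA gap comes first."""
    gaps = by_down.assign(
        epa_gap=by_down['team_epa'] - by_down['league_epa'],
        success_gap=by_down['team_success'] - by_down['league_success'],
    )
    return gaps.sort_values('epa_gap')[['down', 'epa_gap', 'success_gap', 'plays']]

# Made-up numbers for illustration only (hypothetical, not real results)
example_by_down = pd.DataFrame({
    'down': [1, 2, 3, 4],
    'team_epa': [0.02, -0.05, -0.10, 0.15],
    'league_epa': [0.00, 0.01, 0.03, 0.10],
    'team_success': [0.45, 0.42, 0.36, 0.55],
    'league_success': [0.44, 0.45, 0.40, 0.52],
    'plays': [450, 340, 210, 25],
})

down_gaps = rank_down_gaps(example_by_down)
print(down_gaps.to_string(index=False))
```

Keeping the plays column next to each gap is deliberate: a large gap on fourth down may rest on only a few dozen plays, so it deserves less weight than a smaller gap on first down.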


Part 3: Situational Deep Dives

Step 3.1: Red Zone Analysis

def analyze_red_zone(plays: pd.DataFrame, team: str) -> dict:
    """Analyze red zone performance."""

    rz = plays.query("yardline_100 <= 20")
    team_rz = rz.query(f"posteam == '{team}'")

    return {
        'rz_plays': len(team_rz),
        'rz_td_rate': team_rz['touchdown'].mean(),
        'rz_epa': team_rz['epa'].mean(),
        'rz_success': team_rz['success'].mean(),
        'rz_pass_rate': team_rz['pass'].mean(),
        'league_td_rate': rz['touchdown'].mean(),
        'league_rz_epa': rz['epa'].mean()
    }

rz_stats = analyze_red_zone(plays, 'JAX')
print(f"Red Zone Analysis:")
for key, value in rz_stats.items():
    print(f"  {key}: {value:.3f}" if isinstance(value, float) else f"  {key}: {value}")

Step 3.2: Third Down Efficiency

def analyze_third_down(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Detailed third down analysis by distance."""

    third = plays.query("down == 3")

    # Bin by distance
    bins = [(1, 3, 'short'), (4, 6, 'medium'), (7, 10, 'long'), (11, 99, 'very_long')]

    results = []
    for min_d, max_d, label in bins:
        team_plays = third.query(f"posteam == '{team}' and {min_d} <= ydstogo <= {max_d}")
        league_plays = third.query(f"{min_d} <= ydstogo <= {max_d}")

        results.append({
            'distance': label,
            'team_conv_rate': team_plays['first_down'].mean(),
            'league_conv_rate': league_plays['first_down'].mean(),
            'team_epa': team_plays['epa'].mean(),
            'plays': len(team_plays)
        })

    return pd.DataFrame(results)

third_down = analyze_third_down(plays, 'JAX')
print(third_down.to_string(index=False))

Step 3.3: Late and Close Situations

def analyze_clutch(plays: pd.DataFrame, team: str) -> dict:
    """Analyze performance in close, late-game situations."""

    # Close game in 4th quarter
    clutch = plays.query(
        "qtr == 4 and abs(score_differential) <= 8"
    )

    team_clutch = clutch.query(f"posteam == '{team}'")

    return {
        'clutch_plays': len(team_clutch),
        'clutch_epa': team_clutch['epa'].mean(),
        'clutch_success': team_clutch['success'].mean(),
        'league_clutch_epa': clutch['epa'].mean(),
        'overall_epa': plays.query(f"posteam == '{team}'")['epa'].mean()
    }

clutch_stats = analyze_clutch(plays, 'JAX')
print(f"Clutch Situations (4Q, within 8 points):")
for key, value in clutch_stats.items():
    print(f"  {key}: {value:.3f}" if isinstance(value, float) else f"  {key}: {value}")

Part 4: Player-Level Analysis

Step 4.1: Quarterback Performance

def analyze_qb(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Analyze quarterback performance."""

    passes = plays.query(f"pass == 1 and posteam == '{team}'")

    qb_stats = (
        passes
        .groupby('passer_player_name')
        .agg(
            dropbacks=('pass', 'count'),
            epa_per_play=('epa', 'mean'),
            cpoe=('cpoe', 'mean'),
            success_rate=('success', 'mean'),
            air_yards_avg=('air_yards', 'mean'),
            yac_avg=('yards_after_catch', 'mean'),
            sack_rate=('sack', 'mean'),
            interceptions=('interception', 'sum')
        )
        .reset_index()
        .query("dropbacks >= 50")
    )

    return qb_stats

jax_qb = analyze_qb(plays, 'JAX')
print(jax_qb.to_string(index=False))

Step 4.2: Receiver Usage

def analyze_receivers(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Analyze receiver efficiency and usage."""

    team_passes = plays.query(f"pass == 1 and posteam == '{team}'")

    receiver_stats = (
        team_passes
        .query("receiver_player_name.notna()")
        .groupby('receiver_player_name')
        .agg(
            targets=('pass', 'count'),
            receptions=('complete_pass', 'sum'),
            yards=('yards_gained', 'sum'),
            epa_total=('epa', 'sum'),
            epa_per_target=('epa', 'mean'),
            avg_depth=('air_yards', 'mean'),
            yac_avg=('yards_after_catch', 'mean')
        )
        .reset_index()
        .query("targets >= 20")
        .assign(catch_rate=lambda x: x['receptions'] / x['targets'])
        .sort_values('targets', ascending=False)
    )

    return receiver_stats

jax_receivers = analyze_receivers(plays, 'JAX')
print(jax_receivers.head(10).to_string(index=False))

Step 4.3: Running Back Efficiency

def analyze_rushers(plays: pd.DataFrame, team: str) -> pd.DataFrame:
    """Analyze rusher efficiency."""

    team_runs = plays.query(f"rush == 1 and posteam == '{team}'")

    rusher_stats = (
        team_runs
        .query("rusher_player_name.notna()")
        .groupby('rusher_player_name')
        .agg(
            carries=('rush', 'count'),
            yards=('yards_gained', 'sum'),
            epa_total=('epa', 'sum'),
            epa_per_carry=('epa', 'mean'),
            success_rate=('success', 'mean'),
            explosive_rate=('yards_gained', lambda x: (x >= 10).mean())
        )
        .reset_index()
        .query("carries >= 20")
        .assign(ypc=lambda x: x['yards'] / x['carries'])
        .sort_values('carries', ascending=False)
    )

    return rusher_stats

jax_rushers = analyze_rushers(plays, 'JAX')
print(jax_rushers.to_string(index=False))

Part 5: Synthesis and Visualization

Step 5.1: Create Summary Dashboard

def create_diagnosis_dashboard(plays: pd.DataFrame, team: str):
    """Create 4-panel diagnosis dashboard."""

    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    team_plays = plays.query(f"posteam == '{team}'")

    # Panel 1: EPA by play type vs league
    pass_run = team_plays.groupby('play_type')['epa'].mean()
    league_pass_run = plays.groupby('play_type')['epa'].mean()

    x = np.arange(2)
    axes[0, 0].bar(x - 0.2, [pass_run.get('pass', 0), pass_run.get('run', 0)],
                   0.4, label=team, color='#006778')
    axes[0, 0].bar(x + 0.2, [league_pass_run.get('pass', 0), league_pass_run.get('run', 0)],
                   0.4, label='League', color='gray')
    axes[0, 0].set_xticks(x)
    axes[0, 0].set_xticklabels(['Pass', 'Run'])
    axes[0, 0].set_ylabel('EPA per Play')
    axes[0, 0].set_title('EPA by Play Type')
    axes[0, 0].legend()
    axes[0, 0].axhline(0, color='black', linewidth=0.5)

    # Panel 2: EPA by down
    down_epa = team_plays.groupby('down')['epa'].mean()
    league_down_epa = plays.groupby('down')['epa'].mean()

    downs = [1, 2, 3, 4]
    axes[0, 1].bar([d - 0.2 for d in downs], [down_epa.get(d, 0) for d in downs],
                   0.4, label=team, color='#006778')
    axes[0, 1].bar([d + 0.2 for d in downs], [league_down_epa.get(d, 0) for d in downs],
                   0.4, label='League', color='gray')
    axes[0, 1].set_xticks(downs)
    axes[0, 1].set_xlabel('Down')
    axes[0, 1].set_ylabel('EPA per Play')
    axes[0, 1].set_title('EPA by Down')
    axes[0, 1].legend()
    axes[0, 1].axhline(0, color='black', linewidth=0.5)

    # Panel 3: Weekly EPA trend
    weekly = team_plays.groupby('week')['epa'].mean()
    axes[1, 0].plot(weekly.index, weekly.values, marker='o', color='#006778')
    axes[1, 0].axhline(0, color='gray', linestyle='--', alpha=0.5)
    axes[1, 0].axhline(team_plays['epa'].mean(), color='#006778', linestyle='--',
                        label=f'Season Avg: {team_plays["epa"].mean():.3f}')
    axes[1, 0].set_xlabel('Week')
    axes[1, 0].set_ylabel('EPA per Play')
    axes[1, 0].set_title('Weekly EPA Trend')
    axes[1, 0].legend()

    # Panel 4: Success rate by field position
    # yardline_100 is the distance to the opponent's end zone, so 80-100
    # means the offense is inside its own 20
    zone_bins = [0, 20, 40, 60, 80, 100]
    zone_labels = ['Red Zone', '20-40', '40-60', '60-80', 'Own 20']

    def success_by_zone(df: pd.DataFrame) -> pd.Series:
        # Bin without adding a column to a filtered frame, which avoids
        # SettingWithCopyWarning; observed=False keeps all five zones
        zones = pd.cut(df['yardline_100'], bins=zone_bins, labels=zone_labels)
        return df.groupby(zones, observed=False)['success'].mean()

    zone_success = success_by_zone(team_plays)
    league_zone_success = success_by_zone(plays)

    x = np.arange(5)
    axes[1, 1].bar(x - 0.2, zone_success.values, 0.4, label=team, color='#006778')
    axes[1, 1].bar(x + 0.2, league_zone_success.values, 0.4, label='League', color='gray')
    axes[1, 1].set_xticks(x)
    axes[1, 1].set_xticklabels(zone_success.index, rotation=45)
    axes[1, 1].set_ylabel('Success Rate')
    axes[1, 1].set_title('Success Rate by Field Position')
    axes[1, 1].legend()

    plt.suptitle(f'{team} 2023 Offensive Diagnosis', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.savefig(f'{team.lower()}_diagnosis.png', dpi=150, bbox_inches='tight')

    return fig

create_diagnosis_dashboard(plays, 'JAX')

Part 6: Conclusions and Recommendations

Step 6.1: Summary of Findings

Based on the EDA, document your key findings:

DIAGNOSIS SUMMARY: Jacksonville Jaguars 2023 Offense
=====================================================

FINDING 1: [Primary Issue]
- Evidence: [Specific metrics]
- Impact: [How much did this hurt the offense?]

FINDING 2: [Secondary Issue]
- Evidence: [Specific metrics]
- Impact: [Quantified impact]

FINDING 3: [Tertiary Issue]
- Evidence: [Specific metrics]
- Impact: [Quantified impact]

BRIGHT SPOTS:
- [What worked well?]
- [Areas of above-average performance]
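
For the "Impact" lines, one rough way to quantify a finding is to multiply the team's EPA-per-play gap versus the league by the number of plays in that split. This is a minimal sketch under that assumption, not a definitive method: the function name epa_impact and the inputs are hypothetical, and the estimate ignores opponent quality and game context.

```python
def epa_impact(team_epa: float, league_epa: float, n_plays: int) -> float:
    """Approximate total EPA gained or lost versus a league-average offense."""
    return (team_epa - league_epa) * n_plays

# Made-up inputs for illustration only (hypothetical, not real results)
impact = epa_impact(team_epa=-0.10, league_epa=0.03, n_plays=210)
print(f"Estimated total EPA vs league average: {impact:+.1f}")
```

Expressing each finding in total EPA puts them on a common scale, which makes it easier to rank a primary issue ahead of a secondary one.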

Step 6.2: Recommendations

RECOMMENDATIONS FOR 2024 OFFSEASON
===================================

PRIORITY 1: [Address primary weakness]
- Specific action items
- Expected impact

PRIORITY 2: [Address secondary weakness]
- Specific action items
- Expected impact

PRIORITY 3: [Optimize strength]
- Specific action items
- Expected impact

Discussion Questions

  1. Data Limitations: What context is missing from play-by-play data that would help explain performance? (Injuries, scheme changes, opponent quality)

  2. Sample Size: For situational analysis (red zone, third down), when do we have enough plays to trust the numbers?

  3. Causation: How do we distinguish between "the QB played poorly" vs "the receivers didn't get open" vs "the play-calling was bad"?

  4. Actionability: Which of your findings can the team actually address vs which are baked into personnel?
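
For the sample-size question, one possible check is a bootstrap confidence interval around a split's mean EPA: if the interval is wide enough to include the league average, the split may not yet support a firm conclusion. The sketch below uses simulated EPA values with an arbitrary spread, not real play-by-play data, and the helper name bootstrap_mean_ci is hypothetical.

```python
import numpy as np

def bootstrap_mean_ci(values, n_boot=2000, ci=0.95, seed=0):
    """Percentile bootstrap confidence interval for a mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # drop plays with missing EPA
    # Resample play indices with replacement, one row per bootstrap draw
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    alpha = (1 - ci) / 2
    low, high = np.quantile(boot_means, [alpha, 1 - alpha])
    return values.mean(), low, high

# Simulated EPA values (assumption: arbitrary spread, illustration only)
sim_rng = np.random.default_rng(42)
small_sample = sim_rng.normal(loc=0.0, scale=1.4, size=40)
large_sample = sim_rng.normal(loc=0.0, scale=1.4, size=600)

for label, sample in [('40 plays', small_sample), ('600 plays', large_sample)]:
    mean, low, high = bootstrap_mean_ci(sample)
    print(f"{label}: mean={mean:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

The interval for the 40-play sample should come out several times wider than the one for 600 plays, which is the practical reason to read red zone and fourth-down splits more cautiously than full-season numbers.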


Extension: Build Your Own Diagnosis

Apply this same framework to another team:

  1. Choose a team that underperformed or overperformed expectations
  2. Run through the complete analysis
  3. Identify 3 key findings
  4. Create a summary dashboard
  5. Write a 1-page executive summary for the front office