Case Study: Diagnosing the Ground Game
Scenario
The Pittsburgh Steelers front office is concerned. After investing a second-round pick in a running back and signing a free agent guard, their rushing attack ranks 28th in yards per carry. The coaching staff blames the offensive line. The offensive line coach blames the running back's "hesitation." The analytics department has been asked: Is the rushing failure due to the O-line, the running back, or something else?
Team Rushing Stats:
- 402 rushing attempts
- 1,489 yards (3.70 YPC)
- 18.5% stuff rate (league avg: 19.2%)
- 8 rushing touchdowns
Data Gathering
import nfl_data_py as nfl
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load 2023 data
pbp = nfl.import_pbp_data([2023])
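# Filter to team rushes; .copy() gives an independent frame so columns can be
# added later without pandas raising SettingWithCopyWarning
rushes = pbp[(pbp['rush_attempt'] == 1) & (pbp['posteam'] == 'PIT')].copy()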
print(f"Total Rushes: {len(rushes)}")
print(f"Total Yards: {rushes['yards_gained'].sum()}")
print(f"YPC: {rushes['yards_gained'].mean():.2f}")
Analysis 1: Adjusted Line Yards
First, let's calculate ALY to better isolate O-line contribution:
def calculate_aly(yards):
    """ALY credit system."""
    if yards < 0:
        return yards * 1.25
    elif yards <= 4:
        return yards
    elif yards <= 10:
        return 4 + (yards - 4) * 0.5
    else:
        return 4 + 3 + (yards - 10) * 0.25

# assign() returns a new frame, which avoids writing into a filtered slice of pbp
rushes = rushes.assign(line_yards=rushes['yards_gained'].apply(calculate_aly))
team_aly = rushes['line_yards'].mean()
league_aly = pbp[pbp['rush_attempt'] == 1]['yards_gained'].apply(calculate_aly).mean()
print(f"Team ALY: {team_aly:.2f}")
print(f"League ALY: {league_aly:.2f}")
# This compares against the league mean only; it does not compute a rank
print(f"ALY vs. league: team is {'above' if team_aly > league_aly else 'below'} average")
Finding: The team's ALY is 3.65, ranking 20th in the league. This is below average but better than their raw YPC rank (28th). The O-line is creating some holes; the backs aren't finishing.
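The snippet above only reports whether the team sits above or below the league mean; it does not itself produce a rank. A minimal sketch of one way to rank teams, assuming a rushing frame with `posteam` and `yards_gained` columns. The helper name `rank_teams_by_aly` and the three-team toy frame are hypothetical and stand in for the real play-by-play data.

```python
import pandas as pd


def calculate_aly(yards):
    """Same credit system as above: losses x1.25, yards 5-10 x0.5, yards 11+ x0.25."""
    if yards < 0:
        return yards * 1.25
    elif yards <= 4:
        return yards
    elif yards <= 10:
        return 4 + (yards - 4) * 0.5
    else:
        return 4 + 3 + (yards - 10) * 0.25


def rank_teams_by_aly(rush_df):
    """Return per-team mean line yards with a rank column (1 = best)."""
    line_yards = rush_df['yards_gained'].apply(calculate_aly)
    ranked = line_yards.groupby(rush_df['posteam']).mean().rename('aly').to_frame()
    ranked['rank'] = ranked['aly'].rank(ascending=False, method='min').astype(int)
    return ranked.sort_values('rank')


# Toy data for illustration only; with real data pass pbp[pbp['rush_attempt'] == 1]
toy = pd.DataFrame({
    'posteam': ['AAA', 'AAA', 'BBB', 'BBB', 'CCC', 'CCC'],
    'yards_gained': [-2, 3, 8, 20, 0, 4],
})
print(rank_teams_by_aly(toy))
```

On the real frame, the row for `'PIT'` would give the team's position among all 32 offenses, which is the kind of figure the finding quotes.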
Analysis 2: Stuff Rate by Back
Let's compare the team's running backs:
# Compare all RBs on the team
rb_comparison = (rushes
    .groupby('rusher_player_name')
    .agg(
        carries=('rush_attempt', 'count'),
        yards=('yards_gained', 'sum'),
        ypc=('yards_gained', 'mean'),
        stuff_rate=('yards_gained', lambda x: (x <= 0).mean()),
        explosive=('yards_gained', lambda x: (x >= 10).mean()),
        epa=('epa', 'mean')
    )
    .query('carries >= 20')
    .sort_values('carries', ascending=False)
)
print("Running Back Comparison:")
print(rb_comparison.round(3).to_string())
Finding:
| RB | Carries | YPC | Stuff Rate | EPA |
|---|---|---|---|---|
| Starter | 247 | 3.52 | 20.2% | -0.08 |
| Backup A | 89 | 4.12 | 15.7% | -0.02 |
| Backup B | 45 | 3.84 | 17.8% | -0.05 |
The backup running backs are noticeably more efficient than the starter despite running behind the same O-line, although on far fewer carries (89 and 45 versus 247). This suggests the starter may be part of the problem.
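Whether a gap like this is larger than chance alone would produce can be checked with a two-proportion z-test on stuff rate. The sketch below is one possible check, not part of the original analysis. The function name `two_proportion_z` is hypothetical, and the stuffed-run counts are back-calculated from the rounded percentages in the table, so they are approximate.

```python
from math import erf, sqrt


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF expressed through the error function
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Assumed counts: 20.2% of 247 carries is about 50 stuffs (starter),
# 15.7% of 89 carries is about 14 stuffs (Backup A)
z, p_value = two_proportion_z(50, 247, 14, 89)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

With these counts z comes out a little under 1 and the p-value sits well above the conventional 0.05 threshold, so the stuff-rate gap on its own is suggestive rather than conclusive. That is why the remaining analyses look for corroborating evidence.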
Analysis 3: Distribution of Outcomes
# Analyze distribution of rushing outcomes
def outcome_distribution(df):
    """Calculate distribution of rushing outcomes."""
    yards = df['yards_gained']
    return {
        'negative': (yards < 0).mean(),
        'zero': (yards == 0).mean(),
        'short': ((yards > 0) & (yards <= 4)).mean(),
        'medium': ((yards > 4) & (yards < 10)).mean(),
        # 10+ yards, matching the explosive definition used in rb_comparison
        'explosive': (yards >= 10).mean()
    }

starter_dist = outcome_distribution(rushes[rushes['rusher_player_name'] == 'N.Harris'])
backup_dist = outcome_distribution(rushes[rushes['rusher_player_name'] == 'J.Warren'])
league_dist = outcome_distribution(pbp[pbp['rush_attempt'] == 1])
print("Outcome Distribution:")
print("Category | Starter | Backup | League")
print(f"Negative | {starter_dist['negative']:.1%} | {backup_dist['negative']:.1%} | {league_dist['negative']:.1%}")
print(f"Zero | {starter_dist['zero']:.1%} | {backup_dist['zero']:.1%} | {league_dist['zero']:.1%}")
print(f"1-4 yards | {starter_dist['short']:.1%} | {backup_dist['short']:.1%} | {league_dist['short']:.1%}")
print(f"5-9 yards | {starter_dist['medium']:.1%} | {backup_dist['medium']:.1%} | {league_dist['medium']:.1%}")
print(f"10+ yards | {starter_dist['explosive']:.1%} | {backup_dist['explosive']:.1%} | {league_dist['explosive']:.1%}")
Finding: The starter has more negative runs and fewer explosive runs than both the backup and league average. The backup matches or exceeds league averages in most categories.
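The same breakdown can be produced as a single table with `pd.cut`, which keeps the bin edges in one place. This is a sketch under two assumptions: gains are whole yards, and 10+ yards counts as explosive (the definition used in the running back comparison). The names `BINS`, `LABELS`, `outcome_shares` and the toy yardage are hypothetical.

```python
import numpy as np
import pandas as pd

# Right-closed bins for whole-yard gains: (-inf,-1], (-1,0], (0,4], (4,9], (9,inf)
BINS = [-np.inf, -1, 0, 4, 9, np.inf]
LABELS = ['Negative', 'Zero', '1-4 yards', '5-9 yards', '10+ yards']


def outcome_shares(yards):
    """Share of carries in each outcome bucket, in LABELS order."""
    binned = pd.cut(yards, bins=BINS, labels=LABELS).astype(str)
    # Buckets with no carries are missing from value_counts, so fill them with 0
    return binned.value_counts(normalize=True).reindex(LABELS, fill_value=0.0)


# Toy yardage for illustration only, not real play-by-play data
toy_groups = {
    'Starter': pd.Series([-1, 0, 2, 3, 6, 12, 1, 0, 4, 9]),
    'Backup': pd.Series([3, 5, 11, 0, 7, 2, 15, 4]),
}
table = pd.DataFrame({name: outcome_shares(y) for name, y in toy_groups.items()})
print(table.round(3))
```

With real data, each Series would be the `yards_gained` column for one back or for the league, and the printed frame replaces the hand-formatted rows.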
Analysis 4: Run Direction
# Analyze by run direction (if available)
if 'run_location' in rushes.columns:
    direction_analysis = (rushes
        .groupby(['rusher_player_name', 'run_location'])
        .agg(
            carries=('rush_attempt', 'count'),
            ypc=('yards_gained', 'mean'),
            aly=('line_yards', 'mean')
        )
        .query('carries >= 10')
        .reset_index()
    )
    # Pivot for comparison
    direction_pivot = direction_analysis.pivot(
        index='rusher_player_name',
        columns='run_location',
        values='ypc'
    )
    print("YPC by Direction:")
    print(direction_pivot.round(2).to_string())
Finding: Both backs perform similarly on runs to the left and middle, but the starter underperforms significantly on runs to the right. This suggests either a vision problem on the starter's part or a specific right-side blocking problem.
Analysis 5: First Contact Analysis
Without tracking data, we approximate first contact:
# Proxy for blocking quality: percentage of runs reaching 4+ yards
# (Roughly reaching the second level)
def second_level_rate(df):
    return (df['yards_gained'] >= 4).mean()
starter_second_level = second_level_rate(rushes[rushes['rusher_player_name'] == 'N.Harris'])
backup_second_level = second_level_rate(rushes[rushes['rusher_player_name'] == 'J.Warren'])
print(f"Reaching second level (4+ yards):")
print(f" Starter: {starter_second_level:.1%}")
print(f" Backup: {backup_second_level:.1%}")
print(f" League: {second_level_rate(pbp[pbp['rush_attempt'] == 1]):.1%}")
Finding: The backup reaches the second level more often than the starter behind the same O-line.
Analysis 6: Short Yardage Performance
Short yardage is more O-line dependent:
# Short yardage (1-2 yards to go)
short = rushes[rushes['ydstogo'] <= 2]
short_by_rb = (short
    .groupby('rusher_player_name')
    .agg(
        attempts=('rush_attempt', 'count'),
        conversion=('first_down', 'mean'),
        ypc=('yards_gained', 'mean')
    )
    .query('attempts >= 10')
)
print("Short Yardage Performance:")
print(short_by_rb.round(3).to_string())
Finding: Both backs convert short yardage at similar rates (~72%), suggesting the O-line performs adequately when the play design is straightforward.
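The filter above keeps every rush with two or fewer yards to go, whatever the down. A stricter reading of "short yardage" limits the sample to third and fourth down, where failing to convert ends the series or forces a decision. The sketch below applies that narrower filter as a robustness check. It assumes `down`, `ydstogo`, and `first_down` columns are present; the helper name `short_yardage_conversion` and the toy rows are hypothetical.

```python
import pandas as pd


def short_yardage_conversion(rush_df, max_to_go=2, min_attempts=1):
    """Per-rusher conversion rate on 3rd/4th down with at most max_to_go yards needed."""
    mask = rush_df['down'].isin([3, 4]) & (rush_df['ydstogo'] <= max_to_go)
    summary = (
        rush_df[mask]
        .groupby('rusher_player_name')
        .agg(attempts=('first_down', 'size'), conversion=('first_down', 'mean'))
    )
    return summary[summary['attempts'] >= min_attempts]


# Toy rows for illustration only, not real play-by-play data
toy = pd.DataFrame({
    'rusher_player_name': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'down':               [3,   3,   1,   4,   3,   2,   3],
    'ydstogo':            [1,   2,   1,   1,   2,   1,   5],
    'first_down':         [1,   0,   1,   1,   1,   0,   0],
})
print(short_yardage_conversion(toy).round(3))
```

The narrower filter shrinks an already small sample, so with real data the `min_attempts` threshold may need to be lower than the 10 used above, and the resulting rates should be read with that in mind.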
Visualization
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Plot 1: YPC comparison
ax1 = axes[0, 0]
rbs = ['Starter', 'Backup A', 'Backup B']
ypc = [3.52, 4.12, 3.84]
ax1.bar(rbs, ypc, color=['red', 'green', 'green'])
ax1.axhline(y=4.2, color='blue', linestyle='--', label='League Avg')
ax1.set_ylabel('YPC')
ax1.set_title('YPC by Running Back')
ax1.legend()
# Plot 2: Stuff rate comparison
ax2 = axes[0, 1]
stuff = [20.2, 15.7, 17.8]
ax2.bar(rbs, stuff, color=['red', 'green', 'green'])
ax2.axhline(y=19.2, color='blue', linestyle='--', label='League Avg')
ax2.set_ylabel('Stuff Rate (%)')
ax2.set_title('Stuff Rate by Running Back')
ax2.legend()
# Plot 3: Outcome distribution
ax3 = axes[1, 0]
categories = ['Negative', 'Zero', '1-4', '5-9', '10+']
starter_pcts = [9.3, 10.9, 42.1, 25.5, 12.1]
backup_pcts = [6.7, 9.0, 39.3, 29.2, 15.7]
x = np.arange(len(categories))
width = 0.35
ax3.bar(x - width/2, starter_pcts, width, label='Starter')
ax3.bar(x + width/2, backup_pcts, width, label='Backup')
ax3.set_xticks(x)
ax3.set_xticklabels(categories)
ax3.set_ylabel('Percentage')
ax3.set_title('Outcome Distribution')
ax3.legend()
# Plot 4: EPA comparison
ax4 = axes[1, 1]
epa = [-0.08, -0.02, -0.05]
colors = ['red', 'green', 'green']
ax4.bar(rbs, epa, color=colors)
ax4.axhline(y=-0.04, color='blue', linestyle='--', label='League Avg')
ax4.set_ylabel('EPA per Carry')
ax4.set_title('EPA by Running Back')
ax4.legend()
plt.tight_layout()
plt.savefig('rushing_diagnosis.png', dpi=300, bbox_inches='tight')
plt.close()
Conclusions
Diagnosis
The O-line is not the primary problem.
Evidence:
1. ALY ranks 20th: below average, but clearly better than the YPC rank (28th)
2. Backup RBs outperform behind the same line
3. Short yardage works when scheme is simple
4. Stuff rate (18.5%) is slightly better than league average (19.2%), so holes are being created
The starter running back is underperforming.
Evidence:
1. Lower YPC than backups in same system
2. Higher stuff rate suggesting hesitation
3. Fewer explosives despite similar opportunities
4. Worse EPA per carry
Recommendations
Immediate
- Increase the backups' carries: Give more work to the more efficient backs
- Analyze film for hesitation: The starter may be reading blocks slowly
- Simplify scheme for starter: Reduce decision-making complexity
Medium-term
- Consider RB1/RB2 split: No longer treat starter as bell cow
- Evaluate right side blocking: One area of consistent weakness
- Re-evaluate investment: Was the second-round pick the right value?
What NOT to do
- Don't fire the O-line coach: The line is performing adequately
- Don't draft another RB early: The issue is usage, not talent depth
- Don't abandon the run: The scheme works when executed
Caveats and Limitations
- Without tracking data, we can't precisely measure yards before contact
- Scheme differences may explain some RB variance (different play calls)
- Opponent factors not fully controlled
- Sample sizes for backups are smaller (more variance)
- Injury/fatigue not accounted for
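The sample-size caveat can be made concrete with a percentile bootstrap on yards per carry. This is a sketch on synthetic data rather than the team's actual carries: the helper name `bootstrap_mean_ci`, the normal-distribution stand-in for rushing gains, and the seeds are all assumptions made for illustration.

```python
import numpy as np


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement n_boot times and record each resample's mean
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Synthetic carries (not real data): same distribution, sized like the
# starter (247 carries) and Backup A (89 carries)
rng = np.random.default_rng(42)
starter_like = rng.normal(loc=4.0, scale=5.0, size=247).round()
backup_like = rng.normal(loc=4.0, scale=5.0, size=89).round()

for label, sample in [('247 carries', starter_like), ('89 carries', backup_like)]:
    lo, hi = bootstrap_mean_ci(sample)
    print(f"{label}: mean {sample.mean():.2f}, 95% CI width {hi - lo:.2f}")
```

Even though both samples come from the same distribution, the interval for the 89-carry sample is noticeably wider. On real data, passing each back's `yards_gained` values to the same function would show how much of the YPC gap survives that uncertainty.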
Discussion Questions
- Could the starter's hesitation be a response to O-line failures the metrics don't capture?
- How would this analysis change if we had Next Gen Stats yards before contact data?
- What other factors might explain the difference between starter and backups?
- Should teams ever move on from a high draft pick RB based on efficiency metrics?
- How might coaching adjustments help the starter improve?