
Chapter 7: Rushing Analytics

Chapter Overview

The running back position has undergone a dramatic philosophical shift in modern NFL analytics. Once considered the cornerstone of championship teams, rushing production now faces scrutiny for its relatively low value compared to passing. This chapter explores why traditional rushing statistics mislead evaluators, how Expected Points Added reveals the true value of running plays, and when rushing actually does matter. You'll learn to separate scheme from talent, understand the contextual factors that inflate or deflate rushing numbers, and develop a nuanced framework for evaluating running backs in today's NFL.

Learning Objectives

By the end of this chapter, you will be able to:

  1. Explain why yards per carry is a flawed metric for RB evaluation
  2. Calculate and interpret EPA-based rushing metrics
  3. Understand the relationship between opportunity and production
  4. Analyze how game script and score affect rushing statistics
  5. Evaluate offensive line contribution to rushing success
  6. Identify situational rushing value (goal line, short yardage)
  7. Compare running backs using efficiency and volume metrics

7.1 The Devaluation of the Running Back

Historical Context

For decades, the running back was the glamour position. The NFL's greatest teams featured legendary runners: Jim Brown, Walter Payton, Emmitt Smith, Barry Sanders. Teams drafted running backs early, paid them handsomely, and built offenses around their ground games.

Then came the analytics revolution.

Data revealed uncomfortable truths:

  1. Passing is more efficient: The average pass play produces positive EPA, while the average rush produces slightly negative EPA
  2. Running back production is replaceable: The gap between "elite" and "average" RBs appears smaller than at most other skill positions
  3. Running backs have short careers: The typical peak lasts 2-4 years, making long-term contracts risky
  4. Opportunity drives statistics: Volume explains more of a back's rushing totals than talent does

This doesn't mean rushing is worthless—far from it. But it does mean we need better tools to evaluate when rushing matters, which backs truly add value, and how much teams should invest in the position.

The Pass-vs-Rush Efficiency Gap

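import nfl_data_py as nfl
import pandas as pd
import numpy as np

# Load 2023 play-by-play data
pbp = nfl.import_pbp_data([2023])

# Keep only offensive pass and run plays with a valid EPA value
# (otherwise kickoffs, punts, field goals, and penalties appear as play types too)
plays = pbp[pbp['play_type'].isin(['pass', 'run']) & pbp['epa'].notna()]

# Compare pass vs rush efficiency
play_type_epa = plays.groupby('play_type').agg(
    plays=('play_id', 'count'),
    total_epa=('epa', 'sum'),
    epa_per_play=('epa', 'mean'),
    success_rate=('epa', lambda x: (x > 0).mean())
).round(3)

print(play_type_epa[['plays', 'epa_per_play', 'success_rate']])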

Typical results:

Play Type   EPA/Play   Success Rate
Pass        ~0.05      ~45%
Run         ~-0.05     ~42%

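The gap is stark: an average pass is worth about 0.10 EPA more than an average rush. Over a season with 500 rushing attempts, that difference adds up to roughly 50 expected points (500 × 0.10), worth more than a full win under common points-per-win rules of thumb. That assumes those carries could be swapped for league-average passes, which defensive adjustments would not fully allow, but the scale of the gap is clear.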

Why the gap exists:

  1. Passes can gain more yards per play (higher ceiling)
  2. Passes are more likely to gain first downs on third-and-long
  3. Passing yards have become more reliable as rules favor offense
  4. Defensive penalties are more common against the pass

7.2 Problems with Traditional Rushing Metrics

Yards Per Carry: The Illusion of Efficiency

Yards per carry (YPC) is the most commonly cited rushing metric. A back averaging 5.0 YPC seems excellent; 3.5 YPC seems poor. But YPC suffers from critical flaws:

Problem 1: It ignores context

A 3-yard gain on 3rd-and-2 is more valuable than a 6-yard gain on 1st-and-10 with a 21-point lead: the first converts a key first down, the second barely affects the outcome. YPC credits the second run with twice the value.

Problem 2: It's heavily scheme-dependent

Outside zone schemes often produce higher YPC than power running schemes, regardless of RB talent.

Problem 3: It rewards volatility over consistency

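A back with carries of 1, 1, 1, 1, 16 yards (YPC = 4.0) appears equal to one with 4, 4, 4, 4, 4 yards (YPC = 4.0). But the consistent back is usually more valuable: football rewards moving the chains, and four 1-yard carries leave the offense in long-yardage situations that a single 16-yard run only partly makes up for.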
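
A minimal sketch of the comparison above in plain Python. It assumes, for illustration only, that every carry comes on 1st-and-10, so a carry counts as "on schedule" when it gains at least 4 yards (40% of the distance, the traditional first-down success threshold). The helper `summarize_carries` is hypothetical.

```python
import statistics


def summarize_carries(carries, on_schedule_yards=4):
    """Summarize a list of per-carry yardages.

    on_schedule_yards=4 assumes every carry is 1st-and-10 and applies the
    traditional 40%-of-distance success threshold (illustrative only).
    """
    return {
        "ypc": sum(carries) / len(carries),
        "median": statistics.median(carries),
        "on_schedule_rate": sum(c >= on_schedule_yards for c in carries) / len(carries),
    }


back_a = [1, 1, 1, 1, 16]  # volatile: four stuffs and one breakaway
back_b = [4, 4, 4, 4, 4]   # consistent: every carry keeps the offense on schedule

for label, carries in [("Back A", back_a), ("Back B", back_b)]:
    print(label, summarize_carries(carries))
```

Both backs average 4.0 YPC, but the median and the on-schedule rate separate them: Back A stays on schedule on one carry in five, Back B on every carry.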

Problem 4: It conflates the line with the back

The offensive line creates (or doesn't create) the initial running lanes. YPC reflects O-line performance as much as RB skill.

The Distribution of Rushing Plays

Rushing outcomes follow a particular pattern:

# Analyze rushing outcome distribution
rushes = pbp[pbp['rush_attempt'] == 1].copy()  # copy so added columns don't trigger SettingWithCopyWarning

# Create yard buckets
rushes['yard_bucket'] = pd.cut(
    rushes['yards_gained'],
    bins=[-50, -1, 0, 3, 5, 10, 20, 100],
    labels=['Loss', '0', '1-3', '4-5', '6-10', '11-20', '21+']
)

distribution = rushes['yard_bucket'].value_counts(normalize=True).sort_index()
print("Rushing Outcome Distribution:")
print(distribution.round(3))

Typical distribution:

  • ~20% of rushes gain 0 or lose yards
  • ~35% gain 1-3 yards
  • ~20% gain 4-5 yards
  • ~15% gain 6-10 yards
  • ~10% gain 11+ yards

Most rushes cluster in a narrow, low-yardage band. The occasional explosive run inflates YPC while masking the high frequency of stuffed plays.

The Problem with Rushing Touchdowns

Rushing touchdowns seem like a measure of production, but they're largely a function of:

  1. Team red zone possessions (how often does the offense get close?)
  2. Goal-line opportunities (scheme and game script)
  3. Randomness (small sample sizes)

A back on a high-scoring offense that prefers to run near the goal line will usually score more than a better back on a weaker offense. TDs tell us more about opportunity than ability.
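
One way to see how much of a touchdown total is opportunity is to split it into red-zone carries and touchdowns per red-zone carry. This is a sketch assuming nflfastR-style columns (`yardline_100`, `rush_touchdown`, `rusher_player_name`) and the conventional red-zone definition (inside the opponent's 20); the function name is hypothetical, and it ignores long touchdown runs that start outside the red zone.

```python
import pandas as pd


def red_zone_td_profile(rushes: pd.DataFrame) -> pd.DataFrame:
    """Split rushing TDs into opportunity (red-zone carries) and conversion."""
    # Red zone: inside the opponent's 20 (yardline_100 = yards to the end zone)
    rz = rushes[rushes['yardline_100'] <= 20]

    profile = (rz
        .groupby('rusher_player_name')
        .agg(rz_carries=('rush_touchdown', 'count'),
             rz_tds=('rush_touchdown', 'sum'))
    )
    profile['td_per_rz_carry'] = profile['rz_tds'] / profile['rz_carries']
    return profile.sort_values('rz_carries', ascending=False)
```

Two backs with similar touchdown totals can look very different here: one may simply have had far more red-zone carries.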


7.3 EPA-Based Rushing Metrics

Expected Points Added for Rushes

EPA measures the change in expected points from before to after a play. For rushing:

def calculate_rush_epa(pbp: pd.DataFrame, min_carries: int = 100) -> pd.DataFrame:
    """Calculate EPA-based rushing metrics."""
    rushes = pbp[pbp['rush_attempt'] == 1].copy()

    rb_stats = (rushes
        .groupby('rusher_player_name')
        .agg(
            carries=('rush_attempt', 'sum'),
            total_epa=('epa', 'sum'),
            epa_per_carry=('epa', 'mean'),
            yards=('yards_gained', 'sum'),
            ypc=('yards_gained', 'mean'),
            success_rate=('epa', lambda x: (x > 0).mean()),
            first_down_rate=('first_down', 'mean'),
            td_rate=('rush_touchdown', 'mean'),
            fumble_rate=('fumble_lost', 'mean')
        )
        .query(f'carries >= {min_carries}')
        .sort_values('epa_per_carry', ascending=False)
        .round(3)
    )

    return rb_stats

Interpreting Rush EPA

Unlike passing, where positive EPA is common, the average rushing play has negative EPA. The benchmarks for rushers therefore sit lower:

EPA/Carry        Interpretation
> 0.05           Excellent
0.00 to 0.05     Above average
-0.10 to 0.00    Average
-0.15 to -0.10   Below average
< -0.15          Poor

Key insight: An RB with 0.00 EPA/carry is above average. Breaking even is impressive in an inherently negative-value play type.
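
The tiers in the table above can be attached to a metrics table with `pd.cut`. A sketch: the bin edges simply mirror the table, and the function and constant names are hypothetical. `right=False` closes each interval on the left, so exactly 0.00 EPA/carry lands in "Above average", consistent with the key insight above.

```python
import numpy as np
import pandas as pd

# Edges and labels mirror the EPA/carry interpretation table.
# right=False makes intervals closed on the left: 0.00 -> "Above average",
# 0.05 -> "Excellent".
EPA_TIER_EDGES = [-np.inf, -0.15, -0.10, 0.00, 0.05, np.inf]
EPA_TIER_LABELS = ['Poor', 'Below average', 'Average', 'Above average', 'Excellent']


def label_rush_epa_tier(epa_per_carry: pd.Series) -> pd.Series:
    """Map EPA/carry values onto the chapter's interpretation tiers."""
    return pd.cut(epa_per_carry, bins=EPA_TIER_EDGES,
                  labels=EPA_TIER_LABELS, right=False)


example = pd.Series([-0.20, -0.12, -0.05, 0.00, 0.08],
                    index=['RB1', 'RB2', 'RB3', 'RB4', 'RB5'])
print(label_rush_epa_tier(example))
```

Applied to the output of `calculate_rush_epa`, this would be something like `rb_stats['tier'] = label_rush_epa_tier(rb_stats['epa_per_carry'])`.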

Success Rate: Consistency Over Explosiveness

Success rate measures how often a player adds value:

# Define success (EPA > 0)
rushes['success'] = rushes['epa'] > 0

# Alternative: traditional success, i.e. gaining 40% of the yards needed on
# 1st down, 50% on 2nd down, and 100% on 3rd or 4th down (vectorized)
required_share = np.select(
    [rushes['down'] == 1, rushes['down'] == 2],
    [0.40, 0.50],
    default=1.0  # 3rd or 4th down
)
rushes['trad_success'] = rushes['yards_gained'] >= required_share * rushes['ydstogo']

Success rate captures consistency better than YPC:

  • A back with 50% success rate consistently moves chains
  • A back with 35% success rate often stalls drives
  • The gap between two such backs is often larger than their YPC suggests, because a few long runs can prop up the YPC of a back who is frequently stuffed

The Negative Expected Value Problem

Because average rushing EPA is negative, comparing RBs by raw EPA against a zero baseline can be misleading:

# Example: Two RBs
rb_a = {'carries': 300, 'epa_per_carry': -0.02}  # "Good" back
rb_b = {'carries': 150, 'epa_per_carry': -0.08}  # "Bad" back

# Total EPA
rb_a_total = 300 * -0.02  # -6 EPA
rb_b_total = 150 * -0.08  # -12 EPA

# RB A is better per carry and in total, but...
# both still lose expected points, where an average pass (~+0.05 EPA) gains them

This is why analysts argue for reduced rushing volume—even "good" rushing often costs expected points compared to passing alternatives.
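
One way to avoid the zero-baseline trap is to measure each back against a baseline rush instead of against zero. The sketch below reuses the two hypothetical backs from the example above and assumes a league-average rush of about -0.05 EPA, the rough figure from the pass-vs-rush comparison in Section 7.1; in practice you would compute the baseline from the same season's data. The function name is hypothetical.

```python
def epa_over_baseline(carries: int, epa_per_carry: float,
                      baseline_epa_per_carry: float = -0.05) -> float:
    """Total EPA relative to a baseline rusher given the same carries.

    baseline_epa_per_carry=-0.05 is an assumed league-average rush;
    recompute it from your own season's play-by-play.
    """
    return carries * (epa_per_carry - baseline_epa_per_carry)


rb_a_vs_avg = epa_over_baseline(300, -0.02)  # about +9 EPA vs an average back
rb_b_vs_avg = epa_over_baseline(150, -0.08)  # about -4.5 EPA vs an average back
print(f"RB A: {rb_a_vs_avg:+.1f}  RB B: {rb_b_vs_avg:+.1f}")
```

Against an average back, RB A adds value even though his raw total is negative, while RB B costs points. Whether rushing should happen at all is a separate question, answered by setting the baseline to the EPA of an average pass instead.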


7.4 Opportunity vs. Production

The Volume-Efficiency Tradeoff

A fundamental challenge in rushing analysis: volume and efficiency can correlate negatively.

# Analyze volume vs efficiency relationship
rb_stats = calculate_rush_epa(pbp, min_carries=50)

# Correlation
corr = rb_stats['carries'].corr(rb_stats['epa_per_carry'])
print(f"Correlation between carries and EPA/carry: {corr:.3f}")

# Often expected to be negative, but the sign and size vary by season and min_carries cutoff

Why this happens:

  1. Regression to the mean: High early-season efficiency leads to more carries, then performance regresses
  2. Game script: Teams ahead run more but face stacked boxes
  3. Fatigue: More carries may reduce per-carry efficiency
  4. Defense adjusts: Consistent usage becomes predictable
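
The regression-to-the-mean mechanism in point 1 can be seen in a small simulation. This is a sketch under strong simplifying assumptions: every simulated back has the same true EPA/carry (-0.05), per-carry EPA is treated as normal noise with a standard deviation of 1.2 (an illustrative value, not an estimate), and backs with above-median first-half EPA/carry get twice the second-half carries. Because talent is identical, any first-half edge is pure noise.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

N_BACKS = 2000
TRUE_EPA = -0.05          # assumed identical true EPA/carry for every back
PER_CARRY_SD = 1.2        # assumed spread of single-carry EPA (illustrative)
FIRST_HALF_CARRIES = 80


def observed_epa_per_carry(n_carries):
    """Sample mean EPA/carry over n carries, approximating per-carry EPA as normal."""
    return rng.normal(TRUE_EPA, PER_CARRY_SD / np.sqrt(n_carries))


# First half: identical workloads, so differences between backs are pure noise
first_half = observed_epa_per_carry(np.full(N_BACKS, FIRST_HALF_CARRIES))

# "Hot" backs (above the median) are rewarded with twice the second-half carries
hot = first_half > np.median(first_half)
second_half_carries = np.where(hot, 120, 60)
second_half = observed_epa_per_carry(second_half_carries)

print(f"Hot backs, first half EPA/carry:  {first_half[hot].mean():+.3f}")
print(f"Hot backs, second half EPA/carry: {second_half[hot].mean():+.3f}")
print(f"True EPA/carry for every back:    {TRUE_EPA:+.3f}")
```

The hot backs' first-half edge comes from selecting on noise; once they get the extra work, their EPA/carry falls back toward the shared true value.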

Yards Created vs. Yards Given

Not all rushing yards are equal. Some yards are "given" by the blocking scheme; others are "created" by the runner:

Yards before contact (YBC): Distance gained before first contact by a defender

  • Primarily reflects O-line and scheme
  • Often called "easy yards"

Yards after contact (YAC): Distance gained after initial contact

  • More attributable to the runner
  • Reflects power, elusiveness, and vision

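# If per-carry YBC/YAC data is available (e.g. from a tracking or charting provider;
# these columns are not part of standard nflfastR play-by-play)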
def decompose_rushing_production(rushes: pd.DataFrame) -> pd.DataFrame:
    """Separate O-line contribution from RB contribution."""
    if 'yards_before_contact' not in rushes.columns:
        print("Yards before/after contact not in dataset")
        return None

    decomposition = (rushes
        .groupby('rusher_player_name')
        .agg(
            carries=('rush_attempt', 'sum'),
            total_yards=('yards_gained', 'sum'),
            ybc=('yards_before_contact', 'sum'),
            yac=('yards_after_contact', 'sum'),
            ybc_per_carry=('yards_before_contact', 'mean'),
            yac_per_carry=('yards_after_contact', 'mean')
        )
    )

    decomposition['pct_ybc'] = decomposition['ybc'] / decomposition['total_yards']
    decomposition['pct_yac'] = decomposition['yac'] / decomposition['total_yards']

    return decomposition

Evaluating Rushing Talent vs. Situation

To isolate RB talent from situation:

  1. Compare to replacement: How does the RB perform vs. backups in the same system?
  2. Examine YAC specifically: Yards after contact better reflect individual skill
  3. Control for down/distance: Compare on similar play types
  4. Consider box defenders: Performance against stacked boxes shows ability under pressure

def analyze_vs_stacked_box(rushes: pd.DataFrame) -> pd.DataFrame:
    """Analyze performance against different box counts."""
    # Stacked box: 8+ defenders in box
    if 'defenders_in_box' not in rushes.columns:
        print("Defenders in box not available")
        return None

    rushes = rushes.copy()  # avoid mutating the caller's frame
    rushes['stacked_box'] = rushes['defenders_in_box'] >= 8

    by_box = (rushes
        .groupby(['rusher_player_name', 'stacked_box'])
        .agg(
            carries=('rush_attempt', 'sum'),
            epa=('epa', 'mean'),
            ypc=('yards_gained', 'mean'),
            success=('epa', lambda x: (x > 0).mean())
        )
        .unstack()
    )

    return by_box
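
Step 3 in the list above (control for down/distance) can be approximated by comparing each carry with the average rush EPA in the same situation and averaging the residual by rusher. This is a sketch: the distance buckets and the function name `situation_adjusted_rush_epa` are assumptions, and it uses only nflfastR-style columns already used in this chapter (`down`, `ydstogo`, `epa`, `rusher_player_name`).

```python
import pandas as pd


def situation_adjusted_rush_epa(rushes: pd.DataFrame, min_carries: int = 1) -> pd.DataFrame:
    """EPA/carry relative to the average rush in the same down and distance bucket."""
    df = rushes.dropna(subset=['down', 'ydstogo', 'epa']).copy()

    # Assumed distance buckets: short (1-2), medium (3-6), long (7+)
    df['dist_bucket'] = pd.cut(df['ydstogo'], bins=[0, 2, 6, 100],
                               labels=['short', 'medium', 'long'])

    # Average rush EPA in each down/distance cell, aligned back to every carry
    baseline = df.groupby(['down', 'dist_bucket'], observed=True)['epa'].transform('mean')
    df['epa_over_situation'] = df['epa'] - baseline

    return (df
        .groupby('rusher_player_name')
        .agg(carries=('epa', 'count'),
             raw_epa=('epa', 'mean'),
             adj_epa=('epa_over_situation', 'mean'))
        .query('carries >= @min_carries')
        .sort_values('adj_epa', ascending=False))
```

For a season of rushes this would be called as `situation_adjusted_rush_epa(rushes, min_carries=100)`. The adjustment removes down-and-distance mix only; it does not account for box count, game script, or blocking.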

7.5 Game Script and Rushing Context

The Score Effect

Game script dramatically affects rushing statistics:

def analyze_game_script_effect(rushes: pd.DataFrame) -> pd.DataFrame:
    """Analyze rushing by score differential."""
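    rushes = rushes.copy()  # avoid mutating the caller's frame
    # Half-point edges put integer score differentials in unambiguous buckets
    rushes['game_state'] = pd.cut(
        rushes['score_differential'],
        bins=[-np.inf, -13.5, -6.5, -0.5, 0.5, 6.5, 13.5, np.inf],
        labels=['Down 14+', 'Down 7-13', 'Down 1-6', 'Tied',
                'Up 1-6', 'Up 7-13', 'Up 14+']
    )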

    script_analysis = (rushes
        .groupby('game_state')
        .agg(
            carries=('rush_attempt', 'count'),
            epa=('epa', 'mean'),
            ypc=('yards_gained', 'mean'),
            success=('epa', lambda x: (x > 0).mean())
        )
        .round(3)
    )

    return script_analysis

Typical findings:

Game State   EPA/Carry   YPC       Notes
Down 14+     Higher      Lower     Defense expects pass
Close game   Lower       Average   Normal game flow
Up 14+       Lower       Lower     Defense stacks box, less consequence

Key insight: Running backs on winning teams accumulate garbage-time carries that drag down their efficiency while having little effect on the outcome. Backs on losing teams get fewer carries because their teams are forced to pass.

Fourth Quarter Considerations

Late-game rushing presents unique analytical challenges:

def fourth_quarter_analysis(rushes: pd.DataFrame) -> pd.DataFrame:
    """Analyze 4th quarter rushing dynamics."""
    q4 = rushes[rushes['qtr'] == 4].copy()  # copy before adding columns

    # Split by game competitiveness
    q4['competitive'] = abs(q4['score_differential']) <= 7
    q4['running_clock'] = q4['score_differential'] > 0  # leading teams want to burn clock

    analysis = (q4
        .groupby(['competitive', 'running_clock'])
        .agg(
            carries=('rush_attempt', 'count'),
            epa=('epa', 'mean'),
            success=('epa', lambda x: (x > 0).mean())
        )
    )

    return analysis

When a team is running out the clock:

  • Success is redefined: Not losing yards and burning clock is success
  • EPA is a poor fit: The goal is burning clock rather than maximizing points, which win probability added (WPA) captures better
  • Volume inflates: Many low-value carries accumulate

Adjusting for Game Context

To fairly evaluate running backs:

def context_adjusted_rushing(rushes: pd.DataFrame) -> pd.DataFrame:
    """Adjust rushing stats for game context."""
    # Filter to "meaningful" carries
    meaningful = rushes[
        (rushes['qtr'] <= 3) |  # First 3 quarters
        (abs(rushes['score_differential']) <= 14)  # Or close in 4th
    ]

    adjusted_stats = (meaningful
        .groupby('rusher_player_name')
        .agg(
            adj_carries=('rush_attempt', 'sum'),
            adj_epa=('epa', 'mean'),
            adj_success=('epa', lambda x: (x > 0).mean())
        )
    )

    # Compare to raw stats
    raw_stats = (rushes
        .groupby('rusher_player_name')
        .agg(
            raw_carries=('rush_attempt', 'sum'),
            raw_epa=('epa', 'mean')
        )
    )

    combined = adjusted_stats.join(raw_stats)
    combined['context_penalty'] = combined['raw_epa'] - combined['adj_epa']

    return combined

7.6 Offensive Line and Scheme Effects

The O-Line Problem

Running back evaluation is inseparable from offensive line evaluation:

def team_rush_blocking_quality(pbp: pd.DataFrame) -> pd.DataFrame:
    """Estimate team rush blocking quality."""
    rushes = pbp[pbp['rush_attempt'] == 1]

    team_rushing = (rushes
        .groupby('posteam')
        .agg(
            carries=('rush_attempt', 'sum'),
            epa=('epa', 'mean'),
            ypc=('yards_gained', 'mean'),
            success_rate=('epa', lambda x: (x > 0).mean()),
            # Stuffed rate: runs for 0 or negative
            stuff_rate=('yards_gained', lambda x: (x <= 0).mean())
        )
        .sort_values('epa', ascending=False)
    )

    return team_rushing

What the O-line controls:

  • Initial hole creation
  • Yards before contact
  • Stuff rate (runs for 0 or loss)

What the RB controls:

  • Vision (finding and hitting the hole)
  • Yards after contact
  • Breakaway speed

Scheme-Based Rushing Styles

Different schemes produce different rushing profiles:

Outside Zone:

  • Lateral movement before hitting the hole
  • Cutback opportunities
  • Generally higher YPC
  • Requires vision and patience

Inside Zone:

  • Attack between the tackles
  • Quick decisions
  • More consistent, lower variance

Power/Gap:

  • Pulling linemen create leads
  • Downhill running
  • Favors power over speed

def analyze_rush_direction(rushes: pd.DataFrame) -> pd.DataFrame:
    """Analyze rushing by gap/direction."""
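    # Use run_gap and run_location if available
    if 'run_gap' not in rushes.columns or 'run_location' not in rushes.columns:
        return None

    # nflfastR leaves run_gap empty on runs up the middle; label those 'middle'
    # so groupby does not silently drop them
    rushes = rushes.assign(
        gap=rushes['run_gap'].where(rushes['run_location'] != 'middle', 'middle')
    )

    direction_analysis = (rushes
        .groupby('gap')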
        .agg(
            carries=('rush_attempt', 'count'),
            epa=('epa', 'mean'),
            ypc=('yards_gained', 'mean'),
            success=('epa', lambda x: (x > 0).mean()),
            big_play_rate=('yards_gained', lambda x: (x >= 10).mean())
        )
        .sort_values('epa', ascending=False)
    )

    return direction_analysis

Isolating RB from System

Perfect isolation is impossible, but approximations help:

  1. Multiple RBs in same system: Compare backs sharing snaps
  2. Same RB, different systems: Track performance across team changes
  3. Yards after contact focus: Emphasize RB-attributable production
  4. Regression models: Control for O-line, scheme, and game factors

def compare_backfield_mates(rushes: pd.DataFrame, team: str) -> pd.DataFrame:
    """Compare RBs sharing the same backfield."""
    team_rushes = rushes[rushes['posteam'] == team]

    comparison = (team_rushes
        .groupby('rusher_player_name')
        .agg(
            carries=('rush_attempt', 'sum'),
            epa=('epa', 'mean'),
            success=('epa', lambda x: (x > 0).mean()),
            ypc=('yards_gained', 'mean')
        )
        .query('carries >= 30')
        .sort_values('epa', ascending=False)
    )

    return comparison

7.7 Situational Rushing Value

Where Running Matters

While passing is more efficient on average, specific situations favor running:

Short Yardage (3rd/4th and 1-2)

def short_yardage_analysis(pbp: pd.DataFrame) -> pd.DataFrame:
    """Analyze short yardage rushing."""
    short = pbp[
        (pbp['down'].isin([3, 4])) &
        (pbp['ydstogo'] <= 2) &
        (pbp['play_type'].isin(['pass', 'run']))  # drop punts, field goals, penalties
    ]

    by_play_type = (short
        .groupby('play_type')
        .agg(
            plays=('play_id', 'count'),
            conversion_rate=('first_down', 'mean'),
            epa=('epa', 'mean'),
            td_rate=('touchdown', 'mean')
        )
    )

    return by_play_type

In short-yardage situations, rushing often performs better:

  • A shorter required gain reduces the advantage of passing's higher variance
  • Lower interception risk
  • Physical running is more reliable when only a yard or two is needed
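
Short-yardage samples for a single back or team can be small, which is one reason the chapter warns about randomness in small samples. A minimal sketch of attaching a 95% Wilson score interval (a standard binomial interval, used here as an illustration) to a conversion rate; the function name and the 14-of-20 example are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials == 0:
        return (0.0, 1.0)  # no information
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return (center - half_width, center + half_width)


# Hypothetical example: a back converts 14 of 20 short-yardage carries (70%)
low, high = wilson_interval(14, 20)
print(f"70% conversion, 95% interval: {low:.0%} to {high:.0%}")
```

On 20 attempts the interval is wide, so a 70% converter and a league-average converter may be hard to tell apart.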

Goal Line (Inside the 5)

def goal_line_analysis(pbp: pd.DataFrame) -> pd.DataFrame:
    """Analyze goal line rushing."""
    goal_line = pbp[(pbp['yardline_100'] <= 5) & pbp['play_type'].isin(['pass', 'run'])]

    gl_by_type = (goal_line
        .groupby('play_type')
        .agg(
            plays=('play_id', 'count'),
            td_rate=('touchdown', 'mean'),
            epa=('epa', 'mean')
        )
    )

    return gl_by_type

Goal-line rushing has unique dynamics:

  • Compressed field limits defensive options
  • Physical power is at a premium
  • Play-action becomes more effective

Late-Game Clock Management

When protecting leads, rushing serves non-scoring purposes:

def clock_killing_value(rushes: pd.DataFrame) -> pd.DataFrame:
    """Assess rushing in clock-killing situations."""
    clock_kill = rushes[
        (rushes['qtr'] == 4) &
        (rushes['score_differential'] > 0) &
        (rushes['score_differential'] <= 14) &
        (rushes['half_seconds_remaining'] <= 300)  # Last 5 minutes
    ].copy()  # copy before adding the clock_success column

    # Adjusted success: don't fumble, don't lose big
    clock_kill['clock_success'] = (
        (clock_kill['yards_gained'] >= -2) &
        (clock_kill['fumble_lost'] != 1)
    )

    analysis = (clock_kill
        .groupby('rusher_player_name')
        .agg(
            carries=('rush_attempt', 'sum'),
            clock_success=('clock_success', 'mean'),
            fumbles=('fumble_lost', 'sum')
        )
        .query('carries >= 10')
    )

    return analysis

Identifying Situational Specialists

Some backs excel in specific roles:

def identify_specialists(rushes: pd.DataFrame, min_carries: int = 50) -> pd.DataFrame:
    """Identify RB specialization roles."""
    # Short yardage carries
    short = rushes[rushes['ydstogo'] <= 2]
    short_specialists = (short
        .groupby('rusher_player_name')
        .agg(short_carries=('rush_attempt', 'sum'),
             short_success=('epa', lambda x: (x > 0).mean()))
    )

    # Receiving-back roles would need target data (see Section 7.8)

    # Overall volume
    volume = (rushes
        .groupby('rusher_player_name')
        .agg(total_carries=('rush_attempt', 'sum'))
        .query(f'total_carries >= {min_carries}')
    )

    specialists = volume.join(short_specialists)

    # Flag short-yardage specialists
    specialists['short_yardage_specialist'] = (
        (specialists['short_carries'] >= 20) &
        (specialists['short_success'] > 0.55)
    )

    return specialists

7.8 Receiving Value for Running Backs

The Dual-Threat Premium

In modern NFL offenses, receiving ability separates elite backs from replacement-level:

def rushing_and_receiving_value(pbp: pd.DataFrame, min_touches: int = 100) -> pd.DataFrame:
    """Calculate combined rushing and receiving value."""
    # Rushing stats
    rushes = pbp[pbp['rush_attempt'] == 1]
    rush_stats = (rushes
        .groupby('rusher_player_name')
        .agg(
            carries=('rush_attempt', 'sum'),
            rush_epa=('epa', 'sum'),
            rush_epa_per=('epa', 'mean')
        )
    )

    # Receiving stats for RBs
    targets = pbp[(pbp['pass_attempt'] == 1) & pbp['receiver_player_name'].notna()]
    # Note: this keeps all receivers; restricting to RBs requires roster position data
    rec_stats = (targets
        .groupby('receiver_player_name')
        .agg(
            targets=('pass_attempt', 'sum'),
            receptions=('complete_pass', 'sum'),
            rec_epa=('epa', 'sum'),
            rec_epa_per=('epa', 'mean')
        )
    )

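    # Join on player names (abbreviated names can collide across different players)
    combined = rush_stats.join(rec_stats, how='outer').fillna(0)
    # "Touches" here means opportunities: carries plus targets, including incompletions
    combined['total_touches'] = combined['carries'] + combined['targets']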
    combined['total_epa'] = combined['rush_epa'] + combined['rec_epa']
    combined['epa_per_touch'] = combined['total_epa'] / combined['total_touches']

    return combined.query(f'total_touches >= {min_touches}')
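
The position filter left as a comment in the function above could be implemented by joining targets to a roster table. This sketch assumes a roster DataFrame with `player_id` and `position` columns (nfl_data_py's `import_seasonal_rosters` returns such a frame, but confirm the column names and ID format in your version) and uses the play-by-play `receiver_player_id` column; the helper name `filter_rb_targets` is hypothetical.

```python
import pandas as pd


def filter_rb_targets(pbp: pd.DataFrame, rosters: pd.DataFrame) -> pd.DataFrame:
    """Keep only pass attempts whose intended receiver is listed as an RB or FB."""
    rb_ids = set(
        rosters.loc[rosters['position'].isin(['RB', 'FB']), 'player_id'].dropna()
    )
    targets = pbp[(pbp['pass_attempt'] == 1) & pbp['receiver_player_id'].notna()]
    return targets[targets['receiver_player_id'].isin(rb_ids)]
```

For example, `rosters = nfl.import_seasonal_rosters([2023])`, then use `filter_rb_targets(pbp, rosters)` in place of the unfiltered `targets` frame before grouping.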

Why Receiving Matters More

RB receptions are often more valuable than rushes because:

  1. Higher EPA per play: Short passes are more efficient than short runs
  2. Mismatch creation: RBs vs. linebackers favor the offense
  3. Versatility value: Forces defense to respect multiple threats
  4. Third-down utility: Receiving backs stay on field in passing situations

Pass-Catching Metrics for RBs

def rb_receiving_analysis(pbp: pd.DataFrame, rb_names: list = None) -> pd.DataFrame:
    """Analyze RB receiving contributions over all targets, not just catches."""
    # Start from every targeted pass so incompletions count toward targets
    # (filtering to completions first would make catch_rate always 1.0)
    targets = pbp[(pbp['pass_attempt'] == 1) & pbp['receiver_player_name'].notna()]

    if rb_names:
        targets = targets[targets['receiver_player_name'].isin(rb_names)]

    rb_receiving = (targets
        .groupby('receiver_player_name')
        .agg(
            targets=('pass_attempt', 'count'),
            receptions=('complete_pass', 'sum'),
            yards=('yards_gained', 'sum'),
            yac=('yards_after_catch', 'sum'),  # NaN on incompletions; sum skips NaN
            epa=('epa', 'sum'),
            epa_per_target=('epa', 'mean'),
            first_downs=('first_down', 'sum')
        )
    )

    rb_receiving['catch_rate'] = rb_receiving['receptions'] / rb_receiving['targets']
    rb_receiving['yac_per_rec'] = rb_receiving['yac'] / rb_receiving['receptions']

    return rb_receiving

7.9 Comprehensive RB Evaluation Framework

Multi-Metric Evaluation

No single metric captures RB value. A comprehensive evaluation includes:

class RBEvaluator:
    """Comprehensive running back evaluation framework."""

    def __init__(self, pbp: pd.DataFrame, min_carries: int = 100):
        self.pbp = pbp
        self.min_carries = min_carries
        self.rushes = pbp[pbp['rush_attempt'] == 1]

    def calculate_all_metrics(self) -> pd.DataFrame:
        """Calculate comprehensive RB metrics."""
        metrics = (self.rushes
            .groupby('rusher_player_name')
            .agg(
                # Volume
                carries=('rush_attempt', 'sum'),
                yards=('yards_gained', 'sum'),

                # Efficiency
                epa_total=('epa', 'sum'),
                epa_per_carry=('epa', 'mean'),
                ypc=('yards_gained', 'mean'),
                success_rate=('epa', lambda x: (x > 0).mean()),

                # Scoring
                touchdowns=('rush_touchdown', 'sum'),
                td_rate=('rush_touchdown', 'mean'),

                # Ball security
                fumbles=('fumble_lost', 'sum'),
                fumble_rate=('fumble_lost', 'mean'),

                # Explosiveness
                big_runs=('yards_gained', lambda x: (x >= 10).sum()),
                explosive_rate=('yards_gained', lambda x: (x >= 10).mean()),
                long_run=('yards_gained', 'max'),

                # Consistency
                stuff_rate=('yards_gained', lambda x: (x <= 0).mean()),
                median_gain=('yards_gained', 'median')
            )
            .query(f'carries >= {self.min_carries}')
        )

        # Add rankings
        metrics['epa_rank'] = metrics['epa_per_carry'].rank(ascending=False)
        metrics['success_rank'] = metrics['success_rate'].rank(ascending=False)

        return metrics.sort_values('epa_per_carry', ascending=False)

    def situational_breakdown(self, rb_name: str) -> dict:
        """Generate situational performance breakdown."""
        rb_rushes = self.rushes[self.rushes['rusher_player_name'] == rb_name]

        if len(rb_rushes) < 50:
            return {"error": "Insufficient sample size"}

        breakdown = {
            'overall': {
                'carries': len(rb_rushes),
                'epa': rb_rushes['epa'].mean(),
                'success': (rb_rushes['epa'] > 0).mean()
            },
            'by_down': {},
            'by_quarter': {},
            'by_score': {}
        }

        # By down
        for down in [1, 2, 3]:
            down_rushes = rb_rushes[rb_rushes['down'] == down]
            if len(down_rushes) >= 20:
                breakdown['by_down'][f'down_{down}'] = {
                    'carries': len(down_rushes),
                    'epa': down_rushes['epa'].mean(),
                    'success': (down_rushes['epa'] > 0).mean()
                }

        # By quarter
        for qtr in [1, 2, 3, 4]:
            qtr_rushes = rb_rushes[rb_rushes['qtr'] == qtr]
            if len(qtr_rushes) >= 15:
                breakdown['by_quarter'][f'Q{qtr}'] = {
                    'carries': len(qtr_rushes),
                    'epa': qtr_rushes['epa'].mean()
                }

        # By game score
        ahead = rb_rushes[rb_rushes['score_differential'] > 7]
        behind = rb_rushes[rb_rushes['score_differential'] < -7]
        close = rb_rushes[abs(rb_rushes['score_differential']) <= 7]

        breakdown['by_score'] = {
            'ahead': {'carries': len(ahead), 'epa': ahead['epa'].mean() if len(ahead) > 0 else 0},
            'behind': {'carries': len(behind), 'epa': behind['epa'].mean() if len(behind) > 0 else 0},
            'close': {'carries': len(close), 'epa': close['epa'].mean() if len(close) > 0 else 0}
        }

        return breakdown

    def generate_report(self, rb_name: str) -> str:
        """Generate text evaluation report."""
        metrics = self.calculate_all_metrics()

        if rb_name not in metrics.index:
            return f"RB {rb_name} not found or doesn't meet minimum carries"

        rb = metrics.loc[rb_name]
        n_rbs = len(metrics)
        situational = self.situational_breakdown(rb_name)

        report = f"""
========================================
RB EVALUATION REPORT: {rb_name}
========================================

VOLUME: {int(rb['carries'])} carries, {int(rb['yards'])} yards

EFFICIENCY:
  EPA/Carry: {rb['epa_per_carry']:.3f} (Rank: {int(rb['epa_rank'])}/{n_rbs})
  YPC: {rb['ypc']:.1f}
  Success Rate: {rb['success_rate']*100:.1f}% (Rank: {int(rb['success_rank'])}/{n_rbs})

EXPLOSIVENESS:
  10+ Yard Runs: {int(rb['big_runs'])} ({rb['explosive_rate']*100:.1f}%)
  Longest Run: {int(rb['long_run'])} yards

CONSISTENCY:
  Stuff Rate (0 or less): {rb['stuff_rate']*100:.1f}%
  Median Gain: {rb['median_gain']:.1f} yards

BALL SECURITY:
  Fumbles Lost: {int(rb['fumbles'])}
  Fumble Rate: {rb['fumble_rate']*100:.2f}%

SCORING:
  Touchdowns: {int(rb['touchdowns'])}

SITUATIONAL:
"""
        if 'by_score' in situational:
            scores = situational['by_score']
            report += f"""
  When Ahead (8+): {scores['ahead']['carries']} carries, {scores['ahead']['epa']:.3f} EPA
  Close Game (within 7): {scores['close']['carries']} carries, {scores['close']['epa']:.3f} EPA
  When Behind (8+): {scores['behind']['carries']} carries, {scores['behind']['epa']:.3f} EPA
"""

        # Assessment
        report += "\nASSESSMENT:\n"
        if rb['epa_per_carry'] > 0:
            report += "  - Above-average efficiency (positive EPA rare for RBs)\n"
        if rb['success_rate'] > 0.45:
            report += "  - Highly consistent (45%+ success rate)\n"
        if rb['explosive_rate'] > 0.12:
            report += "  - Explosive threat (12%+ big play rate)\n"
        if rb['fumble_rate'] < 0.005:
            report += "  - Excellent ball security\n"
        if rb['stuff_rate'] > 0.22:
            report += "  - Concern: High stuff rate (may be O-line or vision)\n"

        return report

The Value Hierarchy

Based on analytics, RB value comes from (in order):

  1. Receiving ability: Highest EPA per opportunity
  2. Efficiency in meaningful situations: Close-game success rate
  3. Ball security: Fumbles have massive negative EPA
  4. Short-yardage conversion: Situational value
  5. Volume on good teams: Wins correlate with rushing volume (but causation is reversed)

7.10 Practical Applications

Draft and Contract Valuation

Analytics has transformed how teams value running backs:

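def expected_value_calculation(rb_stats: pd.DataFrame,
                               replacement_epa_per_carry: float = -0.08,
                               epa_per_win: float = 35.0,
                               value_per_win_m: float = 2.5) -> pd.DataFrame:
    """Estimate RB value for contract purposes (very rough)."""
    # Assumptions, all adjustable:
    # - replacement-level back: -0.08 EPA/carry
    # - roughly 30-40 points of scoring margin per NFL win (35 used here)
    # - value_per_win_m: $ millions per win, a placeholder to calibrate
    rb_stats = rb_stats.copy()

    # EPA above what a replacement back would produce on the same carries
    epa_above_replacement = (
        rb_stats['epa_total'] - replacement_epa_per_carry * rb_stats['carries']
    )
    rb_stats['estimated_wins_above_replacement'] = epa_above_replacement / epa_per_win
    rb_stats['estimated_value_M'] = (
        rb_stats['estimated_wins_above_replacement'] * value_per_win_m
    )

    return rb_stats[['carries', 'epa_total', 'estimated_wins_above_replacement', 'estimated_value_M']]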

Key insights for RB valuation:

  1. Don't pay for volume: Yards and TDs are available cheaply
  2. Pay for receiving: Dual-threat backs command premiums
  3. Avoid long contracts: Short peak windows create risk
  4. Draft for value: Day 2-3 picks can produce starting quality

Team Rushing Strategy Analysis

def analyze_team_rush_strategy(pbp: pd.DataFrame, team: str) -> dict:
    """Analyze a team's rushing strategy."""
    team_plays = pbp[pbp['posteam'] == team]
    # Restrict to offensive snaps (pass or run) so special teams and
    # penalty-only plays do not dilute the rates
    offense = team_plays[(team_plays['pass_attempt'] == 1) | (team_plays['rush_attempt'] == 1)]
    rushes = offense[offense['rush_attempt'] == 1]

    strategy = {
        'rush_rate': offense['rush_attempt'].mean(),
        'rush_epa': rushes['epa'].mean(),
        'first_down_rush_rate': (
            offense[offense['down'] == 1]['rush_attempt'].mean()
        ),
        'second_and_long_rush_rate': (
            offense[(offense['down'] == 2) & (offense['ydstogo'] >= 7)]['rush_attempt'].mean()
        ),
        'leading_rush_rate': (
            offense[offense['score_differential'] > 7]['rush_attempt'].mean()
        ),
        'behind_rush_rate': (
            offense[offense['score_differential'] < -7]['rush_attempt'].mean()
        )
    }

    return strategy

Identifying Undervalued Backs

def find_undervalued_rbs(pbp: pd.DataFrame) -> pd.DataFrame:
    """Identify potentially undervalued running backs."""
    rb_stats = RBEvaluator(pbp, min_carries=50).calculate_all_metrics()

    # High efficiency, low volume = potentially underused
    undervalued = rb_stats[
        (rb_stats['epa_per_carry'] > 0) &
        (rb_stats['carries'] < 150) &
        (rb_stats['success_rate'] > 0.45)
    ].sort_values('epa_per_carry', ascending=False)

    return undervalued

Chapter Summary

Key Takeaways

  1. Passing is more efficient than rushing on average by about 0.10 EPA per play
  2. YPC is misleading: It ignores context, rewards volatility, and conflates O-line with RB
  3. EPA and success rate provide better efficiency measures for rushers
  4. Game script inflates volume stats: Winning teams rush more in low-leverage situations
  5. The O-line drives a significant portion of rushing production (yards before contact)
  6. Receiving ability is the differentiating RB skill in modern NFL value
  7. Situational rushing matters: Short yardage, goal line, and clock management
  8. RB contracts are often bad investments: Short careers and replaceability

Common Analytical Mistakes

Mistake                       Better Approach
Using only YPC                Add success rate and EPA
Ignoring game script          Filter to competitive situations
Crediting RB for all yards    Separate YBC from YAC
Volume = value                Efficiency matters more
TD totals                     TD rate, context-adjusted

Looking Ahead

Chapter 8 explores receiving analytics, examining how to evaluate pass-catchers in a passing-dominated league. We'll learn about target share, efficiency metrics, separation, and the contested catch rate—metrics that increasingly drive offensive success.


Practice Exercises

See the accompanying exercises.md file for hands-on practice problems ranging from basic EPA calculations to comprehensive RB evaluation systems.

Further Reading

See further-reading.md for academic papers, industry resources, and data sources for advanced rushing analytics.