Case Study 2: Fourth Down Decision Model

Overview

This case study develops a predictive model to support fourth down decision-making. We'll build models to estimate conversion probability and combine them with expected value calculations to recommend optimal decisions.

Background

Fourth down is one of the highest-leverage decisions in football. Coaches must choose between:

  1. Go for it: Attempt to gain the necessary yards

  2. Punt: Give up possession in exchange for pushing the opponent back to worse field position

  3. Field goal: Attempt to score 3 points

Traditional coaching has been conservative, but analytics shows going for it is often undervalued.

Business Problem

A college football program wants a data-driven fourth down decision system that:

  1. Estimates conversion probability based on situation

  2. Calculates expected value for each option

  3. Provides clear recommendations with confidence levels

  4. Explains the reasoning in coach-friendly terms

  5. Works in real-time during games

Solution Design

Decision Framework

The expected value calculation:

Go for it:

EV_go = P(convert) × (EP_success) + P(fail) × (EP_fail)

Punt:

EV_punt = EP_after_punt

Field goal:

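EV_fg = P(make) × (3 + EP_after_kickoff) + P(miss) × EP_after_miss

Where EP = Expected Points based on field position, always measured from the deciding team's perspective. When the opponent takes over the ball (EP_fail, EP_after_punt, EP_after_kickoff, EP_after_miss), the value is the negative of the opponent's own expected points at that spot.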
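
As a worked illustration of the framework, the sketch below plugs invented numbers into the three formulas and picks the option with the highest expected value. The function names (`ev_go`, `ev_field_goal`) and every input value are assumptions made for this example, not outputs of the models built later. All EP inputs are from the deciding team's perspective, and `ep_after_kickoff` is an optional term for the value of the ensuing kickoff that defaults to 0.

```python
def ev_go(p_convert: float, ep_success: float, ep_fail: float) -> float:
    """EV of going for it: probability-weighted mix of the two outcomes."""
    return p_convert * ep_success + (1 - p_convert) * ep_fail


def ev_field_goal(p_make: float, ep_after_miss: float,
                  ep_after_kickoff: float = 0.0) -> float:
    """EV of a field goal attempt; the kickoff term is optional (0 ignores it)."""
    return p_make * (3 + ep_after_kickoff) + (1 - p_make) * ep_after_miss


# Hypothetical 4th-and-2 near the opponent's 35 (all numbers invented).
options = {
    "go_for_it": ev_go(p_convert=0.60, ep_success=3.2, ep_fail=-1.0),
    "punt": -0.3,  # EV_punt = EP_after_punt
    "field_goal": ev_field_goal(p_make=0.45, ep_after_miss=-1.3),
}
best = max(options, key=options.get)

for name, value in options.items():
    print(f"{name:>10}: {value:+.2f}")
print("Best option:", best)
```

With these inputs, going for it is worth 0.60 × 3.2 + 0.40 × (-1.0) = 1.52 expected points, ahead of both alternatives.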

Implementation

Part 1: Conversion Probability Model

"""
Fourth Down Decision Model
Part 1: Conversion Probability Model
"""

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import cross_val_score
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FourthDownSituation:
    """Fourth down game situation."""
    yards_to_go: int
    field_position: int  # Yards from own goal line
    score_differential: int
    time_remaining: float  # Minutes
    quarter: int


class ConversionProbabilityModel:
    """
    Predict fourth down conversion probability.
    """

    def __init__(self):
        self.model = None
        self.scaler = StandardScaler()
        self.feature_columns = [
            'yards_to_go', 'log_yards_to_go', 'field_position',
            'in_opponent_territory', 'red_zone', 'short_yardage',
            'time_pressure', 'trailing'
        ]

    def create_features(self, situations: pd.DataFrame) -> pd.DataFrame:
        """Create features from raw situations."""
        df = situations.copy()

        # Transform yards to go
        df['log_yards_to_go'] = np.log1p(df['yards_to_go'])

        # Field position indicators
        df['in_opponent_territory'] = (df['field_position'] > 50).astype(int)
        df['red_zone'] = (df['field_position'] >= 80).astype(int)

        # Yardage categories
        df['short_yardage'] = (df['yards_to_go'] <= 2).astype(int)

        # Game state
        df['time_pressure'] = ((df['quarter'] == 4) &
                               (df['time_remaining'] < 4)).astype(int)
        df['trailing'] = (df['score_differential'] < 0).astype(int)

        return df

    def train(self, plays: pd.DataFrame) -> Dict:
        """
        Train conversion probability model.

        Parameters:
        -----------
        plays : pd.DataFrame
            Historical fourth down plays with 'converted' target

        Returns:
        --------
        dict : Training metrics
        """
        # Create features
        df = self.create_features(plays)
        X = df[self.feature_columns]
        y = plays['converted']

        # Scale features
        X_scaled = self.scaler.fit_transform(X)

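        # Train calibrated model
        base_model = LogisticRegression(max_iter=1000, random_state=42)
        self.model = CalibratedClassifierCV(base_model, cv=5)

        # Cross-validate the uncalibrated base model before the final fit.
        # The scaler was fit on all rows, so these scores are slightly
        # optimistic. Accuracy is kept for reference, but the Brier score is
        # the better guide because the engine consumes probabilities.
        cv_scores = cross_val_score(base_model, X_scaled, y, cv=5, scoring='accuracy')
        cv_brier = -cross_val_score(base_model, X_scaled, y, cv=5,
                                    scoring='neg_brier_score')

        # Final fit
        self.model.fit(X_scaled, y)

        return {
            'cv_accuracy': cv_scores.mean(),
            'cv_std': cv_scores.std(),
            'cv_brier': cv_brier.mean(),
            'n_samples': len(y),
            'conversion_rate': y.mean()
        }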

    def predict(self, situation: FourthDownSituation) -> float:
        """
        Predict conversion probability for a situation.

        Returns:
        --------
        float : Conversion probability (0-1)
        """
        # Create feature dict
        features = {
            'yards_to_go': situation.yards_to_go,
            'field_position': situation.field_position,
            'score_differential': situation.score_differential,
            'time_remaining': situation.time_remaining,
            'quarter': situation.quarter
        }

        df = pd.DataFrame([features])
        df = self.create_features(df)

        X = df[self.feature_columns]
        X_scaled = self.scaler.transform(X)

        prob = self.model.predict_proba(X_scaled)[0, 1]
        return prob


class FieldGoalProbabilityModel:
    """
    Predict field goal success probability.
    """

    def __init__(self):
        self.model = None

    def predict(self, field_position: int) -> float:
        """
        Predict field goal probability.

        Uses a logistic function based on historical data.

        Parameters:
        -----------
        field_position : int
            Yards from own goal line

        Returns:
        --------
        float : Field goal probability
        """
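        # Kick distance = yards to the goal line + 17
        # (10-yard end zone + about 7 yards back for the snap and hold)
        distance = 100 - field_position + 17

        if distance > 60:
            return 0.0  # Too far for attempt

        # Logistic curve: P(make) = 1 / (1 + exp(0.1 * (distance - 35)))
        # This implies a 50% make rate at 35 yards, which is likely
        # pessimistic, so refit the midpoint (35) and slope (0.1) on your
        # own kick data before relying on the output.
        prob = 1 / (1 + np.exp(0.1 * (distance - 35)))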

        return min(0.95, prob)  # Cap at 95%
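
The `train` method above reports cross-validated accuracy, but the decision engine multiplies probabilities into expected values, so how well the probabilities match observed rates matters more than hard classification. The sketch below shows one way to check that with a Brier score and a binned reliability table. It is a NumPy-only illustration on synthetic stand-in data; `brier_score` and `reliability_table` are hypothetical helper names, not part of the case study's classes.

```python
import numpy as np


def brier_score(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return float(np.mean((y_prob - y_true) ** 2))


def reliability_table(y_true: np.ndarray, y_prob: np.ndarray, n_bins: int = 5):
    """Per-bin (mean predicted, observed rate, count) for equal-width bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin; clip so 1.0 falls in the last bin.
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((float(y_prob[mask].mean()),
                         float(y_true[mask].mean()),
                         int(mask.sum())))
    return rows


rng = np.random.default_rng(0)
# Stand-in predictions: the "model" here is the true generating probability.
yards_to_go = rng.integers(1, 15, size=5000)
predicted = np.clip(0.7 - 0.05 * yards_to_go, 0.15, 0.75)
outcomes = (rng.random(5000) < predicted).astype(int)

baseline = np.full(5000, outcomes.mean())  # always predict the base rate
print(f"Brier (model):    {brier_score(outcomes, predicted):.4f}")
print(f"Brier (baseline): {brier_score(outcomes, baseline):.4f}")
for mean_pred, observed, count in reliability_table(outcomes, predicted):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

A lower Brier score is better, and a useful model should beat the base-rate baseline. In the reliability table, well-calibrated predictions show observed rates close to the mean predicted value in each bin.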

Part 2: Expected Points Model

"""
Fourth Down Decision Model
Part 2: Expected Points Model
"""

import numpy as np
from typing import Dict


class ExpectedPointsModel:
    """
    Expected points by field position model.
    """

    # Pre-computed EP values by yard line (from own goal)
    # Based on historical drive outcomes
    EP_BY_YARD_LINE = {
        1: -0.6, 5: -0.4, 10: -0.1, 15: 0.1, 20: 0.3,
        25: 0.5, 30: 0.7, 35: 1.0, 40: 1.3, 45: 1.6,
        50: 2.0, 55: 2.4, 60: 2.8, 65: 3.2, 70: 3.6,
        75: 4.0, 80: 4.4, 85: 4.8, 90: 5.2, 95: 5.6,
        99: 6.0
    }

    def get_ep(self, field_position: int, possession: str = 'offense') -> float:
        """
        Get expected points for a field position.

        Parameters:
        -----------
        field_position : int
            Yards from own goal line (1-99)
        possession : str
            'offense' or 'defense'

        Returns:
        --------
        float : Expected points
        """
        # Clamp to valid range
        fp = max(1, min(99, field_position))

        # Find the table entries that bracket fp. The table runs 1, 5, 10, ...,
        # 95, 99, so the first and last segments are 4 yards wide, not 5.
        base = (fp // 5) * 5
        lower = max(1, base)
        upper = min(99, base + 5)

        lower_ep = self.EP_BY_YARD_LINE[lower]
        upper_ep = self.EP_BY_YARD_LINE[upper]

        # Linear interpolation
        if upper > lower:
            ep = lower_ep + (upper_ep - lower_ep) * (fp - lower) / (upper - lower)
        else:
            ep = lower_ep

        # If defense has ball, negate
        if possession == 'defense':
            ep = -ep

        return ep

    def get_punt_ep(self, field_position: int,
                   avg_punt_distance: float = 43) -> float:
        """
        Get expected points after a punt.

        Parameters:
        -----------
        field_position : int
            Current field position
        avg_punt_distance : float
            Average punt distance

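        Returns:
        --------
        float : Expected points from the punting team's perspective
            (the negative of the opponent's EP after the punt)
        """
        # Estimate opponent field position after the punt. The punt is capped so
        # it lands no deeper than the opponent's 10 yard line.
        punt_yards = min(avg_punt_distance, 100 - field_position - 10)
        opponent_fp = 100 - (field_position + punt_yards)
        # Simplification: any punt landing inside the 20 is treated as a
        # touchback, so the opponent never starts deeper than its own 20.
        opponent_fp = max(20, opponent_fp)

        # Opponent's EP is our negative EP
        return -self.get_ep(opponent_fp, 'offense')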
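
The piecewise-linear lookup in `get_ep` can also be written with `numpy.interp`, which handles the uneven end segments (1 to 5 and 95 to 99) and clamps out-of-range inputs without extra code. The sketch below is an alternative formulation, not the case study's implementation; `ep_lookup`, `YARD_LINES`, and `EP_VALUES` are hypothetical names, and the values are copied from `EP_BY_YARD_LINE`.

```python
import numpy as np

# Knots copied from EP_BY_YARD_LINE above (yards from own goal -> EP).
YARD_LINES = np.array([1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
                       55, 60, 65, 70, 75, 80, 85, 90, 95, 99])
EP_VALUES = np.array([-0.6, -0.4, -0.1, 0.1, 0.3, 0.5, 0.7, 1.0, 1.3, 1.6, 2.0,
                      2.4, 2.8, 3.2, 3.6, 4.0, 4.4, 4.8, 5.2, 5.6, 6.0])


def ep_lookup(field_position: float, possession: str = "offense") -> float:
    """Piecewise-linear EP; np.interp clamps inputs outside 1-99 to the end values."""
    ep = float(np.interp(field_position, YARD_LINES, EP_VALUES))
    return -ep if possession == "defense" else ep


print(ep_lookup(1))                         # value at the first knot
print(ep_lookup(3))                         # halfway between the 1 and the 5
print(ep_lookup(52))                        # between the 50 and the 55
print(ep_lookup(30, possession="defense"))  # sign flipped for the defense
```

Keeping the knots in two parallel arrays makes it easy to swap in a finer table (for example, one value per yard line) without touching the lookup logic.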

Part 3: Decision Engine

"""
Fourth Down Decision Model
Part 3: Decision Engine
"""

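import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import Dict, List, Optional
from enum import Enum

# Assumes FourthDownSituation, ConversionProbabilityModel, and
# FieldGoalProbabilityModel (Part 1) plus ExpectedPointsModel (Part 2) are
# defined in the same module or imported from wherever you saved them.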


class FourthDownDecision(Enum):
    """Fourth down decision options."""
    GO_FOR_IT = "go_for_it"
    PUNT = "punt"
    FIELD_GOAL = "field_goal"


@dataclass
class DecisionAnalysis:
    """Complete decision analysis."""
    situation: 'FourthDownSituation'
    recommendation: FourthDownDecision
    confidence: str  # Strong, Moderate, Weak

    # Expected values
    ev_go: float
    ev_punt: float
    ev_fg: Optional[float]  # None when a field goal is out of range

    # Probabilities
    conversion_prob: float
    fg_prob: float

    # Explanation
    reasoning: List[str]


class FourthDownDecisionEngine:
    """
    Complete fourth down decision support system.
    """

    def __init__(self):
        self.conversion_model = ConversionProbabilityModel()
        self.fg_model = FieldGoalProbabilityModel()
        self.ep_model = ExpectedPointsModel()
        self.is_trained = False

    def train(self, historical_plays: pd.DataFrame) -> Dict:
        """Train the decision engine."""
        metrics = self.conversion_model.train(historical_plays)
        self.is_trained = True
        return metrics

    def analyze(self, situation: FourthDownSituation) -> DecisionAnalysis:
        """
        Analyze a fourth down situation.

        Parameters:
        -----------
        situation : FourthDownSituation
            Current game situation

        Returns:
        --------
        DecisionAnalysis : Complete analysis with recommendation
        """
        # Get probabilities
        conv_prob = self.conversion_model.predict(situation)
        fg_prob = self.fg_model.predict(situation.field_position)

        # Calculate expected values
        ev_go = self._calculate_ev_go(situation, conv_prob)
        ev_punt = self._calculate_ev_punt(situation)
        ev_fg = self._calculate_ev_fg(situation, fg_prob)

        # Determine recommendation
        evs = {'go_for_it': ev_go, 'punt': ev_punt, 'field_goal': ev_fg}

        # Remove FG if too far
        if situation.field_position < 55:  # Need to be past midfield
            del evs['field_goal']
            ev_fg = float('-inf')

        best_decision = max(evs.keys(), key=lambda k: evs[k])
        recommendation = FourthDownDecision(best_decision)

        # Determine confidence
        sorted_evs = sorted(evs.values(), reverse=True)
        if len(sorted_evs) >= 2:
            ev_gap = sorted_evs[0] - sorted_evs[1]
        else:
            ev_gap = sorted_evs[0]

        if ev_gap > 1.0:
            confidence = 'Strong'
        elif ev_gap > 0.3:
            confidence = 'Moderate'
        else:
            confidence = 'Weak'

        # Generate reasoning
        reasoning = self._generate_reasoning(
            situation, recommendation, conv_prob, fg_prob,
            ev_go, ev_punt, ev_fg
        )

        return DecisionAnalysis(
            situation=situation,
            recommendation=recommendation,
            confidence=confidence,
            ev_go=round(ev_go, 2),
            ev_punt=round(ev_punt, 2),
            ev_fg=round(ev_fg, 2) if ev_fg != float('-inf') else None,
            conversion_prob=round(conv_prob, 3),
            fg_prob=round(fg_prob, 3),
            reasoning=reasoning
        )

    def _calculate_ev_go(self, situation: FourthDownSituation,
                        conv_prob: float) -> float:
        """Calculate EV for going for it."""
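        # If convert: first down at (at least) the line to gain.
        # get_ep clamps to the 99, so goal-to-go conversions are valued there.
        ep_success = self.ep_model.get_ep(
            situation.field_position + situation.yards_to_go
        )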

        # If fail: opponent gets ball at current position
        ep_fail = -self.ep_model.get_ep(100 - situation.field_position)

        ev = conv_prob * ep_success + (1 - conv_prob) * ep_fail
        return ev

    def _calculate_ev_punt(self, situation: FourthDownSituation) -> float:
        """Calculate EV for punting."""
        return self.ep_model.get_punt_ep(situation.field_position)

    def _calculate_ev_fg(self, situation: FourthDownSituation,
                        fg_prob: float) -> float:
        """Calculate EV for field goal."""
        if situation.field_position < 55:
            return float('-inf')

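        # If make: +3 points, then we kick off and the opponent takes over,
        # so the opponent's EP at the touchback spot is subtracted.
        ev_make = 3 - self.ep_model.get_ep(25)  # Assume touchback to the 25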

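        # If miss: opponent gets ball at spot of kick (the NFL convention;
        # under NCAA rules the ball returns to the previous line of scrimmage,
        # or the 20 if that is farther out, so drop the 7 yards for college data)
        kick_spot = situation.field_position - 7  # 7 yards back
        ev_miss = -self.ep_model.get_ep(100 - kick_spot)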

        ev = fg_prob * ev_make + (1 - fg_prob) * ev_miss
        return ev

    def _generate_reasoning(self, situation: FourthDownSituation,
                           recommendation: FourthDownDecision,
                           conv_prob: float, fg_prob: float,
                           ev_go: float, ev_punt: float, ev_fg: float
                           ) -> List[str]:
        """Generate human-readable reasoning."""
        reasons = []

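        # Describe situation, naming the side of the field
        fp = situation.field_position
        if fp > 50:
            spot = f"the opponent's {100 - fp} yard line"
        elif fp == 50:
            spot = "midfield"
        else:
            spot = f"their own {fp} yard line"
        reasons.append(f"4th and {situation.yards_to_go} at {spot}")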

        # Conversion probability
        if conv_prob >= 0.6:
            reasons.append(f"High conversion probability ({conv_prob:.0%})")
        elif conv_prob >= 0.4:
            reasons.append(f"Moderate conversion probability ({conv_prob:.0%})")
        else:
            reasons.append(f"Low conversion probability ({conv_prob:.0%})")

        # Expected values
        if recommendation == FourthDownDecision.GO_FOR_IT:
            # ev_fg is -inf when out of range, so max() ignores it automatically
            ev_advantage = ev_go - max(ev_punt, ev_fg)
            reasons.append(f"Going for it adds {ev_advantage:.1f} expected points")
        elif recommendation == FourthDownDecision.PUNT:
            reasons.append("Field position gain from punt outweighs conversion chance")
        else:
            reasons.append(f"Field goal attempt has {fg_prob:.0%} success rate")

        # Game state factors
        if situation.score_differential < 0 and situation.quarter == 4:
            reasons.append("Trailing late - more aggressive approach warranted")
        elif situation.score_differential > 14:
            reasons.append("Big lead - conservative approach acceptable")

        return reasons

    def get_chart(self, situation: FourthDownSituation) -> pd.DataFrame:
        """
        Get a decision chart for various distances at current field position.
        """
        results = []

        for ytg in range(1, 11):
            sit = FourthDownSituation(
                yards_to_go=ytg,
                field_position=situation.field_position,
                score_differential=situation.score_differential,
                time_remaining=situation.time_remaining,
                quarter=situation.quarter
            )

            analysis = self.analyze(sit)

            results.append({
                'yards_to_go': ytg,
                'recommendation': analysis.recommendation.value,
                'conv_prob': analysis.conversion_prob,
                'ev_go': analysis.ev_go,
                'ev_punt': analysis.ev_punt,
                'ev_fg': analysis.ev_fg,  # None when out of field goal range
                'confidence': analysis.confidence
            })

        return pd.DataFrame(results)


def generate_sample_plays(n_plays: int = 1000) -> pd.DataFrame:
    """Generate sample fourth down plays for training."""
    np.random.seed(42)

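    plays = []

    # Distance weights for 1-14 yards to go: short distances are most common.
    # np.random.choice needs one weight per value and a total of exactly 1,
    # so the 14 weights are normalized before sampling.
    ytg_values = np.arange(1, 15)
    ytg_weights = np.array([0.15, 0.12, 0.10] + [0.063] * 11)
    ytg_weights = ytg_weights / ytg_weights.sum()

    for _ in range(n_plays):
        ytg = int(np.random.choice(ytg_values, p=ytg_weights))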
        fp = np.random.randint(20, 95)
        quarter = np.random.choice([1, 2, 3, 4], p=[0.25, 0.25, 0.25, 0.25])

        # Conversion probability depends on distance
        base_prob = 0.7 - 0.05 * ytg
        base_prob = max(0.15, min(0.75, base_prob))

        converted = np.random.random() < base_prob

        plays.append({
            'yards_to_go': ytg,
            'field_position': fp,
            'score_differential': np.random.randint(-21, 22),
            'time_remaining': np.random.uniform(0, 15),
            'quarter': quarter,
            'converted': int(converted)
        })

    return pd.DataFrame(plays)


# =============================================================================
# DEMONSTRATION
# =============================================================================

if __name__ == "__main__":
    print("=" * 70)
    print("FOURTH DOWN DECISION MODEL")
    print("=" * 70)

    # Generate training data
    print("\n1. Generating training data...")
    plays = generate_sample_plays(1000)
    print(f"   Generated {len(plays)} plays")
    print(f"   Overall conversion rate: {plays['converted'].mean():.1%}")

    # Train model
    print("\n2. Training decision engine...")
    engine = FourthDownDecisionEngine()
    metrics = engine.train(plays)
    print(f"   CV Accuracy: {metrics['cv_accuracy']:.3f}")

    # Analyze example situations
    print("\n3. Example analyses:")

    situations = [
        FourthDownSituation(yards_to_go=1, field_position=65, score_differential=0,
                          time_remaining=10, quarter=2),
        FourthDownSituation(yards_to_go=4, field_position=45, score_differential=-7,
                          time_remaining=3, quarter=4),
        FourthDownSituation(yards_to_go=3, field_position=75, score_differential=3,
                          time_remaining=8, quarter=3),
    ]

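    for sit in situations:
        analysis = engine.analyze(sit)
        # Label the side of the field so own-territory plays print correctly
        fp = sit.field_position
        if fp > 50:
            spot = f"opponent {100 - fp}"
        elif fp == 50:
            spot = "midfield"
        else:
            spot = f"own {fp}"
        print(f"\n   4th & {sit.yards_to_go} at {spot}")
        print(f"   Recommendation: {analysis.recommendation.value.upper()} ({analysis.confidence})")
        print(f"   Conversion prob: {analysis.conversion_prob:.1%}")
        print(f"   EV Go: {analysis.ev_go:+.2f}, EV Punt: {analysis.ev_punt:+.2f}")
        # reasoning[0] only restates the situation, so print the rest
        print(f"   Reasoning: {'; '.join(analysis.reasoning[1:])}")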

    # Decision chart
    print("\n4. Decision chart at the opponent 35 (field_position=65):")
    chart_sit = FourthDownSituation(yards_to_go=1, field_position=65,
                                    score_differential=0, time_remaining=10, quarter=2)
    chart = engine.get_chart(chart_sit)
    print(chart[['yards_to_go', 'recommendation', 'conv_prob', 'ev_go', 'ev_punt']].to_string(index=False))

    print("\n" + "=" * 70)
    print("DEMONSTRATION COMPLETE")
    print("=" * 70)

Key Insights

Model Findings

  1. Distance is King: Yards to go is the strongest predictor of conversion probability. The sample data generator above encodes this directly: 1-2 yard attempts convert at 60-65%, while attempts of 10 or more yards convert at only 15-20%.

  2. Field Position Matters: In opponent territory, the expected value calculation recommends going for it more often than coaches typically do.

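  3. Game State Adjustments: Late-game situations with score deficits warrant more aggressive decision-making due to limited opportunities remaining. The engine as written compares raw expected points, so score and clock enter only through the conversion model's time_pressure and trailing features and the reasoning text; capturing this effect fully would require a win probability model.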
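
One way to make the second finding concrete is the break-even conversion probability: the value of P(convert) at which going for it is worth exactly as much as the best alternative. Setting EV_go equal to the alternative's EV and solving gives p* = (EV_alt - EP_fail) / (EP_success - EP_fail). The sketch below applies that formula; the function name `break_even_probability` and the input numbers are assumptions for illustration only.

```python
def break_even_probability(ep_success: float, ep_fail: float,
                           ev_alternative: float) -> float:
    """Conversion probability at which going for it matches the alternative.

    Solves p * ep_success + (1 - p) * ep_fail = ev_alternative for p.
    """
    if ep_success <= ep_fail:
        raise ValueError("ep_success must exceed ep_fail")
    p = (ev_alternative - ep_fail) / (ep_success - ep_fail)
    # Clamp: 0 means always go, 1 means only a certain conversion justifies it.
    return min(1.0, max(0.0, p))


# Hypothetical values (invented for illustration, offense's perspective).
p_star = break_even_probability(ep_success=3.2, ep_fail=-1.0, ev_alternative=-0.3)
print(f"Go for it when P(convert) exceeds {p_star:.1%}")
```

With these inputs the threshold is 0.7 / 4.2, or about 16.7%. Comparing a threshold like this against the model's predicted conversion probability gives coaches a single number to reason about.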

Practical Applications

  1. Pre-Game Preparation: Generate decision charts for various scenarios before games.

  2. Real-Time Support: Quick lookup of recommendations during games.

  3. Post-Game Analysis: Review decisions against model recommendations to identify improvement opportunities.
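
The first two applications fit together: if recommendations are precomputed before the game, real-time support reduces to a dictionary read. The sketch below shows that pattern under stated assumptions. `build_lookup` and `toy_recommend` are hypothetical names, and `toy_recommend` is a placeholder rule standing in for a trained engine, not a real fourth down policy.

```python
from typing import Callable, Dict, Tuple

# (field_position, yards_to_go) -> decision label
Recommender = Callable[[int, int], str]


def build_lookup(recommend: Recommender,
                 field_positions=range(1, 100),
                 distances=range(1, 11)) -> Dict[Tuple[int, int], str]:
    """Evaluate the recommender once per cell so game-day lookups are dict reads."""
    return {(fp, ytg): recommend(fp, ytg)
            for fp in field_positions for ytg in distances}


def toy_recommend(field_position: int, yards_to_go: int) -> str:
    """Placeholder rule standing in for a trained engine (not a real policy)."""
    if yards_to_go <= 2 and field_position >= 50:
        return "go_for_it"
    if field_position >= 65:
        return "field_goal"
    return "punt"


chart = build_lookup(toy_recommend)
print(chart[(65, 1)])   # go_for_it
print(chart[(70, 6)])   # field_goal
print(chart[(30, 4)])   # punt
```

To use the engine from Part 3 instead, `recommend` could wrap `engine.analyze(...)` on a `FourthDownSituation` and return `analysis.recommendation.value`. Score differential, quarter, and time remaining would have to be fixed per chart, so several charts may be needed to cover different game states.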

Exercises

  1. Add weather factors: Extend the model to account for weather conditions affecting conversion probability.

  2. Team-specific models: Build team-specific conversion models accounting for offensive tendencies.

  3. Opponent adjustments: Incorporate opponent defensive strength into predictions.

Further Reading

  • Ben Baldwin's fourth down analysis methodology
  • NYT 4th Down Bot methodology
  • EdjSports fourth down decision system documentation