Case Study 2: Fourth Down Decision Model
Overview
This case study develops a predictive model to support fourth down decision-making. We'll build models to estimate conversion probability and combine them with expected value calculations to recommend optimal decisions.
Background
Fourth down is one of the highest-leverage decisions in football. Coaches must choose among three options:
1. Go for it: Attempt to gain the necessary yards
2. Punt: Give the ball to the opponent, but farther from your own end zone
3. Field goal: Attempt to score 3 points
Traditional coaching has been conservative, but analytics shows going for it is often undervalued.
Business Problem
A college football program wants a data-driven fourth down decision system that:
1. Estimates conversion probability based on situation
2. Calculates expected value for each option
3. Provides clear recommendations with confidence levels
4. Explains the reasoning in coach-friendly terms
5. Works in real-time during games
Solution Design
Decision Framework
The expected value calculation:
Go for it:
EV_go = P(convert) × (EP_success) + P(fail) × (EP_fail)
Punt:
EV_punt = EP_after_punt
Field goal:
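EV_fg = P(make) × (3 + EP_after_kickoff) + P(miss) × EP_after_miss
Where EP = Expected Points based on field position, measured from the deciding team's point of view (so it is negative when the opponent has the ball), P(fail) = 1 − P(convert), and P(miss) = 1 − P(make).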
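To make the framework concrete, the sketch below plugs numbers into the three formulas. Every probability and EP value is an invented placeholder for a hypothetical 4th-and-2 in opponent territory, not output from the models built later. The `ep_after_kickoff` argument is an assumption of this sketch; leaving it at 0.0 gives the simplest form, P(make) × 3.

```python
def ev_go(p_convert: float, ep_success: float, ep_fail: float) -> float:
    """EV of going for it: weighted average of the two outcomes."""
    return p_convert * ep_success + (1 - p_convert) * ep_fail


def ev_punt(ep_after_punt: float) -> float:
    """EV of punting: the (usually negative) EP once the opponent has the ball."""
    return ep_after_punt


def ev_fg(p_make: float, ep_after_miss: float,
          ep_after_kickoff: float = 0.0) -> float:
    """EV of a field goal attempt; all EP values are from the kicking team's view."""
    return p_make * (3 + ep_after_kickoff) + (1 - p_make) * ep_after_miss


# Illustrative inputs only (assumed, not estimated from data)
options = {
    "go_for_it": ev_go(p_convert=0.60, ep_success=3.4, ep_fail=-1.2),
    "punt": ev_punt(ep_after_punt=-0.3),
    "field_goal": ev_fg(p_make=0.45, ep_after_miss=-1.5, ep_after_kickoff=-0.5),
}

# The recommendation is simply the option with the highest expected value
best = max(options, key=options.get)
for name, value in options.items():
    print(f"{name:>10}: {value:+.2f}")
print("Recommended:", best)
```

With these placeholder inputs, going for it has the highest expected value; changing any one input can flip the ranking, which is why the conversion and expected points estimates matter.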
Implementation
Part 1: Conversion Probability Model
"""
Fourth Down Decision Model
Part 1: Conversion Probability Model
"""

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import cross_val_score
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FourthDownSituation:
    """Fourth down game situation."""
    yards_to_go: int
    field_position: int  # Yards from own goal line
    score_differential: int
    time_remaining: float  # Minutes
    quarter: int
class ConversionProbabilityModel:
    """
    Predict fourth down conversion probability.
    """

    def __init__(self):
        self.model = None
        self.scaler = StandardScaler()
        self.feature_columns = [
            'yards_to_go', 'log_yards_to_go', 'field_position',
            'in_opponent_territory', 'red_zone', 'short_yardage',
            'time_pressure', 'trailing'
        ]

    def create_features(self, situations: pd.DataFrame) -> pd.DataFrame:
        """Create features from raw situations."""
        df = situations.copy()

        # Transform yards to go
        df['log_yards_to_go'] = np.log1p(df['yards_to_go'])

        # Field position indicators
        df['in_opponent_territory'] = (df['field_position'] > 50).astype(int)
        df['red_zone'] = (df['field_position'] >= 80).astype(int)

        # Yardage categories
        df['short_yardage'] = (df['yards_to_go'] <= 2).astype(int)

        # Game state
        df['time_pressure'] = ((df['quarter'] == 4) &
                               (df['time_remaining'] < 4)).astype(int)
        df['trailing'] = (df['score_differential'] < 0).astype(int)

        return df

    def train(self, plays: pd.DataFrame) -> Dict:
        """
        Train conversion probability model.

        Parameters:
        -----------
        plays : pd.DataFrame
            Historical fourth down plays with 'converted' target

        Returns:
        --------
        dict : Training metrics
        """
        # Create features
        df = self.create_features(plays)
        X = df[self.feature_columns]
        y = plays['converted']

        # Scale features
        X_scaled = self.scaler.fit_transform(X)

        # Train calibrated model
        base_model = LogisticRegression(max_iter=1000, random_state=42)
        self.model = CalibratedClassifierCV(base_model, cv=5)

        # Cross-validate the uncalibrated base model before the final fit.
        # Note: the scaler was fit on all rows above, so these scores are
        # slightly optimistic.
        cv_scores = cross_val_score(base_model, X_scaled, y, cv=5, scoring='accuracy')

        # Final fit
        self.model.fit(X_scaled, y)

        return {
            'cv_accuracy': cv_scores.mean(),
            'cv_std': cv_scores.std(),
            'n_samples': len(y),
            'conversion_rate': y.mean()
        }

    def predict(self, situation: FourthDownSituation) -> float:
        """
        Predict conversion probability for a situation.

        Returns:
        --------
        float : Conversion probability (0-1)
        """
        if self.model is None:
            raise RuntimeError("Model has not been trained; call train() first")

        # Create feature dict
        features = {
            'yards_to_go': situation.yards_to_go,
            'field_position': situation.field_position,
            'score_differential': situation.score_differential,
            'time_remaining': situation.time_remaining,
            'quarter': situation.quarter
        }

        df = pd.DataFrame([features])
        df = self.create_features(df)
        X = df[self.feature_columns]
        X_scaled = self.scaler.transform(X)

        # Column 1 of predict_proba is P(converted == 1)
        prob = self.model.predict_proba(X_scaled)[0, 1]
        return float(prob)
class FieldGoalProbabilityModel:
    """
    Predict field goal success probability.
    """

    def __init__(self):
        self.model = None

    def predict(self, field_position: int) -> float:
        """
        Predict field goal probability.

        Uses a logistic function based on historical data.

        Parameters:
        -----------
        field_position : int
            Yards from own goal line

        Returns:
        --------
        float : Field goal probability
        """
        # Distance = 100 - field_position + 17 (end zone + hold)
        distance = 100 - field_position + 17

        if distance > 60:
            return 0.0  # Too far for attempt

        # Logistic model fit to historical data
        # P(make) = 1 / (1 + exp(0.1 * (distance - 35)))
        prob = 1 / (1 + np.exp(0.1 * (distance - 35)))

        return min(0.95, prob)  # Cap at 95%
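The `train` method reports cross-validated accuracy, but the decision engine consumes probabilities rather than class labels, so a probability-quality check is arguably more informative. The sketch below computes a Brier score and a simple reliability table in plain Python. The helper names `brier_score` and `reliability_table` are hypothetical, and the sample outcomes and predictions are invented for illustration.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def reliability_table(y_true: Sequence[int], y_prob: Sequence[float],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Per non-empty bin: (mean predicted probability, observed rate, count)."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))

    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for _, p in members) / len(members)
            rate = sum(y for y, _ in members) / len(members)
            rows.append((mean_p, rate, len(members)))
    return rows


# Invented example: outcomes (1 = converted) and predicted probabilities
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [0.7, 0.6, 0.65, 0.8, 0.3, 0.2, 0.55, 0.4]

print(f"Brier score: {brier_score(outcomes, predicted):.3f}")
for mean_p, rate, count in reliability_table(outcomes, predicted):
    print(f"predicted {mean_p:.2f} -> observed {rate:.2f} (n={count})")
```

A well-calibrated model shows observed rates close to the mean predicted probability in each bin; large gaps suggest the expected value calculations built on those probabilities will be biased.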
Part 2: Expected Points Model
"""
Fourth Down Decision Model
Part 2: Expected Points Model
"""

import numpy as np
from typing import Dict


class ExpectedPointsModel:
    """
    Expected points by field position model.
    """

    # Pre-computed EP values by yard line (from own goal)
    # Based on historical drive outcomes
    EP_BY_YARD_LINE = {
        1: -0.6, 5: -0.4, 10: -0.1, 15: 0.1, 20: 0.3,
        25: 0.5, 30: 0.7, 35: 1.0, 40: 1.3, 45: 1.6,
        50: 2.0, 55: 2.4, 60: 2.8, 65: 3.2, 70: 3.6,
        75: 4.0, 80: 4.4, 85: 4.8, 90: 5.2, 95: 5.6,
        99: 6.0
    }

    def get_ep(self, field_position: int, possession: str = 'offense') -> float:
        """
        Get expected points for a field position.

        Parameters:
        -----------
        field_position : int
            Yards from own goal line (1-99)
        possession : str
            'offense' or 'defense'

        Returns:
        --------
        float : Expected points
        """
        # Clamp to valid range
        fp = max(1, min(99, field_position))

        # Find the surrounding table entries; the table uses 1 and 99
        # in place of 0 and 100 at the two ends
        lower = max(1, (fp // 5) * 5)
        upper = min(99, (fp // 5) * 5 + 5)

        lower_ep = self.EP_BY_YARD_LINE[lower]
        upper_ep = self.EP_BY_YARD_LINE[upper]

        # Linear interpolation
        if upper > lower:
            ep = lower_ep + (upper_ep - lower_ep) * (fp - lower) / (upper - lower)
        else:
            ep = lower_ep

        # If defense has ball, negate
        if possession == 'defense':
            ep = -ep

        return ep

    def get_punt_ep(self, field_position: int,
                    avg_punt_distance: float = 43) -> float:
        """
        Get expected points after a punt.

        Parameters:
        -----------
        field_position : int
            Current field position
        avg_punt_distance : float
            Average punt distance

        Returns:
        --------
        float : Expected points for the punting team after the punt
            (the negative of the opponent's expected points)
        """
        # Estimate opponent field position after punt
        punt_yards = min(avg_punt_distance, 100 - field_position - 10)  # Leave room for touchback
        opponent_fp = 100 - (field_position + punt_yards)
        opponent_fp = max(20, opponent_fp)  # Touchback gives 20 yard line

        # Opponent's EP is our negative EP
        return -self.get_ep(opponent_fp, 'offense')
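An alternative way to write the table lookup is to drive the interpolation from the sorted table keys with `bisect`, so unevenly spaced entries such as 1 and 99 need no special handling. This is a sketch, not a replacement for `get_ep`: the function name `interpolate_ep` is hypothetical, and the table is copied from `EP_BY_YARD_LINE` so the block runs on its own.

```python
from bisect import bisect_right

# Copied from ExpectedPointsModel.EP_BY_YARD_LINE for a self-contained example
EP_BY_YARD_LINE = {
    1: -0.6, 5: -0.4, 10: -0.1, 15: 0.1, 20: 0.3,
    25: 0.5, 30: 0.7, 35: 1.0, 40: 1.3, 45: 1.6,
    50: 2.0, 55: 2.4, 60: 2.8, 65: 3.2, 70: 3.6,
    75: 4.0, 80: 4.4, 85: 4.8, 90: 5.2, 95: 5.6,
    99: 6.0,
}
_YARDS = sorted(EP_BY_YARD_LINE)


def interpolate_ep(field_position: float) -> float:
    """Linearly interpolate EP between the two nearest table entries."""
    # Clamp to the range covered by the table
    fp = max(_YARDS[0], min(_YARDS[-1], field_position))

    # Index of the first table yard line strictly greater than fp
    hi = bisect_right(_YARDS, fp)
    if hi == len(_YARDS):
        return EP_BY_YARD_LINE[_YARDS[-1]]  # fp sits on the last entry

    lo_yard, hi_yard = _YARDS[hi - 1], _YARDS[hi]
    lo_ep, hi_ep = EP_BY_YARD_LINE[lo_yard], EP_BY_YARD_LINE[hi_yard]
    return lo_ep + (hi_ep - lo_ep) * (fp - lo_yard) / (hi_yard - lo_yard)


for yard in (1, 3, 52, 97, 99):
    print(f"yard line {yard:>2}: EP = {interpolate_ep(yard):+.2f}")
```

Because the bracketing entries come from the keys themselves, adding or removing table rows does not require touching the interpolation logic.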
Part 3: Decision Engine
"""
Fourth Down Decision Model
Part 3: Decision Engine
"""

# Assumes the Part 1 and Part 2 classes (FourthDownSituation,
# ConversionProbabilityModel, FieldGoalProbabilityModel, ExpectedPointsModel)
# are defined or imported in this module.
import pandas as pd
import numpy as np
from dataclasses import dataclass
from typing import Dict, List, Optional
from enum import Enum


class FourthDownDecision(Enum):
    """Fourth down decision options."""
    GO_FOR_IT = "go_for_it"
    PUNT = "punt"
    FIELD_GOAL = "field_goal"


@dataclass
class DecisionAnalysis:
    """Complete decision analysis."""
    situation: 'FourthDownSituation'
    recommendation: FourthDownDecision
    confidence: str  # Strong, Moderate, Weak

    # Expected values
    ev_go: float
    ev_punt: float
    ev_fg: Optional[float]  # None when a field goal is out of range

    # Probabilities
    conversion_prob: float
    fg_prob: float

    # Explanation
    reasoning: List[str]
class FourthDownDecisionEngine:
    """
    Complete fourth down decision support system.
    """

    def __init__(self):
        self.conversion_model = ConversionProbabilityModel()
        self.fg_model = FieldGoalProbabilityModel()
        self.ep_model = ExpectedPointsModel()
        self.is_trained = False

    def train(self, historical_plays: pd.DataFrame) -> Dict:
        """Train the decision engine."""
        metrics = self.conversion_model.train(historical_plays)
        self.is_trained = True
        return metrics

    def analyze(self, situation: FourthDownSituation) -> DecisionAnalysis:
        """
        Analyze a fourth down situation.

        Parameters:
        -----------
        situation : FourthDownSituation
            Current game situation

        Returns:
        --------
        DecisionAnalysis : Complete analysis with recommendation
        """
        # Get probabilities
        conv_prob = self.conversion_model.predict(situation)
        fg_prob = self.fg_model.predict(situation.field_position)

        # Calculate expected values
        ev_go = self._calculate_ev_go(situation, conv_prob)
        ev_punt = self._calculate_ev_punt(situation)
        ev_fg = self._calculate_ev_fg(situation, fg_prob)

        # Determine recommendation
        evs = {'go_for_it': ev_go, 'punt': ev_punt, 'field_goal': ev_fg}

        # Remove FG if too far
        if situation.field_position < 55:  # Need to be past midfield
            del evs['field_goal']
            ev_fg = float('-inf')

        best_decision = max(evs.keys(), key=lambda k: evs[k])
        recommendation = FourthDownDecision(best_decision)

        # Determine confidence
        sorted_evs = sorted(evs.values(), reverse=True)
        if len(sorted_evs) >= 2:
            ev_gap = sorted_evs[0] - sorted_evs[1]
        else:
            ev_gap = sorted_evs[0]

        if ev_gap > 1.0:
            confidence = 'Strong'
        elif ev_gap > 0.3:
            confidence = 'Moderate'
        else:
            confidence = 'Weak'

        # Generate reasoning
        reasoning = self._generate_reasoning(
            situation, recommendation, conv_prob, fg_prob,
            ev_go, ev_punt, ev_fg
        )

        return DecisionAnalysis(
            situation=situation,
            recommendation=recommendation,
            confidence=confidence,
            ev_go=round(ev_go, 2),
            ev_punt=round(ev_punt, 2),
            ev_fg=round(ev_fg, 2) if ev_fg != float('-inf') else None,
            conversion_prob=round(conv_prob, 3),
            fg_prob=round(fg_prob, 3),
            reasoning=reasoning
        )
    def _calculate_ev_go(self, situation: FourthDownSituation,
                         conv_prob: float) -> float:
        """Calculate EV for going for it."""
        # If convert: first down at (at least) the line to gain.
        # Capped at the 99, so goal-to-go conversions are valued at the
        # table maximum rather than as a touchdown.
        line_to_gain = min(99, situation.field_position + situation.yards_to_go)
        ep_success = self.ep_model.get_ep(line_to_gain)

        # If fail: opponent gets ball at current position
        ep_fail = -self.ep_model.get_ep(100 - situation.field_position)

        ev = conv_prob * ep_success + (1 - conv_prob) * ep_fail
        return ev
    def _calculate_ev_punt(self, situation: FourthDownSituation) -> float:
        """Calculate EV for punting."""
        return self.ep_model.get_punt_ep(situation.field_position)

    def _calculate_ev_fg(self, situation: FourthDownSituation,
                         fg_prob: float) -> float:
        """Calculate EV for field goal."""
        if situation.field_position < 55:
            return float('-inf')

        # If make: +3 points, then we kick off and the opponent has the ball,
        # so their expected points count against us
        ev_make = 3 - self.ep_model.get_ep(25)  # Assume touchback

        # If miss: opponent gets ball at spot of kick
        kick_spot = situation.field_position - 7  # 7 yards back
        ev_miss = -self.ep_model.get_ep(100 - kick_spot)

        ev = fg_prob * ev_make + (1 - fg_prob) * ev_miss
        return ev
    def _generate_reasoning(self, situation: FourthDownSituation,
                            recommendation: FourthDownDecision,
                            conv_prob: float, fg_prob: float,
                            ev_go: float, ev_punt: float, ev_fg: float
                            ) -> List[str]:
        """Generate human-readable reasoning."""
        reasons = []

        # Describe situation (field_position is measured from own goal line)
        fp = situation.field_position
        if fp > 50:
            spot = f"opponent's {100 - fp} yard line"
        elif fp == 50:
            spot = "50 yard line"
        else:
            spot = f"own {fp} yard line"
        reasons.append(f"4th and {situation.yards_to_go} at the {spot}")

        # Conversion probability
        if conv_prob >= 0.6:
            reasons.append(f"High conversion probability ({conv_prob:.0%})")
        elif conv_prob >= 0.4:
            reasons.append(f"Moderate conversion probability ({conv_prob:.0%})")
        else:
            reasons.append(f"Low conversion probability ({conv_prob:.0%})")

        # Expected values
        if recommendation == FourthDownDecision.GO_FOR_IT:
            # ev_fg is -inf when a field goal is out of range, so max() ignores it
            ev_advantage = ev_go - max(ev_punt, ev_fg)
            reasons.append(
                f"Going for it adds {ev_advantage:.1f} expected points "
                f"over the next-best option"
            )
        elif recommendation == FourthDownDecision.PUNT:
            reasons.append("Field position gain from punt outweighs conversion chance")
        else:
            reasons.append(f"Field goal attempt has {fg_prob:.0%} success rate")

        # Game state factors
        if situation.score_differential < 0 and situation.quarter == 4:
            reasons.append("Trailing late - more aggressive approach warranted")
        elif situation.score_differential > 14:
            reasons.append("Big lead - conservative approach acceptable")

        return reasons
    def get_chart(self, situation: FourthDownSituation) -> pd.DataFrame:
        """
        Get a decision chart for various distances at current field position.
        """
        results = []

        for ytg in range(1, 11):
            sit = FourthDownSituation(
                yards_to_go=ytg,
                field_position=situation.field_position,
                score_differential=situation.score_differential,
                time_remaining=situation.time_remaining,
                quarter=situation.quarter
            )

            analysis = self.analyze(sit)

            results.append({
                'yards_to_go': ytg,
                'recommendation': analysis.recommendation.value,
                'conv_prob': analysis.conversion_prob,
                'ev_go': analysis.ev_go,
                'ev_punt': analysis.ev_punt,
                'confidence': analysis.confidence
            })

        return pd.DataFrame(results)
def generate_sample_plays(n_plays: int = 1000) -> pd.DataFrame:
    """Generate sample fourth down plays for training."""
    np.random.seed(42)

    # Yards-to-go distribution: 13 values whose probabilities sum to 1.0
    # (np.random.choice requires len(p) to match the number of choices)
    ytg_values = range(1, 14)
    ytg_probs = [0.15, 0.12, 0.10] + [0.063] * 10

    plays = []
    for _ in range(n_plays):
        ytg = np.random.choice(ytg_values, p=ytg_probs)
        fp = np.random.randint(20, 95)
        quarter = np.random.choice([1, 2, 3, 4], p=[0.25, 0.25, 0.25, 0.25])

        # Conversion probability depends on distance
        base_prob = 0.7 - 0.05 * ytg
        base_prob = max(0.15, min(0.75, base_prob))

        converted = np.random.random() < base_prob

        plays.append({
            'yards_to_go': ytg,
            'field_position': fp,
            'score_differential': np.random.randint(-21, 22),
            'time_remaining': np.random.uniform(0, 15),
            'quarter': quarter,
            'converted': int(converted)
        })

    return pd.DataFrame(plays)
# =============================================================================
# DEMONSTRATION
# =============================================================================

if __name__ == "__main__":
    print("=" * 70)
    print("FOURTH DOWN DECISION MODEL")
    print("=" * 70)

    # Generate training data
    print("\n1. Generating training data...")
    plays = generate_sample_plays(1000)
    print(f" Generated {len(plays)} plays")
    print(f" Overall conversion rate: {plays['converted'].mean():.1%}")

    # Train model
    print("\n2. Training decision engine...")
    engine = FourthDownDecisionEngine()
    metrics = engine.train(plays)
    print(f" CV Accuracy: {metrics['cv_accuracy']:.3f}")

    # Analyze example situations
    print("\n3. Example analyses:")
    situations = [
        FourthDownSituation(yards_to_go=1, field_position=65, score_differential=0,
                            time_remaining=10, quarter=2),
        FourthDownSituation(yards_to_go=4, field_position=45, score_differential=-7,
                            time_remaining=3, quarter=4),
        FourthDownSituation(yards_to_go=3, field_position=75, score_differential=3,
                            time_remaining=8, quarter=3),
    ]

    for sit in situations:
        analysis = engine.analyze(sit)
        # field_position is measured from own goal line, so label the side of the field
        fp = sit.field_position
        spot = f"opponent {100 - fp}" if fp > 50 else f"own {fp}"
        print(f"\n 4th & {sit.yards_to_go} at {spot}")
        print(f" Recommendation: {analysis.recommendation.value.upper()} ({analysis.confidence})")
        print(f" Conversion prob: {analysis.conversion_prob:.1%}")
        print(f" EV Go: {analysis.ev_go:+.2f}, EV Punt: {analysis.ev_punt:+.2f}")
        print(f" Reasoning: {analysis.reasoning[0]}")

    # Decision chart
    print("\n4. Decision chart for opponent 35 (65 yard line):")
    chart_sit = FourthDownSituation(yards_to_go=1, field_position=65,
                                    score_differential=0, time_remaining=10, quarter=2)
    chart = engine.get_chart(chart_sit)
    print(chart[['yards_to_go', 'recommendation', 'conv_prob', 'ev_go', 'ev_punt']].to_string(index=False))

    print("\n" + "=" * 70)
    print("DEMONSTRATION COMPLETE")
    print("=" * 70)
Key Insights
Model Findings
- Distance is King: Yards to go is the strongest predictor of conversion probability. Short-yardage situations (1-2 yards) convert at roughly 60% or better, while long-yardage situations (10+ yards) convert at roughly 30%.
- Field Position Matters: The expected value calculation often recommends going for it in opponent territory more than coaches typically do.
- Game State Adjustments: Late-game situations with score deficits warrant more aggressive decision-making due to limited opportunities remaining.
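One way to make the field position finding tangible is the break-even conversion probability: the P(convert) at which going for it exactly matches the best kicking option. Setting p × EP_success + (1 − p) × EP_fail equal to the alternative's EV and solving for p gives p* = (EV_alt − EP_fail) / (EP_success − EP_fail). The sketch below applies that formula; `break_even_probability` is a hypothetical helper and the inputs are invented placeholders, not model outputs.

```python
def break_even_probability(ep_success: float, ep_fail: float,
                           ev_alternative: float) -> float:
    """Conversion probability at which EV_go equals the alternative's EV."""
    spread = ep_success - ep_fail
    if spread <= 0:
        raise ValueError("ep_success must be greater than ep_fail")
    p_star = (ev_alternative - ep_fail) / spread
    # Clamp: 0.0 means always go, 1.0 means going can never beat the alternative
    return max(0.0, min(1.0, p_star))


# Invented inputs: success worth +3.4 EP, failure -1.2 EP,
# best kicking option (punt or field goal) worth +0.3 EP
p_star = break_even_probability(ep_success=3.4, ep_fail=-1.2, ev_alternative=0.3)
print(f"Go for it whenever P(convert) exceeds {p_star:.1%}")
```

Comparing the model's predicted conversion probability against this threshold gives coaches a single number to reason about, and the size of the margin mirrors the Strong/Moderate/Weak confidence labels.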
Practical Applications
- Pre-Game Preparation: Generate decision charts for various scenarios before games.
- Real-Time Support: Quick lookup of recommendations during games.
- Post-Game Analysis: Review decisions against model recommendations to identify improvement opportunities.
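For pre-game preparation and real-time support, one possible approach is to precompute recommendations into a table keyed by (field position, yards to go), so that a sideline lookup is a single dictionary access. The sketch below assumes a `recommend` callable standing in for the decision engine; `build_lookup` and `toy_recommend` are hypothetical names, and the toy rule is a placeholder rather than the model's logic.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[int, int]  # (field_position, yards_to_go)


def build_lookup(recommend: Callable[[int, int], str],
                 field_positions: range = range(1, 100),
                 max_yards_to_go: int = 10) -> Dict[Key, str]:
    """Precompute a recommendation for every (field position, distance) pair."""
    table: Dict[Key, str] = {}
    for fp in field_positions:
        for ytg in range(1, max_yards_to_go + 1):
            if fp + ytg > 100:
                continue  # cannot need more yards than remain to the goal line
            table[(fp, ytg)] = recommend(fp, ytg)
    return table


def toy_recommend(field_position: int, yards_to_go: int) -> str:
    """Placeholder rule for illustration only; NOT the trained model."""
    if yards_to_go <= 2 and field_position >= 50:
        return "go_for_it"
    if field_position >= 65:
        return "field_goal"
    return "punt"


lookup = build_lookup(toy_recommend)
print(lookup[(65, 1)])   # go_for_it
print(lookup[(30, 8)])   # punt
```

In practice the callable would wrap the engine's `analyze` method with the score and clock fixed for a given scenario, and separate tables could be built for different game states.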
Exercises
- Add weather factors: Extend the model to account for weather conditions affecting conversion probability.
- Team-specific models: Build team-specific conversion models accounting for offensive tendencies.
- Opponent adjustments: Incorporate opponent defensive strength into predictions.
Further Reading
- Ben Baldwin's fourth down analysis methodology
- NYT 4th Down Bot methodology
- EdjSports fourth down decision system documentation