Sometimes a single prediction isn't enough. Instead of asking "Who will win?" we want to explore the full range of possibilities: "What might happen?" Game simulation uses Monte Carlo methods to generate thousands of potential outcomes, producing...
In This Chapter
- Introduction
- Learning Objectives
- Part 1: Monte Carlo Fundamentals
- Part 2: Score-Based Game Simulation
- Part 3: Drive-by-Drive Simulation
- Part 4: Season Simulation
- Part 5: In-Game Win Probability
- Part 6: Simulation Validation
- Part 7: Practical Applications
- Limitations and Appropriate Use
- Summary
- Looking Ahead
- Chapter Summary
Chapter 21: Game Simulation
When analytics meets Monte Carlo
Introduction
Sometimes a single prediction isn't enough. Instead of asking "Who will win?" we want to explore the full range of possibilities: "What might happen?" Game simulation uses Monte Carlo methods to generate thousands of potential outcomes, producing probability distributions for any metric—win probability, final score, margin of victory, or even specific game scenarios.
This chapter covers the theory and practice of NFL game simulation. We'll build simulators from simple score-based models to sophisticated drive-by-drive engines, understand when simulation adds value, and learn to interpret simulation outputs appropriately.
Learning Objectives
By the end of this chapter, you will be able to:
- Explain the Monte Carlo method and its application to sports
- Build a basic game simulation model
- Implement drive-by-drive game simulation
- Generate and interpret probability distributions from simulations
- Simulate season outcomes and playoff scenarios
- Validate simulation models against historical data
- Understand the limitations and appropriate uses of simulation
Part 1: Monte Carlo Fundamentals
What Is Monte Carlo Simulation?
Monte Carlo simulation uses random sampling to explore the space of possible outcomes. Named after the famous casino, the method generates many random scenarios according to specified probability distributions, then aggregates results to estimate probabilities of interest.
For NFL games, this means: 1. Model how scores or plays are generated 2. Run thousands of simulated games 3. Analyze the distribution of outcomes
Why Simulate?
Single-point predictions have limitations: - A predicted spread of -7 doesn't tell us the probability of winning by 14+ - Win probability doesn't reveal the distribution of possible scores - Uncertainty isn't captured by point estimates
Simulation provides: - Full probability distributions - Confidence intervals - Scenario analysis ("What if?" questions) - Path-dependent outcomes (how games unfold)
Core Concepts
import numpy as np
from scipy import stats
from typing import Dict, List, Tuple
from dataclasses import dataclass
@dataclass
class SimulationResult:
"""Container for simulation outcomes."""
home_wins: int
away_wins: int
ties: int
home_scores: List[int]
away_scores: List[int]
margins: List[int]
n_simulations: int
@property
def home_win_pct(self) -> float:
return self.home_wins / self.n_simulations
@property
def mean_margin(self) -> float:
return np.mean(self.margins)
@property
def margin_std(self) -> float:
return np.std(self.margins)
def probability_of_margin(self, threshold: int) -> float:
"""P(margin >= threshold) for home team."""
return sum(1 for m in self.margins if m >= threshold) / self.n_simulations
def basic_monte_carlo_demo(home_mean: float = 24, away_mean: float = 21,
score_std: float = 10, n_sims: int = 10000) -> SimulationResult:
"""
Basic Monte Carlo game simulation.
Assumes scores are normally distributed (simplified).
"""
# Generate random scores
home_scores = np.random.normal(home_mean, score_std, n_sims)
away_scores = np.random.normal(away_mean, score_std, n_sims)
# Round to integers and ensure non-negative
home_scores = np.maximum(0, np.round(home_scores)).astype(int)
away_scores = np.maximum(0, np.round(away_scores)).astype(int)
# Calculate outcomes
margins = home_scores - away_scores
home_wins = np.sum(margins > 0)
away_wins = np.sum(margins < 0)
ties = np.sum(margins == 0)
return SimulationResult(
home_wins=int(home_wins),
away_wins=int(away_wins),
ties=int(ties),
home_scores=home_scores.tolist(),
away_scores=away_scores.tolist(),
margins=margins.tolist(),
n_simulations=n_sims
)
How Many Simulations?
The number of simulations affects precision:
| Simulations | Standard Error (for 50% probability) |
|---|---|
| 1,000 | ±1.58% |
| 10,000 | ±0.50% |
| 100,000 | ±0.16% |
| 1,000,000 | ±0.05% |
For most NFL applications, 10,000-100,000 simulations provide sufficient precision.
Part 2: Score-Based Game Simulation
The Normal Distribution Model
The simplest game simulation assumes scores follow normal distributions:
$$Home \sim N(\mu_{home}, \sigma)$$ $$Away \sim N(\mu_{away}, \sigma)$$
Where $\mu$ is expected score and $\sigma$ is standard deviation.
class ScoreBasedSimulator:
"""
Game simulation using score distributions.
Models team scores as correlated normal distributions.
"""
def __init__(self, score_std: float = 10.0,
correlation: float = 0.1,
min_score: int = 0,
home_advantage: float = 2.5):
"""
Initialize simulator.
Args:
score_std: Standard deviation of individual team scores
correlation: Correlation between team scores (game pace)
min_score: Minimum possible score
home_advantage: Points added to home team expected score
"""
self.score_std = score_std
self.correlation = correlation
self.min_score = min_score
self.home_advantage = home_advantage
def simulate_game(self, home_rating: float, away_rating: float,
n_sims: int = 10000) -> SimulationResult:
"""
Simulate game outcomes.
Args:
home_rating: Home team's average points scored
away_rating: Away team's average points scored
Returns:
SimulationResult with all outcomes
"""
# Adjust for home advantage
home_mean = home_rating + self.home_advantage
away_mean = away_rating
# Generate correlated scores
# Using bivariate normal distribution
mean = [home_mean, away_mean]
cov = [[self.score_std**2, self.correlation * self.score_std**2],
[self.correlation * self.score_std**2, self.score_std**2]]
scores = np.random.multivariate_normal(mean, cov, n_sims)
home_scores = np.maximum(self.min_score, np.round(scores[:, 0])).astype(int)
away_scores = np.maximum(self.min_score, np.round(scores[:, 1])).astype(int)
margins = home_scores - away_scores
return SimulationResult(
home_wins=int(np.sum(margins > 0)),
away_wins=int(np.sum(margins < 0)),
ties=int(np.sum(margins == 0)),
home_scores=home_scores.tolist(),
away_scores=away_scores.tolist(),
margins=margins.tolist(),
n_simulations=n_sims
)
def probability_distribution(self, result: SimulationResult) -> Dict:
"""
Generate probability distribution from results.
"""
margins = np.array(result.margins)
# Key probabilities
return {
'home_win_prob': result.home_win_pct,
'away_win_prob': result.away_wins / result.n_simulations,
'tie_prob': result.ties / result.n_simulations,
'home_cover_7': np.mean(margins >= 7),
'home_cover_3': np.mean(margins >= 3),
'home_cover_minus_3': np.mean(margins >= -3),
'blowout_home': np.mean(margins >= 14),
'blowout_away': np.mean(margins <= -14),
'one_score_game': np.mean(np.abs(margins) <= 8),
'mean_margin': float(np.mean(margins)),
'std_margin': float(np.std(margins)),
'percentile_10': float(np.percentile(margins, 10)),
'percentile_50': float(np.percentile(margins, 50)),
'percentile_90': float(np.percentile(margins, 90))
}
Score Correlation
In real NFL games, team scores are slightly correlated due to: - Game pace (high-scoring games tend to have both teams score more) - Weather conditions affecting both teams - Defensive style (low-scoring games affect both teams)
Typical correlation: 0.10-0.15
Realistic Score Distributions
NFL scores aren't truly normally distributed—they're discrete and have specific patterns:
class RealisticScoreSimulator(ScoreBasedSimulator):
"""
Simulator with realistic NFL score distributions.
Accounts for:
- Common score patterns (3, 7, 10 multiples)
- Score clustering
- Non-normal distribution shape
"""
# Common NFL final scores (empirical frequencies)
COMMON_SCORES = {
0: 0.02, 3: 0.05, 6: 0.04, 7: 0.06, 9: 0.02, 10: 0.07,
13: 0.06, 14: 0.05, 16: 0.04, 17: 0.08, 19: 0.02, 20: 0.07,
21: 0.05, 23: 0.04, 24: 0.07, 26: 0.03, 27: 0.05, 28: 0.04,
30: 0.04, 31: 0.04, 34: 0.03, 35: 0.02, 37: 0.02, 38: 0.02
}
def _adjust_to_realistic_score(self, raw_score: float) -> int:
"""
Adjust continuous score to realistic NFL score.
Maps to common score values with appropriate probabilities.
"""
# Round to nearest likely score
base_score = int(round(raw_score))
# Find closest common score
common = list(self.COMMON_SCORES.keys())
closest = min(common, key=lambda x: abs(x - base_score))
# With some probability, use the closest common score
if np.random.random() < 0.3:
return closest
return max(0, base_score)
def simulate_game(self, home_rating: float, away_rating: float,
n_sims: int = 10000) -> SimulationResult:
"""Simulate with realistic score adjustments."""
# Get base simulation
base_result = super().simulate_game(home_rating, away_rating, n_sims)
# Adjust scores for realism
home_scores = [self._adjust_to_realistic_score(s)
for s in base_result.home_scores]
away_scores = [self._adjust_to_realistic_score(s)
for s in base_result.away_scores]
margins = [h - a for h, a in zip(home_scores, away_scores)]
return SimulationResult(
home_wins=sum(1 for m in margins if m > 0),
away_wins=sum(1 for m in margins if m < 0),
ties=sum(1 for m in margins if m == 0),
home_scores=home_scores,
away_scores=away_scores,
margins=margins,
n_simulations=n_sims
)
Part 3: Drive-by-Drive Simulation
Beyond Final Scores
Score-based simulation captures outcomes but not how games unfold. Drive-by-drive simulation models the actual game process:
- Coin toss determines first possession
- Drives alternate between teams
- Each drive ends in: touchdown, field goal, punt, turnover, or end of half/game
- Score accumulates through drives
- Game ends after regulation (or overtime if tied)
Drive Outcome Model
from enum import Enum
from dataclasses import dataclass, field
from typing import Optional
class DriveResult(Enum):
TOUCHDOWN = 'touchdown'
FIELD_GOAL = 'field_goal'
PUNT = 'punt'
TURNOVER = 'turnover'
TURNOVER_TD = 'turnover_td' # Defensive/special teams TD
SAFETY = 'safety'
END_OF_HALF = 'end_of_half'
MISSED_FG = 'missed_fg'
@dataclass
class DriveOutcome:
result: DriveResult
points_scored: int
field_position_after: int # Yard line for opponent
time_used: float # Minutes
@dataclass
class GameState:
home_score: int = 0
away_score: int = 0
home_possession: bool = True
quarter: int = 1
time_remaining: float = 15.0 # Minutes in quarter
field_position: int = 25 # Yard line
drive_number: int = 0
class DriveSimulator:
"""
Simulate individual drives based on team efficiency.
Uses empirical probabilities for drive outcomes.
"""
# Base drive outcome probabilities (league average)
BASE_OUTCOMES = {
DriveResult.TOUCHDOWN: 0.22,
DriveResult.FIELD_GOAL: 0.15,
DriveResult.PUNT: 0.38,
DriveResult.TURNOVER: 0.12,
DriveResult.TURNOVER_TD: 0.02,
DriveResult.SAFETY: 0.005,
DriveResult.MISSED_FG: 0.05,
DriveResult.END_OF_HALF: 0.025
}
def __init__(self, home_off_rating: float = 0.0,
away_off_rating: float = 0.0,
home_def_rating: float = 0.0,
away_def_rating: float = 0.0):
"""
Initialize with team ratings.
Ratings are relative to league average (0 = average).
Positive = better, negative = worse.
"""
self.home_off = home_off_rating
self.away_off = away_off_rating
self.home_def = home_def_rating
self.away_def = away_def_rating
def _get_drive_probabilities(self, offense_home: bool) -> Dict[DriveResult, float]:
"""
Get drive outcome probabilities adjusted for teams.
"""
# Get relevant ratings
if offense_home:
off_rating = self.home_off
def_rating = self.away_def
else:
off_rating = self.away_off
def_rating = self.home_def
# Net advantage (positive = offense advantage)
net_advantage = off_rating - def_rating
# Adjust probabilities
probs = self.BASE_OUTCOMES.copy()
# Better offense/worse defense = more scoring
td_adjustment = net_advantage * 0.03 # Each point of rating = 3% TD rate change
fg_adjustment = net_advantage * 0.01
probs[DriveResult.TOUCHDOWN] = np.clip(
probs[DriveResult.TOUCHDOWN] + td_adjustment, 0.10, 0.40
)
probs[DriveResult.FIELD_GOAL] = np.clip(
probs[DriveResult.FIELD_GOAL] + fg_adjustment, 0.08, 0.25
)
# Adjust punts to balance
remaining = 1.0 - sum(probs[k] for k in probs if k != DriveResult.PUNT)
probs[DriveResult.PUNT] = max(0.15, remaining)
# Normalize
total = sum(probs.values())
return {k: v/total for k, v in probs.items()}
def simulate_drive(self, state: GameState) -> DriveOutcome:
"""
Simulate a single drive.
Returns drive outcome with points and field position.
"""
probs = self._get_drive_probabilities(state.home_possession)
# Sample outcome
outcomes = list(probs.keys())
probabilities = list(probs.values())
result = np.random.choice(outcomes, p=probabilities)
# Determine points and field position
if result == DriveResult.TOUCHDOWN:
# Assume ~95% XP success
points = 7 if np.random.random() < 0.95 else 6
field_position = 25 # Kickoff return average
time_used = np.random.uniform(2.0, 4.5)
elif result == DriveResult.FIELD_GOAL:
points = 3
field_position = 25
time_used = np.random.uniform(2.5, 4.0)
elif result == DriveResult.PUNT:
points = 0
field_position = np.random.randint(15, 35) # After punt
time_used = np.random.uniform(1.5, 3.5)
elif result == DriveResult.TURNOVER:
points = 0
# Turnover at random spot
field_position = 100 - np.random.randint(20, 50)
time_used = np.random.uniform(1.0, 3.0)
elif result == DriveResult.TURNOVER_TD:
points = -7 # Opponent scores
field_position = 25
time_used = np.random.uniform(0.5, 2.0)
elif result == DriveResult.SAFETY:
points = -2 # Opponent scores 2
field_position = 35 # Free kick
time_used = np.random.uniform(1.0, 2.5)
elif result == DriveResult.MISSED_FG:
points = 0
field_position = min(80, state.field_position + 10) # Spot of kick
time_used = np.random.uniform(2.0, 3.5)
else: # END_OF_HALF
points = 0
field_position = 25
time_used = state.time_remaining
return DriveOutcome(
result=result,
points_scored=points,
field_position_after=field_position,
time_used=time_used
)
Full Game Simulation
@dataclass
class FullGameResult:
"""Complete game simulation result."""
final_home_score: int
final_away_score: int
home_won: bool
quarter_scores: List[Tuple[int, int]]
drive_log: List[Dict]
total_drives: int
went_to_overtime: bool
class FullGameSimulator:
"""
Complete game simulation engine.
Simulates games drive-by-drive with realistic game flow.
"""
QUARTER_MINUTES = 15.0
OVERTIME_MINUTES = 10.0
def __init__(self, home_off: float = 0, away_off: float = 0,
home_def: float = 0, away_def: float = 0):
self.drive_sim = DriveSimulator(home_off, away_off, home_def, away_def)
def simulate_game(self) -> FullGameResult:
"""Simulate a complete game."""
state = GameState()
drive_log = []
quarter_scores = []
# Coin toss (50/50 for first possession)
state.home_possession = np.random.random() < 0.5
# Simulate regulation
for quarter in range(1, 5):
state.quarter = quarter
state.time_remaining = self.QUARTER_MINUTES
quarter_start_scores = (state.home_score, state.away_score)
while state.time_remaining > 0:
# Simulate drive
outcome = self.drive_sim.simulate_drive(state)
# Update state
if state.home_possession:
state.home_score += max(0, outcome.points_scored)
if outcome.points_scored < 0: # Defensive score
state.away_score += abs(outcome.points_scored)
else:
state.away_score += max(0, outcome.points_scored)
if outcome.points_scored < 0:
state.home_score += abs(outcome.points_scored)
state.time_remaining -= outcome.time_used
state.field_position = outcome.field_position_after
state.home_possession = not state.home_possession
state.drive_number += 1
drive_log.append({
'drive': state.drive_number,
'quarter': quarter,
'offense': 'home' if not state.home_possession else 'away',
'result': outcome.result.value,
'points': outcome.points_scored,
'time_used': outcome.time_used
})
# Record quarter scores
quarter_scores.append((
state.home_score - quarter_start_scores[0],
state.away_score - quarter_start_scores[1]
))
# Check for overtime
went_to_overtime = False
if state.home_score == state.away_score:
went_to_overtime = True
self._simulate_overtime(state, drive_log)
return FullGameResult(
final_home_score=state.home_score,
final_away_score=state.away_score,
home_won=state.home_score > state.away_score,
quarter_scores=quarter_scores,
drive_log=drive_log,
total_drives=state.drive_number,
went_to_overtime=went_to_overtime
)
def _simulate_overtime(self, state: GameState, drive_log: List):
"""Simulate overtime period (simplified)."""
state.time_remaining = self.OVERTIME_MINUTES
state.quarter = 5
# Each team gets at least one possession
for _ in range(2): # Initial possessions
outcome = self.drive_sim.simulate_drive(state)
if state.home_possession:
state.home_score += max(0, outcome.points_scored)
else:
state.away_score += max(0, outcome.points_scored)
state.home_possession = not state.home_possession
if state.home_score != state.away_score:
break
# If still tied, sudden death
while state.home_score == state.away_score and state.time_remaining > 0:
outcome = self.drive_sim.simulate_drive(state)
if state.home_possession:
state.home_score += max(0, outcome.points_scored)
else:
state.away_score += max(0, outcome.points_scored)
state.time_remaining -= outcome.time_used
state.home_possession = not state.home_possession
def simulate_many(self, n_sims: int = 10000) -> List[FullGameResult]:
"""Run multiple simulations."""
return [self.simulate_game() for _ in range(n_sims)]
Part 4: Season Simulation
Simulating Full Seasons
Beyond single games, simulation enables season-level analysis:
@dataclass
class SeasonResult:
"""Results from a single season simulation."""
team_records: Dict[str, Tuple[int, int, int]] # W-L-T
division_winners: List[str]
wildcard_teams: List[str]
playoff_bracket: Dict
super_bowl_winner: str
super_bowl_loser: str
class SeasonSimulator:
"""
Simulate full NFL seasons.
Uses game simulation for each matchup.
"""
def __init__(self, team_ratings: Dict[str, float],
schedule: List[Dict]):
"""
Initialize season simulator.
Args:
team_ratings: {team: rating} dictionary
schedule: List of games with home_team, away_team, week
"""
self.ratings = team_ratings
self.schedule = schedule
def simulate_season(self) -> SeasonResult:
"""Simulate one complete season."""
records = {team: [0, 0, 0] for team in self.ratings} # W, L, T
# Simulate each game
for game in self.schedule:
home = game['home_team']
away = game['away_team']
# Simple simulation based on ratings
home_rating = self.ratings.get(home, 0)
away_rating = self.ratings.get(away, 0)
# Expected margin
expected_margin = home_rating - away_rating + 2.5 # HFA
# Add randomness
actual_margin = expected_margin + np.random.normal(0, 13.5)
if actual_margin > 0:
records[home][0] += 1
records[away][1] += 1
elif actual_margin < 0:
records[home][1] += 1
records[away][0] += 1
else:
records[home][2] += 1
records[away][2] += 1
# Determine playoff seeding (simplified)
standings = sorted(
records.items(),
key=lambda x: (x[1][0], -x[1][1]), # Wins, then fewer losses
reverse=True
)
# Top 7 teams make playoffs per conference (simplified)
playoff_teams = [team for team, _ in standings[:14]]
return SeasonResult(
team_records={team: tuple(rec) for team, rec in records.items()},
division_winners=playoff_teams[:4],
wildcard_teams=playoff_teams[4:7],
playoff_bracket={},
super_bowl_winner=playoff_teams[0],
super_bowl_loser=playoff_teams[1]
)
def simulate_many_seasons(self, n_seasons: int = 10000) -> Dict:
"""
Simulate many seasons and aggregate results.
Returns probability distributions for key outcomes.
"""
results = {
'playoff_appearances': {team: 0 for team in self.ratings},
'division_titles': {team: 0 for team in self.ratings},
'super_bowl_appearances': {team: 0 for team in self.ratings},
'super_bowl_wins': {team: 0 for team in self.ratings},
'win_distributions': {team: [] for team in self.ratings}
}
for _ in range(n_seasons):
season = self.simulate_season()
for team, record in season.team_records.items():
results['win_distributions'][team].append(record[0])
for team in season.division_winners:
results['division_titles'][team] += 1
results['playoff_appearances'][team] += 1
for team in season.wildcard_teams:
results['playoff_appearances'][team] += 1
results['super_bowl_appearances'][season.super_bowl_winner] += 1
results['super_bowl_appearances'][season.super_bowl_loser] += 1
results['super_bowl_wins'][season.super_bowl_winner] += 1
# Convert to probabilities
for key in ['playoff_appearances', 'division_titles',
'super_bowl_appearances', 'super_bowl_wins']:
results[key] = {team: count / n_seasons
for team, count in results[key].items()}
return results
Part 5: In-Game Win Probability
Live Simulation
Simulation enables real-time win probability updates:
class LiveWinProbability:
"""
Calculate win probability during a game.
Uses simulation from current game state.
"""
def __init__(self, home_off: float = 0, away_off: float = 0,
home_def: float = 0, away_def: float = 0):
self.simulator = FullGameSimulator(home_off, away_off, home_def, away_def)
def calculate_win_probability(self, home_score: int, away_score: int,
quarter: int, time_remaining: float,
home_possession: bool,
n_sims: int = 5000) -> Dict:
"""
Calculate win probability from current state.
Args:
home_score: Current home team score
away_score: Current away team score
quarter: Current quarter (1-4, 5 for OT)
time_remaining: Minutes remaining in quarter
home_possession: True if home team has ball
n_sims: Number of simulations to run
"""
home_wins = 0
away_wins = 0
ties = 0
for _ in range(n_sims):
# Complete game from current state
final_home, final_away = self._complete_game(
home_score, away_score, quarter,
time_remaining, home_possession
)
if final_home > final_away:
home_wins += 1
elif final_away > final_home:
away_wins += 1
else:
ties += 1
return {
'home_win_prob': home_wins / n_sims,
'away_win_prob': away_wins / n_sims,
'tie_prob': ties / n_sims,
'home_lead': home_score - away_score,
'game_situation': self._describe_situation(
home_score, away_score, quarter, time_remaining
)
}
def _complete_game(self, home_score: int, away_score: int,
quarter: int, time_remaining: float,
home_possession: bool) -> Tuple[int, int]:
"""Complete game from current state."""
state = GameState(
home_score=home_score,
away_score=away_score,
quarter=quarter,
time_remaining=time_remaining,
home_possession=home_possession
)
# Complete remaining quarters
while state.quarter <= 4:
while state.time_remaining > 0:
outcome = self.simulator.drive_sim.simulate_drive(state)
if state.home_possession:
state.home_score += max(0, outcome.points_scored)
else:
state.away_score += max(0, outcome.points_scored)
state.time_remaining -= outcome.time_used
state.home_possession = not state.home_possession
state.quarter += 1
state.time_remaining = 15.0
return state.home_score, state.away_score
def _describe_situation(self, home_score: int, away_score: int,
quarter: int, time_remaining: float) -> str:
"""Generate human-readable situation description."""
lead = home_score - away_score
if quarter == 4 and time_remaining < 2:
if abs(lead) <= 8:
return "Close game, final minutes"
elif lead > 8:
return "Home team likely to win"
else:
return "Away team likely to win"
elif quarter <= 2:
return "Early game, much uncertainty"
else:
if abs(lead) <= 3:
return "Competitive game"
elif abs(lead) <= 10:
return f"{'Home' if lead > 0 else 'Away'} team has lead"
else:
return f"{'Home' if lead > 0 else 'Away'} team in control"
Part 6: Simulation Validation
How Good Is the Simulator?
Validation requires comparing simulated distributions to historical reality:
class SimulationValidator:
"""
Validate simulation models against historical data.
"""
def validate_score_distribution(self, simulated_scores: List[int],
historical_scores: List[int]) -> Dict:
"""
Compare simulated and historical score distributions.
Uses Kolmogorov-Smirnov test and summary statistics.
"""
from scipy import stats
# KS test
ks_stat, ks_pvalue = stats.ks_2samp(simulated_scores, historical_scores)
# Summary comparison
return {
'simulated_mean': np.mean(simulated_scores),
'historical_mean': np.mean(historical_scores),
'mean_diff': np.mean(simulated_scores) - np.mean(historical_scores),
'simulated_std': np.std(simulated_scores),
'historical_std': np.std(historical_scores),
'std_diff': np.std(simulated_scores) - np.std(historical_scores),
'ks_statistic': ks_stat,
'ks_pvalue': ks_pvalue,
'distributions_similar': ks_pvalue > 0.05
}
def validate_win_probabilities(self, predictions: List[float],
outcomes: List[int],
n_bins: int = 10) -> Dict:
"""
Check if predicted probabilities are calibrated.
"""
df = pd.DataFrame({'pred': predictions, 'outcome': outcomes})
df['bin'] = pd.cut(df['pred'], bins=n_bins, labels=False)
calibration = []
for bin_num in range(n_bins):
bin_data = df[df['bin'] == bin_num]
if len(bin_data) > 0:
calibration.append({
'predicted': bin_data['pred'].mean(),
'actual': bin_data['outcome'].mean(),
'count': len(bin_data)
})
calibration_df = pd.DataFrame(calibration)
mae = np.mean(np.abs(
calibration_df['predicted'] - calibration_df['actual']
))
return {
'calibration': calibration_df,
'mean_absolute_error': mae,
'well_calibrated': mae < 0.05
}
def validate_margin_distribution(self, simulated_margins: List[int],
historical_margins: List[int]) -> Dict:
"""
Compare margin of victory distributions.
"""
from scipy import stats
ks_stat, ks_pvalue = stats.ks_2samp(simulated_margins, historical_margins)
# Check specific probabilities
sim_close = np.mean(np.abs(simulated_margins) <= 7)
hist_close = np.mean(np.abs(np.array(historical_margins)) <= 7)
sim_blowout = np.mean(np.abs(simulated_margins) >= 21)
hist_blowout = np.mean(np.abs(np.array(historical_margins)) >= 21)
return {
'ks_statistic': ks_stat,
'ks_pvalue': ks_pvalue,
'simulated_close_game_pct': sim_close,
'historical_close_game_pct': hist_close,
'simulated_blowout_pct': sim_blowout,
'historical_blowout_pct': hist_blowout,
'valid': ks_pvalue > 0.05
}
Part 7: Practical Applications
Fantasy Football Projections
Simulation provides range of outcomes for fantasy:
def fantasy_projection_range(player_mean: float, player_std: float,
n_sims: int = 10000) -> Dict:
"""
Generate fantasy point projection distribution.
"""
projections = np.random.normal(player_mean, player_std, n_sims)
projections = np.maximum(0, projections) # No negative points
return {
'median': np.median(projections),
'mean': np.mean(projections),
'floor': np.percentile(projections, 10),
'ceiling': np.percentile(projections, 90),
'bust_prob': np.mean(projections < player_mean * 0.5),
'boom_prob': np.mean(projections > player_mean * 1.5)
}
Playoff Scenario Analysis
What needs to happen for a team to make playoffs?
def playoff_scenario_analysis(team: str, current_record: Tuple[int, int],
remaining_schedule: List[Dict],
other_teams: Dict[str, Tuple[int, int]],
team_ratings: Dict[str, float],
n_sims: int = 10000) -> Dict:
"""
Analyze playoff scenarios for a team.
"""
makes_playoff = 0
wins_division = 0
gets_bye = 0
for _ in range(n_sims):
# Simulate remaining games
final_record = simulate_remaining_season(
team, current_record, remaining_schedule, team_ratings
)
# Check playoff implications
if final_record[0] >= 10: # Simplified playoff threshold
makes_playoff += 1
if final_record[0] >= 12:
wins_division += 1
if final_record[0] >= 13:
gets_bye += 1
return {
'playoff_prob': makes_playoff / n_sims,
'division_title_prob': wins_division / n_sims,
'bye_prob': gets_bye / n_sims
}
Limitations and Appropriate Use
When Simulation Helps
- Understanding uncertainty: Full distributions, not just point estimates
- Scenario analysis: "What if" explorations
- Risk quantification: Probability of extreme outcomes
- Path-dependent analysis: How games might unfold
When Simulation Misleads
- False precision: More simulations don't fix bad models
- Assumption sensitivity: Results depend heavily on input parameters
- Computational cost: Complex simulations can be slow
- Over-interpretation: Simulated scenarios aren't predictions
Best Practices
- Validate against history: Ensure distributions match reality
- Sensitivity analysis: Test how results change with different inputs
- Report uncertainty: Simulations have their own error
- Use appropriate models: Match complexity to available data
Summary
Game simulation extends prediction from point estimates to full probability distributions. Key concepts:
- Monte Carlo basics: Random sampling to explore outcome space
- Score-based models: Simple but effective for many applications
- Drive-by-drive simulation: Captures game dynamics and flow
- Season simulation: Playoff probabilities and standings distributions
- Live win probability: Real-time updates during games
- Validation: Ensuring simulated distributions match reality
Simulation's power lies in revealing uncertainty—not just who might win, but the full range of possible outcomes.
Looking Ahead
Chapter 22 Preview: Next, we'll explore Betting Market Analysis—understanding how betting markets work, what market prices tell us about team strength, and how to evaluate model performance against market efficiency.
Chapter Summary
Simulation transforms static predictions into dynamic explorations of possibility. By generating thousands of potential outcomes, we can answer questions that point estimates cannot: What's the probability of a blowout? How likely is overtime? What are the team's playoff chances?
The key to good simulation is appropriate model complexity. Simple score-based models work well for many applications; drive-by-drive simulation adds realism at computational cost. Regardless of approach, validation against historical data ensures simulated distributions reflect reality.
For practitioners, simulation is a tool for understanding uncertainty. The goal isn't to predict exactly what will happen—that's impossible in football—but to map the landscape of what might happen and how likely each scenario is.