5 min read

Sometimes a single prediction isn't enough. Instead of asking "Who will win?" we want to explore the full range of possibilities: "What might happen?" Game simulation uses Monte Carlo methods to generate thousands of potential outcomes, producing...

Chapter 21: Game Simulation

When analytics meets Monte Carlo


Introduction

Sometimes a single prediction isn't enough. Instead of asking "Who will win?" we want to explore the full range of possibilities: "What might happen?" Game simulation uses Monte Carlo methods to generate thousands of potential outcomes, producing probability distributions for any metric—win probability, final score, margin of victory, or even specific game scenarios.

This chapter covers the theory and practice of NFL game simulation. We'll build simulators from simple score-based models to sophisticated drive-by-drive engines, understand when simulation adds value, and learn to interpret simulation outputs appropriately.


Learning Objectives

By the end of this chapter, you will be able to:

  1. Explain the Monte Carlo method and its application to sports
  2. Build a basic game simulation model
  3. Implement drive-by-drive game simulation
  4. Generate and interpret probability distributions from simulations
  5. Simulate season outcomes and playoff scenarios
  6. Validate simulation models against historical data
  7. Understand the limitations and appropriate uses of simulation

Part 1: Monte Carlo Fundamentals

What Is Monte Carlo Simulation?

Monte Carlo simulation uses random sampling to explore the space of possible outcomes. Named after the famous casino, the method generates many random scenarios according to specified probability distributions, then aggregates results to estimate probabilities of interest.

For NFL games, this means: 1. Model how scores or plays are generated 2. Run thousands of simulated games 3. Analyze the distribution of outcomes

Why Simulate?

Single-point predictions have limitations: - A predicted spread of -7 doesn't tell us the probability of winning by 14+ - Win probability doesn't reveal the distribution of possible scores - Uncertainty isn't captured by point estimates

Simulation provides: - Full probability distributions - Confidence intervals - Scenario analysis ("What if?" questions) - Path-dependent outcomes (how games unfold)

Core Concepts

import numpy as np
from scipy import stats
from typing import Dict, List, Tuple
from dataclasses import dataclass


@dataclass
class SimulationResult:
    """Container for simulation outcomes."""
    home_wins: int
    away_wins: int
    ties: int
    home_scores: List[int]
    away_scores: List[int]
    margins: List[int]
    n_simulations: int

    @property
    def home_win_pct(self) -> float:
        return self.home_wins / self.n_simulations

    @property
    def mean_margin(self) -> float:
        return np.mean(self.margins)

    @property
    def margin_std(self) -> float:
        return np.std(self.margins)

    def probability_of_margin(self, threshold: int) -> float:
        """P(margin >= threshold) for home team."""
        return sum(1 for m in self.margins if m >= threshold) / self.n_simulations


def basic_monte_carlo_demo(home_mean: float = 24, away_mean: float = 21,
                            score_std: float = 10, n_sims: int = 10000) -> SimulationResult:
    """
    Basic Monte Carlo game simulation.

    Assumes scores are normally distributed (simplified).
    """
    # Generate random scores
    home_scores = np.random.normal(home_mean, score_std, n_sims)
    away_scores = np.random.normal(away_mean, score_std, n_sims)

    # Round to integers and ensure non-negative
    home_scores = np.maximum(0, np.round(home_scores)).astype(int)
    away_scores = np.maximum(0, np.round(away_scores)).astype(int)

    # Calculate outcomes
    margins = home_scores - away_scores
    home_wins = np.sum(margins > 0)
    away_wins = np.sum(margins < 0)
    ties = np.sum(margins == 0)

    return SimulationResult(
        home_wins=int(home_wins),
        away_wins=int(away_wins),
        ties=int(ties),
        home_scores=home_scores.tolist(),
        away_scores=away_scores.tolist(),
        margins=margins.tolist(),
        n_simulations=n_sims
    )

How Many Simulations?

The number of simulations affects precision:

Simulations Standard Error (for 50% probability)
1,000 ±1.58%
10,000 ±0.50%
100,000 ±0.16%
1,000,000 ±0.05%

For most NFL applications, 10,000-100,000 simulations provide sufficient precision.


Part 2: Score-Based Game Simulation

The Normal Distribution Model

The simplest game simulation assumes scores follow normal distributions:

$$Home \sim N(\mu_{home}, \sigma)$$ $$Away \sim N(\mu_{away}, \sigma)$$

Where $\mu$ is expected score and $\sigma$ is standard deviation.

class ScoreBasedSimulator:
    """
    Game simulation using score distributions.

    Models team scores as correlated normal distributions.
    """

    def __init__(self, score_std: float = 10.0,
                 correlation: float = 0.1,
                 min_score: int = 0,
                 home_advantage: float = 2.5):
        """
        Initialize simulator.

        Args:
            score_std: Standard deviation of individual team scores
            correlation: Correlation between team scores (game pace)
            min_score: Minimum possible score
            home_advantage: Points added to home team expected score
        """
        self.score_std = score_std
        self.correlation = correlation
        self.min_score = min_score
        self.home_advantage = home_advantage

    def simulate_game(self, home_rating: float, away_rating: float,
                      n_sims: int = 10000) -> SimulationResult:
        """
        Simulate game outcomes.

        Args:
            home_rating: Home team's average points scored
            away_rating: Away team's average points scored

        Returns:
            SimulationResult with all outcomes
        """
        # Adjust for home advantage
        home_mean = home_rating + self.home_advantage
        away_mean = away_rating

        # Generate correlated scores
        # Using bivariate normal distribution
        mean = [home_mean, away_mean]
        cov = [[self.score_std**2, self.correlation * self.score_std**2],
               [self.correlation * self.score_std**2, self.score_std**2]]

        scores = np.random.multivariate_normal(mean, cov, n_sims)

        home_scores = np.maximum(self.min_score, np.round(scores[:, 0])).astype(int)
        away_scores = np.maximum(self.min_score, np.round(scores[:, 1])).astype(int)

        margins = home_scores - away_scores

        return SimulationResult(
            home_wins=int(np.sum(margins > 0)),
            away_wins=int(np.sum(margins < 0)),
            ties=int(np.sum(margins == 0)),
            home_scores=home_scores.tolist(),
            away_scores=away_scores.tolist(),
            margins=margins.tolist(),
            n_simulations=n_sims
        )

    def probability_distribution(self, result: SimulationResult) -> Dict:
        """
        Generate probability distribution from results.
        """
        margins = np.array(result.margins)

        # Key probabilities
        return {
            'home_win_prob': result.home_win_pct,
            'away_win_prob': result.away_wins / result.n_simulations,
            'tie_prob': result.ties / result.n_simulations,
            'home_cover_7': np.mean(margins >= 7),
            'home_cover_3': np.mean(margins >= 3),
            'home_cover_minus_3': np.mean(margins >= -3),
            'blowout_home': np.mean(margins >= 14),
            'blowout_away': np.mean(margins <= -14),
            'one_score_game': np.mean(np.abs(margins) <= 8),
            'mean_margin': float(np.mean(margins)),
            'std_margin': float(np.std(margins)),
            'percentile_10': float(np.percentile(margins, 10)),
            'percentile_50': float(np.percentile(margins, 50)),
            'percentile_90': float(np.percentile(margins, 90))
        }

Score Correlation

In real NFL games, team scores are slightly correlated due to: - Game pace (high-scoring games tend to have both teams score more) - Weather conditions affecting both teams - Defensive style (low-scoring games affect both teams)

Typical correlation: 0.10-0.15

Realistic Score Distributions

NFL scores aren't truly normally distributed—they're discrete and have specific patterns:

class RealisticScoreSimulator(ScoreBasedSimulator):
    """
    Simulator with realistic NFL score distributions.

    Accounts for:
    - Common score patterns (3, 7, 10 multiples)
    - Score clustering
    - Non-normal distribution shape
    """

    # Common NFL final scores (empirical frequencies)
    COMMON_SCORES = {
        0: 0.02, 3: 0.05, 6: 0.04, 7: 0.06, 9: 0.02, 10: 0.07,
        13: 0.06, 14: 0.05, 16: 0.04, 17: 0.08, 19: 0.02, 20: 0.07,
        21: 0.05, 23: 0.04, 24: 0.07, 26: 0.03, 27: 0.05, 28: 0.04,
        30: 0.04, 31: 0.04, 34: 0.03, 35: 0.02, 37: 0.02, 38: 0.02
    }

    def _adjust_to_realistic_score(self, raw_score: float) -> int:
        """
        Adjust continuous score to realistic NFL score.

        Maps to common score values with appropriate probabilities.
        """
        # Round to nearest likely score
        base_score = int(round(raw_score))

        # Find closest common score
        common = list(self.COMMON_SCORES.keys())
        closest = min(common, key=lambda x: abs(x - base_score))

        # With some probability, use the closest common score
        if np.random.random() < 0.3:
            return closest
        return max(0, base_score)

    def simulate_game(self, home_rating: float, away_rating: float,
                      n_sims: int = 10000) -> SimulationResult:
        """Simulate with realistic score adjustments."""
        # Get base simulation
        base_result = super().simulate_game(home_rating, away_rating, n_sims)

        # Adjust scores for realism
        home_scores = [self._adjust_to_realistic_score(s)
                      for s in base_result.home_scores]
        away_scores = [self._adjust_to_realistic_score(s)
                      for s in base_result.away_scores]

        margins = [h - a for h, a in zip(home_scores, away_scores)]

        return SimulationResult(
            home_wins=sum(1 for m in margins if m > 0),
            away_wins=sum(1 for m in margins if m < 0),
            ties=sum(1 for m in margins if m == 0),
            home_scores=home_scores,
            away_scores=away_scores,
            margins=margins,
            n_simulations=n_sims
        )

Part 3: Drive-by-Drive Simulation

Beyond Final Scores

Score-based simulation captures outcomes but not how games unfold. Drive-by-drive simulation models the actual game process:

  1. Coin toss determines first possession
  2. Drives alternate between teams
  3. Each drive ends in: touchdown, field goal, punt, turnover, or end of half/game
  4. Score accumulates through drives
  5. Game ends after regulation (or overtime if tied)

Drive Outcome Model

from enum import Enum
from dataclasses import dataclass, field
from typing import Optional


class DriveResult(Enum):
    TOUCHDOWN = 'touchdown'
    FIELD_GOAL = 'field_goal'
    PUNT = 'punt'
    TURNOVER = 'turnover'
    TURNOVER_TD = 'turnover_td'  # Defensive/special teams TD
    SAFETY = 'safety'
    END_OF_HALF = 'end_of_half'
    MISSED_FG = 'missed_fg'


@dataclass
class DriveOutcome:
    result: DriveResult
    points_scored: int
    field_position_after: int  # Yard line for opponent
    time_used: float  # Minutes


@dataclass
class GameState:
    home_score: int = 0
    away_score: int = 0
    home_possession: bool = True
    quarter: int = 1
    time_remaining: float = 15.0  # Minutes in quarter
    field_position: int = 25  # Yard line
    drive_number: int = 0


class DriveSimulator:
    """
    Simulate individual drives based on team efficiency.

    Uses empirical probabilities for drive outcomes.
    """

    # Base drive outcome probabilities (league average)
    BASE_OUTCOMES = {
        DriveResult.TOUCHDOWN: 0.22,
        DriveResult.FIELD_GOAL: 0.15,
        DriveResult.PUNT: 0.38,
        DriveResult.TURNOVER: 0.12,
        DriveResult.TURNOVER_TD: 0.02,
        DriveResult.SAFETY: 0.005,
        DriveResult.MISSED_FG: 0.05,
        DriveResult.END_OF_HALF: 0.025
    }

    def __init__(self, home_off_rating: float = 0.0,
                 away_off_rating: float = 0.0,
                 home_def_rating: float = 0.0,
                 away_def_rating: float = 0.0):
        """
        Initialize with team ratings.

        Ratings are relative to league average (0 = average).
        Positive = better, negative = worse.
        """
        self.home_off = home_off_rating
        self.away_off = away_off_rating
        self.home_def = home_def_rating
        self.away_def = away_def_rating

    def _get_drive_probabilities(self, offense_home: bool) -> Dict[DriveResult, float]:
        """
        Get drive outcome probabilities adjusted for teams.
        """
        # Get relevant ratings
        if offense_home:
            off_rating = self.home_off
            def_rating = self.away_def
        else:
            off_rating = self.away_off
            def_rating = self.home_def

        # Net advantage (positive = offense advantage)
        net_advantage = off_rating - def_rating

        # Adjust probabilities
        probs = self.BASE_OUTCOMES.copy()

        # Better offense/worse defense = more scoring
        td_adjustment = net_advantage * 0.03  # Each point of rating = 3% TD rate change
        fg_adjustment = net_advantage * 0.01

        probs[DriveResult.TOUCHDOWN] = np.clip(
            probs[DriveResult.TOUCHDOWN] + td_adjustment, 0.10, 0.40
        )
        probs[DriveResult.FIELD_GOAL] = np.clip(
            probs[DriveResult.FIELD_GOAL] + fg_adjustment, 0.08, 0.25
        )

        # Adjust punts to balance
        remaining = 1.0 - sum(probs[k] for k in probs if k != DriveResult.PUNT)
        probs[DriveResult.PUNT] = max(0.15, remaining)

        # Normalize
        total = sum(probs.values())
        return {k: v/total for k, v in probs.items()}

    def simulate_drive(self, state: GameState) -> DriveOutcome:
        """
        Simulate a single drive.

        Returns drive outcome with points and field position.
        """
        probs = self._get_drive_probabilities(state.home_possession)

        # Sample outcome
        outcomes = list(probs.keys())
        probabilities = list(probs.values())
        result = np.random.choice(outcomes, p=probabilities)

        # Determine points and field position
        if result == DriveResult.TOUCHDOWN:
            # Assume ~95% XP success
            points = 7 if np.random.random() < 0.95 else 6
            field_position = 25  # Kickoff return average
            time_used = np.random.uniform(2.0, 4.5)

        elif result == DriveResult.FIELD_GOAL:
            points = 3
            field_position = 25
            time_used = np.random.uniform(2.5, 4.0)

        elif result == DriveResult.PUNT:
            points = 0
            field_position = np.random.randint(15, 35)  # After punt
            time_used = np.random.uniform(1.5, 3.5)

        elif result == DriveResult.TURNOVER:
            points = 0
            # Turnover at random spot
            field_position = 100 - np.random.randint(20, 50)
            time_used = np.random.uniform(1.0, 3.0)

        elif result == DriveResult.TURNOVER_TD:
            points = -7  # Opponent scores
            field_position = 25
            time_used = np.random.uniform(0.5, 2.0)

        elif result == DriveResult.SAFETY:
            points = -2  # Opponent scores 2
            field_position = 35  # Free kick
            time_used = np.random.uniform(1.0, 2.5)

        elif result == DriveResult.MISSED_FG:
            points = 0
            field_position = min(80, state.field_position + 10)  # Spot of kick
            time_used = np.random.uniform(2.0, 3.5)

        else:  # END_OF_HALF
            points = 0
            field_position = 25
            time_used = state.time_remaining

        return DriveOutcome(
            result=result,
            points_scored=points,
            field_position_after=field_position,
            time_used=time_used
        )

Full Game Simulation

@dataclass
class FullGameResult:
    """Complete game simulation result."""
    final_home_score: int
    final_away_score: int
    home_won: bool
    quarter_scores: List[Tuple[int, int]]
    drive_log: List[Dict]
    total_drives: int
    went_to_overtime: bool


class FullGameSimulator:
    """
    Complete game simulation engine.

    Simulates games drive-by-drive with realistic game flow.
    """

    QUARTER_MINUTES = 15.0
    OVERTIME_MINUTES = 10.0

    def __init__(self, home_off: float = 0, away_off: float = 0,
                 home_def: float = 0, away_def: float = 0):
        self.drive_sim = DriveSimulator(home_off, away_off, home_def, away_def)

    def simulate_game(self) -> FullGameResult:
        """Simulate a complete game."""
        state = GameState()
        drive_log = []
        quarter_scores = []

        # Coin toss (50/50 for first possession)
        state.home_possession = np.random.random() < 0.5

        # Simulate regulation
        for quarter in range(1, 5):
            state.quarter = quarter
            state.time_remaining = self.QUARTER_MINUTES

            quarter_start_scores = (state.home_score, state.away_score)

            while state.time_remaining > 0:
                # Simulate drive
                outcome = self.drive_sim.simulate_drive(state)

                # Update state
                if state.home_possession:
                    state.home_score += max(0, outcome.points_scored)
                    if outcome.points_scored < 0:  # Defensive score
                        state.away_score += abs(outcome.points_scored)
                else:
                    state.away_score += max(0, outcome.points_scored)
                    if outcome.points_scored < 0:
                        state.home_score += abs(outcome.points_scored)

                state.time_remaining -= outcome.time_used
                state.field_position = outcome.field_position_after
                state.home_possession = not state.home_possession
                state.drive_number += 1

                drive_log.append({
                    'drive': state.drive_number,
                    'quarter': quarter,
                    'offense': 'home' if not state.home_possession else 'away',
                    'result': outcome.result.value,
                    'points': outcome.points_scored,
                    'time_used': outcome.time_used
                })

            # Record quarter scores
            quarter_scores.append((
                state.home_score - quarter_start_scores[0],
                state.away_score - quarter_start_scores[1]
            ))

        # Check for overtime
        went_to_overtime = False
        if state.home_score == state.away_score:
            went_to_overtime = True
            self._simulate_overtime(state, drive_log)

        return FullGameResult(
            final_home_score=state.home_score,
            final_away_score=state.away_score,
            home_won=state.home_score > state.away_score,
            quarter_scores=quarter_scores,
            drive_log=drive_log,
            total_drives=state.drive_number,
            went_to_overtime=went_to_overtime
        )

    def _simulate_overtime(self, state: GameState, drive_log: List):
        """Simulate overtime period (simplified)."""
        state.time_remaining = self.OVERTIME_MINUTES
        state.quarter = 5

        # Each team gets at least one possession
        for _ in range(2):  # Initial possessions
            outcome = self.drive_sim.simulate_drive(state)

            if state.home_possession:
                state.home_score += max(0, outcome.points_scored)
            else:
                state.away_score += max(0, outcome.points_scored)

            state.home_possession = not state.home_possession

            if state.home_score != state.away_score:
                break

        # If still tied, sudden death
        while state.home_score == state.away_score and state.time_remaining > 0:
            outcome = self.drive_sim.simulate_drive(state)

            if state.home_possession:
                state.home_score += max(0, outcome.points_scored)
            else:
                state.away_score += max(0, outcome.points_scored)

            state.time_remaining -= outcome.time_used
            state.home_possession = not state.home_possession

    def simulate_many(self, n_sims: int = 10000) -> List[FullGameResult]:
        """Run multiple simulations."""
        return [self.simulate_game() for _ in range(n_sims)]

Part 4: Season Simulation

Simulating Full Seasons

Beyond single games, simulation enables season-level analysis:

@dataclass
class SeasonResult:
    """Results from a single season simulation."""
    team_records: Dict[str, Tuple[int, int, int]]  # W-L-T
    division_winners: List[str]
    wildcard_teams: List[str]
    playoff_bracket: Dict
    super_bowl_winner: str
    super_bowl_loser: str


class SeasonSimulator:
    """
    Simulate full NFL seasons.

    Uses game simulation for each matchup.
    """

    def __init__(self, team_ratings: Dict[str, float],
                 schedule: List[Dict]):
        """
        Initialize season simulator.

        Args:
            team_ratings: {team: rating} dictionary
            schedule: List of games with home_team, away_team, week
        """
        self.ratings = team_ratings
        self.schedule = schedule

    def simulate_season(self) -> SeasonResult:
        """Simulate one complete season."""
        records = {team: [0, 0, 0] for team in self.ratings}  # W, L, T

        # Simulate each game
        for game in self.schedule:
            home = game['home_team']
            away = game['away_team']

            # Simple simulation based on ratings
            home_rating = self.ratings.get(home, 0)
            away_rating = self.ratings.get(away, 0)

            # Expected margin
            expected_margin = home_rating - away_rating + 2.5  # HFA

            # Add randomness
            actual_margin = expected_margin + np.random.normal(0, 13.5)

            if actual_margin > 0:
                records[home][0] += 1
                records[away][1] += 1
            elif actual_margin < 0:
                records[home][1] += 1
                records[away][0] += 1
            else:
                records[home][2] += 1
                records[away][2] += 1

        # Determine playoff seeding (simplified)
        standings = sorted(
            records.items(),
            key=lambda x: (x[1][0], -x[1][1]),  # Wins, then fewer losses
            reverse=True
        )

        # Top 7 teams make playoffs per conference (simplified)
        playoff_teams = [team for team, _ in standings[:14]]

        return SeasonResult(
            team_records={team: tuple(rec) for team, rec in records.items()},
            division_winners=playoff_teams[:4],
            wildcard_teams=playoff_teams[4:7],
            playoff_bracket={},
            super_bowl_winner=playoff_teams[0],
            super_bowl_loser=playoff_teams[1]
        )

    def simulate_many_seasons(self, n_seasons: int = 10000) -> Dict:
        """
        Simulate many seasons and aggregate results.

        Returns probability distributions for key outcomes.
        """
        results = {
            'playoff_appearances': {team: 0 for team in self.ratings},
            'division_titles': {team: 0 for team in self.ratings},
            'super_bowl_appearances': {team: 0 for team in self.ratings},
            'super_bowl_wins': {team: 0 for team in self.ratings},
            'win_distributions': {team: [] for team in self.ratings}
        }

        for _ in range(n_seasons):
            season = self.simulate_season()

            for team, record in season.team_records.items():
                results['win_distributions'][team].append(record[0])

            for team in season.division_winners:
                results['division_titles'][team] += 1
                results['playoff_appearances'][team] += 1

            for team in season.wildcard_teams:
                results['playoff_appearances'][team] += 1

            results['super_bowl_appearances'][season.super_bowl_winner] += 1
            results['super_bowl_appearances'][season.super_bowl_loser] += 1
            results['super_bowl_wins'][season.super_bowl_winner] += 1

        # Convert to probabilities
        for key in ['playoff_appearances', 'division_titles',
                    'super_bowl_appearances', 'super_bowl_wins']:
            results[key] = {team: count / n_seasons
                           for team, count in results[key].items()}

        return results

Part 5: In-Game Win Probability

Live Simulation

Simulation enables real-time win probability updates:

class LiveWinProbability:
    """
    Calculate win probability during a game.

    Uses simulation from current game state.
    """

    def __init__(self, home_off: float = 0, away_off: float = 0,
                 home_def: float = 0, away_def: float = 0):
        self.simulator = FullGameSimulator(home_off, away_off, home_def, away_def)

    def calculate_win_probability(self, home_score: int, away_score: int,
                                   quarter: int, time_remaining: float,
                                   home_possession: bool,
                                   n_sims: int = 5000) -> Dict:
        """
        Calculate win probability from current state.

        Args:
            home_score: Current home team score
            away_score: Current away team score
            quarter: Current quarter (1-4, 5 for OT)
            time_remaining: Minutes remaining in quarter
            home_possession: True if home team has ball
            n_sims: Number of simulations to run
        """
        home_wins = 0
        away_wins = 0
        ties = 0

        for _ in range(n_sims):
            # Complete game from current state
            final_home, final_away = self._complete_game(
                home_score, away_score, quarter,
                time_remaining, home_possession
            )

            if final_home > final_away:
                home_wins += 1
            elif final_away > final_home:
                away_wins += 1
            else:
                ties += 1

        return {
            'home_win_prob': home_wins / n_sims,
            'away_win_prob': away_wins / n_sims,
            'tie_prob': ties / n_sims,
            'home_lead': home_score - away_score,
            'game_situation': self._describe_situation(
                home_score, away_score, quarter, time_remaining
            )
        }

    def _complete_game(self, home_score: int, away_score: int,
                        quarter: int, time_remaining: float,
                        home_possession: bool) -> Tuple[int, int]:
        """Complete game from current state."""
        state = GameState(
            home_score=home_score,
            away_score=away_score,
            quarter=quarter,
            time_remaining=time_remaining,
            home_possession=home_possession
        )

        # Complete remaining quarters
        while state.quarter <= 4:
            while state.time_remaining > 0:
                outcome = self.simulator.drive_sim.simulate_drive(state)

                if state.home_possession:
                    state.home_score += max(0, outcome.points_scored)
                else:
                    state.away_score += max(0, outcome.points_scored)

                state.time_remaining -= outcome.time_used
                state.home_possession = not state.home_possession

            state.quarter += 1
            state.time_remaining = 15.0

        return state.home_score, state.away_score

    def _describe_situation(self, home_score: int, away_score: int,
                            quarter: int, time_remaining: float) -> str:
        """Generate human-readable situation description."""
        lead = home_score - away_score

        if quarter == 4 and time_remaining < 2:
            if abs(lead) <= 8:
                return "Close game, final minutes"
            elif lead > 8:
                return "Home team likely to win"
            else:
                return "Away team likely to win"
        elif quarter <= 2:
            return "Early game, much uncertainty"
        else:
            if abs(lead) <= 3:
                return "Competitive game"
            elif abs(lead) <= 10:
                return f"{'Home' if lead > 0 else 'Away'} team has lead"
            else:
                return f"{'Home' if lead > 0 else 'Away'} team in control"

Part 6: Simulation Validation

How Good Is the Simulator?

Validation requires comparing simulated distributions to historical reality:

class SimulationValidator:
    """
    Validate simulation models against historical data.
    """

    def validate_score_distribution(self, simulated_scores: List[int],
                                     historical_scores: List[int]) -> Dict:
        """
        Compare simulated and historical score distributions.

        Uses Kolmogorov-Smirnov test and summary statistics.
        """
        from scipy import stats

        # KS test
        ks_stat, ks_pvalue = stats.ks_2samp(simulated_scores, historical_scores)

        # Summary comparison
        return {
            'simulated_mean': np.mean(simulated_scores),
            'historical_mean': np.mean(historical_scores),
            'mean_diff': np.mean(simulated_scores) - np.mean(historical_scores),

            'simulated_std': np.std(simulated_scores),
            'historical_std': np.std(historical_scores),
            'std_diff': np.std(simulated_scores) - np.std(historical_scores),

            'ks_statistic': ks_stat,
            'ks_pvalue': ks_pvalue,
            'distributions_similar': ks_pvalue > 0.05
        }

    def validate_win_probabilities(self, predictions: List[float],
                                    outcomes: List[int],
                                    n_bins: int = 10) -> Dict:
        """
        Check if predicted probabilities are calibrated.
        """
        df = pd.DataFrame({'pred': predictions, 'outcome': outcomes})
        df['bin'] = pd.cut(df['pred'], bins=n_bins, labels=False)

        calibration = []
        for bin_num in range(n_bins):
            bin_data = df[df['bin'] == bin_num]
            if len(bin_data) > 0:
                calibration.append({
                    'predicted': bin_data['pred'].mean(),
                    'actual': bin_data['outcome'].mean(),
                    'count': len(bin_data)
                })

        calibration_df = pd.DataFrame(calibration)
        mae = np.mean(np.abs(
            calibration_df['predicted'] - calibration_df['actual']
        ))

        return {
            'calibration': calibration_df,
            'mean_absolute_error': mae,
            'well_calibrated': mae < 0.05
        }

    def validate_margin_distribution(self, simulated_margins: List[int],
                                      historical_margins: List[int]) -> Dict:
        """
        Compare margin of victory distributions.
        """
        from scipy import stats

        ks_stat, ks_pvalue = stats.ks_2samp(simulated_margins, historical_margins)

        # Check specific probabilities
        sim_close = np.mean(np.abs(simulated_margins) <= 7)
        hist_close = np.mean(np.abs(np.array(historical_margins)) <= 7)

        sim_blowout = np.mean(np.abs(simulated_margins) >= 21)
        hist_blowout = np.mean(np.abs(np.array(historical_margins)) >= 21)

        return {
            'ks_statistic': ks_stat,
            'ks_pvalue': ks_pvalue,
            'simulated_close_game_pct': sim_close,
            'historical_close_game_pct': hist_close,
            'simulated_blowout_pct': sim_blowout,
            'historical_blowout_pct': hist_blowout,
            'valid': ks_pvalue > 0.05
        }

Part 7: Practical Applications

Fantasy Football Projections

Simulation provides range of outcomes for fantasy:

def fantasy_projection_range(player_mean: float, player_std: float,
                              n_sims: int = 10000) -> Dict:
    """
    Generate fantasy point projection distribution.
    """
    projections = np.random.normal(player_mean, player_std, n_sims)
    projections = np.maximum(0, projections)  # No negative points

    return {
        'median': np.median(projections),
        'mean': np.mean(projections),
        'floor': np.percentile(projections, 10),
        'ceiling': np.percentile(projections, 90),
        'bust_prob': np.mean(projections < player_mean * 0.5),
        'boom_prob': np.mean(projections > player_mean * 1.5)
    }

Playoff Scenario Analysis

What needs to happen for a team to make playoffs?

def playoff_scenario_analysis(team: str, current_record: Tuple[int, int],
                               remaining_schedule: List[Dict],
                               other_teams: Dict[str, Tuple[int, int]],
                               team_ratings: Dict[str, float],
                               n_sims: int = 10000) -> Dict:
    """
    Analyze playoff scenarios for a team.
    """
    makes_playoff = 0
    wins_division = 0
    gets_bye = 0

    for _ in range(n_sims):
        # Simulate remaining games
        final_record = simulate_remaining_season(
            team, current_record, remaining_schedule, team_ratings
        )

        # Check playoff implications
        if final_record[0] >= 10:  # Simplified playoff threshold
            makes_playoff += 1
            if final_record[0] >= 12:
                wins_division += 1
            if final_record[0] >= 13:
                gets_bye += 1

    return {
        'playoff_prob': makes_playoff / n_sims,
        'division_title_prob': wins_division / n_sims,
        'bye_prob': gets_bye / n_sims
    }

Limitations and Appropriate Use

When Simulation Helps

  1. Understanding uncertainty: Full distributions, not just point estimates
  2. Scenario analysis: "What if" explorations
  3. Risk quantification: Probability of extreme outcomes
  4. Path-dependent analysis: How games might unfold

When Simulation Misleads

  1. False precision: More simulations don't fix bad models
  2. Assumption sensitivity: Results depend heavily on input parameters
  3. Computational cost: Complex simulations can be slow
  4. Over-interpretation: Simulated scenarios aren't predictions

Best Practices

  1. Validate against history: Ensure distributions match reality
  2. Sensitivity analysis: Test how results change with different inputs
  3. Report uncertainty: Simulations have their own error
  4. Use appropriate models: Match complexity to available data

Summary

Game simulation extends prediction from point estimates to full probability distributions. Key concepts:

  1. Monte Carlo basics: Random sampling to explore outcome space
  2. Score-based models: Simple but effective for many applications
  3. Drive-by-drive simulation: Captures game dynamics and flow
  4. Season simulation: Playoff probabilities and standings distributions
  5. Live win probability: Real-time updates during games
  6. Validation: Ensuring simulated distributions match reality

Simulation's power lies in revealing uncertainty—not just who might win, but the full range of possible outcomes.


Looking Ahead

Chapter 22 Preview: Next, we'll explore Betting Market Analysis—understanding how betting markets work, what market prices tell us about team strength, and how to evaluate model performance against market efficiency.


Chapter Summary

Simulation transforms static predictions into dynamic explorations of possibility. By generating thousands of potential outcomes, we can answer questions that point estimates cannot: What's the probability of a blowout? How likely is overtime? What are the team's playoff chances?

The key to good simulation is appropriate model complexity. Simple score-based models work well for many applications; drive-by-drive simulation adds realism at computational cost. Regardless of approach, validation against historical data ensures simulated distributions reflect reality.

For practitioners, simulation is a tool for understanding uncertainty. The goal isn't to predict exactly what will happen—that's impossible in football—but to map the landscape of what might happen and how likely each scenario is.