Case Study: Simulating an NBA Season to Identify Futures Market Inefficiencies


Executive Summary

NBA futures markets, where bettors wager on season-long outcomes such as win totals, playoff berths, conference championships, and the NBA title, are among the most fertile grounds for simulation-based analysis. Unlike single-game markets, where pricing is sharp and information is quickly incorporated, futures markets involve long time horizons and complex dependencies that make analytical pricing difficult. This case study builds a complete NBA season simulation from first principles, runs 50,000 replications to produce probability distributions for every team, and compares the simulated probabilities against actual futures market prices to identify potential value bets. We demonstrate that Monte Carlo simulation can reveal systematic mispricings, particularly for teams in the middle of the standings, where small differences in team strength produce large differences in playoff probability.


Background

The Futures Betting Landscape

Before an NBA season begins, sportsbooks post futures odds on a variety of markets: each team's regular-season win total (over/under), division winners, conference champions, and the NBA champion. These markets remain open throughout the season, with odds adjusting as games are played and new information emerges.

Futures markets are attractive to quantitative bettors for several reasons. First, the long time horizon means the sportsbook must commit to a price months before the outcome is resolved, creating more room for disagreement between the market and a well-built model. Second, futures involve complex multi-team interactions: a team's playoff probability depends not just on its own performance but on the performance of every other team in the conference. Third, the vig on futures is typically higher than on single-game markets (often 15-25% total overround), but this is partially offset by the larger edges that can exist.

The challenge is that evaluating futures bets requires computing probabilities over an enormous outcome space. A single NBA season involves approximately 1,230 regular-season games and up to 105 playoff games, with each game's outcome influencing subsequent seedings and matchups. Analytical computation of, say, the probability that the 7th-best team in the Western Conference wins the championship is intractable. Monte Carlo simulation is the natural tool.

Our Objective

We will build a simulation that takes as input a set of pre-season team power ratings and produces as output a complete probability distribution over every outcome of interest: win totals, playoff probabilities, seedings, round-by-round advancement probabilities, and championship probabilities. We will then compare these simulation-derived probabilities against a snapshot of futures market odds to identify discrepancies.


Data and Model Setup

Team Power Ratings

Our simulation begins with a power rating for each of the 30 NBA teams. These ratings represent each team's expected point margin per game against a league-average opponent on a neutral court. A rating of +5.0 means the team is expected to outscore an average team by 5 points per game; a rating of -3.0 means they are expected to lose by 3 points per game.

For this case study, we use ratings inspired by a typical NBA season's competitive landscape, though the specific values are illustrative:

import numpy as np
import pandas as pd
from collections import defaultdict
from typing import Optional


def create_nba_ratings() -> dict[str, dict]:
    """
    Create NBA team power ratings for simulation.

    Returns:
        Dictionary mapping team names to rating info including
        conference, division, and power rating.
    """
    teams = {
        "BOS": {"conf": "East", "div": "Atlantic", "rating": 7.2},
        "MIL": {"conf": "East", "div": "Central", "rating": 4.8},
        "PHI": {"conf": "East", "div": "Atlantic", "rating": 3.5},
        "CLE": {"conf": "East", "div": "Central", "rating": 3.0},
        "NYK": {"conf": "East", "div": "Atlantic", "rating": 2.8},
        "MIA": {"conf": "East", "div": "Southeast", "rating": 1.5},
        "IND": {"conf": "East", "div": "Central", "rating": 1.2},
        "CHI": {"conf": "East", "div": "Central", "rating": -0.5},
        "ATL": {"conf": "East", "div": "Southeast", "rating": -1.0},
        "BKN": {"conf": "East", "div": "Atlantic", "rating": -2.5},
        "TOR": {"conf": "East", "div": "Atlantic", "rating": -3.0},
        "ORL": {"conf": "East", "div": "Southeast", "rating": 2.0},
        "DET": {"conf": "East", "div": "Central", "rating": -6.0},
        "WAS": {"conf": "East", "div": "Southeast", "rating": -7.5},
        "CHA": {"conf": "East", "div": "Southeast", "rating": -5.5},
        "DEN": {"conf": "West", "div": "Northwest", "rating": 6.5},
        "OKC": {"conf": "West", "div": "Northwest", "rating": 5.8},
        "MIN": {"conf": "West", "div": "Northwest", "rating": 4.0},
        "LAC": {"conf": "West", "div": "Pacific", "rating": 2.5},
        "DAL": {"conf": "West", "div": "Southwest", "rating": 3.2},
        "PHX": {"conf": "West", "div": "Pacific", "rating": 2.0},
        "SAC": {"conf": "West", "div": "Pacific", "rating": 1.0},
        "NOP": {"conf": "West", "div": "Southwest", "rating": 0.8},
        "LAL": {"conf": "West", "div": "Pacific", "rating": 2.2},
        "GSW": {"conf": "West", "div": "Pacific", "rating": 1.8},
        "HOU": {"conf": "West", "div": "Southwest", "rating": -2.0},
        "MEM": {"conf": "West", "div": "Southwest", "rating": -3.5},
        "UTA": {"conf": "West", "div": "Northwest", "rating": -5.0},
        "SAS": {"conf": "West", "div": "Southwest", "rating": -6.5},
        "POR": {"conf": "West", "div": "Northwest", "rating": -7.0},
    }
    return teams

Schedule Generation

An actual NBA schedule involves complex constraints (back-to-backs, travel, national TV windows). For simulation purposes, we generate a simplified schedule in which each team plays 82 games: 4 games against each of its 4 division opponents (16 games), 3 or 4 games against each of the 10 other conference opponents (36 games: six opponents four times and four opponents three times), and 2 games against each of the 15 teams in the other conference (30 games).

def generate_nba_schedule(
    teams: dict[str, dict],
    seed: int = 42,
) -> pd.DataFrame:
    """
    Generate a simplified 82-game NBA schedule.

    Args:
        teams: Team information dictionary.
        seed: Random seed for home/away assignment.

    Returns:
        DataFrame with home_team, away_team columns.
    """
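    rng = np.random.default_rng(seed)
    games = []
    team_names = list(teams.keys())

    # Slot (0-4) of each team within its division. The slots decide which
    # cross-division conference opponents are met 3 times rather than 4.
    div_slot: dict[str, int] = {}
    div_size: dict[str, int] = {}
    for name in team_names:
        div = teams[name]["div"]
        div_slot[name] = div_size.get(div, 0)
        div_size[div] = div_slot[name] + 1

    for i, team_a in enumerate(team_names):
        for team_b in team_names[i + 1:]:
            same_conf = teams[team_a]["conf"] == teams[team_b]["conf"]
            same_div = teams[team_a]["div"] == teams[team_b]["div"]

            if same_div:
                n_games = 4
            elif same_conf:
                # Assumes five-team divisions. Each team gets two 3-game
                # opponents in each other division, so its 10 conference
                # opponents yield 6 * 4 + 4 * 3 = 36 games.
                first, second = sorted(
                    (team_a, team_b), key=lambda t: teams[t]["div"]
                )
                offset = (div_slot[second] - div_slot[first]) % 5
                n_games = 3 if offset in (0, 1) else 4
            else:
                n_games = 2

            # Hosts alternate; in a 3-game pairing a coin flip picks the
            # team that hosts twice.
            a_hosts_first = n_games % 2 == 0 or rng.random() < 0.5
            for g in range(n_games):
                a_is_home = (g % 2 == 0) == a_hosts_first
                home, away = (team_a, team_b) if a_is_home else (team_b, team_a)
                games.append({"home_team": home, "away_team": away})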

    return pd.DataFrame(games)
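
A quick sanity check is worthwhile for any generated schedule, because an error in the matchup counts silently shifts every win total downstream. The sketch below counts home and away games per team. `games_per_team` is a hypothetical helper, not part of the original code, and it takes plain `(home, away)` pairs so it can be tried without pandas.

```python
from collections import Counter
from typing import Iterable


def games_per_team(
    matchups: Iterable[tuple[str, str]],
) -> dict[str, dict[str, int]]:
    """Count home, away, and total games for every team.

    Hypothetical helper for sanity-checking a schedule; `matchups` is any
    iterable of (home_team, away_team) pairs.
    """
    home_counts: Counter = Counter()
    away_counts: Counter = Counter()
    for home, away in matchups:
        home_counts[home] += 1
        away_counts[away] += 1

    teams = sorted(set(home_counts) | set(away_counts))
    return {
        team: {
            "home": home_counts[team],
            "away": away_counts[team],
            "total": home_counts[team] + away_counts[team],
        }
        for team in teams
    }


# Toy three-team example (made-up matchups, not the NBA schedule).
toy = [("AAA", "BBB"), ("BBB", "AAA"), ("AAA", "CCC"), ("CCC", "BBB")]
counts = games_per_team(toy)
print(counts["AAA"])
```

With the DataFrame returned by `generate_nba_schedule`, the pairs can be built with `zip(schedule["home_team"], schedule["away_team"])`, and every team's total should match the 82 games described above.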

Game Outcome Model

Individual games are simulated using a normal distribution for the point margin:

$$\text{Margin}_{\text{home}} \sim N(R_{\text{home}} - R_{\text{away}} + H, \sigma^2)$$

where $H = 3.0$ is the home-court advantage (in points) and $\sigma = 12.0$ is the game-level standard deviation. The home team wins if the margin is positive.
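
Under this model the home team's win probability has a closed form, $P(\text{home win}) = \Phi\left((R_{\text{home}} - R_{\text{away}} + H)/\sigma\right)$, where $\Phi$ is the standard normal CDF. The following minimal sketch evaluates it with the standard library; `home_win_probability` is a hypothetical helper introduced here for illustration and is not part of the simulator.

```python
from statistics import NormalDist


def home_win_probability(
    rating_home: float,
    rating_away: float,
    home_advantage: float = 3.0,
    margin_std: float = 12.0,
) -> float:
    """P(home margin > 0) under the normal margin model."""
    expected_margin = rating_home - rating_away + home_advantage
    # P(N(mu, sigma^2) > 0) equals Phi(mu / sigma).
    return NormalDist().cdf(expected_margin / margin_std)


# Two evenly rated teams: the home side wins about 60% of the time.
print(round(home_win_probability(0.0, 0.0), 3))
# A +7.2 team hosting a -7.5 team (ratings taken from the table above).
print(round(home_win_probability(7.2, -7.5), 3))
```

Seeing the mapping from rating gap to win probability makes it easier to judge whether $H$ and $\sigma$ are plausible before running a full simulation.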


The Simulation Engine

class NBASeasonSimulator:
    """
    Monte Carlo simulator for a complete NBA season including playoffs.

    Simulates the regular season to determine standings, seeds teams
    into the playoff bracket, and simulates best-of-7 playoff series
    through the Finals.
    """

    def __init__(
        self,
        teams: dict[str, dict],
        schedule: pd.DataFrame,
        home_advantage: float = 3.0,
        margin_std: float = 12.0,
        seed: int = 42,
    ):
        """
        Args:
            teams: Team info with conference, division, and rating.
            schedule: Regular-season schedule DataFrame.
            home_advantage: Home-court advantage in points.
            margin_std: Standard deviation of game margins.
            seed: Random seed.
        """
        self.teams = teams
        self.schedule = schedule
        self.home_advantage = home_advantage
        self.margin_std = margin_std
        self.rng = np.random.default_rng(seed)
        self.team_names = sorted(teams.keys())

    def simulate_game(self, home: str, away: str) -> str:
        """Simulate a single game and return the winner."""
        expected = (
            self.teams[home]["rating"]
            - self.teams[away]["rating"]
            + self.home_advantage
        )
        margin = self.rng.normal(expected, self.margin_std)
        return home if margin > 0 else away

    def simulate_regular_season(self) -> dict[str, int]:
        """Simulate all regular-season games and return win totals."""
        wins = {team: 0 for team in self.team_names}
        for _, game in self.schedule.iterrows():
            winner = self.simulate_game(game["home_team"], game["away_team"])
            wins[winner] += 1
        return wins

    def get_playoff_seeds(
        self, wins: dict[str, int],
    ) -> dict[str, list[str]]:
        """Determine 1-8 seeds in each conference from win totals."""
        conferences = {"East": [], "West": []}
        for team in self.team_names:
            conf = self.teams[team]["conf"]
            conferences[conf].append((team, wins[team]))

        seeds = {}
        for conf, team_wins in conferences.items():
            sorted_teams = sorted(
                team_wins,
                key=lambda x: (x[1], self.rng.random()),
                reverse=True,
            )
            seeds[conf] = [t[0] for t in sorted_teams[:8]]
        return seeds

    def simulate_series(
        self, higher_seed: str, lower_seed: str,
    ) -> str:
        """Simulate a best-of-7 series. Higher seed has home court."""
        home_pattern = [True, True, False, False, True, False, True]
        wins_h, wins_l = 0, 0
        for game_idx in range(7):
            if home_pattern[game_idx]:
                winner = self.simulate_game(higher_seed, lower_seed)
            else:
                winner = self.simulate_game(lower_seed, higher_seed)

            if winner == higher_seed:
                wins_h += 1
            else:
                wins_l += 1

            if wins_h == 4 or wins_l == 4:
                break

        return higher_seed if wins_h == 4 else lower_seed

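    def simulate_playoffs(
        self, seeds: dict[str, list[str]],
    ) -> dict:
        """Simulate the full playoff bracket and return round winners."""
        results = {}
        conf_champs = {}
        # Seed position within the conference (0 = top seed).
        rank = {
            team: idx
            for conf_seeds in seeds.values()
            for idx, team in enumerate(conf_seeds)
        }

        def play(team_a: str, team_b: str) -> str:
            """Play a series, giving home court to the better seed."""
            if rank[team_a] > rank[team_b]:
                team_a, team_b = team_b, team_a
            return self.simulate_series(team_a, team_b)

        for conf in ["East", "West"]:
            s = seeds[conf]
            r1 = [
                play(s[0], s[7]),
                play(s[1], s[6]),
                play(s[2], s[5]),
                play(s[3], s[4]),
            ]
            results[f"{conf}_R1"] = r1

            r2 = [play(r1[0], r1[3]), play(r1[1], r1[2])]
            results[f"{conf}_R2"] = r2

            cf = play(r2[0], r2[1])
            results[f"{conf}_CF"] = cf
            conf_champs[conf] = cf

        # Finals home court really depends on regular-season record, which
        # this method does not receive. As an approximation, the better
        # conference seed hosts and equal seeds are split by a coin flip.
        first, second = conf_champs["East"], conf_champs["West"]
        if rank[first] == rank[second] and self.rng.random() < 0.5:
            first, second = second, first
        results["champion"] = play(first, second)
        return results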

    def run_full_simulation(
        self, n_simulations: int = 50000,
    ) -> dict:
        """
        Run complete season + playoff simulations.

        Args:
            n_simulations: Number of full season simulations.

        Returns:
            Dictionary with win distributions and probability estimates.
        """
        win_totals = {t: [] for t in self.team_names}
        playoff_count = {t: 0 for t in self.team_names}
        conf_finals_count = {t: 0 for t in self.team_names}
        finals_count = {t: 0 for t in self.team_names}
        champ_count = {t: 0 for t in self.team_names}

        for _ in range(n_simulations):
            wins = self.simulate_regular_season()
            for team, w in wins.items():
                win_totals[team].append(w)

            seeds = self.get_playoff_seeds(wins)
            for conf_seeds in seeds.values():
                for team in conf_seeds:
                    playoff_count[team] += 1

            playoff_results = self.simulate_playoffs(seeds)

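            for conf in ["East", "West"]:
                # Both round-two winners reach the conference finals.
                for team in playoff_results[f"{conf}_R2"]:
                    conf_finals_count[team] += 1
                # Only the conference-finals winner reaches the NBA Finals.
                finals_count[playoff_results[f"{conf}_CF"]] += 1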

            champ_count[playoff_results["champion"]] += 1

        results = {}
        for team in self.team_names:
            w = np.array(win_totals[team])
            results[team] = {
                "mean_wins": float(w.mean()),
                "std_wins": float(w.std()),
                "median_wins": float(np.median(w)),
                "p10": float(np.percentile(w, 10)),
                "p90": float(np.percentile(w, 90)),
                "playoff_pct": playoff_count[team] / n_simulations,
                "conf_finals_pct": conf_finals_count[team] / n_simulations,
                "finals_pct": finals_count[team] / n_simulations,
                "champ_pct": champ_count[team] / n_simulations,
            }
        return results
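
The regular-season loop in `simulate_regular_season` walks the schedule with `iterrows`, which becomes slow when repeated for tens of thousands of seasons. One possible alternative, sketched below under the same margin model, draws every game of every season in a single NumPy call. `simulate_win_totals` is a hypothetical standalone function, not a method of the class above, and it covers only the regular season.

```python
import numpy as np


def simulate_win_totals(
    ratings: dict[str, float],
    games: list[tuple[str, str]],
    n_sims: int,
    home_advantage: float = 3.0,
    margin_std: float = 12.0,
    seed: int = 0,
) -> tuple[list[str], np.ndarray]:
    """Vectorized regular seasons (hypothetical helper, not the class method).

    Returns the sorted team names and an (n_sims, n_teams) array of wins.
    """
    names = sorted(ratings)
    index = {team: k for k, team in enumerate(names)}
    home = np.array([index[h] for h, _ in games])
    away = np.array([index[a] for _, a in games])
    strength = np.array([ratings[t] for t in names], dtype=float)

    # Expected home margin for every scheduled game.
    expected = strength[home] - strength[away] + home_advantage

    # One row of margins per simulated season.
    rng = np.random.default_rng(seed)
    margins = rng.normal(expected, margin_std, size=(n_sims, len(games)))
    home_won = (margins > 0).astype(np.int64)

    # One-hot matrices map each game to its home team and its away team.
    game_ids = np.arange(len(games))
    home_onehot = np.zeros((len(games), len(names)), dtype=np.int64)
    away_onehot = np.zeros((len(games), len(names)), dtype=np.int64)
    home_onehot[game_ids, home] = 1
    away_onehot[game_ids, away] = 1

    wins = home_won @ home_onehot + (1 - home_won) @ away_onehot
    return names, wins


# Toy example with made-up ratings and a tiny round-robin schedule.
toy_ratings = {"AAA": 4.0, "BBB": 0.0, "CCC": -4.0}
toy_games = [("AAA", "BBB"), ("BBB", "CCC"), ("CCC", "AAA"),
             ("BBB", "AAA"), ("CCC", "BBB"), ("AAA", "CCC")]
names, wins = simulate_win_totals(toy_ratings, toy_games, n_sims=2000)
print(dict(zip(names, wins.mean(axis=0).round(2))))
```

For a full schedule and 50,000 seasons the margin array holds tens of millions of floats, so in practice it would likely need to be generated in batches of a few thousand seasons at a time.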

Running the Simulation

We execute 50,000 full season simulations. Each simulation produces a complete regular season (approximately 1,230 games) and a full playoff bracket (15 best-of-7 series, so between 60 and 105 games), for a total of up to roughly 67 million simulated game outcomes.

teams = create_nba_ratings()
schedule = generate_nba_schedule(teams)
simulator = NBASeasonSimulator(teams, schedule, seed=42)
results = simulator.run_full_simulation(n_simulations=50000)

# Display top championship contenders
print(f"{'Team':<6} {'Wins':>5} {'Std':>5} {'Play%':>7} "
      f"{'CF%':>6} {'Final%':>7} {'Champ%':>7}")
print("-" * 50)
for team in sorted(results.keys(),
                   key=lambda t: -results[t]["champ_pct"]):
    r = results[team]
    if r["champ_pct"] > 0.005:
        print(f"{team:<6} {r['mean_wins']:>5.1f} {r['std_wins']:>5.1f} "
              f"{r['playoff_pct']:>7.1%} {r['conf_finals_pct']:>6.1%} "
              f"{r['finals_pct']:>7.1%} {r['champ_pct']:>7.1%}")

Comparing Against Futures Markets

The central question is whether our simulation-derived probabilities differ from the market's implied probabilities. We convert futures odds to implied probabilities (removing vig) and compare.
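
As an illustration of the vig-removal step, the sketch below converts American odds to raw implied probabilities and then rescales them so they sum to one. Proportional rescaling is only one of several ways to remove vig and is an assumption here, as are the helper names `american_to_implied` and `remove_vig` and the three-outcome market, which is made up for the example.

```python
def american_to_implied(odds: float) -> float:
    """Raw implied probability (vig included) from American odds."""
    if odds > 0:
        return 100.0 / (odds + 100.0)
    return -odds / (-odds + 100.0)


def remove_vig(
    odds_by_outcome: dict[str, float],
) -> tuple[dict[str, float], float]:
    """Rescale raw implied probabilities so they sum to 1.

    Assumes `odds_by_outcome` lists every outcome in the market.
    Returns the vig-free probabilities and the overround.
    """
    raw = {k: american_to_implied(o) for k, o in odds_by_outcome.items()}
    total = sum(raw.values())
    fair = {k: p / total for k, p in raw.items()}
    return fair, total - 1.0


# Hypothetical three-outcome market (illustrative prices only).
market = {"AAA": 120, "BBB": 180, "CCC": 250}
fair, overround = remove_vig(market)
print({k: round(p, 4) for k, p in fair.items()}, round(overround, 4))
```

The rescaling only makes sense when every outcome in the market is included; normalizing a partial list of teams would overstate each team's probability.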

def compare_to_market(
    sim_results: dict,
    market_odds: dict[str, dict[str, float]],
) -> pd.DataFrame:
    """
    Compare simulation probabilities against market-implied probabilities.

    Args:
        sim_results: Simulation output dictionary.
        market_odds: Dict mapping team to dict of market-implied
            probabilities (vig-removed) for various outcomes.

    Returns:
        DataFrame showing discrepancies between simulation and market.
    """
    rows = []
    for team, market in market_odds.items():
        if team not in sim_results:
            continue
        sim = sim_results[team]
        for outcome in ["champ_pct", "playoff_pct"]:
            if outcome in market:
                sim_prob = sim[outcome]
                mkt_prob = market[outcome]
                edge = sim_prob - mkt_prob
                rows.append({
                    "team": team,
                    "outcome": outcome,
                    "sim_prob": sim_prob,
                    "market_prob": mkt_prob,
                    "edge": edge,
                    "edge_pct": edge * 100,
                })

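    # Fixing the columns keeps the sort below valid when no rows matched.
    columns = [
        "team", "outcome", "sim_prob", "market_prob", "edge", "edge_pct",
    ]
    df = pd.DataFrame(rows, columns=columns)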
    return df.sort_values("edge", ascending=False)
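
`compare_to_market` measures the edge against vig-free probabilities, but a bet is settled at the posted price, which still includes the vig. A positive edge is therefore necessary but not sufficient for a profitable bet. The sketch below computes expected value per unit staked at the posted American odds; `expected_value_per_unit` and `break_even_probability` are hypothetical helpers, and the probability in the example is made up.

```python
def profit_per_unit(american_odds: float) -> float:
    """Profit on a winning one-unit stake at the given American odds."""
    if american_odds > 0:
        return american_odds / 100.0
    return 100.0 / -american_odds


def expected_value_per_unit(sim_prob: float, american_odds: float) -> float:
    """Expected profit per unit staked, using the simulated probability."""
    return sim_prob * profit_per_unit(american_odds) - (1.0 - sim_prob)


def break_even_probability(american_odds: float) -> float:
    """Win probability at which the bet has zero expected value."""
    return 1.0 / (1.0 + profit_per_unit(american_odds))


# Illustrative numbers: a 6% simulated title chance priced at +2000.
print(round(break_even_probability(2000), 4))
print(round(expected_value_per_unit(0.06, 2000), 4))
```

A bet is worth considering only when the simulated probability exceeds the break-even probability at the posted price, not merely the vig-free market probability.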

Lessons Learned

1. The middle of the standings is where value lives. Teams at the extremes (clear contenders and clear rebuilders) tend to be priced accurately by the market. The 6th through 10th best teams in each conference, where small rating differences create large swings in playoff probability, are where simulation is most likely to reveal an edge.

2. Playoff format amplifies strength differences. A team that is slightly better than average might make the playoffs 60% of the time but win the championship only 3% of the time. The seven-game series format favors better teams more strongly than single-elimination, and the four-round gauntlet makes championship probability a highly nonlinear function of team strength.

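3. Standard deviation of win totals is remarkably stable. Because games are simulated independently with fixed win probabilities, a team's win-total variance is the sum of $p(1-p)$ over its 82 games, which caps the standard deviation at $\sqrt{82 \times 0.25} \approx 4.5$ wins; nearly every team lands at roughly 4 to 4.5 wins. This means a team's simulated win total falls five or more wins away from its mean in a meaningful share of seasons, enough to flip a win-total bet from one side of the line to the other. Real-world dispersion is likely wider still, because the ratings themselves are uncertain.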

4. 50,000 simulations provide precise probability estimates. For a championship probability of 10%, the standard error is approximately $\sqrt{0.10 \times 0.90 / 50{,}000} \approx 0.0013$, or about 0.13 percentage points. This precision is more than sufficient for identifying multi-percentage-point discrepancies with the market.

5. The simulation's biggest weakness is the assumption of static ratings. In reality, teams change through trades, injuries, player development, and coaching adjustments. Incorporating mid-season rating updates (via a Bayesian updating mechanism) would make the simulation more realistic and is a natural extension of this work.
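
The amplification described in lesson 2 can be computed exactly for a single series. The sketch below is a small dynamic program over the 2-2-1-1-1 home pattern used in `simulate_series`; `series_win_probability` is a hypothetical helper, and the per-game probabilities are treated as constant within the series, as in the simulation.

```python
def series_win_probability(p_home: float, p_away: float) -> float:
    """P(higher seed wins a best-of-7) under the 2-2-1-1-1 format.

    p_home: higher seed's win probability in its own arena.
    p_away: higher seed's win probability on the road.
    """
    home_pattern = [True, True, False, False, True, False, True]
    # live[(w, l)] = probability the series stands at w wins and l losses
    # for the higher seed and has not yet been decided.
    live = {(0, 0): 1.0}
    clinched = 0.0
    for is_home in home_pattern:
        p = p_home if is_home else p_away
        nxt: dict[tuple[int, int], float] = {}
        for (w, l), prob in live.items():
            # Higher seed wins this game.
            if w + 1 == 4:
                clinched += prob * p
            else:
                nxt[(w + 1, l)] = nxt.get((w + 1, l), 0.0) + prob * p
            # Higher seed loses this game; four losses end the series.
            if l + 1 < 4:
                nxt[(w, l + 1)] = nxt.get((w, l + 1), 0.0) + prob * (1 - p)
        live = nxt
    return clinched


# A team that wins 60% of individual games, home or away.
print(round(series_win_probability(0.6, 0.6), 4))
```

A 60% per-game favorite wins the series about 71% of the time, and that advantage compounds across four rounds, which is why championship probability rises so steeply with team strength.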


Your Turn: Extension Projects

  1. Add the play-in tournament. The NBA uses a play-in format for seeds 7-10. Modify the simulation to include play-in games and assess how they change playoff probabilities for bubble teams.

  2. Incorporate rating uncertainty. Instead of treating team ratings as known constants, model them as distributions (e.g., each team's rating is $N(\mu_i, \tau^2)$ where $\tau$ represents rating uncertainty). How does this change championship probabilities, especially for teams with limited track records?

  3. Simulate in-season updates. After every simulated week of games, update team ratings using a Bayesian update (shrinking toward the prior). Compare this dynamic simulation to the static version.

  4. Optimize a futures portfolio. Using the simulation output, build a portfolio of futures bets (Chapter 25 methods) that maximizes expected ROI subject to bankroll constraints.

  5. Calibrate against historical data. Run the simulation on five past NBA seasons where you have both pre-season ratings and actual outcomes. How well do the simulated probability distributions match reality?
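
As a possible starting point for the first project, the sketch below resolves the play-in bracket for one conference. It assumes the current format: the 7 seed hosts the 8 seed for the 7th playoff spot, the 9 seed hosts the 10 seed in an elimination game, and the loser of the first game hosts the winner of the second for the 8th spot. `resolve_play_in` is a hypothetical helper, and `play_game(home, away)` stands in for any single-game simulator with the same signature as `simulate_game`.

```python
from typing import Callable


def resolve_play_in(
    seeds_7_to_10: list[str],
    play_game: Callable[[str, str], str],
) -> tuple[str, str]:
    """Return the teams that end up as the 7th and 8th playoff seeds.

    seeds_7_to_10: teams ranked 7th through 10th after the regular season.
    play_game: function taking (home, away) and returning the winner.
    """
    s7, s8, s9, s10 = seeds_7_to_10

    # Game 1: 7 hosts 8; the winner takes the 7th seed.
    final_7 = play_game(s7, s8)
    loser_78 = s8 if final_7 == s7 else s7

    # Game 2: 9 hosts 10; the loser is eliminated.
    winner_910 = play_game(s9, s10)

    # Game 3: loser of game 1 hosts winner of game 2 for the 8th seed.
    final_8 = play_game(loser_78, winner_910)
    return final_7, final_8


# Deterministic stand-in where the home team always wins (illustration only).
def home_always_wins(home: str, away: str) -> str:
    return home


print(resolve_play_in(["T7", "T8", "T9", "T10"], home_always_wins))
```

Hooking this into the simulator would mean seeding the top six teams directly and passing `simulate_game` as `play_game` for the remaining four.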


Discussion Questions

  1. The simulation uses a constant home-court advantage of 3.0 points. How would team-specific home-court advantages (which vary from about 1.5 to 5.0 in the NBA) change the results?

  2. Why might the market be more efficient at pricing championship odds than at pricing win totals?

  3. If you discovered a 5-percentage-point edge on a team's championship futures at +2000, how much of your bankroll should you bet, given the uncertainty in your team ratings?

  4. How would you modify this simulation framework for the NFL, where the season is much shorter (17 games vs. 82) and the playoff format is single elimination?

  5. The simulation assumes game outcomes are independent. Under what circumstances might this assumption be violated, and how would violations affect the results?