Case Study 1: Building a Monte Carlo Win Total Projection System

Overview

This case study implements a complete Monte Carlo win total projection system for an NBA-like league. We build team ratings, construct schedules, simulate thousands of seasons, evaluate win total futures against the simulated distributions, and identify positive-expected-value bets. The system demonstrates the full pipeline from raw inputs to actionable bet recommendations.

Problem Statement

Win total futures are among the most analytically tractable futures markets. The core challenge is estimating the probability distribution of wins for each team, then comparing it to the posted over/under line. A simple point estimate (e.g., "Team A will win 50 games") is insufficient because the betting decision depends on the full distribution: what is the probability that Team A wins MORE than 48.5 games? To answer this properly, we need Monte Carlo simulation that accounts for schedule strength, rating uncertainty, home court advantage, and back-to-back effects.

Implementation

"""
Case Study 1: Monte Carlo Win Total Projection System

Builds team ratings, simulates seasons, and evaluates win total futures.
"""

import numpy as np
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Optional
from collections import Counter


@dataclass
class Team:
    """Team with ratings and prior season data.

    Attributes:
        name: Full team name.
        abbrev: Three-letter abbreviation.
        conference: "East" or "West".
        division: Division name.
        rating: Power rating (0 = league average).
        rating_std: Rating uncertainty.
        prev_wins: Previous season win count.
        prev_games: Previous season games played.
        prev_pts_for: Previous season total points scored.
        prev_pts_against: Previous season total points allowed.
    """

    name: str
    abbrev: str
    conference: str
    division: str
    rating: float
    rating_std: float = 2.0
    prev_wins: int = 41
    prev_games: int = 82
    prev_pts_for: float = 9100.0
    prev_pts_against: float = 9100.0


@dataclass
class ScheduleGame:
    """A single game on a team's schedule.

    Attributes:
        game_num: Sequential game number.
        opponent: Opponent abbreviation.
        is_home: Whether this is a home game.
        is_back_to_back: Whether the team played yesterday.
    """

    game_num: int
    opponent: str
    is_home: bool
    is_back_to_back: bool = False


@dataclass
class WinTotalLine:
    """A posted win total line to evaluate.

    Attributes:
        team: Team abbreviation.
        line: Over/under line (e.g. 48.5).
        over_odds: Decimal odds for over.
        under_odds: Decimal odds for under.
    """

    team: str
    line: float
    over_odds: float
    under_odds: float


class WinTotalSimulator:
    """Monte Carlo season simulator for win total projections.

    Args:
        teams: Dictionary of team abbreviation to Team.
        schedules: Dictionary of team abbreviation to schedule.
        home_advantage: Home court advantage in rating points.
        back_to_back_penalty: Rating penalty for back-to-back games.
        scale: Rating scale parameter for win probability.
    """

    def __init__(
        self,
        teams: Dict[str, Team],
        schedules: Dict[str, List[ScheduleGame]],
        home_advantage: float = 3.0,
        back_to_back_penalty: float = 1.5,
        scale: float = 10.0,
    ):
        self.teams = teams
        self.schedules = schedules
        self.home_advantage = home_advantage
        self.b2b_penalty = back_to_back_penalty
        self.scale = scale

    def pythagorean_wins(
        self, team: Team, exponent: float = 13.91
    ) -> float:
        """Calculate Pythagorean expected wins.

        Args:
            team: Team data.
            exponent: Sport-specific exponent.

        Returns:
            Expected wins based on point differential.
        """
        if team.prev_pts_for <= 0 or team.prev_pts_against <= 0:
            return team.prev_wins

        pf_k = team.prev_pts_for ** exponent
        pa_k = team.prev_pts_against ** exponent
        pyth_pct = pf_k / (pf_k + pa_k)
        return pyth_pct * team.prev_games

    def game_win_probability(
        self,
        team_rating: float,
        opp_rating: float,
        is_home: bool,
        is_b2b: bool,
        opp_is_b2b: bool = False,
    ) -> float:
        """Calculate single-game win probability.

        Args:
            team_rating: Team's simulated rating.
            opp_rating: Opponent's simulated rating.
            is_home: Whether team is at home.
            is_b2b: Whether team is on a back-to-back.
            opp_is_b2b: Whether opponent is on a back-to-back.

        Returns:
            Win probability for the team.
        """
        hca = self.home_advantage if is_home else -self.home_advantage
        b2b_adj = 0.0
        if is_b2b:
            b2b_adj -= self.b2b_penalty
        if opp_is_b2b:
            b2b_adj += self.b2b_penalty * 0.7

        diff = team_rating - opp_rating + hca + b2b_adj
        prob = 1.0 / (1.0 + 10 ** (-diff / self.scale))
        return prob

    def simulate_seasons(
        self,
        n_sims: int = 10_000,
        include_uncertainty: bool = True,
        seed: int = 42,
    ) -> Dict[str, Dict]:
        """Run Monte Carlo season simulation.

        Args:
            n_sims: Number of seasons to simulate.
            include_uncertainty: Sample ratings from uncertainty dist.
            seed: Random seed.

        Returns:
            Dictionary of team results with win distributions.
        """
        rng = np.random.RandomState(seed)
        team_names = list(self.teams.keys())

        win_counts = {name: np.zeros(n_sims) for name in team_names}
        division_wins = {name: 0 for name in team_names}
        conference_best = {name: 0 for name in team_names}

        for sim in range(n_sims):
            # Sample ratings
            if include_uncertainty:
                sim_ratings = {
                    name: rng.normal(t.rating, t.rating_std)
                    for name, t in self.teams.items()
                }
            else:
                sim_ratings = {
                    name: t.rating for name, t in self.teams.items()
                }

            # Simulate each team's season
            season_wins: Dict[str, int] = {}
            for team_name in team_names:
                schedule = self.schedules.get(team_name, [])
                wins = 0
                for game in schedule:
                    if game.opponent not in sim_ratings:
                        continue
                    wp = self.game_win_probability(
                        sim_ratings[team_name],
                        sim_ratings[game.opponent],
                        game.is_home,
                        game.is_back_to_back,
                    )
                    if rng.random() < wp:
                        wins += 1
                win_counts[team_name][sim] = wins
                season_wins[team_name] = wins

            # Division winners
            divisions: Dict[str, List[Tuple[str, int]]] = {}
            for name, team in self.teams.items():
                divisions.setdefault(team.division, []).append(
                    (name, season_wins[name])
                )
            for div_teams in divisions.values():
                div_teams.sort(key=lambda x: x[1], reverse=True)
                division_wins[div_teams[0][0]] += 1

            # Conference leaders
            conferences: Dict[str, List[Tuple[str, int]]] = {}
            for name, team in self.teams.items():
                conferences.setdefault(team.conference, []).append(
                    (name, season_wins[name])
                )
            for conf_teams in conferences.values():
                conf_teams.sort(key=lambda x: x[1], reverse=True)
                conference_best[conf_teams[0][0]] += 1

        # Compile results
        results: Dict[str, Dict] = {}
        for name in team_names:
            wins = win_counts[name]
            dist = dict(Counter(wins.astype(int)))
            results[name] = {
                "mean": round(float(np.mean(wins)), 1),
                "median": round(float(np.median(wins)), 1),
                "std": round(float(np.std(wins)), 1),
                "p10": round(float(np.percentile(wins, 10)), 1),
                "p90": round(float(np.percentile(wins, 90)), 1),
                "distribution": dist,
                "div_winner_prob": round(division_wins[name] / n_sims, 3),
                "conf_best_prob": round(conference_best[name] / n_sims, 3),
            }

        return results

    def evaluate_win_total(
        self,
        line: WinTotalLine,
        sim_results: Dict[str, Dict],
    ) -> Dict:
        """Evaluate a win total line against simulation.

        Args:
            line: Posted win total line.
            sim_results: Results from simulate_seasons.

        Returns:
            Evaluation dictionary with edge and recommendation.
        """
        result = sim_results.get(line.team)
        if result is None:
            return {"error": f"Team {line.team} not found"}

        dist = result["distribution"]
        total_sims = sum(dist.values())

        over_count = sum(
            cnt for w, cnt in dist.items() if w > line.line
        )
        under_count = sum(
            cnt for w, cnt in dist.items() if w < line.line
        )

        over_prob = over_count / total_sims
        under_prob = under_count / total_sims

        over_implied = 1.0 / line.over_odds
        under_implied = 1.0 / line.under_odds

        over_edge = over_prob - over_implied
        under_edge = under_prob - under_implied

        over_ev = over_prob * (line.over_odds - 1) - (1 - over_prob)
        under_ev = under_prob * (line.under_odds - 1) - (1 - under_prob)

        if over_edge > under_edge:
            best_side = "OVER"
            best_edge = over_edge
            best_ev = over_ev
        else:
            best_side = "UNDER"
            best_edge = under_edge
            best_ev = under_ev

        return {
            "team": line.team,
            "line": line.line,
            "model_mean": result["mean"],
            "model_std": result["std"],
            "over_prob": round(over_prob, 4),
            "under_prob": round(under_prob, 4),
            "over_edge": round(over_edge, 4),
            "under_edge": round(under_edge, 4),
            "best_side": best_side,
            "best_edge": round(best_edge, 4),
            "ev_per_dollar": round(best_ev, 4),
            "kelly": round(max(best_ev / (line.over_odds - 1 if best_side == "OVER" else line.under_odds - 1), 0), 4),
            "recommendation": (
                "BET" if best_edge > 0.03
                else ("LEAN" if best_edge > 0 else "PASS")
            ),
        }


def generate_league(
    n_teams: int = 12, seed: int = 42
) -> Tuple[Dict[str, Team], Dict[str, List[ScheduleGame]]]:
    """Generate a synthetic league with teams and schedules.

    Args:
        n_teams: Number of teams (must be even, divisible by 2).
        seed: Random seed.

    Returns:
        Tuple of (teams_dict, schedules_dict).
    """
    rng = np.random.RandomState(seed)

    conferences = ["East", "West"]
    divisions = ["Atlantic", "Central", "Southeast",
                  "Northwest", "Pacific", "Southwest"]

    city_names = [
        "Boston", "New York", "Philadelphia", "Chicago",
        "Milwaukee", "Cleveland", "Miami", "Atlanta",
        "Denver", "Phoenix", "LA", "Portland",
    ][:n_teams]

    abbrevs = [c[:3].upper() for c in city_names]

    teams: Dict[str, Team] = {}
    for i, (city, abbrev) in enumerate(zip(city_names, abbrevs)):
        conf_idx = 0 if i < n_teams // 2 else 1
        div_idx = (i % (n_teams // 2)) // (n_teams // 6) if n_teams >= 6 else 0
        actual_div_idx = conf_idx * 3 + min(div_idx, 2)

        rating = rng.normal(0, 4.0)
        rating_std = rng.uniform(1.5, 3.0)

        prev_pf = rng.normal(9100, 300)
        prev_pa = prev_pf - rating * 30

        teams[abbrev] = Team(
            name=city,
            abbrev=abbrev,
            conference=conferences[conf_idx],
            division=divisions[actual_div_idx % len(divisions)],
            rating=round(rating, 1),
            rating_std=round(rating_std, 1),
            prev_wins=max(15, min(67, int(41 + rating * 3 + rng.normal(0, 3)))),
            prev_games=82,
            prev_pts_for=round(prev_pf),
            prev_pts_against=round(prev_pa),
        )

    # Generate round-robin schedule
    games_per_matchup = 4
    schedules: Dict[str, List[ScheduleGame]] = {a: [] for a in abbrevs}
    game_counters = {a: 0 for a in abbrevs}

    for i, a1 in enumerate(abbrevs):
        for j, a2 in enumerate(abbrevs):
            if i >= j:
                continue
            for g in range(games_per_matchup):
                home_team = a1 if g < games_per_matchup // 2 else a2
                away_team = a2 if g < games_per_matchup // 2 else a1

                is_b2b = rng.random() < 0.15

                game_counters[home_team] += 1
                schedules[home_team].append(ScheduleGame(
                    game_num=game_counters[home_team],
                    opponent=away_team,
                    is_home=True,
                    is_back_to_back=is_b2b,
                ))

                game_counters[away_team] += 1
                schedules[away_team].append(ScheduleGame(
                    game_num=game_counters[away_team],
                    opponent=home_team,
                    is_home=False,
                    is_back_to_back=rng.random() < 0.15,
                ))

    return teams, schedules


def main() -> None:
    """Run the win total projection case study."""
    print("=" * 70)
    print("CASE STUDY 1: Monte Carlo Win Total Projection System")
    print("=" * 70)

    # --- Generate league ---
    teams, schedules = generate_league(n_teams=12, seed=42)

    print("\n--- League Teams ---\n")
    print(f"  {'Team':>12} {'Conf':>6} {'Division':>10} {'Rating':>7} "
          f"{'Unc':>5} {'Prev W':>7}")
    print("  " + "-" * 52)
    for abbrev in sorted(teams, key=lambda a: teams[a].rating, reverse=True):
        t = teams[abbrev]
        print(f"  {t.name:>12} {t.conference:>6} {t.division:>10} "
              f"{t.rating:>+7.1f} {t.rating_std:>5.1f} {t.prev_wins:>7}")

    # --- Pythagorean analysis ---
    print("\n--- Pythagorean Analysis ---\n")
    simulator = WinTotalSimulator(teams, schedules)

    for abbrev in sorted(teams, key=lambda a: teams[a].rating, reverse=True)[:6]:
        t = teams[abbrev]
        pyth = simulator.pythagorean_wins(t)
        luck = t.prev_wins - pyth
        print(f"  {t.name:>12}: Actual={t.prev_wins}, "
              f"Pyth={pyth:.1f}, Luck={luck:+.1f}")

    # --- Run simulation ---
    print("\n--- Season Simulation (10,000 seasons) ---\n")
    sim_results = simulator.simulate_seasons(
        n_sims=10_000, include_uncertainty=True, seed=42
    )

    print(f"  {'Team':>12} {'Mean':>6} {'Std':>5} {'P10':>5} "
          f"{'P90':>5} {'Div%':>6} {'Conf%':>6}")
    print("  " + "-" * 50)
    for abbrev in sorted(
        sim_results, key=lambda a: sim_results[a]["mean"], reverse=True
    ):
        r = sim_results[abbrev]
        print(f"  {teams[abbrev].name:>12} {r['mean']:>6.1f} "
              f"{r['std']:>5.1f} {r['p10']:>5.1f} {r['p90']:>5.1f} "
              f"{r['div_winner_prob']:>6.1%} {r['conf_best_prob']:>6.1%}")

    # --- Evaluate win totals ---
    print("\n\n--- Win Total Evaluation ---\n")

    rng = np.random.RandomState(99)
    lines: List[WinTotalLine] = []
    for abbrev in sorted(teams, key=lambda a: sim_results[a]["mean"], reverse=True):
        model_mean = sim_results[abbrev]["mean"]
        # Book sets line near model mean but with some offset
        book_offset = rng.normal(0, 1.5)
        book_line = round(model_mean + book_offset) + 0.5
        lines.append(WinTotalLine(
            team=abbrev, line=book_line,
            over_odds=1.91, under_odds=1.91,
        ))

    evaluations = []
    for line in lines:
        ev = simulator.evaluate_win_total(line, sim_results)
        evaluations.append(ev)

    evaluations.sort(key=lambda x: abs(x.get("best_edge", 0)), reverse=True)

    print(f"  {'Team':>12} {'Line':>6} {'Model':>6} {'Side':>6} "
          f"{'Prob':>6} {'Edge':>6} {'EV':>6} {'Rec':>5}")
    print("  " + "-" * 56)
    for ev in evaluations:
        print(f"  {teams[ev['team']].name:>12} {ev['line']:>6.1f} "
              f"{ev['model_mean']:>6.1f} {ev['best_side']:>6} "
              f"{ev['over_prob'] if ev['best_side'] == 'OVER' else ev['under_prob']:>6.1%} "
              f"{ev['best_edge']:>+5.1%} {ev['ev_per_dollar']:>+5.1%} "
              f"{ev['recommendation']:>5}")

    # --- Recommended bets ---
    bets = [e for e in evaluations if e["recommendation"] == "BET"]
    print(f"\n  Recommended bets: {len(bets)}")
    total_ev = sum(b["ev_per_dollar"] for b in bets)
    print(f"  Combined EV (per $1 per bet): {total_ev / len(bets):+.1%}" if bets else "")

    # --- Uncertainty impact ---
    print("\n\n--- Rating Uncertainty Impact ---\n")
    sim_certain = simulator.simulate_seasons(
        n_sims=5_000, include_uncertainty=False, seed=42
    )
    sim_uncertain = simulator.simulate_seasons(
        n_sims=5_000, include_uncertainty=True, seed=42
    )

    print(f"  {'Team':>12} {'Std(fixed)':>10} {'Std(uncertain)':>14} {'Increase':>9}")
    print("  " + "-" * 48)
    for abbrev in sorted(teams, key=lambda a: teams[a].rating, reverse=True)[:6]:
        s_c = sim_certain[abbrev]["std"]
        s_u = sim_uncertain[abbrev]["std"]
        print(f"  {teams[abbrev].name:>12} {s_c:>10.1f} {s_u:>14.1f} "
              f"{s_u - s_c:>+9.1f}")

    print("\n" + "=" * 70)


if __name__ == "__main__":
    main()

Analysis and Results

The simulation reveals several important patterns. First, even in a 12-team league with clear rating separations, the standard deviation of win totals is typically 4-6 wins. This means a team projected for 30 wins (out of approximately 44 total games per team in this round-robin) could reasonably finish anywhere from 24 to 36 wins. This wide distribution is why point estimates alone are insufficient for evaluating win total bets.

Second, incorporating rating uncertainty increases the win distribution's spread by 0.5 to 1.5 wins of additional standard deviation. This is crucial: using point estimate ratings underestimates the variance of outcomes, which would lead to overconfident probabilities and oversized bets.

Third, the win total evaluation identifies 3-5 bets per season with meaningful edge. These tend to occur when the book's line deviates from the model's mean by more than one standard deviation, or when the model has particularly high conviction (low rating uncertainty) for a given team.

The Pythagorean analysis adds a regression-to-the-mean component. Teams that outperformed their Pythagorean expectation last season (positive "luck") are projected for fewer wins next season, and vice versa. This is one of the most powerful preseason adjustments because close-game performance is largely random.

Key Takeaways

Monte Carlo simulation provides the complete information needed for win total bet evaluation: not just the mean, but the full probability distribution. The combination of team ratings, schedule analysis, and rating uncertainty produces realistic win distributions that properly capture the range of outcomes. The edge identification process is straightforward once the simulation is built: compare the simulated probability of going over or under the line to the book's implied probability, and bet when the difference exceeds a threshold that covers the vig.