Case Study 2: Anatomy of a Sportsbook — How the Vig Really Works


Executive Summary

The vigorish — commonly called "the vig" or "juice" — is the mechanism by which sportsbooks guarantee themselves a profit margin regardless of game outcomes. While most bettors understand that odds of -110 on each side of a point spread mean they must risk $110 to win $100, few appreciate the full architecture of how sportsbooks set lines, manage risk, balance exposure, and compound their edge across thousands of events. This case study dissects the complete operational mechanics of a sportsbook, from initial line-setting for an NFL game through real-time risk management as bets flow in. Using Python simulations, we model sportsbook operations across a full NFL Sunday slate, then simulate 1,000 individual bettors placing wagers over an entire 18-week season to demonstrate how the vig's seemingly small mathematical edge produces devastating long-term results for the average bettor. The analysis reveals that at standard -110 juice, a bettor picking games at 50% accuracy will lose approximately 4.55% of every dollar wagered, and that even modest improvements in win rate are insufficient to overcome the vig unless a bettor achieves sustained accuracy above 52.4%.


Background

What Is a Sportsbook?

A sportsbook is a business that accepts wagers on the outcomes of sporting events. In the modern legal U.S. market, sportsbooks operate under state-issued licenses, either as standalone entities or through partnerships with casinos, sports venues, or media companies. Major operators include DraftKings, FanDuel, BetMGM, Caesars Sportsbook, and several others.

The sportsbook's fundamental business model is deceptively simple: set prices (odds) on each possible outcome of an event such that the total implied probabilities exceed 100%. This excess, the overround, is the vig. If a sportsbook could attract betting volume on each side in exact proportion to those implied probabilities, its payout would be the same whichever team wins, and it would earn the vig as guaranteed profit.

In practice, the business is far more complex. Sportsbooks must set accurate opening lines, adjust those lines in response to betting action and new information (injuries, weather, roster changes), manage their risk exposure when lopsided action creates large potential liabilities, and compete with other books for customers — all while maintaining sufficient margin to cover operating costs, marketing spend, and regulatory compliance.

How Lines Are Set

The process of setting a line for an NFL game begins days or even weeks before kickoff. Here is a simplified but realistic walkthrough for a hypothetical Week 10 matchup: the Kansas City Chiefs at the Buffalo Bills.

Step 1 — Power Ratings: The sportsbook's trading team maintains power ratings for every NFL team, updated weekly based on performance metrics, advanced statistics (EPA, DVOA, success rate), and qualitative factors. Suppose the book's model rates Buffalo as 2.5 points better than Kansas City on a neutral field.

Step 2 — Home-Field Adjustment: Historical data suggests NFL home-field advantage is worth approximately 1.5 to 2.5 points (the value has declined in recent years). The book applies a 2.0-point home-field adjustment, making Buffalo a 4.5-point favorite in this matchup.

Step 3 — Situational Adjustments: The trading team may adjust for factors their model does not fully capture: Kansas City is coming off a Monday Night Football game (short rest disadvantage of approximately 0.5 points), Buffalo's star wide receiver is listed as questionable (no adjustment yet, but the line will move if he is ruled out), and the weather forecast calls for moderate wind (slight adjustment to the total, less impact on the spread).

Step 4 — Opening Line: The book opens with Buffalo -4.5 at -110/-110. This means a bettor must wager $110 on either side to win $100. The -110 price on each side embeds the vig.

Step 5 — Market Adjustment: Once the line is posted, professional bettors ("sharps") and the general public begin placing wagers. If the book receives disproportionate sharp action on Kansas City +4.5, the line may move to Buffalo -4, then Buffalo -3.5. The book adjusts the line to balance its liability, though it also respects the informational content of sharp action — if the sharpest bettors in the market believe Kansas City +4.5 is a value bet, that is a signal the true line may be closer to 3.5 or 4.
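
Steps 1 through 5 can be sketched in a few lines. This is a minimal illustration, not the book's actual pricing model: the function names `opening_spread` and `move_line`, the sign convention (negative means the home team is favored), and the half-point step rule are all assumptions made for the example.

```python
def opening_spread(
    home_rating_edge: float,
    home_field_points: float = 2.0,
    situational_points: float = 0.0,
) -> float:
    """Return the home team's spread (negative = home favored).

    home_rating_edge: points by which the home team rates better than
        the away team on a neutral field (negative if it rates worse).
    home_field_points: home-field adjustment in points.
    situational_points: extra points in the home team's favor, such as
        an opponent on short rest (hypothetical parameter).
    """
    expected_home_margin = (
        home_rating_edge + home_field_points + situational_points
    )
    return -expected_home_margin


def move_line(spread: float, half_point_steps: int) -> float:
    """Shift the home spread by 0.5 points per step.

    Assumed rule: positive steps reflect action on the away team, so a
    home favorite lays fewer points after each step.
    """
    return spread + 0.5 * half_point_steps


# Chiefs at Bills: Buffalo rates 2.5 points better, plus 2.0 for home field.
bills_open = opening_spread(home_rating_edge=2.5, home_field_points=2.0)
print(bills_open)                # -4.5

# Sharp action on Kansas City +4.5 moves the line in half-point steps.
print(move_line(bills_open, 1))  # -4.0
print(move_line(bills_open, 2))  # -3.5
```

Passing `situational_points=0.5` for Kansas City's short rest would give -5.0; the walkthrough's opening number of -4.5 corresponds to leaving that adjustment at zero.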

The Mathematics of the Vig

At standard -110/-110 pricing on a two-outcome market, the mathematics work as follows:

  • To win $100 on Side A, you risk $110.
  • To win $100 on Side B, you risk $110.
  • The sportsbook collects $220 in total wagers from the two bettors.
  • Regardless of which side wins, the book pays out $210 ($110 stake returned + $100 profit to the winner).
  • The book retains $10, which is 4.545% of the $220 wagered.

This 4.545% is the theoretical hold percentage, also called the "vig" or "juice." It represents the sportsbook's gross margin before operating expenses.
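
The ledger arithmetic above assumes equal money on both sides. The sketch below, with the hypothetical helpers `book_profit` and `hold_report`, repeats the calculation for a balanced book and for a 70/30 split of the same $220 handle. The averaging step assumes each side wins half the time, which is an assumption of this example rather than a property of any real market.

```python
def book_profit(
    stake_a: float, stake_b: float, a_wins: bool, odds: int = -110
) -> float:
    """Book's net result when both sides are priced at the same negative odds.

    The book keeps losing stakes and pays winners stake * 100 / |odds|
    in profit (winning stakes are simply returned).
    """
    payout_ratio = 100 / abs(odds)
    if a_wins:
        return stake_b - stake_a * payout_ratio
    return stake_a - stake_b * payout_ratio


def hold_report(stake_a: float, stake_b: float) -> dict:
    """Profit under each outcome, plus the hold averaged over a 50/50 result."""
    handle = stake_a + stake_b
    if_a = book_profit(stake_a, stake_b, a_wins=True)
    if_b = book_profit(stake_a, stake_b, a_wins=False)
    return {
        "profit_if_a_wins": round(if_a, 2),
        "profit_if_b_wins": round(if_b, 2),
        "avg_hold_pct": round((if_a + if_b) / 2 / handle * 100, 3),
    }


# Balanced action: $110 on each side, so the book keeps $10 either way.
print(hold_report(110, 110))

# Lopsided action: $154 on A and $66 on B. The book loses $74 if A wins
# and makes $94 if B wins; the average is still $10 on $220.
print(hold_report(154, 66))
```

Under these assumptions the expected hold is the same in both cases. Imbalance does not change the book's average margin on a true coin-flip; it changes how far a single result can land from that average.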

Different bet types carry different vig structures:

  • Point spreads and totals: Typically -110/-110 (4.55% theoretical hold)
  • Moneylines: Vig is embedded in the gap between the two prices (e.g., -150/+130 implies probabilities of 60.0% and 43.5%, an overround of about 3.5% and a hold of roughly 3.4%)
  • Parlays: The vig compounds multiplicatively with each leg, making parlays significantly more profitable for the book
  • Prop bets and futures: Often carry higher vig (6-10%+) due to less liquid markets and more uncertainty in pricing
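
From the bettor's side of the counter, each price translates into a breakeven win rate and an expected value per dollar risked. The following is a small sketch of that conversion; the helper names `payout_per_dollar`, `breakeven_win_rate`, and `ev_per_dollar` are illustrative and are not part of the simulation code later in this case study.

```python
def payout_per_dollar(odds: int) -> float:
    """Profit on a winning $1 stake at American odds."""
    return 100 / abs(odds) if odds < 0 else odds / 100


def breakeven_win_rate(odds: int) -> float:
    """Win rate at which expected profit is zero: 1 / (1 + payout)."""
    return 1 / (1 + payout_per_dollar(odds))


def ev_per_dollar(win_rate: float, odds: int) -> float:
    """Expected profit per $1 risked at the given win rate."""
    return win_rate * payout_per_dollar(odds) - (1 - win_rate)


# Reduced juice, standard juice, and a high-vig prop price.
for odds in (-105, -110, -120):
    print(
        f"{odds}: breakeven = {breakeven_win_rate(odds):.2%}, "
        f"EV at 50% = {ev_per_dollar(0.50, odds):+.4f} per $1"
    )
```

At -110 this reproduces the 52.38% breakeven and the loss of roughly 4.55 cents per dollar risked for a 50% bettor; at -120 the breakeven rises to about 54.55%.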

The Challenge

You have been hired as a data analyst at a mid-sized sportsbook. Your manager has asked you to build a simulation model that accomplishes three objectives:

  1. Model a full NFL Sunday slate: Simulate the sportsbook's operations for a typical 13-game NFL Sunday, including line-setting, bet acceptance, line movement, and profit/loss calculation.

  2. Analyze hold percentage: Calculate the actual hold (not theoretical) across the full slate and explain why it differs from the theoretical vig.

  3. Simulate bettor outcomes: Model 1,000 recreational bettors placing spread bets over an 18-week NFL season to demonstrate the long-term impact of the vig on bankrolls.


Available Data

For this simulation, we generate synthetic data representing:

  • A slate of 13 NFL Sunday games with opening lines and "true" probabilities
  • Bet-by-bet records for thousands of individual wagers on each game
  • Season-long bettor records tracking bankrolls, wager counts, and outcomes

Data Dictionary

| Column | Type | Description |
|---|---|---|
| game_id | int | Unique identifier for each game |
| home_team | string | Home team name |
| away_team | string | Away team name |
| opening_spread | float | Opening point spread (negative = home favored) |
| closing_spread | float | Closing spread after line movement |
| true_home_cover_prob | float | Simulated "true" probability home team covers |
| home_covered | bool | Whether the home team covered the spread |
| total_bets | int | Number of individual bets placed on this game |
| home_side_pct | float | Percentage of bets on home team to cover |
| total_wagered | float | Total dollars wagered on this game |
| book_profit | float | Sportsbook profit/loss on this game |

Analysis Approach

Phase 1: Modeling the Vig Across Bet Types

"""
Phase 1: Calculate and visualize the vig (vigorish) embedded
in different bet types and odds formats.

This module demonstrates how the overround works for spreads,
moneylines, and parlays, and computes the breakeven win rate
a bettor needs to overcome the vig.
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from typing import Tuple, List, Dict


def american_to_implied_prob(odds: int) -> float:
    """
    Convert American odds to implied probability.

    Args:
        odds: American-format odds (e.g., -110, +150).

    Returns:
        Implied probability as a float between 0 and 1.

    Examples:
        >>> american_to_implied_prob(-110)
        0.5238095238095238
        >>> american_to_implied_prob(+150)
        0.4
    """
    if odds < 0:
        return abs(odds) / (abs(odds) + 100)
    else:
        return 100 / (odds + 100)


def calculate_vig(odds_side_a: int, odds_side_b: int) -> Dict[str, float]:
    """
    Calculate the vig (overround) for a two-outcome market.

    Args:
        odds_side_a: American odds for outcome A.
        odds_side_b: American odds for outcome B.

    Returns:
        Dictionary containing:
            - implied_prob_a: Implied probability of outcome A
            - implied_prob_b: Implied probability of outcome B
            - overround: Total implied probability (> 1.0 indicates vig)
            - vig_pct: The vig as a percentage of total wagered
            - breakeven_pct: Win rate needed to break even betting one side
    """
    prob_a = american_to_implied_prob(odds_side_a)
    prob_b = american_to_implied_prob(odds_side_b)
    overround = prob_a + prob_b
    vig_pct = (overround - 1) / overround * 100

    # Breakeven for a bettor always taking side A at these odds is
    # simply side A's implied probability
    breakeven = prob_a

    return {
        "implied_prob_a": prob_a,
        "implied_prob_b": prob_b,
        "overround": overround,
        "vig_pct": vig_pct,
        "breakeven_pct": breakeven * 100,
    }


def demonstrate_vig_across_bet_types() -> pd.DataFrame:
    """
    Show how the vig varies across common bet types and odds.

    Returns:
        pd.DataFrame: Summary of vig calculations for various market types.
    """
    markets = [
        ("Spread (-110/-110)", -110, -110),
        ("Spread (-105/-115)", -105, -115),
        ("Moneyline (Even game: -115/+105)", -115, 105),
        ("Moneyline (Moderate fav: -150/+130)", -150, 130),
        ("Moneyline (Heavy fav: -250/+200)", -250, 200),
        ("Moneyline (Extreme fav: -500/+380)", -500, 380),
        ("Reduced juice (-105/-105)", -105, -105),
        ("High-vig prop (-120/-120)", -120, -120),
    ]

    results = []
    for name, odds_a, odds_b in markets:
        vig_info = calculate_vig(odds_a, odds_b)
        results.append({
            "market": name,
            "odds_a": odds_a,
            "odds_b": odds_b,
            "overround": f"{vig_info['overround']:.4f}",
            "vig_pct": f"{vig_info['vig_pct']:.2f}%",
            "breakeven_win_rate": f"{vig_info['breakeven_pct']:.2f}%",
        })

    df = pd.DataFrame(results)
    print("=== Vig Analysis Across Market Types ===\n")
    print(df.to_string(index=False))
    return df


def demonstrate_parlay_vig() -> None:
    """
    Show how vig compounds in parlays, making them
    significantly more profitable for the sportsbook.
    """
    print("\n=== Parlay Vig Compounding ===\n")

    # True fair odds for a coin-flip game: +100/+100
    # Standard sportsbook odds: -110/-110 (4.55% vig on a single bet)

    for num_legs in range(2, 9):
        # Fair total return per $1 staked (stake included): 2 ^ num_legs,
        # equivalent to odds of (2 ^ num_legs - 1) to 1
        fair_payout = 2.0 ** num_legs

        # Actual total return per $1 staked at -110 per leg
        # At -110, each leg pays 100/110 = 0.9091x profit
        # Parlay multiplier: (1 + 100/110) ^ num_legs = (210/110) ^ num_legs
        actual_payout = (210 / 110) ** num_legs

        # Effective vig for the parlay
        # If true prob of winning = (0.5)^num_legs
        # Fair expected return = (0.5)^num_legs * fair_payout = 1.0 (break even)
        # Actual expected return = (0.5)^num_legs * actual_payout
        true_prob = 0.5 ** num_legs
        ev_per_dollar = true_prob * actual_payout
        effective_vig = (1 - ev_per_dollar) * 100

        print(
            f"  {num_legs}-leg parlay: "
            f"Fair return = {fair_payout:.0f}x | "
            f"Actual return = {actual_payout:.2f}x | "
            f"Effective vig = {effective_vig:.1f}% | "
            f"Expected return per $1 = ${ev_per_dollar:.4f}"
        )


vig_summary = demonstrate_vig_across_bet_types()
demonstrate_parlay_vig()

Phase 2: Simulating a Full NFL Sunday Slate

"""
Phase 2: Simulate a sportsbook's operations for a full NFL Sunday
slate of 13 games, including line-setting, bet flow, line movement,
and profit/loss calculation.
"""

np.random.seed(2024)


def generate_nfl_sunday_slate(num_games: int = 13) -> pd.DataFrame:
    """
    Generate a realistic slate of NFL games with opening lines,
    true probabilities, and outcomes.

    Args:
        num_games: Number of games on the slate (default 13 for a
            typical NFL Sunday).

    Returns:
        pd.DataFrame: Game-level data including spreads, true
            probabilities, and outcomes.
    """
    matchups = [
        ("Bills", "Dolphins", -3.5),
        ("Ravens", "Bengals", -6.0),
        ("Lions", "Bears", -7.5),
        ("Packers", "Vikings", 2.5),
        ("Eagles", "Cowboys", -3.0),
        ("Chiefs", "Raiders", -9.5),
        ("49ers", "Seahawks", -4.0),
        ("Steelers", "Browns", -2.5),
        ("Texans", "Jaguars", -6.5),
        ("Falcons", "Saints", 1.0),
        ("Broncos", "Chargers", 3.0),
        ("Jets", "Patriots", -1.5),
        ("Commanders", "Giants", -4.5),
    ]

    games = []
    for i, (home, away, spread) in enumerate(matchups[:num_games]):
        # True probability that home team covers
        # Centered around 50% with some variance (the market is efficient
        # but not perfectly so)
        true_prob = np.clip(np.random.normal(0.50, 0.06), 0.35, 0.65)

        # Simulate small line movement (within 1 point typically)
        line_move = np.random.choice(
            [-1.0, -0.5, 0.0, 0.5, 1.0],
            p=[0.08, 0.17, 0.50, 0.17, 0.08]
        )
        closing_spread = spread + line_move

        # Determine if home covers (using true probability)
        home_covered = np.random.random() < true_prob

        games.append({
            "game_id": i + 1,
            "home_team": home,
            "away_team": away,
            "opening_spread": spread,
            "closing_spread": closing_spread,
            "true_home_cover_prob": round(true_prob, 4),
            "home_covered": home_covered,
        })

    return pd.DataFrame(games)


def simulate_betting_action(
    games: pd.DataFrame,
    avg_bets_per_game: int = 5000,
    avg_bet_size: float = 125.0,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """
    Simulate individual bets flowing into the sportsbook for each game.

    This function models the realistic phenomenon that public bettors
    tend to favor home teams, favorites, and popular franchises,
    creating imbalanced action that the book must manage.

    Args:
        games: DataFrame of games from generate_nfl_sunday_slate().
        avg_bets_per_game: Average number of bets placed per game.
        avg_bet_size: Average bet size in dollars.

    Returns:
        Tuple of:
            - pd.DataFrame: Updated games DataFrame with betting metrics
            - pd.DataFrame: Individual bet records
    """
    all_bets = []
    game_summaries = []

    for _, game in games.iterrows():
        # Number of bets varies by game attractiveness
        num_bets = int(np.random.normal(avg_bets_per_game, avg_bets_per_game * 0.3))
        num_bets = max(500, num_bets)

        # Public bias: tends to bet favorites and home teams
        spread = game["closing_spread"]
        is_home_favorite = spread < 0

        # Base probability of betting home side (public bias)
        if is_home_favorite:
            home_bet_prob = np.clip(0.55 + abs(spread) * 0.01, 0.50, 0.70)
        else:
            home_bet_prob = np.clip(0.45 - abs(spread) * 0.01, 0.30, 0.50)

        bets = []
        for j in range(num_bets):
            bet_on_home = np.random.random() < home_bet_prob
            # Bet sizes follow a log-normal distribution (most small, some large)
            bet_size = np.random.lognormal(
                mean=np.log(avg_bet_size), sigma=0.8
            )
            bet_size = round(min(bet_size, 10000), 2)  # Cap at $10,000

            # Standard -110 odds
            odds = -110

            bets.append({
                "game_id": game["game_id"],
                "bet_id": j + 1,
                "side": "home" if bet_on_home else "away",
                "bet_size": bet_size,
                "odds": odds,
            })

        bets_df = pd.DataFrame(bets)

        # Calculate game-level summary
        home_bets = bets_df[bets_df["side"] == "home"]
        away_bets = bets_df[bets_df["side"] == "away"]

        total_home_wagered = home_bets["bet_size"].sum()
        total_away_wagered = away_bets["bet_size"].sum()
        total_wagered = total_home_wagered + total_away_wagered

        # Calculate book's profit/loss
        if game["home_covered"]:
            # Home bettors win: book pays home bettors, keeps away bets
            payout_to_winners = total_home_wagered * (100 / 110)
            book_profit = total_away_wagered - payout_to_winners
        else:
            # Away bettors win: book pays away bettors, keeps home bets
            payout_to_winners = total_away_wagered * (100 / 110)
            book_profit = total_home_wagered - payout_to_winners

        game_summaries.append({
            "game_id": game["game_id"],
            "home_team": game["home_team"],
            "away_team": game["away_team"],
            "closing_spread": game["closing_spread"],
            "home_covered": game["home_covered"],
            "total_bets": num_bets,
            "home_side_pct": round(len(home_bets) / num_bets * 100, 1),
            "home_dollars_pct": round(total_home_wagered / total_wagered * 100, 1),
            "total_wagered": round(total_wagered, 2),
            "book_profit": round(book_profit, 2),
        })

        bets_df["game_home"] = game["home_team"]
        bets_df["game_away"] = game["away_team"]
        bets_df["home_covered"] = game["home_covered"]
        all_bets.append(bets_df)

    game_results = pd.DataFrame(game_summaries)
    all_bets_df = pd.concat(all_bets, ignore_index=True)

    return game_results, all_bets_df


# Generate and simulate
games = generate_nfl_sunday_slate()
game_results, all_bets = simulate_betting_action(games)

print("=== NFL Sunday Slate Results ===\n")
print(game_results[[
    "home_team", "away_team", "closing_spread", "home_covered",
    "total_bets", "home_side_pct", "home_dollars_pct",
    "total_wagered", "book_profit"
]].to_string(index=False))

total_handle = game_results["total_wagered"].sum()
total_profit = game_results["book_profit"].sum()
actual_hold = total_profit / total_handle * 100

print(f"\n--- Sunday Summary ---")
print(f"Total handle: ${total_handle:,.2f}")
print(f"Total book profit: ${total_profit:,.2f}")
print(f"Actual hold percentage: {actual_hold:.2f}%")
print(f"Theoretical hold (at -110/-110): 4.55%")
print(f"Difference: {actual_hold - 4.55:+.2f} percentage points")

Phase 3: Analyzing Hold Across Different Vig Levels

"""
Phase 3: Analyze how different vig levels impact sportsbook
profitability and bettor expected value across many simulated slates.
"""


def simulate_many_slates(
    num_simulations: int = 500,
    vig_levels: List[int] = None,
) -> pd.DataFrame:
    """
    Run Monte Carlo simulation of NFL Sunday slates under
    different vig (juice) levels to analyze hold distribution.

    Args:
        num_simulations: Number of Sunday slates to simulate for each
            vig level.
        vig_levels: List of odds to test (e.g., [-105, -110, -115]).
            Default tests five common vig levels.

    Returns:
        pd.DataFrame: Simulation results with hold percentages
            for each vig level across all simulated slates.
    """
    if vig_levels is None:
        vig_levels = [-105, -108, -110, -115, -120]

    results = []

    for juice in vig_levels:
        # Theoretical hold: for -110, implied prob = 110/210 = 0.5238;
        # overround = 2 * 0.5238 = 1.0476; hold = 0.0476 / 1.0476 = 4.55%
        implied_prob = abs(juice) / (abs(juice) + 100)
        overround = 2 * implied_prob
        theo_hold_pct = (overround - 1) / overround * 100

        for sim in range(num_simulations):
            num_games = 13
            total_wagered = 0
            total_profit = 0

            for g in range(num_games):
                # Simulate bet volume for this game
                num_bets = int(np.random.normal(5000, 1500))
                num_bets = max(500, num_bets)

                # Share of bets on the more popular side (uniform between 50% and 68%)
                popular_side_pct = np.random.uniform(0.50, 0.68)

                # Generate bet sizes (log-normal)
                bet_sizes = np.random.lognormal(mean=np.log(125), sigma=0.8, size=num_bets)
                bet_sizes = np.minimum(bet_sizes, 10000)

                # Assign sides
                sides = np.random.random(num_bets) < popular_side_pct
                popular_wagered = bet_sizes[sides].sum()
                unpopular_wagered = bet_sizes[~sides].sum()
                game_wagered = popular_wagered + unpopular_wagered

                # Outcome: true 50/50 whether popular side covers
                popular_wins = np.random.random() < 0.5

                if popular_wins:
                    payout = popular_wagered * (100 / abs(juice))
                    profit = unpopular_wagered - payout
                else:
                    payout = unpopular_wagered * (100 / abs(juice))
                    profit = popular_wagered - payout

                total_wagered += game_wagered
                total_profit += profit

            actual_hold = total_profit / total_wagered * 100

            results.append({
                "juice": juice,
                "theoretical_hold_pct": round(theo_hold_pct, 3),
                "simulation": sim,
                "total_wagered": total_wagered,
                "total_profit": total_profit,
                "actual_hold_pct": actual_hold,
            })

    return pd.DataFrame(results)


def plot_hold_distributions(results: pd.DataFrame) -> None:
    """
    Plot the distribution of actual hold percentages across
    simulated slates for each vig level.

    Args:
        results: DataFrame from simulate_many_slates().
    """
    juice_levels = sorted(results["juice"].unique())

    fig, axes = plt.subplots(1, len(juice_levels), figsize=(18, 5), sharey=True)

    for i, juice in enumerate(juice_levels):
        subset = results[results["juice"] == juice]
        theo = subset["theoretical_hold_pct"].iloc[0]

        axes[i].hist(
            subset["actual_hold_pct"], bins=40,
            color="#2563eb", alpha=0.7, edgecolor="black", linewidth=0.5
        )
        axes[i].axvline(theo, color="red", linewidth=2, linestyle="--", label=f"Theoretical: {theo:.2f}%")
        axes[i].axvline(0, color="black", linewidth=1, linestyle=":", alpha=0.5)
        axes[i].set_title(f"Juice: {juice}")
        axes[i].set_xlabel("Actual Hold %")
        if i == 0:
            axes[i].set_ylabel("Frequency")
        axes[i].legend(fontsize=8)

        # Print summary stats
        mean_hold = subset["actual_hold_pct"].mean()
        losing_sundays = (subset["actual_hold_pct"] < 0).mean() * 100
        print(
            f"Juice {juice}: Mean hold = {mean_hold:.2f}%, "
            f"Std = {subset['actual_hold_pct'].std():.2f}%, "
            f"Losing Sundays = {losing_sundays:.1f}%"
        )

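    # Derive the slate count from the data so the title stays accurate
    # when num_simulations is changed
    num_slates = results["simulation"].nunique()
    plt.suptitle(f"Distribution of Actual Hold % Across {num_slates} Simulated NFL Sundays", fontsize=14)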
    plt.tight_layout()
    plt.savefig("hold_distributions.png", dpi=150, bbox_inches="tight")
    plt.show()


hold_results = simulate_many_slates()
plot_hold_distributions(hold_results)

Phase 4: Simulating 1,000 Bettors Over a Full Season

"""
Phase 4: Simulate 1,000 individual bettors placing spread bets over
an 18-week NFL season to demonstrate the long-term effect of the vig
on bankroll outcomes.

Each bettor has a defined skill level (win probability on each bet),
a bankroll management strategy, and a fixed number of bets per week.
We track how the vig separates skilled from unskilled bettors.
"""

from dataclasses import dataclass


@dataclass
class BettorProfile:
    """Represents a single bettor's characteristics and season results."""
    bettor_id: int
    true_win_rate: float
    bets_per_week: int
    bet_size: float
    starting_bankroll: float
    weekly_bankrolls: List[float] = None
    total_bets: int = 0
    total_wins: int = 0
    ending_bankroll: float = 0.0

    def __post_init__(self) -> None:
        if self.weekly_bankrolls is None:
            self.weekly_bankrolls = [self.starting_bankroll]


def simulate_bettor_season(
    bettor: BettorProfile,
    num_weeks: int = 18,
    odds: int = -110,
) -> BettorProfile:
    """
    Simulate one bettor's full NFL season of spread betting.

    Args:
        bettor: BettorProfile with the bettor's characteristics.
        num_weeks: Number of weeks in the season (NFL = 18).
        odds: American odds the bettor receives on each wager.

    Returns:
        Updated BettorProfile with season results.
    """
    bankroll = bettor.starting_bankroll
    payout_ratio = 100 / abs(odds)  # At -110, this is 0.9091

    for week in range(num_weeks):
        for bet in range(bettor.bets_per_week):
            if bankroll < bettor.bet_size:
                # Bettor is busted — cannot place any more bets
                break

            # Determine outcome
            won = np.random.random() < bettor.true_win_rate
            bettor.total_bets += 1

            if won:
                bankroll += bettor.bet_size * payout_ratio
                bettor.total_wins += 1
            else:
                bankroll -= bettor.bet_size

        bettor.weekly_bankrolls.append(bankroll)

    bettor.ending_bankroll = bankroll
    return bettor


def simulate_season_population(
    num_bettors: int = 1000,
    num_weeks: int = 18,
    odds: int = -110,
) -> Tuple[pd.DataFrame, List[BettorProfile]]:
    """
    Simulate a population of bettors with varying skill levels
    over a full NFL season.

    The population is distributed as follows:
        - 70% are pure recreational bettors (50% win rate)
        - 15% are slightly skilled (51% win rate)
        - 10% are moderately skilled (52% win rate)
        - 4% are sharp bettors (53% win rate)
        - 1% are elite (55% win rate)

    Args:
        num_bettors: Total number of bettors to simulate.
        num_weeks: Number of weeks in the season.
        odds: American odds applied to all bets.

    Returns:
        Tuple of:
            - pd.DataFrame: Season results for all bettors.
            - List[BettorProfile]: The simulated bettor objects,
              including weekly bankroll histories.
    """
    # Define bettor population
    skill_distribution = [
        (0.50, int(num_bettors * 0.70)),  # Recreational: 50%
        (0.51, int(num_bettors * 0.15)),  # Slightly skilled: 51%
        (0.52, int(num_bettors * 0.10)),  # Moderately skilled: 52%
        (0.53, int(num_bettors * 0.04)),  # Sharp: 53%
        (0.55, int(num_bettors * 0.01)),  # Elite: 55%
    ]

    bettors = []
    bettor_id = 0

    for win_rate, count in skill_distribution:
        for _ in range(count):
            bets_per_week = np.random.choice([3, 5, 8, 10, 15], p=[0.15, 0.30, 0.30, 0.15, 0.10])
            bet_size = np.random.choice([25, 50, 100, 200], p=[0.30, 0.35, 0.25, 0.10])
            starting_bankroll = bet_size * np.random.choice([20, 40, 60, 100], p=[0.25, 0.35, 0.25, 0.15])

            profile = BettorProfile(
                bettor_id=bettor_id,
                true_win_rate=win_rate,
                bets_per_week=bets_per_week,
                bet_size=bet_size,
                starting_bankroll=starting_bankroll,
            )
            profile = simulate_bettor_season(profile, num_weeks, odds)
            bettors.append(profile)
            bettor_id += 1

    records = []
    for b in bettors:
        roi = (b.ending_bankroll - b.starting_bankroll) / b.starting_bankroll * 100
        records.append({
            "bettor_id": b.bettor_id,
            "true_win_rate": b.true_win_rate,
            "bets_per_week": b.bets_per_week,
            "bet_size": b.bet_size,
            "starting_bankroll": b.starting_bankroll,
            "ending_bankroll": round(b.ending_bankroll, 2),
            "total_bets": b.total_bets,
            "total_wins": b.total_wins,
            "actual_win_rate": round(b.total_wins / max(b.total_bets, 1), 4),
            "roi_pct": round(roi, 2),
            "went_bust": b.ending_bankroll < b.bet_size,
        })

    return pd.DataFrame(records), bettors


# Run the simulation
np.random.seed(42)
season_results, bettor_objects = simulate_season_population()

print("=== Season Simulation Results (1,000 Bettors) ===\n")

for win_rate in [0.50, 0.51, 0.52, 0.53, 0.55]:
    group = season_results[season_results["true_win_rate"] == win_rate]
    print(f"\n--- Win Rate: {win_rate:.0%} ({len(group)} bettors) ---")
    print(f"  Mean ROI: {group['roi_pct'].mean():.1f}%")
    print(f"  Median ROI: {group['roi_pct'].median():.1f}%")
    print(f"  % Profitable: {(group['roi_pct'] > 0).mean():.1%}")
    print(f"  % Went Bust: {group['went_bust'].mean():.1%}")
    print(f"  Mean ending bankroll: ${group['ending_bankroll'].mean():,.0f}")
    print(f"  Avg total bets placed: {group['total_bets'].mean():.0f}")
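
The group means printed above can be sanity-checked against a closed-form expectation. The sketch below is an approximation under stated assumptions: fixed bet size, independent bets, and no bust-out rule (the simulation stops a bettor whose bankroll falls below one bet, so its results will differ somewhat). The function name `season_expectation` and the example inputs (144 bets of $50) are illustrative choices, not values taken from the simulation output.

```python
import math


def season_expectation(
    win_rate: float, num_bets: int, bet_size: float, odds: int = -110
) -> dict:
    """Mean and standard deviation of season profit for flat betting.

    Assumes independent bets at a fixed stake and ignores bankroll limits.
    """
    payout = 100 / abs(odds)  # profit per $1 on a win at negative odds

    # Expected profit on one bet
    mean_per_bet = bet_size * (win_rate * payout - (1 - win_rate))

    # Each bet is a two-point outcome (+payout * stake or -stake), so
    # its variance is p * (1 - p) * (gap between the outcomes) ** 2
    gap = bet_size * (payout + 1)
    sd_per_bet = gap * math.sqrt(win_rate * (1 - win_rate))

    return {
        "expected_pnl": num_bets * mean_per_bet,
        "std_pnl": math.sqrt(num_bets) * sd_per_bet,
    }


# Hypothetical bettor: 8 bets per week for 18 weeks, $50 per bet, 50% win rate.
result = season_expectation(win_rate=0.50, num_bets=18 * 8, bet_size=50)
print(f"Expected P&L: ${result['expected_pnl']:,.2f}")
print(f"Std dev of P&L: ${result['std_pnl']:,.2f}")
```

With these inputs the expected loss is about $327 while one standard deviation of season profit is about $573, which is why a meaningful share of 50% bettors can still finish a single season ahead even though the average result is negative.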
"""
Visualization of season simulation results.
"""


def plot_season_results(
    season_results: pd.DataFrame,
    bettor_objects: List[BettorProfile],
) -> None:
    """
    Create comprehensive visualizations of the season simulation,
    including ROI distributions, bankroll trajectories, and the
    relationship between skill and profitability.

    Args:
        season_results: DataFrame of season outcomes for all bettors.
        bettor_objects: List of BettorProfile objects with weekly
            bankroll histories.
    """
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))

    # Panel 1: ROI distribution by skill level
    win_rates = [0.50, 0.51, 0.52, 0.53, 0.55]
    colors = ["#ef4444", "#f97316", "#eab308", "#22c55e", "#2563eb"]

    for wr, color in zip(win_rates, colors):
        group = season_results[season_results["true_win_rate"] == wr]
        axes[0, 0].hist(
            group["roi_pct"], bins=30, alpha=0.5,
            label=f"{wr:.0%} ({len(group)})", color=color, edgecolor="black",
            linewidth=0.3
        )
    axes[0, 0].axvline(0, color="black", linewidth=2, linestyle="--")
    axes[0, 0].set_xlabel("Season ROI (%)")
    axes[0, 0].set_ylabel("Number of Bettors")
    axes[0, 0].set_title("ROI Distribution by True Win Rate")
    axes[0, 0].legend(title="Win Rate (n)")

    # Panel 2: Bankroll trajectories (sample of 50 bettors at 50% win rate)
    recreational = [b for b in bettor_objects if b.true_win_rate == 0.50]
    sample_rec = np.random.choice(recreational, size=min(50, len(recreational)), replace=False)
    for b in sample_rec:
        normalized = [br / b.starting_bankroll for br in b.weekly_bankrolls]
        axes[0, 1].plot(range(len(normalized)), normalized, alpha=0.2, color="#ef4444", linewidth=0.8)

    # Add mean trajectory
    max_weeks = max(len(b.weekly_bankrolls) for b in recreational)  # week 0 through the final week
    mean_trajectory = []
    for week in range(max_weeks):
        week_vals = []
        for b in recreational:
            if week < len(b.weekly_bankrolls):
                week_vals.append(b.weekly_bankrolls[week] / b.starting_bankroll)
        mean_trajectory.append(np.mean(week_vals))
    axes[0, 1].plot(range(max_weeks), mean_trajectory, color="black", linewidth=3, label="Mean (50% bettors)")
    axes[0, 1].axhline(1.0, color="gray", linewidth=1, linestyle="--")
    axes[0, 1].set_xlabel("Week of Season")
    axes[0, 1].set_ylabel("Bankroll (Fraction of Starting)")
    axes[0, 1].set_title("Bankroll Trajectories: 50% Win Rate Bettors (n=50 sample)")
    axes[0, 1].legend()

    # Panel 3: Probability of profit by win rate
    profit_rates = []
    for wr in np.arange(0.48, 0.57, 0.005):
        # Quick simulation: 500 bettors at each win rate
        wins = 0
        trials = 500
        for _ in range(trials):
            num_bets = 18 * 8  # 8 bets per week, 18 weeks
            outcomes = np.random.random(num_bets) < wr
            pnl = outcomes.sum() * (100 / 110) - (~outcomes).sum()
            if pnl > 0:
                wins += 1
        profit_rates.append({"win_rate": wr, "profit_probability": wins / trials})

    profit_df = pd.DataFrame(profit_rates)
    axes[1, 0].plot(
        profit_df["win_rate"] * 100, profit_df["profit_probability"] * 100,
        linewidth=2, color="#2563eb", marker="o", markersize=4
    )
    axes[1, 0].axvline(52.38, color="red", linewidth=2, linestyle="--", label="Breakeven (52.4%)")
    axes[1, 0].axhline(50, color="gray", linewidth=1, linestyle=":", alpha=0.5)
    axes[1, 0].set_xlabel("True Win Rate (%)")
    axes[1, 0].set_ylabel("Probability of Profitable Season (%)")
    axes[1, 0].set_title("Probability of Season Profit vs. Win Rate (at -110)")
    axes[1, 0].legend()
    axes[1, 0].grid(True, alpha=0.3)

    # Panel 4: Expected loss per $100 wagered at different vig levels
    juice_levels = np.arange(-105, -126, -1)
    expected_losses = []
    for juice in juice_levels:
        # At a 50% win rate, EV per dollar risked = 0.5 * (100 / |juice|) - 0.5
        ev_per_dollar = 0.5 * (100 / abs(juice)) - 0.5
        expected_losses.append({
            "juice": juice,
            "ev_per_dollar": ev_per_dollar,
            "loss_per_100": -ev_per_dollar * 100,
        })

    ev_df = pd.DataFrame(expected_losses)
    axes[1, 1].bar(
        ev_df["juice"].astype(str), ev_df["loss_per_100"],
        color="#ef4444", edgecolor="black", linewidth=0.5
    )
    axes[1, 1].set_xlabel("Juice (Odds)")
    axes[1, 1].set_ylabel("Expected Loss per $100 Wagered ($)")
    axes[1, 1].set_title("The Cost of the Vig: Expected Loss at 50% Win Rate")
    axes[1, 1].tick_params(axis="x", rotation=45)

    plt.tight_layout()
    plt.savefig("season_simulation_results.png", dpi=150, bbox_inches="tight")
    plt.show()


plot_season_results(season_results, bettor_objects)

Results Summary

Key Finding 1: The Vig Is a Small Per-Bet Tax with Devastating Cumulative Effects

At standard -110 odds, a 50% bettor loses $4.55 for every $100 wagered. This sounds almost negligible on any single bet. But a typical recreational bettor placing 8 bets per week over an 18-week NFL season makes 144 total wagers. If each bet is $100, that is $14,400 in total handle, and the expected loss is approximately $655 — a 6.5% loss on starting bankroll for a bettor who began with $10,000. The simulation confirmed this: recreational bettors at exactly 50% win rate showed a mean ROI of approximately -4.5% on dollars wagered, with substantial variance creating the illusion that some strategies "work" over a single season.
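
The arithmetic behind these figures can be reproduced in a few lines. This is a minimal sketch, assuming flat $100 stakes and negative American odds only; the helper name ev_per_dollar is illustrative and is not part of the simulation code above.

```python
def ev_per_dollar(win_prob: float, american_odds: int = -110) -> float:
    """Expected profit per $1 risked at negative American odds (e.g. -110)."""
    payout = 100 / abs(american_odds)  # profit per $1 risked when the bet wins
    return win_prob * payout - (1 - win_prob)


# Figures taken from the text: 8 bets per week, 18 weeks, $100 per bet
bets_per_week, weeks, stake = 8, 18, 100
handle = bets_per_week * weeks * stake
expected_loss = -ev_per_dollar(0.50) * handle

print(f"Loss per $100 wagered: ${-ev_per_dollar(0.50) * 100:.2f}")
print(f"Season handle: ${handle:,}")
print(f"Expected season loss: ${expected_loss:,.2f}")
```

Changing win_prob or american_odds shows how quickly the expected loss moves with either input.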

Key Finding 2: The Breakeven Win Rate Is 52.4%, and Very Few Bettors Achieve It Consistently

At -110 odds, a bettor must win 52.38% of spread bets to break even (110 / 210). Our simulation showed that even bettors with a true 52% win rate, a level of skill that would rank them among the most capable handicappers in the market, had only about a 40-45% probability of showing a profit in any given season. Note that 52% is still below breakeven, so this bettor's expected profit is slightly negative. Only the 53% and 55% skill tiers showed majority profitable outcomes. This demonstrates that even genuine skill can be obscured by variance over sample sizes as small as one NFL season (144 bets).
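
The breakeven rate and the chance of a profitable season can also be computed exactly rather than simulated. The sketch below assumes 144 independent flat bets at -110, matching the simplified model in Panel 3; the function names are illustrative. Because it is an exact binomial calculation, its results may differ by a few points from the Monte Carlo estimates quoted above.

```python
from math import comb


def breakeven_win_rate(american_odds: int = -110) -> float:
    """Win rate at which expected profit is zero for negative American odds."""
    risk = abs(american_odds)
    return risk / (risk + 100)


def prob_profitable_season(win_prob: float, n_bets: int = 144,
                           american_odds: int = -110) -> float:
    """Exact binomial probability that a season of flat bets ends with profit > 0."""
    payout = 100 / abs(american_odds)  # profit per unit risked on a win
    total = 0.0
    for wins in range(n_bets + 1):
        pnl = wins * payout - (n_bets - wins)
        if pnl > 0:
            total += comb(n_bets, wins) * win_prob**wins * (1 - win_prob)**(n_bets - wins)
    return total


print(f"Breakeven at -110: {breakeven_win_rate(-110):.2%}")
for wr in (0.50, 0.52, 0.53, 0.55):
    print(f"True win rate {wr:.0%}: P(profitable season) = {prob_profitable_season(wr):.1%}")
```

Passing other prices to breakeven_win_rate (for example -105 or -120) shows how the hurdle shifts with the juice.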

Key Finding 3: Sportsbook Hold Varies Significantly on Any Given Sunday

While the theoretical hold at -110/-110 is 4.55%, our simulation of 500 NFL Sundays showed actual hold percentages ranging from roughly -5% (a losing Sunday for the book) to +15% in extreme cases. The standard deviation of actual hold was approximately 3-4 percentage points. This means sportsbooks must manage significant short-term volatility despite having a long-term mathematical edge. The probability of the book losing money on any single Sunday at -110 juice was approximately 15-20%.

Key Finding 4: Parlay Vig Is Dramatically Higher Than Spread Vig

If a parlay pays the product of its leg prices, the vig on a 2-leg parlay at -110 per leg is approximately 8.9%, nearly double the single-bet vig of 4.55%. For a 5-leg parlay, the effective vig exceeds 20%. For an 8-leg parlay, the type of "lottery ticket" bet that sportsbook marketing departments aggressively promote, the effective vig is roughly 31%. This explains why parlays are enormously profitable for sportsbooks and why same-game parlays have become a cornerstone of operator revenue strategy.
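
The compounding can be checked directly. This sketch assumes each leg is a true 50/50 proposition, that legs are independent, and that the parlay pays the product of the individual leg prices; books that use fixed parlay payout tables will produce somewhat different figures. The function name is illustrative.

```python
def parlay_house_edge(n_legs: int, american_odds: int = -110,
                      leg_win_prob: float = 0.5) -> float:
    """House edge on an n-leg parlay that pays the product of its leg prices."""
    decimal_odds = 1 + 100 / abs(american_odds)  # total return per $1 staked on one leg
    expected_return = (leg_win_prob * decimal_odds) ** n_legs
    return 1 - expected_return


for legs in (1, 2, 5, 8):
    print(f"{legs}-leg: house edge = {parlay_house_edge(legs):.1%}")
```

Each added leg multiplies the bettor's expected return by the same factor (21/22 at -110), which is why the edge grows so quickly with parlay size.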

Key Finding 5: Public Bias Creates Predictable Action Imbalances

Our simulation modeled the well-documented tendency for public bettors to favor home teams and favorites. This imbalance means the sportsbook does not always achieve balanced action. At -110 on both sides, the book loses money on a game whenever the winning side holds more than about 52.4% of the handle, so a public-side win on a lopsided game is a losing result for the book. On games where the public side loses, the book wins more than the theoretical vig. Over large sample sizes, these fluctuations average out, but they create meaningful short-term risk that books must manage through position limits, line movement, and hedging with other sportsbooks.
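
The effect of an imbalance on a single game can be sketched as follows. The function and the 60/40 split are illustrative assumptions, not values taken from the simulation; both sides are assumed to be priced at -110 and handle is measured as dollars risked.

```python
def book_result(handle_public: float, handle_other: float,
                public_side_wins: bool, american_odds: int = -110) -> float:
    """Book's net result on one game: losing stakes kept minus winnings paid out."""
    payout = 100 / abs(american_odds)  # profit paid per $1 risked by a winner
    if public_side_wins:
        return handle_other - handle_public * payout
    return handle_public - handle_other * payout


# Hypothetical game: $100,000 total handle split 60/40 toward the public side
for public_wins in (True, False):
    result = book_result(60_000, 40_000, public_wins)
    print(f"Public side wins={public_wins}: book nets ${result:,.2f}")

# Balanced action earns the same amount whichever side wins
print(f"Balanced 50/50: book nets ${book_result(50_000, 50_000, True):,.2f}")
```

With a 60/40 split the book's result swings from a loss to a gain well above the theoretical vig depending on which side covers, which is the short-term risk described above.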


Limitations

  1. Simplified odds model: Real sportsbooks offer a far more complex menu of bets (player props, live betting, alternative lines, teasers, round robins) with varying vig levels. Our simulation focused exclusively on standard spread bets at -110.

  2. Static win rates: We assumed each bettor has a fixed true win rate throughout the season. In reality, bettor skill varies by sport, bet type, and even by week depending on information advantages.

  3. No line shopping: Sophisticated bettors shop for the best odds across multiple sportsbooks, effectively reducing the vig they pay. Our simulation assumed all bettors accept -110 on every bet.

  4. No bankroll management dynamics: Our bettors used flat bet sizing. In reality, many bettors increase bet sizes after wins (or losses), which can dramatically alter outcome distributions.

  5. Independence assumption: We treated each bet outcome as independent, which is approximately true for spread bets on different games but would not hold for correlated bets (same-game parlays, related props).

  6. No promotional offers: Modern sportsbooks offer significant promotions (free bets, deposit bonuses, boosted odds) that reduce the effective vig for bettors, particularly new customers. These promotions are a major cost for operators and a meaningful source of value for savvy bettors.


Discussion Questions

  1. Breakeven analysis: At -110 odds, the breakeven win rate is 52.38%. What would the breakeven win rate be at -105 (reduced juice) and at -120 (high-vig prop market)? How does this difference affect a bettor's long-term expected profit or loss? Calculate the exact figures.

  2. Sportsbook risk management: If a sportsbook has taken $1.5 million on the Chiefs -3.5 at -110 and only $800,000 on the Bills +3.5 at -110, what is the book's net profit or loss on each outcome? What strategies could the book use to reduce this exposure? At what point might the book move the line?

  3. The parlay paradox: Sportsbooks spend enormous marketing budgets promoting parlays, especially same-game parlays. Why is this rational from the book's perspective? If parlays are so favorable for the house, why do bettors continue to place them in large volumes? Discuss the behavioral economics at play.

  4. Variance and sample size: Under the flat-bet model used here, even a 53% bettor, someone with a genuine, sustained edge, has roughly a 45% chance of losing money in a single NFL season of 144 bets. How many bets would a 53% bettor need to place to be 95% confident of showing a lifetime profit? Use the binomial distribution to calculate this.

  5. Reduced juice economics: Some sportsbooks (notably Circa Sports and certain offshore books) offer reduced juice (-105) on spreads and totals. What is the theoretical hold at -105/-105? Why might a sportsbook choose to offer reduced juice, and what must be true about their cost structure for this to be viable?

  6. The sharp vs. square dynamic: Sportsbooks famously treat sharp (professional) and square (recreational) bettors differently, sometimes limiting or banning sharp bettors. Using the concepts from this case study, explain why a sportsbook would reject profitable customers. Is this practice good or bad for market efficiency?


Your Turn: Mini-Project

Project: Build a Complete Sportsbook Simulator

Using the code framework from this case study, extend the simulation to build a more realistic sportsbook model:

  1. Add moneyline markets: Extend the simulation so that each game has both a spread market (-110/-110) and a moneyline market (with realistic odds derived from the spread). Calculate the sportsbook's combined hold across both market types for each game.

  2. Implement dynamic line movement: Modify simulate_betting_action() so that the line moves in response to one-sided action. Define a threshold (e.g., if more than 65% of dollars are on one side, the line moves 0.5 points). Track how line movement affects the final hold percentage.

  3. Model line shopping: Create a version of the bettor simulation where 10% of bettors are "line shoppers" who get -108 instead of -110 on every bet (representing the average improvement from shopping three books). Compare their season outcomes against the non-shopping population.

  4. Analyze the "survivorship bias" problem: After your season simulation, identify the top 5% of bettors by ROI. What were their true win rates? How many of them were genuinely skilled (53%+ true win rate) versus lucky recreational bettors (50% true win rate)? This analysis demonstrates why it is difficult to distinguish skill from luck in sports betting. Create a confusion matrix showing the classification accuracy if you labeled anyone with a positive season ROI as "skilled."

  5. Visualization deliverable: Create a single dashboard with six panels: (a) vig comparison across bet types, (b) a single Sunday's game-by-game book P&L, (c) hold distribution across 500 Sundays, (d) season ROI by skill tier, (e) bankroll trajectory spaghetti plot, and (f) the survivorship bias confusion matrix from task 4.

Stretch goal: Add a live in-game betting module. Model a single NFL game divided into 10 "betting windows" (representing key moments like end of quarters, scoring drives, turnovers). At each window, simulate a new batch of bets at updated odds. Calculate the book's total hold from pre-game plus in-game wagering, and compare it to the pre-game-only hold. In-game betting is often reported to carry a higher hold for the sportsbook than pre-game betting. Does your simulation confirm this, and if so, why?
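
As one possible starting point for task 4, the confusion matrix can be built with a pandas cross-tabulation. The sketch below generates a small stand-in DataFrame with the same true_win_rate and roi_pct columns that season_results uses in the plotting code; the 80/20 mix of 50% and 53% bettors is purely illustrative and is not the population used in the case study. In your own project, replace the stand-in with your actual season_results.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): 1,000 bettors, 144 flat bets each at -110
rng = np.random.default_rng(42)
n_bets = 144
true_win_rate = rng.choice([0.50, 0.53], size=1000, p=[0.8, 0.2])
wins = rng.binomial(n_bets, true_win_rate)
pnl_units = wins * (100 / 110) - (n_bets - wins)  # profit in units of one stake
demo = pd.DataFrame({
    "true_win_rate": true_win_rate,
    "roi_pct": 100 * pnl_units / n_bets,
})

# Ground truth vs. the naive rule "positive season ROI means skilled"
demo["actually_skilled"] = demo["true_win_rate"] >= 0.53
demo["labeled_skilled"] = demo["roi_pct"] > 0

confusion = pd.crosstab(demo["actually_skilled"], demo["labeled_skilled"])
accuracy = (demo["actually_skilled"] == demo["labeled_skilled"]).mean()

print(confusion)
print(f"Classification accuracy of the naive rule: {accuracy:.1%}")
```

Rows are the true skill label and columns are the label assigned by the naive rule, so the off-diagonal cells count lucky recreational bettors and unlucky skilled ones.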