Learning Objectives

  • Understand and apply the foundational axioms and rules of probability to sporting events
  • Convert fluently between American, decimal, fractional, Hong Kong, and Malay odds formats
  • Extract implied probabilities from betting odds and interpret the overround
  • Calculate and compare sportsbook margins to evaluate market quality
  • Estimate true probabilities from historical data and assess calibration of your own predictions

Chapter 2: Probability and Odds

"The theory of probabilities is at bottom nothing but common sense reduced to calculus; it enables us to appreciate with exactness that which accurate minds feel with a sort of instinct for which they are frequently unable to account." --- Pierre-Simon Laplace, Theorie Analytique des Probabilites (1812)

Chapter Overview

In Chapter 1, we surveyed the landscape of sports betting --- its history, its mechanics, and the ecosystem of participants who make markets function. We established that sports betting is not a game of luck but a game of skill conducted under uncertainty. The single most important skill in that game is the subject of this chapter: the ability to reason rigorously about probability and to translate that reasoning into the language of odds.

Every decision you will ever make as a sports bettor reduces, in the final analysis, to a comparison between two numbers: the probability you believe an event will occur, and the probability implied by the odds a sportsbook is offering. If your estimate is more accurate than the market's, you have an edge. If it is not, you do not. Everything else --- bankroll management, line shopping, hedging, modeling --- is built on top of this foundation.

This chapter will take you from the formal axioms of probability through every major odds format used around the world, into the mechanics of how sportsbooks build their margins, and finally into the practical craft of estimating probabilities yourself. By the end, you will be able to:

  1. Apply probability rules --- addition, multiplication, conditional --- to real sporting scenarios.
  2. Convert between American, decimal, fractional, Hong Kong, and Malay odds without hesitation.
  3. Extract the implied probability from any set of odds and identify the sportsbook's margin.
  4. Evaluate sportsbook quality by comparing overround across markets.
  5. Build and calibrate your own probability estimates from historical data.

We will work through detailed examples with real-world numbers, derive every formula from first principles, and implement everything in Python so you can apply these ideas immediately. Let us begin.


2.1 Foundations of Probability

Before we can understand odds, we must understand what odds represent: probability. Probability is the mathematical framework for quantifying uncertainty, and it is the bedrock on which all of sports betting rests.

Sample Spaces and Events

A sample space (denoted $\Omega$) is the set of all possible outcomes of an experiment or observation. An event is any subset of the sample space.

Consider a simple sporting example: an NFL game between the Kansas City Chiefs and the Buffalo Bills. If we are only interested in the final result and set ties aside for the moment (regular-season NFL games can end in a tie, but playoff games cannot), the sample space is:

$$\Omega = \{\text{Chiefs win}, \text{Bills win}\}$$

If we include the possibility of a tie in a regular-season game:

$$\Omega = \{\text{Chiefs win}, \text{Bills win}, \text{Tie}\}$$

For a more granular sample space, we might consider the margin of victory:

$$\Omega = \{..., \text{Chiefs by 3}, \text{Chiefs by 2}, \text{Chiefs by 1}, \text{Tie}, \text{Bills by 1}, \text{Bills by 2}, \text{Bills by 3}, ...\}$$

The event "Chiefs win" is then the subset of all outcomes where the Chiefs' margin is positive. The event "Bills cover a +3.5 spread" is the subset of all outcomes where the Bills win outright, tie, or lose by 3 or fewer points.
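To make the idea of events as subsets concrete, the following minimal sketch represents the margin-based sample space as a set of integers (the Chiefs' final margin, negative when the Bills win). The 60-point cap and the variable names are assumptions made purely for illustration.

```python
# Sample space: Chiefs' final margin (positive = Chiefs win, 0 = tie).
# MAX_MARGIN is an illustrative cap, not a rule of the game.
MAX_MARGIN = 60
sample_space = set(range(-MAX_MARGIN, MAX_MARGIN + 1))

# Events are subsets of the sample space.
chiefs_win = {m for m in sample_space if m > 0}
tie = {m for m in sample_space if m == 0}
# Bills +3.5 covers when the Chiefs' margin is 3 or less
# (Bills win, tie, or lose by 1-3).
bills_cover_3_5 = {m for m in sample_space if m < 3.5}

# Set operations answer compound questions, such as
# "Chiefs win the game but fail to cover".
chiefs_win_but_bills_cover = chiefs_win & bills_cover_3_5
print(sorted(chiefs_win_but_bills_cover))  # [1, 2, 3]
```

Assigning a probability to each margin would turn this set into a full probability model; the set structure alone already shows which questions the sample space can answer.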

Key Insight: The way you define your sample space determines what questions you can answer. A bettor who thinks only in terms of "win or lose" has a coarser view of the world than one who thinks in terms of score distributions. As we will see in later chapters, the most successful bettors build detailed models of the full sample space --- not just the binary outcome.

The Axioms of Probability (Kolmogorov)

In 1933, the Russian mathematician Andrey Kolmogorov formalized probability theory with three axioms. Every rule we use in sports betting derives from these:

Axiom 1 (Non-negativity): For any event $A$, the probability of $A$ is non-negative:

$$P(A) \geq 0$$

Axiom 2 (Normalization): The probability of the entire sample space is 1:

$$P(\Omega) = 1$$

Axiom 3 (Countable Additivity): For any countable sequence of mutually exclusive events $A_1, A_2, A_3, \ldots$:

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

From these three axioms, everything follows. The probability of an event not occurring (its complement) is:

$$P(A^c) = 1 - P(A)$$

If the Chiefs have a 58% chance of winning, the probability of them not winning is 42%.
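As a sketch of how such rules follow from the axioms, the complement rule can be derived in two lines. An event and its complement are mutually exclusive and together make up the whole sample space, so additivity (Axiom 3, applied to two events) and normalization (Axiom 2) give:

$$A \cup A^c = \Omega, \qquad A \cap A^c = \emptyset$$

$$P(A) + P(A^c) = P(A \cup A^c) = P(\Omega) = 1 \quad \Longrightarrow \quad P(A^c) = 1 - P(A)$$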

The Addition Rule

For any two events $A$ and $B$:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

If the events are mutually exclusive (they cannot both occur), the intersection term is zero:

$$P(A \cup B) = P(A) + P(B) \quad \text{(if } A \text{ and } B \text{ are mutually exclusive)}$$

Worked Example 2.1: Applying the Addition Rule

Suppose in an NBA game, you estimate the following probabilities:

  • $P(\text{Team A wins by 1--5 points}) = 0.18$
  • $P(\text{Team A wins by 6--10 points}) = 0.15$
  • $P(\text{Team A wins by 11+ points}) = 0.12$

What is the probability that Team A wins the game?

These events are mutually exclusive (Team A cannot simultaneously win by 1--5 and by 6--10 points), so:

$$P(\text{Team A wins}) = 0.18 + 0.15 + 0.12 = 0.45$$

Now suppose you also want to know: what is the probability that Team A wins OR the total score goes over 210.5? These events are not mutually exclusive --- Team A can win in a high-scoring game. You would need:

$$P(\text{A wins} \cup \text{Over 210.5}) = P(\text{A wins}) + P(\text{Over 210.5}) - P(\text{A wins} \cap \text{Over 210.5})$$

This overlap is precisely why parlaying a side and a total from the same game requires careful thought about correlation, a topic we will return to in Chapter 10.
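As a numerical illustration of the general addition rule, the sketch below combines Team A's 0.45 win probability from the worked example with two made-up inputs: a 0.50 probability of the over and a 0.24 probability that both occur. Those two figures are assumptions chosen only to show the arithmetic.

```python
def prob_union(p_a: float, p_b: float, p_both: float) -> float:
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    if p_both > min(p_a, p_b):
        raise ValueError("P(A and B) cannot exceed P(A) or P(B).")
    return p_a + p_b - p_both


p_a_wins = 0.45   # from Worked Example 2.1
p_over = 0.50     # hypothetical
p_both = 0.24     # hypothetical joint probability

p_either = prob_union(p_a_wins, p_over, p_both)
print(f"P(A wins or Over 210.5) = {p_either:.2f}")  # 0.71
```

Without the subtraction, the outcomes in which Team A wins a high-scoring game would be counted twice.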

The Multiplication Rule and Conditional Probability

The conditional probability of event $A$ given that event $B$ has occurred is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) > 0$$

Rearranging gives us the multiplication rule:

$$P(A \cap B) = P(A \mid B) \cdot P(B)$$

This is one of the most important formulas in sports betting. Consider a live-betting scenario:

Worked Example 2.2: Conditional Probability in Live Betting

Historical data from 5 NFL seasons (2019--2023) shows the following:

  • Teams that lead at halftime win the game approximately 78% of the time.
  • The Kansas City Chiefs lead at halftime in approximately 62% of their games.

If we assume the league-wide 78% figure also applies to the Chiefs, the probability that the Chiefs both lead at halftime and win the game is:

$$P(\text{Lead at half} \cap \text{Win}) = P(\text{Win} \mid \text{Lead at half}) \times P(\text{Lead at half}) = 0.78 \times 0.62 = 0.4836$$

But conditional probability is even more powerful when you have game-specific information. Suppose the Chiefs trail by 10 or more at halftime. Historical data tells us that teams in that position win only about 12% of the time, but the Chiefs under Patrick Mahomes have won approximately 22% of such games. If you are betting the live moneyline, the probability that matters is the team-specific $P(\text{Win} \mid \text{Down 10+ at half, Mahomes})$, not the generic $P(\text{Win} \mid \text{Down 10+ at half})$.

Practical Note: Conditional probability is the mathematical engine behind live betting. Every time the game state changes --- a goal, a turnover, a red card --- the conditional probability of each outcome shifts. The bettor who updates these probabilities faster and more accurately than the sportsbook has an edge.
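In practice, conditional probabilities like these are estimated by filtering historical records on the conditioning event and counting. The sketch below does this on a tiny, invented list of games; the `Game` record and its fields are hypothetical, and a real estimate would need far more data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Game:
    halftime_margin: int  # positive = team led at the half
    won: bool


def conditional_win_rate(games: List[Game], max_margin: int) -> Optional[float]:
    """Estimate P(Win | halftime margin <= max_margin) by counting."""
    subset = [g for g in games if g.halftime_margin <= max_margin]
    if not subset:
        return None  # conditioning event never observed
    return sum(g.won for g in subset) / len(subset)


# Invented records for illustration only.
games = [
    Game(-14, False), Game(-10, True), Game(-11, False), Game(-17, False),
    Game(7, True), Game(3, True), Game(-3, False), Game(10, True),
]
rate = conditional_win_rate(games, max_margin=-10)
print(f"P(Win | down 10+ at half) = {rate:.2f}")  # 0.25
```

The count in the denominator is the number of games in which the condition held, not the total number of games, which is exactly the division by $P(B)$ in the definition.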

Independence and Dependence

Two events $A$ and $B$ are independent if the occurrence of one does not affect the probability of the other:

$$P(A \cap B) = P(A) \cdot P(B) \quad \text{(if independent)}$$

Equivalently, $P(A \mid B) = P(A)$.

In sports betting, true independence is rarer than most people assume. Consider:

  • Are the outcomes of consecutive NBA games for the same team independent? Not entirely --- fatigue, travel, injuries carry over.
  • Are the results of two games being played simultaneously in different cities by different teams independent? Much closer to independent, though weather, referee assignments, and public betting patterns can create subtle correlations.
  • Are the moneyline and the total in the same game independent? Absolutely not --- game flow, pace, and blowout dynamics create strong dependence.

Assuming independence when it does not exist is one of the most common errors in sports betting, particularly in parlay construction. We will analyze this in depth in Chapter 10.
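One way to see the cost of a false independence assumption is to compare the product $P(A) \cdot P(B)$ with the actual joint probability. The sketch below uses hypothetical figures for a side and a total in the same game; all three numbers are assumptions, not estimates from data.

```python
import math


def is_independent(p_a: float, p_b: float, p_both: float,
                   tol: float = 1e-9) -> bool:
    """Events are independent exactly when P(A and B) = P(A) * P(B)."""
    return math.isclose(p_both, p_a * p_b, abs_tol=tol)


p_side, p_over = 0.45, 0.50   # hypothetical marginal probabilities
p_joint = 0.24                # hypothetical true joint probability

naive_parlay = p_side * p_over  # what the independence assumption predicts
print(f"Naive parlay probability: {naive_parlay:.3f}")   # 0.225
print(f"Actual joint probability: {p_joint:.3f}")        # 0.240
print(f"Independent? {is_independent(p_side, p_over, p_joint)}")  # False
```

Here the two legs are positively dependent, so the product understates the chance that both hit. With negative dependence the error runs the other way.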

Building Probability Intuition with Simulation

One of the most effective ways to internalize probability is through simulation. The following Python code demonstrates the law of large numbers --- the principle that observed frequencies converge to true probabilities as sample sizes increase.

"""
Simulation demonstrating the law of large numbers.
As the number of trials increases, the observed frequency
converges to the true probability.
"""

import random
from typing import List, Tuple


def simulate_coin_flips(
    n_flips: int,
    true_probability: float = 0.5,
    seed: int = 42
) -> Tuple[List[int], List[float]]:
    """
    Simulate n coin flips and track the running proportion of heads.

    Args:
        n_flips: Number of flips to simulate.
        true_probability: The true probability of heads (default 0.5 for fair coin).
        seed: Random seed for reproducibility.

    Returns:
        A tuple of (cumulative_heads, running_proportions).
    """
    random.seed(seed)
    cumulative_heads: List[int] = []
    running_proportions: List[float] = []
    heads_count = 0

    for i in range(1, n_flips + 1):
        if random.random() < true_probability:
            heads_count += 1
        cumulative_heads.append(heads_count)
        running_proportions.append(heads_count / i)

    return cumulative_heads, running_proportions


def simulate_home_wins(
    n_games: int,
    home_win_probability: float = 0.574,
    seed: int = 42
) -> float:
    """
    Simulate a season of games and return the observed home win rate.

    The default probability of 0.574 reflects the historical NFL home-field
    advantage from 2000-2023.

    Args:
        n_games: Number of games to simulate.
        home_win_probability: True probability of a home win.
        seed: Random seed for reproducibility.

    Returns:
        The observed proportion of home wins.
    """
    random.seed(seed)
    home_wins = sum(
        1 for _ in range(n_games)
        if random.random() < home_win_probability
    )
    return home_wins / n_games


# Demonstration
if __name__ == "__main__":
    # Show convergence with increasing sample size
    for n in [10, 100, 1_000, 10_000, 100_000]:
        _, proportions = simulate_coin_flips(n)
        print(f"After {n:>7,} flips: observed proportion = {proportions[-1]:.4f}")

    print()

    # Simulate NFL home-field advantage
    for n in [16, 256, 2560, 25600]:
        rate = simulate_home_wins(n)
        print(f"After {n:>6,} games: observed home win rate = {rate:.4f} "
              f"(true = 0.5740)")

The key takeaway: with small samples, observed frequencies can deviate wildly from true probabilities. With 16 games (roughly one full week of NFL games), you might observe a home win rate anywhere from 40% to 75% even if the true rate is 57.4%. This is why we need large datasets to estimate probabilities reliably, a theme we will return to in Section 2.5 and throughout the book.
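The spread of outcomes in a small sample can also be computed exactly rather than simulated. Assuming each game is an independent trial with the same 0.574 home win probability (a simplification), the number of home wins follows a binomial distribution, and the sketch below sums its probabilities over a range of win counts. The function names are illustrative.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_wins_between(lo: int, hi: int, n: int, p: float) -> float:
    """Probability that the number of successes lies in [lo, hi]."""
    return sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))


n_games, p_home = 16, 0.574
# 7 to 12 home wins out of 16 corresponds to a 43.75%-75% win rate.
inside = prob_wins_between(7, 12, n_games, p_home)
print(f"P(7 <= home wins <= 12) = {inside:.3f}")
print(f"P(outside that range)   = {1 - inside:.3f}")
```

Rerunning with a larger `n_games` and a proportionally scaled range shows how quickly the band around 57.4% tightens as the sample grows.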


2.2 Odds Formats and Conversions

Odds are the language sportsbooks use to communicate probability and payouts. Different regions of the world have adopted different formats, but they all encode the same underlying information. A professional bettor must be fluent in all of them.

American Odds (Moneyline Odds)

American odds are the standard in the United States. They come in two forms:

Positive American odds (e.g., +150) tell you how much profit you would make on a \$100 stake. Odds of +150 mean a \$100 bet returns \$150 in profit plus your \$100 stake, for a total payout of \$250.

Negative American odds (e.g., -110) tell you how much you must stake to make \$100 in profit. Odds of -110 mean you must bet \$110 to win \$100 in profit, for a total payout of \$210.

The dividing line between the two forms is +100 / -100, which represents even money (a 50% implied probability). In practice, you will rarely see both sides of a market priced at these exact numbers because of the sportsbook's margin.

General payout formulas for American odds:

For positive odds (+$X$):

$$\text{Profit} = \text{Stake} \times \frac{X}{100}$$

$$\text{Total Payout} = \text{Stake} \times \left(1 + \frac{X}{100}\right)$$

For negative odds (-$X$):

$$\text{Profit} = \text{Stake} \times \frac{100}{X}$$

$$\text{Total Payout} = \text{Stake} \times \left(1 + \frac{100}{X}\right)$$

Why -110 on Both Sides? The most common odds you will encounter in spread and totals betting are -110 on both sides. This is the sportsbook's standard vigorish. If the book takes one \$110 bet on each side, it collects \$220 in total action and pays \$210 to the winner (the \$110 stake plus \$100 profit). The remaining \$10 is the book's profit: a margin of $10 / 220 \approx 4.55\%$.
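The payout formulas and the -110 margin arithmetic translate directly into code. The function name below is illustrative, and the margin calculation assumes exactly one \$110 bet on each side.

```python
def american_profit(stake: float, american: int) -> float:
    """Profit on a winning bet at American odds (stake not included)."""
    if -100 < american < 100:
        raise ValueError("American odds must be <= -100 or >= +100.")
    if american > 0:
        return stake * american / 100
    return stake * 100 / abs(american)


print(american_profit(100, 150))    # 150.0
print(american_profit(110, -110))   # 100.0

# Book margin with one $110 bet on each side at -110.
collected = 110 + 110
paid_out = 110 + american_profit(110, -110)  # winner's stake plus profit
margin = (collected - paid_out) / collected
print(f"{margin:.4f}")              # 0.0455
```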

Decimal Odds

Decimal odds are the global standard outside North America. They represent the total payout (stake plus profit) per unit staked.

Decimal odds of 2.50 mean that for every \$1 you stake, you receive \$2.50 back if you win --- \$1.50 in profit plus your \$1 stake.

$$\text{Profit} = \text{Stake} \times (\text{Decimal Odds} - 1)$$

$$\text{Total Payout} = \text{Stake} \times \text{Decimal Odds}$$

Decimal odds are arguably the most intuitive format because the math is simple multiplication. They also make comparing odds across different markets trivially easy --- higher decimal odds always mean a bigger payout.

| Decimal Odds | Profit on \$100 Stake | Total Payout | Rough Interpretation |
|---|---|---|---|
| 1.50 | \$50 | \$150 | Strong favorite |
| 1.91 | \$91 | \$191 | Standard -110 equivalent |
| 2.00 | \$100 | \$200 | Even money |
| 2.50 | \$150 | \$250 | Moderate underdog |
| 5.00 | \$400 | \$500 | Significant underdog |
| 10.00 | \$900 | \$1,000 | Long shot |
| 51.00 | \$5,000 | \$5,100 | Extreme long shot |

Fractional Odds

Fractional odds, traditional in the United Kingdom and Ireland, express the ratio of profit to stake. Odds of 3/2 (read "three to two") mean you win \$3 in profit for every \$2 staked.

$$\text{Profit} = \text{Stake} \times \frac{\text{Numerator}}{\text{Denominator}}$$

$$\text{Total Payout} = \text{Stake} \times \left(1 + \frac{\text{Numerator}}{\text{Denominator}}\right)$$

Common fractional odds and their equivalents:

| Fractional | Decimal | American | Implied Probability |
|---|---|---|---|
| 1/10 | 1.10 | -1000 | 90.91% |
| 1/4 | 1.25 | -400 | 80.00% |
| 1/2 | 1.50 | -200 | 66.67% |
| 4/6 | 1.667 | -150 | 60.00% |
| 10/11 | 1.909 | -110 | 52.38% |
| 1/1 (Evens) | 2.00 | +100 | 50.00% |
| 6/4 | 2.50 | +150 | 40.00% |
| 2/1 | 3.00 | +200 | 33.33% |
| 5/1 | 6.00 | +500 | 16.67% |
| 10/1 | 11.00 | +1000 | 9.09% |
| 50/1 | 51.00 | +5000 | 1.96% |

Hong Kong Odds

Hong Kong odds express the profit per unit staked --- they are simply decimal odds minus 1. If the Hong Kong odds are 1.50, you win \$1.50 profit on a \$1 stake (equivalent to decimal odds of 2.50).

$$\text{HK Odds} = \text{Decimal Odds} - 1$$

$$\text{Profit} = \text{Stake} \times \text{HK Odds}$$

Malay Odds

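Malay odds use a signed system similar to American odds but normalized to a 1-unit stake. Note that the signs run the opposite way from American odds: positive Malay odds denote prices at even money or shorter (decimal 2.00 or below), while negative Malay odds denote underdogs (decimal above 2.00). They come in positive and negative forms: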

Positive Malay odds (0 to 1): These represent the profit on a 1-unit stake. Malay odds of 0.80 mean you win 0.80 for every 1 unit staked.

Negative Malay odds (-1 to 0): These represent how much you must risk to win 1 unit. Malay odds of -0.80 mean you risk 0.80 to win a maximum of 1 unit.

$$\text{If Malay} > 0: \text{Decimal Odds} = 1 + \text{Malay Odds}$$

$$\text{If Malay} < 0: \text{Decimal Odds} = 1 + \frac{1}{|\text{Malay Odds}|}$$

Master Conversion Formulas

The following table provides the complete set of conversion formulas. Decimal odds serve as the natural "hub" for conversions because the arithmetic is simplest.

Converting TO Decimal Odds:

| From Format | Formula |
|---|---|
| American (positive) | $d = 1 + \frac{A}{100}$ |
| American (negative) | $d = 1 + \frac{100}{\lvert A \rvert}$ |
| Fractional ($p/q$) | $d = 1 + \frac{p}{q}$ |
| Hong Kong | $d = 1 + h$ |
| Malay (positive) | $d = 1 + m$ |
| Malay (negative) | $d = 1 + \frac{1}{\lvert m \rvert}$ |

Converting FROM Decimal Odds:

| To Format | Formula |
|---|---|
| American ($d \geq 2$) | $A = (d - 1) \times 100$ |
| American ($d < 2$) | $A = \frac{-100}{d - 1}$ |
| Fractional | $\frac{p}{q} = d - 1$ (then simplify to integer ratio) |
| Hong Kong | $h = d - 1$ |
| Malay ($d \leq 2$) | $m = d - 1$ |
| Malay ($d > 2$) | $m = \frac{-1}{d - 1}$ |

Python Implementation: OddsConverter

The following class provides a complete, production-quality odds conversion utility.

"""
A comprehensive odds conversion utility for sports betting.
Supports American, Decimal, Fractional, Hong Kong, and Malay formats.
"""

from __future__ import annotations
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class OddsConverter:
    """
    Convert between all major odds formats.

    The internal representation is decimal odds, which serves as the
    canonical format from which all other formats are derived.

    Attributes:
        decimal: The decimal odds value (must be > 1.0).
    """

    decimal: float

    def __post_init__(self) -> None:
        """Validate that decimal odds are greater than 1.0."""
        if self.decimal <= 1.0:
            raise ValueError(
                f"Decimal odds must be > 1.0, got {self.decimal}. "
                f"Odds of 1.0 or less imply zero or negative profit."
            )

    # --- Factory Methods (construct from any format) ---

    @classmethod
    def from_american(cls, american: int) -> OddsConverter:
        """
        Create from American odds.

        Args:
            american: American odds value (e.g., +150 or -110).

        Returns:
            An OddsConverter instance.

        Raises:
            ValueError: If american is strictly between -100 and +100
                (valid American odds have an absolute value of at least 100).
        """
        if -100 < american < 100:
            raise ValueError(
                f"American odds must be <= -100 or >= +100, got {american}."
            )
        if american > 0:
            decimal = 1 + (american / 100)
        else:
            decimal = 1 + (100 / abs(american))
        return cls(decimal=round(decimal, 6))

    @classmethod
    def from_decimal(cls, decimal: float) -> OddsConverter:
        """
        Create from decimal odds.

        Args:
            decimal: Decimal odds value (e.g., 2.50).

        Returns:
            An OddsConverter instance.
        """
        return cls(decimal=decimal)

    @classmethod
    def from_fractional(cls, numerator: int, denominator: int) -> OddsConverter:
        """
        Create from fractional odds.

        Args:
            numerator: The profit portion (e.g., 3 in 3/2).
            denominator: The stake portion (e.g., 2 in 3/2).

        Returns:
            An OddsConverter instance.
        """
        if denominator == 0:
            raise ValueError("Denominator cannot be zero.")
        decimal = 1 + (numerator / denominator)
        return cls(decimal=round(decimal, 6))

    @classmethod
    def from_hongkong(cls, hk: float) -> OddsConverter:
        """
        Create from Hong Kong odds.

        Args:
            hk: Hong Kong odds value (e.g., 1.50).

        Returns:
            An OddsConverter instance.
        """
        if hk <= 0:
            raise ValueError(f"Hong Kong odds must be > 0, got {hk}.")
        return cls(decimal=1 + hk)

    @classmethod
    def from_malay(cls, malay: float) -> OddsConverter:
        """
        Create from Malay odds.

        Args:
            malay: Malay odds value (e.g., 0.80 or -0.80).

        Returns:
            An OddsConverter instance.
        """
        if malay == 0 or abs(malay) > 1:
            raise ValueError(
                f"Malay odds must be non-zero and within [-1, 1], got {malay}."
            )
        if malay > 0:
            decimal = 1 + malay
        else:
            decimal = 1 + (1 / abs(malay))
        return cls(decimal=round(decimal, 6))

    # --- Conversion Methods (output to any format) ---

    def to_american(self) -> int:
        """
        Convert to American odds.

        Returns:
            American odds as an integer (e.g., +150 or -110).
        """
        if self.decimal >= 2.0:
            return round((self.decimal - 1) * 100)
        else:
            return round(-100 / (self.decimal - 1))

    def to_decimal(self) -> float:
        """Return decimal odds."""
        return round(self.decimal, 4)

    def to_fractional(self) -> str:
        """
        Convert to fractional odds as a simplified string.

        Returns:
            A string like '3/2' or '1/1'.
        """
        frac = Fraction(self.decimal - 1).limit_denominator(1000)
        return f"{frac.numerator}/{frac.denominator}"

    def to_hongkong(self) -> float:
        """
        Convert to Hong Kong odds.

        Returns:
            Hong Kong odds as a float.
        """
        return round(self.decimal - 1, 4)

    def to_malay(self) -> float:
        """
        Convert to Malay odds.

        Returns:
            Malay odds as a float (positive if decimal <= 2, negative otherwise).
        """
        if self.decimal <= 2.0:
            return round(self.decimal - 1, 4)
        else:
            return round(-1 / (self.decimal - 1), 4)

    def implied_probability(self) -> float:
        """
        Calculate the raw implied probability (the book's margin is not removed).

        Returns:
            Implied probability as a float between 0 and 1.
        """
        return round(1 / self.decimal, 6)

    def __repr__(self) -> str:
        """Provide a comprehensive string representation."""
        am = self.to_american()
        am_str = f"+{am}" if am > 0 else str(am)
        return (
            f"OddsConverter(decimal={self.to_decimal()}, "
            f"american={am_str}, "
            f"fractional={self.to_fractional()}, "
            f"hk={self.to_hongkong()}, "
            f"malay={self.to_malay()}, "
            f"implied_prob={self.implied_probability():.4f})"
        )


# Demonstration
if __name__ == "__main__":
    # Create from different formats and show all conversions
    examples = [
        OddsConverter.from_american(-110),
        OddsConverter.from_american(+150),
        OddsConverter.from_decimal(2.50),
        OddsConverter.from_fractional(5, 1),
        OddsConverter.from_hongkong(0.909),
        OddsConverter.from_malay(-0.667),
    ]

    for odds in examples:
        print(odds)
        print()

Tip for Beginners: If you are new to odds and find yourself confused, always convert to decimal first. Decimal odds are the most transparent format: multiply your stake by the decimal odds to get your total payout. Everything else is a cosmetic wrapper around this core idea.


2.3 Implied Probability

What Implied Probability Means

Every set of odds corresponds to a probability --- the probability at which a bet at those odds would break even in the long run. This is called the implied probability. It is "implied" because it is embedded in the odds, not because it is the true probability of the event occurring.

If a sportsbook offers the Chiefs at -150 (decimal 1.667), the implied probability is:

$$P_{\text{implied}} = \frac{1}{\text{Decimal Odds}} = \frac{1}{1.667} = 0.5999 \approx 60.0\%$$

This means that if the Chiefs won exactly 60% of the time, a bettor placing this wager at -150 would break even in the long run (before accounting for opportunity cost). If the Chiefs win more than 60% of the time, the bet has positive expected value. If less, it has negative expected value.

Conversion Formulas for Each Format

From Decimal Odds:

$$P_{\text{implied}} = \frac{1}{d}$$

From American Odds (positive, +$A$):

$$P_{\text{implied}} = \frac{100}{A + 100}$$

From American Odds (negative, -$A$):

$$P_{\text{implied}} = \frac{|A|}{|A| + 100}$$

From Fractional Odds ($p/q$):

$$P_{\text{implied}} = \frac{q}{p + q}$$

From Hong Kong Odds ($h$):

$$P_{\text{implied}} = \frac{1}{1 + h}$$
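The formulas above do not cover Malay odds. Substituting the Malay-to-decimal conversions from Section 2.2 into $P_{\text{implied}} = 1/d$ gives the missing case, shown here as a short derived sketch:

From Malay Odds ($m$):

$$P_{\text{implied}} = \frac{1}{1 + m} \quad (m > 0), \qquad P_{\text{implied}} = \frac{|m|}{1 + |m|} \quad (m < 0)$$

For example, Malay odds of -0.80 correspond to decimal odds of 2.25 and an implied probability of $0.80 / 1.80 \approx 44.4\%$.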

Why Implied Probabilities Sum to More Than 100%

If you take the implied probabilities of all outcomes in a market and sum them, you will find that the total exceeds 100%. This excess is the sportsbook's overround (also called the vig, juice, or margin), and it is how sportsbooks make money.

Worked Example 2.3: Implied Probabilities for an NFL Moneyline

Consider an NFL game with the following moneyline odds:

| Team | American Odds | Decimal Odds | Implied Probability |
|---|---|---|---|
| Kansas City Chiefs | -155 | 1.645 | $\frac{155}{255} = 60.78\%$ |
| Buffalo Bills | +135 | 2.35 | $\frac{100}{235} = 42.55\%$ |

The sum of implied probabilities:

$$60.78\% + 42.55\% = 103.33\%$$

This 3.33% excess is the overround. The "true" probabilities (if we could know them) might be something like 59.0% and 41.0%, summing to exactly 100%. The sportsbook has inflated each probability slightly, which translates to slightly lower payouts for the bettor.

Let us extend this to a three-way market (common in soccer):

| Outcome | Decimal Odds | Implied Probability |
|---|---|---|
| Manchester City | 1.72 | $\frac{1}{1.72} = 58.14\%$ |
| Draw | 3.80 | $\frac{1}{3.80} = 26.32\%$ |
| Arsenal | 4.50 | $\frac{1}{4.50} = 22.22\%$ |

Sum of implied probabilities:

$$58.14\% + 26.32\% + 22.22\% = 106.68\%$$

The overround here is 6.68%, which is typical for a soccer match in a competitive market. Note that three-way markets generally have higher overround than two-way markets because there are more outcomes across which to distribute the margin.
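The three-way calculation is easy to reproduce from decimal odds. The following minimal sketch uses the Manchester City, Draw, and Arsenal prices from the table above; the function name is illustrative.

```python
from typing import Dict


def implied_probabilities(decimal_odds: Dict[str, float]) -> Dict[str, float]:
    """Map each outcome to 1 / decimal odds (margin still included)."""
    return {name: 1 / d for name, d in decimal_odds.items()}


market = {"Manchester City": 1.72, "Draw": 3.80, "Arsenal": 4.50}
probs = implied_probabilities(market)
overround = sum(probs.values()) - 1

for name, p in probs.items():
    print(f"{name:<16} {p:.2%}")
print(f"Overround: {overround:.2%}")   # 6.68%
```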

The Relationship Between Odds and Probability

It is essential to distinguish between three different concepts:

  1. The true probability --- the actual likelihood of the event occurring, which no one knows with certainty.
  2. The implied probability --- the probability embedded in the sportsbook's odds, which includes the overround.
  3. Your estimated probability --- your best assessment of the true probability based on your analysis.

Profitable betting occurs when (3) is more accurate than (2) and differs from it in a direction favorable to you. That is:

  • If you estimate the true probability is higher than the implied probability, the bet has positive expected value (bet the "yes" side).
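  • If you estimate the true probability is lower than the implied probability, the bet has negative expected value and should be avoided. The other side is worth betting only if your estimate for that outcome exceeds its own implied probability; because of the overround, both sides of a market can be negative-value bets at once.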

This is the fundamental equation of sports betting, and we will formalize it with the concept of expected value in Chapter 3.
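The comparison can be written as a one-line check. The sketch below is a minimal illustration using the Chiefs-Bills prices from Worked Example 2.3 and made-up probability estimates; it takes each estimate at face value and says nothing about how reliable that estimate is.

```python
def implied_probability(decimal_odds: float) -> float:
    """Break-even probability for a bet at the given decimal odds."""
    return 1 / decimal_odds


def has_value(estimated_prob: float, decimal_odds: float) -> bool:
    """True when your estimate exceeds the break-even probability."""
    return estimated_prob > implied_probability(decimal_odds)


# Chiefs -155 (decimal 1.645) and Bills +135 (decimal 2.35).
# The probability estimates below are hypothetical.
print(has_value(0.59, 1.645))   # False: 0.59 < 0.6079
print(has_value(0.41, 2.35))    # False: 0.41 < 0.4255
print(has_value(0.45, 2.35))    # True:  0.45 > 0.4255
```

The first two lines show a bettor whose estimates sum to 100% finding no value on either side, which is the normal state of affairs in a market with an overround.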

Python Implementation: Batch Conversion

"""
Batch conversion of odds to implied probabilities for a full market.
Demonstrates how to analyze a complete sportsbook offering.
"""

from typing import Dict, List, Tuple


def american_to_implied_probability(american: int) -> float:
    """
    Convert American odds to implied probability.

    Args:
        american: American odds (e.g., -155 or +135).

    Returns:
        Implied probability as a float between 0 and 1.
    """
    if american > 0:
        return 100 / (american + 100)
    else:
        return abs(american) / (abs(american) + 100)


def analyze_market(
    outcomes: Dict[str, int],
    market_name: str = "Market"
) -> Dict[str, float]:
    """
    Analyze a complete betting market.

    Given a dictionary mapping outcome names to American odds,
    calculate implied probabilities, the overround, and the
    vig-free (fair) probabilities.

    Args:
        outcomes: Dictionary mapping outcome names to American odds.
        market_name: A label for this market (for display).

    Returns:
        Dictionary mapping outcome names to fair (vig-free) probabilities.
    """
    print(f"\n{'='*60}")
    print(f"  Market Analysis: {market_name}")
    print(f"{'='*60}")

    implied_probs: Dict[str, float] = {}
    for name, odds in outcomes.items():
        prob = american_to_implied_probability(odds)
        implied_probs[name] = prob
        sign = "+" if odds > 0 else ""
        print(f"  {name:<25} {sign}{odds:>6}  ->  {prob:.4f} ({prob*100:.2f}%)")

    total = sum(implied_probs.values())
    overround = total - 1.0
    print(f"\n  Sum of implied probabilities: {total:.4f} ({total*100:.2f}%)")
    print(f"  Overround (margin):           {overround:.4f} ({overround*100:.2f}%)")

    # Calculate vig-free (fair) probabilities by normalizing
    fair_probs: Dict[str, float] = {
        name: prob / total for name, prob in implied_probs.items()
    }

    print(f"\n  Vig-free (fair) probabilities:")
    for name, prob in fair_probs.items():
        print(f"    {name:<25} {prob:.4f} ({prob*100:.2f}%)")

    return fair_probs


# Demonstration: Analyze a full slate of NFL games
if __name__ == "__main__":
    nfl_week = {
        "Chiefs vs Bills": {"Kansas City Chiefs": -155, "Buffalo Bills": 135},
        "49ers vs Eagles": {"San Francisco 49ers": -130, "Philadelphia Eagles": 110},
        "Cowboys vs Giants": {"Dallas Cowboys": -280, "New York Giants": 230},
        "Bengals vs Ravens": {"Cincinnati Bengals": 105, "Baltimore Ravens": -125},
    }

    all_overrounds: List[float] = []

    for game_name, game_odds in nfl_week.items():
        analyze_market(game_odds, game_name)
        total = sum(
            american_to_implied_probability(odds)
            for odds in game_odds.values()
        )
        all_overrounds.append(total - 1.0)

    avg_overround = sum(all_overrounds) / len(all_overrounds)
    print(f"\n{'='*60}")
    print(f"  Average overround across {len(all_overrounds)} games: "
          f"{avg_overround:.4f} ({avg_overround*100:.2f}%)")
    print(f"{'='*60}")

2.4 The Overround and Margin

Definition and Calculation

The overround (also called the vig, juice, or bookmaker margin) is the percentage by which the sum of implied probabilities exceeds 100%. It represents the sportsbook's built-in profit margin.

For a market with $n$ outcomes having implied probabilities $p_1, p_2, \ldots, p_n$:

$$\text{Overround} = \left(\sum_{i=1}^{n} p_i\right) - 1$$

where each $p_i = \frac{1}{d_i}$ and $d_i$ is the decimal odds for outcome $i$.

The margin can also be expressed as a percentage of the total implied probability:

$$\text{Margin (\%)} = \frac{\text{Overround}}{\sum_{i=1}^{n} p_i} \times 100$$

This formulation tells you what fraction of every dollar wagered the sportsbook expects to keep, on average, if the true probabilities match the "fair" (vig-removed) probabilities.

This quantity is commonly cited as the hold percentage. Writing $S = \sum_{i=1}^{n} p_i$, the margin simplifies to $(S - 1)/S = 1 - 1/S$, so the hold is the same number in a more compact form:

$$\text{Hold} = 1 - \frac{1}{\sum_{i=1}^{n} p_i}$$

For our Chiefs-Bills example:

$$\text{Hold} = 1 - \frac{1}{1.0333} = 1 - 0.9678 = 0.0322 = 3.22\%$$

This means that for every \$100 wagered across this market (assuming the action is distributed proportional to the true probabilities), the sportsbook expects to keep approximately \$3.22.
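The three measures are simple enough to compute side by side. The sketch below uses the unrounded Chiefs-Bills implied probabilities and also checks numerically that the margin formula and the hold formula give the same number; the function names are illustrative.

```python
import math
from typing import Sequence


def overround(implied_probs: Sequence[float]) -> float:
    """Sum of implied probabilities minus 1."""
    return sum(implied_probs) - 1


def margin(implied_probs: Sequence[float]) -> float:
    """Overround as a fraction of the total implied probability."""
    return overround(implied_probs) / sum(implied_probs)


def hold(implied_probs: Sequence[float]) -> float:
    """Expected share of stakes kept by the book: 1 - 1 / sum(p_i)."""
    return 1 - 1 / sum(implied_probs)


# Chiefs -155 and Bills +135 from Worked Example 2.3.
probs = [155 / 255, 100 / 235]
print(f"Overround: {overround(probs):.1%}")        # 3.3%
print(f"Hold:      {hold(probs):.1%}")             # 3.2%
print(math.isclose(margin(probs), hold(probs)))    # True
```

The overround is always slightly larger than the hold because it is measured against 100% rather than against the inflated total.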

How Books Distribute the Overround

Sportsbooks do not always distribute the overround equally across all outcomes. There are several common strategies:

Proportional distribution: The margin is spread proportional to each outcome's probability. A -300 favorite and a +250 underdog each have their implied probability inflated by the same relative amount. This is the most mathematically clean approach.

Favorite-longshot bias distribution: Many sportsbooks shade their odds by taking a larger margin from longshots than from favorites. This exploits the well-documented favorite-longshot bias --- the tendency of casual bettors to overbet longshots (because the potential payoff is exciting) and underbet favorites (because the payoff seems small). As a result, the implied probability of longshots is often inflated more than that of favorites relative to the true probability.

Sharp-side shading: Some books shade the odds away from the side that professional ("sharp") bettors tend to favor, effectively charging a higher price to bet the "smart" side.

Understanding how a specific sportsbook distributes its margin is important for identifying value. If a book inflates longshot probabilities disproportionately, there may be more value on the favorite side, and vice versa.
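To see how the assumed distribution of the margin changes the "fair" probabilities you back out, the sketch below compares two ways of removing the vig. The first assumes proportional distribution. The second, often called the power method, is one common way to model a margin that falls more heavily on longshots; it is used here as an illustrative assumption, not as a description of how any particular book prices. The function names and the search bracket for the exponent are hypothetical choices.

```python
from typing import List


def remove_vig_proportional(implied: List[float]) -> List[float]:
    """Divide each implied probability by the total (equal relative margin)."""
    total = sum(implied)
    return [p / total for p in implied]


def remove_vig_power(implied: List[float], tol: float = 1e-12) -> List[float]:
    """
    Raise every implied probability to a common exponent k >= 1, chosen by
    bisection so the results sum to 1. Small probabilities shrink
    proportionally more, so longshots are assumed to carry more margin.
    """
    low, high = 1.0, 10.0  # assumed search bracket for k
    while high - low > tol:
        mid = (low + high) / 2
        if sum(p ** mid for p in implied) > 1.0:
            low = mid   # still sums above 1: a larger exponent is needed
        else:
            high = mid
    return [p ** high for p in implied]


# A -300 favorite against a +250 underdog
implied = [300 / 400, 100 / 350]
print("Proportional:", [round(p, 4) for p in remove_vig_proportional(implied)])
print("Power method:", [round(p, 4) for p in remove_vig_power(implied)])
```

Under the power method the underdog's fair probability comes out lower than under proportional removal, which is the favorite-longshot pattern described above.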

Balanced vs. Unbalanced Books

A balanced book is one where the sportsbook has taken equal action (in dollar terms, adjusted for odds) on each side of a bet, guaranteeing a profit regardless of the outcome. In practice, sportsbooks rarely achieve perfect balance, and modern books increasingly do not try to.

Consider our Chiefs-Bills example at -155/+135. If the book takes \$155 on the Chiefs and \$100 on the Bills:

  • If the Chiefs win: Pay out $\$155 \times \frac{100}{155} = \$100$ profit to Chiefs bettors. Collect \$100 from Bills bettors. Net: \$0.
  • If the Bills win: Pay out $\$100 \times 1.35 = \$135$ profit to Bills bettors. Collect \$155 from Chiefs bettors. Net: \$20.

This is not perfectly balanced. A truly balanced book would require action proportional to the implied probabilities on each side.
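The two scenarios above, and the stakes that would balance the book, can be checked with a short sketch. The function names are illustrative, and the \$255 total handle is simply the sum of the two stakes in the example.

```python
from typing import Dict


def book_net_by_outcome(
    stakes: Dict[str, float], decimal_odds: Dict[str, float]
) -> Dict[str, float]:
    """Book's profit for each possible winner: all stakes in, winner's payout out."""
    total_staked = sum(stakes.values())
    return {
        winner: total_staked - stakes[winner] * decimal_odds[winner]
        for winner in stakes
    }


def balanced_stakes(
    total_staked: float, decimal_odds: Dict[str, float]
) -> Dict[str, float]:
    """Split a total handle in proportion to the implied probabilities."""
    implied = {name: 1 / d for name, d in decimal_odds.items()}
    total_implied = sum(implied.values())
    return {
        name: total_staked * p / total_implied for name, p in implied.items()
    }


odds = {"Chiefs": 1 + 100 / 155, "Bills": 2.35}  # -155 / +135 in decimal

# The unbalanced example from the text: $155 on the Chiefs, $100 on the Bills
print(book_net_by_outcome({"Chiefs": 155, "Bills": 100}, odds))

# The same $255 handle, split so the book's profit is identical either way
print(book_net_by_outcome(balanced_stakes(255, odds), odds))
```

With the balanced split, the book's profit on either result equals the total handle multiplied by the hold.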

Modern sportsbooks often take a position --- accepting an unbalanced book when they believe one side is more likely than the other. Books with sophisticated models may intentionally accept more risk on the side they believe is less likely to win, effectively acting as bettors themselves. This is a significant evolution from the traditional "balanced book" model discussed in Chapter 1.

Comparing Overround Across Sportsbooks and Sports

Not all sportsbooks charge the same margin, and margins vary significantly across different sports and bet types. Here are typical ranges:

| Market Type | Typical Overround |
|---|---|
| NFL sides/totals (major book) | 4.0--5.0% |
| NFL moneyline (two-way) | 2.5--4.0% |
| NBA sides/totals | 4.0--5.5% |
| MLB moneyline | 3.0--5.0% |
| Soccer match result (three-way) | 5.0--8.0% |
| Soccer match result (Pinnacle) | 2.0--3.0% |
| Tennis match winner (two-way) | 3.0--6.0% |
| NFL player props | 6.0--15.0% |
| Futures markets (e.g., Super Bowl) | 15.0--40.0% |

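Key Insight: The overround largely determines how hard a market is to beat. If the margin is spread proportionally, a bet whose true win probability exceeds the vig-free probability by a relative amount $e$ has an expected return of $\frac{1 + e}{1 + \text{Overround}} - 1$ per dollar staked, so it is profitable only when $e$ exceeds the overround. A 5% edge clears a 3% overround ($1.05 / 1.03 - 1 \approx +1.9\%$) but loses money against a 10% overround ($1.05 / 1.10 - 1 \approx -4.5\%$). This is why professional bettors overwhelmingly focus on main markets (sides, totals, moneylines) at sharp sportsbooks like Pinnacle, which offer some of the lowest margins in the industry.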

Using Overround to Assess Market Quality

When you encounter a set of odds, the overround tells you several things:

  1. The cost of placing a bet. Higher overround means a larger implicit "tax" on your wager.
  2. The competitiveness of the market. Highly liquid, competitive markets (NFL sides) have low overround. Thin, niche markets (conference tournament futures for mid-major colleges) have high overround.
  3. The potential for value. Markets with high overround are harder to beat, but they also tend to be less efficient. The sportsbook is charging more because it is less confident in its prices, which means there may be more mispricing --- if you can find it.

Python Implementation: Margin Calculator and Comparison

"""
Tools for calculating and comparing sportsbook margins across
different books and market types.
"""

from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MarketOdds:
    """
    Represents a complete betting market at a single sportsbook.

    Attributes:
        sportsbook: Name of the sportsbook.
        market: Description of the market (e.g., 'Chiefs vs Bills ML').
        outcomes: Dictionary mapping outcome names to decimal odds.
    """
    sportsbook: str
    market: str
    outcomes: Dict[str, float]

    @property
    def implied_probabilities(self) -> Dict[str, float]:
        """Calculate the implied probability for each outcome."""
        return {name: 1 / odds for name, odds in self.outcomes.items()}

    @property
    def total_implied_probability(self) -> float:
        """Sum of all implied probabilities."""
        return sum(self.implied_probabilities.values())

    @property
    def overround(self) -> float:
        """The overround (excess above 1.0)."""
        return self.total_implied_probability - 1.0

    @property
    def overround_pct(self) -> float:
        """The overround as a percentage."""
        return self.overround * 100

    @property
    def hold_pct(self) -> float:
        """The expected hold percentage."""
        return (1 - 1 / self.total_implied_probability) * 100

    @property
    def fair_probabilities(self) -> Dict[str, float]:
        """Vig-free probabilities (normalized to sum to 1)."""
        total = self.total_implied_probability
        return {
            name: prob / total
            for name, prob in self.implied_probabilities.items()
        }

    def fair_decimal_odds(self) -> Dict[str, float]:
        """Calculate fair (no-vig) decimal odds for each outcome."""
        return {
            name: round(1 / prob, 4)
            for name, prob in self.fair_probabilities.items()
        }


def compare_sportsbooks(
    markets: List[MarketOdds],
    target_market: Optional[str] = None
) -> None:
    """
    Compare margins across multiple sportsbooks for the same market.

    Args:
        markets: List of MarketOdds from different sportsbooks.
        target_market: If provided, filter to only this market name.
    """
    if target_market:
        markets = [m for m in markets if m.market == target_market]

    if not markets:
        print("No markets to compare.")
        return

    print(f"\n{'='*70}")
    print(f"  Sportsbook Comparison: {markets[0].market}")
    print(f"{'='*70}")
    print(f"  {'Book':<20} {'Overround':>10} {'Hold %':>10} ", end="")

    outcome_names = list(markets[0].outcomes.keys())
    for name in outcome_names:
        print(f" {name:>12}", end="")
    print()
    print(f"  {'-'*66}")

    # Sort by overround (lowest = best for bettor)
    markets_sorted = sorted(markets, key=lambda m: m.overround)

    for m in markets_sorted:
        print(f"  {m.sportsbook:<20} {m.overround_pct:>9.2f}% {m.hold_pct:>9.2f}%", end="")
        for name in outcome_names:
            print(f" {m.outcomes[name]:>12.3f}", end="")
        print()

    # Show the best odds available across all books for each outcome
    print(f"\n  Best available odds:")
    for name in outcome_names:
        best_odds = max(m.outcomes[name] for m in markets_sorted)
        best_book = max(markets_sorted, key=lambda m: m.outcomes[name]).sportsbook
        print(f"    {name}: {best_odds:.3f} at {best_book}")

    # Calculate the overround if you always take the best available line
    best_odds_implied = sum(
        1 / max(m.outcomes[name] for m in markets_sorted)
        for name in outcome_names
    )
    best_overround = (best_odds_implied - 1.0) * 100
    print(f"\n  Best-line overround (shopping all books): {best_overround:.2f}%")


# Demonstration
if __name__ == "__main__":
    game = "Chiefs vs Bills ML"

    sportsbooks = [
        MarketOdds("Pinnacle", game, {"Chiefs": 1.649, "Bills": 2.370}),
        MarketOdds("FanDuel", game, {"Chiefs": 1.645, "Bills": 2.340}),
        MarketOdds("DraftKings", game, {"Chiefs": 1.625, "Bills": 2.350}),
        MarketOdds("BetMGM", game, {"Chiefs": 1.635, "Bills": 2.300}),
        MarketOdds("Caesars", game, {"Chiefs": 1.640, "Bills": 2.320}),
    ]

    compare_sportsbooks(sportsbooks, game)

    # Show fair probabilities from the sharpest book
    pinnacle = sportsbooks[0]
    print(f"\n  Fair probabilities (Pinnacle, sharpest line):")
    for name, prob in pinnacle.fair_probabilities.items():
        print(f"    {name}: {prob:.4f} ({prob*100:.2f}%)")
    print(f"\n  Fair decimal odds:")
    for name, odds in pinnacle.fair_decimal_odds().items():
        print(f"    {name}: {odds:.3f}")

Line Shopping: The demonstration above introduces one of the simplest and most powerful strategies in sports betting: line shopping. By comparing odds across multiple sportsbooks and always taking the best available price, you can often cut the effective overround from 4--5% at a single book to 1--2%, or even below zero (meaning the combined best prices give you a built-in edge). In the sample data above, Pinnacle happens to post the best price on both sides, so the best-line overround simply equals Pinnacle's own 2.84%; the gain appears when different books lead on different outcomes. We will explore this in depth in Chapter 8.
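The effect is easiest to see when two books disagree about which side to price generously. The sketch below uses made-up prices for two hypothetical books ("Book A" and "Book B"), chosen so that each leads on one side; the function name `best_line_overround` is also illustrative.

```python
from typing import Dict


def best_line_overround(books: Dict[str, Dict[str, float]]) -> float:
    """Overround when each outcome is bet at its best price across books."""
    outcomes = next(iter(books.values())).keys()
    return sum(
        1 / max(prices[outcome] for prices in books.values())
        for outcome in outcomes
    ) - 1.0


# Hypothetical decimal prices: Book A is better on the Chiefs, Book B on the Bills
books = {
    "Book A": {"Chiefs": 1.66, "Bills": 2.30},
    "Book B": {"Chiefs": 1.60, "Bills": 2.45},
}

for name, prices in books.items():
    single_book = sum(1 / d for d in prices.values()) - 1.0
    print(f"{name} overround: {single_book:.2%}")
print(f"Best-line overround: {best_line_overround(books):.2%}")
```

Each book on its own charges more than 3%, while the combination of best prices charges roughly 1%.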


2.5 Practical Probability Estimation

So far, we have focused on extracting probabilities from odds. But the real challenge in sports betting is going the other direction: estimating the true probability of an event and then comparing it to the market's implied probability. This section introduces three practical approaches.

The Historical Frequency Approach

The simplest way to estimate probability is to count how often something has happened in the past. If we want to estimate the probability that the home team wins an NFL game, we can look at historical data:

| NFL Seasons | Home Wins | Total Games | Home Win Rate |
|---|---|---|---|
| 2000--2004 | 655 | 1,264 | 51.8% |
| 2005--2009 | 719 | 1,316 | 54.6% |
| 2010--2014 | 747 | 1,328 | 56.3% |
| 2015--2019 | 744 | 1,328 | 56.0% |
| 2020--2024 | 723 | 1,348 | 53.6% |
| 2000--2024 (total) | 3,588 | 6,584 | 54.5% |

Note: The 2020 season included games with no or limited fans due to the COVID-19 pandemic, and the home win rate dropped to approximately 51.3% that year, providing a natural experiment on the effect of crowd support. The 2023 season saw a continued modest decline in home-field advantage, a trend many attribute to improvements in travel logistics and the increasing sophistication of road teams' preparation.

The historical frequency approach has clear strengths and limitations:

Strengths:

  • Simple, transparent, and easy to compute.
  • Provides a solid baseline or "prior" probability.
  • Can be computed for very specific situations (e.g., "How often does a team favored by 3--6 points on the road cover the spread?").

Limitations:

  • Assumes the future resembles the past, which is not always true.
  • Sensitive to how you define the reference class (which seasons? Which teams? Which conditions?).
  • Does not account for game-specific information (injuries, weather, motivation).
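A frequency estimate is only as precise as its sample allows, so it helps to attach an interval to it. The sketch below uses the counts from the table above with a normal-approximation interval. Treating games as independent draws with a constant win rate is a simplifying assumption, and the function name is illustrative.

```python
import math
from typing import Tuple


def frequency_estimate(
    wins: int, games: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (rate, lower, upper) using a normal-approximation interval."""
    rate = wins / games
    std_error = math.sqrt(rate * (1 - rate) / games)
    return rate, rate - z * std_error, rate + z * std_error


# Counts taken from the home-win table
samples = [("2020-2024", 723, 1348), ("2000-2024", 3588, 6584)]
for label, wins, games in samples:
    rate, low, high = frequency_estimate(wins, games)
    print(f"{label}: {rate:.1%} (95% interval {low:.1%} to {high:.1%})")
```

The five-season sample produces a noticeably wider interval than the full 25-season sample, which is the reference-class trade-off in numerical form: narrower slices are more relevant but less precise.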

Power Ratings and Their Relationship to Probability

A power rating assigns a single number to each team that represents its overall quality. The difference in power ratings between two teams can then be converted to a predicted point spread and, from there, to a win probability.

The most common approach uses a normal (Gaussian) distribution. If Team A has a power rating of 85 and Team B has a power rating of 79, the predicted margin for Team A is:

$$\text{Predicted Margin}_A = (\text{Rating}_A - \text{Rating}_B) + \text{HFA}$$

where HFA is the home-field advantage in points, added when Team A is the home team and subtracted when Team A is the visitor. In the NFL it has been roughly 2.5 points in recent seasons, down from a historical average of around 3.0.

If Team A is at home:

$$\text{Predicted Margin}_A = (85 - 79) + 2.5 = 8.5 \text{ points}$$

To convert this to a win probability, we model the actual margin as a normal distribution with mean equal to the predicted margin and standard deviation $\sigma$ (typically around 13.5 points for NFL games):

$$P(\text{Team A wins}) = P(X > 0) = \Phi\left(\frac{\text{Predicted Margin}}{\sigma}\right)$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.

$$P(\text{Team A wins}) = \Phi\left(\frac{8.5}{13.5}\right) = \Phi(0.630) \approx 0.7357$$

So Team A, with an 8.5-point predicted margin at home, has approximately a 73.6% chance of winning.

Why 13.5 Points? The standard deviation of NFL game margins has been remarkably stable over decades, hovering around 13--14 points. This value captures the inherent randomness in football --- turnovers, injuries, special teams plays, and all the other unpredictable events that cause actual game outcomes to deviate from expectations. Understanding this variance is essential for proper probability estimation.
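Because the win probability depends on the ratio of predicted margin to $\sigma$, it is worth checking how much the answer moves across the 13--14 point range. The following sketch uses Python's standard-library `statistics.NormalDist`; the function name `win_probability` is illustrative.

```python
from statistics import NormalDist


def win_probability(predicted_margin: float, sigma: float = 13.5) -> float:
    """P(actual margin > 0) when the margin is Normal(predicted_margin, sigma)."""
    return NormalDist().cdf(predicted_margin / sigma)


# The 8.5-point example from the text
print(f"sigma=13.5: {win_probability(8.5):.4f}")

# Sensitivity to the assumed standard deviation
for sigma in (13.0, 14.0):
    print(f"sigma={sigma}: {win_probability(8.5, sigma):.4f}")
```

A smaller $\sigma$ makes the same predicted margin more decisive, so the favorite's win probability rises; a larger $\sigma$ pulls it back toward 50%.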

The Challenge of Estimating "True" Probability

In reality, nobody knows the "true" probability of a sporting event. Unlike a coin flip or a die roll, where the probability can be derived from symmetry, the probability that the Chiefs beat the Bills on a specific Sunday depends on an enormous number of factors, many of which are unknowable or unmeasurable.

What we can do is make estimates of varying quality and track how well they perform over time. This brings us to the concept of calibration.

Calibration: Are Your Probabilities Any Good?

A probability estimator is well-calibrated if, among all events assigned a probability of $p$, approximately $p \times 100\%$ of them actually occur.

For example, if you assign a 70% probability to 100 different events, and 72 of them actually occur, you are well-calibrated at the 70% level. If only 55 occur, you are overconfident. If 85 occur, you are underconfident.

The Brier score is a common metric for evaluating probabilistic predictions:

$$\text{Brier Score} = \frac{1}{N} \sum_{i=1}^{N} (p_i - o_i)^2$$

where $p_i$ is the predicted probability of the event and $o_i$ is the outcome (1 if the event occurred, 0 if it did not). Lower Brier scores are better, with 0.0 being a perfect score and 0.25 being the score of a model that assigns 50% probability to everything.

For reference:

  • Closing lines at sharp sportsbooks typically achieve Brier scores around 0.20--0.21 for NFL moneylines.
  • A simple power rating model might achieve 0.22--0.23.
  • Assigning 50% to every game yields exactly 0.25.
  • Always predicting the favorite at 100% yields a Brier score equal to the fraction of games the favorite loses, which is worse than 0.25 whenever favorites lose more than a quarter of the time. This is the quadratic penalty for high-confidence misses.
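These benchmarks are easy to verify on a toy sample. The sketch below uses six made-up results in which the favorite wins four times; the data and the function name `brier_score` are illustrative only.

```python
from typing import List


def brier_score(predictions: List[float], outcomes: List[int]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum(
        (p - o) ** 2 for p, o in zip(predictions, outcomes)
    ) / len(outcomes)


# Made-up sample: 1 = favorite won, 0 = favorite lost (4 wins in 6 games)
favorite_won = [1, 1, 0, 1, 0, 1]

coin_flip = brier_score([0.5] * 6, favorite_won)   # 50% on every game
all_in = brier_score([1.0] * 6, favorite_won)      # favorite at 100%
hedged = brier_score([0.65] * 6, favorite_won)     # favorite at 65%

print(f"50% on everything:  {coin_flip:.4f}")
print(f"Favorite at 100%:   {all_in:.4f}")
print(f"Favorite at 65%:    {hedged:.4f}")
```

The 100% forecaster scores worse than the coin flip even though it picks the right side most of the time, while a forecast close to the observed frequency beats both.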

Python Implementation: Probability Estimator and Calibration

"""
A simple probability estimator from historical data, with calibration
analysis tools. Demonstrates the complete workflow from raw data to
calibrated probabilities.
"""

import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class GameResult:
    """
    A historical game result for building probability estimates.

    Attributes:
        home_team: Name of the home team.
        away_team: Name of the away team.
        home_score: Final score of the home team.
        away_score: Final score of the away team.
    """
    home_team: str
    away_team: str
    home_score: int
    away_score: int

    @property
    def home_margin(self) -> int:
        """Margin of victory for the home team (negative = away win)."""
        return self.home_score - self.away_score

    @property
    def home_win(self) -> bool:
        """Whether the home team won."""
        return self.home_score > self.away_score


@dataclass
class SimpleRatingSystem:
    """
    A basic Elo-like rating system for estimating win probabilities.

    Uses a simple power rating approach where each team has a rating,
    and the predicted margin is the rating difference plus home-field
    advantage. Win probability is calculated using a normal CDF.

    Attributes:
        default_rating: Starting rating for new teams.
        home_field_advantage: Points added for home team.
        score_std_dev: Standard deviation of score margins.
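        k_factor: Update rate (higher = more responsive to recent results).
            Ratings are in points, so one update shifts the predicted
            margin by 2 * k_factor / score_std_dev times the prediction
            error. Keep k_factor well below score_std_dev / 2, or each
            update overshoots the error and the ratings become unstable.
        ratings: Current team ratings.
    """
    default_rating: float = 1500.0
    home_field_advantage: float = 2.5
    score_std_dev: float = 13.5
    k_factor: float = 1.0  # corrects about 15% of each prediction error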
    ratings: Dict[str, float] = field(default_factory=dict)

    def get_rating(self, team: str) -> float:
        """Get a team's rating, initializing if necessary."""
        if team not in self.ratings:
            self.ratings[team] = self.default_rating
        return self.ratings[team]

    def predict_margin(self, home_team: str, away_team: str) -> float:
        """
        Predict the margin of victory for the home team.

        Args:
            home_team: Name of the home team.
            away_team: Name of the away team.

        Returns:
            Predicted margin (positive = home team favored).
        """
        home_rating = self.get_rating(home_team)
        away_rating = self.get_rating(away_team)
        return (home_rating - away_rating) + self.home_field_advantage

    def predict_win_probability(
        self,
        home_team: str,
        away_team: str
    ) -> float:
        """
        Predict the probability that the home team wins.

        Uses the normal CDF to convert predicted margin to probability.

        Args:
            home_team: Name of the home team.
            away_team: Name of the away team.

        Returns:
            Probability of home team winning (0 to 1).
        """
        margin = self.predict_margin(home_team, away_team)
        # Use the error function to compute the normal CDF
        # CDF(x) = 0.5 * (1 + erf(x / sqrt(2)))
        z = margin / self.score_std_dev
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    def update(self, game: GameResult) -> None:
        """
        Update ratings based on a game result.

        Uses a simplified margin-of-victory adjustment.

        Args:
            game: The completed game result.
        """
        predicted_margin = self.predict_margin(game.home_team, game.away_team)
        actual_margin = game.home_margin
        error = actual_margin - predicted_margin

        # Scale the update by the prediction error
        adjustment = self.k_factor * (error / self.score_std_dev)
        self.ratings[game.home_team] = (
            self.get_rating(game.home_team) + adjustment
        )
        self.ratings[game.away_team] = (
            self.get_rating(game.away_team) - adjustment
        )


def calculate_brier_score(
    predictions: List[float],
    outcomes: List[int]
) -> float:
    """
    Calculate the Brier score for a set of probabilistic predictions.

    Args:
        predictions: List of predicted probabilities (0 to 1).
        outcomes: List of actual outcomes (0 or 1).

    Returns:
        The Brier score (lower is better, 0 is perfect).

    Raises:
        ValueError: If predictions and outcomes have different lengths.
    """
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have same length")

    return sum(
        (p - o) ** 2 for p, o in zip(predictions, outcomes)
    ) / len(predictions)


def calibration_analysis(
    predictions: List[float],
    outcomes: List[int],
    n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """
    Perform calibration analysis by binning predictions.

    Groups predictions into bins and compares the average predicted
    probability to the actual frequency of occurrence.

    Args:
        predictions: List of predicted probabilities.
        outcomes: List of actual outcomes (0 or 1).
        n_bins: Number of bins for grouping predictions.

    Returns:
        List of (mean_predicted, actual_frequency, count) tuples, one per
        non-empty bin.
    """
    bins: List[Tuple[float, float, int]] = []
    bin_width = 1.0 / n_bins

    for i in range(n_bins):
        lower = i * bin_width
        upper = (i + 1) * bin_width

        # Find predictions in this bin
        in_bin = [
            (p, o) for p, o in zip(predictions, outcomes)
            if lower <= p < upper or (i == n_bins - 1 and p == upper)
        ]

        if in_bin:
            avg_predicted = sum(p for p, _ in in_bin) / len(in_bin)
            avg_actual = sum(o for _, o in in_bin) / len(in_bin)
            bins.append((avg_predicted, avg_actual, len(in_bin)))

    return bins


# Demonstration with simulated data
if __name__ == "__main__":
    import random

    random.seed(42)

    # Create a simple rating system
    model = SimpleRatingSystem()

    # Simulate a season of games between 8 teams
    teams = [
        "Chiefs", "Bills", "Eagles", "49ers",
        "Ravens", "Lions", "Cowboys", "Dolphins"
    ]

    # Set initial "true" ratings (unknown to the model)
    true_strengths = {
        "Chiefs": 5.0, "Bills": 3.5, "Eagles": 3.0, "49ers": 2.5,
        "Ravens": 2.0, "Lions": 1.0, "Cowboys": -1.0, "Dolphins": -2.0
    }

    predictions: List[float] = []
    actual_outcomes: List[int] = []

    # Simulate 200 games
    for _ in range(200):
        home = random.choice(teams)
        away = random.choice([t for t in teams if t != home])

        # Get model prediction BEFORE seeing the result
        pred = model.predict_win_probability(home, away)
        predictions.append(pred)

        # Simulate game using "true" strengths
        true_margin = (
            true_strengths[home] - true_strengths[away]
            + 2.5  # home field advantage
            + random.gauss(0, 13.5)  # random variation
        )
        home_won = 1 if true_margin > 0 else 0
        actual_outcomes.append(home_won)

        # Update model with result
        home_score = max(0, int(24 + true_margin / 2 + random.gauss(0, 5)))
        away_score = max(0, int(24 - true_margin / 2 + random.gauss(0, 5)))
        game = GameResult(home, away, home_score, away_score)
        model.update(game)

    # Calculate Brier score
    brier = calculate_brier_score(predictions, actual_outcomes)
    print(f"Brier Score: {brier:.4f}")
    print(f"(Baseline 50/50 model would score: 0.2500)")

    # Calibration analysis
    print(f"\nCalibration Analysis:")
    print(f"  {'Predicted':>10} {'Actual':>10} {'Count':>8}")
    print(f"  {'-'*30}")
    for predicted, actual, count in calibration_analysis(
        predictions, actual_outcomes
    ):
        print(f"  {predicted:>9.2%} {actual:>9.2%} {count:>8}")

    # Show final ratings
    print(f"\nFinal Ratings (after 200 games):")
    sorted_ratings = sorted(
        model.ratings.items(), key=lambda x: x[1], reverse=True
    )
    for team, rating in sorted_ratings:
        true = true_strengths[team]
        print(f"  {team:<12} Rating: {rating:>7.1f}  "
              f"(True strength: {true:>+5.1f})")

Worked Example: Estimating Win Probability from Team Stats

Let us walk through a concrete probability estimation exercise for a hypothetical NFL matchup.

Worked Example 2.4: Chiefs at Bills, Week 14

Step 1: Gather relevant statistics (through Week 13):

| Metric | Chiefs | Bills | NFL Average |
|---|---|---|---|
| Points per game | 27.3 | 24.8 | 22.1 |
| Points allowed per game | 18.2 | 20.1 | 22.1 |
| Yards per play | 6.12 | 5.88 | 5.45 |
| Yards allowed per play | 5.01 | 5.23 | 5.45 |
| Turnover margin | +8 | +5 | 0 |
| Pythagorean win % (exponent 2.37) | .723 | .622 | .500 |

Step 2: Calculate a simple power rating.

A basic approach uses the Simple Rating System (SRS), which measures a team's average point margin adjusted for strength of schedule. Let us say our SRS calculations yield:

  • Chiefs SRS: +8.2
  • Bills SRS: +4.5

Step 3: Predict the margin.

The Chiefs are the away team in this scenario. We subtract home-field advantage rather than adding it:

$$\text{Predicted Margin}_{\text{Chiefs}} = (8.2 - 4.5) - 2.5 = 1.2$$

The Chiefs are predicted to win by 1.2 points on the road, which aligns with what we might expect for a slightly better team playing away.

Step 4: Convert to win probability.

$$P(\text{Chiefs win}) = \Phi\left(\frac{1.2}{13.5}\right) = \Phi(0.089) \approx 0.5354$$

So our model estimates the Chiefs have about a 53.5% chance of winning.

Step 5: Compare to the market.

If the sportsbook is offering the Chiefs at +105 (decimal 2.05), the implied probability is:

$$P_{\text{implied}} = \frac{100}{105 + 100} = \frac{100}{205} = 0.4878 = 48.78\%$$

Our estimate of 53.5% is well above the implied probability of 48.8%. The difference is:

$$\Delta P = 53.5\% - 48.8\% = 4.7 \text{ percentage points}$$

This suggests potential value on the Chiefs. But before betting, we would want to:

  1. Check if our model has been well-calibrated historically (per the methods above).
  2. Consider factors the model may not capture (specific injury news, weather, travel, motivation).
  3. Assess our confidence interval --- is the true probability likely between 50% and 57%, or could it be anywhere from 45% to 62%?
  4. Apply the expected value framework (Chapter 3) and Kelly criterion (Chapter 7) to size the bet appropriately.
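Steps 3 through 5 can be reproduced with a short script, which also gives a rough sense of how fragile the apparent edge is. This is a sketch under the chapter's own assumptions (2.5 points of home-field advantage, $\sigma = 13.5$); the variable names are illustrative.

```python
from statistics import NormalDist

HFA = 2.5      # home-field advantage in points, as used in this chapter
SIGMA = 13.5   # standard deviation of NFL margins, as used in this chapter

chiefs_srs, bills_srs = 8.2, 4.5
american_price = 105  # Chiefs moneyline (+105)

# Step 3: the Chiefs are the road team, so home-field advantage is subtracted
predicted_margin = (chiefs_srs - bills_srs) - HFA

# Step 4: normal model for the final margin
model_prob = NormalDist().cdf(predicted_margin / SIGMA)

# Step 5: implied probability from positive American odds
implied_prob = 100 / (american_price + 100)
gap_points = (model_prob - implied_prob) * 100

# Predicted margin at which the model would agree exactly with the price
break_even_margin = SIGMA * NormalDist().inv_cdf(implied_prob)

print(f"Predicted margin:   {predicted_margin:+.1f}")
print(f"Model probability:  {model_prob:.4f}")
print(f"Implied probability:{implied_prob:.4f}")
print(f"Gap:                {gap_points:.1f} percentage points")
print(f"Break-even margin:  {break_even_margin:+.2f}")
```

The break-even margin sits less than two points below the model's prediction, so a modest error in the rating difference or in the home-field adjustment is enough to erase the apparent value. That is one concrete way to think about the confidence interval in item 3.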

Practical Caution: The gap between "I think there is value here" and "I am confident enough to bet real money" is enormous. Estimating probabilities is hard, and overconfidence is the bettor's greatest enemy. The calibration exercises in this section are not optional --- they are essential. If you cannot demonstrate, over hundreds of predictions, that your probability estimates are well-calibrated, you should not be betting based on them.


2.6 Chapter Summary

Key Concepts

This chapter established the mathematical foundation upon which all sports betting analysis rests. We covered:

  1. Probability fundamentals --- the axioms, addition and multiplication rules, conditional probability, and independence. These are not abstract concepts but practical tools you will use every time you evaluate a bet.

  2. Odds formats --- American, decimal, fractional, Hong Kong, and Malay. Each format encodes the same information in a different way, and a professional bettor must be fluent in all of them.

  3. Implied probability --- the probability embedded in the odds. Because of the sportsbook's margin, the implied probabilities in a market sum to more than 100%, so they overstate the true probabilities on average. Extracting and comparing implied probabilities is the first step in identifying value.

  4. The overround --- the sportsbook's built-in profit margin. Understanding how much you are being "taxed" on each bet, and how that tax varies across books and markets, is essential for determining where to focus your betting activity.

  5. Practical probability estimation --- using historical frequencies, power ratings, and calibration analysis to develop and validate your own probability estimates.

Key Formulas

| Formula | Description |
|---|---|
| $P(A^c) = 1 - P(A)$ | Complement rule |
| $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ | Addition rule |
| $P(A \cap B) = P(A \mid B) \cdot P(B)$ | Multiplication rule |
| $P_{\text{implied}} = \frac{1}{d}$ | Implied probability from decimal odds |
| $P_{\text{imp}}^{+} = \frac{100}{A + 100}$ | Implied probability from positive American odds |
| $P_{\text{imp}}^{-} = \frac{\lvert A \rvert}{\lvert A \rvert + 100}$ | Implied probability from negative American odds |
| $d = 1 + \frac{A}{100}$ (if $A > 0$) | Positive American to decimal |
| $d = 1 + \frac{100}{\lvert A \rvert}$ (if $A < 0$) | Negative American to decimal |
| $\text{Overround} = \sum \frac{1}{d_i} - 1$ | Overround from decimal odds |
| $\text{Hold} = 1 - \frac{1}{\sum p_i}$ | Hold percentage |
| $P(\text{win}) = \Phi\left(\frac{\mu}{\sigma}\right)$ | Margin to win probability |
| $\text{Brier} = \frac{1}{N}\sum(p_i - o_i)^2$ | Brier score |

Key Code Patterns

In this chapter, we implemented:

  • simulate_coin_flips() --- Demonstrates the law of large numbers and builds intuition for probability convergence.
  • OddsConverter --- A complete class for converting between all five major odds formats. Use this as a utility throughout your betting operations.
  • analyze_market() --- Extracts implied probabilities from a set of odds and calculates the overround. Use this every time you evaluate a new market.
  • MarketOdds and compare_sportsbooks() --- Tools for comparing margins across sportsbooks. Essential for line shopping.
  • SimpleRatingSystem --- A basic power-rating model that estimates win probabilities from team strength. This is the skeleton upon which more sophisticated models (Chapters 12--15) are built.
  • calculate_brier_score() and calibration_analysis() --- Calibration tools that tell you whether your probability estimates are any good. Run these regularly.

Decision Framework: Choosing Odds Formats

When should you use which format? Here is a practical guide:

| Situation | Best Format | Reason |
|---|---|---|
| Calculating payouts quickly | Decimal | Simple multiplication: stake × odds = payout |
| Comparing odds across books | Decimal | Higher number always = better odds for bettor |
| Communicating with US bettors | American | Industry standard in the United States |
| Communicating with UK/Irish bettors | Fractional | Cultural convention |
| Calculating implied probability | Decimal | Simplest formula: 1/d |
| Building models and spreadsheets | Decimal | Easiest arithmetic; avoids positive/negative split |
| Calculating expected value | Decimal | EV = (prob × (odds - 1)) - ((1 - prob) × 1) |
| Assessing the overround quickly | Decimal | Sum of 1/d for all outcomes; subtract 1 |

Our Recommendation: If you are building a systematic approach to sports betting --- which this book assumes you are --- use decimal odds as your default internal format. Convert to American or fractional only when communicating with others or interfacing with specific sportsbook platforms that require it. The OddsConverter class from Section 2.2 handles all conversions.


What's Next

You now understand how to quantify uncertainty (probability), how sportsbooks express their prices (odds), and how to extract the probability the market is assigning to each outcome (implied probability). But knowing the probability of an event is only half the battle.

In Chapter 3: Expected Value and the Betting Edge, we will combine probability with payouts to answer the most important question in sports betting: Is this bet worth making? We will formalize the concept of expected value (EV), show you exactly how to calculate whether you have an edge on a given bet, and introduce the mathematics that distinguish a bet with positive expected value from one that merely feels good. Expected value is where probability meets profitability --- and it is the single concept that separates winning bettors from losing ones.

We will also explore why a bet can have positive expected value and still lose, why a bet can have negative expected value and still win, and why thinking in terms of EV rather than outcomes is the most important mental shift you will ever make as a bettor.

