Learning Objectives

  • Derive and apply Bayes' theorem to update sports predictions as new evidence arrives
  • Construct informative prior distributions from historical sports data and domain knowledge
  • Perform sequential Bayesian updating to track team strength throughout a season
  • Distinguish when Bayesian methods offer clear advantages over frequentist approaches in betting contexts
  • Build practical Bayesian models for sports prediction using PyMC, including hierarchical team-strength models

Chapter 10: Bayesian Thinking for Bettors

"The essence of the Bayesian approach is to provide a mathematical rule explaining how you should change your existing beliefs in the light of new evidence." --- Sharon Bertsch McGrayne, The Theory That Would Not Die

Chapter Overview

Every time you watch a game, read an injury report, or observe a line movement, you are doing something fundamentally Bayesian: you are updating what you believe. You started the season with some sense of how good each team is. Ten games in, those beliefs have shifted. By the playoffs, they have shifted again. The question is not whether you update your beliefs---you do, inevitably---but whether you update them well.

This chapter teaches you to update them well.

Bayesian statistics provides a formal, mathematically rigorous framework for combining prior knowledge with observed evidence to produce updated beliefs. For sports bettors, this framework is not merely elegant---it is practical. Betting markets reward you for having better-calibrated beliefs than the crowd. Bayesian methods give you the machinery to calibrate precisely.

In Chapter 7, we built the probability foundations that underpin all of our statistical reasoning. In Chapter 8, we learned to describe data through distributions and summary statistics. In Chapter 9, we developed hypothesis testing and regression tools for analyzing sports data. Now, in this capstone chapter of Part II, we bring everything together under the Bayesian umbrella. You will see how priors encode the knowledge from Chapter 8, how likelihoods connect to the distributions we studied, and how posterior inference extends the hypothesis testing framework of Chapter 9 into something more natural for the bettor's workflow.

By the end of this chapter, you will not just understand Bayes' theorem---you will think Bayesianly. And that shift in thinking, more than any single formula or code snippet, is what will make you a better bettor.


10.1 Bayes' Theorem and Conditional Probability

The Foundation: Conditional Probability

Before we derive Bayes' theorem, we need to be precise about conditional probability. Recall from Chapter 7 that the conditional probability of event $A$ given event $B$ is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

This tells us: among the scenarios where $B$ occurs, what fraction of them also include $A$? It is a narrowing of the universe. Instead of asking "how likely is $A$?" we ask "how likely is $A$, given that we know $B$ happened?"

For bettors, conditional probability is the bread and butter of analysis. We rarely care about unconditional probabilities. We do not ask "what is the probability the Chiefs win?" in a vacuum. We ask "what is the probability the Chiefs win, given that Mahomes is playing, the game is at home, the opponent is the Raiders, and the weather is clear?"

Deriving Bayes' Theorem

From the definition of conditional probability, we can write two expressions for the joint probability $P(A \cap B)$:

$$P(A \cap B) = P(A \mid B) \cdot P(B)$$

$$P(A \cap B) = P(B \mid A) \cdot P(A)$$

Since both expressions equal $P(A \cap B)$, we can set them equal:

$$P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$$

Dividing both sides by $P(B)$:

$$\boxed{P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}}$$

This is Bayes' theorem. Simple to derive, profound in its implications.

Let us name the parts, because the terminology is essential:

| Symbol | Component | Name | Meaning |
| --- | --- | --- | --- |
| $P(A \mid B)$ | Posterior | Posterior probability | Your updated belief about $A$ after seeing $B$ |
| $P(A)$ | Prior | Prior probability | Your belief about $A$ before seeing $B$ |
| $P(B \mid A)$ | Likelihood | Likelihood | How likely the evidence $B$ is if $A$ is true |
| $P(B)$ | Evidence | Marginal likelihood / Evidence | The total probability of seeing $B$ under all scenarios |

The relationship is often written in its proportional form, which strips away the normalizing constant:

$$\text{Posterior} \propto \text{Prior} \times \text{Likelihood}$$

$$P(A \mid B) \propto P(A) \cdot P(B \mid A)$$

This proportional form is remarkably useful in practice. It says: to update your belief, take what you believed before (the prior), weight it by how consistent the evidence is with that belief (the likelihood), and renormalize.

The Medical Test Analogy

Before we apply Bayes' theorem to sports, let us build intuition with the classic medical testing example, because it illuminates a cognitive trap that bettors fall into constantly.

Suppose a disease affects 1 in 1,000 people. A test for this disease has a 99% true positive rate (sensitivity) and a 1% false positive rate. You test positive. What is the probability you actually have the disease?

Most people intuitively guess around 99%. The correct answer is startlingly different.

Let $D$ = having the disease and $+$ = testing positive.

  • $P(D) = 0.001$ (prior: disease is rare)
  • $P(+ \mid D) = 0.99$ (likelihood: test catches real cases)
  • $P(+ \mid \neg D) = 0.01$ (false positive rate)

We need $P(D \mid +)$. First, compute the evidence $P(+)$ using the law of total probability:

$$P(+) = P(+ \mid D) \cdot P(D) + P(+ \mid \neg D) \cdot P(\neg D)$$

$$P(+) = 0.99 \times 0.001 + 0.01 \times 0.999 = 0.00099 + 0.00999 = 0.01098$$

Now apply Bayes' theorem:

$$P(D \mid +) = \frac{P(+ \mid D) \cdot P(D)}{P(+)} = \frac{0.99 \times 0.001}{0.01098} \approx 0.0902$$

About 9%. Not 99%. The base rate (prior) matters enormously.

Key Insight for Bettors: This is exactly the trap you fall into when you see a "strong signal" but ignore the base rate. A team goes 4-0 to start the season---surely they are elite? But most teams that start 4-0 regress. The base rate of truly elite teams is low. Your prior matters.

Sports Application: Championship Probability After 10 Games

Let us now apply Bayes' theorem to a concrete sports question. Suppose before the NFL season, you estimate the Kansas City Chiefs have a 15% probability of winning the Super Bowl based on preseason analysis (roster, coaching, schedule). This is your prior: $P(\text{Champ}) = 0.15$.

After 10 games, the Chiefs are 8-2. You need to compute $P(\text{Champ} \mid \text{8-2 start})$.

To apply Bayes' theorem, you need the likelihood: how likely is an 8-2 start for teams that go on to win the championship versus teams that do not?

From historical data (2002--2024 NFL seasons), suppose we find:

  • $P(\text{8-2 or better} \mid \text{Champ}) = 0.65$ --- 65% of eventual champions started 8-2 or better through 10 games
  • $P(\text{8-2 or better} \mid \neg\text{Champ}) = 0.18$ --- 18% of non-champions had such a start

Now apply Bayes' theorem:

$$P(\text{Champ} \mid \text{8-2}) = \frac{P(\text{8-2} \mid \text{Champ}) \cdot P(\text{Champ})}{P(\text{8-2})}$$

Compute the evidence:

$$P(\text{8-2}) = 0.65 \times 0.15 + 0.18 \times 0.85 = 0.0975 + 0.153 = 0.2505$$

$$P(\text{Champ} \mid \text{8-2}) = \frac{0.65 \times 0.15}{0.2505} = \frac{0.0975}{0.2505} \approx 0.389$$

The Chiefs' championship probability rose from 15% to about 39% after the strong start. A significant update, but nowhere near certainty. The prior still exerts a strong pull: even with an excellent record, most teams do not win the Super Bowl because the playoffs are a single-elimination gauntlet.

Python Implementation: Bayesian Update Calculator

import numpy as np

def bayesian_update(prior, likelihood_given_true, likelihood_given_false):
    """
    Apply Bayes' theorem for a binary hypothesis.

    Parameters
    ----------
    prior : float
        Prior probability of the hypothesis being true, P(H).
    likelihood_given_true : float
        P(Evidence | H is true).
    likelihood_given_false : float
        P(Evidence | H is false).

    Returns
    -------
    posterior : float
        Updated probability P(H | Evidence).
    bayes_factor : float
        The Bayes factor = likelihood_given_true / likelihood_given_false.
        Values > 1 mean evidence supports H; < 1 means evidence opposes H.
    """
    evidence = (likelihood_given_true * prior +
                likelihood_given_false * (1 - prior))
    posterior = (likelihood_given_true * prior) / evidence
    bayes_factor = likelihood_given_true / likelihood_given_false

    return posterior, bayes_factor

# --- Example: Chiefs championship probability after 8-2 start ---
prior_champ = 0.15
likelihood_82_given_champ = 0.65
likelihood_82_given_not_champ = 0.18

posterior, bf = bayesian_update(prior_champ,
                                 likelihood_82_given_champ,
                                 likelihood_82_given_not_champ)

print(f"Prior P(Champ):           {prior_champ:.3f}")
print(f"Likelihood (Champ):       {likelihood_82_given_champ:.3f}")
print(f"Likelihood (Not Champ):   {likelihood_82_given_not_champ:.3f}")
print(f"Posterior P(Champ|8-2):   {posterior:.3f}")
print(f"Bayes Factor:             {bf:.2f}")
print(f"\nInterpretation: The 8-2 start is {bf:.1f}x more likely for")
print(f"eventual champions than non-champions, moving our estimate")
print(f"from {prior_champ:.0%} to {posterior:.1%}.")

# --- Multiple pieces of evidence: sequential updating ---
# After 8-2 start, we also learn the Chiefs have the #1 scoring offense
# P(#1 offense | Champ) = 0.30, P(#1 offense | Not Champ) = 0.03

posterior_2, bf_2 = bayesian_update(
    prior=posterior,  # Previous posterior becomes new prior
    likelihood_given_true=0.30,
    likelihood_given_false=0.03
)

print(f"\n--- After additional evidence: #1 scoring offense ---")
print(f"Updated Posterior:        {posterior_2:.3f}")
print(f"Bayes Factor:             {bf_2:.2f}")
Output:

Prior P(Champ):           0.150
Likelihood (Champ):       0.650
Likelihood (Not Champ):   0.180
Posterior P(Champ|8-2):   0.389
Bayes Factor:             3.61

Interpretation: The 8-2 start is 3.6x more likely for
eventual champions than non-champions, moving our estimate
from 15% to 38.9%.

--- After additional evidence: #1 scoring offense ---
Updated Posterior:        0.864
Bayes Factor:             10.00

Notice the sequential updating: the posterior from the first update becomes the prior for the second. This is a fundamental property of Bayesian reasoning---it is coherent under sequential evidence. The order in which you process independent pieces of evidence does not matter. You arrive at the same posterior regardless.
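
Here is a quick numerical check of that order-invariance, reusing the numbers from the Chiefs example (a minimal, self-contained sketch):

def update(prior, lik_true, lik_false):
    """One binary Bayesian update via Bayes' theorem."""
    return lik_true * prior / (lik_true * prior + lik_false * (1 - prior))

prior = 0.15
evidence = [(0.65, 0.18),   # 8-2 start
            (0.30, 0.03)]   # #1 scoring offense

p_forward = prior
for lt, lf in evidence:
    p_forward = update(p_forward, lt, lf)

p_reverse = prior
for lt, lf in reversed(evidence):
    p_reverse = update(p_reverse, lt, lf)

print(f"8-2 start, then #1 offense: {p_forward:.4f}")   # ~0.864
print(f"#1 offense, then 8-2 start: {p_reverse:.4f}")   # identical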

Callout: The Bayes Factor The Bayes factor measures the strength of evidence. A Bayes factor of 3.6 means the observed evidence is 3.6 times more likely under the hypothesis than under the alternative. Rules of thumb: 1--3 is "barely worth mentioning," 3--10 is "substantial," 10--30 is "strong," 30--100 is "very strong," and >100 is "decisive" (following Jeffreys' scale). For betting, even "substantial" evidence can be actionable if your edge compounds over many bets.


10.2 Prior Distributions and Domain Knowledge

From Point Estimates to Distributions

In Section 10.1, we worked with single probabilities---point estimates. But real uncertainty is richer than a single number. When you say "I think this team has a 55% chance of winning," you might mean that quite confidently (you would bet heavily at those odds) or quite loosely (you are really unsure and 55% is just your best guess). A prior distribution captures both your best estimate and your uncertainty about that estimate.

This is where Bayesian thinking becomes genuinely powerful for bettors. Instead of saying "this team's win probability is 0.55," you say "my belief about this team's win probability is described by a Beta(11, 9) distribution, centered near 0.55 but with meaningful spread."

Choosing Priors: The Spectrum from Ignorance to Expertise

Priors fall on a spectrum:

Uninformative (flat) priors express maximum ignorance. For a probability parameter, a Uniform(0, 1) prior---equivalently, Beta(1, 1)---says "any win rate between 0% and 100% is equally likely before I see data." This is rarely appropriate in sports because you always know something. A team is not equally likely to go 0-17 as 17-0.

Weakly informative priors nudge toward plausible values without being overly specific. For an NFL team's win rate, a Beta(4, 4) prior centers on 0.500 with moderate spread, encoding the belief that most teams are mediocre but some are quite good or quite bad. This is often a sensible default.

Informative priors encode specific domain knowledge. If you know a team returned most of its starters from a 12-5 season and added a top free agent, you might use a Beta(14, 6) prior, centering near a 0.700 win rate. This is where the Bayesian bettor gains their edge: incorporating knowledge the market may be slow to price in.

| Prior Type | Example (Win Rate) | Effective Sample Size | When to Use |
| --- | --- | --- | --- |
| Uninformative | Beta(1, 1) | 2 | You truly know nothing (rare in sports) |
| Weakly informative | Beta(4, 4) | 8 | Default for unknown teams or new leagues |
| Moderately informative | Beta(8, 8) | 16 | Average team, moderate confidence |
| Informative | Beta(14, 6) | 20 | Strong preseason analysis for a specific team |
| Strongly informative | Beta(30, 12) | 42 | Elite team with extensive track record |

The "effective sample size" of a Beta($\alpha$, $\beta$) prior is $\alpha + \beta$. It represents how many games' worth of conviction your prior carries. A Beta(14, 6) prior is as influential as having already observed 20 games (14 wins, 6 losses). New evidence must overcome this inertia.

Conjugate Priors: Mathematical Convenience

A conjugate prior is one where the posterior belongs to the same distribution family as the prior. This makes the math clean and the updating simple. Two conjugate pairs are especially useful for sports bettors:

Beta-Binomial (for win rates and proportions):

If the prior on a win probability $\theta$ is $\text{Beta}(\alpha, \beta)$, and you observe $w$ wins in $n$ games (a Binomial likelihood), then the posterior is:

$$\theta \mid w, n \sim \text{Beta}(\alpha + w, \beta + n - w)$$

This is beautifully simple. To update, just add wins to $\alpha$ and losses to $\beta$.

The posterior mean is:

$$E[\theta \mid \text{data}] = \frac{\alpha + w}{\alpha + \beta + n}$$

This is a weighted average of the prior mean $\frac{\alpha}{\alpha + \beta}$ and the observed win rate $\frac{w}{n}$, with weights proportional to their respective sample sizes. As $n$ grows, the data dominates; when $n$ is small, the prior dominates. This is exactly the regression-to-the-mean behavior we discussed in Chapter 9.
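
A small numerical check of this weighted average, using a hypothetical Beta(14, 6) preseason prior and a 5-5 start (a minimal sketch):

from scipy import stats

# Hypothetical: Beta(14, 6) preseason prior, then 5 wins in 10 games
alpha0, beta0 = 14, 6
wins, n = 5, 10

alpha_post, beta_post = alpha0 + wins, beta0 + (n - wins)     # Beta(19, 11)

prior_mean = alpha0 / (alpha0 + beta0)                        # 0.700
observed_rate = wins / n                                      # 0.500
posterior_mean = alpha_post / (alpha_post + beta_post)        # 0.633

# The same answer, written as the weighted average described above
w_prior = (alpha0 + beta0) / (alpha0 + beta0 + n)             # 20/30
w_data = n / (alpha0 + beta0 + n)                             # 10/30
print(f"Posterior mean:   {posterior_mean:.3f}")
print(f"Weighted average: {w_prior * prior_mean + w_data * observed_rate:.3f}")

# 90% credible interval for the win rate, straight from the posterior
lo, hi = stats.beta.ppf([0.05, 0.95], alpha_post, beta_post)
print(f"90% credible interval: [{lo:.3f}, {hi:.3f}]")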

Normal-Normal (for point spreads and scores):

If the prior on a team's true scoring margin $\mu$ is $\text{Normal}(\mu_0, \sigma_0^2)$, and we observe $n$ games with sample mean margin $\bar{x}$ and known variance $\sigma^2$, the posterior is:

$$\mu \mid \text{data} \sim \text{Normal}\left(\frac{\frac{\mu_0}{\sigma_0^2} + \frac{n\bar{x}}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}, \quad \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}\right)$$

This looks complex but has the same intuition: the posterior mean is a precision-weighted average of the prior mean and the data mean. "Precision" is the reciprocal of variance---it measures how concentrated (confident) a distribution is. Higher precision means more influence.
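
A small helper makes the precision weighting concrete (a sketch with hypothetical numbers for a team's scoring margin):

import numpy as np

def normal_normal_update(mu0, sigma0, xbar, sigma, n):
    """Conjugate update for a Normal mean with known game-level sigma."""
    prec_prior = 1 / sigma0**2          # precision of the prior
    prec_data = n / sigma**2            # precision contributed by n games
    post_var = 1 / (prec_prior + prec_data)
    post_mean = post_var * (prec_prior * mu0 + prec_data * xbar)
    return post_mean, np.sqrt(post_var)

# Hypothetical: prior belief of +3.0 points (sd 4.0); then 6 games averaging +7.5
# with a game-to-game sd of 13.5
post_mu, post_sd = normal_normal_update(mu0=3.0, sigma0=4.0, xbar=7.5, sigma=13.5, n=6)
print(f"Posterior margin belief: Normal({post_mu:.2f}, {post_sd:.2f}^2)")   # ~Normal(4.55, 3.24^2)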

Python: Constructing Priors from Historical Data

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def fit_beta_prior_from_history(historical_win_rates):
    """
    Fit a Beta distribution prior from historical team win rates.
    Uses method of moments for simplicity.

    Parameters
    ----------
    historical_win_rates : array-like
        Win rates (0 to 1) from historical seasons.

    Returns
    -------
    alpha, beta : float
        Parameters of the fitted Beta distribution.
    """
    data = np.array(historical_win_rates)
    mean = data.mean()
    var = data.var()

    # Method of moments for Beta distribution
    # mean = alpha / (alpha + beta)
    # var = alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))
    common = mean * (1 - mean) / var - 1
    alpha = mean * common
    beta = (1 - mean) * common

    return alpha, beta

# --- Historical NFL win rates (2018-2024 seasons, all 32 teams each year) ---
# Simulating realistic distribution of NFL team win rates
np.random.seed(42)
historical_wins = np.random.beta(5.5, 5.5, size=224)  # 32 teams x 7 seasons
historical_wins = np.clip(historical_wins, 0.06, 0.94)  # No 0-17 or 17-0

alpha_prior, beta_prior = fit_beta_prior_from_history(historical_wins)
print(f"Fitted Beta prior: Beta({alpha_prior:.2f}, {beta_prior:.2f})")
print(f"Prior mean: {alpha_prior/(alpha_prior+beta_prior):.3f}")
print(f"Effective sample size: {alpha_prior + beta_prior:.1f}")

# --- Visualize different priors ---
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
x = np.linspace(0, 1, 200)

# Prior choices for different scenarios
priors = {
    'Unknown Team\nBeta(4, 4)': (4, 4),
    'Average Team\nBeta(8, 8)': (8, 8),
    'Preseason Favorite\nBeta(14, 6)': (14, 6),
}

for ax, (label, (a, b)) in zip(axes, priors.items()):
    pdf = stats.beta.pdf(x, a, b)
    ax.plot(x, pdf, 'b-', linewidth=2)
    ax.fill_between(x, pdf, alpha=0.3)
    ax.set_title(label, fontsize=12)
    ax.set_xlabel('Win Rate')
    ax.set_ylabel('Density')

    # Mark the mean and 90% credible interval
    mean = a / (a + b)
    lo, hi = stats.beta.ppf([0.05, 0.95], a, b)
    ax.axvline(mean, color='red', linestyle='--', label=f'Mean={mean:.2f}')
    ax.axvspan(lo, hi, alpha=0.15, color='red',
               label=f'90% CI=[{lo:.2f}, {hi:.2f}]')
    ax.legend(fontsize=9)

plt.tight_layout()
plt.savefig('prior_distributions.png', dpi=150, bbox_inches='tight')
plt.show()

# --- Setting an informative prior for a specific team ---
# The 2024 Chiefs: won Super Bowl, returning core players
# Historical: Super Bowl winners average ~0.72 win rate next season
# But regression to mean suggests maybe ~0.65
# Our prior: Beta(13.5, 7.5) -> mean 0.643, effective n = 21

alpha_chiefs = 13.5
beta_chiefs = 7.5
mean_prior = alpha_chiefs / (alpha_chiefs + beta_chiefs)
lo, hi = stats.beta.ppf([0.05, 0.95], alpha_chiefs, beta_chiefs)
print(f"\nChiefs Prior: Beta({alpha_chiefs}, {beta_chiefs})")
print(f"  Mean win rate: {mean_prior:.3f} ({mean_prior*17:.1f} wins in 17 games)")
print(f"  90% credible interval: [{lo:.3f}, {hi:.3f}]")
print(f"  In wins: [{lo*17:.1f}, {hi*17:.1f}]")

Sensitivity Analysis: How Much Do Priors Matter?

A common (and legitimate) objection to Bayesian methods is: "Different priors give different answers. Isn't that subjective?" The answer is nuanced, and important for bettors to understand.

When priors matter a lot:

  • Early in the season (small $n$)
  • When teams have limited track records (expansion teams, major roster overhauls)
  • For rare events (championship probabilities, perfect seasons)

When priors barely matter:

  • Late in the season (large $n$)
  • For well-established quantities (league-average scoring)
  • When evidence is overwhelming

The key insight: with enough data, all reasonable priors converge to the same posterior. This is the Bayesian convergence theorem (also known as Cromwell's rule in practice---as long as your prior does not assign zero probability to the truth, you will eventually find it). The question is how much data you need.

For a Beta-Binomial model, the prior's weight in the posterior mean is its effective sample size divided by the total sample size (prior plus data). After 17 NFL games, a Beta(8, 8) prior (effective n = 16) still contributes about half the weight. After 82 NBA games, that same prior contributes only about 16%, and after a 162-game MLB season, under 10%.

Practical Rule for Bettors: In the NFL (16--17 games), your prior matters all season long. Choose it carefully. In the NBA or MLB (82 or 162 games), priors matter mainly in the first month. After that, the data speaks for itself.
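
The arithmetic behind this rule is easy to verify (a minimal sketch for a Beta(8, 8) prior):

# Weight the prior carries in the posterior mean: ESS / (ESS + games played)
prior_ess = 8 + 8
for games in [4, 17, 30, 82, 162]:
    weight = prior_ess / (prior_ess + games)
    print(f"After {games:>3} games: prior weight = {weight:.0%}")

# After   4 games: prior weight = 80%
# After  17 games: prior weight = 48%
# After  30 games: prior weight = 35%
# After  82 games: prior weight = 16%
# After 162 games: prior weight = 9%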

Worked Example: Setting a Prior for a New NFL Team

The 2002 Houston Texans were an expansion franchise. How should a Bayesian bettor have set a prior for their win rate?

Step 1: Historical base rate. Expansion teams in the modern NFL (post-merger) have averaged about 3--4 wins in their first season. That is roughly a 0.20--0.25 win rate.

Step 2: Uncertainty. There is substantial variation. The 1995 Jaguars won 4 games; the 1976 Buccaneers won 0. The spread is wide.

Step 3: Fit a prior. A Beta(3, 11) prior gives a mean of 0.214 (about 3.4 wins in 16 games) with a 90% credible interval of roughly [0.07, 0.40], or 1 to 6 wins. This captures the range of plausible outcomes for an expansion team.

Step 4: Update. The Texans started 1-4. Using the Beta-Binomial conjugacy:

$$\text{Posterior} = \text{Beta}(3 + 1, 11 + 4) = \text{Beta}(4, 15)$$

Posterior mean: $\frac{4}{19} \approx 0.211$ (about 3.4 wins over 16 games).

The posterior barely moved---because the observed data (1-4, a 0.200 win rate) was almost exactly what the prior predicted. When your prior is well-calibrated, new evidence confirms rather than overturns it.


10.3 Updating Beliefs with New Evidence

Sequential Bayesian Updating

One of the most powerful features of Bayesian inference is its natural handling of sequential evidence. As each week of the NFL season unfolds, you do not need to start from scratch. You simply update:

$$\text{Posterior}_{\text{week } t} = \text{Prior}_{\text{week } t} \times \text{Likelihood}_{\text{week } t}$$

where $\text{Prior}_{\text{week } t} = \text{Posterior}_{\text{week } t-1}$.

This creates a chain of updates, each building on the last. Your belief distribution evolves week by week, game by game, evidence by evidence.

How Quickly Posteriors Dominate Priors

The speed of convergence depends on three factors:

  1. Strength of the prior (effective sample size): Stronger priors take longer to overcome.
  2. Informativeness of the data: Blowout wins are more informative than narrow victories.
  3. Consistency of the evidence: A team that wins every game moves the posterior faster than one alternating wins and losses.

For a Beta($\alpha$, $\beta$) prior updated with $w$ wins in $n$ games, the posterior mean is:

$$\hat{\theta}_{\text{posterior}} = \frac{\alpha + w}{\alpha + \beta + n} = \underbrace{\frac{\alpha + \beta}{\alpha + \beta + n}}_{\text{weight on prior}} \cdot \underbrace{\frac{\alpha}{\alpha + \beta}}_{\text{prior mean}} + \underbrace{\frac{n}{\alpha + \beta + n}}_{\text{weight on data}} \cdot \underbrace{\frac{w}{n}}_{\text{observed rate}}$$

This decomposition makes the shrinkage explicit. The posterior is always a weighted average, and the weights shift toward data as $n$ grows.

Shrinkage and Regression to the Mean

Shrinkage is the Bayesian mechanism behind regression to the mean, the phenomenon we explored in Chapter 9. Early-season extremes are "shrunk" toward the prior mean because the prior carries substantial weight relative to the small data sample.

Consider two NFL teams after Week 4:

  • Team A: 4-0 (observed win rate = 1.000)
  • Team B: 0-4 (observed win rate = 0.000)

With a Beta(8, 8) prior (league-average expectation, effective $n$ = 16):

Team A posterior: Beta(8 + 4, 8 + 0) = Beta(12, 8), with posterior mean 12/20 = 0.600 (shrunk from 1.000 toward 0.500).

Team B posterior: Beta(8 + 0, 8 + 4) = Beta(8, 12), with posterior mean 8/20 = 0.400 (shrunk from 0.000 toward 0.500).

The Bayesian model correctly recognizes that going 4-0 in four games does not make you a 100% favorite in every game, nor does 0-4 make you hopeless. This is precisely the kind of overcorrection that recreational bettors make and sharp bettors exploit.

Callout: Why Bayesian Shrinkage is Worth Money Sportsbooks and sharp bettors have long known that early-season records are misleading. The betting market itself performs a kind of implicit Bayesian shrinkage: lines after Week 4 reflect both the early results and preseason expectations. If you can perform this shrinkage more accurately than the market, you have an edge. Specifically, if the market over-reacts to early results (weights data too heavily relative to a reasonable prior), you should bet against the hot starters and for the cold starters.

Python: Tracking a Team Through the Season

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def track_season_bayesian(prior_alpha, prior_beta, game_results, team_name):
    """
    Track Bayesian posterior for a team's win rate through the season.

    Parameters
    ----------
    prior_alpha, prior_beta : float
        Beta prior parameters.
    game_results : list of int
        1 for win, 0 for loss, for each game in order.
    team_name : str
        For labeling.

    Returns
    -------
    dict with posterior history.
    """
    n_games = len(game_results)

    # Track posterior parameters after each game
    alphas = [prior_alpha]
    betas_ = [prior_beta]
    means = [prior_alpha / (prior_alpha + prior_beta)]
    ci_lows = [stats.beta.ppf(0.05, prior_alpha, prior_beta)]
    ci_highs = [stats.beta.ppf(0.95, prior_alpha, prior_beta)]

    a, b = prior_alpha, prior_beta
    for result in game_results:
        if result == 1:
            a += 1
        else:
            b += 1
        alphas.append(a)
        betas_.append(b)
        means.append(a / (a + b))
        ci_lows.append(stats.beta.ppf(0.05, a, b))
        ci_highs.append(stats.beta.ppf(0.95, a, b))

    return {
        'team': team_name,
        'weeks': list(range(n_games + 1)),
        'alphas': alphas, 'betas': betas_,
        'means': means, 'ci_lows': ci_lows, 'ci_highs': ci_highs,
        'games': game_results,
    }

# --- Simulate an NFL season for two teams ---
# Team Alpha: preseason favorite, starts hot then cools off
# Prior: Beta(13, 7) -> expected ~0.65 win rate
team_alpha_results = [1,1,1,1,0,1,1,0,0,1,0,0,1,0,1,0,1]  # 10-7

# Team Beta: preseason underdog, starts cold then heats up
# Prior: Beta(6, 12) -> expected ~0.33 win rate
team_beta_results =  [0,0,1,0,0,1,0,1,1,1,1,0,1,1,1,1,0]  # 10-7

alpha_track = track_season_bayesian(13, 7, team_alpha_results, "Alpha (Favorite)")
beta_track = track_season_bayesian(6, 12, team_beta_results, "Beta (Underdog)")

# --- Visualization ---
fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

for ax, track in zip(axes, [alpha_track, beta_track]):
    weeks = track['weeks']

    # Plot posterior mean and credible interval
    ax.plot(weeks, track['means'], 'b-o', linewidth=2, markersize=4,
            label='Posterior Mean', zorder=3)
    ax.fill_between(weeks, track['ci_lows'], track['ci_highs'],
                    alpha=0.25, color='blue', label='90% Credible Interval')

    # Mark actual game results
    for i, result in enumerate(track['games']):
        color = 'green' if result == 1 else 'red'
        marker = '^' if result == 1 else 'v'
        ax.plot(i + 1, track['means'][i + 1], marker=marker,
                color=color, markersize=10, zorder=4)

    # Reference line at 0.5
    ax.axhline(0.5, color='gray', linestyle=':', alpha=0.5)

    # Observed win rate line
    cumulative_wins = np.cumsum(track['games'])
    game_numbers = np.arange(1, len(track['games']) + 1)
    observed_rates = cumulative_wins / game_numbers
    ax.plot(game_numbers, observed_rates, 'r--', alpha=0.5,
            label='Raw Win Rate (no prior)')

    final_record = f"{sum(track['games'])}-{len(track['games'])-sum(track['games'])}"
    ax.set_title(f"{track['team']} | Final: {final_record}", fontsize=13)
    ax.set_ylabel('Win Probability')
    ax.legend(loc='best')
    ax.set_ylim(0, 1)

axes[1].set_xlabel('Week (0 = Preseason)')
plt.suptitle('Bayesian Tracking Through an NFL Season', fontsize=15, y=1.01)
plt.tight_layout()
plt.savefig('bayesian_season_tracking.png', dpi=150, bbox_inches='tight')
plt.show()

# --- Key insight: compare final posteriors ---
print("=== End-of-Season Posteriors ===")
for track in [alpha_track, beta_track]:
    a, b = track['alphas'][-1], track['betas'][-1]
    mean = a / (a + b)
    lo, hi = stats.beta.ppf([0.05, 0.95], a, b)
    prior_mean = track['alphas'][0] / (track['alphas'][0] + track['betas'][0])
    print(f"\n{track['team']}:")
    print(f"  Prior mean:     {prior_mean:.3f}")
    print(f"  Final record:   {sum(track['games'])}-{len(track['games'])-sum(track['games'])}")
    print(f"  Posterior mean:  {mean:.3f}")
    print(f"  90% CI:          [{lo:.3f}, {hi:.3f}]")
    print(f"  Posterior Beta:  Beta({a:.0f}, {b:.0f})")

The visualization reveals a critical dynamic. Team Alpha and Team Beta both finished 10-7, but their posteriors differ because they started from different priors. Team Alpha (the preseason favorite with a strong prior) ends with a higher posterior mean than Team Beta (the underdog). Is this correct? In many cases, yes: a preseason favorite that goes 10-7 probably is better than a preseason underdog that goes 10-7, because the prior encoded real information about roster quality, coaching, and historical performance.

However, this is also where Bayesian thinking demands intellectual honesty. If your prior was wrong---if Team Beta genuinely improved through trades or player development---then clinging to an outdated prior is a liability, not an asset. The Bayesian framework gives you the math, but you must supply the judgment about when your prior needs revision versus when it should be trusted.

Early-Season Wisdom: The Small-Sample Problem

The first month of any sports season is a minefield for bettors who do not think Bayesianly. Consider these real phenomena:

  • NBA: Teams that start 5-0 finish the season above 0.600 only about 70% of the time.
  • NFL: Teams that start 3-0 miss the playoffs roughly 30% of the time.
  • MLB: A team's April record correlates only modestly (~0.35) with its final record.

Each of these reflects the same statistical reality: small samples are noisy. The Bayesian approach handles this naturally through the prior, while a naive "what have you done lately" approach would wildly overweight early results.

The practical implication: early in the season, you can often find value by betting against teams with extreme records, because the market tends to overweight recent performance. This is the Bayesian shrinkage edge, and it is one of the most reliable sources of value in sports betting.


10.4 Bayesian vs. Frequentist Approaches in Betting

The Philosophical Divide

The Bayesian-frequentist debate is one of the longest-running arguments in statistics. For most of its history, this was an academic dispute. For bettors, it has practical stakes.

Frequentist statistics treats probability as a long-run frequency. A parameter (like a team's "true" win rate) is fixed but unknown. Data are random. We make inferences through procedures (confidence intervals, p-values) that have good long-run properties. We do not put probability distributions on parameters---that would be "subjective."

Bayesian statistics treats probability as a degree of belief. Parameters are uncertain, and we describe that uncertainty with probability distributions. Data are observed (fixed), and we update our beliefs in light of them. Priors encode what we knew before; posteriors encode what we know after.

For bettors, the Bayesian interpretation is more natural. When you say "I think there is a 60% chance the Chiefs win this game," you are making a Bayesian statement. You do not mean "if this game were played infinitely many times under identical conditions, the Chiefs would win 60% of the time." You mean "given everything I know, 60% represents my degree of belief." Games are played once. Seasons happen once. The frequentist long-run may never arrive.

Practical Differences: A Comparison

| Aspect | Frequentist Approach | Bayesian Approach |
| --- | --- | --- |
| Parameter interpretation | Fixed, unknown | Random, has a distribution |
| Prior information | Not formally incorporated | Formally incorporated via prior |
| Results | Point estimates, confidence intervals, p-values | Posterior distributions, credible intervals |
| Small samples | Often unreliable or impossible | Naturally handled via prior |
| Interpretation of intervals | "95% of intervals constructed this way contain the true value" | "There is a 95% probability the parameter lies in this interval" |
| Sequential analysis | Complicated (multiple testing corrections needed) | Natural (just keep updating) |
| Computational complexity | Usually simpler | Can be computationally expensive |
| Subjectivity | Implicit (choice of test, significance level) | Explicit (choice of prior) |
| Prediction | Plug-in estimates or prediction intervals | Full posterior predictive distributions |

When Bayesian is Clearly Better

Small samples with strong prior information. In the NFL, you get 17 games per season. If you have reliable preseason information (from roster analysis, historical patterns, coaching track records), a Bayesian model that formally incorporates this prior will outperform a frequentist model that must ignore it. This is not a theoretical argument---it is measurable in prediction accuracy.

Sequential decision-making. Betting is inherently sequential. You make decisions week by week, incorporating new information each time. The Bayesian framework handles this perfectly: last week's posterior becomes this week's prior. Frequentist methods require ad hoc adjustments for sequential analysis, and getting the multiple-testing corrections right is notoriously tricky.

Regularization and hierarchical modeling. When you model multiple related quantities---say, all 32 NFL team strengths simultaneously---Bayesian hierarchical models naturally "borrow strength" across teams, producing better estimates for each individual team. The Bayesian framework makes the regularization explicit and principled.

Communicating uncertainty. A posterior distribution says "there is a 90% probability that this team's true win rate is between 0.45 and 0.65." A frequentist confidence interval says something much harder to explain (and much less useful to a bettor): "if we repeated this procedure many times, 90% of the resulting intervals would contain the true value." The Bayesian statement directly answers the question you care about.
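
To see the practical difference, here is a small comparison for a hypothetical 12-8 team, pairing a Beta(8, 8) prior with a simple normal-approximation confidence interval (a sketch, not a full frequentist treatment):

import numpy as np
from scipy import stats

wins, losses = 12, 8
n = wins + losses

# Bayesian: Beta(8, 8) prior -> Beta(20, 16) posterior; a direct probability statement
post = stats.beta(8 + wins, 8 + losses)
cred = post.ppf([0.05, 0.95])
print(f"90% credible interval:   [{cred[0]:.3f}, {cred[1]:.3f}]")

# Frequentist: Wald interval around the raw win rate; no prior, no shrinkage
p_hat = wins / n
se = np.sqrt(p_hat * (1 - p_hat) / n)
z = stats.norm.ppf(0.95)
print(f"90% confidence interval: [{p_hat - z*se:.3f}, {p_hat + z*se:.3f}]")

The credible interval is the one you can read directly as "there is a 90% probability the true win rate lies in this range," and the prior shrinks it toward 0.500.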

When Frequentist is Fine

Large datasets with weak or no prior information. If you are analyzing 10 years of NBA play-by-play data to estimate the average value of a three-pointer, you have hundreds of thousands of data points. The prior is essentially irrelevant---any reasonable prior will be overwhelmed by the data. A simple frequentist regression will give you the same answer in a fraction of the computation time.

Standard hypothesis testing with large samples. Testing whether home-court advantage exists in the NBA? With thousands of games, a simple t-test or chi-squared test will work perfectly. The Bayesian version would give you a slightly more informative answer (a posterior on the magnitude of home-court advantage), but the extra computational cost may not be justified.

When speed matters. MCMC sampling (the workhorse of practical Bayesian computation) can be slow. If you need to fit thousands of models quickly---for instance, running a grid search over model specifications---frequentist methods that have closed-form solutions or fast optimization routines may be practically necessary.

The Pragmatic Approach: Use What Works

The best bettors are not ideologues. They are pragmatists. Here is a decision framework:

  1. Do you have meaningful prior information? If yes, lean Bayesian.
  2. Is your sample size small relative to the complexity of your question? If yes, lean Bayesian.
  3. Do you need a full probability distribution over outcomes? If yes, lean Bayesian.
  4. Is your data set large and your question simple? If yes, frequentist is fine.
  5. Do you need results in milliseconds? If yes, frequentist (or at least, pre-compute Bayesian posteriors).

In practice, most serious sports modelers use a hybrid approach: Bayesian priors for season-level team strength parameters (where samples are small), frequentist methods for play-by-play analysis (where data is abundant), and machine learning for pattern recognition in large feature spaces.

Callout: The Market's Approach Sportsbooks themselves use something that resembles Bayesian updating, even if they do not call it that. Opening lines reflect a prior (based on power ratings, historical trends, and public perception). As money comes in and information arrives, the line moves---an implicit posterior update. Understanding this process helps you identify when the market's "prior" is wrong and when its "update" is too aggressive or too timid.


10.5 Practical Bayesian Models for Sports

From Theory to Practice

The previous sections gave you the conceptual framework. Now we build real models. We will progress from simple analytical solutions to full probabilistic programming, culminating in a hierarchical model that represents the state of the art in Bayesian sports modeling.

Bayesian Hierarchical Models: The Core Idea

The most important Bayesian concept for sports modelers is the hierarchical model (also called a multilevel model). The idea is simple and powerful: individual teams are not independent. They come from a population of teams. By modeling the population distribution alongside individual team parameters, we get better estimates for every team.

In a hierarchical model for NFL team strengths:

  • Level 1 (Team): Each team $i$ has a strength parameter $\theta_i$.
  • Level 2 (League): The $\theta_i$ values are drawn from a common distribution: $\theta_i \sim \text{Normal}(\mu_{\text{league}}, \sigma_{\text{league}}^2)$.
  • Data level: Game outcomes depend on the difference in team strengths.

The beauty of this structure is partial pooling. Instead of estimating each team's strength completely independently (no pooling---high variance) or assuming all teams are equal (complete pooling---high bias), we let the data determine how much to pool. Teams with more data are pooled less; teams with less data are pooled more.

The mathematical structure is:

$$\mu_{\text{league}} \sim \text{Normal}(0, \sigma_{\mu}^2) \quad \text{(hyperprior on league mean)}$$

$$\sigma_{\text{league}} \sim \text{HalfNormal}(\tau) \quad \text{(hyperprior on league spread)}$$

$$\theta_i \sim \text{Normal}(\mu_{\text{league}}, \sigma_{\text{league}}^2) \quad \text{(team strength, for } i = 1, \ldots, 32\text{)}$$

$$y_j \sim \text{Normal}(\theta_{\text{home}_j} - \theta_{\text{away}_j} + \text{HFA}, \sigma_{\text{game}}^2) \quad \text{(observed margin for game } j\text{)}$$

where HFA is the home-field advantage parameter and $\sigma_{\text{game}}$ captures game-to-game randomness.

MCMC Basics for Practitioners

Except for simple conjugate models, Bayesian posteriors do not have closed-form solutions. We need numerical methods, and the dominant family of methods is Markov Chain Monte Carlo (MCMC).

The core idea of MCMC: we cannot directly compute the posterior distribution, but we can construct a sequence (a "chain") of random samples that, after enough steps, are distributed according to the posterior. With enough samples, we can approximate any feature of the posterior: its mean, variance, quantiles, or shape.

Key MCMC concepts for practitioners:

  • Warmup/burn-in: The chain needs time to "find" the high-probability region of the posterior. Discard early samples.
  • Chains: Run multiple independent chains and check that they agree (convergence).
  • Effective sample size: Due to autocorrelation, 4,000 MCMC samples may contain only 1,000 "effectively independent" samples.
  • Divergences: Warning signs that the sampler is struggling. Usually indicates model misspecification or parameterization issues.
  • $\hat{R}$ (R-hat): A convergence diagnostic. Values near 1.00 indicate convergence; values above 1.01 suggest problems.

Modern probabilistic programming languages (PyMC, Stan, NumPyro) use advanced MCMC algorithms---particularly the No-U-Turn Sampler (NUTS), a variant of Hamiltonian Monte Carlo---that are remarkably efficient for models with hundreds of parameters.
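
In ArviZ, the diagnostics above are typically inspected along these lines, assuming trace is the InferenceData object returned by pm.sample, as in the models that follow (a sketch):

import arviz as az

# r_hat near 1.00 and healthy effective sample sizes indicate convergence
summary = az.summary(trace)
print(summary[['r_hat', 'ess_bulk', 'ess_tail']])

# Visual check: chains should overlap and mix well
az.plot_trace(trace)

# Divergent transitions are recorded in the sample statistics;
# nonzero counts warrant attention
n_divergences = int(trace.sample_stats['diverging'].sum())
print(f"Divergent transitions: {n_divergences}")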

Python with PyMC: A Simple Bayesian Model

Let us start with a straightforward model: estimating a team's true scoring margin from observed game results.

import numpy as np
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt

# --- Observed data: a team's scoring margins in 10 games ---
# Positive = won by that many, negative = lost by that many
observed_margins = np.array([7, -3, 14, 3, -10, 21, 6, -7, 10, 3])

print(f"Observed margins: {observed_margins}")
print(f"Sample mean: {observed_margins.mean():.1f}")
print(f"Sample std:  {observed_margins.std():.1f}")
print(f"Record: {(observed_margins > 0).sum()}-{(observed_margins < 0).sum()}")

# --- Build the Bayesian model ---
with pm.Model() as margin_model:
    # Prior on true team strength (scoring margin)
    # Centered at 0 (average team) with moderate uncertainty
    # A team's true margin is unlikely to be beyond +/- 15
    mu = pm.Normal('true_margin', mu=0, sigma=7)

    # Prior on game-to-game variability
    # NFL games typically have ~13-14 point standard deviation
    sigma = pm.HalfNormal('game_variability', sigma=15)

    # Likelihood: observed margins given true strength
    y_obs = pm.Normal('observed', mu=mu, sigma=sigma,
                       observed=observed_margins)

    # Sample from the posterior
    trace = pm.sample(2000, tune=1000, chains=4,
                       random_seed=42, return_inferencedata=True)

# --- Analyze the posterior ---
print("\n=== Posterior Summary ===")
summary = az.summary(trace, var_names=['true_margin', 'game_variability'],
                      hdi_prob=0.90)
print(summary)

# --- Visualize ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Posterior of true margin
az.plot_posterior(trace, var_names=['true_margin'], ax=axes[0],
                  hdi_prob=0.90, ref_val=0)
axes[0].set_title('Posterior: True Scoring Margin')
axes[0].axvline(observed_margins.mean(), color='red', linestyle='--',
                label=f'Sample mean = {observed_margins.mean():.1f}')
axes[0].legend()

# Posterior of game variability
az.plot_posterior(trace, var_names=['game_variability'], ax=axes[1],
                  hdi_prob=0.90)
axes[1].set_title('Posterior: Game-to-Game Variability')

plt.tight_layout()
plt.savefig('bayesian_margin_model.png', dpi=150, bbox_inches='tight')
plt.show()

# --- Posterior predictive: what margin do we expect next game? ---
with margin_model:
    ppc = pm.sample_posterior_predictive(trace, random_seed=42)

next_game_margins = ppc.posterior_predictive['observed'].values.flatten()
print(f"\n=== Prediction for Next Game ===")
print(f"Expected margin: {next_game_margins.mean():.1f}")
print(f"Std deviation:   {next_game_margins.std():.1f}")
print(f"P(win):          {(next_game_margins > 0).mean():.3f}")
print(f"P(cover -3):     {(next_game_margins > 3).mean():.3f}")
print(f"P(cover -7):     {(next_game_margins > 7).mean():.3f}")

Notice the power of the posterior predictive distribution. Instead of a single point prediction ("I think they will win by 4"), we get a full distribution of plausible outcomes. This lets us directly compute probabilities relevant to betting: the probability of winning, covering a specific spread, or the game going over a certain total. No separate step is needed---it all flows naturally from the Bayesian framework.

Advanced: Hierarchical Model for NFL Team Strength

Now let us build the full hierarchical model. This is substantially more complex, but it represents the kind of model that professional sports analytics groups actually use.

import numpy as np
import pymc as pm
import arviz as az
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import spearmanr

# --- Simulate a partial NFL season (8 weeks, simplified) ---
np.random.seed(123)

n_teams = 32
n_weeks = 8

# True (hidden) team strengths - this is what we're trying to estimate
true_strengths = np.random.normal(0, 5, size=n_teams)
true_hfa = 2.5  # True home field advantage
true_sigma = 13.5  # True game-to-game noise

# Generate schedule and results
games = []
for week in range(n_weeks):
    # Simple round-robin-ish schedule (16 games per week)
    teams_available = list(range(n_teams))
    np.random.shuffle(teams_available)
    for g in range(16):
        home = teams_available[2 * g]
        away = teams_available[2 * g + 1]
        true_diff = true_strengths[home] - true_strengths[away] + true_hfa
        margin = np.random.normal(true_diff, true_sigma)
        games.append({
            'week': week + 1,
            'home_team': home,
            'away_team': away,
            'home_margin': margin
        })

df = pd.DataFrame(games)
print(f"Generated {len(df)} games over {n_weeks} weeks")
print(f"\nSample games:")
print(df.head(10).to_string(index=False))

# --- Build the hierarchical Bayesian model ---
home_idx = df['home_team'].values
away_idx = df['away_team'].values
observed_margins = df['home_margin'].values

with pm.Model() as nfl_model:
    # === Hyperpriors (league-level) ===
    # How spread out are team strengths?
    sigma_team = pm.HalfNormal('sigma_team', sigma=10)

    # === Team-level priors ===
    # Each team's strength, drawn from the league distribution
    # Centered at 0: strength is relative (average team = 0)
    team_strength = pm.Normal('team_strength', mu=0, sigma=sigma_team,
                               shape=n_teams)

    # === Game-level parameters ===
    # Home field advantage
    hfa = pm.Normal('home_field_advantage', mu=3, sigma=2)

    # Game-to-game noise
    sigma_game = pm.HalfNormal('sigma_game', sigma=15)

    # === Likelihood ===
    # Expected margin = home_strength - away_strength + HFA
    mu_game = team_strength[home_idx] - team_strength[away_idx] + hfa

    y = pm.Normal('margin', mu=mu_game, sigma=sigma_game,
                   observed=observed_margins)

    # === Sample ===
    trace_nfl = pm.sample(2000, tune=1500, chains=4,
                            random_seed=42, return_inferencedata=True,
                            target_accept=0.9)

# --- Results ---
print("\n=== Model Diagnostics ===")
summary_params = az.summary(trace_nfl,
                             var_names=['sigma_team', 'home_field_advantage',
                                        'sigma_game'],
                             hdi_prob=0.90)
print(summary_params)

# --- Compare estimated vs. true team strengths ---
team_summary = az.summary(trace_nfl, var_names=['team_strength'],
                           hdi_prob=0.90)

estimated_strengths = team_summary['mean'].values
strength_ci_low = team_summary['hdi_5%'].values
strength_ci_high = team_summary['hdi_95%'].values

# Correlation between estimated and true strengths
correlation = np.corrcoef(true_strengths, estimated_strengths)[0, 1]
print(f"\nCorrelation (true vs. estimated): {correlation:.3f}")

# Rankings comparison (Spearman rank correlation)
rank_correlation, _ = spearmanr(true_strengths, estimated_strengths)
print(f"Rank correlation: {rank_correlation:.3f}")

# --- Visualize estimated vs true strengths ---
fig, ax = plt.subplots(figsize=(10, 8))

# Sort by true strength for clarity
sort_idx = np.argsort(true_strengths)

for i, idx in enumerate(sort_idx):
    ax.errorbar(estimated_strengths[idx], i,
                xerr=[[estimated_strengths[idx] - strength_ci_low[idx]],
                      [strength_ci_high[idx] - estimated_strengths[idx]]],
                fmt='o', color='blue', capsize=3, markersize=5,
                label='Estimated (90% CI)' if i == 0 else '_nolegend_')
    ax.plot(true_strengths[idx], i, 'rx', markersize=10, markeredgewidth=2,
            label='True Strength' if i == 0 else '_nolegend_')

ax.set_yticks(range(n_teams))
ax.set_yticklabels([f'Team {sort_idx[i]}' for i in range(n_teams)], fontsize=8)
ax.set_xlabel('Team Strength (points)')
ax.set_title('Hierarchical Bayesian Team Strength Estimates\n'
             'Blue = Estimated (90% CI), Red X = True', fontsize=13)
ax.axvline(0, color='gray', linestyle=':', alpha=0.5)
ax.legend(loc='lower right')

plt.tight_layout()
plt.savefig('hierarchical_nfl_model.png', dpi=150, bbox_inches='tight')
plt.show()

# --- Predict a specific game ---
# Example: Team 0 (home) vs Team 1 (away)
team_0_samples = trace_nfl.posterior['team_strength'].values[:, :, 0].flatten()
team_1_samples = trace_nfl.posterior['team_strength'].values[:, :, 1].flatten()
hfa_samples = trace_nfl.posterior['home_field_advantage'].values.flatten()
sigma_samples = trace_nfl.posterior['sigma_game'].values.flatten()

# Predicted margin for Team 0 at home vs Team 1
pred_margin_mean = team_0_samples - team_1_samples + hfa_samples
pred_margin = np.random.normal(pred_margin_mean, sigma_samples)

print(f"\n=== Prediction: Team 0 (home) vs Team 1 ===")
print(f"True strengths: Team 0 = {true_strengths[0]:.1f}, "
      f"Team 1 = {true_strengths[1]:.1f}")
print(f"Expected margin: {pred_margin.mean():.1f}")
print(f"P(Team 0 wins):  {(pred_margin > 0).mean():.3f}")
print(f"P(Team 0 -3.5):  {(pred_margin > 3.5).mean():.3f}")
print(f"P(Total > 44):   Not estimable from margin-only model")

Let us examine what the hierarchical model achieves that a simpler model cannot:

Partial pooling in action. Teams that have played more games (or more informative games) have narrower credible intervals. Teams with fewer data points are shrunk more strongly toward the league average. This is automatic---we did not have to manually tune any shrinkage parameters.

Uncertainty propagation. When we predict a game outcome, we do not just get a single number. We get a full distribution that accounts for uncertainty in every parameter: team strengths, home-field advantage, and game-to-game variability. This is critical for proper bet sizing (as we will discuss in Part III on the Kelly criterion).

Principled handling of home-field advantage. Instead of assuming a fixed home-field advantage (e.g., 3 points), we estimate it from the data with a posterior distribution. If the data suggest home-field advantage is smaller than historically assumed (as has been the trend since 2020), our model automatically adapts.
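
To see the partial pooling directly, you can compare the hierarchical estimates with naive per-team averages computed from the same simulated season (a sketch that reuses df, n_teams, estimated_strengths, and true_strengths from the block above):

import numpy as np

# Naive, no-pooling estimate: each team's average margin across its games
# (home-field advantage roughly cancels when home and away games are combined)
raw_estimates = np.zeros(n_teams)
for t in range(n_teams):
    home_margins = df.loc[df['home_team'] == t, 'home_margin']
    away_margins = -df.loc[df['away_team'] == t, 'home_margin']
    raw_estimates[t] = np.concatenate([home_margins, away_margins]).mean()

# Shrinkage shows up as a narrower spread of estimates
print(f"Spread of raw estimates:          {raw_estimates.std():.2f}")
print(f"Spread of hierarchical estimates: {estimated_strengths.std():.2f}")
print(f"Spread of true strengths:         {true_strengths.std():.2f}")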

Worked Example: Full Bayesian Prediction for an NFL Game

Let us walk through the complete workflow a Bayesian bettor would follow, from priors to prediction to bet evaluation.

Scenario: It is Week 12 of the NFL season. The Buffalo Bills (8-3) are hosting the Miami Dolphins (6-5). The market line is Bills -3.5.

Step 1: Set up priors. Before the season, based on roster analysis, draft capital, and coaching, we set:

  • Bills prior strength: Normal(4.0, 3.0) --- a good team, expected to win by ~4 on a neutral field
  • Dolphins prior strength: Normal(1.5, 3.0) --- a slightly above-average team

Step 2: Incorporate season data. After 11 games, our hierarchical model has updated both teams' strength estimates based on their results and the strength of their opponents. Suppose the posteriors are:

  • Bills posterior strength: Normal(5.2, 1.8) --- data confirmed they are strong
  • Dolphins posterior strength: Normal(-0.3, 2.0) --- weaker than expected; the prior has been pulled down

Step 3: Generate game prediction.

Expected margin = Bills strength - Dolphins strength + HFA

$$\text{E}[\text{margin}] = 5.2 - (-0.3) + 2.5 = 8.0$$

But we need the full distribution, accounting for parameter uncertainty and game noise:

$$\text{margin} \sim \text{Normal}\left(8.0, \sqrt{1.8^2 + 2.0^2 + 13.5^2}\right) \approx \text{Normal}(8.0, 13.8)$$

Step 4: Evaluate the bet.

$$P(\text{Bills cover } -3.5) = P(\text{margin} > 3.5) = \Phi\left(\frac{8.0 - 3.5}{13.8}\right) = \Phi(0.326) \approx 0.628$$

Our model says the Bills cover about 62.8% of the time. At standard -110 juice, we need 52.4% to break even. This is a substantial edge, suggesting a bet on the Bills -3.5.

Step 5: Assess robustness. A Bayesian bettor does not stop at the point estimate. They ask:

  • How sensitive is this to the prior? If we used a weaker prior for the Bills, the expected margin might drop to 6.5 instead of 8.0, and the cover probability falls to roughly 58%. Still a bet, but with less confidence.
  • How sensitive is this to the home-field advantage estimate? If HFA is only 1.5 instead of 2.5, the expected margin drops to 7.0 and the cover probability is roughly 60%. Still a bet.

This kind of sensitivity analysis is central to Bayesian betting. You do not just ask "what does my model say?" You ask "how much would the answer change if my assumptions were somewhat wrong?"
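
The calculations above are easy to wrap in a small helper so the sensitivity analysis takes seconds (a sketch using the numbers from this example):

import numpy as np
from scipy import stats

def cover_probability(home_strength, away_strength, hfa, line, param_sd, game_sd=13.5):
    """P(home margin > line), folding parameter uncertainty into the margin sd."""
    expected_margin = home_strength - away_strength + hfa
    total_sd = np.sqrt(param_sd**2 + game_sd**2)
    return 1 - stats.norm.cdf(line, loc=expected_margin, scale=total_sd)

param_sd = np.sqrt(1.8**2 + 2.0**2)   # combined posterior sd of the two team strengths

# Base case, then the two sensitivity scenarios described above
print(f"Base case (Bills 5.2, HFA 2.5): {cover_probability(5.2, -0.3, 2.5, 3.5, param_sd):.3f}")
print(f"Weaker Bills prior (3.7):       {cover_probability(3.7, -0.3, 2.5, 3.5, param_sd):.3f}")
print(f"Smaller HFA (1.5):              {cover_probability(5.2, -0.3, 1.5, 3.5, param_sd):.3f}")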

Bayesian Regression for Game Prediction

Beyond the hierarchical team-strength model, Bayesian regression offers a flexible framework for incorporating multiple predictors. The Bayesian version of the regression models we built in Chapter 9 adds uncertainty quantification to every coefficient.

A Bayesian regression for predicting game margins might include:

$$\text{margin}_j = \beta_0 + \beta_1 \cdot \text{strength\_diff}_j + \beta_2 \cdot \text{rest\_diff}_j + \beta_3 \cdot \text{travel\_dist}_j + \beta_4 \cdot \text{altitude}_j + \epsilon_j$$

with priors on each $\beta$:

$$\beta_0 \sim \text{Normal}(0, 5) \quad \text{(intercept: home advantage-like term)}$$

$$\beta_1 \sim \text{Normal}(1, 0.5) \quad \text{(strength difference should have coefficient near 1)}$$

$$\beta_2 \sim \text{Normal}(0, 2) \quad \text{(rest advantage effect)}$$

$$\beta_3 \sim \text{Normal}(0, 1) \quad \text{(travel effect; expect small and negative)}$$

$$\beta_4 \sim \text{Normal}(0, 2) \quad \text{(altitude effect; Denver)}$$

The priors encode domain knowledge: we expect the strength difference coefficient to be near 1 (if teams are rated in points), the travel effect to be small and probably negative, and so on. The data refines these expectations.

The key advantage over frequentist regression: when you have only a few games of data involving high-altitude venues or extreme rest differences, the Bayesian priors prevent the model from fitting noise. The frequentist model might estimate a wild coefficient from three data points; the Bayesian model will shrink toward the prior, producing a more reliable estimate.
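
A PyMC sketch of this regression follows; it assumes a hypothetical DataFrame named games with columns strength_diff, rest_diff, travel_dist, altitude, and margin (one row per game), mirroring the priors listed above:

import pymc as pm

with pm.Model() as margin_regression:
    # Priors encode the domain knowledge described above
    beta0 = pm.Normal('intercept', mu=0, sigma=5)
    beta1 = pm.Normal('beta_strength_diff', mu=1, sigma=0.5)
    beta2 = pm.Normal('beta_rest_diff', mu=0, sigma=2)
    beta3 = pm.Normal('beta_travel_dist', mu=0, sigma=1)
    beta4 = pm.Normal('beta_altitude', mu=0, sigma=2)
    sigma = pm.HalfNormal('sigma', sigma=15)

    mu = (beta0
          + beta1 * games['strength_diff'].values
          + beta2 * games['rest_diff'].values
          + beta3 * games['travel_dist'].values
          + beta4 * games['altitude'].values)

    pm.Normal('margin', mu=mu, sigma=sigma, observed=games['margin'].values)

    trace_reg = pm.sample(2000, tune=1000, chains=4, random_seed=42)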

Model Checking and Posterior Predictive Checks

A Bayesian model is only as good as its assumptions. Posterior predictive checks are the Bayesian analog of residual analysis in frequentist statistics. The idea: if your model is good, data simulated from the model should look like the real data.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def posterior_predictive_check(observed_data, simulated_data_sets,
                                stat_name='mean', stat_func=np.mean):
    """
    Perform a posterior predictive check.

    Parameters
    ----------
    observed_data : array
        The actual observed data.
    simulated_data_sets : array of shape (n_simulations, n_observations)
        Data simulated from the posterior predictive distribution.
    stat_name : str
        Name of the test statistic.
    stat_func : callable
        Function to compute the test statistic.

    Returns
    -------
    p_value : float
        Bayesian p-value (proportion of simulated statistics >= observed).
    """
    observed_stat = stat_func(observed_data)
    simulated_stats = np.array([stat_func(sim) for sim in simulated_data_sets])
    p_value = np.mean(simulated_stats >= observed_stat)

    return observed_stat, simulated_stats, p_value

# --- Example: checking our margin model ---
# Observed data
observed_margins = np.array([7, -3, 14, 3, -10, 21, 6, -7, 10, 3])

# Simulate from posterior predictive
# (In practice, these come from pm.sample_posterior_predictive)
# Here we simulate manually for illustration
np.random.seed(42)
n_sims = 4000
post_mu_samples = np.random.normal(4.4, 3.5, size=n_sims)     # from posterior
post_sigma_samples = np.abs(np.random.normal(12, 2, size=n_sims))  # from posterior

simulated_datasets = np.array([
    np.random.normal(mu, sigma, size=len(observed_margins))
    for mu, sigma in zip(post_mu_samples, post_sigma_samples)
])

# Check multiple statistics
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
checks = [
    ('Mean', np.mean),
    ('Std Dev', np.std),
    ('Max', np.max),
    ('Min', np.min),
]

for ax, (name, func) in zip(axes.flat, checks):
    obs_stat, sim_stats, p_val = posterior_predictive_check(
        observed_margins, simulated_datasets, name, func
    )
    ax.hist(sim_stats, bins=50, density=True, alpha=0.7, color='steelblue',
            label='Simulated from model')
    ax.axvline(obs_stat, color='red', linewidth=2, linestyle='--',
               label=f'Observed = {obs_stat:.1f}')
    ax.set_title(f'{name} (Bayesian p = {p_val:.3f})')
    ax.legend()

plt.suptitle('Posterior Predictive Checks', fontsize=14)
plt.tight_layout()
plt.savefig('posterior_predictive_checks.png', dpi=150, bbox_inches='tight')
plt.show()

print("If Bayesian p-values are near 0 or 1, the model is failing to")
print("capture that aspect of the data. Values near 0.5 indicate good fit.")

A well-calibrated model will produce Bayesian p-values near 0.5 for all test statistics. If the mean check gives $p = 0.45$ but the max check gives $p = 0.02$, it suggests the model is capturing the central tendency of the data but failing to reproduce the occasional blowout win. You might then consider a heavier-tailed likelihood (e.g., Student-t instead of Normal) to better model extreme outcomes.
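If the tail checks fail in this way, one option is to replace the Normal likelihood with a Student-t whose degrees of freedom are estimated from the data. The sketch below shows that change in PyMC; the priors and the exponential rate on the degrees-of-freedom parameter are illustrative choices, not settled recommendations.

import numpy as np
import pymc as pm

observed_margins = np.array([7, -3, 14, 3, -10, 21, 6, -7, 10, 3])

with pm.Model() as robust_margin_model:
    mu = pm.Normal('mu', mu=0, sigma=10)         # true mean margin
    sigma = pm.HalfNormal('sigma', sigma=15)     # scale of game-to-game variation
    nu = pm.Exponential('nu', lam=1/10)          # degrees of freedom; small nu = heavy tails

    # A Student-t likelihood absorbs occasional blowouts better than a Normal
    pm.StudentT('margin', nu=nu, mu=mu, sigma=sigma, observed=observed_margins)

    idata = pm.sample(2000, tune=1000, random_seed=42)

Rerun the same posterior predictive checks on this model; if the max and min p-values move back toward 0.5, heavier tails were the missing ingredient.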

Callout: Model Calibration and Betting Profits

Calibration is directly tied to profitability. If your model says an event has a 60% probability, and it actually occurs 60% of the time across many such predictions, your model is well-calibrated. Well-calibrated probabilistic predictions allow you to size bets optimally (Kelly criterion) and identify mispriced lines reliably. Posterior predictive checks are one of the best tools for assessing calibration. We will return to calibration in depth in Part IV.


10.6 Putting It All Together: The Bayesian Bettor's Workflow

Before we summarize, let us step back and describe the complete workflow of a bettor who thinks Bayesianly. This workflow integrates everything from Part II.

Pre-Season

  1. Set priors for each team's strength using historical data, roster analysis, and domain knowledge (Section 10.2). Use techniques from Chapter 8 (distributions) to characterize these priors.
  2. Choose your model structure: hierarchical team-strength model (Section 10.5), possibly with additional covariates from regression analysis (Chapter 9).
  3. Establish baselines: what does the league-average team look like? What is the typical spread of team strengths? Use Chapter 8's descriptive statistics.

During the Season

  1. Update weekly using Bayesian sequential updating (Section 10.3). Each game provides new evidence that shifts your posterior.
  2. Check model calibration regularly using posterior predictive checks (Section 10.5). If your model is systematically off, diagnose and fix it.
  3. Compare your posteriors to market lines. When your posterior distribution assigns significantly different probabilities than the market implies, you have found a potential bet. Use the hypothesis testing framework from Chapter 9 to assess whether the discrepancy is statistically meaningful or just noise.

For Each Bet

  1. Generate the full posterior predictive distribution for the game outcome.
  2. Compute the implied probability from the market line.
  3. Compare: if your posterior probability exceeds the breakeven threshold (accounting for vigorish), you have an edge. A minimal sketch of this comparison follows the list.
  4. Size the bet according to the Kelly criterion (which we will develop in Part III), using the full posterior distribution to account for parameter uncertainty.
  5. Record everything for later analysis and model improvement.
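As an illustration of steps 2 and 3, the snippet below converts a moneyline to its breakeven probability and compares it to a posterior win probability. The odds and the posterior probability are made-up numbers, and the vig is handled only through the one-sided breakeven check rather than a full de-vigging of both sides of the market.

def implied_probability(american_odds):
    """Breakeven win probability implied by an American moneyline."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

# Hypothetical numbers for illustration
posterior_win_prob = 0.58      # from your posterior predictive distribution
market_odds = -120             # sportsbook moneyline on the same team

breakeven = implied_probability(market_odds)   # -120 implies about 0.545
edge = posterior_win_prob - breakeven

print(f"Breakeven probability: {breakeven:.3f}")
print(f"Model probability:     {posterior_win_prob:.3f}")
print(f"Edge:                  {edge:+.3f}")
# A positive edge marks a candidate bet; sizing it properly waits for the
# Kelly criterion in Part III.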

10.7 Chapter Summary

Key Concepts

Bayes' theorem provides the mathematical foundation for updating beliefs with evidence:

$$P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}$$

or equivalently:

$$\text{Posterior} \propto \text{Prior} \times \text{Likelihood}$$

Prior distributions encode what you know before seeing data. For sports betting, informative priors based on domain knowledge are a major advantage, especially in small-sample settings like the NFL.

Sequential updating makes Bayesian reasoning naturally suited to the week-by-week rhythm of sports seasons. Last week's posterior becomes this week's prior.

Conjugate priors provide closed-form updates: Beta-Binomial for win rates, Normal-Normal for scoring margins. These are the workhorses of quick Bayesian calculations.

Shrinkage / regression to the mean emerges naturally from Bayesian updating. Early-season extremes are pulled toward the prior, protecting against overreaction to small samples.

Hierarchical models estimate individual team strengths while borrowing information across the league. This produces better estimates for every team, especially those with limited data.

Posterior predictive distributions give you full probability distributions over game outcomes, enabling direct computation of betting-relevant quantities (win probability, cover probability, expected margin).

Essential Formulas

| Formula | Purpose |
|---------|---------|
| $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$ | Bayes' theorem |
| $\text{Beta}(\alpha + w, \beta + n - w)$ | Beta-Binomial posterior after $w$ wins in $n$ games |
| $\hat{\theta} = \frac{\alpha + w}{\alpha + \beta + n}$ | Posterior mean for Beta-Binomial |
| $\text{BF} = \frac{P(E \mid H)}{P(E \mid \neg H)}$ | Bayes factor (strength of evidence) |
| $\text{Weight}_{\text{prior}} = \frac{\alpha + \beta}{\alpha + \beta + n}$ | Prior's influence in the posterior |
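A quick numeric check of the Beta-Binomial rows, using made-up numbers: a Beta(16, 9) preseason prior (mean 0.64) followed by a 7--3 start.

alpha, beta = 16, 9          # preseason prior: mean 16/25 = 0.64
w, n = 7, 10                 # observed record: 7 wins in 10 games

post_alpha, post_beta = alpha + w, beta + (n - w)     # Beta(23, 12)
post_mean = post_alpha / (post_alpha + post_beta)     # (alpha + w) / (alpha + beta + n)
prior_weight = (alpha + beta) / (alpha + beta + n)    # influence the prior retains

print(f"Posterior: Beta({post_alpha}, {post_beta})")
print(f"Posterior mean win probability: {post_mean:.3f}")        # about 0.657
print(f"Weight still carried by the prior: {prior_weight:.3f}")  # 25/35, about 0.714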

Decision Framework: Bayesian vs. Frequentist

Use this quick-reference guide:

| Situation | Recommended Approach | Reason |
|-----------|----------------------|--------|
| NFL team ratings (17 games) | Bayesian | Small sample, strong priors available |
| MLB season-long hitting stats | Frequentist or light Bayesian | Large sample, priors quickly overwhelmed |
| New/expansion team analysis | Bayesian | Almost no team-specific data; prior is essential |
| Play-by-play analysis | Frequentist | Massive datasets; computational speed matters |
| Championship probability | Bayesian | Rare event, prior matters enormously |
| Line movement prediction | Frequentist/ML | Large historical dataset, weak priors |
| In-season team strength tracking | Bayesian | Sequential updating is a natural fit |

Code Patterns

The key code patterns from this chapter:

  1. Simple Bayesian update (Section 10.1): bayesian_update(prior, likelihood_true, likelihood_false) for quick binary hypothesis calculations.
  2. Beta-Binomial tracking (Section 10.3): track_season_bayesian() for monitoring team win probabilities through a season.
  3. PyMC single-team model (Section 10.5): Bayesian estimation of a team's true scoring margin with uncertainty.
  4. PyMC hierarchical model (Section 10.5): Full league-wide team strength estimation with partial pooling.
  5. Posterior predictive checks (Section 10.5): Model validation by comparing simulated and observed data.

Common Pitfalls

Pitfall 1: Overconfident Priors. Setting a prior with too small a variance can prevent you from adapting to genuinely new information. If you are "certain" a team is bad and they win eight straight, your prior should yield to the data. Use sensitivity analysis to check.
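One way to run that sensitivity check, with illustrative numbers: give two priors the same mean (a 30% win rate) but very different strengths, then update both on an 8--0 stretch.

w, n = 8, 8                  # eight straight wins, no losses

# Same prior mean (0.30), very different prior strength
priors = {
    'overconfident Beta(30, 70)': (30, 70),   # worth about 100 pseudo-games
    'moderate Beta(6, 14)': (6, 14),          # worth about 20 pseudo-games
}

for label, (a, b) in priors.items():
    post_mean = (a + w) / (a + b + n)
    print(f"{label}: posterior mean = {post_mean:.3f}")
# Overconfident prior: about 0.352. Moderate prior: 0.500.
# If eight straight wins cannot move your estimate, the prior variance
# was too small.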

Pitfall 2: Ignoring Priors Entirely. Using flat, uninformative priors in small-sample settings throws away valuable information. In the NFL, preseason knowledge is worth roughly 4--6 games of data. Do not waste it.

Pitfall 3: Confusing Prior Predictive with Posterior Predictive. The prior predictive distribution describes what you expect before seeing any data. The posterior predictive describes what you expect for the next observation after learning from data. Use the posterior predictive for betting decisions.

Pitfall 4: Not Checking Your Model. A Bayesian model that is not checked against reality is just an elaborate exercise in wishful thinking. Always run posterior predictive checks. Always track your calibration.

Pitfall 5: Falling in Love with Complexity. A simple Beta-Binomial model that you understand deeply and use correctly will outperform a complex hierarchical model that you run blindly. Start simple. Add complexity only when it measurably improves prediction accuracy.


Connections to Previous Chapters

This chapter sits on the foundations built throughout Part II:

  • Chapter 7 (Probability Foundations): Bayes' theorem is a direct consequence of the probability axioms and conditional probability rules established there. The law of total probability, used to compute the evidence term, was a key result from Chapter 7.
  • Chapter 8 (Descriptive Statistics and Distributions): Prior distributions are probability distributions. The Beta distribution, Normal distribution, and other families from Chapter 8 are the building blocks of Bayesian priors and posteriors.
  • Chapter 9 (Hypothesis Testing and Regression): Bayesian inference offers an alternative to frequentist hypothesis testing, one that many argue is more natural and interpretable. Bayesian regression extends the regression models of Chapter 9 with full uncertainty quantification.

What's Next: Part III --- The Betting Marketplace

With Part II complete, you now have a comprehensive statistical toolkit: probability theory, distributions, descriptive statistics, hypothesis testing, regression, and Bayesian inference. You understand both the mathematics and the code needed to analyze sports data rigorously.

But statistical knowledge alone does not make you profitable. In Part III, we shift from analysis to action. Chapter 11 opens our exploration of the betting marketplace itself: how odds are set, how lines move, how the vigorish eats into your edge, and how the market aggregates information from thousands of participants. You will learn to read the market as a signal---sometimes confirming your analysis, sometimes challenging it.

Where Part II asked "what do I believe, and why?", Part III asks "how do I profit from what I believe?"

The Bayesian posterior you compute is your edge. The marketplace is where you monetize it.


Exercises

Conceptual

  1. Base rate neglect. A sports commentator says: "This team has won 5 straight games against the spread---they are a lock to cover this week." Using Bayesian reasoning, explain why this conclusion may be flawed. What is the relevant base rate, and how should it influence your estimate?

  2. Prior sensitivity. You are analyzing a rookie quarterback's performance. After 4 games, he has a 72% completion rate (well above the league average of 64%). Compare the posteriors you would get from three different priors: uninformative Beta(1,1), league-average Beta(16,9), and skeptical-of-rookies Beta(12,10). Which prior would you use, and why?

  3. Bayesian vs. frequentist. A frequentist analyst reports: "Home-field advantage is 2.8 points (95% CI: 1.2 to 4.4, p < 0.001)." A Bayesian analyst reports: "There is a 95% probability that home-field advantage is between 1.5 and 3.9 points." Explain the difference in interpretation. Which is more useful for a bettor setting a line?

Computational

  1. Build a Beta-Binomial updater. Write a Python function that takes a team's preseason prior (as Beta parameters) and a sequence of game results, and produces a visualization showing the posterior distribution after each game. Apply it to a real team's current season results.

  2. Bayesian A/B test for betting strategies. You are testing two betting strategies over 50 bets each. Strategy A won 29 of 50 bets. Strategy B won 31 of 50 bets. Using a Bayesian approach with Beta(1,1) priors, compute the posterior probability that Strategy B is genuinely better than Strategy A. (Hint: sample from both posteriors and compute $P(\theta_B > \theta_A)$.)

  3. Hierarchical model extension. Extend the PyMC hierarchical model from Section 10.5 to include a home-field advantage that varies by team (some teams have stronger home-field advantages than others). What does the model reveal about the distribution of home-field advantages across the league?

Applied

  1. Season-long Bayesian tracking project. Choose a current sports season. Set priors for all teams before the season starts, then update weekly using the Beta-Binomial model. At the end of each week, record your posterior win probabilities and the market's implied probabilities. After at least 8 weeks, analyze: are your posteriors better calibrated than the market? Where do they systematically differ?

  2. Bayesian prediction model. Build a complete Bayesian game prediction model using PyMC for the sport of your choice. Include at least three predictors (e.g., team strength, home-field advantage, rest days). Evaluate its calibration using posterior predictive checks. Use the model to identify at least 5 games where your predicted probability differs from the market by more than 5 percentage points.


Further Reading

  • Bayesian Data Analysis by Gelman, Carlin, Stern, Dunson, Vehtari, and Rubin. The definitive textbook on Bayesian statistics. Chapters 1--5 cover the foundations relevant to this chapter; Chapter 15 covers hierarchical models in depth.

  • Statistical Rethinking by Richard McElreath. An outstanding, accessible introduction to Bayesian statistics with a focus on building and understanding models. The accompanying lecture videos are freely available online.

  • Bayesian Methods for Hackers by Cameron Davidson-Pilon. A practical, code-first introduction to Bayesian methods using PyMC. Excellent for readers who learn better through code than equations.

  • PyMC documentation and examples (pymc.io). The official documentation includes numerous sports-related examples and tutorials on hierarchical modeling.

  • Probability Theory: The Logic of Science by E.T. Jaynes. A deep, philosophical treatment of Bayesian probability as an extension of logic. Not for the faint of heart, but profoundly influential.


This concludes Part II: Statistical Foundations. You now have the mathematical and computational tools to analyze sports data with rigor and sophistication. In Part III, we enter the marketplace where these tools meet money.