Chapter 7: Probability Distributions in Betting
Learning Objectives
- Identify which probability distribution best models a given sports betting scenario
- Apply the normal, Poisson, binomial, and beta distributions to calculate betting-relevant probabilities
- Fit probability distributions to real sports data and assess goodness of fit
- Use distribution parameters to derive spread, totals, and moneyline probabilities
- Implement Bayesian updating with the beta-binomial model to estimate true team strength
Chapter Overview
Every sport generates data that follows specific probability distributions. A soccer match typically produces between zero and five goals. An NFL game's margin of victory clusters around certain key numbers. An NBA team's season win total reflects the accumulation of dozens of binary win-or-lose outcomes. Behind each of these patterns lies a mathematical structure -- a probability distribution -- that governs the frequency and likelihood of different outcomes.
For the sports bettor, understanding probability distributions is not an academic exercise. It is the foundation upon which every serious betting model is built. When a sportsbook sets a total at 2.5 goals in a soccer match, they are implicitly invoking the Poisson distribution. When they set an NFL spread at -3.5, the normal distribution lurks behind their number. When a futures market prices a team's season win total, the binomial distribution is at work. The bettor who understands these distributions can reverse-engineer the implied assumptions, compare them against their own models, and identify where the market has mispriced an outcome.
In this chapter, we move from the general probability theory of Chapter 6 to the specific distributions that matter most in sports betting. We will study four distributions in depth: the normal distribution for continuous quantities like point spreads and margins; the Poisson distribution for count data like goals and runs; the binomial distribution for sequences of wins and losses; and the beta distribution for estimating unknown probabilities. For each, we develop the mathematical theory, implement it in Python, and work through detailed betting examples.
By the end of this chapter, you will have a toolkit of distributions that covers the vast majority of sports betting scenarios. You will also learn how to test whether a distribution actually fits your data -- a critical skill that separates rigorous modelers from those who blindly apply formulas without validation.
7.1 The Normal Distribution
The normal distribution -- also called the Gaussian distribution or the bell curve -- is the most important distribution in statistics and one of the most useful in sports betting. Its ubiquity stems from the Central Limit Theorem, which tells us that sums and averages of many independent random variables with finite variance tend toward a normal distribution, regardless of the underlying distribution of the individual variables.
7.1.1 Properties and the 68-95-99.7 Rule
A normal distribution is fully characterized by two parameters: its mean $\mu$ (the center) and its standard deviation $\sigma$ (the spread). The probability density function is:
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$
This formula produces the familiar bell-shaped curve, symmetric about $\mu$. The key properties are:
- Symmetry: The distribution is perfectly symmetric around the mean. $P(X > \mu + k) = P(X < \mu - k)$ for any $k$.
- The 68-95-99.7 rule: Approximately 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
- Defined on all real numbers: The tails extend to $-\infty$ and $+\infty$, though the density becomes negligible far from the mean.
- Mean = Median = Mode: All three measures of central tendency coincide at $\mu$.
The cumulative distribution function (CDF), which gives the probability that a value is less than or equal to $x$, is:
$$ \Phi(x) = P(X \leq x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(t - \mu)^2}{2\sigma^2}} \, dt $$
There is no closed-form expression for this integral, which is why we rely on tables, calculators, or software to evaluate it.
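Since $\Phi$ has no closed form, software evaluation is the norm. A minimal SciPy check, which also confirms the 68-95-99.7 rule numerically:

```python
from scipy import stats

# Confirm the 68-95-99.7 rule: P(|Z| <= k) = Phi(k) - Phi(-k)
for k in [1, 2, 3]:
    within = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"P(|Z| <= {k}) = {within:.4f}")
# Prints 0.6827, 0.9545, 0.9973 -- the rule's three percentages
```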
7.1.2 Why NFL Scores and NBA Point Differentials Are Approximately Normal
NFL game margins (home score minus away score) across a full season form a distribution that is approximately normal with a mean around 2.5 to 3.0 (reflecting home-field advantage) and a standard deviation of approximately 13.5 to 14 points. This is not a coincidence. A football game's final margin is the sum of many individual scoring events (touchdowns, field goals, safeties), each influenced by numerous factors. The Central Limit Theorem tells us that such sums tend toward normality.
Similarly, NBA point differentials are approximately normal, with a standard deviation of roughly 12 points. The higher-scoring nature of basketball means even more individual events contribute to the final margin, making the normal approximation particularly good.
It is important to note that the normal approximation is not perfect. NFL margins show slight peaks at key numbers (3, 7, 10, 14) due to the discrete scoring system. We will address this limitation later, but for many betting calculations, the normal model provides an excellent starting point.
7.1.3 The z-Score
The z-score standardizes any normal distribution to the standard normal distribution $N(0, 1)$, allowing comparisons across different contexts:
$$ z = \frac{x - \mu}{\sigma} $$
A z-score tells you how many standard deviations a value is from the mean. A z-score of +1.5 means the value is 1.5 standard deviations above the mean; a z-score of -2.0 means two standard deviations below.
In betting, z-scores are invaluable for comparing performance across sports. A team that beats the spread by 7 points in the NFL (where $\sigma \approx 13.5$) has a z-score of $7/13.5 \approx 0.52$. A team that beats the spread by 7 points in the NBA (where $\sigma \approx 12$) has a z-score of $7/12 \approx 0.58$. Despite the same raw margin, the NBA performance was slightly more unusual relative to its sport's variability.
7.1.4 Using the Normal Distribution for Spread Betting
Suppose we model the margin of victory for a particular game as:
$$ \text{Margin} \sim N(\mu, \sigma^2) $$
where $\mu$ is our predicted margin (positive if we expect the home team to win) and $\sigma$ is the standard deviation of margins.
The probability that the favored team covers a spread of $-s$ (meaning they must win by more than $s$ points) is:
$$ P(\text{Cover}) = P(\text{Margin} > s) = 1 - \Phi\left(\frac{s - \mu}{\sigma}\right) $$
This is the fundamental equation for spread betting under the normal model. Everything flows from it.
7.1.5 Worked Example: Probability of Covering -3.5
Setup: We project an NFL game where Team A is favored. Our model predicts Team A will win by 6 points on average ($\mu = 6$). We use the standard NFL margin standard deviation of $\sigma = 13.5$ points. The sportsbook has set the spread at Team A $-3.5$. What is the probability that Team A covers?
Solution:
Team A covers if they win by more than 3.5 points, i.e., $\text{Margin} > 3.5$.
Step 1: Calculate the z-score.
$$ z = \frac{3.5 - 6.0}{13.5} = \frac{-2.5}{13.5} = -0.1852 $$
Step 2: Find $P(\text{Margin} > 3.5)$.
$$ P(\text{Margin} > 3.5) = 1 - \Phi(-0.1852) = \Phi(0.1852) \approx 0.5735 $$
So our model gives Team A a 57.35% probability of covering $-3.5$.
Step 3: Determine if there is value. A standard $-110$ spread bet requires a win probability of approximately $52.4\%$ to break even. Our model's $57.35\%$ suggests substantial value on Team A $-3.5$.
Extended calculation: What if the spread were $-7$?
$$ z = \frac{7 - 6}{13.5} = 0.0741 $$
$$ P(\text{Margin} > 7) = 1 - \Phi(0.0741) \approx 0.4705 $$
At $-7$, our model gives only a 47.05% cover probability -- below break-even. We would pass on this bet or look at the other side.
7.1.6 Python Implementation
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# --- Fitting a Normal Distribution to NFL Margins ---
# Simulated NFL margin data (home score - away score)
# In practice, you would load actual data from a database
np.random.seed(42)
nfl_margins = np.random.normal(loc=2.8, scale=13.5, size=512)
# Fit the normal distribution to the data
mu_fit, sigma_fit = stats.norm.fit(nfl_margins)
print(f"Fitted NFL margin distribution: N({mu_fit:.2f}, {sigma_fit:.2f}^2)")
print(f"Mean (home advantage): {mu_fit:.2f} points")
print(f"Std deviation: {sigma_fit:.2f} points")
# --- Calculating Cover Probabilities ---
def cover_probability(predicted_margin, spread, sigma=13.5):
"""
Calculate probability of covering a point spread.
Parameters:
predicted_margin: Our model's predicted margin of victory
(positive = favorite wins by this much)
spread: The point spread (positive number for the favorite,
e.g., 3.5 means favorite must win by > 3.5)
sigma: Standard deviation of margins (default 13.5 for NFL)
Returns:
Probability of covering the spread
"""
z = (spread - predicted_margin) / sigma
return 1 - stats.norm.cdf(z)
# Example: Team A favored, predicted margin = 6, spread = -3.5
pred_margin = 6.0
spread = 3.5
prob = cover_probability(pred_margin, spread)
print(f"\nPredicted margin: {pred_margin}")
print(f"Spread: -{spread}")
print(f"Cover probability: {prob:.4f} ({prob*100:.2f}%)")
# Break-even probability at -110 odds
breakeven = 110 / (110 + 100)
print(f"Break-even at -110: {breakeven:.4f} ({breakeven*100:.2f}%)")
print(f"Edge: {(prob - breakeven)*100:.2f}%")
# --- Visualizing the Distribution ---
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
x = np.linspace(-45, 45, 1000)
pdf = stats.norm.pdf(x, loc=pred_margin, scale=13.5)
ax.plot(x, pdf, 'b-', linewidth=2, label=f'N({pred_margin}, 13.5²)')
ax.fill_between(x, pdf, where=(x > spread), alpha=0.3, color='green',
label=f'P(Cover -3.5) = {prob:.4f}')
ax.axvline(x=spread, color='red', linestyle='--', linewidth=1.5,
label=f'Spread = -{spread}')
ax.axvline(x=pred_margin, color='blue', linestyle=':', linewidth=1.5,
label=f'Predicted margin = {pred_margin}')
ax.set_xlabel('Margin of Victory (Home - Away)', fontsize=12)
ax.set_ylabel('Probability Density', fontsize=12)
ax.set_title('NFL Spread Betting: Normal Distribution Model', fontsize=14)
ax.legend(fontsize=10)
plt.tight_layout()
plt.savefig('nfl_spread_normal.png', dpi=150, bbox_inches='tight')
plt.show()
# --- Sensitivity Analysis: Cover probability across spreads ---
spreads = np.arange(0, 15, 0.5)
probs = [cover_probability(pred_margin, s) for s in spreads]
print("\n--- Cover Probability Table (Predicted Margin = 6.0) ---")
print(f"{'Spread':<10} {'Cover Prob':<12} {'Value at -110?':<15}")
print("-" * 37)
for s, p in zip(spreads, probs):
value = "YES" if p > breakeven else "No"
print(f"-{s:<9.1f} {p:<12.4f} {value:<15}")
This code demonstrates three critical skills: fitting a distribution to data, calculating cover probabilities analytically, and conducting sensitivity analysis across different spread values. The sensitivity table is particularly useful in practice -- it tells you the maximum spread at which you still have an edge.
7.2 The Poisson Distribution
While the normal distribution excels at modeling continuous quantities like margins and point differentials, many sports outcomes are discrete counts: goals in soccer, goals in hockey, runs in baseball, aces in tennis. For these, we turn to the Poisson distribution.
7.2.1 Properties and When It Applies
The Poisson distribution models the number of events occurring in a fixed interval of time (or space), given that events occur at a constant average rate and independently of each other. Its probability mass function is:
$$ P(X = k) = \frac{\lambda^k \, e^{-\lambda}}{k!} $$
where $\lambda > 0$ is the rate parameter (the expected number of events) and $k = 0, 1, 2, \ldots$ is the number of events observed.
Key properties of the Poisson distribution:
- Mean equals variance: $E[X] = \text{Var}(X) = \lambda$. This is both a defining feature and a testable assumption.
- Discrete and non-negative: The distribution is defined only for non-negative integers.
- Additive: If $X \sim \text{Poisson}(\lambda_1)$ and $Y \sim \text{Poisson}(\lambda_2)$ are independent, then $X + Y \sim \text{Poisson}(\lambda_1 + \lambda_2)$.
- Approximation to binomial: When $n$ is large and $p$ is small, $\text{Binomial}(n, p) \approx \text{Poisson}(np)$.
The cumulative distribution function is:
$$ P(X \leq k) = e^{-\lambda} \sum_{i=0}^{k} \frac{\lambda^i}{i!} $$
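As a quick sketch of these formulas in SciPy (the rate $\lambda = 1.4$ here is an illustrative per-team goal rate, not a fitted value):

```python
from scipy import stats

lam = 1.4  # illustrative expected goals for one team

# PMF: probability of exactly k goals
for k in range(4):
    print(f"P(X = {k}) = {stats.poisson.pmf(k, lam):.4f}")

# CDF and survival function: at most 2 goals vs. 3 or more
print(f"P(X <= 2) = {stats.poisson.cdf(2, lam):.4f}")
print(f"P(X >= 3) = {stats.poisson.sf(2, lam):.4f}")
```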
7.2.2 Soccer Goals as a Poisson Process
Soccer is the canonical application of the Poisson distribution in sports betting. The average number of goals per team per match in a top European league is approximately 1.3 to 1.5. Goals are relatively rare events, and while they are not perfectly independent (the game state changes after a goal), the Poisson model provides a remarkably good fit for practical purposes.
For a given match, we model:
$$ \text{Home goals} \sim \text{Poisson}(\lambda_H) $$
$$ \text{Away goals} \sim \text{Poisson}(\lambda_A) $$
where $\lambda_H$ and $\lambda_A$ reflect the attacking strength of the home team and away team, respectively, adjusted for defensive quality and home advantage.
If we assume independence between the two teams' goal counts (an assumption we will revisit), we can compute the probability of any exact score by multiplying:
$$ P(\text{Score} = i\text{-}j) = P(\text{Home} = i) \times P(\text{Away} = j) = \frac{\lambda_H^i \, e^{-\lambda_H}}{i!} \times \frac{\lambda_A^j \, e^{-\lambda_A}}{j!} $$
From these exact-score probabilities, we can derive:
- Match result probabilities (1X2): Sum the exact scores corresponding to home win, draw, and away win.
- Over/Under probabilities: Sum the exact scores where total goals exceed or fall below a given threshold.
- Both teams to score (BTTS): under independence this is $P(\text{Home} \geq 1) \times P(\text{Away} \geq 1) = (1 - P(\text{Home} = 0))(1 - P(\text{Away} = 0))$, which expands to $1 - P(\text{Home} = 0) - P(\text{Away} = 0) + P(\text{Home} = 0) \times P(\text{Away} = 0)$.
7.2.3 Worked Example: Predicting Exact Score in a Soccer Match
Setup: Manchester City hosts Aston Villa. Based on our model (using team attack/defense ratings, home advantage, and recent form), we estimate:
- $\lambda_H = 2.1$ (Manchester City expected goals)
- $\lambda_A = 0.9$ (Aston Villa expected goals)
Step 1: Calculate exact score probabilities.
For the score 2-1 (Home 2, Away 1):
$$ P(2\text{-}1) = \frac{2.1^2 \, e^{-2.1}}{2!} \times \frac{0.9^1 \, e^{-0.9}}{1!} $$
$$ = \frac{4.41 \times 0.1225}{2} \times \frac{0.9 \times 0.4066}{1} $$
$$ = 0.2700 \times 0.3659 = 0.0988 $$
So the probability of a 2-1 home win is approximately 9.88%.
Step 2: Calculate match result probabilities.
We sum across all relevant exact scores. For computational tractability, we consider scores from 0-0 up to 8-8 (scores beyond this have negligible probability):
$$ P(\text{Home Win}) = \sum_{i > j} P(i, j) \approx 0.650 $$
$$ P(\text{Draw}) = \sum_{i = j} P(i, j) \approx 0.199 $$
$$ P(\text{Away Win}) = \sum_{i < j} P(i, j) \approx 0.151 $$
Step 3: Calculate Over/Under 2.5 goals.
$$ P(\text{Over 2.5}) = 1 - P(\text{Total} \leq 2) = 1 - \sum_{\substack{i+j \leq 2}} P(i, j) $$
The scores with total $\leq 2$ are: 0-0, 1-0, 0-1, 2-0, 0-2, 1-1. Computing each:
$$ P(\text{Over 2.5}) = 1 - [P(0,0) + P(1,0) + P(0,1) + P(2,0) + P(0,2) + P(1,1)] $$
$$ \approx 1 - [0.0498 + 0.1046 + 0.0449 + 0.1098 + 0.0202 + 0.0943] $$
$$ = 1 - 0.4236 = 0.5764 $$
Our model gives approximately a 57.6% probability of Over 2.5 goals.
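Before moving on, the hand calculations above can be verified in a few lines of SciPy, using the same score-matrix idea that the full implementation in Section 7.2.6 develops:

```python
import numpy as np
from scipy import stats

lam_h, lam_a = 2.1, 0.9
goals = np.arange(9)  # 0..8 goals per team is enough here
score = np.outer(stats.poisson.pmf(goals, lam_h),
                 stats.poisson.pmf(goals, lam_a))

print(f"P(2-1):      {score[2, 1]:.4f}")              # ~0.0988
print(f"P(Home win): {np.tril(score, -1).sum():.4f}") # ~0.650
print(f"P(Draw):     {np.trace(score):.4f}")          # ~0.199
print(f"P(Away win): {np.triu(score, 1).sum():.4f}")  # ~0.151

totals = np.add.outer(goals, goals)                   # matrix of i + j
print(f"P(Over 2.5): {score[totals >= 3].sum():.4f}") # ~0.576
```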
7.2.4 The Dixon-Coles Correction
The standard Poisson model assumes independence between home and away goals, but empirical data shows a slight correlation. Specifically, low-scoring draws (0-0, 1-1) occur slightly more often than the independence assumption predicts.
Dixon and Coles (1997) introduced a correction factor $\tau$ that adjusts the probabilities for scores of 0-0, 0-1, 1-0, and 1-1:
$$ P_{\text{DC}}(i, j) = \tau_{\lambda_H, \lambda_A}(i, j) \times P_{\text{Poisson}}(i, j) $$
where $\tau$ depends on a parameter $\rho$ that captures the correlation between scores. When $\rho < 0$ (the typical case), low-scoring draws become more likely. We will implement the full Dixon-Coles model in Chapter 19. For now, be aware that the basic Poisson model systematically underestimates 0-0 and 1-1 draws by a small but meaningful amount.
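For reference, the adjustment factor in Dixon and Coles (1997) takes the form (writing $\lambda = \lambda_H$, $\mu = \lambda_A$):

$$ \tau_{\lambda,\mu}(i, j) = \begin{cases} 1 - \lambda\mu\rho & (i,j) = (0,0) \\ 1 + \lambda\rho & (i,j) = (0,1) \\ 1 + \mu\rho & (i,j) = (1,0) \\ 1 - \rho & (i,j) = (1,1) \\ 1 & \text{otherwise} \end{cases} $$

With $\rho < 0$, the factors for 0-0 and 1-1 exceed one while those for 1-0 and 0-1 fall below one -- exactly the empirical pattern described above.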
7.2.5 The Mean-Variance Assumption
A critical assumption of the Poisson distribution is that the mean equals the variance. In practice, sports scoring data sometimes exhibits overdispersion -- the variance exceeds the mean. This can occur because:
- Teams adjust strategy based on the current score (e.g., "parking the bus" when leading in soccer).
- Not all games are created equal; varying team quality introduces extra variability.
- Weather, injuries, and other factors create heterogeneity across games.
When overdispersion is present, the negative binomial distribution provides a more flexible alternative. We will encounter this in Section 7.5 when fitting distributions to real data.
7.2.6 Python Implementation
import numpy as np
from scipy import stats
from itertools import product
def poisson_match_probs(lambda_home, lambda_away, max_goals=8):
"""
Calculate exact score probabilities for a soccer match
using the independent Poisson model.
Parameters:
lambda_home: Expected goals for home team
lambda_away: Expected goals for away team
max_goals: Maximum goals to consider per team
Returns:
score_matrix: 2D array of exact score probabilities
home_win_prob, draw_prob, away_win_prob: Match result probs
"""
# Create probability vectors for each team
home_probs = stats.poisson.pmf(range(max_goals + 1), lambda_home)
away_probs = stats.poisson.pmf(range(max_goals + 1), lambda_away)
# Outer product gives the score matrix (independence assumption)
score_matrix = np.outer(home_probs, away_probs)
# Match result probabilities
home_win = np.tril(score_matrix, k=-1).sum() # Below diagonal
draw = np.trace(score_matrix) # Diagonal
away_win = np.triu(score_matrix, k=1).sum() # Above diagonal
return score_matrix, home_win, draw, away_win
def over_under_prob(score_matrix, line):
"""
Calculate Over/Under probabilities from a score matrix.
Parameters:
score_matrix: 2D array of exact score probabilities
line: The total goals line (e.g., 2.5)
Returns:
over_prob, under_prob
"""
max_goals = score_matrix.shape[0]
under_prob = 0.0
for i in range(max_goals):
for j in range(max_goals):
if i + j < line:
under_prob += score_matrix[i, j]
return 1 - under_prob, under_prob
# --- Worked Example: Manchester City vs Aston Villa ---
lambda_H = 2.1 # Man City expected goals
lambda_A = 0.9 # Aston Villa expected goals
scores, p_home, p_draw, p_away = poisson_match_probs(lambda_H, lambda_A)
print("=== Match Result Probabilities ===")
print(f"Home Win: {p_home:.4f} ({p_home*100:.1f}%)")
print(f"Draw: {p_draw:.4f} ({p_draw*100:.1f}%)")
print(f"Away Win: {p_away:.4f} ({p_away*100:.1f}%)")
# Convert to fair decimal odds
print(f"\nFair odds - Home: {1/p_home:.2f}, Draw: {1/p_draw:.2f}, "
f"Away: {1/p_away:.2f}")
# Over/Under probabilities
for line in [1.5, 2.5, 3.5, 4.5]:
over, under = over_under_prob(scores, line)
print(f"\nOver {line}: {over:.4f} ({over*100:.1f}%) | "
f"Under {line}: {under:.4f} ({under*100:.1f}%)")
# Most likely exact scores
print("\n=== Top 10 Most Likely Exact Scores ===")
score_list = []
for i in range(scores.shape[0]):
for j in range(scores.shape[1]):
score_list.append((i, j, scores[i, j]))
score_list.sort(key=lambda x: x[2], reverse=True)
for rank, (h, a, p) in enumerate(score_list[:10], 1):
print(f" {rank}. {h}-{a}: {p:.4f} ({p*100:.2f}%)")
# Both Teams to Score (BTTS)
p_home_clean = stats.poisson.pmf(0, lambda_A) # Away scores 0
p_away_clean = stats.poisson.pmf(0, lambda_H) # Home scores 0
p_btts = 1 - p_home_clean - p_away_clean + p_home_clean * p_away_clean
print(f"\nBTTS Yes: {p_btts:.4f} ({p_btts*100:.1f}%)")
print(f"BTTS No: {1-p_btts:.4f} ({(1-p_btts)*100:.1f}%)")
# --- Verify Poisson Assumption: Mean vs Variance ---
# Simulating many matches to check
np.random.seed(42)
n_simulations = 100000
sim_home_goals = np.random.poisson(lambda_H, n_simulations)
sim_away_goals = np.random.poisson(lambda_A, n_simulations)
sim_totals = sim_home_goals + sim_away_goals
print(f"\n=== Poisson Assumption Check (Simulated) ===")
print(f"Home goals - Mean: {sim_home_goals.mean():.3f}, "
f"Var: {sim_home_goals.var():.3f}")
print(f"Away goals - Mean: {sim_away_goals.mean():.3f}, "
f"Var: {sim_away_goals.var():.3f}")
print(f"Total goals - Mean: {sim_totals.mean():.3f}, "
f"Var: {sim_totals.var():.3f}")
print(f"(For Poisson, mean should approximately equal variance)")
This implementation covers the full pipeline from parameter estimation to betting-relevant outputs. The score matrix approach is particularly elegant: once constructed, it supports any market derivation through simple summation.
7.3 The Binomial Distribution
When we model repeated independent trials with two outcomes -- win or lose, cover or not cover, over or under -- the binomial distribution is the natural choice. It answers questions like: "If a team has a 60% chance of winning each game, what is the probability they win at least 50 games in an 82-game season?"
7.3.1 Properties and When It Applies
The binomial distribution models the number of successes in $n$ independent trials, where each trial has probability $p$ of success. Its probability mass function is:
$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} $$
where $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ is the binomial coefficient ("n choose k").
Key properties:
- Mean: $E[X] = np$
- Variance: $\text{Var}(X) = np(1 - p)$
- Standard deviation: $\sigma = \sqrt{np(1-p)}$
- Symmetry: The distribution is symmetric when $p = 0.5$; skewed right when $p < 0.5$ and skewed left when $p > 0.5$.
- Normal approximation: When $np \geq 5$ and $n(1-p) \geq 5$, the binomial is well-approximated by $N(np, np(1-p))$.
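The last property is worth a quick numerical check. A minimal sketch comparing the exact binomial tail with the continuity-corrected normal approximation, using the 82-game, $p = 0.60$ numbers that reappear in the worked example below:

```python
import numpy as np
from scipy import stats

n, p = 82, 0.60
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

# Exact binomial tail: P(X >= 50)
exact = stats.binom.sf(49, n, p)
# Normal approximation with continuity correction: P(Y > 49.5)
approx = stats.norm.sf(49.5, mu, sigma)

print(f"Exact P(X >= 50):     {exact:.4f}")
print(f"Normal approximation: {approx:.4f}")
```

The two values typically agree to roughly two decimal places, which is what we expect when $np$ and $n(1-p)$ are both comfortably large.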
7.3.2 Win-Loss Sequences and Streaks
One of the most common fallacies in sports betting is overinterpreting streaks. When a basketball team wins 10 of 12 games, commentators declare they are "on fire." When a baseball team loses 8 of 10, the narrative becomes one of crisis. But how surprising are these streaks given the team's underlying win probability?
The binomial distribution gives us the answer. If a team has a true win probability of $p$, the probability of winning exactly $k$ out of $n$ games is given directly by the PMF. The probability of winning $k$ or more is:
$$ P(X \geq k) = \sum_{i=k}^{n} \binom{n}{i} p^i (1-p)^{n-i} $$
This is equivalent to $1 - F(k-1)$, where $F$ is the CDF of the binomial distribution.
7.3.3 Worked Example: Is a 10-2 Streak Really Exceptional?
Setup: An NBA team has a true win probability of 60% (roughly a 49-win team over 82 games). They have just gone 10-2 in their last 12 games. Commentators are raving about their "hot streak." How likely is this assuming no change in underlying ability?
Solution:
We model wins in 12 games as $X \sim \text{Binomial}(n = 12, p = 0.60)$.
The probability of winning exactly 10 out of 12:
$$ P(X = 10) = \binom{12}{10} (0.60)^{10} (0.40)^{2} = 66 \times 0.006047 \times 0.16 = 0.0639 $$
The probability of winning 10 or more (the "at least this extreme" probability):
$$ P(X \geq 10) = P(X=10) + P(X=11) + P(X=12) $$
$$ P(X = 11) = \binom{12}{11} (0.60)^{11} (0.40)^{1} = 12 \times 0.003628 \times 0.40 = 0.0174 $$
$$ P(X = 12) = \binom{12}{12} (0.60)^{12} = 0.002177 = 0.0022 $$
$$ P(X \geq 10) = 0.0639 + 0.0174 + 0.0022 = 0.0835 $$
Interpretation: There is an 8.35% chance that a true 60%-win team goes 10-2 or better in any given 12-game stretch. While this seems low, consider that an NBA season has many overlapping 12-game windows. The probability that such a streak occurs at least once during an 82-game season is considerably higher. This is a crucial lesson: what looks exceptional in isolation is often unremarkable over a long season.
Now compare: if the team's true win probability were 50%, the same calculation yields:
$$ P(X \geq 10 \mid p = 0.50) = \binom{12}{10}(0.5)^{12} + \binom{12}{11}(0.5)^{12} + \binom{12}{12}(0.5)^{12} $$
$$ = (66 + 12 + 1) \times \frac{1}{4096} = \frac{79}{4096} = 0.0193 $$
For a 50% team, the probability drops to 1.93% -- much more surprising, but still not vanishingly rare.
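To make the overlapping-windows point concrete, here is a small Monte Carlo sketch (the seed and season structure are illustrative) that estimates how often a true 60% team produces at least one 10-2 stretch somewhere in an 82-game season:

```python
import numpy as np

rng = np.random.default_rng(42)
n_seasons, n_games, window, p = 100_000, 82, 12, 0.60

# Simulate win/loss sequences for many seasons at once
seasons = rng.random((n_seasons, n_games)) < p

# Rolling 12-game win counts from cumulative sums
csum = np.cumsum(seasons, axis=1)
rolling = csum[:, window - 1:].copy()  # copy so csum is not modified
rolling[:, 1:] -= csum[:, :-window]    # window sum: csum[t] - csum[t-12]

hit_rate = (rolling >= 10).any(axis=1).mean()
print(f"P(>= 10 wins in some 12-game stretch): {hit_rate:.3f}")
```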
7.3.4 Season Win Totals
Sportsbooks offer futures bets on season win totals with an over/under line. The binomial distribution provides a natural model. If we estimate a team's true game-by-game win probability as $p$, then their season wins follow $X \sim \text{Binomial}(n, p)$, where $n$ is the number of games in the season.
For example, in the NFL ($n = 17$), a team with $p = 0.65$ has:
- Expected wins: $E[X] = 17 \times 0.65 = 11.05$
- Standard deviation: $\sigma = \sqrt{17 \times 0.65 \times 0.35} = \sqrt{3.8675} \approx 1.97$
The probability of winning exactly 11 games:
$$ P(X = 11) = \binom{17}{11} (0.65)^{11} (0.35)^{6} = 12376 \times 0.008751 \times 0.001838 \approx 0.1991 $$
Important caveat: The binomial model assumes each game is independent with the same win probability. In reality, schedule strength varies, injuries occur, and teams may rest starters late in the season. These factors introduce correlations and heterogeneity that the simple binomial model ignores. More sophisticated models account for game-by-game win probability variation, but the binomial provides a useful baseline.
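As a small illustration of this caveat, the sketch below compares the constant-$p$ binomial with a mixture in which each simulated season draws its own true strength (the $\pm 0.08$ spread is an assumption for illustration only). The mean is unchanged, but the variance inflates -- exactly the kind of extra spread the simple binomial ignores:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_games, p_bar = 200_000, 17, 0.65

# Baseline: every game is an independent Bernoulli(0.65) trial
binom_wins = rng.binomial(n_games, p_bar, size=n_sims)

# Heterogeneity: each season draws one true strength, shared by all games
season_p = np.clip(rng.normal(p_bar, 0.08, size=n_sims), 0.05, 0.95)
mixed_wins = rng.binomial(n_games, season_p)

print(f"Binomial: mean {binom_wins.mean():.2f}, var {binom_wins.var():.2f}")
print(f"Mixture:  mean {mixed_wins.mean():.2f}, var {mixed_wins.var():.2f}")
```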
7.3.5 Python Implementation
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# --- Season Win Total Analysis ---
def season_win_probs(n_games, win_prob):
"""
Calculate the full probability distribution of season wins.
Parameters:
n_games: Number of games in the season
win_prob: Per-game win probability
Returns:
wins: Array of possible win counts (0 to n_games)
probs: Corresponding probabilities
"""
wins = np.arange(0, n_games + 1)
probs = stats.binom.pmf(wins, n_games, win_prob)
return wins, probs
def over_under_win_total(n_games, win_prob, line):
"""
Calculate over/under probabilities for a season win total.
Parameters:
n_games: Number of games in season
win_prob: Per-game win probability
line: The over/under line (e.g., 10.5)
Returns:
over_prob, under_prob
"""
# Over means winning more than line games
# P(X > line) = P(X >= ceil(line)) = 1 - P(X <= floor(line))
under_prob = stats.binom.cdf(np.floor(line), n_games, win_prob)
over_prob = 1 - under_prob
return over_prob, under_prob
# Example: NFL team with 65% win probability, 17-game season
n_games = 17
p_win = 0.65
line = 10.5
wins, probs = season_win_probs(n_games, p_win)
over, under = over_under_win_total(n_games, p_win, line)
print(f"=== NFL Season Win Total (p = {p_win}, n = {n_games}) ===")
print(f"Expected wins: {n_games * p_win:.2f}")
print(f"Std deviation: {np.sqrt(n_games * p_win * (1 - p_win)):.2f}")
print(f"\nOver {line}: {over:.4f} ({over*100:.1f}%)")
print(f"Under {line}: {under:.4f} ({under*100:.1f}%)")
print(f"\n--- Full Distribution ---")
for w, prob in zip(wins, probs):
bar = "#" * int(prob * 200)
print(f" {w:2d} wins: {prob:.4f} {bar}")
# --- Streak Analysis ---
def streak_probability(n_games, p_win, streak_wins):
"""
Probability of winning at least streak_wins in n_games.
"""
return 1 - stats.binom.cdf(streak_wins - 1, n_games, p_win)
print("\n=== Streak Analysis (12-game window) ===")
n_window = 12
for p in [0.45, 0.50, 0.55, 0.60, 0.65, 0.70]:
for threshold in [8, 9, 10, 11]:
prob = streak_probability(n_window, p, threshold)
if threshold == 10: # Focus on 10+ wins
print(f" p = {p:.2f}: P(>= {threshold} wins in {n_window}) "
f"= {prob:.4f} ({prob*100:.2f}%)")
# --- Visualization: Binomial vs Normal Approximation ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Left: Season wins distribution
ax = axes[0]
ax.bar(wins, probs, color='steelblue', alpha=0.7, label='Binomial PMF')
# Normal approximation overlay
x_norm = np.linspace(0, n_games, 200)
mu = n_games * p_win
sigma = np.sqrt(n_games * p_win * (1 - p_win))
ax.plot(x_norm, stats.norm.pdf(x_norm, mu, sigma), 'r-', linewidth=2,
label=f'Normal approx N({mu:.1f}, {sigma:.1f}²)')
ax.axvline(x=line, color='green', linestyle='--', linewidth=1.5,
label=f'O/U line = {line}')
ax.set_xlabel('Season Wins', fontsize=12)
ax.set_ylabel('Probability', fontsize=12)
ax.set_title(f'NFL Season Win Distribution (p = {p_win})', fontsize=13)
ax.legend()
# Right: Probability of covering various O/U lines
ax = axes[1]
lines = np.arange(5.5, 15.5, 1.0)
for p in [0.50, 0.55, 0.60, 0.65, 0.70]:
over_probs = [over_under_win_total(n_games, p, l)[0] for l in lines]
ax.plot(lines, over_probs, 'o-', label=f'p = {p:.2f}')
ax.axhline(y=0.5, color='gray', linestyle=':', alpha=0.5)
ax.set_xlabel('Over/Under Line', fontsize=12)
ax.set_ylabel('P(Over)', fontsize=12)
ax.set_title('Over Probability by Win Rate and Line', fontsize=13)
ax.legend()
plt.tight_layout()
plt.savefig('binomial_season_wins.png', dpi=150, bbox_inches='tight')
plt.show()
7.3.6 The Gambler's Fallacy Connection
The binomial distribution assumes independence between trials. This means a team's probability of winning game $n+1$ is unaffected by the outcomes of games $1$ through $n$. The gambler's fallacy -- the belief that a losing streak makes a win "due" -- is a direct violation of this assumption.
In sports, there are legitimate reasons why independence might not hold perfectly (confidence, fatigue, injuries). But empirical research consistently shows that the effect of streaks on future performance is far smaller than most people believe. The binomial model's independence assumption is a much closer approximation to reality than the "hot hand" narrative suggests.
7.4 The Beta Distribution
The distributions we have studied so far -- normal, Poisson, binomial -- model observable outcomes (scores, goals, wins). The beta distribution serves a fundamentally different purpose: it models uncertainty about probabilities themselves. When we say a team has a "60% win probability," how confident are we in that number? Could it really be 55%? Or 65%? The beta distribution quantifies this uncertainty.
7.4.1 Properties
The beta distribution is a continuous distribution defined on the interval $[0, 1]$, making it naturally suited for modeling probabilities. It has two shape parameters, $\alpha > 0$ and $\beta > 0$, and its probability density function is:
$$ f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)} $$
where $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$ is the beta function and $\Gamma$ is the gamma function.
Key properties:
- Mean: $E[X] = \frac{\alpha}{\alpha + \beta}$
- Mode (for $\alpha > 1$ and $\beta > 1$): $\text{Mode}[X] = \frac{\alpha - 1}{\alpha + \beta - 2}$
- Variance: $\text{Var}(X) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$
- Concentration: The sum $\alpha + \beta$ controls how "peaked" the distribution is. Larger values mean more certainty; smaller values mean more spread.
The beta distribution is extraordinarily flexible. Depending on $\alpha$ and $\beta$, it can be:
- Uniform ($\alpha = \beta = 1$): Complete ignorance about the probability.
- Bell-shaped and symmetric ($\alpha = \beta > 1$): Centered at 0.5 with varying certainty.
- Skewed left or right ($\alpha \neq \beta$): Belief favoring probabilities above or below 0.5.
- U-shaped ($\alpha < 1, \beta < 1$): Belief that the probability is near 0 or near 1 but not in between.
7.4.2 The Beta-Binomial Model: Bayesian Updating
The beta distribution's most powerful application in sports betting is as a conjugate prior for the binomial distribution. This means that if our prior belief about a win probability $p$ follows a $\text{Beta}(\alpha, \beta)$ distribution, and we observe $k$ wins in $n$ trials, then our updated (posterior) belief about $p$ is:
$$ p \mid k, n \sim \text{Beta}(\alpha + k, \, \beta + n - k) $$
This is remarkably elegant. We simply add the observed wins to $\alpha$ and the observed losses to $\beta$. The posterior automatically balances our prior belief with the evidence from the data.
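The conjugacy is easy to verify directly. Multiplying the binomial likelihood by the beta prior and dropping constants that do not involve $p$:

$$ \underbrace{p^{k}(1-p)^{n-k}}_{\text{likelihood}} \times \underbrace{p^{\alpha-1}(1-p)^{\beta-1}}_{\text{prior}} = p^{(\alpha+k)-1}(1-p)^{(\beta+n-k)-1} $$

which is precisely the kernel of a $\text{Beta}(\alpha + k, \beta + n - k)$ density.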
The prior parameters $\alpha$ and $\beta$ can be interpreted as "pseudo-observations": $\alpha - 1$ prior wins and $\beta - 1$ prior losses. The total $\alpha + \beta$ controls the weight of the prior relative to the data. A prior with $\alpha + \beta = 10$ (weak prior) will be quickly overwhelmed by data, while $\alpha + \beta = 100$ (strong prior) requires much more evidence to shift.
7.4.3 Worked Example: Estimating a Team's True Win Probability
Setup: A new NBA season has started. Based on preseason analysis (roster changes, draft picks, last season's performance), we believe a team's true win probability is around 55%, but we are not very confident. We encode this as a $\text{Beta}(11, 9)$ prior, which has:
- Mean: $11/20 = 0.55$
- The sum $\alpha + \beta = 20$ represents moderate confidence (equivalent to about 20 games of prior information).
After 20 games, the team has gone 14-6. How should we update our belief?
Solution:
The posterior distribution is:
$$ p \mid \text{data} \sim \text{Beta}(11 + 14, \, 9 + 6) = \text{Beta}(25, 15) $$
The posterior parameters:
- Posterior mean: $\frac{25}{25 + 15} = \frac{25}{40} = 0.625$
- Posterior mode: $\frac{25 - 1}{25 + 15 - 2} = \frac{24}{38} = 0.6316$
- 95% credible interval: We can compute this using Python (approximately $[0.468, 0.765]$).
Interpretation: Our estimate of the team's true win probability has shifted from the prior mean of 0.55 to the posterior mean of 0.625. Note that the posterior mean (0.625) is between the prior mean (0.55) and the observed win rate ($14/20 = 0.70$). The Bayesian update has "shrunk" the observed win rate toward our prior, reflecting our belief that a 70% win rate in 20 games likely overstates the team's true ability.
This shrinkage property is crucial in sports betting. Small sample sizes produce noisy estimates. A team that starts 14-6 is probably not a true 70% team. Bayesian updating with a reasonable prior automatically corrects for this, producing more reliable estimates -- especially early in the season when data is scarce.
After 40 more games (let us say the team goes 23-17 in those 40 games, for a season record of 37-23):
$$ p \mid \text{all data} \sim \text{Beta}(11 + 37, \, 9 + 23) = \text{Beta}(48, 32) $$
- Posterior mean: $48/80 = 0.60$
- 95% credible interval: approximately $[0.489, 0.704]$
With more data, the credible interval has narrowed. Our estimate of 0.60 is now quite stable, and the prior's influence has diminished relative to the accumulated evidence.
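The credible intervals quoted in this example come directly from the beta quantile function; a one-line check with SciPy:

```python
from scipy import stats

# Equal-tailed 95% credible intervals for the two posteriors above
print(stats.beta.ppf([0.025, 0.975], 25, 15))  # after 20 games
print(stats.beta.ppf([0.025, 0.975], 48, 32))  # after 60 games
```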
7.4.4 When to Use the Beta Distribution in Betting
The beta distribution is useful whenever you need to estimate an unknown probability:
- Win probability estimation: As shown in the worked example, combining prior beliefs with observed records.
- Conversion rates: What fraction of a team's shots become goals? What is a pitcher's true strikeout rate?
- Betting system evaluation: If a bettor claims a 55% win rate on spread bets, the beta distribution quantifies how confident we should be based on their sample size.
- Market efficiency testing: Is a sportsbook's implied probability calibrated? The beta distribution can model the true probability underlying observed outcomes.
7.4.5 Python Implementation
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
def bayesian_win_probability(prior_alpha, prior_beta, wins, losses):
"""
Update a Beta prior with observed wins and losses.
Returns the posterior Beta distribution parameters and summary statistics.
"""
post_alpha = prior_alpha + wins
post_beta = prior_beta + losses
posterior = stats.beta(post_alpha, post_beta)
return {
'alpha': post_alpha,
'beta': post_beta,
'mean': posterior.mean(),
'mode': (post_alpha - 1) / (post_alpha + post_beta - 2)
if post_alpha > 1 and post_beta > 1 else None,
'std': posterior.std(),
'ci_95': posterior.ppf([0.025, 0.975]),
'distribution': posterior
}
# --- Worked Example: Tracking a team through the season ---
# Prior: Beta(11, 9) -- believe true win prob is ~55%
prior_a, prior_b = 11, 9
# Sequence of updates through the season
updates = [
("Preseason", 0, 0),
("After 10 games (7-3)", 7, 3),
("After 20 games (14-6)", 14, 6),
("After 40 games (25-15)", 25, 15),
("After 60 games (37-23)", 37, 23),
("After 82 games (50-32)", 50, 32),
]
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()
x = np.linspace(0, 1, 1000)
print("=== Bayesian Win Probability Updates ===\n")
for idx, (label, wins, losses) in enumerate(updates):
result = bayesian_win_probability(prior_a, prior_b, wins, losses)
print(f"{label}")
print(f" Record: {wins}-{losses} "
f"(observed rate: {wins/(wins+losses):.3f})" if wins + losses > 0
else f" No games played yet")
print(f" Posterior: Beta({result['alpha']}, {result['beta']})")
print(f" Mean: {result['mean']:.4f}")
if result['mode'] is not None:
print(f" Mode: {result['mode']:.4f}")
print(f" 95% CI: [{result['ci_95'][0]:.3f}, {result['ci_95'][1]:.3f}]")
print()
# Plot
ax = axes[idx]
pdf = result['distribution'].pdf(x)
ax.plot(x, pdf, 'b-', linewidth=2)
ax.fill_between(x, pdf, alpha=0.2)
ax.axvline(x=result['mean'], color='red', linestyle='--',
label=f"Mean = {result['mean']:.3f}")
if wins + losses > 0:
ax.axvline(x=wins/(wins+losses), color='green', linestyle=':',
label=f"Obs rate = {wins/(wins+losses):.3f}")
ax.set_title(label, fontsize=11)
ax.set_xlabel('Win Probability')
ax.set_ylabel('Density')
ax.legend(fontsize=8)
ax.set_xlim(0.2, 0.9)
plt.suptitle('Bayesian Updating of Win Probability Through the Season',
fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('beta_bayesian_updating.png', dpi=150, bbox_inches='tight')
plt.show()
# --- Practical Application: Should we bet on this team? ---
print("=== Betting Application ===\n")
# After 20 games (14-6), the sportsbook offers this team at -130 moneyline
# (implied probability = 130/230 = 56.5%)
result_20 = bayesian_win_probability(prior_a, prior_b, 14, 6)
implied_prob = 130 / 230
# What is P(true win prob > implied prob)?
# Posterior probability that the true win rate exceeds the implied probability
# (a posterior tail probability, not a frequentist p-value)
prob_exceeds = 1 - result_20['distribution'].cdf(implied_prob)
print(f"After 20 games (14-6):")
print(f"Our posterior mean: {result_20['mean']:.4f}")
print(f"Sportsbook implied probability: {implied_prob:.4f}")
print(f"P(true win prob > {implied_prob:.4f}): {prob_exceeds:.4f}")
print(f"\nInterpretation: There is a {prob_exceeds*100:.1f}% probability that the "
      f"team's true\nwin rate exceeds the sportsbook's implied probability.")
print(f"This suggests {'a potential edge' if prob_exceeds > 0.6 else 'no clear edge'}.")
7.5 Fitting Distributions to Real Sports Data
The distributions we have studied are models -- simplifications of reality. A model is useful only if it fits the data well enough for the purpose at hand. In this section, we develop the tools to assess whether a particular distribution adequately describes a given dataset.
7.5.1 Visual Assessment
Before running any formal tests, always visualize. Two plots are particularly informative:
Histogram with fitted PDF/PMF overlay: Plot the empirical distribution of your data alongside the theoretical distribution with fitted parameters. Visual discrepancies (e.g., the data has heavier tails, or peaks at certain values) are often immediately apparent.
QQ Plot (Quantile-Quantile Plot): A QQ plot compares the quantiles of your data against the quantiles of the theoretical distribution. If the data follows the distribution, the points will fall along a straight diagonal line. Systematic deviations reveal specific ways the fit fails:
- Points falling below the line at the left end and above it at the right end: heavier tails than the theoretical distribution.
- Points falling above the line at the left end and below it at the right end: lighter tails.
- A uniformly convex or concave curve across the whole range: skewness mismatch.
PP Plot (Probability-Probability Plot): Similar to a QQ plot but compares cumulative probabilities rather than quantiles. Less commonly used but sometimes more sensitive to deviations in the middle of the distribution.
7.5.2 Formal Goodness-of-Fit Tests
Three tests are commonly employed:
Chi-Squared Test: The classic test for categorical and discrete data. It compares observed frequencies in each bin against expected frequencies under the theoretical distribution:
$$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} $$
where $O_i$ is the observed count in bin $i$, $E_i$ is the expected count, and the sum is over all $k$ bins. Under the null hypothesis (data follows the distribution), $\chi^2$ follows a chi-squared distribution with $k - 1 - m$ degrees of freedom, where $m$ is the number of parameters estimated from the data.
Rules of thumb: each bin should have an expected count of at least 5. Merge bins at the tails if necessary.
Kolmogorov-Smirnov (KS) Test: Measures the maximum absolute difference between the empirical CDF and the theoretical CDF:
$$ D = \sup_x |F_n(x) - F(x)| $$
where $F_n$ is the empirical CDF and $F$ is the theoretical CDF. The KS test is most powerful for detecting overall distributional differences and works best for continuous distributions. One important caveat: the standard KS test assumes the theoretical distribution parameters are known in advance, not estimated from the data. When parameters are estimated from the data (as is usually the case), the test becomes conservative -- it is less likely to reject a poor fit. The Lilliefors correction addresses this for the normal distribution specifically.
Anderson-Darling Test: A refinement of the KS test that gives more weight to the tails of the distribution:
$$ A^2 = -n - \sum_{i=1}^{n} \frac{2i - 1}{n} \left[\ln F(X_{(i)}) + \ln(1 - F(X_{(n+1-i)}))\right] $$
where $X_{(i)}$ are the ordered data values. The Anderson-Darling test is generally more powerful than the KS test, especially for detecting deviations in the tails -- which is precisely where betting value often lies.
7.5.3 When Standard Distributions Don't Fit
Sometimes no single standard distribution fits well. Common reasons in sports data include:
- Overdispersion: The variance exceeds what the distribution predicts. For Poisson data, the negative binomial distribution accommodates overdispersion by introducing an extra parameter.
- Zero-inflation: More zeros than the distribution predicts. In some sports contexts (e.g., shutouts in low-scoring sports), a zero-inflated Poisson model may be needed.
- Mixture distributions: The data may be a mixture of two or more populations. For example, NFL game totals might be a mixture of distributions corresponding to different game types (offensive shootouts vs. defensive battles).
- Discrete scoring artifacts: In sports with discrete scoring increments (3 and 7 in football, 2 and 3 in basketball), the continuous normal approximation breaks down at key numbers.
- Heavy tails: Extreme outcomes (blowouts) may occur more frequently than a normal distribution predicts. The Student's t-distribution or other heavy-tailed distributions may fit better.
We will explore some of these more advanced models in later chapters, particularly Chapter 19 on advanced modeling techniques.
7.5.4 Comparing Distribution Fits Across Sports
Different sports lend themselves to different distributions:
| Sport | Data Type | Primary Distribution | Why It Works | Common Deviation |
|---|---|---|---|---|
| NFL | Point margins | Normal | Sum of many scoring events | Peaks at key numbers (3, 7) |
| NBA | Point margins | Normal | High-scoring, many possessions | Slight heavy tails |
| Soccer | Goals per team | Poisson | Rare, roughly independent events | Slight overdispersion |
| Hockey | Goals per team | Poisson | Similar to soccer | Overdispersion (power plays) |
| Baseball | Runs per team | Poisson / Neg. Binomial | Count data, but clustered | Notable overdispersion |
| Any sport | Season wins | Binomial | Repeated win/loss trials | Schedule heterogeneity |
| Any sport | Win probability | Beta | Unknown probability | N/A (modeling tool) |
7.5.5 Comprehensive Example: Which Distribution Best Fits NFL Game Totals?
This extended example walks through the complete process of fitting and comparing distributions to a real sports dataset.
The question: NFL game totals (the combined points scored by both teams) are the basis for one of the most popular betting markets. What distribution best describes them?
Candidate distributions:
- Normal distribution: $N(\mu, \sigma^2)$
- Poisson distribution: $\text{Poisson}(\lambda)$
- Negative binomial distribution: $\text{NegBin}(r, p)$
Reasoning: The total is a sum of many individual scoring events, suggesting normality (via CLT). But it is also a count of discrete points, suggesting Poisson. The negative binomial allows for overdispersion, which we might expect given the heterogeneity of NFL games.
7.5.6 Python Implementation
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# --- Generate realistic NFL total points data ---
# In practice, you would load actual data.
# NFL totals are approximately normal with mean ~44 and std ~14
np.random.seed(42)
# We simulate with slight overdispersion relative to Poisson
# by using a mixture of games with different expected totals
n_games = 512
game_types = np.random.choice([0, 1, 2], size=n_games, p=[0.25, 0.50, 0.25])
# Low-scoring games (~37), average games (~44), high-scoring (~51)
means = [37, 44, 51]
nfl_totals = np.array([
max(0, int(np.random.normal(means[gt], 10)))
for gt in game_types
])
print(f"=== NFL Game Totals: Descriptive Statistics ===")
print(f"N games: {len(nfl_totals)}")
print(f"Mean: {nfl_totals.mean():.2f}")
print(f"Median: {np.median(nfl_totals):.2f}")
print(f"Std Dev: {nfl_totals.std():.2f}")
print(f"Variance: {nfl_totals.var():.2f}")
print(f"Variance / Mean: {nfl_totals.var() / nfl_totals.mean():.2f} "
f"(= 1.0 for Poisson)")
print(f"Min: {nfl_totals.min()}, Max: {nfl_totals.max()}")
print(f"Skewness: {stats.skew(nfl_totals):.3f}")
print(f"Kurtosis: {stats.kurtosis(nfl_totals):.3f}")
# --- Fit each distribution ---
# 1. Normal
mu_norm, sigma_norm = stats.norm.fit(nfl_totals)
print(f"\n--- Normal Fit ---")
print(f"mu = {mu_norm:.2f}, sigma = {sigma_norm:.2f}")
# 2. Poisson
lambda_pois = nfl_totals.mean()
print(f"\n--- Poisson Fit ---")
print(f"lambda = {lambda_pois:.2f}")
# 3. Negative Binomial
# Parameterize: mean = r(1-p)/p, variance = r(1-p)/p^2
# Method of moments
sample_mean = nfl_totals.mean()
sample_var = nfl_totals.var()
if sample_var > sample_mean:
p_nb = sample_mean / sample_var
r_nb = sample_mean * p_nb / (1 - p_nb)
print(f"\n--- Negative Binomial Fit ---")
print(f"r = {r_nb:.2f}, p = {p_nb:.4f}")
else:
print("\nVariance <= Mean: Negative binomial not needed")
r_nb, p_nb = None, None
# --- Visual Comparison ---
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Top left: Histogram with fits
ax = axes[0, 0]
bins = np.arange(nfl_totals.min() - 0.5, nfl_totals.max() + 1.5, 1)
ax.hist(nfl_totals, bins=bins, density=True, alpha=0.5, color='steelblue',
label='Observed', edgecolor='black', linewidth=0.3)
x_cont = np.linspace(nfl_totals.min(), nfl_totals.max(), 200)
ax.plot(x_cont, stats.norm.pdf(x_cont, mu_norm, sigma_norm), 'r-',
linewidth=2, label=f'Normal({mu_norm:.1f}, {sigma_norm:.1f}²)')
x_disc = np.arange(nfl_totals.min(), nfl_totals.max() + 1)
ax.plot(x_disc, stats.poisson.pmf(x_disc, lambda_pois), 'g^-',
linewidth=1.5, markersize=4, label=f'Poisson({lambda_pois:.1f})')
if r_nb is not None:
ax.plot(x_disc, stats.nbinom.pmf(x_disc, r_nb, p_nb), 'ms-',
linewidth=1.5, markersize=4,
label=f'NegBin({r_nb:.1f}, {p_nb:.3f})')
ax.set_xlabel('Total Points')
ax.set_ylabel('Density')
ax.set_title('Distribution Fit Comparison')
ax.legend()
# Top right: QQ plot for Normal
ax = axes[0, 1]
stats.probplot(nfl_totals, dist="norm", plot=ax)
ax.set_title('QQ Plot: Normal Distribution')
# Bottom left: Observed vs Expected (chi-squared bins)
ax = axes[1, 0]
bin_edges = np.arange(0, 85, 5)
observed_counts, _ = np.histogram(nfl_totals, bins=bin_edges)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
# Expected counts under normal
expected_norm = len(nfl_totals) * np.diff(
stats.norm.cdf(bin_edges, mu_norm, sigma_norm))
width = 2.0
ax.bar(bin_centers - width/2, observed_counts, width=width, alpha=0.6,
color='steelblue', label='Observed')
ax.bar(bin_centers + width/2, expected_norm, width=width, alpha=0.6,
color='red', label='Expected (Normal)')
ax.set_xlabel('Total Points (binned)')
ax.set_ylabel('Count')
ax.set_title('Observed vs Expected: Normal Model')
ax.legend()
# Bottom right: Residuals
ax = axes[1, 1]
residuals = (observed_counts - expected_norm) / np.sqrt(
np.maximum(expected_norm, 1))
ax.bar(bin_centers, residuals, width=3, color='purple', alpha=0.6)
ax.axhline(y=0, color='black', linestyle='-')
ax.axhline(y=2, color='red', linestyle='--', alpha=0.5)
ax.axhline(y=-2, color='red', linestyle='--', alpha=0.5)
ax.set_xlabel('Total Points (binned)')
ax.set_ylabel('Standardized Residual')
ax.set_title('Standardized Residuals (Normal Model)')
plt.suptitle('NFL Game Totals: Distribution Fitting Analysis',
fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('nfl_totals_distribution_fit.png', dpi=150, bbox_inches='tight')
plt.show()
# --- Formal Goodness-of-Fit Tests ---
print("\n=== Formal Goodness-of-Fit Tests ===\n")
# 1. Kolmogorov-Smirnov test (Normal)
ks_stat, ks_pval = stats.kstest(nfl_totals, 'norm', args=(mu_norm, sigma_norm))
print(f"KS Test (Normal): statistic = {ks_stat:.4f}, p-value = {ks_pval:.4f}")
# 2. Anderson-Darling test (Normal)
ad_result = stats.anderson(nfl_totals, dist='norm')
print(f"Anderson-Darling (Normal): statistic = {ad_result.statistic:.4f}")
for sl, cv in zip(ad_result.significance_level, ad_result.critical_values):
reject = "REJECT" if ad_result.statistic > cv else "fail to reject"
print(f" At {sl}% significance: critical value = {cv:.4f} -> {reject}")
# 3. Chi-squared test (all distributions)
def chi_squared_gof(observed, expected, n_params):
"""
Perform chi-squared goodness-of-fit test.
Merges bins with expected count < 5.
"""
# Merge small bins
obs_merged, exp_merged = [], []
obs_accum, exp_accum = 0, 0
for o, e in zip(observed, expected):
obs_accum += o
exp_accum += e
if exp_accum >= 5:
obs_merged.append(obs_accum)
exp_merged.append(exp_accum)
obs_accum, exp_accum = 0, 0
if obs_accum > 0:
obs_merged[-1] += obs_accum
exp_merged[-1] += exp_accum
obs_merged = np.array(obs_merged)
exp_merged = np.array(exp_merged)
chi2 = np.sum((obs_merged - exp_merged)**2 / exp_merged)
df = len(obs_merged) - 1 - n_params
p_value = 1 - stats.chi2.cdf(chi2, df)
return chi2, df, p_value
# Normal
expected_norm = len(nfl_totals) * np.diff(
stats.norm.cdf(bin_edges, mu_norm, sigma_norm))
chi2_n, df_n, pval_n = chi_squared_gof(observed_counts, expected_norm, 2)
print(f"\nChi-squared (Normal): chi2 = {chi2_n:.2f}, df = {df_n}, "
f"p = {pval_n:.4f}")
# Poisson
expected_pois = len(nfl_totals) * np.diff(
stats.poisson.cdf(bin_edges, lambda_pois))
chi2_p, df_p, pval_p = chi_squared_gof(observed_counts, expected_pois, 1)
print(f"Chi-squared (Poisson): chi2 = {chi2_p:.2f}, df = {df_p}, "
f"p = {pval_p:.4f}")
# Negative Binomial
if r_nb is not None:
expected_nb = len(nfl_totals) * np.diff(
stats.nbinom.cdf(bin_edges, r_nb, p_nb))
chi2_nb, df_nb, pval_nb = chi_squared_gof(
observed_counts, expected_nb, 2)
print(f"Chi-squared (Neg Binom): chi2 = {chi2_nb:.2f}, df = {df_nb}, "
f"p = {pval_nb:.4f}")
# --- Summary ---
print("\n=== Distribution Comparison Summary ===")
print(f"{'Distribution':<20} {'Chi2':<10} {'df':<5} {'p-value':<10} "
f"{'Verdict':<15}")
print("-" * 60)
for name, chi2, df, pval in [
("Normal", chi2_n, df_n, pval_n),
("Poisson", chi2_p, df_p, pval_p),
("Neg. Binomial", chi2_nb, df_nb, pval_nb) if r_nb else (None,)*4
]:
if name:
verdict = "Good fit" if pval > 0.05 else "Poor fit"
print(f"{name:<20} {chi2:<10.2f} {df:<5} {pval:<10.4f} {verdict:<15}")
Key insight from this analysis: NFL game totals are typically well-modeled by a normal distribution, despite being technically discrete. The Poisson distribution fails because its mean-equals-variance constraint is too restrictive -- NFL totals have variance much larger than their mean (overdispersion). The negative binomial, which allows for overdispersion, provides a better discrete fit, but the normal distribution is often sufficient for practical betting calculations.
7.6 Chapter Summary
This chapter has equipped you with four probability distributions that together cover the vast majority of sports betting modeling scenarios. Let us consolidate what we have learned.
7.6.1 Distribution Quick-Reference Table
| Property | Normal | Poisson | Binomial | Beta |
|---|---|---|---|---|
| Type | Continuous | Discrete | Discrete | Continuous |
| Support | $(-\infty, +\infty)$ | $\{0, 1, 2, \ldots\}$ | $\{0, 1, \ldots, n\}$ | $[0, 1]$ |
| Parameters | $\mu, \sigma$ | $\lambda$ | $n, p$ | $\alpha, \beta$ |
| Mean | $\mu$ | $\lambda$ | $np$ | $\frac{\alpha}{\alpha+\beta}$ |
| Variance | $\sigma^2$ | $\lambda$ | $np(1-p)$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ |
| Key formula | $f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $P(k)=\frac{\lambda^k e^{-\lambda}}{k!}$ | $P(k)=\binom{n}{k}p^k(1-p)^{n-k}$ | $f(x)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$ |
| Primary use | Point spreads, margins | Goals, runs (counts) | Season wins, streaks | Unknown probabilities |
| Sports examples | NFL margins, NBA differentials | Soccer goals, hockey goals | Season records, ATS records | True win rates, conversion rates |
| Python | `stats.norm` | `stats.poisson` | `stats.binom` | `stats.beta` |
7.6.2 Decision Framework: Choosing the Right Distribution
When faced with a new modeling problem, ask yourself these questions in order:
1. Is the quantity continuous or discrete?
   - Continuous (margins, ratings, efficiency metrics) --> Consider normal (or t-distribution for heavy tails).
   - Discrete (counts, wins) --> Go to question 2.
2. Is it a count of events with no fixed upper bound?
   - Yes (goals, runs, penalties) --> Consider Poisson (or negative binomial if overdispersed).
   - No --> Go to question 3.
3. Is it a count of successes in a fixed number of trials?
   - Yes (wins in a season, covers in N bets) --> Use binomial.
   - No --> Go to question 4.
4. Are you modeling an unknown probability?
   - Yes (true win rate, true cover rate) --> Use beta.
5. Does the data fit?
   - Always validate with visual inspection and formal goodness-of-fit tests (Section 7.5).
   - If the fit is poor, consider extensions: negative binomial for overdispersed counts, t-distribution for heavy-tailed continuous data, mixture models for multimodal data.
7.6.3 Key Formulas Summary
Normal distribution -- cover probability:
$$ P(\text{Cover spread } s) = 1 - \Phi\left(\frac{s - \mu}{\sigma}\right) $$
Poisson -- exact score probability:
$$ P(\text{Score} = i\text{-}j) = \frac{\lambda_H^i \, e^{-\lambda_H}}{i!} \times \frac{\lambda_A^j \, e^{-\lambda_A}}{j!} $$
Binomial -- at least $k$ wins in $n$ games:
$$ P(X \geq k) = 1 - \sum_{i=0}^{k-1} \binom{n}{i} p^i (1-p)^{n-i} $$
Beta-Binomial update:
$$ \text{Prior: } \text{Beta}(\alpha, \beta) \xrightarrow{\text{observe } k \text{ wins, } n-k \text{ losses}} \text{Posterior: } \text{Beta}(\alpha + k, \beta + n - k) $$
Chi-squared goodness of fit:
$$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} $$
7.6.4 Common Pitfalls
- Assuming normality without checking: Just because the Central Limit Theorem suggests approximate normality does not mean the approximation is good enough for your purposes. Always validate, especially in the tails where betting value often lies.
- Ignoring overdispersion in Poisson models: If the variance of your count data substantially exceeds the mean, the Poisson model will underestimate the probability of extreme outcomes. This leads to systematically mispriced bets on high and low totals.
- Confusing the binomial's independence assumption with reality: The binomial assumes each game is independent, but injuries, schedule effects, and team dynamics create dependencies. Use the binomial as a baseline, not a final answer.
- Using too-strong priors with the beta distribution: If your prior is overly concentrated ($\alpha + \beta$ is very large), it will dominate the data and slow adaptation to new evidence. Start with a prior that is informative but flexible, typically with $\alpha + \beta$ between 10 and 30 for team-level estimates.
- Fitting distributions to combined data from heterogeneous populations: If you fit a single normal distribution to all NFL game totals without accounting for game context (indoor vs. outdoor, dome vs. wind, team quality), the fit will be poor. Conditioning on relevant factors before fitting usually yields better models.
- Over-relying on p-values from goodness-of-fit tests: A large sample will cause any goodness-of-fit test to reject because no real data perfectly follows any theoretical distribution. The practical question is not "does the data perfectly follow this distribution?" but rather "is this distribution a good enough approximation for my betting purposes?"
7.6.5 Practical Code Patterns
Throughout this chapter, we have developed several code patterns that recur frequently in betting analysis:
- `stats.norm.cdf()` and `1 - stats.norm.cdf()`: The workhorses for spread and totals betting under the normal model.
- `np.outer()` for Poisson score matrices: An elegant way to compute all exact-score probabilities simultaneously.
- `stats.binom.cdf()` for cumulative probabilities: Essential for season win totals and streak analysis.
- `stats.beta()` for Bayesian updating: The prior-to-posterior pipeline that powers dynamic team rating.
- `stats.probplot()` for QQ plots: The fastest way to visually assess distributional fit.
- Custom chi-squared GOF with bin merging: A production-ready test that handles the small-expected-count problem automatically.
Keep these patterns in your modeling toolkit. They form the building blocks for the more sophisticated models we will develop in Parts III and IV of this textbook.
7.6.6 Chapter Exercises
1. Spread betting: Your model predicts an NBA game margin of $+8$ for the home team with $\sigma = 12$. The spread is $-6.5$. Calculate the cover probability and determine whether there is value at $-110$.
2. Poisson soccer model: In a match between Team X (expected goals $\lambda = 1.8$) and Team Y (expected goals $\lambda = 1.2$), calculate: (a) the probability of a 1-1 draw, (b) the probability of Over 2.5 goals, (c) the most likely exact score.
3. Streak analysis: A baseball team has a true win probability of 0.54. What is the probability they win 8 or more of their next 10 games? What about 90 or more of 162 games?
4. Bayesian updating: You start with a $\text{Beta}(5, 5)$ prior for a bettor's win rate on totals bets. After 100 bets, they have gone 58-42. What is the posterior distribution? What is the probability their true win rate exceeds 55%?
5. Distribution fitting: Collect data on total points scored in NHL games from a recent season. Fit both a Poisson and a normal distribution. Which fits better? Is there evidence of overdispersion?
6. Multi-distribution modeling: For an NFL game where you project the margin to be Home $+4$ with $\sigma = 13.5$, compute: (a) the probability the home team wins outright, (b) the probability of covering $-3$, (c) the probability the game total goes Over 44.5 (assuming you model the total as $N(44, 14^2)$ independently of the margin).
What's Next: Chapter 8 -- Hypothesis Testing for Bettors
Now that we can model sports outcomes with appropriate probability distributions, the next question is: how do we test whether our models are actually right? And how do we determine whether a bettor's track record reflects genuine skill or is merely consistent with random chance?
Chapter 8 introduces hypothesis testing -- the formal statistical framework for answering these questions. We will learn to formulate null and alternative hypotheses, calculate test statistics and p-values, and apply these tools to real betting problems. Specific topics include:
- Testing whether a betting system beats the closing line: Is a reported 55% ATS record over 200 bets statistically significant?
- Testing whether a team's performance has truly changed: After a coaching change, has the team genuinely improved, or is the recent record within the range of normal variation?
- The multiple testing problem: When you test dozens of angles and systems, some will look profitable by chance alone. We will learn how to control the false discovery rate.
- Power analysis: How many bets do you need to track before you can reliably distinguish a 55% bettor from a 50% bettor?
These are among the most practically important statistical tools for any serious bettor. The distributions from this chapter provide the foundation; hypothesis testing provides the framework for drawing rigorous conclusions.
Chapter 7 is part of "The Sports Betting Textbook," Part II: Statistical Foundations. Prerequisites: Chapter 6 (Probability Fundamentals) and Chapter 2 (The Betting Markets). Estimated study time: 5 hours.