Chapter 7: Key Takeaways — Probability Distributions in Betting
1. The Right Distribution for the Right Problem
Selecting the correct probability distribution is the foundation of any quantitative betting model. The choice depends on the nature of the data:
- Poisson for counts of rare, independent events in a fixed interval (soccer goals, hockey goals, strikeouts per game).
- Binomial for the number of successes in a fixed number of independent trials (season wins, free throw makes, parlay legs).
- Normal (Gaussian) for continuous measurements that arise from the sum of many small effects (point spread margins, total scores in high-scoring sports, season-long aggregates).
- Beta for modeling uncertainty about a probability parameter (a team's true win rate, a player's true shooting percentage).
Misapplying a distribution leads to systematically biased probability estimates and poor betting decisions.
2. The Poisson Distribution Is the Backbone of Low-Scoring Sports Models
The Poisson distribution, parameterized by a single rate lambda, provides a natural model for goal-scoring in soccer and hockey. Its key properties are:
- The mean and variance are both equal to lambda.
- It assumes events are independent and occur at a constant rate.
- The probability of k events is P(X = k) = (lambda^k * e^(-lambda)) / k!.
For soccer betting, independent Poisson models for each team generate a scoreline probability matrix that prices the 1X2 market, Over/Under, Both Teams to Score, and Correct Score markets. The Dixon-Coles extension addresses the model's known weakness: the slight dependence between teams' goal outputs at low scores.
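As a minimal sketch of the independent-Poisson approach described above (without the Dixon-Coles correction), the following builds a scoreline matrix and collapses it into a few market probabilities. The expected-goal rates 1.6 and 1.1, the function names, and the truncation at 10 goals per team are illustrative assumptions, not values from this chapter.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def scoreline_matrix(home_rate, away_rate, max_goals=10):
    """matrix[h][a] = P(home scores h and away scores a), assuming independence."""
    home = [poisson_pmf(h, home_rate) for h in range(max_goals + 1)]
    away = [poisson_pmf(a, away_rate) for a in range(max_goals + 1)]
    return [[ph * pa for pa in away] for ph in home]

def market_probs(matrix):
    """Collapse the scoreline matrix into 1X2, Over 2.5, and BTTS probabilities."""
    n = len(matrix)
    cells = [(h, a, matrix[h][a]) for h in range(n) for a in range(n)]
    return {
        "home": sum(p for h, a, p in cells if h > a),
        "draw": sum(p for h, a, p in cells if h == a),
        "away": sum(p for h, a, p in cells if h < a),
        "over_2_5": sum(p for h, a, p in cells if h + a > 2.5),
        "btts": sum(p for h, a, p in cells if h >= 1 and a >= 1),
    }

# Hypothetical expected-goal rates, chosen only for illustration.
probs = market_probs(scoreline_matrix(1.6, 1.1))
for market, p in probs.items():
    print(f"{market}: {p:.4f}")
```

Because the matrix is truncated, the three 1X2 probabilities sum to very slightly less than 1; raising max_goals shrinks that gap. A Correct Score price is simply a single cell of the matrix.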
3. The Binomial Distribution Governs Season-Level Outcomes
When a team plays n games with a per-game win probability p, the total wins follow a Binomial(n, p) distribution:
- Expected wins: E[X] = n * p.
- Standard deviation: SD = sqrt(n * p * (1 - p)).
- This distribution directly prices season win total Over/Under markets.
The binomial also provides the null hypothesis for streak analysis. Observed win or loss streaks that fall within the binomial expectation are not evidence of momentum; they are consistent with randomness alone.
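The season win total calculation in this section can be sketched as follows. The 17-game season, the 55% per-game win probability, and the 8.5 line are hypothetical inputs, and the sketch assumes the same win probability in every game and independence between games, which real schedules only approximate.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def prob_over(line, n, p):
    """P(wins > line) under Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if k > line)

# Hypothetical inputs for illustration only.
n_games, p_win, line = 17, 0.55, 8.5

expected_wins = n_games * p_win                       # E[X] = n * p
sd_wins = math.sqrt(n_games * p_win * (1 - p_win))    # SD = sqrt(n * p * (1 - p))
p_over = prob_over(line, n_games, p_win)
p_under = 1 - p_over   # a half-point line cannot push

print(f"expected wins {expected_wins:.2f}, sd {sd_wins:.2f}")
print(f"P(over {line}) = {p_over:.4f}, P(under {line}) = {p_under:.4f}")
```

For a whole-number line, the probability of landing exactly on the line (a push) is binom_pmf(line, n, p) and would need to be separated from the Under.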
4. The Normal Distribution Underpins Spread and Totals Markets
Point spread margins in the NFL are well approximated by a normal distribution: the final margin measured relative to the line has a mean near zero and a standard deviation of approximately 13.86 points. This model enables:
- Calculating cover probabilities for any spread.
- Pricing alternate spreads and teasers.
- Estimating the probability of outright wins, pushes, and backdoor covers.
- Assessing Over/Under probabilities for point totals.
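The Central Limit Theorem justifies the normal approximation for aggregate scores: the total points in a basketball or football game are the sum of many individual scoring events, and a suitably standardized sum of many independent random variables with finite variance converges to a normal distribution. The approximation weakens when scoring events are few or strongly dependent.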
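The cover, win, and push calculations listed in this section can be sketched with the standard library's NormalDist. The 13.86-point standard deviation is the figure quoted above; the function names, the sign convention for the spread, and the half-point continuity correction used for pushes are assumptions made for this illustration.

```python
from statistics import NormalDist

NFL_MARGIN_SD = 13.86  # standard deviation quoted in this chapter

def cover_probability(expected_margin, spread, sd=NFL_MARGIN_SD):
    """P(team covers), with the team's margin ~ Normal(expected_margin, sd).

    spread is the team's handicap, e.g. -3.0 for a 3-point favorite.
    The team covers when margin + spread > 0. Pushes are ignored here
    because the model is continuous.
    """
    margin = NormalDist(mu=expected_margin, sigma=sd)
    return 1 - margin.cdf(-spread)

def win_probability(expected_margin, sd=NFL_MARGIN_SD):
    """P(outright win) is the cover probability at a spread of zero."""
    return cover_probability(expected_margin, 0.0, sd)

def push_probability(expected_margin, spread, sd=NFL_MARGIN_SD):
    """Approximate P(margin lands exactly on the number).

    Assumption: a half-point continuity correction on each side of the
    number. Half-point spreads cannot push.
    """
    if float(spread) != round(spread):
        return 0.0
    margin = NormalDist(mu=expected_margin, sigma=sd)
    return margin.cdf(-spread + 0.5) - margin.cdf(-spread - 0.5)

# Hypothetical team expected to win by 3 points.
print(f"cover -3.0: {cover_probability(3.0, -3.0):.4f}")
print(f"cover -7.0 (alternate): {cover_probability(3.0, -7.0):.4f}")
print(f"outright win: {win_probability(3.0):.4f}")
print(f"push on 3: {push_probability(3.0, -3.0):.4f}")
```

Actual NFL margins cluster on key numbers such as 3 and 7, so a smooth normal curve is likely to understate push probabilities at those numbers. Treat the push estimate as a rough baseline rather than a price.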
5. The Beta Distribution Enables Bayesian Reasoning
The Beta distribution is defined on the interval [0, 1] and is the conjugate prior to the Binomial likelihood. This means:
- If the prior is Beta(alpha, beta) and we observe s successes and f failures, the posterior is Beta(alpha + s, beta + f).
- The posterior mean is (alpha + s) / (alpha + beta + s + f).
- Credible intervals quantify the remaining uncertainty about the true parameter.
In betting, the Beta distribution is used to model uncertainty about a team's true win probability, a player's true shooting percentage, or any other proportion. As more data accumulates, the posterior concentrates around the true value, and the choice of prior matters less.
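The conjugate update above reduces to two additions, as this minimal sketch shows. The Beta(10, 10) prior and the 12-8 record are hypothetical. The credible interval is approximated by Monte Carlo sampling with random.betavariate because the standard library has no Beta quantile function; an exact quantile routine from a statistics library would be the usual choice.

```python
import random

def beta_update(alpha, beta, successes, failures):
    """Posterior parameters after observing binomial data under a Beta prior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def credible_interval(alpha, beta, level=0.95, draws=20000, seed=0):
    """Approximate equal-tailed credible interval from sorted posterior samples."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1 - level) / 2
    lo = samples[int(tail * draws)]
    hi = samples[int((1 - tail) * draws) - 1]
    return lo, hi

# Hypothetical: a prior centered on 50%, then a 12-8 record.
prior_alpha, prior_beta = 10, 10
post_alpha, post_beta = beta_update(prior_alpha, prior_beta, successes=12, failures=8)
post_mean = beta_mean(post_alpha, post_beta)
lo, hi = credible_interval(post_alpha, post_beta)

print(f"posterior: Beta({post_alpha}, {post_beta}), mean {post_mean:.3f}")
print(f"approx. 95% credible interval: ({lo:.3f}, {hi:.3f})")
```

The posterior mean of 0.55 sits between the prior mean (0.50) and the observed rate (0.60), which is the shrinkage the prior is meant to provide with small samples.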
6. Overdispersion Is a Real Concern
The Poisson distribution constrains the variance to equal the mean. When real data exhibits more variability than the Poisson predicts (overdispersion), the model underestimates the probability of extreme outcomes. The Negative Binomial distribution addresses this by introducing a dispersion parameter that allows the variance to exceed the mean. Always test for overdispersion before committing to a Poisson model, for example by comparing the sample variance of the counts to their sample mean.
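One simple check is the dispersion index (sample variance divided by sample mean), which should be near 1 for Poisson data. The sketch below computes it and, when the data look overdispersed, fits a Negative Binomial by the method of moments. The counts are made-up illustration data, and method-of-moments fitting is an assumption chosen for brevity; maximum likelihood is the more common choice in practice.

```python
import math
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance / sample mean; approximately 1 under a Poisson model."""
    return variance(counts) / mean(counts)

def negbin_params(counts):
    """Method-of-moments Negative Binomial fit: returns (r, p).

    Parameterization: mean = r * (1 - p) / p, variance = r * (1 - p) / p**2.
    """
    m, v = mean(counts), variance(counts)
    if v <= m:
        raise ValueError("no overdispersion: variance does not exceed the mean")
    return m * m / (v - m), m / v

def negbin_pmf(k, r, p):
    """P(X = k) for a Negative Binomial with real-valued r, via log-gamma."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Made-up counts for illustration only.
counts = [0, 1, 1, 2, 0, 5, 3, 0, 7, 1, 2, 0, 4, 1, 6]
index = dispersion_index(counts)
r, p = negbin_params(counts)

# Compare the upper-tail probability P(X >= 7) under each model.
lam = mean(counts)
poisson_tail = 1 - sum(poisson_pmf(k, lam) for k in range(7))
negbin_tail = 1 - sum(negbin_pmf(k, r, p) for k in range(7))
print(f"dispersion index {index:.2f}")
print(f"P(X >= 7): Poisson {poisson_tail:.4f}, Negative Binomial {negbin_tail:.4f}")
```

With only 15 observations the index is noisy, so a value above 1 in a small sample is a prompt for a formal test rather than proof of overdispersion.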
7. Distribution Fitting Requires Rigorous Validation
Fitting a distribution to data is not enough; you must validate the fit:
- Goodness-of-fit tests (chi-squared, Kolmogorov-Smirnov, Anderson-Darling) test whether the data is consistent with the proposed distribution.
- Information criteria (AIC, BIC) compare models while penalizing complexity.
- Visual inspection (Q-Q plots, probability plots, histogram overlays) catches problems that formal tests may miss.
- Calibration analysis checks whether events you predict with probability p actually occur in roughly a fraction p of cases.
A model that fits the center of the distribution well but misses the tails will perform poorly on bets that depend on extreme outcomes (alternate spreads, large Over/Under lines, longshot parlays).
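The calibration check in the list above can be sketched as a binned comparison of predicted and observed frequencies. The equal-width binning, the function name, and the tiny data set are assumptions for illustration; a real analysis needs far more predictions per bin to be meaningful.

```python
def calibration_table(predictions, outcomes, n_bins=10):
    """Bin predictions into equal-width bins on [0, 1].

    Returns one (mean predicted probability, observed frequency, count)
    tuple per non-empty bin, ordered from lowest bin to highest.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predictions, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    table = []
    for group in bins:
        if group:
            mean_pred = sum(p for p, _ in group) / len(group)
            observed = sum(y for _, y in group) / len(group)
            table.append((mean_pred, observed, len(group)))
    return table

# Made-up predictions and outcomes (1 = event happened), for illustration only.
predictions = [0.1, 0.1, 0.1, 0.1, 0.7, 0.7, 0.7, 0.7]
outcomes = [0, 0, 0, 1, 1, 1, 1, 0]

table = calibration_table(predictions, outcomes, n_bins=5)
for mean_pred, observed, count in table:
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

A well-calibrated model produces rows where the predicted and observed columns are close. Sparse bins at the extremes are where the tail problems described above tend to show up.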
8. Tail Probabilities Matter More Than You Think
In sports betting, many of the most profitable opportunities involve tail probabilities: alternate spreads, extreme totals, longshot correct scores. Standard distributions (Normal, Poisson) often underestimate tail probabilities because real-world sports data tends to have heavier tails than these models predict. Strategies that depend on tail accuracy, such as teasers, alternate lines, and prop bets, require careful attention to the true shape of the distribution in its extremes.
9. Streaks and Randomness
The human brain is wired to see patterns in randomness. The binomial distribution shows that long win and loss streaks are a natural consequence of independent trials at a fixed probability. Before attributing a streak to "momentum" or "being due," check whether the observed streak falls within the range that pure randomness produces. In most cases, it does.
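To check whether a streak is surprising, it helps to compute how often pure randomness produces one. The sketch below uses a small dynamic program over the current run length to get the exact probability of at least one win streak of a given length in n independent games. The 82-game season and the 50% win probability are hypothetical inputs.

```python
def prob_streak_at_least(n, p, length):
    """P(at least one run of `length` or more consecutive wins in n games).

    Assumes independent games with constant win probability p.
    state[j] is the probability of currently being on a win run of j
    (j < length) without having completed a qualifying streak yet.
    """
    state = [1.0] + [0.0] * (length - 1)
    hit = 0.0  # probability that a qualifying streak has already occurred
    for _ in range(n):
        new_state = [0.0] * length
        for run, prob in enumerate(state):
            new_state[0] += prob * (1 - p)       # a loss resets the run
            if run + 1 == length:
                hit += prob * p                  # streak completed
            else:
                new_state[run + 1] += prob * p   # run extends by one win
        state = new_state
    return hit

# Hypothetical: a .500 team over an 82-game season.
for streak in (5, 7, 10):
    prob = prob_streak_at_least(82, 0.5, streak)
    print(f"P(win streak of {streak}+ in 82 games) = {prob:.4f}")
```

The same function gives losing-streak probabilities if p is replaced by the loss probability 1 - p.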
10. From Distributions to Decisions
Probability distributions are not ends in themselves; they are tools for making better decisions under uncertainty. The workflow for the quantitative bettor is:
- Select the appropriate distribution for the variable of interest.
- Estimate the distribution's parameters from historical data.
- Validate the fit using goodness-of-fit tests and calibration checks.
- Calculate the probability of the relevant betting outcome.
- Compare your probability to the market's implied probability.
- Bet when your edge exceeds a meaningful threshold, sizing according to Kelly or fractional Kelly.
Every step in this chain depends on a solid understanding of probability distributions.
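The last two steps of the workflow, comparing against the market and sizing the bet, can be sketched as follows. The decimal odds, the 2% minimum edge, and the half-Kelly multiplier are hypothetical choices for illustration. The implied probability here is the raw 1/odds figure, which still contains the bookmaker's margin.

```python
def implied_probability(decimal_odds):
    """Raw market-implied probability (includes the bookmaker's margin)."""
    return 1.0 / decimal_odds

def edge(model_prob, decimal_odds):
    """Expected profit per unit staked, according to the model."""
    return model_prob * decimal_odds - 1.0

def kelly_fraction(model_prob, decimal_odds, multiplier=1.0):
    """Kelly stake as a fraction of bankroll: f = (b * p - q) / b.

    b is the net odds (decimal odds minus 1) and q = 1 - p. A multiplier
    below 1 gives fractional Kelly. Negative-edge bets get a stake of 0.
    """
    b = decimal_odds - 1.0
    full_kelly = (model_prob * b - (1 - model_prob)) / b
    return max(0.0, full_kelly) * multiplier

def recommend_stake(model_prob, decimal_odds, min_edge=0.02, multiplier=0.5):
    """Return a bankroll fraction, or 0.0 if the edge is below the threshold."""
    if edge(model_prob, decimal_odds) < min_edge:
        return 0.0
    return kelly_fraction(model_prob, decimal_odds, multiplier)

# Hypothetical: the model says 55%, the market offers even money (2.00).
p_model, odds = 0.55, 2.00
print(f"implied probability: {implied_probability(odds):.3f}")
print(f"edge per unit staked: {edge(p_model, odds):.3f}")
print(f"full Kelly: {kelly_fraction(p_model, odds):.3f}")
print(f"half Kelly after threshold: {recommend_stake(p_model, odds):.3f}")
```

Fractional Kelly is used in the sketch because the model probability is itself an estimate, and full Kelly sizing is sensitive to overestimated edges.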
End of Key Takeaways — Chapter 7