Chapter 24 Key Takeaways: Simulation and Monte Carlo Methods

Key Concepts

  1. Monte Carlo Simulation: A method for estimating expectations, probabilities, and distributions by generating random samples from a model. The estimate $\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^N g(X_i)$ converges to the true value $E[g(X)]$ as $N \to \infty$ by the Law of Large Numbers. The standard error decreases at $O(1/\sqrt{N})$, meaning precision is expensive: halving the error requires quadrupling the simulations.

  2. Season and Tournament Simulation: Monte Carlo simulation applied to full sports seasons and playoff brackets. Takes team power ratings as input and produces probability distributions for wins, playoff berths, seedings, and championships. This is the standard approach used by professional forecasters and the backbone of futures market analysis.

  3. The Bootstrap: A resampling method that estimates the sampling distribution of a statistic by drawing samples with replacement from the observed data. For betting, it provides confidence intervals for ROI, win rate, Sharpe ratio, and other metrics, including those that lack simple analytical confidence interval formulas.

  4. BCa Bootstrap Intervals: The Bias-Corrected and Accelerated bootstrap confidence interval, which adjusts for both bias and skewness in the bootstrap distribution. The BCa method produces more accurate coverage than the simpler percentile method, especially for skewed statistics like the Sharpe ratio and profit factor.

  5. Permutation Tests: Distribution-free hypothesis tests that evaluate the significance of an observed effect by comparing it to the distribution of the test statistic under random permutations of the data labels. Permutation tests make no parametric distributional assumptions (they require only that the labels be exchangeable under the null hypothesis) and can use any test statistic, making them well suited to sports data with non-normal distributions.

  6. Variance Reduction: Techniques that improve the efficiency of Monte Carlo simulation by reducing the variance of the estimator without changing its expected value. The four main techniques are antithetic variates (negatively correlated pairs of draws), control variates (adjustment using a correlated variable with known mean), importance sampling (oversampling rare events and reweighting by the likelihood ratio), and stratified sampling (dividing the sample space into strata and sampling each one).

  7. Convergence Diagnostics: Methods for assessing whether a Monte Carlo simulation has run long enough to produce reliable estimates. Key diagnostics include running mean plots, running standard error, and comparisons of successive confidence intervals.

  8. Reproducibility: The practice of setting and recording random seeds so that simulation results can be exactly reproduced. Essential for debugging, peer review, and regulatory compliance.
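
As an illustration of item 3 above, the following is a minimal sketch of a percentile bootstrap interval for ROI. The function name `bootstrap_roi_ci`, the synthetic bet record, and the odds and win rate are assumptions made for this example, not values from the chapter. A BCa interval (item 4) would add bias and acceleration corrections that are not shown here.

```python
import numpy as np

def bootstrap_roi_ci(profits, stakes, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for ROI = total profit / total stake.

    Hypothetical helper for illustration; resamples whole bets (rows).
    """
    profits = np.asarray(profits, dtype=float)
    stakes = np.asarray(stakes, dtype=float)
    rng = np.random.default_rng(seed)  # recorded seed for reproducibility
    n = profits.size
    # One row of resampled bet indices (with replacement) per replicate.
    idx = rng.integers(0, n, size=(n_boot, n))
    replicates = profits[idx].sum(axis=1) / stakes[idx].sum(axis=1)
    lo, hi = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    point = profits.sum() / stakes.sum()
    return point, lo, hi

# Synthetic record (made up): 500 unit-stake bets at decimal odds 1.91
# with an assumed 54% win rate.
demo_rng = np.random.default_rng(42)
wins = demo_rng.random(500) < 0.54
stakes = np.ones(500)
profits = np.where(wins, 0.91, -1.0)

point, lo, hi = bootstrap_roi_ci(profits, stakes)
print(f"ROI = {point:.3f}, 95% percentile CI = [{lo:.3f}, {hi:.3f}]")
```

Because the seed is passed in explicitly, calling the function twice with the same arguments returns the same interval, which is the reproducibility property described in item 8.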


Key Formulas

| Formula | Expression | Application |
|---|---|---|
| MC Estimate | $\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^N g(X_i)$ | Point estimate of any expectation |
| MC Standard Error | $SE = \hat{\sigma} / \sqrt{N}$ | Uncertainty of MC estimate |
| MC 95% CI | $\hat{\mu}_N \pm 1.96 \cdot \hat{\sigma}/\sqrt{N}$ | Confidence interval for estimate |
| Probability Estimate | $\hat{P}(A) = \frac{1}{N}\sum \mathbf{1}_A(X_i)$ | Proportion of simulations where event $A$ occurs |
| Win Probability (logistic) | $P = \frac{1}{1+10^{-(R_A - R_B + H)/s}}$ | Game outcome probability from ratings |
| Normal Margin Model | $M \sim N(R_A - R_B + H, \sigma^2)$ | Simulating point margins |
| Bootstrap Replicate | $\hat{\theta}^* = T(X_1^*, \ldots, X_n^*)$ | Statistic from resample |
| Percentile CI | $[\hat{\theta}^*_{(\alpha/2)}, \hat{\theta}^*_{(1-\alpha/2)}]$ | Simple bootstrap CI |
| Permutation P-Value | $p = \frac{1}{N}\sum \mathbf{1}(T_\pi \geq T_{\text{obs}})$ | Proportion of permutations exceeding observed |
| Control Variate | $\hat{\mu}_{CV} = \hat{\mu}_N - \beta^*(\hat{C}_N - \mu_C)$ | Adjusted estimate using known-mean variable |
| CV Variance | $\text{Var}(\hat{\mu}_{CV}) = \text{Var}(\hat{\mu}_N)(1 - \rho^2)$ | Variance reduction from correlation |
| Efficiency Gain | $\text{Eff} = \text{Var}_\text{naive} / \text{Var}_\text{reduced}$ | Factor of improvement |
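
The permutation p-value formula above can be sketched as a one-sided, two-sample test on the difference in means. The function name `permutation_p_value` and the two synthetic groups of per-bet profits are hypothetical and chosen only for illustration; the choice of test statistic is an assumption, since any statistic could be substituted.

```python
import numpy as np

def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """One-sided permutation test: is mean(a) - mean(b) unusually large?

    Returns the observed statistic and the proportion of label
    permutations whose statistic is >= the observed one.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([a, b])
    t_obs = a.mean() - b.mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffle the group labels
        t_perm = perm[:a.size].mean() - perm[a.size:].mean()
        count += int(t_perm >= t_obs)
    return t_obs, count / n_perm

# Synthetic per-bet profits (made up): two strategies, 300 bets each.
demo_rng = np.random.default_rng(1)
strategy_a = demo_rng.normal(0.05, 1.0, size=300)
strategy_b = demo_rng.normal(0.00, 1.0, size=300)

t_obs, p_value = permutation_p_value(strategy_a, strategy_b)
print(f"observed difference = {t_obs:.4f}, one-sided p = {p_value:.4f}")
```

A two-sided version would compare absolute values of the statistic instead.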

Quick-Reference Simulation Workflow

When building a Monte Carlo simulation for sports betting, follow this six-step workflow:

Step 1 --- Define the quantity of interest. Specify exactly what probability, expectation, or distribution you want to estimate. Examples: championship probability, expected profit of a staking strategy, distribution of season win totals.

Step 2 --- Build the simulation model. Implement a function that generates one realization of the random process. For season simulations, this means simulating every game. For betting performance, this means generating one bootstrap resample. Ensure the model uses the NumPy Generator API with a recorded seed.

Step 3 --- Run a pilot simulation. Run a small number of replications (1,000-5,000) to estimate the variance of the quantity of interest. Use this to determine how many replications are needed for the desired precision.
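
One way to turn the pilot variance into a replication count is to invert the confidence-interval half-width, $N \approx (z\hat{\sigma}/\varepsilon)^2$. The sketch below assumes a target half-width `half_width` and uses a hypothetical helper `required_replications`; the pilot data and the 0.12 title probability are made up for the demo.

```python
import math
import numpy as np

def required_replications(pilot_values, half_width, z=1.96):
    """Smallest N with z * sigma_hat / sqrt(N) <= half_width."""
    sigma_hat = np.std(np.asarray(pilot_values, dtype=float), ddof=1)
    return math.ceil((z * sigma_hat / half_width) ** 2)

# Hypothetical pilot: 2,000 simulated seasons, each recording whether a
# given team won the title (assumed true probability 0.12).
pilot_rng = np.random.default_rng(7)
pilot = pilot_rng.random(2_000) < 0.12

n_needed = required_replications(pilot, half_width=0.005)
print(f"pilot estimate = {pilot.mean():.3f}, replications needed = {n_needed:,}")
```

The count scales with the inverse square of the half-width, so halving the target half-width roughly quadruples the required replications.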

Step 4 --- Run the full simulation with convergence tracking. Execute the full simulation, recording running means and running standard errors at regular intervals. Monitor convergence to ensure the estimate has stabilized.
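
Running means and running standard errors can be computed from cumulative sums without re-scanning the sample. This is a minimal sketch; the helper name `running_diagnostics`, the checkpoint values, and the simulated indicator with probability 0.30 are assumptions for illustration.

```python
import numpy as np

def running_diagnostics(values, checkpoints):
    """Running mean and running standard error at the given sample sizes."""
    values = np.asarray(values, dtype=float)
    n = np.arange(1, values.size + 1)
    mean = np.cumsum(values) / n
    sum_sq = np.cumsum(values ** 2)
    # Sample variance (ddof=1) is undefined for n = 1, so leave NaN there.
    var = np.full_like(mean, np.nan)
    var[1:] = (sum_sq[1:] - n[1:] * mean[1:] ** 2) / (n[1:] - 1)
    se = np.sqrt(np.maximum(var, 0.0) / n)
    idx = np.asarray(checkpoints) - 1  # checkpoints are 1-based sample sizes
    return mean[idx], se[idx]

# Hypothetical simulation output: indicator of an event with p = 0.30.
sim_rng = np.random.default_rng(11)
sims = sim_rng.random(100_000) < 0.30
checkpoints = [1_000, 10_000, 100_000]

means, ses = running_diagnostics(sims, checkpoints)
for n_ck, m, s in zip(checkpoints, means, ses):
    print(f"N={n_ck:>7,d}  running mean={m:.4f}  running SE={s:.4f}")
```

If the estimate has stabilized, the running SE should shrink by roughly $\sqrt{10}$ for each tenfold increase in $N$, and the running mean should move by no more than a few standard errors between checkpoints.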

Step 5 --- Apply variance reduction (if applicable). If the naive simulation is too slow or imprecise, apply the most appropriate variance reduction technique. Control variates are the default choice for season simulations; antithetic variates work well for monotone functions; importance sampling is reserved for rare-event estimation.
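
The control variate formulas from the Key Formulas table can be sketched on a single-game example. Here the quantity of interest is the probability that a margin exceeds 7 points, and the margin itself (whose mean is known by construction) serves as the control. The margin parameters, the 7-point line, and the helper name `control_variate_estimate` are assumptions for illustration. Note that $\beta^*$ is estimated from the same sample, which introduces a small bias that vanishes as $N$ grows.

```python
import math
import numpy as np

def control_variate_estimate(g_vals, c_vals, mu_c):
    """Naive estimate, control-variate estimate, and estimated efficiency."""
    g_vals = np.asarray(g_vals, dtype=float)
    c_vals = np.asarray(c_vals, dtype=float)
    cov = np.cov(g_vals, c_vals, ddof=1)   # 2x2 sample covariance matrix
    beta = cov[0, 1] / cov[1, 1]           # estimated optimal coefficient
    naive = g_vals.mean()
    cv = naive - beta * (c_vals.mean() - mu_c)
    rho = cov[0, 1] / math.sqrt(cov[0, 0] * cov[1, 1])
    efficiency = 1.0 / (1.0 - rho ** 2)    # Var_naive / Var_reduced
    return naive, cv, efficiency

# Hypothetical margin model: M ~ N(3, 13.5^2); estimate P(M > 7).
mu_margin, sigma_margin, line = 3.0, 13.5, 7.0
cv_rng = np.random.default_rng(3)
margins = cv_rng.normal(mu_margin, sigma_margin, size=100_000)
covers = (margins > line).astype(float)

naive, cv, eff = control_variate_estimate(covers, margins, mu_c=mu_margin)
# Exact value from the normal CDF, available here only because the model is simple.
true_p = 0.5 * math.erfc((line - mu_margin) / (sigma_margin * math.sqrt(2)))
print(f"naive={naive:.4f}  control variate={cv:.4f}  exact={true_p:.4f}  efficiency={eff:.2f}")
```

An efficiency of about 2.5 means the control-variate estimate reaches the same precision with roughly 40% of the replications.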

Step 6 --- Report results with uncertainty. Always report the point estimate, standard error, and a 95% confidence interval. For probability estimates, also report the number of simulations and the effective sample size (if variance reduction was used).

The core principle: A simulation result without a standard error is a random number, not an estimate. Always quantify and report the simulation uncertainty alongside the point estimate.
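
As a sketch of what such a report might contain, the example below bundles the point estimate, standard error, 95% confidence interval, number of simulations, and seed. The function name `mc_probability_report` and the margin model parameters are hypothetical.

```python
import numpy as np

def mc_probability_report(indicator, seed):
    """Summarize a simulated event indicator as an estimate with uncertainty."""
    x = np.asarray(indicator, dtype=float)
    n = x.size
    p_hat = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)
    return {
        "estimate": p_hat,
        "std_error": se,
        "ci95": (p_hat - 1.96 * se, p_hat + 1.96 * se),
        "n_sims": n,
        "seed": seed,  # recorded so the run can be reproduced exactly
    }

# Hypothetical model: team A wins when the margin M ~ N(3, 13.5^2) is positive.
seed = 2024
rng = np.random.default_rng(seed)
margins = rng.normal(loc=3.0, scale=13.5, size=50_000)

report = mc_probability_report(margins > 0, seed)
lo, hi = report["ci95"]
print(f"P(A wins) = {report['estimate']:.4f} +/- {report['std_error']:.4f} "
      f"(95% CI [{lo:.4f}, {hi:.4f}], N = {report['n_sims']:,}, seed = {report['seed']})")
```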


Ready for Chapter 25? Self-Assessment Checklist

Before moving on to Chapter 25 ("Optimization Methods for Betting"), confirm that you can do the following:

  • [ ] Build a Monte Carlo simulation from scratch using NumPy's Generator API with proper seeding
  • [ ] Estimate probabilities and expectations via simulation with correct standard errors
  • [ ] Simulate a complete sports season given team power ratings and a schedule
  • [ ] Simulate single-elimination and multi-round playoff brackets
  • [ ] Implement the non-parametric bootstrap to construct confidence intervals for any statistic
  • [ ] Compute BCa bootstrap intervals and explain why they are preferred over percentile intervals
  • [ ] Design and execute permutation tests for two-sample, paired, and trend hypotheses
  • [ ] Implement antithetic variates and control variates for variance reduction
  • [ ] Explain when importance sampling is appropriate and diagnose when it fails (via effective sample size)
  • [ ] Assess convergence of a simulation using running mean and running SE diagnostics

If you can check every box with confidence, you are well prepared for Chapter 25, where optimization methods will transform the probabilistic insights from simulation into optimal betting decisions.