Chapter 24 Quiz: Simulation and Monte Carlo Methods
Instructions: Answer all 25 questions. This quiz is worth 100 points. You have 60 minutes. A calculator is permitted; no notes or internet access. For multiple choice, select the single best answer.
Section 1: Multiple Choice (10 questions, 3 points each = 30 points)
Question 1. In a Monte Carlo simulation with $N$ independent replications, the standard error of the mean estimate decreases at a rate of:
(A) $O(1/N)$
(B) $O(1/\sqrt{N})$
(C) $O(1/N^2)$
(D) $O(\log N / N)$
Answer
**(B) $O(1/\sqrt{N})$.** By the Central Limit Theorem, the standard error of the sample mean is $\sigma / \sqrt{N}$, which decreases at $O(1/\sqrt{N})$. This means halving the standard error requires quadrupling the number of simulations. This rate is dimension-free, meaning it applies regardless of the dimensionality of the simulation, which is a key advantage of Monte Carlo methods over deterministic numerical integration.
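The quadrupling rule is easy to check numerically. Below is a minimal sketch (the win probability, seed, and replication counts are arbitrary choices for illustration, not values from the chapter):

```python
import numpy as np

# Empirical check of the O(1/sqrt(N)) rate: quadrupling N should halve the SE.
rng = np.random.default_rng(0)
true_p = 0.55  # assumed "true" win probability, chosen only for this demo

for n in (10_000, 40_000):
    # 1,000 independent experiments of size n; the spread of the estimates
    # across experiments is the standard error of a single estimate.
    estimates = rng.binomial(n, true_p, size=1_000) / n
    print(f"N={n:>6}: empirical SE = {estimates.std(ddof=1):.5f}, "
          f"theoretical SE = {np.sqrt(true_p * (1 - true_p) / n):.5f}")
```

Question 2. When simulating a full NFL season to estimate playoff probabilities, which of the following is the most important modeling decision?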
(A) The choice of pseudorandom number generator algorithm
(B) Whether to simulate game outcomes as win/loss or as point margins
(C) The number of decimal places used for team ratings
(D) Whether to use Python or R for the simulation
Answer
**(B) Whether to simulate game outcomes as win/loss or as point margins.** Simulating point margins allows the model to capture information relevant to spread and totals betting, provides a natural mechanism for tiebreaking, and more realistically represents the continuous nature of game competitiveness. The choice of PRNG algorithm (A) has negligible impact on results for standard simulations. Decimal precision (C) and programming language (D) are implementation details that do not affect the statistical validity of the simulation.

Question 3. The bootstrap is used in sports betting analysis primarily to:
(A) Generate synthetic game data for model training
(B) Estimate the sampling distribution of performance metrics when analytical formulas are unavailable
(C) Improve the prediction accuracy of betting models
(D) Detect arbitrage opportunities across sportsbooks
Answer
**(B) Estimate the sampling distribution of performance metrics when analytical formulas are unavailable.** The bootstrap resamples the observed data with replacement to approximate the sampling distribution of a statistic. For betting metrics like ROI, Sharpe ratio, and maximum drawdown, analytical confidence intervals are either unavailable or rely on distributional assumptions that may not hold. The bootstrap provides distribution-free inference for these metrics. It does not generate new data for training (A), improve model accuracy (C), or detect arbitrage (D).

Question 4. In a permutation test evaluating whether a new prediction model is better than an old one, the null hypothesis is:
(A) Both models have the same prediction accuracy
(B) The new model is worse than the old model
(C) The labels "old model" and "new model" are exchangeable --- assigning predictions randomly to either label does not affect the test statistic
(D) The prediction errors follow a normal distribution
Answer
**(C) The labels "old model" and "new model" are exchangeable --- assigning predictions randomly to either label does not affect the test statistic.** The null hypothesis of a permutation test is specifically about exchangeability of labels, not about equality of parameters. Under the null, the assignment of predictions to "old" or "new" is arbitrary, and the observed difference in performance is due to chance. Option (A) is close but imprecise --- the permutation test does not assume a parametric model of "accuracy." Option (D) is incorrect because permutation tests make no distributional assumptions.
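To make the exchangeability idea concrete, here is a minimal sketch of a paired permutation test for comparing two models scored on the same games. The function name, the per-game loss inputs, and the sign-flip construction are illustrative assumptions, not code from the chapter:

```python
import numpy as np

def paired_permutation_pvalue(errors_old, errors_new, n_perm=10_000, seed=0):
    """Test H0: the 'old'/'new' labels are exchangeable within each game.

    Because both models are scored on the same games, swapping the labels
    on game i simply flips the sign of d_i = errors_old[i] - errors_new[i].
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(errors_old) - np.asarray(errors_new)
    observed = d.mean()
    # Random sign flips implement the per-game label swaps under H0.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    perm_stats = (signs * d).mean(axis=1)
    # Two-sided p-value; the +1 terms avoid reporting exactly zero (see Q13).
    r = np.sum(np.abs(perm_stats) >= abs(observed))
    return (r + 1) / (n_perm + 1)
```

Question 5. Antithetic variates reduce variance by: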
(A) Doubling the number of simulations
(B) Pairing each simulation with a negatively correlated counterpart so their average is closer to the true value
(C) Using historical data instead of simulated data
(D) Removing outliers from the simulation results
Answer
**(B) Pairing each simulation with a negatively correlated counterpart so their average is closer to the true value.** Antithetic variates generate pairs of simulations using uniform random variables $U$ and $1-U$. When the simulation output is a monotone function of $U$, the two outputs are negatively correlated: when one overestimates, the other tends to underestimate. The average of the pair has lower variance than the average of two independent simulations. Option (A) is incorrect because the efficiency gain comes from negative correlation, not just more simulations.
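A minimal sketch of the pairing, assuming a simple normal margin model (the mean of 2.5, the $\sigma$ of 13.5, and the sample sizes are illustrative). Both estimators use the same budget of function evaluations; the antithetic version pairs $U$ with $1-U$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma = 2.5, 13.5  # assumed margin model, chosen only for this demo
# Home-win indicator as a monotone function of the uniform draw u.
f = lambda u: (mu + sigma * norm.ppf(u) > 0).astype(float)

n_pairs, n_reps = 5_000, 500
naive_est, anti_est = [], []
for _ in range(n_reps):
    u = rng.random(2 * n_pairs)
    naive_est.append(f(u).mean())                    # 2*n_pairs independent draws
    v = rng.random(n_pairs)
    anti_est.append(0.5 * (f(v) + f(1 - v)).mean())  # same budget, antithetic pairs

print("naive SE:     ", np.std(naive_est, ddof=1))
print("antithetic SE:", np.std(anti_est, ddof=1))
```

Question 6. A bettor's bootstrap analysis shows a 95% BCa confidence interval for win rate of (0.498, 0.582). The bettor needs a win rate above 0.524 to be profitable at -110 odds. Which statement is most accurate?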
(A) The bettor is definitely profitable because the point estimate exceeds 0.524
(B) The bettor is definitely not profitable because the lower bound is below 0.524
(C) The data are consistent with both profitability and unprofitability --- the confidence interval includes both scenarios
(D) The bootstrap cannot address questions about profitability
Answer
**(C) The data are consistent with both profitability and unprofitability --- the confidence interval includes both scenarios.** The 95% CI spans from 0.498 (below breakeven) to 0.582 (above breakeven), meaning we cannot rule out either possibility with 95% confidence. The point estimate alone (A) ignores uncertainty. The lower bound (B) does not prove unprofitability --- it only shows that it is plausible. The bootstrap can absolutely address profitability questions (D) by computing P(win rate > 0.524) from the bootstrap distribution.

Question 7. When using control variates in a season simulation, a natural control variate is:
(A) A team's season win total, whose expected value is the sum of its per-game win probabilities
(B) The random seed used for the simulation
(C) The number of simulations run
(D) The home-field advantage parameter
Answer
**(A) A team's season win total, whose expected value is the sum of its per-game win probabilities.** A control variate must be a random variable that is correlated with the output of interest and has a known mean. A team's simulated win total qualifies on both counts: its expectation under the model is the sum of the team's per-game win probabilities, which can be computed analytically from the ratings, and it is strongly correlated with outputs such as the team's playoff probability. If a batch of simulations happens to produce more wins for the team than expected (simulation noise), the control-variate correction adjusts the playoff estimate downward accordingly. Options (B), (C), and (D) are constants, not random variables, and therefore cannot serve as control variates.
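A minimal sketch under assumed inputs (the per-game win probabilities, the 17-game schedule, and the 10-win playoff proxy are all invented for illustration): the team's win total is the control $X$ with known mean $\sum_i p_i$, and the playoff indicator is the target $Y$:

```python
import numpy as np

rng = np.random.default_rng(2)
p_games = rng.uniform(0.35, 0.65, size=17)  # assumed per-game win probabilities
ex_wins = p_games.sum()                     # known E[wins] under the model
threshold = 10                              # "makes playoffs" proxy for the demo

n_sims = 20_000
games = rng.random((n_sims, p_games.size)) < p_games  # simulate each season
wins = games.sum(axis=1)                    # control variate X (random, known mean)
made = (wins >= threshold).astype(float)    # target Y

# Optimal coefficient beta = Cov(Y, X) / Var(X), estimated from the simulations.
beta = np.cov(made, wins)[0, 1] / wins.var(ddof=1)
naive = made.mean()
adjusted = naive - beta * (wins.mean() - ex_wins)  # control-variate correction
print(f"naive: {naive:.4f}  adjusted: {adjusted:.4f}")
```

Question 8. A March Madness bracket simulation runs 50,000 replications and estimates that a 12-seed has a 4.2% chance of reaching the Sweet 16. The standard error of this estimate is approximately: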
(A) 0.09%
(B) 0.28%
(C) 0.90%
(D) 2.80%
Answer
**(A) 0.09%.** For a proportion estimate $\hat{p}$, the standard error is $\sqrt{\hat{p}(1-\hat{p})/N} = \sqrt{0.042 \times 0.958 / 50{,}000} = \sqrt{8.05 \times 10^{-7}} \approx 0.000897 \approx 0.09\%$. This small standard error indicates that 50,000 replications provide a precise estimate of this probability. Note that the SE would be larger for probabilities closer to 50% and smaller for more extreme probabilities.

Question 9. Which of the following is NOT a valid reason to prefer permutation tests over parametric tests for sports data?
(A) Sports data often has non-normal distributions
(B) Permutation tests make no distributional assumptions
(C) Permutation tests are always more powerful than parametric tests
(D) Permutation tests can use any test statistic, not just those with known null distributions
Answer
**(C) Permutation tests are always more powerful than parametric tests.** This is false. When parametric assumptions are met (e.g., the data are truly normal), parametric tests like the t-test are more powerful than permutation tests because they exploit the additional information contained in the distributional assumption. Permutation tests trade power for robustness: they work correctly regardless of the distribution, but they may require more data to detect the same effect. Options (A), (B), and (D) are all valid reasons to prefer permutation tests.

Question 10. Importance sampling is most valuable when:
(A) All simulated outcomes are equally likely
(B) The quantity of interest depends on rare events that naive simulation rarely generates
(C) The simulation function is very fast to evaluate
(D) The number of dimensions is very large
Answer
**(B) The quantity of interest depends on rare events that naive simulation rarely generates.** Importance sampling shines when estimating rare-event probabilities. For example, estimating the probability that a 35-win team wins the NBA championship might produce zero championships in 10,000 naive simulations. Importance sampling oversamples favorable outcomes and corrects with importance weights, producing a nonzero estimate with much lower variance. When outcomes are equally likely (A), importance sampling offers no advantage. Fast simulation functions (C) reduce the need for variance reduction. High dimensionality (D) is no argument for importance sampling; if anything, importance weights tend to degenerate as the dimension grows.
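The mechanics are easiest to see on a textbook rare event. This sketch (a standard-normal tail probability, not a sports example from the chapter) compares the naive estimator with a proposal shifted into the rare region:

```python
import numpy as np
from scipy.stats import norm

# Illustrative rare-event estimate: P(X > 4) for X ~ N(0, 1), true value ~3.17e-5.
rng = np.random.default_rng(3)
n = 100_000

x = rng.normal(0.0, 1.0, n)             # naive: samples rarely land past 4
naive = (x > 4).mean()

y = rng.normal(4.0, 1.0, n)             # proposal q shifted to the rare region
w = norm.pdf(y) / norm.pdf(y, loc=4.0)  # importance weights f(y) / q(y)
is_est = np.mean((y > 4) * w)

print(f"naive: {naive:.2e}   importance sampling: {is_est:.2e}   "
      f"true: {1 - norm.cdf(4):.2e}")
```

Section 2: True/False (5 questions, 3 points each = 15 points)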
Write "True" or "False." Full credit requires correct identification only.
Question 11. True or False: Increasing the number of Monte Carlo simulations from 10,000 to 40,000 reduces the standard error by half.
Answer
**True.** The standard error is proportional to $1/\sqrt{N}$. Since $\sqrt{40000/10000} = \sqrt{4} = 2$, the standard error with 40,000 simulations is half that with 10,000 simulations. This illustrates the $O(1/\sqrt{N})$ convergence rate: a 4x increase in simulations yields a 2x improvement in precision.

Question 12. True or False: The bootstrap can produce reliable confidence intervals for any statistic regardless of sample size.
Answer
**False.** The bootstrap requires a sample that is large enough to be representative of the population. With very small samples (e.g., 20 bets), the empirical distribution is a poor approximation of the true distribution, and bootstrap confidence intervals can have poor coverage. The bootstrap also struggles with statistics that depend on extreme order statistics (e.g., the maximum) when the sample size is small. As a rough guideline, bootstrap inference becomes reliable with sample sizes of at least 50-100 observations.

Question 13. True or False: In a permutation test, if the observed test statistic is more extreme than all 10,000 permutation statistics, the p-value is reported as zero.
Answer
**False.** The correct practice is to report the p-value as $(r + 1)/(N + 1)$ where $r$ is the number of permutation statistics at least as extreme as the observed statistic and $N$ is the number of permutations. If zero permutations exceed the observed statistic, the p-value is $1/(N+1) = 1/10001 \approx 0.0001$, not zero. Reporting exactly zero is incorrect because it implies an impossible event under the null hypothesis, which we cannot conclude from a finite number of permutations.

Question 14. True or False: Variance reduction techniques change the expected value of the Monte Carlo estimator.
Answer
**False.** A properly implemented variance reduction technique produces an estimator with the same expected value as the naive Monte Carlo estimator but with lower variance. The techniques modify how samples are drawn or how estimates are computed, but they do not introduce bias. If a variance reduction method changes the expected value, it is incorrectly implemented.

Question 15. True or False: When using the bootstrap to analyze betting performance, the bootstrap samples should preserve the temporal ordering of bets.
Answer
**False.** The standard non-parametric bootstrap resamples with replacement without regard to temporal ordering. This is appropriate when the quantity of interest (e.g., ROI, win rate) does not depend on the order of observations. However, for time-dependent statistics like maximum drawdown or streak length, the standard bootstrap may be inappropriate, and block bootstrap or other time-series bootstrap methods should be considered. As stated, the claim is false for the standard bootstrap applied to order-independent statistics.

Section 3: Fill in the Blank (3 questions, 4 points each = 12 points)
Question 16. The bootstrap method was introduced by statistician __________ in __________ and is named by analogy with the phrase "pulling oneself up by one's bootstraps."
Answer
**Bradley Efron** in **1979**. Efron's 1979 paper "Bootstrap Methods: Another Look at the Jackknife" introduced the bootstrap as a general-purpose method for estimating standard errors and confidence intervals. The metaphor refers to the seemingly impossible act of using the data itself to estimate the variability of statistics computed from that data. Efron later developed the BCa method (1987) and wrote the definitive textbook on the bootstrap with Robert Tibshirani (1993).

Question 17. Monte Carlo methods are named after the __________ casino in __________, reflecting the connection between random sampling and games of chance.
Answer
**Monte Carlo** casino in **Monaco**. The name was coined by Nicholas Metropolis and Stanislaw Ulam during their work on nuclear simulations at Los Alamos in the 1940s. The reference to the famous gambling destination in Monaco was both a nod to the role of randomness in the method and a convenient code name for classified wartime research. The methods were first developed to model neutron diffusion in nuclear weapons but quickly found applications in physics, engineering, finance, and eventually sports analytics.

Question 18. In the BCa bootstrap method, the two correction factors are the __________ correction $\hat{z}_0$ and the __________ factor $\hat{a}$, which adjust for the center and skewness of the bootstrap distribution respectively.
Answer
**Bias** correction $\hat{z}_0$ and the **acceleration** factor $\hat{a}$. The bias correction factor $\hat{z}_0$ measures how the median of the bootstrap distribution differs from the original estimate, correcting for systematic bias. The acceleration factor $\hat{a}$ captures how the standard error of the statistic changes as the parameter value changes, correcting for skewness. Together, they adjust the percentiles used for the confidence interval, producing intervals with better coverage than the simple percentile method, especially for skewed statistics like the Sharpe ratio.
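In practice the corrections rarely need to be coded by hand. A hedged sketch using SciPy's built-in bootstrap (available in SciPy 1.7+, where `method='BCa'` applies both corrections; the synthetic 500-bet record here is invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Synthetic -110 betting record: +$90.91 per win, -$100 per loss.
profits = rng.choice([90.91, -100.0], size=500, p=[0.55, 0.45])

def roi_pct(x, axis=-1):
    # Percent ROI on $100 flat stakes; must accept `axis` for vectorized use.
    return x.sum(axis=axis) / (x.shape[-1] * 100) * 100

res = stats.bootstrap((profits,), roi_pct, n_resamples=10_000,
                      confidence_level=0.95, method='BCa', random_state=rng)
print(res.confidence_interval)
```

Section 4: Short Answer (3 questions, 5 points each = 15 points)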
Answer each question in 3-5 sentences.
Question 19. Explain why a Monte Carlo estimate without a standard error or confidence interval is incomplete and potentially misleading. Use a sports betting example to illustrate.
Answer
A Monte Carlo estimate is a random quantity --- it varies from run to run depending on the random seed and number of simulations. Without a standard error, we have no way to assess how much the estimate might change if we ran the simulation again. For example, if a season simulation reports that Team X has a 12% championship probability, this could mean the true probability is anywhere from 10% to 14% (with a small SE) or from 5% to 19% (with a large SE). The betting decision --- whether to bet on Team X at +800 futures (implied 11.1%) --- depends critically on this uncertainty. Reporting only the point estimate creates false precision and can lead to bets on apparent edges that are entirely within the margin of simulation error. Always report the point estimate, standard error, and a confidence interval.

Question 20. Describe the key difference between a permutation test and a bootstrap hypothesis test, and explain when each is more appropriate.
Answer
A **permutation test** evaluates a null hypothesis by randomly reassigning group labels (e.g., "home" vs. "away") and computing the test statistic under each reassignment. It directly tests the null hypothesis that the labels are exchangeable. A **bootstrap hypothesis test** instead resamples from the data to estimate the sampling distribution of the test statistic and checks whether the observed statistic is extreme relative to this distribution. Permutation tests are more appropriate when you have a clear two-group comparison with an exchangeability null hypothesis (e.g., "does my new model outperform my old model on the same games?"). Bootstrap tests are more appropriate when you want to test whether a single quantity differs from a reference value (e.g., "is my win rate significantly above 52.4%?") or when you need confidence intervals rather than just p-values. In sports betting, permutation tests are ideal for model comparison, while bootstrap methods are ideal for performance evaluation.

Question 21. Explain why the effective sample size is an important diagnostic when using importance sampling, and what happens when it is very small relative to the actual sample size.
Answer
The **effective sample size** (ESS) measures how many independent, equally-weighted samples the importance-weighted sample is equivalent to. It is computed as $N_{\text{eff}} = (\sum w_i)^2 / \sum w_i^2$, where $w_i$ are the importance weights. When the proposal distribution $q$ is well-matched to the target, most weights are similar in magnitude and $N_{\text{eff}}$ is close to $N$. When $N_{\text{eff}}$ is very small (e.g., $N_{\text{eff}} = 50$ out of $N = 10{,}000$), it means a handful of samples with enormous weights dominate the estimate. This produces a high-variance estimator that is unreliable despite the large nominal sample size. In practice, if $N_{\text{eff}} / N < 0.1$, the importance sampling distribution is poorly chosen and should be redesigned. This diagnostic prevents the catastrophic failure mode where importance sampling actually increases variance relative to naive Monte Carlo.
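A minimal sketch of the diagnostic (the Kish formula from the answer; the weight vectors are toy examples):

```python
import numpy as np

def effective_sample_size(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

# Equal weights recover N; a few dominant weights collapse the ESS.
print(effective_sample_size(np.ones(10_000)))  # 10000.0
w = np.ones(10_000)
w[:5] = 1_000.0                                # five samples dominate
print(effective_sample_size(w))                # ~45: a red-flag diagnostic
```

Section 5: Code Analysis (2 questions, 6 points each = 12 points)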
Question 22. Examine the following bootstrap implementation:
```python
import numpy as np

def bootstrap_roi(bet_results, n_bootstrap=10000, seed=42):
    rng = np.random.default_rng(seed)
    n = len(bet_results)
    boot_rois = []
    for _ in range(n_bootstrap):
        sample = rng.choice(bet_results, size=n, replace=False)
        roi = sample.sum() / (n * 100) * 100
        boot_rois.append(roi)
    ci_lower = np.percentile(boot_rois, 2.5)
    ci_upper = np.percentile(boot_rois, 97.5)
    return ci_lower, ci_upper
```
(a) Identify a critical bug in the resampling step.
(b) The function uses the percentile method. Name one advantage and one disadvantage of this method compared to BCa.
Answer
**(a)** The critical bug is the explicit `replace=False` in the resampling step. The bootstrap requires sampling **with replacement**; with `replace=False` and `size=n`, each "bootstrap sample" is merely a random permutation of the original data, so every replicate has the same sum, every replicate produces the identical ROI, and the "confidence interval" collapses to a single point. The fix is `rng.choice(bet_results, size=n, replace=True)` (or simply omitting the argument, since `replace=True` is NumPy's default --- which is exactly why the explicit `replace=False` is easy to miss in review). **(b)** Advantage of the percentile method: it is simple to compute and understand --- just take percentiles of the bootstrap distribution. No additional computations (jackknife, bias correction) are needed. Disadvantage: the percentile method can have poor coverage when the bootstrap distribution is skewed. For betting metrics like ROI with skewed distributions (especially with small samples), the percentile interval may be shifted in the wrong direction, leading to actual coverage below the nominal 95%. The BCa method corrects for this skewness and generally achieves better coverage.
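For reference, a corrected version of the function, with only the resampling line changed:

```python
import numpy as np

def bootstrap_roi(bet_results, n_bootstrap=10_000, seed=42):
    rng = np.random.default_rng(seed)
    n = len(bet_results)
    boot_rois = []
    for _ in range(n_bootstrap):
        # Bootstrap resampling must be done WITH replacement.
        sample = rng.choice(bet_results, size=n, replace=True)
        roi = sample.sum() / (n * 100) * 100  # percent ROI on $100 flat stakes
        boot_rois.append(roi)
    return np.percentile(boot_rois, 2.5), np.percentile(boot_rois, 97.5)
```

Question 23. Examine the following season simulation code: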
```python
import numpy as np

def simulate_season(ratings, schedule, rng, margin_std=13.5):
    wins = {team: 0 for team in ratings}
    for game in schedule:
        home, away = game['home'], game['away']
        expected_margin = ratings[home] - ratings[away] + 3.0
        margin = rng.normal(expected_margin, margin_std)
        if margin > 0:
            wins[home] += 1
        else:
            wins[away] += 1
    return wins

results = []
for _ in range(10000):
    rng = np.random.default_rng(42)
    season = simulate_season(ratings, schedule, rng)
    results.append(season)
```
(a) Identify the critical bug in the simulation loop.
(b) What symptom would this bug produce in the results?
(c) Write the corrected version of the outer loop.
Answer
**(a)** The bug is that the random number generator is re-seeded with the same seed (42) inside every iteration of the simulation loop. This means every simulated season uses the exact same sequence of random numbers, producing identical results.

**(b)** All 10,000 simulated seasons would be identical. Every team would have the same win total in every simulation. The standard deviation of simulated outcomes would be exactly zero, and all probability estimates would be either 0% or 100%. The simulation would appear to have perfect precision but would be completely uninformative.

**(c)** Corrected version:

```python
rng = np.random.default_rng(42)  # Create RNG once, outside the loop
results = []
for _ in range(10000):
    season = simulate_season(ratings, schedule, rng)
    results.append(season)
```
The RNG should be initialized once before the loop. Each call to `rng.normal()` advances the generator's internal state, so successive simulations automatically produce different random sequences while remaining reproducible from the same initial seed.
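When replications must run in parallel, a single shared generator is no longer convenient. A hedged alternative (not part of the question) uses NumPy's `SeedSequence.spawn`, which exists for exactly this purpose, to give each replication its own independent yet reproducible stream:

```python
import numpy as np

# One child SeedSequence per replication: the streams are statistically
# independent of each other yet fully reproducible from the root seed.
root = np.random.SeedSequence(42)
child_rngs = [np.random.default_rng(s) for s in root.spawn(10_000)]
# Worker i can now call simulate_season(ratings, schedule, child_rngs[i]).
```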
Section 6: Applied Problems (2 questions, 8 points each = 16 points)
Question 24. A bettor has a track record of 500 bets at -110 odds with a win rate of 55.0%. They run a bootstrap analysis with 10,000 replicates.
(a) (2 points) Calculate the point estimate of ROI. (At -110 odds, a win returns +$90.91 profit on a $100 bet, and a loss costs $100.)
(b) (2 points) The bootstrap produces a distribution of ROI with mean 4.48% and standard deviation 3.12%. The BCa 95% CI is (-1.6%, 10.7%). Interpret this interval in the context of evaluating the bettor's skill.
(c) (2 points) From the bootstrap distribution, 92.4% of the ROI replicates are positive. Compute the implied Bayesian posterior probability that the bettor has a genuine positive edge, and explain any assumptions required.
(d) (2 points) The bettor wants to know: "How many more bets do I need to narrow my 95% CI to a width of 5 percentage points?" Using the bootstrap SE, estimate the required additional sample size.
Answer
**(a)** With 500 bets at a 55.0% win rate:

- Wins: 275; losses: 225
- Profit from wins: 275 × $90.91 = $25,000.25
- Loss from losses: 225 × $100 = $22,500.00
- Net profit: $2,500.25
- ROI = $2,500.25 / (500 × $100) = **5.00%**

**(b)** The BCa 95% CI of (-1.6%, 10.7%) means we are 95% confident that the bettor's true long-run ROI falls between -1.6% and +10.7%. Since the interval includes zero, we cannot rule out the possibility that the bettor has no genuine edge at the 95% confidence level. However, the interval is substantially skewed toward positive values, suggesting that a positive edge is more likely than a negative one. The bettor shows promise but has not yet proven profitability beyond a reasonable doubt.

**(c)** If we assume a flat (uniform) prior on the bettor's true ROI, the bootstrap distribution approximates the posterior distribution. Under this assumption, the posterior probability of a genuine positive edge is approximately **92.4%**. This requires the assumptions that (a) the bets are approximately independent, (b) the empirical distribution of bet outcomes is a reasonable approximation of the true distribution, and (c) the flat prior is appropriate (which may not be justified --- most bettors are not profitable, so a skeptical prior might be warranted).

**(d)** The current 95% CI width is approximately 10.7% - (-1.6%) = 12.3 percentage points with $n = 500$. We want a width of 5 percentage points. CI width is proportional to $1/\sqrt{n}$, so:

$$\frac{5}{12.3} = \sqrt{\frac{500}{n_{\text{new}}}} \quad\Rightarrow\quad n_{\text{new}} = 500 \times (12.3/5)^2 = 500 \times 6.05 \approx 3{,}025$$

Additional bets needed: 3,025 − 500 = **2,525 additional bets**.

Question 25. You are tasked with estimating the probability that the worst team in an 8-team tournament (power rating 5 points below average) wins the championship. The tournament is single elimination with three rounds.
(a) (2 points) Set up the naive Monte Carlo approach. If each game outcome is determined by a margin drawn from $N(\text{rating difference}, 10^2)$, write the probability that the weakest team wins one game against the strongest team (rating +5 above average, so a 10-point gap).
(b) (3 points) You run 100,000 naive simulations and the weakest team wins the championship 312 times. Compute the probability estimate, its standard error, and a 95% confidence interval.
(c) (3 points) Propose an importance sampling strategy for this problem. Specifically, describe how you would modify the simulation to make the weakest team win more often, and write the form of the importance weight.
Answer
**(a)** The probability that the weakest team (rating -5) beats the strongest team (rating +5) in a single game: the margin is $Z \sim N(-10, 10^2)$, so

$$P(\text{win}) = P(Z > 0) = P\left(\frac{Z - (-10)}{10} > \frac{0 - (-10)}{10}\right) = 1 - \Phi(1.0) = 0.1587$$

The weakest team has approximately a **15.9%** chance of winning a single game against the strongest team.

**(b)** From 100,000 simulations with 312 championships:

- Probability estimate: $\hat{p} = 312 / 100{,}000 = 0.00312$
- Standard error: $SE = \sqrt{\hat{p}(1-\hat{p})/N} = \sqrt{0.00312 \times 0.99688 / 100{,}000} = \sqrt{3.11 \times 10^{-8}} = 0.000176$
- 95% CI: $0.00312 \pm 1.96 \times 0.000176 = (0.00278, 0.00347)$

The weakest team has approximately a **0.31%** chance of winning the tournament, with a 95% CI of **(0.28%, 0.35%)**.

**(c)** Importance sampling strategy: modify the simulation so that game margins are drawn from $N(\text{rating difference} + \delta, 10^2)$, where $\delta$ is a boost applied to the weakest team's games. For example, adding $\delta = 8$ points to the weakest team's expected margin in every game makes their wins much more common. The importance weight for a complete tournament simulation is

$$w = \prod_{i=1}^{k} \frac{f(m_i \mid \mu_i)}{q(m_i \mid \mu_i + \delta)}$$

where $m_i$ is the simulated margin in game $i$, $f$ is the density under the true model (mean $\mu_i$), $q$ is the density under the importance sampling distribution (mean $\mu_i + \delta$), and $k$ is the number of games the weakest team plays. For normal distributions, this simplifies to

$$w = \prod_{i=1}^{k} \exp\left(-\frac{\delta(2m_i - 2\mu_i - \delta)}{2\sigma^2}\right)$$

This produces many more "championship wins" for the weakest team, each with a small importance weight, resulting in a lower-variance estimate of the rare-event probability.
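A minimal implementation of the weight from part (c), assuming the normal-shift proposal described above (the function and variable names are illustrative):

```python
import numpy as np

def tournament_weight(margins, mus, delta, sigma=10.0):
    """Importance weight for one simulated tournament run.

    margins: margins m_i actually drawn from N(mu_i + delta, sigma^2)
    mus:     true means mu_i (rating differences) for the weak team's games
    """
    m, mu = np.asarray(margins, float), np.asarray(mus, float)
    # Closed-form log ratio of normal densities f(m | mu) / q(m | mu + delta).
    log_w = -delta * (2 * m - 2 * mu - delta) / (2 * sigma ** 2)
    return np.exp(log_w.sum())

# One game of the weak team (-5) vs. the strongest (+5): mu = -10, boost = 8.
rng = np.random.default_rng(5)
m = rng.normal(-10.0 + 8.0, 10.0)  # draw from the boosted proposal
print(tournament_weight([m], [-10.0], 8.0))
```

Scoring Summary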
| Section | Questions | Points Each | Total |
|---|---|---|---|
| 1. Multiple Choice | 10 | 3 | 30 |
| 2. True/False | 5 | 3 | 15 |
| 3. Fill in the Blank | 3 | 4 | 12 |
| 4. Short Answer | 3 | 5 | 15 |
| 5. Code Analysis | 2 | 6 | 12 |
| 6. Applied Problems | 2 | 8 | 16 |
| Total | 25 | --- | 100 |
Grade Thresholds
| Grade | Score Range | Percentage |
|---|---|---|
| A | 90-100 | 90-100% |
| B | 80-89 | 80-89% |
| C | 70-79 | 70-79% |
| D | 60-69 | 60-69% |
| F | 0-59 | 0-59% |