Chapter 10 Quiz: Bayesian Thinking for Bettors
Instructions: Answer all 25 questions. This quiz is worth 100 points. You have 60 minutes. A calculator is permitted; no notes or internet access. For multiple choice, select the single best answer.
Section 1: Multiple Choice (10 questions, 3 points each = 30 points)
Question 1. In Bayesian inference, the "prior" represents:
(A) The probability of the data given the hypothesis
(B) Your belief about a parameter before observing new data
(C) The probability of the data under all possible hypotheses
(D) The final updated belief after observing data
Answer
**(B) Your belief about a parameter before observing new data.** The prior distribution encodes what you know (or believe) about a parameter before incorporating the current data. For example, before an NFL season begins, your prior for a team's win probability might be based on last season's record, offseason changes, and expert assessments. Option (A) describes the likelihood, (C) describes the marginal likelihood (evidence), and (D) describes the posterior.

Question 2. A sports bettor uses a Beta(10, 10) prior for a team's win probability. This prior implies:
(A) The bettor has no information about the team's ability
(B) The bettor believes the team is equally likely to have any win rate between 0 and 1
(C) The bettor believes the team's win rate is centered around 50% with moderate confidence
(D) The bettor is certain the team's win rate is exactly 50%
Answer
**(C) The bettor believes the team's win rate is centered around 50% with moderate confidence.** Beta(10, 10) has a mean of 10/20 = 0.50 and an effective sample size of 20. This represents moderate prior confidence (equivalent to having observed 10 wins in 20 games) centered at 50%. Options (A) and (B) both describe an uninformative uniform prior such as Beta(1, 1). Option (D) would require a degenerate prior concentrated at 0.50.
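To make the contrast concrete, the short scipy sketch below (illustrative, not part of the graded quiz) compares Beta(10, 10) with the uniform Beta(1, 1): both are centered at 0.50, but their spreads differ sharply.

```python
# Compare a moderate-confidence prior with an uninformative one.
from scipy.stats import beta

for a, b in [(10, 10), (1, 1)]:
    prior = beta(a, b)
    lo, hi = prior.interval(0.95)  # central 95% probability interval
    print(f"Beta({a}, {b}): mean={prior.mean():.3f}, "
          f"sd={prior.std():.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

Beta(10, 10) concentrates most of its mass between roughly 0.29 and 0.71, while Beta(1, 1) spreads uniformly over the whole unit interval.

Question 3. Bayes' theorem states: $P(H|D) = \frac{P(D|H) \cdot P(H)}{P(D)}$. In this formula, $P(D|H)$ is called the: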
(A) Prior
(B) Posterior
(C) Likelihood
(D) Evidence
Answer
**(C) Likelihood.** $P(D|H)$ is the likelihood --- the probability of observing the data $D$ given that hypothesis $H$ is true. In a sports context, if $H$ is "the team's true win rate is 60%" and $D$ is "the team went 7-3," then the likelihood is the binomial probability of observing 7 wins in 10 games with a 60% win probability. The prior is $P(H)$, the posterior is $P(H|D)$, and the evidence is $P(D)$.
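That binomial likelihood is a one-liner to check with scipy (a sketch, not part of the graded quiz):

```python
# P(D | H): probability of a 7-3 record if the true win rate is 60%.
from scipy.stats import binom

likelihood = binom.pmf(7, 10, 0.60)
print(f"P(7 wins in 10 | p = 0.60) = {likelihood:.4f}")  # about 0.215
```

Question 4. In a Beta-Binomial model, if the prior is Beta(a, b) and you observe $w$ wins and $l$ losses, the posterior is: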
(A) Beta(a + l, b + w)
(B) Beta(a + w, b + l)
(C) Beta(w, l)
(D) Beta(a * w, b * l)
Answer
**(B) Beta(a + w, b + l).** This is the conjugate update rule for the Beta-Binomial model. The first parameter $\alpha$ accumulates successes (wins) and the second parameter $\beta$ accumulates failures (losses). The prior pseudocounts ($a$ wins and $b$ losses) are simply added to the observed data. This is one of the most elegant and computationally simple results in Bayesian statistics.

Question 5. The concept of "Bayesian shrinkage" in sports refers to:
(A) Reducing the sample size to avoid overfitting
(B) Pulling extreme estimates toward a population mean, especially with small samples
(C) Removing outlier games from the dataset before analysis
(D) Using a smaller prior standard deviation to increase model confidence
Answer
**(B) Pulling extreme estimates toward a population mean, especially with small samples.** Bayesian shrinkage occurs naturally when a prior centered on the population mean is combined with limited data. A team that starts 4-0 (100% win rate) is shrunk toward the league average (50%) because the prior provides regularization. The degree of shrinkage depends on the relative strength of the prior versus the data: with few observations, the prior dominates; with many observations, the data dominates. This prevents extreme early-season estimates from producing wild predictions.
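The 4-0 example is easy to reproduce. The sketch below assumes an illustrative Beta(10, 10) league-average prior (that choice is ours, not the question's):

```python
# Bayesian shrinkage: a 4-0 start blended with a league-average prior.
a_prior, b_prior = 10, 10   # illustrative prior centered at 50%, effective N = 20
wins, losses = 4, 0

a_post = a_prior + wins     # conjugate Beta-Binomial update
b_post = b_prior + losses
raw_rate = wins / (wins + losses)
post_mean = a_post / (a_post + b_post)

print(f"Raw win rate:   {raw_rate:.3f}")   # 1.000
print(f"Posterior mean: {post_mean:.3f}")  # 14/24 = 0.583, pulled toward 0.500
```

Question 6. A 95% Bayesian credible interval for a team's true win probability is [0.45, 0.68]. The correct interpretation is: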
(A) If we repeated the season 100 times, the team's record would fall in this range 95 times
(B) There is a 95% probability that the team's true win probability lies between 0.45 and 0.68, given the observed data and prior
(C) 95% of teams with similar records have win probabilities in this range
(D) The team will win between 45% and 68% of their remaining games
Answer
**(B) There is a 95% probability that the team's true win probability lies between 0.45 and 0.68, given the observed data and prior.** This is the key advantage of Bayesian credible intervals over frequentist confidence intervals. The Bayesian interval directly answers the question a bettor cares about: "Given what I know, what range of true win probabilities is plausible?" The frequentist confidence interval has a more convoluted interpretation about repeated sampling procedures. Option (D) is a prediction interval (a different concept).

Question 7. Which of the following is a conjugate prior for the Poisson distribution (used to model goals in soccer)?
(A) Normal distribution
(B) Beta distribution
(C) Gamma distribution
(D) Uniform distribution
Answer
**(C) Gamma distribution.** The Gamma distribution is the conjugate prior for the Poisson likelihood. If the prior is Gamma($\alpha$, $\beta$) and you observe a total of $s$ events in $n$ observations, the posterior is Gamma($\alpha + s$, $\beta + n$). This is particularly useful for modeling soccer goals, hockey goals, or any count-based sports outcome where the rate parameter is of interest.
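A minimal sketch of that Gamma-Poisson update (the prior and goal counts are illustrative assumptions; note that scipy parameterizes the Gamma by shape and scale = 1/rate):

```python
# Gamma-Poisson conjugate update for a goals-per-match rate.
from scipy.stats import gamma

alpha, rate = 15.0, 10.0        # illustrative prior: mean 15/10 = 1.5 goals/match
goals, matches = 9, 5           # observe 9 goals across 5 matches

alpha_post = alpha + goals      # add total events...
rate_post = rate + matches      # ...and total exposure
posterior = gamma(a=alpha_post, scale=1 / rate_post)
print(f"Posterior mean rate: {posterior.mean():.3f} goals/match")  # 24/15 = 1.600
```

Question 8. In a hierarchical Bayesian model for team ratings, the "hyperparameters" represent: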
(A) The individual team-level parameters (offensive and defensive ratings)
(B) The league-wide distribution from which individual team parameters are drawn
(C) The data used to fit the model
(D) The computational settings for the MCMC sampler
Answer
**(B) The league-wide distribution from which individual team parameters are drawn.** In a hierarchical model, individual team parameters (e.g., offensive strength) are modeled as draws from a league-wide distribution. The hyperparameters are the parameters of this league-wide distribution (e.g., the mean and variance of the league's offensive strength distribution). This structure allows information to be shared across teams: a team with few games benefits from the league-wide pattern. This is what produces hierarchical shrinkage.

Question 9. You have a prior belief that a bettor's win rate is Beta(52, 48) (centered at 52%, effective N = 100). After observing 200 bets with 112 wins, the posterior will be:
(A) Dominated by the prior because it has more pseudocounts
(B) Dominated by the data because 200 observations outweigh the prior's effective sample size of 100
(C) Exactly equal to the raw observed win rate of 56%
(D) Impossible to compute without MCMC
Answer
**(B) Dominated by the data because 200 observations outweigh the prior's effective sample size of 100.** The posterior is Beta(52 + 112, 48 + 88) = Beta(164, 136). The posterior mean is 164/300 = 54.7%. The prior mean was 52%, and the data mean was 112/200 = 56%. The posterior is a weighted average, with the data contributing 200/300 = 67% of the weight and the prior contributing 100/300 = 33%. The data dominates because its effective sample size (200) exceeds the prior's (100).
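The weighted-average arithmetic in this answer can be verified directly:

```python
# Posterior mean as a weighted average of prior mean and observed rate.
a, b = 52, 48            # prior Beta(52, 48), effective N = 100
wins, losses = 112, 88   # 200 observed bets
n = wins + losses

post_mean = (a + wins) / (a + b + n)       # 164/300
prior_weight = (a + b) / (a + b + n)       # 100/300
blend = prior_weight * a / (a + b) + (1 - prior_weight) * wins / n

print(f"Posterior mean: {post_mean:.4f}")  # 0.5467
print(f"Weighted blend: {blend:.4f}")      # identical by construction
```

Question 10. The primary advantage of Bayesian methods for early-season sports predictions (before much data is available) is: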
(A) Bayesian methods require less computational power than frequentist methods
(B) Bayesian methods can incorporate prior information to stabilize estimates when sample sizes are small
(C) Bayesian methods always produce more accurate predictions than frequentist methods
(D) Bayesian methods eliminate the need for any historical data
Answer
**(B) Bayesian methods can incorporate prior information to stabilize estimates when sample sizes are small.** Early in a sports season, each team has played only a few games. Raw statistics from 3-4 games are extremely noisy and unreliable. Bayesian methods address this by combining the limited current-season data with a prior based on preseason expectations, last season's performance, or league averages. As the season progresses and more data accumulates, the data gradually overwhelms the prior, and Bayesian and frequentist estimates converge. This is the single most practical advantage of Bayesian thinking for sports bettors.

Section 2: True/False (5 questions, 3 points each = 15 points)
Question 11. True or False: As the amount of observed data increases, the posterior distribution becomes less and less sensitive to the choice of prior.
Answer
**True.** This is one of the most important properties of Bayesian inference. As the sample size grows, the likelihood function becomes increasingly concentrated and dominates the posterior. Two analysts who start with very different priors but observe the same large dataset will converge to nearly identical posteriors. This property is formally known as the "washing out" of the prior or posterior consistency. For sports modeling, this means prior specification matters most early in the season and becomes negligible by midseason.
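A quick numerical sketch of this convergence (the record below is an invented large sample):

```python
# Two very different priors, one large dataset: the posteriors nearly agree.
wins, losses = 4500, 3800   # illustrative large sample

for a, b in [(1, 1), (60, 40)]:   # uniform prior vs. a confident 60% prior
    post_mean = (a + wins) / (a + b + wins + losses)
    print(f"Prior Beta({a}, {b}) -> posterior mean {post_mean:.4f}")
# Both land at about 0.542-0.543, essentially the raw rate 4500/8300 = 0.542.
```

Question 12. True or False: A Bayesian 95% credible interval and a frequentist 95% confidence interval always contain the same range of values.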
Answer
**False.** While they often give similar numerical results (especially with uninformative priors and large samples), their interpretations differ fundamentally, and they can yield different ranges. The Bayesian credible interval is the range containing 95% of the posterior probability mass. The frequentist confidence interval is a range computed from a procedure that, if repeated many times, would contain the true parameter 95% of the time. With informative priors or small samples, the two can differ substantially.

Question 13. True or False: In a Beta-Binomial model, the posterior mean is always a weighted average of the prior mean and the observed success rate.
Answer
**True.** If the prior is Beta($\alpha$, $\beta$) and you observe $w$ wins in $n$ games, the posterior mean is $\frac{\alpha + w}{\alpha + \beta + n} = \frac{\alpha + \beta}{\alpha + \beta + n} \cdot \frac{\alpha}{\alpha + \beta} + \frac{n}{\alpha + \beta + n} \cdot \frac{w}{n}$. This is a weighted average of the prior mean $\frac{\alpha}{\alpha + \beta}$ and the observed rate $\frac{w}{n}$, with weights proportional to the prior effective sample size ($\alpha + \beta$) and the data sample size ($n$). This elegant property is central to Bayesian shrinkage.

Question 14. True or False: MCMC (Markov Chain Monte Carlo) sampling is necessary for all Bayesian models in sports analytics.
Answer
**False.** Many useful Bayesian models in sports have closed-form solutions that require no sampling. The Beta-Binomial model (conjugate update), the Normal-Normal model, and the Poisson-Gamma model all have analytical posterior distributions. MCMC is needed for more complex models (hierarchical models, non-conjugate priors, models with many parameters) where the posterior cannot be computed analytically. Conjugate models are often sufficient for basic team tracking and player evaluation tasks.

Question 15. True or False: A Bayes factor of 10 in favor of model A over model B means that model A is 10 times more likely to be correct than model B.
Answer
**False (with nuance).** A Bayes factor of 10 means the observed data is 10 times more likely under model A than under model B. However, the posterior odds that model A is correct also depend on the prior odds. If you started believing model B was 5 times more likely than model A (prior odds = 1:5 for A:B), then after observing a Bayes factor of 10, your posterior odds would be 10/5 = 2:1 in favor of A. The Bayes factor updates your beliefs but does not give the absolute probability that a model is correct without specifying prior model probabilities.
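The arithmetic in this answer, spelled out with the example's own numbers:

```python
# Posterior odds = Bayes factor x prior odds.
bayes_factor = 10.0        # data is 10x more likely under model A
prior_odds_a = 1 / 5       # you initially thought B was 5x more likely than A

posterior_odds_a = bayes_factor * prior_odds_a        # 2.0, i.e., 2:1 for A
posterior_prob_a = posterior_odds_a / (1 + posterior_odds_a)
print(f"Posterior odds for A: {posterior_odds_a:.0f}:1")
print(f"Posterior P(A correct): {posterior_prob_a:.3f}")  # about 0.667
```

Section 3: Fill in the Blank (3 questions, 4 points each = 12 points)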
Question 16. In Bayesian inference, the formula Posterior $\propto$ __________ $\times$ Prior expresses the core relationship between beliefs before and after observing data.
Answer
**Likelihood.** The fundamental equation of Bayesian inference is: Posterior $\propto$ Likelihood $\times$ Prior, or more formally, $P(\theta | D) \propto P(D | \theta) \cdot P(\theta)$. The posterior is proportional to the product of the likelihood (how probable the data is given the parameter) and the prior (our pre-data beliefs). The normalizing constant $P(D)$ (the evidence or marginal likelihood) ensures the posterior integrates to 1, but it is often unnecessary for parameter estimation since it does not depend on $\theta$.

Question 17. In a Beta($\alpha$, $\beta$) distribution, the mean is $\frac{\alpha}{\alpha + \beta}$ and the quantity $\alpha + \beta$ is often called the __________, representing the strength of prior belief in terms of equivalent observations.
Answer
**Effective sample size** (also accepted: "concentration parameter," "pseudocount," or "prior strength"). The sum $\alpha + \beta$ determines how tightly the Beta distribution is concentrated around its mean. A Beta(2, 2) with $\alpha + \beta = 4$ is wide and uncertain. A Beta(50, 50) with $\alpha + \beta = 100$ is tightly concentrated around 0.50. The effective sample size interpretation is powerful: Beta(50, 50) encodes the same amount of information as having observed 50 wins in 100 games. New data adds to these pseudocounts, so the posterior's effective sample size is $\alpha + \beta + n$, where $n$ is the number of new observations.

Question 18. A Bayesian model that estimates individual team parameters as draws from a common league-wide distribution, allowing information sharing across teams, is called a __________ model.
Answer
**Hierarchical** (also accepted: "multilevel" or "mixed-effects Bayesian"). Hierarchical models are one of the most powerful tools in sports analytics. Instead of estimating each team's strength independently, a hierarchical model treats team parameters as samples from a league-wide distribution. This creates automatic regularization: teams with few games or extreme results are shrunk toward the league mean. The model simultaneously estimates both the individual team parameters and the league-wide distribution parameters (hyperparameters), learning the appropriate degree of shrinkage from the data itself.

Section 4: Short Answer (3 questions, 5 points each = 15 points)
Question 19. Explain why Bayesian shrinkage is particularly valuable in the first few weeks of a sports season, and describe what happens to the degree of shrinkage as the season progresses.
Answer
In the first few weeks of a sports season, each team has played only 2-4 games. Raw statistics from such small samples are extremely unreliable: a team that starts 4-0 does not truly have a 100% win rate, and a team that starts 0-4 is not truly a 0% team. Bayesian shrinkage pulls these extreme estimates toward a prior (typically the preseason expectation or league average), producing more stable and realistic predictions. Mathematically, the posterior mean is a weighted average of the prior mean and the observed data mean, with weights proportional to the prior's effective sample size and the number of games played. Early on, the prior dominates (e.g., with 4 games played and a prior effective N of 20, the prior gets 20/24 = 83% of the weight). As the season progresses, the data's weight increases: by game 50, the prior's 20 pseudocounts are far outweighed by 50 real games, and the posterior closely tracks the actual win rate. Shrinkage thus starts strong and fades naturally, which is exactly the behavior a bettor wants: conservative estimates early, data-driven estimates later.

Question 20. Describe the posterior predictive distribution and explain why it is more useful for betting than a simple point estimate of a model parameter.
Answer
The **posterior predictive distribution** is the distribution of future observations, accounting for both parameter uncertainty and inherent randomness. Instead of first estimating a single "best" parameter value and then predicting from it, the posterior predictive integrates over all plausible parameter values, weighted by their posterior probability. For betting, this is superior to a point estimate for two reasons. First, it captures the full range of plausible outcomes, not just the most likely one. A bettor needs to know not just "the model predicts the home team wins by 3 points" but also "there is a 62% chance they cover -2.5 and a 45% chance they cover -5.5." The posterior predictive provides this full distribution. Second, it honestly reflects model uncertainty. If the model's parameter estimates are uncertain (wide posterior), the predictive distribution will be wider, automatically making the model less confident in its predictions. This prevents overconfident betting decisions that a point-estimate-based model might encourage. The posterior predictive connects directly to expected value calculations: it gives the probability of each outcome, which can be compared to the sportsbook's implied probabilities.
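The difference between plugging in a point estimate and using the posterior predictive shows up directly in the spread of simulated outcomes. A minimal simulation sketch (the Beta(14, 6) posterior and 10-game horizon are illustrative assumptions):

```python
# Posterior predictive vs. plug-in prediction for wins in the next 10 games.
import numpy as np
from scipy.stats import beta, binom

rng = np.random.default_rng(42)
a_post, b_post = 14, 6      # assumed posterior for the true win probability
n_future = 10

# Posterior predictive: draw a plausible p, then simulate a 10-game stretch.
p_draws = beta.rvs(a_post, b_post, size=100_000, random_state=rng)
pred_wins = binom.rvs(n_future, p_draws, random_state=rng)

# Plug-in: freeze p at the posterior mean, ignoring parameter uncertainty.
plugin_wins = binom.rvs(n_future, a_post / (a_post + b_post),
                        size=100_000, random_state=rng)

print(f"Predictive sd: {pred_wins.std():.2f}, plug-in sd: {plugin_wins.std():.2f}")
# The predictive spread is wider because it includes uncertainty about p itself.
```

Question 21. Explain the concept of a "loss function" in Bayesian decision theory and how it applies to a bettor deciding whether to place a wager.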
Answer
In Bayesian decision theory, a **loss function** quantifies the cost of making a wrong decision given the true state of the world. The optimal decision minimizes the expected loss, where the expectation is taken over the posterior distribution of the uncertain parameter. For a bettor deciding whether to place a wager, the loss function captures the asymmetry of betting outcomes. If you bet and the true win probability is below the break-even threshold, you lose your stake. If you do not bet and the true win probability is above the break-even threshold, you forgo expected profit. A Bayesian bettor integrates the loss function over the posterior distribution of the true win probability to compute the expected loss for each action (bet or no bet). For example, if the posterior distribution for the true win probability is Beta(65, 35) and the break-even rate is 52.4%, the bettor computes the expected profit (positive region of the posterior above 52.4%) and expected loss (negative region below 52.4%), weighted by the posterior probabilities. This is strictly more informative than a single point estimate, because it accounts for the probability of being wrong.
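A sketch of that calculation using the example's Beta(65, 35) posterior. The -110 pricing is our assumption (it is what produces the 52.4% break-even rate quoted above):

```python
# Expected profit and downside probability, integrated over the posterior.
from scipy.stats import beta

post = beta(65, 35)            # posterior for the true win probability
payout = 100 / 110             # profit per unit staked at assumed -110 odds
break_even = 1 / (1 + payout)  # about 0.524

# Profit is linear in p, so its posterior expectation only needs E[p]...
exp_profit = payout * post.mean() - (1 - post.mean())
# ...but the full posterior still tells us how likely the edge is illusory.
p_no_edge = post.cdf(break_even)

print(f"Expected profit per unit: {exp_profit:+.4f}")
print(f"P(true win rate below break-even): {p_no_edge:.4f}")
```

For nonlinear staking rules (e.g., Kelly fractions), the expectation no longer reduces to $E[p]$, and the full posterior distribution becomes essential.

Section 5: Code Analysis (2 questions, 6 points each = 12 points)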
Question 22. Examine the following Bayesian updating code:
```python
from scipy.stats import beta

class BayesianTracker:
    def __init__(self, alpha_prior=4, beta_prior=4):
        self.alpha = alpha_prior
        self.beta = beta_prior

    def update(self, wins, losses):
        self.alpha += wins
        self.beta += losses

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    def interval(self, level=0.95):
        return beta.interval(level, self.alpha, self.beta)

tracker = BayesianTracker(alpha_prior=6, beta_prior=4)
tracker.update(wins=8, losses=2)
print(f"Posterior mean: {tracker.mean():.3f}")
print(f"95% CI: {tracker.interval()}")
```
(a) Trace through the code and write the exact posterior parameters and posterior mean.
(b) What prior belief does Beta(6, 4) encode, and what is its effective sample size?
(c) If we then call tracker.update(wins=2, losses=8), what is the new posterior mean? Does sequential updating give the same result as observing all 20 games at once?
Answer
**(a)** After `tracker.update(wins=8, losses=2)`:

- `self.alpha = 6 + 8 = 14`
- `self.beta = 4 + 2 = 6`
- Posterior: Beta(14, 6)
- Posterior mean: 14 / (14 + 6) = 14/20 = **0.700**

The 95% credible interval would be `beta.interval(0.95, 14, 6)`, which is approximately (0.478, 0.876).

**(b)** Beta(6, 4) encodes a prior belief that:

- The team's win rate is centered at 6/10 = **60%**
- The effective sample size is $\alpha + \beta = 6 + 4 = 10$ (equivalent to having observed 6 wins in 10 games)
- The distribution is moderately concentrated around 60%, expressing moderate confidence

**(c)** After the second update `tracker.update(wins=2, losses=8)`:

- `self.alpha = 14 + 2 = 16`
- `self.beta = 6 + 8 = 14`
- New posterior: Beta(16, 14)
- New posterior mean: 16/30 = **0.533**

**Yes, sequential updating gives exactly the same result as observing all data at once.** If we had started with Beta(6, 4) and updated with 10 wins and 10 losses simultaneously, we would get Beta(6+10, 4+10) = Beta(16, 14), with mean 16/30 = 0.533. This is a fundamental property of Bayesian updating: the order of observations does not matter; only the sufficient statistics (total wins and total losses) determine the posterior.

Question 23. Examine the following code for Bayesian model comparison:
```python
import numpy as np
from scipy.stats import beta as beta_dist

def bayes_factor_skill_vs_luck(wins, total, skill_rate=0.55, luck_rate=0.524):
    """Compare two hypotheses about a bettor's win rate."""
    likelihood_skill = beta_dist.pdf(skill_rate, wins + 1, total - wins + 1)
    likelihood_luck = beta_dist.pdf(luck_rate, wins + 1, total - wins + 1)
    bf = likelihood_skill / likelihood_luck
    return bf

bf = bayes_factor_skill_vs_luck(wins=56, total=100)
print(f"Bayes Factor: {bf:.2f}")
```
(a) This code contains a conceptual error in how the Bayes factor is computed. Identify and explain the error.
(b) Write the corrected approach (you may describe it in words or pseudocode).
Answer
**(a)** The code computes the **ratio of posterior densities at two point values**, not the Bayes factor. The Bayes factor compares two **models** (hypotheses), not two parameter values. Specifically:

- The correct Bayes factor for $H_{\text{skill}}: p = 0.55$ vs $H_{\text{luck}}: p = 0.524$ as **simple hypotheses** is the ratio of **likelihoods** evaluated at the data: $BF = \frac{P(\text{data} | p = 0.55)}{P(\text{data} | p = 0.524)}$, i.e., the binomial probability of observing 56 wins in 100 bets at each rate.
- The code instead evaluates the **posterior density** of the win rate parameter at 0.55 and 0.524, using `beta_dist.pdf` with parameters derived from the data. This conflates the posterior density with the likelihood.

**(b)** Corrected approach:

```python
from scipy.stats import binom

def bayes_factor_corrected(wins, total, skill_rate=0.55, luck_rate=0.524):
    """Correct Bayes factor for simple point hypotheses."""
    likelihood_skill = binom.pmf(wins, total, skill_rate)
    likelihood_luck = binom.pmf(wins, total, luck_rate)
    bf = likelihood_skill / likelihood_luck
    return bf
```
For composite hypotheses (e.g., $H_1$: $p$ comes from a distribution centered above 0.524), the Bayes factor requires integrating the likelihood over the prior under each hypothesis, which is more complex.
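To illustrate, one way to handle a composite $H_1$ is to compute its marginal likelihood by numerical integration. The Beta(20, 15) prior below is an illustrative assumption for "a skilled bettor's win rate," not part of the question:

```python
# Composite-hypothesis Bayes factor: H1: p ~ Beta(20, 15) vs H0: p = 0.524.
from scipy.integrate import quad
from scipy.stats import beta, binom

wins, total = 56, 100

# Marginal likelihood under H1: integrate the likelihood over the prior.
marginal_h1, _ = quad(
    lambda p: binom.pmf(wins, total, p) * beta.pdf(p, 20, 15), 0, 1
)
likelihood_h0 = binom.pmf(wins, total, 0.524)  # simple point hypothesis
print(f"BF (H1 vs H0): {marginal_h1 / likelihood_h0:.2f}")
```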
Section 6: Applied Problems (2 questions, 8 points each = 16 points)
Question 24. You are tracking an NBA team's home win probability using a Beta-Binomial model. Your prior (based on last season) is Beta(35, 25), reflecting a belief that the team wins about 58.3% of home games.
(a) (2 points) The team starts the current season 6-2 at home. Compute the posterior parameters and the posterior mean.
(b) (2 points) The sportsbook implies a 55% home win probability for the team's next home game. Compute the posterior probability that the team's true home win rate exceeds 55%.
(c) (2 points) How would your answer to (b) change if you had used a weaker prior of Beta(7, 5) instead? Compute the new posterior and the probability of exceeding 55%.
(d) (2 points) Based on your analysis, does the model identify a betting opportunity on the home team at 55% implied probability? How does prior strength affect your confidence in this assessment?
Answer
**(a)** Posterior = Beta(35 + 6, 25 + 2) = Beta(41, 27). Posterior mean = 41 / (41 + 27) = 41/68 = **0.603** (60.3%).

**(b)** We need $P(p > 0.55 | \text{data})$ where $p \sim \text{Beta}(41, 27)$. Using the Beta CDF: $P(p > 0.55) = 1 - F_{\text{Beta}(41,27)}(0.55)$. The Beta(41, 27) distribution has mean 0.603 and standard deviation approximately $\sqrt{\frac{41 \times 27}{68^2 \times 69}} \approx 0.059$. Using a normal approximation: $z = \frac{0.55 - 0.603}{0.059} = -0.898$, so $P(p > 0.55) \approx \Phi(0.898) \approx 0.815$. Using the exact Beta CDF, $P(p > 0.55) \approx$ **81.5%**.

**(c)** With a weaker prior Beta(7, 5):

- Posterior = Beta(7 + 6, 5 + 2) = Beta(13, 7)
- Posterior mean = 13/20 = 0.650 (65.0%)
- Standard deviation $\approx \sqrt{\frac{13 \times 7}{20^2 \times 21}} \approx 0.104$
- $z = \frac{0.55 - 0.65}{0.104} = -0.962$
- $P(p > 0.55) \approx \Phi(0.962) \approx$ **83.2%**

The weaker prior gives a slightly higher probability (83.2% vs 81.5%) because the posterior is more influenced by the strong 6-2 observed record, pushing the mean higher (65% vs 60.3%).

**(d)** Both priors suggest the team's true home win rate likely exceeds 55% (probability around 81-83%). This indicates a **potential betting opportunity** on the home team at 55% implied odds. However, the stronger prior produces a posterior mean closer to the historical rate (60.3%) while the weaker prior produces a posterior more influenced by the small current-season sample (65%). The stronger prior is more conservative and arguably more reliable given that only 8 games have been played. A bettor should use the stronger prior if last season's data is trustworthy and not much has changed. The edge (model probability minus market probability) is approximately 5-10 percentage points, which exceeds typical edge thresholds.
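The normal approximations in (b) and (c) can be replaced by the exact Beta CDF in a few lines of scipy:

```python
# Exact tail probabilities for both posteriors in parts (b) and (c).
from scipy.stats import beta

for a, b in [(41, 27), (13, 7)]:   # strong-prior and weak-prior posteriors
    p_exceeds = 1 - beta.cdf(0.55, a, b)
    print(f"Beta({a}, {b}): P(p > 0.55) = {p_exceeds:.3f}")
# Both land in the roughly 81-83% range computed above.
```

Question 25. You are comparing two approaches to predicting soccer match outcomes: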
Approach A (Frequentist): Uses each team's goals scored and conceded from the current season (12 matches) to estimate Poisson rates. For the home team: $\hat{\lambda}_{\text{home}} = 1.8$ (observed average). For the away team: $\hat{\lambda}_{\text{away}} = 0.9$.
Approach B (Bayesian): Uses a Gamma prior based on last season's rates, updated with the current 12 matches. Home team prior: Gamma(18, 12) with mean 1.5. After observing 22 goals in 12 matches: posterior Gamma(40, 24) with mean 1.67. Away team prior: Gamma(12, 12) with mean 1.0. After observing 11 goals in 12 matches: posterior Gamma(23, 24) with mean 0.96.
(a) (2 points) Which approach gives a higher predicted goal total for this match? Calculate the predicted total (sum of both teams' means) under each approach.
(b) (2 points) The sportsbook sets the total at 2.5 goals. Under Approach A (using Poisson with $\lambda = 1.8 + 0.9 = 2.7$), what is the approximate probability of Over 2.5 goals?
(c) (3 points) Explain two specific advantages of Approach B over Approach A for betting purposes, particularly early in the season.
(d) (1 point) By the end of the season (after 38 matches), would you expect the two approaches to produce meaningfully different predictions? Why or why not?
Answer
**(a)**

- Approach A: predicted total = $1.8 + 0.9 = 2.7$ goals
- Approach B: predicted total = $1.67 + 0.96 = 2.63$ goals

**Approach A predicts a higher total** (2.7 vs 2.63). The Bayesian approach shrinks both estimates toward their priors: the home team's rate is pulled down from 1.8 to 1.67 (prior was 1.5), and the away team's rate is pulled slightly up from the observed 11/12 ≈ 0.92 to 0.96 (prior was 1.0).

**(b)** Under a Poisson model with $\lambda = 2.7$: $P(\text{Over 2.5}) = 1 - P(X \leq 2) = 1 - [P(0) + P(1) + P(2)]$

- $P(0) = e^{-2.7} = 0.0672$
- $P(1) = 2.7 \times e^{-2.7} = 0.1815$
- $P(2) = \frac{2.7^2}{2} \times e^{-2.7} = 0.2450$

$P(\text{Over 2.5}) = 1 - (0.0672 + 0.1815 + 0.2450) = 1 - 0.4937 = 0.5063$, or approximately **50.6%**.

**(c)** Two advantages of Approach B:

1. **Stability with small samples:** After only 12 matches, raw averages are noisy. If the home team scored 5 goals in one blowout match, it inflates the average. The Bayesian approach regularizes by blending with last season's data, producing more stable estimates. This means fewer false signals and better-calibrated betting decisions early in the season.
2. **Uncertainty quantification:** The Bayesian posterior provides not just a point estimate but a full distribution over possible rates. This allows the bettor to compute $P(\text{Over 2.5 goals})$ while accounting for uncertainty in the rate parameters themselves (posterior predictive distribution), which is more honest than plugging in a single point estimate. With only 12 matches of data, there is substantial uncertainty about the true rates, and Approach A ignores this entirely.

**(d)** After 38 matches, the two approaches would produce **very similar predictions**. With 38 observations, the Bayesian prior (effective sample size of 12 for each prior) would be overwhelmed by the data. The posterior would be dominated by the 38-match observed rates, and the Bayesian shrinkage would be minimal. This is the "washing out" of the prior: with enough data, the prior matters very little, and Bayesian and frequentist estimates converge.
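Part (b) can be checked exactly with scipy, and the Bayesian analogue follows from a standard fact: a Poisson count whose rate has a Gamma($a$, $b$) posterior has a negative binomial posterior predictive with $n = a$ and $p = b/(b+1)$. The convolution step below is a sketch of how Approach B would price the same total:

```python
# Plug-in Poisson total (Approach A) vs. posterior predictive total (Approach B).
from scipy.stats import nbinom, poisson

p_over_a = 1 - poisson.cdf(2, 2.7)                 # Approach A, lambda = 2.7
print(f"Approach A, P(Over 2.5): {p_over_a:.3f}")  # about 0.506

home = nbinom(40, 24 / 25)   # Gamma(40, 24) posterior -> NegBin predictive
away = nbinom(23, 24 / 25)   # Gamma(23, 24) posterior -> NegBin predictive

# P(home + away >= 3), convolving the two independent predictive distributions.
p_over_b = (1 - home.cdf(2)) + sum(
    home.pmf(h) * (1 - away.cdf(2 - h)) for h in range(3)
)
print(f"Approach B, P(Over 2.5): {p_over_b:.3f}")
```

Because the predictive distribution spreads probability over plausible rates, Approach B's Over 2.5 probability comes out slightly lower than Approach A's, consistent with its lower predicted total.

Scoring Summary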
| Section | Questions | Points Each | Total |
|---|---|---|---|
| 1. Multiple Choice | 10 | 3 | 30 |
| 2. True/False | 5 | 3 | 15 |
| 3. Fill in the Blank | 3 | 4 | 12 |
| 4. Short Answer | 3 | 5 | 15 |
| 5. Code Analysis | 2 | 6 | 12 |
| 6. Applied Problems | 2 | 8 | 16 |
| Total | 25 | --- | 100 |
Grade Thresholds
| Grade | Score Range | Percentage |
|---|---|---|
| A | 90-100 | 90-100% |
| B | 80-89 | 80-89% |
| C | 70-79 | 70-79% |
| D | 60-69 | 60-69% |
| F | 0-59 | 0-59% |