Chapter 10 Exercises: Bayesian Thinking for Bettors

Instructions: Complete all exercises in the parts assigned by your instructor. Show all work for calculation problems. For programming challenges, include comments explaining your logic and provide sample output. For analysis and research problems, cite your sources where applicable.


Part A: Conceptual Understanding

Each problem is worth 5 points. Answer in complete sentences unless otherwise directed.


Exercise A.1 --- Bayesian vs. Frequentist Reasoning

Explain the fundamental philosophical difference between Bayesian and frequentist approaches to probability. In your answer, address (a) how each framework defines probability, (b) how each handles the concept of a "true parameter," (c) why the Bayesian approach is more natural for sports bettors who update beliefs as a season progresses, and (d) one practical limitation of the Bayesian approach.


Exercise A.2 --- The Role of Priors

Explain what a prior distribution represents in Bayesian inference. Then address: (a) What is an informative prior and when should you use one? (b) What is an uninformative (diffuse) prior and when is it appropriate? (c) How does the prior's influence diminish as sample size grows? (d) Give a sports betting example where the choice of prior would meaningfully affect predictions in the first four weeks of a season but not by Week 12.


Exercise A.3 --- Bayes' Theorem in Words

State Bayes' theorem and explain each component (prior, likelihood, evidence, posterior) using the following concrete example: Before the season, you believe an NFL team has a 45% chance of making the playoffs. After they start 4-1, you want to update that estimate. Map each component of Bayes' theorem to this specific scenario, even if you cannot compute exact numbers.


Exercise A.4 --- Conjugate Priors

Explain the concept of conjugate priors and why they are computationally convenient. For each of the following likelihood-prior pairs, name the conjugate prior and describe a sports application:

(a) Binomial likelihood (e.g., number of wins in $n$ games)

(b) Poisson likelihood (e.g., goals scored per game)

(c) Normal likelihood with known variance (e.g., point differential)

Explain why conjugate priors are less important in the age of computational Bayesian methods (MCMC, variational inference) but still useful for building intuition.


Exercise A.5 --- Bayesian Updating as Sequential Learning

One of the most powerful aspects of Bayesian inference is its sequential nature: yesterday's posterior becomes today's prior. Explain this concept and (a) describe how it applies to tracking a team's true win probability over a season, (b) explain why this is superior to using a fixed window of recent games, (c) describe the mathematical mechanism by which extreme early-season results are "washed out" by subsequent data, and (d) identify a situation where this sequential property could lead you astray.
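The sequential mechanism described here can be sketched in a few lines of Python; the prior and game results below are hypothetical:

```python
# Sequential Beta-Binomial updating: each game's posterior becomes the
# prior for the next game.  Prior and results here are hypothetical.
alpha, beta_ = 4.0, 4.0          # Beta(4, 4) prior, centered at 50%
results = [1, 1, 1, 0, 1, 0]     # 1 = win, 0 = loss

for won in results:
    alpha += won                 # a win adds one pseudo-win
    beta_ += 1 - won             # a loss adds one pseudo-loss
    print(round(alpha / (alpha + beta_), 3))
```

A hot 3-0 start pulls the estimate up quickly, but each later game moves it less, because the denominator keeps growing; that is the washing-out mechanism part (c) asks about.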


Exercise A.6 --- Credible Intervals vs. Confidence Intervals

Explain the difference between a 95% Bayesian credible interval and a 95% frequentist confidence interval. Address (a) the correct interpretation of each, (b) why the Bayesian credible interval provides a more intuitive answer for a bettor asking "What is the range of plausible values for this team's true win rate?", and (c) under what conditions the two intervals give approximately the same result.


Exercise A.7 --- Bayesian Model Comparison

Describe how Bayesian methods can be used to compare competing models. Explain (a) the concept of the marginal likelihood (evidence) and its role in model comparison, (b) what a Bayes factor is and how to interpret it, (c) how Bayesian model comparison naturally penalizes overly complex models (Occam's razor), and (d) provide an example of two competing models for an NFL team's performance that you might compare using a Bayes factor.


Exercise A.8 --- When Bayesian Methods Fail

Identify three scenarios in sports modeling where Bayesian methods might produce poor results or misleading conclusions. For each, explain (a) the specific problem, (b) why it occurs, and (c) how you might detect or mitigate it. Consider issues such as misspecified priors, model misspecification, sensitivity to prior choice with small samples, and computational challenges.


Part B: Calculations

Each problem is worth 5 points. Show all work and round final answers to the indicated precision.


Exercise B.1 --- Bayes' Theorem: Doping Test

A sports league implements a drug test that has a 98% true positive rate (sensitivity) and a 96% true negative rate (specificity). The league estimates that 3% of players use the banned substance.

(a) A player tests positive. What is the probability they actually used the substance? (Apply Bayes' theorem.)

(b) Explain why the answer is lower than most people intuitively expect (the base rate fallacy).

(c) If the player tests positive twice (independent tests), what is the updated probability?
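A computational check of parts (a) and (c), using only the rates given in the problem:

```python
# Bayes' theorem for the doping test; all rates come from the problem statement.
sens, spec, prev = 0.98, 0.96, 0.03

def posterior_given_positive(prior: float) -> float:
    """P(user | positive test) via Bayes' theorem."""
    p_pos = sens * prior + (1 - spec) * (1 - prior)  # law of total probability
    return sens * prior / p_pos

p1 = posterior_given_positive(prev)   # part (a)
p2 = posterior_given_positive(p1)     # part (c): part (a)'s posterior is the new prior
```

Part (c) reuses the part (a) posterior as the prior for the second test, which is exactly the sequential updating idea from Exercise A.5.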


Exercise B.2 --- Posterior Calculation with Beta-Binomial

A bettor models an NBA team's home win probability using a Beta-Binomial model. The prior is $\text{Beta}(6, 4)$, reflecting a belief that the team wins about 60% of home games.

(a) What is the prior mean and the prior's effective sample size (pseudocount)?

(b) The team starts the season 7-3 at home. Compute the posterior distribution parameters.

(c) What is the posterior mean? How has it changed from the prior?

(d) Compute the 95% credible interval for the team's true home win probability using the posterior distribution.

(e) The sportsbook implies a 58% home win probability for the team's next game. Using the posterior, what is the probability that the team's true win rate exceeds 58%? (Use a Beta CDF.)
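Parts (b) through (e) can be checked with `scipy.stats.beta`; this sketch uses an equal-tailed interval for part (d):

```python
from scipy.stats import beta

a0, b0 = 6, 4                        # Beta(6, 4) prior
wins, losses = 7, 3                  # 7-3 home start
a1, b1 = a0 + wins, b0 + losses      # conjugate update, part (b)

post_mean = a1 / (a1 + b1)                      # part (c)
lo, hi = beta.interval(0.95, a1, b1)            # part (d), equal-tailed
p_beats_market = 1 - beta.cdf(0.58, a1, b1)     # part (e)
```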


Exercise B.3 --- Beta Prior Elicitation

You want to set a Beta prior for an NFL kicker's field goal percentage. Based on your research:

- The kicker made 85% of field goals last season (35 of 41 attempts).
- League average is 83%.
- You want the prior to have an effective sample size of 20 kicks.

(a) Find the Beta parameters $\alpha$ and $\beta$ that correspond to a mean of 0.84 (blending the kicker's record with the league average) and an effective sample size of 20.

(b) Plot or describe the shape of this Beta(16.8, 3.2) prior.

(c) After 5 games, the kicker has made 12 of 15 field goals. Compute the posterior.

(d) Compare the posterior mean to the raw current-season make rate (12/15 = 80%) and the prior mean (84%). Explain why the posterior is between these values.
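The mean-and-effective-sample-size recipe from part (a), followed by the conjugate update, can be sketched as:

```python
# Converting a target mean and effective sample size into Beta parameters,
# then applying the conjugate update with the current-season data.
mean, ess = 0.84, 20
a0 = mean * ess              # alpha = mean * ESS = 16.8
b0 = (1 - mean) * ess        # beta  = (1 - mean) * ESS = 3.2

made, missed = 12, 3         # 12-of-15 this season
a1, b1 = a0 + made, b0 + missed
post_mean = a1 / (a1 + b1)   # falls between 12/15 = 0.80 and the 0.84 prior mean
```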


Exercise B.4 --- Poisson-Gamma Conjugate Model

You model goals scored by a soccer team per match using a Poisson distribution with a Gamma prior on the rate parameter $\lambda$.

Prior: $\lambda \sim \text{Gamma}(\alpha = 8, \beta = 5)$, where $\beta$ is a rate parameter, so the prior mean is $\alpha / \beta = 1.6$ goals per match.

(a) In 6 matches, the team scores a total of 12 goals. Compute the posterior distribution parameters.

(b) What is the posterior mean for $\lambda$?

(c) The team's next match has a total line of 2.5 goals (combined both teams). If the opponent averages 1.2 goals per match (treated as known), what is the model's predicted probability that the combined goals exceed 2.5?
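A sketch of the conjugate update, with a plug-in simplification for part (c); the fully Bayesian predictive would average over the Gamma posterior (a negative binomial) rather than plug in the posterior mean:

```python
from scipy.stats import poisson

a0, b0 = 8, 5                        # Gamma(8, 5) prior, rate parameterization
goals, matches = 12, 6
a1, b1 = a0 + goals, b0 + matches    # posterior Gamma(20, 11), part (a)
lam_team = a1 / b1                   # posterior mean, part (b)

# Part (c), plug-in approximation: the sum of independent Poissons is
# Poisson, so model the combined score with rate lam_team + 1.2.
lam_total = lam_team + 1.2
p_over = 1 - poisson.cdf(2, lam_total)   # P(at least 3 combined goals)
```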


Exercise B.5 --- Bayesian Hypothesis Testing

You want to test whether a bettor has genuine skill or is merely lucky. Their record is 58 wins out of 100 bets at -110 odds.

(a) Define the null hypothesis ($H_0$: win rate = 0.524, the break-even rate at -110) and the alternative ($H_1$: win rate > 0.524).

(b) Under a Bayesian framework with a Beta(1,1) prior on the win rate, compute the posterior distribution given 58 wins and 42 losses.

(c) Compute the posterior probability that the true win rate exceeds 0.524 (the break-even rate).

(d) Compare this Bayesian result with the frequentist p-value from a one-sided binomial test. Do they lead to the same conclusion?
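Parts (b) through (d) side by side (`binomtest` requires SciPy 1.7 or later):

```python
from scipy.stats import beta, binomtest

wins, losses, breakeven = 58, 42, 0.524
a1, b1 = 1 + wins, 1 + losses              # Beta(1, 1) prior -> Beta(59, 43)

p_skilled = 1 - beta.cdf(breakeven, a1, b1)     # part (c)
pval = binomtest(wins, wins + losses, breakeven,
                 alternative="greater").pvalue  # part (d): one-sided exact test
```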


Exercise B.6 --- Bayesian Shrinkage for Small Samples

Five NFL teams have the following records through Week 4 (4 games each):

Team     Wins   Losses   Raw Win %
Team A   4      0        100%
Team B   3      1        75%
Team C   2      2        50%
Team D   1      3        25%
Team E   0      4        0%

Apply Bayesian shrinkage using a Beta(4, 4) prior (centered at 50%, representing league-average belief).

(a) Compute the posterior mean win probability for each team.

(b) How much has each team's estimate been shrunk toward 50% compared to the raw win percentage?

(c) By Week 12, Team A is 9-3. Recompute the posterior mean. How much less influence does the prior now have?

(d) Explain why this shrinkage is appropriate for early-season estimates and less necessary later.
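Parts (a) and (b) for all five teams at once:

```python
# Posterior means under a Beta(4, 4) prior; shrinkage is the gap between
# the raw win percentage and the posterior mean.
records = {"Team A": (4, 0), "Team B": (3, 1), "Team C": (2, 2),
           "Team D": (1, 3), "Team E": (0, 4)}
a0 = b0 = 4

for team, (w, l) in records.items():
    raw = w / (w + l)
    post = (a0 + w) / (a0 + b0 + w + l)   # posterior mean
    print(f"{team}: raw {raw:.0%} -> posterior {post:.1%}")
```

For part (c), the same formula with a 9-3 record gives (4 + 9) / (8 + 12) = 65%, much closer to the raw 75% than the Week 4 posterior was to the raw 100%.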


Exercise B.7 --- Updating with New Information

Before an NFL game, your Bayesian model estimates the home team's win probability at 62%. The posterior is based on 10 weeks of data. You then learn that the home team's starting quarterback is ruled out with an injury.

(a) Explain conceptually how you would update the 62% estimate given this new information.

(b) If historical data shows that teams replacing their starting QB see their win probability drop by approximately 12 percentage points on average, what is your updated estimate?

(c) The sportsbook line moves from Home -3.5 to Home -1.0 after the injury news. Does this line move seem consistent with the magnitude of your adjustment?

(d) Describe how a more sophisticated Bayesian approach (using a hierarchical model for QB quality) might produce a better-informed adjustment than the blanket 12-percentage-point penalty.


Part C: Programming Challenges

Each problem is worth 10 points. Write clean, well-documented Python code. Include docstrings, type hints, and at least three test cases per function.


Exercise C.1 --- Bayesian Win Probability Updater

Build a BayesianWinTracker class that tracks a team's win probability using a Beta-Binomial model, updating after each game.

Requirements:

- Initialize with a configurable prior (default Beta(4, 4) for unknown teams, or use last season's record for informative priors).
- Provide an update(won: bool) method that updates the posterior.
- Provide a get_probability() method returning the posterior mean.
- Provide a get_credible_interval(level: float) method returning the HPD interval.
- Track the full history of posterior parameters and probabilities over time.
- Implement a plot_evolution() method showing how the posterior mean and credible interval evolve game by game.
- Include a compare_to_market(market_prob: float) method that computes the posterior probability of the true win rate exceeding the market's implied probability.
- Demonstrate with a simulated 17-game NFL season.
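A minimal starting skeleton (one possible design, not a reference solution; the history plotting, simulation, and HPD pieces are left to you):

```python
from dataclasses import dataclass, field
from scipy.stats import beta

@dataclass
class BayesianWinTracker:
    """Skeleton for Exercise C.1; extend per the requirements above."""
    alpha: float = 4.0   # prior pseudo-wins
    beta_: float = 4.0   # prior pseudo-losses (trailing _ avoids the scipy name)
    history: list = field(default_factory=list)

    def update(self, won: bool) -> None:
        self.alpha += won
        self.beta_ += 1 - won
        self.history.append((self.alpha, self.beta_))

    def get_probability(self) -> float:
        return self.alpha / (self.alpha + self.beta_)

    def get_credible_interval(self, level: float = 0.95) -> tuple:
        # Equal-tailed placeholder; the exercise asks for an HPD interval.
        return beta.interval(level, self.alpha, self.beta_)

    def compare_to_market(self, market_prob: float) -> float:
        # P(true win rate > market implied probability)
        return 1 - beta.cdf(market_prob, self.alpha, self.beta_)
```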


Exercise C.2 --- Beta-Binomial Model for Player Props

Build a Beta-Binomial model for tracking player shooting or scoring rates.

Requirements:

- Implement a PlayerPropModel class that models a player's success rate (e.g., three-point shooting percentage, field goal rate) using a Beta-Binomial framework.
- Allow setting the prior from historical data (last season's stats, career averages).
- Update the model game by game as new data arrives.
- Compute the posterior predictive distribution: given the current posterior, what is the probability the player makes at least $k$ shots out of $n$ attempts in the next game?
- Compare the posterior predictive probability to a sportsbook's player prop line.
- Generate a plot showing the posterior density for the player's true rate alongside the sportsbook's implied rate.
- Demonstrate with a realistic NBA player's three-point shooting data.


Exercise C.3 --- PyMC Sports Model

Build a Bayesian regression model for predicting NBA point differentials using PyMC.

Requirements:

- Generate or load game-level data with features: home team net rating, away team net rating, rest advantage, home-court indicator.
- Define a Bayesian linear regression model in PyMC with:
  - Normal priors on regression coefficients (informed by domain knowledge).
  - A Half-Normal prior on the residual standard deviation.
- Sample from the posterior using MCMC (NUTS sampler).
- Report posterior summaries (mean, std, 95% HDI) for all parameters.
- Generate posterior predictive samples for upcoming games.
- Compute the full predictive distribution for a specific game's point differential and derive win probability with uncertainty.
- Produce trace plots, posterior density plots, and a posterior predictive check.


Exercise C.4 --- Hierarchical Model for Team Ratings

Build a hierarchical Bayesian model that estimates team strengths across a league while sharing information between teams.

Requirements:

- Model each team's offensive and defensive strength as draws from a league-wide distribution.
- Use a hierarchical structure: team parameters are drawn from a common Normal distribution whose mean and variance are also estimated.
- Implement using either PyMC or a simplified conjugate Normal-Normal model.
- Fit the model to one season of synthetic game data (32 teams, 272 games).
- Show that the hierarchical model produces more stable team ratings than independent estimation (demonstrate shrinkage).
- Compare the hierarchical model's out-of-sample predictions to a non-hierarchical alternative.
- Generate a plot ranking all teams by posterior mean strength with credible intervals.


Exercise C.5 --- Bayesian A/B Testing for Betting Strategies

Build a Bayesian framework for comparing two betting strategies.

Requirements:

- Implement a StrategyComparison class that models two strategies' win rates using independent Beta-Binomial models.
- After observing results from both strategies, compute the posterior probability that Strategy A is better than Strategy B.
- Implement an expected_loss function that computes the expected opportunity cost of choosing the wrong strategy.
- Implement a minimum_sample_size function that estimates how many more bets are needed to reach a desired confidence level (e.g., 95% probability that the better strategy is identified).
- Visualize the posterior distributions for both strategies on the same plot, shading the area where A > B.
- Demonstrate with realistic betting strategy results (e.g., 120 bets each, with win rates of 55% and 51%).


Part D: Analysis & Interpretation

Each problem is worth 5 points. Provide structured, well-reasoned responses.


Exercise D.1 --- Interpreting a Bayesian Team Rating System

A Bayesian team rating system produces the following posterior estimates for four NFL teams midway through the season (after 8 games):

Team         Posterior Mean (pts above avg)   95% Credible Interval   Prior Mean
Team Alpha   +5.2                             [+2.1, +8.3]            +3.0
Team Beta    +4.8                             [+1.5, +8.1]            +6.0
Team Gamma   -1.2                             [-4.5, +2.1]            +1.0
Team Delta   -3.5                             [-7.2, +0.2]            -5.0

(a) Which team has improved the most relative to its prior? Which has declined the most?

(b) Team Gamma's credible interval spans zero. What does this mean for betting purposes?

(c) If Team Alpha plays Team Beta, and the model predicts Alpha by 0.4 points plus 3 points of home-field advantage, how confident should you be in this prediction given the credible intervals?

(d) A sportsbook sets Alpha -4.5 for this matchup. Based on the model, do you see value, and how would you size the bet?


Exercise D.2 --- Prior Sensitivity Analysis

A colleague builds a Bayesian model to estimate a new NBA team's (expansion team) win probability. They are unsure what prior to use and test three options:

Prior           Parameters     Prior Mean   Prior Effective N
Uninformative   Beta(1, 1)     0.50         2
Weak            Beta(4, 6)     0.40         10
Strong          Beta(20, 30)   0.40         50

After 15 games, the team is 8-7.

(a) Compute the posterior mean under each prior.

(b) How different are the posterior means? Which prior has the most influence?

(c) After 60 games (32-28 record), recompute. How much does the prior matter now?

(d) What recommendation would you give about prior selection for a team with no historical data?
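The entire sensitivity comparison in parts (a) through (c) reduces to one formula applied across priors and records:

```python
# Posterior mean = (a0 + wins) / (a0 + b0 + games) for each prior and record.
priors = {"uninformative": (1, 1), "weak": (4, 6), "strong": (20, 30)}

for label, (a0, b0) in priors.items():
    for wins, losses in [(8, 7), (32, 28)]:        # 15- and 60-game records
        post = (a0 + wins) / (a0 + b0 + wins + losses)
        print(f"{label:>13} after {wins + losses} games: {post:.3f}")
```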


Exercise D.3 --- Bayesian Reasoning About Streaks

A bettor reports a 12-game winning streak. Under a Bayesian framework with a Beta(52.4, 47.6) prior (centered at the break-even rate for -110 bets):

(a) What is the posterior after 12 consecutive wins (12 wins, 0 losses added to the prior)?

(b) What is the posterior probability that the bettor's true win rate exceeds 55%?

(c) A frequentist colleague says "12 wins in a row at a 52.4% base rate has probability $0.524^{12} \approx 0.0004$, so the bettor must be skilled." Explain the Bayesian counterargument.

(d) What additional data would you need to be 90% confident that the bettor's true win rate exceeds the break-even rate?
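Parts (a) and (b) can be computed directly:

```python
from scipy.stats import beta

a0, b0 = 52.4, 47.6          # prior centered at the -110 break-even rate
a1, b1 = a0 + 12, b0 + 0     # 12 straight wins, part (a)

post_mean = a1 / (a1 + b1)
p_over_55 = 1 - beta.cdf(0.55, a1, b1)   # part (b)
```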


Exercise D.4 --- Bayesian vs. Frequentist Model Evaluation

You build two models for the same prediction task:

- Model A: Frequentist logistic regression (point estimates for coefficients)
- Model B: Bayesian logistic regression (posterior distributions for coefficients)

Both are fit on 3 seasons of NBA data and tested on 1 season.

(a) Describe one advantage Model B has for generating probability estimates for betting.

(b) Describe one advantage Model A has in terms of computational efficiency and deployment.

(c) When would the two models produce nearly identical predictions?

(d) Describe a specific scenario in sports betting where the Bayesian uncertainty quantification from Model B would lead to a meaningfully different betting decision than Model A's point estimates.


Exercise D.5 --- Evaluating a Bayesian Betting System

A betting system uses Bayesian updating to track team win probabilities and bets when the posterior mean deviates from the market by more than 5 percentage points. Over two seasons:

Season     Bets   Wins   Win Rate   ROI     Avg Edge
Season 1   124    72     58.1%      +8.2%   6.8%
Season 2   131    66     50.4%      -4.1%   5.9%

(a) What might explain the performance decline from Season 1 to Season 2?

(b) From a Bayesian perspective, if your prior belief is that the system has a 55% long-term win rate, what is your posterior belief after observing both seasons (255 total bets, 138 wins)?

(c) Compute the posterior probability that the system is profitable (win rate > 52.4% at -110).

(d) Should you continue using this system? Justify your recommendation using Bayesian reasoning.
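For parts (b) and (c), the 55% prior belief must be encoded as a distribution; Beta(55, 45) (mean 0.55, effective sample size 100) is one hypothetical choice, and the answer is somewhat sensitive to that choice:

```python
from scipy.stats import beta

a0, b0 = 55, 45                   # hypothetical encoding of the 55% prior belief
wins, losses = 138, 255 - 138     # combined two-season record
a1, b1 = a0 + wins, b0 + losses

post_mean = a1 / (a1 + b1)                     # part (b)
p_profitable = 1 - beta.cdf(0.524, a1, b1)     # part (c)
```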


Part E: Research & Extension

Each problem is worth 5 points. These require independent research beyond Chapter 10. Cite all sources.


Exercise E.1 --- History of Bayesian Methods in Sports

Research the history of Bayesian methods in sports analytics. Write a 500-700 word essay covering (a) early applications of Bayesian statistics in baseball (e.g., empirical Bayes estimates of batting averages), (b) the role of Bayesian methods in the Elo rating system and its variants, (c) modern applications in basketball analytics (player evaluation, lineup optimization), (d) adoption of Bayesian methods by professional sports teams and betting operations, and (e) the current frontier (real-time Bayesian updating for in-game predictions).


Exercise E.2 --- Probabilistic Programming for Sports

Research one probabilistic programming framework (PyMC, Stan, Turing.jl, or NumPyro) and write a 400-600 word summary of (a) its computational approach (MCMC algorithm used), (b) its strengths and weaknesses for sports modeling, (c) at least one published sports application, and (d) a comparison with at least one alternative framework.


Exercise E.3 --- Bayesian Approaches to Market Efficiency

Research how Bayesian methods have been used to study the efficiency of sports betting markets. Find at least two academic papers or substantive analyses that use Bayesian techniques to test whether betting markets are efficient. For each, summarize (a) the research question, (b) the Bayesian methodology used, (c) the key findings, and (d) implications for bettors.


Exercise E.4 --- Hierarchical Models in Team Sports

Research hierarchical Bayesian models as applied to team sports ratings. Find at least two examples (papers, blog posts, or open-source implementations) of hierarchical models used for rating teams. For each, describe (a) the sport and prediction target, (b) the hierarchical structure (what is shared across teams), (c) how the model handles new or relocated teams, and (d) how it compares to non-hierarchical alternatives.


Exercise E.5 --- Bayesian Optimization for Betting Strategy Tuning

Research Bayesian optimization and how it could be applied to tuning betting strategy hyperparameters (e.g., edge thresholds, Kelly fraction multipliers, model weights). Write a 400-600 word summary covering (a) the basic idea of Bayesian optimization (surrogate models, acquisition functions), (b) why it is suitable for betting strategy tuning where each evaluation is expensive (a full backtest), (c) a concrete example of what you would optimize, and (d) potential pitfalls (overfitting to backtested performance).


Scoring Guide

Part                           Problems   Points Each   Total Points
A: Conceptual Understanding    8          5             40
B: Calculations                7          5             35
C: Programming Challenges      5          10            50
D: Analysis & Interpretation   5          5             25
E: Research & Extension        5          5             25
Total                          30         ---           175

Grading Criteria

Part A (Conceptual): Full credit requires clear, accurate explanations that demonstrate understanding of Bayesian principles and their relevance to sports betting. Partial credit for incomplete but correct reasoning.

Part B (Calculations): Full credit requires correct final answers with all work shown. Partial credit for correct methodology with arithmetic errors.

Part C (Programming): Graded on correctness (40%), code quality and documentation (30%), and test coverage (30%). Code must execute without errors.

Part D (Analysis): Graded on analytical depth, logical reasoning, and appropriate application of Bayesian concepts to real-world betting scenarios. Multiple valid approaches may exist.

Part E (Research): Graded on research quality, source credibility, analytical depth, and clear writing. Minimum source requirements specified per problem.


Solutions: Complete worked solutions for all exercises are available in code/exercise-solutions.py. For programming challenges, reference implementations are provided in the code/ directory.