Chapter 3 Exercises: Probability Fundamentals
Part A: Conceptual Questions (8 Questions)
A.1: What Is Probability?
Explain the difference between the frequentist and Bayesian interpretations of probability. For each interpretation, describe how it would answer the question: "What does it mean when a prediction market contract is priced at $0.65?"
In your answer, address:
- How each interpretation defines probability
- How each interpretation handles one-time events (e.g., "Will Candidate X win the 2028 election?")
- Which interpretation is more natural for prediction markets and why
A.2: Sample Spaces in Practice
For each of the following prediction market scenarios, write out the sample space. State whether the sample space is finite, countably infinite, or uncountably infinite. Explain your reasoning.
a) A market on which party wins the next UK general election (Conservative, Labour, or Other).
b) A market on the exact closing price of the S&P 500 on December 31.
c) A market on how many hurricanes will make landfall in the US this year.
d) A market on whether a specific Supreme Court case is decided 5-4, 6-3, 7-2, 8-1, or 9-0.
A.3: Mutually Exclusive vs. Independent
Explain the difference between mutually exclusive events and independent events. Can two events be both mutually exclusive and independent? Provide a prediction market example for each concept.
A.4: The Base Rate Fallacy
A prediction market trader reads a report that claims: "Whenever this indicator has flashed in the past, the market crashed 80% of the time." The trader immediately buys crash insurance contracts.
Explain why this reasoning may be flawed using the concept of base rates and Bayes' theorem. What additional information does the trader need?
A.5: Why Variance Matters
Two traders both have positive expected value strategies:
- Trader A: EV = $0.02 per trade, makes 100 trades per day on contracts priced near $0.50.
- Trader B: EV = $0.08 per trade, makes 5 trades per day on contracts priced near $0.50.
Which trader has more predictable daily results? Justify your answer using variance and the concepts from this chapter. Which trader would you rather be, and why?
A.6: Interpreting Conditional Probability
A prediction market for "Will Company X be acquired?" is currently priced at $0.25. You observe:
- P(Acquisition | Positive earnings report) = 0.40
- P(Acquisition | Negative earnings report) = 0.10
The earnings report is about to be released. Explain in plain language what these conditional probabilities mean. If the market price moves to $0.40 immediately after a positive earnings report, is the market behaving consistently with these probabilities?
A.7: The Gambler's Fallacy and the LLN
Your friend has lost money on 7 of his last 10 prediction market trades (all on independent events). He says: "I'm due for a winning streak. The law of large numbers says things have to even out."
Explain why your friend is wrong. What does the LLN actually say? How should past losses inform future trading decisions?
A.8: Conjugate Priors
Explain in your own words what a conjugate prior is and why the Beta-Binomial model is particularly useful for prediction markets. What happens to the posterior as you observe more and more data? Under what conditions does the choice of prior matter a lot, and when does it matter little?
Part B: Calculations (8 Problems)
B.1: Basic Probability Rules
A prediction market platform offers the following contracts for an upcoming election:
| Candidate | Market Price |
|---|---|
| Candidate A | $0.45 |
| Candidate B | $0.35 |
| Candidate C | $0.15 |
| Other | $0.08 |
a) Do these prices form a valid probability distribution? If not, what might explain the discrepancy?
b) If you assume the true probabilities are proportional to these prices but must sum to 1, what are the normalized probabilities?
c) What is the probability that neither Candidate A nor Candidate B wins?
d) If Candidates A and B are from the same party and one drops out, the remaining one absorbs all their support. What would the new probabilities be if Candidate B drops out?
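The proportional normalization described in part (b) can be sketched as below. The function name `normalize_prices` and the two-outcome example market are illustrative assumptions; the prices are deliberately not the ones in the table above.

```python
def normalize_prices(prices):
    """Rescale contract prices proportionally so that they sum to 1."""
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("prices must have a positive sum")
    return {name: price / total for name, price in prices.items()}


# Hypothetical market whose prices sum to 1.10 (not the B.1 table).
example_prices = {"Yes": 0.66, "No": 0.44}
implied = normalize_prices(example_prices)
print(implied)
```

Dividing every price by the same total preserves the ratios between outcomes, which is exactly the "proportional to these prices" assumption stated in the problem.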
B.2: Conditional Probability
In a prediction market, you observe the following:
- P(Policy passes) = 0.40
- P(Market rally | Policy passes) = 0.70
- P(Market rally | Policy fails) = 0.30
a) Calculate P(Market rally) using the law of total probability.
b) Calculate P(Policy passes | Market rally) using Bayes' theorem.
c) Calculate P(Policy passes | No market rally).
d) If you observe a market rally, how should the prediction market price for the policy contract change?
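The two steps used in this problem, the law of total probability followed by Bayes' theorem, can be sketched as a pair of small functions. The function names and the example inputs are assumptions chosen for illustration; they are not the values given in this problem.

```python
def total_probability(prior, p_e_given_h, p_e_given_not_h):
    """P(E) = P(E | H) P(H) + P(E | not H) P(not H)."""
    return p_e_given_h * prior + p_e_given_not_h * (1 - prior)


def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) = P(E | H) P(H) / P(E)."""
    evidence = total_probability(prior, p_e_given_h, p_e_given_not_h)
    if evidence == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return p_e_given_h * prior / evidence


# Hypothetical inputs: P(H) = 0.5, P(E | H) = 0.8, P(E | not H) = 0.2.
p_e = total_probability(0.5, 0.8, 0.2)
p_h_given_e = bayes_posterior(0.5, 0.8, 0.2)
# Conditioning on "E did not happen" uses the complementary likelihoods.
p_h_given_not_e = bayes_posterior(0.5, 1 - 0.8, 1 - 0.2)
print(p_e, p_h_given_e, p_h_given_not_e)
```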
B.3: Bayes' Theorem — Drug Test Analogy
A prediction market contract pays $1 if a certain economic indicator exceeds a threshold. You have a "signal" (a statistical model) that predicts whether the indicator will exceed the threshold.
- The signal has a true positive rate of 90% (if the indicator will exceed, the signal correctly predicts "yes" 90% of the time).
- The signal has a false positive rate of 20% (if the indicator will not exceed, the signal incorrectly predicts "yes" 20% of the time).
- The base rate (prior probability) that the indicator exceeds the threshold is 15%.
a) If the signal says "yes," what is the posterior probability that the indicator will exceed the threshold?
b) If the market is priced at $0.15 and your signal says "yes," should you buy? What is your EV?
c) If you improve the false positive rate to 5%, recalculate the posterior and EV.
d) What false positive rate would make the posterior exactly 0.50?
B.4: Inclusion-Exclusion
Three prediction markets track whether certain bills pass Congress by year-end:
- P(Bill A passes) = 0.50
- P(Bill B passes) = 0.40
- P(Bill C passes) = 0.30
- P(A and B pass) = 0.25
- P(A and C pass) = 0.20
- P(B and C pass) = 0.15
- P(All three pass) = 0.10
a) Calculate P(at least one bill passes).
b) Calculate P(exactly one bill passes).
c) Calculate P(none pass).
d) If you buy contracts on all three bills at their market prices, what is the probability you profit on at least one?
B.5: Expected Value Analysis
You are evaluating three prediction market trades:
| Contract | Market Price | Your Estimated Probability | Position Size |
|---|---|---|---|
| "GDP > 3%" | $0.20 | 0.28 | $200 |
| "Inflation < 2%" | $0.60 | 0.52 | $150 |
| "Rate cut by June" | $0.45 | 0.55 | $300 |
a) Calculate the EV for each trade.
b) What is the total portfolio EV?
c) For each trade, calculate the maximum profit and maximum loss.
d) Which trade has the best risk-adjusted EV (EV / max_loss)?
e) If you can only make one trade, which should you choose and why?
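One way to organize the quantities asked for here is sketched below. It rests on two assumptions that the problem does not state explicitly: the position size is the dollar amount spent buying contracts at the market price, and each contract pays $1 if the event occurs. The function name `buy_side_summary` and the example inputs are hypothetical.

```python
def buy_side_summary(price, prob, stake):
    """Summarize a buy of binary contracts that pay $1 if the event occurs.

    Assumption: `stake` is the dollar amount spent at `price` per contract.
    """
    if not 0 < price < 1:
        raise ValueError("price must lie strictly between 0 and 1")
    contracts = stake / price
    ev = contracts * (prob - price)          # expected profit in dollars
    max_profit = contracts * (1 - price)     # every contract pays out
    max_loss = stake                         # every contract expires worthless
    return {
        "contracts": contracts,
        "ev": ev,
        "max_profit": max_profit,
        "max_loss": max_loss,
        "ev_per_dollar_risked": ev / max_loss,
    }


# Hypothetical trade: price $0.50, estimated probability 0.60, $100 staked.
summary = buy_side_summary(price=0.50, prob=0.60, stake=100.0)
print(summary)
```

When the estimated probability is below the market price, the buy-side EV is negative; this sketch does not model taking the other side of the contract.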
B.6: Variance and Risk
Using the trades from B.5:
a) Calculate the variance and standard deviation of each individual trade outcome (assuming your estimated probabilities are correct).
b) Assuming all three trades are independent, calculate the portfolio variance and standard deviation.
c) Now assume trades 1 and 3 have a correlation of 0.60 (GDP growth and rate cuts are positively correlated). Recalculate the portfolio variance. How much higher is it?
d) Calculate the probability that you lose money on ALL three trades simultaneously, assuming independence.
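The portfolio variance in parts (b) and (c) follows the general formula $\text{Var}(P) = \sum_i \sum_j \rho_{ij}\,\sigma_i\,\sigma_j$, where $\rho_{ii} = 1$. A minimal sketch, with hypothetical function names and example numbers that are not taken from B.5:

```python
import math


def binary_position_std(contracts, prob):
    """Std. dev. of the payoff of `contracts` binary contracts paying $1 each."""
    return contracts * math.sqrt(prob * (1 - prob))


def portfolio_variance(stds, corr):
    """Sum of corr[i][j] * stds[i] * stds[j] over all pairs (i, j)."""
    n = len(stds)
    return sum(corr[i][j] * stds[i] * stds[j]
               for i in range(n) for j in range(n))


# Hypothetical two-position portfolio with standard deviations $3 and $4.
stds = [3.0, 4.0]
independent = portfolio_variance(stds, [[1.0, 0.0], [0.0, 1.0]])
correlated = portfolio_variance(stds, [[1.0, 0.5], [0.5, 1.0]])
print(independent, correlated)
```

With zero correlation the variances simply add; a positive correlation adds the cross term twice, once for each ordering of the pair.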
B.7: Beta Distribution
You are modeling your uncertainty about the true probability of an event using a Beta distribution.
a) If you start with Beta(1, 1) (uniform prior) and observe 6 successes and 4 failures, what is the posterior distribution? What is its mean?
b) If you start with Beta(10, 10) (strong prior centered at 0.50) and observe the same 6 successes and 4 failures, what is the posterior? What is its mean?
c) Compare the posteriors from (a) and (b). Why are they different? Which is more influenced by the data?
d) Starting from the Beta(10, 10) prior, how many batches of 10 trials (each batch containing 6 successes and 4 failures) would you need to observe before the posterior mean rises above 0.55?
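The conjugate update used throughout this problem adds observed successes to alpha and observed failures to beta. A minimal sketch, where the function names and the Beta(2, 2) example are assumptions chosen so as not to duplicate the parts above:

```python
def beta_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing Bernoulli outcomes."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior Beta(2, 2) updated with 3 successes and 1 failure.
post_alpha, post_beta = beta_update(2, 2, successes=3, failures=1)
post_mean = beta_mean(post_alpha, post_beta)
print(post_alpha, post_beta, post_mean)
```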
B.8: Sequential Bayesian Updating
A prediction market contract is currently at $0.50. Three pieces of evidence arrive:
- Evidence 1: Likelihood ratio = 3 (evidence is 3x more likely if hypothesis is true)
- Evidence 2: Likelihood ratio = 0.5 (evidence is half as likely if hypothesis is true)
- Evidence 3: Likelihood ratio = 4 (evidence is 4x more likely if hypothesis is true)
a) Using the odds form of Bayes' theorem, calculate the posterior probability after each piece of evidence.
b) What is the final probability after all three pieces?
c) Does the order of evidence matter? Prove your answer.
d) What single likelihood ratio would produce the same update as all three combined?
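In the odds form of Bayes' theorem, posterior odds equal prior odds multiplied by the likelihood ratio. The sketch below applies that rule one piece of evidence at a time; the function names and the likelihood ratios in the example are illustrative and differ from the ones listed above.

```python
def prob_to_odds(p):
    """Convert a probability in (0, 1) to odds in favor."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds in favor back to a probability."""
    return odds / (1 + odds)


def sequential_update(prior, likelihood_ratios):
    """Return the posterior probability after each likelihood ratio."""
    odds = prob_to_odds(prior)
    history = []
    for lr in likelihood_ratios:
        odds *= lr  # posterior odds = prior odds * likelihood ratio
        history.append(odds_to_prob(odds))
    return history


# Hypothetical evidence stream: prior 0.50, then ratios 2 and 0.25.
history = sequential_update(0.50, [2, 0.25])
print(history)
```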
Part C: Programming Challenges (6 Challenges)
C.1: ProbabilitySpace Class
Build a ProbabilitySpace class that:
- Takes a dictionary of outcomes and their probabilities as input
- Validates that probabilities sum to 1 (within tolerance)
- Implements methods for:
  - probability(event: set) -> float: returns P(event)
  - complement(event: set) -> float: returns P(event complement)
  - union(event_a: set, event_b: set) -> float: returns P(A union B)
  - intersection(event_a: set, event_b: set) -> float: returns P(A intersect B)
  - conditional(event_a: set, event_b: set) -> float: returns P(A | B)
  - is_independent(event_a: set, event_b: set, tol: float) -> bool
Test your class with a sample space representing a prediction market with 4 candidates.
C.2: Full Bayesian Updater with Visualization
Build a BayesianTracker class that:
- Takes a prior probability and hypothesis name
- Has an update(evidence_name, likelihood_if_true, likelihood_if_false) method
- Tracks the full history of updates
- Includes a plot_history() method that creates a matplotlib line chart showing how the probability evolved with each piece of evidence
- Includes a to_dataframe() method that returns a pandas DataFrame of the update history
- Adds labels showing the likelihood ratio for each update on the plot
Test it with at least 5 sequential evidence updates.
C.3: Monte Carlo LLN Simulator
Write a function simulate_lln(true_prob, market_price, max_trades, n_paths) that:
- Simulates n_paths independent trading paths, each with max_trades binary trades
- For each path, tracks cumulative average profit
- Plots all paths on one chart, showing convergence to the true EV
- Adds horizontal lines for the true EV and for zero
- Adds a histogram of final average profits as a subplot
- Calculates and prints: the percentage of paths that are profitable at trade 10, 100, 500, and max_trades
- Demonstrates that longer trading horizons lead to higher probability of profitability for positive EV strategies
C.4: Distribution Explorer
Create a script that generates a 2x2 subplot figure showing:
- Top-left: Bernoulli PMFs for p = 0.2, 0.5, 0.8 (bar charts overlaid)
- Top-right: Binomial PMFs for n = 20 with p = 0.3, 0.5, 0.7
- Bottom-left: Beta PDFs for (alpha, beta) = (1,1), (2,5), (5,2), (10,10), (50,50)
- Bottom-right: Normal PDFs for (mu, sigma) = (0, 1), (0, 2), (2, 1)
Add appropriate titles, legends, and labels to each subplot. Include a brief interpretation of each distribution in terms of prediction markets as print statements.
C.5: Expected Value Optimizer
Build a function find_best_trades(contracts, budget) that:
- Takes a list of contracts, each with: name, market_price, your_estimated_prob, max_position
- Takes a total budget constraint
- Uses a greedy algorithm to allocate budget to contracts with the highest EV per dollar risked
- Returns the optimal allocation
- Reports total EV, total risk (max loss), and EV/risk ratio
- Handles edge cases: no positive EV trades, budget smaller than cheapest contract, etc.
Test with a portfolio of 10 hypothetical prediction market contracts.
C.6: Calibration Checker
Write a CalibrationChecker class that:
- Takes a list of (predicted_probability, actual_outcome) pairs
- Bins predictions into probability buckets (e.g., 0-10%, 10-20%, ..., 90-100%)
- For each bucket, calculates the average predicted probability and the actual frequency of outcomes
- Plots a calibration curve (predicted vs. actual) with a diagonal reference line
- Calculates the Brier score: $\text{BS} = \frac{1}{n}\sum_{i=1}^n (p_i - o_i)^2$
- Prints a summary showing over-confident and under-confident buckets
Test with both well-calibrated and poorly-calibrated synthetic data.
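As a starting point for the Brier score component only, the formula above translates directly into a few lines. The function name `brier_score` and the three example pairs are assumptions for illustration; binning, plotting, and the class structure are left to the challenge.

```python
def brier_score(pairs):
    """Mean squared difference between predicted probabilities and outcomes.

    `pairs` is a sequence of (predicted_probability, actual_outcome) tuples,
    where actual_outcome is 0 or 1.
    """
    if not pairs:
        raise ValueError("at least one prediction is required")
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


# Hypothetical forecasts: squared errors are 0.01, 0.04, and 0.16.
example_pairs = [(0.9, 1), (0.2, 0), (0.6, 1)]
score = brier_score(example_pairs)
print(score)
```

Lower scores are better: a forecaster who always predicts 0.5 scores 0.25 regardless of the outcomes.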
Part D: Analysis Scenarios (5 Scenarios)
D.1: Market-Implied Probability Analysis
A prediction market offers the following contracts for "Which city will host the 2036 Olympics?":
| City | Market Price |
|---|---|
| Istanbul | $0.22 |
| Doha | $0.18 |
| Mexico City | $0.15 |
| Jakarta | $0.12 |
| Toronto | $0.10 |
| Mumbai | $0.08 |
| Other | $0.20 |
a) What is the total "overround" (sum of prices minus 1)?
b) Remove the overround proportionally to derive implied probabilities.
c) If you believe Istanbul's true probability is 0.30, calculate the EV of buying Istanbul contracts at $0.22.
d) The IOC announces that Doha has been shortlisted along with 2 other unnamed cities. How should this affect the prices? Use Bayes' theorem with reasonable likelihood estimates.
e) After the announcement, Doha's price jumps to $0.35. Is this reaction consistent with your Bayesian analysis?
D.2: Evaluating a Trading Strategy
A trader shows you their last 200 trades on binary prediction markets:
- 118 trades were profitable
- Average profit on winning trades: $0.22
- Average loss on losing trades: $0.18
- All trades were on contracts priced near $0.50
a) What is the trader's win rate? Is it statistically significantly different from 50%? (Use a binomial test or normal approximation.)
b) Calculate the trader's expected profit per trade using the observed data.
c) Calculate a 95% confidence interval for the true expected profit per trade.
d) How many more trades would you need to observe to be 99% confident the trader has a positive edge?
e) Could this track record be explained by luck alone? What is the probability of achieving at least 118 wins out of 200 fair coin flips?
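The "luck alone" question in part (e) is an upper-tail binomial probability. An exact calculation needs only the standard library; the sketch below uses a hypothetical function name and a small example (8 or more wins in 10 fair flips) rather than the figures in this scenario.

```python
from math import comb


def binomial_upper_tail(n, k, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# Hypothetical check: at least 8 wins in 10 fair coin flips.
# The favorable counts are C(10,8) + C(10,9) + C(10,10) = 45 + 10 + 1 = 56
# out of 2**10 = 1024 equally likely sequences.
tail = binomial_upper_tail(10, 8)
print(tail)
```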
D.3: Correlated Prediction Markets
You hold positions in four prediction markets, all expiring on Election Day:
- "Party A wins the presidency" (price: $0.55)
- "Party A wins the Senate" (price: $0.45)
- "Party A wins the House" (price: $0.60)
- "GDP growth exceeds 3% in Q4" (price: $0.30)
You believe these markets have the following correlation structure:
- Presidency and Senate: correlation 0.70
- Presidency and House: correlation 0.65
- Senate and House: correlation 0.55
- GDP and each of the political markets: correlation 0.15
a) Calculate the portfolio variance assuming you hold $100 in each contract. (Hint: you need the covariance matrix.)
b) Compare this to the variance you would calculate if you incorrectly assumed independence.
c) What is the ratio of the true portfolio standard deviation to the independence-assumed standard deviation?
d) Suggest a position adjustment that would reduce the portfolio variance while maintaining similar expected value.
D.4: Prior Sensitivity Analysis
You are analyzing a prediction market for "Will a recession occur in the next 12 months?" The market price is $0.20.
You want to use Bayes' theorem to update this probability based on a new jobs report that missed expectations. You estimate:
- P(Bad jobs report | Recession coming) = 0.70
- P(Bad jobs report | No recession coming) = 0.25
a) Calculate the posterior probability using the market price ($0.20) as your prior.
b) Now calculate the posterior using three different priors: 0.10, 0.30, and 0.50.
c) Plot posterior vs. prior for this evidence (holding the likelihoods constant). What shape is this curve?
d) At what prior does the posterior equal 0.50?
e) Discuss: How sensitive is the posterior to the choice of prior? Under what conditions is prior sensitivity a concern for prediction market traders?
D.5: Is This Market Efficient?
A prediction market for "Will the Fed raise rates at their next meeting?" has shown the following price trajectory over 5 days:
| Day | Event | Price |
|---|---|---|
| Monday | Market opens | $0.30 |
| Tuesday | CPI data comes in higher than expected | $0.55 |
| Wednesday | Fed governor gives hawkish speech | $0.65 |
| Thursday | Employment data is weaker than expected | $0.50 |
| Friday | Market closes before meeting | $0.52 |
a) Calculate the implied likelihood ratio for each day's price change (treating the previous day's price as the prior and the current price as the posterior).
b) Are the implied likelihood ratios reasonable? What would an unreasonable likelihood ratio suggest?
c) On Thursday, the price dropped from $0.65 to $0.50. Calculate what the market is implying about P(Weak employment data | Rate hike) vs. P(Weak employment data | No rate hike).
d) A Bayesian who started at $0.30 and processed all the same evidence should arrive at $0.52. Using the odds form, verify that the product of all likelihood ratios applied to the initial odds gives the final odds.
e) What would it mean if the final price were significantly different from what sequential Bayesian updating would predict?
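The implied likelihood ratio for a single price move is the posterior odds divided by the prior odds. A minimal sketch, in which the function name and the example prices are hypothetical and do not come from the table above:

```python
def implied_likelihood_ratio(prior_price, posterior_price):
    """Likelihood ratio implied by a move from one price to the next.

    Treats each price as a probability strictly between 0 and 1.
    """
    prior_odds = prior_price / (1 - prior_price)
    posterior_odds = posterior_price / (1 - posterior_price)
    return posterior_odds / prior_odds


# Hypothetical move from $0.50 to $0.75: odds go from 1:1 to 3:1.
lr = implied_likelihood_ratio(0.50, 0.75)
print(lr)
```

A ratio above 1 means the market treated the news as evidence for the event; a ratio below 1 means it treated the news as evidence against.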
Part E: Research Questions (3 Questions)
E.1: Probability Theory in Market Design
Research how probability theory informs the design of prediction market mechanisms. Topics to explore:
- How do automated market makers (like Hanson's LMSR) use logarithmic scoring rules rooted in information theory and probability?
- What is the connection between proper scoring rules and probability elicitation?
- How does the concept of a "probability simplex" relate to markets with multiple outcomes?
Write a 500-800 word essay summarizing your findings, with at least 3 academic references.
E.2: The Bayesian vs. Frequentist Debate in Prediction
Research the historical and ongoing debate between Bayesian and frequentist statistics as it applies to prediction and forecasting.
Address the following:
- What are the key philosophical differences?
- How does each framework handle the "reference class problem" (what set of events should we compare a unique event to)?
- Are prediction market prices more naturally interpreted through a Bayesian or frequentist lens?
- What are the practical differences when building forecasting models?
Write a 500-800 word essay with examples from prediction markets.
E.3: Novel Applications of Bayesian Updating
Identify and describe three novel or emerging applications of Bayesian updating beyond traditional prediction markets. For each application, explain:
- The problem being solved
- How Bayesian updating is applied
- How it connects to the prediction market framework covered in this chapter
- Current limitations and open questions
Possible areas to explore: clinical trial adaptive designs, cybersecurity threat assessment, climate model updating, sports analytics, intelligence analysis. Write 400-600 words.