Chapter 20 Exercises: Probability Thinking

Contributors to Introduction to Data Science


How to use these exercises: The best way to learn probability is by doing — specifically, by simulating. For every problem, try to solve it both by reasoning/formula AND by simulation. If the two answers agree, you can be confident in both approaches.

Difficulty key: Foundational | Intermediate | Advanced | Extension

Part A: Conceptual Understanding (Foundational)

Exercise 20.1 — Three interpretations

For each of the following probability statements, identify which interpretation of probability is being used (classical, frequentist, or subjective):

(a) "The probability of drawing a heart from a standard deck is 13/52." (b) "Based on historical data, the probability that it rains in Seattle on a randomly chosen day in January is 0.58." (c) "I think there's a 30% chance the project finishes on time." (d) "If you rolled this die a million times, about 1/6 of the rolls would be sixes." (e) "The probability of a fair coin landing heads is 0.5."

Guidance

(a) Classical — counting equally likely outcomes. (b) Frequentist — based on observed long-run frequency. (c) Subjective — a personal degree of belief. (d) Frequentist — defined by what would happen over many repetitions. (e) Classical (could also be argued as frequentist) — based on symmetry of the coin.

Exercise 20.2 — Complement rule

(a) If the probability of rain tomorrow is 0.35, what is the probability it does NOT rain? (b) If 23% of countries have vaccination rates below 70%, what percentage have rates of 70% or above? (c) Why is the complement rule useful? Describe a situation where it's easier to compute P(not A) and then subtract from 1 than to compute P(A) directly.

Guidance

(a) 1 - 0.35 = 0.65. (b) 100% - 23% = 77%. (c) Example: "What's the probability of getting at least one 6 in 10 dice rolls?" Computing P(at least one 6) directly requires considering 1, 2, 3, ... up to 10 sixes. But P(zero sixes) = (5/6)^10, so P(at least one) = 1 - (5/6)^10 is much simpler.

Exercise 20.3 — Independence check

Determine whether each pair of events is independent or not independent. Explain your reasoning.

(a) Flipping heads on a coin, then flipping heads again (b) Drawing a red card from a deck, then drawing another red card without replacing the first (c) It raining today, and a stock market crash today (d) A student studying for an exam, and the student passing the exam

Guidance

(a) Independent — each flip is unaffected by previous flips. (b) NOT independent — removing a red card changes the proportion of red cards remaining. (c) Independent (almost certainly — weather and stock crashes have no causal connection). (d) NOT independent — studying affects the probability of passing.

Exercise 20.4 — The gambler's fallacy

A roulette wheel has landed on red 8 times in a row. Your friend says, "Black is definitely coming next — it's overdue." Explain why this reasoning is wrong. Then explain what the law of large numbers actually says (and what it doesn't say) about long streaks.

Guidance

Each spin is independent — the wheel has no memory. The probability of black on the next spin is exactly the same as always (about 18/38 on an American wheel). The law of large numbers says that over thousands of spins, the proportion of reds and blacks will converge to their theoretical probabilities — but it says nothing about what the next individual spin will be. Streaks of 8 are unusual but not impossible, and they don't make the opposite outcome more likely.

Exercise 20.5 — Conditional probability intuition

Elena's data shows that among high-income countries, 92% have measles vaccination rates above 80%. Among all countries, 65% have rates above 80%.

(a) Write these facts using conditional probability notation. (b) Is P(high income | vacc > 80%) the same as P(vacc > 80% | high income)? (c) Without computing the exact answer, explain why P(high income | vacc > 80%) would be LESS than 92%.

Guidance

(a) P(vacc > 80% | high income) = 0.92 and P(vacc > 80%) = 0.65. (b) No. They condition in opposite directions, so in general they are different numbers. (c) By Bayes' theorem, P(high income | vacc > 80%) = P(vacc > 80% | high income) * P(high income) / P(vacc > 80%) = 0.92 * P(high income) / 0.65, which is below 0.92 whenever high-income countries make up less than 65% of all countries. Intuitively, many countries that are not high-income also exceed 80% vaccination, so being above 80% does not guarantee that a country is high-income.
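Plugging numbers into Bayes' theorem makes the direction-of-conditioning effect concrete. This is a minimal sketch: the exercise does not give the share of high-income countries, so `p_high = 0.30` below is a hypothetical value chosen only for illustration, and the function name is also hypothetical.

```python
def p_high_given_vacc(p_vacc_given_high=0.92, p_vacc=0.65, p_high=0.30):
    """Bayes' theorem: P(high | vacc) = P(vacc | high) * P(high) / P(vacc).

    p_high (share of high-income countries) is a hypothetical assumption.
    """
    return p_vacc_given_high * p_high / p_vacc

print(round(p_high_given_vacc(), 3))  # prints 0.425
```

Try `p_high=0.65`: the two conditional probabilities coincide only when P(high income) equals P(vacc > 80%).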

Exercise 20.6 — Bayes' theorem in words

Explain Bayes' theorem in plain English to a friend who has never taken a statistics class. Use the medical test example: a disease affects 1 in 10,000 people, and the test is 95% accurate (5% false positive rate). Your friend tests positive. Walk them through why the probability they have the disease is much less than 95%.

Guidance

In a group of 10,000 people, only 1 has the disease. The test correctly identifies that person (95% chance). But of the 9,999 healthy people, 5% — about 500 — will also test positive. So out of roughly 501 positive tests, only 1 is a true positive. That's about 0.2%. The key insight: when the condition is very rare, even a small false positive rate produces many more false alarms than true detections.
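The natural-frequency argument is the same calculation as Bayes' theorem. A minimal sketch, assuming (as the guidance does) that the test's sensitivity is 95%; `posterior` is a hypothetical helper name.

```python
def posterior(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive) via Bayes' theorem."""
    true_pos = sensitivity * prevalence                  # P(positive and sick)
    false_pos = false_positive_rate * (1 - prevalence)   # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

# Exercise 20.6: 1 in 10,000 prevalence, 95% sensitivity, 5% false positive rate
p = posterior(1 / 10_000, 0.95, 0.05)
print(f"{p:.4f}")  # prints 0.0019
```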

Exercise 20.7 — Addition rule

A die is rolled once. Compute the probability of each event: (a) Rolling a 2 or a 5 (mutually exclusive) (b) Rolling an even number or a number greater than 3 (overlapping) (c) Rolling a number less than 3 or greater than 4 (mutually exclusive)

Verify each answer by simulation.

Guidance

(a) P(2 or 5) = 1/6 + 1/6 = 2/6. (b) Even: {2,4,6}. >3: {4,5,6}. Overlap: {4,6}. P = 3/6 + 3/6 - 2/6 = 4/6. (c) <3: {1,2}. >4: {5,6}. No overlap. P = 2/6 + 2/6 = 4/6.
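For the simulation check, one way is to roll a large batch of dice once and evaluate each event as a boolean mask. A sketch using NumPy's `default_rng`; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=100_000)  # simulated die rolls, values 1..6

p_a = np.mean((rolls == 2) | (rolls == 5))      # 2 or 5
p_b = np.mean((rolls % 2 == 0) | (rolls > 3))   # even or greater than 3
p_c = np.mean((rolls < 3) | (rolls > 4))        # less than 3 or greater than 4

print(p_a, p_b, p_c)  # should be close to 2/6, 4/6, 4/6
```

Using `|` on the masks counts each roll at most once, which is exactly what the subtraction of the overlap does in the addition rule.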

Exercise 20.8 — Expected value

Marcus is considering a new menu item. There's a 60% chance it's popular (profit: $800/month) and a 40% chance it flops (loss: $300/month). What is the expected monthly value? Should he introduce the item?

Guidance

E(value) = 0.60 * $800 + 0.40 * (-$300) = $480 - $120 = $360/month. The expected value is positive, so on average the item is profitable. He should introduce it — but he should also consider his risk tolerance (can he afford the $300/month loss for a while if it takes time to catch on?).
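Expected value can also be checked by simulating many months and averaging. A minimal sketch that treats months as independent draws (an assumption; in reality popularity would persist from month to month).

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 100_000
# Each simulated month: popular (+800) with probability 0.6, otherwise a flop (-300)
outcomes = np.where(rng.random(n_months) < 0.60, 800, -300)

exact = 0.60 * 800 + 0.40 * (-300)
print("exact:", exact, "simulated:", outcomes.mean())
```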

Part B: Simulation Exercises (Intermediate)

Exercise 20.9 — Coin flip simulator

Write a function coin_flip_experiment(n_flips, n_experiments) that: (a) Simulates n_experiments runs, each consisting of n_flips coin flips (b) Returns an array of the proportion of heads in each experiment (c) Plots a histogram of the proportions (d) Prints the mean and standard deviation of the proportions

Run it with n_flips = 10, 100, and 1000 (each with 10,000 experiments). How does the distribution of proportions change as n_flips increases? What does this illustrate?

Guidance

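As n_flips increases, the histogram stays centered on 0.5 but gets much narrower: the standard deviation of the proportion is about sqrt(0.25 / n_flips), roughly 0.158, 0.05, and 0.016 for 10, 100, and 1000 flips. This illustrates the law of large numbers (proportions converge to 0.5) and previews the concept of sampling distributions ([Chapter 22](../chapter-22-sampling-estimation/index.md)).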
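One possible shape for `coin_flip_experiment`, as a sketch. The `seed` parameter is an addition for reproducibility (not part of the exercise spec), and the histogram in part (c) is left out so the block runs without matplotlib; `plt.hist(proportions, bins=30)` on the returned array would produce it.

```python
import numpy as np

def coin_flip_experiment(n_flips, n_experiments, seed=None):
    """Proportion of heads in each of n_experiments runs of n_flips fair coin flips."""
    rng = np.random.default_rng(seed)
    # 1 = heads, 0 = tails; one row per experiment, one column per flip
    flips = rng.integers(0, 2, size=(n_experiments, n_flips), dtype=np.int8)
    proportions = flips.mean(axis=1)
    print(f"n_flips={n_flips}: mean={proportions.mean():.4f}, sd={proportions.std():.4f}")
    return proportions

for n in (10, 100, 1000):
    coin_flip_experiment(n, 10_000, seed=1)
```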

Exercise 20.10 — Dice sum simulator

Write a simulation to find the probability distribution of the sum of two fair dice. Run 100,000 trials. Plot the results as a bar chart. Which sum is most likely? Verify that the simulated probabilities match the theoretical ones.

Guidance

The most likely sum is 7 (probability 6/36 = 1/6). The distribution is symmetric and triangular-shaped, peaking at 7. Sums of 2 and 12 are least likely (1/36 each).
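A sketch of the simulation next to the exact distribution. The theoretical count of (die 1, die 2) pairs giving sum s is 6 - |s - 7|, which produces the triangle shape described above. A bar chart of `simulated` (for example with `plt.bar`) is omitted here.

```python
import numpy as np
from fractions import Fraction

rng = np.random.default_rng(7)
n_trials = 100_000
sums = rng.integers(1, 7, size=n_trials) + rng.integers(1, 7, size=n_trials)

# Simulated probability of each sum 2..12
simulated = {s: np.mean(sums == s) for s in range(2, 13)}
# Exact probability: number of pairs giving sum s, out of 36 equally likely pairs
theoretical = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

for s in range(2, 13):
    print(s, round(simulated[s], 4), round(float(theoretical[s]), 4))
```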

Exercise 20.11 — The birthday problem (extended)

Modify the birthday simulation from the chapter to answer: how many people do you need in a room for there to be a 99% chance that at least two share a birthday? Run simulations for group sizes from 2 to 100 and plot P(shared birthday) vs. group size. Mark where the probability crosses 50% and 99%.

Guidance

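The probability first reaches 50% at 23 people and 99% at 57 people. The curve rises steeply and then levels off near 1. This exercise shows how quickly "at least one match" probabilities accumulate: the number of possible pairs grows much faster than the number of people (23 people already form 23 * 22 / 2 = 253 pairs).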
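To check where the simulated curve should cross each threshold, the exact probability follows from the complement rule: P(all birthdays distinct) = (365/365)(364/365)...((365 - n + 1)/365). A sketch assuming 365 equally likely birthdays and ignoring leap years; `p_shared_birthday` is a hypothetical name.

```python
def p_shared_birthday(n, days=365):
    """Exact P(at least two of n people share a birthday)."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct

probs = {n: p_shared_birthday(n) for n in range(2, 101)}
first_50 = min(n for n, p in probs.items() if p >= 0.50)
first_99 = min(n for n, p in probs.items() if p >= 0.99)
print(first_50, first_99)  # 23 57
```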

Exercise 20.12 — Monty Hall extended

Extend the Monty Hall simulation to work with n doors (where n >= 3). The host always opens one losing door. How does the advantage of switching change as the number of doors increases? Try n = 3, 5, 10, 50, and 100.

Guidance

With n doors, staying wins only if your first pick was the car, so P(stay wins) = 1/n. Switching can win only if your first pick was wrong (probability (n-1)/n), and you then have to land on the car among the n-2 doors still closed (choosing uniformly at random among them), so P(switch wins) = (n-1)/n * 1/(n-2) = (n-1)/(n(n-2)). For n = 3 this gives 2/3. Switching is always better, but the advantage shrinks as n grows: the difference is 1/(n(n-2)), and the ratio of switch to stay is (n-1)/(n-2), which approaches 1. For n = 100, staying wins 1% of the time and switching about 1.01%. Simulation will confirm.
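A straightforward loop-based sketch of the n-door game, assuming the host opens exactly one losing door that is not your pick and the switcher then picks uniformly among the doors still closed. Both function names are hypothetical.

```python
import random

def monty_hall_n(n_doors, n_trials=100_000, seed=0):
    """Return (P(win by staying), P(win by switching)) estimated by simulation."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(n_trials):
        car = rng.randrange(n_doors)
        pick = rng.randrange(n_doors)
        # Host opens one door that is neither your pick nor the car
        opened = rng.choice([d for d in range(n_doors) if d not in (pick, car)])
        # Switching means moving to a random door that is still closed
        new_pick = rng.choice([d for d in range(n_doors) if d not in (pick, opened)])
        stay_wins += pick == car
        switch_wins += new_pick == car
    return stay_wins / n_trials, switch_wins / n_trials

def monty_hall_theory(n_doors):
    return 1 / n_doors, (n_doors - 1) / (n_doors * (n_doors - 2))

for n in (3, 5, 10, 50, 100):
    sim = monty_hall_n(n, n_trials=20_000)
    print(n, [round(v, 4) for v in sim], [round(v, 4) for v in monty_hall_theory(n)])
```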

Exercise 20.13 — Medical test Bayes simulation

Create a simulation for the medical test scenario with customizable parameters:
- disease_rate: prevalence of the disease (default 0.001)
- sensitivity: P(positive | disease) (default 0.99)
- false_positive_rate: P(positive | no disease) (default 0.02)

Run the simulation and compare with the Bayes' theorem formula. Then explore: how does the posterior probability change as you vary (a) the disease rate from 0.0001 to 0.1, (b) the false positive rate from 0.001 to 0.1? Create plots showing these relationships.

Guidance

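```python
import numpy as np

def bayes_medical(disease_rate=0.001, sensitivity=0.99, fpr=0.02, n_sim=1_000_000):
    # Simulate disease status for n_sim people
    has_disease = np.random.random(n_sim) < disease_rate
    # Positive test: prob. sensitivity if sick, prob. fpr (false positive rate) if healthy
    test_pos = np.where(has_disease,
                        np.random.random(n_sim) < sensitivity,
                        np.random.random(n_sim) < fpr)
    # P(disease | positive) = share of positive testers who are actually sick
    return np.mean(has_disease[test_pos])
```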

Key insight: the posterior is very sensitive to the base rate. Even with a fantastic test, rare diseases produce low posteriors.
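For the comparison with Bayes' theorem and the two sweeps, a vectorized formula is convenient. A minimal sketch: `bayes_formula` is a hypothetical name, and the plots are left to you (for example `plt.semilogx(rates, post_by_rate)`). With the default parameters the formula gives a posterior of only about 4.7%, which the simulation should reproduce.

```python
import numpy as np

def bayes_formula(disease_rate, sensitivity, fpr):
    """Exact P(disease | positive test) from Bayes' theorem (works on arrays too)."""
    true_pos = sensitivity * disease_rate   # P(positive and disease)
    false_pos = fpr * (1 - disease_rate)    # P(positive and no disease)
    return true_pos / (true_pos + false_pos)

print(f"defaults: {bayes_formula(0.001, 0.99, 0.02):.4f}")

# (a) disease rate from 0.0001 to 0.1 (log-spaced), test characteristics fixed
rates = np.logspace(-4, -1, 7)
post_by_rate = bayes_formula(rates, 0.99, 0.02)

# (b) false positive rate from 0.001 to 0.1, prevalence fixed at 0.001
fprs = np.logspace(-3, -1, 7)
post_by_fpr = bayes_formula(0.001, 0.99, fprs)

for r, p in zip(rates, post_by_rate):
    print(f"disease_rate={r:.4f} -> posterior={p:.3f}")
for f, p in zip(fprs, post_by_fpr):
    print(f"fpr={f:.4f} -> posterior={p:.3f}")
```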

Exercise 20.14 — Sampling variability experiment

Using Elena's vaccination data (or simulated data), investigate sampling variability: (a) Take 1,000 random samples of size n=20 and compute the mean of each (b) Repeat for n=50 and n=100 (c) Plot the three distributions of sample means on the same figure (d) Compute the standard deviation of the sample means for each sample size (e) How does the standard deviation of sample means relate to n? (Hint: it's proportional to 1/sqrt(n))

Guidance

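The standard deviation of sample means (called the standard error) should be approximately population_std / sqrt(n) when samples are drawn with replacement or the population is much larger than n. If you sample without replacement from a small population (such as a dataset of about 200 countries), the spread will be noticeably smaller for n = 100. This relationship is fundamental and will be formalized in Chapter 22.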
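A sketch of the experiment on simulated data. The "population" here is a hypothetical stand-in for Elena's data (normal around 80 with SD 12, clipped to 0 to 100), and samples are drawn with replacement so the sqrt(n) rule should hold closely.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical stand-in for vaccination rates: a large simulated population
population = np.clip(rng.normal(80, 12, size=100_000), 0, 100)
pop_sd = population.std()

results = {}
for n in (20, 50, 100):
    # 1,000 samples of size n, drawn with replacement
    samples = rng.choice(population, size=(1000, n), replace=True)
    means = samples.mean(axis=1)
    results[n] = means.std()
    print(f"n={n}: SD of sample means={results[n]:.3f}, pop_sd/sqrt(n)={pop_sd / np.sqrt(n):.3f}")
```

Plotting `means` for each n on one figure (part c) shows the distributions tightening around the population mean.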

Exercise 20.15 — Simulate P(at least one) problems

Use simulation to estimate the probability of at least one success in these scenarios: (a) At least one 6 in 4 dice rolls (b) At least one head in 3 coin flips (c) At least one shared birthday in a group of 30 (d) At least one defective item in a sample of 20 from a batch with 5% defect rate

For (a) and (b), verify with the complement rule: P(at least one) = 1 - P(none).
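A sketch that simulates all four scenarios and prints the complement-rule answer where it is simple. For (d) it assumes items are defective independently with probability 0.05 (reasonable when the batch is much larger than the sample).

```python
import numpy as np

rng = np.random.default_rng(11)
n_sim = 100_000

# (a) at least one 6 in 4 die rolls
sim_a = np.mean((rng.integers(1, 7, size=(n_sim, 4)) == 6).any(axis=1))
# (b) at least one head in 3 fair coin flips
sim_b = np.mean((rng.random((n_sim, 3)) < 0.5).any(axis=1))
# (c) at least one shared birthday among 30 people: after sorting, a repeat shows up as a zero gap
bdays = np.sort(rng.integers(0, 365, size=(n_sim, 30)), axis=1)
sim_c = np.mean((np.diff(bdays, axis=1) == 0).any(axis=1))
# (d) at least one defective among 20 items, each defective with probability 0.05
sim_d = np.mean((rng.random((n_sim, 20)) < 0.05).any(axis=1))

# Complement rule: P(at least one) = 1 - P(none)
exact_a = 1 - (5 / 6) ** 4
exact_b = 1 - 0.5 ** 3
exact_d = 1 - 0.95 ** 20
print(f"(a) {sim_a:.4f} vs {exact_a:.4f}")
print(f"(b) {sim_b:.4f} vs {exact_b:.4f}")
print(f"(c) {sim_c:.4f}")
print(f"(d) {sim_d:.4f} vs {exact_d:.4f}")
```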

Exercise 20.16 — Random walk simulation

A "random walk" starts at position 0. At each step, you move +1 (with probability 0.5) or -1 (with probability 0.5). Simulate 100 random walks of 1000 steps each. Plot all 100 walks on the same graph (use low alpha for transparency). What do you notice about where the walks end up? What is the expected final position? What is the standard deviation of final positions?

Guidance

```python
import numpy as np

n_walks = 100
n_steps = 1000
# Row i is walk i: the running sum of random +1/-1 steps
walks = np.cumsum(np.random.choice([-1, 1], size=(n_walks, n_steps)), axis=1)
```

Expected final position is 0 (symmetric). Standard deviation of final positions is approximately sqrt(n_steps) = sqrt(1000) ≈ 31.6. The walks spread out over time — this is diffusion.
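The diffusion claim can be checked directly: after t steps the spread of positions across walks should be close to sqrt(t). A sketch that uses 2,000 walks (more than the exercise's 100) so the spread estimate is stable.

```python
import numpy as np

rng = np.random.default_rng(5)
n_walks, n_steps = 2000, 1000
walks = np.cumsum(rng.choice([-1, 1], size=(n_walks, n_steps)), axis=1)

# Standard deviation of positions across walks after t steps, versus sqrt(t)
for t in (10, 100, 1000):
    spread = walks[:, t - 1].std()
    print(f"t={t}: sd={spread:.2f}, sqrt(t)={np.sqrt(t):.2f}")
```

For the plot, `plt.plot(walks[:100].T, alpha=0.1)` draws one line per walk.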

Exercise 20.17 — Monte Carlo estimation of pi

Use simulation to estimate pi. The method: generate random (x, y) points in a 1x1 square. Count how many fall inside a quarter-circle of radius 1. The ratio (inside/total) approximates pi/4.

(a) Implement this simulation. (b) Run it with 100, 1,000, 10,000, and 1,000,000 points. (c) Plot how the estimate converges to pi as the number of points increases. (d) How does this relate to the law of large numbers?

Guidance

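```python
import numpy as np

n = 1000000
# Uniform random points in the unit square
x = np.random.random(n)
y = np.random.random(n)
# Points with x^2 + y^2 <= 1 fall inside the quarter circle of radius 1
inside = np.sum(x**2 + y**2 <= 1)
pi_estimate = 4 * inside / n
```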

This converges to pi by the law of large numbers — the proportion of points inside the circle converges to the area ratio.
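For parts (b) and (c), wrapping the estimate in a function makes it easy to compare sample sizes. A sketch; `estimate_pi` is a hypothetical name.

```python
import numpy as np

def estimate_pi(n, rng):
    """Monte Carlo estimate of pi from n uniform points in the unit square."""
    x = rng.random(n)
    y = rng.random(n)
    inside = np.count_nonzero(x**2 + y**2 <= 1)  # points inside the quarter circle
    return 4 * inside / n

rng = np.random.default_rng(2024)
for n in (100, 1_000, 10_000, 1_000_000):
    est = estimate_pi(n, rng)
    print(f"n={n:>9,}: estimate={est:.5f}, error={abs(est - np.pi):.5f}")
```

The typical error shrinks roughly like 1/sqrt(n), so each extra digit of accuracy costs about 100 times more points.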

Exercise 20.18 — Simulating rare events

How many times do you need to roll a die to get three 6s in a row? Simulate 10,000 trials and plot the distribution of "number of rolls needed." What is the median? The mean? Why are these different?

Guidance

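The expected number of rolls is exactly 258 = 6 + 6^2 + 6^3. It is not simply 216, the reciprocal of P(three 6s in a row) = (1/6)^3 = 1/216, because overlapping three-roll windows are not independent trials: a non-6 partway through a streak forces you to start over. The distribution is right-skewed (a few trials take a very long time), so the median is less than the mean.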
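A sketch of the simulation using only the standard library: track the current run of consecutive 6s and reset it whenever another face appears. A histogram of `results` shows the long right tail.

```python
import random
import statistics

def rolls_until_three_sixes(rng):
    """Roll a fair die until three 6s appear consecutively; return the number of rolls."""
    rolls = 0
    streak = 0
    while streak < 3:
        rolls += 1
        # A 6 comes up with probability 1/6; anything else resets the streak
        streak = streak + 1 if rng.random() < 1 / 6 else 0
    return rolls

rng = random.Random(20)
results = [rolls_until_three_sixes(rng) for _ in range(10_000)]
print("mean:", statistics.mean(results), "median:", statistics.median(results))
```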

Part C: Synthesis and Real-World Application (Advanced)

Exercise 20.19 — Elena's sampling strategy

Elena needs to select a sample of countries for detailed analysis. She has 200 countries in her dataset, grouped into 4 income categories. She can only afford to analyze 40 countries in detail. Compare three sampling strategies by simulation: (a) Simple random sample of 40 countries (b) Stratified sample: 10 from each income group (c) Proportional stratified sample: number from each group proportional to group size

For each strategy, simulate 1000 samples and compute the mean vaccination rate. Which strategy gives the least variable estimates? Which best represents the population?
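A sketch of the comparison on a hypothetical population, since Elena's actual data is not shown here: group sizes of 30, 50, 60, and 60 countries with mean rates of 95, 85, 75, and 60 are invented for illustration. Proportional allocation then takes 6, 10, 12, and 12 countries.

```python
import numpy as np

rng = np.random.default_rng(8)
# Hypothetical population (assumption): 200 countries in 4 income groups
group_sizes = np.array([30, 50, 60, 60])
group_means = np.array([95, 85, 75, 60])
groups = np.repeat(np.arange(4), group_sizes)
rates = np.clip(rng.normal(group_means[groups], 5), 0, 100)
pop_mean = rates.mean()

def srs():
    return rng.choice(rates, size=40, replace=False).mean()

def stratified(per_group):
    # Unweighted mean of a sample drawn separately within each income group
    picks = [rng.choice(rates[groups == g], size=k, replace=False)
             for g, k in enumerate(per_group)]
    return np.concatenate(picks).mean()

equal = [10, 10, 10, 10]
proportional = list(40 * group_sizes // 200)  # [6, 10, 12, 12]

est = {
    "simple random": np.array([srs() for _ in range(1000)]),
    "stratified 10 each": np.array([stratified(equal) for _ in range(1000)]),
    "proportional": np.array([stratified(proportional) for _ in range(1000)]),
}
print(f"population mean: {pop_mean:.2f}")
for name, e in est.items():
    print(f"{name:>20}: mean={e.mean():.2f}, sd={e.std():.2f}")
```

The unweighted mean of the equal-allocation sample over-represents the small first group; weighting each group's sample mean by its population share removes that bias.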

Exercise 20.20 — The false positive paradox in real life

Research a real-world example of the false positive paradox (base rate neglect). Some possibilities: airport security screening, drug testing in sports, COVID-19 testing in low-prevalence populations, or mammography screening for breast cancer. Write a 200-300 word analysis that includes the relevant probabilities and a Bayes' theorem calculation.

Exercise 20.21 — Probability in risk communication

The same probability can be communicated in different ways that affect how people perceive it. Express the following risk in three different ways and discuss which communication is most helpful:

"If 1,000 women aged 50-60 are screened for breast cancer, about 80 will have a positive mammogram. Of those 80, about 8 will actually have cancer. Of the 920 who test negative, about 2 will have cancer that was missed."

Compute: sensitivity, specificity, false positive rate, positive predictive value, and negative predictive value. Which numbers matter most for a patient deciding whether to get screened?
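The five quantities all come from the 2x2 table implied by the scenario: 8 true positives, 72 false positives, 2 false negatives, and 918 true negatives. A sketch of the arithmetic:

```python
# Counts implied by the 1,000-women scenario
true_pos = 8            # positive mammogram, has cancer
false_pos = 80 - 8      # positive mammogram, no cancer (72)
false_neg = 2           # negative mammogram, has cancer
true_neg = 920 - 2      # negative mammogram, no cancer (918)

sensitivity = true_pos / (true_pos + false_neg)           # P(+ | cancer)
specificity = true_neg / (true_neg + false_pos)           # P(- | no cancer)
false_positive_rate = false_pos / (false_pos + true_neg)  # 1 - specificity
ppv = true_pos / (true_pos + false_pos)                   # P(cancer | +)
npv = true_neg / (true_neg + false_neg)                   # P(no cancer | -)

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("false positive rate", false_positive_rate),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name:>20}: {value:.3f}")
```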

Exercise 20.22 — Simulating a real-world process

Marcus wants to know: if he makes 50 pastries each morning and each one has a 3% chance of being unsatisfactory (too flat, burnt edge, etc.), what's the probability that he has to throw away more than 5 in a day? Simulate this scenario 10,000 times. Also compute the expected number of rejects and the standard deviation.

Guidance

Each pastry is a Bernoulli trial with p=0.03. The number of rejects in 50 pastries follows a binomial distribution (which we'll formalize in [Chapter 21](../chapter-21-distributions-normal-curve/index.md)). Expected rejects: 50 * 0.03 = 1.5. Standard deviation: sqrt(50 * 0.03 * 0.97) ≈ 1.21. P(>5) should be quite small, well under 1%.
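A sketch that simulates each pastry individually and compares the result with the exact binomial tail, computed with `math.comb`. It assumes pastries fail independently of one another.

```python
import math
import numpy as np

n_pastries, p_reject = 50, 0.03
rng = np.random.default_rng(99)
# 10,000 simulated mornings: each pastry is unsatisfactory with probability 0.03
rejects = (rng.random((10_000, n_pastries)) < p_reject).sum(axis=1)

sim_tail = np.mean(rejects > 5)
# Exact binomial tail: P(X > 5) = 1 - sum over k = 0..5 of C(50, k) p^k (1 - p)^(50 - k)
exact_tail = 1 - sum(math.comb(n_pastries, k) * p_reject**k * (1 - p_reject)**(n_pastries - k)
                     for k in range(6))

print("mean rejects:", rejects.mean(), "theory:", n_pastries * p_reject)
print("sd of rejects:", rejects.std(), "theory:", math.sqrt(n_pastries * p_reject * (1 - p_reject)))
print("P(more than 5):", sim_tail, "exact:", exact_tail)
```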

Part D: Extension Problems (Challenge)

Exercise 20.23 — Bayesian updating sequence

You have a coin that might be fair (P(heads) = 0.5) or biased (P(heads) = 0.8). Initially, you think there's a 50% chance it's fair. You flip the coin 10 times and get 8 heads. Use Bayes' theorem to update your belief after EACH flip. Plot your evolving belief P(fair) vs. flip number. How quickly do you become confident about which coin you have?
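The exercise gives only the totals (8 heads, 2 tails), so the flip order below is hypothetical. The final belief does not depend on the order, only the path does. Each step applies Bayes' theorem with the previous posterior as the new prior.

```python
# Hypothetical flip order with 8 heads and 2 tails
flips = "HHTHHHHTHH"
p_heads = {"fair": 0.5, "biased": 0.8}

p_fair = 0.5  # prior belief that the coin is fair
history = [p_fair]
for flip in flips:
    like_fair = p_heads["fair"] if flip == "H" else 1 - p_heads["fair"]
    like_biased = p_heads["biased"] if flip == "H" else 1 - p_heads["biased"]
    # Bayes' theorem: posterior is proportional to likelihood times prior
    numerator = like_fair * p_fair
    p_fair = numerator / (numerator + like_biased * (1 - p_fair))
    history.append(p_fair)

print([round(x, 3) for x in history])
```

Plotting `history` against flip number (0 to 10) gives the requested curve.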

Exercise 20.24 — The coupon collector problem

A cereal company puts one of 6 different toys in each box. You want to collect all 6. How many boxes do you need to buy, on average? Simulate this 100,000 times and plot the distribution of "boxes needed." Compare with the theoretical answer: E = 6 * (1/6 + 1/5 + 1/4 + 1/3 + 1/2 + 1/1) = 14.7.
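A sketch of the simulation, assuming each box is equally likely to contain any of the 6 toys; `boxes_to_collect_all` is a hypothetical name.

```python
import random
import statistics

def boxes_to_collect_all(n_toys, rng):
    """Buy boxes until all n_toys distinct toys are collected; return the count."""
    seen = set()
    boxes = 0
    while len(seen) < n_toys:
        seen.add(rng.randrange(n_toys))  # each box holds a uniformly random toy
        boxes += 1
    return boxes

rng = random.Random(6)
counts = [boxes_to_collect_all(6, rng) for _ in range(100_000)]
theory = 6 * sum(1 / k for k in range(1, 7))
print("simulated mean:", statistics.mean(counts), "theory:", theory)
```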

Exercise 20.25 — Two-envelope problem

You're given two envelopes. One contains twice as much money as the other. You pick one and find $100. Should you switch? Simulate both strategies (always stay, always switch) over 100,000 trials. What do you find? This is a famous probability puzzle — research the paradox and explain why the naive argument for switching is wrong.
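Simulating requires choosing how the smaller amount is generated, which the puzzle leaves open. The sketch below assumes, purely for illustration, that the smaller amount is a whole number of dollars drawn uniformly from 1 to 100, and compares the two strategies over all trials.

```python
import numpy as np

rng = np.random.default_rng(12)
n_trials = 100_000
# Assumption: the smaller amount is uniform on $1..$100
smaller = rng.integers(1, 101, size=n_trials)
pairs = np.stack([smaller, 2 * smaller], axis=1)
pick = rng.integers(0, 2, size=n_trials)  # choose one envelope at random

rows = np.arange(n_trials)
stay = pairs[rows, pick]
switch = pairs[rows, 1 - pick]
print("always stay:", stay.mean(), "always switch:", switch.mean())
```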

Exercise 20.26 — Simulating the St. Petersburg paradox

A game works like this: flip a coin until it comes up tails. If the first tails appears on flip n, you win $2^n. The expected payout is infinite: outcome n has probability (1/2)^n and pays $2^n, so each outcome contributes $1 and the sum is $1 + $1 + $1 + ... . Simulate 10,000 games. What's the average payout? Would you pay $100 to play this game? $1,000? Why does the simulation give a finite average even though the expected value is infinite?
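The flip on which the first tails appears is geometric with p = 0.5, which NumPy's `Generator.geometric` samples directly (support 1, 2, 3, ...). A sketch:

```python
import numpy as np

rng = np.random.default_rng(1713)
n_games = 10_000
# Flip number of the first tails: geometric with p = 0.5
first_tails = rng.geometric(0.5, size=n_games)
payouts = 2.0 ** first_tails

print("average payout:", payouts.mean())
print("largest payout:", payouts.max())
```

Rerunning with different seeds shows the average jumping around whenever a rare long run appears; with the game as stated, the sample average tends to grow only roughly like log2 of the number of games played.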

Exercise 20.27 — Simpson's Paradox revisited with probability

Create a simulation demonstrating Simpson's Paradox. Two hospitals treat patients. Hospital A has a higher overall survival rate, but Hospital B has a higher survival rate for both mild AND severe cases. Simulate this by adjusting the proportion of severe cases at each hospital. Show both the aggregated and disaggregated probabilities.
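One set of hypothetical parameters that produces the paradox: Hospital A treats mostly mild cases (90%), Hospital B mostly severe ones (80%), and B has the higher survival rate within each severity level. All numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000  # patients per hospital
# Hypothetical parameters: share of mild cases and survival rate by severity
hospitals = {
    "A": {"p_mild": 0.9, "survive": {"mild": 0.90, "severe": 0.50}},
    "B": {"p_mild": 0.2, "survive": {"mild": 0.95, "severe": 0.60}},
}

for name, h in hospitals.items():
    mild = rng.random(n) < h["p_mild"]
    p_survive = np.where(mild, h["survive"]["mild"], h["survive"]["severe"])
    survived = rng.random(n) < p_survive
    h["overall"] = survived.mean()
    h["mild_rate"] = survived[mild].mean()
    h["severe_rate"] = survived[~mild].mean()
    print(f"{name}: overall={h['overall']:.3f}, "
          f"mild={h['mild_rate']:.3f}, severe={h['severe_rate']:.3f}")
```

The aggregated rates (about 0.86 for A and 0.67 for B in expectation) reverse the disaggregated comparison because the case mix differs so much between hospitals.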

Exercise 20.28 — Probability calibration

Make 20 predictions about events you're uncertain about (e.g., "Will it rain tomorrow?", "Will my team win?", "Will I finish this assignment today?"). For each, assign a probability. Then track the outcomes. If you said "80% likely" for 10 things, about 8 should happen. Plot your predicted probabilities vs. actual hit rates. Are you well-calibrated? Most people are overconfident.
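Tracking calibration boils down to grouping predictions by the probability you stated and computing the hit rate in each group. A minimal sketch; `calibration_table` is a hypothetical helper, and the example log is made up to show the format, not real data. With varied probabilities, round them (for example to the nearest 0.1) before grouping.

```python
from collections import defaultdict

def calibration_table(predicted, happened):
    """Group predictions by stated probability and return the observed hit rate for each."""
    groups = defaultdict(list)
    for p, outcome in zip(predicted, happened):
        groups[p].append(outcome)
    return {p: sum(v) / len(v) for p, v in sorted(groups.items())}

# Hypothetical prediction log (1 = happened, 0 = did not)
predicted = [0.8, 0.8, 0.8, 0.8, 0.8, 0.5, 0.5]
happened = [1, 1, 1, 1, 0, 1, 0]
print(calibration_table(predicted, happened))
```

Plotting the keys against the values, with the diagonal y = x for reference, gives the calibration plot; points below the diagonal indicate overconfidence.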

Exercise 20.29 — Benford's Law simulation

Benford's Law states that in many real-world datasets, the leading digit is "1" about 30% of the time, not the 11% (1/9) you'd expect if all nine leading digits were equally likely. More precisely, it predicts P(leading digit = d) = log10(1 + 1/d), so P(1) = log10(2) ≈ 0.301. Verify this using (a) simulated exponentially-growing quantities, and (b) a real dataset (population sizes, financial data, etc.). This law is actually used to detect fraud in financial data.
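For part (a), a sketch using a single quantity that grows 7% per step (the growth rate is an arbitrary choice). The leading digit is read from the fractional part of log10 of each value, which avoids overflowing floats as the values become astronomically large.

```python
import numpy as np

# Leading digits of 1.07**t for t = 0..19,999, computed in log space
steps = np.arange(20_000)
log_values = steps * np.log10(1.07)
mantissa = 10 ** (log_values % 1)        # value rescaled into [1, 10)
leading = np.floor(mantissa).astype(int)

observed = np.array([np.mean(leading == d) for d in range(1, 10)])
benford = np.log10(1 + 1 / np.arange(1, 10))  # P(leading digit = d)
for d in range(1, 10):
    print(d, round(observed[d - 1], 3), round(benford[d - 1], 3))
```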

Exercise 20.30 — Your own Monte Carlo

Identify a real-world probability question that interests you — something that would be hard to compute analytically but easy to simulate. Write a Monte Carlo simulation to estimate the answer. Present your question, simulation code, results, and interpretation in a short report (300-500 words plus code).

Reflection

After completing these exercises, you should be comfortable with:

[ ] Simulating random experiments in Python using np.random
[ ] Computing probabilities using complement, addition, and multiplication rules
[ ] Distinguishing independent from dependent events
[ ] Applying Bayes' theorem and explaining why base rates matter
[ ] Demonstrating the law of large numbers through simulation
[ ] Computing and interpreting expected value
[ ] Recognizing the gambler's fallacy, base rate neglect, and other probability traps

If these feel solid, you're ready for Chapter 21: Distributions and the Normal Curve.