Chapter 20 Quiz: Probability Thinking
Instructions: Answer all questions before checking solutions. For multiple choice, select the best answer. For short answer, aim for 2-4 clear sentences. Total points: 95.
Section 1: Multiple Choice (10 questions, 4 points each)
Question 1. A fair coin has been flipped 10 times and landed heads every time. What is the probability of heads on the 11th flip?
- (A) Less than 0.5 — tails is "due"
- (B) Exactly 0.5 — each flip is independent
- (C) Greater than 0.5 — the coin seems biased toward heads
- (D) It depends on whether the coin is truly fair
Answer
**Correct: (B)** If the coin is fair, each flip is independent. The coin has no memory. P(heads) = 0.5 regardless of previous outcomes. The belief that tails is "due" is the gambler's fallacy. Now, in practice, 10 heads in a row might make you question whether the coin IS fair — that's a different (and valid) question. But if the coin is fair, the answer is 0.5.

Question 2. A disease affects 1 in 500 people. A test for the disease has 98% sensitivity (true positive rate) and 3% false positive rate. If you test positive, what is the approximate probability you have the disease?
- (A) About 98%
- (B) About 6%
- (C) About 50%
- (D) About 3%
Answer
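Bayes-style arithmetic is easy to get wrong, so it helps to simulate the scenario directly. A minimal sketch (assuming NumPy, as used elsewhere in the chapter; seed and population size are arbitrary):

```python
import numpy as np

# Simulate a large population: 1-in-500 prevalence,
# 98% sensitivity, 3% false positive rate.
rng = np.random.default_rng(0)
n = 1_000_000
has_disease = rng.random(n) < 1 / 500
test_positive = np.where(has_disease,
                         rng.random(n) < 0.98,   # true positives
                         rng.random(n) < 0.03)   # false positives
# Among positive tests, what fraction actually has the disease?
p = has_disease[test_positive].mean()
print(f"P(disease | positive) is approximately {p:.3f}")
```

The printed value should land near the analytical answer of about 0.06.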
**Correct: (B)** Using Bayes' theorem: P(disease) = 1/500 = 0.002. P(positive | disease) = 0.98. P(positive | no disease) = 0.03. P(positive) = 0.98 * 0.002 + 0.03 * 0.998 = 0.00196 + 0.02994 = 0.0319. P(disease | positive) = 0.00196 / 0.0319 ≈ 0.061, or about 6%. The low base rate means most positive tests are false positives.

Question 3. What does the law of large numbers guarantee?
- (A) That if you flip a coin 100 times, you'll get exactly 50 heads
- (B) That short streaks of heads will be balanced by streaks of tails
- (C) That the proportion of heads approaches 0.5 as the number of flips increases
- (D) That the absolute number of heads will always equal the number of tails
Answer
**Correct: (C)** The law of large numbers says that the *proportion* (not the count) converges to the theoretical probability. It does NOT guarantee exact balance (A), short-run compensation (B), or equal counts (D). In fact, the absolute difference between heads and tails can grow — it's the proportion that converges to 0.5.

Question 4. Which of the following pairs of events are independent?
- (A) Drawing a king from a deck, then drawing another king without replacement
- (B) A student studying hard, and that student passing the exam
- (C) Rolling a 3 on one die, and rolling a 5 on a different die
- (D) The temperature being high today, and ice cream sales being high today
Answer
**Correct: (C)** Two events are independent if knowing one occurred doesn't change the probability of the other. Rolls of two separate dice are independent. (A) is NOT independent — removing a king changes the deck. (B) is NOT independent — studying affects passing probability. (D) is NOT independent — high temperature causes more ice cream sales.

Question 5. In the Monty Hall problem (3 doors, 1 car, 2 goats), after the host reveals a goat, you should:
- (A) Stay with your original door — 50/50 chance either way
- (B) Switch — it increases your probability of winning to 2/3
- (C) It doesn't matter — the probability is 1/3 regardless
- (D) Switch — it increases your probability of winning to 3/4
Answer
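This result is easy to verify by simulation, using the fact that switching wins exactly when the initial pick was wrong. A sketch (assuming NumPy; seed arbitrary):

```python
import numpy as np

# Monte Carlo check of the Monty Hall strategies.
rng = np.random.default_rng(1)
n = 100_000
car = rng.integers(0, 3, size=n)    # door hiding the car
pick = rng.integers(0, 3, size=n)   # contestant's initial pick
# Staying wins exactly when the first pick was right;
# switching wins exactly when it was wrong.
stay_wins = (pick == car).mean()
switch_wins = (pick != car).mean()
print(f"stay wins ~ {stay_wins:.3f}, switch wins ~ {switch_wins:.3f}")
```

The two rates should come out near 1/3 and 2/3 respectively.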
**Correct: (B)** Your original pick has a 1/3 chance of being correct. That probability doesn't change when the host opens a door, because the host can ALWAYS open a goat door, no matter what you picked. The remaining door therefore has a 2/3 chance. Switching doubles your probability of winning from 1/3 to 2/3. This is counterintuitive but confirmed by both mathematical proof and simulation.

Question 6. P(A and B) = P(A) * P(B) is true when:
- (A) A and B are mutually exclusive
- (B) A and B are independent
- (C) A and B are complementary
- (D) Always — this is the definition of P(A and B)
Answer
**Correct: (B)** The multiplication rule P(A and B) = P(A) * P(B) only applies when A and B are independent. When they're not independent, the correct formula is P(A and B) = P(A) * P(B|A). If A and B are mutually exclusive (A), P(A and B) = 0, which does NOT equal P(A) * P(B) (unless one of them has probability 0).

Question 7. What is the sample space for rolling two dice and recording the sum?
- (A) {1, 2, 3, 4, 5, 6}
- (B) {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
- (C) {(1,1), (1,2), ..., (6,6)} — all 36 ordered pairs
- (D) Either (B) or (C) depending on the context
Answer
**Correct: (D)** If we define the experiment as "recording the sum," then (B) is the sample space — the set of possible sums. If we define it as "rolling two dice," then (C) is the sample space — all 36 outcomes. The distinction matters because the outcomes in (B) are NOT equally likely (7 is more likely than 2), while the outcomes in (C) ARE equally likely. Either answer can be correct depending on how you frame the experiment.

Question 8. You want to estimate the probability of a rare event that happens about 0.1% of the time. How many simulations should you run for a reliable estimate?
- (A) About 100
- (B) About 1,000
- (C) About 100,000 or more
- (D) Simulation can't estimate rare event probabilities
Answer
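To see the effect of the simulation count concretely, one can redo the whole estimation many times at each size and watch how much the estimates wobble. A sketch (assuming NumPy; seed and repetition count arbitrary):

```python
import numpy as np

# How estimates of a rare probability (p = 0.001) vary with the
# number of simulations.
rng = np.random.default_rng(2)
p = 0.001
spread = {}
for n_sims in (100, 1_000, 100_000):
    # Repeat the whole estimation 200 times to measure its variability.
    estimates = [float((rng.random(n_sims) < p).mean()) for _ in range(200)]
    spread[n_sims] = max(estimates) - min(estimates)
    print(f"n_sims={n_sims:>7}: estimates span {min(estimates):.5f} "
          f"to {max(estimates):.5f}")
```

At 100 simulations most estimates are exactly 0; only at 100,000 do they cluster tightly around 0.001.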
**Correct: (C)** To reliably estimate a probability of 0.001, you need enough simulations that you expect to see the event many times. With 100 simulations, you'd expect 0.1 occurrences — not enough. With 1,000, you'd expect 1 occurrence — barely any. With 100,000, you'd expect 100 occurrences, which is enough for a reasonable estimate. Rule of thumb: you need at least 1/p simulations, and ideally 10/p or more, where p is the probability you're estimating.

Question 9. The expected value of rolling a fair die is 3.5. This means:
- (A) You're most likely to roll a 3 or 4
- (B) If you roll the die many times, the average of all rolls will approach 3.5
- (C) You will eventually roll a 3.5
- (D) Half the time you'll roll above 3.5 and half below
Answer
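A quick simulation illustrates the long-run-average interpretation. A sketch (assuming NumPy; seed arbitrary):

```python
import numpy as np

# The running average of die rolls approaches the expected value 3.5.
rng = np.random.default_rng(3)
rolls = rng.integers(1, 7, size=100_000)
for n in (10, 1_000, 100_000):
    print(f"mean of first {n:>7,} rolls: {rolls[:n].mean():.3f}")
```

Each printed mean gets closer to 3.5, even though no individual roll is ever 3.5.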
**Correct: (B)** The expected value is the long-run average — the value the mean of many rolls converges to. You can never actually roll 3.5 (C). It's not about what you're "most likely" to roll (A) — all outcomes are equally likely. And (D) is about the median, not the expected value (and it's actually not quite right either, since 3.5 isn't a possible roll).

Question 10. Which approach to estimating probability involves generating thousands of random outcomes using a computer?
- (A) Classical probability
- (B) Bayesian inference
- (C) Monte Carlo simulation
- (D) Frequentist hypothesis testing
Answer
**Correct: (C)** Monte Carlo simulation estimates probabilities by running random experiments many times on a computer and counting outcomes. It's named after the Monte Carlo casino. It's especially useful when analytical formulas are difficult or impossible to derive. We used this approach throughout Chapter 20 to estimate probabilities before introducing formulas.

Section 2: True/False (3 questions, 5 points each)
Question 11. True or False: If P(A) = 0.3 and P(B) = 0.4, then P(A or B) must equal 0.7.
Answer
**False.** P(A or B) = P(A) + P(B) - P(A and B). This equals 0.7 only if A and B are mutually exclusive (P(A and B) = 0). If they can overlap, P(A or B) < 0.7. For example, if P(A and B) = 0.1, then P(A or B) = 0.3 + 0.4 - 0.1 = 0.6.

Question 12. True or False: P(A|B) is always equal to P(B|A).
Answer
**False.** P(A|B) and P(B|A) are generally different. Confusing them is called the "confusion of the inverse" or "transposing the conditional." For example: P(has fever | has flu) is high, but P(has flu | has fever) is much lower (many conditions cause fever). Bayes' theorem relates the two, but they are not equal.

Question 13. True or False: The more simulations you run in a Monte Carlo estimation, the more accurate the estimate becomes.
Answer
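The 1/sqrt(n) behavior can be observed empirically by repeating an estimation at two different simulation counts. A sketch (assuming NumPy; seed and repetition count arbitrary; true probability here is 0.5):

```python
import numpy as np

# Spread of Monte Carlo estimates at two simulation counts.
rng = np.random.default_rng(4)
std = {}
for n in (100, 10_000):
    # 500 independent estimates, each based on n simulated coin flips.
    estimates = (rng.random((500, n)) < 0.5).mean(axis=1)
    std[n] = float(estimates.std())
    print(f"n={n:>6}: std of estimates = {std[n]:.4f}")
# 100x more simulations gives roughly 10x smaller std (sqrt(100) = 10).
```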
**True.** This is a consequence of the law of large numbers. As the number of simulations increases, the proportion of successes converges to the true probability. The accuracy improves as well: the standard error of the estimate shrinks in proportion to 1/sqrt(n), where n is the number of simulations.

Section 3: Short Answer (3 questions, 5 points each)
Question 14. Explain in your own words why Bayes' theorem is important for data science. Give one practical application.
Answer
Bayes' theorem provides a formal framework for updating beliefs in light of new evidence. It combines prior knowledge (what you believed before seeing data) with new observations to produce a posterior belief. This is fundamental to data science because we're constantly using data to revise our understanding of the world. Practical applications include spam filtering (updating the probability that an email is spam based on the words it contains), medical diagnosis (updating disease probability based on test results), and recommendation systems (updating user preference estimates based on behavior).

Question 15. Explain the birthday problem result (50% chance of a shared birthday with just 23 people) to someone who finds it surprising. Why is the answer so much lower than most people guess?
Answer
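The 50% figure can be confirmed by direct simulation. A sketch (assuming NumPy; seed arbitrary):

```python
import numpy as np

# Simulate 100,000 rooms of 23 people with uniformly random birthdays.
rng = np.random.default_rng(5)
n_trials = 100_000
bdays = np.sort(rng.integers(0, 365, size=(n_trials, 23)), axis=1)
# After sorting, a room has a shared birthday exactly when two
# adjacent entries are equal.
shared = (bdays[:, 1:] == bdays[:, :-1]).any(axis=1)
print(f"P(shared birthday among 23) ~ {shared.mean():.3f}")
```

The estimate should land near the analytical value of about 0.507.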
The surprise comes from confusing two different questions. Most people think "What's the chance someone shares MY birthday?" — and that requires about 253 other people for a 50% chance. But the actual question is "What's the chance ANY two people share A birthday?" With 23 people, there are 23*22/2 = 253 different pairs, and each pair has a 1/365 chance of matching. Those small individual probabilities accumulate across many pairs. People underestimate how rapidly "at least one match among many pairs" probabilities grow.

Question 16. What is the difference between the gambler's fallacy and the law of large numbers? Someone might confuse them — explain how they're related but different.
Answer
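The dilution-versus-compensation distinction can be made concrete by tracking both the head/tail count gap and the proportion in one long run of simulated flips. A sketch (assuming NumPy; seed arbitrary):

```python
import numpy as np

# Dilution, not compensation: the count gap can grow while the
# proportion still converges to 0.5.
rng = np.random.default_rng(6)
flips = rng.integers(0, 2, size=1_000_000)
heads = np.cumsum(flips)
for k in (100, 10_000, 1_000_000):
    gap = abs(2 * int(heads[k - 1]) - k)   # |heads - tails| after k flips
    prop = heads[k - 1] / k
    print(f"after {k:>9,} flips: |heads - tails| = {gap:>5}, "
          f"proportion = {prop:.4f}")
```

Typically the absolute gap is larger at a million flips than at a hundred, yet the proportion sits much closer to 0.5.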
The law of large numbers says that the *proportion* of heads converges to 0.5 over many flips. The gambler's fallacy incorrectly concludes that the *next flip* must be influenced by previous flips to "correct" any imbalance. The law of large numbers works through DILUTION (new flips make old streaks a smaller proportion of the total), not through COMPENSATION (future flips counteracting past ones). The coin doesn't know what happened before. The proportion evens out not because the coin "tries to balance" but because any streak becomes statistically insignificant as the total number of flips grows.

Section 4: Applied Scenarios (2 questions, 7.5 points each)
Question 17. Jordan is checking whether their university's grade distributions show randomness or systematic patterns. They pick 5 random courses from the catalog. The probability that any given course has an "easy" grade distribution (mean GPA above 3.5) is 0.20.
(a) What's the probability that all 5 courses are "easy"? (b) What's the probability that none of the 5 are "easy"? (c) What's the probability that at least one is "easy"? (d) Are these courses independent? Under what circumstances might they NOT be independent?
Answer
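The arithmetic here is short enough to check in a few lines of Python (independence assumed, as the question intends):

```python
# Check parts (a)-(c): five courses, each "easy" with probability 0.20.
p = 0.20
p_all_easy = p ** 5
p_none_easy = (1 - p) ** 5
p_at_least_one = 1 - p_none_easy
print(f"(a) all five easy:     {p_all_easy:.5f}")
print(f"(b) none easy:         {p_none_easy:.4f}")
print(f"(c) at least one easy: {p_at_least_one:.4f}")
```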
Assuming independence:
- (a) P(all 5 easy) = 0.20^5 = 0.00032 (very unlikely).
- (b) P(none easy) = 0.80^5 = 0.3277 (about 33%).
- (c) P(at least one easy) = 1 - P(none) = 1 - 0.3277 = 0.6723 (about 67%).
- (d) They might NOT be independent if: (1) all courses are in the same department (departments might have grading cultures), (2) the student selected courses known for easy grading (selection bias), or (3) the courses share the same instructor.

Question 18. Elena wants to estimate the mean vaccination rate for a region with 500 counties. She can only survey 50 counties. She takes a random sample and gets a sample mean of 74.2%.
She repeats this process: takes another random sample of 50 and gets 71.8%. Another: 76.1%. Another: 73.5%.
(a) Why does she get a different answer each time? (b) If she took 1,000 such samples and plotted the distribution of sample means, what would the shape look like? (c) If she increased her sample size to 200 counties, would the sample means be more or less variable? Explain. (d) What concept from this chapter explains why larger samples give more reliable estimates?
Answer
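The scenario can be simulated end to end. Note that the population below is invented purely for illustration, since the question gives no actual data. A sketch assuming NumPy:

```python
import numpy as np

# A hypothetical population of 500 vaccination rates (percent).
rng = np.random.default_rng(7)
population = rng.uniform(40, 100, size=500)
spread = {}
for sample_size in (50, 200):
    # 1,000 repeated surveys of the given sample size, without replacement.
    means = [float(rng.choice(population, size=sample_size,
                              replace=False).mean())
             for _ in range(1_000)]
    spread[sample_size] = float(np.std(means))
    print(f"sample size {sample_size:>3}: std of sample means = "
          f"{spread[sample_size]:.2f}")
```

The sample means vary less at the larger sample size, matching parts (a), (c), and (d).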
- (a) **Sampling variability** — each random sample selects different counties, producing different results. This is the fundamental randomness of sampling.
- (b) The distribution would be approximately **bell-shaped** (normal), centered around the true population mean. This is a preview of the Central Limit Theorem (Chapter 21).
- (c) **Less variable** — the sample means would cluster more tightly around the true mean. The standard error decreases as sample size increases.
- (d) The **law of large numbers** explains this — larger samples produce averages that are closer to the population mean because the random fluctuations of individual observations average out.

Section 5: Code Analysis (2 questions, 5 points each)
Question 19. What does this code compute, and what is the approximate output?
```python
import numpy as np

np.random.seed(42)
count = 0
n_sims = 100000
for _ in range(n_sims):
    rolls = np.random.randint(1, 7, size=2)
    if sum(rolls) >= 10:
        count += 1
print(f"Result: {count/n_sims:.4f}")
```
Answer
This code estimates the probability that the sum of two dice is 10 or greater. The possible sums >= 10 are: 10 (3 ways: 4+6, 5+5, 6+4), 11 (2 ways: 5+6, 6+5), and 12 (1 way: 6+6). That's 6 out of 36 outcomes = 6/36 = 1/6 ≈ 0.1667. The output will be approximately `Result: 0.1667`.

Question 20. This code contains a conceptual error. What is wrong, and what would the correct code look like?
```python
import numpy as np

# Estimate P(drawing two aces from a deck)
n_sims = 100000
successes = 0
for _ in range(n_sims):
    card1 = np.random.randint(1, 53)  # Card 1-52
    card2 = np.random.randint(1, 53)  # Card 2
    # Aces are cards 1-4
    if card1 <= 4 and card2 <= 4:
        successes += 1
print(f"P(two aces): {successes/n_sims:.4f}")
```
Answer
The error is that the code draws card2 from the same full deck of 52 cards, meaning it's possible to draw the same card twice. This simulates drawing WITH replacement, when the problem intends drawing WITHOUT replacement (you can't draw the same physical card twice). The probability with replacement is (4/52) * (4/52) = 16/2704 ≈ 0.0059; the correct probability without replacement is (4/52) * (3/51) = 12/2652 ≈ 0.0045. To fix the code so it draws without replacement:

```python
for _ in range(n_sims):
    deck = list(range(1, 53))
    card1 = np.random.choice(deck)
    deck.remove(card1)
    card2 = np.random.choice(deck)
    if card1 <= 4 and card2 <= 4:
        successes += 1
```

Or more efficiently:

```python
for _ in range(n_sims):
    cards = np.random.choice(range(1, 53), size=2, replace=False)
    if cards[0] <= 4 and cards[1] <= 4:
        successes += 1
```