Self-Assessment Quiz: Conditional Probability and Bayes' Theorem

Twenty questions to check your understanding. Answer every question before opening the key. Aim for at least 16 of 20 correct.


Question 1

The conditional probability $P(A \mid B)$ is defined as:

A) $P(A)\,P(B)$
B) $\dfrac{P(A \cap B)}{P(B)}$, requiring $P(B) > 0$
C) $\dfrac{P(B \cap A)}{P(A)}$, requiring $P(A) > 0$
D) $P(A) + P(B) - P(A \cap B)$

Question 2

"Conditioning on $B$" does what to the sample space?

A) enlarges it to include $B$
B) leaves it unchanged but rescales $A$
C) shrinks it to the outcomes in $B$ and renormalizes so probabilities sum to 1
D) removes $A$ from it

Question 3

From the corrected SaaS table ($P(A \cap C) = 0.10$, $P(A) = 0.40$, $P(C) = 0.25$), the value $P(A \mid C)$ equals:

A) $0.25$
B) $0.40$
C) $0.10$
D) $0.625$

Question 4

Two events $A$ and $B$ are independent exactly when:

A) $P(A \cap B) = 0$
B) $P(A \cup B) = P(A) + P(B)$
C) $P(A \cap B) = P(A)\,P(B)$
D) $P(A \mid B) = P(B \mid A)$

Question 5

Events $A$ and $B$ are mutually exclusive (disjoint) and each has probability $0.3$. They are:

A) independent, because disjoint events don't influence each other
B) dependent, because $P(A \cap B) = 0 \ne 0.09 = P(A)P(B)$
C) independent only if $P(A) = P(B)$
D) impossible to classify without more information

Question 6

In Bayes' theorem $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$, the factor $P(A)$ is called the:

A) likelihood
B) posterior
C) prior
D) evidence

Question 7

In the same formula, $P(B \mid A)$ is the:

A) prior
B) likelihood
C) posterior
D) normalizer

Question 8

The law of total probability writes the Bayes denominator $P(B)$ (two-case form) as:

A) $P(B \mid A) + P(B \mid \overline{A})$
B) $P(B \mid A)\,P(A) + P(B \mid \overline{A})\,P(\overline{A})$
C) $P(A)\,P(\overline{A})$
D) $1 - P(B \mid A)$

Question 9 (True/False, justify)

True or false: If $P(A \mid B) = 0.9$, then $P(B \mid A) = 0.9$. Justify in one sentence.

Question 10

A disease affects 1 in 1,000 people. A test has 99% sensitivity and a 1% false-positive rate. The probability that a person who tests positive actually has the disease is closest to:

A) 99%
B) 90%
C) 50%
D) 9%

Question 11

In the rare-disease computation, the reason the answer is so far below the test accuracy is that:

A) the test is actually broken
B) the huge healthy majority produces far more false positives than there are true positives
C) sensitivity and specificity were swapped
D) the base rate was ignored in defining sensitivity

Question 12 (True/False, justify)

True or false: A test that is "99% accurate" gives a 99%-reliable diagnosis regardless of how rare the condition is. Justify in one sentence.

Question 13

The "prosecutor's fallacy" is the courtroom version of confusing:

A) sensitivity with specificity
B) $P(\text{match} \mid \text{innocent})$ with $P(\text{innocent} \mid \text{match})$
C) independence with disjointness
D) a prior with a posterior of the same event

Question 14

The "naive" assumption in a naive Bayes classifier is that the features are:

A) mutually exclusive
B) equally likely
C) mutually independent given the class
D) independent of the class label

Question 15

When classifying with naive Bayes, we may ignore the denominator $P(w_1, \dots, w_n)$ because:

A) it is always equal to 1
B) it is the same positive constant for every class, so it doesn't change which class scores highest
C) the features are independent
D) it equals the prior

Question 16

A word that never appeared in spam during training has $P(w \mid S) = 0$. In the unsmoothed classifier, the effect on the spam score for any email containing that word is:

A) negligible
B) it slightly lowers the score
C) it sets the entire spam product to 0
D) it raises the ham score

Question 17

Real naive Bayes implementations sum log-probabilities instead of multiplying probabilities mainly to:

A) make the math exact
B) avoid floating-point underflow from multiplying many small numbers
C) change which class wins
D) eliminate the need for a prior

Question 18

A Las Vegas algorithm is:

A) always correct, with a random running time (fast in expectation)
B) bounded in time but occasionally wrong
C) always wrong with small probability
D) deterministic but slow

Question 19 (Short answer)

A one-sided Monte Carlo test errs with probability at most $\tfrac{1}{2}$ per run and is run $k$ times independently, answering "yes" if any run says yes. Give the bound on the probability that the combined test is wrong, and state the smallest $k$ that pushes it below $10^{-6}$.

Question 20 (Short answer)

In the randomized-quicksort analysis, the comparison indicators $X_{ij}$ are highly dependent. In one sentence, name the single tool that nonetheless lets us compute $E[X] = \sum_{i < j} E[X_{ij}]$.


Answer Key

| Q | Ans | Note |
|---|---|---|
| 1 | B | Definition of conditional probability; needs $P(B) > 0$. |
| 2 | C | Conditioning shrinks the sample space to $B$ and renormalizes. |
| 3 | B | $P(A \mid C) = 0.10 / 0.25 = 0.40$ (divide the corner cell by the column total). |
| 4 | C | Independence is the product rule $P(A \cap B) = P(A)P(B)$. |
| 5 | B | Disjoint-with-positive-probability $\Rightarrow$ dependent; the product test fails. |
| 6 | C | $P(A)$ is the prior. |
| 7 | B | $P(B \mid A)$ is the likelihood (how probable the evidence is if $A$ holds). |
| 8 | B | Sum over the partition $\{A, \overline{A}\}$, each term a multiplication-rule product. |
| 9 | False | $P(A\mid B)$ and $P(B\mid A)$ share the numerator $P(A\cap B)$ but divide by different denominators; equal only in special cases. |
| 10 | D | $\approx 0.0902$; per 100,000 people tested, false positives ($\sim$999) swamp true positives ($\sim$99). |
| 11 | B | Base-rate effect: false positives from the large healthy group dominate. |
| 12 | False | Accuracy is a likelihood $P(T\mid D)$; the diagnosis is the posterior $P(D\mid T)$, which collapses for rare conditions. |
| 13 | B | Swapping $P(\text{match}\mid\text{innocent})$ and $P(\text{innocent}\mid\text{match})$. |
| 14 | C | Features are assumed conditionally independent given the class. |
| 15 | B | The shared denominator is a positive constant across classes, so the $\arg\max$ is unaffected. |
| 16 | C | A single zero likelihood annihilates the entire product (the zero-frequency problem). |
| 17 | B | Summing logs prevents underflow; $\arg\max$ is unchanged because $\log$ is increasing. |
| 18 | A | Las Vegas: always correct, random (fast-in-expectation) running time. |
| 19 | | Error $\le (\tfrac12)^k = 2^{-k}$; smallest $k$ with $2^{-k} < 10^{-6}$ is $k = 20$ ($2^{-20} \approx 9.5\times10^{-7}$). |
| 20 | | Linearity of expectation: $E[\sum X_{ij}] = \sum E[X_{ij}]$ holds for any random variables, dependent or not. |
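
To check the arithmetic behind answers 3, 10, and 19 yourself, a short script along the following lines may help. It is only a sketch: the helper names `posterior` and `smallest_k` are made up for illustration, and the inputs are the figures stated in the questions.

```python
# Numeric checks for Questions 3, 10, and 19, using the values given in the quiz.
# The function names are illustrative helpers, not part of any library.

def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' theorem with the two-case law of total probability as denominator."""
    true_pos = sensitivity * prior                   # P(T | D) P(D)
    false_pos = false_positive_rate * (1 - prior)    # P(T | not D) P(not D)
    return true_pos / (true_pos + false_pos)

def smallest_k(per_run_error, target):
    """Smallest k with per_run_error**k < target, assuming independent runs."""
    k = 1
    while per_run_error ** k >= target:
        k += 1
    return k

# Question 3: P(A | C) = P(A and C) / P(C)
p_a_given_c = 0.10 / 0.25

# Question 10: 1-in-1,000 base rate, 99% sensitivity, 1% false-positive rate
p_disease_given_positive = posterior(
    prior=0.001, sensitivity=0.99, false_positive_rate=0.01
)

# Question 19: per-run error at most 1/2, target below 10^-6
k = smallest_k(0.5, 1e-6)

print(round(p_a_given_c, 2))               # 0.4
print(round(p_disease_given_positive, 4))  # 0.0902
print(k)                                   # 20
```

Changing `prior` in the Question 10 call is a quick way to see how strongly the posterior depends on the base rate.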

Topics to review by question

| Questions | Topic |
|---|---|
| 1–3 | Conditional probability and reading joint tables (§21.1) |
| 4–5 | Independence vs. mutual exclusivity (§21.2) |
| 6–9 | Bayes' theorem, its vocabulary, and direction (§21.3) |
| 10–13 | The base-rate / prosecutor's fallacy (§21.4) |
| 14–17 | Naive Bayes: the assumption and its production wrinkles (§21.5) |
| 18–20 | Randomized algorithms: Monte Carlo, Las Vegas, linearity of expectation (§21.6, Ch. 20) |
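
If questions 15–17 gave you trouble, the sketch below may make the three points concrete: dropping the shared denominator, the zero-frequency problem, and scoring in log space. It is a toy under stated assumptions, not the chapter's classifier: the word counts, the equal priors, and the names `log_score` and `classify` are all hypothetical, and add-one (Laplace) smoothing is used here as one common remedy for zero counts.

```python
import math

# Hypothetical toy word counts per class (illustrative only, not from the text).
counts = {
    "spam": {"free": 3, "win": 2, "meeting": 0},
    "ham":  {"free": 1, "win": 0, "meeting": 4},
}
priors = {"spam": 0.5, "ham": 0.5}  # assumed equal priors
vocab = ["free", "win", "meeting"]

def log_score(words, label, alpha):
    """log P(label) + sum of log P(w | label), with add-alpha smoothing.

    alpha = 0 is the unsmoothed classifier: a zero count contributes -inf,
    which is the log-space form of a product collapsing to 0.
    """
    total = sum(counts[label].values()) + alpha * len(vocab)
    score = math.log(priors[label])
    for w in words:
        p = (counts[label][w] + alpha) / total
        score += math.log(p) if p > 0 else -math.inf
    return score

def classify(words, alpha=1.0):
    # The denominator P(w_1, ..., w_n) is omitted: it is the same positive
    # constant for every class, so it cannot change which class scores highest.
    return max(priors, key=lambda label: log_score(words, label, alpha))

email = ["free", "meeting"]
unsmoothed_spam = log_score(email, "spam", alpha=0)   # "meeting" never seen in spam
smoothed_spam = log_score(email, "spam", alpha=1.0)   # smoothing keeps it finite
predicted = classify(email)
```

With these toy counts the unsmoothed spam score is negative infinity because "meeting" never occurs in spam, while the smoothed score stays finite and the email is still compared fairly against ham.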