Further Reading: Conditional Probability and Bayes' Theorem

Annotated pointers for going deeper. Start with the textbook sections to firm up the mechanics, then branch toward whichever thread pulls you: the cognitive science of why the base-rate fallacy is so sticky, the machine-learning life of naive Bayes, or the algorithms side of randomization.

Core textbook treatments

Rosen, Discrete Mathematics and Its Applications (8th ed.), §7.2–7.3. The standard discrete-math treatment of conditional probability, independence, and Bayes' theorem, with a deep, well-graded exercise bank (including the classic Monty Hall and medical-test problems). If you want more drill than this chapter provides, start here.

Lehman, Leighton & Meyer, Mathematics for Computer Science (MIT 6.042), conditional-probability and "deviation" chapters. Freely available. The most CS-flavored presentation available: conditional probability is developed with an eye toward algorithm analysis, and the same text supplies the indicator-variable and linearity-of-expectation machinery used in §21.6. The closest companion in spirit to this chapter.

Graham, Knuth & Patashnik, Concrete Mathematics (2nd ed.), Chapter 8 (Discrete Probability). Probability as a working tool for the practicing computer scientist, heavy on generating functions and expectation. Denser than this chapter, but excellent for the expectation arguments behind randomized analysis.

On Bayes' theorem and the base-rate fallacy

Gigerenzer, Calculated Risks (also published as Reckoning with Risk). The definitive popular-but-rigorous case for natural frequencies as the cure for the base-rate fallacy: exactly the "imagine 100,000 people" reframing of §21.4. Full of real medical-test and courtroom examples where professionals got the conditional backwards.

Kahneman, Thinking, Fast and Slow (chapters on base rates and representativeness). Tier 2 (attributed). An accessible synthesis of the Kahneman–Tversky research program, showing that base-rate neglect is systematic rather than random. Read it to understand why the 9%-vs-99% gap in §21.4 fools even experts.

3Blue1Brown, "Bayes theorem, the geometry of changing beliefs" (YouTube). Free. A visual derivation of Bayes' theorem that makes "posterior $\propto$ likelihood $\times$ prior" and the natural-frequencies picture click. The best 15-minute on-ramp if the algebra of §21.3 still feels formal.
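The natural-frequency reframing that Gigerenzer and the 3Blue1Brown video advocate can be checked in a few lines. This is a minimal sketch under assumed, illustrative inputs that may differ from §21.4's: a prevalence of 1 in 1,000, a test that detects 99% of true cases, and a 1% false-positive rate. It counts heads in an imagined population of 100,000 and compares the answer with the formula form of Bayes' theorem; the function names are hypothetical.

```python
# Natural frequencies vs. the formula form of Bayes' theorem.
# ASSUMED illustrative inputs (hypothetical; §21.4 may use different numbers):
# prevalence 1 in 1,000, sensitivity 99%, false-positive rate 1%.
from fractions import Fraction

def posterior_by_counting(population, prevalence, sensitivity, false_pos_rate):
    """P(disease | positive), found by counting people in an imagined population."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity         # sick people who test positive
    false_positives = healthy * false_pos_rate  # healthy people who test positive
    return true_positives / (true_positives + false_positives)

def posterior_by_formula(prevalence, sensitivity, false_pos_rate):
    """P(D | +) = P(+ | D) P(D) / (P(+ | D) P(D) + P(+ | not D) P(not D))."""
    numerator = sensitivity * prevalence
    evidence = numerator + false_pos_rate * (1 - prevalence)
    return numerator / evidence

prevalence = Fraction(1, 1000)
sensitivity = Fraction(99, 100)
false_pos_rate = Fraction(1, 100)

by_counting = posterior_by_counting(100_000, prevalence, sensitivity, false_pos_rate)
by_formula = posterior_by_formula(prevalence, sensitivity, false_pos_rate)
print(by_counting, f"{float(by_counting):.1%}")  # → 11/122 9.0%
```

Under these assumptions, of the 1,098 people who test positive only 99 are sick, so a "99% accurate" test yields roughly a 9% posterior. Using `Fraction` keeps the arithmetic exact, so the counting route and the formula route can be compared with `==`.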

On naive Bayes and text classification

Manning, Raghavan & Schütze, Introduction to Information Retrieval, "Text classification and Naive Bayes" chapter. Freely available online. The standard reference for the naive Bayes spam/text classifier built in §21.5 and Case Study 2, including Laplace smoothing, the log-space trick, and a careful statement of the conditional-independence assumption. Read this right after the chapter.
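As a preview of what that chapter formalizes, here is a minimal sketch of a multinomial (word-count) naive Bayes classifier showing the two details singled out above: add-one (Laplace) smoothing and scoring in log space. The four-document corpus and the names `train`, `log_score`, and `classify` are hypothetical, and skipping words never seen in training is one common convention rather than the only option.

```python
# Multinomial naive Bayes with Laplace (add-one) smoothing, scored in log space.
# The tiny training corpus below is HYPOTHETICAL, for illustration only.
import math
from collections import Counter

def train(docs):
    """docs: list of (label, tokens). Returns class priors, per-class token counts, vocab."""
    class_docs = Counter(label for label, _ in docs)
    token_counts = {label: Counter() for label in class_docs}
    vocab = set()
    for label, tokens in docs:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(docs) for c, n in class_docs.items()}
    return priors, token_counts, vocab

def log_score(tokens, label, priors, token_counts, vocab):
    """log P(c) + sum over tokens t of log P(t | c), with add-one smoothing."""
    counts = token_counts[label]
    total = sum(counts.values())
    score = math.log(priors[label])
    for t in tokens:
        if t not in vocab:
            continue  # skip words never seen in training (one common convention)
        # Laplace smoothing: an unseen (word, class) pair counts as 1, never 0.
        score += math.log((counts[t] + 1) / (total + len(vocab)))
    return score

def classify(tokens, model):
    """Pick the class with the highest log score (no product of tiny numbers)."""
    priors, token_counts, vocab = model
    return max(priors, key=lambda c: log_score(tokens, c, priors, token_counts, vocab))

training = [
    ("spam", "win cash now".split()),
    ("spam", "cheap cash offer".split()),
    ("ham",  "meeting notes attached".split()),
    ("ham",  "lunch meeting now".split()),
]
model = train(training)
print(classify("cash offer now".split(), model))     # → spam
print(classify("notes for meeting".split(), model))  # → ham
```

Summing logarithms instead of multiplying probabilities is what keeps long documents from underflowing to zero; the smoothing is what keeps a single unseen word from zeroing out an entire class.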

Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.), the probability and naive Bayes sections. Tier 2 (attributed). Places naive Bayes inside the broader family of probabilistic models (Bayesian networks), showing precisely which independence assumptions each model relaxes; this makes the "simplest member of a family" remark in §21.5 concrete.

On randomized algorithms

Cormen, Leiserson, Rivest & Stein (CLRS), Introduction to Algorithms (4th ed.), §7.3–7.4 (randomized quicksort) and the probabilistic-analysis chapter. The canonical, fully worked indicator-variable analysis of randomized quicksort that §21.6 sketches, including the $\frac{2}{j-i+1}$ pairwise-comparison probability and the harmonic-sum bound that yields $O(n\log n)$.
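For orientation before opening CLRS, the sum that analysis evaluates can be written out as below. The notation is assumed rather than copied from CLRS: $X$ is the total number of comparisons, $z_1 < \dots < z_n$ are the elements in sorted order, and $H_n$ is the $n$th harmonic number. Elements $z_i$ and $z_j$ are compared exactly when the first pivot drawn from $\{z_i, \dots, z_j\}$ is one of those two, which happens with probability $\frac{2}{j-i+1}$; substituting $k = j - i$ then exposes the harmonic sum.

```latex
\begin{align*}
\mathbb{E}[X]
  &= \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \Pr\{z_i \text{ is compared to } z_j\}
   = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{2}{j-i+1} \\
  &= \sum_{i=1}^{n-1}\sum_{k=1}^{n-i} \frac{2}{k+1}
   < \sum_{i=1}^{n-1}\sum_{k=1}^{n} \frac{2}{k}
   = \sum_{i=1}^{n-1} 2H_n
   = O(n\log n).
\end{align*}
```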

Motwani & Raghavan, Randomized Algorithms. Tier 2 (attributed). The graduate-level standard on Las Vegas vs. Monte Carlo algorithms, error amplification, and the analysis techniques behind them. The place to go after CLRS if randomization becomes a serious interest.

Katz & Lindell, Introduction to Modern Cryptography (3rd ed.), the number-theory / primality sections. The bridge from §21.6's Monte Carlo error analysis to why RSA key generation relies on a randomized primality test, which is the point, made at the close of this chapter and developed in Part IV, that "the keys are made by a Monte Carlo algorithm."
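To make "made by a Monte Carlo algorithm" concrete, here is a minimal sketch of the Miller–Rabin test, a randomized primality test of the kind those sections treat. The function name, the small-prime prefilter, and the default of 40 rounds are illustrative choices, not taken from Katz–Lindell. The error is one-sided: a prime is never rejected, while a composite passes a single round with probability at most 1/4 (Rabin's bound), so independent rounds amplify confidence exactly as in a Monte Carlo error analysis.

```python
# Miller–Rabin probable-prime test: a Monte Carlo algorithm with one-sided error.
# Primes always pass; a composite passes one round with probability <= 1/4,
# so k independent rounds wrongly say "prime" with probability <= 4**(-k).
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n, rounds=40, rng=None):
    """Return False if n is certainly composite, True if n is probably prime."""
    rng = rng or random.Random()
    if n < 2:
        return False
    for p in SMALL_PRIMES:        # cheap trial division first
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)   # random witness candidate in [2, n - 2]
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue                  # a is not a witness this round
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False              # a proves n composite; this answer is never wrong
    return True                       # wrong with probability <= 4**(-rounds)

rng = random.Random(2024)
print(is_probable_prime(2**61 - 1, rng=rng))  # → True (a Mersenne prime)
print(is_probable_prime(252601, rng=rng))     # 41 * 61 * 101, a Carmichael number
```

The second call is the instructive one: 252601 fools the plain Fermat test for every base coprime to it, yet each Miller–Rabin round exposes it with probability at least 3/4, so after 40 rounds a wrong "prime" verdict has probability at most $4^{-40}$.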

Suggested order

Re-read §§21.3–21.4 here, then work the Rosen §7.2–7.3 exercises for drill.
Watch the 3Blue1Brown Bayes video, then read a chapter of Gigerenzer's Calculated Risks for the base-rate intuition.
Read the Manning–Raghavan–Schütze naive Bayes chapter alongside Case Study 2.
Read CLRS §7.3–7.4 for the full randomized-quicksort proof; save Motwani–Raghavan for after Part IV.
When you reach the RSA thread (Chapters 22–25), revisit Katz–Lindell's primality sections to see this chapter's Monte Carlo analysis put to work.