Further Reading: Discrete Probability

Annotated pointers for going deeper. Start with the textbook sections that re-teach this chapter in a second voice, then branch toward whatever pulled you in: the probabilistic method, the analysis of randomized algorithms, or the hashing applications. Everything below is Tier-1 (canonical, verifiable) or Tier-2 (a real, named idea attributed carefully).

Core textbook treatments

Rosen, Discrete Mathematics and Its Applications (8th ed.), §7.1–7.4 — Tier 1. The market-standard development of discrete probability: sample spaces, the equally-likely model, conditional probability and Bayes, expected value, variance, and a clean treatment of linearity of expectation with the hat-check and similar examples. The largest graded exercise bank if you want more drill than this chapter provides; §7.4 ("Expected Value and Variance") matches our §20.4–20.5 closely.

Lehman, Leighton & Meyer, Mathematics for Computer Science (MIT 6.042J), the probability unit — Tier 1. Freely available. The most CS-flavored treatment in print, and the closest in spirit to this chapter. Probability is developed for algorithm analysis throughout: indicator variables and linearity are introduced as labor-saving tools, the birthday/hashing connection is made explicit, and the chapters flow straight into deviation bounds and randomized algorithms. The single best companion to read alongside ours.

Graham, Knuth & Patashnik, Concrete Mathematics (2nd ed.), Chapter 8 ("Discrete Probability") — Tier 1. Denser and more identity-focused: expectation and variance treated with the same generating-function machinery as the rest of the book, plus a memorable treatment of the "mean and variance of a sum." Best read after Chapter 11 (summations) and the generating-functions material; rewards the effort.

On expected value, variance, and concentration

Mitzenmacher & Upfal, Probability and Computing (2nd ed.), Chapters 1–4 — Tier 1. The standard graduate-level "probability for computer scientists" text. Chapters 1–4 cover exactly our arc — events, expectation, linearity, variance — and then go where we only gestured: Markov, Chebyshev, and Chernoff bounds, the tools that turn "small variance" into "almost never far from the mean." Read this when the §20.5 🔗 Connection callout about concentration makes you curious; it is the natural sequel.

Blitzstein & Hwang, Introduction to Probability (2nd ed.), the expectation and variance chapters — Tier 1. A famously readable probability text (the Harvard Stat 110 book) with strong intuition-building and worked examples. Heavier on continuous probability than we need, but its treatment of expectation, indicators, and the "story proofs" behind linearity is excellent for cementing the why.

On the probabilistic method

Alon & Spencer, The Probabilistic Method (4th ed.), Chapter 1 — Tier 1. The definitive book on the technique we previewed in §20.6, by the people who shaped the field. Chapter 1 develops exactly the two moves we used — "positive probability implies existence" and "an outcome at least as good as the average exists" — and applies them to the two-coloring and MAX-CUT results from our case studies, then far beyond. Demanding but foundational; read Chapter 1 after this chapter and you will recognize the arguments.

Paul Erdős and the origins of the method — Tier 2. The probabilistic method is universally attributed to Paul Erdős, who from the 1940s onward used random constructions to prove existence theorems in combinatorics (notably lower bounds for Ramsey numbers). For the historical arc, the introductory chapter of Alon & Spencer (above) and most graduate combinatorics courses recount it; treat specific dates and paper details as attributed-but-unverified unless you check the primary sources.

On hashing and randomized algorithms (the applications)

Cormen, Leiserson, Rivest & Stein (CLRS), Introduction to Algorithms (4th ed.), hashing and randomized-algorithms chapters — Tier 1. The canonical reference for hash tables (the subject of Case Study 1) and for the expected-time analysis of randomized algorithms. Its treatment of "simple uniform hashing," expected chain length, and the analysis of randomized quicksort uses exactly the indicator-plus-linearity technique you practiced here. The bridge from this chapter to Chapter 21 and to data structures generally.

Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching (2nd ed.), §6.4 — Tier 1. The deep, classic analysis of hashing, including the expected number of probes and the birthday-style collision mathematics that Case Study 1 applies. Reference-grade; dip in for the rigorous version of any hashing fact you want to nail down.

"The power of two choices" (balanced allocations) — Tier 2. A celebrated result in randomized algorithms: if each ball is placed in the less full of two randomly chosen bins, the maximum load drops dramatically (from $\Theta(\log m/\log\log m)$ to $\Theta(\log\log m)$). This is the idea behind Case Study 1's Option C. The result is commonly attributed to Azar, Broder, Karlin, and Upfal (1990s) and is surveyed in Mitzenmacher & Upfal (above); use that survey for the precise statements and references.

Tools and visualizations

3Blue1Brown, "Binomial distributions" and probability explainers (YouTube) — Tier 2. Free. Visual intuition for distributions, expectation as a balance point, and why randomness "clumps." Good for building the mental pictures behind §20.3–20.5 before (or after) the formal definitions.

numpy.random and Python's random module documentation — Tier 1. Free. Once you have written simulate from scratch (Project Checkpoint), the standard libraries give you fast, well-tested generators for larger Monte-Carlo studies — the right tool for the "Your Turn" simulations in both case studies, after the concept has been shown by hand.

Suggested order

Re-read §§20.2–20.4 here, then do Rosen §7.1–7.4 exercises for drill.
Read the MIT 6.042J probability unit for the CS framing of indicators and linearity — it mirrors this chapter most closely.
For the applications, read CLRS's hashing chapter (pairs with Case Study 1) and skim Knuth Vol. 3 §6.4 for the rigorous hashing mathematics.
For the probabilistic method, read Alon & Spencer Chapter 1 (pairs with Case Study 2 and §20.6).
Save Mitzenmacher & Upfal for after this chapter — it is the natural next course, picking up exactly where the §20.5 concentration callout leaves off and leading into Chapter 21's randomized-algorithm analysis.