Further Reading: Hashing, Checksums, and Error Detection

Annotated pointers for going deeper. The chapter touched four worlds — hash tables, error-detecting codes, the algebra of $\mathbb{Z}_2[x]$, and cryptographic hashing — and each has a natural next step. Start with the textbook sections, then follow whichever thread (data structures, coding theory, or cryptography) pulls you.


Core textbook treatments

Rosen, Discrete Mathematics and Its Applications (8th ed.), §4.5 (hashing functions, ISBN/UPC check digits) and §4.6 / §13.x (the number-theoretic background). The market-standard discrete-math treatment of modular hashing and identifier check digits, with a large graded exercise bank. The cleanest place to get more drill on the ISBN/UPC arithmetic of §26.4.

Cormen, Leiserson, Rivest & Stein (CLRS), Introduction to Algorithms (4th ed.), Chapter 11 ("Hash Tables"). The canonical algorithms reference for this chapter: chaining, open addressing, the load-factor analysis ($O(1+\alpha)$), and a full development of universal and perfect hashing with the Carter–Wegman family and its $1/m$ collision bound. Read this for the rigor behind §26.2–26.3.

Lehman, Leighton & Meyer, Mathematics for Computer Science (MIT 6.042), the "Hashing" / pigeonhole material. Freely available. Develops the pigeonhole principle and its consequences (including collisions) in the same CS-first spirit as this book. The best companion for seeing §26.2's collision theorem as one instance of a broad counting argument.

On the pigeonhole principle behind collisions

Graham, Knuth & Patashnik, Concrete Mathematics (2nd ed.), Chapter 3 (and the binomial/counting material). Denser, but it treats the counting arguments that power load-factor and birthday-style estimates as working tools. Read it after Part III if you want the math behind "why a load factor of 1 still leaves a long chain."

On error-detecting and error-correcting codes (the CRC thread)

CLRS / Rosen coding sections, then a dedicated coding text. For the §26.5 CRC algebra and its sequel: Rosen's chapter on the integers and polynomials gives the $\mathbb{Z}_2[x]$ groundwork. For the full theory — why a generator catches the error classes it does, and the leap to correction — the standard graduate references are the coding-theory chapters that Chapter 38 will build on; see that chapter's further reading for specifics so you meet the finite-field machinery (Chapter 24) at the right time.

Knuth, The Art of Computer Programming, Vol. 2 / Vol. 3 (the hashing and arithmetic material). Encyclopedic and careful on hashing and on polynomial/modular arithmetic. Use it as a reference, not a read-through; it is where to settle a fine point about a hash's distribution or a CRC's algebra.

On cryptographic hashing (the §26.6 thread)

Katz & Lindell, Introduction to Modern Cryptography (3rd ed.), the chapters on hash functions and message authentication. The rigorous treatment of preimage, second-preimage, and collision resistance, of the birthday bound on collision-finding, and of HMAC — exactly the §26.6 properties, made precise. This is the right next book the moment you want to know why SHA-256 is trusted and what "infeasible" means formally.

Suggested order

  1. Re-read §§26.1–26.3 here, then work CLRS Chapter 11 for the algorithmic depth on load factor and universal hashing.
  2. Read the MIT 6.042 pigeonhole/hashing material to see the collision theorem as one counting argument among many.
  3. For the CRC algebra, consolidate the $\mathbb{Z}_2[x]$ background (Rosen / Knuth Vol. 2), then defer the full coding theory to Chapter 38, where finite fields (Chapter 24) make correction possible.
  4. When the cryptographic side (§26.6) grabs you, move to Katz & Lindell's hash-function and MAC chapters — and notice they lean on the same Chapter 25 "infeasible to compute" idea.