Chapter 26 — Key Takeaways

Hashing, Checksums, and Error Detection. Reread this before an exam: it is the chapter compressed to its definitions, formulas, and decision rules.

The one idea

Map something large → something small, fast, for comparison. Three jobs, one shape:

Use Maps Math Guarantee / limit
Hash table key → slot $0..m{-}1$ mod arithmetic (Ch. 23) collisions guaranteed (pigeonhole)
Parity bit block → 1 bit count mod 2 catches odd # flips; misses even
Weighted checksum digits → check digit weighted sum mod prime catches single-digit + transposition
CRC message → $r$-bit remainder polynomial ÷ over $\mathbb{Z}_2$ catches all bursts of length $\le r$
Cryptographic hash data → fixed digest one-way + collision-resistant collisions infeasible to find

Definitions (precise forms)

Term Definition
Hash function deterministic, fast $h\colon U \to \{0,\dots,m-1\}$, $\lvert U\rvert \gg m$
Collision distinct $k_1 \ne k_2$ with $h(k_1) = h(k_2)$
Load factor $\alpha = n/m$ = keys ÷ slots = avg keys per slot
Checksum small value sent alongside data; recompute & compare to detect change
Parity bit one bit making total #1s even (even parity) or odd (odd parity)
CRC remainder of (message $\times x^r$) ÷ generator $G(x)$ over $\mathbb{Z}_2$
Cryptographic hash hash + preimage, second-preimage, collision resistance

Key theorems & formulas

  • Collision theorem. $\lvert U\rvert > m \Rightarrow \exists\,k_1 \ne k_2,\ h(k_1)=h(k_2)$. (Proof: "no collision" $\equiv$ $h$ injective $\Rightarrow \lvert U\rvert \le m$, contradiction.)
  • Generalized pigeonhole on a table. Some slot holds $\ge \lceil n/m \rceil = \lceil\alpha\rceil$ keys. So $\alpha$ lower-bounds the worst chain.
  • Expected chained lookup (uniform hashing): $O(1 + \alpha)$. Resize when $\alpha$ exceeds a threshold (e.g. 0.75).
  • Parity. Detects an error iff the number of flips $e$ is odd (each flip toggles count mod 2).
  • ISBN-10 check digit. $\displaystyle\sum_{i=1}^{10} i\,d_i \equiv 0 \pmod{11}$; check digit $10 = $ X. Prime 11 ⇒ every single-digit error and every adjacent transposition detected.
  • CRC detection rule. Error $E(x)$ undetected $\iff G(x) \mid E(x)$. Therefore choose $G$ so that:
Want to detect Require of $G(x)$
all single-bit errors $\ge 2$ nonzero terms
all bursts of length $\le r$ $\deg G = r$
all odd numbers of errors factor $(x+1)$
  • Hamming(7,4). Parity at positions $1,2,4$; data at $3,5,6,7$. Syndrome $= s_4 s_2 s_1$ (binary) $=$ bad position ($0 \Rightarrow$ no error). Minimum distance 3 ⇒ corrects $t=1$ error.

Decision rule: which scheme do I use?

Threat is an ADVERSARY (tampering, forgery)?
    └─ yes → cryptographic hash (SHA-256); keyed → HMAC.  (CRC is NOT safe here.)
    └─ no (random faults):
         need to CORRECT, not just detect?
             └─ yes → error-correcting code (Hamming / Ch. 38 codes)
             └─ no (detect only):
                  burst errors (disk/network)?    → CRC
                  human-typed identifier?          → weighted checksum mod prime
                  cheapest single-flip detection?  → parity bit
Building a fast lookup, not a channel?  → hash table (prime m, watch α, plan for collisions)

Worked numbers (verify yourself)

Computation Result
$1234 \bmod 11$, $5678 \bmod 11$ both $2$ (a collision); $91011 \bmod 11 = 8$
even parity of $1010001$ (three 1s) append $1$ → $10100011$
ISBN-10 check, digits $1..9$ (sum $285 \equiv 10$) X (i.e. 10)
CRC: $M=1101$, $G=1011$ remainder $001$; frame $1101001$
hamming_encode([1,0,1,1]) $[0,1,1,0,0,1,1]$
decode after flipping position 5 syndrome $101_2 = 5$, recovers $[1,0,1,1]$
Hamming distance of $1011010$, $1001011$ $2$

Common pitfalls

  • ⚠️ "Passed the check ⇒ correct." Parity misses 2-bit errors; no detection scheme proves correctness — pigeonhole guarantees collisions exist.
  • ⚠️ Power-of-two table size. $k \bmod 2^p$ keeps only low $p$ bits; clusters aligned/patterned keys. Prefer a prime $m$ (or mix bits first).
  • ⚠️ CRC for security. CRC is linear & keyless ⇒ an attacker recomputes/preserves it. Use a cryptographic hash against tampering.
  • ⚠️ Wrong base case / modulus. Mod 10 (not prime) misses transpositions ISBN's mod 11 catches.
  • ⚠️ "Find the perfect collision-free hash." Impossible for $\lvert U\rvert > m$ — a theorem, not a skill gap.

Toolkit additions — coding.py

Function Does Note
hamming_distance(a, b) #positions where bit lists differ the ruler of coding theory (Ch. 38)
hamming_encode(bits) 4 data bits → 7-bit codeword parity at positions 1,2,4
hamming_decode(bits) corrects $\le 1$ error → (data, pos) syndrome names the bad bit

Connections

  • Builds on: Ch. 23 (modular arithmetic, $\mathbb{Z}_2$, modular inverse), Ch. 17 (pigeonhole & generalized pigeonhole).
  • Sets up: Ch. 38 (coding theory: Hamming distance + finite fields → error correction).
  • Threshold ideas: hash functions cannot be injective; hard-to-find is as good as nonexistent (foundation of crypto, echoed in Ch. 25 and Ch. 37's "easy to check, hard to find").