Library › Discrete Mathematics for Computer Science › Part IV: Number Theory and Cryptography › Chapter 26: Hashing, Checksums, and Error Detection › Chapter 26 — Key Takeaways
Chapter 26 — Key Takeaways
Hashing, Checksums, and Error Detection. Reread this before an exam: it is the chapter compressed to
its definitions, formulas, and decision rules.
The one idea
Map something large → something small, fast, for comparison. Three jobs, one shape:
Use
Maps
Math
Guarantee / limit
Hash table
key → slot $0..m{-}1$
mod arithmetic (Ch. 23)
collisions guaranteed (pigeonhole)
Parity bit
block → 1 bit
count mod 2
catches odd # flips; misses even
Weighted checksum
digits → check digit
weighted sum mod prime
catches single-digit + transposition
CRC
message → $r$-bit remainder
polynomial ÷ over $\mathbb{Z}_2$
catches all bursts of length $\le r$
Cryptographic hash
data → fixed digest
one-way + collision-resistant
collisions infeasible to find
Term
Definition
Hash function
deterministic, fast $h\colon U \to \{0,\dots,m-1\}$, $\lvert U\rvert \gg m$
Collision
distinct $k_1 \ne k_2$ with $h(k_1) = h(k_2)$
Load factor
$\alpha = n/m$ = keys ÷ slots = avg keys per slot
Checksum
small value sent alongside data; recompute & compare to detect change
Parity bit
one bit making total #1s even (even parity) or odd (odd parity)
CRC
remainder of (message $\times x^r$) ÷ generator $G(x)$ over $\mathbb{Z}_2$
Cryptographic hash
hash + preimage, second-preimage, collision resistance
Collision theorem. $\lvert U\rvert > m \Rightarrow \exists\,k_1 \ne k_2,\ h(k_1)=h(k_2)$.
(Proof: "no collision" $\equiv$ $h$ injective $\Rightarrow \lvert U\rvert \le m$, contradiction.)
Generalized pigeonhole on a table. Some slot holds $\ge \lceil n/m \rceil = \lceil\alpha\rceil$
keys. So $\alpha$ lower-bounds the worst chain.
Expected chained lookup (uniform hashing): $O(1 + \alpha)$. Resize when $\alpha$ exceeds a
threshold (e.g. 0.75).
Parity. Detects an error iff the number of flips $e$ is odd (each flip toggles count mod 2).
ISBN-10 check digit. $\displaystyle\sum_{i=1}^{10} i\,d_i \equiv 0 \pmod{11}$; check digit $10 = $
X. Prime 11 ⇒ every single-digit error and every adjacent transposition detected.
CRC detection rule. Error $E(x)$ undetected $\iff G(x) \mid E(x)$. Therefore choose $G$ so that:
Want to detect
Require of $G(x)$
all single-bit errors
$\ge 2$ nonzero terms
all bursts of length $\le r$
$\deg G = r$
all odd numbers of errors
factor $(x+1)$
Hamming(7,4). Parity at positions $1,2,4$; data at $3,5,6,7$. Syndrome $= s_4 s_2 s_1$ (binary)
$=$ bad position ($0 \Rightarrow$ no error). Minimum distance 3 ⇒ corrects $t=1$ error.
Decision rule: which scheme do I use?
Threat is an ADVERSARY (tampering, forgery)?
└─ yes → cryptographic hash (SHA-256); keyed → HMAC. (CRC is NOT safe here.)
└─ no (random faults):
need to CORRECT, not just detect?
└─ yes → error-correcting code (Hamming / Ch. 38 codes)
└─ no (detect only):
burst errors (disk/network)? → CRC
human-typed identifier? → weighted checksum mod prime
cheapest single-flip detection? → parity bit
Building a fast lookup, not a channel? → hash table (prime m, watch α, plan for collisions)
Worked numbers (verify yourself)
Computation
Result
$1234 \bmod 11$, $5678 \bmod 11$
both $2$ (a collision); $91011 \bmod 11 = 8$
even parity of $1010001$ (three 1s)
append $1$ → $10100011$
ISBN-10 check, digits $1..9$ (sum $285 \equiv 10$)
X (i.e. 10)
CRC: $M=1101$, $G=1011$
remainder $001$; frame $1101001$
hamming_encode([1,0,1,1])
$[0,1,1,0,0,1,1]$
decode after flipping position 5
syndrome $101_2 = 5$, recovers $[1,0,1,1]$
Hamming distance of $1011010$, $1001011$
$2$
Common pitfalls
⚠️ "Passed the check ⇒ correct." Parity misses 2-bit errors; no detection scheme proves
correctness — pigeonhole guarantees collisions exist.
⚠️ Power-of-two table size. $k \bmod 2^p$ keeps only low $p$ bits; clusters aligned/patterned
keys. Prefer a prime $m$ (or mix bits first).
⚠️ CRC for security. CRC is linear & keyless ⇒ an attacker recomputes/preserves it. Use a
cryptographic hash against tampering.
⚠️ Wrong base case / modulus. Mod 10 (not prime) misses transpositions ISBN's mod 11 catches.
⚠️ "Find the perfect collision-free hash." Impossible for $\lvert U\rvert > m$ — a theorem, not a
skill gap.
Function
Does
Note
hamming_distance(a, b)
#positions where bit lists differ
the ruler of coding theory (Ch. 38)
hamming_encode(bits)
4 data bits → 7-bit codeword
parity at positions 1,2,4
hamming_decode(bits)
corrects $\le 1$ error → (data, pos)
syndrome names the bad bit
Connections
Builds on: Ch. 23 (modular arithmetic, $\mathbb{Z}_2$, modular inverse), Ch. 17 (pigeonhole &
generalized pigeonhole).
Sets up: Ch. 38 (coding theory: Hamming distance + finite fields → error correction ).
Threshold ideas: hash functions cannot be injective; hard-to-find is as good as nonexistent
(foundation of crypto, echoed in Ch. 25 and Ch. 37's "easy to check, hard to find").
← Previous
Case Study: Building a Reliable Frame Protocol over a Noisy Link
Next →
Further Reading: Hashing, Checksums, and Error Detection