Chapter 38 — Key Takeaways (Coding Theory)

Re-grounding card. Skim before an exam: definitions, the two theorems, the decision rules, the pitfalls.


The one idea

Spend redundancy to buy distance; let distance absorb noise. A code makes most bit-patterns illegal so corruption becomes visible (detection) or repairable (correction). The single number that decides a code's power is its minimum distance $d_{\min}$.


Core definitions

Term Definition Note
Code $C$ A chosen subset of length-$n$ strings over an alphabet $(n,M)$ code: $M$ codewords
Codeword A member of $C$ (a "legal" string) unused strings = illegal
Hamming distance $d(x,y)$ # positions where $x,y$ differ $= \operatorname{wt}(x\oplus y)$ a metric (Thm 38.1)
Weight $\operatorname{wt}(z)$ # nonzero positions of $z$ $d(x,y)=\operatorname{wt}(x\oplus y)$
Minimum distance $d_{\min}$ $\min\{d(x,y): x\ne y \in C\}$ THE design parameter
Code rate $R$ $R = k/n = (\log_2 M)/n$ message-fraction of a block
Linear code $[n,k]$ A subspace of $\mathrm{GF}(2)^n$ (contains $\mathbf 0$, closed under XOR) $2^k$ codewords
Generator matrix $G$ $k\times n$, rows a basis; encode $c = mG$ "how to build a codeword"
Parity-check matrix $H$ $(n-k)\times n$; $c\in C \iff Hc^{\mathsf T}=\mathbf 0$ "is it legal / what broke"
Syndrome $s$ $s = Hy^{\mathsf T} = He^{\mathsf T}$ (error only) $\mathbf 0$ ⇒ no detected error
Reed–Solomon evaluations of a degree-$ MDS; $d_{\min}=n-k+1$

The two threshold theorems (memorize as a pair)

Goal Need on $d_{\min}$ Why
Detect $\le t$ errors $d_{\min} \ge t+1$ no $\le t$ flips reach another codeword
Correct $\le t$ errors $d_{\min} \ge 2t+1$ radius-$t$ balls around codewords stay disjoint
Correct $t_c$ AND detect $t_d$ $d_{\min} \ge t_c+t_d+1$ combined mode (e.g. SECDED)

Read it backward (given $d_{\min}=d$):

$$\text{detects } d-1 \text{ errors}, \qquad \text{corrects } \left\lfloor \frac{d-1}{2}\right\rfloor \text{ errors.}$$


Decision aid — "which code / what guarantee?"

Situation Use / conclude
Just need to notice corruption, cheapest possible parity bit ($d_{\min}=2$, detect 1) — see Ch. 26
Correct any single bit error, efficiently Hamming$(7,4)$, $d_{\min}=3$, rate $4/7$
Correct 1 and detect 2 (ECC RAM) extended Hamming / SECDED, $d_{\min}=4$
Burst errors (scratch, fade, smudge) Reed–Solomon over $\mathrm{GF}(2^8)$ (count symbols)
Want to know a linear code's power fast compute min weight of nonzero codewords (Thm 38.4)
Want $d_{\min}$ from $H$ fewest columns of $H$ that XOR to $\mathbf 0$

Hamming(7,4) cheat sheet

  • Layout (positions 1–7): $[p_1,\ p_2,\ d_1,\ p_4,\ d_2,\ d_3,\ d_4]$. Parity at powers of two (1,2,4); data at 3,5,6,7.
  • Encode: $$p_1 = d_1\oplus d_2\oplus d_4,\quad p_2 = d_1\oplus d_3\oplus d_4,\quad p_4 = d_2\oplus d_3\oplus d_4.$$
  • Decode (syndrome): $$s_1 = c_1\oplus c_3\oplus c_5\oplus c_7,\ \ s_2 = c_2\oplus c_3\oplus c_6\oplus c_7,\ \ s_4 = c_4\oplus c_5\oplus c_6\oplus c_7.$$ Read $s_4 s_2 s_1$ as binary $=$ error position (0 ⇒ clean). Flip that bit; read data at 3,5,6,7.
  • Parity-check matrix (column $j$ = $j$ in binary): $$H = \begin{pmatrix} 0&0&0&1&1&1&1\\ 0&1&1&0&0&1&1\\ 1&0&1&0&1&0&1\end{pmatrix}.$$
  • Worked example: data $1011 \to$ codeword $[0,1,1,0,0,1,1]$; flip pos 5 → receive $[0,1,1,0,1,1,1]$ → syndrome $101_2=5$ → recover $1011$.
  • $d_{\min}=3$ → corrects 1 error. General: $r$ parity bits ⇒ block $2^r-1$, data $2^r-1-r$.

Reed–Solomon cheat sheet

Fact Value
Symbols elements of $\mathrm{GF}(q)$, usually $q=2^8$ (one byte = one symbol)
Message coefficients of $m(x)$, $\deg < k$
Codeword $(m(\alpha_1),\dots,m(\alpha_n))$ at $n$ distinct points
Minimum distance $d_{\min} = n-k+1$ (meets Singleton bound; MDS)
Corrects $t = \lfloor (n-k)/2\rfloor$ symbol errors
Why bursts a long bit-burst spans only a few symbols
Why a field "$\le k-1$ roots" needs no zero divisors — must be $\mathrm{GF}(2^8)$, not $\mathbb{Z}_{256}$

Examples: CD (CIRC) RS$(32,28)$ → $t=2$ symbols/block; NASA RS$(255,223)$ → $d_{\min}=33$, $t=16$.


Key theorems (with the engine of each proof)

Theorem Statement Proof engine
38.1 Hamming distance is a metric triangle ineq: prove per-position, then sum
38.2 detect $\le t$ $\iff d_{\min}\ge t+1$ a corruption hits another codeword iff two codewords within $t$
38.3 correct $\le t$ $\iff d_{\min}\ge 2t+1$ triangle ineq ⇒ radius-$t$ balls disjoint
38.4 linear: $d_{\min} = $ min nonzero weight $d(x,y)=\operatorname{wt}(x\oplus y)$ + closure under XOR
RS $d_{\min}=n-k+1$ nonzero deg-$

Common pitfalls

  • ⚠️ $d_{\min}=3$ corrects 1 OR detects 2 — never both. A double error is miscorrected silently (syndrome points at a third, innocent position). For "correct 1 + detect 2," use $d_{\min}=4$ (SECDED).
  • ⚠️ Reed–Solomon over $\mathbb{Z}_{256}$ is broken — zero divisors ($2\cdot128\equiv0$) let polynomials have too many roots, collapsing the distance bound. Must be the field $\mathrm{GF}(2^8)$.
  • ⚠️ No free correction: more reliability = lower rate $R=k/n$. Always quote the rate when you quote the correction power.
  • ⚠️ Detection $\ne$ correction. Checksums/CRC (Ch. 26) detect; they carry no location info, so they cannot correct.
  • ⚠️ Minimum distance is set by the closest pair of codewords, not the average — one bad pair caps the whole code.

Toolkit additions (dmtoolkit/coding.py)

Function Built in Does
hamming_distance(a, b) Ch. 26 # differing positions
hamming_encode(bits) Ch. 26 4 data bits → 7-bit codeword
hamming_decode(bits) Ch. 26 correct ≤1 error → (data, error_pos)
syndrome(H, word) Ch. 38 $Hy^{\mathsf T}$ over GF(2); names the error position for Hamming
min_distance(codewords) Ch. 38 min nonzero weight = $d_{\min}$ (linear codes, Thm 38.4)

Builds toward capstone Track D (a working error-correcting code over a noisy channel).


  • Builds on: Ch. 24 (finite fields $\mathrm{GF}(2)$, $\mathrm{GF}(2^8)$ — the structure under linear & RS codes); Ch. 26 (parity, checksums, CRC, bursts, polynomials over $\mathbb{Z}_2$ — error detection).
  • Sets up: Ch. 39 capstone Track D.
  • Recurring themes: abstraction (a code is a subspace; decoding is linear algebra); proofs guarantee correctness (distance gives theorems, not "probably"); discrete math is the language of CS (CDs, QR, ECC RAM, Voyager).