Chapter 38 — Key Takeaways (Coding Theory)

DataField.Dev

Chapter 38 — Key Takeaways (Coding Theory)

Re-grounding card. Skim before an exam: definitions, the two theorems, the decision rules, the pitfalls.

The one idea

Spend redundancy to buy distance; let distance absorb noise. A code makes most bit-patterns illegal so corruption becomes visible (detection) or repairable (correction). The single number that decides a code's power is its minimum distance $d_{\min}$.

Core definitions

Term	Definition	Note
Code $C$	A chosen subset of length-$n$ strings over an alphabet	$(n,M)$ code: $M$ codewords
Codeword	A member of $C$ (a "legal" string)	unused strings = illegal
Hamming distance $d(x,y)$	# positions where $x,y$ differ $= \operatorname{wt}(x\oplus y)$	a metric (Thm 38.1)
Weight $\operatorname{wt}(z)$	# nonzero positions of $z$	$d(x,y)=\operatorname{wt}(x\oplus y)$
Minimum distance $d_{\min}$	$\min\{d(x,y): x\ne y \in C\}$	THE design parameter
Code rate $R$	$R = k/n = (\log_2 M)/n$	message-fraction of a block
Linear code $[n,k]$	A subspace of $\mathrm{GF}(2)^n$ (contains $\mathbf 0$, closed under XOR)	$2^k$ codewords
Generator matrix $G$	$k\times n$, rows a basis; encode $c = mG$	"how to build a codeword"
Parity-check matrix $H$	$(n-k)\times n$; $c\in C \iff Hc^{\mathsf T}=\mathbf 0$	"is it legal / what broke"
Syndrome $s$	$s = Hy^{\mathsf T} = He^{\mathsf T}$ (error only)	$\mathbf 0$ ⇒ no detected error
Reed–Solomon	evaluations of a degree-$	MDS; $d_{\min}=n-k+1$

The two threshold theorems (memorize as a pair)

Goal	Need on $d_{\min}$	Why
Detect $\le t$ errors	$d_{\min} \ge t+1$	no $\le t$ flips reach another codeword
Correct $\le t$ errors	$d_{\min} \ge 2t+1$	radius-$t$ balls around codewords stay disjoint
Correct $t_c$ AND detect $t_d$	$d_{\min} \ge t_c+t_d+1$	combined mode (e.g. SECDED)

Read it backward (given $d_{\min}=d$):

$$\text{detects } d-1 \text{ errors}, \qquad \text{corrects } \left\lfloor \frac{d-1}{2}\right\rfloor \text{ errors.}$$

Decision aid — "which code / what guarantee?"

Situation	Use / conclude
Just need to notice corruption, cheapest possible	parity bit ($d_{\min}=2$, detect 1) — see Ch. 26
Correct any single bit error, efficiently	Hamming$(7,4)$, $d_{\min}=3$, rate $4/7$
Correct 1 and detect 2 (ECC RAM)	extended Hamming / SECDED, $d_{\min}=4$
Burst errors (scratch, fade, smudge)	Reed–Solomon over $\mathrm{GF}(2^8)$ (count symbols)
Want to know a linear code's power fast	compute min weight of nonzero codewords (Thm 38.4)
Want $d_{\min}$ from $H$	fewest columns of $H$ that XOR to $\mathbf 0$

Hamming(7,4) cheat sheet

Layout (positions 1–7): $[p_1,\ p_2,\ d_1,\ p_4,\ d_2,\ d_3,\ d_4]$. Parity at powers of two (1,2,4); data at 3,5,6,7.
Encode: $$p_1 = d_1\oplus d_2\oplus d_4,\quad p_2 = d_1\oplus d_3\oplus d_4,\quad p_4 = d_2\oplus d_3\oplus d_4.$$
Decode (syndrome): $$s_1 = c_1\oplus c_3\oplus c_5\oplus c_7,\ \ s_2 = c_2\oplus c_3\oplus c_6\oplus c_7,\ \ s_4 = c_4\oplus c_5\oplus c_6\oplus c_7.$$ Read $s_4 s_2 s_1$ as binary $=$ error position (0 ⇒ clean). Flip that bit; read data at 3,5,6,7.
Parity-check matrix (column $j$ = $j$ in binary): $$H = \begin{pmatrix} 0&0&0&1&1&1&1\\ 0&1&1&0&0&1&1\\ 1&0&1&0&1&0&1\end{pmatrix}.$$
Worked example: data $1011 \to$ codeword $[0,1,1,0,0,1,1]$; flip pos 5 → receive $[0,1,1,0,1,1,1]$ → syndrome $101_2=5$ → recover $1011$.
$d_{\min}=3$ → corrects 1 error. General: $r$ parity bits ⇒ block $2^r-1$, data $2^r-1-r$.

Reed–Solomon cheat sheet

Fact	Value
Symbols	elements of $\mathrm{GF}(q)$, usually $q=2^8$ (one byte = one symbol)
Message	coefficients of $m(x)$, $\deg < k$
Codeword	$(m(\alpha_1),\dots,m(\alpha_n))$ at $n$ distinct points
Minimum distance	$d_{\min} = n-k+1$ (meets Singleton bound; MDS)
Corrects	$t = \lfloor (n-k)/2\rfloor$ symbol errors
Why bursts	a long bit-burst spans only a few symbols
Why a field	"$\le k-1$ roots" needs no zero divisors — must be $\mathrm{GF}(2^8)$, not $\mathbb{Z}_{256}$

Examples: CD (CIRC) RS$(32,28)$ → $t=2$ symbols/block; NASA RS$(255,223)$ → $d_{\min}=33$, $t=16$.

Key theorems (with the engine of each proof)

Theorem	Statement	Proof engine
38.1	Hamming distance is a metric	triangle ineq: prove per-position, then sum
38.2	detect $\le t$ $\iff d_{\min}\ge t+1$	a corruption hits another codeword iff two codewords within $t$
38.3	correct $\le t$ $\iff d_{\min}\ge 2t+1$	triangle ineq ⇒ radius-$t$ balls disjoint
38.4	linear: $d_{\min} = $ min nonzero weight	$d(x,y)=\operatorname{wt}(x\oplus y)$ + closure under XOR
RS	$d_{\min}=n-k+1$	nonzero deg-$

Common pitfalls

⚠️ $d_{\min}=3$ corrects 1 OR detects 2 — never both. A double error is miscorrected silently (syndrome points at a third, innocent position). For "correct 1 + detect 2," use $d_{\min}=4$ (SECDED).
⚠️ Reed–Solomon over $\mathbb{Z}_{256}$ is broken — zero divisors ($2\cdot128\equiv0$) let polynomials have too many roots, collapsing the distance bound. Must be the field $\mathrm{GF}(2^8)$.
⚠️ No free correction: more reliability = lower rate $R=k/n$. Always quote the rate when you quote the correction power.
⚠️ Detection $\ne$ correction. Checksums/CRC (Ch. 26) detect; they carry no location info, so they cannot correct.
⚠️ Minimum distance is set by the closest pair of codewords, not the average — one bad pair caps the whole code.

Toolkit additions (`dmtoolkit/coding.py`)

Function	Built in	Does
`hamming_distance(a, b)`	Ch. 26	# differing positions
`hamming_encode(bits)`	Ch. 26	4 data bits → 7-bit codeword
`hamming_decode(bits)`	Ch. 26	correct ≤1 error → (data, error_pos)
`syndrome(H, word)`	Ch. 38	$Hy^{\mathsf T}$ over GF(2); names the error position for Hamming
`min_distance(codewords)`	Ch. 38	min nonzero weight = $d_{\min}$ (linear codes, Thm 38.4)

Builds toward capstone Track D (a working error-correcting code over a noisy channel).

Cross-links

Builds on: Ch. 24 (finite fields $\mathrm{GF}(2)$, $\mathrm{GF}(2^8)$ — the structure under linear & RS codes); Ch. 26 (parity, checksums, CRC, bursts, polynomials over $\mathbb{Z}_2$ — error detection).
Sets up: Ch. 39 capstone Track D.
Recurring themes: abstraction (a code is a subspace; decoding is linear algebra); proofs guarantee correctness (distance gives theorems, not "probably"); discrete math is the language of CS (CDs, QR, ECC RAM, Voyager).

Chapter 38 — Key Takeaways (Coding Theory)

The one idea

Core definitions

The two threshold theorems (memorize as a pair)

Decision aid — "which code / what guarantee?"

Hamming(7,4) cheat sheet

Reed–Solomon cheat sheet

Key theorems (with the engine of each proof)

Common pitfalls

Toolkit additions (dmtoolkit/coding.py)

Cross-links

Toolkit additions (`dmtoolkit/coding.py`)