Chapter 38: Coding Theory — Error-Correcting Codes

DataField.Dev

42 min read

> "We are not interested in the fact that the brain has the consistency of cold porridge."

Prerequisites

24
26

Learning Objectives

Quantify a code's error-detection and error-correction power using Hamming distance and minimum distance, and prove the two threshold theorems that connect them.
Distinguish error-detecting from error-correcting codes and choose the right one for a given channel and budget.
Encode and decode with a Hamming code, and explain syndrome decoding as a lookup keyed by the parity-check matrix.
Define a linear code over GF(2) by its generator and parity-check matrices, and compute its minimum distance from the matrix.
Explain how Reed–Solomon codes over larger finite fields correct burst errors in QR codes, CDs, and deep-space links.

In This Chapter

Overview
Learning Paths
38.1 The noisy channel problem
38.2 Hamming distance and code rate
38.3 Error-detecting vs. error-correcting codes
38.4 Hamming codes
38.5 Linear codes over GF(2)
38.6 Real codes: Reed–Solomon (QR codes, CDs, deep space)
Project Checkpoint: completing coding.py
Summary
Spaced Review
What's Next

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 38: Coding Theory — Error-Correcting Codes

"We are not interested in the fact that the brain has the consistency of cold porridge." — Alan Turing (on which details of a system matter for what it computes; here, on why we can ignore the physics of a channel and reason only about its bits)

Overview

Every wire lies a little. Every disk forgets a little. Every photon that carries a bit from a Mars rover to a dish in California has a chance of arriving flipped. If you have ever wondered how a scratched CD still plays, how a smudged QR code still scans, how a file copied across a flaky network still arrives byte-for-byte correct, or how ECC memory in a server shrugs off a cosmic ray — you have been wondering about coding theory. This is the chapter where the abstract machinery you have been building all book pays off in a system you use a hundred times a day without noticing.

Here is the problem, stated the way an engineer would. You want to send a message across a noisy channel — any medium that may corrupt some of the bits in transit. You cannot make the channel perfect; physics will not allow it. So you do the only thing you can: you send more bits than the message strictly needs, arranged so cleverly that the receiver can either notice that corruption happened (error detection) or even repair it without asking you to resend (error correction). The art is doing this with as little extra data as possible. That trade-off — redundancy against reliability — is the whole subject, and it turns out to be a problem in pure discrete mathematics: distances between strings, subspaces of vector spaces over finite fields, and the algebra of polynomials.

In Chapter 26 you learned to detect errors with checksums, parity bits, and CRCs — a mismatch tells you something broke, but not what or where. This chapter crosses the line from detection to correction. And it cashes the check written in Chapter 24: the groups, rings, and finite fields that felt abstract become the exact structure that makes correction efficient. A linear code is a subspace. Decoding is solving a linear system. Reed–Solomon codes — the ones in your phone's camera and on every CD ever pressed — are polynomials over a finite field. The algebra was never abstract; it was a blueprint.

In this chapter, you will learn to:

Model a transmission as a string sent through a channel that flips bits, and measure the damage with Hamming distance.
Compute a code's minimum distance and use it to read off, by two short theorems, exactly how many errors the code can detect and correct.
Tell error-detecting codes from error-correcting codes, and reason about the code rate you pay for each.
Build and run a Hamming code by hand and in code, and understand syndrome decoding as a single multiplication by the parity-check matrix.
Describe a linear code over $\mathrm{GF}(2)$ by its generator and parity-check matrices and extract its minimum distance from the matrix alone.
Explain, at the level of structure, how Reed–Solomon codes over $\mathrm{GF}(2^8)$ correct the bursts of errors that scratches and signal fades actually produce.

Learning Paths

🏃 Fast Track: If you only need the working knowledge, read 38.1–38.2 (distance and rate), the two theorems in 38.3, and the Hamming-code mechanics in 38.4. Do the ⭐⭐ exercises. You will be able to encode/decode a Hamming code and say what any code can correct.

📖 Standard Path: Read in order. Prove the two threshold theorems in 38.3 yourself before reading the proofs. Work every 🔄 Check Your Understanding. Section 38.5 (linear codes) is where detection and correction become one idea — do not skip it.

🔬 Deep Dive: After the chapter, prove that for a linear code the minimum distance equals the minimum weight of a nonzero codeword (we prove it in 38.5), then show the Singleton bound $d \le n - k + 1$ and verify Reed–Solomon meets it with equality (Exercise set, Part E). Connect back to Chapter 24: a Reed–Solomon code is the evaluation map of a polynomial, and its distance is the number-of-roots bound for polynomials over a field.

38.1 The noisy channel problem

Start with a concrete failure. You are copying a 4-bit value, say 1011, across a connection that, every so often, flips a single bit. The receiver gets 1111. It has no way to know anything went wrong — 1111 is a perfectly legal 4-bit value. Garbage in, garbage trusted. The fundamental difficulty is that when every bit pattern is a valid message, every corruption produces another valid message, and the receiver is helpless.

The escape is to make most bit patterns illegal. If you agree in advance that only certain patterns are "real" messages, then a corruption that lands on an illegal pattern is detectable, and a corruption that lands near exactly one real pattern is correctable — you decode to the nearest legal pattern. Coding theory is the study of how to choose the legal patterns well.

Let us fix vocabulary so we can be precise for the rest of the chapter.

Definitions (code, codeword, block code). Fix an alphabet — for almost all of this chapter the binary alphabet $\{0,1\}$. A code $C$ is a chosen subset of all strings of a given length $n$ over that alphabet; the strings in $C$ are its codewords, and $n$ is the block length. A code with $\lvert C\rvert = M$ codewords of length $n$ is called an $(n, M)$ code. We encode by mapping each possible message to a distinct codeword, transmit the codeword, and decode the (possibly corrupted) received string back to a message.

The set of all length-$n$ binary strings has $2^n$ elements; a code uses only some of them. The unused strings are the "illegal" patterns that make detection and correction possible. If $\lvert C\rvert = 2^k$, then each codeword carries $k$ bits of real message in $n$ transmitted bits — the other $n-k$ bits are redundancy, the price of reliability.

💡 Intuition: Think of the $2^n$ possible strings as points in space, and the codewords as a sparse constellation of "stars." A received string is a point that may have drifted from the star it was sent as. If the stars are spread far apart, a small drift still leaves you closer to the original star than to any other, and you can navigate home. Coding theory is the geometry of spreading stars far apart while not using too many of them.

To reason quantitatively we need a model of the channel. The simplest, and the one we use throughout, is the binary symmetric channel: each transmitted bit is independently flipped with some probability $p < 1/2$, and otherwise passes through unchanged. "Symmetric" means a 0 is as likely to flip to a 1 as a 1 is to flip to a 0. This is a deliberate, useful idealization (a Tier-3 model — real channels have memory and bursts, which is exactly why §38.6 exists), but it makes the core question sharp: given that few bits flip, how do we arrange codewords so the receiver can recover?

🔗 Connection. The "make most patterns illegal so corruption is visible" idea is exactly what a parity bit did in Chapter 26: appending one parity bit makes half of all $(k+1)$-bit strings illegal (the ones with the wrong parity), so any single flip lands on an illegal string and is detected. A parity-checked code is the smallest interesting error-detecting code. This chapter generalizes that one bit of cleverness into a full theory — and pushes past detection into correction.

38.2 Hamming distance and code rate

To make "stars spread far apart" precise we need a notion of distance between strings. It is the single most important definition in the chapter, and you have already met it operationally in Chapter 26's coding.py; here it gets its canonical home.

Hamming distance

Consider two codewords of the same length, 10110 and 10011. Walk down them position by position: they agree at positions 1, 2, 5 and differ at positions 3 and 4. Two disagreements. That count is the distance.

Definition (Hamming distance). The Hamming distance $d(x, y)$ between two strings $x$ and $y$ of the same length is the number of positions at which they differ. Equivalently, over $\mathrm{GF}(2)$, it is the number of 1s in their bitwise XOR: $d(x,y) = \operatorname{wt}(x \oplus y)$, where the weight $\operatorname{wt}(z)$ of a string is its number of nonzero positions.

Hamming distance counts exactly the thing the channel does to you: if the channel flips $t$ bits of a transmitted codeword $x$, the received string $y$ satisfies $d(x, y) = t$. Distance is error count. That is why this measure, and not some other, is the right ruler.

It is no accident that $d$ behaves like a real distance. The following fact lets us reason geometrically — in particular, it licenses the "decode to the nearest codeword" strategy.

Theorem 38.1 (Hamming distance is a metric). For all equal-length strings $x, y, z$: 1. $d(x,y) \ge 0$, with $d(x,y) = 0$ if and only if $x = y$ (identity of indiscernibles); 2. $d(x,y) = d(y,x)$ (symmetry); 3. $d(x,z) \le d(x,y) + d(y,z)$ (the triangle inequality).

The strategy first. Properties 1 and 2 are immediate from the definition — "number of differing positions" can't be negative, is zero exactly when nothing differs, and doesn't care about order. The triangle inequality is the only one with content. The key idea: work one position at a time. The total distance is a sum over positions, so if we can prove the inequality for each single position separately, summing the per-position inequalities gives the whole thing. At a single position there are only a handful of cases, and we can just check them.

Proof. Properties 1 and 2 follow directly: $d$ counts a set of positions, so it is a nonnegative integer; it is $0$ exactly when the set of differing positions is empty, i.e. $x = y$; and "positions where $x,y$ differ" is the same set as "positions where $y,x$ differ."

For property 3, write $d_i(a,b) = 1$ if strings $a, b$ differ at position $i$ and $0$ otherwise, so that $d(a,b) = \sum_i d_i(a,b)$. Fix a position $i$. We claim $$d_i(x,z) \le d_i(x,y) + d_i(y,z).$$ If $x$ and $z$ agree at position $i$, the left side is $0$ and the inequality holds trivially (the right side is a sum of nonnegative terms). If $x$ and $z$ differ at position $i$, the left side is $1$; but then $y$ cannot equal both $x$ and $z$ at position $i$, so at least one of $d_i(x,y), d_i(y,z)$ is $1$, making the right side at least $1$. Either way the per-position inequality holds. Summing over all positions $i$, $$d(x,z) = \sum_i d_i(x,z) \le \sum_i \big(d_i(x,y) + d_i(y,z)\big) = d(x,y) + d(y,z). \qquad \blacksquare$$

Here is the definition in code — identical to the coding.py version, restated so the chapter is self-contained:

def hamming_distance(a, b):
    """Number of positions where equal-length bit lists a and b differ."""
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
# Expected output:
# 2

Minimum distance — the single number that decides everything

A code is only as strong as its closest pair of codewords. If two legal codewords are distance 1 apart, a single flip can turn one into the other undetectably. So the quantity that governs a code's power is the smallest distance occurring anywhere in the code.

Definition (minimum distance). The minimum distance of a code $C$ with at least two codewords is $$d_{\min}(C) = \min\{\, d(x, y) : x, y \in C,\ x \ne y \,\}.$$ An $(n, M)$ code with minimum distance $d$ is called an $(n, M, d)$ code. The minimum distance is the design parameter: §38.3 proves that it alone determines how many errors the code detects and corrects.

Code rate — what reliability costs

Redundancy is not free; every check bit is a bit you transmitted that was not message. The efficiency of a code is captured by its rate.

Definition (code rate). For a code with $M = 2^k$ codewords of block length $n$, the code rate is $$R = \frac{k}{n} = \frac{\log_2 M}{n},$$ the fraction of each transmitted block that is genuine message. A rate of $1$ means no redundancy (and no protection); lower rates buy more protection at the cost of throughput.

For example, the triple-repetition code that sends each bit three times (0 → 000, 1 → 111) has $k = 1$, $n = 3$, so $R = 1/3$: two-thirds of what you transmit is redundancy. The Hamming code of §38.4 sends 4 message bits in 7 transmitted bits, rate $4/7 \approx 0.57$ — far more efficient, for the same single-error-correcting power. Better mathematics buys back throughput.

🔄 Check Your Understanding 1. Compute $d(\texttt{11001}, \texttt{10101})$ and the weight $\operatorname{wt}(\texttt{10110})$. 2. The repetition code $\{\texttt{000}, \texttt{111}\}$ has what minimum distance, and what rate? 3. A channel flips at most 1 bit per block. Two codewords are distance 1 apart. Give a single received string that the receiver cannot unambiguously decode.

Answers

$d(\texttt{11001}, \texttt{10101}) = 2$ (differ at positions 2 and 3). $\operatorname{wt}(\texttt{10110}) = 3$. 2. $d_{\min} = d(\texttt{000},\texttt{111}) = 3$; rate $R = 1/3$. 3. Say the codewords are $\texttt{000}$ and $\texttt{001}$ (distance 1). If the receiver gets $\texttt{001}$, it could be $\texttt{001}$ sent cleanly or $\texttt{000}$ with position 3 flipped — both are within 1 flip, so decoding is ambiguous. (Indeed $d_{\min}=1$ means the code corrects zero errors, by the theorem we are about to prove.)

38.3 Error-detecting vs. error-correcting codes

Now the payoff: two short, beautiful theorems that turn the single number $d_{\min}$ into exact guarantees. They are the reason coding theorists talk about minimum distance and almost nothing else when sizing a code.

Detection

First, detection. The receiver detects an error if the received string is not a codeword — it can tell that something broke, even if it cannot fix it. When does a code guarantee this for up to $t$ errors?

Theorem 38.2 (Detection threshold). A code $C$ can detect all error patterns of weight at most $t$ if and only if its minimum distance satisfies $d_{\min} \ge t + 1$.

The strategy first. "Detect up to $t$ errors" means: no codeword, corrupted in $\le t$ positions, can land exactly on another codeword (if it did, the receiver would see a legal string and suspect nothing). So detection fails precisely when some $\le t$ flips carry one codeword to another — i.e. when two codewords are within distance $t$. The cleanest proof is by contraposition in one direction and a direct construction in the other; we phrase both directions around that single observation.

Proof. ($\Leftarrow$) Suppose $d_{\min} \ge t+1$. Take any codeword $x$ and corrupt it in at most $t$ positions to get $y$, so $d(x,y) \le t$. If $y$ were a (different) codeword, then $x$ and $y$ would be two distinct codewords at distance $\le t < t+1 \le d_{\min}$, contradicting the definition of minimum distance. So $y$ is not a codeword, and the receiver sees an illegal string — the error is detected.

($\Rightarrow$) Suppose, contrapositively, that $d_{\min} \le t$. Then there exist distinct codewords $x, y$ with $d(x,y) = d_{\min} \le t$. Sending $x$ and flipping exactly the $d(x,y)$ positions where it differs from $y$ turns $x$ into $y$ — a legal codeword — using at most $t$ flips. The receiver sees $y$, a perfectly valid codeword, and detects nothing. So the code fails to detect this weight-$\le t$ error. $\blacksquare$

A parity bit gives $d_{\min} = 2$ (any two codewords differ in at least two positions: change one data bit and the parity bit must change too). By Theorem 38.2 it detects all single-bit errors ($t = 1$), and indeed misses some double-bit errors — exactly the behavior Chapter 26 reported.

Correction

Correction is stronger: the receiver must not only notice the error but identify the original codeword. The natural strategy is nearest-codeword decoding (also called minimum-distance decoding): decode a received string to the codeword closest to it in Hamming distance. The question is when that strategy is guaranteed to be correct.

Theorem 38.3 (Correction threshold). A code $C$ can correct all error patterns of weight at most $t$ under nearest-codeword decoding if and only if its minimum distance satisfies $d_{\min} \ge 2t + 1$.

The strategy first. Picture a ball of radius $t$ (in Hamming distance) around each codeword — all strings within $t$ flips of it. Nearest-codeword decoding succeeds for $\le t$ errors exactly when these balls do not overlap: a received string with $\le t$ errors sits inside the ball of the codeword that was sent, and if no other codeword's ball reaches it, the sent codeword is unambiguously the nearest. The balls of radius $t$ around two codewords at distance $d$ are disjoint precisely when $d \ge 2t+1$ — that is the triangle inequality doing its job. So the whole theorem reduces to "the balls don't collide iff $d_{\min} \ge 2t+1$."

Proof. ($\Leftarrow$) Suppose $d_{\min} \ge 2t + 1$. Send codeword $x$; the channel produces $y$ with $d(x,y) \le t$. Let $x'$ be any other codeword. By the triangle inequality (Theorem 38.1), $$d(x, x') \le d(x, y) + d(y, x'),$$ so $$d(y, x') \ge d(x, x') - d(x, y) \ge d_{\min} - t \ge (2t+1) - t = t + 1 > t \ge d(x, y).$$ Thus every other codeword $x'$ is strictly farther from $y$ than $x$ is. The unique nearest codeword to $y$ is $x$, so nearest-codeword decoding returns $x$ — the error is corrected.

($\Rightarrow$) Suppose $d_{\min} \le 2t$. Pick distinct codewords $x, x'$ with $d(x, x') = d_{\min} \le 2t$. Build a string $y$ from $x$ by flipping it toward $x'$ in $\lceil d(x,x')/2 \rceil \le t$ of the positions where they differ. Then $d(x, y) \le t$, and $d(x', y) = d(x,x') - d(x,y) \le \lfloor d(x,x')/2\rfloor \le t$, so $y$ is within distance $t$ of both codewords. A received $y$ this close to two codewords cannot be decoded unambiguously to the one that was sent — correction of this weight-$\le t$ error is impossible. $\blacksquare$

These two theorems are worth memorizing as a pair, because they answer the only two questions a system designer asks:

Goal	Requirement on minimum distance	Mnemonic
Detect up to $t$ errors	$d_{\min} \ge t + 1$	one extra to notice a slide
Correct up to $t$ errors	$d_{\min} \ge 2t + 1$	twice plus one — room on both sides

🚪 Threshold Concept. This is the idea that makes reliable computing possible despite unreliable hardware. Notice what just happened: we converted an analog, physical, probabilistic problem ("the wire sometimes glitches") into a single integer ($d_{\min}$) and two clean theorems. We never had to make the channel perfect. Instead we engineered enough distance between legal messages that the channel's mistakes fall harmlessly into the gaps. Every reliable digital system — RAID arrays, ECC RAM, the Voyager probes still phoning home from interstellar space, the QR code on a coffee cup — runs on this one move: spend redundancy to buy distance, and let distance absorb noise. Once you see it, you see it everywhere.

🔄 Check Your Understanding 1. The triple-repetition code has $d_{\min} = 3$. How many errors can it detect? How many can it correct? 2. You need to correct any 2-bit error. What is the minimum $d_{\min}$ you must design for? 3. A code has $d_{\min} = 4$. Can it correct 1 error and detect 2 errors simultaneously? (Hint: a code can be used in a combined mode where it corrects up to $t_c$ and detects up to $t_d \ge t_c$ when $d_{\min} \ge t_c + t_d + 1$.)

Answers

Detect up to $d_{\min}-1 = 2$ errors; correct up to $\lfloor (d_{\min}-1)/2 \rfloor = 1$ error. 2. $d_{\min} \ge 2(2)+1 = 5$. 3. Yes: with $t_c = 1$, $t_d = 2$ we need $d_{\min} \ge 1 + 2 + 1 = 4$, which holds with equality. This "correct 1, detect 2" mode (SECDED) is exactly how ECC server memory is run — see §38.4.

38.4 Hamming codes

The repetition code corrects one error but wastes two-thirds of the channel. In 1950, Richard Hamming — frustrated by a weekend's computation lost to a card-reader error that the machine detected but could not fix — asked the obvious next question: what is the most efficient single-error-correcting code? His answer is one of the most elegant constructions in all of computer science, and you have already been using its (7,4) instance in coding.py. Now we understand why it works.

Definition (Hamming code). A Hamming code is a single-error-correcting linear code in which the parity-check positions are placed at the powers of two ($1, 2, 4, 8, \dots$) and each parity bit checks exactly those positions whose index has a particular bit set. The $\mathrm{Hamming}(7,4)$ code encodes $4$ data bits into a $7$-bit codeword and corrects any single-bit error. In general, $r$ parity bits protect a block of length $2^r - 1$ carrying $2^r - 1 - r$ data bits.

The construction, by its central trick

Here is Hamming's idea, and it is genuinely clever. Number the $7$ positions of the codeword $1$ through $7$, and write each position number in binary:

position	binary	bit values
1	001	$s_1$
2	010	$s_2$
3	011	$s_2, s_1$
4	100	$s_4$
5	101	$s_4, s_1$
6	110	$s_4, s_2$
7	111	$s_4, s_2, s_1$

Put a parity bit at each position that is a pure power of two — positions 1, 2, 4 — and a data bit at each remaining position — 3, 5, 6, 7. (This matches coding.py exactly: codeword $= [p_1, p_2, d_1, p_4, d_2, d_3, d_4]$ at positions $1\dots 7$.) Now set:

parity $p_1$ (position 1, binary $001$) to make even parity over all positions whose binary has the $1$s-bit set: positions $1, 3, 5, 7$.
parity $p_2$ (position 2, binary $010$) over all positions with the $2$s-bit set: positions $2, 3, 6, 7$.
parity $p_4$ (position 4, binary $100$) over all positions with the $4$s-bit set: positions $4, 5, 6, 7$.

So, writing $d_1, d_2, d_3, d_4$ for the data bits at positions $3,5,6,7$: $$p_1 = d_1 \oplus d_2 \oplus d_4, \qquad p_2 = d_1 \oplus d_3 \oplus d_4, \qquad p_4 = d_2 \oplus d_3 \oplus d_4.$$

💡 Intuition: Why powers of two and why "binary index"? Because it makes decoding self-locating. When the receiver recomputes the three parities, each parity is either satisfied (0) or violated (1). Read those three results as a 3-bit binary number $s_4 s_2 s_1$ — and that number is the position of the flipped bit. A single error at position $j$ violates exactly the parities whose bit is set in $j$'s binary expansion, which spells out $j$. The code doesn't just detect the error; it announces its address.

Syndrome decoding

The three recomputed parities are called the syndrome of the received word.

Definition (syndrome). The syndrome of a received word is the vector of recomputed parity-check results. For the Hamming code it is the 3-bit value $s_4 s_2 s_1$ where $s_1, s_2, s_4$ are the parities over positions $\{1,3,5,7\}, \{2,3,6,7\}, \{4,5,6,7\}$ respectively. A syndrome of $0$ means "no detected error"; a nonzero syndrome, read as a binary number, names the position to flip.

Let us run the canonical example by hand, the same one in coding.py. Encode the data $d_1 d_2 d_3 d_4 = 1011$ (data at positions 3,5,6,7): $$p_1 = 1 \oplus 0 \oplus 1 = 0, \quad p_2 = 1 \oplus 1 \oplus 1 = 1, \quad p_4 = 0 \oplus 1 \oplus 1 = 0.$$ The codeword, positions $1\dots 7$, is $[p_1, p_2, d_1, p_4, d_2, d_3, d_4] = [0,1,1,0,0,1,1]$.

Now suppose the channel flips position 5, so the receiver gets $[0,1,1,0,1,1,1]$. Recompute the syndrome: $$s_1 = c_1 \oplus c_3 \oplus c_5 \oplus c_7 = 0\oplus 1\oplus 1\oplus 1 = 1,$$ $$s_2 = c_2 \oplus c_3 \oplus c_6 \oplus c_7 = 1\oplus 1\oplus 1\oplus 1 = 0,$$ $$s_4 = c_4 \oplus c_5 \oplus c_6 \oplus c_7 = 0\oplus 1\oplus 1\oplus 1 = 1.$$ The syndrome $s_4 s_2 s_1 = 101_2 = 5$. Flip position 5, recover $[0,1,1,0,0,1,1]$, read off the data at positions 3,5,6,7: $1,0,1,1$. The original message, restored — and the receiver never had to ask for a resend.

def hamming_decode(code):
    """Correct up to one bit error; return (data_bits, error_position)."""
    c = code[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2,3,6,7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4,5,6,7
    syndrome = s4 * 4 + s2 * 2 + s1  # binary s4 s2 s1 = bad position
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the indicated bit
    return [c[2], c[4], c[5], c[6]], syndrome

print(hamming_decode([0, 1, 1, 0, 1, 1, 1]))  # position 5 corrupted
# Expected output:
# ([1, 0, 1, 1], 5)

Why the Hamming code has minimum distance 3

Everything above assumes the code corrects one error. By Theorem 38.3 that requires $d_{\min} \ge 3$. Chapter 26 asserted this and deferred the proof to here. Time to pay the debt.

The strategy first. We want the smallest distance between two distinct codewords. A slicker route uses a fact we will prove in §38.5: for a linear code, the minimum distance equals the minimum weight of a nonzero codeword. The Hamming code is linear (the parity equations are linear over $\mathrm{GF}(2)$), so it suffices to show no nonzero codeword has weight 1 or 2 — equivalently, that every weight-1 and weight-2 pattern violates some parity check. And the syndrome machinery hands us that directly: a single error at position $j$ produces syndrome $= j \ne 0$, so no weight-1 string is a codeword; and we show no weight-2 string is either.

Proof that $d_{\min}(\mathrm{Hamming}(7,4)) = 3$. Because the code is linear (§38.5), $d_{\min}$ equals the minimum weight of a nonzero codeword, so we find the lightest nonzero codeword.

No codeword has weight 1. A weight-1 string has a single 1, at some position $j \in \{1,\dots,7\}$. Its syndrome is exactly $j$ written in binary (a single error at position $j$ flips precisely the parities whose bit is set in $j$). Since $1 \le j \le 7$, the syndrome is nonzero, so the parity checks are violated — a weight-1 string is not a codeword.

No codeword has weight 2. A weight-2 string has 1s at two distinct positions $i \ne j$. Its syndrome is the XOR of the two single-error syndromes, namely (binary $i$) $\oplus$ (binary $j$). Because $i \ne j$, their binary representations differ in at least one bit, so the XOR is nonzero — the checks are violated, and a weight-2 string is not a codeword.

A codeword of weight 3 exists. Take the data $d_1 d_2 d_3 d_4 = 1000$ (so $d_1 = 1$, the rest $0$). Then $p_1 = 1\oplus 0\oplus 0 = 1$, $p_2 = 1\oplus 0\oplus 0 = 1$, $p_4 = 0\oplus 0\oplus 0 = 0$, giving the codeword $[p_1,p_2,d_1,p_4,d_2,d_3,d_4] = [1,1,1,0,0,0,0]$, which has weight exactly $3$. (This is the dependency among columns $1,2,3$ of $H$: $001 \oplus 010 \oplus 011 = 000$.)

Since the minimum nonzero weight is neither 1 nor 2 but 3 is achieved, $d_{\min} = 3$. By Theorem 38.3 the code corrects all single-bit errors, and by Theorem 38.2 it detects all single- and double-bit errors (though it will miscorrect a double error — see the pitfall). $\blacksquare$

⚠️ Common Pitfall: A distance-3 code corrects 1 error or detects 2, but not both at once. If two bits flip, the syndrome is nonzero and points at some third position; the decoder "corrects" a bit that was never wrong, producing a wrong codeword silently. To get "correct 1 and detect 2" (SECDED — single-error-correct, double-error-detect), you add one more overall parity bit, raising the block to 8 bits and $d_{\min}$ to 4. That extended $\mathrm{Hamming}(8,4)$ is exactly what ECC server memory uses on every 64-bit word. Distance is destiny: $d_{\min}=3$ buys "correct 1," $d_{\min}=4$ buys "correct 1 and detect 2."

🔄 Check Your Understanding 1. Encode the data bits $d_1 d_2 d_3 d_4 = 1001$ with $\mathrm{Hamming}(7,4)$. 2. A receiver computes syndrome $s_4 s_2 s_1 = 110$. What does it do? 3. Why are the parity bits placed at positions $1, 2, 4$ rather than, say, at the end (positions 5, 6, 7)?

Answers

$d_1=1,d_2=0,d_3=0,d_4=1$: $p_1 = 1\oplus 0\oplus 1 = 0$, $p_2 = 1\oplus 0\oplus 1 = 0$, $p_4 = 0\oplus 0\oplus 1 = 1$. Codeword $[0,0,1,1,0,0,1]$. 2. $110_2 = 6$; flip position 6, then read data at 3,5,6,7. 3. Placing them at power-of-two positions is what makes the syndrome equal the error position. With parity at the end, the syndrome would still detect/locate errors but you'd need a lookup table mapping syndromes to positions; the power-of-two placement makes that table the identity, which is the whole elegance.

38.5 Linear codes over GF(2)

Hamming codes are a special case of a much larger and more powerful family. The unifying idea is to stop thinking of codewords as arbitrary strings and start thinking of them as vectors in a vector space over a finite field — and of the code itself as a subspace. This is where Chapter 24's algebra becomes the engine, not the scenery.

🔗 Connection. Recall from Chapter 24 that $\mathrm{GF}(2) = \mathbb{Z}_2 = \{0,1\}$ is the smallest finite field: addition is XOR ($1+1=0$), multiplication is AND, every nonzero element is its own inverse. A length-$n$ binary string is then a vector in $\mathrm{GF}(2)^n$, and all of linear algebra — vectors, subspaces, matrices, linear maps, dimension — applies, with arithmetic done mod 2. We do not redefine fields here; we use the structure Chapter 24 built.

The definition and its first payoff

Definition (linear code). A linear code over $\mathrm{GF}(2)$ is a code $C \subseteq \mathrm{GF}(2)^n$ that is a linear subspace: it contains the all-zeros word, and the XOR (sum over $\mathrm{GF}(2)$) of any two codewords is again a codeword. If $C$ has dimension $k$ as a subspace, it has $2^k$ codewords and is called an $[n, k]$ code (square brackets signal linearity), with rate $k/n$.

Linearity is not a minor convenience; it collapses the whole theory. Here is the first and most useful consequence, the fact we used twice already.

Theorem 38.4 (Minimum distance = minimum weight). For a linear code $C$, the minimum distance equals the minimum weight of a nonzero codeword: $$d_{\min}(C) = \min\{\, \operatorname{wt}(c) : c \in C,\ c \ne \mathbf{0} \,\}.$$

The strategy first. This is the move that turns an $O(M^2)$ search over all pairs of codewords into an $O(M)$ search over single codewords. The key fact: Hamming distance is translation-invariant over $\mathrm{GF}(2)$ — $d(x,y)$ depends only on the difference $x \oplus y$, since $d(x,y) = \operatorname{wt}(x \oplus y)$. And in a linear code, the difference of two codewords is itself a codeword. So "distances between pairs" and "weights of codewords" are literally the same set of numbers.

Proof. Recall $d(x,y) = \operatorname{wt}(x \oplus y)$ for all $x,y$.

($\le$) Let $c \ne \mathbf{0}$ be a codeword of minimum weight $w$. Since $C$ is linear, $\mathbf{0} \in C$, and $\mathbf{0}, c$ are two distinct codewords with $d(\mathbf{0}, c) = \operatorname{wt}(c) = w$. Hence $d_{\min} \le w$.

($\ge$) Let $x \ne y$ be two codewords achieving the minimum distance, so $d_{\min} = d(x,y) = \operatorname{wt}(x \oplus y)$. By linearity $z = x \oplus y$ is a codeword, and $z \ne \mathbf{0}$ because $x \ne y$. Thus $z$ is a nonzero codeword of weight $d_{\min}$, so the minimum nonzero weight is $\le d_{\min}$.

Combining, the minimum nonzero weight equals $d_{\min}$. $\blacksquare$

This is exactly the shortcut we invoked in §38.4: to find the Hamming code's minimum distance we hunted for the lightest nonzero codeword instead of comparing all $\binom{16}{2} = 120$ pairs.

Generator and parity-check matrices

A subspace can be described two dual ways: by a basis that spans it (the generator matrix) or by the linear equations that cut it out (the parity-check matrix). Both are central.

Definitions (generator matrix, parity-check matrix). A generator matrix $G$ of an $[n,k]$ linear code is a $k \times n$ matrix whose rows form a basis of the code; encoding a message row vector $m \in \mathrm{GF}(2)^k$ is the matrix product $c = mG$ (arithmetic mod 2). A parity-check matrix $H$ is an $(n-k) \times n$ matrix whose null space is the code: a string $c$ is a codeword if and only if $Hc^{\mathsf T} = \mathbf{0}$ (mod 2). The two are linked by $GH^{\mathsf T} = \mathbf{0}$.

The parity-check matrix is the powerhouse, because it makes the syndrome a matrix product.

Definition (syndrome, general). For a received word $y$, its syndrome is $s = Hy^{\mathsf T}$. If the channel added an error vector $e$ (so $y = c \oplus e$ for the sent codeword $c$), then since $Hc^{\mathsf T} = \mathbf 0$, $$s = H y^{\mathsf T} = H(c \oplus e)^{\mathsf T} = Hc^{\mathsf T} \oplus He^{\mathsf T} = He^{\mathsf T}.$$ The syndrome depends only on the error, not on which codeword was sent — so decoding reduces to: given $s$, find the lowest-weight $e$ with $He^{\mathsf T} = s$, and subtract it.

The Hamming$(7,4)$ code falls right out of this framework, and beautifully so. Its parity-check matrix has as its $j$-th column the binary representation of $j$: $$H = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix},$$ where the rows are the $s_4, s_2, s_1$ checks (top to bottom) and column $j$ is $j$ in binary. Now the magic of §38.4 is a one-line theorem: a single error at position $j$ is the error vector $e$ with a 1 in position $j$, and $He^{\mathsf T}$ picks out exactly column $j$ of $H$ — which is $j$ in binary. The syndrome is the error position because the columns of $H$ are the position numbers. The clever bit-placement of §38.4 was secretly just "make the columns of $H$ count from 1 to 7."

🔄 Check Your Understanding 1. Using the $H$ above, compute the syndrome of the received word $y = (0,1,1,0,1,1,1)$ (our running example) as $Hy^{\mathsf T}$, and confirm it matches §38.4. 2. Why does $d_{\min}$ equal the smallest number of columns of $H$ that are linearly dependent (i.e. XOR to zero)? 3. A linear $[n,k]$ code has how many codewords? What is its rate?

Answers

Row $s_4$ (cols 4–7): $0\oplus1\oplus1\oplus1 = 1$. Row $s_2$ (cols 2,3,6,7): $1\oplus1\oplus1\oplus1 = 0$. Row $s_1$ (cols 1,3,5,7): $0\oplus1\oplus1\oplus1 = 1$. Syndrome (top-to-bottom $s_4 s_2 s_1$) $= 101_2 = 5$ — matches. 2. A nonzero codeword $c$ of weight $w$ satisfies $Hc^{\mathsf T} = \mathbf 0$, i.e. the $w$ columns of $H$ where $c$ has 1s sum (XOR) to zero — they are linearly dependent. The lightest nonzero codeword corresponds to the smallest dependent set of columns, so $d_{\min}$ = minimum number of columns of $H$ that are linearly dependent. (For Hamming, all columns are distinct and nonzero, so no 1 or 2 columns are dependent, but some 3 are — hence $d_{\min}=3$, a second proof of §38.4.) 3. $2^k$ codewords; rate $k/n$.

Here is the linear machinery in code — encoding by $mG$ and checking membership by $Hc^{\mathsf T}$, with arithmetic mod 2 done from scratch:

def matvec_mod2(M, v):                       # M @ v over GF(2)
    return [sum(M[i][j] * v[j] for j in range(len(v))) % 2 for i in range(len(M))]

H = [[0,0,0,1,1,1,1],                        # rows s4, s2, s1; column j = j in binary
     [0,1,1,0,0,1,1],
     [1,0,1,0,1,0,1]]

y = [0,1,1,0,1,1,1]                          # received word (position 5 flipped)
s = matvec_mod2(H, y)                        # syndrome s4 s2 s1
print(s, "->", s[0]*4 + s[1]*2 + s[2])
# Expected output:
# [1, 0, 1] -> 5

💡 Intuition: Two views of one code. The generator matrix $G$ answers "how do I build a codeword?" (multiply your message by $G$). The parity-check matrix $H$ answers "is this legal, and if not, what broke?" (multiply by $H$ and read the syndrome). Encoding lives in $G$; decoding lives in $H$. They are the two faces of the same subspace, which is why a single linear-algebra fact — distance equals minimum weight equals fewest dependent columns of $H$ — ties the whole theory together.

38.6 Real codes: Reed–Solomon (QR codes, CDs, deep space)

Hamming codes correct isolated single-bit errors. But the real world rarely flips one lonely bit. A scratch on a CD wipes out a contiguous run of thousands of bits. A QR code gets a coffee-ring over one corner. A cosmic-ray strike on a memory chip can corrupt several adjacent cells. These are burst errors — a notion you met in Chapter 26 — and a code that fixes one bit here and one bit there is the wrong tool. The dominant real-world answer is the Reed–Solomon code, and it is the most spectacular payoff of the finite-field algebra from Chapter 24.

The key shift: symbols, not bits

The trick that beats bursts is to stop counting bits and start counting symbols. Group the data into chunks — for the codes in your phone and on every CD, chunks of 8 bits, i.e. one byte, treated as a single element of the finite field $\mathrm{GF}(2^8)$ (256 elements, exactly one per byte). Now a burst of, say, 20 corrupted bits touches at most $\lceil 20/8 \rceil + 1 = 4$ adjacent symbols. A code that can correct a handful of symbol errors therefore shrugs off a burst spanning dozens of bits, because the whole burst counts as only a few symbol-errors. The field structure of $\mathrm{GF}(2^8)$ — the one Chapter 24 built from polynomials mod an irreducible polynomial — is what lets us do arithmetic on bytes-as-symbols.

Definition (Reed–Solomon code, evaluation view). A Reed–Solomon code over a finite field $\mathrm{GF}(q)$ treats a message of $k$ symbols as the coefficients of a polynomial $m(x) = m_0 + m_1 x + \dots + m_{k-1}x^{k-1}$ of degree $< k$, and the codeword as the list of its evaluations $\big(m(\alpha_1), m(\alpha_2), \dots, m(\alpha_n)\big)$ at $n$ distinct fixed points $\alpha_1,\dots,\alpha_n \in \mathrm{GF}(q)$ (with $n \le q$). This is an $[n, k]$ linear code over $\mathrm{GF}(q)$.

Why it works — one polynomial fact does everything

The entire error-correcting power of Reed–Solomon rests on a single theorem about polynomials over a field, one whose ancestor you have seen since high school.

The structural reason (the polynomial root bound). A nonzero polynomial of degree $< k$ over a field has at most $k-1$ roots. Two distinct degree-$< k$ polynomials $m_1 \ne m_2$ therefore agree (i.e. $m_1(\alpha) = m_2(\alpha)$, equivalently $(m_1-m_2)(\alpha)=0$) at most $k-1$ points — because $m_1 - m_2$ is a nonzero polynomial of degree $< k$, so it has at most $k-1$ roots. Hence their codewords, evaluated at $n$ points, differ in at least $n - (k-1) = n - k + 1$ positions. So $$d_{\min} = n - k + 1.$$

The strategy first. We want the minimum distance of the evaluation code. By Theorem 38.4 (linearity — and Reed–Solomon is linear because evaluation is a linear map on coefficients) it equals the minimum weight of a nonzero codeword. A nonzero codeword is the evaluation vector of a nonzero polynomial $m$ of degree $< k$; its weight is the number of nonzero evaluations, i.e. $n$ minus the number of evaluation points that are roots of $m$. Since a nonzero degree-$< k$ polynomial has at most $k-1$ roots, at most $k-1$ of the evaluations are zero, so at least $n-(k-1)$ are nonzero. That is the weight bound, and it is achieved.

Proof that $d_{\min} = n-k+1$. Reed–Solomon is linear (the map from coefficient vector to evaluation vector is linear over $\mathrm{GF}(q)$: evaluation is linear in the coefficients). By Theorem 38.4, $d_{\min}$ is the least weight of a nonzero codeword. Let $m(x)$ be any nonzero polynomial of degree $\le k-1$; its codeword is $(m(\alpha_1),\dots,m(\alpha_n))$. The weight of this codeword is $n$ minus the number of evaluation points $\alpha_i$ at which $m(\alpha_i) = 0$, i.e. $n$ minus the number of roots of $m$ among the $\alpha_i$. A nonzero polynomial of degree $\le k-1$ over a field has at most $k-1$ roots, so at most $k-1$ of the $\alpha_i$ are roots, so the weight is at least $n - (k-1) = n-k+1$. Thus $d_{\min} \ge n-k+1$. Equality is achieved: take $m(x) = (x-\alpha_1)(x-\alpha_2)\cdots(x-\alpha_{k-1})$, a nonzero polynomial of degree exactly $k-1$ that vanishes at $\alpha_1,\dots,\alpha_{k-1}$; its codeword has exactly $k-1$ zeros and weight exactly $n-k+1$. $\blacksquare$

This $d_{\min} = n-k+1$ is the best any code can possibly do for given $n$ and $k$ — it meets the Singleton bound $d_{\min} \le n-k+1$ with equality (codes that do are called maximum distance separable, MDS). By Theorem 38.3, a Reed–Solomon code corrects up to $t = \lfloor (n-k)/2 \rfloor$ symbol errors. The famous CD code, a shortened RS over $\mathrm{GF}(2^8)$ with $n=32, k=28$, corrects $t = 2$ symbol errors per block; layered with interleaving (the CIRC scheme), it recovers from scratches up to roughly 2.5 mm wide. The same family, with different parameters, is why a QR code still scans with a corner missing (its highest error-correction level tolerates about 30% of the symbols destroyed) and why the Voyager probes, launched in 1977 and now in interstellar space, still return readable data across billions of kilometers.

🔗 Connection. Look back at what just happened. The "at most $k-1$ roots" fact is a property of fields — it fails over $\mathbb{Z}_n$ when $n$ is not prime (e.g. $x^2 - 1$ has four roots mod 8). Reed–Solomon works only because $\mathrm{GF}(2^8)$ is a genuine field, exactly the structure Chapter 24 was at pains to build (and exactly why $\mathrm{GF}(p^k) \ne \mathbb{Z}_{p^k}$). The abstract axioms you verified for finite fields are the load-bearing wall under the most-deployed error-correcting code on Earth. Coding theory is where Chapter 24 stops being theory.

💡 Intuition: $k$ points determine a degree-$ k$ points. Even if a few points are corrupted, the correct polynomial is still the one that passes through "almost all" of them — and the field structure guarantees no impostor polynomial can sneak through too many. Decoding is robust interpolation: find the low-degree curve through the surviving points.

🔄 Check Your Understanding 1. A Reed–Solomon code has $n=255$, $k=223$ (a NASA standard). What is $d_{\min}$, and how many symbol errors does it correct? 2. Why is correcting symbol errors the right idea for burst (consecutive-bit) errors? 3. Why does the root-bound argument fail over $\mathbb{Z}_n$ for composite $n$, and why does that matter for Reed–Solomon?

Answers

$d_{\min} = n-k+1 = 255-223+1 = 33$; it corrects $t = \lfloor (n-k)/2\rfloor = \lfloor 32/2\rfloor = 16$ symbol errors per block. 2. A burst of consecutive bad bits spans only a few consecutive symbols (bytes); correcting a few symbol-errors therefore absorbs a long bit-burst. Counting by symbol turns "many adjacent bit errors" into "few symbol errors." 3. Over $\mathbb{Z}_n$ with $n$ composite there are zero divisors, so a nonzero degree-$d$ polynomial can have more than $d$ roots (e.g. $x^2-1 \equiv 0 \pmod 8$ has roots $1,3,5,7$). The distance bound $d_{\min}=n-k+1$ would collapse. Reed–Solomon needs a true field — no zero divisors — which is why it lives over $\mathrm{GF}(2^8)$, not $\mathbb{Z}_{256}$.

Project Checkpoint: completing `coding.py`

Chapter 26 began dmtoolkit/coding.py with hamming_distance, hamming_encode, and hamming_decode. This chapter completes the module with the linear-algebra view that unifies everything: a general syndrome computation against a parity-check matrix, and a min_distance function that finds a code's correcting power by Theorem 38.4 — searching codewords by weight, not pairs by distance. These are the functions the capstone's Track D (an error-correcting code) builds on.

Add to coding.py (keeping the three Chapter 26 functions unchanged):

HAMMING74_H = [[0, 0, 0, 1, 1, 1, 1],     # rows = checks s4, s2, s1
               [0, 1, 1, 0, 0, 1, 1],     # column j (1-indexed) = j in binary
               [1, 0, 1, 0, 1, 0, 1]]

def syndrome(H, word):
    """Syndrome H * word^T over GF(2), as a list of bits (top row first)."""
    return [sum(H[i][j] * word[j] for j in range(len(word))) % 2
            for i in range(len(H))]

def min_distance(codewords):
    """Minimum distance of a code = min weight of a nonzero codeword,
    valid when the code is LINEAR (Theorem 38.4). codewords: list of bit-lists."""
    weights = [sum(c) for c in codewords if any(c)]   # skip the all-zero word
    return min(weights)

if __name__ == "__main__":
    y = [0, 1, 1, 0, 1, 1, 1]                          # Hamming(7,4), pos 5 flipped
    s = syndrome(HAMMING74_H, y)
    print("syndrome:", s, "= position", s[0] * 4 + s[1] * 2 + s[2])
    rep = [[0, 0, 0], [1, 1, 1]]                       # the repetition code {000,111}
    print("rep d_min:", min_distance(rep))
# Hand-derivation:
#   syndrome of [0,1,1,0,1,1,1]: row s4 = 0^1^1^1 = 1; s2 = 1^1^1^1 = 0; s1 = 0^1^1^1 = 1
#     -> [1,0,1], position 1*4 + 0*2 + 1 = 5
#   repetition code nonzero codewords: just [1,1,1], weight 3 -> min_distance 3
# Expected output:
# syndrome: [1, 0, 1] = position 5
# rep d_min: 3

The syndrome function is the general form of hamming_decode's three XORs: multiplying by $H$ is the parity check, and for the Hamming code the result names the error position because the columns of $H$ are the position numbers. The min_distance function operationalizes the chapter's central theorem — for any linear code, hand it the list of codewords and it returns $d_{\min}$, from which Theorems 38.2 and 38.3 read off detection and correction power. Together with Chapter 26's encode/decode pair, coding.py now spans distance, encoding, decoding, syndromes, and distance analysis — a complete miniature of coding theory, ready for the capstone to extend toward Reed–Solomon over the finite fields of Chapter 24.

Toolkit so far: every module from logic.py (Ch. 1–3) through graphs.py (Ch. 27–34), with coding.py (Ch. 26 + 38) now complete for binary linear codes. The capstone (Chapter 39) assembles the whole library; Track D turns coding.py into a working noisy-channel demonstrator.

Summary

Coding theory spends redundancy to buy distance, and lets distance absorb a channel's noise. Reference table of the chapter's results:

Core definitions

Term	Definition
Code / codeword	A chosen subset $C$ of length-$n$ strings; its members are codewords. An $(n,M)$ code has $M$ codewords; an $[n,k]$ linear code has $2^k$.
Hamming distance $d(x,y)$	Number of positions where $x,y$ differ; $= \operatorname{wt}(x\oplus y)$. A metric (Thm 38.1).
Weight $\operatorname{wt}(z)$	Number of nonzero positions of $z$.
Minimum distance $d_{\min}$	Smallest distance between distinct codewords; the design parameter.
Code rate $R$	$R = k/n = (\log_2 M)/n$: fraction of each block that is message.
Linear code	A code that is a subspace of $\mathrm{GF}(2)^n$ (closed under XOR, contains $\mathbf 0$).
Generator matrix $G$	$k\times n$; rows are a basis; encode by $c = mG$.
Parity-check matrix $H$	$(n-k)\times n$; $c$ is a codeword iff $Hc^{\mathsf T}=\mathbf 0$; syndrome $= Hy^{\mathsf T}$.
Reed–Solomon code	Evaluations of a degree-$

The two threshold theorems (the whole point of $d_{\min}$)

To guarantee	Need	Then corrects/detects
Detect $\le t$ errors	$d_{\min} \ge t+1$	any $\le t$ flips land off the code
Correct $\le t$ errors	$d_{\min} \ge 2t+1$	nearest-codeword decoding is unique
Correct $t_c$, detect $t_d$	$d_{\min} \ge t_c + t_d + 1$	e.g. $d_{\min}=4$: SECDED

So a code with minimum distance $d$ detects $d-1$ errors, or corrects $\big\lfloor \tfrac{d-1}{2}\big\rfloor$.

Key structural facts

Linearity collapses the theory: $d_{\min}$ = minimum nonzero weight (Thm 38.4) = fewest linearly dependent columns of $H$. Search codewords, not pairs.
Hamming$(7,4)$: parity at positions $1,2,4$; data at $3,5,6,7$; column $j$ of $H$ is $j$ in binary, so the syndrome equals the error position. $d_{\min}=3$ → corrects 1 error. Rate $4/7$. Add 1 parity → SECDED ($d_{\min}=4$).
Reed–Solomon: count symbols (bytes in $\mathrm{GF}(2^8)$), not bits, to beat bursts. $d_{\min}=n-k+1$ from the "$\le k-1$ roots" field fact; corrects $\lfloor (n-k)/2\rfloor$ symbol errors. Needs a true field (no zero divisors) — the Chapter 24 payoff.

Common pitfalls

A $d_{\min}=3$ code corrects 1 error or detects 2, never both — a double error miscorrects silently. Use $d_{\min}=4$ for SECDED.
Reed–Solomon over $\mathbb{Z}_{256}$ would fail — zero divisors break the root bound. It must be $\mathrm{GF}(2^8)$.
Higher reliability means lower rate; there is no free correction.

Toolkit: coding.py complete for binary linear codes — hamming_distance, hamming_encode, hamming_decode (Ch. 26), plus syndrome(H, word) and min_distance(codewords) (this chapter).

Spaced Review

Retrieval keeps knowledge available. Answer from memory before checking. Today's questions revisit error detection (Ch. 26) and the algebraic structures (Ch. 24) this chapter depends on.

(Ch. 26) A single parity bit detects all single-bit errors but misses all double-bit errors. Translate both facts into statements about the code's minimum distance, using this chapter's detection theorem.

(Ch. 26) What is the difference between error detection and error correction, and which one did a CRC provide? Why can't a CRC, by itself, correct?

(Ch. 24) State the definition of a field. Which field axiom is the one that makes the Reed–Solomon root bound ("a degree-$d$ nonzero polynomial has $\le d$ roots") hold — and which structure ($\mathbb{Z}_8$ vs. $\mathrm{GF}(8)$) violates it?

(Ch. 24) In $\mathrm{GF}(2)$, addition is XOR and $1+1=0$. Use this to explain in one sentence why, for a linear code, the XOR of two codewords is a codeword — the fact behind Theorem 38.4.

(Ch. 26 + 24) Chapter 26 modeled a bit-string as a polynomial over $\mathbb{Z}_2$ for CRCs. How does this chapter reuse "strings as algebraic objects," and what does it gain by moving from $\mathbb{Z}_2$ to $\mathrm{GF}(2^8)$?

Answers

"Detects all single but no guaranteed double" $\Leftrightarrow$ $d_{\min} = 2$: by Theorem 38.2 it detects up to $d_{\min}-1 = 1$ error, and since $d_{\min} = 2 < 3$ it cannot detect all 2-error patterns. 2. Detection = noticing corruption happened; correction = recovering the original. A CRC detects (a nonzero remainder flags an error) but carries no information about which bits changed, so it cannot locate or repair — it has detection-only design ($d_{\min}$ chosen for detection, and no decoding map to nearest codeword). 3. A field is a commutative ring with unity ($1\ne0$) in which every nonzero element has a multiplicative inverse. The "no zero divisors" property (a consequence of invertibility — if $ab=0$ and $a\ne 0$ then $b = a^{-1}\cdot 0 = 0$) is what forces $\le d$ roots; $\mathbb{Z}_8$ has zero divisors ($2\cdot 4 = 0$) and violates it, while $\mathrm{GF}(8)$, a genuine field, does not. 4. A linear code is a subspace, closed under addition; over $\mathrm{GF}(2)$ addition is XOR, so the XOR of two codewords is again in the subspace — a codeword. 5. Both chapters treat data as algebraic objects (polynomials/vectors) so that algebra can analyze errors. Moving to $\mathrm{GF}(2^8)$ lets each byte be one field element, so burst errors count as few symbol errors, and the field's root bound gives the optimal distance $n-k+1$ — turning detection-grade polynomial arithmetic into correction-grade Reed–Solomon.

What's Next

You have now seen discrete mathematics secure the internet (RSA), search and model the world (graphs), and — here — defeat noise itself, turning unreliable physics into reliable bits with nothing but distance and the algebra of finite fields. That is the last of the major applications. What remains is to put the whole toolkit to work. Chapter 39 is the capstone: you will assemble dmtoolkit into a complete project, choosing among four tracks — an RSA cryptosystem (Track A), a social-network analyzer (Track B), a Sudoku solver (Track C), or an error-correcting code (Track D) built directly on this chapter's coding.py. Every thread of the book converges there. Bring your coding.py; Track D is waiting for it.

Prerequisites

Learning Objectives

In This Chapter

Chapter 38: Coding Theory — Error-Correcting Codes

Overview

Learning Paths

38.1 The noisy channel problem

38.2 Hamming distance and code rate

Hamming distance

Minimum distance — the single number that decides everything

Code rate — what reliability costs

38.3 Error-detecting vs. error-correcting codes

Detection

Correction

38.4 Hamming codes

The construction, by its central trick

Syndrome decoding

Why the Hamming code has minimum distance 3

38.5 Linear codes over GF(2)

The definition and its first payoff

Generator and parity-check matrices

38.6 Real codes: Reed–Solomon (QR codes, CDs, deep space)

The key shift: symbols, not bits

Why it works — one polynomial fact does everything

Project Checkpoint: completing coding.py

Summary

Spaced Review

What's Next

Project Checkpoint: completing `coding.py`