Exercises: Coding Theory — Error-Correcting Codes

DataField.Dev

Exercises: Coding Theory — Error-Correcting Codes

These build from mechanical warm-ups (computing distances and rates) through the two threshold theorems, Hamming-code mechanics, the linear-algebra view, and on to Reed–Solomon and the finite-field fact underneath it. Difficulty: ⭐ foundational, ⭐⭐ intermediate, ⭐⭐⭐ challenging. Worked solutions to the daggered (†) and odd-numbered problems are in the appendix answers-to-selected.md — try them before you peek. All arithmetic on bits is over $\mathrm{GF}(2)$ (addition is XOR); when a problem says "by hand," do not run code, hand-trace.

Part A — Warm-ups (⭐)

38.1 † Compute the Hamming distances $d(\texttt{10110}, \texttt{11100})$ and $d(\texttt{0000}, \texttt{1011})$, and the weights $\operatorname{wt}(\texttt{101101})$ and $\operatorname{wt}(\texttt{11110000})$.

38.2 The repetition code that sends each bit five times is $\{\texttt{00000}, \texttt{11111}\}$. State its block length $n$, its number of message bits $k$, its minimum distance $d_{\min}$, and its rate $R$.

38.3 † A code has minimum distance $d_{\min} = 7$. Using the two threshold theorems (38.2 and 38.3), state (a) the largest number of errors it is guaranteed to detect, and (b) the largest number it is guaranteed to correct.

38.4 Encode the data bits $d_1 d_2 d_3 d_4 = 1100$ with the $\mathrm{Hamming}(7,4)$ code, using the parity equations $p_1 = d_1 \oplus d_2 \oplus d_4$, $p_2 = d_1 \oplus d_3 \oplus d_4$, $p_4 = d_2 \oplus d_3 \oplus d_4$ and the layout $[p_1, p_2, d_1, p_4, d_2, d_3, d_4]$.

38.5 † A $\mathrm{Hamming}(7,4)$ receiver computes the syndrome $s_4 s_2 s_1 = 011$. What position (if any) does it flip, and why?

38.6 A code sends $k = 4$ message bits in $n = 7$ transmitted bits. What is its rate? A different code sends $k = 4$ in $n = 8$ (a SECDED extension). What is its rate, and what did the extra bit buy?

Part B — Prove it (⭐⭐)

38.7 † (Scaffolded — fill in the missing steps.) Prove the detection threshold in the "$\Leftarrow$" direction: if $d_{\min} \ge t+1$, then $C$ detects every error of weight $\le t$. Fill the blanks. - Let $x \in C$ be any codeword, and let $y$ be $x$ corrupted in at most $t$ positions, so $d(x, y) \le \underline{\hphantom{xx}}$. - Suppose, for contradiction, that $y$ is also a codeword and $y \ne x$. Then $x$ and $y$ are two distinct codewords at distance $\le t$, so $d_{\min} \le \underline{\hphantom{xx}}$. - But we assumed $d_{\min} \ge t+1$, and $t < t+1$, a contradiction. Therefore $y$ is $\underline{\hphantom{xxxxxxxx}}$, which means the receiver sees an illegal string and the error is $\underline{\hphantom{xxxxxxxx}}$. $\blacksquare$

38.8 Prove that for any code with at least two codewords, $d_{\min} \ge 1$, and that $d_{\min} = 1$ exactly when some single bit flip carries one codeword to another. Conclude that a code with $d_{\min} = 1$ can correct zero errors.

38.9 † Prove that the $\mathrm{Hamming}(7,4)$ code has minimum distance at most 3 by exhibiting two distinct codewords at distance 3. (You may reuse the codeword $[1,1,1,0,0,0,0]$ from the chapter and the all-zeros codeword.)

38.10 Let $C$ be a linear code. Prove that the all-zeros word $\mathbf{0}$ is always a codeword, and that if $c \in C$ then $c \oplus c = \mathbf{0} \in C$ confirms closure. Then prove that for any code (not necessarily linear) containing $\mathbf{0}$, the minimum distance is at most the smallest weight of a nonzero codeword. (Why does the reverse inequality need linearity?)

38.11 † Using Theorem 38.4 (for a linear code, $d_{\min}$ = minimum nonzero weight), prove that the single-parity-check code $\{x \in \mathrm{GF}(2)^n : \operatorname{wt}(x) \text{ is even}\}$ has $d_{\min} = 2$. (Hint: it is linear; find the lightest nonzero even-weight word, and argue no nonzero codeword has weight 1.)

38.12 Prove the Singleton bound $d_{\min} \le n - k + 1$ for any $[n,k]$ linear code with $2^k \ge 2$. Strategy hint: if you delete the first $d_{\min} - 1$ coordinates from every codeword, the $2^k$ shortened words are still all distinct (deleting fewer than $d_{\min}$ positions cannot make two codewords collide). They live in $\mathrm{GF}(2)^{\,n - (d_{\min}-1)}$, so $2^k \le 2^{\,n-d_{\min}+1}$.

Part C — Implement it (⭐⭐)

Write Python for each. Do not run it — hand-trace and include an # Expected output: comment, in the book's convention. Reference solutions live in code/exercise-solutions.py.

38.13 † Write weight(bits) returning the number of 1s in a bit-list, and min_distance_linear(codewords) that returns the minimum distance of a linear code by Theorem 38.4 (minimum nonzero weight). Test it on the repetition code [[0,0,0],[1,1,1]] and state the expected output.

38.14 Write nearest_codeword(codewords, received) that returns the codeword at smallest Hamming distance from received (ties broken by first occurrence). Use it to decode received = [1,1,0] against the repetition code [[0,0,0],[1,1,1]], and give the expected output.

38.15 † Write hamming_encode_check(data) that encodes 4 data bits with $\mathrm{Hamming}(7,4)$ and then verifies its own output by computing the syndrome $H c^{\mathsf T}$ against the parity-check matrix $H$ (columns = position numbers in binary); the syndrome of a valid codeword must be $[0,0,0]$. Show the expected output for data = [1,0,1,1].

38.16 Write all_codewords(G) that, given a $k \times n$ generator matrix $G$ over $\mathrm{GF}(2)$, returns the list of all $2^k$ codewords $mG$ (arithmetic mod 2) by iterating over all $2^k$ messages. Run it in your head for $G = [[1,1,0],[0,1,1]]$ and give the four codewords as the expected output.

Part D — Find the error (⭐⭐)

Each "proof" or claim below is wrong. State precisely which part fails and why.

38.17 † Claim: "A code with $d_{\min} = 4$ can correct any 2-bit error, because $d_{\min} = 4 \ge 2$." Find the flaw. (Hint: which threshold theorem applies to correction, and what does it require for $t = 2$?)

38.18 Claim: "For any code, the minimum distance equals the minimum weight of a nonzero codeword, so to find $d_{\min}$ you only ever need to look at single codewords." "Proof": "$d(\mathbf{0}, c) = \operatorname{wt}(c)$, so distances are just weights." Find the flaw. Give a small non-linear 3-codeword code where the minimum distance is strictly less than the minimum nonzero weight.

38.19 † Claim: "$\mathrm{Hamming}(7,4)$ corrects 1 error and detects 2 errors at the same time, because $d_{\min} = 3$ and $3 \ge 1 + 2$." Find the flaw. What is the exact distance condition for "correct $t_c$ and detect $t_d$," and what does $\mathrm{Hamming}(7,4)$ actually do when two bits flip?

38.20 Claim: "Reed–Solomon over $\mathbb{Z}_{256}$ (the integers mod 256) is just as good as over $\mathrm{GF}(2^8)$ — both have 256 symbols, so the distance is still $n - k + 1$." Find the flaw. Identify the exact field property the distance argument relies on, and give a concrete polynomial that breaks it over $\mathbb{Z}_8$.

Part E — Model it & Conjecture and test (⭐⭐⭐)

38.21 † (Model it.) A satellite link to a deep-space probe corrupts bits in long bursts: a solar flare can flip a contiguous run of up to 16 bits at a time, but such events are rare and isolated. Engineers group the data into 8-bit symbols (bytes) and use a Reed–Solomon code over $\mathrm{GF}(2^8)$. (a) Translate "a burst of up to 16 consecutive bit-flips" into a bound on the number of corrupted symbols. (b) If the code is the NASA-standard $[255, 223]$ RS code, how many symbol errors can it correct, and is one such burst within that budget? (c) Explain, in two sentences, why counting in symbols rather than bits is the modeling decision that makes the burst tractable.

38.22 (Model it.) Your team must ship firmware over a channel that flips at most one bit per 64-bit word, and a silent miscorrection (correcting the wrong bit) is unacceptable — you would rather detect-and-resend than risk it. Decide between (i) plain $\mathrm{Hamming}$ with $d_{\min}=3$, (ii) SECDED with $d_{\min}=4$, and (iii) triple repetition. For each, state $d_{\min}$, what it does under a single-bit error, what it does under a double-bit error, and the rate cost. Recommend one and justify it using the threshold theorems.

38.23 † (Conjecture and test, then prove.) Conjecture a formula for the minimum distance of the $n$-fold repetition code $\{\,\underbrace{00\cdots0}_{n}, \underbrace{11\cdots1}_{n}\,\}$ as a function of $n$, and for the number of errors it corrects. Test your conjecture in code for $n = 3, 5, 7$ using a min_distance_linear helper, then prove the general formula. (It is linear, so Theorem 38.4 applies.) How many errors does the $n = 2m+1$ repetition code correct?

38.24 (Conjecture and test, then prove.) For the $\mathrm{Hamming}(7,4)$ code, conjecture how many of the $2^7 = 128$ length-7 strings are codewords, and how many are not. Test by generating all codewords from the generator matrix and counting. Then prove your codeword count from the dimension of the code (it is a $[7,4]$ linear code). Of the non-codewords, how many are at distance exactly 1 from a codeword, and what does that say about the code "tiling" all of $\mathrm{GF}(2)^7$? (This is the perfect property of Hamming codes.)

38.25 † (Conjecture and test, then prove — Reed–Solomon root bound.) Over $\mathbb{Z}_n$, count the solutions of $x^2 \equiv 1 \pmod n$ for $n = 5, 7, 8, 12, 15$. Conjecture a rule for when a quadratic has more than 2 roots, in terms of whether $n$ is prime. Then connect it to coding theory: for which of these $n$ would a Reed–Solomon distance argument ($\le k-1$ roots) be valid, and why?

Part F — Interleaved & Deep Dive

Mixing techniques from earlier chapters keeps them sharp.

38.26 † (Ch. 26 + 38.) A single parity bit is the simplest error-detecting code. (a) Express the parity-checked code on $k$ data bits as a code of block length $n = k+1$, and compute its $d_{\min}$. (b) Using Theorem 38.2, confirm it detects all single-bit errors. (c) Explain, in one sentence using $d_{\min}$, why it cannot correct even one error — closing the loop with Chapter 26's "detection, not correction."

38.27 (Ch. 24 + 38.) Recall from Chapter 24 that $\mathrm{GF}(2)$ is a field where addition is XOR. Prove that the set of length-$n$ binary strings of even weight is a subspace of $\mathrm{GF}(2)^n$ (hence a linear code) by verifying it contains $\mathbf{0}$ and is closed under XOR. (Hint: $\operatorname{wt}(x \oplus y) = \operatorname{wt}(x) + \operatorname{wt}(y) - 2\cdot(\text{overlap})$, so parities add mod 2.)

38.28 † (Ch. 23 + 38.) A Reed–Solomon codeword over $\mathrm{GF}(q)$ is the evaluation of a polynomial at $n$ points. Chapter 23 used polynomial interpolation ideas implicitly. Argue that any $k$ uncorrupted evaluation points uniquely determine a degree-$ k$ points provides the redundancy that lets a few corrupted points be overcome.

38.29 (Ch. 6 + 38.) Prove by induction on the number of parity bits $r$ that a Hamming code with $r$ parity bits has block length $n = 2^r - 1$ and carries $k = 2^r - 1 - r$ data bits. (Set up $P(r)$, take $r = 2$ as a base case giving $\mathrm{Hamming}(3,1)$, and argue the step by counting the nonzero binary columns available for the parity-check matrix.)

38.30 (Ch. 5 + 38, Deep Dive.) Prove by contradiction that a code correcting all single-bit errors must have $d_{\min} \ge 3$: assume $d_{\min} \le 2$, exhibit a received word equidistant (or ambiguously close) to two codewords under a single flip, and derive that nearest-codeword decoding cannot recover the sent word.

38.31 † (Deep Dive — the Hamming bound / sphere-packing.) Each codeword of a single-error-correcting binary code "owns" a ball of all strings within distance 1 of it; these balls are disjoint (by Theorem 38.3). A radius-1 ball in $\mathrm{GF}(2)^n$ contains $1 + n$ strings (the center plus $n$ single-flips). Prove that a single-error-correcting code of block length $n$ has at most $\dfrac{2^n}{n+1}$ codewords, and verify that $\mathrm{Hamming}(7,4)$ meets this bound with equality (so it is perfect: the balls exactly tile the space).

38.32 (Deep Dive.) Show that the minimum distance of a linear code equals the smallest number of columns of its parity-check matrix $H$ that are linearly dependent (XOR to $\mathbf{0}$). Use this to re-prove $d_{\min}(\mathrm{Hamming}(7,4)) = 3$ directly from the matrix $H$ whose columns are $1, 2, \dots, 7$ in binary: argue no one or two columns are dependent, but three are.

Solutions to † and odd-numbered exercises: see appendices/answers-to-selected.md. For each proof, the rubric rewards: a precisely stated claim, correct use of the threshold theorems and Theorem 38.4, explicit appeals to the metric/field properties when needed, and a clean conclusion. For each "implement it," the rubric rewards a correct hand-derived # Expected output: and from-scratch $\mathrm{GF}(2)$ arithmetic (no library that hides the field).