Case Study: Building a Reliable Frame Protocol over a Noisy Link

DataField.Dev

Case Study: Building a Reliable Frame Protocol over a Noisy Link

"The channel does not promise to deliver your bits unchanged. The protocol is the promise you build on top."

Executive Summary

In this build-focused case study you'll construct, piece by piece, the integrity layer of a small data-link protocol for a noisy serial channel — the kind that connects a sensor to a controller, where a stray bit-flip is routine and a burst of flips (a moment of electrical interference) is common. You'll build three detection schemes in increasing strength — a parity bit, a weighted checksum, and a CRC — hand-deriving every checksum and every receiver verdict, then assemble them into a single send/receive round-trip and watch it catch the errors it is designed to catch and (honestly) miss the ones it cannot. Finally you'll bolt on the one thing none of them provides — defense against a malicious tamperer — by reaching for a cryptographic hash (§26.6). The throughline is theme one, discrete math is the language of CS: the modular arithmetic of Chapter 23 and the polynomials-over-$\mathbb{Z}_2$ of §26.5 are not decorations here; they are the protocol.

Skills applied

Implementing parity, weighted-modular, and CRC checks from scratch and hand-deriving their outputs (§26.4–26.5).
Choosing a CRC generator and stating exactly which error classes it guarantees to catch (§26.5).
Assembling a frame → transmit → verify round-trip and reasoning about detected vs. undetected errors.
Distinguishing accidental-error detection from adversarial tamper-detection, and matching the tool to the threat (§26.6).

Background

The scenario

You're writing the firmware for a (hypothetical) environmental sensor that ships short data frames to a base station over a half-duplex radio link. A frame is small — a few bytes of payload — and the channel introduces two kinds of corruption:

Single-bit flips, from thermal noise: one bit per frame, occasionally.
Burst errors, from interference: a run of consecutive bits, all corrupted, up to a dozen or so bits long.

The base station must detect corrupted frames and request a retransmit. (Correction — fixing the frame without a resend — is a later milestone; this milestone is detection.) Your job: build the smallest checksum that reliably catches what this channel actually does, prove it does, and resist the temptation to over- or under-engineer.

💡 Intuition: Match the tool to the channel's failure mode, just as §26.6 says to match the tool to the adversary. Single flips want at least a parity bit; human-typed identifiers want a weighted mod-prime sum; bursts — this channel's signature — want a CRC, because polynomial division over $\mathbb{Z}_2$ catches every burst shorter than the generator (§26.5).

Why this matters

Every reliable link you've ever used — Ethernet, Wi-Fi, USB, the disk controller under your filesystem — has a layer exactly like this one. Building a small version makes the guarantees concrete: you'll see why Ethernet uses a CRC and not a parity bit, in the form of frames your own hand-computed CRC catches that parity sails past.

Phase 1: A parity bit (and proving its limit)

Start with the cheapest detector. Treat each frame as a list of bits and append one even-parity bit.

def parity_bit(bits):
    """Even-parity bit: makes the total number of 1s even."""
    return sum(bits) % 2

def frame_with_parity(bits):
    return bits + [parity_bit(bits)]

def parity_ok(frame):
    """Receiver check: a correct even-parity frame has an even number of 1s."""
    return sum(frame) % 2 == 0

payload = [1, 0, 1, 1, 0, 0, 1]
sent = frame_with_parity(payload)
print("sent:", sent, "ok:", parity_ok(sent))
# Expected output:
# sent: [1, 0, 1, 1, 0, 0, 1, 0] ok: True

Hand-derivation: the payload $1011001$ has four 1-bits (even), so even parity appends $0$, giving $10110010$ with four 1-bits — parity_ok sees an even count and returns True. Now corrupt it the two ways the channel can:

One flip (flip position 0): frame becomes $00110010$, three 1-bits (odd) → parity_ok returns False. Caught, as §26.4 promises for an odd number of flips.
Two flips (flip positions 0 and 1): frame becomes $01110010$… count the 1s: positions with 1 are 1,2,3,6 → four 1-bits (even) → parity_ok returns True. Missed. Two flips restore the parity.

caught   = [0, 0, 1, 1, 0, 0, 1, 0]   # one flip from sent
missed   = [0, 1, 1, 1, 0, 0, 1, 0]   # two flips from sent
print(parity_ok(caught), parity_ok(missed))
# Expected output:
# False True

This is the §26.4 claim, demonstrated rather than stated: parity detects exactly the odd error counts. For this channel, where bursts flip several consecutive bits, parity is dangerously weak — a burst of length 2 (an even count) is invisible.

🔄 Check Your Understanding The channel produces bursts up to ~12 bits long. Roughly what fraction of burst errors would a single parity bit miss, and why is that unacceptable here?

Answer

A burst flips some number of bits; parity misses every burst that flips an even number of them — roughly half of all bursts (any burst whose corrupted-bit count is even). Missing ~50% of the channel's dominant error mode is unacceptable; we need a detector built for bursts.

Phase 2: A weighted checksum (and what it's actually good at)

Before the CRC, build the §26.4 weighted scheme — not because it suits this channel best, but because it's the right tool for a different failure mode (human-entered identifiers), and contrasting it sharpens why CRC wins for bursts. We mirror the ISBN-style construction: weight position $i$ by $i$ and work modulo a prime.

def weighted_check_digit(values, modulus=11):
    """Check digit c so that sum(i*values[i-1]) + len*... ≡ 0; ISBN-10 style.
    Here: choose c in [0, modulus) making the weighted sum of values+[c] ≡ 0 (mod modulus),
    with c weighted by position len(values)+1."""
    n = len(values)
    s = sum((i + 1) * v for i, v in enumerate(values))   # positions 1..n
    # need s + (n+1)*c ≡ 0 (mod modulus); with modulus prime and (n+1) invertible, solve for c
    # for n+1 = 10 and modulus 11 this matches the chapter's ISBN cancellation
    for c in range(modulus):
        if (s + (n + 1) * c) % modulus == 0:
            return c
    return None

vals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(weighted_check_digit(vals, modulus=11))
# Expected output:
# 10

Hand-derivation (the chapter's ISBN example, exactly): the weighted sum of $1,\dots,9$ with weights $1,\dots,9$ is $1+4+9+16+25+36+49+64+81 = 285$. We need $285 + 10c \equiv 0 \pmod{11}$. Since $285 = 11\cdot25 + 10$, $285 \equiv 10$, so $10 + 10c \equiv 0$, i.e. $10(1+c)\equiv 0 \pmod{11}$; because 11 is prime, $10$ is invertible (§26.4, Chapter 23) and we cancel it to get $1 + c \equiv 0$, so $c = 10$ — written X in a real ISBN. The brute-force loop in the code finds the same $c = 10$.

What does this buy that parity doesn't? Per §26.4, with a prime modulus and positional weights, every single-digit error and every adjacent transposition changes the weighted sum mod 11 and is caught. That is the right guarantee for catalog numbers a human types — but it is not a burst-error guarantee, so it is not what our radio channel needs. Right tool, wrong channel.

⚠️ Common Pitfall: Reaching for the checksum you know instead of the one the channel needs. A weighted mod-prime sum is excellent against transpositions and single-digit slips; it makes no special promise about a 7-bit burst. The chapter's discipline is to name the channel's error model first, then pick the scheme whose theorem matches it.

🔄 Check Your Understanding Why does the ISBN-style scheme need the modulus to be prime (here 11, not 10) for the cancellation step 10(1+c) ≡ 0 ⇒ 1+c ≡ 0 to be valid?

Answer

Cancelling the factor 10 requires it to have a multiplicative inverse mod the modulus. A prime modulus guarantees every nonzero residue (including 10) is invertible (Chapter 23), so the cancellation is legitimate. Mod 10, the factor 10 is $\equiv 0$ and not invertible, and the argument collapses.

Phase 3: Build the CRC the channel actually needs

Now the right tool. Bursts up to ~12 bits demand a generator of degree $r \ge 12$ so that every burst of length $\le r$ is caught (§26.5). For a hand-traceable build we'll demonstrate the mechanism with a small generator and then state the production choice.

We reuse the chapter's crc_remainder and add a sender and a receiver:

def crc_remainder(message, generator):
    """r-bit CRC remainder (r = len(generator)-1); message, generator are 0/1 strings."""
    r = len(generator) - 1
    bits = list(message) + ["0"] * r
    for i in range(len(message)):
        if bits[i] == "1":
            for j in range(len(generator)):
                bits[i + j] = str(int(bits[i + j]) ^ int(generator[j]))
    return "".join(bits[len(message):])

def crc_frame(message, generator):
    """Sender: append the CRC remainder to the message."""
    return message + crc_remainder(message, generator)

def crc_ok(frame, generator):
    """Receiver: frame is intact iff dividing it by generator leaves a zero remainder."""
    return set(crc_remainder(frame, generator)) <= {"0"}

G = "1011"                         # x^3 + x + 1, degree r = 3 (small, for hand-tracing)
M = "11010110"
frame = crc_frame(M, G)
print("frame:", frame, "ok:", crc_ok(frame, G))
# Expected output:
# frame: 11010110111 ok: True

Let's hand-derive the remainder of $M = 11010110$ augmented with $r = 3$ zeros, $11010110\,000$, divided by $G = 1011$. We use schoolbook XOR long division: keep a running 4-bit window, and whenever its leading bit is 1, XOR $G = 1011$ into it, then bring down the next dividend bit. The dividend bits are $\underline{1\,1\,0\,1}\,0\,1\,1\,0\,0\,0\,0$.

window   action                       bit brought down
1101     XOR 1011 -> 0110             then bring down '0'  -> 1100
1100     XOR 1011 -> 0111             then bring down '1'  -> 1111
1111     XOR 1011 -> 0100             then bring down '1'  -> 1001
1001     XOR 1011 -> 0010             then bring down '0'  -> 0100
0100     leading 0: no XOR            then bring down '0'  -> 1000
1000     XOR 1011 -> 0011             then bring down '0'  -> 0110
0110     leading 0: no XOR            then bring down '0'  -> 1100
1100     XOR 1011 -> 0111             (no more bits)       -> remainder

All eleven dividend bits have been consumed; the final window holds the remainder. Its low $r = 3$ bits are $R = 111$. So the transmitted frame is $M$ followed by $R$: $11010110\,111 = 11010110111$ — matching the # Expected output:. The receiver divides the whole frame by $G$; because we appended the exact remainder, that division leaves $000$, so crc_ok returns True.

Now the payoff — catch a burst that parity missed. Corrupt three consecutive bits of the frame (a length-3 burst). Since $\deg G = 3$, §26.5 guarantees any burst of length $\le 3$ is detected:

corrupted = list(frame)
corrupted[2] = "1" if corrupted[2] == "0" else "0"   # flip 3 consecutive bits
corrupted[3] = "1" if corrupted[3] == "0" else "0"
corrupted[4] = "1" if corrupted[4] == "0" else "0"
print("recv :", "".join(corrupted), "ok:", crc_ok("".join(corrupted), G))
# Expected output:
# recv : 11101110111 ok: False

Hand-check the logic without redoing the full division: the error polynomial $E(x)$ for a length-3 burst has degree $< 3 = \deg G$, and a nonzero polynomial of degree less than $\deg G$ cannot be divisible by $G$ (§26.5). The receiver's remainder equals $E(x) \bmod G(x)$, which is nonzero — so crc_ok returns False. The burst that parity (Phase 1) would have missed half the time is caught with certainty here. That single contrast — parity misses even-length bursts; CRC catches every burst $\le r$ — is the whole reason this milestone uses a CRC.

💡 Intuition: The CRC is a hash whose codomain is the $r$-bit remainders, but chosen with the algebra of $\mathbb{Z}_2[x]$ so that the specific changes this channel makes — short bursts — land in a different slot from the original with certainty, not just high probability (§26.5). That certainty is what a generic sum-checksum cannot promise.

🐛 Find the Error. A teammate sizes the generator by saying: "Bursts are up to 12 bits, so use a degree-11 generator — 11 bits is close enough to 12." Where is the off-by-one, and what degree actually guarantees all bursts of length $\le 12$?

Answer

§26.5's guarantee is "all bursts of length $\le r$ are detected," where $r = \deg G$. To cover bursts up to length 12 you need $r \ge 12$, i.e. a degree-12 (or higher) generator — a 13-bit string. A degree-11 generator only guarantees bursts $\le 11$; a 12-bit burst could slip through. The fix is to pick $r$ at least as large as the longest burst you must catch (real protocols use CRC-16 or CRC-32, with ample margin).

Phase 4: Assemble the round-trip — and confront the adversary

Put the pieces together into one honest send/receive and state precisely what it guarantees.

def send(message, generator):
    """Frame a message with a CRC for an accidental-error channel."""
    return crc_frame(message, generator)

def receive(frame, generator):
    """Return (accepted, payload-or-None)."""
    if crc_ok(frame, generator):
        return True, frame[:len(frame) - (len(generator) - 1)]   # strip the r CRC bits
    return False, None

G = "1011"
wire = send("11010110", G)
print(receive(wire, G))                     # clean delivery
flipped = wire[:5] + ("0" if wire[5] == "1" else "1") + wire[6:]   # one flip
print(receive(flipped, G))                  # single-bit error
# Expected output:
# (True, '11010110')
# (False, None)

The clean frame strips its 3 CRC bits and returns the payload 11010110; the single-flip frame fails the division (a single-bit error is $E(x)=x^i$, and $G=1011$ has $\ge 2$ terms so it cannot divide $x^i$, §26.5) and is rejected for retransmit. The protocol does exactly its job: detect accidental corruption in this channel's error classes — single flips and bursts $\le r$ — and ask for a resend.

Now the honest limit, and the §26.6 pivot. Suppose the "channel" is not just noisy but hostile: an attacker on the wire can replace the whole frame. Can our CRC stop them?

⚠️ Common Pitfall: using the wrong tool for the threat. A CRC is not a cryptographic hash. CRC is linear over $\mathbb{Z}_2$ and its generator is public: an attacker who alters the payload can simply recompute the matching CRC (or craft changes that leave the CRC unchanged, exploiting linearity), and receive will accept the forged frame. Our protocol detects accidents, not adversaries.

The fix matches §26.6: against a tamperer you need a cryptographic hash — better, a keyed one (an HMAC), so the attacker cannot recompute the digest without the secret key.

import hashlib, hmac
def send_authenticated(message_bytes, key):
    tag = hmac.new(key, message_bytes, hashlib.sha256).hexdigest()
    return message_bytes, tag
def receive_authenticated(message_bytes, tag, key):
    expected = hmac.new(key, message_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)     # constant-time compare
# No printed output: correctness here rests on SHA-256's collision/preimage resistance (§26.6),
# not on a value we can hand-derive — which is exactly the point of a cryptographic hash.

We deliberately give no # Expected output: for the SHA-256 tag: unlike a CRC, you cannot hand-derive a cryptographic digest, and that infeasibility is the security property (§26.6). The contrast closes the case study: a CRC's output is something you can compute with a pencil (and so can an attacker); a cryptographic hash's output is something no one can shortcut — which is why one guards against noise and the other against malice.

🔄 Check Your Understanding Why can we hand-derive and verify the CRC remainder but not the SHA-256 tag — and why is that inability the desired property for tamper-detection?

Answer

CRC is short, linear, and keyless: pencil-and-paper polynomial division reproduces it (and so an attacker can recompute it after tampering). SHA-256 is designed so that no feasible computation produces or reverses a digest without doing the full work, and an HMAC adds a secret key. That very infeasibility (preimage/collision resistance, §26.6) is what stops an attacker from forging a matching tag — exactly the guarantee a CRC cannot give.

Discussion Questions

The channel's dominant error is bursts up to ~12 bits, yet Phase 3 demonstrated the mechanism with a degree-3 generator. State the production generator degree you'd ship and justify it from the §26.5 burst guarantee. What do you gain and lose by going to CRC-32?
Phase 1's parity bit and Phase 3's CRC both reduce to "polynomial arithmetic over $\mathbb{Z}_2$." The chapter notes a parity check is the CRC with generator $G = x + 1$. Verify this on a tiny example, and explain why $(x+1) \mid E$ encodes "an odd number of bits flipped."
Our receive strips the last $r$ bits as the CRC. What goes wrong if the sender and receiver disagree about $r$ (the generator degree)? Which step of the round-trip breaks first?
The HMAC step used hmac.compare_digest rather than ==. The chapter doesn't cover timing attacks — but reason about it: why might a naive tag == expected leak information an attacker could exploit, and how does that relate to §26.6's theme of matching the tool to the adversary?
Argue for or against shipping both a CRC and an HMAC on every frame. When is the CRC redundant given the HMAC, and when is it still worth the few cheap bits?

Your Turn: Extensions

Option A (build a different generator). Re-run Phase 3's hand-derivation with $G = 11001$ (degree 4) on the same message $M = 11010110$. Produce the 4-bit remainder by XOR long division, give the transmitted frame, and confirm a length-4 burst is now guaranteed caught.
Option B (a checksum bake-off). For a fixed payload, hand-construct one error that parity catches but the CRC's design also catches, one that parity misses but the CRC catches (an even-length burst), and argue whether any error the CRC misses exists for bursts $\le r$. Summarize in a three-row table of "scheme vs. error caught."
Option C (toward correction). This milestone only detects. Sketch how you'd upgrade to correction for single-bit errors using the Project Checkpoint's hamming_encode/hamming_decode instead of a CRC, and state the cost difference (extra redundancy) — the §26.4 / Chapter 38 detection-vs-correction tradeoff, made concrete on your own protocol.

Key Takeaways

Name the channel's error model first, then pick the scheme. Single flips → parity (catches odd counts); typed identifiers → weighted mod-prime (catches single-digit + transpositions); bursts → CRC (catches every burst $\le r = \deg G$). The theorem must match the threat.
A CRC is built, not hoped. Choosing $\deg G \ge$ (longest burst) gives a guarantee, not a probability — the §26.5 payoff that makes CRC the workhorse of Ethernet, ZIP, and disk controllers.
The round-trip is frame → divide → verdict. The sender appends the remainder so the frame is a multiple of $G$; the receiver re-divides and rejects any nonzero remainder for retransmit.
Detection of accidents is not defense against adversaries. CRC is linear and public, so a tamperer recomputes it; tamper-detection needs a cryptographic hash (or keyed HMAC), whose output you cannot hand-derive — and that infeasibility is the security (§26.6).