Case Study: Building the Field Inside AES — A GF(2⁸) Arithmetic Engine

DataField.Dev

Case Study: Building the Field Inside AES — A GF(2⁸) Arithmetic Engine

"Inside every byte that AES touches there is a field. Build the field, and the cipher's strangest step becomes ordinary multiplication."

Executive Summary

In this build case study you will construct, from nothing, a working arithmetic engine for the finite field $\mathrm{GF}(2^8)$ — the 256-element field that lives inside the AES block cipher, every QR code, and every Reed–Solomon error-correcting code. You will implement field addition (bitwise XOR), field multiplication (polynomial multiply-and-reduce), and the harder operations — inverses and fast multiplication — by exploiting the chapter's deepest fact about finite fields: their nonzero elements form a cyclic group (Theorem 24.6), so you can build logarithm and antilogarithm tables and turn multiplication into addition of exponents. Finally you will use your engine to compute one column of the AES MixColumns step, the place where a working cryptographer actually multiplies in $\mathrm{GF}(2^8)$.

Where Case Study 1 audited an existing protocol, here you build the primitive that real ciphers depend on. The whole thing is the §24.5 construction scaled from $\mathrm{GF}(2^3)$ to $\mathrm{GF}(2^8)$, plus the §24.3 generator idea cashed in for speed.

Skills applied

Implementing $\mathrm{GF}(2^n)$ addition as XOR and multiplication as polynomial multiply-then-reduce (§24.5).
Choosing and using an irreducible polynomial as the field's modulus (§24.5).
Verifying that a chosen element is a primitive element / generator of $\mathrm{GF}(2^8)^*$ (Theorem 24.6, §24.3).
Building log/antilog tables from the cyclic multiplicative group to make multiplication and inversion $O(1)$ table lookups.
Applying the engine to a real computation: an AES MixColumns multiply.

Background

The field with 256 elements

AES processes data as bytes, and it treats each byte as an element of $\mathrm{GF}(2^8)$. By the §24.5 recipe, an element of $\mathrm{GF}(2^8)$ is a polynomial of degree $< 8$ with coefficients in $\{0,1\}$ — equivalently, an 8-bit byte. The byte 0b00010011 is the polynomial $x^4 + x + 1$. There are exactly $2^8 = 256$ of them, and by Theorem 24.5 a field of order $256 = 2^8$ exists and is essentially unique.

AES fixes the irreducible polynomial (its "prime")

$$m(x) = x^8 + x^4 + x^3 + x + 1,$$

which in binary is the 9-bit constant 0b1_0001_1011 = 0x11B. Multiplication reduces modulo $m(x)$, exactly as $\mathrm{GF}(2^3)$ in §24.5 reduced modulo $x^3 + x + 1$. Everything you learned at the 3-bit scale transfers verbatim; only the degree changes.

🔗 Connection. This is §24.5's $\mathrm{GF}(2^3)$ example grown up. There we fixed $x^3 \equiv x+1$ and watched $x$ generate all 7 nonzero elements. Here we fix $x^8 \equiv x^4 + x^3 + x + 1$ and will find a generator of all 255 nonzero elements. The Reed–Solomon codes of Chapter 38 run on this very field.

Why build it from tables

A naive multiply (the shift-and-XOR loop from §24.5) is correct but does real work on every call. Because $\mathrm{GF}(2^8)^*$ is cyclic of order 255 (Theorem 24.6), there is a primitive element $g$ whose powers $g^0, g^1, \dots, g^{254}$ list every nonzero byte. If we precompute two tables — antilog[i] = g**i and its inverse log[v] = i — then multiplication becomes

$$u \cdot v = g^{\,\log u + \log v} \quad\Rightarrow\quad u \cdot v = \texttt{antilog}\big[(\log u + \log v) \bmod 255\big],$$

a couple of lookups and one addition mod 255. This is exactly the log-table multiplication §24.5 promised, and it is how production finite-field code actually multiplies. Building those tables is the heart of this case study.

🧩 Productive Struggle: Before reading on, predict: if g is a primitive element of $\mathrm{GF}(2^8)^*$, how many distinct nonzero bytes appear among $g^0, g^1, \dots, g^{254}$? What is $g^{255}$? If you can answer from Theorem 24.6 alone, you already understand why the tables work.

Answer

All 255 nonzero bytes appear exactly once (a generator's powers exhaust the group), and $g^{255} = g^{\lvert \mathrm{GF}(2^8)^*\rvert} = 1$ by Corollary 24.3. So the exponent lives mod 255.

Phase 1: Addition — the easy operation

In $\mathrm{GF}(2^n)$, addition is coefficient-wise mod 2, which is bitwise XOR (§24.5). And since $a \oplus a = 0$, every element is its own additive inverse — subtraction equals addition. The whole additive group is one line:

def gf_add(a, b):
    """Addition in GF(2^8): coefficient-wise mod 2 == bitwise XOR.
    Also serves as subtraction, since every element is its own inverse."""
    return a ^ b

print(gf_add(0x53, 0xCA))      # (x^6+x^4+x+1) + (x^7+x^6+x^3+x) ...
print(gf_add(0x53, 0x53))      # a + a == 0 in GF(2^n)
# Expected output:
# 153
# 0

Verify the first by hand: $0x53 = \texttt{0101 0011}$ and $0xCA = \texttt{1100 1010}$; XOR gives $\texttt{1001 1001} = 0x99 = 153$. The second is immediate: any byte XOR itself is $0$, confirming self-inverse additive structure. That is the entire additive group $(\mathrm{GF}(2^8), +)$ — abelian, order 256, every element of additive order 2 (except $0$).

Phase 2: Multiplication — multiply, then reduce

Now the multiplicative core. We scale the §24.5 gf8_mul loop from 3 bits to 8 and from modulus 0b1011 to AES's $m(x) = $ 0x11B. The algorithm: for each bit of b, conditionally XOR in a shifted copy of a; whenever a overflows degree 8 (bit 7 was set before the shift, producing an $x^8$ term), reduce by XOR-ing the modulus.

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1 (0x11B).
    Russian-peasant style: shift-and-XOR with reduction."""
    p = 0
    for _ in range(8):                    # 8 bits in GF(2^8)
        if b & 1:
            p ^= a                        # addition is XOR
        b >>= 1
        carry = a & 0x80                  # is the x^7 coefficient set?
        a <<= 1
        if carry:
            a ^= 0x11B                    # reduce: subtract m(x)
    return p & 0xFF

print(gf_mul(0x57, 0x13))      # the worked example from FIPS-197
# Expected output:
# 254

This is the AES standard's own worked example: in $\mathrm{GF}(2^8)$, $0x57 \cdot 0x13 = 0xFE = 254$. Let us hand-derive enough of it to trust the loop. Write $0x57 = x^6 + x^4 + x^2 + x + 1$ and $0x13 = x^4 + x + 1$, so

$$0x13 = 1 + x + x^4 \;\Rightarrow\; 0x57 \cdot 0x13 = 0x57\cdot 1 \;\oplus\; 0x57\cdot x \;\oplus\; 0x57\cdot x^4.$$

Compute each piece by repeated "multiply by $x$" (a left shift, reducing with 0x11B whenever bit 7 was set):

$0x57 \cdot 1 = 0x57$.
$0x57 \cdot x$: shift $\texttt{0101 0111} \to \texttt{1010 1110} = 0xAE$ (bit 7 was 0, no reduction).
$0x57 \cdot x^2 = 0xAE \cdot x$: shift $\texttt{1010 1110} \to \texttt{1 0101 1100}$; bit 7 was set, so reduce: $\texttt{1 0101 1100} \oplus \texttt{1 0001 1011} = \texttt{0100 0111} = 0x47$.
$0x57 \cdot x^3 = 0x47\cdot x$: shift $\to \texttt{1000 1110} = 0x8E$ (no reduction).
$0x57 \cdot x^4 = 0x8E\cdot x$: shift $\texttt{1000 1110}\to\texttt{1 0001 1100}$; bit 7 set, reduce: $\oplus\,\texttt{1 0001 1011} = \texttt{0000 0111} = 0x07$.

Now XOR the three needed terms ($x^0$, $x^1$, $x^4$ pieces):

$$0x57 \oplus 0xAE \oplus 0x07 = \texttt{0101 0111} \oplus \texttt{1010 1110} \oplus \texttt{0000 0111}.$$

Step it: $\texttt{0101 0111} \oplus \texttt{1010 1110} = \texttt{1111 1001}$; then $\texttt{1111 1001} \oplus \texttt{0000 0111} = \texttt{1111 1110} = 0xFE = 254$. ✓ The loop's output matches the hand computation, and matches the AES specification.

⚠️ Common Pitfall: The reduction must use the carry-out before the shift overflows, i.e. test bit 7 of a before shifting, then XOR 0x11B after. Testing the wrong bit, or reducing with 0x1B (the low 8 bits) instead of 0x11B, gives subtly wrong products that still "look like bytes." Always validate against the FIPS example $0x57\cdot 0x13 = 0xFE$.

🔄 Check Your Understanding Using gf_mul, what is $0x02 \cdot 0x80$? (Multiplying by 0x02 is "multiply by $x$.")

Answer

$0x80 = x^7$, so $x \cdot x^7 = x^8 \equiv x^4 + x^3 + x + 1 = \texttt{0001 1011} = 0x1B = 27$. The reduction fires because bit 7 of $0x80$ is set. So gf_mul(0x02, 0x80) == 27.

Phase 3: Find a primitive element and build the log/antilog tables

Here the chapter's cyclic-group theorem pays for itself. Theorem 24.6 guarantees $\mathrm{GF}(2^8)^*$ is cyclic of order 255; we need one generator. A standard choice for this field is $g = 0x03$ (the polynomial $x + 1$), which is a primitive element under $m(x) = 0x11B$. We verify it the §24.3 way — its powers must produce all 255 nonzero bytes before returning to 1 — and simultaneously fill the tables:

def build_tables(g=0x03):
    """Build antilog[i] = g^i and log[v] = i for GF(2^8)*, g a primitive element.
    Returns (antilog, log); antilog has length 256 with antilog[255]==antilog[0]==1."""
    antilog = [0] * 256
    log = [0] * 256
    x = 1
    for i in range(255):                  # |GF(2^8)*| = 255
        antilog[i] = x
        log[x] = i
        x = gf_mul(x, g)                  # step forward: multiply by the generator
    antilog[255] = antilog[0]             # g^255 == g^0 == 1  (Corollary 24.3)
    return antilog, log

antilog, log = build_tables(0x03)
print(antilog[0], antilog[1], antilog[2])     # g^0, g^1, g^2
print(len(set(antilog[:255])))                # distinct powers g^0..g^254
print(antilog[255])                           # g^255 should be 1
# Expected output:
# 1 3 5
# 255
# 1

Trace the first three antilog entries by hand to confirm: $g^0 = 1$; $g^1 = 0x03 = 3$; $g^2 = 0x03\cdot 0x03 = (x+1)^2 = x^2 + 1 = \texttt{0000 0101} = 5$ (squaring in characteristic 2 has no cross term, since $2x = 0$). The crucial line is len(set(antilog[:255])) == 255: the 255 powers are all distinct, which is precisely the statement that $g = 0x03$ is a generator and $\mathrm{GF}(2^8)^*$ is cyclic, witnessed concretely. And antilog[255] == 1 is Corollary 24.3 in this field: an element raised to the group's order is the identity.

💡 Intuition: The antilog table is a single lap around the cyclic group, recording every stop. The log table is the same lap read backwards: "which step number lands on byte $v$?" Because the group is cyclic, every nonzero byte is some step, so the log table is total on $\{1,\dots,255\}$. This is the §24.3 "single step that visits everything" picture, stored as an array.

Phase 4: Fast multiplication and inverses via the tables

With the tables built, multiplication becomes addition of exponents mod 255 — the payoff Theorem 24.6 promised. And inversion becomes negation of an exponent, because $g^i \cdot g^{255-i} = g^{255} = 1$:

def gf_mul_fast(u, v):
    """Multiply in GF(2^8) by adding logs mod 255 (Theorem 24.6)."""
    if u == 0 or v == 0:                  # 0 has no log; handle separately
        return 0
    return antilog[(log[u] + log[v]) % 255]

def gf_inv(u):
    """Multiplicative inverse in GF(2^8)*: g^i has inverse g^(255-i)."""
    if u == 0:
        raise ValueError("0 has no multiplicative inverse")
    return antilog[(255 - log[u]) % 255]

print(gf_mul_fast(0x57, 0x13))           # must agree with gf_mul: 0xFE
print(gf_mul(gf_inv(0x57), 0x57))        # u^{-1} * u == 1
# Expected output:
# 254
# 1

The first line cross-checks the fast multiply against the from-scratch loop: both give $0xFE = 254$, so the log/antilog machinery is consistent with multiply-and-reduce. The second line is the field's defining property made concrete: $0x57$ has a multiplicative inverse (because $\mathrm{GF}(2^8)$ is a field, every nonzero element does — §24.4), and multiplying it back gives $1$. We can even read off the inverse: $\log(0x57)$ is some exponent $i$, and $0x57^{-1} = g^{255-i}$. Division in $\mathrm{GF}(2^8)$ is now just $u / v = u \cdot v^{-1}$ — the full arithmetic of a field, all four operations, running on bytes.

🚪 Threshold Concept. Watch what the cyclic structure bought us. A field operation that looks like a polynomial algorithm (multiply, reduce, search for an inverse) became a constant-time table lookup, purely because $\mathrm{GF}(2^8)^*$ is cyclic and a generator exists. Theorem 24.6 is not a curiosity; it is the reason finite-field libraries are fast. The abstraction "this group is cyclic" compiled directly into "use an antilog table."

Phase 5: Use the engine — one AES MixColumns multiply

Finally, put the engine to work where AES uses it. The MixColumns step multiplies each column of the state by a fixed matrix over $\mathrm{GF}(2^8)$:

$$ \begin{pmatrix} s_0' \\ s_1' \\ s_2' \\ s_3' \end{pmatrix} = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{pmatrix}, $$

where the entries 2, 3, 1 are the bytes $0x02, 0x03, 0x01$ and all arithmetic is in $\mathrm{GF}(2^8)$ — multiplication is gf_mul, addition is XOR. Compute the first output byte for the input column $(s_0, s_1, s_2, s_3) = (0xDB, 0x13, 0x53, 0x45)$, a standard test vector:

def mixcolumn_first(s):
    """First output byte of AES MixColumns for a 4-byte column s."""
    return (gf_mul(2, s[0]) ^ gf_mul(3, s[1]) ^ gf_mul(1, s[2]) ^ gf_mul(1, s[3]))

print(mixcolumn_first([0xDB, 0x13, 0x53, 0x45]))
# Expected output:
# 142

Hand-derive each product, then XOR (recall multiply-by-2 is "multiply by $x$," with reduction when bit 7 is set):

$0x02 \cdot 0xDB$: $0xDB = \texttt{1101 1011}$, bit 7 set; shift to $\texttt{1 1011 0110}$, reduce $\oplus\,\texttt{1 0001 1011} = \texttt{1010 1101} = 0xAD$.
$0x03 \cdot 0x13 = (0x02 \oplus 0x01)\cdot 0x13 = (0x02\cdot 0x13) \oplus 0x13$. Now $0x02\cdot 0x13$: $0x13 = \texttt{0001 0011}$, bit 7 clear; shift to $\texttt{0010 0110} = 0x26$. So $0x03\cdot 0x13 = 0x26 \oplus 0x13 = \texttt{0011 0101} = 0x35$.
$0x01 \cdot 0x53 = 0x53$ (multiply by 1 is identity).
$0x01 \cdot 0x45 = 0x45$.

XOR all four: $0xAD \oplus 0x35 \oplus 0x53 \oplus 0x45$. Step it: $\texttt{1010 1101} \oplus \texttt{0011 0101} = \texttt{1001 1000}$ ($=0x98$); $\texttt{1001 1000} \oplus \texttt{0101 0011} = \texttt{1100 1011}$ ($=0xCB$); $\texttt{1100 1011} \oplus \texttt{0100 0101} = \texttt{1000 1110} = 0x8E = 142$. ✓

Your from-scratch field engine just reproduced an AES MixColumns output byte. Every byte AES mixes is this same field multiplication; you have built the arithmetic the cipher runs on.

🔗 Connection. The MixColumns matrix is invertible over the field — which is exactly why decryption can undo it (multiply by the inverse matrix). A matrix over a ring with zero divisors might not be invertible; this is the §24.4 "fields let you divide" lesson at the level of linear algebra, and the same reason Reed–Solomon (Chapter 38) insists on a field.

Discussion Questions

We used $g = 0x03$ as a primitive element. How would you verify a candidate is primitive without building the whole table — i.e. what is the cheapest structural check, given that the only possible element orders divide 255? (Hint: $255 = 3 \cdot 5 \cdot 17$; relate to Lagrange and the divisors of the group order.)
gf_mul_fast handles 0 with a special case because log[0] is undefined. Explain in group terms why $0$ has no logarithm, and why that is consistent with $\mathrm{GF}(2^8)^*$ — not all of $\mathrm{GF}(2^8)$ — being the cyclic group.
We chose AES's irreducible $m(x) = 0x11B$. If we had instead used a reducible degree-8 polynomial as the modulus, which field axiom would fail, and what would go wrong with gf_inv? Tie this to why §24.5 demanded the modulus be irreducible.
The log-table multiply is $O(1)$ but uses 512 bytes of tables. The shift-and-XOR multiply is $O(1)$ in the byte size but does more work per call and uses no tables. When would you prefer each? (Connect to the time–space tradeoffs of Chapter 14.)
Addition in $\mathrm{GF}(2^8)$ is XOR and needs no table. Why is multiplication the operation that benefits from the cyclic-group trick, while addition does not — what is structurally different about the additive group versus the multiplicative group here?

Your Turn: Extensions

Option A (full field class). Wrap gf_add, gf_mul_fast, gf_inv, and division into a small GF256 class with operator overloading, so you can write GF256(0x57) * GF256(0x13). Keep the tables as class-level data built once. Hand-trace one product and one division and give expected outputs.
Option B (full MixColumns). Extend mixcolumn_first to compute all four output bytes of a column, then to a $4\times 4$ state. Verify against the standard test vector $(0xDB,0x13,0x53,0x45)\mapsto(0x8E,0x4D,0xA1,0xBC)$ by hand-deriving the second byte.
Option C (Reed–Solomon preview). Using your engine, evaluate the polynomial $p(x) = 0x01 + 0x02\,x + 0x03\,x^2$ at the field elements $x = 0x01, 0x02, 0x03$ (all arithmetic in $\mathrm{GF}(2^8)$). These evaluations are the encoded symbols of a tiny Reed–Solomon codeword — the Chapter 38 construction in miniature. Give the three expected outputs, hand-derived.

Key Takeaways

$\mathrm{GF}(2^8)$ is §24.5's recipe at full scale. Addition is XOR; multiplication is polynomial multiply-and-reduce modulo the irreducible $0x11B$. Nothing new beyond $\mathrm{GF}(2^3)$ — only the degree grew.
The multiplicative group being cyclic is what makes the field fast. Theorem 24.6 guarantees a primitive element; its powers give log/antilog tables that turn multiplication into addition of exponents and inversion into negation of an exponent.
A field gives you all four operations on bytes. Because every nonzero element is invertible (§24.4), gf_inv always succeeds, and division is just multiply-by-inverse — which is exactly what lets AES decryption and Reed–Solomon decoding undo their forward steps.
Build it, then validate against a known vector. Every routine here was cross-checked against the FIPS-197 example ($0x57\cdot0x13=0xFE$) and an AES MixColumns test vector — the build-heavy counterpart to a proof, anchoring correctness to an authoritative reference.
You built the arithmetic a real cipher runs on. The same gf_mul powering this engine powers AES MixColumns, QR-code decoding, and deep-space error correction — the field structure of §24.5 made executable.