Case Study 13.2: XOR Encryption — From Simple Cipher to Foundation of AES
The XOR Operation in Cryptography
XOR has a combination of properties that makes it the natural key-mixing operation in cryptography: for any fixed key it is invertible, and it preserves uniformity.
- Invertible: (a XOR k) XOR k = a. The same operation with the same key undoes itself.
- Uniform: If k is uniformly distributed (each bit independently random), then (a XOR k) is uniformly distributed regardless of a. When the key is truly random, at least as long as the message, and never reused, XOR is information-theoretically secure; this is the one-time pad.
Among the two-input Boolean operations, only XOR and its complement XNOR (which is just XOR followed by NOT) have both properties. AND maps all inputs to 0 when the key bit is 0 (not invertible, not uniform). OR maps all inputs to 1 when the key bit is 1 (not invertible, not uniform). NAND and NOR fail in the same way.
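Both properties are small enough to check exhaustively at the byte level. The following C sketch is illustrative only; the helper names (op_xor, op_and, xor_is_self_inverse, distinct_outputs) are hypothetical and not part of the chapter's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wide operations under test (hypothetical helper names). */
uint8_t op_xor(uint8_t a, uint8_t k) { return (uint8_t)(a ^ k); }
uint8_t op_and(uint8_t a, uint8_t k) { return (uint8_t)(a & k); }

/* Invertibility: (a XOR k) XOR k == a for every byte pair (a, k). */
bool xor_is_self_inverse(void) {
    for (unsigned a = 0; a < 256; a++)
        for (unsigned k = 0; k < 256; k++)
            if (op_xor(op_xor((uint8_t)a, (uint8_t)k), (uint8_t)k) != a)
                return false;
    return true;
}

/* Uniformity: count the distinct outputs of op(a, k) as k sweeps all 256
 * key bytes. A result of 256 means each output occurs exactly once, so a
 * uniformly random k yields a uniformly random output for this a. */
unsigned distinct_outputs(uint8_t (*op)(uint8_t, uint8_t), uint8_t a) {
    assert(op != NULL);
    bool seen[256] = { false };
    unsigned count = 0;
    for (unsigned k = 0; k < 256; k++) {
        uint8_t c = op(a, (uint8_t)k);
        if (!seen[c]) {
            seen[c] = true;
            count++;
        }
    }
    return count;
}
```

For the plaintext byte 0x41, distinct_outputs(op_xor, 0x41) is 256, while distinct_outputs(op_and, 0x41) is only 4, because AND can only keep or clear the two bits that are set in 0x41.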
Part 1: Single-Key XOR Stream Cipher
The simplest XOR cipher: XOR every byte of the plaintext with a single key byte, repeated.
; xor_stream(uint8_t *buf, size_t len, uint8_t key)
; Encrypts/decrypts buf in-place with repeating single-byte key
; RDI = buf, RSI = len, DL = key (byte)
section .text
global xor_stream
xor_stream:
test rsi, rsi ; len == 0?
jz .done
; Replicate the key byte to all 8 bytes of a 64-bit value
; key = 0xKK → 0xKKKKKKKKKKKKKKKK
movzx eax, dl ; rax = key (zero-extended)
mov rcx, 0x0101010101010101 ; IMUL has no 64-bit immediate form, so load the constant first
imul rax, rcx ; replicate: 0x0101...01 * key = key in every byte
; Process 8 bytes at a time
mov rcx, rsi
shr rcx, 3 ; rcx = len / 8 (number of 64-bit blocks)
jz .byte_tail
.block_loop:
xor qword [rdi], rax ; XOR 8 bytes at once
add rdi, 8
dec rcx
jnz .block_loop
.byte_tail:
; Process remaining 0-7 bytes
and rsi, 7 ; rsi = len % 8
jz .done
; The low byte of RAX still holds the original key byte; copy it into CL for the tail
movzx ecx, al ; cl = key byte
.tail_loop:
xor byte [rdi], cl ; XOR one byte
inc rdi
dec rsi
jnz .tail_loop
.done:
ret
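Performance: the main loop handles 8 bytes per XOR instruction, so for large inputs it runs one-eighth as many iterations as a byte-at-a-time loop. The measured speedup depends on memory bandwidth and alignment and can be less than 8×.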
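When testing the assembly routine, a portable reference to compare against is handy. The C sketch below mirrors the same two-phase structure (8-byte blocks, then a byte tail). The name xor_stream_ref is hypothetical, and memcpy is used for the 8-byte accesses on the assumption that the buffer may be unaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference version of xor_stream: XORs buf in place with a repeating
 * single-byte key. Calling it twice with the same key restores buf. */
void xor_stream_ref(uint8_t *buf, size_t len, uint8_t key) {
    assert(buf != NULL || len == 0);

    /* Replicate the key byte into all 8 bytes, as the IMUL does in assembly. */
    uint64_t wide = (uint64_t)key * 0x0101010101010101ULL;

    size_t i = 0;
    /* Main loop: 8 bytes per iteration. memcpy keeps the access well-defined
     * for unaligned buffers; compilers typically lower it to a plain load. */
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, sizeof word);
        word ^= wide;
        memcpy(buf + i, &word, sizeof word);
    }

    /* Tail: the remaining 0-7 bytes, one at a time. */
    for (; i < len; i++)
        buf[i] ^= key;
}
```

Because every byte of the replicated key is identical, the result does not depend on the machine's byte order.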
Part 2: Why Single-Key XOR Is Weak
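A single-byte key can be brute-forced in at most 256 trials, but the deeper weakness is key reuse. A repeating-key XOR cipher is a binary Vigenère cipher, and the same attack applies whenever two messages share key material: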
Ciphertext 1: C1 = M1 XOR K
Ciphertext 2: C2 = M2 XOR K (same key!)
C1 XOR C2 = (M1 XOR K) XOR (M2 XOR K) = M1 XOR M2
The key is eliminated. An attacker with two ciphertexts encrypted with the same key obtains the XOR of the two plaintexts. English text has enough structure that M1 XOR M2 can often be separated into both plaintexts by guessing probable words (crib dragging) together with frequency analysis.
The same flaw has broken one-time pads whose key material was reused, it was one of the weaknesses of WEP (the original 802.11 Wi-Fi encryption, whose 24-bit IV made RC4 keystream reuse likely on a busy network), and it is why reusing a stream cipher key is catastrophically insecure.
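The cancellation takes only a few lines to reproduce. In this C sketch the function names are hypothetical, and the attacker is assumed to know or correctly guess the first plaintext (a crib); the key is never passed to the recovery function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* out[i] = a[i] XOR b[i]. out may alias a or b. */
void xor_buffers(uint8_t *out, const uint8_t *a, const uint8_t *b, size_t len) {
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++)
        out[i] = (uint8_t)(a[i] ^ b[i]);
}

/* Attacker's view: two ciphertexts made with the same keystream, plus a
 * known first plaintext, reveal the second plaintext without the key:
 *   c1 ^ c2 ^ m1 = (m1 ^ k) ^ (m2 ^ k) ^ m1 = m2                        */
void recover_second_plaintext(uint8_t *m2_out,
                              const uint8_t *c1, const uint8_t *c2,
                              const uint8_t *known_m1, size_t len) {
    xor_buffers(m2_out, c1, c2, len);           /* m1 ^ m2: the key has cancelled */
    xor_buffers(m2_out, m2_out, known_m1, len); /* strip m1, leaving m2 */
}
```

In practice the attacker rarely knows a whole message, so the crib is slid along C1 XOR C2 and kept wherever the other side decodes to readable text.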
Part 3: What AES Does with XOR
AES (Advanced Encryption Standard) is a block cipher that operates on 128-bit blocks. It uses XOR, but it also uses:
- SubBytes: Apply a nonlinear lookup table (S-Box) to each byte independently
- ShiftRows: Cyclically shift each row of the 4×4 byte matrix
- MixColumns: Linear mixing of each column using GF(2^8) arithmetic
- AddRoundKey: XOR the block with the round key
The XOR with the round key (AddRoundKey) is the only key-mixing step. Without the nonlinear S-Box, AES would be linear over GF(2) and trivially broken. But with the S-Box interleaved with linear mixing, AES resists all known practical attacks.
Key insight: AES uses XOR for the same reason the stream cipher does, to combine the key with the data, but surrounds it with nonlinear confusion (the S-Box, which is itself invertible so that decryption is possible) that prevents the key-elimination attack.
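The difference between a linear key-mixing step and the S-Box can be checked directly. A function f on bytes is affine over GF(2) exactly when f(a XOR b) XOR f(a) XOR f(b) XOR f(0) is zero for every a and b. The C sketch below computes the AES S-Box from its definition (inverse in GF(2^8), then an affine transform) and counts violations of that identity. The function names and the fixed key byte 0xA7 are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) p ^= a;
        uint8_t carry = a & 0x80;
        a = (uint8_t)(a << 1);
        if (carry) a ^= 0x1b;          /* reduce by the AES polynomial */
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n) {
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* AES S-Box from its definition: x^254 (the inverse for x != 0, and 0 for
 * x == 0), followed by the affine transform with constant 0x63. */
uint8_t aes_sbox(uint8_t x) {
    uint8_t inv = x;
    for (int i = 0; i < 253; i++) inv = gf_mul(inv, x);
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* A key-mixing step on its own: XOR with an arbitrary fixed key byte. */
uint8_t add_round_key_byte(uint8_t x) { return (uint8_t)(x ^ 0xA7); }

/* Number of (a, b) pairs for which f breaks the affine identity. */
unsigned affine_violations(uint8_t (*f)(uint8_t)) {
    assert(f != NULL);
    uint8_t t[256];
    for (unsigned x = 0; x < 256; x++) t[x] = f((uint8_t)x);  /* tabulate once */
    unsigned bad = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if ((t[a ^ b] ^ t[a] ^ t[b] ^ t[0]) != 0) bad++;
    return bad;
}
```

affine_violations(add_round_key_byte) is 0: XOR with a key never leaves the affine world, so any number of such steps can be undone with linear algebra. affine_violations(aes_sbox) is nonzero, which is the nonlinearity the rest of the cipher relies on.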
Part 4: AES-NI — XOR at Hardware Speed
The AES-NI instruction set (available on Intel since 2010) implements the entire AES round in one instruction:
; AESENC xmm_state, xmm_roundkey
; Performs one AES encryption round:
; SubBytes + ShiftRows + MixColumns + AddRoundKey
; All four operations in one instruction (about 7 cycles of latency is assumed here; the figure varies by microarchitecture)
; Simplified AES-128 encryption with a pre-expanded key schedule (preview of Chapter 15):
section .text
global aes128_encrypt_preview
; aes128_encrypt_preview(uint8_t *block, const uint8_t *round_keys)
; RDI = 16-byte block (in/out), RSI = 176-byte expanded key schedule (11 round keys of 16 bytes)
aes128_encrypt_preview:
; Load the plaintext block into XMM0
movdqu xmm0, [rdi] ; xmm0 = plaintext (16 bytes)
; Initial round key XOR (AddRoundKey before first round)
movdqu xmm1, [rsi] ; round key 0
pxor xmm0, xmm1 ; AddRoundKey
; Rounds 1-9 (AESENC performs SubBytes+ShiftRows+MixColumns+AddRoundKey)
movdqu xmm1, [rsi + 16]
aesenc xmm0, xmm1 ; Round 1
movdqu xmm1, [rsi + 32]
aesenc xmm0, xmm1 ; Round 2
movdqu xmm1, [rsi + 48]
aesenc xmm0, xmm1 ; Round 3
movdqu xmm1, [rsi + 64]
aesenc xmm0, xmm1 ; Round 4
movdqu xmm1, [rsi + 80]
aesenc xmm0, xmm1 ; Round 5
movdqu xmm1, [rsi + 96]
aesenc xmm0, xmm1 ; Round 6
movdqu xmm1, [rsi + 112]
aesenc xmm0, xmm1 ; Round 7
movdqu xmm1, [rsi + 128]
aesenc xmm0, xmm1 ; Round 8
movdqu xmm1, [rsi + 144]
aesenc xmm0, xmm1 ; Round 9
; Final round (AESENCLAST: SubBytes+ShiftRows+AddRoundKey, no MixColumns)
movdqu xmm1, [rsi + 160]
aesenclast xmm0, xmm1 ; Round 10
; Store result
movdqu [rdi], xmm0
ret
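This encrypts one 128-bit block in 11 instructions (one PXOR, nine AESENC, one AESENCLAST) plus loads. At approximately 7 cycles of latency per AES instruction, the ten dependent rounds take roughly 70 cycles for one 16-byte block, or about 4.4 cycles per byte. A software AES implementation in C takes 200-300 cycles per block, so AES-NI is roughly 3-4× faster on a single block. The gap widens when several independent blocks are in flight, as in CTR mode, because the AES instructions are pipelined.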
Part 5: The XOR → AES Upgrade
The progression from this chapter to Chapter 15:
| Stage | What we have | Security level |
|---|---|---|
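| Ch. 13: XOR cipher | Single-byte repeating key, XORed 8 bytes at a time | None (trivially broken) |
| Ch. 15: AES-NI CTR | AES keystream, 16 bytes per block (AESENC/AESENCLAST rounds) | 128-bit key strength, provided a nonce is never reused |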
The architecture is similar: in AES counter mode (CTR), the encryption produces a keystream (from encrypting a counter) that is XORed with the plaintext — exactly like a stream cipher, but with an AES-generated keystream instead of a repeating key.
CTR mode: ciphertext[i] = plaintext[i] XOR AES(key, nonce || counter_i)
The XOR from Chapter 13 becomes the final step. The AESENC family from Chapter 15 generates the keystream. The design we built for xor_stream (process multiple bytes per instruction, handle tail bytes) is directly reused in the CTR implementation.
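The shape of that CTR loop can be sketched in C before any AES code exists. Everything here is an assumption for illustration: the names ctr_xor, block_fn, and toy_block are hypothetical, the counter block layout (8-byte nonce followed by a 64-bit big-endian counter) is one common choice rather than the layout Chapter 15 necessarily uses, and toy_block is an insecure stand-in where a real implementation would call AES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Any 16-byte block encryption function; ctx carries its key material. */
typedef void (*block_fn)(uint8_t out[16], const uint8_t in[16], const void *ctx);

/* CTR mode: buf[i] ^= E(nonce || counter)[i mod 16]. The same call encrypts
 * and decrypts, exactly like xor_stream. */
void ctr_xor(uint8_t *buf, size_t len, const uint8_t nonce[8],
             block_fn encrypt_block, const void *ctx) {
    assert(buf != NULL || len == 0);
    assert(nonce != NULL && encrypt_block != NULL);

    uint8_t counter_block[16], keystream[16];
    memcpy(counter_block, nonce, 8);

    uint64_t counter = 0;
    for (size_t off = 0; off < len; off += 16, counter++) {
        /* Assumed layout: 64-bit big-endian counter in bytes 8..15. */
        for (int i = 0; i < 8; i++)
            counter_block[8 + i] = (uint8_t)(counter >> (56 - 8 * i));

        encrypt_block(keystream, counter_block, ctx);

        /* Full blocks use all 16 keystream bytes; the tail uses fewer. */
        size_t n = (len - off < 16) ? (len - off) : 16;
        for (size_t i = 0; i < n; i++)
            buf[off + i] ^= keystream[i];
    }
}

/* Insecure placeholder so the sketch runs without AES. NOT a cipher. */
void toy_block(uint8_t out[16], const uint8_t in[16], const void *ctx) {
    const uint8_t *key = (const uint8_t *)ctx;
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(in[i] * 167u + key[i] + (unsigned)i);
}
```

Note that the block function is only ever run in the encrypt direction, which is why CTR decryption needs no inverse cipher. The key-reuse rule from Part 2 still applies: repeating a (key, nonce) pair repeats the keystream.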
Why This Progression Matters
Understanding the XOR cipher first, before AES-NI, means you understand what problem AES solves and why XOR alone is insufficient. When you implement AES-NI in Chapter 15, you are not blindly following an instruction description: you understand that AESENC performs the nonlinear transformation that prevents the key-elimination attack that breaks the XOR cipher.
Assembly programmers who write cryptographic code without this understanding produce code that looks correct (it encrypts and decrypts successfully) but is subtly vulnerable (wrong cipher mode, reused nonces, missing authentication). Understanding the problem at the bit level is the foundation of correct cryptographic implementation.