Case Study 13.2: XOR Encryption — From Simple Cipher to Foundation of AES

Open Assembly Language Project


The XOR Operation in Cryptography

XOR has two properties that together make it uniquely useful in cryptography: it is invertible, and it preserves uniformity.

Invertible: (a XOR k) XOR k = a. The same operation with the same key undoes itself.
Uniform: If k is uniformly distributed (each bit independently random), then (a XOR k) is uniformly distributed regardless of a. XOR with a truly random key that is as long as the message and never reused is information-theoretically secure; this is the one-time pad.

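No other two-input Boolean operation has both properties, apart from XNOR, which is simply XOR with its output inverted. AND maps all inputs to 0 when the key bit is 0, and OR maps all inputs to 1 when the key bit is 1, so neither is invertible or uniform. NAND and NOR fail the same way (NAND outputs 1 whenever the key bit is 0; NOR outputs 0 whenever the key bit is 1). Every remaining function either ignores the key (so the output reveals a) or ignores a (so a cannot be recovered).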
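The claim can be checked exhaustively. The C sketch below (the helper names and the 4-bit truth-table encoding are illustrative choices, not from any library) enumerates all 16 two-input Boolean functions f(a, k) and keeps those that are invertible in a for each key bit and uniform for each plaintext bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode f(a, k) as a 4-bit truth table: bit (2*a + k) holds f(a, k). */
static int eval_fn(unsigned table, int a, int k) {
    return (int)((table >> (2 * a + k)) & 1u);
}

/* Invertible: for each key bit k, f(0, k) != f(1, k), so a is recoverable. */
static bool is_invertible(unsigned table) {
    for (int k = 0; k <= 1; k++)
        if (eval_fn(table, 0, k) == eval_fn(table, 1, k))
            return false;
    return true;
}

/* Uniform: for each plaintext bit a, the two key values give different
 * outputs, so a uniformly random k makes f(a, k) a fair coin. */
static bool is_uniform(unsigned table) {
    for (int a = 0; a <= 1; a++)
        if (eval_fn(table, a, 0) == eval_fn(table, a, 1))
            return false;
    return true;
}

/* Bit t of the result is set when truth table t has both properties. */
static uint16_t secure_combiners(void) {
    uint16_t mask = 0;
    for (unsigned t = 0; t < 16; t++)
        if (is_invertible(t) && is_uniform(t))
            mask |= (uint16_t)(1u << t);
    return mask;
}
```

secure_combiners() returns a mask with only bits 6 and 9 set: truth table 0110 is XOR and 1001 is XNOR. AND (table 1000) fails both tests.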

Part 1: Single-Key XOR Stream Cipher

The simplest XOR cipher: XOR every byte of the plaintext with a single key byte, repeated.

; xor_stream(uint8_t *buf, size_t len, uint8_t key)
; Encrypts/decrypts buf in-place with repeating single-byte key
; RDI = buf, RSI = len, DL = key (byte)

section .text
global xor_stream

xor_stream:
    test rsi, rsi           ; len == 0?
    jz   .done

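    ; Replicate the key byte to all 8 bytes of a 64-bit value
    ; key = 0xKK → 0xKKKKKKKKKKKKKKKK
    ; IMUL accepts only a sign-extended 32-bit immediate, so the 64-bit
    ; constant must be loaded into a register first (RCX is free here)
    movzx  eax, dl                  ; rax = key (zero-extended)
    mov    rcx, 0x0101010101010101  ; replication constant
    imul   rax, rcx                 ; key * 0x0101... = key in every byte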

    ; Process 8 bytes at a time
    mov  rcx, rsi
    shr  rcx, 3             ; rcx = len / 8 (number of 64-bit blocks)
    jz   .byte_tail

.block_loop:
    xor  qword [rdi], rax   ; XOR 8 bytes at once
    add  rdi, 8
    dec  rcx
    jnz  .block_loop

.byte_tail:
    ; Process remaining 0-7 bytes
    and  rsi, 7             ; rsi = len % 8
    jz   .done

    ; Tail key: AL is the low byte of the replicated key, i.e. the original key
    ; (DL still holds it too; the two-operand IMUL does not modify RDX)
    movzx  ecx, al          ; cl = key

.tail_loop:
    xor  byte [rdi], cl     ; XOR one byte
    inc  rdi
    dec  rsi
    jnz  .tail_loop

.done:
    ret

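Performance: the main loop XORs 8 bytes per instruction, so it runs one-eighth as many iterations as a byte-at-a-time loop. For large buffers this typically gives a substantial speedup, but the actual gain depends on cache and memory bandwidth, so measure it rather than assume a full 8×.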
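Before trusting the assembly, it helps to have a portable model to compare against. The following is a minimal C sketch (xor_stream_ref is a hypothetical name) that mirrors the same structure: replicate the key with a multiply, XOR 8 bytes at a time, then finish the 0-7 tail bytes individually.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable model of xor_stream: XOR buf[0..len) in place with a repeating
 * single-byte key. Calling it twice with the same key restores the input. */
void xor_stream_ref(uint8_t *buf, size_t len, uint8_t key) {
    assert(buf != NULL || len == 0);

    /* Replicate the key into every byte, like the IMUL by 0x0101010101010101.
     * Every byte is identical, so byte order does not matter. */
    uint64_t wide = (uint64_t)key * 0x0101010101010101ULL;
    size_t i = 0;

    /* Main loop: 8 bytes per XOR. memcpy sidesteps alignment and aliasing
     * rules; compilers typically lower it to a single load/store. */
    for (; len - i >= 8; i += 8) {
        uint64_t block;
        memcpy(&block, buf + i, sizeof block);
        block ^= wide;
        memcpy(buf + i, &block, sizeof block);
    }

    /* Tail: the remaining 0-7 bytes, one at a time. */
    for (; i < len; i++)
        buf[i] ^= key;
}
```

A test harness can declare `void xor_stream(uint8_t *buf, size_t len, uint8_t key);`, run both versions on copies of the same random buffer for every length from 0 to, say, 64, and compare the results; lengths that are not multiples of 8 exercise the tail loop.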

Part 2: Why Single-Key XOR Is Weak

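The fundamental weakness is key reuse, the same flaw exploited against the repeating-key Vigenère cipher. (A single-byte key is weaker still: an attacker can simply try all 256 possibilities.) When two messages are encrypted with the same key: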

Ciphertext 1: C1 = M1 XOR K
Ciphertext 2: C2 = M2 XOR K (same key!)

C1 XOR C2 = (M1 XOR K) XOR (M2 XOR K) = M1 XOR M2

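The key is eliminated. An attacker holding two ciphertexts encrypted with the same key obtains the XOR of the two plaintexts, and natural-language text has enough structure (common words, spaces, letter frequencies) that M1 XOR M2 can often be unraveled into much or all of both plaintexts, for example by guessing a likely word in one message and reading off the corresponding bytes of the other.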
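A short C sketch makes the cancellation concrete. It XORs two equal-length messages with the same illustrative key stream, confirms that the ciphertext XOR equals the plaintext XOR, and then "drags" a guessed word (a crib) across M1 XOR M2: wherever the guess is right, the matching bytes of the other plaintext fall out. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* out[i] = a[i] XOR b[i]: encryption, decryption, and the attacker's
 * ciphertext combination are all this one operation. */
static void xor_bytes(uint8_t *out, const uint8_t *a, const uint8_t *b, size_t len) {
    for (size_t i = 0; i < len; i++)
        out[i] = (uint8_t)(a[i] ^ b[i]);
}

/* Crib dragging: if plaintext 1 really contains `crib` at offset `pos`,
 * then (M1 XOR M2) XOR crib yields the bytes of plaintext 2 at that offset.
 * A wrong guess typically produces gibberish, which is how the attacker tells. */
static void crib_drag(uint8_t *out, const uint8_t *m1_xor_m2, size_t pos,
                      const char *crib) {
    size_t n = strlen(crib);
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(m1_xor_m2[pos + i] ^ (uint8_t)crib[i]);
}
```

For example, with M1 = "meet me at the old bridge" and M2 = "send more money by friday", guessing " the " at offset 10 of M1 recovers "money" from M2 without the attacker ever learning the key.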

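This is the flaw the U.S. Venona project exploited in Soviet messages whose one-time pad pages had been reused, one of the weaknesses of WEP (the original 802.11 Wi-Fi encryption, widely deployed with 802.11b, whose 24-bit IV space forced RC4 keystreams to repeat), and the reason reusing a stream cipher key is catastrophically insecure.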
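The single-byte key from Part 1 falls even faster, since there are only 256 keys to try. This C sketch (hypothetical function names; the scoring rule, counting letters and spaces, is a deliberately crude stand-in for a real letter-frequency model) decrypts under every key and keeps the one whose output looks most like text.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Crude plausibility score: how many bytes decrypt to a letter or a space.
 * Real attacks compare against English letter frequencies instead. */
static size_t text_score(const uint8_t *ct, size_t len, uint8_t key) {
    size_t score = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char p = (unsigned char)(ct[i] ^ key);
        if (isalpha(p) || p == ' ')
            score++;
    }
    return score;
}

/* Exhaustive search over all 256 single-byte keys. */
static uint8_t recover_single_byte_key(const uint8_t *ct, size_t len) {
    assert(ct != NULL || len == 0);
    uint8_t best_key = 0;
    size_t best_score = 0;
    for (unsigned k = 0; k < 256; k++) {
        size_t s = text_score(ct, len, (uint8_t)k);
        if (s > best_score) {
            best_score = s;
            best_key = (uint8_t)k;
        }
    }
    return best_key;
}
```

On a short ciphertext of lowercase words and spaces, such as "attack at dawn meet at the old bridge" encrypted with key 0x5A, recover_single_byte_key returns 0x5A; noisier text may need a better scoring function.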

Part 3: What AES Does with XOR

AES (Advanced Encryption Standard) is a block cipher that operates on 128-bit blocks, arranged as a 4×4 matrix of bytes. XOR is only one of the four steps in each round:
- SubBytes: Apply a nonlinear lookup table (S-Box) to each byte independently
- ShiftRows: Cyclically shift each row of the 4×4 byte matrix
- MixColumns: Linear mixing of each column using GF(2^8) arithmetic
- AddRoundKey: XOR the block with the round key
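MixColumns is the step where the GF(2^8) arithmetic appears. The sketch below (illustrative C, not taken from a particular library) implements multiplication by x as a shift plus a conditional XOR with 0x1B, reflecting the AES polynomial x^8 + x^4 + x^3 + x + 1, and uses it to mix one column. Every operation is an XOR or a shift, so the step is linear: mixing a XOR b gives the XOR of the mixed columns.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by x (i.e. by 0x02) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t b) {
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1B : 0x00));
}

/* MixColumns on one 4-byte column: multiply by the fixed matrix
 * [2 3 1 1; 1 2 3 1; 1 1 2 3; 3 1 1 2] over GF(2^8).
 * Multiplication by 3 is xtime(b) ^ b; addition is XOR. */
static void mix_column(uint8_t col[4]) {
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3);
    col[1] = (uint8_t)(a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3);
    col[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3));
    col[3] = (uint8_t)((xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3));
}
```

A commonly cited test column, db 13 53 45, mixes to 8e 4d a1 bc.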

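AddRoundKey, the XOR with the round key, is the only step that mixes in the key. Without the nonlinear S-Box, every other step would be linear over GF(2), so the whole cipher would be an affine function of the plaintext and key that could be broken by solving a system of linear equations. With the S-Box interleaved with the linear mixing steps, no practical attack on full AES is known.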

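Key insight: AES uses XOR for the same reason the stream cipher does, to combine the key with the data, but it surrounds that XOR with nonlinear confusion (the S-Box) and diffusion (ShiftRows and MixColumns). The S-Box is invertible, so decryption remains possible, but because it is not linear, XORing two ciphertexts no longer cancels the key or exposes a simple relation between the plaintexts.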
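To see both properties of the S-Box, the sketch below computes S-Box entries from their FIPS-197 definition (multiplicative inverse in GF(2^8), with 0 mapped to 0, followed by a fixed affine transform) instead of copying the 256-entry table. The inverse is found by brute force, which is fine for a demonstration but not for production code; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* GF(2^8) multiply modulo the AES polynomial (shift-and-add). */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Multiplicative inverse by exhaustive search; 0 maps to 0 by convention. */
static uint8_t gf_inv(uint8_t a) {
    if (a == 0)
        return 0;
    for (unsigned b = 1; b < 256; b++)
        if (gf_mul(a, (uint8_t)b) == 1)
            return (uint8_t)b;
    return 0; /* unreachable: every nonzero element has an inverse */
}

static uint8_t rotl8(uint8_t x, int n) {
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* S-Box entry: inverse, then the affine map b ^ rotl1 ^ rotl2 ^ rotl3 ^ rotl4 ^ 0x63. */
static uint8_t aes_sbox(uint8_t a) {
    uint8_t b = gf_inv(a);
    return (uint8_t)(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63);
}
```

aes_sbox(0x00), aes_sbox(0x01) and aes_sbox(0x53) return 0x63, 0x7c and 0xed, matching the published table. All 256 outputs are distinct, so the S-Box is invertible; but aes_sbox(0x03) is 0x7b, whereas any affine map f would have to satisfy f(0x03) = f(0x01) ^ f(0x02) ^ f(0x00), which here is 0x68, so the S-Box is not linear.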

Part 4: AES-NI — XOR at Hardware Speed

The AES-NI instruction set (available on Intel since 2010) implements the entire AES round in one instruction:

; AESENC xmm_state, xmm_roundkey
; Performs one AES encryption round:
;   SubBytes + ShiftRows + MixColumns + AddRoundKey
; All four operations in one instruction; latency is roughly 3-8 cycles,
; depending on CPU generation

; AES-128 single-block encryption with a precomputed key schedule
; (preview of Chapter 15; key expansion is not shown here):
section .text
global aes128_encrypt_preview

; aes128_encrypt_preview(uint8_t *block, const uint8_t *round_keys)
; RDI = 16-byte block (in/out), RSI = 176-byte expanded key schedule (11 round keys)

aes128_encrypt_preview:
    ; Load the plaintext block into XMM0
    movdqu xmm0, [rdi]      ; xmm0 = plaintext (16 bytes)

    ; Initial round key XOR (AddRoundKey before first round)
    movdqu xmm1, [rsi]      ; round key 0
    pxor   xmm0, xmm1       ; AddRoundKey

    ; Rounds 1-9 (AESENC performs SubBytes+ShiftRows+MixColumns+AddRoundKey)
    movdqu xmm1, [rsi + 16]
    aesenc xmm0, xmm1       ; Round 1

    movdqu xmm1, [rsi + 32]
    aesenc xmm0, xmm1       ; Round 2

    movdqu xmm1, [rsi + 48]
    aesenc xmm0, xmm1       ; Round 3

    movdqu xmm1, [rsi + 64]
    aesenc xmm0, xmm1       ; Round 4

    movdqu xmm1, [rsi + 80]
    aesenc xmm0, xmm1       ; Round 5

    movdqu xmm1, [rsi + 96]
    aesenc xmm0, xmm1       ; Round 6

    movdqu xmm1, [rsi + 112]
    aesenc xmm0, xmm1       ; Round 7

    movdqu xmm1, [rsi + 128]
    aesenc xmm0, xmm1       ; Round 8

    movdqu xmm1, [rsi + 144]
    aesenc xmm0, xmm1       ; Round 9

    ; Final round (AESENCLAST: SubBytes+ShiftRows+AddRoundKey, no MixColumns)
    movdqu xmm1, [rsi + 160]
    aesenclast xmm0, xmm1   ; Round 10

    ; Store result
    movdqu [rdi], xmm0
    ret

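This encrypts one 128-bit block with one PXOR and ten AES round instructions (plus loads). Each round depends on the previous one, so a single block costs about ten AESENC latencies: at roughly 7 cycles per round (a Haswell-era figure; newer cores are faster), that is about 70 cycles for one 16-byte block, or about 4.4 cycles per byte. A software AES implementation in C takes roughly 200-300 cycles per block, so single-block AES-NI is about 3-4× faster. Because AESENC is pipelined, interleaving several independent blocks (as CTR mode permits) hides most of that latency and widens the gap considerably.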
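The routine above expects the 176-byte expanded key schedule to be computed already. One way to produce it is a portable C version of the FIPS-197 AES-128 key expansion, sketched below under the assumption that round keys are stored in FIPS-197 byte order, the order in which MOVDQU loads them. The function names are hypothetical, and the S-Box is recomputed rather than tabulated to keep the block short. AES-NI also provides the AESKEYGENASSIST instruction for doing this step in hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GF(2^8) multiply modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n) {
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* S-Box entry computed from its definition: inverse (0 -> 0), then affine map. */
static uint8_t sub_byte(uint8_t a) {
    uint8_t inv = 0;
    for (unsigned b = 1; a != 0 && b < 256; b++)
        if (gf_mul(a, (uint8_t)b) == 1) {
            inv = (uint8_t)b;
            break;
        }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^
                     rotl8(inv, 4) ^ 0x63);
}

/* AES-128 key expansion (FIPS-197): 16-byte key -> 11 round keys = 176 bytes,
 * round key r at rk[16*r .. 16*r+15]. */
void aes128_expand_key(const uint8_t key[16], uint8_t rk[176]) {
    uint8_t rcon = 0x01;
    memcpy(rk, key, 16);
    for (int i = 4; i < 44; i++) {          /* i indexes 32-bit words w[i] */
        uint8_t t[4];
        memcpy(t, rk + 4 * (i - 1), 4);     /* t = w[i-1] */
        if (i % 4 == 0) {
            /* RotWord, then SubWord, then XOR Rcon into the first byte. */
            uint8_t t0 = t[0];
            t[0] = (uint8_t)(sub_byte(t[1]) ^ rcon);
            t[1] = sub_byte(t[2]);
            t[2] = sub_byte(t[3]);
            t[3] = sub_byte(t0);
            rcon = (uint8_t)((rcon << 1) ^ ((rcon & 0x80) ? 0x1B : 0x00));
        }
        for (int j = 0; j < 4; j++)         /* w[i] = w[i-4] ^ t */
            rk[4 * i + j] = (uint8_t)(rk[4 * (i - 4) + j] ^ t[j]);
    }
}
```

With the FIPS-197 Appendix A.1 key 2b7e1516 28aed2a6 abf71588 09cf4f3c, round key 1 comes out as a0fafe17 88542cb1 23a33939 2a6c7605 and round key 10 as d014f9a8 c9ee2589 e13f0cc8 b6630ca6, a convenient check before feeding the schedule to aes128_encrypt_preview.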

Part 5: The XOR → AES Upgrade

The progression from this chapter to Chapter 15:

Stage	What we have	Security level
Ch. 13: XOR cipher	8 bytes per XOR instruction, repeating 1-byte key	None (trivially broken)
Ch. 15: AES-NI CTR	16 keystream bytes per block (10 AES rounds)	128-bit security

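The architecture is similar: in AES counter mode (CTR), encrypting successive counter blocks produces a keystream that is XORed with the plaintext. It is exactly a stream cipher, but with an AES-generated keystream instead of a repeating key. It also inherits the Part 2 weakness: if the same key and nonce are ever used for two messages, the keystreams are identical and C1 XOR C2 = M1 XOR M2 again.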

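CTR mode (i indexes 16-byte blocks; a final partial block uses only as many keystream bytes as it needs):
ciphertext_block[i] = plaintext_block[i] XOR AES(key, nonce || counter_i)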

The XOR from Chapter 13 becomes the final step. The AESENC family from Chapter 15 generates the keystream. The design we built for xor_stream (process multiple bytes per instruction, handle tail bytes) is directly reused in the CTR implementation.
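The CTR structure can be sketched in C independently of the block cipher. In the sketch below, ctr_xcrypt, block_fn and toy_block are hypothetical names; toy_block is a deliberately insecure stand-in that only lets the code run without an AES implementation, and the counter-block layout (8-byte nonce followed by a 64-bit big-endian counter) is one common choice, not the only one. The loop has the same shape as xor_stream: full 16-byte blocks, then a shorter tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block-function type: encrypt one 16-byte block under `key`.
 * In Chapter 15 this role is played by AES (the AESENC sequence). */
typedef void (*block_fn)(const void *key, const uint8_t in[16], uint8_t out[16]);

/* CTR mode: out = in XOR (E_K(nonce || 0) || E_K(nonce || 1) || ...).
 * Assumed counter block: 8-byte nonce, then a 64-bit big-endian counter from 0.
 * The same call encrypts and decrypts; in == out (in place) is allowed. */
static void ctr_xcrypt(block_fn encrypt_block, const void *key, const uint8_t nonce[8],
                       const uint8_t *in, uint8_t *out, size_t len) {
    uint8_t counter_block[16], keystream[16];
    uint64_t counter = 0;
    assert(encrypt_block != NULL);

    for (size_t off = 0; off < len; off += 16, counter++) {
        memcpy(counter_block, nonce, 8);
        for (int b = 0; b < 8; b++)
            counter_block[8 + b] = (uint8_t)(counter >> (56 - 8 * b));
        encrypt_block(key, counter_block, keystream);

        /* Full blocks use all 16 keystream bytes; the tail uses only what it needs. */
        size_t n = (len - off < 16) ? len - off : 16;
        for (size_t i = 0; i < n; i++)
            out[off + i] = (uint8_t)(in[off + i] ^ keystream[i]);
    }
}

/* NOT A CIPHER: an insecure stand-in so the sketch runs without AES. */
static void toy_block(const void *key, const uint8_t in[16], uint8_t out[16]) {
    const uint8_t *k = key;
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)((in[i] ^ k[i]) * 167u + (unsigned)i);
}
```

Running ctr_xcrypt twice with the same key and nonce restores the plaintext, and encrypting two messages under the same key and nonce gives ciphertexts whose XOR equals the XOR of the plaintexts, which is exactly why a nonce must never repeat under one key.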

Why This Progression Matters

Understanding the XOR cipher first, before AES-NI, means you understand what problem AES solves and why XOR alone is insufficient. When you implement AES-NI in Chapter 15, you are not blindly following an instruction description: you understand that AESENC performs the nonlinear transformation that prevents the key-elimination attack that breaks the XOR cipher.

Assembly programmers who write cryptographic code without this understanding produce code that looks correct (it encrypts and decrypts successfully) but is subtly vulnerable (wrong cipher mode, reused nonces, missing authentication). Understanding the problem at the bit level is the foundation of correct cryptographic implementation.