Case Study 9.2: Constant-Time Comparison — The Security Perspective
The Problem: Timing Attacks on strcmp
The standard C memcmp function compares two memory regions byte by byte and returns as soon as it finds a difference. This early-exit behavior is a security vulnerability when comparing secret values like cryptographic MACs, tokens, or passwords: an attacker who can measure the comparison time can deduce how many leading bytes match, which leaks information about the secret value.
This case study implements a constant-time comparison function in x86-64 assembly and explains exactly why each instruction choice matters.
The Vulnerable Implementation
// INSECURE: timing leaks information
int memcmp_insecure(const uint8_t *a, const uint8_t *b, size_t n) {
for (size_t i = 0; i < n; i++) {
if (a[i] != b[i]) return a[i] - b[i]; // returns early!
}
return 0;
}
In assembly, this becomes:
; INSECURE version
memcmp_insecure:
xor eax, eax
test rdx, rdx
jz .done
.loop:
movzx ecx, byte [rdi]
movzx r8d, byte [rsi]
cmp ecx, r8d
jne .mismatch ; ← branch taken on first mismatch
inc rdi
inc rsi
dec rdx
jnz .loop
ret
.mismatch:
sub ecx, r8d
mov eax, ecx
.done:
ret
The jne .mismatch branch is the vulnerability. If an attacker submits comparison targets and measures response time with enough precision (nanosecond-level timing via network or cache side-channel), they can determine that a token starting with AA compares longer than one starting with AB — revealing that the secret starts with A.
The Attack Model
Modern timing attack toolkits can resolve differences of: - ~10 cycles via local cache timing (Flush+Reload) - ~1 μs via high-resolution timers - ~100 μs via network timing with enough samples
A 32-byte HMAC compared with early exit leaks 8 bits per comparison in ~256 queries (binary search per byte). This is practical: researchers have demonstrated HMAC forgery against OpenSSL via network timing.
The Constant-Time Principle
A constant-time comparison must:
1. Visit every byte regardless of where the mismatch occurs
2. Produce no data-dependent branches (no jne, no jz based on comparison result)
3. Produce no data-dependent memory accesses (no lookup tables indexed by secret data)
The result can only vary in the final return value, not in execution time.
Implementation: Constant-Time Memory Comparison
; constant_time_memcmp(const void *a, const void *b, size_t n)
; Returns 0 if equal, nonzero if not equal
; Runs in exactly the same number of cycles regardless of content
; RDI = a, RSI = b, RDX = n
; Return in RAX
section .text
global constant_time_memcmp
constant_time_memcmp:
xor eax, eax ; accumulator = 0
test rdx, rdx ; n == 0?
jz .done ; if n=0, trivially equal (ok to branch on n)
.loop:
; Load bytes from both buffers
movzx ecx, byte [rdi] ; ecx = a[i]
movzx r8d, byte [rsi] ; r8d = b[i]
; XOR: produces 0 iff bytes are equal, nonzero otherwise
xor ecx, r8d ; ecx = a[i] ^ b[i]
; OR into accumulator: once nonzero, stays nonzero
or eax, ecx ; accumulator |= (a[i] ^ b[i])
; Advance pointers
inc rdi
inc rsi
; Decrement count and loop (no branch on content, only on counter)
dec rdx
jnz .loop ; branch on counter, NOT on content
.done:
; RAX is 0 if all bytes matched, nonzero otherwise
; Optional: normalize to 0 or 1
; neg rax ; sets CF if rax != 0
; sbb eax, eax ; eax = -CF = 0xFFFFFFFF if !=, 0 if ==
; neg eax ; eax = 1 if !=, 0 if ==
ret
The critical property: the jnz .loop branch is taken based only on the loop counter rdx, which is determined entirely by the length parameter n — not by the contents of the buffers. An attacker controlling the buffer contents cannot influence when the loop exits.
Why XOR + OR Works
Consider comparing two buffers byte by byte:
- If
a[i] == b[i]:a[i] ^ b[i] = 0. ORing 0 into the accumulator does not change it. - If
a[i] != b[i]:a[i] ^ b[i] != 0. ORing a nonzero value into the accumulator makes it nonzero.
After the loop, accumulator == 0 iff all bytes matched. The accumulator is monotonic: once a mismatch sets a bit, no subsequent matching byte can clear it (OR only sets bits, never clears).
The Compiler Problem: Defeating Optimization
A subtle danger: the C version of constant-time comparison can be "optimized" by the compiler back into a variable-time version:
int ct_memcmp_c(const uint8_t *a, const uint8_t *b, size_t n) {
uint8_t result = 0;
for (size_t i = 0; i < n; i++) {
result |= (a[i] ^ b[i]); // compiler may optimize this away!
}
return result;
}
GCC -O2 can recognize that this loop computes a "could exit early" equivalent and transform it to an early-exit loop, entirely defeating the purpose. This is why security-critical libraries write constant-time code directly in assembly, or use compiler barriers (__volatile__, memory fences) to prevent reordering.
In assembly, the loop structure above is guaranteed by the programmer — the assembler does not optimize control flow. This is one legitimate reason to write security-critical primitives in assembly.
Verifying Constant-Time Behavior
With Valgrind (ct-verif)
The ct-verif tool uses Valgrind's Memcheck to track "taint" (secret data) and verify it does not influence branching:
valgrind --tool=memcheck --track-origins=yes ./test_ct_memcmp
Timing Measurements
A simple validation with rdtsc:
; Measure comparison of equal data vs data differing in last byte
; Time should be equal within measurement noise
; RDTSC returns timestamp counter in EDX:EAX
; On modern CPUs, use RDTSCP for more precise measurements
rdtscp ; EDX:EAX = timestamp, ECX = processor ID
shl rdx, 32
or rax, rdx ; rax = full 64-bit cycle count
; ... run comparison ...
rdtscp ; get end timestamp
; ... compute difference ...
Running 1000 iterations with equal data vs. different data should show the same cycle distribution. If the histogram of times differs, there is a timing leak.
The OpenSSL Solution: CRYPTO_memcmp
OpenSSL's constant-time comparison (in crypto/mem.c) uses essentially our assembly technique, with an added twist: it uses a volatile pointer to prevent the compiler from seeing through the comparison:
int CRYPTO_memcmp(const void *in_a, const void *in_b, size_t len) {
size_t i;
const volatile unsigned char *a = in_a;
const volatile unsigned char *b = in_b;
unsigned char x = 0;
for (i = 0; i < len; i++)
x |= a[i] ^ b[i];
return x;
}
The volatile qualifier prevents the compiler from reasoning about the memory access pattern and eliminates the optimization that would reintroduce branching.
Takeaway: Security Requires Assembly Discipline
Three points from this case study:
-
The vulnerability is not in the algorithm — it is in the implementation.
a[i] != b[i]is a correct comparison; the early exit is the bug. -
Compilers optimize for speed, not security. A C compiler may legally transform your constant-time C into variable-time machine code. Assembly gives you direct control over which instructions execute.
-
Constant-time is a property of the instruction sequence, not the algorithm. The XOR + OR pattern is constant-time because neither XOR, OR, INC, nor DEC take different amounts of time based on their input data. The
jnz .loopbranch depends only on the loop counter. This is the kind of reasoning you must apply to every security-critical code path.
This principle — avoiding secret-dependent branches and memory accesses — is the foundation of side-channel resistant cryptographic code, and it will appear again when we implement the AES-NI encryption tool in Chapter 15.