Case Study 37-2: Why Intel CET Defeats ROP — The Shadow Stack in Detail

Open Assembly Language Project

Introduction

Intel CET's Shadow Stack (SHSTK) takes a different approach to memory corruption defense than its predecessors. Canaries detect overwrites by checking a value. ASLR makes addresses unpredictable. SHSTK instead keeps a hardware-protected copy of every return address in memory that ordinary user-space stores cannot modify. This case study examines the SHSTK mechanism at the architectural level, explains exactly why it defeats traditional ROP, analyzes what the next generation of attacks looks like in a SHSTK world, and surveys the current deployment state.

The Fundamental Insight: Separation of Return Address Storage

The core vulnerability that ROP exploits is that return addresses and data are in the same writable memory region (the stack). An overflow that reaches the return address changes where the function returns to.

SHSTK's solution: maintain return addresses in a SEPARATE memory region with hardware-enforced write protection. Data overflows cannot reach this region. Only CALL (plus a handful of dedicated shadow stack management instructions) writes to it, and RET only reads from it.

Traditional stack (single region):
┌──────────────────────────────────┐
│  data (buf, local vars)          │ ← writable by code
│  ...                             │ ← writable by code
│  saved RBP                       │ ← writable by code
│  return address                  │ ← writable by code ← ATTACK TARGET
└──────────────────────────────────┘
         ↑
   overflow reaches here

SHSTK (two regions):
Regular stack:                    Shadow stack:
┌──────────────────┐              ┌──────────────────┐
│  data            │ ← writable   │  return address  │ ← CPU-protected
│  saved RBP       │ ← writable   │  return address  │ ← only CALL writes
│  return address  │ ← writable   │  (protected)     │ ← only RET reads
└──────────────────┘              └──────────────────┘
         ↑                                 ↑
   attacker can corrupt             attacker CANNOT corrupt

Microarchitecture Implementation

The Shadow Stack Page Type

The shadow stack is a region of virtual memory with a distinctive page table entry (PTE) attribute combination. Intel CET introduces a "shadow stack page" type identified by:

Normal Data Page:                Shadow Stack Page:
  Present:  1                      Present:  1
  Writable: 1                      Writable: 0  ← software cannot write
  User:     1 (or 0)               User:     1
  Dirty:    0 (initially)          Dirty:    1  ← shadow stack marker
  (all other bits same)

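The combination Writable=0, Dirty=1 is the shadow stack marker. The CPU never produces this combination on its own, because it sets the dirty bit only when a write goes through a writable mapping. Operating systems that used to create read-only dirty entries in software (for example, when write-protecting a dirty page for copy-on-write) must avoid doing so once CET is enabled. The CPU recognizes this pattern and enforces special shadow stack rules.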
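The marker test described above can be sketched as a small predicate over a leaf PTE value. The bit positions are the architectural x86-64 ones (Present = bit 0, Writable = bit 1, User = bit 2, Dirty = bit 6); the helper name pte_is_shadow_stack is hypothetical and is not taken from any kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86-64 page-table entry flag bits (architectural positions). */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_USER     (1ULL << 2)
#define PTE_DIRTY    (1ULL << 6)

/* Hypothetical helper: true if a leaf PTE carries the shadow stack
 * marker (Present=1, Writable=0, Dirty=1). */
static bool pte_is_shadow_stack(uint64_t pte)
{
    if (!(pte & PTE_PRESENT))
        return false;                       /* not mapped at all */
    /* Of the two bits Writable and Dirty, only Dirty may be set. */
    return (pte & (PTE_WRITABLE | PTE_DIRTY)) == PTE_DIRTY;
}
```

A normal writable data page that has been written to (Writable=1, Dirty=1) does not match, and neither does a clean read-only page (Writable=0, Dirty=0).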

What Happens on Software Writes

If user-space code tries to write to a shadow stack page:

; Attempt to write to shadow stack address (attacker trying to forge a return addr):
mov [shadow_stack_address], rax     ; → #PF (page fault)
push rax                             ; → #PF if RSP targets shadow stack page
stosq                                ; → #PF if RDI targets shadow stack page

Any write attempt via ordinary store instructions or PUSH raises a page fault (#PF) immediately, which the kernel normally turns into a fatal signal. The page's Writable=0 attribute makes the CPU refuse ordinary stores at every privilege level; only dedicated instructions such as WRSS (when the OS enables it) and the kernel-only WRUSS may write shadow stack memory.

CALL Behavior with SHSTK

When a CALL instruction executes with CET SHSTK enabled:

Microarchitecture steps:
1. Compute return address = RIP + instruction_length
2. Write return address to [RSP-8] (regular stack, normal decrement)
3. Write return address to [SSP-8] (shadow stack, using CET's internal write path)
4. Decrement RSP by 8
5. Decrement SSP by 8
6. Set RIP = target

Step 3 uses a special internal write path that can access shadow stack pages — user-space code cannot replicate this path via normal instructions.

The SSP (Shadow Stack Pointer) is a new CPU register. User code cannot load it directly, but it can read the current value with RDSSPQ. The kernel holds the user-mode value in the IA32_PL3_SSP MSR and saves and restores it on context switches.
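As an illustration of how user code can observe SSP, here is a minimal sketch that reads it with RDSSPQ through GNU-style inline assembly. It assumes GCC or Clang and an assembler that knows the rdsspq mnemonic; the helper name read_ssp is hypothetical. RDSSPQ is encoded in reserved-NOP space, so when shadow stacks are not active it behaves as a NOP and leaves its operand untouched.

```c
#include <stdint.h>

/* Hypothetical helper: returns the current shadow stack pointer, or 0
 * when shadow stacks are not active for this thread (or when the code
 * is built for something other than x86-64). */
static uint64_t read_ssp(void)
{
    uint64_t ssp = 0;
#if defined(__x86_64__)
    /* With SHSTK off, RDSSPQ executes as a NOP and the register keeps
     * the 0 it was initialized with. */
    __asm__ volatile("rdsspq %0" : "+r"(ssp));
#endif
    return ssp;
}
```

A result of 0 therefore means "no shadow stack in use here", which is a convenient runtime check before relying on SHSTK behavior.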

RET Behavior with SHSTK

When ret executes with CET SHSTK enabled:

Microarchitecture steps:
1. Load candidate_return_addr from [RSP] (regular stack)
2. Load shadow_return_addr from [SSP] (shadow stack)
3. Compare: if candidate_return_addr != shadow_return_addr
      raise #CP (Control Protection exception), stop
4. Otherwise: set RIP = candidate_return_addr
5. Increment RSP by 8
6. Increment SSP by 8

The comparison is in hardware. There is no instruction sequence in user space that can make a forged return address match the shadow stack — because writing to the shadow stack is not possible from user space.
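The CALL and RET steps can be mimicked with a toy software model. This is only a sketch for reasoning about the check: the struct cpu_model, the array-backed stacks, and the function names are all hypothetical, and a return value of false stands in for the #CP exception.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEPTH 16

/* Toy model: both stacks grow downward from index DEPTH. */
struct cpu_model {
    uint64_t stack[DEPTH];   /* regular stack: software may overwrite it */
    uint64_t shadow[DEPTH];  /* shadow stack: written only by model_call */
    size_t rsp;              /* index of the top regular-stack slot      */
    size_t ssp;              /* index of the top shadow-stack slot       */
};

static void model_init(struct cpu_model *c)
{
    c->rsp = DEPTH;
    c->ssp = DEPTH;
}

/* CALL: push the return address on both stacks.
 * Returns false if the toy stacks are full. */
static bool model_call(struct cpu_model *c, uint64_t return_addr)
{
    if (c->rsp == 0 || c->ssp == 0)
        return false;
    c->stack[--c->rsp] = return_addr;
    c->shadow[--c->ssp] = return_addr;
    return true;
}

/* RET: compare both copies. On a match, store the target and pop both
 * stacks. On a mismatch (or empty stack), return false to model #CP and
 * leave rsp/ssp unchanged. */
static bool model_ret(struct cpu_model *c, uint64_t *target)
{
    if (c->rsp >= DEPTH || c->ssp >= DEPTH)
        return false;
    if (c->stack[c->rsp] != c->shadow[c->ssp])
        return false;
    *target = c->stack[c->rsp];
    c->rsp++;
    c->ssp++;
    return true;
}
```

Overwriting c->stack[c->rsp] between model_call and model_ret plays the role of a stack overflow: the model has no operation that lets the "attacker" touch c->shadow, so model_ret reports the mismatch.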

A Precise Trace of SHSTK Defeating a ROP Chain

Consider the scenario from Case Study 37-1, with SHSTK enabled:

Legitimate call chain before the vulnerability:
  main calls vulnerable()
    → CALL: pushes 0x401370 to both regular stack AND shadow stack

Regular stack at this point:  Shadow stack:
  [0x7ffd..b8]:  0x401370      [0x7f..a8]:  0x401370   ← SHSTK copy
  [0x7ffd..c0]:  (higher stack)

The attacker overflows buf in vulnerable():

Regular stack after overflow:
  [rbp+8]: 0x401200  ← overwritten with G1 address (attacker)

Shadow stack (unchanged — no write instruction touched it):
  [0x7f..a8]:  0x401370  ← still the legitimate return address

When vulnerable() executes ret:

Step 1: load from [RSP] = 0x401200  (attacker's G1)
Step 2: load from [SSP] = 0x401370  (legitimate return)
Step 3: 0x401200 != 0x401370 → MISMATCH → #CP raised
Step 4-6: NOT EXECUTED (exception occurred at step 3)

The kernel's #CP handler terminates the process (on Linux, by delivering SIGSEGV). No instruction from the ROP chain ever executes.

The Next Generation: Attacks in a SHSTK World

SHSTK specifically and very effectively defeats return-address-based ROP. Sophisticated attackers have explored what remains:

Data-Only Attacks

These corrupt data without redirecting control flow:
- Corrupt a UID variable to escalate privilege
- Corrupt a file descriptor number to redirect I/O
- Corrupt a size variable to later trigger an out-of-bounds read/write
- Corrupt a flag that bypasses authentication

CET provides zero protection against data-only attacks. These attacks are harder to automate and exploit because they require understanding the specific data layout and semantics of the target program. But they work.
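To make the idea concrete, here is a minimal sketch of a data-only corruption. The struct session, its is_admin field, and set_name_unchecked are hypothetical names invented for this illustration. The copy is clamped to the size of the whole struct so the demo stays well-defined C, while still modeling an unchecked copy that runs past the name field.

```c
#include <stddef.h>
#include <string.h>

struct session {
    char name[8];
    int  is_admin;   /* laid out directly after name on common ABIs */
};

/* Models a copy that checks nothing about the size of name. The length
 * is clamped to the struct so the demo itself never writes outside the
 * object; a real bug would have no such clamp. */
static void set_name_unchecked(struct session *s, const void *input, size_t len)
{
    if (len > sizeof *s)
        len = sizeof *s;
    memcpy(s, input, len);   /* bytes past name[7] land in is_admin */
}
```

Passing 12 bytes whose last four are nonzero flips is_admin without touching any return address or function pointer, so neither SHSTK nor IBT has anything to check.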

Heap Function Pointer Corruption (with IBT disabled or insufficient)

Function pointers in heap objects (vtables, callbacks, function pointer structs) can be corrupted:

struct handler {
    int (*process)(char *data, size_t len);  /* function pointer */
    int state;
};

If process can be overwritten (via UAF, heap overflow, or other heap corruption), the next call handler->process(...) jumps to the attacker's target. SHSTK does not protect this: the transfer is an indirect call, not a return, so no shadow stack comparison takes place. CET IBT blocks it only if the chosen target does not start with ENDBR64.
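The following sketch shows why the shadow stack has nothing to object to in this case. The plain assignment to h->process stands in for the memory corruption; legit_process, attacker_choice, and dispatch are hypothetical names. Both functions have their address taken, so a compiler using -fcf-protection would normally mark each with ENDBR64, which means IBT alone would accept either target.

```c
#include <stddef.h>

struct handler {
    int (*process)(char *data, size_t len);  /* function pointer */
    int state;
};

/* The function the program intends to call. */
static int legit_process(char *data, size_t len)
{
    (void)data;
    return (int)len;
}

/* A different, equally valid function the attacker would prefer. */
static int attacker_choice(char *data, size_t len)
{
    (void)data;
    (void)len;
    return -1;
}

/* Indirect CALL through the pointer: the CPU pushes the same return
 * address on both stacks, so the eventual RET matches regardless of
 * which function the pointer named. */
static int dispatch(struct handler *h, char *data, size_t len)
{
    return h->process(data, len);
}
```

After h.process is replaced with attacker_choice, dispatch returns -1 and every CALL/RET pair still balances on the shadow stack.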

JIT Spraying (pre-IBT)

JIT compilers generate executable code at runtime. If the JIT code contains attacker-influenced constant values at predictable positions, those constants might be interpreted as useful instructions when jumped to. With IBT, JIT-generated code must include ENDBR64 at valid indirect call targets; without this, the constants are not valid IBT targets.

Control-Flow Bending

Carlini et al. (USENIX Security 2015) showed that even with CFI, there exist attacks that stay within the valid call graph but choose malicious paths. For example: if function A can call either B or C (both valid per CFI), an attacker who controls the decision makes A call C when B was intended. This requires a deep vulnerability (controlling the branch condition), but it works within CFI constraints.

setjmp/longjmp Abuse (mitigated by glibc CET support)

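longjmp jumps to a previously saved execution context, bypassing the normal call stack. A CET-unaware longjmp would leave SSP pointing at stale entries, so the next ret would mismatch the shadow stack. glibc's CET-aware implementation handles this by saving SSP in setjmp and unwinding the shadow stack with INCSSP in longjmp (RSTORSSP is the instruction used when switching to a different shadow stack).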

Older or custom setjmp/longjmp implementations may not be CET-aware and could be exploited to manipulate the shadow stack pointer.
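One way a longjmp can resynchronize the shadow stack is to pop the skipped entries with INCSSPQ, which advances SSP by 8 bytes times the low 8 bits of its operand, so at most 255 entries per instruction. The sketch below only computes how many INCSSPQ instructions such an unwind needs; incssp_steps is a hypothetical helper and is not glibc's code.

```c
#include <stdint.h>

/* INCSSPQ uses only bits 7:0 of its operand as the entry count. */
#define INCSSP_MAX_FRAMES 255u

/* Hypothetical helper: number of INCSSPQ instructions needed to move
 * SSP from current_ssp up to target_ssp (the value saved by setjmp).
 * Returns 0 when no unwinding is needed. */
static uint64_t incssp_steps(uint64_t current_ssp, uint64_t target_ssp)
{
    if (target_ssp <= current_ssp)
        return 0;
    uint64_t frames = (target_ssp - current_ssp) / 8;  /* 8 bytes per entry */
    return (frames + INCSSP_MAX_FRAMES - 1) / INCSSP_MAX_FRAMES;
}
```

Unwinding 600 entries, for example, takes three instructions (255 + 255 + 90). Because INCSSPQ can only move SSP toward older entries, it cannot be used to point SSP at attacker-chosen memory.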

Deployment Status (2025)

Hardware

Intel Tiger Lake (2020+): Full CET support (SHSTK + IBT)
Intel Alder Lake, Raptor Lake, Sapphire Rapids: CET supported (Ice Lake and Rocket Lake cores predate CET)
AMD: Shadow stack supported since Zen 3 (2020); IBT was not part of that support
ARM64: Pointer Authentication (PAC) and Branch Target Identification (BTI) serve analogous roles

Linux Kernel

Kernel 5.18+: IBT enforced for the kernel's own code (kernel-mode IBT)
Kernel 6.6+: User-space SHSTK support on x86-64 (opt-in per process)
Kernel-mode SHSTK and user-space IBT: Work in progress; not yet mainstream

User-Space Software

glibc 2.28+: CET-aware when built with --enable-cet (handles setjmp/longjmp, signal handlers); glibc 2.39+ works with the upstream Linux shadow stack interface
GCC 8+: -fcf-protection=full emits ENDBR64 and generates SHSTK-compatible code
Clang 7+: -fcf-protection=full similarly supported
Fedora, Ubuntu 22.04+: Ship system libraries with CET support
Firefox, Chrome: Hardware shadow stacks enabled on Windows in recent versions (Chrome since version 90); Windows does not use IBT
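Whether a given translation unit was built with -fcf-protection can be checked at compile time. GCC and Clang define the __CET__ macro when the option is active, with bit 0 indicating IBT and bit 1 indicating SHSTK. The helpers below are a minimal sketch; the names compiled_cet_features and describe_cet are hypothetical.

```c
/* Returns the CET feature mask this file was compiled with:
 * bit 0 = IBT (ENDBR64 emitted), bit 1 = SHSTK-compatible code.
 * Returns 0 when -fcf-protection was not in effect. */
static int compiled_cet_features(void)
{
#ifdef __CET__
    return __CET__;
#else
    return 0;
#endif
}

/* Human-readable form of the mask. */
static const char *describe_cet(int mask)
{
    switch (mask & 3) {
    case 3:  return "IBT + SHSTK";
    case 2:  return "SHSTK only";
    case 1:  return "IBT only";
    default: return "none";
    }
}
```

This reports only how the code was compiled. Whether the protections are enforced at run time still depends on the CPU, the kernel, and the C library.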

Practical Security Benefit

For systems running on CET hardware with CET-enabled software:
- Traditional ROP chains are defeated (SHSTK)
- JOP is substantially mitigated (IBT)
- The remaining viable attack surface requires: function pointer corruption that lands on a valid ENDBR64 target (for example via heap UAF), data-only attacks, or a hardware/microcode vulnerability in CET itself

This represents a meaningful security improvement. The remaining attack surface requires significantly more sophisticated vulnerabilities and exploitation techniques than the pre-CET world.

Summary

CET SHSTK defeats ROP by solving the root problem: return addresses and data share the same writable memory region. Because hardware-protected copies of return addresses live in shadow stack pages (which ordinary user-space stores cannot modify), the first ret that consumes a forged return address mismatches the shadow stack and triggers a hardware exception. No information leak, no canary bypass, no ASLR defeat can change this, because knowing an address does not grant the ability to write to the shadow stack. CET IBT complements this by requiring valid indirect branch targets to be explicitly marked. Together, they represent the most significant advance in memory corruption defense since NX/DEP: not because they make exploitation impossible, but because they close the specific mechanism that powered the previous generation of attacks.