Case Study 36-2: Intel CET — The Hardware Solution to Memory Corruption

Introduction

Intel Control-flow Enforcement Technology (CET) represents a qualitative shift in how memory corruption is defended against. Earlier mitigations (stack canaries, ASLR, NX) were implemented mainly in the compiler and operating system, with NX relying on a single page-table bit, and attackers found ways around each of them over time. CET moves control-flow checking into the CPU itself: return addresses are verified against a shadow stack that ordinary software writes cannot reach.

This case study examines CET's two components (SHSTK and IBT) at the microarchitecture level, explains why each addresses a different aspect of the exploitation problem, analyzes why SHSTK defeats traditional ROP, and discusses the current deployment and adoption status.

The Problem CET Solves

Return-Oriented Programming (Chapter 37) works because:
1. The return address on the stack is writable memory
2. ret pops the return address from the stack with no other validation
3. If the return address is corrupted, execution goes wherever the attacker chose

Stack canaries detect linear overflows that overwrite the canary on the way to the return address. But if the attacker can:
- Leak the canary (format string, out-of-bounds read), then
- Overwrite the return address while writing the correct canary value back in place

...the canary check passes and execution proceeds to the attacker's target.
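The bypass can be illustrated with a toy model. This is a sketch only: the frame layout, the canary value, and the function names are all hypothetical, and the "stack" is a Python bytearray rather than real memory.

```python
import struct

CANARY = 0x1122334455667700   # hypothetical per-process canary value
LEGIT_RET = 0x401234          # hypothetical legitimate return address

def make_frame():
    """Toy frame: 16-byte buffer | canary | saved rbp | return address."""
    return bytearray(16) + struct.pack("<QQQ", CANARY, 0x7FFD0000, LEGIT_RET)

def overflow(frame, payload):
    """Unchecked copy into the buffer at offset 0, like strcpy into a stack array."""
    frame[:len(payload)] = payload
    return frame

def function_return(frame):
    """Canary epilogue followed by ret: returns the address control transfers to."""
    canary, _saved_rbp, ret = struct.unpack_from("<QQQ", frame, 16)
    if canary != CANARY:
        raise RuntimeError("__stack_chk_fail")
    return ret

# Blind overflow: the canary is clobbered, so the epilogue aborts.
blind = b"A" * 40
# Informed overflow: the leaked canary is written back in place.
informed = b"A" * 16 + struct.pack("<QQQ", CANARY, 0, 0xDEADBEEF)
```

Passing `blind` through `overflow` and `function_return` raises the simulated `__stack_chk_fail`, while `informed` returns the attacker's address: the canary check has nothing left to detect.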

SHSTK eliminates the possibility of forging a return address regardless of whether the canary is known, because the comparison happens against hardware-protected memory.

Component 1: Shadow Stack (SHSTK)

Microarchitecture Design

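The shadow stack is a region of memory marked by a special page table encoding: WRITE=0 together with DIRTY=1 in the page table entry. This combination, enforced by the CPU, means:
- An ordinary store such as MOV [addr], value to a shadow stack page raises a page fault (#PF)
- PUSH, or any other regular write that targets a shadow stack page, faults the same way
- Ordinary loads are still allowed: the shadow stack is write-protected, not secret
- Only shadow-stack accesses can write these pages: the implicit write performed by CALL, plus a small set of dedicated instructions such as WRSS when the OS has enabled them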

Regular Stack page:        Shadow Stack page:
Page attributes:           Page attributes:
  Present: 1                 Present: 1
  Writable: 1                Writable: 0   ← software cannot write
  User: 1 (or 0 for kernel)  User: 1
  NX: 1 (usually)            NX: 1
  Dirty: varies              Dirty: 1   ← CET shadow stack marker

The CPU has a new register: SSP (Shadow Stack Pointer).

On CALL:
1. Normal stack: push the return address at RSP
2. Shadow stack: the CPU decrements SSP by 8 and writes the same return address at the new SSP

On RET:
1. Normal stack: load the return address from RSP, increment RSP
2. Shadow stack: load the address from SSP, increment SSP
3. Compare both: if they are not equal, raise #CP (Control Protection exception)
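The CALL and RET steps above can be sketched as a small simulation. This is an illustrative model, not a description of the real microarchitecture: the class and exception names are invented here, and Python lists stand in for the two stacks.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP exception in this toy model."""

class ToyCPU:
    def __init__(self):
        self.stack = []    # regular stack: software (and attackers) can write it
        self.shadow = []   # shadow stack: only call() and ret() touch it

    def call(self, return_address):
        # CALL pushes the same return address onto both stacks.
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret(self):
        # RET pops both copies and compares them before transferring control.
        regular = self.stack.pop()
        shadow = self.shadow.pop()
        if regular != shadow:
            raise ControlProtectionFault(f"{regular:#x} != {shadow:#x}")
        return regular
```

In this model, `cpu.call(0x401234)` followed by `cpu.ret()` returns normally, while overwriting `cpu.stack[-1]` with `0xdeadbeef` between the two raises `ControlProtectionFault`, because the attacker's write never reaches `cpu.shadow`.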

Why Software Cannot Bypass SHSTK

An attacker who overflows a buffer and overwrites the return address on the regular stack does NOT control the shadow stack. When ret executes:

Regular Stack:              Shadow Stack:
[rbp+8] = 0xdeadbeef       [SSP] = 0x401234 (legitimate return address)
(attacker-controlled)       (CPU-protected, unchanged)

On RET:
  Regular: load 0xdeadbeef
  Shadow:  load 0x401234
  Compare: 0xdeadbeef != 0x401234
  Result:  #CP exception → process terminated

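The comparison is in hardware. Nothing an attacker writes through ordinary stores changes the shadow stack value: it changes only through CALL, which updates both stacks consistently, or through dedicated shadow-stack instructions whose availability the OS controls. Anything else would require exploiting a vulnerability in the CET implementation itself.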

WRSS: The Shadow Stack Write Instruction

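Two instructions can write arbitrary values to the shadow stack. WRSS (Write to Shadow Stack) works at any privilege level, but only when the OS has set the write-enable bit (WR_SHSTK_EN) for that mode; Linux leaves it disabled for user space unless a process explicitly requests it. WRUSS (Write to User Shadow Stack) requires privilege level 0 and lets the kernel edit a user-mode shadow stack, for example during signal delivery. Neither is available to an attacker who only controls data on the regular stack.

For the common user-space cases, unprivileged helper instructions are enough. RDSSP reads the current shadow stack pointer and INCSSP pops entries without comparing them, which is what setjmp/longjmp need. SAVEPREVSSP and RSTORSSP switch between shadow stacks (for example, across user-level context switches) and are validated by a restore token stored on the target stack. None of these can place an attacker-chosen return address on the shadow stack.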

SHSTK and setjmp/longjmp

This is a real challenge in deploying SHSTK. longjmp jumps to an arbitrary saved context, which means ret will occur at a different call depth than where setjmp was called. With SHSTK, the shadow stack would have entries for the intervening frames, which must be cleaned up.

Solution: setjmp in a CET-aware glibc records the current SSP alongside the other saved registers. longjmp then computes how many shadow stack entries lie between the current SSP and the saved one and discards them with INCSSP before jumping. This is more complex than traditional longjmp, but it keeps the two stacks in step.
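The cleanup arithmetic can be sketched as follows, assuming the intervening entries are discarded with INCSSPQ, which takes its count from the low 8 bits of its operand and so pops at most 255 entries per instruction. The function name and the addresses in the usage note are hypothetical.

```python
ENTRY_SIZE = 8     # one shadow stack entry in 64-bit mode
MAX_INCSSP = 255   # INCSSPQ uses only the low 8 bits of its operand

def incssp_plan(current_ssp, saved_ssp):
    """Return the INCSSPQ counts needed to move SSP up to saved_ssp.

    The shadow stack grows downward, so the SSP saved at setjmp time is
    at a higher address than the SSP in a deeper frame.
    """
    distance = saved_ssp - current_ssp
    if distance < 0 or distance % ENTRY_SIZE:
        raise ValueError("saved SSP must be above current SSP and 8-byte aligned")
    remaining = distance // ENTRY_SIZE
    plan = []
    while remaining > 0:
        step = min(remaining, MAX_INCSSP)
        plan.append(step)
        remaining -= step
    return plan
```

For example, `incssp_plan(0x70000FE0, 0x70001000)` yields `[4]`: four frames were entered after setjmp, so four entries are popped. A jump across 300 frames needs two instructions, `[255, 45]`.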

Component 2: Indirect Branch Tracking (IBT)

The Problem IBT Addresses

Indirect calls (call rax, call [rbx+rdi*8]) and indirect jumps can be used for:
- JOP (Jump-Oriented Programming): like ROP but with gadgets ending in JMP
- Partial SROP bypasses
- Vtable hijacking in C++
- Function pointer corruption

IBT requires that every valid target of an indirect branch must begin with ENDBR64.

Hardware Enforcement

When IBT is enabled (the ENDBR_EN bit in the IA32_U_CET MSR for user mode, or in IA32_S_CET for supervisor mode):
1. Every indirect call or jump moves the CPU's branch tracker from IDLE to the WAIT_FOR_ENDBRANCH state
2. The next instruction executed, the one at the branch target, must be ENDBR64 (or ENDBR32 in 32-bit mode), which returns the tracker to IDLE
3. If that instruction is anything else, the CPU raises #CP

; Valid indirect call target:
foo:
    endbr64         ; required marker
    push    rbp
    ; ...

; Invalid indirect call target (calling here raises #CP):
foo+4:
    push    rbp     ; not endbr64 — IBT violation if jumped to indirectly
    ; ...
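The tracking rule can be modeled as a two-state machine. This is a simplified sketch: the class and method names are hypothetical, the "memory" is a bytes object indexed by address, and details such as the NOTRACK prefix are ignored.

```python
ENDBR64 = bytes.fromhex("f30f1efa")

class ControlProtectionFault(Exception):
    """Stands in for the #CP exception in this toy model."""

class IbtTracker:
    def __init__(self, memory):
        self.memory = memory   # toy code image; an address is an offset into it
        self.state = "IDLE"

    def indirect_branch(self, target):
        # An indirect CALL/JMP arms the tracker, then execution continues at target.
        self.state = "WAIT_FOR_ENDBRANCH"
        return self.fetch(target)

    def fetch(self, address):
        # While armed, the very next instruction must be ENDBR64.
        if self.state == "WAIT_FOR_ENDBRANCH":
            if self.memory[address:address + 4] != ENDBR64:
                raise ControlProtectionFault(f"missing ENDBR64 at {address:#x}")
            self.state = "IDLE"
        return address

# foo: endbr64 ; push rbp ; ret
code = ENDBR64 + b"\x55" + b"\xc3"
```

With this image, `IbtTracker(code).indirect_branch(0)` succeeds and leaves the tracker in IDLE, while `indirect_branch(4)`, which lands on `push rbp`, raises the simulated #CP.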

Effect on ROP and JOP Gadgets

Traditional ROP gadgets:

; pop rax; ret  — a useful ROP gadget
40125a:  58     pop    rax
40125b:  c3     ret

With IBT, using this as an indirect call target requires it to begin with ENDBR64. The 58 encoding of POP RAX is not ENDBR64 (F3 0F 1E FA). Landing here via call rax raises #CP.

This dramatically reduces the gadget surface: only explicitly marked function entries (and a few other specifically allowed targets) are valid indirect call destinations. For ROP chains that use ret gadgets (not indirect calls), IBT is less directly applicable — that is SHSTK's job.
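To make the reduction concrete, here is a small filter that keeps only the candidate addresses a CET-enabled CPU would accept as indirect branch targets. It is a sketch under simple assumptions: the code bytes, base address, and function name are made up for illustration, and the check is purely "do the first four bytes equal ENDBR64".

```python
ENDBR64 = bytes.fromhex("f30f1efa")

def ibt_valid_targets(code, base, candidates):
    """Keep only the addresses whose first four bytes are ENDBR64."""
    valid = []
    for address in candidates:
        offset = address - base
        if offset >= 0 and code[offset:offset + 4] == ENDBR64:
            valid.append(address)
    return valid

# Hypothetical image loaded at 0x401255:
#   401255: f3 0f 1e fa   endbr64
#   401259: 55            push rbp
#   40125a: 58            pop rax
#   40125b: c3            ret
code = ENDBR64 + bytes.fromhex("5558c3")
targets = ibt_valid_targets(code, 0x401255, [0x401255, 0x401259, 0x40125A])
```

Here `targets` contains only `0x401255`: the function entry survives, while the `pop rax; ret` gadget at `0x40125a` is rejected as an indirect branch destination.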

Interaction Between SHSTK and IBT

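They defend different vectors:
- SHSTK protects return addresses (backward edges). It makes ret gadgets unusable for forging control flow.
- IBT protects indirect calls and jumps (forward edges). It makes JOP and function-pointer corruption harder.

Together, they cover both directions of indirect control flow. IBT is coarse-grained, though: any ENDBR64-marked location is a valid target for any indirect branch, so redirecting a corrupted pointer to a different whole function remains possible.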

ENDBR64: The Instruction Encoding

ENDBR64 is encoded as F3 0F 1E FA (4 bytes). On CPUs without CET:
- F3 is a REP prefix, which has no effect here
- 0F 1E is an opcode in the reserved hint-NOP space
- FA is the ModRM byte that completes that NOP encoding

The net effect: F3 0F 1E FA decodes as a multi-byte NOP on non-CET CPUs. It executes harmlessly with no side effects. This is essential for binary compatibility: a binary with ENDBR64 markers runs correctly on old CPUs.

Checking CET status in a binary:

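objdump -M intel -d binary | head -40
# CET-enabled binary: functions that can be reached indirectly start with endbr64
# Non-CET binary: functions start with push rbp or sub rsp or similar

readelf -n binary | grep -i 'x86 feature'
# CET-marked binary prints a line such as: Properties: x86 feature: IBT, SHSTK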
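The marker that tools report lives in the .note.gnu.property section as a GNU_PROPERTY_X86_FEATURE_1_AND entry. The following sketch decodes one such note from raw bytes. It assumes a little-endian ELF64 note and leaves out locating the section inside the file; the function name is hypothetical, while the constants follow the x86-64 psABI.

```python
import struct

NT_GNU_PROPERTY_TYPE_0 = 5
GNU_PROPERTY_X86_FEATURE_1_AND = 0xC0000002
FEATURE_IBT = 0x1
FEATURE_SHSTK = 0x2

def cet_features(note):
    """Return the CET features declared by one .note.gnu.property note."""
    namesz, descsz, ntype = struct.unpack_from("<III", note, 0)
    name_end = 12 + namesz
    if ntype != NT_GNU_PROPERTY_TYPE_0 or note[12:name_end] != b"GNU\0":
        return set()
    pos = (name_end + 3) & ~3                 # name is padded to 4 bytes
    end = pos + descsz
    features = set()
    while pos + 8 <= end:
        pr_type, pr_datasz = struct.unpack_from("<II", note, pos)
        data = note[pos + 8:pos + 8 + pr_datasz]
        if pr_type == GNU_PROPERTY_X86_FEATURE_1_AND and len(data) == 4:
            bits = struct.unpack("<I", data)[0]
            if bits & FEATURE_IBT:
                features.add("IBT")
            if bits & FEATURE_SHSTK:
                features.add("SHSTK")
        pos += 8 + ((pr_datasz + 7) & ~7)     # properties are 8-byte aligned on ELF64
    return features

# Hand-built note declaring both features (feature bits = 0x3).
sample = (struct.pack("<III", 4, 16, NT_GNU_PROPERTY_TYPE_0) + b"GNU\0"
          + struct.pack("<III", GNU_PROPERTY_X86_FEATURE_1_AND, 4, 0x3)
          + b"\0" * 4)
```

Calling `cet_features(sample)` returns a set containing both `"IBT"` and `"SHSTK"`, matching what readelf would print for a fully marked binary.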

Current Deployment Status (as of 2025)

Hardware availability:
- Intel: Tiger Lake (2020) was the first generation with CET; it is present in later client generations and, on the server side, from Sapphire Rapids onward
- AMD: shadow stacks are supported starting with Zen 3 (2020); IBT was not part of that implementation
- ARM: Pointer Authentication (PAC) and Branch Target Identification (BTI) serve similar roles on ARM64

Software stack:
- Linux kernel: kernel-mode IBT since 5.18; user-space shadow stack support since 6.6. Kernel-mode shadow stacks and user-space IBT are not covered by those releases
- glibc: CET support introduced in 2.28 and reworked in 2.39 to match the kernel's user-space shadow stack interface (required for setjmp/longjmp compatibility)
- GCC: -fcf-protection=full enables IBT+SHSTK code generation (since GCC 8)
- Clang: -fcf-protection=full similarly supported

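Adoption:
- Major Linux distributions build packages with CET markers (-fcf-protection) as of 2022-2024
- Many system libraries (libc, libssl) ship with ENDBR64 markers in recent versions
- Marking is not enforcement: protection applies only when the CPU, kernel, and C library all support the feature and it is switched on for the process
- Not all software is recompiled yet; CET provides partial protection in mixed environments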

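Effectiveness in mixed environments: Each object declares CET support through a property note, and the dynamic linker combines these declarations at load time. If a library loaded by a CET-enabled process does NOT have ENDBR64 markers, the architecture allows IBT to be suppressed for that region through a legacy-code bitmap, but the simpler and more common policy is to turn the feature off for the whole process. Either way, compatibility is preserved at the cost of protection for calls into legacy code.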
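A strict alternative to per-region suppression is to treat the marker as an AND across every loaded object: a feature stays on only if all of them declare it. The sketch below models that rule. The function name and the library list are hypothetical, and real loaders offer additional modes (such as permissive handling) that are not shown.

```python
FEATURE_IBT = 0x1
FEATURE_SHSTK = 0x2

def process_features(objects):
    """AND together the CET feature bits declared by every loaded object."""
    enabled = FEATURE_IBT | FEATURE_SHSTK
    for _name, bits in objects:
        enabled &= bits
    return enabled

# Hypothetical process: the main program and libc are fully marked,
# but one legacy library was built without IBT markers.
loaded = [
    ("app", FEATURE_IBT | FEATURE_SHSTK),
    ("libc.so.6", FEATURE_IBT | FEATURE_SHSTK),
    ("liblegacy.so", FEATURE_SHSTK),
]
result = process_features(loaded)
```

In this example `result` equals `FEATURE_SHSTK`: a single library without ENDBR64 markers costs the whole process its IBT protection, which is why incomplete recompilation matters so much in practice.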

What CET Does NOT Solve

CET is a significant advance, not a panacea:
- Logic vulnerabilities (authentication bypass, SQL injection, command injection) are not memory corruption and are unaffected by CET
- Heap UAF without control-flow hijack can still corrupt data
- Side-channel attacks (Spectre/Meltdown) leak data rather than hijack architectural control flow, so they fall outside what CET checks
- Kernel exploits require kernel-mode CET (which is being deployed separately)
- Future bypass techniques may be developed that work within CET's constraints

The correct mental model: CET closes the traditional ROP/JOP exploitation paths very effectively. Sophisticated attackers with a sufficiently complex vulnerability chain can potentially still find ways to achieve controlled execution, but the bar is enormously higher. Defense in depth remains essential.

Assembly-Level Summary

A function in a fully CET+mitigations-enabled binary:

my_function:
    endbr64              ; IBT: valid indirect call target
    push    rbp
    mov     rbp, rsp
    sub     rsp, N
    mov     rax, [fs:0x28]   ; canary prologue
    mov     [rbp-8], rax
    xor     eax, eax
    ; ... function body ...
    mov     rax, [rbp-8]     ; canary epilogue
    xor     rax, [fs:0x28]
    jne     __stack_chk_fail
    leave
    ret                  ; SHSTK: CPU compares shadow stack on RET

Three layers of protection are visible in the assembly: CET IBT (endbr64), the stack canary (read + store + check), and CET SHSTK (implicit in ret). With NX, ASLR, and Full RELRO added at the OS/linker level, this represents the current state of the art in memory corruption defense.