Case Study 36-2: Intel CET — The Hardware Solution to Memory Corruption
Introduction
Intel Control-flow Enforcement Technology (CET) represents a qualitative shift in how memory corruption is defended against. Previous mitigations (stack canaries, ASLR, NX) were applied largely by the compiler and the OS, and attackers found ways to bypass each of them over time. CET moves control-flow enforcement into the CPU itself: return addresses are checked against a shadow stack that ordinary software writes cannot reach.
This case study examines CET's two components (SHSTK and IBT) at the microarchitecture level, explains why each addresses a different aspect of the exploitation problem, analyzes why SHSTK defeats traditional ROP, and discusses the current deployment and adoption status.
The Problem CET Solves
Return-Oriented Programming (Chapter 37) works because:
1. The return address on the stack is writable memory
2. ret pops the return address from the stack — no other validation
3. If the return address is corrupted, execution goes wherever the attacker chose
Stack canaries detect corruptions that pass through the canary. But if the attacker can:
- Leak the canary (format string, out-of-bounds read), then
- Overwrite the return address with the correct canary value intact
...the canary check passes and execution proceeds to the attacker's target.
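The canary bypass described above can be modelled in a few lines. This is a minimal sketch, not real exploit code: the frame layout, the canary value, and both addresses are hypothetical, and the "stack" is just a byte array.

```python
import struct

CANARY = 0x1122334455667700   # hypothetical per-process canary value
LEGIT_RET = 0x401234          # hypothetical legitimate return address

def make_frame():
    """Frame layout, low to high: 16-byte buffer, canary, saved rbp, return address."""
    return bytearray(16) + struct.pack("<QQQ", CANARY, 0x7FFFFFFFE000, LEGIT_RET)

def overflow(frame, payload):
    """Unchecked copy starting at the buffer, like an unbounded memcpy."""
    frame[:len(payload)] = payload
    return frame

def function_return(frame):
    """Epilogue: verify the canary, then 'ret' to whatever address is stored."""
    canary, _saved_rbp, ret = struct.unpack_from("<QQQ", frame, 16)
    if canary != CANARY:
        return "abort: __stack_chk_fail"
    return f"ret to {ret:#x}"

# Overflow without knowing the canary: the check catches it.
blind = b"A" * 16 + struct.pack("<QQQ", 0x4141414141414141, 0, 0xDEADBEEF)
print(function_return(overflow(make_frame(), blind)))

# Overflow that rewrites the leaked canary in place: the check passes.
informed = b"A" * 16 + struct.pack("<QQQ", CANARY, 0, 0xDEADBEEF)
print(function_return(overflow(make_frame(), informed)))
```

The second call returns to the attacker's address because the only check consults a value that lives in the same writable memory as the data being corrupted.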
SHSTK eliminates the possibility of forging a return address regardless of whether the canary is known, because the comparison happens against hardware-protected memory.
Component 1: Shadow Stack (SHSTK)
Microarchitecture Design
The shadow stack is a region of memory with a special page table attribute: DIRTY=1, WRITE=0 in the page table entry. This combination, enforced by the CPU, means:
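- Software MOV [addr], value to a shadow stack page: #PF (page fault), because the page is not writable by ordinary stores
- Software PUSH targeting a shadow stack page: faults the same way; ordinary loads (MOV from memory, POP) can still read the page
- Only shadow-stack accesses can write to these pages: the implicit push performed by CALL, plus dedicated instructions such as WRSS when the OS has enabled them (RET only reads and compares)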
Attribute    Regular stack page          Shadow stack page
Present      1                           1
Writable     1                           0  ← software cannot write
User         1 (or 0 for kernel)         1
NX           typically 1 (no execute)    typically 1 (no execute)
Dirty        varies                      1  ← CET shadow stack marker
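As a rough illustration of how the Writable=0, Dirty=1 combination identifies a shadow stack page, the sketch below classifies a raw 64-bit page table entry. The bit positions are the standard x86-64 ones; the function name and the returned labels are hypothetical, and the model assumes CET shadow stacks are enabled (without CET, the same bit pattern is just a read-only page that happens to be dirty).

```python
# Standard x86-64 page table entry bit positions.
PTE_PRESENT = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_USER = 1 << 2
PTE_DIRTY = 1 << 6
PTE_NX = 1 << 63

def classify_pte(pte):
    """Classify a PTE the way a CET-enabled CPU treats it (simplified model)."""
    if not pte & PTE_PRESENT:
        return "not present"
    writable = bool(pte & PTE_WRITABLE)
    dirty = bool(pte & PTE_DIRTY)
    if not writable and dirty:
        return "shadow stack"      # Writable=0, Dirty=1 is the marker
    return "writable data" if writable else "read-only"

regular_stack = PTE_PRESENT | PTE_WRITABLE | PTE_USER | PTE_NX
shadow_stack = PTE_PRESENT | PTE_USER | PTE_DIRTY | PTE_NX

print(classify_pte(regular_stack))
print(classify_pte(shadow_stack))
```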
The CPU has a new register: SSP (Shadow Stack Pointer). On CALL:
1. Normal stack: push return address onto RSP stack
2. Shadow stack: CPU decrements SSP by 8, then writes the return address at the new SSP
On RET:
1. Normal stack: load return address from RSP, increment RSP
2. Shadow stack: load address from SSP, increment SSP
3. Compare both: if not equal, raise #CP (Control Protection exception)
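The CALL and RET steps above can be sketched as a small behavioural model. It is a simplification: Python lists stand in for the two stacks, the class and exception names are hypothetical, and the only thing modelled is the push-both, pop-both, compare sequence.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP exception."""

class Cpu:
    def __init__(self):
        self.stack = []          # regular stack: software may overwrite entries
        self.shadow_stack = []   # shadow stack: only call/ret touch it in this model

    def call(self, return_address):
        # CALL pushes the same return address onto both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # RET pops both copies and compares them before transferring control.
        regular = self.stack.pop()
        shadow = self.shadow_stack.pop()
        if regular != shadow:
            raise ControlProtectionFault(f"{regular:#x} != {shadow:#x}")
        return regular

cpu = Cpu()
cpu.call(0x401234)
print(hex(cpu.ret()))            # matching copies: normal return

cpu.call(0x401234)
cpu.stack[-1] = 0xDEADBEEF       # simulated overflow of the regular stack only
try:
    cpu.ret()
except ControlProtectionFault as fault:
    print("#CP:", fault)
```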
Why Software Cannot Bypass SHSTK
An attacker who overflows a buffer and overwrites the return address on the regular stack does NOT control the shadow stack. When ret executes:
Regular Stack:                  Shadow Stack:
[rbp+8] = 0xdeadbeef            [SSP] = 0x401234 (legitimate return address)
(attacker-controlled)           (CPU-protected, unchanged)
On RET:
    Regular: load 0xdeadbeef
    Shadow:  load 0x401234
    Compare: 0xdeadbeef != 0x401234
    Result:  #CP exception → process terminated
The comparison is in hardware. There is no code path that can modify the shadow stack value without (1) using CALL, which sets both stacks consistently, (2) using a dedicated shadow-stack instruction such as WRSS, which works only if the OS has enabled it, or (3) exploiting a hardware vulnerability in the CET implementation itself.
WRSS: The Shadow Stack Write Instruction
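Ordinary stores cannot write to the shadow stack, but one dedicated instruction can: WRSS (Write to Shadow Stack, encoded as WRSSD/WRSSQ). It exists to support stack unwinding and other runtime code that legitimately needs to edit shadow stack contents. WRSS is not tied to ring 0. Instead, it raises #UD unless the OS has set the WR_SHSTK_EN bit in the CET MSR for the current privilege level (IA32_U_CET for user mode), and that bit is normally left clear, so user space cannot use WRSS by default. A separate instruction, WRUSS, is restricted to ring 0 and lets the kernel write to user shadow stacks.
For setjmp/longjmp, user space needs no write access at all: RDSSP reads the current shadow stack pointer so that it can be saved, and INCSSP pops entries to move the pointer back up to a saved value. Switching to a different shadow stack (for example in swapcontext-style context switches) uses RSTORSSP and SAVEPREVSSP, which validate a restore token placed on the target stack. None of these instructions allow arbitrary shadow stack writes.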
SHSTK and setjmp/longjmp
This is a real challenge in deploying SHSTK. longjmp jumps to an arbitrary saved context, which means ret will occur at a different call depth than where setjmp was called. With SHSTK, the shadow stack would have entries for the intervening frames, which must be cleaned up.
Solution: in a CET-aware glibc, setjmp records the shadow stack pointer alongside the other saved registers, and longjmp unwinds the shadow stack to the depth of the setjmp call site by popping the intervening entries with INCSSP until SSP matches the saved value. This is more complex than traditional longjmp but works correctly.
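The cleanup of intervening frames amounts to simple pointer arithmetic. The sketch below assumes 8-byte entries and assumes the pointer is advanced with INCSSPQ-style steps, each of which can pop at most 255 entries because the instruction takes its count from the low 8 bits of its operand. The function name and the addresses are hypothetical.

```python
def incssp_steps(current_ssp, saved_ssp, entry_size=8, max_count=255):
    """Return the per-step entry counts needed to raise SSP to saved_ssp."""
    distance = saved_ssp - current_ssp
    if distance < 0 or distance % entry_size:
        raise ValueError("saved SSP must be at or above the current SSP, entry-aligned")
    remaining = distance // entry_size
    steps = []
    while remaining > 0:
        count = min(remaining, max_count)   # one INCSSPQ pops at most 255 entries
        steps.append(count)
        remaining -= count
    return steps

# setjmp saved SSP three frames above the current position.
print(incssp_steps(0x7000, 0x7000 + 3 * 8))

# A deep unwind needs several steps.
print(incssp_steps(0x7000, 0x7000 + 600 * 8))
```

The shadow stack grows downward, so "unwinding" means moving SSP to a higher address; nothing is written, which is why this path needs no write access to shadow stack pages.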
Component 2: Indirect Branch Tracking (IBT)
The Problem IBT Addresses
Indirect calls (call rax, call [rbx+rdi*8]) and indirect jumps can be used for:
- JOP (Jump-Oriented Programming): like ROP but with gadgets ending in JMP
- Partial SROP bypasses
- Vtable hijacking in C++
- Function pointer corruption
IBT requires that every valid target of an indirect branch must begin with ENDBR64.
Hardware Enforcement
When IBT is enabled (controlled by the ENDBR_EN bit in the IA32_U_CET MSR for user mode, or in IA32_S_CET for supervisor mode):
1. Every tracked indirect call/jump puts the CPU's IBT state machine into the WAIT_FOR_ENDBRANCH state
2. The first instruction at the target must be ENDBR64 (or ENDBR32 for 32-bit mode)
3. If the first instruction is anything else, the CPU raises #CP
; Valid indirect call target:
foo:
    endbr64             ; required marker (4 bytes)
    push rbp
    ; ...
; Invalid indirect call target (calling here raises #CP):
foo+4:
    push rbp            ; not endbr64: IBT violation if jumped to indirectly
    ; ...
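The landing-pad check can be modelled as a lookup on raw code bytes. This is a simplified sketch: the function and exception names are hypothetical, the code bytes are a hand-assembled stand-in for foo, and the model ignores details such as NOTRACK-prefixed branches.

```python
ENDBR64 = bytes.fromhex("f30f1efa")

class ControlProtectionFault(Exception):
    """Stands in for the #CP exception."""

def indirect_branch(code, target):
    """Model an indirect call/jump to code[target] with IBT enabled.

    The branch arms the tracker (WAIT_FOR_ENDBRANCH); the instruction at the
    target must then be ENDBR64, otherwise #CP is raised.
    """
    if code[target:target + 4] != ENDBR64:
        raise ControlProtectionFault(f"no ENDBR64 at offset {target}")
    return target + 4   # execution continues after the marker

# endbr64; push rbp; mov rbp, rsp; ret
foo = ENDBR64 + bytes.fromhex("55" "4889e5" "c3")

print(indirect_branch(foo, 0))       # function entry: allowed
try:
    indirect_branch(foo, 4)          # mid-function (push rbp): rejected
except ControlProtectionFault as fault:
    print("#CP:", fault)
```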
Effect on ROP and JOP Gadgets
Traditional ROP gadgets:
; pop rax; ret — a useful ROP gadget
40125a: 58 pop rax
40125b: c3 ret
With IBT, using this as an indirect call target requires it to begin with ENDBR64. The 58 encoding of POP RAX is not ENDBR64 (F3 0F 1E FA). Landing here via call rax raises #CP.
This dramatically reduces the gadget surface: only explicitly marked function entries (and a few other specifically allowed targets) are valid indirect call destinations. For ROP chains that use ret gadgets (not indirect calls), IBT is less directly applicable — that is SHSTK's job.
Interaction Between SHSTK and IBT
They defend different vectors:
- SHSTK: protects return addresses. Makes ret gadgets unusable for forging control flow.
- IBT: protects indirect calls and jumps. Makes JOP and function-pointer corruption harder.
Together, they cover both edges of indirect control flow: the backward edge (returns) and the forward edge (indirect calls and jumps).
ENDBR64: The Instruction Encoding
ENDBR64 is encoded as F3 0F 1E FA (4 bytes). On CPUs without CET:
- F3 is a REP prefix, which is ignored on this opcode
- 0F 1E is a two-byte opcode in the reserved hint-NOP space
- FA is the ModRM byte that completes that hint-NOP encoding
The net effect: F3 0F 1E FA decodes as a multi-byte NOP on non-CET CPUs. It executes harmlessly with no side effects. This is essential for binary compatibility: a binary with ENDBR64 markers runs correctly on old CPUs.
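Because the marker is a fixed 4-byte pattern, candidate landing pads can be located with a plain byte search. The sketch below is a rough approximation rather than a disassembler: the function name is hypothetical, and a raw search will also report the pattern if it happens to occur inside an immediate operand or data.

```python
ENDBR64 = bytes.fromhex("f30f1efa")

def find_endbr64(code):
    """Return every offset in code where the ENDBR64 byte pattern starts."""
    offsets = []
    start = code.find(ENDBR64)
    while start != -1:
        offsets.append(start)
        start = code.find(ENDBR64, start + 1)
    return offsets

# Two tiny hand-assembled functions:
#   f: endbr64; push rbp; ret
#   g: endbr64; pop rax; ret
text = ENDBR64 + bytes.fromhex("55c3") + ENDBR64 + bytes.fromhex("58c3")

print(find_endbr64(text))
```

Every other offset in the buffer, including the pop rax; ret sequence inside g, is not a valid indirect branch target under IBT.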
Checking CET status in a binary:
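objdump -M intel -d binary | head -40
# CET-enabled binary: functions that may be indirect-call targets begin with endbr64
# Non-CET binary: functions start with push rbp or sub rsp or similar
readelf -n binary | grep -i 'x86 feature'
# Lists IBT and/or SHSTK when the binary is marked as CET-compatible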
Current Deployment Status (as of 2025)
Hardware availability:
- Intel: Tiger Lake (2020) and subsequent client processors; Sapphire Rapids and later on the server side
- AMD: shadow stack support since Zen 3 (2020); IBT is not available on those parts
- ARM: Pointer Authentication (PAC) and Branch Target Identification (BTI) serve similar roles on ARM64
Software stack:
- Linux kernel: IBT protecting the kernel itself since 5.18; SHSTK for user-space processes since 6.6; kernel-mode SHSTK is a separate effort
- glibc: CET-aware since 2.28 (required for setjmp/longjmp compatibility)
- GCC: -fcf-protection=full enables IBT+SHSTK code generation (since GCC 8)
- Clang: -fcf-protection=full similarly supported
Adoption:
- Major Linux distributions enable CET in packages as of 2022-2024
- Many system libraries (libc, libssl) ship with ENDBR64 markers in recent versions
- Not all software is recompiled yet; CET provides partial protection in mixed environments
Effectiveness in mixed environments:
If a library loaded by a CET-enabled process does NOT have ENDBR64 markers, enforcement has to be relaxed. Depending on the implementation, the dynamic linker either runs that library with IBT suppressed for that region (a "legacy-compatible" mode) or switches the feature off for the whole process. Both options preserve compatibility but reduce protection.
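One simple policy a loader could apply, sketched here as an assumption rather than a description of any particular dynamic linker, is to keep a feature enabled only when every loaded object advertises it. The two bit values are those of the GNU_PROPERTY_X86_FEATURE_1_AND property in the ELF .note.gnu.property section; the object names and the function are hypothetical.

```python
# Bits of GNU_PROPERTY_X86_FEATURE_1_AND in .note.gnu.property.
FEATURE_IBT = 0x1
FEATURE_SHSTK = 0x2

def effective_features(objects):
    """AND together the CET markers of all loaded objects (process-wide policy)."""
    features = FEATURE_IBT | FEATURE_SHSTK
    for marker_bits in objects.values():
        features &= marker_bits
    return features

def describe(features):
    names = []
    if features & FEATURE_IBT:
        names.append("IBT")
    if features & FEATURE_SHSTK:
        names.append("SHSTK")
    return names or ["none"]

all_marked = {"app": 0x3, "libc.so.6": 0x3, "libssl.so.3": 0x3}
one_legacy = {"app": 0x3, "libc.so.6": 0x3, "liblegacy.so": 0x2}  # no IBT marker

print(describe(effective_features(all_marked)))
print(describe(effective_features(one_legacy)))
```

Under this policy a single unmarked library costs the whole process the corresponding feature, which is why recompiling dependencies matters as much as recompiling the application.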
What CET Does NOT Solve
CET is a significant advance, not a panacea:
- Logic vulnerabilities (authentication bypass, SQL injection, command injection) are not memory corruption and are unaffected by CET
- Heap UAF without control-flow hijack can still corrupt data
- Side-channel attacks (Spectre/Meltdown) bypass CET entirely
- Kernel exploits require kernel-mode CET (which is being deployed separately)
- Future bypass techniques may be developed that work within CET's constraints
The correct mental model: CET closes the traditional ROP/JOP exploitation paths very effectively. Sophisticated attackers with a sufficiently complex vulnerability chain can potentially still find ways to achieve controlled execution, but the bar is enormously higher. Defense in depth remains essential.
Assembly-Level Summary
A function in a fully CET+mitigations-enabled binary:
my_function:
    endbr64                 ; IBT: valid indirect call target
    push rbp
    mov rbp, rsp
    sub rsp, N
    mov rax, [fs:0x28]      ; canary prologue
    mov [rbp-8], rax
    xor eax, eax
    ; ... function body ...
    mov rax, [rbp-8]        ; canary epilogue
    xor rax, [fs:0x28]
    jne __stack_chk_fail
    leave
    ret                     ; SHSTK: CPU compares shadow stack on RET
Three layers of protection are visible in the assembly: CET IBT (endbr64), the stack canary (read + store + check), and CET SHSTK (implicit in ret). With NX, ASLR, and Full RELRO added at the OS/linker level, this represents the current state of the art in memory corruption defense.