Chapter 36: Exploit Mitigations

The Arms Race

Security engineering is not a one-time fix — it is an ongoing contest. Each mitigation closes one attack vector, the attacker community finds another, and the cycle repeats. Understanding this history is essential for understanding why each mitigation is designed the way it is and why multiple mitigations together are far stronger than any one alone.

The mitigation history in rough chronological order:

  1. Stack canaries (1998): detect overwrites of the return address before ret
  2. Non-executable stack / NX / DEP (2000-2004): prevent executing shellcode on the stack
  3. ASLR (2001): randomize addresses so gadgets and code are hard to find
  4. PIE (position-independent executables): randomize the executable's own load address
  5. RELRO (2003): protect the GOT from overwrites
  6. CFI / CET (2015-2020): validate indirect calls and protect return addresses in hardware

Each one was deployed in response to demonstrated exploitation. Each one was subsequently partially bypassed by sophisticated techniques. The current generation (CET with shadow stacks) is far more robust than its predecessors, though the story continues.

This chapter examines each mitigation at the assembly level: what it does, how it is implemented in the code the compiler generates, and what conditions must hold for it to be bypassed.

🔐 Security Note: Understanding bypass conditions is not an invitation to bypass mitigations. It is essential for: choosing which mitigations to deploy and in what combination; understanding which code patterns create bypass opportunities; designing the next generation of mitigations; and auditing systems to ensure mitigations are actually effective. Security professionals need to understand attacks to build defenses.

Stack Canaries

The Concept

A stack canary is a random value placed on the stack between the local variables and the return address. Before the function returns, the canary is checked. If it has changed, someone wrote past the end of the local variables — which is the definition of a stack buffer overflow targeting the return address. The process is terminated immediately.

The name comes from the "canary in the coal mine": a warning that something is wrong.

The Assembly Implementation

GCC inserts canary code in the prologue and epilogue of protected functions. Here is what the compiler generates for a function compiled with -fstack-protector-strong:

Prologue:

; Standard prologue first
push    rbp
mov     rbp, rsp
sub     rsp, N

; Canary insertion
mov     rax, QWORD PTR fs:0x28     ; read the thread-local canary value
mov     QWORD PTR [rbp-8], rax     ; store it just below saved RBP
xor     eax, eax                   ; clear rax (don't leak the canary)

Epilogue:

; Canary check
mov     rax, QWORD PTR [rbp-8]     ; reload canary from stack
xor     rax, QWORD PTR fs:0x28     ; XOR with the original value
je      .ok                        ; if zero (unchanged), OK
call    __stack_chk_fail            ; canary changed! abort

.ok:
leave
ret

The stack layout with a canary:

High address
┌───────────────────────────────┐
│  Return address (8 bytes)     │ ← rbp + 8
├───────────────────────────────┤
│  Saved RBP (8 bytes)          │ ← rbp
├───────────────────────────────┤
│  Canary (8 bytes)             │ ← rbp - 8
├───────────────────────────────┤
│  Local variables              │ ← buf starts here
│  ...                          │
└───────────────────────────────┘ ← rsp

An overflow that reaches the return address must pass through the canary first. If any byte of the canary changes, the check fails.
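The layout and the check can be modeled in a few lines of Python. This is a minimal sketch rather than what the compiler emits: the frame is a byte array laid out from low to high addresses, and BUF_SIZE, make_frame, unchecked_copy, and canary_intact are hypothetical names chosen for illustration.

```python
import os
import struct

BUF_SIZE = 16  # hypothetical size of the local buffer


def make_frame(canary, saved_rbp, ret_addr):
    """Lay out buffer | canary | saved RBP | return address, low to high."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<QQQ", canary, saved_rbp, ret_addr))


def unchecked_copy(frame, data):
    """Write into the buffer with no bounds check, like an unsafe copy loop."""
    frame[0:len(data)] = data


def canary_intact(frame, canary):
    """Model of the epilogue check: compare the stack copy with the reference."""
    (stored,) = struct.unpack_from("<Q", frame, BUF_SIZE)
    return stored == canary


canary = int.from_bytes(os.urandom(8), "little") & ~0xFF  # low byte forced to zero
frame = make_frame(canary, saved_rbp=0x7FFD00001000, ret_addr=0x401234)

unchecked_copy(frame, b"A" * BUF_SIZE)          # stays inside the buffer
print("in-bounds write, canary intact:", canary_intact(frame, canary))

unchecked_copy(frame, b"A" * (BUF_SIZE + 24))   # runs through to the return address
print("overflow, canary intact:", canary_intact(frame, canary))
```

The second write cannot reach the return address bytes without first replacing the eight canary bytes, which is why the check catches it.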

Where the Canary Lives: fs:0x28

The canary is stored in thread-local storage, accessed via the fs segment register. On Linux x86-64, fs:0x0 points to the Thread Control Block (TCB). fs:0x28 is the stack canary slot in the TCB.

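The canary value is initialized at program start by the C runtime, using random bytes supplied by the kernel's random number generator, so it differs on every process execution. The low byte is set to zero. On little-endian x86-64 that zero byte sits at the lowest address of the canary, directly above the locals, which has two effects: a string read that runs off the end of a buffer stops at the null byte before reaching the random bytes, and a strcpy-style overflow cannot write the null byte and then keep copying, so it cannot restore the canary and continue on to the return address.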
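A short Python sketch of that construction, under the assumption that eight random bytes with the low byte cleared is a fair model of what the runtime does. The helper names new_canary and c_string_at are hypothetical, and the "stack" is just a byte string with an unterminated buffer placed directly below the canary.

```python
import os


def new_canary():
    """Return a 64-bit value whose least significant byte is zero."""
    value = int.from_bytes(os.urandom(8), "little")
    return value & 0xFFFFFFFFFFFFFF00


def c_string_at(memory, start):
    """Read bytes the way a C string function would: stop at the first NUL."""
    end = memory.index(b"\x00", start)
    return memory[start:end]


canary = new_canary()

# An unterminated buffer sitting directly below the canary (little-endian,
# so the zero byte is the first canary byte in memory).
stack = b"user input" + canary.to_bytes(8, "little")

leaked = c_string_at(stack, 0)
print("string read returns:", leaked)
print("random bits left to guess:", 64 - 8)
```

The string read stops at the canary's zero byte, so none of the seven random bytes are returned.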

GCC Canary Flags

| Flag | Behavior |
|------|----------|
| -fstack-protector | Protects functions with vulnerable-looking buffers (char arrays, alloca calls) |
| -fstack-protector-strong | Protects functions with arrays or address-of-local-variable operations (recommended) |
| -fstack-protector-all | Protects every function (higher overhead, maximum protection) |
| -fno-stack-protector | Disables protection (never use in production) |

Canary Bypass Condition

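For a linear overflow that runs from a local buffer up to the return address, the only way past the canary without triggering the check is to write the correct value back over it, which means knowing that value before the overflow. The canary can be obtained if:

  • There is a format string vulnerability (%p reading the stack)
  • There is a read-past-end vulnerability that reveals the canary value
  • The attacker can recover it some other way, for example by guessing one byte at a time against a server that forks workers, since every child inherits the parent's canary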

This is why format string vulnerabilities are often step one in a staged exploit: leak the canary, then overflow past it with the correct canary value intact.

⚙️ How It Works: The key security property is that the canary is 8 bytes of random data, initialized once at process start. An attacker who cannot read memory (no information leak) has at most a 1-in-2^56 chance of guessing the canary correctly (the low byte is always zero, so only 56 bits are random). That probability is negligible for practical attacks.

NX/DEP: Non-Executable Memory

The Concept

NX (No-eXecute) is a hardware feature that lets the operating system mark pages of memory as non-executable. If the CPU tries to fetch an instruction from a page whose NX bit is set, it raises a page fault instead of executing. DEP (Data Execution Prevention) is Microsoft's name for the same technology on Windows.

NX killed classic shellcode injection: even if you write shellcode into the stack or heap, you cannot execute it there.

The NX Bit

In x86-64 page table entries, bit 63 is the NX bit (also called the XD bit — eXecute Disable):

Page Table Entry (64 bits):
  Bit 63:  NX — 1 = non-executable
  Bits 62-52: (ignored or used for other purposes)
  Bits 51-12: Physical page frame number
  Bits 11-0:  Control bits (present, RW, user, etc.)
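The field layout above can be decoded with plain bit masks. The following is an illustrative sketch: decode_pte is a hypothetical helper, the example entries are made-up values, and only three of the low control bits (present, writable, user) are decoded.

```python
NX_BIT = 1 << 63
FRAME_MASK = 0x000FFFFFFFFFF000  # bits 51-12
FLAG_PRESENT = 1 << 0
FLAG_WRITABLE = 1 << 1
FLAG_USER = 1 << 2


def decode_pte(entry):
    """Split a 64-bit page table entry into the fields discussed above."""
    return {
        "present": bool(entry & FLAG_PRESENT),
        "writable": bool(entry & FLAG_WRITABLE),
        "user": bool(entry & FLAG_USER),
        "executable": not (entry & NX_BIT),
        "frame": (entry & FRAME_MASK) >> 12,
    }


# Made-up entries: a writable, non-executable stack page and a
# read-only, executable code page.
stack_pte = NX_BIT | (0x1234 << 12) | FLAG_PRESENT | FLAG_WRITABLE | FLAG_USER
code_pte = (0x5678 << 12) | FLAG_PRESENT | FLAG_USER

print(decode_pte(stack_pte))
print(decode_pte(code_pte))
```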

The operating system sets the NX bit when mapping pages:

  • .text section (code): NX = 0 (executable)
  • .data, .bss, .rodata, stack, heap: NX = 1 (non-executable)

On AMD processors, the feature is called "NX" (No eXecute). On Intel, it is "XD" (eXecute Disable). Both are enabled by default on modern systems.

Checking NX Status

# Check if a binary has NX enabled
checksec --file=./binary

# Output:
# NX:        NX enabled

# Check the stack mapping for a running process
grep stack /proc/PID/maps
# rw-p (no 'x') = non-executable
# rwxp = executable (NX disabled on this stack)
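The same check can be scripted. The sketch below parses a single maps line and inspects its permission field; parse_maps_line and is_executable are hypothetical helpers, and SAMPLE is a made-up line in the format the kernel prints, so the example runs without a live process.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into address range, permissions, path."""
    fields = line.split(maxsplit=5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return {"start": start, "end": end, "perms": fields[1], "path": path}


def is_executable(mapping):
    """True if the mapping's permission string contains the 'x' flag."""
    return "x" in mapping["perms"]


# Made-up example line; on a real system, read /proc/<pid>/maps instead.
SAMPLE = "7ffc1a2b3000-7ffc1a2d4000 rw-p 00000000 00:00 0                          [stack]"

stack = parse_maps_line(SAMPLE)
print(stack["path"], stack["perms"], "executable:", is_executable(stack))
```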

The execstack Exception

For historical compatibility (JIT compilers, nested functions with trampolines), Linux allows marking a stack executable with execstack -s binary. This is strongly discouraged. checksec reports this as NX disabled.

What NX Achieves

With NX: writing shellcode to the stack and returning into it fails because the CPU refuses to execute non-executable memory. The attacker must find executable code to redirect to.

Without NX (or with execstack): shellcode runs from the stack directly.

NX forces the attacker away from "inject and execute" toward "reuse existing code" — which is exactly what ROP (Chapter 37) does.

ASLR: Address Space Layout Randomization

The Concept

ASLR randomizes the base addresses of memory regions at process start. Every execution, the stack is at a different base address, the heap is at a different base address, and libraries are loaded at different base addresses. An attacker who hard-codes an address (to a gadget in libc, to the stack, to a buffer) cannot rely on that address being correct in any given execution.

ASLR in Linux

Linux ASLR is controlled by /proc/sys/kernel/randomize_va_space:

  • 0: disabled
  • 1: stack, libraries randomized
  • 2: stack, libraries, heap randomized (default)

Entropy varies by region. The figures below are approximate and depend on kernel version and configuration:

  • Stack: ~20-24 bits of randomization (aligned to page boundary)
  • Heap: ~13 bits
  • Libraries: ~28 bits (mmap randomization)
  • Executable: 0 bits unless PIE

# Check ASLR setting
cat /proc/sys/kernel/randomize_va_space
# 2 (full ASLR)

# Observe ASLR in action
for i in $(seq 5); do ldd /usr/bin/ls | grep libc; done
# Different addresses each time
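To put entropy figures in perspective, the arithmetic below estimates how likely a blind guess is to succeed. It is a sketch under a simplifying assumption: every attempt faces a freshly randomized layout, so attempts are independent. The function names are hypothetical.

```python
def layout_count(bits):
    """Number of equally likely base addresses for a given entropy."""
    return 2 ** bits


def success_probability(bits, attempts):
    """Chance that at least one of `attempts` independent guesses is right."""
    miss = 1 - 2 ** -bits
    return 1 - miss ** attempts


for bits in (8, 13, 28):
    p = success_probability(bits, attempts=256)
    print(f"{bits:2d} bits: {layout_count(bits):>11,d} layouts, "
          f"P(success in 256 tries) = {p:.2e}")
```

With 8 bits a few hundred attempts are likely to succeed, while at 28 bits the same effort is very unlikely to land a single hit.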

PIE: Position-Independent Executables

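ASLR on its own does NOT randomize the executable's own load address, only libraries and the stack/heap. A non-PIE executable loads at a fixed virtual address (typically 0x400000). Many current distributions configure GCC to produce PIE by default, but older toolchains and explicitly non-PIE builds still behave this way.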

PIE (Position-Independent Executable) enables randomization of the executable itself:

# Compile with PIE
gcc -pie -fPIE program.c -o program

# Without PIE: executable always at 0x400000
# With PIE: executable at a random address each run

PIE requires the executable's code to use RIP-relative addressing (all references to global variables and functions go through the GOT or use LEA with RIP-relative offsets). This is what -fPIE enables.

ASLR Bypass Conditions

  • 32-bit processes: Only ~8 bits of entropy in some configurations. Brute force (try all ~256 addresses) is feasible.
  • Information leaks: Any vulnerability that reveals a code or library address breaks ASLR for that address region. Knowing one libc address reveals the libc base (calculate: leaked_address - offset_in_libc = libc_base).
  • No PIE: Without PIE, the executable's address is fixed even with ASLR. Gadgets in the executable itself are at known addresses.
  • Partial overwrites: If the overflow can overwrite only the low bytes of the return address (rather than all 8), it can redirect within the same page while preserving the randomized upper bytes.
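Two of the conditions above reduce to simple address arithmetic. The sketch below shows the base calculation from a leak and a one-byte partial overwrite; the function names are hypothetical, and every address and offset is a made-up value, not taken from any real libc build.

```python
PAGE_SIZE = 0x1000


def libc_base_from_leak(leaked_address, symbol_offset):
    """leaked_address - offset_in_libc = libc_base, with a sanity check."""
    base = leaked_address - symbol_offset
    if base % PAGE_SIZE != 0:
        raise ValueError("base is not page-aligned: wrong offset or wrong leak")
    return base


def partial_overwrite(address, low_byte):
    """Replace only the lowest byte, keeping the randomized upper bytes."""
    return (address & ~0xFF) | (low_byte & 0xFF)


# Made-up values for illustration only.
LEAKED_PUTS = 0x7F3A1C280E50   # address of a symbol leaked at run time
PUTS_OFFSET = 0x80E50          # that symbol's offset inside the library
SYSTEM_OFFSET = 0x50D70        # offset of another symbol of interest

base = libc_base_from_leak(LEAKED_PUTS, PUTS_OFFSET)
print("libc base:      ", hex(base))
print("other symbol at:", hex(base + SYSTEM_OFFSET))
print("partial overwrite:", hex(partial_overwrite(0x55D0C0DE1234, 0x78)))
```

One leaked pointer is enough to place every other symbol in the same library, which is why a single information leak undoes the randomization for that region.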

📐 OS Kernel Project: The MinOS kernel runs in protected mode with no ASLR (a kernel manages its own address space, not the user-space randomization mechanism). For the Chapter 38 capstone, observe that your kernel loads at a fixed physical and virtual address. ASLR would require a bootloader that randomizes the kernel's load address and communicates that to the kernel, which is the idea behind kernel ASLR (KASLR) in mainstream operating systems.

RELRO: Relocation Read-Only

Background: The GOT and PLT

When a program calls a shared library function (like printf), the call goes through the PLT (Procedure Linkage Table):

; Call printf:
call    printf@PLT

; The PLT entry:
printf@PLT:
    jmp     QWORD PTR [printf@GOT]   ; indirect jump through GOT entry
    push    0                         ; (for lazy binding only)
    jmp     _dl_runtime_resolve

The GOT (Global Offset Table) holds the actual address of printf in memory. On first call (lazy binding), the GOT entry points to the resolver; after first call, it holds the real address.

The GOT is in the .got.plt section, which must be writable for lazy binding. An attacker who can write to the GOT can overwrite printf's entry with a chosen address — the next call to printf jumps there instead.
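The lazy-binding flow, and why a writable GOT is dangerous, can be modeled with a dictionary of callables. This is a toy model, not how the dynamic linker is implemented: the Process class, the library dictionary, and the function names are all hypothetical.

```python
class Process:
    """Toy model of PLT calls through a writable, lazily bound GOT."""

    def __init__(self, library):
        self.library = library
        self.resolved = []  # log of symbols the resolver had to look up
        # Before the first call, every GOT slot leads to the resolver.
        self.got = {name: self._make_stub(name) for name in library}

    def _make_stub(self, name):
        def stub(*args):
            self.resolved.append(name)
            self.got[name] = self.library[name]  # patch the GOT slot
            return self.got[name](*args)
        return stub

    def call_via_plt(self, name, *args):
        # Models: jmp QWORD PTR [name@GOT]
        return self.got[name](*args)


def real_printf(text):
    return f"printed: {text}"


def attacker_target(text):
    return f"hijacked with argument: {text}"


proc = Process({"printf": real_printf})
print(proc.call_via_plt("printf", "hello"))   # resolver runs, then the real function
print(proc.call_via_plt("printf", "again"))   # GOT now holds the real address
proc.got["printf"] = attacker_target          # models an attacker's GOT write
print(proc.call_via_plt("printf", "/bin/sh"))
```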

Partial RELRO

-Wl,-z,relro enables partial RELRO:

  • Reorders ELF sections so that .got (non-PLT GOT) and .init_array come before .data
  • Makes those sections read-only after the dynamic linker initializes them
  • Does NOT make .got.plt read-only (lazy binding still works)
  • Prevents overwrites of .got entries, but not of the .got.plt entries used by PLT calls

Full RELRO

-Wl,-z,relro -Wl,-z,now enables full RELRO:

  • Forces the dynamic linker to resolve ALL symbols at startup (no lazy binding)
  • After resolution, makes the entire .got.plt section read-only
  • Result: the GOT is completely unwritable during normal execution
  • Cost: slightly slower startup (all symbols resolved eagerly)

Check RELRO status:

checksec --file=./binary
# Partial RELRO: .got protected, .got.plt not
# Full RELRO:    entire GOT read-only
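The effect of Full RELRO can be pictured as resolving everything up front and then freezing the table. The sketch below uses Python's read-only mapping view as a stand-in for a page remapped read-only; bind_now and the function names are hypothetical, and a real violation would be a segmentation fault rather than a Python exception.

```python
from types import MappingProxyType


def bind_now(library):
    """Resolve every symbol eagerly, then return a read-only view of the GOT."""
    got = {name: func for name, func in library.items()}
    return MappingProxyType(got)


def real_printf(text):
    return f"printed: {text}"


def attacker_target(text):
    return f"hijacked: {text}"


got = bind_now({"printf": real_printf})

try:
    got["printf"] = attacker_target   # models a write to a read-only GOT page
    outcome = "overwrite succeeded"
except TypeError:
    outcome = "overwrite rejected"

print(outcome)
print(got["printf"]("still the real function"))
```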

RELRO Bypass Conditions

With Full RELRO, the binary's GOT can no longer be overwritten by an ordinary memory write. An attacker must find another target:

  • A writable code pointer in .data (function pointers in structs)
  • A heap-based function pointer (vtable, callback)
  • A ROP chain that does not need GOT overwrites

⚠️ Common Mistake: Enabling -z relro without -z now only gives partial RELRO. Many security checklists specify "RELRO enabled" without specifying which level. Always use Full RELRO (-z relro -z now) for production binaries.

CFI: Control Flow Integrity

The Problem with Indirect Calls

Indirect calls and indirect jumps are the attacker's leverage point:

  • call rax: jumps to wherever RAX points
  • jmp QWORD PTR [rbx+rdi*8]: jumps to a computed address
  • ret: jumps to the return address (which is on the stack)

If an attacker can control the target of any of these, they control execution. ROP exploits this: each ret becomes a mechanism for executing chosen code.

Clang CFI

Clang's CFI (enabled with -fsanitize=cfi, which also requires -flto) instruments every indirect call to check that the target is a valid function of the expected type. For an indirect call through a function pointer, the generated code has roughly this shape:

; Clang CFI around: (*fn_ptr)(arg), with fn_ptr already in rax
; Before the call, check that rax is in the set of valid function targets:
lea     rcx, [rip+.valid_targets]   ; load table of valid targets
; ... check logic comparing rax against the table ...
jne     __ubsan_handle_cfi_check_fail  ; schematic: real builds trap or call a handler
call    rax                          ; only if check passes

Clang maintains a "valid call targets" set for each function pointer type. An indirect call to a function not in the set triggers a CFI violation — which terminates the process.

Limitation: Clang CFI requires all code to be compiled with CFI enabled. Library calls that go through the PLT are not protected by default. Mixed-compilation environments reduce effectiveness.
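The "valid call targets per type" idea can be illustrated with a registry keyed by signature. This is a conceptual model only: Clang's real checks are emitted by the compiler and work on addresses, while cfi_target, checked_call, and the signature strings here are hypothetical.

```python
class CfiViolation(Exception):
    """Raised when an indirect call targets a function of the wrong type."""


VALID_TARGETS = {}  # signature string -> set of functions with that signature


def cfi_target(signature):
    """Decorator that registers a function as a valid target for a signature."""
    def register(func):
        VALID_TARGETS.setdefault(signature, set()).add(func)
        return func
    return register


def checked_call(signature, target, *args):
    """Model of an instrumented indirect call: check the set, then call."""
    if target not in VALID_TARGETS.get(signature, set()):
        raise CfiViolation(f"{target.__name__} is not a valid {signature} target")
    return target(*args)


@cfi_target("int(int)")
def double(x):
    return 2 * x


@cfi_target("void(char*)")
def log_message(message):
    return None


print(checked_call("int(int)", double, 21))
try:
    checked_call("int(int)", log_message, 21)
except CfiViolation as error:
    print("CFI violation:", error)
```

The check restricts a corrupted pointer to functions of the matching type, which narrows the attacker's choices without eliminating them.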

Intel CET: Control-flow Enforcement Technology

Intel CET is a hardware-enforced CFI mechanism, available on Intel Tiger Lake and newer processors (2020+). It has two components:

IBT: Indirect Branch Tracking

IBT requires that every valid target of an indirect call or jump (call rax, jmp rax, jmp [rax]) must begin with an ENDBR64 instruction (on x86-64):

; Every function that can be called indirectly must start with:
my_function:
    endbr64         ; Intel CET: this is a valid indirect branch target
    push    rbp
    mov     rbp, rsp
    ; ...

If an indirect call or jump lands somewhere without ENDBR64 at the target address, the CPU raises a control protection exception (#CP). Branching into the middle of a function, as gadget-based attacks do, does not reach an ENDBR64 at the gadget address, so it triggers the exception.

Effect on code reuse: A gadget like pop rax; ret cannot be used as the target of an indirect call or jump because it does not begin with ENDBR64, which dramatically reduces the gadget set available to call- and jump-oriented chains. IBT does not check ret itself; returns are covered by the shadow stack, described next.
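The IBT rule can be modeled as a check on the bytes at the branch target. In this sketch, indirect_branch_allowed is a hypothetical helper and CODE is a hand-assembled byte string: a small function that starts with ENDBR64, followed by a pop rax; ret sequence.

```python
ENDBR64 = bytes.fromhex("f30f1efa")


def indirect_branch_allowed(code, target):
    """True if the four bytes at `target` are the ENDBR64 encoding."""
    return code[target:target + 4] == ENDBR64


CODE = bytes.fromhex(
    "f30f1efa"  # offset 0: endbr64
    "55"        # offset 4: push rbp
    "4889e5"    # offset 5: mov rbp, rsp
    "58"        # offset 8: pop rax
    "c3"        # offset 9: ret
)

print("function entry (offset 0):", indirect_branch_allowed(CODE, 0))
print("pop rax; ret   (offset 8):", indirect_branch_allowed(CODE, 8))
```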

SHSTK: Shadow Stack

The shadow stack is a second, hardware-protected stack that stores only return addresses. It is maintained automatically by the CPU on every CALL and RET:

  • On CALL: the CPU pushes the return address to both the regular stack AND the shadow stack
  • On RET: the CPU pops the return address from both stacks and compares them
  • If they differ, the CPU raises a control protection exception

The shadow stack lives at a memory range marked with a special page attribute (shadow stack page). User-space code cannot write to it with ordinary MOV or PUSH instructions — only the CPU itself (via CALL/RET semantics) can push return addresses onto it.

Regular Stack:          Shadow Stack (hardware protected):
┌───────────────┐       ┌───────────────┐
│ return addr   │  ══>  │ return addr   │  (CPU copies on CALL)
│ (writable)    │       │ (NOT writable │
│               │       │  by software) │
└───────────────┘       └───────────────┘

On RET:
  pop return_addr from regular stack
  pop shadow_addr from shadow stack
  if return_addr != shadow_addr: #CP exception!

Effect on ROP: Even if an attacker overwrites the return address on the regular stack (via a buffer overflow), the shadow stack still has the correct return address. When ret executes, the mismatch is detected and the process is terminated. This directly defeats the fundamental mechanism of ROP chains.
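The CALL and RET behavior can be simulated with two lists. This is a toy model of the mechanism described above, not of the hardware: the Cpu class and the addresses are hypothetical, and the only rule it encodes is that software may write the regular stack while only call and ret touch the shadow stack.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP exception."""


class Cpu:
    def __init__(self):
        self.stack = []          # regular stack: writable by software
        self.shadow_stack = []   # shadow stack: only call/ret modify it

    def call(self, return_address):
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        return_addr = self.stack.pop()
        shadow_addr = self.shadow_stack.pop()
        if return_addr != shadow_addr:
            raise ControlProtectionFault(
                f"return to {return_addr:#x}, shadow stack says {shadow_addr:#x}")
        return return_addr


cpu = Cpu()
cpu.call(0x401234)
print("normal return to", hex(cpu.ret()))

cpu.call(0x401234)
cpu.stack[-1] = 0x4011A0   # models a buffer overflow rewriting the return address
try:
    cpu.ret()
except ControlProtectionFault as fault:
    print("#CP:", fault)
```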

CET in Assembly: The ENDBR64 Instruction

Every function and valid indirect call target in a CET-enabled binary begins with ENDBR64:

; objdump output of a CET-enabled function:
00000000004011a0 <process>:
  4011a0:  f3 0f 1e fa             endbr64
  4011a4:  55                      push   rbp
  4011a5:  48 89 e5                mov    rbp, rsp
  ; ...

ENDBR64 is encoded as F3 0F 1E FA. On CPUs without CET support, it executes as a NOP (harmless), because the encoding falls in the reserved hint-NOP space. On CET-enabled CPUs, it marks this address as a valid branch target.

The Shadow Stack in Practice

# Check if a binary has CET enabled
checksec --file=./binary
# IBT:        IBT enabled
# SHSTK:      SHSTK enabled

# Compile with CET support (GCC 9+, glibc 2.28+)
# The flag only marks the binary; enforcement also needs CPU and kernel support
gcc -fcf-protection=full program.c -o program

🔐 Security Note: CET SHSTK is a significant advance over previous mitigations. Unlike canaries (which can be bypassed by leaking the canary value) or ASLR (which can be bypassed by leaking an address), the shadow stack cannot be bypassed by any software write — it requires either a hardware vulnerability or a control flow path that legitimately executes ROP-like chains (which CFI also addresses). CET + CFI together represent the current best-practice hardware mitigation stack.

checksec: Auditing Binary Security

checksec is a tool that examines a binary's security features:

$ checksec --file=./vulnerable_server
[*] './vulnerable_server'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)
    RUNPATH:  b'/usr/local/lib'

Reading this output:

  • Partial RELRO: GOT entries for non-PLT functions are protected, but the .got.plt entries used by PLT calls are writable, so a GOT overwrite is still possible there.
  • Canary found: Stack canaries present. Overflow to return address requires first leaking the canary.
  • NX enabled: Stack and heap are non-executable. Classic shellcode injection does not work.
  • No PIE: Executable loads at 0x400000. Gadgets in the executable itself are at known addresses. ASLR does not protect the executable.
  • RUNPATH: A custom library search path is set, a potential shared-library hijacking concern (the Linux analogue of DLL hijacking) if an attacker can write to that directory.

A hardened production binary would show:

RELRO:    Full RELRO
Stack:    Canary found
NX:       NX enabled
PIE:      PIE enabled
RPATH:    No RPATH
RUNPATH:  No RUNPATH
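Comparing a report against the hardened baseline is easy to automate. The sketch below assumes the text layout shown in the listings above, which may differ between checksec versions; parse_report, audit, and the HARDENED table are hypothetical names, and REPORT repeats the example output from this section.

```python
HARDENED = {
    "RELRO": "Full RELRO",
    "Stack": "Canary found",
    "NX": "NX enabled",
    "PIE": "PIE enabled",
}


def parse_report(text):
    """Turn 'Key:   value' lines into a dictionary, skipping other lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and not key.startswith("["):
            fields[key.strip()] = value.strip()
    return fields


def audit(text):
    """Return a finding for each field that differs from the baseline."""
    fields = parse_report(text)
    findings = []
    for key, expected in HARDENED.items():
        actual = fields.get(key, "(missing)")
        if not actual.startswith(expected):
            findings.append(f"{key}: got '{actual}', want '{expected}'")
    return findings


REPORT = """\
[*] './vulnerable_server'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)
    RUNPATH:  b'/usr/local/lib'
"""

for finding in audit(REPORT):
    print(finding)
```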

How Mitigations Layer: Defense in Depth

No single mitigation stops all attacks. The power is in combination:

| Attack | Canary | NX | ASLR | Full RELRO | CFI/CET |
|--------|--------|----|------|------------|---------|
| Classic shellcode injection | ✓ detects overflow | ✓ prevents execution | ✓ randomizes stack | | |
| Return-to-libc | ✓ detects overflow | ✗ (code in libc) | ✓ randomizes libc | | ✓ SHSTK |
| ROP chain | ✓ detects if no leak | ✗ (uses code) | ✓ if no info leak | | ✓ SHSTK + IBT |
| GOT overwrite | | | | ✓ makes GOT RO | |
| Format string leak + ROP | ✗ (canary leaked) | | ✗ (address leaked) | | ✓ SHSTK |

With all mitigations enabled (Full RELRO + Canary + NX + PIE/ASLR + CET), a successful exploitation requires either:

  • A vulnerability that provides both an information leak AND a writable code pointer AND bypasses SHSTK, all in one chained sequence
  • A logic vulnerability that does not involve memory corruption at all (authentication bypass, command injection)

This is the current state of the art. It is hard but not impossible — which is why research continues.

Prologue/Epilogue Summary: What the Compiler Adds

Here is the complete prologue and epilogue for a function compiled with all mitigations:

my_function:
    endbr64                           ; CET IBT: valid indirect branch target
    push    rbp
    mov     rbp, rsp
    sub     rsp, N
    mov     rax, QWORD PTR fs:0x28   ; load canary
    mov     QWORD PTR [rbp-8], rax   ; store canary on stack
    xor     eax, eax                 ; clear rax (don't leak canary)
    ; ... function body ...
    mov     rax, QWORD PTR [rbp-8]   ; load canary from stack
    xor     rax, QWORD PTR fs:0x28   ; check against original
    jne     .fail                    ; if changed, abort
    leave
    ret                              ; CPU checks shadow stack on RET
.fail:
    call    __stack_chk_fail         ; does not return

The CET shadow stack check happens in the CPU itself during ret — no explicit instruction needed.

🔄 Check Your Understanding:

  1. What two assembly instructions implement the canary check in a function epilogue?
  2. Why does the canary have a null byte at the low end?
  3. What is the difference between NX and ASLR, and what does each prevent?
  4. What is Full RELRO and why does it matter that it is "full" rather than "partial"?
  5. Explain how CET's SHSTK defeats a classic return address overwrite.

Summary

Modern exploit mitigations are not magic. They are specific assembly sequences and hardware features, each addressing a specific attack vector in the vulnerability-exploitation chain. Stack canaries detect overwrites before return. NX/DEP prevents executing shellcode in data memory. ASLR randomizes addresses. RELRO protects the GOT. CET adds hardware enforcement for both indirect branches and return addresses. Together, these mitigations represent more than two decades of engineering in response to real attacks. Understanding them at the assembly level explains why the next-generation techniques (ROP, SROP, information leaks) exist, and why CET's shadow stack represents a qualitative advance in the ability to detect and prevent memory corruption exploitation.