Chapter 37: Return-Oriented Programming and Modern Exploitation

Open Assembly Language Project

🔐 Security Note — Read This First: This chapter explains Return-Oriented Programming (ROP) as a defensive education topic. ROP is the foundation for the second generation of exploit mitigations — Intel CET, SafeStack, and modern CFI exist specifically to defeat it. Security defenders, mitigations engineers, compiler engineers, and systems programmers must understand ROP conceptually to understand why these mitigations exist and how they work. All content is educational. No complete, weaponized exploit tools are provided or described. The examples are illustrative of the mechanism, not ready-to-use attack code.

Why ROP Exists: The Response to NX/DEP

The story of ROP begins with a constraint. In the mid-2000s, NX/DEP was deployed widely, and it worked: you could no longer inject shellcode into the stack and execute it. The stack was non-executable. Any attempt to jump to stack memory and execute it would fault.

But NX/DEP only marks certain pages non-executable. The code in the program itself — the .text section, the code in shared libraries — is still executable. It has to be. If .text were non-executable, the program would not run.

The insight that led to ROP: "What if we don't inject new code? What if we reuse code that is already there?"

Every program contains thousands of useful instruction sequences. Shared libraries add far more: libc alone contains hundreds of thousands of instructions, and among them are many small fragments of code that end in ret. If you could chain these fragments together by controlling the stack, you could perform arbitrary computation without injecting a single new instruction.

This is Return-Oriented Programming. The name describes the technique exactly: you program by returning into carefully chosen instruction sequences.

Gadgets: The Building Blocks

A ROP gadget is a short sequence of instructions that ends with a RET instruction. The ret is what makes it useful: it pops the next address from the stack and jumps there, which is how gadgets chain together.

Examples of useful gadgets:

; Load a value into RAX
pop rax
ret

; Load into RDI (first syscall argument)
pop rdi
ret

; Write to memory
mov [rdi], rax
ret

; Set RAX to zero
xor rax, rax
ret

; System call
syscall
ret

; Do arithmetic
add rax, rbx
ret

These are not complete programs. They are fragments: one or two instructions followed by ret. But when their addresses are laid out on the stack in the right sequence, they can be combined into arbitrary computation.

💡 Mental Model: Think of gadgets as "micro-instructions" in a very different kind of computer. The "program counter" is the stack pointer. Each "instruction" is a gadget. "Executing an instruction" means ret jumps to that gadget, the gadget runs, and the final ret advances to the next gadget on the stack. The CPU does not know it is being used as a ROP executor — it is just following CALL and RET mechanics faithfully.

Building a ROP Chain

A ROP chain is a sequence of gadget addresses placed on the stack. When execution is redirected to the chain (by an overflow or other control flow hijack), each gadget executes and chains to the next via ret.

The Stack Layout of a ROP Chain

Attacker-controlled stack (forged):
┌───────────────────────────────┐
│  addr of gadget 1             │ ← RSP points here after overflow
├───────────────────────────────┤
│  data for gadget 1 (if needed)│
├───────────────────────────────┤
│  addr of gadget 2             │
├───────────────────────────────┤
│  data for gadget 2 (if needed)│
├───────────────────────────────┤
│  addr of gadget 3             │
├───────────────────────────────┤
│  ... (continues)              │
└───────────────────────────────┘

Execution flow:

1. The overflow overwrites the return address with the address of gadget 1.
2. The function returns: ret pops the address of gadget 1 from the top of the stack into RIP.
3. Gadget 1 executes (e.g., pop rax). The pop loads the next value on the stack (the "data for gadget 1") into RAX, and then the gadget executes its own ret.
4. That ret pops the next address on the stack, the address of gadget 2, into RIP.
5. Gadget 2 executes, and its ret loads gadget 3's address.
6. This continues until the chain is done.

Register Trace: A Three-Gadget Chain

Conceptual chain that sets RAX=1, RDI=pointer, then calls a function:

Initial: RSP points to the start of the forged chain

Gadget 1: pop rdi; ret
Stack before gadget 1: [pop_rdi_gadget_addr, string_ptr, pop_rax_gadget_addr, ...]
  Instruction: pop rdi → RDI = string_ptr; RSP += 8
  Instruction: ret → RIP = pop_rax_gadget_addr; RSP += 8

Gadget 2: pop rax; ret
Stack: [pop_rax_gadget_addr, 1, call_target_gadget_addr, ...]
  Instruction: pop rax → RAX = 1; RSP += 8
  Instruction: ret → RIP = call_target_gadget_addr; RSP += 8

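Gadget 3: the call target
  The ret at the end of gadget 2 loads call_target_gadget_addr into RIP, so the desired function (or a syscall; ret gadget) runs with RDI and RAX already set.
  A register-indirect gadget such as jmp rax would only fit if RAX held a code address, which it does not in this trace (RAX = 1).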

Step	Gadget	RDI	RAX	RSP (relative)	RFLAGS
Start	overflow → ret	?	?	0	-
After pop rdi	pop rdi; ret	string_ptr	?	+16	-
After pop rax	pop rax; ret	string_ptr	1	+32	-
Execute target	...	string_ptr	1	+40	-
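The trace above can be reproduced with a toy model in which a Python list stands in for the forged stack. This is only a sketch of the chaining mechanism: the addresses (POP_RDI, POP_RAX, TARGET, STRING_PTR) are made-up values for illustration, and nothing here touches a real process.

```python
# Hypothetical addresses, chosen only for this sketch.
POP_RDI, POP_RAX, TARGET = 0x1000, 0x2000, 0x3000
STRING_PTR = 0x4000


def run_chain(stack):
    """Simulate ret-driven execution over a list of 8-byte stack slots."""
    regs = {"rdi": None, "rax": None}
    rsp = 0        # byte offset of RSP from the start of the forged chain
    trace = []

    def pop():
        nonlocal rsp
        value = stack[rsp // 8]
        rsp += 8
        return value

    rip = pop()    # the vulnerable function's ret loads gadget 1
    while rip != TARGET:
        if rip == POP_RDI:
            regs["rdi"] = pop()
            trace.append(("pop rdi", rsp))
        elif rip == POP_RAX:
            regs["rax"] = pop()
            trace.append(("pop rax", rsp))
        else:
            raise ValueError(f"no gadget at {rip:#x}")
        rip = pop()  # the gadget's own ret loads the next address
    trace.append(("target", rsp))
    return regs, trace


chain = [POP_RDI, STRING_PTR, POP_RAX, 1, TARGET]
regs, trace = run_chain(chain)
for step, rsp in trace:
    print(f"{step:8s} RSP = +{rsp}")
```

The recorded offsets match the RSP column of the table: each pop and each ret advances RSP by 8, which is the only "program counter" the chain has.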

Finding Gadgets

ROPgadget

ROPgadget scans a binary for gadget sequences:

# Find all gadgets in a binary (gadget search is the default mode)
ROPgadget --binary ./program

# Find specific gadget types
ROPgadget --binary ./program | grep "pop rdi"
ROPgadget --binary ./program | grep "pop rax"

# Include libc gadgets
ROPgadget --binary /lib/x86_64-linux-gnu/libc.so.6 | grep "pop rdi"

Example output:

Gadgets information
============================================================
0x0000000000401200 : pop rdi ; ret
0x0000000000401208 : pop rsi ; pop r15 ; ret
0x0000000000401234 : pop rdx ; ret
0x000000000040126c : xor eax, eax ; ret
0x0000000000401290 : syscall ; ret
...

Unique gadgets found: 847

Ropper

ropper is an alternative with a similar interface:

ropper --file ./program --search "pop rdi"

Gadget Quality

Not all gadgets are equally useful:

- Shorter is better: a gadget that pops 5 registers before ret works but changes registers you might not want changed.
- Side-effect-free: ideally, the gadget only does what you intend, with no unexpected memory writes or register clobbers.
- Available: large binaries have more gadgets; libc is a particularly rich source because it is large.
- Stable: with ASLR+PIE, gadgets in the executable are at predictable offsets from the executable's base; gadgets in libc are at predictable offsets from the libc base (once the base is known).

Unintended Gadgets

Here is an important subtlety: gadgets are not limited to what the compiler intended. Any byte sequence in the binary can be executed if RIP is pointed to it. Consider:

Intended instruction at 0x401234:
    48 89 c7    mov rdi, rax
    c3          ret

Unintended gadget starting at 0x401235:
    89 c7       mov edi, eax   (different operation!)
    c3          ret

The byte at 0x401235 (which is the second byte of mov rdi, rax) begins a valid but unintended instruction sequence. x86-64's variable-length instruction encoding means there are far more gadgets than there are explicitly compiled instruction sequences. This is why large binaries have thousands of gadgets.
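A minimal sketch of the first step a gadget finder performs: locate every ret byte (0xC3) and list the byte runs that end there, starting from every offset rather than only from instruction boundaries. The function name candidate_gadgets and the max_len parameter are illustrative choices, not part of any tool. A real finder would also disassemble each candidate and discard the ones that do not decode cleanly; this sketch skips that step.

```python
RET = 0xC3


def candidate_gadgets(code: bytes, base: int, max_len: int = 4):
    """Yield (address, raw_bytes) for each run of up to max_len bytes ending in ret."""
    for end, byte in enumerate(code):
        if byte != RET:
            continue
        first = max(0, end - max_len + 1)
        for start in range(first, end + 1):
            yield base + start, code[start:end + 1]


# The four bytes from the example above: mov rdi, rax ; ret
code = bytes.fromhex("4889c7c3")
for addr, raw in candidate_gadgets(code, base=0x401234):
    print(f"{addr:#x}: {raw.hex(' ')}")
```

The candidate at 0x401235 is the unintended mov edi, eax ; ret from the example. The candidate at 0x401236 (c7 c3) shows why the decode step matters: those bytes begin an instruction that needs more operand bytes than are available before the ret, so it is not a usable gadget.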

⚙️ How It Works: Unintended gadgets are a consequence of variable-length, unaligned instruction encoding. On a fixed-width ISA like ARM64, instructions are always 4 bytes, aligned to 4-byte boundaries, and a branch to a misaligned address faults. You cannot start decoding from byte 2 of an instruction and get something meaningful. Code reuse on ARM64 is therefore limited to instruction sequences the compiler actually emitted (ending in ret or in an explicit indirect branch), which leaves a smaller gadget supply and calls for somewhat different techniques.

ret2libc: The Simpler Special Case

Before full ROP chains were necessary, ret2libc was the dominant technique after NX deployment. The idea: instead of building a general computation from gadgets, just jump directly to system() in libc with /bin/sh as the argument.

ret2libc Without ASLR

If ASLR is disabled, libc is loaded at the same address on every run. (A non-PIE executable is also at a fixed address, but that alone does not fix the location of libc.) The chain:

1. Find the address of system in libc: objdump -T /lib/.../libc.so.6 | grep system
2. Find the string /bin/sh in libc: it exists as a literal string in libc; find its offset.
3. Build the overflow payload:
   - Padding to the return address (offset bytes)
   - pop rdi; ret gadget address (to set RDI = address of "/bin/sh")
   - Address of the /bin/sh string in libc (the value that pop rdi consumes)
   - Address of system in libc (the function to call)

Conceptually:

Stack after overflow:
┌─────────────────────────────┐
│  pop_rdi_ret gadget addr    │ ← return address (was overwritten)
├─────────────────────────────┤
│  address of "/bin/sh"       │ ← popped into RDI by first gadget
├─────────────────────────────┤
│  address of system()        │ ← ret after pop rdi jumps here
└─────────────────────────────┘

With ASLR disabled: the addresses are known, this works.

ret2libc With ASLR

With ASLR, the libc base address is random. The chain becomes:

Phase 1: Information leak. Use a short chain to call puts (or a similar output function) with the address of a GOT entry as its argument. This prints the runtime libc address stored in that entry, for example the resolved address of puts itself. From this, calculate: libc_base = leaked_address - known_offset_of_puts_in_libc
Phase 2: Calculate real addresses — now that libc base is known, calculate the real addresses of system and /bin/sh
Phase 3: Execute — trigger another overflow (or redirect from the information leak chain) to call system("/bin/sh")

This is a two-stage exploit: the first chain leaks information, the second chain uses that information.
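The address arithmetic in Phases 1 and 2 is plain subtraction and addition. The sketch below uses made-up offsets and a made-up leaked value purely to show the calculation; real offsets depend on the exact libc build. The page-alignment check is a common sanity test (an assumption of this sketch, not something the text above prescribes): a library base that is not page aligned usually means the offset belongs to a different libc version.

```python
# Hypothetical offsets for illustration only; real values are build-specific.
OFFSET_PUTS = 0x080E50
OFFSET_SYSTEM = 0x050D70
OFFSET_BINSH = 0x1D8678


def rebase(leaked_puts: int) -> dict:
    """Derive the libc base and dependent addresses from one leaked pointer."""
    libc_base = leaked_puts - OFFSET_PUTS
    if libc_base & 0xFFF:
        raise ValueError("base is not page aligned: wrong offset or wrong libc")
    return {
        "libc_base": libc_base,
        "system": libc_base + OFFSET_SYSTEM,
        "binsh": libc_base + OFFSET_BINSH,
    }


addrs = rebase(0x7F3A1C280E50)  # made-up leaked value
for name, value in addrs.items():
    print(f"{name:10s} {value:#x}")
```

This is also why a single leaked pointer is so damaging under ASLR: the randomization applies to the base only, so one known address fixes every other address in the same library.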

ret2plt: Calling PLT Entries

ret2plt is a specific technique for the information leak phase. In a non-PIE executable, the PLT (Procedure Linkage Table) stubs are at fixed, known addresses (with PIE, they are at known offsets from the executable base):

; PLT stub for puts (example addresses):
00401040 <puts@plt>:
  401040:  jmp QWORD PTR [puts@GOT]    ; jump to puts's real address
  401046:  push 0                      ; lazy binding index
  40104b:  jmp _dl_runtime_resolve

; PLT stub for printf:
00401060 <printf@plt>:
  401060:  jmp QWORD PTR [printf@GOT]

By returning to the puts@plt stub with a GOT entry address in RDI, you call puts on a GOT entry — which prints the actual runtime address of a libc function. From this, calculate libc base.
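What puts prints in this situation is raw pointer bytes, not text. A small sketch of decoding them, assuming a little-endian x86-64 target and that puts appended its usual trailing newline (decode_leak is a hypothetical helper name):

```python
def decode_leak(raw: bytes) -> int:
    """Turn the bytes printed by puts(got_entry) back into an integer address."""
    # puts appends one newline; strip only that, since the address itself
    # may legitimately contain a 0x0a byte.
    body = raw[:-1] if raw.endswith(b"\n") else raw
    if not 0 < len(body) <= 8:
        raise ValueError("unexpected leak length")
    # puts stops at the first NUL byte, so the zero high bytes of a
    # user-space pointer are missing and must be padded back.
    return int.from_bytes(body.ljust(8, b"\x00"), "little")


sample = bytes.fromhex("500e281c3a7f") + b"\n"  # made-up leak
print(hex(decode_leak(sample)))
```

One limitation follows directly from how puts works: if the stored address contains a zero byte before its highest set byte, the output is truncated and the leak is incomplete.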

Full RELRO (which makes the GOT read-only) does not prevent ret2plt-based information leaks directly: the PLT stubs still call the functions, and the GOT can still be read. What it closes is the GOT overwrite path that a second stage could otherwise use, which is why it is a meaningful mitigation when combined with ASLR and PIE.

JOP: Jump-Oriented Programming

JOP is ROP's cousin for indirect jumps. Instead of gadgets ending in RET, JOP uses gadgets ending in an indirect JMP (or CALL). This is useful when the shadow stack (SHSTK) protects RET but nothing constrains indirect branches.

; JOP gadget examples:
mov rsp, rdi
jmp rdx         ; stack pivot, then jump via register

add rdi, 8
jmp [rdi]       ; "dispatcher" pattern: advance through a table of gadget addresses

; "trampoline": loads next target and jumps
pop r15
jmp r15

JOP is more complex to chain because there is no automatic "advance to next entry" mechanism — the designer must provide a dispatcher gadget. CET IBT addresses JOP by requiring ENDBR64 at indirect JMP targets.
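The dispatcher idea can be modelled in a few lines. In this toy sketch, a Python list plays the role of the table of gadget addresses, an index plays the role of RDI, and each "functional gadget" is a Python callable that implicitly jumps back to the dispatcher when it finishes. All names are invented for the illustration.

```python
def run_jop(table, gadgets, max_steps=16):
    """Model of the 'add rdi, 8; jmp [rdi]' dispatcher walking a gadget table."""
    state = {"acc": 0}
    rdi = -1                    # index standing in for the table pointer
    executed = []
    for _ in range(max_steps):
        rdi += 1                # add rdi, 8
        if rdi >= len(table):
            break
        name = table[rdi]       # jmp [rdi]
        gadgets[name](state)    # functional gadget runs, then returns to the dispatcher
        executed.append(name)
    return state, executed


gadgets = {
    "load5": lambda s: s.update(acc=5),
    "double": lambda s: s.update(acc=s["acc"] * 2),
}
state, executed = run_jop(["load5", "double", "double"], gadgets)
print(state["acc"], executed)
```

Compared with ROP, the sequencing work that ret did for free is now done by one explicit dispatcher, which is why every functional gadget must end by branching back to it.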

SROP: Sigreturn-Oriented Programming

SROP (Sigreturn-Oriented Programming) is an elegant technique discovered in 2014 that requires only a single gadget.

The Sigreturn Mechanism

When the Linux kernel delivers a signal to a process, it pushes a sigcontext structure on the user stack before calling the signal handler. When the signal handler returns via sigreturn, the kernel pops the sigcontext and restores ALL registers from it:

/* sigcontext structure (simplified) */
struct sigcontext {
    uint64_t r8;
    uint64_t r9;
    uint64_t r10;
    uint64_t r11;
    uint64_t r12;
    uint64_t r13;
    uint64_t r14;
    uint64_t r15;
    uint64_t rdi;
    uint64_t rsi;
    uint64_t rbp;
    uint64_t rbx;
    uint64_t rdx;
    uint64_t rax;
    uint64_t rcx;
    uint64_t rsp;
    uint64_t rip;
    /* ... flags, segments ... */
};

The sigreturn syscall (number 15 on Linux x86-64) reads this structure from the stack and restores it to the CPU. It sets ALL registers, including RIP and RSP, to whatever values are in the structure.
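Because the frame is ordinary data, its layout can be inspected like any other structure. The sketch below mirrors only the simplified listing above using ctypes (SimplifiedSigcontext is a name made up for this example); the real frame the kernel builds contains additional fields before and after these registers, so the offsets apply to the simplified structure only.

```python
import ctypes

# Field order copied from the simplified listing above.
FIELDS = [
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
    "rdi", "rsi", "rbp", "rbx", "rdx", "rax", "rcx", "rsp", "rip",
]


class SimplifiedSigcontext(ctypes.Structure):
    _fields_ = [(name, ctypes.c_uint64) for name in FIELDS]


for name in ("rdi", "rax", "rsp", "rip"):
    offset = getattr(SimplifiedSigcontext, name).offset
    print(f"{name}: offset {offset}")
print("size:", ctypes.sizeof(SimplifiedSigcontext))
```

Every register, including RSP and RIP, is just an 8-byte slot at a fixed offset, which is what makes the structure forgeable by anyone who controls the stack contents.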

The SROP Insight

If you can call sigreturn with a forged sigcontext on the stack, you set all registers to arbitrary values simultaneously. You only need one gadget: syscall; ret (or just syscall).

The chain:

1. The overflow points RSP at a forged stack.
2. A preceding setup step sets RAX to 15 (the sigreturn syscall number).
3. The first "gadget" on the forged stack is syscall; ret.
4. The forged sigcontext structure sits on the stack immediately after the syscall gadget's address, where the kernel will look for the signal frame.
5. sigreturn restores ALL registers from the structure, setting RIP, RSP, RDI, RAX, etc. to whatever you want.

With one gadget and the ability to craft a sigcontext structure, you achieve full register control.

Why SROP Matters for Defenders

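SROP is part of the motivation for several defenses:

- IBT (CET) requires ENDBR64 at indirect branch targets, so a bare syscall instruction in the middle of a function cannot be reached through an indirect JMP or CALL.
- SHSTK (CET) checks the ret that transfers control into the syscall gadget, as well as the ret that follows it.
- Validating the signal frame, so that sigreturn only accepts a frame the kernel itself created (for example, by checking a kernel-written secret value stored with the frame), partially mitigates misuse.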

Blind ROP (BROP)

Blind ROP (BROP), published in 2014 by Bittau et al., is a technique for exploiting a server remotely when you do not have a copy of the binary.

The technique:

1. Trigger a crash to confirm a buffer overflow exists.
2. Find the offset to the return address by incrementally increasing the overflow and watching for crashes.
3. Find gadgets by probing: return to addresses in the binary and observe whether the server crashes, hangs, or responds differently.
4. Reconstruct enough of the binary's gadget layout to build a full chain.

BROP demonstrates that ASLR's security relies on the address space layout staying secret. If an attacker can crash a process repeatedly and each replacement has the same layout, every crash reveals a little more, and the layout can eventually be reconstructed. BROP works because:

- Many servers fork a child per connection; the child has the same address layout as the parent.
- Crashing the child does not discard what the attacker has already learned: a new child is forked with the same parent layout.

Modern defenses: re-randomize for every worker (for example, by having each forked child exec a fresh copy of the program) rather than randomizing only once, when the parent is exec'd.
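Why an unchanged layout matters can be seen in a small simulation. The sketch models a forked worker as an oracle that "survives" only when the overwritten bytes match a hidden 8-byte value, and counts how many probes a byte-by-byte search needs. Everything here is an in-memory toy with invented names (make_worker, read_bytewise); no network or process is involved.

```python
import os


def make_worker(secret: bytes):
    """Model a forked worker: it survives only if the overwritten prefix matches."""
    def survives(guess: bytes) -> bool:
        return secret.startswith(guess)
    return survives


def read_bytewise(survives, length=8):
    """Recover the hidden value one byte at a time, counting probes."""
    known, attempts = b"", 0
    for _ in range(length):
        for candidate in range(256):
            attempts += 1
            if survives(known + bytes([candidate])):
                known += bytes([candidate])
                break
    return known, attempts


secret = os.urandom(8)  # same value in every "child", as after fork without exec
recovered, attempts = read_bytewise(make_worker(secret))
print(f"recovered in {attempts} probes (upper bound {8 * 256})")
```

With a stable layout, the search needs at most 8 × 256 = 2048 probes instead of up to 2^64 guesses for the whole value at once. If every new worker had a fresh random value, progress on earlier bytes would be lost after each crash and the byte-by-byte shortcut would not apply.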

Turing Completeness of ROP

Shacham (2007) showed that ROP is Turing complete: given a binary with sufficient gadget diversity, any computation can be expressed as a ROP chain. (The original construction used libc on 32-bit x86; the same approach carries over to x86-64.) This means that NX/DEP, which prevents injecting new code, does not prevent arbitrary computation. It only changes the form of the computation.

The construction needs these primitives for Turing completeness in a ROP chain:

- Load a constant into a register (pop + value)
- Load from memory
- Store to memory
- Arithmetic (add, xor, etc.)
- Conditional branch (requires creativity, typically by conditionally adjusting the stack pointer so that the chain continues at a different position)

Any binary large enough to contain these gadget types supports Turing-complete ROP. libc alone is sufficient.

Mitigations Against ROP

CET SHSTK: The Hardware Defense

Covered in detail in Chapter 36. The shadow stack is the most effective defense: ret gadgets chain because each ret pops the next gadget address from the stack. With SHSTK, ret also pops the shadow stack and compares the two addresses. Since the forged chain was not built through legitimate CALL instructions, the shadow stack does not have matching entries. The first ret whose target does not match raises a control-protection fault (#CP), so the chain is stopped before its first gadget runs.
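The check itself is simple enough to model directly. In this toy sketch (class and method names are invented), call pushes the return address onto both stacks, an overflow can modify only the normal stack, and ret compares the two copies:

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP exception in this toy model."""


class ToyCpu:
    def __init__(self):
        self.stack = []          # normal stack: reachable by memory bugs
        self.shadow_stack = []   # shadow stack: written only by call/ret

    def call(self, return_address):
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        target = self.stack.pop()
        expected = self.shadow_stack.pop() if self.shadow_stack else None
        if target != expected:
            raise ControlProtectionFault(f"ret to {target:#x} does not match shadow stack")
        return target


def demo():
    cpu = ToyCpu()
    cpu.call(0x401500)
    legit = cpu.ret()            # copies match: normal return

    cpu.call(0x401500)
    cpu.stack[-1] = 0x401200     # overflow replaces the return address
    try:
        cpu.ret()
        blocked = False
    except ControlProtectionFault:
        blocked = True
    return legit, blocked


print(demo())
```

The model leaves out everything that makes the real mechanism robust (the shadow stack pages are not writable by ordinary stores, for instance), but it shows why a forged chain fails at the very first ret.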

SafeStack

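SafeStack (Clang -fsanitize=safe-stack) separates the stack into:

- Safe stack: return addresses, register spills, and local variables that the compiler can prove are always accessed safely
- Unsafe stack: everything else, such as buffers and variables whose address escapes, which are the objects that might be overflowed

The return address is on the safe stack; the buffer that might overflow is on the unsafe stack. A linear overflow of the buffer cannot reach the return address. SafeStack pursues a goal similar to SHSTK in software, with no special hardware required, but its protection depends on the safe stack's location staying hidden from the attacker rather than on hardware enforcement.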

CFI: Control Flow Integrity

Clang CFI (-fsanitize=cfi) checks that indirect calls go to valid targets. For ROP specifically, the relevant variant is CFI-icall (-fsanitize=cfi-icall), which checks that indirect function calls go to functions of the correct type.

ROP gadgets are not valid function entries, so reaching them via indirect call would fail. For pure ret-based ROP, CFI-icall is less directly effective; SHSTK addresses that.
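The idea behind an indirect-call check can be sketched as a membership test: before the call, confirm that the target belongs to the set of functions with the expected signature. This is a conceptual model only, with invented names and a made-up signature label format; Clang's actual implementation is generated by the compiler and works differently in detail.

```python
def parse_request(data: bytes) -> int:
    return len(data)


def checksum(data: bytes) -> int:
    return sum(data) % 256


def reset_state() -> None:
    return None


# Allowed targets grouped by a signature label (label format is made up here).
VALID_TARGETS = {
    "int(bytes)": {parse_request, checksum},
    "void()": {reset_state},
}


class CfiViolation(Exception):
    pass


def checked_icall(target, signature, *args):
    """Call target only if it is a known function of the expected type."""
    if target not in VALID_TARGETS.get(signature, set()):
        name = getattr(target, "__name__", repr(target))
        raise CfiViolation(f"{name} is not a valid {signature} target")
    return target(*args)


print(checked_icall(checksum, "int(bytes)", b"\x01\x02"))
```

A gadget address in the middle of a function would never appear in any of these sets, so it fails the check. The model also shows the limit noted above: the check guards call sites, not ret instructions.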

PIE + ASLR: Randomize Gadget Addresses

With PIE+ASLR, every gadget's address changes with each run. Without an information leak, an attacker has to guess among roughly 2^28 possible layouts (28 bits of entropy), which is impractical in most settings on 64-bit systems. Combined with SHSTK, this means:

- Even with an information leak (which reveals gadget addresses), SHSTK prevents using them
- Without an information leak, ASLR prevents finding them
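The entropy figure can be turned into rough numbers. The calculation below is a back-of-the-envelope sketch: the 28-bit figure comes from the text, while the attempt rate is an arbitrary assumption chosen only to make the scale visible.

```python
ENTROPY_BITS = 28
ASSUMED_ATTEMPTS_PER_SECOND = 1000   # arbitrary assumption for illustration

layouts = 2 ** ENTROPY_BITS

# Layout re-randomized on every attempt: each guess succeeds with
# probability 1/layouts, so the expected number of attempts is `layouts`.
expected_rerandomized = layouts

# Layout fixed across attempts: wrong guesses can be ruled out, so on
# average about half of the layouts must be tried.
expected_fixed = (layouts + 1) / 2

days = expected_rerandomized / ASSUMED_ATTEMPTS_PER_SECOND / 86400
print(f"possible layouts:             {layouts:,}")
print(f"expected tries (re-random):   {expected_rerandomized:,}")
print(f"expected tries (fixed):       {expected_fixed:,.0f}")
print(f"days at the assumed rate:     {days:.1f}")
```

Each failed guess normally crashes the target, so hundreds of millions of attempts are also extremely noisy. The comparison between the two expected values is another view of why re-randomizing between attempts matters.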

The Current State: ROP in 2025

Systems with CET SHSTK + IBT enabled are effectively resistant to traditional ROP techniques. The current frontier:

- Kernel exploitation: kernel-mode CET is being deployed but not yet universal. Kernel ROP is still a significant concern.
- JIT spraying: JIT compilers generate executable code containing attacker-influenced constants. With IBT, JIT-generated code must include ENDBR64; without it, the gadgets in JIT code are not reachable via indirect call.
- Data-only attacks: attacks that corrupt data (not code pointers) to achieve privilege escalation without redirecting control flow. These are unaffected by CET. Examples: corrupting a file descriptor number, a UID value, or a flag that controls permissions.
- Logic bugs: authentication bypass without memory corruption. No memory safety mitigation applies.

The academic community has studied "control-flow bending" attacks that work within CFI constraints by using valid call edges in malicious combinations. These are sophisticated and not representative of common attack patterns.

Diagram: Complete ROP Chain Execution

Stack state when vulnerable function's ret executes:

RSP → ┌─────────────────────────────────────┐
      │  Address of gadget 1:               │
      │  pop rdi; ret   (at 0x401200)       │
      ├─────────────────────────────────────┤
RSP+8 │  Value for RDI:                     │
      │  0x4040a0 (address of "/bin/sh")    │
      ├─────────────────────────────────────┤
RSP+16│  Address of gadget 2:               │
      │  pop rax; ret   (at 0x40126c)       │
      ├─────────────────────────────────────┤
RSP+24│  Value for RAX:                     │
      │  59 (execve syscall number)         │
      ├─────────────────────────────────────┤
RSP+32│  Address of gadget 3:               │
      │  xor rsi, rsi; ret  (at 0x401290)   │
      ├─────────────────────────────────────┤
RSP+40│  Address of gadget 4:               │
      │  xor rdx, rdx; ret  (at 0x4012a4)   │
      ├─────────────────────────────────────┤
RSP+48│  Address of gadget 5:               │
      │  syscall; ret   (at 0x4012b0)       │
      └─────────────────────────────────────┘

Execution flow:
  ret → RIP=0x401200 (pop rdi)
        pop rdi ← RDI = 0x4040a0
        ret → RIP=0x40126c (pop rax)
        pop rax ← RAX = 59
        ret → RIP=0x401290 (xor rsi,rsi)
        xor rsi, rsi ← RSI = 0
        ret → RIP=0x4012a4 (xor rdx,rdx)
        xor rdx, rdx ← RDX = 0
        ret → RIP=0x4012b0 (syscall)
        syscall: execve(RDI="/bin/sh", RSI=NULL, RDX=NULL)

This diagram is conceptual: all addresses are illustrative. With ASLR, these addresses change every run; with SHSTK, each ret would compare against the shadow stack and fail.

🔄 Check Your Understanding:

1. Why does NX/DEP not prevent ROP?
2. What makes a sequence of instructions a "gadget"?
3. How does a ROP chain "advance" from one gadget to the next?
4. Why is an information leak required before ret2libc when ASLR is enabled?
5. Explain in one sentence why CET SHSTK defeats traditional ROP chains.

Summary

Return-Oriented Programming is the technique that forced the development of the second generation of exploit mitigations. Its insight — that arbitrary computation is possible by chaining existing code fragments that end in ret — demonstrated that preventing code injection (via NX/DEP) is insufficient. ROP chains work by placing gadget addresses on the stack and using ret to chain between them. Every large binary contains thousands of usable gadgets. ret2libc and ret2plt are simpler special cases. SROP achieves full register control with a single gadget. CET SHSTK defeats ROP directly by maintaining a hardware-protected copy of return addresses that cannot be forged. IBT defeats JOP by requiring ENDBR64 at indirect branch targets. Understanding ROP explains why these mitigations are designed the way they are and why they represent qualitative improvements over earlier approaches.
