Case Study 37-1: A ROP Chain by Hand — Understanding the Control Flow

Introduction

The most effective way to understand return-oriented programming (ROP) at a deep level is to trace a chain by hand: not running it, but following each step on paper, watching how the stack state evolves and how each ret advances the chain. This case study does exactly that, with a fully annotated walkthrough of a conceptual ROP chain that invokes execve("/bin/sh", NULL, NULL) through gadget chaining.

The purpose is mechanical understanding. After this walkthrough, you will see exactly why SHSTK, the shadow stack feature of Intel CET, works (it detects the mismatch between the forged chain and the return addresses recorded on the shadow stack), what information an attacker needs (gadget addresses), and why ASLR matters (it randomizes those addresses).

🔐 Security Note: This walkthrough traces an illustrative ROP chain on conceptual address values. It is a pedagogical exercise: no functional exploit code for any real system is provided. The addresses used are illustrative only and correspond to no real binary. This kind of analysis is performed by security researchers, mitigation engineers, and CTF competitors to understand how attacks work and what stops them.

The Target: execve("/bin/sh", NULL, NULL)

The computation we want to perform via ROP:

RDI = pointer to "/bin/sh" string (first arg)
RSI = 0 (NULL argv)
RDX = 0 (NULL envp)
RAX = 59 (execve syscall number)
syscall

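This requires setting four registers and executing a syscall. In ordinary code, the caller passes RDI, RSI, and RDX as normal function arguments, and the libc execve wrapper loads RAX before issuing syscall. A ROP chain has no such code to run, so we build a forged stack that sets each register via gadgets.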
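The register requirements above can be written down as a small check. This is a minimal sketch, assuming the x86-64 Linux syscall convention (syscall number in RAX, first three arguments in RDI, RSI, RDX); `is_execve_binsh_call` is a hypothetical helper introduced only for illustration. It also makes explicit that 59 in hexadecimal is 0x3b.

```python
# Hypothetical helper: does a register snapshot encode execve(path, NULL, NULL)?
# Assumes the x86-64 Linux convention: RAX = syscall number, RDI/RSI/RDX = args 1-3.
SYS_EXECVE = 59  # x86-64 Linux syscall number for execve (0x3b)


def is_execve_binsh_call(regs, binsh_addr):
    """Return True if regs request execve(binsh_addr, NULL, NULL)."""
    return (
        regs.get("rax") == SYS_EXECVE
        and regs.get("rdi") == binsh_addr
        and regs.get("rsi") == 0
        and regs.get("rdx") == 0
    )


# The illustrative target state from this case study.
goal = {"rax": 59, "rdi": 0x404040, "rsi": 0, "rdx": 0}
print(is_execve_binsh_call(goal, 0x404040))  # True
print(hex(SYS_EXECVE))                        # 0x3b
```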

The Available Gadgets

We find these gadgets in our hypothetical binary (fixed addresses — no ASLR, no PIE, for illustration):

Gadget   Address    Assembly
──────   ────────   ─────────────────
G1       0x401200   pop rdi; ret
G2       0x401210   pop rax; ret
G3       0x401220   xor rsi, rsi; ret
G4       0x401230   xor rdx, rdx; ret
G5       0x401240   syscall; ret

Also in the binary: the NUL-terminated string /bin/sh is at address 0x404040 (in .data).

The Forged Stack Layout

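The buffer overflow overwrites the saved return address and the words above it. When the vulnerable function reaches its ret, RSP points at that overwritten slot, which is the start of our forged data. The offsets below (RSP+0, RSP+8, ...) are relative to RSP at that moment, and the same notation is used throughout the trace: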

Address      Content            Role in chain
──────────   ─────────────────  ─────────────────────
[RSP+0]:     0x401200           → gadget G1 (pop rdi; ret)
[RSP+8]:     0x404040           → value for RDI ("/bin/sh" address)
[RSP+16]:    0x401220           → gadget G3 (xor rsi, rsi; ret)
[RSP+24]:    0x401230           → gadget G4 (xor rdx, rdx; ret)
[RSP+32]:    0x401210           → gadget G2 (pop rax; ret)
[RSP+40]:    0x0000003b         → value for RAX (59 = execve)
[RSP+48]:    0x401240           → gadget G5 (syscall; ret)
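Because every slot is one 8-byte word, the offset column is simply the slot index times 8. This minimal sketch uses only the illustrative values from the table, models the layout as plain Python data (it produces no bytes), and recomputes that column. It also shows that the chain occupies 56 bytes, which is why RSP finishes at RSP+56 in the trace.

```python
# The forged stack from the table above, one 64-bit word per slot.
# Values are the case study's illustrative addresses; roles are labels only.
WORD = 8  # bytes per stack slot on x86-64

chain = [
    (0x401200, "G1 (pop rdi; ret)"),
    (0x404040, 'value for RDI ("/bin/sh" address)'),
    (0x401220, "G3 (xor rsi, rsi; ret)"),
    (0x401230, "G4 (xor rdx, rdx; ret)"),
    (0x401210, "G2 (pop rax; ret)"),
    (59,       "value for RAX (59 = 0x3b = execve)"),
    (0x401240, "G5 (syscall; ret)"),
]

# Slot i lives at RSP + i * WORD, relative to RSP at the vulnerable ret.
for i, (value, role) in enumerate(chain):
    print(f"[RSP+{i * WORD}]: {value:#x}  {role}")

print("bytes consumed by the chain:", len(chain) * WORD)  # 56
```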

Step-by-Step Execution Trace

Initial State

The vulnerable function executes its ret instruction. At this moment:
- Regular stack: [RSP] = 0x401200 (G1 address, placed there by the overflow)
- Shadow stack: [SSP] = 0x40159f (the legitimate return address to main)

Step 1: ret in Vulnerable Function

CPU executes: ret
  → pops [RSP] = 0x401200 into RIP
  → RSP += 8  (now RSP+8)
  → SHSTK check: compare 0x401200 (from the regular stack) with 0x40159f (popped from the shadow stack)
  → MISMATCH → #CP (control-protection) exception; G1 never runs (if SHSTK is enabled)

Without SHSTK, the process continues:

RIP = 0x401200  (G1: pop rdi; ret)
RSP = RSP+8

Step 2: G1 Executes (pop rdi; ret)

G1 is at 0x401200, contains pop rdi; ret.

CPU fetches and executes: pop rdi
  → RDI = [RSP] = 0x404040  ("/bin/sh" address)
  → RSP += 8  (RSP is now RSP+16)

CPU executes: ret
  → pops [RSP] = 0x401220 into RIP  (G3: xor rsi, rsi; ret)
  → RSP += 8  (now RSP+24)

State after G1:

RDI = 0x404040 ("/bin/sh")
RSP at RSP+24
RIP = 0x401220 (G3)
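Steps 2 and 5 share a shape: a `pop reg; ret` gadget consumes two stack slots, one data word for the register and one address for the next gadget. This minimal sketch models that with a hypothetical `pop_reg_ret` helper over a dict keyed by byte offset from the initial RSP, using the illustrative values from the forged stack layout; it reproduces the state after G1.

```python
def pop_reg_ret(reg, mem, rsp, regs):
    """Model a `pop <reg>; ret` gadget on a toy stack.

    mem maps byte offsets (relative to the initial RSP) to 64-bit words.
    Returns the next RIP and the updated RSP offset.
    """
    regs[reg] = mem[rsp]   # pop <reg>: load [RSP] into the register
    rsp += 8
    rip = mem[rsp]         # ret: load [RSP] into RIP
    rsp += 8
    return rip, rsp


# Forged stack from the layout above (illustrative values only).
mem = {0: 0x401200, 8: 0x404040, 16: 0x401220, 24: 0x401230,
       32: 0x401210, 40: 59, 48: 0x401240}

regs = {}
rip, rsp = pop_reg_ret("rdi", mem, 8, regs)   # G1 starts at RSP+8
print(hex(regs["rdi"]), hex(rip), rsp)          # 0x404040 0x401220 24
```

The same helper called with `"rax"` at offset 40 reproduces Step 5: RAX becomes 59, RIP becomes 0x401240, and RSP ends at RSP+56.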

Step 3: G3 Executes (xor rsi, rsi; ret)

G3 is at 0x401220, contains xor rsi, rsi; ret.

CPU executes: xor rsi, rsi
  → RSI = 0 (XOR of anything with itself = 0)
  → RSP unchanged

CPU executes: ret
  → pops [RSP] = 0x401230 into RIP  (G4: xor rdx, rdx; ret)
  → RSP += 8  (now RSP+32)

State after G3:

RDI = 0x404040, RSI = 0
RSP at RSP+32
RIP = 0x401230 (G4)

Step 4: G4 Executes (xor rdx, rdx; ret)

G4 is at 0x401230, contains xor rdx, rdx; ret.

CPU executes: xor rdx, rdx
  → RDX = 0

CPU executes: ret
  → pops [RSP] = 0x401210 into RIP  (G2: pop rax; ret)
  → RSP += 8  (now RSP+40)

State after G4:

RDI = 0x404040, RSI = 0, RDX = 0
RSP at RSP+40
RIP = 0x401210 (G2)

Step 5: G2 Executes (pop rax; ret)

G2 is at 0x401210, contains pop rax; ret.

CPU executes: pop rax
  → RAX = [RSP] = 0x3b = 59  (execve syscall number)
  → RSP += 8  (now RSP+48)

CPU executes: ret
  → pops [RSP] = 0x401240 into RIP  (G5: syscall; ret)
  → RSP += 8  (now RSP+56)

State after G2:

RDI = 0x404040, RSI = 0, RDX = 0, RAX = 59
RSP at RSP+56
RIP = 0x401240 (G5)

Step 6: G5 Executes (syscall; ret)

G5 is at 0x401240, contains syscall; ret.

CPU executes: syscall
  → System call: execve(RDI="/bin/sh", RSI=NULL, RDX=NULL)
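  → The kernel replaces the process image with /bin/sh
  → (execve does not return on success; if it fails, RAX holds a negative errno and execution falls through to the ret, which pops whatever lies at [RSP+56], just past the chain)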

Register Trace Summary

Step          Gadget    RDI        RSI   RDX   RAX   RSP
────────────  ────────  ─────────  ────  ────  ────  ──────
Initial       overflow  ?          ?     ?     ?     RSP+0
ret → G1      vuln→G1   ?          ?     ?     ?     RSP+8
pop rdi       G1        0x404040   ?     ?     ?     RSP+16
ret → G3      G1→G3     0x404040   ?     ?     ?     RSP+24
xor rsi,rsi   G3        0x404040   0     ?     ?     RSP+24
ret → G4      G3→G4     0x404040   0     ?     ?     RSP+32
xor rdx,rdx   G4        0x404040   0     0     ?     RSP+32
ret → G2      G4→G2     0x404040   0     0     ?     RSP+40
pop rax       G2        0x404040   0     0     59    RSP+48
ret → G5      G2→G5     0x404040   0     0     59    RSP+56
syscall       G5        0x404040   0     0     59    RSP+56
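The summary table can be regenerated mechanically. The sketch below is a toy interpreter, not an emulator: it assumes the gadget semantics listed in this case study, models memory as a dict keyed by byte offset from the initial RSP, and ignores SHSTK entirely (the "without SHSTK" path). `GADGETS`, `STACK`, and `run_chain` are hypothetical names. Running it prints one row per step with the same RSP offsets as the table above.

```python
# Toy model of the case study's illustrative gadgets (no real binary).
# ("pop", reg) reads the next stack word into reg, ("zero", reg) models
# `xor reg, reg`, and ("syscall", None) ends the chain.
GADGETS = {
    0x401200: ("pop", "rdi"),      # G1: pop rdi; ret
    0x401210: ("pop", "rax"),      # G2: pop rax; ret
    0x401220: ("zero", "rsi"),     # G3: xor rsi, rsi; ret
    0x401230: ("zero", "rdx"),     # G4: xor rdx, rdx; ret
    0x401240: ("syscall", None),   # G5: syscall; ret
}

# Forged stack words, in order from RSP+0 upward.
STACK = [0x401200, 0x404040, 0x401220, 0x401230, 0x401210, 59, 0x401240]


def run_chain(words):
    """Execute the chain without a shadow stack; return (regs, rsp, trace).

    Offsets are relative to RSP at the vulnerable function's ret, so an
    rsp of 24 corresponds to "RSP+24" in the summary table.
    """
    mem = {8 * i: w for i, w in enumerate(words)}
    regs = {"rdi": None, "rsi": None, "rdx": None, "rax": None}
    rip, rsp = mem[0], 8                  # the vulnerable function's ret
    trace = [("ret -> G1", dict(regs), rsp)]
    while True:
        kind, reg = GADGETS[rip]
        if kind == "syscall":
            # execve does not return on success, so G5's trailing ret is not modeled.
            trace.append(("syscall", dict(regs), rsp))
            return regs, rsp, trace
        if kind == "pop":
            regs[reg] = mem[rsp]
            rsp += 8
            label = f"pop {reg}"
        else:                             # "zero" models xor reg, reg
            regs[reg] = 0
            label = f"xor {reg},{reg}"
        trace.append((label, dict(regs), rsp))
        rip, rsp = mem[rsp], rsp + 8      # the gadget's trailing ret
        trace.append((f"ret -> {rip:#x}", dict(regs), rsp))


def fmt(value):
    """'?' for unset registers, hex for addresses, decimal for small values."""
    if value is None:
        return "?"
    return hex(value) if value > 0xFF else str(value)


regs, rsp, trace = run_chain(STACK)
for step, snapshot, offset in trace:
    cells = [fmt(snapshot[r]) for r in ("rdi", "rsi", "rdx", "rax")]
    print(f"{step:<18}" + " ".join(c.rjust(9) for c in cells) + f"  RSP+{offset}")
```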

Why SHSTK Defeats This Chain

At Step 1, when the vulnerable function executes ret:
- Regular stack: 0x401200 (G1, attacker-controlled)
- Shadow stack: 0x40159f (legitimate return address, hardware-protected)

The mismatch triggers #CP. Execution never reaches G1.

Even if that first ret were somehow allowed through, every later ret in the chain would fail the same way: the shadow stack holds only the values pushed by legitimate CALL instructions, never the gadget addresses in the forged chain. No ret in the chain can succeed without triggering #CP.

The shadow stack is populated by CALL instructions, which the attacker never executed. The forged chain contains no CALL instructions; it is driven entirely by ret. This fundamental mismatch is why SHSTK is so specifically effective against ROP.
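The claim that the shadow stack only ever holds what CALL put there can be illustrated with a toy model. This sketch assumes a simplified machine in which only CALL and RET touch the shadow stack (real CET also provides dedicated shadow-stack management instructions, which this model omits); `ToyCpu` and `ControlProtectionFault` are hypothetical names. A legitimate call/return pair passes the check, while a return slot overwritten with G1's address raises the modeled #CP.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP exception raised on a shadow-stack mismatch."""


class ToyCpu:
    """Toy model of SHSTK bookkeeping, not of real hardware.

    CALL pushes the return address onto both stacks; RET pops both and
    compares them. Ordinary stores (such as an overflow) can modify only
    the regular stack in this model.
    """

    def __init__(self):
        self.stack = []   # regular stack (top = end of list)
        self.shadow = []  # shadow stack (top = end of list)

    def call(self, return_address):
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret(self):
        target = self.stack.pop()
        expected = self.shadow.pop()
        if target != expected:
            raise ControlProtectionFault(
                f"#CP: ret to {target:#x}, shadow stack has {expected:#x}")
        return target


cpu = ToyCpu()
cpu.call(0x40159f)          # main calls the vulnerable function
print(hex(cpu.ret()))       # legitimate return succeeds: 0x40159f

cpu.call(0x40159f)          # the same call again...
cpu.stack[-1] = 0x401200    # ...but the overflow overwrites the slot with G1
try:
    cpu.ret()
except ControlProtectionFault as fault:
    print(fault)            # #CP: ret to 0x401200, shadow stack has 0x40159f
```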

What Information the Attacker Needed

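This chain required:
1. The offset from the buffer to the saved return address (for example, 72 bytes for a 64-byte buffer directly followed by the 8-byte saved RBP; the real value depends on the compiler's frame layout)
2. The addresses of each gadget (G1 through G5), which are randomized per run under ASLR+PIE
3. The address of the /bin/sh string, which is likewise randomized when the binary is built as PIE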

With ASLR+PIE, all five gadget addresses and the string address change with every run. However, ASLR moves the whole executable by a single random base, so an information leak that reveals one address in the executable lets us compute every gadget address (each is a fixed offset from that base). Likewise, if the chain uses libc gadgets, a leak of one libc address reveals all of them.
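The arithmetic behind "one leak reveals everything" fits in a few lines. This is a minimal sketch under stated assumptions: the offsets are the case study's addresses minus an assumed image base of 0x400000, the runtime base and leaked value are hypothetical, and `rebase` is an illustrative helper, not part of any tool. Because load bases are page-aligned, a derived base that is not page-aligned signals that the offset assumption is wrong.

```python
# Illustrative offsets within the executable image: the case study's
# addresses minus an ASSUMED base of 0x400000. Not from any real binary.
OFFSETS = {
    "G1": 0x1200,     # pop rdi; ret
    "G2": 0x1210,     # pop rax; ret
    "G3": 0x1220,     # xor rsi, rsi; ret
    "G4": 0x1230,     # xor rdx, rdx; ret
    "G5": 0x1240,     # syscall; ret
    "binsh": 0x4040,  # "/bin/sh" string in .data
}
PAGE = 0x1000  # load bases are page-aligned (4 KiB pages assumed)


def rebase(leaked_addr, leaked_name):
    """Recover every address in the image from one leaked address.

    ASLR with PIE slides the whole image by a single base, so the base is
    the leaked address minus that item's fixed offset.
    """
    base = leaked_addr - OFFSETS[leaked_name]
    if base % PAGE:
        raise ValueError(f"derived base {base:#x} is not page-aligned")
    return {name: base + off for name, off in OFFSETS.items()}


# Hypothetical run: the image was loaded at 0x55d0c3a00000 and a bug
# leaked the runtime address of the "/bin/sh" string.
addrs = rebase(0x55D0C3A04040, "binsh")
for name, addr in addrs.items():
    print(f"{name:<6} {addr:#x}")
```

From a defender's point of view, this is why a single leaked pointer reduces ASLR's entropy for the whole image to zero.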

The Lesson for Defenders

This trace illustrates why:
1. SHSTK is specifically effective: it attacks the ret mechanism that makes ROP work
2. ASLR is necessary but not sufficient: information leaks defeat it
3. Multiple mitigations together are strong: ASLR + SHSTK + IBT makes the full chain impractical, with IBT (indirect branch tracking) restricting indirect jmp/call targets and so closing the variants that avoid ret
4. Information leaks are the first thing to prevent: they break ASLR by revealing the addresses a chain needs

Understanding the chain step by step makes it clear what each mitigation blocks and why defense in depth works.