Case Study 4.2: Buffer Overflow Preview — Why Stack Layout Matters for Security

Open Assembly Language Project

Understanding the memory model before the exploit

Overview

Chapter 11 covers buffer overflows and mitigations in full technical detail, including how to write and defend against stack smashing attacks. This case study is not that chapter. This is the prerequisite: understanding the stack memory layout clearly enough that when Chapter 11 arrives, the mechanics are obvious rather than mysterious.

The goal here is simple: given a function with a local buffer, draw the exact stack layout, label every location, and understand what is adjacent to what. Security is memory layout made adversarial.

A Function and Its Stack Frame

Consider this simple function:

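// C version:
#include <unistd.h>     // read()

void process_input(int fd) {
    char buffer[64];    // local buffer on the stack
    int bytes_read;

    // Read at most 63 bytes so the terminator below stays inside buffer.
    bytes_read = (int)read(fd, buffer, sizeof(buffer) - 1);
    if (bytes_read < 0)
        return;         // read() failed; nothing to process
    buffer[bytes_read] = '\0';
    // process buffer...
}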

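Compiled with gcc -O0, the prologue looks something like this (the offsets are illustrative; the real ones depend on the compiler version and flags):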

process_input:
    push   rbp
    mov    rbp, rsp
    sub    rsp, 80              ; allocate: 64 (buffer) + 4 (bytes_read) + 12 (padding)

    ; buffer is at [rbp-64] to [rbp-1]  (64 bytes)
    ; bytes_read is at [rbp-68]          (4 bytes)
    ; 12 bytes of padding to maintain alignment

Now let's draw the stack frame. Assume the function was called when RSP = 0x7FFFE048 (which means RSP = 0x7FFFE040 after the CALL instruction pushes the return address).

Start by tracking RSP through the CALL and the three prologue instructions:

Step                           RSP afterwards   Effect
──────────────────────────────────────────────────────────────────────────
before CALL                    0x7FFFE048       caller's frame ends here
CALL (pushes return address)   0x7FFFE040       return address at [0x7FFFE040]
push rbp                       0x7FFFE038       saved RBP at [0x7FFFE038]
mov rbp, rsp                   0x7FFFE038       RBP = 0x7FFFE038
sub rsp, 80                    0x7FFFDFE8       0x7FFFE038 - 0x50 = 0x7FFFDFE8
──────────────────────────────────────────────────────────────────────────

So RBP = 0x7FFFE038 and RSP = 0x7FFFDFE8 for the body of the function.
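The same arithmetic can be replayed in a few lines of C. This is only a sketch: `regs_t` and `simulate_prologue` are hypothetical names invented here, and the function models nothing beyond the RSP and RBP updates described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two registers the prologue touches. */
typedef struct {
    uint64_t rsp;
    uint64_t rbp;
} regs_t;

/* Replay CALL plus the three prologue instructions as plain arithmetic. */
regs_t simulate_prologue(uint64_t rsp_before_call, uint64_t locals_size)
{
    regs_t r = { .rsp = rsp_before_call, .rbp = 0 };

    /* Guard against wrapping below address zero in this toy model. */
    assert(rsp_before_call >= 16 + locals_size);

    r.rsp -= 8;            /* CALL pushes the 8-byte return address */
    r.rsp -= 8;            /* push rbp stores the caller's RBP      */
    r.rbp  = r.rsp;        /* mov rbp, rsp                          */
    r.rsp -= locals_size;  /* sub rsp, 80                           */
    return r;
}
```

Calling `simulate_prologue(0x7FFFE048, 80)` reproduces the two register values used in the rest of this case study.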

Frame layout (relative to RBP = 0x7FFFE038). Each address labels the lowest byte of the row beneath it:

High addresses
┌──────────────────────────────────────────┐ 0x7FFFE048  caller's frame
│         caller's stack frame             │
├──────────────────────────────────────────┤ 0x7FFFE040  ← [rbp+8]
│     RETURN ADDRESS (8 bytes)             │  ← this is what we protect
├──────────────────────────────────────────┤ 0x7FFFE038  ← [rbp+0], RBP (frame pointer)
│     SAVED RBP (8 bytes)                  │
├──────────────────────────────────────────┤ 0x7FFFE028  ← [rbp-16]
│     buffer[48] to buffer[63] (16 bytes)  │  ← top of buffer; buffer[63] is at [rbp-1]
├──────────────────────────────────────────┤ 0x7FFFE018  ← [rbp-32]
│     buffer[32] to buffer[47] (16 bytes)  │
├──────────────────────────────────────────┤ 0x7FFFE008  ← [rbp-48]
│     buffer[16] to buffer[31] (16 bytes)  │
├──────────────────────────────────────────┤ 0x7FFFDFF8  ← [rbp-64]
│     buffer[0]  to buffer[15] (16 bytes)  │  ← first byte of buffer
├──────────────────────────────────────────┤ 0x7FFFDFF4  ← [rbp-68]
│     bytes_read (4 bytes)                 │
├──────────────────────────────────────────┤ 0x7FFFDFE8  ← [rbp-80], RSP
│     padding (12 bytes)                   │  alignment
└──────────────────────────────────────────┘
Low addresses

Key relationships:
- buffer[0] is at [rbp-64] = 0x7FFFDFF8
- buffer[63] (last byte) is at [rbp-1] = 0x7FFFE037
- bytes_read is at [rbp-68] = 0x7FFFDFF4
- Saved RBP is at [rbp+0] = 0x7FFFE038
- Return address is at [rbp+8] = 0x7FFFE040
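One way to double-check these offsets is to describe the frame as a C struct, lowest address first, and let `offsetof` do the arithmetic. This is a sketch under the layout given in the assembly comments (buffer at [rbp-64], bytes_read at [rbp-68], 12 bytes of padding). `frame_model_t`, `rbp_offset`, `frame_address`, and `check_frame_model` are hypothetical names, and the struct is a model of the frame, not something the compiler emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the frame, lowest address first. Every member is
 * naturally aligned, so the struct should contain no hidden padding;
 * check_frame_model() asserts that assumption. */
typedef struct {
    uint8_t  padding[12];      /* [rbp-80] .. [rbp-69] */
    int32_t  bytes_read;       /* [rbp-68] .. [rbp-65] */
    char     buffer[64];       /* [rbp-64] .. [rbp-1]  */
    uint64_t saved_rbp;        /* [rbp+0]  .. [rbp+7]  */
    uint64_t return_address;   /* [rbp+8]  .. [rbp+15] */
} frame_model_t;

/* Convert an offset inside the model to an RBP-relative offset. */
long rbp_offset(size_t model_offset)
{
    return (long)model_offset - (long)offsetof(frame_model_t, saved_rbp);
}

/* Absolute address of a model offset for a given RBP value. */
uint64_t frame_address(uint64_t rbp, size_t model_offset)
{
    return rbp + (uint64_t)(int64_t)rbp_offset(model_offset);
}

/* The buffer ends exactly where the saved RBP begins, and the saved RBP
 * ends exactly where the return address begins. */
void check_frame_model(void)
{
    assert(offsetof(frame_model_t, buffer) + 64 ==
           offsetof(frame_model_t, saved_rbp));
    assert(offsetof(frame_model_t, saved_rbp) + 8 ==
           offsetof(frame_model_t, return_address));
}
```

With RBP = 0x7FFFE038, `frame_address(rbp, offsetof(frame_model_t, buffer))` gives the address of buffer[0], and adding 63 to the offset gives the address of buffer[63].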

The Adjacency Problem

Look at the memory addresses. The buffer occupies bytes 0x7FFFDFF8 through 0x7FFFE037. The saved RBP starts at 0x7FFFE038, the very next byte. The return address starts at 0x7FFFE040, immediately after the saved RBP. On the other side, bytes_read occupies 0x7FFFDFF4 through 0x7FFFDFF7 and ends just below buffer[0].

These are all adjacent in memory. There is no gap between bytes_read and the start of buffer. There is no gap between the end of buffer and the saved RBP. There is no gap between the saved RBP and the return address.

If a write to buffer goes past byte 63 (past [rbp-1]), the very next byte it writes is the lowest byte of the saved RBP at [rbp+0]. Eight bytes after that it reaches the return address at [rbp+8]. A 72-byte write starting at buffer[0] therefore replaces the saved RBP, and an 80-byte write replaces both the saved RBP and the return address.

The exact offsets depend on the compiler version and flags. A real GCC -O0 build may order the locals differently, insert padding between them, or reserve extra slots (for example, one for the fd argument). The layout matters less than the principle.

The principle: The local variables, the saved frame pointer, and the return address are all in the same stack frame, at consecutive memory addresses. Writing past the end of a local array overwrites the other contents of the frame — including, eventually, the return address.
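The principle can be observed safely by simulating the frame as a byte array and performing the oversized write inside that array, so nothing real is corrupted. This is a sketch under the illustrative layout from the assembly comments (64-byte buffer directly below the saved RBP). `sim_frame_t`, `unchecked_write`, `read_slot`, `write_slot`, and the enum constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    FRAME_SIZE    = 96,  /* 80 bytes of locals + saved RBP + return address */
    BUFFER_OFFSET = 16,  /* buffer[0] sits 16 bytes above the lowest byte   */
    SAVED_RBP_OFF = 80,  /* corresponds to [rbp+0]                          */
    RETADDR_OFF   = 88   /* corresponds to [rbp+8]                          */
};

/* Hypothetical simulated frame: index 0 is the lowest address ([rbp-80]). */
typedef struct {
    uint8_t bytes[FRAME_SIZE];
} sim_frame_t;

/* Copy n bytes starting at buffer[0] without checking the buffer's size,
 * the way an unchecked copy would. The assert keeps the simulation itself
 * inside the array. */
void unchecked_write(sim_frame_t *f, const uint8_t *src, size_t n)
{
    assert(BUFFER_OFFSET + n <= FRAME_SIZE);
    memcpy(f->bytes + BUFFER_OFFSET, src, n);
}

/* Read an 8-byte little-endian slot from the simulated frame. */
uint64_t read_slot(const sim_frame_t *f, size_t offset)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t)f->bytes[offset + i] << (8 * i);
    return v;
}

/* Store an 8-byte little-endian value into the simulated frame. */
void write_slot(sim_frame_t *f, size_t offset, uint64_t v)
{
    for (size_t i = 0; i < 8; i++)
        f->bytes[offset + i] = (uint8_t)(v >> (8 * i));
}
```

Filling the saved-RBP and return-address slots with known values, then calling `unchecked_write` with 64, 72, and 80 bytes in turn, shows the progression: the first write leaves both slots intact, the second replaces only the saved RBP, and the third replaces the return address as well.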

What the CPU Sees When the Function Returns

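The RET instruction does two things:
1. Load the 8 bytes at [RSP] into RIP. By this point the epilogue has released the locals and popped the saved RBP, so RSP points at the slot the function body addressed as [rbp+8].
2. Add 8 to RSP.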

The CPU does NOT verify that this value is a valid code address, a legal instruction boundary, or a location the programmer intended to return to. It loads the bytes, puts them in RIP, and executes whatever instruction is there.

This is the memory model consequence: the return address is just data on the stack. It's bytes in memory, adjacent to other bytes. If those bytes can be overwritten, the CPU will obediently jump to whatever address the new bytes specify.
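A toy model of RET in C makes the absence of any validation visible. It is a sketch only: `cpu_model_t` and `model_ret` are hypothetical names, and the model covers just the load into RIP and the RSP adjustment described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical machine state: a small window of stack memory plus RSP and RIP. */
typedef struct {
    uint64_t base;        /* address that stack[0] represents */
    uint8_t  stack[64];   /* simulated stack memory           */
    uint64_t rsp;
    uint64_t rip;
} cpu_model_t;

/* Model of RET: load 8 bytes at [RSP] into RIP, then add 8 to RSP.
 * Note what is absent: nothing checks that the loaded value is a code
 * address. The asserts only keep the model itself in bounds. */
void model_ret(cpu_model_t *cpu)
{
    assert(cpu->rsp >= cpu->base);
    uint64_t index = cpu->rsp - cpu->base;
    assert(index <= sizeof cpu->stack - 8);

    uint64_t target;
    memcpy(&target, cpu->stack + index, sizeof target);  /* host byte order */
    cpu->rip  = target;
    cpu->rsp += 8;
}
```

Whatever 8 bytes sit in the slot, whether the original return address or a value planted by an overflow, end up in `rip` unchanged.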

Mitigations That Operate at the Memory Level

Understanding the stack layout makes the security mitigations that Chapter 11 covers immediately understandable:

Stack Canary (GCC -fstack-protector):

┌──────────────────────────────────────────┐
│         caller's stack frame             │
├──────────────────────────────────────────┤ [rbp+8]
│     RETURN ADDRESS                       │
├──────────────────────────────────────────┤ [rbp+0]
│     SAVED RBP                            │
├──────────────────────────────────────────┤ [rbp-8]
│     STACK CANARY (8 bytes)               │  ← placed between buffer and retaddr
├──────────────────────────────────────────┤ [rbp-72]
│     buffer[0..63]                        │  ← buffer[63] at [rbp-9], next to the canary
├──────────────────────────────────────────┤ [rbp-80]
│     bytes_read (4 bytes) + padding       │
└──────────────────────────────────────────┘

The compiler inserts code that stores a random value (the "canary") between the buffer and the return address on function entry, and checks that the value is unchanged before returning. A sequential overflow that reaches the return address must first overwrite the canary. If the canary value has changed, the program calls __stack_chk_fail() and terminates.

In assembly, you can see this in compiler output:

; Function prologue with stack canary:
mov  rax, QWORD [fs:0x28]      ; load canary from TLS
mov  QWORD [rbp-8], rax        ; store canary in frame

; Function epilogue with canary check:
mov  rax, QWORD [rbp-8]        ; load stored canary
xor  rax, QWORD [fs:0x28]      ; compare with TLS canary
jne  .stack_smash_detected     ; if different, abort
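The store-then-compare logic can be sketched in C over a simulated frame. This is a model, not the compiler's implementation: `guarded_frame_t`, `guarded_enter`, `guarded_check`, and `guarded_write` are hypothetical names, the reference canary is an arbitrary value supplied by the caller, and the buffer is assumed to sit directly below the canary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame model, lowest address first, with the canary between
 * the buffer and the saved RBP. */
typedef struct {
    char     buffer[64];
    uint64_t canary;
    uint64_t saved_rbp;
    uint64_t return_address;
} guarded_frame_t;

/* Prologue step: copy the reference canary into the frame. */
void guarded_enter(guarded_frame_t *f, uint64_t reference_canary)
{
    f->canary = reference_canary;
}

/* Epilogue step: the XOR is zero only when the two values are identical. */
bool guarded_check(const guarded_frame_t *f, uint64_t reference_canary)
{
    return (f->canary ^ reference_canary) == 0;
}

/* Simulated unchecked copy starting at buffer[0]; it may spill into the
 * members that follow. The assert keeps it inside the struct. */
void guarded_write(guarded_frame_t *f, const void *src, size_t n)
{
    assert(n <= sizeof *f);
    memcpy((unsigned char *)f, src, n);
}
```

A 64-byte write leaves `guarded_check` returning true. A 65-byte write changes one byte of the canary, so the check fails before the saved RBP or return address has been touched.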

ASLR (Address Space Layout Randomization): ASLR randomizes the base addresses of the stack, heap, and shared libraries. Even if you can overwrite the return address, you don't know what address to put there — the stack is at a different random address each run. From the memory model perspective: ASLR changes the numbers in the stack frame layout without changing the structure.
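A small probe can make the effect observable. The sketch below records the address of a stack local so it can be printed; `record_stack_address` and `print_stack_sample` are hypothetical helper names, and the address is only reported, never dereferenced after the function returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Store the address of a local variable, as an integer, in *out. */
void record_stack_address(uintptr_t *out)
{
    volatile char probe = 0;   /* volatile so the local is kept in memory */
    assert(out != NULL);
    *out = (uintptr_t)&probe;
}

/* Print one sample; run the program several times and compare the output. */
void print_stack_sample(void)
{
    uintptr_t address = 0;
    record_stack_address(&address);
    printf("stack local at 0x%llx\n", (unsigned long long)address);
}
```

On a Linux system with ASLR enabled, calling `print_stack_sample()` from `main` typically prints a different address on each run. With ASLR disabled, the value typically stays the same from run to run.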

NX Bit (No-Execute): Pages in the stack region are mapped without execute permission (visible in /proc/self/maps as rw-p without x). Even if you overwrite the return address to point at shellcode you've placed in the buffer, the CPU won't execute it — the page fault on the non-executable stack will kill the program.
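The permission check can be done programmatically by reading /proc/self/maps. This is a Linux-only sketch: `maps_line_perms`, `maps_line_is_stack`, and `stack_is_executable` are hypothetical helper names, and the parser assumes the usual `start-end perms offset dev inode path` line format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parse one maps line and copy its 4-character permission field
 * (for example "rw-p") into perms_out, which must hold 5 bytes. */
bool maps_line_perms(const char *line, char perms_out[5])
{
    unsigned long long start, end;
    assert(line != NULL && perms_out != NULL);
    return sscanf(line, "%llx-%llx %4s", &start, &end, perms_out) == 3;
}

/* True when the line describes the main thread's stack mapping. */
bool maps_line_is_stack(const char *line)
{
    return strstr(line, "[stack]") != NULL;
}

/* Scan /proc/self/maps and report whether the [stack] mapping carries the
 * execute bit: 1 = executable, 0 = not executable, -1 = not found. */
int stack_is_executable(void)
{
    char line[512], perms[5];
    int result = -1;
    FILE *fp = fopen("/proc/self/maps", "r");
    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (maps_line_is_stack(line) && maps_line_perms(line, perms)) {
            result = (perms[2] == 'x');
            break;
        }
    }
    fclose(fp);
    return result;
}
```

A return value of 0 from `stack_is_executable()` corresponds to the rw-p entry described above. A return value of -1 means the file or the [stack] line was not available, for example on a non-Linux system.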

Why Stack Layout Knowledge Is Foundational

Every buffer overflow vulnerability, every return-oriented programming (ROP) technique, every stack use-after-return bug, and every stack pivot starts from the same foundation: you understand exactly what is at each offset from the stack pointer, what each value means, and what the CPU will do with it.

The memory model from Chapter 4 is not abstract theory. It is the concrete substrate that determines:
- Which bytes are adjacent to the buffer
- What the return address is and where it lives
- What happens when RET executes
- Where the canary is and why overflowing past it is detected
- Why ASLR makes exploitation harder but not impossible

Security work at the assembly level is applied memory model knowledge. Nothing about Chapter 11's material will be surprising if you understand the stack frame layout described here.

A Concrete Example to Internalize

Write these six facts down and remember them.

For a function called from address 0x401234 with one 64-byte local buffer:
1. buffer[0] is at the buffer's lowest address (its most negative offset from RBP)
2. buffer[63] is adjacent to the stack canary (if enabled) or directly to saved RBP
3. Saved RBP is 8 bytes, immediately above the buffer/canary
4. Return address is 8 bytes, immediately above saved RBP
5. RET loads the return address into RIP; it's just a memory load
6. The CPU doesn't care if the return address is valid; it loads and jumps

This is the complete model. Everything in Chapter 11 is a consequence of these six facts.