Chapter 11 Key Takeaways: The Stack and Function Calls

  1. The stack is ordinary memory, growing downward (toward lower addresses). RSP points to the top (lowest valid address). PUSH decrements RSP then stores; POP loads then increments RSP. There is nothing magical about the stack except the convention that RSP tracks its top.

  2. CALL pushes the return address (the address of the instruction after the CALL) and jumps. RET pops the return address and jumps to it. This mechanism works only if RSP points at the return address at the moment RET executes, which means the function must restore RSP to the value it had on entry. RET then pops the 8 bytes that CALL pushed, leaving RSP where it was before the CALL.

  3. The standard function prologue: push rbp; mov rbp, rsp; sub rsp, N. This saves the caller's frame pointer, establishes a stable frame pointer for the current function, and allocates N bytes of local storage. The epilogue reverses this with leave; ret (or mov rsp, rbp; pop rbp; ret).

  4. With the standard prologue, local variables live at negative offsets from RBP: [rbp-8], [rbp-16], etc. The caller's saved RBP is at [rbp] and the return address is at [rbp+8], so function arguments beyond the sixth live at positive offsets starting at [rbp+16]: [rbp+16], [rbp+24], etc.

  5. The System V AMD64 ABI passes the first six integer arguments in RDI, RSI, RDX, RCX, R8, R9 (in that order). Floating-point arguments use XMM0-XMM7. Integer arguments beyond six go on the stack, pushed in right-to-left order so that the seventh argument sits at the lowest address.

  6. Callee-saved registers (RBX, RBP, R12-R15) must be preserved across function calls. If your function uses them, push them in the prologue and pop them before returning. Caller-saved registers (RAX, RCX, RDX, RSI, RDI, R8-R11) may be freely clobbered by any function you call.

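  7. Before any call instruction, RSP must be 16-byte aligned. The CALL itself pushes 8 bytes, so at function entry RSP is 8 bytes off a 16-byte boundary (RSP mod 16 = 8). The function's push rbp makes RSP 16-byte aligned again. The subsequent sub rsp, N must then use N as a multiple of 16 to maintain alignment. This assumes no further pushes: an odd number of additional 8-byte pushes (for callee-saved registers, say) shifts the requirement by 8.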

  8. N in sub rsp, N must be rounded up to the next multiple of 16 if your local variable size is not already a multiple of 16. Extra padding bytes are wasted but necessary for alignment.

  9. Leaf functions (those that make no calls) can omit the frame setup entirely as long as they do not modify RSP. They can hold all state in registers (which cannot be clobbered by non-existent calls) and simply ret.

  10. GCC's -fomit-frame-pointer (default at -O1+) uses RSP directly for all local variable addressing, freeing RBP as a general-purpose register. The debugger uses DWARF unwind tables to walk the call stack without frame pointers.

  11. The return address at [rbp+8] is the fundamental target of buffer overflow attacks. A local buffer at a negative offset from RBP, written without bounds checking, can reach and overwrite the return address. When ret executes, it jumps to the attacker-supplied value.

  12. Stack canaries (-fstack-protector) place a random value between locals and the saved RBP. Before ret, the canary is checked. If it has been overwritten (which any contiguous overflow running from a local buffer up to the return address must do), the program aborts. This is the primary defense against the overflow pattern.

  13. xor al, al before call printf (or any variadic function) tells the callee how many XMM registers carry arguments: the ABI requires AL to hold an upper bound on that count (zero in this case). Failing to set AL before calling a variadic function can cause the callee to save or read XMM registers based on a garbage count, leading to slow behavior or crashes.

  14. The leave instruction is equivalent to mov rsp, rbp; pop rbp. It is a compact, single-byte epilogue instruction that restores both RSP and RBP. Some programmers prefer the explicit two-instruction form for clarity.

  15. Stack overflow (not to be confused with buffer overflow) occurs when recursion or deep function calls consume all available stack space. The OS caps the stack at a fixed limit (typically 8 MB by default on Linux; see ulimit -s). When RSP crosses that limit, the next memory access below it generates a SIGSEGV.
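
Takeaways 1 and 2 can be sketched as a toy model in C. This is a simulation under stated assumptions, not real hardware behavior: stack_mem, rsp, rip, push64, pop64, call, and ret are hypothetical names invented for the illustration, and the 64-byte stack size is arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: 64 bytes of "stack memory" and software copies of RSP/RIP. */
enum { STACK_BYTES = 64 };
static uint8_t  stack_mem[STACK_BYTES];
static uint64_t rsp = STACK_BYTES;  /* empty stack: RSP sits just past the highest byte */
static uint64_t rip = 0;            /* modeled instruction pointer */

/* PUSH: decrement RSP by 8, then store at the new top (lower address). */
void push64(uint64_t value) {
    assert(rsp >= 8);               /* bounds check exists only in the model */
    rsp -= 8;
    memcpy(&stack_mem[rsp], &value, sizeof value);
}

/* POP: load from the top, then increment RSP by 8. */
uint64_t pop64(void) {
    uint64_t value;
    assert(rsp + 8 <= STACK_BYTES); /* bounds check exists only in the model */
    memcpy(&value, &stack_mem[rsp], sizeof value);
    rsp += 8;
    return value;
}

/* CALL: push the address of the next instruction, then jump to the target. */
void call(uint64_t target, uint64_t next_instruction) {
    push64(next_instruction);
    rip = target;
}

/* RET: pop whatever RSP currently points at and jump there. */
void ret(void) {
    rip = pop64();
}
```

In this model, a function body that pushes a value and forgets to pop it makes ret() load that value into rip instead of the return address. That is the failure takeaway 2 warns about.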
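
The alignment arithmetic in takeaways 7 and 8 can be checked with a few lines of C. This is a minimal sketch: round_up_16 and aligned_after_prologue are hypothetical helper names, and the second function models only the simple prologue (push rbp; sub rsp, N) with no other pushes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round a local-variable byte count up to the next multiple of 16. */
size_t round_up_16(size_t locals_bytes) {
    return (locals_bytes + 15) & ~(size_t)15;
}

/* Follow RSP through CALL, push rbp, and sub rsp, N.
 * Returns true if RSP is 16-byte aligned afterwards, i.e. ready for
 * the function to issue its own CALL. */
bool aligned_after_prologue(uint64_t rsp_before_call, size_t n) {
    assert(rsp_before_call % 16 == 0); /* ABI precondition at the call site */
    uint64_t rsp = rsp_before_call;
    rsp -= 8;  /* CALL pushes the return address: RSP mod 16 is now 8 */
    rsp -= 8;  /* push rbp: RSP is 16-byte aligned again */
    rsp -= n;  /* sub rsp, N */
    return rsp % 16 == 0;
}
```

For example, 40 bytes of locals give round_up_16(40) == 48. Using N = 48 keeps the stack aligned, whereas N = 40 leaves RSP 8 bytes off.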
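
The argument locations from takeaways 4 and 5 follow a simple rule, sketched here in C. The helper names int_arg_register and stack_arg_rbp_offset are hypothetical. The sketch assumes integer arguments only, the standard prologue, and that each stack argument occupies one 8-byte slot.

```c
#include <assert.h>
#include <stddef.h>

/* Integer argument registers in System V AMD64 order. */
static const char *const INT_ARG_REGS[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Register holding integer argument `index` (0-based),
 * or NULL if that argument is passed on the stack. */
const char *int_arg_register(unsigned index) {
    return index < 6 ? INT_ARG_REGS[index] : NULL;
}

/* Offset from RBP of a stack-passed integer argument (0-based index >= 6).
 * [rbp] holds the saved RBP and [rbp+8] the return address, so the
 * seventh argument (index 6) starts at [rbp+16]. */
long stack_arg_rbp_offset(unsigned index) {
    assert(index >= 6); /* the first six arguments are in registers */
    return 16 + 8L * (long)(index - 6);
}
```

For an eight-argument function this gives the seventh argument at [rbp+16] and the eighth at [rbp+24], matching takeaway 4.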
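
The overflow and canary mechanics in takeaways 11 and 12 can be modeled without touching a real stack. In this sketch, struct frame_model is a hypothetical stand-in for one stack frame, laid out lowest address first. The 16-byte buffer size, the field names, and the helper functions are all assumptions made for illustration, and a real compiler's frame layout may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of one frame, lowest address first. */
struct frame_model {
    uint8_t  buffer[16];      /* local buffer:  [rbp-24] .. [rbp-9] */
    uint64_t canary;          /* stack canary:  [rbp-8]             */
    uint64_t saved_rbp;       /* caller's RBP:  [rbp]               */
    uint64_t return_address;  /* return addr:   [rbp+8]             */
};

_Static_assert(offsetof(struct frame_model, return_address) == 32,
               "model assumes no padding between fields");

void frame_init(struct frame_model *f, uint64_t canary,
                uint64_t saved_rbp, uint64_t return_address) {
    memset(f->buffer, 0, sizeof f->buffer);
    f->canary = canary;
    f->saved_rbp = saved_rbp;
    f->return_address = return_address;
}

/* Models a copy with no check against the buffer size: bytes beyond
 * the first 16 spill into canary, saved RBP, and return address. */
void unchecked_write(struct frame_model *f, const uint8_t *src, size_t len) {
    assert(len <= sizeof *f);  /* keeps the model itself in bounds */
    memcpy((unsigned char *)f, src, len);
}

/* The check a protected function performs before ret. */
bool canary_intact(const struct frame_model *f, uint64_t expected) {
    return f->canary == expected;
}
```

Writing 16 bytes leaves the canary and return address untouched. Writing 40 bytes of 0x41 replaces the return address with 0x4141414141414141, but only after destroying the canary, which is what the check before ret detects.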
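
The stack limit mentioned in takeaway 15 can be read at run time. This is a minimal sketch assuming a POSIX system, where getrlimit and RLIMIT_STACK are available. The wrapper name stack_soft_limit and its out-parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Query the soft stack limit for the current process.
 * Returns 0 on success, -1 if getrlimit fails.
 * On success, *unlimited is 1 when no limit is set (and *bytes is 0);
 * otherwise *unlimited is 0 and *bytes holds the limit in bytes. */
int stack_soft_limit(uint64_t *bytes, int *unlimited) {
    struct rlimit rl;
    assert(bytes != NULL && unlimited != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return -1;
    }
    *unlimited = (rl.rlim_cur == RLIM_INFINITY);
    *bytes = *unlimited ? 0 : (uint64_t)rl.rlim_cur;
    return 0;
}
```

On a Linux system with default settings the reported value is typically 8388608 bytes (8 MB), though administrators and shells can change it.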