Chapter 21 Key Takeaways: Understanding Compiler Output

  1. AT&T syntax puts the source operand before the destination, the opposite of Intel (NASM) syntax. movq %rbx, %rax means rax = rbx. Size suffixes (b/w/l/q) append to the mnemonic. Registers are prefixed with %, immediates with $, and memory is disp(base,index,scale).
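
As a rough illustration of the disp(base,index,scale) form, the hypothetical function below (next_element is an assumed name, not taken from the chapter) reads the element one past index i. On x86-64 with GCC at -O2 this would typically compile to a single load along the lines of movq 8(%rdi,%rsi,8), %rax, though exact output depends on compiler version and flags.

```c
/* Hypothetical example: under the System V x86-64 calling convention,
   arr arrives in %rdi and i in %rsi. */
long next_element(const long *arr, long i)
{
    /* Address = arr + i*8 + 8, which fits disp(base,index,scale)
       with disp = 8, base = arr, index = i, scale = 8. */
    return arr[i + 1];
}
```

Compiling with gcc -S -O2 and again with -masm=intel shows the same load written in both operand orders.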

  2. gcc -S program.c produces AT&T syntax assembly. Add -masm=intel for Intel syntax, -fverbose-asm for comments linking instructions to C variables, and -O0/-O2/-O3 to control optimization.

  3. GCC -O0 stores every variable at a fixed RBP-relative stack offset and reloads before each use. This is predictable and debugger-friendly but generates 3-5× more instructions than optimized code. Every local variable at -O0 has an addressable stack location.

  4. At -O2, GCC usually needs no stack frame at all for small leaf functions: values stay in registers throughout, and the prologue (push rbp; mov rbp, rsp) and epilogue are omitted when nothing requires them. A missing prologue is the first thing to look for when reading -O2 output.
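
To see takeaways 3 and 4 side by side, a small leaf function such as the hypothetical add3 below can be compiled twice, once with gcc -S -O0 and once with gcc -S -O2. This is only a sketch and exact output varies by GCC version, but the -O0 listing should show each parameter and the local t stored to RBP-relative slots, while the -O2 listing usually reduces to a couple of register-only instructions and ret.

```c
/* Hypothetical leaf function (it calls nothing), chosen so the -O0 and
   -O2 listings are short enough to compare line by line. */
int add3(int a, int b, int c)
{
    int t = a + b;   /* at -O0, t gets its own stack slot */
    return t + c;    /* at -O2, t normally lives only in a register */
}
```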

  5. GCC converts conditional branches to CMOV instructions when both outcomes are cheap to compute and the branch would be hard to predict (like absolute value or max). movl %edi, %eax; negl %eax; testl %edi, %edi; cmovns %edi, %eax is the canonical CMOVNS pattern for absolute value: compute -x, then overwrite it with x if x is non-negative.
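
The two functions below are the kind of source that tends to produce CMOV. The names abs_value and max_of are assumptions for illustration. At -O2 on x86-64, GCC is expected to emit a conditional move for each instead of a jump, but the exact sequence differs between compiler versions.

```c
/* Hypothetical helpers: both arms of each conditional are cheap,
   side-effect-free expressions, which makes them CMOV candidates. */
int abs_value(int x)
{
    /* Note: -x overflows (undefined behavior) when x is INT_MIN. */
    return x < 0 ? -x : x;
}

int max_of(int a, int b)
{
    return a > b ? a : b;
}
```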

  6. Switch statements with dense cases generate jump tables. The pattern jmp *.Ltable(,%rdi,8) is an indirect jump through a table of 64-bit addresses. Before the jump, GCC checks that the case value is within range (unsigned compare with the maximum case value).
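
A switch like the hypothetical dispatch function below, with consecutive case values that each do different work, is a likely candidate for a jump table at -O2. Treat this as a sketch: GCC decides based on case count and density, and when every case merely returns a constant it may instead build a lookup table of values with no indirect jump at all.

```c
/* Hypothetical example: six dense cases (0..5), each running different code.
   op is unsigned, so a single unsigned compare covers the range check. */
int dispatch(unsigned op, int a, int b)
{
    switch (op) {
    case 0: return a + b;
    case 1: return a - b;
    case 2: return a * b;
    case 3: return a & b;
    case 4: return a | b;
    case 5: return a ^ b;
    default: return 0;   /* reached when op is out of range */
    }
}
```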

  7. Integer division by a compile-time constant is replaced with multiply-high + shift. The "magic number" (like 1717986919, i.e. 0x66666667, for signed 32-bit division by 5) is computed at compile time. This avoids the slow IDIV instruction entirely. The technique works on all architectures (x86-64 uses IMUL, ARM64 uses SMULL, RISC-V uses MULH).
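
The multiply-and-shift idea can be checked in plain C. The sketch below assumes signed 32-bit division by 5, for which 1717986919 (0x66666667) is the usual multiplier. The function name div5_magic is hypothetical. It also assumes that right-shifting a negative signed value is an arithmetic shift, which is implementation-defined in C but is what GCC does.

```c
#include <stdint.h>

/* Emulates what the compiler does for x / 5 on a 32-bit signed int. */
int32_t div5_magic(int32_t x)
{
    /* 64-bit product; the high 32 bits are the "multiply-high" result. */
    int64_t product = (int64_t)x * 1717986919LL;   /* 0x66666667 */

    /* Shift by 32 to take the high half, plus 1 more for this divisor.
       Assumes arithmetic right shift of negative values (true for GCC). */
    int32_t q = (int32_t)(product >> 33);

    /* x >> 31 is -1 for negative x and 0 otherwise; subtracting it adds 1
       for negative inputs so the result truncates toward zero like x / 5. */
    q -= x >> 31;
    return q;
}
```

Comparing div5_magic(x) against x / 5 over a range of inputs, including negative values, is a quick way to confirm the constant and shift count.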

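  8. LEA performs address-style arithmetic (base + index*scale + displacement) in one instruction without touching the flags. leaq 8(%rax,%rcx,4), %rdx computes rdx = rax + rcx*4 + 8 and leaves the condition flags intact, which is useful when the compiler needs the result but must preserve flags for a later branch or CMOV. Because LEA writes to a destination separate from its sources, reg = other_reg ± constant also takes a single instruction.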
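
Two small functions that tend to compile to a single LEA at -O2 on x86-64 are sketched below. The names scaled_sum and times5 are assumptions for illustration. Typical output is along the lines of leaq 8(%rdi,%rsi,4), %rax for the first and leal (%rdi,%rdi,4), %eax for the second, though other compilers or versions may choose differently.

```c
/* Hypothetical example: base + index*4 + 8 matches the
   disp(base,index,scale) addressing form exactly. */
long scaled_sum(long base, long index)
{
    return base + index * 4 + 8;
}

/* Hypothetical example: x*5 can be written as x + x*4,
   so no multiply instruction is needed. */
int times5(int x)
{
    return x * 5;
}
```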

  9. Tail call optimization converts tail-recursive functions to loops. A function ending in return f(args) where f is the same function can have the recursive call replaced by a jmp back to the function start, eliminating stack frame accumulation. GCC does this at -O2.
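
A minimal tail-recursive function to try is sketched below. The name sum_to and the accumulator parameter are assumptions for illustration. Compiled with gcc -S -O2, the listing is expected to contain a loop and no call instruction. At -O0 the recursive call remains, so very large n can exhaust the stack.

```c
/* Hypothetical accumulator-style sum of 1..n.
   The recursive call is the last action, so its result is returned as-is. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);   /* tail position: nothing runs after the call */
}
```

Writing the function as return n + sum_to(n - 1) instead puts an addition after the call, so the call is no longer in tail position in the source.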

  10. Compiler Explorer (godbolt.org) is the essential tool for this chapter. Type C code, see assembly instantly, compare compilers (GCC vs. Clang), compare architectures (x86-64 vs. ARM64 vs. RISC-V), and compare optimization levels. The color-coded mapping from source lines to assembly is uniquely educational.

  11. Reading compiler output reveals the effects of undefined behavior. GCC at -O2 aggressively exploits UB to simplify code: it assumes that signed integer overflow and null-pointer dereference never happen. Code that "worked at -O0" may behave differently at -O2 if it relies on UB. The generated assembly is still a valid translation under the C standard, just not the one the programmer expected.
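
A common way to observe this is an overflow check written in terms of the overflow itself. In the sketch below (function names are hypothetical), GCC at -O2 may fold increment_is_larger to a constant 1, because it is allowed to assume x + 1 never overflows. The second function expresses the same intent without relying on UB.

```c
#include <limits.h>

/* Relies on signed overflow wrapping, which C does not guarantee.
   The compiler may assume x + 1 > x always holds and return 1. */
int increment_is_larger(int x)
{
    return x + 1 > x;
}

/* Well-defined alternative: compare against INT_MAX before adding. */
int can_increment(int x)
{
    return x < INT_MAX;
}
```

Comparing the -O0 and -O2 listings for the first function shows whether the addition and comparison survive optimization.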

  12. Loop invariant code motion moves loop-invariant computations before the loop. A variable loaded inside a loop but never modified is kept in a register, not reloaded each iteration. At -O0, every variable is reloaded from the stack each time it's used.
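
The hypothetical function below has an invariant subexpression inside its loop body. At -O2, GCC is expected to compute base + stride * 4 once before the loop and keep it in a register. At -O0 the listing should show base and stride reloaded from the stack and the expression recomputed on every iteration.

```c
#include <stddef.h>

/* Hypothetical example: base + stride * 4 does not depend on i,
   so it is a candidate for hoisting out of the loop. */
void fill_offsets(int *dst, size_t n, int stride, int base)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = base + stride * 4 + (int)i;
}
```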

  13. -fverbose-asm adds comments that tie assembly operands back to C variables. The comment format # variable_name, temp_name helps when mapping between C source and the generated assembly.

  14. Architecture shapes the compiler output. ARM64's CNEG handles conditional negate in one instruction (vs. x86-64's CMOVNS pattern). The base RISC-V ISA has no conditional move and must use branches or multi-instruction sequences. x86-64's multiply-by-immediate form, IMUL reg, reg, imm, has no equivalent in ARM64 or RISC-V, which must first load the constant into a register. Same C, different assembly, by necessity.