Chapter 21 Key Takeaways: Understanding Compiler Output
- AT&T syntax puts the source operand before the destination, the opposite of Intel (NASM) syntax. movq %rbx, %rax means rax = rbx. Size suffixes (b/w/l/q) are appended to the mnemonic. Registers are prefixed with %, immediates with $, and memory operands are written disp(base,index,scale).
- gcc -S program.c produces AT&T-syntax assembly in program.s. Add -masm=intel for Intel syntax, -fverbose-asm for comments linking instructions to C variables, and -O0/-O2/-O3 to control optimization.
- GCC -O0 stores every variable at a fixed RBP-relative stack offset and reloads it before each use. This is predictable and debugger-friendly but often generates 3-5× more instructions than optimized code. Every local variable at -O0 has an addressable stack location.
- GCC -O2 typically eliminates the stack frame entirely for simple leaf functions: values live in registers throughout, and the prologue (push rbp; mov rbp, rsp) and epilogue are omitted when not needed. This is the first thing to look for when reading -O2 output.
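- GCC often converts a short conditional into a branchless CMOVcc sequence when both outcomes are cheap to compute. This avoids misprediction penalties when the condition depends on the data and is hard to predict (absolute value and max are typical cases). movl %edi, %eax; negl %eax; testl %edi, %edi; cmovns %edi, %eax is a typical CMOVNS pattern for absolute value: it computes -x, then keeps x instead when x is non-negative.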
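- Switch statements with enough dense cases generate jump tables. The pattern jmp *.Ltable(,%rdi,8) is an indirect jump through a table of 64-bit addresses; position-independent code typically uses a table of 32-bit offsets added to the table's own address instead. Before the jump, GCC checks that the case value is within range: it subtracts the smallest case value (if nonzero), then does a single unsigned compare against the span of case values, so values below the first case also fall through to default.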
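- Integer division by a compile-time constant is replaced with a multiply-high plus shift, followed by a sign correction for signed division. The "magic number" (for example 1717986919, i.e. 0x66666667, for signed 32-bit division by 10) is computed at compile time. This avoids the slow IDIV (or DIV) instruction entirely. The technique is not x86-specific: ARM64 and RISC-V compilers apply the same idea with their own multiply instructions (x86-64 uses IMUL or MUL, ARM64 uses SMULL, SMULH, or UMULH, and RISC-V uses MUL or MULH/MULHU).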
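- LEA is used for arithmetic that doesn't need to update flags and for reg = other_reg ± constant in one instruction, because it evaluates an address expression (base + index*scale + disp) without accessing memory. leaq 8(%rax,%rcx,4), %rdx computes rdx = rax + rcx*4 + 8 without setting any flags, which is useful when the compiler needs the arithmetic result but must preserve the condition flags. The scaled index also makes small multiplies cheap: x*5 often becomes leaq (%rdi,%rdi,4), %rax.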
- Tail call optimization converts tail-recursive functions to loops. When a function ends in return f(args) and f is the function itself, the recursive call can be replaced by a jmp back to the top of the function body, so stack frames no longer accumulate. GCC does this at -O2 (via -foptimize-sibling-calls), and it can likewise turn a tail call to a different function into a jmp.
- Compiler Explorer (godbolt.org) is the essential tool for this chapter. Type C code and see the assembly instantly; compare compilers (GCC vs. Clang), architectures (x86-64 vs. ARM64 vs. RISC-V), and optimization levels. The color-coded mapping from source lines to assembly is uniquely educational.
- Reading compiler output can reveal code that relies on undefined behavior. GCC at -O2 aggressively exploits UB to simplify code: signed integer overflow is assumed not to happen, and so is null-pointer dereference. Code that "worked at -O0" may behave differently at -O2 if it relies on UB. The optimized assembly is not wrong, because the C standard places no requirements on a program that executes UB.
- Loop-invariant code motion hoists computations that do not change across iterations out of the loop. A value loaded inside a loop but never modified is kept in a register instead of being reloaded each iteration, provided the compiler can prove that no store in the loop might alias it. At -O0, every variable is reloaded from the stack each time it's used.
- -fverbose-asm adds comments that tie assembly operands back to the C source. Each comment names the C variables or compiler temporaries (such as _1 or tmp90) that correspond to the instruction's operands, which helps when mapping between C source and the generated assembly.
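- Architecture shapes the compiler output. ARM64's CNEG performs a conditional negate in one instruction after a compare (vs. x86-64's CMOVNS pattern). The base RISC-V ISA has no conditional move (the Zicond extension adds only conditional-zero instructions), so compilers use branches or multi-instruction branchless sequences. x86-64's three-operand IMUL reg, reg, imm has no equivalent in ARM64 or RISC-V, whose multiply instructions take only register operands, so the constant must first be loaded into a register. Same C, different assembly, by necessity.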
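As a study aid for the AT&T memory-operand rule in the takeaways above, the hypothetical helper effective_address below computes the address that disp(base,index,scale) names. It is a sketch that models only the arithmetic. Segment overrides and RIP-relative operands are left out, and the caller is assumed to pass a scale of 1, 2, 4, or 8.

```c
#include <stdint.h>

/* Hypothetical helper: the effective address named by the AT&T operand
 * disp(base,index,scale), i.e. base + index*scale + disp.
 * Intel syntax writes the same operand as [base + index*scale + disp].
 * Assumption: scale is 1, 2, 4 or 8, the only values x86-64 can encode. */
uint64_t effective_address(int64_t disp, uint64_t base, uint64_t index,
                           unsigned scale) {
    /* Unsigned arithmetic wraps modulo 2^64, as the address unit does,
     * so a negative displacement such as -8(%rbp) works as expected. */
    return base + index * scale + (uint64_t)disp;
}
```

For example, with rax = 0x1000 and rcx = 3, effective_address(8, rax, rcx, 4) gives 0x1014, the address read by movq 8(%rax,%rcx,4), %rdx.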
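A small leaf function is a convenient input for the -O0 versus -O2 comparisons described above. The file name sum.c, the output file names, and the function sum_array are illustrative choices. The commands use only the gcc flags listed in the takeaways, plus -o to give each output its own name.

```c
/* sum.c: a leaf function (it calls nothing) for comparing GCC output.
 *
 *   gcc -S -O0 sum.c -o sum_O0.s                 AT&T syntax, unoptimized
 *   gcc -S -O2 sum.c -o sum_O2.s                 AT&T syntax, optimized
 *   gcc -S -O2 -masm=intel sum.c -o sum_intel.s  Intel syntax
 *   gcc -S -O2 -fverbose-asm sum.c -o sum_v.s    comments naming C variables
 *
 * At -O0, expect a, n, total and i to get RBP-relative stack slots that
 * are reloaded on every use. At -O2, expect them to stay in registers
 * with no push rbp / mov rbp, rsp prologue. */
long sum_array(const long *a, long n) {
    long total = 0;
    for (long i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Diffing sum_O0.s against sum_O2.s is a quick way to watch the stack slots disappear.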
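The CMOVNS absolute-value sequence quoted above can be checked by modeling each instruction in C. The function abs_cmovns_model is a hypothetical emulation for study, not code any compiler emits. Each comment names the instruction that its line models.

```c
#include <stdint.h>

/* Step-by-step C model of
 *   movl %edi, %eax; negl %eax; testl %edi, %edi; cmovns %edi, %eax
 * Unsigned arithmetic mirrors the hardware's 32-bit wraparound. */
int32_t abs_cmovns_model(int32_t edi) {
    uint32_t x = (uint32_t)edi;
    uint32_t eax = x;                 /* movl %edi, %eax */
    eax = 0u - eax;                   /* negl %eax */
    int sign_flag = (int)(x >> 31);   /* testl %edi, %edi: SF = top bit of edi */
    if (!sign_flag)                   /* cmovns %edi, %eax: move only when SF == 0 */
        eax = x;
    /* Converting back to int32_t is modulo 2^32 in GCC and Clang. */
    return (int32_t)eax;
}
```

INT32_MIN maps to itself, just as the hardware sequence does, because its negation does not fit in 32 bits.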
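The jump-table takeaway can be illustrated with a switch over dense cases and a hand-written equivalent. The names apply_switch, apply_table, and the op_* handlers are hypothetical. The array of function pointers stands in for the compiler's table of code addresses, and whether GCC actually emits a jump table for apply_switch depends on its case-count and density heuristics.

```c
typedef unsigned (*op_handler)(unsigned, unsigned);

static unsigned op_add(unsigned a, unsigned b) { return a + b; }
static unsigned op_sub(unsigned a, unsigned b) { return a - b; }
static unsigned op_mul(unsigned a, unsigned b) { return a * b; }
static unsigned op_and(unsigned a, unsigned b) { return a & b; }
static unsigned op_xor(unsigned a, unsigned b) { return a ^ b; }

/* Dense cases 1..5: a likely candidate for a jump table at -O2. */
unsigned apply_switch(int op, unsigned a, unsigned b) {
    switch (op) {
    case 1: return a + b;
    case 2: return a - b;
    case 3: return a * b;
    case 4: return a & b;
    case 5: return a ^ b;
    default: return 0;
    }
}

/* Hand-written version of the same dispatch. */
unsigned apply_table(int op, unsigned a, unsigned b) {
    static const op_handler table[] = { op_add, op_sub, op_mul, op_and, op_xor };
    /* Subtract the smallest case value; then one unsigned compare covers
     * both ends, because op < 1 wraps around to a huge unsigned value. */
    unsigned idx = (unsigned)op - 1u;
    if (idx >= sizeof table / sizeof table[0])
        return 0;                     /* default case */
    return table[idx](a, b);          /* indirect call through the table */
}
```

The single unsigned compare in apply_table is the same range check the compiler emits before its indirect jump.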
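The division-by-constant takeaway can be verified directly. The sketch below, div10_magic (a hypothetical name), reproduces the multiply-high, shift, and sign-correction steps for signed 32-bit division by 10 with the magic number 1717986919 (0x66666667). It assumes >> on negative values is an arithmetic shift, which GCC and Clang provide but ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* n / 10 without a divide instruction.
 * 1717986919 is ceil(2^34 / 10). For n >= 0, (n * magic) >> 34 equals
 * n / 10 exactly; for n < 0 it is exactly one below C's truncated
 * quotient, so the last step adds 1 for negative n. */
int32_t div10_magic(int32_t n) {
    int64_t product = (int64_t)n * 1717986919;  /* imul: full 64-bit product */
    int32_t q = (int32_t)(product >> 34);       /* high 32 bits, then shift by 2 more */
    q -= n >> 31;                               /* n >> 31 is -1 when n < 0, else 0 */
    return q;
}
```

Pasting int f(int n) { return n / 10; } into Compiler Explorer at -O2 for x86-64 shows the same constant in an imulq, followed by the shifts and the subtraction modeled here.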
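Two small functions make the LEA takeaway visible in -O2 output. The function names are illustrative. The instructions in the comments are what GCC and Clang typically emit for x86-64 under the System V calling convention (first two arguments in rdi and rsi); other versions or flags may differ.

```c
/* base + i*4 + 8: typically a single leaq 8(%rdi,%rsi,4), %rax */
long scaled_offset(long base, long i) {
    return base + i * 4 + 8;
}

/* x*5 rewritten as x + x*4: typically leaq (%rdi,%rdi,4), %rax, no imul */
long times5(long x) {
    return x * 5;
}
```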
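The tail-call takeaway can be seen with Euclid's algorithm. gcd_rec is written recursively with the call in tail position, and gcd_loop is a hand-written version of the loop that sibling-call optimization effectively produces. Both names are hypothetical, and the exact assembly depends on the compiler version.

```c
/* The recursive call is the last thing gcd_rec does, so at -O2
 * (-foptimize-sibling-calls) GCC can turn it into a jump. */
unsigned gcd_rec(unsigned a, unsigned b) {
    if (b == 0)
        return a;
    return gcd_rec(b, a % b);   /* tail position: nothing left to do after it */
}

/* Loop form: rebind the parameters and go back to the top. */
unsigned gcd_loop(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}
```

At -O0 the recursive version pushes a new frame per call; at -O2 its stack use no longer grows with the number of steps.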
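A classic probe for the undefined-behavior takeaway is a comparison that only holds if signed overflow cannot happen. The functions below are illustrative. At -O2, GCC may compile successor_is_larger to an unconditional return 1, while successor_is_larger_checked asks the same question without overflowing.

```c
#include <limits.h>

/* x + 1 overflows (undefined behavior) when x == INT_MAX. Because GCC
 * may assume overflow never happens, -O2 can fold this to "return 1". */
int successor_is_larger(int x) {
    return x + 1 > x;
}

/* Well-defined version: compares against the limit instead of overflowing. */
int successor_is_larger_checked(int x) {
    return x < INT_MAX;
}
```

Calling successor_is_larger(INT_MAX) is itself undefined behavior, so only the checked version should be trusted at the boundary.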
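The aliasing condition in the loop-invariant takeaway is easiest to see with a pointer that might point into the array being written. scale_reload and scale_hoisted are hypothetical names. The second function shows the hoisted form, which is only equivalent when factor does not point into dst.

```c
/* *factor looks loop-invariant, but dst[i] might be the same object,
 * so the compiler cannot hoist the load out of the loop unconditionally. */
void scale_reload(int *dst, const int *factor, int n) {
    for (int i = 0; i < n; i++)
        dst[i] *= *factor;
}

/* Hoisted form: load once before the loop, keep the value in a register. */
void scale_hoisted(int *dst, const int *factor, int n) {
    const int f = *factor;
    for (int i = 0; i < n; i++)
        dst[i] *= f;
}
```

When factor points at dst[0], the two versions give different results, which is exactly why the compiler needs proof before hoisting. Declaring the parameters restrict (int *restrict dst, const int *restrict factor) is one way to supply that proof.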
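Without a conditional move, one common branchless alternative for absolute value builds a sign mask and uses XOR and subtract, a sequence well suited to ISAs like base RISC-V. The function abs_sign_mask is an illustrative C model of that idea, not a transcript of any compiler's output. It builds the mask with an unsigned shift and a negation to avoid implementation-defined signed shifts, whereas a compiler would more likely use a single arithmetic shift.

```c
#include <stdint.h>

/* Branchless |x| with no conditional move and no branch.
 * mask is all ones for negative x and zero otherwise, so
 * (x ^ mask) - mask is ~x + 1 == -x for negative x, and x otherwise. */
int32_t abs_sign_mask(int32_t x) {
    uint32_t ux = (uint32_t)x;
    uint32_t mask = 0u - (ux >> 31);   /* 0xFFFFFFFF if x < 0, else 0 */
    /* Converting back to int32_t is modulo 2^32 in GCC and Clang. */
    return (int32_t)((ux ^ mask) - mask);
}
```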