Chapter 39 Key Takeaways: Beyond Assembly
-
Compilers transform source code through a pipeline: lexing → parsing → AST → IR (LLVM IR / GCC GIMPLE) → optimization passes → instruction selection → register allocation → instruction scheduling → assembly. Each stage has a specific role; understanding the pipeline explains why the output looks the way it does.
-
Optimization happens on the IR, not on source or assembly: constant folding, inlining, loop vectorization, dead code elimination, and dozens of other passes transform the IR before code generation. This is why `-O2` and `-O0` produce radically different assembly from identical source code.
-
Register allocation is graph coloring: build an interference graph in which variables that are live at the same time are connected and therefore cannot share a register. Color the graph with N colors (the physical registers). Variables that cannot be given a color, typically those with more simultaneously live neighbors than there are registers, are spilled to the stack. This explains why optimized code, which keeps most values in registers, has fewer stack accesses.
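
The coloring step can be sketched in a few lines of C. This is a minimal greedy version under stated assumptions, not the algorithm any particular compiler uses (production allocators rely on more careful heuristics); the names `allocate_registers`, `MAX_VARS`, and `SPILLED` are made up for the illustration.

```c
#include <assert.h>

#define MAX_VARS 16
#define SPILLED  (-1)

/*
 * Greedy coloring of an interference graph (illustrative only).
 * interferes[i][j] != 0 means variables i and j are live at the same time.
 * On return, reg[i] is a register index in [0, num_regs) or SPILLED.
 * Returns the number of spilled variables.
 */
int allocate_registers(int n, int interferes[MAX_VARS][MAX_VARS],
                       int num_regs, int reg[MAX_VARS])
{
    assert(n >= 0 && n <= MAX_VARS);
    assert(num_regs >= 0 && num_regs <= 32);
    int spills = 0;

    for (int v = 0; v < n; v++) {
        /* Bitmask of registers already held by colored neighbors of v. */
        unsigned taken = 0;
        for (int u = 0; u < v; u++) {
            if (interferes[v][u] && reg[u] != SPILLED)
                taken |= 1u << reg[u];
        }

        /* Pick the lowest free register; spill if none is left. */
        reg[v] = SPILLED;
        for (int r = 0; r < num_regs; r++) {
            if (!(taken & (1u << r))) {
                reg[v] = r;
                break;
            }
        }
        if (reg[v] == SPILLED)
            spills++;
    }
    return spills;
}
```

With three mutually interfering variables and only two registers, the third variable has no free color and is spilled; with three registers, nothing spills.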
-
JIT compilation generates machine code at runtime: the pattern is `mmap(PROT_READ|PROT_WRITE)` → write instruction bytes → `mprotect(PROT_READ|PROT_EXEC)` → cast to function pointer → call. Malware scanners flag this same pattern as suspicious, yet it is the legitimate mechanism that V8, HotSpot, LLVM ORC, and other JIT compilers rely on.
-
Machine code bytes are just data: `0xB8 0x2A 0x00 0x00 0x00 0xC3` is `mov eax, 42; ret`. Once these bytes sit in executable memory and are called, the CPU executes them exactly as it would compiler-generated code; it cannot tell the difference.
-
WebAssembly is a portable stack-machine ISA: code pushes and pops values from a stack rather than using named registers. All memory accesses are bounds-checked against a linear memory region. Control flow only branches to declared labels. These properties make WASM safe to run from untrusted sources in browsers.
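
To make the stack-machine and bounds-checking ideas concrete, here is a toy interpreter in C. It is a sketch only: the opcodes (`OP_CONST`, `OP_ADD`, `OP_LOAD`, `OP_STORE`, `OP_END`), the `Vm` struct, and `vm_run` are hypothetical and do not match real WASM encodings or any real runtime.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes for illustration; these are NOT real WASM encodings. */
enum { OP_CONST, OP_ADD, OP_LOAD, OP_STORE, OP_END };

#define STACK_MAX 32
#define MEM_SIZE  64u   /* size of the toy linear memory in bytes */

typedef struct {
    int32_t stack[STACK_MAX];
    int     sp;              /* number of values currently on the stack */
    uint8_t mem[MEM_SIZE];   /* linear memory */
} Vm;

/* Runs `code` until OP_END. Returns 0 on success, -1 on a trap
 * (stack overflow or underflow, out-of-bounds access, bad opcode). */
int vm_run(Vm *vm, const int32_t *code, int len)
{
    assert(vm != NULL && code != NULL);
    for (int pc = 0; pc < len; ) {
        switch (code[pc++]) {
        case OP_CONST:   /* push the immediate that follows the opcode */
            if (pc >= len || vm->sp >= STACK_MAX) return -1;
            vm->stack[vm->sp++] = code[pc++];
            break;
        case OP_ADD: {   /* pop two values, push their wrapping sum */
            if (vm->sp < 2) return -1;
            int32_t b = vm->stack[--vm->sp];
            int32_t a = vm->stack[--vm->sp];
            vm->stack[vm->sp++] = (int32_t)((uint32_t)a + (uint32_t)b);
            break;
        }
        case OP_LOAD: {  /* pop address, push the 32-bit value stored there */
            if (vm->sp < 1) return -1;
            uint32_t addr = (uint32_t)vm->stack[--vm->sp];
            if (addr > MEM_SIZE - 4u) return -1;   /* bounds check: trap */
            int32_t v;
            memcpy(&v, &vm->mem[addr], 4);
            vm->stack[vm->sp++] = v;
            break;
        }
        case OP_STORE: { /* pop value, pop address, store value */
            if (vm->sp < 2) return -1;
            int32_t v = vm->stack[--vm->sp];
            uint32_t addr = (uint32_t)vm->stack[--vm->sp];
            if (addr > MEM_SIZE - 4u) return -1;   /* bounds check: trap */
            memcpy(&vm->mem[addr], &v, 4);
            break;
        }
        case OP_END:
            return 0;
        default:
            return -1;
        }
    }
    return -1;   /* ran off the end without OP_END */
}
```

A program such as `CONST 8, CONST 40, CONST 2, ADD, STORE, CONST 8, LOAD, END` leaves 42 on the stack, while `CONST 1000, LOAD` traps instead of reading outside the 64-byte memory. No instruction names a register; every operand comes from the stack.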
-
WASM → native via JIT: browsers JIT-compile WASM to x86-64 (or ARM64) for efficient execution. The stack-machine bytecode is lowered to register-machine code. V8 uses two tiers: Liftoff (fast baseline) and TurboFan (optimizing, for hot functions).
-
RISC-V is the open ISA: no license fees, no paperwork, no NDA. Anyone can build a RISC-V CPU. The political and economic significance — especially China's domestic RISC-V push — is accelerating hardware availability. RISC-V syntax and conventions are very similar to ARM64.
-
ARM64 and RISC-V share Linux syscall numbers (64 for write, 93 for exit): both use the kernel's generic syscall table (`include/uapi/asm-generic/unistd.h`), which newer ports adopt instead of defining their own. x86-64 keeps its own historical numbering (1 for write, 60 for exit). The shared table is a deliberate design decision in the Linux kernel.
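
The numbers can be checked from C, because `<sys/syscall.h>` defines `SYS_write` and `SYS_exit` for whichever architecture is being compiled. The sketch below assumes a Linux target; the helper names `expected_write_nr`, `expected_exit_nr`, and `raw_write` are invented for the example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE    /* exposes syscall() under strict -std= modes */
#endif
#include <assert.h>
#include <sys/syscall.h>   /* SYS_write, SYS_exit for the build target */
#include <unistd.h>        /* syscall() */

/* Syscall numbers quoted in the text; -1 where the text gives no value. */
long expected_write_nr(void)
{
#if defined(__linux__) && defined(__x86_64__) && !defined(__ILP32__)
    return 1;
#elif defined(__linux__) && (defined(__aarch64__) || defined(__riscv))
    return 64;
#else
    return -1;
#endif
}

long expected_exit_nr(void)
{
#if defined(__linux__) && defined(__x86_64__) && !defined(__ILP32__)
    return 60;
#elif defined(__linux__) && (defined(__aarch64__) || defined(__riscv))
    return 93;
#else
    return -1;
#endif
}

/* write(2) through the raw syscall interface. The C source is identical on
 * every architecture; only the number behind SYS_write changes. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    assert(buf != NULL || len == 0);
    return syscall(SYS_write, fd, buf, len);
}
```

Comparing `expected_write_nr()` against `SYS_write` on an x86-64 machine and again on an ARM64 or RISC-V machine shows the two numbering schemes side by side.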
-
RISC-V pseudo-instructions improve readability: `li` expands to `addi rd, x0, imm` for small values or `lui + addi` for large ones. `la` expands to `auipc + addi`. `mv` expands to `addi rd, rs, 0`. `ret` expands to `jalr x0, x1, 0`. The hardware only sees the real instructions.
-
GPU execution is SIMT (Single Instruction Multiple Threads): 32 (or more) threads execute the same instruction simultaneously on different data. Divergent branches serialize — threads that take the other path stall. This is SIMD at a much larger scale than CPU SIMD vectors.
-
CUDA PTX is NVIDIA's stable virtual ISA: PTX → SASS (the actual hardware ISA) via the driver's JIT. Writing CUDA at the PTX level is analogous to writing LLVM IR — portable across GPU generations. Understanding PTX builds on the SIMD concepts from Part VI.
-
The compiler generates `lea` for small-constant multiplications because LEA's addressing modes (base + index*scale + displacement) perform `x*2`, `x*4`, `x*8`, and combinations like `x*3` (as `x + x*2`) more efficiently than `imul` for small multipliers.
-
WASI extends WASM beyond browsers: WebAssembly System Interface provides standard OS-like services (file I/O, network) for WASM running natively via `wasmtime` or `wasmer`. "Compile once, run anywhere", with better performance and security guarantees than Java's equivalent promise.
-
Assembly knowledge transfers to every future architecture: the concepts — registers, memory addressing, calling conventions, syscalls, interrupts — are universal. Only the register names, instruction encodings, and hardware details differ. Having mastered x86-64 and ARM64, understanding RISC-V, MIPS, or any future ISA is a matter of days, not months.
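
As a closing illustration, the JIT takeaway above (map writable memory, write instruction bytes, flip the page to executable, call it) can be sketched in C using the same six bytes for `mov eax, 42; ret`. This is a minimal sketch that assumes a POSIX system on x86-64; the function name `jit_answer` is invented, and on other architectures or on systems that forbid the permission change it simply returns -1.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Emits `mov eax, 42; ret` into a fresh page and calls it.
 * Returns the value produced by the generated code, or -1 if the host is
 * not x86-64 or the OS refuses the mapping or the permission change.
 */
int jit_answer(void)
{
#if defined(__x86_64__)
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t size = (size_t)page;
    assert(sizeof code <= size);

    /* 1. Map memory that is writable but not executable. */
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* 2. Write the instruction bytes. */
    memcpy(mem, code, sizeof code);

    /* 3. Flip the page to read + execute; it is never writable and
     *    executable at the same time. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return -1;
    }

    /* 4. Cast to a function pointer and call. POSIX systems support this
     *    conversion, although ISO C does not guarantee it. */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();

    munmap(mem, size);
    return result;
#else
    return -1;
#endif
}
```

On an x86-64 Linux machine that permits the `mprotect` call, `jit_answer()` returns the immediate encoded in the byte array. Changing the second byte changes the returned value, which makes the "bytes are just data" point tangible.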