Chapter 39 Key Takeaways: Beyond Assembly

Open Assembly Language Project

Compilers transform source code through a pipeline: lexing → parsing → AST → IR (LLVM IR / GCC GIMPLE) → optimization passes → instruction selection → register allocation → instruction scheduling → assembly. Each stage has a specific role; understanding the pipeline explains why the output looks the way it does.
Optimization happens on the IR, not on source or assembly: constant folding, inlining, loop vectorization, dead code elimination, and dozens of other passes transform the IR before code generation. This is why -O2 and -O0 produce radically different assembly from identical source code.
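As a minimal sketch of one such pass, the snippet below performs constant folding. It assumes Python's own `ast` module as a stand-in for a compiler IR (real compilers fold on LLVM IR or GIMPLE, not on a Python AST), and the names `ConstantFolder` and `fold_constants` are hypothetical.

```python
import ast
import operator

# Binary operators this sketch knows how to fold.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace `constant op constant` subtrees with a single constant node."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the children first (bottom-up)
        fold = _FOLDABLE.get(type(node.op))
        if (fold is not None
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, int)
                and isinstance(node.right.value, int)):
            value = fold(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def fold_constants(source: str) -> str:
    """Parse one expression, fold its integer constants, return new source."""
    tree = ConstantFolder().visit(ast.parse(source, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree))


if __name__ == "__main__":
    print(fold_constants("x * (2 + 3 * 4)"))
```

The pass never looks at the original text, only at the tree, which is the sense in which optimization operates on an intermediate form rather than on source or assembly.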
Register allocation is graph coloring: build an interference graph in which each variable is a node and an edge joins any two variables that are live at the same time, since those two cannot share a register. Color the graph with N colors, one per physical register. A variable that cannot be colored because its neighbors already use all N colors is spilled to the stack. This explains why optimized code has fewer stack accesses than -O0 output.
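The idea can be sketched with a greedy coloring. This is a simplified illustration, not the allocator any particular compiler uses: live ranges are assumed to be plain (start, end) instruction indices, and the function names and register names are hypothetical.

```python
def build_interference(live_ranges):
    """live_ranges maps a variable name to an inclusive (start, end) range.

    Two variables interfere when their ranges overlap.
    """
    graph = {name: set() for name in live_ranges}
    names = list(live_ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
            if s1 <= e2 and s2 <= e1:  # the ranges overlap
                graph[a].add(b)
                graph[b].add(a)
    return graph


def allocate(graph, registers):
    """Greedy coloring: returns (assignment, spilled)."""
    assignment, spilled = {}, []
    # Visit the most constrained variables first (highest degree).
    for var in sorted(graph, key=lambda v: (-len(graph[v]), v)):
        taken = {assignment[n] for n in graph[var] if n in assignment}
        free = [r for r in registers if r not in taken]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)  # no color left: this one lives on the stack
    return assignment, spilled


if __name__ == "__main__":
    ranges = {"a": (0, 3), "b": (1, 2), "c": (2, 5), "d": (4, 6)}
    graph = build_interference(ranges)
    print(allocate(graph, ["r0", "r1"]))
    print(allocate(graph, ["r0", "r1", "r2"]))
```

In this example a, b, and c are all live at instruction 2, so two registers force one spill while three registers need none. Note that d reuses a register because its range does not overlap a's.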
JIT compilation generates machine code at runtime: the pattern is mmap with PROT_READ | PROT_WRITE → write instruction bytes → mprotect to PROT_READ | PROT_EXEC → cast to function pointer → call. Malware scanners flag this same pattern as suspicious, yet it is the legitimate mechanism that V8, HotSpot, LLVM ORC, and other JIT compilers rely on.
Machine code bytes are just data: 0xB8 0x2A 0x00 0x00 0x00 0xC3 is mov eax, 42; ret. Once these bytes are written to executable memory and called, the CPU runs them exactly as it would run compiler-generated code; the two are indistinguishable.
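To make the "bytes are data" point concrete, here is a small sketch that builds that byte sequence for any 32-bit immediate. The function name is hypothetical, and the sketch only constructs the bytes; it does not map executable memory or call the result.

```python
import struct


def encode_mov_eax_imm32_ret(value: int) -> bytes:
    """Encode x86-64 `mov eax, imm32; ret` as raw bytes.

    0xB8 is the opcode for MOV EAX, imm32; the immediate follows in
    little-endian order; 0xC3 is RET.
    """
    return b"\xB8" + struct.pack("<i", value) + b"\xC3"


if __name__ == "__main__":
    code = encode_mov_eax_imm32_ret(42)
    print(" ".join(f"0x{b:02X}" for b in code))
```

A JIT's code emitter is, at its core, a larger collection of functions like this one that append encoded instructions to a buffer.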
WebAssembly is a portable stack-machine ISA: code pushes and pops values from a stack rather than using named registers. All memory accesses are bounds-checked against a linear memory region. Control flow only branches to declared labels. These properties make WASM safe to run from untrusted sources in browsers.
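A toy interpreter illustrates the stack discipline and the bounds check. This is a sketch under heavy simplifying assumptions: only four instructions are modeled (named after their real WASM counterparts), there is no validation or control flow, and `run` and `Trap` are hypothetical names.

```python
class Trap(Exception):
    """Raised where the sketch would trap, e.g. an out-of-bounds access."""


def run(program, memory_size=16):
    """Execute a list of (opcode, *operands) tuples; return the final stack."""
    stack = []
    memory = bytearray(memory_size)  # linear memory, zero-initialized

    def check(addr):
        # Every access is checked against the linear memory bounds.
        if not 0 <= addr < len(memory):
            raise Trap(f"out-of-bounds memory access at {addr}")
        return addr

    for op, *args in program:
        if op == "i32.const":
            stack.append(args[0] & 0xFFFFFFFF)
        elif op == "i32.add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)  # wrap to 32 bits
        elif op == "i32.store8":
            value, addr = stack.pop(), stack.pop()
            memory[check(addr)] = value & 0xFF
        elif op == "i32.load8_u":
            stack.append(memory[check(stack.pop())])
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack


if __name__ == "__main__":
    program = [
        ("i32.const", 0),   # address for the store
        ("i32.const", 40),
        ("i32.const", 2),
        ("i32.add",),       # 40 + 2
        ("i32.store8",),    # memory[0] = 42
        ("i32.const", 0),
        ("i32.load8_u",),   # push memory[0]
    ]
    print(run(program))
```

No instruction names a register: operands are whatever is on top of the stack, and an address outside the linear memory traps instead of touching host memory.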
WASM → native via JIT: browsers JIT-compile WASM to x86-64 (or ARM64) for efficient execution. The WASM stack-machine IR is compiled to a register-machine output. V8 uses two tiers: Liftoff (fast baseline) and TurboFan (optimizing, for hot functions).
RISC-V is the open ISA: no license fees, no paperwork, no NDA. Anyone can build a RISC-V CPU. Political and economic interest, especially China's domestic RISC-V push, is accelerating hardware availability. Like ARM64, RISC-V is a load/store design with a large general-purpose register file, so its assembly feels familiar to ARM64 programmers.
ARM64 and RISC-V share Linux syscall numbers (64 for write, 93 for exit): both use the kernel's generic syscall table (asm-generic/unistd.h), which newer architecture ports adopt instead of inventing their own numbering. x86-64 has its own historic numbering (1 for write, 60 for exit). This is a deliberate design decision in the Linux kernel.
RISC-V pseudo-instructions improve readability: li expands to addi rd, x0, imm for small values or lui + addi for large ones. la expands to auipc + addi. mv expands to addi rd, rs, 0. ret expands to jalr x0, x1, 0. The hardware only sees the real instructions.
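The li expansion can be sketched as follows. The sketch assumes a 32-bit signed immediate on RV32 (RV64 assemblers use longer sequences for wide constants), and `expand_li` is a hypothetical name, not an assembler API. The one subtlety is that addi sign-extends its 12-bit immediate, so the upper 20 bits must be rounded up when the low 12 bits would read as negative.

```python
def expand_li(rd: str, imm: int) -> list[str]:
    """Expand `li rd, imm` into real RV32 instructions."""
    if not -(1 << 31) <= imm < (1 << 31):
        raise ValueError("this sketch only handles 32-bit signed immediates")
    if -2048 <= imm <= 2047:
        return [f"addi {rd}, x0, {imm}"]       # fits addi's 12-bit immediate
    lo = ((imm & 0xFFF) ^ 0x800) - 0x800        # low 12 bits, sign-extended
    hi = ((imm - lo) >> 12) & 0xFFFFF           # upper 20 bits, compensated
    out = [f"lui {rd}, {hi:#x}"]
    if lo != 0:
        out.append(f"addi {rd}, {rd}, {lo}")
    return out


if __name__ == "__main__":
    for value in (42, 0x12345678, 0x12345FFF, 4096):
        print(f"li a0, {value:#x} ->", "; ".join(expand_li("a0", value)))
```

For 0x12345FFF the sketch emits lui with 0x12346 followed by addi with -1: the upper part overshoots by 0x1000 and the sign-extended low part brings it back.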
GPU execution is SIMT (Single Instruction, Multiple Threads): a group of 32 threads (or more, depending on the hardware) executes the same instruction simultaneously on different data. Divergent branches serialize: the group runs one side of the branch while the threads that took the other side sit idle, then runs the other side. This is SIMD at a much larger scale than CPU SIMD vectors.
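The cost of divergence can be modeled with an active mask. This is a conceptual simulation in plain Python, not GPU code; `run_warp` is a hypothetical name and the "passes" it counts stand for the serialized trips through a branch.

```python
def run_warp(values, predicate, then_fn, else_fn):
    """Simulate one if/else executed by a group of lockstep threads.

    Returns (results, passes), where passes counts how many branch sides
    the group had to execute.
    """
    mask = [bool(predicate(v)) for v in values]  # per-thread branch outcome
    results = list(values)
    passes = 0
    if any(mask):
        # "then" side: threads with mask False are idle during this pass.
        passes += 1
        results = [then_fn(v) if m else r
                   for v, m, r in zip(values, mask, results)]
    if not all(mask):
        # "else" side: threads with mask True are idle during this pass.
        passes += 1
        results = [else_fn(v) if not m else r
                   for v, m, r in zip(values, mask, results)]
    return results, passes


if __name__ == "__main__":
    warp = list(range(32))
    _, divergent = run_warp(warp, lambda v: v % 2 == 0,
                            lambda v: v * 2, lambda v: v + 1)
    _, uniform = run_warp(warp, lambda v: v >= 0,
                          lambda v: v * 2, lambda v: v + 1)
    print("divergent passes:", divergent, "uniform passes:", uniform)
```

When every thread agrees on the branch, the group pays for one side only; when they disagree, it pays for both.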
CUDA PTX is NVIDIA's stable virtual ISA: PTX → SASS (the actual hardware ISA) via the driver's JIT. Writing CUDA at the PTX level is analogous to writing LLVM IR — portable across GPU generations. Understanding PTX builds on the SIMD concepts from Part VI.
The compiler generates lea for small-constant multiplications because LEA's addressing modes (base + index*scale + displacement) perform x*2, x*4, x*8, and combinations like x*3 (as x + x*2) more efficiently than imul for small multipliers.
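The multipliers a single lea can handle follow directly from the addressing mode. The sketch below models that arithmetic and picks an operand form; the function names are hypothetical, the register choice (rdi in, rax out) is an assumption, and real compilers also weigh encoding size and latency, which this ignores.

```python
SCALES = (1, 2, 4, 8)  # the only scale factors x86 addressing modes allow


def lea(base, index, scale, disp=0):
    """Model the address computation: base + index*scale + displacement."""
    if scale not in SCALES:
        raise ValueError("scale must be 1, 2, 4, or 8")
    return base + index * scale + disp


def lea_for_multiply(k):
    """Return Intel-syntax text for rax = rdi * k using one lea, or None."""
    if k in (2, 4, 8):
        return f"lea rax, [rdi*{k}]"            # index*scale only
    if k in (3, 5, 9):
        return f"lea rax, [rdi + rdi*{k - 1}]"  # base + index*scale
    return None  # needs imul or a multi-instruction sequence


if __name__ == "__main__":
    for k in range(2, 10):
        print(k, lea_for_multiply(k))
```

Multipliers such as 6 or 7 fall outside a single lea, which is where compilers combine instructions or fall back to imul.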
WASI extends WASM beyond browsers: the WebAssembly System Interface provides standard OS-like services (file I/O, network) for WASM running outside the browser in runtimes such as wasmtime or wasmer. The goal is "compile once, run anywhere", with the aim of better performance and security guarantees than Java's equivalent promise.
Assembly knowledge transfers to every future architecture: the concepts — registers, memory addressing, calling conventions, syscalls, interrupts — are universal. Only the register names, instruction encodings, and hardware details differ. Having mastered x86-64 and ARM64, understanding RISC-V, MIPS, or any future ISA is a matter of days, not months.