Chapter 19 Key Takeaways: x86-64 vs. ARM64 Comparison

  1. The CISC/RISC distinction is real but not absolute. Modern x86-64 CPUs decode CISC instructions into RISC-like micro-operations internally. The visible ISA is CISC; the internal execution engine is RISC-like. ARM64 exposes the RISC philosophy directly.

  2. Fixed-width ARM64 encoding (4 bytes per instruction) simplifies the decoder substantially. Intel's x86-64 front end (decode + µop cache) consumes roughly 30-35% of core area; an ARM64 decoder uses roughly 8-10%. This freed transistor budget is what enabled Apple to build Firestorm cores with 192KB L1 instruction caches and reorder buffers of roughly 600 entries.

  3. ARM64 has more general-purpose registers than x86-64 (31, X0-X30, versus 16; register number 31 encodes the XZR zero register) and more calling-convention argument registers (8, X0-X7, versus 6). This reduces stack spilling in complex functions and eliminates the "7th argument goes to the stack" overhead for multi-parameter functions.

  4. Code density favors x86-64 by roughly 10-20% for typical programs, primarily because of memory-operand instructions: where x86-64 folds a load into XOR reg, [mem], ARM64 needs a separate LDR followed by EOR. In practice the gap rarely produces measurable slowdowns, because instruction caches are sized to absorb it.

  5. ARM64 has more callee-saved registers (X19-X28, 10 callee-saved integer registers) than x86-64 (RBX, RBP, R12-R15, 6 under System V). Functions that need to preserve many values across calls benefit from ARM64's larger callee-save set.

  6. The Apple M1 empirically proved the ARM64 performance thesis. By investing die area saved from a simpler decoder into larger caches and wider execution, Apple produced chips that beat Intel at higher performance-per-watt despite lower clock frequencies.

  7. Rosetta 2 translates x86-64 binaries to ARM64 mostly ahead-of-time, with a JIT path for code that generates code at runtime. The main technical challenge is x86-64's TSO (Total Store Order) memory model, which is stronger than ARM64's. Rather than inserting DMB barriers around every memory access, Rosetta 2 enables a per-thread hardware TSO mode that Apple builds into its cores; translated multithreaded code still runs with roughly 5-15% overhead.

  8. ARM64 cloud adoption is substantial and growing. AWS Graviton, Microsoft Cobalt, and Google Axion all demonstrate that ARM64 delivers 25-40% better price/performance than equivalent x86-64 cloud instances for common server workloads.

  9. RISC-V is the open-ISA alternative. No licenses, no royalties, and a similar RISC philosophy to ARM64 (with some differences: no condition flags, no barrel shifter). Currently dominant in embedded/IoT, growing in cloud research, not yet competitive with ARM64's software ecosystem.

  10. ARM64's weaker memory model is an advantage, not a flaw. Weak ordering gives the hardware more freedom to reorder memory operations, which improves out-of-order throughput. When ordering is required, explicit primitives (DMB/DSB barriers and acquire/release accesses such as LDAR/STLR, plus the LDAXR/STLXR exclusives for atomics) let code pay only for the ordering it actually needs, instead of x86-64's blanket TSO guarantee on every access.

  11. Heterogeneous computing is the present, not the future. Apple's M-series uses big.LITTLE-style performance + efficiency cores, and modern Intel chips do the same under the name "hybrid architecture" (P-cores + E-cores). The core, and therefore the microarchitecture, your thread runs on may change even within a single process depending on scheduler decisions; the ISA stays the same, but the performance characteristics don't.

  12. A security researcher must know both ISAs. Stack layouts differ (ARM64 keeps the return address in LR and spills it only in non-leaf functions; x86-64 always pushes it on the stack), ROP gadgets differ, system call numbers differ, and exploitation techniques differ. A buffer overflow on ARM64 looks different from one on x86-64, even for the same C code.

  13. SIMD is comparable in capability between the two architectures at 128-bit width (SSE2 ≈ NEON). x86-64 extends to 512-bit with AVX-512; ARM64 extends to variable width with SVE/SVE2. For most applications, 128-bit NEON provides the same practical benefit as SSE2.

  14. The era of x86 monoculture is over. "Just learn x86-64" was valid advice until 2015. In 2026, ARM64 is required knowledge for mobile development, cloud deployment on Graviton, Apple Silicon development, and security research on modern devices.