Chapter 16 Key Takeaways: ARM64 Architecture

  1. RISC is not simpler; it is differently complex. x86-64 needs fewer instructions per task, while ARM64 uses more instructions, each of them simpler and more predictable. The tradeoff is code verbosity vs. hardware simplicity.

  2. ARM64 has 31 general-purpose registers (X0-X30) plus a zero register (XZR/WZR) and a separate stack pointer (SP). This is nearly twice as many as x86-64's 16, which reduces stack pressure significantly.

  3. W registers are 32-bit views of X registers (the low 32 bits). Writing to a W register zero-extends into the corresponding X register. This is cleaner than x86-64's aliasing, where 8/16-bit writes leave the upper bits unchanged.

  4. XZR always reads as zero and silently discards writes. It is used to implement pseudoinstructions: CMP Xn, Xm is an alias for SUBS XZR, Xn, Xm; MOV Xd, Xn is an alias for ORR Xd, XZR, Xn.

  5. Arithmetic and logical instructions such as ADD, SUB, and AND update the ARM64 condition flags (N, Z, C, V) only in their S-suffixed forms (ADDS, SUBS, ANDS), which include the aliases CMN, CMP, and TST. Plain ADD, SUB, and AND do not touch the flags. This allows flag-preserving arithmetic chains.

  6. Every ARM64 instruction is exactly 4 bytes (32 bits) wide. Fixed-width encoding simplifies the decoder, enables predictable alignment, and supports efficient superscalar dispatch — at the cost of slightly larger binary size vs. x86-64.

  7. ARM64 is a load/store architecture: ALU instructions cannot access memory. All data must be loaded into registers before arithmetic can be performed, then stored back. There is no equivalent of x86-64's add rax, [rbx].

  8. The link register X30 (LR) holds the return address after a BL instruction. Non-leaf functions must save X30 to the stack before calling another function, or the outer return address is overwritten.

  9. X29 (FP) is the frame pointer by convention (AAPCS64). The canonical prologue is STP X29, X30, [SP, #-16]! / MOV X29, SP, saving both FP and LR in one store pair instruction.

  10. AAPCS64 calling convention: X0-X7 = arguments 1-8, X0 = return value, X19-X28 = callee-saved. The stack must be 16-byte aligned before making any function call.

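  11. ARM64 dropped ARM32's conditional execution of nearly every instruction in favor of a small set of conditional data-processing instructions such as CSEL, CSET, and CSINC. CSEL Xd, Xn, Xm, cond is a full ternary conditional select (Xd = cond ? Xn : Xm) with a destination separate from both sources. x86-64's CMOV is narrower: it can only conditionally overwrite its destination with a single source.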

  12. Linux ARM64 system calls use X8 for the syscall number and SVC #0 to invoke the kernel. ARM64 syscall numbers differ from x86-64's (write=64 not 1; exit=93 not 60).

  13. Practical ARM64 platforms: Raspberry Pi 4/5 (native), QEMU user-mode (qemu-aarch64) on any host, Apple Silicon (M1/M2/M3/M4) on macOS. Cross-compilation uses aarch64-linux-gnu-as and aarch64-linux-gnu-ld.

  14. Modern x86-64 CPUs internally decode CISC instructions into RISC-like micro-operations. The CISC encoding is effectively a backward-compatibility layer at the instruction-set level; the actual execution inside an Intel or AMD core is RISC-style.
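
The register and flag rules in takeaways 3, 4, 5, and 11 can be checked with a small model. The following is a minimal Python sketch, not an emulator: the class name Regs and all of its method names are hypothetical, only four condition codes are modeled, and register index 31 is always treated as XZR (in the real encoding, index 31 means SP for some instructions).

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
XZR = 31  # simplification: index 31 always means the zero register here


class Regs:
    """Toy model of X0-X30, XZR, and the NZCV flags (illustrative only)."""

    def __init__(self):
        self.x = [0] * 31
        self.n = self.z = self.c = self.v = False

    def read_x(self, i):
        # XZR always reads as zero.
        return 0 if i == XZR else self.x[i]

    def write_x(self, i, value):
        # Writes to XZR are silently discarded.
        if i != XZR:
            self.x[i] = value & MASK64

    def write_w(self, i, value):
        # Writing Wi keeps the low 32 bits and zeroes bits 63:32 of Xi.
        self.write_x(i, value & MASK32)

    def add(self, d, n, m):
        # ADD Xd, Xn, Xm: no S suffix, so NZCV is left alone.
        self.write_x(d, self.read_x(n) + self.read_x(m))

    def subs(self, d, n, m):
        # SUBS Xd, Xn, Xm: subtract and update NZCV.
        a, b = self.read_x(n), self.read_x(m)
        result = (a - b) & MASK64
        self.n = bool(result >> 63)
        self.z = result == 0
        self.c = a >= b  # C set means "no borrow" for subtraction
        self.v = bool(((a ^ b) & (a ^ result)) >> 63)  # signed overflow
        self.write_x(d, result)

    def cmp(self, n, m):
        # CMP Xn, Xm is SUBS XZR, Xn, Xm: flags only, result discarded.
        self.subs(XZR, n, m)

    def mov(self, d, n):
        # MOV Xd, Xn is ORR Xd, XZR, Xn.
        self.write_x(d, self.read_x(XZR) | self.read_x(n))

    def holds(self, cond):
        # Only four of the condition codes are modeled.
        return {"eq": self.z, "ne": not self.z,
                "lt": self.n != self.v, "ge": self.n == self.v}[cond]

    def csel(self, d, n, m, cond):
        # CSEL Xd, Xn, Xm, cond: Xd = Xn if cond holds, else Xm.
        self.write_x(d, self.read_x(n) if self.holds(cond) else self.read_x(m))


r = Regs()
r.write_x(0, MASK64)      # X0 = all ones
r.write_w(0, 0x1234)      # MOV W0, #0x1234 clears the upper half of X0
r.write_x(1, 5)
r.write_x(2, 7)
r.cmp(1, 2)               # 5 - 7: the "lt" condition holds afterwards
r.add(3, 1, 2)            # X3 = 12, flags still reflect the CMP
r.csel(4, 1, 2, "lt")     # X4 = X1 because the condition holds
print(hex(r.read_x(0)), r.read_x(3), r.read_x(4))  # prints: 0x1234 12 5
```

The ADD between the CMP and the CSEL is the point of the example: because ADD has no S suffix, the flags set by the comparison survive it, and the later CSEL still selects on the comparison result.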