Chapter 16 Key Takeaways: ARM64 Architecture
-
RISC is not simpler; it is differently complex. x86-64 needs fewer instructions per task, while ARM64 needs more instructions, each of them simpler and more predictable. The tradeoff is programming verbosity vs. hardware simplicity.
-
ARM64 has 31 general-purpose registers (X0-X30) plus a zero register (XZR/WZR) and a separate stack pointer (SP). This is nearly twice as many as x86-64's 16, so far fewer values have to be spilled to the stack.
-
W registers are 32-bit views of X registers (the low 32 bits). Writing to a W register zero-extends into the corresponding X register. This is cleaner than x86-64's aliasing, where 8/16-bit writes leave the upper bits unchanged.
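The difference between the two aliasing rules can be sketched with a small Python model. This is only an illustration of the semantics described above, not real register access; the helper names `write_w` and `write_x86_16` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def write_w(x_old: int, value: int) -> int:
    """Model a write to Wn: the low 32 bits are replaced and the
    upper 32 bits of Xn become zero, whatever x_old contained."""
    return value & MASK32

def write_x86_16(r_old: int, value: int) -> int:
    """Model an x86-64 16-bit write (e.g. to AX): bits 63:16 of the
    full register are preserved, only bits 15:0 change."""
    return (r_old & ~0xFFFF & MASK64) | (value & 0xFFFF)

old = 0xDEADBEEFCAFEF00D
print(hex(write_w(old, 0x1234)))       # 0x1234
print(hex(write_x86_16(old, 0x1234)))  # 0xdeadbeefcafe1234
```

In the ARM64 case the result does not depend on the old register contents, which is why a W write creates no dependency on the previous value of the X register.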
-
XZR always reads as zero and silently discards writes. It is used to implement pseudoinstructions: `CMP` → `SUBS XZR, Xn, Xm`; `MOV Xd, Xn` → `ORR Xd, XZR, Xn`.
-
ARM64 condition flags (N, Z, C, V) are only updated when instructions use the S suffix (`ADDS`, `SUBS`, `ANDS`). Regular `ADD`, `SUB`, and `AND` do not touch the flags. This allows flag-preserving arithmetic chains.
-
Every ARM64 instruction is exactly 4 bytes (32 bits) wide. Fixed-width encoding simplifies the decoder, enables predictable alignment, and supports efficient superscalar dispatch — at the cost of slightly larger binary size vs. x86-64.
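As a rough illustration of what fixed-width encoding buys the decoder, the Python sketch below builds one instruction word and then splits a byte stream into instructions with no length decoding at all. It assumes the A64 shifted-register form of `ADD` with no shift applied (base opcode `0x8B000000`) and little-endian instruction storage; the helper names are hypothetical.

```python
import struct

def encode_add_reg(rd: int, rn: int, rm: int) -> int:
    """Encode 64-bit ADD Xd, Xn, Xm (shifted-register form, shift amount 0)."""
    for r in (rd, rn, rm):
        if not 0 <= r <= 30:
            raise ValueError("register number must be in 0..30")
    return 0x8B000000 | (rm << 16) | (rn << 5) | rd

def split_instructions(code: bytes) -> list:
    """Split machine code into 32-bit words; every boundary is a multiple of 4."""
    if len(code) % 4 != 0:
        raise ValueError("A64 code length must be a multiple of 4 bytes")
    return [word for (word,) in struct.iter_unpack("<I", code)]

RET = 0xD65F03C0  # encoding of RET (return via X30)
code = struct.pack("<II", encode_add_reg(0, 1, 2), RET)
print([hex(w) for w in split_instructions(code)])  # ['0x8b020020', '0xd65f03c0']
```

An x86-64 decoder cannot do the equivalent of `split_instructions` without first decoding each instruction to learn its length.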
-
ARM64 is a load/store architecture: ALU instructions cannot access memory. All data must be loaded into registers before arithmetic can be performed, then stored back. There is no equivalent of x86-64's `add rax, [rbx]`.
-
The link register X30 (LR) holds the return address after a `BL` instruction. Non-leaf functions must save X30 to the stack before calling another function, or the outer return address is overwritten.
-
X29 (FP) is the frame pointer by convention (AAPCS64). The canonical prologue is `STP X29, X30, [SP, #-16]!` followed by `MOV X29, SP`; the `STP` saves both FP and LR in a single store-pair instruction.
-
AAPCS64 calling convention: X0-X7 = arguments 1-8, X0 = return value, X19-X28 = callee-saved. The stack must be 16-byte aligned before making any function call.
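The two rules above can be modeled in a few lines of Python. This is a simplified sketch that covers only integer and pointer arguments; it assumes that arguments beyond the eighth are passed on the stack, which AAPCS64 specifies but the summary above does not state. The function names are hypothetical.

```python
ARG_REGS = [f"X{i}" for i in range(8)]  # X0..X7 carry arguments 1..8

def assign_int_args(n_args: int) -> list:
    """Return where each integer/pointer argument is passed:
    the first eight in X0-X7, any further ones on the stack."""
    return [ARG_REGS[i] if i < len(ARG_REGS) else "stack" for i in range(n_args)]

def aligned_frame_size(local_bytes: int) -> int:
    """Round a frame size up to a multiple of 16 so SP stays 16-byte aligned."""
    return (local_bytes + 15) & ~15

print(assign_int_args(3))      # ['X0', 'X1', 'X2']
print(assign_int_args(10)[7:]) # ['X7', 'stack', 'stack']
print(aligned_frame_size(24))  # 32
```

A function that needs 24 bytes of locals therefore subtracts 32 from SP, not 24, so the stack is still aligned when it makes its next call.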
-
ARM64 replaced ARM32's per-instruction condition codes with CSEL/CSET/CSINC. `CSEL Xd, Xn, Xm, cond` is a full ternary conditional select (Xd = cond ? Xn : Xm), more flexible than x86-64's two-operand `CMOV`, which can only conditionally overwrite its destination.
-
Linux ARM64 system calls use X8 for the syscall number and `SVC #0` to invoke the kernel. ARM64 syscall numbers differ from x86-64's (write=64 not 1; exit=93 not 60).
-
Practical ARM64 platforms: Raspberry Pi 4/5 (native), QEMU user-mode (`qemu-aarch64`) on any host, Apple Silicon (M1/M2/M3/M4) on macOS. Cross-compilation uses `aarch64-linux-gnu-as` and `aarch64-linux-gnu-ld`.
-
Modern x86-64 CPUs internally decode CISC instructions into RISC-like micro-operations. The CISC encoding is effectively a compatibility layer at the instruction-set level: the execution core inside an Intel/AMD CPU operates on RISC-style micro-ops.