Chapter 17 Key Takeaways: ARM64 Instruction Set
- ARM64 arithmetic and logical instructions (ADD, SUB, MUL, AND, ORR, EOR) take three explicit register operands, with the destination always first: `ADD Xd, Xn, Xm` means `Xd = Xn + Xm`. Compare this with x86-64's two-operand format, where the destination is also a source.
- The barrel shifter is built into the shifted-register forms of ARM64 data-processing instructions such as ADD, SUB, AND, ORR, and EOR. `ADD X0, X1, X2, LSL #3` computes `X0 = X1 + (X2 << 3)` in one instruction. This makes array index arithmetic (`base + index * element_size`) extremely efficient when the element size is a power of two.
- ARM64 has no division remainder register. SDIV and UDIV produce only the quotient. To get the remainder, use `MSUB X_rem, X_quot, X_divisor, X_dividend` (equivalent to `remainder = dividend - quotient * divisor`).
- SMULH/UMULH produce the upper 64 bits of a 128-bit multiply. Use them together with MUL, which gives the lower 64 bits, to get the full 128-bit product of two 64-bit values, or to implement fast division by constants via multiply-high + shift.
- ARM64 addressing modes are rich: base only, base+immediate offset, base+register offset (with optional shift), pre-indexed (`[Xn, #imm]!`), and post-indexed (`[Xn], #imm`). Pre-indexed updates the base before the access; post-indexed updates it after. The `!` suffix means write-back.
- LDP and STP (load/store pair) transfer two registers in one instruction. The canonical ARM64 function prologue `STP X29, X30, [SP, #-16]!` saves both the frame pointer and the link register in one instruction while decrementing SP.
- Sized loads (LDRB, LDRH, LDRSB, LDRSH, LDRSW) control zero vs. sign extension. LDRB and LDRH zero-extend; the `S` in LDRSB/LDRSH/LDRSW means sign-extend. Use sign-extending loads when loading a `signed char`, `short`, or `int` for use in 64-bit arithmetic (plain `char` is unsigned on ARM64 Linux, so it takes LDRB).
- CBZ and CBNZ branch if a register is zero or non-zero without touching the flags. These replace the x86-64 `TEST reg, reg` + `JZ`/`JNZ` pattern with a single instruction. TBZ/TBNZ do the same for individual bits.
- Conditional branches (B.EQ, B.NE, B.LT, etc.) have only a ±1MB range. Unconditional `B` has a ±128MB range. For farther jumps, load the target address into a register and use `BR Xn` (indirect branch).
- AAPCS64 calling convention: X0-X7 = first 8 integer or pointer arguments, X0 = return value, X19-X28 = callee-saved, X0-X17 = caller-saved. X18 is the platform register and is reserved on some operating systems, so portable code should avoid it. The stack pointer must be 16-byte aligned before any `BL` instruction.
- The canonical function prologue is `STP X29, X30, [SP, #-16]!` / `MOV X29, SP` and the epilogue is `LDP X29, X30, [SP], #16` / `RET`. Leaf functions (no calls) can skip saving LR and omit the full prologue.
- Linux ARM64 system calls use X8 for the syscall number, X0-X5 for arguments, and `SVC #0` to invoke the kernel. ARM64 syscall numbers differ from x86-64: write=64, exit=93, openat=56.
- The base ARM64 instruction set has no x86-style string instructions (no REP SCASB, REP MOVSB, etc.). Implementing strlen, memcpy, and memset requires explicit byte/word/NEON loops. The NEON SIMD approach (16 bytes at once) is typically used in production libc implementations.
- CSEL (conditional select) is ARM64's branchless conditional. `CSEL Xd, Xn, Xm, cond` selects between two registers without branching. It is more flexible than x86-64's CMOV because it is a full ternary (`Xd = cond ? Xn : Xm`) rather than a conditional replacement of the destination.
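
The SDIV + MSUB remainder idiom from the division takeaway can be checked with a small portable C model. This is a sketch of what the two instructions compute, not ARM64 code; the function names (`sdiv_model`, `msub_model`, `remainder_model`) are made up for illustration. The model asserts on the two inputs where C division is undefined, whereas the hardware does not trap on them.

```c
#include <assert.h>
#include <stdint.h>

/* Models SDIV Xq, Xn, Xm: signed quotient, truncated toward zero.
 * Assumption: callers avoid the cases C leaves undefined. */
int64_t sdiv_model(int64_t dividend, int64_t divisor) {
    assert(divisor != 0);
    assert(!(dividend == INT64_MIN && divisor == -1));
    return dividend / divisor;
}

/* Models MSUB Xd, Xn, Xm, Xa: Xd = Xa - Xn * Xm.
 * Unsigned arithmetic is used so the model wraps like the
 * 64-bit register instead of overflowing a signed type. */
int64_t msub_model(int64_t xn, int64_t xm, int64_t xa) {
    return (int64_t)((uint64_t)xa - (uint64_t)xn * (uint64_t)xm);
}

/* Models the pair:
 *   SDIV X_quot, X_dividend, X_divisor
 *   MSUB X_rem,  X_quot, X_divisor, X_dividend */
int64_t remainder_model(int64_t dividend, int64_t divisor) {
    int64_t quot = sdiv_model(dividend, divisor);
    return msub_model(quot, divisor, dividend);
}
```

Because the quotient truncates toward zero, the remainder takes the sign of the dividend, matching C's `%` operator.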
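
The UMULH takeaway can be illustrated the same way. The sketch below models UMULH with 32-bit partial products so that it compiles without a 128-bit integer type, pairs it with a MUL model to form the full product, and shows one multiply-high + shift division. All names here are hypothetical, and the constant `0xCCCCCCCCCCCCCCCD` with a shift of 3 is a commonly used choice for unsigned division by 10, not something taken from the chapter.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t hi;  /* what UMULH would produce */
    uint64_t lo;  /* what MUL would produce   */
} u128_parts;

/* Models UMULH Xd, Xn, Xm: upper 64 bits of the 128-bit product. */
uint64_t umulh_model(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;
    /* Sum of the middle terms plus the carry out of the low word. */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* Models MUL Xd, Xn, Xm: lower 64 bits of the product. */
uint64_t mul_model(uint64_t a, uint64_t b) {
    return a * b;
}

/* MUL + UMULH together give the full 128-bit product. */
u128_parts full_product(uint64_t a, uint64_t b) {
    u128_parts p = { umulh_model(a, b), mul_model(a, b) };
    return p;
}

/* Division by the constant 10 as multiply-high + shift:
 *   UMULH X1, X0, X_magic ; LSR X0, X1, #3 */
uint64_t div10_model(uint64_t x) {
    return umulh_model(x, 0xCCCCCCCCCCCCCCCDull) >> 3;
}

/* Small usage check for the helpers above. */
void umulh_usage_check(void) {
    u128_parts p = full_product(UINT64_MAX, UINT64_MAX);
    assert(p.hi == 0xFFFFFFFFFFFFFFFEull && p.lo == 1);
    assert(div10_model(12345) == 1234);
}
```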
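
To make the pre-indexed versus post-indexed distinction concrete, here is a hedged C model of a single base register. The `base_reg` type and both function names are invented for this sketch; the only point is the order of the address update and the memory access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical machine state: one base register holding a byte address. */
typedef struct {
    unsigned char *xn;
} base_reg;

/* Models LDR Xt, [Xn, #imm]! : update the base first, then load. */
uint64_t ldr_pre_indexed(base_reg *r, ptrdiff_t imm) {
    uint64_t value;
    assert(r != NULL && r->xn != NULL);
    r->xn += imm;                         /* write-back happens before access */
    memcpy(&value, r->xn, sizeof value);
    return value;
}

/* Models LDR Xt, [Xn], #imm : load from the old base, then update it. */
uint64_t ldr_post_indexed(base_reg *r, ptrdiff_t imm) {
    uint64_t value;
    assert(r != NULL && r->xn != NULL);
    memcpy(&value, r->xn, sizeof value);
    r->xn += imm;                         /* write-back happens after access */
    return value;
}
```

The prologue `STP X29, X30, [SP, #-16]!` is the pre-indexed case with a negative offset (a push), and the epilogue `LDP X29, X30, [SP], #16` is the post-indexed case with a positive offset (a pop).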
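
The effect of choosing a zero-extending or sign-extending load can be sketched in C as follows. Each function is a model of the value that ends up in the 64-bit destination register; the names are hypothetical and the `memcpy` calls simply stand in for the memory access.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models LDRB Wt, [Xn]: load 1 byte, zero-extend. */
uint64_t ldrb_model(const void *addr) {
    uint8_t b;
    assert(addr != NULL);
    memcpy(&b, addr, sizeof b);
    return b;
}

/* Models LDRSB Xt, [Xn]: load 1 byte, sign-extend to 64 bits. */
int64_t ldrsb_model(const void *addr) {
    int8_t b;
    assert(addr != NULL);
    memcpy(&b, addr, sizeof b);
    return b;
}

/* Models LDRSH Xt, [Xn]: load 2 bytes, sign-extend to 64 bits. */
int64_t ldrsh_model(const void *addr) {
    int16_t h;
    assert(addr != NULL);
    memcpy(&h, addr, sizeof h);
    return h;
}

/* Models LDR Wt, [Xn]: a 32-bit load zero-extends into Xt. */
uint64_t ldr_w_model(const void *addr) {
    uint32_t w;
    assert(addr != NULL);
    memcpy(&w, addr, sizeof w);
    return w;
}

/* Models LDRSW Xt, [Xn]: load 4 bytes, sign-extend to 64 bits. */
int64_t ldrsw_model(const void *addr) {
    int32_t w;
    assert(addr != NULL);
    memcpy(&w, addr, sizeof w);
    return w;
}
```

Loading the `int` value -2 with the LDRSW model yields -2, while the plain 32-bit load model yields 0xFFFFFFFE, which is the wrong value if it is then used as a signed 64-bit quantity.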
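
The branch ranges quoted above follow from the immediate widths in the encodings: B.cond carries a 19-bit signed word offset and B a 26-bit one, each scaled by 4 bytes. A minimal C sketch of the range check an assembler or linker has to perform might look like this; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a byte offset fits a signed PC-relative immediate of
 * imm_bits bits that counts 4-byte instructions. */
bool fits_branch_imm(int64_t offset, unsigned imm_bits) {
    assert(imm_bits >= 1 && imm_bits <= 32);
    if (offset % 4 != 0) {
        return false;                 /* targets are word-aligned */
    }
    int64_t words = offset / 4;
    int64_t limit = (int64_t)1 << (imm_bits - 1);
    return words >= -limit && words < limit;
}

/* B.cond: 19-bit immediate, roughly +/-1MB. */
bool fits_b_cond(int64_t offset) {
    return fits_branch_imm(offset, 19);
}

/* B: 26-bit immediate, roughly +/-128MB. */
bool fits_b(int64_t offset) {
    return fits_branch_imm(offset, 26);
}
```

When `fits_b` is false for a target, the fallback is the one already described: materialize the address in a register and branch with `BR Xn`.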