Appendix E: ARM64 (AArch64) Instruction Quick Reference

This appendix covers the most commonly used ARM64 instructions from this book. For the complete instruction set, see the ARM Architecture Reference Manual (ARM ARM), document DDI0487.

ARM64 uses a fixed 32-bit instruction width: every instruction is exactly 4 bytes, with no variable-length forms. The fixed width simplifies decoding and enables efficient pipelining.
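Because every instruction is 4 bytes, a code buffer can be split into instructions without decoding anything. The following is a minimal Python sketch of that idea; the helper name `split_instructions` is hypothetical, and the sample bytes are the standard encodings of NOP and RET as they appear in little-endian memory.

```python
import struct

def split_instructions(code: bytes) -> list[int]:
    """Split a little-endian A64 code buffer into 32-bit instruction words."""
    if len(code) % 4 != 0:
        raise ValueError("A64 code length must be a multiple of 4 bytes")
    return [word for (word,) in struct.iter_unpack("<I", code)]

# NOP (0xD503201F) followed by RET (0xD65F03C0), in memory byte order.
code = bytes.fromhex("1f2003d5c0035fd6")
for offset, word in zip(range(0, len(code), 4), split_instructions(code)):
    print(f"{offset:#04x}: {word:#010x}")
```

Instruction boundaries fall at every multiple of 4, so a disassembler never has to guess where the next instruction starts.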


Register File

General-Purpose Registers

64-bit    32-bit    ABI Name  Role (AAPCS64)
X0        W0        -         1st arg / return value; caller-saved
X1        W1        -         2nd arg; caller-saved
X2        W2        -         3rd arg; caller-saved
X3        W3        -         4th arg; caller-saved
X4        W4        -         5th arg; caller-saved
X5        W5        -         6th arg; caller-saved
X6        W6        -         7th arg; caller-saved
X7        W7        -         8th arg; caller-saved
X8        W8        XR        Indirect result location / syscall number; caller-saved
X9-X15    W9-W15    -         Caller-saved temporaries
X16       W16       IP0       Intra-procedure-call scratch (linker trampoline)
X17       W17       IP1       Intra-procedure-call scratch
X18       W18       PR        Platform register (OS-reserved on some systems)
X19-X28   W19-W28   -         Callee-saved
X29       W29       FP        Frame pointer; callee-saved
X30       W30       LR        Link register (return address); callee-saved by convention
SP        WSP       SP        Stack pointer (dedicated, not general-purpose)
XZR       WZR       -         Zero register (reads as 0, writes discarded)
PC        -         -         Program counter (not directly accessible as GPR)

SIMD/FP Registers

ARM64 has 32 SIMD/FP registers (V0-V31), each 128 bits wide. They can be accessed as:
- Qn (128-bit, all 16 bytes)
- Dn (64-bit, lower 8 bytes)
- Sn (32-bit, lower 4 bytes)
- Hn (16-bit, lower 2 bytes)
- Bn (8-bit, lower 1 byte)
- Vn.16B, Vn.8H, Vn.4S, Vn.2D (vector element access)
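The register views can be modeled in Python by treating a V register as a 128-bit integer. This is only an illustrative sketch: `register_views` and `lanes` are hypothetical helper names, and the sample value is chosen so that each byte equals its own index.

```python
def register_views(value: int) -> dict[str, int]:
    """Scalar views of a 128-bit V register: each view is the low N bits."""
    value &= (1 << 128) - 1
    sizes = (("Q", 128), ("D", 64), ("S", 32), ("H", 16), ("B", 8))
    return {name: value & ((1 << bits) - 1) for name, bits in sizes}

def lanes(value: int, lane_bits: int) -> list[int]:
    """Vector view: split the 128 bits into lanes; lane 0 is least significant."""
    mask = (1 << lane_bits) - 1
    return [(value >> (i * lane_bits)) & mask for i in range(128 // lane_bits)]

v0 = 0x0F0E0D0C_0B0A0908_07060504_03020100
views = register_views(v0)
print(hex(views["D"]), hex(views["S"]))      # low 8 bytes, low 4 bytes
print([hex(x) for x in lanes(v0, 32)])       # the Vn.4S view
```

Here `lanes(v0, 32)` corresponds to Vn.4S, `lanes(v0, 8)` to Vn.16B, and so on.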


Data Movement

Instruction                    Description                                    Notes
MOV Xd, Xn                     Copy register                                  Alias for ORR Xd, XZR, Xn
MOV Xd, #imm                   Move immediate                                 Alias for MOVZ, MOVN, or ORR (bitmask immediate), as the value allows
MOVZ Xd, #imm16{, LSL #shift}  Move 16-bit zero-extended immediate            shift = 0, 16, 32, or 48
MOVN Xd, #imm16{, LSL #shift}  Move bitwise NOT of 16-bit immediate
MOVK Xd, #imm16{, LSL #shift}  Move 16-bit immediate, keep other bits         For loading 64-bit constants
LDR Xd, [Xn]                   Load 64-bit from [Xn]
LDR Xd, [Xn, #imm]             Load 64-bit from [Xn + imm]                    imm unsigned, multiple of 8
LDR Xd, [Xn, Xm]               Load 64-bit from [Xn + Xm]
LDR Xd, [Xn, #imm]!            Load 64-bit, pre-index (update Xn before)
LDR Xd, [Xn], #imm             Load 64-bit, post-index (update Xn after)
LDR Xd, label                  PC-relative load (±1 MB range)
LDRB Wd, [Xn]                  Load byte, zero-extend to 32 bits
LDRH Wd, [Xn]                  Load halfword (16-bit), zero-extend
LDRSB Xd, [Xn]                 Load byte, sign-extend to 64 bits
LDRSH Xd, [Xn]                 Load halfword, sign-extend to 64 bits
LDRSW Xd, [Xn]                 Load word (32-bit), sign-extend to 64 bits
STR Xd, [Xn]                   Store 64-bit to [Xn]
STR Xd, [Xn, #imm]             Store 64-bit to [Xn + imm]
STRB Wd, [Xn]                  Store byte
STRH Wd, [Xn]                  Store halfword
LDP Xt1, Xt2, [Xn]             Load pair (two 64-bit registers)               Efficient for saving/restoring register pairs
STP Xt1, Xt2, [Xn]             Store pair                                     Typical prologue: STP X29, X30, [SP, #-16]!
ADR Xd, label                  Load PC-relative address (±1 MB)               Single instruction
ADRP Xd, label                 Load page-aligned PC-relative address (±4 GB)  Pair with ADD (low 12 bits) for the full address
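The MOVZ/MOVK entries combine to load an arbitrary 64-bit constant 16 bits at a time. The Python sketch below generates one such sequence. The function name `load_constant` is hypothetical, and the sketch deliberately ignores the shorter MOVN and bitmask-immediate encodings that a real assembler might choose instead.

```python
def load_constant(reg: str, value: int) -> list[str]:
    """Emit a MOVZ/MOVK sequence that builds a 64-bit constant in reg."""
    value &= 0xFFFFFFFFFFFFFFFF
    # Split into four 16-bit chunks at shifts 0, 16, 32, 48; skip zero chunks.
    chunks = [(shift, (value >> shift) & 0xFFFF) for shift in (0, 16, 32, 48)]
    nonzero = [(shift, chunk) for shift, chunk in chunks if chunk != 0]
    if not nonzero:
        return [f"MOVZ {reg}, #0"]
    lines = []
    for index, (shift, chunk) in enumerate(nonzero):
        # The first instruction zeroes the other bits; later ones keep them.
        mnemonic = "MOVZ" if index == 0 else "MOVK"
        suffix = f", LSL #{shift}" if shift else ""
        lines.append(f"{mnemonic} {reg}, #0x{chunk:X}{suffix}")
    return lines

for line in load_constant("X0", 0x123456789ABCDEF0):
    print(line)
```

A constant with all four chunks nonzero needs four instructions; constants with zero chunks need fewer, which is why small values fit in a single MOVZ.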

Arithmetic

Instruction          Description                                 Flags
ADD Xd, Xn, Xm       Xd = Xn + Xm                                None
ADD Xd, Xn, #imm12   Xd = Xn + imm (0-4095, or shifted by 12)    None
ADDS Xd, Xn, Xm      ADD and set flags                           NZCV
SUB Xd, Xn, Xm       Xd = Xn - Xm                                None
SUB Xd, Xn, #imm12   Xd = Xn - imm                               None
SUBS Xd, Xn, Xm      SUB and set flags                           NZCV
NEG Xd, Xn           Xd = 0 - Xn                                 None (alias for SUB Xd, XZR, Xn)
MUL Xd, Xn, Xm       Xd = Xn × Xm (lower 64 bits)                None
SMULH Xd, Xn, Xm     Xd = upper 64 bits of signed Xn × Xm        None
UMULH Xd, Xn, Xm     Xd = upper 64 bits of unsigned Xn × Xm      None
SDIV Xd, Xn, Xm      Xd = Xn ÷ Xm (signed integer division)      None
UDIV Xd, Xn, Xm      Xd = Xn ÷ Xm (unsigned integer division)    None
MADD Xd, Xn, Xm, Xa  Xd = Xa + Xn × Xm (multiply-add)            None
MSUB Xd, Xn, Xm, Xa  Xd = Xa - Xn × Xm (multiply-subtract)       None
ADC Xd, Xn, Xm       Xd = Xn + Xm + C (add with carry)           None
SBC Xd, Xn, Xm       Xd = Xn - Xm - (1-C) (subtract with carry)  None
CMN Xn, Xm           Set flags for Xn + Xm (compare negative)    NZCV
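How ADDS and SUBS derive the NZCV flags can be modeled in a few lines of Python. This is a sketch of the 64-bit semantics only, with hypothetical function names `adds` and `subs`; it treats subtraction as Xn + NOT(Xm) + 1, which is why C=1 after SUBS means "no borrow".

```python
MASK64 = (1 << 64) - 1

def adds(a: int, b: int, carry_in: int = 0):
    """Model ADDS (and ADCS when carry_in=C): return (result, (N, Z, C, V))."""
    a &= MASK64
    b &= MASK64
    total = a + b + carry_in
    result = total & MASK64
    n = result >> 63                              # sign bit of the result
    z = int(result == 0)
    c = int(total > MASK64)                       # unsigned carry out of bit 63
    v = ((a ^ result) & (b ^ result)) >> 63       # signed overflow
    return result, (n, z, c, v)

def subs(a: int, b: int):
    """Model SUBS as a + NOT(b) + 1, so C=1 means no borrow occurred."""
    return adds(a, ~b & MASK64, 1)

print(subs(5, 5))            # equal operands: Z and C set
print(subs(3, 5))            # unsigned borrow: N set, C clear
print(adds(2**63 - 1, 1))    # signed overflow: N and V set
```

CMP and CMN produce the same flags as `subs` and `adds` respectively and discard the result.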

Bitwise and Shift

Instruction                Description                                Notes
AND Xd, Xn, Xm             Xd = Xn AND Xm
AND Xd, Xn, #imm           Bitwise AND with bitmask immediate         Special encoding for bit patterns
ORR Xd, Xn, Xm             Xd = Xn OR Xm
EOR Xd, Xn, Xm             Xd = Xn XOR Xm
BIC Xd, Xn, Xm             Xd = Xn AND NOT Xm (bit clear)
TST Xn, Xm                 Set flags for Xn AND Xm                    NZCV; alias for ANDS XZR, Xn, Xm
LSL Xd, Xn, #imm           Logical shift left                         0-63
LSR Xd, Xn, #imm           Logical shift right (zero fill)            0-63
ASR Xd, Xn, #imm           Arithmetic shift right (sign fill)         0-63
ROR Xd, Xn, #imm           Rotate right                               0-63
LSL Xd, Xn, Xm             Logical shift left by register
CLZ Xd, Xn                 Count leading zeros
RBIT Xd, Xn                Reverse bits
REV Xd, Xn                 Reverse bytes (byte-swap)                  Useful for endian conversion
REV16 Xd, Xn               Reverse bytes within each 16-bit halfword
UBFX Xd, Xn, #lsb, #width  Unsigned bitfield extract
SBFX Xd, Xn, #lsb, #width  Signed bitfield extract
BFI Xd, Xn, #lsb, #width   Bitfield insert
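The three bitfield instructions are easiest to understand as mask-and-shift operations. The Python sketch below models their 64-bit forms; the function names mirror the mnemonics but are otherwise hypothetical helpers, and the operand check reflects the usual constraint that the field must fit inside the register.

```python
MASK64 = (1 << 64) - 1

def _check(lsb: int, width: int) -> None:
    if not (0 <= lsb < 64 and 1 <= width <= 64 - lsb):
        raise ValueError("need 0 <= lsb < 64 and 1 <= width <= 64 - lsb")

def ubfx(xn: int, lsb: int, width: int) -> int:
    """Extract width bits starting at lsb, zero-extended."""
    _check(lsb, width)
    return (xn >> lsb) & ((1 << width) - 1)

def sbfx(xn: int, lsb: int, width: int) -> int:
    """Extract width bits starting at lsb, sign-extended to 64 bits."""
    field = ubfx(xn, lsb, width)
    if field >> (width - 1):          # top bit of the field is the sign
        field -= 1 << width
    return field & MASK64

def bfi(xd: int, xn: int, lsb: int, width: int) -> int:
    """Insert the low width bits of xn into xd at lsb; other bits of xd kept."""
    _check(lsb, width)
    mask = ((1 << width) - 1) << lsb
    return ((xd & ~mask) | ((xn << lsb) & mask)) & MASK64

print(hex(ubfx(0xABCD, 4, 8)))        # bits 4..11 of 0xABCD
print(hex(sbfx(0xABCD, 4, 8)))        # same field, sign-extended
print(hex(bfi(0xFFFF, 0x0, 4, 8)))    # clear bits 4..11 of 0xFFFF
```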

Comparison and Control Flow

Instruction             Description                                 Notes
CMP Xn, Xm              Set flags for Xn - Xm                       Alias for SUBS XZR, Xn, Xm
CMP Xn, #imm            Set flags for Xn - imm
B label                 Unconditional branch (±128 MB)
B.cond label            Conditional branch (±1 MB)
BL label                Branch and link (sets X30=PC+4)             Function call
BR Xn                   Branch to register                          Indirect jump
BLR Xn                  Branch and link to register                 Indirect call
RET                     Return (branch to X30)                      Same as RET X30; behaves like BR X30 but hints a function return
RET Xn                  Return to address in Xn
CBZ Xn, label           Branch if Xn == 0                           Compact zero test
CBNZ Xn, label          Branch if Xn != 0
TBZ Xn, #bit, label     Branch if bit is zero
TBNZ Xn, #bit, label    Branch if bit is nonzero
CSEL Xd, Xn, Xm, cond   Conditional select: Xd = (cond) ? Xn : Xm
CSINC Xd, Xn, Xm, cond  Conditional select increment                Xd = (cond) ? Xn : Xm + 1
CSET Xd, cond           Set Xd to 1 if condition, else 0            Alias for CSINC Xd, XZR, XZR, invert(cond)
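The conditional-select family replaces short branches with straight-line code. A small Python model, with the condition passed in as an already-evaluated boolean, shows how the three instructions relate. The function names are hypothetical stand-ins for the mnemonics.

```python
MASK64 = (1 << 64) - 1

def csel(cond: bool, xn: int, xm: int) -> int:
    """CSEL Xd, Xn, Xm, cond: pick Xn if the condition holds, else Xm."""
    return xn if cond else xm

def csinc(cond: bool, xn: int, xm: int) -> int:
    """CSINC Xd, Xn, Xm, cond: pick Xn if the condition holds, else Xm + 1."""
    return xn if cond else (xm + 1) & MASK64

def cset(cond: bool) -> int:
    """CSET Xd, cond, modeled as CSINC Xd, XZR, XZR with the condition inverted."""
    return csinc(not cond, 0, 0)

print(csel(True, 10, 20), csel(False, 10, 20))
print(cset(True), cset(False))
```

Inverting the condition is what lets CSET reuse CSINC: when the original condition holds, the inverted one fails and the result is XZR + 1.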

Condition Codes

Code  Meaning                                Flags
EQ    Equal                                  Z=1
NE    Not equal                              Z=0
LT    Less than (signed)                     N≠V
LE    Less or equal (signed)                 Z=1 or N≠V
GT    Greater than (signed)                  Z=0 and N=V
GE    Greater or equal (signed)              N=V
LO    Lower (unsigned; same as CC)           C=0
LS    Lower or same (unsigned)               C=0 or Z=1
HI    Higher (unsigned)                      C=1 and Z=0
HS    Higher or same (unsigned; same as CS)  C=1
MI    Minus (negative)                       N=1
PL    Plus (non-negative)                    N=0
VS    Overflow set                           V=1
VC    Overflow clear                         V=0
AL    Always (default)                       Any
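The table translates directly into predicates over the four flags. The Python sketch below encodes each row and evaluates it for a given NZCV tuple. The names `CONDITIONS` and `holds` are hypothetical, and the sample flag values are the ones CMP would produce for the operands named in the comments.

```python
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "LT": lambda n, z, c, v: n != v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "GE": lambda n, z, c, v: n == v,
    "LO": lambda n, z, c, v: c == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "HS": lambda n, z, c, v: c == 1,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "AL": lambda n, z, c, v: True,
}

def holds(cond: str, nzcv: tuple) -> bool:
    """Return True if the named condition holds for flags (N, Z, C, V)."""
    return CONDITIONS[cond](*nzcv)

after_cmp_3_5 = (1, 0, 0, 0)   # CMP with Xn=3, Xm=5: negative result, borrow
after_cmp_5_5 = (0, 1, 1, 0)   # CMP with Xn=5, Xm=5: zero result, no borrow
print([name for name in CONDITIONS if holds(name, after_cmp_3_5)])
```

Note that the signed conditions (LT, LE, GT, GE) read N and V, while the unsigned ones (LO, LS, HI, HS) read C and Z, so the same CMP serves both interpretations.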

System Instructions

Instruction     Description                                        Notes
SVC #imm        Supervisor call (system call from EL0)             Trap to EL1 (kernel)
HVC #imm        Hypervisor call                                    Trap to EL2
SMC #imm        Secure monitor call                                Trap to EL3
ERET            Exception return                                   Restores PC from ELR_ELn, state from SPSR_ELn
MRS Xd, sysreg  Read system register                               e.g., MRS X0, CurrentEL
MSR sysreg, Xn  Write system register                              e.g., MSR VBAR_EL1, X0
ISB             Instruction synchronization barrier                Flush pipeline
DSB SY          Data synchronization barrier                       Wait for all memory ops
DMB SY          Data memory barrier                                Order memory accesses
NOP             No operation
BRK #imm        Breakpoint instruction (triggers debug exception)  Software breakpoint
WFI             Wait for interrupt (low-power idle)
WFE             Wait for event
SEV             Send event                                         Wakes WFE on other cores

Commonly Used System Registers

Register       Description                                     Access
SP_EL0         Stack pointer for EL0 (user mode)               EL1+
SP_EL1         Stack pointer for EL1 (kernel)                  EL2+ via MRS/MSR (at EL1 it is SP when SPSel=1)
ELR_EL1        Exception link register (return PC)             EL1+
SPSR_EL1       Saved program status register                   EL1+
VBAR_EL1       Vector base address register (exception table)  EL1+
SCTLR_EL1      System control register (MMU enable, etc.)      EL1+
TCR_EL1        Translation control register                    EL1+
TTBR0_EL1      Translation table base 0 (user space)           EL1+
TTBR1_EL1      Translation table base 1 (kernel)               EL1+
MAIR_EL1       Memory attribute indirection register           EL1+
CurrentEL      Current exception level (read-only)             EL1+ (undefined at EL0)
NZCV           Condition flags (direct read/write)             Any
DAIF           Debug/Abort/IRQ/FIQ mask bits                   EL1+ (EL0 only if SCTLR_EL1.UMA=1)
TPIDR_EL0      Thread ID register (user)                       Any
CNTFRQ_EL0     Timer frequency                                 Any (EL0 access gated by CNTKCTL_EL1)
CNTP_TVAL_EL0  Timer countdown value                           Any (EL0 access gated by CNTKCTL_EL1)
CNTP_CTL_EL0   Timer control (enable, mask)                    Any (EL0 access gated by CNTKCTL_EL1)

NEON SIMD Instructions (Selected)

Instruction                 Description                        Width
LD1 {Vt.4S}, [Xn]           Load 4 single-precision floats     128 bits
ST1 {Vt.4S}, [Xn]           Store 4 single-precision floats    128 bits
FADD Vd.4S, Vn.4S, Vm.4S    Add 4 packed singles               4×f32
FMUL Vd.4S, Vn.4S, Vm.4S    Multiply 4 packed singles          4×f32
FMLA Vd.4S, Vn.4S, Vm.4S    Fused multiply-add: Vd += Vn × Vm  4×f32
ADD Vd.4S, Vn.4S, Vm.4S     Add 4 packed 32-bit integers       4×i32
MUL Vd.4S, Vn.4S, Vm.4S     Multiply 4 packed 32-bit integers  4×i32
CMEQ Vd.4S, Vn.4S, Vm.4S    Compare equal, packed 32-bit       4×i32 mask
TBL Vd.8B, {Vn.16B}, Vm.8B  Table lookup by byte index         8 bytes
ZIP1 Vd.4S, Vn.4S, Vm.4S    Interleave (low halves)
ZIP2 Vd.4S, Vn.4S, Vm.4S    Interleave (high halves)
DUP Vd.4S, Wn               Broadcast scalar to all lanes
INS Vd.S[i], Wn             Insert scalar into lane i
UMOV Wd, Vn.S[i]            Extract lane i to scalar
ADDV Sd, Vn.4S              Horizontal add all lanes           Scalar result
FADDP Vd.4S, Vn.4S, Vm.4S   Pairwise add

Calling Convention Summary (AAPCS64)

Arguments (integer/pointer): X0, X1, X2, X3, X4, X5, X6, X7
Arguments (float/SIMD):       V0, V1, V2, V3, V4, V5, V6, V7
Return value (integer):       X0 (and X1 for 128-bit values)
Return value (float):         V0

Caller-saved:   X0-X17, V0-V7, V16-V31 (and the upper 64 bits of V8-V15)
Callee-saved:   X19-X28, X29 (FP), X30 (LR), V8-V15 (lower 64 bits)
Special:        X18 (platform-reserved on some systems), SP (must remain aligned)

Stack alignment: 16 bytes whenever SP is used to access memory; in practice, keep SP aligned at all times, not just before calls

Standard prologue:
    STP X29, X30, [SP, #-16]!   ; save FP and LR, pre-decrement SP
    MOV X29, SP                 ; establish frame pointer

Standard epilogue:
    LDP X29, X30, [SP], #16     ; restore FP and LR, post-increment SP
    RET                         ; branch to X30

Syscall Interface (Linux AArch64)

Syscall number: X8
Arguments:      X0, X1, X2, X3, X4, X5
Return value:   X0 (-4095 to -1 = negated errno error code)
Instruction:    SVC #0

Common syscall numbers (same as RISC-V Linux):
  write:  64   (X8=64, X0=fd, X1=buf, X2=count)
  read:   63   (X8=63, X0=fd, X1=buf, X2=count)
  openat: 56   (X8=56, X0=dirfd, X1=path, X2=flags, X3=mode; AArch64 has no plain open, pass dirfd=AT_FDCWD for the same effect)
  close:  57   (X8=57, X0=fd)
  exit:   93   (X8=93, X0=status)
  mmap:  222   (X8=222, X0=addr, X1=length, X2=prot, X3=flags, X4=fd, X5=offset)