Appendix E: ARM64 (AArch64) Instruction Quick Reference
This appendix covers the ARM64 instructions used most often in this book. Every ARM64 instruction is exactly 4 bytes (32 bits) wide. The fixed width simplifies decoding and makes pipelining efficient. For the complete instruction set, see the Arm Architecture Reference Manual for A-profile architecture (the "ARM ARM"), document DDI 0487.
Register File
General-Purpose Registers
| 64-bit | 32-bit | ABI Name | Role (AAPCS64) |
|---|---|---|---|
| X0 | W0 | — | 1st arg / return value; caller-saved |
| X1 | W1 | — | 2nd arg; caller-saved |
| X2 | W2 | — | 3rd arg; caller-saved |
| X3 | W3 | — | 4th arg; caller-saved |
| X4 | W4 | — | 5th arg; caller-saved |
| X5 | W5 | — | 6th arg; caller-saved |
| X6 | W6 | — | 7th arg; caller-saved |
| X7 | W7 | — | 8th arg; caller-saved |
| X8 | W8 | XR | Indirect result location / syscall number (Linux); caller-saved |
| X9-X15 | W9-W15 | — | Caller-saved temporaries |
| X16 | W16 | IP0 | Intra-procedure-call scratch (may be clobbered by linker veneers and PLT stubs) |
| X17 | W17 | IP1 | Intra-procedure-call scratch |
| X18 | W18 | PR | Platform register (reserved on some systems, e.g., Apple platforms and Windows) |
| X19-X28 | W19-W28 | — | Callee-saved |
| X29 | W29 | FP | Frame pointer; callee-saved |
| X30 | W30 | LR | Link register (return address); overwritten by BL, so non-leaf functions must save it |
| SP | WSP | SP | Stack pointer (dedicated, not general-purpose; register number 31 means SP or XZR depending on the instruction) |
| XZR | WZR | — | Zero register (reads as 0, writes discarded) |
| PC | — | — | Program counter (not accessible as a GPR; read it with ADR) |
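The W/X and XZR rows imply two rules worth stating precisely: writing a 32-bit Wn register zero-extends the result into the full Xn, and XZR reads as 0 while discarding writes. The C model below is a minimal sketch; the `gpr_file` type and its helpers are hypothetical names for illustration, and the model treats register number 31 only as XZR, ignoring the instructions where that number means SP.

```c
#include <stdint.h>

/* Hypothetical model of X0-X30; index 31 is treated as XZR only. */
typedef struct {
    uint64_t x[31];
} gpr_file;

/* Read Xn: register 31 (XZR) always reads as zero. */
static uint64_t read_x(const gpr_file *r, unsigned n) {
    return n == 31 ? 0 : r->x[n];
}

/* Write Xn: writes to register 31 (XZR) are discarded. */
static void write_x(gpr_file *r, unsigned n, uint64_t v) {
    if (n != 31)
        r->x[n] = v;
}

/* Read Wn: the low 32 bits of Xn. */
static uint32_t read_w(const gpr_file *r, unsigned n) {
    return (uint32_t)read_x(r, n);
}

/* Write Wn: the value is zero-extended, so bits 63:32 of Xn become 0. */
static void write_w(gpr_file *r, unsigned n, uint32_t v) {
    write_x(r, n, (uint64_t)v);
}
```

This is why `MOV W0, W0` is a common idiom for clearing the upper half of X0.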
SIMD/FP Registers
ARM64 has 32 SIMD/FP registers (V0-V31), each 128 bits wide. They can be accessed as:
- Qn (128-bit, all 16 bytes)
- Dn (64-bit, lower 8 bytes)
- Sn (32-bit, lower 4 bytes)
- Hn (16-bit, lower 2 bytes)
- Bn (8-bit, lower 1 byte)
- Vn.16B, Vn.8H, Vn.4S, Vn.2D (vector arrangements: lane count × element size; 64-bit forms such as Vn.8B and Vn.2S use the low half)
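The scalar names above are narrower views of the same 128-bit storage, counted from the least significant byte, so Sn is also lane 0 of Vn.4S. The C sketch below models one register as 16 little-endian bytes; the `vreg` type and the `read_*` helpers are hypothetical, and the model covers reads only.

```c
#include <stdint.h>

/* Hypothetical model of one Vn register: b[0] is the least significant byte. */
typedef struct {
    uint8_t b[16];
} vreg;

/* Dn: the low 8 bytes assembled into a 64-bit value. */
static uint64_t read_d(const vreg *v) {
    uint64_t d = 0;
    for (int i = 7; i >= 0; i--)
        d = (d << 8) | v->b[i];
    return d;
}

/* Sn, Hn, Bn: successively narrower low slices of the same bytes. */
static uint32_t read_s(const vreg *v) { return (uint32_t)read_d(v); }
static uint16_t read_h(const vreg *v) { return (uint16_t)read_d(v); }
static uint8_t  read_b(const vreg *v) { return v->b[0]; }

/* Vn.4S lane i: bytes 4*i .. 4*i+3. */
static uint32_t read_lane_s(const vreg *v, int i) {
    uint32_t s = 0;
    for (int k = 3; k >= 0; k--)
        s = (s << 8) | v->b[4 * i + k];
    return s;
}
```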
Data Movement
| Instruction | Description | Notes |
|---|---|---|
| MOV Xd, Xn | Copy register | Alias for ORR Xd, XZR, Xn (ADD Xd, Xn, #0 when SP is an operand) |
| MOV Xd, #imm | Move immediate | Assembler picks MOVZ, MOVN, or ORR (bitmask immediate); other constants need MOVK |
| MOVZ Xd, #imm16{, LSL #shift} | Move 16-bit immediate, zeroing all other bits | shift = 0, 16, 32, or 48 |
| MOVN Xd, #imm16{, LSL #shift} | Move bitwise NOT of the shifted 16-bit immediate | Handy for negative constants |
| MOVK Xd, #imm16{, LSL #shift} | Move 16-bit immediate, keep other bits | Follows MOVZ/MOVN to build 64-bit constants |
| LDR Xd, [Xn] | Load 64-bit from [Xn] | |
| LDR Xd, [Xn, #imm] | Load 64-bit from [Xn + imm] | imm unsigned, multiple of 8 (0-32760); LDUR takes unscaled -256 to 255 |
| LDR Xd, [Xn, Xm] | Load 64-bit from [Xn + Xm] | Xm may be scaled: [Xn, Xm, LSL #3] |
| LDR Xd, [Xn, #imm]! | Pre-index: Xn = Xn + imm, then load from the new Xn | imm -256 to 255 |
| LDR Xd, [Xn], #imm | Post-index: load from Xn, then Xn = Xn + imm | imm -256 to 255 |
| LDR Xd, label | PC-relative literal load | ±1 MB range |
| LDRB Wd, [Xn] | Load byte, zero-extend | Writing Wd also clears bits 63:32 of Xd |
| LDRH Wd, [Xn] | Load halfword (16-bit), zero-extend | |
| LDRSB Xd, [Xn] | Load byte, sign-extend to 64 bits | |
| LDRSH Xd, [Xn] | Load halfword, sign-extend to 64 bits | |
| LDRSW Xd, [Xn] | Load word (32-bit), sign-extend to 64 bits | |
| STR Xt, [Xn] | Store 64-bit Xt to [Xn] | |
| STR Xt, [Xn, #imm] | Store 64-bit Xt to [Xn + imm] | Same offset rules as LDR |
| STRB Wt, [Xn] | Store low byte of Wt | |
| STRH Wt, [Xn] | Store low halfword of Wt | |
| LDP Xt1, Xt2, [Xn] | Load pair: Xt1 from [Xn], Xt2 from [Xn + 8] | Efficient for restoring register pairs |
| STP Xt1, Xt2, [Xn] | Store pair | Typical prologue: STP X29, X30, [SP, #-16]! |
| ADR Xd, label | Load PC-relative address | ±1 MB; single instruction |
| ADRP Xd, label | Load address of label's 4 KB page, PC-relative | ±4 GB; add the low 12 bits with ADD (or use them as an LDR/STR offset) |
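The MOVZ/MOVK rows and the ADRP note describe bit-level arithmetic that is easy to check in C. The sketch below models each instruction as a pure function on 64-bit values; `movz`, `movn`, `movk`, `adrp_page_delta`, `adrp`, and `lo12` are hypothetical helper names, not a library API. In GNU/ELF assembly the address pair is written `adrp x0, sym` followed by `add x0, x0, :lo12:sym`.

```c
#include <stdint.h>

/* MOVZ Xd, #imm16, LSL #shift: place imm16 at bit `shift`, zero the rest. */
static uint64_t movz(uint16_t imm16, unsigned shift) {
    return (uint64_t)imm16 << shift;
}

/* MOVN Xd, #imm16, LSL #shift: bitwise NOT of the shifted immediate. */
static uint64_t movn(uint16_t imm16, unsigned shift) {
    return ~((uint64_t)imm16 << shift);
}

/* MOVK Xd, #imm16, LSL #shift: replace one 16-bit field, keep the others. */
static uint64_t movk(uint64_t xd, uint16_t imm16, unsigned shift) {
    uint64_t field = (uint64_t)0xFFFF << shift;
    return (xd & ~field) | ((uint64_t)imm16 << shift);
}

/* Page delta (in 4 KB pages) that ends up encoded in ADRP. */
static int64_t adrp_page_delta(uint64_t pc, uint64_t target) {
    return (int64_t)((target >> 12) - (pc >> 12));
}

/* ADRP Xd, label: clear the low 12 bits of PC, then add delta * 4096.
 * Unsigned arithmetic wraps correctly for negative deltas. */
static uint64_t adrp(uint64_t pc, int64_t page_delta) {
    return (pc & ~(uint64_t)0xFFF) + ((uint64_t)page_delta << 12);
}

/* The :lo12: part that the following ADD supplies. */
static uint64_t lo12(uint64_t target) {
    return target & 0xFFF;
}
```

For example, 0x123456789ABCDEF0 takes one MOVZ and three MOVKs, one per 16-bit chunk; `MOV Xd, #imm` only helps when the value fits a single MOVZ, MOVN, or bitmask-immediate encoding.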
Arithmetic
| Instruction | Description | Flags |
|---|---|---|
| ADD Xd, Xn, Xm | Xd = Xn + Xm | None |
| ADD Xd, Xn, #imm12 | Xd = Xn + imm (0-4095, optionally LSL #12) | None |
| ADDS Xd, Xn, Xm | ADD and set flags | Sets NZCV |
| SUB Xd, Xn, Xm | Xd = Xn - Xm | None |
| SUB Xd, Xn, #imm12 | Xd = Xn - imm (same immediate rules as ADD) | None |
| SUBS Xd, Xn, Xm | SUB and set flags | Sets NZCV |
| NEG Xd, Xn | Xd = 0 - Xn | None (alias for SUB Xd, XZR, Xn) |
| MUL Xd, Xn, Xm | Xd = Xn × Xm (low 64 bits) | None (alias for MADD Xd, Xn, Xm, XZR) |
| SMULH Xd, Xn, Xm | Xd = high 64 bits of the signed 128-bit product Xn × Xm | None |
| UMULH Xd, Xn, Xm | Xd = high 64 bits of the unsigned 128-bit product Xn × Xm | None |
| SDIV Xd, Xn, Xm | Xd = Xn ÷ Xm, signed, rounded toward zero; Xm = 0 gives 0 (no trap) | None |
| UDIV Xd, Xn, Xm | Xd = Xn ÷ Xm, unsigned; Xm = 0 gives 0 (no trap) | None |
| MADD Xd, Xn, Xm, Xa | Xd = Xa + Xn × Xm (multiply-add) | None |
| MSUB Xd, Xn, Xm, Xa | Xd = Xa - Xn × Xm (multiply-subtract) | None |
| ADC Xd, Xn, Xm | Xd = Xn + Xm + C (add with carry) | Reads C; sets none (ADCS sets NZCV) |
| SBC Xd, Xn, Xm | Xd = Xn - Xm - (1 - C) (subtract with carry) | Reads C; sets none (SBCS sets NZCV) |
| CMN Xn, Xm | Set flags for Xn + Xm (compare negative) | Sets NZCV; alias for ADDS XZR, Xn, Xm |
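A64 has no integer remainder instruction, so a remainder is typically computed with SDIV followed by MSUB. The sketch below models SDIV, UDIV, and MSUB in C under the architectural rules: division truncates toward zero, a zero divisor yields 0 instead of trapping, and INT64_MIN ÷ -1 wraps to INT64_MIN. The helper names are hypothetical.

```c
#include <stdint.h>

/* SDIV Xd, Xn, Xm: truncate toward zero; divisor 0 gives 0. */
static int64_t sdiv(int64_t n, int64_t m) {
    if (m == 0)
        return 0;
    if (n == INT64_MIN && m == -1)
        return INT64_MIN;          /* hardware wraps; C would be UB */
    return n / m;                  /* C99 division also truncates toward zero */
}

/* UDIV Xd, Xn, Xm: divisor 0 gives 0. */
static uint64_t udiv(uint64_t n, uint64_t m) {
    return m == 0 ? 0 : n / m;
}

/* MSUB Xd, Xn, Xm, Xa: Xd = Xa - Xn * Xm, modulo 2^64. */
static uint64_t msub(uint64_t n, uint64_t m, uint64_t a) {
    return a - n * m;
}

/* Signed remainder as compilers typically emit it:
 *   SDIV q, n, m
 *   MSUB r, q, m, n      // r = n - q * m */
static int64_t srem(int64_t n, int64_t m) {
    int64_t q = sdiv(n, m);
    return (int64_t)msub((uint64_t)q, (uint64_t)m, (uint64_t)n);
}
```

A side effect of the zero-divisor rule is that the SDIV/MSUB pair returns n itself when m is 0.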
Bitwise and Shift
| Instruction | Description | Notes |
|---|---|---|
| AND Xd, Xn, Xm | Xd = Xn AND Xm | |
| AND Xd, Xn, #imm | Bitwise AND with bitmask immediate | Encodes replicated, rotated runs of 1s; 0 and all-ones are not encodable |
| ORR Xd, Xn, Xm | Xd = Xn OR Xm | |
| EOR Xd, Xn, Xm | Xd = Xn XOR Xm | |
| BIC Xd, Xn, Xm | Xd = Xn AND NOT Xm (bit clear) | |
| TST Xn, Xm | Set flags for Xn AND Xm | Sets N and Z, clears C and V; alias for ANDS XZR, Xn, Xm |
| LSL Xd, Xn, #imm | Logical shift left | imm 0-63 |
| LSR Xd, Xn, #imm | Logical shift right (zero fill) | imm 0-63 |
| ASR Xd, Xn, #imm | Arithmetic shift right (sign fill) | imm 0-63 |
| ROR Xd, Xn, #imm | Rotate right | imm 0-63 |
| LSL Xd, Xn, Xm | Logical shift left by register | Shift amount is Xm mod 64; LSR, ASR, and ROR also accept a register |
| CLZ Xd, Xn | Count leading zeros | Returns 64 when Xn = 0 |
| RBIT Xd, Xn | Reverse bit order | |
| REV Xd, Xn | Reverse bytes (byte-swap) | Useful for endian conversion |
| REV16 Xd, Xn | Reverse bytes within each 16-bit halfword | |
| UBFX Xd, Xn, #lsb, #width | Unsigned bitfield extract: width bits starting at lsb, zero-extended | |
| SBFX Xd, Xn, #lsb, #width | Signed bitfield extract: same field, sign-extended | |
| BFI Xd, Xn, #lsb, #width | Bitfield insert: low width bits of Xn into Xd at lsb | Other bits of Xd unchanged |
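The bitfield rows are shorthand for shift-and-mask sequences. The C sketch below spells them out, assuming 1 ≤ width and lsb + width ≤ 64 (the constraints the assembler enforces); `low_mask`, `ubfx`, `sbfx`, and `bfi` are hypothetical helper names.

```c
#include <stdint.h>

/* Mask with the low `width` bits set (1 <= width <= 64). */
static uint64_t low_mask(unsigned width) {
    return width >= 64 ? UINT64_MAX : ((uint64_t)1 << width) - 1;
}

/* UBFX: extract `width` bits starting at `lsb`, zero-extended. */
static uint64_t ubfx(uint64_t xn, unsigned lsb, unsigned width) {
    return (xn >> lsb) & low_mask(width);
}

/* SBFX: same field, sign-extended from its top bit (bit width-1). */
static int64_t sbfx(uint64_t xn, unsigned lsb, unsigned width) {
    uint64_t field = ubfx(xn, lsb, width);
    uint64_t sign = (uint64_t)1 << (width - 1);
    return (int64_t)((field ^ sign) - sign);   /* classic sign-extension trick */
}

/* BFI: copy the low `width` bits of xn into xd at `lsb`; keep other bits. */
static uint64_t bfi(uint64_t xd, uint64_t xn, unsigned lsb, unsigned width) {
    uint64_t mask = low_mask(width) << lsb;
    return (xd & ~mask) | ((xn << lsb) & mask);
}
```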
Comparison and Control Flow
| Instruction | Description | Notes |
|---|---|---|
| CMP Xn, Xm | Set flags for Xn - Xm | Alias for SUBS XZR, Xn, Xm |
| CMP Xn, #imm | Set flags for Xn - imm | Alias for SUBS XZR, Xn, #imm12 |
| B label | Unconditional branch | ±128 MB |
| B.cond label | Conditional branch | ±1 MB |
| BL label | Branch and link (sets X30 = PC + 4) | Function call; ±128 MB |
| BR Xn | Branch to register | Indirect jump |
| BLR Xn | Branch and link to register | Indirect call |
| RET | Return (branch to X30) | Behaves like BR X30 but hints a function return to the branch predictor |
| RET Xn | Return to address in Xn | |
| CBZ Xn, label | Branch if Xn == 0 | Compact zero test, no flags needed; ±1 MB |
| CBNZ Xn, label | Branch if Xn != 0 | ±1 MB |
| TBZ Xn, #bit, label | Branch if bit #bit of Xn is 0 | ±32 KB |
| TBNZ Xn, #bit, label | Branch if bit #bit of Xn is 1 | ±32 KB |
| CSEL Xd, Xn, Xm, cond | Conditional select: Xd = cond ? Xn : Xm | |
| CSINC Xd, Xn, Xm, cond | Conditional select increment: Xd = cond ? Xn : Xm + 1 | |
| CSET Xd, cond | Xd = 1 if cond holds, else 0 | Alias for CSINC Xd, XZR, XZR, invert(cond) |
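Conditional selects replace short if/else branches. The C functions below are a hypothetical model that takes the already-evaluated condition as a `bool`; they show why CSET is encoded as CSINC with the condition inverted, and how a branchless maximum maps onto CMP plus CSEL.

```c
#include <stdbool.h>
#include <stdint.h>

/* CSEL Xd, Xn, Xm, cond */
static uint64_t csel(bool cond, uint64_t xn, uint64_t xm) {
    return cond ? xn : xm;
}

/* CSINC Xd, Xn, Xm, cond: Xd = cond ? Xn : Xm + 1 */
static uint64_t csinc(bool cond, uint64_t xn, uint64_t xm) {
    return cond ? xn : xm + 1;
}

/* CSET Xd, cond == CSINC Xd, XZR, XZR, invert(cond):
 * if cond holds, the inverted condition fails and Xd = 0 + 1 = 1. */
static uint64_t cset(bool cond) {
    return csinc(!cond, 0, 0);
}

/* Branchless signed max, as a compiler might lower it:
 *   CMP  X0, X1
 *   CSEL X0, X0, X1, GT */
static int64_t max_i64(int64_t a, int64_t b) {
    return (int64_t)csel(a > b, (uint64_t)a, (uint64_t)b);
}
```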
Condition Codes
| Code | Meaning | Flags |
|---|---|---|
| EQ | Equal | Z=1 |
| NE | Not equal | Z=0 |
| LT | Less than (signed) | N≠V |
| LE | Less or equal (signed) | Z=1 or N≠V |
| GT | Greater than (signed) | Z=0 and N=V |
| GE | Greater or equal (signed) | N=V |
| LO | Lower (unsigned); also written CC | C=0 |
| LS | Lower or same (unsigned) | C=0 or Z=1 |
| HI | Higher (unsigned) | C=1 and Z=0 |
| HS | Higher or same (unsigned); also written CS | C=1 |
| MI | Minus (negative) | N=1 |
| PL | Plus (positive or zero) | N=0 |
| VS | Signed overflow | V=1 |
| VC | No signed overflow | V=0 |
| AL | Always (default) | — |

After CMP or SUBS, ARM sets C=1 when no borrow occurs, which is the inverse of x86's CF. LO (C=0) therefore corresponds to x86's JB (CF=1).
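The flag columns are easiest to trust after checking them against a model of CMP. The sketch below (hypothetical `nzcv`, `cmp64`, and `cond_holds` names) computes NZCV the way a 64-bit SUBS does and then evaluates each condition code; the enum lists the codes in their 4-bit encoding order, EQ = 0 through AL = 14.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } nzcv;

/* Flags from CMP a, b (SUBS XZR, a, b). C=1 means no unsigned borrow. */
static nzcv cmp64(uint64_t a, uint64_t b) {
    uint64_t r = a - b;
    nzcv f;
    f.n = (r >> 63) & 1;                       /* result negative */
    f.z = (r == 0);                            /* result zero */
    f.c = (a >= b);                            /* no borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 63) & 1;     /* signed overflow */
    return f;
}

typedef enum { EQ, NE, HS, LO, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL } cond;

static bool cond_holds(nzcv f, cond c) {
    switch (c) {
    case EQ: return f.z;
    case NE: return !f.z;
    case HS: return f.c;
    case LO: return !f.c;
    case MI: return f.n;
    case PL: return !f.n;
    case VS: return f.v;
    case VC: return !f.v;
    case HI: return f.c && !f.z;
    case LS: return !f.c || f.z;
    case GE: return f.n == f.v;
    case LT: return f.n != f.v;
    case GT: return !f.z && f.n == f.v;
    case LE: return f.z || f.n != f.v;
    default: return true;                      /* AL */
    }
}
```

Comparing -1 with 1 shows the signed/unsigned split: LT holds, but as an unsigned value 0xFFFFFFFFFFFFFFFF is higher than 1, so HI holds and LO does not.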
System Instructions
| Instruction | Description | Notes |
|---|---|---|
| SVC #imm | Supervisor call (system call from EL0) | Exception to EL1 (kernel) |
| HVC #imm | Hypervisor call | Exception to EL2 |
| SMC #imm | Secure monitor call | Exception to EL3 |
| ERET | Exception return | Restores PC from ELR_ELn and PSTATE from SPSR_ELn |
| MRS Xd, sysreg | Read system register | e.g., MRS X0, CurrentEL |
| MSR sysreg, Xn | Write system register | e.g., MSR VBAR_EL1, X0 |
| ISB | Instruction synchronization barrier | Later instructions are refetched, so they see prior context changes (e.g., MSR writes) |
| DSB SY | Data synchronization barrier | No later instruction executes until all prior memory accesses (and cache/TLB maintenance) complete |
| DMB SY | Data memory barrier | Orders memory accesses before the barrier against those after it; does not wait for completion |
| NOP | No operation | |
| BRK #imm | Breakpoint instruction (triggers debug exception) | Software breakpoint |
| WFI | Wait for interrupt | Low-power idle until an interrupt or other wake-up event |
| WFE | Wait for event | Low-power wait until an event (e.g., from SEV) |
| SEV | Send event | Wakes cores waiting in WFE |
Commonly Used System Registers
| Register | Description | Access |
|---|---|---|
| SP_EL0 | Stack pointer for EL0 (user mode) | EL1+ |
| SP_EL1 | Stack pointer for EL1 (kernel) | EL2+ via MRS/MSR (EL1 uses it as SP) |
| ELR_EL1 | Exception link register (return PC) | EL1+ |
| SPSR_EL1 | Saved program status register (PSTATE at exception entry) | EL1+ |
| VBAR_EL1 | Vector base address register (exception table) | EL1+ |
| SCTLR_EL1 | System control register (MMU enable, caches, alignment checks) | EL1+ |
| TCR_EL1 | Translation control register | EL1+ |
| TTBR0_EL1 | Translation table base 0 (lower VA range, typically user space) | EL1+ |
| TTBR1_EL1 | Translation table base 1 (upper VA range, typically kernel) | EL1+ |
| MAIR_EL1 | Memory attribute indirection register | EL1+ |
| CurrentEL | Current exception level in bits [3:2] (read-only) | EL1+ (undefined at EL0) |
| NZCV | Condition flags (direct read/write) | Any |
| DAIF | Debug/SError/IRQ/FIQ mask bits | EL1+; EL0 only if SCTLR_EL1.UMA = 1 |
| TPIDR_EL0 | Software thread ID register (thread pointer for TLS on Linux) | Any |
| CNTFRQ_EL0 | System counter frequency (Hz) | EL1+; EL0 read if CNTKCTL_EL1 allows |
| CNTP_TVAL_EL0 | Physical timer countdown value | EL1+; EL0 if CNTKCTL_EL1.EL0PTEN = 1 |
| CNTP_CTL_EL0 | Physical timer control (ENABLE, IMASK, ISTATUS) | EL1+; EL0 if CNTKCTL_EL1.EL0PTEN = 1 |
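Several of these registers hold bit fields rather than plain values. The helpers below are a hypothetical sketch that decodes the raw value an MRS would return: CurrentEL keeps the level in bits [3:2]; DAIF uses bit 9 (D), 8 (A), 7 (I), and 6 (F); and CNTP_CTL_EL0 has ENABLE, IMASK, and ISTATUS in bits 0-2. `tval_for_ms` is an illustrative helper, not an existing API.

```c
#include <stdbool.h>
#include <stdint.h>

/* CurrentEL: exception level in bits [3:2]. */
static unsigned current_el(uint64_t currentel) {
    return (unsigned)((currentel >> 2) & 0x3);
}

/* DAIF mask bits as returned by MRS Xd, DAIF. */
#define DAIF_F (1u << 6)   /* FIQ masked */
#define DAIF_I (1u << 7)   /* IRQ masked */
#define DAIF_A (1u << 8)   /* SError (asynchronous abort) masked */
#define DAIF_D (1u << 9)   /* Debug exceptions masked */

static bool irqs_masked(uint64_t daif) {
    return (daif & DAIF_I) != 0;
}

/* CNTP_CTL_EL0 fields. */
#define CNTP_CTL_ENABLE  (1u << 0)   /* timer enabled */
#define CNTP_CTL_IMASK   (1u << 1)   /* timer interrupt masked */
#define CNTP_CTL_ISTATUS (1u << 2)   /* timer condition met (read-only) */

/* True if the timer is enabled, unmasked, and has fired. */
static bool timer_irq_pending(uint64_t ctl) {
    return (ctl & CNTP_CTL_ENABLE) && !(ctl & CNTP_CTL_IMASK) &&
           (ctl & CNTP_CTL_ISTATUS);
}

/* Ticks to program into CNTP_TVAL_EL0 for a delay of `ms` milliseconds,
 * given the CNTFRQ_EL0 value in Hz (result must fit in 32 bits). */
static uint64_t tval_for_ms(uint64_t freq_hz, uint64_t ms) {
    return freq_hz * ms / 1000;
}
```

In assembly, `MSR DAIFSet, #2` sets only the I bit (masking IRQs) and `MSR DAIFClr, #2` clears it.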
NEON SIMD Instructions (Selected)
| Instruction | Description | Width |
|---|---|---|
| LD1 {Vt.4S}, [Xn] | Load 4 single-precision floats (or 32-bit integers) | 128 bits |
| ST1 {Vt.4S}, [Xn] | Store 4 single-precision floats (or 32-bit integers) | 128 bits |
| FADD Vd.4S, Vn.4S, Vm.4S | Add 4 packed singles | 4×f32 |
| FMUL Vd.4S, Vn.4S, Vm.4S | Multiply 4 packed singles | 4×f32 |
| FMLA Vd.4S, Vn.4S, Vm.4S | Fused multiply-add: Vd += Vn × Vm (single rounding) | 4×f32 |
| ADD Vd.4S, Vn.4S, Vm.4S | Add 4 packed 32-bit integers | 4×i32 |
| MUL Vd.4S, Vn.4S, Vm.4S | Multiply 4 packed 32-bit integers (low 32 bits of each product) | 4×i32 |
| CMEQ Vd.4S, Vn.4S, Vm.4S | Compare equal: each lane becomes all ones if equal, else 0 | 4×i32 mask |
| TBL Vd.8B, {Vn.16B}, Vm.8B | Table lookup: Vd[i] = Vn[Vm[i]], or 0 if Vm[i] ≥ 16 | 8 bytes |
| ZIP1 Vd.4S, Vn.4S, Vm.4S | Interleave low halves: {n0, m0, n1, m1} | 4×32-bit |
| ZIP2 Vd.4S, Vn.4S, Vm.4S | Interleave high halves: {n2, m2, n3, m3} | 4×32-bit |
| DUP Vd.4S, Wn | Broadcast scalar to all lanes | 4×32-bit |
| INS Vd.S[i], Wn | Insert scalar into lane i (alias: MOV) | Other lanes unchanged |
| UMOV Wd, Vn.S[i] | Extract lane i to scalar (alias: MOV) | |
| ADDV Sd, Vn.4S | Integer sum of all lanes (horizontal add) | Scalar result |
| FADDP Vd.4S, Vn.4S, Vm.4S | Pairwise add of adjacent elements of the concatenation Vn:Vm | 4×f32 |
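The lane-shuffling rows are easier to remember once written as scalar loops. The sketch below is a hypothetical C model of ZIP1, ZIP2, FADDP, ADDV, and TBL on plain arrays; each function builds its result in a temporary first, so it stays correct when the destination is also a source (as when Vd is the same register as Vn).

```c
#include <stdint.h>

/* ZIP1 Vd.4S, Vn.4S, Vm.4S: interleave the low halves. */
static void zip1_4s(uint32_t d[4], const uint32_t n[4], const uint32_t m[4]) {
    uint32_t t[4] = { n[0], m[0], n[1], m[1] };
    for (int i = 0; i < 4; i++) d[i] = t[i];
}

/* ZIP2 Vd.4S, Vn.4S, Vm.4S: interleave the high halves. */
static void zip2_4s(uint32_t d[4], const uint32_t n[4], const uint32_t m[4]) {
    uint32_t t[4] = { n[2], m[2], n[3], m[3] };
    for (int i = 0; i < 4; i++) d[i] = t[i];
}

/* FADDP Vd.4S, Vn.4S, Vm.4S: sums of adjacent pairs of Vn:Vm. */
static void faddp_4s(float d[4], const float n[4], const float m[4]) {
    float t[4] = { n[0] + n[1], n[2] + n[3], m[0] + m[1], m[2] + m[3] };
    for (int i = 0; i < 4; i++) d[i] = t[i];
}

/* ADDV Sd, Vn.4S: integer sum of all lanes, modulo 2^32. */
static uint32_t addv_4s(const uint32_t n[4]) {
    return n[0] + n[1] + n[2] + n[3];
}

/* TBL Vd.8B, {Vn.16B}, Vm.8B: out-of-range indices produce 0. */
static void tbl_8b(uint8_t d[8], const uint8_t table[16], const uint8_t idx[8]) {
    uint8_t t[8];
    for (int i = 0; i < 8; i++)
        t[i] = idx[i] < 16 ? table[idx[i]] : 0;
    for (int i = 0; i < 8; i++) d[i] = t[i];
}
```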
Calling Convention Summary (AAPCS64)
Arguments (integer/pointer): X0, X1, X2, X3, X4, X5, X6, X7
Arguments (float/SIMD): V0, V1, V2, V3, V4, V5, V6, V7
Return value (integer): X0 (and X1 for 128-bit values)
Return value (float): V0
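Caller-saved: X0-X17, V0-V7, V16-V31, and the upper 64 bits of V8-V15
Callee-saved: X19-X28, X29 (FP), and the low 64 bits of V8-V15 (D8-D15)
Special: X18 (platform register, reserved on some systems); X30 (LR, overwritten by BL, so non-leaf functions save it in the prologue); SP (must remain aligned)
Stack alignment: SP must be 16-byte aligned at every public interface, and in practice at all times, because hardware alignment checking (SCTLR_ELx.SA) can fault on any load or store that uses a misaligned SP as its base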
Standard prologue:
STP X29, X30, [SP, #-16]!   // save FP and LR; pre-decrement SP by 16
MOV X29, SP                 // establish frame pointer
Standard epilogue:
LDP X29, X30, [SP], #16     // restore FP and LR; post-increment SP by 16
RET                         // branch to X30
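The prologue and epilogue above can be traced on a toy stack. This hypothetical C model (`toy_cpu`, `prologue`, and `epilogue` are illustrative names) shows the pre-index store moving SP down by 16 before writing, X29 ending up pointing at the saved {FP, LR} pair (the frame record, with FP at the lower address), and the post-index load restoring both registers before SP moves back up.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine: a small downward-growing stack in an array. */
typedef struct {
    uint8_t  mem[256];
    uint64_t sp;            /* byte offset into mem */
    uint64_t x29, x30;
} toy_cpu;

static void store64(toy_cpu *c, uint64_t addr, uint64_t v) {
    memcpy(&c->mem[addr], &v, sizeof v);
}

static uint64_t load64(const toy_cpu *c, uint64_t addr) {
    uint64_t v;
    memcpy(&v, &c->mem[addr], sizeof v);
    return v;
}

/* STP X29, X30, [SP, #-16]!  then  MOV X29, SP */
static void prologue(toy_cpu *c) {
    c->sp -= 16;                     /* pre-index: update SP first */
    store64(c, c->sp, c->x29);       /* first register at the lower address */
    store64(c, c->sp + 8, c->x30);
    c->x29 = c->sp;                  /* X29 now points at the frame record */
}

/* LDP X29, X30, [SP], #16 */
static void epilogue(toy_cpu *c) {
    c->x29 = load64(c, c->sp);
    c->x30 = load64(c, c->sp + 8);
    c->sp += 16;                     /* post-index: update SP last */
}
```

Because each frame record stores the caller's X29, the saved FP values form a linked list that debuggers and profilers can walk to produce a backtrace.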
Syscall Interface (Linux AArch64)
Syscall number: X8
Arguments: X0, X1, X2, X3, X4, X5
Return value: X0 (a value from -4095 to -1 means failure and holds -errno)
Instruction: SVC #0
Common syscall numbers (same as RISC-V Linux):
write: 64 (X8=64, X0=fd, X1=buf, X2=count)
read: 63 (X8=63, X0=fd, X1=buf, X2=count)
openat: 56 (X8=56, X0=dirfd (AT_FDCWD = -100 for the current directory), X1=path, X2=flags, X3=mode); AArch64 has no plain open syscall
close: 57 (X8=57, X0=fd)
exit: 93 (X8=93, X0=status; ends the calling thread, while exit_group, 94, ends the whole process)
mmap: 222 (X8=222, X0=addr, X1=length, X2=prot, X3=flags, X4=fd, X5=offset)
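A raw system call follows the table directly: put the number in X8 and the arguments in X0-X2, execute SVC #0, and read the result from X0. The sketch below uses GCC/Clang register variables and inline assembly on AArch64 Linux; `raw_write` is a hypothetical name. The non-AArch64 branch falls back to glibc's `syscall()` wrapper only so the example compiles elsewhere, and that wrapper reports errors as -1 with errno set rather than returning -errno.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

static long raw_write(int fd, const void *buf, size_t count) {
#if defined(__aarch64__) && defined(__linux__)
    register long x8 __asm__("x8") = 64;            /* write */
    register long x0 __asm__("x0") = fd;            /* arg 0, also the result */
    register long x1 __asm__("x1") = (long)buf;     /* arg 1 */
    register long x2 __asm__("x2") = (long)count;   /* arg 2 */
    __asm__ volatile("svc #0"
                     : "+r"(x0)
                     : "r"(x8), "r"(x1), "r"(x2)
                     : "memory");                   /* kernel reads buf */
    return x0;   /* >= 0: bytes written; -4095..-1: -errno */
#else
    /* Fallback for other hosts: returns -1 and sets errno on failure. */
    return syscall(SYS_write, fd, buf, count);
#endif
}
```

For example, `raw_write(1, "hi\n", 3)` prints to standard output; either path returns a negative value for an invalid descriptor.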