Appendix D: x86-64 Instruction Quick Reference

This appendix covers the most commonly used x86-64 instructions from this book. For complete encoding details, flag effects, and exception conditions, consult the Intel Software Developer's Manual (SDM) Volume 2.

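Latency and throughput values are for the Intel Skylake microarchitecture unless otherwise noted, and come from Agner Fog's instruction tables. L = latency (cycles); T = reciprocal throughput (cycles per instruction). Figures are for register operands; a memory source operand adds the L1 load latency (roughly 4-5 cycles).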
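A rough way to use the two columns: n instructions that each consume the previous result take at least about n × L cycles, while n independent instructions take at least about n × T cycles. The Python sketch below encodes that simplified model. It ignores port contention, front-end limits, and memory effects, and the function names are illustrative only.

```python
def dependent_chain_cycles(n: int, latency: float) -> float:
    """Lower bound when each instruction consumes the previous result."""
    return n * latency


def independent_cycles(n: int, recip_throughput: float) -> float:
    """Lower bound when the instructions have no data dependencies on each other."""
    return n * recip_throughput


# 1000 x `imul r64, r/m64`, using the Skylake figures from the Arithmetic table (L = 3, T = 1)
print(dependent_chain_cycles(1000, 3))  # 3000
print(independent_cycles(1000, 1))      # 1000
```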


Data Movement

Instruction Description L T Flags
mov r64, r/m64 Move 64-bit value 1 0.25 None
mov r/m64, imm32 Move sign-extended immediate 1 0.25 None
movabs r64, imm64 Move 64-bit immediate (REX.W B8+r) 1 0.25 None
movzx r64, r/m8 Move byte, zero-extend to 64 bits 1 0.25 None
movzx r64, r/m16 Move word, zero-extend to 64 bits 1 0.25 None
movsx r64, r/m8 Move byte, sign-extend to 64 bits 1 0.25 None
movsx r64, r/m16 Move word, sign-extend to 64 bits 1 0.25 None
movsxd r64, r/m32 Move dword, sign-extend to 64 bits 1 0.25 None
lea r64, m Load effective address (no memory access); base+index+displacement form: L 3, T 1 1 0.5 None
xchg r64, r/m64 Exchange two 64-bit values (always atomic with a memory operand: implicit LOCK) 1-23 1 None
push r/m64 Push onto stack; RSP -= 8 1-3 1 None
pop r/m64 Pop from stack; RSP += 8 1-3 1 None
pushfq Push RFLAGS onto stack 2 1 None
popfq Pop stack into RFLAGS 2 1 All
cmovcc r64, r/m64 Conditional move if condition cc true 1 0.5 None
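The movzx, movsx, and movsxd rules reduce to simple bit arithmetic. The Python sketch below models them on unsigned bit patterns; the helper names are illustrative and not part of any library. There is no movzx r64, r/m32, because a plain mov r32, r/m32 already zero-extends into the upper half of the register. zero_extend(x, 32) models that case.

```python
def zero_extend(value: int, from_bits: int) -> int:
    """movzx (and mov r32, r/m32): keep the low from_bits bits, clear the rest."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value: int, from_bits: int, to_bits: int = 64) -> int:
    """movsx / movsxd: copy the source's top bit into every higher bit.
    Returns the to_bits-wide result as an unsigned bit pattern."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):                      # top bit set: negative source
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


print(hex(zero_extend(0x80, 8)))   # 0x80                (movzx rax, byte 0x80)
print(hex(sign_extend(0x80, 8)))   # 0xffffffffffffff80  (movsx rax, byte 0x80)
```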

Arithmetic

Instruction Description L T Flags
add r/m64, r64 Add; dest += src 1 0.25 CF OF SF ZF PF AF
add r/m64, imm32 Add sign-extended immediate 1 0.25 CF OF SF ZF PF AF
sub r/m64, r64 Subtract; dest -= src 1 0.25 CF OF SF ZF PF AF
inc r/m64 Increment; dest += 1 1 0.25 OF SF ZF PF AF (not CF)
dec r/m64 Decrement; dest -= 1 1 0.25 OF SF ZF PF AF (not CF)
neg r/m64 Negate; dest = 0 - dest 1 0.25 CF OF SF ZF PF AF
mul r/m64 Unsigned multiply RDX:RAX = RAX × src 3 1 CF OF (ZF SF PF AF undefined)
imul r64, r/m64 Signed multiply; truncated result 3 1 CF OF (others undefined)
imul r64, r/m64, imm32 Signed multiply with immediate 3 1 CF OF
div r/m64 Unsigned divide RDX:RAX ÷ src; quotient → RAX, remainder → RDX (#DE if src = 0 or the quotient overflows) 35-90 21-74 All undefined
idiv r/m64 Signed divide RDX:RAX ÷ src (sign-extend RAX with cqo first) 35-90 21-74 All undefined
cqo Sign-extend RAX into RDX:RAX 1 0.5 None
adc r/m64, r64 Add with carry (dest += src + CF) 1 0.5 CF OF SF ZF PF AF
sbb r/m64, r64 Subtract with borrow (dest -= src + CF) 1 0.5 CF OF SF ZF PF AF
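adc exists so that carries can propagate across words: a 128-bit addition is an add on the low qwords followed by an adc on the high qwords. The Python sketch below models that pair, with CF as an explicit value. The function names are hypothetical, and the asm in the comments is only the usual pattern, not a fixed register assignment.

```python
MASK64 = (1 << 64) - 1


def add64(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """add (carry_in=0) or adc (carry_in=CF) on 64-bit values: returns (result, CF)."""
    total = a + b + carry_in
    return total & MASK64, total >> 64     # total < 2**65, so CF is 0 or 1


def add128(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple[int, int, int]:
    """Returns (low qword, high qword, final CF)."""
    lo, cf = add64(a_lo, b_lo)             # add lo_a, lo_b
    hi, cf = add64(a_hi, b_hi, cf)         # adc hi_a, hi_b  (consumes the low-half carry)
    return lo, hi, cf


print(add128(MASK64, 0, 1, 0))   # (0, 1, 0): the carry moved into the high qword
```

sbb chains the same way for multi-word subtraction, with CF acting as the borrow.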

Bitwise and Shift

Instruction Description L T Flags
and r/m64, r64 Bitwise AND 1 0.25 OF=0 CF=0 SF ZF PF
or r/m64, r64 Bitwise OR 1 0.25 OF=0 CF=0 SF ZF PF
xor r/m64, r64 Bitwise XOR 1 0.25 OF=0 CF=0 SF ZF PF
not r/m64 Bitwise NOT (one's complement) 1 0.25 None
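shl r/m64, cl Shift left by CL bits (count masked to 6 bits; CF = last bit shifted out) 2 2 CF SF ZF PF; OF only for 1-bit shifts; flags unchanged if count = 0
shl r/m64, imm8 Shift left by immediate 1 0.5 CF SF ZF PF; OF only for 1-bit shifts
shr r/m64, cl Logical shift right (zero fill) 2 2 CF SF ZF PF; OF only for 1-bit shifts
sar r/m64, cl Arithmetic shift right (sign fill) 2 2 CF SF ZF PF; OF only for 1-bit shifts
rol r/m64, cl Rotate left 1 0.5 CF; OF only for 1-bit rotates
ror r/m64, cl Rotate right 1 0.5 CF; OF only for 1-bit rotates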
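bsf r64, r/m64 Bit scan forward (index of lowest set bit) 3 1 ZF=1 if src=0 (dest then undefined)
bsr r64, r/m64 Bit scan reverse (index of highest set bit) 3 1 ZF=1 if src=0 (dest then undefined)
tzcnt r64, r/m64 Count trailing zeros (BMI1; executes as bsf on CPUs without BMI1) 3 1 CF=1 if src=0; ZF=1 if result=0
lzcnt r64, r/m64 Count leading zeros (LZCNT/ABM; executes as bsr on CPUs without it) 3 1 CF=1 if src=0; ZF=1 if result=0
popcnt r64, r/m64 Count set bits (POPCNT) 3 1 ZF=1 if src=0; CF OF SF AF PF = 0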
test r/m64, r64 AND without storing; sets flags 1 0.25 OF=0 CF=0 SF ZF PF
bt r/m64, r64 Bit test; CF = selected bit (register operand: offset mod 64; memory operand: offset may reach beyond m) 1 0.5 CF
bts r/m64, r64 Bit test and set 1 0.5 CF
btr r/m64, r64 Bit test and reset (clear) 1 0.5 CF
btc r/m64, r64 Bit test and complement 1 0.5 CF
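The main practical difference between bsf/bsr and tzcnt/lzcnt is the zero-input case. bsf leaves the destination undefined, while tzcnt returns the operand width and sets CF. The Python sketch below models the 64-bit forms and returns flags alongside the result. Representing bsf's undefined result as None is a modelling choice, not hardware behaviour.

```python
MASK64 = (1 << 64) - 1


def bsf64(x: int) -> tuple:
    """Returns (index, ZF). A zero source sets ZF; the destination is undefined (None here)."""
    x &= MASK64
    if x == 0:
        return None, 1
    return (x & -x).bit_length() - 1, 0     # x & -x isolates the lowest set bit


def tzcnt64(x: int) -> tuple[int, int, int]:
    """Returns (count, CF, ZF): CF=1 when the source is 0, ZF=1 when the count is 0."""
    x &= MASK64
    count = 64 if x == 0 else (x & -x).bit_length() - 1
    return count, int(x == 0), int(count == 0)


def lzcnt64(x: int) -> tuple[int, int, int]:
    """Returns (count, CF, ZF) with the same flag rules as tzcnt."""
    x &= MASK64
    count = 64 - x.bit_length()
    return count, int(x == 0), int(count == 0)


def popcnt64(x: int) -> int:
    return bin(x & MASK64).count("1")


print(bsf64(0), tzcnt64(0))   # (None, 1) (64, 1, 0)
```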

Comparison and Control Flow

Instruction Description L T Flags
cmp r/m64, r64 Compare (sub without storing); sets flags 1 0.25 CF OF SF ZF PF AF
cmp r/m64, imm32 Compare with sign-extended immediate 1 0.25 CF OF SF ZF PF AF
jmp rel32 Unconditional near jump 1 0.5 None
jmp r/m64 Indirect jump 1 0.5 None
je / jz rel32 Jump if equal / zero (ZF=1) 1 0.5 None
jne / jnz rel32 Jump if not equal / not zero (ZF=0) 1 0.5 None
jl / jnge rel32 Jump if less (signed: SF≠OF) 1 0.5 None
jle / jng rel32 Jump if less or equal (signed: ZF=1 or SF≠OF) 1 0.5 None
jg / jnle rel32 Jump if greater (signed: ZF=0 and SF=OF) 1 0.5 None
jge / jnl rel32 Jump if greater or equal (signed: SF=OF) 1 0.5 None
jb / jnae / jc rel32 Jump if below (unsigned: CF=1) 1 0.5 None
jbe / jna rel32 Jump if below or equal (unsigned: CF=1 or ZF=1) 1 0.5 None
ja / jnbe rel32 Jump if above (unsigned: CF=0 and ZF=0) 1 0.5 None
jae / jnb / jnc rel32 Jump if above or equal (unsigned: CF=0) 1 0.5 None
js rel32 Jump if sign (SF=1) 1 0.5 None
jns rel32 Jump if not sign (SF=0) 1 0.5 None
jo rel32 Jump if overflow (OF=1) 1 0.5 None
jno rel32 Jump if not overflow (OF=0) 1 0.5 None
jp / jpe rel32 Jump if parity even (PF=1) 1 0.5 None
jnp / jpo rel32 Jump if parity odd (PF=0) 1 0.5 None
loop rel8 Decrement RCX; jump if RCX ≠ 0 5 5 None
call rel32 Push return address, jump to target 1-3 1 None
call r/m64 Indirect call 1-3 1 None
ret Pop return address and jump 1-3 1 None
ret imm16 Pop return address, then RSP += imm16 (callee releases stack arguments) 1-3 1 None
setcc r/m8 Set byte to 1 if condition cc true, else 0 1 0.5 None
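The signed and unsigned condition codes above read the same cmp result differently. The Python sketch below computes CF, ZF, SF, and OF for `cmp a, b` on 64-bit bit patterns, then evaluates a few jcc conditions from the table. The names cmp_flags and CONDITIONS are illustrative.

```python
MASK64 = (1 << 64) - 1


def cmp_flags(a: int, b: int) -> dict[str, int]:
    """Flags from `cmp a, b` (computes a - b without storing it), 64-bit operands."""
    a &= MASK64
    b &= MASK64
    r = (a - b) & MASK64

    def sign(v: int) -> int:
        return v >> 63

    return {
        "CF": int(a < b),                                      # unsigned borrow
        "ZF": int(r == 0),
        "SF": sign(r),
        "OF": int(sign(a) != sign(b) and sign(r) != sign(a)),  # signed overflow
    }


CONDITIONS = {
    "l":  lambda f: f["SF"] != f["OF"],
    "le": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
    "g":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "ge": lambda f: f["SF"] == f["OF"],
    "b":  lambda f: f["CF"] == 1,
    "be": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "a":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "ae": lambda f: f["CF"] == 0,
}

# -1 vs 1: signed "less", but unsigned "above" (0xFFFF...FFFF > 1)
f = cmp_flags(-1, 1)
print(CONDITIONS["l"](f), CONDITIONS["a"](f))  # True True
```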

String and Repetition

Instruction Description L T Notes
rep movsb Copy RCX bytes from [RSI] to [RDI] varies - DF controls direction
rep movsq Copy RCX qwords from [RSI] to [RDI] varies -
rep stosb Fill RCX bytes at [RDI] with AL varies - Used for memset
rep stosq Fill RCX qwords at [RDI] with RAX varies -
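repe cmpsb Compare bytes at [RSI] and [RDI] while equal and RCX ≠ 0 varies - Flags reflect the last comparison
repne scasb Scan [RDI] for the first byte equal to AL (repeats while not equal) varies - Used for strlen (AL = 0)
cld Clear direction flag (DF=0, forward) 1 1 Modifies DF; System V ABI requires DF=0 on function entry and return
std Set direction flag (DF=1, backward) 1 1 Modifies DF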
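The classic strlen idiom built on repne scasb sets AL = 0 and RCX = -1, scans, and then recovers the length with not rcx / dec rcx. The scan consumes the terminating NUL as well, which is why the final decrement is needed. The Python sketch below models the instruction with DF=0 on a bytes object. The function names are hypothetical, and the input must be NUL-terminated.

```python
MASK64 = (1 << 64) - 1


def repne_scasb(memory: bytes, rdi: int, al: int, rcx: int) -> tuple[int, int, int]:
    """Model of `repne scasb` with DF=0. Returns (RDI, RCX, ZF) after the loop."""
    zf = 0
    while rcx != 0:
        zf = int(memory[rdi] == al)   # scasb: compare AL with the byte at [RDI]
        rdi += 1                      # DF=0: advance forward
        rcx = (rcx - 1) & MASK64
        if zf:                        # repne stops at the first *equal* byte
            break
    return rdi, rcx, zf


def strlen_via_scasb(memory: bytes, start: int) -> int:
    # xor eax, eax ; mov rcx, -1 ; repne scasb
    _, rcx, _ = repne_scasb(memory, start, 0, MASK64)
    rcx = ~rcx & MASK64               # not rcx -> bytes scanned, including the NUL
    return (rcx - 1) & MASK64         # dec rcx -> length without the NUL


print(strlen_via_scasb(b"hello\0", 0))  # 5
```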

Stack and Function Call Mechanics

Instruction Description Notes
enter imm16, 0 Create stack frame (slow; avoid) push rbp; mov rbp, rsp; sub rsp, imm16
leave Destroy stack frame mov rsp, rbp; pop rbp
syscall Fast system call to ring 0 RAX=number; saves RCX=RIP, R11=RFLAGS
sysret Return from kernel to ring 3 (ring 0 only) Restores RIP from RCX, RFLAGS from R11
int imm8 Software interrupt (INT 0x80 = Linux 32-bit ABI) Use syscall for 64-bit
iretq Interrupt return (64-bit) Pops RIP, CS, RFLAGS, RSP, SS
hlt Halt CPU until next interrupt Ring 0 only
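cli Disable maskable interrupts (IF=0) Requires CPL ≤ IOPL (ring 0 in practice)
sti Enable maskable interrupts (IF=1) Requires CPL ≤ IOPL; interrupts are recognized only after the next instruction
nop No operation (0x90; multi-byte form 0F 1F /0) Used for alignment padding; uses no execution port
pause Spin-wait loop hint Avoids the memory-order mis-speculation penalty on loop exit and saves power; up to ~140 cycles on Skylake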
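The enter/leave equivalences in the table can be checked with a toy stack model. In the Python sketch below, the Stack class and its methods are hypothetical. Memory is a dict of 8-byte slots keyed by address, RSP grows downward, and alignment is not checked.

```python
class Stack:
    """Toy model of RSP, RBP, and a downward-growing stack of 8-byte slots."""

    def __init__(self, rsp: int, rbp: int = 0) -> None:
        self.rsp, self.rbp = rsp, rbp
        self.mem: dict[int, int] = {}

    def push(self, value: int) -> None:   # RSP -= 8; [RSP] = value
        self.rsp -= 8
        self.mem[self.rsp] = value

    def pop(self) -> int:                 # value = [RSP]; RSP += 8
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

    def enter(self, size: int) -> None:   # enter size, 0
        self.push(self.rbp)               # push rbp
        self.rbp = self.rsp               # mov rbp, rsp
        self.rsp -= size                  # sub rsp, size

    def leave(self) -> None:              # leave
        self.rsp = self.rbp               # mov rsp, rbp
        self.rbp = self.pop()             # pop rbp


s = Stack(rsp=0x1000, rbp=0xDEAD)
s.enter(32)
print(hex(s.rbp), hex(s.rsp))   # 0xff8 0xfd8
s.leave()
print(hex(s.rbp), hex(s.rsp))   # 0xdead 0x1000
```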

Memory Ordering and Atomics

Instruction Description Notes
lock add [m], r Atomic add LOCK prefix makes RMW atomic
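lock cmpxchg [m], r Atomic compare-and-swap RAX=expected, r=new; if [m]=RAX then [m]←r and ZF=1, else RAX←[m] and ZF=0
lock xchg [m], r Atomic exchange LOCK is implicit for xchg with a memory operand; the explicit prefix is redundant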
lock inc [m] Atomic increment
mfence Full memory fence All loads/stores ordered
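sfence Store fence Orders stores; mainly needed for non-temporal (movnt*) and write-combining stores, since ordinary stores are already ordered
lfence Load fence Orders loads; later instructions wait until all prior ones complete locally (not fully serializing: buffered stores may still be pending)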
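xacquire (prefix) Hardware lock elision (HLE) acquire hint on a LOCK-prefixed instruction Part of TSX; deprecated by Intel; ignored on CPUs without HLE
xrelease (prefix) HLE release hint on a LOCK-prefixed instruction or a plain mov store Same status as xacquire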
rdtsc Read time-stamp counter → EDX:EAX Not serializing; use with lfence
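rdtscp Read TSC → EDX:EAX and IA32_TSC_AUX (processor ID) → ECX Waits for prior instructions to execute, but later ones may start before it; follow with lfence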
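The lock cmpxchg contract is easiest to see in a retry loop. On failure, RAX already holds the fresh value, so the loop does not need to reload. The Python sketch below is a single-process model in which a threading.Lock stands in for the atomicity that the LOCK prefix provides. ToyMemory and cas_increment are hypothetical names, and the model says nothing about hardware memory ordering.

```python
import threading

MASK64 = (1 << 64) - 1


class ToyMemory:
    """Qword-addressed memory; a Python lock stands in for the LOCK prefix."""

    def __init__(self) -> None:
        self.cells: dict[int, int] = {}
        self._bus = threading.Lock()

    def load(self, addr: int) -> int:
        return self.cells.get(addr, 0)

    def lock_cmpxchg(self, addr: int, rax: int, new: int) -> tuple[int, int]:
        """Model of `lock cmpxchg [addr], new`. Returns (ZF, RAX afterwards)."""
        with self._bus:
            current = self.cells.get(addr, 0)
            if current == rax:           # success: store the new value, ZF=1
                self.cells[addr] = new
                return 1, rax
            return 0, current            # failure: RAX receives [addr], ZF=0


def cas_increment(mem: ToyMemory, addr: int) -> int:
    """CAS retry loop with the same effect as `lock inc qword [addr]`; returns the new value."""
    rax = mem.load(addr)                 # plain load of the starting value
    while True:
        zf, rax_after = mem.lock_cmpxchg(addr, rax, (rax + 1) & MASK64)
        if zf:
            return (rax + 1) & MASK64
        rax = rax_after                  # cmpxchg already refreshed RAX


mem = ToyMemory()
print(cas_increment(mem, 0x10), cas_increment(mem, 0x10))  # 1 2
```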

System Control and Privilege

Instruction Description Privilege
lgdt [m] Load GDT register Ring 0
lidt [m] Load IDT register Ring 0
lldt r/m16 Load LDT selector Ring 0
ltr r/m16 Load task register (TSS) Ring 0
mov cr0, r64 Write CR0 (protected mode, paging enable) Ring 0
mov cr3, r64 Write CR3 (page table base address) Ring 0
mov cr4, r64 Write CR4 (PAE, SMEP, SMAP, etc.) Ring 0
rdmsr Read MSR (ECX=address) → EDX:EAX Ring 0
wrmsr Write MSR (ECX=address, EDX:EAX=value) Ring 0
cpuid Query CPU features (EAX=leaf, ECX=subleaf; fully serializing) Any
invlpg [m] Invalidate TLB entry for address Ring 0
invpcid r64, m128 Invalidate TLB entries by PCID (r64 = invalidation type, m128 = descriptor) Ring 0
endbr64 Valid indirect branch target (CET IBT) Any (nop without CET)
wrssd [m], r32 Write to shadow stack (CET SHSTK) Any, if WR_SHSTK_EN is set in IA32_U_CET (CPL 3) or IA32_S_CET (CPL < 3)
rstorssp [m] Restore shadow stack pointer from a restore token (CET SHSTK) Any
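rdmsr, wrmsr, and rdtsc move 64-bit values through the EDX:EAX pair. In 64-bit mode, rdmsr and rdtsc also clear the upper 32 bits of RAX and RDX. Software therefore typically joins the halves with shl rdx, 32 / or rax, rdx and splits a value the other way before wrmsr. The Python sketch below shows that arithmetic. The helper names are illustrative.

```python
MASK32 = (1 << 32) - 1


def join_edx_eax(edx: int, eax: int) -> int:
    """After rdmsr/rdtsc: value = (EDX << 32) | EAX   (shl rdx, 32 ; or rax, rdx)."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


def split_edx_eax(value: int) -> tuple[int, int]:
    """Before wrmsr: EDX = high 32 bits, EAX = low 32 bits."""
    return (value >> 32) & MASK32, value & MASK32


edx, eax = split_edx_eax(0x1234_5678_9ABC_DEF0)
print(hex(edx), hex(eax))   # 0x12345678 0x9abcdef0
```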

SIMD (SSE/AVX) — Commonly Used Instructions

Packed Floating-Point (SSE/AVX)

Instruction Description Width
movaps xmm, m128 Move aligned packed singles 4×f32 in 128b
movups xmm, m128 Move unaligned packed singles 4×f32 in 128b
vmovaps ymm, m256 Move aligned packed singles (AVX) 8×f32 in 256b
addps xmm, xmm/m128 Add packed singles 4 f32
vaddps ymm, ymm, ymm/m256 Add packed singles (AVX) 8 f32
mulps xmm, xmm/m128 Multiply packed singles 4 f32
vmulps ymm, ymm, ymm/m256 Multiply packed singles (AVX) 8 f32
divps xmm, xmm/m128 Divide packed singles 4 f32
sqrtps xmm, xmm/m128 Square root packed singles 4 f32
haddps xmm, xmm/m128 Horizontal add packed singles adjacent pairs
shufps xmm, xmm/m128, imm8 Shuffle packed singles 4 f32
vpermps ymm, ymm, ymm/m256 Permute packed singles (AVX2) 8 f32
vfmadd231ps ymm1, ymm2, ymm3/m256 Fused multiply-add ymm1 = ymm2*ymm3 + ymm1, single rounding (FMA) 8 f32
vcvttps2dq ymm, ymm/m256 Convert packed f32 to i32 (truncate) 8 values
vcvtdq2ps ymm, ymm/m256 Convert packed i32 to f32 8 values
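shufps is driven entirely by its imm8. Each 2-bit field selects a source lane. The two low result lanes come from the first (destination) operand and the two high lanes come from the second. The Python sketch below models one 128-bit shufps on lists of four floats, with lane 0 as the lowest 32 bits. The function name is illustrative.

```python
def shufps(dst: list[float], src: list[float], imm8: int) -> list[float]:
    """Model of `shufps dst, src, imm8` on one 128-bit vector (4 x f32)."""
    sel = [(imm8 >> (2 * i)) & 3 for i in range(4)]    # four 2-bit lane indices
    return [dst[sel[0]], dst[sel[1]], src[sel[2]], src[sel[3]]]


a = [0.0, 1.0, 2.0, 3.0]
b = [10.0, 11.0, 12.0, 13.0]
print(shufps(a, b, 0b00_01_10_11))   # [3.0, 2.0, 11.0, 10.0]
```

Passing the same register twice with imm8 = 0 broadcasts lane 0, a common idiom.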

Packed Integer (SSE/AVX2)

Instruction Description Width
paddb xmm, xmm/m128 Add packed bytes 16×i8
paddw xmm, xmm/m128 Add packed words 8×i16
paddd xmm, xmm/m128 Add packed dwords 4×i32
paddq xmm, xmm/m128 Add packed qwords 2×i64
vpaddq ymm, ymm, ymm/m256 Add packed qwords (AVX2) 4×i64
pcmpeqd xmm, xmm/m128 Compare packed dwords for equality 4×i32
vpcmpeqd ymm, ymm, ymm/m256 Compare packed dwords (AVX2) 8×i32
pmovmskb r32, xmm Gather the top bit of each byte into a mask (e.g. after a packed byte compare) 16 bits
vpmovmskb r32, ymm Extract byte mask (AVX2) 32 bits
pshufb xmm, xmm/m128 Shuffle bytes (SSSE3) 16 bytes
vpshufb ymm, ymm, ymm/m256 Shuffle bytes (AVX2) 32 bytes
pclmulqdq xmm, xmm/m128, imm8 Carry-less multiply of two imm8-selected qwords (PCLMULQDQ; used for GCM) 64×64 → 128 bits
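pshufb is a 16-entry byte table lookup. The first operand supplies the table and the second supplies one index per output byte; an index with its high bit set produces zero. The 256-bit vpshufb applies the same operation separately to each 128-bit lane and cannot move bytes across lanes. The Python sketch below models both. The function names are hypothetical.

```python
def pshufb(table: bytes, indices: bytes) -> bytes:
    """Model of `pshufb xmm1, xmm2/m128`: xmm1 is the table, xmm2 the indices."""
    assert len(table) == len(indices) == 16
    # High bit set -> 0; otherwise look up table[index & 0x0F].
    return bytes(0 if i & 0x80 else table[i & 0x0F] for i in indices)


def vpshufb256(table: bytes, indices: bytes) -> bytes:
    """AVX2 `vpshufb ymm`: pshufb applied independently to each 128-bit lane."""
    return pshufb(table[:16], indices[:16]) + pshufb(table[16:], indices[16:])


reverse = bytes(range(15, -1, -1))
print(pshufb(bytes(range(16)), reverse).hex())  # 0f0e0d0c0b0a09080706050403020100
```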

AES Instructions (AES-NI)

Instruction Description
aesenc xmm, xmm/m128 One AES encryption round
aesenclast xmm, xmm/m128 Final AES encryption round
aesdec xmm, xmm/m128 One AES decryption round
aesdeclast xmm, xmm/m128 Final AES decryption round
aeskeygenassist xmm, xmm/m128, imm8 AES key schedule assist
pclmulqdq xmm, xmm/m128, imm8 Carry-less multiply (GCM authentication)
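Carry-less multiplication is polynomial multiplication over GF(2): partial products are combined with XOR instead of addition, so no carries propagate. In pclmulqdq, imm8 bit 0 picks the qword of the first operand and bit 4 picks the qword of the second. The Python sketch below models that selection and the multiply. It does not include the bit reflection and modular reduction that GCM's GHASH also needs. The function names are illustrative.

```python
MASK64 = (1 << 64) - 1


def clmul(a: int, b: int) -> int:
    """Carry-less multiply: XOR together a shifted copy of a for each set bit of b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pclmulqdq(xmm1: int, xmm2: int, imm8: int) -> int:
    """Model of `pclmulqdq xmm1, xmm2, imm8` on 128-bit integers; 128-bit result."""
    a = (xmm1 >> 64) & MASK64 if imm8 & 0x01 else xmm1 & MASK64   # imm8[0]
    b = (xmm2 >> 64) & MASK64 if imm8 & 0x10 else xmm2 & MASK64   # imm8[4]
    return clmul(a, b)


print(clmul(0b11, 0b11))   # 5: (x + 1)^2 = x^2 + 1 over GF(2)
```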

REX Prefix Encoding

The REX prefix (0x40-0x4F) enables 64-bit operand size and access to extended registers:

Bit Name Effect
REX.W (bit 3) Wide 64-bit operand size
REX.R (bit 2) Register extend Extends the reg field of ModRM to access R8-R15, XMM8-XMM15
REX.X (bit 1) Index extend Extends the index field of SIB byte
REX.B (bit 0) Base extend Extends the r/m field of ModRM or the base field of SIB

Common REX prefix values:
- 40: REX with no bits set (used to access SPL, BPL, SIL, DIL)
- 48: REX.W (64-bit operation)
- 4C: REX.W + REX.R (64-bit, extended reg field)
- 49: REX.W + REX.B (64-bit, extended r/m field)
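To see how REX.R and REX.B combine with ModRM, the Python sketch below encodes register-to-register `mov r/m64, r64` (opcode REX.W 89 /r). Each register number 0-15 is split into a 3-bit ModRM field plus one REX extension bit. The helper names are hypothetical. Assemblers may instead emit the equivalent 8B /r form, but this sketch always uses 89 /r.

```python
def rex(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """REX byte layout: 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def encode_mov_rm64_r64(dst: int, src: int) -> bytes:
    """`mov dst, src` with register numbers 0-15 (RAX=0 ... RDI=7, R8=8 ... R15=15).
    src goes in ModRM.reg (extended by REX.R); dst in ModRM.r/m (extended by REX.B)."""
    prefix = rex(w=1, r=src >> 3, b=dst >> 3)
    modrm = 0b11_000_000 | ((src & 7) << 3) | (dst & 7)   # mod=11: register operand
    return bytes([prefix, 0x89, modrm])


RAX, R8 = 0, 8
print(encode_mov_rm64_r64(RAX, R8).hex(" "))   # 4c 89 c0  (mov rax, r8)
print(encode_mov_rm64_r64(R8, RAX).hex(" "))   # 49 89 c0  (mov r8, rax)
```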


Register Quick Reference

64-bit 32-bit 16-bit 8-bit (high) 8-bit (low) ABI Role (System V AMD64)
RAX EAX AX AH AL Return value; caller-saved
RBX EBX BX BH BL Callee-saved
RCX ECX CX CH CL 4th arg; caller-saved
RDX EDX DX DH DL 3rd arg; caller-saved
RSI ESI SI - SIL 2nd arg; caller-saved
RDI EDI DI - DIL 1st arg; caller-saved
RSP ESP SP - SPL Stack pointer
RBP EBP BP - BPL Frame pointer; callee-saved
R8 R8D R8W - R8B 5th arg; caller-saved
R9 R9D R9W - R9B 6th arg; caller-saved
R10 R10D R10W - R10B Caller-saved; syscall arg 4
R11 R11D R11W - R11B Caller-saved; syscall clobbers
R12 R12D R12W - R12B Callee-saved
R13 R13D R13W - R13B Callee-saved
R14 R14D R14W - R14B Callee-saved
R15 R15D R15W - R15B Callee-saved
RIP EIP IP - - Instruction pointer
RFLAGS EFLAGS FLAGS - - Condition flags
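The columns above alias one physical register, but writes behave differently by width. A 32-bit write zero-extends into bits 63:32. A 16-bit or 8-bit write leaves the other bits untouched, and AH/BH/CH/DH address bits 15:8. The Python sketch below models the resulting 64-bit value. The function name is illustrative.

```python
MASK64 = (1 << 64) - 1


def write_subregister(old: int, value: int, width: int, high_byte: bool = False) -> int:
    """New 64-bit register value after writing `value` through a width-bit alias."""
    if high_byte and width != 8:
        raise ValueError("high-byte aliases (AH, BH, CH, DH) are 8 bits wide")
    if width == 64:
        return value & MASK64
    if width == 32:
        return value & 0xFFFF_FFFF               # e.g. mov eax, ...: bits 63:32 cleared
    shift = 8 if high_byte else 0
    mask = ((1 << width) - 1) << shift
    return (old & ~mask & MASK64) | ((value << shift) & mask)   # other bits preserved


rax = 0x1122_3344_5566_7788
print(hex(write_subregister(rax, 0xAABBCCDD, 32)))        # 0xaabbccdd          (mov eax, ...)
print(hex(write_subregister(rax, 0xFF, 8, high_byte=True)))  # 0x112233445566ff88  (mov ah, ...)
```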