This appendix covers the most commonly used x86-64 instructions from this book. For complete encoding details, flag effects, and exception conditions, consult the Intel Software Developer's Manual (SDM) Volume 2.
Latency and throughput values are for the Intel Skylake microarchitecture unless otherwise noted, and are taken from Agner Fog's instruction tables. L = latency (cycles); T = reciprocal throughput (cycles per instruction). Figures are for register operands; forms with a memory operand add the load latency.
Data Movement
| Instruction | Description | L | T | Flags |
|---|---|---|---|---|
| mov r64, r/m64 | Move 64-bit value | 1 | 0.25 | None |
| mov r/m64, imm32 | Move sign-extended 32-bit immediate | 1 | 0.25 | None |
| movabs r64, imm64 | Move 64-bit immediate (REX.W B8+r) | 1 | 0.25 | None |
| movzx r64, r/m8 | Move byte, zero-extend to 64 bits | 1 | 0.25 | None |
| movzx r64, r/m16 | Move word, zero-extend to 64 bits | 1 | 0.25 | None |
| movsx r64, r/m8 | Move byte, sign-extend to 64 bits | 1 | 0.25 | None |
| movsx r64, r/m16 | Move word, sign-extend to 64 bits | 1 | 0.25 | None |
| movsxd r64, r/m32 | Move dword, sign-extend to 64 bits (there is no movzx r64, r/m32: mov r32, r/m32 already zero-extends) | 1 | 0.25 | None |
| lea r64, m | Load effective address (no memory access); 3-component forms have L 3, T 1 | 1 | 0.5 | None |
| xchg r64, r/m64 | Exchange two 64-bit values (implicitly locked, hence atomic, when one operand is memory) | 1-23 | 1 | None |
| push r/m64 | Push onto stack; RSP -= 8 | 1-3 | 1 | None |
| pop r/m64 | Pop from stack; RSP += 8 | 1-3 | 1 | None |
| pushfq | Push RFLAGS onto stack | 2 | 1 | None |
| popfq | Pop stack into RFLAGS | 2 | 1 | All |
| cmovcc r64, r/m64 | Conditional move if condition cc is true | 1 | 0.5 | None (reads flags) |
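Zero extension, sign extension, and LEA's address arithmetic are easy to get wrong by one bit. The following is a minimal Python sketch that models what movzx, movsx/movsxd and lea compute on a 64-bit register, assuming only the semantics in the table above. The helper names (`zero_extend`, `sign_extend`, `lea`) are hypothetical and not part of any library.

```python
MASK64 = (1 << 64) - 1

def zero_extend(value: int, bits: int) -> int:
    """movzx: keep the low `bits` bits; every bit above them becomes 0."""
    return value & ((1 << bits) - 1)

def sign_extend(value: int, bits: int) -> int:
    """movsx / movsxd: copy bit (bits - 1) into every higher bit of the 64-bit result."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK64   # xor-then-subtract sign-extension trick

def lea(base: int, index: int, scale: int, disp: int) -> int:
    """lea r64, [base + index*scale + disp]: address arithmetic modulo 2**64, no memory access."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("SIB scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64
```

For example, `lea(rax, rax, 4, 0)` yields 5 × RAX. This is why compilers emit `lea rax, [rax+rax*4]` for small constant multiplications.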
Arithmetic
| Instruction | Description | L | T | Flags |
|---|---|---|---|---|
| add r/m64, r64 | Add; dest += src | 1 | 0.25 | CF OF SF ZF PF AF |
| add r/m64, imm32 | Add sign-extended immediate | 1 | 0.25 | CF OF SF ZF PF AF |
| sub r/m64, r64 | Subtract; dest -= src | 1 | 0.25 | CF OF SF ZF PF AF |
| inc r/m64 | Increment; dest += 1 | 1 | 0.25 | OF SF ZF PF AF (not CF) |
| dec r/m64 | Decrement; dest -= 1 | 1 | 0.25 | OF SF ZF PF AF (not CF) |
| neg r/m64 | Negate; dest = 0 - dest | 1 | 0.25 | CF (1 unless dest was 0) OF SF ZF PF AF |
| mul r/m64 | Unsigned multiply: RDX:RAX = RAX × src | 3 | 1 | CF OF (SF ZF PF AF undefined) |
| imul r64, r/m64 | Signed multiply; result truncated to 64 bits | 3 | 1 | CF OF (others undefined) |
| imul r64, r/m64, imm32 | Signed multiply by sign-extended immediate | 3 | 1 | CF OF (others undefined) |
| div r/m64 | Unsigned divide RDX:RAX ÷ src; quotient → RAX, remainder → RDX (#DE if src = 0 or the quotient overflows) | 35-90 | 21-74 | All undefined |
| idiv r/m64 | Signed divide RDX:RAX ÷ src; quotient truncated toward zero | 35-90 | 21-74 | All undefined |
| cqo | Sign-extend RAX into RDX:RAX (set-up for idiv) | 1 | 0.25 | None |
| adc r/m64, r64 | Add with carry (dest += src + CF) | 1 | 0.5 | CF OF SF ZF PF AF |
| sbb r/m64, r64 | Subtract with borrow (dest -= src + CF) | 1 | 0.5 | CF OF SF ZF PF AF |
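The following hedged Python model makes the RDX:RAX conventions concrete. It covers mul, div, and a 128-bit add built from an add/adc carry chain. It illustrates the register semantics only. The helper names are hypothetical, and Python exceptions stand in for the #DE fault.

```python
MASK64 = (1 << 64) - 1

def mul64(rax: int, src: int):
    """mul r/m64: full 128-bit unsigned product. Returns (rdx, rax, cf_of)."""
    product = rax * src
    rdx, lo = product >> 64, product & MASK64
    return rdx, lo, rdx != 0          # CF = OF = 1 when the high half is nonzero

def div64(rdx: int, rax: int, src: int):
    """div r/m64: RDX:RAX / src. Returns (quotient for RAX, remainder for RDX)."""
    if src == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    quotient, remainder = divmod((rdx << 64) | rax, src)
    if quotient > MASK64:
        raise OverflowError("#DE: quotient does not fit in RAX")
    return quotient, remainder

def add128(a_lo: int, a_hi: int, b_lo: int, b_hi: int):
    """128-bit add as an add/adc pair: the low add sets CF, adc folds it into the high half."""
    total = a_lo + b_lo
    lo, cf = total & MASK64, total >> 64
    hi = (a_hi + b_hi + cf) & MASK64
    return lo, hi
```

Python's `divmod` rounds toward negative infinity. That matches div only because the operands here are unsigned. A model of idiv would need to truncate toward zero instead.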
Bitwise and Shift
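| Instruction | Description | L | T | Flags |
|---|---|---|---|---|
| and r/m64, r64 | Bitwise AND | 1 | 0.25 | OF=0 CF=0 SF ZF PF |
| or r/m64, r64 | Bitwise OR | 1 | 0.25 | OF=0 CF=0 SF ZF PF |
| xor r/m64, r64 | Bitwise XOR | 1 | 0.25 | OF=0 CF=0 SF ZF PF |
| not r/m64 | Bitwise NOT (one's complement) | 1 | 0.25 | None |
| shl r/m64, cl | Shift left by CL bits (count masked to 6 bits) | 1 | 0.5 | CF (last bit shifted out) SF ZF PF; OF only if count = 1; unchanged if count = 0 |
| shl r/m64, imm8 | Shift left by immediate | 1 | 0.5 | As for shl r/m64, cl |
| shr r/m64, cl | Logical shift right (zero fill) | 1 | 0.5 | As for shl r/m64, cl |
| sar r/m64, cl | Arithmetic shift right (sign fill) | 1 | 0.5 | As for shl r/m64, cl |
| rol r/m64, cl | Rotate left | 1 | 0.5 | CF; OF only if count = 1; unchanged if count = 0 |
| ror r/m64, cl | Rotate right | 1 | 0.5 | CF; OF only if count = 1; unchanged if count = 0 |
| bsf r64, r/m64 | Bit scan forward (index of lowest set bit) | 3 | 1 | ZF=1 if src = 0 (dest undefined); others undefined |
| bsr r64, r/m64 | Bit scan reverse (index of highest set bit) | 3 | 1 | ZF=1 if src = 0 (dest undefined); others undefined |
| tzcnt r64, r/m64 | Count trailing zeros (BMI1) | 3 | 1 | CF=1 if src = 0; ZF=1 if result = 0 |
| lzcnt r64, r/m64 | Count leading zeros (LZCNT) | 3 | 1 | CF=1 if src = 0; ZF=1 if result = 0 |
| popcnt r64, r/m64 | Count set bits (POPCNT) | 3 | 1 | ZF=1 if src = 0; CF OF SF AF PF = 0 |
| test r/m64, r64 | AND without storing; sets flags | 1 | 0.25 | OF=0 CF=0 SF ZF PF |
| bt r/m64, r64 | Bit test; CF = selected bit | 1 | 0.5 | CF |
| bts r/m64, r64 | Bit test and set | 1 | 0.5 | CF |
| btr r/m64, r64 | Bit test and reset (clear) | 1 | 0.5 | CF |
| btc r/m64, r64 | Bit test and complement | 1 | 0.5 | CF |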
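The main practical difference between bsf/bsr and tzcnt/lzcnt is how they handle a zero input. The CL-count shifts also mask their count. The following small Python sketch captures these rules, assuming 64-bit operands. The function names are illustrative only, and `None` stands in for the undefined destination of bsf/bsr.

```python
MASK64 = (1 << 64) - 1

def shl64(x: int, count: int) -> int:
    """shl r/m64, cl: the count is masked to 6 bits, so a count of 64 shifts by 0."""
    count &= 0x3F
    return (x << count) & MASK64

def tzcnt64(x: int) -> int:
    """tzcnt: number of trailing zero bits; 64 (with CF=1) when x == 0."""
    x &= MASK64
    if x == 0:
        return 64
    return (x & -x).bit_length() - 1   # x & -x isolates the lowest set bit

def lzcnt64(x: int) -> int:
    """lzcnt: number of leading zero bits; 64 (with CF=1) when x == 0."""
    return 64 - (x & MASK64).bit_length()

def bsf64(x: int):
    """bsf: index of the lowest set bit; None models the undefined destination (ZF=1)."""
    return None if (x & MASK64) == 0 else tzcnt64(x)

def bsr64(x: int):
    """bsr: index of the highest set bit; None models the undefined destination (ZF=1)."""
    x &= MASK64
    return None if x == 0 else x.bit_length() - 1

def popcnt64(x: int) -> int:
    """popcnt: number of set bits."""
    return bin(x & MASK64).count("1")
```

For nonzero x, bsr(x) = 63 - lzcnt(x). On a CPU without LZCNT, the lzcnt encoding (F3 0F BD) executes as bsr. The two can therefore disagree silently unless the CPUID feature bit is checked.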
Comparison and Control Flow
| Instruction | Description | L | T | Flags |
|---|---|---|---|---|
| cmp r/m64, r64 | Compare (sub without storing); sets flags | 1 | 0.25 | CF OF SF ZF PF AF |
| cmp r/m64, imm32 | Compare with sign-extended immediate | 1 | 0.25 | CF OF SF ZF PF AF |
| jmp rel32 | Unconditional near jump | 1 | 0.5 | None |
| jmp r/m64 | Indirect jump | 1 | 0.5 | None |
| je / jz rel32 | Jump if equal / zero (ZF=1) | 1 | 0.5 | None |
| jne / jnz rel32 | Jump if not equal / not zero (ZF=0) | 1 | 0.5 | None |
| jl / jnge rel32 | Jump if less (signed: SF≠OF) | 1 | 0.5 | None |
| jle / jng rel32 | Jump if less or equal (signed: ZF=1 or SF≠OF) | 1 | 0.5 | None |
| jg / jnle rel32 | Jump if greater (signed: ZF=0 and SF=OF) | 1 | 0.5 | None |
| jge / jnl rel32 | Jump if greater or equal (signed: SF=OF) | 1 | 0.5 | None |
| jb / jnae / jc rel32 | Jump if below (unsigned: CF=1) | 1 | 0.5 | None |
| jbe / jna rel32 | Jump if below or equal (unsigned: CF=1 or ZF=1) | 1 | 0.5 | None |
| ja / jnbe rel32 | Jump if above (unsigned: CF=0 and ZF=0) | 1 | 0.5 | None |
| jae / jnb / jnc rel32 | Jump if above or equal (unsigned: CF=0) | 1 | 0.5 | None |
| js rel32 | Jump if sign (SF=1) | 1 | 0.5 | None |
| jns rel32 | Jump if not sign (SF=0) | 1 | 0.5 | None |
| jo rel32 | Jump if overflow (OF=1) | 1 | 0.5 | None |
| jno rel32 | Jump if not overflow (OF=0) | 1 | 0.5 | None |
| jp / jpe rel32 | Jump if parity even (PF=1) | 1 | 0.5 | None |
| jnp / jpo rel32 | Jump if parity odd (PF=0) | 1 | 0.5 | None |
| loop rel8 | Decrement RCX; jump if RCX ≠ 0 | 5 | 5 | None |
| call rel32 | Push return address, jump to target | 1-3 | 1 | None |
| call r/m64 | Indirect call | 1-3 | 1 | None |
| ret | Pop return address and jump | 1-3 | 1 | None |
| ret imm16 | Pop return address, then add imm16 to RSP | 1-3 | 1 | None |
| setcc r/m8 | Set byte to 1 if condition cc is true, else 0 | 1 | 0.5 | None |
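The signed and unsigned condition codes differ only in which flags they read. The following hedged Python sketch assumes 64-bit operands. It computes the flags `cmp a, b` would set and evaluates several of the jcc conditions above against them. `cmp_flags` and `taken` are hypothetical names.

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63

def cmp_flags(a: int, b: int) -> dict:
    """Flags set by `cmp a, b`: compute a - b modulo 2**64 and discard the result."""
    a &= MASK64
    b &= MASK64
    r = (a - b) & MASK64
    return {
        "CF": a < b,                             # unsigned borrow
        "ZF": r == 0,
        "SF": bool(r & SIGN),
        "OF": bool((a ^ b) & (a ^ r) & SIGN),    # signs differ and result sign differs from a
    }

CONDITIONS = {
    "e":  lambda f: f["ZF"],
    "ne": lambda f: not f["ZF"],
    "l":  lambda f: f["SF"] != f["OF"],
    "le": lambda f: f["ZF"] or f["SF"] != f["OF"],
    "g":  lambda f: not f["ZF"] and f["SF"] == f["OF"],
    "ge": lambda f: f["SF"] == f["OF"],
    "b":  lambda f: f["CF"],
    "be": lambda f: f["CF"] or f["ZF"],
    "a":  lambda f: not f["CF"] and not f["ZF"],
    "ae": lambda f: not f["CF"],
}

def taken(cc: str, a: int, b: int) -> bool:
    """Would `cmp a, b` followed by `j<cc>` take the branch?"""
    return bool(CONDITIONS[cc](cmp_flags(a, b)))
```

The same bit pattern compares differently depending on the condition: -1 is less than 1 for jl, but above 1 for ja.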
String and Repetition
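| Instruction | Description | L | T | Notes |
|---|---|---|---|---|
| rep movsb | Copy RCX bytes from [RSI] to [RDI] | varies | - | Used for memcpy; DF controls direction (for all string instructions) |
| rep movsq | Copy RCX qwords from [RSI] to [RDI] | varies | - | - |
| rep stosb | Fill RCX bytes at [RDI] with AL | varies | - | Used for memset |
| rep stosq | Fill RCX qwords at [RDI] with RAX | varies | - | - |
| repe cmpsb | Compare bytes at [RSI] and [RDI] while equal (at most RCX bytes) | varies | - | Flags reflect the last comparison |
| repne scasb | Scan [RDI] for the first byte equal to AL (at most RCX bytes) | varies | - | Used for strlen (AL = 0) |
| cld | Clear direction flag (DF=0, forward) | 1 | 1 | Modifies DF only |
| std | Set direction flag (DF=1, backward) | 1 | 1 | Modifies DF only |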
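repne scasb stops on the first byte that equals AL. RCX counts every byte examined, including the match. The following Python sketch models that loop (DF = 0 only) and the classic strlen idiom built on it. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def repne_scasb(mem: bytes, rdi: int, al: int, rcx: int):
    """Model of `repne scasb` with DF = 0.

    Each iteration compares AL with mem[rdi], increments RDI, decrements RCX,
    and stops when the bytes were equal (ZF=1) or RCX reaches 0.
    Returns (rdi, rcx, zf). If RCX starts at 0 the real instruction leaves the
    flags unchanged; this model simply reports zf=False in that case.
    """
    zf = False
    while rcx != 0:
        zf = mem[rdi] == al
        rdi += 1
        rcx = (rcx - 1) & MASK64
        if zf:
            break
    return rdi, rcx, zf

def strlen_scasb(mem: bytes, start: int) -> int:
    """mov rcx, -1 ; xor eax, eax ; repne scasb ; not rcx ; dec rcx"""
    _, rcx, _ = repne_scasb(mem, start, 0, MASK64)
    return (~rcx & MASK64) - 1
```

`not rcx; dec rcx` recovers the length for the following reason. RCX starts at -1 and is decremented once per byte, terminator included, so `not rcx` yields length + 1.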
Stack and Function Call Mechanics
| Instruction | Description | Notes |
|---|---|---|
| enter imm16, 0 | Create stack frame (slow; avoid) | Equivalent to push rbp; mov rbp, rsp; sub rsp, imm16 |
| leave | Destroy stack frame | Equivalent to mov rsp, rbp; pop rbp |
| syscall | Fast system call into ring 0 | RAX = syscall number; RCX ← return RIP, R11 ← RFLAGS |
| sysret | Return from kernel to ring 3 | Restores RIP from RCX, RFLAGS from R11; ring 0 only |
| int imm8 | Software interrupt (int 0x80 = Linux 32-bit ABI) | Use syscall for 64-bit code |
| iretq | Interrupt return (64-bit) | Pops RIP, CS, RFLAGS, RSP, SS in that order |
| hlt | Halt CPU until next interrupt | Ring 0 only |
| cli | Disable maskable interrupts (IF=0) | Requires CPL ≤ IOPL (in practice ring 0) |
| sti | Enable maskable interrupts (IF=1) | Requires CPL ≤ IOPL (in practice ring 0) |
| nop | No operation (0x90) | Used for padding and alignment; multi-byte forms use 0F 1F /0 |
| pause | Spin-wait loop hint | Reduces power use and the memory-order penalty when the loop exits |
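All of the stack instructions above reduce to pointer arithmetic on RSP plus a memory access. The following toy Python model uses a hypothetical `Stack` class with memory represented as a dictionary. It traces call, the push rbp / mov rbp, rsp / sub rsp prologue, leave, and ret. It assumes the System V rule that RSP is 16-byte aligned at the call instruction.

```python
class Stack:
    """Toy model of the x86-64 stack (hypothetical): grows downward in 8-byte slots,
    with memory represented as a dict from address to value."""

    def __init__(self, rsp: int):
        self.rsp = rsp
        self.rbp = 0
        self.mem = {}

    def push(self, value: int) -> None:   # push: RSP -= 8, then store at [RSP]
        self.rsp -= 8
        self.mem[self.rsp] = value

    def pop(self) -> int:                 # pop: load from [RSP], then RSP += 8
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

    def call(self, return_address: int) -> None:
        self.push(return_address)         # call pushes the address of the next instruction

    def ret(self, imm16: int = 0) -> int:
        rip = self.pop()                  # ret pops the return address,
        self.rsp += imm16                 # then ret imm16 releases imm16 more bytes
        return rip

    def prologue(self, frame_size: int) -> None:
        self.push(self.rbp)               # push rbp
        self.rbp = self.rsp               # mov rbp, rsp
        self.rsp -= frame_size            # sub rsp, frame_size

    def leave(self) -> None:
        self.rsp = self.rbp               # mov rsp, rbp
        self.rbp = self.pop()             # pop rbp
```

After `call`, RSP is 8 bytes off 16-byte alignment. Pushing RBP and reserving a multiple of 16 bytes restores alignment, which is what allows the function to make calls of its own.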
Memory Ordering and Atomics
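| Instruction | Description | Notes |
|---|---|---|
| lock add [m], r | Atomic add | LOCK prefix makes the read-modify-write atomic |
| lock cmpxchg [m], r | Atomic compare-and-swap | RAX = expected, r = new; ZF=1 on success, otherwise RAX ← [m] |
| lock xchg [m], r | Atomic exchange | LOCK is implicit for xchg with a memory operand; the prefix is redundant |
| lock inc [m] | Atomic increment | Does not modify CF (like inc) |
| mfence | Full memory fence | Orders all earlier loads and stores before all later ones |
| sfence | Store fence | Orders earlier stores before later stores; needed mainly after non-temporal (movnt*) stores |
| lfence | Load fence | Later instructions do not begin executing until earlier ones complete locally; not fully serializing |
| xacquire (with lock) | Hardware lock elision (HLE) acquire hint | Part of Intel TSX; unavailable on many recent CPUs |
| xrelease (with lock) | Hardware lock elision (HLE) release hint | Part of Intel TSX; unavailable on many recent CPUs |
| rdtsc | Read time-stamp counter → EDX:EAX | Not serializing; pair with lfence |
| rdtscp | Read TSC → EDX:EAX and IA32_TSC_AUX (processor ID) → ECX | Waits for earlier instructions to execute; later ones may start before it |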
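The following is a single-threaded Python model of the register protocol of lock cmpxchg, plus the usual compare-and-swap retry loop built on it. The model is not atomic; on hardware, the LOCK prefix provides atomicity. `cmpxchg` and `fetch_max` are hypothetical names used only for illustration.

```python
def cmpxchg(mem: dict, addr: int, rax: int, new: int):
    """Model of `lock cmpxchg [addr], new` with RAX = expected.

    Returns (zf, rax). On success memory gets `new`, ZF=1 and RAX is unchanged;
    on failure memory keeps its value, ZF=0 and RAX receives the current value.
    """
    current = mem[addr]
    if current == rax:
        mem[addr] = new
        return True, rax
    return False, current

def fetch_max(mem: dict, addr: int, value: int) -> int:
    """Hypothetical atomic max written as a CAS retry loop; returns the old value."""
    expected = mem[addr]                      # plain load of the starting value
    while True:
        desired = max(expected, value)
        ok, expected = cmpxchg(mem, addr, expected, desired)
        if ok:
            return expected
```

A failed cmpxchg already loads the current value into RAX, so the retry loop does not need a separate reload before trying again.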
System Control and Privilege
| Instruction | Description | Privilege |
|---|---|---|
| lgdt [m] | Load GDT register | Ring 0 |
| lidt [m] | Load IDT register | Ring 0 |
| lldt r/m16 | Load LDT selector | Ring 0 |
| ltr r/m16 | Load task register (TSS) | Ring 0 |
| mov cr0, r64 | Write CR0 (protected mode, paging enable) | Ring 0 |
| mov cr3, r64 | Write CR3 (page table base address) | Ring 0 |
| mov cr4, r64 | Write CR4 (PAE, SMEP, SMAP, etc.) | Ring 0 |
| rdmsr | Read MSR (ECX = address) → EDX:EAX | Ring 0 |
| wrmsr | Write MSR (ECX = address, EDX:EAX = value) | Ring 0 |
| cpuid | Query CPU features (EAX = leaf, ECX = subleaf) | Any |
| invlpg [m] | Invalidate TLB entry for address | Ring 0 |
| invpcid r64, m128 | Invalidate TLB entries by PCID (r64 = type, m128 = descriptor) | Ring 0 |
| endbr64 | Marks a valid indirect branch target (CET IBT) | Any (executes as nop without CET) |
| wrssd [m], r32 | Write to shadow stack (CET SHSTK); wrssq for 64-bit | Any, if WRSS is enabled |
| rstorssp [m] | Restore shadow stack pointer (CET SHSTK) | Any |
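rdmsr, wrmsr, and rdtsc all move 64-bit values through the EDX:EAX register pair. The following small Python sketch shows the split and join. The join is equivalent to the `shl rdx, 32` / `or rax, rdx` sequence typically used after rdtsc. The helper names are hypothetical.

```python
MASK32 = (1 << 32) - 1

def join_edx_eax(edx: int, eax: int) -> int:
    """Combine the EDX (high 32 bits) and EAX (low 32 bits) halves from rdtsc/rdmsr."""
    return ((edx & MASK32) << 32) | (eax & MASK32)

def split_edx_eax(value: int):
    """Split a 64-bit value into (EDX, EAX) as wrmsr expects; ECX separately selects the MSR."""
    return (value >> 32) & MASK32, value & MASK32
```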
SIMD (SSE/AVX) — Commonly Used Instructions
Packed Floating-Point (SSE/AVX)
| Instruction | Description | Width |
|---|---|---|
| movaps xmm, m128 | Move aligned packed singles (16-byte alignment required) | 4×f32 (128-bit) |
| movups xmm, m128 | Move unaligned packed singles | 4×f32 (128-bit) |
| vmovaps ymm, m256 | Move aligned packed singles (AVX; 32-byte alignment required) | 8×f32 (256-bit) |
| addps xmm, xmm/m128 | Add packed singles | 4×f32 |
| vaddps ymm, ymm, ymm/m256 | Add packed singles (AVX) | 8×f32 |
| mulps xmm, xmm/m128 | Multiply packed singles | 4×f32 |
| vmulps ymm, ymm, ymm/m256 | Multiply packed singles (AVX) | 8×f32 |
| divps xmm, xmm/m128 | Divide packed singles | 4×f32 |
| sqrtps xmm, xmm/m128 | Square root of packed singles | 4×f32 |
| haddps xmm, xmm/m128 | Horizontal add of adjacent pairs | 4×f32 |
| shufps xmm, xmm/m128, imm8 | Shuffle packed singles (imm8 selects lanes) | 4×f32 |
| vpermps ymm, ymm, ymm/m256 | Permute packed singles across lanes (AVX2) | 8×f32 |
| vfmadd231ps ymm, ymm, ymm | Fused multiply-add: dest = src2 × src3 + dest (FMA3) | 8×f32 |
| vcvttps2dq ymm, ymm/m256 | Convert packed f32 to i32 (truncate) | 8×f32 → 8×i32 |
| vcvtdq2ps ymm, ymm/m256 | Convert packed i32 to f32 | 8×i32 → 8×f32 |
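The lane-selection rules of shufps, haddps, and vfmadd231ps are easiest to check on concrete values. The following Python sketch uses lists as vectors, with index 0 as the lowest lane, and assumes the 128-bit forms of shufps and haddps. It does not model floating-point rounding, so the single rounding of vfmadd231ps is not reproduced. The names are hypothetical.

```python
def shufps(dst, src, imm8):
    """shufps xmm1, xmm2/m128, imm8 on 4-lane lists (index 0 = lowest lane).

    Each 2-bit field of imm8 picks a lane: fields 0-1 index dst, fields 2-3 index src.
    """
    sel = [(imm8 >> (2 * i)) & 3 for i in range(4)]
    return [dst[sel[0]], dst[sel[1]], src[sel[2]], src[sel[3]]]

def haddps(dst, src):
    """haddps xmm1, xmm2/m128: sums of adjacent pairs; dst's sums fill the low lanes."""
    return [dst[0] + dst[1], dst[2] + dst[3], src[0] + src[1], src[2] + src[3]]

def vfmadd231ps(dst, src2, src3):
    """vfmadd231ps: dst = src2 * src3 + dst in every lane.

    Hardware rounds once; Python rounds the product and the sum separately.
    """
    return [b * c + a for a, b, c in zip(dst, src2, src3)]
```

The 256-bit vhaddps applies the same pairing rule independently within each 128-bit half.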
Packed Integer (SSE/AVX2)
| Instruction | Description | Width |
|---|---|---|
| paddb xmm, xmm/m128 | Add packed bytes | 16×i8 |
| paddw xmm, xmm/m128 | Add packed words | 8×i16 |
| paddd xmm, xmm/m128 | Add packed dwords | 4×i32 |
| paddq xmm, xmm/m128 | Add packed qwords | 2×i64 |
| vpaddq ymm, ymm, ymm/m256 | Add packed qwords (AVX2) | 4×i64 |
| pcmpeqd xmm, xmm/m128 | Compare packed dwords for equality | 4×i32 |
| vpcmpeqd ymm, ymm, ymm/m256 | Compare packed dwords for equality (AVX2) | 8×i32 |
| pmovmskb r32, xmm | Gather the top bit of each byte into a bitmask | 16-bit mask |
| vpmovmskb r32, ymm | Gather the top bit of each byte into a bitmask (AVX2) | 32-bit mask |
| pshufb xmm, xmm/m128 | Shuffle bytes by index (SSSE3) | 16 bytes |
| vpshufb ymm, ymm, ymm/m256 | Shuffle bytes by index within each 128-bit lane (AVX2) | 32 bytes |
| pclmulqdq xmm, xmm/m128, imm8 | Carry-less multiply (PCLMUL), used for GCM | 64×64 → 128-bit |
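Combining pcmpeqb (the byte-sized sibling of pcmpeqd) with pmovmskb is the standard SIMD idiom for finding a byte, as in strlen or memchr. The following Python model shows one 16-byte step. The helper names are hypothetical, and the final `mask & -mask` step plays the role of tzcnt/bsf on the mask.

```python
def pcmpeqb(a: bytes, b: bytes) -> bytes:
    """pcmpeqb: 0xFF in each byte lane where a == b, otherwise 0x00."""
    return bytes(0xFF if x == y else 0x00 for x, y in zip(a, b))

def pmovmskb(v: bytes) -> int:
    """pmovmskb: bit i of the result is the most significant bit of byte i."""
    return sum(((byte >> 7) & 1) << i for i, byte in enumerate(v))

def find_byte_16(block: bytes, needle: int):
    """One 16-byte step of a SIMD memchr: compare, build the mask, take its lowest set bit."""
    if len(block) != 16:
        raise ValueError("expected a 16-byte block")
    mask = pmovmskb(pcmpeqb(block, bytes([needle]) * 16))   # needle broadcast to all lanes
    return None if mask == 0 else (mask & -mask).bit_length() - 1
```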
AES Instructions (AES-NI)
| Instruction | Description |
|---|---|
| aesenc xmm, xmm/m128 | One AES encryption round |
| aesenclast xmm, xmm/m128 | Final AES encryption round |
| aesdec xmm, xmm/m128 | One AES decryption round |
| aesdeclast xmm, xmm/m128 | Final AES decryption round |
| aeskeygenassist xmm, xmm/m128, imm8 | AES key schedule assist |
| pclmulqdq xmm, xmm/m128, imm8 | Carry-less multiply (GCM authentication) |
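pclmulqdq multiplies polynomials over GF(2). Partial products are combined with XOR instead of addition, so no carries propagate. The following Python sketch shows that operation and how imm8 selects the qwords: bit 0 selects the qword of the first operand, and bit 4 selects the qword of the second. The helper names are hypothetical, and operands are modeled as (low, high) qword pairs.

```python
def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values (up to 127 bits): partial products are XORed."""
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i
    return result

def pclmulqdq(x, y, imm8: int) -> int:
    """x and y are (low_qword, high_qword) pairs; imm8 bit 0 picks x's qword, bit 4 picks y's."""
    return clmul64(x[imm8 & 1], y[(imm8 >> 4) & 1])
```

For example, clmul64(0b11, 0b11) is 0b101 rather than 9, because (x + 1)² = x² + 1 when coefficients are taken mod 2.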
REX Prefix Encoding
The REX prefix (0x40-0x4F) enables 64-bit operand size and access to extended registers:
| Bit | Name | Effect |
|---|---|---|
| REX.W (bit 3) | Wide | 64-bit operand size |
| REX.R (bit 2) | Register extend | Extends the reg field of ModRM to access R8-R15, XMM8-XMM15 |
| REX.X (bit 1) | Index extend | Extends the index field of the SIB byte |
| REX.B (bit 0) | Base extend | Extends the r/m field of ModRM or the base field of SIB |
Common REX prefix values:
- 40: REX (no bits set — used to access SPL, BPL, SIL, DIL)
- 48: REX.W (64-bit operation)
- 4C: REX.W + REX.R (64-bit, extended reg field)
- 49: REX.W + REX.B (64-bit, extended r/m field)
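The REX bits combine with the 3-bit ModRM fields to form 4-bit register numbers. The following Python sketch assembles the prefix, opcode, and ModRM bytes for a register-direct `mov r/m64, r64` (opcode 89 /r). `rex` and `encode_mov_rm64_r64` are hypothetical helpers, not an assembler API.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def rex(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """REX byte layout: 0100 W R X B."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b

def encode_mov_rm64_r64(dst: str, src: str) -> bytes:
    """Encode `mov dst, src` (both 64-bit registers) as MOV r/m64, r64 (opcode 89 /r)."""
    reg, rm = REGS.index(src), REGS.index(dst)   # src goes in ModRM.reg, dst in ModRM.rm
    prefix = rex(w=1, r=reg >> 3, b=rm >> 3)     # bit 3 of each register number moves into REX
    modrm = 0xC0 | ((reg & 7) << 3) | (rm & 7)   # mod = 11: register-direct operand
    return bytes([prefix, 0x89, modrm])
```

For instance, mov rax, rbx encodes as 48 89 D8. mov r8, rax needs REX.B and encodes as 49 89 C0. mov rax, r9 needs REX.R and encodes as 4C 89 C8.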
Register Quick Reference
| 64-bit | 32-bit | 16-bit | 8-bit (high) | 8-bit (low) | ABI role (System V AMD64) |
|---|---|---|---|---|---|
| RAX | EAX | AX | AH | AL | Return value; syscall number; caller-saved |
| RBX | EBX | BX | BH | BL | Callee-saved |
| RCX | ECX | CX | CH | CL | 4th arg; caller-saved; clobbered by syscall |
| RDX | EDX | DX | DH | DL | 3rd arg; caller-saved |
| RSI | ESI | SI | - | SIL | 2nd arg; caller-saved |
| RDI | EDI | DI | - | DIL | 1st arg; caller-saved |
| RSP | ESP | SP | - | SPL | Stack pointer |
| RBP | EBP | BP | - | BPL | Frame pointer; callee-saved |
| R8 | R8D | R8W | - | R8B | 5th arg; caller-saved |
| R9 | R9D | R9W | - | R9B | 6th arg; caller-saved |
| R10 | R10D | R10W | - | R10B | Caller-saved; 4th syscall arg (in place of RCX) |
| R11 | R11D | R11W | - | R11B | Caller-saved; clobbered by syscall (receives RFLAGS) |
| R12 | R12D | R12W | - | R12B | Callee-saved |
| R13 | R13D | R13W | - | R13B | Callee-saved |
| R14 | R14D | R14W | - | R14B | Callee-saved |
| R15 | R15D | R15W | - | R15B | Callee-saved |
| RIP | - | - | - | - | Instruction pointer |
| RFLAGS | EFLAGS | FLAGS | - | - | Condition flags |
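Writes to the partial registers in the table follow two different rules. A 32-bit write zeroes bits 63:32, while 8-bit and 16-bit writes leave the rest of the register intact. The following Python sketch illustrates this. `write_subreg` is a hypothetical helper.

```python
MASK64 = (1 << 64) - 1

def write_subreg(old: int, value: int, bits: int, high8: bool = False) -> int:
    """Return the new 64-bit register value after writing `value` to a sub-register.

    bits=64/32/16/8 selects e.g. RAX/EAX/AX/AL; high8=True (with bits=8) selects AH.
    """
    if bits == 64:
        return value & MASK64
    if bits == 32:
        # 32-bit destinations are zero-extended: bits 63:32 are cleared.
        return value & 0xFFFFFFFF
    if bits == 8 and high8:
        # AH/BH/CH/DH occupy bits 15:8; everything else is preserved.
        return (old & ~0xFF00) | ((value & 0xFF) << 8)
    if bits in (8, 16):
        # 8- and 16-bit writes merge into the existing value.
        mask = (1 << bits) - 1
        return (old & ~mask) | (value & mask)
    raise ValueError("bits must be 8, 16, 32 or 64")
```

This is why `xor eax, eax` clears all of RAX, and why `mov r32, r/m32` serves as the zero-extending load that movzx lacks.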