Chapter 8 Key Takeaways: Data Movement and Addressing Modes
-
MOV is typically the most common instruction in a binary. Its job is copying a value from source to destination. The two operands cannot both be memory references: a memory-to-memory copy requires an intermediate register.
-
Writing to a 32-bit register (EAX, EBX, …) automatically zeroes the upper 32 bits of the full 64-bit register. Writing to 16-bit (AX) or 8-bit (AL, AH) registers does NOT zero the upper bits. This asymmetry is a major source of partial-register bugs.
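The asymmetry can be made concrete with a small C model. This is only a sketch: a uint64_t stands in for RAX, and the helper names (write32, write16, write8_low, write8_high, check_partial_writes) are invented for illustration. No real register is touched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit register is represented by a uint64_t. */

/* Models "mov eax, v": a 32-bit write clears bits 63..32. */
static uint64_t write32(uint64_t reg, uint32_t v) {
    (void)reg;              /* the old contents are discarded entirely */
    return (uint64_t)v;
}

/* Models "mov ax, v": a 16-bit write preserves bits 63..16. */
static uint64_t write16(uint64_t reg, uint16_t v) {
    return (reg & ~UINT64_C(0xFFFF)) | v;
}

/* Models "mov al, v": an 8-bit write preserves bits 63..8. */
static uint64_t write8_low(uint64_t reg, uint8_t v) {
    return (reg & ~UINT64_C(0xFF)) | v;
}

/* Models "mov ah, v": replaces bits 15..8 and preserves everything else. */
static uint64_t write8_high(uint64_t reg, uint8_t v) {
    return (reg & ~UINT64_C(0xFF00)) | ((uint64_t)v << 8);
}

/* Starts from a register full of stale 1-bits and applies each write. */
void check_partial_writes(void) {
    const uint64_t stale = UINT64_C(0xFFFFFFFFFFFFFFFF);
    assert(write32(stale, 0x12345678u) == UINT64_C(0x0000000012345678));
    assert(write16(stale, 0x1234)      == UINT64_C(0xFFFFFFFFFFFF1234));
    assert(write8_low(stale, 0x12)     == UINT64_C(0xFFFFFFFFFFFFFF12));
    assert(write8_high(stale, 0x12)    == UINT64_C(0xFFFFFFFFFFFF12FF));
}
```

Only the 32-bit case returns a value with no stale upper bits, which is why code that writes AX or AL and then reads RAX can see leftover data.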
-
The general addressing mode form is [base + index×scale + displacement]. The scale must be 1, 2, 4, or 8; no other values can be encoded. When an element size is not one of those values (for example, a 12-byte struct), compilers use a byte-offset pointer rather than a scaled index.
-
LEA computes an address without accessing memory. It exploits the addressing mode hardware for arithmetic: lea rax, [rbx + rbx*4] sets rax = rbx*5 with no memory access, no flags modification, and 1-cycle latency.
-
LEA can multiply by 3, 5, or 9 in one instruction using [base + base*2], [base + base*4], or [base + base*8]. Combinations with SHL extend this to 6, 10, 12, 18, etc.
-
MOVZX zero-extends; MOVSX sign-extends. Use MOVZX for uint8_t/uint16_t loads into wider registers; use MOVSX for int8_t/int16_t. Using a plain MOV into a narrow register when the full register must be clean leaves stale upper bits behind, which is a common bug.
-
MOVSXD is the 32-to-64 sign extension instruction. It is separate from MOVSX because the 32-bit-to-64-bit case needs a different encoding. Use it when loading a signed 32-bit value into a 64-bit register.
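The extension rules map directly onto C integer conversions. The sketch below uses hypothetical helper names (widen_u8, widen_s8, widen_s32, widen_u32, check_extensions). The instruction named in each comment is what x86-64 compilers typically emit for that conversion; this is an expectation worth confirming in a disassembly, not something the C standard guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned narrow value to 64 bits: typically MOVZX. */
static uint64_t widen_u8(uint8_t v)  { return v; }

/* Signed narrow value to 64 bits: typically MOVSX. */
static int64_t widen_s8(int8_t v)    { return v; }

/* Signed 32-bit value to 64 bits: typically MOVSXD. */
static int64_t widen_s32(int32_t v)  { return v; }

/* Unsigned 32-bit value to 64 bits: typically a plain 32-bit MOV,
   because writing a 32-bit register already zeroes the upper half. */
static uint64_t widen_u32(uint32_t v) { return v; }

/* Same bit pattern going in, different 64-bit results coming out. */
void check_extensions(void) {
    assert(widen_u8(0xFF) == UINT64_C(255));
    assert(widen_s8((int8_t)-1) == INT64_C(-1));
    assert((uint64_t)widen_s8((int8_t)-1) == UINT64_C(0xFFFFFFFFFFFFFFFF));
    assert(widen_s32(INT32_MIN) == -INT64_C(2147483648));
    assert(widen_u32(0x80000000u) == UINT64_C(0x0000000080000000));
}
```

The byte 0xFF becomes 255 when zero-extended and -1 when sign-extended, so choosing the wrong instruction silently changes the value.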
-
RIP-relative addressing ([rel label]) is the standard way to access static data in 64-bit position-independent code. The address is encoded as a 32-bit signed offset from the next instruction's RIP, resolved at link time.
-
XCHG with a memory operand is always atomic — it carries an implicit LOCK prefix. This makes it useful as a simple test-and-set mutex, but also means it is slower than two MOVs for non-atomic swaps.
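A test-and-set lock of this kind can be sketched in C11, where atomic_exchange plays the role of XCHG with a memory operand. The type and function names (tas_lock, tas_init, tas_try_lock, tas_acquire, tas_unlock, check_tas_lock) are invented for this example. That x86-64 compilers lower atomic_exchange to XCHG is typical behavior, assumed here rather than guaranteed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef struct { atomic_int held; } tas_lock;

static void tas_init(tas_lock *l) { atomic_init(&l->held, 0); }

/* Atomically store 1 and fetch the previous value. The caller owns the
   lock only if the previous value was 0. */
static bool tas_try_lock(tas_lock *l) {
    return atomic_exchange(&l->held, 1) == 0;
}

/* Spin until the exchange observes a free lock. */
static void tas_acquire(tas_lock *l) {
    while (!tas_try_lock(l)) {
        /* busy-wait; a production lock would back off or yield here */
    }
}

static void tas_unlock(tas_lock *l) { atomic_store(&l->held, 0); }

/* Single-threaded walk through the lock states. */
void check_tas_lock(void) {
    tas_lock l;
    tas_init(&l);
    assert(tas_try_lock(&l));    /* free -> held: succeeds */
    assert(!tas_try_lock(&l));   /* already held: fails */
    tas_unlock(&l);
    tas_acquire(&l);             /* free again, so this returns at once */
    assert(!tas_try_lock(&l));
    tas_unlock(&l);
}
```

Every failed attempt still performs an atomic read-modify-write, which illustrates the cost noted above: the exchange is never as cheap as a plain load and store.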
-
Scaling in an addressing mode is free in hardware. [rbx + rcx*8] costs no more than [rbx + rcx] in terms of address generation latency on modern processors (1 cycle for both on Haswell and later).
-
Compilers prefer LEA over IMUL for multiplication by small constants when the multiplier can be expressed as 1 + 2^k (k ≤ 3) or decomposed into LEA + SHL chains. The threshold: if LEA achieves it in ≤ 2 instructions at ≤ 2 cycles, use LEA; otherwise use IMUL.
-
In AT&T syntax (GDB's default), the operand order is reversed (source first, destination last) and addresses are written as disp(base, index, scale). The command set disassembly-flavor intel in GDB switches to Intel syntax, which matches NASM.
-
The addressing mode [rdi + rcx*8] is the canonical array access pattern for 8-byte (64-bit) elements. For 4-byte: [rdi + rcx*4]. For 2-byte: [rdi + rcx*2]. For 1-byte: [rdi + rcx].
-
Struct field access in assembly is always [pointer + compile_time_offset]. The offsets can be computed with offsetof() in C or by inspection of the struct definition. Padding and alignment rules determine the exact offsets.
-
The real performance cost in data movement is cache misses, not addressing mode complexity. A simple [rbx] that misses L3 cache costs 200+ cycles. A complex [rbx + rcx*8 + 32] that hits L1 cache costs roughly 4 to 5 cycles. Optimize data layout before optimizing addressing modes.
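The addressing takeaways in this list (the base + index×scale + displacement form, LEA as arithmetic, array indexing, and struct field offsets) can be checked against one small C model. This is a sketch under stated assumptions: effective_address, struct record, and check_addressing are hypothetical names, and the offsets in the comments are what common ABIs produce, not guaranteed values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the hardware computation base + index*scale + disp.
   Only the encodable scales 1, 2, 4, and 8 are accepted. */
static uintptr_t effective_address(uintptr_t base, uintptr_t index,
                                   unsigned scale, int32_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uintptr_t)(intptr_t)disp;
}

/* Hypothetical struct, used only to show padding-driven offsets. */
struct record {
    uint8_t  tag;    /* offset 0 */
    uint32_t count;  /* typically offset 4, after 3 bytes of padding */
    uint64_t value;  /* typically offset 8 */
};

void check_addressing(void) {
    /* Array of 8-byte elements: [rdi + rcx*8] with rcx = 3. */
    uint64_t table[4] = {10, 20, 30, 40};
    assert(effective_address((uintptr_t)table, 3, 8, 0)
           == (uintptr_t)&table[3]);

    /* lea rax, [rbx + rbx*4] with rbx = 7 yields 7 * 5. */
    assert(effective_address(7, 7, 4, 0) == 35);

    /* Struct field: [pointer + compile_time_offset]. */
    struct record r = {1, 2, 3};
    int32_t off = (int32_t)offsetof(struct record, value);
    assert(effective_address((uintptr_t)&r, 0, 1, off)
           == (uintptr_t)&r.value);
}
```

The same function covers all three uses because the hardware does not distinguish them. Whether the result is dereferenced (MOV) or kept as a number (LEA) is decided by the instruction, not by the addressing mode.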