Chapter 8 Key Takeaways: Data Movement and Addressing Modes

  1. MOV is typically the most common instruction in a compiled x86-64 binary. Its job is copying a value from a source operand to a destination operand. At most one of the two operands may be a memory reference, so a memory-to-memory copy requires an intermediate register.

  2. Writing to a 32-bit register (EAX, EBX, …) automatically zeroes the upper 32 bits of the full 64-bit register. Writing to a 16-bit (AX) or 8-bit (AL, AH) register does NOT: the remaining bits keep their old values. This asymmetry is a major source of partial-register bugs (stale upper bits), and the merge also makes the result depend on the register's previous value.

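  3. The general addressing mode form is [base + index×scale + displacement]. The scale must be 1, 2, 4, or 8; no other values are encodable. When an element size is anything else (a 12-byte or 16-byte struct, for example), compilers either compute index×size separately with LEA, SHL, or IMUL and use the result as the index, or strength-reduce the loop to a pointer that advances by the element size on each iteration.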

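  4. LEA computes an address without accessing memory. It reuses the address-generation hardware for arithmetic: lea rax, [rbx + rbx*4] sets rax = rbx*5 with no memory access and no change to the flags. Two-component forms like this one typically have 1-cycle latency on recent cores; three-component forms (base + index + displacement) can be slower on some Intel microarchitectures.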

  5. LEA can multiply by 3, 5, or 9 in one instruction using [base + base*2], [base + base*4], [base + base*8]. Combined with SHL, this extends to 6, 10, 12, 18, and so on (6 = 3×2, 12 = 3×4); chaining two LEAs reaches products such as 15 = 3×5 and 45 = 9×5.

  6. MOVZX zero-extends; MOVSX sign-extends. Use MOVZX for uint8_t/uint16_t loads into wider registers; use MOVSX for int8_t/int16_t. Loading with a plain MOV into a narrow register (e.g. mov al, [rdi]) and then using the full register is a common bug: the upper bits still hold whatever was there before.

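  7. MOVSXD is the 32-to-64-bit sign extension instruction. It has its own mnemonic and opcode (REX.W + 63 /r) rather than being another MOVSX form, because MOVSX's opcodes only cover 8-bit and 16-bit sources. Use it when loading a signed 32-bit value into a 64-bit register. The unsigned case needs no special instruction: a plain 32-bit MOV already zero-extends (see takeaway 2).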

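  8. RIP-relative addressing ([rel label] in NASM) is the standard way to access static data in 64-bit position-independent code. The operand is encoded as a signed 32-bit displacement from the address of the next instruction (the value RIP holds while the instruction executes), so the target must lie within ±2 GiB. The assembler fills in the displacement when the label is in the same section; otherwise the linker resolves it from a relocation.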

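  9. XCHG with a memory operand is always atomic: it behaves as if it carried a LOCK prefix even when none is written. This makes it a natural building block for a simple test-and-set spinlock, but it also makes XCHG a full memory barrier, much slower than ordinary MOVs through a scratch register when all you need is a non-atomic swap.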

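  10. The scale in addressing modes is effectively free: on modern x86 processors, [rbx + rcx*8] costs no more to compute than [rbx + rcx]. What can matter is whether an index register is present at all. On several Intel generations (Sandy Bridge through Skylake, for example), a load of the simple [base + small displacement] form can hit L1 in 4 cycles, while indexed forms take about 5.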

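  11. Compilers prefer LEA over IMUL for multiplication by small constants when the multiplier can be expressed as 1 + 2^k (k ≤ 3, giving 2, 3, 5, 9) or decomposed into a short LEA + SHL or LEA + LEA chain. The trade-off is latency: a 64-bit IMUL takes about 3 cycles on modern x86 cores, so a replacement sequence is worthwhile only when it is no slower, which in practice means about two instructions at most. The exact threshold depends on the compiler and on the CPU it is tuning for (e.g. -mtune).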

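  12. In AT&T syntax (GDB's default disassembly flavor), the operand order is reversed (source first, destination last), registers carry a % prefix, immediates carry a $ prefix, and memory operands are written as disp(base, index, scale). The GDB command set disassembly-flavor intel switches to Intel syntax, which is close to NASM's (GDB prints size specifiers as QWORD PTR where NASM writes qword).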

  13. The addressing mode [rdi + rcx*8] is the canonical array access pattern for 8-byte (64-bit) elements. For 4-byte: [rdi + rcx*4]. For 2-byte: [rdi + rcx*2]. For 1-byte: [rdi + rcx].

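  14. A struct field access in assembly is [pointer + compile_time_offset]; a field at offset 0 is simply [pointer], and a field of an array element adds an index term, [base + index×scale + offset], when the struct size is a valid scale. The offsets can be computed with the offsetof() macro in C (from <stddef.h>) or by working through the struct definition by hand; padding and alignment rules determine the exact offsets.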

  15. The real performance cost in data movement is cache misses, not addressing mode complexity. A simple [rbx] that misses every cache level and goes to DRAM often costs 200 or more cycles; a complex [rbx + rcx*8 + 32] that hits L1 costs roughly 4 to 5 cycles. Optimize data layout before optimizing addressing modes.
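
The register-write asymmetry in takeaway 2 can be modeled in C. The sketch below is a hypothetical model, assuming a uint64_t stands in for RAX; the helper names write32, write16, write8_low, and write8_high are illustrative, not a real CPU or compiler API.

```c
#include <stdint.h>

/* Hypothetical model: `reg` stands in for the full 64-bit RAX. */

/* mov eax, v : a 32-bit write zero-extends into bits 63..32. */
static uint64_t write32(uint64_t reg, uint32_t v) {
    (void)reg;                 /* the old contents are discarded entirely */
    return (uint64_t)v;
}

/* mov ax, v : a 16-bit write replaces bits 15..0 and keeps bits 63..16. */
static uint64_t write16(uint64_t reg, uint16_t v) {
    return (reg & ~(uint64_t)0xFFFF) | v;
}

/* mov al, v : an 8-bit low write replaces bits 7..0 and keeps the rest. */
static uint64_t write8_low(uint64_t reg, uint8_t v) {
    return (reg & ~(uint64_t)0xFF) | v;
}

/* mov ah, v : an 8-bit high write replaces bits 15..8 and keeps the rest. */
static uint64_t write8_high(uint64_t reg, uint8_t v) {
    return (reg & ~(uint64_t)0xFF00) | ((uint64_t)v << 8);
}
```

Starting from an all-ones register, only write32 clears the upper half; the narrower writes leave the stale 1 bits in place.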
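
For takeaway 3, here is one common way to index an array whose element size (12 bytes) is not a legal scale. The struct vec3 and the helpers vec3_offset and load_x are hypothetical, and the assembly in the comment shows a typical compiler pattern, not guaranteed output: an LEA computes i×3, and the load's addressing mode supplies the remaining ×4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 12-byte element: 12 is not one of the legal scales 1, 2, 4, 8. */
struct vec3 { int32_t x, y, z; };

/* Byte offset of element i, split the way compilers commonly do it:
       lea  rax, [rsi + rsi*2]    ; rax = i*3
       mov  eax, [rdi + rax*4]    ; load arr[i].x at base + (i*3)*4     */
static size_t vec3_offset(size_t i) {
    size_t i3 = i + i * 2;   /* the LEA step */
    return i3 * 4;           /* the *4 scale in the addressing mode */
}

/* Load arr[i].x through explicit byte arithmetic, mirroring the asm above. */
static int32_t load_x(const struct vec3 *arr, size_t i) {
    const unsigned char *base = (const unsigned char *)arr;
    return *(const int32_t *)(base + vec3_offset(i));
}
```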
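
Takeaways 4 and 5 describe LEA as a multiply-by-small-constant tool. The C sketch below checks the arithmetic identities behind those decompositions; the helper names (mul3, mul6, mul45, …) are hypothetical, and the assembly in each comment is illustrative, assuming the input is in rbx. A compiler may choose a different sequence for the same C expression.

```c
#include <stdint.h>

/* Unsigned arithmetic wraps modulo 2^64, matching the hardware registers. */

static uint64_t mul3(uint64_t x) { return x + x * 2; }   /* lea rax, [rbx + rbx*2] */
static uint64_t mul5(uint64_t x) { return x + x * 4; }   /* lea rax, [rbx + rbx*4] */
static uint64_t mul9(uint64_t x) { return x + x * 8; }   /* lea rax, [rbx + rbx*8] */

/* LEA followed by SHL. */
static uint64_t mul6(uint64_t x)  { return mul3(x) << 1; }  /* lea (x3), shl rax, 1 */
static uint64_t mul10(uint64_t x) { return mul5(x) << 1; }  /* lea (x5), shl rax, 1 */
static uint64_t mul12(uint64_t x) { return mul3(x) << 2; }  /* lea (x3), shl rax, 2 */
static uint64_t mul18(uint64_t x) { return mul9(x) << 1; }  /* lea (x9), shl rax, 1 */

/* Two chained LEAs: 45 = 9 * 5.
       lea rax, [rbx + rbx*8]
       lea rax, [rax + rax*4]                                   */
static uint64_t mul45(uint64_t x) { return mul5(mul9(x)); }
```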
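
Takeaway 6 maps directly onto C integer conversions. In the sketch below, the load_* helpers are hypothetical; the comments show the MOVZX/MOVSX forms compilers typically emit for them, not guaranteed output. The stale_load_u8 helper models the bug of loading a byte with a plain mov al, [rdi] into a register that already held other data.

```c
#include <stdint.h>

/* Unsigned narrow loads: typically MOVZX. */
static uint64_t load_u8(const uint8_t *p)   { return *p; }  /* movzx eax, byte [rdi] */
static uint64_t load_u16(const uint16_t *p) { return *p; }  /* movzx eax, word [rdi] */

/* Signed narrow loads: typically MOVSX. */
static int64_t load_i8(const int8_t *p)     { return *p; }  /* movsx rax, byte [rdi] */
static int64_t load_i16(const int16_t *p)   { return *p; }  /* movsx rax, word [rdi] */

/* Hypothetical model of the bug: mov al, [rdi] replaces only bits 7..0,
   so bits 63..8 still hold the register's previous contents. */
static uint64_t stale_load_u8(uint64_t old_rax, const uint8_t *p) {
    return (old_rax & ~(uint64_t)0xFF) | *p;
}
```

The same byte 0xF0 becomes 240 through MOVZX but -16 through MOVSX when the memory holds an int8_t, which is why the choice has to follow the source type.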
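
For takeaway 7, the 32-to-64-bit case looks like this in C. The helpers widen_signed and widen_unsigned are hypothetical, and the comments show the instructions compilers typically emit for them on x86-64.

```c
#include <stdint.h>

/* int32_t -> int64_t: typically MOVSXD, e.g.  movsxd rax, dword [rdi] */
static int64_t widen_signed(const int32_t *p) { return *p; }

/* uint32_t -> uint64_t: no MOVZX form takes a 32-bit source; a plain
   mov eax, dword [rdi] already zero-extends into RAX (takeaway 2). */
static uint64_t widen_unsigned(const uint32_t *p) { return *p; }
```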
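
The displacement arithmetic behind takeaway 8 can be sketched as follows. The helpers rip_rel32 and rip_rel_target are hypothetical names, and the 7-byte instruction length is an assumption matching mov rax, [rel label] (REX.W, opcode, ModRM, 4-byte displacement). The key detail is that the displacement is measured from the end of the instruction, not its start.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the rel32 an assembler or linker would encode for a RIP-relative
   operand. Returns false if the target is outside the signed 32-bit range. */
static bool rip_rel32(uint64_t insn_addr, unsigned insn_len,
                      uint64_t target, int32_t *disp_out) {
    uint64_t next_rip = insn_addr + insn_len;        /* RIP while executing */
    int64_t delta = (int64_t)(target - next_rip);    /* wraps to a signed difference */
    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;                                /* beyond +/-2 GiB */
    *disp_out = (int32_t)delta;
    return true;
}

/* What the CPU does at run time: next-instruction address plus displacement. */
static uint64_t rip_rel_target(uint64_t insn_addr, unsigned insn_len, int32_t disp) {
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For an instruction at 0x401000 referencing data at 0x404010, the encoded displacement is 0x404010 - 0x401007 = 0x3009.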
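
Takeaway 9's test-and-set spinlock can be written portably with C11 atomics. This is a minimal sketch with hypothetical names (spinlock, spin_lock, spin_trylock, spin_unlock); on x86-64, GCC and Clang typically compile atomic_exchange to an XCHG with a memory operand, which is implicitly locked, and the release store to a plain MOV.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int locked; } spinlock;   /* 0 = free, 1 = held */

static void spin_init(spinlock *l) { atomic_init(&l->locked, 0); }

/* One test-and-set: the lock was acquired iff the old value was 0. */
static bool spin_trylock(spinlock *l) {
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static void spin_lock(spinlock *l) {
    while (!spin_trylock(l)) {
        /* Wait with plain loads so waiters do not keep issuing locked XCHGs
           against the same cache line. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0) {
        }
    }
}

static void spin_unlock(spinlock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The inner read-only loop is a design choice, not a requirement: it keeps the expensive atomic exchange off the hot path while the lock is held by someone else.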
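
The element-size-to-scale mapping in takeaway 13 can be checked from C. The helper elem_addr is hypothetical; it computes base + index × scale exactly as the addressing mode does, and the comments pair each element type with the corresponding operand.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte address of element i: base + index * scale, scale = element size.
     int64_t  -> [rdi + rcx*8]
     int32_t  -> [rdi + rcx*4]
     int16_t  -> [rdi + rcx*2]
     int8_t   -> [rdi + rcx]                                  */
static uintptr_t elem_addr(const void *base, size_t i, size_t scale) {
    return (uintptr_t)base + i * scale;
}
```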
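
For takeaway 14, offsetof() makes the padding visible. The struct record below is a hypothetical example, and the offsets in its comments assume the usual x86-64 ABIs, where uint64_t is 8-byte aligned; get_flags reads a field through the same pointer-plus-constant-offset arithmetic the assembly uses.

```c
#include <stddef.h>
#include <stdint.h>

struct record {
    uint8_t  tag;     /* offset 0                                   */
                      /* 7 bytes of padding so `id` is 8-aligned    */
    uint64_t id;      /* offset 8   -> mov rax, [rdi + 8]           */
    uint16_t count;   /* offset 16  -> movzx eax, word [rdi + 16]   */
                      /* 2 bytes of padding so `flags` is 4-aligned */
    uint32_t flags;   /* offset 20  -> mov eax, [rdi + 20]          */
};                    /* sizeof == 24                               */

/* Field access = base pointer + compile-time constant offset. */
static uint32_t get_flags(const struct record *r) {
    const unsigned char *p = (const unsigned char *)r;
    return *(const uint32_t *)(p + offsetof(struct record, flags));
}
```

Reordering the fields from largest to smallest alignment (id, flags, count, tag) would shrink the padding, which is one concrete form of the data-layout work takeaway 15 recommends.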