In This Chapter
- The Fundamental Operation of Every Computer
- The MOV Instruction: All Forms
- Addressing Modes: The Complete Reference
- LEA: The Instruction That Isn't Really About Memory
- MOVZX: Zero-Extend on Load
- MOVSX: Sign-Extend on Load
- XCHG: Exchange
- A Complete Addressing Mode Example: Struct Access in C
- Performance of Addressing Modes
- Register Trace: A Data Movement Sequence
- Memory-Mapped I/O Preview
- The AT&T Syntax Alternative
- Complete Example: Copying an Array
- Summary
Chapter 8: Data Movement and Addressing Modes
The Fundamental Operation of Every Computer
Before a processor can add two numbers, compare them, or jump to a subroutine, it must move data. Every non-trivial computation is mostly data movement: load a value from memory, operate on it in a register, store the result back. The x86-64 MOV instruction family and its addressing modes are the machinery behind all of that movement, and understanding them in detail separates programmers who write assembly from programmers who understand assembly.
This chapter covers every form of MOV, the complete x86-64 addressing mode syntax, the LEA instruction (which computes addresses without reading memory and serves double duty as fast arithmetic), and the MOVZX/MOVSX instructions that handle size mismatches safely. By the end of this chapter, you will read mov rax, [rbx + rcx*8 + 16] and immediately understand what it accesses, how the processor computes the address, and why a compiler might emit that exact instruction for a struct field inside an array.
The MOV Instruction: All Forms
MOV is the most common instruction in any x86-64 binary. Its job is conceptually simple: copy a value from a source to a destination. The complexity is in how many kinds of sources and destinations exist.
; Register to register (same size)
mov rax, rbx ; 64-bit: RAX = RBX
mov eax, ebx ; 32-bit: EAX = EBX (also zeros upper 32 bits of RAX)
mov ax, bx ; 16-bit: AX = BX (upper 48 bits of RAX unchanged)
mov al, bl ; 8-bit: AL = BL (upper 56 bits of RAX unchanged)
; Immediate to register
mov rax, 0x1234567890abcdef ; 64-bit immediate (MOVABS encoding)
mov eax, 42 ; 32-bit immediate (zeros upper 32 bits of RAX)
mov ax, 0xFFFF ; 16-bit immediate
mov al, 0xFF ; 8-bit immediate
; Memory to register
mov rax, [rbx] ; load 64-bit from address in RBX
mov eax, [rbx] ; load 32-bit from address in RBX
mov ax, [rbx] ; load 16-bit from address in RBX
mov al, [rbx] ; load 8-bit from address in RBX
; Register to memory
mov [rbx], rax ; store 64-bit to address in RBX
mov [rbx], eax ; store 32-bit
mov [rbx], ax ; store 16-bit
mov [rbx], al ; store 8-bit
; Immediate to memory (must specify size explicitly)
mov qword [rbx], 42 ; store 64-bit value 42
mov dword [rbx], 42 ; store 32-bit value 42
mov word [rbx], 42 ; store 16-bit value 42
mov byte [rbx], 42 ; store 8-bit value 42
The 32-bit Zero-Extension Rule
This is one of the most important (and surprising) behaviors in x86-64. When you write to a 32-bit register, the upper 32 bits of the full 64-bit register are automatically zeroed:
mov rax, 0xFFFFFFFFFFFFFFFF ; RAX = 0xFFFFFFFFFFFFFFFF
mov eax, 1 ; RAX = 0x0000000000000001 ← upper 32 zeroed!
mov rax, 0xFFFFFFFFFFFFFFFF ; RAX = 0xFFFFFFFFFFFFFFFF
mov ax, 1 ; RAX = 0xFFFFFFFFFFFF0001 ← only 16 bits changed
mov rax, 0xFFFFFFFFFFFFFFFF ; RAX = 0xFFFFFFFFFFFFFFFF (reset before the next test)
mov al, 1 ; RAX = 0xFFFFFFFFFFFFFF01 ← only 8 bits changed
The 32-bit zero-extension rule is a deliberate design choice in the 64-bit extension (introduced by AMD as AMD64 and later adopted by Intel): clearing the upper half eliminates false dependencies on the old register value that would otherwise stall the out-of-order engine. The 8-bit and 16-bit writes do not behave this way. They preserve the upper bits, a behavior inherited from 16-bit and 32-bit x86.
⚠️ Common Mistake: Writing to AX or AL when you intend to operate on a clean 64-bit value. If RAX contains garbage in the upper bits and you write to AL, the garbage persists. Use movzx rax, al to explicitly zero-extend when you need the full register clean.
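One way to internalize the rule is to model each partial write as an operation on a 64-bit integer. The C sketch below does that. The function names write_eax, write_ax, and write_al are illustrative only; they are not part of any library, and the code models the architectural result rather than how the hardware implements it.

```c
#include <stdint.h>

/* mov eax, v : the old contents of RAX are discarded and the
   upper 32 bits of the result are zero. */
uint64_t write_eax(uint64_t rax, uint32_t v) {
    (void)rax;                 /* old value plays no part in the result */
    return (uint64_t)v;
}

/* mov ax, v : bits 63:16 of RAX are preserved. */
uint64_t write_ax(uint64_t rax, uint16_t v) {
    return (rax & ~UINT64_C(0xFFFF)) | v;
}

/* mov al, v : bits 63:8 of RAX are preserved. */
uint64_t write_al(uint64_t rax, uint8_t v) {
    return (rax & ~UINT64_C(0xFF)) | v;
}
```

Starting from an all-ones RAX and writing 1, write_eax leaves only the value 1, while write_ax and write_al leave the upper bits set, matching the register comments in the assembly above.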
What MOV Cannot Do
MOV has one important restriction: you cannot move memory to memory directly. Both operands cannot be memory references:
; ILLEGAL — cannot do memory to memory
mov [rdi], [rsi] ; assembler error
; CORRECT — use a register as intermediary
mov rax, [rsi]
mov [rdi], rax
MOV also cannot store a 64-bit immediate to memory directly. The mov [mem], imm encoding carries at most a 32-bit immediate, which is sign-extended to 64 bits for a qword store. For a constant that does not fit, load it into a register first and store the register.
Addressing Modes: The Complete Reference
An addressing mode is the method by which an instruction specifies the memory location it wants to access. x86-64 supports more addressing modes than most RISC architectures, and they all reduce to one general form:
[ base + index × scale + displacement ]
Where:
- base is any general-purpose register
- index is any general-purpose register except RSP
- scale is 1, 2, 4, or 8 (no other values; this is enforced by the encoding)
- displacement is a signed 8-bit or 32-bit constant
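The general form can be written out as ordinary integer arithmetic. The following C sketch is a model of that computation, not a description of the hardware; effective_address and scale_is_encodable are hypothetical helper names chosen for this example.

```c
#include <stdint.h>

/* Returns 1 if the scale can be encoded in an x86-64 addressing mode. */
int scale_is_encodable(unsigned scale) {
    return scale == 1 || scale == 2 || scale == 4 || scale == 8;
}

/* Models [base + index*scale + disp].
   The displacement is signed and is sign-extended to 64 bits before
   the addition; the sum wraps modulo 2^64. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t disp) {
    return base + index * (uint64_t)scale + (uint64_t)(int64_t)disp;
}
```

For example, effective_address(0x1000, 3, 8, 16) corresponds to [rbx + rcx*8 + 16] with RBX = 0x1000 and RCX = 3, and a negative displacement simply subtracts from the base.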
Not all components are required. Here is every combination you will encounter:
Immediate (Not Really an Addressing Mode, but Listed for Completeness)
mov rax, 42 ; source is the literal value 42
The value is encoded directly in the instruction bytes. No memory access for the source operand.
Register Direct
mov rax, rbx ; source is the register RBX
No memory access. The value comes from another register.
Direct Memory (Absolute Address)
mov rax, [0x600000] ; load from absolute address 0x600000
Rarely used in position-independent code. Requires a 32-bit address in the 64-bit encoding, which limits it to addresses that fit in 32 bits (or use sign extension to reach negative addresses). In practice, you will almost always see RIP-relative addressing instead.
Register Indirect
mov rax, [rbx] ; load from address stored in RBX
This is [base] — displacement and index are zero. The register holds the address. This is how you dereference a pointer.
; Dereferencing a pointer in C:
; int x = *ptr;
; In assembly (ptr in RDI):
mov eax, [rdi] ; EAX = *ptr
Base + Displacement
mov rax, [rbx + 8] ; load from address RBX+8
mov rax, [rbx - 4] ; displacement can be negative
mov rax, [rbx + 0x18] ; or in hex
This is the workhorse of struct field access. If RBX points to a struct, [rbx + 8] accesses the field at offset 8:
struct Point {
int64_t x; // offset 0
int64_t y; // offset 8
int64_t z; // offset 16
};
; Accessing struct fields (struct pointer in RDI):
mov rax, [rdi] ; rax = point.x
mov rbx, [rdi + 8] ; rbx = point.y
mov rcx, [rdi + 16] ; rcx = point.z
Base + Index
mov rax, [rbx + rcx] ; load from address RBX + RCX
Useful for two independent dynamic values. Less common than the next form.
Base + Index × Scale
mov rax, [rbx + rcx*8] ; load from RBX + RCX*8
mov eax, [rbx + rcx*4] ; scale 4 for int32_t arrays (32-bit load)
mov ax, [rbx + rcx*2] ; scale 2 for int16_t arrays (16-bit load)
mov al, [rbx + rcx*1] ; scale 1 for int8_t arrays (same as base+index)
This is the array indexing form. If RBX is the base address of an array of 64-bit integers and RCX is the index, [rbx + rcx*8] accesses element array[rcx].
; int64_t array[N]; accessing array[i]
; RBX = array base address, RCX = i
mov rax, [rbx + rcx*8] ; rax = array[i]
The scale factor must be 1, 2, 4, or 8. These correspond to the sizes of the standard C types (byte, short, int, pointer/long). No other scale values are encodable. If you need [rbx + rcx*3], you must do the multiply yourself or use LEA.
⚙️ How It Works: The scale multiplication happens in dedicated address generation hardware, not the integer ALU. It costs nothing extra compared to unscaled indexing. The processor computes base + index*scale + displacement in a single clock in the AGU (Address Generation Unit) pipeline stage.
Base + Index × Scale + Displacement
mov rax, [rbx + rcx*8 + 16] ; the full form
mov rax, [rdi + rsi*4 + 24] ; accessing struct in array
This is the most general form. It combines array indexing with a field offset, letting you access a field inside a struct inside an array in one instruction:
struct Record {
int32_t id; // offset 0
int32_t flags; // offset 4
int64_t value; // offset 8
int64_t next; // offset 16
};
// Accessing records[i].value:
// RBX = array base, RCX = i, sizeof(Record) = 24
mov rax, [rbx + rcx*8 + 8] ; WRONG: scale 8 implies stride 8, not 24
Wait — there is a catch. The scale can only be 1, 2, 4, or 8. If your struct size is not one of those values, you cannot use the scaled index form directly. For a 24-byte struct, you need to multiply the index by 24 yourself:
; records[i].value where sizeof(Record) = 24
; Option 1: IMUL
imul rcx, rcx, 24 ; rcx = i * 24
mov rax, [rbx + rcx + 8] ; rax = records[i].value
; Option 2: LEA (faster, see next section)
lea rcx, [rcx + rcx*2] ; rcx = rcx * 3
shl rcx, 3 ; rcx *= 8, so rcx = i * 24
mov rax, [rbx + rcx + 8]
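To check the arithmetic behind Option 2, here is a small C sketch that computes the byte offset of records[i].value the same way the LEA and SHL pair does. The helper name value_offset is made up for this example, and the 24-byte size assumes the usual layout in which the struct has no padding beyond natural alignment.

```c
#include <stddef.h>
#include <stdint.h>

struct Record {
    int32_t id;     /* offset 0  */
    int32_t flags;  /* offset 4  */
    int64_t value;  /* offset 8  */
    int64_t next;   /* offset 16 */
};

/* Byte offset of records[i].value from the array base. */
uint64_t value_offset(uint64_t i) {
    uint64_t t = i + i * 2;   /* lea rcx, [rcx + rcx*2] : t = i * 3  */
    t <<= 3;                  /* shl rcx, 3             : t = i * 24 */
    return t + offsetof(struct Record, value);  /* the +8 displacement */
}
```

Comparing value_offset(i) with the distance the C compiler computes for &records[i].value is a quick way to confirm that the hand-written stride matches sizeof(struct Record).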
RIP-Relative Addressing
mov rax, [rel my_variable] ; NASM: RIP + offset to my_variable
mov rax, [my_variable] ; RIP-relative only when a "default rel" directive is in effect; otherwise NASM assembles an absolute address
In 64-bit mode, addresses can be up to 64 bits wide — too large for a fixed address in most instruction encodings. Position-independent code (PIE/shared libraries) cannot use absolute 32-bit addresses anyway. The solution is RIP-relative addressing: the address is expressed as a signed 32-bit offset from the next instruction's RIP.
section .data
counter: dq 0
section .text
global _start
_start:
; RIP-relative access to counter
mov rax, [rel counter] ; load counter
inc rax
mov [rel counter], rax ; store counter
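The assembler and linker compute the offset between them: the assembler emits a relocation, and the linker fills in the final 32-bit value. At runtime, the CPU adds that offset to RIP (the address of the next instruction) to get the actual address. This works wherever the data is within ±2GB of the code, which covers virtually all binaries.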
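The address arithmetic is simple enough to model directly. The C sketch below shows both directions: what the CPU computes at runtime, and the range check that decides whether a target is reachable at all. Both function names are hypothetical, and the addresses in the usage note are made-up values, not taken from a real binary.

```c
#include <stdint.h>

/* What the CPU does: next-instruction address plus a sign-extended
   32-bit displacement. */
uint64_t rip_relative_target(uint64_t next_rip, int32_t disp32) {
    return next_rip + (uint64_t)(int64_t)disp32;
}

/* What the toolchain must decide: returns 1 and stores the displacement
   if target is within the signed 32-bit range of next_rip, else 0. */
int rip_relative_disp(uint64_t next_rip, uint64_t target, int32_t *disp32) {
    int64_t delta = (int64_t)(target - next_rip);
    if (delta < INT32_MIN || delta > INT32_MAX) {
        return 0;             /* more than about 2GB away: not encodable */
    }
    *disp32 = (int32_t)delta;
    return 1;
}
```

With a next-instruction address of 0x401007 and data at 0x402000, the displacement comes out as 0xFF9, and feeding it back through rip_relative_target recovers 0x402000.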
💡 Mental Model: Think of RIP-relative as "the data is this many bytes away from where I'm standing right now." It is the addressing mode that makes position-independent executables possible.
Addressing Mode Reference Table
| Mode | NASM Syntax | Address Computed | Use Case |
|---|---|---|---|
| Immediate | mov rax, 42 | N/A (value in instruction) | Literal values |
| Register | mov rax, rbx | N/A (register to register) | Register copies |
| Direct | mov rax, [0x600000] | 0x600000 | Absolute static addresses |
| Register Indirect | mov rax, [rbx] | RBX | Pointer dereference |
| Base+Disp | mov rax, [rbx+8] | RBX + 8 | Struct field access |
| Base+Index | mov rax, [rbx+rcx] | RBX + RCX | Two dynamic indices |
| Base+Index×Scale | mov rax, [rbx+rcx*8] | RBX + RCX×8 | Array indexing |
| Full | mov rax, [rbx+rcx*4+16] | RBX + RCX×4 + 16 | Array of structs |
| RIP-relative | mov rax, [rel label] | RIP + offset | PIC static data |
LEA: The Instruction That Isn't Really About Memory
LEA stands for Load Effective Address. It computes an address using the full addressing mode syntax, but — crucially — it does not actually access memory. The computed address is stored directly in the destination register.
lea rax, [rbx + 8] ; rax = rbx + 8 (no memory load)
lea rax, [rbx + rcx*4 + 16] ; rax = rbx + rcx*4 + 16
At first this seems pointless. Why compute an address and not use it? The answer is that LEA is the fastest way to perform certain kinds of arithmetic:
LEA as Addition with Offset
; These compute the same result (ADD also updates the flags; LEA does not):
add rax, rbx ; rax += rbx
lea rax, [rax + rbx] ; rax = rax + rbx
; LEA can use a different destination:
lea rcx, [rax + rbx] ; rcx = rax + rbx, rax and rbx unchanged
; ADD cannot do this: add rcx, rax, rbx is NOT a valid encoding
LEA for Multiplication by Small Constants
The scale factor in the addressing mode encodes multiplication by 1, 2, 4, or 8. By combining base and index cleverly, you can multiply by 3, 5, 9:
; Multiply by 3: x*3 = x*2 + x
lea rax, [rbx + rbx*2] ; rax = rbx + rbx*2 = rbx*3
; Multiply by 5: x*5 = x*4 + x
lea rax, [rbx + rbx*4] ; rax = rbx + rbx*4 = rbx*5
; Multiply by 9: x*9 = x*8 + x
lea rax, [rbx + rbx*8] ; rax = rbx + rbx*8 = rbx*9
; Multiply by 10: two LEAs
lea rax, [rbx + rbx*4] ; rax = rbx*5
lea rax, [rax + rax] ; rax = rbx*10 (or: lea rax, [rax*2])
; Multiply by 7: x*7 = x*8 - x
lea rax, [rbx*8] ; rax = rbx*8
sub rax, rbx ; rax = rbx*7
; Note: lea rax, [rbx*8 - rbx] is NOT encodable (a register cannot be subtracted)
; Two-LEA alternative: lea rax, [rbx + rbx*2] (x*3), then lea rax, [rbx + rax*2] (x + x*6 = x*7)
; Or just: imul rax, rbx, 7
; Multiply by 25: x*25 = x*5 * 5
lea rax, [rbx + rbx*4] ; rax = rbx*5
lea rax, [rax + rax*4] ; rax = rax*5 = rbx*25
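Each of these decompositions can be checked with plain integer arithmetic. The C sketch below mirrors the sequences above one LEA at a time. The helper lea2 and the mulN functions are names invented for this example; lea2 models only the [base + index*scale] form.

```c
#include <stdint.h>

/* One LEA of the form [base + index*scale], scale in {1, 2, 4, 8}. */
uint64_t lea2(uint64_t base, uint64_t index, unsigned scale) {
    return base + index * (uint64_t)scale;
}

uint64_t mul3(uint64_t x) { return lea2(x, x, 2); }   /* x + x*2 */
uint64_t mul5(uint64_t x) { return lea2(x, x, 4); }   /* x + x*4 */
uint64_t mul9(uint64_t x) { return lea2(x, x, 8); }   /* x + x*8 */

uint64_t mul10(uint64_t x) {
    uint64_t t = lea2(x, x, 4);   /* t = x*5       */
    return lea2(t, t, 1);         /* t + t = x*10  */
}

uint64_t mul7(uint64_t x) {
    uint64_t t = lea2(0, x, 8);   /* lea rax, [rbx*8] */
    return t - x;                 /* sub rax, rbx     */
}

uint64_t mul25(uint64_t x) {
    uint64_t t = lea2(x, x, 4);   /* t = x*5       */
    return lea2(t, t, 4);         /* t*5 = x*25    */
}
```

Because the arithmetic is unsigned and wraps modulo 2^64, the results agree with a real multiply for every input, including values that overflow.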
LEA for Addition with Displacement
; Increment by a constant without affecting flags
lea rax, [rax + 4] ; rax += 4 (does NOT touch flags — ADD does)
; Multi-operand addition in one instruction
lea rax, [rbx + rcx + 16] ; rax = rbx + rcx + 16
⚡ Performance Note: Despite its address-style syntax, LEA is an arithmetic instruction: it never reads memory, and on most current Intel and AMD cores it is executed by the integer execution units rather than by the load/store address generators. Simple forms (base+displacement or base+index) have the same 1-cycle latency as ADD. The three-component form (base+index+displacement) is slower on some microarchitectures, for example 3 cycles on Intel Sandy Bridge through Skylake, so check the instruction tables for your target before relying on it in a hot loop.
Why Compilers Love LEA
GCC and Clang use LEA aggressively for two reasons: it computes a three-way sum in one instruction, and it does not modify flags. Consider this C code:
long compute(long a, long b) {
return a * 5 + b + 3;
}
Naive translation might use IMUL. But GCC -O2 emits:
; RDI = a, RSI = b
lea rax, [rdi + rdi*4] ; rax = a*5
lea rax, [rax + rsi + 3] ; rax = a*5 + b + 3
ret
Two LEA instructions, no flags clobbered, no multiply latency. This is one of those x86-specific tricks that makes the architecture fast despite its complexity.
🔍 Under the Hood: The "no-flags" property of LEA is important when surrounding code depends on flags from a previous comparison. An ADD in the middle would destroy those flags; a LEA leaves them untouched, allowing the conditional branch that checks them to remain correct.
MOVZX: Zero-Extend on Load
When loading a smaller value into a larger register, you often want the upper bits cleared. MOVZX does this explicitly:
movzx rax, byte [rbx] ; rax = zero_extend_to_64(*(uint8_t*)rbx)
movzx rax, word [rbx] ; rax = zero_extend_to_64(*(uint16_t*)rbx)
movzx eax, byte [rbx] ; eax = zero_extend_to_32(*(uint8_t*)rbx)
movzx eax, word [rbx] ; eax = zero_extend_to_32(*(uint16_t*)rbx)
movzx rax, al ; rax = zero_extend_to_64(al)
movzx rax, ax ; rax = zero_extend_to_64(ax)
The key thing to remember: movzx r64, r/m8 always zeroes bits 63:8 of the destination. This is different from mov al, [rbx] which only writes bits 7:0 and leaves the rest of RAX as-is.
; Incorrect: loading a byte into a loop counter
mov rax, 0xDEADBEEFDEADBEEF
mov al, [rbx] ; RAX = 0xDEADBEEFDEADBE?? (garbage in upper bits!)
; Correct:
movzx rax, byte [rbx] ; RAX = 0x00000000000000?? (clean)
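📊 C Comparison:
uint8_t b = read_byte(ptr);
uint64_t v = b; // implicit zero-extension
The compiler emits a zero-extending move for this, such as movzx eax, byte [rdi]. Writing EAX also clears the upper 32 bits, so the result is the same as movzx rax, byte [rdi].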
MOVSX: Sign-Extend on Load
When loading a signed smaller value into a larger register, you want the sign bit propagated. MOVSX sign-extends:
movsx rax, byte [rbx] ; sign-extend 8-bit to 64-bit
movsx rax, word [rbx] ; sign-extend 16-bit to 64-bit
; (sign-extending 32-bit to 64-bit uses MOVSXD, covered below)
movsx eax, byte [rbx] ; sign-extend 8-bit to 32-bit
movsx eax, word [rbx] ; sign-extend 16-bit to 32-bit
Sign extension replicates the most significant bit:
; Value in memory: 0xFF = -1 as int8_t
movsx rax, byte [rbx] ; RAX = 0xFFFFFFFFFFFFFFFF = -1 as int64_t
movzx rax, byte [rbx] ; RAX = 0x00000000000000FF = 255 as uint64_t
; Value in memory: 0x80 = -128 as int8_t
movsx eax, byte [rbx] ; EAX = 0xFFFFFF80 = -128 as int32_t
movzx eax, byte [rbx] ; EAX = 0x00000080 = 128 as uint32_t
📊 C Comparison:
int8_t b = *(int8_t*)ptr;
int64_t v = b; // implicit sign-extension
The compiler emits movsx rax, byte [rdi] for this.
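The difference between the two instructions is exactly the difference between widening an unsigned and a signed type in C. The sketch below makes that concrete; the two function names are illustrative and do not come from any library.

```c
#include <stdint.h>

/* Same effect as: movzx rax, byte [p] */
uint64_t load_u8_zero_extended(const void *p) {
    return (uint64_t)*(const uint8_t *)p;
}

/* Same effect as: movsx rax, byte [p] */
int64_t load_i8_sign_extended(const void *p) {
    return (int64_t)*(const int8_t *)p;
}
```

Reading the same byte 0xFF through both functions gives 255 from the first and -1 from the second, which is the MOVZX versus MOVSX contrast shown in the register comments above.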
MOVSXD: Sign-Extend 32-bit to 64-bit
There is a special instruction for the 32-to-64 case because the regular movsx encoding does not cover it:
movsxd rax, dword [rbx] ; sign-extend 32-bit to 64-bit
movsxd rax, ecx ; sign-extend ECX into RAX
This is commonly seen when working with 32-bit indices or when interfacing with code that stores signed 32-bit values:
; int32_t index; used as 64-bit array subscript
movsxd rax, dword [rbp-4] ; load signed 32-bit index
mov rbx, [array + rax*8] ; use as 64-bit index
⚠️ Common Mistake: Using mov eax, [mem] instead of movsxd rax, dword [mem] when the value is a signed 32-bit integer. The MOV zero-extends, which gives wrong results for negative indices. movsxd sign-extends, preserving the signed value.
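To see how far off the address lands, the following C sketch models both ways of widening a negative 32-bit index and then scales it by 8, as in the array access above. All three function names are hypothetical, and the base address used in the usage note is an arbitrary example value.

```c
#include <stdint.h>

/* mov eax, [mem] : the upper 32 bits of RAX become zero. */
uint64_t index_via_mov(int32_t idx) {
    return (uint64_t)(uint32_t)idx;
}

/* movsxd rax, dword [mem] : the sign bit fills the upper 32 bits. */
uint64_t index_via_movsxd(int32_t idx) {
    return (uint64_t)(int64_t)idx;
}

/* Address of element idx64 in an array of 8-byte elements. */
uint64_t element_address(uint64_t base, uint64_t idx64) {
    return base + idx64 * 8;   /* wraps modulo 2^64, like the hardware */
}
```

With a base of 0x10000 and an index of -1, the sign-extended index yields 0xFFF8, the element just before the array. The zero-extended index treats -1 as 4294967295 and points roughly 32GB past the array instead.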
XCHG: Exchange
XCHG swaps the contents of two operands. When one operand is in memory, the swap is atomic:
xchg rax, rbx ; swap RAX and RBX
xchg [mutex], rax ; atomically swap memory with register
The memory form of XCHG has an implicit LOCK prefix — it is always atomic, regardless of whether you write LOCK explicitly. This makes it useful as a mutex acquire:
; Spinlock using XCHG (the classic test-and-set)
section .data
lock_var: db 0 ; 0 = free, 1 = locked
section .text
acquire_lock:
mov al, 1
.spin:
xchg [lock_var], al ; atomically: al = *lock_var, *lock_var = 1
test al, al ; was it 0 before?
jnz .spin ; no: someone else holds it, spin
ret ; yes: we acquired it
release_lock:
mov byte [lock_var], 0 ; a plain store is enough: x86 stores have release ordering
ret
⚠️ Common Mistake: Using XCHG with memory as a fast swap (for the swap algorithm). Its implicit LOCK prefix causes it to emit a full memory fence, making it slower than two MOVs for non-atomic use. Use XCHG between registers only if you just want to swap values and do not need atomicity.
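For comparison, the same test-and-set lock can be written with C11 atomics. This is a minimal sketch, not a production lock: the type demo_spinlock and the three functions are names invented here, and the remark that compilers for x86-64 commonly lower atomic_exchange to XCHG is an observation about typical output, not a guarantee.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uchar state;   /* 0 = free, 1 = locked */
} demo_spinlock;

/* One attempt: store 1 and look at the value that was there before,
   which is what XCHG leaves in AL. */
bool spin_try_acquire(demo_spinlock *l) {
    return atomic_exchange_explicit(&l->state, 1,
                                    memory_order_acquire) == 0;
}

/* Spin until the previous value was 0. */
void spin_acquire(demo_spinlock *l) {
    while (!spin_try_acquire(l)) {
        /* busy-wait */
    }
}

/* Release with a plain release-ordered store. */
void spin_release(demo_spinlock *l) {
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

Initialize the lock with atomic_init(&lock.state, 0) before first use. A second spin_try_acquire on a held lock returns false, which is the "someone else holds it" branch of the assembly loop.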
A Complete Addressing Mode Example: Struct Access in C
Let us trace through what the compiler generates for a realistic C struct access:
typedef struct {
uint32_t id; // offset 0, size 4
uint32_t flags; // offset 4, size 4
uint64_t value; // offset 8, size 8
char *name; // offset 16, size 8
double score; // offset 24, size 8
} Record; // total size: 32 bytes
uint64_t get_value(Record *rec) {
return rec->value;
}
; GCC -O2 output for get_value:
; RDI = rec (pointer to Record)
get_value:
mov rax, [rdi + 8] ; load rec->value (offset 8)
ret
Now for an array access:
uint64_t sum_values(Record *array, int count) {
uint64_t total = 0;
for (int i = 0; i < count; i++) {
total += array[i].value;
}
return total;
}
; GCC -O2 output (simplified):
; RDI = array, ESI = count
sum_values:
xor eax, eax ; total = 0
test esi, esi
jle .done ; if count <= 0, return 0
xor ecx, ecx ; i = 0
.loop:
add rax, [rdi + rcx*1 + 8] ; total += array[i].value
; Note: GCC uses rcx as byte offset, not element index
; Because sizeof(Record)=32 is not 1/2/4/8, it uses byte offset
add rcx, 32 ; advance to next Record (32 bytes)
dec esi ; count--
jnz .loop
.done:
ret
The compiler cannot use rcx*32 (32 is not a valid scale), so it tracks the byte offset instead of the element index. This is standard compiler behavior: when the stride is not 1, 2, 4, or 8, the index register holds the byte offset, updated by adding the stride each iteration.
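The transformation can be written back in C to make the byte-offset idea explicit. The sketch below is a hand-written model of the loop shape shown above, not actual compiler output; the function name sum_values_by_offset is made up, and the 32-byte stride assumes a 64-bit target where pointers are 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t id;     /* offset 0  */
    uint32_t flags;  /* offset 4  */
    uint64_t value;  /* offset 8  */
    char *name;      /* offset 16 on a 64-bit target */
    double score;    /* offset 24 on a 64-bit target */
} Record;

uint64_t sum_values_by_offset(const Record *array, int count) {
    const unsigned char *base = (const unsigned char *)array;
    uint64_t total = 0;                      /* xor eax, eax */
    size_t off = 0;                          /* xor ecx, ecx */
    for (; count > 0; count--) {             /* dec esi / jnz .loop */
        const Record *r = (const Record *)(base + off);
        total += r->value;                   /* add rax, [rdi + rcx + 8] */
        off += sizeof(Record);               /* add rcx, 32 */
    }
    return total;
}
```

The loop never multiplies the element index by the struct size. It keeps a running byte offset and adds the stride each iteration, which is the same strength reduction the assembly performs.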
Performance of Addressing Modes
Not all addressing modes are equally fast:
| Mode | Address Generation Latency | Notes |
|---|---|---|
| Register direct | 0 (no AGU needed) | Fastest possible |
| Base only | 1 cycle | Simple |
| Base + displacement | 1 cycle | Same as above |
| Base + index | 1 cycle | Same |
| Base + index×scale | 1 cycle | Scale is free |
| Base + index×scale + displacement | 1 cycle (Intel Haswell+) | Was 2 cycles on older CPUs |
| RIP-relative | 1 cycle | RIP is just another register to AGU |
Modern Intel processors (Haswell and later) can compute the full general form in 1 cycle. On older hardware (Sandy Bridge, Ivy Bridge), the four-component form sometimes took 2 cycles. This is a rare case where the full general form has no practical penalty on current hardware.
⚡ Performance Note: The real performance concern with addressing modes is pipeline depth and memory latency, not the AGU complexity. A [rbx] load that hits L1 cache takes ~4 cycles to complete. A [rbx + rcx*8 + 16] load that also hits L1 takes about the same (some Intel cores add one cycle for indexed forms). The addressing mode itself is essentially free; the cache hierarchy is where the cost lives.
Register Trace: A Data Movement Sequence
Let us trace through a complete example to cement the concepts:
section .data
array: dq 10, 20, 30, 40, 50 ; 5 × 8-byte values
section .text
global _start
_start:
lea rbx, [rel array] ; RBX = address of array
mov rcx, 2 ; index = 2
mov rax, [rbx + rcx*8] ; rax = array[2]
movzx rdx, byte [rbx] ; rdx = (uint8_t)array[0] — just the first byte (0x0A)
lea rsi, [rbx + 4*8] ; rsi = address of array[4] (not a load)
; rax should be 30, rdx should be 10 (byte), rsi should be array+32
| Instruction | RAX | RBX | RCX | RDX | RSI | Notes |
|---|---|---|---|---|---|---|
| (start) | ? | ? | ? | ? | ? | |
| lea rbx, [rel array] | ? | array_addr | ? | ? | ? | No memory load |
| mov rcx, 2 | ? | array_addr | 2 | ? | ? | Immediate load |
| mov rax, [rbx+rcx*8] | 30 | array_addr | 2 | ? | ? | Loads array[2]=30 |
| movzx rdx, byte [rbx] | 30 | array_addr | 2 | 10 | ? | First byte of array[0] |
| lea rsi, [rbx+4*8] | 30 | array_addr | 2 | 10 | array_addr+32 | Address, not load |
🛠️ Lab Exercise: Assemble and run this code in GDB. Set a breakpoint at _start, then use ni (next instruction) to step through. After each instruction, check register values with info registers. Verify the trace above. Then modify the index to 4 and confirm rax becomes 50.
Memory-Mapped I/O Preview
In Chapter 29, you will use these addressing modes to talk directly to hardware registers. The principle is that hardware devices are mapped to specific physical addresses, and writing to those addresses controls the device.
; On a bare-metal x86-64 system (or in MinOS kernel mode):
; VGA text buffer lives at physical address 0xB8000
; Each character cell is 2 bytes: character + attribute
mov rbx, 0xB8000 ; VGA buffer base address
mov word [rbx], 0x0741 ; 'A' (0x41) with light-gray-on-black (0x07)
mov word [rbx + 2], 0x0742 ; 'B' in next cell
The addressing modes are identical to those for normal RAM access. The memory controller routes writes to certain address ranges to hardware registers instead of DRAM. From the instruction's perspective, it is just a memory write.
📐 OS Kernel Project (MinOS): Save this pattern. In Chapter 29's MinOS kernel project, you will implement a VGA text mode console driver that uses exactly this technique — direct writes to 0xB8000 — to display output from your kernel before any device driver infrastructure exists.
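The cell encoding can be tried out safely in user space before Chapter 29. In the C sketch below, an ordinary array stands in for the VGA buffer, because writing to physical address 0xB8000 from a normal process would fault. The names vga_cell and vga_put are hypothetical and are not taken from the MinOS project.

```c
#include <stdint.h>

/* Pack one text-mode cell: attribute in the high byte, character
   in the low byte. */
uint16_t vga_cell(uint8_t ch, uint8_t attr) {
    return (uint16_t)(((unsigned)attr << 8) | ch);
}

/* Store one cell. The volatile qualifier tells the compiler that
   every write must reach memory, as required for device registers. */
void vga_put(volatile uint16_t *buffer, unsigned cell,
             uint8_t ch, uint8_t attr) {
    buffer[cell] = vga_cell(ch, attr);   /* mov word [rbx + cell*2], ... */
}
```

Calling vga_put(buf, 0, 'A', 0x07) and vga_put(buf, 1, 'B', 0x07) on a two-element array produces 0x0741 and 0x0742, the same words the assembly stores at offsets 0 and 2.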
The AT&T Syntax Alternative
If you are reading compiler output or working with GDB disassembly, you will encounter AT&T syntax. The operand order is reversed (source first, destination last), and addressing modes are written differently:
| NASM (Intel syntax) | AT&T syntax | Meaning |
|---|---|---|
| mov rax, rbx | movq %rbx, %rax | rax = rbx |
| mov rax, [rbx] | movq (%rbx), %rax | rax = *rbx |
| mov rax, [rbx+8] | movq 8(%rbx), %rax | rax = *(rbx+8) |
| mov rax, [rbx+rcx*8] | movq (%rbx,%rcx,8), %rax | rax = *(rbx+rcx*8) |
| mov rax, [rbx+rcx*4+16] | movq 16(%rbx,%rcx,4), %rax | rax = *(rbx+rcx*4+16) |
| movzx eax, byte [rbx] | movzbl (%rbx), %eax | zero-extend byte to dword |
| movsxd rax, dword [rbx] | movslq (%rbx), %rax | sign-extend dword to qword |
The AT&T format is disp(base, index, scale). GDB defaults to AT&T; use set disassembly-flavor intel to switch.
Complete Example: Copying an Array
Here is a complete NASM program demonstrating multiple addressing modes:
; copy_array.asm — demonstrates addressing modes
; Copies src[0..4] to dst[0..4], reversing the order
section .data
src: dq 1, 2, 3, 4, 5 ; source array
dst: dq 0, 0, 0, 0, 0 ; destination array
section .text
global _start
_start:
lea rsi, [rel src] ; RSI = &src[0]
lea rdi, [rel dst] ; RDI = &dst[0]
mov rcx, 0 ; loop index
.loop:
; Load src[rcx]
mov rax, [rsi + rcx*8]
; Store into dst[4-rcx] (reversed)
; dst index = 4 - rcx
mov rdx, 4
sub rdx, rcx
mov [rdi + rdx*8], rax
; Increment and check
inc rcx
cmp rcx, 5
jl .loop
; Exit
mov eax, 60 ; sys_exit
xor edi, edi ; status = 0
syscall
🔄 Check Your Understanding:
1. In the loop above, what is the value in RDX on the first iteration (rcx=0)?
2. After mov [rdi + rdx*8], rax on the first iteration, which element of dst was written?
3. If you changed src to hold dq 10, 20, 30, 40, 50, what would dst contain after the loop?
Answer
1. RDX = 4 - 0 = 4 on first iteration.
2. dst[4] was written with src[0] = 1. So dst[4] = 1.
3. dst would contain [50, 40, 30, 20, 10], the array reversed.
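If you want to check your answers against something executable without assembling the program, the loop translates directly to C. This is a sketch of the same algorithm with the length passed as a parameter; copy_reversed is a name chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies src[0..n-1] into dst in reverse order. */
void copy_reversed(const int64_t *src, int64_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {     /* inc rcx / cmp rcx, 5 / jl */
        size_t j = (n - 1) - i;          /* mov rdx, 4 / sub rdx, rcx */
        dst[j] = src[i];                 /* load [rsi + rcx*8], store [rdi + rdx*8] */
    }
}
```

With n = 5, the index j runs from 4 down to 0 while i runs from 0 up to 4, so the first store lands in dst[4].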
Summary
The x86-64 addressing modes give you a powerful language for describing how to locate data. The general form [base + index*scale + displacement] covers array indexing ([rsi + rcx*8]), struct field access ([rdi + 24]), and array-of-struct field access ([rbx + rcx*8 + 16], when the element size is a valid scale of 1, 2, 4, or 8) all in one instruction. For other element sizes, the index register carries a byte offset instead. LEA exploits this syntax for arithmetic: multiplying by 3, 5, or 9, or computing multi-operand sums without touching flags. MOVZX and MOVSX handle size mismatches cleanly, preventing the partial-register bugs that have caused subtle errors since the 16-bit era.
The 32-bit zero-extension rule is the one behavior you must internalize: writing to EAX always zeroes the upper half of RAX, but writing to AX or AL does not. Get this wrong and you will spend an afternoon debugging a 64-bit value that should have been clean.
In the next chapter, you will use these addressing modes constantly as we work through every arithmetic and logic instruction in the ISA.