Chapter 3 Key Takeaways: The x86-64 Architecture
The 13 Most Important Points from This Chapter
1. x86-64 has 16 general-purpose 64-bit registers. RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP are the original 8, widened from the 16-bit registers of the 8086 by way of 32-bit x86. R8-R15 are the 8 new registers added by AMD's 64-bit extension. All 16 can hold any 64-bit integer value; their names mostly reflect historical convention, although a few instructions still use specific registers implicitly (MUL and DIV use RAX and RDX, variable shifts use CL, PUSH and POP use RSP).
2. A 32-bit write zeroes the upper half. This is the most important architectural rule. Any write to a 32-bit register (EAX, EBX, R8D, etc.) automatically zeroes bits 63:32 of the corresponding 64-bit register. Writes to 16-bit or 8-bit sub-registers do NOT zero upper bits; they leave the rest of the register unchanged. This rule is the source of a common class of bugs in which a 64-bit value is silently truncated to 32 bits.
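The sub-register write rule in point 2 can be sketched as a small model. This is an illustrative simulation, not real hardware access; the function name write_subregister is hypothetical, and the 8-bit case models only the low byte (AL-style), not the high-byte registers such as AH.

```python
MASK64 = (1 << 64) - 1

def write_subregister(reg64: int, value: int, width: int) -> int:
    """Return the 64-bit register value after writing `value` to its low `width` bits."""
    if width not in (8, 16, 32, 64):
        raise ValueError("width must be 8, 16, 32, or 64")
    low_mask = (1 << width) - 1
    value &= low_mask
    if width >= 32:
        # A 32-bit write zeroes bits 63:32; a 64-bit write replaces everything.
        return value
    # 8- and 16-bit writes merge into the old value and keep the upper bits.
    return (reg64 & ~low_mask & MASK64) | value

rax = 0xFFFF_FFFF_FFFF_FFFF
after_eax = write_subregister(rax, 0x1, 32)  # models: mov eax, 1
after_ax = write_subregister(rax, 0x1, 16)   # models: mov ax, 1
after_al = write_subregister(rax, 0x1, 8)    # models: mov al, 1
```

Only the 32-bit write discards the old upper half; the 16-bit and 8-bit writes leave it in place.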
3. The System V AMD64 ABI assigns registers to calling convention roles. Integer arguments: RDI, RSI, RDX, RCX, R8, R9 (in that order). Return value: RAX (primary), RDX (secondary). Callee-saved (must be preserved across function calls): RBX, RBP, R12, R13, R14, R15; RSP must also be restored before returning. Caller-saved (may be destroyed): the remaining general-purpose registers (RAX, RCX, RDX, RSI, RDI, R8-R11). Violating these conventions can cause crashes or data corruption in the calling code.
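As a rough illustration of the argument order in point 3, the following sketch maps a list of integer arguments to registers. The helper name assign_int_args is hypothetical. It assumes every argument is an integer or pointer, and that arguments beyond the sixth are passed on the stack.

```python
INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")
CALLEE_SAVED = frozenset({"rbx", "rbp", "r12", "r13", "r14", "r15"})

def assign_int_args(args):
    """Split integer arguments into register assignments and stack overflow."""
    in_regs = dict(zip(INT_ARG_REGS, args))      # first six go to registers, in order
    on_stack = list(args[len(INT_ARG_REGS):])    # assumption: the rest go on the stack
    return in_regs, on_stack

regs, stack = assign_int_args([10, 20, 30, 40, 50, 60, 70, 80])
# None of the argument registers is callee-saved, so a callee may overwrite them freely.
clobberable = [r for r in regs if r not in CALLEE_SAVED]
```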
4. RSP is the stack pointer and must be 16-byte aligned before CALL. The stack grows toward lower addresses. PUSH decrements RSP then stores; POP loads then increments. The System V ABI requires RSP to be 16-byte aligned immediately before a CALL instruction. After CALL pushes the 8-byte return address, RSP is misaligned by 8 at function entry (RSP mod 16 == 8), which is why function prologues often adjust RSP by 8 or push an odd number of registers.
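The alignment arithmetic in point 4 can be checked with a short calculation. This is a minimal sketch; the function names and the sample stack address are made up for illustration.

```python
def rsp_after_call(rsp_before_call: int) -> int:
    """CALL pushes an 8-byte return address, so RSP drops by 8."""
    if rsp_before_call % 16 != 0:
        raise ValueError("ABI violation: RSP must be 16-byte aligned before CALL")
    return rsp_before_call - 8

def prologue_padding(local_bytes: int, pushed_regs: int = 0) -> int:
    """Extra bytes to subtract from RSP so it is 16-byte aligned before the next CALL."""
    used = 8 + 8 * pushed_regs + local_bytes  # return address + pushes + locals
    return (-used) % 16

entry_rsp = rsp_after_call(0x7FFE00001000)   # hypothetical, aligned stack address
```

With no pushes and no locals the padding is 8 (the familiar sub rsp, 8); a single push rbp already restores alignment, so the padding is 0.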
5. SYSCALL saves RIP to RCX and RFLAGS to R11.
The syscall instruction is not a normal call: it doesn't use the stack. Instead, it saves the return address to RCX and the flags to R11, then jumps to the kernel. This means RCX and R11 are always clobbered by a syscall, no matter which system call is invoked. On Linux this is why the fourth system call argument is passed in R10 rather than RCX.
6. RIP (instruction pointer) advances automatically and cannot be set with MOV.
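RIP holds the address of the next instruction. It advances as instructions execute. You cannot write it directly; it changes only through control-transfer instructions such as JMP, conditional jumps, CALL, and RET (and SYSCALL or interrupts when entering the kernel). In 64-bit mode, position-independent code accesses data through RIP-relative addressing, a memory operand of the form [rip + offset] that works with MOV, LEA, and most other instructions, so the code runs correctly wherever it is loaded.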
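Because RIP already points at the next instruction when an operand is evaluated, a RIP-relative address is computed from the end of the instruction, not its start. The sketch below illustrates that arithmetic; the function name and the addresses are hypothetical, and the 7-byte length corresponds to lea rax, [rip + disp32] (bytes 48 8D 05 followed by a 4-byte displacement).

```python
def rip_relative_target(insn_addr: int, insn_len: int, disp32: int) -> int:
    """Effective address = address of the next instruction + signed 32-bit displacement."""
    if not -(1 << 31) <= disp32 < (1 << 31):
        raise ValueError("displacement must fit in a signed 32-bit field")
    next_rip = insn_addr + insn_len
    return (next_rip + disp32) & ((1 << 64) - 1)

# lea rax, [rip + 0x2000] located at a hypothetical address 0x401000
target = rip_relative_target(0x401000, 7, 0x2000)
```

Since the displacement is relative, the same bytes yield the correct target at any load address.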
7. The REX prefix enables 64-bit operations and access to R8-R15. Instructions need no REX prefix for 32-bit operations on the original 8 registers (EAX through EDI). A REX prefix with the W bit set indicates 64-bit operand size. REX.R, REX.X, REX.B each supply a fourth bit to one of the 3-bit register fields, which is how R8-R15 are addressed. The assembler generates REX automatically; you only need to understand it when reading machine code bytes.
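When reading machine code bytes, the REX fields from point 7 can be picked apart as follows. This is a simplified sketch with hypothetical helper names; it handles only the register-direct ModRM form (mod == 11) and ignores SIB bytes and other prefixes.

```python
REG64 = ("rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15")

def decode_rex(byte: int):
    """Return the W/R/X/B bits of a REX prefix (0x40-0x4F), or None if not REX."""
    if byte & 0xF0 != 0x40:
        return None
    return {"W": (byte >> 3) & 1, "R": (byte >> 2) & 1,
            "X": (byte >> 1) & 1, "B": byte & 1}

def modrm_register_numbers(rex_byte: int, modrm: int):
    """Combine REX.R and REX.B with the ModRM reg and rm fields (register-direct only)."""
    rex = decode_rex(rex_byte) or {"W": 0, "R": 0, "X": 0, "B": 0}
    if modrm >> 6 != 0b11:
        raise ValueError("only register-direct ModRM (mod == 11) is modelled")
    reg = ((modrm >> 3) & 7) | (rex["R"] << 3)
    rm = (modrm & 7) | (rex["B"] << 3)
    return reg, rm

# 48 89 D8 encodes mov rax, rbx; 4D 89 C8 encodes mov r8, r9
# (opcode 89 stores the reg field into the rm field).
src_a, dst_a = modrm_register_numbers(0x48, 0xD8)
src_b, dst_b = modrm_register_numbers(0x4D, 0xC8)
```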
8. RFLAGS is 64 bits; user-space code accesses it via PUSHFQ/POPFQ.
The key user-space flags: CF (carry, i.e. unsigned overflow), OF (signed overflow), ZF (zero), SF (sign), DF (direction for string ops). The CLD instruction clears DF (the standard forward direction); STD sets it (reverse direction). The IF flag (interrupt enable) can be changed only by code with sufficient I/O privilege (CPL no greater than IOPL), which on mainstream operating systems means ring 0 (kernel mode). Ordinary user-space code cannot enable or disable hardware interrupts.
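The difference between CF and OF is easiest to see by computing both for the same addition. The following is a minimal model of how a 64-bit ADD would set CF, OF, ZF, and SF; the function name add64_flags is hypothetical, and DF and IF are not modelled.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63

def add64_flags(a: int, b: int) -> dict:
    """Model the result and status flags of a 64-bit ADD."""
    a &= MASK64
    b &= MASK64
    full = a + b
    result = full & MASK64
    return {
        "result": result,
        "CF": int(full > MASK64),             # carry out of bit 63: unsigned overflow
        "ZF": int(result == 0),
        "SF": int(bool(result & SIGN64)),
        # Signed overflow: operands share a sign that differs from the result's sign.
        "OF": int(bool(~(a ^ b) & (a ^ result) & SIGN64)),
    }

wraps_unsigned = add64_flags(0xFFFF_FFFF_FFFF_FFFF, 1)   # -1 + 1 as signed
wraps_signed = add64_flags(0x7FFF_FFFF_FFFF_FFFF, 1)     # INT64_MAX + 1
```

The first addition sets CF but not OF; the second sets OF but not CF.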
9. XMM0-XMM15 are the 128-bit SSE registers; YMM are 256-bit (AVX); ZMM are 512-bit (AVX-512). YMM registers are the lower half of ZMM registers; XMM are the lower half of YMM. Writing to XMM via VEX-encoded AVX instructions zeroes the upper YMM half. Writing to XMM via legacy SSE instructions does not touch the upper half. All XMM registers are caller-saved in the System V ABI, unlike the Microsoft x64 convention, where XMM6-XMM15 are callee-saved.
10. CPUID is the runtime feature detection mechanism. Before using any optional CPU extension (SSE4.2, AVX2, AES-NI, SHA, AVX-512, etc.), check the corresponding CPUID bit. CPUID leaf 1, ECX bit 25 = AES-NI; leaf 1, ECX bit 28 = AVX; leaf 7 sub-leaf 0, EBX bit 5 = AVX2; leaf 7 sub-leaf 0, EBX bit 16 = AVX-512F. For AVX and later extensions, also confirm that the OS saves the wider register state (OSXSAVE, leaf 1 ECX bit 27, followed by XGETBV). Production code dispatches to the best available implementation at startup.
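The bit positions listed in point 10 can be decoded with simple shifts and masks. The sketch below works on register values that have already been read; it does not execute CPUID itself. The names detect_features and pick_implementation, the sample register values, and the three-tier dispatch are illustrative assumptions, and the OS-support check is left out.

```python
# feature name -> (CPUID leaf, output register, bit index), as listed in point 10
FEATURE_BITS = {
    "AES-NI": (1, "ecx", 25),
    "AVX": (1, "ecx", 28),
    "AVX2": (7, "ebx", 5),
    "AVX-512F": (7, "ebx", 16),
}

def detect_features(cpuid_regs: dict) -> set:
    """cpuid_regs maps a leaf number to a dict of register values from that leaf."""
    found = set()
    for name, (leaf, reg, bit) in FEATURE_BITS.items():
        value = cpuid_regs.get(leaf, {}).get(reg, 0)
        if (value >> bit) & 1:
            found.add(name)
    return found

def pick_implementation(features: set) -> str:
    """Choose the widest code path the CPU reports (hypothetical path names)."""
    for needed, impl in (("AVX-512F", "avx512"), ("AVX2", "avx2")):
        if needed in features:
            return impl
    return "scalar"

# Made-up register values for a CPU with AES-NI, AVX, and AVX2 but no AVX-512F.
sample = {1: {"ecx": (1 << 25) | (1 << 28)}, 7: {"ebx": 1 << 5}}
features = detect_features(sample)
chosen = pick_implementation(features)
```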
11. FS and GS segment registers provide thread-local and per-CPU storage.
In 64-bit mode, all segment registers except FS and GS have their bases forced to 0. The FS base is set by the OS to point to the current thread's TLS (Thread Local Storage) structure. The stack canary (fs:0x28 on Linux) is a well-known TLS slot. In the kernel, the GS base points to the per-CPU structure; SWAPGS exchanges the user and kernel GS bases on kernel entry and exit.
12. The CPU pipeline executes instructions out-of-order but retires them in order. Modern x86-64 CPUs decode instructions into micro-ops and execute them out-of-order based on data availability. The architectural state (visible through GDB) reflects in-order retirement. Practical implications: a chain of instructions that each depend on the previous result runs no faster than the sum of their latencies; independent instructions can be issued in parallel; memory ordering is strong, but a store may still become visible after a later load from a different address, and MFENCE prevents that reordering.
13. The brand string and vendor identification are readable via CPUID leaves 0 and 0x80000002-0x80000004. CPUID leaf 0 returns the maximum supported leaf in EAX and the 12-character vendor string ("GenuineIntel" or "AuthenticAMD") in EBX, EDX, ECX, in that order. Leaves 0x80000002-0x80000004 return the 48-byte brand string ("Intel(R) Core(TM) i9..."), 16 bytes per leaf in EAX, EBX, ECX, EDX. These are read-only CPU self-descriptions embedded in the processor.
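The vendor string from leaf 0 is simply the three registers laid out as little-endian bytes in the order EBX, EDX, ECX. The sketch below reassembles it from register values; it does not execute CPUID, and the function name vendor_string is hypothetical.

```python
import struct

def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Reassemble the 12-byte vendor ID from CPUID leaf 0 (order: EBX, EDX, ECX)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

# Register values corresponding to the two vendor strings named in point 13.
intel = vendor_string(0x756E6547, 0x49656E69, 0x6C65746E)
amd = vendor_string(0x68747541, 0x69746E65, 0x444D4163)
```

Packing the registers in the order EBX, ECX, EDX instead would scramble the middle and final four characters.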
Visual Summary: Complete Register Bank
General Purpose Registers (64-bit)
┌───────────────────────────────────────────────────────────────────┐
│ RAX (accumulator/return value/syscall#)    │ R8                   │
│ RBX (callee-saved)                         │ R9                   │
│ RCX (4th arg / loop counter / RIP←SYSCALL) │ R10                  │
│ RDX (3rd arg / high MUL result)            │ R11 (RFLAGS←SYSCALL) │
│ RSI (2nd arg / source index)               │ R12 ┐                │
│ RDI (1st arg / destination index)          │ R13 │ Callee-saved   │
│ RSP (stack pointer, must align to 16)      │ R14 │                │
│ RBP (frame pointer, optional)              │ R15 ┘                │
└───────────────────────────────────────────────────────────────────┘
Special Registers
┌────────────────────────────────┐
│ RIP (instruction pointer)      │ Modified only by control transfers (JMP/Jcc/CALL/RET/SYSCALL)
│ RFLAGS (status flags)          │ Read/written via PUSHFQ/POPFQ
│ CS/SS/DS/ES (base = 0)         │ Legacy; essentially unused in 64-bit
│ FS/GS (TLS/per-CPU)            │ Set by OS; used for fs:0x... accesses
└────────────────────────────────┘
SIMD Registers
┌────────────────────────────────────────────────────┐
│ XMM0-XMM15 (128-bit SSE) ← low 128 bits of YMM/ZMM │
│ YMM0-YMM15 (256-bit AVX) ← low 256 bits of ZMM     │
│ ZMM0-ZMM31 (512-bit AVX-512, with k0-k7 masks)     │
└────────────────────────────────────────────────────┘