Part II: The x86-64 Instruction Set

The Core Instruction Set Taught Through Real Programs

Part I gave you the machine model: registers, memory, the fetch-decode-execute cycle, and how bits flow through a processor. You understand that a program is just a sequence of instructions operating on state. You can read a NASM source file without flinching.

Part II is where that model stops being abstract.

Over eight chapters, you will work through the core instruction categories of the x86-64 architecture, not as a reference manual exercise, but through programs that do real things. By the time you finish Chapter 15, you will have written a working AES-NI encryption routine, implemented a function call stack by hand, manipulated data structures at the byte level, and understood why the processor's SIMD units can process a 1080p video frame up to roughly eight times faster than equivalent scalar C code.

How These Chapters Build on Part I

Part I established four foundations you will use in every chapter that follows:

The register file. You know that RAX through R15 are your working registers, that RSP points to the top of the stack, and that RIP tracks the next instruction. Part II assumes this fluency. Chapter 8 opens with advanced addressing modes, not a review of what registers are.

The memory model. You understand that memory is a flat 64-bit address space, that the stack grows downward, and that the heap is managed by the runtime. Part II builds on this in Chapters 11 and 12 to discuss stack frames, function calls, and how arrays and structs actually sit in memory.

The flags register. You saw RFLAGS in Part I. Part II turns it from a conceptual curiosity into a tool you use constantly. Chapter 9 covers every arithmetic flag in detail; Chapter 10 shows how conditional jumps read those flags to implement every control flow structure in C.

NASM syntax fundamentals. You can write section .text, global _start, and basic instructions. Part II extends this to complex addressing expressions, the REP string instructions, and SSE/AVX instruction syntax.

Chapter-by-Chapter Preview

Chapter 8 — Data Movement and Addressing Modes is foundational for everything that follows. The MOV instruction has more forms than most programmers realize, and the x86-64 addressing mode syntax — [base + index*scale + displacement] — appears in virtually every non-trivial program. This chapter also covers LEA, which compilers exploit for arithmetic that has nothing to do with memory, and the sign/zero-extension variants MOVSX and MOVZX that prevent the subtle bugs that come from mixing operand sizes.
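The addressing expression above is ordinary integer arithmetic, which is why LEA can borrow it for math. As a rough sketch in C, the hypothetical helper effective_address below mimics how the hardware forms base + index*scale + displacement. The function names and sample values are illustrative assumptions, not code from Chapter 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes base + index*scale + displacement the way
   the address-generation hardware does. Real encodings only allow a scale
   of 1, 2, 4, or 8 and a signed 8- or 32-bit displacement. */
static uintptr_t effective_address(uintptr_t base, uint64_t index,
                                   unsigned scale, int32_t disp)
{
    return base + (uintptr_t)(index * scale) + (uintptr_t)(intptr_t)disp;
}

/* Loads table[i] by forming the address by hand, roughly what
   mov rax, [rdi + rsi*8] does in a single instruction. */
static int64_t load_element(const int64_t *table, size_t i)
{
    uintptr_t ea = effective_address((uintptr_t)table, i,
                                     (unsigned)sizeof(int64_t), 0);
    return *(const int64_t *)ea;
}
```

Calling load_element(t, 2) on an array of 64-bit integers reads the same element as t[2]; the point is that the index, scale, and displacement are all folded into one address calculation.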

Chapter 9 — Arithmetic and Logic covers the integer ALU completely. ADD, SUB, MUL, and DIV, along with the signed multiply and divide forms IMUL and IDIV, are joined by the bitwise logic instructions and the shift/rotate family. The flags previewed in Part I get their full treatment here: you will learn the difference between the Carry Flag and the Overflow Flag, and why that difference matters for writing correct comparisons on signed versus unsigned values. The 128-bit arithmetic example using ADC chains shows how to work with integers wider than the architecture's native word size.
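To make the ADC idea concrete before Chapter 9, here is a minimal C sketch of a 128-bit addition built from two 64-bit limbs. The u128 type and add128 function are hypothetical names chosen for illustration; the carry is recovered by hand because C has no direct view of the Carry Flag.

```c
#include <stdint.h>

/* Hypothetical 128-bit value stored as two 64-bit limbs. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

static u128 add128(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;              /* like ADD: low limbs, may wrap around */
    uint64_t carry = (r.lo < a.lo);  /* the sum wrapped iff there was a carry out */
    r.hi = a.hi + b.hi + carry;      /* like ADC: high limbs plus the carry */
    return r;
}
```

In assembly the comparison disappears: ADD leaves the carry in CF and ADC consumes it, so the whole operation is two instructions.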

Chapter 10 — Control Flow translates every C control structure into assembly. If/else, while, for, do-while, and switch/case all have assembly equivalents, and understanding them makes you a significantly better debugger. The chapter also covers the CMOV family — conditional move instructions that implement branches without actually branching — and explains when branchless code wins on modern processors and when it makes things worse.
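The idea behind branchless selection can be sketched in C without any conditional jump in the source. The function name select_branchless is a hypothetical one, and this mask trick only imitates what CMOV achieves; whether a compiler emits CMOV, a branch, or something else for any given C code is up to the compiler.

```c
#include <stdint.h>

/* Returns a when cond is nonzero and b otherwise, without an if statement. */
static int64_t select_branchless(int cond, int64_t a, int64_t b)
{
    /* mask is all ones when cond is true, all zeros when it is false */
    uint64_t mask = (uint64_t)0 - (uint64_t)(cond != 0);
    uint64_t bits = ((uint64_t)a & mask) | ((uint64_t)b & ~mask);
    /* assumes the usual two's-complement conversion used by x86-64 compilers */
    return (int64_t)bits;
}
```

Both inputs are always computed and one is discarded, which is the trade-off that decides whether the branchless form is a win.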

Chapter 11 — The Stack and Function Calls is the chapter that makes everything else click. You will work through the System V AMD64 calling convention from scratch: the six-register convention for integer and pointer arguments, callee-saved versus caller-saved registers, the 16-byte stack alignment requirement, and the full prologue/epilogue sequence. The stack frame diagrams show exact memory layouts with real addresses. This chapter also plants the seeds for the buffer overflow exploit in Chapters 35–37: the return address sitting on the stack is exactly what Chapter 35 will target.

Chapter 12 — Arrays, Strings, and Data Structures covers the REP-prefix string instructions (MOVSB, STOSB, SCASB, CMPSB), linked list traversal in assembly, and the struct layout rules that make [rbp - 24] refer to a specific field. The implementations of strlen, strcpy, memset, and memcmp in assembly show exactly what the C standard library is doing under the hood.
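For reference, the byte-at-a-time logic that the string instructions express can be written in C as follows. This is a simplified sketch with hypothetical names (my_strlen, my_memset); production C libraries use far more optimized implementations, so treat it as the shape of the algorithm rather than what any particular libc ships.

```c
#include <stddef.h>

/* Hypothetical strlen: scan forward one byte at a time until the NUL,
   the same search that a REPNE SCASB loop performs. */
static size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0') {
        p++;
    }
    return (size_t)(p - s);
}

/* Hypothetical memset: store the same byte n times, the job of REP STOSB. */
static void *my_memset(void *dst, int value, size_t n)
{
    unsigned char *p = dst;
    for (size_t i = 0; i < n; i++) {
        p[i] = (unsigned char)value;
    }
    return dst;
}
```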

Chapter 13 — Bit Manipulation covers the full bitwise toolkit: BT/BTS/BTR/BTC for testing and modifying individual bits, BSF/BSR and their more predictable modern counterparts TZCNT/LZCNT, POPCNT, and the BMI1/BMI2 instruction sets that give you hardware-accelerated operations such as bit-field extract (BEXTR) and parallel bit deposit (PDEP). This chapter also opens the encryption tool anchor example: a streaming XOR cipher in assembly, which sets up the AES-NI implementation in Chapter 15.
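As a preview of the anchor example, a streaming XOR cipher fits in a few lines of C. The function name xor_stream and the repeating-key scheme are assumptions made for this sketch, not the book's implementation, and a repeating-key XOR offers no real security.

```c
#include <stddef.h>
#include <stdint.h>

/* XORs buf in place with a repeating key. key_len must be nonzero.
   Running it twice with the same key restores the original bytes,
   because x ^ k ^ k == x. */
static void xor_stream(uint8_t *buf, size_t len,
                       const uint8_t *key, size_t key_len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key[i % key_len];
    }
}
```

The same function both encrypts and decrypts, which is what makes XOR a convenient first cipher to translate into assembly.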

Chapter 14 — Floating Point covers the three floating-point subsystems on x86-64: the legacy x87 stack-based FPU, the SSE/SSE2 scalar instructions that compilers actually use, and a preview of AVX. You will learn why == comparisons on floats so often give the wrong answer, seen at the hardware level, how the MXCSR register controls floating-point exceptions, and what the performance cliff caused by denormal numbers looks like at the instruction level.
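The classic symptom is easy to reproduce from C. In the sketch below, naive_equal and nearly_equal are hypothetical helpers, and the relative tolerance is an arbitrary choice for illustration; the right tolerance always depends on the computation.

```c
#include <stdbool.h>

/* Exact bitwise-value comparison, the thing == does on doubles. */
static bool naive_equal(double a, double b)
{
    return a == b;
}

static double abs_double(double x)
{
    return x < 0.0 ? -x : x;
}

/* Hypothetical tolerance check: true when a and b differ by at most
   rel_tol times the larger magnitude. NaN inputs compare as not equal. */
static bool nearly_equal(double a, double b, double rel_tol)
{
    double diff = abs_double(a - b);
    double scale = abs_double(a) > abs_double(b) ? abs_double(a)
                                                 : abs_double(b);
    return diff <= rel_tol * scale;
}
```

With IEEE 754 doubles, as used by SSE2 scalar code on x86-64, naive_equal(0.1 + 0.2, 0.3) is false because neither 0.1 nor 0.2 is exactly representable in binary, while nearly_equal with a small tolerance such as 1e-12 accepts the pair.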

Chapter 15 — SIMD Programming completes Part II with the highest-performance topic in the book. The XMM, YMM, and ZMM register files, packed arithmetic, shuffle instructions, and auto-vectorization are covered in full. The chapter culminates in the AES-NI implementation: hardware-accelerated AES-128 encryption using the AESENC family of instructions, completing the encryption tool anchor example that started in Chapter 13.
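For a first taste of packed arithmetic, the C sketch below adds sixteen bytes at once using SSE2 intrinsics from emmintrin.h; _mm_add_epi8 corresponds to the PADDB instruction. The function name add_u8x16 is hypothetical, and the scalar fallback is an assumption added only so the sketch still builds on targets without SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Adds 16 unsigned bytes from a and b into out, wrapping modulo 256. */
static void add_u8x16(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
#if defined(__SSE2__)
    /* Unaligned loads: the pointers need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* One packed add handles all 16 lanes. */
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
#else
    /* Scalar fallback: same result, one lane per iteration. */
    for (int i = 0; i < 16; i++) {
        out[i] = (uint8_t)(a[i] + b[i]);
    }
#endif
}
```

The fallback loop is the scalar code the vector version replaces: sixteen iterations collapse into a single packed add.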

A Note on the Anchor Examples

Four anchor examples run through this textbook. Part II introduces or extends all of them:

The MinOS kernel gets a function call convention in Chapter 11. The C-to-assembly comparison appears in every chapter, grounding each new instruction set in familiar territory. The XOR-to-AES-NI encryption tool starts in Chapter 13 with a simple XOR cipher and culminates in Chapter 15 with a real AES-NI implementation. The buffer overflow storyline begins in Chapter 11 when you see the return address sitting on the stack for the first time — a fact you will remember when you reach Chapter 35.

Every chapter contains working NASM code you can assemble and run. Run it. Single-step through it in GDB. Change a value and see what breaks. The instruction set is not a list to be memorized; it is a vocabulary to be used.
