Appendix A: Glossary

Terms are defined as they apply to assembly language programming and systems programming. Cross-references to other glossary terms are indicated with bold type.

A

A20 gate A hardware mechanism on x86 PCs that controls whether address bit 20 is enabled. In real mode, the 8086 processor wrapped around at 1 MB; enabling A20 allows access above 1 MB. Required during bootloader initialization before switching to protected or long mode.
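
The wrap-around can be illustrated with a small model of real-mode address formation. This is a sketch only: the function name real_mode_physical is made up for illustration, and it assumes the classic segment * 16 + offset calculation with A20 modeled as a mask on address bit 20.

```python
def real_mode_physical(segment: int, offset: int, a20_enabled: bool) -> int:
    """Model the physical address produced by a real-mode segment:offset pair."""
    # Real mode: linear = segment * 16 + offset, which can reach 0x10FFEF.
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        # With the gate closed, address bit 20 is forced to 0,
        # so addresses at or above 1 MB wrap to low memory.
        linear &= ~(1 << 20)
    return linear
```

With A20 disabled, FFFF:0010 lands on physical address 0; with A20 enabled, the same pair reaches 0x100000.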

ABI (Application Binary Interface) The contract between compiled code modules specifying how they interact at the machine level: which registers pass arguments, which are caller-saved vs. callee-saved, stack alignment requirements, structure layout, and how system calls are invoked. See System V AMD64 ABI, AAPCS64.

AAPCS64 (Arm Architecture Procedure Call Standard for AArch64) The ARM64 calling convention. Arguments passed in X0-X7; return value in X0; callee-saved: X19-X28 and X29 (FP). X30 (LR) holds the return address and must be saved by any function that makes further calls. The stack must be 16-byte aligned.

ASLR (Address Space Layout Randomization) A kernel security feature that randomizes the base addresses of the stack, heap, shared libraries, and (with PIE) the executable itself at each load. Reduces the predictability of addresses needed for exploitation. On Linux x86-64, provides approximately 28 bits of entropy for library addresses.

ASan (AddressSanitizer) A compiler instrumentation tool that detects memory errors (buffer overflows, use-after-free, double-free) at runtime by adding shadow memory and instrumented access checks around every memory operation. Incurs roughly 2x runtime overhead and 3x memory overhead.

AT&T syntax The assembly syntax used by default in GAS (GNU Assembler), GDB, and objdump. Operand order is source, destination (opposite of Intel syntax). Register names are prefixed with %; immediates with $. Size suffixes (b, w, l, q) are appended to mnemonics.

auto-vectorization A compiler optimization that transforms scalar loop code into SIMD instructions (SSE, AVX, NEON) automatically, without explicit intrinsics. Requires loops with regular access patterns, no data dependencies between iterations, and sufficient iteration count.

AVX (Advanced Vector Extensions) Intel SIMD extension introducing 256-bit YMM registers (doubling SSE's 128-bit XMM). AVX2 extended most integer operations to 256-bit. AVX-512 further extended to 512-bit ZMM registers and added mask registers.

B

base pointer (RBP) Register used in the x86-64 standard frame layout to point to the bottom of the current stack frame. push rbp; mov rbp, rsp is the standard prologue. Allows frame-relative addressing ([rbp-8]) regardless of RSP changes during the function. Omitted when -fomit-frame-pointer is used.

branch prediction CPU hardware mechanism that guesses the outcome of conditional branches before the condition is evaluated, allowing speculative instruction fetch and execution. A mispredicted branch causes a pipeline flush and a penalty of approximately 15-20 cycles on modern Intel microarchitectures.

BSS segment The section of a binary or in-memory program layout that holds zero-initialized global and static variables. BSS data is not stored in the binary (only the size is recorded); the OS zero-initializes it at load time. Named after "Block Started by Symbol," a historical assembler directive.

C

cache line The minimum unit of data transfer between the CPU cache and main memory. On x86-64 and most ARM64 systems, cache lines are 64 bytes (some ARM64 designs use 128 bytes). Memory accesses that span multiple cache lines or that access data with a stride larger than the cache line size reduce cache efficiency.
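
To make the "spans multiple cache lines" case concrete, the following sketch counts how many lines a single access touches. The helper name cache_lines_touched and the 64-byte default are assumptions for illustration.

```python
CACHE_LINE_SIZE = 64  # assumed line size in bytes


def cache_lines_touched(address: int, size: int,
                        line_size: int = CACHE_LINE_SIZE) -> int:
    """Return how many cache lines an access of `size` bytes at `address` covers."""
    if size <= 0:
        return 0
    first_line = address // line_size
    last_line = (address + size - 1) // line_size
    return last_line - first_line + 1
```

An 8-byte load at address 60 covers bytes 60-67 and therefore touches two lines, while the same load at address 64 touches one.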

callee-saved register A register that a called function is responsible for preserving. If a function uses a callee-saved register, it must save the original value (typically by pushing it to the stack) and restore it before returning. In the System V AMD64 ABI, callee-saved registers are: RBX, RBP, R12-R15.

caller-saved register A register that the calling function is responsible for saving before a function call, if its value is needed afterward. The called function may freely modify caller-saved registers. In the System V AMD64 ABI, caller-saved registers include: RAX, RCX, RDX, RSI, RDI, R8-R11.

canary (stack canary) A random value placed on the stack between local variables and the saved return address during function prologue. Checked in the epilogue; if overwritten (indicating a buffer overflow), __stack_chk_fail is called. The canary value is read from fs:0x28 (thread-local storage). The low byte is always 0x00 to foil string-based overflows.

CET (Control-flow Enforcement Technology) Intel hardware security feature with two components: SHSTK (Shadow Stack) and IBT (Indirect Branch Tracking). SHSTK maintains a hardware-protected copy of return addresses to detect ROP attacks. IBT requires valid indirect call targets to begin with ENDBR64.

CFI (Control-Flow Integrity) A class of security mitigation that restricts where indirect calls and returns may transfer control. Software CFI (Clang -fsanitize=cfi) checks target validity at each indirect call. Hardware CFI is implemented by Intel CET and ARM PAC/BTI.

CISC (Complex Instruction Set Computer) An ISA design philosophy allowing variable-length instructions with many addressing modes, memory operands, and operations that take multiple cycles to execute. x86-64 is the dominant modern CISC architecture.

context switch The kernel operation of saving the CPU register state of one thread and restoring the state of another, allowing multiple threads to share one CPU. The saved state (including all general-purpose registers, instruction pointer, flags, and segment registers) is called the thread context.

CPUID An x86-64 instruction that returns CPU identification and feature information. Input: EAX (leaf number), sometimes ECX (sub-leaf). Output: EAX, EBX, ECX, EDX with feature bits. Used at runtime to detect support for SSE4.2, AVX, AES-NI, and other extensions before using them.

D

data segment (.data) The section of a binary holding initialized global and static variables. Unlike BSS, .data is stored verbatim in the binary file and mapped into process memory at load time.

DWARF A standard debug information format embedded in ELF binaries. Stores the mapping from machine code addresses to source file, line number, and variable names. Used by GDB, LLDB, and analysis tools such as Ghidra to display source-level information when debugging compiled code.

E

ELF (Executable and Linkable Format) The standard binary file format for Linux and most Unix-like systems. Contains program headers (describing runtime segments), section headers (describing the file structure), and the binary code/data. Parsed by the kernel's ELF loader at execve time and by the dynamic linker (ld-linux.so) for shared library resolution.

ENDBR64 The x86-64 instruction marking a valid IBT target for indirect branches. Encoding: F3 0F 1E FA. On CPUs without CET, executes as a no-op because the encoding falls in the reserved hint-NOP opcode space (0F 1E). Emitted by GCC/Clang at every function entry and every valid indirect branch target when -fcf-protection=full is enabled.

epilogue The sequence of instructions at the end of a function that restores callee-saved registers, destroys the stack frame, and executes ret. Standard x86-64 epilogue: leave (equivalent to mov rsp, rbp; pop rbp) followed by ret. When a stack canary is used, the epilogue also verifies the canary value before ret.

F

far pointer A pointer that includes both a segment selector and an offset, used for inter-segment transfers. Used in the x86 bootloader during mode switches (e.g., jmp 0x0008:long_mode_entry to reload CS and switch to 64-bit code segment semantics).

FLAGS / RFLAGS The x86-64 flags register. Key bits: CF (carry), PF (parity), AF (auxiliary carry), ZF (zero), SF (sign), OF (overflow), DF (direction, used by string instructions), IF (interrupt enable). Condition codes in jcc instructions test combinations of these bits.
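
As a rough illustration of how the status bits are derived, the sketch below models an 8-bit add and reports the six arithmetic flags. The function name add8_flags is hypothetical, and DF and IF are left out because arithmetic instructions do not change them.

```python
def add8_flags(a: int, b: int) -> dict:
    """Model `add al, bl` on 8-bit operands and return the result plus status flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        "CF": total > 0xFF,                              # unsigned carry out of bit 7
        "PF": bin(result).count("1") % 2 == 0,           # even number of 1 bits in the low byte
        "AF": (a & 0xF) + (b & 0xF) > 0xF,               # carry out of bit 3
        "ZF": result == 0,
        "SF": bool(result & 0x80),                       # copy of the sign bit
        "OF": bool((a ^ result) & (b ^ result) & 0x80),  # signed overflow
    }
```

For example, 0x7F + 0x01 sets OF and SF but not CF, which is the pattern a signed-overflow check such as jo tests for.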

frame pointer See base pointer (RBP).

full RELRO A linker/loader security option that makes the GOT read-only after dynamic linking completes. Prevents attackers who achieve an arbitrary write primitive from overwriting GOT entries to redirect function calls. More secure than Partial RELRO but slightly increases load time due to eager binding.

G

gadget (ROP gadget) A short sequence of existing instructions ending with ret (or another indirect control transfer) that can be chained together to perform useful operations without injecting new code. Used in Return-Oriented Programming.

GAS (GNU Assembler) The assembler included in GNU Binutils, used by GCC as its back-end. Uses AT&T syntax by default but accepts Intel syntax with .intel_syntax noprefix. Supports all major architectures (x86-64, ARM64, RISC-V, MIPS, etc.).

GDB (GNU Debugger) The standard debugger for Linux. Supports source-level and assembly-level debugging, hardware and software breakpoints, memory inspection, backtrace display, scripting via Python, and remote debugging via gdbserver. Extended by pwndbg for security research workflows.

GDT (Global Descriptor Table) An x86 data structure that defines memory segments. Required for protected mode and long mode operation. In 64-bit mode, most segmentation is disabled (base = 0, limit = full address space), but the GDT still defines code and data segment access rights and must include a TSS descriptor so the CPU can find the kernel stack pointers used on interrupts and privilege-level changes.

Ghidra A free, open-source reverse engineering suite developed by the NSA. Features include: disassembly, decompilation to C pseudocode, cross-reference analysis, symbol renaming, script automation, and collaborative multi-user analysis. Competes with IDA Pro for professional RE tasks.

GOT (Global Offset Table) A table of pointers to globally defined symbols (functions and variables) used by the dynamic linker. For functions, the GOT is populated lazily (at first call) via the PLT. Under Full RELRO, the GOT is made read-only after eager binding at load time.

GPGPU (General-Purpose GPU Computing) The use of GPUs for non-graphics computation. GPU architectures (for example NVIDIA's SIMT model, programmed through the PTX virtual ISA and the native SASS ISA) run thousands of parallel threads executing the same instruction on different data. On NVIDIA hardware, threads are scheduled in groups of 32 called warps.

H

heap The region of a process's address space used for dynamic memory allocation (malloc/free, new/delete). Managed by the allocator library (glibc's ptmalloc, jemalloc, tcmalloc). Unlike the stack, heap allocations persist until explicitly freed and may be reused. Heap corruption vulnerabilities include use-after-free and double-free.

I

IBT (Indirect Branch Tracking) The CET component that prevents attacker-controlled indirect branches (via call rax, jmp rbx) from targeting arbitrary code. Enforces that all valid indirect call/jump targets begin with ENDBR64. Reducing the set of valid targets significantly degrades gadget availability.

IDT (Interrupt Descriptor Table) An x86 data structure (analogous to the IVT in real mode) that maps interrupt and exception vectors (0-255) to handler addresses, privilege levels, and descriptor types. Loaded with lidt. The ISR for each vector saves registers, calls a C handler, and returns with iret.

instruction latency The number of clock cycles from when an instruction begins execution until its result is available for use by a dependent instruction. For example, an L1 cache load has latency 4-5 cycles; an FP divide may have latency 20+ cycles.

instruction throughput The rate at which independent instances of an instruction can be issued, usually reported as reciprocal throughput: the average number of clock cycles per instruction. An instruction with reciprocal throughput 0.5 can issue at 2 per cycle; one with reciprocal throughput 4 can issue only once every 4 cycles. See Agner Fog's instruction tables for per-architecture measurements.

Intel syntax The assembly syntax used in Intel documentation, NASM, and MASM. Operand order is destination, source. Register names are bare (no % prefix); immediates are bare (no $ prefix). Size is specified by register width or BYTE PTR/WORD PTR/DWORD PTR/QWORD PTR qualifiers.

interrupt An asynchronous signal to the CPU from hardware (hardware interrupt) or a synchronous CPU-generated exception (trap, fault, abort). Causes the CPU to save context and transfer to an interrupt service routine (ISR) via the IDT. Hardware interrupts are routed through the PIC or APIC.

J

JIT (Just-In-Time compilation) A technique that generates machine code at runtime, enabling dynamic optimization of interpreted code. Requires: allocating writable memory (mmap with PROT_READ|PROT_WRITE), writing instruction bytes, making the region executable (mprotect with PROT_READ|PROT_EXEC), and calling the generated code. Used in V8, HotSpot JVM, LLVM ORC, and WebAssembly runtimes.

JOP (Jump-Oriented Programming) A variant of ROP that uses jmp-terminated gadgets instead of ret-terminated ones. Harder to construct automatically but can bypass SHSTK (which only enforces ret). Combined with Blind ROP techniques for remote exploitation.

L

leaf function A function that makes no further function calls. Leaf functions can use the red zone (the 128 bytes below RSP) for local storage without adjusting RSP. They also omit the standard prologue/epilogue when -fomit-frame-pointer is active.

linker A tool that combines compiled object files (.o files) and libraries into a final executable or shared library. Resolves external symbol references, assigns final addresses, and produces the output binary. The GNU linker is ld; LLVM's linker is lld.

LLVM IR The intermediate representation used by the LLVM compiler framework. A typed, SSA-form, three-address instruction set. Written in a human-readable text format or compact bitcode. Optimization passes operate on LLVM IR before architecture-specific code generation.

long mode The x86-64 operating mode providing 64-bit addressing and register widths. Requires a 64-bit GDT code segment, a page table (CR3 must be valid), and CR4.PAE + EFER.LME + CR0.PG bits set. The mode in which all 64-bit Linux and Windows processes run.

M

memory-mapped I/O (MMIO) A technique for communicating with hardware by reading and writing to physical addresses that are wired to device registers rather than RAM. For example, the VGA text buffer at physical address 0xB8000 and the APIC at 0xFEE00000 are accessed via MMIO.

ModRM byte A byte in x86 instruction encoding that specifies the addressing mode and operands. Bits 7-6: mod (register, memory, or memory+displacement). Bits 5-3: reg (register or opcode extension). Bits 2-0: r/m (register or memory base). Together with the REX prefix and optional SIB byte, encodes all operand combinations.
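
The three fields can be pulled apart with shifts and masks. The following is a minimal sketch; decode_modrm is an illustrative name, and the function only splits the byte without interpreting REX extensions or SIB/displacement rules.

```python
def decode_modrm(byte: int) -> dict:
    """Split a ModRM byte into its mod, reg, and r/m fields."""
    return {
        "mod": (byte >> 6) & 0b11,   # bits 7-6
        "reg": (byte >> 3) & 0b111,  # bits 5-3
        "rm": byte & 0b111,          # bits 2-0
    }
```

For the instruction bytes 89 D8 (mov eax, ebx), the ModRM byte 0xD8 decodes to mod=3 (register-direct), reg=3 (EBX), r/m=0 (EAX).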

MSFP (Microarchitecture-Specific Function Prologue) Not a standard term — see prologue.

N

NASM (Netwide Assembler) A portable x86 assembler using Intel syntax with its own macro system. Supports output formats including ELF64, Mach-O, Win64 (COFF objects), and flat binary (for bootloaders). The assembler used in this book for all x86-64 examples.

NOP sled A sequence of nop (0x90) instructions before shellcode. Increases the target area for an imprecise return address: jumping anywhere in the sled slides execution forward into the shellcode. Less effective with modern ASLR.

NX bit (No-Execute bit) Bit 63 of x86-64 page table entries. When set, the CPU refuses to fetch instructions from that page, causing a page fault (#PF). The enforcement mechanism for DEP. ARM64 equivalent: the XN (Execute Never) bit. Called the NX bit on AMD, XD (Execute Disable) on Intel.
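
A page table entry's permission bits can be tested with simple masks. This sketch uses hypothetical names (PTE_NX, pte_permissions) and looks at one leaf entry in isolation; on real hardware the effective permissions also depend on the upper-level entries and on NX support being enabled.

```python
PTE_PRESENT = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_USER = 1 << 2
PTE_NX = 1 << 63  # No-Execute bit


def pte_permissions(pte: int) -> dict:
    """Report the basic permission bits of a single x86-64 page table entry."""
    return {
        "present": bool(pte & PTE_PRESENT),
        "writable": bool(pte & PTE_WRITABLE),
        "user": bool(pte & PTE_USER),
        "executable": not (pte & PTE_NX),
    }
```

An entry such as 0x8000000000001003 describes a present, writable, kernel-only page from which instruction fetches fault.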

O

out-of-order execution A CPU microarchitecture technique that executes instructions in an order determined by data availability and execution unit availability, not the program order. Allows a CPU to extract instruction-level parallelism from a sequential instruction stream. Tracked by the reorder buffer (ROB).

objdump A GNU Binutils tool for inspecting binary files. Commonly used flags: -d (disassemble executable sections), -M intel (Intel syntax), -t (symbol table), -T (dynamic symbol table), -r (relocation entries), -x (all headers). The primary tool for static binary analysis without a full RE suite.

P

page (memory page) The minimum unit of virtual memory management on x86-64 and ARM64 systems. Standard page size: 4 KiB (4096 bytes). Huge pages on x86-64: 2 MiB or 1 GiB. Each page has its own protection attributes (read/write/execute) managed by the page table.

page fault An exception (#PF on x86, Data Abort / Instruction Abort on ARM64) generated when a memory access has no valid page table entry, when a write occurs to a read-only page, or when a fetch occurs from a non-executable page. Handled by the kernel's page fault handler, which may allocate a new physical page (demand paging) or terminate the process (segmentation fault / SIGSEGV).

page table A hierarchical data structure that maps virtual addresses to physical addresses. x86-64 uses a 4-level table (PML4 → PDPT → PD → PT), each level indexed by 9 bits of the virtual address. Each entry contains the physical base address of the next level table or page, plus control bits (Present, Writable, User, NX, etc.).
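
The 9-bit indexing can be sketched as follows. The function name split_virtual_address is illustrative, and the sketch assumes 4 KiB pages with 4-level paging (a 12-bit page offset below four 9-bit indices).

```python
def split_virtual_address(va: int) -> dict:
    """Split a 48-bit x86-64 virtual address into its four table indices and page offset."""
    return {
        "pml4": (va >> 39) & 0x1FF,  # bits 47-39
        "pdpt": (va >> 30) & 0x1FF,  # bits 38-30
        "pd": (va >> 21) & 0x1FF,    # bits 29-21
        "pt": (va >> 12) & 0x1FF,    # bits 20-12
        "offset": va & 0xFFF,        # bits 11-0
    }
```

For instance, the first address of the canonical upper half, 0xFFFF800000000000, selects PML4 entry 256 with every other index equal to 0.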

partial RELRO A linker security option (enabled by default by most Linux toolchains) that makes some sections (.init_array, .fini_array, .jcr) read-only after startup but leaves the function-pointer portion of the GOT (.got.plt) writable so that lazy binding can patch it. Less secure than Full RELRO but avoids the startup overhead of eager symbol binding.

PIC (Position-Independent Code) Code that can execute correctly regardless of where it is loaded in memory, using PC-relative addressing for all data and code references. Required for shared libraries. For executables, produces PIE binaries that support ASLR.

PIE (Position-Independent Executable) An executable compiled with -fPIE and linked with -pie, allowing the kernel to load it at a random base address (ASLR). Without PIE, the executable is loaded at a fixed address (typically 0x400000), making ROP gadgets predictable.

PIT (Programmable Interval Timer) The Intel 8253/8254 hardware timer present on x86 PCs. Generates periodic interrupts (IRQ 0) at a programmable frequency. Base frequency: 1,193,182 Hz. Programming the PIT to divide by 11,932 produces approximately 100 Hz (100 timer interrupts per second), used for preemptive scheduling in MinOS.
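
The divisor arithmetic can be checked with a few lines. This is a sketch under stated assumptions: the helper names are made up, and the divisor is restricted to 1-65535 for simplicity.

```python
PIT_BASE_HZ = 1_193_182  # PIT input clock in Hz


def pit_divisor(target_hz: float) -> int:
    """Return the 16-bit divisor that best approximates the requested interrupt rate."""
    divisor = round(PIT_BASE_HZ / target_hz)
    if not 1 <= divisor <= 65535:
        raise ValueError("target frequency is out of range for a 16-bit divisor")
    return divisor


def pit_actual_hz(divisor: int) -> float:
    """Return the interrupt rate actually produced by a given divisor."""
    return PIT_BASE_HZ / divisor
```

Requesting 100 Hz gives a divisor of 11,932, and the rate actually produced is just under 100 Hz because the base frequency is not an exact multiple.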

PLT (Procedure Linkage Table) A table of small stub functions used for lazy binding of shared library calls. Each PLT entry: on the first call, invokes the dynamic linker to resolve the symbol and patch the GOT; on subsequent calls, jumps directly through the GOT to the resolved function. The PLT is the mechanism that makes call printf@plt work.

prologue The sequence of instructions at the start of a function that establishes the stack frame and saves callee-saved registers. Standard x86-64 prologue: push rbp; mov rbp, rsp; sub rsp, N (where N is the local variable space). When a stack canary is used, the prologue also reads from fs:0x28 and stores the canary at [rbp-8].

protected mode An x86 operating mode (32-bit) with hardware memory protection and segmentation. The first step when transitioning from 16-bit real mode during OS boot. Protected mode enables segmentation with limit checks and permission bits, but does not yet enable the 64-bit register widths or full virtual address space of long mode.

pwndbg A GDB plugin that enhances GDB for security research and CTF work. Adds display panels showing register values, stack content, disassembly, and heap state at each prompt. Provides commands like cyclic (de Bruijn pattern generation), checksec, vmmap, and heap for exploit development workflows.

pwntools A Python library for exploit development and CTF challenges. Provides: process (local process interaction), remote (network interaction), p64/u64 (struct.pack wrappers), ELF (binary analysis), ROP (gadget finder), asm/disasm (inline assembly). The standard tool for CTF pwn challenges.
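
What the p64/u64 helpers do can be shown with the standard library alone. The functions below are stand-ins written for illustration (pack64_le and unpack64_le are not pwntools names) and assume little-endian byte order, as on x86-64.

```python
import struct


def pack64_le(value: int) -> bytes:
    """Pack an integer into 8 little-endian bytes, as an address appears in memory on x86-64."""
    return struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)


def unpack64_le(data: bytes) -> int:
    """Unpack exactly 8 little-endian bytes back into an integer."""
    return struct.unpack("<Q", data)[0]
```

Packing the address 0x401136 yields the bytes 36 11 40 00 00 00 00 00, which is the order they would be written into an overflowed buffer.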

R

RDTSC (Read Time-Stamp Counter) An x86-64 instruction that reads the processor's time-stamp counter (cycle count since reset) into EDX:EAX. Used for high-resolution timing without OS calls. Use LFENCE; RDTSC; LFENCE to prevent out-of-order reordering around the measurement point.

real mode The initial x86 CPU operating mode at power-on, providing a 1 MB address space with 16-bit registers and no memory protection. x86 BIOS boots in real mode; OS bootloaders run in real mode before switching to protected mode and then long mode.

red zone In the System V AMD64 ABI, the 128 bytes below RSP that leaf functions can use as scratch space without adjusting RSP. To keep it intact, the kernel skips past the red zone before building a signal frame on the user stack. Kernel code itself is typically compiled with -mno-red-zone, because hardware interrupts push state directly below RSP and would overwrite it. Not present in the Windows x64 ABI.

REX prefix A one-byte prefix (0x40-0x4F) in x86-64 that extends instruction encoding to access 64-bit register widths (REX.W bit) and the additional registers R8-R15, XMM8-XMM15 (REX.R, REX.X, REX.B bits). For example, the byte 0x48 is a REX prefix with W=1 (64-bit operand size).
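
The four flag bits sit in the low nibble of the prefix and can be extracted as in this sketch. The name decode_rex is illustrative, and the function only checks that the byte lies in the 0x40-0x4F range.

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix byte (0x40-0x4F) into its W, R, X, and B bits."""
    if byte & 0xF0 != 0x40:
        raise ValueError("not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModRM reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends the ModRM r/m or SIB base field
    }
```

The common prefix 0x48 decodes to W=1 with R, X, and B clear, while 0x4C additionally sets R to reach R8-R15 in the reg field.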

RELRO (Relocation Read-Only) See Full RELRO and Partial RELRO.

RISC (Reduced Instruction Set Computer) An ISA design philosophy with fixed-width, simple instructions, large register files, and load-store architecture (computation is register-to-register; memory access is separate). ARM64 and RISC-V are modern RISC architectures.

ROP (Return-Oriented Programming) An exploitation technique that chains together short sequences of existing code (gadgets), each ending with ret, by placing their addresses on the stack. Allows attacker-controlled computation without injecting new code, bypassing NX/DEP. Mitigated by CET SHSTK and SafeStack.

S

SIB byte (Scale-Index-Base byte) An optional byte in x86 instruction encoding that follows the ModRM byte when base+index addressing is needed. Bits 7-6: scale (a 2-bit shift amount, 0-3, applied to the index, giving a factor of 1, 2, 4, or 8). Bits 5-3: index register. Bits 2-0: base register. Enables expressions like [rbx + rcx*8 + 0x10].
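
A minimal decoding sketch follows; decode_sib is an illustrative name. It converts the 2-bit scale field into the multiplier it stands for and ignores the REX.X/REX.B extensions.

```python
def decode_sib(byte: int) -> dict:
    """Split a SIB byte into scale factor, index register number, and base register number."""
    scale_bits = (byte >> 6) & 0b11
    return {
        "scale": 1 << scale_bits,      # 0..3 encodes a factor of 1, 2, 4, or 8
        "index": (byte >> 3) & 0b111,  # bits 5-3
        "base": byte & 0b111,          # bits 2-0
    }
```

For [rbx + rcx*8 + 0x10], the SIB byte is 0xCB: scale field 3 (factor 8), index 1 (RCX), base 3 (RBX).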

shadow stack (SHSTK) The CET hardware mechanism that maintains a second, hardware-protected stack containing only return addresses. When call executes, the return address is pushed to both the regular stack and the shadow stack. When ret executes, the CPU compares the top of both stacks; a mismatch raises a #CP (Control Protection) exception. Shadow stack pages have a unique combination of page table bits (Writable=0, Dirty=1) that the kernel and user space can use to identify them.

SIMD (Single Instruction, Multiple Data) A class of instruction that applies one operation to multiple data elements in parallel. x86-64 SIMD extensions: MMX (64-bit), SSE/SSE2 (128-bit, XMM registers), AVX/AVX2 (256-bit, YMM registers), AVX-512 (512-bit, ZMM registers). ARM64: NEON (128-bit, V registers).

SIGRETURN (signal return) The system call (number 15 on x86-64) that restores process state after a signal handler returns. The kernel expects a sigcontext structure on the stack containing all register values. In SROP, attackers forge a fake sigcontext to set arbitrary register values by triggering a SIGRETURN via ROP.

SSA (Static Single Assignment) A property of an intermediate representation where each variable is assigned exactly once. LLVM IR is in SSA form. SSA simplifies data flow analysis and optimization because definitions and uses have a direct correspondence without aliasing.

stack The LIFO data structure used by function calls for return addresses, saved registers, and local variables. On x86-64, the stack grows downward (toward lower addresses). RSP points to the top of the stack (the lowest-addressed used location). push decrements RSP and writes; pop reads and increments RSP.
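
The push and pop semantics can be modeled in a few lines. This is a toy sketch: the class name Stack64 is hypothetical, memory is a dictionary keyed by address, and every slot is 8 bytes.

```python
MASK64 = (1 << 64) - 1


class Stack64:
    """Toy model of the x86-64 stack: it grows toward lower addresses in 8-byte slots."""

    def __init__(self, rsp: int):
        self.rsp = rsp
        self.memory = {}

    def push(self, value: int) -> None:
        self.rsp -= 8                          # decrement RSP first
        self.memory[self.rsp] = value & MASK64  # then write at the new top

    def pop(self) -> int:
        value = self.memory[self.rsp]          # read from the current top
        self.rsp += 8                          # then increment RSP
        return value
```

Starting from RSP = 0x7FFFFFFFE000, two pushes leave RSP at 0x7FFFFFFFDFF0, and the values come back in reverse order.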

System V AMD64 ABI The calling convention used by Linux, macOS, and BSDs for x86-64. Arguments: RDI, RSI, RDX, RCX, R8, R9 (integer/pointer); XMM0-XMM7 (floating point). Return: RAX (integer/pointer), XMM0 (float). Callee-saved: RBX, RBP, R12-R15. Stack must be 16-byte aligned before call.

syscall (instruction) The x86-64 instruction for entering the kernel via the fast system call path. Loads CS and SS from the IA32_STAR MSR and RIP from the LSTAR MSR, and clears the RFLAGS bits that are set in the SFMASK MSR. On Linux, the system call number is in RAX; arguments in RDI, RSI, RDX, R10, R8, R9 (note: R10 instead of RCX, because syscall overwrites RCX with the return address and R11 with the saved RFLAGS).

T

tail call optimization A compiler optimization that converts a function call in tail position (the last operation before ret) into a direct jmp, avoiding the overhead of a new stack frame. Enables recursive functions to run in constant stack space. Visible in assembly as jmp _other_function replacing call _other_function; ret.

TLB (Translation Lookaside Buffer) A cache of recent virtual-to-physical address translations. On a TLB hit, translation completes in about 1 cycle; on a miss, the CPU must walk the page table (up to 4 memory accesses for x86-64 4-level paging). Context switches that change address spaces and explicit invlpg instructions invalidate TLB entries.

TSS (Task State Segment) An x86-64 data structure that, in 64-bit mode, stores privilege-level stack pointers (RSP0, RSP1, RSP2) and IST (Interrupt Stack Table) stacks used when entering the kernel via interrupts. The GDT must contain a TSS descriptor, and TR must be loaded with ltr.

U

UAF (Use-After-Free) A memory corruption vulnerability where a program continues to use a pointer to an object after that object has been freed. If the freed memory is reallocated for a different object (possibly attacker-controlled), the stale pointer now accesses the new object's data, enabling data corruption or type confusion.

V

virtual address An address as seen by a running process. The CPU translates virtual addresses to physical addresses via the page table. The same virtual address in two different processes can map to different physical addresses, which isolates the processes from one another.

vmlinux The uncompressed, non-stripped ELF binary of the Linux kernel. Used for debugging with GDB (gdb vmlinux), symbol resolution, and generating kernel modules. The boot image (bzImage) is a self-extracting compressed version; vmlinux is the bare ELF.

W

WASM (WebAssembly) A portable, low-level bytecode format designed for safe execution in browsers and other sandboxed environments. Compiled from C, C++, Rust, and other languages. Uses a stack machine execution model, linear memory with explicit bounds checking, and a type system that ensures safety. Runtimes include Wasmtime (native) and browser JIT engines (V8, SpiderMonkey).

X

XMM register A 128-bit register used for SSE SIMD operations (XMM0-XMM15). Can hold: 16 bytes, 8 words, 4 dwords, 2 qwords, 4 single-precision floats, or 2 double-precision floats. The lower 128 bits of YMM (AVX) and ZMM (AVX-512) registers.

XOR (Exclusive OR) A bitwise operation returning 1 when exactly one input bit is 1. In assembly: xor eax, eax zeroes EAX (encoded as 2 bytes, smaller than mov eax, 0). Also the fundamental operation of the XOR cipher and the AddRoundKey step of AES.

Y

YMM register A 256-bit register used for AVX/AVX2 SIMD operations (YMM0-YMM15). The lower 128 bits overlap with the corresponding XMM register. When a VEX-encoded (AVX) instruction writes a 128-bit result to XMM, it zero-extends the upper 128 bits of the corresponding YMM register, avoiding the SSE/AVX transition penalty.

Z

ZMM register A 512-bit register used for AVX-512 SIMD operations (ZMM0-ZMM31). The lower 256 bits overlap with the corresponding YMM register, and the lower 128 bits overlap with XMM. AVX-512 also introduces 8 opmask registers (k0-k7) for per-element conditional operations.

zero-extension In x86-64, any instruction that writes to a 32-bit register (e.g., mov eax, 1) automatically zero-extends the result to 64 bits (the upper 32 bits of RAX become 0). This behavior is intentional and differs from 16-bit and 8-bit writes, which do not zero-extend.
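
The difference between the write widths can be modeled directly. In this sketch the helper names are made up, and each function returns the new 64-bit value of RAX after a write to EAX, AX, or AL.

```python
MASK64 = (1 << 64) - 1


def write_eax(rax: int, value: int) -> int:
    """32-bit write: the upper 32 bits of RAX are cleared."""
    return value & 0xFFFFFFFF


def write_ax(rax: int, value: int) -> int:
    """16-bit write: bits 63-16 of RAX are preserved."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)


def write_al(rax: int, value: int) -> int:
    """8-bit write: bits 63-8 of RAX are preserved."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)
```

Starting from RAX = 0xFFFFFFFFFFFFFFFF, writing 1 to EAX leaves 0x0000000000000001, while writing 1 to AX leaves 0xFFFFFFFFFFFF0001.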