Appendix B: Answers to Selected Exercises and Quiz Questions

Questions and exercises marked with ⭐ in each chapter have answers here. Quiz answers are provided first, followed by exercise answers for each chapter.

Part I: The Machine Model

Chapter 1 Quiz Answers

Q1 (⭐): B — The von Neumann architecture treats instructions and data as values in the same memory. This is what enables self-modifying code, JIT compilation, and buffer overflow exploits (overwriting data that will later be executed as code).

Q3 (⭐): A — A 32-bit register can hold 2^32 distinct values (0 through 4,294,967,295). The calculation: 2^10 = 1,024; 2^20 = 1,048,576; 2^30 = 1,073,741,824; 2^32 = 4,294,967,296.

Q5 (⭐): C — The fetch-decode-execute cycle is the fundamental loop of CPU operation: (1) fetch the instruction bytes at RIP from memory, (2) decode those bytes to determine the operation and operands, (3) execute the operation. Modern CPUs overlap these stages in a pipeline.

Chapter 1 Exercise Answers

Exercise 1.1 (⭐): The four fields in a simple machine instruction are: opcode (what operation to perform), source operand(s) (what data to operate on), destination operand (where to put the result), and next instruction indicator (usually implicit: increment PC by instruction length). The opcode is encoded in the instruction's leading bytes.

Exercise 1.3 (⭐): Binary: 1100 0011. Hex: C3. This is the ret instruction on x86 — a single byte encoding. Knowing this, when you see 0xC3 as the last byte of a function in a hex dump, you immediately recognize it as the return instruction.

Chapter 2 Quiz Answers

Q1 (⭐): B — mov rbp, rsp copies the stack pointer into the base pointer, establishing the stack frame. The encoding 48 89 E5 breaks down as: 48 = REX.W prefix (64-bit operand size), 89 = MOV r/m64,r64 opcode, E5 = ModRM byte (mod=11 register-register, reg=4 for RSP, r/m=5 for RBP).

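Q3 (⭐): A — The xchg eax, eax instruction is a no-op on x86: its short encoding is the single byte 90, which is the NOP opcode. In 32-bit code it exchanges EAX with itself and changes nothing. In 64-bit mode the byte 90 is still treated as a true NOP, so it does not zero-extend into RAX; an xchg eax, eax that clears the upper 32 bits of RAX has to use the two-byte form 87 C0. The single-byte NOP also serves as alignment padding in code, and the multi-byte NOP 0F 1F 40 00 is used for longer alignment gaps.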

Q5 (⭐): C — The LEA instruction (Load Effective Address) computes an address and stores it in the destination register without accessing memory. lea rax, [rbx + rcx*4 + 8] computes rbx + rcx*4 + 8 and puts the result in RAX. It is commonly used for arithmetic (multiplying by 2, 3, 4, 5, 8, 9) without using MUL.

Chapter 2 Exercise Answers

Exercise 2.1 (⭐):
- push rbp: 1 byte (0x55), a special one-byte encoding for push register
- mov rbp, rsp: 3 bytes (48 89 E5), REX + opcode + ModRM
- sub rsp, 0x20: 4 bytes (48 83 EC 20), REX + opcode + ModRM + imm8
- mov [rbp-8], rdi: 4 bytes (48 89 7D F8), REX + opcode + ModRM + disp8
- ret: 1 byte (C3)

Exercise 2.3 (⭐): The encoding 48 8B 04 25 28 00 00 00 disassembles as: mov rax, [0x28] — load from absolute address 0x28. However, with the 64 prefix prepended (full sequence 64 48 8B 04 25 28 00 00 00), it becomes mov rax, fs:[0x28] — a thread-local storage access. This is the canonical way to read the stack canary value in Linux programs.
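
The byte-level breakdown can be checked mechanically. The following is a minimal sketch that handles only this one instruction shape (optional FS prefix, REX, opcode, ModRM, SIB, disp32); the variable names are illustrative, not part of any disassembler API.

```python
import struct

# Bytes from Exercise 2.3, including the optional FS segment-override prefix.
code = bytes.fromhex("64 48 8B 04 25 28 00 00 00")

has_fs_prefix = code[0] == 0x64            # 0x64 = FS segment override
body = code[1:] if has_fs_prefix else code

rex, opcode, modrm, sib = body[0], body[1], body[2], body[3]

# ModRM 04: mod=00, reg=000 (RAX), r/m=100 (a SIB byte follows).
mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7

# SIB 25: index=100 (none), base=101 with mod=00 means "disp32, no base".
sib_index, sib_base = (sib >> 3) & 7, sib & 7

# The last four bytes are a little-endian signed 32-bit displacement.
(disp32,) = struct.unpack("<i", body[4:8])

segment = "fs:" if has_fs_prefix else ""
print(f"mov rax, {segment}[{disp32:#x}]")
```

Dropping the leading 64 byte from the input yields the plain absolute load instead of the thread-local one.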

Chapter 3 Quiz Answers

Q1 (⭐): D — RFLAGS bit 11 is the Overflow Flag (OF), not the Sign Flag (bit 7), Zero Flag (bit 6), or Carry Flag (bit 0). OF is set when a signed arithmetic result exceeds the representable range (e.g., adding two large positive numbers produces a negative result in two's complement).
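
To make the bit positions concrete, here is a small model of a 64-bit ADD that computes the four flags named above and packs them at their RFLAGS positions. It is a sketch for checking your reasoning: the function names are hypothetical, and it models only CF, ZF, SF, and OF (real RFLAGS has other bits as well).

```python
MASK64 = (1 << 64) - 1
FLAG_BITS = {"CF": 0, "ZF": 6, "SF": 7, "OF": 11}  # bit positions in RFLAGS

def sign(v):
    """Return the sign bit (bit 63) of a 64-bit value."""
    return (v >> 63) & 1

def add64_flags(a, b):
    """Simulate a 64-bit ADD; return (result, flags)."""
    a &= MASK64
    b &= MASK64
    full = a + b
    result = full & MASK64
    flags = {
        "CF": int(full > MASK64),                 # unsigned carry out of bit 63
        "ZF": int(result == 0),
        "SF": sign(result),
        # Signed overflow: both operands share a sign and the result's differs.
        "OF": int(sign(a) == sign(b) and sign(result) != sign(a)),
    }
    return result, flags

def pack_rflags(flags):
    """Place each modeled flag at its RFLAGS bit position."""
    return sum(value << FLAG_BITS[name] for name, value in flags.items())

result, flags = add64_flags(0x7FFF_FFFF_FFFF_FFFF, 1)
print(hex(result), flags, hex(pack_rflags(flags)))
```

Adding 1 to the largest positive signed value sets OF and SF but not CF, while adding 1 to all-ones sets CF and ZF but not OF.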

Q3 (⭐): B — shr rax, 1 shifts RAX right by one bit (dividing by 2 for unsigned values). The bit shifted out goes to CF. sar rax, 1 (shift arithmetic right) preserves the sign bit, dividing signed values by 2. The important difference: shr fills with 0; sar fills with the sign bit.
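
The difference between the two shifts is easy to see on a negative value. This sketch models one-bit shr and sar on 64-bit values; shr1 and sar1 are illustrative names.

```python
MASK64 = (1 << 64) - 1

def shr1(x):
    """Logical shift right by 1: fills bit 63 with 0. Returns (result, CF)."""
    x &= MASK64
    return x >> 1, x & 1

def sar1(x):
    """Arithmetic shift right by 1: keeps bit 63. Returns (result, CF)."""
    x &= MASK64
    sign_bit = x & (1 << 63)
    return (x >> 1) | sign_bit, x & 1

minus_eight = -8 & MASK64
print(hex(shr1(minus_eight)[0]))   # large positive value
print(hex(sar1(minus_eight)[0]))   # two's-complement encoding of -4
```

Note that sar rounds toward negative infinity: applied to -7 it yields -4, whereas idiv would yield -3.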

Part II: x86-64 Instruction Set

Chapter 5 Quiz Answers

Q1 (⭐): A — In the System V AMD64 ABI, the first six integer/pointer arguments are passed in registers: RDI, RSI, RDX, RCX, R8, R9 (in that order). The seventh and subsequent arguments are pushed on the stack, right to left (last argument first). Floating-point arguments use XMM0-XMM7 instead.

Q3 (⭐): C — The callee-saved registers in the System V AMD64 ABI are: RBX, RBP, R12, R13, R14, R15. A function that uses any of these must push them at the start and pop them at the end. The rationale is that these registers tend to hold long-lived values (loop indices, pointers to large structures) that the caller wants preserved.

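Q5 (⭐): B — The System V AMD64 ABI red zone is the 128 bytes below RSP. Leaf functions (those making no function calls) can use this space for local variables without adjusting RSP. This avoids the overhead of sub rsp, N in small functions. Signal and interrupt handlers must not clobber it: the kernel skips past the red zone when it builds a signal frame, and kernel code is compiled without a red zone (-mno-red-zone) because a hardware interrupt pushes its frame directly below RSP.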

Chapter 5 Exercise Answers

Exercise 5.1 (⭐): The argument registers in order are: RDI (1st), RSI (2nd), RDX (3rd), RCX (4th), R8 (5th), R9 (6th). A mnemonic: "Does She Dance? Can Roger Run?" — D (RDI), S (RSI), D (RDX), C (RCX), R (R8), R (R9).
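
As a quick self-check, the mapping from argument position to location can be written as a lookup. This sketch assumes every argument is an integer or pointer occupying one 8-byte slot, and reports stack locations as seen at the first instruction of the callee (where [RSP] holds the return address); int_arg_location is a hypothetical helper name.

```python
INT_ARG_REGS = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]

def int_arg_location(n):
    """Where the n-th (1-based) integer/pointer argument lives at function entry."""
    if n < 1:
        raise ValueError("argument positions are 1-based")
    if n <= len(INT_ARG_REGS):
        return INT_ARG_REGS[n - 1]
    # Stack arguments start just above the return address at [RSP].
    offset = 8 + 8 * (n - 7)
    return f"[RSP+{offset}]"

for n in range(1, 9):
    print(n, int_arg_location(n))
```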

Exercise 5.3 (⭐): Stack alignment rule: RSP must be a multiple of 16 immediately before a call instruction executes. Because call pushes the 8-byte return address, RSP is 8 bytes off a 16-byte boundary (RSP mod 16 = 8) at the first instruction of the called function. The compiler restores alignment by padding the local frame allocation. When you write hand-coded assembly that calls C functions, you must maintain this alignment yourself; a misaligned call typically crashes inside library code that uses aligned SSE accesses such as movaps.
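
The padding arithmetic can be sketched as follows. The helper frame_size is hypothetical; it assumes the function has executed a given number of 8-byte pushes (such as push rbp) before its sub rsp, N, and returns the smallest N that covers the requested locals while leaving RSP on a 16-byte boundary.

```python
def frame_size(local_bytes, pushes_after_entry=1):
    """Smallest N for `sub rsp, N` that keeps RSP 16-byte aligned for calls."""
    # At entry RSP % 16 == 8 (call just pushed the return address);
    # each push moves RSP by another 8 bytes.
    misalign = (8 + 8 * pushes_after_entry) % 16
    pad = -(misalign + local_bytes) % 16
    return local_bytes + pad

print(frame_size(0x20))      # push rbp, then 32 bytes of locals
print(frame_size(20))        # 20 bytes of locals rounds up
print(frame_size(0, 0))      # no pushes: still need 8 bytes to realign
```

This is why a frameless function that calls another function often contains a lone sub rsp, 8 even though it has no locals.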

Chapter 6 Quiz Answers

Q1 (⭐): C — The memory hierarchy from fastest to slowest: L1 cache (~4 cycles, ~32 KB) → L2 cache (~12 cycles, ~256 KB) → L3 cache (~40 cycles, ~8-32 MB) → DRAM (~100-200 ns) → NVMe SSD (~100 μs) → HDD (~5-10 ms). Taking 4 cycles as roughly 1 ns (a clock of about 4 GHz), DRAM is on the order of 100 times slower than L1, and an HDD access is several million times slower.

Q3 (⭐): A — A cache line is 64 bytes on modern x86-64 processors and on most ARM64 processors (some ARM64 designs, such as Apple's M-series cores, use 128-byte lines). Accessing any byte in a 64-byte aligned block pulls the entire block into cache. Laying out data so that fields accessed together share one cache line, for example by choosing between struct-of-arrays and array-of-structs layouts, is a key performance optimization.
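
A quick way to reason about layout is to count how many lines an access touches. This is a sketch under the assumption of 64-byte lines; LINE and cache_lines_touched are illustrative names.

```python
LINE = 64  # assumed line size in bytes; check the target CPU when it matters

def cache_lines_touched(addr, size):
    """Number of cache lines covered by `size` bytes starting at `addr`."""
    first = addr // LINE
    last = (addr + size - 1) // LINE
    return last - first + 1

print(cache_lines_touched(0x1000, 16))  # aligned 16-byte field
print(cache_lines_touched(60, 8))       # 8-byte field straddling a boundary
```

A field that straddles a boundary costs two line fills, which is one reason compilers align structure members naturally.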

Part III: ARM64 Assembly

Chapter 8 Quiz Answers

Q1 (⭐): D — In ARM64 (AArch64), BL (Branch and Link) saves the return address in X30 (LR, the Link Register) and jumps to the target. This is different from x86-64 where call pushes the return address to the stack. ARM64 function returns use RET (which branches to X30), not a stack pop.

Q3 (⭐): B — LDR X0, [X1] loads an 8-byte (64-bit) value from the memory address in X1 into X0. ARM64 indicates the access width through the destination register name and the mnemonic suffix: LDR X0 (64-bit), LDR W0 (32-bit), LDRH W0 (16-bit), LDRB W0 (8-bit). Because ARM64 is a load-store architecture, memory is accessed only through explicit load/store instructions; computation operates only on registers.

Q5 (⭐): A — The ARM64 exception levels are: EL0 (user space applications), EL1 (OS kernel), EL2 (hypervisor), EL3 (secure monitor / TrustZone). Most application code runs at EL0; the Linux kernel at EL1; hypervisors (such as KVM or Xen) at EL2; the secure monitor firmware that gates entry to the TrustZone secure world at EL3.

Part IV: Assembly-C Interface

Chapter 12 Quiz Answers

Q1 (⭐): B — The .bss section holds static data that is uninitialized or explicitly zero-initialized. Its contents are not stored in the binary (only its size is recorded); the OS zero-fills it when the program loads. This is why a global array declared as int arr[1000000]; does not make the binary 4 MB larger: it lives in .bss, not .data.

Q3 (⭐): C — objdump -T prints the dynamic symbol table (symbols exported/imported for dynamic linking). This shows which functions a shared library exports or which functions an executable imports from shared libraries. -t (lowercase) shows the static symbol table; -r shows relocations; -d shows disassembly.

Part V: Systems Programming

Chapter 16 Quiz Answers

Q1 (⭐): A — The x86-64 syscall instruction causes the CPU to switch to ring 0 (kernel mode), save user-space RIP in RCX and RFLAGS in R11, load the kernel entry point from the LSTAR MSR into RIP, and apply the SFMASK MSR to RFLAGS. Arguments: RAX=syscall number, RDI, RSI, RDX, R10, R8, R9 (R10 instead of RCX because syscall overwrites RCX).

Q3 (⭐): B — The write system call has number 1 on x86-64 Linux. Arguments: RDI=file descriptor (1 for stdout), RSI=buffer address, RDX=byte count. Return value in RAX: number of bytes written (positive) or negated error code (negative). The Linux syscall ABI is documented in the man page syscall(2) and syscalls(2).

Q5 (⭐): C — The IDT (Interrupt Descriptor Table) maps interrupt vectors (0-255) to handler addresses. The kernel loads the IDT address and limit with lidt. Each entry (called a gate descriptor) contains: handler offset, segment selector (must be the kernel code segment), type (trap gate or interrupt gate — interrupt gates also clear IF), DPL (privilege level required for software-triggered interrupts), and present bit.

Chapter 18 Quiz Answers

Q1 (⭐): D — In 4-level paging on x86-64, a 64-bit virtual address is decomposed as: bits 63-48 (must all be copies of bit 47, which makes the address canonical), bits 47-39 (PML4 index, 9 bits → 512 entries), bits 38-30 (PDPT index, 9 bits), bits 29-21 (PD index, 9 bits), bits 20-12 (PT index, 9 bits), bits 11-0 (page offset, 12 bits → 4 KB page). Total: 48 significant bits = 256 TiB of virtual address space.
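
The decomposition is pure shifting and masking, so it can be verified directly. A minimal sketch follows; split_va and is_canonical are illustrative names.

```python
def split_va(va):
    """Split a 48-bit-significant virtual address into its paging indices."""
    return {
        "pml4":   (va >> 39) & 0x1FF,
        "pdpt":   (va >> 30) & 0x1FF,
        "pd":     (va >> 21) & 0x1FF,
        "pt":     (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }

def is_canonical(va):
    """True when bits 63..48 all equal bit 47."""
    top = (va >> 47) & 0x1FFFF      # bits 63..47 (17 bits)
    return top == 0 or top == 0x1FFFF

print(split_va(0x00007FFFFFFFF123))
print(is_canonical(0xFFFF800000000000), is_canonical(0x0000800000000000))
```

The first address is the top of the lower (user) half; 0xFFFF800000000000 is the bottom of the upper half, where PML4 index 256 begins.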

Q3 (⭐): A — The NX bit is bit 63 (the most significant bit) of a page table entry. When it is set, the CPU refuses to fetch instructions from that page. Bit 63 exists only in the 64-bit page table entry format (PAE or long-mode paging), and the CPU honors it only when EFER.NXE is set. The OS uses this hardware mechanism to implement the NX/DEP (No-Execute/Data Execution Prevention) security policy.

Part VI: Performance and Microarchitecture

Chapter 22 Quiz Answers

Q1 (⭐): B — The perf stat -e cache-misses,cache-references,instructions,cycles ./program command reports hardware performance counter values. cache-misses / cache-references gives the cache miss rate. instructions / cycles gives IPC (instructions per cycle). A high cache miss rate and low IPC together indicate memory-bound performance.

Q3 (⭐): C — RDTSC (Read Time-Stamp Counter) reads the CPU's time-stamp counter into EDX:EAX. The correct pattern for precise measurements: lfence; rdtsc (save result), run timed code, lfence; rdtsc (save result), compute difference. Each lfence keeps out-of-order execution from running the following rdtsc before earlier instructions have completed, so the timed code cannot leak past the measurement points. On modern CPUs the counter increments at a constant reference frequency, not the current (boosted) core frequency, so the difference measures reference ticks rather than core cycles.

Chapter 24 Quiz Answers

Q1 (⭐): B — vaddps ymm0, ymm1, ymm2 adds 8 single-precision (32-bit) floats. YMM registers are 256 bits = 32 bytes = 8 floats of 32 bits each. The ps suffix means "packed single-precision." pd would be "packed double-precision" (4 doubles); ss would be "scalar single" (1 float).

Q3 (⭐): A — The horizontal reduction problem in SIMD is the challenge of summing all lanes within a single vector register. There is no single instruction for this; it requires log2(N) steps of hadd (horizontal add) or vperm+vadd operations to reduce 8 floats to 1. This is why vertical (same-lane) SIMD operations are much more efficient than horizontal reductions.
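
The log2(N) structure can be modeled with an ordinary list standing in for the vector lanes. This sketch is not SIMD code; it only shows the halving pattern that the shuffle-and-add sequence follows, and horizontal_sum is an illustrative name.

```python
def horizontal_sum(lanes):
    """Sum a power-of-two number of lanes in log2(N) halving steps."""
    if not lanes or len(lanes) & (len(lanes) - 1):
        raise ValueError("lane count must be a power of two")
    steps = 0
    while len(lanes) > 1:
        half = len(lanes) // 2
        # One "vertical" add: fold the upper half onto the lower half.
        lanes = [lanes[i] + lanes[i + half] for i in range(half)]
        steps += 1
    return lanes[0], steps

total, steps = horizontal_sum([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(total, steps)
```

For the 8 floats of a YMM register this takes three steps, compared with a single instruction for a vertical add of two whole registers.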

Part VII: Security and Reverse Engineering

Chapter 34 Quiz Answers

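Q1 (⭐): D — The standard function prologue push rbp; mov rbp, rsp appears at the start of most functions compiled with frame pointers. When scanning disassembly for function boundaries, this pattern is a useful marker for the beginning of a function, but it is only a heuristic: optimized builds often omit the frame pointer, in which case push rbp may appear alone (RBP used as an ordinary callee-saved register) or not at all. objdump -d labels function starts with the symbol name when the binary is not stripped; Ghidra identifies them as function entries during auto-analysis.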

Q3 (⭐): C — A jump table (vtable dispatch or switch statement) in assembly looks like: load the index/vtable pointer, scale it (multiply by pointer size), load the target address from a table, and jmp reg. The table itself appears as a sequence of pointers in the .rodata or .data section. Cross-referencing these pointer values back to code in Ghidra reveals the jump targets.

Q5 (⭐): B — movzx eax, BYTE PTR [rbx] moves one byte from the memory address in RBX into AL, then zero-extends the result to fill EAX (and implicitly RAX). The z in movzx means zero-extend. The contrast: movsx (move with sign-extend) would fill the upper bits with copies of the byte's high bit, preserving the signed value.

Chapter 34 Exercise Answers

Exercise 34.1 (⭐): Converting mov rax, QWORD PTR [rbx + rcx*8 - 0x10] (Intel syntax) to AT&T syntax gives movq -0x10(%rbx,%rcx,8), %rax. Key differences: the destination goes last in AT&T; register names get a % prefix; the displacement is written in front of the parentheses (immediates, when present, get a $ prefix); the q size suffix on the mnemonic replaces QWORD PTR; and the memory operand takes the form displacement(base,index,scale).
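
The operand form displacement(base,index,scale) is regular enough to generate. The helper below is a hypothetical formatter for memory operands only, offered as a way to check hand conversions rather than as a general syntax translator.

```python
def att_mem(base, index=None, scale=1, disp=0):
    """Format an AT&T memory operand: displacement(base,index,scale)."""
    inner = f"%{base}"
    if index is not None:
        inner += f",%{index},{scale}"
    if disp == 0:
        prefix = ""
    elif disp < 0:
        prefix = f"-0x{-disp:x}"
    else:
        prefix = f"0x{disp:x}"
    return f"{prefix}({inner})"

# Source operand first, destination last, with the q size suffix.
line = f"movq {att_mem('rbx', 'rcx', 8, -0x10)}, %rax"
print(line)
```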

Exercise 34.3 (⭐): The __libc_start_main function in glibc's startup code calls main with three arguments: argc (in RDI), argv (in RSI), and envp (in RDX). It receives the address of main as its own first argument. To find main in a stripped binary, go to the entry point (_start) and look at the instruction that loads RDI just before the call to __libc_start_main: the value loaded is the address of main.

Chapter 35 Quiz Answers

Q1 (⭐): C — When gets(buf) overflows a buffer where buf is at rbp-0x40 (64 bytes below rbp), data first overwrites the local variables above buf, then overwrites the saved RBP (8 bytes), then overwrites the saved return address (8 bytes). Total offset from the start of buf to the return address: 64 (buffer size) + 8 (saved RBP) = 72 bytes. Writing bytes 73-80 overwrites the return address.
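
The offset arithmetic translates directly into a byte string. This sketch assumes the exact frame layout described above (buffer at rbp-0x40, no canary, nothing between saved RBP and the return address); the target address is a made-up placeholder.

```python
import struct

BUF_OFFSET_FROM_RBP = 0x40   # buf lives at rbp-0x40
SAVED_RBP_SIZE = 8

# Distance from the first byte of buf to the saved return address.
ret_offset = BUF_OFFSET_FROM_RBP + SAVED_RBP_SIZE

def build_payload(new_ret):
    """Filler up to the return address, then the new 8-byte little-endian value."""
    padding = b"A" * ret_offset
    return padding + struct.pack("<Q", new_ret)

payload = build_payload(0x401136)   # hypothetical address, for illustration only
print(ret_offset, len(payload), payload[ret_offset:].hex())
```

The last eight bytes of the payload are exactly "bytes 73-80" in the 1-based counting used in the answer.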

Q3 (⭐): A — In the Morris Worm (1988), the fingerd daemon used gets() to read a username into a fixed-size buffer. Since gets() performs no bounds checking, supplying more than the buffer size overwrote the return address. The worm exploited this to redirect execution to shellcode that it had placed in the overflowed buffer.

Q5 (⭐): B — Position-independent shellcode avoids absolute addresses (which are unknown at injection time) by using PC-relative techniques. The classic x86-32 technique: call next; next: pop esi — the call pushes the address of next to the stack, and pop esi captures it, giving ESI a known runtime address. In x86-64, lea rax, [rip + offset] provides direct PC-relative addressing.

Chapter 36 Quiz Answers

Q1 (⭐): B — The stack canary check in the function epilogue reads the saved canary from [rbp-8], XORs it with the value at fs:0x28 (the expected canary from thread-local storage), and checks whether the result is zero. Schematically: xor rax, [fs:0x28]; jne __stack_chk_fail. If the canary was not modified, the XOR produces zero and the jne is not taken. If it was modified, the XOR produces a non-zero value and __stack_chk_fail is called.

Q3 (⭐): C — NX/DEP prevents execution of stack and heap pages by setting bit 63 (the NX bit) in the page table entries for those pages. When the CPU attempts to fetch an instruction from a page with NX=1, it raises a page fault (#PF) with error code bit 4 (I/D flag) set. This is enforced in hardware; software cannot bypass it without modifying the page table.

Q5 (⭐): D — Full RELRO makes the entire GOT read-only after startup. It works by: (1) forcing eager binding (all symbols resolved at load time, not lazily), and (2) calling mprotect to mark the GOT pages read-only before transferring control to the program. An attacker who achieves an arbitrary write cannot overwrite GOT entries to redirect function calls.

Chapter 36 Exercise Answers

Exercise 36.1 (⭐): The checksec output for a binary with maximum mitigations looks like:

Arch:     amd64-64-little
RELRO:    Full RELRO
Stack:    Canary found
NX:       NX enabled
PIE:      PIE enabled

Each flag: Full RELRO = read-only GOT; Canary found = stack canary in prologue/epilogue; NX enabled = non-executable stack/heap; PIE enabled = position-independent executable (ASLR for text segment).

Chapter 37 Quiz Answers

Q1 (⭐): B — A ROP gadget chains to the next gadget via ret. When ret executes, it pops the top of the stack into RIP. By controlling the stack (via a buffer overflow), the attacker places gadget addresses in sequence. Each gadget's ret pops the next gadget address, creating a chain of small code sequences. This is why controlling the stack after an overflow enables arbitrary computation.

Q3 (⭐): A — SROP (Sigreturn-Oriented Programming) uses the sigreturn system call (number 15 on x86-64) to set all registers to attacker-controlled values simultaneously. The kernel's sigreturn handler restores process state from a sigcontext structure on the stack, including all general-purpose registers, RIP, and RFLAGS. By forging this structure, an attacker can set any register to any value with very few gadgets: one that sets RAX=15 (such as pop rax; ret) and a syscall instruction.

Q5 (⭐): C — Intel CET SHSTK (Shadow Stack) defeats classic ROP by maintaining a hardware-protected copy of return addresses. When the attacker overwrites the return address on the regular stack, the shadow stack still holds the original value. When ret executes, the CPU compares both stacks; the mismatch raises a #CP (Control Protection) exception, terminating the program before the attacker's gadget executes.

Chapter 37 Exercise Answers

Exercise 37.1 (⭐): A gadget that sets RDI to a specific value looks like: pop rdi; ret. To find it: ROPgadget --binary ./target | grep "pop rdi". The gadget address goes on the stack; the desired value goes immediately after it. When execution reaches this gadget, pop rdi loads the next stack value into RDI, then ret continues the chain.
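
The stack layout described above can be packed and then replayed in a toy model. All three addresses below are hypothetical placeholders (real ones come from the ROPgadget output for a specific binary), and simulate models only the two instructions of this one gadget.

```python
import struct

POP_RDI_RET = 0x401233   # hypothetical address of a "pop rdi; ret" gadget
ARG_VALUE = 0xDEADBEEF   # value that should end up in RDI
NEXT_TARGET = 0x401156   # hypothetical address where the chain continues

# Stack order: gadget address, then the value it pops, then the next address.
chain = struct.pack("<QQQ", POP_RDI_RET, ARG_VALUE, NEXT_TARGET)

def simulate(chain_bytes):
    """Replay the chain: ret pops RIP, pop rdi pops RDI, ret pops RIP again."""
    words = list(struct.unpack(f"<{len(chain_bytes) // 8}Q", chain_bytes))
    rip = words.pop(0)            # overwritten return address -> gadget
    if rip != POP_RDI_RET:
        raise ValueError("chain does not start at the pop rdi gadget")
    rdi = words.pop(0)            # pop rdi
    rip = words.pop(0)            # the gadget's ret
    return rdi, rip

rdi, rip = simulate(chain)
print(hex(rdi), hex(rip))
```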

Part VIII: Capstone and Beyond

Chapter 38 Quiz Answers

Q1 (⭐): C — The x86-64 boot sequence begins at physical address 0xFFFFFFF0 (the reset vector). The BIOS copies itself from ROM to RAM, initializes hardware, and loads the first sector (512 bytes, the MBR) from the boot disk to physical address 0x7C00. The boot signature (the last two bytes of the MBR) must be 0x55 0xAA for the BIOS to recognize it as bootable.
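
The signature test the BIOS applies is simple to reproduce on a disk image. A minimal sketch, with is_bootable as an illustrative name:

```python
def is_bootable(sector):
    """True if a 512-byte sector ends with the 0x55 0xAA boot signature."""
    return len(sector) == 512 and sector[510] == 0x55 and sector[511] == 0xAA

sector = bytearray(512)              # an all-zero sector is not bootable
print(is_bootable(bytes(sector)))
sector[510:512] = b"\x55\xAA"        # place the signature at offsets 510 and 511
print(is_bootable(bytes(sector)))
```

Read as a little-endian 16-bit word, the same two bytes are the value 0xAA55, which is why both spellings appear in bootloader sources.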

Q3 (⭐): B — The MinOS bootloader switches from real mode to long mode in this sequence: (1) disable interrupts (cli), (2) enable A20, (3) load the GDT with lgdt, (4) set CR0.PE=1 (enter protected mode), (5) far jump to flush the instruction pipeline and reload CS, (6) set up the page tables and load CR3, (7) set CR4.PAE=1 and EFER.LME=1, (8) set CR0.PG=1 (enable paging / activate long mode), (9) far jump to 64-bit code segment.

Q5 (⭐): D — The MinOS scheduler implements round-robin preemptive scheduling with a timeslice of 10 timer ticks (at 100 Hz PIT = 100ms timeslice). On each timer interrupt, timer_ticks is decremented. When it reaches zero, the scheduler saves the current process's registers (via the SAVE_CONTEXT macro), advances the process table pointer to the next PROC_RUNNING process, restores its saved registers (RESTORE_CONTEXT), and returns from the interrupt to the new process.
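
The timeslice logic can be modeled in a few lines. This is not the MinOS code: it is a sketch that assumes every process in the table is runnable and ignores register save/restore, with TIMESLICE and run as illustrative names.

```python
TIMESLICE = 10  # timer ticks per slice, as in the answer above

def run(procs, total_ticks):
    """Return which process was running at each timer tick."""
    current, remaining, trace = 0, TIMESLICE, []
    for _ in range(total_ticks):
        trace.append(procs[current])
        remaining -= 1                            # the timer interrupt decrements
        if remaining == 0:                        # slice expired: pick the next one
            current = (current + 1) % len(procs)
            remaining = TIMESLICE
    return trace

trace = run(["A", "B", "C"], 35)
print("".join(trace))
```

At 100 Hz each tick is 10 ms, so each run of ten identical letters in the trace corresponds to one 100 ms timeslice.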

Chapter 38 Exercise Answers

Exercise 38.1 (⭐): The five segment descriptors in the MinOS GDT are: (0) Null descriptor (required by the architecture), (1) 64-bit kernel code segment (selector 0x08, DPL=0, type=code, L=1), (2) 64-bit kernel data segment (selector 0x10, DPL=0, type=data), (3) 64-bit user code segment (selector 0x18, DPL=3), (4) 64-bit user data segment (selector 0x20, DPL=3). They are followed by the TSS descriptor at selector 0x28, which in long mode is 16 bytes wide and therefore occupies two 8-byte GDT slots.
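
The selector values follow from the x86 selector format: the table index occupies bits 15-3, the table indicator bit 2, and the requested privilege level bits 1-0. A small sketch (selector is an illustrative helper name):

```python
def selector(index, rpl=0, ti=0):
    """Build a segment selector from GDT index, RPL, and table indicator."""
    return (index << 3) | (ti << 2) | rpl

for index in range(1, 6):
    print(index, hex(selector(index)))

# With RPL=3 in the low bits, as user-mode code would load it:
print(hex(selector(3, rpl=3)))
```

Each descriptor is 8 bytes, which is why the selector with RPL=0 equals the descriptor's byte offset into the table.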

Chapter 39 Quiz Answers

Q1 (⭐): B — LLVM IR uses SSA (Static Single Assignment) form, where each variable is assigned exactly once and a phi instruction merges values from different control flow paths. In SSA, the same C variable x that is assigned in two branches becomes two separate IR values (%x.if and %x.else), merged by a phi that names the value type and one (value, predecessor block) pair per incoming edge, for example for an i32: %x.merge = phi i32 [ %x.if, %if.true ], [ %x.else, %if.false ].

Q3 (⭐): A — The JIT compilation sequence using POSIX APIs is: (1) mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) to allocate writable memory, (2) write instruction bytes into the allocated region, (3) mprotect(ptr, size, PROT_READ|PROT_EXEC) to make the region executable (removing write permission), (4) cast the pointer to a function pointer and call it. The two-step (write then execute) is required by W^X (Write XOR Execute) security policy.

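Q5 (⭐): C — In RISC-V assembly, la a1, msg is a pseudo-instruction that (in non-PIC code) expands to two real instructions: auipc a1, hi20(msg) (add upper immediate to PC: adds the upper 20 bits of the PC-relative offset, shifted left by 12, to the PC) followed by addi a1, a1, lo12(msg) (add the lower 12 bits). Assemblers spell these two parts %pcrel_hi and %pcrel_lo. Because addi sign-extends its 12-bit immediate, the upper part is rounded so that the two pieces sum to the exact offset. This two-instruction sequence reaches any address within roughly ±2 GB of the PC.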
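
Splitting an offset into the two immediates has one subtlety: the low 12 bits are sign-extended by addi, so the upper part must compensate whenever bit 11 is set. The sketch below works through that arithmetic; split_pcrel and recombine are illustrative names, and the offset is assumed to fit in a signed 32-bit value.

```python
def split_pcrel(offset):
    """Split a signed PC-relative offset into (hi20, lo12) for auipc/addi."""
    lo12 = ((offset & 0xFFF) ^ 0x800) - 0x800     # sign-extend the low 12 bits
    hi20 = ((offset - lo12) >> 12) & 0xFFFFF      # remainder, as a 20-bit field
    return hi20, lo12

def recombine(hi20, lo12):
    """What auipc followed by addi adds to the PC."""
    upper = hi20 << 12
    if upper & 0x80000000:                        # auipc sign-extends bit 31
        upper -= 1 << 32
    return upper + lo12

for off in (0x12345, 0x1800, -4, -0x1000):
    hi, lo = split_pcrel(off)
    print(hex(off), hex(hi), lo, recombine(hi, lo) == off)
```

The 0x1800 case shows the rounding: the low part becomes -2048, so the upper part is 2 rather than 1.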

Chapter 40 Quiz Answers

Q1 (⭐): B — mov rbp, rsp is encoded as 48 89 E5. The breakdown: 48 = REX.W prefix (64-bit operand), 89 = MOV r/m64, r64 opcode, E5 = ModRM byte (mod=11 register-register, reg=100 = RSP as source, r/m=101 = RBP as destination). Note: 48 89 EC would be mov rsp, rbp (reversed operands); 48 8B E5 would be mov rsp, rbp using the 8B (MOV r64,r/m64) opcode form.
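
The three encodings discussed above can be decoded with a few shifts. This sketch handles only the 3-byte REX.W register-to-register forms of opcodes 89 and 8B and the eight low registers; decode_mov is an illustrative name, not a real disassembler API.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_mov(code):
    """Decode a 3-byte REX.W register-to-register MOV (opcode 89 or 8B)."""
    rex, opcode, modrm = code
    if rex != 0x48 or (modrm >> 6) != 0b11:
        raise ValueError("only REX.W register-register forms are handled")
    reg = REG64[(modrm >> 3) & 7]
    rm = REG64[modrm & 7]
    if opcode == 0x89:        # MOV r/m64, r64: r/m is the destination
        dst, src = rm, reg
    elif opcode == 0x8B:      # MOV r64, r/m64: reg is the destination
        dst, src = reg, rm
    else:
        raise ValueError("unsupported opcode")
    return f"mov {dst}, {src}"

for hexbytes in ("4889E5", "4889EC", "488BE5", "488BEC"):
    print(hexbytes, decode_mov(bytes.fromhex(hexbytes)))
```

The fourth input, 48 8B EC, is the alternative encoding of mov rbp, rsp, so every register-to-register move has two valid encodings.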

Q3 (⭐): D — In ARM64, BL (Branch and Link) stores the return address in X30 (the Link Register, LR). The return instruction RET (or RET X30) branches to X30. When a function calls a sub-function, it must save X30 to the stack first (stp x29, x30, [sp, #-16]!) because the inner BL would overwrite X30 with the new return address.

Q5 (⭐): B — The CPUID instruction queries CPU features and capabilities. The input in EAX (the leaf number), together with ECX (the subleaf) for leaves such as 7, selects which information is returned in EAX/EBX/ECX/EDX. Common leaves: 0 (max leaf and vendor string), 1 (feature flags including SSE, AVX, AES-NI), 7 (extended features including AVX2, AVX-512). Runtime feature detection with CPUID allows a binary to use the best available SIMD instructions.
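
Once the register values are in hand, feature detection is a matter of testing individual bits. The sketch below does not execute CPUID; it only decodes register values passed in, using a small subset of the documented bit positions, and the table and function names are illustrative.

```python
# Subset of documented feature bits (leaf 1 ECX/EDX, leaf 7 subleaf 0 EBX).
LEAF1_ECX = {"SSE3": 0, "AES-NI": 25, "AVX": 28}
LEAF1_EDX = {"SSE": 25, "SSE2": 26}
LEAF7_EBX = {"AVX2": 5, "AVX-512F": 16}

def features(register_value, table):
    """Names of the features whose bit is set in the given register value."""
    return sorted(name for name, bit in table.items()
                  if (register_value >> bit) & 1)

# Example with made-up register contents:
ecx = (1 << 28) | (1 << 25)
print(features(ecx, LEAF1_ECX))
print(features(1 << 5, LEAF7_EBX))
```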

Q7 (⭐): B — The red zone is the 128 bytes below RSP in the System V AMD64 ABI. Leaf functions (making no further calls) can use this area for local storage without adjusting RSP, saving the overhead of sub rsp, N and add rsp, N in the prologue/epilogue. The kernel respects this zone by adjusting RSP before delivering signals.

Q9 (⭐): A — movzx eax, BYTE PTR [rbx] moves the byte at address [RBX] into AL and zero-extends it to fill EAX (32 bits), and consequently RAX (64 bits, since writing EAX always zeros the upper 32 bits of RAX). The "zx" suffix means zero-extend. The sign-extending version is movsx.

Q11 (⭐): B — cmpxchg [mem], rbx compares RAX with [mem]: if they are equal, it writes RBX to [mem] and sets ZF; if not, it loads [mem] into RAX and clears ZF. This is the Compare-and-Swap (CAS) primitive used to implement lock-free data structures. Without a prefix the read-modify-write is not atomic with respect to other cores; adding the lock prefix (lock cmpxchg [mem], rbx) makes it atomic.
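
The two outcomes, and the retry loop built on them, can be modeled as follows. This is a single-threaded model of the instruction's semantics, not an atomic operation; memory is a plain dictionary and all names are illustrative.

```python
def cmpxchg(mem, addr, rax, src):
    """Model cmpxchg [addr], src. Returns (new_rax, zf)."""
    if mem[addr] == rax:
        mem[addr] = src          # match: store the source operand, ZF=1
        return rax, 1
    return mem[addr], 0          # mismatch: RAX receives the current value, ZF=0

def cas_increment(mem, addr):
    """Typical CAS retry loop: reload the expected value until the swap succeeds."""
    expected = mem[addr]
    while True:
        expected, zf = cmpxchg(mem, addr, expected, expected + 1)
        if zf:
            return expected + 1

mem = {0x1000: 5}
print(cmpxchg(mem, 0x1000, 5, 9), mem[0x1000])   # succeeds
print(cmpxchg(mem, 0x1000, 5, 7), mem[0x1000])   # fails, RAX updated
print(cas_increment(mem, 0x1000), mem[0x1000])
```

On failure the instruction hands back the current memory value in RAX, which is what lets the retry loop proceed without a separate reload.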

Q13 (⭐): A — The write system call number is 1 on x86-64 Linux, 64 on ARM64 Linux, and 64 on RISC-V Linux. ARM64 and RISC-V share the same syscall numbers because both use the Linux unified syscall table (defined in include/uapi/asm-generic/unistd.h). x86-64 uses a separate, historically defined table.

Q15 (⭐): B — The epilogue code implements a stack canary check. The prologue stored a random value (from fs:0x28) at [rbp-8]. The epilogue loads it back with mov rax, [rbp-8], XORs with the current canary at [fs:0x28]. If the canary was not modified, the XOR produces zero and jne is not taken. If a buffer overflow overwrote the canary, the XOR produces non-zero, and __stack_chk_fail terminates the program.

Q17 (⭐): B — In MinOS, register saving and restoring during a context switch is performed by the interrupt handler entry/exit code. The SAVE_CONTEXT macro (in NASM) pushes all general-purpose registers to the current process's kernel stack. After the C scheduler function (schedule()) selects the next process, RESTORE_CONTEXT pops all registers from the new process's kernel stack. The iretq (interrupt return) instruction then atomically restores RIP, CS, RFLAGS, RSP, and SS from the kernel stack.

Q19 (⭐): B — Intel CET IBT (Indirect Branch Tracking) requires that every valid indirect call target begin with the ENDBR64 instruction (encoding F3 0F 1E FA). On processors without CET, this executes as a no-op because the encoding lies in the reserved hint-NOP opcode space (0F 1E with an F3 prefix); it is not the F3 90 "REP NOP" encoding, which is PAUSE. The IBT mechanism causes a #CP exception if an indirect call or jump lands on any instruction other than ENDBR64.