Chapter 39 Exercises: Beyond Assembly
Compiler Pipeline Exercises
Exercise 39.1 — IR identification Match each representation to its stage in the compiler pipeline:
a) %result = add nsw i32 %a, %b — LLVM IR
b) mov eax, edi; add eax, esi — ?
c) BinaryOp(Add, Var(a), Var(b)) — ?
d) tokens: [int, identifier(add), ...] — ?
e) D.1234 = a + b; — GCC GIMPLE
Exercise 39.2 ⭐ — Compiler optimization observation Using godbolt.org (or your local GCC/Clang):
a) Compile int f(int x) { return x * 4; } at -O0 and -O2. What instruction does the compiler use for x * 4 at -O2?
b) Compile int sum(int a, int b) { return a + b; } at -O2. How many instructions does it produce?
c) Compile a simple loop at -O0 and -O3. What SIMD instructions appear at -O3?
d) What does __attribute__((noinline)) do, and why might you use it to study compiler behavior?
Exercise 39.3 — Optimization recognition For each assembly sequence, identify which compiler optimization produced it:
; Snippet A (replaces a function call):
; Original C: result = min(a, b);
cmp eax, edi
cmovg eax, edi
; Snippet B (replaces a loop):
; Original C: int sum = 0; for (int i = 0; i < 4; i++) sum += a[i];
vmovdqu xmm0, [rdi]
vphaddd xmm0, xmm0, xmm0
vphaddd xmm0, xmm0, xmm0
; Snippet C (replaces x * 15):
lea eax, [rdi + rdi*2] ; rdi * 3
lea eax, [rax + rax*4] ; * 5 → 15
Exercise 39.4 ⭐ — Register allocation insight A function has 8 local variables that are all live across function calls, but the x86-64 System V ABI provides only 6 callee-saved general-purpose registers (RBX, RBP, R12-R15). Explain:
a) How does the register allocator decide which variables get registers and which get spilled?
b) What does a "spill" look like in the disassembly?
c) What is the interference graph, and how does it determine which variables can share a register?
d) Why does -O0 produce more stack accesses than -O2?
JIT Compilation Exercises
Exercise 39.5 ⭐ — JIT basics Answer these questions about JIT compilation:
a) What system call is used on Linux to allocate executable memory?
b) Why must JIT compilers separate the "write" and "execute" phases?
c) On Linux with CET IBT enabled, what must JIT-generated code include at every indirect call target?
d) What is the difference between a baseline JIT and an optimizing JIT?
Exercise 39.6 — JIT code generation Write C code (using inline arrays of bytes) that generates and executes each of these x86-64 functions:
a) A function that takes no arguments and returns 100 (use mov eax, 100; ret)
b) A function that takes one int argument (in EDI) and returns it doubled (use lea eax, [rdi + rdi]; ret)
c) Combine (a) and (b) to demonstrate calling one JIT function from another
Exercise 39.7 ⭐ — LLVM JIT Describe the steps to use LLVM's ORC JIT to:
a) Create a module with one function b) JIT-compile the module c) Get a function pointer to the compiled function d) Call the function
(Conceptual description is acceptable; actual LLVM API calls are a bonus.)
WebAssembly Exercises
Exercise 39.8 — WASM vs. register machine
Compare WASM (stack machine) and x86-64 (register machine) for the expression (a + b) * (c - d):
a) Write the WASM stack machine instructions
b) Write the equivalent x86-64 register-based instructions
c) How many "instructions" does each approach need?
d) What is the trade-off between stack machines and register machines for code generation?
Exercise 39.9 ⭐ — WASM security model Explain how WASM's design prevents each of these attacks:
a) Reading memory outside the module's linear memory
b) Jumping to an arbitrary address (like a ROP gadget)
c) Executing shellcode written to memory
d) Accessing other processes' memory or the browser's memory
Exercise 39.10 — WASM practical Write a C "Hello World" and compile it to WASM using Emscripten or wasi-sdk. Observe:
a) What is the .wasm file size?
b) Run the WASI build with wasmtime hello.wasm. What output do you see?
c) Use wasm2wat hello.wasm to convert it to the WebAssembly text format. What does the main function look like?
RISC-V Exercises
Exercise 39.11 ⭐ — RISC-V register conventions Identify the ABI role of each RISC-V register:
| Register | Name | Role |
|---|---|---|
| x0 | zero | ? |
| x1 | ra | ? |
| x2 | sp | ? |
| x10-x17 | a0-a7 | ? |
| x18-x27 | s2-s11 | ? |
How does this compare to the x86-64 System V ABI? The ARM64 AAPCS64?
Exercise 39.12 — RISC-V hello world The RISC-V hello world in the chapter uses syscalls 64 (write) and 93 (exit). Note that Linux RISC-V syscall numbers differ from those on x86-64.
a) Assemble and run the RISC-V hello world using qemu-riscv64
b) What is the x86-64 syscall number for write? For RISC-V? Why do they differ?
c) In ARM64, what instruction invokes a system call? What is the ARM64 write syscall number?
Exercise 39.13 ⭐ — Cross-ISA comparison
For the function int add(int a, int b) { return a + b; }:
a) Write the x86-64 assembly (3 instructions)
b) Write the ARM64 assembly (2 instructions)
c) Write the RISC-V assembly (2 instructions)
d) What similarities do you observe between ARM64 and RISC-V? What is different?
Exercise 39.14 — RISC-V QEMU Set up a RISC-V development environment using QEMU:
a) Install gcc-riscv64-linux-gnu and qemu-user
b) Compile a C "hello world" for RISC-V: riscv64-linux-gnu-gcc -static hello.c -o hello_riscv
c) Run with: qemu-riscv64 ./hello_riscv
d) Disassemble with: riscv64-linux-gnu-objdump -M no-aliases -d hello_riscv | head -60
e) What differences do you observe from the equivalent x86-64 or ARM64 disassembly?
GPU and Future Architecture Exercises
Exercise 39.15 — SIMT vs. SIMD Compare NVIDIA GPU SIMT and x86-64 AVX-512 SIMD:
a) How many single-precision floats does an AVX-512 vaddps on ZMM registers add simultaneously?
b) How many threads does a single NVIDIA warp execute simultaneously?
c) What happens when threads in a SIMT warp take different branches?
d) What is the hardware cost of SIMT vs. SIMD divergence?
Exercise 39.16 ⭐ — CUDA PTX reading Given this PTX code snippet:
.func (.reg .f32 %result) saxpy(
  .reg .f32 %a,
  .reg .f32 %x,
  .reg .f32 %y)
{
  .reg .f32 %temp;
  fma.rn.f32 %temp, %a, %x, %y;
  mov.f32 %result, %temp;
  ret;
}
a) What computation does this perform?
b) What does fma.rn.f32 mean?
c) What x86-64 instruction is the equivalent? (Hint: Chapter 16)
Synthesis Exercises
Exercise 39.17 ⭐ — Compiler pipeline end-to-end
For the function int triple(int x) { return x * 3; }:
a) Show the LLVM IR (approximate)
b) What optimization converts x * 3 to a shift/add?
c) Show the expected x86-64 output at -O2 (approximate)
d) Show the expected ARM64 output at -O2
e) Show the expected RISC-V output at -O2
Exercise 39.18 — JIT security in a post-CET world Modern browsers run WebAssembly with JIT compilation on CET-enabled hardware. Describe:
a) What does the browser need to do to make JIT-compiled WASM work with CET IBT?
b) What mprotect call pattern is required on Linux?
c) How does the browser ensure JIT-generated code can be an indirect call target?
d) Why is this harder for general-purpose JIT compilers than for WASM-specific JITs?
Exercise 39.19 ⭐ — RISC-V vs. ARM64 for embedded You are choosing a processor for a new embedded system (IoT device, ~$5 BOM target):
a) What does ARM's Cortex-M series offer for this use case?
b) What RISC-V cores are available at comparable cost and power?
c) What is the toolchain maturity difference (GCC/LLVM support, library support)?
d) What is the strategic/licensing reason someone might prefer RISC-V?
Exercise 39.20 ⭐ — Future architecture readiness You have completed this book and are starting your career. For each future architecture you might encounter:
| Architecture | Key concept to understand first | Similarity to what you know |
|---|---|---|
| RISC-V 64-bit | ? | ARM64 conventions |
| CUDA PTX | ? | x86-64 SIMD (AVX) |
| WebAssembly | ? | Stack machine concepts |
| ARM SVE (Scalable Vector Extension) | ? | AVX-512 but variable width |
| RISC-V V extension | ? | AVX/NEON SIMD |
Fill in the "key concept to understand first" for each, using your existing knowledge as the bridge.