Chapter 19 Exercises: x86-64 vs. ARM64 Comparison
Exercise 1: Architecture Philosophy
For each design choice, explain the tradeoff it represents and which architecture made which choice:
a) Memory operands in arithmetic instructions
b) Fixed vs. variable instruction width
c) Condition flags updated on every ALU operation vs. only with S suffix
d) Return address on stack vs. in a register
e) Zero register vs. clearing a register explicitly
Exercise 2: Port Code Between ISAs
Translate each x86-64 function to ARM64 assembly. State any differences in approach.
a)
; Absolute value
abs_val:
mov eax, edi
cdq ; sign-extend EAX into EDX:EAX
xor eax, edx ; EAX ^= EDX
sub eax, edx ; EAX -= EDX
ret
b)
; Count set bits (popcount)
count_bits:
xor eax, eax
.loop:
test edi, edi
jz .done
mov ecx, edi
and ecx, 1
add eax, ecx
shr edi, 1
jmp .loop
.done:
ret
c)
; Swap two integers via memory
swap: ; rdi = int *a, rsi = int *b
mov eax, [rdi]
mov ecx, [rsi]
mov [rdi], ecx
mov [rsi], eax
ret
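Before porting, it can help to have a C reference model of each routine to test an ARM64 version against. The sketch below is one possible reading of the three listings above. The function names `abs_val_ref`, `count_bits_ref`, and `swap_ref` are hypothetical and do not come from the chapter.

```c
#include <stdint.h>

/* Models mov/cdq/xor/sub. The mask plays the role of EDX after CDQ:
   all ones when x is negative, zero otherwise. Unsigned arithmetic is
   used so the wraparound matches the hardware without signed overflow. */
int32_t abs_val_ref(int32_t x) {
    uint32_t mask = (x < 0) ? UINT32_MAX : 0u;
    uint32_t r = ((uint32_t)x ^ mask) - mask;   /* xor eax, edx ; sub eax, edx */
    return (int32_t)r;
}

/* Models the test/and/add/shr loop: add the low bit, shift right,
   stop as soon as the remaining value is zero. */
uint32_t count_bits_ref(uint32_t v) {
    uint32_t count = 0;
    while (v != 0) {
        count += v & 1u;
        v >>= 1;
    }
    return count;
}

/* Models the four-mov swap: both loads happen before either store. */
void swap_ref(int *a, int *b) {
    int first = *a;
    int second = *b;
    *a = second;
    *b = first;
}
```

A ported routine can then be checked by calling both versions with the same inputs, for example a negative value, zero, and a value with only the top bit set.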
Exercise 3: Port ARM64 to x86-64
Translate each ARM64 snippet to equivalent x86-64:
a)
// CSEL branchless max
CMP X0, X1
CSEL X0, X0, X1, GT // X0 = max(X0, X1)
b)
// Barrel shifter multiplication trick
ADD X0, X0, X0, LSL #3 // X0 = X0 * 9
c)
// CBNZ loop pattern
.loop:
LDR W2, [X0], #4
ADD X1, X1, X2
SUBS W3, W3, #1
CBNZ W3, .loop
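The three snippets above can be modelled in C to pin down the semantics an x86-64 translation has to preserve. This is a sketch under assumptions: the names `max_ref`, `times9_ref`, and `sum_loop_ref` are hypothetical, and the loop model assumes X0 points at 32-bit elements, X1 is the accumulator, and W3 is the element count.

```c
#include <stdint.h>

/* CMP + CSEL with the GT condition is a signed comparison. */
int64_t max_ref(int64_t a, int64_t b) {
    return (a > b) ? a : b;
}

/* ADD X0, X0, X0, LSL #3 computes x + (x << 3), which equals x * 9. */
uint64_t times9_ref(uint64_t x) {
    return x + (x << 3);
}

/* LDR W2, [X0], #4 loads 32 bits, zero-extends them into X2, then
   advances the pointer. The count is tested after the body, so the
   loop is a do-while: the body always runs at least once. */
uint64_t sum_loop_ref(const uint32_t *p, uint32_t count, uint64_t acc) {
    do {
        acc += *p++;      /* 32-bit element widened without sign extension */
        count -= 1;
    } while (count != 0);
    return acc;
}
```

Two details are easy to lose in translation: the load zero-extends rather than sign-extends, and an initial count of zero does not skip the loop.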
Exercise 4: Register File Analysis
a) A C function has 7 local variables of type int64_t. On x86-64, how many can live in callee-saved registers without using the stack? On ARM64?
b) A function calls printf(fmt, a, b, c, d, e, f, g, h) — 9 arguments (1 format + 8 values). How many arguments go onto the stack on x86-64? On ARM64?
c) The "red zone" (128 bytes below RSP) is defined by the x86-64 System V ABI. Apple's ARM64 ABI defines a similar 128-byte zone below SP, but the standard AAPCS64 does not. What is the red zone for, and why can't kernel code use it?
Exercise 5: Instruction Encoding
a) The x86-64 instruction MOV RAX, RBX has the encoding 48 89 D8. Decode this:
- What does byte 0x48 indicate?
- What does byte 0x89 indicate?
- What does byte 0xD8 (11 011 000 in binary, where bits [7:6]=mod, [5:3]=reg, [2:0]=rm) indicate?
b) An ARM64 MOV X0, X1 is ORR X0, XZR, X1, which encodes to 32 bits. With fixed encoding, a 5-bit field identifies the destination register (allowing 32 registers). How many distinct x86-64 registers can a single 3-bit r/m field encode without a REX prefix?
c) The ARM64 branch instruction B label has a 26-bit immediate. What is the maximum forward branch distance in bytes? In megabytes?
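The bit-field arithmetic behind parts a) and c) can be sketched in C. The helpers below are hypothetical, written for illustration only. `split_modrm` separates a ModRM byte into the three fields named in part a), and `max_forward_bytes` computes the forward reach of a signed immediate that counts instructions rather than bytes.

```c
#include <stdint.h>

/* The three fields of an x86-64 ModRM byte. */
typedef struct {
    unsigned mod;   /* bits [7:6] */
    unsigned reg;   /* bits [5:3] */
    unsigned rm;    /* bits [2:0] */
} modrm_fields;

modrm_fields split_modrm(uint8_t byte) {
    modrm_fields f;
    f.mod = (byte >> 6) & 0x3u;
    f.reg = (byte >> 3) & 0x7u;
    f.rm  = byte & 0x7u;
    return f;
}

/* Largest forward displacement, in bytes, of a signed immediate that is
   imm_bits wide and scaled by unit_bytes (4 for ARM64 instructions).
   A signed n-bit field tops out at 2^(n-1) - 1. */
int64_t max_forward_bytes(unsigned imm_bits, unsigned unit_bytes) {
    int64_t max_imm = ((int64_t)1 << (imm_bits - 1)) - 1;
    return max_imm * (int64_t)unit_bytes;
}
```

As an example on a different encoding, `split_modrm(0xC1)` yields mod = 3, reg = 0, rm = 1, and `max_forward_bytes(19, 4)` gives 1048572 bytes (just under 1 MB), the forward reach of the 19-bit immediate used by ARM64 conditional branches.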
Exercise 6: Performance Analysis
Given this inner loop, analyzed on both architectures:
for (int i = 0; i < n; i++) {
c[i] = a[i] + b[i];
}
a) Write the x86-64 assembly for this loop (using scalar int64_t).
b) Write the ARM64 assembly for this loop (using scalar int64_t).
c) Count the instructions in each loop iteration.
d) If the loop runs for n=1000, which architecture spends more cycles on loop overhead (branch + counter decrement)? Are they the same?
e) Rewrite both using SIMD (SSE2 for x86-64, NEON for ARM64) to process 2 int64_t per iteration. How many instructions now?
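To compare hand-written answers against compiler output, the loop can be wrapped in a complete function and compiled for each target. The wrapper below is a minimal sketch. The name `vec_add` and its signature are assumptions, since the exercise only gives the loop body.

```c
#include <stdint.h>

/* Element-wise sum of two int64_t arrays of length n.
   Signature is assumed; the exercise shows only the loop. */
void vec_add(const int64_t *a, const int64_t *b, int64_t *c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

Because the pointers are not marked `restrict`, a compiler that vectorizes this loop may first emit an overlap check, so the optimized output can contain more code than the loop alone would suggest.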
Exercise 7: RISC-V Basics
RISC-V RV64I has no condition flags. It uses compare-and-branch instructions (BEQ, BNE, BLT, BGE, BLTU, BGEU).
a) Write RISC-V code equivalent to if (a != b) goto label; (a in x10, b in x11)
b) Write RISC-V code for if (a < 0) goto negative; (a in x10 as signed)
c) Write RISC-V code for absolute value using BLT and NEG (NEG rd, rs is a pseudo-instruction for SUB rd, x0, rs)
d) RV64I ALU instructions also have no shifted-operand form (no inline barrel shifter). Write RISC-V code for x = y * 5 (y in x10) using ADD and SLLI (shift left logical immediate).
Exercise 8: Apple Silicon Analysis
Single-threaded Geekbench scores (2026 figures):
- Apple M4: ~3900
- Intel Core i9-14900K: ~2900
- AMD Ryzen 9 7950X: ~2800
a) The M4 runs at about 4.0 GHz and the Core i9-14900K at about 5.5 GHz, yet the M4 still wins. List 3 architectural reasons why a lower-clock ARM64 chip can outperform a higher-clock x86-64 chip.
b) Rosetta 2 translates x86-64 to ARM64. An x86-64 program under Rosetta 2 scores ~2700 Geekbench on M4 (the same program scores ~2900 on Intel natively). What is the Rosetta 2 overhead, as a percentage of the native Intel score?
c) A developer claims "I don't need to port my app to ARM64 because Rosetta 2 is fast enough." What are 3 reasons this might be a bad long-term strategy?
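For part b), "overhead" needs a baseline. One common definition, assumed here, is the score lost relative to a baseline run, expressed as a percentage of that baseline. The helper name `overhead_percent` is hypothetical.

```c
/* Percentage of the baseline score lost by the translated run.
   Assumes higher scores are better and baseline_score is nonzero. */
double overhead_percent(double baseline_score, double translated_score) {
    return (baseline_score - translated_score) / baseline_score * 100.0;
}
```

With made-up scores of 1000 for the baseline and 800 for the translated run, this gives an overhead of about 20%. Choosing a different baseline, such as native ARM64 code on the same chip, changes the result considerably.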
Exercise 9: Memory Ordering
x86-64 uses TSO (Total Store Order). ARM64 uses a weaker model that allows more reordering.
a) In x86-64, can a LOAD be reordered before a preceding STORE? (TSO answer)
b) In ARM64, can a LOAD be reordered before a preceding STORE? What instruction prevents this?
c) Rosetta 2 must emulate x86-64 TSO on ARM64. This requires inserting memory barrier instructions. Which ARM64 instruction(s) could be used?
d) Why is the weaker ARM64 memory model generally considered an advantage rather than a disadvantage?
Exercise 10: Calling Convention Comparison
Write both x86-64 and ARM64 assembly for a function call to result = add_three(a, b, c) where a, b, c are int64_t values 10, 20, 30.
Show:
1. Setting up arguments before the call
2. Making the call instruction
3. Reading the return value after the call
4. Any stack alignment adjustments needed
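A C version of the call gives something to compile and compare against the hand-written assembly. This is a sketch only: the exercise names `add_three` but does not define it, so the body below (a plain sum) is an assumption, and `call_add_three` is a hypothetical wrapper.

```c
#include <stdint.h>

/* Assumed body: the exercise does not say what add_three computes. */
int64_t add_three(int64_t a, int64_t b, int64_t c) {
    return a + b + c;
}

/* The call site from the exercise: result = add_three(10, 20, 30). */
int64_t call_add_three(void) {
    int64_t result = add_three(10, 20, 30);
    return result;
}
```

With optimization enabled, a compiler will likely inline the call and fold it to a constant, so the argument setup only shows up when compiling without optimization or when `add_three` lives in a separate translation unit.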
Exercise 11: Code Density Measurement
Compile the following function for both x86-64 and ARM64 (use Compiler Explorer or cross-compilers). Count the bytes of machine code in the compiled output:
int64_t weighted_sum(int64_t a, int64_t b, int64_t c) {
return a * 3 + b * 5 + c * 7;
}
a) How many bytes is the x86-64 version?
b) How many bytes is the ARM64 version?
c) Which uses more instructions? More bytes?
d) At -O2, does the x86-64 compiler use IMUL or a shift-and-add sequence?
e) At -O2, does the ARM64 compiler use MUL or ADD+LSL sequences?
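For parts d) and e), it helps to know what a shift-and-add form of this function looks like. The sketch below is one possible decomposition, not necessarily what any given compiler emits. The name `weighted_sum_shift_add` is hypothetical.

```c
#include <stdint.h>

/* One way to express a*3 + b*5 + c*7 without a multiply instruction.
   Unsigned arithmetic keeps the shifts well defined for negative
   inputs. The final conversion back to int64_t wraps on mainstream
   compilers (implementation-defined for out-of-range values). */
int64_t weighted_sum_shift_add(int64_t a, int64_t b, int64_t c) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b, uc = (uint64_t)c;
    uint64_t a3 = ua + (ua << 1);   /* a * 3 = a + 2a */
    uint64_t b5 = ub + (ub << 2);   /* b * 5 = b + 4b */
    uint64_t c7 = (uc << 3) - uc;   /* c * 7 = 8c - c */
    return (int64_t)(a3 + b5 + c7);
}
```

Each line maps naturally onto a single shifted-operand ADD or SUB on ARM64, which is worth keeping in mind when reading the compiler output for both targets.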
Exercise 12: Architecture Selection
For each use case, argue for x86-64 or ARM64:
a) A high-frequency trading server where single-thread latency is paramount and power cost is irrelevant
b) An IoT sensor that runs on a coin cell battery for 2 years
c) A cloud-based web server where cost-per-request is the primary metric
d) A workstation for video editing and 3D rendering (multi-threaded + GPU compute)
e) A smartphone application processor
f) A laptop intended for 16-hour battery life
Exercise 13: Historical Context
Match each event to its approximate year:
| Event | Year |
|---|---|
| Intel 8086 (original x86) | |
| ARM1 processor (Acorn) | |
| ARMv8-A with AArch64 (64-bit ARM) | |
| First iPhone (ARM11, 32-bit) | |
| Apple M1 announcement | |
| AWS Graviton (AWS's first ARM64 server processor) | |
| RISC-V ISA first published | |
Years to use (not in order): 1978, 1985, 2007, 2011, 2018, 2010, 2020
Exercise 14: ISA Design Trade-offs Essay
Write 300-400 words answering:
"If you were designing a new CPU ISA from scratch in 2026, what design choices would you make regarding: (1) instruction width, (2) number of registers, (3) memory operands in ALU instructions, and (4) condition flags? Justify each choice by referencing the x86-64 and ARM64 comparison."
There is no single correct answer — the goal is to reason about the trade-offs.
Exercise 15: Future of ISAs
Consider RISC-V, WebAssembly (WASM), and domain-specific architectures (TPUs, NPUs):
a) WASM is described as a "portable ISA." In what way is WASM an ISA, and how is it different from x86-64 and ARM64?
b) A neural network inference accelerator (NPU) doesn't use x86-64 or ARM64 instructions. Why is a domain-specific ISA often more efficient than a general-purpose one for a specific workload?
c) If all major software were written in Rust or Go (which compile to any target), would the ISA choice matter less? What performance considerations remain ISA-dependent even for high-level language programs?