Chapter 19 Exercises: x86-64 vs. ARM64 Comparison

Open Assembly Language Project

Exercise 1: Architecture Philosophy

For each design choice, explain the tradeoff it represents and which architecture made which choice:

a) Memory operands in arithmetic instructions
b) Fixed vs. variable instruction width
c) Condition flags updated by most ALU instructions vs. only when requested with an S suffix
d) Return address on the stack vs. in a register
e) Zero register vs. clearing a register explicitly

Exercise 2: Port Code Between ISAs

Translate each x86-64 function to ARM64 assembly. State any differences in approach.

a)

; Absolute value
abs_val:
    mov   eax, edi
    cdq                  ; sign-extend EAX into EDX:EAX
    xor   eax, edx       ; EAX ^= EDX
    sub   eax, edx       ; EAX -= EDX
    ret

b)

; Count set bits (popcount)
count_bits:
    xor   eax, eax
.loop:
    test  edi, edi
    jz    .done
    mov   ecx, edi
    and   ecx, 1
    add   eax, ecx
    shr   edi, 1
    jmp   .loop
.done:
    ret

c)

; Swap two integers via memory
swap:                     ; rdi = int *a, rsi = int *b
    mov  eax, [rdi]
    mov  ecx, [rsi]
    mov  [rdi], ecx
    mov  [rsi], eax
    ret
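
When porting parts (a) and (b), it can help to have a C reference model to compare your ARM64 output against. The sketch below is one possible model, not part of the original exercise; the names abs_ref and count_bits_ref are hypothetical. It mirrors the register-level behavior of the x86-64 code, using unsigned arithmetic so that wraparound is well defined in C.

```c
#include <stdint.h>

/* Reference model for abs_val (hypothetical helper name).
   mask is 0 for non-negative x and all ones for negative x,
   which is the value CDQ leaves in EDX. */
int32_t abs_ref(int32_t x)
{
    uint32_t mask = (x < 0) ? 0xFFFFFFFFu : 0u;
    /* (x ^ mask) - mask, computed with wrapping unsigned arithmetic.
       INT32_MIN has no positive counterpart; like the assembly,
       this model is only meaningful for the other inputs. */
    return (int32_t)(((uint32_t)x ^ mask) - mask);
}

/* Reference model for count_bits (hypothetical helper name):
   add the low bit to the count, then shift right, until x is zero. */
uint32_t count_bits_ref(uint32_t x)
{
    uint32_t count = 0;
    while (x != 0) {
        count += x & 1u;
        x >>= 1;
    }
    return count;
}
```

Calling both the model and your assembly routine with the same inputs from a small C test harness is a quick way to catch sign-extension mistakes in the port.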

Exercise 3: Port ARM64 to x86-64

Translate each ARM64 snippet to equivalent x86-64:

a)

// CSEL branchless max
CMP  X0, X1
CSEL X0, X0, X1, GT    // X0 = max(X0, X1)

b)

// Barrel shifter multiplication trick
ADD X0, X0, X0, LSL #3  // X0 = X0 * 9

c)

// CBNZ loop pattern
.loop:
    LDR  W2, [X0], #4
    ADD  X1, X1, X2
    SUBS W3, W3, #1
    CBNZ W3, .loop
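
Before translating, it may help to pin down what each ARM64 snippet computes. The following C sketch is one reading of the three snippets; the function names are hypothetical, and the loop model assumes the counter in W3 is at least 1 on entry, since the assembly decrements before it tests.

```c
#include <stdint.h>

/* (a) CMP + CSEL ... GT is a signed comparison. */
int64_t max_i64(int64_t a, int64_t b)
{
    return (a > b) ? a : b;
}

/* (b) ADD X0, X0, X0, LSL #3 computes x + (x << 3). */
uint64_t times9(uint64_t x)
{
    return x + (x << 3);
}

/* (c) Sum 32-bit elements into a 64-bit accumulator.
   Assumption: count >= 1, because the loop body runs before the test. */
uint64_t sum_u32(const uint32_t *p, uint32_t count, uint64_t acc)
{
    do {
        acc += *p++;          /* LDR W2, [X0], #4 zero-extends into X2; ADD X1, X1, X2 */
        count--;              /* SUBS W3, W3, #1 */
    } while (count != 0);     /* CBNZ W3, .loop */
    return acc;
}
```

Note that the 32-bit load zero-extends, so the x86-64 translation needs a load that also clears the upper 32 bits of the destination register.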

Exercise 4: Register File Analysis

a) A C function has 7 local variables of type int64_t. On x86-64, how many can live in callee-saved registers without using the stack? On ARM64?

b) A function calls printf(fmt, a, b, c, d, e, f, g, h) — 9 arguments (1 format + 8 values). How many arguments go onto the stack on x86-64? On ARM64?

c) The System V x86-64 ABI defines a "red zone": the 128 bytes below RSP. Standard AAPCS64 defines no red zone, although Apple's ARM64 ABI adds a 128-byte one below SP. What is the red zone for, and why can't kernel code use it?

Exercise 5: Instruction Encoding

a) The x86-64 instruction MOV RAX, RBX has the encoding 48 89 D8. Decode this:
- What does byte 0x48 indicate?
- What does byte 0x89 indicate?
- What does byte 0xD8 (11 011 000 in binary, where bits [7:6]=mod, [5:3]=reg, [2:0]=rm) indicate?

b) An ARM64 MOV X0, X1 is ORR X0, XZR, X1, which encodes to 32 bits. With fixed encoding, a 5-bit field identifies the destination register (allowing 32 registers). How many distinct x86-64 registers can a single 3-bit r/m field encode without a REX prefix?

c) The ARM64 branch instruction B label has a signed 26-bit immediate that is scaled by 4 (it counts instructions, not bytes). What is the maximum forward branch distance in bytes? In megabytes?
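
For part (a), the field layout given in the question can be turned into a small decoder to check a hand decoding. This is a minimal sketch, assuming the standard REX layout 0100WRXB; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* The three fields of a ModRM byte: bits [7:6], [5:3], [2:0]. */
struct modrm {
    uint8_t mod;
    uint8_t reg;
    uint8_t rm;
};

struct modrm decode_modrm(uint8_t byte)
{
    struct modrm f;
    f.mod = (uint8_t)(byte >> 6);
    f.reg = (uint8_t)((byte >> 3) & 7u);
    f.rm  = (uint8_t)(byte & 7u);
    return f;
}

/* A REX prefix has the bit pattern 0100WRXB. */
int is_rex(uint8_t byte)
{
    return (byte & 0xF0u) == 0x40u;
}

/* REX.W (bit 3) selects a 64-bit operand size. */
int rex_w(uint8_t byte)
{
    return (byte >> 3) & 1;
}
```

Passing 0xD8 to decode_modrm and 0x48 to is_rex and rex_w gives the raw field values; interpreting them (which register each 3-bit number names, and which field is source or destination for opcode 0x89) is still the exercise.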

Exercise 6: Performance Analysis

Given this inner loop, analyzed on both architectures:

for (int i = 0; i < n; i++) {
    c[i] = a[i] + b[i];
}

a) Write the x86-64 assembly for this loop (using scalar int64_t).
b) Write the ARM64 assembly for this loop (using scalar int64_t).
c) Count the instructions in each loop iteration.
d) If the loop runs for n=1000, which architecture spends more cycles on loop overhead (branch + counter decrement)? Are they the same?
e) Rewrite both using SIMD (SSE2 for x86-64, NEON for ARM64) to process 2 int64_t per iteration. How many instructions now?
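
To compare your hand-written loops against compiler output, the fragment above can be wrapped in a complete function. The sketch below assumes a signature the exercise does not specify (the names add_arrays and add_arrays_by_pairs and the size_t counter are choices made here). The second function is a scalar model of the two-elements-per-iteration structure that part (e) asks for, including the leftover element when n is odd.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar loop: one int64_t per iteration. */
void add_arrays(int64_t *c, const int64_t *a, const int64_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Scalar model of a 2-lane SIMD loop: two elements per iteration,
   followed by a tail step for an odd element count. */
void add_arrays_by_pairs(int64_t *c, const int64_t *a, const int64_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
    }
    if (i < n) {
        c[i] = a[i] + b[i];   /* leftover element when n is odd */
    }
}
```

A SIMD version that ignores the odd-n tail will read or write one element past the end of the arrays, so it is worth deciding up front whether your assembly assumes an even n.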

Exercise 7: RISC-V Basics

RISC-V RV64I has no condition flags. It uses compare-and-branch instructions (BEQ, BNE, BLT, BGE, BLTU, BGEU).

a) Write RISC-V code equivalent to if (a != b) goto label; (a in x10, b in x11)
b) Write RISC-V code for if (a < 0) goto negative; (a in x10 as signed)
c) Write RISC-V code for absolute value using BLT and NEG (NEG rd, rs is a pseudo-instruction for SUB rd, x0, rs)
d) RISC-V also has no shifted-register operand form (no inline barrel shifter). Write RISC-V code for x = y * 5 (y in x10) using ADD and SLLI (shift left logical immediate).
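
For parts (c) and (d), a C model of the intended computation gives you something to test your RISC-V code against. This is a sketch with hypothetical function names; the comments indicate which RISC-V instruction each step roughly corresponds to, on the assumption that NEG rd, rs expands to SUB rd, x0, rs.

```c
#include <stdint.h>

/* (d) y * 5 as a shift plus an add: (y << 2) + y. */
uint64_t times5(uint64_t y)
{
    uint64_t t = y << 2;   /* SLLI: t = y * 4 */
    return t + y;          /* ADD:  t + y = y * 5 */
}

/* (c) Absolute value with a branch around a negate. */
int64_t abs_branch(int64_t a)
{
    if (a < 0) {                           /* BLT against x0 selects this path */
        a = (int64_t)(0u - (uint64_t)a);   /* NEG, i.e. 0 - a */
    }
    return a;
}
```

The negate is written with unsigned subtraction so that the C model stays well defined for every input except the most negative value, which has no positive counterpart in two's complement.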

Exercise 8: Apple Silicon Analysis

Approximate single-threaded Geekbench scores (2026):
- Apple M4: ~3900
- Intel Core i9-14900K: ~2900
- AMD Ryzen 9 7950X: ~2800

a) The M4 runs at 4.0 GHz and the Core i9-14900K at 5.5 GHz, yet the M4 still wins. List 3 architectural reasons why a lower-clock ARM64 chip can outperform a higher-clock x86-64 chip.

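b) Rosetta 2 translates x86-64 to ARM64. An x86-64 program under Rosetta 2 scores ~2700 Geekbench on M4 (the same program scores 2900 on Intel natively). What is the Rosetta 2 overhead as a percentage? State which baseline you use: the Intel native score, or the M4's native ARM64 score of ~3900.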

c) A developer claims "I don't need to port my app to ARM64 because Rosetta 2 is fast enough." What are 3 reasons this might be a bad long-term strategy?
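
For part (b), the overhead figure depends entirely on which score is treated as the baseline. A minimal sketch of the calculation, with a hypothetical function name:

```c
/* Percentage by which measured_score falls short of baseline_score.
   Both arguments are benchmark scores where higher is better. */
double slowdown_percent(double baseline_score, double measured_score)
{
    return (baseline_score - measured_score) / baseline_score * 100.0;
}
```

Running it once with each candidate baseline from the exercise shows how much the choice changes the answer, which is worth stating explicitly when you report the overhead.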

Exercise 9: Memory Ordering

x86-64 uses TSO (Total Store Order). ARM64 uses a weaker model that allows more reordering.

a) In x86-64, can a LOAD be reordered before a preceding STORE? Answer according to the TSO model.
b) In ARM64, can a LOAD be reordered before a preceding STORE? What instruction prevents this?
c) Rosetta 2 must preserve x86-64 TSO semantics on ARM64. Apple silicon provides a hardware TSO mode for this, but a translator without one would have to insert barrier or acquire/release instructions. Which ARM64 instruction(s) could be used?
d) Why is the weaker ARM64 memory model generally considered an advantage rather than a disadvantage?
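
One way to see the two memory models side by side is to compile the same C11 atomics code for both targets. The sketch below is a minimal message-passing pattern with hypothetical names (payload, ready, publish, try_consume); it is meant for inspecting generated code, not as a complete concurrent program.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                 /* ordinary, non-atomic data */
static atomic_bool ready = false;   /* flag that publishes the data */

/* Writer: store the data, then set the flag with release ordering,
   so the payload store cannot be reordered after the flag store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: check the flag with acquire ordering, so the payload load
   cannot be reordered before the flag load. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return false;
    }
    *out = payload;
    return true;
}
```

On x86-64, compilers typically emit plain MOV instructions for both the release store and the acquire load, because TSO already provides that ordering. On ARM64 they typically emit STLR for the store and LDAR (or LDAPR) for the load. Comparing the two listings in Compiler Explorer is a useful starting point for parts (b) and (c).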

Exercise 10: Calling Convention Comparison

Write both x86-64 and ARM64 assembly for a function call to result = add_three(a, b, c) where a, b, c are int64_t values 10, 20, 30.

Show:
1. Setting up arguments before the call
2. Making the call instruction
3. Reading the return value after the call
4. Any stack alignment adjustments needed
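
A C version of the call gives you compiler output to check your hand-written sequences against. The sketch below assumes add_three simply sums its arguments, which the exercise does not state; call_add_three is a hypothetical wrapper name.

```c
#include <stdint.h>

/* Assumed behavior: the exercise does not define add_three,
   so summing the three arguments is a placeholder. */
int64_t add_three(int64_t a, int64_t b, int64_t c)
{
    return a + b + c;
}

/* The call site under study: three constant arguments, one result. */
int64_t call_add_three(void)
{
    int64_t result = add_three(10, 20, 30);
    return result;
}
```

With optimization enabled, a compiler is likely to inline the call or fold it to a constant, so the call sequence disappears. Compiling at -O0, or moving add_three into a separate source file, keeps the argument setup and call instruction visible.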

Exercise 11: Code Density Measurement

Compile the following function for both x86-64 and ARM64 (use Compiler Explorer or cross-compilers). Count the bytes in the compiled output:

int64_t weighted_sum(int64_t a, int64_t b, int64_t c) {
    return a * 3 + b * 5 + c * 7;
}

a) How many bytes is the x86-64 version?
b) How many bytes is the ARM64 version?
c) Which uses more instructions? More bytes?
d) At -O2, does the compiler use an IMUL for x86-64 or a multiply-with-shifts sequence?
e) At -O2, does the ARM64 compiler use MUL or ADD+LSL sequences?
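
Parts (d) and (e) refer to replacing a multiply with shifts and adds. The sketch below spells out one such decomposition in C so you can recognize the pattern in compiler output; weighted_sum_shifts is a hypothetical name, and whether a given compiler chooses this form is exactly what the exercise asks you to observe.

```c
#include <stdint.h>

/* The function from the exercise, using multiplies. */
int64_t weighted_sum(int64_t a, int64_t b, int64_t c)
{
    return a * 3 + b * 5 + c * 7;
}

/* The same result using only shifts, adds, and a subtract.
   Unsigned arithmetic is used so that shifting values with the
   top bit set stays well defined in C. */
int64_t weighted_sum_shifts(int64_t a, int64_t b, int64_t c)
{
    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;
    uint64_t uc = (uint64_t)c;
    uint64_t a3 = ua + (ua << 1);   /* a * 3 = a + 2a */
    uint64_t b5 = ub + (ub << 2);   /* b * 5 = b + 4b */
    uint64_t c7 = (uc << 3) - uc;   /* c * 7 = 8c - c */
    return (int64_t)(a3 + b5 + c7);
}
```

Each of the three lines maps naturally onto a single ARM64 instruction with a shifted operand, or onto an LEA or shift-and-subtract pair on x86-64, which is why the byte counts in parts (a) and (b) are worth comparing against the instruction counts in part (c).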

Exercise 12: Architecture Selection

For each use case, argue for x86-64 or ARM64:

a) A high-frequency trading server where single-thread latency is paramount and power cost is irrelevant
b) An IoT sensor that runs on a coin cell battery for 2 years
c) A cloud-based web server where cost-per-request is the primary metric
d) A workstation for video editing and 3D rendering (multi-threaded + GPU compute)
e) A smartphone application processor
f) A laptop intended for 16-hour battery life

Exercise 13: Historical Context

Match each event to its approximate year:

Event	Year
Intel 8086 (original x86)
ARM1 processor (Acorn)
ARMv8-A with AArch64 (64-bit ARM)
First iPhone (ARM11, 32-bit)
Apple M1 announcement
AWS Graviton (first ARM64 cloud server)
RISC-V ISA first published

Years to use (not in order): 1978, 1985, 2007, 2011, 2018, 2010, 2020

Exercise 14: ISA Design Trade-offs Essay

Write 300-400 words answering:

"If you were designing a new CPU ISA from scratch in 2026, what design choices would you make regarding: (1) instruction width, (2) number of registers, (3) memory operands in ALU instructions, and (4) condition flags? Justify each choice by referencing the x86-64 and ARM64 comparison."

There is no single correct answer — the goal is to reason about the trade-offs.

Exercise 15: Future of ISAs

Consider RISC-V, WebAssembly (WASM), and domain-specific architectures (TPUs, NPUs):

a) WASM is described as a "portable ISA." In what way is WASM an ISA, and how is it different from x86-64 and ARM64?
b) A neural network inference accelerator (NPU) doesn't use x86-64 or ARM64 instructions. Why is a domain-specific ISA often more efficient than a general-purpose one for a specific workload?
c) If all major software were written in Rust or Go (which compile to any target), would the ISA choice matter less? What performance considerations remain ISA-dependent even for high-level language programs?