Chapter 17 Exercises: ARM64 Instruction Set


Exercise 1: Instruction Translation — x86-64 to ARM64

Translate each x86-64 instruction sequence to equivalent ARM64 code. Assume: X0=rax, X1=rbx, X2=rcx, X3=rdx.

a)

mov rax, 100
add rax, rbx
sub rax, 10

b)

imul rax, rbx
add rax, [rcx]         ; assume rcx contains a pointer
mov [rdx], rax

c)

and rax, 0xFF
or  rax, 0x100
xor rbx, rbx           ; zero rbx

d)

shl rax, 4
shr rbx, 2
sar rcx, 1

e)

; multiply by 9 using shifts
lea rax, [rax + rax*8]

Exercise 2: ARM64 Register Traces

Trace the following ARM64 code. Fill in the register values after each instruction. Initial values: X0=100, X1=7, X2=3.

SDIV  X3, X0, X1     // X3 = ?
MSUB  X4, X3, X1, X0 // X4 = ?
MUL   X5, X3, X2     // X5 = ?
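ADD   X6, X4, X5     // X6 = ?

| Instruction         | X3 | X4 | X5 | X6 |
|---------------------|----|----|----|----|
| SDIV X3, X0, X1     |    |    |    |    |
| MSUB X4, X3, X1, X0 |    |    |    |    |
| MUL  X5, X3, X2     |    |    |    |    |
| ADD  X6, X4, X5     |    |    |    |    |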
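
To check a hand trace of Exercise 2, it can help to have a small reference model of the arithmetic instructions involved. The following is a minimal C sketch, not part of the chapter; the function names `sdiv64`, `mul64`, and `msub64` are hypothetical helpers chosen for illustration. It assumes a two's-complement target where converting an out-of-range `uint64_t` to `int64_t` wraps, which holds for mainstream compilers.

```c
#include <stdint.h>

/* Model of SDIV Xd, Xn, Xm: signed divide, rounding toward zero.
   Unlike C's '/', the AArch64 instruction does not trap:
   division by zero yields 0, and INT64_MIN / -1 yields INT64_MIN. */
int64_t sdiv64(int64_t n, int64_t m) {
    if (m == 0) return 0;
    if (n == INT64_MIN && m == -1) return INT64_MIN;
    return n / m;
}

/* Model of MUL Xd, Xn, Xm: low 64 bits of the product (wraps on overflow). */
int64_t mul64(int64_t n, int64_t m) {
    return (int64_t)((uint64_t)n * (uint64_t)m);
}

/* Model of MSUB Xd, Xn, Xm, Xa: Xd = Xa - Xn * Xm (wraps on overflow). */
int64_t msub64(int64_t n, int64_t m, int64_t a) {
    return (int64_t)((uint64_t)a - (uint64_t)n * (uint64_t)m);
}
```

Feeding the initial register values through these helpers in instruction order should reproduce each row of the table. Note the operand order of `msub64`, which mirrors the instruction: the accumulator comes last.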

Exercise 3: Calling Convention

Write the prologue and epilogue for a function that:

- Takes 3 int64_t arguments
- Uses callee-saved registers X19, X20, X21 internally
- Has 32 bytes of local variable space
- Calls another function (so must save LR)


Exercise 4: Addressing Modes

Compute the effective address (memory address used) and describe what happens to the base register for each addressing mode variant. Assume X0 = 0x1000, X1 = 4.

a) LDR X2, [X0]

b) LDR X2, [X0, #16]

c) LDR X2, [X0, X1]

d) LDR X2, [X0, X1, LSL #3]

e) LDR X2, [X0, #16]!

f) LDR X2, [X0], #16

After each instruction, what is the value of X0? Treat each instruction independently, starting from the initial values above.
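
One way to check the answers to Exercise 4 is to model each addressing form as a function that returns both the address accessed and the base register's value afterwards. The C sketch below does that; the type `access_t` and the four function names are hypothetical, introduced only for this illustration.

```c
#include <stdint.h>

/* Outcome of one load: the address accessed and the base register afterwards. */
typedef struct {
    uint64_t ea;          /* effective address used for the memory access */
    uint64_t base_after;  /* value of the base register after the instruction */
} access_t;

/* [Xn, #imm]: base plus immediate offset, base register unchanged. */
access_t offset_imm(uint64_t base, int64_t imm) {
    access_t r = { base + (uint64_t)imm, base };
    return r;
}

/* [Xn, Xm, LSL #s]: base plus shifted index register, base unchanged.
   For a 64-bit LDR the shift amount is 0 or 3. */
access_t offset_reg(uint64_t base, uint64_t index, unsigned shift) {
    access_t r = { base + (index << shift), base };
    return r;
}

/* [Xn, #imm]!: pre-index. The base is updated first, then used as the address. */
access_t pre_index(uint64_t base, int64_t imm) {
    uint64_t ea = base + (uint64_t)imm;
    access_t r = { ea, ea };
    return r;
}

/* [Xn], #imm: post-index. The old base is the address; the base is updated after. */
access_t post_index(uint64_t base, int64_t imm) {
    access_t r = { base, base + (uint64_t)imm };
    return r;
}
```

Calling, for example, `pre_index(0x1000, 16)` and `post_index(0x1000, 16)` and comparing the two fields makes the difference between the two writeback forms concrete.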


Exercise 5: Barrel Shifter

Rewrite each expression as a single ARM64 instruction using the inline barrel shifter:

a) ADD X0, X1, X2 * 4 (where * means actual multiply)

b) ADD X0, X1, X2 / 2 (logical right shift)

c) SUB X0, X1, X2 * 8

d) AND X0, X1, X2 * 16 (is this valid?)

e) Write X0 = X1 * 5 using ADD with a barrel shifter (hint: 5 = 4+1)
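
The identity behind Exercise 5 is that multiplying or dividing by a power of two is a shift, so the second operand can be shifted as part of the same instruction. The C sketch below models the general shifted-register form so that a candidate answer can be checked numerically; the helper names are hypothetical and unsigned arithmetic is assumed.

```c
#include <stdint.h>

/* Model of ADD Xd, Xn, Xm, LSL #k: Xd = Xn + (Xm << k), with 0 <= k <= 63. */
uint64_t add_lsl(uint64_t n, uint64_t m, unsigned k) {
    return n + (m << k);
}

/* Model of ADD Xd, Xn, Xm, LSR #k: Xd = Xn + (Xm >> k), logical shift. */
uint64_t add_lsr(uint64_t n, uint64_t m, unsigned k) {
    return n + (m >> k);
}

/* Model of SUB Xd, Xn, Xm, LSL #k: Xd = Xn - (Xm << k). */
uint64_t sub_lsl(uint64_t n, uint64_t m, unsigned k) {
    return n - (m << k);
}
```

Passing the same value as both `n` and `m` gives multiplication by a constant of the form 2^k + 1, which is the pattern part (e) asks for.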


Exercise 6: LDP and STP

a) Write the ARM64 instructions to save X19, X20, X21, X22 to the stack (callee-saved registers), adjusting SP appropriately.

b) Write the corresponding restore sequence.

c) Draw the stack diagram showing the saved register layout relative to SP.

d) How many instructions does this require? How many would be required if we used individual STR/LDR instructions?


Exercise 7: Conditional Branches and Flag Instructions

For each C expression, write the ARM64 code that evaluates it and branches to label_true if true, label_false otherwise. Assume X0=a, X1=b (signed 64-bit).

a) a == b

b) a != 0

c) a < b (signed)

d) a >= 0 (is a non-negative?)

e) (a & 0xF) == 0 (lower nibble is zero?)

f) a == b || a > 0 (compound condition; requires multiple branches)
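
Choosing the right condition code in Exercise 7 depends on how CMP sets the N, Z, C, and V flags. The following C sketch models CMP as a flag-setting subtraction and evaluates the signed conditions from those flags. The type `nzcv_t` and the function names are hypothetical, chosen for this illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } nzcv_t;

/* Model of CMP Xn, Xm: compute a - b, discard the result, keep the flags. */
nzcv_t cmp64(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t r = ua - ub;
    nzcv_t f;
    f.n = (r >> 63) != 0;                       /* result is negative */
    f.z = (r == 0);                             /* result is zero */
    f.c = (ua >= ub);                           /* no borrow occurred */
    f.v = (((ua ^ ub) & (ua ^ r)) >> 63) != 0;  /* signed overflow */
    return f;
}

/* Signed condition codes expressed in terms of the flags. */
bool cond_eq(nzcv_t f) { return f.z; }
bool cond_ne(nzcv_t f) { return !f.z; }
bool cond_lt(nzcv_t f) { return f.n != f.v; }
bool cond_ge(nzcv_t f) { return f.n == f.v; }
bool cond_gt(nzcv_t f) { return !f.z && f.n == f.v; }
bool cond_le(nzcv_t f) { return f.z || f.n != f.v; }
```

Trying operands near INT64_MIN and INT64_MAX shows why the signed conditions compare N with V instead of looking at N alone.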


Exercise 8: CBZ, CBNZ, TBZ, TBNZ

Simplify each sequence using CBZ, CBNZ, TBZ, or TBNZ:

a)

CMP X0, #0
B.EQ loop_exit

b)

CMP X1, #0
B.NE loop_body

c)

TST X0, #1
B.NE is_odd

d)

TST X0, #0x8000000000000000   // test sign bit
B.NE is_negative
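
When checking a simplification in Exercise 8, it helps to state precisely when each compare-and-branch instruction takes its branch. The C sketch below expresses those conditions as predicates; the function names are hypothetical. Unlike the CMP/TST sequences they replace, these instructions do not modify the condition flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* CBZ Xt, label: branch taken when the whole register is zero. */
bool cbz_taken(uint64_t x) { return x == 0; }

/* CBNZ Xt, label: branch taken when the register is non-zero. */
bool cbnz_taken(uint64_t x) { return x != 0; }

/* TBZ Xt, #bit, label: branch taken when the selected bit (0..63) is clear. */
bool tbz_taken(uint64_t x, unsigned bit) { return ((x >> bit) & 1u) == 0; }

/* TBNZ Xt, #bit, label: branch taken when the selected bit (0..63) is set. */
bool tbnz_taken(uint64_t x, unsigned bit) { return ((x >> bit) & 1u) != 0; }
```

A TST with a single-bit mask followed by B.NE branches under the same condition as `tbnz_taken` for that bit position, which is the equivalence parts (c) and (d) rely on.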

Exercise 9: System Call Program

Write a complete ARM64 Linux assembly program (no C library) that:

  1. Reads up to 64 bytes from stdin (fd=0) into a buffer
  2. Writes those bytes back to stdout (fd=1)
  3. Exits with status 0

Use only read (syscall 63), write (syscall 64), and exit (syscall 93).
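
Before writing Exercise 9 in assembly, it can be useful to pin down the intended behavior in C using the POSIX wrappers for the same system calls. The sketch below is a reference for the logic only, not a solution; `echo_once` is a hypothetical helper name, and the retry loop for short writes is an addition that the exercise does not require.

```c
#include <unistd.h>

/* Read up to 64 bytes from in_fd and write them to out_fd.
   Returns the number of bytes echoed, 0 at end of input, or -1 on error. */
ssize_t echo_once(int in_fd, int out_fd) {
    char buf[64];
    ssize_t n = read(in_fd, buf, sizeof buf);
    if (n <= 0) return n;

    /* write may accept fewer bytes than requested, so retry until done. */
    ssize_t done = 0;
    while (done < n) {
        ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
        if (w < 0) return -1;
        done += w;
    }
    return done;
}
```

A complete program would call `echo_once(0, 1)` and then `_exit(0)`. In the assembly version each of those calls becomes an SVC #0 with the syscall number in X8 and the arguments in X0 to X2, and the byte count returned by read must be carried over as the length passed to write.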


Exercise 10: AAPCS64 Argument Passing

For each C function prototype, state which register(s) hold each argument and where the return value goes:

a) int foo(int a, int b, int c);

b) void bar(char *s, int len, int flags);

c) uint64_t baz(uint64_t x, uint64_t y, uint64_t z, uint64_t w, uint64_t v, uint64_t u, uint64_t t, uint64_t s); (8 arguments)

d) double fadd(double a, double b); (floating-point)

e) int nine_args(int a, int b, int c, int d, int e, int f, int g, int h, int i); (9 arguments; where does i go?)


Exercise 11: 128-bit Arithmetic

Write ARM64 code to compute the full 128-bit product of two 64-bit unsigned integers in X0 and X1. Store the high 64 bits in X2 and the low 64 bits in X3.

Then write code to add two 128-bit numbers:

- (X1:X0) + (X3:X2) → (X5:X4)
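
The expected results for Exercise 11 can be computed in C and compared against the register values produced by the assembly. The sketch below is one such reference model. The names `mul_hi64`, `mul_lo64`, `u128_t`, and `add128` are hypothetical, and `unsigned __int128` is a GCC/Clang extension, not standard C.

```c
#include <stdint.h>

/* High 64 bits of the full 128-bit unsigned product.
   unsigned __int128 is a GCC/Clang extension (an assumption of this sketch). */
uint64_t mul_hi64(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Low 64 bits of the product; ordinary unsigned multiplication wraps. */
uint64_t mul_lo64(uint64_t a, uint64_t b) {
    return a * b;
}

/* A 128-bit value held as two 64-bit halves, like a register pair. */
typedef struct { uint64_t lo, hi; } u128_t;

/* Add the low halves, then add the high halves plus the carry out of the low add. */
u128_t add128(u128_t a, u128_t b) {
    u128_t r;
    r.lo = a.lo + b.lo;
    uint64_t carry = (r.lo < a.lo);  /* unsigned wraparound means a carry occurred */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

The carry computed explicitly in `add128` is what the carry flag provides for free in assembly, so the two-step structure maps directly onto a flag-setting add followed by an add-with-carry.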


Exercise 12: Loop Implementation

Translate the following C loop patterns to ARM64 assembly:

a) for (int i = 0; i < n; i++) { sum += arr[i]; } (n in W1, arr in X0, sum in X2)

b) while (*p != 0) { count++; p++; } (p in X0, count in X1; char array)

c) do { x = x * 2; } while (x < 1000); (x in X0)

d) Optimize loop (b) using LDRB and CBZ


Exercise 13: Sized Loads and Sign Extension

Write ARM64 code for each scenario:

a) Load a char (signed byte) from address X0 into X1, sign-extended to 64 bits

b) Load an unsigned char (unsigned byte) from address X0 into X1, zero-extended to 64 bits

c) Load a short (signed 16-bit) from address X0 into X1, sign-extended to 64 bits

d) Load an int (signed 32-bit) from address X0 into X1, sign-extended to 64 bits

e) Store the low byte of X0 to the address in X1
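
The difference between sign extension and zero extension in Exercise 13 is easiest to see on a value with its top bit set. The C sketch below states the intended result of each scenario without naming the instructions; the helper names are hypothetical, and `memcpy` is used so the loads are valid for any alignment.

```c
#include <stdint.h>
#include <string.h>

/* Signed byte widened to 64 bits: the sign bit is copied into bits 8..63. */
int64_t load_s8(const void *p) { int8_t v; memcpy(&v, p, sizeof v); return v; }

/* Unsigned byte widened to 64 bits: bits 8..63 become zero. */
uint64_t load_u8(const void *p) { uint8_t v; memcpy(&v, p, sizeof v); return v; }

/* Signed 16-bit value widened to 64 bits. */
int64_t load_s16(const void *p) { int16_t v; memcpy(&v, p, sizeof v); return v; }

/* Signed 32-bit value widened to 64 bits. */
int64_t load_s32(const void *p) { int32_t v; memcpy(&v, p, sizeof v); return v; }

/* Store only the low byte of x; the upper 56 bits are ignored. */
void store_u8(void *p, uint64_t x) { uint8_t v = (uint8_t)x; memcpy(p, &v, sizeof v); }
```

Loading the byte 0x80 through `load_s8` and `load_u8` gives two different 64-bit values, which is exactly the distinction parts (a) and (b) are testing.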


Exercise 14: Division and Remainder

a) Write ARM64 code to compute quotient = a / b and remainder = a % b for signed integers in X0 and X1.

b) Write ARM64 code to compute a % 16 without using a divide instruction, assuming a is unsigned (hint: 16 is a power of 2).

c) Write ARM64 code to compute a / 8 for an unsigned integer without using UDIV.

d) If a function needs to divide by the constant 7, the compiler will typically replace the divide instruction with a multiply-high + shift sequence. Using UMULH (the unsigned multiply-high) and shifts, implement x / 7 for unsigned x (research: use the magic number for dividing by 7).
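
For part (d), a C model of the multiply-high technique gives something to test an assembly answer against. The sketch below is one way to do it, under stated assumptions: `mul_hi64` and `div7` are hypothetical names, `unsigned __int128` is a GCC/Clang extension, and the constant is the low 64 bits of ceil(2^67 / 7), which is the value the standard derivation yields for a 64-bit divide by 7.

```c
#include <stdint.h>

/* High 64 bits of the unsigned 64x64 product (what a multiply-high computes).
   unsigned __int128 is a GCC/Clang extension (an assumption of this sketch). */
static uint64_t mul_hi64(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* x / 7 without a divide instruction.
   The full multiplier ceil(2^67 / 7) needs 65 bits, so only its low 64 bits
   are multiplied. The missing 2^64 * x term is added back as (x + t), computed
   as ((x - t) >> 1) + t so the sum cannot overflow, then shifted by the rest. */
uint64_t div7(uint64_t x) {
    const uint64_t magic = 0x2492492492492493ULL;  /* low 64 bits of ceil(2^67 / 7) */
    uint64_t t = mul_hi64(x, magic);
    return (((x - t) >> 1) + t) >> 2;
}
```

Comparing `div7(x)` with `x / 7` over boundary values such as 6, 7, and UINT64_MAX is a quick sanity check, and the same inputs can then be used to test the assembly version.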


Exercise 15: Instruction Encoding Challenge

ARM64 instructions are 32 bits. The MOV X0, #imm16 instruction (an alias of MOVZ) has the following encoding:

- 9 bits: opcode (the sf size bit, the 2-bit opc field, and 6 fixed bits)
- 2 bits: hw (shift amount: 0, 16, 32, 48)
- 16 bits: imm16
- 5 bits: Rd (destination register)

a) How many distinct registers can be encoded in 5 bits?

b) What is the maximum value of imm16?

c) With MOVK (keep other bits), how many instructions are needed to load an arbitrary 64-bit constant?

d) If the constant is 0x0000000000001234, how many MOV/MOVK instructions are needed? How many if it's 0xFFFFFFFFFFFF1234?

e) What does MOVN X0, #0 do? What value is in X0 afterward?
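
To experiment with parts (c) through (e), the effect of the three wide-move instructions on a 64-bit register can be modeled in a few lines of C. This is a sketch of the semantics only, with hypothetical function names; `hw` is the 2-bit field from the encoding above and selects which 16-bit lane the immediate lands in.

```c
#include <stdint.h>

/* MOVZ: place imm16 in lane hw (0..3) and zero every other bit. */
uint64_t movz(uint16_t imm16, unsigned hw) {
    return (uint64_t)imm16 << (16 * hw);
}

/* MOVK: replace lane hw of the old value with imm16, keeping the other lanes. */
uint64_t movk(uint64_t old, uint16_t imm16, unsigned hw) {
    unsigned shift = 16 * hw;
    uint64_t mask = (uint64_t)0xFFFF << shift;
    return (old & ~mask) | ((uint64_t)imm16 << shift);
}

/* MOVN: like MOVZ, but the whole 64-bit result is bitwise inverted. */
uint64_t movn(uint16_t imm16, unsigned hw) {
    return ~((uint64_t)imm16 << (16 * hw));
}
```

Building a target constant by chaining calls, one per instruction, makes it easy to count how many instructions a given value needs and to see when starting from `movn` is shorter than starting from `movz`.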