Chapter 18 Quiz: ARM64 Programming

Select the best answer for each question.


1. To access element arr[i] of an int64_t array (8-byte elements) where X0=arr and X1=i, the correct ARM64 instruction is:

a) LDR X2, [X0, X1, LSL #8]
b) LDR X2, [X0, X1, LSL #3]
c) LDR X2, [X0, X1, LSL #2]
d) LDR X2, [X0 + X1*8]


2. In ARM64 NEON, what does V0.4S mean?

a) 4 × 64-bit doubles
b) 4 × 32-bit integers or floats
c) 4 × 16-bit halfwords
d) 40-bit scalar register


3. What does FMLA V0.4S, V1.4S, V2.S[0] do?

a) V0[i] = V1[i] * V2[i] for i in 0..3
b) V0[i] += V1[i] * V2[i] for i in 0..3
c) V0[i] += V1[i] * V2[0] for i in 0..3 (broadcast scalar V2[0])
d) V0[i] = V1[i] + V2[0] for i in 0..3


4. Which ARM64 instruction converts a signed 64-bit integer in X0 to a double-precision float in D0?

a) MOV D0, X0
b) FMOV D0, X0
c) SCVTF D0, X0
d) CVTSI2SD D0, X0


5. What does UMAXV S0, V1.4S compute?

a) S0 = min of all 4 32-bit integer elements of V1
b) S0 = max of all 4 32-bit elements of V1, compared as unsigned integers
c) S0 = sum of all 4 float32 elements of V1
d) S0 = V1[0] max V1[1] max V1[2]


6. In a Linux ARM64 program's _start, where does argv[1] (the first command-line argument) live?

a) X1 register at program entry
b) Memory at [SP + 8]
c) Memory at [SP + 16]
d) Memory at [SP + 24]


7. On macOS ARM64, system calls differ from Linux ARM64 in which way?

a) macOS uses X8 for syscall numbers; Linux uses X16
b) macOS uses X16 for syscall numbers and SVC #0x80; Linux uses X8 and SVC #0
c) macOS uses INT 0x80; Linux uses SVC
d) They are identical: both use X8 and SVC #0


8. Which NEON register size is used to load 16 bytes at once?

a) D0 (64-bit)
b) S0 (32-bit)
c) Q0 (128-bit)
d) V0.8H (128-bit but only half used)


9. Which FP/SIMD registers are callee-saved in AAPCS64?

a) V0-V7
b) V8-V15 (lower 64 bits only)
c) V8-V31 (all 128 bits)
d) V16-V31


10. What does FMOV X0, D0 do?

a) Converts D0 (double) to int64_t in X0
b) Moves the bit pattern of D0 into X0 without type conversion
c) Moves X0 (integer) into D0 as a float
d) This is not a valid instruction


11. The instruction MOVI V1.16B, #0 does what?

a) Moves a 16-bit immediate into V1
b) Sets all 16 bytes of V1 to 0
c) Moves 16 bytes from memory at address 0 into V1
d) Clears only the first byte of V1


12. A scalar FADD loop performs n = 1000 floating-point additions in 1000 iterations. A NEON loop that uses FADD on .4S vectors would reduce the number of loop iterations to approximately:

a) 125
b) 250
c) 500
d) 1000 (no improvement; FADD still processes one element at a time)


13. Which instruction correctly saves a callee-saved NEON register D8 to the stack?

a) PUSH D8
b) STR D8, [SP, #-16]! (64-bit value; SP stays 16-byte aligned)
c) STR Q8, [SP, #-16]! (for full 128-bit)
d) Both b and c, depending on whether you need to preserve the 64-bit D8 or the full 128-bit Q8


14. CMEQ V0.16B, V0.16B, V1.16B compares 16 bytes. What value is placed in each byte of V0 where they are equal?

a) 0x01
b) 0x7F
c) 0xFF
d) 0x00 (the bytes are zeroed if equal)


15. What section directive is used in Mach-O (macOS) ARM64 assembly to declare read-only constant data?

a) .section .rodata
b) .section __TEXT,__const
c) .section __DATA,__data
d) .const_data


16. For implementing memcpy efficiently on ARM64, LDP/STP is preferred over individual LDR/STR because:

a) LDP/STP are the only instructions that can copy memory
b) Each LDP/STP pair copies 16 bytes (using X registers), halving the number of instructions
c) LDP/STP are atomic
d) Single LDR/STR cannot be used for memory copies


17. Rosetta 2 on Apple Silicon:

a) Runs ARM64 binaries on Intel Macs
b) Translates x86-64 code to ARM64 for execution on M-series chips
c) Emulates full x86-64 hardware in software
d) Requires manual recompilation of x86-64 apps


18. In the null-byte detection algorithm has_zero_byte = (x - 0x0101...) & ~x & 0x8080..., the 0x8080... mask is used to:

a) Set all bytes to 0x80
b) Isolate the most significant bit of each byte (detect potential null indicators)
c) Mask off the carry from each byte addition
d) Zero out all odd-numbered bytes
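
For reference, the expression in question 18 can be written out in C. This is a minimal sketch, not a definitive implementation: it assumes a 64-bit word, so the elided constants 0x0101... and 0x8080... are expanded to 8 bytes, and the function name has_zero_byte is taken from the left-hand side of the formula.

```c
#include <stdint.h>

/*
 * Sketch of the word-at-a-time zero-byte test from question 18.
 * Assumption: x holds 8 packed bytes, so both repeating constants
 * are expanded to 64 bits.
 * Returns nonzero if at least one byte of x is 0x00, and 0 otherwise.
 */
uint64_t has_zero_byte(uint64_t x)
{
    const uint64_t ones  = 0x0101010101010101ULL; /* 0x01 in every byte */
    const uint64_t highs = 0x8080808080808080ULL; /* 0x80 in every byte */

    /* Same three terms as the formula: (x - 0x01..) & ~x & 0x80.. */
    return (x - ones) & ~x & highs;
}
```

Called on 0x4142434445464748 (eight nonzero bytes) the function returns 0. Called on 0x4142430045464748, which contains one zero byte, it returns a nonzero value.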


19. Why must you use UXTW when using a 32-bit W register as an index in LDR X0, [X1, W2, UXTW #3]?

a) UXTW sets condition flags before the load
b) The base+register addressing mode requires a 64-bit offset; UXTW zero-extends W2 to 64 bits
c) UXTW multiplies W2 by the element size
d) Without UXTW, W2 would be treated as a signed offset


20. In the horizontal sum pattern for NEON float reduction, FADDP S0, V0.2S does what?

a) Adds all 4 elements of V0.4S
b) Adds the first two elements of V0 (V0[0] + V0[1]) → S0
c) Broadcasts V0[0] to all positions in S0
d) This is an invalid instruction