Case Study 16-1: Setting Up ARM64 Development — Raspberry Pi and QEMU

Objective

Get an ARM64 development environment running and write, assemble, link, execute, and debug a "Hello, World" program in ARM64 assembly. Understand every byte of the program's behavior before moving on.


Environment Options

We'll cover two setups: Raspberry Pi (native) and QEMU on x86-64 Ubuntu. Both produce identical results. Use whichever you have access to.

Setup A: Raspberry Pi 4/5 (Native ARM64)

Install Raspberry Pi OS 64-bit (Bookworm). Once booted:

# Verify you're on 64-bit ARM
uname -m
# Expected output: aarch64

# Install the assembler and linker
sudo apt update
sudo apt install binutils gcc gdb

# Verify
as --version
# Expected: GNU assembler (GNU Binutils for Debian) 2.40 (exact version varies by release)

On native hardware, as is already the ARM64 assembler — no cross-compilation prefix needed.

Setup B: QEMU on x86-64 Ubuntu/Debian

# Install cross tools, user-mode emulator, and multi-architecture GDB
sudo apt install binutils-aarch64-linux-gnu qemu-user gdb-multiarch

# Verify
aarch64-linux-gnu-as --version
qemu-aarch64 --version

For the QEMU setup, every as command becomes aarch64-linux-gnu-as, every ld becomes aarch64-linux-gnu-ld. The rest of this case study shows both forms.
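
One way to avoid typing the prefix by hand is to derive it from the host architecture. The sketch below is only an illustration: tool_prefix is a hypothetical helper name, and it assumes the cross tools carry the aarch64-linux-gnu- prefix installed above.

```shell
# tool_prefix: hypothetical helper, not part of binutils.
# Prints the prefix to put in front of as, ld, and objdump.
# The optional argument overrides the detected machine name (handy for testing).
tool_prefix() {
    case "${1:-$(uname -m)}" in
        aarch64|arm64) printf '%s' '' ;;                   # native ARM64: plain as/ld
        *)             printf '%s' 'aarch64-linux-gnu-' ;; # anything else: cross tools
    esac
}

# Example use (not run here because it needs the toolchain installed):
#   "$(tool_prefix)as" hello_arm64.s -o hello_arm64.o
#   "$(tool_prefix)ld" hello_arm64.o -o hello_arm64
```

With a helper like this, the same build commands work unchanged on the Raspberry Pi and on the x86-64 host.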


The Program

// hello_arm64.s
// A complete ARM64 Linux "Hello, World" using only system calls.
// No C library, no startup code — just the kernel.

// ============================================================
// Data Section: read-only string
// ============================================================
.section .rodata
.align 3                        // Align to 8-byte boundary (good practice)
msg:
    .ascii  "Hello, ARM64!\n"
// The '.' symbol is the current location counter
// msg_len = (current position) - (start of msg) = length in bytes
msg_len = . - msg               // Computed at assemble time: 14

// ============================================================
// Text Section: executable code
// ============================================================
.section .text
.global _start                  // _start is the ELF entry point
_start:

    // === System call: write(1, msg, 14) ===
    // Linux ARM64 syscall convention:
    //   X8  = syscall number
    //   X0  = arg1
    //   X1  = arg2
    //   X2  = arg3
    //   SVC #0 invokes the kernel

    MOV X8, #64                 // 64 = __NR_write (Linux ARM64)
    MOV X0, #1                  // fd = 1 (stdout)
    ADR X1, msg                 // X1 = PC-relative address of msg
    MOV X2, #msg_len            // X2 = length (14)
    SVC #0                      // Call kernel → write()
    // After return: X0 = bytes written (should be 14) or -errno

    // === System call: exit(0) ===
    MOV X8, #93                 // 93 = __NR_exit (Linux ARM64)
    MOV X0, #0                  // exit status = 0
    SVC #0                      // Call kernel → exit()
    // This syscall does not return
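
The assemble-time value of msg_len can be cross-checked from the shell by counting the bytes of the same string. This is a small sanity check, assuming a POSIX shell with printf and wc; count_bytes is a name made up for this sketch.

```shell
# count_bytes: print the byte length of a printf-style string.
# The argument is used as the printf format, so "\n" counts as one byte,
# just as it does inside .ascii.
count_bytes() {
    printf "$1" | wc -c | tr -d ' '
}

count_bytes 'Hello, ARM64!\n'    # prints 14, matching msg_len
```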

Step-by-Step Assembly and Execution

Step 1: Assemble

# Native ARM64 (Raspberry Pi):
as hello_arm64.s -o hello_arm64.o

# Cross-compiled (x86-64 host):
aarch64-linux-gnu-as hello_arm64.s -o hello_arm64.o

The assembler converts the text to an ELF object file. Look at it:

# Native:
objdump -d hello_arm64.o

# Cross:
aarch64-linux-gnu-objdump -d hello_arm64.o

Output:

hello_arm64.o:     file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <_start>:
   0:   d2800808        mov     x8, #0x40               // write syscall
   4:   d2800020        mov     x0, #0x1                // fd=1
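   8:   10000001        adr     x1, 0 <_start>          // addr of msg (filled in by the linker)
   c:   d28001c2        mov     x2, #0xe                // length=14
  10:   d4000001        svc     #0x0
  14:   d2800ba8        mov     x8, #0x5d               // exit syscall
  18:   d2800000        mov     x0, #0x0                // status=0
  1c:   d4000001        svc     #0x0

Notice: every instruction is exactly 4 bytes (8 hex digits). The ADR instruction at offset 0x8 still has a zero offset field, because msg lives in a different section (.rodata) and the assembler cannot know how far it will end up from the code. The assembler records an R_AARCH64_ADR_PREL_LO21 relocation instead (add -r to the objdump command to see it), and the linker fills in the real PC-relative offset when the object file is linked.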
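
To see where the immediates live inside those 8 hex digits, the fields can be pulled apart with shell arithmetic. This sketch assumes the standard A64 MOVZ layout (16-bit immediate in bits 20:5, shift selector in bits 22:21, destination register in bits 4:0) and only recognizes 64-bit MOVZ words like the ones in the listing; decode_movz is a hypothetical helper.

```shell
# decode_movz WORD: decode a 64-bit MOVZ instruction word.
decode_movz() {
    word=$(( $1 ))
    # Bits 31:23 = 0b110100101 (0x1a5) identify a 64-bit MOVZ.
    if [ $(( (word >> 23) & 0x1ff )) -ne $(( 0x1a5 )) ]; then
        echo "not a 64-bit MOVZ"
        return 1
    fi
    hw=$((    (word >> 21) & 0x3 ))      # shift amount in units of 16 bits
    imm16=$(( (word >> 5)  & 0xffff ))   # 16-bit immediate
    rd=$((    word & 0x1f ))             # destination register number
    echo "mov x$rd, #$(( imm16 << (16 * hw) ))"
}

decode_movz 0xd2800808    # prints: mov x8, #64
decode_movz 0xd28001c2    # prints: mov x2, #14
decode_movz 0xd2800ba8    # prints: mov x8, #93
```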

Step 2: Link

# Native:
ld hello_arm64.o -o hello_arm64

# Cross:
aarch64-linux-gnu-ld hello_arm64.o -o hello_arm64

Step 3: Run

# Native (Raspberry Pi):
./hello_arm64
Hello, ARM64!

# QEMU user emulation (x86-64 host):
qemu-aarch64 ./hello_arm64
Hello, ARM64!

Instruction-by-Instruction Explanation

MOV X8, #64

Loads the immediate value 64 into register X8. On Linux ARM64, X8 is the syscall number register (this is the first difference from x86-64, where syscall numbers go in RAX).

64 in decimal = 0x40. The Linux ARM64 syscall number for write is 64. You can verify:

grep -w '__NR_write' /usr/include/asm-generic/unistd.h
# #define __NR_write 64

ARM64 Linux uses the "generic" syscall table shared by newer architecture ports, whereas x86-64 has its own architecture-specific table, so most syscall numbers differ between the two.
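
The same lookup can be wrapped in a small function so that it matches the symbol exactly and prints only the number. This is a sketch, not a standard tool: syscall_nr is a hypothetical name, and the default header path is the one used in the grep command above.

```shell
# syscall_nr NAME [HEADER]: print the number defined for __NR_NAME.
# HEADER defaults to the generic table; the exact field comparison
# means __NR_write does not also match __NR_writev.
syscall_nr() {
    awk -v sym="__NR_$1" '$1 == "#define" && $2 == sym { print $3; exit }' \
        "${2:-/usr/include/asm-generic/unistd.h}"
}

# Example use on a system with the kernel headers installed:
#   syscall_nr write
#   syscall_nr exit
```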

MOV X0, #1

Sets X0 to 1. X0 is the first argument register (equivalent to RDI on x86-64). For write, argument 1 is the file descriptor. 1 is stdout.

ADR X1, msg

This instruction deserves attention. ADR loads a PC-relative address — the address of the label msg is computed as (current PC + offset) at runtime.

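Why not MOV X1, msg? MOV takes an immediate, and a single MOV (MOVZ) can encode only a 16-bit value, optionally shifted left by 16, 32, or 48 bits. Building an arbitrary 64-bit address this way takes up to four instructions (one MOVZ plus up to three MOVK), and the result is an absolute address that is wrong if the code is loaded anywhere else. For nearby addresses, ADR is the right tool: it encodes a signed 21-bit PC-relative offset in a single 32-bit instruction, giving a range of ±1 MB.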
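
To make the cost of the MOV route concrete, the sketch below prints the MOVZ/MOVK sequence needed to build a given constant 16 bits at a time. It is an illustration only: mov_sequence is a hypothetical helper, the address 0x412345 is made up, and values are assumed to fit in a signed 64-bit shell integer.

```shell
# mov_sequence VALUE [REG]: print the MOVZ/MOVK instructions that build VALUE.
# Zero 16-bit chunks are skipped because MOVZ already clears them.
mov_sequence() {
    value=$(( $1 ))
    reg=${2:-X1}
    op=MOVZ                      # first non-zero chunk uses MOVZ, the rest MOVK
    shift_amt=0
    while [ "$shift_amt" -lt 64 ]; do
        chunk=$(( (value >> shift_amt) & 0xffff ))
        if [ "$chunk" -ne 0 ]; then
            if [ "$shift_amt" -eq 0 ]; then
                printf '%s %s, #0x%x\n' "$op" "$reg" "$chunk"
            else
                printf '%s %s, #0x%x, LSL #%d\n' "$op" "$reg" "$chunk" "$shift_amt"
            fi
            op=MOVK
        fi
        shift_amt=$(( shift_amt + 16 ))
    done
    # A value of zero has no non-zero chunk; one MOVZ still does the job.
    if [ "$op" = MOVZ ]; then
        printf 'MOVZ %s, #0x0\n' "$reg"
    fi
}

mov_sequence 0x412345
# prints:
#   MOVZ X1, #0x2345
#   MOVK X1, #0x41, LSL #16
```

Even this small address needs two instructions, where ADR needs one and stays correct wherever the program is loaded.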

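In a relocatable object file, the offset is not yet known: msg lives in .rodata, a different section from the code, so the assembler leaves the field for the linker to fill in (an R_AARCH64_ADR_PREL_LO21 relocation). Once the linker has placed .text and .rodata, the instruction holds the byte distance from the ADR itself to msg. Run objdump -d on the linked hello_arm64 to see the resolved target address.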
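
The 21-bit offset is split across two fields of the instruction word (immlo in bits 30:29, immhi in bits 23:5). The sketch below reassembles and sign-extends it with shell arithmetic. adr_offset is a hypothetical helper, and the sample words are chosen for illustration rather than taken from a particular build.

```shell
# adr_offset WORD: print the signed byte offset encoded in an ADR instruction.
adr_offset() {
    word=$(( $1 ))
    immlo=$(( (word >> 29) & 0x3 ))        # low 2 bits of the offset
    immhi=$(( (word >> 5)  & 0x7ffff ))    # high 19 bits of the offset
    off=$(( (immhi << 2) | immlo ))        # 21-bit two's-complement value
    if [ "$off" -ge $(( 1 << 20 )) ]; then # sign bit set: negative offset
        off=$(( off - (1 << 21) ))
    fi
    echo "$off"
}

adr_offset 0x10000061    # prints 12  (target = PC + 12)
adr_offset 0x10ffffe1    # prints -4  (target = PC - 4)
adr_offset 0x10000001    # prints 0   (offset field not yet filled in)
```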

MOV X2, #msg_len

msg_len was computed at assemble time as 14. This becomes a simple immediate. X2 is the third argument (count for write).

SVC #0

Supervisor Call — invokes the OS kernel. The #0 is the immediate (traditionally used for the call type on bare-metal; Linux ignores it and looks at X8 for the syscall number). This is equivalent to SYSCALL on x86-64.

After this instruction:

  - The kernel executes write(1, <msg_addr>, 14).
  - X0 holds the return value: the number of bytes written (14) or a negative errno.
  - All other general-purpose registers, including X1-X7 and X8, are preserved. Linux on ARM64 saves the user registers on entry and restores them on return, so only X0 changes.
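
The "bytes written or negative errno" convention can be sketched as a simple range check. This assumes the usual Linux rule that raw return values from -4095 to -1 are error codes; classify_ret is a hypothetical name used only here.

```shell
# classify_ret VALUE: interpret a raw syscall return value as seen in X0.
classify_ret() {
    ret=$1
    if [ "$ret" -lt 0 ] && [ "$ret" -ge -4095 ]; then
        echo "error errno=$(( 0 - ret ))"   # e.g. -9 means EBADF (errno 9)
    else
        echo "ok value=$ret"
    fi
}

classify_ret 14    # prints: ok value=14
classify_ret -9    # prints: error errno=9
```

In assembly the same test is a comparison of X0 against the error range after SVC returns.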

MOV X8, #93 / MOV X0, #0 / SVC #0

Linux ARM64 syscall 93 is exit. X0 = 0 is the exit status. SVC #0 calls the kernel. This call never returns.


Tracing with GDB/QEMU

To trace the program step by step:

# Terminal 1: run with GDB stub
qemu-aarch64 -g 1234 ./hello_arm64

# Terminal 2: connect GDB (on Ubuntu/Debian the multi-architecture build is packaged as gdb-multiarch)
gdb-multiarch ./hello_arm64
(gdb) target remote :1234
(gdb) info registers
(gdb) layout regs
(gdb) stepi

Or on native Raspberry Pi:

gdb ./hello_arm64
(gdb) break _start
(gdb) run
(gdb) layout regs
(gdb) stepi

Trace output at each step:

Step 1: MOV X8, #64
  X8: 0x0000000000000000 → 0x0000000000000040

Step 2: MOV X0, #1
  X0: 0x0000000000000000 → 0x0000000000000001

Step 3: ADR X1, msg
  X1: 0x0000000000000000 → address of msg
  (GDB shows the actual virtual address that the linker assigned to msg)

Step 4: MOV X2, #14
  X2: 0x0000000000000000 → 0x000000000000000E

Step 5: SVC #0
  [kernel prints "Hello, ARM64!" to stdout]
  X0: 0x000000000000000E  (14 bytes written, returned in X0)

Step 6: MOV X8, #93
  X8: 0x0000000000000040 → 0x000000000000005D

Step 7: MOV X0, #0
  X0: 0x000000000000000E → 0x0000000000000000

Step 8: SVC #0
  [process terminates]

Comparing ARM64 and x86-64 Hello World

Side by side for reference:

// ARM64                         // x86-64 (NASM)
.section .rodata                 section .data
msg:                             msg:
  .ascii "Hello!\n"                db "Hello!", 10
msg_len = . - msg                msg_len equ $ - msg

.section .text                   section .text
.global _start                   global _start
_start:                          _start:
  MOV X8, #64   // write           mov rax, 1      ; write
  MOV X0, #1    // fd=stdout        mov rdi, 1      ; fd=stdout
  ADR X1, msg   // buf              mov rsi, msg    ; buf
  MOV X2, #msg_len                  mov rdx, msg_len
  SVC #0                            syscall

  MOV X8, #93   // exit             mov rax, 60     ; exit
  MOV X0, #0                        mov rdi, 0
  SVC #0                            syscall

Key differences:

  1. ARM64 uses X8 for the syscall number; x86-64 uses RAX.
  2. ARM64 uses ADR for a PC-relative address; x86-64 uses a direct label reference (position-dependent by default in NASM).
  3. ARM64 uses SVC #0; x86-64 uses SYSCALL.
  4. The syscall numbers differ (write=64 on ARM64, write=1 on x86-64).


Common Setup Problems

Problem: qemu-aarch64: Could not open '/lib/ld-linux-aarch64.so.1': No such file or directory
Solution: You built a dynamically linked binary, but the ARM64 runtime loader is not installed on the host. Either link statically (add -static to the link command), or install the ARM64 libraries (libc6-arm64-cross) and point QEMU at them: qemu-aarch64 -L /usr/aarch64-linux-gnu ./hello_arm64

Problem: aarch64-linux-gnu-ld: cannot find -lc
Solution: You're trying to link with libc. For a pure system-call program (like this one), you don't need it. Just use aarch64-linux-gnu-ld hello_arm64.o -o hello_arm64 (no -lc).

Problem: Segmentation fault immediately
Solution: Double-check your section names (.text not text, .rodata not rodata). Also verify the .global _start declaration.
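
Both checks for the segmentation-fault case can be automated with grep. The function below is a rough sketch, assuming the source uses GNU as directives spelled as in this case study; check_source is a hypothetical name and the patterns are deliberately simple.

```shell
# check_source FILE: report the two mistakes named above, if present.
# Returns 0 when both directives are found, 1 otherwise.
check_source() {
    status=0
    grep -Eq '^[[:space:]]*\.(global|globl)[[:space:]]+_start' "$1" ||
        { echo "missing .global _start"; status=1; }
    grep -Eq '^[[:space:]]*(\.section[[:space:]]+)?\.text' "$1" ||
        { echo "missing .text section directive"; status=1; }
    return $status
}

# Example use:
#   check_source hello_arm64.s && echo "directives look fine"
```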


Summary

A working ARM64 development environment lets you:

  1. Assemble with aarch64-linux-gnu-as (cross) or as (native)
  2. Link with aarch64-linux-gnu-ld or ld
  3. Execute with qemu-aarch64 (user mode) or natively on ARM64 hardware
  4. Debug with gdb-multiarch + QEMU GDB stub (cross) or gdb (native)

The Hello World program demonstrates the fundamental ARM64 syscall mechanism: load syscall number into X8, arguments into X0-X5, call SVC #0, read result from X0. This pattern is the foundation for all Linux ARM64 assembly programming.