Case Study 16-1: Setting Up ARM64 Development — Raspberry Pi and QEMU
Objective
Get an ARM64 development environment running and write, assemble, link, execute, and debug a "Hello, World" program in ARM64 assembly. Understand every byte of the program's behavior before moving on.
Environment Options
We'll cover two setups: Raspberry Pi (native) and QEMU on x86-64 Ubuntu. Both produce identical results. Use whichever you have access to.
Setup A: Raspberry Pi 4/5 (Native ARM64)
Install Raspberry Pi OS 64-bit (Bookworm). Once booted:
# Verify you're on 64-bit ARM
uname -m
# Expected output: aarch64
# Install the assembler and linker
sudo apt update
sudo apt install binutils gcc gdb
# Verify
as --version
# Expected on Bookworm: GNU assembler (GNU Binutils for Debian) 2.40 (exact version may differ)
On native hardware, as is already the ARM64 assembler — no cross-compilation prefix needed.
Setup B: QEMU on x86-64 Ubuntu/Debian
# Install cross tools and user-mode emulator
sudo apt install binutils-aarch64-linux-gnu qemu-user gdb-multiarch
# Verify
aarch64-linux-gnu-as --version
qemu-aarch64 --version
For the QEMU setup, every as command becomes aarch64-linux-gnu-as, every ld becomes aarch64-linux-gnu-ld, and every objdump becomes aarch64-linux-gnu-objdump. The rest of this case study shows both forms.
The Program
// hello_arm64.s
// A complete ARM64 Linux "Hello, World" using only system calls.
// No C library, no startup code — just the kernel.
// ============================================================
// Data Section: read-only string
// ============================================================
.section .rodata
.align 3 // Align to 8-byte boundary (good practice)
msg:
.ascii "Hello, ARM64!\n"
// The '.' symbol is the current location counter
// msg_len = (current position) - (start of msg) = length in bytes
msg_len = . - msg // Computed at assemble time: 14
// ============================================================
// Text Section: executable code
// ============================================================
.section .text
.global _start // _start is the ELF entry point
_start:
// === System call: write(1, msg, 14) ===
// Linux ARM64 syscall convention:
// X8 = syscall number
// X0 = arg1
// X1 = arg2
// X2 = arg3
// SVC #0 invokes the kernel
MOV X8, #64 // 64 = __NR_write (Linux ARM64)
MOV X0, #1 // fd = 1 (stdout)
ADR X1, msg // X1 = PC-relative address of msg
MOV X2, #msg_len // X2 = length (14)
SVC #0 // Call kernel → write()
// After return: X0 = bytes written (should be 14) or -errno
// === System call: exit(0) ===
MOV X8, #93 // 93 = __NR_exit (Linux ARM64)
MOV X0, #0 // exit status = 0
SVC #0 // Call kernel → exit()
// This syscall does not return
Step-by-Step Assembly and Execution
Step 1: Assemble
# Native ARM64 (Raspberry Pi):
as hello_arm64.s -o hello_arm64.o
# Cross-compiled (x86-64 host):
aarch64-linux-gnu-as hello_arm64.s -o hello_arm64.o
The assembler converts the text to an ELF object file. Look at it:
# Native:
objdump -d hello_arm64.o
# Cross:
aarch64-linux-gnu-objdump -d hello_arm64.o
Output:
hello_arm64.o: file format elf64-littleaarch64
Disassembly of section .text:
0000000000000000 <_start>:
0: d2800808 mov x8, #0x40 // write syscall
4: d2800020 mov x0, #0x1 // fd=1
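8: 10000001 adr x1, 8 <_start+0x8> // addr of msg (placeholder until link time)
c: d28001c2 mov x2, #0xe // length=14
10: d4000001 svc #0x0
14: d2800ba8 mov x8, #0x5d // exit syscall
18: d2800000 mov x0, #0x0 // status=0
1c: d4000001 svc #0x0
Notice: every instruction is exactly 4 bytes (8 hex digits). The trailing // notes are annotations added here, not objdump output. The ADR at offset 0x8 has a zero offset field because msg lives in .rodata, a separate section whose distance from .text is unknown until link time. The assembler records a relocation instead (objdump -dr shows it as R_AARCH64_ADR_PREL_LO21), and the linker patches in the real PC-relative offset. Disassemble the linked executable to see the final target.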
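To check the MOV encodings by hand: the MOVZ form of MOV keeps the destination register in bits 0-4 and the 16-bit immediate in bits 5-20. The shell sketch below pulls those two fields out of the four MOV words in the listing with plain shell arithmetic. decode_movz is a hypothetical helper name, and it assumes every word is an unshifted 64-bit MOVZ, which holds for this program but not for MOV in general.

```shell
# decode_movz WORD: print the register and immediate of an unshifted MOVZ.
# Assumed layout: Rd = bits 0-4, imm16 = bits 5-20.
decode_movz() {
    rd=$(( $1 & 0x1f ))
    imm16=$(( ($1 >> 5) & 0xffff ))
    echo "mov x$rd, #$imm16"
}

for word in 0xd2800808 0xd2800020 0xd28001c2 0xd2800ba8; do
    decode_movz "$word"
done
# Prints:
# mov x8, #64
# mov x0, #1
# mov x2, #14
# mov x8, #93
```

The decimal values match the immediates written in the source file.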
Step 2: Link
# Native:
ld hello_arm64.o -o hello_arm64
# Cross:
aarch64-linux-gnu-ld hello_arm64.o -o hello_arm64
Step 3: Run
# Native (Raspberry Pi):
./hello_arm64
Hello, ARM64!
# QEMU user emulation (x86-64 host):
qemu-aarch64 ./hello_arm64
Hello, ARM64!
Instruction-by-Instruction Explanation
MOV X8, #64
Loads the immediate value 64 into register X8. On Linux ARM64, X8 is the syscall number register (this is the first difference from x86-64, where syscall numbers go in RAX).
64 in decimal = 0x40. The Linux ARM64 syscall number for write is 64. You can verify:
grep -w '#define __NR_write' /usr/include/asm-generic/unistd.h
# #define __NR_write 64
ARM64 Linux uses the "generic" syscall table (asm-generic/unistd.h) rather than an architecture-specific table like the one x86-64 has, so most syscall numbers differ from x86-64.
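A quick way to compare the two tables is to pull the number out of each header. The helper below is a sketch: nr_lookup is a hypothetical name, and the header paths in the usage comments are the usual Debian/Ubuntu locations, which may differ on other systems. Matching the macro name exactly avoids also picking up neighbours such as __NR_writev.

```shell
# nr_lookup HEADER NAME: print the number defined for __NR_NAME in HEADER.
# (nr_lookup is an illustrative helper, not a standard tool.)
nr_lookup() {
    awk -v macro="__NR_$2" '$1 == "#define" && $2 == macro { print $3 }' "$1"
}

# Usage (paths assumed; present only if the kernel headers are installed):
#   nr_lookup /usr/include/asm-generic/unistd.h write               # ARM64 table
#   nr_lookup /usr/include/x86_64-linux-gnu/asm/unistd_64.h write   # x86-64 table
```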
MOV X0, #1
Sets X0 to 1. X0 is the first argument register (equivalent to RDI on x86-64). For write, argument 1 is the file descriptor. 1 is stdout.
ADR X1, msg
This instruction deserves attention. ADR loads a PC-relative address — the address of the label msg is computed as (current PC + offset) at runtime.
Why not MOV X1, msg? MOV takes an immediate, and a single MOV (MOVZ) can encode only a 16-bit immediate, optionally shifted left by 16, 32, or 48 bits. Building a full 64-bit value takes up to four instructions (MOVZ plus MOVK), and an absolute address would also tie the code to one load address. For addresses, ADR is the right tool: it encodes a signed 21-bit PC-relative offset in the 32-bit instruction, giving a range of ±1 MB.
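To make the multi-instruction cost concrete, the sketch below splits a constant into 16-bit chunks and prints one way to build it with MOVZ followed by MOVK. The function name and the sample value 0x400078 are illustrative only (a made-up address, not one taken from this program), and an assembler may find a shorter encoding for some constants.

```shell
# movz_movk_seq REG VALUE: print a MOVZ/MOVK sequence that loads VALUE.
# Illustrative helper; all-zero chunks are skipped.
movz_movk_seq() {
    reg=$1
    val=$(( $2 ))
    op=MOVZ
    for lsl in 0 16 32 48; do
        chunk=$(( (val >> lsl) & 0xffff ))
        if [ "$chunk" -ne 0 ]; then
            printf '%s %s, #0x%x, LSL #%d\n' "$op" "$reg" "$chunk" "$lsl"
            op=MOVK    # MOVZ clears the rest; later chunks must keep it
        fi
    done
    # A value of zero has no non-zero chunk, so emit a single MOVZ.
    if [ "$op" = MOVZ ]; then
        printf 'MOVZ %s, #0\n' "$reg"
    fi
}

movz_movk_seq X1 0x400078
# Prints:
# MOVZ X1, #0x78, LSL #0
# MOVK X1, #0x40, LSL #16
```

Even this small address needs two instructions, whereas ADR does the job in one and stays correct wherever the program is loaded.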
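In the object-file disassembly, the ADR target is only provisional: msg is in .rodata, and the assembler cannot know how far .rodata will end up from .text. It leaves a relocation for the linker, which fills in the real offset once both sections have addresses. To see the final value, disassemble the linked executable (objdump -d hello_arm64); the target shown there is the run-time address of msg.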
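The ADR offset can be recovered from an instruction word with simple bit arithmetic. ADR stores a signed 21-bit offset split across two fields: the low 2 bits (immlo) in bits 29-30 and the high 19 bits (immhi) in bits 5-23. The sketch below reassembles and sign-extends them. adr_offset is a hypothetical helper name, and the sample words are hand-built illustrative encodings.

```shell
# adr_offset WORD: print the signed byte offset encoded in an ADR instruction.
# Layout: immlo = bits 29-30, immhi = bits 5-23, offset = immhi:immlo (21 bits).
adr_offset() {
    immlo=$(( ($1 >> 29) & 0x3 ))
    immhi=$(( ($1 >> 5) & 0x7ffff ))
    imm21=$(( (immhi << 2) | immlo ))
    # Sign-extend: values with bit 20 set are negative.
    if [ "$imm21" -ge 1048576 ]; then
        imm21=$(( imm21 - 2097152 ))
    fi
    echo "$imm21"
}

adr_offset 0x10000041   # prints 8  (target is 8 bytes after the ADR)
adr_offset 0x10ffffe0   # prints -4 (target is 4 bytes before the ADR)
```

Adding the printed offset to the address of the ADR instruction itself gives the address that lands in the destination register.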
MOV X2, #msg_len
msg_len was computed at assemble time as 14. This becomes a simple immediate. X2 is the third argument (count for write).
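The same count can be cross-checked outside the assembler. This shell sketch sends the identical byte string through wc -c; it relies on printf turning \n into a single newline byte, just as .ascii does.

```shell
# Count the bytes in the message, newline included.
# The outer $(( )) strips any padding that wc adds on some systems.
msg_len=$(( $(printf 'Hello, ARM64!\n' | wc -c) ))
echo "$msg_len"    # prints 14
```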
SVC #0
Supervisor Call — invokes the OS kernel. The #0 is the immediate (traditionally used for the call type on bare-metal; Linux ignores it and looks at X8 for the syscall number). This is equivalent to SYSCALL on x86-64.
After this instruction:
- The kernel executes write(1, <msg_addr>, 14)
- X0 returns the number of bytes written (14) or a negative errno
- All other registers, including X1-X7 and X8, are preserved: the Linux ARM64 kernel saves and restores the user register state around the syscall, so only X0 changes
MOV X8, #93 / MOV X0, #0 / SVC #0
Linux ARM64 syscall 93 is exit. X0 = 0 is the exit status. SVC #0 calls the kernel. This call never returns.
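The value placed in X0 before the exit syscall is what the shell sees as the exit status. The wrapper below is a small sketch for checking it (report_status is a hypothetical helper name); it works with either the native or the QEMU invocation.

```shell
# report_status CMD [ARGS...]: run a command, then print its exit status.
report_status() {
    status=0
    "$@" || status=$?
    echo "exit status: $status"
}

# Usage:
#   report_status ./hello_arm64                 # native
#   report_status qemu-aarch64 ./hello_arm64    # under QEMU
# Either should print the greeting followed by "exit status: 0".
```

Changing the MOV X0, #0 before the final SVC to another small value and rerunning is an easy way to confirm the connection.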
Tracing with GDB/QEMU
To trace the program step by step:
# Terminal 1: run with GDB stub
qemu-aarch64 -g 1234 ./hello_arm64
# Terminal 2: connect GDB (Debian/Ubuntu package: gdb-multiarch; some toolchains ship it as aarch64-linux-gnu-gdb)
gdb-multiarch ./hello_arm64
(gdb) target remote :1234
(gdb) info registers
(gdb) layout regs
(gdb) stepi
Or on native Raspberry Pi:
gdb ./hello_arm64
(gdb) break _start
(gdb) run
(gdb) layout regs
(gdb) stepi
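For a repeatable trace on native hardware, the same commands can be driven from a GDB command file. The sketch below writes one and runs it in batch mode if gdb and the binary are present. The file name trace.gdb is arbitrary, and the register list is just the four registers this program touches.

```shell
# Write a GDB command file that single-steps four instructions and
# prints the registers the program uses after each step.
cat > trace.gdb <<'EOF'
break _start
run
stepi
info registers x8 x0 x1 x2
stepi
info registers x8 x0 x1 x2
stepi
info registers x8 x0 x1 x2
stepi
info registers x8 x0 x1 x2
EOF

# Run it only where gdb and the ARM64 binary are available (native setup).
if command -v gdb >/dev/null 2>&1 && [ -x ./hello_arm64 ]; then
    gdb -batch -x trace.gdb ./hello_arm64
fi
```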
Trace output at each step:
Step 1: MOV X8, #64
X8: 0x0000000000000000 → 0x0000000000000040
Step 2: MOV X0, #1
X0: 0x0000000000000000 → 0x0000000000000001
Step 3: ADR X1, msg
X1: 0x0000000000000000 → address of msg (for a default static link, a value a little above 0x0000000000400000; GDB shows the actual address)
Step 4: MOV X2, #14
X2: 0x0000000000000000 → 0x000000000000000E
Step 5: SVC #0
[kernel prints "Hello, ARM64!" to stdout]
X0: 0x000000000000000E (14 bytes written, returned in X0)
Step 6: MOV X8, #93
X8: 0x0000000000000040 → 0x000000000000005D
Step 7: MOV X0, #0
X0: 0x000000000000000E → 0x0000000000000000
Step 8: SVC #0
[process terminates]
Comparing ARM64 and x86-64 Hello World
Side by side for reference:
// ARM64                          // x86-64 (NASM)
.section .rodata                  section .data
msg:                              msg:
.ascii "Hello!\n"                 db "Hello!", 10
msg_len = . - msg                 msg_len equ $ - msg
.section .text                    section .text
.global _start                    global _start
_start:                           _start:
MOV X8, #64 // write              mov rax, 1 ; write
MOV X0, #1 // fd=stdout           mov rdi, 1 ; fd=stdout
ADR X1, msg // buf                mov rsi, msg ; buf
MOV X2, #msg_len                  mov rdx, msg_len
SVC #0                            syscall
MOV X8, #93 // exit               mov rax, 60 ; exit
MOV X0, #0                        mov rdi, 0
SVC #0                            syscall
Key differences:
1. ARM64 uses X8 for syscall number; x86-64 uses RAX
2. ARM64 uses ADR for PC-relative address; x86-64 uses a direct label reference (position-dependent by default in NASM)
3. ARM64 uses SVC #0; x86-64 uses SYSCALL
4. The syscall numbers differ (write=64 on ARM64, write=1 on x86-64)
Common Setup Problems
Problem: qemu-aarch64: Could not open '/lib/ld-linux-aarch64.so.1': No such file or directory
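Solution: Your binary is dynamically linked, but the ARM64 dynamic loader and libraries are not on the host. Either link statically (add -static when linking through gcc), or install the ARM64 libraries (libc6-arm64-cross) and point QEMU at them: qemu-aarch64 -L /usr/aarch64-linux-gnu ./your_program.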
Problem: aarch64-linux-gnu-ld: cannot find -lc
Solution: You're trying to link with libc. For a pure system-call program (like this one), you don't need it. Just use aarch64-linux-gnu-ld hello_arm64.o -o hello_arm64 (no -lc).
Problem: Segmentation fault immediately
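Solution: Double-check your section names (.text not text, .rodata not rodata) and verify the .global _start declaration. Also make sure the exit syscall is present: without it, execution runs past the last instruction into whatever bytes follow.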
Summary
A working ARM64 development environment lets you:
- Assemble with aarch64-linux-gnu-as (cross) or as (native)
- Link with aarch64-linux-gnu-ld or ld
- Execute with qemu-aarch64 (user mode) or natively on ARM64 hardware
- Debug with gdb-multiarch (or aarch64-linux-gnu-gdb, where your toolchain provides it) + QEMU GDB stub, or with gdb natively
The Hello World program demonstrates the fundamental ARM64 syscall mechanism: load syscall number into X8, arguments into X0-X5, call SVC #0, read result from X0. This pattern is the foundation for all Linux ARM64 assembly programming.