Case Study 18-2: ARM64 on Apple Silicon — Hello World on macOS
Objective
Write, assemble, link, and run an ARM64 assembly "Hello, World" on macOS Apple Silicon (M-series). Understand the differences from Linux ARM64: system call mechanism, Mach-O binary format, section names, and the use of Clang vs. GNU assembler.
Background: What Makes Apple Silicon Different
The M-series chips (M1 and its successors) are ARM64 processors. The hardware implements AArch64, and the instruction set is the same standard ARM64 you've been learning.
What differs is everything around the hardware:
1. Binary format: Mach-O (not ELF)
2. Section naming: __TEXT,__text not .text
3. System call convention: X16 for syscall number, SVC #0x80 (not X8 / SVC #0)
4. Syscall numbers: BSD-derived (not Linux generic)
5. Assembler: Apple's as (based on LLVM) or Clang directly
6. Linker: Apple's ld (dynamic executables must link libSystem, so pass -lSystem together with -syslibroot and the SDK path)
7. Dynamic linker: /usr/lib/dyld (not /lib/ld-linux-aarch64.so.1)
8. Entry point: _main when linking normally against libSystem, or a custom symbol such as _start (selected with -e) when bypassing the C runtime
The Hello World Program (macOS ARM64)
// hello_macos.s
// macOS ARM64 (AArch64) Hello World
//
// Build:
// as -arch arm64 -o hello_macos.o hello_macos.s
// ld -arch arm64 -platform_version macos 13.0 13.0 \
// -lSystem -syslibroot "$(xcrun --sdk macosx --show-sdk-path)" \
// -e _start -o hello_macos hello_macos.o
// OR with Clang (easier):
// clang -arch arm64 hello_macos.s -o hello_macos -nostdlib -lSystem -e _start
//
// Run: ./hello_macos
// ============================================================
// macOS uses Mach-O sections, not ELF sections
// __TEXT,__text = executable code (like .text in ELF)
// __TEXT,__const = read-only data (like .rodata in ELF)
// __DATA,__data = initialized data (like .data in ELF)
// ============================================================
.section __TEXT,__text
.global _start
.align 2 // 2^2 = 4-byte alignment; ARM64 instructions must be 4-byte aligned
_start:
// === macOS write system call ===
// macOS ARM64 syscall convention:
// X16 = syscall number (NOT X8 like Linux!)
// X0-X5 = arguments
// SVC #0x80 = invoke Unix system call
MOV X16, #4 // macOS BSD syscall: write = 4
// (NOT 64 like Linux ARM64!)
MOV X0, #1 // fd = stdout (same as Linux)
ADRP X1, msg@PAGE // X1 = 4KB page containing msg
ADD X1, X1, msg@PAGEOFF // X1 = address of message (ADRP + ADD works across sections)
MOV X2, #msg_len // X2 = length
SVC #0x80 // invoke macOS kernel (Unix call)
// On success X0 = bytes written; on error the carry flag is set
// and X0 holds the positive errno (Linux returns -errno instead)
// === macOS exit system call ===
MOV X16, #1 // macOS BSD syscall: exit = 1
// (NOT 93 like Linux ARM64!)
MOV X0, #0 // exit status = 0
SVC #0x80
// ============================================================
// Read-only constant data in __TEXT,__const
// ============================================================
.section __TEXT,__const
msg:
.ascii "Hello, Apple Silicon!\n"
msg_len = . - msg // = 22
Build Process: macOS
Method 1: Apple's Assembler + Linker
# Assemble (produces Mach-O object file)
as -arch arm64 -o hello_macos.o hello_macos.s
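# Link (-platform_version and -e are required; dynamic executables must also link libSystem)
ld -arch arm64 \
-platform_version macos 13.0 13.0 \
-lSystem -syslibroot "$(xcrun --sdk macosx --show-sdk-path)" \
-e _start \
-o hello_macos \
hello_macos.o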
# Run
./hello_macos
Hello, Apple Silicon!
Method 2: Clang (simpler)
# Clang assembles and links in one step
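# -nostdlib: don't link the default startup files and libraries
# -lSystem: still link libSystem, which the linker requires for dynamic executables
# -e _start: entry point is _start
clang -arch arm64 \
-nostdlib \
-lSystem \
-e _start \
hello_macos.s \
-o hello_macos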
# Note: a fully static build (adding -static) is not an option here.
# Apple does not support statically linked user executables on macOS,
# so the program has to stay dynamically linked and loaded by dyld.
Method 3: Using libSystem (recommended for production)
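Instead of raw syscalls, call into libSystem, Apple's umbrella system library that includes libc. Apple does not guarantee that raw syscall numbers stay stable between macOS releases, so libSystem is the supported interface: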
// hello_macos_libc.s — using printf from libSystem
.section __TEXT,__text
.global _main
.align 2
_main:
STP X29, X30, [SP, #-16]!
MOV X29, SP
ADRP X0, msg@PAGE // load high bits of msg address (page-relative)
ADD X0, X0, msg@PAGEOFF // add low bits (offset within page)
BL _printf // call printf (external C function)
MOV W0, #0 // return 0
LDP X29, X30, [SP], #16
RET
.section __TEXT,__const
.align 2
msg:
.asciz "Hello, Apple Silicon!\n"
# Link with libSystem (provides printf, malloc, etc.)
clang -arch arm64 hello_macos_libc.s -o hello_macos_libc
# (No -nostdlib — we want libSystem linked)
./hello_macos_libc
Hello, Apple Silicon!
⚠️ Common Mistake: On macOS, external function names from C libraries are prefixed with _ in assembly. printf in C is _printf in assembly, malloc is _malloc, and exit is _exit. This is a historical convention from early macOS (based on NeXTSTEP/BSD). Linux doesn't use this convention.
Examining the Mach-O Binary
# Show file type
file hello_macos
# hello_macos: Mach-O 64-bit executable arm64
# Show Mach-O headers
otool -h hello_macos
# Mach header
# magic cputype cpusubtype caps filetype ncmds sizeofcmds flags
# 0xFEEDFACF 16777228 0 0x00 2 16 688 0x00200085
# Show load commands (segments, section layout)
otool -l hello_macos | head -60
# Disassemble
otool -tv hello_macos
# OR:
objdump --disassemble hello_macos # LLVM objdump
# Show symbol table
nm hello_macos
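The numbers in the otool -h output can be decoded by hand. The following is a minimal Python sketch, not a replacement for otool: it rebuilds the 32-byte mach_header_64 from the sample values shown above and names the fields. The function name decode_mach_header and the small lookup tables are illustrative assumptions; the constant values are the ones defined in Apple's mach-o/loader.h and mach/machine.h, and only the flag bits that occur in the sample are listed.

```python
import struct

# Constants as defined in <mach-o/loader.h> and <mach/machine.h>.
MH_MAGIC_64 = 0xFEEDFACF
CPU_ARCH_ABI64 = 0x01000000
FILETYPES = {1: "MH_OBJECT", 2: "MH_EXECUTE", 6: "MH_DYLIB"}
FLAG_NAMES = {          # only the bits present in the sample output
    0x1: "MH_NOUNDEFS",
    0x4: "MH_DYLDLINK",
    0x80: "MH_TWOLEVEL",
    0x200000: "MH_PIE",
}


def decode_mach_header(raw):
    """Decode the little-endian mach_header_64 at the start of raw (hypothetical helper)."""
    (magic, cputype, cpusubtype, filetype,
     ncmds, sizeofcmds, flags, _reserved) = struct.unpack("<8I", raw[:32])
    if magic != MH_MAGIC_64:
        raise ValueError("not a 64-bit little-endian Mach-O file")
    return {
        "is_64bit_abi": bool(cputype & CPU_ARCH_ABI64),
        "cpu_family": cputype & ~CPU_ARCH_ABI64,  # 12 = CPU_TYPE_ARM
        "filetype": FILETYPES.get(filetype, hex(filetype)),
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "flags": [name for bit, name in FLAG_NAMES.items() if flags & bit],
    }


# Header rebuilt from the sample otool -h line (the last field is reserved).
sample = struct.pack("<8I", 0xFEEDFACF, 16777228, 0, 2, 16, 688, 0x00200085, 0)
info = decode_mach_header(sample)
print(info)
```

So cputype 16777228 is 0x0100000C (the 64-bit ABI bit plus ARM family 12), and flags 0x00200085 marks a position-independent, dynamically linked executable with two-level namespaces. On a Mac, the same function can be fed the first 32 bytes of a thin binary, for example open("hello_macos", "rb").read(32).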
Mach-O vs. ELF: Structure Comparison
ELF (Linux)                          Mach-O (macOS)
─────────────────────────────        ──────────────────────────────────
ELF Header                           Mach-O Header (struct mach_header_64)
  magic: 0x7F454C46 ("\x7fELF")        magic: 0xFEEDFACF (MH_MAGIC_64: 64-bit Mach-O)
  e_ident: ABI info                    cputype: 16777228 (ARM64)
  e_type: ET_EXEC                      filetype: MH_EXECUTE
  e_entry: _start address              ncmds: number of load commands
Program Header Table (segments)      Load Commands (segments + more)
  PT_LOAD (text)                       LC_SEGMENT_64 (__TEXT)
  PT_LOAD (data)                         section __text
  PT_INTERP                              section __const
  PT_DYNAMIC                           LC_SEGMENT_64 (__DATA)
                                       LC_LOAD_DYLINKER (/usr/lib/dyld)
Section Header Table                 (no separate SHT equivalent:
  .text, .data, .bss, .symtab         sections are embedded in LC_SEGMENT_64)
  .rel.*, .debug_*
The key structural difference: ELF separates segments (runtime) from sections (linking), each with their own header tables. Mach-O uses "Load Commands" which describe both at once.
macOS ARM64 System Call Numbers
macOS ARM64 BSD Syscall Numbers (selected)
┌────────┬──────────────────────────────────────────────────────────┐
│ Number │ Name and signature                                       │
├────────┼──────────────────────────────────────────────────────────┤
│ 1      │ exit(int status)                                         │
│ 3      │ read(int fd, void *buf, size_t count)                    │
│ 4      │ write(int fd, const void *buf, size_t count)             │
│ 5      │ open(const char *path, int flags, int mode)              │
│ 6      │ close(int fd)                                            │
│ 20     │ getpid() → pid_t                                         │
│ 133    │ sendto(int socket, ...)                                  │
│ 197    │ mmap(void *addr, size_t len, int prot, int flags, ...)   │
│ 199    │ lseek(int fd, off_t offset, int whence)                  │
│ 463    │ openat(int fd, const char *path, int flags, int mode)    │
└────────┴──────────────────────────────────────────────────────────┘
Note: macOS uses classic BSD numbers (open=5, write=4, exit=1)
vs. Linux ARM64's generic table (openat=56, write=64, exit=93)
⚠️ Common Mistake: macOS has both open (syscall 5) and openat (syscall 463). Linux ARM64 only has openat in its generic syscall table (syscall 56); there's no open. If you're porting code between platforms, check the syscall table carefully.
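When porting, it can help to keep the two numbering schemes side by side rather than patching constants one at a time. The sketch below is a small Python lookup, assuming only the handful of calls discussed in this section; the dictionary names and the linux_number_for helper are hypothetical, and the Linux values are the generic ARM64 numbers.

```python
# Syscall numbers for a few calls on each platform (a selected subset only).
MACOS_BSD = {"exit": 1, "read": 3, "write": 4, "open": 5, "close": 6, "getpid": 20}
LINUX_ARM64 = {"exit": 93, "read": 63, "write": 64, "openat": 56,
               "close": 57, "getpid": 172}


def linux_number_for(macos_number):
    """Map a macOS BSD syscall number to the Linux ARM64 number (hypothetical helper)."""
    name = next((n for n, num in MACOS_BSD.items() if num == macos_number), None)
    if name is None:
        raise KeyError(f"macOS syscall {macos_number} is not in this small table")
    if name not in LINUX_ARM64:
        # open is the classic case: the call itself must be rewritten as openat.
        raise ValueError(f"{name} has no Linux ARM64 equivalent; rewrite the call")
    return LINUX_ARM64[name]


print(linux_number_for(4))   # write
print(linux_number_for(1))   # exit
```

Renumbering alone is not a complete port: the register that carries the number (X16 vs. X8) and the SVC immediate differ too, and a call such as open needs its arguments rearranged for openat.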
ADRP + ADD: macOS Address Loading
On macOS ARM64, you'll see ADRP + ADD (or ADRP + LDR) for loading addresses, not just ADR. Why?
ADR has a ±1MB range. For larger programs or shared libraries where the code and data might be more than 1MB apart, this isn't enough. Apple's toolchain uses ADRP (PC-relative to page) + ADD (offset within page), which together have a range of ±4GB.
// Load address of msg using ADRP + ADD
ADRP X0, msg@PAGE // X0 = page containing msg (21-bit page offset from PC)
ADD X0, X0, msg@PAGEOFF // X0 += offset of msg within its page
// Together: X0 = exact address of msg, range ±4GB from PC
@PAGE and @PAGEOFF are relocation specifiers in Apple's assembler syntax rather than macros. The GNU assembler expresses the same pair as ADRP X0, msg followed by ADD X0, X0, :lo12:msg.
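The page arithmetic behind ADRP + ADD can be checked with a few lines of Python. This is a sketch of the address math only, under the assumption of made-up addresses for the instruction and for msg; split_page and adrp_add are hypothetical helper names. Note that ADRP always works in 4KB units, whatever page size the operating system uses.

```python
PAGE = 1 << 12  # ADRP granularity: 4 KB, independent of the OS page size


def split_page(pc, target):
    """Return (page_delta, pageoff): the values encoded for sym@PAGE and sym@PAGEOFF."""
    page_delta = (target >> 12) - (pc >> 12)      # signed 21-bit immediate of ADRP
    if not -(1 << 20) <= page_delta < (1 << 20):
        raise ValueError("target is outside ADRP's +/-4 GB range")
    pageoff = target & (PAGE - 1)                 # 12-bit immediate of ADD
    return page_delta, pageoff


def adrp_add(pc, page_delta, pageoff):
    """Model what the CPU computes for ADRP followed by ADD."""
    x0 = (pc & ~(PAGE - 1)) + page_delta * PAGE   # ADRP X0, sym@PAGE
    return x0 + pageoff                           # ADD  X0, X0, sym@PAGEOFF


pc = 0x100003F98        # hypothetical address of the ADRP instruction
target = 0x100008010    # hypothetical address of msg
delta, off = split_page(pc, target)
print(delta, hex(off))  # 5 0x10
assert adrp_add(pc, delta, off) == target
```

The range follows directly from the encoding: a signed 21-bit page count covers 2^20 pages in each direction, and 2^20 pages of 4KB is 4GB.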
For code that calls external functions (like printf):
ADRP X8, _printf@GOTPAGE // X8 = page of printf's GOT entry
LDR X8, [X8, _printf@GOTPAGEOFF] // X8 = printf's actual address (via GOT)
BLR X8 // indirect call
// OR simply: BL _printf (linker creates the stub automatically)
Debugging on macOS: LLDB
macOS doesn't ship GDB. Apple's developer tools moved to LLDB, the LLVM project's debugger, and that is what the Xcode command-line tools install. Use LLDB:
# Basic LLDB session
lldb ./hello_macos
(lldb) disassemble --name _start # or: di -n _start
(lldb) breakpoint set --name _start # or: b _start
(lldb) run # or: r
(lldb) register read # or: reg read
(lldb) stepi # or: si
(lldb) register read x0 x1 x2 x16 # read specific registers
(lldb) memory read $x1 # read memory at address in X1
(lldb) continue # or: c
LLDB uses the same concepts as GDB but different commands. The register read output shows all ARM64 registers with their values.
Universal Binaries and lipo
macOS supports "Universal Binaries" (formerly "fat binaries") that contain code for multiple architectures in one file. During the x86→ARM64 transition, many apps shipped as universal binaries containing both x86-64 and ARM64 code:
# Check if a binary is universal
file /usr/bin/python3
# /usr/bin/python3: Mach-O universal binary with 2 architectures:
# [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
# Extract just the ARM64 slice as a thin (single-architecture) file
lipo -thin arm64 /usr/bin/python3 -output python3_arm64_only
# Combine two binaries into a universal binary
lipo -create python3_x86 python3_arm64 -output python3_universal
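When you run a universal binary on Apple Silicon, macOS picks the native ARM64 slice if one is available. Rosetta 2 comes into play only when the program has to run as x86-64, for example because the binary has no ARM64 slice; it then translates the x86-64 code transparently.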
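The "multiple architectures in one file" layout is simple enough to inspect by hand: a universal binary starts with a big-endian fat_header (magic 0xCAFEBABE and a slice count), followed by one 20-byte fat_arch record per slice. The Python sketch below parses that table from a synthetic header; list_slices is a hypothetical helper, the offsets and sizes are invented for illustration, and the 64-bit fat variant (magic 0xCAFEBABF) is deliberately not handled.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # fat headers are stored big-endian on disk
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def list_slices(data):
    """Return (arch, offset, size) for each slice of a universal binary header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal (fat) binary")
    slices = []
    for i in range(nfat_arch):
        # struct fat_arch: cputype, cpusubtype, offset, size, align (5 x uint32)
        cputype, _subtype, offset, size, _align = struct.unpack_from(
            ">5I", data, 8 + 20 * i)
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices


# Synthetic two-slice header; offsets and sizes are made up for the example.
header = struct.pack(">II", FAT_MAGIC, 2)
header += struct.pack(">5I", 0x01000007, 3, 16384, 70000, 14)
header += struct.pack(">5I", 0x0100000C, 0, 98304, 68000, 14)
print(list_slices(header))
```

Each slice found this way is an ordinary thin Mach-O file starting at the listed offset, which is what lipo copies out when asked for a single architecture.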
Summary: Linux vs. macOS ARM64 Assembly
Key Differences: Linux ARM64 vs. macOS ARM64
══════════════════════════════════════════════════════════════════════════════
Feature            Linux ARM64                   macOS ARM64
──────────────────────────────────────────────────────────────────────────────
Binary format      ELF                           Mach-O
Section syntax     .section .text                .section __TEXT,__text
Syscall register   X8                            X16
Syscall invoke     SVC #0                        SVC #0x80
Syscall numbers    Generic (write=64, exit=93)   BSD (write=4, exit=1)
C symbol prefix    No underscore (printf)        Underscore prefix (_printf)
Entry point        _start                        _start (via -e) or _main
Address load       ADR, or ADRP + ADD :lo12:     ADRP + ADD @PAGE/@PAGEOFF
Debugger           GDB                           LLDB
Assembler          aarch64-linux-gnu-as          as -arch arm64 (LLVM)
Object inspection  objdump, readelf              otool, nm, objdump
Page size          4KB (typical)                 16KB (M-series!)
══════════════════════════════════════════════════════════════════════════════
The 16KB page size is a notable difference: macOS on Apple Silicon uses 16KB memory pages rather than the 4KB that most Linux systems default to. This affects mmap granularity, segment alignment, and memory allocation behavior in subtle ways. For most assembly programs it doesn't matter, but system-level code that assumes 4KB pages can break.
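A quick way to see where a 4KB assumption goes wrong is to round the same length and test the same address against both page sizes. The Python sketch below does only that arithmetic; round_up_to_page and is_page_aligned are hypothetical helper names, and the page sizes are passed in explicitly so the example behaves the same on any machine.

```python
def round_up_to_page(length, page_size):
    """Round length up to a multiple of page_size (page_size must be a power of two)."""
    return (length + page_size - 1) & ~(page_size - 1)


def is_page_aligned(addr, page_size):
    """True if addr falls exactly on a page boundary."""
    return addr & (page_size - 1) == 0


for page_size in (4096, 16384):
    print(page_size,
          round_up_to_page(5000, page_size),
          is_page_aligned(0x1000, page_size))
```

A 5000-byte mapping occupies 8192 bytes with 4KB pages but 16384 bytes with 16KB pages, and address 0x1000 is a valid page boundary only in the 4KB case. Code that needs the real value at run time should query it (in Python, mmap.PAGESIZE) instead of hard-coding 4096.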
The instruction set itself — the actual ARM64 ISA — is identical on both platforms. Everything in Chapters 16, 17, and the non-macOS parts of 18 applies directly to Apple Silicon once you account for the syscall convention differences.