Case Study 18-2: ARM64 on Apple Silicon — Hello World on macOS

Objective

Write, assemble, link, and run an ARM64 assembly "Hello, World" on macOS Apple Silicon (M-series). Understand the differences from Linux ARM64: system call mechanism, Mach-O binary format, section names, and the use of Clang vs. GNU assembler.


Background: What Makes Apple Silicon Different

The M-series chips (M1 and its successors) are ARM64 processors. The hardware implements AArch64, so the instruction set is the same standard ARM64 you've been learning.

What differs is everything around the hardware:

1. Binary format: Mach-O (not ELF)
2. Section naming: __TEXT,__text (not .text)
3. System call convention: X16 holds the syscall number and SVC #0x80 traps to the kernel (not X8 / SVC #0)
4. Syscall numbers: BSD-derived (not the Linux generic table)
5. Assembler: Apple's as (an LLVM-based driver) or Clang directly
6. Linker: Apple's ld (needs -lSystem and -syslibroot to produce a runnable executable)
7. Dynamic linker: /usr/lib/dyld (not /lib/ld-linux-aarch64.so.1)
8. Entry point: _start for raw-syscall programs (selected with -e), or _main when linking against libSystem


The Hello World Program (macOS ARM64)

// hello_macos.s
// macOS ARM64 (AArch64) Hello World
//
// Build:
//   as -arch arm64 -o hello_macos.o hello_macos.s
//   ld -arch arm64 -platform_version macos 13.0 13.0 \
//      -lSystem -syslibroot "$(xcrun --sdk macosx --show-sdk-path)" \
//      -e _start -o hello_macos hello_macos.o
//   OR with Clang (easier; links libSystem by default):
//   clang -arch arm64 -e _start hello_macos.s -o hello_macos
//
// Run: ./hello_macos

// ============================================================
// macOS uses Mach-O sections, not ELF sections
// __TEXT,__text  = executable code   (like .text in ELF)
// __TEXT,__const = read-only data    (like .rodata in ELF)
// __DATA,__data  = initialized data  (like .data in ELF)
// ============================================================

.section __TEXT,__text
.global _start
.align 2                         // ARM64 instructions must be 4-byte aligned

_start:
    // === macOS write system call ===
    // macOS ARM64 syscall convention:
    //   X16 = syscall number  (NOT X8 like Linux!)
    //   X0-X5 = arguments
    //   SVC #0x80 = invoke Unix system call

    MOV X16, #4                  // macOS BSD syscall: write = 4
                                 // (NOT 64 like Linux ARM64!)
    MOV X0, #1                   // fd = stdout (same as Linux)
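    ADRP X1, msg@PAGE            // X1 = page containing msg
    ADD  X1, X1, msg@PAGEOFF     // X1 = address of msg (msg is in another
                                 // section, so ADR alone cannot be relied on)
    MOV X2, #22                  // X2 = length in bytes (the value of msg_len,
                                 // written as a literal so it is a known constant)
    SVC #0x80                    // invoke macOS kernel (Unix call)
    // On success: carry flag clear, X0 = bytes written
    // On error:   carry flag set,   X0 = errno (positive, unlike Linux's -errno)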

    // === macOS exit system call ===
    MOV X16, #1                  // macOS BSD syscall: exit = 1
                                 // (NOT 93 like Linux ARM64!)
    MOV X0, #0                   // exit status = 0
    SVC #0x80

// ============================================================
// Read-only constant data in __TEXT,__const
// ============================================================
.section __TEXT,__const
msg:
    .ascii "Hello, Apple Silicon!\n"
msg_len = . - msg                // = 22

Build Process: macOS

Method 1: Apple's Assembler + Linker

# Assemble (produces Mach-O object file)
as -arch arm64 -o hello_macos.o hello_macos.s

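# Link. Besides -platform_version and -e, ld needs libSystem and the SDK path:
# it produces a dynamic executable, and those must link against libSystem
ld -arch arm64 \
   -platform_version macos 13.0 13.0 \
   -lSystem -syslibroot "$(xcrun --sdk macosx --show-sdk-path)" \
   -e _start \
   -o hello_macos \
   hello_macos.o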

# Run
./hello_macos
Hello, Apple Silicon!

Method 2: Clang (simpler)

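# Clang assembles and links in one step
# -e _start: entry point is _start
# libSystem is linked by default; the program never calls into it, but the
# macOS linker expects every dynamic executable to link against it

clang -arch arm64 \
      -e _start \
      hello_macos.s \
      -o hello_macos

# Fully static executables (-static, no dyld) are not a supported
# configuration for user programs on Apple Silicon macOS, so keep the
# default dynamic link even when the program makes only raw syscalls.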

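Instead of raw syscalls, you can call into libSystem, the system library that contains Apple's libc. Apple treats libSystem, not the raw syscall numbers, as the stable interface, so this is the more robust approach: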

// hello_macos_libc.s — using printf from libSystem
.section __TEXT,__text
.global _main
.align 2

_main:
    STP X29, X30, [SP, #-16]!
    MOV X29, SP

    ADRP X0, msg@PAGE               // load high bits of msg address (page-relative)
    ADD  X0, X0, msg@PAGEOFF         // add low bits (offset within page)
    BL   _printf                    // call printf (external C function)

    MOV  W0, #0                     // return 0
    LDP  X29, X30, [SP], #16
    RET

.section __TEXT,__const
.align 2
msg:
    .asciz "Hello, Apple Silicon!\n"

# Link with libSystem (provides printf, malloc, etc.)
clang -arch arm64 hello_macos_libc.s -o hello_macos_libc
# (No -nostdlib — we want libSystem linked)

./hello_macos_libc
Hello, Apple Silicon!

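⚠️ Common Mistake: On macOS, every C symbol gets a leading underscore at the assembly level. printf in C is _printf in assembly, malloc is _malloc, exit is _exit (not to be confused with the C function _exit, which becomes __exit), and main is _main, which is why the program above defines _main. Forgetting the underscore produces an undefined-symbol error at link time. The convention is inherited from NeXTSTEP and older Unix toolchains; Linux doesn't use it.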


Examining the Mach-O Binary

# Show file type
file hello_macos
# hello_macos: Mach-O 64-bit executable arm64

# Show Mach-O headers
otool -h hello_macos
# Mach header
# magic      cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
# 0xFEEDFACF 16777228          0  0x00           2    16        688 0x00200085

# Show load commands (segments, section layout)
otool -l hello_macos | head -60

# Disassemble
otool -tv hello_macos
# OR:
objdump --disassemble hello_macos   # LLVM objdump

# Show symbol table
nm hello_macos

Mach-O vs. ELF: Structure Comparison

ELF (Linux)                      Mach-O (macOS)
─────────────────────────────    ──────────────────────────────────
ELF Header                       Mach-O Header (struct mach_header_64)
  magic: 0x7F454C46 ("\x7FELF")    magic: 0xFEEDFACF (MH_MAGIC_64, 64-bit)
  e_ident: ABI info                cputype: 16777228 (ARM64)
  e_type: ET_EXEC                  filetype: MH_EXECUTE
  e_entry: _start address          ncmds: number of load commands

Program Header Table (segments)  Load Commands (segments + more)
  PT_LOAD (text)                   LC_SEGMENT_64 (__TEXT)
  PT_LOAD (data)                     section __text
  PT_INTERP                          section __const
  PT_DYNAMIC                       LC_SEGMENT_64 (__DATA)
                                   LC_LOAD_DYLINKER (/usr/lib/dyld)
                                   LC_MAIN (entry point offset)
Section Header Table             (no separate SHT equivalent —
  .text, .data, .bss, .symtab       sections are embedded in LC_SEGMENT_64)
  .rel.*, .debug_*

The key structural difference: ELF separates segments (runtime) from sections (linking), each with their own header tables. Mach-O uses "Load Commands" which describe both at once.
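
To make the header fields concrete, here is a minimal sketch that decodes a mach_header_64 and shows where the cputype value 16777228 comes from (0x01000000, the 64-bit ABI bit, OR'ed with 12, the ARM CPU type). Python is used only as a scratchpad; the function name parse_mach_header_64 and the small flag table are illustrative choices, and the sample bytes are synthesized from the otool -h values shown earlier rather than read from a real file.

```python
import struct

# Constants as defined in <mach-o/loader.h> and <mach/machine.h>
MH_MAGIC_64 = 0xFEEDFACF
CPU_ARCH_ABI64 = 0x01000000   # "64-bit ABI" bit
CPU_TYPE_ARM = 12
MH_EXECUTE = 2
HEADER_FLAGS = {              # only the flags that appear in this example
    0x00000001: "MH_NOUNDEFS",
    0x00000004: "MH_DYLDLINK",
    0x00000080: "MH_TWOLEVEL",
    0x00200000: "MH_PIE",
}

def parse_mach_header_64(data):
    """Decode the 32-byte mach_header_64 at the start of a thin Mach-O file."""
    (magic, cputype, cpusubtype, filetype,
     ncmds, sizeofcmds, flags, _reserved) = struct.unpack_from("<IiiIIIII", data)
    if magic != MH_MAGIC_64:
        raise ValueError(f"not a little-endian 64-bit Mach-O file: {magic:#x}")
    return {
        "cputype": cputype,
        "is_arm64": cputype == (CPU_ARCH_ABI64 | CPU_TYPE_ARM),
        "filetype": filetype,
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        "flags": [name for bit, name in HEADER_FLAGS.items() if flags & bit],
    }

# Synthetic header built from the otool -h output shown earlier
sample = struct.pack("<IiiIIIII",
                     0xFEEDFACF, 16777228, 0, 2, 16, 688, 0x00200085, 0)
header = parse_mach_header_64(sample)
print(header)
```

The load commands start immediately after these 32 bytes; ncmds and sizeofcmds tell a loader how many there are and how far to read.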


macOS ARM64 System Call Numbers

macOS ARM64 BSD Syscall Numbers (selected)
┌────────┬──────────────────────────────────────────────────────────┐
│ Number │ Name and signature                                       │
├────────┼──────────────────────────────────────────────────────────┤
│ 1      │ exit(int status)                                         │
│ 3      │ read(int fd, void *buf, size_t count)                    │
│ 4      │ write(int fd, const void *buf, size_t count)             │
│ 5      │ open(const char *path, int flags, int mode)              │
│ 6      │ close(int fd)                                            │
│ 20     │ getpid() → pid_t                                         │
│ 133    │ sendto(int socket, ...)                                  │
│ 197    │ mmap(void *addr, size_t len, int prot, int flags, ...)   │
│ 199    │ lseek(int fd, off_t offset, int whence)                  │
│ 463    │ openat(int fd, const char *path, int flags, int mode)    │
└────────┴──────────────────────────────────────────────────────────┘
Note: macOS uses classic BSD numbers (open=5, write=4, exit=1)
vs. Linux ARM64's generic table (openat=56, write=64, exit=93)

⚠️ Common Mistake: macOS has both open (syscall 5) and openat (syscall 463). Linux ARM64 only has openat in its generic syscall table (syscall 56); there is no open. If you're porting code between platforms, check the syscall table carefully.
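
When porting, it can help to keep the two numbering schemes side by side rather than scattering magic numbers through the source. The sketch below is one hypothetical way to do that; the SYSCALLS table and the syscall_number helper are illustrative names, not part of any toolchain, and the Linux numbers for read, close, and getpid come from the Linux generic syscall table rather than from this chapter.

```python
# name -> (macOS BSD number, Linux ARM64 generic number or None if absent)
SYSCALLS = {
    "exit":   (1,  93),
    "read":   (3,  63),
    "write":  (4,  64),
    "open":   (5,  None),   # Linux ARM64 has no plain open; use openat (56)
    "close":  (6,  57),
    "getpid": (20, 172),
}

def syscall_number(name, platform):
    """Return the number to place in X16 (macOS) or X8 (Linux)."""
    if platform not in ("macos", "linux"):
        raise ValueError(f"unknown platform: {platform}")
    macos, linux = SYSCALLS[name]
    number = macos if platform == "macos" else linux
    if number is None:
        raise LookupError(f"{name} has no direct equivalent on {platform}")
    return number

print(syscall_number("write", "macos"), syscall_number("write", "linux"))
```

Raising an error for a missing call, instead of silently substituting a neighbor, forces the port to deal with argument differences such as the extra directory descriptor that openat takes.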


ADRP + ADD: macOS Address Loading

On macOS ARM64, you'll see ADRP + ADD (or ADRP + LDR) for loading addresses, not just ADR. Why?

ADR has a ±1MB range. For larger programs or shared libraries where the code and data might be more than 1MB apart, this isn't enough. Apple's toolchain uses ADRP (PC-relative to page) + ADD (offset within page), which together have a range of ±4GB.

// Load address of msg using ADRP + ADD
ADRP X0, msg@PAGE       // X0 = page containing msg (21-bit page offset from PC)
ADD  X0, X0, msg@PAGEOFF // X0 += offset of msg within its page
// Together: X0 = exact address of msg, range ±4GB from PC

@PAGE and @PAGEOFF are relocation specifiers in Apple's Mach-O assembler syntax, not macros. The GNU assembler expresses the same pair as ADRP X0, msg followed by ADD X0, X0, :lo12:msg.
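
The arithmetic behind the pair can be modeled in a few lines. This is a sketch with hypothetical addresses, using Python purely as a calculator; adrp_add is an illustrative name. It assumes the architectural behavior that ADRP always works in 4KB units (regardless of the operating system's page size) with a signed 21-bit page count, which is where the ±4GB range comes from: 2^20 pages of 4KB in each direction.

```python
PAGE_SHIFT = 12  # ADRP counts in 4 KB units, independent of the OS page size

def adrp_add(pc, target):
    """Model ADRP Xd, sym@PAGE followed by ADD Xd, Xd, sym@PAGEOFF."""
    page_delta = (target >> PAGE_SHIFT) - (pc >> PAGE_SHIFT)  # ADRP immediate
    if not -(1 << 20) <= page_delta < (1 << 20):              # signed 21 bits
        raise ValueError("target is outside the ADRP range (about +/- 4 GB)")
    page_off = target & 0xFFF                                  # ADD immediate
    xd = ((pc >> PAGE_SHIFT) + page_delta) << PAGE_SHIFT       # after ADRP
    xd += page_off                                             # after ADD
    return page_delta, page_off, xd

# Hypothetical layout: the ADRP sits at 0x100003F80, msg at 0x100004010
delta, off, address = adrp_add(0x100003F80, 0x100004010)
print(delta, hex(off), hex(address))   # 1 0x10 0x100004010
```

Note that ADRP discards the low 12 bits of the PC, so the result does not depend on where inside its page the instruction sits.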

For code that calls external functions (like printf):

ADRP X8, _printf@GOTPAGE    // X8 = page of printf's GOT entry
LDR  X8, [X8, _printf@GOTPAGEOFF]  // X8 = printf's actual address (via GOT)
BLR  X8                      // indirect call
// OR simply: BL _printf (linker creates the stub automatically)

Debugging on macOS: LLDB

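macOS doesn't ship GDB: Apple replaced it with LLDB when Xcode moved to the LLVM toolchain, and a GDB you build yourself must be code-signed before macOS lets it control other processes. Use LLDB: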

# Basic LLDB session
lldb ./hello_macos

(lldb) disassemble --name _start    # or: di -n _start
(lldb) breakpoint set --name _start  # or: b _start
(lldb) run                          # or: r
(lldb) register read                 # or: reg read
(lldb) stepi                         # or: si
(lldb) register read x0 x1 x2 x16   # read specific registers
(lldb) memory read $x1               # read memory at address in X1
(lldb) continue                      # or: c

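LLDB uses the same concepts as GDB but different commands. With no arguments, register read prints the general-purpose registers (X0 through X28, FP, LR, SP, PC, CPSR); add --all to include the floating-point and SIMD registers.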


Universal Binaries and lipo

macOS supports "Universal Binaries" (formerly "fat binaries") that contain code for multiple architectures in one file. During the x86→ARM64 transition, many apps shipped as universal binaries containing both x86-64 and ARM64 code:

# Check if a binary is universal
file /usr/bin/python3
# /usr/bin/python3: Mach-O universal binary with 2 architectures:
# [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]

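# Extract just the ARM64 slice as a thin (single-architecture) file
# (-extract would keep the universal wrapper; Apple's own system binaries
#  usually name this slice arm64e, so use whatever name `file` reports)
lipo -thin arm64 /usr/bin/python3 -o python3_arm64_only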

# Combine two binaries into a universal binary
lipo -create python3_x86 python3_arm64 -o python3_universal

When you run a universal binary on Apple Silicon, macOS picks the native ARM64 slice if one is present. Rosetta 2 only comes into play for binaries that have no ARM64 slice, translating their x86-64 code transparently.
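
A universal file is a small big-endian table of contents followed by ordinary thin Mach-O images. The sketch below walks that table, which is roughly what the loader does before choosing a slice. It assumes the classic 32-bit fat_header/fat_arch layout from <mach-o/fat.h>; list_fat_archs is an illustrative name, and the sample bytes (offsets, sizes) are made up for the demonstration rather than taken from a real binary.

```python
import struct

FAT_MAGIC = 0xCAFEBABE                      # stored big-endian on disk
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}

def list_fat_archs(data):
    """Return (arch name, file offset, size) for each slice in a fat header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal (fat) binary")
    slices = []
    for i in range(nfat_arch):
        # struct fat_arch: cputype, cpusubtype, offset, size, align (20 bytes)
        cputype, _cpusubtype, offset, size, _align = struct.unpack_from(
            ">iiIII", data, 8 + 20 * i)
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices

# Synthetic two-slice header; each slice starts on a 2^14-byte boundary
sample = struct.pack(">II", FAT_MAGIC, 2)
sample += struct.pack(">iiIII", 0x01000007, 3, 16384, 70000, 14)
sample += struct.pack(">iiIII", 0x0100000C, 0, 98304, 68000, 14)
print(list_fat_archs(sample))
```

This simplified table maps names by cputype only; telling arm64 from arm64e would also require looking at cpusubtype.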


Summary: Linux vs. macOS ARM64 Assembly

Key Differences: Linux ARM64 vs. macOS ARM64
══════════════════════════════════════════════════════════════════════════
Feature            Linux ARM64                macOS ARM64
─────────────────────────────────────────────────────────────────────────
Binary format      ELF                        Mach-O
Section syntax     .section .text             .section __TEXT,__text
Syscall register   X8                         X16
Syscall invoke     SVC #0                     SVC #0x80
Syscall numbers    Generic (write=64,exit=93) BSD (write=4, exit=1)
C symbol prefix    No underscore (printf)     Underscore prefix (_printf)
Entry point        _start                     _start or _main
Address load       ADR (short range)          ADRP + ADD (large range)
Debugger           GDB                        LLDB
Assembler          aarch64-linux-gnu-as       as -arch arm64 (LLVM)
Object inspection  objdump, readelf           otool, nm, objdump
Page size          4KB (typical)              16KB (M-series!)
══════════════════════════════════════════════════════════════════════════

The 16KB page size on M-series chips is a notable difference: macOS on Apple Silicon uses 16KB memory pages, whereas most Linux ARM64 systems are configured for 4KB (the architecture also allows 16KB and 64KB). This affects mmap alignment and granularity, memory-protection boundaries, and allocator behavior; it does not change stack alignment, which stays at 16 bytes. For most assembly programs it doesn't matter, but system-level code that hard-codes 4KB pages can break.
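
A small sketch of the usual fix: query the page size instead of assuming it, then round mapping lengths up to a whole number of pages. round_up_to_page is an illustrative helper name; the two page sizes in the loop are passed explicitly so the comparison works on any machine.

```python
import mmap

def round_up_to_page(length, page_size=mmap.PAGESIZE):
    """Round length up to the next multiple of page_size (a power of two)."""
    return (length + page_size - 1) & ~(page_size - 1)

# The same 5000-byte request occupies different amounts of memory
for page_size in (4096, 16384):
    print(page_size, round_up_to_page(5000, page_size))
# 4096 8192
# 16384 16384
```

In assembly the same idea applies: obtain the page size at run time (for example through getpagesize in libSystem) rather than baking 4096 into a mask.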

The instruction set itself — the actual ARM64 ISA — is identical on both platforms. Everything in Chapters 16, 17, and the non-macOS parts of 18 applies directly to Apple Silicon once you account for the syscall convention differences.