Part III: ARM64 Assembly

The Architecture That Ate the World

For thirty years, x86 was the only architecture that mattered for general-purpose computing. Intel and AMD dominated desktops, laptops, and servers. ARM was the embedded chip in your calculator, your MP3 player, the thing you ignored in systems programming class.

Then the iPhone happened.

Then Android. Then the Raspberry Pi. Then AWS Graviton. Then Apple Silicon. Then every hyperscaler decided that ARM64 was cheaper per watt than anything Intel sold.

As of 2026, the majority of computing devices on the planet run ARM64. Your phone is ARM64. The Mac sitting on your desk might be ARM64 (and if it is, it's probably faster than the x86 machine next to it). AWS Graviton4 instances run ARM64 and offer better price/performance than comparable x86 instances for most workloads. Apple sold more ARM64 chips in 2023 than Intel sold x86 chips.

If you only know x86-64, you know half the story.

RISC vs. CISC: Philosophy as Architecture

The deepest difference between x86-64 and ARM64 is philosophical, not technical.

x86 (CISC — Complex Instruction Set Computer) was designed when memory was expensive and programmers wrote assembly by hand. The philosophy: pack as much work as possible into each instruction. REP MOVSB copies an entire buffer. ENTER sets up a stack frame. IMUL rbx, [rsp+8], 42 multiplies a memory operand by an immediate and stores in a register — three operations, one instruction.

ARM (RISC — Reduced Instruction Set Computer) was designed with a different philosophy: simple, regular instructions that a compiler can optimize and a chip designer can implement efficiently. In AArch64, the 64-bit instruction set this part covers, every instruction is 4 bytes. Ordinary data-processing instructions never touch memory: you load your data first, operate on registers, and store the result back. (The atomic read-modify-write instructions are the one deliberate exception.)
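To make the load/store discipline concrete, here is a small C sketch of "multiply a value in memory by 42". The function name scale_in_place is invented for illustration, and the assembly in the comments is a plausible hand translation rather than actual compiler output; a real compiler may pick different registers or instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply the 64-bit value at *p by 42 and write the result back.
 *
 * x86-64 can fold the memory read into the multiply:
 *     imul rax, [rdi], 42      ; load + multiply in one instruction
 *     mov  [rdi], rax          ; store
 *
 * AArch64 keeps memory access and arithmetic separate, and MUL has
 * no immediate form, so the constant needs a register as well:
 *     ldr  x1, [x0]            ; load
 *     mov  x2, #42             ; materialize the constant
 *     mul  x1, x1, x2          ; register-only arithmetic
 *     str  x1, [x0]            ; store
 */
void scale_in_place(int64_t *p)
{
    assert(p != NULL);
    int64_t value = *p;   /* load                        */
    value = value * 42;   /* operate on a register value */
    *p = value;           /* store                       */
}
```

The C body is written as three separate steps on purpose, so that each line corresponds to one phase of the AArch64 sequence in the comment.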

RISC is not simpler to program. It's differently complex. You'll need more instructions to accomplish the same task. What you gain is predictability: every instruction takes the same number of bytes, the decoder can process instructions in parallel, and the compiler has a clean target.
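The predictability of a fixed-width encoding can be sketched in C. The example below is a toy decoder, not a description of how hardware decoders are built. The names sample_program, instruction_offset, and decode_movz64 are made up for this sketch, and it recognizes only one instruction form (64-bit MOVZ).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three AArch64 instructions as 32-bit words. Together they form the
 * Linux "exit(42)" sequence:
 *   mov x0, #42   -> 0xD2800540   (MOVZ, exit status)
 *   mov x8, #93   -> 0xD2800BA8   (MOVZ, syscall number for exit)
 *   svc #0        -> 0xD4000001
 */
const uint32_t sample_program[3] = { 0xD2800540u, 0xD2800BA8u, 0xD4000001u };

/* Fixed width: instruction `index` always starts at byte 4 * index,
 * so a decoder never has to scan earlier instructions to find it. */
size_t instruction_offset(size_t index)
{
    return index * sizeof(uint32_t);
}

/* Decode a 64-bit MOVZ. The fields sit at the same bit positions in
 * every such word:
 *   bits 31..23  fixed pattern 1 10 100101
 *   bits 22..21  hw     (left shift of the immediate, in units of 16)
 *   bits 20..5   imm16
 *   bits  4..0   Rd     (destination register number)
 * Returns 1 and fills *rd and *value on a match, 0 otherwise. */
int decode_movz64(uint32_t word, unsigned *rd, uint64_t *value)
{
    assert(rd != NULL && value != NULL);
    if ((word & 0xFF800000u) != 0xD2800000u)
        return 0;                       /* not a 64-bit MOVZ */
    unsigned hw = (word >> 21) & 0x3u;
    uint64_t imm16 = (word >> 5) & 0xFFFFu;
    *rd = word & 0x1Fu;
    *value = imm16 << (16 * hw);
    return 1;
}
```

Decoding sample_program[0] gives register 0 and the value 42; the third word fails the pattern check because it is an SVC, not a MOVZ. An x86-64 decoder has no equivalent of instruction_offset: it has to work out the length of each instruction before it knows where the next one begins.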

The deeper secret: modern x86 processors internally translate CISC instructions into RISC-like micro-operations before executing them. The CISC surface is an API that decades of software depend on. Inside an Intel Core, it's RISC all the way down.

What Part III Covers

Chapter 16 — The ARM64 architecture itself: the 31-register file, the zero register, PSTATE flags, fixed-width encoding, and the load/store discipline that defines RISC programming.

Chapter 17 — The ARM64 instruction set: data processing, the barrel shifter, load/store addressing modes, branches, the AAPCS64 calling convention, and Linux system calls.

Chapter 18 — ARM64 programming in practice: arrays, string operations without string instructions, floating-point with the NEON/FP register file, SIMD with NEON, and the differences between Linux ARM64 and Apple Silicon macOS.

Chapter 19 — The great comparison: x86-64 vs. ARM64, side by side. Same programs, both ISAs. Code density, power, performance, and why the industry is betting on ARM64 to win the next decade.

Getting Your Hands on ARM64

You have three good options:

Raspberry Pi 4/5 — Around $50-100, runs 64-bit Linux natively. Real hardware, real ARM64, boots off an SD card. Ideal for embedded development.

QEMU — Full system emulation. qemu-system-aarch64 with a Cortex-A57 CPU model runs ARM64 Linux on any host. Slower than native but free and available everywhere.

Apple Silicon — If you own an M1/M2/M3/M4 Mac, you are already on ARM64. clang on macOS compiles ARM64 natively. GDB is replaced by LLDB. The system calls differ from Linux. Chapter 18 covers the differences.

For this text, all code targets Linux ARM64 (AAPCS64 calling convention, Linux syscall ABI). Apple Silicon differences are noted where they matter.

Setup command for Ubuntu/Debian cross-compilation tools:

sudo apt install gcc-aarch64-linux-gnu binutils-aarch64-linux-gnu qemu-user

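With qemu-user, you can run ARM64 binaries directly on an x86-64 host: qemu-aarch64 ./my_arm64_program. No full system emulation required. This works as-is for statically linked programs and for assembly programs that do not link against libc. A dynamically linked binary also needs the cross sysroot, which on Debian/Ubuntu is passed as qemu-aarch64 -L /usr/aarch64-linux-gnu ./my_arm64_program.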
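A quick way to check the toolchain is a C file that reports which ISA it was compiled for. This is a minimal sketch: the file name hello_isa.c and the function names are assumptions, and the build commands in the comment assume the Debian/Ubuntu packages from the setup command above.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed workflow (file name hello_isa.c is hypothetical; add a
 * main() that calls report_target(stdout)):
 *
 *   aarch64-linux-gnu-gcc -static -o hello_isa hello_isa.c
 *   qemu-aarch64 ./hello_isa
 *
 * Building the same file with the host gcc gives the host's answer.
 */

/* Name the target ISA using macros that GCC and Clang predefine. */
const char *target_isa(void)
{
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__x86_64__)
    return "x86-64";
#else
    return "other";
#endif
}

/* Print a one-line report; returns the number of characters written. */
int report_target(FILE *out)
{
    assert(out != NULL);
    return fprintf(out, "compiled for %s, %zu-bit pointers\n",
                   target_isa(), sizeof(void *) * 8);
}
```

If the cross build runs under qemu-aarch64 and reports aarch64, the compiler, linker, and user-mode emulator are all working together. The -static flag sidesteps the sysroot question for this first test.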

Let's learn what the other half of computing looks like.

Chapters in This Part