Chapter 7 Key Takeaways: First Programs

Open Assembly Language Project

The 14 Most Important Points from This Chapter

1. MOV copies data but never sets flags. The four forms covered here are mov rax, rbx (reg→reg), mov rax, [mem] (load), mov [mem], rax (store), and mov rax, imm (immediate). There is no mov [mem], [mem]: MOV cannot take two memory operands, so a memory-to-memory copy has to go through a register (the string instruction MOVS is a separate mechanism). MOV never touches CF, OF, ZF, SF, or any other flag. Use it freely without worrying about clobbering condition codes.

2. 32-bit writes zero the upper 32 bits; 8-bit and 16-bit writes do not. mov eax, 1 → RAX = 0x0000000000000001. mov al, 1 → only the low byte changes; upper 56 bits are unchanged. This asymmetry is the source of the most common assembly register bugs. When in doubt about what a partial write leaves in the upper bits, use a 32-bit write (which implicitly zeros the upper half) or an explicit movzx.

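3. xor eax, eax is the canonical zero idiom. Use it. It is two bytes, the processor recognizes it as a dependency-breaking zeroing idiom (it does not wait for the old EAX value, and many cores resolve it at register rename without using an execution unit), and the 32-bit write zero-extends to clear all 64 bits of RAX. mov rax, 0 is larger: 7 bytes in its 64-bit encoding, or 5 bytes when the assembler shortens it to mov eax, 0. It has no dependency on the old register value either, so size is its main cost. Unlike MOV, XOR overwrites the flags, so prefer mov eax, 0 when a flag result must survive the zeroing (for example between a CMP and the Jcc or SETcc that consumes it) or when clarity matters more than size.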

4. ADD sets CF for unsigned overflow; OF for signed overflow. INC/DEC leave CF unchanged. add rax, 1 where rax = 0xFFFFFFFFFFFFFFFF: CF=1 (unsigned wrapped), OF=0 (treated as signed, -1+1=0 is fine). The omission of CF from INC/DEC is deliberate and historical: pre-64-bit code stepped loop counters with INC/DEC without destroying the carry being propagated through a multi-word ADC/SBB or RCR/RCL chain. In modern code it's a quirk: use add rax, 1 if you need CF.

5. NEG sets CF=1 unless the operand was zero, and sets OF=1 for the minimum signed value. neg rax computes 0 - rax. neg 0 → CF=0. neg anything_else → CF=1. neg 0x8000000000000000 (minimum int64_t) → no positive representation exists, result is the same value, OF=1. These flags matter for multiprecision arithmetic where NEG is used as part of a wider operation.

6. A Linux syscall takes its number in RAX and its arguments in RDI, RSI, RDX, R10, R8, R9. Note R10, not RCX. The System V ABI passes the fourth function argument in RCX, but Linux syscalls use R10 for the fourth argument because SYSCALL clobbers RCX (it saves the return address there). This is a deliberate kernel interface decision. It's easy to confuse: if you pass arg4 in RCX for a syscall, the kernel will never see it.

7. SYSCALL clobbers RCX and R11. The CPU does this, not the kernel, so you cannot prevent it. The SYSCALL instruction saves RIP (the address of the next instruction) to RCX and RFLAGS to R11 as part of its hardware operation, and SYSRET uses those two registers to return. After the syscall returns, RCX = your return address (not useful to you), R11 = your original RFLAGS (also not useful in most cases). If you need RCX or R11 across a system call, push them before SYSCALL and pop them after, or keep the values in other registers.

8. A syscall return value in the range -4095 to -1 indicates an error. The magnitude is the errno code. Linux syscalls never return a value in that range on success, so if rax falls in it after a syscall, it's a negated errno. -1 = EPERM, -2 = ENOENT, -9 = EBADF, -14 = EFAULT. In C, the library wrapper converts this to a return value of -1 and stores the code in errno. In raw assembly, check RAX yourself: a signed test for negative is enough for calls such as read and write, and an unsigned compare against -4095 (cmp rax, -4095 followed by jae error) covers every syscall.

9. The naive strlen loop is ~4 instructions per byte; the 8-bytes-at-a-time version is ~1.1; AVX2 is ~0.22. This is the quantitative case for why assembly programmers care about algorithms, not just instruction selection. Using REPNE SCASB (a dedicated string-scan instruction) is actually slower than the naive loop on modern hardware due to microcode overhead. The right tool is not the most specialized instruction; it's the most throughput-efficient one.

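10. The Hacker's Delight zero-byte detection technique uses: (word - 0x0101010101010101) & ~word & 0x8080808080808080. If any byte in word is zero, subtracting 0x01 causes a borrow that sets bit 7 of that byte position. The & ~word term eliminates false positives from bytes that were already 0x80-0xFF. The & 0x808... term extracts only the high bits. A non-zero result means at least one zero byte exists. bsf then returns the bit index of the lowest set bit, and shifting that index right by 3 gives the byte offset of the first zero byte. Only the lowest set bit is reliable: the borrow can spill into the byte above a zero and flag a 0x01 there.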

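11. The MinOS bootloader runs in 16-bit real mode at address 0x7C00. Use both BITS 16 and ORG 0x7C00. BITS 16 tells NASM to generate 16-bit code, adding 0x66 and 0x67 prefixes only where an instruction uses 32-bit operands or addresses. ORG 0x7C00 tells NASM where the binary will be loaded so it resolves label addresses correctly. NASM's flat binary output (-f bin) already defaults to 16-bit, but an explicit BITS 16 documents the intent and protects against an earlier BITS 32 or BITS 64; code assembled for 32-bit mode executes incorrectly in real mode because the default operand size is reversed. Without ORG 0x7C00, all absolute label addresses are off by 0x7C00 (assuming the segment registers are zero).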

12. The boot signature dw 0xAA55 must be the last 2 bytes of a 512-byte sector. The BIOS checks bytes 510-511 of the sector for 0x55 0xAA (little-endian: dw 0xAA55). If not present, the BIOS will not attempt to boot from that device. The times 510-($ - $$) db 0 directive pads from the end of your code to byte 510, after which the 2-byte signature completes the 512-byte sector.

13. Always verify print_uint64 handles 0 and 0xFFFFFFFFFFFFFFFF (18446744073709551615). These are the two edge cases. Zero requires a special case because the DIV-based loop would produce no digits. The maximum 64-bit value is 20 digits, so your digit buffer must be at least 20 bytes. Using a 32-byte or 64-byte buffer eliminates the size concern entirely.

14. Stack alignment: maintain RSP % 16 == 0 at the point of every CALL. At the _start entry point the kernel has already set RSP to a 16-byte boundary (it points at argc). A CALL instruction pushes 8 bytes (return address), leaving RSP % 16 == 8 on entry to the callee. The first instruction of a function prologue (push rbp) leaves RSP % 16 == 0 again. Before any CALL within a function, verify that any local variable allocations and PUSH instructions keep RSP % 16 == 0 at the point of the CALL. SYSCALL itself has no alignment requirement because the kernel switches to its own stack. A misaligned RSP breaks callees that use aligned SSE accesses on their stack frame (movaps, movdqa), which raise #GP faults.
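
Point 4 distinguishes carry (unsigned wrap) from overflow (signed wrap). As a rough illustration, the sketch below models in portable C how a 64-bit ADD arrives at CF and OF. The names add_flags, add64, and add64_example are hypothetical helpers invented for this example; real hardware produces the flags directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the result and flags of a 64-bit ADD. */
struct add_flags {
    uint64_t result;
    int cf;   /* carry: the unsigned sum wrapped past 2^64 */
    int of;   /* overflow: the signed sum has the wrong sign */
};

struct add_flags add64(uint64_t a, uint64_t b)
{
    struct add_flags f;
    f.result = a + b;        /* unsigned addition wraps modulo 2^64 */
    f.cf = f.result < a;     /* a wrapped sum is smaller than either input */
    /* Signed overflow: both inputs have a sign bit that the result lacks. */
    f.of = (int)(((a ^ f.result) & (b ^ f.result)) >> 63);
    return f;
}

/* Usage example: the case from point 4, -1 + 1 viewed both ways. */
void add64_example(void)
{
    struct add_flags f = add64(UINT64_MAX, 1);
    assert(f.result == 0 && f.cf == 1 && f.of == 0);
}
```

Adding 1 to 0x7FFFFFFFFFFFFFFF gives the opposite pattern: no carry, but signed overflow.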
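
The zero-byte test from point 10 is easy to try outside assembly. This is a minimal C sketch of the same expression; zero_byte_mask and first_zero_byte are hypothetical names, and the byte-walking loop stands in for bsf followed by a shift right by 3.

```c
#include <assert.h>
#include <stdint.h>

#define LOW_ONES  0x0101010101010101ULL
#define HIGH_BITS 0x8080808080808080ULL

/* Non-zero if any byte of w is 0x00. */
uint64_t zero_byte_mask(uint64_t w)
{
    return (w - LOW_ONES) & ~w & HIGH_BITS;
}

/* Byte offset (0 = least significant) of the first zero byte, or -1 if none. */
int first_zero_byte(uint64_t w)
{
    uint64_t mask = zero_byte_mask(w);
    int index = 0;

    if (mask == 0)
        return -1;
    while ((mask & 0x80) == 0) {   /* walk up one byte at a time */
        mask >>= 8;
        index++;
    }
    return index;
}

/* Usage example: "AAAA\0AAA" loaded as a little-endian word. */
void zero_byte_example(void)
{
    assert(first_zero_byte(0x4141410041414141ULL) == 4);
}
```

Because the search starts from the least significant byte, any spurious bit set above the first zero byte does not affect the answer.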
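
The sector layout in point 12 can be checked with a few lines of C. The sketch below builds a 512-byte image the way the times directive and dw 0xAA55 would; build_boot_sector and boot_sector_example are hypothetical names, and the two code bytes in the example are arbitrary placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define CODE_LIMIT  510   /* bytes 510-511 are reserved for the signature */

/* Returns 0 on success, -1 if the code does not fit in the sector. */
int build_boot_sector(const uint8_t *code, size_t len,
                      uint8_t sector[SECTOR_SIZE])
{
    if (len > CODE_LIMIT)
        return -1;
    memset(sector, 0, SECTOR_SIZE);   /* zero fill, like times 510-($-$$) db 0 */
    memcpy(sector, code, len);
    sector[510] = 0x55;               /* dw 0xAA55 stores the low byte first */
    sector[511] = 0xAA;
    return 0;
}

/* Usage example with two placeholder code bytes. */
void boot_sector_example(void)
{
    const uint8_t code[2] = { 0xEB, 0xFE };
    uint8_t sector[SECTOR_SIZE];

    assert(build_boot_sector(code, sizeof code, sector) == 0);
    assert(sector[510] == 0x55 && sector[511] == 0xAA);
}
```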
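
For point 13, it can help to have a reference to compare your assembly output against. The following C sketch performs the same divide-by-10 digit extraction; format_uint64 and format_uint64_example are hypothetical names, and the bottom-tested loop is one possible way to cover zero, not necessarily how the chapter's print_uint64 handles it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define U64_BUF_SIZE 21   /* 20 digits for 2^64-1 plus a terminating NUL */

/* Writes v in decimal into out and returns the number of digits. */
int format_uint64(uint64_t v, char out[U64_BUF_SIZE])
{
    char tmp[20];
    int n = 0;

    /* The test sits at the bottom, so v == 0 still emits one digit. */
    do {
        tmp[n++] = (char)('0' + v % 10);   /* remainder: DIV leaves it in RDX */
        v /= 10;                           /* quotient: DIV leaves it in RAX */
    } while (v != 0);

    /* Digits were produced least significant first; reverse them. */
    for (int i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}

/* Usage example: the two edge cases named in point 13. */
void format_uint64_example(void)
{
    char buf[U64_BUF_SIZE];

    assert(format_uint64(0, buf) == 1 && strcmp(buf, "0") == 0);
    assert(format_uint64(UINT64_MAX, buf) == 20);
    assert(strcmp(buf, "18446744073709551615") == 0);
}
```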

Visual Summary: System Call Register Convention

Linux x86-64 Syscall Layout
───────────────────────────────────────────────────────────────
Register  Role                 Example (sys_write)
────────  ───────────────────  ──────────────────────────────
RAX       syscall number       1 (sys_write)
RDI       argument 1           fd = 1 (stdout)
RSI       argument 2           buf = address of "Hello\n"
RDX       argument 3           count = 6
R10       argument 4           (not used by sys_write)
R8        argument 5           (not used by sys_write)
R9        argument 6           (not used by sys_write)

After SYSCALL returns:
RAX       return value         bytes written (e.g., 6) or -errno
RCX       CLOBBERED            (hardware saved your return address here)
R11       CLOBBERED            (hardware saved RFLAGS here)
All other registers: unchanged

Common syscall numbers (Linux x86-64):
   0 = sys_read     1 = sys_write    2 = sys_open     3 = sys_close
   9 = sys_mmap    11 = sys_munmap  39 = sys_getpid  60 = sys_exit
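
The "return value or -errno" rule for RAX shown above can be sketched as a pair of C helpers. The names syscall_failed, syscall_errno, and syscall_errno_example are hypothetical; the range check is the signed form of the unsigned compare against -4095 that raw assembly would typically use.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* 1 if a raw syscall return value encodes an error, 0 otherwise. */
int syscall_failed(int64_t rax)
{
    return rax >= -MAX_ERRNO && rax <= -1;
}

/* errno code for a failed call, 0 for a successful one. */
int syscall_errno(int64_t rax)
{
    return syscall_failed(rax) ? (int)-rax : 0;
}

/* Usage example: -2 is ENOENT; 6 is a normal byte count from sys_write. */
void syscall_errno_example(void)
{
    assert(syscall_failed(-2) && syscall_errno(-2) == 2);
    assert(!syscall_failed(6) && syscall_errno(6) == 0);
}
```

A value of -4096 or lower is outside the error range and is treated as an ordinary result.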

Visual Summary: MOV Family Quick Reference

MOV Forms and Their Flag Behavior
────────────────────────────────────────────────────────────────────
Instruction                Effect                              Flags
────────────────────────────────────────────────────────────────────
mov  rax, rbx              rax = rbx                           none
mov  eax, ebx              rax = zero-extend(ebx)              none
mov  ax, bx                ax = bx (upper 48 bits unchanged)   none
mov  al, bl                al = bl (upper 56 bits unchanged)   none

mov  rax, [addr]           rax = memory[addr]                  none
mov  eax, [addr]           rax = zero-extend(mem32[addr])      none
mov  [addr], rax           memory[addr] = rax                  none

mov  rax, 42               rax = 42                            none
mov  eax, 42               rax = 42 (upper 32 bits zeroed)     none

movzx rax, BYTE [addr]     rax = zero-extend(byte)             none
movzx rax, WORD [addr]     rax = zero-extend(word)             none
movsx rax, BYTE [addr]     rax = sign-extend(byte)             none
movsxd rax, DWORD [addr]   rax = sign-extend(dword)            none

Key rule: writes to 32-bit registers zero-extend to 64 bits.
          writes to 8-bit or 16-bit registers do NOT.
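
The key rule above can be modeled with plain masking. In this C sketch each helper returns the new 64-bit register contents after a write of the given width; write32, write16, write8, and partial_write_example are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* mov eax, v: the old contents are discarded and the upper 32 bits zeroed. */
uint64_t write32(uint64_t reg, uint32_t v)
{
    (void)reg;
    return (uint64_t)v;
}

/* mov ax, v: the upper 48 bits are preserved. */
uint64_t write16(uint64_t reg, uint16_t v)
{
    return (reg & ~0xFFFFULL) | v;
}

/* mov al, v: the upper 56 bits are preserved. */
uint64_t write8(uint64_t reg, uint8_t v)
{
    return (reg & ~0xFFULL) | v;
}

/* Usage example: write 1 into a register that held all ones. */
void partial_write_example(void)
{
    assert(write32(UINT64_MAX, 1) == 0x0000000000000001ULL);
    assert(write16(UINT64_MAX, 1) == 0xFFFFFFFFFFFF0001ULL);
    assert(write8(UINT64_MAX, 1)  == 0xFFFFFFFFFFFFFF01ULL);
}
```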