Appendix H: Calling Conventions Reference
This appendix summarizes the three major calling conventions used in this book, presented side by side for comparison. For the full specifications, see the ABI documents listed in Appendix C (Bibliography).
Three Calling Conventions at a Glance
| Property | System V AMD64 | Windows x64 | ARM64 AAPCS64 |
|---|---|---|---|
| Architecture | x86-64 (Linux, macOS, BSDs) | x86-64 (Windows) | AArch64 (all platforms) |
| Int/ptr arg registers | RDI RSI RDX RCX R8 R9 | RCX RDX R8 R9 | X0 X1 X2 X3 X4 X5 X6 X7 |
| FP arg registers | XMM0-XMM7 | XMM0-XMM3 | V0-V7 |
| Max register args | 6 int + 8 FP | 4 total (int+FP shared) | 8 int + 8 FP |
| Int return | RAX | RAX | X0 |
| FP return | XMM0 | XMM0 | V0 |
| Stack alignment | 16-byte before CALL | 16-byte (32-byte for AVX) | 16-byte always |
| Red zone | 128 bytes below RSP | None | None |
| Shadow space | None | 32 bytes (4 × 8) | None |
| Scratch regs (caller-saved) | RAX RCX RDX RSI RDI R8-R11 | RAX RCX RDX R8-R11 | X0-X15 (caller-saved) |
| Preserved regs (callee-saved) | RBX RBP R12-R15 | RBX RBP RSI RDI R12-R15 | X19-X28 X29 (FP) |
| Return address saved in | Stack (via CALL) | Stack (via CALL) | X30 (LR register) |
System V AMD64 ABI (Linux, macOS, BSDs)
Integer / Pointer Argument Passing
Arguments are assigned to registers in this order, left to right. Integers, pointers, and enum values use this sequence:
| Position | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th+ |
|---|---|---|---|---|---|---|---|
| Register | RDI | RSI | RDX | RCX | R8 | R9 | Stack |
Additional arguments beyond the 6th are pushed on the stack, right to left (last argument pushed first). The stack must be 16-byte aligned before the CALL instruction executes.
Floating-Point Argument Passing
Floating-point and SIMD arguments (single, double, __m128, __m256) use:
| Position | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th+ |
|---|---|---|---|---|---|---|---|---|---|
| Register | XMM0 | XMM1 | XMM2 | XMM3 | XMM4 | XMM5 | XMM6 | XMM7 | Stack |
When both integer and FP registers are used, they are assigned independently. A function with signature f(int, double, int, double) uses RDI (1st int), XMM0 (1st double), RSI (2nd int), XMM1 (2nd double).
Return Values
- Integers and pointers: RAX (64-bit), EAX (32-bit), AX (16-bit), AL (8-bit)
- 128-bit values: RDX:RAX (high in RDX, low in RAX)
- Single-precision float: XMM0 (in the lower 32 bits,
ssformat) - Double-precision float: XMM0 (in the lower 64 bits,
sdformat) __m128: XMM0__m256: YMM0- Struct return: depends on size (small structs in RAX:RDX; large structs via hidden first pointer argument)
Register Preservation
Callee-saved (function must preserve these if it uses them): RBX, RBP, R12, R13, R14, R15
Caller-saved (function may freely modify; caller saves before call if needed): RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11
Special:
RSP must be restored to its original value before ret (stack must be balanced).
The direction flag (DF in RFLAGS) must be clear (0) on function entry and exit.
Red Zone
The 128 bytes below RSP (addresses RSP-128 through RSP-1) may be used by leaf functions (functions that make no further function calls) without adjusting RSP. The kernel guarantees not to use this area when delivering signals.
Non-leaf functions must not assume the red zone is preserved across calls, because the CALL instruction adjusts RSP and the called function may overwrite this area.
Stack Frame Layout (Standard)
Higher addresses
┌──────────────────────┐ ← Previous frame's RSP (before CALL)
│ 7th argument │ (if > 6 args)
│ 6th+ arguments │
│ return address │ ← RSP at function entry (CALL pushes this)
│ saved RBP │ ← RBP after prologue
│ saved callee regs │ (RBX, R12-R15 if used)
│ local variables │ ← [rbp - N]
│ possible alignment │
│ red zone (128 bytes) │ ← [RSP - 128] through [RSP - 1]
└──────────────────────┘ ← RSP during function body
Lower addresses
Complete Prologue and Epilogue
; Standard prologue (with frame pointer):
push rbp
mov rbp, rsp
sub rsp, N ; N = local variable space, rounded to 16
; Optionally save callee-saved registers used:
push rbx
push r12
; ... function body ...
; Epilogue:
pop r12 ; restore in reverse order
pop rbx
leave ; equivalent to: mov rsp, rbp; pop rbp
ret
; Leaf function using red zone (no prologue/epilogue needed):
mov [rsp - 8], rbx ; save into red zone (if RBX needed)
; ... function body using rsp-8 through rsp-128 ...
mov rbx, [rsp - 8] ; restore
ret
Windows x64 ABI
Key Differences from System V AMD64
- Different argument registers: RCX, RDX, R8, R9 (not RDI, RSI, RDX, RCX)
- Shadow space (home space): The caller allocates 32 bytes of "shadow space" on the stack before every CALL, providing scratch space for the callee to spill its first four register arguments. The callee may or may not use this space; the caller always provides it.
- Different callee-saved registers: RSI and RDI are callee-saved on Windows (but caller-saved on Linux).
- No red zone: Windows has no red zone; any code below RSP may be overwritten.
- XMM6-XMM15 are callee-saved on Windows (only XMM0-XMM5 are volatile).
Argument Registers (Windows x64)
Integer and FP arguments share slots (position 1 can be int or float, not both):
| Position | 1st | 2nd | 3rd | 4th | 5th+ |
|---|---|---|---|---|---|
| Integer | RCX | RDX | R8 | R9 | Stack |
| Float | XMM0 | XMM1 | XMM2 | XMM3 | Stack |
Shadow Space
; Windows x64 function call (caller must provide 32 bytes shadow space):
sub rsp, 8 + 32 ; 8 for alignment, 32 for shadow space
mov rcx, arg1
mov rdx, arg2
call function
add rsp, 8 + 32 ; clean up
Stack Frame Layout (Windows x64)
Higher addresses
┌──────────────────────┐
│ 5th argument │ (if > 4 args)
│ shadow space (32B) │ ← 4 × 8 bytes, always present
│ return address │ ← after CALL
│ saved RBP (optional) │
│ callee-saved regs │ (RBX, RSI, RDI, R12-R15, XMM6-XMM15)
│ local variables │
└──────────────────────┘ ← RSP
Lower addresses
ARM64 AAPCS64 (AArch64 Procedure Call Standard)
Integer / Pointer Argument Passing
| Position | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th+ |
|---|---|---|---|---|---|---|---|---|---|
| Register | X0 | X1 | X2 | X3 | X4 | X5 | X6 | X7 | Stack |
Float / SIMD Argument Passing
| Position | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th+ |
|---|---|---|---|---|---|---|---|---|---|
| Register | V0/D0/S0 | V1 | V2 | V3 | V4 | V5 | V6 | V7 | Stack |
Integer and FP registers are allocated independently (same as System V AMD64).
Return Values
- Integer/pointer: X0 (and X1 for 128-bit values)
- Float: V0 (D0 for double, S0 for single)
- SIMD
__int128: X0:X1 float128_t: Q0
Register Preservation
Callee-saved: X19, X20, X21, X22, X23, X24, X25, X26, X27, X28, X29 (FP), X30 (LR, by convention) V8-V15 (only the lower 64 bits; the upper 64 bits are not preserved)
Caller-saved: X0-X15, X16 (IP0), X17 (IP1), X18 (platform register — treat as caller-saved unless targeting a platform that reserves it) V0-V7, V16-V31
Special: SP must be 16-byte aligned at all times (not just before calls — always). X30 (LR) holds the return address; functions that make further calls must save it.
Standard Prologue and Epilogue
// Standard prologue:
stp x29, x30, [sp, #-16]! // save FP and LR; pre-decrement SP
mov x29, sp // establish frame pointer
// Optionally save other callee-saved registers:
stp x19, x20, [sp, #-16]!
// ... function body ...
// Epilogue:
ldp x19, x20, [sp], #16 // restore in reverse order
ldp x29, x30, [sp], #16 // restore FP and LR; post-increment SP
ret // branch to X30
ARM64 Stack Frame Layout
Higher addresses
┌──────────────────────┐
│ 9th+ arguments │ (if > 8 args)
│ (no shadow space) │
│ saved X29 (old FP) │ ← [SP] after prologue; [X29]
│ saved X30 (LR) │ ← [X29 + 8]
│ other callee regs │
│ local variables │
└──────────────────────┘ ← SP (always 16-byte aligned)
Lower addresses
Unlike x86-64, ARM64 has no concept of a red zone; the stack must be maintained with strict 16-byte alignment at all times.
Variadic Functions (Variable Argument Lists)
System V AMD64
For variadic functions (printf, fprintf, etc.): the caller sets AL = the number of vector registers used (for the benefit of the callee's va_start machinery). This is typically visible as xor eax, eax before calling printf when no FP arguments are passed.
The va_list on x86-64 System V is a structure containing:
- gp_offset: offset into the register save area for the next GP register argument
- fp_offset: offset into the register save area for the next FP register argument
- overflow_arg_area: pointer to remaining stack arguments
- reg_save_area: pointer to the saved register arguments
ARM64 AAPCS64
The ARM64 va_list is a structure pointing to the stack-passed arguments, with a separate "named parameter register" count. Variadic functions must save all argument registers to memory in their prologue to allow va_arg to iterate over them.
Syscall vs. Function Call Comparison
| Property | Function Call (System V) | System Call (Linux x86-64) |
|---|---|---|
| Mechanism | call instruction |
syscall instruction |
| Number/selector | — | RAX |
| Arg 1 | RDI | RDI |
| Arg 2 | RSI | RSI |
| Arg 3 | RDX | RDX |
| Arg 4 | RCX | R10 (RCX clobbered by syscall) |
| Arg 5 | R8 | R8 |
| Arg 6 | R9 | R9 |
| Return | RAX | RAX (negative = error) |
| Clobbers | Caller-saved regs | RCX, R11 (always), others per syscall |
| Stack | Caller-managed | Unchanged |
Note: The 4th argument register differs: function calls use RCX, but syscalls use R10, because syscall saves the user-space RIP into RCX (and RFLAGS into R11).
Cross-Platform Assembly Compatibility
When writing assembly that needs to work on both Linux (System V) and Windows:
; Linux/macOS/BSD version (System V AMD64):
; arg1 in RDI, arg2 in RSI
; Windows version:
; arg1 in RCX, arg2 in RDX
; Cross-platform wrapper approach:
%ifdef WINDOWS
; Translate Windows registers to System V layout
mov rdi, rcx
mov rsi, rdx
%endif
; Common code using rdi, rsi
For library code distributed as object files, consider using #ifdef _WIN32 in C source and letting the compiler handle the ABI differences.