Appendix H: Calling Conventions Reference

This appendix summarizes the three major calling conventions used in this book, presented side by side for comparison. For the full specifications, see the ABI documents listed in Appendix C (Bibliography).


Three Calling Conventions at a Glance

Property System V AMD64 Windows x64 ARM64 AAPCS64
Architecture x86-64 (Linux, macOS, BSDs) x86-64 (Windows) AArch64 (all platforms)
Int/ptr arg registers RDI RSI RDX RCX R8 R9 RCX RDX R8 R9 X0 X1 X2 X3 X4 X5 X6 X7
FP arg registers XMM0-XMM7 XMM0-XMM3 V0-V7
Max register args 6 int + 8 FP 4 total (int+FP shared) 8 int + 8 FP
Int return RAX RAX X0
FP return XMM0 XMM0 V0
Stack alignment 16-byte before CALL 16-byte (32-byte for AVX) 16-byte always
Red zone 128 bytes below RSP None None
Shadow space None 32 bytes (4 × 8) None
Scratch regs (caller-saved) RAX RCX RDX RSI RDI R8-R11 RAX RCX RDX R8-R11 X0-X15 (caller-saved)
Preserved regs (callee-saved) RBX RBP R12-R15 RBX RBP RSI RDI R12-R15 X19-X28 X29 (FP)
Return address saved in Stack (via CALL) Stack (via CALL) X30 (LR register)

System V AMD64 ABI (Linux, macOS, BSDs)

Integer / Pointer Argument Passing

Arguments are assigned to registers in this order, left to right. Integers, pointers, and enum values use this sequence:

Position 1st 2nd 3rd 4th 5th 6th 7th+
Register RDI RSI RDX RCX R8 R9 Stack

Additional arguments beyond the 6th are pushed on the stack, right to left (last argument pushed first). The stack must be 16-byte aligned before the CALL instruction executes.

Floating-Point Argument Passing

Floating-point and SIMD arguments (single, double, __m128, __m256) use:

Position 1st 2nd 3rd 4th 5th 6th 7th 8th 9th+
Register XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 Stack

When both integer and FP registers are used, they are assigned independently. A function with signature f(int, double, int, double) uses RDI (1st int), XMM0 (1st double), RSI (2nd int), XMM1 (2nd double).

Return Values

  • Integers and pointers: RAX (64-bit), EAX (32-bit), AX (16-bit), AL (8-bit)
  • 128-bit values: RDX:RAX (high in RDX, low in RAX)
  • Single-precision float: XMM0 (in the lower 32 bits, ss format)
  • Double-precision float: XMM0 (in the lower 64 bits, sd format)
  • __m128: XMM0
  • __m256: YMM0
  • Struct return: depends on size (small structs in RAX:RDX; large structs via hidden first pointer argument)

Register Preservation

Callee-saved (function must preserve these if it uses them): RBX, RBP, R12, R13, R14, R15

Caller-saved (function may freely modify; caller saves before call if needed): RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11

Special: RSP must be restored to its original value before ret (stack must be balanced). The direction flag (DF in RFLAGS) must be clear (0) on function entry and exit.

Red Zone

The 128 bytes below RSP (addresses RSP-128 through RSP-1) may be used by leaf functions (functions that make no further function calls) without adjusting RSP. The kernel guarantees not to use this area when delivering signals.

Non-leaf functions must not assume the red zone is preserved across calls, because the CALL instruction adjusts RSP and the called function may overwrite this area.

Stack Frame Layout (Standard)

Higher addresses
 ┌──────────────────────┐ ← Previous frame's RSP (before CALL)
 │  7th argument         │ (if > 6 args)
 │  6th+ arguments       │
 │  return address       │ ← RSP at function entry (CALL pushes this)
 │  saved RBP            │ ← RBP after prologue
 │  saved callee regs    │ (RBX, R12-R15 if used)
 │  local variables      │ ← [rbp - N]
 │  possible alignment   │
 │  red zone (128 bytes) │ ← [RSP - 128] through [RSP - 1]
 └──────────────────────┘ ← RSP during function body
Lower addresses

Complete Prologue and Epilogue

; Standard prologue (with frame pointer):
push    rbp
mov     rbp, rsp
sub     rsp, N          ; N = local variable space, rounded to 16

; Optionally save callee-saved registers used:
push    rbx
push    r12

; ... function body ...

; Epilogue:
pop     r12             ; restore in reverse order
pop     rbx
leave                   ; equivalent to: mov rsp, rbp; pop rbp
ret
; Leaf function using red zone (no prologue/epilogue needed):
mov     [rsp - 8], rbx   ; save into red zone (if RBX needed)
; ... function body using rsp-8 through rsp-128 ...
mov     rbx, [rsp - 8]   ; restore
ret

Windows x64 ABI

Key Differences from System V AMD64

  1. Different argument registers: RCX, RDX, R8, R9 (not RDI, RSI, RDX, RCX)
  2. Shadow space (home space): The caller allocates 32 bytes of "shadow space" on the stack before every CALL, providing scratch space for the callee to spill its first four register arguments. The callee may or may not use this space; the caller always provides it.
  3. Different callee-saved registers: RSI and RDI are callee-saved on Windows (but caller-saved on Linux).
  4. No red zone: Windows has no red zone; any code below RSP may be overwritten.
  5. XMM6-XMM15 are callee-saved on Windows (only XMM0-XMM5 are volatile).

Argument Registers (Windows x64)

Integer and FP arguments share slots (position 1 can be int or float, not both):

Position 1st 2nd 3rd 4th 5th+
Integer RCX RDX R8 R9 Stack
Float XMM0 XMM1 XMM2 XMM3 Stack

Shadow Space

; Windows x64 function call (caller must provide 32 bytes shadow space):
sub     rsp, 8 + 32     ; 8 for alignment, 32 for shadow space
mov     rcx, arg1
mov     rdx, arg2
call    function
add     rsp, 8 + 32     ; clean up

Stack Frame Layout (Windows x64)

Higher addresses
 ┌──────────────────────┐
 │  5th argument         │ (if > 4 args)
 │  shadow space (32B)   │ ← 4 × 8 bytes, always present
 │  return address       │ ← after CALL
 │  saved RBP (optional) │
 │  callee-saved regs    │ (RBX, RSI, RDI, R12-R15, XMM6-XMM15)
 │  local variables      │
 └──────────────────────┘ ← RSP
Lower addresses

ARM64 AAPCS64 (AArch64 Procedure Call Standard)

Integer / Pointer Argument Passing

Position 1st 2nd 3rd 4th 5th 6th 7th 8th 9th+
Register X0 X1 X2 X3 X4 X5 X6 X7 Stack

Float / SIMD Argument Passing

Position 1st 2nd 3rd 4th 5th 6th 7th 8th 9th+
Register V0/D0/S0 V1 V2 V3 V4 V5 V6 V7 Stack

Integer and FP registers are allocated independently (same as System V AMD64).

Return Values

  • Integer/pointer: X0 (and X1 for 128-bit values)
  • Float: V0 (D0 for double, S0 for single)
  • SIMD __int128: X0:X1
  • float128_t: Q0

Register Preservation

Callee-saved: X19, X20, X21, X22, X23, X24, X25, X26, X27, X28, X29 (FP), X30 (LR, by convention) V8-V15 (only the lower 64 bits; the upper 64 bits are not preserved)

Caller-saved: X0-X15, X16 (IP0), X17 (IP1), X18 (platform register — treat as caller-saved unless targeting a platform that reserves it) V0-V7, V16-V31

Special: SP must be 16-byte aligned at all times (not just before calls — always). X30 (LR) holds the return address; functions that make further calls must save it.

Standard Prologue and Epilogue

// Standard prologue:
stp     x29, x30, [sp, #-16]!   // save FP and LR; pre-decrement SP
mov     x29, sp                 // establish frame pointer

// Optionally save other callee-saved registers:
stp     x19, x20, [sp, #-16]!

// ... function body ...

// Epilogue:
ldp     x19, x20, [sp], #16     // restore in reverse order
ldp     x29, x30, [sp], #16     // restore FP and LR; post-increment SP
ret                             // branch to X30

ARM64 Stack Frame Layout

Higher addresses
 ┌──────────────────────┐
 │  9th+ arguments       │ (if > 8 args)
 │  (no shadow space)    │
 │  saved X29 (old FP)   │ ← [SP] after prologue; [X29]
 │  saved X30 (LR)       │ ← [X29 + 8]
 │  other callee regs    │
 │  local variables      │
 └──────────────────────┘ ← SP (always 16-byte aligned)
Lower addresses

Unlike x86-64, ARM64 has no concept of a red zone; the stack must be maintained with strict 16-byte alignment at all times.


Variadic Functions (Variable Argument Lists)

System V AMD64

For variadic functions (printf, fprintf, etc.): the caller sets AL = the number of vector registers used (for the benefit of the callee's va_start machinery). This is typically visible as xor eax, eax before calling printf when no FP arguments are passed.

The va_list on x86-64 System V is a structure containing: - gp_offset: offset into the register save area for the next GP register argument - fp_offset: offset into the register save area for the next FP register argument - overflow_arg_area: pointer to remaining stack arguments - reg_save_area: pointer to the saved register arguments

ARM64 AAPCS64

The ARM64 va_list is a structure pointing to the stack-passed arguments, with a separate "named parameter register" count. Variadic functions must save all argument registers to memory in their prologue to allow va_arg to iterate over them.


Syscall vs. Function Call Comparison

Property Function Call (System V) System Call (Linux x86-64)
Mechanism call instruction syscall instruction
Number/selector RAX
Arg 1 RDI RDI
Arg 2 RSI RSI
Arg 3 RDX RDX
Arg 4 RCX R10 (RCX clobbered by syscall)
Arg 5 R8 R8
Arg 6 R9 R9
Return RAX RAX (negative = error)
Clobbers Caller-saved regs RCX, R11 (always), others per syscall
Stack Caller-managed Unchanged

Note: The 4th argument register differs: function calls use RCX, but syscalls use R10, because syscall saves the user-space RIP into RCX (and RFLAGS into R11).


Cross-Platform Assembly Compatibility

When writing assembly that needs to work on both Linux (System V) and Windows:

; Linux/macOS/BSD version (System V AMD64):
; arg1 in RDI, arg2 in RSI

; Windows version:
; arg1 in RCX, arg2 in RDX

; Cross-platform wrapper approach:
%ifdef WINDOWS
    ; Translate Windows registers to System V layout
    mov     rdi, rcx
    mov     rsi, rdx
%endif
    ; Common code using rdi, rsi

For library code distributed as object files, consider using #ifdef _WIN32 in C source and letting the compiler handle the ABI differences.