Chapter 25 Key Takeaways: System Calls

  • A system call is a supervised privilege escalation. The syscall instruction switches the CPU from ring 3 (user mode) to ring 0 (kernel mode) and jumps to a single fixed entry point whose address the kernel stored in the LSTAR MSR. It is not a function call: user code chooses only the syscall number and arguments, never the kernel address at which execution begins.

  • syscall destroys RCX and R11. The instruction saves the return address (the RIP of the next instruction) into RCX and RFLAGS into R11 for later use by SYSRET. Any values your program had in these registers are gone after a syscall. This is why the Linux syscall ABI passes the fourth argument in R10, where the ordinary function-call ABI would use RCX.

  • The Linux x86-64 syscall ABI: RAX = syscall number; RDI, RSI, RDX, R10, R8, R9 = arguments 1–6; RAX = return value. A return value in the range -4095 to -1 means error; its absolute value is the errno code.

  • Error handling in raw assembly: If RAX is in the range -4095 to -1 after a syscall, the call failed and -RAX equals the errno. The C library converts this to a -1 return plus a write to the thread-local errno variable. In raw assembly, you do the comparison and negation yourself.

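  • The vDSO eliminates ring transitions for hot paths. gettimeofday and clock_gettime can be called millions of times per second without entering the kernel, because the kernel maps a small shared object of code, plus a page of time data that it keeps updated, into every process. This is typically around 10x faster than a real syscall, although the exact ratio depends on the hardware and clock source, and the vDSO falls back to a real syscall when the clock source cannot be read from user space.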

  • ARM64 uses SVC #0 with the syscall number in X8. The argument registers X0–X5 hold arguments 1–6, and the return value comes back in X0. Crucially, ARM64 syscall numbers are completely different from x86-64 numbers (e.g., write=64 on ARM64, write=1 on x86-64).

  • strace makes the system call layer visible without modifying the program. Every file open, network connection, memory request to the kernel (the brk and mmap calls behind malloc), and process creation appears in the trace. For debugging and security analysis, it is often the fastest way to understand what a program is actually doing.

  • The OS-facing part of the C standard library is mostly syscall wrappers. open, read, write, fork, and the exec family are thin wrappers around raw syscalls; malloc is a user-space allocator that obtains its memory from the kernel via brk or mmap. Building your own minimal libc demonstrates that there is no magic below the syscall interface.

  • sys_mmap is the general-purpose memory interface. It handles anonymous memory allocation (what malloc uses for large chunks), file mapping, and shared memory between processes. The flags MAP_PRIVATE|MAP_ANONYMOUS give you zeroed anonymous memory that becomes copy-on-write after a fork; MAP_SHARED gives memory that stays shared across a fork, so parent and child see each other's writes.

  • sys_execve replaces the process image entirely. On success, it never returns — the calling code no longer exists. On failure, it returns a negative errno and the original code continues. This is the foundation of all process launching: shell, loader, and supervisor all use it.

  • The MinOS syscall dispatcher uses swapgs to access per-CPU data. The syscall instruction does not switch stacks, so on entry the kernel is still running on the user's stack. Before doing anything else, it must switch to the kernel stack. The per-CPU kernel stack address is stored in kernel GS-relative memory, which becomes accessible once swapgs has swapped in the kernel GS base.

  • Suspicious syscall patterns are security-relevant. Reading SSH keys then opening a network connection to an unknown IP, calling ptrace(PTRACE_TRACEME) as an anti-debugging check, writing to .bashrc or crontab directories — these behaviors are visible in strace output regardless of how obfuscated the binary is, because they must eventually make system calls.
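
The ABI and error-handling takeaways can be poked at without writing any assembly. The sketch below is in Python and goes through libc's syscall() wrapper via ctypes, so what it observes is the libc side of the convention (-1 plus errno); decode_raw_return models the check raw assembly would perform on RAX. The helper names (decode_raw_return, errno_name, libc_syscall) and the SYSCALL_NUMBERS table are illustrative and not from the chapter, and the sketch assumes Linux on x86-64 or ARM64.

```python
import ctypes
import errno
import platform

# Syscall numbers are per-architecture (Linux only). Keys are the strings
# platform.machine() reports on each architecture.
SYSCALL_NUMBERS = {
    "x86_64":  {"write": 1,  "close": 3,  "getpid": 39},
    "aarch64": {"write": 64, "close": 57, "getpid": 172},
}

MAX_ERRNO = 4095  # raw kernel returns in -4095..-1 are error codes


def decode_raw_return(rax):
    """Turn a raw signed return value into (result, errno), as libc would."""
    if -MAX_ERRNO <= rax <= -1:
        return -1, -rax
    return rax, 0


def errno_name(code):
    """Map an errno number to its symbolic name."""
    return errno.errorcode.get(code, "UNKNOWN")


def libc_syscall(name, *int_args):
    """Invoke a syscall by number through libc's syscall() wrapper.

    Only integer arguments are supported in this sketch. libc has already
    applied the -errno conversion, so failures arrive as -1 with errno set.
    """
    number = SYSCALL_NUMBERS[platform.machine()][name]
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ctypes.set_errno(0)
    result = libc.syscall(ctypes.c_long(number),
                          *(ctypes.c_long(a) for a in int_args))
    return result, (ctypes.get_errno() if result == -1 else 0)
```

On a typical Linux system, libc_syscall("getpid") should agree with os.getpid(), and libc_syscall("close", -1) should fail with an errno that errno_name renders as EBADF. The same table lookup with a different architecture key is all that changes between x86-64 and ARM64, which is the point of the ARM64 takeaway.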
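
For the vDSO takeaway, one way to see that the kernel really does map something into every process is to look for the [vdso] entry in /proc/self/maps. The following is a minimal Python sketch under that assumption (a Linux system with /proc mounted); find_vdso and ns_per_call are hypothetical helper names, and the timing helper gives only a rough, interpreter-dominated measurement.

```python
import os
import time


def find_vdso(maps_path="/proc/self/maps"):
    """Return the (start, end) address range of the [vdso] mapping, or None."""
    try:
        with open(maps_path) as maps:
            for line in maps:
                if line.rstrip().endswith("[vdso]"):
                    start, end = line.split()[0].split("-")
                    return int(start, 16), int(end, 16)
    except OSError:
        pass  # no /proc on this system, or the file is not readable
    return None


def ns_per_call(fn, calls=200_000):
    """Average wall-clock nanoseconds per call of fn (very rough)."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        fn()
    return (time.perf_counter_ns() - start) / calls
```

Comparing ns_per_call(lambda: time.clock_gettime(time.CLOCK_MONOTONIC)) against ns_per_call(os.getppid) contrasts a call that libc normally services through the vDSO with one that always enters the kernel. Python's own call overhead compresses the gap, so expect a smaller ratio than a C or assembly benchmark would show.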
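
The difference between MAP_PRIVATE and MAP_SHARED in the sys_mmap takeaway shows up clearly across a fork. This Python sketch uses the standard mmap module rather than the raw syscall; child_write_visible_to_parent is a hypothetical helper name, and the sketch assumes a Unix system where os.fork and MAP_ANONYMOUS are available.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def child_write_visible_to_parent(flags):
    """Map one anonymous page, fork, let the child write to it, and report
    (page_started_zeroed, parent_sees_child_write)."""
    region = mmap.mmap(-1, PAGE, flags=flags | mmap.MAP_ANONYMOUS)
    started_zeroed = region[:PAGE] == bytes(PAGE)

    pid = os.fork()
    if pid == 0:
        # Child: write into the mapping, then exit without running
        # any of the parent's cleanup code.
        try:
            region[0:5] = b"child"
        finally:
            os._exit(0)

    os.waitpid(pid, 0)          # parent: wait until the child has written
    seen = region[0:5] == b"child"
    region.close()
    return started_zeroed, seen
```

With mmap.MAP_PRIVATE the child's write lands in its own copy-on-write page, so the parent still reads zeros; with mmap.MAP_SHARED both processes are looking at the same physical page and the parent sees the child's bytes.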
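
The two outcomes of sys_execve (never returning on success, returning an error on failure) are the reason every launcher follows the same fork-then-exec shape. Below is a hedged Python sketch of that shape; spawn is a hypothetical helper, the exit code 127 for a failed exec is a shell convention adopted here as an assumption, and Python surfaces the negative errno as an OSError exception rather than a return value.

```python
import os
import sys


def spawn(path, argv, env=None):
    """fork + execve. Returns the child's exit status, or 127 if execve failed."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execve(path, argv, os.environ if env is None else env)
            # Not reached on success: this process image no longer exists.
        except OSError:
            # execve failed, so the original code is still running here.
            pass
        os._exit(127)

    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

For example, spawn(sys.executable, [sys.executable, "-c", "raise SystemExit(7)"]) replaces the child with a fresh interpreter that exits with status 7, while spawn on a path that does not exist falls through to the os._exit(127) line because the failed execve returned control to the caller.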
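
Because suspicious behavior has to surface as system calls, even a crude text filter over strace output can flag it. The sketch below is one possible approach, not a real detection tool: the RULES table is a hypothetical rule set covering only the examples named in the takeaway, and the regular expressions assume strace's default line format, optionally prefixed by a PID as produced with -f.

```python
import re

# Strip an optional "[pid 1234] " or "1234 " prefix added when following children.
PID_PREFIX = re.compile(r'^(?:\[pid\s+\d+\]\s+|\d+\s+)')

# Hypothetical rules mirroring the examples in the takeaway above.
RULES = [
    ("reads SSH key",
     re.compile(r'^open(at)?\(.*"[^"]*/\.ssh/id_[^"]*"')),
    ("network connection",
     re.compile(r'^connect\(\d+, \{sa_family=AF_INET6?\b')),
    ("anti-debugging check",
     re.compile(r'^ptrace\(PTRACE_TRACEME\b')),
    ("persistence write",
     re.compile(r'^open(at)?\(.*"[^"]*(\.bashrc|/cron[^"]*)".*O_(WRONLY|RDWR)')),
]


def flag_suspicious(trace_lines):
    """Return (line_number, label) for every trace line matching a rule."""
    findings = []
    for lineno, line in enumerate(trace_lines, 1):
        call = PID_PREFIX.sub("", line.strip())
        for label, pattern in RULES:
            if pattern.search(call):
                findings.append((lineno, label))
    return findings
```

A trace saved with strace -f -o trace.txt ./program can be passed in as open("trace.txt"). Real triage would also correlate events, for example a key read followed later by a connect from the same PID, which this line-by-line filter does not attempt.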