Case Study 20-2: Calling printf Without libc — printf's Implementation
Objective
Trace exactly what happens when your assembly code calls printf: from the PLT stub through the dynamic linker to the eventual write(2) system call. Use GDB to step through every layer. Understand that "calling printf" involves the dynamic linker, the C runtime, and ultimately the kernel.
Setup: A Simple Program
; trace_printf.asm — we'll trace this with GDB
extern printf
section .data
fmt: db "Hello, %s! Count=%d", 10, 0
name: db "World", 0
section .text
global _start
_start:
push rbp
mov rbp, rsp
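sub rsp, 8 ; RSP is 16-byte aligned at entry; push + sub keep it aligned at the call
mov edi, fmt ; arg 1: format string (32-bit absolute address, needs a non-PIE link)
mov esi, name ; arg 2: "World"
mov edx, 42 ; arg 3: the count
xor eax, eax ; AL = 0: this variadic call passes no vector-register arguments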
call printf
add rsp, 8
pop rbp
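; exit(0) via raw syscall. This skips libc's exit-time flush, so the text is only
; certain to appear when stdout is line-buffered (a terminal), not when piped.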
mov eax, 60
xor edi, edi
syscall
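The stack-alignment comment in the listing can be checked with a little arithmetic. The sketch below is a model in Python, not part of the program: it tracks RSP through the prologue, assuming the System V x86-64 convention that RSP is a multiple of 16 at process entry. The starting value `ENTRY_RSP` is hypothetical; only its low four bits matter.

```python
# Model of RSP through _start's prologue. ENTRY_RSP is a made-up value;
# any multiple of 16 gives the same alignment results.
ENTRY_RSP = 0x7FFFFFFFE000


def rsp_before_call(entry_rsp: int) -> int:
    """RSP at the moment `call printf` executes."""
    rsp = entry_rsp
    rsp -= 8  # push rbp
    rsp -= 8  # sub rsp, 8
    return rsp


def rsp_at_callee_entry(entry_rsp: int) -> int:
    """RSP on entry to the callee: `call` pushes an 8-byte return address."""
    return rsp_before_call(entry_rsp) - 8


print(rsp_before_call(ENTRY_RSP) % 16)      # 0: aligned at the call site
print(rsp_at_callee_entry(ENTRY_RSP) % 16)  # 8: what every callee expects on entry
```

Without the `sub rsp, 8`, the first value would be 8, and code inside printf that uses aligned SSE stores on the stack could fault.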
Build with dynamic linking:
nasm -f elf64 -o trace_printf.o trace_printf.asm
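gcc trace_printf.o -o trace_printf -nostartfiles -no-pie -Wl,-z,lazy
# -nostartfiles: don't include crt1.o etc., use our _start
# -no-pie: link at fixed addresses (0x401000...), which the plain `call printf` and the listings below assume
# -Wl,-z,lazy: request lazy binding explicitly, in case your toolchain defaults to -z now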
Layer 1: The call printf Instruction
When CALL printf executes, its target is not printf in libc. At link time the linker (not the assembler) pointed the call at printf's stub in the PLT (Procedure Linkage Table), because the real address of printf is unknown until libc is loaded.
Disassemble to see this:
objdump -d trace_printf | grep -A 5 'printf'
Output:
0000000000401020 <printf@plt>:
401020: ff 25 e2 2f 00 00 jmpq *0x2fe2(%rip) # GOT entry
401026: 68 00 00 00 00 pushq $0x0 # relocation index
40102b: e9 e0 ff ff ff jmpq 401010 <.plt> # to PLT[0]
So CALL printf actually calls printf@plt. This is the PLT stub.
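To see where the stub's two jumps lead, the displacement bytes in the objdump listing can be decoded by hand. The sketch below is a Python helper written for this case study (the function name `rip_relative_target` is my own). It relies on the x86-64 rule that a 32-bit displacement is relative to the address of the next instruction. The addresses and bytes are the illustrative ones from the listing above; yours will differ.

```python
import struct


def rip_relative_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Address referenced by an instruction ending in a signed 32-bit displacement.

    The displacement is relative to the end of the instruction,
    i.e. insn_addr + len(insn_bytes).
    """
    (disp,) = struct.unpack("<i", insn_bytes[-4:])
    return insn_addr + len(insn_bytes) + disp


# jmpq *0x2fe2(%rip): the result is the GOT slot the jump reads its target from.
got_slot = rip_relative_target(0x401020, bytes.fromhex("ff25e22f0000"))

# jmpq 401010: a plain relative jump, so the result is the jump target itself.
plt0 = rip_relative_target(0x40102B, bytes.fromhex("e9e0ffffff"))

print(hex(got_slot))  # 0x404008
print(hex(plt0))      # 0x401010
```

The first result is the GOT entry examined later in the GDB session; the second is PLT[0].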
Layer 2: The PLT Stub (First Call)
The PLT stub at 0x401020 does:
; printf@plt:
jmp qword [rip + GOT_printf_offset] ; indirect jump via GOT entry
push 0x0 ; relocation index (reached only while the GOT entry is still unresolved)
jmp PLT_0 ; jump to PLT[0] (the resolver)
On the first call to printf, the GOT entry for printf contains the address of the next instruction (push 0x0), not yet resolved. So the first instruction jmp [GOT] jumps to the push instruction — effectively falling through to the resolver logic.
On subsequent calls, the GOT entry has been updated to point directly to printf's address in libc, and the first jmp [GOT] goes directly there, skipping the resolver entirely. This is lazy binding.
Layer 3: PLT[0] — The Resolver
PLT[0] is special:
; PLT[0] — calls the dynamic linker's resolver
push qword [rel GOT+8] ; push link_map pointer (GOT[1])
jmp qword [rel GOT+16] ; jump (not call) to _dl_runtime_resolve (GOT[2])
GOT[0] = address of the .dynamic section
GOT[1] = link_map pointer (the runtime representation of the ELF binary)
GOT[2] = _dl_runtime_resolve (the dynamic linker's symbol resolution function)
So printf@plt → PLT[0] → _dl_runtime_resolve.
Layer 4: _dl_runtime_resolve in ld.so
The dynamic linker's resolver:
- Receives the link_map and relocation index (pushed by the PLT stub)
- Looks up the symbol "printf" in the dynamic symbol table
- Finds printf in libc.so.6
- Writes the address of printf into the GOT entry for printf
- Jumps to printf
After this runs, the GOT entry for printf is updated. Future calls to printf@plt will hit the first jmp [GOT] and go directly to printf — bypassing the resolver entirely.
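The control flow of Layers 2 through 4 can be modelled in a few lines. The following is a toy simulation in Python, not how ld.so is implemented: a dictionary stands in for the GOT, ordinary functions stand in for the PLT stub and the resolver, and every name (`make_plt_stub`, `resolver`, `LIBC_SYMBOLS`) is hypothetical.

```python
from typing import Callable, Dict

resolver_calls = 0


def libc_printf(fmt: str, *args) -> int:
    """Stand-in for printf in libc: format, emit, return the byte count."""
    text = fmt % args
    print(text, end="")
    return len(text.encode())


LIBC_SYMBOLS: Dict[str, Callable] = {"printf": libc_printf}  # toy symbol table
GOT: Dict[str, Callable] = {}                                # toy GOT


def resolver(name: str) -> Callable:
    """Stand-in for _dl_runtime_resolve: look up the symbol, patch the GOT."""
    global resolver_calls
    resolver_calls += 1
    target = LIBC_SYMBOLS[name]
    GOT[name] = target
    return target


def make_plt_stub(name: str) -> Callable:
    def slow_path(*args):
        # "push index; jmp PLT[0]": resolve, then jump to the real function.
        return resolver(name)(*args)

    GOT[name] = slow_path  # initial GOT value points back into the stub

    def stub(*args):
        return GOT[name](*args)  # "jmp [GOT]"

    return stub


printf_plt = make_plt_stub("printf")
first = printf_plt("Hello, %s! Count=%d\n", "World", 42)   # goes via the resolver
second = printf_plt("Hello, %s! Count=%d\n", "World", 42)  # goes straight to printf
print(first, second, resolver_calls)  # 23 23 1
```

Both calls return the same result, but the resolver runs only once: after the first call the table entry is the real function.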
GDB Session: Tracing the Full Path
gdb ./trace_printf
(gdb) break _start
Breakpoint 1 at 0x401040
(gdb) run
Breakpoint 1, 0x0000000000401040 in _start ()
(gdb) disassemble
Dump of assembler code for function _start:
0x401040: push %rbp
0x401041: mov %rsp,%rbp
0x401044: sub $0x8,%rsp
0x401048: mov $0x402004,%edi ; format string address
0x40104d: mov $0x40200e,%esi ; "World" address
0x401052: mov $0x2a,%edx ; 42
0x401057: xor %eax,%eax
0x401059: call 0x401020 <printf@plt> ← our call
...
(gdb) break *0x401059
(gdb) continue
(gdb) stepi ; step INTO the call — enters printf@plt
(gdb) x/4i $rip
=> 0x401020 <printf@plt>: jmpq *0x2fe2(%rip)
0x401026 <printf@plt+6>: push $0x0
0x40102b <printf@plt+11>: jmpq 0x401010
(gdb) stepi ; execute the jmp [GOT] — first call, goes to push
; We should now be at 0x401026 (the push, because GOT not yet resolved)
(gdb) x/1gx 0x404008 ; look at GOT entry for printf (address will vary)
0x404008: 0x0000000000401026 ; currently points to push instruction (lazy binding)
(gdb) stepi ; push $0x0 (relocation index)
(gdb) stepi ; jmp PLT[0]
; Now in PLT[0]:
(gdb) x/4i $rip
=> 0x401010: pushq 0x2ff2(%rip) ; push link_map
0x401016: jmpq *0x2ff4(%rip) ; jmp _dl_runtime_resolve
(gdb) stepi ; push link_map
(gdb) stepi ; jmp to _dl_runtime_resolve (jumps into ld.so)
; Now inside ld-linux-x86-64.so.2!
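(gdb) bt ; backtrace (the exact resolver name varies with CPU features)
#0 _dl_runtime_resolve_xsavec () from /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
#1 0x000000000040105e in _start ()
; No printf@plt frame appears: the PLT uses jmp, not call, so the only
; return address on the stack is the one pushed by the call in _start.
; The resolver never returns to the PLT. It patches the GOT and then jumps
; to printf, so stop there with a breakpoint rather than `finish`:
(gdb) break printf
(gdb) continue
(gdb) bt
#0 __printf (format=0x402004 "Hello, %s! Count=%d\n") in libc.so.6
#1 0x000000000040105e in _start ()
; Now inside the real printf in libc.so.6!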
After the first call, check the GOT again:
(gdb) x/1gx 0x404008 ; same GOT entry, now updated
0x404008: 0x7ffff7e4a210 ; now points to printf in libc!
On the second call to printf (if there were one), jmp [GOT] would jump directly to 0x7ffff7e4a210: no resolver, no PLT[0], just one indirect jump through the GOT.
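The value the resolver writes into the GOT is simply the run-time address of printf inside the mapped libc. As a rough cross-check outside GDB, the sketch below asks the dynamic loader for that address from a Python process using ctypes. This goes through dlopen/dlsym rather than _dl_runtime_resolve, the helper names are my own, and the address will differ from the one in the GDB session because it is a different process (and usually ASLR is on). The `/proc/self/maps` lookup assumes Linux.

```python
import ctypes


def symbol_address(symbol: str = "printf") -> int:
    """Run-time address of a symbol visible in this process."""
    process = ctypes.CDLL(None)  # handle covering the already-loaded libraries
    func = getattr(process, symbol)
    return ctypes.cast(func, ctypes.c_void_p).value


def mapping_for(address: int) -> str:
    """Line of /proc/self/maps that contains the address (Linux only)."""
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                start, end = (int(x, 16) for x in line.split()[0].split("-"))
                if start <= address < end:
                    return line.rstrip()
    except OSError:
        pass
    return "(mapping not found)"


addr = symbol_address("printf")
print(hex(addr))          # some address inside libc's executable mapping
print(mapping_for(addr))  # on Linux, typically a line naming libc.so.6
```

The mapping line plays the same role as GDB reporting that the patched GOT entry points into libc.so.6.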
Layer 5: printf in libc
Inside printf in libc, several things happen:
- printf calls vfprintf on stdout (which handles the variadic arguments via va_list)
- vfprintf formats the string into stdout's internal buffer
- The buffer is flushed through the FILE* stdout machinery (the same path fwrite uses); on a terminal, the trailing newline triggers the flush
- That path eventually calls write(1, buf, len), the Linux system call
- write is implemented as a thin wrapper around the SYSCALL instruction
(gdb) break __write ; or the specific syscall wrapper
(gdb) continue
; ... eventually hits the write syscall:
(gdb) x/5i $rip
=> libc: mov $0x1,%eax ; syscall number 1 = write
libc: syscall ; invoke kernel
On x86-64 Linux, write is syscall number 1. The libc wrapper executes SYSCALL with:
- RAX = 1 (write)
- RDI = 1 (fd = stdout)
- RSI = buffer address
- RDX = byte count (23 for this message)
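The arguments of that final system call are easy to reproduce. The sketch below, in Python, builds the same bytes printf would produce for this program and hands them to `os.write`, which on Linux goes through the C library's write wrapper. The helper names `printf_payload` and `write_all` are hypothetical, and Python's `%` formatting stands in for printf's formatter.

```python
import os


def printf_payload(name: str = "World", count: int = 42) -> bytes:
    """Bytes that printf("Hello, %s! Count=%d\\n", name, count) would emit."""
    return ("Hello, %s! Count=%d\n" % (name, count)).encode("ascii")


def write_all(fd: int, data: bytes) -> int:
    """Call write(fd, ...) until every byte is out; return the total written."""
    written = 0
    while written < len(data):
        written += os.write(fd, data[written:])  # may write fewer bytes than asked
    return written


payload = printf_payload()
print(len(payload))                    # 23: the value passed in RDX
bytes_written = write_all(1, payload)  # fd 1 = stdout, the value passed in RDI
```

The loop is there because write(2) is allowed to accept only part of the buffer; for a short message to a terminal, a single call normally suffices.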
The Full Call Chain
Assembly code:
CALL printf → calls printf@plt
printf@plt:
JMP [GOT_printf] → (first call) falls through to push + PLT[0]
→ (later calls) jumps directly to printf in libc
PLT[0]:
push link_map
JMP [GOT_resolver] → _dl_runtime_resolve in ld.so
_dl_runtime_resolve:
Looks up "printf" in libc.so.6
Updates GOT[printf] = &printf
Jumps to printf in libc
printf in libc:
Formats string
Calls fwrite → fflush → write_wrapper
write_wrapper in libc:
MOV EAX, 1 ; syscall number
SYSCALL ; invoke kernel
Linux kernel:
  Copies the buffer to the file behind fd 1 (stdout)
  Returns bytes written in RAX
Returns all the way back to printf
printf returns to assembly code
Total function call depth for one printf: approximately 8-15 frames, including the dynamic linker resolution on the first call.
Checking with ltrace
ltrace intercepts library calls at the PLT level — exactly the layer we traced:
ltrace ./trace_printf
# Output:
# printf("Hello, %s! Count=%d\n", "World", 42) = 23
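ltrace works by attaching to the process with ptrace and planting breakpoints on the PLT entries, so each library call is logged as it passes through its stub. LD_PRELOAD (Chapter 24) hooks the same boundary by a different route: the preloaded library is searched first during symbol resolution, so the GOT entry ends up pointing at the interposing function.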
Summary
A single CALL printf in assembly involves:
- The PLT stub: a 3-instruction trampoline in the binary's .plt section
- The GOT entry: initially points back into the PLT stub (at the push), which leads to the resolver; after the first call it points directly to printf
- The dynamic linker (_dl_runtime_resolve): runs only on the first call, resolves the symbol, and patches the GOT
- printf in libc: formats the string
- The write(2) system call: actually outputs to stdout via the kernel
After the first call, the PLT stub's jmp [GOT] goes directly to printf with no resolver overhead. Lazy binding means resolution costs nothing for library functions you never call and is paid only once, on first use, for each function you do call.
This mechanism is the foundation for GOT overwrite attacks (Chapter 36): if an attacker can write to the GOT entry for a function, every later call to that function through its PLT stub jumps to an address of the attacker's choosing instead of the real function.
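Why a single write is enough can be shown with a toy model. The sketch below is in Python and is not exploit code: a dictionary stands in for an already-resolved GOT, `printf_plt` stands in for the stub's `jmp [GOT]`, and all names are hypothetical.

```python
def real_printf(fmt: str, *args) -> str:
    """Stand-in for printf in libc."""
    return "printf: " + (fmt % args)


def attacker_payload(fmt: str, *args) -> str:
    """Stand-in for code the attacker wants to run."""
    return "attacker code ran"


GOT = {"printf": real_printf}  # entry already resolved by the first call


def printf_plt(*args) -> str:
    return GOT["printf"](*args)  # "jmp [GOT]": trusts whatever the slot holds


before = printf_plt("Count=%d", 42)
GOT["printf"] = attacker_payload    # the attacker's single memory write
after = printf_plt("Count=%d", 42)  # same call site, different destination

print(before)  # printf: Count=42
print(after)   # attacker code ran
```

No call site changes; every caller is redirected because they all share the one table entry.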