Case Study 24-2: Full PLT/GOT Trace with GDB — Seeing Every Instruction

Open Assembly Language Project

Objective

Execute a complete, instruction-level trace of a function call through the PLT and GOT mechanism, from call printf@plt in main through _dl_runtime_resolve in ld-linux.so, to the actual printf in libc.so. This case study uses real GDB commands on a real program to make the abstract PLT/GOT diagram concrete and verifiable.

Setup

// plt_trace.c
#include <stdio.h>

int main(void) {
    printf("First call\n");
    printf("Second call\n");
    return 0;
}

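# Note: default on modern Linux is PIE + partial RELRO.
# We compile without PIE for simpler absolute addresses in this trace.
# -fno-builtin-printf keeps GCC from turning these constant-string printf
# calls into puts (it does so even at -O0), so the PLT entry stays printf@plt:
gcc -g -no-pie -fno-builtin-printf -o plt_trace plt_trace.c
# Toolchains that default to -z now or CET-style PLTs may also need
# -Wl,-z,lazy -fcf-protection=none to reproduce the layout shown here.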

Phase 1: Static Analysis (Before Running)

Before touching GDB, understand the binary's structure:

# Find printf's PLT entry
objdump -d plt_trace | grep -A 4 "printf@plt"

0000000000401030 <printf@plt>:
  401030:  ff 25 d2 2f 00 00     jmp    QWORD PTR [rip+0x2fd2]  # 0x404008 <printf@got.plt>
  401036:  68 00 00 00 00        push   0x0
  40103b:  e9 e0 ff ff ff        jmp    0x401020 <.plt>

# Find PLT[0] (the resolver stub)
objdump -d plt_trace | grep -A 6 "<\.plt>:"

0000000000401020 <.plt>:
  401020:  ff 35 d2 2f 00 00     push   QWORD PTR [rip+0x2fd2]  # 0x403ff8 <_GLOBAL_OFFSET_TABLE_+0x8>
  401026:  ff 25 d4 2f 00 00     jmp    QWORD PTR [rip+0x2fd4]  # 0x404000 <_GLOBAL_OFFSET_TABLE_+0x10>
  40102c:  0f 1f 40 00           nop    DWORD PTR [rax+0x0]

# Find the GOT address for printf
readelf -r plt_trace | grep printf

000000404008  000200000007 R_X86_64_JUMP_SLOT  0000000000000000 printf@GLIBC_2.2.5 + 0
# GOT[printf] is at 0x404008

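# What's in .got.plt in the file, before running?
readelf -x .got.plt plt_trace | head -10
# Hex dump of section '.got.plt':
#   0x00403ff0 60304000 00000000 00000000 00000000  `0@.............
#   0x00404000 00000000 00000000 36104000 00000000  ........6.@.....
#
# GOT[0] (0x403ff0) = address of .dynamic, stored by the static linker
# GOT[1] (0x403ff8) and GOT[2] (0x404000) = zero in the file; at startup the
#   dynamic linker stores the link_map pointer and _dl_runtime_resolve there
# GOT[3] (0x404008) = printf's slot; the static linker already stored 0x401036,
#   the address of the push instruction in printf@plt (bytes are little-endian)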

Phase 2: Dynamic Trace (GDB Session)

$ gdb plt_trace
(gdb) # Show the PLT and GOT before execution
(gdb) break main
Breakpoint 1 at 0x401148: file plt_trace.c, line 4.

(gdb) run
Starting program: plt_trace
Breakpoint 1, main () at plt_trace.c:4

(gdb) # ── STEP 1: Examine GOT before first printf call ──
(gdb) x/xg 0x404008
0x404008 <printf@got.plt>:  0x0000000000401036
# GOT[printf] = 0x401036 = the "push 0x0" instruction in printf@plt
# The static linker stored this value in the file; until the first call
# resolves it, the slot points back into the PLT (lazy binding)

(gdb) # ── STEP 2: Disassemble call site ──
(gdb) disassemble main
Dump of assembler code for function main:
   0x0000000000401144 <+0>:  push   rbp
   0x0000000000401145 <+1>:  mov    rbp,rsp
   0x0000000000401148 <+4>:  lea    rdi,[rip+0xeb5]        # 0x402004 "First call\n"
   0x000000000040114f <+11>: call   0x401030 <printf@plt>  ← THIS
   0x0000000000401154 <+16>: lea    rdi,[rip+0xeb5]        # 0x402010 "Second call\n"
   0x000000000040115b <+23>: call   0x401030 <printf@plt>
   0x0000000000401160 <+28>: xor    eax,eax
   0x0000000000401162 <+30>: pop    rbp
   0x0000000000401163 <+31>: ret

(gdb) # ── STEP 3: Step into the call ──
(gdb) advance *0x40114f    # run up to the call instruction first
(gdb) stepi                # execute the call
0x0000000000401030 in printf@plt ()

(gdb) # Now at the PLT stub. Disassemble it:
(gdb) x/3i $rip
0x401030 <printf@plt>:    jmp QWORD PTR [rip+0x2fd2]  # jumps to [0x404008]
0x401036 <printf@plt+6>:  push 0x0
0x40103b <printf@plt+11>: jmp 0x401020 <.plt>

(gdb) # ── STEP 4: Execute the jmp — where does it go? ──
(gdb) stepi
# The jmp reads [0x404008] = 0x401036, so we land at 0x401036
0x0000000000401036 in printf@plt ()

(gdb) # We jumped to the push instruction — lazy binding path
(gdb) x/1i $rip
0x401036 <printf@plt+6>: push 0x0   # Push relocation index 0 (printf is slot 0)

(gdb) stepi    # Execute push
(gdb) stepi    # Execute jmp 0x401020 (PLT[0])
0x0000000000401020 in ?? ()

(gdb) # ── STEP 5: PLT[0] resolver stub ──
(gdb) x/3i $rip
0x401020: push QWORD PTR [rip+0x2fd2]  # Push link_map pointer from GOT[1]
0x401026: jmp QWORD PTR [rip+0x2fd4]   # Jump to _dl_runtime_resolve via GOT[2]
0x40102c: nop DWORD PTR [rax+0x0]

(gdb) # What's in GOT[1] and GOT[2]? They are the two slots just below GOT[printf]
(gdb) x/2xg 0x403ff8
0x403ff8: 0x00007ffff7ffe190   0x00007ffff7fe7680
#          ^^^GOT[1]=link_map    ^^^GOT[2]=_dl_runtime_resolve

(gdb) info symbol 0x00007ffff7fe7680
_dl_runtime_resolve_xsavec in section .text of /lib64/ld-linux-x86-64.so.2

(gdb) stepi    # push link_map
(gdb) stepi    # jmp _dl_runtime_resolve
# Now inside ld-linux.so:

0x00007ffff7fe7680 in _dl_runtime_resolve_xsavec ()
   from /lib64/ld-linux-x86-64.so.2

(gdb) # ── STEP 6: Inside _dl_runtime_resolve ──
# This is the dynamic linker. It will:
# 1. Find the relocation index (0, which we pushed) in the .rela.plt table
# 2. Find the symbol name ("printf") from the relocation
# 3. Look up "printf" in the symbol tables of the loaded objects (it is found in libc.so)
# 4. Write printf's address to GOT[printf] (0x404008)
# 5. Jump to printf

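(gdb) # The resolver does not return to its caller; it ends with a jump to the
(gdb) # resolved function. "finish" is therefore unreliable here (it typically
(gdb) # runs until control is back in main), so stop at printf's first instruction:
(gdb) advance *printf
0x00007ffff7c60d10 in printf () from /lib/x86_64-linux-gnu/libc.so.6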
# We are now at the real printf!

(gdb) # ── STEP 7: Verify GOT was updated ──
(gdb) x/xg 0x404008
0x404008 <printf@got.plt>:  0x00007ffff7c60d10
# GOT[printf] now contains the real printf address!
(gdb) info symbol 0x00007ffff7c60d10
printf in section .text of /lib/x86_64-linux-gnu/libc.so.6

(gdb) # ── STEP 8: Let printf run ──
(gdb) finish
Run till exit from printf ()
First call                   ← printf printed its output
main () at plt_trace.c:5

(gdb) # ── STEP 9: Second call — much simpler! ──
(gdb) break *0x40115b    # the second call instruction in main
Breakpoint 2 at 0x40115b: file plt_trace.c, line 5.

(gdb) continue
Breakpoint 2, 0x000000000040115b in main () at plt_trace.c:5

(gdb) stepi    # call 0x401030 <printf@plt>
0x0000000000401030 in printf@plt ()

(gdb) x/1i $rip
0x401030 <printf@plt>: jmp QWORD PTR [rip+0x2fd2]

(gdb) # What does [0x404008] contain now?
(gdb) x/xg 0x404008
0x404008 <printf@got.plt>:  0x00007ffff7c60d10
# It's the real printf address (updated by first call)

(gdb) stepi    # Execute jmp [0x404008]
0x00007ffff7c60d10 in printf () from /lib/x86_64-linux-gnu/libc.so.6
# Went DIRECTLY to printf: no push, no PLT[0], no _dl_runtime_resolve!

Summary of the Two Calls

Call 1 (lazy binding, first call):
  main
    → call printf@plt          [1 instruction]
    → jmp [GOT+printf]         [reads GOT = PLT stub]
    → push reloc_index=0       [identify which symbol]
    → jmp PLT[0]               [go to resolver]
    → push link_map            [arg for resolver]
    → jmp _dl_runtime_resolve  [enter dynamic linker]
    → _dl_runtime_resolve runs (~hundreds of instructions)
    → updates GOT[printf] = real printf
    → jmp printf               [finally reaches printf]
    Total: 6 instructions before the resolver, then hundreds inside it

Call 2 (already resolved):
  main
    → call printf@plt          [1 instruction]
    → jmp [GOT+printf]         [reads GOT = real printf]
    → printf                   [directly at printf]
    Total: 2 instructions (the call plus 1 extra indirect jmp)

Observing Full RELRO

Recompile with full RELRO:

gcc -g -no-pie -Wl,-z,relro,-z,now -o plt_trace_relro plt_trace.c

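(gdb) # With -z now, ALL symbols are resolved at startup
(gdb) # Look up the slot again first (readelf -r plt_trace_relro | grep printf):
(gdb) # under full RELRO the linker places it inside the read-only region, so it
(gdb) # is normally no longer at 0x404008. <slot> below stands for that address.
(gdb) break _start    # Program entry point: ld.so has finished, main has not run
(gdb) run

(gdb) x/xg <slot>      # GOT[printf] before main runs
<slot>:  0x00007ffff7c60d10   ← Already resolved! Real printf address.
# The dynamic linker bound it while processing relocations at startup;
# _dl_runtime_resolve is never involved.

(gdb) # Check the permissions of the mapping that contains <slot>:
(gdb) info proc mappings
# The mapping is read-only (r--p in /proc/<pid>/maps): mprotect was applied
# after relocation. Do not test this with a GDB "set" command: GDB writes
# through ptrace, which bypasses page protections, so that write would succeed.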

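This confirms that full RELRO resolves all symbols at startup (no lazy binding) and then marks the GOT read-only. A GOT overwrite attempted from inside the process, for example through a memory-corruption bug, faults with SIGSEGV instead of redirecting the call.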

The ltrace Perspective

ltrace gives the same information at a higher level:

ltrace -e printf ./plt_trace
# printf("First call\n")         = 11
# printf("Second call\n")        = 12

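ltrace works by intercepting PLT calls: it attaches to the process with ptrace and plants breakpoints on the PLT entries, logs each call when a breakpoint is hit, then lets the real function run. It does not rewrite the GOT. LD_PRELOAD is a different mechanism again: the preloaded library is placed first in the symbol search order, so the dynamic linker's normal resolution stores the interposing function's address in the GOT slot.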

# See all library calls:
ltrace ./plt_trace
# __libc_start_main(0x401144, 1, 0x7fffffffe3d8, 0x401170, ...)   = 0
# printf("First call\n")  = 11
# printf("Second call\n") = 12
# +++ exited (status 0) +++

Key Observations from the Trace

The GOT is writable without full RELRO: Any code that can write to address 0x404008 can redirect all subsequent printf calls to arbitrary code.
The resolver runs exactly once per symbol: After the first call, _dl_runtime_resolve is never called again for that symbol. The PLT stub and its GOT slot together behave like a self-patching stub: the code never changes, only the slot does.
Partial RELRO does not protect .got.plt: The lazy-binding entries in .got.plt remain writable under partial RELRO. Full RELRO (-z relro -z now) is required to protect them.
The overhead of the PLT is one indirect jmp: After first resolution, the PLT adds exactly one indirect jump (a load from the GOT plus the jmp) per call. On modern processors this is usually cheap, because the indirect branch predictor learns the jump target and the GOT slot stays in cache.
ASLR randomizes most addresses: GDB disables ASLR by default, which is why the libc and ld.so addresses in this trace are stable between runs. Outside GDB, printf's address in libc, the link_map, and _dl_runtime_resolve change on every run; the PLT and GOT addresses change too when the executable is built as PIE, but stay fixed in the -no-pie build used here. The PLT/GOT mechanism works correctly regardless because it uses relative addresses and runtime-populated tables.

Summary

This GDB trace turned the abstract PLT/GOT diagram into observed machine behavior:
- GOT entry starts as a PLT stub address (lazy binding placeholder)
- First call: PLT stub → PLT[0] → _dl_runtime_resolve → GOT update → real function
- Second call: PLT stub → GOT (now has real address) → real function directly
- Full RELRO: GOT pre-populated at startup, then write-protected, so an in-process GOT overwrite faults
- ltrace observes the same PLT calls from outside the process, using ptrace breakpoints on the PLT entries

The PLT/GOT mechanism is elegant: it achieves position independence, lazy binding, and minimal overhead with a three-instruction stub per function, of which only the first instruction, a single indirect jmp, executes on each call after resolution.