Case Study 1.1: Hello World — From C to Binary
Tracing a simple program through every stage of the toolchain
Overview
The goal of this case study is to follow a "Hello, World!" program from C source code all the way to the actual bytes that execute on the CPU, examining every transformation in detail. We'll look at the ELF file format, the machine code encoding, how the program gets loaded, and what the kernel sees when we call write().
This is a single program examined with six different tools. By the end, you should be able to look at any compiled binary and understand what you're seeing.
The Source
// hello.c
#include <stdio.h>
int main(void) {
printf("Hello, World!\n");
return 0;
}
We compile it with: gcc -O0 hello.c -o hello
Stage 1: Preprocessing
Running gcc -E hello.c expands the #include <stdio.h> directive. The output is several hundred lines of function declarations and type definitions, ending with our unchanged main function:
// (... ~750 lines of stdio.h declarations ...)
int main(void) {
printf("Hello, World!\n");
return 0;
}
The preprocessor has replaced the #include with the actual content of /usr/include/stdio.h plus all the headers it includes (stddef.h, bits/types.h, etc.). This is why "just adding an #include" can meaningfully slow down compilation — the compiler has to parse all of it.
Key observation: printf is declared (its signature is in stdio.h) but not defined here. The definition is in the C standard library (libc.so.6), which the linker will connect us to.
Stage 2: Compilation — C to Assembly
Running gcc -O0 -S hello.c -o hello.s produces:
# gcc -O0 -S hello.c output (annotated)
# x86-64 Linux, GCC 12.2
.file "hello.c"
.section .rodata # read-only data section
.LC0:
.string "Hello, World!" # the string literal (null-terminated)
.text
.globl main
.type main, @function
main:
endbr64 # Intel CET: indirect branch target marker
pushq %rbp # save caller's frame pointer
movq %rsp, %rbp # set up our frame pointer
leaq .LC0(%rip), %rax # rax = address of "Hello, World!" (RIP-relative)
movq %rax, %rdi # rdi = argument 1 for puts()
call puts@PLT # call puts() via the PLT
movl $0, %eax # return value = 0
popq %rbp # restore caller's frame pointer
ret # return
Several things to note:
GCC optimized printf to puts: Since the format string contains no % specifiers and ends with \n, GCC knows that printf("Hello, World!\n") is equivalent to puts("Hello, World!"). The puts function automatically adds a newline, so the \n is stripped from the string literal (hence .string "Hello, World!" without the newline). This is a legitimate optimization — puts is simpler than printf and typically faster.
RIP-relative addressing: The instruction leaq .LC0(%rip), %rax uses RIP-relative addressing — the address of the string is computed as an offset from the instruction pointer. This is how 64-bit position-independent code works. The compiled code can run at any address in the address space and the string reference is still correct.
PLT (Procedure Linkage Table): The call is puts@PLT rather than just puts. The PLT is a mechanism for calling shared library functions. With lazy binding, the first call through the PLT stub invokes the dynamic linker, which resolves puts to its actual address in libc.so.6 and stores that address in the GOT (Global Offset Table). On subsequent calls, the PLT stub jumps directly to the stored address.
endbr64: This instruction is part of Intel CET (Control-flow Enforcement Technology), a security feature. Valid indirect branch targets are marked with endbr64. Processors that support CET will fault if an indirect branch (like call *%rax) targets an instruction that isn't marked with endbr64. This is a hardware-level control flow integrity check.
The function is in AT&T syntax (GCC's default). For those more comfortable with Intel syntax, the same function reads:
; Intel syntax equivalent (for reference)
main:
endbr64
push rbp
mov rbp, rsp
lea rax, [rip + .LC0]
mov rdi, rax
call puts@PLT
mov eax, 0
pop rbp
ret
Stage 3: Assembly — Assembly to Object File
Running gcc -c hello.c -o hello.o produces an ELF64 object file. We inspect it:
$ file hello.o
hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV)
$ objdump -d hello.o
hello.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # f <main+0xf>
f: 48 89 c7 mov %rax,%rdi
12: e8 00 00 00 00 call 17 <main+0x17>
17: b8 00 00 00 00 mov $0x0,%eax
1c: 5d pop %rbp
1d: c3 ret
The relocation entries: Notice that lea 0x0(%rip), %rax has 00 00 00 00 as the offset — zero! And call 17 <main+0x17> has all zeros in its displacement. These are relocations — placeholders that the linker will fill in with the actual addresses.
To see the relocation table:
$ readelf -r hello.o
Relocation section '.rela.text' at offset 0x228 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000b 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
000000000013 000600000004 R_X86_64_PLT32 0000000000000000 puts - 4
Two relocations:
1. Offset 0xb (the 4-byte offset in the lea instruction): fill in the RIP-relative address of the .rodata string when linking
2. Offset 0x13 (the 4-byte displacement in the call instruction): fill in the PLT-relative address of puts when linking
The object file is incomplete. It knows it needs the string and puts, but it doesn't know their addresses yet.
Stage 4: Linking — Producing the Executable
Running gcc hello.o -o hello invokes the linker (ld via GCC's driver). The linker:
- Combines .text, .data, .rodata, and .bss sections from all input objects
- Resolves relocations (fills in the placeholder addresses)
- Generates the PLT and GOT (Global Offset Table) for shared library calls
- Writes the ELF executable with the correct headers
Let's examine the result:
$ file hello
hello: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=..., for GNU/Linux 3.2.0,
not stripped
$ size hello
text data bss dec hex filename
1503 600 8 2111 83f hello
The size command reports three categories plus a total:
- text: 1503 bytes of code and read-only data (.text, .rodata, the PLT, and other read-only sections)
- data: 600 bytes of initialized writable data (.data, the GOT, .dynamic, etc.)
- bss: 8 bytes of zero-initialized data
- Total: 2111 bytes (0x83f)
For a "hello world" program, 1503 bytes of text seems like a lot. Where does it come from?
$ objdump -d hello | grep "^[0-9a-f]* <"
0000000000001020 <_start>:
0000000000001040 <deregister_tm_clones>:
0000000000001070 <register_tm_clones>:
00000000000010a0 <__do_global_dtors_aux>:
00000000000010c0 <frame_dummy>:
00000000000010c9 <main>:
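The executable contains not just main, but also the C runtime startup code: _start (the true entry point, which calls __libc_start_main, which in turn calls main), __do_global_dtors_aux (runs global destructors at exit), frame_dummy (runs at startup and calls register_tm_clones), and register_tm_clones and deregister_tm_clones (for GCC's transactional memory support). _start comes from glibc's crt1.o (Scrt1.o for a PIE); the other helpers come from GCC's crtbegin.o. GCC links these startup files, along with crti.o, crtn.o, and crtend.o, automatically.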
Now let's look at the resolved main:
$ objdump -d hello | grep -A 20 "<main>:"
00000000000010c9 <main>:
10c9: f3 0f 1e fa endbr64
10cd: 55 push %rbp
10ce: 48 89 e5 mov %rsp,%rbp
10d1: 48 8d 05 2c 0f 00 00 lea 0xf2c(%rip),%rax # 2004 <_IO_buf_end@@...>
10d8: 48 89 c7 mov %rax,%rdi
10db: e8 f0 fe ff ff call fd0 <puts@plt>
10e0: b8 00 00 00 00 mov $0x0,%eax
10e5: 5d pop %rbp
10e6: c3 ret
The relocations have been filled in:
- lea 0xf2c(%rip), %rax: the string "Hello, World!" is at 0x10d8 + 0xf2c = 0x2004 (address of the next instruction + offset)
- call e8 f0 fe ff ff: this is a relative call to puts@plt. The displacement bytes f0 fe ff ff are little-endian for -0x110; added to the address of the next instruction (0x10e0), they give 0x10e0 - 0x110 = 0xfd0, which is the PLT entry for puts
Stage 5: The ELF File Structure
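The ELF (Executable and Linkable Format) file has three levels of structure: the ELF header, the program headers (segments, used by the loader), and the section headers (used by the linker and by inspection tools). We start with the ELF header: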
$ readelf -h hello
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 ... (ELF, 64-bit, little-endian, System V ABI)
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Entry point address: 0x1020
Start of program headers: 64 (bytes into file)
Start of section headers: 13584 (bytes into file)
Number of section headers: 31
Section header string table index: 30
The entry point is 0x1020, which is the address of _start, not main. The C runtime's _start function calls main after setting up the environment.
$ readelf -S hello | grep -E "\.interp|\.text|\.data|\.rodata|\.bss"
[ 1] .interp PROGBITS 0000000000000318
[14] .text PROGBITS 0000000000001020
[17] .rodata PROGBITS 0000000000002000
[24] .data PROGBITS 0000000000003de8
[25] .bss NOBITS 0000000000003df0
The .rodata section starts at 0x2000 and contains our string. Let's verify:
$ readelf -p .rodata hello
String dump of section '.rodata':
[     4]  Hello, World!
There it is, four bytes into the section at 0x2004, which matches the lea target computed in Stage 4. The string is 14 bytes: "Hello, World!" (13 chars) plus the null terminator.
Stage 6: The Running Process
When we execute ./hello, the kernel's execve system call loads the ELF binary:
- The kernel maps .text as read-execute
- The kernel maps .rodata as read-only
- The kernel maps .data and .bss as read-write
- The kernel sets up the stack with command-line arguments and environment
- The kernel jumps to the dynamic linker (/lib64/ld-linux-x86-64.so.2) specified in the .interp section
- The dynamic linker maps libc.so.6 into the process address space and resolves puts (either immediately or lazily on the first call, depending on how the binary was linked)
- The dynamic linker jumps to _start at 0x1020 (plus the load base, since this is a PIE)
- _start calls __libc_start_main, which calls main
- main calls puts through the PLT
- puts calls write(1, "Hello, World!\n", 14) via a system call
- The kernel writes 14 bytes to stdout (connected to the terminal)
- main returns 0, __libc_start_main calls exit(0), which ends the process through _exit and the exit_group system call
The actual bytes that travel from the binary to the screen are 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 0a — the ASCII codes for "Hello, World!\n".
What We Learned
This trace reveals several things that matter for assembly programmers:
- The compiler makes choices you didn't make: GCC silently replaced printf with puts. In a security context, this matters: if you're patching a binary to call a different function, you need to know which function was actually compiled in.
- Object files are incomplete: They contain placeholder addresses that only the linker can fill. This is why you can't run an object file: it references addresses that don't exist yet.
- The true entry point is not main: It's _start, which is part of the C runtime. When writing assembly programs without the C runtime (like the MinOS kernel), you define _start yourself.
- The machine code is surprisingly compact: The entire main function is 30 bytes (0x10c9 to 0x10e6). A "hello world" function is smaller than most PDF icons.
- Every byte has meaning: The f3 0f 1e fa that starts every function with CET enabled, the 48 REX prefix that indicates 64-bit operand size, the relative displacement in the call instruction. Nothing is arbitrary.
Practical Application
To trace any C function through this process yourself:
# 1. Compile to assembly
gcc -O0 -S -fno-asynchronous-unwind-tables yourfile.c -o yourfile.s
# (-fno-asynchronous-unwind-tables removes .cfi_ directives for cleaner output)
# 2. Compile to object and show relocations
gcc -c yourfile.c -o yourfile.o
readelf -r yourfile.o
# 3. Link and show all functions
gcc yourfile.c -o yourfile
objdump -d yourfile | grep "^[0-9a-f]* <"
# 4. Show string contents
readelf -p .rodata yourfile
readelf -p .data yourfile
Understanding this pipeline is the foundation for every assembly skill that follows.