Chapter 23: Linking, Loading, and ELF

What Happens After Assembly

You have written assembly. You have run the assembler and produced a .o file. Now what? The journey from object file to running process involves two more tools: the linker (ld) and the loader (the kernel + dynamic linker). Understanding this pipeline is not merely academic — it determines what symbols are visible, how relocations work, why linking fails with "undefined reference," what the ELF format actually contains, and how the OS maps your program into memory.

This chapter tears open the black box between as and execution.


23.1 Object Files: The Currency of Linking

An object file (.o) is the direct output of the assembler. It contains:
  • Machine code (not yet fully linked: addresses are incomplete)
  • Symbols (names and their values/addresses)
  • Relocations (instructions to the linker: "fill in this address later")
  • Section data (.text, .data, .bss, .rodata, etc.)

Object files are the inputs to the linker. The linker combines multiple .o files (and .a archives of them) into a single executable or shared library.

source.c  →  [cc1]  →  source.s
source.s  →  [as]   →  source.o   ← object file
source.o  →  [ld]   →  a.out      ← executable or shared library
a.out     →  [exec syscall + ld.so] → running process

Examining an Object File

# Compile to object file only
gcc -c -O0 example.c -o example.o

# List sections
readelf -S example.o

# List symbols
readelf -s example.o

# List relocations
readelf -r example.o

# Disassemble
objdump -d example.o

23.2 ELF Format: Executable and Linkable Format

ELF (Executable and Linkable Format) is the standard binary format on Linux, FreeBSD, and most Unix-like systems. The same format serves three purposes:

  1. Relocatable object file (.o) — output of assembler, input to linker
  2. Executable — output of linker, directly executable
  3. Shared library (.so) — position-independent code, loaded by dynamic linker

ELF File Layout

ELF File
╔══════════════════════════════════════╗
║  ELF Header (64 bytes)               ║  ← magic, type, arch, entry point
║    e_ident[16]   magic + class + ABI ║
║    e_type        ET_REL/EXEC/DYN     ║
║    e_machine     EM_X86_64 (0x3E)    ║
║    e_entry       entry point address ║
║    e_phoff       program hdr offset  ║
║    e_shoff       section hdr offset  ║
╠══════════════════════════════════════╣
║  Program Header Table (executables)  ║  ← segments for loader
║    PT_LOAD: .text + .rodata          ║  ← read+execute
║    PT_LOAD: .data + .bss             ║  ← read+write
║    PT_DYNAMIC: dynamic linking info  ║
║    PT_GNU_STACK: stack permissions   ║
╠══════════════════════════════════════╣
║  .text section                       ║  ← machine code
╠══════════════════════════════════════╣
║  .rodata section                     ║  ← read-only data (string literals)
╠══════════════════════════════════════╣
║  .data section                       ║  ← initialized global/static variables
╠══════════════════════════════════════╣
║  .bss section                        ║  ← uninitialized variables (size only, no bytes)
╠══════════════════════════════════════╣
║  .symtab section                     ║  ← symbol table
╠══════════════════════════════════════╣
║  .strtab section                     ║  ← string table (symbol names)
╠══════════════════════════════════════╣
║  .rel.text / .rela.text section      ║  ← relocations for .text
╠══════════════════════════════════════╣
║  .debug_* sections (if -g)           ║  ← DWARF debug info
╠══════════════════════════════════════╣
║  Section Header Table                ║  ← index of all sections
╚══════════════════════════════════════╝

ELF Header Fields

#include <stdint.h>

// From /usr/include/elf.h (simplified: the real header uses typedefs such as Elf64_Half and Elf64_Addr)
typedef struct {
    unsigned char e_ident[16];  // Magic: \x7fELF + class + data + version + OS/ABI
    uint16_t  e_type;           // ET_NONE=0, ET_REL=1, ET_EXEC=2, ET_DYN=3, ET_CORE=4
    uint16_t  e_machine;        // EM_X86_64=62, EM_AARCH64=183, EM_RISCV=243
    uint32_t  e_version;        // EV_CURRENT=1
    uint64_t  e_entry;          // Entry point virtual address (0 for .o files)
    uint64_t  e_phoff;          // Program header table offset
    uint64_t  e_shoff;          // Section header table offset
    uint32_t  e_flags;          // Architecture-specific flags
    uint16_t  e_ehsize;         // Size of this header (64 for ELF64)
    uint16_t  e_phentsize;      // Size of one program header entry (56)
    uint16_t  e_phnum;          // Number of program header entries
    uint16_t  e_shentsize;      // Size of one section header entry (64)
    uint16_t  e_shnum;          // Number of section header entries
    uint16_t  e_shstrndx;       // Index of section name string table section
} Elf64_Ehdr;
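To see these fields on a real file, here is a minimal C sketch. It assumes a Linux system with glibc's <elf.h> and uses the real Elf64_Ehdr from that header instead of the simplified struct above. The helper names read_elf_header and elf_type_name are hypothetical.

```c
#include <elf.h>     /* real Elf64_Ehdr, ELFMAG, SELFMAG, ET_* constants */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the 64-byte header at the start of `path`.
   Returns 0 on success, -1 if the file is unreadable or not ELF64. */
int read_elf_header(const char *path, Elf64_Ehdr *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(out, 1, sizeof *out, f);
    fclose(f);
    if (n != sizeof *out)
        return -1;                                  /* file shorter than the header */
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                                  /* no \x7fELF magic */
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;                                  /* ELF32 uses Elf32_Ehdr instead */
    return 0;
}

/* Hypothetical helper: human-readable name for e_type. */
const char *elf_type_name(uint16_t e_type)
{
    switch (e_type) {
    case ET_REL:  return "REL (relocatable object)";
    case ET_EXEC: return "EXEC (fixed-address executable)";
    case ET_DYN:  return "DYN (shared library or PIE)";
    case ET_CORE: return "CORE (core dump)";
    default:      return "other";
    }
}
```

Calling read_elf_header("/bin/ls", &h) from main and then elf_type_name(h.e_type) typically reports DYN on distributions that build executables as PIE.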

Reading the magic bytes of an ELF file:

xxd -l 16 /bin/ls
# 00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
#           ^^^^^^^^^                                ← \x7fELF (magic)
#                     ^^                             ← 02 = 64-bit (ELFCLASS64)
#                       ^^                           ← 01 = little-endian (ELFDATA2LSB)
#                          ^^                        ← 01 = ELF version (EV_CURRENT)
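The class and data bytes can be decoded with the EI_* index constants and the ELFCLASS*/ELFDATA2* values that glibc's <elf.h> defines. Here is a small sketch; the helper names are hypothetical.

```c
#include <elf.h>
#include <string.h>

/* Hypothetical helper: byte 4 (EI_CLASS) gives the word size. */
const char *ident_class(const unsigned char ident[EI_NIDENT])
{
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return "32-bit";
    case ELFCLASS64: return "64-bit";
    default:         return "invalid class";
    }
}

/* Hypothetical helper: byte 5 (EI_DATA) gives the byte order. */
const char *ident_data(const unsigned char ident[EI_NIDENT])
{
    switch (ident[EI_DATA]) {
    case ELFDATA2LSB: return "little-endian";
    case ELFDATA2MSB: return "big-endian";
    default:          return "invalid encoding";
    }
}

/* Hypothetical helper: bytes 0-3 must be 0x7f 'E' 'L' 'F' (ELFMAG, SELFMAG == 4). */
int ident_is_elf(const unsigned char ident[EI_NIDENT])
{
    return memcmp(ident, ELFMAG, SELFMAG) == 0;
}
```

Given the 16 bytes in the dump above, these report an ELF file that is 64-bit and little-endian.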

23.3 Sections: The Building Blocks

ELF files are organized into sections. Each section has a type, flags (allocatable, writable, executable), and content.

Core Sections

Section     Type (SHT_)  Flags             Content
.text       PROGBITS     ALLOC+EXECINSTR   Machine code
.data       PROGBITS     ALLOC+WRITE       Initialized global/static variables
.bss        NOBITS       ALLOC+WRITE       Uninitialized or zero-initialized variables (zero-filled at load)
.rodata     PROGBITS     ALLOC             Read-only constants, string literals
.symtab     SYMTAB       (none)            Symbol table (name → address mapping)
.strtab     STRTAB       (none)            String table for symbol names
.shstrtab   STRTAB       (none)            String table for section names
.rel.X      REL          INFO_LINK         Relocations for section X (no addend)
.rela.X     RELA         INFO_LINK         Relocations for section X (with addend)
.debug_*    PROGBITS     (none)            DWARF debug information
.note.*     NOTE         (none)            Build ID, OS notes
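Section names are not stored in the section headers themselves. Each header's sh_name is an offset into .shstrtab, and the ELF header's e_shstrndx gives the index of that section. Below is a minimal C sketch of a lookup by name. It assumes a well-formed 64-bit ELF file that it reads fully into memory. find_section is a hypothetical helper and does only basic bounds checking.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the header of the section called `name`
   into *out. Returns 0 if found, -1 otherwise. */
int find_section(const char *path, const char *name, Elf64_Shdr *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    rewind(f);
    unsigned char *buf = (size > 0) ? malloc((size_t)size) : NULL;
    int rc = -1;
    if (buf != NULL && fread(buf, 1, (size_t)size, f) == (size_t)size
        && (size_t)size >= sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr eh;
        memcpy(&eh, buf, sizeof eh);            /* memcpy avoids alignment issues */
        size_t table_end = eh.e_shoff + (size_t)eh.e_shnum * sizeof(Elf64_Shdr);
        if (eh.e_shnum > 0 && table_end <= (size_t)size && eh.e_shstrndx < eh.e_shnum) {
            Elf64_Shdr names;                   /* header of .shstrtab */
            memcpy(&names, buf + eh.e_shoff + eh.e_shstrndx * sizeof(Elf64_Shdr),
                   sizeof names);
            for (size_t i = 0; i < eh.e_shnum; i++) {
                Elf64_Shdr sh;
                memcpy(&sh, buf + eh.e_shoff + i * sizeof sh, sizeof sh);
                /* sh_name is an offset into .shstrtab (names assumed NUL-terminated) */
                size_t off = names.sh_offset + sh.sh_name;
                if (off < (size_t)size && strcmp((const char *)buf + off, name) == 0) {
                    *out = sh;
                    rc = 0;
                    break;
                }
            }
        }
    }
    free(buf);
    fclose(f);
    return rc;
}
```

In a program that has a ".bss" section, looking it up returns a header with sh_type == SHT_NOBITS, as the table shows.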

.bss: The Zero-Cost Section

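.bss (Block Started by Symbol) is a special section that occupies no bytes in the file; only its size is recorded. At load time, the kernel maps zero-filled pages for the .bss region. This means a 4 MB array of global zeros takes 0 bytes of section data in the .o file. That holds for GCC 10 and later, which default to -fno-common; older GCC emits such a global as a COMMON symbol, which the linker later places in .bss: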

// This goes in .bss: zero bytes in the file
int big_array[1000000];    // 4 MB in memory, 0 bytes in .o

// This goes in .data: 4 bytes in the file
int initialized = 42;      // 4 bytes in both .o and in memory

# Verify: see section sizes
size example.o
#    text    data     bss     dec     hex filename
#     245      12 4000000 4000257  3d0a01 example.o
#                 ^^^^^^^
#            4 MB in .bss, but the file is tiny

Segment vs. Section

Sections are for the linker; segments are for the loader:
  • Section: named unit of content (.text, .data, .bss)
  • Segment (program header entry): memory-mapping instructions for the OS loader

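The linker combines sections into segments based on permissions. Sections with ALLOC+EXECINSTR go into a read-execute PT_LOAD segment, and sections with ALLOC+WRITE go into a read-write PT_LOAD segment. Read-only data such as .rodata has traditionally shared the read-execute segment, as in the diagram below. GNU ld 2.31 and later on x86-64 Linux default to -z separate-code, which gives .rodata its own read-only segment.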

Sections                     Segments (PT_LOAD)
.text   (rx)    ─────────►  LOAD [r-x]  0x400000 – 0x401fff
.rodata (r)     ─────────►
.data   (rw)    ─────────►  LOAD [rw-]  0x402000 – 0x402fff
.bss    (rw)    ─────────►
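The loader's view can be read directly from the program header table at e_phoff. This sketch assumes a 64-bit ELF file and glibc's <elf.h>, and list_load_segments is a hypothetical helper. It prints each PT_LOAD segment's permissions from p_flags. Where memsz exceeds filesz, the difference is zero-filled memory, which is where .bss ends up.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every PT_LOAD segment and return how many
   there are, or -1 on error. A .o file has no program headers (returns 0). */
int list_load_segments(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    Elf64_Ehdr eh;
    int count = -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 && memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0) {
        count = 0;
        if (eh.e_phnum > 0 && (eh.e_phentsize != sizeof(Elf64_Phdr)
                               || fseek(f, (long)eh.e_phoff, SEEK_SET) != 0))
            count = -1;
        for (int i = 0; count >= 0 && i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            if (fread(&ph, sizeof ph, 1, f) != 1) {
                count = -1;
                break;
            }
            if (ph.p_type != PT_LOAD)
                continue;
            printf("LOAD [%c%c%c] vaddr 0x%llx filesz 0x%llx memsz 0x%llx\n",
                   (ph.p_flags & PF_R) ? 'r' : '-',
                   (ph.p_flags & PF_W) ? 'w' : '-',
                   (ph.p_flags & PF_X) ? 'x' : '-',
                   (unsigned long long)ph.p_vaddr,
                   (unsigned long long)ph.p_filesz,
                   (unsigned long long)ph.p_memsz);
            count++;
        }
    }
    fclose(f);
    return count;
}
```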

23.4 Symbol Tables: Names and Addresses

The symbol table maps each name to a value (usually an address) along with its size, type, and binding. Every function, every global or static variable, and every external reference is a symbol.

Symbol Structure

#include <stdint.h>

typedef struct {
    uint32_t  st_name;   // Index into string table (.strtab)
    uint8_t   st_info;   // Binding (LOCAL/GLOBAL/WEAK) + Type (FUNC/OBJECT/NOTYPE)
    uint8_t   st_other;  // Visibility (DEFAULT/PROTECTED/HIDDEN)
    uint16_t  st_shndx;  // Section index (SHN_UNDEF=0 for undefined)
    uint64_t  st_value;  // Address/value (offset within section for .o files)
    uint64_t  st_size;   // Size of the symbol (function body size, etc.)
} Elf64_Sym;

Symbol Binding

  • LOCAL (STB_LOCAL): Visible only within the object file. C static functions/variables.
  • GLOBAL (STB_GLOBAL): Visible to all object files. Default for non-static globals.
  • WEAK (STB_WEAK): Like global, but can be overridden by a GLOBAL symbol of the same name.

Symbol Type

  • FUNC (STT_FUNC): A function (code symbol).
  • OBJECT (STT_OBJECT): A data variable.
  • NOTYPE (STT_NOTYPE): Unspecified (often external/undefined references).
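Binding and type share the single st_info byte: binding occupies the high four bits and type the low four. glibc's <elf.h> provides ELF64_ST_BIND, ELF64_ST_TYPE, and ELF64_ST_INFO for packing and unpacking them. Here is a sketch of the decoding, with hypothetical helper names:

```c
#include <elf.h>

/* Hypothetical helper: binding lives in the high nibble of st_info. */
const char *sym_bind_name(unsigned char st_info)
{
    switch (ELF64_ST_BIND(st_info)) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "OTHER";
    }
}

/* Hypothetical helper: type lives in the low nibble of st_info. */
const char *sym_type_name(unsigned char st_info)
{
    switch (ELF64_ST_TYPE(st_info)) {
    case STT_NOTYPE:  return "NOTYPE";
    case STT_OBJECT:  return "OBJECT";
    case STT_FUNC:    return "FUNC";
    case STT_SECTION: return "SECTION";
    case STT_FILE:    return "FILE";
    default:          return "OTHER";
    }
}
```

A global function such as add has st_info equal to ELF64_ST_INFO(STB_GLOBAL, STT_FUNC), which is 0x12.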

Reading Symbol Tables

readelf -s example.o
# Num:    Value          Size Type    Bind   Vis      Ndx Name
#   0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
#   1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS example.c
#   2: 0000000000000000    32 FUNC    GLOBAL DEFAULT    1 add
#   3: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
#                                                     ^^^
#                                              UND = undefined (needs to be resolved)

# nm is the traditional tool
nm -C example.o
# 0000000000000000 T add        ← T = .text (defined function)
#                  U printf     ← U = undefined (external reference)

23.5 Relocations: Instructions to the Linker

When the assembler generates code that references a symbol (a function call, a global variable access), it does not know the final address. Instead, it writes a placeholder and records a relocation entry: "at this offset in .text, fill in the address of symbol X."

Relocation Structure (RELA format, with addend)

#include <stdint.h>

typedef struct {
    uint64_t  r_offset;  // Where to apply the relocation (offset within section)
    uint64_t  r_info;    // Symbol index + relocation type (packed together)
    int64_t   r_addend;  // Addend for the relocation formula
} Elf64_Rela;

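The relocation type tells the linker how to compute the final value. In the formulas below, S is the symbol's address, A is the addend, and P is the address of the location being patched (the section's address plus r_offset):
  • R_X86_64_PC32: 32-bit PC-relative reference. Formula: S + A - P.
  • R_X86_64_64: 64-bit absolute address. Formula: S + A.
  • R_X86_64_PLT32: 32-bit PC-relative call through the PLT, used for calls to functions that may live in a shared library. Formula: L + A - P, where L is the PLT entry (or the function itself when it is defined in the same link).
  • R_X86_64_GOTPCREL: 32-bit PC-relative reference to the symbol's GOT entry, used to reach global variables that may live in a shared library. Formula: G + GOT + A - P, where GOT is the table's address and G is the entry's offset within it.

The addend is usually -4 for a displacement that ends its instruction. The CPU measures PC-relative offsets from the end of the instruction, which is 4 bytes past P.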
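The r_info field packs the symbol-table index into its high 32 bits and the relocation type into its low 32 bits. <elf.h> supplies ELF64_R_SYM and ELF64_R_TYPE to split it. In this sketch the helper names are hypothetical, and only the four types listed above are named.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: name the four relocation types discussed above. */
const char *x86_64_reloc_name(uint32_t type)
{
    switch (type) {
    case R_X86_64_64:       return "R_X86_64_64";        /* S + A */
    case R_X86_64_PC32:     return "R_X86_64_PC32";      /* S + A - P */
    case R_X86_64_PLT32:    return "R_X86_64_PLT32";     /* L + A - P */
    case R_X86_64_GOTPCREL: return "R_X86_64_GOTPCREL";  /* G + GOT + A - P */
    default:                return "other";
    }
}

/* Hypothetical helper: split r_info into symbol index and type. */
void decode_r_info(uint64_t r_info, uint32_t *sym_index, uint32_t *type)
{
    *sym_index = (uint32_t)ELF64_R_SYM(r_info);   /* high 32 bits */
    *type = (uint32_t)ELF64_R_TYPE(r_info);       /* low 32 bits */
}
```

For example, an r_info of 0x000500000002 decodes to symbol 5 and type 2, which is R_X86_64_PC32.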

Watching Relocations in Action

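// simple.c
extern int printf(const char *fmt, ...);   // printf's real signature (normally from <stdio.h>)
extern int global_var;                      // defined in some other object file

void demo(void) {
    printf("hello\n");    // Relocations for the string literal and for printf
                          // (GCC may turn this call into puts("hello"))
    global_var = 42;      // Relocation for global_var
}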
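gcc -c -O0 simple.c -o simple.o
readelf -r simple.o
# Illustrative output (GCC with its default -fPIE); offsets, symbol indices,
# and relocation types vary with compiler version and flags.
# Relocation section '.rela.text' at offset 0x... contains 3 entries:
#   Offset          Info           Type                   Sym. Value    Sym. Name + Addend
# 000000000015  000300000002 R_X86_64_PC32          0000000000000000 .rodata - 4
# 00000000001a  000500000004 R_X86_64_PLT32         0000000000000000 printf - 4
# 000000000021  00060000002a R_X86_64_REX_GOTPCRELX 0000000000000000 global_var - 4
#
# Offset 0x1a: the 4-byte displacement of the call to printf (just after the e8 opcode)
# Type R_X86_64_PLT32: fill in (target - (here + 4)), where target is printf's PLT entry
# Symbol: printf (undefined here; defined in libc)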

objdump -d simple.o
# 0000000000000012 <demo>:
#  12: 48 8d 3d 00 00 00 00  lea    0x0(%rip),%rdi  ← 00 00 00 00 = placeholder
#  19: e8 00 00 00 00        call   1e <demo+0xc>   ← 00 00 00 00 = placeholder

The 00 00 00 00 placeholders in the object file will be filled in by the linker once it knows where the string literal and printf (or printf's PLT entry) end up.
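The patch itself is simple arithmetic. This sketch shows how a linker might apply S + A - P to one 4-byte placeholder, assuming a little-endian x86-64 target. apply_pc32 and the addresses in the example are hypothetical. Real linkers also build PLT and GOT entries and handle many more relocation types.

```c
#include <stdint.h>

/* Hypothetical helper: resolve one R_X86_64_PC32-style field in place.
     code      : bytes of the output .text section
     text_addr : virtual address assigned to that section
     r_offset  : offset of the 4-byte placeholder within the section
     S, A      : symbol address and addend from the relocation entry
   Returns 0 on success, -1 if the value does not fit in 32 bits
   (GNU ld reports that case as "relocation truncated to fit"). */
int apply_pc32(uint8_t *code, uint64_t text_addr, uint64_t r_offset,
               uint64_t S, int64_t A)
{
    uint64_t P = text_addr + r_offset;                 /* address being patched */
    int64_t value = (int64_t)(S + (uint64_t)A - P);    /* S + A - P */
    if (value < INT32_MIN || value > INT32_MAX)
        return -1;
    uint32_t bits = (uint32_t)(int32_t)value;
    /* store little-endian, independent of the host's byte order */
    code[r_offset + 0] = (uint8_t)(bits);
    code[r_offset + 1] = (uint8_t)(bits >> 8);
    code[r_offset + 2] = (uint8_t)(bits >> 16);
    code[r_offset + 3] = (uint8_t)(bits >> 24);
    return 0;
}
```

Suppose a call (e8 followed by four zero bytes) sits at 0x401000 and its target is at 0x401020. Then apply_pc32(code, 0x401000, 1, 0x401020, -4) writes 1b 00 00 00, which is the target minus the address of the next instruction, 0x401005.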


23.6 Linking: Symbol Resolution and Relocation

The linker performs two passes:

Pass 1 — Symbol Resolution: Build a global symbol table. Each .o file's symbols are added. Undefined symbols (UND) must be satisfied by another .o file, an archive (.a), or a shared library (.so).

Pass 2 — Relocation: For each relocation entry in each .o file, compute the final address and patch the machine code.

Static Linking with an Archive

An archive (.a) is a collection of .o files. The linker includes only the .o files from the archive that satisfy undefined symbols — not the entire archive.

# Create archive
ar rcs libmylib.a module1.o module2.o module3.o

# Link with archive
gcc main.o -L. -lmylib -o program
# Equivalent: ld main.o libmylib.a -o program (simplified)

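The linker scans each archive once, at the point where it appears on the command line, and pulls in only the members that define symbols undefined at that moment. GNU ld keeps rescanning that one archive until no further member is needed. So if main.o references foo from module1.o, and module1.o references bar from module2.o, main.o -lmylib still works, because the library satisfies both. Order across inputs does matter: gcc -lmylib main.o fails with GNU ld, because nothing was undefined yet when the archive was scanned. If two archives depend on each other, list one of them twice (-lA -lB -lA) or group them with -Wl,--start-group -lA -lB -Wl,--end-group.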

Linker Symbol Resolution Rules

  1. If a symbol is defined as GLOBAL in more than one .o file: error (multiple definition).
  2. If a symbol is defined as WEAK in one file and GLOBAL in another: the GLOBAL definition wins.
  3. If a symbol is defined as WEAK in several files and GLOBAL in none: the linker picks one of the weak definitions (GNU ld keeps the first it sees). Do not rely on which one.
  4. If a non-weak reference remains undefined after all inputs: error ("undefined reference to 'foo'"). An undefined WEAK reference is not an error; it resolves to 0.
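C code can create weak symbols with the GCC/Clang attribute __attribute__((weak)). This sketch shows both uses: a replaceable default definition (rule 2) and an optional reference that may stay undefined (the weak exception in rule 4). log_hook and optional_feature are hypothetical names.

```c
#include <stddef.h>

/* Weak definition: a default that any GLOBAL log_hook defined in another
   object file silently replaces (rule 2). Hypothetical name. */
__attribute__((weak)) int log_hook(int level)
{
    (void)level;
    return 0;                          /* default: do nothing */
}

/* Weak reference: if nothing defines optional_feature, the linker
   resolves it to address 0 instead of failing. Hypothetical name. */
extern int optional_feature(void) __attribute__((weak));

int call_optional_feature(void)
{
    if (optional_feature != NULL)      /* was a definition linked in? */
        return optional_feature();
    return -1;                         /* not present in this link */
}
```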

Common linking errors:

undefined reference to 'foo'        → foo not defined in any input
multiple definition of 'bar'        → bar defined in two .o files as GLOBAL
cannot find -lfoo                   → libfoo.a or libfoo.so not found

23.7 Linker Scripts: Controlling Memory Layout

The linker uses a linker script to determine how sections are combined and where they are placed in the output binary. The default script is built into ld (ld --verbose prints it) and handles normal executables. Custom linker scripts are needed for:
  • Embedded systems (no OS, specific memory map)
  • Custom executables (non-standard layout)
  • Operating system kernels

Default Script Behavior (Simplified)

SECTIONS {
    . = 0x400000;               /* start at virtual address 0x400000 */
    .text : { *(.text) }        /* all .text from all .o files */
    .rodata : { *(.rodata) }    /* all .rodata */
    . = ALIGN(0x1000);          /* align to page boundary */
    .data : { *(.data) }        /* all .data */
    .bss  : { *(.bss)  }        /* all .bss */
}

MinOS Kernel Linker Script

The MinOS kernel project uses a custom linker script that places the kernel at a specific physical address:

/* minOS/linker.ld */
OUTPUT_FORMAT("elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)

SECTIONS {
    /* Kernel linked and loaded at 1 MB physical = 0x100000 (identity-mapped).
       A higher-half kernel would instead link at a virtual address such as
       0xFFFFFFFF80100000 and use AT() to keep the load address at 1 MB. */
    . = 0x100000;

    _kernel_start = .;

    .text : {
        *(.multiboot)           /* Multiboot header first: it must lie within the image's first 8 KiB */
        *(.text)
        *(.text.*)
    }

    . = ALIGN(0x1000);
    .rodata : {
        *(.rodata)
        *(.rodata.*)
    }

    . = ALIGN(0x1000);
    .data : {
        *(.data)
        *(.data.*)
    }

    . = ALIGN(0x1000);
    _bss_start = .;
    .bss : {
        *(COMMON)               /* Uninitialized C externals */
        *(.bss)
        *(.bss.*)
    }
    _bss_end = .;

    _kernel_end = .;

    /* Discard compiler comment and note sections (not needed in the kernel image) */
    /DISCARD/ : { *(.comment) *(.note*) }
}

The symbols _bss_start, _bss_end, _kernel_start, _kernel_end are defined by the linker script and become accessible in assembly/C code:

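// In kernel C code (freestanding: there is no libc, so the kernel provides its own memset)
#include <stddef.h>
void *memset(void *dest, int c, size_t n);

extern char _bss_start[], _bss_end[];
extern char _kernel_start[], _kernel_end[];

// Zero the BSS: the kernel cannot assume a loader did it, and must do so
// before any C code reads a .bss variable
void zero_bss(void) {
    memset(_bss_start, 0, (size_t)(_bss_end - _bss_start));
}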
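A hosted Linux program can observe the same idea without a custom script. GNU ld's default script defines boundary symbols named etext, edata, and end, which the end(3) man page documents. The sketch below assumes GNU ld's default layout and uses hypothetical global names. It checks that an initialized global sits below edata, that a zero-filled global sits between edata and end, and that the loader has already zeroed .bss.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical globals: one in .data, one in .bss. */
int initialized_value = 42;
int zeroed_table[1000000];            /* 4 MB in memory, no bytes in the file */

/* Boundary symbols from the default linker script (see end(3)).
   Only their addresses are meaningful, never their contents. */
extern char etext, edata, end;

/* 1 if code ends before .data, .data ends at edata, and .bss lies
   between edata and end. */
int layout_matches_default_script(void)
{
    uintptr_t text_end = (uintptr_t)&etext;
    uintptr_t data_end = (uintptr_t)&edata;
    uintptr_t bss_end  = (uintptr_t)&end;
    uintptr_t d = (uintptr_t)&initialized_value;
    uintptr_t b = (uintptr_t)zeroed_table;
    return text_end <= d && d < data_end
        && data_end <= b && b + sizeof zeroed_table <= bss_end;
}

/* Hosted counterpart of zero_bss(): the loader already zero-filled
   .bss, so this only checks that it did. */
int bss_is_zero(void)
{
    for (size_t i = 0; i < sizeof zeroed_table / sizeof zeroed_table[0]; i++)
        if (zeroed_table[i] != 0)
            return 0;
    return 1;
}
```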

23.8 Static vs. Dynamic Linking

Static Linking

All library code is copied into the executable at link time. No external dependencies at runtime.

gcc -static main.c -o main_static
gcc main.c -o main_dynamic        # default: dynamically linked
ls -lh main_static
# -rwxr-xr-x 1 user user 892K main_static   ← libc is embedded
ls -lh main_dynamic
# -rwxr-xr-x 1 user user  16K main_dynamic  ← libc is external

Advantages: No external dependencies, predictable behavior, easier distribution. Disadvantages: Larger binaries, no security patches from library updates, no sharing between processes.

Dynamic Linking

Library code is not copied into the executable. At runtime, the dynamic linker (/lib64/ld-linux-x86-64.so.2) loads the required shared libraries and resolves symbols.

ldd /bin/ls
# linux-vdso.so.1 (0x00007fff...)            ← virtual DSO from kernel
# libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
# libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0
# /lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2

Advantages: Smaller executables, shared library code between processes, security patches apply automatically. Disadvantages: "DLL hell" (wrong version), startup overhead, deployment complexity.
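Inside a running process, glibc's dl_iterate_phdr visits every loaded ELF object, which gives a run-time counterpart to ldd. It is a GNU extension declared in <link.h>. In this sketch, print_object and list_loaded_objects are hypothetical names.

```c
#define _GNU_SOURCE          /* dl_iterate_phdr() is a GNU extension */
#include <link.h>
#include <stdio.h>

/* Called once per loaded object: the executable itself (empty name),
   the vDSO, shared libraries such as libc, and the dynamic linker. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *count = data;
    printf("%-45s base 0x%lx\n",
           info->dlpi_name[0] ? info->dlpi_name : "(main program)",
           (unsigned long)info->dlpi_addr);
    (*count)++;
    return 0;                /* 0 = keep iterating */
}

/* Hypothetical helper: print and count the ELF objects mapped into the process. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```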


23.9 Position-Independent Code (PIC)

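Shared libraries must be position-independent: they can be loaded at any virtual address in any process. Without PIC, the dynamic linker would have to patch absolute addresses inside the code itself (text relocations), which would make those code pages private to each process instead of shared. On x86-64, ld refuses to build most such libraries and suggests recompiling with -fPIC.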

The Problem

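; Non-PIC code: assumes it's at a fixed address
mov rax, [global_var]        ; WRONG for .so: absolute address baked into the instruction
call printf                  ; WRONG for .so: printf lives in another library, at a distance unknown until run time

If the library is loaded at, say, 0x7f000000, the absolute address in the first instruction points to the wrong place. The call's 32-bit displacement cannot be computed at link time either, because printf's library has not been loaded yet.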

PIC Solution: RIP-Relative Addressing

; PIC code: everything is reached relative to RIP
mov rax, [rip + offset_to_got]   ; Load global_var's address from its GOT slot (correct regardless of load address)
mov rax, [rax]                   ; Then load the value itself
call printf@PLT                  ; Call through PLT (correct regardless of load address)

On x86-64, PIC code reaches its own code and data with RIP-relative addressing. The distance from an instruction to a target in the same library does not change wherever the library is loaded, because both move together. Symbols in other libraries are reached indirectly through the GOT and PLT, which live inside the library and so move with it too.

# -fPIC: generate position-independent code; -shared: produce a .so
gcc -fPIC -shared -o libfoo.so foo.c

# Inspect: look for RIP-relative addressing (Intel syntax)
objdump -d -M intel libfoo.so | grep "rip"
# lea rax,[rip+0x100]   ← RIP-relative (illustrative)

GOT and PLT

PIC introduces two new data structures, covered in depth in Chapter 24:
  • GOT (Global Offset Table): contains the runtime addresses of global variables and functions. Filled by the dynamic linker.
  • PLT (Procedure Linkage Table): jump stubs for external function calls. Enables lazy binding (resolving calls on first use, not at startup).


23.10 The Loader: From File to Process

When you execute a program, the kernel does the following:

Kernel Phase (execve syscall)

  1. Read ELF header: Verify magic bytes, check architecture, find PT_LOAD segments and PT_INTERP segment.
  2. Map PT_LOAD segments: mmap each segment into the process address space with appropriate permissions.
  3. Find the interpreter (PT_INTERP): Usually /lib64/ld-linux-x86-64.so.2 — the dynamic linker.
  4. Map the dynamic linker into the process address space.
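  5. Jump to the dynamic linker's entry point, passing argc, argv, envp, and the auxiliary vector (AT_PHDR, AT_ENTRY, etc.) on the stack. (A statically linked executable has no PT_INTERP, so the kernel jumps straight to its e_entry.)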
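A program can read the auxiliary vector the kernel passed it by calling glibc's getauxval from <sys/auxv.h>. This minimal sketch prints a few of the entries named above; print_auxv is a hypothetical helper.

```c
#include <sys/auxv.h>   /* getauxval(); AT_* constants come from <elf.h> */
#include <stdio.h>

/* Hypothetical helper: print some auxiliary-vector entries. */
void print_auxv(void)
{
    printf("AT_PHDR   = 0x%lx  (program headers of the executable, in memory)\n",
           getauxval(AT_PHDR));
    printf("AT_PHNUM  = %lu\n", getauxval(AT_PHNUM));
    printf("AT_ENTRY  = 0x%lx  (entry point, including any PIE load bias)\n",
           getauxval(AT_ENTRY));
    printf("AT_BASE   = 0x%lx  (where the dynamic linker was mapped)\n",
           getauxval(AT_BASE));
    printf("AT_PAGESZ = %lu\n", getauxval(AT_PAGESZ));
}
```

A statically linked program has no interpreter, so AT_BASE is 0.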

Dynamic Linker Phase (ld-linux.so)

  1. Process .dynamic section: Read DT_NEEDED entries (required shared libraries), DT_SYMTAB, DT_RELA, DT_JMPREL, etc.
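  2. Load required shared libraries: map each DT_NEEDED library, then process that library's own DT_NEEDED entries, until every dependency is loaded.
  3. Perform relocations: apply every entry in the DT_RELA table (and in the DT_JMPREL table for PLT slots, unless lazy binding defers them), writing final addresses into the GOT and other data.
  4. Run .init sections and DT_INIT_ARRAY constructors (C++ global constructors, __attribute__((constructor))).
  5. Jump to the program's entry point: e_entry in the ELF header, normally _start, which calls __libc_start_main, which in turn calls main.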
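Step 4 is what makes __attribute__((constructor)) work. This sketch assumes GCC or Clang and uses hypothetical names. The compiler records early_init as an initializer, so the flag is already set by the time main runs.

```c
/* Hypothetical flag, set during start-up. */
static int init_ran = 0;

/* The compiler adds this function to the initializer list
   (.init_array on current toolchains), so it runs before main(). */
__attribute__((constructor))
static void early_init(void)
{
    init_ran = 1;
}

/* Lets main() (or a test) observe that the constructor already ran. */
int constructor_has_run(void)
{
    return init_ran;
}
```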

Virtual Memory Layout

After loading, the process address space looks like:

Virtual Address Space (x86-64 Linux)
┌─────────────────────────────────────┐ 0xFFFFFFFFFFFFFFFF
│  Kernel (not accessible from user)  │
├─────────────────────────────────────┤ 0xFFFF800000000000
│                                     │
│  (non-canonical gap)                │
│                                     │
├─────────────────────────────────────┤ 0x00007FFFFFFFFFFF
│  Stack (grows downward)             │
│    argv, environ, auxv              │
│  (stack guard gap)                  │
│  mmap region (grows downward)       │
│    Shared libraries (.so files)     │
│    mmap'd files                     │
├─────────────────────────────────────┤
│  (unmapped gap)                     │
├─────────────────────────────────────┤
│  Heap (grows upward via brk/mmap)   │
├─────────────────────────────────────┤
│  .bss (zero-initialized)            │
│  .data (initialized)                │
│  .rodata (read-only)                │
│  .text (executable)                 │
├─────────────────────────────────────┤ 0x0000000000400000 (non-PIE; a PIE is placed at a randomized base)
│  (unmapped — null pointer guard)    │
└─────────────────────────────────────┘ 0x0000000000000000

ASLR: Address Space Layout Randomization

Modern Linux (and macOS, Windows) uses ASLR to randomize the base addresses of the stack, heap, and shared libraries. This prevents attackers from hardcoding target addresses in exploits.

# Run ldd twice: libc's load address changes each time
ldd /bin/ls | grep libc.so
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3d8b200000)   ← first run
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9c3a100000)   ← second run

The executable itself is not randomized unless compiled as a PIE (Position-Independent Executable):

# Compile as PIE (default in modern distributions)
gcc -fPIE -pie main.c -o main

# Check
readelf -h main | grep Type
#   Type:    DYN (Position-Independent Executable file)
# (older binutils print "DYN (Shared object file)" for a PIE)
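The effect of PIE is visible from inside the program. glibc's dl_iterate_phdr reports the main program first, and its dlpi_addr is the load bias added to every address in the file. The bias is 0 for a fixed-address ET_EXEC binary and a page-aligned, randomized value for a PIE under ASLR. The helper names in this sketch are hypothetical.

```c
#define _GNU_SOURCE          /* dl_iterate_phdr() is a GNU extension */
#include <link.h>

/* Record the first object's load bias, then stop iterating. */
static int grab_first(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    *(ElfW(Addr) *)data = info->dlpi_addr;
    return 1;                /* nonzero stops dl_iterate_phdr */
}

/* Hypothetical helper: load bias of the main program (first object visited). */
unsigned long main_program_load_bias(void)
{
    ElfW(Addr) bias = 0;
    dl_iterate_phdr(grab_first, &bias);
    return (unsigned long)bias;
}
```

Built with -no-pie, it returns 0. Built as a PIE with ASLR enabled, it typically returns a different value on each run.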

23.11 Tracing a Complete Link

Let's trace a complete link of a two-file project and see every step.

Source Files

// math_utils.c
int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

// main.c
#include <stdio.h>
extern int add(int, int);
extern int sub(int, int);

int main(void) {
    printf("%d\n", add(3, 4));
    printf("%d\n", sub(10, 3));
    return 0;
}

Step 1: Compile to Object Files

gcc -c -O0 math_utils.c -o math_utils.o
gcc -c -O0 main.c -o main.o

Step 2: Inspect Object Files

nm math_utils.o
# 0000000000000000 T add   ← defined at offset 0 in .text
# 0000000000000020 T sub   ← defined at offset 32 in .text

nm main.o
#                  U add    ← undefined reference
# 0000000000000000 T main   ← defined at offset 0 in .text
#                  U printf ← undefined reference
#                  U sub    ← undefined reference

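readelf -r main.o
# Relocation section '.rela.text' contains 6 entries (offsets omitted; types as
# emitted by recent GCC/binutils, which use R_X86_64_PLT32 for every call):
#   Offset  Type            Sym. Name
#   ...     R_X86_64_PLT32  add          ← needs add's address
#   ...     R_X86_64_PC32   .rodata      ← address of the "%d\n" format string
#   ...     R_X86_64_PLT32  printf       ← needs printf via PLT
#   ...     R_X86_64_PLT32  sub          ← needs sub's address
#   ...     R_X86_64_PC32   .rodata      ← same format string, second call
#   ...     R_X86_64_PLT32  printf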
Step 3: Link

gcc -no-pie main.o math_utils.o -o program   # -no-pie keeps the fixed 0x400000 base used below
# ld resolves (example addresses):
#   add    → math_utils.o, at 0x0000000000401180
#   sub    → math_utils.o, at 0x00000000004011a0
#   printf → libc.so.6 (via PLT; see Chapter 24)

Step 4: Verify

nm program | grep -wE "add|sub|main|printf"
# 0000000000401180 T add
# 0000000000401126 T main
#                  U printf@GLIBC_2.2.5   ← still "undefined" here; resolved at run time via the PLT
# 00000000004011a0 T sub

objdump -d program | grep -E "call.*<(add|sub|printf@plt)>"
#   call   401180 <add>           ← direct call, address filled in
#   call   401030 <printf@plt>    ← call through PLT for dynamic symbol
#   call   4011a0 <sub>
#   call   401030 <printf@plt>

The linker filled in the relocation for add (known at link time) and left printf to be resolved at runtime via the PLT mechanism.


23.12 Tools Reference

Tool      Purpose               Key flags
readelf   Read ELF metadata     -h header, -S sections, -s symbols, -r relocations, -l segments, -d dynamic
objdump   Disassemble + dump    -d disassemble, -D all sections, -s hex dump, -t symbol table
nm        List symbols          -C demangle C++, -u undefined only, -D dynamic symbols
size      Section sizes         (no flags needed)
ldd       List shared libs      (no flags needed)
file      Identify file type    (no flags needed; reports ELF type and architecture)
strings   Extract strings       -n 8 minimum 8 chars
strip     Remove symbols        --strip-debug removes debug info only
objcopy   Copy/convert          --only-section=.text extract one section
ld        Direct linker         -T script.ld custom script, -Map output.map generate map
ar        Archive manager       rcs create, t list, x extract

Check Your Understanding

  1. Why does .bss have no bytes in the object file?
  2. What is the difference between a section and a segment?
  3. What does a relocation entry contain, and when is it processed?
  4. Why does compiling a shared library require -fPIC?
  5. What is the role of the PT_INTERP segment?
  6. How does the linker handle a WEAK symbol vs. a GLOBAL symbol of the same name?
  7. What is ASLR and why does it require PIE executables to work on the main binary?

Summary

The journey from object file to running process passes through: ELF sections (content organized by type and permissions), the symbol table (names and addresses), relocation entries (linker instructions for patching addresses), the linker (resolving symbols, applying relocations, combining sections into segments), the static/shared library choice, position-independent code (RIP-relative addressing for .so), and the loader (kernel mapping + dynamic linker bootstrapping). Understanding this pipeline makes debugging link errors, reading binary tools' output, and writing embedded system linker scripts second nature.