Chapter 23 Key Takeaways: Linking, Loading, and ELF

  1. ELF (Executable and Linkable Format) serves three roles: relocatable object file (.o, output of assembler), executable (output of linker), and shared library (.so). One format covers all three, distinguished by the e_type field: ET_REL=1, ET_EXEC=2, ET_DYN=3 (ET_DYN also marks position-independent executables).

  2. The 64-byte ELF64 header identifies the binary and points to two tables: the program header table (for the loader; describes segments to map) and the section header table (for tools; describes content organization). The section header table is not needed for execution: the loader never reads it, so removing it, like removing symbols with strip, does not affect program behavior.

  3. .bss occupies zero bytes in the file but allocates memory at load time. The section header records only a size; the OS maps zero-initialized pages for the .bss virtual address range. A 4 MB global array of zeros adds no array data to the compiled object file.

  4. ELF sections are for the linker; ELF segments (program headers) are for the loader. The linker combines sections into segments based on permissions: .text + .rodata → read-execute PT_LOAD; .data + .bss → read-write PT_LOAD. The loader ignores sections entirely.

  5. A relocation entry records: where to patch (r_offset), which symbol to look up and which relocation type to apply (both packed into r_info), and a calculation addend (r_addend). The relocation type (R_X86_64_PC32, R_X86_64_PLT32, R_X86_64_64) specifies the formula, where S is the symbol address, A the addend, and P the address of the patched location: S + A - P for PC-relative, S + A for absolute. The linker applies these formulas during pass 2.

  6. Symbol binding determines link-time visibility: LOCAL symbols (C static) are invisible outside the .o file; GLOBAL symbols are visible to all inputs; WEAK symbols can be overridden by GLOBAL symbols of the same name. Duplicate GLOBAL symbols are a linker error; duplicate WEAK symbols are allowed (one wins silently).

  7. Archive (.a) linking is order-dependent: the linker scans the archive once, left-to-right, and includes only .o files that satisfy currently-undefined symbols. If an undefined symbol is created after the archive was scanned, it will not be resolved. Libraries go after the object files that use them on the command line.

  8. Position-Independent Code (PIC) is required for shared libraries. PIC uses RIP-relative addressing so all references are encoded as offsets from the current instruction — correct regardless of load address. Compile with -fPIC for shared libraries; -fPIE for position-independent executables (required for ASLR to randomize the main executable).

  9. The linker script is the authority on memory layout: it specifies section order, virtual addresses, alignment, and defines symbols visible to C/assembly code. Custom linker scripts are required for OS kernels, bootloaders, and embedded firmware where memory layout is not the default.

  10. KEEP prevents the linker's garbage collector from removing unreferenced sections. The Multiboot header, interrupt vector tables, and similar structures referenced only by external tools (GRUB, CPU hardware) need KEEP to survive --gc-sections optimization.

  11. The loader (kernel + dynamic linker) turns an ELF file into a running process: the kernel reads the PT_LOAD segments and maps each one into memory; the PT_INTERP segment specifies the dynamic linker path; the dynamic linker loads shared libraries, applies relocations (filling GOT entries), runs constructors, then jumps to e_entry, the address of _start, which eventually calls main.

  12. ASLR (Address Space Layout Randomization) randomizes stack, heap, and library base addresses on each run to prevent hardcoded exploit targets. The main executable is randomized only when compiled as PIE (-fPIE -pie, which produces ET_DYN). Non-PIE executables (ET_EXEC) always load at the same address.

  13. Essential ELF tools: readelf (all ELF metadata), objdump (disassembly + hex dumps), nm (symbol table), size (section sizes), ldd (shared library dependencies), strings (embedded strings), strip (remove symbols and debug information), ar (archive management), ld -Map (linker map generation).

  14. Bare-metal code has no operating system to prepare memory for it: an OS kernel must zero .bss itself, and ROM-resident firmware must also copy .data from flash/ROM to RAM, before calling any C code. The linker script defines _bss_start/_bss_end symbols, and the boot assembly uses them for rep stosb zero-initialization, one of the first tasks a kernel entry point performs.
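
Takeaways 1 and 2 can be checked by hand: the fields they mention sit at fixed offsets in the header. The following is a minimal sketch, assuming a 64-bit little-endian ELF file; the function name parse_elf64_header and the dictionary keys are illustrative choices, not part of any standard tool.

```python
import struct

# e_type values listed in takeaway 1.
E_TYPES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN"}

# ELF64 header fields that follow the 16-byte e_ident, little-endian:
# e_type, e_machine (u16); e_version (u32); e_entry, e_phoff, e_shoff (u64);
# e_flags (u32); e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
# e_shstrndx (u16). 16 + 48 = 64 bytes in total.
ELF64_FIELDS = struct.Struct("<HHIQQQIHHHHHH")


def parse_elf64_header(data: bytes) -> dict:
    """Decode the parts of an ELF64 header that takeaways 1 and 2 describe."""
    if len(data) < 64 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[4] == 2 means ELFCLASS64; e_ident[5] == 1 means little-endian.
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    (e_type, _machine, _version, e_entry, e_phoff, e_shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     _shstrndx) = ELF64_FIELDS.unpack_from(data, 16)
    return {
        "type": E_TYPES.get(e_type, "other"),
        "entry": e_entry,   # address the loader finally jumps to
        "phoff": e_phoff,   # file offset of the program header table
        "phnum": e_phnum,   # number of segments described there
        "shoff": e_shoff,   # file offset of the section header table
        "shnum": e_shnum,   # number of sections described there
    }
```

To try it, read the first 64 bytes of any binary opened in "rb" mode and pass them in. The result should agree with what readelf -h reports for the same file.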
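
The relocation arithmetic in takeaway 5 is easier to trust after working one case by hand. The sketch below assumes the usual x86-64 numbering of the three relocation types and treats R_X86_64_PLT32 like R_X86_64_PC32, which holds only when the symbol resolves directly rather than through a PLT entry. The helper names split_r_info and relocation_value are hypothetical.

```python
# Relocation type numbers from the x86-64 ABI.
R_X86_64_64 = 1
R_X86_64_PC32 = 2
R_X86_64_PLT32 = 4


def split_r_info(r_info: int) -> tuple:
    """Split a 64-bit r_info into (symbol table index, relocation type)."""
    return r_info >> 32, r_info & 0xFFFFFFFF


def relocation_value(r_type: int, S: int, A: int, P: int) -> int:
    """Return the value the linker writes at P.

    S = symbol address, A = addend, P = address of the patched location.
    """
    if r_type == R_X86_64_64:
        # Absolute 64-bit address: S + A.
        return (S + A) & 0xFFFFFFFFFFFFFFFF
    if r_type in (R_X86_64_PC32, R_X86_64_PLT32):
        # PC-relative 32-bit displacement: S + A - P.
        # Assumption: the PLT32 target is the symbol itself, not a PLT stub.
        return (S + A - P) & 0xFFFFFFFF
    raise ValueError("relocation type not covered by this sketch")


# A 5-byte call at 0x401000 has its displacement field at P = 0x401001.
# The CPU adds the displacement to the address of the next instruction
# (0x401005), which is why the assembler emits an addend of -4.
disp = relocation_value(R_X86_64_PC32, S=0x401020, A=-4, P=0x401001)
print(hex(disp))  # 0x1b, and 0x401005 + 0x1b == 0x401020
```

A backward reference yields a negative displacement, which the mask stores as a 32-bit two's-complement value.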
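
The order dependence in takeaway 7 can be reproduced with a toy model of the symbol resolution pass. This is a simplified simulation, not how any real linker is implemented: the Obj and Archive classes and the link function are invented for illustration, and symbols are plain strings.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    defines: set   # symbols this object file defines
    needs: set     # symbols this object file references


@dataclass
class Archive:
    name: str
    members: list  # list of Obj


def link(inputs):
    """Process inputs left to right; return (objects included, unresolved)."""
    defined, undefined, pulled = set(), set(), []

    def add(obj):
        pulled.append(obj.name)
        defined.update(obj.defines)
        undefined.update(obj.needs)
        undefined.difference_update(defined)

    for item in inputs:
        if isinstance(item, Obj):
            add(item)  # object files named on the command line always link
            continue
        # Archive: include only members that define a currently undefined
        # symbol. Repeat within this archive until nothing more is pulled,
        # but never return to an archive that was already passed.
        remaining = list(item.members)
        progress = True
        while progress:
            progress = False
            for member in list(remaining):
                if member.defines & undefined:
                    add(member)
                    remaining.remove(member)
                    progress = True
    return pulled, undefined


main_o = Obj("main.o", {"main"}, {"helper"})
libutil = Archive("libutil.a", [Obj("helper.o", {"helper"}, set())])

print(link([main_o, libutil]))  # helper.o is pulled in, nothing unresolved
print(link([libutil, main_o]))  # archive seen too early, helper unresolved
```

In the second call the archive is scanned while nothing is undefined yet, so helper.o is skipped and the reference from main.o is never satisfied. This is the failure that placing libraries after the objects that use them avoids.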