Case Study 28-1: The MinOS Bootloader — Complete Code Walkthrough

Every Instruction Explained

The MinOS bootloader is 512 bytes of assembly that contains more complexity per byte than almost any other code you will write. This walkthrough examines it instruction by instruction, explaining not just what each instruction does but why it is positioned exactly where it is.

Before Instruction 1: What the BIOS Did

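When the first instruction of our bootloader executes, the machine is in this state:

- CS:IP points at physical address 0x7C00. Most BIOSes use CS = 0x0000, IP = 0x7C00; a few use 0x07C0:0x0000, which is the same physical address.
- DL = boot drive number (0x00 = floppy, 0x80 = first hard disk, 0x81 = second hard disk)
- Real mode is active: 16-bit operations, segment:offset addressing
- Interrupts are enabled
- The A20 line should be treated as OFF (addresses above 1MB wrap around), although some firmware and emulators have already enabled it
- The BIOS interrupt table is set up and functional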

The Bootloader in Three Acts

Act 1: Initialization (instructions 1–7)

[BITS 16]
[ORG 0x7C00]

boot_start:
    cli                     ; [1] Disable interrupts

Why CLI first? The BIOS hands over control with interrupts enabled and its own handlers still installed. Until SS:SP point at a stack we have chosen, a hardware interrupt would push its return frame onto whatever stack the BIOS left behind, possibly overwriting memory we are about to use. Disabling interrupts first makes the segment and stack setup that follows safe.

    xor ax, ax              ; [2] AX = 0
    mov ds, ax              ; [3] DS = 0 (data segment = 0x0000)
    mov es, ax              ; [4] ES = 0
    mov ss, ax              ; [5] SS = 0 (stack segment = 0x0000)
    mov sp, 0x7C00          ; [6] Stack pointer: grows DOWN from 0x7C00

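Why XOR AX, AX instead of MOV AX, 0? In 16-bit mode, XOR AX, AX is 2 bytes; MOV AX, 0 is 3 bytes. In a 512-byte bootloader, every byte counts. Segment registers cannot be loaded with an immediate value, which is why the zero goes through AX first.

The stack starts at 0x7C00 and grows downward, away from the bootloader, which occupies 0x7C00 through 0x7DFF. The two can never collide. Free conventional memory below 0x7C00 extends down to 0x0500, just above the interrupt vector table and the BIOS data area, so the stack has roughly 30KB available (0x7C00 - 0x0500 = 0x7700 bytes). That is far more than the shallow call depth of a bootloader needs.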

    mov [boot_drive], dl    ; [7] Save DL (boot drive) before any BIOS call clobbers it

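BIOS services may change DL, and INT 0x13 later needs the drive number in DL as an input, so we save it before making any BIOS call. The store goes through DS, which is why it comes after DS is set to 0 in [3].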

Act 2: Loading the Kernel (instructions 8–14)

    mov si, msg_booting     ; [8] SI = pointer to message
    call print_string       ; [9] Print "Booting MinOS..."

The print_string function loops, loading each character with LODSB (Load String Byte: loads [DS:SI] into AL, increments SI), and calling INT 0x10/AH=0x0E for each non-zero byte.
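The MinOS source for print_string is not reproduced in this walkthrough. As a minimal sketch of the loop just described, assuming a zero-terminated string and assuming the message text and line ending shown here, it could look like this:

```nasm
[BITS 16]

; Hypothetical sketch of print_string, not the MinOS source.
; In:       DS:SI -> zero-terminated string
; Clobbers: AX, BX, SI
print_string:
    cld                     ; DF = 0 so LODSB walks forward through the string
.next:
    lodsb                   ; AL = [DS:SI], SI = SI + 1
    test al, al             ; reached the terminating zero?
    jz   .done
    mov  ah, 0x0E           ; INT 0x10 function: teletype output
    mov  bx, 0x0007         ; BH = page 0, BL = colour (used in graphics modes)
    int  0x10
    jmp  .next
.done:
    ret

; Assumed message text; the CR/LF pair is an assumption as well.
msg_booting: db "Booting MinOS...", 13, 10, 0
```

The CLD is there because LODSB steps SI in the direction given by the direction flag, and the BIOS makes no promise about the state it leaves that flag in.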

    mov ah, 0x42            ; [10] INT 0x13 function: Extended Read
    mov dl, [boot_drive]    ; [11] Restore boot drive to DL
    mov si, dap             ; [12] DS:SI points to Disk Address Packet
    int 0x13                ; [13] BIOS disk read
    jc  disk_error          ; [14] CF=1 on error: jump to error handler

The Extended Read (INT 0x13/AH=0x42) reads sectors directly from LBA addresses into a memory buffer. Our DAP (Disk Address Packet) specifies: read 32 sectors (16KB) from LBA 1 (the sector immediately after the boot sector) into memory at 0x0000:0x8000. After this call, the kernel binary is at physical address 0x8000.
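The packet itself is plain data. The following sketch shows one way the 16-byte DAP could be laid out for the values given above; the field order is the standard one for INT 0x13/AH=0x42, while the alignment and the placement next to boot_drive are assumptions.

```nasm
; Hypothetical data layout, not the MinOS source.
boot_drive: db 0            ; filled in by instruction [7]

align 4
dap:
    db 0x10                 ; packet size: 16 bytes
    db 0                    ; reserved, must be zero
    dw 32                   ; sectors to read (32 * 512 bytes = 16KB)
    dw 0x8000               ; destination offset
    dw 0x0000               ; destination segment (0x0000:0x8000)
    dq 1                    ; starting LBA: the sector after the boot sector
```

Note that the destination is stored offset first, then segment, so the two words form an ordinary little-endian far pointer.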

Act 3: Mode Transitions (A20 → Protected → Long)

    in  al, 0x92            ; [15] Read port 0x92 (System Control Port A)
    or  al, 0x02            ; [16] Set bit 1 (Fast A20 enable)
    and al, ~0x01           ; [17] Clear bit 0 (don't trigger reset!)
    out 0x92, al            ; [18] Write back

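The AND AL, ~0x01 is critical. Bit 0 of port 0x92 is a fast system reset line: writing a 1 to it resets the machine immediately. A bootloader that writes a hard-coded value to this port, instead of reading, modifying, and writing back, risks rebooting the machine at exactly this point.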
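A common, slightly more cautious variant reads the port first and skips the write entirely when A20 is already on. This is a sketch of that variant rather than what MinOS does; the label enable_a20_fast is hypothetical.

```nasm
[BITS 16]

; Hypothetical variant: touch port 0x92 only if A20 is still disabled.
enable_a20_fast:
    in   al, 0x92           ; read System Control Port A
    test al, 0x02           ; is bit 1 (A20 enable) already set?
    jnz  .done              ; yes: leave the port alone
    or   al, 0x02           ; set bit 1
    and  al, 0xFE           ; same mask as ~0x01 for a byte: keep bit 0 clear
    out  0x92, al
.done:
    ret
```

Skipping the write avoids any side effects from the port on machines where the firmware has already enabled A20.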

    lgdt [gdt_ptr_32]       ; [19] Load GDT register
    mov eax, cr0            ; [20] Read CR0
    or  eax, 1              ; [21] Set PE bit (bit 0)
    mov cr0, eax            ; [22] Enable protected mode
    jmp 0x08:pm_entry_32    ; [23] FAR jump: flush pipeline, CS=0x08

Between MOV CR0, EAX and JMP, the CPU is in a liminal state: protected mode is enabled, but CS still holds its real-mode value (0x0000) and the hidden part of CS still describes a 16-bit real-mode segment. The far jump loads CS with selector 0x08 (the kernel code segment) and discards any instructions that were already fetched and decoded as real-mode code.

The 32-bit Protected Mode Section

[BITS 32]
pm_entry_32:
    mov ax, 0x10            ; Data segment selector
    mov ds, ax              ; Load all data segments
    mov es, ax
    mov ss, ax
    mov fs, ax
    mov gs, ax
    mov esp, 0x90000        ; 32-bit stack at 576KB

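In protected mode, segment registers hold selectors, not segment addresses. A selector is the byte offset of a descriptor within the GDT (each entry is 8 bytes), with the low three bits reserved for the table indicator and the requested privilege level. Entry 0 is the mandatory null descriptor, so 0x08 selects entry 1 (the code segment) and 0x10 selects entry 2 (the data segment).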
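The selectors used in this walkthrough (0x08 and 0x10 here, 0x18 and 0x20 in the long mode code) imply a five-entry GDT. The MinOS table is not shown in this section, so the following is only a sketch of a layout that would match those selectors. The labels gdt_start and gdt_end are assumptions; gdt_ptr_32 is the name used by the LGDT instruction.

```nasm
; Hypothetical GDT matching the selectors used in the walkthrough.
align 8
gdt_start:
    dq 0x0000000000000000   ; 0x00: null descriptor (required)
    dq 0x00CF9A000000FFFF   ; 0x08: 32-bit code, base 0, limit 4GB
    dq 0x00CF92000000FFFF   ; 0x10: 32-bit data, base 0, limit 4GB
    dq 0x00AF9A000000FFFF   ; 0x18: 64-bit code (L = 1, D = 0)
    dq 0x00CF92000000FFFF   ; 0x20: data segment for 64-bit mode
gdt_end:

gdt_ptr_32:
    dw gdt_end - gdt_start - 1  ; limit: table size in bytes, minus 1
    dd gdt_start                ; linear base address of the table
```

Dividing a selector by 8 gives the entry index: 0x18 / 8 = 3, which is the 64-bit code descriptor in this layout.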

    ; Zero out space for page tables at 0x1000-0x3FFF (12KB)
    cld                     ; clear DF so STOSD advances EDI upward
    mov edi, 0x1000
    xor eax, eax
    mov ecx, 0x3000 / 4    ; 3072 dwords
    rep stosd               ; zeroes 12KB from 0x1000 to 0x3FFF

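REP STOSD stores EAX (zero here) into ECX consecutive dwords starting at ES:EDI. This is essential: the CPU treats any entry whose Present bit (bit 0) is set as a valid mapping, so leftover garbage in this region could be interpreted as page table entries pointing at arbitrary physical memory.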

    ; PML4[0] → PDPT at 0x2000 (P|R/W = 0x03)
    mov dword [0x1000], 0x2003
    ; PDPT[0] → PD at 0x3000
    mov dword [0x2000], 0x3003
    ; PD[0] → 2MB identity page (P|R/W|PS = 0x83)
    mov dword [0x3000], 0x0083

Why store only dword (32-bit) and not the full 64-bit entry? Because the upper 32 bits are zero (physical address fits in low 32 bits, all flag bits zero), and we just zeroed the entire region. The mov dword writes the lower 32 bits; the upper 32 bits are already zero from REP STOSD.
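The magic numbers 0x2003, 0x3003, and 0x0083 are each a table address combined with flag bits. As an illustration, the same three stores can be written with named constants; all of the names below are hypothetical and only the values come from the walkthrough.

```nasm
[BITS 32]

; Hypothetical names; the values are the ones used in the walkthrough.
PML4_BASE    equ 0x1000
PDPT_BASE    equ 0x2000
PD_BASE      equ 0x3000
PAGE_PRESENT equ 1 << 0     ; P
PAGE_WRITE   equ 1 << 1     ; R/W
PAGE_SIZE_2M equ 1 << 7     ; PS: in a PD entry, maps a 2MB page directly

build_page_tables:
    mov dword [PML4_BASE], PDPT_BASE | PAGE_PRESENT | PAGE_WRITE    ; 0x2003
    mov dword [PDPT_BASE], PD_BASE | PAGE_PRESENT | PAGE_WRITE      ; 0x3003
    mov dword [PD_BASE], PAGE_PRESENT | PAGE_WRITE | PAGE_SIZE_2M   ; 0x0083
    ret
```

Because every table sits on a 4KB boundary, the low 12 bits of its address are zero, which is what leaves room for the flags in the same dword.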

    ; Enable PAE
    mov eax, cr4
    or  eax, (1 << 5)       ; PAE = bit 5 of CR4
    mov cr4, eax

    ; CR3 = PML4 physical address
    mov eax, 0x1000
    mov cr3, eax

    ; Enable long mode in EFER MSR
    mov ecx, 0xC0000080     ; IA32_EFER
    rdmsr                   ; read: EAX=low, EDX=high
    or  eax, (1 << 8)       ; LME = bit 8
    wrmsr                   ; write back

    ; Enable paging
    mov eax, cr0
    or  eax, (1 << 31)      ; PG = bit 31
    mov cr0, eax

    ; Far jump to 64-bit CS (selector 0x18 in GDT)
    jmp 0x18:lm_entry_64

The 64-bit Entry

[BITS 64]
lm_entry_64:
    mov ax, 0x20            ; 64-bit data segment
    mov ds, ax
    mov es, ax
    mov ss, ax
    xor ax, ax
    mov fs, ax
    mov gs, ax
    mov rsp, 0x90000        ; 64-bit stack

    jmp 0x8000              ; jump to kernel

In 64-bit mode, most segment registers are effectively ignored for addressing (base=0, limit=ignored), but they still need valid selectors. FS and GS are special: they can be used with base addresses set via MSR for thread-local storage.
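For illustration, this is roughly what setting the FS base through its MSR looks like. The bootloader shown here does not do this; the helper name and its calling convention are assumptions. The MSR number 0xC0000100 is IA32_FS_BASE, and IA32_GS_BASE is 0xC0000101.

```nasm
[BITS 64]

IA32_FS_BASE equ 0xC0000100

; Hypothetical helper: RDI = desired FS base address.
; Clobbers: RAX, RCX, RDX
set_fs_base:
    mov ecx, IA32_FS_BASE   ; MSR to write
    mov eax, edi            ; low 32 bits of the base
    mov rdx, rdi
    shr rdx, 32             ; high 32 bits of the base, now in EDX
    wrmsr                   ; MSR[ECX] = EDX:EAX
    ret
```

After this, an access such as mov rax, [fs:0] reads from the address that was passed in RDI.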

The Boot Signature

    times 510 - ($ - $$) db 0
    dw 0xAA55

`$$` is the start of the section. `$ - $$` is the number of bytes used so far. `510 - ($ - $$)` is the number of padding bytes needed to reach byte 510. Then two bytes for the signature.

This is the most satisfying line in any bootloader: one directive guarantees the binary will be exactly 512 bytes with the magic number in exactly the right place.
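To see the padding arithmetic in isolation, here is a minimal boot sector that does nothing except halt. It is a sketch for experimentation, not part of MinOS, and the file name used below is arbitrary.

```nasm
[BITS 16]
[ORG 0x7C00]

start:
    cli                     ; 1 byte
.hang:
    hlt                     ; 1 byte
    jmp short .hang         ; 2 bytes

    times 510 - ($ - $$) db 0   ; $ - $$ is 4 here, so 506 zero bytes
    dw 0xAA55                   ; stored little-endian: 0x55 at byte 510, 0xAA at byte 511
```

Assembling it with `nasm -f bin tiny.asm -o tiny.bin` should produce a file of exactly 512 bytes. If the code ever grows past 510 bytes, the TIMES count becomes negative and NASM stops with an error instead of silently producing an oversized sector.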

### Testing Each Stage in GDB

```gdb
# Start QEMU with -s -S, then connect GDB to its stub on port 1234
(gdb) target remote localhost:1234

# Real mode: set architecture
(gdb) set architecture i8086
(gdb) break *0x7C00
(gdb) continue

# Disassemble the first instructions, then single-step through
# CLI, segment setup, and the disk read
(gdb) x/10i 0x7C00
(gdb) stepi

# After disk read: verify kernel was loaded
# (should show kernel magic bytes)
(gdb) x/4bx 0x8000

# Watch the CR0 change when entering protected mode
(gdb) info registers cr0
# Before: PE (bit 0) clear, e.g. 0x60000010
# After mov cr0, eax: PE set, e.g. 0x60000011
# (the upper bits vary by firmware; only bit 0 matters here)

# Watch the EFER change when enabling long mode:
# dump the QEMU monitor's register view and look for the EFER= field
# (LME is bit 8, value 0x100)
(gdb) monitor info registers
```

The complete annotated bootloader demonstrates that even 510 bytes of code contains dozens of subtle decisions, each made to work within tight hardware constraints.