
Chapter 28: Bare Metal Programming

Before the Operating System

At power-on, the x86-64 CPU starts in 16-bit real mode, fetching its first instruction from address 0xFFFFFFF0 in the firmware ROM. The BIOS runs, loads the first sector of the boot disk to address 0x7C00, and jumps there. The code at 0x7C00 is yours. The BIOS has handed over control. The OS is not yet loaded. No operating-system interrupt handlers exist, only the BIOS's real-mode ones. No virtual memory. No stack you can rely on (until you create one). No libc. No C runtime. Just a CPU, some registers, a few kilobytes of BIOS data structures in low memory, and 512 bytes in which to change everything.

This is the most fundamental level of assembly programming. Everything the operating system provides (virtual memory, system calls, file systems, the C runtime) was bootstrapped by code that started exactly here.


The x86-64 Boot Sequence

Step 1: Power-On Reset

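When the CPU starts after a power-on or reset:
  - All registers are in defined states: most general-purpose registers are zero, EIP = 0xFFF0, and the CS selector is 0xF000 with a hidden base address of 0xFFFF0000
  - The CPU is in 16-bit real mode
  - The first instruction is fetched from CS.base + EIP = 0xFFFF0000 + 0xFFF0 = 0xFFFFFFF0, the reset vector just below 4GB. (The ordinary real-mode formula 0xF000 × 16 + 0xFFF0 would give 0xFFFF0; the top of the BIOS ROM is mapped at both addresses, and the first instruction is normally a far jump into the low copy.)
  - Both addresses fall inside the BIOS ROM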

Step 2: BIOS POST and Initialization

The BIOS (Basic Input/Output System), or UEFI firmware running in legacy (CSM) mode:
  1. Tests memory and core hardware (POST, the Power-On Self Test)
  2. Initializes hardware (chipset, interrupt controller, timers)
  3. Sets up real-mode interrupt handlers (INT 0x10 for video, INT 0x13 for disk, etc.)
  4. Reads the first 512-byte sector (the MBR) of a candidate boot device into physical address 0x7C00
  5. Checks for the boot signature bytes 0x55, 0xAA at offsets 510 and 511; if they are missing, it tries the next device
  6. Jumps to 0x7C00, with the boot drive number in DL

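Your bootloader code is now running at 0x7C00. The BIOS has handed over control, but its real-mode services (INT 0x10, INT 0x13, and so on) remain callable until you leave real mode.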

Step 3: MBR Bootloader (Stage 1)

The Master Boot Record is exactly 512 bytes. The last two bytes must be the magic signature 0x55, 0xAA (at offsets 510 and 511) or the BIOS will not boot it. That leaves 510 bytes for code and static data, or 446 bytes if the disk also carries a classic partition table, which occupies offsets 446 through 509.
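To make the 510 + 2 byte layout concrete, here is a minimal Python sketch that builds a boot-sector image from a blob of machine code and applies the same signature test the BIOS does. The helper names `make_boot_sector` and `has_boot_signature` are hypothetical, and the two-byte payload (CLI, HLT) is only a placeholder.

```python
import struct

SECTOR_SIZE = 512
SIGNATURE_OFFSET = 510          # code and data must fit before this offset

def make_boot_sector(code: bytes) -> bytes:
    """Pad code with zeros to 510 bytes, then append the 0x55, 0xAA signature."""
    if len(code) > SIGNATURE_OFFSET:
        raise ValueError(f"boot code is {len(code)} bytes; at most {SIGNATURE_OFFSET} fit")
    padded = code + bytes(SIGNATURE_OFFSET - len(code))
    # "<H" packs the word little-endian, like NASM's "dw 0xAA55":
    # 0x55 lands at offset 510 and 0xAA at offset 511.
    return padded + struct.pack("<H", 0xAA55)

def has_boot_signature(sector: bytes) -> bool:
    """The test a BIOS applies to the sector it loaded at 0x7C00."""
    return (len(sector) == SECTOR_SIZE
            and sector[SIGNATURE_OFFSET] == 0x55
            and sector[SIGNATURE_OFFSET + 1] == 0xAA)

# Placeholder payload: CLI (0xFA) followed by HLT (0xF4)
sector = make_boot_sector(b"\xfa\xf4")
print(len(sector), sector[510:].hex(), has_boot_signature(sector))  # 512 55aa True
```

The little-endian packing is also why NASM sources end with dw 0xAA55 rather than dw 0x55AA.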

Within the 510 available bytes, a typical stage-1 bootloader:
  1. Sets up segment registers and a stack, and saves the boot drive number the BIOS left in DL
  2. Prints a loading message using BIOS INT 0x10
  3. Reads additional sectors (the kernel) from disk using BIOS INT 0x13
  4. Prepares for mode transitions
  5. Jumps to stage-2 code


Real Mode (16-bit)

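In real mode, the CPU behaves like a very fast 8086:
  - The default operand and address size is 16 bits (AX, BX, CX, DX, SI, DI, SP, BP); on a 386 or later, 32-bit registers such as EAX are still usable through an operand-size prefix, which is how mov eax, cr0 works before the switch to protected mode
  - Memory is accessed via segment:offset addressing
  - Physical address = segment register × 16 + offset
  - Maximum addressable memory: about 1MB (20-bit address space)
  - BIOS interrupt services are available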

; Real mode addressing example
; Physical address = CS:IP = 0x0000:0x7C00 = 0x7C00
; Physical address = DS:SI where DS=0x0800, SI=0x0100
; → 0x0800 × 16 + 0x0100 = 0x8100

; Segment registers in real mode: CS, DS, ES, FS, GS, SS
; Set DS=0 to use flat addressing within first 64KB
xor ax, ax
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00      ; stack grows down from bootloader address
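The segment arithmetic in the comments above can be checked with a few lines of Python. This is a sketch for experimentation (the function name `real_mode_physical` is made up here); it also models the A20 wraparound that the full bootloader later turns off.

```python
def real_mode_physical(segment: int, offset: int, a20_enabled: bool = True) -> int:
    """Translate a real-mode segment:offset pair to a physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    address = (segment << 4) + offset      # segment * 16 + offset
    if not a20_enabled:
        # With the A20 gate closed, address line 20 is held at 0, so the
        # bytes just above 1MB wrap to the bottom of memory, as on an 8086.
        address &= ~(1 << 20)
    return address

print(hex(real_mode_physical(0x0800, 0x0100)))                     # 0x8100
print(hex(real_mode_physical(0x07C0, 0x0000)))                     # 0x7c00
print(hex(real_mode_physical(0xFFFF, 0xFFFF)))                     # 0x10ffef
print(hex(real_mode_physical(0xFFFF, 0x0010, a20_enabled=False)))  # 0x0
```

The second line shows that 0x07C0:0x0000 and 0x0000:0x7C00 name the same byte. Some BIOSes are known to enter the boot sector with CS=0x07C0, which is harmless for relative jumps and calls but breaks code that relies on absolute near addresses computed with ORG 0x7C00.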

BIOS Video Services (INT 0x10)

; Print character 'A' to screen in real mode
; INT 0x10 / AH=0x0E: BIOS teletype output
mov ah, 0x0E        ; function: teletype output
mov al, 'A'         ; character to print
mov bh, 0           ; page number
mov bl, 0x07        ; foreground color (used only in graphics modes)
int 0x10

BIOS Disk Read (INT 0x13)

; Read disk sectors using BIOS Extended Read (INT 0x13 / AH=0x42)
; Uses a Disk Address Packet (DAP) structure. This needs the BIOS INT 0x13
; extensions; robust loaders first check for them with AH=0x41.

section .data
dap:
    db 0x10         ; DAP size (16 bytes)
    db 0            ; reserved
    dw 10           ; number of sectors to read
    dw 0x8000       ; offset of destination buffer
    dw 0x0000       ; segment of destination buffer (0x0000:0x8000 = physical 0x8000)
    dq 1            ; starting LBA (LBA 1 = second sector; LBA 0 is the boot sector)

section .text
    mov ah, 0x42    ; Extended Read Sectors
    mov dl, [boot_drive]  ; boot drive number (saved from DL at boot)
    mov si, dap     ; DS:SI = pointer to DAP
    int 0x13
    jc .disk_error  ; CF set on error
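The DAP is just a packed little-endian structure. A Python sketch of the same 16 bytes (the helpers `pack_dap` and `dap_destination` are hypothetical) shows the field order and that the buffer segment:offset stored in the packet, not ES, decides where the data lands.

```python
import struct

# INT 0x13 / AH=0x42 Disk Address Packet, little-endian with no padding:
# size (B), reserved (B), sector count (H), buffer offset (H),
# buffer segment (H), starting LBA (Q)
DAP_FORMAT = "<BBHHHQ"

def pack_dap(sectors: int, buf_segment: int, buf_offset: int, lba: int) -> bytes:
    """Build the same 16 bytes the db/dw/dq directives above emit."""
    return struct.pack(DAP_FORMAT, 0x10, 0, sectors, buf_offset, buf_segment, lba)

def dap_destination(dap: bytes) -> int:
    """Physical address where the BIOS writes the first sector."""
    _size, _reserved, _count, offset, segment, _lba = struct.unpack(DAP_FORMAT, dap)
    return segment * 16 + offset

dap = pack_dap(sectors=10, buf_segment=0x0000, buf_offset=0x8000, lba=1)
print(len(dap), dap.hex(" "), hex(dap_destination(dap)))
# 16 10 00 0a 00 00 80 00 00 01 00 00 00 00 00 00 00 0x8000
```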

Transitioning to Protected Mode (32-bit)

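Real mode can address only about 1MB and offers no memory protection, which is far too little for a modern kernel. The transition to 32-bit protected mode requires:

  1. Disabling interrupts (CLI), because the real-mode interrupt vector table cannot serve as a protected-mode IDT
  2. Setting up a GDT (Global Descriptor Table)
  3. Loading GDTR with LGDT
  4. Setting bit 0 of CR0 (PE, Protection Enable)
  5. A far jump to load CS with a protected-mode code selector and discard instructions prefetched under real-mode rules

Enabling the A20 line is a separate step, needed before the kernel can use memory above 1MB.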

The GDT (Global Descriptor Table)

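In protected mode, segment registers contain selectors rather than raw addresses. A selector's upper 13 bits are an index into the GDT, bit 2 chooses between the GDT and an LDT, and bits 0 and 1 hold the requested privilege level, so selector 0x08 is GDT entry 1 and 0x10 is entry 2. Each GDT entry (8 bytes) describes a segment: its base address, size limit, and access rights.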

; Minimal GDT for protected mode transition
; Entry 0: null descriptor (required by CPU)
; Entry 1: kernel code segment (selector 0x08)
; Entry 2: kernel data segment (selector 0x10)

align 8
gdt_start:
    ; Null descriptor
    dq 0

    ; Code segment: base=0, limit=4GB, 32-bit, ring 0, executable
    ; Flags byte 6: G=1 (4KB granularity), D=1 (32-bit), L=0 (not 64-bit)
    ; Access byte 5: P=1, DPL=0, S=1, Type=1010 (code, execute/read)
    dw 0xFFFF           ; limit[15:0]
    dw 0x0000           ; base[15:0]
    db 0x00             ; base[23:16]
    db 0x9A             ; access: P=1, DPL=0, S=1, Type=A (exec+read)
    db 0xCF             ; flags[7:4]=C (G=1,D=1,L=0,AVL=0), limit[19:16]=F
    db 0x00             ; base[31:24]

    ; Data segment: base=0, limit=4GB, 32-bit, ring 0, writable
    dw 0xFFFF
    dw 0x0000
    db 0x00
    db 0x92             ; access: P=1, DPL=0, S=1, Type=2 (data, read/write)
    db 0xCF
    db 0x00

gdt_end:

gdt_ptr:
    dw gdt_end - gdt_start - 1  ; limit
    dd gdt_start                 ; base: linear address (DS=0, so offset = address; 16-bit LGDT uses 24 bits)

; Transition to protected mode:
    cli                 ; disable interrupts
    lgdt [gdt_ptr]      ; load GDT
    mov eax, cr0
    or  eax, 1          ; set PE bit
    mov cr0, eax        ; enable protected mode
    jmp 0x08:pm_entry   ; far jump: flush pipeline, load CS=0x08

bits 32
pm_entry:
    ; Now in 32-bit protected mode
    mov ax, 0x10        ; data segment selector
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov ss, ax
    mov esp, 0x90000    ; set up a stack
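The byte-by-byte db/dw encoding is easy to get wrong. A Python sketch that packs the same fields (the names `gdt_descriptor` and `selector` are hypothetical) confirms that the code and data entries above are the familiar values 0x00CF9A000000FFFF and 0x00CF92000000FFFF, and that selectors 0x08 and 0x10 are just the entry index times 8.

```python
def gdt_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack an 8-byte segment descriptor into a 64-bit integer.

    limit:  20-bit limit field (counted in 4KB units when G=1)
    access: byte 5 (P, DPL, S, Type)
    flags:  high nibble of byte 6 (G, D/B, L, AVL)
    """
    if not (0 <= base <= 0xFFFF_FFFF and 0 <= limit <= 0xF_FFFF
            and 0 <= access <= 0xFF and 0 <= flags <= 0xF):
        raise ValueError("field out of range")
    return ((limit & 0xFFFF)                # bits 0-15:  limit[15:0]
            | (base & 0xFF_FFFF) << 16      # bits 16-39: base[23:0]
            | access << 40                  # bits 40-47: access byte
            | (limit >> 16) << 48           # bits 48-51: limit[19:16]
            | flags << 52                   # bits 52-55: AVL, L, D/B, G
            | (base >> 24) << 56)           # bits 56-63: base[31:24]

def selector(index: int, rpl: int = 0) -> int:
    """GDT selector: index * 8, table indicator 0 (GDT), requested privilege level."""
    return (index << 3) | rpl

code32 = gdt_descriptor(base=0, limit=0xFFFFF, access=0x9A, flags=0xC)  # G=1, D=1
data32 = gdt_descriptor(base=0, limit=0xFFFFF, access=0x92, flags=0xC)
print(hex(code32), hex(data32))               # 0xcf9a000000ffff 0xcf92000000ffff
print(code32.to_bytes(8, "little").hex(" "))  # ff ff 00 00 00 9a cf 00
print(hex(selector(1)), hex(selector(2)))     # 0x8 0x10
```

The little-endian byte dump matches the dw/db sequence in the listing: limit low word, base low word, base middle byte, access, flags plus limit high nibble, base high byte.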

Transitioning to Long Mode (64-bit)

From 32-bit protected mode, transitioning to 64-bit long mode requires:

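  1. Enable PAE (Physical Address Extension): CR4.PAE = 1
  2. Set up minimal 4-level page tables (identity map the first 2MB) and load CR3 with the PML4 address
  3. Enable long mode: set EFER.LME = 1 (IA32_EFER is MSR 0xC0000080, written with WRMSR)
  4. Enable paging: CR0.PG = 1 (the CPU then sets EFER.LMA and runs in compatibility mode)
  5. Far jump to a 64-bit code segment (a GDT entry with L=1)
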
; === Long mode setup (from 32-bit protected mode) ===
; Runs in 32-bit mode, sets up 64-bit page tables

bits 32

; Step 1: Enable PAE
mov eax, cr4
or  eax, (1 << 5)       ; PAE bit = bit 5
mov cr4, eax

; Step 2: Set up minimal page tables (identity map first 2MB using 2MB huge page)
; PML4 at 0x1000, PDP at 0x2000, PD at 0x3000
; Assumes all three 4KB tables were zeroed first (the full bootloader does this with rep stosd)

; PML4[0] → PDP at 0x2000
mov dword [0x1000], 0x2003  ; present | r/w | address = 0x2000
mov dword [0x1004], 0       ; high 32 bits of 64-bit entry

; PDP[0] → PD at 0x3000
mov dword [0x2000], 0x3003
mov dword [0x2004], 0

; PD[0] → 2MB huge page at physical 0
; PS bit (bit 7) set = 2MB page; identity map (phys 0 = virt 0)
mov dword [0x3000], 0x0083  ; present | r/w | PS (huge page) | address = 0
mov dword [0x3004], 0

; CR3 = physical address of PML4
mov eax, 0x1000
mov cr3, eax

; Step 3: Enable long mode via EFER MSR
mov ecx, 0xC0000080     ; IA32_EFER MSR number
rdmsr
or  eax, (1 << 8)       ; LME = bit 8
wrmsr

; Step 4: Enable paging (and protected mode stays on)
mov eax, cr0
or  eax, (1 << 31)      ; PG = bit 31
mov cr0, eax
; At this point, CPU is in compatibility mode (IA-32e, LMA=1, CS.L=0)

; Step 5: Far jump to 64-bit code segment
; We need a 64-bit code segment in the GDT
; (add to GDT: 64-bit code segment with L=1)
jmp 0x18:long_mode_entry  ; 0x18 = 64-bit code segment selector

bits 64
long_mode_entry:
    ; Now in 64-bit long mode!
    ; All 64-bit registers available
    ; Virtual address space is active (identity mapped for first 2MB)

    ; Set data segments
    mov ax, 0x20        ; 64-bit data segment selector
    mov ds, ax
    mov es, ax
    mov ss, ax
    ; FS and GS for thread-local storage (set to 0 for now)
    xor ax, ax
    mov fs, ax
    mov gs, ax

    ; Set up stack
    mov rsp, 0x9F000    ; near the top of conventional memory, below the EBDA

    ; Jump to the kernel
    jmp kernel_main
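A quick way to convince yourself that three entries are enough is to simulate the hardware page walk. The Python sketch below models physical memory as a dictionary (an assumption for illustration; `translate` and `memory` are hypothetical names) and follows CR3, then PML4, PDPT, and PD exactly as the CPU does for a 2MB page.

```python
from typing import Optional

P, RW, PS = 1 << 0, 1 << 1, 1 << 7
TABLE_MASK = 0x000F_FFFF_FFFF_F000    # bits 12-51: address of the next table
PAGE_2M_MASK = 0x000F_FFFF_FFE0_0000  # bits 21-51: base of a 2MB page

# Physical memory as {address of an 8-byte entry: value}. Anything missing
# reads as 0 (not present), mirroring the zeroed tables in the bootloader.
memory = {
    0x1000: 0x2000 | P | RW,        # PML4[0] -> PDPT at 0x2000  (0x2003)
    0x2000: 0x3000 | P | RW,        # PDPT[0] -> PD at 0x3000    (0x3003)
    0x3000: 0x0000 | P | RW | PS,   # PD[0]   -> 2MB page at 0   (0x0083)
}

def translate(vaddr: int, cr3: int = 0x1000) -> Optional[int]:
    """Walk PML4 -> PDPT -> PD; return the physical address or None on a fault."""
    table = cr3
    for level, shift in enumerate((39, 30, 21)):   # PML4, PDPT, PD index bits
        index = (vaddr >> shift) & 0x1FF           # 9 index bits per level
        entry = memory.get(table + index * 8, 0)
        if not entry & P:
            return None                            # page fault
        if level == 2 and entry & PS:              # 2MB page ends the walk
            return (entry & PAGE_2M_MASK) | (vaddr & 0x1F_FFFF)
        table = entry & TABLE_MASK
    return None  # 4KB pages (a fourth, PT level) are not modelled here

for va in (0x7C00, 0x8000, 0xB8000, 0x1F_FFFF, 0x20_0000):
    pa = translate(va)
    print(hex(va), "->", hex(pa) if pa is not None else "page fault")
```

Every address in the first 2MB (the boot sector, the kernel at 0x8000, the VGA buffer, the stacks at 0x90000 and 0x9F000) maps to itself, while 0x200000 faults because PD[1] is still zero.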

The 64-bit GDT Extension

; Add these entries to the GDT for 64-bit mode:
; Entry 3 (selector 0x18): 64-bit kernel code
    dw 0x0000           ; limit (ignored in 64-bit)
    dw 0x0000           ; base (ignored in 64-bit)
    db 0x00
    db 0x9A             ; P=1, DPL=0, S=1, Type=A (exec+read)
    db 0x20             ; flags: G=0, D=0, L=1 (64-bit!), L is bit 5 of this byte
    db 0x00

; Entry 4 (selector 0x20): 64-bit kernel data
    dw 0x0000
    dw 0x0000
    db 0x00
    db 0x92             ; P=1, DPL=0, S=1, Type=2 (data, r/w)
    db 0x00
    db 0x00

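⚙️ How It Works: The critical field in the 64-bit code segment is the L bit (bit 5 of the flags byte, or bit 53 of the full 8-byte entry). When L=1 and the CPU is in IA-32e mode (EFER.LMA=1), the segment is a 64-bit code segment; the D bit must then be 0, because the combination L=1, D=1 is reserved. When the far jump loads CS with this selector, the CPU leaves compatibility mode and begins executing 64-bit code. Base and limit are ignored for 64-bit code segments, which is why this entry can leave them zero.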
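The rule above amounts to a small decision table. Here is a Python sketch (the function name, string labels, and the example register values are invented for illustration) that classifies the execution mode from CR0, EFER, and the descriptor currently loaded in CS, using the descriptor values from this chapter's GDT.

```python
CR0_PE = 1 << 0       # protection enable
EFER_LMA = 1 << 10    # long mode active: set by the CPU once CR0.PG=1 with EFER.LME=1
DESC_L = 1 << 53      # L bit: 64-bit code segment
DESC_D = 1 << 54      # D bit: default operand size is 32 bits

def execution_mode(cr0: int, efer: int, cs_descriptor: int) -> str:
    """Classify the mode the CPU executes code in (sketch; ignores virtual-8086 mode)."""
    if not cr0 & CR0_PE:
        return "real mode"
    l_bit = bool(cs_descriptor & DESC_L)
    d_bit = bool(cs_descriptor & DESC_D)
    width = 32 if d_bit else 16
    if not efer & EFER_LMA:
        return f"{width}-bit protected mode"       # L is not used outside IA-32e mode
    if l_bit and d_bit:
        return "reserved (L=1 requires D=0)"
    if l_bit:
        return "64-bit mode"
    return f"{width}-bit compatibility mode"

CODE32 = 0x00CF9A000000FFFF   # selector 0x08 in this chapter's GDT
CODE64 = 0x00209A0000000000   # selector 0x18
print(execution_mode(0x00000011, 0x000, CODE32))  # 32-bit protected mode
print(execution_mode(0x80000011, 0x500, CODE32))  # 32-bit compatibility mode
print(execution_mode(0x80000011, 0x500, CODE64))  # 64-bit mode
```

The middle line is the state right after CR0.PG is set and before jmp 0x18:long_mode_entry: paging and long mode are active, but CS still holds the 32-bit descriptor.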


The Complete MinOS Bootloader

Here is the full 512-byte bootloader, annotated section by section:

; minOS_boot.asm — Complete MinOS NASM Bootloader
; Builds to exactly 512 bytes.
; Build: nasm -f bin minOS_boot.asm -o boot.bin
;        cat boot.bin kernel.bin > minOS.img
;        truncate -s 32768 minOS.img   (the DAP reads 32 sectors past the boot sector)
;        qemu-system-x86_64 -drive format=raw,file=minOS.img

[BITS 16]
[ORG 0x7C00]           ; BIOS loads us at this address

; ============= Stage 1: Setup (Real Mode 16-bit) =============

boot_start:
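    ; Disable interrupts while SS:SP is being changed
    cli
    cld                 ; string instructions (lodsb, rep stosd) count upward

    ; Set up segment registers for flat real-mode addressing
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00      ; stack grows down from just below the bootloader
    sti                 ; BIOS disk and video services may rely on interrupts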

    ; Save boot drive number (BIOS puts it in DL)
    mov [boot_drive], dl

    ; Print boot message
    mov si, msg_booting
    call print_string

    ; Load kernel sectors from disk (requires the INT 13h extensions)
    ; The kernel starts at LBA 1, the sector right after this boot sector
    ; Target buffer: 0x0000:0x8000 (physical 0x8000), as given in the DAP
    mov ah, 0x42        ; INT 13h Extended Read
    mov dl, [boot_drive]
    mov si, dap
    int 0x13
    jc  disk_error

    ; Print "OK"
    mov si, msg_ok
    call print_string

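; ============= Enable A20 line =============
; With A20 disabled, address bit 20 is forced to 0, so each odd megabyte
; aliases the even one below it. Enable it before touching memory above 1MB.
; Method: "Fast A20" via system control port 0x92 (common, but not universal)

    in  al, 0x92
    or  al, 0x02        ; set bit 1 (Fast A20 enable)
    and al, ~0x01       ; clear bit 0: writing 1 there triggers a fast CPU reset
    out 0x92, al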

; ============= Load GDT and enter protected mode =============

    cli                 ; keep interrupts off: no protected-mode IDT exists yet
    lgdt [gdt_ptr_32]

    mov eax, cr0
    or  eax, 1
    mov cr0, eax

    jmp 0x08:pm_entry_32   ; flush prefetch, load CS=kernel code

; ============= Real Mode Utilities =============

print_string:
    ; SI = pointer to null-terminated string
    ; Uses BIOS teletype (INT 0x10/AH=0x0E)
    lodsb               ; load byte at [SI], increment SI
    test al, al
    jz .done
    mov ah, 0x0E
    mov bh, 0
    int 0x10
    jmp print_string
.done:
    ret

disk_error:
    mov si, msg_error
    call print_string
.halt:
    cli
    hlt
    jmp .halt

; ============= Data =============

msg_booting db "Booting MinOS...", 13, 10, 0
msg_ok      db "Kernel loaded.", 13, 10, 0
msg_error   db "DISK ERROR!", 13, 10, 0
boot_drive  db 0

; Disk Address Packet for Extended Read
dap:
    db 0x10     ; packet size
    db 0        ; reserved
    dw 32       ; read 32 sectors (16KB of kernel)
    dw 0x8000   ; buffer offset
    dw 0x0000   ; buffer segment (0x0000:0x8000 = physical 0x8000)
    dq 1        ; starting LBA 1: the second sector of the disk

; 32-bit GDT (for protected mode)
align 4
gdt_32:
    dq 0                ; null descriptor
    ; Code: 0x08
    dw 0xFFFF, 0x0000
    db 0x00, 0x9A, 0xCF, 0x00
    ; Data: 0x10
    dw 0xFFFF, 0x0000
    db 0x00, 0x92, 0xCF, 0x00
    ; 64-bit Code: 0x18
    dw 0x0000, 0x0000
    db 0x00, 0x9A, 0x20, 0x00
    ; 64-bit Data: 0x20
    dw 0x0000, 0x0000
    db 0x00, 0x92, 0x00, 0x00
gdt_32_end:

gdt_ptr_32:
    dw gdt_32_end - gdt_32 - 1
    dd gdt_32

; ============= Protected Mode Entry (32-bit) =============
[BITS 32]
pm_entry_32:
    mov ax, 0x10
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov fs, ax
    mov gs, ax
    mov esp, 0x90000

    ; Setup page tables for long mode transition
    ; PML4 at 0x1000, PDPT at 0x2000, PD at 0x3000
    ; Zero all three tables first
    mov edi, 0x1000
    xor eax, eax
    mov ecx, 0x3000 / 4     ; 3 pages × 4096 bytes / 4 bytes per stosd
    rep stosd

    ; PML4[0] → PDPT at 0x2000
    mov dword [0x1000], 0x2003    ; P|R/W
    ; PDPT[0] → PD at 0x3000
    mov dword [0x2000], 0x3003
    ; PD[0] → 2MB identity page (physical 0)
    mov dword [0x3000], 0x0083    ; P|R/W|PS(2MB)

    ; Enable PAE
    mov eax, cr4
    or  eax, (1 << 5)
    mov cr4, eax

    ; CR3 = PML4
    mov eax, 0x1000
    mov cr3, eax

    ; EFER.LME = 1
    mov ecx, 0xC0000080
    rdmsr
    or  eax, (1 << 8)
    wrmsr

    ; Enable paging → enter compatibility mode
    mov eax, cr0
    or  eax, (1 << 31)
    mov cr0, eax

    ; Far jump to 64-bit code segment
    jmp 0x18:lm_entry_64

[BITS 64]
lm_entry_64:
    ; 64-bit long mode active!
    mov ax, 0x20
    mov ds, ax
    mov es, ax
    mov ss, ax
    xor ax, ax
    mov fs, ax
    mov gs, ax
    mov rsp, 0x90000

    ; Jump to the kernel loaded at physical 0x8000 (inside the identity-mapped 2MB).
    ; Assumes kernel.bin is flat 64-bit code assembled with ORG 0x8000 whose
    ; entry point is its first byte.
    jmp 0x8000

; ============= Boot Signature =============
; Pad to 510 bytes, then add 0x55AA signature
times 510 - ($ - $$) db 0
dw 0xAA55

VGA Text Mode Output

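Before setting up interrupts or a proper console, you can write directly to the VGA text buffer at physical address 0xB8000. It lies inside the identity-mapped first 2MB, so the 64-bit routines below can use that address unchanged.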

; VGA text mode: 80×25 characters, 2 bytes per character
; Byte 0: ASCII character
; Byte 1: attribute (high nibble = background, low nibble = foreground)
; Colors: 0=black, 7=light gray, 0xF=bright white, 0x4=red, 0x2=green

VGA_BASE  equ 0xB8000
COLS      equ 80
ROWS      equ 25

; Write one character cell (64-bit code)
; RCX = column, RDX = row, AL = character, AH = attribute; clobbers RDX
vga_putchar_at:
    ; offset = (row * 80 + col) * 2
    imul rdx, rdx, COLS
    add  rdx, rcx
    shl  rdx, 1                     ; 2 bytes per cell
    ; Write to VGA buffer (little-endian: AL lands on the even byte, AH on the odd)
    mov word [VGA_BASE + rdx], ax
    ret

; Clear screen (fill every cell with a space, light gray on black)
; Clobbers RAX, RCX, RDI
vga_clear:
    cld                             ; rep stosq must count upward
    mov rdi, VGA_BASE
    ; 80*25 = 2000 cells = 4000 bytes = 500 qwords
    mov rcx, (COLS * ROWS * 2) / 8
    ; Space (0x20) with attribute 0x07, repeated in all four words of RAX
    mov rax, 0x0720072007200720
    rep stosq
    ret
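The offset arithmetic and attribute packing are easy to check off-target. This Python sketch (helper names are hypothetical; it only computes values and never touches hardware) mirrors vga_putchar_at and the fill pattern that vga_clear stores.

```python
VGA_BASE, COLS, ROWS = 0xB8000, 80, 25

def vga_cell_address(row: int, col: int) -> int:
    """Address of the cell at (row, col): VGA_BASE + (row * 80 + col) * 2."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("position outside the 80x25 screen")
    return VGA_BASE + (row * COLS + col) * 2

def vga_cell_word(char: str, fg: int, bg: int = 0) -> int:
    """16-bit cell value: attribute (bg << 4 | fg) in the high byte, ASCII in the low."""
    return ((bg << 4 | fg) << 8) | ord(char)

def fill_qword(cell: int) -> int:
    """Repeat a 16-bit cell value four times, the pattern vga_clear stores with rep stosq."""
    return cell * 0x0001_0001_0001_0001

print(hex(vga_cell_address(24, 79)))                              # 0xb8f9e (last cell)
print(hex(vga_cell_word("A", fg=0xF, bg=0x4)))                    # 0x4f41: white 'A' on red
print(vga_cell_word("A", fg=0xF, bg=0x4).to_bytes(2, "little"))   # b'AO': char byte first
print(hex(fill_qword(vga_cell_word(" ", fg=0x7))))                # 0x720072007200720
```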

QEMU Setup and Debugging

# Build the MinOS boot image
nasm -f bin minOS_boot.asm -o boot.bin
nasm -f bin kernel.asm -o kernel.bin  # flat 64-bit kernel, assembled with ORG 0x8000

# Combine: boot sector + kernel
cat boot.bin kernel.bin > minOS.img
# The DAP reads 32 sectors after the boot sector, so the image needs at least
# 512 + 32*512 = 16896 bytes
truncate -s 32768 minOS.img   # pad to 32KB

# Run in QEMU (standard display)
qemu-system-x86_64 -drive format=raw,file=minOS.img

# Run with curses display (terminal-based)
qemu-system-x86_64 -drive format=raw,file=minOS.img -display curses

# Run with GDB debugging support (-s: GDB port 1234, -S: start paused)
qemu-system-x86_64 -drive format=raw,file=minOS.img -s -S

# In a separate terminal, attach GDB:
gdb
(gdb) target remote localhost:1234
(gdb) set architecture i8086        # real mode initially
(gdb) break *0x7C00                 # breakpoint at bootloader entry
(gdb) continue
(gdb) x/20i 0x7C00                  # examine bootloader instructions

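🛠️ Lab Exercise: Build and run the MinOS bootloader in QEMU and watch the "Booting MinOS..." message appear in the QEMU window. Then attach GDB with -s -S, set a breakpoint at 0x7C00, and single-step through the real-mode to protected-mode to long-mode transitions. Watch CR0 change (bit 0 set for protected mode, bit 31 set for paging); the QEMU monitor command info registers also shows CR0, CR3, CR4, and EFER. Once the CPU is executing 64-bit code, switch GDB's disassembler with set architecture i386:x86-64.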


Summary

The x86-64 boot process is a journey through three CPU modes: real mode (16-bit, 1MB), protected mode (32-bit, 4GB with segmentation), and long mode (64-bit, 256TB of virtual address space with 4-level paging). Each transition requires specific hardware setup: a GDT for protected mode, and page tables plus an L=1 code segment for long mode. The bootloader is the most constrained code you will ever write: at most 510 bytes to load a kernel and hand off control. Understanding this sequence explains the assumptions that every layer above it inherits.

🔄 Check Your Understanding:
  1. Why is the magic signature 0x55AA at the very end (bytes 510–511) of the 512-byte MBR?
  2. What is the A20 line, and why must it be enabled before accessing memory above 1MB?
  3. Which bit of CR0 enables protected mode, and which bit enables paging?
  4. Why is a far jump (jmp 0x08:pm_entry_32) required after setting CR0.PE=1?
  5. What does the L bit in a GDT code segment descriptor do?