Chapter 28: Bare Metal Programming
Before the Operating System
At power-on, an x86-64 CPU starts in 16-bit real mode, fetches its first instruction from physical address 0xFFFFFFF0, and runs BIOS code. The BIOS eventually loads the first sector of the bootable disk to address 0x7C00 and jumps there. The code at 0x7C00 is yours. The BIOS has handed over control, although its real-mode interrupt services remain callable. The OS is not yet loaded. No operating-system interrupt handlers are set up. No virtual memory. No stack you can rely on (until you create one). No libc. No C runtime. Just a CPU, some registers, a few kilobytes of BIOS data structures in low memory, and 512 bytes in which to change everything.
This is the most fundamental level of assembly programming. Everything the operating system provides — virtual memory, system calls, file systems, the C runtime — was bootstrapped by code that started exactly here.
The x86-64 Boot Sequence
Step 1: Power-On Reset
When the CPU starts after a power-on or reset:
- All registers are in defined states (mostly zero, except CS=0xF000, EIP=0xFFF0)
- The CPU is in 16-bit real mode
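- On the original 8086, the first instruction would be at CS × 16 + IP = 0xF000 × 16 + 0xFFF0 = 0xFFFF0. On the 80386 and later, including x86-64, the hidden CS base is 0xFFFF0000 after reset, so the first fetch comes from physical address 0xFFFF0000 + 0xFFF0 = 0xFFFFFFF0
- This address is mapped to the BIOS ROM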
Step 2: BIOS POST and Initialization
The BIOS (Basic Input/Output System), or UEFI firmware running in legacy (CSM) mode:
1. Tests memory (POST — Power-On Self Test)
2. Initializes hardware (chipset, interrupts, clocks)
3. Sets up real-mode interrupt handlers (INT 0x10 for video, INT 0x13 for disk, etc.)
4. Finds a bootable device (checks MBR signature 0x55AA at offset 510)
5. Loads the first 512 bytes (MBR) from the boot device to physical address 0x7C00
6. Jumps to 0x7C00
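Your bootloader code is now running at 0x7C00. The BIOS has handed over control, but its interrupt services (INT 0x10, INT 0x13, and so on) remain available for as long as the CPU stays in real mode.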
Step 3: MBR Bootloader (Stage 1)
The Master Boot Record is exactly 512 bytes. The last two bytes must be the magic signature 0x55, 0xAA (at offsets 510 and 511) or the BIOS will not boot it. You have 510 bytes for actual code and static data.
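A host-side check can make the signature rule concrete. The following Python sketch is not part of the MinOS sources; the function name `is_bootable_sector` and the dummy sector are illustrative assumptions. It tests only the sector size and the two signature bytes described above, not anything else a particular BIOS might examine.

```python
SECTOR_SIZE = 512          # an MBR is exactly one 512-byte sector
SIGNATURE = b"\x55\xAA"    # bytes at offsets 510 and 511, in on-disk order


def is_bootable_sector(sector: bytes) -> bool:
    """Return True if `sector` is 512 bytes long and ends with 0x55, 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == SIGNATURE


# Hypothetical stand-in for a real boot sector: 510 zero bytes of "code",
# followed by the signature written as the little-endian word 0xAA55.
dummy_sector = bytes(510) + (0xAA55).to_bytes(2, "little")

print(is_bootable_sector(dummy_sector))   # True
print(is_bootable_sector(bytes(512)))     # False: signature missing
```

Writing the signature as the 16-bit word 0xAA55 and storing it little-endian produces the byte 0x55 at offset 510 and 0xAA at offset 511, which is the order the BIOS checks.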
In 510 bytes, a typical stage-1 bootloader:
1. Sets up the segment registers and the stack, and saves the boot drive number that the BIOS passes in DL
2. Prints a loading message using BIOS INT 0x10
3. Reads additional sectors (the kernel) from disk using BIOS INT 0x13
4. Prepares for mode transitions
5. Jumps to stage-2 code
Real Mode (16-bit)
In real mode, the CPU operates as a very fast 16-bit 8086:
- Registers default to 16 bits (AX, BX, CX, DX, SI, DI, SP, BP); 32-bit registers such as EAX remain reachable through an operand-size prefix, which the mode-switch code later in this chapter relies on
- Memory accessed via segment:offset addressing
- Physical address = segment register × 16 + offset
- Maximum addressable memory: 1MB (20-bit address space)
- Can access BIOS interrupts
; Real mode addressing example
; Physical address = CS:IP = 0x0000:0x7C00 = 0x7C00
; Physical address = DS:SI where DS=0x0800, SI=0x0100
; → 0x0800 × 16 + 0x0100 = 0x8100
; Segment registers in real mode: CS, DS, ES, FS, GS, SS
; Set DS=0 to use flat addressing within first 64KB
xor ax, ax
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00 ; stack grows down from bootloader address
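The segment:offset arithmetic from the comments above can be checked on the host. This is a small Python sketch, not bootloader code; the helper name `real_mode_phys` is an assumption made for illustration.

```python
def real_mode_phys(segment: int, offset: int) -> int:
    """Physical address of a real-mode segment:offset pair (segment * 16 + offset)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


# The examples from the listing above.
print(hex(real_mode_phys(0x0000, 0x7C00)))   # 0x7c00
print(hex(real_mode_phys(0x0800, 0x0100)))   # 0x8100

# Different pairs can alias the same byte: 0x07C0:0x0000 is also 0x7C00.
print(hex(real_mode_phys(0x07C0, 0x0000)))   # 0x7c00
```

The aliasing in the last line is one reason to load the segment registers with known values before relying on any offset.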
BIOS Video Services (INT 0x10)
; Print character 'A' to screen in real mode
; INT 0x10 / AH=0x0E: BIOS teletype output
mov ah, 0x0E ; function: teletype output
mov al, 'A' ; character to print
mov bh, 0 ; page number
mov bl, 0x07 ; foreground color (honored only in graphics modes)
int 0x10
BIOS Disk Read (INT 0x13)
; Read disk sectors using BIOS Extended Read (INT 0x13 / AH=0x42)
; Uses a Disk Address Packet (DAP) structure
section .data
dap:
db 0x10 ; DAP size (16 bytes)
db 0 ; reserved
dw 10 ; number of sectors to read
dw 0x8000 ; offset of destination buffer
dw 0x0000 ; segment of destination buffer (0x0000:0x8000 = physical 0x08000)
dq 1 ; starting LBA (sector 1 = second sector, 0-indexed)
section .text
mov ah, 0x42 ; Extended Read Sectors
mov dl, [boot_drive] ; boot drive number (saved from DL at boot)
mov si, dap ; DS:SI = pointer to DAP
int 0x13
jc .disk_error ; CF set on error
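The 16-byte Disk Address Packet layout can be reproduced with Python's `struct` module to confirm the field sizes and byte order. This is a host-side sketch only; `build_dap` is a hypothetical helper, and the values mirror the `dap` structure in the listing above.

```python
import struct

# Little-endian, no padding: size (1), reserved (1), sector count (2),
# buffer offset (2), buffer segment (2), starting LBA (8) = 16 bytes.
DAP_FORMAT = "<BBHHHQ"


def build_dap(sector_count: int, buf_offset: int, buf_segment: int, start_lba: int) -> bytes:
    """Pack a Disk Address Packet for INT 0x13 / AH=0x42."""
    return struct.pack(DAP_FORMAT, 0x10, 0, sector_count, buf_offset, buf_segment, start_lba)


dap = build_dap(sector_count=10, buf_offset=0x8000, buf_segment=0x0000, start_lba=1)
print(len(dap))     # 16
print(dap.hex())    # 10000a00008000000100000000000000
```

The first byte of the packet (0x10) is its own length, which is how the BIOS tells this 16-byte form apart from longer packet variants.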
Transitioning to Protected Mode (32-bit)
Real mode's 1MB address space is insufficient for loading a kernel. The transition to 32-bit protected mode requires:
- Setting up a GDT (Global Descriptor Table)
- Loading GDTR with LGDT
- Setting bit 0 of CR0 (PE, Protection Enable)
- A far jump to flush the instruction pipeline and load the new CS
The GDT (Global Descriptor Table)
In protected mode, segment registers contain selectors — indices into the GDT, not raw addresses. Each GDT entry (8 bytes) describes a segment: its base address, size limit, and access rights.
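To see how base, limit, and access rights are scattered across the 8 bytes, it can help to pack a descriptor on the host. The Python sketch below is illustrative; `make_descriptor` and `selector` are hypothetical helper names, and the inputs are the flat ring-0 code and data segments this chapter uses (base 0, 20-bit limit 0xFFFFF, access 0x9A or 0x92, flags nibble 0xC).

```python
def make_descriptor(base: int, limit: int, access: int, flags: int) -> bytes:
    """Pack a 32-bit base, 20-bit limit, access byte, and 4-bit flags into 8 bytes."""
    if not (0 <= base <= 0xFFFFFFFF and 0 <= limit <= 0xFFFFF):
        raise ValueError("base must fit in 32 bits and limit in 20 bits")
    if not (0 <= access <= 0xFF and 0 <= flags <= 0xF):
        raise ValueError("access must fit in 8 bits and flags in 4 bits")
    raw = (
        (limit & 0xFFFF)                   # bits 0-15:  limit[15:0]
        | ((base & 0xFFFFFF) << 16)        # bits 16-39: base[23:0]
        | (access << 40)                   # bits 40-47: access byte
        | (((limit >> 16) & 0xF) << 48)    # bits 48-51: limit[19:16]
        | (flags << 52)                    # bits 52-55: G, D, L, AVL
        | (((base >> 24) & 0xFF) << 56)    # bits 56-63: base[31:24]
    )
    return raw.to_bytes(8, "little")


def selector(index: int, rpl: int = 0) -> int:
    """Selector for GDT entry `index`: the index sits above 3 low bits (TI=0, RPL)."""
    return (index << 3) | rpl


code = make_descriptor(base=0, limit=0xFFFFF, access=0x9A, flags=0xC)
data = make_descriptor(base=0, limit=0xFFFFF, access=0x92, flags=0xC)
print(code.hex())                              # ffff0000009acf00
print(data.hex())                              # ffff00000092cf00
print(hex(selector(1)), hex(selector(2)))      # 0x8 0x10
```

Because each entry is 8 bytes and the low 3 bits of a selector hold other fields, entry 1 is selected by 0x08 and entry 2 by 0x10.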
; Minimal GDT for protected mode transition
; Entry 0: null descriptor (required by CPU)
; Entry 1: kernel code segment (selector 0x08)
; Entry 2: kernel data segment (selector 0x10)
align 8
gdt_start:
; Null descriptor
dq 0
; Code segment: base=0, limit=4GB, 32-bit, ring 0, executable
; Flags byte 6: G=1 (4KB granularity), D=1 (32-bit), L=0 (not 64-bit)
; Access byte 5: P=1, DPL=0, S=1, Type=1010 (code, execute/read)
dw 0xFFFF ; limit[15:0]
dw 0x0000 ; base[15:0]
db 0x00 ; base[23:16]
db 0x9A ; access: P=1, DPL=0, S=1, Type=A (exec+read)
db 0xCF ; flags[7:4]=C (G=1,D=1,L=0,AVL=0), limit[19:16]=F
db 0x00 ; base[31:24]
; Data segment: base=0, limit=4GB, 32-bit, ring 0, writable
dw 0xFFFF
dw 0x0000
db 0x00
db 0x92 ; access: P=1, DPL=0, S=1, Type=2 (data, read/write)
db 0xCF
db 0x00
gdt_end:
gdt_ptr:
dw gdt_end - gdt_start - 1 ; limit
dd gdt_start ; base (32-bit in real mode context)
; Transition to protected mode:
cli ; disable interrupts
lgdt [gdt_ptr] ; load GDT
mov eax, cr0
or eax, 1 ; set PE bit
mov cr0, eax ; enable protected mode
jmp 0x08:pm_entry ; far jump: flush pipeline, load CS=0x08
bits 32
pm_entry:
; Now in 32-bit protected mode
mov ax, 0x10 ; data segment selector
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
mov esp, 0x90000 ; set up a stack
Transitioning to Long Mode (64-bit)
From 32-bit protected mode, transitioning to 64-bit long mode requires:
- Enable PAE (Physical Address Extension): CR4.PAE = 1
- Set up minimal page tables (identity map the first 2MB)
- Enable long mode: set EFER.LME = 1 (via WRMSR)
- Enable paging: CR0.PG = 1
- Far jump to 64-bit code segment
; === Long mode setup (from 32-bit protected mode) ===
; Runs in 32-bit mode, sets up 64-bit page tables
bits 32
; Step 1: Enable PAE
mov eax, cr4
or eax, (1 << 5) ; PAE bit = bit 5
mov cr4, eax
; Step 2: Set up minimal page tables (identity map first 2MB using 2MB huge page)
; PML4 at 0x1000, PDP at 0x2000, PD at 0x3000
; PML4[0] → PDP at 0x2000
mov dword [0x1000], 0x2003 ; present | r/w | address = 0x2000
mov dword [0x1004], 0 ; high 32 bits of 64-bit entry
; PDP[0] → PD at 0x3000
mov dword [0x2000], 0x3003
mov dword [0x2004], 0
; PD[0] → 2MB huge page at physical 0
; PS bit (bit 7) set = 2MB page; identity map (phys 0 = virt 0)
mov dword [0x3000], 0x0083 ; present | r/w | PS (huge page) | address = 0
mov dword [0x3004], 0
; CR3 = physical address of PML4
mov eax, 0x1000
mov cr3, eax
; Step 3: Enable long mode via EFER MSR
mov ecx, 0xC0000080 ; IA32_EFER MSR number
rdmsr
or eax, (1 << 8) ; LME = bit 8
wrmsr
; Step 4: Enable paging (and protected mode stays on)
mov eax, cr0
or eax, (1 << 31) ; PG = bit 31
mov cr0, eax
; At this point, CPU is in compatibility mode (IA-32e, LMA=1, CS.L=0)
; Step 5: Far jump to 64-bit code segment
; We need a 64-bit code segment in the GDT
; (add to GDT: 64-bit code segment with L=1)
jmp 0x18:long_mode_entry ; 0x18 = 64-bit code segment selector
bits 64
long_mode_entry:
; Now in 64-bit long mode!
; All 64-bit registers available
; Virtual address space is active (identity mapped for first 2MB)
; Set data segments
mov ax, 0x20 ; 64-bit data segment selector
mov ds, ax
mov es, ax
mov ss, ax
; FS and GS for thread-local storage (set to 0 for now)
xor ax, ax
mov fs, ax
mov gs, ax
; Set up stack
mov rsp, 0x9F000 ; top of available conventional memory
; Jump to the kernel
jmp kernel_main
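The page-table constants in the listing above (0x2003, 0x3003, 0x0083) are a physical address combined with flag bits. The Python sketch below reproduces them and shows how a virtual address is split into table indices under 4-level paging with a 2MB page. The helper names `table_entry` and `split_virtual_2mb` are assumptions for illustration.

```python
PRESENT = 1 << 0     # P: entry is valid
WRITABLE = 1 << 1    # R/W: writes allowed
PAGE_SIZE = 1 << 7   # PS: in a PD entry, maps a 2MB page directly


def table_entry(phys_addr: int, flags: int) -> int:
    """Combine a 4KB-aligned physical address with flag bits."""
    if phys_addr & 0xFFF:
        raise ValueError("address must be 4KB aligned")
    return phys_addr | flags


def split_virtual_2mb(vaddr: int) -> dict:
    """Split a virtual address into PML4/PDPT/PD indices and a 2MB-page offset."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,
        "pdpt": (vaddr >> 30) & 0x1FF,
        "pd": (vaddr >> 21) & 0x1FF,
        "offset": vaddr & 0x1FFFFF,
    }


print(hex(table_entry(0x2000, PRESENT | WRITABLE)))          # 0x2003
print(hex(table_entry(0x3000, PRESENT | WRITABLE)))          # 0x3003
print(hex(table_entry(0x0, PRESENT | WRITABLE | PAGE_SIZE)))  # 0x83

print(split_virtual_2mb(0x8000))     # all indices 0: covered by the identity map
print(split_virtual_2mb(0x200000))   # pd index 1: not mapped by this setup
```

Only entry 0 of each table is filled in, so any address at or above 0x200000 selects an empty PD entry and would fault. That is why the stack and the kernel load address both sit below 2MB.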
The 64-bit GDT Extension
; Add these entries to the GDT for 64-bit mode:
; Entry 3 (selector 0x18): 64-bit kernel code
dw 0x0000 ; limit (ignored in 64-bit)
dw 0x0000 ; base (ignored in 64-bit)
db 0x00
db 0x9A ; P=1, DPL=0, S=1, Type=A (exec+read)
db 0x20 ; flags: G=0, D=0, L=1 (64-bit!) — bit 5
db 0x00
; Entry 4 (selector 0x20): 64-bit kernel data
dw 0x0000
dw 0x0000
db 0x00
db 0x92 ; P=1, DPL=0, S=1, Type=2 (data, r/w)
db 0x00
db 0x00
⚙️ How It Works: The critical field in the 64-bit code segment is the L bit (bit 5 of the flags byte, or bit 53 of the full 8-byte entry). When L=1 and the CPU is in IA-32e mode (LMA=1), the segment is a 64-bit code segment. When the far jump loads CS with this selector, the CPU enters 64-bit mode and 64-bit instructions become available.
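The bit positions can be verified by decoding the raw descriptor bytes on the host. This Python sketch is illustrative; `decode_flags` is a hypothetical helper, and the two descriptors are copied from the 32-bit and 64-bit code segment entries shown earlier.

```python
def decode_flags(descriptor: bytes) -> dict:
    """Extract G, D, L, AVL from byte 6 (the flags byte) of an 8-byte descriptor."""
    if len(descriptor) != 8:
        raise ValueError("a GDT descriptor is exactly 8 bytes")
    flags_byte = descriptor[6]
    return {
        "G": bool(flags_byte & 0x80),    # bit 7: 4KB granularity
        "D": bool(flags_byte & 0x40),    # bit 6: 32-bit default operand size
        "L": bool(flags_byte & 0x20),    # bit 5: 64-bit code segment
        "AVL": bool(flags_byte & 0x10),  # bit 4: available to software
    }


code32 = bytes([0xFF, 0xFF, 0x00, 0x00, 0x00, 0x9A, 0xCF, 0x00])
code64 = bytes([0x00, 0x00, 0x00, 0x00, 0x00, 0x9A, 0x20, 0x00])

print(decode_flags(code32))   # G and D set, L clear
print(decode_flags(code64))   # only L set

# Bit 5 of byte 6 is bit 6*8 + 5 = 53 of the little-endian 64-bit value.
print((int.from_bytes(code64, "little") >> 53) & 1)   # 1
```

In the 64-bit entry D is 0 while L is 1, matching the flags byte 0x20 used for selector 0x18.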
The Complete MinOS Bootloader
Here is the full 512-byte bootloader, annotated line by line:
; minOS_boot.asm — Complete MinOS NASM Bootloader
; Builds to exactly 512 bytes.
; Build: nasm -f bin minOS_boot.asm -o boot.bin
; cat boot.bin kernel.bin > minOS.img
; qemu-system-x86_64 -drive format=raw,file=minOS.img
[BITS 16]
[ORG 0x7C00] ; BIOS loads us at this address
; ============= Stage 1: Setup (Real Mode 16-bit) =============
boot_start:
; Clear interrupts during setup
cli
cld ; string instructions (LODSB, STOSD) must count upward
; Set up segment registers for flat real-mode addressing
xor ax, ax
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00 ; stack below bootloader
; Save boot drive number (BIOS puts it in DL)
mov [boot_drive], dl
; Print boot message
mov si, msg_booting
call print_string
; Load kernel sectors from disk
; Our kernel starts at LBA 1 (LBA is 0-indexed, so this is the second sector on disk)
; Target buffer: 0x0000:0x8000 (physical 0x8000)
mov ah, 0x42 ; INT 13h Extended Read
mov dl, [boot_drive]
mov si, dap
int 0x13
jc disk_error
; Print "OK"
mov si, msg_ok
call print_string
; ============= Enable A20 line =============
; The A20 line must be enabled to access memory above 1MB
; Method: Fast A20 via port 0x92
in al, 0x92
or al, 0x02 ; set bit 1 (Fast A20 enable)
and al, ~0x01 ; don't reset
out 0x92, al
; ============= Load GDT and enter protected mode =============
cli ; BIOS services may have re-enabled interrupts; no IDT exists yet
lgdt [gdt_ptr_32]
mov eax, cr0
or eax, 1
mov cr0, eax
jmp 0x08:pm_entry_32 ; flush prefetch, load CS=kernel code
; ============= Real Mode Utilities =============
print_string:
; SI = pointer to null-terminated string
; Uses BIOS teletype (INT 0x10/AH=0x0E)
lodsb ; load byte at [SI], increment SI
test al, al
jz .done
mov ah, 0x0E
mov bh, 0
int 0x10
jmp print_string
.done:
ret
disk_error:
mov si, msg_error
call print_string
.halt:
cli
hlt
jmp .halt
; ============= Data =============
msg_booting db "Booting MinOS...", 13, 10, 0
msg_ok db "Kernel loaded.", 13, 10, 0
msg_error db "DISK ERROR!", 13, 10, 0
boot_drive db 0
; Disk Address Packet for Extended Read
dap:
db 0x10 ; packet size
db 0 ; reserved
dw 32 ; read 32 sectors (16KB of kernel)
dw 0x8000 ; buffer offset
dw 0x0000 ; buffer segment (ES=0 → physical 0x8000)
dq 1 ; starting LBA = 1 (0-indexed, so the second sector on disk)
; 32-bit GDT (for protected mode)
align 4
gdt_32:
dq 0 ; null descriptor
; Code: 0x08
dw 0xFFFF, 0x0000
db 0x00, 0x9A, 0xCF, 0x00
; Data: 0x10
dw 0xFFFF, 0x0000
db 0x00, 0x92, 0xCF, 0x00
; 64-bit Code: 0x18
dw 0x0000, 0x0000
db 0x00, 0x9A, 0x20, 0x00
; 64-bit Data: 0x20
dw 0x0000, 0x0000
db 0x00, 0x92, 0x00, 0x00
gdt_32_end:
gdt_ptr_32:
dw gdt_32_end - gdt_32 - 1
dd gdt_32
; ============= Protected Mode Entry (32-bit) =============
[BITS 32]
pm_entry_32:
mov ax, 0x10
mov ds, ax
mov es, ax
mov ss, ax
mov fs, ax
mov gs, ax
mov esp, 0x90000
; Setup page tables for long mode transition
; PML4 at 0x1000, PDPT at 0x2000, PD at 0x3000
; Zero all three tables first
mov edi, 0x1000
xor eax, eax
mov ecx, 0x3000 / 4 ; 3 pages × 4096 bytes / 4 bytes per stosd
rep stosd
; PML4[0] → PDPT at 0x2000
mov dword [0x1000], 0x2003 ; P|R/W
; PDPT[0] → PD at 0x3000
mov dword [0x2000], 0x3003
; PD[0] → 2MB identity page (physical 0)
mov dword [0x3000], 0x0083 ; P|R/W|PS(2MB)
; Enable PAE
mov eax, cr4
or eax, (1 << 5)
mov cr4, eax
; CR3 = PML4
mov eax, 0x1000
mov cr3, eax
; EFER.LME = 1
mov ecx, 0xC0000080
rdmsr
or eax, (1 << 8)
wrmsr
; Enable paging → enter compatibility mode
mov eax, cr0
or eax, (1 << 31)
mov cr0, eax
; Far jump to 64-bit code segment
jmp 0x18:lm_entry_64
[BITS 64]
lm_entry_64:
; 64-bit long mode active!
mov ax, 0x20
mov ds, ax
mov es, ax
mov ss, ax
xor ax, ax
mov fs, ax
mov gs, ax
mov rsp, 0x90000
; Jump to kernel main (kernel was loaded to 0x8000)
; Kernel entry point is at the start of the kernel binary
jmp 0x8000
; ============= Boot Signature =============
; Pad to 510 bytes, then add the 0x55, 0xAA signature
times 510 - ($ - $$) db 0
dw 0xAA55 ; stored little-endian: byte 0x55 at offset 510, 0xAA at 511
VGA Text Mode Output
Before setting up interrupts or a proper console, you can write directly to the VGA text buffer at physical address 0xB8000:
; VGA text mode: 80×25 characters, 2 bytes per character
; Byte 0: ASCII character
; Byte 1: attribute (high nibble = background, low nibble = foreground)
; Colors: 0=black, 7=light gray, 0xF=bright white, 0x4=red, 0x2=green
VGA_BASE equ 0xB8000
COLS equ 80
ROWS equ 25
; Write character at column RCX, row RDX (upper bits of both must be zero)
; AL = character, AH = attribute; clobbers RDX
vga_putchar_at:
; offset = (row * 80 + col) * 2
imul rdx, rdx, COLS
add rdx, rcx
imul rdx, rdx, 2
; Write to VGA buffer
mov word [VGA_BASE + rdx], ax ; AH=attr, AL=char
ret
; Clear screen (fill with spaces, attribute 0x07 = light gray on black)
vga_clear:
mov rdi, VGA_BASE
; Each cell is the word 0x0720: space (0x20) with attribute 0x07.
; Replicate it four times so that one qword store fills four cells.
mov rax, 0x0720072007200720
; 80*25 = 2000 cells = 4000 bytes = 500 qwords
mov rcx, 500
cld ; make sure STOSQ advances RDI upward
rep stosq
ret
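The offset formula and the cell encoding used by these routines can be worked through on the host. The Python sketch below is illustrative only; `vga_cell_address` and `vga_cell_word` are hypothetical names, and it computes addresses without touching any hardware.

```python
VGA_BASE = 0xB8000
COLS, ROWS = 80, 25


def vga_cell_address(col: int, row: int) -> int:
    """Physical address of the cell at (col, row): base + (row * 80 + col) * 2."""
    if not (0 <= col < COLS and 0 <= row < ROWS):
        raise ValueError("position is outside the 80x25 screen")
    return VGA_BASE + (row * COLS + col) * 2


def vga_cell_word(char: str, fg: int = 0x7, bg: int = 0x0) -> int:
    """16-bit cell value: attribute in the high byte, character in the low byte."""
    code = ord(char)
    if code > 0xFF or not (0 <= fg <= 0xF and 0 <= bg <= 0xF):
        raise ValueError("character must fit in 8 bits and colors in 4 bits")
    return (((bg << 4) | fg) << 8) | code


print(hex(vga_cell_address(0, 0)))               # 0xb8000
print(hex(vga_cell_address(79, 24)))             # 0xb8f9e
print(hex(vga_cell_word(" ")))                   # 0x720
print(hex(vga_cell_word("A", fg=0xF, bg=0x4)))   # 0x4f41
```

The last cell ends at 0xB8F9F, so the whole screen occupies 4000 bytes, which is the 500 qwords that `vga_clear` writes.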
QEMU Setup and Debugging
# Build the MinOS boot image
nasm -f bin minOS_boot.asm -o boot.bin
nasm -f bin kernel.asm -o kernel.bin # flat binary kernel
# Combine: boot sector + kernel (padding to sector boundaries)
cat boot.bin kernel.bin > minOS.img
# The DAP reads 32 sectors from LBA 1, so the image needs at least 512 + 32*512 = 16896 bytes
truncate -s 32768 minOS.img # pad to 32KB
# Run in QEMU (standard display)
qemu-system-x86_64 -drive format=raw,file=minOS.img
# Run with curses display (terminal-based)
qemu-system-x86_64 -drive format=raw,file=minOS.img -display curses
# Run with GDB debugging support (-s: GDB port 1234, -S: start paused)
qemu-system-x86_64 -drive format=raw,file=minOS.img -s -S
# In a separate terminal, attach GDB:
gdb
(gdb) target remote localhost:1234
(gdb) set architecture i8086 # real mode initially
(gdb) break *0x7C00 # breakpoint at bootloader entry
(gdb) continue
(gdb) x/20i 0x7C00 # examine bootloader instructions
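The image-assembly step at the top of this listing (cat followed by truncate) can also be expressed as a small host-side script that checks sizes before padding. This is a sketch under stated assumptions: `build_image` is a hypothetical helper, the 32-sector kernel area matches the sector count in the bootloader's DAP, and the demonstration uses dummy byte strings rather than real build output.

```python
SECTOR = 512


def build_image(boot: bytes, kernel: bytes, kernel_sectors: int = 32) -> bytes:
    """Concatenate a boot sector and a kernel, zero-padding the kernel area."""
    if len(boot) != SECTOR or boot[510:512] != b"\x55\xAA":
        raise ValueError("boot sector must be 512 bytes ending in 0x55, 0xAA")
    capacity = kernel_sectors * SECTOR
    if len(kernel) > capacity:
        raise ValueError("kernel does not fit in the sectors the bootloader reads")
    # Pad so every sector the bootloader asks for actually exists in the image.
    return boot + kernel.ljust(capacity, b"\x00")


# Dummy inputs standing in for boot.bin and kernel.bin.
fake_boot = bytes(510) + b"\x55\xAA"
fake_kernel = b"\xEB\xFE"          # two bytes: a jump-to-self instruction

image = build_image(fake_boot, fake_kernel)
print(len(image))                   # 16896
```

For real files, the same function could be fed the contents of boot.bin and kernel.bin read in binary mode, with the result written to minOS.img. Rejecting an oversized kernel up front avoids a bootloader that silently loads only part of it.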
🛠️ Lab Exercise: Build and run the MinOS bootloader in QEMU. Observe the "Booting MinOS..." message appearing in the QEMU window. Then start QEMU with -s -S, attach GDB, set a breakpoint at 0x7C00, and single-step through the real-mode to protected-mode to long-mode transition. Watch CR0 change (bit 0 set for protected mode, bit 31 set for paging).
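When reading CR0 values during the exercise, a tiny decoder can save some mental bit arithmetic. The Python sketch below is a host-side aid; `describe_cr0` is a hypothetical name, and the sample values are made up for illustration rather than captured from a QEMU session, so real readings will usually have other bits set as well.

```python
CR0_PE = 1 << 0     # Protection Enable: protected mode is on
CR0_PG = 1 << 31    # Paging: address translation is on


def describe_cr0(value: int) -> dict:
    """Report the two CR0 bits that change during the boot transitions."""
    return {"PE": bool(value & CR0_PE), "PG": bool(value & CR0_PG)}


# Illustrative values only (bit 4 is included just as an example of an unrelated bit).
for sample in (0x00000010, 0x00000011, 0x80000011):
    print(hex(sample), describe_cr0(sample))
```

With these samples the three lines read as real mode (neither bit), protected mode (PE only), and paging enabled (PE and PG).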
Summary
The x86-64 boot process is a journey through three CPU modes: real mode (16-bit, 1MB), protected mode (32-bit, 4GB with segmentation), and long mode (64-bit, with a 48-bit, 256TB virtual address space under 4-level paging). Each transition requires specific setup: a GDT for protected mode, page tables for long mode. The bootloader is the most constrained code you will ever write: at most 510 bytes to load a kernel and hand off control. Understanding this sequence explains every layer that sits above it.
🔄 Check Your Understanding:
1. Why is the magic signature 0x55AA at the very end (bytes 510–511) of the 512-byte MBR?
2. What is the A20 line, and why must it be enabled before accessing memory above 1MB?
3. Which bit of CR0 enables protected mode, and which bit enables paging?
4. Why is a far jump (jmp 0x08:pm_entry_32) required after setting CR0.PE=1?
5. What does the L bit in a GDT code segment descriptor do?