In This Chapter
- How Software Talks to Hardware
- Port-Mapped I/O (PIO / PMIO)
- Common I/O Port Reference
- The PIT: Programmable Interval Timer (8253/8254)
- The 16550 UART (Serial Port)
- Memory-Mapped I/O (MMIO)
- The APIC (Advanced Programmable Interrupt Controller)
- PCI Device Discovery
- ARM64: Everything Is MMIO
- MinOS: PIT Timer Driver + Serial Debug Console
- Summary
Chapter 29: Device I/O
How Software Talks to Hardware
Every keypress you have ever typed, every pixel ever drawn, every byte ever written to disk — all of it required software to talk directly to hardware through I/O interfaces. At the lowest level, this means writing values to specific addresses or ports that hardware registers are mapped to, and reading values back.
There are two fundamentally different ways hardware registers appear to software: port-mapped I/O (using dedicated I/O address space and IN/OUT instructions) and memory-mapped I/O (using regular memory addresses and MOV instructions). Understanding both, and knowing which devices use which, is essential for bare-metal and kernel development.
Port-Mapped I/O (PIO / PMIO)
x86-64 provides a separate 16-bit I/O address space, distinct from the memory address space, containing 65,536 byte-addressable ports (0x0000–0xFFFF). You access them with the IN and OUT instructions (and their string forms, INS and OUTS); ordinary MOV instructions cannot reach them.
IN and OUT Instructions
; Read a byte from port into AL (or word into AX, dword into EAX)
in al, 0x60 ; read byte from port 0x60 (immediate port address)
in ax, dx ; read word from port in DX
in eax, dx ; read dword from port in DX
; Write to port from AL (or AX, EAX)
out 0x60, al ; write byte AL to port 0x60
out dx, al ; write byte AL to port in DX
out dx, eax ; write dword EAX to port in DX
⚠️ Common Mistake: IN and OUT can only encode an 8-bit immediate port address (0x00–0xFF). For ports above 0xFF, you must first load the port number into DX and use the register form. IN AL, 0x60 is valid; IN AL, 0x3F8 is not encodable, so write MOV DX, 0x3F8 followed by IN AL, DX.
⚠️ Common Mistake: IN and OUT are I/O-sensitive instructions: they execute unconditionally only when CPL ≤ IOPL, which in a typical kernel means ring 0 only. User-mode code (ring 3) that attempts them receives a #GP fault unless the I/O Permission Bitmap (IOPB) in the TSS grants access to the specific ports.
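The IOPB rule can be modeled in a few lines of C. This is only a sketch of the bit test, assuming the bitmap has already been located; the real processor reads it from the TSS and also applies TSS limit checks. The function name `iopb_allows` and its parameters are hypothetical, chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the I/O Permission Bitmap test (not a hardware API).
 * bitmap:       one bit per port; a set bit means access is denied.
 * bitmap_bytes: size of the bitmap; ports beyond it are treated as denied.
 * width:        access size in bytes (1, 2, or 4). */
bool iopb_allows(const uint8_t *bitmap, size_t bitmap_bytes,
                 uint16_t port, unsigned width)
{
    for (unsigned i = 0; i < width; i++) {
        uint32_t p = (uint32_t)port + i;   /* every byte of the access has its own bit */
        size_t byte = p / 8;
        if (byte >= bitmap_bytes)
            return false;                  /* outside the bitmap: denied */
        if (bitmap[byte] & (1u << (p % 8)))
            return false;                  /* bit set: denied */
    }
    return true;
}
```

A multi-byte access is permitted only when the bits for all of its ports are clear, which is why a dword access that straddles the end of the bitmap is rejected in this model.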
I/O Wait
Some older hardware (ISA bus devices) requires a short delay between I/O port accesses. The traditional method is to write to an unused port:
; I/O wait: write harmless byte to port 0x80 (POST code port)
io_wait:
out 0x80, al
ret
Modern hardware (PCI and PCIe devices) does not require this, but it is still present in legacy code for compatibility.
Common I/O Port Reference
| Port | Device | Direction | Purpose |
|---|---|---|---|
| 0x20, 0x21 | PIC1 | R/W | Master PIC command/data |
| 0x40–0x43 | PIT | R/W | Programmable Interval Timer |
| 0x60 | PS/2 | R/W | Keyboard/mouse data |
| 0x64 | PS/2 | R/W | PS/2 status/command |
| 0x70, 0x71 | CMOS/RTC | R/W | Real-time clock |
| 0x80 | POST | W | Diagnostic POST code display |
| 0x92 | System | R/W | Fast A20, system reset |
| 0xA0, 0xA1 | PIC2 | R/W | Slave PIC command/data |
| 0x1F0–0x1F7 | IDE0 | R/W | Primary IDE/ATA controller |
| 0x2F8–0x2FF | COM2 | R/W | Serial port 2 |
| 0x3C0–0x3DF | VGA | R/W | VGA registers |
| 0x3F8–0x3FF | COM1 | R/W | Serial port (16550 UART) |
| 0xCF8, 0xCFC | PCI | R/W | PCI config space access |
The PIT: Programmable Interval Timer (8253/8254)
The PIT is the timer chip that drove everything from the original PC speaker to the system scheduler's tick. It contains three independent 16-bit counter channels. Channel 0 is wired to IRQ0 and is the primary system timer.
PIT Architecture
PIT Register Map:
Port 0x40: Channel 0 counter (IRQ0 — system timer)
Port 0x41: Channel 1 counter (legacy DRAM refresh, ignore)
Port 0x42: Channel 2 counter (PC speaker)
Port 0x43: Mode/Command register (write only)
Mode/Command byte (write to 0x43):
Bits 7:6 — Channel select: 00=ch0, 01=ch1, 10=ch2, 11=read-back
Bits 5:4 — Access mode: 00=latch, 01=lo byte, 10=hi byte, 11=lo+hi
Bits 3:1 — Operating mode: 000=interrupt on terminal count,
010=rate generator (repeating), 011=square wave
Bit 0 — BCD mode: 0=binary (use this), 1=BCD (don't)
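The bit layout above can be checked by packing the fields in C. This is a minimal sketch; `pit_command_byte` and the enum names are hypothetical helpers, and only the three operating modes listed above are named.

```c
#include <stdint.h>

/* Field values taken from the mode/command layout above. */
enum { PIT_CH0 = 0, PIT_CH1 = 1, PIT_CH2 = 2 };
enum { PIT_ACCESS_LATCH = 0, PIT_ACCESS_LO = 1, PIT_ACCESS_HI = 2, PIT_ACCESS_LOHI = 3 };
enum { PIT_MODE_TERMINAL_COUNT = 0, PIT_MODE_RATE_GEN = 2, PIT_MODE_SQUARE_WAVE = 3 };

/* Pack the four fields into the byte that is written to port 0x43. */
uint8_t pit_command_byte(unsigned channel, unsigned access,
                         unsigned mode, unsigned bcd)
{
    return (uint8_t)(((channel & 0x3u) << 6) |   /* bits 7:6 */
                     ((access  & 0x3u) << 4) |   /* bits 5:4 */
                     ((mode    & 0x7u) << 1) |   /* bits 3:1 */
                     (bcd & 0x1u));              /* bit 0    */
}
```

Calling `pit_command_byte(PIT_CH0, PIT_ACCESS_LOHI, PIT_MODE_SQUARE_WAVE, 0)` yields 0x36, the value used for channel 0 in square wave mode.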
Setting the PIT to 100Hz
The PIT's input clock runs at 1.193182 MHz (one twelfth of the 14.31818 MHz crystal used in the original PC). To get 100Hz interrupts:
Reload value = 1,193,182 / desired_frequency
= 1,193,182 / 100
= 11,931 (0x2E9B), rounding 11,931.82 down
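The same arithmetic, written as a small C sketch that also reports the rate the truncated divisor really produces. The function names and the clamping policy are assumptions made for this example. It relies on one hardware fact: a reload value of 0 is interpreted by the PIT as 65,536.

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182u

/* Reload value for the requested interrupt rate (integer division).
 * Rates too slow for a 16-bit counter return 0, which the PIT treats as
 * 65536 (about 18.2 Hz). Very small counts are clamped to 2; that lower
 * bound is a choice made for this sketch. */
uint16_t pit_reload_value(uint32_t hz)
{
    if (hz == 0)
        return 0;
    uint32_t divisor = PIT_INPUT_HZ / hz;
    if (divisor > 65535u)
        return 0;
    if (divisor < 2u)
        divisor = 2u;
    return (uint16_t)divisor;
}

/* Rate actually produced by a reload value, in millihertz. */
uint32_t pit_actual_millihz(uint16_t reload)
{
    uint32_t divisor = reload ? reload : 65536u;
    return (uint32_t)(((uint64_t)PIT_INPUT_HZ * 1000u) / divisor);
}
```

For a 100 Hz request the reload value is 11,931 and the resulting rate is 100,006 mHz, so the tick runs very slightly fast. A clock built on tick counts will drift unless it is corrected from another time source.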
; Configure PIT Channel 0 for 100Hz IRQ0
PIT_CHANNEL0 equ 0x40
PIT_COMMAND equ 0x43
PIT_FREQUENCY equ 1193182
DESIRED_HZ equ 100
PIT_DIVISOR equ PIT_FREQUENCY / DESIRED_HZ ; = 11931
pit_init_100hz:
; Mode/Command: channel 0, lo+hi access, mode 3 (square wave generator)
; 0x36 = 0b00_11_011_0
; ch0 | lo+hi | square wave | binary
mov al, 0x36
out PIT_COMMAND, al
; Send reload value (lo byte first, then hi byte)
mov ax, PIT_DIVISOR
out PIT_CHANNEL0, al ; low byte
shr ax, 8
out PIT_CHANNEL0, al ; high byte
ret
The Timer IRQ Handler
; IRQ0 handler: fires at 100Hz (every 10ms)
; Vector 32 (IRQ0 = PIC base 32 + 0)
section .data
tick_count: dq 0 ; global tick counter
seconds: dq 0 ; seconds since boot
section .text
timer_handler:
; Save every register the handler modifies: DIV uses RAX and RDX, and RCX holds the divisor
push rax
push rcx
push rdx
; Increment tick counter
inc qword [tick_count]
; Every 100 ticks = 1 second
mov rax, [tick_count]
xor rdx, rdx
mov rcx, 100
div rcx
test rdx, rdx
jnz .no_second
inc qword [seconds]
; Optional: update system clock, run scheduler tick, etc.
.no_second:
; Send EOI to PIC1
mov al, 0x20
out 0x20, al
pop rdx
pop rcx
pop rax
iretq
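The handler divides on every tick to detect a second boundary. One alternative is a countdown that avoids the division entirely. The C model below is a sketch of that idea rather than the chapter's implementation; the struct and function names are hypothetical.

```c
#include <stdint.h>

#define TICKS_PER_SECOND 100u   /* matches the 100 Hz PIT configuration */

struct timer_state {
    uint64_t tick_count;     /* total ticks since boot */
    uint64_t seconds;        /* whole seconds since boot */
    uint32_t ticks_to_next;  /* ticks remaining until the next full second */
};

void timer_state_init(struct timer_state *t)
{
    t->tick_count = 0;
    t->seconds = 0;
    t->ticks_to_next = TICKS_PER_SECOND;
}

/* Body of the tick handler: one increment and one decrement, no division. */
void timer_on_tick(struct timer_state *t)
{
    t->tick_count++;
    if (--t->ticks_to_next == 0) {
        t->seconds++;
        t->ticks_to_next = TICKS_PER_SECOND;
    }
}

/* Uptime in milliseconds: each tick is 1000 / 100 = 10 ms. */
uint64_t timer_uptime_ms(const struct timer_state *t)
{
    return t->tick_count * (1000u / TICKS_PER_SECOND);
}
```

After 250 calls to `timer_on_tick`, the model reports 2 seconds and 2,500 ms of uptime, which matches what the modulo test in the assembly handler would produce.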
The 16550 UART (Serial Port)
The 16550 UART is the serial port chip. It is irreplaceable for bare-metal debugging: before you have a VGA driver, before you have interrupts working, the serial port is available for debug output. QEMU redirects serial output to stdout with -serial stdio.
UART Register Map (COM1: ports 0x3F8–0x3FF)
| Port Offset | DLAB=0 | DLAB=1 |
|---|---|---|
| +0 | Data / Receive | Divisor Latch Low (baud rate) |
| +1 | Interrupt Enable | Divisor Latch High |
| +2 | FIFO Control | FIFO Control |
| +3 | Line Control | Line Control (set DLAB here) |
| +4 | Modem Control | Modem Control |
| +5 | Line Status | Line Status |
| +6 | Modem Status | Modem Status |
Line Control Register (LCR) at +3:
Bits 1:0 — Word length: 00=5-bit, 01=6-bit, 10=7-bit, 11=8-bit
Bit 2 — Stop bits: 0=1 stop, 1=2 stops
Bit 3 — Parity enable
Bits 5:4 — Parity type (bit 4: even parity select, bit 5: stick parity)
Bit 7 — DLAB: 1=access divisor registers, 0=normal operation
Line Status Register (LSR) at +5:
Bit 0 — Data Ready: 1=received byte available at +0
Bit 5 — Transmitter Holding Register Empty: 1=safe to write to +0
Bit 6 — Transmitter Empty: 1=both holding register and shift register empty
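The register fields above translate directly into constants and helpers. The sketch below assumes the standard PC reference clock of 1.8432 MHz, for which the divisor latch value is clock / (16 × baud). All function and macro names are hypothetical, and the LCR helper covers only word length, stop bits, parity enable, and DLAB.

```c
#include <stdint.h>

#define UART_CLOCK_HZ 1843200u   /* 1.8432 MHz reference clock (assumed) */

#define UART_LSR_DATA_READY 0x01u   /* bit 0 */
#define UART_LSR_THR_EMPTY  0x20u   /* bit 5 */
#define UART_LSR_TX_EMPTY   0x40u   /* bit 6 */

/* Divisor latch value for a baud rate. baud must be nonzero. */
uint16_t uart_divisor(uint32_t baud)
{
    return (uint16_t)(UART_CLOCK_HZ / (16u * baud));
}

/* Line Control Register value.
 * word_bits: 5..8, stop_bits: 1 or 2, parity_enable and dlab: 0 or 1. */
uint8_t uart_lcr(unsigned word_bits, unsigned stop_bits,
                 unsigned parity_enable, unsigned dlab)
{
    return (uint8_t)(((word_bits - 5u) & 0x3u) |          /* bits 1:0 */
                     ((stop_bits == 2u) ? 0x04u : 0u) |   /* bit 2    */
                     (parity_enable ? 0x08u : 0u) |       /* bit 3    */
                     (dlab ? 0x80u : 0u));                /* bit 7    */
}

/* True when the Line Status value says a byte may be written to +0. */
int uart_can_transmit(uint8_t lsr)
{
    return (lsr & UART_LSR_THR_EMPTY) != 0;
}
```

With these helpers, 8N1 is `uart_lcr(8, 1, 0, 0)`, which is 0x03, and the same setting with DLAB raised is 0x83. A baud rate of 9600 gives a divisor of 12.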
Initializing the UART for 115200 Baud, 8N1
; COM1 at 115200 baud, 8 data bits, no parity, 1 stop bit (8N1)
; COM1 ports are above 0xFF, so every access goes through DX
; Clobbers: RAX, RDX
COM1_BASE equ 0x3F8
BAUD_115200_DIVISOR equ 1 ; 1.8432 MHz / (16 × 115200) = 1
serial_init:
; Disable interrupts
mov dx, COM1_BASE + 1
mov al, 0x00
out dx, al
; Enable DLAB (access divisor latch)
mov dx, COM1_BASE + 3
mov al, 0x80
out dx, al
; Set baud rate divisor: 1 = 115200 baud
mov dx, COM1_BASE + 0
mov al, BAUD_115200_DIVISOR & 0xFF
out dx, al ; divisor low byte
mov dx, COM1_BASE + 1
mov al, BAUD_115200_DIVISOR >> 8
out dx, al ; divisor high byte
; 8 bits, no parity, 1 stop bit (clear DLAB)
; 0x03 = 0b0_0_000_0_11 (DLAB=0, no break, no parity, 1 stop bit, 8 data bits)
mov dx, COM1_BASE + 3
mov al, 0x03
out dx, al
; Enable FIFO, clear TX/RX, 14-byte threshold
mov dx, COM1_BASE + 2
mov al, 0xC7
out dx, al
; Modem control: DTR + RTS + OUT2 (OUT2 gates the UART's interrupt line)
mov dx, COM1_BASE + 4
mov al, 0x0B
out dx, al
ret
; serial_putchar: send one character to COM1
; AL = character to send (all registers preserved)
serial_putchar:
push rdx
push rax ; save the character: IN below overwrites AL
mov dx, COM1_BASE + 5 ; Line Status Register
; Wait until transmitter holding register is empty
.wait:
in al, dx
test al, 0x20 ; bit 5: THR empty?
jz .wait ; not ready yet, wait
; Restore character and send
pop rax
mov dx, COM1_BASE ; transmit register (port > 0xFF, so use DX)
out dx, al
pop rdx
ret
; serial_puts: send null-terminated string to COM1
; RDI = pointer to string
serial_puts:
push rax
.loop:
mov al, [rdi]
test al, al
jz .done
call serial_putchar
inc rdi
jmp .loop
.done:
pop rax
ret
; serial_write_hex64: write 64-bit value in RAX as hex to serial
serial_write_hex64:
push rbx
push rcx
push rdx
mov rbx, rax
mov rcx, 16 ; 16 hex digits
.digit:
rol rbx, 4 ; rotate MSN to LSN position
mov al, bl
and al, 0x0F
add al, '0'
cmp al, '9' + 1
jl .ok
add al, 'A' - '9' - 1
.ok:
call serial_putchar
dec rcx
jnz .digit
pop rdx
pop rcx
pop rbx
ret
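The rotate-and-mask loop in serial_write_hex64 is easier to verify in C, where the output can be captured in a buffer instead of being sent to a port. This is a sketch of the same digit logic; `format_hex64` is a hypothetical name.

```c
#include <stdint.h>

/* Write 16 uppercase hex digits plus a terminating NUL into out[17]. */
void format_hex64(uint64_t value, char out[17])
{
    for (int i = 0; i < 16; i++) {
        /* Rotate left by 4 so the most significant nibble lands in the
         * low four bits, like ROL RBX, 4 in the assembly routine. */
        value = (value << 4) | (value >> 60);
        unsigned nibble = (unsigned)(value & 0xFu);
        out[i] = (char)(nibble < 10u ? '0' + nibble : 'A' + (nibble - 10u));
    }
    out[16] = '\0';
}
```

After 16 rotations of 4 bits the value is back in its original position, which is why the assembly version can use a working copy in RBX without any extra bookkeeping.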
Using Serial Output in QEMU
# Redirect serial to stdout (see output in terminal)
qemu-system-x86_64 -drive format=raw,file=minOS.img -serial stdio
# Or redirect to a file
qemu-system-x86_64 -drive format=raw,file=minOS.img -serial file:serial.log
# Or to a PTY (for connecting with minicom or screen)
qemu-system-x86_64 -drive format=raw,file=minOS.img -serial pty
# QEMU prints: char device redirected to /dev/pts/N
# Then: screen /dev/pts/N 115200
⚡ Performance Note: With 8N1 framing, each byte occupies 10 bits on the wire (1 start, 8 data, 1 stop), so the serial port at 115200 baud transmits at most 11,520 bytes per second. Each serial_putchar call with the busy-wait loop therefore takes roughly 87 microseconds per byte. For kernel debugging, this is fine; for production output, use VGA or framebuffer instead.
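The figures in the note follow from the frame size. A short C sketch of the arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Bits on the wire per character: 1 start bit + data + parity + stop. */
unsigned uart_frame_bits(unsigned data_bits, unsigned parity_bits,
                         unsigned stop_bits)
{
    return 1u + data_bits + parity_bits + stop_bits;
}

/* Maximum sustained throughput in bytes per second. */
uint32_t uart_bytes_per_second(uint32_t baud, unsigned frame_bits)
{
    return baud / frame_bits;
}

/* Time to shift one character out, in nanoseconds (rounded down). */
uint32_t uart_ns_per_byte(uint32_t baud, unsigned frame_bits)
{
    return (uint32_t)(((uint64_t)frame_bits * 1000000000u) / baud);
}
```

For 8N1 at 115200 baud this gives a 10-bit frame, 11,520 bytes per second, and 86,805 ns (about 87 microseconds) per byte.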
Memory-Mapped I/O (MMIO)
Most modern devices use memory-mapped I/O: device registers appear at specific physical addresses, accessed with ordinary MOV instructions. The kernel maps these physical addresses into the kernel's virtual address space.
; MMIO example: accessing Local APIC registers at physical address 0xFEE00000
; In a kernel that maps all physical memory starting at 0xFFFF800000000000:
%define PHYS_MAP_BASE 0xFFFF800000000000
; Read the 32-bit register at offset 0x100
mov rax, PHYS_MAP_BASE + 0xFEE00000
mov eax, [rax + 0x100] ; read register
; Write the 32-bit register at offset 0x0B0 (EOI register)
mov rax, PHYS_MAP_BASE + 0xFEE00000
mov dword [rax + 0x0B0], 0 ; write 0 to EOI register
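⚠️ Common Mistake: The C volatile keyword is essential for MMIO in C code. Without it, the compiler may optimize away or merge reads and writes to MMIO addresses that appear redundant. In assembly, you have explicit control: every MOV is a real access. On x86-64, MMIO regions mapped uncacheable (UC) are accessed in program order, but regions mapped write-combining (a framebuffer, for example) may still need fences (SFENCE or MFENCE) to control when and in what order writes reach the device.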
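A minimal sketch of volatile accessors in C. The names `mmio_read32` and `mmio_write32` are hypothetical, and `base` is assumed to be a virtual address that the kernel has already mapped to the device.

```c
#include <stdint.h>

/* Read a 32-bit device register. The volatile qualifier forces the compiler
 * to emit exactly one load for every call. */
static inline uint32_t mmio_read32(uintptr_t base, uint32_t offset)
{
    return *(volatile uint32_t *)(base + offset);
}

/* Write a 32-bit device register. The store cannot be removed or merged
 * with a neighboring store by the compiler. */
static inline void mmio_write32(uintptr_t base, uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)(base + offset) = value;
}
```

The qualifier constrains only the compiler. Ordering as seen by the device still depends on the memory type of the mapping and on any barriers the architecture requires.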
The APIC (Advanced Programmable Interrupt Controller)
Modern x86-64 systems use the APIC instead of the legacy 8259A PIC. The Local APIC (LAPIC) is built into each CPU core and is memory-mapped at physical address 0xFEE00000 (by default).
Key LAPIC Registers
LAPIC Register Map (at physical 0xFEE00000):
Offset 0x020: Local APIC ID Register (read)
Offset 0x030: Local APIC Version Register (read)
Offset 0x080: Task Priority Register (TPR) — interrupt priority threshold
Offset 0x0B0: EOI Register — write 0 to acknowledge interrupt (write only)
Offset 0x0D0: Logical Destination Register
Offset 0x0F0: Spurious Interrupt Vector Register — enable APIC + spurious vector
Offset 0x320: LVT Timer Register — configure LAPIC timer
Offset 0x380: Initial Count Register (for LAPIC timer)
Offset 0x390: Current Count Register (read LAPIC timer count)
Offset 0x3E0: Divide Configuration Register (LAPIC timer divider)
Initializing the LAPIC
; Enable the LAPIC and configure the spurious interrupt vector
; Assumes LAPIC is mapped in kernel address space
%define LAPIC_BASE 0xFEE00000 ; physical (use virtual in kernel)
%define LAPIC_SVR 0x0F0 ; Spurious Vector Register
%define LAPIC_EOI 0x0B0 ; End of Interrupt
%define LAPIC_LVT_TIMER 0x320 ; LVT Timer
%define LAPIC_ICRINIT 0x380 ; Initial Count
%define LAPIC_ICRCOUNT 0x390 ; Current Count
%define LAPIC_DIVCONF 0x3E0 ; Divide Configuration
%define VIRT_LAPIC 0xFFFFFFFFFEE00000 ; kernel virtual address (example)
lapic_enable:
; Set SVR: bit 8 = APIC enable, bits 7:0 = spurious vector (255 = 0xFF)
mov eax, [VIRT_LAPIC + LAPIC_SVR]
or eax, 0x1FF ; enable APIC + spurious vector = 0xFF
mov [VIRT_LAPIC + LAPIC_SVR], eax
ret
lapic_send_eoi:
; Write 0 to the EOI register to acknowledge the current interrupt
mov dword [VIRT_LAPIC + LAPIC_EOI], 0
ret
; Configure LAPIC one-shot timer at a given count
; RDI = initial count value (ticks until interrupt)
; Vector 32 (IRQ0 equivalent in APIC world)
lapic_timer_oneshot:
; Set timer vector in LVT (vector 32, mode 00 = one-shot)
mov dword [VIRT_LAPIC + LAPIC_LVT_TIMER], 32
; Set divide by 16
mov dword [VIRT_LAPIC + LAPIC_DIVCONF], 3
; Set initial count
mov [VIRT_LAPIC + LAPIC_ICRINIT], edi
ret
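The constants written by the routines above can be derived from the register layouts. The C sketch below uses hypothetical helper names. It covers the SVR enable bit, the one-shot and periodic encodings of the LVT timer mode field, and the Divide Configuration Register, whose encoding is not sequential (divide by 1 is 0xB).

```c
#include <stdint.h>

#define LAPIC_SVR_ENABLE     (1u << 8)    /* software-enable bit in the SVR */
#define LAPIC_TIMER_ONESHOT  (0u << 17)   /* LVT timer mode 00 */
#define LAPIC_TIMER_PERIODIC (1u << 17)   /* LVT timer mode 01 */

/* New SVR value: keep the upper bits, set enable and the spurious vector. */
uint32_t lapic_svr_value(uint32_t current, uint8_t spurious_vector)
{
    return (current & ~0xFFu) | LAPIC_SVR_ENABLE | spurious_vector;
}

/* LVT timer value: vector in bits 7:0, unmasked, chosen timer mode. */
uint32_t lapic_lvt_timer_value(uint8_t vector, int periodic)
{
    return (periodic ? LAPIC_TIMER_PERIODIC : LAPIC_TIMER_ONESHOT) | vector;
}

/* Divide Configuration Register encoding; returns -1 for an invalid divisor. */
int lapic_divide_config(unsigned divisor)
{
    switch (divisor) {
    case 1:   return 0xB;
    case 2:   return 0x0;
    case 4:   return 0x1;
    case 8:   return 0x2;
    case 16:  return 0x3;
    case 32:  return 0x8;
    case 64:  return 0x9;
    case 128: return 0xA;
    default:  return -1;
    }
}
```

`lapic_divide_config(16)` returns 3 and `lapic_svr_value(0, 0xFF)` returns 0x1FF, the same values the assembly writes.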
PCI Device Discovery
PCI (Peripheral Component Interconnect) is the standard bus for modern peripheral devices. Every PCI device has a configuration space — 256 bytes of registers accessible via port I/O at ports 0xCF8 (address register) and 0xCFC (data register).
; Read a 32-bit value from PCI configuration space
; BUS:DEVICE:FUNCTION:OFFSET selects the register
; rdi = bus (8-bit), rsi = device (5-bit), rdx = function (3-bit), rcx = offset (8-bit)
pci_read32:
; Build the 32-bit address:
; Bit 31: enable bit (must be 1)
; Bits 23:16: bus number
; Bits 15:11: device number
; Bits 10:8: function number
; Bits 7:2: register offset (dword-aligned)
; Bits 1:0: must be 0
mov eax, (1 << 31) ; enable bit
shl rdi, 16 ; bus → bits 23:16
or rax, rdi
shl rsi, 11 ; device → bits 15:11
or rax, rsi
shl rdx, 8 ; function → bits 10:8
or rax, rdx
and rcx, ~3 ; align offset to dword
or rax, rcx
; Write to PCI address port
mov dx, 0xCF8
out dx, eax
; Read from PCI data port
mov dx, 0xCFC
in eax, dx
ret
; Enumerate PCI devices on bus 0 only: devices 0-31, function 0
pci_enumerate:
push rbx
xor rbx, rbx ; device number
.scan_device:
cmp rbx, 32
jge .done
; Read vendor:device ID (offset 0)
xor rdi, rdi ; bus = 0
mov rsi, rbx ; device
xor rdx, rdx ; function = 0
xor rcx, rcx ; offset = 0
call pci_read32
; If vendor ID = 0xFFFF, no device present
cmp ax, 0xFFFF
je .next
; Print vendor:device IDs (eax = DevID:VendorID)
; ...
.next:
inc rbx
jmp .scan_device
.done:
pop rbx
ret
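The address word built by pci_read32 can be expressed as a pure function, which makes the field positions easy to test without touching ports 0xCF8 and 0xCFC. This is a sketch with hypothetical helper names. It also shows how the dword read from offset 0 splits into vendor and device IDs.

```c
#include <stdint.h>

/* Value to write to port 0xCF8 to select one configuration register. */
uint32_t pci_config_address(uint8_t bus, uint8_t device,
                            uint8_t function, uint8_t offset)
{
    return (1u << 31) |                            /* enable bit          */
           ((uint32_t)bus << 16) |                 /* bits 23:16          */
           ((uint32_t)(device & 0x1Fu) << 11) |    /* bits 15:11          */
           ((uint32_t)(function & 0x07u) << 8) |   /* bits 10:8           */
           (offset & 0xFCu);                       /* bits 7:2, dword-aligned */
}

/* The dword at offset 0 holds the vendor ID in its low half. */
uint16_t pci_vendor_id(uint32_t reg0) { return (uint16_t)(reg0 & 0xFFFFu); }

/* The device ID is in the high half. */
uint16_t pci_device_id(uint32_t reg0) { return (uint16_t)(reg0 >> 16); }

/* A vendor ID of 0xFFFF means nothing responded at that address. */
int pci_device_present(uint32_t reg0) { return pci_vendor_id(reg0) != 0xFFFFu; }
```

For example, bus 1, device 2, function 3, offset 0x10 encodes to 0x80011310.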
ARM64: Everything Is MMIO
ARM64 has no port-mapped I/O. There is no IN/OUT equivalent. Every device register is memory-mapped. Memory access instructions (LDR/STR) are used for all device I/O.
// ARM64 MMIO example: read from a device register
// x0 = device base physical address (mapped in kernel)
ldr w1, [x0, #0x100] // read 32-bit register at offset 0x100
// ARM64 memory barriers for MMIO:
// DSB (Data Synchronization Barrier): ensures all previous memory accesses complete
// before subsequent accesses
dsb sy // full system barrier (most conservative)
dsb st // store barrier only
// ISB (Instruction Synchronization Barrier): flushes pipeline
isb
// DMB (Data Memory Barrier): ordering without completion guarantee
dmb sy
The ARM64 memory model allows significant reordering of memory accesses. MMIO accesses in particular must be surrounded by appropriate barriers to prevent the CPU from reordering reads and writes to device registers. On x86-64, MFENCE provides the equivalent but is needed less often due to x86's stronger ordering model.
MinOS: PIT Timer Driver + Serial Debug Console
; minOS/drivers/timer_serial.asm — PIT + Serial initialization for MinOS
; Called during kernel init (before interrupts enabled):
drivers_init:
call pit_init_100hz ; set up 100Hz IRQ0
call serial_init ; set up 115200 baud COM1
; Test serial output
mov rdi, init_msg
call serial_puts
ret
section .data
init_msg db "MinOS serial console ready at 115200 baud", 13, 10, 0
📐 OS Kernel Project: After this chapter's drivers are integrated, MinOS has: a timer that fires 100 times per second (for scheduling), a serial console for debug output before VGA works, keyboard input, and physical memory management. The combination forms a functional embedded operating system.
Summary
Device I/O in x86-64 uses two mechanisms: port-mapped I/O with IN/OUT for legacy devices (PIC, PIT, UART, PS/2), and memory-mapped I/O with MOV for modern devices (APIC, PCIe, framebuffer). The PIT, programmed with a reload value of 11,931, generates 100Hz timer interrupts on IRQ0. The 16550 UART provides a critical debug output path before the display system is initialized. On ARM64, all I/O is memory-mapped and requires explicit memory barriers.
🔄 Check Your Understanding:
1. What is the formula for calculating the PIT reload value to generate N Hz interrupts?
2. Why must the UART transmitter holding register be empty before writing a character?
3. What does the LAPIC EOI register do, and how does it differ from the 8259A PIC EOI?
4. Why does ARM64 require explicit memory barriers for MMIO when x86-64 generally does not?
5. What two I/O ports are used to read PCI configuration space registers?