Chapter 29: Device I/O

How Software Talks to Hardware

Every keypress you have ever typed, every pixel ever drawn, every byte ever written to disk — all of it required software to talk directly to hardware through I/O interfaces. At the lowest level, this means writing values to specific addresses or ports that hardware registers are mapped to, and reading values back.

There are two fundamentally different ways hardware registers appear to software: port-mapped I/O (using dedicated I/O address space and IN/OUT instructions) and memory-mapped I/O (using regular memory addresses and MOV instructions). Understanding both, and knowing which devices use which, is essential for bare-metal and kernel development.


Port-Mapped I/O (PIO / PMIO)

x86-64 provides a separate 16-bit I/O address space, distinct from the memory address space, containing 65,536 byte-addressable ports (0x0000–0xFFFF). You access them only with the IN and OUT instructions and their string variants (INS/OUTS).

IN and OUT Instructions

; Read a byte from port into AL (or word into AX, dword into EAX)
in  al, 0x60        ; read byte from port 0x60 (immediate port address)
in  ax, dx          ; read word from port in DX
in  eax, dx         ; read dword from port in DX

; Write to port from AL (or AX, EAX)
out 0x60, al        ; write byte AL to port 0x60
out dx, al          ; write byte AL to port in DX
out dx, eax         ; write dword EAX to port in DX

⚠️ Common Mistake: IN and OUT accept only an 8-bit immediate port address (0x00–0xFF). For ports above 0xFF, you must first load the port number into DX and use the register form. IN AL, 0x60 is valid; IN AL, 0x3F8 is not, because 0x3F8 does not fit in the 8-bit immediate. Write MOV DX, 0x3F8 followed by IN AL, DX instead.

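⚠️ Common Mistake: IN and OUT are I/O-sensitive instructions: they execute unconditionally only when CPL ≤ IOPL (the I/O privilege level field in RFLAGS), which in practice means ring 0. User-mode code (ring 3) that attempts them receives a #GP fault unless the I/O Permission Bitmap (IOPB) in the TSS grants access to the specific ports.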
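To make the IOPB rule concrete, here is a minimal C sketch of how a kernel might manipulate the bitmap. It assumes the bitmap is a plain byte array with one bit per port, where a clear bit means "access allowed" and a set bit means "fault". The names iopb_deny_all, iopb_allow, and iopb_permits are hypothetical helpers for illustration, not part of MinOS. A real TSS also needs a terminating 0xFF byte after the bitmap, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IOPB_PORTS 65536u
#define IOPB_BYTES (IOPB_PORTS / 8u)   /* 8192 bytes, one bit per port */

/* Deny every port: a set bit means "raise #GP on access". */
static void iopb_deny_all(uint8_t *bitmap) {
    memset(bitmap, 0xFF, IOPB_BYTES);
}

/* Allow ring 3 access to `count` consecutive ports starting at `first`. */
static void iopb_allow(uint8_t *bitmap, uint16_t first, uint32_t count) {
    for (uint32_t p = first; p < (uint32_t)first + count && p < IOPB_PORTS; p++)
        bitmap[p / 8] &= (uint8_t)~(1u << (p % 8));
}

/* Would an access of `width` bytes (1, 2 or 4) at `port` be permitted?
   Every bit covered by the access must be clear. */
static bool iopb_permits(const uint8_t *bitmap, uint16_t port, uint32_t width) {
    for (uint32_t p = port; p < (uint32_t)port + width; p++) {
        if (p >= IOPB_PORTS || (bitmap[p / 8] & (1u << (p % 8))))
            return false;
    }
    return true;
}
```

Granting a user-mode serial driver access to COM1 would then be iopb_allow(bitmap, 0x3F8, 8); every other port keeps faulting.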

I/O Wait

Some older hardware (ISA bus devices) requires a short delay between I/O port accesses. The traditional method is to write to an unused port:

; I/O wait: write harmless byte to port 0x80 (POST code port)
io_wait:
    out 0x80, al
    ret

Modern hardware (PCI and PCIe devices) does not require this, but it is still present in legacy code for compatibility.


Common I/O Port Reference

Port          Device    Direction  Purpose
0x20, 0x21    PIC1      R/W        Master PIC command/data
0x40–0x43     PIT       R/W        Programmable Interval Timer
0x60          PS/2      R/W        Keyboard/mouse data
0x64          PS/2      R/W        PS/2 status (read) / command (write)
0x70, 0x71    CMOS/RTC  R/W        Real-time clock (index/data)
0x80          POST      W          Diagnostic POST code display
0x92          System    R/W        Fast A20, system reset
0xA0, 0xA1    PIC2      R/W        Slave PIC command/data
0x1F0–0x1F7   IDE0      R/W        Primary IDE/ATA controller
0x3F8–0x3FF   COM1      R/W        Serial port (16550 UART)
0x2F8–0x2FF   COM2      R/W        Serial port 2
0x3C0–0x3DF   VGA       R/W        VGA registers
0xCF8, 0xCFC  PCI       R/W        PCI config space access

The PIT: Programmable Interval Timer (8253/8254)

The PIT is the timer chip that drove everything from the original PC speaker to the system scheduler's tick. It contains three independent 16-bit counter channels. Channel 0 is wired to IRQ0 and is the primary system timer.

PIT Architecture

PIT Register Map:
  Port 0x40: Channel 0 counter (IRQ0 — system timer)
  Port 0x41: Channel 1 counter (legacy DRAM refresh, ignore)
  Port 0x42: Channel 2 counter (PC speaker)
  Port 0x43: Mode/Command register (write only)

Mode/Command byte (write to 0x43):
  Bits 7:6 — Channel select: 00=ch0, 01=ch1, 10=ch2, 11=read-back
  Bits 5:4 — Access mode: 00=latch, 01=lo byte, 10=hi byte, 11=lo+hi
  Bits 3:1 — Operating mode: 000=interrupt on terminal count,
              010=rate generator (repeating), 011=square wave
  Bit  0   — BCD mode: 0=binary (use this), 1=BCD (don't)
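The bit layout above can be checked with a small helper that assembles the mode/command byte from its fields. This is only an illustrative sketch: pit_command and the pit_access enum are hypothetical names, not something the chapter's code defines.

```c
#include <assert.h>
#include <stdint.h>

/* Access mode field, bits 5:4 of the mode/command byte. */
enum pit_access { PIT_LATCH = 0, PIT_LO = 1, PIT_HI = 2, PIT_LOHI = 3 };

/* Assemble the byte written to port 0x43.
   channel: 0..2 (3 selects read-back), mode: operating mode 0..7,
   bcd: 0 for binary counting, 1 for BCD. */
static uint8_t pit_command(unsigned channel, enum pit_access access,
                           unsigned mode, unsigned bcd) {
    assert(channel <= 3 && mode <= 7 && bcd <= 1);
    return (uint8_t)((channel << 6) | ((unsigned)access << 4) |
                     (mode << 1) | bcd);
}
```

With these fields, pit_command(0, PIT_LOHI, 3, 0) yields 0x36: channel 0, lo+hi access, square wave, binary.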

Setting the PIT to 100Hz

The PIT's internal oscillator runs at 1.193182 MHz. To get 100Hz interrupts:

Reload value = 1,193,182 / desired_frequency
            = 1,193,182 / 100
            = 11,931 (0x2E9B, truncated from 11,931.82)

; Configure PIT Channel 0 for 100Hz IRQ0

PIT_CHANNEL0  equ 0x40
PIT_COMMAND   equ 0x43
PIT_FREQUENCY equ 1193182
DESIRED_HZ    equ 100
PIT_DIVISOR   equ PIT_FREQUENCY / DESIRED_HZ   ; = 11931

pit_init_100hz:
    ; Mode/Command: channel 0, lo+hi access, mode 3 (square wave generator)
    ; 0x36 = 0b00_11_011_0
    ;        ch0 | lo+hi | square wave | binary
    mov al, 0x36
    out PIT_COMMAND, al

    ; Send reload value (lo byte first, then hi byte)
    mov ax, PIT_DIVISOR
    out PIT_CHANNEL0, al    ; low byte
    shr ax, 8
    out PIT_CHANNEL0, al    ; high byte
    ret
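The reload arithmetic is easy to get subtly wrong for other frequencies, so here is a hedged C sketch of the same calculation. It truncates like the NASM constant above and clamps to what a 16-bit counter can hold, assuming the usual PIT convention that a programmed value of 0 counts as 65536. The function names are hypothetical.

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182u   /* PIT input clock from the text */

/* Reload value for a target frequency, truncating toward zero.
   Clamped to 1..65536; a result of 65536 is programmed as 0. */
static uint32_t pit_reload(uint32_t hz) {
    if (hz == 0)
        return 65536u;
    uint32_t reload = PIT_INPUT_HZ / hz;
    if (reload < 1u)
        reload = 1u;
    if (reload > 65536u)
        reload = 65536u;
    return reload;
}

/* Interrupt rate actually produced by a reload value, in millihertz. */
static uint32_t pit_actual_millihertz(uint32_t reload) {
    return (uint32_t)((uint64_t)PIT_INPUT_HZ * 1000u / reload);
}
```

Because of the truncation, a reload value of 11,931 produces slightly more than 100 interrupts per second (about 100.006 Hz), and any request below roughly 18.2 Hz saturates at the maximum count.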

The Timer IRQ Handler

; IRQ0 handler: fires at 100Hz (every 10ms)
; Vector 32 (IRQ0 = PIC base 32 + 0)

section .data
    tick_count: dq 0        ; global tick counter
    seconds:    dq 0        ; seconds since boot

section .text
timer_handler:
    ; Save every register the handler touches:
    ; DIV uses RDX:RAX, and RCX holds the divisor
    push rax
    push rcx
    push rdx

    ; Increment tick counter
    inc qword [tick_count]

    ; Every 100 ticks = 1 second
    mov rax, [tick_count]
    xor rdx, rdx
    mov rcx, 100
    div rcx
    test rdx, rdx
    jnz .no_second
    inc qword [seconds]
    ; Optional: update system clock, run scheduler tick, etc.

.no_second:
    ; Send EOI to PIC1
    mov al, 0x20
    out 0x20, al

    pop rdx
    pop rcx
    pop rax
    iretq
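The handler above performs a 64-bit division on every tick to detect second boundaries. One alternative, sketched here in C purely to show the bookkeeping, keeps a small per-second counter instead. The struct and function names are hypothetical, and a real handler would still be written in assembly or called from an assembly stub.

```c
#include <stdint.h>

#define TICKS_PER_SECOND 100u   /* matches the 100Hz PIT setup */

struct clock_state {
    uint64_t ticks;      /* total ticks since boot */
    uint64_t seconds;    /* whole seconds since boot */
    uint32_t subticks;   /* ticks elapsed within the current second */
};

/* Called once per timer interrupt; no division needed. */
static void clock_tick(struct clock_state *c) {
    c->ticks++;
    if (++c->subticks == TICKS_PER_SECOND) {
        c->subticks = 0;
        c->seconds++;
    }
}

/* Uptime in milliseconds: each tick is 10 ms at 100Hz. */
static uint64_t clock_uptime_ms(const struct clock_state *c) {
    return c->ticks * (1000u / TICKS_PER_SECOND);
}
```

After 250 calls to clock_tick, the state holds 2 whole seconds plus 50 ticks, which clock_uptime_ms reports as 2500 ms.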

The 16550 UART (Serial Port)

The 16550 UART is the serial port chip. It is invaluable for bare-metal debugging: before you have a VGA driver, before you have interrupts working, the serial port is already usable for debug output by polling. QEMU redirects serial output to stdout with -serial stdio.

UART Register Map (COM1: ports 0x3F8–0x3FF)

Port Offset  DLAB=0                                  DLAB=1
  +0         Receive Buffer (R) / Transmit Hold (W)  Divisor Latch Low (baud rate)
  +1         Interrupt Enable                        Divisor Latch High
  +2         Interrupt ID (R) / FIFO Control (W)     Interrupt ID (R) / FIFO Control (W)
  +3         Line Control                            Line Control (set DLAB here)
  +4         Modem Control                           Modem Control
  +5         Line Status                             Line Status
  +6         Modem Status                            Modem Status

Line Control Register (LCR) at +3:
  Bits 1:0 — Word length: 00=5-bit, 01=6-bit, 10=7-bit, 11=8-bit
  Bit  2   — Stop bits: 0=1 stop, 1=2 stops
  Bit  3   — Parity enable
  Bits 5:4 — Parity type: bit 4 = even parity select, bit 5 = stick parity
  Bit  7   — DLAB: 1=access divisor registers, 0=normal operation

Line Status Register (LSR) at +5:
  Bit 0 — Data Ready: 1=received byte available at +0
  Bit 5 — Transmitter Holding Register Empty: 1=safe to write to +0
  Bit 6 — Transmitter Empty: 1=both holding register and shift register empty
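As a cross-check of these bit fields, the following C sketch builds an LCR value from line parameters and decodes two LSR flags. The names (uart_lcr, uart_can_transmit, and so on) are hypothetical. The parity handling assumes the standard 16550 layout, where bit 3 enables parity and bit 4 selects even parity; stick parity and break control are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum uart_parity { UART_PARITY_NONE, UART_PARITY_ODD, UART_PARITY_EVEN };

#define LCR_DLAB       0x80u   /* bit 7: divisor latch access */
#define LSR_DATA_READY 0x01u   /* bit 0: received byte available */
#define LSR_THR_EMPTY  0x20u   /* bit 5: safe to write a byte */

/* Build the Line Control Register value (DLAB clear). */
static uint8_t uart_lcr(unsigned data_bits, unsigned stop_bits,
                        enum uart_parity parity) {
    assert(data_bits >= 5 && data_bits <= 8);
    assert(stop_bits == 1 || stop_bits == 2);
    uint8_t lcr = (uint8_t)(data_bits - 5);        /* bits 1:0 */
    if (stop_bits == 2)
        lcr |= 0x04;                               /* bit 2 */
    if (parity != UART_PARITY_NONE)
        lcr |= 0x08;                               /* bit 3: parity enable */
    if (parity == UART_PARITY_EVEN)
        lcr |= 0x10;                               /* bit 4: even parity */
    return lcr;
}

static bool uart_can_transmit(uint8_t lsr) { return (lsr & LSR_THR_EMPTY) != 0; }
static bool uart_has_data(uint8_t lsr)     { return (lsr & LSR_DATA_READY) != 0; }
```

For 8N1, uart_lcr(8, 1, UART_PARITY_NONE) gives 0x03, and OR-ing in LCR_DLAB gives the 0x83 you would write to keep the same framing while the divisor latch is open.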

Initializing the UART for 115200 Baud, 8N1

; COM1 at 115200 baud, 8 data bits, no parity, 1 stop bit (8N1)

COM1_BASE equ 0x3F8
BAUD_115200_DIVISOR equ 1   ; 1.8432 MHz / (16 × 115200) = 1

serial_init:
    ; COM1 ports are above 0xFF, so every access must go through DX.
    ; Clobbers RAX and RDX.

    ; Disable UART interrupts
    mov dx, COM1_BASE + 1
    mov al, 0x00
    out dx, al

    ; Enable DLAB (access divisor latch)
    mov dx, COM1_BASE + 3
    mov al, 0x80
    out dx, al

    ; Set baud rate divisor: 1 = 115200 baud
    mov dx, COM1_BASE + 0
    mov al, BAUD_115200_DIVISOR & 0xFF
    out dx, al              ; divisor low byte
    mov dx, COM1_BASE + 1
    mov al, BAUD_115200_DIVISOR >> 8
    out dx, al              ; divisor high byte

    ; 8 bits, no parity, 1 stop bit (clear DLAB)
    ; 0x03 = 0b00_0_0_0_11 (8N1, DLAB=0)
    mov dx, COM1_BASE + 3
    mov al, 0x03
    out dx, al

    ; Enable FIFO, clear TX/RX, 14-byte threshold
    mov dx, COM1_BASE + 2
    mov al, 0xC7
    out dx, al

    ; Modem control: assert DTR and RTS, set OUT2 (gates the UART's IRQ line)
    mov dx, COM1_BASE + 4
    mov al, 0x0B
    out dx, al
    ret

; serial_putchar: send one character to COM1
; AL = character to send (all registers preserved)
serial_putchar:
    push rdx
    push rax                ; save the character: AL is reused for the status read

    ; Wait until transmitter holding register is empty
    mov dx, COM1_BASE + 5   ; Line Status Register
.wait:
    in  al, dx
    test al, 0x20           ; bit 5: THR empty?
    jz  .wait               ; not ready yet, wait

    ; Restore character and send
    pop rax
    mov dx, COM1_BASE       ; port 0x3F8 is above 0xFF, so use DX
    out dx, al              ; write to transmit register

    pop rdx
    ret

; serial_puts: send null-terminated string to COM1
; RDI = pointer to string
serial_puts:
    push rax
.loop:
    mov al, [rdi]
    test al, al
    jz .done
    call serial_putchar
    inc rdi
    jmp .loop
.done:
    pop rax
    ret

; serial_write_hex64: write 64-bit value in RAX as hex to serial
serial_write_hex64:
    push rbx
    push rcx
    push rdx
    mov rbx, rax
    mov rcx, 16             ; 16 hex digits
.digit:
    rol rbx, 4              ; rotate MSN to LSN position
    mov al, bl
    and al, 0x0F
    add al, '0'
    cmp al, '9' + 1
    jl  .ok
    add al, 'A' - '9' - 1
.ok:
    call serial_putchar
    dec rcx
    jnz .digit
    pop rdx
    pop rcx
    pop rbx
    ret
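The listing hard-codes a divisor of 1 for 115200 baud. For other rates the same formula applies, and the C sketch below works it through, assuming the standard 1.8432 MHz UART reference clock mentioned in the comment on BAUD_115200_DIVISOR. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define UART_CLOCK_HZ 1843200u                /* 1.8432 MHz reference clock */
#define UART_MAX_BAUD (UART_CLOCK_HZ / 16u)   /* 115200 baud at divisor 1 */

/* Divisor latch value for a requested baud rate (truncating). */
static uint16_t uart_divisor(uint32_t baud) {
    /* Below 2 baud the divisor no longer fits in 16 bits. */
    assert(baud >= 2 && baud <= UART_MAX_BAUD);
    return (uint16_t)(UART_MAX_BAUD / baud);
}

/* Bytes written to offsets +0 and +1 while DLAB is set. */
static uint8_t uart_divisor_lo(uint16_t divisor) { return (uint8_t)(divisor & 0xFF); }
static uint8_t uart_divisor_hi(uint16_t divisor) { return (uint8_t)(divisor >> 8); }
```

For example, 9600 baud needs a divisor of 12 and 38400 baud needs 3, which would replace BAUD_115200_DIVISOR in serial_init.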

Using Serial Output in QEMU

# Redirect serial to stdout (see output in terminal)
qemu-system-x86_64 -drive format=raw,file=minOS.img -serial stdio

# Or redirect to a file
qemu-system-x86_64 -drive format=raw,file=minOS.img -serial file:serial.log

# Or to a PTY (for connecting with minicom or screen)
qemu-system-x86_64 -drive format=raw,file=minOS.img -serial pty
# QEMU prints: char device redirected to /dev/pts/N
# Then: screen /dev/pts/N 115200

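⚡ Performance Note: At 115200 baud with 8N1 framing, each byte occupies 10 bits on the wire (start + 8 data + stop), so the serial port transmits at most 11,520 bytes per second, or roughly 87 microseconds per byte. On real hardware, sustained output through serial_putchar is held to that rate by the busy-wait loop. For kernel debugging, this is fine; for production output, use VGA or framebuffer instead.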
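The throughput figures follow directly from the frame size. A short C sketch of the arithmetic, with hypothetical function names, looks like this:

```c
#include <stdint.h>

/* Bits on the wire per character: start bit + data + parity + stop bits. */
static uint32_t uart_frame_bits(uint32_t data_bits, uint32_t parity_bits,
                                uint32_t stop_bits) {
    return 1u + data_bits + parity_bits + stop_bits;
}

/* Upper bound on characters transferred per second. */
static uint32_t uart_bytes_per_second(uint32_t baud, uint32_t frame_bits) {
    return baud / frame_bits;
}

/* Time to shift out one character, in nanoseconds (truncated). */
static uint64_t uart_ns_per_byte(uint32_t baud, uint32_t frame_bits) {
    return (uint64_t)frame_bits * 1000000000ull / baud;
}
```

For 8N1 at 115200 baud this gives 10 bits per frame, 11,520 bytes per second, and 86,805 ns per byte, which rounds to the 87 microseconds quoted in the note.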


Memory-Mapped I/O (MMIO)

Most modern devices use memory-mapped I/O: device registers appear at specific physical addresses, accessed with ordinary MOV instructions. The kernel maps these physical addresses into the kernel's virtual address space.

; MMIO example: accessing device registers through a direct map of physical memory
; Assumes the kernel maps all physical memory starting at 0xFFFF800000000000:

%define PHYS_MAP_BASE 0xFFFF800000000000

; Read a 32-bit register at offset 0x100 of a device at physical 0xFED00000
    mov rax, PHYS_MAP_BASE + 0xFED00000
    mov eax, [rax + 0x100]   ; read register

; Write the LAPIC EOI register (offset 0x0B0 from physical 0xFEE00000)
    mov rax, PHYS_MAP_BASE + 0xFEE00000
    mov dword [rax + 0x0B0], 0   ; write 0 to EOI register

⚠️ Common Mistake: The C volatile keyword is essential for MMIO in C code. Without it, the compiler may optimize away or merge reads and writes to MMIO addresses that appear unused. In assembly, you have explicit control: every MOV is a real access. The mapping must also be uncacheable, and you may still need memory fences (SFENCE or MFENCE) to prevent CPU reordering, for example when the region is mapped write-combining.
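For readers writing the C side of a kernel, this is a minimal sketch of what volatile MMIO accessors typically look like. The names mmio_read32, mmio_write32, and mmio_wait32 are hypothetical, and the sketch assumes the caller passes a virtual address that is already mapped uncacheable.

```c
#include <stdint.h>

/* Read a 32-bit device register. The volatile cast forces a real load. */
static inline uint32_t mmio_read32(uintptr_t base, uint32_t offset) {
    return *(volatile uint32_t *)(base + offset);
}

/* Write a 32-bit device register. The volatile cast forces a real store. */
static inline void mmio_write32(uintptr_t base, uint32_t offset, uint32_t value) {
    *(volatile uint32_t *)(base + offset) = value;
}

/* Poll until (register & mask) == expected, giving up after max_spins reads.
   Returns 1 on success and 0 on timeout. Without volatile, the compiler
   could hoist the load out of this loop and spin forever. */
static inline int mmio_wait32(uintptr_t base, uint32_t offset, uint32_t mask,
                              uint32_t expected, uint32_t max_spins) {
    while (max_spins--) {
        if ((mmio_read32(base, offset) & mask) == expected)
            return 1;
    }
    return 0;
}
```

The helpers can be exercised against an ordinary array standing in for a device, which is also a convenient way to unit-test driver logic on a development machine.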


The APIC (Advanced Programmable Interrupt Controller)

Modern x86-64 systems use the APIC instead of the legacy 8259A PIC. The Local APIC (LAPIC) is built into each CPU core and is memory-mapped at physical address 0xFEE00000 (by default).

Key LAPIC Registers

LAPIC Register Map (at physical 0xFEE00000):
  Offset 0x020: Local APIC ID Register (read)
  Offset 0x030: Local APIC Version Register (read)
  Offset 0x080: Task Priority Register (TPR) — interrupt priority threshold
  Offset 0x0B0: EOI Register — write 0 to acknowledge interrupt (write only)
  Offset 0x0D0: Logical Destination Register
  Offset 0x0F0: Spurious Interrupt Vector Register — enable APIC + spurious vector
  Offset 0x320: LVT Timer Register — configure LAPIC timer
  Offset 0x380: Initial Count Register (for LAPIC timer)
  Offset 0x390: Current Count Register (read LAPIC timer count)
  Offset 0x3E0: Divide Configuration Register (LAPIC timer divider)

Initializing the LAPIC

; Enable the LAPIC and configure the spurious interrupt vector
; Assumes LAPIC is mapped in kernel address space

%define LAPIC_BASE      0xFEE00000      ; physical (use virtual in kernel)
%define LAPIC_SVR       0x0F0           ; Spurious Vector Register
%define LAPIC_EOI       0x0B0           ; End of Interrupt
%define LAPIC_LVT_TIMER 0x320           ; LVT Timer
%define LAPIC_ICRINIT   0x380           ; Initial Count
%define LAPIC_ICRCOUNT  0x390           ; Current Count
%define LAPIC_DIVCONF   0x3E0           ; Divide Configuration

%define VIRT_LAPIC      0xFFFFFFFFFEE00000   ; kernel virtual address (example)

lapic_enable:
    ; Set SVR: bit 8 = APIC enable, bits 7:0 = spurious vector (255 = 0xFF)
    mov eax, [VIRT_LAPIC + LAPIC_SVR]
    or  eax, 0x1FF              ; enable APIC + spurious vector = 0xFF
    mov [VIRT_LAPIC + LAPIC_SVR], eax
    ret

lapic_send_eoi:
    ; Write 0 to the EOI register to acknowledge the current interrupt
    mov dword [VIRT_LAPIC + LAPIC_EOI], 0
    ret

; Configure LAPIC one-shot timer at a given count
; EDI = initial count value (32-bit; timer ticks until the interrupt fires)
; Vector 32 (IRQ0 equivalent in APIC world)
lapic_timer_oneshot:
    ; Set timer vector in LVT (vector 32, mode 00 = one-shot)
    mov dword [VIRT_LAPIC + LAPIC_LVT_TIMER], 32
    ; Set divide by 16
    mov dword [VIRT_LAPIC + LAPIC_DIVCONF], 3
    ; Set initial count
    mov [VIRT_LAPIC + LAPIC_ICRINIT], edi
    ret
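The magic numbers in these routines (0x1FF, 32, 3) are packed bit fields. The C sketch below rebuilds them from named parts. All identifiers are hypothetical, and the divide encoding follows the usual LAPIC layout, in which bits 3, 1, and 0 of the Divide Configuration Register select the divisor and bit 2 is reserved.

```c
#include <assert.h>
#include <stdint.h>

enum lapic_timer_mode { LAPIC_ONESHOT = 0, LAPIC_PERIODIC = 1 };

#define LAPIC_SVR_ENABLE (1u << 8)    /* SVR bit 8: APIC software enable */
#define LAPIC_LVT_MASKED (1u << 16)   /* LVT bit 16: interrupt masked */

/* Spurious Interrupt Vector Register value with the APIC enabled. */
static uint32_t lapic_svr_value(uint8_t spurious_vector) {
    return LAPIC_SVR_ENABLE | spurious_vector;
}

/* LVT Timer value: vector in bits 7:0, timer mode in bits 18:17. */
static uint32_t lapic_lvt_timer(uint8_t vector, enum lapic_timer_mode mode,
                                int masked) {
    uint32_t value = vector | ((uint32_t)mode << 17);
    if (masked)
        value |= LAPIC_LVT_MASKED;
    return value;
}

/* Divide Configuration Register encoding for a given divisor. */
static uint32_t lapic_divide_config(unsigned divisor) {
    switch (divisor) {
    case 2:   return 0x0;
    case 4:   return 0x1;
    case 8:   return 0x2;
    case 16:  return 0x3;
    case 32:  return 0x8;
    case 64:  return 0x9;
    case 128: return 0xA;
    case 1:   return 0xB;
    default:
        assert(0 && "unsupported divisor");
        return 0x3;
    }
}
```

Here lapic_svr_value(0xFF) reproduces 0x1FF, lapic_lvt_timer(32, LAPIC_ONESHOT, 0) reproduces 32, and lapic_divide_config(16) reproduces 3, matching the constants in the assembly.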

PCI Device Discovery

PCI (Peripheral Component Interconnect) is the standard bus for modern peripheral devices. Every PCI device has a configuration space — 256 bytes of registers accessible via port I/O at ports 0xCF8 (address register) and 0xCFC (data register).

; Read a 32-bit value from PCI configuration space
; BUS:DEVICE:FUNCTION:OFFSET selects the register
; rdi = bus (8-bit), rsi = device (5-bit), rdx = function (3-bit), rcx = offset (8-bit)

pci_read32:
    ; Build the 32-bit address:
    ; Bit 31: enable bit (must be 1)
    ; Bits 23:16: bus number
    ; Bits 15:11: device number
    ; Bits 10:8: function number
    ; Bits 7:2: register offset (dword-aligned)
    ; Bits 1:0: must be 0
    mov eax, (1 << 31)          ; enable bit
    shl rdi, 16                  ; bus → bits 23:16
    or  rax, rdi
    shl rsi, 11                  ; device → bits 15:11
    or  rax, rsi
    shl rdx, 8                   ; function → bits 10:8
    or  rax, rdx
    and ecx, 0xFC                ; keep bits 7:2 only: dword-aligned offset below 256
    or  rax, rcx

    ; Write to PCI address port
    mov dx, 0xCF8
    out dx, eax

    ; Read from PCI data port
    mov dx, 0xCFC
    in  eax, dx
    ret

; Enumerate all PCI devices: scan bus 0, devices 0-31, function 0
pci_enumerate:
    push rbx
    xor rbx, rbx                ; device number
.scan_device:
    cmp rbx, 32
    jge .done

    ; Read vendor:device ID (offset 0)
    xor rdi, rdi                ; bus = 0
    mov rsi, rbx                ; device
    xor rdx, rdx                ; function = 0
    xor rcx, rcx                ; offset = 0
    call pci_read32
    ; If vendor ID = 0xFFFF, no device present
    cmp ax, 0xFFFF
    je .next

    ; Print vendor:device IDs (eax = DevID:VendorID)
    ; ...

.next:
    inc rbx
    jmp .scan_device
.done:
    pop rbx
    ret
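The address layout used by pci_read32 can be verified separately from the port I/O. This C sketch builds the 0xCF8 address word and splits the register at offset 0 into its two IDs. The function names are hypothetical, and the values passed in the usage note are arbitrary examples.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 32-bit value written to port 0xCF8. */
static uint32_t pci_config_address(uint8_t bus, uint8_t device,
                                   uint8_t function, uint8_t offset) {
    assert(device < 32 && function < 8);
    return (1u << 31)                   /* enable bit */
         | ((uint32_t)bus << 16)        /* bits 23:16 */
         | ((uint32_t)device << 11)     /* bits 15:11 */
         | ((uint32_t)function << 8)    /* bits 10:8 */
         | (offset & 0xFCu);            /* bits 7:2, dword-aligned */
}

/* Register 0 holds the device ID in the high word, vendor ID in the low word. */
static uint16_t pci_vendor_id(uint32_t reg0) { return (uint16_t)(reg0 & 0xFFFFu); }
static uint16_t pci_device_id(uint32_t reg0) { return (uint16_t)(reg0 >> 16); }

/* A vendor ID of 0xFFFF means nothing responded at this address. */
static int pci_device_present(uint32_t reg0) {
    return pci_vendor_id(reg0) != 0xFFFFu;
}
```

For bus 1, device 2, function 3, offset 0x10, the address word is 0x80011310. A read that returns 0x12378086 decodes to vendor 0x8086 and device 0x1237, while 0xFFFFFFFF means the slot is empty.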

ARM64: Everything Is MMIO

ARM64 has no port-mapped I/O. There is no IN/OUT equivalent. Every device register is memory-mapped. Memory access instructions (LDR/STR) are used for all device I/O.

// ARM64 MMIO example: read from a device register
// x0 = device base physical address (mapped in kernel)
ldr w1, [x0, #0x100]   // read 32-bit register at offset 0x100

// ARM64 memory barriers for MMIO:
// DSB (Data Synchronization Barrier): waits until all earlier memory accesses
// have completed before any later instruction executes
dsb sy                  // full system barrier (most conservative)
dsb st                  // waits only for earlier stores
// ISB (Instruction Synchronization Barrier): flushes the pipeline so that
// later instructions are fetched after the barrier
isb
// DMB (Data Memory Barrier): ordering without completion guarantee
dmb sy

The ARM64 memory model allows significant reordering of memory accesses. MMIO code therefore needs appropriate barriers wherever order matters, typically between writes to normal memory (such as a DMA descriptor) and the device register write that tells the hardware to use it. On x86-64, MFENCE provides the equivalent but is needed less often due to x86's stronger ordering model.
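A common place where this matters is a "fill a descriptor, then ring a doorbell" sequence. The C sketch below shows the shape of that pattern. Everything in it is an assumption for illustration: the mmio_wmb macro, the tx_desc layout, and the doorbell register are hypothetical, and a real driver might get away with a weaker barrier than dsb st. On non-ARM64 builds the macro falls back to a compiler-only barrier so the example still compiles.

```c
#include <stdint.h>

#if defined(__aarch64__)
/* Wait for earlier stores to complete before continuing. */
#define mmio_wmb() __asm__ volatile("dsb st" ::: "memory")
#else
/* Fallback: stop the compiler from reordering across this point. */
#define mmio_wmb() __asm__ volatile("" ::: "memory")
#endif

/* Hypothetical transmit descriptor living in normal (DMA-visible) memory. */
struct tx_desc {
    uint64_t buffer_addr;
    uint32_t length;
    uint32_t ready;
};

/* Publish a descriptor, then notify the device through its doorbell register. */
static void tx_submit(struct tx_desc *ring, uint32_t slot,
                      uint64_t buffer_addr, uint32_t length,
                      volatile uint32_t *doorbell) {
    ring[slot].buffer_addr = buffer_addr;
    ring[slot].length = length;
    ring[slot].ready = 1;

    /* The descriptor must be visible before the device is told to read it. */
    mmio_wmb();

    *doorbell = slot;
}
```

Without the barrier, the doorbell write could become visible to the device before the descriptor contents, and the device would fetch stale data.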


MinOS: PIT Timer Driver + Serial Debug Console

; minOS/drivers/timer_serial.asm — PIT + Serial initialization for MinOS

; Called during kernel init (before interrupts enabled):
drivers_init:
    call pit_init_100hz     ; set up 100Hz IRQ0
    call serial_init        ; set up 115200 baud COM1
    ; Test serial output
    mov rdi, init_msg
    call serial_puts
    ret

section .data
    init_msg db "MinOS serial console ready at 115200 baud", 13, 10, 0

📐 OS Kernel Project: After this chapter's drivers are integrated, MinOS has: a timer that fires 100 times per second (for scheduling), a serial console for debug output before VGA works, keyboard input, and physical memory management. The combination forms a functional embedded operating system.


Summary

Device I/O in x86-64 uses two mechanisms: port-mapped I/O with IN/OUT for legacy devices (PIC, PIT, UART, PS/2), and memory-mapped I/O with MOV for modern devices (APIC, PCIe, framebuffer). In this chapter the PIT is programmed to generate 100Hz timer interrupts. The 16550 UART provides a critical debug output path before the display system is initialized. On ARM64, all I/O is memory-mapped, and barriers must be placed explicitly wherever ordering matters.

🔄 Check Your Understanding:
1. What is the formula for calculating the PIT reload value to generate N Hz interrupts?
2. Why must the UART transmitter holding register be empty before writing a character?
3. What does the LAPIC EOI register do, and how does it differ from the 8259A PIC EOI?
4. Why does ARM64 require explicit memory barriers for MMIO when x86-64 generally does not?
5. What two I/O ports are used to read PCI configuration space registers?