Chapter 26: Interrupts, Exceptions, and Kernel Mode

When the Hardware Needs Your Attention

System calls are the user-space-initiated half of the kernel interface. But the kernel also needs to respond to events it did not ask for: a key was pressed, a timer fired, a process divided by zero, a memory access faulted. These events are handled through a unified mechanism: interrupts and exceptions.

Understanding this mechanism is the foundation of OS kernel development. Without it, you cannot write a keyboard driver, implement a preemptive scheduler, or handle page faults. The Interrupt Descriptor Table is the kernel's event dispatch table, and every x86 OS, from a 64KB RTOS to the Linux kernel, starts by setting one up.


Three Flavors of Interrupt

The x86-64 architecture uses the word "interrupt" loosely to cover three distinct event types:

Hardware interrupts are asynchronous signals from external devices. When the keyboard, the NIC, or the timer needs CPU attention, it signals the interrupt controller, which in turn signals the CPU. The CPU finishes its current instruction, saves state, and jumps to the handler.

Software interrupts are triggered intentionally by the INT n instruction. INT 0x80 was the old Linux system call interface on 32-bit x86. INT3 (or equivalently, the single-byte opcode 0xCC) is the breakpoint instruction used by debuggers. INT 0x10 invokes the BIOS video services in real mode.

Exceptions are triggered by the CPU itself when something goes wrong (or something special happens): division by zero, an invalid opcode, a general protection fault, a page fault. Some exceptions are recoverable (the kernel handles them and resumes the program); others are fatal.


The Interrupt Descriptor Table (IDT)

The IDT is a table of 256 entries, one per interrupt or exception vector number (0 through 255). Each entry is a gate descriptor — a 16-byte structure that tells the CPU:

  • The address of the handler function
  • The privilege level required to invoke this vector via a software interrupt
  • The type of gate (interrupt gate vs. trap gate)
  • The code segment the handler runs in

IDT Entry Format (16 bytes, 64-bit mode):

Bytes 0-1:   Offset [15:0]    — low 16 bits of handler address
Bytes 2-3:   Selector         — code segment selector (GDT entry)
Byte  4:     IST              — Interrupt Stack Table index (0 = use current stack)
Byte  5:     Type/Attr        — P(1) DPL(2) 0(1) Type(4)
Bytes 6-7:   Offset [31:16]   — next 16 bits of handler address
Bytes 8-11:  Offset [63:32]   — high 32 bits of handler address
Bytes 12-15: Reserved         — must be zero

Type field:
  0xE = 64-bit Interrupt Gate (clears IF on entry — disables interrupts)
  0xF = 64-bit Trap Gate (does NOT clear IF — interrupts remain enabled)
  0x5 = Task Gate (32-bit protected mode only; not available in 64-bit mode)

DPL (Descriptor Privilege Level):
  0 = only kernel can trigger via INT instruction
  3 = user space can trigger via INT instruction (needed for INT 0x80)

P (Present):
  1 = entry is valid
  0 = entry not present (triggers #NP fault)

In NASM, building an IDT entry:

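; Build a 16-byte IDT entry for handler at address 'handler_fn'
; in kernel code segment (selector 0x08), DPL=0, 64-bit interrupt gate
;
; Note: NASM applies & and >> only to plain numbers, so %1 must be a
; numeric constant here. With a relocatable label NASM rejects these
; expressions; in that case fill the entry at run time instead
; (as idt_set_gate does later in this chapter).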

%macro IDT_ENTRY 1
    dw (%1 & 0xFFFF)            ; offset [15:0]
    dw 0x0008                   ; kernel code segment selector
    db 0x00                     ; IST = 0 (use current stack)
    db 0x8E                     ; P=1, DPL=0, Type=0xE (64-bit interrupt gate)
    dw ((%1 >> 16) & 0xFFFF)    ; offset [31:16]
    dd ((%1 >> 32) & 0xFFFFFFFF); offset [63:32]
    dd 0                        ; reserved
%endmacro

⚠️ Common Mistake: The handler address is split across three non-contiguous fields in the gate descriptor. Getting the byte layout wrong is the most common IDT bug. Always test your IDT setup with a known-working triple-fault scenario in QEMU first.
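
One way to catch layout mistakes before booting anything is to pack a gate on the host and inspect the bytes. The following is a minimal C sketch of the 16-byte format described above, not MinOS code: the names `idt_gate_t`, `idt_pack_gate`, and `idt_gate_handler` are illustrative assumptions, and the handler address used in the usage note is made up.

```c
#include <stdint.h>
#include <string.h>

/* One 64-bit IDT gate, stored byte-for-byte as in the format table above. */
typedef struct {
    uint8_t bytes[16];
} idt_gate_t;

/* Pack a gate descriptor (hypothetical helper, host-side model only). */
static idt_gate_t idt_pack_gate(uint64_t handler, uint16_t selector,
                                uint8_t ist, uint8_t type_attr)
{
    idt_gate_t g;
    memset(&g, 0, sizeof g);                 /* bytes 12-15 stay zero (reserved) */
    g.bytes[0] = (uint8_t)(handler);         /* offset [15:0], low byte first */
    g.bytes[1] = (uint8_t)(handler >> 8);
    g.bytes[2] = (uint8_t)(selector);        /* code segment selector */
    g.bytes[3] = (uint8_t)(selector >> 8);
    g.bytes[4] = (uint8_t)(ist & 0x07);      /* IST index lives in bits [2:0] */
    g.bytes[5] = type_attr;                  /* P, DPL, gate type */
    g.bytes[6] = (uint8_t)(handler >> 16);   /* offset [31:16] */
    g.bytes[7] = (uint8_t)(handler >> 24);
    for (int i = 0; i < 4; i++)              /* offset [63:32] */
        g.bytes[8 + i] = (uint8_t)(handler >> (32 + 8 * i));
    return g;
}

/* Reassemble the handler address from the three offset fields. */
static uint64_t idt_gate_handler(const idt_gate_t *g)
{
    uint64_t addr = 0;
    addr |= (uint64_t)g->bytes[0];
    addr |= (uint64_t)g->bytes[1] << 8;
    addr |= (uint64_t)g->bytes[6] << 16;
    addr |= (uint64_t)g->bytes[7] << 24;
    for (int i = 0; i < 4; i++)
        addr |= (uint64_t)g->bytes[8 + i] << (32 + 8 * i);
    return addr;
}
```

Packing an example address such as 0xFFFFFFFF80123456 with selector 0x0008 and type byte 0x8E, then reading it back with `idt_gate_handler`, should round-trip to the same value. If it does not, one of the three offset fields is in the wrong place.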

The IDTR Register

The CPU finds the IDT through the IDTR (IDT Register), a 10-byte register holding the base address and limit of the IDT. You load it with the LIDT instruction:

section .data
    ; IDTR structure: 10 bytes (2-byte limit + 8-byte base)
    idtr:
        dw (256 * 16 - 1)   ; limit = 256 entries × 16 bytes each - 1
        dq idt_table         ; base address of IDT

section .text
    ; Load the IDT
    lidt [idtr]

LIDT is a privileged instruction (ring 0 only). You call it once during kernel initialization, before enabling interrupts.
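
The 10-byte operand that LIDT reads can be modeled in C to double-check the size and the limit arithmetic. This is a sketch under assumptions: `idtr_t` and `idtr_make` are illustrative names, and `__attribute__((packed))` is a GCC/Clang extension rather than standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Memory operand of LIDT in 64-bit mode: 2-byte limit, then 8-byte base.
 * Without packing, the compiler would insert 6 bytes of padding. */
typedef struct __attribute__((packed)) {
    uint16_t limit;   /* size of the IDT in bytes, minus 1 */
    uint64_t base;    /* linear address of the first gate */
} idtr_t;

_Static_assert(sizeof(idtr_t) == 10, "IDTR operand must be 10 bytes");
_Static_assert(offsetof(idtr_t, base) == 2, "base follows the 16-bit limit");

/* Build the operand for a table of `entries` 16-byte gates (hypothetical helper). */
static idtr_t idtr_make(uint64_t table_base, unsigned entries)
{
    idtr_t r;
    r.limit = (uint16_t)(entries * 16u - 1u);
    r.base  = table_base;
    return r;
}
```

For a full 256-entry table the limit comes out as 4095 (0x0FFF), matching the `256 * 16 - 1` expression in the NASM listing.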

Interrupt Gate vs. Trap Gate

The difference between an interrupt gate and a trap gate is what happens to the IF (Interrupt Enable Flag) in RFLAGS when the handler is entered:

  • Interrupt gate (type 0xE): IF is cleared on entry. While your interrupt handler runs, no other hardware interrupts can preempt it. This is the standard choice for hardware interrupt handlers to prevent re-entrancy.
  • Trap gate (type 0xF): IF is preserved. Other interrupts can fire while this handler runs. Use for exceptions where you want interrupts to remain enabled (exception handlers that may need to sleep, for example).
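
The type/attribute byte that selects between the two gate kinds is simple bit packing. A small C sketch, assuming the P(1) DPL(2) 0(1) Type(4) layout given earlier; the function names are illustrative, not part of MinOS:

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    GATE_INTERRUPT = 0xE,   /* clears IF on entry */
    GATE_TRAP      = 0xF    /* leaves IF unchanged */
};

/* Compose byte 5 of a gate descriptor: P in bit 7, DPL in bits 6:5, type in bits 3:0. */
static uint8_t gate_type_attr(bool present, unsigned dpl, unsigned type)
{
    return (uint8_t)((present ? 0x80u : 0u) | ((dpl & 3u) << 5) | (type & 0xFu));
}

/* True if entering through this gate disables maskable interrupts. */
static bool gate_clears_if(uint8_t type_attr)
{
    return (type_attr & 0x0Fu) == GATE_INTERRUPT;
}
```

With these helpers, a kernel-only interrupt gate is 0x8E, a kernel-only trap gate is 0x8F, and a gate that user code may invoke with INT n (DPL=3) is 0xEE or 0xEF.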

Exception Vectors 0–31

The first 32 vectors (0–31) are reserved by Intel for CPU-generated exceptions:

Vector  Mnemonic  Name                      Error Code?  Notes
0       #DE       Divide Error              No           DIV or IDIV with divisor 0, or quotient overflow
1       #DB       Debug                     No           Hardware breakpoints (DR0–DR3), single-step
2       (none)    Non-Maskable Interrupt    No           Hardware fault; cannot be masked with CLI
3       #BP       Breakpoint                No           INT3 (opcode 0xCC); used by debuggers
4       #OF       Overflow                  No           INTO instruction when OF flag set (INTO is invalid in 64-bit mode)
5       #BR       Bound Range               No           BOUND instruction (invalid in 64-bit mode)
6       #UD       Invalid Opcode            No           Undefined or illegal instruction
7       #NM       Device Not Available      No           FPU not present or CR0.TS set
8       #DF       Double Fault              Yes (0)      Exception during exception handling
10      #TS       Invalid TSS               Yes          Invalid Task State Segment
11      #NP       Segment Not Present       Yes          Descriptor with P=0, including an IDT gate
12      #SS       Stack-Segment Fault       Yes          Stack segment violation; non-canonical stack access in 64-bit mode
13      #GP       General Protection Fault  Yes          Privilege violation, segment violation
14      #PF       Page Fault                Yes          Virtual address not mapped, protection violation
16      #MF       x87 FPU Error             No           FPU floating-point error
17      #AC       Alignment Check           Yes (0)      Unaligned access with AC flag set
18      #MC       Machine Check             No           Hardware error; non-recoverable
19      #XM       SIMD FP Exception         No           SSE/AVX floating-point error
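
The "Error Code?" column is the part of this table that handler stubs depend on, so it can be useful to encode it once. Below is a minimal C sketch; the function name is an assumption. It covers the vectors listed above plus 10 (#TS), 11 (#NP), and 12 (#SS), which also push an error code, and it deliberately does not model newer vectors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the CPU pushes an error code for this exception vector.
 * Only the classic vectors 0-19 are modeled here. */
static bool exception_pushes_error_code(uint8_t vector)
{
    switch (vector) {
    case 8:   /* #DF, error code is always 0 */
    case 10:  /* #TS */
    case 11:  /* #NP */
    case 12:  /* #SS */
    case 13:  /* #GP */
    case 14:  /* #PF */
    case 17:  /* #AC, error code is always 0 */
        return true;
    default:
        return false;
    }
}
```

A stub generator could consult a predicate like this to decide whether a vector needs a dummy error code pushed on its behalf.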

The Page Fault (#PF) — Vector 14

Page fault is the exception you will handle most in a real OS kernel. When the CPU cannot translate a virtual address (because the page is not present, because the access violates permissions, or because of various other conditions), it fires vector 14. The kernel's page fault handler determines what happened and responds.

When a page fault fires:

  1. The CPU pushes SS, RSP, RFLAGS, CS, RIP, and an error code onto the stack
  2. CR2 is loaded with the faulting virtual address
  3. Vector 14's handler executes

The error code bits:

Bit 0 (P):  0 = page not present, 1 = page present (protection violation)
Bit 1 (W):  0 = caused by read, 1 = caused by write
Bit 2 (U):  0 = in kernel mode (ring 0), 1 = in user mode (ring 3)
Bit 3 (R):  1 = a reserved bit was set in a paging-structure entry
Bit 4 (I):  1 = caused by instruction fetch (NX violation)

; Page fault handler for MinOS
; At entry: error code on stack, CR2 = faulting address
page_fault_handler:
    ; The stack contains (top to bottom):
    ; [RSP+0]  = error code (pushed by CPU for #PF)
    ; [RSP+8]  = RIP (faulting instruction)
    ; [RSP+16] = CS
    ; [RSP+24] = RFLAGS
    ; [RSP+32] = RSP of the interrupted code (always pushed in 64-bit mode)
    ; [RSP+40] = SS

    push rax            ; save the registers this handler uses; the
    push rbx            ; interrupted code expects them unchanged
    mov rax, [rsp+16]   ; error code (now above the two saved registers)
    mov rbx, cr2        ; get faulting address

    ; Decode the error code
    test al, 1
    jz .page_not_present

.protection_violation:
    ; P=1: page was present but access violated permissions
    ; Could be: write to read-only page (copy-on-write trigger),
    ; user access to kernel page, or NX violation
    ; For MinOS: just kill the process
    call kernel_panic
    ; ... (not shown for brevity)

.page_not_present:
    ; P=0: page not mapped at all
    ; For MinOS: check if in a valid VMA, allocate a physical page,
    ; map it, and return. Or segfault if invalid.
    ; (Any call made here must also save the remaining caller-saved registers.)
    ; ...
    pop rbx
    pop rax
    add rsp, 8          ; discard the error code so RSP points at RIP
    iretq               ; return from interrupt (restores RIP, CS, RFLAGS, RSP, SS)
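
The error-code bits listed above can also be decoded in a higher-level language, which is convenient for logging or unit tests. The following is a hedged C sketch: `pf_info_t`, `pf_decode`, and `pf_classify` are illustrative names, and the three-way classification is an assumed policy, not something MinOS defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the #PF error code (bits 0-4 as listed above). */
typedef struct {
    bool present;            /* bit 0: 1 = protection violation, 0 = not present */
    bool write;              /* bit 1: 1 = write access */
    bool user;               /* bit 2: 1 = access came from user mode */
    bool reserved_bit;       /* bit 3: reserved bit set in a paging entry */
    bool instruction_fetch;  /* bit 4: instruction fetch */
} pf_info_t;

typedef enum { PF_NOT_PRESENT, PF_PROTECTION, PF_RESERVED_BIT } pf_kind_t;

static pf_info_t pf_decode(uint64_t error_code)
{
    pf_info_t pf;
    pf.present           = (error_code & (1u << 0)) != 0;
    pf.write             = (error_code & (1u << 1)) != 0;
    pf.user              = (error_code & (1u << 2)) != 0;
    pf.reserved_bit      = (error_code & (1u << 3)) != 0;
    pf.instruction_fetch = (error_code & (1u << 4)) != 0;
    return pf;
}

/* Coarse classification (assumed policy): corrupt tables first,
 * then the present/not-present split the handler branches on. */
static pf_kind_t pf_classify(const pf_info_t *pf)
{
    if (pf->reserved_bit)
        return PF_RESERVED_BIT;
    return pf->present ? PF_PROTECTION : PF_NOT_PRESENT;
}
```

For example, an error code of 0x6 decodes as a user-mode write to a page that is not present, which is the case a demand-paging kernel would typically try to satisfy.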

Interrupt Stack Frame

When an interrupt or exception fires, the CPU automatically pushes state onto the stack before jumping to the handler. In 64-bit mode the CPU always pushes SS and RSP, whether or not the privilege level changes, and it aligns the stack pointer to 16 bytes before pushing. The only variation in the layout is whether the exception adds an error code:

Exception with an error code (offsets from RSP at handler entry):

High address  ┌──────────────────┐
              │      old SS      │  ← RSP+40
              │      old RSP     │  ← RSP+32
              │     old RFLAGS   │  ← RSP+24
              │      old CS      │  ← RSP+16
              │      old RIP     │  ← RSP+8
              │   error code     │  ← RSP+0
Current RSP → └──────────────────┘

Interrupt or exception without an error code:

High address  ┌──────────────────┐
              │      old SS      │  ← RSP+32
              │      old RSP     │  ← RSP+24
              │     old RFLAGS   │  ← RSP+16
              │      old CS      │  ← RSP+8
              │      old RIP     │  ← RSP+0
Current RSP → └──────────────────┘

What a privilege change (ring 3 → ring 0) alters is which stack receives the frame. The CPU first loads RSP from the TSS (RSP0), so the frame lands on the kernel stack and the saved SS:RSP describe the user stack. Without a privilege change the frame is pushed onto the current stack, unless the gate selects an IST stack. (In 32-bit protected mode, by contrast, SS and ESP are pushed only on a privilege change.)

The IRETQ instruction pops all of this back and returns to the interrupted code. For exceptions with error codes, your handler must remove the error code from the stack before executing IRETQ.

; Generic exception handler (no error code):
generic_handler:
    ; save registers
    push rbp
    push rax
    push rbx
    ; ... (save all registers you will use)

    ; do work

    ; restore registers
    pop rbx
    pop rax
    pop rbp
    iretq               ; restore RIP, CS, RFLAGS, RSP, SS

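; Exception handler WITH error code (e.g., #GP, #PF):
gp_fault_handler:
    ; At entry, error code is already on the stack (pushed by CPU)
    push rax            ; save registers before using them
    push rbp
    push rbx
    mov rax, [rsp+24]   ; error code sits above the three saved registers
    ; ... handle fault ...
    pop rbx
    pop rbp
    pop rax
    add rsp, 8          ; discard the error code so RSP points at RIP
    iretq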
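
If the handler body is written in C, the frame can be described as a struct and passed by pointer. This is a minimal sketch assuming the 64-bit layout with an error code at the lowest address, as listed in the page fault handler comments; `exception_frame_t` and `frame_from_user` are hypothetical names, and the user code selector 0x1B in the usage note is an assumed value that the chapter does not define.

```c
#include <stddef.h>
#include <stdint.h>

/* Stack contents at handler entry for an exception that pushes an error
 * code. Field order follows increasing addresses from RSP. */
typedef struct {
    uint64_t error_code;  /* [RSP+0]  */
    uint64_t rip;         /* [RSP+8]  */
    uint64_t cs;          /* [RSP+16] */
    uint64_t rflags;      /* [RSP+24] */
    uint64_t rsp;         /* [RSP+32] */
    uint64_t ss;          /* [RSP+40] */
} exception_frame_t;

_Static_assert(offsetof(exception_frame_t, rip) == 8,  "RIP at RSP+8");
_Static_assert(offsetof(exception_frame_t, ss)  == 40, "SS at RSP+40");
_Static_assert(sizeof(exception_frame_t) == 48, "six 8-byte slots");

/* The low two bits of the saved CS give the privilege level of the
 * interrupted code; 3 means the exception came from user mode. */
static int frame_from_user(const exception_frame_t *frame)
{
    return (frame->cs & 3u) == 3u;
}
```

A handler that receives such a pointer can distinguish kernel faults from user faults by checking the saved CS: a value like 0x1B (assumed user selector) reports user mode, while the kernel selector 0x08 does not.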

⚠️ Common Mistake: Forgetting that some exceptions push an error code and some do not. If you write one generic handler that pops an error code and use it for an exception that does not push one, you will corrupt your stack and triple-fault. You need separate stubs for each vector, or a macro that inserts a dummy error code for the exceptions that do not push one.

The classic solution (Linux's entry stubs use a similar trick):

; For exceptions WITHOUT error code: push a fake 0 error code
; to make all handlers uniform
%macro EXCEPTION_NOERR 1
exception_vector_%1:
    push qword 0        ; fake error code
    push qword %1       ; vector number
    jmp common_exception_handler
%endmacro

; For exceptions WITH error code: just push the vector number
%macro EXCEPTION_ERR 1
exception_vector_%1:
    push qword %1       ; CPU already pushed error code; push vector number
    jmp common_exception_handler
%endmacro

EXCEPTION_NOERR 0   ; #DE divide error
EXCEPTION_NOERR 1   ; #DB debug
EXCEPTION_NOERR 2   ; NMI
EXCEPTION_NOERR 3   ; #BP breakpoint
EXCEPTION_NOERR 6   ; #UD invalid opcode
EXCEPTION_ERR   8   ; #DF double fault
EXCEPTION_ERR   13  ; #GP general protection
EXCEPTION_ERR   14  ; #PF page fault

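common_exception_handler:
    ; Stack now (top first): vector_number, error_code, old_RIP, old_CS,
    ; old_RFLAGS, old_RSP, old_SS
    push rdi            ; save the argument registers before loading them
    push rsi
    mov rdi, [rsp+16]   ; vector number → first argument
    mov rsi, [rsp+24]   ; error code → second argument
    ; ... save the other caller-saved registers (RAX, RCX, RDX, R8-R11) ...
    call handle_exception
    ; ... restore them in reverse order ...
    pop rsi
    pop rdi
    add rsp, 16         ; discard vector number and error code
    iretq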
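
The stub calls `handle_exception` with the vector number as the first argument and the error code as the second. What that function does is not shown in this chapter; one plausible shape is a table of per-vector callbacks. The C sketch below is an assumption in every detail except the function name and argument order: `exception_register`, `record_exception`, and the counters are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*exception_fn)(uint64_t vector, uint64_t error_code);

static exception_fn exception_table[32];   /* one slot per CPU exception */
static uint64_t unhandled_count;           /* stand-in for a kernel panic */
static uint64_t last_vector, last_error;   /* filled by the sample handler */

/* Install a callback for one exception vector (0-31). */
static void exception_register(uint64_t vector, exception_fn fn)
{
    if (vector < 32)
        exception_table[vector] = fn;
}

/* Sample callback that only records what it was called with. */
static void record_exception(uint64_t vector, uint64_t error_code)
{
    last_vector = vector;
    last_error  = error_code;
}

/* Called from the assembly stub: RDI = vector, RSI = error code. */
void handle_exception(uint64_t vector, uint64_t error_code)
{
    if (vector < 32 && exception_table[vector] != NULL)
        exception_table[vector](vector, error_code);
    else
        unhandled_count++;   /* a real kernel would stop here */
}
```

Because every stub pushes both a vector number and an error code (real or fake), the dispatcher never needs to know which exceptions supply one.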

Hardware Interrupts and the PIC

Vectors 32–255 are available for hardware interrupts and OS use. The x86-64 does not define which device gets which vector — that is determined by the interrupt controller configuration.

The 8259A PIC (Legacy)

The original IBM PC used a cascade of two Intel 8259A Programmable Interrupt Controllers. Each has 8 inputs (IRQ0–IRQ7), giving 15 total IRQs (IRQ2 is the cascade). The PIC is configured through port I/O:

IRQ0  — System timer (PIT, 18.2 Hz by default)
IRQ1  — PS/2 keyboard
IRQ2  — Cascade to second PIC
IRQ3  — COM2 (serial port 2)
IRQ4  — COM1 (serial port 1)
IRQ5  — LPT2 or sound card
IRQ6  — Floppy controller
IRQ7  — LPT1 (parallel port)
IRQ8  — Real-time clock
IRQ9  — ACPI / redirected IRQ2
IRQ10 — Available (often NIC)
IRQ11 — Available (often USB controller)
IRQ12 — PS/2 mouse
IRQ13 — FPU error (legacy)
IRQ14 — Primary IDE controller
IRQ15 — Secondary IDE controller

Critical issue: By default, IRQ0 maps to interrupt vector 8, but vector 8 is already used for the double fault (#DF). If a timer fires, the CPU would invoke the double fault handler — catastrophically wrong. You must remap the PIC to use vectors 32–47 (above the 32 reserved exception vectors).
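
The collision is easy to see once the IRQ-to-vector arithmetic is written down. A short C sketch follows; the function names are illustrative, and the slave base of 0x70 used in the usage note is the conventional BIOS setting, stated here as an assumption since the chapter only mentions the master's default.

```c
#include <stdint.h>

/* Each PIC adds its base vector to the local line number (0-7). */
static uint8_t irq_to_vector(uint8_t irq, uint8_t pic1_base, uint8_t pic2_base)
{
    if (irq < 8)
        return (uint8_t)(pic1_base + irq);
    return (uint8_t)(pic2_base + (irq - 8));
}

/* Vectors 0-31 belong to CPU exceptions. */
static int vector_is_cpu_exception(uint8_t vector)
{
    return vector < 32;
}
```

With the default master base of 0x08, IRQ0 lands on vector 8 and `vector_is_cpu_exception` reports a clash with #DF. With bases 0x20 and 0x28, IRQ0 through IRQ15 map to vectors 32 through 47 and no hardware interrupt overlaps an exception.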

; Remap the 8259A PIC to use IRQ0=vector 32, IRQ8=vector 40

%define PIC1_CMD    0x20
%define PIC1_DATA   0x21
%define PIC2_CMD    0xA0
%define PIC2_DATA   0xA1
%define PIC_EOI     0x20    ; End-of-Interrupt command

pic_remap:
    ; ICW1: Initialize PICs
    mov al, 0x11            ; ICW1: init + ICW4 needed
    out PIC1_CMD, al
    out PIC2_CMD, al

    ; ICW2: Set base vectors
    mov al, 0x20            ; PIC1: IRQ0 → vector 32 (0x20)
    out PIC1_DATA, al
    mov al, 0x28            ; PIC2: IRQ8 → vector 40 (0x28)
    out PIC2_DATA, al

    ; ICW3: Tell PIC1 about the cascade, tell PIC2 its ID
    mov al, 0x04            ; PIC1: IRQ2 has slave PIC
    out PIC1_DATA, al
    mov al, 0x02            ; PIC2: I am slave, connected to IRQ2
    out PIC2_DATA, al

    ; ICW4: 8086 mode
    mov al, 0x01
    out PIC1_DATA, al
    out PIC2_DATA, al

    ; Mask all interrupts initially (unmask as drivers are initialized)
    mov al, 0xFF
    out PIC1_DATA, al
    out PIC2_DATA, al
    ret

; After handling a hardware interrupt, send EOI to the PIC
; to re-enable interrupts from that IRQ line
pic_send_eoi:
    ; RDI = IRQ number
    cmp rdi, 8
    jl .eoi_pic1
    ; IRQ8-15: send EOI to both PICs
    mov al, PIC_EOI
    out PIC2_CMD, al
.eoi_pic1:
    mov al, PIC_EOI
    out PIC1_CMD, al
    ret

⚠️ Common Mistake: Forgetting the EOI. If you handle a hardware interrupt but never send the End-of-Interrupt command, the PIC keeps that IRQ marked as in service and delivers neither it nor any lower-priority IRQ again. The keyboard will stop working after the first keystroke, or the timer will fire only once.
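
The EOI routing rule (slave first for IRQ8 to IRQ15, then always the master) can be checked on the host by replacing the port write with a logger. This is a sketch only: `outb` here is a mock that records writes rather than the real privileged instruction, and the log structure is an assumption made for testing.

```c
#include <stddef.h>
#include <stdint.h>

#define PIC1_CMD 0x20
#define PIC2_CMD 0xA0
#define PIC_EOI  0x20

typedef struct { uint16_t port; uint8_t value; } port_write_t;

static port_write_t port_log[8];   /* recorded writes, oldest first */
static size_t port_log_len;

/* Mock port write: records the access instead of touching hardware. */
static void outb(uint16_t port, uint8_t value)
{
    if (port_log_len < sizeof port_log / sizeof port_log[0])
        port_log[port_log_len++] = (port_write_t){ port, value };
}

/* Same logic as the assembly routine: IRQs on the slave PIC need an EOI
 * on both chips, because the slave is cascaded through the master. */
static void pic_send_eoi(uint8_t irq)
{
    if (irq >= 8)
        outb(PIC2_CMD, PIC_EOI);
    outb(PIC1_CMD, PIC_EOI);
}
```

Acknowledging IRQ1 produces a single write to port 0x20, while acknowledging IRQ12 produces a write to 0xA0 followed by one to 0x20.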

The APIC (Modern)

Modern x86-64 systems use the Advanced PIC (APIC), which consists of:

  • A Local APIC (LAPIC) in each CPU core, which handles interrupts for that specific CPU
  • An I/O APIC on the chipset, which routes hardware interrupts to the appropriate LAPIC

The LAPIC is memory-mapped at physical address 0xFEE00000 (by default). You read/write it with MOV instructions, not IN/OUT. The LAPIC is covered in detail in Chapter 29.


The Keyboard IRQ Handler (IRQ1 → Vector 33)

IRQ1 is the PS/2 keyboard interrupt. When a key is pressed or released, the PS/2 controller signals IRQ1. Your handler reads the scancode from port 0x60 and processes it.

; IRQ1 handler: PS/2 keyboard
; Vector 33 (IRQ1 = base 32 + 1)
keyboard_handler:
    push rax
    push rbx

    ; Read scancode from keyboard data port
    in al, 0x60             ; port 0x60 = PS/2 keyboard data

    ; Process the scancode
    ; Bit 7 = 0: key pressed (make code)
    ; Bit 7 = 1: key released (break code)
    test al, 0x80
    jnz .key_released

.key_pressed:
    cmp al, 70              ; the table below has 70 entries;
    jae .done               ; ignore scancodes that fall outside it
    movzx rbx, al
    ; Look up ASCII in scancode table
    lea rax, [rel scancode_to_ascii]
    movzx rax, byte [rax + rbx]
    ; Store in keyboard ring buffer (if non-zero)
    test al, al
    jz .done
    call kb_buffer_put      ; add to keyboard buffer (assumed to take the
                            ; character in AL and preserve other registers)
    jmp .done

.key_released:
    ; Could track modifier key state here (Shift, Ctrl, Alt)

.done:
    ; Send EOI to PIC to re-enable keyboard interrupts
    mov al, 0x20
    out 0x20, al            ; PIC1 EOI

    pop rbx
    pop rax
    iretq

; Scancode Set 1 (scan code → ASCII) — partial table
scancode_to_ascii:
    db 0,  27, '1','2','3','4','5','6','7','8','9','0','-','='
    db 8,   9, 'q','w','e','r','t','y','u','i','o','p','[',']'
    db 13,  0, 'a','s','d','f','g','h','j','k','l',';',"'", '`'
    db  0, 92, 'z','x','c','v','b','n','m',',','.','/',  0, '*'
    db  0, ' ', 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0
    ; (simplified; real table handles Shift, F-keys, etc.)
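
The same decoding logic is easier to test in C than inside an interrupt handler. The sketch below reuses the partial Scancode Set 1 table from the listing; `key_event_t` and `decode_scancode` are illustrative names, and unlike the handler above it also reports which key a break code belongs to.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Partial Scancode Set 1 table, same contents as the NASM table above. */
static const char scancode_to_ascii[] = {
    0,  27, '1','2','3','4','5','6','7','8','9','0','-','=',
    8,   9, 'q','w','e','r','t','y','u','i','o','p','[',']',
    13,  0, 'a','s','d','f','g','h','j','k','l',';','\'','`',
    0,  92, 'z','x','c','v','b','n','m',',','.','/',  0, '*',
    0, ' ',  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0
};

typedef struct {
    bool released;  /* true for a break code (bit 7 set) */
    char ascii;     /* 0 if the key has no entry in the table */
} key_event_t;

static key_event_t decode_scancode(uint8_t scancode)
{
    key_event_t ev;
    uint8_t base = scancode & 0x7F;          /* strip the make/break bit */
    ev.released = (scancode & 0x80) != 0;
    /* Bounds check: the table is shorter than the 128 possible codes. */
    ev.ascii = base < sizeof scancode_to_ascii ? scancode_to_ascii[base] : 0;
    return ev;
}
```

Scancode 0x1E decodes to a press of 'a' and 0x9E to its release. Codes past the end of the table, such as 0x7F, yield no character instead of reading out of bounds.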

MinOS Kernel: Setting Up the Full IDT

Here is the complete IDT initialization for MinOS:

; minOS/kernel/idt.asm — IDT setup for MinOS

IDT_ENTRIES equ 256
IDT_ENTRY_SIZE equ 16

section .bss
    align 16
    idt_table: resb IDT_ENTRIES * IDT_ENTRY_SIZE

section .data
    align 4
    idt_ptr:
        dw IDT_ENTRIES * IDT_ENTRY_SIZE - 1
        dq idt_table

section .text

; idt_set_gate: install a handler in the IDT
; RDI = vector number (0-255)
; RSI = handler address
; DL  = type (0x8E = interrupt gate, 0x8F = trap gate)
idt_set_gate:
    ; Calculate address of IDT entry: idt_table + vector * 16
    imul rdi, rdi, IDT_ENTRY_SIZE
    lea rax, [idt_table + rdi]

    ; Store offset[15:0]
    mov [rax], si
    ; Store code segment selector (kernel CS = 0x08)
    mov word [rax+2], 0x0008
    ; Store IST (0) and reserved
    mov byte [rax+4], 0x00
    ; Store type/attr (DL)
    mov [rax+5], dl
    ; Store offset[31:16]
    shr rsi, 16             ; bits 31:16 are now in the low word
    mov [rax+6], si
    ; Store offset[63:32]
    shr rsi, 16             ; bits 63:32 are now in the low dword
    mov [rax+8], esi
    ; Store reserved (must be zero)
    mov dword [rax+12], 0
    ret

idt_init:
    ; Install exception handlers (vectors 0-31)
    mov rdi, 0
    mov rsi, exception_vector_0
    mov dl, 0x8E            ; interrupt gate, DPL=0
    call idt_set_gate

    mov rdi, 1
    mov rsi, exception_vector_1
    mov dl, 0x8E
    call idt_set_gate

    ; ... (vectors 2-31 similarly) ...

    mov rdi, 14             ; #PF page fault
    mov rsi, page_fault_handler
    mov dl, 0x8E
    call idt_set_gate

    ; Remap and set up PIC
    call pic_remap

    ; Install IRQ handlers (vectors 32-47)
    mov rdi, 32             ; IRQ0 = timer
    mov rsi, timer_handler
    mov dl, 0x8E
    call idt_set_gate

    mov rdi, 33             ; IRQ1 = keyboard
    mov rsi, keyboard_handler
    mov dl, 0x8E
    call idt_set_gate

    ; Unmask IRQ0 (timer) and IRQ1 (keyboard) in PIC
    in al, 0x21             ; read PIC1 mask
    and al, 0xFC            ; clear bits 0 and 1 (unmask IRQ0, IRQ1)
    out 0x21, al

    ; Load the IDT
    lidt [idt_ptr]

    ; Enable interrupts
    sti                     ; set IF flag — interrupts now active!
    ret
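
The mask value 0xFC used when unmasking IRQ0 and IRQ1 follows from one bit per IRQ line, where 1 means masked. A small C sketch of that arithmetic, with illustrative function names that are not part of MinOS:

```c
#include <stdint.h>

/* In the PIC's mask register, bit n = 1 blocks line n of that chip. */

static uint8_t pic_unmask_irq(uint8_t mask, unsigned line)
{
    return (uint8_t)(mask & (uint8_t)~(1u << (line & 7u)));
}

static uint8_t pic_mask_irq(uint8_t mask, unsigned line)
{
    return (uint8_t)(mask | (1u << (line & 7u)));
}

static int pic_irq_is_masked(uint8_t mask, unsigned line)
{
    return (mask >> (line & 7u)) & 1u;
}
```

Starting from the all-masked value 0xFF that `pic_remap` writes, unmasking lines 0 and 1 yields 0xFC, which is the constant the initialization code ANDs into the PIC1 mask.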

Ring Transitions: The Privilege Level Mechanism

x86-64 enforces four privilege rings (0–3), though most OSes use only ring 0 (kernel) and ring 3 (user). The current privilege level (CPL) is stored in bits [1:0] of the CS register.

When an interrupt fires in user mode (ring 3), the CPU:

  1. Looks up the IDT entry for the vector
  2. Checks that CPL ≤ DPL for software interrupts (else #GP)
  3. For hardware interrupts and exceptions: always takes the interrupt
  4. Switches to the stack defined in the TSS (Task State Segment) for ring 0
  5. Pushes SS, RSP (user), RFLAGS, CS, RIP onto the kernel stack
  6. Loads CS with the kernel code segment, sets CPL = 0
  7. Jumps to the handler
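
The privilege checks in this sequence reduce to a few bit operations on selectors and gate fields. Here is a hedged C sketch; the function names are assumptions, and the user code selector 0x1B mentioned in the usage note is an example value that the chapter does not define.

```c
#include <stdbool.h>
#include <stdint.h>

/* A segment selector is: index (bits 15:3), table indicator (bit 2),
 * requested privilege level (bits 1:0). For CS, bits 1:0 are the CPL. */
static unsigned selector_rpl(uint16_t selector)   { return selector & 3u; }
static unsigned selector_index(uint16_t selector) { return selector >> 3; }

/* INT n is permitted only if the caller is at least as privileged as
 * the gate's DPL requires (numerically, CPL <= DPL). */
static bool software_int_allowed(unsigned cpl, unsigned gate_dpl)
{
    return cpl <= gate_dpl;
}

/* With IST = 0, the CPU loads RSP0 from the TSS only when the
 * interrupted code was running outside ring 0. */
static bool needs_stack_switch(uint16_t interrupted_cs)
{
    return selector_rpl(interrupted_cs) != 0;
}
```

The kernel selector 0x08 decodes to GDT index 1 at privilege level 0, while a selector such as 0x1B decodes to index 3 at privilege level 3. User code can therefore invoke a DPL=3 gate but triggers #GP on a DPL=0 gate.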

The TSS (Task State Segment) stores the ring-0 stack pointer (RSP0) that the CPU loads when transitioning from ring 3. In a single-processor kernel, there is one TSS; in SMP kernels, one TSS per CPU.

; TSS layout (minimal, 64-bit; 104 bytes in total)
section .data
    align 16
    tss:
        dd 0                ; reserved
        dq kernel_stack_top ; RSP0: kernel stack for ring-0 transitions
        dq 0                ; RSP1 (not used)
        dq 0                ; RSP2 (not used)
        dq 0                ; reserved
        dq 0                ; IST1 (Interrupt Stack Table)
        dq 0                ; IST2
        times 5 dq 0        ; IST3-IST7
        dq 0                ; reserved
        dw 0                ; reserved
        dw 104              ; IOPB offset (= TSS size, so no I/O bitmap)
Summary

The IDT is the CPU's event dispatch table. Its 256 gate descriptors, 16 bytes each, map interrupt and exception vectors to handler addresses. The PIC (or APIC in modern systems) routes hardware interrupt requests to specific vectors. When any interrupt fires, the CPU automatically saves RIP, CS, RFLAGS, RSP, and SS (plus an error code for some exceptions) and jumps to the handler. Your handler processes the event and returns with IRETQ. Without the IDT, the CPU cannot handle keyboard input, timer events, or memory faults, and there is no operating system.

🔄 Check Your Understanding:

  1. What is the difference between an interrupt gate and a trap gate, and when would you use each?
  2. Why must you remap the 8259A PIC from its default vectors (8–15) to vectors 32–47?
  3. What information does the CPU automatically push onto the kernel stack when a page fault occurs in user mode?
  4. Why does the keyboard IRQ handler need to send an EOI to the PIC?
  5. What is the purpose of the IRETQ instruction, and what does it restore?