In This Chapter
- When the Hardware Needs Your Attention
- Three Flavors of Interrupt
- The Interrupt Descriptor Table (IDT)
- Exception Vectors 0–31
- Interrupt Stack Frame
- Hardware Interrupts and the PIC
- The Keyboard IRQ Handler (IRQ1 → Vector 33)
- MinOS Kernel: Setting Up the Full IDT
- Ring Transitions: The Privilege Level Mechanism
- Summary
Chapter 26: Interrupts, Exceptions, and Kernel Mode
When the Hardware Needs Your Attention
System calls are the user-space-initiated half of the kernel interface. But the kernel also needs to respond to events it did not ask for: a key was pressed, a timer fired, a process divided by zero, a memory access faulted. These events are handled through a unified mechanism: interrupts and exceptions.
Understanding this mechanism is the foundation of OS kernel development. Without it, you cannot write a keyboard driver, implement a preemptive scheduler, or handle page faults. The Interrupt Descriptor Table is the kernel's event dispatch table, and every x86 OS, from a 64 KB RTOS to the Linux kernel, starts by setting one up.
Three Flavors of Interrupt
The x86-64 architecture uses the word "interrupt" loosely to cover three distinct event types:
Hardware interrupts are asynchronous signals from external devices. The keyboard, the NIC, the timer — when any of these needs CPU attention, they signal the interrupt controller, which signals the CPU at the appropriate moment. The CPU finishes its current instruction, saves state, and jumps to the handler.
Software interrupts are triggered intentionally by the INT n instruction. INT 0x80 was the old Linux system call interface on 32-bit x86. INT3 (or equivalently, the single-byte opcode 0xCC) is the breakpoint instruction used by debuggers. INT 0x10 invokes the BIOS video services in real mode.
Exceptions are triggered by the CPU itself when something goes wrong (or something special happens): division by zero, an invalid opcode, a general protection fault, a page fault. Some exceptions are recoverable (the kernel handles them and resumes the program); others are fatal.
The Interrupt Descriptor Table (IDT)
The IDT is a table of 256 entries, one per interrupt or exception vector number (0 through 255). Each entry is a gate descriptor — a 16-byte structure that tells the CPU:
- The address of the handler function
- The privilege level required to invoke this vector via a software interrupt
- The type of gate (interrupt gate vs. trap gate)
- The code segment the handler runs in
IDT Entry Format (16 bytes, 64-bit mode):
Bytes 0-1: Offset [15:0] — low 16 bits of handler address
Bytes 2-3: Selector — code segment selector (GDT entry)
Byte 4: IST — Interrupt Stack Table index (0 = use current stack)
Byte 5: Type/Attr — P(1) DPL(2) 0(1) Type(4)
Bytes 6-7: Offset [31:16] — next 16 bits of handler address
Bytes 8-11: Offset [63:32] — high 32 bits of handler address
Bytes 12-15: Reserved — must be zero
Type field:
0xE = 64-bit Interrupt Gate (clears IF on entry — disables interrupts)
0xF = 64-bit Trap Gate (does NOT clear IF — interrupts remain enabled)
0x5 = Task Gate (32-bit protected mode only; not available in 64-bit mode)
DPL (Descriptor Privilege Level):
0 = only kernel can trigger via INT instruction
3 = user space can trigger via INT instruction (needed for INT 0x80)
P (Present):
1 = entry is valid
0 = entry not present (triggers #NP fault)
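The Type/Attr byte is the field most often assembled by hand, so it can help to compute it rather than memorize constants such as 0x8E. The following is a minimal C sketch of that packing; the function and enum names are illustrative and not part of MinOS.

```c
#include <stdint.h>

/* Gate types from the listing above. */
enum { GATE_INTERRUPT = 0xE, GATE_TRAP = 0xF };

/* Pack P (bit 7), DPL (bits 6:5), a zero bit (bit 4), and Type (bits 3:0)
   into the single Type/Attr byte of a gate descriptor. */
static uint8_t gate_type_attr(unsigned present, unsigned dpl, unsigned type)
{
    return (uint8_t)(((present & 1u) << 7) | ((dpl & 3u) << 5) | (type & 0xFu));
}
```

With P=1 and DPL=0 this yields 0x8E for an interrupt gate and 0x8F for a trap gate; raising DPL to 3 on an interrupt gate gives 0xEE.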
In NASM, building an IDT entry:
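; Build a 16-byte IDT entry for handler at address 'handler_fn'
; in kernel code segment (selector 0x08), DPL=0, 64-bit interrupt gate
; Note: NASM applies & and >> only to scalar values, so %1 must be an
; assemble-time constant; for a relocatable label, fill the gate at run time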
%macro IDT_ENTRY 1
dw (%1 & 0xFFFF) ; offset [15:0]
dw 0x0008 ; kernel code segment selector
db 0x00 ; IST = 0 (use current stack)
db 0x8E ; P=1, DPL=0, Type=0xE (64-bit interrupt gate)
dw ((%1 >> 16) & 0xFFFF) ; offset [31:16]
dd ((%1 >> 32) & 0xFFFFFFFF); offset [63:32]
dd 0 ; reserved
%endmacro
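⚠️ Common Mistake: The handler address is split across three non-contiguous fields in the gate descriptor. Getting the byte layout wrong is the most common IDT bug. Test your IDT in QEMU before relying on it: deliberately trigger a known exception (a divide by zero, for example) and confirm that your handler runs. A malformed gate typically escalates to a triple fault and resets the machine.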
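One way to check the split before booting anything is to pack a gate in ordinary user-space C and reassemble the address from its three pieces. The sketch below mirrors the 16-byte layout given earlier; `struct idt_gate`, `make_gate`, and `gate_handler` are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* One 64-bit IDT gate, field for field as in the format listing.
   The members are naturally aligned, so the struct is 16 bytes
   without any packing attribute. */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 15:0  */
    uint16_t selector;     /* code segment selector      */
    uint8_t  ist;          /* IST index, 0 = none        */
    uint8_t  type_attr;    /* P, DPL, Type               */
    uint16_t offset_mid;   /* handler address bits 31:16 */
    uint32_t offset_high;  /* handler address bits 63:32 */
    uint32_t reserved;     /* must be zero               */
};

/* Split a 64-bit handler address across the three offset fields. */
static struct idt_gate make_gate(uint64_t handler, uint16_t selector,
                                 uint8_t type_attr)
{
    struct idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = selector;
    g.ist         = 0;
    g.type_attr   = type_attr;
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}

/* Reassemble the handler address; useful as a round-trip self-check. */
static uint64_t gate_handler(const struct idt_gate *g)
{
    return (uint64_t)g->offset_low
         | ((uint64_t)g->offset_mid << 16)
         | ((uint64_t)g->offset_high << 32);
}
```

If `gate_handler(&g)` does not return the address passed to `make_gate`, the layout is wrong, and the same mistake in the kernel would send the CPU to a garbage address.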
The IDTR Register
The CPU finds the IDT through the IDTR (IDT Register), a 10-byte register holding the base address and limit of the IDT. You load it with the LIDT instruction:
section .data
; IDTR structure: 10 bytes (2-byte limit + 8-byte base)
idtr:
dw (256 * 16 - 1) ; limit = 256 entries × 16 bytes each - 1
dq idt_table ; base address of IDT
section .text
; Load the IDT
lidt [idtr]
LIDT is a privileged instruction (ring 0 only). You call it once during kernel initialization, before enabling interrupts.
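The 10-byte LIDT operand can also be described as a C structure, which makes the limit arithmetic explicit. This is a sketch only: `struct idtr` and `make_idtr` are illustrative names, and `__attribute__((packed))` is a GCC/Clang extension assumed to be available.

```c
#include <stdint.h>

/* 10-byte operand for LIDT: a 16-bit limit followed by a 64-bit base.
   Without the packed attribute the compiler would insert 6 bytes of
   padding after `limit` and the CPU would read the wrong base. */
struct __attribute__((packed)) idtr {
    uint16_t limit;  /* size of the IDT in bytes, minus one */
    uint64_t base;   /* linear address of the first gate    */
};

/* Build the operand for a table of `entries` 16-byte gates. */
static struct idtr make_idtr(uint64_t table_addr, unsigned entries)
{
    struct idtr r;
    r.limit = (uint16_t)(entries * 16u - 1u);
    r.base  = table_addr;
    return r;
}
```

For a full 256-entry table the limit is 4095 (0x0FFF), matching the `256 * 16 - 1` expression in the NASM listing.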
Interrupt Gate vs. Trap Gate
The difference between an interrupt gate and a trap gate is what happens to the IF (Interrupt Enable Flag) in RFLAGS when the handler is entered:
- Interrupt gate (type 0xE): IF is cleared on entry. While your handler runs, no other maskable hardware interrupt can preempt it (NMIs and exceptions still can). This is the standard choice for hardware interrupt handlers because it prevents re-entrancy.
- Trap gate (type 0xF): IF is preserved. Other interrupts can fire while this handler runs. Use it for exceptions where you want interrupts to remain enabled (exception handlers that may need to sleep, for example).
Exception Vectors 0–31
The first 32 vectors (0–31) are reserved by Intel for CPU-generated exceptions:
| Vector | Mnemonic | Name | Error Code? | Notes |
|---|---|---|---|---|
| 0 | #DE | Divide Error | No | DIV or IDIV with divisor 0, or quotient overflow |
| 1 | #DB | Debug | No | Hardware breakpoints (DR0–DR3), single-step |
| 2 | — | Non-Maskable Interrupt | No | Hardware fault; cannot be masked with CLI |
| 3 | #BP | Breakpoint | No | INT3 (opcode 0xCC); used by debuggers |
| 4 | #OF | Overflow | No | INTO instruction when OF flag set |
| 5 | #BR | Bound Range | No | BOUND instruction |
| 6 | #UD | Invalid Opcode | No | Undefined or illegal instruction |
| 7 | #NM | Device Not Available | No | FPU not present or CR0.TS set |
| 8 | #DF | Double Fault | Yes (0) | Exception during exception handling |
| 10 | #TS | Invalid TSS | Yes | Invalid Task State Segment |
| 11 | #NP | Segment Not Present | Yes | Segment or gate descriptor with P=0 |
| 12 | #SS | Stack-Segment Fault | Yes | Stack segment limit or presence violation |
| 13 | #GP | General Protection Fault | Yes | Privilege violation, segment violation |
| 14 | #PF | Page Fault | Yes | Virtual address not mapped, protection violation |
| 16 | #MF | x87 FPU Error | No | FPU floating-point error |
| 17 | #AC | Alignment Check | Yes (0) | Unaligned access with AC flag set |
| 18 | #MC | Machine Check | No | Hardware error; non-recoverable |
| 19 | #XM | SIMD FP Exception | No | SSE/AVX floating-point error |
The Page Fault (#PF) — Vector 14
Page fault is the exception you will handle most in a real OS kernel. When the CPU cannot translate a virtual address (because the page is not present, because the access violates permissions, or because of various other conditions), it fires vector 14. The kernel's page fault handler determines what happened and responds.
When a page fault fires:
1. The CPU pushes SS, RSP, RFLAGS, CS, RIP, and an error code onto the stack
2. CR2 is loaded with the faulting virtual address
3. Vector 14's handler executes
The error code bits:
Bit 0 (P): 0 = page not present, 1 = page present (protection violation)
Bit 1 (W): 0 = caused by read, 1 = caused by write
Bit 2 (U): 0 = in kernel mode (ring 0), 1 = in user mode (ring 3)
Bit 3 (R): 1 = caused by a reserved bit set in a paging-structure entry
Bit 4 (I): 1 = caused by instruction fetch (NX violation)
; Page fault handler for MinOS
; At entry: error code on top of the stack, CR2 = faulting address
page_fault_handler:
; The stack at entry contains (top to bottom):
; [RSP+0] = error code (pushed by CPU for #PF)
; [RSP+8] = RIP (faulting instruction)
; [RSP+16] = CS
; [RSP+24] = RFLAGS
; [RSP+32] = RSP of the interrupted code (always pushed in 64-bit mode)
; [RSP+40] = SS
push rax ; save the registers we use so the interrupted
push rbx ; code gets them back unchanged
mov rax, [rsp+16] ; error code, now below the two saved registers
mov rbx, cr2 ; get faulting address
; Decode the error code
test al, 1
jz .page_not_present
.protection_violation:
; P=1: page was present but access violated permissions
; Could be: write to read-only page (copy-on-write trigger),
; user access to kernel page, or NX violation
; For MinOS: treat it as fatal (kernel_panic is assumed not to return)
call kernel_panic
; ... (not shown for brevity)
.page_not_present:
; P=0: page not mapped at all
; For MinOS: check if in a valid VMA, allocate a physical page,
; map it, and return. Or segfault if invalid.
; ...
pop rbx
pop rax
add rsp, 8 ; discard the error code; IRETQ does not pop it
iretq ; return from interrupt (restores RIP, CS, RFLAGS, RSP, SS)
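Once the handler hands off to higher-level code, the same error-code bits can be decoded by name instead of with raw `test` instructions. The C sketch below assumes the bit layout listed above; the macro and function names are hypothetical, and the copy-on-write test is only an example of the kind of policy check a kernel might build on these bits.

```c
#include <stdint.h>
#include <stdbool.h>

#define PF_PRESENT  (1u << 0)  /* 1 = protection violation, 0 = not present */
#define PF_WRITE    (1u << 1)  /* 1 = write access, 0 = read                */
#define PF_USER     (1u << 2)  /* 1 = fault taken in ring 3                 */
#define PF_RESERVED (1u << 3)  /* 1 = reserved bit set in a paging entry    */
#define PF_FETCH    (1u << 4)  /* 1 = instruction fetch                     */

struct pf_info {
    bool present, write, user, reserved_bit, instr_fetch;
};

/* Expand the error code into named flags. */
static struct pf_info pf_decode(uint64_t error_code)
{
    struct pf_info i;
    i.present      = (error_code & PF_PRESENT)  != 0;
    i.write        = (error_code & PF_WRITE)    != 0;
    i.user         = (error_code & PF_USER)     != 0;
    i.reserved_bit = (error_code & PF_RESERVED) != 0;
    i.instr_fetch  = (error_code & PF_FETCH)    != 0;
    return i;
}

/* A write to a page that is present is the pattern a copy-on-write
   handler looks for; a reserved-bit fault never qualifies. */
static bool pf_is_cow_candidate(uint64_t error_code)
{
    return (error_code & (PF_PRESENT | PF_WRITE)) == (PF_PRESENT | PF_WRITE)
        && !(error_code & PF_RESERVED);
}
```

An error code of 0x6, for instance, decodes as a user-mode write to a page that is not present, which is the demand-paging case rather than the copy-on-write case.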
Interrupt Stack Frame
When an interrupt or exception fires, the CPU automatically pushes state onto the stack before jumping to the handler. In 64-bit mode the CPU first aligns RSP to a 16-byte boundary and then always pushes SS and RSP, so the frame has the same shape whether or not a privilege change occurred. (In 32-bit protected mode, SS and ESP are pushed only on a privilege change.) On a ring 3 → ring 0 transition the frame lands on the kernel stack taken from the TSS; for an interrupt taken in ring 0 with IST = 0 it lands on the current stack. The only variable part of the layout is the error code:
Interrupt or exception without an error code:
High address ┌──────────────────┐
│ old SS │ ← RSP+32
│ old RSP │ ← RSP+24
│ old RFLAGS │ ← RSP+16
│ old CS │ ← RSP+8
│ old RIP │ ← RSP+0
Current RSP → └──────────────────┘
Exception with an error code:
High address ┌──────────────────┐
│ old SS │ ← RSP+40
│ old RSP │ ← RSP+32
│ old RFLAGS │ ← RSP+24
│ old CS │ ← RSP+16
│ old RIP │ ← RSP+8
│ error code │ ← RSP+0
Current RSP → └──────────────────┘
The IRET/IRETQ instruction pops all of this back and returns to the interrupted code. For exceptions with error codes, your handler must pop the error code before executing IRETQ.
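The frame can be written down as a C structure, which is handy when an assembly stub passes a pointer to it into C code. The sketch below assumes 64-bit mode, where SS and RSP are pushed unconditionally, and describes a vector that pushes an error code; the struct and function names are illustrative, and 0x23 in the usage note is only an example of a selector with RPL 3.

```c
#include <stdint.h>
#include <stddef.h>

/* Frame as the CPU leaves it for a vector that pushes an error code.
   Fields are listed in order of increasing address, starting at RSP. */
struct int_frame_err {
    uint64_t error_code; /* RSP+0  */
    uint64_t rip;        /* RSP+8  */
    uint64_t cs;         /* RSP+16 */
    uint64_t rflags;     /* RSP+24 */
    uint64_t rsp;        /* RSP+32 */
    uint64_t ss;         /* RSP+40 */
};

/* Nonzero when the interrupted code was running in ring 3.
   The privilege level is held in bits 1:0 of the saved CS. */
static int frame_from_user(const struct int_frame_err *f)
{
    return (f->cs & 3u) == 3u;
}
```

A frame whose saved CS is 0x08 (the kernel selector used in this chapter) reports kernel mode, while a saved CS such as 0x23 reports user mode.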
; Generic exception handler (no error code):
generic_handler:
; save registers
push rbp
push rax
push rbx
; ... (save all registers you will use)
; do work
; restore registers
pop rbx
pop rax
pop rbp
iretq ; restore RIP, CS, RFLAGS, RSP, SS (all five are always popped in 64-bit mode)
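; Exception handler WITH error code (e.g., #GP, #PF):
gp_fault_handler:
; At entry, the error code is on top of the stack (pushed by the CPU)
push rax ; save registers first so the interrupted code gets them back
push rbx
push rbp
mov rax, [rsp+24] ; error code, now below the three saved registers
; ... handle fault ...
pop rbp
pop rbx
pop rax
add rsp, 8 ; discard the error code; IRETQ does not pop it
iretq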
⚠️ Common Mistake: Forgetting that some exceptions push an error code and some do not. If you write one generic handler that pops an error code and use it for an exception that does not push one, you will corrupt your stack and triple-fault. You need separate stubs for each vector, or a macro that inserts a dummy error code for the exceptions that do not push one.
The classic solution (used by Linux):
; For exceptions WITHOUT error code: push a fake 0 error code
; to make all handlers uniform
%macro EXCEPTION_NOERR 1
exception_vector_%1:
push qword 0 ; fake error code
push qword %1 ; vector number
jmp common_exception_handler
%endmacro
; For exceptions WITH error code: just push the vector number
%macro EXCEPTION_ERR 1
exception_vector_%1:
push qword %1 ; CPU already pushed error code; push vector number
jmp common_exception_handler
%endmacro
EXCEPTION_NOERR 0 ; #DE divide error
EXCEPTION_NOERR 1 ; #DB debug
EXCEPTION_NOERR 2 ; NMI
EXCEPTION_NOERR 3 ; #BP breakpoint
EXCEPTION_NOERR 6 ; #UD invalid opcode
EXCEPTION_ERR 8 ; #DF double fault
EXCEPTION_ERR 13 ; #GP general protection
EXCEPTION_ERR 14 ; #PF page fault
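common_exception_handler:
; Stack now (top to bottom): vector_number, error_code, old_RIP, old_CS,
; old_RFLAGS, old_RSP, old_SS
; Save the caller-saved registers: handle_exception is assumed to follow
; the System V AMD64 calling convention and may clobber all of them
push rax
push rcx
push rdx
push rsi
push rdi
push r8
push r9
push r10
push r11
mov rdi, [rsp+72] ; vector number → first argument (above 9 saved registers)
mov rsi, [rsp+80] ; error code → second argument
call handle_exception
pop r11
pop r10
pop r9
pop r8
pop rdi
pop rsi
pop rdx
pop rcx
pop rax
add rsp, 16 ; drop the vector number and the error code
iretq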
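Choosing between the two macros for each vector comes down to one question: does the CPU push an error code for it? The C sketch below encodes that as a bitmask for vectors 0 through 19 only; higher reserved vectors are deliberately left out, and the function name is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Vectors 0-19 for which the CPU pushes an error code:
   #DF (8), #TS (10), #NP (11), #SS (12), #GP (13), #PF (14), #AC (17).
   Vectors 20 and above are not covered by this sketch and report false. */
static bool vector_pushes_error_code(unsigned vector)
{
    const uint32_t mask = (1u << 8)  | (1u << 10) | (1u << 11) | (1u << 12)
                        | (1u << 13) | (1u << 14) | (1u << 17);
    return vector < 20 && ((mask >> vector) & 1u);
}
```

A vector for which this returns true needs the stub that pushes only the vector number; every other vector in that range needs the stub that pushes a fake zero first.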
Hardware Interrupts and the PIC
Vectors 32–255 are available for hardware interrupts and OS use. The x86-64 does not define which device gets which vector — that is determined by the interrupt controller configuration.
The 8259A PIC (Legacy)
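The IBM PC/AT introduced a cascade of two Intel 8259A Programmable Interrupt Controllers (the original PC had only one). Each has 8 inputs, giving 15 usable IRQ lines because IRQ2 on the first PIC carries the cascade from the second. Both PICs are configured through port I/O. The conventional IRQ assignments are: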
IRQ0 — System timer (PIT, 18.2 Hz by default)
IRQ1 — PS/2 keyboard
IRQ2 — Cascade to second PIC
IRQ3 — COM2 (serial port 2)
IRQ4 — COM1 (serial port 1)
IRQ5 — LPT2 or sound card
IRQ6 — Floppy controller
IRQ7 — LPT1 (parallel port)
IRQ8 — Real-time clock
IRQ9 — ACPI / redirected IRQ2
IRQ10 — Available (often NIC)
IRQ11 — Available (often USB controller)
IRQ12 — PS/2 mouse
IRQ13 — FPU error (legacy)
IRQ14 — Primary IDE controller
IRQ15 — Secondary IDE controller
Critical issue: By default, IRQ0 maps to interrupt vector 8, but vector 8 is already used for the double fault (#DF). If a timer fires, the CPU would invoke the double fault handler — catastrophically wrong. You must remap the PIC to use vectors 32–47 (above the 32 reserved exception vectors).
; Remap the 8259A PIC to use IRQ0=vector 32, IRQ8=vector 40
%define PIC1_CMD 0x20
%define PIC1_DATA 0x21
%define PIC2_CMD 0xA0
%define PIC2_DATA 0xA1
%define PIC_EOI 0x20 ; End-of-Interrupt command
pic_remap:
; ICW1: Initialize PICs
mov al, 0x11 ; ICW1: init + ICW4 needed
out PIC1_CMD, al
out PIC2_CMD, al
; ICW2: Set base vectors
mov al, 0x20 ; PIC1: IRQ0 → vector 32 (0x20)
out PIC1_DATA, al
mov al, 0x28 ; PIC2: IRQ8 → vector 40 (0x28)
out PIC2_DATA, al
; ICW3: Tell PIC1 about the cascade, tell PIC2 its ID
mov al, 0x04 ; PIC1: IRQ2 has slave PIC
out PIC1_DATA, al
mov al, 0x02 ; PIC2: I am slave, connected to IRQ2
out PIC2_DATA, al
; ICW4: 8086 mode
mov al, 0x01
out PIC1_DATA, al
out PIC2_DATA, al
; Mask all interrupts initially (unmask as drivers are initialized)
mov al, 0xFF
out PIC1_DATA, al
out PIC2_DATA, al
ret
; After handling a hardware interrupt, send EOI to the PIC
; to re-enable interrupts from that IRQ line
pic_send_eoi:
; RDI = IRQ number
cmp rdi, 8
jl .eoi_pic1
; IRQ8-15: send EOI to both PICs
mov al, PIC_EOI
out PIC2_CMD, al
.eoi_pic1:
mov al, PIC_EOI
out PIC1_CMD, al
ret
⚠️ Common Mistake: Forgetting the EOI. If you handle a hardware interrupt but never send the End-of-Interrupt command to the PIC, the PIC will not deliver another interrupt on that IRQ line or on any lower-priority line. The keyboard will stop working after the first keystroke, or the timer will fire only once.
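The arithmetic behind the remap and the EOI rule is small enough to express as pure functions. The C sketch below assumes the vector bases programmed by `pic_remap` (0x20 and 0x28); the helper names are illustrative, and no port I/O is performed here.

```c
#include <stdint.h>

#define PIC1_BASE_VECTOR 0x20u  /* IRQ0-7  map to vectors 32-39 after the remap */
#define PIC2_BASE_VECTOR 0x28u  /* IRQ8-15 map to vectors 40-47 after the remap */

/* IDT vector that a given IRQ line raises once the PICs are remapped. */
static unsigned irq_to_vector(unsigned irq)
{
    return irq < 8 ? PIC1_BASE_VECTOR + irq : PIC2_BASE_VECTOR + (irq - 8);
}

/* An IRQ that arrives through the second PIC needs an EOI on both chips. */
static int eoi_needs_second_pic(unsigned irq)
{
    return irq >= 8;
}

/* New mask-register value with one line unmasked (a 0 bit enables a line). */
static uint8_t pic_unmask_line(uint8_t mask, unsigned line)
{
    return (uint8_t)(mask & ~(1u << (line & 7u)));
}
```

Unmasking lines 0 and 1 starting from a fully masked register (0xFF) leaves 0xFC, which enables only the timer and the keyboard.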
The APIC (Modern)
Modern x86-64 systems use the Advanced PIC (APIC), which consists of:
- A Local APIC (LAPIC) in each CPU core, which handles interrupts for that specific CPU
- An I/O APIC on the chipset, which routes hardware interrupts to the appropriate LAPIC
The LAPIC is memory-mapped at physical address 0xFEE00000 (by default). You read/write it with MOV instructions, not IN/OUT. The LAPIC is covered in detail in Chapter 29.
The Keyboard IRQ Handler (IRQ1 → Vector 33)
IRQ1 is the PS/2 keyboard interrupt. When a key is pressed or released, the PS/2 controller signals IRQ1. Your handler reads the scancode from port 0x60 and processes it.
; IRQ1 handler: PS/2 keyboard
; Vector 33 (IRQ1 = base 32 + 1)
keyboard_handler:
push rax
push rbx
; Read scancode from keyboard data port
in al, 0x60 ; port 0x60 = PS/2 keyboard data
; Process the scancode
; Bit 7 = 0: key pressed (make code)
; Bit 7 = 1: key released (break code)
test al, 0x80
jnz .key_released
.key_pressed:
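and al, 0x7F ; strip bit 7 to get the base scancode
cmp al, 70 ; the partial table below has 70 entries
jae .done ; ignore scancodes beyond it instead of reading past the end
movzx rbx, al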
; Look up ASCII in scancode table
lea rax, [scancode_to_ascii]
movzx rax, byte [rax + rbx]
; Store in keyboard ring buffer (if non-zero)
test al, al
jz .done
call kb_buffer_put ; add to keyboard buffer
jmp .done
.key_released:
; Could track modifier key state here (Shift, Ctrl, Alt)
.done:
; Send EOI to PIC to re-enable keyboard interrupts
mov al, 0x20
out 0x20, al ; PIC1 EOI
pop rbx
pop rax
iretq
; Scancode Set 1 (scan code → ASCII) — partial table
scancode_to_ascii:
db 0, 27, '1','2','3','4','5','6','7','8','9','0','-','='
db 8, 9, 'q','w','e','r','t','y','u','i','o','p','[',']'
db 13, 0, 'a','s','d','f','g','h','j','k','l',';',"'", '`'
db 0, 92, 'z','x','c','v','b','n','m',',','.','/', 0, '*'
db 0, ' ', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
; (simplified; real table handles Shift, F-keys, etc.)
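The decoding logic (make or break code, then a bounded table lookup) can be prototyped in user-space C before it is committed to assembly. The sketch below copies only the first row of the table above, so it recognizes just scancodes 0x00 through 0x0D; the array and function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* First 14 entries of the Set 1 table (scancodes 0x00-0x0D). */
static const char set1_ascii[] = {
    0, 27, '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '-', '='
};

/* Returns the ASCII character for a make code, or 0 for break codes,
   non-printing keys, and scancodes past the end of the table. */
static char scancode_to_char(uint8_t scancode)
{
    if (scancode & 0x80)                /* bit 7 set: key released */
        return 0;
    if (scancode >= sizeof set1_ascii)  /* bounds check before indexing */
        return 0;
    return set1_ascii[scancode];
}
```

The bounds check matters because Set 1 make codes range up to 0x7F, while a partial table covers far fewer entries.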
MinOS Kernel: Setting Up the Full IDT
Here is the complete IDT initialization for MinOS:
; minOS/kernel/idt.asm — IDT setup for MinOS
IDT_ENTRIES equ 256
IDT_ENTRY_SIZE equ 16
section .bss
align 16
idt_table: resb IDT_ENTRIES * IDT_ENTRY_SIZE
section .data
align 4
idt_ptr:
dw IDT_ENTRIES * IDT_ENTRY_SIZE - 1
dq idt_table
section .text
; idt_set_gate: install a handler in the IDT
; RDI = vector number (0-255)
; RSI = handler address
; DL = type (0x8E = interrupt gate, 0x8F = trap gate)
idt_set_gate:
; Calculate address of IDT entry: idt_table + vector * 16
imul rdi, rdi, IDT_ENTRY_SIZE
lea rax, [idt_table + rdi]
; Store offset[15:0]
mov [rax], si
; Store code segment selector (kernel CS = 0x08)
mov word [rax+2], 0x0008
; Store IST (0) and reserved
mov byte [rax+4], 0x00
; Store type/attr (DL)
mov [rax+5], dl
; Store offset[31:16]
shr rsi, 16 ; bits 31:16 are now in the low word
mov [rax+6], si
; Store offset[63:32]
shr rsi, 16 ; bits 63:32 are now in the low dword
mov [rax+8], esi
; Store reserved (must be zero)
mov dword [rax+12], 0
ret
idt_init:
; Install exception handlers (vectors 0-31)
mov rdi, 0
mov rsi, exception_vector_0
mov dl, 0x8E ; interrupt gate, DPL=0
call idt_set_gate
mov rdi, 1
mov rsi, exception_vector_1
mov dl, 0x8E
call idt_set_gate
; ... (vectors 2-31 similarly) ...
mov rdi, 14 ; #PF page fault
mov rsi, page_fault_handler
mov dl, 0x8E
call idt_set_gate
; Remap and set up PIC
call pic_remap
; Install IRQ handlers (vectors 32-47)
mov rdi, 32 ; IRQ0 = timer
mov rsi, timer_handler
mov dl, 0x8E
call idt_set_gate
mov rdi, 33 ; IRQ1 = keyboard
mov rsi, keyboard_handler
mov dl, 0x8E
call idt_set_gate
; Unmask IRQ0 (timer) and IRQ1 (keyboard) in PIC
in al, 0x21 ; read PIC1 mask
and al, 0xFC ; clear bits 0 and 1 (unmask IRQ0, IRQ1)
out 0x21, al
; Load the IDT
lidt [idt_ptr]
; Enable interrupts
sti ; set IF flag — interrupts now active!
ret
Ring Transitions: The Privilege Level Mechanism
x86-64 enforces four privilege rings (0–3), though most OSes use only ring 0 (kernel) and ring 3 (user). The current privilege level (CPL) is stored in bits [1:0] of the CS register.
When an interrupt fires in user mode (ring 3), the CPU:
1. Looks up the IDT entry for the vector
2. Checks that CPL ≤ DPL for software interrupts (else #GP)
3. For hardware interrupts and exceptions: always takes the interrupt
4. Switches to the stack defined in the TSS (Task State Segment) for ring 0
5. Pushes SS, RSP (user), RFLAGS, CS, RIP onto the kernel stack
6. Loads CS with the kernel code segment, sets CPL = 0
7. Jumps to the handler
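The permission check in step 2 compares two small bit fields, which the following C sketch makes concrete. It assumes the Type/Attr encoding described earlier in the chapter; the function names are illustrative, and 0x23 in the usage note is only an example of a selector with RPL 3.

```c
#include <stdint.h>

/* CPL is held in bits 1:0 of the CS selector. */
static unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 3u;
}

/* DPL is held in bits 6:5 of the gate's Type/Attr byte. */
static unsigned gate_dpl(uint8_t type_attr)
{
    return (type_attr >> 5) & 3u;
}

/* INT n is allowed only when CPL <= DPL; otherwise the CPU raises #GP.
   The check does not apply to hardware interrupts or CPU exceptions. */
static int int_n_allowed(uint16_t cs, uint8_t type_attr)
{
    return cpl_from_cs(cs) <= gate_dpl(type_attr);
}
```

Code running with CS = 0x23 is refused by a gate whose Type/Attr is 0x8E (DPL 0) but accepted by one whose Type/Attr is 0xEE (DPL 3), while kernel code with CS = 0x08 passes both.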
The TSS (Task State Segment) stores the ring-0 stack pointer (RSP0) that the CPU loads when transitioning from ring 3. In a single-processor kernel, there is one TSS; in SMP kernels, one TSS per CPU.
; TSS layout (64-bit, 104 bytes)
section .data
align 16
tss:
dd 0 ; 0x00: reserved
dq kernel_stack_top ; 0x04: RSP0, kernel stack for ring-0 transitions
dq 0 ; 0x0C: RSP1 (not used)
dq 0 ; 0x14: RSP2 (not used)
dq 0 ; 0x1C: reserved
times 7 dq 0 ; 0x24: IST1-IST7 (Interrupt Stack Table)
dq 0 ; 0x5C: reserved
dw 0 ; 0x64: reserved
dw 104 ; 0x66: IOPB offset (= TSS size, so no I/O permission bitmap)
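The 64-bit TSS is 104 bytes, and its 8-byte fields sit at 4-byte-aligned offsets, so a C description of it has to be packed. The sketch below is one way to write it down and verify the offsets at build time; the struct and function names are hypothetical, and `__attribute__((packed))` is a GCC/Clang extension assumed to be available.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* 64-bit Task State Segment. Packing is required because RSP0 starts
   at offset 4, which is not a natural 8-byte boundary. */
struct __attribute__((packed)) tss64 {
    uint32_t reserved0;    /* 0x00 */
    uint64_t rsp[3];       /* 0x04: RSP0, RSP1, RSP2 */
    uint64_t reserved1;    /* 0x1C */
    uint64_t ist[7];       /* 0x24: IST1..IST7 */
    uint64_t reserved2;    /* 0x5C */
    uint16_t reserved3;    /* 0x64 */
    uint16_t iopb_offset;  /* 0x66 */
};

/* Zero the TSS, set the ring-0 stack, and place the I/O permission
   bitmap offset at the end of the structure (meaning: no bitmap). */
static void tss_init(struct tss64 *t, uint64_t kernel_stack_top)
{
    memset(t, 0, sizeof *t);
    t->rsp[0]      = kernel_stack_top;
    t->iopb_offset = (uint16_t)sizeof *t;
}
```

Checking `sizeof(struct tss64)` and a few `offsetof` values against the layout is a cheap way to catch a miscounted reserved field before the CPU loads a garbage stack pointer.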
Summary
The IDT is the CPU's event dispatch table. 256 16-byte gate descriptors map interrupt and exception vectors to handler addresses. The PIC (or APIC in modern systems) routes hardware interrupt requests to specific vectors. When any interrupt fires, the CPU saves architectural state automatically and jumps to the handler. Your handler processes the event and returns with IRETQ. Without the IDT, the CPU cannot handle keyboard input, timer events, or memory faults — there is no operating system.
🔄 Check Your Understanding:
1. What is the difference between an interrupt gate and a trap gate, and when would you use each?
2. Why must you remap the 8259A PIC from its default vectors (8–15) to vectors 32–47?
3. What information does the CPU automatically push onto the kernel stack when a page fault occurs in user mode?
4. Why does the keyboard IRQ handler need to send an EOI to the PIC?
5. What is the purpose of the IRETQ instruction, and what does it restore?