Case Study 26-2: INT3 — The Debugger's Best Friend

How Software Breakpoints Work at the Instruction Level

When you type break main in GDB, something specific happens at the machine level. GDB picks an address inside main (normally the first instruction after the function prologue) and, when the program is next resumed, overwrites the first byte of that instruction with a single byte: 0xCC. This is the INT3 instruction, also called the breakpoint instruction. When the CPU executes it, it raises exception vector 3 (#BP), the kernel stops the process, and the debugger is notified. GDB restores the original byte, presents you with the register state, and waits for your next command.

This case study traces the complete path from GDB command to INT3 firing, explains the assembly mechanics, and shows how to implement a simple software debugger from scratch.

The INT3 Instruction

INT3 is a one-byte instruction with opcode 0xCC. Its two-byte cousin, INT 3 (encoded as 0xCD 0x03), raises the same exception but uses a different encoding. The one-byte form exists specifically for breakpoints: because no x86 instruction is shorter than one byte, 0xCC can replace the first byte of any instruction without spilling into the instruction that follows it.
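
To see why the single-byte form matters, consider patching a byte buffer that holds real machine code. The sketch below is illustrative only: insert_int3 and remove_int3 are hypothetical helper names, and the buffer stands in for a tracee's text segment. Patching push rbp (one byte, 0x55) with 0xCC leaves the following mov rbp, rsp untouched, whereas a two-byte 0xCD 0x03 would also overwrite the first byte of that next instruction.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC

/* Overwrite the first byte of the instruction at `off` with INT3 and
 * return the byte that was there so it can be restored later. */
uint8_t insert_int3(uint8_t *code, size_t off)
{
    uint8_t saved = code[off];
    code[off] = INT3_OPCODE;
    return saved;
}

/* Put the saved byte back, removing the breakpoint. */
void remove_int3(uint8_t *code, size_t off, uint8_t saved)
{
    code[off] = saved;
}
```

Only one byte is saved and only one byte is written, so the same two helpers work no matter how long the patched instruction is.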

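When 0xCC executes:
1. The CPU raises exception vector 3 (#BP).
2. The CPU pushes a return RIP that points to the byte after the 0xCC.
3. The kernel's #BP handler runs.
4. The kernel raises SIGTRAP for the process. If a debugger is attached, the process stops instead of receiving the signal, and the debugger is notified through its wait call.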

The key detail: RIP points past the INT3, not at it. When the debugger wants to continue, it must first restore the original byte at the breakpoint address, then set RIP back by one byte (to re-execute the original instruction), then single-step one instruction, then re-insert the breakpoint.
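
The resume sequence described above can be sketched in C with the glibc ptrace wrapper. This is a sketch under stated assumptions, not a definitive implementation: it assumes x86-64 Linux, a tracee that is already attached and currently stopped on the breakpoint, and a caller that saved the original 8-byte word before patching it. The function names breakpoint_addr_from_rip and step_over_breakpoint are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

/* INT3 is one byte long, so the breakpoint sits one byte before the
 * RIP value reported after the trap. */
uint64_t breakpoint_addr_from_rip(uint64_t trap_rip)
{
    return trap_rip - 1;
}

/* Resume a tracee stopped on a breakpoint. `orig_word` is the 8-byte word
 * saved from the breakpoint address before 0xCC was written into it
 * (assumed to be kept by the caller). Returns 0 on success, -1 on failure. */
int step_over_breakpoint(pid_t pid, long orig_word)
{
    struct user_regs_struct regs;
    int status;

    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return -1;
    uint64_t addr = breakpoint_addr_from_rip(regs.rip);
    void *where = (void *)(uintptr_t)addr;

    /* 1. Restore the original instruction bytes. */
    if (ptrace(PTRACE_POKETEXT, pid, where, (void *)orig_word) == -1)
        return -1;

    /* 2. Rewind RIP so the original instruction runs. */
    regs.rip = addr;
    if (ptrace(PTRACE_SETREGS, pid, NULL, &regs) == -1)
        return -1;

    /* 3. Execute exactly that one instruction, then wait for the stop. */
    if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* 4. Re-insert the breakpoint: same word, lowest byte set to 0xCC. */
    long patched = (long)(((uint64_t)orig_word & ~0xFFULL) | 0xCC);
    if (ptrace(PTRACE_POKETEXT, pid, where, (void *)patched) == -1)
        return -1;

    return 0;
}
```

After step_over_breakpoint returns 0, the tracee is stopped just past the original instruction with the breakpoint armed again, and a PTRACE_CONT lets it run on.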

Examining INT3 in Action with GDB

# Compile with debug info
gcc -g -o test_program test.c

# Start GDB
gdb test_program

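# Set a breakpoint on the very first instruction of main
# (plain "break main" would normally skip past the prologue)
(gdb) break *main
Breakpoint 1 at 0x401126: file test.c, line 5.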

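# Run to the breakpoint
(gdb) run
Starting program: test_program

Breakpoint 1, main () at test.c:5

# Examine the byte at the breakpoint address:
(gdb) x/1bx 0x401126
0x401126 <main>: 0x55       ← the original byte (0x55 = PUSH RBP), not 0xCC

# Reading through an expression gives the same answer:
(gdb) p/x *(unsigned char*)0x401126
$1 = 0x55                   ← GDB again reports the original byte

Neither command shows 0xCC, and that is intentional. By default GDB inserts its breakpoints only when the program is resumed and removes them whenever it stops. Even when they are left in place (set breakpoint always-inserted on), GDB substitutes the saved original bytes in every memory read it reports. To observe the 0xCC itself, you would have to read the process's memory from outside GDB while the program is running.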

Implementing a Minimal Software Debugger

Here is a minimal skeleton of a software debugger, written in NASM assembly using raw Linux system calls:

; mindbg.asm — A minimal software debugger using ptrace syscalls
; Usage: ./mindbg   (traces the hardcoded target /tmp/target)
;
; Demonstrates: fork, ptrace, breakpoints via POKETEXT, single-step
; Build: nasm -f elf64 mindbg.asm -o mindbg.o && ld mindbg.o -o mindbg

; ptrace request codes
PTRACE_TRACEME    equ 0
PTRACE_PEEKTEXT   equ 1
PTRACE_POKETEXT   equ 4
PTRACE_CONT       equ 7
PTRACE_SINGLESTEP equ 9
PTRACE_GETREGS    equ 12
PTRACE_SETREGS    equ 13
PTRACE_ATTACH     equ 16
PTRACE_DETACH     equ 17

; waitpid status macros
; WIFEXITED(s)   = (s & 0x7F) == 0
; WIFSTOPPED(s)  = (s & 0xFF) == 0x7F
; WSTOPSIG(s)    = (s >> 8) & 0xFF

SYS_FORK     equ 57
SYS_EXECVE   equ 59
SYS_EXIT     equ 60
SYS_WAIT4    equ 61
SYS_PTRACE   equ 101
SYS_WRITE    equ 1

%define SIGTRAP 5

section .bss
    child_pid:  resq 1
    wait_status: resd 1
    ; user_regs_struct: 216 bytes (27 × 8-byte registers)
    ; Layout: r15, r14, r13, r12, rbp, rbx, r11, r10, r9, r8,
    ;         rax, rcx, rdx, rsi, rdi, orig_rax, rip, cs, eflags,
    ;         rsp, ss, fs_base, gs_base, ds, es, fs, gs
    regs:       resb 216

section .text
global _start

_start:
    ; For simplicity, trace a hardcoded target program
    ; In a real debugger, we'd parse argv

    ; Fork
    mov rax, SYS_FORK
    syscall
    test rax, rax
    js .error
    jz .child

    ; Parent: debugger
    mov [child_pid], rax

    ; Wait for child to stop (after PTRACE_TRACEME + execve)
    mov rdi, rax            ; child PID
    lea rsi, [wait_status]
    xor rdx, rdx
    xor r10, r10
    mov rax, SYS_WAIT4
    syscall

    ; Child is now stopped at entry point (first instruction)
    ; Set a breakpoint at a known address (hardcoded for demo)
    ; In reality, you'd look up the symbol table
    call set_breakpoint

    ; Continue execution
    call ptrace_continue

    ; Wait for breakpoint hit or exit
.debug_loop:
    mov rdi, [child_pid]
    lea rsi, [wait_status]
    xor rdx, rdx
    xor r10, r10
    mov rax, SYS_WAIT4
    syscall

    ; Check if stopped (WIFSTOPPED)
    mov eax, [wait_status]
    and eax, 0xFF
    cmp eax, 0x7F           ; 0x7F = stopped
    jne .child_exited

    ; Check stop signal (WSTOPSIG)
    mov eax, [wait_status]
    shr eax, 8
    and eax, 0xFF
    cmp eax, SIGTRAP
    je .handle_breakpoint

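    ; Other signal: resume the child. Note that ptrace_continue passes
    ; data = 0, which suppresses the signal; to deliver it to the child,
    ; pass the signal number in r10 instead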
    call ptrace_continue
    jmp .debug_loop

.handle_breakpoint:
    ; Print register state
    call print_registers

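    ; A real debugger would now restore the original byte, set RIP back
    ; by 1, single-step, and re-insert the breakpoint. This demo just
    ; continues, so the child resumes at the byte after the 0xCC and the
    ; patched instruction never runs as originally written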
    call ptrace_continue
    jmp .debug_loop

.child_exited:
    ; Child exited or was killed by a signal; a fuller version would
    ; decode wait_status and report it
    mov rax, SYS_EXIT
    xor rdi, rdi
    syscall

.child:
    ; Child: set up to be traced
    ; ptrace(PTRACE_TRACEME, 0, 0, 0)
    mov rax, SYS_PTRACE
    mov rdi, PTRACE_TRACEME
    xor rsi, rsi
    xor rdx, rdx
    xor r10, r10
    syscall

    ; Execute the target program
    ; (in a real debugger, use the path from argv)
    mov rax, SYS_EXECVE
    mov rdi, target_path
    mov rsi, target_argv
    xor rdx, rdx
    syscall

    ; If execve fails, exit
    mov rax, SYS_EXIT
    mov rdi, 1
    syscall

.error:
    mov rax, SYS_EXIT
    mov rdi, 1
    syscall

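set_breakpoint:
    ; Insert INT3 at [breakpoint_addr] without clobbering the 7 bytes
    ; that follow it: read the word, patch its low byte, write it back.
    ; (simplified: the original byte is not saved for later restoration)
    sub rsp, 8                  ; scratch slot for the original word

    ; ptrace(PTRACE_PEEKTEXT, pid, addr, &word)
    ; The raw syscall stores the word at the address passed in r10;
    ; only the glibc wrapper returns it as a value.
    mov rax, SYS_PTRACE
    mov rdi, PTRACE_PEEKTEXT
    mov rsi, [child_pid]
    mov rdx, [breakpoint_addr]  ; address to break at
    mov r10, rsp
    syscall

    ; ptrace(PTRACE_POKETEXT, pid, addr, word_with_0xCC)
    mov r10, [rsp]              ; original 8-byte word
    mov r10b, 0xCC              ; replace only the lowest byte
    mov rax, SYS_PTRACE
    mov rdi, PTRACE_POKETEXT
    mov rsi, [child_pid]
    mov rdx, [breakpoint_addr]
    syscall

    add rsp, 8
    ret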

ptrace_continue:
    mov rax, SYS_PTRACE
    mov rdi, PTRACE_CONT
    mov rsi, [child_pid]
    xor rdx, rdx
    xor r10, r10
    syscall
    ret

print_registers:
    ; ptrace(PTRACE_GETREGS, pid, 0, &regs)
    mov rax, SYS_PTRACE
    mov rdi, PTRACE_GETREGS
    mov rsi, [child_pid]
    xor rdx, rdx
    lea r10, [regs]
    syscall
    ; Print RIP (at offset 128 in user_regs_struct)
    ; ... (print formatted output using write syscall)
    ret

section .data
    target_path:  db "/tmp/target", 0
    target_argv:  dq target_path, 0
    breakpoint_addr: dq 0x401126    ; hardcoded for demo
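
The status decoding in .debug_loop is done with hand-written masks and shifts. For comparison, here is the same classification expressed with the standard macros from <sys/wait.h>. It is a sketch only: the stop_info type and the classify_status name are hypothetical, and it assumes the status came from a wait call made without WCONTINUED.

```c
#include <sys/wait.h>

enum stop_kind { TRACEE_EXITED, TRACEE_KILLED, TRACEE_STOPPED };

struct stop_info {
    enum stop_kind kind;
    int code;   /* exit code, terminating signal, or stop signal */
};

/* Classify a status word filled in by wait4()/waitpid(). */
struct stop_info classify_status(int status)
{
    struct stop_info info;

    if (WIFSTOPPED(status)) {            /* (status & 0xFF) == 0x7F */
        info.kind = TRACEE_STOPPED;
        info.code = WSTOPSIG(status);    /* (status >> 8) & 0xFF */
    } else if (WIFEXITED(status)) {      /* (status & 0x7F) == 0 */
        info.kind = TRACEE_EXITED;
        info.code = WEXITSTATUS(status); /* (status >> 8) & 0xFF */
    } else {
        info.kind = TRACEE_KILLED;       /* terminated by a signal */
        info.code = WTERMSIG(status);    /* status & 0x7F */
    }
    return info;
}
```

Unlike the assembly loop, which treats every non-stopped status as a plain exit, this version separates a normal exit from death by signal, which is usually worth reporting to the user.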

The ptrace System Call

ptrace is the foundation of all Linux debuggers and the mechanism behind strace. It lets one process (the tracer) observe and control another (the tracee). Key operations:

Request             Effect
PTRACE_TRACEME      Child tells kernel to let its parent trace it
PTRACE_PEEKTEXT     Read 8 bytes from tracee's memory
PTRACE_POKETEXT     Write 8 bytes into tracee's memory (used to insert 0xCC)
PTRACE_GETREGS      Copy all registers into a user_regs_struct
PTRACE_SETREGS      Set all registers from a user_regs_struct
PTRACE_CONT         Resume execution (optionally deliver a signal)
PTRACE_SINGLESTEP   Execute exactly one instruction, then stop (sets TF)

POKETEXT cannot write a single byte: it always writes a full 64-bit word at the target address. To insert a breakpoint, the debugger therefore reads the original word with PEEKTEXT, replaces the lowest byte with 0xCC (on little-endian x86-64 that is the byte at the breakpoint address), and writes the word back with POKETEXT. When the breakpoint fires, the debugger does the reverse: it reads the word, restores the original byte, sets RIP back by 1 (to re-execute the original instruction), and uses SINGLESTEP to execute that one instruction before re-inserting the breakpoint.
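
The word manipulation itself is a one-line mask-and-or. The sketch below shows it in isolation, with hypothetical helper names word_with_int3 and word_with_original_byte, so the arithmetic can be checked without a tracee. In a real debugger the input word would come from PTRACE_PEEKTEXT and the result would go to PTRACE_POKETEXT.

```c
#include <stdint.h>

#define INT3_OPCODE 0xCCULL

/* Return `word` with its lowest byte replaced by 0xCC. On little-endian
 * x86-64 the lowest byte of the word is the byte at the breakpoint address;
 * the other seven bytes are preserved. */
uint64_t word_with_int3(uint64_t word)
{
    return (word & ~0xFFULL) | INT3_OPCODE;
}

/* Undo the patch: keep the upper seven bytes of the word currently in the
 * tracee and put the saved original byte back in the lowest position. */
uint64_t word_with_original_byte(uint64_t patched, uint8_t orig_byte)
{
    return (patched & ~0xFFULL) | orig_byte;
}
```

For a word whose first bytes in memory are 55 48 89 E5 (push rbp; mov rbp, rsp), word_with_int3 changes only the 55 to CC. Writing the bare value 0xCC with POKETEXT instead would zero the seven bytes after the breakpoint and corrupt the following instructions.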

🔐 Security Note: ptrace can read and modify the memory and registers of any process you are allowed to trace. Without further restrictions that covers not only your own children but other processes running under the same user ID, which is why hardened systems and container runtimes restrict it: Yama's ptrace_scope setting can limit attaching to descendants, and seccomp policies often block or limit the ptrace system call itself.

The lesson: every time you set a breakpoint with break foo in GDB, GDB ends up writing 0xCC into the memory of a live process. The instruction set includes a one-byte encoding for this operation specifically because it is so fundamental to software development.