Case Study 10.2: Jump Tables in the Wild — How GCC Implements switch/case
Setting Up
We will examine GCC's output for a realistic switch statement — a simple bytecode interpreter — and trace through exactly how the jump table is constructed, how bounds checking works, and how the indirect jump dispatches to the right case.
The Bytecode Interpreter
typedef enum {
OP_ADD = 0,
OP_SUB = 1,
OP_MUL = 2,
OP_DIV = 3,
OP_PUSH = 4,
OP_POP = 5,
OP_DUP = 6,
OP_HALT = 7
} Opcode;
// Stack-based VM: stack in 'stack', stack pointer in 'sp'
int execute_op(Opcode op, int64_t operand, int64_t *stack, int *sp) {
switch (op) {
case OP_ADD:
stack[*sp - 1] += stack[*sp];
(*sp)--;
return 1;
case OP_SUB:
stack[*sp - 1] -= stack[*sp];
(*sp)--;
return 1;
case OP_MUL:
stack[*sp - 1] *= stack[*sp];
(*sp)--;
return 1;
case OP_DIV:
if (stack[*sp] == 0) return -1; // division by zero
stack[*sp - 1] /= stack[*sp];
(*sp)--;
return 1;
case OP_PUSH:
(*sp)++;
stack[*sp] = operand;
return 1;
case OP_POP:
if (*sp < 0) return -1;
(*sp)--;
return 1;
case OP_DUP:
stack[*sp + 1] = stack[*sp];
(*sp)++;
return 1;
case OP_HALT:
return 0;
default:
return -1;
}
}
GCC -O2 Assembly Output (Annotated)
Compiled with gcc -O2 -S -o - bytecode.c:
; Function: execute_op
; Arguments (SysV AMD64 ABI):
; EDI = op (int, enum value)
; RSI = operand (int64_t)
; RDX = stack (int64_t *)
; ECX = sp pointer (int *)
execute_op:
; ───────────────────────────────────────────────────────────
; Step 1: Bounds check
; Cases are 0..7. Any value > 7 goes to default (return -1).
; ───────────────────────────────────────────────────────────
cmp edi, 7
ja .default_case ; unsigned above 7 → default
; ───────────────────────────────────────────────────────────
; Step 2: Load jump table address (RIP-relative)
; The jump table lives in the .rodata section.
; ───────────────────────────────────────────────────────────
movsx rdi, edi ; sign-extend op to 64-bit (was int)
lea rax, [rel .jtable] ; rax = address of jump table
mov rax, [rax + rdi*8] ; rax = .jtable[op] = address of case handler
jmp rax ; dispatch!
; ───────────────────────────────────────────────────────────
; Jump table (in .rodata):
; ───────────────────────────────────────────────────────────
.jtable:
dq .op_add ; 0
dq .op_sub ; 1
dq .op_mul ; 2
dq .op_div ; 3
dq .op_push ; 4
dq .op_pop ; 5
dq .op_dup ; 6
dq .op_halt ; 7
; ───────────────────────────────────────────────────────────
; Case handlers:
; ───────────────────────────────────────────────────────────
.op_add:
mov eax, [rcx] ; eax = *sp
movsx r8, eax
add r8, -1 ; r8 = *sp - 1
mov r9, [rdx + r8*8] ; r9 = stack[*sp-1]
add r9, [rdx + rax*8] ; r9 += stack[*sp]
mov [rdx + r8*8], r9 ; stack[*sp-1] = result
sub dword [rcx], 1 ; (*sp)--
mov eax, 1
ret
.op_sub:
mov eax, [rcx]
movsx r8, eax
add r8, -1
mov r9, [rdx + r8*8]
sub r9, [rdx + rax*8]
mov [rdx + r8*8], r9
sub dword [rcx], 1
mov eax, 1
ret
; ... (similar for MUL, DIV, PUSH, POP, DUP) ...
.op_halt:
xor eax, eax ; return 0
ret
.default_case:
mov eax, -1
ret
Anatomy of the Dispatch Sequence
The three instructions that implement the dispatch:
lea rax, [rel .jtable] ; rax = base address of jump table
mov rax, [rax + rdi*8] ; rax = address of handler (8 bytes per pointer)
jmp rax ; jump to handler
This is O(1) dispatch for any number of cases, as long as the case values are dense. For N cases, the hardware executes exactly 3 instructions regardless of N.
Memory Layout of the Jump Table
.jtable:
┌─────────────────────────────────────────┐
│ address of .op_add (8 bytes, 64-bit) │ ← offset 0 (op=0)
├─────────────────────────────────────────┤
│ address of .op_sub (8 bytes) │ ← offset 8 (op=1)
├─────────────────────────────────────────┤
│ address of .op_mul (8 bytes) │ ← offset 16 (op=2)
├─────────────────────────────────────────┤
│ address of .op_div (8 bytes) │ ← offset 24 (op=3)
├─────────────────────────────────────────┤
│ address of .op_push (8 bytes) │ ← offset 32 (op=4)
├─────────────────────────────────────────┤
│ address of .op_pop (8 bytes) │ ← offset 40 (op=5)
├─────────────────────────────────────────┤
│ address of .op_dup (8 bytes) │ ← offset 48 (op=6)
├─────────────────────────────────────────┤
│ address of .op_halt (8 bytes) │ ← offset 56 (op=7)
└─────────────────────────────────────────┘
Each entry is 8 bytes (64-bit pointer), so [.jtable + op * 8] gives the address of the Nth handler.
Why JA for the Bounds Check?
cmp edi, 7
ja .default_case
The JA instruction is unsigned above. Why unsigned, not signed?
Because enum values in C are non-negative. If someone passes op = -1 (which is 0xFFFFFFFF as a 32-bit int), we still want to catch it as out-of-range. In unsigned comparison, 0xFFFFFFFF > 7, so ja correctly routes it to the default case. With signed JG, negative values would not trigger the bounds check because -1 < 7 signed, and we would read 8 bytes from jtable - 8 — accessing memory before the table and jumping to a garbage address.
This is a safety property: always use unsigned comparison for jump table bounds checks.
Performance: Jump Table vs. if-else Chain
| Approach | Instructions to dispatch | Time complexity |
|---|---|---|
| Jump table | 3 (LEA + MOV + JMP) | O(1) |
| if-else chain (linear) | Up to 2N (N comparisons + jumps) | O(N) |
| Binary search tree | O(log N) comparisons | O(log N) |
For 8 cases, the if-else chain takes at most 16 instructions to find the last case. The jump table takes 3. For 64 cases, the jump table still takes 3; the if-else chain takes up to 128.
On a real processor, the indirect jump jmp rax has additional cost: the branch predictor must predict the destination (not just whether the branch is taken). Modern processors have indirect branch predictors that can track recently used destinations. For a dispatch loop in an interpreter, the same opcodes repeat frequently, and the predictor learns the pattern.
Comparing to if-else Chain
What GCC generates when there are only 3 cases (below the threshold for a jump table):
; 3-case switch: GCC generates comparisons
switch_small:
cmp edi, 1
je .case_1
cmp edi, 2
je .case_2
test edi, edi ; edi == 0?
jne .default
; case 0:
...
The switchover point: GCC uses a jump table when the density (cases/range) is above ~60% and there are more than ~4-6 cases. Below that, a comparison chain is emitted.
The computed goto Extension in GCC
For interpreter hot loops, GCC's computed goto extension (goto *addr) generates the same pattern but allows the programmer to avoid re-checking bounds on each dispatch:
static void *dispatch_table[] = {
&&op_add, &&op_sub, &&op_mul, ...
};
// Then: goto *dispatch_table[op];
The assembly is nearly identical to the switch jump table, but the programmer controls the table layout and can embed the next-opcode fetch into each handler (threaded code) for 10-30% interpreter speedup.
Security Note: PIE and Jump Tables
In position-independent executables (PIE), jump table entries cannot be absolute 64-bit addresses at compile time — the binary's load address is not known. GCC handles this in two ways:
- RIP-relative tables with 32-bit offsets: Store 4-byte offsets from the table base, reconstruct the address at runtime.
- Absolute addresses with dynamic relocation: The dynamic linker patches the table at load time.
Modern GCC defaults to the offset approach for PIE binaries, adding a load-and-add step to the dispatch:
; PIE jump table dispatch:
lea rdx, [rel .jtable] ; rdx = table base (RIP-relative)
movsx rax, dword [rdx + rdi*4] ; rax = 32-bit offset from .jtable[op]
add rax, rdx ; rax = absolute address
jmp rax
The jump table stores 4-byte relative offsets instead of 8-byte absolute addresses, halving the table size and enabling PIC.