Case Study 10.2: Jump Tables in the Wild — How GCC Implements switch/case

Setting Up

We will examine GCC's output for a realistic switch statement — a simple bytecode interpreter — and trace through exactly how the jump table is constructed, how bounds checking works, and how the indirect jump dispatches to the right case.

The Bytecode Interpreter

typedef enum {
    OP_ADD  = 0,
    OP_SUB  = 1,
    OP_MUL  = 2,
    OP_DIV  = 3,
    OP_PUSH = 4,
    OP_POP  = 5,
    OP_DUP  = 6,
    OP_HALT = 7
} Opcode;

// Stack-based VM: stack in 'stack', stack pointer in 'sp'
int execute_op(Opcode op, int64_t operand, int64_t *stack, int *sp) {
    switch (op) {
        case OP_ADD:
            stack[*sp - 1] += stack[*sp];
            (*sp)--;
            return 1;
        case OP_SUB:
            stack[*sp - 1] -= stack[*sp];
            (*sp)--;
            return 1;
        case OP_MUL:
            stack[*sp - 1] *= stack[*sp];
            (*sp)--;
            return 1;
        case OP_DIV:
            if (stack[*sp] == 0) return -1;  // division by zero
            stack[*sp - 1] /= stack[*sp];
            (*sp)--;
            return 1;
        case OP_PUSH:
            (*sp)++;
            stack[*sp] = operand;
            return 1;
        case OP_POP:
            if (*sp < 0) return -1;
            (*sp)--;
            return 1;
        case OP_DUP:
            stack[*sp + 1] = stack[*sp];
            (*sp)++;
            return 1;
        case OP_HALT:
            return 0;
        default:
            return -1;
    }
}

GCC -O2 Assembly Output (Annotated)

Compiled with gcc -O2 -S -o - bytecode.c:

; Function: execute_op
; Arguments (SysV AMD64 ABI):
;   EDI  = op (int, enum value)
;   RSI  = operand (int64_t)
;   RDX  = stack (int64_t *)
;   ECX  = sp pointer (int *)

execute_op:
    ; ───────────────────────────────────────────────────────────
    ; Step 1: Bounds check
    ; Cases are 0..7. Any value > 7 goes to default (return -1).
    ; ───────────────────────────────────────────────────────────
    cmp     edi, 7
    ja      .default_case        ; unsigned above 7 → default

    ; ───────────────────────────────────────────────────────────
    ; Step 2: Load jump table address (RIP-relative)
    ; The jump table lives in the .rodata section.
    ; ───────────────────────────────────────────────────────────
    movsx   rdi, edi             ; sign-extend op to 64-bit (was int)
    lea     rax, [rel .jtable]   ; rax = address of jump table
    mov     rax, [rax + rdi*8]   ; rax = .jtable[op] = address of case handler
    jmp     rax                  ; dispatch!

    ; ───────────────────────────────────────────────────────────
    ; Jump table (in .rodata):
    ; ───────────────────────────────────────────────────────────
.jtable:
    dq .op_add     ; 0
    dq .op_sub     ; 1
    dq .op_mul     ; 2
    dq .op_div     ; 3
    dq .op_push    ; 4
    dq .op_pop     ; 5
    dq .op_dup     ; 6
    dq .op_halt    ; 7

    ; ───────────────────────────────────────────────────────────
    ; Case handlers:
    ; ───────────────────────────────────────────────────────────
.op_add:
    mov     eax, [rcx]           ; eax = *sp
    movsx   r8, eax
    add     r8, -1               ; r8 = *sp - 1
    mov     r9, [rdx + r8*8]     ; r9 = stack[*sp-1]
    add     r9, [rdx + rax*8]   ; r9 += stack[*sp]
    mov     [rdx + r8*8], r9    ; stack[*sp-1] = result
    sub     dword [rcx], 1       ; (*sp)--
    mov     eax, 1
    ret

.op_sub:
    mov     eax, [rcx]
    movsx   r8, eax
    add     r8, -1
    mov     r9, [rdx + r8*8]
    sub     r9, [rdx + rax*8]
    mov     [rdx + r8*8], r9
    sub     dword [rcx], 1
    mov     eax, 1
    ret

; ... (similar for MUL, DIV, PUSH, POP, DUP) ...

.op_halt:
    xor     eax, eax             ; return 0
    ret

.default_case:
    mov     eax, -1
    ret

Anatomy of the Dispatch Sequence

The three instructions that implement the dispatch:

lea  rax, [rel .jtable]    ; rax = base address of jump table
mov  rax, [rax + rdi*8]    ; rax = address of handler (8 bytes per pointer)
jmp  rax                   ; jump to handler

This is O(1) dispatch for any number of cases, as long as the case values are dense. For N cases, the hardware executes exactly 3 instructions regardless of N.

Memory Layout of the Jump Table

.jtable:
┌─────────────────────────────────────────┐
│ address of .op_add  (8 bytes, 64-bit)   │ ← offset 0  (op=0)
├─────────────────────────────────────────┤
│ address of .op_sub  (8 bytes)           │ ← offset 8  (op=1)
├─────────────────────────────────────────┤
│ address of .op_mul  (8 bytes)           │ ← offset 16 (op=2)
├─────────────────────────────────────────┤
│ address of .op_div  (8 bytes)           │ ← offset 24 (op=3)
├─────────────────────────────────────────┤
│ address of .op_push (8 bytes)           │ ← offset 32 (op=4)
├─────────────────────────────────────────┤
│ address of .op_pop  (8 bytes)           │ ← offset 40 (op=5)
├─────────────────────────────────────────┤
│ address of .op_dup  (8 bytes)           │ ← offset 48 (op=6)
├─────────────────────────────────────────┤
│ address of .op_halt (8 bytes)           │ ← offset 56 (op=7)
└─────────────────────────────────────────┘

Each entry is 8 bytes (64-bit pointer), so [.jtable + op * 8] gives the address of the Nth handler.

Why JA for the Bounds Check?

cmp edi, 7
ja  .default_case

The JA instruction is unsigned above. Why unsigned, not signed?

Because enum values in C are non-negative. If someone passes op = -1 (which is 0xFFFFFFFF as a 32-bit int), we still want to catch it as out-of-range. In unsigned comparison, 0xFFFFFFFF > 7, so ja correctly routes it to the default case. With signed JG, negative values would not trigger the bounds check because -1 < 7 signed, and we would read 8 bytes from jtable - 8 — accessing memory before the table and jumping to a garbage address.

This is a safety property: always use unsigned comparison for jump table bounds checks.

Performance: Jump Table vs. if-else Chain

Approach Instructions to dispatch Time complexity
Jump table 3 (LEA + MOV + JMP) O(1)
if-else chain (linear) Up to 2N (N comparisons + jumps) O(N)
Binary search tree O(log N) comparisons O(log N)

For 8 cases, the if-else chain takes at most 16 instructions to find the last case. The jump table takes 3. For 64 cases, the jump table still takes 3; the if-else chain takes up to 128.

On a real processor, the indirect jump jmp rax has additional cost: the branch predictor must predict the destination (not just whether the branch is taken). Modern processors have indirect branch predictors that can track recently used destinations. For a dispatch loop in an interpreter, the same opcodes repeat frequently, and the predictor learns the pattern.

Comparing to if-else Chain

What GCC generates when there are only 3 cases (below the threshold for a jump table):

; 3-case switch: GCC generates comparisons
switch_small:
    cmp   edi, 1
    je    .case_1
    cmp   edi, 2
    je    .case_2
    test  edi, edi       ; edi == 0?
    jne   .default
    ; case 0:
    ...

The switchover point: GCC uses a jump table when the density (cases/range) is above ~60% and there are more than ~4-6 cases. Below that, a comparison chain is emitted.

The computed goto Extension in GCC

For interpreter hot loops, GCC's computed goto extension (goto *addr) generates the same pattern but allows the programmer to avoid re-checking bounds on each dispatch:

static void *dispatch_table[] = {
    &&op_add, &&op_sub, &&op_mul, ...
};
// Then: goto *dispatch_table[op];

The assembly is nearly identical to the switch jump table, but the programmer controls the table layout and can embed the next-opcode fetch into each handler (threaded code) for 10-30% interpreter speedup.

Security Note: PIE and Jump Tables

In position-independent executables (PIE), jump table entries cannot be absolute 64-bit addresses at compile time — the binary's load address is not known. GCC handles this in two ways:

  1. RIP-relative tables with 32-bit offsets: Store 4-byte offsets from the table base, reconstruct the address at runtime.
  2. Absolute addresses with dynamic relocation: The dynamic linker patches the table at load time.

Modern GCC defaults to the offset approach for PIE binaries, adding a load-and-add step to the dispatch:

; PIE jump table dispatch:
lea    rdx, [rel .jtable]    ; rdx = table base (RIP-relative)
movsx  rax, dword [rdx + rdi*4]   ; rax = 32-bit offset from .jtable[op]
add    rax, rdx              ; rax = absolute address
jmp    rax

The jump table stores 4-byte relative offsets instead of 8-byte absolute addresses, halving the table size and enabling PIC.