Chapter 10: Control Flow

The Jump is the Computer's Only Decision

Every conditional branch in every C program compiles to two things: a comparison that sets flags, and a conditional jump that reads those flags. There are no if statements in machine code — only flag tests and jumps. This chapter makes that translation explicit, covering every conditional jump in the x86-64 ISA, every C control structure mapped to assembly, and the CMOV family of conditional move instructions that implement branching without branches.

Unconditional Jumps

The simplest jump: go to a label, no conditions.

jmp target              ; near jump (32-bit offset from RIP)
jmp short target        ; short jump (8-bit offset, -128 to +127 bytes)
jmp rax                 ; indirect jump: jump to address in RAX
jmp [rax]               ; indirect jump: jump to address stored at memory[RAX]
jmp [table + rax*8]     ; indirect jump through table (jump tables, see below)

Jump Encodings

Three encodings exist for unconditional jumps:

  • Short jump (EB cb): 2 bytes total. Offset is a signed 8-bit value, range -128 to +127 bytes from the next instruction. The assembler uses this when the target is within range.
  • Near jump (E9 cd): 5 bytes total. Offset is a signed 32-bit value, range ±2GB from the next instruction. This covers virtually all jumps in a 64-bit binary.
  • Far jump: Used for switching between privilege levels or code segments (relevant for MinOS kernel work in Chapter 33). Rare in userspace.

The assembler chooses automatically. You can force the short encoding with jmp short label (and get an assembler error if the target is out of range).
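The offset arithmetic can be made concrete with a small C sketch. The helper name encode_jmp and its signature are hypothetical, chosen only for illustration; it mimics how an assembler might choose between the EB and E9 encodings, with the displacement measured from the end of the jump instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any real assembler): encode an
 * unconditional jump located at address `at` that targets `target`.
 * Writes the bytes to `out` (at least 5 bytes) and returns the length. */
size_t encode_jmp(uint64_t at, uint64_t target, uint8_t *out)
{
    /* Displacements are relative to the NEXT instruction, so try the
     * 2-byte short form first: next = at + 2. */
    int64_t rel = (int64_t)(target - (at + 2));
    if (rel >= INT8_MIN && rel <= INT8_MAX) {
        out[0] = 0xEB;                  /* JMP rel8 */
        out[1] = (uint8_t)(int8_t)rel;
        return 2;
    }

    /* Otherwise use the 5-byte near form: next = at + 5.
     * Assumption: the target is within the rel32 range. */
    rel = (int64_t)(target - (at + 5));
    uint32_t rel32 = (uint32_t)(int32_t)rel;
    out[0] = 0xE9;                      /* JMP rel32 */
    for (int i = 0; i < 4; i++)         /* little-endian displacement */
        out[1 + i] = (uint8_t)(rel32 >> (8 * i));
    return 5;
}
```

A jump to its own address gets a displacement of -2, which is why an infinite loop written as a jump to itself assembles to the two bytes EB FE.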

Indirect Jumps

jmp rax               ; jump to the address stored in RAX

Indirect jumps are the mechanism behind:

  1. Virtual function calls in C++: the vtable entry holds the address
  2. Jump tables for switch statements: an array of addresses indexed by the case value
  3. Return from function: ret pops the return address off the stack into RIP, which amounts to an indirect jump through [rsp] combined with add rsp, 8
  4. Dynamic dispatch in interpreters

; Function pointer call (C: fp())
; Assume function pointer is in RAX
call rax               ; push return address, jump to RAX
; or for a call through memory:
call [rdi + 16]        ; vtable entry at offset 16

Conditional Jumps

Every conditional jump checks one or more flags from RFLAGS. After a CMP or TEST instruction, the flags encode the relationship between the operands, and the conditional jump acts on those flags.

The Complete Conditional Jump Reference

Signed comparisons (use after CMP on signed integers):

| Mnemonic | Aliases | Flag condition | C equivalent |
|---|---|---|---|
| JE | JZ | ZF = 1 | == |
| JNE | JNZ | ZF = 0 | != |
| JL | JNGE | SF ≠ OF | < (signed) |
| JLE | JNG | ZF = 1 OR SF ≠ OF | <= (signed) |
| JG | JNLE | ZF = 0 AND SF = OF | > (signed) |
| JGE | JNL | SF = OF | >= (signed) |
| JS | | SF = 1 | result is negative |
| JNS | | SF = 0 | result is non-negative |
| JO | | OF = 1 | signed overflow occurred |
| JNO | | OF = 0 | no signed overflow |

Unsigned comparisons (use after CMP on unsigned integers):

| Mnemonic | Aliases | Flag condition | C equivalent |
|---|---|---|---|
| JE | JZ | ZF = 1 | == |
| JNE | JNZ | ZF = 0 | != |
| JB | JNAE, JC | CF = 1 | < (unsigned, "below") |
| JBE | JNA | CF = 1 OR ZF = 1 | <= (unsigned, "below or equal") |
| JA | JNBE | CF = 0 AND ZF = 0 | > (unsigned, "above") |
| JAE | JNB, JNC | CF = 0 | >= (unsigned, "above or equal") |

Other conditional jumps:

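| Mnemonic | Aliases | Condition | Use case |
|---|---|---|---|
| JP | JPE | PF = 1 | Parity even; after UCOMISS/UCOMISD, PF = 1 signals an unordered (NaN) comparison |
| JNP | JPO | PF = 0 | Parity odd; after UCOMISS/UCOMISD, the comparison was ordered |
| JCXZ | | CX = 0 | 16-bit counter test; not encodable in 64-bit mode |
| JECXZ | | ECX = 0 | Skip a loop when the 32-bit count is zero |
| JRCXZ | | RCX = 0 | Skip a loop when the 64-bit count is zero (e.g., before LOOP) |

The last three test the counter register directly rather than reading RFLAGS, and they exist only in the short (8-bit offset) form.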

⚠️ Common Mistake — Signed vs. Unsigned: This is the most common control flow bug in assembly. Consider:

mov rbx, 0x8000000000000000   ; CMP has no 64-bit immediate form, so load the constant first
cmp rax, rbx
jl  is_less

If RAX = 0 and the comparison value is 0x8000000000000000 (INT64_MIN, that is -INT64_MAX-1, as a signed value, or 9223372036854775808 as unsigned), then:

  • JL (signed less than): is 0 < INT64_MIN? No, because INT64_MIN is negative. JL is NOT taken.
  • JB (unsigned less than): is 0 < 9223372036854775808? Yes. JB IS taken.

Using JL where you mean JB (or vice versa) gives completely wrong results. The rule: use JL/JLE/JG/JGE for signed integers, use JB/JBE/JA/JAE for unsigned.
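To see why the two families disagree, it can help to model the flags directly. The following C sketch is an illustration, not a CPU-accurate emulator: the names cmp64, jl_taken, and jb_taken are hypothetical, and only the four flags used by the tables above are computed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags produced by CMP a, b (which computes a - b and discards the result). */
struct flags { bool zf, sf, cf, of; };

struct flags cmp64(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;                       /* wraps modulo 2^64 */
    struct flags f;
    f.zf = (r == 0);
    f.sf = (r >> 63) != 0;                    /* sign bit of the result */
    f.cf = (a < b);                           /* unsigned borrow */
    /* Signed overflow: the operands had different signs and the
     * result's sign differs from a's sign. */
    f.of = (((a ^ b) & (a ^ r)) >> 63) != 0;
    return f;
}

bool jl_taken(struct flags f) { return f.sf != f.of; }   /* signed <   */
bool jb_taken(struct flags f) { return f.cf; }           /* unsigned < */
```

For a = 0 and b = 0x8000000000000000 the subtraction sets SF, OF, and CF all to 1. SF equals OF, so JL falls through, while CF = 1 means JB is taken: the same flag state, read two different ways.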

Translating C Control Structures to Assembly

if/else

if (a > b) {
    do_something();
} else {
    do_other();
}

Assembly pattern — jump over the if-body on false condition:

; Signed comparison, a in RAX, b in RBX
cmp rax, rbx            ; set flags for a - b
jle .else_branch        ; jump if NOT (a > b), i.e., if a <= b
; if-body:
call do_something
jmp .end_if

.else_branch:
call do_other

.end_if:
; continue...

The key: the jump condition is the negation of the C condition. if (a > b) → jle .else (jump if less-or-equal, the opposite).

For a simple if without else:

; if (x != 0) { ... }
test rax, rax
jz   .skip              ; jump if x == 0 (negation of x != 0)
; if-body
.skip:
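The same lowering can be written in C itself using goto, which makes the negated condition easy to see. This is a sketch for illustration: the two handler bodies are hypothetical stand-ins that return a code so the chosen path is observable.

```c
/* Hypothetical stand-ins for do_something() and do_other(). */
static int do_something(void) { return 1; }
static int do_other(void)     { return 2; }

/* Structured form, as in the C source. */
int if_else_structured(long a, long b)
{
    int r;
    if (a > b) { r = do_something(); } else { r = do_other(); }
    return r;
}

/* Lowered form: one negated test, two labels, one unconditional jump. */
int if_else_lowered(long a, long b)
{
    int r;
    if (a <= b) goto else_branch;   /* cmp rax, rbx / jle .else_branch */
    r = do_something();             /* if-body */
    goto end_if;                    /* jmp .end_if */
else_branch:
    r = do_other();
end_if:
    return r;
}
```

Both functions return the same value for every input; the second simply spells out the control flow the assembler listing uses.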

while Loop

while (i < n) {
    process(i);
    i++;
}

Top-test form (the standard while shape):

; i in RBX, n in R12 (callee-saved under the System V ABI, so they survive the call)
.while_start:
    cmp rbx, r12        ; i < n?
    jge .while_end      ; exit if i >= n
    mov rdi, rbx        ; arg: i
    call process
    inc rbx             ; i++
    jmp .while_start    ; loop back

.while_end:

Or the preferred form, which checks at the bottom to save one jump:

; Test once up front, then loop
cmp rbx, r12
jge .while_end          ; if initially false, skip body entirely

.while_body:
    mov rdi, rbx
    call process
    inc rbx
    cmp rbx, r12        ; check condition at bottom
    jl .while_body      ; loop if still true

.while_end:

The bottom-test form saves one unconditional jump per iteration.
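The saving can be counted. The sketch below models both loop shapes in C with goto and tallies how many branch instructions each would execute for a given n; the function names are hypothetical, and the loop body is reduced to i++.

```c
/* Count the branch instructions executed by the top-test loop shape. */
long branches_top_test(long n)
{
    long i = 0, executed = 0;
start:
    executed++;                 /* jge .while_end, evaluated on every pass */
    if (i >= n) goto end;
    i++;                        /* body + i++ */
    executed++;                 /* jmp .while_start */
    goto start;
end:
    return executed;
}

/* Count the branch instructions executed by the bottom-test loop shape. */
long branches_bottom_test(long n)
{
    long i = 0, executed = 0;
    executed++;                 /* initial jge .while_end guard */
    if (i >= n) goto end;
body:
    i++;                        /* body + i++ */
    executed++;                 /* jl .while_body */
    if (i < n) goto body;
end:
    return executed;
}
```

For n iterations the top-test shape executes 2n + 1 branches and the bottom-test shape n + 1, so the rotated loop saves exactly one jump per iteration.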

for Loop

for (int i = 0; i < n; i++) {
    arr[i] = i * 2;
}

; i in RCX (initialized below), n in RDX, arr base in RSI
; (assumes arr holds 64-bit elements, hence the scale factor of 8)
xor ecx, ecx            ; i = 0
test rdx, rdx           ; set flags from n
jle .for_end            ; skip if n <= 0

.for_body:
    lea rax, [rcx + rcx] ; rax = i*2
    mov [rsi + rcx*8], rax ; arr[i] = i*2
    inc rcx
    cmp rcx, rdx
    jl .for_body         ; continue if i < n

.for_end:

do-while Loop

The do-while loop has no pre-check — the body always executes at least once:

do {
    process(data);
    data = data->next;
} while (data != NULL);

; data pointer in RBX (callee-saved, so it survives the call to process)
.do_loop:
    mov rdi, rbx         ; arg: data
    call process         ; body
    mov rbx, [rbx + 16]  ; data = data->next (assume next at offset 16)
    test rbx, rbx        ; data != NULL?
    jnz .do_loop         ; loop if true (no initial check needed)

The do-while maps perfectly to the bottom-test loop form — there is no overhead for the initial check.

Counting Down vs. Counting Up

Counting down to zero is often slightly cheaper because DEC sets ZF itself, so the loop needs no separate CMP against an upper bound:

; Count down: rcx takes the values n, n-1, ..., 1 (assumes n > 0)
; DEC sets ZF when the counter reaches zero, so JNZ needs no CMP
mov rcx, n
.down_loop:
    ; ... body using rcx ...
    dec rcx
    jnz .down_loop       ; loop while rcx != 0 (JNZ checks ZF from DEC)

; Count up: rcx takes the values 0, 1, ..., n-1 (assumes n > 0)
; One more instruction per iteration (explicit CMP)
xor ecx, ecx
.up_loop:
    ; ... body using rcx ...
    inc rcx
    cmp rcx, n
    jl .up_loop

However, counting down changes the order of iteration and may not be valid if the algorithm requires ascending order.

switch/case and Jump Tables

When a switch statement has many cases with values in a small range, the compiler generates a jump table: an array of addresses, one per case, indexed by the switch variable.

switch (cmd) {
    case 0: handle_read();  break;
    case 1: handle_write(); break;
    case 2: handle_seek();  break;
    case 3: handle_close(); break;
    default: handle_error(); break;
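}

; cmd in RDI
; Jump table approach:
    cmp rdi, 3               ; check upper bound for range
    ja  .default             ; above 3 (unsigned, so negatives too): default case

    ; Indirect jump through table. RIP-relative addressing cannot take an
    ; index register, so load the table base first:
    lea rax, [rel .jtable]
    jmp [rax + rdi*8]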

.jtable:
    dq .case_0               ; address of case 0 handler
    dq .case_1               ; address of case 1 handler
    dq .case_2               ; address of case 2 handler
    dq .case_3               ; address of case 3 handler

.case_0:
    call handle_read
    jmp .switch_end

.case_1:
    call handle_write
    jmp .switch_end

.case_2:
    call handle_seek
    jmp .switch_end

.case_3:
    call handle_close
    jmp .switch_end

.default:
    call handle_error

.switch_end:

⚙️ How It Works: The address computation (table base plus rdi*8) treats the jump table as an array of 8-byte (pointer-size) values and uses the command value as the index. One jmp instruction dispatches to any of N cases in O(1) time, regardless of N. This is why compilers generate jump tables for dense switch statements: a chain of comparisons costs O(N) in the number of cases.
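The closest C equivalent of a jump table is an array of function pointers indexed by the command. The sketch below is illustrative only: the handler bodies are hypothetical and return distinct codes so the dispatch can be observed, and the name dispatch is an assumption.

```c
/* Hypothetical handlers; each returns a code so the dispatch is observable. */
static int handle_read(void)  { return 10; }
static int handle_write(void) { return 11; }
static int handle_seek(void)  { return 12; }
static int handle_close(void) { return 13; }
static int handle_error(void) { return -1; }

typedef int (*handler_fn)(void);

/* One entry per case value, in case order. */
static const handler_fn jtable[] = {
    handle_read, handle_write, handle_seek, handle_close,
};

int dispatch(int cmd)
{
    /* Unsigned compare: a negative cmd converts to a huge value, so a
     * single test rejects both cmd < 0 and cmd > 3 (the cmp/ja pair). */
    if ((unsigned)cmd >= sizeof jtable / sizeof jtable[0])
        return handle_error();
    return jtable[cmd]();   /* indexed load + indirect call */
}
```

The cost of dispatch is one comparison, one indexed load, and one indirect transfer, no matter how many handlers the table holds.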

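The compiler typically turns a switch into a jump table when:

  • Cases are dense (mostly consecutive values)
  • There are enough cases, typically around 5 or more, for the bounds check plus indirect jump to pay off
  • The case values span a limited range relative to the number of cases

The exact thresholds are heuristics that vary by compiler, version, target, and optimization level.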

For sparse switches (cases like 1, 100, 1000, 9999), the compiler generates a binary search tree of comparisons instead.
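The sparse strategy can be modelled as a binary search over the sorted case values. A real compiler emits the comparisons inline as a fixed decision tree rather than a loop over an array, so treat this C sketch (with the hypothetical name find_case) as a model of the comparison count, not of the generated code.

```c
/* Sparse case values from the text, kept sorted for binary search. */
static const int case_values[] = { 1, 100, 1000, 9999 };

/* Returns the index of the matching case, or -1 for the default case. */
int find_case(int v)
{
    int lo = 0;
    int hi = (int)(sizeof case_values / sizeof case_values[0]) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (v == case_values[mid])
            return mid;
        if (v < case_values[mid])
            hi = mid - 1;       /* continue in the lower half */
        else
            lo = mid + 1;       /* continue in the upper half */
    }
    return -1;
}
```

Each step halves the remaining candidates, so N cases need about log2(N) comparisons instead of the N a linear chain would use.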

The LOOP Instruction

The LOOP instruction decrements RCX and jumps if RCX ≠ 0:

mov rcx, 10            ; loop count
.loop:
    ; ... body ...
    loop .loop         ; decrement RCX, jump if RCX != 0

This is a compact encoding (2 bytes) but has a catch: on many modern processors, Intel's in particular, LOOP is microcoded and slower than the equivalent dec rcx; jnz pair. GCC almost never emits LOOP. It is occasionally useful where code size matters more than cycle count (e.g., in boot code with size constraints), but for performance-critical loops, use dec rcx; jnz. Note also that LOOP decrements before testing, so entering with RCX = 0 wraps around and runs the loop 2^64 times.

CMOV: Conditional Move (Branch-Free Code)

CMOV conditionally copies a register or memory operand into a register, based on flags, without branching. The destination must be a register and there is no immediate form:

; General form:
cmov<cc> dst, src       ; if condition <cc> is true, dst = src; else dst unchanged

; Examples:
cmove  rax, rbx         ; if ZF=1 (equal), rax = rbx
cmovne rax, rbx         ; if ZF=0 (not equal)
cmovl  rax, rbx         ; if SF≠OF (signed less than)
cmovle rax, rbx         ; if ZF=1 or SF≠OF (signed less-or-equal)
cmovg  rax, rbx         ; if ZF=0 and SF=OF (signed greater)
cmovge rax, rbx         ; if SF=OF (signed greater-or-equal)
cmovb  rax, rbx         ; if CF=1 (unsigned below)
cmova  rax, rbx         ; if CF=0 and ZF=0 (unsigned above)
cmovs  rax, rbx         ; if SF=1 (negative)
cmovns rax, rbx         ; if SF=0 (non-negative)

The list mirrors the flag-based conditional jumps: every J<cc> that tests flags has a corresponding CMOV<cc>. The counter-register jumps (JCXZ, JECXZ, JRCXZ) have no CMOV counterpart.

Implementing abs(), max(), min() Without Branches

; abs(rdi) → rax
abs64:
    mov  rax, rdi
    neg  rax               ; rax = -rdi; SF now reflects the sign of -rdi
    cmovs rax, rdi         ; if -rdi is negative (rdi was positive), restore rdi
    ret

This implementation is subtle because NEG sets the flags from its result, so CMOVS tests the sign of -rdi, not of rdi. Choosing CMOVNS here by mistake would return -|x|. Two alternatives make the sign test explicit:

; abs(rdi) → rax, with an explicit sign test and a branch
abs64_branch:
    mov  rax, rdi
    test rdi, rdi          ; set flags based on rdi (SF=1 if negative)
    jns  .positive         ; if non-negative, done
    neg  rax               ; negate if negative
.positive:
    ret

; Or fully branchless:
abs64_branchless:
    mov  rax, rdi
    mov  rcx, rdi
    sar  rcx, 63           ; rcx = sign mask: 0xFFFF...FFFF if negative, 0 if positive
    xor  rax, rcx          ; flip all bits if negative (one's complement)
    sub  rax, rcx          ; subtract sign mask: +1 if negative (makes two's complement), +0 if positive
    ret

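The XOR+SUB trick: the sign mask is -1 (all ones) for negative numbers and 0 otherwise. XOR with -1 is bitwise NOT, and subtracting -1 adds 1, so a negative input becomes NOT x + 1, which is two's complement negation. With a mask of 0, both operations leave the value unchanged.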
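The same trick carries over to C. The sketch below assumes that right-shifting a negative signed value is an arithmetic shift and that converting an out-of-range unsigned value back to int64_t wraps; both are implementation-defined in standard C but hold on GCC and Clang for x86-64. The arithmetic is done on unsigned values to avoid signed overflow.

```c
#include <stdint.h>

int64_t abs64_branchless(int64_t x)
{
    /* Assumption: >> on a negative int64_t is an arithmetic shift,
     * giving all ones for negative x and zero otherwise (SAR rcx, 63). */
    uint64_t mask = (uint64_t)(x >> 63);

    /* XOR flips every bit when mask is all ones; subtracting the mask
     * (which is -1) then adds 1. With mask == 0 both are no-ops. */
    return (int64_t)(((uint64_t)x ^ mask) - mask);
}
```

As with the assembly version, INT64_MIN has no positive counterpart in 64 bits and comes back unchanged.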

; max(rdi, rsi) → rax (signed)
max64:
    mov  rax, rdi
    cmp  rdi, rsi
    cmovl rax, rsi         ; if rdi < rsi, rax = rsi
    ret

; min(rdi, rsi) → rax (signed)
min64:
    mov  rax, rdi
    cmp  rdi, rsi
    cmovg rax, rsi         ; if rdi > rsi, rax = rsi
    ret
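In C, the select-without-branching idea can be written with a mask instead of a conditional. A compiler is free to turn a plain ternary into CMOV, but that is not guaranteed, so the sketch below makes the selection explicit. The helper name select64 is hypothetical.

```c
#include <stdint.h>

/* Branch-free select: returns a when cond is nonzero, otherwise b. */
static int64_t select64(int cond, int64_t a, int64_t b)
{
    /* mask is all ones when cond is true, all zeros when it is false. */
    uint64_t mask = (uint64_t)0 - (uint64_t)(cond != 0);
    return (int64_t)(((uint64_t)a & mask) | ((uint64_t)b & ~mask));
}

/* Signed max and min, matching the CMOVL and CMOVG versions. */
int64_t max64(int64_t a, int64_t b) { return select64(a < b, b, a); }
int64_t min64(int64_t a, int64_t b) { return select64(a > b, b, a); }
```

Both candidate values are always computed and the mask picks one, which is the same "compute both, select one" shape that CMOV implements in a single instruction.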

When CMOV Helps vs. Hurts

CMOV eliminates branch misprediction penalty, which can be 10-20 cycles on modern CPUs. But it is not always faster:

CMOV wins when:

  • The branch is unpredictable (roughly 50/50 distribution)
  • The values being selected are already in registers (no load involved)
  • The computation fits the "compute both, select one" pattern

CMOV hurts when:

  • The branch is highly predictable (>95% one way), because the branch predictor handles it almost for free
  • The "not selected" computation is expensive or involves a slow load
  • CMOV creates a longer dependency chain, since the result must wait for both inputs and the flags

; This pattern is bad for CMOV when result is rarely changed:
cmp rax, 0
cmovz rax, rbx    ; only select rbx if rax is zero (rare)
; Better: use JNZ to skip when not needed

; This pattern is good for CMOV when condition is 50/50:
; Minimum of three values (a, b, c) in RDI, RSI, RDX, the first step of a median-of-three
; Branch predictor cannot predict these reliably
cmp  rdi, rsi
cmovg rdi, rsi    ; rdi = min(a, b)
cmp  rdi, rdx
cmovg rdi, rdx    ; rdi = min(min(a,b), c)
; ... the median needs further compare/CMOV pairs in the same style
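A full median of three can be built from the same two-way selections. This C sketch uses the identity median(a, b, c) = max(min(a, b), min(max(a, b), c)); the helper names are hypothetical, and whether each ternary becomes a CMOV depends on the compiler and optimization level.

```c
#include <stdint.h>

static int64_t min2(int64_t a, int64_t b) { return a < b ? a : b; }
static int64_t max2(int64_t a, int64_t b) { return a > b ? a : b; }

/* median(a, b, c) = max(min(a, b), min(max(a, b), c)) */
int64_t median3(int64_t a, int64_t b, int64_t c)
{
    int64_t lo = min2(a, b);        /* smaller of the first pair */
    int64_t hi = max2(a, b);        /* larger of the first pair  */
    return max2(lo, min2(hi, c));   /* clamp c into [lo, hi]     */
}
```

Four selections suffice, and none of them depends on a branch whose direction the predictor would have to guess.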

Complete Register Trace: A Nested if-else

; Implement: int classify(int64_t x)
;   returns -1 if x < 0, 0 if x == 0, 1 if x > 0
; x in RDI (full 64 bits are tested), result in EAX

classify:
    xor  eax, eax          ; rax = 0 (assume x == 0)
    test rdi, rdi          ; set flags
    jz   .done             ; if x == 0, return 0

    ; x != 0
    mov  eax, 1            ; assume positive
    jns  .done             ; if x > 0 (SF=0, ZF=0 after test), done

    ; x < 0
    mov  eax, -1           ; x is negative

.done:
    ret
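
Register trace for RDI = -5:

| Instruction | EAX | RDI | ZF | SF | Notes |
|---|---|---|---|---|---|
| xor eax, eax | 0 | -5 | 1 | 0 | ZF set by XOR (result=0) |
| test rdi, rdi | 0 | -5 | 0 | 1 | ZF=0 (not zero), SF=1 (negative) |
| jz .done | 0 | -5 | 0 | 1 | Not taken (ZF=0) |
| mov eax, 1 | 1 | -5 | 0 | 1 | Flags unchanged by MOV |
| jns .done | 1 | -5 | 0 | 1 | Not taken (SF=1, meaning negative) |
| mov eax, -1 | -1 | -5 | 0 | 1 | |
| ret | -1 | -5 | 0 | 1 | Returns -1 |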

For RDI = 7 (positive): test sets ZF=0, SF=0. JZ not taken. EAX becomes 1. JNS taken (SF=0). Returns 1. For RDI = 0: test sets ZF=1, SF=0. JZ taken. Returns 0.
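For comparison, here is the same logic in C, once as a goto-based mirror of the assembly and once as a branch-free expression. This is a sketch under the assumption that x is a 64-bit value, matching the use of RDI above; the name classify_branchless is hypothetical.

```c
#include <stdint.h>

/* Mirrors the assembly: test for zero first, then for the sign. */
int classify(int64_t x)
{
    int r = 0;                 /* xor eax, eax */
    if (x == 0) goto done;     /* test rdi, rdi / jz .done */
    r = 1;                     /* mov eax, 1 */
    if (x >= 0) goto done;     /* jns .done (SF = 0) */
    r = -1;                    /* mov eax, -1 */
done:
    return r;
}

/* Branch-free alternative: each comparison evaluates to 0 or 1. */
int classify_branchless(int64_t x)
{
    return (x > 0) - (x < 0);
}
```

The second form relies on C comparisons yielding exactly 0 or 1, so the subtraction produces -1, 0, or 1 directly.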

🛠️ Lab Exercise: Assemble and run classify for inputs -100, 0, 42, INT64_MIN, INT64_MAX. Use GDB to single-step and watch the flags word change. Confirm the jns instruction correctly handles the INT64_MIN case (which is negative and thus SF=1 after TEST).

Branch Prediction Preview

Modern processors predict which way a branch will go and speculatively execute the predicted path before the actual condition is known. A correct prediction is nearly free (the instructions were already in flight); a misprediction costs roughly 10-20 cycles to flush the pipeline and restart on the correct path.

Factors affecting prediction accuracy:

  • Static patterns (always-taken backward jumps) are predicted well
  • Correlated branches (this branch depends on a previous one) can be tracked by the predictor
  • Random/data-dependent branches (e.g., processing arbitrary user data) are hard to predict

Chapter 31 covers branch prediction microarchitecture in depth. For now: if you have a branch that is taken 50% of the time with no pattern, consider whether CMOV or the "compute-both-and-select" idiom gives better performance.

Jump Tables in the Wild: GCC Disassembly

Let us examine what GCC actually generates for a non-trivial switch. Given:

const char *day_name(int d) {
    switch (d) {
        case 0: return "Sunday";
        case 1: return "Monday";
        case 2: return "Tuesday";
        case 3: return "Wednesday";
        case 4: return "Thursday";
        case 5: return "Friday";
        case 6: return "Saturday";
        default: return "Unknown";
    }
}

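GCC -O2 produces code along these lines (annotated; exact output varies by version). Because every case only returns a constant, the optimizer has replaced the jump table with a lookup table of string pointers, but the bounds check works the same way:

day_name:
    ; Bounds check: d must be 0..6
    cmp    edi, 6
    ja     .default           ; unsigned above 6 (including negative d) → default

    ; Load from lookup table
    mov    eax, edi            ; zero-extend d into RAX
    lea    rdx, [rel .jtable]
    mov    rax, [rdx + rax*8]  ; load string address from table
    ret                         ; return the string pointer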

.jtable:
    dq .str_sunday    ; 0
    dq .str_monday    ; 1
    dq .str_tuesday   ; 2
    dq .str_wednesday ; 3
    dq .str_thursday  ; 4
    dq .str_friday    ; 5
    dq .str_saturday  ; 6

.default:
    lea rax, [rel .str_unknown]
    ret

Note the ja .default before the table lookup. This bounds check is mandatory. Without it, a value like d = 100 would index 800 bytes past the start of the table and return whatever happens to be stored there as a string pointer; with a true jump table, the CPU would jump to that arbitrary address. A missing or incorrect check is therefore an out-of-bounds read that can become a control-flow hijack (relevant to Chapter 35's buffer overflow discussion).

🔐 Security Note: Jump tables without bounds checks are exploitable. An attacker who can control the switch value and the memory contents near the jump table can redirect execution arbitrarily. GCC generates the bounds check before the table lookup unless it can prove the value is already in range. If you write a jump table in hand-written assembly, never omit the bounds check.
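The same guard is easy to get wrong in C when a table is indexed by hand. The sketch below shows one way to write it, with a hypothetical function name day_name_lookup; the cast to unsigned mirrors the ja instruction, so a single comparison covers both ends of the range.

```c
/* Table of string pointers, one per valid index 0..6. */
static const char *const day_names[] = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
};

const char *day_name_lookup(int d)
{
    /* One unsigned comparison rejects d < 0 and d > 6 alike, because a
     * negative int converts to a very large unsigned value. */
    if ((unsigned)d > 6u)
        return "Unknown";
    return day_names[d];
}
```

A signed check such as d > 6 on its own would let d = -1 through and read the pointer stored just before the table.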

The Complete C Control Flow Map

| C construct | Assembly pattern |
|---|---|
| if (cond) { A } | test/cmp; j<notcond> skip; A; skip: |
| if (cond) { A } else { B } | test/cmp; j<notcond> else; A; jmp end; else: B; end: |
| while (cond) { A } | start: cmp/test; j<notcond> end; A; jmp start; end: |
| do { A } while (cond) | start: A; cmp/test; j<cond> start |
| for (init; cond; incr) { A } | init; cmp/test; j<notcond> end; start: A; incr; cmp/test; j<cond> start; end: |
| break | jmp loop_end |
| continue | jmp to the loop's condition check (in a for loop, to the incr step) |
| switch (v) { case k: ... } | bounds check + jmp [table + v*8] |
| return x | mov rax, x; ret |
| a > b ? a : b | mov dst,a; cmp a,b; cmovle dst,b |

Summary

Control flow in assembly reduces to two operations: set flags (CMP, TEST, arithmetic), then act on flags (conditional jump or conditional move). Every C control structure — including the humble if/else and the complex switch — has a direct assembly translation following predictable patterns.

The critical distinctions: signed comparisons use JL/JG/JLE/JGE; unsigned comparisons use JB/JA/JBE/JAE. Using the wrong family gives silently incorrect results for values where the signed and unsigned interpretations differ. CMOV provides branch-free conditional selection, trading the risk of a misprediction penalty for a fixed execution cost. It always computes the correct result, but it is not always faster.

In Chapter 11, control flow meets the call/ret pair: the mechanism that makes functions work, and the foundation for understanding the stack frame layout that Chapter 35's buffer overflow will exploit.