In This Chapter
- The Jump is the Computer's Only Decision
- Unconditional Jumps
- Conditional Jumps
- Translating C Control Structures to Assembly
- The LOOP Instruction
- CMOV: Conditional Move (Branch-Free Code)
- Complete Register Trace: A Nested if-else
- Branch Prediction Preview
- Jump Tables in the Wild: GCC Disassembly
- The Complete C Control Flow Map
- Summary
Chapter 10: Control Flow
The Jump is the Computer's Only Decision
Every conditional branch in every C program compiles to two things: a comparison that sets flags, and a conditional jump that reads those flags. There are no if statements in machine code — only flag tests and jumps. This chapter makes that translation explicit, covering every conditional jump in the x86-64 ISA, every C control structure mapped to assembly, and the CMOV family of conditional move instructions that implement branching without branches.
Unconditional Jumps
The simplest jump: go to a label, no conditions.
jmp target ; near jump (32-bit offset from RIP)
jmp short target ; short jump (8-bit offset, -128 to +127 bytes)
jmp rax ; indirect jump: jump to address in RAX
jmp [rax] ; indirect jump: jump to address stored at memory[RAX]
jmp [table + rax*8] ; indirect jump through table (jump tables, see below)
Jump Encodings
Three encodings exist for unconditional jumps:
- Short jump (EB cb): 2 bytes total. Offset is a signed 8-bit value, range -128 to +127 bytes from the next instruction. The assembler uses this when the target is within range.
- Near jump (E9 cd): 5 bytes total. Offset is a signed 32-bit value, range ±2GB from the next instruction. This covers virtually all jumps in a 64-bit binary.
- Far jump: Used for switching between privilege levels or code segments (relevant for MinOS kernel work in Chapter 33). Rare in userspace.
The assembler chooses automatically. You can force the short encoding with jmp short label (and get an assembler error if the target is out of range).
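The choice between the two relative encodings can be sketched in C. This is only an illustration of the displacement arithmetic, not how any particular assembler is implemented; `encode_jmp` and its parameters are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode `jmp target` for an instruction located at
 * address `from`. Returns the instruction length (2 or 5 bytes).
 * buf must have room for at least 5 bytes. */
size_t encode_jmp(uint64_t from, uint64_t target, uint8_t *buf)
{
    assert(buf != NULL);

    /* Displacements are relative to the address of the NEXT instruction. */
    int64_t rel8 = (int64_t)(target - (from + 2));
    if (rel8 >= INT8_MIN && rel8 <= INT8_MAX) {
        buf[0] = 0xEB;                       /* short jump opcode */
        buf[1] = (uint8_t)(int8_t)rel8;
        return 2;
    }

    int64_t rel32 = (int64_t)(target - (from + 5));
    assert(rel32 >= INT32_MIN && rel32 <= INT32_MAX);
    int32_t disp = (int32_t)rel32;
    buf[0] = 0xE9;                           /* near jump opcode */
    memcpy(&buf[1], &disp, sizeof disp);     /* assumes a little-endian host */
    return 5;
}
```

Note that the short form is tried first with a 2-byte instruction length and the near form is recomputed with a 5-byte length, because the displacement depends on where the next instruction starts.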
Indirect Jumps
jmp rax ; jump to the address stored in RAX
Indirect jumps are the mechanism behind:
1. Virtual function calls in C++ — the vtable entry holds the address
2. Jump tables for switch statements — an array of addresses indexed by the case value
3. Return from function, where ret behaves like a hypothetical pop rip: it loads the address at [rsp], adds 8 to RSP, and jumps there
4. Dynamic dispatch in interpreters
; Function pointer call (C: fp())
; Assume function pointer is in RAX
call rax ; push return address, jump to RAX
; or for a call through memory:
call [rdi + 16] ; call the pointer stored at RDI+16 (e.g., a vtable slot, with the vtable pointer in RDI)
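On the C side, an indirect call is what a call through a function pointer usually becomes. The sketch below uses a hypothetical `shape_ops` struct as a stand-in for a vtable; all names are illustrative assumptions, and the exact instruction the compiler picks (an indirect `call`, or an indirect `jmp` for a tail call) depends on optimization settings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "vtable": a struct holding a function pointer. */
struct shape_ops {
    long (*area)(long w, long h);
};

static long rect_area(long w, long h)     { return w * h; }
static long triangle_area(long w, long h) { return w * h / 2; }

static const struct shape_ops rect_ops     = { rect_area };
static const struct shape_ops triangle_ops = { triangle_area };

/* The call target is only known at run time, so the compiler has to
 * load the pointer from ops and transfer control through it. */
long compute_area(const struct shape_ops *ops, long w, long h)
{
    assert(ops != NULL && ops->area != NULL);
    return ops->area(w, h);
}
```

Calling `compute_area(&rect_ops, 4, 6)` and `compute_area(&triangle_ops, 4, 6)` runs the same machine code but ends up in different functions.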
Conditional Jumps
Every conditional jump checks one or more flags from RFLAGS. After a CMP or TEST instruction, the flags encode the relationship between the operands, and the conditional jump acts on those flags.
The Complete Conditional Jump Reference
Signed comparisons (use after CMP on signed integers):
| Mnemonic | Aliases | Flag condition | C equivalent |
|---|---|---|---|
| JE | JZ | ZF = 1 | == |
| JNE | JNZ | ZF = 0 | != |
| JL | JNGE | SF ≠ OF | < (signed) |
| JLE | JNG | ZF = 1 OR SF ≠ OF | <= (signed) |
| JG | JNLE | ZF = 0 AND SF = OF | > (signed) |
| JGE | JNL | SF = OF | >= (signed) |
| JS | — | SF = 1 | result is negative |
| JNS | — | SF = 0 | result is non-negative |
| JO | — | OF = 1 | signed overflow occurred |
| JNO | — | OF = 0 | no signed overflow |
Unsigned comparisons (use after CMP on unsigned integers):
| Mnemonic | Aliases | Flag condition | C equivalent |
|---|---|---|---|
| JE | JZ | ZF = 1 | == |
| JNE | JNZ | ZF = 0 | != |
| JB | JNAE, JC | CF = 1 | < (unsigned, "below") |
| JBE | JNA | CF = 1 OR ZF = 1 | <= (unsigned, "below or equal") |
| JA | JNBE | CF = 0 AND ZF = 0 | > (unsigned, "above") |
| JAE | JNB, JNC | CF = 0 | >= (unsigned, "above or equal") |
Other conditional jumps:
| Mnemonic | Aliases | Condition tested |
|---|---|---|
| JP | JPE | PF = 1 |
| JNP | JPO | PF = 0 |
| JCXZ | — | CX = 0 (not encodable in 64-bit mode) |
| JECXZ | — | ECX = 0 |
| JRCXZ | — | RCX = 0 |
⚠️ Common Mistake — Signed vs. Unsigned: This is the most common control flow bug in assembly. Consider:
mov rbx, 0x8000000000000000 ; CMP has no 64-bit immediate form, so load the constant first
cmp rax, rbx
jl is_less
If RAX = 0 and the comparison value is 0x8000000000000000 (INT64_MIN as a signed value, or 9223372036854775808 as unsigned), then:
- JL (signed less than): 0 < INT64_MIN? No, because INT64_MIN is negative. JL is NOT taken.
- JB (unsigned less than): 0 < 9223372036854775808? Yes. JB IS taken.
Using JL where you mean JB (or vice versa) gives completely wrong results. The rule: use JL/JLE/JG/JGE for signed integers, use JB/JBE/JA/JAE for unsigned.
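The same effect is visible from C, where the operand types decide which jump family the compiler uses. A minimal sketch, with hypothetical helper names; `as_signed` only reinterprets the bit pattern:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same 64-bit patterns, two interpretations. Compilers typically use the
 * JL family for the signed version and the JB family for the unsigned one. */
bool less_signed(int64_t a, int64_t b)     { return a < b; }
bool less_unsigned(uint64_t a, uint64_t b) { return a < b; }

/* Hypothetical helper: reinterpret a 64-bit pattern as a signed value. */
int64_t as_signed(uint64_t bits)
{
    int64_t value;
    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

With `bits = 0x8000000000000000`, `less_unsigned(0, bits)` is true while `less_signed(0, as_signed(bits))` is false, matching the JB and JL outcomes described above.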
Translating C Control Structures to Assembly
if/else
if (a > b) {
do_something();
} else {
do_other();
}
Assembly pattern — jump over the if-body on false condition:
; Signed comparison, a in RAX, b in RBX
cmp rax, rbx ; set flags for a - b
jle .else_branch ; jump if NOT (a > b), i.e., if a <= b
; if-body:
call do_something
jmp .end_if
.else_branch:
call do_other
.end_if:
; continue...
The key: the jump condition is the negation of the C condition. if (a > b) → jle .else (jump if less-or-equal, the opposite).
For a simple if without else:
; if (x != 0) { ... }
test rax, rax
jz .skip ; jump if x == 0 (negation of x != 0)
; if-body
.skip:
while Loop
while (i < n) {
process(i);
i++;
}
; i in RBX, n in R12 (callee-saved: the call to process may clobber RCX, RDX, and other caller-saved registers)
; Option 1: check at top (standard while)
.while_start:
cmp rbx, r12 ; i < n?
jge .while_end ; exit if i >= n
mov rdi, rbx ; arg: i
call process
inc rbx ; i++
jmp .while_start ; loop back
.while_end:
Or the preferred form, which checks at the bottom to save one jump:
; Test first, then loop (same register assignment: i in RBX, n in R12)
cmp rbx, r12
jge .while_end ; if initially false, skip body entirely
.while_body:
mov rdi, rbx
call process
inc rbx
cmp rbx, r12 ; check condition at bottom
jl .while_body ; loop if still true
.while_end:
The bottom-test form saves one unconditional jump per iteration.
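The two loop shapes can be written in C with explicit gotos, which makes the jump count visible. This is a sketch for illustration only; the function names are hypothetical and the loop body is reduced to a running sum.

```c
#include <assert.h>

/* Top-test form: one conditional and one unconditional jump per iteration. */
long sum_top_test(long n)
{
    long i = 0, sum = 0;
while_start:
    if (i >= n) goto while_end;   /* jge .while_end */
    sum += i;
    i++;
    goto while_start;             /* jmp .while_start */
while_end:
    return sum;
}

/* Bottom-test form: one guard up front, then a single conditional jump
 * per iteration. */
long sum_bottom_test(long n)
{
    long i = 0, sum = 0;
    if (i >= n) goto while_end;   /* guard: skip the body if initially false */
while_body:
    sum += i;
    i++;
    if (i < n) goto while_body;   /* jl .while_body */
while_end:
    return sum;
}

/* Both forms must agree for every n, including n <= 0. */
void check_loop_forms(long n)
{
    assert(sum_top_test(n) == sum_bottom_test(n));
}
```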
for Loop
for (int i = 0; i < n; i++) {
arr[i] = i * 2;
}
; i in RCX (initialized below), n in RDX, arr base in RSI (8-byte elements assumed)
xor ecx, ecx ; i = 0
test rdx, rdx ; set flags from n (TEST clears OF, so JLE means n <= 0)
jle .for_end ; skip if n <= 0
.for_body:
lea rax, [rcx + rcx] ; rax = i*2
mov [rsi + rcx*8], rax ; arr[i] = i*2
inc rcx
cmp rcx, rdx
jl .for_body ; continue if i < n
.for_end:
do-while Loop
The do-while loop has no pre-check — the body always executes at least once:
do {
process(data);
data = data->next;
} while (data != NULL);
; data pointer in RBX (callee-saved, so it survives the call)
.do_loop:
mov rdi, rbx ; arg: data
call process ; body
mov rbx, [rbx + 16] ; data = data->next (assume next at offset 16)
test rbx, rbx ; data != NULL?
jnz .do_loop ; loop if true (no initial check needed)
The do-while maps perfectly to the bottom-test loop form — there is no overhead for the initial check.
Counting Down vs. Counting Up
Counting down to zero is often slightly cheaper because DEC sets ZF itself, so the loop test needs no separate CMP against an upper bound:
; Count down: rcx runs from n to 1 (index with rcx-1 to visit n-1 down to 0)
; Assumes n > 0; DEC + JNZ needs no explicit compare
mov rcx, n
.loop:
; ... body using rcx ...
dec rcx
jnz .loop ; loop while rcx != 0 (JNZ checks ZF from DEC)
; Count up: i from 0 to n-1
; Slightly more instructions (need explicit CMP)
xor ecx, ecx
.loop:
; ... body using rcx ...
inc rcx
cmp rcx, n
jl .loop
However, counting down changes the order of iteration and may not be valid if the algorithm requires ascending order.
switch/case and Jump Tables
When a switch statement has many cases with values in a small range, the compiler generates a jump table: an array of addresses, one per case, indexed by the switch variable.
switch (cmd) {
case 0: handle_read(); break;
case 1: handle_write(); break;
case 2: handle_seek(); break;
case 3: handle_close(); break;
default: handle_error(); break;
}
; cmd in RDI
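; Jump table approach:
cmp rdi, 3 ; check upper bound for range
ja .default ; above 3 (or negative, seen as unsigned): default case
; Indirect jump through table (RIP-relative addressing cannot take an index register, so load the base first):
lea rax, [rel .jtable]
jmp [rax + rdi*8]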
.jtable:
dq .case_0 ; address of case 0 handler
dq .case_1 ; address of case 1 handler
dq .case_2 ; address of case 2 handler
dq .case_3 ; address of case 3 handler
.case_0:
call handle_read
jmp .switch_end
.case_1:
call handle_write
jmp .switch_end
.case_2:
call handle_seek
jmp .switch_end
.case_3:
call handle_close
jmp .switch_end
.default:
call handle_error
.switch_end:
⚙️ How It Works: The address computation (table base + rdi*8) treats the jump table as an array of 8-byte (pointer-size) values and uses the command value as the index. One jmp instruction dispatches to any of N cases in O(1) time, regardless of N. This is why compilers generate jump tables for dense switch statements: they are faster than a chain of comparisons (which is O(N) in the number of cases).
The compiler typically chooses a jump table when:
- Cases are dense (mostly consecutive values)
- There are typically 5+ cases (so the overhead of the bounds check and indirect jump is worth it)
- Case values span a range ≤ some threshold (GCC default ~≤128)
For sparse switches (cases like 1, 100, 1000, 9999), the compiler generates a binary search tree of comparisons instead.
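The idea behind that comparison tree can be sketched as a binary search over the sorted case values. A real compiler emits the comparisons inline with the constants baked into the instructions rather than looping over a table, so treat this only as a model of the O(log N) decision sequence; `find_case` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the case matching v, or -1 for "default".
 * values must be sorted in ascending order. Each step halves the
 * candidate range, as each level of a comparison tree does. */
int find_case(const int *values, size_t count, int v)
{
    assert(values != NULL || count == 0);

    size_t lo = 0, hi = count;            /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (v == values[mid])
            return (int)mid;
        if (v < values[mid])
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```

For the case values 1, 100, 1000, 9999, any input is resolved in at most three comparisons against table entries instead of up to four in a linear chain; the gap widens as the number of cases grows.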
The LOOP Instruction
The LOOP instruction decrements RCX and jumps if RCX ≠ 0:
mov rcx, 10 ; loop count
.loop:
; ... body ...
loop .loop ; decrement RCX, jump if RCX != 0
This is a compact encoding (2 bytes) but has a catch: on many modern processors, LOOP is microcoded and slower than the equivalent dec rcx; jnz pair. GCC almost never emits LOOP. It is occasionally useful where code size matters more than cycle count (e.g., in boot code with size constraints), but for performance-critical loops, use dec rcx; jnz.
CMOV: Conditional Move (Branch-Free Code)
CMOV conditionally copies a register or memory operand into a register, based on flags, without branching:
; General form:
cmov<cc> dst, src ; if condition <cc> is true, dst = src; else dst unchanged
; Examples:
cmove rax, rbx ; if ZF=1 (equal), rax = rbx
cmovne rax, rbx ; if ZF=0 (not equal)
cmovl rax, rbx ; if SF≠OF (signed less than)
cmovle rax, rbx ; if ZF=1 or SF≠OF (signed less-or-equal)
cmovg rax, rbx ; if ZF=0 and SF=OF (signed greater)
cmovge rax, rbx ; if SF=OF (signed greater-or-equal)
cmovb rax, rbx ; if CF=1 (unsigned below)
cmova rax, rbx ; if CF=0 and ZF=0 (unsigned above)
cmovs rax, rbx ; if SF=1 (negative)
cmovns rax, rbx ; if SF=0 (non-negative)
The full list mirrors the flag-based conditional jumps: every J<cc> has a corresponding CMOV<cc> (the register-testing JCXZ/JECXZ/JRCXZ have no CMOV counterpart).
Implementing abs(), max(), min() Without Branches
; abs(rdi) → rax using CMOV
abs64:
mov rax, rdi
neg rax ; rax = -rdi; NEG sets SF from this result
cmovs rax, rdi ; SF=1 means -rdi is negative, so rdi was positive: restore it
ret
The CMOV version is subtle because NEG modifies the flags: CMOVS tests the sign of -rdi, not of rdi, so the condition reads inverted. (For INT64_MIN, NEG leaves the value unchanged, so every version here returns INT64_MIN.) Two alternatives make the sign test explicit:
; Safe abs(rdi) → rax
abs64:
mov rax, rdi
test rdi, rdi ; set flags based on rdi (SF=1 if negative)
jns .positive ; if non-negative, done
neg rax ; negate if negative
.positive:
ret
; Or fully branchless:
abs64_branchless:
mov rax, rdi
mov rcx, rdi
sar rcx, 63 ; rcx = sign mask: 0xFFFF...FFFF if negative, 0 if positive
xor rax, rcx ; flip all bits if negative (one's complement)
sub rax, rcx ; subtract sign mask: +1 if negative (makes two's complement), +0 if positive
ret
The XOR+SUB trick: sign mask is -1 (all ones) for negative numbers. XOR with -1 is bitwise NOT. Subtract -1 is add 1. So NEG = NOT + ADD 1 = two's complement negation.
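The same sign-mask trick can be written in C. This sketch makes two choices that are not in the assembly: it builds the mask from a comparison instead of an arithmetic shift (right-shifting a negative value is implementation-defined in C), and it works on `uint64_t` so that INT64_MIN wraps instead of causing signed overflow. The name `abs64_mask` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless abs via the sign-mask trick: (x XOR mask) - mask. */
uint64_t abs64_mask(int64_t x)
{
    uint64_t ux   = (uint64_t)x;
    uint64_t mask = (uint64_t)0 - (uint64_t)(x < 0); /* all ones if negative, else 0 */
    uint64_t r    = (ux ^ mask) - mask;              /* NOT, then +1, when negative */

    /* Every input except INT64_MIN has a magnitude that fits in int64_t. */
    assert(x == INT64_MIN || r <= (uint64_t)INT64_MAX);
    return r;
}
```

Returning an unsigned value lets the result for INT64_MIN (2^63) be represented exactly; it is the same bit pattern the assembly version leaves in RAX.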
; max(rdi, rsi) → rax (signed)
max64:
mov rax, rdi
cmp rdi, rsi
cmovl rax, rsi ; if rdi < rsi, rax = rsi
ret
; min(rdi, rsi) → rax (signed)
min64:
mov rax, rdi
cmp rdi, rsi
cmovg rax, rsi ; if rdi > rsi, rax = rsi
ret
When CMOV Helps vs. Hurts
CMOV eliminates branch misprediction penalty, which can be 10-20 cycles on modern CPUs. But it is not always faster:
CMOV wins when:
- The branch is unpredictable (roughly 50/50 distribution)
- The values being selected are already in registers (no load involved)
- The computation fits the "compute both, select one" pattern
CMOV hurts when:
- The branch is highly predictable (>95% one way), so the processor's branch predictor handles it almost for free
- The "not selected" computation is expensive or involves a slow load
- CMOV creates a longer dependency chain
; This pattern is bad for CMOV when result is rarely changed:
cmp rax, 0
cmovz rax, rbx ; only select rbx if rax is zero (rare)
; Better: use JNZ to skip when not needed
; This pattern is good for CMOV when condition is 50/50:
; Median of three values (a, b, c) in RDI, RSI, RDX
; Branch predictor cannot predict these reliably
cmp rdi, rsi
cmovg rdi, rsi ; ensure rdi = min(a, b)
cmp rdi, rdx
cmovg rdi, rdx ; rdi = min(min(a,b), c)
; ... etc for median
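One way to finish the median computation is to express it entirely in min and max, each of which is a compare plus a conditional select. The C sketch below shows that formulation; whether a compiler actually emits CMOV for these ternaries depends on the compiler and flags, and the helper names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }
static int64_t max64(int64_t a, int64_t b) { return a > b ? a : b; }

/* median(a, b, c) = max(min(a, b), min(max(a, b), c)) */
int64_t median3(int64_t a, int64_t b, int64_t c)
{
    int64_t lo = min64(a, b);
    int64_t hi = max64(a, b);
    int64_t m  = max64(lo, min64(hi, c));

    /* The median is always one of the three inputs. */
    assert(m == a || m == b || m == c);
    return m;
}
```

Every step computes both candidates and selects one, which is the pattern where conditional moves tend to pay off.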
Complete Register Trace: A Nested if-else
; Implement: int classify(int64_t x)
; returns -1 if x < 0, 0 if x == 0, 1 if x > 0
; x in RDI, result in EAX
classify:
xor eax, eax ; rax = 0 (assume x == 0)
test rdi, rdi ; set flags
jz .done ; if x == 0, return 0
; x != 0
mov eax, 1 ; assume positive
jns .done ; if x > 0 (SF=0, ZF=0 after test), done
; x < 0
mov eax, -1 ; x is negative
.done:
ret
| Instruction | EAX | RDI (example: -5) | ZF | SF | Notes |
|---|---|---|---|---|---|
| xor eax, eax | 0 | -5 | 1 | 0 | ZF set by XOR (result=0) |
| test rdi, rdi | 0 | -5 | 0 | 1 | ZF=0 (not zero), SF=1 (negative) |
| jz .done | 0 | -5 | — | — | Not taken (ZF=0) |
| mov eax, 1 | 1 | -5 | — | — | Flags unchanged by MOV |
| jns .done | 1 | -5 | — | — | Not taken (SF=1, meaning negative) |
| mov eax, -1 | -1 | -5 | — | — | Flags unchanged by MOV |
| ret | -1 | -5 | — | — | Returns -1 |
For RDI = 7 (positive): test sets ZF=0, SF=0. JZ not taken. EAX becomes 1. JNS taken (SF=0). Returns 1. For RDI = 0: test sets ZF=1, SF=0. JZ taken. Returns 0.
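A C reference for the traced function is handy when checking the assembly against expected results. The sketch below assumes a 64-bit argument, as the trace does; `classify_branchless` is a hypothetical variant showing that the same result can be computed without control flow.

```c
#include <assert.h>
#include <stdint.h>

/* Branching version, mirroring the structure of the assembly. */
int classify(int64_t x)
{
    if (x == 0) return 0;
    if (x > 0)  return 1;
    return -1;
}

/* Branch-free version: each comparison yields 0 or 1. */
int classify_branchless(int64_t x)
{
    int r = (x > 0) - (x < 0);
    assert(r == classify(x));   /* cross-check against the branching form */
    return r;
}
```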
🛠️ Lab Exercise: Assemble and run classify for inputs -100, 0, 42, INT64_MIN, INT64_MAX. Use GDB to single-step and watch the flags word change. Confirm the jns instruction correctly handles the INT64_MIN case (which is negative and thus SF=1 after TEST).
Branch Prediction Preview
Modern processors predict the outcome of each conditional branch and speculatively execute the predicted path before the actual condition is known. A correct prediction is effectively free (the right instructions are already in flight); a misprediction costs roughly 10-20 cycles to flush the pipeline and restart on the other path.
Factors affecting prediction accuracy:
- Static patterns (always-taken backward jumps) are predicted well
- Correlated branches (this branch depends on a previous one) can be tracked by the predictor
- Random/data-dependent branches (e.g., processing arbitrary user data) are hard to predict
Chapter 31 covers branch prediction microarchitecture in depth. For now: if you have a branch that is taken 50% of the time with no pattern, consider whether CMOV or the "compute-both-and-select" idiom gives better performance.
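As a rough C illustration of the compute-both-and-select idiom, the two functions below sum the elements above a threshold, once with a data-dependent branch and once with a mask. The names are hypothetical, and an optimizing compiler may turn either form into the other, so measure before assuming one is faster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branching version: one data-dependent branch per element. */
int64_t sum_above_branchy(const int32_t *v, size_t n, int32_t threshold)
{
    assert(v != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}

/* Select version: always add, but add 0 when the test fails. */
int64_t sum_above_select(const int32_t *v, size_t n, int32_t threshold)
{
    assert(v != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t keep = -(int64_t)(v[i] > threshold); /* 0, or all ones */
        sum += (int64_t)v[i] & keep;
    }
    return sum;
}
```

On random input the first loop's branch is hard to predict, while the second loop does the same amount of work on every element regardless of the data.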
Jump Tables in the Wild: GCC Disassembly
Let us examine what GCC actually generates for a non-trivial switch. Given:
const char *day_name(int d) {
switch (d) {
case 0: return "Sunday";
case 1: return "Monday";
case 2: return "Tuesday";
case 3: return "Wednesday";
case 4: return "Thursday";
case 5: return "Friday";
case 6: return "Saturday";
default: return "Unknown";
}
}
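GCC -O2 typically produces code along these lines (annotated; exact output varies by GCC version). Because every case only returns a constant, the table holds the string addresses directly rather than code addresses: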
day_name:
; Bounds check: d must be 0..6
cmp edi, 6
ja .default ; unsigned above 6 → default
; Load from jump table
mov eax, edi
lea rdx, [rel .jtable]
mov rax, [rdx + rax*8] ; load address from table
ret ; return the address (it's the string pointer)
.jtable:
dq .str_sunday ; 0
dq .str_monday ; 1
dq .str_tuesday ; 2
dq .str_wednesday ; 3
dq .str_thursday ; 4
dq .str_friday ; 5
dq .str_saturday ; 6
.default:
lea rax, [rel .str_unknown]
ret
Note the ja .default before the table lookup: this bounds check is mandatory. Without it, a value like d = 100 would index 800 bytes past the start of the table and load whatever 8 bytes happen to be there. Here that garbage would be returned as a string pointer; in a true jump table (as in the switch example earlier) it would become the jump target. A missing or incorrect check is a jump table bounds check vulnerability (relevant to Chapter 35's buffer overflow discussion).
🔐 Security Note: Jump tables without bounds checks are exploitable. An attacker who can control the switch value and the memory contents near the jump table can redirect execution arbitrarily. GCC generates the bounds check before the table lookup unless it can prove the value is already in range. If you write a jump table in hand-written assembly, never omit the bounds check.
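The same discipline applies to hand-written lookup tables in C. A minimal sketch, assuming a hypothetical `day_name_checked` that mirrors the generated code, including the single unsigned compare that rejects both negative and too-large indices:

```c
#include <assert.h>
#include <stddef.h>

static const char *const day_names[] = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
};
#define NUM_DAYS (sizeof day_names / sizeof day_names[0])

const char *day_name_checked(int d)
{
    /* Casting to unsigned folds "d < 0" and "d > 6" into one compare,
     * the same trick as cmp edi, 6 followed by ja .default. */
    if ((unsigned)d >= NUM_DAYS)
        return "Unknown";

    const char *name = day_names[d];
    assert(name != NULL);
    return name;
}
```

Dropping the `if` would turn `day_name_checked(100)` into an out-of-bounds read, which is exactly the failure mode described above.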
The Complete C Control Flow Map
| C construct | Assembly pattern |
|---|---|
| if (cond) { A } | test/cmp; j<notcond> skip; A; skip: |
| if (cond) { A } else { B } | test/cmp; j<notcond> else; A; jmp end; else: B; end: |
| while (cond) { A } | start: cmp/test; j<notcond> end; A; jmp start; end: |
| do { A } while (cond) | start: A; cmp/test; j<cond> start |
| for (init; cond; incr) { A } | init; cmp/test; j<notcond> end; start: A; incr; cmp/test; j<cond> start; end: |
| break | jmp loop_end |
| continue | jmp to the loop's condition check (or to incr in a for loop) |
| switch (v) { case k: ... } | bounds check + jmp [table + v*8] |
| return x | mov rax, x; ret |
| a > b ? a : b | mov dst,a; cmp a,b; cmovle dst,b |
Summary
Control flow in assembly reduces to two operations: set flags (CMP, TEST, arithmetic), then act on flags (conditional jump or conditional move). Every C control structure — including the humble if/else and the complex switch — has a direct assembly translation following predictable patterns.
The critical distinctions: signed comparisons use JL/JG/JLE/JGE; unsigned comparisons use JB/JA/JBE/JAE. Using the wrong family gives silently incorrect results for values where the signed and unsigned interpretations differ. CMOV provides branch-free conditional selection, trading an unpredictable misprediction penalty for a fixed execution cost: always correct, but not always faster.
In Chapter 11, control flow meets the call/ret pair: the mechanism that makes functions work, and the foundation for understanding the stack frame layout that Chapter 35's buffer overflow will exploit.