Chapter 10 Exercises: Control Flow
Section A: Conditional Jump Translation
Exercise 1. For each CMP followed by a conditional jump, state whether the jump is taken. Assume signed integers unless stated otherwise.
(a) mov rax, 5; mov rbx, 10; cmp rax, rbx; jl taken
(b) mov rax, -5; mov rbx, 10; cmp rax, rbx; jb taken (unsigned)
(c) mov rax, 0xFFFFFFFFFFFFFFFF; cmp rax, 0; jg taken (signed)
(d) mov rax, 0xFFFFFFFFFFFFFFFF; cmp rax, 0; ja taken (unsigned)
(e) xor rax, rax; test rax, rax; jnz taken
(f) mov rax, -1; test rax, rax; js taken
Exercise 2. Translate the following C code to NASM assembly. Use the register assignments shown. Do not call any functions; implement the body directly.
// a in RDI, b in RSI, result in RAX
long maximum(long a, long b) {
if (a > b) {
return a;
} else {
return b;
}
}
Exercise 3. Translate to NASM assembly (no function calls; inline the body):
// x in RDI, result in RAX
long count_digits(long x) {
if (x < 0) x = -x;
if (x == 0) return 1;
long count = 0;
while (x > 0) {
x /= 10;
count++;
}
return count;
}
Exercise 4.
The following is a correct signed comparison, but the wrong jump type is used. Identify the bug and explain what wrong result it produces for rax = 200, rbx = 100 (treating as unsigned):
cmp rax, rbx
jg .greater ; supposed to mean "if (unsigned)rax > (unsigned)rbx"
What should the instruction be?
Section B: Loop Translation
Exercise 5. Translate this C loop to assembly. RDI = n, RSI = array base address (int64_t *), result in RAX.
long sum_array(long *array, long n) {
long total = 0;
for (long i = 0; i < n; i++) {
total += array[i];
}
return total;
}
Write both the "test at top" and "test at bottom" versions. How many iterations does the bottom-test version save a branch on?
Exercise 6. Translate to assembly using the count-down technique (RCX counts from n to 0):
// RDI = array, ESI = count
void zero_array(int64_t *array, int count) {
for (int i = 0; i < count; i++) {
array[i] = 0;
}
}
Exercise 7. Translate this do-while loop:
// RDI = string pointer, returns length in RAX (not counting null terminator)
long my_strlen(const char *s) {
const char *p = s;
do {
p++;
} while (*p != '\0');
return p - s - 1;
}
Exercise 8.
Translate the following nested loop. Assume matrix is a 5×5 array of int64_t, base address in RDI. Fill the upper triangle (i < j) with 1, lower triangle (i > j) with -1, diagonal (i == j) with 0.
for (int i = 0; i < 5; i++) {
for (int j = 0; j < 5; j++) {
if (i < j) matrix[i*5+j] = 1;
else if (i > j) matrix[i*5+j] = -1;
else matrix[i*5+j] = 0;
}
}
Section C: CMOV Exercises
Exercise 9. Rewrite each of the following using CMOV instead of conditional jumps:
(a)
cmp rdi, rsi
jle .skip
mov rdi, rsi
.skip:
; rdi = min(rdi, rsi) after this
(b)
test rdi, rdi
jns .positive
neg rdi
.positive:
; rdi = abs(rdi) after this
(c)
cmp rcx, 255
jbe .in_range
mov rcx, 255
.in_range:
; rcx = min(rcx, 255) after this
Exercise 10.
Implement a branchless clamp(x, lo, hi) function using only CMOV instructions. x in RDI, lo in RSI, hi in RDX. Return clamped value in RAX.
Exercise 11. For each of the following scenarios, decide whether CMOV or a conditional branch is likely to be faster on a modern out-of-order processor. Explain your reasoning:
(a) Checking if a 64-bit value is divisible by 2 (every other iteration of a loop) (b) A game AI choosing between "chase" and "evade" behavior based on health percentage (c) Checking if an input character is in the range 'A'-'Z' (for a text parser reading random English text) (d) A sorting network comparison step (comparing elements of an array being sorted)
Section D: Jump Tables
Exercise 12.
Write a NASM jump table implementation of the following switch statement. Include the bounds check. The handlers handle_a, handle_b, handle_c, handle_d are external functions.
void dispatch(int code) {
switch (code) {
case 0: handle_a(); break;
case 1: handle_b(); break;
case 2: handle_c(); break;
case 3: handle_d(); break;
default: handle_error(); break;
}
}
Exercise 13. A jump table implementation has the following bug. Find it:
dispatch:
cmp edi, 4 ; bounds check
ja .default
; Note: cases are 1, 2, 3, 4 (NOT 0-based!)
jmp [rel .jtable + rdi*8]
.jtable:
dq .case_1
dq .case_2
dq .case_3
dq .case_4
Exercise 14 (Challenge).
Implement a jump table for a switch with non-contiguous cases: case 10, case 20, case 30, case 40. The input value is in EDI. Two approaches:
(a) Build a table of size 41 with the default handler for all non-case values (zero-fill approach). (b) Subtract the minimum value (10), then divide by the step (10), check range, then index into a small 4-entry table.
Implement approach (b) in NASM.
Section E: Complete Program Exercises
Exercise 15. Write a complete NASM program that reads no input, computes the first 20 Fibonacci numbers, stores them in a static array, and then finds the maximum value in the array. Exit with the maximum value modulo 256 as the exit code.
The Fibonacci sequence: F(0)=0, F(1)=1, F(n)=F(n-1)+F(n-2).
Exercise 16.
Write a NASM function int64_t binary_search(int64_t *array, int64_t n, int64_t target) that performs a binary search. The function should:
- Take array (RDI), n (RSI, number of elements), target (RDX)
- Return the index if found, -1 if not found
Use only conditional jumps (no CMOV) for clarity.
Exercise 17. Convert this recursive function to iterative assembly:
long power(long base, long exp) {
if (exp == 0) return 1;
if (exp % 2 == 0) {
long half = power(base, exp / 2);
return half * half;
} else {
return base * power(base, exp - 1);
}
}
Iterative version in assembly using a loop (fast exponentiation / exponentiation by squaring).
Exercise 18. Optimize the following loop using at least two techniques from this chapter (loop inversion, counting down, CMOV, loop unrolling):
; Count nonzero elements in array
; RDI = array, RSI = count, result in RAX
count_nonzero:
xor eax, eax
xor ecx, ecx
.loop:
cmp ecx, esi
jge .done
cmp qword [rdi + rcx*8], 0
je .skip
inc rax
.skip:
inc ecx
jmp .loop
.done:
ret
Exercise 19 (Debug).
Find all bugs in this translation of if (x > 0 && y > 0):
; x in RDI, y in RSI
test rdi, rdi
jg .first_true ; if x > 0...
jmp .false
.first_true:
test rsi, rsi
jg .both_true
jmp .false
.both_true:
mov eax, 1
ret
.false:
xor eax, eax
ret
Is there a more efficient version? Write it.
Exercise 20 (Challenge: Branchless Sort). Implement a branchless sorting network for 4 elements. The elements are in RDI, RSI, RDX, RCX. Sort them in ascending order (result in the same registers). Use only MOV, CMP, and CMOV — no branches.
A sorting network for 4 elements uses 5 compare-and-swap operations. Look up the optimal sorting network for N=4.