Chapter 10 Exercises: Control Flow

Section A: Conditional Jump Translation

Exercise 1. For each CMP followed by a conditional jump, state whether the jump is taken. Assume signed integers unless stated otherwise.

(a) mov rax, 5; mov rbx, 10; cmp rax, rbx; jl taken (b) mov rax, -5; mov rbx, 10; cmp rax, rbx; jb taken (unsigned) (c) mov rax, 0xFFFFFFFFFFFFFFFF; cmp rax, 0; jg taken (signed) (d) mov rax, 0xFFFFFFFFFFFFFFFF; cmp rax, 0; ja taken (unsigned) (e) xor rax, rax; test rax, rax; jnz taken (f) mov rax, -1; test rax, rax; js taken


Exercise 2. Translate the following C code to NASM assembly. Use the register assignments shown. Do not call any functions; implement the body directly.

// a in RDI, b in RSI, result in RAX
long maximum(long a, long b) {
    if (a > b) {
        return a;
    } else {
        return b;
    }
}

Exercise 3. Translate to NASM assembly (no function calls; inline the body):

// x in RDI, result in RAX
long count_digits(long x) {
    if (x < 0) x = -x;
    if (x == 0) return 1;
    long count = 0;
    while (x > 0) {
        x /= 10;
        count++;
    }
    return count;
}

Exercise 4. The following is a correct signed comparison, but the wrong jump type is used. Identify the bug and explain what wrong result it produces for rax = 200, rbx = 100 (treating as unsigned):

cmp rax, rbx
jg  .greater   ; supposed to mean "if (unsigned)rax > (unsigned)rbx"

What should the instruction be?


Section B: Loop Translation

Exercise 5. Translate this C loop to assembly. RDI = n, RSI = array base address (int64_t *), result in RAX.

long sum_array(long *array, long n) {
    long total = 0;
    for (long i = 0; i < n; i++) {
        total += array[i];
    }
    return total;
}

Write both the "test at top" and "test at bottom" versions. How many iterations does the bottom-test version save a branch on?


Exercise 6. Translate to assembly using the count-down technique (RCX counts from n to 0):

// RDI = array, ESI = count
void zero_array(int64_t *array, int count) {
    for (int i = 0; i < count; i++) {
        array[i] = 0;
    }
}

Exercise 7. Translate this do-while loop:

// RDI = string pointer, returns length in RAX (not counting null terminator)
long my_strlen(const char *s) {
    const char *p = s;
    do {
        p++;
    } while (*p != '\0');
    return p - s - 1;
}

Exercise 8. Translate the following nested loop. Assume matrix is a 5×5 array of int64_t, base address in RDI. Fill the upper triangle (i < j) with 1, lower triangle (i > j) with -1, diagonal (i == j) with 0.

for (int i = 0; i < 5; i++) {
    for (int j = 0; j < 5; j++) {
        if (i < j) matrix[i*5+j] = 1;
        else if (i > j) matrix[i*5+j] = -1;
        else matrix[i*5+j] = 0;
    }
}

Section C: CMOV Exercises

Exercise 9. Rewrite each of the following using CMOV instead of conditional jumps:

(a)

cmp rdi, rsi
jle .skip
mov rdi, rsi
.skip:
; rdi = min(rdi, rsi) after this

(b)

test rdi, rdi
jns .positive
neg rdi
.positive:
; rdi = abs(rdi) after this

(c)

cmp rcx, 255
jbe .in_range
mov rcx, 255
.in_range:
; rcx = min(rcx, 255) after this

Exercise 10. Implement a branchless clamp(x, lo, hi) function using only CMOV instructions. x in RDI, lo in RSI, hi in RDX. Return clamped value in RAX.


Exercise 11. For each of the following scenarios, decide whether CMOV or a conditional branch is likely to be faster on a modern out-of-order processor. Explain your reasoning:

(a) Checking if a 64-bit value is divisible by 2 (every other iteration of a loop) (b) A game AI choosing between "chase" and "evade" behavior based on health percentage (c) Checking if an input character is in the range 'A'-'Z' (for a text parser reading random English text) (d) A sorting network comparison step (comparing elements of an array being sorted)


Section D: Jump Tables

Exercise 12. Write a NASM jump table implementation of the following switch statement. Include the bounds check. The handlers handle_a, handle_b, handle_c, handle_d are external functions.

void dispatch(int code) {
    switch (code) {
        case 0: handle_a(); break;
        case 1: handle_b(); break;
        case 2: handle_c(); break;
        case 3: handle_d(); break;
        default: handle_error(); break;
    }
}

Exercise 13. A jump table implementation has the following bug. Find it:

dispatch:
    cmp edi, 4          ; bounds check
    ja  .default
    ; Note: cases are 1, 2, 3, 4 (NOT 0-based!)
    jmp [rel .jtable + rdi*8]

.jtable:
    dq .case_1
    dq .case_2
    dq .case_3
    dq .case_4

Exercise 14 (Challenge). Implement a jump table for a switch with non-contiguous cases: case 10, case 20, case 30, case 40. The input value is in EDI. Two approaches:

(a) Build a table of size 41 with the default handler for all non-case values (zero-fill approach). (b) Subtract the minimum value (10), then divide by the step (10), check range, then index into a small 4-entry table.

Implement approach (b) in NASM.


Section E: Complete Program Exercises

Exercise 15. Write a complete NASM program that reads no input, computes the first 20 Fibonacci numbers, stores them in a static array, and then finds the maximum value in the array. Exit with the maximum value modulo 256 as the exit code.

The Fibonacci sequence: F(0)=0, F(1)=1, F(n)=F(n-1)+F(n-2).


Exercise 16. Write a NASM function int64_t binary_search(int64_t *array, int64_t n, int64_t target) that performs a binary search. The function should: - Take array (RDI), n (RSI, number of elements), target (RDX) - Return the index if found, -1 if not found

Use only conditional jumps (no CMOV) for clarity.


Exercise 17. Convert this recursive function to iterative assembly:

long power(long base, long exp) {
    if (exp == 0) return 1;
    if (exp % 2 == 0) {
        long half = power(base, exp / 2);
        return half * half;
    } else {
        return base * power(base, exp - 1);
    }
}

Iterative version in assembly using a loop (fast exponentiation / exponentiation by squaring).


Exercise 18. Optimize the following loop using at least two techniques from this chapter (loop inversion, counting down, CMOV, loop unrolling):

; Count nonzero elements in array
; RDI = array, RSI = count, result in RAX
count_nonzero:
    xor  eax, eax
    xor  ecx, ecx
.loop:
    cmp  ecx, esi
    jge  .done
    cmp  qword [rdi + rcx*8], 0
    je   .skip
    inc  rax
.skip:
    inc  ecx
    jmp  .loop
.done:
    ret

Exercise 19 (Debug). Find all bugs in this translation of if (x > 0 && y > 0):

; x in RDI, y in RSI
test rdi, rdi
jg   .first_true    ; if x > 0...
jmp  .false
.first_true:
test rsi, rsi
jg   .both_true
jmp  .false
.both_true:
mov  eax, 1
ret
.false:
xor  eax, eax
ret

Is there a more efficient version? Write it.


Exercise 20 (Challenge: Branchless Sort). Implement a branchless sorting network for 4 elements. The elements are in RDI, RSI, RDX, RCX. Sort them in ascending order (result in the same registers). Use only MOV, CMP, and CMOV — no branches.

A sorting network for 4 elements uses 5 compare-and-swap operations. Look up the optimal sorting network for N=4.