Chapter 7 Exercises: First Programs


Register Trace Exercises

These exercises develop the core skill of mentally executing assembly code. Fill in every blank cell in each table. Use the register state from the previous row as input for the next instruction.


Exercise 1: MOV and XOR Instruction Traces

Starting state: rax = 0xFFFFFFFFFFFFFFFF, rbx = 0x0000000000000042, rcx = 0x0000000000000000

Step | Instruction  | RAX                | RBX                | RCX                | Notes
0    | (initial)    | 0xFFFFFFFFFFFFFFFF | 0x0000000000000042 | 0x0000000000000000 |
1    | mov eax, ebx | ?                  | 0x0000000000000042 | 0x0000000000000000 | Key: 32-bit write
2    | xor ecx, ecx | ?                  | 0x0000000000000042 | ?                  | Key: zero idiom
3    | mov rcx, rax | ?                  | 0x0000000000000042 | ?                  |
4    | mov al, 0xFF | ?                  | 0x0000000000000042 | ?                  | Key: 8-bit write
5    | xor rax, rax | ?                  | 0x0000000000000042 | ?                  |

Explain: why does step 1 clear the upper 32 bits of RAX while step 4 does not clear the upper 56 bits?
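
One way to check your table against the rule behind this question is to model the two write sizes in C. The sketch below is an illustration only: the helper names write32 and write8 are hypothetical and do not come from the chapter. It assumes the x86-64 behavior that a write to a 32-bit register zero-extends into the full 64-bit register, while a write to an 8-bit register leaves the remaining 56 bits as they were.

```c
#include <assert.h>   /* for the spot checks suggested after this block */
#include <stdint.h>

/* Hypothetical model of a 32-bit register write such as "mov eax, ebx":
   the previous 64-bit contents are discarded and the value is zero-extended. */
uint64_t write32(uint64_t old_reg, uint32_t value) {
    (void)old_reg;                 /* nothing of the old value survives */
    return (uint64_t)value;
}

/* Hypothetical model of an 8-bit register write such as "mov al, 0xFF":
   only bits 0-7 change; bits 8-63 keep their previous contents. */
uint64_t write8(uint64_t old_reg, uint8_t value) {
    return (old_reg & ~UINT64_C(0xFF)) | (uint64_t)value;
}
```

A spot check such as assert(write8(0x1234, 0xFF) == 0x12FF) shows the merge behavior. Feeding each row's register state into the matching helper reproduces the trace.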


Exercise 2: ADD and SUB Flag Traces

Starting state: rax = 0x0000000000000005, rbx = 0x0000000000000003, all flags = 0

Step | Instruction  | RAX | RBX | CF | OF | SF | ZF | Notes
0    | (initial)    | 0x5 | 0x3 | 0  | 0  | 0  | 0  |
1    | sub rax, rbx | ?   | 0x3 | ?  | ?  | ?  | ?  |
2    | sub rax, rbx | ?   | 0x3 | ?  | ?  | ?  | ?  | CF=?
3    | add rax, rbx | ?   | 0x3 | ?  | ?  | ?  | ?  |
4    | xor rax, rax | ?   | 0x3 | ?  | ?  | ?  | ?  | Which flags change?
5    | sub rax, 1   | ?   | 0x3 | ?  | ?  | ?  | ?  | Wraparound

Exercise 3: Overflow Detection Trace

Starting state: rax = 0x7FFFFFFFFFFFFFFF (largest signed 64-bit value), rbx = 0x0000000000000001

Step | Instruction  | RAX (hex)          | RAX (signed decimal)      | CF | OF | SF | ZF
0    | (initial)    | 0x7FFFFFFFFFFFFFFF | 9,223,372,036,854,775,807 | 0  | 0  | 0  | 0
1    | add rax, rbx | ?                  | ?                         | ?  | ?  | ?  | ?
2    | add rax, rbx | ?                  | ?                         | ?  | ?  | ?  | ?
3    | add rax, rbx | ?                  | ?                         | ?  | ?  | ?  | ?

After completing the table:

  1. Which flag signals that the signed result is incorrect?
  2. Which flag signals that the unsigned result wrapped around?
  3. Write the two-instruction sequence that would jump to an error label if the addition in step 1 overflowed (signed).
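
To check the ADD and SUB rows in Exercises 2 and 3 after filling them in by hand, the flag rules can be modeled in C. This is a minimal sketch under stated assumptions: the type flags64 and the functions add64 and sub64 are hypothetical names introduced here, only CF, OF, SF, and ZF are modeled, and instructions with different flag behavior (such as XOR, INC, DEC, and NEG) are not covered.

```c
#include <assert.h>   /* for the spot checks suggested after this block */
#include <stdint.h>

/* Result and the four flags tracked in the exercise tables. */
typedef struct {
    uint64_t result;
    int cf, of, sf, zf;
} flags64;

/* Hypothetical model of "add dst, src" on 64-bit operands. */
flags64 add64(uint64_t a, uint64_t b) {
    flags64 f;
    f.result = a + b;                                  /* wraps modulo 2^64 */
    f.cf = f.result < a;                               /* unsigned carry out */
    /* OF: inputs share a sign, but the result's sign differs from it. */
    f.of = (int)(((~(a ^ b)) & (a ^ f.result)) >> 63);
    f.sf = (int)(f.result >> 63);                      /* copy of bit 63 */
    f.zf = f.result == 0;
    return f;
}

/* Hypothetical model of "sub dst, src" on 64-bit operands. */
flags64 sub64(uint64_t a, uint64_t b) {
    flags64 f;
    f.result = a - b;                                  /* wraps modulo 2^64 */
    f.cf = a < b;                                      /* unsigned borrow */
    /* OF: inputs differ in sign, and the result's sign differs from a's. */
    f.of = (int)(((a ^ b) & (a ^ f.result)) >> 63);
    f.sf = (int)(f.result >> 63);
    f.zf = f.result == 0;
    return f;
}
```

For example, assert(add64(UINT64_MAX, 1).cf == 1) confirms that an unsigned wraparound sets CF. Compare each field with your table row only after you have traced the instruction yourself.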


Exercise 4: INC/DEC and NEG Traces

Starting state: rax = 0x0000000000000001, CF = 1 (from a previous operation)

Step | Instruction | RAX | CF | ZF | SF | Notes
0    | (initial)   | 0x1 | 1  | 0  | 0  |
1    | dec rax     | ?   | ?  | ?  | ?  | Does INC/DEC affect CF?
2    | dec rax     | ?   | ?  | ?  | ?  | ZF set here?
3    | dec rax     | ?   | ?  | ?  | ?  | SF set here?
4    | neg rax     | ?   | ?  | ?  | ?  | NEG of a negative number
5    | inc rax     | ?   | ?  | ?  | ?  |
6    | neg rax     | ?   | ?  | ?  | ?  | NEG of 0 → CF = ?

Exercise 5: System Call Register Setup Trace

You need to call sys_write(1, msg, 13) — write 13 bytes from msg (address 0x402000) to stdout (fd 1).

Step | Instruction       | RAX | RDI | RSI | RDX | Notes
0    | (initial)         | ?   | ?   | ?   | ?   | (unknown, don't care)
1    | mov rax, 1        | ?   | ?   | ?   | ?   | syscall number
2    | mov rdi, 1        | ?   | ?   | ?   | ?   | fd = stdout
3    | mov rsi, 0x402000 | ?   | ?   | ?   | ?   | buffer address
4    | mov rdx, 13       | ?   | ?   | ?   | ?   | length
5    | syscall           | ?   | ?   | ?   | ?   | RAX = return value

After the syscall, if exactly 13 bytes were written: what is RAX? If there was a write error: what range of values would RAX hold?


Programming Exercises


Exercise 6: Read and Echo

Write a complete NASM program that:

  1. Reads one byte from stdin into a stack buffer using sys_read (syscall 0)
  2. Writes that byte back to stdout using sys_write (syscall 1)
  3. Writes a newline (byte value 10) after the echoed byte
  4. Exits with code 0

Requirements:

  - Allocate the buffer on the stack (1 byte at [rsp])
  - Keep RSP 16-byte aligned at the point of each syscall
  - Use only sys_read, sys_write, and sys_exit (no C library)

Starter template:

section .text
    global _start

_start:
    ; Step 1: allocate 1-byte buffer on stack
    ; (hint: sub rsp, 16 keeps RSP 16-byte aligned; use [rsp] as the buffer)

    ; Step 2: sys_read(0, [rsp], 1)
    ; rax=0, rdi=0, rsi=rsp, rdx=1

    ; Step 3: sys_write(1, [rsp], 1)

    ; Step 4: write newline (you'll need a newline byte somewhere)

    ; Step 5: sys_exit(0)

Test: assemble, link, and run the program, then type a character and press Enter. The character should be echoed back, followed by a newline.


Exercise 7: Factorial

Write a factorial function in NASM that computes n! for n in the range 0-12:

; factorial: compute n! iteratively
; Input:  rdi = n (0 <= n <= 12)
; Output: rax = n!
; Clobbers: rcx (loop counter), rax (accumulator)
; Note: 13! overflows 32 bits (a 64-bit register holds up to 20!); n is limited to 12 here

Requirements:

  - Implement iteratively (loop, not recursion)
  - Handle n=0 (return 1) and n=1 (return 1) correctly
  - Write a _start that calls factorial(10) and exits with the result mod 256 as the exit code
  - Verify with echo $? after running (10! = 3628800; 3628800 mod 256 = 0). Because 0 is also the exit code of many broken programs, also try factorial(5), which should exit with 120.

Verify your results: 0!=1, 1!=1, 2!=2, 3!=6, 4!=24, 5!=120, 6!=720, 7!=5040, 8!=40320, 9!=362880, 10!=3628800, 11!=39916800, 12!=479001600.
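
If you want an independent oracle for the values above and for the exit codes your _start should produce, a small C reference can help. This is a sketch for checking only, not the NASM solution the exercise asks for; the names factorial_ref and exit_code_for are hypothetical. It assumes the shell reports only the low 8 bits of the exit status.

```c
#include <assert.h>
#include <stdint.h>

/* Iterative n! in 64-bit unsigned arithmetic; exact for n <= 20. */
uint64_t factorial_ref(unsigned n) {
    assert(n <= 20);               /* 21! no longer fits in 64 bits */
    uint64_t acc = 1;              /* covers n = 0 and n = 1 */
    for (unsigned i = 2; i <= n; i++) {
        acc *= i;
    }
    return acc;
}

/* The value echo $? would show if "value" were passed to sys_exit:
   only the low 8 bits survive. */
unsigned exit_code_for(uint64_t value) {
    return (unsigned)(value & 0xFF);
}
```

Calling exit_code_for(factorial_ref(n)) for several n gives the number to compare against echo $? when your _start uses a different argument.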


Exercise 8: strlen — Naive Implementation

Write a my_strlen function with this interface:

; my_strlen: compute length of null-terminated string
; Input:  rdi = pointer to null-terminated string
; Output: rax = length (not counting null terminator)
; Clobbers: rdi (advances to null byte), rax

Requirements:

  - Use a loop that loads a byte and checks whether it is zero
  - Use MOVZX to zero-extend the byte into a register (this avoids partial register writes in the loop)
  - Write a _start that calls my_strlen with a test string and exits with the length as the exit code
  - Verify: my_strlen("hello") = 5, my_strlen("") = 0, my_strlen("a") = 1

Test data for .data section:

section .data
    test_str    db "Hello, World!", 0    ; expect 13
    empty_str   db 0                    ; expect 0

Exercise 9: Uppercase Converter

Write a program that:

  1. Reads up to 64 bytes from stdin into a buffer
  2. Converts all lowercase ASCII letters (a-z, values 0x61-0x7A) to uppercase (A-Z, values 0x41-0x5A) by clearing bit 5 (equivalent to subtracting 0x20)
  3. Writes the converted buffer back to stdout
  4. Exits

Key instruction for this exercise: you can use and al, 0xDF to clear bit 5 of a byte, which uppercases a lowercase ASCII letter. But you only want to apply this to lowercase letters.

Hint: use CMP and conditional jumps to check the range:

    cmp al, 'a'
    jb  .not_lower      ; below 'a' — not lowercase
    cmp al, 'z'
    ja  .not_lower      ; above 'z' — not lowercase
    sub al, 0x20        ; convert to uppercase
.not_lower:
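
The range check in the hint can be mirrored in C to produce expected output for your test inputs. The sketch below is a reference for checking, not the assembly solution; to_upper_ascii and upper_buffer are hypothetical names. It relies on the fact that clearing bit 5 and subtracting 0x20 give the same result for bytes in the range 0x61-0x7A.

```c
#include <assert.h>   /* for the spot checks suggested after this block */

/* Uppercase one ASCII byte: only 'a'..'z' (0x61-0x7A) are changed. */
unsigned char to_upper_ascii(unsigned char c) {
    if (c >= 'a' && c <= 'z') {
        return (unsigned char)(c & 0xDF);   /* clear bit 5; same as c - 0x20 here */
    }
    return c;                               /* everything else passes through */
}

/* Convert a buffer in place, one byte at a time, as the exercise's loop would. */
void upper_buffer(unsigned char *buf, unsigned long len) {
    for (unsigned long i = 0; i < len; i++) {
        buf[i] = to_upper_ascii(buf[i]);
    }
}
```

A check such as assert(to_upper_ascii('q') == 'Q') covers the normal case. The range test matters because an unconditional and with 0xDF would turn '{' (0x7B) into '[' (0x5B).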

Exercise 10: Count Specific Bytes

Write a count_byte function:

; count_byte: count occurrences of a specific byte value in a buffer
; Input:  rdi = pointer to buffer
;         rsi = buffer length (bytes)
;         rdx = byte value to search for (0-255)
; Output: rax = count of matching bytes
; Clobbers: rdi, rsi, rdx, rcx, rax

Write a test _start that:

  1. Calls count_byte on the string "hello world" (length 11) looking for 'l' (0x6C)
  2. Exits with the count as the exit code (expect: 3)
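
A C reference with the same three inputs gives expected counts for any test buffer you choose. This is a minimal sketch for checking your NASM function; the name count_byte_ref is hypothetical and the parameter order simply mirrors the rdi, rsi, rdx interface described above.

```c
#include <assert.h>   /* for the spot checks suggested after this block */
#include <stddef.h>

/* Count how many of the first len bytes of buf equal value. */
size_t count_byte_ref(const unsigned char *buf, size_t len, unsigned char value) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == value) {
            count++;
        }
    }
    return count;
}
```

For instance, assert(count_byte_ref((const unsigned char *)"banana", 6, 'a') == 3) holds. A zero-length buffer should return 0, which is a case worth testing in the assembly version too.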


Debugging Exercises

Each of the following programs contains a bug. Identify the bug, explain why it causes incorrect behavior, and write the corrected version.


Exercise 11: Debug This — The Register Aliasing Bug

; Intended: count_vowels(str, len) — counts a, e, i, o, u in a string
; Input:  rdi = string pointer, rsi = length
; Output: rax = vowel count

count_vowels:
    xor     rax, rax        ; count = 0
    xor     rcx, rcx        ; index = 0
.loop:
    cmp     rcx, rsi
    jge     .done
    mov     bl, [rdi + rcx]  ; load byte
    ; check if it's a vowel
    cmp     bl, 'a'
    je      .is_vowel
    cmp     bl, 'e'
    je      .is_vowel
    cmp     bl, 'i'
    je      .is_vowel
    cmp     bl, 'o'
    je      .is_vowel
    cmp     bl, 'u'
    je      .is_vowel
    jmp     .next
.is_vowel:
    inc     rax
.next:
    inc     rcx
    jmp     .loop
.done:
    ret

Bug: this function uses RBX without saving it. RBX is a callee-saved register in the System V AMD64 ABI. If the caller was relying on RBX being preserved across the call, this function would corrupt it.

Question: Fix the function. Also: why does using BL (the low byte of RBX) matter here? What would happen if you used mov r10b, [rdi + rcx] instead?


Exercise 12: Debug This — The Off-by-One

; Intended: print string s followed by a newline
; Input:  rdi = pointer to null-terminated string

print_line:
    ; find the length first
    push    rdi             ; save original pointer
    mov     rsi, rdi        ; rsi = working pointer
    xor     rcx, rcx        ; length = 0
.find_end:
    cmp     BYTE [rsi], 0
    je      .found_end
    inc     rcx
    inc     rsi
    jmp     .find_end
.found_end:
    ; rcx = length, original pointer on stack
    pop     rsi             ; rsi = original pointer (the buffer)
    mov     rax, 1          ; sys_write
    mov     rdi, 1          ; stdout
    ; rdx = length
    mov     rdx, rcx
    inc     rdx             ; BUG: why is this wrong?
    syscall
    ret

Question: the programmer added inc rdx intending to "include the null terminator" in the write. Why is this wrong? What would be printed? Fix it.


Exercise 13: Debug This — The Stack Alignment Violation

_start:
    ; read from stdin
    sub     rsp, 1          ; BUG: allocate 1 byte for buffer
    mov     rax, 0          ; sys_read
    mov     rdi, 0          ; stdin
    mov     rsi, rsp        ; buffer = stack
    mov     rdx, 1          ; 1 byte
    syscall

    ; echo to stdout
    mov     rax, 1          ; sys_write
    mov     rdi, 1          ; stdout
    mov     rsi, rsp        ; buffer
    mov     rdx, 1
    syscall

    add     rsp, 1          ; restore stack
    mov     rax, 60
    xor     rdi, rdi
    syscall

The syscall itself works fine with a misaligned stack, but the program still raises two alignment questions:

  1. A style/ABI issue: what is wrong with sub rsp, 1 from a stack alignment perspective?
  2. The add rsp, 1 at the end: what is the alignment of RSP when _start is entered (the OS starts execution with RSP 16-byte aligned), and what alignment does add rsp, 1 leave it at before the final syscall?

Fix the program to maintain 16-byte RSP alignment throughout.


Exercise 14: Debug This — The Flag Dependency

; Intended: return absolute value of rdi
; Input:  rdi = signed 64-bit integer
; Output: rax = |rdi|

abs_val:
    mov     rax, rdi
    neg     rax             ; negate; if rdi was negative, rax is now positive
    ; if the original value was positive, we negated it wrong — fix:
    js      .was_negative   ; jump if SF=1 (sign flag set)
    mov     rax, rdi        ; restore original (which was positive)
.was_negative:
    ret

This function has a logic error in the conditional jump. Trace through two cases:

  1. rdi = 5 (positive): what does NEG produce? What does js do?
  2. rdi = -5 (negative): what does NEG produce? What does js do?

After tracing, identify the bug and fix the function. (Hint: the jump condition is backwards.)
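
Once you have a fixed version, a C reference gives expected results to compare against, including the edge case at the most negative value. This is a sketch for checking only; abs_ref is a hypothetical name. It returns the magnitude as an unsigned value so that the computation never overflows in C.

```c
#include <assert.h>   /* for the spot checks suggested after this block */
#include <stdint.h>

/* |x| as an unsigned 64-bit value, computed without signed overflow. */
uint64_t abs_ref(int64_t x) {
    uint64_t u = (uint64_t)x;               /* two's complement bit pattern */
    return x < 0 ? (uint64_t)0 - u : u;     /* negate modulo 2^64 if negative */
}
```

A check such as assert(abs_ref(-7) == 7) covers the ordinary case. For INT64_MIN the reference returns 0x8000000000000000, the same bit pattern NEG leaves in RAX, so a correct abs_val still returns a value that reads as negative when interpreted as signed.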


Exercise 15: System Call Error Handling

The following program reads from stdin but ignores errors:

_start:
    sub     rsp, 64

    ; read up to 64 bytes
    mov     rax, 0
    mov     rdi, 0
    mov     rsi, rsp
    mov     rdx, 64
    syscall
    ; rax = bytes read, or negative on error

    ; write whatever we read back
    mov     rdx, rax        ; length = bytes read
    mov     rax, 1
    mov     rdi, 1
    mov     rsi, rsp
    syscall

    add     rsp, 64
    mov     rax, 60
    xor     rdi, rdi
    syscall

Problems:

  1. If sys_read returns 0 (EOF), we call sys_write with length 0. Is this harmful?
  2. If sys_read fails, the raw syscall returns a negative errno value in RAX (for example -9 for EBADF), and we pass that value as the length argument to sys_write. What does sys_write do with a huge length (the negative number interpreted as unsigned)?
  3. If sys_read succeeds but returns fewer than 64 bytes (which it normally does for interactive input), the write is correct. But what about the remaining bytes in the buffer beyond what was read?

Rewrite _start to:

  - Check if rax <= 0 after sys_read and exit with code 1 if so
  - Only write the actual number of bytes read
  - Exit with code 0 on success
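
The decision your rewritten _start has to make can be written down in C first. This is a sketch of the control logic only, under the assumption that a raw Linux syscall reports failure as a negative errno value in RAX; the names read_outcome, classify_read, exit_code_after_read, and length_as_unsigned are hypothetical.

```c
#include <assert.h>   /* for the spot checks suggested after this block */
#include <stdint.h>

typedef enum { READ_ERROR, READ_EOF, READ_DATA } read_outcome;

/* Classify the raw value left in RAX after sys_read returns. */
read_outcome classify_read(int64_t rax) {
    if (rax < 0) {
        return READ_ERROR;      /* negative errno, e.g. -9 for EBADF */
    }
    if (rax == 0) {
        return READ_EOF;        /* nothing left to read */
    }
    return READ_DATA;           /* rax bytes are valid in the buffer */
}

/* Exit code required by the exercise: 1 when rax <= 0, otherwise 0. */
int exit_code_after_read(int64_t rax) {
    return classify_read(rax) == READ_DATA ? 0 : 1;
}

/* What sys_write would see if a negative rax were passed as the length. */
uint64_t length_as_unsigned(int64_t rax) {
    return (uint64_t)rax;
}
```

For example, assert(classify_read(-9) == READ_ERROR) holds, and length_as_unsigned(-9) is 0xFFFFFFFFFFFFFFF7. In assembly, the first two cases collapse into a single signed comparison against zero.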


Analysis Exercises


Exercise 16: C-to-Assembly Disassembly

Compile the following C function with gcc -O0 -c ex16.c -o ex16.o (the -c flag stops before linking, because the file has no main) and disassemble with objdump -d -M intel ex16.o:

int sum_range(int start, int end) {
    int total = 0;
    for (int i = start; i <= end; i++) {
        total += i;
    }
    return total;
}

In the disassembly:

  1. Identify where start and end are stored (registers or stack slots).
  2. Identify the loop counter i and accumulator total.
  3. Find the comparison that implements i <= end. Which flag combination is being tested?
  4. Count the total number of instructions in the function body.

Now compile with gcc -O2 and disassemble again. How does the compiler transform the loop?


Exercise 17: Instruction Encoding Analysis

Using objdump -d -M intel (or nasm -f bin with a BITS 64 directive at the top of the file, followed by xxd), find the byte encoding of each instruction:

xor     eax, eax         ; 2 bytes
xor     rax, rax         ; 3 bytes (requires REX prefix)
mov     rax, 0           ; how many bytes? compare to xor
mov     eax, 0           ; how many bytes? compare to mov rax, 0
inc     rax              ; 3 bytes
inc     rcx              ; 3 bytes

Questions:

  1. Why does xor rax, rax require a REX prefix while xor eax, eax does not?
  2. Why is xor eax, eax preferred over mov rax, 0 for zeroing a register?
  3. How many bytes is mov rax, 0xDEADBEEFCAFEBABE? Why?


Exercise 18: MinOS Boot Sequence

Read the MinOS Stage 1 bootloader from Chapter 7. Answer:

  1. The bootloader begins at ORG 0x7C00. The first instruction is a JMP. Immediately after the JMP target, a CALL instruction is used. What is the purpose of this CALL — what does it compute?

  2. The boot signature dw 0xAA55 must be at offset 510 of the 512-byte boot sector. If your bootloader code (before times 510-($ - $$) db 0) is exactly 200 bytes long, how many zero-padding bytes will be added?

  3. The print_string_16 function uses BIOS interrupt 10h with AH=0Eh (teletype output). This interrupt writes one character to the screen. In 16-bit real mode, what are the operand size conventions — how wide are push/pop, and can you use 64-bit registers?

  4. After the BIOS loads the boot sector to 0x7C00, CS may be 0x07C0 (with IP=0) or 0x0000 (with IP=0x7C00). The bootloader uses a far JMP to normalize this. Write the NASM instruction that jumps to a label start while forcing CS to 0x0000.


Synthesis Exercise 19: Write print_uint64

Without looking at the Chapter 7 implementation, write print_uint64 from scratch:

; print_uint64: print a 64-bit unsigned integer to stdout in decimal
; Input:  rdi = value to print
; Output: none (prints to stdout)
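; Clobbers: rax, rbx, rcx, rdx, rdi, rsi, r8
;   (rbx is callee-saved in the System V AMD64 ABI: save and restore it
;    if callers may rely on it, as in Exercise 11)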
;
; Algorithm:
;   1. Divide rdi by 10 repeatedly, collecting remainders
;   2. Remainders come out in reverse order (least significant digit first)
;   3. Store them in a local buffer in reverse, then print

Your implementation must:

  - Handle 0 (print "0", not an empty string)
  - Handle the maximum 64-bit value (18446744073709551615)
  - Use only sys_write for output (no C library)
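
The algorithm in the comment block can be prototyped in C to produce expected strings for your tests. This is a sketch of the digit-extraction step only, not the chapter's implementation and not a replacement for the NASM version; format_u64 is a hypothetical name. It assumes a 21-byte buffer: 20 digits for the largest 64-bit value plus a terminating NUL, which the assembly version does not need because sys_write takes an explicit length.

```c
#include <assert.h>   /* for the spot checks suggested after this block */
#include <stdint.h>

/* Fill buf from the end with the decimal digits of v and return a
   pointer to the first digit. The result is NUL-terminated. */
char *format_u64(uint64_t v, char buf[21]) {
    char *p = buf + 20;
    *p = '\0';
    do {                                /* do-while so that 0 yields "0" */
        *--p = (char)('0' + (v % 10));  /* remainder = least significant digit */
        v /= 10;
    } while (v != 0);
    return p;
}
```

Writing digits backwards from the end of the buffer removes the need for a separate reversal pass: the pointer returned already addresses the most significant digit, and the length is the distance from that pointer to buf + 20. For example, format_u64(907, buf) returns a pointer to the string "907".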


Exercise 20: Performance Comparison Lab

The chapter presents four strlen implementations. Set up a benchmark:

// benchmark.c
#include <stdio.h>
#include <string.h>
#include <time.h>

// Forward declarations of your assembly implementations
extern size_t my_strlen_naive(const char *s);
extern size_t my_strlen_scasb(const char *s);
extern size_t my_strlen_8bytes(const char *s);

int main(void) {
    char buf[1024];
    memset(buf, 'A', 1023);
    buf[1023] = '\0';

    struct timespec t0, t1;
    int N = 10000000;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++) {
        volatile size_t r = my_strlen_naive(buf);
        (void)r;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("naive:   %ld ns/iter\n",
           (t1.tv_nsec - t0.tv_nsec + (t1.tv_sec - t0.tv_sec)*1000000000L) / N);

    // Repeat for scasb and 8bytes versions...
    return 0;
}

Implement the three strlen versions declared in the benchmark, link them with it, and measure. On a modern x86-64:

  1. What is the approximate ratio of naive to 8-bytes performance?
  2. Is SCASB faster or slower than the naive loop? Why might this be surprising?
  3. What does this suggest about the relationship between "uses specialized instructions" and "is fast"?
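
When you repeat the timing code for the other versions, it may help to factor the elapsed-time arithmetic into a helper. The sketch below is one possible way to do that, not part of the chapter's benchmark; elapsed_ns and ns_per_iter are hypothetical names. It returns the per-iteration time as a double because the integer division in the printf above truncates any fractional nanoseconds.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Elapsed nanoseconds from t0 to t1. The tv_nsec difference can be
   negative on its own; adding the scaled tv_sec difference corrects it. */
int64_t elapsed_ns(struct timespec t0, struct timespec t1) {
    return (int64_t)(t1.tv_sec - t0.tv_sec) * INT64_C(1000000000)
         + (int64_t)(t1.tv_nsec - t0.tv_nsec);
}

/* Average nanoseconds per iteration, keeping the fractional part. */
double ns_per_iter(struct timespec t0, struct timespec t1, int64_t iterations) {
    assert(iterations > 0);
    return (double)elapsed_ns(t0, t1) / (double)iterations;
}
```

With these helpers, each measurement becomes a single call such as printf("naive: %.2f ns/iter\n", ns_per_iter(t0, t1, N)), which keeps the three timing sections identical apart from the function under test.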