Chapter 7 Exercises: First Programs
Register Trace Exercises
These exercises develop the core skill of mentally executing assembly code. Fill in every cell marked ? in each table, using the register state from the previous row as the input to the next instruction.
Exercise 1: MOV and XOR Instruction Traces
Starting state: rax = 0xFFFFFFFFFFFFFFFF, rbx = 0x0000000000000042, rcx = 0x0000000000000000
| Step | Instruction | RAX | RBX | RCX | Notes |
|---|---|---|---|---|---|
| 0 | (initial) | 0xFFFFFFFFFFFFFFFF | 0x0000000000000042 | 0x0000000000000000 | |
| 1 | mov eax, ebx | ? | 0x0000000000000042 | 0x0000000000000000 | Key: 32-bit write |
| 2 | xor ecx, ecx | ? | 0x0000000000000042 | ? | Key: zero idiom |
| 3 | mov rcx, rax | ? | 0x0000000000000042 | ? | |
| 4 | mov al, 0xFF | ? | 0x0000000000000042 | ? | Key: 8-bit write |
| 5 | xor rax, rax | ? | 0x0000000000000042 | ? | |
Explain: why does step 1 clear the upper 32 bits of RAX while step 4 does not clear the upper 56 bits?
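To check a completed table, the two write widths can be modelled in a few lines of Python. This is only a sketch of the architectural rule, not an explanation of why the rule exists; the function names `write32` and `write8_low` are made up for illustration, and the sample values are deliberately different from the ones in the table.

```python
MASK64 = (1 << 64) - 1

def write32(old, value):
    """Model a write to a 32-bit register such as EAX.

    The old 64-bit contents are ignored: bits 32-63 of the result are zero.
    """
    return value & 0xFFFFFFFF

def write8_low(old, value):
    """Model a write to a low 8-bit register such as AL.

    Bits 8-63 of the old 64-bit contents are preserved.
    """
    return (old & ~0xFF & MASK64) | (value & 0xFF)

rax = 0x1122334455667788
print(hex(write32(rax, 0xAABBCCDD)))   # 0xaabbccdd
print(hex(write8_low(rax, 0xEE)))      # 0x11223344556677ee
```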
Exercise 2: ADD and SUB Flag Traces
Starting state: rax = 0x0000000000000005, rbx = 0x0000000000000003, all flags = 0
| Step | Instruction | RAX | RBX | CF | OF | SF | ZF | Notes |
|---|---|---|---|---|---|---|---|---|
| 0 | (initial) | 0x5 | 0x3 | 0 | 0 | 0 | 0 | |
| 1 | sub rax, rbx | ? | 0x3 | ? | ? | ? | ? | |
| 2 | sub rax, rbx | ? | 0x3 | ? | ? | ? | ? | CF=? |
| 3 | add rax, rbx | ? | 0x3 | ? | ? | ? | ? | |
| 4 | xor rax, rax | ? | 0x3 | ? | ? | ? | ? | Which flags change? |
| 5 | sub rax, 1 | ? | 0x3 | ? | ? | ? | ? | Wraparound |
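Once the table is filled in by hand, a small model of 64-bit ADD and SUB can confirm the flag columns. The sketch below assumes operands are already in the range 0 to 2^64 - 1; `add64` and `sub64` are hypothetical helper names, and the sample operands are not taken from the exercise.

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63

def _flags(result, cf, of):
    return {"CF": cf, "OF": of, "SF": result >> 63, "ZF": int(result == 0)}

def add64(a, b):
    """Return (result, flags) for a 64-bit ADD of a and b."""
    r = (a + b) & MASK64
    cf = int(a + b > MASK64)                       # unsigned carry out of bit 63
    of = int(((a ^ r) & (b ^ r) & SIGN) != 0)      # operands share a sign, result differs
    return r, _flags(r, cf, of)

def sub64(a, b):
    """Return (result, flags) for a 64-bit SUB computing a - b."""
    r = (a - b) & MASK64
    cf = int(a < b)                                # unsigned borrow
    of = int(((a ^ b) & (a ^ r) & SIGN) != 0)      # operands differ in sign, result flips
    return r, _flags(r, cf, of)

r, f = sub64(0x10, 0x20)
print(hex(r), f)   # 0xfffffffffffffff0 {'CF': 1, 'OF': 0, 'SF': 1, 'ZF': 0}
```

Feed the result of one call into the next call to follow a multi-step trace.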
Exercise 3: Overflow Detection Trace
Starting state: rax = 0x7FFFFFFFFFFFFFFF (largest signed 64-bit value), rbx = 0x0000000000000001
| Step | Instruction | RAX (hex) | RAX (signed decimal) | CF | OF | SF | ZF |
|---|---|---|---|---|---|---|---|
| 0 | (initial) | 0x7FFFFFFFFFFFFFFF | 9,223,372,036,854,775,807 | 0 | 0 | 0 | 0 |
| 1 | add rax, rbx | ? | ? | ? | ? | ? | ? |
| 2 | add rax, rbx | ? | ? | ? | ? | ? | ? |
| 3 | add rax, rbx | ? | ? | ? | ? | ? | ? |
After completing the table:
1. Which flag signals that the signed result is incorrect?
2. Which flag signals that the unsigned result wrapped around?
3. Write the two-instruction sequence that would jump to an error label if the addition in step 1 overflowed (signed).
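The signed-decimal column is the two's-complement reading of the hex column. As a rough aid for checking that column, the helper below (a hypothetical name, `to_signed64`) converts a 64-bit pattern to its signed value; the inputs shown are unrelated to the exercise.

```python
def to_signed64(x):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x & (1 << 63) else x

print(to_signed64(0xFFFFFFFFFFFFFFFE))   # -2
print(to_signed64(0x0000000000000010))   # 16
```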
Exercise 4: INC/DEC and NEG Traces
Starting state: rax = 0x0000000000000001, CF = 1 (from a previous operation)
| Step | Instruction | RAX | CF | ZF | SF | Notes |
|---|---|---|---|---|---|---|
| 0 | (initial) | 0x1 | 1 | 0 | 0 | |
| 1 | dec rax | ? | ? | ? | ? | Does INC/DEC affect CF? |
| 2 | dec rax | ? | ? | ? | ? | ZF set here? |
| 3 | dec rax | ? | ? | ? | ? | SF set here? |
| 4 | neg rax | ? | ? | ? | ? | NEG of a negative number |
| 5 | inc rax | ? | ? | ? | ? | |
| 6 | neg rax | ? | ? | ? | ? | NEG of a nonzero value → CF = ? |
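After attempting the table by hand, the following sketch can be used to compare answers. It models only the three flags in the table; `inc64`, `dec64`, and `neg64` are illustrative names, and the incoming CF is passed explicitly because INC and DEC do not write it.

```python
MASK64 = (1 << 64) - 1

def _zs(result):
    return {"ZF": int(result == 0), "SF": result >> 63}

def inc64(a, cf):
    """INC: result a + 1; CF is passed through untouched."""
    r = (a + 1) & MASK64
    return r, {"CF": cf, **_zs(r)}

def dec64(a, cf):
    """DEC: result a - 1; CF is passed through untouched."""
    r = (a - 1) & MASK64
    return r, {"CF": cf, **_zs(r)}

def neg64(a):
    """NEG: result 0 - a; CF is set unless the operand was zero."""
    r = (-a) & MASK64
    return r, {"CF": int(a != 0), **_zs(r)}

print(inc64(0xFFFFFFFFFFFFFFFF, 0))   # (0, {'CF': 0, 'ZF': 1, 'SF': 0})
r, f = neg64(0x10)
print(hex(r), f)                      # 0xfffffffffffffff0 {'CF': 1, 'ZF': 0, 'SF': 1}
```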
Exercise 5: System Call Register Setup Trace
You need to call sys_write(1, msg, 13) — write 13 bytes from msg (address 0x402000) to stdout (fd 1).
| Step | Instruction | RAX | RDI | RSI | RDX | Notes |
|---|---|---|---|---|---|---|
| 0 | (initial) | ? | ? | ? | ? | (unknown, don't care) |
| 1 | mov rax, 1 | ? | ? | ? | ? | syscall number |
| 2 | mov rdi, 1 | ? | ? | ? | ? | fd = stdout |
| 3 | mov rsi, 0x402000 | ? | ? | ? | ? | buffer address |
| 4 | mov rdx, 13 | ? | ? | ? | ? | length |
| 5 | syscall | ? | ? | ? | ? | RAX = return value |
After the syscall, if exactly 13 bytes were written: what is RAX? If there was a write error: what range of values would RAX hold?
Programming Exercises
Exercise 6: Read and Echo
Write a complete NASM program that:
1. Reads one byte from stdin into a stack buffer using sys_read (syscall 0)
2. Writes that byte back to stdout using sys_write (syscall 1)
3. Writes a newline (byte value 10) after the echoed byte
4. Exits with code 0
Requirements:
- Allocate the buffer on the stack (1 byte at [rsp])
- Keep RSP 16-byte aligned at the point of each syscall
- Use only sys_read, sys_write, and sys_exit — no C library
Starter template:
section .text
global _start
_start:
    ; Step 1: allocate 1-byte buffer on stack
    ; (hint: RSP is 16-byte aligned on entry to _start, so sub rsp, 16
    ;  keeps it aligned; use [rsp] as buffer)
    ; Step 2: sys_read(0, [rsp], 1)
    ; rax=0, rdi=0, rsi=rsp, rdx=1
    ; Step 3: sys_write(1, [rsp], 1)
    ; Step 4: write newline (you'll need a newline byte somewhere)
    ; Step 5: sys_exit(0)
Test: assemble, link, run, type a character, and press Enter. The character should be echoed back followed by a newline.
Exercise 7: Factorial
Write a factorial function in NASM that computes n! for n in the range 0-12:
; factorial: compute n! iteratively
; Input: rdi = n (0 <= n <= 12)
; Output: rax = n!
; Clobbers: rcx (loop counter), rax (accumulator)
; Note: 12! is the largest factorial that fits in 32 bits
; (13! = 6227020800 does not); 64-bit RAX could hold up to 20!
Requirements:
- Implement iteratively (loop, not recursion)
- Handle n=0 (return 1) and n=1 (return 1) correctly
- Write a _start that calls factorial(10) and exits with the result mod 256 as the exit code
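- Verify with: echo $? after running (10! = 3628800; 3628800 mod 256 = 0). Because 0 is also the exit code of a program that does nothing useful, also try factorial(5), which should exit with 120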
Verify your results: 0!=1, 1!=1, 2!=2, 3!=6, 4!=24, 5!=120, 6!=720, 7!=5040, 8!=40320, 9!=362880, 10!=3628800, 11!=39916800, 12!=479001600.
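The expected values and exit codes can be cross-checked with a short Python reference. This is a sketch for checking your assembly output, not the assembly solution; `factorial_iter` is an illustrative name, and the loop merely mirrors the iterative structure the exercise asks for.

```python
import math

def factorial_iter(n):
    """Iterative n!, mirroring an accumulator-and-counter loop."""
    acc = 1
    for i in range(2, n + 1):
        acc *= i
    return acc

# Cross-check against the standard library for the whole input range.
for n in range(13):
    assert factorial_iter(n) == math.factorial(n)

# Exit codes are the low 8 bits of the value passed to sys_exit.
print(factorial_iter(10) % 256)   # 0
print(factorial_iter(5) % 256)    # 120

# Largest n whose factorial fits in a given unsigned width.
for bits in (32, 64):
    n = 0
    while factorial_iter(n + 1) < 2 ** bits:
        n += 1
    print(bits, n)                # 32 12, then 64 20
```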
Exercise 8: strlen — Naive Implementation
Write a my_strlen function with this interface:
; my_strlen: compute length of null-terminated string
; Input: rdi = pointer to null-terminated string
; Output: rax = length (not counting null terminator)
; Clobbers: rdi (advances to null byte), rax
Requirements:
- Use a loop that loads a byte and checks if it's zero
- Use MOVZX to zero-extend the byte into a register (avoid partial register writes in the loop)
- Write a _start that calls my_strlen with a test string and exits with the length as the exit code
- Verify: my_strlen("hello") = 5, my_strlen("") = 0, my_strlen("a") = 1
Test data for .data section:
section .data
test_str db "Hello, World!", 0 ; expect 13
empty_str db 0 ; expect 0
Exercise 9: Uppercase Converter
Write a program that:
1. Reads up to 64 bytes from stdin into a buffer
2. Converts all lowercase ASCII letters (a-z, values 0x61-0x7A) to uppercase (A-Z, values 0x41-0x5A) by clearing bit 5 (subtract 0x20)
3. Writes the converted buffer back to stdout
4. Exits
Key instruction for this exercise: you can use and al, 0xDF to clear bit 5 of a byte, which uppercases a lowercase ASCII letter. But you only want to apply this to lowercase letters.
Hint: use CMP and conditional jumps to check the range:
    cmp al, 'a'
    jb .not_lower       ; below 'a': not lowercase
    cmp al, 'z'
    ja .not_lower       ; above 'z': not lowercase
    sub al, 0x20        ; convert to uppercase
.not_lower:
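To see why the range check matters, here is a reference model of the conversion, written as a sketch for comparing against your program's output. The name `to_upper_ascii` is illustrative. The last line shows what happens if the mask is applied to every byte without the check.

```python
def to_upper_ascii(data: bytes) -> bytes:
    """Uppercase only bytes in 'a'..'z' (0x61-0x7A) by clearing bit 5."""
    out = bytearray(data)
    for i, b in enumerate(out):
        if 0x61 <= b <= 0x7A:
            out[i] = b & 0xDF      # same result as b - 0x20 inside this range
    return bytes(out)

print(to_upper_ascii(b"Hello, {world}! 123\n"))   # b'HELLO, {WORLD}! 123\n'

# Without the range check, punctuation next to the letters is corrupted too.
print(bytes(b & 0xDF for b in b"{a}"))            # b'[A]'
```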
Exercise 10: Count Specific Bytes
Write a count_byte function:
; count_byte: count occurrences of a specific byte value in a buffer
; Input: rdi = pointer to buffer
; rsi = buffer length (bytes)
; rdx = byte value to search for (0-255)
; Output: rax = count of matching bytes
; Clobbers: rdi, rsi, rdx, rcx, rax
Write a test _start that:
1. Calls count_byte on the string "hello world" (length 11) looking for 'l' (0x6C)
2. Exits with the count as the exit code (expect: 3)
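A reference model is handy for generating more test cases than the single one above. The sketch below follows the same interface (buffer, explicit length, byte value) and, as an assumption about the intended behaviour, compares only the low 8 bits of the search value; the sample string is different from the one in the exercise.

```python
def count_byte(buf: bytes, length: int, value: int) -> int:
    """Count how many of the first `length` bytes of buf equal value's low byte."""
    target = value & 0xFF
    count = 0
    index = 0
    while index < length:
        if buf[index] == target:
            count += 1
        index += 1
    return count

print(count_byte(b"mississippi", 11, ord("s")))   # 4
```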
Debugging Exercises
Each of the following programs contains a bug. Identify the bug, explain why it causes incorrect behavior, and write the corrected version.
Exercise 11: Debug This — The Register Aliasing Bug
; Intended: count_vowels(str, len) — counts a, e, i, o, u in a string
; Input: rdi = string pointer, rsi = length
; Output: rax = vowel count
count_vowels:
    xor rax, rax            ; count = 0
    xor rcx, rcx            ; index = 0
.loop:
    cmp rcx, rsi
    jge .done
    mov bl, [rdi + rcx]     ; load byte
    ; check if it's a vowel
    cmp bl, 'a'
    je .is_vowel
    cmp bl, 'e'
    je .is_vowel
    cmp bl, 'i'
    je .is_vowel
    cmp bl, 'o'
    je .is_vowel
    cmp bl, 'u'
    je .is_vowel
    jmp .next
.is_vowel:
    inc rax
.next:
    inc rcx
    jmp .loop
.done:
    ret
Bug: this function uses RBX without saving it. RBX is a callee-saved register in the System V AMD64 ABI. If the caller was relying on RBX being preserved across the call, this function would corrupt it.
Question: Fix the function. Also: why does using BL (the low byte of RBX) matter here? What would happen if you used mov r10b, [rdi + rcx] instead?
Exercise 12: Debug This — The Off-by-One
; Intended: print string s followed by a newline
; Input: rdi = pointer to null-terminated string
print_line:
    ; find the length first
    push rdi                ; save original pointer
    mov rsi, rdi            ; rsi = working pointer
    xor rcx, rcx            ; length = 0
.find_end:
    cmp BYTE [rsi], 0
    je .found_end
    inc rcx
    inc rsi
    jmp .find_end
.found_end:
    ; rcx = length, original pointer on stack
    pop rsi                 ; rsi = original pointer (the buffer)
    mov rax, 1              ; sys_write
    mov rdi, 1              ; stdout
    ; rdx = length
    mov rdx, rcx
    inc rdx                 ; BUG: why is this wrong?
    syscall
    ret
Question: the programmer added inc rdx intending to "include the null terminator" in the write. Why is this wrong? What would be printed? Fix it.
Exercise 13: Debug This — The Stack Alignment Violation
_start:
    ; read from stdin
    sub rsp, 1              ; BUG: allocate 1 byte for buffer
    mov rax, 0              ; sys_read
    mov rdi, 0              ; stdin
    mov rsi, rsp            ; buffer = stack
    mov rdx, 1              ; 1 byte
    syscall
    ; echo to stdout
    mov rax, 1              ; sys_write
    mov rdi, 1              ; stdout
    mov rsi, rsp            ; buffer
    mov rdx, 1
    syscall
    add rsp, 1              ; restore stack
    mov rax, 60
    xor rdi, rdi
    syscall
The syscalls themselves work with a misaligned stack, but the program still has two issues to examine:
1. A style/ABI issue: what is wrong with sub rsp, 1 from a stack alignment perspective?
2. An alignment trace: RSP is 16-byte aligned when _start is entered. What is its alignment during the read and write syscalls, and what alignment does add rsp, 1 leave it at before the final syscall?
Fix the program to maintain 16-byte RSP alignment throughout.
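The alignment arithmetic can be checked on paper or with a few lines of Python. The sketch below uses a made-up starting RSP value (any multiple of 16 works) and a hypothetical helper, `aligned_alloc_size`, that rounds a requested buffer size up to the next multiple of 16.

```python
def is_aligned16(rsp):
    """True when rsp is a multiple of 16."""
    return rsp % 16 == 0

def aligned_alloc_size(nbytes):
    """Round nbytes up to the next multiple of 16."""
    return (nbytes + 15) & ~15

rsp = 0x7FFFFFFFE000                               # hypothetical 16-byte-aligned RSP at entry
print(is_aligned16(rsp - 1))                       # False
print(aligned_alloc_size(1))                       # 16
print(is_aligned16(rsp - aligned_alloc_size(1)))   # True
```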
Exercise 14: Debug This — The Flag Dependency
; Intended: return absolute value of rdi
; Input: rdi = signed 64-bit integer
; Output: rax = |rdi|
abs_val:
    mov rax, rdi
    neg rax                 ; negate; if rdi was negative, rax is now positive
    ; if the original value was positive, we negated it wrong; fix:
    js .was_negative        ; jump if SF=1 (result is negative)
    mov rax, rdi            ; restore original (which was positive)
.was_negative:
    ret
This function has a logic error in the conditional jump. Trace through two cases:
1. rdi = 5 (positive): what does NEG produce? What does js do?
2. rdi = -5 (negative): what does NEG produce? What does js do?
After tracing, identify the bug and fix the function. (Hint: the jump condition is backwards.)
Exercise 15: System Call Error Handling
The following program reads from stdin but ignores errors:
_start:
    sub rsp, 64
    ; read up to 64 bytes
    mov rax, 0
    mov rdi, 0
    mov rsi, rsp
    mov rdx, 64
    syscall
    ; rax = bytes read, or negative on error
    ; write whatever we read back
    mov rdx, rax            ; length = bytes read
    mov rax, 1
    mov rdi, 1
    mov rsi, rsp
    syscall
    add rsp, 64
    mov rax, 60
    xor rdi, rdi
    syscall
Problems:
1. If sys_read returns 0 (EOF), we call sys_write with length 0. Is this harmful?
2. If sys_read fails, the raw syscall returns a negative errno value in RAX (for example -9 for EBADF), and we pass that value as the length argument to sys_write. What does sys_write do with a huge length (the negative number interpreted as unsigned)?
3. If sys_read succeeds but returns fewer than 64 bytes (which it normally does for interactive input), the write is correct, but what about the remaining bytes in the buffer beyond what was read?
Rewrite _start to:
- Check if rax <= 0 after sys_read and exit with code 1 if so
- Only write the actual number of bytes read
- Exit with code 0 on success
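The check your rewrite has to perform can be modelled in Python first. On Linux, a raw syscall signals failure by returning a value between -4095 and -1 (the negated errno). The sketch below decodes a raw RAX value on that basis; `decode_syscall_return` is an illustrative name, and the errno names come from Python's standard `errno` module.

```python
import errno

MASK64 = (1 << 64) - 1

def decode_syscall_return(rax):
    """Classify a raw 64-bit syscall return value as success or -errno."""
    signed = rax - (1 << 64) if rax >> 63 else rax
    if -4095 <= signed <= -1:
        return ("error", errno.errorcode.get(-signed, "unknown"))
    return ("ok", signed)

print(decode_syscall_return(13))                          # ('ok', 13)
print(decode_syscall_return(-errno.EBADF & MASK64))       # ('error', 'EBADF')

# The same error value, seen as an unsigned length by the next syscall:
print(hex(-errno.EBADF & MASK64))                         # 0xfffffffffffffff7
```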
Analysis Exercises
Exercise 16: C-to-Assembly Disassembly
Compile the following C function with gcc -O0 -o ex16 ex16.c and disassemble with objdump -d -M intel ex16:
int sum_range(int start, int end) {
    int total = 0;
    for (int i = start; i <= end; i++) {
        total += i;
    }
    return total;
}
In the disassembly:
1. Identify where start and end are stored (registers or stack slots).
2. Identify the loop counter i and accumulator total.
3. Find the comparison that implements i <= end — which flag combination is being tested?
4. Count the total number of instructions in the function body.
Now compile with gcc -O2 and disassemble again. How does the compiler transform the loop?
Exercise 17: Instruction Encoding Analysis
Using objdump -d -M intel (or nasm -f bin and xxd), find the byte encoding of each instruction:
xor eax, eax ; 2 bytes
xor rax, rax ; 3 bytes (requires REX prefix)
mov rax, 0 ; how many bytes? compare to xor
mov eax, 0 ; how many bytes? compare to mov rax, 0
inc rax ; 3 bytes
inc rcx ; 3 bytes
Questions:
1. Why does xor rax, rax require a REX prefix while xor eax, eax does not?
2. Why is xor eax, eax preferred over mov rax, 0 for zeroing a register?
3. How many bytes is mov rax, 0xDEADBEEFCAFEBABE? Why?
Exercise 18: MinOS Boot Sequence
Read the MinOS Stage 1 bootloader from Chapter 7. Answer:
- The bootloader begins at ORG 0x7C00. The first instruction is a JMP. Immediately after the JMP target, a CALL instruction is used. What is the purpose of this CALL, and what does it compute?
- The boot signature dw 0xAA55 must be at offset 510 of the 512-byte boot sector. If your bootloader code (before times 510-($ - $$) db 0) is exactly 200 bytes long, how many zero-padding bytes will be added?
- The print_string_16 function uses BIOS interrupt 10h with AH=0Eh (teletype output). This interrupt writes one character to the screen. In 16-bit real mode, what are the operand size conventions: how wide are push/pop, and can you use 64-bit registers?
- After the BIOS loads the boot sector to 0x7C00, CS may be 0x07C0 (with IP=0) or 0x0000 (with IP=0x7C00). The bootloader uses a far JMP to normalize this. Write the NASM instruction that jumps to a label start while forcing CS to 0x0000.
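The boot sector layout in the second question can be reproduced in Python to check a padding calculation. This is a sketch only: `build_boot_sector` is a hypothetical helper, and 100 NOP bytes stand in for real bootloader code (a different length from the one in the question).

```python
import struct

def build_boot_sector(code: bytes) -> bytes:
    """Lay out code, zero padding up to offset 510, then the 0xAA55 signature."""
    if len(code) > 510:
        raise ValueError("code does not fit before the boot signature")
    padding = bytes(510 - len(code))          # what times 510-($ - $$) db 0 emits
    signature = struct.pack("<H", 0xAA55)     # dw stores the word little-endian
    return code + padding + signature

sector = build_boot_sector(b"\x90" * 100)     # 100 NOPs as stand-in code
print(len(sector))                            # 512
print(sector[510:].hex())                     # 55aa
print(sector.count(0))                        # 410
```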
Synthesis Exercise 19: Write print_uint64
Without looking at the Chapter 7 implementation, write print_uint64 from scratch:
; print_uint64: print a 64-bit unsigned integer to stdout in decimal
; Input: rdi = value to print
; Output: none (prints to stdout)
; Clobbers: rax, rbx, rcx, rdx, rdi, rsi, r8
;
; Algorithm:
; 1. Divide rdi by 10 repeatedly, collecting remainders
; 2. Remainders come out in reverse order (least significant digit first)
; 3. Store them in a local buffer in reverse, then print
Your implementation must:
- Handle 0 (print "0", not an empty string)
- Handle the maximum 64-bit value (18446744073709551615)
- Use only sys_write for output (no C library)
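Before writing the assembly, it can help to prototype the digit-extraction algorithm at a higher level. The Python sketch below follows the three algorithm steps from the interface comment and fills a fixed buffer from the end backwards; the 20-byte buffer size is chosen because the maximum 64-bit value has 20 decimal digits, and the function name is illustrative.

```python
MASK64 = (1 << 64) - 1

def uint64_to_decimal_bytes(value):
    """Return the decimal digits of a 64-bit unsigned value as ASCII bytes."""
    value &= MASK64
    buf = bytearray(20)            # 2**64 - 1 has 20 decimal digits
    pos = 20                       # one past the last byte; digits are written backwards
    while True:                    # do-while shape, so 0 still produces one digit
        value, digit = divmod(value, 10)
        pos -= 1
        buf[pos] = ord("0") + digit
        if value == 0:
            break
    return bytes(buf[pos:])        # the slice a single sys_write would cover

print(uint64_to_decimal_bytes(1234))   # b'1234'
print(uint64_to_decimal_bytes(0))      # b'0'
```

Writing the digits from the end of the buffer means no separate reversal pass is needed before printing.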
Exercise 20: Performance Comparison Lab
The chapter presents four strlen implementations. Set up a benchmark:
// benchmark.c
#include <stdio.h>
#include <string.h>
#include <time.h>
// Forward declarations of your assembly implementations
extern size_t my_strlen_naive(const char *s);
extern size_t my_strlen_scasb(const char *s);
extern size_t my_strlen_8bytes(const char *s);
int main(void) {
    char buf[1024];
    memset(buf, 'A', 1023);
    buf[1023] = '\0';
    struct timespec t0, t1;
    int N = 10000000;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++) {
        volatile size_t r = my_strlen_naive(buf);
        (void)r;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("naive: %ld ns/iter\n",
           (t1.tv_nsec - t0.tv_nsec + (t1.tv_sec - t0.tv_sec)*1000000000L) / N);
    // Repeat for scasb and 8bytes versions...
    return 0;
}
Implement all three strlen versions, link them with the benchmark, and measure. On a modern x86-64:
1. What is the approximate ratio of naive to 8-bytes performance?
2. Is SCASB faster or slower than the naive loop? Why might this be surprising?
3. What does this suggest about the relationship between "uses specialized instructions" and "is fast"?
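For the 8-bytes-at-a-time version, one widely used way to test a 64-bit word for a zero byte is the bit trick below. Whether the chapter's implementation uses this exact formulation is an assumption; the Python model is only meant for checking the logic before porting it to assembly, and the function names are illustrative.

```python
MASK64 = (1 << 64) - 1
LO = 0x0101010101010101
HI = 0x8080808080808080

def has_zero_byte(word):
    """True if any of the 8 bytes in a 64-bit word is zero."""
    return ((word - LO) & ~word & HI & MASK64) != 0

def strlen_8bytes(data: bytes) -> int:
    """Length up to the first zero byte, scanning 8 bytes per iteration.

    The slice is padded with zeros here; real assembly must instead make
    sure an 8-byte load never crosses into an unmapped page.
    """
    offset = 0
    while True:
        chunk = data[offset:offset + 8].ljust(8, b"\0")
        word = int.from_bytes(chunk, "little")
        if has_zero_byte(word):
            return offset + chunk.index(0)   # locate the exact byte within the word
        offset += 8

print(strlen_8bytes(b"Hello, World!\0"))     # 13
```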