Chapter 1 Exercises: Why Assembly Language?

Section A: Compile and Disassemble

Exercise 1.1 — Your First Disassembly

Create the following C file:

// exercises/ch01/add.c
int add(int a, int b) {
    return a + b;
}

int main(void) {
    int result = add(3, 4);
    return result;
}

Compile it in two ways and compare:

gcc -O0 -S add.c -o add_O0.s
gcc -O2 -S add.c -o add_O2.s

Answer these questions based on the output:

a) In the -O0 version, how many instructions does add() contain (including the function prologue and epilogue)?

b) In the -O2 version, is there still a separate add() function, or has it been inlined into main()? Why might the compiler make this choice?

c) In the -O2 main(), what value does the compiler compute for the call add(3, 4)? Was the function call even emitted?

d) What calling convention is used to pass the arguments to add() in the -O0 version? Which registers hold a and b?


Exercise 1.2 — Instruction Counting

Create this file:

// exercises/ch01/loop.c
long sum_to_n(int n) {
    long sum = 0;
    for (int i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}

Compile with gcc -O0 -S loop.c and examine the output.

a) Count the total number of assembly instructions in sum_to_n() (not counting labels or directives).

b) Identify which instructions correspond to: the loop counter initialization, the loop condition check, the addition, the counter increment, and the loop back-jump.

c) Now compile with gcc -O1 -S loop.c. Did GCC apply the Gauss formula (replacing the loop with n*(n+1)/2)? Check with gcc -O2 -S loop.c. What changed?

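d) Compile with gcc -O3 -S loop.c. Look for SIMD instructions. With the default SSE2 target they have names like movdqa, paddd, and paddq; if you add -mavx2 or -march=native they gain a v prefix (vmovdqa, vpaddd, vpaddq). Did the compiler vectorize the loop?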
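
Whatever the optimization level, the generated code must still return the same values as the original loop. The following is a minimal sketch of a check you could link against each build, assuming the closed form n*(n+1)/2 as the reference. The names gauss_sum and check_sums are hypothetical helpers, not part of the chapter.

```c
#include <assert.h>

/* Loop version from the exercise. */
long sum_to_n(int n) {
    long sum = 0;
    for (int i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}

/* Hypothetical reference: the closed form n*(n+1)/2.
   Returns 0 for n <= 0, matching the loop, whose body never runs then. */
long gauss_sum(int n) {
    if (n <= 0) {
        return 0;
    }
    return (long)n * ((long)n + 1) / 2;
}

/* Hypothetical helper: aborts via assert if the two versions
   disagree for any n in [-5, limit]. */
void check_sums(int limit) {
    for (int n = -5; n <= limit; n++) {
        assert(sum_to_n(n) == gauss_sum(n));
    }
}
```

Calling check_sums(1000) from a small main() and rebuilding at -O0, -O2, and -O3 confirms that the optimizer changed only how the sum is computed, not its value.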


Exercise 1.3 — objdump Analysis

Compile sum.c from the chapter example with gcc -O2 -c sum.c -o sum.o and run:

objdump -d sum.o
objdump -d -M intel sum.o    # Intel syntax
objdump -d -M att sum.o      # AT&T syntax (the GNU objdump default)

a) What is the difference in notation between Intel and AT&T syntax for the instruction that loads from memory? Which do you find more readable?

b) Find the instruction 48 03 07 in the objdump output. What does the 48 prefix byte indicate? What is the base instruction (without the prefix)?

c) The ret instruction appears twice in the optimized sum_array(). Why? Describe what scenario each ret handles.

d) How many bytes does the entire sum_array function occupy in the object file?


Exercise 1.4 — The Full Pipeline

Create a "hello world" in C:

// exercises/ch01/hello.c
#include <stdio.h>
int main(void) {
    printf("Hello, World!\n");
    return 0;
}

Walk through each stage manually:

gcc -E hello.c -o hello.i       # preprocess
gcc -S hello.i -o hello.s       # compile (preprocessed C to assembly)
gcc -c hello.s -o hello.o       # assemble
gcc hello.o -o hello            # link
./hello                          # run

a) How many lines is hello.i (the preprocessed output)? What does the extra content consist of?

b) In hello.s, how is printf("Hello, World!\n") represented? Is it still a printf call, or has it been transformed?

c) Run readelf -h hello and find: the entry point address, the number of section headers, and the machine type. What machine type value corresponds to x86-64?

d) Run nm hello and find the symbol main. What is its address? Then find the output function that hello.s actually calls (see part b). Is it defined in hello, or is it marked undefined (U), meaning it comes from a shared library?


Section B: Why Assembly Matters

Exercise 1.5 — The Optimization Gap

Consider these two semantically equivalent C functions:

// Version A
long sum_v1(long *arr, int n) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        total = total + arr[i];
    }
    return total;
}

// Version B
long sum_v2(long *arr, int n) {
    long total = 0;
    long *end = arr + n;
    while (arr != end) {
        total += *arr++;
    }
    return total;
}

Compile both with -O2 and compare the assembly output.

a) Do the two functions produce identical assembly? If not, what differs?

b) Now add __restrict__ to the pointer parameter: long *__restrict__ arr. Does this change the assembly? Why might the compiler generate different code when it knows there are no aliasing pointers?

c) Add #pragma GCC optimize("O3,unroll-loops") before sum_v1 and recompile. What changes? Is the loop unrolled?
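
Before comparing the assembly, it can help to confirm that the two versions really agree on ordinary inputs. The following is a minimal sketch, assuming an 8-element test array; check_sums_match is a hypothetical helper, not part of the chapter.

```c
#include <assert.h>

/* Version A: indexed loop. */
long sum_v1(long *arr, int n) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        total = total + arr[i];
    }
    return total;
}

/* Version B: pointer walk up to an end pointer. */
long sum_v2(long *arr, int n) {
    long total = 0;
    long *end = arr + n;
    while (arr != end) {
        total += *arr++;
    }
    return total;
}

/* Hypothetical helper: sums 10, 20, ..., 80 with both versions,
   asserts that they agree, and returns the shared total. */
long check_sums_match(void) {
    long data[8];
    for (int i = 0; i < 8; i++) {
        data[i] = (long)(i + 1) * 10;
    }
    long a = sum_v1(data, 8);
    long b = sum_v2(data, 8);
    assert(a == b);
    assert(sum_v1(data, 0) == sum_v2(data, 0)); /* empty range: both 0 */
    return a;
}
```

The check deliberately uses only n >= 0. For a negative n, Version A returns 0 while Version B forms a pointer before the array and never meets its end condition, so the two are equivalent only for non-negative counts.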


Exercise 1.6 — The Compiler's Choices

Compile the following with gcc -O2 -S and explain what the compiler did and why:

int is_power_of_two(unsigned int n) {
    return n != 0 && (n & (n - 1)) == 0;
}

a) How many instructions did the compiler emit? Is there a branch (conditional jump) in the output?

b) Would you have expected a branch based on reading the C code? Explain.

c) What does the instruction blsr do if it appears in the output? (If it doesn't appear with your compiler version, compile with -march=native or -march=haswell.)
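
To convince yourself that the one-line expression is correct before reading its assembly, you can compare it against a slower but more obvious version. The following is a minimal sketch; is_power_of_two_slow and check_power_of_two are hypothetical helpers, not part of the chapter.

```c
#include <assert.h>

/* Function from the exercise. */
int is_power_of_two(unsigned int n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical reference: count set bits one at a time.
   A power of two has exactly one bit set. */
int is_power_of_two_slow(unsigned int n) {
    int bits = 0;
    while (n) {
        bits += (int)(n & 1u);
        n >>= 1;
    }
    return bits == 1;
}

/* Hypothetical helper: compares both versions for every n in [0, limit].
   Keep limit below UINT_MAX, otherwise the loop counter wraps. */
void check_power_of_two(unsigned int limit) {
    for (unsigned int n = 0; n <= limit; n++) {
        assert(is_power_of_two(n) == is_power_of_two_slow(n));
    }
}
```

Calling check_power_of_two(65536u) exercises every power of two up to 2^16 along with all the non-powers between them.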


Exercise 1.7 — Reading Compiler Output

Without compiling, predict what x86-64 assembly gcc -O2 would generate for the following function, then verify your prediction:

int clamp(int value, int min, int max) {
    if (value < min) return min;
    if (value > max) return max;
    return value;
}

Write your predicted assembly (just the key instructions — you don't need to get the exact encoding right). Then compile and compare. Key questions:

a) Does the compiler use conditional moves (cmov) or conditional jumps (jl, jg)?

b) Which registers hold value, min, and max on entry to clamp(), for example when it is called as clamp(x, 0, 255)?

c) What optimization did the compiler apply that you may not have predicted?


Section C: Architecture Awareness

Exercise 1.8 — x86-64 Instruction Lengths

For each of the following NASM instructions, predict whether it will encode as 1, 2, 3, 4, or 5+ bytes. Then assemble and use objdump to verify:

ret             ; (a)
nop             ; (b)
mov eax, 0      ; (c)
xor eax, eax    ; (d)
mov rax, 0      ; (e) -- hint: the length depends on the assembler; check whether yours shortens this to the mov eax, 0 encoding
mov rax, 1      ; (f)
push rbx        ; (g)
add rax, rdi    ; (h)

For each instruction, write the hex encoding and the length in bytes.


Exercise 1.9 — Disassembly Reading Practice

Given the following raw machine bytes (x86-64), identify what instruction each sequence represents. Use objdump or an online disassembler to help:

48 89 e5        ; (a)
5d              ; (b)
c3              ; (c)
31 c0           ; (d)
48 83 ec 08     ; (e)
0f 05           ; (f)

Write the mnemonic and describe what each instruction does.
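
If you would rather use objdump than an online disassembler, the bytes first have to be in a file. The following is a minimal sketch that writes the six sequences, concatenated in order, to a raw binary file. The function name write_code_bytes and the file name bytes.bin are hypothetical choices, not part of the chapter.

```c
#include <assert.h>
#include <stdio.h>

/* The byte sequences (a) through (f) from the exercise, in order. */
static const unsigned char code_bytes[] = {
    0x48, 0x89, 0xe5,        /* (a) */
    0x5d,                    /* (b) */
    0xc3,                    /* (c) */
    0x31, 0xc0,              /* (d) */
    0x48, 0x83, 0xec, 0x08,  /* (e) */
    0x0f, 0x05               /* (f) */
};

/* Writes the bytes to `path` as raw binary.
   Returns the number of bytes written, or 0 on failure. */
size_t write_code_bytes(const char *path) {
    assert(path != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL) {
        return 0;
    }
    size_t written = fwrite(code_bytes, 1, sizeof code_bytes, f);
    if (fclose(f) != 0) {
        return 0;
    }
    return written;
}
```

After calling write_code_bytes("bytes.bin") from a small main(), running objdump -D -b binary -m i386:x86-64 bytes.bin disassembles the raw file; add -M intel for Intel syntax.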


Exercise 1.10 — The Seven Categories

For each of the following programming tasks, identify which of the seven categories of assembly users it falls into, and explain in one paragraph why assembly knowledge is specifically useful for that task:

a) Writing a custom memory allocator that must be faster than glibc malloc

b) Analyzing a suspected malware binary that has been stripped of debug symbols

c) Writing the context switch code for a preemptive multitasking OS

d) Debugging why a numerical simulation gives slightly different results on AMD vs Intel CPUs

e) Implementing a hash function that must use AES-NI instructions for performance

f) Solving a "pwn" challenge in a CTF competition that involves a stack buffer overflow


Section D: ARM64 Comparison

Exercise 1.11 — Cross-Architecture Comparison

The following is an ARM64 (AArch64) version of the sum_array function. Compare it to the x86-64 version from the chapter:

// gcc -O2 output for ARM64
sum_array:
    cbz     w1, .L4         // if n == 0, jump to return-0
    sxtw    x2, w1          // sign-extend n to 64 bits
    add     x2, x0, x2, lsl #3  // x2 = arr + n*8 (end pointer)
    mov     x1, x0          // x1 = arr (n in w1 is no longer needed)
    eor     x0, x0, x0      // x0 = 0 (total)  [why not: mov x0, #0?]
.L3:
    ldr     x3, [x1]        // x3 = *arr
    add     x0, x0, x3      // total += *arr
    add     x1, x1, #8      // arr++
    cmp     x1, x2          // arr == end?
    b.ne    .L3             // if not, loop
    ret
.L4:
    mov     x0, #0          // return 0
    ret

a) ARM64 uses x0-x30 for general-purpose registers. Which registers are used here as the function arguments? (ARM64 passes the first argument in x0 and the second in x1; w1 is the lower 32 bits of x1.)

b) How does the structure of this loop compare to the x86-64 version? What is the same? What is different?

c) What does lsl #3 mean in add x2, x0, x2, lsl #3? What does the full instruction compute?

d) ARM64 instructions are all 32 bits (4 bytes) wide. How does this differ from x86-64's variable-length instructions? What are the tradeoffs?


Section E: Synthesis

Exercise 1.12 — Annotate the Compiler

For the following C function, compile with gcc -O1 -S (note: -O1, not -O2) and provide a complete annotated listing, explaining what each instruction does and why the compiler chose it:

int count_bits(unsigned int x) {
    int count = 0;
    while (x) {
        count += x & 1;
        x >>= 1;
    }
    return count;
}

Compare with gcc -O2 -S. Does the compiler use the popcnt instruction? Compile with gcc -O2 -mpopcnt -S or gcc -O2 -march=native -S. What happens?
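
When you compare the loop against a popcnt-based build, it helps to know that both compute the same result. The following is a minimal sketch that checks count_bits against the GCC/Clang builtin __builtin_popcount, assuming a 32-bit unsigned int. The helper check_count_bits is hypothetical, not part of the chapter.

```c
#include <assert.h>

/* Function from the exercise. */
int count_bits(unsigned int x) {
    int count = 0;
    while (x) {
        count += x & 1;
        x >>= 1;
    }
    return count;
}

/* Hypothetical helper: compares the loop with the compiler builtin.
   Multiplying by a large odd constant spreads the test values across
   all 32 bits (assumes unsigned int is 32 bits wide). */
void check_count_bits(void) {
    for (unsigned int i = 0; i < 100000u; i++) {
        unsigned int x = i * 2654435761u;
        assert(count_bits(x) == __builtin_popcount(x));
    }
    assert(count_bits(0xFFFFFFFFu) == 32);
}
```

Whether __builtin_popcount itself becomes a single popcnt instruction or a call to a library routine depends on the same -mpopcnt and -march flags the exercise asks you to try.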


Exercise 1.13 — Mental Model Check

Answer the following questions without compiling anything (these test conceptual understanding):

a) A C function takes no arguments and returns void. What is the minimum number of bytes its x86-64 machine code can occupy? (Hint: what's the minimum function you could write?)

b) The statement "assembly language is slow" is often heard from programmers. Is this true? Explain carefully — in what sense might it be true, and in what sense is it clearly false?

c) Why does x86-64 support variable-length instructions while ARM64 uses fixed 32-bit instructions? What are the practical consequences for the programmer?

d) A programmer claims they never need to read assembly because they use -O3 and trust the compiler. Name three concrete situations where this position would fail them.


Exercise 1.14 — The Full Binary

After completing Exercise 1.4, use xxd hello | head -20 to look at the first 20 lines of the binary in hex.

a) The first 4 bytes of an ELF file are 7f 45 4c 46. What is the human-readable interpretation of these bytes? (Hint: 0x45 = E, 0x4c = L, 0x46 = F.)

b) Find the sequence of bytes in the binary that corresponds to the string "Hello, World!". Is the trailing \n (byte 0a) stored with it? What byte terminates the string? (Compare with your answer to Exercise 1.4b.)

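c) The entry point is listed in the ELF header from readelf -h. It is a virtual address, not a file offset, so use readelf -S hello to find the address and offset of .text and translate it. Find the bytes at that file offset. Do they correspond to the first instruction of main from your objdump output, or to some other symbol?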


Exercise 1.15 — Preview: The MinOS Project

Read the MinOS project description in the chapter. Before writing any code, answer these planning questions:

a) A BIOS boots in 16-bit real mode and loads the bootloader at physical address 0x7C00. What is the maximum address reachable with a 16-bit address? (Calculate: 2^16 - 1 in hex.)

b) To run 64-bit code, the CPU must be switched from real mode to "long mode." This requires setting up a Global Descriptor Table (GDT) and enabling specific bits in control registers. Without knowing the details yet, what information would a GDT entry need to describe a memory segment?

c) Why might a bootloader be written in assembly rather than C? Give at least three reasons.

d) QEMU is used to test the kernel without real hardware. What advantages does QEMU provide for OS development compared to testing on physical hardware?