Chapter 3 Exercises: The x86-64 Architecture

Section A: Register Aliasing

Exercise 3.1 — Predict the Register State

For each sequence of instructions, give the final value of the full 64-bit register (RAX, RBX, etc.) in hexadecimal. Show your work.

a)

mov  rax, 0x0102030405060708
mov  eax, 0x0A0B0C0D
; What is rax?

b)

mov  rax, 0x0102030405060708
mov  ax, 0xAAAA
; What is rax?

c)

mov  rax, 0x0102030405060708
mov  al, 0xFF
; What is rax?

d)

mov  rax, 0x0102030405060708
mov  ah, 0xFF
; What is rax?

e)

mov  rbx, 0xFFFFFFFFFFFFFFFF
mov  r8d, 0x12345678
mov  ebx, r8d
; What is rbx?

f)

mov  rax, 0xDEADBEEFCAFEBABE
xor  eax, eax
; What is rax?

g) (Tricky)

mov  rax, 0xDEADBEEFCAFEBABE
xor  ax, ax
; What is rax?
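
After working each part by hand, you can check your answers against a small model of the sub-register write rules. The following is a minimal sketch in C, not part of the exercise itself; the function names (write32, write16, write8_low, write8_high) are hypothetical helpers introduced here for illustration. It assumes the standard x86-64 behavior: a 32-bit write zero-extends into the full register, while 16-bit and 8-bit writes leave all other bits untouched.

```c
#include <stdint.h>

/* Model of a 32-bit destination write (e.g. mov eax, imm32):
   the upper 32 bits of the 64-bit register are cleared. */
uint64_t write32(uint64_t reg, uint32_t v) {
    (void)reg;                 /* old contents are discarded entirely */
    return (uint64_t)v;
}

/* Model of a 16-bit write (e.g. mov ax, imm16): bits 63..16 are preserved. */
uint64_t write16(uint64_t reg, uint16_t v) {
    return (reg & ~0xFFFFull) | v;
}

/* Model of a low-byte write (e.g. mov al, imm8): bits 63..8 are preserved. */
uint64_t write8_low(uint64_t reg, uint8_t v) {
    return (reg & ~0xFFull) | v;
}

/* Model of a high-byte write (e.g. mov ah, imm8): only bits 15..8 change. */
uint64_t write8_high(uint64_t reg, uint8_t v) {
    return (reg & ~0xFF00ull) | ((uint64_t)v << 8);
}
```

For example, starting from the (arbitrary) value 0x1122334455667788, write16 with 0xBEEF yields 0x112233445566BEEF, while write32 with 0xCAFEF00D yields 0x00000000CAFEF00D.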

Exercise 3.2 — The Aliasing Bug Hunt

The following function is supposed to copy a 64-bit value from RDI into RAX and return it. Find all aliasing bugs:

buggy_copy:
    push rbp
    mov  rbp, rsp

    ; Copy rdi to rax in three steps (the wrong way):
    mov  eax, 0            ; step 1: clear rax
    mov  edx, edi          ; step 2: copy low 32 bits to edx
    mov  eax, edx          ; step 3: put in rax

    pop  rbp
    ret

a) What value does this function return if RDI = 0x00000000FFFFFFFF?
b) What value does it return if RDI = 0x0000000100000000?
c) Rewrite the function correctly in three ways:
   • Using a single MOV instruction
   • Using XOR + OR to combine high and low halves (academic exercise)
   • Using the zero-extension property of 32-bit writes (if the value fits in 32 bits)


Exercise 3.3 — Compiler Exploitation of the Aliasing Rule

Compile the following C function with gcc -O2 -S:

unsigned long zero_extend_32(unsigned int x) {
    return (unsigned long)x;
}

a) How many instructions does the compiler emit?
b) Does it use movzx rax, edi or mov eax, edi (or something else)? Why does it prefer one form?
c) Now compile this:

long sign_extend_32(int x) {
    return (long)x;
}

What instruction does it use now? Why is the instruction different?
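
To see what the two conversions do at the value level, independent of which instruction the compiler picks, here is a minimal sketch. It reuses the exercise's function names but substitutes fixed-width types (uint32_t, int64_t, and so on) so the widths do not depend on the platform; that substitution is an assumption made for illustration.

```c
#include <stdint.h>

/* Unsigned widening: the upper 32 bits of the result are always zero. */
uint64_t zero_extend_32(uint32_t x) {
    return (uint64_t)x;
}

/* Signed widening: the upper 32 bits are copies of bit 31 of x. */
int64_t sign_extend_32(int32_t x) {
    return (int64_t)x;
}
```

With the bit pattern 0x80000000, the unsigned version returns 0x0000000080000000, while the signed version (given INT32_MIN) returns a value whose bit pattern is 0xFFFFFFFF80000000.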


Section B: Register Identification

Exercise 3.4 — Register Name Quiz

Fill in the table. For each 64-bit register, provide the names of all sub-registers:

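64-bit   32-bit   16-bit   High Byte   Low Byte
RAX      ____     ____     ____        ____
RBX      ____     ____     ____        ____
RCX      ____     ____     ____        ____
RDX      ____     ____     ____        ____
RSI      ____     ____     N/A         ____
RDI      ____     ____     N/A         ____
RSP      ____     ____     N/A         ____
RBP      ____     ____     N/A         ____
R8       ____     ____     N/A         ____
R12      ____     ____     N/A         ____
R15      ____     ____     N/A         ____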

Exercise 3.5 — Calling Convention

For each of the following function signatures (System V AMD64 ABI), identify which register holds each parameter at the start of the function body:

a) void f(int a, int b, int c)
b) long g(long x, long y, long z, long w, long v, long u)
c) void h(int a, long b, char c, short d) (all integers, different sizes)
d) long k(long a, long b, long c, long d, long e, long f, long g) (7 args: where does g go?)

For (d), describe how the 7th argument is passed. What does the callee need to do to access it?


Exercise 3.6 — Callee-Saved Register Discipline

The following function uses R12, R13, and R14 for its own purposes but must preserve them:

; computes: result = a * b + c * d
; args: rdi=a, rsi=b, rdx=c, rcx=d
; returns: rax
quad_product:
    push r12
    push r13
    push r14

    mov  r12, rdi    ; r12 = a
    mov  r13, rsi    ; r13 = b
    mov  r14, rdx    ; r14 = c
    ; (rcx = d, will use directly)

    ; Compute a * b:
    mov  rax, r12
    imul rax, r13    ; rax = a * b

    ; Save intermediate result:
    push rax

    ; Compute c * d:
    mov  rax, r14
    imul rax, rcx    ; rax = c * d

    ; Add a*b + c*d:
    pop  rbx         ; rbx = a * b  <-- PROBLEM: Is RBX caller-saved or callee-saved?
    add  rax, rbx

    pop  r14
    pop  r13
    pop  r12
    ret

a) Is there a bug in this function? Identify it.
b) Fix the bug without using PUSH/POP for the intermediate result (use a callee-saved register instead).
c) Does this function maintain the required 16-byte stack alignment? Assume it is called from another function and that RSP was 16-byte aligned immediately before the CALL, so on entry the return address (8 bytes) is already on the stack. After the three PUSH instructions, how many bytes have been pushed in total, including the return address? Is RSP 16-byte aligned at that point?
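
The alignment arithmetic in part (c) can be checked mechanically. The sketch below is a small C model, assuming the System V rule that RSP is a multiple of 16 at the CALL instruction; the helper names and the sample address 0x7000 are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* rsp_before_call is assumed to be a multiple of 16 (ABI rule at the CALL).
   CALL pushes an 8-byte return address; each PUSH subtracts another 8. */
uint64_t rsp_after_pushes(uint64_t rsp_before_call, unsigned pushes) {
    uint64_t rsp = rsp_before_call - 8;   /* return address */
    return rsp - 8ull * pushes;           /* callee's PUSH instructions */
}

/* True when the low four bits of RSP are zero. */
bool is_16_aligned(uint64_t rsp) {
    return (rsp & 0xF) == 0;
}
```

For instance, with a caller RSP of 0x7000, one PUSH leaves RSP at 0x6FF0 (aligned), while two leave it at 0x6FE8 (not aligned). Apply the same reasoning to the function above.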


Section C: RFLAGS and Condition Codes

Exercise 3.7 — RFLAGS Bit Manipulation

Write a NASM function that reads RFLAGS into RAX, then:

a) Clears the Trap Flag (bit 8)
b) Sets the Alignment Check bit (bit 18)
c) Returns the modified value in RAX (without storing it back to RFLAGS)

modify_flags:
    ; Your code here
    ret
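
To check the value your NASM function returns, it may help to have the same bit operations written out in C. This is only a sketch of the masking logic, not of how to read RFLAGS; the name modify_flags_value and the macro names are hypothetical.

```c
#include <stdint.h>

#define FLAG_TF (1ull << 8)    /* Trap Flag, bit 8 */
#define FLAG_AC (1ull << 18)   /* Alignment Check, bit 18 */

/* Given a copy of RFLAGS, clear TF and set AC, leaving other bits alone. */
uint64_t modify_flags_value(uint64_t flags) {
    flags &= ~FLAG_TF;   /* AND with the inverted mask clears the bit */
    flags |= FLAG_AC;    /* OR with the mask sets the bit */
    return flags;
}
```

For an input of 0x346 (which has bit 8 set), the result is 0x40246; for 0x202, the result is 0x40202.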

Exercise 3.8 — Condition Code Prediction

For each cmp instruction followed by conditional jumps, state which jumps are taken. Assume the comparison values are as shown:

; (a)
mov  rax, 0xFF
mov  rbx, 0x01
cmp  rax, rbx
; Is je taken? jne? ja? jb? jg? jl?

; (b)
mov  eax, -1        ; 0xFFFFFFFF
mov  ebx, 1
cmp  eax, ebx
; Is ja taken? jb? jg? jl? jae? jbe?

; (c)
mov  rax, 0
sub  rax, 1         ; What is rax now? What flags are set?
cmp  rax, 0
; Is jz taken? jl? jb? js?

; (d) -- a loop termination check
mov  rcx, 10
.loop:
    dec  rcx
    jnz  .loop
; After the loop, what value does rcx contain? What flag caused the exit?
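
One way to check your predictions is to model how CMP sets the flags and how each jump reads them. The sketch below is an assumption-laden C model for 64-bit operands only; the Flags struct and the cond_* helper names are hypothetical. It computes CF, ZF, SF, and OF for a - b and evaluates four of the conditions using their standard definitions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } Flags;

/* Flags produced by a 64-bit "cmp a, b", which computes a - b. */
Flags cmp64(uint64_t a, uint64_t b) {
    uint64_t r = a - b;                        /* wraps modulo 2^64 */
    Flags f;
    f.cf = a < b;                              /* unsigned borrow */
    f.zf = (r == 0);
    f.sf = (r >> 63) & 1;                      /* sign bit of the result */
    f.of = (((a ^ b) & (a ^ r)) >> 63) & 1;    /* signed overflow on subtract */
    return f;
}

bool cond_ja(Flags f) { return !f.cf && !f.zf; }       /* unsigned above */
bool cond_jb(Flags f) { return f.cf; }                 /* unsigned below */
bool cond_jg(Flags f) { return !f.zf && f.sf == f.of; } /* signed greater */
bool cond_jl(Flags f) { return f.sf != f.of; }         /* signed less */
```

Comparing (uint64_t)-3 with 2 in this model gives ja taken and jl taken at the same time, which is the unsigned/signed split the exercise is probing.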

Exercise 3.9 — Reading RFLAGS with PUSHFQ

Write a NASM program that:

1. Performs 0x7FFFFFFFFFFFFFFF + 1 (which causes signed overflow)
2. Immediately saves RFLAGS to memory using PUSHFQ
3. Reads the saved flags back and checks bit 11 (OF)
4. Prints "Overflow!" if bit 11 is set, "No overflow" otherwise

section .data
    msg_overflow    db "Overflow!", 10
    msg_overflow_len equ $ - msg_overflow
    msg_no_overflow db "No overflow", 10
    msg_no_overflow_len equ $ - msg_no_overflow

section .bss
    saved_flags resq 1    ; space for RFLAGS

section .text
    global _start

_start:
    ; Your code here:
    ; 1. Do the arithmetic that causes overflow
    ; 2. PUSHFQ to save flags on stack
    ; 3. POP the flags into saved_flags
    ; 4. AND with (1 << 11) to isolate OF
    ; 5. Print appropriate message
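
Before running the assembly, you can sanity-check the expected outcome with a C model. The sketch below uses hypothetical helper names. It shows two things: how the overflow flag for a 64-bit ADD can be derived from the operand and result sign bits, and how bit 11 is isolated from a saved flags value.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_OF_BIT 11   /* position of OF in RFLAGS */

/* OF after a 64-bit add: set when both operands have the same sign
   and the result's sign differs from it. */
bool add_sets_of(uint64_t a, uint64_t b) {
    uint64_t r = a + b;                      /* wraps modulo 2^64 */
    return (((a ^ r) & (b ^ r)) >> 63) != 0;
}

/* Test bit 11 of a saved RFLAGS image. */
bool of_is_set(uint64_t rflags) {
    return ((rflags >> FLAG_OF_BIT) & 1u) != 0;
}
```

In this model, add_sets_of(0x7FFFFFFFFFFFFFFF, 1) is true, and of_is_set(0x800) is true because 0x800 is exactly 1 << 11.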

Section D: SIMD and Architecture Extensions

Exercise 3.10 — CPUID Detection

Write a complete NASM program that uses CPUID to detect and report the following CPU features, printing "supported" or "not supported" for each:

  • SSE4.2 (CPUID leaf 1, ECX bit 20)
  • AVX (CPUID leaf 1, ECX bit 28)
  • AVX2 (CPUID leaf 7, sub-leaf 0, EBX bit 5)
  • AES-NI (CPUID leaf 1, ECX bit 25)

Template:

section .data
    msg_sse42    db "SSE4.2: "
    ...

section .text
    global _start

check_features:
    ; CPUID leaf 1 for SSE4.2, AVX, AES-NI:
    mov  eax, 1
    cpuid
    ; ECX bits: 25=AES-NI, 28=AVX, 20=SSE4.2
    ; Store ECX for later use
    mov  r12, rcx

    ; CPUID leaf 7 for AVX2:
    mov  eax, 7
    xor  ecx, ecx
    cpuid
    ; EBX bit 5 = AVX2
    ...
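
The bit tests themselves are the same in any language. As a sketch of the decoding step only (it does not execute CPUID), the C helpers below take register values that CPUID would have returned and test the bits listed above. The function names and the sample register values are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the given bit of a 32-bit CPUID output register is set. */
bool bit_set(uint32_t reg, unsigned bit) {
    return ((reg >> bit) & 1u) != 0;
}

/* Leaf 1 results are reported in ECX. */
bool has_sse42(uint32_t leaf1_ecx) { return bit_set(leaf1_ecx, 20); }
bool has_aesni(uint32_t leaf1_ecx) { return bit_set(leaf1_ecx, 25); }
bool has_avx(uint32_t leaf1_ecx)   { return bit_set(leaf1_ecx, 28); }

/* Leaf 7, sub-leaf 0 results are reported in EBX. */
bool has_avx2(uint32_t leaf7_ebx)  { return bit_set(leaf7_ebx, 5); }
```

In NASM the equivalent of bit_set is typically a TEST or BT instruction followed by a conditional jump.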

Exercise 3.11 — XMM Register Basics

Without using any SIMD instructions, answer these questions about the XMM register file:

a) How many XMM registers are there in x86-64 (baseline, without AVX-512)?
b) Each XMM register is 128 bits wide. How many 32-bit floats can a single XMM register hold simultaneously?
c) How many 64-bit doubles? How many 64-bit integers?
d) An instruction ADDPS xmm0, xmm1 adds packed single-precision floats. How many additions does this perform simultaneously?
e) An instruction ADDPD xmm0, xmm1 adds packed double-precision floats. How many additions?
f) With AVX (YMM registers, 256 bits), how many single-precision additions does VADDPS ymm0, ymm1, ymm2 perform?


Section E: The Instruction Pipeline

Exercise 3.12 — Data Hazards

Given the following instruction sequences, identify any data hazards (cases where one instruction needs the result of an earlier one):

; Sequence A:
mov  rax, [rdi]       ; (1) load value from memory
add  rax, rbx         ; (2) add to loaded value
mov  [rsi], rax       ; (3) store result

; Sequence B (similar operations, different dependencies):
mov  rcx, [rdx]       ; (1) load from different address
add  rax, rbx         ; (2) add -- does (2) depend on (1)?
mov  [rsi], rax       ; (3) does (3) depend on (2)?
mov  rdi, rcx         ; (4) does (4) depend on (1)?

For sequence A: which instruction pairs have a data hazard? For sequence B: which instruction pairs have a data hazard? Which are independent?

Explain how a modern out-of-order CPU would handle the independent instructions in sequence B.


Exercise 3.13 — Instruction Length Prediction

Using what you know about REX prefixes and encoding, predict whether each instruction will be encoded with or without a REX prefix, and estimate the instruction length:

mov  eax, ebx      ; (a) -- 32-bit, no REX needed?
mov  rax, rbx      ; (b) -- 64-bit, needs REX.W
mov  r8, rbx       ; (c) -- 64-bit and uses R8: needs REX.W plus REX.R or REX.B
add  eax, 1        ; (d) -- 32-bit immediate add
add  rax, 1        ; (e) -- 64-bit immediate add
push r12           ; (f) -- pushing extended register
ret                ; (g) -- no operands

Assemble these (put them in a .text section) and use objdump -d to verify your predictions.
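
When reading the objdump output, it helps to be able to decode the REX byte by hand. A REX prefix has the fixed high nibble 0100 followed by the W, R, X, and B bits. The C sketch below builds that byte from the four bits; the function name rex_prefix is a hypothetical helper.

```c
#include <stdint.h>

/* Build a REX prefix byte: 0100WRXB.
   w: 64-bit operand size
   r: extends the ModRM reg field
   x: extends the SIB index field
   b: extends the ModRM r/m field, SIB base, or opcode register field */
uint8_t rex_prefix(unsigned w, unsigned r, unsigned x, unsigned b) {
    return (uint8_t)(0x40u | ((w & 1u) << 3) | ((r & 1u) << 2)
                           | ((x & 1u) << 1) | (b & 1u));
}
```

So REX.W alone is 0x48, REX.B alone is 0x41, and REX.W combined with REX.B is 0x49. Any prefix byte in the range 0x40 to 0x4F in your disassembly is a REX prefix.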


Section F: Synthesis

Exercise 3.14 — A Complete Function

Write a NASM function with the following specification:

  • Name: vector_dot_product
  • Arguments: rdi = int64_t *a, rsi = int64_t *b, rdx = int64_t n
  • Returns: rax = sum of a[i] * b[i] for i in 0..n-1
  • Must preserve all callee-saved registers
  • Must maintain stack alignment
  • Must handle n=0 correctly (return 0)

; vector_dot_product: compute dot product of two int64 arrays
; Args: rdi = a, rsi = b, rdx = n
; Returns: rax = dot product
vector_dot_product:
    ; Your implementation here
    ; Hints:
    ; - For each element: load a[i] into a scratch register, then IMUL it by [rsi + rcx*8]
    ; - Loop with a counter in rcx (or better: pointer arithmetic)
    ; - Accumulate in rax
    ; - Use callee-saved registers for loop variables
    ret

After writing the function, verify it with a test case: a=[1,2,3], b=[4,5,6], n=3, expected = 1*4 + 2*5 + 3*6 = 32.
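
A reference implementation in C gives you something to compare the assembly against. The sketch below is one possible reference, not the required solution; the name dot_product_ref is hypothetical. It accumulates in unsigned arithmetic so that overflow wraps modulo 2^64, matching what IMUL and ADD do in the assembly version (signed overflow would be undefined behavior in C).

```c
#include <stdint.h>

/* Reference dot product: sum of a[i] * b[i] for i in 0..n-1.
   Returns 0 when n <= 0. Arithmetic wraps modulo 2^64. */
int64_t dot_product_ref(const int64_t *a, const int64_t *b, int64_t n) {
    uint64_t acc = 0;
    for (int64_t i = 0; i < n; i++) {
        acc += (uint64_t)a[i] * (uint64_t)b[i];
    }
    return (int64_t)acc;   /* assumes two's-complement conversion */
}
```

To test the NASM version from C, declare it as extern int64_t vector_dot_product(const int64_t *, const int64_t *, int64_t); and compare its result with dot_product_ref on the same inputs.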


Exercise 3.15 — GDB Register Inspection

Set up a short program with a deliberate aliasing issue:

section .text
    global _start

_start:
    mov  rax, 0xDEADBEEFCAFEBABE
    mov  rbx, 0x1234567890ABCDEF
    mov  eax, ebx              ; aliasing: does this set rax = rbx?
    mov  rax, 60
    xor  rdi, rdi
    syscall

a) Before running: predict the value of RAX after the third instruction (mov eax, ebx).
b) Run the program in GDB with break _start, stepi through each instruction, and use info registers or p/x $rax to check after each step.
c) Was your prediction correct?
d) What would you need to change to make RAX equal to RBX (the full 64-bit value)?