Chapter 3 Exercises: The x86-64 Architecture
Section A: Register Aliasing
Exercise 3.1 — Predict the Register State
For each sequence of instructions, give the final value of the full 64-bit register (RAX, RBX, etc.) in hexadecimal. Show your work.
a)
mov rax, 0x0102030405060708
mov eax, 0x0A0B0C0D
; What is rax?
b)
mov rax, 0x0102030405060708
mov ax, 0xAAAA
; What is rax?
c)
mov rax, 0x0102030405060708
mov al, 0xFF
; What is rax?
d)
mov rax, 0x0102030405060708
mov ah, 0xFF
; What is rax?
e)
mov rbx, 0xFFFFFFFFFFFFFFFF
mov r8d, 0x12345678
mov ebx, r8d
; What is rbx?
f)
mov rax, 0xDEADBEEFCAFEBABE
xor eax, eax
; What is rax?
g) (Tricky)
mov rax, 0xDEADBEEFCAFEBABE
xor ax, ax
; What is rax?
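To check your hand-worked answers, the sub-register write rules can be modeled in a few lines of Python. This is a minimal sketch, not an emulator: the function name `write_subreg` and the `part` labels are invented for illustration, and the starting value is deliberately different from the ones in the exercise.

```python
MASK64 = (1 << 64) - 1

def write_subreg(full, value, part):
    """Return the 64-bit register after writing `value` to one of its parts.

    part is a hypothetical label: "r64", "r32", "r16",
    "r8l" (e.g. AL) or "r8h" (e.g. AH).
    """
    if part == "r64":
        return value & MASK64
    if part == "r32":
        # A 32-bit write zero-extends: bits 63..32 are cleared.
        return value & 0xFFFFFFFF
    if part == "r16":
        # A 16-bit write leaves bits 63..16 untouched.
        return (full & ~0xFFFF & MASK64) | (value & 0xFFFF)
    if part == "r8l":
        # A low-byte write leaves bits 63..8 untouched.
        return (full & ~0xFF & MASK64) | (value & 0xFF)
    if part == "r8h":
        # High-byte registers (AH, BH, CH, DH) occupy bits 15..8.
        return (full & ~0xFF00 & MASK64) | ((value & 0xFF) << 8)
    raise ValueError(f"unknown sub-register kind: {part}")

start = 0x1111222233334444
print(hex(write_subreg(start, 0xAABBCCDD, "r32")))  # 0xaabbccdd
print(hex(write_subreg(start, 0xBEEF, "r16")))      # 0x111122223333beef
print(hex(write_subreg(start, 0x99, "r8h")))        # 0x1111222233339944
```

Work each part on paper first, then feed your own starting value and immediate into the model to compare.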
Exercise 3.2 — The Aliasing Bug Hunt
The following function is supposed to copy a 64-bit value from RDI into RAX and return it. Find all aliasing bugs:
buggy_copy:
push rbp
mov rbp, rsp
; Copy rdi to rax in three steps (the wrong way):
mov eax, 0 ; step 1: clear rax
mov edx, edi ; step 2: copy low 32 bits to edx
mov eax, edx ; step 3: put in rax
pop rbp
ret
a) What value does this function return if RDI = 0x00000000FFFFFFFF?
b) What value does it return if RDI = 0x0000000100000000?
c) Rewrite the function correctly in 3 ways:
- Using a single MOV instruction
- Using XOR + OR to combine high and low halves (academic exercise)
- Using the zero-extension property of 32-bit writes (if the value fits in 32 bits)
Exercise 3.3 — Compiler Exploitation of the Aliasing Rule
Compile the following C function with gcc -O2 -S:
unsigned long zero_extend_32(unsigned int x) {
return (unsigned long)x;
}
a) How many instructions does the compiler emit?
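b) Does it use an explicit zero-extending instruction such as MOVZX, or a plain mov eax, edi (or something else)? Why does it prefer one form? (Hint: check whether movzx rax, edi can be encoded at all.)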
c) Now compile this:
long sign_extend_32(int x) {
return (long)x;
}
What instruction does it use now? Why is the instruction different?
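The two C conversions differ only in how the upper 32 bits of the result are filled. The sketch below models that at the bit level in Python; the helper names mirror the C functions above, but the code is an illustration of the arithmetic, not of what any particular compiler emits.

```python
def zero_extend_32(x):
    """Widen a 32-bit pattern to 64 bits, filling the upper half with zeros."""
    return x & 0xFFFFFFFF

def sign_extend_32(x):
    """Widen a 32-bit pattern to 64 bits, copying bit 31 into bits 63..32."""
    x &= 0xFFFFFFFF
    if x & 0x80000000:
        return x | 0xFFFFFFFF00000000
    return x

pattern = 0x80000001  # bit 31 set, so the two extensions disagree
print(hex(zero_extend_32(pattern)))  # 0x80000001
print(hex(sign_extend_32(pattern)))  # 0xffffffff80000001
```

For any pattern with bit 31 clear the two functions agree, which is why the difference only shows up for negative `int` values.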
Section B: Register Identification
Exercise 3.4 — Register Name Quiz
Fill in the table. For each 64-bit register, provide the names of all sub-registers:
| 64-bit | 32-bit | 16-bit | High Byte | Low Byte |
|---|---|---|---|---|
| RAX | | | | |
| RBX | | | | |
| RCX | | | | |
| RDX | | | | |
| RSI | | | N/A | |
| RDI | | | N/A | |
| RSP | | | N/A | |
| RBP | | | N/A | |
| R8 | | | N/A | |
| R12 | | | N/A | |
| R15 | | | N/A | |
Exercise 3.5 — Calling Convention
For each of the following function signatures (System V AMD64 ABI), identify which register holds each parameter at the start of the function body:
a) void f(int a, int b, int c)
b) long g(long x, long y, long z, long w, long v, long u)
c) void h(int a, long b, char c, short d) (all integers, different sizes)
d) long k(long a, long b, long c, long d, long e, long f, long g) (7 args — where does g go?)
For (d), describe how the 7th argument is passed. What does the callee need to do to access it?
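Once you have worked the answers by hand, a small checker can confirm them. The sketch below assumes every argument is of INTEGER class (integers and pointers only, no floats or structs) and reports stack locations relative to RSP at function entry, before any prologue runs. The function name `assign_int_args` and the parameter names `p1`..`p8` are made up for illustration.

```python
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def assign_int_args(names):
    """Map integer/pointer argument names to System V AMD64 locations.

    Assumption: all arguments are INTEGER class. Stack locations are
    relative to RSP on entry, where [rsp] holds the return address.
    """
    locations = {}
    stack_slot = 0
    for i, name in enumerate(names):
        if i < len(INT_ARG_REGS):
            locations[name] = INT_ARG_REGS[i]
        else:
            # Stack arguments occupy 8-byte slots just above the return address.
            locations[name] = f"[rsp+{8 + 8 * stack_slot}]"
            stack_slot += 1
    return locations

for name, where in assign_int_args([f"p{i}" for i in range(1, 9)]).items():
    print(name, "->", where)
```

If the callee sets up a frame with push rbp / mov rbp, rsp, each stack offset grows by 8 when expressed relative to RBP, because the saved RBP now sits below the return address.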
Exercise 3.6 — Callee-Saved Register Discipline
The following function uses R12, R13, and R14 for its own purposes but must preserve them:
; computes: result = a * b + c * d
; args: rdi=a, rsi=b, rdx=c, rcx=d
; returns: rax
quad_product:
push r12
push r13
push r14
mov r12, rdi ; r12 = a
mov r13, rsi ; r13 = b
mov r14, rdx ; r14 = c
; (rcx = d, will use directly)
; Compute a * b:
mov rax, r12
imul rax, r13 ; rax = a * b
; Save intermediate result:
push rax
; Compute c * d:
mov rax, r14
imul rax, rcx ; rax = c * d
; Add a*b + c*d:
pop rbx ; rbx = a * b <-- PROBLEM: Is RBX caller-saved or callee-saved?
add rax, rbx
pop r14
pop r13
pop r12
ret
a) Is there a bug in this function? Identify it.
b) Fix the bug without using PUSH/POP for the intermediate result (use a callee-saved register instead).
c) Does this function maintain the required 16-byte stack alignment? The function is called from another function. Before the first PUSH, the stack has the return address on it (8 bytes). After three PUSH instructions, how many bytes are on the stack? Is RSP 16-byte aligned?
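The alignment question comes down to counting bytes pushed since the caller's aligned RSP. A minimal sketch of that bookkeeping follows; it assumes RSP was 16-byte aligned immediately before the CALL, as the System V ABI requires, and the function name `rsp_mod16_after` is hypothetical.

```python
def rsp_mod16_after(pushes, sub_bytes=0):
    """Return RSP modulo 16 inside a callee.

    Assumption: RSP was a multiple of 16 right before the CALL.
    """
    offset = 8            # return address pushed by CALL
    offset += 8 * pushes  # each PUSH of a 64-bit register
    offset += sub_bytes   # optional `sub rsp, N` for locals
    return (-offset) % 16

print(rsp_mod16_after(1))               # 0: aligned
print(rsp_mod16_after(2))               # 8: misaligned
print(rsp_mod16_after(2, sub_bytes=8))  # 0: aligned again
```

A result of 0 means RSP is 16-byte aligned at that point, so a nested CALL is safe; a result of 8 means one more 8-byte adjustment is needed first.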
Section C: RFLAGS and Condition Codes
Exercise 3.7 — RFLAGS Bit Manipulation
Write a NASM function that reads RFLAGS into RAX, then:
a) Clears the Trap Flag (bit 8)
b) Sets the Alignment Check bit (bit 18)
c) Returns the modified value in RAX (without storing it back to RFLAGS)
modify_flags:
; Your code here
ret
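The masking arithmetic behind this exercise can be rehearsed in Python before writing the assembly. The sketch below uses different bits from the ones asked for (IF at bit 9 and DF at bit 10) and a sample flags value of 0x246 chosen for illustration; the helper names are invented.

```python
def set_bit(value, n):
    """Set bit n: OR with a mask that has only bit n set."""
    return value | (1 << n)

def clear_bit(value, n):
    """Clear bit n: AND with the inverse of the single-bit mask."""
    return value & ~(1 << n)

def bit_is_set(value, n):
    """Return True if bit n of value is 1."""
    return bool((value >> n) & 1)

flags = 0x246                      # sample value: IF, ZF, PF and reserved bit 1 set
print(hex(clear_bit(flags, 9)))    # 0x46  (IF cleared)
print(hex(set_bit(flags, 10)))     # 0x646 (DF set)
print(bit_is_set(flags, 6))        # True  (ZF)
```

In NASM the same two operations map naturally onto AND with an inverted mask and OR with the mask; BTR and BTS are an alternative.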
Exercise 3.8 — Condition Code Prediction
For each cmp instruction followed by conditional jumps, state which jumps are taken. Assume the comparison values are as shown:
; (a)
mov rax, 0xFF
mov rbx, 0x01
cmp rax, rbx
; Is je taken? jne? ja? jb? jg? jl?
; (b)
mov eax, -1 ; 0xFFFFFFFF
mov ebx, 1
cmp eax, ebx
; Is ja taken? jb? jg? jl? jae? jbe?
; (c)
mov rax, 0
sub rax, 1 ; What is rax now? What flags are set?
cmp rax, 0
; Is jz taken? jl? jb? js?
; (d) -- a loop termination check
mov rcx, 10
.loop:
dec rcx
jnz .loop
; After the loop, what value does rcx contain? What flag caused the exit?
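To check your predictions, the flag results of CMP and the conditions tested by each jump can be modeled directly. This is a sketch under stated assumptions: it covers only ZF, CF, SF, and OF, the names `cmp_flags` and `taken` are hypothetical, and the example uses an 8-bit comparison that does not appear in the exercise.

```python
def cmp_flags(a, b, bits=64):
    """Model the ZF/CF/SF/OF results of `cmp a, b` (computes a - b)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,
        "CF": a < b,                                # unsigned borrow
        "SF": bool(result & sign),
        "OF": bool((a ^ b) & (a ^ result) & sign),  # signed overflow
    }

def taken(jcc, f):
    """Return True if the conditional jump `jcc` is taken for flags f."""
    table = {
        "je": f["ZF"], "jz": f["ZF"], "jne": not f["ZF"], "jnz": not f["ZF"],
        "ja": not f["CF"] and not f["ZF"], "jae": not f["CF"],
        "jb": f["CF"], "jbe": f["CF"] or f["ZF"],
        "jg": not f["ZF"] and f["SF"] == f["OF"], "jge": f["SF"] == f["OF"],
        "jl": f["SF"] != f["OF"], "jle": f["ZF"] or f["SF"] != f["OF"],
        "js": f["SF"],
    }
    return table[jcc]

# 8-bit example: 0x80 is 128 unsigned but -128 signed.
f = cmp_flags(0x80, 0x01, bits=8)
for jcc in ("ja", "jb", "jg", "jl"):
    print(jcc, taken(jcc, f))
```

The example shows the same bit pattern being "above" in the unsigned sense and "less" in the signed sense, which is the distinction parts (a) through (c) probe.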
Exercise 3.9 — Reading RFLAGS with PUSHFQ
Write a NASM program that:
1. Performs 0x7FFFFFFFFFFFFFFF + 1 (which causes signed overflow)
2. Immediately saves RFLAGS to memory using PUSHFQ
3. Reads the saved flags back and checks bit 11 (OF)
4. Prints "Overflow!" if bit 11 is set, "No overflow" otherwise
section .data
msg_overflow db "Overflow!", 10
msg_overflow_len equ $ - msg_overflow
msg_no_overflow db "No overflow", 10
msg_no_overflow_len equ $ - msg_no_overflow
section .bss
saved_flags resq 1 ; space for RFLAGS
section .text
global _start
_start:
; Your code here:
; 1. Do the arithmetic that causes overflow
; 2. PUSHFQ to save flags on stack
; 3. POP the flags into saved_flags
; 4. AND with (1 << 11) to isolate OF
; 5. Print appropriate message
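The check in steps 3 and 4 is a single-bit test on a saved flags word. The sketch below models an ADD at a chosen width and builds a flags word containing only CF (bit 0) and OF (bit 11); every other flag is omitted for brevity. The function name `add_with_flags` is hypothetical, and the example uses 32-bit operands rather than the 64-bit case in the exercise.

```python
CF_BIT = 0
OF_BIT = 11

def add_with_flags(a, b, bits=64):
    """Return (result, flags) for `add a, b`, modeling only CF and OF."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    carry = total > mask
    # Signed overflow: both operands share a sign and the result's sign differs.
    overflow = bool(~(a ^ b) & (a ^ result) & sign)
    flags = (int(carry) << CF_BIT) | (int(overflow) << OF_BIT)
    return result, flags

result, flags = add_with_flags(0x7FFFFFFF, 1, bits=32)
print(hex(result), hex(flags))
print("Overflow!" if flags & (1 << OF_BIT) else "No overflow")
```

Comparing 0x7FFFFFFF + 1 with 0xFFFFFFFF + 1 at 32 bits shows that signed overflow (OF) and unsigned carry (CF) are independent conditions.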
Section D: SIMD and Architecture Extensions
Exercise 3.10 — CPUID Detection
Write a complete NASM program that uses CPUID to detect and report the following CPU features, printing "supported" or "not supported" for each:
- SSE4.2 (CPUID leaf 1, ECX bit 20)
- AVX (CPUID leaf 1, ECX bit 28)
- AVX2 (CPUID leaf 7, sub-leaf 0, EBX bit 5)
- AES-NI (CPUID leaf 1, ECX bit 25)
Template:
section .data
msg_sse42 db "SSE4.2: "
...
section .text
global _start
check_features:
; CPUID leaf 1 for SSE4.2, AVX, AES-NI:
mov eax, 1
cpuid
; ECX bits: 25=AES-NI, 28=AVX, 20=SSE4.2
; Store ECX for later use
mov r12, rcx
; CPUID leaf 7 for AVX2:
mov eax, 7
xor ecx, ecx
cpuid
; EBX bit 5 = AVX2
...
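The decoding step after each CPUID call is plain bit testing, which can be prototyped outside assembly. The sketch below uses the bit positions listed in the exercise; the register values are made up for the example and do not describe any real CPU, and `decode_features` is a hypothetical name.

```python
LEAF1_ECX_BITS = {"SSE4.2": 20, "AES-NI": 25, "AVX": 28}
LEAF7_EBX_BITS = {"AVX2": 5}

def decode_features(leaf1_ecx, leaf7_ebx):
    """Turn raw CPUID register values into a feature -> bool report."""
    report = {}
    for name, bit in LEAF1_ECX_BITS.items():
        report[name] = bool((leaf1_ecx >> bit) & 1)
    for name, bit in LEAF7_EBX_BITS.items():
        report[name] = bool((leaf7_ebx >> bit) & 1)
    return report

# Made-up register contents: SSE4.2 and AES-NI bits set, nothing else.
sample_ecx = (1 << 20) | (1 << 25)
sample_ebx = 0
for name, ok in decode_features(sample_ecx, sample_ebx).items():
    print(f"{name}: {'supported' if ok else 'not supported'}")
```

In the NASM version each lookup becomes a TEST or BT against the saved register, followed by a branch to the matching message.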
Exercise 3.11 — XMM Register Basics
Without writing any SIMD code, answer these questions about the XMM register file:
a) How many XMM registers are there in x86-64 (baseline, without AVX-512)?
b) Each XMM register is 128 bits wide. How many 32-bit floats can a single XMM register hold simultaneously?
c) How many 64-bit doubles? How many 64-bit integers?
d) An instruction ADDPS xmm0, xmm1 adds packed single-precision floats. How many additions does this perform simultaneously?
e) An instruction ADDPD xmm0, xmm1 adds packed double-precision floats. How many additions?
f) With AVX (YMM registers, 256 bits), how many single-precision additions does VADDPS ymm0, ymm1, ymm2 perform?
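All of these questions follow one rule: the lane count is the register width divided by the element width. The sketch below applies that rule to cases not asked above (512-bit ZMM registers and 16-bit integers in XMM), so it illustrates the method without filling in the answers. The name `lanes` is hypothetical.

```python
import struct

def lanes(register_bits, element_bits):
    """Number of packed elements that fit in one SIMD register."""
    if register_bits % element_bits:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits

print(lanes(512, 32))  # 32-bit elements in a 512-bit ZMM register
print(lanes(128, 16))  # 16-bit integers in a 128-bit XMM register

# Cross-check with actual packing: eight 16-bit integers occupy 16 bytes.
packed = struct.pack("<8h", *range(8))
print(len(packed) * 8)  # width in bits
```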
Section E: The Instruction Pipeline
Exercise 3.12 — Data Hazards
Given the following instruction sequence, identify any data hazards (where one instruction needs the result of an earlier one):
; Sequence A:
mov rax, [rdi] ; (1) load value from memory
add rax, rbx ; (2) add to loaded value
mov [rsi], rax ; (3) store result
; Sequence B (similar instructions, different registers):
mov rcx, [rdx] ; (1) load from different address
add rax, rbx ; (2) add -- does (2) depend on (1)?
mov [rsi], rax ; (3) does (3) depend on (2)?
mov rdi, rcx ; (4) does (4) depend on (1)?
For sequence A: which instruction pairs have a data hazard? For sequence B: which instruction pairs have a data hazard? Which are independent?
Explain how a modern out-of-order CPU would handle the independent instructions in sequence B.
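Read-after-write dependencies can be found mechanically by tracking the most recent writer of each register. The sketch below does that for a short sequence that is not one of the exercise sequences. It is a simplification: it tracks registers only and ignores memory and flag dependencies. The `raw_hazards` name and the tuple layout are invented for illustration.

```python
def raw_hazards(instrs):
    """Find register read-after-write dependencies.

    instrs: list of (label, writes, reads), where writes and reads are
    sets of register names. Returns (producer, consumer, register) tuples.
    """
    hazards = []
    last_writer = {}
    for label, writes, reads in instrs:
        for reg in sorted(reads):
            if reg in last_writer:
                hazards.append((last_writer[reg], label, reg))
        for reg in writes:
            last_writer[reg] = label
    return hazards

sequence = [
    (1, {"r8"}, {"r9"}),           # mov  r8, [r9]
    (2, {"r10"}, {"r10", "r11"}),  # imul r10, r11
    (3, {"r8"}, {"r8", "r10"}),    # add  r8, r10
]
for producer, consumer, reg in raw_hazards(sequence):
    print(f"({consumer}) needs {reg} from ({producer})")
```

Here instructions (1) and (2) share no registers, so an out-of-order core is free to start the multiply while the load is still in flight; only (3) must wait for both.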
Exercise 3.13 — Instruction Length Prediction
Using what you know about REX prefixes and encoding, predict whether each instruction will be encoded with or without a REX prefix, and estimate the instruction length:
mov eax, ebx ; (a) -- 32-bit, no REX needed?
mov rax, rbx ; (b) -- 64-bit, needs REX.W
mov r8, rbx ; (c) -- uses R8, needs REX.R or REX.B
add eax, 1 ; (d) -- 32-bit immediate add
add rax, 1 ; (e) -- 64-bit immediate add
push r12 ; (f) -- pushing extended register
ret ; (g) -- no operands
Assemble these (put them in a .text section) and use objdump -d to verify your predictions.
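The REX byte has the fixed layout 0100WRXB, so it can be computed from the operand size and the register numbers. The sketch below encodes one instruction that is not in the list above, mov r9, r10, assuming the MOV r/m64, r64 form (opcode 0x89); an assembler may legitimately choose the alternate 0x8B form, which swaps the roles of the two ModRM fields. All function names are hypothetical.

```python
def rex_prefix(w, r, x, b):
    """Build a REX byte: 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b

def modrm_reg_reg(reg, rm):
    """ModRM byte for register-direct addressing (mod = 11)."""
    return 0xC0 | ((reg & 7) << 3) | (rm & 7)

def encode_mov_r64_r64(dst, src):
    """Encode `mov dst, src` as MOV r/m64, r64; register numbers are 0..15."""
    # REX.R extends the ModRM reg field (src); REX.B extends r/m (dst).
    rex = rex_prefix(1, src >> 3, 0, dst >> 3)
    return bytes([rex, 0x89, modrm_reg_reg(src, dst)])

# r9 is register 9, r10 is register 10.
print(encode_mov_r64_r64(9, 10).hex())  # 4d89d1
```

The result is three bytes: one REX prefix, one opcode, one ModRM. Use the same breakdown to estimate each length in the list before checking with objdump.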
Section F: Synthesis
Exercise 3.14 — A Complete Function
Write a NASM function with the following specification:
- Name: vector_dot_product
- Arguments: rdi = int64_t *a, rsi = int64_t *b, rdx = int64_t n
- Returns: rax = sum of a[i] * b[i] for i in 0..n-1
- Must preserve all callee-saved registers
- Must maintain stack alignment
- Must handle n=0 correctly (return 0)
; vector_dot_product: compute dot product of two int64 arrays
; Args: rdi = a, rsi = b, rdx = n
; Returns: rax = dot product
vector_dot_product:
; Your implementation here
; Hints:
; - For each element: load a[i] into a scratch register, then IMUL it with [rsi + rcx*8]
; - Loop with a counter in rcx (or better: pointer arithmetic)
; - Accumulate in rax
; - If you use callee-saved registers for loop variables, save and restore them
ret
After writing the function, verify it with a test case: a=[1,2,3], b=[4,5,6], n=3, expected = 1*4 + 2*5 + 3*6 = 32.
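A reference model is useful for generating further test vectors. The sketch below mirrors the specification in Python, wrapping the running sum to a signed 64-bit value on the assumption that the assembly version uses two-operand IMUL and ADD, both of which keep only the low 64 bits. The names `wrap_int64` and `dot_product_ref` are hypothetical.

```python
def wrap_int64(x):
    """Reduce an arbitrary Python int to a signed 64-bit value."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >> 63 else x

def dot_product_ref(a, b, n):
    """Sum of a[i] * b[i] for i in 0..n-1, with 64-bit wraparound."""
    acc = 0
    for i in range(n):
        acc = wrap_int64(acc + a[i] * b[i])
    return acc

print(dot_product_ref([1, 2, 3], [4, 5, 6], 3))  # 32
print(dot_product_ref([], [], 0))                # 0
print(dot_product_ref([-2, 7], [3, -1], 2))      # -13
```

Vectors with negative elements and n = 0 are worth running against the assembly as well, since they exercise signed multiplication and the empty-loop path.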
Exercise 3.15 — GDB Register Inspection
Set up a short program with a deliberate aliasing issue:
section .text
global _start
_start:
mov rax, 0xDEADBEEFCAFEBABE
mov rbx, 0x1234567890ABCDEF
mov eax, ebx ; aliasing: does this set rax = rbx?
mov rax, 60
xor rdi, rdi
syscall
a) Before running: predict the value of RAX after the third instruction (mov eax, ebx)
b) Run the program in GDB with break _start, stepi through each instruction, and use info registers or p/x $rax to check after each step
c) Was your prediction correct?
d) What would you need to change to make RAX equal to RBX (the full 64-bit value)?