Chapter 14 Exercises: Floating Point
Section A: SSE Scalar Basics
Exercise 1. Write NASM code using SSE2 scalar instructions to compute each expression. Inputs are in XMM0 (a) and XMM1 (b); write result to XMM0:
(a) a + b (double)
(b) a * a - b * b (float)
(c) sqrt(a * a + b * b) (double)
(d) a / b + b / a (double; use XMM2 as temporary)
Exercise 2. Write NASM code to implement the C function:
double clamp_double(double x, double lo, double hi) {
if (x < lo) return lo;
if (x > hi) return hi;
return x;
}
Use MINSD and MAXSD. Arguments: XMM0 = x, XMM1 = lo, XMM2 = hi. Return in XMM0.
Exercise 3. Convert each of the following using the appropriate SSE2 conversion instruction:
(a) int64_t in RAX → double in XMM0
(b) int32_t in EDI → float in XMM0
(c) double in XMM0 → int64_t in RAX (truncate toward zero, C cast semantics)
(d) float in XMM0 → double in XMM0 (widen)
(e) double in XMM0 → float in XMM0 (narrow)
Exercise 4.
Write a NASM function double int_pow(double base, int exp) that computes base^exp for non-negative integer exponents using a simple loop (multiply base exp times). Use SSE2 scalar instructions. Arguments: XMM0 = base, EDI = exp.
Exercise 5.
Implement double fabs_asm(double x) — absolute value of a double — using only bitwise operations on the XMM register (clear the sign bit). No SQRTSD or comparison needed.
Hint: The sign bit of a 64-bit double is bit 63. Use ANDPD with a mask.
Section B: Comparison and Branching
Exercise 6. Write NASM code to implement each comparison and branch:
(a) if (xmm0 > 0.0) goto positive
(b) if (xmm0 == xmm1) goto equal — but do it correctly with UCOMISD and the correct jump
(c) if (isnan(xmm0)) goto nan_handler
For (b), is je the right jump for floating-point equality? Explain.
Exercise 7. Write a NASM function implementing epsilon comparison:
int nearly_equal(double a, double b, double eps) {
return fabs(a - b) < eps;
}
Arguments: XMM0 = a, XMM1 = b, XMM2 = eps. Return 1 in EAX if nearly equal, 0 otherwise.
Exercise 8. The following code intends to find the maximum of an array of doubles. Find and fix the bug:
; Find max of double array: RDI = array, ESI = count, returns max in XMM0
array_max:
movsd xmm0, [rdi]
mov ecx, 1
.loop:
cmp ecx, esi
jge .done
movsd xmm1, [rdi + rcx*8]
ucomisd xmm0, xmm1
jge .skip ; BUG: what if xmm1 is NaN?
movsd xmm0, xmm1
.skip:
inc ecx
jmp .loop
.done:
ret
Section C: x87 FPU
Exercise 9. Translate the following x87 computation to equivalent SSE2 code:
; x87 version: computes sin²(x) + cos²(x) (should always equal 1.0)
fldl [rbx] ; load x
fsincos ; compute sin and cos
; After: ST(0) = cos(x), ST(1) = sin(x)
fmulp ; ST(0) = sin(x) * cos(x), wrong... need squares
; (complete the x87 computation first, then convert to SSE2)
First write the correct x87 version of sin²(x) + cos²(x), then convert it to SSE2 using a transcendental polynomial or call sin()/cos() from libm.
Exercise 10.
What is the output of FSIN when ST(0) = 0? When ST(0) = π/2? When ST(0) = π?
Describe how you would load π/2 using x87 stack instructions.
Section D: IEEE 754 Special Values
Exercise 11. For each operation, predict the IEEE 754 result (assuming no exceptions are masked would signal — but with default MXCSR they are masked and produce results):
(a) 1.0 / 0.0 (double)
(b) -1.0 / 0.0 (double)
(c) 0.0 / 0.0 (double)
(d) sqrt(-1.0) (double)
(e) 1.0 / 0.0 + 1.0 / 0.0 (double, both are +inf)
(f) 1.0 / 0.0 - 1.0 / 0.0 (double: +inf - +inf)
(g) NaN + 5.0
Exercise 12. Write NASM code to detect and handle each special case before performing floating-point division:
; safe_divide(double a, double b) → double
; Returns: a/b normally
; +inf if a > 0 and b == 0
; -inf if a < 0 and b == 0
; 0.0 if a == 0 and b == 0 (instead of NaN, return 0)
; XMM0 = a, XMM1 = b, return in XMM0
Exercise 13. Demonstrate the denormal performance problem:
Write two versions of a loop that computes x = x * 0.1 1000 times:
(a) Starting with x = 1.0 (stays normal for a few iterations, then becomes denormal)
(b) Starting with x = 1e-300 (immediately in denormal territory)
Add the FTZ (Flush-to-Zero) option and explain how it changes the behavior of (b).
Section E: Complete Functions
Exercise 14.
Implement double vector_dot_product(double *a, double *b, int n) in NASM using SSE2 scalar instructions. Sum the products a[i]*b[i] for i = 0..n-1.
Exercise 15.
Implement the Horner's method polynomial evaluator from the chapter as a complete NASM function with proper prologue/epilogue. Test with the polynomial 2x³ - 3x² + x - 5 at x = 1.5.
Coefficients (constant term first): [-5.0, 1.0, -3.0, 2.0], degree = 3.
Exercise 16.
Write a function void normalize_vector(double *v, int n) in NASM that:
1. Computes the length: len = sqrt(v[0]^2 + v[1]^2 + ... + v[n-1]^2)
2. Divides each element by len
Handle the edge case where len is zero (return without modification).
Exercise 17.
Implement the Newton-Raphson method for computing sqrt(a) without using SQRTSD. The iteration is: x_next = (x + a/x) / 2. Start with x = a/2 (or x = 1.0 for simplicity) and iterate until convergence (|x_next - x| < 1e-10).
Exercise 18 (MXCSR).
Write NASM code that:
(a) Saves the current MXCSR value
(b) Sets the rounding mode to "toward positive infinity" (ceil mode)
(c) Computes ceil(2.3) by converting double to int32 using CVTTSD2SI — but wait, will this work? What conversion instruction should you actually use to get the ceiling behavior?
(d) Restores the original MXCSR value
Exercise 19 (Challenge: Reciprocal Square Root).
Implement double rsqrt(double x) using the Newton-Raphson refinement of the SSE approximation:
y = initial_estimate(x) // hardware: RSQRTSS for float
y = y * (1.5 - 0.5 * x * y * y) // one Newton-Raphson step
Note: RSQRTSS is for float (single precision). For double, start with RSQRTSS and then apply two Newton-Raphson steps.
Exercise 20 (Challenge: Fixed-Point vs. Floating-Point). A financial system needs to add 1 cent (0.01) one million times to an account starting at 0.00. Implement both:
(a) Floating-point version (addsd in a loop): start with 0.0, add 0.01 in each iteration
(b) Fixed-point version: use int64_t cents (integer), add 1 each time
Compare the final results. The floating-point version will not produce exactly 10000.00. Quantify the error.
Write both as NASM functions and discuss the implications for financial software.