Chapter 14 Exercises: Floating Point

Section A: SSE Scalar Basics

Exercise 1. Write NASM code using SSE2 scalar instructions to compute each expression. Inputs are in XMM0 (a) and XMM1 (b); write result to XMM0:

(a) a + b (double) (b) a * a - b * b (float) (c) sqrt(a * a + b * b) (double) (d) a / b + b / a (double; use XMM2 as temporary)


Exercise 2. Write NASM code to implement the C function:

double clamp_double(double x, double lo, double hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

Use MINSD and MAXSD. Arguments: XMM0 = x, XMM1 = lo, XMM2 = hi. Return in XMM0.


Exercise 3. Convert each of the following using the appropriate SSE2 conversion instruction:

(a) int64_t in RAX → double in XMM0 (b) int32_t in EDI → float in XMM0 (c) double in XMM0 → int64_t in RAX (truncate toward zero, C cast semantics) (d) float in XMM0 → double in XMM0 (widen) (e) double in XMM0 → float in XMM0 (narrow)


Exercise 4. Write a NASM function double int_pow(double base, int exp) that computes base^exp for non-negative integer exponents using a simple loop (multiply base exp times). Use SSE2 scalar instructions. Arguments: XMM0 = base, EDI = exp.


Exercise 5. Implement double fabs_asm(double x) — absolute value of a double — using only bitwise operations on the XMM register (clear the sign bit). No SQRTSD or comparison needed.

Hint: The sign bit of a 64-bit double is bit 63. Use ANDPD with a mask.


Section B: Comparison and Branching

Exercise 6. Write NASM code to implement each comparison and branch:

(a) if (xmm0 > 0.0) goto positive (b) if (xmm0 == xmm1) goto equal — but do it correctly with UCOMISD and the correct jump (c) if (isnan(xmm0)) goto nan_handler

For (b), is je the right jump for floating-point equality? Explain.


Exercise 7. Write a NASM function implementing epsilon comparison:

int nearly_equal(double a, double b, double eps) {
    return fabs(a - b) < eps;
}

Arguments: XMM0 = a, XMM1 = b, XMM2 = eps. Return 1 in EAX if nearly equal, 0 otherwise.


Exercise 8. The following code intends to find the maximum of an array of doubles. Find and fix the bug:

; Find max of double array: RDI = array, ESI = count, returns max in XMM0
array_max:
    movsd xmm0, [rdi]
    mov   ecx, 1
.loop:
    cmp   ecx, esi
    jge   .done
    movsd xmm1, [rdi + rcx*8]
    ucomisd xmm0, xmm1
    jge   .skip         ; BUG: what if xmm1 is NaN?
    movsd xmm0, xmm1
.skip:
    inc ecx
    jmp .loop
.done:
    ret

Section C: x87 FPU

Exercise 9. Translate the following x87 computation to equivalent SSE2 code:

; x87 version: computes sin²(x) + cos²(x) (should always equal 1.0)
fldl [rbx]    ; load x
fsincos       ; compute sin and cos
; After: ST(0) = cos(x), ST(1) = sin(x)
fmulp         ; ST(0) = sin(x) * cos(x), wrong... need squares
; (complete the x87 computation first, then convert to SSE2)

First write the correct x87 version of sin²(x) + cos²(x), then convert it to SSE2 using a transcendental polynomial or call sin()/cos() from libm.


Exercise 10. What is the output of FSIN when ST(0) = 0? When ST(0) = π/2? When ST(0) = π?

Describe how you would load π/2 using x87 stack instructions.


Section D: IEEE 754 Special Values

Exercise 11. For each operation, predict the IEEE 754 result (assuming no exceptions are masked would signal — but with default MXCSR they are masked and produce results):

(a) 1.0 / 0.0 (double) (b) -1.0 / 0.0 (double) (c) 0.0 / 0.0 (double) (d) sqrt(-1.0) (double) (e) 1.0 / 0.0 + 1.0 / 0.0 (double, both are +inf) (f) 1.0 / 0.0 - 1.0 / 0.0 (double: +inf - +inf) (g) NaN + 5.0


Exercise 12. Write NASM code to detect and handle each special case before performing floating-point division:

; safe_divide(double a, double b) → double
; Returns: a/b normally
;          +inf if a > 0 and b == 0
;          -inf if a < 0 and b == 0
;          0.0 if a == 0 and b == 0 (instead of NaN, return 0)
; XMM0 = a, XMM1 = b, return in XMM0

Exercise 13. Demonstrate the denormal performance problem:

Write two versions of a loop that computes x = x * 0.1 1000 times: (a) Starting with x = 1.0 (stays normal for a few iterations, then becomes denormal) (b) Starting with x = 1e-300 (immediately in denormal territory)

Add the FTZ (Flush-to-Zero) option and explain how it changes the behavior of (b).


Section E: Complete Functions

Exercise 14. Implement double vector_dot_product(double *a, double *b, int n) in NASM using SSE2 scalar instructions. Sum the products a[i]*b[i] for i = 0..n-1.


Exercise 15. Implement the Horner's method polynomial evaluator from the chapter as a complete NASM function with proper prologue/epilogue. Test with the polynomial 2x³ - 3x² + x - 5 at x = 1.5.

Coefficients (constant term first): [-5.0, 1.0, -3.0, 2.0], degree = 3.


Exercise 16. Write a function void normalize_vector(double *v, int n) in NASM that: 1. Computes the length: len = sqrt(v[0]^2 + v[1]^2 + ... + v[n-1]^2) 2. Divides each element by len

Handle the edge case where len is zero (return without modification).


Exercise 17. Implement the Newton-Raphson method for computing sqrt(a) without using SQRTSD. The iteration is: x_next = (x + a/x) / 2. Start with x = a/2 (or x = 1.0 for simplicity) and iterate until convergence (|x_next - x| < 1e-10).


Exercise 18 (MXCSR). Write NASM code that: (a) Saves the current MXCSR value (b) Sets the rounding mode to "toward positive infinity" (ceil mode) (c) Computes ceil(2.3) by converting double to int32 using CVTTSD2SI — but wait, will this work? What conversion instruction should you actually use to get the ceiling behavior? (d) Restores the original MXCSR value


Exercise 19 (Challenge: Reciprocal Square Root). Implement double rsqrt(double x) using the Newton-Raphson refinement of the SSE approximation:

y = initial_estimate(x)   // hardware: RSQRTSS for float
y = y * (1.5 - 0.5 * x * y * y)  // one Newton-Raphson step

Note: RSQRTSS is for float (single precision). For double, start with RSQRTSS and then apply two Newton-Raphson steps.


Exercise 20 (Challenge: Fixed-Point vs. Floating-Point). A financial system needs to add 1 cent (0.01) one million times to an account starting at 0.00. Implement both:

(a) Floating-point version (addsd in a loop): start with 0.0, add 0.01 in each iteration (b) Fixed-point version: use int64_t cents (integer), add 1 each time

Compare the final results. The floating-point version will not produce exactly 10000.00. Quantify the error.

Write both as NASM functions and discuss the implications for financial software.