Case Study 2.2: Why Floating-Point Money Calculations Are Wrong

IEEE 754 in practice, and what assembly programmers need to know about it


Overview

"Never use floating point for money" is advice every programmer has heard. Most people who repeat it don't know the exact mechanism of why it fails. This case study examines the IEEE 754 representation of common decimal values, demonstrates the specific arithmetic errors that occur, and shows the assembly-level bit patterns involved. Understanding this at the binary level — not just as a software engineering rule of thumb — is essential for any programmer who needs to reason precisely about floating-point behavior.


The Fundamental Problem: Base-2 Fractions

The decimal number 0.1 cannot be represented in binary with a finite number of bits. Here's why:

To represent a decimal fraction in binary, you repeatedly multiply by 2 and record the integer part:

0.1 × 2 = 0.2  → bit: 0
0.2 × 2 = 0.4  → bit: 0
0.4 × 2 = 0.8  → bit: 0
0.8 × 2 = 1.6  → bit: 1
0.6 × 2 = 1.2  → bit: 1
0.2 × 2 = 0.4  → bit: 0  ← back to a previous state!
0.4 × 2 = 0.8  → bit: 0
0.8 × 2 = 1.6  → bit: 1
0.6 × 2 = 1.2  → bit: 1
0.2 × 2 = 0.4  → ...

The sequence 0011 repeats indefinitely. In binary:

0.1 (decimal) = 0.0001100110011001100110011... (binary, repeating)

Just as 1/3 cannot be represented exactly in decimal (it repeats as 0.333...), 1/10 cannot be represented exactly in binary. An IEEE 754 float keeps only the first 23 fraction bits (single precision) or 52 fraction bits (double precision), rounding the infinite sequence to the nearest representable value.


The Actual Bit Patterns

Let's examine the IEEE 754 double-precision representation of 0.1, 0.2, and 0.3:

0.1 in IEEE 754 double precision:

Sign: 0
Exponent: 01111111011  (biased 1019, actual exponent: -4)
Mantissa: 1001100110011001100110011001100110011001100110011010
          (52 stored fraction bits; the leading 1 of the significand is implicit and not stored)

Hex: 3FB999999999999A
Binary value stored: 0.1000000000000000055511151231257827...

0.2 in IEEE 754 double precision:

Sign: 0
Exponent: 01111111100  (biased 1020, actual exponent: -3)
Mantissa: 1001100110011001100110011001100110011001100110011010

Hex: 3FC999999999999A
Binary value stored: 0.2000000000000000111022302462515654...

0.3 in IEEE 754 double precision:

Sign: 0
Exponent: 01111111101  (biased 1021, actual exponent: -2)
Mantissa: 0011001100110011001100110011001100110011001100110011

Hex: 3FD3333333333333
Binary value stored: 0.2999999999999999888977697537484345...

0.1 + 0.2 computed:

Actual sum (infinite precision): 0.3000000000000000166533453693773481...
Rounded to IEEE 754 double:      0.3000000000000000444089209850062616...

Hex: 3FD3333333333334  (note: last nibble is 4, not 3!)

The stored value of 0.3 is 0x3FD3333333333333. The computed value of 0.1 + 0.2 is 0x3FD3333333333334.

They differ by exactly 1 ULP (Unit in the Last Place) — one bit in the mantissa. This is why 0.1 + 0.2 == 0.3 is false.


The Assembly View

In x86-64 assembly, floating-point values live in XMM registers. Let's trace through the comparison:

section .data
    val_0_1     dq 0.1          ; NASM converts to nearest double
    val_0_2     dq 0.2
    val_0_3     dq 0.3

section .text
    global _start

demonstrate_fp_error:
    ; Load 0.1 into xmm0
    movsd   xmm0, [val_0_1]    ; xmm0 = 0x3FB999999999999A

    ; Add 0.2
    addsd   xmm0, [val_0_2]    ; xmm0 = 0.1 + 0.2
                                ; Result: 0x3FD3333333333334

    ; Load 0.3 into xmm1
    movsd   xmm1, [val_0_3]    ; xmm1 = 0x3FD3333333333333

    ; Compare: are they equal?
    ucomisd xmm0, xmm1          ; unordered compare, sets ZF, PF, CF
                                ; ZF=0, PF=0, CF=0  (xmm0 > xmm1: NOT equal)

    je      .equal              ; this jump is NOT taken
    ; We reach here: 0.1 + 0.2 ≠ 0.3
    ret
.equal:
    ; If we got here, they're equal -- we won't get here
    ret

The UCOMISD instruction sets flags as follows:

- ZF=1, PF=0, CF=0: operands are equal (ordered)
- ZF=0, PF=0, CF=0: first operand is greater (ordered)
- ZF=0, PF=0, CF=1: first operand is less (ordered)
- ZF=1, PF=1, CF=1: operands are unordered (NaN involved)

In our case, 0.1 + 0.2 is slightly greater than 0.3, so the flags come out ZF=0, PF=0, CF=0 (first operand greater). Because ZF=0, the JE is not taken: the comparison says "not equal."


The Money Problem: A Concrete Example

Suppose a system tracks account balances in double:

double balance = 1000.00;
double purchase_1 = 4.99;
double purchase_2 = 4.99;
double purchase_3 = 4.99;

balance -= purchase_1;   // 995.01
balance -= purchase_2;   // 990.02
balance -= purchase_3;   // 985.03

// Expected: 985.03
// Actual: ???

Let's work through this at the bit level:

4.99 in IEEE 754 double:

Exact value: 4.99
Stored as:   4.9900000000000002131628...
Hex: 4013F5C28F5C28F6

1000.00 in IEEE 754 double:

1000.0 = 1.953125 × 2^9
Exactly representable: any integer whose magnitude fits in 53 significant bits is stored exactly
Hex: 408F400000000000

After three subtractions: The rounding error in 4.99 accumulates. After subtracting it three times, the stored result differs slightly from the exact mathematical answer of 985.03.

For financial software, this means:

- Balance might show 985.0299999999999727... instead of 985.03
- Two accounts that should sum to the same total may differ by fractions of a cent
- Running totals drift from correct values after many operations
- The comparison balance == 985.03 may fail


The Scale Problem: Large Floats Lose Precision

There's a second, subtler problem: floating-point values lose integer precision as they grow large.

An IEEE 754 double has 52 bits of mantissa plus the implicit leading 1, giving 53 significant bits total. This means doubles can represent all integers up to 2^53 = 9,007,199,254,740,992 exactly.

Beyond 2^53, consecutive representable doubles are more than 1 apart:

2^53     = 9007199254740992  (exact)
2^53 + 1 = 9007199254740993  (NOT exactly representable — rounds to 9007199254740992!)
2^53 + 2 = 9007199254740994  (exact)

For financial values in cents, this becomes a problem around:

$90,071,992,547,409.92 (about 90 trillion dollars)

Most businesses don't have 90-trillion-dollar transactions. But high-frequency trading systems, central bank clearing systems, and cryptocurrency ledgers can approach these magnitudes.


The Correct Solution: Integer Arithmetic

The correct solution for monetary calculations is to avoid floating-point entirely and use integer arithmetic in the smallest denomination:

// Use cents (or millicents, or whatever the smallest unit is)
typedef int64_t cents_t;  // 64-bit integer

cents_t balance = 100000;     // $1000.00 in cents
cents_t purchase = 499;       // $4.99 in cents

balance -= purchase;          // 99501 cents = $995.01, exactly correct
balance -= purchase;          // 99002 cents = $990.02, exactly correct
balance -= purchase;          // 98503 cents = $985.03, exactly correct

In assembly:

; Integer money arithmetic -- exact
; Using 64-bit integers (cents)

mov  rax, 100000    ; balance = 100000 cents ($1000.00)
sub  rax, 499       ; balance -= $4.99 (499 cents)
sub  rax, 499       ; balance -= $4.99
sub  rax, 499       ; balance -= $4.99

; rax = 98503 cents = $985.03 exactly
; No floating-point rounding errors

This works because 64-bit integers can represent all values up to 2^63 - 1 = 9,223,372,036,854,775,807 cents = $92,233,720,368,547,758.07. That's enough range, with exact arithmetic throughout, for even the largest financial transactions.


When Floating-Point IS Correct to Use

Despite its limitations for money, IEEE 754 is the right tool for many applications:

Scientific computation: Physical quantities like temperature, pressure, and distance don't have exact decimal values. The approximation of floating-point matches the precision of the measurements.

Machine learning: Training neural networks involves millions of multiplications and additions where tiny rounding errors average out and don't matter to the final result. Low-precision formats (FP16, BF16) are actively used to improve throughput.

Graphics: Color values, vertex coordinates, and transformation matrices need relative precision, not absolute. Floating-point's relative error (±0.5 ULP) is appropriate.

Statistical calculations: Standard deviations, regression coefficients, and probability values are inherently approximate quantities.

The key question is: does your application require absolute precision (like money, where $9.99 must be exactly 999 cents) or relative precision (like physics, where "accurate to 6 significant figures" is sufficient)?


Assembly-Level Floating-Point Operations Reference

For reference, the SSE2 floating-point instructions used throughout this book:

Instruction             Meaning
movss  xmm0, [addr]     Load 32-bit float (scalar single)
movsd  xmm0, [addr]     Load 64-bit double (scalar double)
addss  xmm0, xmm1       Add single-precision scalars
addsd  xmm0, xmm1       Add double-precision scalars
subss  xmm0, xmm1       Subtract single-precision scalars
mulss  xmm0, xmm1       Multiply single-precision scalars
divss  xmm0, xmm1       Divide single-precision scalars
sqrtss xmm0, xmm1       Square root, single-precision
comiss xmm0, xmm1       Compare ordered, single-precision
ucomiss xmm0, xmm1      Compare unordered, single-precision (NaN-safe)
cvtsi2sd xmm0, rax      Convert 64-bit integer to double
cvttsd2si rax, xmm0     Convert double to 64-bit integer (truncating)

The distinction between ordered (COMISS) and unordered (UCOMISS) comparison: the ordered form signals an invalid-operation exception if either operand is any NaN, while the unordered form signals only for signaling NaNs and otherwise just sets ZF=PF=CF=1. For most code, use UCOMISS to handle NaN gracefully.


Summary

IEEE 754 floating-point is not broken — it does exactly what the specification says: it represents each real number as the nearest expressible binary fraction, with 52 (double) or 23 (single) bits of stored mantissa. That representation cannot exactly encode most decimal fractions, including the basic fractions used in currency (tenths, hundredths).

The consequences for programmers:

1. Never use == to compare floating-point values unless you derived them through identical operations
2. Never use floating-point for money, counts, or other values that must be exact
3. Use integers in the smallest denomination for financial calculations
4. For scientific calculations, understand ULP error propagation and choose double over float when precision matters
5. Be aware that accumulation of small errors across many operations (like the money example) compounds over time
6. The hardware correctly implements IEEE 754 — if your FP calculation gives the wrong answer, the specification predicted it would