Chapter 9 Key Takeaways: Arithmetic and Logic
-
ADD and SUB set six flags: CF, OF, ZF, SF, PF, AF. The two you care about most are CF (unsigned overflow) and OF (signed overflow). They answer different questions: CF says "did the unsigned result wrap around?" and OF says "did the signed result leave the representable range?"
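The difference between the two flags can be made concrete with a small C model. This is only a sketch under stated assumptions: `add_flags` and `model_add64` are hypothetical names introduced here, and the model covers the 64-bit ADD case only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the stored sum plus the two flags of interest. */
typedef struct {
    uint64_t sum; /* low 64 bits of the result, as written to the destination */
    bool cf;      /* carry flag: the unsigned result wrapped around */
    bool of;      /* overflow flag: the signed result left the int64 range */
} add_flags;

/* Models a 64-bit ADD; both flags are derived from the same addition. */
add_flags model_add64(uint64_t a, uint64_t b)
{
    add_flags r;
    r.sum = a + b;    /* unsigned arithmetic wraps modulo 2^64 */
    r.cf = r.sum < a; /* a wrapped sum is smaller than either operand */
    /* Signed overflow: operands share a sign bit, the result's sign bit differs. */
    r.of = ((~(a ^ b) & (a ^ r.sum)) >> 63) != 0;
    return r;
}

/* Usage sketch: the same instruction, two different questions. */
void model_add64_demo(void)
{
    /* 0xFFFF...FFFF + 1 wraps as unsigned, but is just -1 + 1 as signed. */
    assert(model_add64(UINT64_MAX, 1).cf && !model_add64(UINT64_MAX, 1).of);
    /* INT64_MAX + 1 fits as unsigned, but overflows as signed. */
    assert(!model_add64(INT64_MAX, 1).cf && model_add64(INT64_MAX, 1).of);
}
```

Which flag matters depends on how the program interprets the operands; the processor does not know and sets both.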
-
INC and DEC do NOT modify CF. This is intentional for multi-precision arithmetic but is a common source of bugs when programmers use INC expecting it to set CF like ADD does. Use ADD/SUB if you need CF behavior.
-
`xor eax, eax` is the canonical way to zero a register. It is shorter than `mov rax, 0`, zeroes the upper 32 bits of RAX as well (via the 32-bit zero-extension rule), and is recognized as a zeroing idiom by modern processors, which can execute it at the rename stage with zero latency.
-
MUL (unsigned) and IMUL (signed) come in different forms. MUL has only the single-operand form, which for 64-bit operands produces a full 128-bit result in RDX:RAX; single-operand IMUL does the same for signed values. The two-operand IMUL produces a truncated 64-bit result. The three-operand IMUL takes a destination, a source, and an immediate, and also truncates. Always use the form that matches your intent.
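To see what the truncating forms discard, here is a hedged C model of the unsigned single-operand multiply. `product128` and `model_mul64` are hypothetical names, and the product is built from 32-bit halves so the sketch does not rely on any compiler-specific 128-bit type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two halves of a 128-bit product. */
typedef struct {
    uint64_t hi; /* the half single-operand MUL leaves in RDX */
    uint64_t lo; /* the half single-operand MUL leaves in RAX */
} product128;

/* Models 64-bit single-operand MUL using four 32x32 partial products. */
product128 model_mul64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo; /* lands in bits 0..63 */
    uint64_t p1 = a_lo * b_hi; /* lands in bits 32..95 */
    uint64_t p2 = a_hi * b_lo; /* lands in bits 32..95 */
    uint64_t p3 = a_hi * b_hi; /* lands in bits 64..127 */

    /* Everything that lands in bits 32..63; three 32-bit terms cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    product128 r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}

/* Usage sketch: 2^32 * 2^32 = 2^64, so the low half alone reads as 0. */
void model_mul64_demo(void)
{
    assert(model_mul64(1ULL << 32, 1ULL << 32).lo == 0);
    assert(model_mul64(1ULL << 32, 1ULL << 32).hi == 1);
}
```

A truncating form keeps only the `lo` half. The low 64 bits are the same for signed and unsigned multiplication, which is why two-operand IMUL is commonly used for both when the high half is not needed.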
-
Before IDIV, you must sign-extend the dividend into RDX using CQO (RAX into RDX:RAX, for 64-bit) or CDQ (EAX into EDX:EAX, for 32-bit). Before DIV, clear RDX with `xor rdx, rdx`. Failing to set up RDX correctly gives wrong results or a #DE exception.
-
Division by zero raises a #DE exception immediately — there is no flag to check afterward. Always validate the divisor before dividing.
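The validation step might look like the following C sketch. `div_result` and `checked_idiv64` are hypothetical names, and the model assumes the dividend is a single 64-bit value that would be sign-extended with CQO. Besides a zero divisor, it also rejects the one signed case whose quotient does not fit in 64 bits, since IDIV raises #DE for that as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: what IDIV would produce, or a refusal. */
typedef struct {
    bool ok;           /* false in the cases where IDIV would raise #DE */
    int64_t quotient;  /* the value IDIV leaves in RAX */
    int64_t remainder; /* the value IDIV leaves in RDX */
} div_result;

/* Checks the divisor first, then divides with truncation toward zero. */
div_result checked_idiv64(int64_t dividend, int64_t divisor)
{
    div_result r = { false, 0, 0 };
    if (divisor == 0)
        return r; /* division by zero */
    if (dividend == INT64_MIN && divisor == -1)
        return r; /* quotient 2^63 does not fit in a signed 64-bit register */
    r.ok = true;
    r.quotient = dividend / divisor;
    r.remainder = dividend % divisor;
    return r;
}

/* Usage sketch: -7 / 2 truncates to -3 with remainder -1. */
void checked_idiv64_demo(void)
{
    assert(checked_idiv64(-7, 2).ok);
    assert(checked_idiv64(-7, 2).quotient == -3);
    assert(checked_idiv64(-7, 2).remainder == -1);
    assert(!checked_idiv64(1, 0).ok);
}
```

In assembly the same checks are a `test` on the divisor and a compare against the two special values before the IDIV is reached.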
-
ADC and SBB are the tools for multi-precision arithmetic. They perform addition/subtraction and also incorporate the carry flag from the previous instruction. A chain of ADD + ADC + ADC + ... processes arbitrarily large integers 64 bits at a time.
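The ADD + ADC chain can be modeled in C by carrying a 0-or-1 value from limb to limb. This is a sketch, not a tuned implementation: `mp_add` is a hypothetical name, and the limbs are assumed to be stored least significant first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two n-limb integers (least significant limb first) into dst.
   Returns the carry out of the top limb, i.e. the final CF of the chain. */
uint64_t mp_add(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0; /* plays the role of CF between limbs */
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;
        uint64_t c1 = s < carry; /* wrapped while adding the incoming carry */
        s += b[i];
        uint64_t c2 = s < b[i];  /* wrapped while adding b[i] */
        dst[i] = s;
        carry = c1 | c2;         /* at most one of the two can be set */
    }
    return carry;
}

/* Usage sketch: (2^64 - 1) + 1 = 2^64, so the carry ripples into limb 1. */
void mp_add_demo(void)
{
    const uint64_t a[2] = { UINT64_MAX, 0 };
    const uint64_t b[2] = { 1, 0 };
    uint64_t sum[2];
    uint64_t carry_out = mp_add(sum, a, b, 2);
    assert(sum[0] == 0 && sum[1] == 1 && carry_out == 0);
}
```

With ADC the hardware does the `c1 | c2` bookkeeping for free, which is the main reason this loop is so much shorter in assembly than in portable C.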
-
`test reg, reg` is the preferred way to check if a register is zero or negative. It is shorter than `cmp reg, 0` and does not modify the register. `test rax, 0x1` checks the low bit (odd/even) without modifying RAX.
-
CMP performs SUB without storing the result. The flags it sets encode the relationship between the operands: JL/JB for less-than (signed/unsigned), JG/JA for greater-than (signed/unsigned), JE/JNE for equal/not equal. Which jump you use depends on whether the values are signed or unsigned.
-
SAR (arithmetic right shift) propagates the sign bit; SHR (logical right shift) shifts in zeros. Use SAR for signed division by powers of 2, SHR for unsigned. Be aware that SAR rounds toward negative infinity, not toward zero; this differs from C's `/` operator for negative values.
-
The logic instructions AND, OR, and XOR clear CF and OF; NOT does not modify any flags. TEST clears CF and OF. CMP sets CF and OF like a subtraction. This matters when logic instructions appear between a comparison and the conditional jump that uses the comparison result.
-
Constant-time code in security contexts must avoid data-dependent branches and memory accesses. The XOR + OR accumulation pattern — visiting every element and accumulating differences without branching — is the standard technique. Compilers can and do optimize C constant-time code back to variable-time machine code; assembly gives direct control.
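A minimal C sketch of the XOR + OR accumulation pattern follows. `ct_compare` is a hypothetical name, and the `volatile` qualifiers are only an attempt to discourage the compiler from adding an early exit; as noted above, C offers no guarantee, so the generated machine code still has to be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Compares len bytes without branching on their contents.
   Returns 0 if the buffers are equal, nonzero otherwise. */
int ct_compare(const void *x, const void *y, size_t len)
{
    const volatile unsigned char *p = x;
    const volatile unsigned char *q = y;
    unsigned char diff = 0;

    /* Visit every byte; the loop count depends only on len, never on the data. */
    for (size_t i = 0; i < len; i++)
        diff |= (unsigned char)(p[i] ^ q[i]); /* XOR finds differences, OR keeps them */

    return diff;
}

/* Usage sketch: a mismatch in the first byte costs the same work as one in the last. */
void ct_compare_demo(void)
{
    assert(ct_compare("abcd", "abcd", 4) == 0);
    assert(ct_compare("xbcd", "abcd", 4) != 0);
    assert(ct_compare("abcx", "abcd", 4) != 0);
}
```

The single data-dependent decision, whether `diff` is zero, is left to the caller and happens once, after all bytes have been processed.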
-
The ADC/SBB chain carries its correctness in the carry flag. Between iterations, do not execute any instruction that modifies CF (ADD, SUB, CMP, SHL, etc.); doing so silently corrupts the carry. Use INC/DEC to modify loop counters inside ADC chains.
-
SHLD and SHRD enable multi-precision shifts. To shift a 128-bit value in RDX:RAX left by N: `shld rdx, rax, N` then `shl rax, N`. The SHLD fills the vacated bits of RDX from the high bits of RAX.
-
Division by a power of 2 is always faster with shifts than with DIV. Unsigned: SHR. Signed: SAR (with an adjustment for rounding if C semantics are needed). The DIV instruction is slow (roughly 20-90 cycles depending on operand size and microarchitecture); SHR/SAR is 1 cycle.
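The rounding adjustment for signed division can be sketched in C as follows. `sar_floor_div` and `sar_trunc_div` are hypothetical names, and the code assumes that `>>` on a negative `int64_t` is an arithmetic shift; ISO C leaves that implementation-defined, although GCC, Clang, and MSVC all behave this way.

```c
#include <assert.h>
#include <stdint.h>

/* Plain SAR: rounds toward negative infinity.
   Assumes >> on a negative value is an arithmetic shift (see lead-in). */
int64_t sar_floor_div(int64_t x, unsigned n)
{
    assert(n < 64);
    return x >> n;
}

/* SAR with a bias so the result rounds toward zero, matching C's / operator. */
int64_t sar_trunc_div(int64_t x, unsigned n)
{
    assert(n < 64);
    /* x >> 63 is all ones for negative x and zero otherwise, so the bias
       is 2^n - 1 for negative x and 0 for non-negative x. */
    int64_t bias = (x >> 63) & (int64_t)((UINT64_C(1) << n) - 1);
    return (x + bias) >> n;
}

/* Usage sketch: -7 / 2 is -3 in C, but a bare SAR by 1 gives -4. */
void sar_div_demo(void)
{
    assert(sar_floor_div(-7, 1) == -4);
    assert(sar_trunc_div(-7, 1) == -3);
    assert(sar_trunc_div(-7, 1) == -7 / 2);
}
```

For non-negative inputs the bias is zero and both functions agree, so the adjustment only costs anything when the dividend can be negative.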