Chapter 9 Key Takeaways: Arithmetic and Logic

  1. ADD and SUB set six flags: CF, OF, ZF, SF, PF, AF. The two you care about most are CF (unsigned overflow) and OF (signed overflow). They answer different questions: CF says "did the unsigned result wrap around?" (for SUB, "was there a borrow?") and OF says "did the signed result leave the representable range?"

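  2. INC and DEC do NOT modify CF (they still update OF, SF, ZF, AF, and PF). This is deliberate: it lets a loop counter be updated inside a multi-precision ADC/SBB loop without disturbing the carry. It is also a common source of bugs when programmers use INC expecting it to set CF the way ADD does. Use ADD/SUB if you need CF behavior.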

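  3. xor eax, eax is the canonical way to zero a register. It is shorter than a MOV with an immediate (2 bytes versus 5 for mov eax, 0 or 7 for a REX.W mov rax, 0), zeroes the upper 32 bits of RAX (via the 32-bit zero-extension rule), and is recognized as a zero idiom by modern processors, which handle it at the rename stage with zero latency and no dependency on the old value of RAX. Because it writes the flags (clearing CF and OF and setting ZF), use mov eax, 0 instead when the flags must be preserved.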

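  4. MUL (unsigned) has only a single-operand form; IMUL (signed) has one-, two-, and three-operand forms. The single-operand form of either produces a full 128-bit result in RDX:RAX. The two-operand IMUL (imul rax, rbx) keeps only the low 64 bits. The three-operand IMUL takes a destination, a register or memory source, and an immediate (imul rax, rbx, 10). Because the low 64 bits of a product are the same whether the operands are read as signed or unsigned, the truncating IMUL forms also work for unsigned multiplication. Always use the form that matches your intent.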

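  5. Before IDIV, you must sign-extend the dividend into RDX: CQO copies the sign of RAX into RDX (64-bit), and CDQ copies the sign of EAX into EDX (32-bit). Before DIV, clear RDX with xor edx, edx. Failing to set up RDX correctly divides the wrong 128-bit dividend, giving a wrong result or, if the quotient no longer fits in 64 bits, a #DE exception.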

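  6. Division by zero raises a #DE exception immediately (delivered as SIGFPE on Linux); there is no flag to check afterward. IDIV also raises #DE when the quotient overflows, as in INT64_MIN / -1. Always validate the divisor (and, for signed division, that special case) before dividing.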

  7. ADC and SBB are the tools for multi-precision arithmetic. They perform addition/subtraction and also incorporate the carry flag from the previous instruction. A chain of ADD + ADC + ADC + ... processes arbitrarily large integers 64 bits at a time.

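  8. test reg, reg is the preferred way to check whether a register is zero or negative. It ANDs the operands, sets ZF and SF from the result, and discards it, so the register is unchanged (as it is with CMP). It is typically one byte shorter than cmp reg, 0. test rax, 0x1 checks the low bit (odd/even) without modifying RAX.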

  9. CMP performs SUB without storing the result. The flags it sets encode the relationship between the operands: JL/JB for less-than (signed/unsigned), JG/JA for greater-than, JE/JNE for equal. Which jump you use depends on whether the values are signed or unsigned.

  10. SAR (arithmetic right shift) propagates the sign bit; SHR (logical right shift) shifts in zeros. Use SAR for signed division by powers of 2, SHR for unsigned. Be aware that SAR rounds toward negative infinity, not toward zero, which differs from C's / operator for negative values: SAR of -7 by 1 gives -4, while -7 / 2 in C is -3.

  11. AND, OR, and XOR clear CF and OF and set ZF, SF, and PF from the result; NOT modifies no flags at all. TEST also clears CF and OF. CMP sets CF and OF like a subtraction. This matters when a logic instruction appears between a comparison and the conditional jump that uses the comparison result.

  12. Constant-time code in security contexts must avoid data-dependent branches and memory accesses. The XOR + OR accumulation pattern, which visits every element and accumulates differences without branching, is the standard technique. Compilers can and do optimize C constant-time code back to variable-time machine code; assembly gives direct control.

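  13. The ADC/SBB chain carries its correctness in the carry flag. Between iterations, do not execute any instruction that modifies CF (ADD, SUB, CMP, AND, OR, XOR, TEST, shifts, and so on); doing so silently corrupts the carry. Use INC/DEC for loop counters and LEA or MOV for pointer updates, since none of them touch CF. A dec rcx / jnz pair still works as the loop test because DEC sets ZF.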

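  14. SHLD and SHRD enable multi-precision shifts. To shift a 128-bit value in RDX:RAX left by N, where N is 0 to 63: shld rdx, rax, N then shl rax, N. SHLD fills the vacated low bits of RDX from the high bits of RAX. Because a 64-bit shift count is masked to 6 bits, N of 64 or more needs a different sequence: move RAX into RDX, shift RDX left by N - 64, and zero RAX.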

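  15. Division by a power of 2 is always faster with shifts than with DIV. Unsigned: SHR for the quotient (and AND with 2^n - 1 for the remainder). Signed: SAR, with a rounding adjustment if C semantics are needed. DIV is slow (roughly 20 to 90 cycles of latency depending on operand size on older Intel cores; recent cores are faster but still far slower than a shift), while SHR/SAR take 1 cycle.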
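Takeaway 1 can be checked with a small C model of a 64-bit ADD. This is a sketch of the flag definitions, not of how the CPU computes them; the names add_flags and model_add64 are illustrative. CF is the carry out of bit 63, and OF is set when both inputs have the same sign but the result's sign differs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the CF/OF outcome of a 64-bit ADD. */
typedef struct {
    uint64_t result;
    bool cf; /* unsigned result wrapped past 2^64 */
    bool of; /* signed result left the int64 range */
} add_flags;

static add_flags model_add64(uint64_t a, uint64_t b)
{
    add_flags f;
    f.result = a + b;   /* unsigned arithmetic wraps modulo 2^64, like the hardware */
    f.cf = f.result < a; /* carry out of bit 63 */
    /* Inputs share a sign (~(a ^ b) has bit 63 set) and the
       result's sign differs from a's ((a ^ result) has bit 63 set). */
    f.of = ((~(a ^ b) & (a ^ f.result)) >> 63) != 0;
    return f;
}
```

Adding 1 to 0xFFFFFFFFFFFFFFFF sets CF but not OF (in signed terms it is -1 + 1 = 0), while adding 1 to 0x7FFFFFFFFFFFFFFF sets OF but not CF.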
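Takeaway 4's distinction between full and truncated products can be modeled in portable C by splitting each operand into 32-bit halves. The sketch below (u128_parts and mul64x64_full are illustrative names) computes the hi:lo pair that single-operand MUL leaves in RDX:RAX. Its lo field always equals the wrapped C product a * b, which is what the truncating IMUL forms return.

```c
#include <stdint.h>

/* hi and lo stand in for RDX and RAX after a single-operand MUL. */
typedef struct { uint64_t hi, lo; } u128_parts;

static u128_parts mul64x64_full(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Four 32x32 -> 64 partial products. */
    uint64_t p0 = a_lo * b_lo; /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi; /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo; /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi; /* weight 2^64 */

    /* Middle column: three terms below 2^32 each, so the sum cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

For example, squaring 0xFFFFFFFFFFFFFFFF gives hi = 0xFFFFFFFFFFFFFFFE and lo = 1. The signed high half produced by single-operand IMUL differs from this unsigned one whenever an operand is negative, which is why MUL and one-operand IMUL are separate instructions.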
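Takeaways 5 and 6 describe when DIV and IDIV fault. The C sketch below models those conditions; div64_faults, cqo64, idiv_result, and checked_idiv64 are hypothetical helpers, not library functions. Unsigned DIV divides the 128-bit value RDX:RAX, and its quotient fits in 64 bits exactly when RDX is smaller than the divisor, so stale data left in RDX can trigger #DE even with a nonzero divisor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Would `div divisor` raise #DE with this value in RDX? (RAX is irrelevant.)
   The quotient of RDX:RAX / divisor fits in 64 bits iff rdx < divisor. */
static bool div64_faults(uint64_t rdx, uint64_t divisor)
{
    return divisor == 0 || rdx >= divisor;
}

/* Model of CQO: every bit of RDX becomes a copy of RAX's sign bit. */
static uint64_t cqo64(uint64_t rax)
{
    return (rax >> 63) ? UINT64_MAX : 0;
}

typedef struct { bool ok; int64_t quot, rem; } idiv_result;

/* Signed division with the checks a caller should make before IDIV
   (with RDX set by CQO). Returns ok = false where IDIV would fault. */
static idiv_result checked_idiv64(int64_t dividend, int64_t divisor)
{
    idiv_result r = { false, 0, 0 };
    if (divisor == 0)
        return r;                      /* IDIV would raise #DE */
    if (dividend == INT64_MIN && divisor == -1)
        return r;                      /* quotient 2^63 does not fit: #DE */
    r.ok = true;
    r.quot = dividend / divisor;       /* truncates toward zero, like IDIV */
    r.rem = dividend % divisor;        /* sign follows the dividend, like IDIV */
    return r;
}
```

A leftover RDX of 5 with a divisor of 3 faults, while the same divisor with RDX cleared does not.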
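The ADD + ADC chain from takeaway 7 can be mirrored in C by tracking the carry explicitly. In this sketch (add_limbs is an illustrative name), limbs are stored least significant first. The first iteration starts with a carry of 0, which is what the initial ADD does, and each later iteration consumes the previous carry the way ADC consumes CF.

```c
#include <stddef.h>
#include <stdint.h>

/* dst = a + b over n 64-bit limbs (limb 0 least significant).
   Returns the final carry, i.e. the CF left after the last ADC. */
static unsigned add_limbs(uint64_t *dst, const uint64_t *a,
                          const uint64_t *b, size_t n)
{
    unsigned carry = 0;              /* limb 0 behaves like ADD: no incoming carry */
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + b[i];
        unsigned c1 = s < a[i];      /* carry out of a[i] + b[i] */
        uint64_t t = s + carry;      /* fold in the incoming carry, as ADC does */
        unsigned c2 = t < s;         /* carry out of that addition */
        dst[i] = t;
        carry = c1 | c2;             /* at most one of c1, c2 can be set */
    }
    return carry;
}
```

Adding 1 to the two-limb value {0xFFFFFFFFFFFFFFFF, 0} yields {0, 1}: the carry out of limb 0 propagates into limb 1, which is exactly the job ADC does.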
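Takeaway 9 says the choice of jump decides whether CMP's result is read as signed or unsigned. The C model below (cmp_flags, model_cmp64, and the *_taken helpers are illustrative names) computes the flags of cmp a, b and applies the conditions those jumps test: unsigned comparisons read CF and ZF, while signed comparisons compare SF with OF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags produced by `cmp a, b`, i.e. a - b with the result discarded. */
typedef struct { bool cf, zf, sf, of; } cmp_flags;

static cmp_flags model_cmp64(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;                        /* wraps modulo 2^64 */
    cmp_flags f;
    f.cf = a < b;                              /* borrow: a < b as unsigned */
    f.zf = r == 0;
    f.sf = (r >> 63) != 0;
    /* Operands have different signs and the result's sign differs from a's. */
    f.of = (((a ^ b) & (a ^ r)) >> 63) != 0;
    return f;
}

static bool jb_taken(cmp_flags f) { return f.cf; }                  /* unsigned a <  b */
static bool ja_taken(cmp_flags f) { return !f.cf && !f.zf; }        /* unsigned a >  b */
static bool jl_taken(cmp_flags f) { return f.sf != f.of; }          /* signed   a <  b */
static bool jg_taken(cmp_flags f) { return !f.zf && f.sf == f.of; } /* signed   a >  b */
static bool je_taken(cmp_flags f) { return f.zf; }                  /* a == b, either  */
```

With a = 0xFFFFFFFFFFFFFFFF and b = 1, JA is taken (a is the largest unsigned value) and JL is also taken (a is -1 as a signed value): the same flags, read two ways.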
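The rounding difference in takeaway 10 is easy to see in C. Right-shifting a negative signed value is implementation-defined in C, so this sketch models SAR portably (sar64 is an illustrative name): for negative x, ~x is non-negative and safe to shift, and complementing the result back gives floor(x / 2^n). The count is masked to 6 bits, as the hardware does for 64-bit operands.

```c
#include <stdint.h>

/* Portable model of 64-bit SAR: returns floor(x / 2^n). */
static int64_t sar64(int64_t x, unsigned n)
{
    n &= 63;                      /* 64-bit shifts use only the low 6 bits of the count */
    return x < 0 ? ~(~x >> n)     /* shift the complement, then complement back */
                 : x >> n;        /* non-negative: identical to SHR */
}
```

sar64(-7, 1) is -4, but -7 / 2 in C is -3; for non-negative inputs the two agree.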
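A C rendering of the XOR + OR accumulation from takeaway 12 follows. It is a sketch that assumes the length is public (only the contents are secret); ct_equal is an illustrative name. As the takeaway warns, a compiler may still rewrite this into variable-time code, so the generated assembly needs checking, or the loop can be written in assembly directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the buffers are equal, 0 otherwise, without any
   branch or early exit that depends on the buffer contents. */
static int ct_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= a[i] ^ b[i];      /* any differing bit stays set for good */
    /* diff is 0..255; diff - 1 wraps to 0xFFFFFFFF (top bit set) only when diff == 0. */
    return (int)(((uint32_t)diff - 1) >> 31);
}
```

The loop always runs len iterations, and the final mapping of diff to 0 or 1 uses arithmetic rather than a comparison and branch.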
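Takeaway 14's two-instruction sequence covers shift counts below 64. The C sketch below (u128 and shl128 are illustrative names) models a 128-bit left shift of hi:lo, standing in for RDX:RAX. The 1 to 63 case is what shld followed by shl computes; counts of 64 or more take a separate path because the hardware masks a 64-bit shift count to 6 bits.

```c
#include <assert.h>
#include <stdint.h>

/* hi:lo stands in for RDX:RAX. */
typedef struct { uint64_t hi, lo; } u128;

static u128 shl128(u128 v, unsigned n)
{
    assert(n < 128);                             /* counts outside 0..127 are not modeled */
    u128 r;
    if (n == 0) {
        r = v;                                   /* avoids the undefined C shift by 64 below */
    } else if (n < 64) {
        r.hi = (v.hi << n) | (v.lo >> (64 - n)); /* SHLD: high bits of lo fill hi */
        r.lo = v.lo << n;                        /* SHL */
    } else {
        r.hi = v.lo << (n - 64);                 /* the low word moves up a whole word */
        r.lo = 0;
    }
    return r;
}
```

Shifting {hi = 0, lo = 0x8000000000000000} left by 1 moves the top bit of the low word into bit 0 of the high word, giving {1, 0}.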
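Takeaway 15 mentions adjusting SAR when C's truncating division is required. One common way, similar to what compilers generate for signed x / 2^n, is to add a bias of 2^n - 1 to negative dividends before the arithmetic shift. The C sketch below (sdiv_pow2 is an illustrative name) uses only shifts and an add, so it mirrors a branch-free instruction sequence.

```c
#include <assert.h>
#include <stdint.h>

/* x / 2^n with C's round-toward-zero semantics, for 1 <= n <= 63. */
static int64_t sdiv_pow2(int64_t x, unsigned n)
{
    assert(n >= 1 && n <= 63);
    /* Like SAR x, 63: all ones when x is negative, zero otherwise. */
    uint64_t sign_mask = (uint64_t)0 - ((uint64_t)x >> 63);
    /* Like SHR mask, 64 - n: keeps n one-bits, so bias is 2^n - 1 or 0. */
    int64_t bias = (int64_t)(sign_mask >> (64 - n));
    /* Cannot overflow: bias is nonzero only when x is negative. */
    int64_t biased = x + bias;
    /* SAR biased, n, modeled portably as floor division by 2^n. */
    return biased < 0 ? ~(~biased >> n) : biased >> n;
}
```

For unsigned values no bias is needed: x >> n is the quotient and x & (2^n - 1) the remainder.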