Chapter 9: Arithmetic and Logic

Open Assembly Language Project

In This Chapter

The ALU's Full Vocabulary
ADD and SUB: The Foundation
INC and DEC: The Increment/Decrement Trap
NEG: Two's Complement Negation
ADC and SBB: Extending Arithmetic Beyond 64 Bits
MUL and IMUL: Multiplication in All Its Forms
DIV and IDIV: Division and Its Quirks
Boolean Logic Instructions
Shift and Rotate Instructions
C Comparison: Arithmetic in C vs. Assembly
Complete Example: Implementing a 128-bit Multiply
Register Trace: ADD, ADC, MUL, and DIV
Summary

The ALU's Full Vocabulary

The Arithmetic Logic Unit (ALU) executes the integer arithmetic and bitwise logic in your program. This chapter covers its complete vocabulary: addition and subtraction with their carry and overflow semantics, multiplication in its four distinct forms, division with its quirky two-register dividend, the Boolean logic instructions, and the shift and rotate family. Each instruction comes with a precise description of how it affects the RFLAGS register, because the conditional branches in Chapter 10 read exactly those flags to implement control flow.

Two themes run through this chapter: the distinction between signed and unsigned arithmetic (which determines which flag matters for overflow), and the utility of multi-precision arithmetic (extending the hardware's 64-bit native width to 128 bits or beyond using ADC and SBB chains).

ADD and SUB: The Foundation

Basic Forms

; ADD forms
add rax, rbx          ; rax += rbx (register + register)
add rax, 42           ; rax += 42  (register + immediate)
add rax, [rbx]        ; rax += *rbx (register + memory)
add [rbx], rax        ; *rbx += rax (memory + register)
add qword [rbx], 42   ; *rbx += 42 (memory + immediate; size keyword required, imm32 sign-extended)

; SUB forms (same patterns)
sub rax, rbx          ; rax -= rbx
sub rax, 100          ; rax -= 100
sub rax, [rbx]        ; rax -= memory
sub [rbx], rax        ; memory -= rax

Note: ADD and SUB cannot use memory as both source and destination simultaneously (the standard memory-to-memory restriction applies here too).

Flag Effects of ADD and SUB

Every ADD and SUB modifies six flags in RFLAGS:

Flag	Name	Set when	Meaning
CF	Carry	Unsigned overflow	Result exceeded unsigned range
OF	Overflow	Signed overflow	Result exceeded signed range
ZF	Zero	Result == 0	Equality or zero result
SF	Sign	Result MSB == 1	Result is negative (in two's complement)
PF	Parity	Low byte has even number of set bits	Rarely used in modern code
AF	Auxiliary	Carry from bit 3 to bit 4	Used only for BCD arithmetic (rare)

Unsigned vs. Signed Overflow: The Critical Distinction

The same bits can represent different values depending on interpretation. ADD does not know whether you intend the operands as signed or unsigned — it just adds the bits. But it sets two flags so you can check either interpretation:

; Unsigned overflow example:
mov rax, 0xFFFFFFFFFFFFFFFF  ; unsigned: 18446744073709551615
add rax, 1                    ; result: 0x0000000000000000
; CF = 1 (unsigned overflow: 2^64 does not fit in 64 bits)
; OF = 0 (no signed overflow: -1 + 1 = 0, which is in signed range)

; Signed overflow example:
mov rax, 0x7FFFFFFFFFFFFFFF  ; signed: 9223372036854775807 (INT64_MAX)
add rax, 1                    ; result: 0x8000000000000000 = -9223372036854775808
; CF = 0 (no unsigned overflow: 0x7FFF...F + 1 = 0x8000...0 needs no carry out of bit 63)
; OF = 1 (signed overflow: INT64_MAX + 1 wrapped to INT64_MIN)

The rule:
- CF = 1 after ADD means the result overflowed when operands are treated as unsigned
- OF = 1 after ADD means the result overflowed when operands are treated as signed
- CF = 1 after SUB means borrow occurred (a < b in unsigned comparison)
- OF = 1 after SUB means the result overflowed the signed range

💡 Mental Model: The CPU is a hardware calculator that does not know about types. It computes the binary result and then sets flags to describe what happened. You — the programmer — decide whether to check CF (for unsigned) or OF (for signed) based on what the values represent.
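The flag rules for ADD can be modeled in a few lines of C. This is an illustrative sketch, not a description of how the hardware is built; the type `add_result` and the function `model_add64` are hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the flags a 64-bit ADD produces. */
typedef struct {
    uint64_t result;
    bool cf, of, zf, sf;
} add_result;

static add_result model_add64(uint64_t a, uint64_t b) {
    add_result r;
    r.result = a + b;                 /* wraps modulo 2^64, like the hardware */
    r.cf = r.result < a;              /* unsigned wrap means a carry out of bit 63 */
    /* Signed overflow: both operands share a sign bit that the result lacks. */
    r.of = ((~(a ^ b) & (a ^ r.result)) >> 63) != 0;
    r.zf = r.result == 0;
    r.sf = (r.result >> 63) != 0;
    return r;
}
```

Calling `model_add64(UINT64_MAX, 1)` reports CF without OF, while `model_add64(0x7FFFFFFFFFFFFFFF, 1)` reports OF without CF, matching the two assembly examples.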

; Correct unsigned overflow check:
add rax, rbx
jc  overflow_handler      ; JC = jump if carry (CF=1)

; Correct signed overflow check:
add rax, rbx
jo  overflow_handler      ; JO = jump if overflow (OF=1)

; Testing for zero result:
sub rax, rbx
jz  equal                 ; JZ = jump if zero (ZF=1), i.e., rax was equal to rbx

INC and DEC: The Increment/Decrement Trap

inc rax                   ; rax++ (modifies ZF, SF, OF, PF, AF but NOT CF)
dec rax                   ; rax-- (same flag behavior)

⚠️ Common Mistake: INC and DEC do not modify CF. This is a historical artifact: the original 8086 INC/DEC were designed to allow loop counters without disturbing the carry flag from previous multi-precision arithmetic. The practical implication: you cannot use INC/DEC to test for unsigned overflow. If you need to check for overflow, use ADD/SUB instead.

; This code has a subtle bug:
xor rcx, rcx
.loop:
    inc rcx               ; CF is NOT set when rcx overflows from MAX to 0
    jnc .loop             ; JNC checks CF, which INC never sets → infinite loop!

; Correct:
xor rcx, rcx
.loop:
    add rcx, 1            ; CF IS set if rcx wraps (unlikely but correct)
    jnc .loop

In practice, INC is the right choice for loop counters when you are not doing multi-precision arithmetic — it is a shorter encoding and the no-CF-modification is rarely relevant. Just be aware of the distinction.
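The difference between INC and ADD by 1 comes down to who writes CF. A minimal C sketch, assuming the carry flag is held in a caller-owned `bool` (the function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: *cf stands in for the carry flag in RFLAGS. */
static uint64_t model_inc64(uint64_t x, bool *cf) {
    (void)cf;                 /* INC never writes CF */
    return x + 1;
}

static uint64_t model_add1_64(uint64_t x, bool *cf) {
    uint64_t r = x + 1;
    *cf = r < x;              /* ADD reports the wrap in CF */
    return r;
}
```

Both functions return 0 for an input of `UINT64_MAX`, but only `model_add1_64` sets the carry, which is why the `jnc` loop above terminates only in the ADD version.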

NEG: Two's Complement Negation

neg rax                   ; rax = -rax (two's complement negate)
neg qword [rbx]           ; negate memory

NEG sets CF = 0 if the operand was 0 (no borrow), CF = 1 otherwise. It sets OF = 1 if the operand was INT_MIN (the only value where negation overflows: -INT_MIN overflows back to INT_MIN in two's complement).

; Implementing abs(rax) with NEG:
test rax, rax              ; set SF based on rax
jns  .positive             ; if positive (SF=0), done
neg  rax                   ; otherwise, negate
.positive:
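The NEG flag rules can be checked with a small C model. This is a sketch under the assumption that the operand is passed as its raw 64-bit pattern; `model_neg64` is a hypothetical name. It also shows why the abs() idiom above still returns a negative value for INT64_MIN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of NEG on a 64-bit operand. */
static uint64_t model_neg64(uint64_t x, bool *cf, bool *of) {
    *cf = x != 0;                          /* borrow unless the operand was 0 */
    *of = x == 0x8000000000000000ull;      /* only INT64_MIN overflows */
    return 0 - x;                          /* two's complement negate */
}
```

For the input `0x8000000000000000` the result equals the input and OF is set, so code that needs a correct absolute value for every input should test OF (or the input) after negating.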

ADC and SBB: Extending Arithmetic Beyond 64 Bits

ADC (Add with Carry) and SBB (Subtract with Borrow) are the tools for multi-precision arithmetic. They perform the operation and also add or subtract the current value of CF:

adc rax, rbx              ; rax = rax + rbx + CF
sbb rax, rbx              ; rax = rax - rbx - CF

128-bit Addition

To add two 128-bit integers, each stored as a pair of 64-bit values:

; Add 128-bit values:
; A = [A_hi:A_lo] in [rbp-8]:[rbp-16]
; B = [B_hi:B_lo] in [rbp-24]:[rbp-32]
; Result in [rdx:rax]

mov rax, [rbp - 16]       ; rax = A_lo
mov rdx, [rbp - 8]        ; rdx = A_hi
add rax, [rbp - 32]       ; rax = A_lo + B_lo (may set CF)
adc rdx, [rbp - 24]       ; rdx = A_hi + B_hi + CF (carries the overflow!)

; rax = low 64 bits of result
; rdx = high 64 bits of result

The sequence: ADD the low halves (which may produce a carry), then ADC the high halves (which includes the carry from the low half). The ADC/SBB chain can extend to any precision: 256-bit, 512-bit, or arbitrarily large by processing 64 bits at a time.
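The ADD-then-ADC sequence corresponds to the following C sketch, where the carry is recovered by checking whether the low sum wrapped. The `u128` struct and `add128` function are hypothetical names for illustration.

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;   /* hypothetical two-limb type */

static u128 add128(u128 a, u128 b) {
    u128 r;
    r.lo = a.lo + b.lo;                  /* ADD: low halves */
    uint64_t carry = r.lo < a.lo;        /* the CF that ADD would produce */
    r.hi = a.hi + b.hi + carry;          /* ADC: high halves plus CF */
    return r;
}
```

In assembly the carry travels in CF for free; in portable C it has to be reconstructed with a comparison, which is one reason hand-written ADC chains remain common in big-integer code.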

256-bit Addition Example

; Add two 256-bit integers
; A = [a3:a2:a1:a0], B = [b3:b2:b1:b0]
; a0..a3 and b0..b3 in registers r8-r11 and r12-r15
; (setup assumed)

add  r8, r12              ; low 64: a0 + b0, set CF
adc  r9, r13              ; next 64: a1 + b1 + CF
adc  r10, r14             ; next 64: a2 + b2 + CF
adc  r11, r15             ; high 64: a3 + b3 + CF
; If CF is still set, there was 256-bit overflow
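The same limb-by-limb pattern applies to subtraction, with SUB on the lowest limb followed by SBB on the rest. A minimal C sketch, assuming limbs are stored least significant first (the helper `sub_limbs` is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r = a - b over n little-endian 64-bit limbs.
   Returns the final borrow (1 if a < b as unsigned big integers). */
static unsigned sub_limbs(uint64_t *r, const uint64_t *a,
                          const uint64_t *b, size_t n) {
    unsigned borrow = 0;                   /* CF starts clear, so step 0 acts like SUB */
    for (size_t i = 0; i < n; i++) {
        uint64_t d = a[i] - b[i];
        unsigned b1 = a[i] < b[i];         /* borrow from the limb subtraction */
        uint64_t e = d - borrow;
        unsigned b2 = d < borrow;          /* borrow from subtracting the incoming CF */
        r[i] = e;
        borrow = b1 | b2;                  /* at most one of the two can be set */
    }
    return borrow;
}
```

A returned borrow of 1 plays the same role as CF still being set after the last SBB: the true result is negative and has wrapped modulo 2^(64n).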

🔍 Under the Hood: Multi-precision libraries like GMP (GNU Multiple Precision) use exactly this ADC chain technique. For big integers (1024-bit RSA keys, for example), the inner multiply loop is a sequence of MULX + ADC instructions that implements schoolbook multiplication one 64-bit limb at a time.

MUL and IMUL: Multiplication in All Its Forms

Multiplication is complex because multiplying two 64-bit values can produce a 128-bit result. x86-64 handles this with two implicit output registers.

Single-Operand MUL (Unsigned, Full 128-bit Result)

; MUL src — unsigned multiply RDX:RAX = RAX × src
mov rax, 0xFFFFFFFFFFFFFFFF   ; rax = 2^64 - 1
mov rbx, 2
mul rbx                        ; rdx:rax = rax * rbx
; rdx = 1 (high 64 bits)
; rax = 0xFFFFFFFFFFFFFFFE (low 64 bits)

The destination is always RDX:RAX. The source can be any register or memory operand. There is no immediate form for single-operand MUL.

CF and OF are set to 0 if RDX is zero (high bits are zero — result fits in 64 bits). CF and OF are set to 1 if RDX is nonzero (result does not fit in 64 bits).
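To see what the RDX:RAX pair contains without relying on compiler extensions, the product can be built from four 32-bit partial products. The sketch below is a portable C model, not how the multiplier hardware works; `model_mul64` is a hypothetical name, and its return value mirrors the CF/OF rule just described.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical portable model of one-operand MUL: writes the 128-bit
   product to *hi:*lo and returns the value CF and OF would take. */
static bool model_mul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    const uint64_t mask = 0xFFFFFFFFull;
    uint64_t a_lo = a & mask, a_hi = a >> 32;
    uint64_t b_lo = b & mask, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;            /* contributes to bits 0..63   */
    uint64_t p1 = a_lo * b_hi;            /* contributes to bits 32..95  */
    uint64_t p2 = a_hi * b_lo;            /* contributes to bits 32..95  */
    uint64_t p3 = a_hi * b_hi;            /* contributes to bits 64..127 */
    uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);   /* cannot overflow */
    *lo = (p0 & mask) | (mid << 32);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return *hi != 0;                      /* CF = OF = 1 when RDX is nonzero */
}
```

With `a = 0xFFFFFFFFFFFFFFFF` and `b = 2` the model yields a high half of 1 and a low half of `0xFFFFFFFFFFFFFFFE`, the same values as the MUL example above.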

Single-Operand IMUL (Signed, Full 128-bit Result)

; IMUL src — signed multiply RDX:RAX = RAX × src
mov rax, -1                    ; rax = 0xFFFFFFFFFFFFFFFF
mov rbx, -1
imul rbx                       ; rdx:rax = (-1) * (-1) = 1
; rdx = 0
; rax = 1

Two-Operand IMUL (Truncated Result in Destination)

; IMUL dst, src — signed multiply dst = dst × src (result truncated to 64 bits)
imul rax, rbx                  ; rax = rax * rbx (truncated, no RDX involvement)
imul rax, [rbx]                ; rax = rax * memory

The two-operand form truncates the result to 64 bits. CF and OF are set if the result does not fit in the signed 64-bit range. This is the most common form for general use when you know the result fits in 64 bits.

Three-Operand IMUL (Truncated, Explicit Destination)

; IMUL dst, src, imm — signed multiply dst = src × immediate (truncated)
imul rax, rbx, 100             ; rax = rbx * 100
imul rdx, rcx, -7              ; rdx = rcx * (-7)
imul r8, [rdi + 8], 1024       ; r8 = memory * 1024

The three-operand form is the most convenient: explicit destination, source, and immediate. The immediate can be 8-bit (sign-extended) or 32-bit (sign-extended). This is the form GCC uses for multiplication by arbitrary constants when LEA sequences are too complex.

MULX: Multiply Without Clobbering Flags

BMI2 introduces MULX, which performs the same 128-bit unsigned multiply as MUL but does not modify any flags:

; Requires BMI2 (Haswell+ for Intel, Excavator+ for AMD)
; MULX dst_high, dst_low, src  computes dst_high:dst_low = RDX × src (RDX is an implicit source)
mov rdx, multiplicand
mulx r8, r9, rbx               ; r8:r9 = rdx * rbx (r8=high, r9=low)

MULX is designed for pipelined multi-precision multiplication where you want to interleave multiplies and adds without the flag-clobbering of MUL interrupting the flow.

DIV and IDIV: Division and Its Quirks

Division is the odd instruction in the ALU: the dividend must always be in RDX:RAX (a 128-bit value), and after division, the quotient lands in RAX and the remainder in RDX.

; DIV src — unsigned divide RDX:RAX ÷ src → RAX=quotient, RDX=remainder
xor rdx, rdx                  ; clear high 64 bits of dividend
mov rax, 100                   ; low 64 bits: dividend = 100
mov rbx, 7
div rbx                        ; rax = 100/7 = 14, rdx = 100%7 = 2

; IDIV src — signed divide RDX:RAX ÷ src → RAX=quotient, RDX=remainder
; Must sign-extend RAX into RDX first:
mov rax, -100
cqo                            ; CQO: sign-extends RAX into RDX (CDQ for 32-bit)
mov rbx, 7
idiv rbx                       ; rax = -100/7 = -14, rdx = -100%7 = -2

The CQO/CDQ/CWD Family

Before IDIV, you must sign-extend RAX into RDX. The cqo instruction does this:

cqo          ; RDX:RAX = sign_extend(RAX) — for 64-bit IDIV
cdq          ; EDX:EAX = sign_extend(EAX) — for 32-bit IDIV
cwd          ; DX:AX = sign_extend(AX)   — for 16-bit IDIV

For unsigned DIV, simply clear RDX with xor rdx, rdx.

Division by Zero

If the divisor is zero, or if the quotient would not fit in RAX (e.g., dividing a huge 128-bit number by 1, resulting in a quotient that exceeds 64 bits), the processor raises #DE (Divide Error exception). On Linux and other Unix-like systems, the kernel delivers this to the process as a SIGFPE signal. There is no flag you can check after the fact; the exception fires during the instruction.

; Always check divisor before dividing
test rbx, rbx                  ; is divisor zero?
jz   division_by_zero_handler  ; handle the error before dividing
xor  rdx, rdx
div  rbx                       ; safe to proceed
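For signed division with cqo + idiv there is a second faulting input besides a zero divisor: INT64_MIN divided by -1, whose quotient 2^63 does not fit in RAX. A C sketch of a guard covering both cases follows; `checked_idiv64` is a hypothetical helper, not a library function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard around signed 64-bit division. Returns false for the
   two inputs that would raise #DE with cqo + idiv. */
static bool checked_idiv64(int64_t n, int64_t d, int64_t *q, int64_t *r) {
    if (d == 0) return false;                     /* divide by zero */
    if (n == INT64_MIN && d == -1) return false;  /* quotient 2^63 does not fit */
    *q = n / d;                                   /* truncates toward zero, like IDIV */
    *r = n % d;                                   /* remainder takes the dividend's sign */
    return true;
}
```

The same two checks translate directly into a pair of compare-and-branch instructions placed before the idiv.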

Dividing by Powers of 2

Division by a power of 2 is much faster with shifts:

; Unsigned divide by 8 (2^3):
shr rax, 3                     ; rax = rax / 8 (unsigned, logical shift)

; Signed shift by 3: floor division (rounds toward negative infinity):
sar rax, 3                     ; rax = floor(rax / 8), e.g. -9 becomes -2

; Alternative: C-style signed divide by 8 (rounds toward zero).
; Use this sequence INSTEAD of the bare SAR above, not after it:
mov rbx, rax
sar rbx, 63                    ; rbx = all-ones if negative, all-zeros otherwise
and rbx, 7                     ; rbx = 7 if negative (bias of 2^3 - 1), 0 otherwise
add rax, rbx                   ; bias negative values so the shift truncates toward zero
sar rax, 3                     ; rax = rax / 8 with C semantics, e.g. -9 becomes -1

GCC generates an equivalent adjustment automatically for signed division by power-of-2 constants, though the exact instructions vary with compiler version and optimization level.
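The bias trick can be checked against C's own division operator. This sketch assumes that `>>` on a negative `int64_t` is an arithmetic shift, as GCC and Clang implement it; the C standard leaves that behavior implementation-defined. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumes >> on a negative int64_t is an arithmetic shift (GCC, Clang). */
static int64_t sar3(int64_t x) {
    return x >> 3;                      /* floor(x / 8), like a bare SAR */
}

static int64_t div8_trunc(int64_t x) {
    int64_t bias = (x >> 63) & 7;       /* 7 for negative x, 0 otherwise */
    return (x + bias) >> 3;             /* matches C's x / 8 */
}
```

For x = -9, `sar3` gives -2 while `div8_trunc` gives -1, which is the value C's `-9 / 8` produces.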

Boolean Logic Instructions

AND: Masking Bits

and rax, 0xFF                  ; keep only the low 8 bits (mask upper 56 bits to 0)
and rax, rbx                   ; rax &= rbx
and [rbx], rax                 ; memory &= rax

Clearing specific bits (mask has 0s in the positions to clear):

; Clear bit 5 of rax:
and rax, ~(1 << 5)             ; NASM: and rax, 0xFFFFFFFFFFFFFFDF
; (~ is bitwise NOT in expressions; NASM supports this in constants)

OR: Setting Bits

or rax, 0x100                  ; set bit 8 of rax
or rax, rbx                    ; rax |= rbx

XOR: Toggling Bits and Zeroing Registers

xor rax, 0x1                   ; toggle bit 0 of rax
xor rax, rax                   ; rax = 0 (zeroing idiom; 3 bytes because of the REX.W prefix)
xor eax, eax                   ; eax = 0, also zeros upper 32 bits of rax (shorter encoding)

xor eax, eax is the standard way to zero a register in x86-64. It encodes in 2 bytes, compared with 5 bytes for mov eax, 0 and 7 bytes for mov rax, 0 with a sign-extended 32-bit immediate (10 bytes with a full 64-bit immediate), and processors have special zero-idiom recognition that removes the data dependency on the previous value of EAX.

⚡ Performance Note: Modern Intel and AMD processors recognize xor reg, reg as a "zeroing idiom" and typically handle it at the rename stage without reading the previous register value, so it carries no dependency on the old value and, on many microarchitectures, needs no ALU execution slot. sub eax, eax is recognized the same way. mov eax, 0 is not a zeroing idiom: it has no input dependency either, but it is longer and generally still needs an execution unit.
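The three masking patterns from the AND, OR, and XOR sections map one-to-one onto C operators. A minimal sketch, assuming the bit index n is below 64 (the helper names are hypothetical):

```c
#include <stdint.h>

/* Each helper requires n < 64. */
static uint64_t clear_bit(uint64_t x, unsigned n) {
    return x & ~(UINT64_C(1) << n);     /* AND with a mask that has a 0 at bit n */
}

static uint64_t set_bit(uint64_t x, unsigned n) {
    return x | (UINT64_C(1) << n);      /* OR with a mask that has a 1 at bit n */
}

static uint64_t toggle_bit(uint64_t x, unsigned n) {
    return x ^ (UINT64_C(1) << n);      /* XOR flips bit n and leaves the rest */
}
```

Toggling the same bit twice restores the original value, and XORing a value with itself yields zero, which is the property the zeroing idiom relies on.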

NOT: Bitwise Complement

not rax                        ; rax = ~rax (all bits flipped)
not byte [rbx]                 ; flip all bits in memory byte

NOT does not modify any flags.

TEST: AND Without Storing Result

TEST computes the bitwise AND of its two operands and sets flags based on the result, but does not write the result anywhere:

test rax, rax                  ; sets ZF if rax == 0, SF if rax < 0 (signed)
test rax, 0x1                  ; sets ZF if bit 0 is clear, clears ZF if bit 0 is set
test rax, rbx                  ; flags reflect rax & rbx (but rax and rbx unchanged)

test rax, rax is the standard way to check if a register is zero without modifying it. It is preferred over cmp rax, 0 because it has a shorter encoding.

CMP: SUB Without Storing Result

CMP subtracts its second operand from the first, sets flags, and discards the result:

cmp rax, rbx                   ; flags reflect rax - rbx (but rax unchanged)
cmp rax, 0                     ; same ZF, SF, CF, and OF as test rax, rax, but one byte longer
cmp [rbx], rax                 ; compare memory to register

After CMP, you check flags with conditional jumps (Chapter 10) or conditional moves:
- jl (jump if less, signed): checks SF ≠ OF
- jb (jump if below, unsigned): checks CF = 1
- je / jz: checks ZF = 1
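The flag conditions listed for jl, jb, and je can be exercised with a small C model of CMP. The `cmp_flags` struct and the `cond_*` helpers are hypothetical names; the sketch only illustrates why the same comparison can be "less" in one interpretation and not in the other.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, of, zf, sf; } cmp_flags;   /* hypothetical flag record */

static cmp_flags model_cmp64(uint64_t a, uint64_t b) {
    uint64_t d = a - b;                               /* computed, then discarded */
    cmp_flags f;
    f.cf = a < b;                                     /* unsigned borrow */
    f.of = (((a ^ b) & (a ^ d)) >> 63) != 0;          /* signed overflow of a - b */
    f.zf = d == 0;
    f.sf = (d >> 63) != 0;
    return f;
}

static bool cond_jl(cmp_flags f) { return f.sf != f.of; }   /* signed less */
static bool cond_jb(cmp_flags f) { return f.cf; }           /* unsigned below */
static bool cond_je(cmp_flags f) { return f.zf; }           /* equal */
```

Comparing the bit pattern of -1 against 1 makes `cond_jl` true (as signed, -1 < 1) while `cond_jb` is false (as unsigned, 2^64 - 1 is not below 1).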

Flag Summary Table

Instruction	CF	OF	ZF	SF	AF	PF
ADD / SUB	✓	✓	✓	✓	✓	✓
ADC / SBB	✓	✓	✓	✓	✓	✓
INC / DEC	—	✓	✓	✓	✓	✓
NEG	✓	✓	✓	✓	✓	✓
MUL	✓	✓	?	?	?	?
IMUL	✓	✓	?	?	?	?
DIV / IDIV	?	?	?	?	?	?
AND / OR / XOR	0	0	✓	✓	?	✓
NOT	—	—	—	—	—	—
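TEST	0	0	✓	✓	?	✓
CMP	✓	✓	✓	✓	✓	✓
SHL / SHR	✓	✓	✓	✓	?	✓
SAR	✓	✓	✓	✓	?	✓

Legend: ✓ = modified based on result, — = not modified, 0 = cleared, ? = undefined/unpredictable. For the shift rows, OF is defined only for 1-bit shifts, and a shift count of 0 leaves all flags unchanged.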

Shift and Rotate Instructions

SHL and SHR: Logical Shifts

; SHL: shift left logical (shift in zeros from right)
shl rax, 1                     ; rax <<= 1 (multiply by 2)
shl rax, 3                     ; rax <<= 3 (multiply by 8)
shl rax, cl                    ; shift count from CL (the only register allowed as shift count)
shl rax, 63                    ; maximum useful shift

; SHR: shift right logical (shift in zeros from left)
shr rax, 1                     ; rax >>= 1 (unsigned divide by 2)
shr rax, cl                    ; shift count from CL

SHL/SHR treat the operand as an unsigned binary number. After SHL, CF holds the last bit shifted out of the MSB. After SHR, CF holds the last bit shifted out of the LSB.

The shift count must be in CL or be an 8-bit immediate. For 64-bit operands, only bits 5:0 of the count are used (count mod 64). A count of 0 does not modify flags.
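The count-masking rule matters when translating to C, where shifting a 64-bit value by 64 or more is undefined rather than masked. The sketch below models SHL with the mask applied explicitly; `model_shl64` is a hypothetical name and `*cf` stands in for the carry flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of SHL on a 64-bit operand. *cf is the caller's CF. */
static uint64_t model_shl64(uint64_t x, unsigned count, bool *cf) {
    count &= 63;                          /* hardware uses only bits 5:0 */
    if (count == 0) return x;             /* count 0: value and flags unchanged */
    *cf = ((x >> (64 - count)) & 1) != 0; /* last bit shifted out of the MSB */
    return x << count;
}
```

A count of 64 therefore behaves like a count of 0, and a count of 65 behaves like a count of 1.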

SAR: Arithmetic Shift Right

; SAR: shift right arithmetic (shifts in copies of the sign bit)
sar rax, 1                     ; signed divide by 2 (rounds toward -inf)
sar rax, cl

SAR propagates the sign bit (MSB) into the vacated positions. This is the correct right shift for signed values: shifting -8 right by 1 gives -4, whereas SHR would shift in a zero and produce the large positive value 0x7FFFFFFFFFFFFFFC.

⚠️ Common Mistake: SAR rounds toward negative infinity, not toward zero. C's right shift for signed integers is implementation-defined, but most compilers (GCC, Clang) use SAR, which means -7 >> 1 gives -4 (not -3) in C. For exact C-style signed division by power of 2, you need the add-before-shift adjustment shown in the division section.

ROL, ROR: Rotate (Without Carry)

; ROL: rotate left — bit shifted out of MSB wraps into LSB
rol rax, 1                     ; rotate left by 1
rol rax, cl

; ROR: rotate right — bit shifted out of LSB wraps into MSB
ror rax, 1
ror rax, 8                     ; rotate right by one byte (low byte moves to the top byte)
ror rax, cl

Rotates do not discard bits; they circulate them around the register. The last bit rotated wraps into CF as well as into the opposite end of the register.
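C has no rotate operator, so rotates are usually written as a pair of shifts that compilers commonly recognize and turn into ROL or ROR. A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

static uint64_t rotl64(uint64_t x, unsigned n) {
    n &= 63;
    /* The & 63 on the right shift avoids an undefined shift by 64 when n is 0. */
    return (x << n) | (x >> ((64 - n) & 63));
}

static uint64_t rotr64(uint64_t x, unsigned n) {
    n &= 63;
    return (x >> n) | (x << ((64 - n) & 63));
}
```

Rotating `0x8000000000000001` left by 1 gives 3: the top bit wraps around into bit 0 instead of being lost.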

RCL and RCR: Rotate Through Carry

; RCL: rotate left through carry — CF becomes new LSB, old MSB goes to CF
rcl rax, 1

; RCR: rotate right through carry — CF becomes new MSB, old LSB goes to CF
rcr rax, 1

RCL/RCR treat the register + CF as a (N+1)-bit register and rotate through that combined value. Useful in multi-precision shift operations.
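One multi-precision use is shifting a 128-bit value right by one bit with `shr` on the high half followed by `rcr` on the low half. The C sketch below models that pairing; the function names are hypothetical and the `bool` stands in for CF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of RCR by 1: CF enters at the top, old LSB leaves into CF. */
static uint64_t model_rcr1(uint64_t x, bool *cf) {
    bool out = (x & 1) != 0;
    uint64_t r = (x >> 1) | ((uint64_t)*cf << 63);
    *cf = out;
    return r;
}

/* Shift the 128-bit value hi:lo right by one bit (models shr hi, 1 then rcr lo, 1). */
static void shr128_by1(uint64_t *hi, uint64_t *lo) {
    bool cf = (*hi & 1) != 0;     /* SHR leaves the bit shifted out of hi in CF */
    *hi >>= 1;
    *lo = model_rcr1(*lo, &cf);   /* RCR pulls that bit into the top of lo */
}
```

The bit that falls off the bottom of the high half reappears as the top bit of the low half, carried across by CF.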

SHLD and SHRD: Double-Precision Shifts

; SHLD dst, src, count — shift dst left, filling from MSB of src
shld rax, rbx, 4               ; shift rax left 4, low 4 bits come from high 4 of rbx
shld rax, rbx, cl

; SHRD dst, src, count — shift dst right, filling from LSB of src
shrd rax, rbx, 4

SHLD/SHRD are used for multi-precision shifts (shifting a 128-bit value stored in two registers) and for arbitrary-position bit field extraction.

; Example: logical left shift of 128-bit value [RDX:RAX] by 8
shld rdx, rax, 8               ; rdx shifts left, its low 8 get rax's high 8
shl  rax, 8                    ; rax shifts left, low 8 become zero
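The SHLD + SHL pair corresponds to this C sketch of a 128-bit left shift. It is valid only for counts from 1 to 63, since a shift by 64 is undefined in C; `shl128` is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: shift hi:lo left by n bits, valid for 1 <= n <= 63. */
static void shl128(uint64_t *hi, uint64_t *lo, unsigned n) {
    *hi = (*hi << n) | (*lo >> (64 - n));   /* SHLD: hi takes lo's top n bits */
    *lo <<= n;                              /* SHL: zeros enter at the bottom */
}
```

As in the assembly, the high half must be updated first, because it needs the low half's original top bits.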

C Comparison: Arithmetic in C vs. Assembly

int64_t  a = ..., b = ...;
uint64_t ua = ..., ub = ...;

a + b          → add rax, rbx (check OF for signed overflow)
ua + ub        → add rax, rbx (check CF for unsigned overflow)
a - b          → sub rax, rbx
a * b          → imul rax, rbx  (two-op, truncated)
ua * ub        → mul rbx (one-op: rdx:rax = rax*rbx, or imul if truncation OK)
a / b          → cqo; idiv rbx  (quotient in rax)
ua / ub        → xor rdx,rdx; div rbx (quotient in rax)
a % b          → cqo; idiv rbx  (remainder in rdx)
~a             → not rax
a & b          → and rax, rbx
a | b          → or  rax, rbx
a ^ b          → xor rax, rbx
a << b         → shl rax, cl  (b in CL)
(int64_t)a >> b → sar rax, cl
(uint64_t)a >> b → shr rax, cl

Complete Example: Implementing a 128-bit Multiply

The following function computes the full 128-bit product of two 64-bit unsigned integers:

; uint128_t mul128(uint64_t a, uint64_t b)
; System V ABI: a=RDI, b=RSI
; Returns: low 64 bits in RAX, high 64 bits in RDX

section .text
global mul128

mul128:
    mov rax, rdi           ; rax = a
    mul rsi                ; rdx:rax = a * b (128-bit unsigned)
    ret                    ; rdx:rax is the return (128-bit return convention)

That is three instructions including the ret. The MUL instruction does the work.

For a more complex example, let us compute (a * b) + c with full 128-bit precision:

; uint128_t mul_add_128(uint64_t a, uint64_t b, uint64_t c)
; a=RDI, b=RSI, c=RDX
; Returns rdx:rax = a*b + c

mul_add_128:
    mov rcx, rdx           ; save c in a caller-saved register (rdx is clobbered by mul)
    mov rax, rdi           ; rax = a
    mul rsi                ; rdx:rax = a*b
    add rax, rcx           ; add c to low half
    adc rdx, 0             ; propagate carry to high half
    ret
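For cross-checking the assembly, the same computation can be written in C with `unsigned __int128`. Note that this type is a GCC/Clang extension available on 64-bit targets, not standard C, and the out-parameter signature here is an assumption made for the sketch.

```c
#include <stdint.h>

/* Uses unsigned __int128, a GCC/Clang extension on 64-bit targets,
   not standard C. */
static void mul_add_128(uint64_t a, uint64_t b, uint64_t c,
                        uint64_t *hi, uint64_t *lo) {
    unsigned __int128 t = (unsigned __int128)a * b + c;
    *lo = (uint64_t)t;           /* what the assembly returns in RAX */
    *hi = (uint64_t)(t >> 64);   /* what the assembly returns in RDX */
}
```

Even with all three inputs at their maximum, the result is 2^128 - 2^64, so a*b + c always fits in 128 bits and the final `adc rdx, 0` can never carry out.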

🔄 Check Your Understanding: If a = 0xFFFFFFFFFFFFFFFF and b = 2, what are the values of RAX and RDX after mul rsi (assuming RSI = b and RAX = a)?

Answer

RAX = 0xFFFFFFFFFFFFFFFE, RDX = 0x0000000000000001

The product is (2^64 - 1) × 2 = 2^65 - 2.

Low 64 bits (RAX): (2^65 - 2) mod 2^64 = 2^64 - 2 = 0xFFFFFFFFFFFFFFFE

High 64 bits (RDX): (2^65 - 2) >> 64 = 1 = 0x0000000000000001

Register Trace: ADD, ADC, MUL, and DIV

section .text
global _start
_start:
    ; Step 1: Simple addition
    mov rax, 0x7FFFFFFFFFFFFFFF  ; INT64_MAX
    add rax, 1                    ; overflow!
    ; RAX = 0x8000000000000000, OF = 1, CF = 0

    ; Step 2: 128-bit addition
    mov rax, 0xFFFFFFFFFFFFFFFF  ; A_lo
    mov rdx, 1                    ; A_hi → A = 2^64 - 1 + 2^64 = 2^65 - 1
    mov rbx, 1                    ; B_lo
    xor rcx, rcx                  ; B_hi → B = 1
    add rax, rbx                  ; rax = A_lo + B_lo = 0, CF = 1
    adc rdx, rcx                  ; rdx = A_hi + B_hi + CF = 1 + 0 + 1 = 2

    ; Step 3: Multiplication
    mov rax, 1000000000            ; 10^9
    mov rbx, 1000000000
    mul rbx                        ; rdx:rax = 10^18 = 0x0DE0B6B3A7640000
    ; rdx = 0, rax = 0x0DE0B6B3A7640000 (fits in 64 bits)

    ; Step 4: Division
    xor rdx, rdx                   ; clear high half
    mov rax, 1000000               ; dividend = 10^6
    mov rcx, 7
    div rcx                        ; rax = 142857, rdx = 1

    mov eax, 60
    xor edi, edi
    syscall

Step	RAX	RDX	OF	CF	ZF	Notes
After step 1	0x8000000000000000	—	1	0	0	Signed overflow (INT64_MAX + 1)
After `add rax, rbx`	0	1	0	1	1	Low half wrapped, carry set
After `adc rdx, rcx`	0	2	0	0	0	Carry consumed, 128-bit result correct
After `mul rbx`	0x0DE0B6B3A7640000	0	0	0	?	Product fits in 64 bits (ZF undefined after MUL)
After `div rcx`	142857	1	?	?	?	Quotient and remainder (all flags undefined after DIV)

🛠️ Lab Exercise: Run the code above in GDB. After the add rax, 1 instruction, use info registers eflags to see the flags word. Find the OF bit (bit 11). After adc rdx, rcx, verify that RDX is 2 and the carry was consumed.

Summary

The integer ALU instructions form a complete vocabulary for every computation a program needs. ADD and SUB are the workhorses, but their flag semantics — particularly the difference between CF (unsigned overflow) and OF (signed overflow) — are what make conditional branches meaningful. INC and DEC are compact but do not touch CF, which matters for ADC/SBB chains. Multiplication has four distinct forms to match different use cases; division always uses the RDX:RAX convention and requires careful setup. The Boolean instructions AND, OR, XOR, and NOT cover every bit manipulation need, and TEST + CMP provide non-destructive comparison operations that drive every conditional branch in Chapter 10.