Chapter 13 Key Takeaways: Bit Manipulation

  1. AND isolates and clears bits; OR sets bits; XOR toggles bits. These three operations form the complete vocabulary of bitwise manipulation. Every bit manipulation problem reduces to some combination of them.

  2. TEST reg, mask is the preferred way to check specific bits without modifying the register. TEST computes AND without storing the result, setting ZF (zero if no bits in common) and SF. Follow with JZ (bits not set) or JNZ (at least one bit set).

  3. BT reg, bit_index copies a specific bit to CF without modifying the source register. BTS sets the bit, BTR clears it, BTC toggles it. The memory form BT [mem], reg treats memory as a bit array of arbitrary length — useful for large bitsets.

  4. BSF (Bit Scan Forward) and BSR (Bit Scan Reverse) have undefined behavior when the source is zero. Always check for zero before using them, or prefer TZCNT/LZCNT (BMI1) which return 64 for zero input.

  5. LZCNT and TZCNT (BMI1) are strictly better than BSR/BSF for new code. They have well-defined behavior for zero input, provide leading/trailing zero counts directly (without the conversion step needed for BSR/BSF), and have no false-zero-input undefined behavior.

  6. POPCNT counts set bits in a single instruction. It replaces multi-step software algorithms (the parallel counting technique) that required 10+ instructions. Always use POPCNT when available (all modern x86-64 CPUs have it).

  7. BMI1 provides ANDN, BLSI, BLSMSK, BLSR, BEXTR. These cover the most common single-bit manipulation patterns: BLSI isolates the lowest set bit, BLSR clears it, BLSMSK creates a mask up to and including it. Available on Intel Haswell (2013) and AMD Piledriver.

  8. BMI2 provides PEXT, PDEP, BZHI, and the flag-free shifts SARX/SHLX/SHRX/RORX. PEXT and PDEP are the "gather bits" and "scatter bits" operations used in hardware-accelerated CRC, codec implementations, and spatial indexing. SARX/SHLX/SHRX avoid false flag dependencies in shift-heavy code.

  9. x & (x-1) clears the lowest set bit; x & (-x) isolates it. These are the fundamental building blocks of algorithms that iterate over set bits, test for powers of 2, and manipulate bit positions.

  10. The XOR swap (a ^= b; b ^= a; a ^= b) swaps two registers without a temporary, but fails if both operands are the same variable. In practice, XCHG or a three-MOV sequence is clearer; use XOR swap only when saving one instruction is genuinely important.

  11. XOR is the foundation of symmetric cryptography because it is the only Boolean operation that is both invertible and bit-uniform. If the key is truly random and used only once, XOR with the key (one-time pad) is information-theoretically secure. The problem is key management, not the operation itself.

  12. Simple repeating-key XOR is not secure. Ciphertext1 XOR Ciphertext2 = Plaintext1 XOR Plaintext2 when the same key is used. AES and other modern ciphers use XOR for key mixing but surround it with nonlinear transformations (S-Boxes) that prevent this attack.

  13. The imul rax, 0x0101010101010101 trick replicates a byte to all 8 bytes of a 64-bit register. This is used for XOR cipher (key broadcast), SIMD-style byte operations on 64-bit values, and memset-like operations. The multiplier 0x01010101... ensures each byte of the result gets a copy of the original byte.

  14. Bit manipulation instructions (AND, OR, XOR, NOT, shifts, rotates) clear CF and OF. They do not modify CF (except shifts, which put the last shifted-out bit in CF). This is important when bit manipulation instructions appear in sequences that also use conditional branches based on CF.

  15. PARITY computation: popcnt rax, rdi; and rax, 1. The parity of a value (odd or even number of set bits) is the LSB of the popcount. This is used in ECC (error-correcting codes), CRC computation, and certain encoding schemes.