Chapter 13 Further Reading: Bit Manipulation
The Classic Reference
"Hacker's Delight" by Henry S. Warren, Jr. (2nd edition) Addison-Wesley, 2012. ISBN: 0321842685. The essential book. The most important chapter for this material is Chapter 2 (Basics), which covers the rightmost-bit idioms such as x & (x - 1), the power-of-2 tests built on them, and their mathematical foundations. Chapter 3 covers rounding to power-of-2 boundaries, and Chapters 8 through 10 cover multiplication, integer division, and division by constants. The treatment of population count algorithms (Chapter 5) and integer logarithms (Chapter 11) is exhaustive. The assembly programmer's desk reference for bit tricks.
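"Beautiful Code" (eds. Andy Oram and Greg Wilson) O'Reilly, 2007. Chapter 1, "A Regular Expression Matcher," is Brian Kernighan's walkthrough of a compact matcher written by Rob Pike. It uses no bit manipulation itself, but it illustrates the design thinking. Chapter 10, "The Quest for an Accelerated Population Count" by Henry S. Warren, Jr., applies that thinking directly to bit counting. Neither chapter is x86-specific.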
Intel and AMD Documentation
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (Instruction Set Reference). Every bit manipulation instruction is fully documented: the BMI1 instructions (ANDN, BEXTR, BLSI, BLSMSK, BLSR, TZCNT), the BMI2 instructions (BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX), and LZCNT, which has its own CPUID feature bit rather than belonging to BMI1. Each entry includes the exact operation, flag behavior, and the CPUID feature bit used for detection.
"Intel Intrinsics Guide" — software.intel.com/sites/landingpage/IntrinsicsGuide/
The intrinsics guide allows you to search for compiler intrinsics that map to specific instructions. For example, _mm_popcnt_u64 maps to POPCNT. This is useful when writing C code that uses these instructions via intrinsics rather than inline assembly.
Cryptography
"Applied Cryptography" by Bruce Schneier (2nd edition) John Wiley & Sons, 1996. Still the most accessible overview of symmetric cryptography. Chapter 1 explains why XOR is the building block of stream ciphers. The book predates AES, but its DES chapter (Chapter 12) shows how confusion (S-boxes) and diffusion (permutations, mixing) work together to achieve security that XOR alone cannot.
NIST FIPS 197: Advanced Encryption Standard (AES). The AES specification: nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf. Section 5 describes the SubBytes (S-box), ShiftRows, MixColumns, and AddRoundKey transformations. AESENC performs one full round of all four, and AESENCLAST performs the final round, which omits MixColumns. Reading this alongside the Chapter 15 AES-NI implementation shows precisely which operations each instruction performs.
"Faster and Timing-Attack Resistant AES-GCM" by Emilia Käsper and Peter Schwabe, CHES 2009 (also available as an IACR ePrint, 2009). Presents a bitsliced AES implementation for x86-64 processors without AES-NI that set software speed records at the time and, because it uses no table lookups, resists cache-timing attacks. It shows how much sophisticated bit manipulation fast, safe software AES required before hardware support existed. The contrast with AES-NI performance in Chapter 15 is stark.
BMI1/BMI2
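Intel Architecture Instruction Set Extensions Programming Reference, Chapter on BMI1/BMI2. intel.com/content/dam/doc/manual/64-ia-32-arch-instruction-set-ext-manual.pdf. The original programming reference for the BMI1/BMI2 extensions, published before they were folded into the SDM. When applying it, keep the microarchitecture in mind: PEXT and PDEP have a 3-cycle latency on Intel processors from Haswell onward, but on AMD processors before Zen 3 they are microcoded and far slower, with timing that depends on the mask. Check per-microarchitecture latency and throughput tables before relying on them in hot loops.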
"Parsing Integers Quickly" — Daniel Lemire, blog. lemire.me/blog Shows how PEXT/PDEP can accelerate SIMD parsing of integers from text. Demonstrates real-world use of these BMI2 instructions beyond the toy examples in textbooks.