Case Study 13.1: Bit Manipulation Puzzles — Classic Problems in Assembly
Overview
Five classic bit manipulation problems, each implemented in NASM with full register traces. These algorithms appear in low-level systems code, compiler intrinsics, and competitive programming. Understanding them in assembly reinforces why the CPU's bitwise instruction set is complete and expressive.
Problem 1: Test if a Number is a Power of 2
Algorithm: A power of 2 has exactly one set bit. x & (x-1) clears the lowest set bit. If only one bit was set, the result is 0.
; int is_power_of_2(uint64_t x)
; Returns 1 if x is a nonzero power of 2, 0 otherwise
; RDI = x
section .text
global is_power_of_2
is_power_of_2:
xor eax, eax ; result = false (0)
test rdi, rdi ; is x == 0?
jz .done ; 0 is not a power of 2
; x & (x-1) == 0 iff x is a power of 2
lea rcx, [rdi - 1] ; rcx = x - 1
test rdi, rcx ; x & (x-1) == 0?
setz al ; al = 1 if ZF=1 (equal to zero)
movzx eax, al ; zero-extend to 32-bit return
.done:
ret
Trace for x = 8 (0b1000): - x-1 = 7 = 0b0111 - x & (x-1) = 0b1000 & 0b0111 = 0 - ZF = 1, result = 1 (is power of 2) ✓
Trace for x = 6 (0b0110): - x-1 = 5 = 0b0101 - x & (x-1) = 0b0110 & 0b0101 = 0b0100 ≠ 0 - ZF = 0, result = 0 (not power of 2) ✓
BMI1 version (one instruction):
is_power_of_2_bmi1:
blsr rax, rdi ; rax = rdi & (rdi-1); ZF=1 if rdi was power of 2 or 0
setz cl
test rdi, rdi ; also check rdi != 0
setnz al
and al, cl ; both conditions true
movzx eax, al
ret
Problem 2: Round Up to Next Power of 2
Algorithm: Find the highest set bit (BSR), then shift 1 left by (position+1). Handle already-power-of-2 case.
; uint64_t next_power_of_2(uint64_t x)
; Returns smallest power of 2 >= x
; Special cases: x=0 → 0, x is already power of 2 → x
; RDI = x
next_power_of_2:
test rdi, rdi ; x == 0?
jz .return_zero
; Check if already power of 2
lea rcx, [rdi - 1]
test rdi, rcx
jz .already_power ; x & (x-1) == 0 means already a power of 2
; Not a power of 2: find highest bit and shift up one
bsr rcx, rdi ; rcx = position of highest set bit
mov rax, 2
shl rax, cl ; rax = 2 << position = 2^(position+1)
ret
.already_power:
mov rax, rdi
ret
.return_zero:
xor eax, eax
ret
Trace for x = 6: - x & (x-1) = 6 & 5 = 4 ≠ 0: not already a power of 2 - BSR: highest bit of 6 (0b110) is bit 2, so RCX = 2 - 2 << 2 = 8: next power of 2 is 8 ✓
Trace for x = 8: - x & (x-1) = 8 & 7 = 0: already a power of 2 - Returns 8 ✓
Problem 3: Extract a Bit Field
Algorithm: Shift right to bring the field to bit 0, then AND with a mask to zero the upper bits.
; uint64_t extract_bits(uint64_t value, int start, int len)
; Extracts len bits starting at bit position start from value
; RDI = value, ESI = start (0-63), EDX = len (1-64)
; Returns extracted bits in RAX, zero-extended
extract_bits:
; Create mask: (1 << len) - 1, or all-ones if len == 64
mov rcx, rdx ; rcx = len
xor eax, eax
not rax ; rax = 0xFFFFFFFFFFFFFFFF (all ones)
cmp ecx, 64
je .full_mask ; if len == 64, mask is all-ones
shr rax, cl ; shift right by len
not rax ; invert: low len bits are 1, rest are 0
; Alternative: mov rax, 1; shl rax, cl; sub rax, 1 (only works for len < 64)
.full_mask:
; Shift value right by start to bring field to bit 0
mov rcx, rsi ; rcx = start
mov rdx, rdi ; rdx = value (save, since rdi may be clobbered)
shr rdx, cl ; shift right by start
and rax, rdx ; mask to len bits
ret
Trace for value = 0xABCDEF, start = 8, len = 8: - mask = (1 << 8) - 1 = 0xFF - value >> 8 = 0xABCD (lower bytes shifted out) - 0xABCD & 0xFF = 0xCD - Result: 0xCD (the byte at bits 15:8) ✓
Problem 4: Set a Bit Field
Algorithm: Clear the target bits, then OR in the new value shifted to the correct position.
; uint64_t set_bits(uint64_t dst, uint64_t src, int start, int len)
; Sets bits [start+len-1:start] of dst to the low len bits of src
; RDI = dst, RSI = src, EDX = start, ECX = len
set_bits:
; Step 1: Build mask for the field
mov r8, 1
mov r9, rcx ; r9 = len
shl r8, cl ; r8 = 1 << len
sub r8, 1 ; r8 = mask = (1 << len) - 1
; Step 2: Clear the field in dst
mov r10, r8 ; r10 = mask
mov ecx, edx ; ecx = start
shl r10, cl ; r10 = mask << start (positioned mask)
not r10 ; r10 = ~(mask << start) (clear mask)
and rdi, r10 ; dst with field cleared
; Step 3: Isolate and position src value
and rsi, r8 ; isolate low len bits of src
shl rsi, cl ; shift to position
or rdi, rsi ; insert into dst
mov rax, rdi
ret
Trace: set_bits(0xFFFFFFFF, 0b101, start=4, len=3) — insert 0b101 at bits 6:4: - mask = (1<<3) - 1 = 0b111 - clear mask = ~(0b111 << 4) = ~0b1110000 = 0xFFFFFF8F (bits 6:4 cleared) - dst = 0xFFFFFFFF & 0xFFFFFF8F = 0xFFFFFF8F - src isolated = 0b101 & 0b111 = 0b101 - src positioned = 0b101 << 4 = 0b1010000 - result = 0xFFFFFF8F | 0b1010000 = 0xFFFFFFAF - Bits 6:4 of result = 0b101 ✓
Problem 5: Count Bit Transitions
Count the number of positions where adjacent bits differ (useful for encoding efficiency, signal transitions, run-length encoding analysis):
; uint64_t count_transitions(uint64_t x)
; Counts how many adjacent bit pairs in x differ (0→1 or 1→0)
; RDI = x
count_transitions:
mov rax, rdi ; rax = x
mov rcx, rdi
shr rcx, 1 ; rcx = x >> 1 (shift one position)
xor rax, rcx ; rax = x XOR (x>>1): bit i is 1 iff bit i differs from bit i+1
popcnt rax, rax ; count the transitions
ret
Trace for x = 0b10110001: - x >> 1 = 0b01011000 - XOR = 0b11101001 (1s at positions where adjacent bits differ) - popcount(0b11101001) = 5 transitions ✓ (transitions: 1→0 at pos 7, 0→1 at pos 5, 1→1 at pos 4=same, 1→0 at pos 3, 0→0 at pos 2=same, 0→0 at pos 1=same, 0→1 at pos 0) Wait: 0b10110001: positions 7,6,5,4,3,2,1,0 = 1,0,1,1,0,0,0,1 Transitions: 7→6: 1→0 (diff), 6→5: 0→1 (diff), 5→4: 1→1 (same), 4→3: 1→0 (diff), 3→2: 0→0 (same), 2→1: 0→0 (same), 1→0: 0→1 (diff) Count = 4 transitions. Let's verify: XOR = 1 XOR 0 = 1, 0 XOR 1 = 1, 1 XOR 1 = 0, 1 XOR 0 = 1, 0 XOR 0 = 0, 0 XOR 0 = 0, 0 XOR 1 = 1. So the XOR of overlapping pairs = 0b1101001 (7 bits, positions 6:0), popcount = 4. Correct.
Synthesis: Why These Work in Assembly
Each of these algorithms relies on one insight:
- Power of 2 check:
x & (x-1)clears the lowest bit. Hardware AND in 1 cycle. - Next power of 2: BSR finds the highest bit's position in 3 cycles. One shift gives the answer.
- Field extract: SHR + AND. Two instructions regardless of field width.
- Field set: Three instructions to clear and insert. The same number as the C equivalent.
- Transitions: XOR adjacent bits, then POPCNT. Two instructions.
In C, these require the same operations but the compiler must compile them from the source representation. In assembly, you write the operations directly, with no abstraction layer to potentially miscompile.