Case Study 13.1: Bit Manipulation Puzzles — Classic Problems in Assembly

Open Assembly Language Project

Case Study 13.1: Bit Manipulation Puzzles — Classic Problems in Assembly

Overview

Five classic bit manipulation problems, each implemented in NASM with full register traces. These algorithms appear in low-level systems code, compiler intrinsics, and competitive programming. Understanding them in assembly reinforces why the CPU's bitwise instruction set is complete and expressive.

Problem 1: Test if a Number is a Power of 2

Algorithm: A power of 2 has exactly one set bit. x & (x-1) clears the lowest set bit. If only one bit was set, the result is 0.

; int is_power_of_2(uint64_t x)
; Returns 1 if x is a nonzero power of 2, 0 otherwise
; RDI = x

section .text
global is_power_of_2

is_power_of_2:
    xor  eax, eax          ; result = false (0)
    test rdi, rdi          ; is x == 0?
    jz   .done             ; 0 is not a power of 2

    ; x & (x-1) == 0 iff x is a power of 2
    lea  rcx, [rdi - 1]    ; rcx = x - 1
    test rdi, rcx          ; x & (x-1) == 0?
    setz al                ; al = 1 if ZF=1 (equal to zero)
    movzx eax, al          ; zero-extend to 32-bit return

.done:
    ret

Trace for x = 8 (0b1000): - x-1 = 7 = 0b0111 - x & (x-1) = 0b1000 & 0b0111 = 0 - ZF = 1, result = 1 (is power of 2) ✓

Trace for x = 6 (0b0110): - x-1 = 5 = 0b0101 - x & (x-1) = 0b0110 & 0b0101 = 0b0100 ≠ 0 - ZF = 0, result = 0 (not power of 2) ✓

BMI1 version (one instruction):

is_power_of_2_bmi1:
    blsr rax, rdi          ; rax = rdi & (rdi-1); ZF=1 if rdi was power of 2 or 0
    setz cl
    test rdi, rdi          ; also check rdi != 0
    setnz al
    and  al, cl            ; both conditions true
    movzx eax, al
    ret

Problem 2: Round Up to Next Power of 2

Algorithm: Find the highest set bit (BSR), then shift 1 left by (position+1). Handle already-power-of-2 case.

; uint64_t next_power_of_2(uint64_t x)
; Returns smallest power of 2 >= x
; Special cases: x=0 → 0, x is already power of 2 → x
; RDI = x

next_power_of_2:
    test rdi, rdi          ; x == 0?
    jz   .return_zero

    ; Check if already power of 2
    lea  rcx, [rdi - 1]
    test rdi, rcx
    jz   .already_power    ; x & (x-1) == 0 means already a power of 2

    ; Not a power of 2: find highest bit and shift up one
    bsr  rcx, rdi          ; rcx = position of highest set bit
    mov  rax, 2
    shl  rax, cl           ; rax = 2 << position = 2^(position+1)
    ret

.already_power:
    mov  rax, rdi
    ret

.return_zero:
    xor  eax, eax
    ret

Trace for x = 6: - x & (x-1) = 6 & 5 = 4 ≠ 0: not already a power of 2 - BSR: highest bit of 6 (0b110) is bit 2, so RCX = 2 - 2 << 2 = 8: next power of 2 is 8 ✓

Trace for x = 8: - x & (x-1) = 8 & 7 = 0: already a power of 2 - Returns 8 ✓

Problem 3: Extract a Bit Field

Algorithm: Shift right to bring the field to bit 0, then AND with a mask to zero the upper bits.

; uint64_t extract_bits(uint64_t value, int start, int len)
; Extracts len bits starting at bit position start from value
; RDI = value, ESI = start (0-63), EDX = len (1-64)
; Returns extracted bits in RAX, zero-extended

extract_bits:
    ; Create mask: (1 << len) - 1, or all-ones if len == 64
    mov  rcx, rdx          ; rcx = len
    xor  eax, eax
    not  rax               ; rax = 0xFFFFFFFFFFFFFFFF (all ones)
    cmp  ecx, 64
    je   .full_mask        ; if len == 64, mask is all-ones
    shr  rax, cl           ; shift right by len
    not  rax               ; invert: low len bits are 1, rest are 0
    ; Alternative: mov rax, 1; shl rax, cl; sub rax, 1 (only works for len < 64)

.full_mask:
    ; Shift value right by start to bring field to bit 0
    mov  rcx, rsi          ; rcx = start
    mov  rdx, rdi          ; rdx = value (save, since rdi may be clobbered)
    shr  rdx, cl           ; shift right by start
    and  rax, rdx          ; mask to len bits
    ret

Trace for value = 0xABCDEF, start = 8, len = 8: - mask = (1 << 8) - 1 = 0xFF - value >> 8 = 0xABCD (lower bytes shifted out) - 0xABCD & 0xFF = 0xCD - Result: 0xCD (the byte at bits 15:8) ✓

Problem 4: Set a Bit Field

Algorithm: Clear the target bits, then OR in the new value shifted to the correct position.

; uint64_t set_bits(uint64_t dst, uint64_t src, int start, int len)
; Sets bits [start+len-1:start] of dst to the low len bits of src
; RDI = dst, RSI = src, EDX = start, ECX = len

set_bits:
    ; Step 1: Build mask for the field
    mov  r8, 1
    mov  r9, rcx           ; r9 = len
    shl  r8, cl            ; r8 = 1 << len
    sub  r8, 1             ; r8 = mask = (1 << len) - 1

    ; Step 2: Clear the field in dst
    mov  r10, r8           ; r10 = mask
    mov  ecx, edx          ; ecx = start
    shl  r10, cl           ; r10 = mask << start (positioned mask)
    not  r10               ; r10 = ~(mask << start) (clear mask)
    and  rdi, r10          ; dst with field cleared

    ; Step 3: Isolate and position src value
    and  rsi, r8           ; isolate low len bits of src
    shl  rsi, cl           ; shift to position
    or   rdi, rsi          ; insert into dst

    mov  rax, rdi
    ret

Trace: set_bits(0xFFFFFFFF, 0b101, start=4, len=3) — insert 0b101 at bits 6:4: - mask = (1<<3) - 1 = 0b111 - clear mask = ~(0b111 << 4) = ~0b1110000 = 0xFFFFFF8F (bits 6:4 cleared) - dst = 0xFFFFFFFF & 0xFFFFFF8F = 0xFFFFFF8F - src isolated = 0b101 & 0b111 = 0b101 - src positioned = 0b101 << 4 = 0b1010000 - result = 0xFFFFFF8F | 0b1010000 = 0xFFFFFFAF - Bits 6:4 of result = 0b101 ✓

Problem 5: Count Bit Transitions

Count the number of positions where adjacent bits differ (useful for encoding efficiency, signal transitions, run-length encoding analysis):

; uint64_t count_transitions(uint64_t x)
; Counts how many adjacent bit pairs in x differ (0→1 or 1→0)
; RDI = x

count_transitions:
    mov  rax, rdi          ; rax = x
    mov  rcx, rdi
    shr  rcx, 1            ; rcx = x >> 1 (shift one position)
    xor  rax, rcx          ; rax = x XOR (x>>1): bit i is 1 iff bit i differs from bit i+1
    popcnt rax, rax        ; count the transitions
    ret

Trace for x = 0b10110001: - x >> 1 = 0b01011000 - XOR = 0b11101001 (1s at positions where adjacent bits differ) - popcount(0b11101001) = 5 transitions ✓ (transitions: 1→0 at pos 7, 0→1 at pos 5, 1→1 at pos 4=same, 1→0 at pos 3, 0→0 at pos 2=same, 0→0 at pos 1=same, 0→1 at pos 0) Wait: 0b10110001: positions 7,6,5,4,3,2,1,0 = 1,0,1,1,0,0,0,1 Transitions: 7→6: 1→0 (diff), 6→5: 0→1 (diff), 5→4: 1→1 (same), 4→3: 1→0 (diff), 3→2: 0→0 (same), 2→1: 0→0 (same), 1→0: 0→1 (diff) Count = 4 transitions. Let's verify: XOR = 1 XOR 0 = 1, 0 XOR 1 = 1, 1 XOR 1 = 0, 1 XOR 0 = 1, 0 XOR 0 = 0, 0 XOR 0 = 0, 0 XOR 1 = 1. So the XOR of overlapping pairs = 0b1101001 (7 bits, positions 6:0), popcount = 4. Correct.

Synthesis: Why These Work in Assembly

Each of these algorithms relies on one insight:

Power of 2 check: x & (x-1) clears the lowest bit. Hardware AND in 1 cycle.
Next power of 2: BSR finds the highest bit's position in 3 cycles. One shift gives the answer.
Field extract: SHR + AND. Two instructions regardless of field width.
Field set: Three instructions to clear and insert. The same number as the C equivalent.
Transitions: XOR adjacent bits, then POPCNT. Two instructions.

In C, these require the same operations but the compiler must compile them from the source representation. In assembly, you write the operations directly, with no abstraction layer to potentially miscompile.