Case Study 3.1: The Register Aliasing Bug
A program with subtle bugs caused by register aliasing and by 32-bit writes that zero the upper 32 bits, analyzed with GDB
Overview
This case study presents a real-world class of bug: a function that appears to compute a correct result but silently misbehaves, because it overwrites its own argument register and runs its loop on 32-bit registers, whose writes zero the upper 32 bits. We'll diagnose it using GDB, trace through the exact register states, and fix it. Bugs of this kind tend to appear when programmers mix 32-bit and 64-bit operations.
The Buggy Program
; aliasing_bug.asm
; Purpose: count occurrences of a byte value in a byte array
; Function: uint64_t count_byte(const char *buf, size_t len, int target)
; Args: rdi=buf, rsi=len, dl=target (byte to search for)
; Returns: rax = count of target bytes in buf[0..len-1]
section .text
global count_byte, _start
; ====== THE BUGGY VERSION ======
count_byte:
push rbp
mov rbp, rsp
xor eax, eax ; count = 0 (this is actually fine: zeroes rax)
xor ecx, ecx ; i = 0 (this is actually fine: zeroes rcx)
test rsi, rsi ; len == 0?
jz .done
.loop:
movzx edx, BYTE [rdi + rcx] ; load buf[i] -- BUG IS HERE!
cmp dl, [rbp - 8] ; compare with... what?
jne .next
inc rax ; count++
.next:
inc ecx ; i++ -- BUG 2: 32-bit increment (ECX, not RCX)
cmp ecx, esi ; compares only the low 32 bits of len (ESI)
jl .loop ; signed compare: wrong once len >= 2^31
.done:
pop rbp
ret
_start:
; Test: count 'A' in "AAABBBCCCAAA" (6 A's)
section .data
test_buf db "AAABBBCCCAAA"
test_len equ $ - test_buf ; = 12
target db 'A' ; = 0x41
section .text
lea rdi, [rel test_buf]
mov rsi, test_len
mov dl, 'A'
call count_byte
; RAX should contain 6
; ... exit with count as exit code for verification
mov rdi, rax
mov rax, 60
syscall
Wait — there are multiple bugs in this program. Let's take them one at a time.
Bug Analysis
Bug 1: The Target Byte Was Never Saved
Look at the function prologue:
count_byte:
push rbp
mov rbp, rsp
xor eax, eax
xor ecx, ecx
The function is supposed to compare each byte of the buffer against the target byte, which arrived in DL. But DL is immediately destroyed on the first iteration by:
movzx edx, BYTE [rdi + rcx] ; this overwrites EDX (and DL) with buf[i]
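The original target in DL is gone. The comparison cmp dl, [rbp-8] compares buf[i] against [rbp-8], a stack slot that was never written. After mov rbp, rsp, RSP equals RBP, so [rbp-8] lies below the stack pointer: it holds whatever stale value was last stored there, and the result depends on memory the function never controls.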
The fix: Save DL to a callee-saved register or to a stack slot before the loop:
count_byte:
push rbp
mov rbp, rsp
push rbx ; rbx is callee-saved
movzx rbx, dl ; save target byte in rbx (zero-extended to 64 bits)
xor eax, eax
xor ecx, ecx
; ...
.loop:
movzx r8d, BYTE [rdi + rcx] ; buf[i] in r8d (not esi: rsi still holds len)
cmp r8b, bl ; compare buf[i] with target
Bug 2: The 32-bit Loop Counter
inc ecx ; BUG: increments only 32 bits
cmp ecx, esi ; BUG: compares 32-bit counter with 32-bit of RSI
jl .loop
This works only while len (in RSI) is at most 2,147,483,647 (INT32_MAX). Writing ECX zeroes the upper 32 bits of RCX, so the counter itself stays consistent, but jl after cmp ecx, esi is a signed 32-bit comparison.
If INT32_MAX < len < 2^32, ESI has its top bit set and jl treats it as negative. After the first iteration ECX = 1, which is not less than a negative number, so the loop exits having examined only buf[0]. (It does not loop forever: the counter can never climb past a negative bound.)
If len >= 2^32, ESI sees only the low 32 bits of len, so the loop stops early: for len = 2^32 + 5 it scans 5 bytes, and for len = 2^32 (low half 0) it scans just 1.
The fix: Use 64-bit registers, and an unsigned branch because len is a size_t:
inc rcx ; 64-bit increment
cmp rcx, rsi ; 64-bit comparison
jb .loop ; unsigned "below"
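Or better, walk a pointer and count the remaining length down to zero; the load then reads BYTE [rdi] instead of BYTE [rdi + rcx]:
inc rdi ; advance the pointer
dec rsi ; decrement the remaining count (all 64 bits)
jnz .loop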
Bug 3: The Subtle movzx edx Register Clobber
movzx edx, BYTE [rdi + rcx] ; loads buf[i] into edx
This is a 32-bit write to EDX, so it replaces all of RDX: the low byte DL, which held the target, is overwritten with buf[i], and the upper 32 bits are zeroed. RDX is the third argument register and also the implicit high half of MUL/DIV results. Because RDX is caller-saved under the System V AMD64 ABI, no caller may expect its contents to survive the call, so zeroing the upper half is harmless here; the real damage is to the function's own argument.
More practically: using EDX/RDX as scratch is acceptable for a caller-saved register, but it should be a deliberate choice, made only after the argument it carries has been copied somewhere else.
The Corrected Version
; aliasing_bug_fixed.asm
; count_byte: count occurrences of a byte in a buffer
; Args: rdi=buf (const char*), rsi=len (size_t), dl=target (char)
; Returns: rax = count (uint64_t)
; Clobbers: rax, rcx, rdx, rdi (callee-saved rbx is saved and restored)
section .text
global count_byte
count_byte:
push rbx ; callee-saved
movzx rbx, dl ; rbx = target byte (zero-extended: 0x00..00XX)
xor eax, eax ; rax = 0 (count), uses 32-bit zero idiom
xor ecx, ecx ; rcx = 0 (loop counter)
test rsi, rsi ; len == 0?
jz .done
.loop:
movzx ecx, BYTE [rdi + rcx] ; ecx = buf[i] as unsigned byte -- BUG HERE STILL
Wait: that's still wrong. The movzx overwrites RCX, the loop counter, with buf[i] on every iteration; one register cannot be both the counter and the load destination. Let me use a cleaner structure with an end pointer:
count_byte_v2:
push rbx
movzx rbx, dl ; rbx = target byte (zero-extended)
xor eax, eax ; rax = count = 0
lea rcx, [rdi + rsi] ; rcx = buf + len (end pointer)
; Now: rdi = current position, rcx = end pointer
test rsi, rsi
jz .done
.loop:
movzx edx, BYTE [rdi] ; edx = *rdi (current byte, zero-extended)
cmp rdx, rbx ; compare with target
sete al ; al = 1 if equal, 0 otherwise (branchless!)
movzx eax, al ; zero-extend al to rax... but this resets count!
The branchless approach needs fixing too. Let me use a clean loop:
count_byte_final:
push rbx
movzx rbx, dl ; rbx = target byte
xor eax, eax ; rax = count = 0
test rsi, rsi
jz .done
lea rcx, [rdi + rsi] ; rcx = end pointer
; rdi = current pointer
.loop:
movzx edx, BYTE [rdi] ; load current byte (zero-extend to avoid stale bits)
cmp dl, bl ; compare byte against target
jne .skip ; if not equal, skip count increment
inc rax ; count++
.skip:
inc rdi ; advance pointer
cmp rdi, rcx ; reached end?
jb .loop ; if below (unsigned), continue
.done:
pop rbx
ret
This version:
- Uses pointer comparison (end pointer) instead of index comparison — avoids the 32-bit counter issue
- Uses jb (unsigned below) for the pointer comparison, which is correct
- Saves the target in RBX (callee-saved)
- Uses RDX as scratch (caller-saved, acceptable)
- Uses a proper 64-bit loop structure throughout
GDB Session: Catching the Bug
Here is a GDB session catching the original bug in action:
$ nasm -f elf64 aliasing_bug.asm -o aliasing_bug.o
$ ld aliasing_bug.o -o aliasing_bug
$ gdb ./aliasing_bug
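(gdb) break *count_byte ; '*' stops at the first instruction; a plain "break count_byte" may stop after the push rbp / mov rbp, rsp prologue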
(gdb) run
Breakpoint 1, count_byte () in aliasing_bug
(gdb) info registers rdx rbx rbp rdi rsi
rdx 0x41 65 ← target 'A' is in DL (0x41)
rbx 0x0 0
rbp 0x0 0 ← still 0: nothing in _start set it, and count_byte's prologue hasn't run yet
rdi 0x402000 4202496 ← address of test_buf
rsi 0xc 12 ← len = 12
(gdb) stepi ; push rbp
(gdb) stepi ; mov rbp, rsp
(gdb) stepi ; xor eax, eax
(gdb) stepi ; xor ecx, ecx
(gdb) info registers rdx
rdx 0x41 65 ← DL still has the target value
(gdb) stepi ; test rsi, rsi
(gdb) stepi ; jz .done (not taken)
(gdb) stepi ; movzx edx, BYTE [rdi + rcx]
(gdb) info registers rdx
rdx 0x41 65 ← EDX = 0x41 ('A') -- coincidentally the target!
← But that's because buf[0] IS 'A', not because we saved it
(gdb) nexti ; cmp dl, [rbp-8]
; This compares 0x41 with whatever is at [rbp-8]
; Let's examine:
(gdb) x/1bx $rbp-8
0x7fffffffe028: 0x41 ← happens to be 0x41 due to memory layout (accident!)
; So the comparison succeeds -- but for the wrong reason
(gdb) stepi ; jne .next -- NOT taken (0x41 == 0x41), so inc rax counts buf[0]
The first iteration accidentally works because [rbp-8] happens to contain 0x41. That slot lies below RSP and count_byte never writes it (the return address is at [rbp+8], not [rbp-8]), so it simply holds whatever stale byte was last stored there. This is the most dangerous kind of bug: one that appears to work on some inputs.
Let's check at index 3 (where we expect 'B' which should NOT match 'A'):
; Fast-forward to i=3 (buf[3] = 'B' = 0x42). Breakpoint 1 sits at the
; function entry and fires only once, so add a conditional breakpoint on the
; loop head (NASM emits the local label .loop as the symbol count_byte.loop):
(gdb) break *'count_byte.loop' if $rcx == 3
(gdb) continue
(gdb) stepi ; movzx edx, BYTE [rdi + rcx]
(gdb) info registers rdx
rdx 0x42 66 ← 'B' = 0x42 -- correctly loaded
(gdb) stepi ; cmp dl, [rbp-8]
(gdb) x/1bx $rbp-8
0x7fffffffe028: 0x41 ← still 0x41
; So this comparison: 0x42 vs 0x41 → not equal → won't count 'B' as a match
; This accidentally gives the correct behavior for 'B'!
; But what happens at i=9 (second group of 'A's)?
(gdb) condition 2 $rcx == 9
(gdb) continue
(gdb) stepi ; movzx edx, BYTE [rdi + rcx] (rcx = 9)
(gdb) info registers rdx
rdx 0x41 65 ← 'A' = 0x41 loaded into EDX
← comparison is 0x41 vs [rbp-8] = 0x41 → equal!
So the buggy function may count 'A' correctly in this specific test buffer, but only because [rbp-8] coincidentally contains 0x41. The target argument is never consulted at all: call it with 'B' instead of 'A' and, as long as [rbp-8] still holds 0x41, it reports the number of 'A' bytes (6, not 3). Run it on a system where that stack slot holds a different stale byte, and every answer changes.
This is the classic signature of an uninitialized memory bug: works on one machine/one test, fails on another.
Lessons
- Register clobbering is silent. The movzx edx, BYTE [rdi+rcx] instruction overwrote DL (the target register) without any warning. The assembler has no type system that could flag it, so stepping through the code in a debugger such as GDB is the most direct way to catch it.
- Bugs may appear correct on specific inputs. The first test happened to give the right answer because of coincidental stack layout. Varying the inputs, including the target byte, would have exposed it; this is why fuzzing (testing with many random inputs) finds bugs that unit tests miss.
- 32-bit loop counters work until they don't. With inc ecx / cmp ecx, esi / jl, the loop is correct only for lengths up to INT32_MAX (about 2 GiB); longer buffers make it stop early. The bug only manifests on very large buffers, and few test suites use multi-gigabyte inputs.
- GDB's info registers is your first debug tool. The fastest way to find register clobber bugs is to step through the function and check registers at each step against your predictions.