Case Study 3.1: The Register Aliasing Bug

A program with a subtle bug caused by a 32-bit write zeroing the upper 32 bits, analyzed with GDB


Overview

This case study presents a real-world class of bug: a function that appears to compute a correct result but silently truncates values due to the 32-bit register write rule. We'll diagnose it using GDB, trace through the exact register states, and fix it. This scenario appears regularly in production code when programmers mix 32-bit and 64-bit operations.


The Buggy Program

; aliasing_bug.asm
; Purpose: count occurrences of a byte value in a byte array
; Function: uint64_t count_byte(const char *buf, size_t len, int target)
; Args: rdi=buf, rsi=len, dl=target (byte to search for)
; Returns: rax = count of target bytes in buf[0..len-1]

section .text
    global count_byte, _start

; ====== THE BUGGY VERSION ======
count_byte:
    push rbp
    mov  rbp, rsp

    xor  eax, eax           ; count = 0  (this is actually fine: zeroes rax)
    xor  ecx, ecx           ; i = 0      (this is actually fine: zeroes rcx)

    test rsi, rsi            ; len == 0?
    jz   .done

.loop:
    movzx edx, BYTE [rdi + rcx]  ; load buf[i] -- BUG IS HERE!
    cmp  dl, [rbp - 8]           ; compare with... what?
    jne  .next
    inc  rax                      ; count++

.next:
    inc  ecx                      ; i++  -- BUG 2: 32-bit counter (ECX, not RCX)
    cmp  ecx, esi                 ; compare with len  -- BUG 2: sees only the low 32 bits (ESI)
    jl   .loop                    ; BUG 2: signed test, but len is an unsigned size_t

.done:
    pop  rbp
    ret

_start:
    ; Test: count 'A' in "AAABBBCCCAAA" (6 A's)
    section .data
    test_buf db "AAABBBCCCAAA"
    test_len equ $ - test_buf       ; = 12
    target   db 'A'                 ; = 0x41

    section .text
    lea  rdi, [rel test_buf]
    mov  rsi, test_len
    mov  dl, 'A'
    call count_byte

    ; RAX should contain 6
    ; ... exit with count as exit code for verification
    mov  rdi, rax
    mov  rax, 60
    syscall

Wait — there are multiple bugs in this program. Let's take them one at a time.


Bug Analysis

Bug 1: The Target Byte Was Never Saved

Look at the function prologue:

count_byte:
    push rbp
    mov  rbp, rsp
    xor  eax, eax
    xor  ecx, ecx

The function is supposed to compare each byte of the buffer against the target byte, which arrived in DL. But DL is immediately destroyed on the first iteration by:

movzx edx, BYTE [rdi + rcx]   ; this overwrites EDX (and DL) with buf[i]

The original target in DL is gone. The comparison cmp dl, [rbp-8] is comparing buf[i] against [rbp-8], a stack slot that this function never wrote. The slot lies below RSP and holds whatever was left there, so the result depends on stale memory.

The fix: Save DL to a callee-saved register or to a stack slot before the loop:

count_byte:
    push rbp
    mov  rbp, rsp
    push rbx                    ; rbx is callee-saved

    movzx rbx, dl               ; save target byte in rbx (zero-extended to 64 bits)

    xor  eax, eax
    xor  ecx, ecx
    ; ...
.loop:
    movzx r8d, BYTE [rdi + rcx]  ; buf[i] in r8d (caller-saved scratch; rsi still holds len)
    cmp  r8b, bl                  ; compare buf[i] with target

Bug 2: The 32-bit Loop Counter

    inc  ecx                  ; BUG: increments only 32 bits
    cmp  ecx, esi             ; BUG: compares 32-bit counter with 32-bit of RSI
    jl   .loop

If len (in RSI) is less than 2^31, this works correctly: writing ECX zeroes the upper 32 bits of RCX, so [rdi + rcx] always sees a clean index, and both operands of cmp are non-negative.

But if 2^31 <= len < 2^32, ESI has its top bit set, and jl (a signed test) treats it as a negative number. After the first iteration ECX is 1, which is not less than a negative value, so the loop exits after examining a single byte.

More importantly: if len >= 2^32, cmp ecx, esi sees only the lower 32 bits of len, so the loop behaves as if len were len mod 2^32 and stops far too early.
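The loop tail can be modelled in C to check these cases without allocating gigabytes. This is only a sketch of the three instructions inc ecx / cmp ecx, esi / jl, not the assembly itself; buggy_iterations and check_buggy_iterations are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models the buggy loop tail (inc ecx / cmp ecx, esi / jl .loop) and returns
 * how many bytes the loop would visit for a given 64-bit len.
 * Hypothetical helper, not part of the original program. */
uint64_t buggy_iterations(uint64_t len)
{
    if (len == 0)                      /* test rsi, rsi / jz .done */
        return 0;

    uint32_t ecx = 0;                  /* xor ecx, ecx */
    uint32_t esi = (uint32_t)len;      /* cmp reads only the low 32 bits of rsi */
    uint64_t visited = 0;

    do {
        visited++;                     /* the loop body runs once per pass */
        ecx++;                         /* inc ecx */
        /* jl is a signed test. Converting values above INT32_MAX to int32_t
         * is implementation-defined in C; mainstream compilers wrap, which
         * matches what the hardware does. */
    } while ((int32_t)ecx < (int32_t)esi);

    return visited;
}

/* Self-check of the cases discussed in the text. */
void check_buggy_iterations(void)
{
    assert(buggy_iterations(12) == 12);             /* small len: correct    */
    assert(buggy_iterations(0x80000000ULL) == 1);   /* 2^31: esi is negative */
    assert(buggy_iterations(0x100000000ULL) == 1);  /* 2^32: esi == 0        */
    assert(buggy_iterations(0x100000005ULL) == 5);  /* 2^32 + 5: esi == 5    */
}
```

Calling check_buggy_iterations() from a test program exercises all four cases; none of them loops more than a dozen times, so the check is instant.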

The fix: Use 64-bit registers:

    inc  rcx                  ; 64-bit increment
    cmp  rcx, rsi             ; 64-bit comparison
    jb   .loop                ; unsigned test: len is a size_t

Or better, use pointer arithmetic:

    inc  rdi                  ; advance the pointer
    dec  rsi                  ; decrement the remaining count
    jnz  .loop

Bug 3: The Subtle movzx edx Register Clobber

movzx edx, BYTE [rdi + rcx]   ; loads buf[i] into edx

This writes to EDX, a 32-bit write, which zeroes the upper 32 bits of RDX. RDX is the third argument register; it also receives the high half of a MUL result and supplies the high half of the dividend for DIV. If anything important lived in the upper 32 bits of RDX (unlikely for a byte argument, but possible in more complex code), it would be destroyed.

More practically: this uses EDX/RDX as a scratch register without declaring it as modified. That's acceptable for caller-saved registers, but it should be intentional.
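The register write rule behind this can be written down as a small C model. It is a sketch of the architectural behaviour only (32-bit destinations clear the upper half, 16-bit and 8-bit destinations leave the remaining bits alone); the function names are illustrative and do not come from the original program.

```c
#include <assert.h>
#include <stdint.h>

/* Model of how x86-64 updates a 64-bit register on a narrower write.
 * All names here are hypothetical helpers for illustration. */

/* 32-bit destination (e.g. movzx edx, ...): bits 63..32 are cleared. */
uint64_t write32(uint64_t reg, uint32_t value)
{
    (void)reg;                         /* old contents are discarded entirely */
    return (uint64_t)value;
}

/* 16-bit destination (e.g. mov dx, ...): bits 63..16 are preserved. */
uint64_t write16(uint64_t reg, uint16_t value)
{
    return (reg & ~0xFFFFULL) | value;
}

/* 8-bit low destination (e.g. mov dl, ...): bits 63..8 are preserved. */
uint64_t write8(uint64_t reg, uint8_t value)
{
    return (reg & ~0xFFULL) | value;
}

/* Self-check: start with 'A' (0x41) in DL and junk in the upper half. */
void check_write_rules(void)
{
    const uint64_t rdx = 0xDEADBEEF00000041ULL;
    assert(write32(rdx, 0x42) == 0x0000000000000042ULL);  /* upper half gone */
    assert(write16(rdx, 0x42) == 0xDEADBEEF00000042ULL);  /* upper bits kept */
    assert(write8(rdx, 0x42)  == 0xDEADBEEF00000042ULL);  /* upper bits kept */
}
```

In all three cases the old value of DL is lost, which is why the target byte has to be copied elsewhere before the loop regardless of the operand size used for the load.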


The Corrected Version

; aliasing_bug_fixed.asm
; count_byte: count occurrences of a byte in a buffer
; Args: rdi=buf (const char*), rsi=len (size_t), dl=target (char)
; Returns: rax = count (uint64_t)
; Clobbers: rax, rcx, rdx, rdi (rbx is callee-saved, so it is pushed and restored)

section .text
global count_byte

count_byte:
    push  rbx                   ; callee-saved

    movzx rbx, dl               ; rbx = target byte (zero-extended: 0x00..00XX)
    xor   eax, eax              ; rax = 0 (count), uses 32-bit zero idiom
    xor   ecx, ecx              ; rcx = 0 (loop counter)

    test  rsi, rsi              ; len == 0?
    jz    .done

.loop:
    movzx ecx, BYTE [rdi + rcx] ; ecx = buf[i] as unsigned byte  -- BUG HERE STILL

Wait — that's still wrong. Using rcx as both the loop counter and the scratch register for loading is a mistake. Let me use a cleaner structure:

count_byte_v2:
    push  rbx

    movzx rbx, dl               ; rbx = target byte (zero-extended)
    xor   eax, eax              ; rax = count = 0
    lea   rcx, [rdi + rsi]      ; rcx = buf + len (end pointer)
    ; Now: rdi = current position, rcx = end pointer

    test  rsi, rsi
    jz    .done

.loop:
    movzx edx, BYTE [rdi]       ; edx = *rdi (current byte, zero-extended)
    cmp   rdx, rbx              ; compare with target
    sete  al                    ; al = 1 if equal, 0 otherwise (branchless!)
    movzx eax, al               ; zero-extend al to rax... but this resets count!

The branchless approach needs fixing too. Let me use a clean loop:

count_byte_final:
    push rbx

    movzx rbx, dl               ; rbx = target byte
    xor   eax, eax              ; rax = count = 0

    test  rsi, rsi
    jz    .done

    lea   rcx, [rdi + rsi]      ; rcx = end pointer
                                 ; rdi = current pointer

.loop:
    movzx edx, BYTE [rdi]       ; load current byte (zero-extend to avoid stale bits)
    cmp   dl, bl                ; compare byte against target
    jne   .skip                 ; if not equal, skip count increment
    inc   rax                   ; count++

.skip:
    inc   rdi                   ; advance pointer
    cmp   rdi, rcx              ; reached end?
    jb    .loop                 ; if below (unsigned), continue

.done:
    pop   rbx
    ret

This version:

- Uses pointer comparison (end pointer) instead of index comparison, which avoids the 32-bit counter issue
- Uses jb (unsigned below) for the pointer comparison, which is correct
- Saves the target in RBX (callee-saved)
- Uses RDX as scratch (caller-saved, acceptable)
- Uses a proper 64-bit loop structure throughout
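For checking the assembly against known answers, a C reference with the same contract is handy. The following is a minimal sketch that mirrors the end-pointer loop; count_byte_ref is a hypothetical name, and the function is an oracle for testing rather than a translation the assembler would produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C reference with the same contract as the assembly routine:
 * count bytes equal to (unsigned char)target in buf[0..len-1].
 * count_byte_ref is a hypothetical name used only for this sketch. */
uint64_t count_byte_ref(const char *buf, size_t len, int target)
{
    if (len == 0)                       /* test rsi, rsi / jz .done */
        return 0;
    assert(buf != NULL);

    const unsigned char *p   = (const unsigned char *)buf;
    const unsigned char *end = p + len;                /* lea rcx, [rdi + rsi] */
    const unsigned char  t   = (unsigned char)target;  /* only DL matters */
    uint64_t count = 0;

    while (p != end) {                  /* pointer walk, like cmp rdi, rcx / jb */
        if (*p == t)                    /* cmp dl, bl */
            count++;                    /* inc rax */
        p++;                            /* inc rdi */
    }
    return count;
}
```

With the test buffer from this case study, count_byte_ref("AAABBBCCCAAA", 12, 'A') returns 6 and the same call with 'B' returns 3, which gives two expected values to compare the assembly routine against.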


GDB Session: Catching the Bug

Here is a GDB session catching the original bug in action:

$ nasm -f elf64 aliasing_bug.asm -o aliasing_bug.o
$ ld aliasing_bug.o -o aliasing_bug
$ gdb ./aliasing_bug

(gdb) break *count_byte       ; the * stops at the very first instruction, before the prologue runs
(gdb) run

Breakpoint 1, count_byte () in aliasing_bug
(gdb) info registers rdx rbx rbp rdi rsi
rdx   0x41    65      ← target 'A' is in DL (0x41)
rbx   0x0     0
rbp   0x0     0       ← rbp is 0 because we haven't saved/set it yet
rdi   0x402000 4202496  ← address of test_buf
rsi   0xc     12       ← len = 12

(gdb) stepi            ; push rbp
(gdb) stepi            ; mov rbp, rsp
(gdb) stepi            ; xor eax, eax
(gdb) stepi            ; xor ecx, ecx

(gdb) info registers rdx
rdx   0x41    65       ← DL still has the target value

(gdb) stepi            ; test rsi, rsi
(gdb) stepi            ; jz .done (not taken)
(gdb) stepi            ; movzx edx, BYTE [rdi + rcx]

(gdb) info registers rdx
rdx   0x41    65       ← EDX = 0x41 ('A') -- coincidentally the target!
                       ← But that's because buf[0] IS 'A', not because we saved it

(gdb) nexti            ; cmp dl, [rbp-8]
; This compares 0x41 with whatever is at [rbp-8]
; Let's examine:
(gdb) x/1bx $rbp-8
0x7fffffffe028:    0x41   ← happens to be 0x41 due to memory layout (accident!)

; So the comparison succeeds -- but for the wrong reason
(gdb) stepi            ; jne .next -- NOT taken, falls through to inc rax (happens to work for buf[0])

The first iteration accidentally works because [rbp-8] happens to contain 0x41. That slot lies below RSP (the return address is at [rbp+8] and the saved RBP at [rbp]), so the function never wrote it; the byte is simply left over from earlier activity. This is the most dangerous kind of bug: one that appears to work on some inputs.

Let's check at index 3 (where we expect 'B' which should NOT match 'A'):

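; Fast-forward to i=3 (buf[3] = 'B' = 0x42)
; Breakpoint 1 is at the function entry and will not fire again, so set a
; conditional breakpoint on the load at .loop. The offset +13 assumes NASM's
; default short-jump encoding; confirm it with "disassemble count_byte".
(gdb) break *count_byte+13 if $rcx == 3
(gdb) continue

(gdb) stepi            ; movzx edx, BYTE [rdi + rcx]
(gdb) info registers rdx
rdx   0x42    66       ← 'B' = 0x42 -- correctly loaded

(gdb) stepi            ; cmp dl, [rbp-8]
(gdb) x/1bx $rbp-8
0x7fffffffe028:    0x41   ← still 0x41
; So this comparison: 0x42 vs 0x41 → not equal → won't count 'B' as a match
; This accidentally gives the correct behavior for 'B'!

; But what happens at i=9 (second group of 'A's)?
; The new breakpoint is number 2 in this session, so update its condition:
(gdb) condition 2 $rcx == 9
(gdb) continue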
(gdb) stepi            ; movzx edx, [rdi+9]
(gdb) info registers rdx
rdx   0x41    65       ← 'A' = 0x41 loaded into EDX
                       ← comparison is 0x41 vs [rbp-8] = 0x41 → equal!

So the buggy function may happen to count correctly for 'A' in this specific test buffer — but only because [rbp-8] coincidentally contains 0x41. Change the target byte (use 'B' instead of 'A'), and the function will count wrong. Or run on a different system where the stack layout is slightly different, and the behavior changes.

This is the classic signature of an uninitialized memory bug: works on one machine/one test, fails on another.
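The dependence on the stale slot can be reproduced with a C model of the buggy loop. This sketch ignores the 32-bit counter problem and assumes a small buffer; the parameter stale stands in for whatever byte [rbp-8] holds, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the buggy loop: the target argument is never consulted, and every
 * byte is compared against `stale`, standing in for the byte at [rbp-8].
 * Hypothetical helper for illustration only. */
uint64_t count_byte_buggy_model(const char *buf, size_t len, int target,
                                unsigned char stale)
{
    (void)target;                       /* DL is overwritten before it is used */
    uint64_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char)buf[i] == stale)   /* cmp dl, [rbp - 8] */
            count++;
    }
    return count;
}

/* Self-check using the test buffer from the case study. */
void check_stale_slot(void)
{
    const char buf[] = "AAABBBCCCAAA";
    const size_t len = sizeof buf - 1;  /* 12, excluding the terminating NUL */

    /* Stale slot happens to hold 'A': searching for 'A' looks correct. */
    assert(count_byte_buggy_model(buf, len, 'A', 0x41) == 6);
    /* Same slot, target 'B': still 6, although the right answer is 3. */
    assert(count_byte_buggy_model(buf, len, 'B', 0x41) == 6);
    /* A different stale value (here 0) breaks even the 'A' search. */
    assert(count_byte_buggy_model(buf, len, 'A', 0x00) == 0);
}
```

The result tracks the stale byte and never the requested target, which is exactly the pattern of a test that passes on one machine and fails on another.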


Lessons

  1. Register clobbering is silent. The movzx edx, BYTE [rdi+rcx] instruction overwrote DL (the target register) without any warning. Stepping through in a debugger such as GDB is the most direct way to catch this; the assembler has no type system that could flag it.

  2. Bugs may appear correct on specific inputs. The first test happened to give the right answer because of coincidental stack layout. This is why fuzzing (testing with many random inputs) finds bugs that unit tests miss.

  3. 32-bit loop counters work until they don't. For buffers shorter than 2^31 bytes, the ecx counter with a signed jl produces correct results. The bug only manifests on multi-gigabyte buffers, and almost no one tests with those.

  4. GDB's info registers is your first debug tool. The fastest way to find register clobber bugs is to step through the function and check registers at each step against your predictions.