Case Study 2: The Hidden Bug
Tracing Through AI-Generated Code That Looks Correct but Is Not
The Scenario
It is late on a Thursday evening. Your task management application has been running flawlessly for three weeks. Users create tasks, set priorities, assign due dates, and mark tasks as complete. Everything works. The tests pass. The users are happy.
Then a bug report comes in: "When I try to sort my tasks by due date, some tasks appear in the wrong order, and sometimes tasks without due dates crash the sort entirely."
You open the code the AI generated for the sorting feature. You read through it. It looks... perfectly fine. Every line makes sense. The logic appears sound. The function names are clear. The type hints are present. It even has a docstring.
And yet, it is broken.
Welcome to the detective story of the hidden bug.
The Suspects: The Code
Here is the sorting module your AI assistant generated:
"""Task sorting utilities for the CLI task manager."""

from datetime import datetime
from typing import Optional


class Task:
    """Represents a task in the task manager."""

    def __init__(
        self,
        task_id: int,
        title: str,
        priority: str = "medium",
        due_date: Optional[str] = None,
        completed: bool = False,
    ):
        self.task_id = task_id
        self.title = title
        self.priority = priority
        self.due_date = due_date
        self.completed = completed
        self.created_at = datetime.now().isoformat()

    def __repr__(self) -> str:
        return f"Task({self.task_id}, '{self.title}', due={self.due_date})"


PRIORITY_ORDER = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def parse_date(date_string: Optional[str]) -> Optional[datetime]:
    """Parse a date string into a datetime object.

    Args:
        date_string: A date in YYYY-MM-DD format, or None.

    Returns:
        A datetime object, or None if the input is None.
    """
    if date_string is None:
        return None
    return datetime.strptime(date_string, "%Y-%m-%d")


def sort_by_due_date(tasks: list[Task]) -> list[Task]:
    """Sort tasks by due date, earliest first.

    Tasks without due dates are placed at the end of the list.

    Args:
        tasks: List of tasks to sort.

    Returns:
        A new list of tasks sorted by due date.
    """
    def sort_key(task: Task):
        date = parse_date(task.due_date)
        if date is None:
            return datetime.max
        return date

    return sorted(tasks, key=sort_key)


def sort_by_priority(tasks: list[Task]) -> list[Task]:
    """Sort tasks by priority, highest first.

    Args:
        tasks: List of tasks to sort.

    Returns:
        A new list of tasks sorted by priority (descending).
    """
    return sorted(
        tasks,
        key=lambda t: PRIORITY_ORDER.get(t.priority, 0),
        reverse=True
    )


def sort_by_multiple(
    tasks: list[Task],
    criteria: list[str]
) -> list[Task]:
    """Sort tasks by multiple criteria in order of precedence.

    Supported criteria: 'due_date', 'priority', 'title', 'created_at'.
    Earlier criteria in the list have higher precedence.

    Args:
        tasks: List of tasks to sort.
        criteria: List of sort criteria in order of precedence.

    Returns:
        A new list of tasks sorted by the specified criteria.
    """
    result = list(tasks)  # Make a copy

    # Apply sorts in reverse order (last criterion first)
    # so that the first criterion ends up as the primary sort
    for criterion in reversed(criteria):
        if criterion == "due_date":
            result = sort_by_due_date(result)
        elif criterion == "priority":
            result = sort_by_priority(result)
        elif criterion == "title":
            result = sorted(result, key=lambda t: t.title)
        elif criterion == "created_at":
            result = sorted(result, key=lambda t: t.created_at)

    return result


def get_next_task(tasks: list[Task]) -> Optional[Task]:
    """Get the most urgent incomplete task.

    Priority: first by priority (highest first), then by due date
    (earliest first).

    Args:
        tasks: List of all tasks.

    Returns:
        The most urgent incomplete task, or None if all are completed.
    """
    incomplete = [t for t in tasks if not t.completed]
    if not incomplete:
        return None

    sorted_tasks = sort_by_multiple(incomplete, ["priority", "due_date"])
    return sorted_tasks[0]

Read through this code carefully. Does anything jump out at you?
If nothing does, that is exactly the point. This code looks correct. It has good structure, clear naming, type hints, docstrings, and logical organization. This is why hidden bugs are so dangerous — they hide behind good-looking code.
The Investigation Begins
Let us do what good detectives do: recreate the crime scene. We need concrete test data.
tasks = [
    Task(1, "Write report", "high", "2025-03-15"),
    Task(2, "Buy groceries", "low", "2025-03-10"),
    Task(3, "Team meeting", "medium", "2025-03-12"),
    Task(4, "Fix bug", "critical", "2025-03-10"),
    Task(5, "Read book", "low", None),
    Task(6, "Plan vacation", "medium", None),
]
Test 1: Sort by Due Date
Let us trace sort_by_due_date(tasks).
The sort_key function converts each task's due date:
| Task | due_date | parse_date result | sort_key value |
|---|---|---|---|
| Task 1: "Write report" | "2025-03-15" | 2025-03-15 00:00:00 | 2025-03-15 |
| Task 2: "Buy groceries" | "2025-03-10" | 2025-03-10 00:00:00 | 2025-03-10 |
| Task 3: "Team meeting" | "2025-03-12" | 2025-03-12 00:00:00 | 2025-03-12 |
| Task 4: "Fix bug" | "2025-03-10" | 2025-03-10 00:00:00 | 2025-03-10 |
| Task 5: "Read book" | None | None | datetime.max |
| Task 6: "Plan vacation" | None | None | datetime.max |
Expected order: Tasks 2 and 4 (March 10), Task 3 (March 12), Task 1 (March 15), then Tasks 5 and 6 (no date).
This looks correct. The datetime.max sentinel value pushes tasks without due dates to the end. Sorting by date works.
But wait. The user reported that "sometimes tasks without due dates crash the sort entirely." Let us think about when that would happen.
What if a task's due_date field is not None but is an empty string ""?
task_7 = Task(7, "Mystery task", "medium", "")
Tracing through:
- parse_date("") — date_string is "", which is not None, so we proceed to datetime.strptime("", "%Y-%m-%d").
- This raises ValueError: time data '' does not match format '%Y-%m-%d'.
- Crash.
This is our first hidden bug. The parse_date function only handles None and valid date strings. An empty string, a malformed date like "March 10", or a date in the wrong format like "10/03/2025" will crash the entire sort operation.
Hidden Bug #1: parse_date does not handle empty strings or malformed dates.
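The trace above can be checked directly. The following is a minimal sketch that copies parse_date from the module unchanged and feeds it the inputs just discussed; try_parse is a hypothetical helper introduced here only to report each outcome.

```python
from datetime import datetime
from typing import Optional


def parse_date(date_string: Optional[str]) -> Optional[datetime]:
    """Copy of the original parse_date, unchanged."""
    if date_string is None:
        return None
    return datetime.strptime(date_string, "%Y-%m-%d")


def try_parse(value: Optional[str]) -> str:
    """Hypothetical helper: describe what parse_date does with one input."""
    try:
        return f"ok: {parse_date(value)!r}"
    except ValueError as exc:
        return f"ValueError: {exc}"


# None and a well-formed date succeed; the other three raise ValueError.
for sample in [None, "2025-03-10", "", "March 10", "10/03/2025"]:
    print(repr(sample), "->", try_parse(sample))
```

Because sorted() calls the key function once per task, a single bad due_date anywhere in the list is enough to abort the whole sort.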
Digging Deeper
Now let us examine sort_by_multiple. The function's docstring says: "Earlier criteria in the list have higher precedence." The implementation reverses the criteria and applies sorts sequentially, relying on Python's stable sort to preserve the ordering of previous sorts.
Let us trace sort_by_multiple(tasks, ["priority", "due_date"]).
This means: sort primarily by priority (highest first), and for tasks with the same priority, sort by due date (earliest first).
The function reverses the criteria, so it first sorts by "due_date", then by "priority".
Step 1: Sort by due_date
After sort_by_due_date, the order is:
1. Task 2: "Buy groceries" (low, 2025-03-10)
2. Task 4: "Fix bug" (critical, 2025-03-10)
3. Task 3: "Team meeting" (medium, 2025-03-12)
4. Task 1: "Write report" (high, 2025-03-15)
5. Task 5: "Read book" (low, None)
6. Task 6: "Plan vacation" (medium, None)
Step 2: Sort by priority (descending)
After sort_by_priority with reverse=True:
1. Task 4: "Fix bug" (critical, 2025-03-10) - priority 4
2. Task 1: "Write report" (high, 2025-03-15) - priority 3
3. Task 3: "Team meeting" (medium, 2025-03-12) - priority 2
4. Task 6: "Plan vacation" (medium, None) - priority 2
5. Task 2: "Buy groceries" (low, 2025-03-10) - priority 1
6. Task 5: "Read book" (low, None) - priority 1
Let us verify: within the same priority, is the due date ordering preserved?
- Medium priority: Task 3 (March 12) before Task 6 (None). Correct.
- Low priority: Task 2 (March 10) before Task 5 (None). Correct.
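The two-pass technique traced above depends on sorted() being stable. As a rough sketch, the same six tasks can be modeled as plain tuples (an assumption made here for brevity, not how the module stores them) to confirm that the two passes agree with a single sort on a compound key.

```python
from datetime import datetime

# (task_id, priority_rank, due) stand-ins for the six sample tasks.
# datetime.max plays the role of "no due date", as in sort_by_due_date.
NO_DATE = datetime.max
rows = [
    (1, 3, datetime(2025, 3, 15)),
    (2, 1, datetime(2025, 3, 10)),
    (3, 2, datetime(2025, 3, 12)),
    (4, 4, datetime(2025, 3, 10)),
    (5, 1, NO_DATE),
    (6, 2, NO_DATE),
]

# Pass 1: secondary criterion (due date, ascending).
by_due = sorted(rows, key=lambda r: r[2])

# Pass 2: primary criterion (priority, descending). Because sorted() is
# stable, rows with equal priority keep their due-date order from pass 1.
two_pass = sorted(by_due, key=lambda r: r[1], reverse=True)

# Single pass with a compound key: negate the rank so higher priority
# sorts first, then break ties on due date.
one_pass = sorted(rows, key=lambda r: (-r[1], r[2]))

print([r[0] for r in two_pass])  # [4, 1, 3, 6, 2, 5]
print(two_pass == one_pass)      # True
```

The compound key is often easier to review than a chain of passes, because the precedence of the criteria is visible in one expression.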
This appears to work. But one sample is not proof, so let us look at the caller that depends on it: get_next_task, which passes the same criteria, ["priority", "due_date"].
The get_next_task function filters incomplete tasks and then calls sort_by_multiple(incomplete, ["priority", "due_date"]), returning the first element. The expected behavior is: "give me the highest priority task, and if there is a tie, the one due soonest."
Let us test with tasks that expose the problem:
tasks = [
    Task(1, "Urgent report", "high", "2025-03-20"),
    Task(2, "Quick fix", "high", "2025-03-10"),
    Task(3, "Critical deploy", "critical", "2025-03-15"),
]
Expected "next task": Task 3 ("Critical deploy") because it has the highest priority (critical).
Let us trace sort_by_multiple(tasks, ["priority", "due_date"]):
Step 1: Sort by due_date (reversed criteria means due_date is applied first)
- Task 2: "Quick fix" (high, 2025-03-10)
- Task 3: "Critical deploy" (critical, 2025-03-15)
- Task 1: "Urgent report" (high, 2025-03-20)
Step 2: Sort by priority (descending)
- Task 3: "Critical deploy" (critical) - priority 4
- Task 2: "Quick fix" (high) - priority 3
- Task 1: "Urgent report" (high) - priority 3
sorted_tasks[0] = Task 3. Correct!
But now consider this data:
tasks = [
    Task(1, "Urgent report", "high", "2025-03-20"),
    Task(2, "Quick fix", "high", "2025-03-10"),
]
After sort_by_due_date: Task 2 (March 10), Task 1 (March 20).
After sort_by_priority: Both are "high" (priority 3), so Python's stable sort preserves the due_date order.
Result: Task 2 first, then Task 1. This is correct — same priority, earlier due date comes first.
Everything seems fine... so where is the second hidden bug?
The Subtle Trap
Let us look at the created_at field:
self.created_at = datetime.now().isoformat()
And then in sort_by_multiple:
elif criterion == "created_at":
    result = sorted(result, key=lambda t: t.created_at)
This sorts by created_at as a string, not a datetime. Because isoformat() produces strings like "2025-03-10T14:30:00.123456", string comparison happens to produce the correct chronological order (ISO 8601 format is designed for this). So this works.
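But what if tasks are created in different time zones and the isoformat() output includes UTC offsets? String comparison looks at the local date and time characters first and only then at the offset characters, so it never converts the values to a common instant. For example, "2025-03-10T23:00:00-05:00" sorts before "2025-03-11T08:00:00+09:00" as a string, even though the second timestamp (23:00 UTC on March 10) is five hours earlier than the first (04:00 UTC on March 11). This is a latent bug that will not manifest until timezone-aware datetimes are introduced.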
Hidden Bug #2: String-based datetime comparison is brittle and will break with timezone-aware datetimes.
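A small sketch makes the difference concrete. The two fixed offsets below are arbitrary choices for illustration; the point is only that sorting the ISO strings and sorting the instants they represent can disagree.

```python
from datetime import datetime, timedelta, timezone

# Two fixed UTC offsets, chosen arbitrarily for this illustration.
utc_minus_5 = timezone(timedelta(hours=-5))
utc_plus_9 = timezone(timedelta(hours=9))

a = datetime(2025, 3, 10, 23, 0, tzinfo=utc_minus_5)  # 2025-03-11 04:00 UTC
b = datetime(2025, 3, 11, 8, 0, tzinfo=utc_plus_9)    # 2025-03-10 23:00 UTC

stamps = [a.isoformat(), b.isoformat()]

# String order follows the local date characters: a comes first.
as_strings = sorted(stamps)

# Parsing back to aware datetimes compares real instants: b comes first.
as_datetimes = sorted(stamps, key=datetime.fromisoformat)

print(as_strings)
print(as_datetimes)
```

Parsing with datetime.fromisoformat before comparing is one way to remove the dependency on string layout, at the cost of a parse per comparison key.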
The Third Discovery
Now let us look at the sort_by_priority function more carefully:
def sort_by_priority(tasks: list[Task]) -> list[Task]:
    return sorted(
        tasks,
        key=lambda t: PRIORITY_ORDER.get(t.priority, 0),
        reverse=True
    )
What happens if a task has a priority value not in PRIORITY_ORDER? For example, priority = "urgent" or priority = "HIGH" (uppercase)?
PRIORITY_ORDER.get("urgent", 0) # Returns 0
PRIORITY_ORDER.get("HIGH", 0) # Returns 0 — case sensitive!
A task with an unrecognized priority gets a value of 0, placing it below even "low" priority tasks (which have value 1). A task marked as "HIGH" (uppercase) would therefore rank below "low". The failure is silent: the ordering is plainly wrong, yet no error or warning is raised.
Hidden Bug #3: Priority sorting silently mishandles unrecognized or case-variant priority values.
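To see the effect on an actual ordering, here is a short sketch that uses (title, priority) pairs in place of Task objects. The titles are invented for illustration; the key function is the same lookup used by sort_by_priority.

```python
PRIORITY_ORDER = {"low": 1, "medium": 2, "high": 3, "critical": 4}

# (title, priority) pairs standing in for Task objects.
items = [
    ("Pay rent", "HIGH"),         # uppercase, so the lookup misses
    ("Water plants", "low"),
    ("Ship release", "urgent"),   # not a known priority
]

ranked = sorted(
    items,
    key=lambda item: PRIORITY_ORDER.get(item[1], 0),
    reverse=True,
)

# The "low" task ends up ahead of both "HIGH" and "urgent".
print([title for title, _ in ranked])
# ['Water plants', 'Pay rent', 'Ship release']
```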
The Fourth Discovery
Let us go back to sort_by_due_date and examine the datetime.max sentinel:
def sort_key(task: Task):
    date = parse_date(task.due_date)
    if date is None:
        return datetime.max
    return date
datetime.max is datetime(9999, 12, 31, 23, 59, 59, 999999). This is a naive datetime (no timezone info). If parse_date ever returns a timezone-aware datetime, comparing a naive datetime with an aware datetime will raise a TypeError in Python 3. This is the same timezone fragility we noted with created_at, but here it would cause an immediate crash rather than just wrong ordering.
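The naive-versus-aware failure is easy to reproduce in isolation. This sketch assumes a hypothetical future in which a due date carries timezone information, and sorts it alongside the naive sentinel.

```python
from datetime import datetime, timezone

naive_sentinel = datetime.max                            # tzinfo is None
aware_due = datetime(2025, 3, 10, tzinfo=timezone.utc)   # hypothetical aware due date

try:
    sorted([naive_sentinel, aware_due])
    outcome = "sorted without error"
except TypeError as exc:
    outcome = f"TypeError: {exc}"

print(outcome)
```

The sort fails as soon as one task has no due date and another has an aware one, which is exactly the mix the sentinel was meant to handle.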
But there is a more immediate issue. Look at parse_date:
return datetime.strptime(date_string, "%Y-%m-%d")
This accepts "2025-03-10" and, because strptime treats leading zeros as optional, also "2025-3-10". It rejects every other layout, such as "2025/03/10" or "03/10/2025". While most systems produce ISO-style dates, user-entered dates might not follow that layout. More importantly, what if the due_date was stored in a different format in an earlier version of the application?
Hidden Bug #4: parse_date is inflexible about date format and does not handle variations gracefully.
The Resolution
After our detective work, we have found four hidden bugs:
| Bug | Description | Trigger Condition | Severity |
|---|---|---|---|
| #1 | parse_date crashes on empty strings and malformed dates | Empty or non-standard date string | High |
| #2 | String-based created_at comparison breaks with timezones | Timezone-aware datetimes | Medium |
| #3 | Priority sort silently mishandles unrecognized priorities | Typo or case mismatch in priority | Medium |
| #4 | parse_date only accepts strict %Y-%m-%d format | User-entered dates in any other format | Low |
Here is the corrected code for the most critical functions:
import logging


def parse_date(date_string: Optional[str]) -> Optional[datetime]:
    """Parse a date string into a datetime object.

    Handles None, empty strings, and multiple date formats gracefully.

    Args:
        date_string: A date string, None, or empty string.

    Returns:
        A datetime object, or None if the input cannot be parsed.
    """
    if not date_string or not date_string.strip():
        return None

    date_string = date_string.strip()

    # Formats are tried in order. Slash-separated input with the year last
    # is read as month/day/year, so day/month/year input can be misread.
    formats = ["%Y-%m-%d", "%Y/%m/%d", "%m/%d/%Y", "%d-%m-%Y"]
    for fmt in formats:
        try:
            return datetime.strptime(date_string, fmt)
        except ValueError:
            continue

    # Log a warning instead of crashing
    logging.warning("Could not parse date: '%s'", date_string)
    return None


def sort_by_priority(tasks: list[Task]) -> list[Task]:
    """Sort tasks by priority, highest first.

    Unrecognized priorities are treated as lowest priority, and a
    warning is issued.
    """
    def priority_key(task: Task) -> int:
        # Normalize case and whitespace before the lookup
        normalized = task.priority.lower().strip()
        value = PRIORITY_ORDER.get(normalized)
        if value is None:
            logging.warning(
                "Unrecognized priority '%s' for task '%s'. "
                "Treating as lowest priority.",
                task.priority, task.title
            )
            return 0
        return value

    return sorted(tasks, key=priority_key, reverse=True)
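The corrected functions above address Bugs #1, #3, and #4. For the timezone fragility behind Bug #2 and the datetime.max sentinel, one possible direction is sketched below. The names due_date_key and created_at_key are hypothetical and not part of the original module, and the sketch assumes that due dates have already been parsed into datetime objects (or None).

```python
from datetime import datetime, timezone
from typing import Optional


def due_date_key(due: Optional[datetime]) -> tuple[bool, datetime]:
    """Hypothetical sort key: dated tasks first, undated tasks last.

    The boolean decides between dated and undated entries, so the
    placeholder datetime.min is only ever compared with itself.
    """
    if due is None:
        return (True, datetime.min)
    return (False, due)


def created_at_key(created_at: str) -> datetime:
    """Hypothetical sort key: compare created_at as a datetime, not a string."""
    return datetime.fromisoformat(created_at)


# Aware due dates mixed with a missing one: no naive sentinel is
# compared against an aware datetime, so no TypeError is raised.
dues = [
    None,
    datetime(2025, 3, 15, tzinfo=timezone.utc),
    datetime(2025, 3, 10, tzinfo=timezone.utc),
]
ordered = sorted(dues, key=due_date_key)
print(ordered)  # March 10, then March 15, then None
```

This still raises TypeError if naive and aware due dates are mixed in the same list, so it removes the sentinel problem but not the need to pick one convention for stored dates.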
The Moral of the Story
This case study illustrates why code review skills are essential for vibe coders. The original code:
- Passed a casual reading with flying colors
- Had good structure, naming, type hints, and docstrings
- Worked correctly for the "happy path" test cases
- Contained four bugs that only surfaced under specific conditions
The bugs were found through systematic tracing with edge cases — the techniques from section 7.3. Specifically:
- Testing with empty and malformed inputs revealed Bug #1.
- Thinking about future changes (timezone support) revealed Bug #2.
- Testing with unexpected values (case variations, typos) revealed Bug #3.
- Considering real-world user behavior (non-standard date formats) revealed Bug #4.
None of these bugs would have been found by simply reading the code and nodding along. They required actively questioning the code: "What if this input is not what you expect?" This skeptical mindset — trusting but verifying — is the core of effective code review.
The next time you receive AI-generated code that "looks fine," remember the hidden bug. Take the time to trace through it with edge cases. The bugs you find in review are the bugs your users never encounter.