Case Study: Raj's Phantom Function — Diagnosing an AI Hallucination in Production
The Setup
Raj was building a new feature for Clearpath Financial's payment processing API: an endpoint that would allow partner companies to query transaction history with advanced filtering options. The feature had been scoped, designed, and was now in implementation.
The implementation required integrating with their existing transaction database layer, which used a Python ORM (SQLAlchemy) along with a custom query builder library that Clearpath had developed internally. Raj knew the internal library well but was less familiar with a specific advanced filtering capability he needed — one that would allow partners to query across multiple nested relationship types simultaneously.
He asked his AI coding assistant:
"Using SQLAlchemy, I need to implement a query that joins the transactions table with payment_methods and merchant_categories, filters by date range and minimum amount, and returns paginated results. The filtering should support nested conditions (e.g., 'transactions in merchant category X OR (amount > Y AND merchant is Z)'). What's the best pattern to implement this?"
The AI provided a detailed implementation:
from sqlalchemy import and_, or_
from sqlalchemy.orm import Session
from clearpath_query import AdvancedFilter, QueryBuilder

def get_filtered_transactions(
    session: Session,
    filters: dict,
    page: int = 1,
    per_page: int = 100
) -> dict:
    builder = QueryBuilder(session, Transaction)

    # Apply nested conditions using AdvancedFilter
    if 'conditions' in filters:
        advanced_filter = AdvancedFilter.from_dict(filters['conditions'])
        builder.apply_advanced_filter(advanced_filter)

    # Apply date range
    if 'date_range' in filters:
        builder.date_filter(
            filters['date_range']['start'],
            filters['date_range']['end']
        )

    # Pagination
    results = builder.paginate(page=page, per_page=per_page)
    return results.to_dict()
The code looked excellent. It used the internal clearpath_query library, it handled the nested conditions via AdvancedFilter.from_dict(), and it was clean, well-structured, and consistent with Clearpath's code style.
Raj reviewed it, added it to his implementation file, wrote unit tests that mocked the QueryBuilder, and committed the code.
The unit tests passed. The code review passed. The feature shipped to staging.
The Failure
Two days after deploying to staging, Raj received a Slack message from the QA engineer:
"The transaction history endpoint is crashing on all requests with a nested filter. AttributeError: module 'clearpath_query' has no attribute 'AdvancedFilter'."
Raj opened the clearpath_query documentation. He searched the codebase. He checked the library's changelog.
AdvancedFilter did not exist.
The QueryBuilder class existed. The date_filter method existed. But AdvancedFilter — the class the AI had used to handle nested conditions — had never been implemented in their internal library.
The AI had generated code that used a class that did not exist, in an internal library the model had presumably seen referenced in code snippets but had no complete knowledge of. The code was syntactically correct, stylistically consistent, and completely plausible — and it called a function that had never been written.
The Diagnostic Process
When Raj examined the failure, he applied the diagnostic framework:
Specific failure: AttributeError: module 'clearpath_query' has no attribute 'AdvancedFilter'. The code references a class that does not exist anywhere in the actual library module.
Root cause: Hallucination (Root Cause 5). The model generated a plausible API surface for the clearpath_query library — inventing a class (AdvancedFilter) and method chain that fit the library's naming conventions and usage patterns, but that were never actually implemented.
Why it was hard to catch:
1. The code compiled without errors (Python does not check for attribute existence at compile time)
2. The unit tests mocked the QueryBuilder object, so AdvancedFilter was never instantiated during testing
3. The generated code was consistent with Clearpath's actual code style and library conventions — it looked like something that should exist
4. The code review focused on logic and security, not on verifying that every API call corresponded to real methods
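The first point is easy to demonstrate: Python's byte-compiler never checks whether an attribute exists on a module, so code referencing a phantom name compiles cleanly and only fails when the offending line actually executes. A minimal illustration (using the stdlib math module as a stand-in):

```python
# Python compiles a reference to a nonexistent attribute without complaint;
# the AttributeError only appears at runtime, when the line executes.
src = "import math\nmath.no_such_function()"

code = compile(src, "<example>", "exec")  # succeeds: no attribute checking here

try:
    exec(code)
except AttributeError as err:
    print(err)  # e.g. module 'math' has no attribute 'no_such_function'
```

This is exactly why the staging crash, not the build, was the first signal.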
The Hallucination Pattern
After identifying the failure, Raj investigated why this happened and documented the pattern for his team:
The internal library hallucination problem: When an AI model is given context about a custom internal library (method names, usage patterns from examples), it has enough information to generate plausible-looking code that follows the library's conventions — but it does not have the complete library definition. It fills in gaps by generating methods and classes that should exist given the library's structure and naming conventions.
In this case, the AI had likely seen examples of QueryBuilder used in Raj's codebase (provided as context) and inferred that a library called clearpath_query with a sophisticated QueryBuilder would also have an AdvancedFilter class. This was a plausible inference. It was wrong.
The unit test coverage gap: The tests mocked the QueryBuilder, which meant the fake AdvancedFilter was never actually instantiated. This is a common pattern when testing against abstracted interfaces — the mock bypasses the actual API surface. The failure mode was invisible in testing and only manifested when real objects were used in staging.
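The mocking gap can be reproduced in a few lines: a MagicMock manufactures any attribute on demand, so a test that mocks the library object will happily exercise a class the real module never defined. (The stand-in module below is illustrative, since clearpath_query itself is internal.)

```python
import types
from unittest.mock import MagicMock

# Stand-in for the internal library: it defines QueryBuilder but,
# like the real clearpath_query, no AdvancedFilter.
clearpath_query = types.ModuleType("clearpath_query")
clearpath_query.QueryBuilder = type("QueryBuilder", (), {})

# A MagicMock invents attributes on demand, so the phantom class "works":
mocked = MagicMock()
phantom = mocked.AdvancedFilter.from_dict({"op": "or"})  # no error raised

# The real module exposes the truth:
assert hasattr(clearpath_query, "QueryBuilder")
assert not hasattr(clearpath_query, "AdvancedFilter")
```

This is why the failure could only surface once real objects replaced the mocks.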
The Repair and the Protocol
Immediate repair: Raj replaced the AdvancedFilter.from_dict() approach with a direct SQLAlchemy and_/or_ construction, which he knew definitively existed:

from sqlalchemy import and_, or_
from sqlalchemy.orm import Session
from clearpath_query import QueryBuilder

def get_filtered_transactions(
    session: Session,
    filters: dict,
    page: int = 1,
    per_page: int = 100
) -> dict:
    builder = QueryBuilder(session, Transaction)

    # Build nested conditions using SQLAlchemy directly
    if 'conditions' in filters:
        conditions = _build_conditions(filters['conditions'])
        builder.apply_filter(conditions)

    if 'date_range' in filters:
        builder.date_filter(
            filters['date_range']['start'],
            filters['date_range']['end']
        )

    results = builder.paginate(page=page, per_page=per_page)
    return results.to_dict()

def _build_conditions(conditions_dict: dict):
    """Build a SQLAlchemy condition from a nested dict structure."""
    # [Implementation using real SQLAlchemy API]
    pass
The repair took 45 minutes including documentation updates.
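One possible shape for the elided helper is a recursive walk that delegates combination to verified primitives. This is a sketch, not Clearpath's actual implementation: the condition-dict schema and the column_of/and_/or_ parameters are assumptions. Passing the combinators in as arguments lets the same recursion run against sqlalchemy.and_/sqlalchemy.or_ in production and against plain Python stand-ins in a test.

```python
from typing import Any, Callable

def build_conditions(node: dict,
                     column_of: Callable[[str], Any],
                     and_: Callable[..., Any],
                     or_: Callable[..., Any]) -> Any:
    """Recursively translate a nested condition dict into a filter expression.

    In real use, and_/or_ would be sqlalchemy.and_/sqlalchemy.or_ and
    column_of would map a field name to a model column, e.g.
    lambda name: getattr(Transaction, name). The dict schema here
    ({"and": [...]}, {"or": [...]}, {"field", "op", "value"}) is assumed.
    """
    if "and" in node:
        return and_(*(build_conditions(c, column_of, and_, or_) for c in node["and"]))
    if "or" in node:
        return or_(*(build_conditions(c, column_of, and_, or_) for c in node["or"]))
    column, value = column_of(node["field"]), node["value"]
    if node["op"] == "==":
        return column == value
    if node["op"] == ">":
        return column > value
    if node["op"] == "<":
        return column < value
    raise ValueError(f"unsupported operator: {node['op']}")

# Sanity check with plain Python stand-ins for the SQLAlchemy combinators:
row = {"category": "X", "amount": 50}
cond = {"or": [{"field": "category", "op": "==", "value": "X"},
               {"field": "amount", "op": ">", "value": 100}]}
matches = build_conditions(cond, row.__getitem__,
                           lambda *xs: all(xs), lambda *xs: any(xs))
# matches -> True (category is "X", so the OR is satisfied)
```

The key property is that every call the function makes (and_, or_, comparison operators on columns) is part of SQLAlchemy's documented, verified surface.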
The verification protocol Raj built afterward:
Any time AI generates code that uses:
1. An internal library or custom module
2. Third-party library methods you're less familiar with
3. API calls to external services
Run this verification step before committing:
For each method call or class instantiation in this code:
1. Does this method/class definitely exist in the actual library?
2. Does it have the exact signature the code assumes (parameter names, order, types)?
3. Is there documentation or source code I can reference to verify?
Flag any method calls you are uncertain about with a comment: # VERIFY: [reason for uncertainty]
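The existence check itself can be partly automated. A small script (illustrative; it runs against the stdlib json module here, since clearpath_query is internal) imports the module and reports any names the generated code depends on that the module does not actually define:

```python
import importlib

def missing_names(module_name: str, names: list[str]) -> list[str]:
    """Return the subset of `names` that the module does not define."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]

# json really has dumps/loads; "AdvancedFilter" is a phantom.
print(missing_names("json", ["dumps", "loads", "AdvancedFilter"]))
# -> ['AdvancedFilter']
```

Run against clearpath_query with the names the generated code uses, a check like this would have flagged AdvancedFilter in seconds.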
He added this as a prompt addition to his Code Generator pattern:
After generating this code, review it specifically for:
1. Any calls to methods or classes in [LIBRARY NAME]: verify each one exists in the actual library documentation before including it
2. Flag any API calls you are uncertain about with: # VERIFY: check this method exists
3. If you are less than certain that a method exists, suggest the closest verified alternative instead of inventing the method name
The Team Discussion
Raj presented the incident to his team in their weekly engineering sync. His summary:
"The code compiled, the tests passed, and the review passed — and none of that detected the problem. The only thing that would have caught it was running the actual code against the actual library in staging. Which is fine — that's what staging is for. But it took two days instead of two minutes if I had thought to verify the AdvancedFilter class existed before committing."
The team discussion produced three new practices:
Practice 1: The "does this exist?" check for internal library calls. Any AI-generated code that calls a method on an internal or custom library gets a 30-second verification check: search the codebase or documentation to confirm the method exists with the expected signature.
Practice 2: Integration tests for AI-generated adapter code. When AI generates code that wraps an external API or internal library (as this code did), the team adds an integration test that actually instantiates the objects (not mocks them). This catches the "phantom method" failure before staging.
Practice 3: A comment convention for uncertain API calls. Any time a developer (human or AI-assisted) uses an API method they haven't personally used before, they add # VERIFY: [what to check] as a comment. This makes unknowns visible in code review.
The Broader Lesson About AI Code Hallucination
Raj added a section to his team's AI Coding Guide:
AI models hallucinate code with specific patterns:
- Internal/custom library methods: Models that have seen examples of your internal libraries will generate plausible but invented API surfaces. This is the most dangerous hallucination type because it's the hardest to catch.
- Less common method variants: Models sometimes generate method signatures with parameters that don't exist, or parameter orders that are inverted, especially for less commonly used methods in popular libraries.
- Deprecated methods: Models trained before a major library version may generate calls to deprecated or removed methods that existed in earlier versions.
- Invented exception types: When generating error handling code, models sometimes generate custom exception class names that don't exist in the actual library.
The verification rule of thumb: For any method or class call in AI-generated code that you didn't write yourself and haven't personally used — verify it before committing. 30 seconds of verification prevents 2 hours of debugging.
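Checking a signature, not just existence, is also cheap in Python: inspect.signature lists the real parameter names, so an invented keyword argument is caught before it ships. (json.dumps stands in here for the unfamiliar library call.)

```python
import inspect
import json

signature = inspect.signature(json.dumps)

# Real parameter: safe to use.
assert "indent" in signature.parameters

# Plausible-sounding but invented parameter: caught in seconds.
assert "pretty" not in signature.parameters
```

Note that inspect.signature works for pure-Python callables; C-implemented functions without signature metadata raise ValueError, in which case the documentation is the fallback.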
"This isn't about not trusting AI," Raj said. "I use AI for code generation dozens of times per day. The code it generates is mostly excellent. This is about knowing specifically where the hallucination risk is highest — internal APIs, custom libraries, less-common method variants — and building a 30-second verification habit for exactly those cases. That's calibrated trust, not blanket skepticism or blanket trust."
The Postmortem Documentation
Raj documented this incident in his failure taxonomy:
DATE: [date]
TASK: Implementing query builder for partner API endpoint
FAILURE: AdvancedFilter class called in code does not exist in clearpath_query library
ROOT CAUSE: Hallucination — model invented plausible API surface for internal library
REPAIR: Replaced with verified SQLAlchemy direct construction; 45 minutes
REPAIR PROMPT: Not applicable — full rewrite of the specific function required
PREVENTION FOR NEXT TIME:
1. Add "does this method exist?" verification step to all AI-assisted code reviews
2. Verify any internal library method before committing
3. Add integration tests for all code that wraps internal/external APIs
4. Add "VERIFY" comment convention for uncertain API calls
TIME LOST: 2.5 days (incident detection + diagnosis + repair + team discussion)
The documentation took 5 minutes to write. The pattern it captured has already prevented three similar incidents in the months since, identified by team members who remembered the AdvancedFilter case and ran the 30-second verification check before they committed their own AI-generated code.