Case Study: Raj's Debugging Chain — How CoT Prompting Found the Bug in 3 Minutes

DataField.Dev

Background

Raj is a senior software engineer at Clearpath Financial, a fintech company that processes automated payment transfers. The system he maintains handles approximately 40,000 transactions per day, and reliability is non-negotiable. When bugs appear, the pressure to identify and fix them quickly is significant.

On a Tuesday afternoon, the on-call monitoring system flagged an anomaly: a batch payment process was completing successfully (no errors thrown), but approximately 3% of transfers were not reaching the destination accounts. The money was leaving source accounts, the transactions were logging as completed, and destination accounts were not being credited. The transfers were simply vanishing.

This is the kind of bug that produces immediate escalation. Money moving in the wrong direction — or not moving at all — is a severity-one issue in financial services.

The Initial Attempts

Raj's first instinct was to check the obvious causes: database connection errors, API timeouts, network issues. He ran queries, checked logs, and inspected the monitoring dashboard. Everything looked normal. The system was reporting success at each step of the pipeline.

He tried two approaches before reaching for AI assistance.

Attempt 1: He searched internal documentation and Stack Overflow for similar issues with the payment processor API they used. He found several relevant threads, but none matched the specific symptom: silent failure on destination credit with confirmed source debit.

Attempt 2: He ran the problematic transaction IDs through a direct API call. The manual calls succeeded, while the automated batch calls were failing silently. This narrowed the issue to something specific to the batch processing code, not the API itself.

At this point, he had spent 45 minutes, narrowed the problem to the batch processor, and was staring at approximately 600 lines of payment processing code looking for the cause of a 3% silent failure rate.

He opened his AI tool.

The First Prompt (Standard Approach)

His initial prompt was what most engineers write when they first turn to AI for debugging:

"This payment processing function is silently failing for about 3% of transactions — the source account gets debited but the destination account never gets credited. It logs success but the money doesn't arrive. What could cause this?"

The response was professional and comprehensive, and almost completely unhelpful for his specific situation. The AI listed eight potential causes: race conditions, async/await misuse, unhandled promise rejections, incorrect error handling, database transaction rollback issues, API rate limiting, environment variable mismatches, and caching issues.

All of these were legitimate general causes of silent payment failures. None of them pointed to the actual problem. Raj spent 15 minutes checking the top three suggestions and found nothing.

He'd now spent over an hour on a severity-one issue.

The CoT Prompt

Raj remembered something from a team knowledge-sharing session: that AI debugging is dramatically more effective when you force explicit reasoning rather than pattern-matching to generic solutions. He rebuilt his prompt from scratch.

Full Prompt:

I have a critical bug: a batch payment processing function that silently
fails for ~3% of transactions. The source account is debited, the transaction
logs as "SUCCESS", but the destination account is never credited.

Before suggesting any fix, I need you to reason through this step by step:

1. INTENT ANALYSIS: Read this code carefully and describe in your own words
   exactly what it's supposed to do — the full intended execution flow from
   receiving a transaction batch to confirming all transfers complete.

2. FAILURE MODE ANALYSIS: For the specific symptom (source debited, success
   logged, destination NOT credited), what are the only possible execution
   paths that could produce this outcome? Not general causes — specific paths
   through this code.

3. ISOLATION: Of those paths, which ones would affect exactly ~3% of
   transactions rather than all transactions or none? What characteristics
   of specific transactions would select them into this failure group?

4. SUSPECT RANKING: Rank your identified paths from most to least likely,
   explaining why.

5. THEN AND ONLY THEN: Suggest a specific investigation step for the
   top suspect.

Here is the function:

[BATCH PAYMENT PROCESSOR - approximately 80 lines of Python]

async def process_payment_batch(batch_id: str, transactions: List[Transaction]):
    """Process a batch of payment transactions."""
    batch_record = await db.create_batch_record(batch_id)
    results = []

    for transaction in transactions:
        try:
            # Debit source account
            debit_result = await payment_api.debit(
                account_id=transaction.source_account,
                amount=transaction.amount,
                reference=transaction.reference_id
            )

            if debit_result.status == "SUCCESS":
                # Credit destination account
                credit_task = asyncio.create_task(
                    payment_api.credit(
                        account_id=transaction.destination_account,
                        amount=transaction.amount,
                        reference=transaction.reference_id
                    )
                )
                results.append({
                    "transaction_id": transaction.id,
                    "status": "SUCCESS",
                    "debit_ref": debit_result.reference
                })

        except PaymentAPIException as e:
            results.append({
                "transaction_id": transaction.id,
                "status": "FAILED",
                "error": str(e)
            })

    await db.update_batch_record(batch_id, results)
    return results

AI Response:

Step 1 — Intent Analysis:

This function is intended to process a list of payment transactions in sequence. For each transaction, it should: (1) debit the source account, (2) if the debit succeeds, credit the destination account, and (3) record the result. After all transactions, it updates a batch record with the complete results.

Step 2 — Failure Mode Analysis:

For the specific symptom — source debited, SUCCESS logged, destination NOT credited — the execution path would have to be: (1) debit executes and returns SUCCESS, AND (2) the code records SUCCESS AND logs it, AND (3) the credit call either never executes or executes but its result is never awaited or verified.

Looking at the code, I see something critical in the credit step: credit_task = asyncio.create_task(...). This wraps the credit coroutine in a task and schedules it on the event loop, but the task is never awaited. The code creates the credit task, immediately appends a SUCCESS result to the results list, and moves on to the next transaction.

The credit operation is running in the background as an untracked task. The process_payment_batch function does not wait for it to complete and does not check whether it succeeded.

Step 3 — Isolation (Why Only 3%, Not All):

If the credit task is always being created but never awaited, why does it succeed 97% of the time?

The answer is timing. asyncio.create_task() schedules the coroutine to run, and in most cases the event loop executes it before the batch function exits or the parent process moves on. The 3% failure rate represents cases where the event loop is under enough load, or the batch function completes quickly enough, that the untracked credit tasks are garbage collected or the process terminates before they finish. Both outcomes are possible because the event loop keeps only weak references to tasks, and credit_task is rebound on every iteration, so nothing holds on to an earlier task.

The 3% rate would be highly variable with load: under higher system load (more concurrent processes), the event loop has less opportunity to run the untracked tasks, and the failure rate would increase. This is a classic async bug that appears intermittent but is actually deterministic given the right conditions.

Step 4 — Suspect Ranking:

Primary (>95% confidence): Missing await on credit_task. Calling asyncio.create_task() without ever awaiting credit_task is the exact code pattern that produces this symptom. Because the credit runs as a background task, any exception it raises stays inside the task and never reaches the surrounding try/except. At most, asyncio logs "Task exception was never retrieved" when the task is garbage collected.

Secondary (<5%): A race condition in the payment API client itself. This is possible but would require a very specific and unlikely implementation detail.

Step 5 — Investigation:

Check whether credit_task is ever awaited anywhere in the function. It isn't — the variable is created but never used again. The fix is to replace asyncio.create_task(...) with a direct await payment_api.credit(...), or if parallel execution is intentional, collect all tasks and await them with asyncio.gather() at the end.
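The timing argument in Step 3 can be reproduced in isolation. The following is a minimal sketch, not Clearpath's code: credit, the account names, and the delays are hypothetical stand-ins, and the short sleep stands in for the final db.update_batch_record call. It assumes the batch runs under asyncio.run(), which cancels any tasks still pending when the main coroutine returns.

```python
import asyncio

credited = []  # accounts whose credit actually completed


async def credit(account_id: str, delay: float) -> None:
    """Hypothetical stand-in for payment_api.credit with variable latency."""
    await asyncio.sleep(delay)
    credited.append(account_id)


async def batch() -> list:
    results = []
    for account_id, delay in [("fast", 0.0), ("slow", 0.5)]:
        # Fire and forget: scheduled on the event loop, never awaited.
        asyncio.create_task(credit(account_id, delay))
        results.append({"account": account_id, "status": "SUCCESS"})
    # Stand-in for the final database write; the batch returns right after.
    await asyncio.sleep(0.05)
    return results


results = asyncio.run(batch())
print(results)   # both entries report SUCCESS
print(credited)  # only the fast credit finished before shutdown
```

Both transactions are recorded as SUCCESS, but only the credit that finished before the batch returned took effect. The slower one was cancelled without any error reaching the caller, which matches the symptom Raj was chasing.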

The Result

Raj read the response. The reasoning in Step 2 was the moment of recognition: asyncio.create_task() without an await. He had written that code six months earlier during a performance optimization push — the goal had been to create parallel credit calls and gather them at the end, but the gather call had never been implemented.

The credit calls were being fired and forgotten. Ninety-seven percent of the time, the event loop got to them. Three percent of the time, something — high load, batch completion speed, process scheduling — killed them before they executed.

He verified by searching for any await credit_task or task collection anywhere in the function. There was none.

Time from start of CoT prompt to confirmed root cause: 3 minutes.

Total time on the bug including the initial investigation: approximately 75 minutes — with the critical last 3 minutes being the most productive.
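Step 4 of the response also noted that a failure inside the background task never reaches the surrounding try/except. A small sketch shows why the except block in the original function could not have caught a failed credit. PaymentAPIException is redefined here as a local stand-in, and credit is a hypothetical coroutine that always fails.

```python
import asyncio


class PaymentAPIException(Exception):
    """Local stand-in for the exception type used in the batch function."""


async def credit() -> str:
    """Hypothetical credit call that always fails."""
    raise PaymentAPIException("credit rejected")


async def process_one():
    try:
        task = asyncio.create_task(credit())  # scheduled, not awaited
        status = "SUCCESS"
    except PaymentAPIException:
        # Not reached: the exception is raised later, inside the task.
        status = "FAILED"
    await asyncio.sleep(0.01)  # give the event loop time to run the task
    # The failure is visible only by inspecting the task itself.
    return status, task.exception()


status, error = asyncio.run(process_one())
print(status)       # SUCCESS
print(repr(error))  # PaymentAPIException('credit rejected')
```

The recorded status is SUCCESS even though the credit failed, because the try block had already finished by the time the task ran.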

The Fix

# Before (broken):
credit_task = asyncio.create_task(
    payment_api.credit(
        account_id=transaction.destination_account,
        amount=transaction.amount,
        reference=transaction.reference_id
    )
)
results.append({"transaction_id": transaction.id, "status": "SUCCESS", ...})

# After (fixed — sequential, guaranteed execution):
credit_result = await payment_api.credit(
    account_id=transaction.destination_account,
    amount=transaction.amount,
    reference=transaction.reference_id
)
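if credit_result.status != "SUCCESS":
    # Credit was rejected: reverse the debit so the source account is made whole.
    # If credit() raises instead, the enclosing except block runs and this
    # branch is skipped, so that handler needs the same reversal.
    await payment_api.reverse_debit(debit_result.reference)
    results.append({"transaction_id": transaction.id, "status": "FAILED", ...})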
else:
    results.append({"transaction_id": transaction.id, "status": "SUCCESS", ...})

The fix also revealed a second problem the CoT prompt helped surface: the original code had no reversal logic for the case where the debit succeeds but the credit fails. Even after fixing the await issue, a network failure during the credit call could still debit a source account without crediting the destination. The AI's analysis of the execution paths made this gap visible.
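If the parallel credits Raj originally intended are still wanted, the alternative the AI response mentions (collect the tasks and await them with asyncio.gather()) might look like the sketch below. It is an illustration under assumptions, not the deployed fix: Transaction is reduced to three fields, credit is a hypothetical stand-in for payment_api.credit, and reversal of the matching debit is left out for brevity.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    id: str
    destination_account: str
    amount: int


async def credit(txn: Transaction) -> str:
    """Hypothetical stand-in for payment_api.credit."""
    await asyncio.sleep(0.01)
    if txn.destination_account == "closed":
        raise RuntimeError("destination account closed")
    return "SUCCESS"


async def credit_all(transactions: List[Transaction]) -> List[dict]:
    # Keep a strong reference to every task so none can be lost.
    tasks = [asyncio.create_task(credit(t)) for t in transactions]
    # return_exceptions=True yields one outcome per task, in order,
    # instead of raising on the first failure.
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    results = []
    for txn, outcome in zip(transactions, outcomes):
        if isinstance(outcome, BaseException):
            results.append(
                {"transaction_id": txn.id, "status": "FAILED", "error": str(outcome)}
            )
        else:
            results.append({"transaction_id": txn.id, "status": outcome})
    return results


batch = [Transaction("t1", "acct-1", 100), Transaction("t2", "closed", 250)]
results = asyncio.run(credit_all(batch))
print(results)
```

The key difference from the broken version is that a status is recorded only after the outcome of each credit is known, so a SUCCESS entry always corresponds to a completed credit.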

What Made the CoT Prompt Work

Looking back at the prompt structure, three elements were critical:

1. Forcing intent analysis before failure analysis. By asking the model to describe what the code was supposed to do, the prompt required it to actually read and interpret the code rather than pattern-match to generic async bug patterns. With the intended flow stated (debit, then credit, then record), the un-awaited asyncio.create_task() call stood out in the next step as a departure from it.

2. Constraining the failure mode analysis to specific code paths. The instruction "not general causes — specific paths through this code" prevented the AI from producing the generic list it had given in the first prompt. It had to point to specific lines.

3. Asking about the 3% specificity. Most debugging prompts ask "what could cause this failure?" Raj's prompt asked, in effect, "what could cause this failure in only about 3% of cases?" That constraint is what led to the timing/event-loop explanation, which confirmed the diagnosis. A bug that affects every transaction has different causes than one that affects a small, roughly constant fraction.
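The five-step structure is mechanical enough to wrap in a small helper. The sketch below is one possible shape for it, not the template Raj's team uses: build_cot_debug_prompt, COT_STEPS, and the exact step wording are assumptions made for illustration, condensed from the prompt shown earlier.

```python
COT_STEPS = [
    ("INTENT ANALYSIS",
     "Describe in your own words exactly what this code is supposed to do, "
     "from input to final confirmation."),
    ("FAILURE MODE ANALYSIS",
     "For the specific symptom, list the only execution paths through this "
     "code that could produce it. Not general causes."),
    ("ISOLATION",
     "Of those paths, which would affect {frequency} of cases rather than "
     "all or none? What would select a case into the failure group?"),
    ("SUSPECT RANKING",
     "Rank the identified paths from most to least likely, explaining why."),
    ("THEN AND ONLY THEN",
     "Suggest a specific investigation step for the top suspect."),
]


def build_cot_debug_prompt(symptom: str, frequency: str, code: str) -> str:
    """Assemble a step-by-step debugging prompt from a symptom and code."""
    lines = [
        f"I have a critical bug: {symptom}",
        "",
        "Before suggesting any fix, I need you to reason through this step by step:",
        "",
    ]
    for number, (label, instruction) in enumerate(COT_STEPS, start=1):
        lines.append(f"{number}. {label}: {instruction.format(frequency=frequency)}")
    lines += ["", "Here is the code:", "", code]
    return "\n".join(lines)


prompt = build_cot_debug_prompt(
    symptom="source debited, SUCCESS logged, destination never credited.",
    frequency="~3%",
    code="async def process_payment_batch(batch_id, transactions): ...",
)
print(prompt)
```

Making the observed frequency a required argument keeps the isolation step from being skipped, which is the element that mattered most in this case.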

Raj's Retrospective

After the fix was deployed and the phantom transactions reconciled, Raj documented the debugging session in his team's knowledge base. His summary:

"The standard debugging prompt gives you a brainstorm. The CoT debugging prompt gives you a differential diagnosis. The difference is whether you're asking 'what could this be?' or 'given these specific symptoms and this specific code, walk me through the only paths that produce this outcome.' The second question is harder to answer — it requires actually reading the code — but it's the question that finds bugs."

He updated his team's AI debugging template to include the five CoT steps as standard practice, and added a specific note about the 3% constraint: "Always ask why this failure pattern has the specific frequency it does. The frequency is a clue, and making the AI reason about it explicitly often surfaces the mechanism."

The template has since been used by three other engineers on the team, in each case reducing time-to-root-cause on intermittent bugs by more than half.