Case Study 02: The Rubber Duck Breakthrough

A Developer Uses AI Conversation to Discover a Design Flaw

Context

Marcus was a mid-level developer working on a logistics platform called ShipRight. The platform managed package routing across a network of warehouses, sorting facilities, and delivery hubs. Marcus had designed the routing system six months earlier, and it had been running in production without major issues --- until the company expanded from 12 facilities to 35.

After the expansion, the routing engine started making suboptimal decisions. Packages were being routed through unnecessary intermediate stops, increasing delivery times by 15-20%. Marcus had spent three days reviewing the code and running tests, but he could not pinpoint the root cause. The code was doing exactly what it was designed to do --- the problem seemed to be in the design itself, not the implementation.

Frustrated, Marcus decided to use an AI assistant not to write code, but as a thinking partner. What followed was a conversation that revealed a fundamental flaw in his architecture --- one he had not even been looking for.


The Conversation

Turn 1: Setting the Stage

Marcus's Prompt:

I want to use you as a thinking partner, not a code generator. I have a
routing system for a logistics platform that's making suboptimal decisions
after we scaled from 12 to 35 facilities. I've been debugging for 3 days
and can't find the issue.

Let me explain the architecture, and I'd like you to:
1. Ask me questions to understand the system
2. Challenge any assumptions I might be making
3. Help me think about where the design might be flawed

Here's the core design:

The routing engine uses a graph where nodes are facilities and edges are
transport links between them. Each edge has a cost (time + fuel).
When a package enters the system, we run Dijkstra's algorithm to find
the shortest path from origin to destination. The path is computed once
at package intake and stored as the package's route plan.

The graph is rebuilt every 24 hours from our facility database.

AI Response: The AI acknowledged the setup and asked its first probing question: "When you say the graph is rebuilt every 24 hours, what happens to packages that are already in transit? Do they keep their original route plan, or do they get re-routed with the new graph?"

This was a reasonable question, and Marcus had considered it.
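Marcus's baseline design can be sketched in a few lines. This is an illustrative reconstruction, not ShipRight's actual code: the facility names, edge costs, and helper functions are invented for the example.

```python
import heapq

def edge_cost(base_transport_time, distance_km, fuel_cost_per_km):
    """Edge cost as Marcus described it: time plus fuel."""
    return base_transport_time + distance_km * fuel_cost_per_km

# facility -> [(neighbor, cost), ...]; values are illustrative
graph = {
    "A": [("B", edge_cost(2, 120, 0.05)), ("C", edge_cost(5, 300, 0.05))],
    "B": [("C", edge_cost(2, 110, 0.05))],
    "C": [("Z", edge_cost(3, 200, 0.05))],
    "Z": [],
}

def shortest_route(graph, origin, destination):
    """Dijkstra's algorithm; the path is computed once at package
    intake and stored as the package's route plan."""
    queue = [(0.0, origin, [origin])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in graph[node]:
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []
```

The design's key property, which matters later in the conversation, is that the graph is static between rebuilds: nothing in the cost function reflects how many packages are currently flowing through a facility.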

Turn 2: Exploring the In-Transit Question

Marcus's Response:

Good question. In-transit packages keep their original route plan. We made
that decision intentionally because re-routing mid-transit would add
complexity and could cause packages to bounce back and forth if the graph
keeps changing. Once a package is committed to a route, it follows that
route to completion.

The issue isn't with in-transit packages --- it's with new packages. Even
newly computed routes are suboptimal. They go through unnecessary stops.
For example, a package from Warehouse A to Delivery Hub Z might go
A -> B -> C -> D -> Z when A -> C -> Z is a valid and shorter route.

AI Response: The AI asked about the edge data: "How are the edge costs calculated? Specifically, does the cost on the edge from A to C only represent the direct transport cost, or does it also factor in capacity, congestion, or current load at each facility?"

Turn 3: The Cost Function Deep Dive

Marcus's Response:

The edge cost is calculated as:
cost = base_transport_time + (distance * fuel_cost_per_km)

It's purely based on the transport link itself. We don't factor in facility
load or congestion. The graph is static between rebuilds --- costs don't
change based on how many packages are flowing through.

Actually, wait. We do have one dynamic component. When a facility reaches
80% capacity, we add a "congestion penalty" to all edges leading INTO that
facility. The penalty is a multiplier: cost * 1.5 for 80-90% capacity,
cost * 3.0 for 90-100%. This is meant to route packages away from
overloaded facilities.

AI Response: The AI homed in on this detail: "That congestion penalty is interesting. A few questions about it: (1) How frequently is facility capacity recalculated? (2) When you add the penalty to edges going into a congested facility, does Dijkstra's algorithm then find routes that avoid that facility entirely, or does it sometimes route through the facility via a different incoming edge? (3) With 35 facilities, how many are typically in the 80%+ capacity range?"

This was the question that started to unravel the mystery.
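The congestion penalty Marcus described can be written as a small function. This is a sketch of the flawed design as stated, with an illustrative function name:

```python
def congestion_multiplier(utilization: float) -> float:
    """Hard-threshold penalty from the original design:
    1.5x at 80-90% capacity, 3.0x at 90-100%."""
    if utilization >= 0.9:
        return 3.0
    if utilization >= 0.8:
        return 1.5
    return 1.0

# Applied to every edge leading INTO a congested facility:
#   penalized_cost = base_cost * congestion_multiplier(facility_utilization)
```

Note the two properties that turn out to matter: the penalty jumps discontinuously at the thresholds, and it is recomputed from the current hour's capacity with no smoothing.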

Turn 4: The Eureka Moment Begins

Marcus's Response:

Let me answer each:

1. Capacity is recalculated every hour and penalties are applied to the
   live graph immediately.

2. When penalties are applied, Dijkstra's avoids those edges. So yes,
   packages get routed around congested facilities.

3. Since the expansion... actually, let me check...

Okay, I checked. With 35 facilities, on average 8-12 facilities are in
the 80%+ range at any given time during peak hours. Before the expansion,
with 12 facilities, it was typically 1-2.

Hmm. That's a lot. With 12 of 35 facilities penalized, that's over a
third of the network with inflated costs.

AI Response: The AI articulated what Marcus was starting to see: "I think we may have found the issue. Let me reflect back what I understand and see if it matches your situation:

With 12 facilities, 1-2 being penalized was manageable. Dijkstra's would route around them with minor detours. But with 35 facilities and 8-12 penalized simultaneously, the algorithm is trying to avoid a third of the network. This could create a cascade effect:

  1. Packages get routed away from the 8-12 congested facilities.
  2. Those detour routes flow through the remaining ~23 uncongested facilities.
  3. The increased load pushes some of those facilities past 80%.
  4. New penalties appear, further constraining the available paths.
  5. Routes become increasingly tortuous as the algorithm navigates a shrinking set of 'affordable' paths.

Is this consistent with what you're observing? Do you see the set of congested facilities shifting over time?"
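The cascade the AI described can be demonstrated with a toy model: traffic diverted away from penalized facilities concentrates on the remaining ones and pushes them over the threshold in turn. The six facilities, weights, and loads below are invented for illustration, not ShipRight data.

```python
# Each facility attracts traffic in proportion to a fixed weight; when a
# facility is penalized, its share is redistributed over the open ones.
capacity = 100
weights = [3, 2, 1.5, 1.2, 1, 1]   # relative traffic attraction (illustrative)
total_load = 300
threshold = 0.8 * capacity          # 80% congestion threshold

penalized = set()
history = []
for hour in range(4):
    open_weight = sum(w for i, w in enumerate(weights) if i not in penalized)
    for i, w in enumerate(weights):
        if i in penalized:
            continue
        load = total_load * w / open_weight   # diverted traffic lands here
        if load > threshold:
            penalized.add(i)
    history.append(len(penalized))

print(history)  # → [1, 2, 3, 6]: avoiding congestion creates more congestion
```

Each hour, routing around the congested set raises the load on the survivors, so the penalized set grows until it covers the whole network --- the same pattern Marcus found in his congestion logs.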

Turn 5: Marcus Connects the Dots

Marcus's Response:

Yes. YES. That's exactly what's happening. I just pulled the congestion
logs and I can see the cascade:

Hour 1: Facilities 3, 7, 12 are congested
Hour 2: Packages avoid 3, 7, 12 -> Facilities 5, 9, 15 become congested
Hour 3: Packages avoid 3, 5, 7, 9, 12, 15 -> More facilities become congested
Hour 4: Half the network is penalized

And routes computed during hour 4 are terrible because Dijkstra is trying
to find paths through an artificially expensive graph where half the edges
have 1.5x or 3x cost multipliers.

The congestion penalty system is creating an oscillation problem. It's a
feedback loop that destabilizes the network at scale.

At 12 facilities the damping was sufficient --- the network absorbed the
redistributed load. At 35 facilities, the system is above the stability
threshold.

AI Response: The AI validated Marcus's insight and then pushed the analysis further: "Right. This is a classic feedback instability problem, similar to what happens in network routing protocols like OSPF when link costs change too aggressively. The core issue is that your congestion penalties cause global rerouting in response to local congestion, and the penalties are too aggressive (1.5x and 3.0x are large multipliers).

A few approaches to consider: (1) damped penalties that change gradually, (2) localized rerouting that only affects nearby packages, (3) predictive routing that anticipates capacity rather than reacting to it, or (4) a different congestion model entirely. Would you like to explore any of these?"

Turn 6: Designing the Fix

Marcus's Response:

Let's explore option 1 (damped penalties) combined with a modified
option 4. I think the penalty model is fundamentally wrong. Instead
of penalizing edges into congested facilities, we should:

1. Use a softer penalty curve --- instead of hard thresholds at 80% and 90%,
   use a continuous function that gradually increases cost as capacity
   fills up.

2. Apply a damping factor so penalties change slowly --- maybe a weighted
   moving average over the last 4 hours instead of reacting to the
   current hour's capacity.

3. Set a maximum penalty cap --- even a fully loaded facility should not
   have more than a 1.3x multiplier, so Dijkstra will still consider
   those routes if they're significantly shorter.

Does this make sense as an approach? What am I missing?

AI Response: The AI affirmed the approach but raised a concern Marcus had not considered: "The approach is sound. The damped, continuous penalty curve with a cap directly addresses the oscillation problem. One thing to consider: with a maximum 1.3x multiplier and damped changes, you might see the opposite problem --- facilities could become genuinely overloaded because the penalty is not strong enough to divert traffic. You might want to add a hard capacity limit (a circuit breaker) as a safety valve: if a facility hits 95% capacity, it stops accepting new packages entirely, regardless of routing costs. This is a different mechanism from the cost penalty and prevents true overload scenarios."

Turn 7: Validation and Implementation Planning

Marcus's Response:

The circuit breaker idea is excellent --- it separates the "optimize routing"
concern from the "prevent overload" concern. The soft penalties handle
optimization, the circuit breaker handles safety.

Let me summarize the new design:

1. Continuous penalty function: penalty = 1 + 0.3 * (utilization / capacity)^3
   This keeps the penalty small below 70% and ramps up to 1.3x at 100%.

2. Damping: penalty_effective = 0.7 * previous_penalty + 0.3 * current_penalty
   Updated every hour.

3. Circuit breaker: If facility reaches 95% capacity, all incoming edges
   are removed from the graph entirely (not penalized, removed) until
   capacity drops below 85% (hysteresis to prevent flapping).

4. Monitoring: Log penalty values and circuit breaker activations so we
   can tune the parameters.

Can you validate this design? Are there any edge cases or failure modes
I should consider?

AI Response: The AI identified two edge cases: first, what if circuit breakers trigger on enough facilities to make certain destinations unreachable (the solution needs a fallback mechanism); second, the damping factor means it takes several hours for the system to react to a genuinely overloaded facility (the circuit breaker handles the safety case, but there is a responsiveness gap). The AI suggested a "fast track" override where rapid capacity increases above a certain rate would bypass damping.
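Marcus's summarized design translates almost directly into code. The sketch below follows his three mechanisms; the function names and the `currently_tripped` bookkeeping are illustrative additions:

```python
def raw_penalty(utilization: float) -> float:
    """Continuous penalty curve: 1 + 0.3 * utilization^3.
    Small below 70% utilization, capped at 1.3x at 100%."""
    return 1 + 0.3 * utilization ** 3

def damped_penalty(previous: float, current: float) -> float:
    """Weighted moving average, updated hourly, so penalties
    change gradually instead of reacting to a single hour."""
    return 0.7 * previous + 0.3 * current

def circuit_breaker_tripped(utilization: float, currently_tripped: bool) -> bool:
    """Safety valve, separate from the cost penalty: trip at 95%
    capacity, reset only below 85% (hysteresis prevents flapping).
    While tripped, all incoming edges are removed from the graph."""
    if currently_tripped:
        return utilization >= 0.85
    return utilization >= 0.95
```

The division of labor matches Marcus's framing: the soft, damped penalty handles optimization, while the circuit breaker handles the safety case that a capped 1.3x multiplier cannot.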


The Resolution

Marcus implemented the new congestion model over two days. The results were dramatic:

  • Average route length decreased by 18%, returning to pre-expansion levels.
  • Facility utilization became more balanced, with the standard deviation across facilities dropping by 40%.
  • Circuit breaker activations averaged 0.3 per day, confirming they were a safety valve, not a routing mechanism.
  • The oscillation pattern disappeared entirely from the monitoring dashboards.

Analysis: Why the Conversation Worked

The AI Did Not Find the Bug

This is the most important point. The AI did not analyze Marcus's code and locate a bug. The code was working correctly --- it was faithfully implementing a flawed design. The AI served as a thinking partner that helped Marcus see his own system from a new perspective.

The Key Moments

  1. Turn 3: Marcus's "Actually, wait" moment. When describing the cost function, Marcus initially forgot about the congestion penalty. The act of explaining his system comprehensively forced him to recall a detail he had been overlooking during his debugging sessions. This is the rubber duck effect in its purest form.

  2. Turn 4: Marcus checking the data. The AI's question about how many facilities were congested prompted Marcus to pull actual numbers. Seeing "8-12 out of 35" was the spark that ignited his understanding.

  3. Turn 5: The AI's cascade description. The AI articulated the cascade effect clearly, giving Marcus a mental model for a phenomenon he was observing but had not conceptualized.

  4. Turn 6: The capacity concern. Marcus's proposed fix could have introduced a new problem (under-penalizing genuinely overloaded facilities). The AI caught this before it became another multi-day debugging session.

What Made Marcus's Approach Effective

  1. He asked for a thinking partner, not a code generator. By framing the conversation as collaborative thinking, Marcus set up a dynamic where the AI asked probing questions rather than jumping to solutions.

  2. He explained his system honestly, including the parts he was less sure about. When he described the cost function, he corrected himself mid-explanation ("Actually, wait..."). This honesty with himself and the AI was crucial.

  3. He followed the AI's questions genuinely. When the AI asked how many facilities were congested, Marcus took the time to check real data rather than guessing. The real data was the key to the insight.

  4. He did not ask the AI to fix it for him. After understanding the problem, Marcus designed the solution himself with the AI as a sounding board. This meant the solution was grounded in his deep understanding of the system.


Lessons for Your Practice

When to Use AI as a Thinking Partner

The rubber duck approach is especially effective when:

  • You have been debugging for hours or days without progress. This often means the issue is in your mental model, not the code. Explaining your model to the AI forces you to examine it critically.

  • The system is working "correctly" but producing wrong outcomes. Design flaws do not cause error messages or stack traces. They produce correct-but-wrong behavior that can only be found by questioning the design itself.

  • The problem involves emergent behavior. Marcus's oscillation problem was not in any single component --- it emerged from the interaction between the penalty system, the routing algorithm, and the facility network at scale. These kinds of problems are best found through systemic thinking, not code-level debugging.

  • You suspect the issue is at a level of abstraction above the code. If you have verified the code implements the algorithm correctly, the algorithm itself might be the problem.

How to Structure a Thinking Partner Conversation

  1. Frame the conversation explicitly. Tell the AI you want it to ask questions and challenge assumptions, not generate code.

  2. Explain your system as if teaching a new team member. The forced comprehensiveness of teaching reveals gaps in your own understanding.

  3. Follow up on the AI's questions honestly. When the AI asks about a detail, do not hand-wave. Check the data, review the code, verify your assumptions.

  4. Let the AI articulate patterns. The AI's description of the cascade effect in Turn 5 gave Marcus language for something he had been sensing but could not articulate. Pattern articulation is one of the AI's most valuable thinking-partner capabilities.

  5. Use the AI to stress-test your solution. Before implementing a fix, have the AI look for edge cases and failure modes. Marcus's circuit breaker might have introduced a reachability problem if the AI had not flagged it.

The Deeper Insight

The most powerful aspect of the rubber duck conversation is not the AI's intelligence --- it is the forced articulation of your own thinking. Marcus knew about the congestion penalty. He knew about the expansion. He knew about the suboptimal routes. But he had never put these three facts into a single narrative. The act of explaining the system to the AI created that narrative, and the narrative revealed the flaw.

This is why rubber duck debugging works even with an actual rubber duck. The AI adds value beyond the duck by asking targeted questions, articulating patterns, and catching issues in proposed solutions. But the fundamental mechanism --- forced, comprehensive articulation of your own mental model --- is what drives the breakthrough.

The next time you are stuck, before you write another debugging prompt, try this: explain your system to the AI as a thinking partner. The answer might already be in your head, waiting for the right conversation to bring it out.