In This Chapter
- 1. AI-Assisted Development Workflow: All Stages
- 2. Architecture and Design
- 3. Implementation: Building Features Through Iterative Prompting
- 4. Code Review with AI
- 5. Debugging: AI-Assisted Investigation
- 6. Testing: AI-Assisted Test Generation
- 7. Documentation: Docstrings, READMEs, API Docs
- 8. Refactoring: AI-Assisted Pattern Identification
- 9. Trust Calibration Imperative
- 10. Raj's Full Development Workflow: A Day in the Life
- 11. Research Breakdown: Empirical Studies on AI-Assisted Development Productivity
# Chapter 23: Software Development and Debugging
Software development was one of the first domains where AI assistance became genuinely useful, and it remains the domain where the productivity evidence is strongest. The combination of well-structured tasks, machine-readable outputs, and immediate verifiability makes software development an unusually good fit for AI assistance.
But the trust calibration challenge is severe in ways that are not always obvious. AI-generated code works. It compiles, it runs, it passes the tests you ask it to write — and it may still contain subtle security vulnerabilities, incorrect edge case handling, performance issues under load, or dependencies on deprecated packages. The output looks correct because code that runs without errors looks identical to code that works correctly.
This chapter builds a comprehensive AI-assisted development workflow for the full software development lifecycle: architecture and design, implementation, code review, debugging, testing, documentation, and refactoring. The framework is grounded in the trust calibration imperative: AI is a powerful collaborator that requires active human oversight at every stage.
Raj is our primary persona throughout. His experience reflects the reality of senior professional software development — technically sophisticated, time-pressured, risk-aware, and operating in a codebase that has organizational consequences if it breaks.
## 1. AI-Assisted Development Workflow: All Stages
The development workflow maps to seven stages, each with distinct AI applications and trust calibration requirements.
| Stage | Primary AI Use | Trust Risk Level |
|---|---|---|
| Architecture & Design | Exploring options, discussing trade-offs | Low — output is discussion, not code |
| Implementation | Code generation, feature completion | High — code runs before you fully understand it |
| Code Review | Security, performance, style analysis | Medium — AI misses context-dependent issues |
| Debugging | Hypothesis generation, trace analysis | Medium — AI may suggest wrong root cause confidently |
| Testing | Test generation, edge case identification | High — tests may miss important cases |
| Documentation | Docstrings, READMEs, API docs | Low-Medium — depends on accurate technical inputs |
| Refactoring | Pattern identification, code transformation | Medium — refactoring can introduce subtle bugs |
## 2. Architecture and Design
Architecture and design decisions are where AI assistance is most straightforwardly useful and least risky. The reason is simple: you are producing decisions, not code. A bad architectural discussion does not break a production system.
### Exploring Options
When facing an architectural decision, AI can rapidly enumerate the option space. Present your constraints and requirements, ask for multiple approaches, and use the discussion to sharpen your thinking.
Effective architecture prompting:
I'm designing a notification system for a B2B SaaS platform. Requirements:
- Multi-channel: email, SMS, in-app, webhook
- Expected volume: 100K-500K notifications per day
- Need guaranteed delivery for critical alerts
- Users can configure their notification preferences per event type
- Currently using PostgreSQL and Python/FastAPI
Present 3 architectural approaches for the notification system.
For each approach:
1. High-level architecture (2-3 sentences)
2. Key components
3. Advantages and disadvantages
4. Best suited when (what requirements favor this approach)
The resulting discussion gives Raj three viable starting points that he can evaluate against his organization's existing infrastructure, team capabilities, and operational requirements. None of this code exists yet — it is all discussion.
### Discussing Trade-offs
AI can play the role of a knowledgeable technical discussion partner for trade-off analysis. Present your current leaning and ask for challenge:
I'm leaning toward a message queue approach (RabbitMQ or SQS) for the notification system.
My reasoning: we already use SQS elsewhere, guaranteed delivery is important, and the team
knows it. Challenge my reasoning. What am I potentially underweighting?
This prompting style — explicitly asking AI to challenge your current thinking — is one of the most productive uses of AI in design. Left to itself, AI will often agree with your proposed direction. Explicitly requesting the counterargument surfaces trade-offs you may have rationalized away.
### Diagramming via Text
While AI cannot produce native diagrams, it can produce Mermaid diagram syntax that renders in most modern documentation tools (GitHub, Notion, GitLab). This allows AI to generate visual architecture representations:
Generate a Mermaid sequence diagram showing the notification delivery flow
for the message queue approach. Include: API trigger, queue, worker service,
delivery channels, and retry logic. Keep it readable.
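A sketch of the kind of Mermaid output such a prompt might produce (the participant names here are illustrative, not taken from Raj's actual system):

```mermaid
sequenceDiagram
    participant API as Notification API
    participant Q as SQS Queue
    participant W as Worker Service
    participant CH as Delivery Channel (email/SMS/in-app/webhook)

    API->>Q: enqueue notification event
    Q->>W: deliver message
    W->>CH: send via channel adapter
    alt delivery fails
        CH-->>W: error
        W->>Q: re-enqueue with backoff (retry)
    else delivered
        CH-->>W: success
        W->>W: mark delivered
    end
```

Because the syntax is plain text, you can iterate on the diagram conversationally ("add a dead-letter queue after three retries") just as you would with code.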
## 3. Implementation: Building Features Through Iterative Prompting
Implementation is where AI's productivity gains are most dramatic — and where the trust risks are highest.
### Chat-Based Code Generation
The most common implementation pattern: you describe what you want, AI generates the code, you review and refine it. The key to productive implementation with AI is iterative refinement rather than single-prompt completion.
Single-prompt temptation: "Write a complete REST API for user management." This produces a lot of code quickly — but the code will have assumptions baked in that you have not specified, and reviewing hundreds of lines of AI-generated code is cognitively demanding.
Better approach — the "explain then generate" technique:
Step 1: Explain the context. Submit your codebase context (the relevant files or summaries), your existing conventions, and the specific feature you need.
Step 2: Discuss the approach. Ask AI to propose an implementation approach before writing code. Review and approve (or correct) the approach.
Step 3: Generate in stages. Generate the data model first. Review. Generate the business logic. Review. Generate the API layer. Review. Generate the tests. Review.
Step 4: Refine iteratively. When something in the generated code is wrong or needs adjustment, describe the specific issue and ask for a targeted fix — do not regenerate the whole section.
### The "Explain Then Generate" Technique
The explain-then-generate sequence is the single most productive change most developers make when adopting AI coding assistance. The principle: before AI writes any code, you have agreed on what the code will do.
STEP 1 (explanation):
I'm building a rate limiter for our payment processing API. Constraints:
- 1000 requests per minute per API key
- Must be distributed (multiple API server instances)
- Redis is available
- Python 3.11, FastAPI
Before writing any code, explain the approach you'd use. Include:
- The rate limiting algorithm you'd choose and why
- The Redis data structure you'd use
- How you'd handle the sliding vs fixed window trade-off
- Any edge cases I should think about
STEP 2 (generation, after reviewing the explanation):
Good. Use the sliding window log approach with Redis sorted sets as you described.
Now write the implementation.
By the time generation happens, you have verified that AI understands the requirements correctly. The code is almost always better than what comes from a cold generation prompt.
### Providing Adequate Context
AI code generation quality is directly proportional to the context provided. Include:
- Existing code patterns: Paste examples of how similar things are done in your codebase. AI will follow your conventions if you show it examples.
- Framework and library versions: AI has knowledge of multiple versions of popular frameworks, and may use outdated APIs if you do not specify.
- Non-obvious constraints: Performance requirements, security constraints, infrastructure limitations that are not obvious from the feature description alone.
## 4. Code Review with AI
AI code review is not a replacement for human code review. It is an additional pass that catches specific categories of issues efficiently — particularly issues that require broad knowledge (common security vulnerabilities, performance anti-patterns) rather than deep contextual knowledge (whether this logic aligns with the business requirements).
### Self-Review Prompts
Before submitting code for human review, use AI to conduct a structured self-review:
Review the following Python code. Conduct separate passes for each of these dimensions
and report findings under separate headers:
1. SECURITY: SQL injection, XSS, authentication bypasses, insecure deserialization,
sensitive data exposure, dependency vulnerabilities
2. PERFORMANCE: N+1 queries, missing indexes, inefficient algorithms, memory leaks,
unnecessary database calls
3. CORRECTNESS: Edge cases not handled, off-by-one errors, incorrect assumptions
about input data
4. READABILITY: Variable naming, function length, documentation, complexity
5. TEST COVERAGE: What cases should be tested that aren't currently tested?
[PASTE CODE]
### Security-Focused Review
For code that handles sensitive data, authentication, or payment flows, a dedicated security review prompt is warranted:
Act as a security engineer reviewing this code for vulnerabilities.
Focus specifically on:
- Input validation and sanitization
- Authentication and authorization logic
- How sensitive data is handled (stored, logged, transmitted)
- Error handling — does it expose sensitive information?
- Third-party dependency risks
- OWASP Top 10 vulnerabilities
For each finding, specify: the vulnerability type, the severity (critical/high/medium/low),
the specific line or section, and the recommended fix.
[PASTE CODE]
### What AI Code Review Misses
AI code review has consistent blind spots that you must compensate for with human review:
Business logic correctness: AI does not know whether your business logic correctly implements the business requirement. It can verify that the code does what the code says, but not whether what the code says is what the business needs.
Context-dependent security: Some security vulnerabilities depend on how code is deployed and what surrounds it. AI reviewing a code snippet cannot see that the function is called from a context where the input is already validated upstream.
Architectural coherence: Whether this implementation fits correctly into the broader system — the data model, the API contracts, the deployment constraints — requires understanding the full system context that AI typically does not have.
Subtle race conditions: Concurrency bugs that depend on specific timing and interleaving are notoriously difficult to catch even with careful human review. AI is not reliably better at this than humans.
## 5. Debugging: AI-Assisted Investigation
Debugging is one of the highest-value AI applications in software development because the task structure is well-suited to AI's strengths: you have a specific, concrete problem (the error), relevant context (the code and trace), and a clear success criterion (the bug is fixed).
### The Debug Prompt Structure
Effective debugging prompts have four components: the error message, the relevant code, the expected behavior, and the actual behavior. A fifth component — additional context — improves quality when available.
````python
def build_debug_prompt(
    error_message: str,
    code_snippet: str,
    expected_behavior: str,
    actual_behavior: str,
    context: str = "",
) -> str:
    """Build a structured debugging prompt."""
    prompt = f"""I'm debugging a Python issue and need help identifying the root cause.

**Error Message:**
{error_message}

**Relevant Code:**
```python
{code_snippet}
```

**Expected Behavior:** {expected_behavior}
**Actual Behavior:** {actual_behavior}
"""
    if context:
        prompt += f"\n**Additional Context:** {context}"
    prompt += "\n\nPlease analyze step by step and suggest specific fixes."
    return prompt
````
Using this structure consistently produces better debugging responses because AI has all the information it needs to reason about the problem. Missing any of the four components forces AI to make assumptions — and it will make them confidently.
### Rubber Duck Debugging at Scale
The classic "rubber duck debugging" technique — explaining your code aloud to a rubber duck — is effective because the act of articulation surfaces assumptions and logical gaps. AI is a rubber duck that talks back.
The rubber duck prompting approach for complex bugs:
I'm going to explain a bug to you and I want you to ask clarifying questions until you understand the system well enough to suggest what might be wrong. Do not suggest fixes until you fully understand the context.
Here is the situation: [describe the bug symptom]
[then answer AI's questions]
The conversation structure — answering AI's clarifying questions — often produces the insight without AI ever needing to identify the bug. The act of answering "what happens when X?" forces you to think through the system behavior precisely, which is where bug insight typically comes from.
### "Explain This Stack Trace"
Stack traces are information-dense and difficult to parse quickly, especially in unfamiliar frameworks. AI is highly effective at explaining stack traces:
Explain this stack trace. Tell me:
1. What error occurred and where
2. The call sequence that led to the error (bottom to top)
3. The most likely cause given this trace
4. What information I should gather to debug further
[PASTE STACK TRACE]
For long stack traces from complex systems, also ask: "What part of this stack trace is framework boilerplate that I can ignore, versus application code that I should focus on?"
### Intermittent Bugs
Intermittent bugs — issues that occur sometimes but not reliably — are the hardest to debug because you cannot reproduce them on demand. AI can help by:
**Generating a hypothesis list:** Describe the symptom (what happens, how often, under what conditions) and ask for hypotheses. Intermittent failures are commonly caused by race conditions, resource exhaustion under load, time-dependent logic, or external service availability. AI can enumerate these quickly.
**Designing diagnostic instrumentation:** "I have an intermittent bug that I believe is a race condition in this code. What logging should I add to capture the state at the moment the bug occurs?"
**Analyzing logs from past occurrences:** If you have logs from past occurrences, AI can help identify the common conditions. Paste the log excerpts from affected runs alongside clean runs and ask: "What is different about the system state when the error occurs versus when it does not?"
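The instrumentation itself can be as simple as a decorator that snapshots the call state whenever the failure fires — a sketch of the kind of logging AI might propose (the names here are illustrative):

```python
import functools
import logging
import threading

logger = logging.getLogger("race-debug")


def capture_state_on_error(fn):
    """Log arguments and the executing thread at the moment an exception
    escapes `fn`, then re-raise. For an intermittent bug, this captures
    the state you cannot reproduce on demand."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            logger.exception(
                "%s failed | thread=%s | args=%r | kwargs=%r",
                fn.__name__, threading.current_thread().name, args, kwargs,
            )
            raise
    return wrapper
```

Decorate the suspect function, deploy, and wait for the next occurrence; the log line pins down the inputs and thread at failure time, which is exactly the evidence the "analyzing logs" step needs.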
### Hypothesis Generation for Complex Bugs
When you have gathered diagnostic information but are not sure what is causing the bug, structured hypothesis generation is valuable:
Here are the symptoms of a bug I'm investigating:
- [symptom 1]
- [symptom 2]
- [symptom 3]

Here is what I've ruled out so far:
- [eliminated hypothesis 1] — because [evidence]
- [eliminated hypothesis 2] — because [evidence]

Here is the relevant code and system context: [paste context]

Generate 5 hypotheses for what might be causing this bug. For each hypothesis:
1. What would cause this symptom pattern
2. How to test whether this hypothesis is correct
3. How likely you think this hypothesis is given the available evidence
---
## 6. Testing: AI-Assisted Test Generation
Testing is one of the strongest AI use cases in software development because the task is well-structured: given a function, generate cases that test its behavior. AI is good at this in specific ways and unreliable in others.
### Unit Test Generation
Unit test generation is AI's strongest testing contribution:
Generate comprehensive unit tests for this Python function using pytest. Include tests for:
- The happy path (expected normal usage)
- Edge cases: empty input, null/None values, boundary values
- Error cases: invalid types, values that should raise exceptions
- Any cases you identify as potentially problematic based on the implementation
Use pytest fixtures where appropriate. Include docstrings explaining what each test verifies.
[PASTE FUNCTION]
The resulting tests are a starting point. You should add tests for:
- Business-logic edge cases that AI does not know about (specific value ranges that have business significance)
- Cases that require knowledge of external system behavior
- Regression tests for bugs that have occurred in production
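As a concrete illustration of the happy-path/edge-case/error-case split, here is the shape such generated tests take (the function under test, `clamp`, is hypothetical):

```python
# Hypothetical function under test.
def clamp(value: float, low: float, high: float) -> float:
    """Clamp value into [low, high]; raises ValueError if low > high."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


# pytest-style tests (plain asserts shown here; with pytest installed you
# would write the error case as `with pytest.raises(ValueError): ...`).
def test_happy_path():
    assert clamp(5.0, 0.0, 10.0) == 5.0


def test_edge_boundaries():
    assert clamp(0.0, 0.0, 10.0) == 0.0    # exactly at lower bound
    assert clamp(10.0, 0.0, 10.0) == 10.0  # exactly at upper bound
    assert clamp(-1.0, 0.0, 10.0) == 0.0   # below range


def test_error_inverted_range():
    try:
        clamp(5.0, 10.0, 0.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for inverted range")
```

Notice what is missing: nothing here encodes business meaning (is 10.0 a regulatory cap or an arbitrary default?), which is precisely the category you must add yourself.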
### Edge Case Identification
One of AI's most valuable contributions to testing is identifying edge cases you had not thought of:
I've written tests for this function [paste function and existing tests]. What edge cases have I missed? Think about:
- Numeric boundaries (zero, negative numbers, maximum values for integer types)
- String edge cases (empty string, very long strings, special characters, unicode)
- Collection edge cases (empty lists, single-element lists, duplicates)
- Type edge cases (None where object expected, integer where float expected)
- Concurrency edge cases if applicable
- Time and date edge cases
This is particularly useful because edge case lists are comprehensive in AI's training data — common failure patterns are well-represented.
### Coverage Analysis
When you have an existing test suite and want to improve coverage:
Here is my function and my current tests. Identify what the tests are NOT covering. Be specific: name the code paths, branches, and conditions that have no test coverage.
[PASTE FUNCTION AND EXISTING TESTS]
### Integration Test Scaffolding
Integration tests — tests that verify how components work together — require more contextual knowledge than unit tests. AI can generate scaffolding, but the specifics must come from you:
Generate an integration test scaffold for this API endpoint [paste endpoint code]. Use pytest and the httpx test client. The test should:
1. Set up test data (I'll fill in the specific data)
2. Make a request to the endpoint
3. Verify the response structure (I'll specify the exact expected values)
4. Verify the database state after the request (I'll specify what to check)
5. Clean up after itself
Generate the scaffold with clear TODOs where I need to fill in specifics.
---
## 7. Documentation: Docstrings, READMEs, API Docs
Documentation is one of the best AI use cases in software development. The genre prioritizes clarity and completeness over voice; the task is well-defined; and the input (the code) provides most of what AI needs to produce good output.
### Docstrings
Generate comprehensive docstrings for all functions in this module. Follow Google Python docstring format. For each function, include:
- A one-sentence summary
- Args: name, type, description, and whether optional
- Returns: type and description
- Raises: exceptions that can be raised and when
- An example in the Examples section for functions that are not self-evident
[PASTE MODULE]
Review generated docstrings for accuracy — AI may generate plausible-sounding descriptions that do not accurately reflect edge behavior.
### README Generation
Generate a comprehensive README.md for this project.
Project context: [brief description]
Primary audience: [developers setting up this project / end users / API consumers]

Include:
1. Project description (what it does, who it's for)
2. Prerequisites (system requirements, dependencies)
3. Installation instructions
4. Configuration (environment variables, config files)
5. Usage examples with code snippets
6. API reference (if applicable)
7. Contributing guidelines
8. License
Base this on the code and configuration files I've provided. Note anything you're uncertain about or where I need to fill in specifics.
[PASTE RELEVANT CODE AND CONFIG FILES]
### API Documentation
For REST APIs, AI can generate OpenAPI-compatible documentation:
Generate OpenAPI 3.0 documentation for this FastAPI endpoint. Include: summary, description, parameters with types and constraints, request body schema, response schemas for all status codes, and example request/response pairs.
[PASTE ENDPOINT CODE]
---
## 8. Refactoring: AI-Assisted Pattern Identification
Refactoring — improving code structure without changing behavior — is a good fit for AI assistance because it has clear patterns and the success criterion is objective: tests still pass, behavior is unchanged, code is measurably cleaner.
### Identifying Refactoring Opportunities
Review this code and identify the highest-priority refactoring opportunities. For each opportunity:
1. What the current code is doing (the pattern to fix)
2. What the refactored version would look like (the pattern to apply)
3. Why this refactoring matters (readability, maintainability, performance)
4. Any risks in the refactoring that I should be aware of

Focus on patterns like: duplicated logic, overly complex functions, poor naming, magic numbers, missing abstractions, inappropriate coupling.

[PASTE CODE]
### Refactoring Execution
When AI proposes a specific refactoring, ask for the refactored code in stages:
- Agree on the approach before generating code.
- Generate the refactored version.
- Ask AI to highlight every change made and explain why.
- Run existing tests against the refactored code.
- Review for behavior changes that tests might not catch.
The most common refactoring risk: tests pass but behavior changes in ways the tests do not cover. Always review refactored code for behavior changes, not just test results.
## 9. Trust Calibration Imperative
The trust calibration requirements for AI-assisted software development are more severe than in almost any other professional domain. The failures are invisible (incorrect behavior is not apparent from reading code that runs) and the consequences of errors can be serious (security vulnerabilities, data corruption, production outages).
### Security Review Is Non-Negotiable
AI-generated code should receive security review before being merged, regardless of whether AI was asked to review it during generation. The reasons:
AI does not know your threat model. Security depends on context — who the adversaries are, what data is valuable, what the consequences of compromise are. AI reviews code in isolation, not in the context of your specific security requirements.
AI has training data biases toward common patterns. Unusual or novel attack vectors that are not well-represented in training data may be missed.
Security-relevant details often live in adjacent code. A function may be secure in isolation but vulnerable when combined with how it is called from elsewhere in the codebase.
The practical rule: security review of AI-generated code should always include a human who understands the security requirements and the system context.
### Testing Everything Before Merging
Never merge AI-generated code without running the full test suite. Never merge code that changes untested behavior without adding tests. The testing requirement applies equally to AI-generated code and human-generated code — but AI-generated code deserves particular attention because:
- AI-generated tests may have been generated by the same model that generated the code, creating correlated blind spots
- AI-generated code may work correctly for all tested cases while failing for untested cases that would be obvious to a human reviewer
### Dependency Verification
When AI generates code that introduces new dependencies, verify:
- The package name is correct and is the intended package (package squatting is a real attack vector)
- The version specified is current and has no known security vulnerabilities (for Python packages, check the PyPA Advisory Database, e.g. with `pip-audit`)
- The license is compatible with your project's requirements
- The package is actively maintained
### Never Commit AI Code Without Review
The standard for committing AI-generated code is identical to the standard for committing human-generated code: the committer understands the code and is responsible for it. "AI wrote it" is not a defense for code that causes a production incident.
A practical standard: before committing any AI-generated code, you should be able to explain every line to a colleague without hesitation.
## 10. Raj's Full Development Workflow: A Day in the Life
🎭 Scenario Walkthrough: Raj's Day
It is a Tuesday. Raj is building a new feature: idempotent transaction processing — the ability to safely retry failed payment transactions without double-charging customers. He has three subtasks: design the idempotency mechanism, implement the endpoint with idempotency key support, and write tests.
8:30 AM — Architecture Discussion
Raj begins with an architecture conversation, not implementation. He pastes the relevant existing code (payment endpoint, database schema, relevant models) and describes the requirement.
"I need to add idempotency key support to our payment transaction API. The requirement is: clients can include an Idempotency-Key header; if we receive two requests with the same key within 24 hours, we should return the same response as the first request without processing the second. I want to discuss approach before writing any code."
AI presents three approaches: a dedicated idempotency_keys database table, a Redis cache with TTL, and a hybrid approach. Raj knows their Redis instance has had reliability issues during traffic spikes — something AI cannot know. He eliminates the Redis-only approach and discusses the trade-offs between the other two. He chooses the database table approach with Redis as an acceleration layer.
This discussion takes thirty minutes. No code exists yet.
9:00 AM — Implementation, Explain Then Generate
Raj asks AI to describe the exact implementation plan before generating code:
"Describe the implementation plan for the database table approach. Include: the schema for the idempotency_keys table, the middleware or decorator approach for intercepting requests, the exact lookup and storage logic, and how to handle concurrent requests with the same key. Do not write code yet."
He reviews the proposed plan. One issue: AI proposed using a unique constraint on the key to handle concurrent requests, but did not account for the 24-hour expiry. He raises this. AI proposes adding a created_at column with a composite unique constraint scoped to active keys. They agree on the approach.
Now he asks for code, stage by stage: 1. The SQLAlchemy model for the idempotency_keys table 2. The database migration 3. The middleware class 4. The updated payment endpoint
Each stage is reviewed before proceeding to the next. In stage 3, the middleware class, Raj catches an issue: the AI implementation stores the raw response body in the database as a text field without specifying encoding, which could cause issues with non-ASCII response bodies. He specifies UTF-8 encoding explicitly.
11:00 AM — Security Review
Raj runs the completed implementation through a security review prompt:
"Security review this idempotency implementation. I'm specifically concerned about: (1) the idempotency key could contain arbitrary user input — is there a risk of injection or key collision attacks? (2) could an attacker use this to enumerate whether specific transactions occurred? (3) is there a timing attack risk in the key lookup?"
The security review surfaces one issue he had not considered: if idempotency keys are predictable (e.g., client generates them as UUID v1 with time-based components), an attacker who can observe key patterns could potentially predict future keys. He adds a note to the implementation guide recommending UUID v4 keys and adds validation that keys conform to UUID format.
1:00 PM — Test Generation
Raj generates tests in two rounds. First, a comprehensive unit test prompt for the middleware class. Then, specifically focused edge case generation:
"What edge cases for the idempotency implementation have I not tested yet? Think specifically about: concurrent requests with the same key, keys that arrive at exactly the 24-hour expiry boundary, keys with unusual characters, and race conditions in the key storage step."
The edge case analysis surfaces one critical missed test: what happens when two requests with the same key arrive simultaneously before either has been processed? His current implementation uses a database unique constraint which will cause the second request to raise a database exception — which he needs to catch and convert to a cache-hit response, not a 500 error. He adds this test and the corresponding error handling.
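The pattern Raj ends up with — reserve the key first, then convert the unique-constraint violation into a replay of the stored response — can be sketched with SQLite standing in for his PostgreSQL setup (table and column names mirror the chapter's `idempotency_keys` design; the 24-hour expiry is omitted):

```python
import json
import sqlite3

# SQLite stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency_keys ("
    "  key TEXT PRIMARY KEY,"
    "  response TEXT,"
    "  created_at REAL)"
)


def process_payment(key: str, charge) -> dict:
    """Execute `charge()` at most once per idempotency key.

    Reserving the key BEFORE processing means a concurrent duplicate
    hits the unique constraint instead of double-charging; the violation
    is caught and converted into a replay of the stored response,
    not a 500 error.
    """
    try:
        conn.execute(
            "INSERT INTO idempotency_keys (key, created_at)"
            " VALUES (?, strftime('%s', 'now'))",
            (key,),
        )
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT response FROM idempotency_keys WHERE key = ?", (key,)
        ).fetchone()
        # If the first request is still in flight, response is NULL.
        return json.loads(row[0]) if row[0] else {"status": "in_progress"}
    response = charge()
    conn.execute(
        "UPDATE idempotency_keys SET response = ? WHERE key = ?",
        (json.dumps(response), key),
    )
    return response
```

The essential design choice is ordering: insert-then-process, so the database constraint, not application logic, arbitrates concurrent duplicates.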
2:30 PM — Documentation
Raj generates docstrings for the new module and adds API documentation for the idempotency header. He reviews the generated documentation for accuracy — AI correctly describes the behavior for the happy path but is imprecise about the error behavior for malformed keys. He fixes this.
3:30 PM — Code Review Submission
Raj submits the PR. The implementation took approximately six hours with AI assistance. He estimates the same implementation without AI would have taken two full days — not because the implementation is complex, but because the research (idempotency patterns, database approaches, security considerations) and the boilerplate code generation would have taken much longer.
The code that merges is code Raj understands fully. He can explain every line. He knows the edge cases. He knows the security considerations. The AI was a collaborator; Raj is the author.
## 11. Research Breakdown: Empirical Studies on AI-Assisted Development Productivity
📊 Research Breakdown
The empirical evidence on AI-assisted development productivity is the strongest in any professional domain studied to date.
GitHub Copilot Study (2022): GitHub's internal study found that developers using Copilot completed a specific coding task 55% faster than the control group and were 88% more likely to describe themselves as more productive with the tool. This study used a controlled experimental design — the strongest evidence type available.
Replication and generalization (2023-2024): Multiple independent replications and similar studies have found productivity gains in the 30-55% range for code generation tasks. The gains are most consistent for boilerplate and scaffolding code; less consistent for novel algorithmic problems.
Quality effects are mixed: The GitHub study found no statistically significant quality difference in task completion between Copilot-assisted and unassisted developers. Other studies find that AI-assisted code has higher rates of specific bug categories — particularly security vulnerabilities — when code is not reviewed carefully. The implication is that AI assistance is quality-neutral only when paired with appropriate review.
The security finding: A 2023 Stanford study found that developers who received AI coding assistance were more likely to introduce security vulnerabilities than developers working without assistance. The researchers attributed this to over-trust: developers accepted AI-generated security-relevant code without adequate scrutiny because the code looked correct. This finding directly motivates the security review requirements in this chapter.
Expert vs. novice differences: Studies consistently find larger productivity gains for junior developers than senior developers on routine tasks, but larger quality benefits for senior developers who use AI as a review and discussion tool rather than a code generation tool.
The debugging finding: A 2024 study of AI-assisted debugging found that developers using AI for hypothesis generation resolved bugs 40% faster on average. The gain was largest for bugs in unfamiliar codebases where AI's broad knowledge of common bug patterns compensated for the developer's unfamiliarity with the specific system.
The practical synthesis: AI development assistance produces consistent productivity gains when paired with appropriate code review. The productivity gains are real and substantial. The security risk from over-trust is also real and requires active management through review practices.
✅ **Best Practice:** Before submitting any AI-generated code for review, run through the "can I explain this?" test: for every non-trivial function, every security-relevant decision, and every place where the code does something subtle — can you explain it to a colleague without referring to the AI conversation? If not, you do not understand the code well enough to commit it.

⚠️ **Common Pitfall:** Asking AI to write tests for code AI just generated, without human oversight of either. When AI generates both the implementation and the tests, both products reflect the same assumptions — including wrong ones. The tests may pass while completely missing important failure cases that a human would have recognized.

💡 **Intuition:** Think of AI as a very fast, very well-read pair programming partner who has read every Stack Overflow answer and GitHub repository but has never shipped software to production and has no understanding of your specific system's history, failure modes, or business requirements. They generate code faster than anyone you have ever worked with. They also have no skin in the game if it breaks. You are the one who has to explain the incident. Act accordingly.