Case Study: Raj's Coding Standards — When Half the Team Used AI and Half Didn't
The Situation
Raj had been a software engineering lead for three years when AI coding assistants began reshaping how his team worked. He'd adopted the tools himself about a year before the events of this case study and had developed a careful, effective workflow: using AI coding assistants for first drafts, boilerplate, and debugging, while maintaining full ownership of architecture decisions, security considerations, and anything touching authentication or data storage.
His team of twelve was more heterogeneous in their adoption. Some had followed Raj's lead — careful use with strong verification habits. Others had adopted aggressively without equivalent care. And about a third had significant reservations and were using AI tools minimally or not at all.
For several months, this felt manageable. Different people worked differently; that was normal. Code review caught problems regardless of how they were introduced.
But over the course of about four months, Raj began noticing something in code review that concerned him: code quality was becoming bimodal. Some PRs were excellent — well-structured, well-tested, well-documented. Others were passing technical review but had subtle problems that were increasingly hard to catch: logic errors in edge cases, inconsistent error handling, security assumptions that looked plausible but were subtly wrong.
The bimodal pattern didn't map cleanly onto experience level. Some of his most experienced developers were producing the problematic code. What it mapped onto, he eventually realized, was a specific kind of AI use: developers who were using AI to produce code they didn't fully understand.
The Diagnostic
Raj didn't come to this conclusion quickly. His first hypothesis was that the problem was rushing — deadline pressure leading to lower-quality submissions. He collected data on PR submission timing and found no correlation with deadline proximity.
His second hypothesis was documentation quality as a proxy for code quality — he hypothesized that well-documented code would tend to be better understood by the developer who wrote it. He asked his team leads to start tracking documentation thoroughness in code reviews. The pattern that emerged wasn't perfect, but it suggested something: the developers producing the most problematic code were also producing documentation that sounded correct at a high level but was thin on specifics.
A conversation with one of his senior developers, Chen, gave him a clearer picture. Chen told him, candidly, that he'd been using AI to write code for some complex data processing functions and that he was "pretty sure" the logic was right but hadn't traced through it line by line. "AI is usually right on this kind of thing," Chen said. "And I verified the inputs and outputs."
Raj pulled the PR for those functions and reviewed them carefully. The inputs and outputs were correct. The internal logic had a subtle error in how it handled null values in certain combinations — the kind of error that would surface in production under specific conditions that didn't occur in the test cases.
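A minimal sketch of this failure mode, assuming a Python data-processing context (the function names, the merge semantics, and the bug itself are illustrative, not taken from Chen's actual PR): the `or` shortcut passes the surface-level input/output checks Chen describes, yet mishandles the case where a null value appears in both sources.

```python
def merge_records(primary, fallback):
    """Merge two dicts of field values, preferring `primary`.

    Subtle bug: `value or ...` treats an explicit None (and 0, "")
    in `primary` as "missing" and silently falls back -- wrong when
    None is a deliberate sentinel set by the caller.
    """
    merged = dict(fallback)
    for key, value in primary.items():
        merged[key] = value or merged.get(key)
    return merged


def merge_records_fixed(primary, fallback):
    """Corrected version: any key present in `primary` wins,
    even when its value is None."""
    merged = dict(fallback)
    merged.update(primary)
    return merged


# Surface-level input/output checks that both versions pass:
assert merge_records({"a": 1}, {"a": 2}) == {"a": 1}
assert merge_records({"a": 1}, {}) == {"a": 1}

# The combination that only surfaces under specific inputs:
assert merge_records({"a": None}, {"a": 2}) == {"a": 2}          # silently wrong
assert merge_records_fixed({"a": None}, {"a": 2}) == {"a": None}  # intended
```

Verifying inputs and outputs on typical data would never exercise the null-in-both-sources combination; only tracing the logic line by line, or writing a test for that exact combination, catches it.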
The issue wasn't that AI had produced bad code. The issue was that Chen had adopted a trust model — "AI is usually right, I'll verify inputs and outputs" — that wasn't calibrated to the risk level of the code he was writing.
The Intervention
Raj's approach was deliberately not punitive. He'd made similar calibration errors himself early in his AI adoption — trusting too broadly before he'd developed the judgment to know what to verify. He didn't want an intervention that would drive AI use underground or create resentment.
His first step was convening a working group: two of his strongest AI-using developers, two developers who were skeptical and used AI minimally, and himself. He framed the mandate explicitly: "We need AI coding standards. I want this group to write them, because I want them to be good — and that means getting perspectives from people who use AI heavily and people who don't."
The composition of the group turned out to be essential. The heavy AI users brought workflow knowledge and practical insight into where the risks actually arose. The skeptics articulated what could go wrong and what manual-coding discipline the standards should preserve. The resulting standards document was more practical and more trusted than anything Raj could have produced alone.
The Standards Document
The working group produced a six-section standards document. The core elements:
Approved Tools The team moved to a company-managed AI coding assistant with enterprise data handling (code was not sent out for external model training). Developers using personal AI tool subscriptions for company code needed explicit approval.
The Explainability Requirement This was the most debated and ultimately most important standard: A developer submitting code must be able to explain, in a code review, what every function and significant code block does and why it was implemented that way. If they can't explain it, they shouldn't submit it.
This standard wasn't about AI use — it was about code ownership. It applied equally to code written with AI assistance and code pulled from Stack Overflow or any other source. The principle: you own what you submit.
The skeptics on the working group pushed hard for this requirement. Interestingly, the heavy AI users were also supportive: they already operated this way and had been frustrated to realize that some colleagues didn't.
Security Review Requirements AI-generated code handling authentication, session management, data storage, encryption, external API calls, or user input validation required a security review from a designated team member. This wasn't new — these areas already had informal review expectations — but formalizing the requirement as applicable to all AI-generated code in these domains gave reviewers clear grounds to request it.
Testing Requirements AI-generated code required the same test coverage as manually written code. This rule closed a gap Raj had identified: some developers were submitting AI-generated code with minimal testing, implicitly trusting that the AI had gotten the logic right. The standard made clear that testing requirements don't relax because AI was involved.
Documentation Standards Documentation for AI-generated code had to be genuinely explanatory, not summary-level. Raj's diagnostic had identified thin documentation as a proxy for thin understanding; the standard explicitly required that documentation describe not just what code does but the decisions behind implementation choices.
One additional rule generated substantial debate: documentation could not itself be AI-generated without review. The concern was that AI-generated documentation for AI-generated code would be accurate at a high level but miss the nuances that mattered in practice. The working group eventually landed on a practical test: documentation written with AI assistance was fine if the developer could attest that it accurately described their understanding of the code.
The Co-Pilot Review Flag The final element was a code review practice rather than a standard. Reviewers could flag a PR with a "co-pilot review requested" label — indicating that they wanted to discuss the implementation with the developer before approving, not because the code was wrong but because the reviewer had questions about the implementation logic.
The flag was explicitly not a demerit. Its purpose was to create space for the kind of conversation that would catch the Chen-type situation before it reached production: "Walk me through your error handling for null values here."
Implementation
The standards were shared with the full team as a draft for two weeks before implementation, with an explicit invitation to comment. The process generated thirteen substantive comments and five revisions to the draft.
Raj was deliberate about the rollout communication: "This is about code quality, not about catching anyone out. Every one of us is still figuring out how to use these tools well. These standards are how we do that as a team."
The two AI skeptics on the working group were asked to help communicate the standards to their fellow skeptics — to make clear that the standards didn't mandate AI use, didn't disadvantage those who preferred manual coding, and in several ways validated the quality instincts they'd been voicing.
Results Six Weeks Later
Code quality: The bimodal quality pattern improved materially. The proportion of PRs that sailed through review increased; the proportion that required significant back-and-forth decreased. Raj's assessment was that the co-pilot review flag was doing meaningful work: it surfaced conversations that previously happened only after problems were caught, and moved them earlier in the process.
Trust: Perhaps more importantly, the conversation about AI in the development process was now explicit and productive. The standards had named the problem — AI use without adequate understanding — and given everyone a shared vocabulary for addressing it.
Adoption among skeptics: The three developers who had been most reluctant to use AI tools began, gradually, to engage with specific use cases. The standards had addressed their quality concerns directly, and they no longer felt that AI adoption was something happening around them without guardrails. Two of the three started using AI for specific, limited use cases within two months.
One unintended benefit: The explainability requirement had a side effect Raj hadn't anticipated — it prompted some developers to audit their existing code for areas where they'd used AI and didn't fully understand the implementation. This audit surfaced three issues in production code that were fixed before they could cause problems.
What the Skeptics Got Right
One of the most valuable outcomes of the working group process was giving Raj a clear view of what the AI skeptics on his team had been seeing that the enthusiasts had been missing.
The skeptics weren't wrong that AI-generated code had quality risks. They were wrong that the solution was to avoid AI use — but they were right that ungoverned AI use in code was creating real problems that code review wasn't reliably catching.
The framing that finally landed for Raj: the skeptics applied the same discipline to AI-generated code that they applied to all their code, and the enthusiasts sometimes didn't. The answer wasn't to abandon AI tools; it was to bring to AI-assisted code the discipline good developers already applied to every line they wrote.
The Broader Lesson
The lesson from Raj's case isn't specific to code — it applies to any domain where AI can produce output that looks correct at a surface level but has subtle errors detectable only by someone who fully understands the domain.
In those domains — which include legal analysis, financial modeling, scientific reasoning, and many others — the right governance response is not to restrict AI use but to establish clear standards for what understanding and verification AI assistance requires. The tool is powerful precisely because it can produce sophisticated-looking output quickly. The skill is knowing when that sophistication is real and when it's surface-level.
Raj's standards gave his team a shared framework for making that judgment call. And the working group process that produced the standards created the buy-in that made them stick.
The measurement framework Raj used to track the impact of these standards — including how he quantified the quality improvement — is covered in Chapter 39's scenario walkthroughs.