> "Programs must be written for people to read, and only incidentally for machines to execute."
Prerequisites
- 24
- 25
- 21
- 19
- Familiarity with what a code review, a pull request, a bug tracker, and a version-controlled repository are (you will not write code in this chapter)
Learning Objectives
- Rewrite a harsh, vague code-review comment into one that is specific, kind, and critiques the code rather than the author (apply).
- Write a pull-request description that answers what changed, why, and how a reviewer can test it (apply).
- Turn a useless bug report into one a developer can act on—reproduction steps, expected vs. actual, and environment (apply).
- Write a user story with testable acceptance criteria, and explain why acceptance criteria are the contract for 'done' (apply).
- Evaluate an incident write-up for blamelessness and judge whether it produces system fixes rather than individual blame (evaluate).
In This Chapter
- Chapter Overview
- 34.1 Code Review: Critiquing the Code, Not the Coder
- 34.2 Pull-Request Descriptions: What, Why, and How to Test
- Notes
- Expected
- Actual
- Environment
- Notes
- 34.5 Design Docs and Technical Specifications: Thinking on Paper Before Building
- 34.6 Postmortems and Incident Reviews: Debugging the System, Not the Human
- 34.7 Release Notes and the Software-Engineering Writing Ecosystem
- 34.8 Common Mistakes & Practical Considerations
- Frequently Asked Questions
- Chapter Summary
- Spaced Review
- What's Next
Chapter 34: Writing for Computer Science: From Code Reviews to Architecture Decision Records
"Programs must be written for people to read, and only incidentally for machines to execute." — Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs
Chapter Overview
Raj Patel's chronoparse is healthy now. The README has a quick-start, the API reference documents its errors, and an Architecture Decision Record explains why the grammar engine uses parser combinators instead of regular expressions (Chapter 25). The project has users, and—because it has users—it now has collaborators. And the moment a second person touches a codebase, a new and larger category of writing appears: not the documentation that explains the code, but the writing that surrounds the work of changing it. A contributor opens a pull request; Raj reviews it and leaves comments. A user files a bug; someone has to make it reproducible enough to fix. A feature gets proposed; someone writes it up as a story with acceptance criteria so everyone agrees what "done" means. And one Friday, chronoparse's new date-parsing API returns the wrong day for half an hour in production, and the team has to write a postmortem that makes the system safer instead of making someone feel small.
This is the writing that fills a software engineer's day, and almost none of it is code. A study of how developers actually spend their time would find a startling fraction of it on prose: review comments, PR descriptions, bug reports, design docs, incident write-ups, status updates, and the endless small acts of explaining a change to another human. It rarely gets taught—you learn it by absorbing your team's habits, good or bad—and it is, quietly, what separates an engineer people want to work with from one whose technical brilliance is undermined by review comments that wound, bug reports nobody can act on, and postmortems that hunt for someone to blame. This chapter is theme 2 (audience is everything) aimed at the specific audiences of software work: a teammate reading your critique of their code, a maintainer trying to reproduce your bug, a future engineer reading why an incident happened. It builds directly on Chapter 24 (comment the why), Chapter 25 (docs are a contract; document the failures), and Chapter 21 (blameless incident reports with owners and dates)—the same disciplines, now applied to the social act of building software together.
Here is the threshold this chapter asks you to cross, and it is a single idea wearing two faces: in a code review you are critiquing the code, not the coder; in a postmortem you are debugging the system, not the human. Both are the same move—separating the artifact or the system from the person—and both are hard precisely because the writing feels personal when it isn't supposed to be. Cross this line and a harsh comment ("this is wrong, did you even test it?") becomes a specific one ("this returns None when items is empty—line 14—was that intended? a guard would fix it"), and a blame-seeking incident report ("Priya pushed a bad config") becomes a blameless one ("the deploy pipeline let an unreviewed config reach production"). The point isn't politeness for its own sake; it's that only the code-and-system framing produces the outcome you actually want—better code and fewer outages—while the person framing produces defensiveness, hidden mistakes, and engineers who stop telling you the truth. By the end of this chapter you will be able to rewrite a brutal code-review comment into one that's specific and kind, write a PR description and a bug report a stranger can act on without asking you a question, and write a blameless postmortem that makes the system safer—and you'll watch Raj's team do all of it on chronoparse.
In this chapter, you will learn to:
- Write code-review comments that are specific, kind, and aimed at the code—distinguishing blocking objections from optional nits.
- Write a pull-request description that tells a reviewer what changed, why, and how to test it.
- Write a bug report a developer can act on: reproduction steps, expected vs. actual behavior, and environment—and recognize why a minimal reproducible example is the most valuable thing you can attach.
- Write a user story with testable acceptance criteria, so "done" is a contract rather than an argument.
- Write a blameless postmortem and design doc that make a team smarter instead of assigning fault.
📗 Software/CS track: This is one of your core field chapters, and it assumes Chapter 24 (Code Documentation) and Chapter 25 (README/API docs). Where those chapters covered the writing attached to code (comments, docstrings, READMEs, ADRs), this one covers the writing around the work: reviews, PRs, bug reports, stories, design docs, and postmortems. It pairs with Chapter 21 (Workplace Reports), whose incident report is the direct ancestor of §34.6's postmortem, and Chapter 33 (Writing for Engineering), whose requirements language (SHALL/SHOULD) is the cousin of §34.4's acceptance criteria. Prioritize §34.1–§34.2 (code review), §34.3 (bug reports), and §34.6 (postmortems)—they're the writing you'll do most and the writing most often done badly.
34.1 Code Review: Critiquing the Code, Not the Coder
Open any shared codebase and you'll find the most frequent writing a software engineer ever does: the code-review comment. Before a change merges, a teammate reads it and writes reactions—questions, objections, suggestions, approvals—line by line. Done well, code review is one of the best things software engineering has invented: it catches bugs, spreads knowledge, and keeps a codebase coherent. Done badly, it's where teams quietly poison themselves, because a review comment is read by a human who wrote the code you're critiquing, and that human will remember how it felt long after they've forgotten what it said.
Start with the failure, because it's common and it's corrosive. Here is a real-feeling review comment of the kind that gets left on pull requests every day:
❌ Before — a harsh review comment (composite, but you've seen it): "This is wrong. Why would you parse the string twice? Did you even test this? This whole approach doesn't make sense—you should know better than to do it this way."
Read it as the author will. It makes four moves, and all four are mistakes. It's vague ("this is wrong," "doesn't make sense")—the author can't tell what to change. It's about the person ("did you even test this," "you should know better")—it attacks the coder, not the code. It's a series of questions used as weapons ("why would you parse the string twice?")—not real questions but accusations dressed as inquiry. And it offers no path forward—it tears down without pointing anywhere. The author of this code, even if they were wrong, now feels attacked, gets defensive, and learns nothing except to fear this reviewer. Worse, if there was a real bug in there, it's buried under the hostility—the one useful signal lost in the noise of the put-down.
Now the same concern—the same actual technical objection—rewritten:
✅ After — specific, kind, aimed at the code: "I think this parses the input twice: once on line 12 and again on line 18. If that's right, we could parse once and reuse the result, which would also avoid the edge case where the two parses could disagree on an ambiguous string. Was the double-parse intentional, or an artifact of the refactor? Happy to be wrong here."
Why it's better: It's specific—it names the exact lines (12 and 18) and the exact concern (parsing twice), so the author knows precisely what to look at. It's about the code, not the coder—"this parses the input twice," not "you don't understand parsing." It offers a path forward—"parse once and reuse"—so the comment is a route, not just a wall. It asks a genuine question ("was the double-parse intentional?") that leaves room for the author to know something the reviewer doesn't, because sometimes the thing that looks like a mistake is load-bearing. And "happy to be wrong here" signals that the reviewer holds their objection loosely—this is a conversation between peers, not a verdict from on high. The technical content is identical to the harsh version. What changed is everything about how it will be received and acted on.
🚪 Threshold Concept: You are critiquing the code, not the coder.
How code review feels before you cross this line: the code is an extension of the person who wrote it, so a flaw in the code is a flaw in them, and pointing it out is an act of judgment about their competence. Reviewers in this frame write comments that evaluate the author ("you should know better," "this is sloppy"); authors in this frame receive every comment as a performance review and defend their code as if defending themselves. Review becomes a status contest, and the work suffers, because people stop submitting risky-but-valuable changes and stop telling each other the truth.
What code review actually is: a collaborative act of making the artifact better, in which the author and the reviewer are on the same side, both pointed at the code. The code is not the person. A bug in it is not a character flaw; it's a fact about some lines of text that two people can now improve together. Once you genuinely believe this—not as politeness but as a description of what's happening—your comments change automatically. You write "this returns
Noneon empty input" instead of "you forgot the empty case," because the first is true (it's about the code) and the second is a guess about the person that adds nothing and costs trust. You ask real questions because you might be wrong. You separate "this is a bug" from "I'd have done it differently," because only the first is the code's problem.The tell that you've crossed the threshold: you can leave a hard objection on a senior engineer's code and a gentle one on a junior's, and both read as the same kind of thing—a peer looking at the code with you—because the comment is about the code's behavior, which doesn't care who wrote it. The same move recurs at the end of this chapter as the blameless postmortem: there, you debug the system, not the human. Code review and postmortems are the same threshold seen twice.
This isn't an argument for softness. A reviewer who waves through real bugs to avoid friction is failing just as badly as one who's cruel—they've optimized for comfort over the code, which is the same mistake as optimizing for ego, just pointed the other way. The goal is candor without cruelty: say the true, hard thing about the code, clearly, and aim it at the code. A few moves make that reliable:
- Be specific: name the line, the input, the behavior. "This is confusing" tells the author nothing; "I had to read this loop three times to see that
iis the byte offset, not the character index—a rename tobyte_offsetwould make it obvious" tells them exactly what and exactly how to fix it. Specificity is kindness, because vagueness forces the author to guess what you mean and feel judged while they guess. - Distinguish the blocking objection from the nit. Not every comment carries equal weight, and an author can't tell a must-fix bug from a matter of taste unless you label it. A widely-used convention is to prefix optional comments with
nit:(a nitpick—take it or leave it) and to state plainly when something blocks the merge. "nit: I'd call thisparsedrather thanp" is clearly optional; "This is a blocking issue: this throws on empty input, which the tests don't cover" is clearly not. Mixing the two—burying a real bug among five style preferences, all in the same flat tone—wastes the author's attention on the trivial and risks the serious slipping through (the same attention-allocation problem as the exception-based reporting of Chapter 21). - Ask real questions, and mean them. "Why is this here?" can be an attack or an honest inquiry; make it honest. Often the thing that looks wrong is right for a reason you can't see from the diff, and a genuine question ("is this guarding against the case where
baseis null?") lets the author teach you, which is half of what review is for. - Praise what's good, not just what's wrong. Review isn't only a defect hunt. "This refactor makes the grammar module much easier to follow—thanks for cleaning up the token names while you were in here" costs one sentence and tells the author what to do more of. A review that only ever finds fault trains people to fear your name in the reviewer list.
- Offer the fix, or offer to pair. "This is wrong" is a dead end; "this is wrong; I think
parse_once()would handle it—want to pair on it?" is a door. You don't have to write their code for them, but pointing at the solution (or offering to find it together) turns criticism into collaboration.
🔄 Check Your Understanding A reviewer leaves this comment on a junior engineer's pull request: "No. This is not how we do error handling. Look at literally any other file." The code does, in fact, swallow an exception that should propagate. Identify three things wrong with the comment as writing, and rewrite it to be specific, kind, and aimed at the code.
Answer
Three problems: (1) Vague—"this is not how we do error handling" never says what's wrong (the swallowed exception) or where, so the author has to guess; "look at literally any other file" outsources the reviewer's job to the author and is condescending. (2) About the person—the dismissive "No." and the sarcastic "literally any other file" are aimed at the coder's competence, not the code's behavior; they'll provoke defensiveness, not a fix. (3) No path forward—it identifies (vaguely) a problem but offers no correct pattern, so even a willing author doesn't know what you want. A rewrite: "Thisexcepton line 22 catches theParseErrorand returnsNone, which hides the failure from the caller—they can't tell a real parse failure from a legitimately empty result. I think we want to let it propagate (or wrap it in ourChronoError) so callers can handle it. We do this intokenizer.py:40if you want a pattern to follow. Blocking, since it'll mask bugs in production." Specific (line 22, the swallowedParseError), about the code (what the code does* to callers), with a path forward (propagate or wrap, plus a concrete example file), and honestly labeled as blocking.
34.2 Pull-Request Descriptions: What, Why, and How to Test
A code review can only be as good as the reviewer's understanding of what they're reviewing, and that understanding comes from the pull-request description—the prose a developer writes to accompany a proposed change. A PR bundles a set of code changes ("here's what I want to merge") with a written explanation, and that explanation is the difference between a reviewer who can engage with the substance and one who has to reverse-engineer your intent from the diff before they can even begin. The diff shows what changed line by line; the description has to supply everything the diff can't: why you changed it and how the reviewer can verify it works. It is "comment the why" (Chapter 24) scaled up from a line to a changeset.
Here's the failure, which is less dramatic than a cruel review comment but just as costly in aggregate:
❌ Before — a PR with no useful description: Title:
fixesDescription: (empty)
A reviewer who opens this knows nothing. What does "fixes" fix? Why? Is this urgent? How would they test it? They now have to read every line of the diff cold, infer the intent from the implementation (which is exactly backwards—intent should explain implementation, not the reverse), guess at how to verify it, and probably ping the author with "what is this?"—the question a description exists to pre-answer. Multiply this by every PR on a busy team and you've taxed every reviewer's time and degraded every review, because a reviewer who doesn't understand the why can only check the how, missing the more important question of whether the change should exist at all.
Now a description that does its job. This is Raj reviewing a real change to chronoparse—the fix for a genuine bug in how "next \<weekday>" resolves:
✅ After — a PR description a reviewer can actually use:
## What
Fixes `parse("next <weekday>")` returning *today* when today is already that
weekday. `parse("next Tuesday")` run on a Tuesday now correctly returns the
*following* Tuesday, not today.
## Why
Reported in #150. The old logic computed "next weekday" as "the next date whose
weekday matches," which includes today when today matches. Users reading "next
Tuesday" on a Tuesday universally mean the Tuesday a week out, not today — the
word "next" implies skipping the current one. This was the single most-reported
behavior bug this quarter.
## How
Added a guard in `grammar/relative.py`: when the matched weekday equals the
reference date's weekday, advance by 7 days. Falls back to the existing logic
otherwise, so all other relative-date parsing is unchanged.
## How to test
```bash
pytest tests/test_relative.py -k weekday
New cases cover: today == target weekday (skips to next week), today != target
(unchanged), and the reference argument interacting with the guard. To check
by hand, set your system clock to a Tuesday and run parse("next Tuesday").
Notes
- No API change; behavior change only. Added to CHANGELOG under "Fixed."
- Does not touch "this Tuesday" / "last Tuesday" — out of scope for #150.
**Why it's better:** It's built around the three questions a reviewer (and the future reader of the project's history) actually has, in order. **What** states the change in one or two sentences a human can grasp before reading a single line of code—the headline, BLUF-style ([Chapter 21](../../part-04-professional-workplace-writing/chapter-21-workplace-reports/index.md)). **Why** gives the reasoning the diff can never show: it links the originating issue (#150), explains the *user-facing* rationale ("next" implies skipping the current one), and even supplies a priority signal ("most-reported behavior bug this quarter") that helps the reviewer decide how urgently to engage. **How** summarizes the approach so the reviewer reads the diff already knowing the shape of it—and crucially, names what *didn't* change ("falls back to existing logic"), which is reassurance a reviewer would otherwise have to verify by hand. **How to test** is the section that separates competent PRs from sloppy ones: it gives the exact command to run the relevant tests, says what the new tests cover, and offers a manual check—so the reviewer can *verify* the change rather than merely *read* it. And **Notes** flags the scope boundary ("does not touch 'this Tuesday'") that pre-empts the reviewer's natural "but what about…?"
The "how to test" section deserves its own emphasis, because it's the one most often skipped and most valuable. A reviewer's deepest question is not "does this code look right?" but "does this change actually do what it claims, without breaking anything else?"—and they can only answer that if you tell them how to check. A PR that says "how to test: run `pytest tests/test_relative.py -k weekday`; I added cases for the today-equals-target case" lets a reviewer confirm the fix in thirty seconds. A PR that says nothing about testing forces the reviewer to either trust you (risky) or invent a test plan themselves (slow). This is [Chapter 25](../../part-05-software-data-writing/chapter-25-readme-api-docs/index.md)'s "every example must be runnable" applied to the verification of a change: give the reader the exact command, and make sure it works.
> 🔍 **Why Does This Work?**
> A PR description and the code diff contain overlapping information—both, in a sense, describe the change. So why isn't the diff enough? Why does a good description make review faster *and* better, rather than just repeating what the reviewer can read for themselves? Think before reading on.
>
> Because the diff and the description answer *different questions*, and the reviewer needs both in a specific order. The diff is the *what*: the exact lines that changed, the ground truth of the implementation. But a reviewer can't evaluate an implementation without first knowing its *intent*—you can't tell whether code is correct until you know what it's *supposed* to do, and the diff can't tell you that. (A function that returns `None` on empty input is correct or buggy entirely depending on what the caller is promised, and the diff is silent on the promise.) The description supplies the intent *before* the reviewer reads the implementation, so they read the diff as *evidence for or against a claim* ("does this code actually skip to next week when today matches?") rather than as a puzzle to decode. That's faster (no reverse-engineering) and better (the reviewer can now catch the deepest class of bug—code that works but does the wrong thing—which is invisible if you only check the *how* against itself). The "how to test" section adds a third thing the diff lacks entirely: a way to *verify* rather than merely *inspect*. Reading code can confirm it *looks* right; running the test confirms it *is* right. A description turns review from "read and hope" into "understand the claim, inspect the evidence, run the check."
A few practical rules for PR descriptions:
- **The title is the headline; make it informative.** "fixes" tells no one anything; "Fix `parse('next Tuesday')` returning today when today is Tuesday (#150)" tells a teammate scanning a list of PRs exactly what this one is. The PR title also usually becomes the merged commit message that lives in the project's history forever—it's read long after the review is done, by people debugging six months from now (the future reader of [Chapter 24](../../part-05-software-data-writing/chapter-24-code-documentation/index.md)).
- **Link the issue; don't restate it.** If the PR addresses a tracked bug or feature, link it ("#150") rather than re-explaining the whole context. The link gives the reviewer the full history if they want it without bloating the description.
- **Match the description to the size of the change.** A one-line typo fix needs a one-line description ("Fix typo in error message"). A 400-line refactor that changes the grammar engine needs the full what/why/how/test treatment, possibly with a design doc linked. The effort is proportional to the stakes—the same calibration as everywhere in this book.
- **Keep PRs small.** This is a writing-adjacent point with outsized payoff: a small PR is easier to *describe*, easier to *review*, and easier to *test* than a giant one. If you can't summarize a PR's *what* in two sentences, it's probably doing too many things and should be split—which makes both the writing and the review tractable.
> ✏️ **Try This**
> Find the last pull request you opened (or open a recent one in any public project). Read its description as a stranger would. Can you answer, from the description alone: *what* changed, *why*, and *how to verify it*? If any of the three is missing, you've found the gap that made your reviewer's job harder. Now rewrite the description with the four headings—What, Why, How, How to test—and notice how the act of writing the *why* sometimes reveals that the change is doing more than one thing (theme 1: writing is thinking—articulating the change clarifies it).
---
## 34.3 Bug Reports: Making a Problem a Developer Can Act On
When software misbehaves, someone files a **bug report**, and the gap between a useful one and a useless one is enormous. A useful bug report lets a developer reproduce the problem, understand what should have happened instead, and start fixing—often within minutes. A useless one starts a frustrating exchange of "can you give me more detail?" that can stretch for days and sometimes ends with the bug closed unfixed, not because it wasn't real but because no one could ever pin it down. The bug report is a small genre with a high skill ceiling, and the skill is almost entirely about *anticipating what the developer needs to act*—theme 2, audience, applied to the person who will fix your problem.
Here is the bug report developers dread:
> **❌ Before — a useless bug report:**
> **Title:** *doesn't work*
> **Body:** *"I tried to use chronoparse and it gave me the wrong date. This is broken. Please fix ASAP."*
A developer reading this knows essentially nothing actionable. *What* did you try to parse? What date did it give you? What did you *expect*? Which version, on what platform? "The wrong date" could mean a thousand different inputs producing a thousand different wrong outputs, and "broken" plus "ASAP" adds urgency without information—pressure to fix a problem the developer can't even *see*. The maintainer's only possible reply is a list of questions, and now the bug is a multi-day negotiation before anyone has touched a line of code. The report failed because it described the reporter's *feeling* (frustration) instead of the *facts* a fixer needs.
Now the same bug, reported well:
> **✅ After — a bug report a developer can act on:**
```markdown
## Summary
`parse("in 2 weeks", reference="2024-02-15")` returns a date one day early.
## Steps to reproduce
```python
from chronoparse import parse
parse("in 2 weeks", reference="2024-02-15")
Expected
datetime(2024, 2, 29, 0, 0) — 2024 is a leap year, so two weeks after
Feb 15 is Feb 29.
Actual
datetime(2024, 2, 28, 0, 0) — off by one day. Looks like the offset
calculation skips Feb 29.
Environment
- chronoparse 2.1.0
- Python 3.11.4
- macOS 14.2
Notes
Non-leap-year inputs ("in 2 weeks" from "2023-02-15") return the correct date, so this is specific to crossing Feb 29.
**Why it's better:** Every piece is something the developer needs to act. The **summary** states the bug in one line—a developer scanning the issue tracker knows instantly what this is. The **steps to reproduce** are the single most valuable thing in any bug report: a *runnable* snippet that triggers the bug, so the developer can see it on their own machine in seconds instead of guessing what the reporter did. **Expected vs. actual** is the heart of any bug report—it states what *should* have happened and what *did*, because a bug is precisely a gap between expectation and behavior, and naming both halves turns "it's wrong" into a specific, checkable claim. The **environment** (version, language runtime, OS) matters because bugs are often specific to a version or platform, and "works for me" is frequently an environment mismatch the developer can only spot if you say what yours is. And the **notes**—"non-leap-year inputs work, so it's specific to Feb 29"—are gold: the reporter has done some of the diagnostic work, narrowing the search before the developer even starts. This report could be fixed in the time the useless one would take just to clarify.
> 🧩 **Productive Struggle**
> Before reading on: a teammate says, "I don't have time to write a fancy bug report—I'll just describe what happened and the developer can investigate. That's their job, not mine." They're about to file: *"The export feature crashes sometimes when I have a lot of data."* What, specifically, will go wrong with this report, and what are the *three* highest-value pieces of information they could add in two extra minutes that would most reduce the total time-to-fix? Think before reading.
>
> <details><summary>One good answer</summary>**What goes wrong:** "sometimes" and "a lot of data" are unreproducible. A developer can't fix what they can't see, so the first thing that happens is *not* investigation—it's a round-trip of questions ("how much data? what's the exact error? can you give me a file that does it?"), which can take days and often stalls. The reporter's "that's their job" is half-right (fixing is the developer's job) and half-wrong: *reproducing* is a shared job, and the reporter is the only person who currently *has* the failing case. **The three highest-value additions:** (1) **A reproduction case**—ideally the actual data file (or a minimal version of it) that triggers the crash, or the exact steps. This is the single biggest lever; a developer who can reproduce can usually fix. (2) **The exact error**—the full error message and stack trace, not "it crashes." The trace often points straight at the failing line. (3) **The boundary**—"it works with 1,000 rows but crashes at ~50,000," which turns "sometimes" into a specific, testable threshold and hints at the cause (a memory or size limit). Two minutes of this converts a multi-day negotiation into a same-day fix. The deeper lesson, again: the report's job is not to *express the problem* but to make it *reproducible and actionable*—the audience is a developer who needs to act, not a sympathizer who needs to hear you're frustrated.</details>
The most powerful idea in bug reporting deserves its own name: the **minimal reproducible example** (often "MRE," "repro," or "MCVE"). It's the smallest, self-contained piece of code or the fewest steps that reliably triggers the bug—nothing extra. Raj's leap-year report has one: three lines that anyone can run. A minimal repro is valuable for two reasons. First, it lets the developer *see* the bug immediately, which is most of the battle. Second—and this is the part reporters underestimate—the act of *minimizing* often reveals the cause. When you strip your 500-line program down to the 3 lines that still fail, you frequently discover the bug was in your code, not the library's, or you narrow it to one specific input that points straight at the fix. Building the minimal repro is debugging; it serves the reporter, not just the developer. (This is theme 1 again: the writing—here, the act of reducing the problem to its essence—*is* the thinking that locates the bug.)
A short checklist for a bug report that gets fixed:
- **A one-line summary** that names the specific misbehavior, not "doesn't work."
- **Steps to reproduce** — exact, runnable, minimal. A code snippet or numbered steps a stranger can follow on a clean setup ([Chapter 22](../../part-04-professional-workplace-writing/chapter-22-instructions-procedures/index.md)'s tested-instructions discipline, and [Chapter 25](../../part-05-software-data-writing/chapter-25-readme-api-docs/index.md)'s clean-machine standard, applied to a defect).
- **Expected vs. actual** — both halves, stated as concretely as possible (the exact wrong output, not "the wrong result").
- **Environment** — version of the software, language/runtime version, OS; whatever could plausibly matter.
- **Evidence** — the full error message and stack trace (not a paraphrase), a screenshot for UI bugs, logs if relevant.
- **What you've already ruled out** — any narrowing you've done ("only happens on leap years," "not present in 2.0.x"). Optional, but it can save the developer hours.
> 🔄 **Check Your Understanding**
> Two bug reports describe the same defect. Report A: *"Login is broken, fix it."* Report B: includes a summary, three numbered steps to reproduce on a fresh account, the expected behavior ("redirect to dashboard"), the actual behavior ("500 error, screenshot attached"), the browser and version, and the full error from the console. Beyond "B is more polite," explain *mechanically* why B will be fixed faster—what does each part of B let the developer *do* that A doesn't?
> <details><summary>Answer</summary>It's not about politeness at all; it's about what each part *enables the developer to do*. **The steps to reproduce** let the developer *see the bug themselves* on their own machine—the prerequisite to fixing anything—instead of starting a round-trip of "what did you do exactly?" that A forces. **Expected vs. actual** turns a vague "broken" into a precise, checkable claim (should redirect; instead 500s), so the developer knows what success looks like and when they've achieved it. **The actual error and stack trace** ("500 error," console output) often point *directly* at the failing line, collapsing the diagnosis. **The environment** (browser + version) lets the developer reproduce in the same conditions, since the bug may be browser-specific—and rules out "works for me" confusion. A is a *feeling* ("broken") plus *pressure* ("fix it") with zero actionable content, so the developer's first move is to ask the questions B already answered; B is *reproducible* and *diagnosable*, so the developer's first move is to fix it. B compresses a multi-day clarification loop into a single actionable artifact.</details>
> [📍 Good stopping point — you've covered the three writing tasks of everyday code work: reviewing a change, describing your own change, and reporting a defect. The rest of the chapter moves up a level to the writing that shapes work *before* it's built (stories, design docs) and *learns from it after* (postmortems, release notes).]
---
## 34.4 User Stories and Acceptance Criteria: Defining "Done" Before You Build
Before code gets written, someone has to say what it should do—and the gap between a vague request and a buildable one is where an enormous amount of software waste lives. "Add search to the app" is not a specification; it's a wish. Does search cover titles, or full text? Is it case-sensitive? What happens with no results? How fast must it be? Build from the wish and you'll build the wrong thing, ship it, and discover the mismatch in a demo—the most expensive place to discover it. The **user story** and its **acceptance criteria** exist to close that gap *before* the build, by writing down what "done" means in terms specific enough to test.
A user story is a short, structured statement of a need from the user's point of view. The common template is deliberately simple:
```text
As a [type of user], I want [some goal] so that [some reason].
The template's value is that it forces three things the vague request omits: who wants this (the user type), what they want (the goal), and why (the reason—which is the part that lets the team make good trade-offs, because it reveals what actually matters). "Add search" becomes "As a support agent, I want to search past tickets by customer email so that I can find a customer's history without scrolling." Now you know the user (support agent), the real goal (find history fast), and the why (avoid scrolling)—which already tells you that searching by email matters more than, say, fuzzy full-text search.
But the story alone still isn't buildable, because "search past tickets by email" still has a dozen unspecified behaviors. That's what acceptance criteria are for: the specific, testable conditions that must be true for the story to count as done. They are the contract. Here's the difference between a story with no real criteria and one with them:
❌ Before — a story with vague or no acceptance criteria: As a support agent, I want to search tickets by customer email so I can find their history. Acceptance criteria: "Search works well and returns the right results quickly."
"Works well," "right results," "quickly"—none of these is testable. Who decides what "well" means? Is "quickly" 100 milliseconds or 5 seconds? What are the "right results" for an email that matches nothing, or matches partially, or is typed with a trailing space? A developer building against these criteria is guessing, and a tester can't verify "done" because there's nothing concrete to check. This is the same failure as Chapter 33's "the system should be fast"—an unfalsifiable requirement is not a requirement, it's a hope.
✅ After — testable acceptance criteria:
**Story:** As a support agent, I want to search tickets by customer email
so I can find a customer's history without scrolling.
**Acceptance criteria:**
- Given a valid email that matches existing tickets, when the agent searches,
then all tickets for that email are listed, newest first.
- Given an email that matches no tickets, when the agent searches, then an
empty state reading "No tickets found for this email" is shown (not an error).
- Given a partial email ("jane@"), when the agent searches, then matching
emails are suggested (autocomplete), but the search itself requires a full email.
- Given a search on a dataset of 1M tickets, when the agent searches, then
results return in under 500 ms (95th percentile).
- Search is case-insensitive and ignores leading/trailing whitespace.
Why it's better: Each criterion is a test—a concrete condition someone can check and mark pass or fail. They use the Given-When-Then form (a widely-used pattern: given some context, when an action happens, then an outcome must hold), which forces the writer to specify the starting state, the trigger, and the required result—the three things a vague criterion leaves out. Crucially, they cover the cases the vague version ignored entirely: the empty result (show a friendly empty state, not an error), the partial input (autocomplete, but require a full email to search), the performance bar (under 500 ms at the 95th percentile on a realistic dataset—a number, not "quickly"), and the edge cases (case-insensitivity, whitespace). A developer can build to these without guessing, and a tester can verify "done" objectively. The criteria are where the real design thinking happens—deciding what an empty search should do is a product decision, and writing the criterion forces you to make it on purpose rather than have a developer make it by accident at 2 a.m.
This connects directly to the definition of done—the team-wide agreement on what conditions any piece of work must meet before it's truly finished (tests written and passing, code reviewed, docs updated, deployed to staging). Acceptance criteria are story-specific ("this search must do X"); the definition of done is universal ("everything we ship must be tested and reviewed"). Together they answer the most argued question in software—"is it done?"—with evidence instead of opinion. (Note the kinship with Chapter 33's requirements language: an acceptance criterion is a "SHALL" for a feature, written so a test can confirm it.)
🔄 Check Your Understanding A product manager writes the acceptance criterion: "The checkout process should be intuitive and user-friendly." Why is this not a usable acceptance criterion, and rewrite it as two or three testable Given-When-Then criteria for a hypothetical checkout flow.
Answer
Why it fails: "Intuitive" and "user-friendly" are subjective and untestable—two reasonable people will disagree on whether a given checkout is "intuitive," so a tester can't mark it pass/fail and a developer doesn't know what to build. It states a feeling the criterion should produce, not a behavior the system must exhibit. (Same flaw as "search works well" or Chapter 33's "the system should be fast.") Testable rewrites (examples): "Given a cart with at least one item, when the user clicks Checkout, then they reach the payment step in one click (no intermediate confirmation page)." · "Given a user has previously saved a shipping address, when they reach the shipping step, then that address is pre-filled and selected by default." · "Given a payment fails, when the error is shown, then the user's entered details are preserved (not cleared) and the specific reason is displayed." Each names a starting context, an action, and a checkable outcome—so "intuitive" becomes a set of concrete behaviors a tester can verify and a developer can build.
34.5 Design Docs and Technical Specifications: Thinking on Paper Before Building
Some changes are big enough that you shouldn't start coding until you've written down the plan—and the document that holds that plan is the design doc (sometimes a technical specification, or in many open communities an RFC, "request for comments"). A design doc describes a substantial proposed change before it's built: the problem, the proposed solution, the alternatives considered, the trade-offs, and the impact on the rest of the system. It exists for two reasons, and the first is the more important: writing the design forces you to think it through, surfacing the problems while they're cheap to fix (on paper) instead of expensive (in shipped code). The second is that it lets others critique the plan—catching the flaw you missed, raising the constraint you forgot—before anyone has sunk weeks into building the wrong thing.
This is the book's central thesis (theme 1: writing is thinking) at its most literal and highest-stakes. The senior engineer who writes a design doc and discovers, three paragraphs into "Approach," that their elegant plan doesn't handle a concurrency case has just saved the team a month—because the writing was the design work, and it failed safely on paper. The engineer who skips the doc and starts coding discovers the same flaw in week three, after building on top of it. The doc isn't bureaucratic overhead added to the engineering; for non-trivial work, the doc is the engineering, done in the cheapest possible medium.
A design doc has no single universal template, but the load-bearing sections are remarkably stable across teams:
# Title + author + date + status (draft / under review / accepted)
## Context / Problem ← What problem are we solving? Why now? Why does it matter?
## Goals / Non-goals ← What this design will and explicitly will NOT address
## Proposed solution ← The approach, in enough detail to evaluate (often with a diagram)
## Alternatives considered ← Other options, and why this one won (the most-skipped, most-valuable section)
## Trade-offs / Risks ← What this costs; what could go wrong; how we'll mitigate
## Impact ← On other systems, teams, performance, migration, rollout
## Open questions ← What's still undecided (invites the review to help)
Two of these sections do the heavy lifting and are the two most often shortchanged. Non-goals are as important as goals: explicitly stating what you're not solving ("this design does not address multi-region replication—out of scope") prevents the endless scope-creep of reviewers asking "but what about…?" and keeps the proposal tractable. And Alternatives considered is the section that turns a design doc from an announcement into an argument: by laying out the options you rejected and why, you show the reader you've done the thinking, you let them catch a better option you dismissed too quickly, and—exactly as with an ADR (Chapter 25)—you preserve the reasoning so that six months later, when someone asks "why didn't we just use a message queue?", the answer is written down instead of forgotten. (A design doc and an ADR are close relatives: a design doc proposes and evaluates a substantial change before building; an ADR is the durable, one-page record of a single decision that often emerges from it. The design doc is the deliberation; the ADR is the verdict, kept forever.)
A short before/after on the section engineers most often write badly—the problem statement:
❌ Before — a problem statement that's really a solution statement: "We should migrate the parser to use a streaming architecture with backpressure and a configurable buffer pool."
✅ After — a problem statement that states the problem: "Parsing a large log file (>1 GB) currently loads the entire file into memory, causing out-of-memory crashes on our standard 2 GB workers (see #312, #340 — four crash reports this month). We need to parse files larger than available memory. This proposal evaluates streaming approaches to fix it."
Why it's better: The "before" leaps to a solution (streaming, backpressure, buffer pool) before establishing the problem, which traps the reader into evaluating an answer to a question that was never asked—and shuts down the alternatives the design doc is supposed to weigh. The "after" states the problem concretely (>1 GB files crash 2 GB workers), quantifies it (four crashes this month, with issue links), names the actual requirement (parse files larger than memory), and then signals that solutions will follow. A reader of the "after" can evaluate whether the problem is real and worth solving, and can bring their own solution ideas; a reader of the "before" can only say yes or no to a predetermined answer. State the problem before the solution—the same discipline as a bug report's expected-vs-actual, scaled up to a system change.
🔍 Why Does This Work? Teams under deadline pressure are tempted to skip the design doc and "just start coding"—after all, the doc doesn't ship; the code does. Why is writing the design doc often faster overall than skipping it, even though it delays the first line of code? Think before reading on.
Because the expensive part of building the wrong thing is not the writing you skipped—it's the building, shipping, discovering, and rebuilding. A flaw caught on paper costs an afternoon of rewriting a paragraph; the same flaw caught in week three of implementation costs the three weeks already built on top of it, plus the rework, plus the cost of the team having reasoned about a wrong foundation. The design doc front-loads the cheapest possible error-detection: it forces the author to make the design explicit and complete enough to write down, and the act of writing surfaces the gaps (the unhandled concurrency case, the migration that has no rollback) while they're still just sentences. It also parallelizes the catching: five reviewers reading a two-page doc in an afternoon will collectively spot problems the author would have hit one at a time over weeks of coding. "Just start coding" feels faster because it produces a visible artifact (code) immediately, but it's the planning fallacy in action—it optimizes for the feeling of progress over the total time-to-correct-solution. The doc is the design work; doing it in prose first is doing it where mistakes are cheap. (Theme 1, in its most consequential form: if you can't write the design clearly, you don't understand it yet—and you're about to build something you don't understand.)
A design doc's length should match its stakes: a small feature might need a half-page; a change to a system's core architecture might need several pages and multiple review rounds. The discipline that makes them sustainable is the same as for ADRs—write them when the thinking is happening, keep them findable (in the repo or a team wiki), and resist the urge to make them exhaustive. A design doc is a tool for thinking and deciding, not a contract or a manual; once the decision is made and the code is built, the doc's job is mostly done (the durable record is the ADR and the code's own documentation). Its value was realized at the moment it changed what got built.
34.6 Postmortems and Incident Reviews: Debugging the System, Not the Human
Eventually, something breaks in production. chronoparse's new version ships on a Friday, and within twenty minutes every date it parses is off by an hour—a time-zone bug that the tests didn't catch and that's now corrupting timestamps in a dozen downstream systems. The team scrambles, finds the cause, rolls back, and fixes it. And then comes the writing that determines whether the team gets smarter or just scared: the postmortem (also called an incident review or "retrospective"). It is a written analysis of what went wrong, why, and how to prevent it—and it is the direct descendant of Chapter 21's incident report, with one principle elevated to the center of everything: it must be blameless.
This is the same threshold as code review, seen from the other end. There, you critique the code, not the coder. Here, you debug the system, not the human. A blameless postmortem operates on a single, hard-won premise: people don't cause outages; systems that allow human error to reach production cause outages. The engineer who pushed the bad config didn't fail—the pipeline that let an unreviewed config reach production failed. This isn't sentimentality, and it isn't letting people off the hook. It's the only framing that works, for a reason that's almost mechanical: if your postmortems hunt for someone to blame, your engineers learn to hide mistakes, omit details that make them look bad, and stay quiet during the next incident—which makes every future postmortem less truthful and every future outage more likely. Blame optimizes for the appearance of accountability and destroys the information you need to actually prevent recurrence. Blamelessness trades the satisfaction of finding a culprit for the thing you actually want: a system that won't break the same way twice.
Watch the difference on chronoparse's time-zone incident:
❌ Before — a postmortem that assigns blame: "On Friday, Sam deployed version 2.2.0 without properly testing the time-zone handling, causing a 30-minute outage. Sam should have caught this in testing. Going forward, Sam and the team need to be more careful about testing before deploying. Reminder: always test your code."
Read what this actually accomplishes. It names a person (Sam) and assigns fault ("should have caught this," "be more careful"), which guarantees that Sam—and everyone watching—will be more defensive and less forthcoming next time. Its "corrective action" is an exhortation ("be more careful," "always test your code") rather than a change to the system—and "be more careful" is the single most useless corrective action in engineering, because it asks humans to stop being human while changing nothing about the conditions that let the error through. Six months later the same class of bug ships again, because nothing actually changed except that Sam now dreads deploys. The report optimized for blame and delivered no prevention.
✅ After — a blameless postmortem:
# Postmortem: Incorrect time-zone offsets in chronoparse 2.2.0
**Status:** Resolved · **Severity:** SEV-2 · **Duration:** 32 min
**Author:** Sam · **Date:** 2024-06-14
## Summary
chronoparse 2.2.0 applied the system's local time zone instead of UTC when no
`tz` was specified, shifting every parsed timestamp by the local UTC offset.
For 32 minutes, ~40% of downstream timestamp writes were off by the server's
offset (UTC+1) until we rolled back to 2.1.0.
## Impact
- ~2.1M timestamps written with a +1h offset during the window.
- Three downstream services ingested skewed data; corrected by replay (see #361).
- No data loss; all affected records were identified and re-processed.
## Timeline (UTC)
| Time | Event |
|-------|-------|
| 14:02 | 2.2.0 deployed via the standard pipeline. |
| 14:05 | First alert: downstream timestamp-skew monitor fires. |
| 14:11 | On-call confirms the offset is systematic, not data-specific. |
| 14:19 | Root cause identified: default tz changed from UTC to system-local. |
| 14:34 | Rolled back to 2.1.0; offsets return to correct. |
## Root cause
A refactor in 2.2.0 (#345) changed the default time zone from explicit UTC to
the system local zone when `tz` was unspecified. The behavior change was not
caught because:
- No test asserted the *default* tz behavior — tests passed an explicit `tz`,
so the changed default was never exercised.
- The change was a one-line default in a large refactor PR; the reviewer's
attention was on the parsing logic, and the default change wasn't called out
in the PR description.
- Staging runs in UTC, so the bug was invisible there — it only manifests on
servers with a non-UTC local zone.
## What went well
- The downstream skew monitor caught it in 3 minutes; rollback was clean.
## Action items
| Action | Owner | Due |
|--------|-------|-----|
| Add tests asserting default-tz behavior (no `tz` → UTC) | Sam | 2024-06-17 |
| Run one staging worker in a non-UTC zone to catch tz-local bugs | Priya | 2024-06-21 |
| Add a PR-description checklist item: "Does this change any default?" | Raj | 2024-06-19 |
| Add a deploy guard: block release if tz-default test is absent | Sam | 2024-06-28 |
Why it's better: Notice that Sam is still the author and still named in the timeline and action items—blameless doesn't mean anonymous; it means the analysis targets the system, not the person. The root cause is a chain of system failures (no test for the default, the change not flagged in the PR, staging running only in UTC) rather than a personal one ("Sam didn't test"), and each one yields a concrete, ownable fix: write the missing test, run staging in a non-UTC zone, add a PR-description prompt, add a deploy guard. Every action item has an owner and a date (Chapter 21's discipline—an action without one is a wish), and every action changes the system so the same bug can't recur, rather than asking a human to "be careful." The timeline is a neutral record of facts separate from the root-cause analysis (the Results-vs-Discussion discipline of Chapter 13, so the facts stay trustworthy even where the analysis is debated). And "what went well" is included—a real postmortem captures what to keep (the monitor that caught it fast), not only what to fix. This document makes the team and the system better; the blame version made one person worse.
🔄 Check Your Understanding A postmortem's root-cause section reads: "The engineer forgot to add the database index, so queries were slow." Explain why "the engineer forgot" is not a useful root cause, and rewrite it as a blameless one that would actually prevent recurrence. (Hint: ask "why was it possible for a missing index to reach production?")
Answer
"The engineer forgot" isn't a root cause—it's a stopping point that prevents finding the root cause. Humans forget; that's a constant, not an explanation, and "don't forget" is not a fix you can implement. It also assigns blame, which makes the next engineer hide their next mistake. The useful question is the system one: why was it possible for a missing index to reach production undetected? A blameless root cause traces the chain: e.g., "The schema change added a query pattern with no covering index. This reached production because (1) there is no automated check that flags new query patterns lacking an index, (2) the PR's test data was small enough that the unindexed query was fast in CI, so the slowness was invisible until production scale, and (3) the PR description didn't mention the new query, so the reviewer didn't think to ask about indexing." That yields real, ownable fixes: add a CI check for unindexed query patterns; test against production-scale data; add an index-review prompt to the PR template. None of these asks a human to "remember harder"; each changes the system so a forgotten index can't silently ship. The shift from "who forgot" to "why was forgetting possible" is the entire blameless move—debug the system, not the human.
The disciplines of a good postmortem, gathered:
- Blameless, structurally. Target systems and conditions, not people. The test: could you hand this postmortem to the person at the center of the incident and have them feel informed and supported, not accused? If not, rewrite the analysis toward the system.
- A neutral timeline, separate from the analysis. What happened, in order, by the clock—facts everyone agrees on—kept apart from why it happened (which may be debated). This is the incident-report structure of Chapter 21.
- Root cause means the system cause, traced past the proximate one. "The config was wrong" is proximate; "the pipeline allowed an unreviewed config to ship" is the system cause. Keep asking "why was that possible?" until you reach something you can change.
- Action items with owners and dates, that change the system. Not "be more careful"—a guard, a test, a check, a process change, each owned by a named person with a real due date. An action item without an owner and a date is a wish (Chapter 21).
- Capture what went well. The fast detection, the clean rollback, the good runbook—so the team reinforces what worked, not only patches what didn't.
🪞 Learning Check-In Think about a time something went wrong on a team you were on—a bug shipped, a deadline missed, a deploy broken. How was it handled in writing (if at all)? Was the instinct to find who did it, or to understand what in the system let it happen? Be honest about which felt more natural—because the blame instinct is the default, and blamelessness is a discipline you have to choose against the grain. Here's the reframe to carry out of this chapter: the next time you write up something that went wrong—a postmortem, an incident report, even a Slack message about a broken build—catch yourself at the moment you're about to name a person, and ask instead "what about the system made this possible?" That single substitution, made consistently, is what separates teams that learn from teams that just churn through frightened people. You've now seen the same move twice—critique the code not the coder, debug the system not the human—because it's the most important social skill in all of technical collaboration, and it's a writing skill before it's anything else.
34.7 Release Notes and the Software-Engineering Writing Ecosystem
When a change reaches users, the last piece of writing in the chain is the release notes: the user-facing announcement of what's new, changed, or fixed in a version. Release notes overlap with the changelog (Chapter 25) but tilt toward a broader, sometimes less technical audience and a more communicative tone—a changelog is a precise record for developers tracking every version; release notes are a message to users about why they should care about this one. The skill is the same audience discipline: write what the user needs to know to decide whether and how to adopt the release, led by what affects them most, with the internal churn omitted.
The same before/after that governed the changelog governs release notes, sharpened for the wider audience:
❌ Before — release notes that are really a changelog dump (or worse, marketing fluff): "Version 3.0 is our most powerful release yet, packed with exciting improvements and under-the-hood enhancements that make chronoparse better than ever!"
✅ After — release notes that tell the user what changed and what to do: "chronoparse 3.0 adds async parsing and drops Python 3.7. • New:
parse_async()for parsing large batches without blocking. • Breaking:parse_strict()is removed — passstrict=Truetoparse()instead. • Breaking: Python 3.7 is no longer supported; upgrade to 3.8+. See the full changelog for everything. Upgrading from 2.x? Start with the migration guide."
Why it's better: The "before" is empty—"most powerful," "exciting," "better than ever" are unverifiable adjectives that tell the user nothing actionable (the empty-intensifier problem of Chapter 7, in release-notes form), and a user can't plan an upgrade around enthusiasm. The "after" leads with the one-line summary, surfaces the breaking changes prominently (the entries that cost users time if missed), gives each a migration path ("pass strict=True instead"), and links to the exhaustive changelog and a migration guide for those who need them. It respects that a user reading release notes has one real question—"what does this mean for me, and what do I have to do?"—and answers it.
These genres don't exist in isolation; they form an interlocking software-engineering writing ecosystem, and seeing how the pieces connect is part of the skill. A feature begins as a user story with acceptance criteria (what to build, and what "done" means); a substantial one gets a design doc (how to build it, evaluated before building) and, once decided, an ADR (Chapter 25—the durable record of the key decision). The work is proposed in a pull request with a description (what/why/how to test) and improved through code-review comments (specific, kind, aimed at the code). The behavior is captured in code comments and docstrings (Chapter 24—the why) and README/API docs (Chapter 25—the contract). When it ships, release notes and a changelog tell users. When something goes wrong, a bug report (reproducible, expected-vs-actual) starts the fix, and if it was an outage, a blameless postmortem turns the failure into system improvements. Every one of these is writing, every one has a specific audience (a reviewer, a user, a future maintainer, a frightened on-call engineer), and every one is governed by the same handful of principles this book has taught from the start: know your reader, lead with what they need, be specific, separate fact from interpretation, and aim your words at the work rather than the person. The technical-writing skills are not different in computer science; they're the same skills, wearing the field's particular costumes.
📐 Project Checkpoint: A Developer-Facing Writing Sample (extending Piece 3 / the portfolio)
Across Chapters 22 and 25 you built Portfolio Piece 3—user documentation: a tested procedure (Chapter 22) and a README with a runnable quick-start, plus optionally an API reference and an ADR (Chapter 25). This chapter lets you round out that piece with the collaborative writing that surrounds real software work—the writing that demonstrates you can function on an engineering team, not just produce a document.
Recall the prior increment. Your Piece 3 so far is documentation attached to a project: instructions a doer can follow, a front door a stranger can walk through, a decision recorded for the future.
This chapter's addition — choose the artifact that fits your work, and produce it to this chapter's standard: - If you've ever reviewed or received a rough code review: take a real (or realistic) harsh review comment and rewrite it to be specific, kind, and aimed at the code—then write a one-paragraph reflection on what changed and why the rewrite will get a better result. (This is the exercises' centerpiece task; do it for real.) - If you have a project on a code host: write a complete PR description (what / why / how to test) for a real or planned change, and a bug report for a real defect (summary, repro, expected vs. actual, environment). - If you work from requirements: write one user story with three to five testable Given-When-Then acceptance criteria. - If you've lived through an incident: write a blameless postmortem for it—summary, timeline, system root cause, and action items with owners and dates. (Anonymize as needed; target the system, never the person.)
Whichever you choose, the standard is this chapter's: aimed at a specific audience, actionable without a follow-up question, and—for the review and the postmortem—aimed at the code/system, not the person. Add it to your portfolio as evidence of collaborative technical writing, which is what working engineers are actually evaluated on.
Preview the next increment. Chapter 35 (Writing for Science) turns from the software team to the scientific community, where the audience is peer reviewers and the conventions—statistical reporting, the response-to-reviewers letter, field voice norms—are different costumes on the same skills. If your track is academic rather than software, your portfolio's center of gravity shifts there; if it's software, this chapter's collaborative writing is your portfolio's distinctive strength.
34.8 Common Mistakes & Practical Considerations
The failures below recur across reviews, PRs, bug reports, and postmortems. Most share a root cause that should be familiar by now: writing aimed at the writer's needs (venting, looking thorough, recording activity) instead of the reader's needs (a fix they can act on, a change they can evaluate, a system they can improve)—the audience principle (Chapter 2) in its software habitat, plus a recurring failure to separate the person from the work.
The harsh review comment. Vague, personal, and pathless: "this is wrong, did you even test it?" It wounds, provokes defensiveness, and buries any real signal. Be specific (name the line and behavior), aim at the code, offer a path, and label blocking vs. nit. Candor without cruelty.
The empty PR description. "fixes," or blank—forcing every reviewer to reverse-engineer intent from the diff. Give what / why / how to test. The "how to test" is the most-skipped and most-valuable part: it lets the reviewer verify rather than merely inspect.
The unreproducible bug report. "Doesn't work," "crashes sometimes," "the wrong result," plus "ASAP." It's a feeling, not a fix—it starts a multi-day clarification loop. Give a minimal reproduction, expected vs. actual, and the environment. The act of minimizing the repro often finds the bug.
Acceptance criteria that aren't testable. "Works well," "intuitive," "fast." Unfalsifiable, so a developer guesses and a tester can't verify "done." Use Given-When-Then with concrete, checkable outcomes—including the empty case, the error case, and a real performance number.
The solution disguised as a problem (design docs). Opening a design doc with "we should use a streaming architecture" instead of "large files crash our workers" traps reviewers into evaluating a predetermined answer and kills the alternatives the doc exists to weigh. State the problem—quantified—before any solution.
The blameful postmortem. "Sam should have tested it," with the corrective action "be more careful." It teaches people to hide mistakes and changes nothing about the system, so the bug recurs. Target the system ("why was it possible for this to ship?"), and make every action item a system change with an owner and a date. "Be more careful" is never a corrective action.
Release notes that are marketing or a commit dump. "Our most powerful release yet!" tells users nothing; a raw commit list buries what they need. Lead with what affects the user, surface breaking changes, give migration paths.
Skipping the design doc to "save time." It feels faster (code appears immediately) but it's the planning fallacy: a flaw caught on paper costs an afternoon; the same flaw caught in week three costs the three weeks built on top of it. For non-trivial work, the doc is the engineering.
The honest "it depends": how much process for this change? A one-character typo fix doesn't need a design doc, three acceptance criteria, and a five-paragraph PR description—it needs a one-line PR title and a green test. A change to a payment system's core logic needs all of it and more. The calibration is the same audience-and-stakes judgment as everywhere: who is affected, and what does it cost them if this is wrong? Match the weight of the writing to the weight of the change. Over-process a trivial change and you train your team to resent the process; under-process a consequential one and you ship the time-zone bug. The skill is reading which is which—and erring toward more writing exactly where the blast radius is largest.
📚 Going Deeper: The conventions are local; the principles are universal Every team and community has its own dialect of these genres. One team prefixes optional review comments with
nit:; another uses a four-point scale; Google's widely-published engineering-practices guide formalizes much of §34.1–§34.2 (and is the basis for some of this chapter's vocabulary). One open-source project files design docs as numbered RFCs in a dedicated repository (Rust and Python are famous examples); another uses a Google Doc. Postmortem culture, popularized in large part by the site-reliability-engineering tradition at companies operating at massive scale, has its own established templates and the firm norm of blamelessness. Bug trackers impose their own required fields. None of these dialects is the "right" one, and you'll adapt to whatever your team uses. What doesn't change—across teams, companies, languages, and decades—is the principle underneath each genre: a review comment serves the author and the code; a PR description serves the reviewer; a bug report serves the fixer; a postmortem serves the future and the system. Learn your team's dialect, but hold the principles, because the principles are what make you a good writer in any dialect—and what let you improve a team's conventions when they're serving the writer instead of the reader.
Frequently Asked Questions
How do I write a good code review comment?
Make it specific (name the exact line, input, and behavior—"this returns None when items is empty, line 14," not "this is broken"), aim it at the code, not the coder (describe what the code does, never what the author failed to do), offer a path forward (suggest the fix or offer to pair, rather than just objecting), and label its weight (prefix optional style comments with nit:; say plainly when something blocks the merge). Ask genuine questions when you might be wrong, and praise what's good, not only what's flawed. The goal is candor without cruelty: say the true, hard thing about the code, clearly, and pointed at the code.
What should a pull request description include?
Three things the diff can't show, in order: what changed (a one- or two-sentence headline a reviewer grasps before reading code), why (the reasoning and the linked issue—the diff shows the change but never the intent), and how to test (the exact command to run the relevant tests, plus what they cover, so the reviewer can verify rather than merely read). Add a short notes section for scope boundaries ("does not touch X") and anything that changes a default. Keep PRs small enough that the what fits in two sentences; if it doesn't, the PR is probably doing too much.
How do I write a bug report a developer can actually fix?
Give the developer what they need to act: a one-line summary of the specific misbehavior (not "doesn't work"), steps to reproduce (ideally a minimal, runnable snippet—the single most valuable thing in any bug report), expected vs. actual behavior (both halves, stated concretely), the environment (software version, language/runtime, OS), and the evidence (the full error message and stack trace, not a paraphrase). If you've narrowed it down ("only happens on leap years"), say so. Building the minimal reproduction often reveals the cause—the act of reducing the problem is itself debugging.
What is a blameless postmortem?
A blameless postmortem is a written analysis of an incident that targets the system, not the person—it asks "why was it possible for this error to reach production?" rather than "who made the mistake?" It includes a summary, the impact, a neutral timeline of what happened, a root cause traced to a system condition (not "the engineer forgot," but "no automated check caught the missing index"), action items that change the system (each with an owner and a due date—never "be more careful"), and what went well. It's blameless not out of niceness but because blame makes engineers hide mistakes and changes nothing, so the same failure recurs; targeting the system is the only framing that actually prevents recurrence.
What's the difference between a user story and acceptance criteria?
A user story states a need from the user's point of view ("As a support agent, I want to search tickets by email so that I can find a customer's history")—it captures who, what, and why. Acceptance criteria are the specific, testable conditions that must be true for that story to count as done ("Given an email matching no tickets, when the agent searches, then an empty-state message is shown, not an error"). The story says what to build and why it matters; the criteria are the contract for "done"—concrete enough that a developer can build to them without guessing and a tester can verify them objectively. Written in the Given-When-Then form, they force you to specify the edge cases (empty results, errors, performance) that vague criteria silently leave to chance.
Chapter Summary
Key Takeaways
- In a review you critique the code, not the coder; in a postmortem you debug the system, not the human. Both are the same move—separate the artifact or system from the person—and both produce the outcome you actually want (better code, fewer outages) precisely because of that separation.
- Code-review comments must be specific, kind, and aimed at the code, with a path forward and a clear blocking/
nit:weight. Candor without cruelty. - A PR description gives what the diff can't: what changed, why, and how to test it. The "how to test" lets a reviewer verify, not just inspect—it's the most-skipped, most-valuable section.
- A bug report's job is to be reproducible and actionable, not to express frustration: a minimal repro, expected vs. actual, and the environment. Minimizing the repro is itself debugging.
- Acceptance criteria are the contract for "done"—testable Given-When-Then conditions, including the edge cases. "Works well" is not a criterion.
- A design doc is thinking on paper before building—problem (stated before the solution), alternatives considered, trade-offs. A flaw caught on paper costs an afternoon; caught in week three it costs weeks.
- A blameless postmortem turns a failure into system improvements: neutral timeline, system root cause, action items with owners and dates that change the system—never "be more careful."
Action Items
- Take a harsh code-review comment (yours or one you've received) and rewrite it specific, kind, and aimed at the code.
- Add a "what / why / how to test" description to a PR that lacks one.
- Rewrite one vague bug report with a minimal repro and expected-vs-actual.
- Write three testable Given-When-Then acceptance criteria for a feature currently described as "make it intuitive."
- Take one incident and write (or rewrite) its postmortem to target the system, with owned, dated action items.
Common Mistakes
- Harsh/vague/personal review comments; empty PR descriptions; unreproducible bug reports ("doesn't work, ASAP"); untestable acceptance criteria; design docs that state a solution instead of a problem; blameful postmortems whose "fix" is "be more careful"; release notes that are marketing or a commit dump; skipping the design doc to "save time."
Decision Framework
| Question | If yes → | If no → |
|---|---|---|
| Does your review comment name the specific line/behavior and aim at the code? | Good | Rewrite it specific and aimed at the code; add a path forward |
Have you labeled each comment blocking vs. nit:? |
Good | Label them so the author can prioritize |
| Does your PR description say what, why, and how to test? | Good | Add the missing part—usually "how to test" |
| Could a developer reproduce your bug from the report alone? | Good | Add a minimal repro, expected vs. actual, and environment |
| Are your acceptance criteria testable (Given-When-Then, concrete)? | Good | Replace "works well/intuitive/fast" with checkable conditions |
| Does your design doc state the problem before the solution, and weigh alternatives? | Good | Lead with the (quantified) problem; add alternatives considered |
| Does your postmortem target the system, with owned/dated action items? | Good | Re-aim it from the person to the system; add owners and dates |
Spaced Review
A few questions reaching back, to strengthen retention.
- (From Chapter 24) Chapter 24's threshold was "comment the why, not the what." A pull-request description and a code comment both supply a why the code can't show. Explain the parallel: what what does the diff already show, and what why must the PR description (and a code comment) add—and why does a reviewer need the why before the what?
- (From Chapter 25) Chapter 25 argued that API docs are a contract and the most-skipped, most-needed part is the errors. Where does that same "document the failure, not just the success" instinct reappear in this chapter—in bug reports, in acceptance criteria, and in postmortems? Name the parallel in each.
- (From Chapter 21, bridging) Chapter 21 introduced the blameless incident report ("the pipeline allowed an unreviewed config to ship," not "X was careless") and the rule that an action item needs an owner and a date. This chapter's postmortem is that idea's full form. What does §34.6 add to Chapter 21's incident report—what makes a "postmortem" more than a workplace incident report, and why is the blameless framing not merely polite but mechanically necessary for prevention?
Answers
1. The diff shows the **what**: the exact lines that changed—the ground truth of the implementation. It cannot show the **why**: the intent behind the change (what problem it solves, why this approach, what the originating issue was) and, for a comment, why a particular line is shaped the way it is. The reviewer needs the why *first* because **you can't evaluate an implementation without knowing its intent**—a function returning `None` on empty input is correct or buggy entirely depending on what callers are promised, and the diff is silent on the promise. Given the intent up front, the reviewer reads the diff as *evidence for or against a claim* ("does this code actually do what the description says?") rather than as a puzzle to decode, which is both faster (no reverse-engineering) and better (it surfaces the deepest bug class—code that works but does the wrong thing—which is invisible if you only check the *how* against itself). That's "comment the why, not the what" ([Chapter 24](../../part-05-software-data-writing/chapter-24-code-documentation/index.md)) at the scale of a whole changeset: the *what* is in the artifact; the *why* exists only if the writer supplies it. 2. The same "document the failure, not just the happy path" instinct runs through all three: **In a bug report**, *expected vs. actual* is exactly the failure made explicit—you don't just say what you wanted (the happy path), you state precisely how the system *failed* to deliver it, because the gap is the bug. A report that describes only intended use is as useless as an API doc that documents only the 200 response. **In acceptance criteria**, the testable conditions must cover the *failure and edge cases*—the empty result, the error, the bad input—not only the success path ("Given an email matching nothing, then show an empty state, not an error"); criteria that specify only the happy path leave the failures to chance, exactly the gap [Chapter 25](../../part-05-software-data-writing/chapter-25-readme-api-docs/index.md) warned about in API docs. **In a postmortem**, the entire document *is* an analysis of a failure—and its discipline is to document the failure's *system* causes honestly and completely, just as API error docs document every failure mode the integrator will hit. Across all three: the success path is the easy, obvious half; the *failure* path is where the reader's real need lives, and where lazy writing skips. 3. [Chapter 21](../../part-04-professional-workplace-writing/chapter-21-workplace-reports/index.md)'s incident report and this chapter's postmortem share the same bones (summary, timeline, root cause, corrective actions with owners and dates) and the same blameless seed. What §34.6 *adds* is the elevation of blamelessness from one good practice to the organizing principle of the whole genre, plus the *mechanism* for why it's necessary: a postmortem in the software/SRE tradition exists specifically to convert an outage into durable *system* improvements (guards, tests, checks), and it makes explicit the move that blame is not merely impolite but **self-defeating**—if postmortems hunt for a culprit, engineers learn to hide mistakes, omit damaging details, and stay quiet during the next incident, which *degrades the information* every future postmortem depends on and makes recurrence *more* likely. So blamelessness is mechanically necessary: it's the only framing that preserves the truthful information needed to prevent the failure from happening again. It's also tied here to its twin—"critique the code, not the coder"—revealing both as one threshold (separate the person from the work) seen from two ends. [Chapter 21](../../part-04-professional-workplace-writing/chapter-21-workplace-reports/index.md) taught the *form*; this chapter teaches *why the form's central rule is the rule*.What's Next
Chapter 35 (Writing for Science: The Conventions That Govern Scientific Publication) leaves the software team for the scientific community—where the audience is peer reviewers and journal editors, and the conventions are different costumes on the same skills. You'll meet the field-specific voice norms (why passive voice still rules in chemistry while biology has shifted toward active), the precise grammar of statistical reporting, the response-to-reviewers letter (which is this chapter's "specific, kind, aimed at the work" rewritten for an academic critic—gracious and firm, never defensive), and the cover letter that frames a paper for an editor. The thread that ties Chapter 34 to Chapter 35 is the one that ties the whole of Part VII together: the technical-writing principles don't change between fields; only the conventions do. A scientist responding to a hostile Reviewer 2 and an engineer answering a harsh code review are solving the same problem with the same tools—aim at the work, not the person, and let candor and grace coexist.
Practice: Exercises · Quiz Go deeper: Case Study · Case Study 2 Review: Key Takeaways · Further Reading