Case Study 2 (Deep Dive): The Over-Documented File and the Comment That Lied

DataField.Dev

Case Study 2 (Deep Dive): The Over-Documented File and the Comment That Lied

A contrasting scenario. Case Study 1 fixed the no-documentation end of the spectrum. This one walks the too-much end — the failure that disguises itself as diligence — and shows how over-documentation actively causes the worst kind of harm: a comment that lies. Composite and anonymized; the pattern is one every code reviewer has seen.

The setup

A new engineer, Priya, joins a payments team and is praised in her first week for "thorough documentation." Here is a representative function from her first pull request. Read it the way her reviewer should have:

def calculate_fee(amount, tier):
    # initialize the fee to zero
    fee = 0.0
    # check if the tier is premium
    if tier == "premium":
        # premium users pay a 1% fee
        fee = amount * 0.02
    # check if the tier is standard
    elif tier == "standard":
        # standard users pay a 3% fee
        fee = amount * 0.03
    # otherwise the tier is basic
    else:
        # basic users pay a 5% fee
        fee = amount * 0.05
    # add the fixed processing charge
    # 30 cents covers the card-network interchange floor; below this
    # we lose money on the transaction. Do not remove. See #payments-77.
    fee = fee + 0.30
    # return the fee
    return fee

At a glance it looks well-documented — a comment on nearly every line, nothing left unexplained. That impression is exactly the trap. This file is at the too-much end of the documentation spectrum (§24.6), and the over-documentation isn't neutral clutter; it's doing three kinds of harm at once.

Harm 1: Noise burying signal

Eight of the ten comments narrate the what — # initialize the fee to zero over fee = 0.0, # return the fee over return fee. Each adds nothing the code doesn't already say (§24.2). On their own they'd be merely useless. Together they do something worse: they bury the one comment that matters. The note about the 30-cent interchange floor — a genuine why, with a real warning ("Do not remove") and a ticket — is the single most important line in the file, and it's drowning in a sea of # check if the tier is premium. A reviewer (or a future maintainer) scanning ten comments, nine of which are noise, learns the rational lesson: the comments here aren't worth reading. So they skip all of them — including the one that says "remove this and we lose money on every transaction." Over-documentation didn't just waste space; it disabled the documentation that mattered, by training the reader to ignore the channel.

Harm 2: The comment that lies

Read the premium branch again:

    # premium users pay a 1% fee
    if tier == "premium":
        fee = amount * 0.02

The comment says 1%. The code charges 2% (* 0.02). One of them is wrong, and a reader has no way to know which — maybe the rate was 1% when Priya wrote the comment and someone later bumped it to 2% without touching the comment (classic doc drift, §24.6), or maybe she simply mistyped. Either way, this is the failure the chapter calls worse than no comment at all: a reader who trusts the comment now believes premium users are charged 1% when they're charged 2%. With no comment, they'd have read the code (* 0.02) and known the truth. The comment doesn't aid understanding; it manufactures a false belief — and in a payments system, a false belief about a fee rate is the kind of thing that ends up in an incident report. This is why the chapter insists: every comment is a promise to keep it true, and the more comments you write — especially what-comments that shadow every line — the more promises you'll eventually break.

Harm 3: The obvious explained, the subtle skipped

Notice where Priya spent her documentation effort. Every trivial, self-evident line got a comment (they're easy to narrate). But the function has at least one genuinely subtle question that got no comment: why do the fee tiers differ by exactly these amounts, and is the order of operations right — should the fixed 30-cent charge be inside the percentage or added after? (Here it's added after, which is a real decision with billing consequences.) The over-documenter narrated all the easy whats and skipped the hard why, which is the precise inverse of where the effort should go. Voluminous and useless: everything obvious explained, the one thing worth explaining left silent.

The fix: cut to the calibrated center

The repair is subtraction, mostly — delete the narration, fix the lie, keep and surface the real why, and add the one missing why:

def calculate_fee(amount: float, tier: str) -> float:
    """Return the total fee for a charge of `amount` at the given tier.

    Fee = a tier-based percentage of `amount` plus a fixed processing
    charge. Tiers: premium (2%), standard (3%), basic (5%).

    Args:
        amount: The charge amount in dollars.
        tier: One of "premium", "standard", "basic"; unknown tiers
            are treated as basic.

    Returns:
        The total fee in dollars (percentage + fixed charge).
    """
    rate = FEE_RATES.get(tier, FEE_RATES["basic"])
    percentage_fee = amount * rate

    # The fixed charge is added *after* the percentage, not folded into it:
    # it covers the card-network interchange floor (30¢), which is a flat
    # cost per transaction regardless of size. Below this we lose money.
    # Do not remove. See #payments-77.
    return percentage_fee + PROCESSING_CHARGE

with the rates lifted into named, self-documenting constants:

FEE_RATES = {"premium": 0.02, "standard": 0.03, "basic": 0.05}
PROCESSING_CHARGE = 0.30

What happened:

The eight what-comments are gone. The code says all of that already (§24.2).
The lie is gone — and it's gone structurally: the rates now live in one named place (FEE_RATES), so there is no second copy of "1%/2%" to drift out of sync. The fix didn't just correct the comment; it removed the possibility of that drift (§24.4's coupled-documentation principle, §24.5's named constants).
The real why is preserved and now visible — the interchange-floor warning, no longer buried, reads loud because the noise around it is gone.
The missing why is supplied — the comment now explains the after-not-folded ordering decision, the one genuinely subtle point.
A docstring and types give callers the contract and the data shapes (§24.3–24.4).

The result has three comments where the original had ten — and it is far better documented, because comment count was never the measure. Each surviving comment earns its place with a why the code can't express; everything else became code, constants, or contract.

The takeaway

"Lots of comments" felt like diligence and was the opposite. The over-documented file buried its one crucial warning, shipped a comment that lied about a fee rate, and explained everything except the thing worth explaining. Both ends of the documentation spectrum are failures, and this is the sneakier one: no documentation looks like a problem, so people eventually fix it; too much documentation looks like a virtue, so it festers — comments rotting into lies while everyone congratulates the author for being thorough. The skill is the calibrated center: capture the why, trust the code with the what, and remember that every comment you write is a promise you'll have to keep.

Related: Chapter 24 · Case Study 1 · Exercises · Key Takeaways