In This Chapter
- Learning Objectives
- Introduction
- 23.1 Documentation as a First-Class Citizen
- 23.2 README Files That Work
- 23.3 API Documentation with AI
- 23.4 Architecture Decision Records (ADRs)
- 23.5 Docstrings and Inline Documentation
- 23.6 User Guides and Tutorials
- 23.7 Changelog and Release Notes
- 23.8 Code Comments: When and How
- 23.9 Documentation-Driven Development
- 23.10 Maintaining Documentation with AI
- Connecting the Dots: Documentation Across Part III
- Summary
- Chapter Summary
# Chapter 23: Documentation and Technical Writing

## Learning Objectives
After completing this chapter, you will be able to:
- Remember the key types of software documentation (READMEs, API docs, ADRs, changelogs, inline comments) and when each is appropriate (Bloom's Level 1)
- Understand why documentation is especially critical in AI-generated codebases and how it differs from documentation in traditionally written code (Bloom's Level 2)
- Apply AI-assisted workflows to generate, format, and maintain READMEs, API documentation, docstrings, and user guides (Bloom's Level 3)
- Analyze existing documentation for completeness, accuracy, and alignment with code, identifying gaps and inconsistencies (Bloom's Level 4)
- Evaluate documentation quality using established standards (Google style, NumPy style, Diátaxis framework) and determine which approach fits a given project (Bloom's Level 5)
- Create a comprehensive documentation strategy for a software project, including documentation-driven development workflows and AI-powered maintenance pipelines (Bloom's Level 6)
## Introduction
You have spent the previous eight chapters building real software. You created CLI tools in Chapter 15, web frontends in Chapter 16, REST APIs in Chapter 17, database schemas in Chapter 18, full-stack applications in Chapter 19, external integrations in Chapter 20, test suites in Chapter 21, and debugging workflows in Chapter 22. Every one of those projects shares a common need that developers consistently underestimate: documentation.
Documentation is the bridge between code that works and code that is usable. It is the difference between a project that thrives and one that accumulates confusion. And in the era of vibe coding, documentation takes on a dimension it never had before.
When you generate code with an AI assistant, you produce working software rapidly. But the decisions behind that code (the architectural rationale, the constraints you communicated through prompts, the alternatives you rejected) evaporate the moment you close your chat session. Traditional developers at least carry mental models of why they wrote what they wrote. With AI-assisted development, even the original author may not fully understand every line. Documentation becomes the institutional memory that neither you nor your AI assistant can reconstruct on demand.
This chapter teaches you how to treat documentation as a first-class deliverable, how to leverage AI to produce it efficiently, and how to maintain it as your codebase evolves. You will learn specific formats, tools, and workflows that make documentation sustainable rather than burdensome.
> **Key Insight:** The irony of vibe coding is that the easier code is to generate, the harder it is to understand without documentation. Speed of creation amplifies the need for explanation.
## 23.1 Documentation as a First-Class Citizen

### The Documentation Deficit in AI-Generated Code
Consider a scenario that plays out in teams adopting AI-assisted development. A developer uses Claude to generate a FastAPI backend in an afternoon. The API works. Tests pass. The developer moves on to the next feature. Three weeks later, a colleague needs to modify an endpoint. They open the codebase and find well-structured Python with type hints and some auto-generated docstrings, but no explanation of why the API uses a particular authentication scheme, why certain endpoints accept POST instead of PUT, or what the expected data flow looks like across services.
This is the documentation deficit. AI-generated code often looks clean and professional, which paradoxically makes people assume it needs less documentation. In reality, it needs more. When you write code by hand, you build an understanding of the codebase incrementally. Each decision leaves a trace in your memory. When AI writes the code, that trace does not exist.
### Types of Documentation
Software documentation falls into several categories, each serving a different audience and purpose:
| Documentation Type | Primary Audience | Purpose | Update Frequency |
|---|---|---|---|
| README | New users, contributors | Project overview and quick start | Per major release |
| API Documentation | Developers consuming the API | Endpoint reference and usage | Per API change |
| Architecture Decision Records | Team members, future maintainers | Capture decision rationale | Per significant decision |
| Docstrings | Developers reading source code | Function/class/module reference | Per code change |
| User Guides | End users | Task-oriented instructions | Per feature release |
| Changelogs | Users, stakeholders | Track what changed and when | Per release |
| Inline Comments | Developers maintaining code | Explain non-obvious logic | As needed |
| Contributing Guides | Open-source contributors | Explain contribution process | Quarterly review |
> **Callout: The Diátaxis Framework**
>
> The Diátaxis framework, developed by Daniele Procida, organizes documentation into four categories based on user needs:
>
> - **Tutorials** — Learning-oriented, guided experiences for beginners
> - **How-To Guides** — Task-oriented instructions for specific goals
> - **Reference** — Information-oriented, precise technical descriptions
> - **Explanation** — Understanding-oriented, discussion of concepts and decisions
>
> Each category serves a different user need and requires a different writing style. Mixing categories within a single document leads to documentation that serves no audience well. As you work through this chapter, consider which Diátaxis category each documentation type falls into.
### Documentation in the Vibe Coding Workflow
In a traditional development workflow, documentation is often treated as an afterthought, something you write reluctantly after shipping. Vibe coding inverts this relationship. Because AI assistants are excellent at generating structured text, documentation becomes one of the cheapest deliverables to produce. The workflow looks like this:
- **Before coding:** Write a brief specification or ADR that captures what you are building and why
- **During coding:** Generate docstrings and inline comments as part of the AI-assisted code generation
- **After coding:** Use AI to generate README sections, API documentation, and user guides from the existing codebase
- **During maintenance:** Use AI to detect documentation drift and suggest updates
This chapter walks you through each of these phases with concrete tools and techniques.
## 23.2 README Files That Work

### The README as Your Project's Front Door
The README file is the most important piece of documentation in any project. It is the first thing users and contributors see on GitHub, PyPI, or any project directory. A great README answers five questions in rapid succession:
- What is this project?
- Why should I care?
- How do I install it?
- How do I use it?
- How do I contribute or get help?
### Anatomy of an Effective README
Here is a structure that works consistently for Python projects:
````markdown
# Project Name

One-sentence description of what this project does.

[](...)
[](...)
[](...)
[](...)

## Overview

A 2-3 paragraph description that explains:

- What problem this project solves
- Who it is for
- What makes it different from alternatives

## Quick Start

### Installation

```bash
pip install package-name
```

### Basic Usage

```python
from package_name import main_feature

result = main_feature("input")
print(result)
```

## Features

- Feature one with brief description
- Feature two with brief description
- Feature three with brief description

## Documentation

Link to full documentation site.

## Contributing

Brief description or link to CONTRIBUTING.md.

## License

Licensed under the MIT License. See LICENSE for details.
````
### Badges: Useful Signals, Not Decorations
Badges at the top of a README provide at-a-glance information about project health. Useful badges include:
- **Build status** — Is CI passing? (shields.io or GitHub Actions badge)
- **Version** — What is the latest release?
- **Python versions** — What Python versions are supported?
- **License** — What license governs this project?
- **Test coverage** — What percentage of code is tested?
Avoid purely decorative badges. Every badge should answer a question a user might have before deciding to adopt your project.
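As a concrete illustration, a typical badge block might look like the following; `OWNER`, `REPO`, and `package-name` are placeholders you would substitute with your own values:

```markdown
[![CI](https://github.com/OWNER/REPO/actions/workflows/ci.yml/badge.svg)](https://github.com/OWNER/REPO/actions)
[![PyPI](https://img.shields.io/pypi/v/package-name)](https://pypi.org/project/package-name/)
[![Python](https://img.shields.io/pypi/pyversions/package-name)](https://pypi.org/project/package-name/)
[![License](https://img.shields.io/pypi/l/package-name)](LICENSE)
```

Each of these answers one of the questions listed above: is CI green, what is the latest release, which Python versions are supported, and what license applies.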
> **Practical Tip:** When using AI to generate a README, provide the assistant with your `pyproject.toml`, a directory listing, and a brief description of the project's purpose. This gives the AI enough context to produce an accurate, detailed README. See `code/example-01-readme-generator.py` for an automated approach.
### Common README Mistakes
**The Wall of Text.** A README that reads like an essay fails new users who want quick answers. Use headers, bullet points, and code blocks aggressively.
**Missing Installation Instructions.** Never assume users know how to install your project. Be explicit about prerequisites, Python version requirements, and installation commands.
**Outdated Examples.** A code example that does not work is worse than no example at all. If you change your API, update your README. Section 23.10 covers how AI can help with this.
**No Quick Start.** Users should be able to go from zero to a working example in under five minutes. If your project requires more setup, acknowledge that and provide the minimal path.
### AI-Assisted README Generation
You can prompt an AI assistant to generate a README from project metadata:
```text
I have a Python project with the following structure:

src/
  mypackage/
    __init__.py
    core.py
    utils.py
    cli.py
tests/
  test_core.py
  test_utils.py
pyproject.toml
LICENSE

The project is a command-line tool for analyzing CSV files and
generating statistical reports. It supports multiple output
formats (JSON, HTML, PDF) and can handle files up to 10GB
using streaming.

Please generate a comprehensive README.md following best practices.
```
The AI will produce a solid first draft. Your job is to review it for accuracy, add project-specific nuances the AI could not know, and ensure code examples actually work.
---
## 23.3 API Documentation with AI
### Why API Documentation Matters
In Chapter 17, you built REST APIs with FastAPI. In Chapter 20, you integrated with external APIs. In both cases, the quality of the API documentation directly determined how easy the API was to use. API documentation is the contract between your service and its consumers. Get it wrong, and developers waste hours guessing at parameter types, response formats, and error codes.
### OpenAPI and Automatic Documentation
FastAPI generates OpenAPI (formerly Swagger) documentation automatically from your route definitions and Pydantic models. This is a significant advantage of using typed frameworks:
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

app = FastAPI(
    title="Document Analysis API",
    description="API for analyzing document content and metadata.",
    version="1.0.0",
)


class DocumentRequest(BaseModel):
    """Request body for document analysis."""

    content: str = Field(
        ...,
        description="The document text to analyze",
        min_length=1,
        max_length=100000,
    )
    language: str = Field(
        default="en",
        description="ISO 639-1 language code",
    )


class AnalysisResponse(BaseModel):
    """Response body containing analysis results."""

    word_count: int = Field(description="Total number of words")
    sentence_count: int = Field(description="Total number of sentences")
    reading_level: str = Field(
        description="Estimated reading level (e.g., 'Grade 8')"
    )
    key_phrases: list[str] = Field(
        description="Top 10 key phrases extracted from the document"
    )


@app.post(
    "/analyze",
    response_model=AnalysisResponse,
    summary="Analyze a document",
    description="Performs linguistic analysis on the provided document text.",
)
async def analyze_document(request: DocumentRequest) -> AnalysisResponse:
    """Analyze document content for readability and key phrases."""
    # Implementation here
    ...
```
The `Field` descriptions, model docstrings, and route metadata all flow into the generated OpenAPI schema. FastAPI then renders this as interactive documentation at `/docs` (Swagger UI) and `/redoc` (ReDoc).
### Sphinx for Python Library Documentation
For Python libraries (as opposed to web APIs), Sphinx is the standard documentation tool. Sphinx reads reStructuredText or Markdown files and generates HTML, PDF, or ePub documentation. Its autodoc extension pulls docstrings directly from your code:
```python
# In your Python source code
def calculate_readability(
    text: str,
    algorithm: str = "flesch-kincaid",
) -> float:
    """Calculate the readability score of a text.

    Uses the specified readability algorithm to compute a score
    indicating how easy the text is to read.

    Args:
        text: The input text to analyze. Must contain at least
            one sentence.
        algorithm: The readability algorithm to use. Supported
            values are "flesch-kincaid", "coleman-liau",
            "gunning-fog", and "smog".

    Returns:
        A float representing the readability score. For
        Flesch-Kincaid, higher scores indicate easier text
        (scale 0-100). For other algorithms, interpretation
        varies.

    Raises:
        ValueError: If text is empty or contains no sentences.
        ValueError: If algorithm is not a supported value.

    Example:
        >>> score = calculate_readability("The cat sat on the mat.")
        >>> print(f"Readability: {score:.1f}")
        Readability: 116.1
    """
    ...
```
With autodoc, Sphinx generates reference documentation directly from these docstrings, ensuring your docs and code stay in sync.
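Sphinx's behavior is driven by `docs/conf.py`. A minimal configuration enabling `autodoc` plus the `napoleon` extension (so Google-style sections render correctly) might look like the sketch below; the project name and path are placeholders:

```python
# docs/conf.py — minimal Sphinx setup for autodoc with
# Google-style docstrings (names here are illustrative).
import os
import sys

# Make the package importable so autodoc can find it.
sys.path.insert(0, os.path.abspath(".."))

project = "My Project"

extensions = [
    "sphinx.ext.autodoc",   # pull docstrings from code
    "sphinx.ext.napoleon",  # parse Google/NumPy style sections
    "sphinx.ext.viewcode",  # link to highlighted source
]

# Tell napoleon which docstring dialect to expect.
napoleon_google_docstring = True
napoleon_numpy_docstring = False
```

A reference page then pulls in a module with the `.. automodule:: mypackage.core` directive (again, `mypackage.core` is a placeholder), and Sphinx renders every docstring it finds there.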
### MkDocs for Modern Documentation Sites
MkDocs is an alternative to Sphinx that uses Markdown instead of reStructuredText. With the Material for MkDocs theme, it produces clean, modern documentation sites. Configuration lives in a single `mkdocs.yml` file:
```yaml
site_name: My Project Documentation

theme:
  name: material
  palette:
    primary: indigo
    accent: indigo

nav:
  - Home: index.md
  - Getting Started:
      - Installation: getting-started/installation.md
      - Quick Start: getting-started/quickstart.md
  - User Guide:
      - Configuration: guide/configuration.md
      - Usage: guide/usage.md
  - API Reference:
      - Core Module: api/core.md
      - Utilities: api/utils.md
  - Contributing: contributing.md

plugins:
  - search
  - mkdocstrings:
      handlers:
        python:
          options:
            show_source: true
            docstring_style: google
```
The `mkdocstrings` plugin serves the same role as Sphinx's `autodoc`, pulling docstrings from your Python source into MkDocs pages.
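In practice, a documentation page pulls in docstrings with a `:::` directive. A sketch of what `docs/api/core.md` might contain (the module name is illustrative):

```markdown
# Core Module

::: mypackage.core
    options:
      show_root_heading: true
```

When the site builds, `mkdocstrings` replaces the directive with rendered reference documentation for every public object in that module.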
> **AI-Assisted Workflow:** Ask your AI assistant to generate MkDocs configuration and page stubs from your project structure. Then use it to fill in content for each page based on your source code. This two-pass approach (structure first, then content) produces more organized results than asking for everything at once.
### Documenting Error Responses
One area where API documentation consistently falls short is error handling. Every API endpoint should document its possible error responses:
```python
@app.post(
    "/analyze",
    response_model=AnalysisResponse,
    responses={
        400: {
            "description": "Invalid input",
            "content": {
                "application/json": {
                    "example": {
                        "detail": "Document content must not be empty"
                    }
                }
            },
        },
        413: {
            "description": "Document too large",
            "content": {
                "application/json": {
                    "example": {
                        "detail": "Document exceeds 100,000 character limit"
                    }
                }
            },
        },
        422: {
            "description": "Validation error",
        },
    },
)
async def analyze_document(request: DocumentRequest) -> AnalysisResponse:
    ...
```
When using AI to review your API documentation, explicitly ask: "What error cases are not documented?" AI assistants are surprisingly good at identifying missing error documentation because they can reason about what could go wrong in each endpoint.
## 23.4 Architecture Decision Records (ADRs)

### What Are ADRs and Why They Matter
An Architecture Decision Record is a short document that captures a significant architectural decision along with its context and consequences. ADRs answer the question that haunts every codebase: "Why was it built this way?"
In vibe-coded projects, ADRs are even more important. When you prompt an AI to build a feature, the AI makes dozens of implicit decisions: which libraries to use, which design patterns to follow, how to structure data. Unless you capture those decisions explicitly, they become invisible. The next person (or even you, six months later) who encounters the code will not know whether those choices were deliberate constraints or arbitrary defaults.
### ADR Format
The most widely used format follows a template proposed by Michael Nygard:
```markdown
# ADR-001: Use PostgreSQL for Primary Data Storage

## Status

Accepted

## Date

2025-11-15

## Context

Our application needs a persistent data store for user accounts,
documents, and analysis results. We need ACID transactions for
financial data, support for JSON fields for flexible document
metadata, and the ability to handle concurrent writes from
multiple API servers.

We considered:

1. PostgreSQL
2. MySQL
3. MongoDB
4. SQLite

## Decision

We will use PostgreSQL as our primary data store.

## Consequences

### Positive

- ACID compliance ensures data integrity for financial records
- JSONB columns allow flexible metadata without a separate
  document store
- Strong ecosystem of ORMs (SQLAlchemy) and migration tools
  (Alembic)
- Excellent concurrency handling with MVCC

### Negative

- More complex to set up than SQLite for local development
- Requires a running database server (mitigated by Docker)
- Team needs PostgreSQL-specific knowledge for advanced features

### Neutral

- We will use SQLAlchemy as our ORM, as decided in ADR-002
- Database migrations will be managed with Alembic
```
### When to Write an ADR
Not every decision warrants an ADR. Write one when:
- You choose between multiple viable alternatives (database, framework, authentication scheme)
- The decision is difficult to reverse (data model, API versioning strategy)
- The decision affects multiple team members or components
- You find yourself explaining the same decision repeatedly
- You or your AI assistant made a non-obvious architectural choice
> **Rule of Thumb:** If a decision would take more than five minutes to explain to a new team member, it deserves an ADR.
### AI-Assisted ADR Writing
AI assistants excel at drafting ADRs because the format is structured and the content draws on widely documented trade-offs. Here is an effective prompt pattern:
```text
I need to write an ADR for the following decision:

We are building a REST API for a document management system.
We chose FastAPI over Flask and Django REST Framework.

Our constraints:
- Need async support for file upload processing
- Team is comfortable with type hints
- OpenAPI documentation is required
- Must handle 500+ concurrent connections

Please draft an ADR following the Nygard format (Status,
Context, Decision, Consequences) that captures the reasoning
behind choosing FastAPI.
```
The AI produces a thorough first draft that covers technical trade-offs. Your job is to add project-specific context the AI cannot know: team expertise, organizational constraints, budget considerations, and political factors.
### Managing ADRs
ADRs should be stored in your repository, typically in a `docs/adr/` directory. Number them sequentially (`0001-use-postgresql.md`, `0002-use-sqlalchemy-orm.md`). Once accepted, ADRs are immutable. If you reverse a decision, write a new ADR that supersedes the old one and update the original's status to "Superseded by ADR-NNN."
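As a sketch of what such tooling can look like (this snippet is illustrative, not the book's example tool), a few lines of Python suffice to scaffold the next sequentially numbered ADR:

```python
# Minimal sketch of an ADR scaffolding helper.
# File naming and template are illustrative conventions.
from datetime import date
from pathlib import Path

TEMPLATE = """# ADR-{num:04d}: {title}

## Status

Proposed

## Date

{today}

## Context

(Why is this decision needed?)

## Decision

(What did we decide?)

## Consequences

(What becomes easier or harder?)
"""


def new_adr(title: str, adr_dir: str = "docs/adr") -> Path:
    """Create the next sequentially numbered ADR file."""
    directory = Path(adr_dir)
    directory.mkdir(parents=True, exist_ok=True)

    # Next number = count of existing numbered ADR files + 1.
    num = len(list(directory.glob("[0-9]*.md"))) + 1
    slug = title.lower().replace(" ", "-")
    path = directory / f"{num:04d}-{slug}.md"
    path.write_text(
        TEMPLATE.format(num=num, title=title, today=date.today())
    )
    return path
```

Running `new_adr("Use PostgreSQL")` creates `docs/adr/0001-use-postgresql.md` with the status pre-set to "Proposed," ready for you (or your AI assistant) to fill in.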
See `code/example-03-adr-template.py` for a Python tool that creates and manages ADRs programmatically.
## 23.5 Docstrings and Inline Documentation

### Python Docstring Conventions
Python docstrings are string literals that appear as the first statement in a module, class, function, or method. They serve as the primary form of inline documentation and are accessible at runtime through the `__doc__` attribute.
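Because the docstring is just an attribute, you can see this for yourself in a few lines:

```python
def word_count(text: str) -> int:
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


# The docstring is available at runtime; help(), doctest, and
# documentation tools all read this same attribute.
print(word_count.__doc__)
```

This runtime availability is exactly what lets tools like `autodoc` and `mkdocstrings` build reference pages without parsing your source by hand.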
Three major docstring styles dominate the Python ecosystem:
**Google Style** (used by Google, recommended for most projects):
```python
def fetch_documents(
    query: str,
    max_results: int = 10,
    include_metadata: bool = True,
) -> list[dict]:
    """Fetch documents matching a search query.

    Searches the document index for entries matching the provided
    query string. Results are ranked by relevance score.

    Args:
        query: The search query string. Supports boolean operators
            (AND, OR, NOT) and phrase matching with quotes.
        max_results: Maximum number of documents to return.
            Must be between 1 and 100.
        include_metadata: Whether to include document metadata
            (author, creation date, tags) in the results.

    Returns:
        A list of dictionaries, each containing:

        - id (str): Unique document identifier
        - title (str): Document title
        - snippet (str): Relevant text excerpt
        - score (float): Relevance score (0.0 to 1.0)
        - metadata (dict): Present only if include_metadata
          is True

    Raises:
        ValueError: If query is empty or max_results is out of
            the valid range.
        ConnectionError: If the document index is unreachable.

    Example:
        >>> docs = fetch_documents("machine learning", max_results=5)
        >>> for doc in docs:
        ...     print(f"{doc['title']} (score: {doc['score']:.2f})")
    """
    ...
```
**NumPy Style** (used by NumPy, SciPy, and many scientific Python projects):
```python
def calculate_statistics(
    data: list[float],
    confidence_level: float = 0.95,
) -> dict:
    """
    Calculate descriptive statistics for a dataset.

    Parameters
    ----------
    data : list[float]
        Input data values. Must contain at least two elements.
    confidence_level : float, optional
        Confidence level for interval estimation, by default 0.95.
        Must be between 0 and 1.

    Returns
    -------
    dict
        A dictionary containing:

        - mean : float
            Arithmetic mean of the data.
        - median : float
            Median value.
        - std_dev : float
            Standard deviation.
        - ci_lower : float
            Lower bound of the confidence interval.
        - ci_upper : float
            Upper bound of the confidence interval.

    Raises
    ------
    ValueError
        If data contains fewer than two elements.
    ValueError
        If confidence_level is not between 0 and 1.

    Examples
    --------
    >>> stats = calculate_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
    >>> print(f"Mean: {stats['mean']:.1f}")
    Mean: 3.0
    """
    ...
```
**Sphinx Style** (traditional, used by many older projects):
```python
def process_file(path: str, encoding: str = "utf-8") -> bytes:
    """Process a file and return its content as bytes.

    :param path: Path to the file to process.
    :type path: str
    :param encoding: Character encoding of the file, defaults
        to "utf-8".
    :type encoding: str, optional
    :returns: The processed file content.
    :rtype: bytes
    :raises FileNotFoundError: If the file does not exist.
    :raises PermissionError: If the file cannot be read.
    """
    ...
```
> **Recommendation:** For new projects, use Google style. It is the most readable in source code, well-supported by documentation tools (Sphinx via the `napoleon` extension, MkDocs via `mkdocstrings`), and the style most commonly produced by AI assistants.
### What to Document in Docstrings
Good docstrings explain what a function does, not how it does it. Implementation details belong in code comments (Section 23.8), not docstrings. Focus on:
- Purpose: What does this function do?
- Parameters: What are the inputs, their types, and their constraints?
- Return value: What does the function return and in what format?
- Side effects: Does the function modify any state, write to files, or make network calls?
- Exceptions: What errors can this function raise and under what conditions?
- Examples: A brief usage example that demonstrates typical use
### AI-Generated Docstrings
AI assistants can generate docstrings for existing code. This is one of the most reliable uses of AI in documentation because the AI can see the code and describe it accurately. However, review AI-generated docstrings for:
- Accuracy: Does the docstring correctly describe what the code does?
- Completeness: Are all parameters, return values, and exceptions documented?
- Relevance: Does the docstring add information beyond what the type hints already convey?
A docstring that merely restates the function signature adds no value:
```python
# Bad: restates the obvious
def add(a: int, b: int) -> int:
    """Add two integers and return the result.

    Args:
        a: The first integer.
        b: The second integer.

    Returns:
        The sum of a and b.
    """
    return a + b
```
For simple functions with clear type hints, a one-line docstring suffices:
```python
# Good: concise for simple functions
def add(a: int, b: int) -> int:
    """Return the sum of two integers."""
    return a + b
```
Reserve detailed docstrings for functions where the behavior, constraints, or edge cases are not obvious from the signature alone.
## 23.6 User Guides and Tutorials

### The Difference Between Guides and Tutorials
Following the Diátaxis framework, tutorials and how-to guides serve different purposes:
- **Tutorials** guide a beginner through a complete experience. They are learning-oriented. The reader does not yet know what questions to ask. ("Build your first document analysis pipeline.")
- **How-to guides** help an experienced user accomplish a specific task. They are goal-oriented. The reader knows what they want to do but not how. ("How to configure custom readability algorithms.")
### Writing Tutorials with AI
Tutorials require a narrative flow that AI assistants can produce well, provided you give them enough context. An effective prompt for generating a tutorial:
```text
Write a tutorial for new users of our document-analysis library.
The tutorial should walk through:

1. Installing the library with pip
2. Analyzing a single document from a file
3. Customizing the analysis parameters
4. Exporting results to JSON and HTML

Target audience: Python developers who have never used this
library. They are comfortable with pip, imports, and basic
file I/O.

Use a conversational but professional tone. Include complete,
runnable code examples at each step. Explain what each code
block does before showing it.

The library's main API:
- DocumentAnalyzer(config: AnalyzerConfig)
- analyzer.analyze(text: str) -> AnalysisResult
- analyzer.analyze_file(path: str) -> AnalysisResult
- result.to_json() -> str
- result.to_html() -> str
- AnalyzerConfig(language="en", algorithms=["flesch-kincaid"])
```
### Structure of a Good Tutorial
Every effective tutorial follows this pattern:
- Set the scene: What will the reader build? What will they learn?
- Prerequisites: What do they need before starting?
- Step-by-step instructions: Each step includes code, explanation, and expected output
- Verification points: After each major step, tell the reader what they should see
- Next steps: Where to go after completing the tutorial
> **Callout: The "Works on My Machine" Problem**
>
> The most common failure in tutorials is untested code examples. Every code block in a tutorial should be extracted and run as part of your CI pipeline. If you change your API, your tutorials should fail in CI before they fail for your users. Tools like pytest's `doctest` mode or dedicated tools like `cog` can help automate this. As we discussed in Chapter 21 on testing, verification is not optional.
### How-To Guides
How-to guides are more focused than tutorials. They assume the reader has basic familiarity and wants to accomplish a specific task. A good how-to guide:
- States the goal in the title ("How to Export Analysis Results to PDF")
- Lists prerequisites briefly
- Provides step-by-step instructions without excessive explanation
- Includes a complete, copy-pasteable code example
- Notes any common pitfalls
When using AI to write how-to guides, provide the specific API surface the guide should cover and any common mistakes users make. AI assistants produce better guides when they know what to warn users about.
## 23.7 Changelog and Release Notes

### Why Changelogs Matter
A changelog is a chronological record of notable changes in a project. It answers the question every user has when a new version is released: "What changed, and does it affect me?"
The Keep a Changelog format (keepachangelog.com) provides a clear, consistent structure:
```markdown
# Changelog

All notable changes to this project will be documented in this
file.

The format is based on [Keep a Changelog](https://keepachangelog.com/),
and this project adheres to
[Semantic Versioning](https://semver.org/).

## [Unreleased]

### Added
- Support for SMOG readability algorithm
- PDF export with customizable templates

### Changed
- Improved sentence detection for non-English languages

## [1.2.0] - 2025-11-01

### Added
- Coleman-Liau readability index
- Batch processing for multiple files
- Progress callback for long-running analyses

### Fixed
- Incorrect word count for hyphenated words
- Memory leak when processing files over 1GB

### Deprecated
- `analyze_string()` method (use `analyze()` instead)

## [1.1.0] - 2025-10-15

### Added
- HTML export format
- Custom algorithm registration

### Changed
- Default language changed from "auto" to "en" for
  deterministic behavior

### Security
- Updated dependency `lxml` to 5.1.0 to address CVE-2025-XXXX
```
### Categories in a Changelog
The Keep a Changelog format defines these categories:
- **Added** — New features
- **Changed** — Changes in existing functionality
- **Deprecated** — Features that will be removed in future versions
- **Removed** — Features that were removed
- **Fixed** — Bug fixes
- **Security** — Vulnerability fixes
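To see how mechanical this categorization can be, here is a rough keyword-based sketch (illustrative only; real pipelines usually rely on structured Conventional Commit prefixes such as `feat:` and `fix:` rather than guessing from prose):

```python
# Sketch: bucket commit subjects into Keep a Changelog categories
# by leading keyword. Keyword lists here are illustrative.
from collections import defaultdict

KEYWORDS = {
    "Added": ("add", "introduce", "support"),
    "Fixed": ("fix", "repair", "resolve"),
    "Deprecated": ("deprecate",),
    "Removed": ("remove", "drop"),
    "Security": ("security", "cve"),
}


def categorize(subjects: list[str]) -> dict[str, list[str]]:
    """Group commit subjects into changelog categories."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for subject in subjects:
        lowered = subject.lower()
        for category, words in KEYWORDS.items():
            if any(lowered.startswith(w) for w in words):
                buckets[category].append(subject)
                break
        else:
            # Anything unrecognized defaults to "Changed".
            buckets["Changed"].append(subject)
    return dict(buckets)
```

An AI assistant performs essentially this mapping, but with far more judgment; the script illustrates why well-written commit subjects make changelog generation nearly automatic.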
### AI-Assisted Changelog Generation
You can generate changelog entries from git commit history:
```text
Here are the git commits since our last release (v1.1.0):

abc1234 Add SMOG readability algorithm
def5678 Fix word count for hyphenated words
ghi9012 Improve sentence detection for French and German
jkl3456 Add PDF export with Jinja2 templates
mno7890 Fix memory leak in streaming file processor
pqr1234 Update lxml to 5.1.0 (security fix)
stu5678 Deprecate analyze_string in favor of analyze

Please generate a changelog entry in Keep a Changelog format
for the next release (v1.2.0).
```
The AI organizes commits into the correct categories and writes user-facing descriptions. Your job is to verify accuracy and add context that commit messages alone do not convey, especially around breaking changes and migration steps.
> **Practical Tip:** Maintain an `[Unreleased]` section in your changelog and update it with each merge to the main branch. This practice prevents the frantic, error-prone changelog writing that happens right before a release. Many teams automate this with CI checks that reject pull requests without a changelog entry.
### Release Notes vs. Changelog
Release notes are a higher-level, more narrative document aimed at a broader audience. While a changelog lists every change, release notes highlight the most important ones and provide context:
```markdown
# Release Notes: v1.2.0

## Highlights

### PDF Export

You can now export analysis results directly to PDF with
customizable templates. Use `result.to_pdf(template="report")`
to generate a formatted report.

### Improved Multilingual Support

Sentence detection has been significantly improved for French
and German documents, resulting in more accurate readability
scores for non-English content.

## Breaking Changes

None in this release.

## Migration Guide

The `analyze_string()` method is now deprecated. Replace calls
to `analyzer.analyze_string(text)` with `analyzer.analyze(text)`.
The deprecated method will be removed in v2.0.0.

## Full Changelog

See CHANGELOG.md for the complete list of changes.
```
## 23.8 Code Comments: When and How

### The Comment Debate
Few topics in software engineering generate more debate than code comments. One camp insists that "good code is self-documenting" and that comments are a code smell. The other insists that comments are essential for understanding complex logic. Both positions contain truth, and neither is complete.
The resolution lies in understanding what comments should and should not do.
When to Comment
Comment the "why," not the "what." Code tells you what is happening. Comments should explain why it is happening that way:
```python
# Bad: describes what the code does (obvious from the code)
# Increment the counter by one
counter += 1

# Good: explains why this approach was chosen
# We use a manual retry loop instead of tenacity because we need
# to modify the request payload between retries (adding a
# cache-bust parameter), which tenacity's retry decorator
# does not support.
for attempt in range(max_retries):
    response = make_request(payload)
    if response.ok:
        break
    payload["cache_bust"] = generate_cache_bust()
```
Comment non-obvious constraints and edge cases:
```python
# The API returns dates in ISO 8601 format, but without timezone
# info. We assume UTC because the API documentation specifies
# "all timestamps are in UTC" (see: docs.example.com/timestamps).
created_at = datetime.fromisoformat(data["created_at"]).replace(
    tzinfo=timezone.utc
)
```
Comment workarounds and temporary fixes:
```python
# HACK: The upstream API returns HTTP 200 with an error message
# in the body instead of using proper HTTP error codes.
# We have filed issue #4521 with the vendor.
# TODO: Remove this workaround when the vendor fixes their API.
if response.status_code == 200 and "error" in response.json():
    raise APIError(response.json()["error"])
```
Comment regular expressions and complex algorithms:
```python
# Match email addresses: local-part@domain.tld
# - Local part: alphanumeric, dots, hyphens, underscores
# - Domain: alphanumeric segments separated by dots
# - TLD: 2-63 alphabetic characters
EMAIL_PATTERN = re.compile(
    r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,63}"
)
```
When Not to Comment
Do not comment obvious code. If the code is clear from its naming and structure, a comment adds noise:
```python
# Bad: the function name and type hints tell the whole story
# Get the user's full name by combining first and last name
def get_full_name(first_name: str, last_name: str) -> str:
    return f"{first_name} {last_name}"

# Good: no comment needed
def get_full_name(first_name: str, last_name: str) -> str:
    return f"{first_name} {last_name}"
```
Do not use comments to compensate for bad naming. If you need a comment to explain what a variable or function does, rename it instead:
```python
# Bad
d = 7  # number of days until expiration

# Good
days_until_expiration = 7
```
Do not leave commented-out code. Version control exists to preserve old code. Commented-out code clutters the codebase and creates confusion about whether it should be restored:
```python
# Bad: is this code supposed to be here? Should it be restored?
# result = legacy_processor.analyze(data)
# if result.score < threshold:
#     return fallback_result
result = new_processor.analyze(data)
```
Vibe Coding Consideration: AI-generated code sometimes includes excessive comments that explain obvious operations. When reviewing AI-generated code, remove comments that do not add information beyond what the code itself conveys. Ask your AI assistant: "Remove comments from this code that merely restate what the code does. Keep comments that explain why a particular approach was chosen."
Comments in AI-Generated Code
When you generate code with an AI assistant, consider adding a brief comment at the module or function level indicating that the code was AI-generated and noting the key constraints you provided. This is not about credit or blame; it is about helping future maintainers understand the code's provenance:
```python
# Generated with AI assistance. Key constraints:
# - Must handle CSV files up to 10GB using streaming (no full load)
# - Must support both comma and tab delimiters
# - Error handling follows the project's "fail fast" convention
def process_large_csv(
    file_path: str,
    delimiter: str = ",",
    chunk_size: int = 10000,
) -> Iterator[pd.DataFrame]:
    ...
```
23.9 Documentation-Driven Development
Write the Docs First
Documentation-driven development (DDD) inverts the traditional workflow. Instead of writing code first and documenting it afterward, you write the documentation first and use it to guide implementation. This approach has several benefits:
- Forces clarity of thought. If you cannot explain what a function does in a docstring, you do not yet understand what it should do.
- Catches design issues early. Writing API documentation before implementation reveals awkward interfaces, missing parameters, and inconsistent naming.
- Produces better APIs. When you write the usage examples first, you design the API from the consumer's perspective rather than the implementer's perspective.
- Documentation is never out of date. Because the docs came first, the code matches the docs by construction.
DDD in Practice
Here is a documentation-driven workflow for building a new feature:
**Step 1: Write the README section.**
## Document Comparison
Compare two documents to identify differences in content,
structure, and readability.
### Basic Usage
```python
from docanalyzer import DocumentComparator
comparator = DocumentComparator()
result = comparator.compare("doc1.txt", "doc2.txt")
print(f"Similarity: {result.similarity_score:.1%}")
print(f"Added sections: {len(result.additions)}")
print(f"Removed sections: {len(result.removals)}")
```

### Detailed Comparison

```python
for change in result.changes:
    print(f"[{change.type}] {change.description}")
    if change.type == "modified":
        print(f"  Before: {change.before[:100]}...")
        print(f"  After: {change.after[:100]}...")
```
**Step 2: Write the docstrings.**
```python
class DocumentComparator:
    """Compare two documents for content and structural differences.

    Provides line-level and section-level comparison of documents,
    along with readability score changes.

    Args:
        algorithm: Comparison algorithm to use. Options are
            "difflib" (default, best for small documents) and
            "patience" (better for large, restructured documents).

    Example:
        >>> comp = DocumentComparator()
        >>> result = comp.compare("old.txt", "new.txt")
        >>> print(result.similarity_score)
        0.85
    """

    def compare(
        self,
        source_path: str,
        target_path: str,
        context_lines: int = 3,
    ) -> ComparisonResult:
        """Compare two documents and return detailed results.

        Args:
            source_path: Path to the original document.
            target_path: Path to the modified document.
            context_lines: Number of surrounding lines to include
                with each change for context.

        Returns:
            A ComparisonResult containing similarity score,
            list of changes, additions, and removals.

        Raises:
            FileNotFoundError: If either file does not exist.
            UnicodeDecodeError: If either file cannot be decoded.
        """
        ...
```
**Step 3: Write the tests** (following Chapter 21's testing principles).

**Step 4: Implement the code to match the documentation.**
DDD with AI Assistants
Documentation-driven development pairs exceptionally well with vibe coding. Your documentation becomes the specification that you provide to the AI:
Here is the docstring for a class I need you to implement:
[paste the docstring]
And here are the tests it should pass:
[paste the test cases]
Please implement the DocumentComparator class to match this
specification.
This is a more effective prompt than describing what you want in natural language because the docstring is precise, structured, and unambiguous. The AI has a clear contract to fulfill. This approach connects directly to the specification-driven prompting techniques covered in Chapter 10.
Callout: The Connection to TDD
Documentation-driven development complements test-driven development (TDD). In TDD, tests define the expected behavior. In DDD, documentation defines the expected interface and usage patterns. Together, they form a comprehensive specification: tests verify that the code works correctly, while documentation ensures the code is usable correctly. Neither alone is sufficient.
23.10 Maintaining Documentation with AI
The Documentation Drift Problem
Documentation drift occurs when code changes but documentation does not. Over time, outdated documentation becomes worse than no documentation because it actively misleads readers. In traditional projects, documentation drift is managed (or more often, tolerated) through discipline and code review checklists. In AI-assisted projects, we can do better.
Automated Documentation Checks
Integrate documentation checks into your CI pipeline:
Docstring Coverage Check: Ensure all public functions have docstrings.
```python
import ast
from pathlib import Path

def check_docstring_coverage(source_dir: str) -> dict:
    """Check docstring coverage for all Python files in a directory.

    Args:
        source_dir: Path to the source directory to check.

    Returns:
        A dictionary with 'total' and 'documented' counts, a
        'missing' list of undocumented public functions, and a
        'coverage' percentage.
    """
    total = []
    documented = []
    missing = []
    for py_file in Path(source_dir).rglob("*.py"):
        tree = ast.parse(py_file.read_text())
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if node.name.startswith("_"):
                    continue
                total.append(f"{py_file}:{node.name}")
                docstring = ast.get_docstring(node)
                if docstring is not None and docstring.strip():
                    documented.append(f"{py_file}:{node.name}")
                else:
                    missing.append(f"{py_file}:{node.name}")
    return {
        "total": len(total),
        "documented": len(documented),
        "missing": missing,
        "coverage": (
            len(documented) / len(total) * 100 if total else 100
        ),
    }
```
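A CI job can turn this report into a pass/fail gate. The sketch below shows one way to do that; `enforce_coverage` and the 90% threshold are illustrative choices, not part of any standard tool:

```python
def enforce_coverage(report: dict, threshold: float = 90.0) -> int:
    """Return a process exit code: 0 if coverage meets the threshold."""
    if report["coverage"] < threshold:
        for name in report["missing"]:
            print(f"missing docstring: {name}")
        print(f"coverage {report['coverage']:.1f}% is below {threshold}%")
        return 1
    print(f"docstring coverage: {report['coverage']:.1f}%")
    return 0

# Example report in the shape returned by check_docstring_coverage
report = {
    "total": 10,
    "documented": 8,
    "missing": ["mod.py:foo", "mod.py:bar"],
    "coverage": 80.0,
}
print(enforce_coverage(report))  # → 1 (build fails)
```

Wiring this into CI is one call: `sys.exit(enforce_coverage(check_docstring_coverage("src")))`.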
README Link Checker: Verify that all URLs in documentation are reachable.
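A link checker fits in a few lines of standard-library Python. The sketch below is illustrative rather than a hardened tool: the URL regex is deliberately simple, and `check_url` issues a `HEAD` request and treats any network failure as a broken link:

```python
import re
from urllib.request import Request, urlopen

URL_PATTERN = re.compile(r'https?://[^\s)>"]+')

def extract_urls(markdown_text: str) -> list[str]:
    """Collect candidate URLs from a markdown document."""
    # Strip trailing sentence punctuation that the regex over-captures
    return [m.rstrip(".,;:") for m in URL_PATTERN.findall(markdown_text)]

def check_url(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL responds with a non-error status."""
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        return False

readme = 'See [the docs](https://example.com/docs) and https://example.com/api.'
print(extract_urls(readme))  # → ['https://example.com/docs', 'https://example.com/api']
```

In CI you would iterate over `extract_urls(Path("README.md").read_text())` and fail the build on any `check_url` that returns False.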
Example Verification: Extract code examples from documentation and run them as tests.
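Example verification can be sketched with a fence-extracting regex and `exec`. The helpers below are illustrative (a real setup would more likely use doctest or pytest plugins such as pytest-examples); the `FENCE` constant is spelled with string multiplication only to avoid literal fences inside this example:

```python
import re

FENCE = "`" * 3  # three backticks

def extract_python_examples(markdown_text: str) -> list[str]:
    """Pull the bodies of fenced python blocks out of a markdown document."""
    pattern = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
    return pattern.findall(markdown_text)

def run_examples(markdown_text: str) -> int:
    """Execute each extracted example in isolation; return how many ran cleanly."""
    passed = 0
    for source in extract_python_examples(markdown_text):
        try:
            # Fresh, empty globals so examples cannot leak state into each other
            exec(compile(source, "<doc example>", "exec"), {})
            passed += 1
        except Exception as error:  # a failing example should not stop the run
            print(f"example failed: {error}")
    return passed

doc = f"Usage:\n{FENCE}python\nx = 1 + 1\nassert x == 2\n{FENCE}\n"
print(run_examples(doc))  # → 1
```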
Using AI to Detect Documentation Drift
You can prompt an AI assistant to compare documentation against code:
Here is a function's implementation:
[paste current code]
And here is its docstring:
[paste current docstring]
Identify any discrepancies between the code and the docstring.
Check for:
- Parameters that are documented but no longer exist
- Parameters that exist but are not documented
- Return value description that no longer matches
- Exception documentation that is inaccurate
- Examples that would no longer work
This is a task AI assistants handle well because it is a structured comparison with clear criteria. You can automate this by extracting docstrings and function signatures programmatically, then feeding them to an AI API for comparison.
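The extraction step can be sketched with `ast`. The helper below is hypothetical and deliberately simplified (it ignores keyword-only arguments, defaults, and annotations); note how the example's docstring has already drifted by omitting the `streaming` parameter:

```python
import ast

def signature_and_docstring(source: str, name: str) -> tuple:
    """Return a rendered signature and the docstring for a named function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            args = [arg.arg for arg in node.args.args]
            signature = f"{node.name}({', '.join(args)})"
            return signature, ast.get_docstring(node)
    raise ValueError(f"function {name!r} not found")

module_source = '''
def analyze(text, streaming=False):
    """Analyze a document.

    Args:
        text: The document text.
    """
'''
print(signature_and_docstring(module_source, "analyze"))
```

The signature/docstring pair becomes the payload of the drift-detection prompt shown above.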
Documentation Maintenance Workflow
A sustainable documentation maintenance workflow looks like this:
- Pre-commit hook: Check that modified files have updated docstrings (the `code/example-02-docstring-analyzer.py` script can help)
- Pull request template: Include a checkbox: "Documentation updated to reflect changes"
- CI pipeline: Run docstring coverage checks, link verification, and example testing
- Periodic review: Quarterly, use AI to audit documentation for drift across the entire codebase
- Version-pinned docs: Tie documentation versions to code versions so users of older versions see the correct docs
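The pull-request check in the second item can also be enforced mechanically in CI. The sketch below is illustrative; `needs_changelog_entry` and the path conventions (source under any `.py` file outside `tests/`) are assumptions, not a standard tool:

```python
def needs_changelog_entry(changed_paths: list) -> bool:
    """True when source changes are present but CHANGELOG.md was not touched."""
    touches_source = any(
        path.endswith(".py") and not path.startswith("tests/")
        for path in changed_paths
    )
    touches_changelog = "CHANGELOG.md" in changed_paths
    return touches_source and not touches_changelog

print(needs_changelog_entry(["src/analyzer.py", "tests/test_analyzer.py"]))  # → True
print(needs_changelog_entry(["src/analyzer.py", "CHANGELOG.md"]))  # → False
```

A CI job would feed it the output of `git diff --name-only main...HEAD` and reject the pull request when it returns True.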
Practical Tip: Create a `docs-check` make target or npm script that runs all documentation verification in one command. If documentation checks are easy to run locally, developers will run them. If they are buried in CI configuration, no one will think about them until the pipeline fails.
AI-Powered Documentation Updates
When you make significant code changes, ask your AI assistant to update the documentation:
I have modified the `analyze` method to accept an optional
`streaming` parameter (bool, default False). When True,
the method returns an iterator of partial results instead
of a single complete result.
Here is the current docstring and the relevant section of
the user guide. Please update both to reflect this change.
Current docstring:
[paste docstring]
Current user guide section:
[paste guide section]
This is more reliable than rewriting documentation from scratch because the AI preserves the existing style and structure while incorporating the changes.
Documentation as Code
Treat documentation with the same rigor as code:
- Version control: Documentation lives in the repository alongside code
- Code review: Documentation changes go through pull requests
- CI/CD: Documentation is built and validated automatically
- Testing: Code examples in documentation are tested
- Style guide: Establish and enforce consistent documentation style
This "documentation as code" philosophy is particularly natural in vibe-coded projects because the same AI tools that generate code can generate and maintain documentation with equal facility.
Connecting the Dots: Documentation Across Part III
Throughout Part III, you have built increasingly complex software. Each chapter's project would benefit from the documentation practices covered here:
- Chapter 15 (CLI Tools): CLI tools need `--help` text generated from docstrings, README files with installation and usage instructions, and man pages for Unix environments.
- Chapter 16 (Web Frontend): Frontend projects need component documentation, storybooks, and user guides for non-technical stakeholders.
- Chapter 17 (REST APIs): APIs need OpenAPI documentation, authentication guides, rate limiting documentation, and error code references.
- Chapter 18 (Database): Database schemas need entity-relationship diagrams, data dictionaries, and migration guides.
- Chapter 19 (Full Stack): Full-stack projects need architecture overviews, deployment guides, and environment setup documentation.
- Chapter 20 (External APIs): Integration projects need API key setup guides, webhook documentation, and troubleshooting guides.
- Chapter 21 (Testing): Test suites need testing strategy documents, test data requirements, and CI configuration documentation.
- Chapter 22 (Debugging): Debugging workflows need runbooks, known issues lists, and incident response documentation.
Documentation is not a separate activity that happens after building. It is an integral part of building.
Summary
Documentation in vibe-coded projects is both more important and more achievable than in traditional development. AI assistants can generate initial documentation efficiently, but human judgment is essential for accuracy, relevance, and context that AI cannot provide.
The key practices covered in this chapter:
- Treat documentation as a first-class citizen by integrating it into your development workflow from the start
- Write READMEs that answer the five essential questions (what, why, install, use, contribute)
- Leverage framework-native documentation (OpenAPI for APIs, Sphinx/MkDocs for libraries)
- Capture architectural decisions in ADRs before the reasoning is lost
- Choose a docstring convention and apply it consistently (Google style recommended)
- Distinguish tutorials from how-to guides using the Diátaxis framework
- Maintain changelogs incrementally rather than writing them at release time
- Comment the "why," not the "what" in code comments
- Practice documentation-driven development to produce better APIs and more accurate docs
- Automate documentation maintenance with CI checks, drift detection, and AI-assisted updates
In the next part of the book, Part IV, you will learn about software architecture, design patterns, and advanced engineering practices. The documentation skills from this chapter will be essential for communicating architectural decisions and maintaining code quality at scale.
Chapter Summary
This chapter covered the full spectrum of software documentation, from README files to architecture decision records, with a focus on how AI assistants change the documentation landscape. You learned that AI-generated codebases need more documentation, not less, because the decision context is lost when code is generated rather than authored. You explored specific formats (README structure, ADR template, docstring conventions, changelog format), tools (Sphinx, MkDocs, mkdocstrings), and workflows (documentation-driven development, automated drift detection) that make documentation sustainable. The central message: documentation is not an afterthought; it is a first-class deliverable that AI makes easier to produce and maintain.