Chapter 23: Exercises — Documentation and Technical Writing
Tier 1: Remember and Understand (Exercises 1–6)
Exercise 1: Documentation Type Matching
Match each documentation type with its primary purpose and audience.
| Documentation Type | Purpose | Audience |
|---|---|---|
| README | ? | ? |
| ADR | ? | ? |
| Changelog | ? | ? |
| Docstring | ? | ? |
| User Guide | ? | ? |
Fill in the purpose and audience for each type from the following options:
Purposes: Capture decision rationale, Project overview, Track changes between releases, Function/class reference, Task-oriented instructions
Audiences: New users and contributors, Team members and future maintainers, Users and stakeholders, Developers reading source code, End users
Exercise 2: Diátaxis Category Classification
Classify each of the following documentation pieces into the correct Diátaxis category (Tutorial, How-To Guide, Reference, Explanation):
- "Build Your First REST API with FastAPI" — a guided walkthrough for beginners
- "How to Configure Rate Limiting for the /api/analyze Endpoint"
- "DocumentAnalyzer Class — Methods, Parameters, and Return Types"
- "Why We Chose PostgreSQL Over MongoDB for Document Storage"
- "Getting Started: Analyzing Your First Document in 5 Minutes"
- "API Error Codes: Complete Reference Table"
- "How to Deploy the Application to AWS Lambda"
- "Understanding the Plugin Architecture: Design Rationale"
Exercise 3: Docstring Style Identification
Identify which docstring style (Google, NumPy, or Sphinx) each of the following uses:
Sample A:
def process(data, threshold=0.5):
    """Process data with the given threshold.

    :param data: Input data to process.
    :type data: list[float]
    :param threshold: Minimum threshold value.
    :type threshold: float
    :returns: Processed results.
    :rtype: list[float]
    """
Sample B:
def process(data, threshold=0.5):
    """Process data with the given threshold.

    Args:
        data: Input data to process.
        threshold: Minimum threshold value.

    Returns:
        Processed results as a list of floats.
    """
Sample C:
def process(data, threshold=0.5):
    """
    Process data with the given threshold.

    Parameters
    ----------
    data : list[float]
        Input data to process.
    threshold : float, optional
        Minimum threshold value, by default 0.5.

    Returns
    -------
    list[float]
        Processed results.
    """
Exercise 4: Good vs. Bad Comments
For each of the following code comments, determine whether it is a good comment (explains "why") or a bad comment (restates "what"). Explain your reasoning.
# Comment 1
# Increment counter
counter += 1
# Comment 2
# Use binary search instead of linear scan because the dataset
# is always sorted and contains 1M+ entries
index = bisect.bisect_left(sorted_data, target)
# Comment 3
# Check if the user is an admin
if user.role == "admin":
# Comment 4
# The API rate limit resets at the top of each hour (UTC),
# not on a rolling window. We sleep until the next hour
# boundary to avoid wasting retry attempts.
sleep_until = next_hour_boundary(datetime.utcnow())
time.sleep((sleep_until - datetime.utcnow()).total_seconds())
# Comment 5
# Return the result
return result
# Comment 6
# WORKAROUND: pandas 2.1.0 has a bug where read_csv silently
# drops the last row when the file lacks a trailing newline.
# Fixed in 2.1.1 but we're pinned to 2.1.0 for compatibility.
# See: https://github.com/pandas-dev/pandas/issues/54321
df = pd.read_csv(path)
if not raw_text.endswith("\n"):
    last_row = raw_text.strip().split("\n")[-1]
    df = pd.concat([df, pd.DataFrame([last_row.split(",")])])
Exercise 5: README Section Ordering
A colleague drafts a README with the following sections in this order. Reorder them into the optimal sequence and explain your reasoning:
- Contributing
- Features
- License
- Quick Start
- Project Name and Description
- Installation
- API Reference
- Badges
Exercise 6: Changelog Category Assignment
Assign each of the following changes to the correct Keep a Changelog category (Added, Changed, Deprecated, Removed, Fixed, Security):
- New endpoint /api/v2/documents/compare for document comparison
- Updated the analyze() method to return a dataclass instead of a dictionary
- The --verbose flag no longer produces colored output on non-TTY terminals
- Patched a vulnerability in JWT token validation (CVE-2025-XXXX)
- The legacy_export() function is no longer available; use export() instead
- The format parameter of export() will be renamed to output_format in v3.0
- Fixed a crash when analyzing empty documents
- Upgraded bcrypt dependency to address known timing attack vector
Tier 2: Apply (Exercises 7–12)
Exercise 7: Write a Google-Style Docstring
Write a complete Google-style docstring for the following function. Include Args, Returns, Raises, and Example sections:
def merge_documents(
    documents: list[str],
    separator: str = "\n\n",
    deduplicate: bool = False,
    max_length: int | None = None,
) -> str:
    # Merges multiple document strings into one.
    # If deduplicate is True, removes duplicate paragraphs.
    # If max_length is set, truncates the result.
    # Raises ValueError if documents list is empty.
    # Raises TypeError if any element is not a string.
    ...
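As a reminder of the section layout before you start, here is a minimal sketch of a Google-style docstring on an unrelated function. The function `truncate_text` and its parameters are hypothetical and exist only for illustration; they are not part of the exercise.

```python
def truncate_text(text: str, max_length: int, suffix: str = "...") -> str:
    """Shorten text to a maximum length, appending a suffix if truncated.

    Args:
        text: The string to shorten.
        max_length: Maximum length of the returned string, including
            the suffix. Must be at least len(suffix).
        suffix: Marker appended when truncation occurs. Defaults to "...".

    Returns:
        The original text if it already fits, otherwise a truncated
        copy that ends with the suffix.

    Raises:
        ValueError: If max_length is smaller than the suffix length.

    Example:
        >>> truncate_text("documentation", 8)
        'docum...'
    """
    if max_length < len(suffix):
        raise ValueError("max_length must be at least len(suffix)")
    if len(text) <= max_length:
        return text
    # Reserve room for the suffix so the result never exceeds max_length.
    return text[: max_length - len(suffix)] + suffix
```

Note how each section header sits at the same indentation as the summary line, and continuation lines within an entry are indented one level further.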
Exercise 8: Write an ADR
Write a complete Architecture Decision Record for the following scenario:
Your team is building a document analysis web application. You need to decide between using Celery with Redis and using Python's built-in asyncio with FastAPI's background tasks for processing document analysis jobs that take 5-30 seconds. Your team has three developers, all comfortable with async Python. The application needs to handle up to 50 concurrent analysis requests during peak hours. You chose asyncio with FastAPI background tasks.
Use the Nygard format (Status, Date, Context, Decision, Consequences with Positive/Negative/Neutral sections).
Exercise 9: Transform Commit Messages into Changelog Entries
Given the following git log, write a changelog section in Keep a Changelog format for version 2.3.0:
a1b2c3d feat: add support for DOCX file format
d4e5f6g fix: correct encoding detection for UTF-16 files
h7i8j9k refactor: split analyzer into separate modules
l0m1n2o feat: add --json flag to CLI for machine-readable output
p3q4r5s fix: handle permission errors when reading files
t6u7v8w docs: update API reference for new parameters
x9y0z1a chore: update pytest to 8.0
b2c3d4e feat: add progress bar for batch processing
f5g6h7i security: update cryptography to 42.0.0 (CVE-2025-1234)
j8k9l0m deprecate: mark XMLExporter as deprecated in favor of export()
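If you want to automate the first pass, the grouping step could be sketched as follows. The prefix-to-category mapping is an assumption made for this sketch, as is the choice to leave internal-only prefixes (refactor, docs, chore) out of the changelog. The grouped messages still need to be rewritten by hand into user-facing language.

```python
from collections import defaultdict

# Assumed mapping from commit prefixes to Keep a Changelog categories.
# Prefixes not listed here are treated as internal and skipped.
PREFIX_TO_CATEGORY = {
    "feat": "Added",
    "fix": "Fixed",
    "security": "Security",
    "deprecate": "Deprecated",
}

# Category order used by Keep a Changelog.
CATEGORY_ORDER = ["Added", "Changed", "Deprecated", "Removed", "Fixed", "Security"]


def group_commits(log_lines: list[str]) -> dict[str, list[str]]:
    """Group commit messages by changelog category."""
    grouped = defaultdict(list)
    for line in log_lines:
        # Each line is expected to look like "<hash> <prefix>: <message>".
        _hash, _, rest = line.partition(" ")
        prefix, sep, message = rest.partition(": ")
        category = PREFIX_TO_CATEGORY.get(prefix)
        if sep and category:
            grouped[category].append(message)
    # Emit only the categories that have entries, in the standard order.
    return {c: grouped[c] for c in CATEGORY_ORDER if c in grouped}


log = [
    "a1b2c3d feat: add support for DOCX file format",
    "x9y0z1a chore: update pytest to 8.0",
    "d4e5f6g fix: correct encoding detection for UTF-16 files",
]
for category, messages in group_commits(log).items():
    print(category, messages)
```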
Exercise 10: Write a Tutorial Introduction
Write the introduction section (approximately 300 words) for a tutorial titled "Build a Markdown Documentation Site with MkDocs." The tutorial is aimed at Python developers who have never used MkDocs. Include:
- What the reader will build
- What they will learn
- Prerequisites
- Estimated time to complete
Exercise 11: Improve an Existing Docstring
The following docstring is incomplete and uses inconsistent formatting. Rewrite it in proper Google style with full parameter documentation:
def search_documents(query, filters=None, page=1, per_page=20,
sort_by="relevance", include_deleted=False):
"""Search for documents.
query - the search query
filters - optional filters
Returns results.
"""
...
Exercise 12: Create a README Quick Start Section
Write a Quick Start section for a Python library called textmetrics that:
- Is installed with pip install textmetrics
- Has a main class TextAnalyzer that takes a string of text
- Has methods readability_score(), word_frequency(), and summary(max_sentences=3)
- Each method returns different types (float, dict, str)
Include installation, a minimal code example, and expected output.
Tier 3: Analyze (Exercises 13–18)
Exercise 13: Documentation Audit
Analyze the following Python module and identify all documentation gaps. List every function, class, and method that is missing or has incomplete documentation:
"""Utility functions for text processing."""
import re
from collections import Counter

class TextProcessor:
    def __init__(self, language="en"):
        self.language = language
        self._stop_words = self._load_stop_words()

    def _load_stop_words(self):
        # Load stop words for the given language
        stop_words = {"en": {"the", "a", "an", "is", "are", "was"}}
        return stop_words.get(self.language, set())

    def tokenize(self, text):
        """Split text into tokens."""
        return re.findall(r'\b\w+\b', text.lower())

    def word_count(self, text):
        return len(self.tokenize(text))

    def unique_words(self, text):
        """Get unique words."""
        tokens = self.tokenize(text)
        return set(tokens) - self._stop_words

    def word_frequencies(self, text, top_n=10):
        tokens = self.tokenize(text)
        filtered = [t for t in tokens if t not in self._stop_words]
        return Counter(filtered).most_common(top_n)

    def sentence_count(self, text):
        return len(re.split(r'[.!?]+', text.strip())) - 1

    def average_word_length(self, text):
        """Calculate average word length.

        Args:
            text: Input text.
        """
        tokens = self.tokenize(text)
        if not tokens:
            return 0.0
        return sum(len(t) for t in tokens) / len(tokens)

def readability_score(text, algorithm="flesch"):
    words = len(re.findall(r'\b\w+\b', text))
    sentences = len(re.split(r'[.!?]+', text.strip()))
    syllables = sum(count_syllables(w) for w in text.split())
    if algorithm == "flesch":
        return 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    raise ValueError(f"Unknown algorithm: {algorithm}")

def count_syllables(word):
    word = word.lower()
    count = 0
    vowels = "aeiouy"
    if word[0] in vowels:
        count += 1
    for i in range(1, len(word)):
        if word[i] in vowels and word[i-1] not in vowels:
            count += 1
    if word.endswith("e"):
        count -= 1
    return max(count, 1)
For each gap, specify what information is missing and write the corrected docstring.
Exercise 14: Compare Documentation Approaches
You are evaluating two documentation strategies for a medium-sized Python library (50 modules, 200 public functions). Compare the following approaches:
- Approach A: Sphinx with autodoc, reStructuredText, hosted on Read the Docs
- Approach B: MkDocs with mkdocstrings, Markdown, hosted on GitHub Pages
Analyze each approach across these dimensions:
- Setup complexity
- Maintenance burden
- Contributor friendliness
- Output quality
- Integration with CI/CD
- Support for code examples
- Search functionality
Which would you recommend for a new open-source project in 2025, and why?
Exercise 15: Identify Documentation Drift
Given the following function implementation and its docstring, identify all discrepancies:
def export_report(
    data: dict,
    format: str = "json",
    output_path: str | None = None,
    pretty: bool = True,
    include_metadata: bool = True,
) -> str:
    """Export analysis report in the specified format.

    Args:
        data: The analysis data to export. Must contain
            'results' and 'summary' keys.
        format: Output format. Supported values are "json",
            "csv", and "html". Defaults to "json".
        output_path: Path to write the output file. If None,
            returns the formatted string without writing.
        indent: Number of spaces for indentation in JSON
            output. Defaults to 2.

    Returns:
        The formatted report as a string.

    Raises:
        ValueError: If format is not supported.
        KeyError: If data is missing required keys.

    Example:
        >>> report = export_report(data, format="html")
        >>> print(report[:50])
        <html><head><title>Analysis Report</title>...
    """
    ...
Exercise 16: ADR Evaluation
Read the following ADR and evaluate it for completeness, clarity, and usefulness. Identify what is missing or could be improved:
# ADR-005: Use React
## Status
Accepted
## Context
We need a frontend framework.
## Decision
We will use React.
## Consequences
React is popular and well-supported.
Exercise 17: Docstring vs. Comment Analysis
For each of the following code blocks, determine whether the documentation should be a docstring, an inline comment, both, or neither. Explain your reasoning:
# Block A
MAX_RETRIES = 3 # Maximum number of retry attempts
# Block B
def calculate_tax(amount, rate):
    return amount * rate
# Block C
# We use exponential backoff here because the API docs
# recommend it for 429 responses, and linear backoff
# caused cascading failures in production (incident #127)
delay = base_delay * (2 ** attempt)
# Block D
class DatabaseConnection:
    pass
# Block E
SUPPORTED_FORMATS = ["json", "csv", "html", "pdf", "xml"]
Exercise 18: Documentation Strategy Analysis
A startup has the following codebase profile:
- 3 microservices (Python FastAPI)
- 1 React frontend
- 2 developers, growing to 5 in 6 months
- No existing documentation except code comments
- Public API consumed by 12 external customers
- Internal tools used by a support team of 4
Design a documentation priority list. Which documentation types should they create first, second, and third? Justify your ordering based on impact and effort.
Tier 4: Evaluate (Exercises 19–24)
Exercise 19: Evaluate AI-Generated Documentation
An AI assistant generated the following README for a Python package. Evaluate it on completeness, accuracy signals, usability, and areas for improvement:
# DataPipeline
A powerful data pipeline framework for Python.
## Installation
pip install datapipeline
## Usage
```python
from datapipeline import Pipeline
p = Pipeline()
p.add_step(clean_data)
p.add_step(transform_data)
p.add_step(load_data)
p.run(input_data)
```
## Features
- Easy to use
- Fast
- Scalable
- Well-tested
## License
MIT
Write a detailed critique covering at least 8 specific improvements.
Exercise 20: Compare Docstring Conventions
You are establishing docstring standards for a new team. Three team members advocate for different styles:
- Developer A wants Google style because "it reads naturally in source code"
- Developer B wants NumPy style because "the scientific Python ecosystem uses it"
- Developer C wants Sphinx style because "it integrates directly with Sphinx without extensions"
The team is building a data processing library that will be used by both web developers and data scientists. Evaluate each position and make a recommendation with justification.
Exercise 21: Evaluate Documentation-Driven Development
Your team lead proposes adopting documentation-driven development (writing docs before code) for all new features. A senior developer objects, arguing that:
1. "We waste time documenting features that change during implementation"
2. "It slows down development velocity"
3. "We already have tests; we don't need docs first too"
Evaluate each objection. For each one, explain whether it is valid, partially valid, or invalid, and provide a nuanced response that acknowledges trade-offs.
Exercise 22: Changelog Quality Assessment
Evaluate the following changelog entry for quality, completeness, and usefulness to the target audience:
```markdown
## [3.0.0] - 2025-12-01
### Changed
- Refactored core module
- Updated dependencies
- Improved performance
- Changed API
### Fixed
- Various bug fixes
```
Identify at least 6 specific problems and rewrite the changelog entry with hypothetical but realistic content that demonstrates best practices.
Exercise 23: Evaluate Comment Necessity
Review the following function and evaluate each comment. For each comment, state whether it should be kept, removed, or rewritten, and explain why:
def process_transaction(transaction: dict) -> dict:
    """Process a financial transaction."""
    # Get the amount from the transaction
    amount = transaction["amount"]
    # Check if amount is negative
    if amount < 0:
        # Negative amounts are refunds, which use a different
        # processing pipeline because our payment provider
        # (Stripe) requires refund requests to reference the
        # original charge ID, and the refund amount must not
        # exceed the original charge
        return process_refund(transaction)
    # Apply the tax rate
    tax = amount * TAX_RATE
    # Calculate the total
    total = amount + tax
    # The processing fee is capped at $50 per transaction
    # per our merchant agreement (contract ref: MA-2025-0042)
    fee = min(total * FEE_RATE, 50.00)
    # Return the result
    return {
        "amount": amount,
        "tax": tax,
        "fee": fee,
        "total": total + fee,
    }
Exercise 24: Evaluate Documentation Tools
Your team needs to choose a documentation hosting and generation solution. Evaluate the following three options for a Python library with 100+ public API endpoints:
- Option A: Sphinx + Read the Docs (free for open source)
- Option B: MkDocs Material + GitHub Pages (free)
- Option C: Docusaurus + Netlify (free tier)
Evaluate each across: learning curve, Python-specific features, versioning support, search quality, customization, community support, and CI/CD integration. Make a recommendation.
Tier 5: Create (Exercises 25–30)
Exercise 25: Create a Complete Documentation Suite
Create the following documentation files for a fictional Python library called logwise that provides structured logging with automatic context enrichment:
- A README.md (at least 500 words)
- A CONTRIBUTING.md (at least 300 words)
- A CHANGELOG.md with entries for versions 1.0.0 through 1.2.0
- A docstring for the main Logger class and its three primary methods
Exercise 26: Build a Documentation CI Pipeline
Design and write the configuration for a GitHub Actions workflow that:
- Checks docstring coverage (fails if below 90%)
- Validates all links in Markdown files
- Runs code examples extracted from documentation
- Builds MkDocs documentation
- Deploys to GitHub Pages on merge to main
Write the complete .github/workflows/docs.yml file and any supporting scripts.
Exercise 27: Create an ADR System
Design a complete ADR management system for a team project:
- Write a template (adr-template.md) with all required sections
- Write ADR-001 about your chosen ADR format and process
- Write ADR-002 about a technical decision of your choice
- Create a docs/adr/README.md that serves as an index of all ADRs
Exercise 28: Write a Complete Tutorial
Write a complete tutorial (at least 1,500 words) that teaches a Python developer how to add comprehensive documentation to an existing undocumented Flask API. The tutorial should cover:
- Adding docstrings to existing routes and functions
- Setting up Sphinx with autodoc
- Writing a getting-started guide
- Configuring automated documentation builds
Include all code examples, configuration files, and expected outputs.
Exercise 29: Create a Documentation Drift Detector
Write a Python script that:
- Parses all Python files in a given directory
- Extracts function signatures (parameter names and types)
- Extracts docstring parameter documentation
- Compares the two and reports discrepancies
- Outputs a report in both human-readable and JSON formats
The script should handle Google, NumPy, and Sphinx docstring styles. Include complete type hints and docstrings for the script itself.
Exercise 30: Design a Documentation Strategy
You are the technical lead for a new project: a Python-based data pipeline that will be:
- Used by 5 internal development teams
- Consumed as a library by 3 external partner companies
- Operated by a DevOps team with limited Python experience
- Extended by a plugin system for custom data transformations
Create a comprehensive documentation strategy document that includes:
- Documentation types needed and their priority
- Tools and hosting decisions (with justification)
- Docstring style guide
- Template for each documentation type
- Maintenance workflow and responsibilities
- Quality metrics and targets
- Timeline for initial documentation creation
The strategy document should be at least 2,000 words and reference specific tools, formats, and practices covered in this chapter.