Chapter 23: Exercises — Documentation and Technical Writing

Tier 1: Remember and Understand (Exercises 1–6)

Exercise 1: Documentation Type Matching

Match each documentation type with its primary purpose and audience.

| Documentation Type | Purpose | Audience |
| ------------------ | ------- | -------- |
| README             | ?       | ?        |
| ADR                | ?       | ?        |
| Changelog          | ?       | ?        |
| Docstring          | ?       | ?        |
| User Guide         | ?       | ?        |

Fill in the purpose and audience for each type from the following options:

Purposes: Capture decision rationale, Project overview, Track changes between releases, Function/class reference, Task-oriented instructions

Audiences: New users and contributors, Team members and future maintainers, Users and stakeholders, Developers reading source code, End users

Exercise 2: Diátaxis Category Classification

Classify each of the following documentation pieces into the correct Diátaxis category (Tutorial, How-To Guide, Reference, Explanation):

  1. "Build Your First REST API with FastAPI" — a guided walkthrough for beginners
  2. "How to Configure Rate Limiting for the /api/analyze Endpoint"
  3. "DocumentAnalyzer Class — Methods, Parameters, and Return Types"
  4. "Why We Chose PostgreSQL Over MongoDB for Document Storage"
  5. "Getting Started: Analyzing Your First Document in 5 Minutes"
  6. "API Error Codes: Complete Reference Table"
  7. "How to Deploy the Application to AWS Lambda"
  8. "Understanding the Plugin Architecture: Design Rationale"

Exercise 3: Docstring Style Identification

Identify which docstring style (Google, NumPy, or Sphinx) each of the following uses:

Sample A:

def process(data, threshold=0.5):
    """Process data with the given threshold.

    :param data: Input data to process.
    :type data: list[float]
    :param threshold: Minimum threshold value.
    :type threshold: float
    :returns: Processed results.
    :rtype: list[float]
    """

Sample B:

def process(data, threshold=0.5):
    """Process data with the given threshold.

    Args:
        data: Input data to process.
        threshold: Minimum threshold value.

    Returns:
        Processed results as a list of floats.
    """

Sample C:

def process(data, threshold=0.5):
    """
    Process data with the given threshold.

    Parameters
    ----------
    data : list[float]
        Input data to process.
    threshold : float, optional
        Minimum threshold value, by default 0.5.

    Returns
    -------
    list[float]
        Processed results.
    """

Exercise 4: Good vs. Bad Comments

For each of the following code comments, determine whether it is a good comment (explains "why") or a bad comment (restates "what"). Explain your reasoning.

# Comment 1
# Increment counter
counter += 1

# Comment 2
# Use binary search instead of linear scan because the dataset
# is always sorted and contains 1M+ entries
index = bisect.bisect_left(sorted_data, target)

# Comment 3
# Check if the user is an admin
if user.role == "admin":

# Comment 4
# The API rate limit resets at the top of each hour (UTC),
# not on a rolling window. We sleep until the next hour
# boundary to avoid wasting retry attempts.
sleep_until = next_hour_boundary(datetime.utcnow())
time.sleep((sleep_until - datetime.utcnow()).total_seconds())

# Comment 5
# Return the result
return result

# Comment 6
# WORKAROUND: pandas 2.1.0 has a bug where read_csv silently
# drops the last row when the file lacks a trailing newline.
# Fixed in 2.1.1 but we're pinned to 2.1.0 for compatibility.
# See: https://github.com/pandas-dev/pandas/issues/54321
df = pd.read_csv(path)
if not raw_text.endswith("\n"):
    last_row = raw_text.strip().split("\n")[-1]
    df = pd.concat([df, pd.DataFrame([last_row.split(",")])])

Exercise 5: README Section Ordering

A colleague drafts a README with the following sections in this order. Reorder them into the optimal sequence and explain your reasoning:

  1. Contributing
  2. Features
  3. License
  4. Quick Start
  5. Project Name and Description
  6. Installation
  7. API Reference
  8. Badges

Exercise 6: Changelog Category Assignment

Assign each of the following changes to the correct Keep a Changelog category (Added, Changed, Deprecated, Removed, Fixed, Security):

  1. New endpoint /api/v2/documents/compare for document comparison
  2. Updated the analyze() method to return a dataclass instead of a dictionary
  3. The --verbose flag no longer produces colored output on non-TTY terminals
  4. Patched a vulnerability in JWT token validation (CVE-2025-XXXX)
  5. The legacy_export() function is no longer available; use export() instead
  6. The format parameter of export() will be renamed to output_format in v3.0
  7. Fixed a crash when analyzing empty documents
  8. Upgraded bcrypt dependency to address known timing attack vector

Tier 2: Apply (Exercises 7–12)

Exercise 7: Write a Google-Style Docstring

Write a complete Google-style docstring for the following function. Include Args, Returns, Raises, and Example sections:

def merge_documents(
    documents: list[str],
    separator: str = "\n\n",
    deduplicate: bool = False,
    max_length: int | None = None,
) -> str:
    # Merges multiple document strings into one.
    # If deduplicate is True, removes duplicate paragraphs.
    # If max_length is set, truncates the result.
    # Raises ValueError if documents list is empty.
    # Raises TypeError if any element is not a string.
    ...
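
As a structural reminder (not a solution to this exercise), here is a minimal Google-style docstring on a small, hypothetical function unrelated to merge_documents:

```python
def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Clamp a value to the inclusive range [low, high].

    Args:
        value: The number to clamp.
        low: Lower bound of the range. Defaults to 0.0.
        high: Upper bound of the range. Defaults to 1.0.

    Returns:
        The value, limited to the range [low, high].

    Raises:
        ValueError: If low is greater than high.

    Example:
        >>> clamp(1.7)
        1.0
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))
```

Your docstring for merge_documents should follow the same section order: one-line summary, Args, Returns, Raises, Example.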

Exercise 8: Write an ADR

Write a complete Architecture Decision Record for the following scenario:

Your team is building a document analysis web application. You need to decide between using Celery with Redis and using Python's built-in asyncio with FastAPI's background tasks for processing document analysis jobs that take 5-30 seconds. Your team has three developers, all comfortable with async Python. The application needs to handle up to 50 concurrent analysis requests during peak hours. You chose asyncio with FastAPI background tasks.

Use the Nygard format (Status, Date, Context, Decision, Consequences with Positive/Negative/Neutral sections).

Exercise 9: Transform Commit Messages into Changelog Entries

Given the following git log, write a changelog section in Keep a Changelog format for version 2.3.0:

a1b2c3d feat: add support for DOCX file format
d4e5f6g fix: correct encoding detection for UTF-16 files
h7i8j9k refactor: split analyzer into separate modules
l0m1n2o feat: add --json flag to CLI for machine-readable output
p3q4r5s fix: handle permission errors when reading files
t6u7v8w docs: update API reference for new parameters
x9y0z1a chore: update pytest to 8.0
b2c3d4e feat: add progress bar for batch processing
f5g6h7i security: update cryptography to 42.0.0 (CVE-2025-1234)
j8k9l0m deprecate: mark XMLExporter as deprecated in favor of export()

Exercise 10: Write a Tutorial Introduction

Write the introduction section (approximately 300 words) for a tutorial titled "Build a Markdown Documentation Site with MkDocs." The tutorial is aimed at Python developers who have never used MkDocs. Include:

  • What the reader will build
  • What they will learn
  • Prerequisites
  • Estimated time to complete

Exercise 11: Improve an Existing Docstring

The following docstring is incomplete and uses inconsistent formatting. Rewrite it in proper Google style with full parameter documentation:

def search_documents(query, filters=None, page=1, per_page=20,
                     sort_by="relevance", include_deleted=False):
    """Search for documents.

    query - the search query
    filters - optional filters

    Returns results.
    """
    ...

Exercise 12: Create a README Quick Start Section

Write a Quick Start section for a Python library called textmetrics that:

  • Is installed with pip install textmetrics
  • Has a main class TextAnalyzer that takes a string of text
  • Has methods readability_score(), word_frequency(), and summary(max_sentences=3)
  • Each method returns a different type (float, dict, str)

Include installation, a minimal code example, and expected output.


Tier 3: Analyze (Exercises 13–18)

Exercise 13: Documentation Audit

Analyze the following Python module and identify all documentation gaps. List every function, class, and method that is missing or has incomplete documentation:

"""Utility functions for text processing."""

import re
from collections import Counter


class TextProcessor:
    def __init__(self, language="en"):
        self.language = language
        self._stop_words = self._load_stop_words()

    def _load_stop_words(self):
        # Load stop words for the given language
        stop_words = {"en": {"the", "a", "an", "is", "are", "was"}}
        return stop_words.get(self.language, set())

    def tokenize(self, text):
        """Split text into tokens."""
        return re.findall(r'\b\w+\b', text.lower())

    def word_count(self, text):
        return len(self.tokenize(text))

    def unique_words(self, text):
        """Get unique words."""
        tokens = self.tokenize(text)
        return set(tokens) - self._stop_words

    def word_frequencies(self, text, top_n=10):
        tokens = self.tokenize(text)
        filtered = [t for t in tokens if t not in self._stop_words]
        return Counter(filtered).most_common(top_n)

    def sentence_count(self, text):
        return len(re.split(r'[.!?]+', text.strip())) - 1

    def average_word_length(self, text):
        """Calculate average word length.

        Args:
            text: Input text.
        """
        tokens = self.tokenize(text)
        if not tokens:
            return 0.0
        return sum(len(t) for t in tokens) / len(tokens)


def readability_score(text, algorithm="flesch"):
    words = len(re.findall(r'\b\w+\b', text))
    sentences = len(re.split(r'[.!?]+', text.strip()))
    syllables = sum(count_syllables(w) for w in text.split())
    if algorithm == "flesch":
        return 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    raise ValueError(f"Unknown algorithm: {algorithm}")


def count_syllables(word):
    word = word.lower()
    count = 0
    vowels = "aeiouy"
    if word[0] in vowels:
        count += 1
    for i in range(1, len(word)):
        if word[i] in vowels and word[i-1] not in vowels:
            count += 1
    if word.endswith("e"):
        count -= 1
    return max(count, 1)

For each gap, specify what information is missing and write the corrected docstring.

Exercise 14: Compare Documentation Approaches

You are evaluating two documentation strategies for a medium-sized Python library (50 modules, 200 public functions). Compare the following approaches:

Approach A: Sphinx with autodoc, reStructuredText, hosted on Read the Docs
Approach B: MkDocs with mkdocstrings, Markdown, hosted on GitHub Pages

Analyze each approach across these dimensions:

  • Setup complexity
  • Maintenance burden
  • Contributor friendliness
  • Output quality
  • Integration with CI/CD
  • Support for code examples
  • Search functionality

Which would you recommend for a new open-source project in 2025, and why?

Exercise 15: Identify Documentation Drift

Given the following function implementation and its docstring, identify all discrepancies:

def export_report(
    data: dict,
    format: str = "json",
    output_path: str | None = None,
    pretty: bool = True,
    include_metadata: bool = True,
) -> str:
    """Export analysis report in the specified format.

    Args:
        data: The analysis data to export. Must contain
            'results' and 'summary' keys.
        format: Output format. Supported values are "json",
            "csv", and "html". Defaults to "json".
        output_path: Path to write the output file. If None,
            returns the formatted string without writing.
        indent: Number of spaces for indentation in JSON
            output. Defaults to 2.

    Returns:
        The formatted report as a string.

    Raises:
        ValueError: If format is not supported.
        KeyError: If data is missing required keys.

    Example:
        >>> report = export_report(data, format="html")
        >>> print(report[:50])
        <html><head><title>Analysis Report</title>...
    """
    ...

Exercise 16: ADR Evaluation

Read the following ADR and evaluate it for completeness, clarity, and usefulness. Identify what is missing or could be improved:

# ADR-005: Use React

## Status
Accepted

## Context
We need a frontend framework.

## Decision
We will use React.

## Consequences
React is popular and well-supported.

Exercise 17: Docstring vs. Comment Analysis

For each of the following code blocks, determine whether the documentation should be a docstring, an inline comment, both, or neither. Explain your reasoning:

# Block A
MAX_RETRIES = 3  # Maximum number of retry attempts

# Block B
def calculate_tax(amount, rate):
    return amount * rate

# Block C
# We use exponential backoff here because the API docs
# recommend it for 429 responses, and linear backoff
# caused cascading failures in production (incident #127)
delay = base_delay * (2 ** attempt)

# Block D
class DatabaseConnection:
    pass

# Block E
SUPPORTED_FORMATS = ["json", "csv", "html", "pdf", "xml"]

Exercise 18: Documentation Strategy Analysis

A startup has the following codebase profile:

  • 3 microservices (Python FastAPI)
  • 1 React frontend
  • 2 developers, growing to 5 in 6 months
  • No existing documentation except code comments
  • A public API consumed by 12 external customers
  • Internal tools used by a support team of 4

Design a documentation priority list. Which documentation types should they create first, second, and third? Justify your ordering based on impact and effort.


Tier 4: Evaluate (Exercises 19–24)

Exercise 19: Evaluate AI-Generated Documentation

An AI assistant generated the following README for a Python package. Evaluate it on completeness, accuracy signals, usability, and areas for improvement:

# DataPipeline

A powerful data pipeline framework for Python.

## Installation

pip install datapipeline


## Usage

```python
from datapipeline import Pipeline

p = Pipeline()
p.add_step(clean_data)
p.add_step(transform_data)
p.add_step(load_data)
p.run(input_data)
```

## Features

- Easy to use
- Fast
- Scalable
- Well-tested

## License

MIT


Write a detailed critique covering at least 8 specific improvements.

Exercise 20: Compare Docstring Conventions

You are establishing docstring standards for a new team. Three team members advocate for different styles:

  • Developer A wants Google style because "it reads naturally in source code"
  • Developer B wants NumPy style because "the scientific Python ecosystem uses it"
  • Developer C wants Sphinx style because "it integrates directly with Sphinx without extensions"

The team is building a data processing library that will be used by both web developers and data scientists. Evaluate each position and make a recommendation with justification.

Exercise 21: Evaluate Documentation-Driven Development

Your team lead proposes adopting documentation-driven development (writing docs before code) for all new features. A senior developer objects, arguing that:

  1. "We waste time documenting features that change during implementation"
  2. "It slows down development velocity"
  3. "We already have tests; we don't need docs first too"

Evaluate each objection. For each one, explain whether it is valid, partially valid, or invalid, and provide a nuanced response that acknowledges trade-offs.

Exercise 22: Changelog Quality Assessment

Evaluate the following changelog entry for quality, completeness, and usefulness to the target audience:

```markdown
## [3.0.0] - 2025-12-01

### Changed
- Refactored core module
- Updated dependencies
- Improved performance
- Changed API

### Fixed
- Various bug fixes
```

Identify at least 6 specific problems and rewrite the changelog entry with hypothetical but realistic content that demonstrates best practices.

Exercise 23: Evaluate Comment Necessity

Review the following function and evaluate each comment. For each comment, state whether it should be kept, removed, or rewritten, and explain why:

def process_transaction(transaction: dict) -> dict:
    """Process a financial transaction."""

    # Get the amount from the transaction
    amount = transaction["amount"]

    # Check if amount is negative
    if amount < 0:
        # Negative amounts are refunds, which use a different
        # processing pipeline because our payment provider
        # (Stripe) requires refund requests to reference the
        # original charge ID, and the refund amount must not
        # exceed the original charge
        return process_refund(transaction)

    # Apply the tax rate
    tax = amount * TAX_RATE

    # Calculate the total
    total = amount + tax

    # The processing fee is capped at $50 per transaction
    # per our merchant agreement (contract ref: MA-2025-0042)
    fee = min(total * FEE_RATE, 50.00)

    # Return the result
    return {
        "amount": amount,
        "tax": tax,
        "fee": fee,
        "total": total + fee,
    }

Exercise 24: Evaluate Documentation Tools

Your team needs to choose a documentation hosting and generation solution. Evaluate the following three options for a Python library with 100+ public API endpoints:

Option A: Sphinx + Read the Docs (free for open source)
Option B: MkDocs Material + GitHub Pages (free)
Option C: Docusaurus + Netlify (free tier)

Evaluate each across: learning curve, Python-specific features, versioning support, search quality, customization, community support, and CI/CD integration. Make a recommendation.


Tier 5: Create (Exercises 25–30)

Exercise 25: Create a Complete Documentation Suite

Create the following documentation files for a fictional Python library called logwise that provides structured logging with automatic context enrichment:

  1. A README.md (at least 500 words)
  2. A CONTRIBUTING.md (at least 300 words)
  3. A CHANGELOG.md with entries for versions 1.0.0 through 1.2.0
  4. A docstring for the main Logger class and its three primary methods

Exercise 26: Build a Documentation CI Pipeline

Design and write the configuration for a GitHub Actions workflow that:

  1. Checks docstring coverage (fails if below 90%)
  2. Validates all links in Markdown files
  3. Runs code examples extracted from documentation
  4. Builds MkDocs documentation
  5. Deploys to GitHub Pages on merge to main

Write the complete .github/workflows/docs.yml file and any supporting scripts.
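
For step 1, a supporting script might measure per-file coverage with the standard library's ast module. This is a minimal sketch under the assumption that "coverage" means the fraction of functions and classes carrying a docstring (module docstrings, properties, and overloads are deliberately out of scope):

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the fraction of functions/classes in source that have a docstring."""
    tree = ast.parse(source)
    nodes = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not nodes:
        return 1.0  # nothing to document counts as fully covered
    documented = sum(1 for node in nodes if ast.get_docstring(node))
    return documented / len(nodes)
```

Your workflow could then fail the job whenever the aggregate across all files falls below 0.9.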

Exercise 27: Create an ADR System

Design a complete ADR management system for a team project:

  1. Write a template (adr-template.md) with all required sections
  2. Write ADR-001 about your chosen ADR format and process
  3. Write ADR-002 about a technical decision of your choice
  4. Create a docs/adr/README.md that serves as an index of all ADRs

Exercise 28: Write a Complete Tutorial

Write a complete tutorial (at least 1,500 words) that teaches a Python developer how to add comprehensive documentation to an existing undocumented Flask API. The tutorial should cover:

  1. Adding docstrings to existing routes and functions
  2. Setting up Sphinx with autodoc
  3. Writing a getting-started guide
  4. Configuring automated documentation builds

Include all code examples, configuration files, and expected outputs.

Exercise 29: Create a Documentation Drift Detector

Write a Python script that:

  1. Parses all Python files in a given directory
  2. Extracts function signatures (parameter names and types)
  3. Extracts docstring parameter documentation
  4. Compares the two and reports discrepancies
  5. Outputs a report in both human-readable and JSON formats

The script should handle Google, NumPy, and Sphinx docstring styles. Include complete type hints and docstrings for the script itself.
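
As a starting point for steps 1–2, parameter names can be pulled from the AST without importing the target code. This hypothetical sketch handles only plain positional and keyword parameters (no *args, **kwargs, or positional-only markers):

```python
import ast


def function_parameters(source: str) -> dict[str, list[str]]:
    """Map each function name in source to its declared parameter names."""
    tree = ast.parse(source)
    result: dict[str, list[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            result[node.name] = [arg.arg for arg in node.args.args]
    return result
```

Comparing these names against the names parsed out of each docstring's parameter section is the core of the drift check.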

Exercise 30: Design a Documentation Strategy

You are the technical lead for a new project: a Python-based data pipeline that will be:

  • Used by 5 internal development teams
  • Consumed as a library by 3 external partner companies
  • Operated by a DevOps team with limited Python experience
  • Extended by a plugin system for custom data transformations

Create a comprehensive documentation strategy document that includes:

  1. Documentation types needed and their priority
  2. Tools and hosting decisions (with justification)
  3. Docstring style guide
  4. Template for each documentation type
  5. Maintenance workflow and responsibilities
  6. Quality metrics and targets
  7. Timeline for initial documentation creation

The strategy document should be at least 2,000 words and reference specific tools, formats, and practices covered in this chapter.