Case Study 02: From Script to Distributed Package

Overview

This case study follows the evolution of a simple Python script into a properly packaged, pip-installable CLI tool. We start with a quick-and-dirty script that converts Markdown files to HTML, and through five stages of refinement, transform it into a professional package ready for distribution on PyPI. Each stage demonstrates specific concepts from Chapter 15 and shows how AI assistance accelerates the transformation.

This mirrors a common real-world scenario: a developer writes a script to solve an immediate problem, the script proves useful, colleagues ask for it, and suddenly the script needs to be a proper tool.

Stage 0: The Original Script

Every journey begins somewhere. Here is the script that started it all -- a 30-line Markdown-to-HTML converter:

# md2html.py - Quick markdown to HTML converter
import sys
import markdown

if len(sys.argv) < 2:
    print("Usage: python md2html.py <input.md> [output.html]")
    sys.exit(1)

input_file = sys.argv[1]
output_file = sys.argv[2] if len(sys.argv) > 2 else input_file.replace('.md', '.html')

with open(input_file) as f:
    content = f.read()

html = markdown.markdown(content, extensions=['tables', 'fenced_code', 'toc'])
html = f"""<!DOCTYPE html>
<html>
<head><title>{input_file}</title></head>
<body>{html}</body>
</html>"""

with open(output_file, 'w') as f:
    f.write(html)

print(f"Converted {input_file} -> {output_file}")

This script works. It solves the immediate problem. But it has significant limitations:

No error handling (crashes on missing files with a traceback)
No argument validation (what if the input is not a .md file?)
Hardcoded HTML template
No support for batch processing
No configuration options (extensions, templates, CSS)
No way to install it as a command
Manual sys.argv parsing with no help text
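The argument-validation gap is easy to underestimate. The default output name comes from `str.replace`, which rewrites every `.md` in the path and leaves a non-Markdown filename unchanged, so the script would then overwrite its own input. A minimal sketch of a safer default, assuming a hypothetical helper named `default_output_path` that is not part of the original script:

```python
from pathlib import Path


def default_output_path(input_file: str) -> Path:
    """Return the input path with only its final suffix swapped for .html."""
    return Path(input_file).with_suffix(".html")


# str.replace touches every ".md" occurrence, including directory names.
naive = "notes.md.bak/readme.md".replace(".md", ".html")
safer = default_output_path("notes.md.bak/readme.md")

# With a non-Markdown input, str.replace returns the input name unchanged,
# so output_file == input_file and the source file would be overwritten.
unchanged = "README.txt".replace(".md", ".html")
```

Using `Path.with_suffix` keeps the directory part intact and always produces a name different from a non-HTML input.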

Stage 1: Adding Proper Argument Parsing

Prompt to AI: "Refactor this Markdown-to-HTML converter script to use click for argument parsing. Add: (1) a positional INPUT argument for the Markdown file(s) that accepts multiple files, (2) an --output/-o option for the output directory (default: same directory as input), (3) a --template/-t option for an HTML template file, (4) a --css option to include a CSS file link, (5) a --watch/-w flag to watch for changes and re-convert automatically, (6) a --verbose/-v flag. Add proper help text for all options."

The AI generates a click-based interface:

import click


@click.command()
@click.argument("input_files", nargs=-1, required=True,
                type=click.Path(exists=True))
@click.option("-o", "--output", "output_dir", type=click.Path(),
              default=None, help="Output directory (default: same as input)")
@click.option("-t", "--template", type=click.Path(exists=True),
              default=None, help="HTML template file")
@click.option("--css", type=str, default=None,
              help="CSS file URL or path to include")
@click.option("-w", "--watch", is_flag=True,
              help="Watch for changes and re-convert")
@click.option("-v", "--verbose", is_flag=True,
              help="Enable verbose output")
def convert(input_files, output_dir, template, css, watch, verbose):
    """Convert Markdown files to HTML."""
    # ... implementation

At this stage, we have proper argument parsing with help text, path validation (click verifies that every input path exists before the function body runs), and multiple file support. The tool is already more professional, but it is still a single file.
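The case study does not show how the --watch flag is implemented, and click itself only parses the flag. One simple possibility is to poll file modification times. The sketch below assumes a hypothetical helper named `changed_files`; a production tool might prefer a dedicated file-watching library instead of polling.

```python
from pathlib import Path


def changed_files(paths: list[Path], seen: dict[Path, float]) -> list[Path]:
    """Return the paths whose modification time differs from the recorded one.

    `seen` is updated in place, so the next call reports only newer changes.
    A path that has never been seen counts as changed.
    """
    changed = []
    for path in paths:
        mtime = path.stat().st_mtime
        if seen.get(path) != mtime:
            seen[path] = mtime
            changed.append(path)
    return changed
```

A watch loop could call this helper at a fixed interval (for example once per second) and re-convert whatever it returns, keeping the `seen` dictionary alive between iterations.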

Stage 2: Separating Concerns

The single-file script has grown to about 80 lines, mixing CLI code with conversion logic. Time to separate concerns:

Prompt to AI: "Refactor the md2html tool into a proper Python package with this structure: (1) converter.py -- the core Markdown-to-HTML conversion logic as a class with no CLI dependencies; (2) templates.py -- HTML template management; (3) cli.py -- the click CLI interface that uses the converter; (4) __init__.py with version info. The converter should accept configuration as a dataclass, not as individual parameters."

This produces a clean package structure:

md2html/
    __init__.py          # __version__ = "0.2.0"
    cli.py               # Click commands
    converter.py         # MarkdownConverter class
    templates.py         # Template loading and rendering

The key refactoring is the MarkdownConverter class:

from dataclasses import dataclass, field
from pathlib import Path

import markdown


@dataclass
class ConversionConfig:
    """Configuration for Markdown conversion."""
    extensions: list[str] = field(default_factory=lambda: [
        "tables", "fenced_code", "toc", "codehilite"
    ])
    template: str | None = None
    css_url: str | None = None
    output_dir: Path | None = None


class MarkdownConverter:
    """Converts Markdown files to HTML."""

    def __init__(self, config: ConversionConfig | None = None) -> None:
        self.config = config or ConversionConfig()
        self._md = markdown.Markdown(extensions=self.config.extensions)

    def convert_file(self, input_path: Path) -> Path:
        """Convert a single Markdown file to HTML. Returns output path."""
        content = input_path.read_text(encoding="utf-8")
        html_body = self._md.convert(content)
        # Clear per-document state (such as the TOC) before the next file.
        self._md.reset()

        html = self._render_template(
            title=input_path.stem,
            body=html_body,
        )

        output_path = self._get_output_path(input_path)
        output_path.write_text(html, encoding="utf-8")
        return output_path

    # _render_template() and _get_output_path() are omitted here for brevity.

Now the converter can be imported and used as a library, tested without the CLI, and reused in other projects.

Stage 3: Adding Configuration, Logging, and Error Handling

Prompt to AI: "Add to the md2html package: (1) A TOML configuration file that users can place at ~/.config/md2html/config.toml to set default extensions, template, and CSS. (2) Logging with --verbose support that shows each file being processed. (3) Custom exceptions: ConversionError, TemplateError, and ConfigError. (4) Proper exit codes: 0 for success, 1 for partial failure (some files converted), 2 for complete failure. (5) A --dry-run flag that shows what would be converted without writing files."

This stage adds the production layers:

# Custom exceptions
class Md2HtmlError(Exception):
    """Base exception for md2html."""
    def __init__(self, message: str, exit_code: int = 1) -> None:
        super().__init__(message)
        self.exit_code = exit_code

class ConversionError(Md2HtmlError):
    """A file could not be converted."""

class TemplateError(Md2HtmlError):
    """The HTML template is invalid."""

class ConfigError(Md2HtmlError):
    """The configuration file is invalid."""
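The `exit_code` attribute only pays off if something at the top of the program reads it. The following sketch shows one way a CLI entry point might translate these exceptions into process exit codes. The wrapper name `run_with_exit_code` is hypothetical, and the exception classes are repeated only so the example runs on its own.

```python
import sys
from collections.abc import Callable


class Md2HtmlError(Exception):
    """Base exception, mirroring the class defined above."""

    def __init__(self, message: str, exit_code: int = 1) -> None:
        super().__init__(message)
        self.exit_code = exit_code


class TemplateError(Md2HtmlError):
    """The HTML template is invalid."""


def run_with_exit_code(action: Callable[[], None]) -> int:
    """Call action() and translate any Md2HtmlError into an exit code."""
    try:
        action()
    except Md2HtmlError as exc:
        # Report a one-line message on stderr instead of a traceback.
        print(f"Error: {exc}", file=sys.stderr)
        return exc.exit_code
    return 0
```

Catching only the package's own base class keeps unexpected bugs visible as tracebacks while giving users a clean message for anticipated failures.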

The main function now tracks success and failure counts:

import logging

logger = logging.getLogger(__name__)


def convert_files(files, config, verbose, dry_run):
    """Convert multiple files with error tracking."""
    succeeded = 0
    failed = 0
    converter = MarkdownConverter(config)

    for file_path in files:
        try:
            if dry_run:
                logger.info("Would convert: %s", file_path)
            else:
                output = converter.convert_file(file_path)
                logger.info("Converted: %s -> %s", file_path, output)
            succeeded += 1
        # Assumes convert_file wraps I/O and parsing errors in ConversionError;
        # any other exception still propagates and aborts the batch.
        except ConversionError as exc:
            logger.error("Failed to convert %s: %s", file_path, exc)
            failed += 1

    if failed > 0 and succeeded == 0:
        return 2  # Complete failure
    elif failed > 0:
        return 1  # Partial failure
    return 0      # Complete success
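The Stage 3 prompt also asks for a TOML configuration file, which the excerpts above do not show. A minimal loader might look like the sketch below. The key names (`extensions`, `template`, `css`), the `DEFAULTS` dictionary, and the `load_config` function are assumptions for illustration, and `ConfigError` is redefined here only to keep the example self-contained.

```python
import sys
from pathlib import Path

if sys.version_info >= (3, 11):
    import tomllib
else:  # the tomli backport exposes the same API on older Pythons
    import tomli as tomllib

# Hypothetical defaults; the real tool's keys are not shown in the case study.
DEFAULTS = {
    "extensions": ["tables", "fenced_code", "toc"],
    "template": None,
    "css": None,
}


class ConfigError(Exception):
    """The configuration file is invalid."""


def load_config(path: Path) -> dict:
    """Return DEFAULTS overridden by the keys found in the TOML file at path."""
    if not path.exists():
        return dict(DEFAULTS)  # a missing file simply means "use defaults"
    try:
        with path.open("rb") as fh:  # tomllib requires a binary file object
            data = tomllib.load(fh)
    except tomllib.TOMLDecodeError as exc:
        raise ConfigError(f"Invalid TOML in {path}: {exc}") from exc
    unknown = set(data) - set(DEFAULTS)
    if unknown:
        raise ConfigError(f"Unknown config keys: {sorted(unknown)}")
    return {**DEFAULTS, **data}
```

Rejecting unknown keys is a design choice: it surfaces typos in the user's config early instead of silently ignoring them.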

Stage 4: Packaging with pyproject.toml

Now the tool is feature-complete. Time to make it installable:

Prompt to AI: "Create a pyproject.toml for the md2html package. Use setuptools as the build backend. The entry point should map the command 'md2html' to 'md2html.cli:main'. Dependencies: markdown, click, rich, pygments (for code highlighting), tomli (Python < 3.11). Dev dependencies: pytest, pytest-cov, mypy, ruff. Use the src layout."

We restructure the project:

md2html-project/
    pyproject.toml
    README.md
    LICENSE
    src/
        md2html/
            __init__.py
            __main__.py
            cli.py
            converter.py
            templates.py
            config.py
            exceptions.py
            default_template.html
    tests/
        __init__.py
        test_converter.py
        test_cli.py
        conftest.py

The pyproject.toml:

[build-system]
requires = ["setuptools>=68.0"]
build-backend = "setuptools.build_meta"

[project]
name = "md2html-cli"
version = "1.0.0"
description = "Convert Markdown files to beautiful HTML from the command line"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.10"
dependencies = [
    "markdown>=3.4",
    "click>=8.0",
    "rich>=13.0",
    "pygments>=2.15",
    "tomli>=2.0; python_version < '3.11'",
]

[project.scripts]
md2html = "md2html.cli:main"

[project.optional-dependencies]
dev = ["pytest>=7.0", "pytest-cov>=4.0", "mypy>=1.0", "ruff>=0.1.0"]
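The version now lives in pyproject.toml (1.0.0), while the Stage 2 package carried its own `__version__ = "0.2.0"` in `__init__.py`. One common way to avoid the two drifting apart is to read the version from the installed distribution's metadata. This is a sketch, assuming a hypothetical helper `get_version` and an arbitrary fallback string for the case where the package is not installed:

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name: str = "md2html-cli") -> str:
    """Return the installed distribution's version, or a placeholder."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Running from a source checkout that was never pip-installed.
        return "0.0.0+unknown"
```

With this in place, `__init__.py` could set `__version__ = get_version()`. Note that the lookup uses the distribution name from `[project].name` (md2html-cli), not the import name (md2html).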

The __main__.py:

"""Allow running with python -m md2html."""
from md2html.cli import main

if __name__ == "__main__":
    raise SystemExit(main())
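The `[project.scripts]` line and `__main__.py` both point at the same callable. The entry-point string uses the form `module:attribute`, and the standard library can parse it, which makes the mapping easy to inspect. A small sketch, constructing the entry point by hand rather than reading it from an installed package:

```python
from importlib.metadata import EntryPoint

# Same name and value as the [project.scripts] entry in pyproject.toml.
ep = EntryPoint(name="md2html", value="md2html.cli:main", group="console_scripts")

print(ep.module)  # md2html.cli
print(ep.attr)    # main
```

At install time, pip generates a small `md2html` launcher that imports `md2html.cli` and calls `main()`, which is why the function must exist under exactly that name in `cli.py`.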

Now users can install and use the tool:

pip install md2html-cli
md2html README.md --css style.css -o build/

Stage 5: Publishing to PyPI

The final stage is publishing the package:

# Install build tools
pip install build twine

# Build the package
python -m build
# Creates dist/md2html_cli-1.0.0.tar.gz and dist/md2html_cli-1.0.0-py3-none-any.whl

# Test on TestPyPI first
twine upload --repository testpypi dist/*

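# Verify the installation from TestPyPI
# (dependencies such as click may not exist on TestPyPI, so fall back to the real index for them)
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ md2html-cli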

# If everything works, publish to real PyPI
twine upload dist/*

Before publishing, we also add:

A CHANGELOG.md with the initial release entry
A LICENSE file (MIT)
A README.md with installation instructions, usage examples, and configuration reference
GitHub Actions CI for automated testing and publishing

Evolution Summary

Stage	Lines of Code	Features Added	Key Chapter 15 Concept
0	30	Basic conversion	(Starting point)
1	80	Argument parsing, multi-file, help text	Section 15.2: Argument Parsing
2	200	Separated concerns, classes, library-usable	Section 15.1: CLI Architecture
3	350	Config, logging, errors, dry-run	Sections 15.3, 15.4, 15.7
4	400	pyproject.toml, entry points, src layout	Section 15.8: Packaging
5	400	Published to PyPI	Section 15.8: Distribution

Key Lessons

1. The hardest part is not the code -- it is the structure. The converter logic barely changed from Stage 0 to Stage 5. What changed was how it was organized, configured, invoked, and distributed.

2. AI excels at the boilerplate stages. Generating pyproject.toml, __main__.py, argument parsing decorators, and logging setup is tedious for humans but trivial for AI. This is where AI assistance provides the most value in CLI development.

3. Each stage should be working and tested before moving to the next. We never had a broken tool. At every stage, python md2html.py README.md (or later, md2html README.md) worked correctly. Incremental improvement is safer than big-bang refactoring.

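4. The src layout prevents subtle import bugs. With the package under src/, the project root is no longer importable by accident, so tests can only import md2html after it has been installed (for example with pip install -e .). The tests therefore exercise the installed package rather than whatever happens to sit in the working directory, which catches packaging errors (missing files, wrong paths) before they reach users.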

5. TestPyPI is essential. Publishing to TestPyPI first catches issues with package metadata, missing data files, and dependency declarations that do not manifest during local development. It costs nothing and saves embarrassment.

6. The script-to-package journey is common enough to template. After going through this process once, you can create a cookiecutter or copier template for your future CLI tools. AI can generate these templates, giving you a head start on every new project.

This case study demonstrates that the distance from "script that works on my machine" to "tool anyone can install and use" is not as large as it seems. With AI assistance handling the boilerplate, a disciplined developer can make the journey in an afternoon.