Case Study 02: From Script to Distributed Package
Overview
This case study follows the evolution of a simple Python script into a properly packaged, pip-installable CLI tool. We start with a quick-and-dirty script that converts Markdown files to HTML, and through five stages of refinement, transform it into a professional package ready for distribution on PyPI. Each stage demonstrates specific concepts from Chapter 15 and shows how AI assistance accelerates the transformation.
This mirrors a common real-world scenario: a developer writes a script to solve an immediate problem, the script proves useful, colleagues ask for it, and suddenly the script needs to be a proper tool.
Stage 0: The Original Script
Every journey begins somewhere. Here is the script that started it all -- a 30-line Markdown-to-HTML converter:
```python
# md2html.py - Quick markdown to HTML converter
import sys
import markdown

if len(sys.argv) < 2:
    print("Usage: python md2html.py <input.md> [output.html]")
    sys.exit(1)

input_file = sys.argv[1]
output_file = sys.argv[2] if len(sys.argv) > 2 else input_file.replace('.md', '.html')

with open(input_file) as f:
    content = f.read()

html = markdown.markdown(content, extensions=['tables', 'fenced_code', 'toc'])
html = f"""<!DOCTYPE html>
<html>
<head><title>{input_file}</title></head>
<body>{html}</body>
</html>"""

with open(output_file, 'w') as f:
    f.write(html)

print(f"Converted {input_file} -> {output_file}")
```
This script works. It solves the immediate problem. But it has significant limitations:
- No error handling (crashes on missing files with a traceback)
- No argument validation (what if the input is not a `.md` file?)
- Hardcoded HTML template
- No support for batch processing
- No configuration options (extensions, templates, CSS)
- No way to install it as a command
- Manual `sys.argv` parsing with no help text
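Two of these limitations are easy to see in code. The script derives the output name with `str.replace`, which rewrites every `.md` substring rather than only the extension. It also never checks that the input is Markdown at all. Below is a minimal sketch of a stricter alternative using `pathlib`. It assumes that only `.md` and `.markdown` count as Markdown, and `derive_output_path` is a hypothetical name:

```python
from pathlib import Path

# Assumption: the suffixes this tool treats as Markdown.
MARKDOWN_SUFFIXES = {".md", ".markdown"}


def derive_output_path(input_file: str) -> Path:
    """Return the .html path next to a Markdown input file.

    Raises ValueError if the input does not look like a Markdown file.
    """
    path = Path(input_file)
    if path.suffix.lower() not in MARKDOWN_SUFFIXES:
        raise ValueError(f"not a Markdown file: {input_file}")
    # with_suffix swaps only the final extension, unlike str.replace,
    # which rewrites every ".md" substring anywhere in the path.
    return path.with_suffix(".html")


if __name__ == "__main__":
    print("notes.mdx".replace(".md", ".html"))  # notes.htmlx (the original approach)
    print(derive_output_path("docs/my.md.draft.md"))
```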
Stage 1: Adding Proper Argument Parsing
Prompt to AI: "Refactor this Markdown-to-HTML converter script to use click for argument parsing. Add: (1) a positional INPUT argument for the Markdown file(s) that accepts multiple files, (2) an --output/-o option for the output directory (default: same directory as input), (3) a --template/-t option for an HTML template file, (4) a --css option to include a CSS file link, (5) a --watch/-w flag to watch for changes and re-convert automatically, (6) a --verbose/-v flag. Add proper help text for all options."
The AI generates a click-based interface:
```python
import click


@click.command()
@click.argument("input_files", nargs=-1, required=True,
                type=click.Path(exists=True))
@click.option("-o", "--output", "output_dir", type=click.Path(),
              default=None, help="Output directory (default: same as input)")
@click.option("-t", "--template", type=click.Path(exists=True),
              default=None, help="HTML template file")
@click.option("--css", type=str, default=None,
              help="CSS file URL or path to include")
@click.option("-w", "--watch", is_flag=True,
              help="Watch for changes and re-convert")
@click.option("-v", "--verbose", is_flag=True,
              help="Enable verbose output")
def convert(input_files, output_dir, template, css, watch, verbose):
    """Convert Markdown files to HTML."""
    # ... implementation
```
At this stage, we have proper argument parsing with help text, type validation (click verifies files exist), and multiple file support. The tool is already more professional, but it is still a single file.
Stage 2: Separating Concerns
The single-file script has grown to 150 lines, mixing CLI code with conversion logic. Time to separate concerns:
Prompt to AI: "Refactor the md2html tool into a proper Python package with this structure: (1) converter.py -- the core Markdown-to-HTML conversion logic as a class with no CLI dependencies; (2) templates.py -- HTML template management; (3) cli.py -- the click CLI interface that uses the converter; (4) `__init__.py` with version info. The converter should accept configuration as a dataclass, not as individual parameters."
This produces a clean package structure:
```text
md2html/
    __init__.py     # __version__ = "0.2.0"
    cli.py          # Click commands
    converter.py    # MarkdownConverter class
    templates.py    # Template loading and rendering
```
The key refactoring is the `MarkdownConverter` class, which takes its settings from a `ConversionConfig` dataclass:
```python
from dataclasses import dataclass, field
from pathlib import Path

import markdown


@dataclass
class ConversionConfig:
    """Configuration for Markdown conversion."""
    extensions: list[str] = field(default_factory=lambda: [
        "tables", "fenced_code", "toc", "codehilite"
    ])
    template: str | None = None
    css_url: str | None = None
    output_dir: Path | None = None


class MarkdownConverter:
    """Converts Markdown files to HTML."""

    def __init__(self, config: ConversionConfig | None = None) -> None:
        self.config = config or ConversionConfig()
        self._md = markdown.Markdown(extensions=self.config.extensions)

    def convert_file(self, input_path: Path) -> Path:
        """Convert a single Markdown file to HTML. Returns output path."""
        content = input_path.read_text(encoding="utf-8")
        html_body = self._md.convert(content)
        # Clear per-document state held by the Markdown instance and its
        # extensions so it does not leak into the next file.
        self._md.reset()
        html = self._render_template(
            title=input_path.stem,
            body=html_body,
        )
        output_path = self._get_output_path(input_path)
        output_path.write_text(html, encoding="utf-8")
        return output_path

    # _render_template() and _get_output_path() are omitted for brevity.
```
Now the converter can be imported and used as a library, tested without the CLI, and reused in other projects.
Stage 3: Adding Configuration, Logging, and Error Handling
Prompt to AI: "Add to the md2html package: (1) A TOML configuration file that users can place at ~/.config/md2html/config.toml to set default extensions, template, and CSS. (2) Logging with --verbose support that shows each file being processed. (3) Custom exceptions: ConversionError, TemplateError, and ConfigError. (4) Proper exit codes: 0 for success, 1 for partial failure (some files converted), 2 for complete failure. (5) A --dry-run flag that shows what would be converted without writing files."
This stage adds the production layers:
```python
# Custom exceptions
class Md2HtmlError(Exception):
    """Base exception for md2html."""

    def __init__(self, message: str, exit_code: int = 1) -> None:
        super().__init__(message)
        self.exit_code = exit_code


class ConversionError(Md2HtmlError):
    """A file could not be converted."""


class TemplateError(Md2HtmlError):
    """The HTML template is invalid."""


class ConfigError(Md2HtmlError):
    """The configuration file is invalid."""
```
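Because every custom exception carries an `exit_code`, the CLI entry point can handle all of them with a single `except` clause on the base class. Below is a minimal sketch of such a wrapper that also maps `--verbose` onto a logging level. `run_cli` and `configure_logging` are hypothetical names, and the two exception classes are repeated so the block runs on its own:

```python
import logging
from collections.abc import Callable

logger = logging.getLogger("md2html")


class Md2HtmlError(Exception):
    """Base exception for md2html (repeated from above)."""

    def __init__(self, message: str, exit_code: int = 1) -> None:
        super().__init__(message)
        self.exit_code = exit_code


class TemplateError(Md2HtmlError):
    """The HTML template is invalid."""


def configure_logging(verbose: bool) -> None:
    """Show per-file INFO messages only when --verbose is given."""
    logging.basicConfig(format="%(levelname)s: %(message)s",
                        level=logging.INFO if verbose else logging.WARNING)


def run_cli(action: Callable[[], int]) -> int:
    """Run the CLI body and turn any md2html error into its exit code."""
    try:
        return action()
    except Md2HtmlError as exc:
        logger.error("%s", exc)
        return exc.exit_code
```

Unexpected exceptions, such as a `ValueError` from a bug, are deliberately not caught. They still produce a full traceback.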
The `convert_files` function now tracks success and failure counts and maps them to the exit codes requested in the prompt:
```python
import logging

logger = logging.getLogger(__name__)


def convert_files(files, config, verbose, dry_run):
    """Convert multiple files with error tracking."""
    succeeded = 0
    failed = 0
    converter = MarkdownConverter(config)
    for file_path in files:
        try:
            if dry_run:
                logger.info("Would convert: %s", file_path)
            else:
                output = converter.convert_file(file_path)
                logger.info("Converted: %s -> %s", file_path, output)
            succeeded += 1
        except (ConversionError, OSError) as exc:
            # OSError covers unreadable inputs and unwritable outputs,
            # which convert_file does not wrap in ConversionError.
            logger.error("Failed to convert %s: %s", file_path, exc)
            failed += 1
    if failed > 0 and succeeded == 0:
        return 2  # Complete failure
    elif failed > 0:
        return 1  # Partial failure
    return 0  # Complete success
```
Stage 4: Packaging with pyproject.toml
Now the tool is feature-complete. Time to make it installable:
Prompt to AI: "Create a pyproject.toml for the md2html package. Use setuptools as the build backend. The entry point should map the command 'md2html' to 'md2html.cli:main'. Dependencies: markdown, click, rich, pygments (for code highlighting), tomli (Python < 3.11). Dev dependencies: pytest, pytest-cov, mypy, ruff. Use the src layout."
We restructure the project:
```text
md2html-project/
    pyproject.toml
    README.md
    LICENSE
    src/
        md2html/
            __init__.py
            __main__.py
            cli.py
            converter.py
            templates.py
            config.py
            exceptions.py
            default_template.html
    tests/
        __init__.py
        test_converter.py
        test_cli.py
        conftest.py
```
The pyproject.toml:
```toml
[build-system]
requires = ["setuptools>=68.0"]
build-backend = "setuptools.build_meta"

[project]
name = "md2html-cli"
version = "1.0.0"
description = "Convert Markdown files to beautiful HTML from the command line"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.10"
dependencies = [
    "markdown>=3.4",
    "click>=8.0",
    "rich>=13.0",
    "pygments>=2.15",
    "tomli>=2.0; python_version < '3.11'",
]

[project.scripts]
md2html = "md2html.cli:main"

[project.optional-dependencies]
dev = ["pytest>=7.0", "pytest-cov>=4.0", "mypy>=1.0", "ruff>=0.1.0"]

# Ship the non-Python template file inside the wheel.
[tool.setuptools.package-data]
md2html = ["default_template.html"]
```
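The tree places `default_template.html` inside the package. At runtime the code should therefore locate it relative to the installed package, not the current working directory. Below is a minimal sketch using `importlib.resources.files` (available since Python 3.9). The package and file names come from the tree above, and `load_packaged_template` is a hypothetical helper:

```python
from importlib.resources import files


def load_packaged_template(package: str = "md2html",
                           name: str = "default_template.html") -> str:
    """Read a data file shipped inside an installed package.

    If the file was left out of the built wheel, this raises
    FileNotFoundError at runtime instead of silently using a wrong path.
    """
    return files(package).joinpath(name).read_text(encoding="utf-8")
```

Setuptools only places non-Python files like this one in the wheel if they are declared as package data, listed in `MANIFEST.in`, or picked up by a VCS plugin.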
The `__main__.py`:

```python
"""Allow running with python -m md2html."""
from md2html.cli import main

if __name__ == "__main__":
    raise SystemExit(main())
```
Now users can install and use the tool:
```bash
pip install md2html-cli
md2html README.md --css style.css -o build/
```
Stage 5: Publishing to PyPI
The final stage is publishing the package:
```bash
# Install build tools
pip install build twine

# Build the package
python -m build
# Creates dist/md2html_cli-1.0.0.tar.gz and dist/md2html_cli-1.0.0-py3-none-any.whl

# Test on TestPyPI first
twine upload --repository testpypi dist/*

# Verify the installation from TestPyPI
# (dependencies such as click and markdown are not on TestPyPI, so fetch them from PyPI)
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ md2html-cli

# If everything works, publish to real PyPI
twine upload dist/*
```
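The file names in the build comment differ from the project name: `md2html-cli` becomes `md2html_cli`. Distribution file names use a normalized project name, in which every run of `-`, `_`, and `.` collapses to a single underscore and letters are lowercased. Below is a short sketch of that rule, which is handy for checking `dist/` before uploading. `expected_dist_files` is a hypothetical helper. It assumes a pure-Python wheel tagged `py3-none-any`, as in the comment above, and a version string that is already in normalized form:

```python
import re


def normalize_dist_name(name: str) -> str:
    """Normalize a project name for use in sdist and wheel file names."""
    return re.sub(r"[-_.]+", "_", name).lower()


def expected_dist_files(name: str, version: str) -> list[str]:
    """File names `python -m build` should produce for a pure-Python project."""
    dist = normalize_dist_name(name)
    return [f"{dist}-{version}.tar.gz", f"{dist}-{version}-py3-none-any.whl"]


if __name__ == "__main__":
    print(expected_dist_files("md2html-cli", "1.0.0"))
    # ['md2html_cli-1.0.0.tar.gz', 'md2html_cli-1.0.0-py3-none-any.whl']
```

Wheels containing compiled extensions carry interpreter- and platform-specific tags instead of `py3-none-any`.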
Before publishing, we also add:
- A `CHANGELOG.md` with the initial release entry
- A `LICENSE` file (MIT)
- A `README.md` with installation instructions, usage examples, and configuration reference
- GitHub Actions CI for automated testing and publishing
Evolution Summary
| Stage | Lines of Code | Features Added | Key Chapter 15 Concept |
|---|---|---|---|
| 0 | 30 | Basic conversion | (Starting point) |
| 1 | 80 | Argument parsing, multi-file, help text | Section 15.2: Argument Parsing |
| 2 | 200 | Separated concerns, classes, library-usable | Section 15.1: CLI Architecture |
| 3 | 350 | Config, logging, errors, dry-run | Sections 15.3, 15.4, 15.7 |
| 4 | 400 | pyproject.toml, entry points, src layout | Section 15.8: Packaging |
| 5 | 400 | Published to PyPI | Section 15.8: Distribution |
Key Lessons
1. The hardest part is not the code -- it is the structure. The converter logic barely changed from Stage 0 to Stage 5. What changed was how it was organized, configured, invoked, and distributed.
2. AI excels at the boilerplate stages. Generating pyproject.toml, __main__.py, argument parsing decorators, and logging setup is tedious for humans but trivial for AI. This is where AI assistance provides the most value in CLI development.
3. Each stage should be working and tested before moving to the next. We never had a broken tool. At every stage, python md2html.py README.md (or later, md2html README.md) worked correctly. Incremental improvement is safer than big-bang refactoring.
4. The src layout prevents subtle import bugs. By placing the package under src/, we ensure that tests import the installed version of the package, not the source directory. This catches packaging errors (missing files, wrong paths) before they reach users.
5. TestPyPI is essential. Publishing to TestPyPI first catches issues with package metadata, missing data files, and dependency declarations that do not manifest during local development. It costs nothing and saves embarrassment.
6. The script-to-package journey is common enough to template. After going through this process once, you can create a cookiecutter or copier template for your future CLI tools. AI can generate these templates, giving you a head start on every new project.
This case study demonstrates that the distance from "script that works on my machine" to "tool anyone can install and use" is not as large as it seems. With AI assistance handling the boilerplate, a disciplined developer can make the journey in an afternoon.