Case Study 01: Building a File Organizer CLI

Overview

In this case study, we build a complete CLI tool called fileorg that automatically organizes files in a directory by sorting them into subdirectories based on file type, modification date, or custom rules. This project demonstrates how to use AI-assisted development to build a practical, production-quality command-line tool from scratch, applying every concept from Chapter 15.

The file organizer is a tool that many developers build at some point -- a cluttered Downloads folder is a universal frustration. What makes this case study instructive is not the novelty of the idea but the quality of the implementation: proper argument parsing, configuration support, progress feedback, safe file operations, comprehensive error handling, and logging.

Requirements

Before writing any code, we define what the tool must do:

Functional Requirements:
- Organize files by type (documents, images, audio, video, archives, code, other)
- Organize files by date (year/month subdirectories based on modification time)
- Support custom rules defined in a configuration file (e.g., move *.psd files to a design/ folder)
- Preview mode (dry run) that shows what would happen without moving files
- Undo support by writing a manifest of all moves

Non-Functional Requirements:
- Handle thousands of files without excessive memory usage
- Provide a progress bar for large directories
- Log all operations to a file
- Exit with appropriate codes on success, partial success, and failure
- Support both interactive use and scripting (no mandatory prompts)

Step 1: The AI Prompt for the CLI Skeleton

We start with a detailed prompt that specifies the interface:

Prompt to AI: "Create a Python CLI tool called 'fileorg' using click with these commands:
- 'organize' -- the main command. Takes a DIRECTORY argument (the folder to organize). Options: --by (choices: type, date, custom; default: type), --dry-run (flag to preview without moving), --config (path to config file), --verbose, --force (skip confirmation prompt).
- 'undo' -- reverses the last organize operation using a manifest file. Takes an optional --manifest path argument.
- 'rules' -- shows the current organization rules (built-in and custom).

Use click groups, proper type hints, and docstrings. Just the skeleton with placeholder implementations for now."

The AI generates a clean skeleton. We verify it runs, check that --help output looks correct, and then proceed to implementation.

Step 2: Defining File Categories

The core logic maps file extensions to categories. We prompt the AI:

Prompt to AI: "Create a file_categories.py module for the fileorg tool. Define a dictionary mapping file extensions to category names. Include at least these categories: documents (.pdf, .doc, .docx, .txt, .rtf, .odt, .xlsx, .csv, .pptx), images (.jpg, .jpeg, .png, .gif, .bmp, .svg, .webp, .tiff, .ico), audio (.mp3, .wav, .flac, .aac, .ogg, .wma, .m4a), video (.mp4, .avi, .mkv, .mov, .wmv, .flv, .webm), archives (.zip, .tar, .gz, .bz2, .7z, .rar, .xz), and code (.py, .js, .ts, .java, .c, .cpp, .h, .go, .rs, .rb, .html, .css). Include a get_category(extension) function that returns the category name or 'other'. Support custom overrides from a dictionary parameter."

The resulting module is straightforward:

from typing import Optional

DEFAULT_CATEGORIES: dict[str, str] = {
    # Documents
    ".pdf": "documents", ".doc": "documents", ".docx": "documents",
    ".txt": "documents", ".rtf": "documents", ".odt": "documents",
    ".xlsx": "documents", ".csv": "documents", ".pptx": "documents",
    # Images
    ".jpg": "images", ".jpeg": "images", ".png": "images",
    ".gif": "images", ".bmp": "images", ".svg": "images",
    ".webp": "images", ".tiff": "images", ".ico": "images",
    # Audio
    ".mp3": "audio", ".wav": "audio", ".flac": "audio",
    ".aac": "audio", ".ogg": "audio", ".wma": "audio", ".m4a": "audio",
    # Video
    ".mp4": "video", ".avi": "video", ".mkv": "video",
    ".mov": "video", ".wmv": "video", ".flv": "video", ".webm": "video",
    # Archives
    ".zip": "archives", ".tar": "archives", ".gz": "archives",
    ".bz2": "archives", ".7z": "archives", ".rar": "archives",
    ".xz": "archives",
    # Code
    ".py": "code", ".js": "code", ".ts": "code", ".java": "code",
    ".c": "code", ".cpp": "code", ".h": "code", ".go": "code",
    ".rs": "code", ".rb": "code", ".html": "code", ".css": "code",
}


def get_category(
    extension: str,
    custom_rules: Optional[dict[str, str]] = None,
) -> str:
    """Return the category for a file extension.

    Custom rules take precedence over defaults.
    """
    ext = extension.lower()
    if custom_rules and ext in custom_rules:
        return custom_rules[ext]
    return DEFAULT_CATEGORIES.get(ext, "other")
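
The lookup above expects a bare extension, so a caller still has to derive one from each file path. A minimal sketch of that step follows. The helper name `category_for_path` and the trimmed `CATEGORIES` table are hypothetical, introduced only so the example runs on its own, and treating extensionless files and dotfiles as "other" is an assumption rather than something the case study specifies.

```python
from pathlib import Path
from typing import Optional

# Trimmed stand-in for DEFAULT_CATEGORIES so the sketch is self-contained.
CATEGORIES: dict[str, str] = {
    ".pdf": "documents",
    ".jpg": "images",
    ".gz": "archives",
}


def get_category(
    extension: str,
    custom_rules: Optional[dict[str, str]] = None,
) -> str:
    """Same lookup as in the module above: custom rules win over defaults."""
    ext = extension.lower()
    if custom_rules and ext in custom_rules:
        return custom_rules[ext]
    return CATEGORIES.get(ext, "other")


def category_for_path(
    path: Path,
    custom_rules: Optional[dict[str, str]] = None,
) -> str:
    """Hypothetical helper: classify a path by its final suffix."""
    # Path.suffix is "" for names like "README" and for dotfiles such as
    # ".bashrc", so both fall through to "other".
    return get_category(path.suffix, custom_rules)
```

With this sketch, `category_for_path(Path("Report.PDF"))` resolves to "documents" because the suffix is lowercased before the lookup, and `backup.tar.gz` is classified by its last suffix only (`.gz`). Passing `{".psd": "design"}` as `custom_rules` sends `logo.psd` to "design".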

Step 3: The Organizer Engine

The business logic lives in an organizer.py module that is completely independent of the CLI layer:

import logging
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass
class MoveOperation:
    """Represents a planned or completed file move."""
    source: Path
    destination: Path
    category: str
    status: str = "pending"  # pending, completed, failed, skipped
    error: str = ""


class FileOrganizer:
    """Organizes files in a directory by category, date, or custom rules."""

    def __init__(
        self,
        source_dir: Path,
        strategy: str = "type",
        custom_rules: dict[str, str] | None = None,
        logger: logging.Logger | None = None,
    ) -> None:
        self.source_dir = source_dir
        self.strategy = strategy
        self.custom_rules = custom_rules or {}
        self.logger = logger or logging.getLogger(__name__)
        self.operations: list[MoveOperation] = []

    def plan(self) -> list[MoveOperation]:
        """Scan the directory and plan all moves without executing them."""
        self.operations = []
        for file_path in self.source_dir.iterdir():
            if file_path.is_file():
                dest = self._compute_destination(file_path)
                op = MoveOperation(
                    source=file_path,
                    destination=dest,
                    category=dest.parent.name,
                )
                self.operations.append(op)
        return self.operations

    def execute(self, dry_run: bool = False) -> list[MoveOperation]:
        """Execute all planned move operations."""
        if not self.operations:
            self.plan()

        for op in self.operations:
            if dry_run:
                self.logger.info("DRY RUN: %s -> %s", op.source, op.destination)
                op.status = "skipped"
                continue

            try:
                op.destination.parent.mkdir(parents=True, exist_ok=True)
                # Handle name conflicts
                final_dest = self._resolve_conflict(op.destination)
                shutil.move(str(op.source), str(final_dest))
                op.destination = final_dest
                op.status = "completed"
                self.logger.debug("Moved: %s -> %s", op.source, final_dest)
            except OSError as exc:
                op.status = "failed"
                op.error = str(exc)
                self.logger.error("Failed: %s -> %s (%s)", op.source, op.destination, exc)

        return self.operations

Note the key design decisions:
- The plan/execute pattern separates scanning from action, enabling dry-run mode naturally.
- The MoveOperation dataclass records what happened to each file, enabling undo and reporting.
- The logger is injected rather than created internally, making the class testable.
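
The engine calls two private helpers, `_compute_destination` and `_resolve_conflict`, that the excerpt does not show. The sketch below is one plausible shape for them, written as standalone functions; the function names, parameters, and the year/month folder format are assumptions, and only the `_1`, `_2` renaming scheme is taken from the lessons described later in this case study.

```python
from datetime import datetime
from pathlib import Path


def compute_destination(
    file_path: Path,
    root: Path,
    strategy: str,
    category: str,
) -> Path:
    """Hypothetical stand-in for FileOrganizer._compute_destination."""
    if strategy == "date":
        # Year/month folders derived from the file's modification time.
        modified = datetime.fromtimestamp(file_path.stat().st_mtime)
        return root / f"{modified:%Y}" / f"{modified:%m}" / file_path.name
    # "type" and "custom" both resolve to <root>/<category>/<name>.
    return root / category / file_path.name


def resolve_conflict(destination: Path) -> Path:
    """Hypothetical stand-in for FileOrganizer._resolve_conflict.

    Returns `destination` unchanged if nothing exists there; otherwise
    returns the first free name of the form stem_1.ext, stem_2.ext, ...
    """
    if not destination.exists():
        return destination
    counter = 1
    while True:
        candidate = destination.with_name(
            f"{destination.stem}_{counter}{destination.suffix}"
        )
        if not candidate.exists():
            return candidate
        counter += 1
```

Two caveats apply to this sketch. The existence check and the later move are separate steps, so another process could still claim the chosen name in between. And with the date strategy, `dest.parent.name` in `plan()` would be the month folder (for example "03"), which may not be what a reader expects in a column labeled "Category".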

Step 4: Adding Progress and Rich Output

We add a thin presentation layer that calls the engine and provides visual feedback:

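from pathlib import Path

from rich.console import Console
from rich.progress import (
    BarColumn,
    MofNCompleteColumn,
    Progress,
    SpinnerColumn,
    TextColumn,
    TimeElapsedColumn,
)
from rich.table import Table

from organizer import FileOrganizer  # engine module from Step 3
# setup_logging is assumed to be defined elsewhere in the project (not shown).


def run_organize(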
    directory: Path,
    strategy: str,
    dry_run: bool,
    config: dict,
    verbose: bool,
) -> int:
    """Run the file organization with progress feedback."""
    console = Console()
    logger = setup_logging(verbose=verbose)

    organizer = FileOrganizer(
        source_dir=directory,
        strategy=strategy,
        custom_rules=config.get("rules", {}),
        logger=logger,
    )

    # Planning phase
    with console.status("[bold green]Scanning directory..."):
        operations = organizer.plan()

    if not operations:
        console.print("[yellow]No files to organize.[/yellow]")
        return 0

    console.print(f"Found [bold]{len(operations)}[/bold] files to organize.")

    # Show preview table
    if dry_run:
        table = Table(title="Planned Operations (Dry Run)")
        table.add_column("File", style="cyan")
        table.add_column("Destination", style="green")
        table.add_column("Category", style="magenta")
        for op in operations[:20]:  # Show first 20
            table.add_row(op.source.name, str(op.destination.parent.name), op.category)
        if len(operations) > 20:
            table.add_row("...", f"({len(operations) - 20} more)", "...")
        console.print(table)
        return 0

    # Execute with progress bar
    results = {"completed": 0, "failed": 0, "skipped": 0}
    with Progress(
        SpinnerColumn(),
        TextColumn("[bold blue]{task.description}"),
        BarColumn(),
        MofNCompleteColumn(),
        TimeElapsedColumn(),
    ) as progress:
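        task = progress.add_task("Organizing files", total=len(operations))
        # execute() returns only after every move has finished, so this loop
        # tallies results and fills the bar afterwards. For live progress the
        # engine would need to yield each operation as it completes.
        for op in organizer.execute():
            results[op.status] = results.get(op.status, 0) + 1
            progress.advance(task)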

    # Summary
    console.print(f"\n[green]Completed: {results['completed']}[/green]  "
                  f"[red]Failed: {results['failed']}[/red]  "
                  f"[yellow]Skipped: {results['skipped']}[/yellow]")

    return 1 if results["failed"] > 0 else 0
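
The requirements ask for distinct exit codes for success, partial success, and failure, while `run_organize` as shown returns 1 for any failure. One way to separate the cases is sketched below. `ExitCode`, `exit_code_for`, and the values 0, 1, and 2 are assumptions made for illustration, not the tool's documented codes.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Hypothetical exit codes; the numeric values are an assumption."""

    SUCCESS = 0  # every planned move completed
    FAILURE = 1  # at least one move failed and none completed
    PARTIAL = 2  # some moves completed, some failed


def exit_code_for(results: dict[str, int]) -> ExitCode:
    """Map the per-status tallies from run_organize to an exit code."""
    failed = results.get("failed", 0)
    completed = results.get("completed", 0)
    if failed == 0:
        return ExitCode.SUCCESS
    if completed == 0:
        return ExitCode.FAILURE
    return ExitCode.PARTIAL
```

Because `IntEnum` members are integers, the final line of `run_organize` could become `return exit_code_for(results)` without changing its `int` return type, and a calling script could then tell a partially organized folder from a run that moved nothing.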

Step 5: Manifest and Undo

The organizer writes a JSON manifest after each run, recording every file move. The undo command reads this manifest and reverses all moves:

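import json
import logging
import shutil
from datetime import datetime
from pathlib import Path

from organizer import MoveOperation  # dataclass from Step 3


def save_manifest(operations: list[MoveOperation], manifest_path: Path) -> None: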
    """Save a manifest of all completed operations for undo support."""
    entries = [
        {
            "source": str(op.source),
            "destination": str(op.destination),
            "category": op.category,
            "status": op.status,
            "timestamp": datetime.now().isoformat(),
        }
        for op in operations
        if op.status == "completed"
    ]
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)


def undo_organize(manifest_path: Path, logger: logging.Logger) -> int:
    """Reverse all moves recorded in a manifest file."""
    with open(manifest_path, "r", encoding="utf-8") as f:
        entries = json.load(f)

    failed = 0
    for entry in reversed(entries):
        src = Path(entry["destination"])
        dst = Path(entry["source"])
        try:
            shutil.move(str(src), str(dst))
            logger.info("Restored: %s -> %s", src, dst)
        except OSError as exc:
            logger.error("Failed to restore: %s -> %s (%s)", src, dst, exc)
            failed += 1

    return 1 if failed > 0 else 0
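
As written, `undo_organize` moves each file back without checking what is at the original location. On POSIX systems `shutil.move` can silently replace a file that has appeared there since the organize run. The sketch below shows a more defensive per-entry restore; `restore_entry` and its three outcome strings are hypothetical names, not part of the tool as described.

```python
import shutil
from pathlib import Path


def restore_entry(entry: dict[str, str]) -> str:
    """Hypothetical helper: undo one manifest entry and report the outcome.

    Returns "restored", "missing" (the moved file is gone), or
    "conflict" (something already occupies the original location).
    """
    moved_to = Path(entry["destination"])
    original = Path(entry["source"])
    if not moved_to.exists():
        return "missing"
    if original.exists():
        # Refuse to overwrite; leave the decision to the user.
        return "conflict"
    # The original folder may have been removed since the organize run.
    original.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(moved_to), str(original))
    return "restored"
```

The undo loop could call this helper for each entry and count anything other than "restored" as a failure. Note that the manifest stores paths exactly as the organizer saw them, so if those were relative paths, undo would only work from the same working directory.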

Step 6: Configuration File Support

The tool reads custom rules from a TOML configuration file:

# ~/.config/fileorg/config.toml
[rules]
".psd" = "design"
".ai" = "design"
".sketch" = "design"
".fig" = "design"
".blend" = "3d"
".obj" = "3d"
".stl" = "3d"

[settings]
default_strategy = "type"
manifest_dir = "~/.local/share/fileorg/manifests"

Lessons Learned

1. The plan/execute pattern pays for itself. By separating scanning from action, dry-run mode was trivial to implement. The same pattern enabled the undo feature and the summary table.

2. The AI generated the categories dictionary quickly, but its output needed review. The initial AI output missed several common extensions (.webp, .m4a, .xz) and included some debatable categorizations (should .html be in "code" or "web"?). Human review was essential.

3. The name conflict resolution was tricky. When two files would end up with the same destination name, the AI's first attempt simply overwrote the existing file. We had to prompt explicitly for a renaming strategy (appending _1, _2, etc.).

4. Testing with real directories revealed edge cases. Symlinks, hidden files (dotfiles), files with no extension, read-only files, and files currently in use all needed special handling. None of these appeared in the AI's first draft.

5. Separation of concerns made iterative development smooth. Because the organizer engine was independent of the CLI, we could test it in a Python REPL while building the CLI layer separately. When we added the --by date strategy later, only the engine changed.
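
Several of the edge cases from lesson 4 can be handled by filtering files before they enter the plan. The sketch below is one possible filter, not the tool's actual logic; `should_organize` and its `include_hidden` parameter are hypothetical, and skipping symlinks and dotfiles by default is an assumed policy.

```python
from pathlib import Path


def should_organize(path: Path, include_hidden: bool = False) -> bool:
    """Hypothetical filter applied before a file is added to the plan."""
    # Check for links first: is_file() follows symlinks, and moving a
    # relative symlink can leave it pointing nowhere.
    if path.is_symlink():
        return False
    if not path.is_file():
        return False
    if path.name.startswith(".") and not include_hidden:
        return False
    return True
```

In `plan()`, the `file_path.is_file()` test could be replaced by a call to this filter. Read-only files and files in use are not covered here; those surface as `OSError` at move time, which `execute()` already records as a failed operation.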

Complete Code

The complete implementation is available in code/case-study-code.py. The tool demonstrates:
- Click-based CLI with three subcommands
- TOML configuration with custom rules
- Progress bars and rich tables
- Manifest-based undo functionality
- Comprehensive error handling with specific exit codes
- Logging with verbose mode support

This case study shows that even a conceptually simple tool -- "move files into folders" -- benefits enormously from production-quality architecture. The difference between a 20-line script and a proper CLI tool is not complexity for its own sake; it is the difference between something that works on your machine today and something that works reliably for anyone, anywhere, every time.