Chapter 13 Exercises: Working with Multiple Files and Large Codebases

These exercises progress from basic recall through creative application to challenging multi-chapter integration. Complete them in order within each tier, but feel free to skip tiers that are too easy for your current level.


Tier 1: Recall (Exercises 1-6)

These exercises test your understanding of the core concepts from this chapter.

Exercise 1: Context Strategy Identification

For each of the following scenarios, identify which context-providing strategy (interface-first, full file inclusion, dependency summary, example-driven, or progressive disclosure) would be most appropriate. Explain your reasoning.

a) You need the AI to write a new service that calls three utility functions from an existing module.
b) You need the AI to refactor an existing 200-line file to improve error handling.
c) You need the AI to create a new model file that follows the exact same pattern as five existing model files.
d) You need the AI to redesign the authentication system, but you are not sure which files are involved.
e) You need the AI to write a quick script that imports a User class with 8 fields.

Exercise 2: Repository Map Components

List the five key pieces of information that a good repository map should include for each file. Explain why each piece of information is valuable to an AI assistant.

Exercise 3: File-by-File vs. Holistic Decision

For each scenario below, state whether you would use file-by-file generation, holistic generation, or a hybrid approach:

a) Generating a complete REST API with models, routes, and services for a new microservice (12 files)
b) Adding a complex search algorithm to an existing search_engine.py module
c) Creating three new tightly-coupled data model files that reference each other
d) Building a CLI tool with 6 independent subcommands, each in its own file
e) Generating a complete test suite (8 test files) for an existing application

Exercise 4: Import Map Construction

Given the following directory structure, write a complete import map showing what each module exports and the correct import statement for each export.

myapp/
├── __init__.py
├── models/
│   ├── __init__.py
│   ├── user.py (exports: User, UserStatus)
│   └── post.py (exports: Post, PostCategory)
├── services/
│   ├── __init__.py
│   ├── user_service.py (exports: UserService)
│   └── post_service.py (exports: PostService)
└── utils/
    ├── __init__.py
    └── database.py (exports: get_session, DatabaseError)
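
As a format hint (not the full answer), the entry for the first module might look like the following; extend the same pattern to the remaining modules:

```
myapp.models.user
    exports: User, UserStatus
    import:  from myapp.models.user import User, UserStatus
```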

Exercise 5: Dependency Rule Violations

Review the following import statements and identify which ones violate the standard layered architecture dependency rules (models -> no project imports, utils -> no project imports, services -> models + utils only, api -> services + models only):

# File: models/user.py
from myapp.utils.validation import validate_email
from myapp.services.auth_service import hash_password

# File: services/order_service.py
from myapp.models.order import Order
from myapp.models.user import User
from myapp.api.routes import get_current_user

# File: api/routes.py
from myapp.services.user_service import UserService
from myapp.models.user import User

# File: utils/database.py
from myapp.models.user import User

Exercise 6: Convention Drift Scenarios

Describe three specific ways that convention drift might manifest in a multi-file project generated over a long AI session. For each, explain what the drift looks like and how you would detect it.


Tier 2: Apply (Exercises 7-12)

These exercises ask you to apply the concepts in practical scenarios.

Exercise 7: Write a Context Document

Create a complete context document (under 500 words) for a fictional e-commerce application with the following components:

- User management (registration, login, profiles)
- Product catalog (categories, search, filtering)
- Shopping cart and checkout
- Order management and tracking
- Payment processing (via Stripe)

Include: architecture overview, technology stack, naming conventions, dependency rules, and module summaries.

Exercise 8: Design a Repository Map Generator Prompt

Write a prompt that asks an AI assistant to generate a repository map from a provided directory tree and a set of file contents. The prompt should specify exactly what information you want in the map (file sizes, exports, dependencies, etc.) and what format to use.

Exercise 9: Create a Consistency Reference

Given the following service file, write a prompt that uses it as a consistency reference to generate a new, different service. Your prompt should explicitly call out which patterns the AI should replicate.

class ProductService:
    """Service for managing product operations."""

    def __init__(self, db: Database, cache: Cache) -> None:
        self._db = db
        self._cache = cache

    def get_by_id(self, product_id: int) -> Product:
        """Retrieve a product by its unique identifier.

        Args:
            product_id: The unique identifier of the product.

        Returns:
            The product with the given ID.

        Raises:
            NotFoundError: If no product with the given ID exists.
        """
        cached = self._cache.get(f"product:{product_id}")
        if cached:
            return cached
        product = self._db.query(Product).filter_by(id=product_id).first()
        if not product:
            raise NotFoundError(f"Product {product_id} not found")
        self._cache.set(f"product:{product_id}", product, ttl=300)
        return product

Exercise 10: Phased Approach Planning

You need to add a "favorites" feature to an existing e-commerce application. Users should be able to favorite products, view their favorites, and receive notifications when favorited products go on sale. Plan a phased approach that breaks this into manageable AI sessions. For each phase, specify:

- What files need to be created or modified
- What context the AI needs
- What the deliverable is

Exercise 11: Monorepo Scoping

You are working in a monorepo with the following packages: user-service, product-service, order-service, notification-service, shared-models, and common-utils. You need to add a "wishlist" feature. Determine:

- Which packages need to be modified
- What order to make the changes in
- What context from other packages each session needs
- How to verify cross-package consistency

Exercise 12: Sliding Window Context Management

You need to generate 10 data model files, each following the same pattern. Design a sliding window context management plan that specifies:

- What goes in the "stable context" (always present)
- What goes in the "sliding window" (current + previous file)
- How you handle the transition between files
- How you verify consistency across all 10 files at the end


Tier 3: Analyze (Exercises 13-18)

These exercises require analyzing scenarios and making judgments.

Exercise 13: Context Efficiency Analysis

A developer provides the following context to an AI assistant for generating a new API endpoint:

Here is my entire models directory (4 files, 400 lines total).
Here is my entire services directory (3 files, 600 lines total).
Here is my entire utils directory (5 files, 350 lines total).
Here is my existing routes.py (200 lines).
Here is my project's requirements.txt (50 lines).

Please add a GET /api/users/{id}/orders endpoint.

Analyze this approach:

- What is good about it?
- What is wasteful or potentially problematic?
- How would you restructure the context to be more efficient?
- Estimate the token cost of the original approach vs. your improved approach.

Exercise 14: Consistency Audit

Review the following three function signatures from different files in the same project and identify all consistency issues:

# From user_service.py
def get_user_by_id(self, userId: int) -> Optional[User]:

# From product_service.py
def get_product_by_id(self, product_id: int) -> Product | None:

# From order_service.py
def getOrderById(self, order_id: str) -> Optional[Order]:

For each issue, explain what the inconsistency is, why it matters, and what the standardized version should look like.

Exercise 15: Dependency Graph Analysis

Given the following import statements from a 6-file project, draw the dependency graph and identify:

- Any circular dependencies
- Any layer violations
- The most coupled module (most dependencies)
- The most depended-upon module
- Suggestions for improvement

# models/user.py - imports: nothing internal
# models/order.py - imports: models.user
# services/user_service.py - imports: models.user, utils.database, utils.email
# services/order_service.py - imports: models.order, models.user, services.user_service, utils.database
# utils/database.py - imports: nothing internal
# utils/email.py - imports: models.user
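
To cross-check the coupling questions in your manual analysis, the imports above can be encoded as an adjacency list and the degree counts computed mechanically. This is a minimal sketch with abbreviated module names; the cycle and layer-violation checks are left to you.

```python
# Adjacency list encoding of the imports above:
# each key depends on the modules in its value list.
graph = {
    "models/user": [],
    "models/order": ["models/user"],
    "services/user_service": ["models/user", "utils/database", "utils/email"],
    "services/order_service": ["models/order", "models/user",
                               "services/user_service", "utils/database"],
    "utils/database": [],
    "utils/email": ["models/user"],
}

# Out-degree: how many internal modules each module depends on (coupling).
out_degree = {mod: len(deps) for mod, deps in graph.items()}

# In-degree: how many modules depend on each module.
in_degree = {mod: 0 for mod in graph}
for deps in graph.values():
    for dep in deps:
        in_degree[dep] += 1

most_coupled = max(out_degree, key=out_degree.get)
most_depended_upon = max(in_degree, key=in_degree.get)
```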

Exercise 16: Holistic vs. File-by-File Tradeoff Analysis

You are building a new feature that requires 6 new files:

- 2 model files (tightly coupled, reference each other)
- 2 service files (each depends on both models, loosely coupled with each other)
- 2 test files (one for each service)

Analyze the tradeoffs of three approaches:

a) Generate all 6 files in one prompt
b) Generate all 6 files one at a time
c) Generate in groups: models together, services together, tests together

For each approach, discuss: consistency, quality per file, context window usage, number of iterations needed, and risk of integration issues.

Exercise 17: Context Window Budget

You have a model with a 128,000-token context window. Your task requires:

- System prompt and instructions: ~1,000 tokens
- Style guide and conventions: ~500 tokens
- Repository map: ~800 tokens
- The file to generate: ~2,000 tokens (estimated output)
- Response overhead: ~500 tokens

This leaves approximately 123,200 tokens for source code context. Your project has 40 Python files averaging 150 lines (approximately 450 tokens) each. That is 18,000 tokens total for the entire project.

Should you include all 40 files? Analyze the tradeoffs and recommend a strategy.

Exercise 18: Cross-Package Change Impact Analysis

In a monorepo, you need to add an is_verified boolean field to the shared User model. Analyze the ripple effects:

- Which types of files need to change?
- In what order should changes be made?
- What are the risks of making these changes using AI?
- How would you verify that all necessary changes were made?
- What testing strategy would you use?


Tier 4: Create (Exercises 19-24)

These exercises require you to build something using the chapter's concepts.

Exercise 19: Build a Repository Map Generator

Using the concepts from Section 13.2 and the example code, create a Python script that:

- Takes a directory path as input
- Recursively scans all Python files
- For each file, extracts: classes, functions, imports, and line count
- Outputs a formatted repository map suitable for pasting into an AI prompt
- Handles errors gracefully (permission errors, binary files, etc.)

Test it on a real project directory.
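
As a starting point, here is a minimal sketch of the per-file extraction step using the standard-library ast module; directory walking, output formatting, and error handling are left for you to add. The name summarize_source is a placeholder of my choosing.

```python
import ast


def summarize_source(source: str) -> dict:
    """Extract classes, top-level functions, imports, and line count
    from Python source -- the per-file core of a repository map."""
    tree = ast.parse(source)
    classes = [n.name for n in tree.body if isinstance(n, ast.ClassDef)]
    functions = [n.name for n in tree.body
                 if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    imports = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.append(node.module or "")
    return {
        "classes": classes,
        "functions": functions,
        "imports": imports,
        "lines": len(source.splitlines()),
    }
```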

Exercise 20: Build a Cross-File Context Builder

Create a Python tool that:

- Takes a target file path and a project directory
- Analyzes the target file's imports to determine its dependencies
- For each dependency within the project, extracts the public interface (class and function signatures with docstrings)
- Outputs a formatted "cross-file context" block ready for an AI prompt

Exercise 21: Build a Convention Checker

Create a Python script that checks a directory of Python files for consistency in:

- Naming conventions (classes PascalCase, functions snake_case, constants UPPER_CASE)
- Docstring presence on all public functions and classes
- Import style (all absolute or all relative, but not mixed)
- Type hint coverage (functions with vs. without type hints)

Output a report of any inconsistencies found.
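
The naming-convention check (one of the four) could start from something like this. PASCAL and SNAKE are illustrative regexes of my choosing; the constant, import-style, and type-hint checks are left to you.

```python
import ast
import re

PASCAL = re.compile(r"^[A-Z][a-zA-Z0-9]*$")
SNAKE = re.compile(r"^[a-z_][a-z0-9_]*$")


def check_naming(source: str) -> list[str]:
    """Report class and function names that break the
    PascalCase / snake_case conventions."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and not PASCAL.match(node.name):
            violations.append(f"class '{node.name}' is not PascalCase")
        elif (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
              and not SNAKE.match(node.name)):
            violations.append(f"function '{node.name}' is not snake_case")
    return violations
```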

Exercise 22: Multi-File Project Generation

Using the vibe coding techniques from this chapter, generate a complete small project (8-10 files) for a library management system. Document:

- Your context strategy (what you included in each prompt)
- Whether you used file-by-file, holistic, or hybrid generation
- Any consistency issues you encountered and how you resolved them
- The total number of prompts used

Exercise 23: Context Document Creation

Create a comprehensive context document for an existing open-source Python project. Pick any project with 20+ files. Your document should include:

- Architecture overview
- Module summaries
- Key abstractions and patterns
- Dependency map
- Naming conventions
- Import conventions

Test the document by using it to generate a new feature with AI assistance.

Exercise 24: Import Cycle Detector

Build a Python tool that:

- Scans a directory of Python files
- Parses import statements from each file
- Builds a directed dependency graph
- Detects and reports any circular dependencies
- Suggests which imports to restructure to break cycles
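
A naive DFS sketch of the detection step, assuming the graph has already been built as a dict: it reports each cycle once per entry point and does not scale to large graphs, so deduplication and the restructuring suggestions are part of the exercise.

```python
def find_cycles(graph: dict[str, list[str]]) -> list[list[str]]:
    """Detect circular dependencies in a module import graph via DFS.

    Each cycle is returned as a path that starts and ends at the
    same module, e.g. ["a", "b", "c", "a"].
    """
    cycles = []

    def visit(node: str, path: list[str]) -> None:
        if node in path:
            # Revisiting a module on the current path means a cycle.
            cycles.append(path[path.index(node):] + [node])
            return
        for dep in graph.get(node, []):
            visit(dep, path + [node])

    for start in graph:
        visit(start, [])
    return cycles
```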


Tier 5: Challenge (Exercises 25-30)

These exercises integrate concepts from multiple chapters and push beyond the material covered in this chapter.

Exercise 25: Automated Context Optimizer

Build a tool that, given a task description and a codebase, automatically determines the optimal set of files and file fragments to include as context for an AI prompt. The tool should:

- Parse the task description to identify key entities (models, services, endpoints mentioned)
- Map those entities to files in the codebase
- Determine the minimum set of context needed (interfaces vs. full files)
- Estimate the token count and flag if it exceeds a configurable limit
- Output a formatted context block ready for use in a prompt

This integrates Chapter 9 (Context Management) with this chapter's techniques.

Exercise 26: Multi-Agent Codebase Modification

Design (and optionally implement) a system that uses multiple AI sessions working in parallel to make a cross-cutting change to a codebase. The system should:

- Take a high-level change description
- Analyze the codebase to determine affected files
- Split the work into independent workstreams
- Generate a context package for each workstream
- Execute the workstreams (potentially in parallel)
- Verify consistency across all changes

This integrates Chapter 12 (Advanced Prompting) and previews Chapter 38 (Multi-Agent Systems).

Exercise 27: Enterprise Codebase Simulation

Create a simulated enterprise codebase (50+ files across 5+ packages in a monorepo structure) using AI. Then:

- Generate a comprehensive repository map
- Create tiered context documents (Tier 1 through Tier 4)
- Use these documents to successfully add a new cross-cutting feature
- Measure and report: number of prompts needed, consistency issues found, total tokens consumed

Exercise 28: Legacy Code Modernization Pipeline

Design and implement a pipeline that uses AI to modernize legacy Python code:

1. Analyze a legacy module and generate documentation
2. Create a modernization plan
3. Generate modernized code that maintains the public API
4. Generate tests that verify the modernized code matches the original behavior
5. Verify consistency between old and new versions

Test on a real legacy Python file (Python 2 style, no type hints, no docstrings).

Exercise 29: Team Convention Enforcement System

Build a system that:

- Reads a team's coding conventions from a configuration file
- Scans AI-generated code for convention violations
- Generates corrective prompts that can be sent to an AI to fix violations
- Tracks convention adherence over time (per developer, per module, per sprint)

This integrates Chapter 25 (Design Patterns and Clean Code) with this chapter.

Exercise 30: Repository Understanding Benchmark

Create a benchmark for measuring how well AI understands a codebase. The benchmark should:

- Take a real codebase as input
- Generate a set of questions about the code (e.g., "What happens when a user with an expired token makes a request?", "Which module handles database connection pooling?")
- Generate correct answers from manual analysis
- Measure AI accuracy with different context strategies (full code, repository map only, interface-only, tiered)
- Report which strategies yield the best comprehension for the least context cost

This is a research-oriented exercise that integrates Chapter 9, this chapter, and Chapter 7 (Understanding AI-Generated Code).