Chapter 13 Quiz: Working with Multiple Files and Large Codebases
Test your understanding of multi-file vibe coding concepts. Try to answer each question before revealing the answer.
Question 1
What is the primary reason AI coding assistants struggle with multi-file projects?
Show Answer
AI assistants have a finite context window and can only see the code you explicitly provide in the conversation. Unlike a human developer who can browse files in an IDE, the AI cannot autonomously "look around" the codebase. This means it may lack information about other files that are critical for generating correct, well-integrated code.
Question 2
What is a repository map, and what five pieces of information should it include for each file?
Show Answer
A repository map is a structured summary of a project's files that goes beyond a simple directory listing. For each file, it should include: (1) the file path, (2) the line count or size, (3) the key classes defined in the file, (4) the key functions defined in the file, and (5) the file's dependencies (what it imports from other project files). This gives the AI a high-level understanding of the codebase structure and relationships.
Question 3
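The five fields above can be collected mechanically for Python projects. A hedged sketch using only the standard `ast` module (the function name `map_file` is invented for illustration):

```python
import ast

def map_file(path: str, source: str) -> dict:
    """Summarize one Python file for a repository map: path, line count,
    top-level classes and functions, and which modules it imports."""
    tree = ast.parse(source)
    imports = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.append(node.module)
    return {
        "path": path,
        "lines": len(source.splitlines()),
        "classes": [n.name for n in tree.body if isinstance(n, ast.ClassDef)],
        "functions": [n.name for n in tree.body if isinstance(n, ast.FunctionDef)],
        "imports": imports,
    }
```

Running this over every `.py` file and concatenating the results yields the per-file summary described in the answer.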
What is "interface-first context" and when should you use it?
Show Answer
Interface-first context means providing only the public interfaces of dependency files (class signatures, method signatures with type hints, and docstrings) rather than their full implementation. You should use it when the AI needs to write code that interacts with other modules but does not need to understand their internal implementation details. This is the most token-efficient context strategy and should be your default approach for providing cross-file context.
Question 4
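As a concrete illustration, an interface-first excerpt pasted into a prompt keeps signatures, type hints, and docstrings while eliding bodies with `...`. The class and method names below are hypothetical:

```python
# Interface-only view of a hypothetical payments module.
# Bodies are intentionally elided: the AI only needs the contract.
class PaymentService:
    """Handles charges and refunds against the payment gateway."""

    def charge(self, user_id: int, amount_cents: int) -> str:
        """Charge the user; returns a transaction ID. Raises on failure."""
        ...

    def refund(self, transaction_id: str) -> bool:
        """Refund a prior charge; returns True if the refund was accepted."""
        ...
```

A dozen lines like these convey what callers need at a fraction of the tokens of the full implementation.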
When is it appropriate to include a full file in the AI's context instead of just its interface?
Show Answer
Include full files when: (1) the AI needs to modify the existing file, (2) the file is short (under 50 lines) and fully relevant, (3) the integration is complex and implementation details matter, or (4) you are debugging an issue that might be caused by implementation details. In these cases, the interface alone does not provide enough information for the AI to do correct work.
Question 5
What are the main advantages of holistic (multi-file) generation over file-by-file generation?
Show Answer
Holistic generation offers: (1) better consistency across files because the AI generates them in the same context, (2) natural reasoning about inter-file dependencies, (3) faster generation of initial project scaffolding, and (4) integration issues caught during generation rather than later. It works best for initial scaffolding, small-to-medium features (3-8 files), tightly coupled components, and prototyping.
Question 6
What are the main advantages of file-by-file generation over holistic generation?
Show Answer
File-by-file generation offers: (1) more control over each file's content, (2) easier review and iteration on individual files, (3) a better fit within context window limits, and (4) a good match for relatively independent files. It works best for large or complex files, loosely coupled components, and situations where you want careful review of each output.
Question 7
What is "convention drift" and what causes it?
Show Answer
Convention drift is the gradual divergence from established coding conventions that occurs over long AI-assisted coding sessions. It is caused by: (1) later prompts not including the full style guide, (2) the AI introducing slight variations that accumulate over time, and (3) context from earlier in the conversation being truncated as the conversation grows. The result is that files generated late in a session may follow slightly different conventions than files generated early.
Question 8
Name three strategies for preventing convention drift during a long vibe coding session.
Show Answer
(1) Re-anchor periodically by including the style guide or a consistency reference every few prompts, not just at the beginning. (2) Compare early files against late files to detect drift. (3) Use a compact "convention prompt" prefix that you paste at the start of every prompt to remind the AI of the conventions. Additional strategies include running automated consistency checks between batches and using a consistency reference file alongside the current task.
Question 9
What is an import map and why is it important for multi-file code generation?
Show Answer
An import map is a concise summary of what each module exports and the correct import statement to access each export (e.g., `from my_project.models.user import User, UserRole`). It is important because without it, the AI must guess import paths, which may lead to incorrect package paths, references to non-existent functions, or imports from third-party libraries that are not installed. Providing an import map ensures all generated code uses consistent, correct imports.
Question 10
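One lightweight way to maintain such a map is a small export table rendered into ready-to-paste import lines. A sketch, using the hypothetical `my_project` module paths from the example above:

```python
# Hypothetical export table: module path -> public names it exports.
EXPORTS = {
    "my_project.models.user": ["User", "UserRole"],
    "my_project.services.auth": ["AuthService"],
}

def render_import_map(exports: dict) -> str:
    """Render one correct import statement per module, ready to paste into a prompt."""
    return "\n".join(
        f"from {module} import {', '.join(names)}"
        for module, names in sorted(exports.items())
    )
```

Keeping the table next to the code and regenerating the map before each session avoids the AI guessing import paths.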
How do circular imports occur in AI-generated code, and how can you prevent them?
Show Answer
Circular imports occur when module A imports from module B, and module B imports from module A (directly or through a chain). AI-generated code is particularly susceptible because the AI does not have a complete view of the dependency graph. Prevention strategies: (1) establish clear dependency direction rules (e.g., models never import from services), (2) communicate these rules in the prompt, (3) use Protocol types for reverse dependencies, and (4) review generated import statements for cycles before running the code.
Question 11
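Strategy (3), using Protocol types for reverse dependencies, can be sketched in a single file; in a real project the two halves would live in separate modules, and all names here are hypothetical:

```python
from typing import Protocol

# models layer: declares the capability it needs without importing services.
class Notifier(Protocol):
    def notify(self, message: str) -> None: ...

class User:
    def __init__(self, name: str) -> None:
        self.name = name

    def rename(self, new_name: str, notifier: Notifier) -> None:
        self.name = new_name
        notifier.notify(f"user renamed to {new_name}")

# services layer: imports models and satisfies the Protocol structurally,
# so no import ever points from models back up to services.
class LogNotifier:
    def __init__(self) -> None:
        self.messages: list[str] = []

    def notify(self, message: str) -> None:
        self.messages.append(message)
```

The dependency now flows one way: services know about models, while models only know the `Notifier` shape.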
What is the recommended approach for working with a monorepo when using AI tools?
Show Answer
Scope each AI session to a specific package or a small set of related packages rather than trying to work with the entire monorepo at once. Provide the directory structure of the focused package, along with the interfaces of shared libraries and other packages it depends on. For cross-package changes, use a phased approach: modify shared libraries first, then update each dependent package in separate sessions.
Question 12
In the tiered context strategy for enterprise codebases, what information belongs in each tier?
Show Answer
Tier 1 (Always Include, 50-100 tokens): project name, purpose, language/framework, architecture style, coding standards link. Tier 2 (Session-Level, 200-500 tokens): the module being worked on, repository map of the relevant section, conventions for this part of the codebase. Tier 3 (Task-Level, variable): specific files being modified, interfaces of direct dependencies, relevant test files. Tier 4 (On-Demand, variable): additional files revealed by errors, historical context, related documentation.
Question 13
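Assembling a prompt preamble from these tiers can be a one-liner per tier. A hedged sketch; the tier keys and section headers are invented for illustration:

```python
def build_context(tiers: dict) -> str:
    """Concatenate non-empty tiers into a prompt preamble, cheapest tier first."""
    order = ["always", "session", "task", "on_demand"]
    sections = []
    for name in order:
        items = tiers.get(name, [])
        if items:  # skip tiers with nothing to contribute on this task
            sections.append(f"# {name}\n" + "\n".join(items))
    return "\n\n".join(sections)
```

Tiers 1 and 2 stay constant for a session; only the task and on-demand tiers change between prompts.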
What is the "consistency reference" pattern and why is it more effective than an abstract style guide?
Show Answer
The consistency reference pattern involves providing an existing file as a concrete example of the project's patterns and conventions when asking the AI to generate a new, similar file. It is more effective than an abstract style guide because the AI can directly pattern-match against the concrete example, picking up on subtleties (method ordering, error handling flow, variable naming patterns) that might be difficult to describe in a style guide. The AI essentially "imitates" the reference file's style.
Question 14
What is progressive context disclosure, and when should you use it?
Show Answer
Progressive context disclosure is a strategy where you provide context incrementally rather than all at once: (1) start with high-level architecture and repository map, (2) ask the AI to outline its approach, (3) provide the specific files it identifies as needing, (4) have it generate code, (5) provide additional context if needed. Use it when you are unsure exactly what context the AI will need, when the task is complex and exploratory, or when including all possibly-relevant context would overflow the context window.
Question 15
Explain the "sliding window" approach to context management when processing many files sequentially.
Show Answer
The sliding window approach maintains a fixed-size working set of context as you move through many files: always keep the overall task description and conventions, the repository map, the file currently being worked on, and the most recently completed file (as a consistency reference). Drop files from context as you move past them. This prevents the conversation from growing without bound while maintaining enough context for consistency between adjacent files.
Question 16
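The policy can be sketched as a tiny helper that always carries the fixed context plus at most one previously completed file. This is a hypothetical sketch of the bookkeeping, not any particular tool's API:

```python
from typing import Optional

class SlidingWindow:
    """Fixed context (task + repo map) plus the most recent completed file."""

    def __init__(self, task: str, repo_map: str) -> None:
        self.fixed = [task, repo_map]
        self.previous: Optional[str] = None

    def context_for(self, current_file: str) -> list[str]:
        # Fixed context, optional consistency reference, then the working file.
        parts = list(self.fixed)
        if self.previous is not None:
            parts.append(self.previous)
        parts.append(current_file)
        return parts

    def complete(self, finished_file: str) -> None:
        # Keep only the newest finished file; older ones fall out of the window.
        self.previous = finished_file
```

Each prompt is built from `context_for`, and `complete` slides the window forward after a file is done.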
You need to generate 12 data model files, 4 service files, and 4 test files. What generation strategy would you use?
Show Answer
A hybrid approach: (1) Generate the 12 model files using file-by-file generation with a sliding window (since they are numerous and each has complex internal logic, but they follow similar patterns). (2) Generate the 4 service files using file-by-file generation with interfaces from the models as context. (3) Generate all 4 test files holistically in one or two prompts (since they follow similar patterns and benefit from consistency). Throughout, use a style guide and consistency references to maintain conventions.
Question 17
What are three ways AI can help you navigate an unfamiliar codebase without generating new code?
Show Answer
(1) Understanding architecture: provide the directory tree and file headers, and ask the AI for a high-level architecture overview, main data flows, and key design patterns. (2) Tracing data flows: provide relevant files and ask the AI to trace how data moves from input to output through the system. (3) Finding where to make changes: describe a feature or bug fix and provide the repository map, then ask the AI where the change should be implemented. Additional uses include analyzing dependency relationships and generating documentation for undocumented code.
Question 18
What is a "context document" and how do professional teams use it?
Show Answer
A context document is a maintained markdown file that summarizes a project's architecture, key abstractions, naming conventions, common patterns, and important design decisions. Teams paste it at the start of any new AI conversation to quickly bring the assistant up to speed on the project. Some teams check it into the repository and keep it updated as the project evolves. It serves as reusable context that does not need to be reconstructed for each AI session.
Question 19
Why is it important to include dependency rules (which layers can import from which) in your AI prompts?
Show Answer
Without explicit dependency rules, the AI may create imports that violate the project's architectural layering, such as having a model import from a service or a utility import from an API module. These violations can lead to circular dependencies, make the code harder to maintain, and break the separation of concerns that the layered architecture is designed to provide. By stating the rules explicitly, you give the AI the constraints it needs to generate architecturally sound code.
Question 20
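These rules are also cheap to verify after generation. A minimal checker over a hypothetical three-layer ordering (models below services below api, where no module may import from a higher layer):

```python
# Hypothetical layering: a module may import only from the same or a lower layer.
LAYERS = {"models": 0, "services": 1, "api": 2}

def layering_violations(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """edges: (importer, imported) module pairs like ('services.auth', 'models.user').
    Returns every import that reaches *up* the layer stack."""
    bad = []
    for importer, imported in edges:
        if LAYERS[imported.split(".")[0]] > LAYERS[importer.split(".")[0]]:
            bad.append((importer, imported))
    return bad
```

Feeding it the import edges of freshly generated files catches layering violations before they harden into circular dependencies.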
What is the difference between a directory tree and a repository map? When would you use each?
Show Answer
A directory tree shows the hierarchical file and folder structure of a project (file names and nesting). A repository map includes the directory tree plus additional information for each file: line count, key classes and functions defined, and dependencies on other files. Use a directory tree when you just need the AI to understand the project layout (e.g., where to place a new file). Use a repository map when the AI needs to understand the relationships between files and the contents of each module (e.g., when generating code that integrates with existing modules).
Question 21
What token-efficiency technique would you use to provide context about a 200-line utility module that exports 3 functions, when the AI only needs to call those functions?
Show Answer
Use interface-first context: provide only the function signatures with type hints and docstrings, rather than the full 200-line implementation. For example, instead of including the entire file (~600 tokens), provide just the signatures and docstrings (~50-80 tokens). This gives the AI everything it needs to call the functions correctly (parameter names, types, return types, expected behavior) without wasting tokens on implementation details.
Question 22
In a monorepo with shared-models, service-a, and service-b, you add a new field to a shared model. What is the correct order of operations?
Show Answer
(1) Modify the shared model in `shared-models` first. (2) Update `service-a` to handle the new field, providing the updated model as context. (3) Update `service-b` to handle the new field, providing the updated model as context. (4) Update tests in all three packages. The key principle is to modify shared/upstream packages first, then update downstream dependents, because each downstream session needs the updated upstream interfaces as context.
Question 23
A colleague includes all 25 project files (totaling 8,000 tokens) in every AI prompt "to be safe." What problems might this cause?
Show Answer
Problems: (1) Wasted context window space that could be used for better instructions or more focused context. (2) The AI may get confused by irrelevant details, reducing output quality. (3) Slower response generation due to processing more input. (4) Higher API costs since pricing is based on token count. (5) The AI may pattern-match against irrelevant files, introducing unwanted patterns. (6) For larger projects, this approach simply does not scale. The better approach is selective context: include only the files directly relevant to the current task.
Question 24
How does the concept of "parallel workstreams" apply to multi-file AI coding, and what determines whether workstreams can be parallelized?
Show Answer
Parallel workstreams allow you to split a large multi-file change into independent AI sessions that can be executed simultaneously (or at least independently). Workstreams can be parallelized if they modify different files with no mutual dependencies. For example, updating model validations and updating API documentation could be parallel (they touch different files), but updating models and then updating services that depend on those models must be sequential. The dependency graph between the changes determines parallelization opportunities.
Question 25
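Given the dependency graph between changes, the parallelizable batches are just its topological levels. A hedged sketch (workstream names are hypothetical):

```python
def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """deps maps each workstream to the workstreams it depends on.
    Returns batches whose members can run as independent AI sessions."""
    remaining = {k: set(v) for k, v in deps.items()}
    batches = []
    while remaining:
        # A workstream is ready once all its dependencies have been scheduled.
        ready = sorted(k for k, v in remaining.items() if not v)
        if not ready:
            raise ValueError("cyclic dependency between workstreams")
        batches.append(ready)
        for k in ready:
            del remaining[k]
        for v in remaining.values():
            v.difference_update(ready)
    return batches
```

Workstreams in the same batch touch no common upstream, so they can be handed to separate AI sessions at the same time.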
Explain the "meta-prompting" approach mentioned in the Advanced callout, where AI generates context for other AI sessions.