Chapter 41 Exercises: Capstone Projects

These exercises progress from basic recall through creative application to challenging multi-chapter integration. Because this is a capstone chapter, the majority of exercises fall in Tier 4 (Create) and Tier 5 (Challenge), requiring you to synthesize skills from across the entire book. Complete them in order within each tier, but feel free to skip tiers that are too easy for your current level.


Tier 1: Recall (Exercises 1-4)

These exercises verify your understanding of the three capstone projects and their architecture decisions.

Exercise 1: Architecture Identification

For each of the three capstone projects (TaskFlow, DataLens, CodeForge), answer the following:

a) What is the primary architectural pattern used (e.g., three-tier, pipeline, orchestrator)?
b) What database technology is used and why?
c) What is the main API framework?
d) Name one design pattern highlighted in the project's implementation walkthrough.

Exercise 2: Integration Points

List all the integration points (places where two systems or layers communicate) in the TaskFlow SaaS application. For each integration point, state the protocol or mechanism used (REST, webhook, ORM, etc.) and identify which chapter in the book covers the relevant technique.

Exercise 3: Pipeline Terminology

Define each of the following terms as they are used in the DataLens project. Provide a concrete example from the chapter for each.

a) Pluggable connector
b) Staging table
c) Materialized view
d) Quality check
e) YAML pipeline configuration
f) Star schema

Exercise 4: Multi-Agent Workflow Phases

List the six phases of the CodeForge workflow in order. For each phase, name the responsible agent and describe in one sentence what that agent produces as output.


Tier 2: Apply (Exercises 5-8)

These exercises ask you to apply capstone concepts in modified scenarios.

Exercise 5: Requirements Specification for a New SaaS Feature

TaskFlow needs a new feature: time tracking. Team members should be able to log hours against tasks, and project managers should see time reports by member, project, and date range.

Write a complete specification for this feature using the specification-driven approach from Chapter 10. Include:
- At least four user stories
- Database schema additions (new tables and/or columns)
- At least three API endpoints with request/response schemas
- Subscription tier restrictions (e.g., time tracking only available on Pro and Enterprise)

Exercise 6: New Data Connector

Write a complete YAML pipeline configuration for a new DataLens data source: a JSON API that returns weather data for geographic regions. The pipeline should:
- Ingest daily weather records from the API
- Clean records by dropping entries with null temperature values
- Normalize temperature to Celsius and dates to ISO 8601
- Enrich records with a computed "severity_index" field
- Aggregate by region and week
- Include at least two quality checks

Exercise 7: Agent System Prompt

Write a complete system prompt for a new CodeForge agent: the Documentation Agent. This agent should generate comprehensive documentation for code produced by the Coder Agent. Your prompt must include:
- A clear role definition
- The types of documentation to produce (API docs, README, architecture notes)
- Output format requirements
- At least three explicit constraints (things the agent should NOT do)
- Instructions for handling code that is poorly documented

Exercise 8: Testing Strategy Comparison

Create a table comparing the testing strategies of all three capstone projects. For each project, identify:
- The primary type of test (unit, integration, E2E, data quality, etc.)
- What makes testing that project uniquely challenging
- One specific edge case that tests should cover
- The testing tool or framework used


Tier 3: Analyze (Exercises 9-12)

These exercises require deeper analysis of architecture and design decisions.

Exercise 9: Architecture Trade-off Analysis

The TaskFlow project uses a monolithic architecture rather than microservices. Analyze this decision:

a) List three specific benefits of the monolithic approach for TaskFlow's current feature set.
b) List three specific scenarios that would justify splitting TaskFlow into microservices.
c) If you were to extract one service first, which component would you choose and why?
d) What data consistency challenges would the extraction introduce?

Exercise 10: Pipeline Failure Mode Analysis

For the DataLens pipeline, analyze the following failure scenarios. For each, describe what happens, how the system should detect it, and what the recovery strategy should be.

a) The REST API data source returns a 503 Service Unavailable error during ingestion.
b) The currency normalization transformer receives a currency code ("XYZ") that is not in its conversion table.
c) A pipeline run completes successfully but the row_count quality check fails because only 12 records were processed (minimum is 100).
d) Two pipeline instances run simultaneously due to a scheduler bug, and both try to upsert into the same output table.
e) The pipeline processes 50,000 records but runs out of memory during the aggregation step.

Exercise 11: Multi-Agent Conflict Resolution

The CodeForge Reviewer Agent and Coder Agent disagree on the following issues. For each, explain whose position you would support and what the Orchestrator should do.

a) The Reviewer says the code should use async/await for all database operations. The Coder implemented synchronous calls because the project is a simple CLI tool with no concurrent users.
b) The Reviewer flags a security issue: user passwords are stored in plaintext. The Coder's response is that the original specification did not mention password hashing.
c) The Reviewer requests 90% test coverage. The Coder says this is impractical for generated code and suggests 60% coverage with focus on critical paths.

Exercise 12: Cross-Project Pattern Extraction

Identify at least five design patterns or architectural principles that appear in two or more of the three capstone projects. For each, name the pattern, identify which projects use it, and explain why it is appropriate in each context.


Tier 4: Create (Exercises 13-24)

These exercises require you to build substantial components. Each exercise integrates skills from multiple chapters.

Exercise 13: Build a Team Invitation System

Implement the complete team invitation flow for TaskFlow. Your implementation must include:
- A Pydantic schema for invitation requests (email, role, optional message)
- A database model for pending invitations with expiration timestamps
- An API endpoint that creates an invitation and simulates sending an email
- An API endpoint that accepts an invitation using a secure token
- Validation that prevents inviting users who are already team members
- Subscription tier enforcement (Free tier limited to 3 members)

Write the code in Python using FastAPI and SQLAlchemy. Include type hints and docstrings.

Exercise 14: Build a Pipeline Monitoring Dashboard API

Implement the backend API for a DataLens pipeline monitoring dashboard. Your implementation must include:
- A database model for pipeline run history (name, status, start time, end time, records processed, errors)
- An endpoint that returns the last N runs for a given pipeline with pagination
- An endpoint that returns aggregate statistics: success rate, average duration, average records processed
- An endpoint that returns alert status based on configurable thresholds (e.g., alert if last 3 runs failed)
- A health check endpoint that reports whether the most recent pipeline run succeeded and how long ago it completed

Write the code in Python using FastAPI. Include type hints and docstrings.
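The aggregate-statistics endpoint reduces to a pure function you can unit-test before wiring it into FastAPI. A minimal sketch, assuming a run record with `status`, `duration_s`, and `records` fields (these names are illustrative, not from the chapter):

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One pipeline run (hypothetical shape; fields are assumptions)."""
    status: str          # "success" or "failed"
    duration_s: float
    records: int


def aggregate_stats(runs: list[RunRecord]) -> dict[str, float]:
    """Success rate, average duration, and average records processed."""
    if not runs:
        # An empty history should not divide by zero.
        return {"success_rate": 0.0, "avg_duration_s": 0.0, "avg_records": 0.0}
    n = len(runs)
    return {
        "success_rate": sum(r.status == "success" for r in runs) / n,
        "avg_duration_s": sum(r.duration_s for r in runs) / n,
        "avg_records": sum(r.records for r in runs) / n,
    }
```

The endpoint then just queries the run-history model and returns `aggregate_stats(rows)` as JSON.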

Exercise 15: Build a Code Validation Pipeline

Extend the CodeForge CodeValidator class to perform deeper analysis. Your extended validator must:
- Check for common security anti-patterns (e.g., eval(), exec(), hardcoded secrets like password = "...")
- Verify that all functions have return type annotations
- Detect functions longer than 50 lines (a complexity heuristic)
- Check import statements against a list of allowed and banned packages
- Produce a validation report as a structured dictionary with severity levels

Write the code in Python using the ast module. Include type hints and docstrings.
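Two of the checks above fit in a short `ast` sketch: banned calls and missing return annotations. This is a starting point under stated assumptions (a minimal banned-call list, flat dictionaries as findings), not the full CodeValidator extension:

```python
import ast

BANNED_CALLS = {"eval", "exec"}  # assumption: a minimal banned-call list


def validate_source(source: str) -> list[dict[str, str]]:
    """Return a list of findings, each with a severity and a message."""
    findings: list[dict[str, str]] = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Flag direct calls to banned built-ins such as eval()/exec().
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                findings.append(
                    {"severity": "error",
                     "message": f"banned call: {node.func.id}()"}
                )
        # Flag functions with no return type annotation.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.returns is None:
                findings.append(
                    {"severity": "warning",
                     "message": f"missing return annotation: {node.name}"}
                )
    return findings
```

The 50-line heuristic follows the same pattern using `node.end_lineno - node.lineno`, and import checks walk `ast.Import` / `ast.ImportFrom` nodes.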

Exercise 16: Implement Webhook Signature Verification

Implement a reusable webhook signature verification system for TaskFlow's Stripe integration. Your implementation must:
- Accept raw request body bytes and a signature header
- Compute an HMAC-SHA256 signature using a shared secret
- Compare signatures using constant-time comparison to prevent timing attacks
- Raise appropriate HTTP exceptions for invalid signatures
- Include a FastAPI middleware or dependency that can be applied to any webhook endpoint
- Include at least five unit tests covering valid signatures, invalid signatures, missing headers, expired timestamps, and replay attacks
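The cryptographic core is small and uses only the standard library; the sketch below shows the HMAC-SHA256 computation and constant-time comparison (`hmac.compare_digest`) the exercise calls for. Wrapping it in a FastAPI dependency and handling timestamps and replays is left to your implementation:

```python
import hashlib
import hmac


def sign(body: bytes, secret: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature for a raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(body: bytes, signature: str, secret: bytes) -> bool:
    """Constant-time comparison prevents timing attacks on the signature."""
    expected = sign(body, secret)
    return hmac.compare_digest(expected, signature)
```

Note that verification must use the *raw* request bytes: re-serializing parsed JSON can change whitespace or key order and produce a different digest.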

Exercise 17: Build a Data Quality Framework

Design and implement a reusable data quality framework for DataLens that goes beyond the basic checks in the chapter. Your framework must support:
- Schema validation (expected columns, data types)
- Statistical checks (mean, standard deviation, distribution shape compared to historical baselines)
- Referential integrity checks (foreign key values exist in reference tables)
- Freshness checks (data is not older than a configurable threshold)
- A quality score calculation that combines all checks into a single 0-100 score
- A report generator that produces a human-readable summary

Write the code in Python. Include type hints and docstrings.

Exercise 18: Implement a Multi-Agent Conversation Logger

Build a conversation logging system for CodeForge that captures all agent interactions for debugging and analysis. Your implementation must:
- Log every message sent to and received from each agent with timestamps and token counts
- Store logs in a structured format (JSON or SQLite)
- Provide query methods: get all messages for a phase, get all messages from a specific agent, get the full conversation timeline
- Calculate total tokens used per agent and per phase
- Include a method that exports the conversation as a formatted Markdown document for human review
- Handle concurrent agents logging simultaneously without data corruption
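A minimal in-memory starting point, assuming entries with `agent`, `phase`, `content`, and `tokens` fields (the field names are assumptions) and a `threading.Lock` for the concurrency requirement; a persistent version would write the same entries to JSON lines or SQLite:

```python
import threading
from datetime import datetime, timezone


class ConversationLogger:
    """Structured log of agent messages; a lock guards concurrent writers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: list[dict] = []

    def log(self, agent: str, phase: str, content: str, tokens: int) -> None:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent": agent, "phase": phase,
            "content": content, "tokens": tokens,
        }
        with self._lock:
            self._entries.append(entry)

    def messages_for_agent(self, agent: str) -> list[dict]:
        return [e for e in self._entries if e["agent"] == agent]

    def tokens_by_agent(self) -> dict[str, int]:
        totals: dict[str, int] = {}
        for e in self._entries:
            totals[e["agent"]] = totals.get(e["agent"], 0) + e["tokens"]
        return totals

    def to_markdown(self) -> str:
        """Export the full timeline as a Markdown document for review."""
        lines = ["# Conversation log", ""]
        for e in self._entries:
            lines.append(f"## {e['phase']} / {e['agent']} ({e['tokens']} tokens)")
            lines.append(e["content"])
            lines.append("")
        return "\n".join(lines)
```
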

Exercise 19: Build a Subscription Upgrade Flow

Implement the complete subscription upgrade and downgrade flow for TaskFlow. Your implementation must:
- Define Pydantic models for subscription change requests and responses
- Implement an endpoint that initiates an upgrade (creates a simulated Stripe Checkout session)
- Implement a webhook handler that processes subscription change events
- Handle the edge case where a team downgrades but currently exceeds the lower tier's limits (e.g., they have 5 members but are downgrading to Free, which allows 3)
- Include a grace period mechanism that gives downgrading teams 14 days to comply with new limits
- Write at least four integration tests covering upgrade, downgrade, failed payment, and grace period expiry

Exercise 20: Create a Pipeline Replay System

Implement a replay system for DataLens that allows re-running failed pipelines with the same input data. Your implementation must:
- Capture and store the raw ingested data from each pipeline run in a staging table or file
- Provide a method to replay a specific pipeline run using the stored raw data (skipping ingestion)
- Support replaying with a modified pipeline configuration (e.g., to fix a transformation bug)
- Log the difference between the original run and the replay (records gained, records lost, value changes)
- Include error handling for cases where the stored data is missing or corrupted
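The diff requirement reduces to comparing two sets of output records keyed by an identifier. A sketch under one assumption: records share a `record_id` join key (swap in whatever key your output table actually uses):

```python
def replay_diff(original: list[dict], replay: list[dict],
                key: str = "record_id") -> dict[str, list]:
    """Compare output records from an original run and its replay by key.

    Returns keys gained, keys lost, and keys whose record values changed.
    """
    orig_by_key = {r[key]: r for r in original}
    replay_by_key = {r[key]: r for r in replay}
    gained = [k for k in replay_by_key if k not in orig_by_key]
    lost = [k for k in orig_by_key if k not in replay_by_key]
    changed = [k for k in orig_by_key
               if k in replay_by_key and orig_by_key[k] != replay_by_key[k]]
    return {"gained": gained, "lost": lost, "changed": changed}
```
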

Exercise 21: Build an Agent Performance Dashboard

Create a performance analysis module for CodeForge that tracks and visualizes agent effectiveness. Your implementation must:
- Record execution metrics for every agent invocation (tokens, duration, success/failure, output size)
- Calculate per-agent statistics: average tokens per task, success rate, average duration
- Identify the most expensive phase of the workflow (by tokens and by time)
- Track the review-revise loop: how many iterations are needed on average, and whether code quality improves with each iteration
- Produce a summary report as a structured dictionary

Exercise 22: Implement Role-Based Access Control Middleware

Build a complete RBAC middleware for TaskFlow that can be applied to any API endpoint. Your implementation must:
- Define permissions as fine-grained actions (e.g., "task:create", "team:invite", "billing:manage")
- Map roles (owner, admin, member) to sets of permissions
- Create a FastAPI dependency that checks whether the current user has the required permission for the requested action
- Return appropriate HTTP 403 responses with descriptive error messages
- Support custom per-team role definitions (e.g., a team could create a "project_lead" role with specific permissions)
- Include tests for each role attempting each category of action
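The core role-to-permission check is framework-independent. A minimal sketch: the permission strings come from the exercise, but the exact sets assigned to each role are assumptions, and the exception here stands in for FastAPI's `HTTPException(status_code=403)`:

```python
# Role-to-permission map; extend with per-team custom roles at runtime.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "owner": {"task:create", "team:invite", "billing:manage"},
    "admin": {"task:create", "team:invite"},
    "member": {"task:create"},
}


class PermissionDenied(Exception):
    """Stand-in for an HTTP 403 response in this framework-free sketch."""


def require_permission(role: str, permission: str) -> None:
    """Raise with a descriptive message when the role lacks the permission."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    if permission not in allowed:
        raise PermissionDenied(
            f"role '{role}' lacks permission '{permission}'"
        )
```

In the FastAPI version, a dependency factory like `require("team:invite")` would resolve the current user's role and call this check before the endpoint runs.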

Exercise 23: Build a Full-Stack Feature End-to-End

Choose one of the following features and implement it completely, including database model, API endpoint, and a description of the frontend component (pseudo-code or component specification is acceptable for the frontend):

a) TaskFlow: Task Comments -- Users can add comments to tasks, with support for @mentions that trigger notifications.
b) DataLens: Custom Dashboard Builder -- Users can create custom dashboards by selecting metrics, chart types, and filters from a configuration interface.
c) CodeForge: Template Library -- Users can save and reuse project templates that pre-configure agent prompts and workflow settings.

Exercise 24: Design a Notification System

Design and implement a notification system for TaskFlow that supports multiple delivery channels. Your implementation must include:
- A notification model with type, message, recipient, channel (in-app, email), read/unread status, and timestamps
- Event triggers: task assigned, task due soon (24 hours), task overdue, team invitation received, comment with @mention
- A notification preferences model where users can configure which events generate notifications and through which channels
- An API endpoint to list notifications with filtering by read/unread and pagination
- An API endpoint to mark notifications as read (individually and in bulk)
- A notification dispatcher that routes notifications to the appropriate channel handler
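The dispatcher in the last requirement is a small routing table from channel name to handler. A sketch under assumptions (the handler signature and channel names are illustrative):

```python
from typing import Callable

# A handler receives (recipient, message); real handlers would send an
# email or write an in-app notification row.
Handler = Callable[[str, str], None]


class NotificationDispatcher:
    """Routes notifications to the handler registered for each channel."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def register(self, channel: str, handler: Handler) -> None:
        self._handlers[channel] = handler

    def dispatch(self, channel: str, recipient: str, message: str) -> None:
        handler = self._handlers.get(channel)
        if handler is None:
            raise ValueError(f"no handler registered for channel '{channel}'")
        handler(recipient, message)
```

Registering handlers at startup keeps channel-specific code (SMTP, in-app rows) out of the event-trigger logic.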


Tier 5: Challenge (Exercises 25-30)

These exercises require integrating concepts from multiple chapters and making complex design decisions. They represent the level of work expected in a professional capstone project.

Exercise 25: Multi-Tenant Data Isolation Audit

TaskFlow stores data for multiple teams in a single database. Perform a security audit of the data isolation:

a) Review the database schema from Section 41.2. Identify every query path by which one team's data could be accessed by another team's users.
b) Design a row-level security strategy using PostgreSQL's Row-Level Security (RLS) policies. Write the SQL for at least three RLS policies.
c) Implement a FastAPI middleware that injects the current user's team context into every database query, ensuring that queries are automatically scoped to the correct team.
d) Write integration tests that verify one team's API calls cannot access another team's data, covering at least five endpoints.
e) Document the trade-offs of your approach compared to separate schemas per tenant and separate databases per tenant.

Exercise 26: End-to-End Pipeline with Real Data

Build a complete DataLens pipeline that processes a real (or realistic) dataset:

a) Create a CSV file with at least 500 records of simulated e-commerce order data (order_id, customer_id, product, category, quantity, price, currency, channel, date, region).
b) Write a pipeline configuration in YAML that ingests the CSV, cleans it, normalizes currencies and dates, enriches with computed fields (total_amount, margin estimate), and aggregates by date, channel, and category.
c) Implement the pipeline using the DataLens framework from the chapter. Process the full dataset and produce output records.
d) Implement at least five quality checks with appropriate thresholds.
e) Create at least three analytics API endpoints that serve the aggregated data with filtering and date range parameters.
f) Write tests that verify the pipeline produces correct output for a known subset of input records.

Exercise 27: Full CodeForge Workflow Simulation

Build a complete CodeForge simulation that processes a real project specification:

a) Write a detailed natural-language specification for a simple project (e.g., "Build a URL shortener with FastAPI, SQLite, and a click tracking dashboard").
b) Implement mock agents that return realistic, pre-written responses for each phase. The Specification Agent should return structured requirements, the Architect Agent should return an architecture document, the Coder Agent should return actual working Python code, the Reviewer Agent should find at least two realistic issues, and the Tester Agent should return actual pytest test code.
c) Run the full Orchestrator workflow with your mock agents. Verify that the review-revise loop executes correctly and that the Coder Agent addresses the Reviewer's feedback.
d) Validate that all generated code files pass syntax checking using the CodeValidator.
e) Write a report (as a Python dictionary or JSON) summarizing the workflow: phases completed, tokens used per agent, issues found and resolved, and total execution time.

Exercise 28: Production Deployment Plan

Create a complete deployment plan for one of the three capstone projects. Your plan must include:

a) A Dockerfile that builds the application image with multi-stage builds for minimal image size.
b) A docker-compose.yaml for local development that includes the application, database, and Redis (if applicable).
c) Environment variable documentation listing every required variable, its purpose, and example values.
d) A CI/CD pipeline definition (GitHub Actions YAML) that runs linting, type checking, and tests, builds the Docker image, and deploys to a staging environment.
e) A monitoring checklist: what metrics to track, what alerts to set, and what dashboards to create.
f) A rollback procedure: step-by-step instructions for reverting a failed deployment.
g) A security checklist: secrets management, HTTPS configuration, CORS settings, rate limiting, and input validation.

Exercise 29: Cross-Project Integration

Design and implement an integration between two of the three capstone projects:

a) TaskFlow + DataLens: Build a pipeline that ingests TaskFlow's task completion data and produces productivity analytics (completion rate trends, average time to close by priority, team member workload distribution).
b) TaskFlow + CodeForge: Build a system where creating a TaskFlow task with a specific label (e.g., "auto-implement") triggers CodeForge to generate a code implementation based on the task description.
c) DataLens + CodeForge: Build a system where CodeForge generates DataLens pipeline configurations from natural-language descriptions of data processing requirements.

For your chosen integration, implement the data flow, API contracts, and at least three tests that verify the integration works correctly.

Exercise 30: Capstone Portfolio Project

Design your own capstone project from scratch that demonstrates mastery of vibe coding. Your project must:

a) Solve a real problem that you care about (not a tutorial example).
b) Integrate at least four of the following: frontend (Chapter 16), backend API (Chapter 17), database (Chapter 18), external API integration (Chapter 20), authentication (Chapter 27), deployment (Chapter 29), AI features (Chapter 39).
c) Write a complete specification document with user stories, architecture decisions, and technology justifications.
d) Implement the core functionality (at minimum, the backend API with database and one integration point).
e) Write a comprehensive test suite with at least 20 tests covering unit, integration, and edge cases.
f) Document your vibe coding process: what prompts you used, what worked, what required iteration, and what you would do differently.

This is an open-ended project. The goal is not to build something enormous, but to build something real that demonstrates deliberate architecture decisions, clean code, and effective use of AI-assisted development.


These exercises are designed to be completed over multiple sessions. Tier 4 and Tier 5 exercises are substantial projects that benefit from planning, iterative development, and reflection -- exactly the workflow this book teaches.