Chapter 18: Exercises

Tier 1 -- Remember and Understand (Exercises 1-6)

Exercise 1: Database Vocabulary

Define each of the following terms in one or two sentences: table, row, column, primary key, foreign key, index, transaction, schema, migration, ORM.

Exercise 2: Relationship Identification

For each of the following pairs of entities, identify the relationship type (one-to-one, one-to-many, or many-to-many) and state which side holds the foreign key:

Author and Book
User and Profile
Student and Course
Department and Employee
Order and Product
Country and Capital City

Exercise 3: Normal Forms

Look at the following table and identify which normal form(s) it violates. Explain your reasoning.

+----+----------+------------------+-----------+------------------+
| id | name     | courses          | advisor   | advisor_email    |
+----+----------+------------------+-----------+------------------+
|  1 | Alice    | Math, Physics    | Dr. Smith | smith@univ.edu   |
|  2 | Bob      | Math, Chemistry  | Dr. Smith | smith@univ.edu   |
|  3 | Charlie  | Physics          | Dr. Jones | jones@univ.edu   |
+----+----------+------------------+-----------+------------------+

Exercise 4: ACID Properties

Match each ACID property with the scenario it protects against:

Properties: Atomicity, Consistency, Isolation, Durability

Scenarios: 1. Two users simultaneously update the same bank account balance. 2. A transfer debits one account but crashes before crediting the other. 3. The server loses power one second after confirming a transaction. 4. An INSERT violates a UNIQUE constraint.

Exercise 5: SQL Clause Ordering

Arrange these SQL clauses in the correct syntactic order: WHERE, SELECT, ORDER BY, FROM, GROUP BY, HAVING, LIMIT

Exercise 6: JOIN Type Selection

For each scenario, state whether you should use an INNER JOIN or LEFT JOIN and explain why:

Show all customers and their orders (include customers who have never ordered).
List products and their categories (every product has a category).
Display employees and their assigned parking spots (not every employee has one).
Report all departments and the count of employees in each.

Tier 2 -- Apply (Exercises 7-14)

Exercise 7: Writing Basic SQL

Given a products table with columns id, name, category, price, and stock_quantity, write SQL queries for:

All products in the "Electronics" category, sorted by price descending.
The average price of products in each category.
Categories that have more than 10 products.
The top 5 most expensive products that are in stock (stock_quantity > 0).
Update the price of all "Electronics" products by increasing them 10%.

Exercise 8: SQLAlchemy Model Creation

Write a SQLAlchemy 2.0 model for a BlogPost table with the following requirements:

Integer primary key id
String title (max 200 characters, required)
Text body (required)
String slug (max 250 characters, unique, required)
Boolean published (default False)
DateTime published_at (nullable)
DateTime created_at (server default to current time)
Integer foreign key author_id referencing a users table
Relationship to the User model via author attribute

Include type hints, __repr__, and appropriate indexes.

Exercise 9: Query Building with SQLAlchemy

Using the models from Exercise 8, write SQLAlchemy 2.0 queries for:

Get all published blog posts ordered by publication date (newest first).
Get all posts by a specific author (by author_id).
Count the number of published vs. unpublished posts.
Get the 5 most recent posts with their author's username (using a join).
Find authors who have more than 3 published posts.

Exercise 10: Pydantic Validation Schemas

Create Pydantic v2 schemas for the BlogPost model from Exercise 8:

BlogPostCreate -- for creating new posts (title, body, author_id required; slug auto-generated from title)
BlogPostUpdate -- for updating posts (all fields optional)
BlogPostRead -- for API responses (includes id, timestamps, author username)

Include appropriate field validators (e.g., title length, slug format).

Exercise 11: Writing a Migration

Write an Alembic migration script that:

Adds a view_count column (integer, default 0, non-nullable) to an existing blog_posts table.
Adds a tags table with id and name columns.
Adds a post_tags junction table connecting blog_posts and tags.
Creates indexes on post_tags.post_id and post_tags.tag_id.

Include both upgrade() and downgrade() functions.

Exercise 12: Repository Pattern

Implement a BlogPostRepository class with these methods:

get_by_id(post_id: int) -> BlogPost | None
get_by_slug(slug: str) -> BlogPost | None
list_published(limit: int = 20, offset: int = 0) -> list[BlogPost]
search(query: str) -> list[BlogPost] -- searches title and body
create(data: BlogPostCreate) -> BlogPost
publish(post: BlogPost) -> BlogPost -- sets published=True and published_at
delete(post: BlogPost) -> None

Exercise 13: Index Design

Given the following query patterns for a products table, design an appropriate set of indexes. Justify each index and explain what queries it supports.

Query patterns: 1. Search products by category_id and sort by price 2. Find products by exact sku match 3. List products by created_at in descending order 4. Filter products by category_id and in_stock status 5. Search products by name (using LIKE '%term%')

Exercise 14: Data Validation Layers

For a user registration feature, implement validation at all three layers:

Pydantic schema: Validate email format, password strength (min 8 chars, at least one digit and one uppercase letter), username format (alphanumeric, 3-30 chars).
Business logic: Check username and email uniqueness against the database.
Database constraints: Define the User model with appropriate CHECK constraints, UNIQUE constraints, and NOT NULL constraints.

Tier 3 -- Analyze (Exercises 15-22)

Exercise 15: Schema Review

An AI generated the following schema for a library management system. Identify at least 5 design problems and suggest fixes for each.

class Book(Base):
    __tablename__ = "books"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(100))
    author: Mapped[str] = mapped_column(String(100))
    isbn: Mapped[str] = mapped_column(String(20))
    genre: Mapped[str] = mapped_column(String(50))
    available_copies: Mapped[int] = mapped_column()
    total_copies: Mapped[int] = mapped_column()
    borrower_names: Mapped[str] = mapped_column(Text)  # comma-separated

class Borrow(Base):
    __tablename__ = "borrows"
    id: Mapped[int] = mapped_column(primary_key=True)
    book_title: Mapped[str] = mapped_column(String(100))
    borrower_name: Mapped[str] = mapped_column(String(100))
    borrow_date: Mapped[str] = mapped_column(String(20))
    return_date: Mapped[str] = mapped_column(String(20))

Exercise 16: Normalization Analysis

Take the following denormalized table and normalize it to 3NF. Show each step and explain what anomalies each normalization step eliminates.

orders:
+----------+-----------+-----------+------------------+---------+-------+----------+-----------+
| order_id | cust_name | cust_email| cust_address     | prod_id | price | quantity | prod_name |
+----------+-----------+-----------+------------------+---------+-------+----------+-----------+
| 1001     | Alice     | a@ex.com  | 123 Main St      | P01     | 29.99 | 2        | Widget    |
| 1001     | Alice     | a@ex.com  | 123 Main St      | P02     | 49.99 | 1        | Gadget    |
| 1002     | Bob       | b@ex.com  | 456 Oak Ave      | P01     | 29.99 | 3        | Widget    |
+----------+-----------+-----------+------------------+---------+-------+----------+-----------+

Exercise 17: N+1 Query Detection

The following code has an N+1 query problem. Identify the problem, explain why it occurs, and rewrite the code to fix it using SQLAlchemy eager loading.

def get_project_summary(session: Session) -> list[dict]:
    projects = session.scalars(select(Project)).all()
    summaries = []
    for project in projects:
        task_count = len(project.tasks)
        member_count = len(project.members)
        latest_task = max(project.tasks, key=lambda t: t.created_at) if project.tasks else None
        summaries.append({
            "project": project.name,
            "tasks": task_count,
            "members": member_count,
            "latest_task": latest_task.title if latest_task else None,
        })
    return summaries

Exercise 18: Query Optimization

Given this slow query and its EXPLAIN output, suggest at least three ways to improve performance:

SELECT u.username, COUNT(o.id) as order_count, SUM(o.total) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.created_at >= '2025-01-01'
  AND u.is_active = true
GROUP BY u.username
ORDER BY total_spent DESC
LIMIT 10;

EXPLAIN output:

Sort (cost=15234.56..15234.58 rows=10)
  -> HashAggregate (cost=15230.00..15232.00 rows=200)
     -> Hash Left Join (cost=12000.00..14500.00 rows=50000)
        -> Seq Scan on users (cost=0.00..5000.00 rows=100000)
              Filter: (is_active = true)
        -> Seq Scan on orders (cost=0.00..8000.00 rows=500000)
              Filter: (created_at >= '2025-01-01')

Exercise 19: SQL vs. NoSQL Decision

For each of the following applications, recommend either a relational database (PostgreSQL) or a NoSQL database (MongoDB or Redis). Justify your choice by referencing specific data characteristics and access patterns:

A social media platform tracking user posts, likes, and followers
A real-time gaming leaderboard
An e-commerce product catalog where each product category has different attributes
A banking system for financial transactions
A session store for a web application with 10 million active users
A content management system for a news website

Exercise 20: Migration Safety Analysis

Review the following migration and identify potential problems when running it on a production database with 5 million rows in the users table:

def upgrade() -> None:
    op.add_column("users", sa.Column("phone", sa.String(20), nullable=False))
    op.create_index("ix_users_phone", "users", ["phone"], unique=True)
    op.drop_column("users", "legacy_field")

def downgrade() -> None:
    op.add_column("users", sa.Column("legacy_field", sa.Text()))
    op.drop_index("ix_users_phone", table_name="users")
    op.drop_column("users", "phone")

Exercise 21: Relationship Modeling Challenge

Model the following real-world scenario in SQLAlchemy:

A university has departments, professors, courses, and students. - Each department has many professors, but a professor belongs to one department. - Professors teach courses. A course can be taught by multiple professors (across sections), and a professor teaches multiple courses. - Students enroll in courses. Each enrollment has a semester, year, and grade. - Some professors serve as department heads (one per department). - Courses have prerequisites (other courses that must be completed first).

Draw the entity-relationship diagram (as ASCII art) and write the SQLAlchemy models.

Exercise 22: Polymorphic Design

Design a notification system where users can receive different types of notifications:

Email notifications: have a subject line and HTML body
SMS notifications: have a phone number and short message (max 160 chars)
Push notifications: have a device token, title, and body
All notifications share: id, user_id, created_at, read_at, type

Implement this using SQLAlchemy single table inheritance. Then discuss the trade-offs versus using joined table inheritance or a separate table for each notification type.

Tier 4 -- Evaluate and Create (Exercises 23-27)

Exercise 23: Complete Data Layer

Build a complete data layer for a recipe management application with the following features:

Users can create, edit, and delete recipes
Recipes have a title, description, prep time, cook time, servings, and difficulty level
Recipes contain an ordered list of ingredients (with quantity and unit)
Recipes contain ordered preparation steps
Users can tag recipes with categories (e.g., "Italian", "Vegetarian", "Quick")
Users can rate and review recipes (1-5 stars with optional text)
Users can save recipes to personal collections

Deliver: 1. SQLAlchemy models with all relationships and constraints 2. Pydantic schemas for create/update/read operations 3. Repository classes for each entity 4. At least 3 meaningful indexes based on expected query patterns

Exercise 24: Schema Evolution

You have the recipe application from Exercise 23 in production with 50,000 recipes. Design a series of Alembic migrations to support these new features:

Add a "nutrition facts" table linked to recipes (calories, protein, carbs, fat per serving)
Add a "meal plan" feature where users can plan recipes for specific dates
Change the rating system from integer (1-5) to decimal (0.0-5.0)
Add full-text search capability to recipe titles and descriptions

For each migration, explain the strategy for handling existing data and minimizing downtime.

Exercise 25: Performance Optimization Project

You are given a slow recipe search endpoint. The query takes 3 seconds on a database with 100,000 recipes. Design and implement optimizations to bring it under 100ms:

def search_recipes(session, query, category=None, max_time=None, min_rating=None):
    stmt = (
        select(Recipe)
        .join(Recipe.tags)
        .join(Recipe.ratings)
        .where(Recipe.title.ilike(f"%{query}%") | Recipe.description.ilike(f"%{query}%"))
    )
    if category:
        stmt = stmt.where(Tag.name == category)
    if max_time:
        stmt = stmt.where((Recipe.prep_time + Recipe.cook_time) <= max_time)
    if min_rating:
        stmt = stmt.having(func.avg(Rating.score) >= min_rating)
    stmt = stmt.group_by(Recipe.id).order_by(func.avg(Rating.score).desc()).limit(20)
    return session.scalars(stmt).all()

Document your optimization strategy, including index design, query rewriting, and any schema changes.

Exercise 26: Multi-Database Architecture

Design a data architecture for a real-time analytics dashboard that needs:

PostgreSQL for user accounts and configuration (transactional data)
Redis for caching dashboard data and managing real-time updates
A strategy for keeping cached data consistent with the source of truth

Implement the Python code that coordinates between PostgreSQL and Redis, including cache invalidation logic.

Exercise 27: Database Testing Strategy

Write a comprehensive test suite for the recipe application data layer from Exercise 23. Include:

Unit tests for Pydantic validation schemas (valid and invalid inputs)
Integration tests for repository methods using an in-memory SQLite database
Tests for cascading deletes and referential integrity
Tests for concurrent access scenarios
Tests for migration rollback/rollforward

Use pytest fixtures for database setup and teardown. Ensure tests are isolated and can run in any order.

Tier 5 -- Synthesis and Transfer (Exercises 28-30)

Exercise 28: Domain-Driven Data Layer

Choose a domain you are personally familiar with (e.g., healthcare scheduling, inventory management, event planning, educational grading). Design a complete data layer from scratch:

Write a domain description document listing all entities and relationships
Create an entity-relationship diagram
Implement SQLAlchemy models
Design Pydantic schemas
Build repository classes
Create the initial Alembic migration
Write at least 10 integration tests

Document the design decisions you made and the trade-offs you considered.

Exercise 29: AI-Assisted Schema Design Comparison

Use an AI coding assistant to generate a database schema for an e-commerce platform (products, categories, users, orders, payments, reviews, wishlists). Do this three times with different prompting strategies:

Minimal prompt: "Design a database for an e-commerce platform"
Detailed prompt: Provide a full list of entities, relationships, and constraints
Iterative prompt: Start with core entities and add features one at a time

Compare the three resulting schemas. Which approach produced the best design? What errors did the AI make in each case? What did you learn about effective prompting for database design?

Exercise 30: Legacy Migration Project

You inherit a project that stores all data in JSON files (similar to the Chapter 6 task manager). The application has 10,000 users and 500,000 tasks stored across thousands of JSON files. Design and implement a complete migration:

Analyze the existing JSON structure and identify all entities
Design a normalized relational schema
Write a migration script that reads JSON files and populates the database
Handle data quality issues (missing fields, inconsistent formats, duplicates)
Implement a verification step that confirms all data was migrated correctly
Write a rollback plan in case the migration fails partway through
Estimate the migration time and plan for zero-downtime deployment