Chapter 18: Exercises
Tier 1 -- Remember and Understand (Exercises 1-6)
Exercise 1: Database Vocabulary
Define each of the following terms in one or two sentences: table, row, column, primary key, foreign key, index, transaction, schema, migration, ORM.
Exercise 2: Relationship Identification
For each of the following pairs of entities, identify the relationship type (one-to-one, one-to-many, or many-to-many) and state which side holds the foreign key:
- Author and Book
- User and Profile
- Student and Course
- Department and Employee
- Order and Product
- Country and Capital City
Exercise 3: Normal Forms
Look at the following table and identify which normal form(s) it violates. Explain your reasoning.
+----+----------+------------------+-----------+------------------+
| id | name | courses | advisor | advisor_email |
+----+----------+------------------+-----------+------------------+
| 1 | Alice | Math, Physics | Dr. Smith | smith@univ.edu |
| 2 | Bob | Math, Chemistry | Dr. Smith | smith@univ.edu |
| 3 | Charlie | Physics | Dr. Jones | jones@univ.edu |
+----+----------+------------------+-----------+------------------+
Exercise 4: ACID Properties
Match each ACID property with the scenario it protects against:
Properties: Atomicity, Consistency, Isolation, Durability
Scenarios:
1. Two users simultaneously update the same bank account balance.
2. A transfer debits one account but crashes before crediting the other.
3. The server loses power one second after confirming a transaction.
4. An INSERT violates a UNIQUE constraint.
Exercise 5: SQL Clause Ordering
Arrange these SQL clauses in the correct syntactic order:
WHERE, SELECT, ORDER BY, FROM, GROUP BY, HAVING, LIMIT
Exercise 6: JOIN Type Selection
For each scenario, state whether you should use an INNER JOIN or LEFT JOIN and explain why:
- Show all customers and their orders (include customers who have never ordered).
- List products and their categories (every product has a category).
- Display employees and their assigned parking spots (not every employee has one).
- Report all departments and the count of employees in each.
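To see the difference concretely, here is a minimal sketch using Python's built-in sqlite3 module on a made-up customers/orders pair (the table and column names are illustrative, not taken from the exercises):

```python
import sqlite3

# In-memory database with two toy tables: customers and orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0);  -- only Alice has an order
""")

# INNER JOIN drops customers with no matching order row.
inner = conn.execute(
    "SELECT c.name, o.total FROM customers c JOIN orders o ON o.customer_id = c.id"
).fetchall()

# LEFT JOIN keeps every customer; missing orders appear as NULL (None in Python).
left = conn.execute(
    "SELECT c.name, o.total FROM customers c LEFT JOIN orders o ON o.customer_id = c.id"
).fetchall()

print(inner)  # [('Alice', 25.0)]
print(left)   # [('Alice', 25.0), ('Bob', None)]
```

Bob's row survives only in the LEFT JOIN result, which is the behavior each scenario above asks you to reason about.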
Tier 2 -- Apply (Exercises 7-14)
Exercise 7: Writing Basic SQL
Given a products table with columns id, name, category, price, and stock_quantity, write SQL queries for:
- All products in the "Electronics" category, sorted by price descending.
- The average price of products in each category.
- Categories that have more than 10 products.
- The top 5 most expensive products that are in stock (stock_quantity > 0).
- Update the price of all "Electronics" products, increasing each by 10%.
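If you want a scratch environment for trying these queries, the table can be stood up with sqlite3; the sample rows below are invented purely for experimentation, and the worked query covers only the first bullet:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        category TEXT NOT NULL,
        price REAL NOT NULL,
        stock_quantity INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO products (name, category, price, stock_quantity) VALUES (?, ?, ?, ?)",
    [
        ("Laptop", "Electronics", 999.99, 5),
        ("Headphones", "Electronics", 79.99, 0),
        ("Desk Lamp", "Home", 24.99, 12),
    ],
)

# First bullet as a worked example: Electronics sorted by price, highest first.
rows = conn.execute(
    "SELECT name, price FROM products WHERE category = ? ORDER BY price DESC",
    ("Electronics",),
).fetchall()
print(rows)  # [('Laptop', 999.99), ('Headphones', 79.99)]
```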
Exercise 8: SQLAlchemy Model Creation
Write a SQLAlchemy 2.0 model for a BlogPost table with the following requirements:
- id - Integer primary key
- title - String (max 200 characters, required)
- body - Text (required)
- slug - String (max 250 characters, unique, required)
- published - Boolean (default False)
- published_at - DateTime (nullable)
- created_at - DateTime (server default set to the current time)
- author_id - Integer foreign key referencing a users table
- Relationship to the User model via an author attribute
Include type hints, __repr__, and appropriate indexes.
Exercise 9: Query Building with SQLAlchemy
Using the models from Exercise 8, write SQLAlchemy 2.0 queries for:
- Get all published blog posts ordered by publication date (newest first).
- Get all posts by a specific author (by author_id).
- Count the number of published vs. unpublished posts.
- Get the 5 most recent posts with their author's username (using a join).
- Find authors who have more than 3 published posts.
Exercise 10: Pydantic Validation Schemas
Create Pydantic v2 schemas for the BlogPost model from Exercise 8:
- BlogPostCreate -- for creating new posts (title, body, author_id required; slug auto-generated from title)
- BlogPostUpdate -- for updating posts (all fields optional)
- BlogPostRead -- for API responses (includes id, timestamps, author username)
Include appropriate field validators (e.g., title length, slug format).
Exercise 11: Writing a Migration
Write an Alembic migration script that:
- Adds a view_count column (integer, default 0, non-nullable) to an existing blog_posts table.
- Adds a tags table with id and name columns.
- Adds a post_tags junction table connecting blog_posts and tags.
- Creates indexes on post_tags.post_id and post_tags.tag_id.
Include both upgrade() and downgrade() functions.
Exercise 12: Repository Pattern
Implement a BlogPostRepository class with these methods:
- get_by_id(post_id: int) -> BlogPost | None
- get_by_slug(slug: str) -> BlogPost | None
- list_published(limit: int = 20, offset: int = 0) -> list[BlogPost]
- search(query: str) -> list[BlogPost] -- searches title and body
- create(data: BlogPostCreate) -> BlogPost
- publish(post: BlogPost) -> BlogPost -- sets published=True and published_at
- delete(post: BlogPost) -> None
Exercise 13: Index Design
Given the following query patterns for a products table, design an appropriate set of indexes. Justify each index and explain what queries it supports.
Query patterns:
1. Search products by category_id and sort by price
2. Find products by exact sku match
3. List products by created_at in descending order
4. Filter products by category_id and in_stock status
5. Search products by name (using LIKE '%term%')
Exercise 14: Data Validation Layers
For a user registration feature, implement validation at all three layers:
- Pydantic schema: Validate email format, password strength (min 8 chars, at least one digit and one uppercase letter), username format (alphanumeric, 3-30 chars).
- Business logic: Check username and email uniqueness against the database.
- Database constraints: Define the User model with appropriate CHECK constraints, UNIQUE constraints, and NOT NULL constraints.
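Before wiring up Pydantic, the first layer's rules can be prototyped with the standard library's re module; this sketch (the function name and error strings are our own) mirrors the rules stated above:

```python
import re

def validate_registration(username: str, email: str, password: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not re.fullmatch(r"[A-Za-z0-9]{3,30}", username):
        problems.append("username must be alphanumeric, 3-30 characters")
    # Deliberately loose email check; real validation belongs in Pydantic's EmailStr.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        problems.append("email is not a plausible address")
    if len(password) < 8 or not re.search(r"\d", password) or not re.search(r"[A-Z]", password):
        problems.append("password needs 8+ chars, a digit, and an uppercase letter")
    return problems

print(validate_registration("ann", "ann@example.com", "Secret123"))  # []
```

Returning a list of problems rather than raising on the first failure is also how Pydantic reports errors, so translating these checks into field validators is mostly mechanical.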
Tier 3 -- Analyze (Exercises 15-22)
Exercise 15: Schema Review
An AI generated the following schema for a library management system. Identify at least 5 design problems and suggest fixes for each.
class Book(Base):
    __tablename__ = "books"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(100))
    author: Mapped[str] = mapped_column(String(100))
    isbn: Mapped[str] = mapped_column(String(20))
    genre: Mapped[str] = mapped_column(String(50))
    available_copies: Mapped[int] = mapped_column()
    total_copies: Mapped[int] = mapped_column()
    borrower_names: Mapped[str] = mapped_column(Text)  # comma-separated

class Borrow(Base):
    __tablename__ = "borrows"
    id: Mapped[int] = mapped_column(primary_key=True)
    book_title: Mapped[str] = mapped_column(String(100))
    borrower_name: Mapped[str] = mapped_column(String(100))
    borrow_date: Mapped[str] = mapped_column(String(20))
    return_date: Mapped[str] = mapped_column(String(20))
Exercise 16: Normalization Analysis
Take the following denormalized table and normalize it to 3NF. Show each step and explain what anomalies each normalization step eliminates.
orders:
+----------+-----------+-----------+------------------+---------+-------+----------+-----------+
| order_id | cust_name | cust_email| cust_address | prod_id | price | quantity | prod_name |
+----------+-----------+-----------+------------------+---------+-------+----------+-----------+
| 1001 | Alice | a@ex.com | 123 Main St | P01 | 29.99 | 2 | Widget |
| 1001 | Alice | a@ex.com | 123 Main St | P02 | 49.99 | 1 | Gadget |
| 1002 | Bob | b@ex.com | 456 Oak Ave | P01 | 29.99 | 3 | Widget |
+----------+-----------+-----------+------------------+---------+-------+----------+-----------+
Exercise 17: N+1 Query Detection
The following code has an N+1 query problem. Identify the problem, explain why it occurs, and rewrite the code to fix it using SQLAlchemy eager loading.
def get_project_summary(session: Session) -> list[dict]:
    projects = session.scalars(select(Project)).all()
    summaries = []
    for project in projects:
        task_count = len(project.tasks)
        member_count = len(project.members)
        latest_task = max(project.tasks, key=lambda t: t.created_at) if project.tasks else None
        summaries.append({
            "project": project.name,
            "tasks": task_count,
            "members": member_count,
            "latest_task": latest_task.title if latest_task else None,
        })
    return summaries
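One way to confirm an N+1 problem empirically is to count the statements a code path issues. SQLAlchemy can do this with engine event listeners; the same idea can be shown in miniature with sqlite3's trace callback (a sketch with made-up tables, not the Project model above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO projects (name) VALUES (?)", [("a",), ("b",), ("c",)])

statements: list[str] = []
conn.set_trace_callback(statements.append)  # record every statement SQLite executes

# The naive pattern: one query for the parent rows, then one more per row.
project_ids = [row[0] for row in conn.execute("SELECT id FROM projects")]
for pid in project_ids:
    conn.execute("SELECT name FROM projects WHERE id = ?", (pid,)).fetchone()

conn.set_trace_callback(None)
print(len(statements))  # 4 statements for 3 rows: the classic N+1 signature
```

A fixed version should issue a constant number of statements regardless of how many projects exist; the counter makes that easy to verify before and after your rewrite.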
Exercise 18: Query Optimization
Given this slow query and its EXPLAIN output, suggest at least three ways to improve performance:
SELECT u.username, COUNT(o.id) as order_count, SUM(o.total) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.created_at >= '2025-01-01'
AND u.is_active = true
GROUP BY u.username
ORDER BY total_spent DESC
LIMIT 10;
EXPLAIN output:
Sort  (cost=15234.56..15234.58 rows=10)
  ->  HashAggregate  (cost=15230.00..15232.00 rows=200)
        ->  Hash Left Join  (cost=12000.00..14500.00 rows=50000)
              ->  Seq Scan on users  (cost=0.00..5000.00 rows=100000)
                    Filter: (is_active = true)
              ->  Seq Scan on orders  (cost=0.00..8000.00 rows=500000)
                    Filter: (created_at >= '2025-01-01')
Exercise 19: SQL vs. NoSQL Decision
For each of the following applications, recommend either a relational database (PostgreSQL) or a NoSQL database (MongoDB or Redis). Justify your choice by referencing specific data characteristics and access patterns:
- A social media platform tracking user posts, likes, and followers
- A real-time gaming leaderboard
- An e-commerce product catalog where each product category has different attributes
- A banking system for financial transactions
- A session store for a web application with 10 million active users
- A content management system for a news website
Exercise 20: Migration Safety Analysis
Review the following migration and identify potential problems when running it on a production database with 5 million rows in the users table:
def upgrade() -> None:
    op.add_column("users", sa.Column("phone", sa.String(20), nullable=False))
    op.create_index("ix_users_phone", "users", ["phone"], unique=True)
    op.drop_column("users", "legacy_field")

def downgrade() -> None:
    op.add_column("users", sa.Column("legacy_field", sa.Text()))
    op.drop_index("ix_users_phone", table_name="users")
    op.drop_column("users", "phone")
Exercise 21: Relationship Modeling Challenge
Model the following real-world scenario in SQLAlchemy:
A university has departments, professors, courses, and students.
- Each department has many professors, but a professor belongs to one department.
- Professors teach courses. A course can be taught by multiple professors (across sections), and a professor teaches multiple courses.
- Students enroll in courses. Each enrollment has a semester, year, and grade.
- Some professors serve as department heads (one per department).
- Courses have prerequisites (other courses that must be completed first).
Draw the entity-relationship diagram (as ASCII art) and write the SQLAlchemy models.
Exercise 22: Polymorphic Design
Design a notification system where users can receive different types of notifications:
- Email notifications: have a subject line and HTML body
- SMS notifications: have a phone number and short message (max 160 chars)
- Push notifications: have a device token, title, and body
- All notifications share: id, user_id, created_at, read_at, type
Implement this using SQLAlchemy single table inheritance. Then discuss the trade-offs versus using joined table inheritance or a separate table for each notification type.
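Whatever ORM machinery you use, single table inheritance ultimately produces one wide table with a discriminator column and nullable type-specific columns. A sketch of that table shape in raw SQLite (column names follow the requirements above; the sample row is invented) may help you reason about the trade-offs before writing the models:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE notifications (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        type TEXT NOT NULL,          -- discriminator: 'email' | 'sms' | 'push'
        created_at TEXT NOT NULL DEFAULT (datetime('now')),
        read_at TEXT,
        subject TEXT,                -- email only
        html_body TEXT,              -- email only
        phone_number TEXT,           -- sms only
        message TEXT,                -- sms only
        device_token TEXT,           -- push only
        title TEXT,                  -- push only
        body TEXT                    -- push only
    )
""")
# An SMS row leaves every email- and push-specific column NULL.
conn.execute(
    "INSERT INTO notifications (user_id, type, phone_number, message) VALUES (?, ?, ?, ?)",
    (1, "sms", "+15551234567", "Your order shipped."),
)
row = conn.execute("SELECT type, message FROM notifications").fetchone()
print(row)  # ('sms', 'Your order shipped.')
```

Notice how many columns are NULL for any given row; that sparsity is exactly what the trade-off discussion in this exercise should weigh against the joins required by the alternatives.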
Tier 4 -- Evaluate and Create (Exercises 23-27)
Exercise 23: Complete Data Layer
Build a complete data layer for a recipe management application with the following features:
- Users can create, edit, and delete recipes
- Recipes have a title, description, prep time, cook time, servings, and difficulty level
- Recipes contain an ordered list of ingredients (with quantity and unit)
- Recipes contain ordered preparation steps
- Users can tag recipes with categories (e.g., "Italian", "Vegetarian", "Quick")
- Users can rate and review recipes (1-5 stars with optional text)
- Users can save recipes to personal collections
Deliver:
1. SQLAlchemy models with all relationships and constraints
2. Pydantic schemas for create/update/read operations
3. Repository classes for each entity
4. At least 3 meaningful indexes based on expected query patterns
Exercise 24: Schema Evolution
You have the recipe application from Exercise 23 in production with 50,000 recipes. Design a series of Alembic migrations to support these new features:
- Add a "nutrition facts" table linked to recipes (calories, protein, carbs, fat per serving)
- Add a "meal plan" feature where users can plan recipes for specific dates
- Change the rating system from integer (1-5) to decimal (0.0-5.0)
- Add full-text search capability to recipe titles and descriptions
For each migration, explain the strategy for handling existing data and minimizing downtime.
Exercise 25: Performance Optimization Project
You are given a slow recipe search endpoint. The query takes 3 seconds on a database with 100,000 recipes. Design and implement optimizations to bring it under 100ms:
def search_recipes(session, query, category=None, max_time=None, min_rating=None):
    stmt = (
        select(Recipe)
        .join(Recipe.tags)
        .join(Recipe.ratings)
        .where(Recipe.title.ilike(f"%{query}%") | Recipe.description.ilike(f"%{query}%"))
    )
    if category:
        stmt = stmt.where(Tag.name == category)
    if max_time:
        stmt = stmt.where((Recipe.prep_time + Recipe.cook_time) <= max_time)
    if min_rating:
        stmt = stmt.having(func.avg(Rating.score) >= min_rating)
    stmt = stmt.group_by(Recipe.id).order_by(func.avg(Rating.score).desc()).limit(20)
    return session.scalars(stmt).all()
Document your optimization strategy, including index design, query rewriting, and any schema changes.
Exercise 26: Multi-Database Architecture
Design a data architecture for a real-time analytics dashboard that needs:
- PostgreSQL for user accounts and configuration (transactional data)
- Redis for caching dashboard data and managing real-time updates
- A strategy for keeping cached data consistent with the source of truth
Implement the Python code that coordinates between PostgreSQL and Redis, including cache invalidation logic.
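The cache-aside pattern at the heart of this coordination is independent of Redis itself. This sketch uses a plain dict standing in for the Redis client and a callable standing in for the PostgreSQL query, so the invalidation logic can be seen in isolation (all names are ours):

```python
class DashboardCache:
    """Cache-aside: read through the cache, invalidate on write."""

    def __init__(self, load_from_db):
        self._store: dict[str, object] = {}   # stand-in for Redis
        self._load_from_db = load_from_db     # source of truth (PostgreSQL in the real system)

    def get_dashboard(self, user_id: int):
        key = f"dashboard:{user_id}"
        if key in self._store:                # cache hit: no database work
            return self._store[key]
        value = self._load_from_db(user_id)   # cache miss: fall back to the database
        self._store[key] = value              # populate for subsequent reads
        return value

    def invalidate(self, user_id: int):
        # Called after any write that changes this user's dashboard data.
        self._store.pop(f"dashboard:{user_id}", None)

calls = []
cache = DashboardCache(lambda uid: calls.append(uid) or {"widgets": 3})
cache.get_dashboard(1)   # miss: hits the "database"
cache.get_dashboard(1)   # hit: served from cache
cache.invalidate(1)
cache.get_dashboard(1)   # miss again after invalidation
print(len(calls))        # 2 database loads
```

Swapping the dict for redis-py's get/set/delete (plus a TTL) gives the production shape; the hard part of the exercise is deciding where invalidate calls belong.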
Exercise 27: Database Testing Strategy
Write a comprehensive test suite for the recipe application data layer from Exercise 23. Include:
- Unit tests for Pydantic validation schemas (valid and invalid inputs)
- Integration tests for repository methods using an in-memory SQLite database
- Tests for cascading deletes and referential integrity
- Tests for concurrent access scenarios
- Tests for migration rollback/rollforward
Use pytest fixtures for database setup and teardown. Ensure tests are isolated and can run in any order.
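The setup/teardown shape of such fixtures can be sketched with the standard library alone; in pytest the same body would live inside an @pytest.fixture that yields the connection. The schema here is a one-table placeholder, not the full recipe schema:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def fresh_db():
    """Yield an isolated in-memory database; teardown closes it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for FK enforcement
    conn.execute("CREATE TABLE recipes (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    try:
        yield conn
    finally:
        conn.close()  # each test gets its own database, so tests can run in any order

with fresh_db() as conn:
    conn.execute("INSERT INTO recipes (title) VALUES (?)", ("Pesto",))
    count = conn.execute("SELECT COUNT(*) FROM recipes").fetchone()[0]
print(count)  # 1
```

Because every invocation builds a brand-new in-memory database, no state leaks between tests, which is the isolation property the exercise asks you to guarantee.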
Tier 5 -- Synthesis and Transfer (Exercises 28-30)
Exercise 28: Domain-Driven Data Layer
Choose a domain you are personally familiar with (e.g., healthcare scheduling, inventory management, event planning, educational grading). Design a complete data layer from scratch:
- Write a domain description document listing all entities and relationships
- Create an entity-relationship diagram
- Implement SQLAlchemy models
- Design Pydantic schemas
- Build repository classes
- Create the initial Alembic migration
- Write at least 10 integration tests
Document the design decisions you made and the trade-offs you considered.
Exercise 29: AI-Assisted Schema Design Comparison
Use an AI coding assistant to generate a database schema for an e-commerce platform (products, categories, users, orders, payments, reviews, wishlists). Do this three times with different prompting strategies:
- Minimal prompt: "Design a database for an e-commerce platform"
- Detailed prompt: Provide a full list of entities, relationships, and constraints
- Iterative prompt: Start with core entities and add features one at a time
Compare the three resulting schemas. Which approach produced the best design? What errors did the AI make in each case? What did you learn about effective prompting for database design?
Exercise 30: Legacy Migration Project
You inherit a project that stores all data in JSON files (similar to the Chapter 6 task manager). The application has 10,000 users and 500,000 tasks stored across thousands of JSON files. Design and implement a complete migration:
- Analyze the existing JSON structure and identify all entities
- Design a normalized relational schema
- Write a migration script that reads JSON files and populates the database
- Handle data quality issues (missing fields, inconsistent formats, duplicates)
- Implement a verification step that confirms all data was migrated correctly
- Write a rollback plan in case the migration fails partway through
- Estimate the migration time and plan for zero-downtime deployment
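As a starting point for the migration script, here is a stdlib-only sketch that walks a directory of JSON files and loads them with basic duplicate and missing-field handling (the file layout, field names, and tasks schema are hypothetical; the real project will be far richer):

```python
import json
import sqlite3
import tempfile
from pathlib import Path

def migrate(json_dir: Path, conn: sqlite3.Connection) -> int:
    """Load task JSON files into SQLite, skipping duplicate ids; returns rows inserted."""
    conn.execute("CREATE TABLE IF NOT EXISTS tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    inserted = 0
    for path in sorted(json_dir.glob("*.json")):
        record = json.loads(path.read_text())
        if not isinstance(record.get("id"), int):
            continue  # data-quality issue: malformed or missing id
        cur = conn.execute(
            "INSERT OR IGNORE INTO tasks (id, title) VALUES (?, ?)",
            (record["id"], record.get("title", "(untitled)")),  # tolerate missing title
        )
        inserted += cur.rowcount  # rowcount is 0 when the id was a duplicate
    conn.commit()
    return inserted

# Tiny demo: two files, the second carrying a duplicate id that gets skipped.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "a.json").write_text(json.dumps({"id": 1, "title": "Buy milk"}))
    (demo / "b.json").write_text(json.dumps({"id": 1}))
    inserted = migrate(demo, sqlite3.connect(":memory:"))
print(inserted)  # 1
```

The returned count feeds naturally into the verification step: compare it (and per-table COUNT(*) queries) against the number of source records to confirm nothing was silently dropped.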