Chapter 18: Quiz

Test your understanding of database design and data modeling concepts. Each question has one best answer unless otherwise noted.


Question 1

What does the "A" in ACID stand for, and what does it guarantee?

Show Answer **Atomicity.** It guarantees that a transaction either completes entirely or has no effect at all. If any part of a transaction fails, the entire transaction is rolled back, leaving the database unchanged. For example, in a bank transfer, both the debit and credit occur, or neither does.
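The bank-transfer example can be demonstrated concretely. This sketch uses the stdlib `sqlite3` module for illustration (the table, names, and CHECK limit are invented); the credit violates a constraint, so the already-applied debit is rolled back too:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance BETWEEN 0 AND 100))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

# Transfer 60: the debit succeeds, but the credit pushes bob over the
# CHECK limit, so the whole transaction is rolled back.
try:
    with conn:  # commits on success, rolls back on an exception
        conn.execute("UPDATE accounts SET balance = balance - 60 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 60 WHERE name = 'bob'")
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50} -- the debit was undone too
```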

Question 2

Which type of relationship requires a junction (association) table to implement?

A) One-to-one
B) One-to-many
C) Many-to-many
D) Self-referential

Show Answer **C) Many-to-many.** A junction table contains foreign keys referencing both related tables. For example, a `book_authors` table with `book_id` and `author_id` columns connects the `books` and `authors` tables when books can have multiple authors and authors can write multiple books.
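A minimal junction-table sketch, using the stdlib `sqlite3` module for illustration (the book and author data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    -- The junction table: one row per (book, author) pairing.
    CREATE TABLE book_authors (
        book_id INTEGER REFERENCES books(id),
        author_id INTEGER REFERENCES authors(id),
        PRIMARY KEY (book_id, author_id)
    );
    INSERT INTO books VALUES (1, 'SICP'), (2, 'The C Programming Language');
    INSERT INTO authors VALUES (1, 'Abelson'), (2, 'Sussman'), (3, 'Kernighan');
    INSERT INTO book_authors VALUES (1, 1), (1, 2), (2, 3);
""")

# Authors of book 1 -- the query walks through the junction table.
rows = conn.execute("""
    SELECT a.name FROM authors a
    JOIN book_authors ba ON ba.author_id = a.id
    WHERE ba.book_id = 1
    ORDER BY a.name
""").fetchall()
print([r[0] for r in rows])  # ['Abelson', 'Sussman']
```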

Question 3

A table stores customer orders with columns: order_id, customer_name, customer_email, product_name, product_price, quantity. The customer_email depends on customer_name, not on order_id. Which normal form does this table violate?

Show Answer **Third Normal Form (3NF).** The table has a transitive dependency: `order_id` -> `customer_name` -> `customer_email`. The `customer_email` depends on `customer_name`, which is a non-key column, rather than depending directly on the primary key. To fix this, extract customer information into a separate `customers` table.
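The decomposition can be sketched with the stdlib `sqlite3` module (column and table names follow the question; the sample row is invented). After the fix, changing a customer's email touches exactly one row instead of every order row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Customer attributes now live with the customer, keyed by its id.
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL
    );
    -- Orders reference the customer instead of duplicating name/email.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        product_name TEXT,
        product_price REAL,
        quantity INTEGER
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("UPDATE customers SET email = 'ada@new.example' WHERE id = 1")
email = conn.execute("SELECT email FROM customers WHERE id = 1").fetchone()[0]
print(email)  # ada@new.example
```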

Question 4

What is the difference between WHERE and HAVING in a SQL query?

Show Answer **WHERE** filters individual rows *before* grouping and aggregation. **HAVING** filters groups *after* aggregation. For example, `WHERE price > 10` excludes rows before any GROUP BY runs, while `HAVING COUNT(*) > 5` excludes groups that have five or fewer members after aggregation. You cannot use aggregate functions in WHERE; you must use HAVING for those.
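The ordering of the two filters can be seen with a tiny invented dataset, here run through the stdlib `sqlite3` module for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (category TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [
    ("books", 8), ("books", 12), ("books", 15),
    ("toys", 5), ("toys", 9),
])

# WHERE filters rows before grouping; HAVING filters the groups after.
rows = conn.execute("""
    SELECT category, COUNT(*) AS n
    FROM products
    WHERE price > 10          -- drops the 8, 5, and 9 rows first
    GROUP BY category
    HAVING COUNT(*) >= 2      -- then keeps only groups with 2+ surviving rows
""").fetchall()
print(rows)  # [('books', 2)] -- 'toys' has no rows left after WHERE
```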

Question 5

What will this SQL query return that an INNER JOIN version would not?

```sql
SELECT u.username, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.username;
```
Show Answer The LEFT JOIN version returns **all users, including those with zero orders** (their `order_count` would be 0). An INNER JOIN would only return users who have at least one order, completely omitting users with no orders from the result set.
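The difference is easy to reproduce with the stdlib `sqlite3` module (the users and orders are invented; `bob` has no orders):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1), (11, 1);  -- bob has no orders
""")

query = """
    SELECT u.username, COUNT(o.id) AS order_count
    FROM users u
    {join} JOIN orders o ON u.id = o.user_id
    GROUP BY u.username
    ORDER BY u.username
"""
left_rows = conn.execute(query.format(join="LEFT")).fetchall()
inner_rows = conn.execute(query.format(join="INNER")).fetchall()
print(left_rows)   # [('alice', 2), ('bob', 0)]
print(inner_rows)  # [('alice', 2)] -- bob disappears entirely
```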

Question 6

In SQLAlchemy 2.0, what is the difference between `Mapped[str]` and `Mapped[str | None]`?

Show Answer `Mapped[str]` defines a **non-nullable** string column -- the database will enforce that this column always has a value. `Mapped[str | None]` defines a **nullable** string column -- the column can contain NULL values. This is SQLAlchemy 2.0's way of using Python type hints to control column nullability.

Question 7

What is the N+1 query problem, and how do you solve it in SQLAlchemy?

Show Answer The **N+1 problem** occurs when code loads a list of N parent objects and then accesses a relationship on each one, triggering N additional queries (1 initial query + N relationship queries). For example, loading 100 users and then accessing `user.tasks` for each one generates 101 queries. The solution is **eager loading**. In SQLAlchemy, use `selectinload()` or `joinedload()` in the query options:
```python
from sqlalchemy import select
from sqlalchemy.orm import selectinload

stmt = select(User).options(selectinload(User.tasks))
```
This loads all tasks in a single additional query (or as part of the main query with `joinedload`), reducing 101 queries to 2 (or 1).

Question 8

What is the purpose of the `cascade="all, delete-orphan"` parameter in a SQLAlchemy relationship?

Show Answer It configures **cascade behavior** for the relationship. `"all"` is shorthand for the save-update, merge, refresh-expire, expunge, and delete cascades -- operations on the parent propagate to its children. `"delete-orphan"` means that child objects removed from the parent's collection are automatically deleted from the database, even without explicitly deleting the children. Together, they ensure that when a parent is deleted, all its children are deleted too, and children cannot exist without a parent.

Question 9

Why should you never edit an Alembic migration that has already been applied to a shared database?

Show Answer Alembic tracks applied migrations by their revision ID. If you edit an already-applied migration, the database records show it as already run, so the changes will not be applied. Other developers who already ran the original migration will have a different schema than what the edited migration describes, causing inconsistencies. Instead, **create a new migration** that makes the additional changes. The migration history should be an append-only log.

Question 10

What is the three-step pattern for adding a NOT NULL column to a table that already contains data?

Show Answer
1. **Add the column as nullable**: `op.add_column("table", sa.Column("col", sa.Type(), nullable=True))`
2. **Populate existing rows with a value**: `op.execute("UPDATE table SET col = default_value WHERE col IS NULL")`
3. **Alter the column to be non-nullable**: `op.alter_column("table", "col", nullable=False)`

This is necessary because adding a NOT NULL column directly would fail if the table has existing rows (they would have NULL for the new column, violating the constraint).

Question 11

Which SQLAlchemy query style is considered modern and recommended?

A) `session.query(User).filter_by(username="alice").first()`
B) `session.scalars(select(User).where(User.username == "alice")).first()`
C) `session.execute("SELECT * FROM users WHERE username = 'alice'")`
D) `User.query.filter_by(username="alice").first()`

Show Answer **B) `session.scalars(select(User).where(User.username == "alice")).first()`** This is the SQLAlchemy 2.0 style using the `select()` construct. Option A is the legacy 1.x style (still functional but not recommended for new code). Option C passes a raw SQL string, which loses ORM benefits, invites SQL injection once values are interpolated, and in SQLAlchemy 2.0 would need to be wrapped in `text()` anyway. Option D is Flask-SQLAlchemy's legacy query interface.

Question 12

What is the primary disadvantage of adding too many database indexes?

Show Answer **Write performance degradation.** Every INSERT, UPDATE, and DELETE operation must also update all affected indexes, making write operations slower. Indexes also consume additional disk space. The trade-off is faster reads versus slower writes. You should index columns that are frequently queried but avoid indexing every column, especially on tables with heavy write traffic.

Question 13

When should you choose MongoDB over PostgreSQL?

Show Answer MongoDB is appropriate when: (1) your data has a genuinely flexible or evolving schema where different documents need different fields, (2) your data is naturally hierarchical/nested and does not require complex cross-document relationships, or (3) you are building a rapid prototype and the schema is expected to change significantly. PostgreSQL is the better choice for most applications, especially those with complex relationships, strict data integrity requirements, or complex query needs (JOINs, aggregations, transactions).

Question 14

What is the purpose of the repository pattern in a data layer?

Show Answer The **repository pattern** encapsulates all database access logic behind a clean interface, separating data access concerns from business logic. Benefits include: (1) database queries are centralized in one place rather than scattered throughout the codebase, (2) business logic does not depend directly on SQLAlchemy, making it easier to test and potentially swap databases, (3) common query patterns are reusable across different parts of the application, and (4) queries can be optimized in one place without touching business logic code.

Question 15

What is the difference between `session.flush()` and `session.commit()`?

Show Answer **`flush()`** writes pending changes to the database but does not end the transaction. The changes are visible within the current session and can still be rolled back. It is useful for getting auto-generated IDs (like auto-increment primary keys) before committing. **`commit()`** calls flush and then ends the transaction, making all changes permanent and visible to other sessions/connections. Once committed, changes cannot be rolled back.

Question 16

What are the three layers of data validation described in the chapter, and what does each layer catch?

Show Answer
1. **Pydantic validation (application input)**: Catches type errors, format issues, and constraint violations before data enters the application. Examples: invalid email format, string too long, missing required field.
2. **Business logic validation (service layer)**: Catches domain-specific rule violations that require application context or database queries. Examples: username already taken, user does not have permission, referenced resource does not exist.
3. **Database constraints (schema level)**: The final safety net that prevents invalid data regardless of application bugs. Examples: UNIQUE violations, NOT NULL violations, CHECK constraint failures, foreign key violations.

Question 17

What does this SQLAlchemy code do, and why might it cause performance problems?

```python
users = session.scalars(select(User)).all()
for user in users:
    if len(user.orders) > 10:
        print(f"{user.username} is a frequent buyer")
```
Show Answer The code loads all users, then checks each user's order count. It causes an **N+1 query problem**: one query loads all users, then accessing `user.orders` for each user triggers a separate query to load that user's orders. With 1,000 users, this generates 1,001 queries. A better approach would be to use eager loading (`selectinload(User.orders)`) or, even better, a single query that counts orders per user using a GROUP BY with HAVING:
```python
from sqlalchemy import func, select

stmt = (
    select(User)
    .join(Order)
    .group_by(User.id)
    .having(func.count(Order.id) > 10)
)
```

Question 18

In the context of database design, what is "denormalization" and when is it justified?

Show Answer **Denormalization** is the intentional introduction of data redundancy into a database schema -- the opposite of normalization. It involves storing derived or duplicated data to avoid expensive JOINs or calculations at query time. It is justified when: (1) read performance is critical and the redundant data avoids expensive joins in frequently-executed queries, (2) you are building reporting or analytics tables that are read-heavy, (3) the update frequency of the denormalized data is low (reducing the risk of inconsistencies), and (4) you have mechanisms in place to keep redundant data synchronized. The key principle is that denormalization should be a **conscious, documented decision** driven by measured performance needs, not an accident or shortcut.

Question 19

What does the `server_default=func.now()` parameter do differently from `default=func.now()`?

Show Answer **`server_default=func.now()`** adds a DEFAULT clause to the column definition in the actual database schema. The database server generates the timestamp when a row is inserted, even if the insert comes from a raw SQL query or another application. **`default=func.now()`** is a Python-side default handled by SQLAlchemy. The timestamp is generated by Python when creating the object and is included in the INSERT statement. It only works when inserting through SQLAlchemy, not through raw SQL or other tools. `server_default` is generally preferred for timestamps because it works regardless of how data is inserted and uses the database server's clock (which is important for consistency in distributed systems).

Question 20

What Redis data structure would you use for each of these use cases?

  1. Caching the result of an expensive database query for 5 minutes
  2. Tracking the number of API requests per user per minute
  3. Maintaining a sorted leaderboard of player scores
  4. Storing a user's session data (multiple key-value pairs)
Show Answer
1. **String with TTL**: `SET cache_key value EX 300` -- Simple key-value with a 5-minute expiration.
2. **String with INCR and TTL**: `INCR rate:user:123:minute` with `EXPIRE rate:user:123:minute 60` -- An atomic counter that auto-expires.
3. **Sorted Set (ZSET)**: `ZADD leaderboard score player_name` -- Sorted sets maintain elements in order by score and support efficient range queries like "top 10 players."
4. **Hash**: `HSET session:abc123 field value` -- Hashes store multiple field-value pairs under a single key, perfect for structured session data.

Question 21

What is single table inheritance in SQLAlchemy, and what is its main trade-off?

Show Answer **Single table inheritance** stores all types in a single database table, using a **discriminator column** (e.g., `type`) to distinguish between subtypes. All subtype-specific columns exist on the same table, with NULL values for columns that do not apply to a given subtype. **Main trade-off:** It is simple and performant (no JOINs needed to load any subtype), but it wastes storage space because subtype-specific columns are NULL for rows of other subtypes. It works well when subtypes share most columns and have few unique columns. It works poorly when subtypes have many distinct columns, as the table becomes wide and sparse.

Question 22

A developer writes this migration. What will happen when they run it on a table with 1 million existing rows?

```python
def upgrade():
    op.add_column("orders", sa.Column("status", sa.String(20), nullable=False))
```
Show Answer **The migration will fail.** Adding a NOT NULL column without a default value to a table that already contains rows is invalid because the existing rows would have NULL for the new column, violating the NOT NULL constraint. The fix is to either: (1) add a `server_default` value: `sa.Column("status", sa.String(20), nullable=False, server_default="pending")`, or (2) use the three-step pattern: add as nullable, populate with UPDATE, then alter to non-nullable.
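The failure and the `server_default` fix can both be reproduced with the stdlib `sqlite3` module (SQLite is stricter than PostgreSQL here and rejects the statement even with a single existing row, which makes the point well; the table is a one-row stand-in for the million-row `orders` table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO orders VALUES (1)")

# Without a default there is nothing to put in the existing row's new column.
failed = False
try:
    conn.execute("ALTER TABLE orders ADD COLUMN status TEXT NOT NULL")
except sqlite3.OperationalError:
    failed = True  # SQLite refuses to add a NOT NULL column with no default

# Fix (1): a server-side default gives every existing row a value immediately.
conn.execute("ALTER TABLE orders ADD COLUMN status TEXT NOT NULL DEFAULT 'pending'")
status = conn.execute("SELECT status FROM orders").fetchone()[0]
print(failed, status)  # True pending
```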

Question 23

What is the difference between a composite index on (category_id, price) and two separate indexes on category_id and price?

Show Answer A **composite index** on `(category_id, price)` is a single B-tree sorted first by `category_id` and then by `price` within each category. It efficiently supports queries that filter by `category_id` alone, or filter by `category_id` AND sort/filter by `price`. However, it does NOT efficiently support queries that only filter by `price` (the second column in the index). **Two separate indexes** can each be used independently. The database might use one or both (via a bitmap index scan) depending on the query. However, they cannot efficiently support a combined filter+sort pattern like "all products in category X sorted by price" -- that requires the composite index. The rule of thumb is: a composite index `(A, B)` supports queries on `A` and queries on `(A, B)`, but not queries on `B` alone. This is called the **leftmost prefix** property.
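The leftmost-prefix property can be inspected directly via SQLite's `EXPLAIN QUERY PLAN` (stdlib `sqlite3`; the table and index names are invented, and the exact plan wording varies between SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER, price REAL)")
conn.execute("CREATE INDEX idx_cat_price ON products (category_id, price)")

def plan(sql: str) -> str:
    """Return SQLite's query plan as a single string."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# A leftmost-column filter (plus a sort on the second column) uses the index...
by_category = plan("SELECT * FROM products WHERE category_id = 1 ORDER BY price")
# ...but a filter on the second column alone falls back to a full table scan.
by_price = plan("SELECT * FROM products WHERE price < 10")
print(by_category)
print(by_price)
```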

Question 24

In the defense-in-depth validation model, why is it important to have database constraints even when Pydantic already validates input?

Show Answer Database constraints serve as the **last line of defense** against invalid data for several reasons:

1. **Application bugs**: Pydantic validation might be bypassed due to a code error, a missing validation decorator, or a code path that skips validation.
2. **Multiple entry points**: Data might enter the database through migration scripts, admin tools, background jobs, or raw SQL queries that do not go through Pydantic validation.
3. **Race conditions**: Two simultaneous requests might both pass the uniqueness check in application code, but one must fail at the database level (via a UNIQUE constraint).
4. **Defense in depth**: Security and data integrity should never rely on a single layer. If Pydantic has a bug or is misconfigured, the database constraints still protect data integrity.
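The race-condition point is the clearest to demonstrate: even when both requests passed the application-level check, the UNIQUE constraint catches the second insert. A sketch with the stdlib `sqlite3` module (the `register` helper and names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE)")

def register(username: str) -> str:
    # Suppose two concurrent requests both passed an application-level
    # "is this name taken?" check before either row was inserted.
    try:
        conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return "created"
    except sqlite3.IntegrityError:
        return "already taken"

print(register("alice"))  # created
print(register("alice"))  # already taken -- the constraint is the final check
```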

Question 25

You are designing a database for a project management tool. A task can have multiple assignees, and each assignee has a role on that task (e.g., "owner", "reviewer", "contributor"). How would you model this?

Show Answer This is a **many-to-many relationship with extra data**, requiring an **association object** pattern:
```python
from datetime import datetime

from sqlalchemy import ForeignKey, String, func
from sqlalchemy.orm import Mapped, mapped_column, relationship

# Base is the project's DeclarativeBase subclass.


class TaskAssignment(Base):
    __tablename__ = "task_assignments"

    task_id: Mapped[int] = mapped_column(
        ForeignKey("tasks.id"), primary_key=True
    )
    user_id: Mapped[int] = mapped_column(
        ForeignKey("users.id"), primary_key=True
    )
    role: Mapped[str] = mapped_column(String(20))  # "owner", "reviewer", etc.
    assigned_at: Mapped[datetime] = mapped_column(server_default=func.now())

    task: Mapped["Task"] = relationship(back_populates="assignments")
    user: Mapped["User"] = relationship(back_populates="task_assignments")
```
A simple junction table (`task_id`, `user_id`) would not work because it cannot store the `role` and `assigned_at` data associated with each assignment. The association object pattern creates a full SQLAlchemy model for the junction table, allowing additional columns and relationships.