Chapter 18: Key Takeaways

Database Design and Data Modeling -- Summary Card

Relational databases organize data into tables with rows and columns, connected by primary and foreign keys. They provide ACID guarantees (Atomicity, Consistency, Isolation, Durability) that file-based storage cannot match.
Normalization through Third Normal Form (3NF) eliminates data redundancy and prevents update anomalies. The core rule: every non-key column should depend on the key, the whole key, and nothing but the key. Denormalize only when you have a documented, measured performance reason.
SQL fundamentals -- SELECT, INSERT, UPDATE, DELETE, JOINs, GROUP BY, and aggregations -- form a universal language for data access. Understanding SQL is essential even when using an ORM, because you need to verify generated queries and write complex ones the ORM cannot handle.
LEFT JOIN vs. INNER JOIN is a critical distinction. LEFT JOIN includes all rows from the left table even when there are no matches on the right; INNER JOIN excludes unmatched rows entirely. Always verify which type an AI generates.
SQLAlchemy 2.0 is the modern Python ORM standard. Use Mapped[] type annotations, mapped_column(), select() queries, and relationship() with back_populates. Avoid the legacy 1.x session.query() API in new code.
The N+1 query problem is the most common ORM performance issue. It occurs when loading N parent objects triggers N separate queries for their children. Fix it with eager loading: selectinload() or joinedload() in your query options.
Alembic migrations version-control your schema changes. Always review autogenerated migrations before running them. Use the three-step pattern (add nullable, populate, make non-nullable) when adding required columns to tables with existing data. Never edit an already-applied migration.
Index design determines query performance. Index columns used in WHERE, JOIN, ORDER BY, and GROUP BY clauses. Use composite indexes for multi-column filter patterns. Remember that indexes speed reads but slow writes.
Validate data at three layers: Pydantic schemas catch type and format errors at the API boundary, business logic checks domain rules requiring database access, and database constraints serve as the final safety net preventing invalid data regardless of application bugs.
NoSQL databases serve specific niches: Redis for caching, sessions, and rate limiting; MongoDB for genuinely flexible document schemas. For most applications with structured, relational data, PostgreSQL is the right choice.
The repository pattern encapsulates database access behind a clean interface, centralizing queries, enabling testing, and separating data access from business logic.
Start with SQLite for development, deploy with PostgreSQL in production. SQLAlchemy makes switching between them nearly painless, and this approach gives you zero-setup local development with production-grade reliability.
Schema design decisions are expensive to change later. Invest time reviewing AI-generated schemas before writing application code. Check for missing indexes, incorrect relationship cardinalities, inappropriate nullability, and missing constraints.
When prompting AI for database work, provide the full schema context, specify the database engine and SQLAlchemy version, and request specific patterns (CTEs for complex queries, eager loading for relationships, defense-in-depth validation). The more specific your prompt, the better the generated code.