Chapter 23 Key Takeaways: Database Basics

The Core Argument

Files and spreadsheets are appropriate for small, single-user datasets that do not require complex cross-referencing. When your data grows, when multiple people need to read and write it simultaneously, or when you need to ask questions that span multiple related datasets, you need a relational database.


Five Concepts That Underpin Everything

  1. Tables store data in rows and columns with strict type and constraint rules — not the flexible anything-goes structure of a spreadsheet.

  2. Primary keys uniquely identify each row — usually an auto-incrementing integer. No two rows can share a primary key, and it can never be null.

  3. Foreign keys create relationships — a column in one table holds the primary key of a row in another table. This is how you link customers to orders, orders to products, and products to categories.

  4. SQL is the universal query language — SELECT, INSERT, UPDATE, DELETE, JOIN. These operations work across SQLite, PostgreSQL, MySQL, and SQL Server with minor dialect differences.

  5. Transactions ensure consistency — a group of related operations either all succeed or all fail. A database that crashes mid-write will roll back to the last committed state.
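The five concepts above can be sketched in a few lines of sqlite3. The customers/orders schema here is illustrative, not taken from the chapter's code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign-key constraints

conn.execute("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,   -- unique, never null
        name TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key
        total       REAL
    )""")

# A transaction: both inserts succeed together or not at all.
with conn:
    conn.execute("INSERT INTO customers (name) VALUES (?)", ("Ada",))
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 (1, 19.99))

# Referencing a customer that does not exist is rejected outright.
try:
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 (999, 5.0))
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```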


SQLite and sqlite3: The Starting Point

  • sqlite3 is part of Python's standard library — no installation required.
  • The entire database is a single file on disk, or :memory: for testing.
  • Connect with sqlite3.connect("file.db").
  • Set conn.row_factory = sqlite3.Row to access columns by name.
  • Always use PRAGMA foreign_keys = ON to enable constraint enforcement.
  • Call conn.commit() to save changes; nothing is persisted until you commit.
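Put together, the bullets above amount to a setup routine like this (the users table and its data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # or "file.db" for a file on disk
conn.row_factory = sqlite3.Row              # access columns by name
conn.execute("PRAGMA foreign_keys = ON")    # constraint enforcement is off by default

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("Priya",))
conn.commit()                               # nothing persists until commit

row = conn.execute("SELECT id, name FROM users").fetchone()
print(row["name"])                          # by-name access via sqlite3.Row
```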

Parameterized Queries Are Non-Negotiable

Never concatenate user input into SQL strings:

# DANGEROUS
cursor.execute(f"SELECT * FROM users WHERE name = '{name}'")

# SAFE
cursor.execute("SELECT * FROM users WHERE name = ?", (name,))

Parameterized queries prevent SQL injection — one of the most common and severe security vulnerabilities in software.


The SQL You Need to Know

Operation            SQL Pattern
All rows             SELECT * FROM table
Filtered rows        SELECT ... FROM table WHERE condition
Sorted results       ORDER BY column DESC
Limited results      LIMIT n
Aggregate totals     SUM(), COUNT(), AVG(), MIN(), MAX()
Grouped aggregates   GROUP BY column
Filter groups        HAVING aggregate_condition
Related tables       INNER JOIN table2 ON t1.fk = t2.pk
All left rows        LEFT JOIN table2 ON t1.fk = t2.pk
Nested query         WHERE col IN (SELECT ...)
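A few of these patterns in action against a throwaway in-memory database; the products schema and data are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, price REAL);
    INSERT INTO products (category, price) VALUES
        ('Hardware', 19.99), ('Hardware', 5.00), ('Software', 49.00);
""")

# Grouped aggregate with a HAVING filter on the groups
rows = conn.execute("""
    SELECT category, COUNT(*) AS n, AVG(price) AS avg_price
    FROM products
    GROUP BY category
    HAVING COUNT(*) > 1
""").fetchall()
print(rows)   # only 'Hardware' has more than one row
```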

SQLAlchemy ORM: Python Classes as Tables

  • Define models as Python classes inheriting from Base (DeclarativeBase).
  • Base.metadata.create_all(engine) creates all tables from model definitions.
  • Work with Python objects: session.add(obj), session.commit().
  • Query with: session.query(Model).filter(Model.column == value).all().
  • Use session.get(Model, id) to fetch by primary key.
  • Relationships (one-to-many, many-to-many) are navigable as Python attributes.

When to use the ORM: application logic — creating, updating, and deleting individual records.

When to use raw SQL: complex analytical queries, reporting, bulk operations, performance-critical code.


Switching from SQLite to PostgreSQL

Change one line:

from sqlalchemy import create_engine

# SQLite
engine = create_engine("sqlite:///mydb.db")

# PostgreSQL
engine = create_engine("postgresql://user:pass@host:5432/dbname")

All model definitions, queries, and session code remain unchanged. This is the core value of SQLAlchemy — one codebase, multiple database backends.


Credentials: Never in Code

# WRONG — password is visible in version control
engine = create_engine("postgresql://admin:mysecret@db.example.com/prod")

# RIGHT — loaded from .env file (which is in .gitignore)
import os
from dotenv import load_dotenv

load_dotenv()
engine = create_engine(os.getenv("DATABASE_URL"))

Store connection strings and passwords in .env files. Add .env to .gitignore. Use environment variables in production.


pandas + SQL: The Analytical Sweet Spot

import pandas as pd

# conn is an open sqlite3 connection
df = pd.read_sql("SELECT * FROM products WHERE category = ?",
                 conn, params=("Hardware",))

pd.read_sql() runs a query and returns a DataFrame. Use this pattern to:

  • Write complex analytical SQL (which the database executes efficiently)
  • Hand the results to pandas for formatting, visualization, or export
  • Avoid loading entire tables into Python memory


What Data Integrity Actually Means

A database can protect your data in ways a spreadsheet cannot:

  • NOT NULL — this column must have a value
  • UNIQUE — no two rows can have the same value in this column
  • CHECK (price >= 0) — the database rejects negative prices
  • FOREIGN KEY — you cannot reference a customer that does not exist
  • Transactions — partial writes are automatically rolled back on failure

These constraints catch errors at the moment of insertion, before they propagate through your data.
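A short sketch of a constraint doing exactly that; the products table here is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        sku   TEXT UNIQUE NOT NULL,
        price REAL NOT NULL CHECK (price >= 0)
    )""")

conn.execute("INSERT INTO products (sku, price) VALUES (?, ?)", ("A-1", 9.99))

# A negative price never makes it into the table.
try:
    conn.execute("INSERT INTO products (sku, price) VALUES (?, ?)", ("A-2", -3.0))
except sqlite3.IntegrityError as e:
    print("rejected at insertion:", e)
```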


The Business Case in One Paragraph

Priya used to spend 40 minutes every morning preparing a report from a CSV file that was always 30 minutes old. After getting direct database read access, she spends 8 minutes and her data is live to the second. She also found three products with negative inventory — a data quality problem the CSV exports had been hiding. One SQL query surfaced a $3,400 discrepancy in reported inventory value. The database was already there. The skill to query it directly was the only thing missing.