15 min read

> "A spreadsheet is a notebook. A database is a filing system. One is for thinking; the other is for running a business."

In This Chapter

What You Will Learn
23.1 Why Databases Instead of Files?
23.2 Relational Database Concepts
23.3 SQLite with Python's sqlite3 Module
23.4 Core SQL: The Queries You Will Use Every Day
23.5 SQLAlchemy ORM Introduction
23.6 Connecting to PostgreSQL
23.7 Reading SQL Results into pandas
23.8 Acme Corp's Inventory Database: The Full Schema
23.9 Chapter Summary
Key Terms

Chapter 23: Database Basics — SQL and Python with SQLite and PostgreSQL

"A spreadsheet is a notebook. A database is a filing system. One is for thinking; the other is for running a business." — Marcus Webb, Acme Corp IT Manager

What You Will Learn

By the end of this chapter, you will be able to:

Explain why databases outperform files and spreadsheets at scale
Understand the core concepts of relational databases: tables, rows, columns, primary keys, and foreign keys
Create and query a SQLite database using Python's built-in sqlite3 module
Write the 10–15 SQL queries that handle 90% of real business data needs
Use parameterized queries to prevent SQL injection attacks
Introduce SQLAlchemy's ORM to interact with databases using Python objects
Connect to PostgreSQL using the same SQLAlchemy code as SQLite
Read SQL query results directly into pandas DataFrames

23.1 Why Databases Instead of Files?

Priya Okonkwo has been running Acme Corp's operations reporting for two years using Excel. Every morning, Marcus Webb emails her a fresh CSV export of the inventory system. She pastes it into a master workbook, runs a few VLOOKUP formulas, and emails the summary to Sandra Chen before 9 a.m.

This works. Until it does not.

Last Tuesday, Marcus sent the export at 7:58 a.m. instead of 7:30. Priya's summary went to Sandra with yesterday's numbers. Two days before that, Priya accidentally saved over the master workbook with an intermediate draft. The week before that, the CSV was 180,000 rows — too large for Excel to open on her laptop.

These are not edge cases. They are the natural ceiling of file-based data management. When your data grows beyond a few thousand rows, when more than one person needs to read or write it at the same time, or when you need to ask complex questions that span multiple datasets, you have outgrown files. You need a database.

23.1.1 What a Database Actually Does

At the most basic level, a relational database is a program that:

Stores data in structured tables — rows and columns, like a spreadsheet, but with strict rules about what goes in each column
Answers questions efficiently — using a query language (SQL) designed for exactly this purpose
Manages concurrent access — multiple users can read and write simultaneously without corrupting data
Enforces integrity — the database can reject invalid data before it gets stored
Supports transactions — a group of operations either all succeed or all fail together

23.1.2 Files vs. Databases: A Side-by-Side Comparison

Concern	CSV / Excel File	Relational Database
Concurrent users	One writer at a time (file locking)	Many writers simultaneously
Query capability	Manual filters, VLOOKUP, pivot tables	SQL: join millions of rows in milliseconds
Data integrity	No enforcement — any value in any cell	Constraints, types, foreign keys enforced
File size limit	~1M rows in Excel; slower as it grows	Billions of rows; handles gracefully
Backup and recovery	Manual; easy to lose work	Transaction logs; point-in-time recovery
Access control	File-level permissions only	Row-level security, user roles
Version history	Manual (save copies)	Audit tables, change data capture

23.1.3 Which Database Should You Use?

You will encounter several database systems in business settings:

SQLite — A file-based database engine built into Python's standard library. No server required. Perfect for development, testing, desktop applications, and datasets under a few hundred gigabytes. This is where we start.
PostgreSQL — A full-featured, production-grade open-source database. Handles millions of transactions per day. Free to use. The professional standard for most businesses that do not have a vendor relationship with Oracle or Microsoft.
MySQL / MariaDB — Another popular open-source database, slightly simpler than PostgreSQL but less feature-rich. Very common in web applications.
Microsoft SQL Server — Common in enterprise Windows environments. Excellent tooling with the Microsoft ecosystem.
Oracle Database — Used in very large enterprises. Expensive. Powerful. Often inherited rather than chosen.

The good news: once you learn SQL and SQLAlchemy, switching between these databases requires changing one line of configuration. The concepts and most of the syntax transfer directly.

23.2 Relational Database Concepts

Before writing any Python, you need the vocabulary. These five concepts underpin everything else.

23.2.1 Tables, Rows, and Columns

A table is a named collection of data organized into rows and columns. Think of it as a single sheet within a spreadsheet, but with stricter rules.

A column (also called a field or attribute) represents one piece of information about each item — for example, product_name, price, quantity_in_stock.
A row (also called a record or tuple) represents one item — one product, one customer, one order.
Every column has a data type that restricts what values it can hold: INTEGER, TEXT, REAL (decimal numbers), DATE, BOOLEAN, etc.

23.2.2 Primary Keys

A primary key is a column (or combination of columns) that uniquely identifies each row. No two rows can have the same primary key value, and the primary key can never be null.

In most business databases, the primary key is an auto-incrementing integer: id = 1, 2, 3, 4.... This is called a surrogate key — it has no business meaning, it simply exists to identify rows.

Sometimes a natural key makes sense — a column with an inherently unique value like a US Social Security Number, an ISBN for books, or a stock ticker symbol. Natural keys are controversial because "inherently unique" assumptions can break (two people can have the same name; ISBNs get reassigned; companies change tickers). Most modern databases use surrogate keys and add a unique constraint on the natural identifier separately.

23.2.3 Foreign Keys

A foreign key is a column in one table that holds the primary key of a row in another table. This is how relationships between tables are represented.

In Acme Corp's inventory database:

The orders table has a customer_id column
That customer_id references the id column in the customers table
This means: "find the customer for this order by looking up their ID in the customers table"

The database can enforce that you cannot create an order for a customer that does not exist. This is called referential integrity.

23.2.4 Normalization (Brief)

Normalization is the practice of organizing data to reduce redundancy. The core idea: store each fact in exactly one place.

If you store the customer's name and address on every order, you have redundancy. If the customer moves, you need to update every order row. If you miss one, your data is inconsistent.

The normalized approach: store the customer's information once in a customers table, then reference it with a foreign key in orders. When the customer moves, you update one row.

You do not need to memorize normalization rules (1NF, 2NF, 3NF) to work effectively with databases. The intuition — "store each fact once" — is enough for most business applications.

23.2.5 Schemas

A schema is the complete definition of a database's structure: all its tables, columns, data types, constraints, and relationships. When someone asks "what's your schema?", they want to know how the data is organized, not what data is in it.

23.3 SQLite with Python's sqlite3 Module

SQLite is the best possible starting point. It requires no installation, no server, no configuration. The database is a single file on your disk. Python ships with sqlite3 in its standard library — you can use it today without installing anything.

23.3.1 Connecting to a Database

import sqlite3

# Connect to a file-based database (creates the file if it does not exist)
connection = sqlite3.connect("acme_inventory.db")

# Connect to an in-memory database (useful for testing; disappears when the
# program exits)
connection = sqlite3.connect(":memory:")

The connect() call returns a connection object. Think of it as a phone line to the database. You need this line open to send queries.

23.3.2 Cursors

To actually execute SQL statements, you need a cursor — an object that sends SQL to the database and retrieves results:

cursor = connection.cursor()
cursor.execute("SELECT 1")

In practice, most code follows this pattern:

connection = sqlite3.connect("acme_inventory.db")
cursor = connection.cursor()
# ... execute SQL ...
connection.close()

23.3.3 Creating Tables

import sqlite3

connection = sqlite3.connect("acme_inventory.db")
cursor = connection.cursor()

cursor.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        sku         TEXT    NOT NULL UNIQUE,
        name        TEXT    NOT NULL,
        category    TEXT    NOT NULL,
        unit_price  REAL    NOT NULL CHECK (unit_price >= 0),
        quantity    INTEGER NOT NULL DEFAULT 0
    )
""")

connection.commit()
connection.close()

Key SQL keywords in CREATE TABLE:

PRIMARY KEY AUTOINCREMENT — SQLite assigns 1, 2, 3... automatically
NOT NULL — the column must have a value; NULL is not allowed
UNIQUE — no two rows can have the same value in this column
CHECK (unit_price >= 0) — enforces a condition; the database rejects rows that fail it
DEFAULT 0 — if you do not supply a value, the database uses this default

23.3.4 Inserting Data

# Single row insert — NEVER do this with user-supplied data (SQL injection risk)
cursor.execute("INSERT INTO products (sku, name, category, unit_price, quantity) "
               "VALUES ('SKU-001', 'Widget A', 'Hardware', 14.99, 100)")

# Safe parameterized insert — always use this form
cursor.execute(
    "INSERT INTO products (sku, name, category, unit_price, quantity) "
    "VALUES (?, ?, ?, ?, ?)",
    ("SKU-002", "Widget B", "Hardware", 24.99, 50)
)

# Insert multiple rows at once
products = [
    ("SKU-003", "Gadget X",  "Electronics", 99.99,  25),
    ("SKU-004", "Gadget Y",  "Electronics", 149.99, 10),
    ("SKU-005", "Connector", "Hardware",    4.99,  200),
]
cursor.executemany(
    "INSERT INTO products (sku, name, category, unit_price, quantity) "
    "VALUES (?, ?, ?, ?, ?)",
    products
)

connection.commit()

The ? placeholders are called parameterized queries or prepared statements. They are essential for security and covered in depth in the next section.

23.3.5 Parameterized Queries and SQL Injection

SQL injection is one of the most common and damaging security vulnerabilities in software. It occurs when user-supplied data is concatenated directly into a SQL string, allowing an attacker to change the meaning of the query.

Consider this code:

# DANGEROUS — never do this
sku = input("Enter SKU to look up: ")
cursor.execute(f"SELECT * FROM products WHERE sku = '{sku}'")

If the user enters: '; DROP TABLE products; --

The executed SQL becomes:

SELECT * FROM products WHERE sku = ''; DROP TABLE products; --'

The products table is deleted. The attacker won. This is a real attack pattern responsible for countless data breaches.

The fix is simple: always use parameterized queries:

# SAFE — the database treats the user input as pure data, not SQL
sku = input("Enter SKU to look up: ")
cursor.execute("SELECT * FROM products WHERE sku = ?", (sku,))

The ? tells SQLite: "there is a value coming — treat it as data, no matter what it contains." The database driver handles escaping, quoting, and all the security details.

Note the (sku,) with a trailing comma — this creates a tuple. Python's sqlite3 requires parameters to be a sequence (tuple or list), even for a single value.

For named parameters (clearer with many values):

cursor.execute(
    "SELECT * FROM products WHERE category = :cat AND unit_price < :max_price",
    {"cat": "Hardware", "max_price": 20.00}
)

Named parameters using :name syntax improve readability when you have more than two or three parameters.

23.3.6 Selecting Data

# Execute a SELECT query
cursor.execute("SELECT id, sku, name, unit_price FROM products WHERE category = ?",
               ("Hardware",))

# Retrieve one row
row = cursor.fetchone()
print(row)  # (1, 'SKU-001', 'Widget A', 14.99)

# Retrieve all remaining rows
rows = cursor.fetchall()
for row in rows:
    print(row)

# Retrieve a batch (useful for large result sets)
batch = cursor.fetchmany(10)  # Returns up to 10 rows

By default, each row is returned as a plain tuple. If you want rows to behave like dictionaries (access by column name), set the row factory:

connection.row_factory = sqlite3.Row

cursor.execute("SELECT id, sku, name FROM products")
row = cursor.fetchone()
print(row["name"])   # Access by column name
print(row[1])        # Still works — access by index

sqlite3.Row objects support both index and name access, which makes code more readable and less fragile (your code does not break if column order changes in the SELECT).

23.3.7 Updating and Deleting Data

# Update a single row
cursor.execute(
    "UPDATE products SET quantity = ? WHERE sku = ?",
    (75, "SKU-001")
)
print(f"Rows updated: {cursor.rowcount}")

# Update multiple rows matching a condition
cursor.execute(
    "UPDATE products SET unit_price = unit_price * 1.05 WHERE category = ?",
    ("Hardware",)
)

# Delete a row
cursor.execute("DELETE FROM products WHERE sku = ?", ("SKU-005",))

# Delete multiple rows
cursor.execute("DELETE FROM products WHERE quantity = 0")

connection.commit()

cursor.rowcount tells you how many rows were affected by the last UPDATE or DELETE. This is useful for confirming that your operation did what you expected.

23.3.8 Context Managers and Transactions

SQLite (and relational databases generally) group operations into transactions. A transaction is a unit of work that either completes entirely or not at all. If your Python program crashes mid-transaction, the database rolls back to the last committed state — no half-written data.

Python's sqlite3 module supports context managers that handle commit() and rollback() automatically:

import sqlite3

with sqlite3.connect("acme_inventory.db") as conn:
    conn.execute(
        "UPDATE products SET quantity = quantity - ? WHERE sku = ?",
        (10, "SKU-001")
    )
    conn.execute(
        "INSERT INTO order_lines (product_id, quantity) VALUES (?, ?)",
        (1, 10)
    )
    # If we reach here without exception, commit happens automatically.
    # If an exception occurs, rollback happens automatically.

The with block commits on success and rolls back on any exception. This is the pattern you should use for any operation that modifies data.

For complex operations involving multiple steps, be explicit about transactions:

conn = sqlite3.connect("acme_inventory.db")
try:
    conn.execute("BEGIN")
    conn.execute("UPDATE products SET quantity = quantity - 10 WHERE id = 1")
    # ... more operations ...
    conn.commit()
except Exception as e:
    conn.rollback()
    print(f"Transaction failed, rolled back: {e}")
finally:
    conn.close()

23.4 Core SQL: The Queries You Will Use Every Day

SQL (Structured Query Language) is the universal language for relational databases. You do not need to know all of SQL to be productive — the following queries cover the overwhelming majority of business data needs.

23.4.1 SELECT with WHERE, ORDER BY, and LIMIT

-- All products
SELECT * FROM products;

-- Specific columns only
SELECT sku, name, unit_price FROM products;

-- Filter with WHERE
SELECT sku, name, quantity
FROM products
WHERE quantity < 20;

-- Multiple conditions
SELECT sku, name, unit_price
FROM products
WHERE category = 'Hardware'
  AND unit_price BETWEEN 5.00 AND 50.00;

-- Pattern matching with LIKE
SELECT sku, name
FROM products
WHERE name LIKE 'Widget%';   -- names starting with "Widget"

-- NULL checks
SELECT * FROM products WHERE discontinued_date IS NULL;
SELECT * FROM products WHERE discontinued_date IS NOT NULL;

-- Sort results
SELECT sku, name, unit_price
FROM products
ORDER BY unit_price DESC;    -- Most expensive first

-- Limit number of results (top 5 most expensive)
SELECT sku, name, unit_price
FROM products
ORDER BY unit_price DESC
LIMIT 5;

23.4.2 Aggregate Functions and GROUP BY

Aggregates compute a single value from a set of rows:

-- Count all products
SELECT COUNT(*) FROM products;

-- Count non-null values in a specific column
SELECT COUNT(discontinued_date) FROM products;

-- Sum, average, min, max
SELECT
    SUM(quantity)         AS total_units,
    AVG(unit_price)       AS avg_price,
    MIN(unit_price)       AS cheapest,
    MAX(unit_price)       AS most_expensive
FROM products;

-- Aggregates by group
SELECT
    category,
    COUNT(*)              AS product_count,
    SUM(quantity)         AS total_stock,
    AVG(unit_price)       AS avg_price,
    SUM(unit_price * quantity) AS total_inventory_value
FROM products
GROUP BY category
ORDER BY total_inventory_value DESC;

-- Filter groups with HAVING (like WHERE, but applied after GROUP BY)
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) >= 3;   -- Only categories with 3 or more products

The difference between WHERE and HAVING: - WHERE filters rows before grouping - HAVING filters groups after grouping

23.4.3 JOINs

A JOIN combines rows from two tables based on a related column. This is how you answer questions that span multiple tables.

Acme's database has three tables: - customers — customer information - orders — one row per order, with a customer_id foreign key - order_lines — one row per product per order, with order_id and product_id foreign keys

-- INNER JOIN: only rows with a match in both tables
SELECT
    orders.id          AS order_id,
    customers.name     AS customer_name,
    orders.order_date,
    orders.total_amount
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id
ORDER BY orders.order_date DESC;

-- Using table aliases for cleaner SQL
SELECT
    o.id           AS order_id,
    c.name         AS customer_name,
    o.order_date,
    o.total_amount
FROM orders AS o
INNER JOIN customers AS c ON o.customer_id = c.id
ORDER BY o.order_date DESC;

-- Three-table join: orders with customer names and product names
SELECT
    o.id           AS order_id,
    c.name         AS customer_name,
    p.name         AS product_name,
    ol.quantity,
    ol.quantity * p.unit_price AS line_total
FROM orders          AS o
INNER JOIN customers AS c  ON o.customer_id = c.id
INNER JOIN order_lines AS ol ON ol.order_id = o.id
INNER JOIN products  AS p  ON ol.product_id = p.id;

-- LEFT JOIN: all rows from the left table; NULL for unmatched right rows
-- Find customers who have never placed an order
SELECT
    c.id,
    c.name,
    COUNT(o.id) AS order_count
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
HAVING COUNT(o.id) = 0;

INNER JOIN returns only rows where the join condition matches in both tables. If a customer has no orders, they do not appear in an INNER JOIN with orders.

LEFT JOIN (also called LEFT OUTER JOIN) returns all rows from the left table, filling in NULL for columns from the right table when there is no match. This is useful for finding "missing" relationships.

23.4.4 Subqueries

A subquery is a SELECT statement nested inside another SQL statement:

-- Products priced above average
SELECT sku, name, unit_price
FROM products
WHERE unit_price > (SELECT AVG(unit_price) FROM products);

-- Customers who have placed at least one order over $500
SELECT name, email
FROM customers
WHERE id IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE total_amount > 500
);

Subqueries are powerful but can be slow on large tables. For complex analytical queries, a JOIN or a Common Table Expression (CTE) often performs better. That said, subqueries are perfectly fine for most business-scale datasets.

23.4.5 Inserting and Updating with SELECT

-- Insert into a table using data from another table (useful for migrations)
INSERT INTO archived_products (sku, name, unit_price)
SELECT sku, name, unit_price
FROM products
WHERE discontinued_date < '2024-01-01';

-- Update using a subquery
UPDATE products
SET quantity = 0
WHERE id IN (
    SELECT product_id
    FROM discontinued_products
);

23.5 SQLAlchemy ORM Introduction

sqlite3 is excellent for direct SQL work. But as your application grows, managing raw SQL strings throughout your codebase becomes unwieldy. SQLAlchemy is the most widely used Python database toolkit, and it offers two modes:

Core — A SQL expression language that builds SQL programmatically with Python objects (closer to raw SQL)
ORM (Object-Relational Mapper) — Maps Python classes to database tables; you work with Python objects instead of SQL strings

This section focuses on the ORM, which is what most Python developers encounter first.

Install SQLAlchemy:

pip install sqlalchemy

For PostgreSQL, also install the driver:

pip install psycopg2-binary

23.5.1 Engine and Session

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# SQLite — no server required
engine = create_engine("sqlite:///acme_inventory.db", echo=False)

# PostgreSQL — requires a running PostgreSQL server
# engine = create_engine(
#     "postgresql://username:password@localhost:5432/acme_db"
# )

# Create a Session class bound to this engine
Session = sessionmaker(bind=engine)
session = Session()

The engine manages connections to the database. The session is the working space for your ORM operations — it tracks which objects you have loaded, modified, or created, and it coordinates commits and rollbacks.

23.5.2 Defining Models

A SQLAlchemy model is a Python class that maps to a database table:

from sqlalchemy import Column, Integer, String, Float, ForeignKey, DateTime
from sqlalchemy.orm import DeclarativeBase, relationship
from datetime import datetime


class Base(DeclarativeBase):
    pass


class Product(Base):
    __tablename__ = "products"

    id          = Column(Integer, primary_key=True, autoincrement=True)
    sku         = Column(String(50), nullable=False, unique=True)
    name        = Column(String(200), nullable=False)
    category    = Column(String(100), nullable=False)
    unit_price  = Column(Float, nullable=False)
    quantity    = Column(Integer, nullable=False, default=0)

    # Relationship: one product appears in many order lines
    order_lines = relationship("OrderLine", back_populates="product")

    def __repr__(self):
        return f"<Product(sku={self.sku!r}, name={self.name!r})>"


class Customer(Base):
    __tablename__ = "customers"

    id      = Column(Integer, primary_key=True, autoincrement=True)
    name    = Column(String(200), nullable=False)
    email   = Column(String(200), nullable=False, unique=True)
    city    = Column(String(100))
    state   = Column(String(2))

    orders  = relationship("Order", back_populates="customer")

    def __repr__(self):
        return f"<Customer(name={self.name!r}, email={self.email!r})>"


class Order(Base):
    __tablename__ = "orders"

    id           = Column(Integer, primary_key=True, autoincrement=True)
    customer_id  = Column(Integer, ForeignKey("customers.id"), nullable=False)
    order_date   = Column(DateTime, default=datetime.utcnow)
    total_amount = Column(Float, nullable=False, default=0.0)
    status       = Column(String(50), default="pending")

    customer     = relationship("Customer", back_populates="orders")
    order_lines  = relationship("OrderLine", back_populates="order")

    def __repr__(self):
        return f"<Order(id={self.id}, customer_id={self.customer_id})>"


class OrderLine(Base):
    __tablename__ = "order_lines"

    id          = Column(Integer, primary_key=True, autoincrement=True)
    order_id    = Column(Integer, ForeignKey("orders.id"), nullable=False)
    product_id  = Column(Integer, ForeignKey("products.id"), nullable=False)
    quantity    = Column(Integer, nullable=False)
    unit_price  = Column(Float, nullable=False)  # Price at time of order

    order       = relationship("Order", back_populates="order_lines")
    product     = relationship("Product", back_populates="order_lines")

23.5.3 Creating Tables from Models

# Create all tables defined in models that inherit from Base
Base.metadata.create_all(engine)

This single line inspects all your model classes and creates the corresponding tables if they do not already exist.

23.5.4 Inserting with the ORM

# Create Python objects — they are not in the database yet
widget_a = Product(
    sku="SKU-001",
    name="Widget A",
    category="Hardware",
    unit_price=14.99,
    quantity=100
)

acme_co = Customer(
    name="Acme Co",
    email="purchasing@acme.example.com",
    city="Chicago",
    state="IL"
)

# Add to the session (stages for insert)
session.add(widget_a)
session.add(acme_co)

# Add multiple at once
more_products = [
    Product(sku="SKU-002", name="Widget B", category="Hardware",
            unit_price=24.99, quantity=50),
    Product(sku="SKU-003", name="Gadget X", category="Electronics",
            unit_price=99.99, quantity=25),
]
session.add_all(more_products)

# Commit — writes everything to the database
session.commit()

# After commit, SQLAlchemy fills in auto-generated IDs
print(widget_a.id)  # e.g., 1

23.5.5 Querying with the ORM

# Get all products
all_products = session.query(Product).all()

# Filter
hardware = session.query(Product).filter(Product.category == "Hardware").all()

# Multiple filters
affordable_hardware = (
    session.query(Product)
    .filter(Product.category == "Hardware")
    .filter(Product.unit_price < 20.0)
    .all()
)

# Get one record by primary key
product = session.get(Product, 1)  # Get product with id=1

# Get one record by other criteria
product = session.query(Product).filter(Product.sku == "SKU-001").first()

# Order and limit
top_5_expensive = (
    session.query(Product)
    .order_by(Product.unit_price.desc())
    .limit(5)
    .all()
)

# Count
hardware_count = session.query(Product).filter(
    Product.category == "Hardware"
).count()

23.5.6 Updating with the ORM

# Load the object, modify it, commit
product = session.query(Product).filter(Product.sku == "SKU-001").first()
product.quantity = 150
product.unit_price = 16.99
session.commit()

# Bulk update (more efficient for many rows)
from sqlalchemy import update

session.execute(
    update(Product)
    .where(Product.category == "Hardware")
    .values(unit_price=Product.unit_price * 1.05)
)
session.commit()

23.5.7 ORM vs. Raw SQL — When to Use Each

This is a common question, and the honest answer is: both have their place.

Use the ORM when: - Building an application where you create, update, and delete individual records - You want Python objects with methods and properties - You need to switch between database backends (SQLite for dev, PostgreSQL for production) - Your team is more comfortable with Python than SQL

Use raw SQL (via SQLAlchemy Core or sqlite3) when: - Running complex analytical queries (many joins, window functions, CTEs) - Performance is critical — the ORM's generated SQL can be suboptimal for complex queries - You want full control over the SQL - You are reading data into pandas for analysis

A common pattern in production code: use the ORM for CRUD operations (Create, Read, Update, Delete) and raw SQL or SQLAlchemy Core for reporting queries.

23.6 Connecting to PostgreSQL

One of SQLAlchemy's greatest strengths is that switching database backends requires changing only the connection string. Everything else — model definitions, queries, session management — stays the same.

# SQLite (development)
engine = create_engine("sqlite:///acme_inventory.db")

# PostgreSQL (production)
engine = create_engine(
    "postgresql://db_user:db_password@db_host:5432/acme_db"
)

# PostgreSQL with environment variables (recommended — never hardcode credentials)
import os
from dotenv import load_dotenv

load_dotenv()

DB_URL = (
    f"postgresql://{os.getenv('DB_USER')}:{os.getenv('DB_PASSWORD')}"
    f"@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT', '5432')}"
    f"/{os.getenv('DB_NAME')}"
)
engine = create_engine(DB_URL)

The .env file (never committed to version control):

DB_USER=acme_user
DB_PASSWORD=supersecretpassword
DB_HOST=db.acme.example.com
DB_PORT=5432
DB_NAME=acme_production

This pattern — SQLite locally, PostgreSQL in production — is the standard workflow for Python web applications and data pipelines.

23.6.1 PostgreSQL-Specific Features

PostgreSQL supports several features that SQLite does not:

Full-text search — tsvector, tsquery for searching document content
JSON columns — store and query JSON data natively
Window functions — ROW_NUMBER(), RANK(), LAG(), LEAD() for analytical queries
ENUM types — restrict a column to a fixed set of values at the database level
Concurrent writes — PostgreSQL handles multiple simultaneous writers; SQLite is limited

For most business applications that stay within SQL standard syntax, your code will work on both without modification.

23.7 Reading SQL Results into pandas

The bridge between the database world and the data analysis world is pd.read_sql(). This function runs a SQL query and loads the results directly into a DataFrame.

import pandas as pd
import sqlite3

conn = sqlite3.connect("acme_inventory.db")

# Simple query to DataFrame
df = pd.read_sql("SELECT * FROM products", conn)
print(df.head())
print(df.dtypes)

# Parameterized query (use format strings carefully — only for trusted values;
# use sqlalchemy for user-supplied parameters)
category = "Hardware"
df_hardware = pd.read_sql(
    "SELECT sku, name, unit_price, quantity FROM products WHERE category = ?",
    conn,
    params=(category,)
)

# Complex analytical query
df_summary = pd.read_sql("""
    SELECT
        category,
        COUNT(*)                        AS product_count,
        SUM(quantity)                   AS total_units,
        ROUND(AVG(unit_price), 2)       AS avg_price,
        ROUND(SUM(unit_price * quantity), 2) AS inventory_value
    FROM products
    GROUP BY category
    ORDER BY inventory_value DESC
""", conn)

conn.close()

# Now use pandas normally
print(df_summary.to_string(index=False))

With SQLAlchemy:

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///acme_inventory.db")

# Using SQLAlchemy engine directly
df = pd.read_sql("SELECT * FROM products", engine)

# Parameterized with SQLAlchemy text()
df = pd.read_sql(
    text("SELECT * FROM products WHERE category = :cat"),
    engine,
    params={"cat": "Hardware"}
)

This pattern is powerful: write your data retrieval logic in SQL (which databases execute efficiently), then hand the result to pandas for analysis, visualization, or export.

23.8 Acme Corp's Inventory Database: The Full Schema

Throughout this chapter's examples, we have been building toward Acme Corp's inventory database. Here is the complete schema:

CREATE TABLE IF NOT EXISTS customers (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT    NOT NULL,
    email       TEXT    NOT NULL UNIQUE,
    phone       TEXT,
    city        TEXT,
    state       TEXT,
    created_at  TEXT    DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS products (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    sku             TEXT    NOT NULL UNIQUE,
    name            TEXT    NOT NULL,
    category        TEXT    NOT NULL,
    unit_price      REAL    NOT NULL CHECK (unit_price >= 0),
    quantity        INTEGER NOT NULL DEFAULT 0,
    reorder_point   INTEGER NOT NULL DEFAULT 10,
    supplier        TEXT,
    created_at      TEXT    DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS orders (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id     INTEGER NOT NULL REFERENCES customers(id),
    order_date      TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status          TEXT    NOT NULL DEFAULT 'pending',
    total_amount    REAL    NOT NULL DEFAULT 0.0,
    shipping_addr   TEXT,
    notes           TEXT
);

CREATE TABLE IF NOT EXISTS order_lines (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id    INTEGER NOT NULL REFERENCES orders(id),
    product_id  INTEGER NOT NULL REFERENCES products(id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    unit_price  REAL    NOT NULL
);

23.9 Chapter Summary

You covered a tremendous amount of ground in this chapter:

Why databases: scale, concurrency, querying power, and data integrity that files cannot provide
Relational concepts: tables, rows, columns, primary keys, foreign keys, and normalization
SQLite with sqlite3: connecting, creating tables, inserting, selecting, updating, deleting — all using parameterized queries
Core SQL: SELECT, WHERE, ORDER BY, LIMIT, aggregate functions, GROUP BY, HAVING, INNER JOIN, LEFT JOIN, and subqueries
Transactions and context managers: ensuring data consistency with commit and rollback
SQLAlchemy ORM: defining models as Python classes, creating tables, inserting and querying with Python objects
PostgreSQL: same SQLAlchemy code, different connection string
pandas integration: pd.read_sql() for loading query results into DataFrames

In the next chapter, you will learn to connect Python to cloud services — uploading files to S3, managing secrets, and running code in the cloud.

Key Terms

Term	Definition
Database	A structured system for storing, organizing, and retrieving data
Table	A named collection of rows and columns in a relational database
Primary key	A column (or set of columns) that uniquely identifies each row
Foreign key	A column that references the primary key of another table
SQL	Structured Query Language — the standard language for relational databases
SQLite	A lightweight, file-based database engine built into Python
Parameterized query	A query using placeholders instead of string concatenation — prevents SQL injection
Transaction	A group of database operations that succeed or fail as a unit
ORM	Object-Relational Mapper — maps Python classes to database tables
SQLAlchemy	The most widely used Python database toolkit, supporting both ORM and raw SQL
JOIN	A SQL operation that combines rows from two or more tables
Normalization	Organizing data to eliminate redundancy — store each fact once

Next: Chapter 24 — Connecting Python to Cloud Services