> "A spreadsheet is a notebook. A database is a filing system. One is for thinking; the other is for running a business."
In This Chapter
- What You Will Learn
- 23.1 Why Databases Instead of Files?
- 23.2 Relational Database Concepts
- 23.3 SQLite with Python's sqlite3 Module
- 23.4 Core SQL: The Queries You Will Use Every Day
- 23.5 SQLAlchemy ORM Introduction
- 23.6 Connecting to PostgreSQL
- 23.7 Reading SQL Results into pandas
- 23.8 Acme Corp's Inventory Database: The Full Schema
- 23.9 Chapter Summary
- Key Terms
Chapter 23: Database Basics — SQL and Python with SQLite and PostgreSQL
"A spreadsheet is a notebook. A database is a filing system. One is for thinking; the other is for running a business." — Marcus Webb, Acme Corp IT Manager
What You Will Learn
By the end of this chapter, you will be able to:
- Explain why databases outperform files and spreadsheets at scale
- Understand the core concepts of relational databases: tables, rows, columns, primary keys, and foreign keys
- Create and query a SQLite database using Python's built-in
sqlite3module - Write the 10–15 SQL queries that handle 90% of real business data needs
- Use parameterized queries to prevent SQL injection attacks
- Introduce SQLAlchemy's ORM to interact with databases using Python objects
- Connect to PostgreSQL using the same SQLAlchemy code as SQLite
- Read SQL query results directly into pandas DataFrames
23.1 Why Databases Instead of Files?
Priya Okonkwo has been running Acme Corp's operations reporting for two years using Excel. Every morning, Marcus Webb emails her a fresh CSV export of the inventory system. She pastes it into a master workbook, runs a few VLOOKUP formulas, and emails the summary to Sandra Chen before 9 a.m.
This works. Until it does not.
Last Tuesday, Marcus sent the export at 7:58 a.m. instead of 7:30. Priya's summary went to Sandra with yesterday's numbers. Two days before that, Priya accidentally saved over the master workbook with an intermediate draft. The week before that, the CSV was 180,000 rows — too large for Excel to open on her laptop.
These are not edge cases. They are the natural ceiling of file-based data management. When your data grows beyond a few thousand rows, when more than one person needs to read or write it at the same time, or when you need to ask complex questions that span multiple datasets, you have outgrown files. You need a database.
23.1.1 What a Database Actually Does
At the most basic level, a relational database is a program that:
- Stores data in structured tables — rows and columns, like a spreadsheet, but with strict rules about what goes in each column
- Answers questions efficiently — using a query language (SQL) designed for exactly this purpose
- Manages concurrent access — multiple users can read and write simultaneously without corrupting data
- Enforces integrity — the database can reject invalid data before it gets stored
- Supports transactions — a group of operations either all succeed or all fail together
23.1.2 Files vs. Databases: A Side-by-Side Comparison
| Concern | CSV / Excel File | Relational Database |
|---|---|---|
| Concurrent users | One writer at a time (file locking) | Many writers simultaneously |
| Query capability | Manual filters, VLOOKUP, pivot tables | SQL: join millions of rows in milliseconds |
| Data integrity | No enforcement — any value in any cell | Constraints, types, foreign keys enforced |
| File size limit | ~1M rows in Excel; slower as it grows | Billions of rows; handles gracefully |
| Backup and recovery | Manual; easy to lose work | Transaction logs; point-in-time recovery |
| Access control | File-level permissions only | Row-level security, user roles |
| Version history | Manual (save copies) | Audit tables, change data capture |
23.1.3 Which Database Should You Use?
You will encounter several database systems in business settings:
- SQLite — A file-based database engine built into Python's standard library. No server required. Perfect for development, testing, desktop applications, and datasets under a few hundred gigabytes. This is where we start.
- PostgreSQL — A full-featured, production-grade open-source database. Handles millions of transactions per day. Free to use. The professional standard for most businesses that do not have a vendor relationship with Oracle or Microsoft.
- MySQL / MariaDB — Another popular open-source database, slightly simpler than PostgreSQL but less feature-rich. Very common in web applications.
- Microsoft SQL Server — Common in enterprise Windows environments. Excellent tooling with the Microsoft ecosystem.
- Oracle Database — Used in very large enterprises. Expensive. Powerful. Often inherited rather than chosen.
The good news: once you learn SQL and SQLAlchemy, switching between these databases requires changing one line of configuration. The concepts and most of the syntax transfer directly.
23.2 Relational Database Concepts
Before writing any Python, you need the vocabulary. These five concepts underpin everything else.
23.2.1 Tables, Rows, and Columns
A table is a named collection of data organized into rows and columns. Think of it as a single sheet within a spreadsheet, but with stricter rules.
- A column (also called a field or attribute) represents one piece of information about each item — for example,
product_name,price,quantity_in_stock. - A row (also called a record or tuple) represents one item — one product, one customer, one order.
- Every column has a data type that restricts what values it can hold:
INTEGER,TEXT,REAL(decimal numbers),DATE,BOOLEAN, etc.
23.2.2 Primary Keys
A primary key is a column (or combination of columns) that uniquely identifies each row. No two rows can have the same primary key value, and the primary key can never be null.
In most business databases, the primary key is an auto-incrementing integer: id = 1, 2, 3, 4.... This is called a surrogate key — it has no business meaning, it simply exists to identify rows.
Sometimes a natural key makes sense — a column with an inherently unique value like a US Social Security Number, an ISBN for books, or a stock ticker symbol. Natural keys are controversial because "inherently unique" assumptions can break (two people can have the same name; ISBNs get reassigned; companies change tickers). Most modern databases use surrogate keys and add a unique constraint on the natural identifier separately.
23.2.3 Foreign Keys
A foreign key is a column in one table that holds the primary key of a row in another table. This is how relationships between tables are represented.
In Acme Corp's inventory database:
- The
orderstable has acustomer_idcolumn - That
customer_idreferences theidcolumn in thecustomerstable - This means: "find the customer for this order by looking up their ID in the customers table"
The database can enforce that you cannot create an order for a customer that does not exist. This is called referential integrity.
23.2.4 Normalization (Brief)
Normalization is the practice of organizing data to reduce redundancy. The core idea: store each fact in exactly one place.
If you store the customer's name and address on every order, you have redundancy. If the customer moves, you need to update every order row. If you miss one, your data is inconsistent.
The normalized approach: store the customer's information once in a customers table, then reference it with a foreign key in orders. When the customer moves, you update one row.
You do not need to memorize normalization rules (1NF, 2NF, 3NF) to work effectively with databases. The intuition — "store each fact once" — is enough for most business applications.
23.2.5 Schemas
A schema is the complete definition of a database's structure: all its tables, columns, data types, constraints, and relationships. When someone asks "what's your schema?", they want to know how the data is organized, not what data is in it.
23.3 SQLite with Python's sqlite3 Module
SQLite is the best possible starting point. It requires no installation, no server, no configuration. The database is a single file on your disk. Python ships with sqlite3 in its standard library — you can use it today without installing anything.
23.3.1 Connecting to a Database
import sqlite3
# Connect to a file-based database (creates the file if it does not exist)
connection = sqlite3.connect("acme_inventory.db")
# Connect to an in-memory database (useful for testing; disappears when the
# program exits)
connection = sqlite3.connect(":memory:")
The connect() call returns a connection object. Think of it as a phone line to the database. You need this line open to send queries.
23.3.2 Cursors
To actually execute SQL statements, you need a cursor — an object that sends SQL to the database and retrieves results:
cursor = connection.cursor()
cursor.execute("SELECT 1")
In practice, most code follows this pattern:
connection = sqlite3.connect("acme_inventory.db")
cursor = connection.cursor()
# ... execute SQL ...
connection.close()
23.3.3 Creating Tables
import sqlite3
connection = sqlite3.connect("acme_inventory.db")
cursor = connection.cursor()
cursor.execute("""
CREATE TABLE IF NOT EXISTS products (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sku TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
category TEXT NOT NULL,
unit_price REAL NOT NULL CHECK (unit_price >= 0),
quantity INTEGER NOT NULL DEFAULT 0
)
""")
connection.commit()
connection.close()
Key SQL keywords in CREATE TABLE:
PRIMARY KEY AUTOINCREMENT— SQLite assigns1, 2, 3...automaticallyNOT NULL— the column must have a value; NULL is not allowedUNIQUE— no two rows can have the same value in this columnCHECK (unit_price >= 0)— enforces a condition; the database rejects rows that fail itDEFAULT 0— if you do not supply a value, the database uses this default
23.3.4 Inserting Data
# Single row insert — NEVER do this with user-supplied data (SQL injection risk)
cursor.execute("INSERT INTO products (sku, name, category, unit_price, quantity) "
"VALUES ('SKU-001', 'Widget A', 'Hardware', 14.99, 100)")
# Safe parameterized insert — always use this form
cursor.execute(
"INSERT INTO products (sku, name, category, unit_price, quantity) "
"VALUES (?, ?, ?, ?, ?)",
("SKU-002", "Widget B", "Hardware", 24.99, 50)
)
# Insert multiple rows at once
products = [
("SKU-003", "Gadget X", "Electronics", 99.99, 25),
("SKU-004", "Gadget Y", "Electronics", 149.99, 10),
("SKU-005", "Connector", "Hardware", 4.99, 200),
]
cursor.executemany(
"INSERT INTO products (sku, name, category, unit_price, quantity) "
"VALUES (?, ?, ?, ?, ?)",
products
)
connection.commit()
The ? placeholders are called parameterized queries or prepared statements. They are essential for security and covered in depth in the next section.
23.3.5 Parameterized Queries and SQL Injection
SQL injection is one of the most common and damaging security vulnerabilities in software. It occurs when user-supplied data is concatenated directly into a SQL string, allowing an attacker to change the meaning of the query.
Consider this code:
# DANGEROUS — never do this
sku = input("Enter SKU to look up: ")
cursor.execute(f"SELECT * FROM products WHERE sku = '{sku}'")
If the user enters: '; DROP TABLE products; --
The executed SQL becomes:
SELECT * FROM products WHERE sku = ''; DROP TABLE products; --'
The products table is deleted. The attacker won. This is a real attack pattern responsible for countless data breaches.
The fix is simple: always use parameterized queries:
# SAFE — the database treats the user input as pure data, not SQL
sku = input("Enter SKU to look up: ")
cursor.execute("SELECT * FROM products WHERE sku = ?", (sku,))
The ? tells SQLite: "there is a value coming — treat it as data, no matter what it contains." The database driver handles escaping, quoting, and all the security details.
Note the (sku,) with a trailing comma — this creates a tuple. Python's sqlite3 requires parameters to be a sequence (tuple or list), even for a single value.
For named parameters (clearer with many values):
cursor.execute(
"SELECT * FROM products WHERE category = :cat AND unit_price < :max_price",
{"cat": "Hardware", "max_price": 20.00}
)
Named parameters using :name syntax improve readability when you have more than two or three parameters.
23.3.6 Selecting Data
# Execute a SELECT query
cursor.execute("SELECT id, sku, name, unit_price FROM products WHERE category = ?",
("Hardware",))
# Retrieve one row
row = cursor.fetchone()
print(row) # (1, 'SKU-001', 'Widget A', 14.99)
# Retrieve all remaining rows
rows = cursor.fetchall()
for row in rows:
print(row)
# Retrieve a batch (useful for large result sets)
batch = cursor.fetchmany(10) # Returns up to 10 rows
By default, each row is returned as a plain tuple. If you want rows to behave like dictionaries (access by column name), set the row factory:
connection.row_factory = sqlite3.Row
cursor.execute("SELECT id, sku, name FROM products")
row = cursor.fetchone()
print(row["name"]) # Access by column name
print(row[1]) # Still works — access by index
sqlite3.Row objects support both index and name access, which makes code more readable and less fragile (your code does not break if column order changes in the SELECT).
23.3.7 Updating and Deleting Data
# Update a single row
cursor.execute(
"UPDATE products SET quantity = ? WHERE sku = ?",
(75, "SKU-001")
)
print(f"Rows updated: {cursor.rowcount}")
# Update multiple rows matching a condition
cursor.execute(
"UPDATE products SET unit_price = unit_price * 1.05 WHERE category = ?",
("Hardware",)
)
# Delete a row
cursor.execute("DELETE FROM products WHERE sku = ?", ("SKU-005",))
# Delete multiple rows
cursor.execute("DELETE FROM products WHERE quantity = 0")
connection.commit()
cursor.rowcount tells you how many rows were affected by the last UPDATE or DELETE. This is useful for confirming that your operation did what you expected.
23.3.8 Context Managers and Transactions
SQLite (and relational databases generally) group operations into transactions. A transaction is a unit of work that either completes entirely or not at all. If your Python program crashes mid-transaction, the database rolls back to the last committed state — no half-written data.
Python's sqlite3 module supports context managers that handle commit() and rollback() automatically:
import sqlite3
with sqlite3.connect("acme_inventory.db") as conn:
conn.execute(
"UPDATE products SET quantity = quantity - ? WHERE sku = ?",
(10, "SKU-001")
)
conn.execute(
"INSERT INTO order_lines (product_id, quantity) VALUES (?, ?)",
(1, 10)
)
# If we reach here without exception, commit happens automatically.
# If an exception occurs, rollback happens automatically.
The with block commits on success and rolls back on any exception. This is the pattern you should use for any operation that modifies data.
For complex operations involving multiple steps, be explicit about transactions:
conn = sqlite3.connect("acme_inventory.db")
try:
conn.execute("BEGIN")
conn.execute("UPDATE products SET quantity = quantity - 10 WHERE id = 1")
# ... more operations ...
conn.commit()
except Exception as e:
conn.rollback()
print(f"Transaction failed, rolled back: {e}")
finally:
conn.close()
23.4 Core SQL: The Queries You Will Use Every Day
SQL (Structured Query Language) is the universal language for relational databases. You do not need to know all of SQL to be productive — the following queries cover the overwhelming majority of business data needs.
23.4.1 SELECT with WHERE, ORDER BY, and LIMIT
-- All products
SELECT * FROM products;
-- Specific columns only
SELECT sku, name, unit_price FROM products;
-- Filter with WHERE
SELECT sku, name, quantity
FROM products
WHERE quantity < 20;
-- Multiple conditions
SELECT sku, name, unit_price
FROM products
WHERE category = 'Hardware'
AND unit_price BETWEEN 5.00 AND 50.00;
-- Pattern matching with LIKE
SELECT sku, name
FROM products
WHERE name LIKE 'Widget%'; -- names starting with "Widget"
-- NULL checks
SELECT * FROM products WHERE discontinued_date IS NULL;
SELECT * FROM products WHERE discontinued_date IS NOT NULL;
-- Sort results
SELECT sku, name, unit_price
FROM products
ORDER BY unit_price DESC; -- Most expensive first
-- Limit number of results (top 5 most expensive)
SELECT sku, name, unit_price
FROM products
ORDER BY unit_price DESC
LIMIT 5;
23.4.2 Aggregate Functions and GROUP BY
Aggregates compute a single value from a set of rows:
-- Count all products
SELECT COUNT(*) FROM products;
-- Count non-null values in a specific column
SELECT COUNT(discontinued_date) FROM products;
-- Sum, average, min, max
SELECT
SUM(quantity) AS total_units,
AVG(unit_price) AS avg_price,
MIN(unit_price) AS cheapest,
MAX(unit_price) AS most_expensive
FROM products;
-- Aggregates by group
SELECT
category,
COUNT(*) AS product_count,
SUM(quantity) AS total_stock,
AVG(unit_price) AS avg_price,
SUM(unit_price * quantity) AS total_inventory_value
FROM products
GROUP BY category
ORDER BY total_inventory_value DESC;
-- Filter groups with HAVING (like WHERE, but applied after GROUP BY)
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) >= 3; -- Only categories with 3 or more products
The difference between WHERE and HAVING:
- WHERE filters rows before grouping
- HAVING filters groups after grouping
23.4.3 JOINs
A JOIN combines rows from two tables based on a related column. This is how you answer questions that span multiple tables.
Acme's database has three tables:
- customers — customer information
- orders — one row per order, with a customer_id foreign key
- order_lines — one row per product per order, with order_id and product_id foreign keys
-- INNER JOIN: only rows with a match in both tables
SELECT
orders.id AS order_id,
customers.name AS customer_name,
orders.order_date,
orders.total_amount
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id
ORDER BY orders.order_date DESC;
-- Using table aliases for cleaner SQL
SELECT
o.id AS order_id,
c.name AS customer_name,
o.order_date,
o.total_amount
FROM orders AS o
INNER JOIN customers AS c ON o.customer_id = c.id
ORDER BY o.order_date DESC;
-- Three-table join: orders with customer names and product names
SELECT
o.id AS order_id,
c.name AS customer_name,
p.name AS product_name,
ol.quantity,
ol.quantity * p.unit_price AS line_total
FROM orders AS o
INNER JOIN customers AS c ON o.customer_id = c.id
INNER JOIN order_lines AS ol ON ol.order_id = o.id
INNER JOIN products AS p ON ol.product_id = p.id;
-- LEFT JOIN: all rows from the left table; NULL for unmatched right rows
-- Find customers who have never placed an order
SELECT
c.id,
c.name,
COUNT(o.id) AS order_count
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
HAVING COUNT(o.id) = 0;
INNER JOIN returns only rows where the join condition matches in both tables. If a customer has no orders, they do not appear in an INNER JOIN with orders.
LEFT JOIN (also called LEFT OUTER JOIN) returns all rows from the left table, filling in NULL for columns from the right table when there is no match. This is useful for finding "missing" relationships.
23.4.4 Subqueries
A subquery is a SELECT statement nested inside another SQL statement:
-- Products priced above average
SELECT sku, name, unit_price
FROM products
WHERE unit_price > (SELECT AVG(unit_price) FROM products);
-- Customers who have placed at least one order over $500
SELECT name, email
FROM customers
WHERE id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE total_amount > 500
);
Subqueries are powerful but can be slow on large tables. For complex analytical queries, a JOIN or a Common Table Expression (CTE) often performs better. That said, subqueries are perfectly fine for most business-scale datasets.
23.4.5 Inserting and Updating with SELECT
-- Insert into a table using data from another table (useful for migrations)
INSERT INTO archived_products (sku, name, unit_price)
SELECT sku, name, unit_price
FROM products
WHERE discontinued_date < '2024-01-01';
-- Update using a subquery
UPDATE products
SET quantity = 0
WHERE id IN (
SELECT product_id
FROM discontinued_products
);
23.5 SQLAlchemy ORM Introduction
sqlite3 is excellent for direct SQL work. But as your application grows, managing raw SQL strings throughout your codebase becomes unwieldy. SQLAlchemy is the most widely used Python database toolkit, and it offers two modes:
- Core — A SQL expression language that builds SQL programmatically with Python objects (closer to raw SQL)
- ORM (Object-Relational Mapper) — Maps Python classes to database tables; you work with Python objects instead of SQL strings
This section focuses on the ORM, which is what most Python developers encounter first.
Install SQLAlchemy:
pip install sqlalchemy
For PostgreSQL, also install the driver:
pip install psycopg2-binary
23.5.1 Engine and Session
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
# SQLite — no server required
engine = create_engine("sqlite:///acme_inventory.db", echo=False)
# PostgreSQL — requires a running PostgreSQL server
# engine = create_engine(
# "postgresql://username:password@localhost:5432/acme_db"
# )
# Create a Session class bound to this engine
Session = sessionmaker(bind=engine)
session = Session()
The engine manages connections to the database. The session is the working space for your ORM operations — it tracks which objects you have loaded, modified, or created, and it coordinates commits and rollbacks.
23.5.2 Defining Models
A SQLAlchemy model is a Python class that maps to a database table:
from sqlalchemy import Column, Integer, String, Float, ForeignKey, DateTime
from sqlalchemy.orm import DeclarativeBase, relationship
from datetime import datetime
class Base(DeclarativeBase):
pass
class Product(Base):
__tablename__ = "products"
id = Column(Integer, primary_key=True, autoincrement=True)
sku = Column(String(50), nullable=False, unique=True)
name = Column(String(200), nullable=False)
category = Column(String(100), nullable=False)
unit_price = Column(Float, nullable=False)
quantity = Column(Integer, nullable=False, default=0)
# Relationship: one product appears in many order lines
order_lines = relationship("OrderLine", back_populates="product")
def __repr__(self):
return f"<Product(sku={self.sku!r}, name={self.name!r})>"
class Customer(Base):
__tablename__ = "customers"
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String(200), nullable=False)
email = Column(String(200), nullable=False, unique=True)
city = Column(String(100))
state = Column(String(2))
orders = relationship("Order", back_populates="customer")
def __repr__(self):
return f"<Customer(name={self.name!r}, email={self.email!r})>"
class Order(Base):
__tablename__ = "orders"
id = Column(Integer, primary_key=True, autoincrement=True)
customer_id = Column(Integer, ForeignKey("customers.id"), nullable=False)
order_date = Column(DateTime, default=datetime.utcnow)
total_amount = Column(Float, nullable=False, default=0.0)
status = Column(String(50), default="pending")
customer = relationship("Customer", back_populates="orders")
order_lines = relationship("OrderLine", back_populates="order")
def __repr__(self):
return f"<Order(id={self.id}, customer_id={self.customer_id})>"
class OrderLine(Base):
__tablename__ = "order_lines"
id = Column(Integer, primary_key=True, autoincrement=True)
order_id = Column(Integer, ForeignKey("orders.id"), nullable=False)
product_id = Column(Integer, ForeignKey("products.id"), nullable=False)
quantity = Column(Integer, nullable=False)
unit_price = Column(Float, nullable=False) # Price at time of order
order = relationship("Order", back_populates="order_lines")
product = relationship("Product", back_populates="order_lines")
23.5.3 Creating Tables from Models
# Create all tables defined in models that inherit from Base
Base.metadata.create_all(engine)
This single line inspects all your model classes and creates the corresponding tables if they do not already exist.
23.5.4 Inserting with the ORM
# Create Python objects — they are not in the database yet
widget_a = Product(
sku="SKU-001",
name="Widget A",
category="Hardware",
unit_price=14.99,
quantity=100
)
acme_co = Customer(
name="Acme Co",
email="purchasing@acme.example.com",
city="Chicago",
state="IL"
)
# Add to the session (stages for insert)
session.add(widget_a)
session.add(acme_co)
# Add multiple at once
more_products = [
Product(sku="SKU-002", name="Widget B", category="Hardware",
unit_price=24.99, quantity=50),
Product(sku="SKU-003", name="Gadget X", category="Electronics",
unit_price=99.99, quantity=25),
]
session.add_all(more_products)
# Commit — writes everything to the database
session.commit()
# After commit, SQLAlchemy fills in auto-generated IDs
print(widget_a.id) # e.g., 1
23.5.5 Querying with the ORM
# Get all products
all_products = session.query(Product).all()
# Filter
hardware = session.query(Product).filter(Product.category == "Hardware").all()
# Multiple filters
affordable_hardware = (
session.query(Product)
.filter(Product.category == "Hardware")
.filter(Product.unit_price < 20.0)
.all()
)
# Get one record by primary key
product = session.get(Product, 1) # Get product with id=1
# Get one record by other criteria
product = session.query(Product).filter(Product.sku == "SKU-001").first()
# Order and limit
top_5_expensive = (
session.query(Product)
.order_by(Product.unit_price.desc())
.limit(5)
.all()
)
# Count
hardware_count = session.query(Product).filter(
Product.category == "Hardware"
).count()
23.5.6 Updating with the ORM
# Load the object, modify it, commit
product = session.query(Product).filter(Product.sku == "SKU-001").first()
product.quantity = 150
product.unit_price = 16.99
session.commit()
# Bulk update (more efficient for many rows)
from sqlalchemy import update
session.execute(
update(Product)
.where(Product.category == "Hardware")
.values(unit_price=Product.unit_price * 1.05)
)
session.commit()
23.5.7 ORM vs. Raw SQL — When to Use Each
This is a common question, and the honest answer is: both have their place.
Use the ORM when: - Building an application where you create, update, and delete individual records - You want Python objects with methods and properties - You need to switch between database backends (SQLite for dev, PostgreSQL for production) - Your team is more comfortable with Python than SQL
Use raw SQL (via SQLAlchemy Core or sqlite3) when:
- Running complex analytical queries (many joins, window functions, CTEs)
- Performance is critical — the ORM's generated SQL can be suboptimal for complex queries
- You want full control over the SQL
- You are reading data into pandas for analysis
A common pattern in production code: use the ORM for CRUD operations (Create, Read, Update, Delete) and raw SQL or SQLAlchemy Core for reporting queries.
23.6 Connecting to PostgreSQL
One of SQLAlchemy's greatest strengths is that switching database backends requires changing only the connection string. Everything else — model definitions, queries, session management — stays the same.
# SQLite (development)
engine = create_engine("sqlite:///acme_inventory.db")
# PostgreSQL (production)
engine = create_engine(
"postgresql://db_user:db_password@db_host:5432/acme_db"
)
# PostgreSQL with environment variables (recommended — never hardcode credentials)
import os
from dotenv import load_dotenv
load_dotenv()
DB_URL = (
f"postgresql://{os.getenv('DB_USER')}:{os.getenv('DB_PASSWORD')}"
f"@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT', '5432')}"
f"/{os.getenv('DB_NAME')}"
)
engine = create_engine(DB_URL)
The .env file (never committed to version control):
DB_USER=acme_user
DB_PASSWORD=supersecretpassword
DB_HOST=db.acme.example.com
DB_PORT=5432
DB_NAME=acme_production
This pattern — SQLite locally, PostgreSQL in production — is the standard workflow for Python web applications and data pipelines.
23.6.1 PostgreSQL-Specific Features
PostgreSQL supports several features that SQLite does not:
- Full-text search —
tsvector,tsqueryfor searching document content - JSON columns — store and query JSON data natively
- Window functions —
ROW_NUMBER(),RANK(),LAG(),LEAD()for analytical queries - ENUM types — restrict a column to a fixed set of values at the database level
- Concurrent writes — PostgreSQL handles multiple simultaneous writers; SQLite is limited
For most business applications that stay within SQL standard syntax, your code will work on both without modification.
23.7 Reading SQL Results into pandas
The bridge between the database world and the data analysis world is pd.read_sql(). This function runs a SQL query and loads the results directly into a DataFrame.
import pandas as pd
import sqlite3
conn = sqlite3.connect("acme_inventory.db")
# Simple query to DataFrame
df = pd.read_sql("SELECT * FROM products", conn)
print(df.head())
print(df.dtypes)
# Parameterized query (use format strings carefully — only for trusted values;
# use sqlalchemy for user-supplied parameters)
category = "Hardware"
df_hardware = pd.read_sql(
"SELECT sku, name, unit_price, quantity FROM products WHERE category = ?",
conn,
params=(category,)
)
# Complex analytical query
df_summary = pd.read_sql("""
SELECT
category,
COUNT(*) AS product_count,
SUM(quantity) AS total_units,
ROUND(AVG(unit_price), 2) AS avg_price,
ROUND(SUM(unit_price * quantity), 2) AS inventory_value
FROM products
GROUP BY category
ORDER BY inventory_value DESC
""", conn)
conn.close()
# Now use pandas normally
print(df_summary.to_string(index=False))
With SQLAlchemy:
import pandas as pd
from sqlalchemy import create_engine, text
engine = create_engine("sqlite:///acme_inventory.db")
# Using SQLAlchemy engine directly
df = pd.read_sql("SELECT * FROM products", engine)
# Parameterized with SQLAlchemy text()
df = pd.read_sql(
text("SELECT * FROM products WHERE category = :cat"),
engine,
params={"cat": "Hardware"}
)
This pattern is powerful: write your data retrieval logic in SQL (which databases execute efficiently), then hand the result to pandas for analysis, visualization, or export.
23.8 Acme Corp's Inventory Database: The Full Schema
Throughout this chapter's examples, we have been building toward Acme Corp's inventory database. Here is the complete schema:
CREATE TABLE IF NOT EXISTS customers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
email TEXT NOT NULL UNIQUE,
phone TEXT,
city TEXT,
state TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS products (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sku TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
category TEXT NOT NULL,
unit_price REAL NOT NULL CHECK (unit_price >= 0),
quantity INTEGER NOT NULL DEFAULT 0,
reorder_point INTEGER NOT NULL DEFAULT 10,
supplier TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS orders (
id INTEGER PRIMARY KEY AUTOINCREMENT,
customer_id INTEGER NOT NULL REFERENCES customers(id),
order_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
status TEXT NOT NULL DEFAULT 'pending',
total_amount REAL NOT NULL DEFAULT 0.0,
shipping_addr TEXT,
notes TEXT
);
CREATE TABLE IF NOT EXISTS order_lines (
id INTEGER PRIMARY KEY AUTOINCREMENT,
order_id INTEGER NOT NULL REFERENCES orders(id),
product_id INTEGER NOT NULL REFERENCES products(id),
quantity INTEGER NOT NULL CHECK (quantity > 0),
unit_price REAL NOT NULL
);
23.9 Chapter Summary
You covered a tremendous amount of ground in this chapter:
- Why databases: scale, concurrency, querying power, and data integrity that files cannot provide
- Relational concepts: tables, rows, columns, primary keys, foreign keys, and normalization
- SQLite with sqlite3: connecting, creating tables, inserting, selecting, updating, deleting — all using parameterized queries
- Core SQL: SELECT, WHERE, ORDER BY, LIMIT, aggregate functions, GROUP BY, HAVING, INNER JOIN, LEFT JOIN, and subqueries
- Transactions and context managers: ensuring data consistency with commit and rollback
- SQLAlchemy ORM: defining models as Python classes, creating tables, inserting and querying with Python objects
- PostgreSQL: same SQLAlchemy code, different connection string
- pandas integration:
pd.read_sql()for loading query results into DataFrames
In the next chapter, you will learn to connect Python to cloud services — uploading files to S3, managing secrets, and running code in the cloud.
Key Terms
| Term | Definition |
|---|---|
| Database | A structured system for storing, organizing, and retrieving data |
| Table | A named collection of rows and columns in a relational database |
| Primary key | A column (or set of columns) that uniquely identifies each row |
| Foreign key | A column that references the primary key of another table |
| SQL | Structured Query Language — the standard language for relational databases |
| SQLite | A lightweight, file-based database engine built into Python |
| Parameterized query | A query using placeholders instead of string concatenation — prevents SQL injection |
| Transaction | A group of database operations that succeed or fail as a unit |
| ORM | Object-Relational Mapper — maps Python classes to database tables |
| SQLAlchemy | The most widely used Python database toolkit, supporting both ORM and raw SQL |
| JOIN | A SQL operation that combines rows from two or more tables |
| Normalization | Organizing data to eliminate redundancy — store each fact once |
Next: Chapter 24 — Connecting Python to Cloud Services