Case Study 23-1: Priya Queries the Source
Characters
- Priya Okonkwo — Operations Analyst, Acme Corp
- Marcus Webb — IT Manager, Acme Corp
Situation
It is 7:48 a.m. on a Tuesday. Priya is at her desk, coffee in hand, watching her inbox for Marcus's CSV export. The 8:30 operations review with Sandra starts in forty-two minutes.
At 8:04, the email arrives. Priya opens the attachment. 187,000 rows. Excel takes ninety seconds to load it. She applies her filters, builds her pivot table, and copies the summary numbers into the PowerPoint. At 8:29, she hits send.
Later that afternoon, Sandra flags a number. "The Widget A inventory figure doesn't match what the warehouse sent me." Priya checks. Marcus's export was timestamped 7:58 a.m. — he had pulled it before the morning receiving run was entered. The warehouse's number was right. Priya's report was wrong.
This is the fourth time in three months.
The Conversation
Priya walks to Marcus's desk.
"Marcus — can you just give me read access to the inventory database directly? I'm tired of being one CSV behind reality."
Marcus looks up from his monitor. "You know SQL?"
"I'm learning."
"I'll set you up with read-only credentials. But fair warning — the data isn't as clean as those CSVs I send you. I filter out some noise before the export."
"Good," says Priya. "I want to see the noise."
Setting Up the Connection
Marcus gives Priya a PostgreSQL connection string and walks her through installing the Python tools. She creates a .env file (he is very clear that she should not put the password in any script):
DB_HOST=db-prod.acme.internal
DB_PORT=5432
DB_NAME=acme_operations
DB_USER=priya_readonly
DB_PASSWORD=Tr0ub4dor&3_ro
She installs the required packages:
pip install sqlalchemy psycopg2-binary python-dotenv pandas
And writes her first connection script:
# acme_inventory_queries.py
import os
import pandas as pd
from dotenv import load_dotenv
from sqlalchemy import create_engine, text
load_dotenv()
# If the password ever contains URL-special characters (@, /, :), build the
# URL with sqlalchemy.engine.URL.create() instead of string formatting.
DB_URL = (
    f"postgresql://{os.getenv('DB_USER')}:{os.getenv('DB_PASSWORD')}"
    f"@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT', '5432')}"
    f"/{os.getenv('DB_NAME')}"
)
engine = create_engine(DB_URL)
# Quick sanity check
with engine.connect() as conn:
result = conn.execute(text("SELECT COUNT(*) FROM products"))
count = result.scalar()
print(f"Products in database: {count}")
It prints 183. The CSVs Marcus sent her had 180 rows. Three products were being filtered out of the export. She makes a note.
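She could pin down which three with a quick comparison between the live table and the export. A minimal sketch (the helper name and CSV path are illustrative, and it assumes the export keeps a sku column):

```python
import pandas as pd
from sqlalchemy import text


def find_filtered_skus(engine, csv_path: str) -> pd.DataFrame:
    """Return products that exist in the database but are missing
    from a CSV export, i.e. the rows the export filtered out."""
    db_df = pd.read_sql(text("SELECT sku, name, quantity FROM products"), engine)
    csv_df = pd.read_csv(csv_path, usecols=["sku"])
    # indicator=True adds a _merge column; "left_only" marks rows
    # that appear in the database but not in the CSV.
    merged = db_df.merge(csv_df, on="sku", how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```

An anti-join like this is often easier to audit than eyeballing two sorted lists, and it scales to exports with thousands of rows.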
Writing the Inventory Report Query
Priya's first goal: replicate what the CSV gave her, but live. She builds the query incrementally, testing each piece.
def get_inventory_summary(engine) -> pd.DataFrame:
"""
Pull current inventory status for all active products.
Groups by category, showing stock levels and reorder flags.
"""
query = text("""
SELECT
p.category,
p.sku,
p.name,
p.quantity AS current_stock,
p.reorder_point,
p.unit_price,
ROUND(p.unit_price * GREATEST(p.quantity, 0), 2) AS stock_value,
CASE
WHEN p.quantity < 0 THEN 'NEGATIVE'
WHEN p.quantity = 0 THEN 'OUT OF STOCK'
WHEN p.quantity <= p.reorder_point THEN 'REORDER NOW'
WHEN p.quantity <= p.reorder_point * 2 THEN 'LOW'
ELSE 'OK'
END AS stock_status
FROM products AS p
WHERE p.active = TRUE
ORDER BY
CASE
WHEN p.quantity < 0 THEN 0
WHEN p.quantity = 0 THEN 1
WHEN p.quantity <= p.reorder_point THEN 2
ELSE 3
END,
p.category,
p.name
""")
return pd.read_sql(query, engine)
df = get_inventory_summary(engine)
print(df.head(20).to_string(index=False))
The output loads in under two seconds. No waiting for an email. No Excel file to open.
Finding the Negative Inventory
Priya is scanning the output when something catches her eye in the stock_status column: three rows say NEGATIVE. She filters to look closer:
negative_stock = df[df["stock_status"] == "NEGATIVE"].copy()
print(f"\nProducts with NEGATIVE inventory: {len(negative_stock)}")
print(negative_stock[["sku", "name", "category", "current_stock"]].to_string(index=False))
Output:
Products with NEGATIVE inventory: 3
sku name category current_stock
SKU-1005 Stainless Bracket Kit Hardware -5
SKU-2004 27-Inch Monitor Electronics -12
SKU-4003 Sticky Notes Mega Pack Supplies -3
Negative inventory is physically impossible. It means the system recorded more units going out than it ever showed coming in. Possible causes:
- A receiving shipment was entered in the wrong period
- A return was processed incorrectly
- A data entry error in the order management system
- The quantities were manually adjusted without corresponding paperwork
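Whatever the cause, the arithmetic can be cross-checked: recorded stock should equal everything received minus everything shipped. A sketch of such a reconciliation pass, under the assumption that every product started at zero stock (no opening-balance table appears in this chapter), using the inventory_receipts and order_lines tables:

```python
import pandas as pd
from sqlalchemy import text


def reconcile_stock(engine) -> pd.DataFrame:
    """List SKUs whose recorded quantity disagrees with total
    units received minus total units shipped."""
    query = text("""
        SELECT
            p.sku,
            p.quantity AS recorded,
            COALESCE(r.total_in, 0) - COALESCE(s.total_out, 0) AS computed
        FROM products AS p
        LEFT JOIN (
            SELECT product_id, SUM(quantity_received) AS total_in
            FROM inventory_receipts
            GROUP BY product_id
        ) AS r ON r.product_id = p.id
        LEFT JOIN (
            SELECT product_id, SUM(quantity) AS total_out
            FROM order_lines
            GROUP BY product_id
        ) AS s ON s.product_id = p.id
        WHERE p.quantity <> COALESCE(r.total_in, 0) - COALESCE(s.total_out, 0)
        ORDER BY p.sku
    """)
    return pd.read_sql(query, engine)
```

A SKU can be negative yet internally consistent (zero received, five shipped); this check instead flags rows where the ledger and the recorded count disagree at all, such as after a manual adjustment.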
She writes a deeper diagnostic query:
def diagnose_negative_sku(engine, sku: str) -> dict:
"""
For a given SKU, show the most recent orders and receipts to help
trace where the count went negative.
"""
# Recent orders containing this product
orders_query = text("""
SELECT
o.order_date,
o.id AS order_id,
o.status,
ol.quantity AS qty_out,
o.total_amount
FROM order_lines AS ol
INNER JOIN orders AS o ON o.id = ol.order_id
INNER JOIN products AS p ON p.id = ol.product_id
WHERE p.sku = :sku
ORDER BY o.order_date DESC
LIMIT 20
""")
# Recent inventory receipts for this product
receipts_query = text("""
SELECT
r.received_date,
r.quantity_received,
r.purchase_order_id,
r.notes
FROM inventory_receipts AS r
INNER JOIN products AS p ON p.id = r.product_id
WHERE p.sku = :sku
ORDER BY r.received_date DESC
LIMIT 20
""")
with engine.connect() as conn:
df_orders = pd.read_sql(orders_query, conn, params={"sku": sku})
df_receipts = pd.read_sql(receipts_query, conn, params={"sku": sku})
return {"recent_orders": df_orders, "recent_receipts": df_receipts}
for sku in negative_stock["sku"].tolist():
print(f"\n=== Diagnosing {sku} ===")
result = diagnose_negative_sku(engine, sku)
print("Recent orders:")
print(result["recent_orders"].to_string(index=False))
print("Recent receipts:")
print(result["recent_receipts"].to_string(index=False))
For SKU-2004 (27-Inch Monitor), she can see that twelve units were shipped in February, but the purchase order that was supposed to add twenty units shows zero received: the receiving clerk had saved the receipt without entering the quantity. The database accepted it because the field allowed zero.
For SKU-1005 (Stainless Bracket Kit), the trail shows five units that appear on two order records — a duplicate entry when the system was restarted after a crash on January 14th.
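Crash-and-retry duplicates like the SKU-1005 case tend to leave a fingerprint: the same product and quantity on more than one order record for the same date. A sketch of one heuristic check (the function name is illustrative; matches can also be legitimate repeat orders, so this is a lead for manual review, not proof):

```python
import pandas as pd
from sqlalchemy import text


def find_suspect_duplicates(engine) -> pd.DataFrame:
    """Flag product/quantity/date combinations that appear on more
    than one order line -- a heuristic for double-entered orders."""
    query = text("""
        SELECT
            p.sku,
            o.order_date,
            ol.quantity,
            COUNT(*) AS copies
        FROM order_lines AS ol
        INNER JOIN orders AS o ON o.id = ol.order_id
        INNER JOIN products AS p ON p.id = ol.product_id
        GROUP BY p.sku, o.order_date, ol.quantity
        HAVING COUNT(*) > 1
        ORDER BY copies DESC
    """)
    return pd.read_sql(query, engine)
```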
Escalating the Finding
Priya documents her findings in a short summary:
def generate_integrity_report(engine) -> pd.DataFrame:
"""
Produce a data integrity report covering:
- Negative inventory
- Orders with no line items
"""
query = text("""
SELECT
'Negative inventory' AS issue_type,
p.sku AS reference_id,
p.name AS description,
CAST(p.quantity AS TEXT) AS detail
FROM products AS p
WHERE p.quantity < 0
UNION ALL
SELECT
'Order with no lines',
CAST(o.id AS TEXT),
c.name,
o.order_date::TEXT
FROM orders AS o
INNER JOIN customers AS c ON c.id = o.customer_id
WHERE NOT EXISTS (
SELECT 1 FROM order_lines ol WHERE ol.order_id = o.id
)
ORDER BY issue_type, reference_id
""")
return pd.read_sql(query, engine)
report_df = generate_integrity_report(engine)
report_df.to_csv("data_integrity_report.csv", index=False)
print(f"Data integrity report: {len(report_df)} issues found.")
print(report_df.to_string(index=False))
She sends the CSV to Marcus with a short note: "Found these while setting up my live query. The three negative-inventory SKUs are causing my totals to underreport by about $3,400 in inventory value. Can we look at correcting them?"
Marcus replies in ten minutes: "I knew about two of them. Didn't know about the Sticky Notes one. Good catch. I'll get Warehouse to submit corrections."
The New Morning Workflow
Priya's new 8:30 prep takes eight minutes instead of forty:
# morning_report.py
# Run this at 8:00 AM for the 8:30 operations review.
import os
import pandas as pd
from dotenv import load_dotenv
from sqlalchemy import create_engine, text
from datetime import datetime
load_dotenv()
engine = create_engine(
f"postgresql://{os.getenv('DB_USER')}:{os.getenv('DB_PASSWORD')}"
f"@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT')}/{os.getenv('DB_NAME')}"
)
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")
print(f"Acme Corp — Inventory Status as of {timestamp}")
print("=" * 60)
# Reorder alerts
reorder_df = pd.read_sql(text("""
SELECT sku, name, quantity, reorder_point, supplier
FROM products
WHERE quantity <= reorder_point AND active = TRUE
ORDER BY quantity ASC
"""), engine)
print(f"\nREORDER ALERTS ({len(reorder_df)} items):")
print(reorder_df.to_string(index=False))
# Category summary
summary_df = pd.read_sql(text("""
SELECT
category,
COUNT(*) AS products,
SUM(GREATEST(quantity, 0)) AS total_units,
ROUND(SUM(unit_price * GREATEST(quantity, 0)), 2) AS stock_value
FROM products WHERE active = TRUE
GROUP BY category ORDER BY stock_value DESC
"""), engine)
print("\nCATEGORY SUMMARY:")
print(summary_df.to_string(index=False))
engine.dispose()
The data is live. No email to wait for. No Excel to crash. And now she sees the three SKUs Marcus used to filter out before sending the CSV.
What Priya Learned
- Direct database access beats scheduled file exports for operational reporting. The data is always current to the second.
- Read-only credentials are the right way to give analysts database access. Priya cannot accidentally modify production data.
- SQL in Python gives you the best of both worlds: the filtering power of SQL (done efficiently in the database) combined with the analysis and formatting power of pandas.
- Clean CSVs can hide data quality problems. Marcus's filtering was well-intentioned but meant Priya was working with a sanitized view of reality. Seeing the raw data — including the negative quantities — led to fixing a real problem.
- Parameterized queries matter even for read-only work. Priya's diagnostic function uses :sku placeholders rather than string formatting, which keeps her code safe and readable.
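That last point is easy to see in miniature. A sketch using an in-memory SQLite database as a stand-in for the Postgres server:

```python
from sqlalchemy import create_engine, text

# An in-memory SQLite database stands in for the production server.
engine = create_engine("sqlite:///:memory:")
with engine.begin() as conn:
    conn.exec_driver_sql("CREATE TABLE products (sku TEXT, name TEXT)")
    conn.exec_driver_sql("INSERT INTO products VALUES ('SKU-2004', '27-Inch Monitor')")

# Safe: the driver sends the SQL and the value separately, so the
# query text never changes no matter what the value contains.
safe = text("SELECT name FROM products WHERE sku = :sku")
with engine.connect() as conn:
    name = conn.execute(safe, {"sku": "SKU-2004"}).scalar()
print(name)

# Unsafe (do not do this): f"SELECT ... WHERE sku = '{value}'" splices the
# value into the SQL string itself, so a crafted value can rewrite the query.
```

A value like `' OR '1'='1` would change the meaning of the f-string version; passed as a bound parameter, it is just a string that matches nothing.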
Try It Yourself
Using the sqlite_demo.py script from this chapter (which simulates the Acme Corp inventory database):
- Run the script to create and populate the database.
- Connect to acme_inventory.db using sqlite3 and pd.read_sql().
- Write a query to find all products at or below their reorder point.
- Write a query to find any products with negative quantity (the script inserts three for you to find).
- Write a query that shows the total inventory value per category.
- Export your results to a CSV file.
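A starting scaffold for these exercises might look like the following (the database filename comes from the exercise; everything else is a suggestion, and sqlite3.connect will create an empty file if you have not run sqlite_demo.py yet):

```python
import sqlite3

import pandas as pd

# Open the demo database created by sqlite_demo.py.
conn = sqlite3.connect("acme_inventory.db")

# List the tables so you know what there is to query.
tables = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name", conn
)
print(tables)

# Each exercise is then a pd.read_sql("SELECT ...", conn) call.
conn.close()
```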