Case Study 23-1: Priya Queries the Source

Characters

  • Priya Okonkwo — Operations Analyst, Acme Corp
  • Marcus Webb — IT Manager, Acme Corp

Situation

It is 7:48 a.m. on a Tuesday. Priya is at her desk, coffee in hand, watching her inbox for Marcus's CSV export. The 8:30 operations review with Sandra starts in forty-two minutes.

At 8:04, the email arrives. Priya opens the attachment. 187,000 rows. Excel takes ninety seconds to load it. She applies her filters, builds her pivot table, and copies the summary numbers into the PowerPoint. At 8:29, she hits send.

Later that afternoon, Sandra flags a number. "The Widget A inventory figure doesn't match what the warehouse sent me." Priya checks. Marcus's export was timestamped 7:58 a.m. — he had pulled it before the morning receiving run was entered. The warehouse's number was right. Priya's report was wrong.

This is the fourth time in three months.


The Conversation

Priya walks to Marcus's desk.

"Marcus — can you just give me read access to the inventory database directly? I'm tired of being one CSV behind reality."

Marcus looks up from his monitor. "You know SQL?"

"I'm learning."

"I'll set you up with read-only credentials. But fair warning — the data isn't as clean as those CSVs I send you. I filter out some noise before the export."

"Good," says Priya. "I want to see the noise."


Setting Up the Connection

Marcus gives Priya a PostgreSQL connection string and walks her through installing the Python tools. She creates a .env file (he is very clear that she should not put the password in any script):

DB_HOST=db-prod.acme.internal
DB_PORT=5432
DB_NAME=acme_operations
DB_USER=priya_readonly
DB_PASSWORD=Tr0ub4dor&3_ro

She installs the required packages:

pip install sqlalchemy psycopg2-binary python-dotenv pandas

And writes her first connection script:

# acme_inventory_queries.py
import os
import pandas as pd
from dotenv import load_dotenv
from sqlalchemy import create_engine, text

load_dotenv()

DB_URL = (
    f"postgresql://{os.getenv('DB_USER')}:{os.getenv('DB_PASSWORD')}"
    f"@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT', '5432')}"
    f"/{os.getenv('DB_NAME')}"
)

engine = create_engine(DB_URL)

# Quick sanity check
with engine.connect() as conn:
    result = conn.execute(text("SELECT COUNT(*) FROM products"))
    count = result.scalar()
    print(f"Products in database: {count}")

It prints 183. The CSVs Marcus sent her had 180 rows. Three products were being filtered. She makes a note.
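One way to pin down exactly which products an export drops is a left anti-join in pandas. This is a sketch, not Priya's actual code: the SKU values below are illustrative stand-ins, since in practice `db_df` would come from a `SELECT sku FROM products` query and `csv_df` from reading Marcus's attachment.

```python
import pandas as pd

# Illustrative stand-ins: in practice db_df would come from
# pd.read_sql("SELECT sku FROM products", engine) and csv_df
# from pd.read_csv() on the emailed export.
db_df = pd.DataFrame({"sku": ["SKU-1001", "SKU-1005", "SKU-2004", "SKU-4003"]})
csv_df = pd.DataFrame({"sku": ["SKU-1001"]})

# Left anti-join: keep rows present on the database side only.
diff = db_df.merge(csv_df, on="sku", how="left", indicator=True)
missing = diff.loc[diff["_merge"] == "left_only", "sku"].tolist()
print(f"In the database but not in the CSV: {missing}")
```

The `indicator=True` flag adds a `_merge` column marking each row as `both`, `left_only`, or `right_only`, which makes the anti-join a one-line filter.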


Writing the Inventory Report Query

Priya's first goal: replicate what the CSV gave her, but live. She builds the query incrementally, testing each piece.

def get_inventory_summary(engine) -> pd.DataFrame:
    """
    Pull current inventory status for all active products.
    Groups by category, showing stock levels and reorder flags.
    """
    query = text("""
        SELECT
            p.category,
            p.sku,
            p.name,
            p.quantity              AS current_stock,
            p.reorder_point,
            p.unit_price,
            ROUND(p.unit_price * GREATEST(p.quantity, 0), 2) AS stock_value,
            CASE
                WHEN p.quantity < 0          THEN 'NEGATIVE'
                WHEN p.quantity = 0          THEN 'OUT OF STOCK'
                WHEN p.quantity <= p.reorder_point THEN 'REORDER NOW'
                WHEN p.quantity <= p.reorder_point * 2 THEN 'LOW'
                ELSE 'OK'
            END                     AS stock_status
        FROM products AS p
        WHERE p.active = TRUE
        ORDER BY
            CASE
                WHEN p.quantity < 0 THEN 0
                WHEN p.quantity = 0 THEN 1
                WHEN p.quantity <= p.reorder_point THEN 2
                ELSE 3
            END,
            p.category,
            p.name
    """)

    return pd.read_sql(query, engine)


df = get_inventory_summary(engine)
print(df.head(20).to_string(index=False))

The output loads in under two seconds. No waiting for an email. No Excel file to open.


Finding the Negative Inventory

Priya is scanning the output when something catches her eye in the stock_status column: three rows say NEGATIVE. She filters to look closer:

negative_stock = df[df["stock_status"] == "NEGATIVE"].copy()
print(f"\nProducts with NEGATIVE inventory: {len(negative_stock)}")
print(negative_stock[["sku", "name", "category", "current_stock"]].to_string(index=False))

Output:

Products with NEGATIVE inventory: 3

         sku                    name     category  current_stock
    SKU-1005   Stainless Bracket Kit     Hardware             -5
    SKU-2004         27-Inch Monitor  Electronics            -12
    SKU-4003  Sticky Notes Mega Pack     Supplies             -3

Negative inventory is physically impossible. It means the system recorded more units going out than it ever showed coming in. Possible causes:

  1. A receiving shipment was entered in the wrong period
  2. A return was processed incorrectly
  3. A data entry error in the order management system
  4. The quantities were manually adjusted without corresponding paperwork

She writes a deeper diagnostic query:

def diagnose_negative_sku(engine, sku: str) -> dict:
    """
    For a given SKU, show all inventory transactions to find where
    the count went negative.
    """
    # Recent orders containing this product
    orders_query = text("""
        SELECT
            o.order_date,
            o.id            AS order_id,
            o.status,
            ol.quantity     AS qty_out,
            o.total_amount
        FROM order_lines AS ol
        INNER JOIN orders AS o ON o.id = ol.order_id
        INNER JOIN products AS p ON p.id = ol.product_id
        WHERE p.sku = :sku
        ORDER BY o.order_date DESC
        LIMIT 20
    """)

    # Recent inventory receipts for this product
    receipts_query = text("""
        SELECT
            r.received_date,
            r.quantity_received,
            r.purchase_order_id,
            r.notes
        FROM inventory_receipts AS r
        INNER JOIN products AS p ON p.id = r.product_id
        WHERE p.sku = :sku
        ORDER BY r.received_date DESC
        LIMIT 20
    """)

    with engine.connect() as conn:
        df_orders   = pd.read_sql(orders_query,   conn, params={"sku": sku})
        df_receipts = pd.read_sql(receipts_query, conn, params={"sku": sku})

    return {"recent_orders": df_orders, "recent_receipts": df_receipts}


for sku in negative_stock["sku"].tolist():
    print(f"\n=== Diagnosing {sku} ===")
    result = diagnose_negative_sku(engine, sku)
    print("Recent orders:")
    print(result["recent_orders"].to_string(index=False))
    print("Recent receipts:")
    print(result["recent_receipts"].to_string(index=False))

For SKU-2004 (27-Inch Monitor), she can see twelve units were shipped in February but the purchase order that was supposed to add twenty units shows zero units received — the receiving clerk had saved the receipt without entering the quantity. The database accepted it because the field allowed zero.
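Once you know that failure mode exists, it is easy to screen for. Below is a sketch of the guard query, run against an in-memory SQLite stand-in for `inventory_receipts` so it is self-contained; the column names follow the case study, but the rows are invented.

```python
import sqlite3

# Sketch: flag receipts saved with a zero quantity -- the failure mode
# behind SKU-2004. In-memory SQLite stand-in; the real check would run
# against inventory_receipts in Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inventory_receipts (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        quantity_received INTEGER,
        purchase_order_id INTEGER,
        notes TEXT
    )
""")
conn.executemany(
    "INSERT INTO inventory_receipts"
    " (product_id, quantity_received, purchase_order_id, notes)"
    " VALUES (?, ?, ?, ?)",
    [
        (2004, 20, 501, "normal receipt"),
        (2004, 0, 502, "saved without a quantity"),  # the bad row
        (1005, 5, 503, "normal receipt"),
    ],
)

# Any receipt with quantity_received = 0 deserves a second look.
suspect = conn.execute(
    "SELECT id, product_id, purchase_order_id FROM inventory_receipts"
    " WHERE quantity_received = 0"
).fetchall()
print(f"Zero-quantity receipts: {suspect}")
conn.close()
```

A stricter fix is a CHECK constraint (or application-level validation) so the database refuses zero-quantity receipts in the first place.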

For SKU-1005 (Stainless Bracket Kit), the trail shows five units that appear on two order records — a duplicate entry when the system was restarted after a crash on January 14th.
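The duplicate-entry pattern behind SKU-1005 can be screened for with a GROUP BY ... HAVING query. Another sketch against an in-memory SQLite stand-in; the `order_lines` columns follow the case study, and the rows are invented.

```python
import sqlite3

# Sketch: detect order lines recorded twice, as happened to SKU-1005
# after the January 14th crash. In-memory SQLite stand-in for order_lines.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines (order_id INTEGER, product_id INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(101, 1005, 5), (101, 1005, 5), (102, 2004, 1)],  # order 101 entered twice
)

# Any (order_id, product_id) pair appearing more than once is a candidate duplicate.
dupes = conn.execute("""
    SELECT order_id, product_id, COUNT(*) AS times_entered
    FROM order_lines
    GROUP BY order_id, product_id
    HAVING COUNT(*) > 1
""").fetchall()
print(f"Possible duplicate lines: {dupes}")
conn.close()
```

Note the caveat: a customer legitimately ordering the same product twice on one order would also match, so candidates still need a human look before correction.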


Escalating the Finding

Priya documents her findings in a short summary:

def generate_integrity_report(engine) -> pd.DataFrame:
    """
    Produce a data integrity report covering:
    - Negative inventory
    - Orders with no line items
    """
    query = text("""
        SELECT
            'Negative inventory'    AS issue_type,
            p.sku                   AS reference_id,
            p.name                  AS description,
            CAST(p.quantity AS TEXT) AS detail
        FROM products AS p
        WHERE p.quantity < 0

        UNION ALL

        SELECT
            'Order with no lines',
            CAST(o.id AS TEXT),
            c.name,
            CAST(o.order_date AS TEXT)
        FROM orders AS o
        INNER JOIN customers AS c ON c.id = o.customer_id
        WHERE NOT EXISTS (
            SELECT 1 FROM order_lines ol WHERE ol.order_id = o.id
        )

        ORDER BY issue_type, reference_id
    """)

    return pd.read_sql(query, engine)


report_df = generate_integrity_report(engine)
report_df.to_csv("data_integrity_report.csv", index=False)
print(f"Data integrity report: {len(report_df)} issues found.")
print(report_df.to_string(index=False))

She sends the CSV to Marcus with a short note: "Found these while setting up my live query. The three negative-inventory SKUs are causing my totals to underreport by about $3,400 in inventory value. Can we look at correcting them?"

Marcus replies in ten minutes: "I knew about two of them. Didn't know about the Sticky Notes one. Good catch. I'll get Warehouse to submit corrections."
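The roughly $3,400 figure in her note is just the negative quantities valued at unit price. The arithmetic looks like this; the unit prices below are illustrative stand-ins, since the case study does not give the real ones.

```python
# Sketch: estimate how much the negative rows understate inventory value.
# Unit prices are illustrative stand-ins, not figures from the case study.
negative_rows = [
    ("SKU-1005", -5, 42.00),   # (sku, quantity, unit_price)
    ("SKU-2004", -12, 249.00),
    ("SKU-4003", -3, 12.50),
]
understatement = sum(-qty * price for _, qty, price in negative_rows)
print(f"Inventory value understated by about ${understatement:,.2f}")
```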


The New Morning Workflow

Priya's new 8:30 prep takes eight minutes instead of forty:

# morning_report.py
# Run this at 8:00 AM for the 8:30 operations review.

import os
import pandas as pd
from dotenv import load_dotenv
from sqlalchemy import create_engine, text
from datetime import datetime

load_dotenv()
engine = create_engine(
    f"postgresql://{os.getenv('DB_USER')}:{os.getenv('DB_PASSWORD')}"
    f"@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT', '5432')}/{os.getenv('DB_NAME')}"
)

timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")
print(f"Acme Corp — Inventory Status as of {timestamp}")
print("=" * 60)

# Reorder alerts
reorder_df = pd.read_sql(text("""
    SELECT sku, name, quantity, reorder_point, supplier
    FROM products
    WHERE quantity <= reorder_point AND active = TRUE
    ORDER BY quantity ASC
"""), engine)

print(f"\nREORDER ALERTS ({len(reorder_df)} items):")
print(reorder_df.to_string(index=False))

# Category summary
summary_df = pd.read_sql(text("""
    SELECT
        category,
        COUNT(*)                         AS products,
        SUM(GREATEST(quantity, 0))       AS total_units,
        ROUND(SUM(unit_price * GREATEST(quantity, 0)), 2) AS stock_value
    FROM products WHERE active = TRUE
    GROUP BY category ORDER BY stock_value DESC
"""), engine)

print("\nCATEGORY SUMMARY:")
print(summary_df.to_string(index=False))

engine.dispose()

The data is live. No email to wait for. No Excel to crash. And now she sees the three SKUs Marcus used to filter out before sending the CSV.


What Priya Learned

  1. Direct database access beats scheduled file exports for operational reporting. The data is always current to the second.

  2. Read-only credentials are the right way to give analysts database access. Priya cannot accidentally modify production data.

  3. SQL in Python gives you the best of both worlds: the filtering power of SQL (done efficiently in the database) combined with the analysis and formatting power of pandas.

  4. Clean CSVs can hide data quality problems. Marcus's filtering was well-intentioned but meant Priya was working with a sanitized view of reality. Seeing the raw data — including the negative quantities — led to fixing a real problem.

  5. Parameterized queries matter even for read-only work. Priya's diagnostic function uses :sku placeholders rather than string formatting, which keeps her code safe and readable.
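Point 5 is worth seeing concretely. A minimal demonstration using the standard-library sqlite3 module (SQLite's placeholder is ? where SQLAlchemy's text() uses :sku; the table and row here are invented):

```python
import sqlite3

# Minimal demo of parameterized vs. string-formatted queries.
# SQLite shown so the sketch runs on its own; with SQLAlchemy text()
# the placeholder is :sku instead of ?.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, name TEXT)")
conn.execute("INSERT INTO products VALUES ('SKU-1005', 'Stainless Bracket Kit')")

sku = "SKU-1005"

# Parameterized: the driver handles quoting and escaping.
row = conn.execute("SELECT name FROM products WHERE sku = ?", (sku,)).fetchone()
print(row[0])

# Avoid: f"SELECT name FROM products WHERE sku = '{sku}'" -- it works until a
# value contains a quote, and invites injection if the value ever comes from
# user input.
conn.close()
```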


Try It Yourself

Using the sqlite_demo.py script from this chapter (which simulates the Acme Corp inventory database):

  1. Run the script to create and populate the database.
  2. Connect to acme_inventory.db using sqlite3 and pd.read_sql().
  3. Write a query to find all products at or below their reorder point.
  4. Write a query to find any products with negative quantity (the script inserts three for you to find).
  5. Write a query that shows the total inventory value per category.
  6. Export your results to a CSV file.
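As a starting point for steps 2 and 3, the shape of the code looks like this. The sketch below uses an in-memory stand-in table with invented rows so it runs on its own; for the exercise, connect to acme_inventory.db instead and drop the CREATE/INSERT lines.

```python
import sqlite3

# Starter sketch for the exercises. For the real exercise, replace ":memory:"
# with "acme_inventory.db" (created by sqlite_demo.py) and delete the
# CREATE TABLE / INSERT block below.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        sku TEXT, name TEXT, category TEXT,
        quantity INTEGER, reorder_point INTEGER, unit_price REAL
    )
""")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("SKU-1001", "Hex Bolts", "Hardware", 40, 50, 0.25),     # at/below reorder point
        ("SKU-2001", "USB Cable", "Electronics", 200, 25, 4.99),
    ],
)

# Exercise 3: products at or below their reorder point.
low = conn.execute(
    "SELECT sku, name FROM products WHERE quantity <= reorder_point"
).fetchall()
print(low)
conn.close()
```

The remaining exercises follow the same pattern: write the SELECT, fetch the rows (or load them with pd.read_sql), and inspect the result.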