Case Study 1: Optimizing a Slow API Endpoint
From 2-Second Response to 50ms Through Systematic Profiling and Optimization
Background
DataVault is a mid-stage startup building an analytics platform for e-commerce merchants. Their dashboard API endpoint, /api/dashboard, aggregates sales data, customer metrics, and inventory alerts for a merchant's store. After launching with 50 merchants, the endpoint performed adequately — responses returned in about 300ms. But as the platform grew to 500 merchants with larger product catalogs and more transaction history, the endpoint's response time crept up to 2.1 seconds at the P50 level, with P95 reaching 4.8 seconds.
The product team received complaints from merchants: "The dashboard takes forever to load." The engineering team of three developers needed to fix it quickly. They had no dedicated performance engineer, so they decided to use a systematic measurement-driven approach augmented by AI-assisted analysis.
The Application Architecture
The application was a Flask API with a PostgreSQL database, deployed with Gunicorn (4 workers) behind an Nginx reverse proxy. The dashboard endpoint looked like this:
@app.route("/api/dashboard/<int:merchant_id>")
def get_dashboard(merchant_id: int):
"""Generate merchant dashboard data."""
merchant = Merchant.query.get_or_404(merchant_id)
# Sales metrics
orders = Order.query.filter_by(merchant_id=merchant_id).all()
total_revenue = sum(o.total_amount for o in orders)
orders_today = [o for o in orders if o.created_at.date() == date.today()]
revenue_today = sum(o.total_amount for o in orders_today)
# Product performance
products = Product.query.filter_by(merchant_id=merchant_id).all()
product_metrics = []
for product in products:
reviews = Review.query.filter_by(product_id=product.id).all()
order_items = OrderItem.query.filter_by(product_id=product.id).all()
product_metrics.append({
"id": product.id,
"name": product.name,
"avg_rating": (sum(r.rating for r in reviews) / len(reviews))
if reviews else 0,
"total_sold": sum(oi.quantity for oi in order_items),
"revenue": sum(oi.quantity * oi.unit_price for oi in order_items),
})
# Customer overview
customers = Customer.query.filter_by(merchant_id=merchant_id).all()
customer_data = []
for customer in customers:
customer_orders = Order.query.filter_by(
customer_id=customer.id, merchant_id=merchant_id
).all()
customer_data.append({
"id": customer.id,
"name": customer.name,
"order_count": len(customer_orders),
"total_spent": sum(o.total_amount for o in customer_orders),
})
# Low inventory alerts
low_stock = [p for p in products if p.stock_quantity < p.reorder_threshold]
return jsonify({
"merchant": merchant.name,
"total_revenue": total_revenue,
"revenue_today": revenue_today,
"orders_today": len(orders_today),
"products": sorted(product_metrics,
key=lambda p: p["revenue"], reverse=True)[:20],
"top_customers": sorted(customer_data,
key=lambda c: c["total_spent"],
reverse=True)[:10],
"low_stock_alerts": [{"name": p.name, "stock": p.stock_quantity}
for p in low_stock],
})
The database had these approximate row counts for a typical mid-size merchant:

- Orders: 15,000 rows
- Products: 200 rows
- Reviews: 8,000 rows
- OrderItems: 45,000 rows
- Customers: 3,000 rows
Phase 1: Measurement and Profiling
Step 1: Baseline Measurement
The team first established a reliable baseline. They used a simple timing decorator and ran 50 requests against a staging environment with production-like data:
```python
import statistics
import time

import requests

response_times = []
for _ in range(50):
    start = time.perf_counter()
    response = requests.get(f"{BASE_URL}/api/dashboard/42")
    elapsed = (time.perf_counter() - start) * 1000
    response_times.append(elapsed)

print(f"P50: {statistics.median(response_times):.0f}ms")
# With exactly 50 samples, index 47 approximates P95 and the max stands in for P99
print(f"P95: {sorted(response_times)[47]:.0f}ms")
print(f"P99: {sorted(response_times)[49]:.0f}ms")
```
Results:

- P50: 2,134ms
- P95: 4,823ms
- P99: 6,102ms
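The hardcoded indices in the script above (`[47]`, `[49]`) only work for exactly 50 samples. A sample-size-independent sketch using the stdlib `statistics.quantiles` (the helper name and the latency values here are illustrative, not from the team's code):

```python
import statistics

def percentile(samples: list[float], pct: int) -> float:
    """pct-th percentile (1-99) via linear interpolation, for any sample size."""
    # quantiles(n=100) returns the 99 cut points between percentile buckets
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[pct - 1]

latencies = [120, 130, 150, 180, 2000]  # hypothetical response times in ms
p50 = percentile(latencies, 50)  # 150.0, the median
p95 = percentile(latencies, 95)  # 1636.0, interpolated toward the outlier
```

The `inclusive` method treats the samples as the whole population, which matches how a fixed benchmark run is usually summarized.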
Step 2: cProfile Analysis
The team wrapped the endpoint handler in cProfile:
```python
import cProfile
import pstats
from io import StringIO

def profile_dashboard():
    profiler = cProfile.Profile()
    profiler.enable()
    with app.test_request_context():
        get_dashboard(42)
    profiler.disable()

    stream = StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative")
    stats.print_stats(25)
    return stream.getvalue()
```
The top lines of the profiling output revealed the bottleneck structure:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.008    0.008    2.147    2.147 views.py:12(get_dashboard)
     3201    0.045    0.000    1.423    0.000 query.py:298(all)
     3201    1.156    0.000    1.312    0.000 cursor.py:178(execute)
      200    0.002    0.000    0.534    0.003 query.py:298(all) [reviews]
      200    0.001    0.000    0.489    0.002 query.py:298(all) [order_items]
    15000    0.089    0.000    0.089    0.000 {sum}
     3000    0.003    0.000    0.378    0.000 query.py:298(all) [cust orders]
```
The team shared this profiling output with their AI assistant and asked: "Identify the performance bottlenecks and their root causes."
Step 3: AI-Assisted Analysis
The AI assistant identified three critical issues:
1. N+1 queries on products (534ms + 489ms = 1,023ms): for each of 200 products, the code issued separate queries for reviews and order items. That was 400 extra queries.

2. N+1 queries on customers (378ms): for each of 3,000 customers, the code issued a query for their orders. That was 3,000 extra queries.

3. Loading all orders into Python (~200ms for the query plus ~89ms for in-memory aggregation): all 15,000 orders were loaded into memory just to sum amounts and filter by date. SQL aggregation would be far more efficient.
Total identified waste: approximately 1,690ms out of 2,147ms, representing 79% of execution time.
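The N+1 shape is easy to reproduce in isolation. A minimal stdlib `sqlite3` sketch (the schema and row counts are simplified stand-ins for the real models) that counts queries in both styles:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE reviews (product_id INTEGER, rating INTEGER)")
conn.executemany("INSERT INTO products VALUES (?)", [(i,) for i in range(200)])
conn.executemany("INSERT INTO reviews VALUES (?, ?)",
                 [(i % 200, 5) for i in range(1000)])

query_count = 0
def run(sql, args=()):
    global query_count
    query_count += 1
    return conn.execute(sql, args).fetchall()

# N+1 style: one query to list products, then one query per product
for (pid,) in run("SELECT id FROM products"):
    run("SELECT AVG(rating) FROM reviews WHERE product_id = ?", (pid,))
n_plus_one_queries = query_count  # 1 + 200 = 201

query_count = 0
# JOIN style: a single aggregate query answers the same question
rows = run("""SELECT p.id, AVG(r.rating)
              FROM products p LEFT JOIN reviews r ON r.product_id = p.id
              GROUP BY p.id""")
join_queries = query_count  # 1
```

The per-query network and parsing overhead, not the work inside each query, is what dominates in the N+1 case.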
Phase 2: Systematic Optimization
Optimization 1: Replace N+1 Product Queries with JOINs
Instead of querying reviews and order items for each product individually, the team used SQL aggregation:
```python
from sqlalchemy import desc, func

# Pre-aggregate reviews and order items in separate subqueries. Joining
# both tables directly onto Product would cross-multiply rows (reviews x
# order items per product) and inflate every SUM and AVG.
review_agg = db.session.query(
    Review.product_id,
    func.avg(Review.rating).label("avg_rating"),
).group_by(Review.product_id).subquery()

sales_agg = db.session.query(
    OrderItem.product_id,
    func.sum(OrderItem.quantity).label("total_sold"),
    func.sum(OrderItem.quantity * OrderItem.unit_price).label("revenue"),
).group_by(OrderItem.product_id).subquery()

product_metrics = db.session.query(
    Product.id,
    Product.name,
    func.coalesce(review_agg.c.avg_rating, 0).label("avg_rating"),
    func.coalesce(sales_agg.c.total_sold, 0).label("total_sold"),
    func.coalesce(sales_agg.c.revenue, 0).label("revenue"),
).outerjoin(
    review_agg, review_agg.c.product_id == Product.id
).outerjoin(
    sales_agg, sales_agg.c.product_id == Product.id
).filter(
    Product.merchant_id == merchant_id
).order_by(
    desc("revenue")
).limit(20).all()
```
Result: 400 queries replaced with 1. Product metrics section dropped from 1,023ms to 45ms.
Optimization 2: Replace N+1 Customer Queries with Aggregation
```python
from sqlalchemy import and_, desc, func

top_customers = db.session.query(
    Customer.id,
    Customer.name,
    func.count(Order.id).label("order_count"),
    func.coalesce(func.sum(Order.total_amount), 0).label("total_spent"),
).outerjoin(
    Order, and_(
        Order.customer_id == Customer.id,
        Order.merchant_id == merchant_id,
    )
).filter(
    Customer.merchant_id == merchant_id
).group_by(
    Customer.id, Customer.name
).order_by(
    desc("total_spent")
).limit(10).all()
```
Result: 3,000 queries replaced with 1. Customer section dropped from 378ms to 22ms.
Optimization 3: SQL Aggregation for Revenue Metrics
Instead of loading all 15,000 orders into Python:
```python
from datetime import date

from sqlalchemy import case, func

revenue_stats = db.session.query(
    func.sum(Order.total_amount).label("total_revenue"),
    func.sum(
        case(
            (func.date(Order.created_at) == date.today(), Order.total_amount),
            else_=0,
        )
    ).label("revenue_today"),
    func.sum(
        case(
            (func.date(Order.created_at) == date.today(), 1),
            else_=0,
        )
    ).label("orders_today"),
).filter(
    Order.merchant_id == merchant_id
).first()
```
Result: Loading 15,000 rows replaced with a single aggregate query. Revenue section dropped from 289ms to 8ms.
Intermediate Measurement
After the three query optimizations, the team re-measured:
- P50: 187ms (down from 2,134ms — 11.4x improvement)
- P95: 312ms (down from 4,823ms)
- P99: 498ms (down from 6,102ms)
This was already a huge improvement, but the team wanted to push further.
Optimization 4: Add Database Indexes
Running EXPLAIN ANALYZE on the new queries revealed sequential scans on the orders.merchant_id and order_items.product_id columns:
```sql
CREATE INDEX idx_orders_merchant_id ON orders(merchant_id);
CREATE INDEX idx_orders_customer_merchant ON orders(customer_id, merchant_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);
CREATE INDEX idx_reviews_product_id ON reviews(product_id);
CREATE INDEX idx_products_merchant_id ON products(merchant_id);
CREATE INDEX idx_customers_merchant_id ON customers(merchant_id);
```
Result: Query execution time dropped by approximately 60%. P50 went from 187ms to 78ms.
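The team's PostgreSQL `EXPLAIN ANALYZE` output isn't reproduced here, but the scan-versus-index distinction it revealed can be demonstrated with stdlib `sqlite3` (a different engine, so the plan wording differs, yet the before/after shape is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "merchant_id INTEGER, total_amount REAL)")
conn.executemany("INSERT INTO orders (merchant_id, total_amount) VALUES (?, ?)",
                 [(i % 500, 10.0) for i in range(5000)])

def plan(sql: str) -> str:
    """Concatenate the EXPLAIN QUERY PLAN detail rows for a statement."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT SUM(total_amount) FROM orders WHERE merchant_id = 42"
before = plan(query)   # full table scan over all rows
conn.execute("CREATE INDEX idx_orders_merchant_id ON orders(merchant_id)")
after = plan(query)    # index lookup on merchant_id
print(before)
print(after)
```

The same workflow applies in PostgreSQL: run the plan, add the index the filter column needs, and confirm the plan node changes from a sequential scan to an index scan.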
Optimization 5: Add Response Caching
Dashboard data does not need to be real-time — a few minutes of staleness is acceptable. The team added Redis caching:
```python
import json

import redis

redis_client = redis.Redis(host="localhost", port=6379, db=0)
DASHBOARD_CACHE_TTL = 120  # 2 minutes

@app.route("/api/dashboard/<int:merchant_id>")
def get_dashboard(merchant_id: int):
    cache_key = f"dashboard:{merchant_id}"

    # Check cache
    cached = redis_client.get(cache_key)
    if cached:
        return jsonify(json.loads(cached))

    # Generate dashboard (the optimized version)
    dashboard_data = generate_dashboard_data(merchant_id)

    # Cache the result
    redis_client.setex(cache_key, DASHBOARD_CACHE_TTL,
                       json.dumps(dashboard_data, default=str))
    return jsonify(dashboard_data)
```
Result: Cache hits serve in 2-5ms. Cache misses take 78ms. With a 2-minute TTL and merchants checking their dashboard every few minutes, the cache hit rate was approximately 85%.
Phase 3: Final Results and Validation
Final Performance Numbers
After all five optimizations, the team ran a comprehensive load test with 100 concurrent users:
| Metric | Before | After (Cache Miss) | After (Cache Hit) |
|---|---|---|---|
| P50 | 2,134ms | 78ms | 3ms |
| P95 | 4,823ms | 142ms | 8ms |
| P99 | 6,102ms | 234ms | 15ms |
| Queries per request | ~3,600 | 5 | 0 |
| DB load (CPU) | 89% | 12% | <1% |
Effective mean latency at the 85% cache hit rate: approximately 14ms (0.85 × 3 + 0.15 × 78). Strictly speaking, the P50 of a mostly-cached workload is the 3ms hit latency; 14ms is the expected value per request.
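The ~14ms figure is a weighted average of the hit and miss latencies:

```python
# Weighted average of the cache-hit and cache-miss P50s from the table above.
hit_rate = 0.85
hit_ms, miss_ms = 3, 78
effective_ms = hit_rate * hit_ms + (1 - hit_rate) * miss_ms
print(f"{effective_ms:.2f}ms")  # 14.25ms
```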
Query Count Breakdown
| Component | Before | After |
|---|---|---|
| Merchant lookup | 1 | 1 |
| Revenue aggregation | 1 (loading 15K rows) | 1 (aggregate query) |
| Product metrics | 401 (1 + 200 * 2) | 1 (single query with JOINs) |
| Customer metrics | 3,001 (1 + 3,000) | 1 (single query with JOINs) |
| Low stock check | ~200 | 1 (filter in product query) |
| Total | ~3,604 | 5 |
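The totals in the table follow from simple arithmetic:

```python
# Per-request query counts from the breakdown above.
before = {
    "merchant_lookup": 1,
    "revenue": 1,             # one query loading all 15,000 order rows
    "products": 1 + 200 * 2,  # base query + reviews + order_items per product
    "customers": 1 + 3_000,   # base query + one orders query per customer
    "low_stock": 200,         # approximate per-product attribute loads
}
after_total = 5               # one query per component

before_total = sum(before.values())
print(before_total)  # 3604
```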
Lessons Learned
1. Measurement Was Essential
The team's initial assumption was that "the database is slow." Profiling revealed that the database was fast per-query — the problem was making 3,600 queries instead of 5. Without profiling, they might have spent time optimizing database server configuration instead of fixing the actual bottleneck.
2. The Biggest Wins Came from the Simplest Changes
Replacing N+1 queries with JOINs was a conceptual change, not a technically complex one. Adding indexes required no application code changes at all. These two optimizations — taking perhaps 2 hours of work — delivered a 27x improvement.
3. Caching Was the Final Polish, Not the First Solution
If the team had started with caching, they would have masked the underlying problem. Cache misses would still take 2+ seconds, causing poor experience for the first visitor after cache expiration. By optimizing the queries first, cache misses became fast (78ms), and caching provided additional benefit on top of an already-fast endpoint.
4. AI Analysis Accelerated the Process
Sharing the cProfile output with an AI assistant immediately identified the N+1 pattern and suggested specific solutions. This saved the team from spending hours manually tracing through the profiling data and researching optimization strategies. The AI prompt was specific — it included the profiling data, the data volumes, and the desired outcome — which produced actionable suggestions.
5. Optimization Is Iterative
Each optimization shifted the bottleneck. After fixing queries, indexes became the next target. After indexes, caching was the next opportunity. The team ran the full profiling loop five times. Each iteration was informed by fresh measurements, not assumptions carried over from the previous iteration.
Code: The Final Optimized Endpoint
```python
@app.route("/api/dashboard/<int:merchant_id>")
def get_dashboard(merchant_id: int):
    """Generate merchant dashboard with caching and optimized queries."""
    cache_key = f"dashboard:{merchant_id}"
    cached = redis_client.get(cache_key)
    if cached:
        return jsonify(json.loads(cached))

    merchant = Merchant.query.get_or_404(merchant_id)
    dashboard_data = {
        "merchant": merchant.name,
        **get_revenue_metrics(merchant_id),
        "products": get_top_products(merchant_id),
        "top_customers": get_top_customers(merchant_id),
        "low_stock_alerts": get_low_stock_alerts(merchant_id),
    }
    redis_client.setex(cache_key, 120, json.dumps(dashboard_data, default=str))
    return jsonify(dashboard_data)
```
Each helper function executes a single optimized query, and the entire endpoint completes in 5 queries on a cache miss. The response time dropped from 2,134ms to an effective 14ms — a 152x improvement — through systematic profiling, targeted optimization, and strategic caching.