Case Study 1: Building a Multi-Service Dashboard
Overview
Project: CityPulse -- A real-time city information dashboard that aggregates data from five external APIs into a unified interface.
Team: Two developers (one backend, one frontend) at a startup building location-based services.
Timeline: Three-week sprint from requirements to production deployment.
External Services Integrated:
1. OpenWeatherMap API -- Current weather and forecasts
2. NewsAPI -- Local and national news headlines
3. Geocoding API -- Address-to-coordinate conversion
4. Air Quality API -- Real-time pollution data
5. Currency Exchange API -- Live exchange rates for the city's country
The Challenge
The CityPulse team needed to build a dashboard that shows a comprehensive snapshot of any city in the world. The user types a city name and sees weather, news, air quality, and currency information -- all in under 2 seconds. The challenge was not just calling five APIs; it was doing so reliably, quickly, and without exceeding any service's rate limits.
The team faced several specific technical challenges:
- Latency budget. Five sequential API calls averaging 400ms each would total 2 seconds -- the entire budget. They needed concurrency.
- Mixed reliability. Weather and geocoding APIs had 99.9% uptime. News and air quality APIs were less reliable, occasionally returning errors or timing out.
- Different rate limits. OpenWeatherMap allowed 60 calls/minute on the free tier. NewsAPI allowed 100 calls/day. The air quality API had no documented rate limits but started throttling at roughly 30 calls/minute.
- Inconsistent response formats. Each API returned data in a different structure, requiring normalization before display.
- Cost management. Several APIs charged per call beyond the free tier. Caching was essential to control costs.
Architecture Decisions
Decision 1: Async-First with httpx
The team chose httpx.AsyncClient over requests for all API calls. The async approach was non-negotiable given the latency requirements:
```python
async def fetch_city_data(city: str) -> CityDashboard:
    """Fetch all data for a city concurrently."""
    coords = await geocode_city(city)
    weather_task = fetch_weather(coords.lat, coords.lng)
    news_task = fetch_news(city)
    air_quality_task = fetch_air_quality(coords.lat, coords.lng)
    currency_task = fetch_currency(coords.country_code)
    results = await asyncio.gather(
        weather_task,
        news_task,
        air_quality_task,
        currency_task,
        return_exceptions=True,
    )
    return build_dashboard(city, coords, results)
```
The geocoding call had to happen first (to get coordinates), but the remaining four calls could execute concurrently. This reduced the total latency from ~2000ms to ~600ms (geocoding + longest of the four parallel calls).
Decision 2: Three-Tier Caching Strategy
Not all data needed to be fresh on every request. The team implemented a caching strategy with different time-to-live (TTL) values:
| Data | Cache TTL | Reasoning |
|---|---|---|
| Geocoding results | 30 days | Cities do not move |
| Weather | 10 minutes | Balances freshness with API call volume |
| Air quality | 15 minutes | Updates less frequently than weather |
| News headlines | 30 minutes | Headlines change but not every minute |
| Currency rates | 1 hour | Rates update but not in real-time for display purposes |
The cache was implemented using Redis with automatic expiration:
```python
class CachedAPIService:
    """API service with Redis-backed caching."""

    def __init__(self, redis_client, default_ttl: int = 300):
        self.redis = redis_client
        self.default_ttl = default_ttl

    async def get_or_fetch(
        self,
        cache_key: str,
        fetch_func,
        ttl: int | None = None,
    ) -> dict:
        """Return cached data or fetch from API."""
        cached = await self.redis.get(cache_key)
        if cached:
            return json.loads(cached)
        data = await fetch_func()
        await self.redis.setex(
            cache_key,
            ttl or self.default_ttl,
            json.dumps(data),
        )
        return data
```
This caching layer reduced the total API calls by approximately 85% in production, keeping the team well within free-tier limits for most services.
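The get-or-fetch contract can be exercised end to end with an in-memory stand-in for Redis. Everything below the class is illustrative (the `FakeRedis` class and `fake_weather` fetcher are not part of the CityPulse codebase); it only demonstrates that a second lookup for the same key is served from cache without a second API call:

```python
import asyncio
import json

class FakeRedis:
    """Minimal in-memory stand-in for the async Redis client (sketch only)."""

    def __init__(self):
        self.store = {}

    async def get(self, key):
        return self.store.get(key)

    async def setex(self, key, ttl, value):
        self.store[key] = value  # TTL ignored in this sketch

class CachedAPIService:
    def __init__(self, redis_client, default_ttl: int = 300):
        self.redis = redis_client
        self.default_ttl = default_ttl

    async def get_or_fetch(self, cache_key, fetch_func, ttl=None):
        cached = await self.redis.get(cache_key)
        if cached:
            return json.loads(cached)
        data = await fetch_func()
        await self.redis.setex(cache_key, ttl or self.default_ttl, json.dumps(data))
        return data

calls = 0

async def fake_weather():
    """Stand-in for a real API call; counts how often it is invoked."""
    global calls
    calls += 1
    return {"temp": 21.5}

async def main():
    svc = CachedAPIService(FakeRedis(), default_ttl=600)
    first = await svc.get_or_fetch("weather:52.5:13.4", fake_weather)
    second = await svc.get_or_fetch("weather:52.5:13.4", fake_weather)  # cache hit
    return first, second, calls

result = asyncio.run(main())
```

The second call returns the cached payload, so the underlying fetch runs exactly once.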
Decision 3: Graceful Degradation Per Service
The team classified each API as critical or optional:
- Critical: Geocoding (without it, nothing works) and Weather (primary dashboard content)
- Optional: News, Air Quality, Currency (nice to have but not essential)
For optional services, failures returned a placeholder rather than failing the entire dashboard:
```python
def build_dashboard(
    city: str,
    coords: GeoLocation,
    results: list,
) -> CityDashboard:
    """Build dashboard from API results, handling failures."""
    weather, news, air_quality, currency = results
    dashboard = CityDashboard(city=city, coordinates=coords)

    # Weather is critical: a failure here fails the whole dashboard.
    if isinstance(weather, Exception):
        raise DashboardError(f"Weather data unavailable for {city}")
    dashboard.weather = WeatherData.from_api(weather)

    # Optional services degrade to a placeholder instead of failing.
    if isinstance(news, Exception):
        dashboard.news = NewsData.unavailable(
            "News service temporarily unavailable"
        )
    else:
        dashboard.news = NewsData.from_api(news)

    if isinstance(air_quality, Exception):
        dashboard.air_quality = AirQualityData.unavailable(
            "Air quality data temporarily unavailable"
        )
    else:
        dashboard.air_quality = AirQualityData.from_api(air_quality)

    if isinstance(currency, Exception):
        dashboard.currency = CurrencyData.unavailable(
            "Exchange rates temporarily unavailable"
        )
    else:
        dashboard.currency = CurrencyData.from_api(currency)

    return dashboard
```
Decision 4: Circuit Breakers for Unreliable Services
The air quality API was the least reliable, occasionally going down for 10-15 minutes at a time. The team added a circuit breaker specifically for this service:
```python
air_quality_circuit = CircuitBreaker(
    failure_threshold=3,    # consecutive failures before the circuit opens
    recovery_timeout=60.0,  # seconds before a probe request is allowed
    success_threshold=2,    # probe successes required to close again
)
```
When the circuit opened, the dashboard immediately showed "Air quality data temporarily unavailable" instead of waiting for a timeout on each request. This improved the user experience during outages because the dashboard loaded in 400ms instead of 10+ seconds (the timeout for the failing API call).
Decision 5: Unified Response Models
Each API returned data in a completely different format. The team built Pydantic models to normalize everything:
```python
class WeatherData(BaseModel):
    temperature_celsius: float
    feels_like_celsius: float
    humidity_percent: int
    description: str
    icon_code: str
    wind_speed_mps: float

    @classmethod
    def from_api(cls, raw: dict) -> "WeatherData":
        return cls(
            temperature_celsius=raw["main"]["temp"],
            feels_like_celsius=raw["main"]["feels_like"],
            humidity_percent=raw["main"]["humidity"],
            description=raw["weather"][0]["description"],
            icon_code=raw["weather"][0]["icon"],
            wind_speed_mps=raw["wind"]["speed"],
        )


class AirQualityData(BaseModel):
    aqi: int
    level: str
    dominant_pollutant: str
    available: bool = True
    message: str | None = None

    @classmethod
    def unavailable(cls, message: str) -> "AirQualityData":
        return cls(
            aqi=0,
            level="unknown",
            dominant_pollutant="unknown",
            available=False,
            message=message,
        )
```
This normalization layer was critical. The frontend code never had to know the specific structure of any external API's response. It worked exclusively with the normalized models, which made frontend development faster and the system more maintainable.
Implementation Timeline
Week 1: Foundation
Days 1-2: API exploration and client setup. The team used an AI coding assistant to quickly generate client classes for each API. The AI prompt was: "Create an async Python client for the OpenWeatherMap API using httpx. Include error handling, type hints, and Pydantic response models." The AI produced working clients for all five APIs in about two hours. The team then spent the rest of the day refining the error handling and testing with real API keys.
Days 3-4: Caching layer and concurrency. The Redis caching layer was implemented and the asyncio.gather() pattern was set up. Initial benchmarks showed the dashboard loading in 500-700ms with a cold cache and 30-50ms with a warm cache.
Day 5: Response normalization. Pydantic models were built for each service's response. The AI assistant was helpful here because it could look at the raw API response and generate the Pydantic model with correct field mappings.
Week 2: Resilience
Days 1-2: Circuit breakers and retry logic. Circuit breakers were added for the air quality and news APIs. Retry logic with exponential backoff was added to all API clients.
Days 3-4: Rate limit handling. A rate limiter was implemented to ensure the application never exceeded any API's limits, even under high traffic. For the NewsAPI (100 calls/day), the team implemented aggressive caching with a 30-minute TTL and a fallback to cached data when the daily limit was reached.
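The case study does not show the rate limiter itself. One common shape for a per-service cap such as OpenWeatherMap's 60 calls/minute is an async token bucket; the sketch below is illustrative, not the team's implementation:

```python
import asyncio
import time

class TokenBucket:
    """Sketch of an async token-bucket rate limiter."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens replenished per second
        self.capacity = capacity  # maximum burst size
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    async def acquire(self):
        """Block until a token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Refill based on elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep until roughly one token has been replenished.
            await asyncio.sleep((1 - self.tokens) / self.rate)

async def main():
    limiter = TokenBucket(rate=1.0, capacity=60)  # ~60 calls/minute
    for _ in range(5):
        await limiter.acquire()  # each API call takes one token
    return round(limiter.tokens)

remaining = asyncio.run(main())
```

Each API client would call `acquire()` before issuing a request, so under load the limiter queues calls rather than letting them exceed the provider's cap.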
Day 5: Error handling and logging. Structured logging was added for all API calls, including request timing, response status, cache hits/misses, and circuit breaker state changes.
Week 3: Production Readiness
Days 1-2: Load testing. The team simulated 100 concurrent users requesting data for different cities. The system handled the load without exceeding any rate limits, thanks to the caching layer.
Days 3-4: Monitoring and alerting. A health check endpoint was added that tested each external API's availability. Alerts were configured to notify the team via Slack when any circuit breaker opened.
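The health check endpoint described above can be sketched as a small aggregation over lightweight probes. The probe functions here are stand-ins (the real ones would issue a cheap request to each external API), and the response shape is an assumption:

```python
import asyncio

async def ping_weather():
    """Stand-in probe for the weather API (a real probe would make a cheap request)."""
    return True

async def ping_air_quality():
    """Stand-in probe simulating the flaky air quality API."""
    raise TimeoutError("air quality API not responding")

async def check_service(name, probe, timeout=2.0):
    """Run one probe with a short timeout so the health check itself stays fast."""
    try:
        await asyncio.wait_for(probe(), timeout=timeout)
        return {"service": name, "status": "up"}
    except Exception as exc:
        return {"service": name, "status": "down", "error": str(exc)}

async def health():
    checks = await asyncio.gather(
        check_service("weather", ping_weather),
        check_service("air_quality", ping_air_quality),
    )
    overall = "ok" if all(c["status"] == "up" for c in checks) else "degraded"
    return {"status": overall, "checks": checks}

report = asyncio.run(health())
```

In the deployed application this function would back a FastAPI route, and a "degraded" result for any service would trigger the Slack alert mentioned above.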
Day 5: Documentation and deployment. API key rotation procedures were documented. The application was deployed with all secrets stored in environment variables managed by the deployment platform.
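A minimal settings loader matching that deployment note might read every secret from the environment, never from source control. The variable names below are assumptions for illustration:

```python
import os

class Settings:
    """Sketch: load API keys and connection strings from environment variables."""

    def __init__(self, env=os.environ):
        self.openweathermap_key = env["OPENWEATHERMAP_API_KEY"]  # required
        self.newsapi_key = env["NEWSAPI_KEY"]                    # required
        # Optional values get sensible defaults for local development.
        self.redis_url = env.get("REDIS_URL", "redis://localhost:6379/0")

# Injecting a dict here stands in for the platform-managed environment.
settings = Settings(env={
    "OPENWEATHERMAP_API_KEY": "test-key-1",
    "NEWSAPI_KEY": "test-key-2",
})
```

Failing fast with a `KeyError` when a required key is missing keeps a misconfigured deployment from starting with half its integrations broken.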
Results and Lessons Learned
Performance Metrics
| Metric | Target | Achieved |
|---|---|---|
| Dashboard load time (cold cache) | < 2000ms | 580ms avg |
| Dashboard load time (warm cache) | < 200ms | 35ms avg |
| API calls per day | < 10,000 total | ~2,100 avg |
| Cache hit rate | > 70% | 87% |
| Dashboard availability | > 99% | 99.7% |
Lesson 1: Cache Aggressively, But Transparently
The 87% cache hit rate meant the application made only 2,100 API calls per day instead of the estimated 15,000. This kept costs minimal and prevented rate limit issues. The key was being transparent about data freshness -- each dashboard section showed a "Last updated" timestamp so users knew they might be seeing data that was a few minutes old.
Lesson 2: AI Accelerates Boilerplate, Not Architecture
The AI coding assistant saved hours on generating API clients, Pydantic models, and retry logic. But the architectural decisions -- what to cache and for how long, which services are critical vs. optional, how to handle partial failures -- required human judgment based on understanding the specific use case.
Lesson 3: Rate Limits Are Shared Resources
During development, the team hit the NewsAPI daily limit because multiple developers were testing against the same API key. The solution was simple but important: each developer had their own API key, and the production key was never used in development.
Lesson 4: Test with Real Failures
The circuit breaker was not properly tested until the air quality API actually went down. The initial configuration was too sensitive (failure_threshold=2), causing the circuit to open on occasional slow responses. Adjusting to failure_threshold=3 with a 60-second recovery timeout provided the right balance.
Lesson 5: Normalize at the Boundary
Building the Pydantic normalization layer between external APIs and the rest of the application was one of the best decisions. When the weather API changed their response format slightly (adding a new nested field), only the from_api() classmethod needed updating. The rest of the application was completely unaffected.
Code Architecture Summary
```
citypulse/
├── api_clients/
│   ├── __init__.py
│   ├── base.py            # Base async client with retry logic
│   ├── weather.py         # OpenWeatherMap client
│   ├── news.py            # NewsAPI client
│   ├── air_quality.py     # Air quality client
│   ├── geocoding.py       # Geocoding client
│   └── currency.py        # Currency exchange client
├── models/
│   ├── __init__.py
│   ├── dashboard.py       # CityDashboard model
│   ├── weather.py         # Normalized weather models
│   ├── news.py            # Normalized news models
│   ├── air_quality.py     # Normalized AQI models
│   └── currency.py        # Normalized currency models
├── services/
│   ├── __init__.py
│   ├── cache.py           # Redis caching layer
│   ├── circuit_breaker.py # Circuit breaker implementation
│   └── dashboard.py       # Dashboard aggregation service
├── routes/
│   ├── __init__.py
│   └── dashboard.py       # FastAPI route handlers
├── config.py              # Service configuration
└── main.py                # FastAPI application entry point
```
This case study demonstrates that building a multi-service integration is not just about calling APIs. It requires thoughtful architecture around caching, resilience, normalization, and graceful degradation. The AI coding assistant dramatically accelerated the implementation of well-known patterns, but the strategic decisions about which patterns to apply and how to configure them required human understanding of the specific problem domain.