Case Study 1: Building a Multi-Service Dashboard

Overview

Project: CityPulse -- A real-time city information dashboard that aggregates data from five external APIs into a unified interface.

Team: Two developers (one backend, one frontend) at a startup building location-based services.

Timeline: Three-week sprint from requirements to production deployment.

External Services Integrated:

  1. OpenWeatherMap API -- Current weather and forecasts
  2. NewsAPI -- Local and national news headlines
  3. Geocoding API -- Address-to-coordinate conversion
  4. Air Quality API -- Real-time pollution data
  5. Currency Exchange API -- Live exchange rates for the city's country


The Challenge

The CityPulse team needed to build a dashboard that shows a comprehensive snapshot of any city in the world. The user types a city name and sees weather, news, air quality, and currency information -- all in under 2 seconds. The challenge was not just calling five APIs; it was doing so reliably, quickly, and without exceeding any service's rate limits.

The team faced several specific technical challenges:

  1. Latency budget. Five sequential API calls averaging 400ms each would total 2 seconds -- the entire budget. They needed concurrency.
  2. Mixed reliability. Weather and geocoding APIs had 99.9% uptime. News and air quality APIs were less reliable, occasionally returning errors or timing out.
  3. Different rate limits. OpenWeatherMap allowed 60 calls/minute on the free tier. NewsAPI allowed 100 calls/day. The air quality API had no documented rate limits but started throttling at roughly 30 calls/minute.
  4. Inconsistent response formats. Each API returned data in a different structure, requiring normalization before display.
  5. Cost management. Several APIs charged per call beyond the free tier. Caching was essential to control costs.

Architecture Decisions

Decision 1: Async-First with httpx

The team chose httpx.AsyncClient over requests for all API calls. The async approach was non-negotiable given the latency requirements:

async def fetch_city_data(city: str) -> CityDashboard:
    """Fetch all data for a city concurrently."""
    coords = await geocode_city(city)

    weather_task = fetch_weather(coords.lat, coords.lng)
    news_task = fetch_news(city)
    air_quality_task = fetch_air_quality(coords.lat, coords.lng)
    currency_task = fetch_currency(coords.country_code)

    results = await asyncio.gather(
        weather_task,
        news_task,
        air_quality_task,
        currency_task,
        return_exceptions=True,
    )

    return build_dashboard(city, coords, results)

The geocoding call had to happen first (to get coordinates), but the remaining four calls could execute concurrently. This reduced the total latency from ~2000ms to ~600ms (geocoding + longest of the four parallel calls).

Decision 2: Three-Tier Caching Strategy

Not all data needed to be fresh on every request. The team implemented a caching strategy with different time-to-live (TTL) values:

Data                 Cache TTL     Reasoning
Geocoding results    30 days       Cities do not move
Weather              10 minutes    Balances freshness with API call volume
Air quality          15 minutes    Updates less frequently than weather
News headlines       30 minutes    Headlines change but not every minute
Currency rates       1 hour        Rates update but not in real-time for display purposes
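
The TTL policy above can be centralized in one small mapping so every cache write pulls from the same place. This is a sketch, not the CityPulse source; the key names and the `ttl_for` helper are illustrative:

```python
# Cache TTLs in seconds, one entry per upstream service.
# Key names are illustrative; in CityPulse this would live in config.py.
CACHE_TTLS = {
    "geocoding": 30 * 24 * 60 * 60,  # 30 days -- cities do not move
    "weather": 10 * 60,              # 10 minutes
    "air_quality": 15 * 60,          # 15 minutes
    "news": 30 * 60,                 # 30 minutes
    "currency": 60 * 60,             # 1 hour
}


def ttl_for(service: str, default: int = 300) -> int:
    """Look up a service's cache TTL, falling back to a default."""
    return CACHE_TTLS.get(service, default)
```

Keeping the numbers in one mapping makes it easy to tune a single service's freshness later without hunting through client code.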

The cache was implemented using Redis with automatic expiration:

class CachedAPIService:
    """API service with Redis-backed caching."""

    def __init__(self, redis_client, default_ttl: int = 300):
        self.redis = redis_client
        self.default_ttl = default_ttl

    async def get_or_fetch(
        self,
        cache_key: str,
        fetch_func,
        ttl: int | None = None,
    ) -> dict:
        """Return cached data or fetch from API."""
        cached = await self.redis.get(cache_key)
        if cached:
            return json.loads(cached)

        data = await fetch_func()
        await self.redis.setex(
            cache_key,
            ttl or self.default_ttl,
            json.dumps(data),
        )
        return data

This caching layer reduced the total API calls by approximately 85% in production, keeping the team well within free-tier limits for most services.
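
The get_or_fetch semantics are easy to see without a live Redis by swapping in a minimal in-memory stand-in. FakeRedis below is purely for illustration (it ignores TTL expiry), and the service class is repeated so the example runs standalone:

```python
import asyncio
import json


class FakeRedis:
    """In-memory stand-in for the Redis client (ignores TTL expiry)."""

    def __init__(self):
        self._store = {}

    async def get(self, key):
        return self._store.get(key)

    async def setex(self, key, ttl, value):
        self._store[key] = value


class CachedAPIService:
    """Same shape as the service above, repeated so this runs standalone."""

    def __init__(self, redis_client, default_ttl: int = 300):
        self.redis = redis_client
        self.default_ttl = default_ttl

    async def get_or_fetch(self, cache_key, fetch_func, ttl=None):
        cached = await self.redis.get(cache_key)
        if cached:
            return json.loads(cached)
        data = await fetch_func()
        await self.redis.setex(cache_key, ttl or self.default_ttl, json.dumps(data))
        return data


async def main():
    calls = 0

    async def fetch_weather():
        nonlocal calls
        calls += 1
        return {"temp": 21.5}

    service = CachedAPIService(FakeRedis(), default_ttl=600)
    first = await service.get_or_fetch("weather:london", fetch_weather)
    second = await service.get_or_fetch("weather:london", fetch_weather)
    return first, second, calls


first, second, calls = asyncio.run(main())
# The second lookup is served from the cache: fetch_weather runs only once.
```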

Decision 3: Graceful Degradation Per Service

The team classified each API as critical or optional:

  • Critical: Geocoding (without it, nothing works) and Weather (primary dashboard content)
  • Optional: News, Air Quality, Currency (nice to have but not essential)

For optional services, failures returned a placeholder rather than failing the entire dashboard:

def build_dashboard(
    city: str,
    coords: GeoLocation,
    results: list,
) -> CityDashboard:
    """Build dashboard from API results, handling failures."""
    weather, news, air_quality, currency = results

    dashboard = CityDashboard(city=city, coordinates=coords)

    if isinstance(weather, Exception):
        raise DashboardError(
            f"Weather data unavailable for {city}"
        )
    dashboard.weather = WeatherData.from_api(weather)

    if isinstance(news, Exception):
        dashboard.news = NewsData.unavailable(
            "News service temporarily unavailable"
        )
    else:
        dashboard.news = NewsData.from_api(news)

    if isinstance(air_quality, Exception):
        dashboard.air_quality = AirQualityData.unavailable(
            "Air quality data temporarily unavailable"
        )
    else:
        dashboard.air_quality = AirQualityData.from_api(air_quality)

    if isinstance(currency, Exception):
        dashboard.currency = CurrencyData.unavailable(
            "Exchange rates temporarily unavailable"
        )
    else:
        dashboard.currency = CurrencyData.from_api(currency)

    return dashboard

Decision 4: Circuit Breakers for Unreliable Services

The air quality API was the least reliable, occasionally going down for 10-15 minutes at a time. The team added a circuit breaker specifically for this service:

air_quality_circuit = CircuitBreaker(
    failure_threshold=3,
    recovery_timeout=60.0,
    success_threshold=2,
)

When the circuit opened, the dashboard immediately showed "Air quality data temporarily unavailable" instead of waiting for a timeout on each request. This improved the user experience during outages because the dashboard loaded in 400ms instead of 10+ seconds (the timeout for the failing API call).
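
The CircuitBreaker class itself is not shown in the case study. A minimal sketch matching those three parameters might look like the following (state handling simplified, no locking or async integration):

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker sketch: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=3, recovery_timeout=60.0, success_threshold=2):
        self.failure_threshold = failure_threshold  # failures before opening
        self.recovery_timeout = recovery_timeout    # seconds before probing again
        self.success_threshold = success_threshold  # successes needed to close
        self.state = "closed"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if time.monotonic() - self._opened_at >= self.recovery_timeout:
                self.state = "half_open"  # let one probe request through
                return True
            return False
        return True

    def record_success(self):
        if self.state == "half_open":
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = "closed"
                self._failures = 0
                self._successes = 0
        else:
            self._failures = 0

    def record_failure(self):
        self._failures += 1
        if self.state == "half_open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = time.monotonic()
            self._successes = 0
```

With failure_threshold=3, the third consecutive failure opens the circuit; after recovery_timeout seconds a probe is allowed through, and success_threshold consecutive successes close it again.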

Decision 5: Unified Response Models

Each API returned data in a completely different format. The team built Pydantic models to normalize everything:

class WeatherData(BaseModel):
    temperature_celsius: float
    feels_like_celsius: float
    humidity_percent: int
    description: str
    icon_code: str
    wind_speed_mps: float

    @classmethod
    def from_api(cls, raw: dict) -> "WeatherData":
        return cls(
            temperature_celsius=raw["main"]["temp"],
            feels_like_celsius=raw["main"]["feels_like"],
            humidity_percent=raw["main"]["humidity"],
            description=raw["weather"][0]["description"],
            icon_code=raw["weather"][0]["icon"],
            wind_speed_mps=raw["wind"]["speed"],
        )


class AirQualityData(BaseModel):
    aqi: int
    level: str
    dominant_pollutant: str
    available: bool = True
    message: str | None = None

    @classmethod
    def unavailable(cls, message: str) -> "AirQualityData":
        return cls(
            aqi=0,
            level="unknown",
            dominant_pollutant="unknown",
            available=False,
            message=message,
        )

This normalization layer was critical. The frontend code never had to know the specific structure of any external API's response. It worked exclusively with the normalized models, which made frontend development faster and the system more maintainable.
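
To see the mapping in action, here is the from_api transformation applied to a payload shaped like an OpenWeatherMap current-weather response. This sketch uses a plain dataclass instead of Pydantic so it runs without third-party dependencies; the field mapping is the same as in the model above:

```python
from dataclasses import dataclass


@dataclass
class WeatherData:
    temperature_celsius: float
    feels_like_celsius: float
    humidity_percent: int
    description: str
    icon_code: str
    wind_speed_mps: float

    @classmethod
    def from_api(cls, raw: dict) -> "WeatherData":
        # Same mapping as the Pydantic model; only the base class differs.
        return cls(
            temperature_celsius=raw["main"]["temp"],
            feels_like_celsius=raw["main"]["feels_like"],
            humidity_percent=raw["main"]["humidity"],
            description=raw["weather"][0]["description"],
            icon_code=raw["weather"][0]["icon"],
            wind_speed_mps=raw["wind"]["speed"],
        )


# Sample payload shaped like an OpenWeatherMap current-weather response.
raw = {
    "main": {"temp": 18.2, "feels_like": 17.1, "humidity": 62},
    "weather": [{"description": "scattered clouds", "icon": "03d"}],
    "wind": {"speed": 4.6},
}
weather = WeatherData.from_api(raw)
```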


Implementation Timeline

Week 1: Foundation

Days 1-2: API exploration and client setup. The team used an AI coding assistant to quickly generate client classes for each API. The AI prompt was: "Create an async Python client for the OpenWeatherMap API using httpx. Include error handling, type hints, and Pydantic response models." The AI produced working clients for all five APIs in about two hours; the team spent the remaining time refining the error handling and testing with real API keys.

Days 3-4: Caching layer and concurrency. The Redis caching layer was implemented and the asyncio.gather() pattern was set up. Initial benchmarks showed the dashboard loading in 500-700ms with a cold cache and 30-50ms with a warm cache.

Day 5: Response normalization. Pydantic models were built for each service's response. The AI assistant was helpful here because it could look at the raw API response and generate the Pydantic model with correct field mappings.

Week 2: Resilience

Days 1-2: Circuit breakers and retry logic. Circuit breakers were added for the air quality and news APIs. Retry logic with exponential backoff was added to all API clients.
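
The retry helper is not shown in the case study; a typical exponential-backoff wrapper for the async clients might look like this sketch (delay values, jitter, and the caught exception type are illustrative choices):

```python
import asyncio
import random


async def fetch_with_retry(fetch_func, max_attempts=3, base_delay=0.5):
    """Retry an async call with exponential backoff plus a little jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await fetch_func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # base_delay, 2x, 4x, ... plus up to 100ms of jitter
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)


# Demo: a flaky call that fails twice, then succeeds on the third attempt.
attempts = 0


async def flaky():
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = asyncio.run(fetch_with_retry(flaky, max_attempts=3, base_delay=0.01))
```

In production code the bare `except Exception` would usually be narrowed to transport-level errors so that client bugs are not silently retried.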

Days 3-4: Rate limit handling. A rate limiter was implemented to ensure the application never exceeded any API's limits, even under high traffic. For the NewsAPI (100 calls/day), the team implemented aggressive caching with a 30-minute TTL and a fallback to cached data when the daily limit was reached.

Day 5: Error handling and logging. Structured logging was added for all API calls, including request timing, response status, cache hits/misses, and circuit breaker state changes.

Week 3: Production Readiness

Days 1-2: Load testing. The team simulated 100 concurrent users requesting data for different cities. The system handled the load without exceeding any rate limits, thanks to the caching layer.

Days 3-4: Monitoring and alerting. A health check endpoint was added that tested each external API's availability. Alerts were configured to notify the team via Slack when any circuit breaker opened.
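
The health check aggregated one lightweight probe per external API. A sketch of that aggregation, with stand-in probe functions (the real endpoint would wrap this in a FastAPI route and probe the live services):

```python
import asyncio


async def check_health(probes: dict) -> dict:
    """Run one availability probe per external API, concurrently."""
    names = list(probes)
    results = await asyncio.gather(
        *(probes[name]() for name in names),
        return_exceptions=True,
    )
    # A probe that raises marks its service as down.
    return {
        name: "down" if isinstance(result, Exception) else "up"
        for name, result in zip(names, results)
    }


# Demo with stand-in probes.
async def ok_probe():
    return True


async def failing_probe():
    raise ConnectionError("service unreachable")


status = asyncio.run(check_health({"weather": ok_probe, "air_quality": failing_probe}))
```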

Day 5: Documentation and deployment. API key rotation procedures were documented. The application was deployed with all secrets stored in environment variables managed by the deployment platform.


Results and Lessons Learned

Performance Metrics

Metric                              Target           Achieved
Dashboard load time (cold cache)    < 2000ms         580ms avg
Dashboard load time (warm cache)    < 200ms          35ms avg
API calls per day                   < 10,000 total   ~2,100 avg
Cache hit rate                      > 70%            87%
Dashboard availability              > 99%            99.7%

Lesson 1: Cache Aggressively, But Transparently

The 87% cache hit rate meant the application made only 2,100 API calls per day instead of the estimated 15,000. This kept costs minimal and prevented rate limit issues. The key was being transparent about data freshness -- each dashboard section showed a "Last updated" timestamp so users knew they might be seeing data that was a few minutes old.
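
One way to surface that freshness is to store a fetched_at timestamp alongside the cached payload, so the UI can render the "Last updated" label from the cache entry itself. This envelope shape is an assumption, not the CityPulse source:

```python
import json
import time


def wrap_for_cache(data: dict) -> str:
    """Envelope a payload with the time it was fetched, for caching."""
    return json.dumps({"fetched_at": time.time(), "data": data})


def unwrap_from_cache(raw: str) -> tuple[dict, float]:
    """Return the cached payload and its age in seconds."""
    envelope = json.loads(raw)
    age = time.time() - envelope["fetched_at"]
    return envelope["data"], age


cached = wrap_for_cache({"temp": 21.5})
data, age_seconds = unwrap_from_cache(cached)
```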

Lesson 2: AI Accelerates Boilerplate, Not Architecture

The AI coding assistant saved hours on generating API clients, Pydantic models, and retry logic. But the architectural decisions -- what to cache and for how long, which services are critical vs. optional, how to handle partial failures -- required human judgment based on understanding the specific use case.

Lesson 3: Rate Limits Are Shared Resources

During development, the team hit the NewsAPI daily limit because multiple developers were testing against the same API key. The solution was simple but important: each developer had their own API key, and the production key was never used in development.

Lesson 4: Test with Real Failures

The circuit breaker was not properly tested until the air quality API actually went down. The initial configuration was too sensitive (failure_threshold=2), causing the circuit to open on occasional slow responses. Adjusting to failure_threshold=3 with a 60-second recovery timeout provided the right balance.

Lesson 5: Normalize at the Boundary

Building the Pydantic normalization layer between external APIs and the rest of the application was one of the best decisions. When the weather API changed their response format slightly (adding a new nested field), only the from_api() classmethod needed updating. The rest of the application was completely unaffected.


Code Architecture Summary

citypulse/
├── api_clients/
│   ├── __init__.py
│   ├── base.py               # Base async client with retry logic
│   ├── weather.py            # OpenWeatherMap client
│   ├── news.py               # NewsAPI client
│   ├── air_quality.py        # Air quality client
│   ├── geocoding.py          # Geocoding client
│   └── currency.py           # Currency exchange client
├── models/
│   ├── __init__.py
│   ├── dashboard.py          # CityDashboard model
│   ├── weather.py            # Normalized weather models
│   ├── news.py               # Normalized news models
│   ├── air_quality.py        # Normalized AQI models
│   └── currency.py           # Normalized currency models
├── services/
│   ├── __init__.py
│   ├── cache.py              # Redis caching layer
│   ├── circuit_breaker.py    # Circuit breaker implementation
│   └── dashboard.py          # Dashboard aggregation service
├── routes/
│   ├── __init__.py
│   └── dashboard.py          # FastAPI route handlers
├── config.py                 # Service configuration
└── main.py                   # FastAPI application entry point

This case study demonstrates that building a multi-service integration is not just about calling APIs. It requires thoughtful architecture around caching, resilience, normalization, and graceful degradation. The AI coding assistant dramatically accelerated the implementation of well-known patterns, but the strategic decisions about which patterns to apply and how to configure them required human understanding of the specific problem domain.