> "Sitting on your own data while the world's data streams past you is like running a business with blinders on."

Chapter 21: Working with APIs and External Data Services


Opening Scenario: The Monday Morning Gap

Priya Okonkwo has built something real. Over the past several months she's automated Acme Corp's weekly sales reports, cleaned their messy data, and produced visualizations that Sandra Chen actually uses in her executive presentations. The tools live on her laptop. The data lives in CSV files.

Then Sandra asks a question Priya can't answer: "Our biggest competitor just hired a VP of eCommerce. What does that mean for our pricing?"

Priya's internal data says nothing about competitors. It says nothing about exchange rates for Acme's growing international accounts. It says nothing about economic signals, industry news, or the world outside Acme's own spreadsheets.

This is the gap that APIs fill.

An API (Application Programming Interface) is a structured way to request data and services from an external system over the internet. Thousands of organizations — governments, financial data providers, news outlets, weather services, logistics companies — publish their data through APIs. Your Python code can call those APIs the same way it calls a function in your own program.

After this chapter, Priya's toolkit will reach beyond Acme's internal data. Yours will too.


21.1 What APIs Are and Why They Matter

The word "API" is used loosely in software to mean many things. In this chapter, we mean specifically web APIs: services you call over the internet using HTTP requests.

The conceptual model is simple:

  1. Your Python program sends an HTTP request to a URL (the endpoint)
  2. The server processes your request and sends back an HTTP response
  3. The response contains data — almost always formatted as JSON
  4. Your program parses that JSON and uses the data

This is exactly how a web browser works, except instead of a human clicking links, your code is making the requests programmatically and processing the results automatically.

Why APIs Matter for Business Work

Access to real-time data. Financial markets, exchange rates, weather, shipping rates: this data changes minute to minute. No CSV file stays current. An API call returns the latest data the provider has published.

Access to data you couldn't collect yourself. Company profiles, news archives, demographic data, commodity prices — organizations that specialize in collecting this data sell or share it through APIs. Buying API access is far cheaper than collecting the data yourself.

Automation. Pulling data manually from a website is a job. Pulling it via API takes a few lines of Python. Once the code exists, every future data pull takes almost no effort.

Integration. Modern business software — CRMs, accounting systems, project tools — exposes APIs. Your Python code can read from your CRM, push data to your accounting system, and trigger actions in your project management tool.

The Business API Ecosystem

A few categories worth knowing:

  • Financial data: Alpha Vantage, Yahoo Finance, Quandl (now Nasdaq Data Link), FRED (Federal Reserve)
  • Currency exchange: Open Exchange Rates, ExchangeRate-API, Fixer.io
  • Weather and geography: OpenWeatherMap, Open-Meteo, OpenStreetMap
  • Company and business data: Clearbit, OpenCorporates, Companies House (UK)
  • News: NewsAPI, The Guardian API, New York Times API
  • Government and public data: US Census Bureau, Bureau of Labor Statistics, World Bank
  • E-commerce and shipping: Shopify, FedEx, UPS, USPS APIs

Many of these have free tiers adequate for business analysis. We'll use several throughout this chapter.


21.2 REST API Concepts

Most web APIs you'll encounter are REST APIs (Representational State Transfer). REST isn't a protocol or a standard — it's an architectural style with a few defining characteristics:

Stateless: Each request contains all the information the server needs. The server doesn't remember previous requests from your program.

Resource-based: Data is organized into "resources" (customers, orders, products) and you interact with them through URLs.

Standard HTTP methods: The same HTTP verbs used by browsers are used for API operations.

HTTP Methods

| Method | What It Does | Business Example |
|--------|--------------|------------------|
| GET | Retrieve data | Fetch a list of orders |
| POST | Create new data | Submit a new customer record |
| PUT | Replace existing data | Update an entire order record |
| PATCH | Partial update | Change only the status field of an order |
| DELETE | Remove data | Delete a draft record |

For business data analysis, you'll use GET 90% of the time. POST is useful when submitting data to services (like sending emails or triggering reports). PUT, PATCH, and DELETE become relevant when you're building integrations that write back to systems.

Endpoints

An endpoint is a specific URL that represents a resource or operation. Examples:

GET  https://api.example.com/v1/products
GET  https://api.example.com/v1/products/42
GET  https://api.example.com/v1/orders?status=pending&region=west
POST https://api.example.com/v1/orders

Notice the structure: base URL, version (v1), resource name, optional resource ID, optional query parameters.
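
To make that structure concrete, here is a minimal sketch that splits one of the example URLs above into its parts using the standard library's urllib.parse module. The api.example.com address is the chapter's placeholder, not a real service, and the variable names are only illustrative.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://api.example.com/v1/orders?status=pending&region=west"
parts = urlsplit(url)

base_url = f"{parts.scheme}://{parts.netloc}"      # scheme + host
path_segments = parts.path.strip("/").split("/")   # ["v1", "orders"]
version, resource = path_segments[0], path_segments[1]
query = parse_qs(parts.query)                      # each value is a list

print(base_url)   # https://api.example.com
print(version)    # v1
print(resource)   # orders
print(query)      # {'status': ['pending'], 'region': ['west']}
```

parse_qs returns lists because a query string may repeat the same key more than once.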

HTTP Status Codes

The server's response always includes a status code — a three-digit number that tells you whether the request succeeded or failed, and why.

| Code | Meaning | Your Action |
|------|---------|-------------|
| 200 | OK: request succeeded | Use the data |
| 201 | Created: resource was created | Success for POST requests |
| 400 | Bad Request: your request was malformed | Check your parameters |
| 401 | Unauthorized: authentication failed | Check your API key |
| 403 | Forbidden: authenticated but no permission | Check your account tier |
| 404 | Not Found: resource doesn't exist | Check the URL and resource ID |
| 429 | Too Many Requests: rate limited | Wait and retry |
| 500 | Internal Server Error: server problem | Retry later |
| 503 | Service Unavailable: server down | Retry later |

Your code must handle these status codes. Assuming every response is a 200 is how you end up processing error messages as if they were data.
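
One way to act on the table above is a small lookup function that turns a status code into a next step. This is a sketch: action_for_status and its return labels are hypothetical names chosen for illustration, and a real program would branch on whatever categories matter to it.

```python
def action_for_status(status_code):
    """Map an HTTP status code to a coarse next step (hypothetical helper)."""
    if 200 <= status_code < 300:
        return "use_data"
    if status_code == 429:
        return "wait_and_retry"
    if status_code in (401, 403):
        return "check_credentials"
    if 400 <= status_code < 500:
        return "fix_request"
    if 500 <= status_code < 600:
        return "retry_later"
    return "unexpected"


for code in (200, 201, 400, 401, 404, 429, 503):
    print(code, action_for_status(code))
```

The 429 check comes before the general 4xx check because it is the one client error where retrying unchanged can succeed.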


21.3 The requests Library

Python's built-in urllib can make HTTP requests, but it's verbose and awkward. The requests library is the standard tool for any Python developer working with APIs.

Install it:

pip install requests

Your First API Call

Let's start with the simplest possible example — calling a public API that requires no authentication:

import requests

# Call the Open-Meteo weather API (completely free, no API key required)
# Fetching weather for Chicago (Acme Corp's headquarters city)
response = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 41.8781,
        "longitude": -87.6298,
        "current_weather": True,
    }
)

print(f"Status code: {response.status_code}")
print(f"Response data: {response.json()}")

Run that and you'll see a 200 status code and a JSON dictionary with Chicago's current weather.

Let's unpack what happened:

  1. requests.get(url, params=...) sent an HTTP GET request to that URL with query parameters appended: ?latitude=41.8781&longitude=-87.6298&current_weather=True (requests converts the Python value True to the string "True")
  2. The server returned an HTTP response
  3. response.status_code gives us the numeric status code
  4. response.json() parses the response body as JSON and returns a Python dictionary

The Response Object

The response object from any requests call contains everything the server sent back:

response = requests.get("https://api.open-meteo.com/v1/forecast", params={...})

# Status code
print(response.status_code)        # 200

# Response body as text
print(response.text)               # Raw JSON string

# Response body parsed as Python dictionary/list
data = response.json()             # Python dict — use this for data work

# Response headers (metadata about the response)
print(response.headers)            # dict-like object

# Whether the request succeeded (True for 2xx status codes)
print(response.ok)                 # True or False

# Raise an exception if the status code indicates an error
response.raise_for_status()        # Raises HTTPError for 4xx, 5xx

requests.get() — Detailed Reference

import requests

response = requests.get(
    url="https://api.example.com/data",

    # Query parameters — appended to URL as ?key=value&key2=value2
    params={
        "start_date": "2024-01-01",
        "end_date": "2024-12-31",
        "format": "json",
    },

    # HTTP headers — metadata sent with the request
    headers={
        "Authorization": "Bearer your_token_here",
        "Accept": "application/json",
        "User-Agent": "AcmeCorp-Analytics/1.0",
    },

    # Timeout in seconds — ALWAYS set this in production code
    timeout=30,
)

The params dictionary is the cleanest way to handle query parameters. requests handles URL encoding automatically — you don't need to worry about escaping special characters or formatting the URL string.
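
To see what that encoding involves, the sketch below builds a query string by hand with urlencode from the standard library. It illustrates the effect rather than how requests is implemented internally; the parameter names and the api.example.com URL are placeholders.

```python
from urllib.parse import urlencode

params = {
    "q": "office supplies & furniture",   # space and ampersand need escaping
    "start_date": "2024-01-01",
    "include_drafts": True,               # non-string values are converted with str()
}

query_string = urlencode(params)
print(query_string)
# q=office+supplies+%26+furniture&start_date=2024-01-01&include_drafts=True

full_url = f"https://api.example.com/data?{query_string}"
print(full_url)
```

Without escaping, the ampersand inside the search phrase would be read as the start of a new parameter. Note also that True becomes "True" with a capital T; if an API insists on lowercase, pass the string "true" instead.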

requests.post() — Sending Data

When you need to create a resource or send data to an API:

import requests

new_record = {
    "customer_id": "CUST-4821",
    "contact_email": "buyer@client-corp.com",
    "deal_value": 45000,
    "stage": "Proposal",
}

response = requests.post(
    url="https://api.crm-example.com/v1/deals",
    headers={
        "Authorization": "Bearer your_token",
        "Content-Type": "application/json",
    },
    # Send Python dict as JSON body
    json=new_record,  # requests serializes this automatically
    timeout=30,
)

print(response.status_code)  # 201 if created successfully
created_deal = response.json()
print(f"Created deal ID: {created_deal['id']}")

21.4 Query Parameters and Headers

Query Parameters in Depth

Query parameters filter, sort, and configure what an API returns. Every API documents its own parameter names — read the documentation for each API you use.

Common patterns:

# Date range filtering
params = {
    "from": "2024-01-01",
    "to": "2024-12-31",
}

# Pagination: an API uses one style or the other, not both
params = {
    "page": 1,        # page-based style
    "per_page": 100,
}
params = {
    "offset": 0,      # offset-based style
    "limit": 50,
}

# Field selection — only return the fields you need
params = {
    "fields": "id,name,price,category",
}

# Sorting
params = {
    "sort_by": "date",
    "sort_order": "desc",
}

# Search
params = {
    "q": "office supplies",
    "category": "furniture",
}

Headers in Depth

HTTP headers carry metadata about a request or response, separate from the URL and the body. On the request side, you'll use headers for three main purposes:

Authentication (covered in section 21.5):

# An API normally expects one of these, not both; check its docs
headers = {
    "Authorization": "Bearer eyJhbGciOiJ...",
    "X-API-Key": "your-api-key",
}

Content negotiation — telling the server what format you want back:

headers = {
    "Accept": "application/json",
    "Accept-Language": "en-US",
}

Identifying your application — good practice and sometimes required:

headers = {
    "User-Agent": "AcmeCorp-Analytics/1.0 (priya@acme.com)",
}

21.5 Authentication

Most useful APIs require authentication. The API needs to know who you are to enforce rate limits, track usage, and control access. There are several authentication patterns you'll encounter.

API Keys in Headers

The most common pattern for data APIs:

import requests

api_key = "your_api_key_here"

response = requests.get(
    "https://newsapi.org/v2/everything",
    headers={
        "X-Api-Key": api_key,  # Header name varies by API — check docs
    },
    params={
        "q": "office supplies industry",
        "from": "2024-01-01",
        "language": "en",
    },
    timeout=30,
)

API Keys in Query Parameters

Some APIs accept the key as a URL parameter:

response = requests.get(
    "https://www.alphavantage.co/query",
    params={
        "function": "OVERVIEW",
        "symbol": "SPLS",
        "apikey": "your_api_key",  # Key in params
    },
    timeout=30,
)

Bearer Token Authentication

Bearer tokens are commonly used with OAuth 2.0 systems (like connecting to Google, Salesforce, or Microsoft APIs):

access_token = "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."

response = requests.get(
    "https://api.service.com/v1/data",
    headers={
        "Authorization": f"Bearer {access_token}",
    },
    timeout=30,
)

HTTP Basic Authentication

Older APIs sometimes use username/password:

response = requests.get(
    "https://api.legacy-service.com/data",
    auth=("your_username", "your_password"),
    timeout=30,
)

requests handles the base64 encoding required for basic auth automatically.
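
For the curious, the sketch below shows roughly what that encoding produces, using only the standard library. basic_auth_header is a hypothetical helper written for illustration; in practice the auth= argument does this for you.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return f"Basic {encoded}"


header_value = basic_auth_header("your_username", "your_password")
print(header_value)

# Base64 is an encoding, not encryption: anyone can reverse it
decoded = base64.b64decode(header_value.split(" ", 1)[1]).decode("utf-8")
print(decoded)  # your_username:your_password
```

Because the header can be decoded this easily, basic auth should only ever be sent over HTTPS.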


21.6 Storing API Credentials Safely

Never hardcode API keys in your scripts. This is the single most important security rule in this chapter. API keys embedded in code get committed to version control, shared in emails, and posted to forums. Even if you only share the script internally, rotating a compromised key means finding every copy.

The standard solution is environment variables: key-value pairs that live in the operating system's environment, not in your code.

Setting Environment Variables

Windows (Command Prompt, permanent):

setx ALPHA_VANTAGE_API_KEY "your_key_here"
setx NEWS_API_KEY "your_key_here"
setx EXCHANGE_RATE_API_KEY "your_key_here"

Windows (PowerShell, permanent):

[Environment]::SetEnvironmentVariable("ALPHA_VANTAGE_API_KEY", "your_key_here", "User")

macOS/Linux (add to ~/.bashrc or ~/.zshrc):

export ALPHA_VANTAGE_API_KEY="your_key_here"
export NEWS_API_KEY="your_key_here"

Using a .env File with python-dotenv

For development, a .env file is more convenient than system environment variables:

pip install python-dotenv

Create a .env file in your project directory:

# .env — NEVER commit this file to version control
ALPHA_VANTAGE_API_KEY=your_key_here
NEWS_API_KEY=your_key_here
EXCHANGE_RATE_API_KEY=your_key_here

Add .env to your .gitignore:

.env
*.env

Load it in your Python code:

import os
from dotenv import load_dotenv

# Load .env file into os.environ
load_dotenv()

# Now access credentials from environment
alpha_vantage_key = os.environ.get("ALPHA_VANTAGE_API_KEY")
news_api_key = os.environ.get("NEWS_API_KEY")

if not alpha_vantage_key:
    raise ValueError(
        "ALPHA_VANTAGE_API_KEY not found in environment. "
        "Set it in your .env file or system environment."
    )

The os.environ.get("KEY_NAME") pattern returns None if the variable doesn't exist, rather than raising an exception. Checking explicitly and raising a descriptive error is better than letting the program fail mysteriously later.
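
If you load several keys, the check can be wrapped in a small helper. The sketch below is one possible shape: require_env and mask_secret are hypothetical names, and DEMO_API_KEY is a made-up variable set in-process purely so the example runs.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or system environment."
        )
    return value


def mask_secret(secret, visible=4):
    """Show only the last few characters, for safe logging."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demo only: a made-up variable name and value
os.environ["DEMO_API_KEY"] = "abc123def456"
key = require_env("DEMO_API_KEY")
print(mask_secret(key))  # ********f456
```

Masking matters because log files tend to be shared more freely than source code, so a key printed in full can leak that way.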


21.7 Working with JSON Responses

JSON (JavaScript Object Notation) is the lingua franca of web APIs. It maps directly to Python data types:

| JSON | Python |
|------|--------|
| object { } | dict |
| array [ ] | list |
| string "..." | str |
| number | int or float |
| true / false | True / False |
| null | None |
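
The mapping is easy to confirm with the standard library's json module. The sample document below is invented for illustration.

```python
import json

raw = (
    '{"name": "Acme Corp", "employees": 250, "margin": 0.18, '
    '"public": false, "ticker": null, "regions": ["west", "east"]}'
)
data = json.loads(raw)

# Print each value alongside the Python type it was parsed into
for key, value in data.items():
    print(f"{key}: {value!r} ({type(value).__name__})")

# Going the other way: Python objects back to a JSON string
round_trip = json.dumps({"active": True, "parent": None})
print(round_trip)  # {"active": true, "parent": null}
```

response.json() in requests performs the same kind of parsing on the response body.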

Real API responses are nested. Here's a realistic example from a financial data API:

response = requests.get(
    "https://www.alphavantage.co/query",
    params={
        "function": "OVERVIEW",
        "symbol": "AAPL",
        "apikey": os.environ.get("ALPHA_VANTAGE_API_KEY"),
    },
    timeout=30,
)

data = response.json()
# data is a large nested dictionary

# Access top-level fields directly
company_name = data["Name"]             # "Apple Inc"
industry = data["Industry"]             # "Electronic Computers"
market_cap = data["MarketCapitalization"] # "2800000000000"
pe_ratio = data["PERatio"]              # "28.5"

print(f"{company_name}: {industry}, P/E: {pe_ratio}")

A more complex nested example — currency exchange rates:

# ExchangeRate-API response structure (keyed v6 endpoint):
# {
#     "result": "success",
#     "base_code": "USD",
#     "conversion_rates": {
#         "EUR": 0.9234,
#         "GBP": 0.7891,
#         "JPY": 149.82,
#         "CAD": 1.3621,
#         ...
#     }
# }

response = requests.get(
    f"https://v6.exchangerate-api.com/v6/{api_key}/latest/USD",
    timeout=30,
)

data = response.json()

# Check if the API returned success
if data["result"] != "success":
    raise RuntimeError(f"API returned error: {data.get('error-type', 'unknown')}")

exchange_rates = data["conversion_rates"]  # This is a dict: {"EUR": 0.9234, ...}

# Get specific rates
eur_rate = exchange_rates["EUR"]
gbp_rate = exchange_rates["GBP"]
cad_rate = exchange_rates.get("CAD")  # .get() returns None instead of raising KeyError

print(f"1 USD = {eur_rate:.4f} EUR")
print(f"1 USD = {gbp_rate:.4f} GBP")

Handling Missing Fields Defensively

Not every response will contain every field you expect. Use .get() with defaults:

# Risky — raises KeyError if "revenue" is missing
revenue = data["financials"]["annual"]["revenue"]

# Safe — returns None if any level is missing
financials = data.get("financials", {})
annual = financials.get("annual", {})
revenue = annual.get("revenue")

# Or use a helper function for deeply nested access
def safe_get(d, *keys, default=None):
    """Safely navigate nested dictionaries."""
    for key in keys:
        # Stop as soon as a level is missing or isn't a dictionary
        if not isinstance(d, dict) or key not in d:
            return default
        d = d[key]
    return d

revenue = safe_get(data, "financials", "annual", "revenue")

Processing Lists of Records

Many API responses return lists of objects — one item per record:

# A typical paginated list response:
# {
#     "status": "ok",
#     "totalResults": 847,
#     "articles": [
#         {"title": "...", "publishedAt": "...", "source": {...}},
#         {"title": "...", "publishedAt": "...", "source": {...}},
#         ...
#     ]
# }

response = requests.get(
    "https://newsapi.org/v2/everything",
    headers={"X-Api-Key": news_api_key},
    params={
        "q": "office supplies wholesale distribution",
        "language": "en",
        "sortBy": "publishedAt",
        "pageSize": 20,
    },
    timeout=30,
)

data = response.json()
articles = data.get("articles", [])

for article in articles:
    title = article.get("title", "No title")
    source_name = article.get("source", {}).get("name", "Unknown source")
    published = (article.get("publishedAt") or "")[:10]  # Just the date part; "or" also covers JSON null
    url = article.get("url", "")

    print(f"[{published}] {source_name}: {title}")
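
Since much of Priya's existing tooling reads CSV files, a natural next step is flattening nested records into rows. The sketch below assumes records shaped like the articles above; the sample values and the flatten_article helper are invented for illustration. It uses "or" rather than a .get() default because a field that is present but null in the JSON comes back as None, which a .get() default does not replace.

```python
import csv
import io

# Sample records shaped like the articles above (made-up values)
articles = [
    {"title": "Paper prices rise", "publishedAt": "2024-03-01T09:30:00Z",
     "source": {"id": None, "name": "Trade Weekly"}, "url": "https://example.com/a"},
    {"title": None, "publishedAt": None, "source": None, "url": None},
]


def flatten_article(article):
    """Turn one nested article dict into a flat row with safe defaults."""
    source = article.get("source") or {}
    return {
        "date": (article.get("publishedAt") or "")[:10],
        "source": source.get("name") or "Unknown source",
        "title": article.get("title") or "No title",
        "url": article.get("url") or "",
    }


rows = [flatten_article(a) for a in articles]

# Swap the buffer for open("articles.csv", "w", newline="") to save a file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "source", "title", "url"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```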

21.8 Pagination

APIs rarely return all results at once. If a database has 50,000 orders and you request them all, the response would be enormous and slow, and building it could overload the server. Instead, APIs use pagination: they return results in pages, and you make multiple requests to get all the data.

Page-Based Pagination

The most common pattern — request page 1, page 2, and so on:

import requests
import time

def fetch_all_news_articles(api_key, query, max_pages=10):
    """
    Fetch all matching news articles, handling pagination automatically.

    Returns a list of all article dictionaries.
    """
    all_articles = []
    page_number = 1
    page_size = 100  # Max per page for this API

    while page_number <= max_pages:
        response = requests.get(
            "https://newsapi.org/v2/everything",
            headers={"X-Api-Key": api_key},
            params={
                "q": query,
                "language": "en",
                "sortBy": "publishedAt",
                "page": page_number,
                "pageSize": page_size,
            },
            timeout=30,
        )
        response.raise_for_status()
        data = response.json()

        articles_this_page = data.get("articles", [])

        # If this page is empty, we've retrieved everything
        if not articles_this_page:
            break

        all_articles.extend(articles_this_page)

        # Check if we've retrieved all available results
        total_results = data.get("totalResults", 0)
        if len(all_articles) >= total_results:
            break

        page_number += 1

        # Polite delay between requests — more on this in 21.9
        time.sleep(0.5)

    return all_articles

# Usage
industry_articles = fetch_all_news_articles(
    api_key=news_api_key,
    query="office supplies distribution",
    max_pages=5,
)
print(f"Retrieved {len(industry_articles)} articles")

Cursor-Based Pagination

Some APIs use cursors instead of page numbers — each response includes a "cursor" token pointing to the next page:

def fetch_with_cursor_pagination(base_url, headers, params, cursor_field="next_cursor"):
    """
    Fetch all results using cursor-based pagination.
    """
    all_results = []
    cursor = None
    params = dict(params or {})  # work on a copy so the caller's dict isn't modified

    while True:
        if cursor:
            params["cursor"] = cursor

        response = requests.get(base_url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()

        results = data.get("results", data.get("data", []))
        all_results.extend(results)

        # Get next cursor — if None, we're done
        cursor = data.get(cursor_field)
        if not cursor:
            break

        time.sleep(0.25)

    return all_results

Offset-Based Pagination

Some APIs use offset and limit parameters:

def fetch_with_offset_pagination(base_url, headers, params, batch_size=100):
    """
    Fetch all results using offset/limit pagination.
    """
    all_results = []
    offset = 0
    params = dict(params or {})  # work on a copy so the caller's dict isn't modified

    while True:
        params.update({"limit": batch_size, "offset": offset})

        response = requests.get(base_url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()

        batch = data.get("results", [])
        all_results.extend(batch)

        # If we got fewer results than requested, we've reached the end
        if len(batch) < batch_size:
            break

        offset += batch_size
        time.sleep(0.25)

    return all_results
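
The stopping logic in these loops is easier to test when it is separated from the network call. The sketch below is one way to do that with a generator: paginate and fake_fetch_page are hypothetical names, and the "API" is just a list held in memory so the example runs offline.

```python
def paginate(fetch_page, page_size, max_pages=100):
    """
    Yield records one at a time from a page-based source.

    fetch_page(page_number, page_size) must return a list of records;
    an empty or short list signals the last page.
    """
    for page_number in range(1, max_pages + 1):
        batch = fetch_page(page_number, page_size)
        yield from batch
        if len(batch) < page_size:
            break


# Stand-in for a real API: 250 fake orders held in memory
FAKE_ORDERS = [{"order_id": n} for n in range(1, 251)]
calls = []


def fake_fetch_page(page_number, page_size):
    calls.append(page_number)  # record which pages were requested
    start = (page_number - 1) * page_size
    return FAKE_ORDERS[start:start + page_size]


orders = list(paginate(fake_fetch_page, page_size=100))
print(len(orders), calls)  # 250 [1, 2, 3]
```

To use it against a real service, fetch_page would wrap a requests.get call like the ones above. The max_pages cap guards against an endless loop if an API never returns a short page.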

21.9 Rate Limiting and Retry Logic

APIs enforce rate limits: maximum number of requests per second, minute, or day. Exceed them and you'll get 429 responses. Good production code handles rate limits gracefully rather than crashing.

Understanding Rate Limit Headers

Many APIs tell you your current usage in response headers:

response = requests.get(url, headers=headers, params=params, timeout=30)

# Common rate limit header patterns (vary by API):
requests_remaining = response.headers.get("X-RateLimit-Remaining")
rate_limit_reset = response.headers.get("X-RateLimit-Reset")
retry_after = response.headers.get("Retry-After")

if requests_remaining:
    print(f"Requests remaining this window: {requests_remaining}")

Exponential Backoff

When you hit a rate limit or a transient server error, don't retry immediately — wait, then retry. Exponential backoff increases the wait time with each retry:

import requests
import time
import logging

logger = logging.getLogger(__name__)


def api_request_with_retry(
    url,
    method="GET",
    headers=None,
    params=None,
    json_body=None,
    max_retries=3,
    initial_backoff_seconds=1.0,
    timeout=30,
):
    """
    Make an API request with automatic retry and exponential backoff.

    Retries on:
    - 429 (rate limited)
    - 500, 502, 503, 504 (server errors)
    - Connection errors and timeouts
    """
    backoff_seconds = initial_backoff_seconds

    for attempt_number in range(max_retries + 1):
        try:
            if method.upper() == "GET":
                response = requests.get(
                    url,
                    headers=headers,
                    params=params,
                    timeout=timeout,
                )
            elif method.upper() == "POST":
                response = requests.post(
                    url,
                    headers=headers,
                    params=params,
                    json=json_body,
                    timeout=timeout,
                )
            else:
                raise ValueError(f"Unsupported HTTP method: {method}")

            # Success: any status below 400 (requests follows redirects for us)
            if response.ok:
                return response

            retryable = response.status_code in (429, 500, 502, 503, 504)

            # Client errors (4xx except 429) won't fix themselves, and on the
            # final attempt there is nothing left to wait for: raise HTTPError
            if not retryable or attempt_number >= max_retries:
                response.raise_for_status()

            # Rate limited: respect a numeric Retry-After header if present
            wait_time = backoff_seconds
            if response.status_code == 429:
                retry_after = response.headers.get("Retry-After")
                try:
                    wait_time = float(retry_after)
                except (TypeError, ValueError):
                    pass  # header missing or an HTTP date: keep our own backoff

            logger.warning(
                f"HTTP {response.status_code} on attempt {attempt_number + 1}. "
                f"Retrying in {wait_time:.1f}s."
            )
            time.sleep(wait_time)
            backoff_seconds *= 2

        except requests.exceptions.Timeout:
            if attempt_number < max_retries:
                logger.warning(
                    f"Request timed out on attempt {attempt_number + 1}. "
                    f"Retrying in {backoff_seconds:.1f}s."
                )
                time.sleep(backoff_seconds)
                backoff_seconds *= 2
            else:
                raise

        except requests.exceptions.ConnectionError:
            if attempt_number < max_retries:
                logger.warning(
                    f"Connection error on attempt {attempt_number + 1}. "
                    f"Retrying in {backoff_seconds:.1f}s."
                )
                time.sleep(backoff_seconds)
                backoff_seconds *= 2
            else:
                raise

    raise RuntimeError(f"All {max_retries + 1} attempts failed for URL: {url}")
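
It can help to see the wait schedule on its own. The sketch below computes the base delays for a run of retries and adds two refinements that are not part of the function above: a cap on the longest wait, and random jitter, a common technique that keeps many clients from retrying at the same instant. backoff_delays and with_jitter are hypothetical helpers.

```python
import random


def backoff_delays(initial_seconds=1.0, factor=2.0, max_retries=3, cap_seconds=60.0):
    """Return the base wait before each retry: initial, initial*factor, ..."""
    delays = []
    delay = initial_seconds
    for _ in range(max_retries):
        delays.append(min(delay, cap_seconds))
        delay *= factor
    return delays


def with_jitter(delay, rng=random):
    """'Full jitter': pick a random wait between 0 and the base delay."""
    return rng.uniform(0, delay)


schedule = backoff_delays(initial_seconds=1.0, max_retries=5, cap_seconds=10.0)
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 10.0]
print([round(with_jitter(d), 2) for d in schedule])  # varies per run
```

The doubling means a few retries cover a wide range of outage lengths, while the cap keeps a long outage from producing waits of many minutes.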

21.10 Error Handling for API Calls

Production API code needs to handle three categories of failure:

  1. Network failures: No internet connection, DNS failures, timeouts
  2. HTTP errors: 4xx (your fault) and 5xx (their fault)
  3. Data errors: Unexpected response format, missing fields, type mismatches

The function below handles each category in turn:

import requests
import json
import logging

logger = logging.getLogger(__name__)


def fetch_exchange_rates(api_key, base_currency="USD"):
    """
    Fetch current exchange rates from ExchangeRate-API.

    Returns dict of {currency_code: rate} or raises an exception.
    """
    url = f"https://v6.exchangerate-api.com/v6/{api_key}/latest/{base_currency}"

    try:
        response = requests.get(url, timeout=15)

    except requests.exceptions.ConnectionError as e:
        logger.error(f"Cannot connect to exchange rate API: {e}")
        raise ConnectionError(
            "Exchange rate API is unreachable. Check internet connection."
        ) from e

    except requests.exceptions.Timeout:
        logger.error("Exchange rate API request timed out after 15 seconds")
        raise TimeoutError(
            "Exchange rate API did not respond within 15 seconds. Try again."
        )

    # Check HTTP status
    if response.status_code == 401:
        raise PermissionError(
            "Invalid API key. Check EXCHANGE_RATE_API_KEY environment variable."
        )

    if response.status_code == 429:
        raise RuntimeError(
            "Exchange rate API rate limit exceeded. Upgrade plan or wait."
        )

    if not response.ok:
        raise RuntimeError(
            f"Exchange rate API returned unexpected status: {response.status_code}"
        )

    # Parse JSON
    try:
        data = response.json()
    except ValueError as e:  # every JSON decode error requests raises is a ValueError
        logger.error(f"Exchange rate API returned non-JSON response: {response.text[:200]}")
        raise ValueError("Exchange rate API returned malformed response") from e

    # Validate expected structure
    if data.get("result") != "success":
        error_type = data.get("error-type", "unknown")
        raise ValueError(f"Exchange rate API returned error: {error_type}")

    # The keyed v6 endpoint returns its rates under "conversion_rates"
    rates = data.get("conversion_rates")
    if not rates or not isinstance(rates, dict):
        raise ValueError("Exchange rate API response missing 'conversion_rates' dictionary")

    return rates
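
Raising specific exceptions pays off at the call site, where you decide what to do about a failure. One option, sketched below, is to fall back to the last successfully fetched rates. This pattern is an assumption about what a report might need, not something the API provides: rates_with_fallback is a hypothetical helper, and the two fetchers are stand-ins so the example runs without a network connection.

```python
import json
import os
import tempfile


def rates_with_fallback(fetch_rates, cache_path):
    """
    Try a live fetch; on any failure, fall back to the last cached rates.

    fetch_rates is any zero-argument callable that returns a dict of rates
    or raises. Returns (rates, source) where source is "live" or "cache".
    """
    try:
        rates = fetch_rates()
    except Exception as error:
        if not os.path.exists(cache_path):
            raise  # nothing cached, so surface the original error
        print(f"Live fetch failed ({error}); using cached rates")
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f), "cache"

    # Live fetch worked: refresh the cache for next time
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(rates, f)
    return rates, "live"


# Demo with stand-in fetchers instead of a real API call
def working_fetch():
    return {"EUR": 0.9234, "CAD": 1.3621}


def broken_fetch():
    raise ConnectionError("Exchange rate API is unreachable")


cache_file = os.path.join(tempfile.mkdtemp(), "rates_cache.json")
print(rates_with_fallback(working_fetch, cache_file))  # live, and writes the cache
print(rates_with_fallback(broken_fetch, cache_file))   # served from the cache
```

Cached rates are stale by definition, so a report built on them should say so; returning the source alongside the data makes that possible.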

21.11 Real Business API Examples

Currency Exchange Rates for International Sales

Acme Corp has started selling to Canadian and European customers. Priya needs to convert their foreign revenue to USD for the quarterly roll-up.

import os
import requests
from datetime import datetime

def get_exchange_rates(base_currency="USD"):
    """
    Fetch current exchange rates from ExchangeRate-API's open access endpoint.

    open.er-api.com needs no API key or registration for basic use. The keyed
    free tier at https://www.exchangerate-api.com allows 1,500 requests/month.
    """
    url = f"https://open.er-api.com/v6/latest/{base_currency}"

    response = requests.get(url, timeout=15)
    response.raise_for_status()
    data = response.json()

    if data.get("result") != "success":
        raise ValueError(f"API error: {data}")

    return data["rates"]


def convert_international_sales(sales_records, exchange_rates):
    """
    Convert foreign currency sales amounts to USD.

    Args:
        sales_records: list of dicts with 'amount', 'currency', 'customer'
        exchange_rates: dict from get_exchange_rates()

    Returns:
        list of records with added 'amount_usd' field
    """
    converted_records = []

    for record in sales_records:
        currency = record["currency"].upper()
        amount = record["amount"]

        if currency == "USD":
            amount_usd = amount
        elif currency in exchange_rates:
            # exchange_rates["EUR"] = 0.9234 means 1 USD = 0.9234 EUR
            # So to get USD: divide the foreign amount by the rate
            amount_usd = amount / exchange_rates[currency]
        else:
            print(f"Warning: Unknown currency '{currency}' for {record['customer']}")
            amount_usd = None

        converted_records.append({
            **record,
            # Compare against None so a legitimate 0.00 amount is kept
            "amount_usd": round(amount_usd, 2) if amount_usd is not None else None,
            "conversion_date": datetime.now().strftime("%Y-%m-%d"),
        })

    return converted_records


# Example usage
acme_international_sales = [
    {"customer": "Maple Office Supplies Ltd", "amount": 12500.00, "currency": "CAD"},
    {"customer": "Euro Bürobedarf GmbH", "amount": 8400.00, "currency": "EUR"},
    {"customer": "Pacific Office Co", "amount": 3200.00, "currency": "USD"},
    {"customer": "Manchester Business Supplies", "amount": 5600.00, "currency": "GBP"},
]

rates = get_exchange_rates("USD")
converted = convert_international_sales(acme_international_sales, rates)

total_usd = sum(r["amount_usd"] for r in converted if r["amount_usd"])
print(f"\nInternational Sales Conversion Summary")
print(f"{'Customer':<35} {'Orig Amount':>12} {'Currency':>8} {'USD Amount':>12}")
print("-" * 72)
for record in converted:
    usd = f"${record['amount_usd']:,.2f}" if record["amount_usd"] is not None else "N/A"
    print(f"{record['customer']:<35} {record['amount']:>12,.2f} {record['currency']:>8} {usd:>12}")
print("-" * 72)
print(f"{'TOTAL USD':>58} ${total_usd:>10,.2f}")
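
The direction of the conversion is easy to get backwards, so a worked check is worthwhile. The sketch below uses the illustrative rate from the comments above (1 USD = 0.9234 EUR) and converts back again to confirm the result. It uses Decimal rather than floats, which is a design choice for money arithmetic and not what the chapter's function does; to_usd is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_usd(amount, units_per_usd):
    """Convert a foreign amount to USD given 'foreign units per 1 USD'."""
    usd = Decimal(str(amount)) / Decimal(str(units_per_usd))
    return usd.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


eur_per_usd = 0.9234                 # illustrative rate, not a live quote
usd = to_usd(8400.00, eur_per_usd)
print(usd)  # 9096.82

# Sanity check: converting back should land on the original amount
back_to_eur = (usd * Decimal(str(eur_per_usd))).quantize(Decimal("0.01"))
print(back_to_eur)  # 8400.00

# A rate below 1 means the foreign currency is worth more than a dollar,
# so the USD figure must be larger than the foreign figure
assert usd > Decimal("8400.00")
```

If the check had produced a smaller USD figure, the code would be multiplying where it should divide.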

Weather Data for Logistics Planning

Acme's warehouse operations team needs weather alerts for their regional distribution centers:

import requests

def get_weather_for_warehouse(city_name, latitude, longitude):
    """
    Fetch current weather using Open-Meteo (free, no API key required).
    https://open-meteo.com/
    """
    response = requests.get(
        "https://api.open-meteo.com/v1/forecast",
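        params={
            "latitude": latitude,
            "longitude": longitude,
            "current_weather": True,
            "hourly": "precipitation_probability,snow_depth",
            "forecast_days": 2,
            # "auto" resolves the local time zone from the coordinates,
            # so Denver and Cleveland aren't reported in Chicago time
            "timezone": "auto",
        },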
        timeout=15,
    )
    response.raise_for_status()
    data = response.json()

    current = data["current_weather"]
    weather_code = current["weathercode"]

    # WMO weather code interpretation (simplified)
    severe_weather_codes = {
        55: "Heavy drizzle",
        65: "Heavy rain",
        75: "Heavy snowfall",
        82: "Violent rain showers",
        95: "Thunderstorm",
        99: "Thunderstorm with heavy hail",
    }

    is_severe = weather_code in severe_weather_codes

    return {
        "city": city_name,
        "temperature_c": current["temperature"],
        "wind_speed_kmh": current["windspeed"],
        "weather_code": weather_code,
        "is_severe": is_severe,
        "condition": severe_weather_codes.get(weather_code, "Normal conditions"),
    }


# Acme's regional distribution centers
acme_warehouses = [
    ("Chicago HQ", 41.8781, -87.6298),
    ("Cleveland East", 41.4993, -81.6944),
    ("Memphis South", 35.1495, -90.0490),
    ("Denver West", 39.7392, -104.9903),
]

print("Acme Corp Warehouse Weather Status")
print("=" * 55)
for city, lat, lon in acme_warehouses:
    weather = get_weather_for_warehouse(city, lat, lon)
    alert = " *** WEATHER ALERT ***" if weather["is_severe"] else ""
    print(
        f"{weather['city']:<20} {weather['temperature_c']:>5.1f}°C  "
        f"Wind: {weather['wind_speed_kmh']:>5.1f} km/h  "
        f"{weather['condition']}{alert}"
    )
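
The request above asks for hourly `precipitation_probability` and `snow_depth`, but the function only reads `current_weather`. The sketch below shows one way the hourly data might be summarized. It assumes the parsed response contains an `hourly` object holding parallel lists keyed by `time` and by each requested variable, which is worth confirming against a live response. The function name and the sample payload are hypothetical.

```python
def summarize_hourly_precipitation(data):
    """Return the peak precipitation probability and when it occurs.

    Expects the parsed forecast JSON; returns None if hourly data is absent.
    """
    hourly = data.get("hourly", {})
    times = hourly.get("time", [])
    probabilities = hourly.get("precipitation_probability", [])

    # Pair each probability with its timestamp, skipping missing (null) values
    pairs = [(p, t) for t, p in zip(times, probabilities) if p is not None]
    if not pairs:
        return None

    peak_probability, peak_time = max(pairs, key=lambda pair: pair[0])
    return {"peak_probability_pct": peak_probability, "peak_time": peak_time}


# Hand-written sample shaped like an hourly forecast payload (not real data)
sample = {
    "hourly": {
        "time": ["2024-01-15T00:00", "2024-01-15T01:00", "2024-01-15T02:00"],
        "precipitation_probability": [10, None, 65],
    }
}
summary = summarize_hourly_precipitation(sample)
print(summary)
```

A warehouse manager usually cares about the next day or two more than the current minute, so a peak-probability figure like this could sit next to the current conditions in the status line.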

Public Financial Data with Alpha Vantage

Alpha Vantage's free tier provides company financial data:

import os
import requests

def get_company_overview(ticker_symbol):
    """
    Fetch company overview from Alpha Vantage free API.
    Free tier: 25 requests/day, 5 requests/minute.
    Sign up at https://www.alphavantage.co/support/#api-key
    """
    api_key = os.environ.get("ALPHA_VANTAGE_API_KEY")
    if not api_key:
        raise ValueError("Set ALPHA_VANTAGE_API_KEY environment variable")

    response = requests.get(
        "https://www.alphavantage.co/query",
        params={
            "function": "OVERVIEW",
            "symbol": ticker_symbol,
            "apikey": api_key,
        },
        timeout=20,
    )
    response.raise_for_status()
    data = response.json()

    # Alpha Vantage returns empty dict for invalid symbols
    if not data or "Name" not in data:
        raise ValueError(f"No data found for ticker: {ticker_symbol}")

    return {
        "ticker": ticker_symbol,
        "name": data.get("Name"),
        "industry": data.get("Industry"),
        "sector": data.get("Sector"),
        "market_cap": data.get("MarketCapitalization"),
        "pe_ratio": data.get("PERatio"),
        "revenue_ttm": data.get("RevenueTTM"),
        "profit_margin": data.get("ProfitMargin"),
        "description": data.get("Description", "")[:200] + "...",
    }


def get_stock_quote(ticker_symbol):
    """
    Get current stock quote from Alpha Vantage.
    """
    api_key = os.environ.get("ALPHA_VANTAGE_API_KEY")
    if not api_key:
        raise ValueError("Set ALPHA_VANTAGE_API_KEY environment variable")

    response = requests.get(
        "https://www.alphavantage.co/query",
        params={
            "function": "GLOBAL_QUOTE",
            "symbol": ticker_symbol,
            "apikey": api_key,
        },
        timeout=20,
    )
    response.raise_for_status()
    data = response.json()

    quote_data = data.get("Global Quote", {})
    if not quote_data:
        raise ValueError(f"No quote data for {ticker_symbol}")

    return {
        "ticker": ticker_symbol,
        "price": float(quote_data.get("05. price", 0)),
        "change": float(quote_data.get("09. change", 0)),
        "change_pct": quote_data.get("10. change percent", "0%"),
        "volume": int(quote_data.get("06. volume", 0)),
        "latest_trading_day": quote_data.get("07. latest trading day"),
    }
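
Two details are easy to miss with this kind of API. First, a request that hits a rate limit can still come back with HTTP 200 and a JSON body that carries a message instead of data, so `raise_for_status()` alone will not catch it. Second, numeric fields often arrive as strings. The sketch below handles both. The message keys it checks (`Error Message`, `Note`, `Information`) are an assumption about Alpha Vantage's responses and should be verified against what you actually receive; both helper names are hypothetical.

```python
def check_alpha_vantage_payload(data):
    """Raise a descriptive error if the payload looks like an error or limit notice."""
    # Assumed message keys; confirm against the responses you actually receive
    for key in ("Error Message", "Note", "Information"):
        if key in data:
            raise RuntimeError(f"Alpha Vantage returned '{key}': {data[key]}")
    return data


def to_number(value):
    """Convert a string such as '28.5' to float; return None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Hand-written sample shaped like an overview payload (not real data)
sample_overview = {"Name": "Example Corp", "PERatio": "28.5", "ProfitMargin": "None"}
checked = check_alpha_vantage_payload(sample_overview)
pe_ratio = to_number(checked.get("PERatio"))
profit_margin = to_number(checked.get("ProfitMargin"))
print(pe_ratio, profit_margin)
```

Calling the check right after `response.json()` gives a clearer failure than "No data found for ticker" when the real cause is an exhausted daily quota.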

News API for Business Monitoring

NewsAPI lets Priya track industry news for competitive intelligence:

import os
import requests
from datetime import datetime, timedelta

def fetch_industry_news(search_query, days_back=7, max_articles=20):
    """
    Fetch recent news articles using NewsAPI.
    Free tier: 100 requests/day, developer use only.
    Sign up at https://newsapi.org/register
    """
    api_key = os.environ.get("NEWS_API_KEY")
    if not api_key:
        raise ValueError("Set NEWS_API_KEY environment variable")

    from_date = (datetime.now() - timedelta(days=days_back)).strftime("%Y-%m-%d")

    response = requests.get(
        "https://newsapi.org/v2/everything",
        headers={"X-Api-Key": api_key},
        params={
            "q": search_query,
            "from": from_date,
            "language": "en",
            "sortBy": "relevancy",
            "pageSize": max_articles,
        },
        timeout=20,
    )
    response.raise_for_status()
    data = response.json()

    if data.get("status") != "ok":
        raise ValueError(f"NewsAPI error: {data.get('message', 'Unknown error')}")

    articles = []
    for article in data.get("articles", []):
        articles.append({
            "title": article.get("title"),
            "source": (article.get("source") or {}).get("name"),
            # Fields can be present but null, so .get(key, "") is not enough
            "published_at": (article.get("publishedAt") or "")[:10],
            "description": (article.get("description") or "")[:200],
            "url": article.get("url"),
        })

    return articles


# Usage for Priya's competitive monitoring
competitor_news = fetch_industry_news(
    "office supplies distribution wholesale",
    days_back=7,
    max_articles=10,
)

print(f"Industry News — Last 7 Days ({len(competitor_news)} articles)")
print("=" * 65)
for article in competitor_news:
    print(f"\n[{article['published_at']}] {article['source']}")
    print(f"  {article['title']}")
    if article['description']:
        print(f"  {article['description'][:100]}...")
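
News searches tend to return the same wire story from several outlets, along with the occasional entry that has no usable title. A small cleanup pass before printing keeps the brief readable. This is a minimal sketch that treats titles as duplicates when they match after trimming whitespace and ignoring case; the function name and the sample articles are hypothetical.

```python
def deduplicate_articles(articles):
    """Drop articles with missing titles and repeats of the same title."""
    seen_titles = set()
    unique = []
    for article in articles:
        title = (article.get("title") or "").strip()
        if not title:
            continue  # nothing useful to show
        key = title.casefold()  # case-insensitive comparison
        if key in seen_titles:
            continue  # already have this story from another source
        seen_titles.add(key)
        unique.append(article)
    return unique


# Hand-written sample articles (not real data)
sample_articles = [
    {"title": "Office supply prices rise", "source": "Wire A"},
    {"title": "office supply prices rise ", "source": "Wire B"},
    {"title": None, "source": "Wire C"},
    {"title": "Distributor opens new warehouse", "source": "Wire D"},
]
unique_articles = deduplicate_articles(sample_articles)
print([a["source"] for a in unique_articles])
```

Exact-title matching is deliberately conservative: it will miss reworded headlines, but it never merges two different stories.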

21.12 Building a Reusable API Client

Once you're making calls to multiple APIs, repeating authentication, retry logic, and error handling in every script is wasteful. A reusable API client class encapsulates these concerns once:

import os
import time
import logging
import requests
from typing import Any

logger = logging.getLogger(__name__)


class APIClient:
    """
    Reusable HTTP API client with authentication, retry logic, and error handling.

    Designed for business data retrieval where reliability matters more than speed.
    """

    def __init__(
        self,
        base_url,
        api_key=None,
        api_key_header="X-Api-Key",
        bearer_token=None,
        default_timeout=30,
        max_retries=3,
        rate_limit_pause_seconds=1.0,
    ):
        self.base_url = base_url.rstrip("/")
        self.default_timeout = default_timeout
        self.max_retries = max_retries
        self.rate_limit_pause_seconds = rate_limit_pause_seconds

        # Build default headers
        self.default_headers = {"Accept": "application/json"}

        if api_key:
            self.default_headers[api_key_header] = api_key

        if bearer_token:
            self.default_headers["Authorization"] = f"Bearer {bearer_token}"

        # Use a requests.Session for connection pooling (faster for multiple calls)
        self.session = requests.Session()
        self.session.headers.update(self.default_headers)

    def get(self, endpoint, params=None, extra_headers=None):
        """
        Make a GET request. Returns parsed JSON response.

        Args:
            endpoint: URL path relative to base_url (e.g., "/v1/rates/USD")
            params: dict of query parameters
            extra_headers: additional headers for this specific request

        Returns:
            Parsed JSON as dict or list

        Raises:
            requests.HTTPError for non-retryable HTTP errors
            ConnectionError for network failures after all retries
        """
        url = f"{self.base_url}{endpoint}"
        headers = extra_headers or {}

        return self._request_with_retry("GET", url, params=params, headers=headers)

    def post(self, endpoint, json_body=None, params=None, extra_headers=None):
        """Make a POST request. Returns parsed JSON response."""
        url = f"{self.base_url}{endpoint}"
        headers = extra_headers or {}

        return self._request_with_retry(
            "POST", url, params=params, headers=headers, json_body=json_body
        )

    def _request_with_retry(self, method, url, params=None, headers=None, json_body=None):
        """Internal method: execute request with retry/backoff logic."""
        backoff = self.rate_limit_pause_seconds

        for attempt in range(self.max_retries + 1):
            try:
                response = self.session.request(
                    method=method,
                    url=url,
                    params=params,
                    headers=headers,
                    json=json_body,
                    timeout=self.default_timeout,
                )

                # Success: treat every 2xx status as success so codes such as 202 are not re-sent
                if 200 <= response.status_code < 300:
                    if response.status_code == 204 or not response.text:
                        return {}
                    return response.json()

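                # Rate limited: retry while attempts remain, otherwise fall through and raise
                if response.status_code == 429 and attempt < self.max_retries:
                    retry_after = response.headers.get("Retry-After")
                    try:
                        wait = float(retry_after) if retry_after else backoff * 2
                    except ValueError:
                        # Retry-After can also be an HTTP date; use backoff in that case
                        wait = backoff * 2
                    logger.warning(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1})")
                    time.sleep(wait)
                    backoff = min(backoff * 2, 60)
                    continue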

                # Retryable server errors
                if response.status_code in (500, 502, 503, 504) and attempt < self.max_retries:
                    logger.warning(
                        f"Server error {response.status_code}. "
                        f"Retrying in {backoff:.1f}s (attempt {attempt + 1})"
                    )
                    time.sleep(backoff)
                    backoff = min(backoff * 2, 60)
                    continue

                # All other errors — raise immediately
                response.raise_for_status()

            except requests.exceptions.Timeout:
                if attempt < self.max_retries:
                    logger.warning(f"Timeout. Retrying in {backoff:.1f}s (attempt {attempt + 1})")
                    time.sleep(backoff)
                    backoff = min(backoff * 2, 60)
                else:
                    raise

            except requests.exceptions.ConnectionError:
                if attempt < self.max_retries:
                    logger.warning(
                        f"Connection error. Retrying in {backoff:.1f}s (attempt {attempt + 1})"
                    )
                    time.sleep(backoff)
                    backoff = min(backoff * 2, 60)
                else:
                    raise

        raise RuntimeError(f"All {self.max_retries + 1} attempts exhausted for: {url}")

With this client, calling any API becomes clean and consistent:

# Exchange rate client
exchange_client = APIClient(
    base_url="https://open.er-api.com",
    default_timeout=15,
)
rates_data = exchange_client.get("/v6/latest/USD")
exchange_rates = rates_data["rates"]

# News client
news_client = APIClient(
    base_url="https://newsapi.org",
    api_key=os.environ.get("NEWS_API_KEY"),
    api_key_header="X-Api-Key",
)
news_data = news_client.get("/v2/everything", params={"q": "office supplies", "pageSize": 10})
articles = news_data.get("articles", [])
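
One detail of rate-limit handling deserves a closer look. The HTTP specification allows the `Retry-After` header to carry either a number of seconds or an HTTP date, so a client that only calls `float()` on it can fail on some servers. The sketch below shows one way to accept both forms using only the standard library. The function name `parse_retry_after` and its `fallback_seconds` parameter are hypothetical, not part of the `APIClient` class above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(header_value, fallback_seconds, now=None):
    """Return the number of seconds to wait based on a Retry-After header value."""
    if not header_value:
        return fallback_seconds

    # Form 1: delay in seconds, e.g. "120"
    try:
        return max(0.0, float(header_value))
    except ValueError:
        pass

    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        retry_at = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return fallback_seconds  # unparseable: use the caller's backoff
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)

    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
wait_seconds = parse_retry_after("120", fallback_seconds=2.0)
wait_date = parse_retry_after("Wed, 21 Oct 2015 07:28:00 GMT", 2.0, now=fixed_now)
wait_bad = parse_retry_after("not-a-date", fallback_seconds=2.0)
print(wait_seconds, wait_date, wait_bad)
```

Passing `now` explicitly keeps the function easy to test; in real use it can be left out so the current UTC time is used.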

21.13 Practical Summary: A Complete API Data Pull

Here's how a real business data pull comes together, using everything from this chapter:

"""
acme_market_intelligence.py

Pulls industry news and exchange rates from public APIs.
Combines them into a market intelligence summary for Priya's Monday report.

Requirements:
    pip install requests python-dotenv

Environment variables required:
    NEWS_API_KEY: from newsapi.org (free tier)
    (Exchange rates are free; no key required)
"""

import os
import csv
import logging
from datetime import datetime
from dotenv import load_dotenv
import requests

# Load environment variables from .env file
load_dotenv()

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
logger = logging.getLogger(__name__)


def fetch_exchange_rates():
    """Fetch USD exchange rates — free, no API key required."""
    logger.info("Fetching exchange rates...")
    response = requests.get("https://open.er-api.com/v6/latest/USD", timeout=15)
    response.raise_for_status()
    data = response.json()
    return data["rates"]


def fetch_industry_news(query, max_articles=10):
    """Fetch industry news via NewsAPI."""
    api_key = os.environ.get("NEWS_API_KEY")
    if not api_key:
        logger.warning("NEWS_API_KEY not set — skipping news fetch")
        return []

    logger.info(f"Fetching news for: '{query}'")
    response = requests.get(
        "https://newsapi.org/v2/everything",
        headers={"X-Api-Key": api_key},
        params={
            "q": query,
            "language": "en",
            "sortBy": "publishedAt",
            "pageSize": max_articles,
        },
        timeout=20,
    )
    response.raise_for_status()
    data = response.json()

    # Fields can be present but null, so fall back with "or" rather than a .get default
    return [
        {
            "title": a.get("title") or "",
            "source": (a.get("source") or {}).get("name") or "",
            "published": (a.get("publishedAt") or "")[:10],
            "url": a.get("url") or "",
        }
        for a in data.get("articles", [])
    ]


def save_exchange_rates_to_csv(rates, output_path):
    """Save today's exchange rate snapshot to CSV, overwriting any existing file."""
    today = datetime.now().strftime("%Y-%m-%d")
    currencies_of_interest = ["EUR", "GBP", "CAD", "JPY", "AUD", "MXN", "CHF"]

    rows = []
    for currency in currencies_of_interest:
        if currency in rates:
            rows.append({
                "date": today,
                "base_currency": "USD",
                "target_currency": currency,
                "rate": rates[currency],
            })

    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["date", "base_currency", "target_currency", "rate"])
        writer.writeheader()
        writer.writerows(rows)

    logger.info(f"Exchange rates saved to {output_path}")


def generate_market_intelligence_report(exchange_rates, news_articles, output_path):
    """Write a plain-text market intelligence summary."""
    today = datetime.now().strftime("%Y-%m-%d")

    with open(output_path, "w", encoding="utf-8") as f:
        f.write("ACME CORP MARKET INTELLIGENCE BRIEF\n")
        f.write(f"Generated: {today}\n")
        f.write("=" * 60 + "\n\n")

        f.write("EXCHANGE RATES (USD Base)\n")
        f.write("-" * 40 + "\n")
        for currency in ["EUR", "GBP", "CAD", "JPY", "AUD"]:
            if currency in exchange_rates:
                f.write(f"  1 USD = {exchange_rates[currency]:.4f} {currency}\n")
        f.write("\n")

        f.write(f"INDUSTRY NEWS ({len(news_articles)} articles)\n")
        f.write("-" * 40 + "\n")
        for article in news_articles[:5]:
            f.write(f"\n[{article['published']}] {article['source']}\n")
            f.write(f"  {article['title']}\n")

    logger.info(f"Market intelligence report saved to {output_path}")


if __name__ == "__main__":
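    # A failed fetch should not abort the whole report; log it and continue with empty data
    try:
        exchange_rates = fetch_exchange_rates()
    except requests.RequestException as exc:
        logger.error(f"Exchange rate fetch failed: {exc}")
        exchange_rates = {}

    try:
        news_articles = fetch_industry_news(
            "office supplies wholesale distribution",
            max_articles=10,
        )
    except requests.RequestException as exc:
        logger.error(f"News fetch failed: {exc}")
        news_articles = []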

    save_exchange_rates_to_csv(exchange_rates, "acme_exchange_rates_today.csv")

    generate_market_intelligence_report(
        exchange_rates,
        news_articles,
        "acme_market_intelligence.txt",
    )

    print("\nMarket intelligence pull complete.")
    print(f"  Exchange rates: {len(exchange_rates)} currencies")
    print(f"  News articles: {len(news_articles)} articles")
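
`save_exchange_rates_to_csv` opens its file in `"w"` mode, so each run replaces the previous snapshot. If the goal is a history that grows week over week, one option is to append rows and write the header only when the file is new. The following is a minimal sketch of that idea; the function name, the `as_of` parameter, the file name, and the rate values are all hypothetical.

```python
import csv
import os
import tempfile
from datetime import datetime


def append_exchange_rates_to_csv(rates, output_path, currencies, as_of=None):
    """Append one row per currency, writing the header only for a new file."""
    as_of = as_of or datetime.now().strftime("%Y-%m-%d")
    fieldnames = ["date", "base_currency", "target_currency", "rate"]
    is_new_file = not os.path.exists(output_path) or os.path.getsize(output_path) == 0

    with open(output_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new_file:
            writer.writeheader()
        for currency in currencies:
            if currency in rates:
                writer.writerow({
                    "date": as_of,
                    "base_currency": "USD",
                    "target_currency": currency,
                    "rate": rates[currency],
                })


# Demonstration with illustrative rates (not live data) in a temporary folder
with tempfile.TemporaryDirectory() as tmp_dir:
    history_path = os.path.join(tmp_dir, "acme_exchange_rate_history.csv")
    append_exchange_rates_to_csv(
        {"EUR": 0.92, "GBP": 0.79}, history_path, ["EUR", "GBP"], as_of="2024-01-08"
    )
    append_exchange_rates_to_csv(
        {"EUR": 0.93, "GBP": 0.80}, history_path, ["EUR", "GBP"], as_of="2024-01-15"
    )
    with open(history_path, newline="", encoding="utf-8") as f:
        history_rows = list(csv.DictReader(f))

print(len(history_rows), history_rows[0]["date"], history_rows[-1]["date"])
```

Running the pull twice on the same day would add duplicate rows, so a scheduled job may want to check the last recorded date before appending.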

Chapter Summary

APIs transform your Python code from a tool that processes internal data into a system that connects to the world. The key ideas from this chapter:

  • APIs are structured data services accessed over HTTP, returning JSON you can work with directly in Python
  • The requests library makes HTTP calls straightforward: requests.get() for fetching data, response.json() for parsing it
  • Authentication usually means API keys in headers or query params — always stored in environment variables, never hardcoded
  • Error handling is non-negotiable in production code: check status codes, handle network failures, validate response structure
  • Pagination is the norm for large datasets — design your fetch functions to handle multi-page responses from the start
  • Rate limiting is a constraint, not an obstacle — exponential backoff handles it cleanly
  • A reusable APIClient class eliminates repetition across your API-consuming scripts

The next chapter introduces scheduling. Once you've built scripts like these, you'll want them to run automatically, and Chapter 22 shows you how.

