Chapter 36: Programmatic AI — APIs, Python, and Automations

Beyond the Chat Interface

The chat interface is a remarkable thing. It puts a powerful AI assistant within reach of anyone — no setup, no code, no technical knowledge required. For many uses, it is exactly the right tool.

But the chat interface has fundamental constraints. It is designed for one person asking one question at a time. It stores no persistent state beyond the current session. It cannot trigger on external events. It cannot process a folder of 500 documents overnight. It cannot be integrated into an existing application or data pipeline. It cannot run on a schedule without a human clicking "send."

The API removes all of these constraints.

An API (Application Programming Interface) is how software systems communicate with each other. When you call the Anthropic API from a Python script, you are programmatically sending a message to Claude and receiving a response — the same fundamental interaction as the chat interface, but now under your complete control. You control exactly what is sent, exactly what happens with the response, how errors are handled, how many requests run in parallel, what gets logged, and what triggers a request in the first place.

This chapter builds your complete foundation for programmatic AI use. It covers the Anthropic and OpenAI Python SDKs, multi-turn conversation management, batch processing, rate limiting and retry logic, streaming, cost management, and a set of practical automation examples you can adapt directly to your work. All code in this chapter is syntactically correct and runnable.

You do not need to be a software engineer to benefit from this chapter. If you can install Python packages and run a script, you can build useful AI automations. The chapter is written to be accessible to practitioners with basic Python familiarity while also providing depth for those with more experience.

Why Use the API Rather Than the Chat Interface?

Before writing a line of code, it is worth being clear about when the API is the right choice and when it is not.

Use the API when:

You need to process multiple items systematically — analyzing 200 customer survey responses, generating descriptions for 500 product SKUs, translating a collection of documents. The chat interface requires a human to submit each request; the API can process them in a loop while you do other things.

You need to trigger AI on external events — a new support ticket arrives, a form is submitted, a file appears in a folder, a database record changes. The API can be called from event handlers, webhooks, and scheduled jobs.

You need to integrate AI into an existing system — your CRM, your document management platform, your internal tools. The API is how you add AI capabilities to software that did not previously have them.

You need precise control over prompts, models, and parameters for consistent results — the chat interface's behavior varies based on conversational context in ways that can be unpredictable; the API gives you complete control over every parameter of every request.

You need to build something other people can use — a configured assistant, a Slack bot, an internal tool, a customer-facing feature. The API is the mechanism.

Stay with the chat interface when:

The task is genuinely one-off and involves real-time human judgment throughout. If you will read and evaluate each AI response before deciding what to do next, the chat interface is appropriate.

You are exploring a problem space — brainstorming, drafting, thinking out loud. The conversational format is a feature, not a limitation, in exploratory work.

The output is for your own consumption and does not need to be integrated with other systems.

You are working on a sensitive task where you want maximum visibility into each interaction. The chat interface provides a natural audit trail of your session.

The API is not inherently better than the chat interface. It is a different tool for different purposes. The practitioner who knows when to use each is more effective than one who has adopted a blanket preference.

API Concepts: What You Need to Know

Before writing code, a brief tour of the concepts that govern API use.

Endpoints are the specific URLs you send requests to. Each API provider has different endpoints for different capabilities (text generation, image generation, embeddings, etc.). This chapter focuses on the text generation (chat completions) endpoints.

Authentication proves to the API that you are authorized to use it. API keys are the standard mechanism — a long string of characters that acts as a password for your account. Keep API keys secret and never put them in code that will be shared or version-controlled.

Requests and responses follow a predictable structure. You send a request containing your model selection, your messages, and optional parameters; you receive a response containing the AI's output, usage statistics, and metadata.

Rate limits cap how many requests you can make per minute or per day. Hitting a rate limit returns an error rather than a response. Your code needs to handle these errors gracefully — either by waiting and retrying, or by managing request timing to avoid hitting limits.

Tokens and cost are how API usage is measured and billed. A token is roughly four characters or three-quarters of a word. Input tokens (what you send) and output tokens (what the API generates) are both counted. Different models have different per-token prices. Cost management is covered in detail later in this chapter.
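The four-characters-per-token rule of thumb can be turned into a quick budgeting estimator. This is a rough heuristic only; the exact counts always come from the usage fields in the API response:

```python
def rough_token_estimate(text: str) -> int:
    """Approximate token count using the ~4 characters-per-token rule of thumb.
    For budgeting only; response.usage reports the exact counts after a call."""
    return max(1, len(text) // 4)

# A 400-character prompt is roughly 100 tokens
print(rough_token_estimate("x" * 400))  # → 100
```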

Models are the specific AI systems available through the API. Different models offer different capability and cost tradeoffs. For the Anthropic API: claude-opus-4-6 is the most capable; claude-haiku-4-5 is the fastest and most economical. For the OpenAI API: gpt-4o is the most capable; gpt-4o-mini is the economical option.

Setting Up Your Environment

Before writing code, you need:

  • Python 3.9 or later
  • The anthropic and openai packages
  • A .env file for storing API keys securely

🐍 Code Block: Environment Setup

# requirements: pip install anthropic openai python-dotenv

from dotenv import load_dotenv
import os

load_dotenv()  # loads ANTHROPIC_API_KEY and OPENAI_API_KEY from .env

# Verify keys are loaded (never print the actual key values)
anthropic_key_loaded = bool(os.getenv("ANTHROPIC_API_KEY"))
openai_key_loaded = bool(os.getenv("OPENAI_API_KEY"))
print(f"Anthropic key loaded: {anthropic_key_loaded}")
print(f"OpenAI key loaded: {openai_key_loaded}")

Your .env file (stored in the same directory as your script, never committed to version control):

ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-your-openai-key-here

Your .gitignore file must include:

.env
*.env

⚠️ Common Pitfall: Hardcoding API keys directly in scripts is a serious security risk. If that code is shared — via GitHub, Slack, email, or any version control system — your key is compromised. Anyone with your API key can make requests billed to your account. Always use environment variables or a secrets manager.

Best Practice: On a team, use a secrets manager (AWS Secrets Manager, HashiCorp Vault, Doppler) rather than distributing .env files. For personal projects, .env with .gitignore is sufficient.
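However the key is stored, a script should fail fast with a clear message when it is missing, rather than failing later with a confusing authentication error. A minimal sketch (the function name is illustrative, not part of any SDK):

```python
import os

def require_key(name: str) -> str:
    """Return the named environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or secrets manager."
        )
    return value
```

Call require_key("ANTHROPIC_API_KEY") at the top of a script so a misconfigured environment is caught before any API work begins.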

The Anthropic Python SDK: Comprehensive Coverage

The Anthropic Python SDK is the official way to interact with Claude programmatically. It handles authentication, request formatting, response parsing, and error handling.

Installation and Basic Usage

pip install anthropic

🐍 Code Block: Basic Anthropic API Call

import anthropic

client = anthropic.Anthropic()
# The client automatically reads ANTHROPIC_API_KEY from the environment

# Basic message
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=2048,
    system="You are an expert technical writer.",
    messages=[
        {"role": "user", "content": "Explain REST APIs to a non-technical audience."}
    ]
)

print(response.content[0].text)
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")

The response object contains several important attributes:

  • response.content[0].text — the AI's text response
  • response.usage.input_tokens — how many tokens your prompt consumed
  • response.usage.output_tokens — how many tokens the response consumed
  • response.model — which model was used
  • response.stop_reason — why the model stopped generating ("end_turn" means normal completion; "max_tokens" means the response was cut off by the token limit)

⚠️ Common Pitfall: Forgetting to check response.stop_reason. If it is "max_tokens", the response is truncated — you received an incomplete output. Either increase max_tokens or design your prompts to produce shorter outputs.
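One way to act on that check automatically is to retry with a larger budget whenever stop_reason comes back as "max_tokens". The sketch below is deliberately provider-agnostic: send is a hypothetical adapter you supply that returns (text, stop_reason), so you would wrap client.messages.create() in one to use it for real.

```python
def complete_untruncated(send, prompt: str, max_tokens: int = 512, limit: int = 4096):
    """Retry with a doubled max_tokens budget while the response is truncated.

    `send` is any callable (prompt, max_tokens) -> (text, stop_reason);
    `limit` caps the budget so the loop always terminates.
    """
    while True:
        text, stop_reason = send(prompt, max_tokens)
        if stop_reason != "max_tokens" or max_tokens >= limit:
            return text, stop_reason
        max_tokens = min(max_tokens * 2, limit)  # double the budget and retry
```

Note that every retry re-bills the input tokens, so sizing max_tokens correctly up front is cheaper than relying on this loop.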

The Messages API in Detail

The Anthropic Messages API uses a conversational message structure. Every request is a list of messages alternating between "user" and "assistant" roles. The system prompt is passed separately.

🐍 Code Block: Messages API Parameters

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",          # Required: model identifier
    max_tokens=4096,                   # Required: maximum output tokens
    system="You are a helpful assistant specializing in data analysis.",  # Optional system prompt
    messages=[                         # Required: conversation history
        {"role": "user", "content": "What are the main approaches to data normalization?"},
        {"role": "assistant", "content": "There are three main normal forms..."},  # Optional: prior turns
        {"role": "user", "content": "Can you compare 2NF and 3NF specifically?"}
    ],
    temperature=0.7,                   # Optional: 0.0 (deterministic) to 1.0 (creative)
    top_p=0.9,                         # Optional: nucleus sampling parameter
    stop_sequences=["END", "DONE"],    # Optional: stop generation at these strings
    metadata={"user_id": "raj-123"}    # Optional: request metadata for your records
)

Key parameters explained:

max_tokens: Set this to comfortably accommodate the response you expect. Do not set it to the model's maximum every time — unused token budget does not cost anything, but setting it very high can lead to unnecessarily verbose responses. For short-answer tasks, 512 tokens is usually sufficient. For detailed analyses, 2048-4096.

temperature: Controls randomness. For factual or analytical tasks (data extraction, classification, summarization), use 0.0-0.3. For creative tasks (writing, brainstorming), use 0.7-1.0. For most workflow automation, 0.3 is a reasonable default.

system: The system prompt sets the AI's persona, capabilities, and behavioral constraints for the entire interaction. Well-designed system prompts dramatically improve output consistency.
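The temperature guidance above can be captured as a small helper that picks a starting value by task type. The buckets and values here are illustrative assumptions drawn from the ranges just given, not an official mapping:

```python
def suggested_temperature(task_type: str) -> float:
    """Map a task type to a starting temperature, per the ranges above."""
    analytical = {"extraction", "classification", "summarization"}
    creative = {"writing", "brainstorming"}
    if task_type in analytical:
        return 0.2   # factual/analytical tasks: 0.0-0.3
    if task_type in creative:
        return 0.8   # creative tasks: 0.7-1.0
    return 0.3       # workflow-automation default

print(suggested_temperature("classification"))  # → 0.2
```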

Handling API Responses

🐍 Code Block: Complete Response Handling

import anthropic
from anthropic import APIError, APIConnectionError, RateLimitError, AuthenticationError

client = anthropic.Anthropic()

def safe_api_call(prompt: str, system: str = "", model: str = "claude-opus-4-6") -> dict:
    """
    Make an API call with complete error handling.
    Returns a dict with 'success', 'text', 'usage', and 'error' keys.
    """
    try:
        response = client.messages.create(
            model=model,
            max_tokens=2048,
            system=system,
            messages=[{"role": "user", "content": prompt}]
        )

        # Check for truncation
        if response.stop_reason == "max_tokens":
            print("Warning: Response was truncated. Consider increasing max_tokens.")

        return {
            "success": True,
            "text": response.content[0].text,
            "usage": {
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
                "total_tokens": response.usage.input_tokens + response.usage.output_tokens
            },
            "stop_reason": response.stop_reason,
            "error": None
        }

    except AuthenticationError:
        return {"success": False, "text": None, "usage": None, "error": "Invalid API key. Check your ANTHROPIC_API_KEY environment variable."}
    except RateLimitError:
        return {"success": False, "text": None, "usage": None, "error": "Rate limit hit. Slow down your requests."}
    except APIConnectionError:
        return {"success": False, "text": None, "usage": None, "error": "Connection failed. Check your internet connection."}
    except APIError as e:
        # Note: not every APIError carries a status_code, so format the
        # exception itself rather than assuming specific attributes
        return {"success": False, "text": None, "usage": None, "error": f"API error: {e}"}

# Usage
result = safe_api_call(
    prompt="Summarize the three main benefits of containerization in software deployment.",
    system="You are a technical writer. Be concise and clear."
)

if result["success"]:
    print(result["text"])
    print(f"Tokens used: {result['usage']['total_tokens']}")
else:
    print(f"Error: {result['error']}")

The OpenAI Python SDK

The OpenAI SDK follows a similar pattern to Anthropic but with some structural differences.

🐍 Code Block: Basic OpenAI API Call

from openai import OpenAI

client = OpenAI()
# Automatically reads OPENAI_API_KEY from environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an expert technical writer."},
        {"role": "user", "content": "Explain REST APIs to a non-technical audience."}
    ],
    max_tokens=2048,
    temperature=0.7
)

print(response.choices[0].message.content)
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

Key structural differences from Anthropic:

  • OpenAI uses client.chat.completions.create() vs. Anthropic's client.messages.create()
  • OpenAI includes the system message in the messages list (as {"role": "system", ...})
  • OpenAI returns response.choices[0].message.content vs. Anthropic's response.content[0].text
  • Token counts use response.usage.prompt_tokens and response.usage.completion_tokens vs. input_tokens and output_tokens
  • Finish reason is in response.choices[0].finish_reason vs. response.stop_reason
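When code needs to run against both SDKs, the field differences above can be hidden behind a small normalizer. A sketch (the attribute names are exactly those listed above; SimpleNamespace stands in for the SDK response objects in the demo):

```python
from types import SimpleNamespace  # stand-in for SDK usage objects in this demo

def normalize_usage(provider: str, usage) -> dict:
    """Return token usage in one shape regardless of provider."""
    if provider == "anthropic":
        return {"input_tokens": usage.input_tokens,
                "output_tokens": usage.output_tokens}
    if provider == "openai":
        return {"input_tokens": usage.prompt_tokens,
                "output_tokens": usage.completion_tokens}
    raise ValueError(f"Unknown provider: {provider}")

print(normalize_usage("openai", SimpleNamespace(prompt_tokens=7, completion_tokens=3)))
```

The same pattern extends to the response text and finish reason if you need a fully provider-neutral call path.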

🐍 Code Block: OpenAI with Error Handling

from openai import OpenAI, AuthenticationError, RateLimitError, APIConnectionError, APIError

client = OpenAI()

def openai_call(prompt: str, system: str = "", model: str = "gpt-4o") -> dict:
    """OpenAI API call with complete error handling."""
    try:
        messages = []
        if system:
            messages.append({"role": "system", "content": system})
        messages.append({"role": "user", "content": prompt})

        response = client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=2048
        )

        return {
            "success": True,
            "text": response.choices[0].message.content,
            "usage": {
                "input_tokens": response.usage.prompt_tokens,
                "output_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens
            },
            "finish_reason": response.choices[0].finish_reason,
            "error": None
        }

    except AuthenticationError:
        return {"success": False, "text": None, "usage": None, "error": "Invalid OpenAI API key."}
    except RateLimitError:
        return {"success": False, "text": None, "usage": None, "error": "OpenAI rate limit hit."}
    except APIConnectionError:
        return {"success": False, "text": None, "usage": None, "error": "Connection failed."}
    except APIError as e:
        return {"success": False, "text": None, "usage": None, "error": f"OpenAI API error: {str(e)}"}

Managing Conversations: Multi-Turn Interactions

A multi-turn conversation maintains message history across exchanges, allowing the AI to reference earlier parts of the conversation.

🐍 Code Block: Conversation Manager (Anthropic)

import anthropic

client = anthropic.Anthropic()

def chat_session(system_prompt: str):
    """Simple multi-turn conversation manager."""
    messages = []
    print(f"System: {system_prompt}")
    print("Type 'quit' or 'exit' to end the session.\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ["quit", "exit"]:
            print("Session ended.")
            break
        if not user_input:
            continue

        messages.append({"role": "user", "content": user_input})

        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=2048,
            system=system_prompt,
            messages=messages
        )

        assistant_message = response.content[0].text
        messages.append({"role": "assistant", "content": assistant_message})
        print(f"\nClaude: {assistant_message}\n")

    return messages

# Run a session
conversation_history = chat_session(
    "You are a technical code reviewer specializing in Python best practices. "
    "When reviewing code, always explain your reasoning and suggest specific improvements."
)

Important: Managing context in long conversations

As conversations grow longer, the cumulative message history grows too, eventually approaching the model's context window limit. For automated pipelines that need sustained multi-turn context, implement a summarization strategy:

🐍 Code Block: Conversation with Context Summarization

import anthropic

client = anthropic.Anthropic()

def summarize_conversation(messages: list[dict]) -> str:
    """Condense a long conversation into a summary for context management."""
    conversation_text = "\n".join([
        f"{msg['role'].upper()}: {msg['content']}"
        for msg in messages
    ])

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Use fast model for utility tasks
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": (
                f"Summarize the key points from this conversation in 200 words or less. "
                f"Focus on decisions made, information established, and open questions:\n\n"
                f"{conversation_text}"
            )
        }]
    )
    return response.content[0].text

class ManagedConversation:
    """A conversation manager that summarizes when history gets long."""

    def __init__(self, system_prompt: str, max_turns_before_summary: int = 10):
        self.system_prompt = system_prompt
        self.messages = []
        self.max_turns = max_turns_before_summary
        self.summary_context = ""

    def send(self, user_input: str) -> str:
        """Send a message and get a response, managing context automatically."""
        # Check if we need to summarize
        if len(self.messages) >= self.max_turns * 2:  # *2 because each turn is 2 messages
            print("(Summarizing conversation history to manage context...)")
            self.summary_context = summarize_conversation(self.messages)
            self.messages = []  # Clear old messages

        self.messages.append({"role": "user", "content": user_input})

        # Build the system prompt, incorporating summary if available
        effective_system = self.system_prompt
        if self.summary_context:
            effective_system += f"\n\nContext from earlier in this conversation:\n{self.summary_context}"

        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=2048,
            system=effective_system,
            messages=self.messages
        )

        assistant_message = response.content[0].text
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

# Usage
conv = ManagedConversation(
    system_prompt="You are a code review assistant. Review code for correctness, efficiency, and style.",
    max_turns_before_summary=8
)

response = conv.send("Here's a Python function I wrote: def calc(x, y): return x/y")
print(f"Assistant: {response}")

response = conv.send("What if y is zero?")
print(f"Assistant: {response}")

Batch Processing: Working at Scale

Batch processing is where the API's value over the chat interface is most dramatic. Processing 100 items that would take hours of manual copy-paste work can be done in minutes.

🐍 Code Block: Batch Summarization Pipeline

import anthropic
import json
from pathlib import Path

def batch_summarize(texts: list[str], output_file: str = "summaries.json") -> list[dict]:
    """Summarize multiple texts using the API."""
    client = anthropic.Anthropic()
    results = []

    for i, text in enumerate(texts, 1):
        print(f"Processing {i}/{len(texts)}...")

        response = client.messages.create(
            model="claude-haiku-4-5-20251001",  # Use faster model for batch tasks
            max_tokens=512,
            messages=[{
                "role": "user",
                "content": f"Summarize the following in 2-3 sentences:\n\n{text}"
            }]
        )

        results.append({
            "original": text[:100] + "...",
            "summary": response.content[0].text,
            "tokens_used": response.usage.input_tokens + response.usage.output_tokens
        })

    Path(output_file).write_text(json.dumps(results, indent=2))
    print(f"Saved {len(results)} summaries to {output_file}")
    return results

# Example usage
sample_texts = [
    "Customer feedback: The onboarding process was confusing. I couldn't find where to set up integrations and the help docs weren't helpful. Eventually figured it out after 45 minutes but almost cancelled.",
    "Customer feedback: Love the product overall. The dashboard is intuitive and the reporting features are exactly what we needed. The mobile app could use some work — it's missing the bulk actions that the web version has.",
    "Customer feedback: Support was fantastic when I had issues. They responded within an hour and solved my problem completely. The price is a bit high compared to alternatives but the quality justifies it.",
]

summaries = batch_summarize(sample_texts)
for i, result in enumerate(summaries, 1):
    print(f"\n--- Item {i} ---")
    print(f"Summary: {result['summary']}")
    print(f"Tokens: {result['tokens_used']}")

Batch Processing with Progress Tracking and Error Recovery

For large batches, you need progress tracking and the ability to resume if the script fails partway through:

🐍 Code Block: Resumable Batch Processor

import anthropic
import json
import time
from pathlib import Path
from datetime import datetime

client = anthropic.Anthropic()

def process_batch_with_recovery(
    items: list[dict],
    process_fn,
    output_file: str = "batch_output.json",
    checkpoint_file: str = "batch_checkpoint.json"
) -> list[dict]:
    """
    Process a batch of items with checkpointing for recovery.

    Args:
        items: List of dicts, each with an 'id' key
        process_fn: Function that takes an item dict and returns a result dict
        output_file: Where to save completed results
        checkpoint_file: Where to save progress for recovery

    Returns: List of result dicts
    """
    # Load existing checkpoint if present
    checkpoint_path = Path(checkpoint_file)
    completed_ids = set()
    results = []

    if checkpoint_path.exists():
        checkpoint = json.loads(checkpoint_path.read_text())
        completed_ids = set(checkpoint.get("completed_ids", []))
        results = checkpoint.get("results", [])
        print(f"Resuming: {len(completed_ids)} items already processed")

    # Process remaining items
    remaining = [item for item in items if item["id"] not in completed_ids]
    total = len(items)

    for i, item in enumerate(remaining, 1):
        current_num = len(completed_ids) + i
        print(f"Processing {current_num}/{total}: {item['id']}")

        try:
            result = process_fn(item)
            results.append(result)
            completed_ids.add(item["id"])

            # Save checkpoint every 10 items
            if current_num % 10 == 0:
                checkpoint_data = {
                    "completed_ids": list(completed_ids),
                    "results": results,
                    "last_updated": datetime.now().isoformat()
                }
                checkpoint_path.write_text(json.dumps(checkpoint_data, indent=2))
                print(f"  Checkpoint saved at item {current_num}")

        except Exception as e:
            print(f"  ERROR processing {item['id']}: {e}")
            results.append({
                "id": item["id"],
                "success": False,
                "error": str(e)
            })

        # Small delay to avoid rate limits
        time.sleep(0.5)

    # Save final output
    Path(output_file).write_text(json.dumps(results, indent=2))

    # Clean up checkpoint
    if checkpoint_path.exists():
        checkpoint_path.unlink()

    print(f"\nBatch complete. {len(results)} items processed.")
    print(f"Results saved to {output_file}")
    return results


def classify_feedback(item: dict) -> dict:
    """Example processing function: classify customer feedback."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": (
                f"Classify this customer feedback:\n\n{item['text']}\n\n"
                "Respond with a JSON object containing:\n"
                "- sentiment: positive/neutral/negative\n"
                "- category: product/support/pricing/ux/other\n"
                "- priority: high/medium/low\n"
                "- key_issue: one sentence summary of the main point\n\n"
                "Respond with only the JSON object, no other text."
            )
        }]
    )

    try:
        classification = json.loads(response.content[0].text)
    except json.JSONDecodeError:
        classification = {"raw_response": response.content[0].text}

    return {
        "id": item["id"],
        "success": True,
        "classification": classification,
        "tokens_used": response.usage.input_tokens + response.usage.output_tokens
    }


# Example usage with the resumable processor
feedback_items = [
    {"id": "fb001", "text": "The API documentation is excellent and the SDKs are well-designed."},
    {"id": "fb002", "text": "Pricing is way too high for small teams. We almost churned because of cost."},
    {"id": "fb003", "text": "Support team responded quickly but didn't resolve the actual issue."},
]

results = process_batch_with_recovery(
    items=feedback_items,
    process_fn=classify_feedback,
    output_file="feedback_classifications.json"
)

# Report results
successful = [r for r in results if r.get("success")]
print(f"\n{len(successful)}/{len(results)} items processed successfully")

Rate Limiting and Retry Logic

Rate limits are not errors to avoid — they are a normal part of API usage, especially when processing large batches. Robust code handles them gracefully.

🐍 Code Block: Exponential Backoff Retry

import time
import anthropic
from anthropic import RateLimitError, APIConnectionError

def api_call_with_retry(
    client: anthropic.Anthropic,
    max_retries: int = 3,
    base_wait: float = 1.0,
    **kwargs
) -> anthropic.types.Message:
    """
    API call with exponential backoff on rate limit and connection errors.

    Args:
        client: Anthropic client instance
        max_retries: Maximum number of retry attempts
        base_wait: Base wait time in seconds (doubles each retry)
        **kwargs: Arguments to pass to client.messages.create()

    Returns: API response
    Raises: The last exception if all retries are exhausted
    """
    last_exception = None

    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)

        except RateLimitError as e:
            last_exception = e
            if attempt == max_retries - 1:
                raise

            wait_time = base_wait * (2 ** attempt)
            print(f"Rate limited. Waiting {wait_time:.1f}s before retry {attempt + 1}/{max_retries - 1}...")
            time.sleep(wait_time)

        except APIConnectionError as e:
            last_exception = e
            if attempt == max_retries - 1:
                raise

            wait_time = base_wait * (2 ** attempt)
            print(f"Connection error. Waiting {wait_time:.1f}s before retry {attempt + 1}/{max_retries - 1}...")
            time.sleep(wait_time)

    raise last_exception


# More sophisticated version with jitter to prevent thundering herd
import random

def api_call_with_jitter_retry(
    client: anthropic.Anthropic,
    max_retries: int = 3,
    base_wait: float = 1.0,
    **kwargs
) -> anthropic.types.Message:
    """
    API call with exponential backoff and jitter.
    Jitter prevents multiple concurrent requests from all retrying simultaneously.
    """
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)

        except (RateLimitError, APIConnectionError) as e:
            if attempt == max_retries - 1:
                raise

            # Exponential backoff with random jitter
            wait_time = base_wait * (2 ** attempt) + random.uniform(0, 1)
            print(f"Retry {attempt + 1}/{max_retries - 1} after {wait_time:.1f}s...")
            time.sleep(wait_time)


# Usage
client = anthropic.Anthropic()

response = api_call_with_retry(
    client,
    max_retries=3,
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain exponential backoff in one paragraph."}]
)
print(response.content[0].text)

Rate Limit Management: Proactive Approach

Rather than waiting for rate limit errors, you can manage your request rate proactively using a token bucket or simple delay strategy:
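A token bucket allows short bursts while enforcing an average rate. A minimal bucket might look like this (a sketch, not tuned to any provider's actual limits):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling at `rate` per second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum stored tokens (burst size)
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1.0:
            time.sleep((1.0 - self.tokens) / self.rate)
            self.tokens = 1.0
        self.tokens -= 1.0
```

Call bucket.acquire() before each API request; for example, rate=50/60 with capacity=5 corresponds roughly to 50 requests per minute with small bursts allowed.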

🐍 Code Block: Rate-Controlled Batch Processing

import anthropic
import time
from collections import deque
from datetime import datetime, timedelta

client = anthropic.Anthropic()

class RateLimitedProcessor:
    """
    Process API requests at a controlled rate to stay within limits.
    Uses a sliding window to track requests per minute.
    """

    def __init__(self, requests_per_minute: int = 50):
        self.requests_per_minute = requests_per_minute
        self.request_times = deque()
        self.min_interval = 60.0 / requests_per_minute

    def _wait_if_needed(self):
        """Wait if we are approaching the rate limit."""
        now = datetime.now()
        cutoff = now - timedelta(minutes=1)

        # Remove requests older than 1 minute
        while self.request_times and self.request_times[0] < cutoff:
            self.request_times.popleft()

        # If at limit, wait until oldest request falls outside window
        if len(self.request_times) >= self.requests_per_minute:
            oldest = self.request_times[0]
            wait_until = oldest + timedelta(minutes=1)
            wait_seconds = (wait_until - now).total_seconds()
            if wait_seconds > 0:
                print(f"Rate limiting: waiting {wait_seconds:.1f}s")
                time.sleep(wait_seconds)

    def process(self, prompt: str, system: str = "", model: str = "claude-haiku-4-5-20251001") -> str:
        """Make an API call with rate limiting."""
        self._wait_if_needed()
        self.request_times.append(datetime.now())

        response = client.messages.create(
            model=model,
            max_tokens=512,
            system=system,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text


# Usage
processor = RateLimitedProcessor(requests_per_minute=40)

items = [f"Summarize point {i}: AI APIs enable scalable automation" for i in range(1, 6)]
results = []

for i, item in enumerate(items, 1):
    result = processor.process(item)
    results.append(result)
    print(f"Processed {i}/{len(items)}")

print(f"\nAll {len(results)} items processed.")
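
The sliding-window class above is one proactive strategy; the token bucket mentioned earlier is the other common one. Here is a minimal sketch — pure rate-limiting logic with no API calls, using an injectable clock so it can be tested without sleeping. The class and method names are illustrative, not from any SDK.

🐍 Code Block: Token Bucket Rate Limiter (Sketch)

```python
import time

class TokenBucket:
    """Token bucket rate limiter: tokens refill continuously at a fixed
    rate up to a maximum capacity; each request spends one token."""

    def __init__(self, rate_per_second: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = float(capacity)  # start full, allowing an initial burst
        self.clock = clock
        self.last_refill = clock()

    def _refill(self):
        """Add tokens for the time elapsed since the last refill."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

    def try_acquire(self) -> bool:
        """Spend one token if available; return False instead of blocking."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def seconds_until_available(self) -> float:
        """How long until the next token refills (0 if one is available now)."""
        self._refill()
        if self.tokens >= 1:
            return 0.0
        return (1 - self.tokens) / self.rate
```

A caller would `time.sleep(bucket.seconds_until_available())` before each API request. Unlike the sliding window, the bucket permits short bursts up to `capacity` while still enforcing the average rate — useful when batch items arrive unevenly.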

Token Counting and Cost Management

Tokens are the currency of API use. Understanding how to count, estimate, and manage token usage keeps costs predictable and prevents surprises.

🐍 Code Block: Token Estimation and Cost Calculation

import anthropic

client = anthropic.Anthropic()

# Anthropic pricing (verify current prices at anthropic.com/pricing)
PRICING = {
    "claude-opus-4-6": {
        "input_per_million": 15.00,
        "output_per_million": 75.00
    },
    "claude-haiku-4-5-20251001": {
        "input_per_million": 0.25,
        "output_per_million": 1.25
    }
}

def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    model: str = "claude-opus-4-6"
) -> dict:
    """Calculate estimated cost for an API call."""
    if model not in PRICING:
        return {"error": f"Unknown model: {model}"}

    prices = PRICING[model]
    input_cost = (input_tokens / 1_000_000) * prices["input_per_million"]
    output_cost = (output_tokens / 1_000_000) * prices["output_per_million"]
    total_cost = input_cost + output_cost

    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "input_cost_usd": round(input_cost, 6),
        "output_cost_usd": round(output_cost, 6),
        "total_cost_usd": round(total_cost, 6)
    }


class CostTracker:
    """Track cumulative API costs across multiple calls."""

    def __init__(self):
        self.total_input_tokens = 0
        self.total_output_tokens = 0
        self.call_count = 0
        self.model_usage = {}

    def record(self, response: anthropic.types.Message, model: str):
        """Record usage from an API response."""
        self.total_input_tokens += response.usage.input_tokens
        self.total_output_tokens += response.usage.output_tokens
        self.call_count += 1

        if model not in self.model_usage:
            self.model_usage[model] = {"input": 0, "output": 0, "calls": 0}
        self.model_usage[model]["input"] += response.usage.input_tokens
        self.model_usage[model]["output"] += response.usage.output_tokens
        self.model_usage[model]["calls"] += 1

    def report(self) -> dict:
        """Generate a cost report."""
        total_cost = 0.0
        model_costs = {}

        for model, usage in self.model_usage.items():
            cost_info = estimate_cost(usage["input"], usage["output"], model)
            if "error" not in cost_info:
                model_costs[model] = cost_info
                total_cost += cost_info["total_cost_usd"]

        return {
            "total_calls": self.call_count,
            "total_input_tokens": self.total_input_tokens,
            "total_output_tokens": self.total_output_tokens,
            "total_tokens": self.total_input_tokens + self.total_output_tokens,
            "estimated_total_cost_usd": round(total_cost, 4),
            "by_model": model_costs
        }


# Example usage
tracker = CostTracker()

prompts = [
    "Explain Python list comprehensions in one paragraph.",
    "What is the difference between a set and a list in Python?",
    "When would you use a dictionary over a list?"
]

for prompt in prompts:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}]
    )
    tracker.record(response, "claude-haiku-4-5-20251001")
    print(f"Response: {response.content[0].text[:100]}...")

report = tracker.report()
print("\n=== Cost Report ===")
print(f"Total calls: {report['total_calls']}")
print(f"Total tokens: {report['total_tokens']:,}")
print(f"Estimated cost: ${report['estimated_total_cost_usd']:.4f}")

Cost Optimization Strategies

💡 Model selection saves the most money. For tasks that do not require frontier-model reasoning — classification, summarization, extraction, formatting — a fast, economical model (claude-haiku or gpt-4o-mini) produces comparable results at a fraction of the cost. Use powerful models where capability matters; use economical models where speed and cost matter.
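
That tiering decision can be encoded directly in code rather than left to habit. A hypothetical routing helper — the task categories and model names (reused from this chapter's examples) are assumptions to adapt to current model availability:

🐍 Code Block: Model Routing by Task Type (Sketch)

```python
# Economical models for mechanical tasks; frontier models only where
# reasoning quality matters. Model names follow this chapter's examples.
ECONOMY_MODEL = "claude-haiku-4-5-20251001"
FRONTIER_MODEL = "claude-opus-4-6"

# Tasks that typically do not require frontier-model reasoning
ECONOMY_TASKS = {"classification", "summarization", "extraction", "formatting"}

def select_model(task_type: str) -> str:
    """Route mechanical tasks to the economical model, everything else up."""
    return ECONOMY_MODEL if task_type in ECONOMY_TASKS else FRONTIER_MODEL
```

Routing 1,000 classification calls of roughly 1,000 input tokens each through the economical tier instead of the frontier tier moves that batch's input cost from $15.00 to $0.25 at the prices listed earlier — a 60x difference for identical-quality results on a mechanical task.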

💡 Prompt efficiency matters. Longer system prompts and more verbose instructions cost more per call. For batch processing where the same system prompt is sent with every request, even a 100-token reduction in the system prompt saves 100 tokens × number of items. For a 1,000-item batch, that is 100,000 tokens saved.
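
The arithmetic is worth making concrete. A quick sketch using the input prices from the PRICING table above:

🐍 Code Block: System Prompt Trimming Savings

```python
# Savings from trimming a system prompt that is resent with every batch item
tokens_saved_per_item = 100
batch_size = 1_000
total_tokens_saved = tokens_saved_per_item * batch_size  # 100,000 tokens

# Input prices per million tokens, from the PRICING table above
for model, price_per_million in [
    ("claude-haiku-4-5-20251001", 0.25),
    ("claude-opus-4-6", 15.00),
]:
    saved_usd = (total_tokens_saved / 1_000_000) * price_per_million
    print(f"{model}: ${saved_usd:.3f} saved per {batch_size:,}-item batch")
```

The dollar impact scales with the model's input price: the same 100,000 trimmed tokens are worth about $0.025 at Haiku rates but $1.50 at Opus rates — and proportionally more as batch sizes grow.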

💡 Caching input tokens. Anthropic offers prompt caching for repeated system prompts, reducing the cost of the cached portion significantly. If your batch processing uses the same system prompt for every item, prompt caching can reduce costs substantially. Consult current Anthropic documentation for caching implementation.
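
As a sketch of what caching looks like in practice: the `cache_control` block below follows Anthropic's documented prompt-caching pattern at the time of writing, but verify the current API shape against the official documentation before relying on it. The example builds the request kwargs without making a call.

🐍 Code Block: Prompt Caching Request Structure (Sketch)

```python
# A long system prompt reused across every item in a batch. Marking it
# with cache_control lets subsequent calls within the cache lifetime
# reuse it at a reduced input-token rate.
LONG_SYSTEM_PROMPT = "You are a document analyst. " + "Detailed instructions... " * 50

def build_cached_request(user_prompt: str) -> dict:
    """Build kwargs for client.messages.create() with a cacheable system block."""
    return {
        "model": "claude-haiku-4-5-20251001",
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # mark this block cacheable
            }
        ],
        "messages": [{"role": "user", "content": user_prompt}],
    }

# In a batch loop, the same kwargs shape is passed for every item:
# client.messages.create(**build_cached_request(item))
```

Note that providers impose a minimum size on cacheable prompt segments, so caching pays off for long, stable system prompts rather than short ones.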

Streaming Responses

For long outputs, streaming displays text as it is generated rather than waiting for the complete response. This improves user experience in interactive applications.

🐍 Code Block: Streaming with Anthropic

import anthropic

client = anthropic.Anthropic()

def stream_response(prompt: str, system: str = "") -> str:
    """Stream a response, printing text as it arrives. Returns the full text."""
    full_text = []

    print("Response: ", end="", flush=True)

    with client.messages.stream(
        model="claude-opus-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": prompt}]
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            full_text.append(text)

        # Get final message with usage stats before the stream closes
        final_message = stream.get_final_message()

    print()  # New line after streaming completes
    print(f"\nTokens used: {final_message.usage.input_tokens} in, {final_message.usage.output_tokens} out")

    return "".join(full_text)


# Interactive streaming example
result = stream_response(
    prompt="Write a detailed comparison of synchronous and asynchronous programming in Python.",
    system="You are a Python educator. Use clear examples and explain concepts thoroughly."
)

print(f"\nTotal response length: {len(result)} characters")

🐍 Code Block: Streaming with OpenAI

from openai import OpenAI

client = OpenAI()

def stream_openai_response(prompt: str, system: str = "") -> str:
    """Stream a response from OpenAI."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})

    full_text = []
    print("Response: ", end="", flush=True)

    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=2048,
        stream=True
    )

    for chunk in stream:
        # Some chunks (e.g., a final usage chunk) may carry no choices
        if chunk.choices and chunk.choices[0].delta.content is not None:
            text = chunk.choices[0].delta.content
            print(text, end="", flush=True)
            full_text.append(text)

    print()
    return "".join(full_text)


result = stream_openai_response(
    prompt="Explain the event loop in JavaScript in detail.",
    system="You are a JavaScript educator."
)

Practical Automation Examples

Email Triage and Response Drafting

🐍 Code Block: Email Triage System

import anthropic
import json
from dataclasses import dataclass
from typing import Optional

client = anthropic.Anthropic()

@dataclass
class EmailTriageResult:
    """Result of triaging an email."""
    category: str
    priority: str
    sentiment: str
    requires_response: bool
    suggested_assignee: str
    summary: str
    draft_response: Optional[str]
    confidence: float

def triage_email(
    subject: str,
    body: str,
    sender: str,
    generate_draft: bool = True
) -> EmailTriageResult:
    """
    Triage an email: classify, prioritize, and optionally draft a response.
    Uses a two-step chain: classify first, then draft if needed.
    """
    # Step 1: Classify and prioritize
    classification_prompt = f"""Analyze this email and provide a structured classification.

From: {sender}
Subject: {subject}
Body: {body}

Respond with a JSON object containing:
{{
  "category": "sales_inquiry|support_request|billing|complaint|partnership|spam|internal|other",
  "priority": "urgent|high|medium|low",
  "sentiment": "positive|neutral|negative|angry",
  "requires_response": true|false,
  "suggested_assignee": "sales|support|billing|management|no_action",
  "summary": "one sentence summary of the email's main point",
  "confidence": 0.0 to 1.0
}}

Respond with only the JSON object."""

    classification_response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=256,
        messages=[{"role": "user", "content": classification_prompt}]
    )

    try:
        classification = json.loads(classification_response.content[0].text)
    except json.JSONDecodeError:
        # Fallback if JSON parsing fails
        classification = {
            "category": "other",
            "priority": "medium",
            "sentiment": "neutral",
            "requires_response": True,
            "suggested_assignee": "support",
            "summary": "Unable to parse classification",
            "confidence": 0.0
        }

    # Step 2: Draft response if needed
    draft = None
    if generate_draft and classification.get("requires_response", False):
        draft_prompt = f"""Draft a professional response to this email.

Original email:
From: {sender}
Subject: {subject}
Body: {body}

Classification context:
- Category: {classification.get('category')}
- Priority: {classification.get('priority')}
- Assignee team: {classification.get('suggested_assignee')}

Draft a response that:
- Acknowledges the email promptly and professionally
- Sets appropriate expectations (response time, next steps)
- Matches the tone appropriately (formal for complaints, friendly for general inquiries)
- Is concise (under 150 words)

Do not include subject line. Write only the email body."""

        draft_response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=512,
            messages=[{"role": "user", "content": draft_prompt}]
        )
        draft = draft_response.content[0].text

    return EmailTriageResult(
        category=classification.get("category", "other"),
        priority=classification.get("priority", "medium"),
        sentiment=classification.get("sentiment", "neutral"),
        requires_response=classification.get("requires_response", True),
        suggested_assignee=classification.get("suggested_assignee", "support"),
        summary=classification.get("summary", ""),
        draft_response=draft,
        confidence=classification.get("confidence", 0.0)
    )


# Example usage
result = triage_email(
    sender="jane.smith@prospect.com",
    subject="Interested in your enterprise plan",
    body="Hi, I saw your pricing page and I'm interested in the enterprise plan for our team of 50. Can we schedule a call to discuss? We're looking to make a decision by end of month.",
    generate_draft=True
)

print(f"Category: {result.category}")
print(f"Priority: {result.priority}")
print(f"Assigned to: {result.suggested_assignee}")
print(f"Summary: {result.summary}")
if result.draft_response:
    print(f"\nDraft Response:\n{result.draft_response}")

Document Analysis Pipeline

🐍 Code Block: Document Analysis Pipeline

import anthropic
import json
from pathlib import Path

client = anthropic.Anthropic()

def analyze_document(
    document_text: str,
    analysis_type: str = "general",
    custom_questions: list[str] | None = None
) -> dict:
    """
    Analyze a document with structured output.

    analysis_type: "general" | "legal" | "financial" | "technical" | "custom"
    custom_questions: list of specific questions to answer about the document
    """

    # Build analysis prompt based on type
    if analysis_type == "general":
        analysis_instructions = """Extract and provide:
1. DOCUMENT TYPE: What type of document is this?
2. MAIN SUBJECT: What is this document primarily about? (2-3 sentences)
3. KEY POINTS: The 5 most important points or findings
4. ACTION ITEMS: Any required actions, deadlines, or next steps mentioned
5. IMPORTANT ENTITIES: Key people, organizations, dates, or amounts mentioned
6. DOCUMENT QUALITY: Assessment of completeness and clarity (high/medium/low)"""

    elif analysis_type == "financial":
        analysis_instructions = """Extract and provide:
1. DOCUMENT TYPE: Invoice, contract, report, statement, etc.
2. PARTIES INVOLVED: Organizations and individuals
3. KEY FINANCIAL FIGURES: All monetary amounts with context
4. DATES: Payment dates, contract periods, reporting periods
5. OBLIGATIONS: What each party is required to do or pay
6. RISK FLAGS: Anything unusual or potentially problematic"""

    elif analysis_type == "technical":
        analysis_instructions = """Extract and provide:
1. TECHNOLOGY SCOPE: What systems, languages, or technologies are discussed?
2. ARCHITECTURE OVERVIEW: How are components structured or related?
3. REQUIREMENTS: Functional and non-functional requirements stated
4. DEPENDENCIES: External systems, libraries, or services required
5. OPEN ISSUES: Problems, bugs, or unresolved questions mentioned
6. TECHNICAL DEBT: Any legacy issues or improvement opportunities noted"""

    elif analysis_type == "custom" and custom_questions:
        questions_text = "\n".join([f"{i+1}. {q}" for i, q in enumerate(custom_questions)])
        analysis_instructions = f"Answer the following questions about the document:\n{questions_text}"
    else:
        analysis_instructions = "Provide a comprehensive summary of the document's content and purpose."

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=2048,
        system=(
            "You are a document analyst. Analyze documents thoroughly and accurately. "
            "Always base your analysis on what is explicitly stated in the document, "
            "not on assumptions. Flag any uncertainty explicitly."
        ),
        messages=[{
            "role": "user",
            "content": f"{analysis_instructions}\n\nDOCUMENT:\n{document_text}"
        }]
    )

    return {
        "analysis_type": analysis_type,
        "analysis": response.content[0].text,
        "tokens_used": response.usage.input_tokens + response.usage.output_tokens
    }


def batch_analyze_documents(
    document_folder: str,
    analysis_type: str = "general",
    output_file: str = "document_analyses.json"
) -> list[dict]:
    """Analyze all text files in a folder."""
    folder = Path(document_folder)
    text_files = list(folder.glob("*.txt")) + list(folder.glob("*.md"))

    if not text_files:
        print(f"No text files found in {document_folder}")
        return []

    results = []

    for i, file_path in enumerate(text_files, 1):
        print(f"Analyzing {i}/{len(text_files)}: {file_path.name}")
        document_text = file_path.read_text(encoding="utf-8")

        result = analyze_document(document_text, analysis_type)
        result["filename"] = file_path.name
        results.append(result)

    Path(output_file).write_text(json.dumps(results, indent=2))
    print(f"\nAnalyzed {len(results)} documents. Results saved to {output_file}")
    return results

Integrating with Files, Databases, and Spreadsheets

🐍 Code Block: CSV Processing Pipeline

import anthropic
import csv
import json
from io import StringIO

client = anthropic.Anthropic()

def process_csv_with_ai(
    csv_content: str,
    task_description: str,
    output_format: str = "json"
) -> str:
    """
    Process CSV data with AI assistance.

    Args:
        csv_content: Raw CSV string
        task_description: What you want the AI to do with the data
        output_format: "json" | "csv" | "text"

    Returns: AI-processed result as string
    """
    # Parse CSV to understand structure
    reader = csv.DictReader(StringIO(csv_content))
    rows = list(reader)
    headers = reader.fieldnames or []

    # Provide context about the data structure
    data_context = (
        f"The CSV has {len(rows)} rows and {len(headers)} columns.\n"
        f"Columns: {', '.join(headers)}\n"
        f"First 3 rows sample:\n"
        f"{json.dumps(rows[:3], indent=2)}"
    )

    format_instruction = {
        "json": "Respond with a valid JSON object or array.",
        "csv": "Respond with valid CSV data, including a header row.",
        "text": "Respond with a clear text explanation or analysis."
    }.get(output_format, "Respond clearly.")

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=4096,
        system=(
            "You are a data analyst. Process data accurately. "
            "When working with CSV or tabular data, be precise about column names and values."
        ),
        messages=[{
            "role": "user",
            "content": (
                f"DATA CONTEXT:\n{data_context}\n\n"
                f"FULL DATA:\n{csv_content}\n\n"
                f"TASK: {task_description}\n\n"
                f"OUTPUT FORMAT: {format_instruction}"
            )
        }]
    )

    return response.content[0].text


# Example: Categorize and enrich a CSV of customer records
sample_csv = """customer_id,name,email,signup_date,plan,monthly_spend
C001,Acme Corp,admin@acme.com,2023-01-15,professional,450
C002,Beta Inc,contact@beta.com,2023-03-22,starter,95
C003,Gamma LLC,info@gamma.com,2022-11-08,enterprise,2100
C004,Delta Co,hello@delta.com,2024-01-30,starter,95
C005,Epsilon Partners,ops@epsilon.com,2023-07-14,professional,450"""

result = process_csv_with_ai(
    csv_content=sample_csv,
    task_description=(
        "For each customer, add two new columns: "
        "1) customer_segment: classify as 'high_value' (>$1000/mo), 'mid_market' ($200-$999/mo), or 'small_business' (<$200/mo). "
        "2) churn_risk: based on plan and spending patterns, classify as 'low', 'medium', or 'high'."
    ),
    output_format="csv"
)

print("Enriched CSV:")
print(result)

Security: API Key Management

Security is non-negotiable when working with API keys. Follow these practices consistently.

Best Practice: Environment Variables

Never hardcode API keys in source code. Use environment variables:

import os
from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("ANTHROPIC_API_KEY")
if not api_key:
    raise ValueError(
        "ANTHROPIC_API_KEY environment variable not set. "
        "Create a .env file with: ANTHROPIC_API_KEY=your-key-here"
    )

Best Practice: .gitignore Configuration

# In your .gitignore file
.env
.env.local
.env.*.local
*.env
secrets.json
credentials.json

Best Practice: Audit Your Repository

Before adding any code to version control, verify no keys are present:

# Search for potential API key patterns in your code
grep -r "sk-ant-" .
grep -r "sk-" .
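
The same audit can be scripted in Python, for instance as a pre-commit check. A minimal sketch — the regex catches the common `sk-`-prefixed key shapes but is deliberately simple, not exhaustive, and the function names are illustrative:

🐍 Code Block: Key Pattern Scanner (Sketch)

```python
import re
from pathlib import Path

# Matches key-like strings such as "sk-ant-..." or "sk-..." followed by a
# long run of token characters. A simple heuristic, not a complete scanner.
KEY_PATTERN = re.compile(r"sk-(?:ant-)?[A-Za-z0-9_-]{20,}")

def scan_text(text: str) -> list[str]:
    """Return all key-like matches found in a string."""
    return KEY_PATTERN.findall(text)

def scan_tree(root: str) -> dict[str, list[str]]:
    """Scan likely source and config files under root for key-like strings."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.suffix in {".py", ".txt", ".json"} or path.name.startswith(".env"):
            try:
                matches = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
            except OSError:
                continue  # unreadable file; skip
            if matches:
                findings[str(path)] = matches
    return findings
```

Dedicated tools such as git-secrets or trufflehog do this more thoroughly; the point of the sketch is that the check is cheap enough to run before every commit.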

Best Practice: Key Rotation

If you suspect a key has been exposed, rotate it immediately through the provider's dashboard. Do not wait to confirm exposure — rotate first, investigate second.

⚠️ Common Pitfall: Committing .env to a public repository. This is the most common API key exposure vector. Even if you delete the key from the repository later, it may have been captured by automated scanning services within minutes of the commit.

Research Breakdown: Developer AI API Usage Patterns

Research on API usage patterns among software developers and technical practitioners reveals several consistent findings.

The most common initial use of AI APIs is single-call, single-prompt integration — taking a chat-style interaction and putting it behind an API call. These integrations tend to remain low-value until practitioners make two shifts: moving to structured outputs (JSON rather than prose) and implementing chaining.

Structured output adoption is strongly correlated with successful automation. When an API call returns structured JSON that can be programmatically processed, it integrates reliably into existing systems. When it returns prose that must be parsed, the integration is fragile — any change in the AI's phrasing can break downstream parsing.

Batch processing is consistently the highest-ROI use case for API adoption among non-engineer practitioners who learn Python basics. The ability to process hundreds of items overnight versus one at a time during working hours represents a qualitative change in what is possible, not just a productivity improvement.

Error handling and retry logic are frequently underimplemented in initial API integrations and are a frequent cause of batch job failures. The practitioners who build robust integrations invest in error handling upfront rather than adding it after their first batch job fails at item 347 of 500.
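
The item-347-of-500 failure mode has a standard defense: checkpoint completed work to disk so a rerun resumes where the last run stopped instead of starting over. A minimal sketch with file-based checkpointing — the function names are illustrative:

🐍 Code Block: Resumable Batch with Checkpointing (Sketch)

```python
import json
from pathlib import Path
from typing import Callable

def run_batch_with_checkpoint(
    items: list[str],
    process: Callable[[str], str],
    checkpoint_path: str = "batch_checkpoint.json",
) -> dict[str, str]:
    """Process items, persisting each result as it completes.
    On rerun, items already in the checkpoint file are skipped."""
    checkpoint = Path(checkpoint_path)
    results: dict[str, str] = (
        json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    )

    for i, item in enumerate(items):
        key = str(i)
        if key in results:
            continue  # completed in a previous run
        results[key] = process(item)
        checkpoint.write_text(json.dumps(results))  # persist after every item

    return results
```

If `process` raises at item 347, items 0 through 346 are already on disk; after fixing the error, rerunning touches only the remainder. The same idea protects against rate-limit aborts, network drops, and overnight job interruptions.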

Cost surprises are common among API beginners. The most effective mitigation: track and report costs from the first day of API use, not after the first unexpectedly large bill. The CostTracker class pattern above, or a similar approach, should be part of every API integration from the start.


Continue to Chapter 37 to learn how to build configured AI systems — custom GPTs, Claude Projects, and API-based assistants — that come pre-loaded with the context and instructions they need.