In This Chapter
- Beyond the Chat Interface
- Why Use the API Rather Than the Chat Interface?
- API Concepts: What You Need to Know
- Setting Up Your Environment
- The Anthropic Python SDK: Comprehensive Coverage
- The OpenAI Python SDK
- Managing Conversations: Multi-Turn Interactions
- Batch Processing: Working at Scale
- Rate Limiting and Retry Logic
- Token Counting and Cost Management
- Streaming Responses
- Practical Automation Examples
- Integrating with Files, Databases, and Spreadsheets
- Security: API Key Management
- Research Breakdown: Developer AI API Usage Patterns
Chapter 36: Programmatic AI — APIs, Python, and Automations
Beyond the Chat Interface
The chat interface is a remarkable thing. It puts a powerful AI assistant within reach of anyone — no setup, no code, no technical knowledge required. For many uses, it is exactly the right tool.
But the chat interface has fundamental constraints. It is designed for one person asking one question at a time. It stores no persistent state beyond the current session. It cannot trigger on external events. It cannot process a folder of 500 documents overnight. It cannot be integrated into an existing application or data pipeline. It cannot run on a schedule without a human clicking "send."
The API removes all of these constraints.
An API (Application Programming Interface) is how software systems communicate with each other. When you call the Anthropic API from a Python script, you are programmatically sending a message to Claude and receiving a response — the same fundamental interaction as the chat interface, but now under your complete control. You control exactly what is sent, exactly what happens with the response, how errors are handled, how many requests run in parallel, what gets logged, and what triggers a request in the first place.
This chapter builds your complete foundation for programmatic AI use. It covers the Anthropic and OpenAI Python SDKs, multi-turn conversation management, batch processing, rate limiting and retry logic, streaming, cost management, and a set of practical automation examples you can adapt directly to your work. All code in this chapter is syntactically correct and runnable.
You do not need to be a software engineer to benefit from this chapter. If you can install Python packages and run a script, you can build useful AI automations. The chapter is written to be accessible to practitioners with basic Python familiarity while also providing depth for those with more experience.
Why Use the API Rather Than the Chat Interface?
Before writing a line of code, it is worth being clear about when the API is the right choice and when it is not.
Use the API when:
You need to process multiple items systematically — analyzing 200 customer survey responses, generating descriptions for 500 product SKUs, translating a collection of documents. The chat interface requires a human to submit each request; the API can process them in a loop while you do other things.
You need to trigger AI on external events — a new support ticket arrives, a form is submitted, a file appears in a folder, a database record changes. The API can be called from event handlers, webhooks, and scheduled jobs.
You need to integrate AI into an existing system — your CRM, your document management platform, your internal tools. The API is how you add AI capabilities to software that did not previously have them.
You need precise control over prompts, models, and parameters for consistent results — the chat interface's behavior varies based on conversational context in ways that can be unpredictable; the API gives you complete control over every parameter of every request.
You need to build something other people can use — a configured assistant, a Slack bot, an internal tool, a customer-facing feature. The API is the mechanism.
Stay with the chat interface when:
The task is genuinely one-off and involves real-time human judgment throughout. If you will read and evaluate each AI response before deciding what to do next, the chat interface is appropriate.
You are exploring a problem space — brainstorming, drafting, thinking out loud. The conversational format is a feature, not a limitation, in exploratory work.
The output is for your own consumption and does not need to be integrated with other systems.
You are working on a sensitive task where you want maximum visibility into each interaction. The chat interface provides a natural audit trail of your session.
The API is not inherently better than the chat interface. It is a different tool for different purposes. The practitioner who knows when to use each is more effective than one who has adopted a blanket preference.
API Concepts: What You Need to Know
Before writing code, a brief tour of the concepts that govern API use.
Endpoints are the specific URLs you send requests to. Each API provider has different endpoints for different capabilities (text generation, image generation, embeddings, etc.). This chapter focuses on the text generation (chat completions) endpoints.
Authentication proves to the API that you are authorized to use it. API keys are the standard mechanism — a long string of characters that acts as a password for your account. Keep API keys secret and never put them in code that will be shared or version-controlled.
Requests and responses follow a predictable structure. You send a request containing your model selection, your messages, and optional parameters; you receive a response containing the AI's output, usage statistics, and metadata.
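To make that structure concrete, here is what the SDKs are doing for you under the hood: a single HTTP POST with a JSON body, using only the Python standard library. The endpoint URL and header names follow Anthropic's published Messages API, but treat the exact values as something to verify against current documentation.

```python
import json
import os
import urllib.request

# The SDKs wrap an HTTP POST like this one.
URL = "https://api.anthropic.com/v1/messages"

payload = {
    "model": "claude-haiku-4-5-20251001",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
headers = {
    "x-api-key": os.getenv("ANTHROPIC_API_KEY", ""),  # authentication
    "anthropic-version": "2023-06-01",                # API version header
    "content-type": "application/json",
}

def send_raw_request() -> dict:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        URL, data=json.dumps(payload).encode("utf-8"), headers=headers
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())

# Calling send_raw_request() returns a dict containing the response text,
# usage statistics, and metadata described above.
```

In practice you will use the SDKs, which add retries, typed responses, and error classes on top of this; the sketch is only to show that nothing magical separates a request from ordinary HTTP.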
Rate limits cap how many requests you can make per minute or per day. Hitting a rate limit returns an error rather than a response. Your code needs to handle these errors gracefully — either by waiting and retrying, or by managing request timing to avoid hitting limits.
Tokens and cost are how API usage is measured and billed. A token is roughly four characters or three-quarters of a word. Input tokens (what you send) and output tokens (what the API generates) are both counted. Different models have different per-token prices. Cost management is covered in detail later in this chapter.
Models are the specific AI systems available through the API. Different models offer different capability and cost tradeoffs. For the Anthropic API: claude-opus-4-6 is the most capable; claude-haiku-4-5 is the fastest and most economical. For the OpenAI API: gpt-4o is the most capable; gpt-4o-mini is the economical option.
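The four-characters-per-token rule of thumb above can be turned into a quick back-of-envelope estimator. This is a heuristic only (real tokenizers vary by language and content), and the prices in the example are illustrative placeholders, not current rates:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_request_cost(prompt: str, expected_output_tokens: int,
                          input_price_per_million: float,
                          output_price_per_million: float) -> float:
    """Back-of-envelope cost in USD for a single request."""
    input_tokens = estimate_tokens(prompt)
    return ((input_tokens / 1_000_000) * input_price_per_million
            + (expected_output_tokens / 1_000_000) * output_price_per_million)

# Example: a 2,000-character prompt (~500 tokens) with a ~300-token reply,
# at illustrative prices of $3/M input and $15/M output tokens:
cost = estimate_request_cost("x" * 2000, 300, 3.00, 15.00)
print(f"~${cost:.5f} per request")  # ~$0.00600
```

Estimates like this are mainly useful before running a batch: multiply the per-request figure by the item count and you know roughly what the job will cost before sending a single request.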
Setting Up Your Environment
Before writing code, you need:
- Python 3.9 or later
- The anthropic and openai packages
- A .env file for storing API keys securely
🐍 Code Block: Environment Setup
# requirements: pip install anthropic openai python-dotenv
from dotenv import load_dotenv
import os
load_dotenv() # loads ANTHROPIC_API_KEY and OPENAI_API_KEY from .env
# Verify keys are loaded (never print the actual key values)
anthropic_key_loaded = bool(os.getenv("ANTHROPIC_API_KEY"))
openai_key_loaded = bool(os.getenv("OPENAI_API_KEY"))
print(f"Anthropic key loaded: {anthropic_key_loaded}")
print(f"OpenAI key loaded: {openai_key_loaded}")
Your .env file (stored in the same directory as your script, never committed to version control):
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-your-openai-key-here
Your .gitignore file must include:
.env
*.env
⚠️ Common Pitfall: Hardcoding API keys directly in scripts is a serious security risk. If that code is shared — via GitHub, Slack, email, or any version control system — your key is compromised. Anyone with your API key can make requests billed to your account. Always use environment variables or a secrets manager.
✅ Best Practice: On a team, use a secrets manager (AWS Secrets Manager, HashiCorp Vault, Doppler) rather than distributing .env files. For personal projects, .env with .gitignore is sufficient.
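A habit that complements these practices is failing fast when a key is missing, so a misconfigured environment produces a clear message at startup instead of a confusing authentication error mid-run. A small sketch (the variable name matches the .env example above; the helper function is this chapter's, not part of any SDK):

```python
import os

def require_api_key(var_name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the key from the environment, or fail with a clear message."""
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Add it to your .env file or export it "
            "in your shell before running this script."
        )
    return key

# Usage: call once at startup so failures happen before any API work begins.
# api_key = require_api_key()
```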
The Anthropic Python SDK: Comprehensive Coverage
The Anthropic Python SDK is the official way to interact with Claude programmatically. It handles authentication, request formatting, response parsing, and error handling.
Installation and Basic Usage
pip install anthropic
🐍 Code Block: Basic Anthropic API Call
import anthropic
client = anthropic.Anthropic()
# The client automatically reads ANTHROPIC_API_KEY from the environment
# Basic message
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=2048,
system="You are an expert technical writer.",
messages=[
{"role": "user", "content": "Explain REST APIs to a non-technical audience."}
]
)
print(response.content[0].text)
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
The response object contains several important attributes:
- response.content[0].text — the AI's text response
- response.usage.input_tokens — how many tokens your prompt consumed
- response.usage.output_tokens — how many tokens the response consumed
- response.model — which model was used
- response.stop_reason — why the model stopped generating ("end_turn" means normal completion; "max_tokens" means the response was cut off by the token limit)
⚠️ Common Pitfall: Forgetting to check response.stop_reason. If it is "max_tokens", the response is truncated — you received an incomplete output. Either increase max_tokens or design your prompts to produce shorter outputs.
The Messages API in Detail
The Anthropic Messages API uses a conversational message structure. Every request is a list of messages alternating between "user" and "assistant" roles. The system prompt is passed separately.
🐍 Code Block: Messages API Parameters
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-6", # Required: model identifier
max_tokens=4096, # Required: maximum output tokens
system="You are a helpful assistant specializing in data analysis.", # Optional system prompt
messages=[ # Required: conversation history
{"role": "user", "content": "What are the main approaches to data normalization?"},
{"role": "assistant", "content": "There are three main normal forms..."}, # Optional: prior turns
{"role": "user", "content": "Can you compare 2NF and 3NF specifically?"}
],
temperature=0.7, # Optional: 0.0 (deterministic) to 1.0 (creative)
top_p=0.9, # Optional: nucleus sampling parameter
stop_sequences=["END", "DONE"], # Optional: stop generation at these strings
metadata={"user_id": "raj-123"} # Optional: request metadata for your records
)
Key parameters explained:
max_tokens: Set this to comfortably accommodate the response you expect. Unused token budget costs nothing (you pay only for tokens actually generated), but the cap is your safeguard against runaway or unexpectedly long outputs, so do not simply set it to the model's maximum every time. For short-answer tasks, 512 tokens is usually sufficient. For detailed analyses, 2048-4096.
temperature: Controls randomness. For factual or analytical tasks (data extraction, classification, summarization), use 0.0-0.3. For creative tasks (writing, brainstorming), use 0.7-1.0. For most workflow automation, 0.3 is a reasonable default.
system: The system prompt sets the AI's persona, capabilities, and behavioral constraints for the entire interaction. Well-designed system prompts dramatically improve output consistency.
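The guidance above can be captured as a small preset table, so scripts pick parameters by task type instead of hard-coding numbers in every call. The categories and values below are simply the rules of thumb from this section, collected in one place:

```python
# Parameter presets reflecting the guidance above.
TASK_PRESETS = {
    "extraction":     {"temperature": 0.0, "max_tokens": 512},
    "classification": {"temperature": 0.0, "max_tokens": 256},
    "summarization":  {"temperature": 0.3, "max_tokens": 512},
    "automation":     {"temperature": 0.3, "max_tokens": 1024},
    "writing":        {"temperature": 0.8, "max_tokens": 4096},
    "brainstorming":  {"temperature": 1.0, "max_tokens": 2048},
}

def params_for(task_type: str) -> dict:
    """Return a copy of the preset for a task type (default: automation)."""
    return dict(TASK_PRESETS.get(task_type, TASK_PRESETS["automation"]))

# The preset merges directly into a request:
# client.messages.create(model=..., messages=..., **params_for("extraction"))
print(params_for("extraction"))  # {'temperature': 0.0, 'max_tokens': 512}
```

Centralizing the numbers also means that when you tune a value, every script that uses the preset picks up the change.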
Handling API Responses
🐍 Code Block: Complete Response Handling
import anthropic
from anthropic import APIError, APIConnectionError, RateLimitError, AuthenticationError
client = anthropic.Anthropic()
def safe_api_call(prompt: str, system: str = "", model: str = "claude-opus-4-6") -> dict:
"""
Make an API call with complete error handling.
Returns a dict with 'success', 'text', 'usage', and 'error' keys.
"""
try:
response = client.messages.create(
model=model,
max_tokens=2048,
system=system,
messages=[{"role": "user", "content": prompt}]
)
# Check for truncation
if response.stop_reason == "max_tokens":
print("Warning: Response was truncated. Consider increasing max_tokens.")
return {
"success": True,
"text": response.content[0].text,
"usage": {
"input_tokens": response.usage.input_tokens,
"output_tokens": response.usage.output_tokens,
"total_tokens": response.usage.input_tokens + response.usage.output_tokens
},
"stop_reason": response.stop_reason,
"error": None
}
except AuthenticationError:
return {"success": False, "text": None, "usage": None, "error": "Invalid API key. Check your ANTHROPIC_API_KEY environment variable."}
except RateLimitError:
return {"success": False, "text": None, "usage": None, "error": "Rate limit hit. Slow down your requests."}
except APIConnectionError:
return {"success": False, "text": None, "usage": None, "error": "Connection failed. Check your internet connection."}
except APIError as e:
return {"success": False, "text": None, "usage": None, "error": f"API error: {e.status_code} - {e.message}"}
# Usage
result = safe_api_call(
prompt="Summarize the three main benefits of containerization in software deployment.",
system="You are a technical writer. Be concise and clear."
)
if result["success"]:
print(result["text"])
print(f"Tokens used: {result['usage']['total_tokens']}")
else:
print(f"Error: {result['error']}")
The OpenAI Python SDK
The OpenAI SDK follows a similar pattern to Anthropic but with some structural differences.
🐍 Code Block: Basic OpenAI API Call
from openai import OpenAI
client = OpenAI()
# Automatically reads OPENAI_API_KEY from environment
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are an expert technical writer."},
{"role": "user", "content": "Explain REST APIs to a non-technical audience."}
],
max_tokens=2048,
temperature=0.7
)
print(response.choices[0].message.content)
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
Key structural differences from Anthropic:
- OpenAI uses client.chat.completions.create() vs. Anthropic's client.messages.create()
- OpenAI includes the system message in the messages list (as {"role": "system", ...})
- OpenAI returns response.choices[0].message.content vs. Anthropic's response.content[0].text
- Token counts use response.usage.prompt_tokens and response.usage.completion_tokens vs. input_tokens and output_tokens
- Finish reason is in response.choices[0].finish_reason vs. response.stop_reason
🐍 Code Block: OpenAI with Error Handling
from openai import OpenAI, AuthenticationError, RateLimitError, APIConnectionError, APIError
client = OpenAI()
def openai_call(prompt: str, system: str = "", model: str = "gpt-4o") -> dict:
"""OpenAI API call with complete error handling."""
try:
messages = []
if system:
messages.append({"role": "system", "content": system})
messages.append({"role": "user", "content": prompt})
response = client.chat.completions.create(
model=model,
messages=messages,
max_tokens=2048
)
return {
"success": True,
"text": response.choices[0].message.content,
"usage": {
"input_tokens": response.usage.prompt_tokens,
"output_tokens": response.usage.completion_tokens,
"total_tokens": response.usage.total_tokens
},
"finish_reason": response.choices[0].finish_reason,
"error": None
}
except AuthenticationError:
return {"success": False, "text": None, "usage": None, "error": "Invalid OpenAI API key."}
except RateLimitError:
return {"success": False, "text": None, "usage": None, "error": "OpenAI rate limit hit."}
except APIConnectionError:
return {"success": False, "text": None, "usage": None, "error": "Connection failed."}
except APIError as e:
return {"success": False, "text": None, "usage": None, "error": f"OpenAI API error: {str(e)}"}
Managing Conversations: Multi-Turn Interactions
A multi-turn conversation maintains message history across exchanges, allowing the AI to reference earlier parts of the conversation.
🐍 Code Block: Conversation Manager (Anthropic)
import anthropic
client = anthropic.Anthropic()
def chat_session(system_prompt: str):
"""Simple multi-turn conversation manager."""
messages = []
print(f"System: {system_prompt}")
print("Type 'quit' or 'exit' to end the session.\n")
while True:
user_input = input("You: ").strip()
if user_input.lower() in ["quit", "exit"]:
print("Session ended.")
break
if not user_input:
continue
messages.append({"role": "user", "content": user_input})
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=2048,
system=system_prompt,
messages=messages
)
assistant_message = response.content[0].text
messages.append({"role": "assistant", "content": assistant_message})
print(f"\nClaude: {assistant_message}\n")
return messages
# Run a session
conversation_history = chat_session(
"You are a technical code reviewer specializing in Python best practices. "
"When reviewing code, always explain your reasoning and suggest specific improvements."
)
Important: Managing context in long conversations
As conversations grow longer, the cumulative message history grows too, eventually approaching the model's context window limit. For automated pipelines that need sustained multi-turn context, implement a summarization strategy:
🐍 Code Block: Conversation with Context Summarization
import anthropic
client = anthropic.Anthropic()
def summarize_conversation(messages: list[dict]) -> str:
"""Condense a long conversation into a summary for context management."""
conversation_text = "\n".join([
f"{msg['role'].upper()}: {msg['content']}"
for msg in messages
])
response = client.messages.create(
model="claude-haiku-4-5-20251001", # Use fast model for utility tasks
max_tokens=512,
messages=[{
"role": "user",
"content": (
f"Summarize the key points from this conversation in 200 words or less. "
f"Focus on decisions made, information established, and open questions:\n\n"
f"{conversation_text}"
)
}]
)
return response.content[0].text
class ManagedConversation:
"""A conversation manager that summarizes when history gets long."""
def __init__(self, system_prompt: str, max_turns_before_summary: int = 10):
self.system_prompt = system_prompt
self.messages = []
self.max_turns = max_turns_before_summary
self.summary_context = ""
def send(self, user_input: str) -> str:
"""Send a message and get a response, managing context automatically."""
# Check if we need to summarize
if len(self.messages) >= self.max_turns * 2: # *2 because each turn is 2 messages
print("(Summarizing conversation history to manage context...)")
self.summary_context = summarize_conversation(self.messages)
self.messages = [] # Clear old messages
self.messages.append({"role": "user", "content": user_input})
# Build the system prompt, incorporating summary if available
effective_system = self.system_prompt
if self.summary_context:
effective_system += f"\n\nContext from earlier in this conversation:\n{self.summary_context}"
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=2048,
system=effective_system,
messages=self.messages
)
assistant_message = response.content[0].text
self.messages.append({"role": "assistant", "content": assistant_message})
return assistant_message
# Usage
conv = ManagedConversation(
system_prompt="You are a code review assistant. Review code for correctness, efficiency, and style.",
max_turns_before_summary=8
)
response = conv.send("Here's a Python function I wrote: def calc(x, y): return x/y")
print(f"Assistant: {response}")
response = conv.send("What if y is zero?")
print(f"Assistant: {response}")
Batch Processing: Working at Scale
Batch processing is where the API's value over the chat interface is most dramatic. Processing 100 items that would take hours of manual copy-paste work can be done in minutes.
🐍 Code Block: Batch Summarization Pipeline
import anthropic
import json
from pathlib import Path
def batch_summarize(texts: list[str], output_file: str = "summaries.json") -> list[dict]:
"""Summarize multiple texts using the API."""
client = anthropic.Anthropic()
results = []
for i, text in enumerate(texts, 1):
print(f"Processing {i}/{len(texts)}...")
response = client.messages.create(
model="claude-haiku-4-5-20251001", # Use faster model for batch tasks
max_tokens=512,
messages=[{
"role": "user",
"content": f"Summarize the following in 2-3 sentences:\n\n{text}"
}]
)
results.append({
"original": text[:100] + "...",
"summary": response.content[0].text,
"tokens_used": response.usage.input_tokens + response.usage.output_tokens
})
Path(output_file).write_text(json.dumps(results, indent=2))
print(f"Saved {len(results)} summaries to {output_file}")
return results
# Example usage
sample_texts = [
"Customer feedback: The onboarding process was confusing. I couldn't find where to set up integrations and the help docs weren't helpful. Eventually figured it out after 45 minutes but almost cancelled.",
"Customer feedback: Love the product overall. The dashboard is intuitive and the reporting features are exactly what we needed. The mobile app could use some work — it's missing the bulk actions that the web version has.",
"Customer feedback: Support was fantastic when I had issues. They responded within an hour and solved my problem completely. The price is a bit high compared to alternatives but the quality justifies it.",
]
summaries = batch_summarize(sample_texts)
for i, result in enumerate(summaries, 1):
print(f"\n--- Item {i} ---")
print(f"Summary: {result['summary']}")
print(f"Tokens: {result['tokens_used']}")
Batch Processing with Progress Tracking and Error Recovery
For large batches, you need progress tracking and the ability to resume if the script fails partway through:
🐍 Code Block: Resumable Batch Processor
import anthropic
import json
import time
from pathlib import Path
from datetime import datetime
client = anthropic.Anthropic()
def process_batch_with_recovery(
items: list[dict],
process_fn,
output_file: str = "batch_output.json",
checkpoint_file: str = "batch_checkpoint.json"
) -> list[dict]:
"""
Process a batch of items with checkpointing for recovery.
Args:
items: List of dicts, each with an 'id' key
process_fn: Function that takes an item dict and returns a result dict
output_file: Where to save completed results
checkpoint_file: Where to save progress for recovery
Returns: List of result dicts
"""
# Load existing checkpoint if present
checkpoint_path = Path(checkpoint_file)
completed_ids = set()
results = []
if checkpoint_path.exists():
checkpoint = json.loads(checkpoint_path.read_text())
completed_ids = set(checkpoint.get("completed_ids", []))
results = checkpoint.get("results", [])
print(f"Resuming: {len(completed_ids)} items already processed")
# Process remaining items
remaining = [item for item in items if item["id"] not in completed_ids]
total = len(items)
for i, item in enumerate(remaining, 1):
current_num = len(completed_ids) + i
print(f"Processing {current_num}/{total}: {item['id']}")
try:
result = process_fn(item)
results.append(result)
completed_ids.add(item["id"])
# Save checkpoint every 10 items
if current_num % 10 == 0:
checkpoint_data = {
"completed_ids": list(completed_ids),
"results": results,
"last_updated": datetime.now().isoformat()
}
checkpoint_path.write_text(json.dumps(checkpoint_data, indent=2))
print(f" Checkpoint saved at item {current_num}")
except Exception as e:
print(f" ERROR processing {item['id']}: {e}")
results.append({
"id": item["id"],
"success": False,
"error": str(e)
})
# Small delay to avoid rate limits
time.sleep(0.5)
# Save final output
Path(output_file).write_text(json.dumps(results, indent=2))
# Clean up checkpoint
if checkpoint_path.exists():
checkpoint_path.unlink()
print(f"\nBatch complete. {len(results)} items processed.")
print(f"Results saved to {output_file}")
return results
def classify_feedback(item: dict) -> dict:
"""Example processing function: classify customer feedback."""
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=256,
messages=[{
"role": "user",
"content": (
f"Classify this customer feedback:\n\n{item['text']}\n\n"
"Respond with a JSON object containing:\n"
"- sentiment: positive/neutral/negative\n"
"- category: product/support/pricing/ux/other\n"
"- priority: high/medium/low\n"
"- key_issue: one sentence summary of the main point\n\n"
"Respond with only the JSON object, no other text."
)
}]
)
try:
classification = json.loads(response.content[0].text)
except json.JSONDecodeError:
classification = {"raw_response": response.content[0].text}
return {
"id": item["id"],
"success": True,
"classification": classification,
"tokens_used": response.usage.input_tokens + response.usage.output_tokens
}
# Example usage with the resumable processor
feedback_items = [
{"id": "fb001", "text": "The API documentation is excellent and the SDKs are well-designed."},
{"id": "fb002", "text": "Pricing is way too high for small teams. We almost churned because of cost."},
{"id": "fb003", "text": "Support team responded quickly but didn't resolve the actual issue."},
]
results = process_batch_with_recovery(
items=feedback_items,
process_fn=classify_feedback,
output_file="feedback_classifications.json"
)
# Report results
successful = [r for r in results if r.get("success")]
print(f"\n{len(successful)}/{len(results)} items processed successfully")
Rate Limiting and Retry Logic
Rate limits are not errors to avoid — they are a normal part of API usage, especially when processing large batches. Robust code handles them gracefully.
🐍 Code Block: Exponential Backoff Retry
import time
import anthropic
from anthropic import RateLimitError, APIConnectionError
def api_call_with_retry(
client: anthropic.Anthropic,
max_retries: int = 3,
base_wait: float = 1.0,
**kwargs
) -> anthropic.types.Message:
"""
API call with exponential backoff on rate limit and connection errors.
Args:
client: Anthropic client instance
max_retries: Maximum number of retry attempts
base_wait: Base wait time in seconds (doubles each retry)
**kwargs: Arguments to pass to client.messages.create()
Returns: API response
Raises: The last exception if all retries are exhausted
"""
last_exception = None
for attempt in range(max_retries):
try:
return client.messages.create(**kwargs)
except RateLimitError as e:
last_exception = e
if attempt == max_retries - 1:
raise
wait_time = base_wait * (2 ** attempt)
print(f"Rate limited. Waiting {wait_time:.1f}s before retry {attempt + 1}/{max_retries - 1}...")
time.sleep(wait_time)
except APIConnectionError as e:
last_exception = e
if attempt == max_retries - 1:
raise
wait_time = base_wait * (2 ** attempt)
print(f"Connection error. Waiting {wait_time:.1f}s before retry {attempt + 1}/{max_retries - 1}...")
time.sleep(wait_time)
raise last_exception
# More sophisticated version with jitter to prevent thundering herd
import random
def api_call_with_jitter_retry(
client: anthropic.Anthropic,
max_retries: int = 3,
base_wait: float = 1.0,
**kwargs
) -> anthropic.types.Message:
"""
API call with exponential backoff and jitter.
Jitter prevents multiple concurrent requests from all retrying simultaneously.
"""
for attempt in range(max_retries):
try:
return client.messages.create(**kwargs)
except (RateLimitError, APIConnectionError) as e:
if attempt == max_retries - 1:
raise
# Exponential backoff with random jitter
wait_time = base_wait * (2 ** attempt) + random.uniform(0, 1)
print(f"Retry {attempt + 1}/{max_retries - 1} after {wait_time:.1f}s...")
time.sleep(wait_time)
# Usage
client = anthropic.Anthropic()
response = api_call_with_retry(
client,
max_retries=3,
model="claude-opus-4-6",
max_tokens=1024,
messages=[{"role": "user", "content": "Explain exponential backoff in one paragraph."}]
)
print(response.content[0].text)
Rate Limit Management: Proactive Approach
Rather than waiting for rate limit errors, you can manage your request rate proactively using a token bucket or simple delay strategy:
🐍 Code Block: Rate-Controlled Batch Processing
import anthropic
import time
from collections import deque
from datetime import datetime, timedelta
client = anthropic.Anthropic()
class RateLimitedProcessor:
"""
Process API requests at a controlled rate to stay within limits.
Uses a sliding window to track requests per minute.
"""
def __init__(self, requests_per_minute: int = 50):
self.requests_per_minute = requests_per_minute
self.request_times = deque()
def _wait_if_needed(self):
"""Wait if we are approaching the rate limit."""
now = datetime.now()
cutoff = now - timedelta(minutes=1)
# Remove requests older than 1 minute
while self.request_times and self.request_times[0] < cutoff:
self.request_times.popleft()
# If at limit, wait until oldest request falls outside window
if len(self.request_times) >= self.requests_per_minute:
oldest = self.request_times[0]
wait_until = oldest + timedelta(minutes=1)
wait_seconds = (wait_until - now).total_seconds()
if wait_seconds > 0:
print(f"Rate limiting: waiting {wait_seconds:.1f}s")
time.sleep(wait_seconds)
def process(self, prompt: str, system: str = "", model: str = "claude-haiku-4-5-20251001") -> str:
"""Make an API call with rate limiting."""
self._wait_if_needed()
self.request_times.append(datetime.now())
response = client.messages.create(
model=model,
max_tokens=512,
system=system,
messages=[{"role": "user", "content": prompt}]
)
return response.content[0].text
# Usage
processor = RateLimitedProcessor(requests_per_minute=40)
items = [f"Summarize point {i}: AI APIs enable scalable automation" for i in range(1, 6)]
results = []
for i, item in enumerate(items, 1):
result = processor.process(item)
results.append(result)
print(f"Processed {i}/{len(items)}")
print(f"\nAll {len(results)} items processed.")
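The sliding window above is one of the two strategies mentioned earlier; the token bucket is the other. It permits short bursts up to a capacity while enforcing a steady average rate. A minimal sketch with an injectable clock so the behavior is deterministic and testable (the class and parameter names are illustrative):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum stored tokens (burst size)
        self.tokens = capacity    # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the bucket is empty."""
        now = self.clock()
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# With a fake clock the behavior is deterministic: capacity 2, 1 token/sec.
t = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2.0, clock=lambda: t[0])
print(bucket.try_acquire(), bucket.try_acquire(), bucket.try_acquire())
# True True False -- a burst of 2 is allowed, the third call must wait
t[0] = 1.0  # one simulated second later, one token has refilled
print(bucket.try_acquire())  # True
```

In a real processor you would loop on try_acquire() with a short sleep before each API call. The practical difference from the sliding window: the bucket tolerates bursts, which suits interactive tools, while the window enforces a smoother rate, which suits long batches.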
Token Counting and Cost Management
Tokens are the currency of API use. Understanding how to count, estimate, and manage token usage keeps costs predictable and prevents surprises.
🐍 Code Block: Token Estimation and Cost Calculation
import anthropic
client = anthropic.Anthropic()
# Anthropic pricing (verify current prices at anthropic.com/pricing)
PRICING = {
"claude-opus-4-6": {
"input_per_million": 15.00,
"output_per_million": 75.00
},
"claude-haiku-4-5-20251001": {
"input_per_million": 0.25,
"output_per_million": 1.25
}
}
def estimate_cost(
input_tokens: int,
output_tokens: int,
model: str = "claude-opus-4-6"
) -> dict:
"""Calculate estimated cost for an API call."""
if model not in PRICING:
return {"error": f"Unknown model: {model}"}
prices = PRICING[model]
input_cost = (input_tokens / 1_000_000) * prices["input_per_million"]
output_cost = (output_tokens / 1_000_000) * prices["output_per_million"]
total_cost = input_cost + output_cost
return {
"model": model,
"input_tokens": input_tokens,
"output_tokens": output_tokens,
"input_cost_usd": round(input_cost, 6),
"output_cost_usd": round(output_cost, 6),
"total_cost_usd": round(total_cost, 6)
}
class CostTracker:
"""Track cumulative API costs across multiple calls."""
def __init__(self):
self.total_input_tokens = 0
self.total_output_tokens = 0
self.call_count = 0
self.model_usage = {}
def record(self, response: anthropic.types.Message, model: str):
"""Record usage from an API response."""
self.total_input_tokens += response.usage.input_tokens
self.total_output_tokens += response.usage.output_tokens
self.call_count += 1
if model not in self.model_usage:
self.model_usage[model] = {"input": 0, "output": 0, "calls": 0}
self.model_usage[model]["input"] += response.usage.input_tokens
self.model_usage[model]["output"] += response.usage.output_tokens
self.model_usage[model]["calls"] += 1
def report(self) -> dict:
"""Generate a cost report."""
total_cost = 0.0
model_costs = {}
for model, usage in self.model_usage.items():
cost_info = estimate_cost(usage["input"], usage["output"], model)
if "error" not in cost_info:
model_costs[model] = cost_info
total_cost += cost_info["total_cost_usd"]
return {
"total_calls": self.call_count,
"total_input_tokens": self.total_input_tokens,
"total_output_tokens": self.total_output_tokens,
"total_tokens": self.total_input_tokens + self.total_output_tokens,
"estimated_total_cost_usd": round(total_cost, 4),
"by_model": model_costs
}
# Example usage
tracker = CostTracker()
prompts = [
"Explain Python list comprehensions in one paragraph.",
"What is the difference between a set and a list in Python?",
"When would you use a dictionary over a list?"
]
for prompt in prompts:
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=256,
messages=[{"role": "user", "content": prompt}]
)
tracker.record(response, "claude-haiku-4-5-20251001")
print(f"Response: {response.content[0].text[:100]}...")
report = tracker.report()
print("\n=== Cost Report ===")
print(f"Total calls: {report['total_calls']}")
print(f"Total tokens: {report['total_tokens']:,}")
print(f"Estimated cost: ${report['estimated_total_cost_usd']:.4f}")
Cost Optimization Strategies
💡 Model selection saves the most money. For tasks that do not require frontier-model reasoning — classification, summarization, extraction, formatting — a fast, economical model (claude-haiku or gpt-4o-mini) produces comparable results at a fraction of the cost. Use powerful models where capability matters; use economical models where speed and cost matter.
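The routing decision does not need to be elaborate. A lookup table is often enough. The task categories below are illustrative placeholders, and the model names are the ones used elsewhere in this chapter:

```python
# Illustrative model-routing table. The task categories are invented for
# this sketch; the model names match those used elsewhere in this chapter.
ROUTING = {
    "classification": "claude-haiku-4-5-20251001",
    "summarization": "claude-haiku-4-5-20251001",
    "extraction": "claude-haiku-4-5-20251001",
    "analysis": "claude-opus-4-6",
    "drafting": "claude-opus-4-6",
}

def pick_model(task_type: str) -> str:
    """Economical model for routine tasks; frontier model otherwise."""
    return ROUTING.get(task_type, "claude-opus-4-6")

print(pick_model("classification"))  # claude-haiku-4-5-20251001
print(pick_model("analysis"))        # claude-opus-4-6
```

Centralizing the choice in one function also means a pricing or capability change requires editing a single table, not every call site.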
💡 Prompt efficiency matters. Longer system prompts and more verbose instructions cost more per call. For batch processing where the same system prompt is sent with every request, even a 100-token reduction in the system prompt saves 100 tokens × number of items. For a 1,000-item batch, that is 100,000 tokens saved.
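Using the input rates from the PRICING table above, the arithmetic looks like this:

```python
# Savings from trimming 100 tokens off a system prompt that is sent with
# every item in a 1,000-item batch, at the input rates from PRICING above.
tokens_saved = 100 * 1_000                          # 100,000 input tokens
haiku_savings = tokens_saved / 1_000_000 * 0.25     # $0.025
opus_savings = tokens_saved / 1_000_000 * 15.00     # $1.50
print(f"Haiku: ${haiku_savings:.3f}, Opus: ${opus_savings:.2f}")
```

The absolute numbers are small for an economical model and noticeable for a frontier model, which is another argument for the model-selection tip above.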
💡 Caching input tokens. Anthropic offers prompt caching for repeated system prompts, reducing the cost of the cached portion significantly. If your batch processing uses the same system prompt for every item, prompt caching can reduce costs substantially. Consult current Anthropic documentation for caching implementation.
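The sketch below shows the shape of a cached system prompt. The `cache_control` field reflects Anthropic's documented prompt-caching format at the time of writing; verify it against current documentation before relying on it. The sketch only builds the `system` payload and does not make an API call:

```python
# Sketch of a cacheable system block, per Anthropic's prompt-caching
# documentation at the time of writing. Verify the exact field names
# against current docs before relying on this.
LONG_SYSTEM_PROMPT = "You are a document analyst. " * 200  # stands in for a large prompt

cached_system = [
    {
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},  # marks this block as cacheable
    }
]

# Passed as the `system` parameter on every call in the batch:
# client.messages.create(model=..., system=cached_system, messages=[...])
print(cached_system[0]["cache_control"])
```

Because the cached portion is the system prompt, the per-item user messages stay uncached, which is exactly the split you want for batch work.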
Streaming Responses
For long outputs, streaming displays text as it is generated rather than waiting for the complete response. This improves user experience in interactive applications.
🐍 Code Block: Streaming with Anthropic
import anthropic
client = anthropic.Anthropic()
def stream_response(prompt: str, system: str = "") -> str:
"""Stream a response, printing text as it arrives. Returns the full text."""
full_text = []
print("Response: ", end="", flush=True)
with client.messages.stream(
model="claude-opus-4-6",
max_tokens=2048,
system=system,
messages=[{"role": "user", "content": prompt}]
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
full_text.append(text)
print() # New line after streaming completes
# Get final message with usage stats
final_message = stream.get_final_message()
print(f"\nTokens used: {final_message.usage.input_tokens} in, {final_message.usage.output_tokens} out")
return "".join(full_text)
# Interactive streaming example
result = stream_response(
prompt="Write a detailed comparison of synchronous and asynchronous programming in Python.",
system="You are a Python educator. Use clear examples and explain concepts thoroughly."
)
print(f"\nTotal response length: {len(result)} characters")
🐍 Code Block: Streaming with OpenAI
from openai import OpenAI
client = OpenAI()
def stream_openai_response(prompt: str, system: str = "") -> str:
"""Stream a response from OpenAI."""
messages = []
if system:
messages.append({"role": "system", "content": system})
messages.append({"role": "user", "content": prompt})
full_text = []
print("Response: ", end="", flush=True)
stream = client.chat.completions.create(
model="gpt-4o",
messages=messages,
max_tokens=2048,
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content is not None:
text = chunk.choices[0].delta.content
print(text, end="", flush=True)
full_text.append(text)
print()
return "".join(full_text)
result = stream_openai_response(
prompt="Explain the event loop in JavaScript in detail.",
system="You are a JavaScript educator."
)
Practical Automation Examples
Email Triage and Response Drafting
🐍 Code Block: Email Triage System
import anthropic
import json
from dataclasses import dataclass
from typing import Optional
client = anthropic.Anthropic()
@dataclass
class EmailTriageResult:
"""Result of triaging an email."""
category: str
priority: str
sentiment: str
requires_response: bool
suggested_assignee: str
summary: str
draft_response: Optional[str]
confidence: float
def triage_email(
subject: str,
body: str,
sender: str,
generate_draft: bool = True
) -> EmailTriageResult:
"""
Triage an email: classify, prioritize, and optionally draft a response.
Uses a two-step chain: classify first, then draft if needed.
"""
# Step 1: Classify and prioritize
classification_prompt = f"""Analyze this email and provide a structured classification.
From: {sender}
Subject: {subject}
Body: {body}
Respond with a JSON object containing:
{{
"category": "sales_inquiry|support_request|billing|complaint|partnership|spam|internal|other",
"priority": "urgent|high|medium|low",
"sentiment": "positive|neutral|negative|angry",
"requires_response": true|false,
"suggested_assignee": "sales|support|billing|management|no_action",
"summary": "one sentence summary of the email's main point",
"confidence": 0.0 to 1.0
}}
Respond with only the JSON object."""
classification_response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=256,
messages=[{"role": "user", "content": classification_prompt}]
)
try:
classification = json.loads(classification_response.content[0].text)
except json.JSONDecodeError:
# Fallback if JSON parsing fails
classification = {
"category": "other",
"priority": "medium",
"sentiment": "neutral",
"requires_response": True,
"suggested_assignee": "support",
"summary": "Unable to parse classification",
"confidence": 0.0
}
# Step 2: Draft response if needed
draft = None
if generate_draft and classification.get("requires_response", False):
draft_prompt = f"""Draft a professional response to this email.
Original email:
From: {sender}
Subject: {subject}
Body: {body}
Classification context:
- Category: {classification.get('category')}
- Priority: {classification.get('priority')}
- Assignee team: {classification.get('suggested_assignee')}
Draft a response that:
- Acknowledges the email promptly and professionally
- Sets appropriate expectations (response time, next steps)
- Matches the tone appropriately (formal for complaints, friendly for general inquiries)
- Is concise (under 150 words)
Do not include subject line. Write only the email body."""
draft_response = client.messages.create(
model="claude-opus-4-6",
max_tokens=512,
messages=[{"role": "user", "content": draft_prompt}]
)
draft = draft_response.content[0].text
return EmailTriageResult(
category=classification.get("category", "other"),
priority=classification.get("priority", "medium"),
sentiment=classification.get("sentiment", "neutral"),
requires_response=classification.get("requires_response", True),
suggested_assignee=classification.get("suggested_assignee", "support"),
summary=classification.get("summary", ""),
draft_response=draft,
confidence=classification.get("confidence", 0.0)
)
# Example usage
result = triage_email(
sender="jane.smith@prospect.com",
subject="Interested in your enterprise plan",
body="Hi, I saw your pricing page and I'm interested in the enterprise plan for our team of 50. Can we schedule a call to discuss? We're looking to make a decision by end of month.",
generate_draft=True
)
print(f"Category: {result.category}")
print(f"Priority: {result.priority}")
print(f"Assigned to: {result.suggested_assignee}")
print(f"Summary: {result.summary}")
if result.draft_response:
print(f"\nDraft Response:\n{result.draft_response}")
Document Analysis Pipeline
🐍 Code Block: Document Analysis Pipeline
import anthropic
import json
from pathlib import Path
client = anthropic.Anthropic()
def analyze_document(
document_text: str,
analysis_type: str = "general",
custom_questions: list[str] | None = None
) -> dict:
"""
Analyze a document with structured output.
analysis_type: "general" | "legal" | "financial" | "technical" | "custom"
custom_questions: list of specific questions to answer about the document
"""
# Build analysis prompt based on type
if analysis_type == "general":
analysis_instructions = """Extract and provide:
1. DOCUMENT TYPE: What type of document is this?
2. MAIN SUBJECT: What is this document primarily about? (2-3 sentences)
3. KEY POINTS: The 5 most important points or findings
4. ACTION ITEMS: Any required actions, deadlines, or next steps mentioned
5. IMPORTANT ENTITIES: Key people, organizations, dates, or amounts mentioned
6. DOCUMENT QUALITY: Assessment of completeness and clarity (high/medium/low)"""
elif analysis_type == "financial":
analysis_instructions = """Extract and provide:
1. DOCUMENT TYPE: Invoice, contract, report, statement, etc.
2. PARTIES INVOLVED: Organizations and individuals
3. KEY FINANCIAL FIGURES: All monetary amounts with context
4. DATES: Payment dates, contract periods, reporting periods
5. OBLIGATIONS: What each party is required to do or pay
6. RISK FLAGS: Anything unusual or potentially problematic"""
elif analysis_type == "technical":
analysis_instructions = """Extract and provide:
1. TECHNOLOGY SCOPE: What systems, languages, or technologies are discussed?
2. ARCHITECTURE OVERVIEW: How are components structured or related?
3. REQUIREMENTS: Functional and non-functional requirements stated
4. DEPENDENCIES: External systems, libraries, or services required
5. OPEN ISSUES: Problems, bugs, or unresolved questions mentioned
6. TECHNICAL DEBT: Any legacy issues or improvement opportunities noted"""
elif analysis_type == "custom" and custom_questions:
questions_text = "\n".join([f"{i+1}. {q}" for i, q in enumerate(custom_questions)])
analysis_instructions = f"Answer the following questions about the document:\n{questions_text}"
else:
analysis_instructions = "Provide a comprehensive summary of the document's content and purpose."
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=2048,
system=(
"You are a document analyst. Analyze documents thoroughly and accurately. "
"Always base your analysis on what is explicitly stated in the document, "
"not on assumptions. Flag any uncertainty explicitly."
),
messages=[{
"role": "user",
"content": f"{analysis_instructions}\n\nDOCUMENT:\n{document_text}"
}]
)
return {
"analysis_type": analysis_type,
"analysis": response.content[0].text,
"tokens_used": response.usage.input_tokens + response.usage.output_tokens
}
def batch_analyze_documents(
document_folder: str,
analysis_type: str = "general",
output_file: str = "document_analyses.json"
) -> list[dict]:
"""Analyze all text files in a folder."""
folder = Path(document_folder)
text_files = list(folder.glob("*.txt")) + list(folder.glob("*.md"))
if not text_files:
print(f"No text files found in {document_folder}")
return []
results = []
for i, file_path in enumerate(text_files, 1):
print(f"Analyzing {i}/{len(text_files)}: {file_path.name}")
document_text = file_path.read_text(encoding="utf-8")
result = analyze_document(document_text, analysis_type)
result["filename"] = file_path.name
results.append(result)
Path(output_file).write_text(json.dumps(results, indent=2))
print(f"\nAnalyzed {len(results)} documents. Results saved to {output_file}")
return results
Integrating with Files, Databases, and Spreadsheets
🐍 Code Block: CSV Processing Pipeline
import anthropic
import csv
import json
from io import StringIO
client = anthropic.Anthropic()
def process_csv_with_ai(
csv_content: str,
task_description: str,
output_format: str = "json"
) -> str:
"""
Process CSV data with AI assistance.
Args:
csv_content: Raw CSV string
task_description: What you want the AI to do with the data
output_format: "json" | "csv" | "text"
Returns: AI-processed result as string
"""
# Parse CSV to understand structure
reader = csv.DictReader(StringIO(csv_content))
rows = list(reader)
headers = reader.fieldnames or []
# Provide context about the data structure
data_context = (
f"The CSV has {len(rows)} rows and {len(headers)} columns.\n"
f"Columns: {', '.join(headers)}\n"
f"First 3 rows sample:\n"
f"{json.dumps(rows[:3], indent=2)}"
)
format_instruction = {
"json": "Respond with a valid JSON object or array.",
"csv": "Respond with valid CSV data, including a header row.",
"text": "Respond with a clear text explanation or analysis."
}.get(output_format, "Respond clearly.")
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=4096,
system=(
"You are a data analyst. Process data accurately. "
"When working with CSV or tabular data, be precise about column names and values."
),
messages=[{
"role": "user",
"content": (
f"DATA CONTEXT:\n{data_context}\n\n"
f"FULL DATA:\n{csv_content}\n\n"
f"TASK: {task_description}\n\n"
f"OUTPUT FORMAT: {format_instruction}"
)
}]
)
return response.content[0].text
# Example: Categorize and enrich a CSV of customer records
sample_csv = """customer_id,name,email,signup_date,plan,monthly_spend
C001,Acme Corp,admin@acme.com,2023-01-15,professional,450
C002,Beta Inc,contact@beta.com,2023-03-22,starter,95
C003,Gamma LLC,info@gamma.com,2022-11-08,enterprise,2100
C004,Delta Co,hello@delta.com,2024-01-30,starter,95
C005,Epsilon Partners,ops@epsilon.com,2023-07-14,professional,450"""
result = process_csv_with_ai(
csv_content=sample_csv,
task_description=(
"For each customer, add two new columns: "
"1) customer_segment: classify as 'high_value' (>$1000/mo), 'mid_market' ($200-$999/mo), or 'small_business' (<$200/mo). "
"2) churn_risk: based on plan and spending patterns, classify as 'low', 'medium', or 'high'."
),
output_format="csv"
)
print("Enriched CSV:")
print(result)
Security: API Key Management
Security is non-negotiable when working with API keys. Follow these practices consistently.
✅ Best Practice: Environment Variables
Never hardcode API keys in source code. Use environment variables:
# Requires the python-dotenv package (pip install python-dotenv)
import os
from dotenv import load_dotenv
load_dotenv()  # reads variables from a local .env file into the environment
api_key = os.getenv("ANTHROPIC_API_KEY")
if not api_key:
raise ValueError(
"ANTHROPIC_API_KEY environment variable not set. "
"Create a .env file with: ANTHROPIC_API_KEY=your-key-here"
)
✅ Best Practice: .gitignore Configuration
# In your .gitignore file
.env
.env.local
.env.*.local
*.env
secrets.json
credentials.json
✅ Best Practice: Audit Your Repository
Before adding any code to version control, verify no keys are present:
# Search for potential API key patterns before committing
grep -rn --exclude-dir=.git "sk-ant-" .
# Broader pattern; expect some false positives
grep -rn --exclude-dir=.git "sk-" .
✅ Best Practice: Key Rotation
If you suspect a key has been exposed, rotate it immediately through the provider's dashboard. Do not wait to confirm exposure — rotate first, investigate second.
⚠️ Common Pitfall: Committing .env to a public repository. This is the most common API key exposure vector. Even if you delete the key from the repository later, it may have been captured by automated scanning services within minutes of the commit.
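A pre-commit hook can catch a key before it ever leaves your machine. The sketch below is a minimal example; the regex is a rough heuristic and will not catch every key format:

```shell
#!/bin/sh
# Hypothetical pre-commit hook (save as .git/hooks/pre-commit, then
# chmod +x it). Aborts any commit whose staged diff contains something
# shaped like an Anthropic- or OpenAI-style key. The regex is a rough
# heuristic, not an exhaustive detector.
if git diff --cached -U0 2>/dev/null | grep -qE 'sk-(ant-)?[A-Za-z0-9_-]{16,}'; then
    echo "Possible API key in staged changes; commit aborted." >&2
    echo "If this is a false positive, bypass once with: git commit --no-verify" >&2
    exit 1
fi
```

Dedicated scanners exist for this job as well; the hook is a zero-dependency first line of defense, not a replacement for key rotation when exposure is suspected.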
Research Breakdown: Developer AI API Usage Patterns
Research on API usage patterns among software developers and technical practitioners reveals several consistent findings.
The most common initial use of AI APIs is single-call, single-prompt integration — taking a chat-style interaction and putting it behind an API call. These integrations tend to remain low-value until practitioners make two shifts: moving to structured outputs (JSON rather than prose) and implementing chaining.
Structured output adoption is strongly correlated with successful automation. When an API call returns structured JSON that can be programmatically processed, it integrates reliably into existing systems. When it returns prose that must be parsed, the integration is fragile — any change in the AI's phrasing can break downstream parsing.
Batch processing is consistently the highest-ROI use case for API adoption among non-engineer practitioners who learn Python basics. The ability to process hundreds of items overnight versus one at a time during working hours represents a qualitative change in what is possible, not just a productivity improvement.
Error handling and retry logic are consistently underimplemented in initial API integrations, and their absence is a common cause of batch job failures. The practitioners who build robust integrations invest in error handling upfront rather than adding it after their first batch job fails at item 347 of 500.
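The failure mode is concrete: one unhandled exception at item 347 discards 346 completed results. A minimal guard, sketched here with a stand-in `process` function, isolates failures per item and records them for a later retry pass:

```python
def run_batch(items, process):
    """Apply `process` to each item; one failure does not abort the batch."""
    results, failures = [], []
    for i, item in enumerate(items):
        try:
            results.append({"index": i, "output": process(item)})
        except Exception as exc:  # broad on purpose: batch jobs must survive any item
            failures.append({"index": i, "item": item, "error": str(exc)})
    return results, failures

# Stand-in for a real API call: fails on one item to show the isolation.
def process(item):
    if item == "bad":
        raise ValueError("simulated API error")
    return item.upper()

results, failures = run_batch(["a", "bad", "c"], process)
print(len(results), len(failures))  # 2 1
```

Persisting `failures` to disk turns a partial run into a resumable one: the retry pass processes only the failed items instead of repeating the whole batch.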
Cost surprises are common among API beginners. The most effective mitigation: track and report costs from the first day of API use, not after the first unexpectedly large bill. The CostTracker class pattern above, or a similar approach, should be part of every API integration from the start.
Continue to Chapter 37 to learn how to build configured AI systems — custom GPTs, Claude Projects, and API-based assistants — that come pre-loaded with the context and instructions they need.