Chapter 24 Exercises: Connecting Python to Cloud Services
These exercises progress from conceptual and environment-setup tasks through full cloud integration. Tiers 1 and 2 can be completed without cloud credentials. Tiers 3 through 5 work best with AWS or Google Cloud access, though clearly marked alternatives are provided for learners working without accounts.
Tier 1: Foundations — Environment and Security Setup
These exercises reinforce the concepts of secure credential management and environment variable patterns before touching any cloud service.
Exercise 1.1 — Create a Secure Project Structure
Create a new project directory called cloud_practice/ with the following structure:
cloud_practice/
.gitignore
.env.example
.env (do not commit this one)
requirements.txt
main.py
Your .gitignore must exclude .env, credentials/, and any JSON files. Your .env.example must have placeholder values (not real credentials) for:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION
- ACME_S3_BUCKET
Deliverable: Show the contents of .gitignore and .env.example.
Exercise 1.2 — Write a Credential Validator
Write a function validate_required_env_vars(required: list[str]) -> None that:
- Accepts a list of required environment variable names
- Raises ValueError with a clear message listing all missing variables if any are absent
- Prints "All required environment variables are set." if all are present
Test it by calling it with ["AWS_ACCESS_KEY_ID", "FAKE_VAR_THAT_DOES_NOT_EXIST"].
Expected output:
ValueError: Missing required environment variables: FAKE_VAR_THAT_DOES_NOT_EXIST.
Check your .env file or system environment.
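If you get stuck, here is a minimal sketch of one possible implementation (the exact message wording is up to you):

```python
import os

def validate_required_env_vars(required: list[str]) -> None:
    """Raise ValueError naming every missing environment variable."""
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise ValueError(
            f"Missing required environment variables: {', '.join(missing)}.\n"
            "Check your .env file or system environment."
        )
    print("All required environment variables are set.")
```

Collecting all missing names before raising produces one complete error message instead of failing on the first absent variable.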
Exercise 1.3 — Load and Inspect Environment Variables
Using python-dotenv, write a script that:
1. Creates a .env file programmatically with three fake test values
2. Calls load_dotenv() to load them
3. Reads each value with os.environ.get()
4. Prints each key and its value length (not the value itself — good security practice)
Example output:
AWS_ACCESS_KEY_ID: set (20 characters)
AWS_SECRET_ACCESS_KEY: set (40 characters)
AWS_DEFAULT_REGION: set (9 characters)
Exercise 1.4 — The .env Anti-Pattern Quiz
For each code snippet below, identify whether it is safe or unsafe, and explain why:
Snippet A:
import boto3
client = boto3.client("s3", aws_access_key_id="AKIA1234567890ABCDEF")
Snippet B:
import os
import boto3
from dotenv import load_dotenv
load_dotenv()
key = os.environ.get("AWS_ACCESS_KEY_ID")
client = boto3.client("s3", aws_access_key_id=key)
Snippet C:
import os
import boto3
key = os.environ.get("AWS_ACCESS_KEY_ID")
print(f"Using key: {key}")
client = boto3.client("s3", aws_access_key_id=key)
Note: Snippet C has a subtle but real problem. Identify it.
Exercise 1.5 — Write a .env.example Generator
Write a Python function generate_env_example(var_names: list[str], output_path: str) -> None that:
- Creates a .env.example file at the given path
- Writes each variable name with an empty value placeholder
- Adds a comment at the top explaining the file's purpose
- Never overwrites an existing file (check first and raise if it exists)
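One possible starting point (the header comment text and the empty-value placeholder style are suggestions, not requirements):

```python
from pathlib import Path

def generate_env_example(var_names: list[str], output_path: str) -> None:
    """Write a .env.example with empty placeholders; refuse to overwrite."""
    path = Path(output_path)
    if path.exists():
        raise FileExistsError(f"{output_path} already exists; refusing to overwrite.")
    # Leading comment explains the file's purpose to whoever clones the repo
    lines = ["# Template for .env: copy to .env and fill in real values. Never commit .env.\n"]
    lines.extend(f"{name}=\n" for name in var_names)
    path.write_text("".join(lines))
```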
Tier 2: Applied — boto3 and gspread Patterns Without Live Credentials
These exercises use mock objects and local simulation to practice the API patterns before needing real cloud accounts.
Exercise 2.1 — Mock S3 Upload Function
Write an upload_file_to_s3() function that:
- Accepts local_file_path, bucket_name, and s3_key parameters
- Validates that the local file exists before attempting upload
- Returns the expected S3 URL string (you can construct it without actually uploading)
- Raises FileNotFoundError if the file does not exist
- Has a complete docstring
Test it with a file path that exists and one that does not.
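A sketch of the mock (the URL mirrors S3's virtual-hosted style; swap in a real boto3 `upload_file()` call once you have credentials):

```python
from pathlib import Path

def upload_file_to_s3(local_file_path: str, bucket_name: str, s3_key: str) -> str:
    """Validate the local file, then return the URL the object would have.

    This mock never touches the network; replace the return statement
    with a real boto3 upload when credentials are available.
    """
    if not Path(local_file_path).is_file():
        raise FileNotFoundError(f"No such file: {local_file_path}")
    return f"https://{bucket_name}.s3.amazonaws.com/{s3_key}"
```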
Exercise 2.2 — Build the S3 Key Naming System
Write a build_s3_key() function that generates consistent, dated S3 keys for business reports. The function must:
- Accept report_type (str), region (str, default "all"), and extension (str, default "xlsx") parameters
- Embed the current year and ISO week number in the path
- Follow the pattern: reports/{report_type}/{year}/{week}/{region}.{extension}
- Include a docstring with an example
Write five calls demonstrating different report types and regions, and print the resulting keys.
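One sketch of the key builder; zero-padding the week number is my addition (it keeps keys sorting correctly in listings), not part of the stated pattern:

```python
from datetime import date

def build_s3_key(report_type: str, region: str = "all", extension: str = "xlsx") -> str:
    """Build a dated key: reports/{report_type}/{year}/{week}/{region}.{extension}.

    Example: build_s3_key("weekly-sales", region="north") might return
    something like "reports/weekly-sales/2024/07/north.xlsx".
    """
    year, week, _ = date.today().isocalendar()  # ISO year, ISO week, weekday
    return f"reports/{report_type}/{year}/{week:02d}/{region}.{extension}"
```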
Exercise 2.3 — Simulate Listing S3 Objects
Create a list of ten dictionaries that simulates the return value of list_s3_objects(). Each dict should have key, size_bytes, and last_modified keys. Then write code to:
- Print only the objects in the reports/weekly-sales/2024/ prefix
- Find the most recently modified object
- Calculate the total storage used in megabytes
- Format sizes in human-readable form (KB, MB as appropriate)
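The human-readable formatting is the fiddliest part; here is one sketch using decimal (1000-based) units, which you can adjust if you prefer 1024-based:

```python
def format_size(size_bytes: int) -> str:
    """Render a byte count as B, KB, or MB using decimal (1000-based) units."""
    if size_bytes >= 1_000_000:
        return f"{size_bytes / 1_000_000:.1f} MB"
    if size_bytes >= 1_000:
        return f"{size_bytes / 1_000:.1f} KB"
    return f"{size_bytes} B"
```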
Exercise 2.4 — Google Sheets Data Preparation
Given this fictional DataFrame:
import pandas as pd
from datetime import date, timedelta
projects = pd.DataFrame({
"project_name": ["Website Redesign", "Q1 Analysis", "Training Material", "Budget Review"],
"client_name": ["Apex Corp", "Birch LLC", "Cedar Inc", "Dune Co"],
"completion_percent": [85.0, 100.0, 45.0, 60.0],
"deadline": [
date.today() + timedelta(days=5),
date.today() - timedelta(days=2),
date.today() + timedelta(days=30),
date.today() + timedelta(days=3),
],
"on_hold": [False, False, True, False],
})
Write code to:
1. Add a days_remaining column (negative for overdue)
2. Add a status_label column, applying these rules in order of precedence: On Hold, Complete, Overdue, At Risk (< 7 days remaining and < 80% done), On Track
3. Add a dashboard_updated column with the current datetime as a string
4. Print the result as a formatted table
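The status logic is easiest to get right as a plain function applied row by row; the rule order below is one reasonable reading of step 2:

```python
def status_label(on_hold: bool, completion_percent: float, days_remaining: int) -> str:
    """Classify a project; earlier rules win (On Hold outranks everything)."""
    if on_hold:
        return "On Hold"
    if completion_percent >= 100:
        return "Complete"
    if days_remaining < 0:
        return "Overdue"
    if days_remaining < 7 and completion_percent < 80:
        return "At Risk"
    return "On Track"
```

Apply it with something like projects.apply(lambda r: status_label(r["on_hold"], r["completion_percent"], r["days_remaining"]), axis=1).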
Exercise 2.5 — Serialize a DataFrame for Sheets API
The Google Sheets API cannot handle pandas Timestamps, NaN values, or numpy numeric types directly. Write a clean_for_sheets(df: pd.DataFrame) -> list[list] function that:
- Converts all Timestamp/date values to ISO format strings
- Converts all NaN values to empty strings
- Converts all numpy integers and floats to Python native types
- Returns a list of lists (no header row — just data rows)
Test it with a DataFrame containing at least one of each problematic type.
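A per-cell helper keeps the conversion logic testable; clean_for_sheets() can then map it over df.values.tolist(). The clean_cell() name and the .item() trick for numpy scalars are my choices, not from the chapter:

```python
import math
from datetime import date, datetime

def clean_cell(value):
    """Coerce one cell to a type the Sheets JSON payload accepts."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if hasattr(value, "item"):  # numpy scalars unwrap to native types via .item()
        return value.item()
    return value
```

clean_for_sheets(df) then becomes roughly [[clean_cell(v) for v in row] for row in df.values.tolist()]. One caveat: pandas NaT passes the datetime check here and serializes as "NaT", so include that case in your tests.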
Tier 3: Integration — Connecting the Pieces
These exercises combine multiple concepts into working pipelines. A real AWS or Google account helps but is not strictly required — you can stub the cloud calls and verify the logic.
Exercise 3.1 — Build the Complete Report Pipeline Function
Write a run_report_pipeline(data_path: str, report_name: str) -> dict function that:
1. Loads a CSV file into a pandas DataFrame
2. Computes a simple summary (groupby + agg on any column)
3. Saves the summary to a local Excel file with a timestamped name
4. Uploads the file to S3 (call your upload_file_to_s3() from Exercise 2.1)
5. Generates a presigned URL (stub the actual call if no AWS account)
6. Returns a dict with local_path, s3_key, presigned_url, and generated_at
The function must have full error handling with try/except and a docstring.
Exercise 3.2 — Build the Email Notification Function
Write a send_report_notification() function (see the chapter template) that:
- Reads SMTP configuration from environment variables with sensible defaults
- Validates that all required email credentials are present before attempting to send
- Builds an HTML email body with the report name, summary bullets, and a styled button linking to the presigned URL
- Handles SMTPAuthenticationError with a helpful error message about App Passwords
- Has a dry_run: bool = False parameter that, when True, prints the email body to stdout instead of sending
Test with dry_run=True.
Exercise 3.3 — Acme Weekly Report Integration
Using the functions you have written in this chapter, write a run_acme_weekly_report(csv_path: str) -> None function that:
1. Reads acme_sales_2023.csv (or a similarly structured CSV)
2. Groups by region and aggregates revenue (sum), units_sold (sum), sale_id (count as transactions)
3. Saves to Excel
4. Uploads to S3 with a key from build_s3_key("weekly-sales") (Exercise 2.2)
5. Generates presigned URL (48 hours)
6. Sends notification email to Sandra with three summary bullet points
7. Prints a completion summary with the S3 URL
Use if __name__ == "__main__": so it runs from the command line.
Exercise 3.4 — Maya's Project Dashboard Pipeline
Write a complete update_maya_dashboard(projects_csv: str) -> None function that:
1. Loads maya_projects.csv
2. Computes days_remaining and status_label
3. Selects client-appropriate columns (no internal rate or billing data)
4. Pushes to a Google Sheet using dataframe_to_sheet()
5. Prints how many projects are in each status category
6. Has a --dry-run mode (via argparse or a parameter) that prints the DataFrame instead of writing to Sheets
Exercise 3.5 — Cloud Database Query Function
Write a query_cloud_db(sql: str, params: dict | None = None) -> pd.DataFrame function that:
- Reads DATABASE_URL from the environment
- Handles both PostgreSQL and SQLite connection strings (SQLite for local testing, PostgreSQL for production)
- Uses pool_pre_ping=True to handle dropped cloud connections
- Returns results as a pandas DataFrame
- Raises a clear ConnectionError if the database is unreachable
Test it with a local SQLite database (use DATABASE_URL=sqlite:///test.db).
Tier 4: Challenge — Advanced Patterns
These exercises require real cloud credentials or substantial local simulation. They address real-world complications.
Exercise 4.1 — Presigned URL Expiration Audit
Write a script that:
1. Lists all objects in an S3 bucket under the reports/ prefix
2. For each object older than 30 days, generates a presigned URL with a 1-hour expiration
3. Prints a report: object key, age in days, presigned URL (first 80 characters only for display)
4. Writes the full report to a local CSV file
Handle the case where the bucket has more than 1,000 objects by using S3 pagination (list_objects_v2 with ContinuationToken).
Exercise 4.2 — Google Sheets Change Detection
Write a function that:
1. Reads the current state of a Google Sheet into a DataFrame
2. Reads the new data from a CSV file into another DataFrame
3. Computes the diff: rows that were added, rows that were removed, rows that changed
4. Only writes to the sheet if there are actual changes (avoiding unnecessary API calls)
5. Logs a summary: "X rows added, Y rows removed, Z rows updated"
Exercise 4.3 — Multi-Recipient Report Distribution
Extend the send_report_notification() function to:
- Accept a list of recipients instead of a single email address
- Generate a separate presigned URL for each recipient (or use the same URL — think about the trade-offs)
- Rate-limit sending to avoid SMTP throttling (add a configurable delay between sends)
- Return a dict mapping each recipient to either "sent" or the error message
Exercise 4.4 — Retry Logic for Flaky Cloud Connections
Cloud API calls occasionally fail due to transient network issues. Write a with_retry(func, max_attempts: int = 3, delay_seconds: float = 1.0) decorator (or context manager) that:
- Calls func up to max_attempts times
- Waits delay_seconds between attempts (with exponential backoff)
- Re-raises the last exception if all attempts fail
- Logs each retry attempt with the attempt number and error message
Apply it to the upload_file_to_s3() function.
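A sketch matching the suggested signature; pass delay_seconds=0 in tests so they run without real waits:

```python
import functools
import logging
import time

def with_retry(func, max_attempts: int = 3, delay_seconds: float = 1.0):
    """Wrap func so transient failures are retried with exponential backoff."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        delay = delay_seconds
        for attempt in range(1, max_attempts + 1):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                logging.warning("Attempt %d/%d failed: %s", attempt, max_attempts, exc)
                if attempt == max_attempts:
                    raise  # re-raise the last exception after the final attempt
                time.sleep(delay)
                delay *= 2  # exponential backoff: 1s, 2s, 4s, ...
    return wrapper
```

Usage: reliable_upload = with_retry(upload_file_to_s3, max_attempts=5).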
Exercise 4.5 — Automated S3 Lifecycle Management
Write a clean_old_reports(bucket_name: str, prefix: str, keep_days: int = 90) -> int function that:
- Lists all objects under the given prefix
- Identifies objects older than keep_days
- Deletes them in batches of up to 1,000 (S3's batch delete limit)
- Returns the count of deleted objects
- Has a dry_run: bool = True parameter that lists what would be deleted without actually deleting
Tier 5: Stretch — Production-Ready Systems
These exercises build toward production-quality cloud integration. They are open-ended and require combining multiple concepts.
Exercise 5.1 — Lambda-Ready Report Function
Refactor the Acme weekly report pipeline so it can run as an AWS Lambda function:
- Write a lambda_handler(event: dict, context) -> dict entry point
- Remove all local file saves (read/write directly from/to S3 using BytesIO)
- Read all configuration from environment variables (no .env file — Lambda uses env vars directly)
- Return a proper response dict with statusCode and body
- Add structured logging using Python's logging module instead of print()
Include a comment explaining how you would configure the Lambda to run every Monday at 7 AM using EventBridge.
Exercise 5.2 — Google Sheets Dashboard with Formatting
Extend the Google Sheets writer to apply conditional formatting:
- Cells in the status_label column should be colored: green for "On Track", yellow for "At Risk", red for "Overdue", gray for "On Hold"
- The header row should be bold with a dark background
- Numbers in completion_percent should display as percentages
Use the gspread-formatting library or the raw Google Sheets API batchUpdate method.
Exercise 5.3 — Full Audit Trail
Design and implement a logging system for the Acme cloud pipeline that:
- Records every upload to S3 in a local SQLite database (table: upload_log with columns: timestamp, local_path, s3_key, bucket, file_size_bytes, success, error_message)
- Records every email notification sent (notification_log table)
- Provides a print_weekly_activity_report() function that queries the log and summarizes the past 7 days
- Handles write failures gracefully (logging should never crash the main pipeline)
Exercise 5.4 — Cloud Storage Cost Estimator
Write an estimate_monthly_s3_cost(bucket_name: str) -> dict function that:
- Lists all objects in the bucket
- Calculates total storage in GB
- Estimates monthly cost based on S3 Standard pricing ($0.023/GB-month)
- Estimates monthly GET request cost assuming each object is downloaded twice per month
- Suggests objects that could move to S3 Infrequent Access to save money (objects not modified in 30+ days)
- Returns a full cost breakdown dict
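The storage arithmetic can be isolated in a hedged helper (AWS quotes per-GB-month rates; whether "GB" means 10^9 or 2^30 bytes varies by context, so the base here is an assumption):

```python
def estimate_storage_cost(total_bytes: int, price_per_gb_month: float = 0.023) -> float:
    """Estimate monthly S3 Standard storage cost for a byte total.

    Assumes 1 GB = 1024**3 bytes and the $0.023/GB-month Standard rate.
    """
    gb = total_bytes / (1024 ** 3)
    return round(gb * price_per_gb_month, 2)
```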
Exercise 5.5 — Integrated Cloud Business Intelligence System
Design and partially implement a system for Acme Corp that:
1. Every Monday at 7 AM (via scheduled task or Lambda): runs the sales report pipeline, uploads to S3, emails Sandra
2. Every night at 11 PM: pushes updated inventory data to a Google Sheet accessible to all regional managers
3. Whenever a new CSV file lands in the S3 incoming/ prefix (S3 trigger concept): validates it, processes it, and moves it to processed/
4. Maintains a pipeline_health Google Sheet with the last run time and status of each automated task
You do not need to fully implement all four components — design the architecture in comments and implement at least two components fully. Focus on clean interfaces between components and proper error handling.
These exercises are designed to be challenging at higher tiers. If you get stuck on Tier 4 or 5, return to Tier 3 and make sure those implementations are robust before proceeding.