Chapter 24: Connecting Python to Cloud Services

"The cloud is just someone else's computer — but that someone else has a team of engineers making sure your data is safe, available, and accessible from anywhere in the world."


Opening Story: The Monday Morning Email Problem

Priya Okonkwo had been running her weekly sales report every Monday morning for six months. The script was solid — it pulled from Acme Corp's local database, crunched the numbers, and produced a clean Excel file that she emailed to Sandra Chen, the VP of Sales.

The problem surfaced during Sandra's trip to Chicago. "Priya, I need last week's numbers before my 8 AM meeting," Sandra texted at 6:47 AM. "I can't open Excel on my phone. Can you just... make it available somewhere?"

Marcus Webb, Acme's IT Manager, had been suggesting cloud storage for months. "If you're already using Python," he said when Priya mentioned the situation, "connecting to S3 takes about fifteen lines of code."

By the time Sandra landed, Priya had modified the script to upload the report to Amazon S3 and send Sandra a secure link. Sandra opened it in her browser from the airline's WiFi. Marcus, who had been skeptical of Python when Priya first started using it, quietly approved the change in the infrastructure request that same afternoon.

That is the story of this chapter. Your Python scripts do not have to live on your laptop. They can store files that anyone can access, read and write spreadsheets your whole team shares, and connect to databases that live in the cloud. This is the bridge between "script that helps me" and "system that helps the business."


24.1 Why Cloud Matters for Business Python Scripts

Before we write a single line of code, let's be clear about why this matters for you specifically.

The Local Machine Problem

When a Python script runs on your computer and saves files to your hard drive, you have created what engineers call a single point of failure. Consider these scenarios:

  • Your laptop dies before you back it up
  • You are traveling and a colleague needs the output right now
  • Three colleagues all need the same file at the same time
  • Your script produces output that needs to feed into another system in another location

None of these are hypothetical. They happen constantly in real businesses, and they all share the same root cause: your data and your computation are locked to one machine.

What "The Cloud" Actually Means

"The cloud" refers to computing resources — storage, databases, processing power — that live on servers in data centers and are accessible over the internet. When you connect your Python script to a cloud service, you make that script a citizen of the internet. It can then:

  • Store files that anyone with the right permissions can access from any device
  • Read from and write to shared data sources like spreadsheets or databases
  • Run on a schedule without your laptop being open
  • Scale — handle ten customers or ten thousand without changing much code

For a business Python script, cloud integration typically falls into three practical categories:

  1. File storage — saving outputs (reports, exports, processed data) somewhere accessible
  2. Shared data — reading from and writing to data sources others maintain (Google Sheets, cloud databases)
  3. Automation — triggering scripts on a schedule or in response to events, without manual intervention

This chapter focuses on the first two in depth, introduces the third conceptually, and gives you the security foundation all three require.

The Major Platforms at a Glance

You will hear about three dominant cloud platforms constantly. Here is what you need to know at a practical level:

Platform       Company     Key Services for Python                              Known For
AWS            Amazon      S3 (storage), Lambda (serverless), RDS (databases)   Largest ecosystem, most documentation
Google Cloud   Google      Cloud Storage, BigQuery, Cloud Functions             Data analytics, AI/ML, Google Workspace integration
Azure          Microsoft   Blob Storage, Azure Functions, Azure SQL             Enterprise Windows environments, Microsoft 365 integration

For this chapter, we will focus on two tools that cover a large percentage of real business Python use cases:

  • Amazon S3 (via the boto3 library) for file storage
  • Google Sheets API (via the gspread library) for collaborative spreadsheets

The patterns you learn here apply across platforms. Once you understand how to authenticate, make requests, and handle errors with one service, the others follow the same logic.


24.2 Credential Security: The Most Important Section in This Chapter

Before we write a single cloud-related function, we need to talk about credentials. This section comes first because a security mistake can be costly and irreversible, while getting it right is straightforward.

The Cardinal Rule: Never Hardcode Credentials

This is the single most important rule in cloud programming:

# WRONG. NEVER DO THIS.
s3_client = boto3.client(
    "s3",
    aws_access_key_id="AKIAIOSFODNN7EXAMPLE",
    aws_secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)

If you commit code like this to a git repository — even a private one — you have potentially exposed these credentials. Automated bots scan GitHub for exposed AWS keys within minutes of a commit. Amazon has documented cases where developers were billed tens of thousands of dollars in a single day after accidentally exposing credentials that way.

The solution is environment variables.

Environment Variables: The Right Approach

An environment variable is a value that exists in your operating system's environment, outside your code. Your Python script reads it at runtime, so the secret never appears in the code file itself.

import os
import boto3

# CORRECT: Read credentials from the environment
aws_access_key = os.environ.get("AWS_ACCESS_KEY_ID")
aws_secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY")

if not aws_access_key or not aws_secret_key:
    raise ValueError(
        "AWS credentials not found in environment variables. "
        "Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before running."
    )

s3_client = boto3.client(
    "s3",
    aws_access_key_id=aws_access_key,
    aws_secret_access_key=aws_secret_key,
)

The question then becomes: where do the environment variables come from?

The .env File and python-dotenv

For local development, the most practical approach is a .env file combined with the python-dotenv library.

A .env file is a plain text file that lives in your project directory:

# .env — DO NOT COMMIT THIS FILE
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
AWS_DEFAULT_REGION=us-east-1
GOOGLE_CREDENTIALS_FILE=credentials/google_service_account.json
ACME_S3_BUCKET=acme-corp-reports

Install python-dotenv with:

pip install python-dotenv

Then at the top of your script, load the .env file before accessing environment variables:

from dotenv import load_dotenv
import os

load_dotenv()  # reads .env and injects values into the process environment

aws_access_key = os.environ.get("AWS_ACCESS_KEY_ID")

The load_dotenv() call reads the .env file and injects its contents into the process environment. If the variables are already set in the system environment (as they would be on a production server), load_dotenv() does not override them — which means the same code works both locally and in production without any changes.

The .gitignore File: Your Second Line of Defense

A .gitignore file tells git which files to never include in commits. Every project that uses cloud credentials needs one:

# .gitignore
.env
*.env
credentials/
*.pem
*.key
__pycache__/
*.pyc
.DS_Store

Create this file before you write a single line of cloud code. Confirm that git actually ignores your .env file — it should not appear in git status output — before you ever run git add. This habit is non-negotiable.

What About Production Servers?

On production servers — a cloud VM, a deployment platform, an AWS Lambda function — you set environment variables through the platform's interface, not through a .env file. Every major platform provides this:

  • AWS: Systems Manager Parameter Store, Secrets Manager, or Lambda environment variables
  • Google Cloud: Secret Manager or Cloud Run environment variables
  • Heroku / Railway / Render: A settings panel in their dashboard

The pattern is always the same: your code reads os.environ.get("MY_SECRET"), and the environment where the code runs has that variable set. Your code never needs to know whether it is running on your laptop or a server in a data center.


24.3 Amazon S3 with boto3

Amazon S3 (Simple Storage Service) is the most widely used cloud file storage service in the world. For business Python scripts, it serves as a reliable, scalable place to store reports, exports, backups, and any other file-based output.

Installing and Configuring boto3

Install boto3:

pip install boto3

You need an AWS account and an IAM user with S3 permissions. If Marcus Webb or your IT team manages AWS, ask them for:

  • An AWS access key ID
  • An AWS secret access key
  • The name of an S3 bucket you have permission to use
  • The AWS region your bucket lives in (for example, us-east-1)

If you are setting this up yourself, the AWS IAM documentation walks through creating a user with least-privilege S3 access. At minimum you need s3:PutObject, s3:GetObject, and s3:ListBucket on your specific bucket — not on all AWS resources.
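To make least privilege concrete, a bucket-scoped IAM policy might look like the following sketch (substitute your actual bucket name for acme-corp-reports):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::acme-corp-reports/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::acme-corp-reports"
    }
  ]
}
```

Note that object-level actions (PutObject, GetObject) apply to the objects inside the bucket (the /* resource), while ListBucket applies to the bucket itself.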

Building a Reusable S3 Client

Rather than repeating authentication code throughout your script, build a single function that creates and returns an authenticated client:

"""
s3_utilities.py — Reusable S3 client and operation functions.
"""

import os
import boto3
from botocore.exceptions import ClientError
from dotenv import load_dotenv

load_dotenv()


def get_s3_client():
    """
    Create and return an authenticated S3 client.

    Reads credentials from environment variables. Call load_dotenv()
    before this function if using a .env file for local development.

    Returns:
        boto3.client: An authenticated S3 client object.

    Raises:
        ValueError: If required environment variables are not set.
    """
    required_vars = [
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_DEFAULT_REGION",
    ]
    missing = [var for var in required_vars if not os.environ.get(var)]

    if missing:
        raise ValueError(
            f"Missing required environment variables: {', '.join(missing)}. "
            "Check your .env file."
        )

    return boto3.client(
        "s3",
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        region_name=os.environ["AWS_DEFAULT_REGION"],
    )

Uploading Files to S3

The core operation for most business scripts is uploading a file after generating it:

def upload_file_to_s3(
    local_file_path: str,
    bucket_name: str,
    s3_key: str,
) -> str:
    """
    Upload a local file to an S3 bucket.

    Args:
        local_file_path: Path to the file on your local machine.
        bucket_name: Name of the S3 bucket.
        s3_key: The path/filename in S3, e.g. 'reports/2024/weekly_sales.xlsx'.

    Returns:
        str: The S3 URL of the uploaded file.

    Raises:
        FileNotFoundError: If the local file does not exist.
        PermissionError: If the IAM user lacks s3:PutObject permission.
        ClientError: If the upload fails for another AWS reason.
    """
    if not os.path.exists(local_file_path):
        raise FileNotFoundError(f"File not found: {local_file_path}")

    s3_client = get_s3_client()

    try:
        s3_client.upload_file(local_file_path, bucket_name, s3_key)
        region = os.environ["AWS_DEFAULT_REGION"]
        s3_url = f"https://{bucket_name}.s3.{region}.amazonaws.com/{s3_key}"
        print(f"Uploaded successfully: {s3_url}")
        return s3_url

    except ClientError as e:
        error_code = e.response["Error"]["Code"]
        if error_code == "AccessDenied":
            raise PermissionError(
                f"Access denied to bucket '{bucket_name}'. "
                "Verify your IAM permissions include s3:PutObject."
            ) from e
        raise

Notice the pattern: check preconditions first (does the file exist?), attempt the operation, then catch specific exceptions and raise more helpful errors. This makes debugging far easier than letting boto3's generic ClientError propagate.

Downloading Files from S3

def download_file_from_s3(
    bucket_name: str,
    s3_key: str,
    local_destination_path: str,
) -> None:
    """
    Download a file from S3 to a local path.

    Args:
        bucket_name: Name of the S3 bucket.
        s3_key: The S3 path of the file to download.
        local_destination_path: Local path where the file will be saved.

    Raises:
        FileNotFoundError: If the specified S3 object does not exist.
        PermissionError: If the IAM user lacks s3:GetObject permission.
    """
    s3_client = get_s3_client()

    try:
        s3_client.download_file(bucket_name, s3_key, local_destination_path)
        print(f"Downloaded '{s3_key}' to '{local_destination_path}'")

    except ClientError as e:
        error_code = e.response["Error"]["Code"]
        if error_code == "404":
            raise FileNotFoundError(
                f"Object '{s3_key}' not found in bucket '{bucket_name}'."
            ) from e
        if error_code == "AccessDenied":
            raise PermissionError(
                f"Access denied when downloading '{s3_key}'."
            ) from e
        raise

Listing Objects in a Bucket

Knowing what is in a bucket — and finding the latest version of a report — is a common need:

def list_s3_objects(
    bucket_name: str,
    prefix: str = "",
    max_results: int = 100,
) -> list[dict]:
    """
    List objects in an S3 bucket, optionally filtered by prefix.

    S3 does not have true folders. You can filter by key prefix to
    simulate folder browsing. prefix='reports/2024/' returns only
    objects whose keys start with that string.

    Args:
        bucket_name: Name of the S3 bucket.
        prefix: Optional key prefix to filter results.
        max_results: Maximum number of results to return (default 100).

    Returns:
        List of dicts, each with 'key', 'size_bytes', and 'last_modified'.

    Example:
        objects = list_s3_objects("acme-corp-reports", prefix="reports/2024/")
        for obj in objects:
            print(obj['key'], obj['last_modified'])
    """
    s3_client = get_s3_client()

    try:
        response = s3_client.list_objects_v2(
            Bucket=bucket_name,
            Prefix=prefix,
            MaxKeys=max_results,
        )
    except ClientError as e:
        raise RuntimeError(
            f"Failed to list objects in bucket '{bucket_name}': {e}"
        ) from e

    objects = []
    for obj in response.get("Contents", []):
        objects.append({
            "key": obj["Key"],
            "size_bytes": obj["Size"],
            "last_modified": obj["LastModified"],
        })

    return objects

Presigned URLs: Sharing Files Without AWS Accounts

One of S3's most valuable features for business sharing is the presigned URL. This is a temporary URL that grants access to a specific file without requiring the recipient to have an AWS account. You control exactly how long the link remains valid.

def generate_presigned_url(
    bucket_name: str,
    s3_key: str,
    expiration_hours: int = 24,
) -> str:
    """
    Generate a temporary presigned URL for secure file sharing.

    The URL grants temporary read access to a private S3 object without
    requiring the recipient to have AWS credentials. After expiration,
    the URL stops working automatically.

    Args:
        bucket_name: Name of the S3 bucket.
        s3_key: The S3 key (path) of the file.
        expiration_hours: How many hours the link should remain valid.
                          Default is 24. AWS maximum is 7 days (168 hours).

    Returns:
        str: A presigned URL that provides temporary access to the file.

    Raises:
        RuntimeError: If URL generation fails.
    """
    s3_client = get_s3_client()
    expiration_seconds = expiration_hours * 3600

    try:
        url = s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket_name, "Key": s3_key},
            ExpiresIn=expiration_seconds,
        )
        print(f"Presigned URL generated (valid {expiration_hours} hours)")
        return url
    except ClientError as e:
        raise RuntimeError(
            f"Could not generate presigned URL for '{s3_key}': {e}"
        ) from e

How Priya used this: after uploading Acme's weekly report to S3, she generates a presigned URL valid for 48 hours and includes it in the email to Sandra. Sandra clicks the link from her phone and gets the file directly in her browser. No AWS account required on Sandra's end. The link expires automatically.

Organizing Files in S3: Key Naming Conventions

S3 does not have actual folders — it has keys (essentially file paths) that can contain forward slashes. A consistent naming convention makes listing and finding files much easier:

reports/weekly-sales/2024/W03/north-region.xlsx
reports/weekly-sales/2024/W03/south-region.xlsx
reports/inventory/2024/Q1/inventory-snapshot-2024-01-01.csv
A helper function can generate these dated keys consistently:

from datetime import datetime


def build_report_s3_key(
    report_type: str,
    region: str = "all-regions",
    file_extension: str = "xlsx",
) -> str:
    """
    Build a consistent S3 key for dated business reports.

    Key format: reports/{report_type}/{year}/{week_number}/{region}.{ext}

    Args:
        report_type: Type of report, e.g. 'weekly-sales' or 'inventory'.
        region: Business region identifier, e.g. 'north' or 'all-regions'.
        file_extension: File extension without the dot.

    Returns:
        str: A dated S3 key string with year and ISO week number embedded.

    Example:
        key = build_report_s3_key("weekly-sales", "north")
        # Returns: 'reports/weekly-sales/2024/W03/north.xlsx'
    """
    now = datetime.now()
    # isocalendar() gives the ISO year and week number, which strftime's
    # %W (Monday-based week count) does not match.
    iso_year, iso_week, _ = now.isocalendar()
    week = f"W{iso_week:02d}"
    return f"reports/{report_type}/{iso_year}/{week}/{region}.{file_extension}"



24.4 Google Sheets API with gspread

While S3 is ideal for storing finished files, Google Sheets fills a different need: collaborative, always-current data. When Maya Reyes needs her clients to see live project status without sending updates via email, or when Sandra wants a shared dashboard her whole team can filter — Google Sheets integrated with Python is often the right answer.

Why Sheets Instead of Email Attachments?

Approach                              The Problem
Email attachment                      Recipient gets a snapshot; file versions proliferate; no one knows which is current
Manual update to shared Drive file    Still manual; someone has to remember to re-export
Python + Sheets API                   Script updates the sheet automatically; always current; no email needed; stakeholders bookmark it

The Google Sheets API lets your Python script write directly to a live spreadsheet. Every time the script runs, the sheet reflects the latest data. Anyone you have shared the sheet with sees it in real time.

Setting Up Google Sheets API Access

This requires a few one-time setup steps in the Google Cloud Console:

Step 1. Go to console.cloud.google.com and create a new project.

Step 2. Enable two APIs: "Google Sheets API" and "Google Drive API". Search for each in the API library and click Enable.

Step 3. Create a Service Account — a bot identity for your script, separate from your personal Google account. Go to IAM and Admin, then Service Accounts, then Create Service Account.

Step 4. Create a key for the service account and download it as JSON. This is your credentials file.

Step 5. Open the Google Sheet you want your script to access. Click Share and give the service account's email address Editor access. The email looks like your-service-account@your-project.iam.gserviceaccount.com.

Step 6. Store the path to your credentials file in .env:

GOOGLE_CREDENTIALS_FILE=credentials/google_service_account.json

And add the credentials directory to .gitignore:

credentials/

Install the required libraries:

pip install gspread google-auth

Authenticating and Connecting

"""
sheets_client.py — Google Sheets client and basic operations.
"""

import os
import gspread
from google.oauth2.service_account import Credentials
from dotenv import load_dotenv

load_dotenv()


def get_sheets_client() -> gspread.Client:
    """
    Create and return an authenticated Google Sheets client.

    Uses a service account credentials file specified by the
    GOOGLE_CREDENTIALS_FILE environment variable.

    Returns:
        gspread.Client: An authenticated client with Sheets and Drive scope.

    Raises:
        ValueError: If the credentials path is not set in environment.
        FileNotFoundError: If the credentials file does not exist.
    """
    credentials_path = os.environ.get("GOOGLE_CREDENTIALS_FILE")

    if not credentials_path:
        raise ValueError(
            "GOOGLE_CREDENTIALS_FILE environment variable not set. "
            "Add it to your .env file pointing to your service account JSON."
        )

    if not os.path.exists(credentials_path):
        raise FileNotFoundError(
            f"Google credentials file not found: {credentials_path}. "
            "Download from Google Cloud Console > Service Accounts > Keys."
        )

    scopes = [
        "https://www.googleapis.com/auth/spreadsheets",
        "https://www.googleapis.com/auth/drive",
    ]

    credentials = Credentials.from_service_account_file(
        credentials_path,
        scopes=scopes,
    )
    return gspread.authorize(credentials)

Finding Your Spreadsheet ID

Every Google Sheet has a unique ID in its URL:

https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms/edit

The ID is the long string between /d/ and /edit. Store this in .env:

CLIENT_DASHBOARD_SHEET_ID=1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms
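If colleagues hand you full URLs rather than bare IDs, a small helper (a hypothetical convenience, not part of gspread) can extract the ID for you:

```python
import re


def extract_sheet_id(url_or_id: str) -> str:
    """
    Extract the spreadsheet ID from a full Google Sheets URL.

    If the input does not look like a Sheets URL, it is assumed to
    already be a bare ID and is returned unchanged.
    """
    match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", url_or_id)
    return match.group(1) if match else url_or_id
```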

Reading Data from a Sheet

def read_sheet_as_records(
    spreadsheet_id: str,
    worksheet_name: str = "Sheet1",
) -> list[dict]:
    """
    Read all data from a Google Sheet worksheet as a list of records.

    Each row is returned as a dictionary with column headers as keys.
    The first row of the sheet is treated as the header row.

    Args:
        spreadsheet_id: The Google Sheet ID from its URL.
        worksheet_name: Name of the worksheet tab to read. Default 'Sheet1'.

    Returns:
        List of dicts, one per data row (header row excluded).

    Raises:
        ValueError: If the spreadsheet or worksheet is not accessible.
    """
    client = get_sheets_client()

    try:
        spreadsheet = client.open_by_key(spreadsheet_id)
        worksheet = spreadsheet.worksheet(worksheet_name)
        records = worksheet.get_all_records()
        print(f"Read {len(records)} rows from '{worksheet_name}'")
        return records

    except gspread.exceptions.SpreadsheetNotFound as e:
        raise ValueError(
            f"Spreadsheet '{spreadsheet_id}' not found. "
            "Check the ID and ensure the service account has access."
        ) from e
    except gspread.exceptions.WorksheetNotFound as e:
        raise ValueError(
            f"Worksheet '{worksheet_name}' not found in this spreadsheet."
        ) from e
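The records come back as plain dicts keyed by the header row, which drop straight into pandas for analysis. A quick illustration with made-up data shaped like get_all_records() output:

```python
import pandas as pd

# get_all_records() returns one dict per row, keyed by the header row
records = [
    {"Client": "Acme Corp", "Status": "Active", "Hours": 12},
    {"Client": "Globex", "Status": "Paused", "Hours": 4},
]

df = pd.DataFrame(records)
active = df[df["Status"] == "Active"]
```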

Writing Data to a Sheet

def write_rows_to_sheet(
    spreadsheet_id: str,
    worksheet_name: str,
    headers: list[str],
    rows: list[list],
    clear_existing: bool = True,
) -> None:
    """
    Write rows of data to a Google Sheet worksheet.

    Args:
        spreadsheet_id: The Google Sheet ID from its URL.
        worksheet_name: Name of the worksheet tab to write to.
        headers: Column header strings for the first row.
        rows: List of lists, each inner list being one data row.
        clear_existing: Clear sheet before writing (default True).
                        Set False to append to existing data.

    Raises:
        ValueError: If the spreadsheet is not found.
    """
    client = get_sheets_client()

    try:
        spreadsheet = client.open_by_key(spreadsheet_id)

        try:
            worksheet = spreadsheet.worksheet(worksheet_name)
        except gspread.exceptions.WorksheetNotFound:
            worksheet = spreadsheet.add_worksheet(
                title=worksheet_name,
                rows=1000,
                cols=26,
            )
            print(f"Created new worksheet: '{worksheet_name}'")

        if clear_existing:
            worksheet.clear()

        # Keyword arguments avoid ambiguity: gspread 6+ changed the
        # positional order to (values, range_name).
        worksheet.update(values=[headers] + rows, range_name="A1")
        print(f"Wrote {len(rows)} rows to '{worksheet_name}'")

    except gspread.exceptions.SpreadsheetNotFound:
        raise ValueError(f"Spreadsheet '{spreadsheet_id}' not found.")

Writing a pandas DataFrame to a Sheet

In practice, you will usually have a pandas DataFrame you want to push to a sheet. This function handles the conversion:

import pandas as pd


def dataframe_to_sheet(
    df: pd.DataFrame,
    spreadsheet_id: str,
    worksheet_name: str,
    clear_existing: bool = True,
) -> None:
    """
    Write a pandas DataFrame directly to a Google Sheet.

    Handles conversion of Timestamps, NaN values, and other types
    that do not serialize cleanly to JSON for the Sheets API.

    Args:
        df: The DataFrame to write. Column names become the header row.
        spreadsheet_id: Google Sheet ID from its URL.
        worksheet_name: Target worksheet tab name.
        clear_existing: Clear sheet before writing (default True).
    """
    headers = list(df.columns)
    raw_rows = df.values.tolist()

    cleaned_rows = []
    for row in raw_rows:
        cleaned_row = []
        for cell in row:
            if hasattr(cell, "isoformat"):
                cleaned_row.append(cell.isoformat())
            elif isinstance(cell, float) and pd.isna(cell):
                cleaned_row.append("")
            else:
                cleaned_row.append(cell)
        cleaned_rows.append(cleaned_row)

    write_rows_to_sheet(
        spreadsheet_id,
        worksheet_name,
        headers,
        cleaned_rows,
        clear_existing,
    )

Google Sheets Rate Limits

The Google Sheets API enforces rate limits worth knowing. The exact quotas change over time, but at the time of writing the defaults are approximately:

  • 300 read requests and 300 write requests per minute per project
  • 60 requests per minute per user per project

For most business scripts that update a sheet once daily or weekly, you will never approach these limits. Use the update() method with a full data array (as shown above) rather than writing cell by cell — one call to update five hundred cells is vastly more efficient than five hundred individual calls.


24.5 Cloud-Based Databases

Local SQLite vs. Cloud PostgreSQL

In Chapter 23, Acme's data lived in acme_inventory.db, a SQLite file on Priya's laptop. SQLite is excellent for learning and single-user scripts, but it has a fundamental constraint: only one process can write to it at a time, and the file must be locally accessible.

Cloud-based databases like PostgreSQL solve this. PostgreSQL runs on a server accessible to multiple users simultaneously over the internet, with proper concurrent access control.

The good news: the Python code to query a cloud database looks almost identical to querying SQLite. The primary difference is the connection string and the driver library.

Connection String Anatomy

A database connection string encodes everything needed to connect: host, port, database name, username, and password.

postgresql://username:password@host:port/database_name

A Supabase connection string might look like:

postgresql://priya.chen:s3cur3pass@db.xyzabc.supabase.co:5432/postgres

One caution: if your password contains special characters such as @ or /, they must be percent-encoded (@ becomes %40) or the URL will not parse correctly.

Like all credentials, this goes in .env:

DATABASE_URL=postgresql://priya.chen:s3cur3pass@db.xyzabc.supabase.co:5432/postgres
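You can inspect the pieces of a connection string with Python's standard library (illustrative; the URL below is a made-up example):

```python
from urllib.parse import urlsplit

url = "postgresql://priya:s3cur3pass@db.example.supabase.co:5432/postgres"
parts = urlsplit(url)

print(parts.username)  # priya
print(parts.password)  # s3cur3pass
print(parts.hostname)  # db.example.supabase.co
print(parts.port)      # 5432
print(parts.path)      # /postgres (the database name, with leading slash)
```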

Connecting with SQLAlchemy

SQLAlchemy provides a consistent interface to many database types. The same code works with PostgreSQL by changing only the connection string:

pip install sqlalchemy psycopg2-binary

"""
cloud_database.py — Cloud PostgreSQL connection using SQLAlchemy.
"""

import os
import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.exc import OperationalError
from dotenv import load_dotenv

load_dotenv()


def get_database_engine():
    """
    Create a SQLAlchemy engine connected to the cloud database.

    Reads the connection string from the DATABASE_URL environment variable.
    Works with both PostgreSQL (cloud) and SQLite (local) connection strings,
    making it easy to switch between environments.

    Returns:
        sqlalchemy.engine.Engine: A configured database engine.

    Raises:
        ValueError: If DATABASE_URL environment variable is not set.
        ConnectionError: If the database cannot be reached.
    """
    database_url = os.environ.get("DATABASE_URL")

    if not database_url:
        raise ValueError(
            "DATABASE_URL environment variable not set. "
            "Add your connection string to .env."
        )

    try:
        # pool_pre_ping=True tests connections before use, handling
        # cloud databases that drop idle connections after a timeout
        engine = create_engine(database_url, pool_pre_ping=True)

        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))

        print("Database connection successful.")
        return engine

    except OperationalError as e:
        raise ConnectionError(
            f"Could not connect to database. "
            f"Check your DATABASE_URL and network connectivity. "
            f"Original error: {e}"
        ) from e


def run_query(
    sql_query: str,
    params: dict | None = None,
) -> pd.DataFrame:
    """
    Execute a SQL query and return results as a pandas DataFrame.

    Uses parameterized queries to prevent SQL injection. Always use
    :param_name placeholders for user-supplied values rather than
    formatting them directly into the query string.

    Args:
        sql_query: SQL query string using :param_name placeholders.
        params: Optional dict mapping parameter names to values.

    Returns:
        pd.DataFrame: Query results with column names from the query.

    Example:
        df = run_query(
            "SELECT * FROM sales WHERE region = :region AND year = :year",
            params={"region": "North", "year": 2024}
        )
    """
    engine = get_database_engine()

    with engine.connect() as conn:
        result = conn.execute(text(sql_query), params or {})
        return pd.DataFrame(result.fetchall(), columns=list(result.keys()))

Supabase and Railway: Accessible Cloud PostgreSQL

Two platforms make it easy to get a cloud PostgreSQL database without DevOps expertise:

Supabase (supabase.com) — Free tier available. Gives you PostgreSQL plus a web interface for browsing data, automatic REST API generation, and file storage. Particularly good if your project might eventually grow into a web application.

Railway (railway.app) — Minimal setup, generous free tier, straightforward deployment. Good for developers who want a database running in under five minutes.

Both platforms give you a connection string on their dashboard. Copy it, paste it into your .env file as DATABASE_URL, and your existing SQLAlchemy code connects to the cloud database immediately.

The Migration Path: SQLite to PostgreSQL

When Priya is ready to move acme_inventory.db from local SQLite to cloud PostgreSQL:

  1. Set up a PostgreSQL database on Supabase or Railway
  2. Export SQLite tables to CSV using pandas to_csv()
  3. Import CSV files into PostgreSQL through the web interface or a migration script
  4. Change DATABASE_URL in .env from sqlite:///acme_inventory.db to the PostgreSQL string
  5. Update any SQLite-specific SQL syntax (mostly around date functions)

Steps 4 and 5 are usually the only code changes required.
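
Steps 2 and 3 can usually be collapsed into one short pandas script. The sketch below is one way to do it, assuming pandas and SQLAlchemy are installed; migrate_all_tables and both URL arguments are illustrative names, not code from this chapter:

```python
"""
migrate_sqlite_to_postgres.py — One-time migration sketch (hypothetical
helper; adapt the table handling and URLs to your own database).
"""

import pandas as pd
from sqlalchemy import create_engine, inspect


def migrate_all_tables(source_url: str, target_url: str) -> list[str]:
    """
    Copy every table from the source database to the target via pandas.

    Args:
        source_url: SQLAlchemy URL, e.g. sqlite:///acme_inventory.db.
        target_url: SQLAlchemy URL for the cloud PostgreSQL database.

    Returns:
        list[str]: Names of the tables that were copied.
    """
    source = create_engine(source_url)
    target = create_engine(target_url)

    migrated = []
    for table_name in inspect(source).get_table_names():
        df = pd.read_sql_table(table_name, source)
        # if_exists="replace" drops and recreates the target table;
        # use "fail" if you want the script to stop on a name collision.
        df.to_sql(table_name, target, index=False, if_exists="replace")
        migrated.append(table_name)
    return migrated
```

Run it once, spot-check row counts in the platform's web interface, then flip DATABASE_URL (step 4).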


24.6 Serverless Functions: The Concept You Need to Know

You do not need to implement a serverless function today. But you need to understand what they are, because they represent the natural next step for many business Python scripts.

What "Serverless" Means

A serverless function is a piece of code that runs in the cloud, on demand, without you managing a server. You write a Python function, upload it to a platform like AWS Lambda or Google Cloud Functions, and it runs whenever triggered — on a schedule, when a file arrives in S3, when an API call comes in, and so on.

The term is slightly misleading: there is always a server. You just do not manage it. The cloud provider allocates a container when the function needs to run and deallocates it when done. You pay only for execution time, not for a server sitting idle.

AWS Lambda: The Most Common Serverless Platform

AWS Lambda lets you run Python code for up to 15 minutes per invocation, triggered by:

  - A schedule — "Run this every Monday at 6 AM" via Amazon EventBridge
  - An S3 event — "A new file was uploaded to this bucket"
  - An API call — via API Gateway
  - Many other AWS services
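
The schedule trigger is expressed as an Amazon EventBridge cron rule. EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year), so "every Monday at 6 AM" looks like this:

```
# EventBridge schedule expression: fires every Monday at 6:00 AM UTC
cron(0 6 ? * MON *)
```

Times are in UTC, so adjust the hour field for your timezone; the newer EventBridge Scheduler service supports timezones directly.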

A Lambda function has a specific required structure:

"""
lambda_function.py — Example AWS Lambda function structure.
This would replace Priya's Monday morning manual script run,
running automatically in the cloud every week.
"""

import json


def lambda_handler(event: dict, context) -> dict:
    """
    AWS Lambda entry point. This exact function signature is required.

    AWS calls this function with an event dict containing trigger data
    and a context object with runtime information. The function must
    return a dict with at minimum a 'statusCode' key.

    Args:
        event: Dict containing trigger data.
        context: Lambda context object with runtime metadata.

    Returns:
        dict: Response with statusCode and body.
    """
    try:
        report_key = generate_and_upload_weekly_report()

        return {
            "statusCode": 200,
            "body": json.dumps({
                "message": "Weekly report generated successfully",
                "s3_key": report_key,
            }),
        }

    except Exception as e:
        # Lambda logs are automatically captured in CloudWatch Logs
        print(f"ERROR: {e}")
        return {
            "statusCode": 500,
            "body": json.dumps({"error": str(e)}),
        }


def generate_and_upload_weekly_report() -> str:
    """
    Generate the weekly sales report and upload it to S3.

    In a full implementation, this calls the same report generation
    pipeline used locally — the same Python code works in Lambda.

    Returns:
        str: S3 key of the uploaded report.
    """
    # Placeholder value — the full implementation would build the report,
    # upload it, and return the real S3 key.
    return "reports/weekly-sales/2024/W03/all-regions.xlsx"

When to Move to Serverless

The natural progression for business Python developers:

  1. Script runs manually on your laptop — where Chapter 23 leaves you
  2. Script uploads its output to cloud storage — what this chapter covers
  3. Script runs automatically in the cloud on a schedule — AWS Lambda or Google Cloud Functions

You do not need to go serverless on day one. But knowing it exists means you will recognize when you have outgrown a manually run script.


24.7 The Complete Pipeline: Generate, Upload, and Notify

Let's bring everything from this chapter into a workflow that represents real business automation. This is what Priya built for Acme Corp.

The Four Steps

  1. Python script generates a report (DataFrame to Excel file saved locally)
  2. Script uploads the file to S3 with a date-stamped key
  3. Script generates a presigned URL valid for 48 hours
  4. Script sends an email to Sandra containing the link and a brief summary

The notification module implements step 4:

"""
notification.py — Email notification with S3 presigned URL.
"""

import os
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from datetime import datetime
from dotenv import load_dotenv

load_dotenv()


def send_report_notification(
    recipient_email: str,
    report_name: str,
    presigned_url: str,
    summary_lines: list[str] | None = None,
    expiration_hours: int = 48,
) -> None:
    """
    Send an HTML email notification with a link to a report in S3.

    Args:
        recipient_email: Email address to send the notification to.
        report_name: Human-readable report name for the subject line.
        presigned_url: Presigned S3 URL to include in the email body.
        summary_lines: Optional list of bullet point strings for a brief
                       summary before the download link.
        expiration_hours: How long the link stays valid, for the body text.

    Raises:
        ValueError: If SENDER_EMAIL or SENDER_APP_PASSWORD are not configured.
        smtplib.SMTPAuthenticationError: If email credentials are invalid.
    """
    sender_email = os.environ.get("SENDER_EMAIL")
    sender_password = os.environ.get("SENDER_APP_PASSWORD")
    smtp_host = os.environ.get("SMTP_HOST", "smtp.gmail.com")
    smtp_port = int(os.environ.get("SMTP_PORT", "587"))

    if not sender_email or not sender_password:
        raise ValueError(
            "SENDER_EMAIL and SENDER_APP_PASSWORD must be set in .env. "
            "For Gmail, use an App Password rather than your main password."
        )

    generated_at = datetime.now().strftime("%B %d, %Y at %I:%M %p")

    summary_html = ""
    if summary_lines:
        items = "".join(f"<li>{line}</li>" for line in summary_lines)
        summary_html = f"<ul>{items}</ul>"

    html_body = f"""
    <html>
    <body style="font-family: Arial, sans-serif; color: #333; max-width: 600px;">
        <h2 style="color: #0066cc;">{report_name} is Ready</h2>
        {summary_html}
        <p>
            <a href="{presigned_url}" style="
                display: inline-block;
                background-color: #0066cc;
                color: white;
                padding: 12px 24px;
                text-decoration: none;
                border-radius: 4px;
                font-weight: bold;
            ">View Report</a>
        </p>
        <p style="color: #666; font-size: 0.85em;">
            This link expires in {expiration_hours} hours.<br>
            Generated: {generated_at}
        </p>
        <p style="color: #999; font-size: 0.8em;">
            Sent automatically by Acme Corp Python Reports
        </p>
    </body>
    </html>
    """

    message = MIMEMultipart("alternative")
    message["Subject"] = f"Report Ready: {report_name}"
    message["From"] = sender_email
    message["To"] = recipient_email
    message.attach(MIMEText(html_body, "html"))

    try:
        with smtplib.SMTP(smtp_host, smtp_port) as smtp:
            smtp.ehlo()
            smtp.starttls()
            smtp.login(sender_email, sender_password)
            smtp.send_message(message)
        print(f"Notification sent to {recipient_email}")
    except smtplib.SMTPAuthenticationError as e:
        raise ValueError(
            "Email authentication failed. Check SENDER_EMAIL and "
            "SENDER_APP_PASSWORD. For Gmail, enable 2FA and use an App Password."
        ) from e

24.8 Acme Corp Scenario: Priya's Cloud Migration in Detail

Priya already has a working script that generates the weekly sales report. The modification is straightforward — after generating the local file, she adds the cloud pipeline.

Before the migration, the script ended with saving locally and emailing an attachment. After the migration:

# After: save locally, upload to S3, send presigned link
from datetime import datetime

local_path = "weekly_sales_report.xlsx"
report_df.to_excel(local_path, index=False)

s3_key = build_report_s3_key("weekly-sales")
upload_file_to_s3(local_path, ACME_S3_BUCKET, s3_key)

share_url = generate_presigned_url(ACME_S3_BUCKET, s3_key, expiration_hours=48)

summary = [
    f"Total revenue: ${report_df['total_revenue'].sum():,.2f}",
    f"Best region: {report_df.loc[report_df['total_revenue'].idxmax(), 'region']}",
    f"Transactions: {report_df['transaction_count'].sum():,}",
]

send_report_notification(
    recipient_email=SANDRA_EMAIL,
    report_name=f"Weekly Sales — {datetime.now().strftime('Week %W, %Y')}",
    presigned_url=share_url,
    summary_lines=summary,
    expiration_hours=48,
)

Marcus Webb's IAM Review

Before this went into production, Marcus Webb reviewed the AWS permissions. He wanted the script's IAM user to have access to exactly one bucket and nothing else.

Priya showed him the IAM policy she had requested:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AcmeReportsBucketAccess",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::acme-corp-reports",
                "arn:aws:s3:::acme-corp-reports/*"
            ]
        }
    ]
}

"This is how it should be done," Marcus said. "The script can only touch that one bucket. If these credentials were ever compromised, the attacker could not access anything else in our AWS account." He approved the IAM user creation that afternoon.

This pattern is worth internalizing: when you ask your IT team for AWS permissions, lead with your specific needs. "I need read and write access to one S3 bucket" is a request they can approve confidently. "I need admin access" is a request they will and should push back on.

Sandra's Experience

Sandra received the first notification email at 6:58 AM. She clicked "View Report" from her iPhone on the way to the airport, and the report downloaded and opened on her phone. She forwarded the link to two regional managers traveling with her — neither needed an AWS account. The link expired automatically 48 hours later.

"This is the kind of thing I have been asking for," Sandra told Priya. "Not the Excel attachment that sits in my inbox getting stale."


24.9 Maya Reyes: Google Sheets as a Client-Facing Dashboard

Maya Reyes has a different problem. Her clients frequently ask for project status updates. What she wants is a shared Google Sheet that each client can bookmark, showing current project status, without Maya having to manually update it.

Her existing data lives in maya_projects.csv. She builds a nightly script that reads the CSV, calculates derived status columns, and pushes a client-appropriate view to a shared Google Sheet.

"""
maya_client_dashboard.py — Nightly push of project status to Google Sheets.
"""

import os
import pandas as pd
from datetime import datetime, date
from dotenv import load_dotenv

load_dotenv()

CLIENT_DASHBOARD_SHEET_ID = os.environ.get("CLIENT_DASHBOARD_SHEET_ID")


def load_and_prepare_project_status(csv_path: str) -> pd.DataFrame:
    """
    Load Maya's project data and compute client-facing status columns.

    Args:
        csv_path: Path to maya_projects.csv.

    Returns:
        pd.DataFrame: Client-appropriate project view with derived status.
    """
    df = pd.read_csv(
        csv_path,
        parse_dates=["start_date", "deadline", "last_update"],
    )

    today = date.today()
    df["days_remaining"] = df["deadline"].apply(
        lambda d: (d.date() - today).days
    )

    df["status_label"] = df.apply(
        lambda row: _compute_status_label(
            completion_percent=row["completion_percent"],
            days_remaining=row["days_remaining"],
            on_hold=bool(row.get("on_hold", False)),
        ),
        axis=1,
    )

    df["dashboard_updated"] = datetime.now().strftime("%Y-%m-%d %H:%M")

    client_columns = [
        "project_name",
        "client_name",
        "start_date",
        "deadline",
        "completion_percent",
        "days_remaining",
        "status_label",
        "dashboard_updated",
    ]
    return df[client_columns]


def _compute_status_label(
    completion_percent: float,
    days_remaining: int,
    on_hold: bool,
) -> str:
    """
    Convert numeric progress metrics to a plain-English status label.

    Args:
        completion_percent: 0 to 100 completion value.
        days_remaining: Days until deadline (negative means overdue).
        on_hold: Whether the project is currently paused.

    Returns:
        str: One of 'On Hold', 'Complete', 'Overdue', 'At Risk', 'On Track'.
    """
    if on_hold:
        return "On Hold"
    if completion_percent >= 100:
        return "Complete"
    if days_remaining < 0:
        return "Overdue"
    if days_remaining < 7 and completion_percent < 80:
        return "At Risk"
    return "On Track"


def push_project_status_to_sheets(csv_path: str) -> None:
    """
    Load project data and write the client-facing view to Google Sheets.

    This is the main function to call from a scheduler. It performs
    the full cycle: load CSV, enrich with derived columns, push to Sheets.

    Args:
        csv_path: Path to maya_projects.csv.

    Raises:
        ValueError: If CLIENT_DASHBOARD_SHEET_ID is not configured.
    """
    if not CLIENT_DASHBOARD_SHEET_ID:
        raise ValueError(
            "CLIENT_DASHBOARD_SHEET_ID not set in .env."
        )

    print(f"Loading project data from {csv_path}...")
    dashboard_df = load_and_prepare_project_status(csv_path)

    at_risk = (dashboard_df["status_label"] == "At Risk").sum()
    overdue = (dashboard_df["status_label"] == "Overdue").sum()
    print(
        f"Projects: {len(dashboard_df)} total, "
        f"{at_risk} at risk, {overdue} overdue"
    )

    from sheets_client import dataframe_to_sheet
    dataframe_to_sheet(
        df=dashboard_df,
        spreadsheet_id=CLIENT_DASHBOARD_SHEET_ID,
        worksheet_name="Project Status",
        clear_existing=True,
    )

    sheet_url = (
        f"https://docs.google.com/spreadsheets/d/{CLIENT_DASHBOARD_SHEET_ID}"
    )
    print(f"Dashboard updated. View at: {sheet_url}")


if __name__ == "__main__":
    push_project_status_to_sheets("data/maya_projects.csv")
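
The status thresholds are worth sanity-checking before clients see them, because the order of the rules matters. A standalone sketch (the function body is repeated from _compute_status_label above so the snippet runs on its own):

```python
def compute_status_label(
    completion_percent: float, days_remaining: int, on_hold: bool
) -> str:
    """Same rules as _compute_status_label above, repeated for a standalone check."""
    if on_hold:
        return "On Hold"
    if completion_percent >= 100:
        return "Complete"
    if days_remaining < 0:
        return "Overdue"
    if days_remaining < 7 and completion_percent < 80:
        return "At Risk"
    return "On Track"


# Edge cases worth pinning down:
assert compute_status_label(100, -3, False) == "Complete"  # done late is still done
assert compute_status_label(50, -1, False) == "Overdue"
assert compute_status_label(75, 5, False) == "At Risk"
assert compute_status_label(85, 5, False) == "On Track"    # close deadline, far enough along
assert compute_status_label(10, 2, True) == "On Hold"      # hold wins over everything
```

Note that "Complete" is checked before "Overdue", so a project finished after its deadline still reads as done rather than late.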

Maya shares the sheet link with each client once, with view-only access. Whenever a client wants to check status, they open the bookmarked link. No email from Maya required. The script runs nightly via Task Scheduler, and the sheet reflects the latest data every morning.

"It's the closest thing to a client portal I can build without hiring a developer," Maya told a fellow consultant. "And it took me an afternoon."


24.10 Security Recap: The Checklist Before You Deploy

Before any cloud-connected script goes into regular use, run through this checklist:

Pre-Deployment Security Checklist

[ ] .gitignore created with .env, credentials/, *.json, *.pem
[ ] .env file has all credentials (none in .py source files)
[ ] load_dotenv() called at the top of the main script
[ ] All os.environ.get() calls validated (raise ValueError if None)
[ ] S3 IAM policy uses least privilege (only needed actions, specific bucket ARN)
[ ] Google Service Account has access only to needed spreadsheets
[ ] Presigned URL expiration set to the minimum practical duration
[ ] Credentials never appear in print() statements or log output
[ ] Tested with invalid credentials to verify graceful error handling
[ ] .env verified as ignored by git before first commit
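
The "validate every os.environ.get() call" item is easy to make mechanical with a small helper. A sketch; require_env is a hypothetical name, not a function from this chapter:

```python
import os


def require_env(name: str) -> str:
    """Return the named environment variable, or fail fast with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise ValueError(
            f"{name} is not set. Add it to your .env file and call load_dotenv()."
        )
    return value
```

Calling bucket = require_env("ACME_S3_BUCKET") at startup turns a confusing NoneType error deep inside boto3 into a clear message the moment the script launches.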

Credential Storage Reference

| Credential | Local Development | Production |
| --- | --- | --- |
| AWS access key ID + secret | .env file | AWS Secrets Manager or Lambda env vars |
| Google service account JSON | File path in .env; JSON in .gitignore'd directory | Path in platform env vars |
| Database connection string | .env file | Platform secrets panel |
| Email app password | .env file | Platform secrets panel |
| Any API key | .env file | Platform secrets panel |

Complete .env Template for This Chapter

Save as .env.example (safe to commit as a template) and copy to .env (never commit):

# Acme Corp Cloud Configuration
# Copy to .env and fill in real values
# DO NOT commit .env to git

# AWS S3
AWS_ACCESS_KEY_ID=your_access_key_here
AWS_SECRET_ACCESS_KEY=your_secret_key_here
AWS_DEFAULT_REGION=us-east-1
ACME_S3_BUCKET=acme-corp-reports

# Google Sheets
GOOGLE_CREDENTIALS_FILE=credentials/google_service_account.json
CLIENT_DASHBOARD_SHEET_ID=your_sheet_id_from_url_here

# Database
DATABASE_URL=postgresql://user:password@host:port/database

# Email Notifications
SENDER_EMAIL=reports@acmecorp.com
SENDER_APP_PASSWORD=your_gmail_app_password
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SANDRA_EMAIL=sandra.chen@acmecorp.com

24.11 Chapter Summary

You have covered the essential building blocks for connecting Python to cloud services. Here is what you can now do:

Credential security is foundational. Environment variables, the .env file pattern, and .gitignore are not optional — they are the baseline expectation for any code that touches cloud services.

Amazon S3 gives your scripts a reliable place to store output files accessible to anyone you choose. The upload, presigned URL, and email notification pattern is one of the most practical workflows in business Python.

Google Sheets API lets Python write directly to a live spreadsheet that multiple people can view simultaneously. It bridges the gap between "Python script" and "shared dashboard" without requiring a web application.

Cloud databases use the same Python code as local databases. The connection string is what changes — SQLAlchemy abstracts the rest.

Serverless functions are the next step after cloud storage: code that runs automatically in the cloud on a schedule or in response to events. You understand the concept and can implement it when the need arises.

The complete pipeline — generate a report, upload to S3, create a presigned URL, email the link — is a workflow you can implement today and apply to almost any reporting task.

Priya's script now runs every Monday morning as a scheduled task on a server Marcus configured. It uploads the report to S3, generates a presigned URL, and sends Sandra a link before 7 AM. Sandra reads it over morning coffee. Marcus, who had been skeptical of the whole Python initiative, has started asking Priya what else can be automated.

That is the trajectory this chapter sets you on.


What Is Next

In Chapter 25, we will explore web scraping — gathering data from public web sources that do not offer APIs, monitoring competitor pricing, and pulling information into your Python pipeline. In Chapter 26, we will apply the data infrastructure you have built to business forecasting: predicting sales trends, understanding seasonality, and communicating uncertainty honestly with confidence intervals.


End of Chapter 24