
Chapter 27: Security-First Development

"Security is not a feature—it is a quality of every feature." — Adapted from the principle of secure-by-design development

Learning Objectives

By the end of this chapter, you will be able to:

  1. Evaluate AI-generated code for common security vulnerabilities and articulate why the vibe-coding workflow demands heightened security awareness (Bloom's: Evaluate)
  2. Design comprehensive input validation schemes using whitelisting, sanitization, and schema-based validation with libraries such as Pydantic (Bloom's: Create)
  3. Analyze SQL injection and Cross-Site Scripting (XSS) attack vectors and implement defenses including parameterized queries, ORM safety, output encoding, and Content Security Policy headers (Bloom's: Analyze)
  4. Implement secure authentication flows using bcrypt password hashing, JWT handling, and session management best practices (Bloom's: Apply)
  5. Design authorization architectures using Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC) patterns (Bloom's: Create)
  6. Apply secrets management strategies including environment variables, secret managers, and automated secret-scanning tools (Bloom's: Apply)
  7. Evaluate dependency security using tools like pip-audit, Safety, and Dependabot to protect against supply-chain attacks (Bloom's: Evaluate)
  8. Create AI-assisted security audit workflows and build comprehensive security checklists for vibe-coded projects (Bloom's: Create)

27.1 The Security Mindset for Vibe Coders

Security-first development is not an afterthought bolted onto the end of a project. It is a way of thinking that permeates every line of code, every architectural decision, and every prompt you send to an AI coding assistant. For vibe coders—developers who rely on AI to generate, modify, and extend code—the security mindset is more important than ever.

Why AI-Generated Code Demands Extra Vigilance

When you generate code with an AI assistant, you inherit all the patterns the model learned from its training data. That training data includes millions of tutorials, Stack Overflow answers, and open-source projects—some of which contain insecure patterns that were acceptable five or ten years ago but are dangerous today.

Warning: AI Models Learn from Insecure Code Too

AI coding assistants learn from the broad corpus of publicly available code. Research from Stanford University (2021) and NYU (2023) demonstrated that AI code-generation tools can produce code with security vulnerabilities approximately 25–40% of the time when generating security-sensitive functions. The model does not distinguish between a "teaching example" that omits error handling for clarity and production code that must be hardened. You must.

Consider a simple example. You ask an AI assistant to generate a function that queries a database for a user by name:

# AI-generated code — INSECURE
def get_user(name):
    query = f"SELECT * FROM users WHERE name = '{name}'"
    cursor.execute(query)
    return cursor.fetchone()

This code compiles and runs. It returns the correct user. In a vibe-coding session, where you are moving fast and the code "works," you might accept it and move on. But this function is vulnerable to SQL injection—one of the most devastating and well-known attack vectors in software history. An attacker who supplies ' OR '1'='1 as the name will receive every row in the users table.

The security mindset means asking yourself a series of questions every time you accept AI-generated code:

  1. What inputs does this function accept? Are they validated? Are they sanitized?
  2. What external systems does this function interact with? Databases, file systems, APIs, the browser DOM?
  3. What happens if the input is malicious? Not accidental—intentionally crafted to cause harm.
  4. Does this function handle secrets? API keys, passwords, tokens?
  5. Does this function enforce authorization? Can an unauthorized user reach it?

Key Concept: The Trust Boundary

A trust boundary is any point where data crosses from an untrusted source to a trusted context. User input entering your application is a trust boundary. Data arriving from a third-party API is a trust boundary. Even data read from a database—if it was originally user-supplied—crosses a trust boundary when rendered in HTML. Every trust boundary requires validation, sanitization, or encoding.

The Three Pillars of Security-First Vibe Coding

Throughout this chapter, we will return to three principles:

  1. Never trust, always verify. Every piece of data entering your system must be validated, regardless of its source.
  2. Least privilege. Every component, user, and process should have the minimum permissions necessary to perform its function.
  3. Defense in depth. No single control should be your only protection. Layer defenses so that if one fails, others catch the attack.

These principles apply universally, but they take on special urgency when AI generates your code. The AI does not know your threat model, your compliance requirements, or your data classification. You do.

Cross-Reference: As we discussed in Chapter 14 (When AI Gets It Wrong), AI failure modes include generating insecure code that appears correct. Always apply the verification strategies from that chapter—code review, testing, and static analysis—with a security lens.


27.2 Input Validation and Sanitization

Input validation is the single most impactful security practice you can adopt. The majority of web application vulnerabilities—SQL injection, XSS, command injection, path traversal—stem from insufficient input validation.

Whitelisting vs. Blacklisting

There are two philosophical approaches to validation:

  • Blacklisting (deny-list): Reject inputs that match known-bad patterns. For example, strip <script> tags from user input.
  • Whitelisting (allow-list): Accept only inputs that match known-good patterns. For example, accept only alphanumeric characters for a username.

Always prefer whitelisting. Blacklists are inherently incomplete—attackers constantly discover new encodings, obfuscation techniques, and bypasses. A whitelist defines exactly what is acceptable and rejects everything else.

import re

def validate_username(username: str) -> bool:
    """Validate username: 3-30 alphanumeric characters plus underscores."""
    pattern = r'^[a-zA-Z0-9_]{3,30}$'
    return bool(re.match(pattern, username))

Schema-Based Validation with Pydantic

For complex data structures—API request bodies, configuration files, form submissions—manual validation becomes unwieldy. Pydantic provides schema-based validation that is both expressive and secure:

from pydantic import BaseModel, EmailStr, Field, field_validator
import re

class UserRegistration(BaseModel):
    username: str = Field(..., min_length=3, max_length=30)
    email: EmailStr
    password: str = Field(..., min_length=12, max_length=128)
    age: int = Field(..., ge=13, le=150)

    @field_validator('username')
    @classmethod
    def username_alphanumeric(cls, v: str) -> str:
        if not re.match(r'^[a-zA-Z0-9_]+$', v):
            raise ValueError('Username must be alphanumeric')
        return v

    @field_validator('password')
    @classmethod
    def password_complexity(cls, v: str) -> str:
        if not re.search(r'[A-Z]', v):
            raise ValueError('Password must contain an uppercase letter')
        if not re.search(r'[a-z]', v):
            raise ValueError('Password must contain a lowercase letter')
        if not re.search(r'[0-9]', v):
            raise ValueError('Password must contain a digit')
        if not re.search(r'[^a-zA-Z0-9]', v):
            raise ValueError('Password must contain a special character')
        return v

Best Practice: Validate Early, Validate Strictly

Validate input at the earliest possible point—at the API boundary, before it enters your business logic. Use strict schemas that reject unexpected fields, constrain lengths, and enforce types. This prevents entire classes of vulnerabilities from reaching your application core.
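The registration model above enforces types and lengths, but by default Pydantic silently ignores unexpected fields. A minimal sketch of a strict model that rejects them, assuming Pydantic v2 (the StrictRegistration name and is_admin field are illustrative):

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError

class StrictRegistration(BaseModel):
    # extra="forbid" turns unexpected fields into validation errors
    model_config = ConfigDict(extra="forbid")

    username: str = Field(..., min_length=3, max_length=30)

try:
    # "is_admin" is not part of the schema — a mass-assignment attempt
    StrictRegistration(username="alice", is_admin=True)
except ValidationError as e:
    print(e)  # the extra field is reported and the object is never created
```

Strict models turn mass-assignment attempts into explicit errors instead of silently dropping — or worse, accepting — the unexpected field.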

Sanitization: When Validation Alone Is Not Enough

Sometimes you must accept rich input—HTML content for a blog post, Markdown for comments, file uploads. In these cases, sanitization removes or encodes dangerous content while preserving legitimate functionality:

import bleach

def sanitize_html(content: str) -> str:
    """Sanitize HTML, allowing only safe tags and attributes."""
    allowed_tags = ['p', 'br', 'strong', 'em', 'a', 'ul', 'ol', 'li', 'code', 'pre']
    allowed_attrs = {'a': ['href', 'title']}
    return bleach.clean(
        content,
        tags=allowed_tags,
        attributes=allowed_attrs,
        strip=True
    )

Prompting AI for Secure Validation

When prompting your AI assistant, be explicit about security requirements:

Prompt: "Generate an input validation function for a user registration endpoint.
Use Pydantic for schema validation. The function must:
- Whitelist-validate the username (alphanumeric + underscore, 3-30 chars)
- Validate email format
- Enforce password complexity (12+ chars, mixed case, digit, special char)
- Reject any unexpected fields
- Return structured error messages
Include type hints and docstrings."

Explicit prompts that mention whitelisting, complexity requirements, and rejection of unexpected fields produce dramatically more secure code than vague prompts like "validate user input."


27.3 SQL Injection Prevention

SQL injection (SQLi) occurs when an attacker injects malicious SQL code into a query by manipulating user-supplied input. Despite being understood since the late 1990s, SQL injection remains in the OWASP Top 10 and continues to cause major breaches.

How SQL Injection Works

Consider the vulnerable code from Section 27.1:

# VULNERABLE — string interpolation in SQL
def get_user(name: str):
    query = f"SELECT * FROM users WHERE name = '{name}'"
    cursor.execute(query)
    return cursor.fetchone()

An attacker supplies name = "'; DROP TABLE users; --". The resulting query becomes:

SELECT * FROM users WHERE name = ''; DROP TABLE users; --'

The database executes two statements: the original SELECT (which returns nothing) and a DROP TABLE that destroys your data.

The Defense: Parameterized Queries

Parameterized queries (also called prepared statements) separate SQL code from data. The database driver sends the query structure and the parameter values separately, making injection impossible:

# SECURE — parameterized query
def get_user(name: str):
    query = "SELECT * FROM users WHERE name = %s"
    cursor.execute(query, (name,))
    return cursor.fetchone()

The %s placeholder is not string interpolation—it is a parameter marker that the database driver handles safely. The database knows that the value of name is data, never code.

Rule: Never Build SQL Queries with String Formatting

Do not use f-strings, .format(), % formatting, or string concatenation to build SQL queries. Always use parameterized queries. This rule has zero exceptions.
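The rule can be demonstrated end to end with the standard library's sqlite3 driver (note that placeholder syntax varies by driver: sqlite3 uses ? where psycopg-style drivers use %s). A self-contained sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

payload = "' OR '1'='1"  # classic injection attempt

# Parameterized: the payload is treated purely as data
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (payload,)
).fetchall()
print(rows)  # [] — no user is literally named "' OR '1'='1"

# String interpolation: the payload rewrites the query itself
rows = conn.execute(
    f"SELECT * FROM users WHERE name = '{payload}'"
).fetchall()
print(rows)  # every row in the table leaks
```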

ORM Safety with SQLAlchemy

Object-Relational Mappers (ORMs) like SQLAlchemy abstract database queries into Python objects. When used correctly, they generate parameterized queries automatically:

from sqlalchemy import select
from sqlalchemy.orm import Session

def get_user_by_name(session: Session, name: str) -> User | None:
    """Retrieve a user by name using SQLAlchemy ORM — safe from SQLi."""
    stmt = select(User).where(User.name == name)
    return session.scalars(stmt).first()

However, ORMs are not automatically safe. If you use raw SQL through the ORM, you must still parameterize:

# VULNERABLE — raw SQL through ORM without parameters
result = session.execute(f"SELECT * FROM users WHERE name = '{name}'")

# SECURE — raw SQL through ORM with parameters
from sqlalchemy import text
result = session.execute(text("SELECT * FROM users WHERE name = :name"), {"name": name})

AI Pitfall: ORMs and Raw SQL

AI assistants sometimes generate ORM code that falls back to raw SQL strings for complex queries. Always check whether raw SQL segments are parameterized. The presence of an ORM does not guarantee safety if raw SQL is used improperly within it.

Dynamic Query Building

Sometimes you need to build queries dynamically—for search filters, sorting, or pagination. Never interpolate column names or table names from user input. Use whitelists:

ALLOWED_SORT_COLUMNS = {"name", "email", "created_at"}
ALLOWED_SORT_DIRECTIONS = {"asc", "desc"}

def build_user_query(sort_by: str = "name", direction: str = "asc") -> str:
    """Build a user query with safe dynamic sorting."""
    if sort_by not in ALLOWED_SORT_COLUMNS:
        sort_by = "name"
    if direction.lower() not in ALLOWED_SORT_DIRECTIONS:
        direction = "asc"
    return f"SELECT * FROM users ORDER BY {sort_by} {direction}"

Because sort_by and direction can only take values from a predefined whitelist, injection is impossible.


27.4 Cross-Site Scripting (XSS) Prevention

Cross-Site Scripting (XSS) occurs when an attacker injects malicious scripts into web pages viewed by other users. XSS can steal session cookies, redirect users to phishing sites, deface web pages, and perform actions on behalf of authenticated users.

Types of XSS

  1. Stored (Persistent) XSS: Malicious script is stored in the database (e.g., in a comment) and served to every user who views the page.
  2. Reflected XSS: Malicious script is included in a URL parameter and reflected back in the response.
  3. DOM-Based XSS: Malicious script manipulates the page's DOM through client-side JavaScript without the server being involved.

Output Encoding: The Primary Defense

The core defense against XSS is output encoding—converting special characters into their HTML entity equivalents before rendering them in the browser:

Character   HTML Entity
<           &lt;
>           &gt;
&           &amp;
"           &quot;
'           &#x27;
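Python's standard library exposes this mapping directly; html.escape applies exactly the conversions in the table above:

```python
import html

payload = "<script>alert('XSS')</script>"

# quote=True (the default) also encodes quote characters,
# which matters when output lands inside an HTML attribute
print(html.escape(payload, quote=True))
# &lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;
```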

Most modern template engines perform automatic output encoding:

# Jinja2 — auto-escaping is ON by default in Flask
from flask import render_template

@app.route('/profile/<username>')
def profile(username):
    return render_template('profile.html', username=username)

In profile.html:

<!-- Jinja2 auto-escapes {{ username }} -->
<h1>Welcome, {{ username }}</h1>

If a user sets their name to <script>alert('XSS')</script>, the rendered HTML becomes:

<h1>Welcome, &lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;</h1>

The script is displayed as text, not executed.

Warning: Disabling Auto-Escaping

Jinja2's |safe filter and the {% autoescape false %} block disable auto-escaping. AI assistants sometimes use these in generated templates for "convenience." If you see |safe in AI-generated code, ask why. Only use it when you have already sanitized the content through a library like Bleach.

Content Security Policy (CSP) Headers

Content Security Policy is a browser-enforced security mechanism that restricts what resources a page can load. A strict CSP is one of the most effective defenses against XSS:

from flask import Flask, make_response

app = Flask(__name__)

@app.after_request
def add_security_headers(response):
    """Add security headers to every response."""
    response.headers['Content-Security-Policy'] = (
        "default-src 'self'; "
        "script-src 'self'; "
        "style-src 'self' 'unsafe-inline'; "
        "img-src 'self' data:; "
        "font-src 'self'; "
        "connect-src 'self'; "
        "frame-ancestors 'none'; "
        "base-uri 'self'; "
        "form-action 'self'"
    )
    response.headers['X-Content-Type-Options'] = 'nosniff'
    response.headers['X-Frame-Options'] = 'DENY'
    response.headers['X-XSS-Protection'] = '0'  # Deprecated; CSP is the replacement
    response.headers['Strict-Transport-Security'] = 'max-age=31536000; includeSubDomains'
    response.headers['Referrer-Policy'] = 'strict-origin-when-cross-origin'
    return response

The script-src 'self' directive tells the browser to execute only scripts loaded from your own domain. Inline scripts injected by an attacker will be blocked.

Best Practice: Start Strict, Relax as Needed

Begin with the most restrictive CSP possible (default-src 'self'). When legitimate functionality breaks, add specific exceptions. Never use unsafe-eval in production—it defeats the purpose of CSP.

Additional XSS Prevention Measures

  • HTTPOnly cookies: Set the HttpOnly flag on session cookies so JavaScript cannot access them.
  • SameSite cookies: Set SameSite=Lax or SameSite=Strict to prevent Cross-Site Request Forgery (CSRF).
  • Input sanitization: For rich content, sanitize on input using Bleach (as shown in Section 27.2).
  • Subresource Integrity (SRI): When loading scripts from CDNs, use integrity hashes to ensure the file has not been tampered with.
<script src="https://cdn.example.com/lib.js"
        integrity="sha384-abc123..."
        crossorigin="anonymous"></script>

27.5 Authentication Patterns and Best Practices

Authentication is the process of verifying a user's identity. Broken authentication is consistently ranked in the OWASP Top 10, and AI-generated authentication code is frequently insecure.

Cross-Reference: Chapter 17 (Backend Development and REST APIs) introduced basic authentication concepts. This section deepens that discussion with a focus on security hardening.

Password Hashing with bcrypt

Never store passwords in plaintext. Never use MD5 or SHA-256 for password hashing—these are fast hash algorithms designed for data integrity, not password storage. Use a slow, salted, adaptive hash function:

import bcrypt

def hash_password(password: str) -> str:
    """Hash a password using bcrypt with automatic salting."""
    salt = bcrypt.gensalt(rounds=12)  # 2^12 iterations
    hashed = bcrypt.hashpw(password.encode('utf-8'), salt)
    return hashed.decode('utf-8')

def verify_password(password: str, hashed: str) -> bool:
    """Verify a password against its bcrypt hash."""
    return bcrypt.checkpw(
        password.encode('utf-8'),
        hashed.encode('utf-8')
    )

Why bcrypt?

bcrypt is deliberately slow. The rounds parameter (also called the work factor) determines the computational cost. At rounds=12, hashing takes approximately 250ms on modern hardware. This makes brute-force attacks impractical: an attacker who can compute billions of SHA-256 hashes per second can compute only a few bcrypt hashes per second. Other strong choices include Argon2 (the winner of the Password Hashing Competition) and scrypt.

JSON Web Tokens (JWT): Proper Handling

JWTs are widely used for stateless authentication, but they are frequently mishandled. Common vulnerabilities include:

  1. Algorithm confusion: The none algorithm or switching from RSA to HMAC.
  2. Insufficient validation: Not checking expiration, issuer, or audience.
  3. Token storage: Storing JWTs in localStorage (vulnerable to XSS).
  4. Secret weakness: Using short or predictable signing secrets.

Secure JWT implementation:

import jwt
from datetime import datetime, timedelta, timezone
from typing import Any

SECRET_KEY = "your-256-bit-secret-loaded-from-env"  # See Section 27.7
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 15
REFRESH_TOKEN_EXPIRE_DAYS = 7

def create_access_token(user_id: int, roles: list[str]) -> str:
    """Create a short-lived access token."""
    now = datetime.now(timezone.utc)
    payload = {
        "sub": str(user_id),
        "roles": roles,
        "iat": now,
        "exp": now + timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES),
        "iss": "your-app-name",
        "type": "access"
    }
    return jwt.encode(payload, SECRET_KEY, algorithm=ALGORITHM)

def verify_access_token(token: str) -> dict[str, Any]:
    """Verify and decode an access token with full validation."""
    try:
        payload = jwt.decode(
            token,
            SECRET_KEY,
            algorithms=[ALGORITHM],  # Note: list — prevents algorithm confusion
            options={
                "require": ["sub", "exp", "iat", "iss", "type"],
                "verify_exp": True,
                "verify_iss": True,
            },
            issuer="your-app-name"
        )
        if payload.get("type") != "access":
            raise jwt.InvalidTokenError("Invalid token type")
        return payload
    except jwt.ExpiredSignatureError:
        raise ValueError("Token has expired")
    except jwt.InvalidTokenError as e:
        raise ValueError(f"Invalid token: {e}")

Key points in the secure implementation:

  • Short expiration: Access tokens expire in 15 minutes. Use refresh tokens for longer sessions.
  • Explicit algorithm list: Pass algorithms=[ALGORITHM] as a list to prevent algorithm confusion attacks.
  • Required claims: The require option ensures all expected claims are present.
  • Token type checking: Prevents a refresh token from being used as an access token.
  • Issuer validation: Ensures the token was issued by your application.
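The refresh tokens mentioned above can themselves be JWTs with type "refresh", or opaque random tokens stored server-side. A standard-library-only sketch of the opaque approach (the in-memory dict is a stand-in for a database table):

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

REFRESH_TOKEN_EXPIRE_DAYS = 7

# token_hash -> (user_id, expiry); a real app would use a database table
_refresh_store: dict[str, tuple[int, datetime]] = {}

def issue_refresh_token(user_id: int) -> str:
    """Create an opaque refresh token; only its hash is stored server-side."""
    token = secrets.token_urlsafe(32)
    # SHA-256 is fine here (unlike for passwords) because the token is high-entropy
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    expiry = datetime.now(timezone.utc) + timedelta(days=REFRESH_TOKEN_EXPIRE_DAYS)
    _refresh_store[token_hash] = (user_id, expiry)
    return token  # the client receives the raw token exactly once

def redeem_refresh_token(token: str) -> int:
    """Validate a refresh token, rotate it out, and return the user id."""
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    entry = _refresh_store.pop(token_hash, None)  # single use: rotation on redemption
    if entry is None:
        raise ValueError("Unknown or already-used refresh token")
    user_id, expiry = entry
    if datetime.now(timezone.utc) > expiry:
        raise ValueError("Refresh token expired")
    return user_id
```

Storing only the hash means a database leak does not yield usable tokens, and popping the entry on redemption gives single-use rotation for free.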

Session Security

For server-side sessions (an alternative to JWTs), apply these hardening measures:

from flask import Flask

app = Flask(__name__)
app.config.update(
    SECRET_KEY='loaded-from-environment-variable',
    SESSION_COOKIE_HTTPONLY=True,      # JavaScript cannot access the cookie
    SESSION_COOKIE_SECURE=True,        # Cookie sent only over HTTPS
    SESSION_COOKIE_SAMESITE='Lax',     # Mitigates CSRF
    SESSION_COOKIE_NAME='__Host-session',  # __Host- prefix enforces Secure+Path=/
    PERMANENT_SESSION_LIFETIME=1800,   # 30-minute session timeout
)

AI Pitfall: Default Session Configuration

AI assistants frequently generate Flask or Django applications with default session settings. Default settings rarely include Secure, HttpOnly, or SameSite attributes. Always audit session configuration in AI-generated code.

Rate Limiting and Account Protection

Authentication endpoints are high-value targets. Protect them with:

  • Rate limiting: Limit login attempts per IP address and per account.
  • Account lockout: Temporarily lock accounts after repeated failed attempts.
  • CAPTCHA: Add CAPTCHA after a threshold of failed attempts.
  • Multi-factor authentication (MFA): Require a second factor for sensitive operations.
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

limiter = Limiter(
    get_remote_address,
    app=app,
    default_limits=["200 per day", "50 per hour"],
    storage_uri="redis://localhost:6379"
)

@app.route('/login', methods=['POST'])
@limiter.limit("5 per minute")  # Strict rate limit on login
def login():
    # ... authentication logic
    pass

27.6 Authorization and Access Control

Authentication answers "Who are you?" Authorization answers "What are you allowed to do?" Even if authentication is perfect, broken authorization can expose sensitive data and functionality.

Role-Based Access Control (RBAC)

RBAC assigns permissions to roles, and roles to users. It is the most common authorization model for web applications:

from enum import Enum
from functools import wraps
from flask import g, abort

class Role(str, Enum):
    VIEWER = "viewer"
    EDITOR = "editor"
    ADMIN = "admin"

# Permission matrix: role -> set of allowed actions
PERMISSIONS: dict[Role, set[str]] = {
    Role.VIEWER: {"read_article", "read_comment"},
    Role.EDITOR: {"read_article", "read_comment", "create_article", "edit_article",
                  "create_comment"},
    Role.ADMIN: {"read_article", "read_comment", "create_article", "edit_article",
                 "delete_article", "create_comment", "delete_comment", "manage_users"},
}

def require_permission(permission: str):
    """Decorator to enforce permission-based access control."""
    def decorator(f):
        @wraps(f)
        def decorated_function(*args, **kwargs):
            user_role = getattr(g, 'user_role', None)
            if user_role is None:
                abort(401)  # Unauthenticated
            try:
                allowed = PERMISSIONS[Role(user_role)]
            except (ValueError, KeyError):
                abort(403)  # Unknown role name
            if permission not in allowed:
                abort(403)  # Forbidden
            return f(*args, **kwargs)
        return decorated_function
    return decorator

@app.route('/admin/users')
@require_permission('manage_users')
def manage_users():
    """Only admins can access this endpoint."""
    # ... user management logic
    pass

Attribute-Based Access Control (ABAC)

ABAC makes authorization decisions based on attributes of the user, the resource, and the environment. It is more flexible than RBAC for complex scenarios:

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AccessRequest:
    user_id: int
    user_role: str
    user_department: str
    resource_owner_id: int
    resource_department: str
    resource_classification: str  # "public", "internal", "confidential"
    action: str
    request_time: datetime

def evaluate_access(request: AccessRequest) -> bool:
    """Evaluate an access request using attribute-based rules."""
    # Rule 1: Admins can access everything
    if request.user_role == "admin":
        return True

    # Rule 2: Anyone can read public resources
    if request.action == "read" and request.resource_classification == "public":
        return True

    # Rule 3: Users can read and edit their own resources
    if request.action in ("read", "edit") and request.user_id == request.resource_owner_id:
        return True

    # Rule 4: Internal resources are readable within the owning department
    if (request.resource_classification == "internal"
            and request.action == "read"
            and request.user_department == request.resource_department):
        return True

    # Rule 5: Confidential resources require explicit admin role
    if request.resource_classification == "confidential" and request.user_role != "admin":
        return False

    # Default deny
    return False

Principle: Default Deny

Always default to denying access. If no rule explicitly grants permission, the request should be rejected. This is the authorization equivalent of whitelisting.

Common Authorization Vulnerabilities

  1. Insecure Direct Object References (IDOR): A user accesses another user's data by changing an ID in the URL. Always verify that the authenticated user is authorized to access the specific resource.
  2. Privilege escalation: A regular user gains admin privileges by manipulating requests. Always validate roles server-side; never trust client-side role information.
  3. Missing function-level access control: Administrative endpoints lack authorization checks. Apply the require_permission pattern to every endpoint.
  4. Path traversal: A user accesses files outside the intended directory. Validate and sanitize file paths.
import os

def safe_file_path(base_dir: str, user_filename: str) -> str:
    """Resolve a filename safely, preventing path traversal."""
    # Normalize and resolve the path
    requested_path = os.path.realpath(os.path.join(base_dir, user_filename))
    base_path = os.path.realpath(base_dir)

    # Ensure the resolved path starts with the base directory
    if not requested_path.startswith(base_path + os.sep) and requested_path != base_path:
        raise ValueError("Path traversal detected")

    return requested_path
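The IDOR defense from item 1 boils down to an ownership check on every object lookup. A framework-agnostic sketch (Article, the ARTICLES lookup, and Forbidden are hypothetical stand-ins for your models and database layer):

```python
from dataclasses import dataclass

@dataclass
class Article:
    id: int
    owner_id: int
    body: str

# Hypothetical stand-in for a database lookup
ARTICLES = {1: Article(id=1, owner_id=7, body="draft")}

class Forbidden(Exception):
    pass

def get_own_article(article_id: int, current_user_id: int) -> Article:
    """Fetch an article only if the requester owns it."""
    article = ARTICLES.get(article_id)
    # Same error for "missing" and "not yours", so attackers cannot
    # probe which ids exist by changing the URL
    if article is None or article.owner_id != current_user_id:
        raise Forbidden(f"No access to article {article_id}")
    return article
```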

27.7 Secrets Management

Secrets—API keys, database passwords, encryption keys, OAuth client secrets—are the crown jewels of your application. A single exposed secret can compromise your entire system.

The Cardinal Rule: Never Hardcode Secrets

# NEVER DO THIS
DATABASE_URL = "postgresql://admin:SuperSecret123@db.example.com:5432/myapp"
API_KEY = "sk-abc123def456ghi789"

This seems obvious, but it is one of the most common security failures in AI-generated code. AI assistants routinely generate code with placeholder secrets like "your-api-key-here" or, worse, realistic-looking secrets that developers forget to replace.

Warning: AI and Hardcoded Secrets

Research published by GitGuardian (2024) found that AI coding assistants are a significant contributor to secret sprawl. When you prompt an AI for "a complete working example with database connection," the generated code almost always includes hardcoded credentials. Always review AI-generated code for embedded secrets before committing.

Environment Variables: The Minimum Standard

Environment variables separate secrets from code:

import os

DATABASE_URL = os.environ["DATABASE_URL"]  # Raises KeyError if not set
API_KEY = os.environ.get("API_KEY")  # Returns None if not set

# Better: with validation
def get_required_env(name: str) -> str:
    """Get a required environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise EnvironmentError(
            f"Required environment variable '{name}' is not set. "
            f"See .env.example for required variables."
        )
    return value

Use a .env file for local development (loaded with python-dotenv) and always add .env to .gitignore:

# .env — NEVER commit this file
DATABASE_URL=postgresql://user:pass@localhost:5432/mydb
API_KEY=sk-development-key
JWT_SECRET=local-dev-secret-at-least-256-bits-long
# .env.example — commit this file as documentation
# DATABASE_URL=postgresql://user:pass@host:port/dbname
# API_KEY=your-api-key
# JWT_SECRET=your-jwt-secret-minimum-256-bits

Secret Managers for Production

For production deployments, use a dedicated secret manager:

Service                Provider                  Key Feature
AWS Secrets Manager    Amazon                    Automatic rotation
Google Secret Manager  Google Cloud              IAM integration
Azure Key Vault        Microsoft                 HSM-backed keys
HashiCorp Vault        Open source / Enterprise  Multi-cloud support
Doppler                SaaS                      Developer-friendly

# Example: Loading secrets from AWS Secrets Manager
import boto3
import json

def get_secret(secret_name: str, region: str = "us-east-1") -> dict:
    """Retrieve a secret from AWS Secrets Manager."""
    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])

# Usage
secrets = get_secret("myapp/production")
database_url = secrets["DATABASE_URL"]

Automated Secret Scanning

Add automated secret scanning to your development workflow:

  1. Pre-commit hooks: Use detect-secrets or gitleaks to scan for secrets before they enter version control.
  2. CI/CD scanning: Run secret scanners in your CI pipeline.
  3. Repository scanning: Enable GitHub's secret scanning or use GitGuardian.
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

Emergency Response: What to Do When a Secret Is Exposed

  1. Immediately rotate the secret. Generate a new key, password, or token.
  2. Revoke the old secret. Disable the compromised credential in the service provider's console.
  3. Audit access logs. Determine whether the secret was used maliciously.
  4. Scrub git history. Use git filter-repo or BFG Repo-Cleaner to remove the secret from all commits.
  5. Post-incident review. Determine how the secret was exposed and implement controls to prevent recurrence.

27.8 Dependency Security and Supply Chain

Modern Python applications depend on dozens or hundreds of third-party packages. Each dependency is a potential attack vector.

The Supply Chain Threat

Supply chain attacks target the software you depend on, rather than your code directly:

  • Typosquatting: An attacker publishes reqeusts (a typo of requests) containing malware.
  • Dependency confusion: An attacker publishes a public package with the same name as your private internal package.
  • Compromised maintainers: An attacker gains access to a legitimate package's publishing credentials.
  • Malicious updates: A previously safe package publishes a new version containing malware.

Auditing Dependencies

Use automated tools to scan your dependencies for known vulnerabilities:

# pip-audit: Scans installed packages against the OSV vulnerability database
pip install pip-audit
pip-audit

# Safety: Scans requirements files against the Safety DB
pip install safety
safety check --file requirements.txt

# pip-audit with requirements file
pip-audit -r requirements.txt

Best Practice: Pin Your Dependencies

Always pin exact versions in production requirements files. Use pip freeze > requirements.txt or, better yet, use a lock file tool like pip-tools or Poetry that captures the entire dependency tree with exact versions and hashes.

# requirements.txt — pinned with hashes for integrity verification
flask==3.0.0 \
    --hash=sha256:abc123...
pydantic==2.5.0 \
    --hash=sha256:def456...

Automated Dependency Updates

Use Dependabot (GitHub), Renovate, or similar tools to receive automated pull requests when dependencies have security updates:

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 10
    labels:
      - "dependencies"
      - "security"

Evaluating Dependencies Before Adoption

Before adding a new dependency, evaluate it:

  1. Maintenance status: Is it actively maintained? When was the last release?
  2. Security track record: Check for past CVEs (Common Vulnerabilities and Exposures).
  3. Popularity and community: More eyes on the code means more chance of catching issues.
  4. Dependency tree: What does it depend on? A single import can pull in dozens of transitive dependencies.
  5. Minimal surface area: Prefer small, focused libraries over large frameworks when possible.

AI Pitfall: Phantom and Outdated Dependencies

AI assistants sometimes suggest packages that no longer exist, have been renamed, or have been deprecated due to security issues. Always verify that a suggested dependency exists on PyPI, is actively maintained, and has no known critical vulnerabilities before installing it.
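
Part of that verification can be scripted against PyPI's JSON API (https://pypi.org/pypi/<package>/json). The helper below evaluates already-fetched metadata so it runs offline; the field names (`urls[].upload_time_iso_8601`, `vulnerabilities`) reflect the API's current response shape, which you should confirm before depending on them:

```python
from datetime import datetime, timedelta, timezone

def assess_package(metadata: dict, max_age_days: int = 365) -> list[str]:
    """Flag maintenance and vulnerability concerns in PyPI JSON metadata."""
    concerns = []
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for f in metadata.get("urls", [])
    ]
    if not uploads:
        concerns.append("no release files found")
    elif datetime.now(timezone.utc) - max(uploads) > timedelta(days=max_age_days):
        concerns.append(f"latest release is over {max_age_days} days old")
    for vuln in metadata.get("vulnerabilities", []):
        concerns.append(f"known vulnerability: {vuln.get('id', 'unknown')}")
    return concerns

# Fabricated metadata for illustration; fetch the real document from
# https://pypi.org/pypi/<package>/json
stale = {"urls": [{"upload_time_iso_8601": "2019-01-01T00:00:00Z"}],
         "vulnerabilities": []}
print(assess_package(stale))  # ['latest release is over 365 days old']
```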


27.9 Security Testing with AI

AI assistants can be powerful allies in security testing when prompted effectively. They can help you think like an attacker, generate test cases, and review code for vulnerabilities.

AI-Assisted Code Review for Security

Prompt your AI assistant to perform a security-focused code review:

Prompt: "Review the following Python code for security vulnerabilities.
Check for:
1. SQL injection
2. XSS vulnerabilities
3. Hardcoded secrets
4. Missing input validation
5. Insecure cryptographic practices
6. Path traversal vulnerabilities
7. Insecure deserialization
8. Missing rate limiting
9. Improper error handling that leaks information
10. Insecure default configurations

For each issue found, explain:
- The vulnerability type and severity (Critical/High/Medium/Low)
- How an attacker could exploit it
- The specific fix with corrected code

[paste your code here]"

Generating Security Test Cases

AI can generate comprehensive security test suites:

Prompt: "Generate pytest security tests for a user registration endpoint
that accepts JSON with fields: username, email, password.
Include tests for:
- SQL injection payloads in each field
- XSS payloads in each field
- Buffer overflow (extremely long strings)
- Unicode edge cases
- Null bytes and control characters
- Missing required fields
- Extra unexpected fields
- Type confusion (sending int where string expected)
Use parameterized tests for efficiency."

The AI might generate tests like:

import pytest
from typing import Any

SQL_INJECTION_PAYLOADS = [
    "' OR '1'='1",
    "'; DROP TABLE users; --",
    "' UNION SELECT * FROM users --",
    "1; UPDATE users SET role='admin' WHERE '1'='1",
    "' OR 1=1 --",
]

XSS_PAYLOADS = [
    "<script>alert('xss')</script>",
    '<img src=x onerror=alert("xss")>',
    "javascript:alert('xss')",
    '<svg/onload=alert("xss")>',
    "'\"><script>alert('xss')</script>",
]

OVERFLOW_PAYLOADS = [
    "A" * 10_000,
    "A" * 100_000,
    "A" * 1_000_000,
]

@pytest.mark.parametrize("payload", SQL_INJECTION_PAYLOADS)
def test_registration_resists_sql_injection(client, payload: str):
    """Ensure SQL injection payloads do not cause errors or data leakage."""
    response = client.post('/register', json={
        'username': payload,
        'email': 'test@example.com',
        'password': 'SecureP@ssw0rd123'
    })
    assert response.status_code in (400, 422)  # Should be rejected by validation
    body = response.get_data(as_text=True).lower()
    # The rejection message must not leak database internals
    assert 'syntax error' not in body
    assert 'traceback' not in body

@pytest.mark.parametrize("payload", XSS_PAYLOADS)
def test_registration_resists_xss(client, payload: str):
    """Ensure XSS payloads are rejected or sanitized."""
    response = client.post('/register', json={
        'username': payload,  # username is stored and echoed back, unlike password
        'email': 'test@example.com',
        'password': 'SecureP@ssw0rd123'
    })
    # The payload should either be rejected or sanitized
    if response.status_code in (200, 201):
        data = response.get_json()
        assert '<script>' not in str(data)
        assert 'onerror' not in str(data)

Penetration Testing Prompts

Use AI to plan and structure penetration tests:

Prompt: "I am performing an authorized security assessment of my Flask
web application. Help me create a penetration testing checklist covering:
1. Authentication bypass attempts
2. Authorization bypass (IDOR, privilege escalation)
3. Input validation bypass
4. Session management weaknesses
5. API security issues
6. Information disclosure in error messages
7. HTTP header security
8. CORS misconfiguration
9. File upload vulnerabilities
10. Business logic flaws

For each category, list specific tests to perform and the tools or
techniques to use."

Ethical Note: Authorized Testing Only

AI-generated penetration testing guidance should only be used on systems you own or have explicit written authorization to test. Unauthorized security testing is illegal in most jurisdictions. Always ensure you have proper authorization before conducting any security assessment.

Static Analysis Integration

Complement AI code review with automated static analysis tools:

# Bandit: Python security linter
pip install bandit
bandit -r myproject/ -f json -o bandit_report.json

# Semgrep: Pattern-based static analysis
pip install semgrep
semgrep --config=p/python myproject/

# Combined pipeline
bandit -r myproject/ -ll && semgrep --config=p/owasp-top-ten myproject/

Integrate these tools into your CI/CD pipeline so that every commit is automatically scanned:

# GitHub Actions example
name: Security Scan
on: [push, pull_request]
jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install bandit safety pip-audit
      - run: bandit -r src/ -ll
      - run: safety check --file requirements.txt
      - run: pip-audit -r requirements.txt

27.10 Building a Security Checklist

A security checklist transforms abstract principles into concrete, verifiable actions. Use this checklist for every vibe-coded project, adapting it to your specific context.

Pre-Development Checklist

  • [ ] Define the threat model: What data do we handle? Who are our adversaries?
  • [ ] Classify data sensitivity: public, internal, confidential, restricted.
  • [ ] Identify compliance requirements: GDPR, HIPAA, PCI-DSS, SOC 2.
  • [ ] Choose authentication and authorization patterns.
  • [ ] Plan secrets management strategy.
  • [ ] Set up .gitignore to exclude .env, credentials, and key files.
  • [ ] Install pre-commit hooks for secret scanning.

Input Validation Checklist

  • [ ] All user inputs are validated using whitelist patterns.
  • [ ] Schema validation (Pydantic or equivalent) is applied to all API request bodies.
  • [ ] File uploads are validated for type, size, and content (not just extension).
  • [ ] Numeric inputs have range constraints.
  • [ ] String inputs have length constraints.
  • [ ] Unexpected fields in request bodies are rejected.
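
Schema libraries like Pydantic express these rules declaratively. As a framework-free illustration of the same whitelist-and-constrain logic, here is a stdlib sketch (the field names and limits are illustrative):

```python
import re

USERNAME_RE = re.compile(r"^[A-Za-z0-9_]{3,30}$")  # whitelist pattern
ALLOWED_FIELDS = {"username", "email", "password"}

def validate_registration(payload: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    extra = set(payload) - ALLOWED_FIELDS
    if extra:
        errors.append(f"unexpected fields: {sorted(extra)}")  # reject, don't ignore
    missing = ALLOWED_FIELDS - set(payload)
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    username = payload.get("username", "")
    if not isinstance(username, str) or not USERNAME_RE.fullmatch(username):
        errors.append("username must be 3-30 chars: letters, digits, underscore")
    password = payload.get("password", "")
    if not isinstance(password, str) or not (12 <= len(password) <= 128):
        errors.append("password must be 12-128 characters")
    return errors

print(validate_registration({"username": "alice_1", "email": "a@example.com",
                             "password": "correct horse battery"}))  # []
```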

Database Security Checklist

  • [ ] All SQL queries use parameterized statements.
  • [ ] No string interpolation or concatenation in SQL.
  • [ ] ORM raw-SQL fallbacks use parameterized queries.
  • [ ] Database user has minimum necessary privileges (no GRANT ALL).
  • [ ] Database connections use TLS encryption.
  • [ ] Database credentials are stored in secret manager, not in code.
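
The first two items are worth showing concretely. A minimal sqlite3 sketch (stdlib; the same placeholder idea applies to any DB-API driver, though the placeholder syntax varies):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))

malicious = "alice' OR '1'='1"

# UNSAFE (never do this): f"SELECT ... WHERE username = '{malicious}'"
# would match every row. The parameterized query treats the payload as data:
rows = conn.execute(
    "SELECT id, username FROM users WHERE username = ?", (malicious,)
).fetchall()
print(rows)  # [] -- no user is literally named "alice' OR '1'='1"
```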

Authentication Checklist

  • [ ] Passwords are hashed with bcrypt (rounds >= 12), Argon2, or scrypt.
  • [ ] Password complexity requirements are enforced (length >= 12, mixed case, digit, special character).
  • [ ] Login endpoint has rate limiting (e.g., 5 attempts per minute).
  • [ ] Account lockout is implemented after repeated failures.
  • [ ] JWTs use strong secrets (>= 256 bits), explicit algorithms, and short expiration.
  • [ ] Refresh tokens are stored securely and can be revoked.
  • [ ] Session cookies use HttpOnly, Secure, SameSite attributes.
  • [ ] Multi-factor authentication is available for sensitive operations.
  • [ ] Password reset flow uses time-limited, single-use tokens.
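
For the hashing item, the stdlib scrypt function is enough to sketch the pattern (bcrypt and Argon2 require third-party packages; the parameters below are illustrative, not a tuning recommendation):

```python
import hashlib
import hmac
import os

# Illustrative cost parameters; dklen is the digest length in bytes.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, maxmem=2**26, dklen=32)

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Hash with a fresh random salt; store both salt and digest."""
    salt = os.urandom(16)
    return salt, hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

salt, digest = hash_password("SecureP@ssw0rd123")
print(verify_password("SecureP@ssw0rd123", salt, digest))  # True
print(verify_password("wrong-password", salt, digest))     # False
```

Note the constant-time comparison: comparing digests with `==` can leak timing information.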

Authorization Checklist

  • [ ] Every endpoint has explicit authorization checks.
  • [ ] Default behavior is "deny" (no access without explicit grant).
  • [ ] IDOR protection: users cannot access other users' resources by changing IDs.
  • [ ] Role-based or attribute-based access control is consistently applied.
  • [ ] Administrative functions are protected by separate, stricter authorization.
  • [ ] File path access is protected against traversal attacks.
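
The last item can be sketched with pathlib: resolve the candidate path, then verify it is still inside the permitted root (the upload directory below is hypothetical):

```python
from pathlib import Path

UPLOAD_ROOT = Path("/var/app/uploads")  # hypothetical upload directory

def safe_resolve(root: Path, user_supplied: str) -> Path:
    """Resolve a user-supplied filename, rejecting anything that escapes root."""
    candidate = (root / user_supplied).resolve()
    if not candidate.is_relative_to(root.resolve()):  # Python 3.9+
        raise ValueError("path traversal attempt rejected")
    return candidate

print(safe_resolve(UPLOAD_ROOT, "report.pdf"))
try:
    safe_resolve(UPLOAD_ROOT, "../../etc/passwd")
except ValueError as exc:
    print(exc)  # path traversal attempt rejected
```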

Secrets Management Checklist

  • [ ] No hardcoded secrets in source code.
  • [ ] .env files are in .gitignore.
  • [ ] .env.example documents required variables without real values.
  • [ ] Production secrets are stored in a dedicated secret manager.
  • [ ] Secrets are rotated on a regular schedule.
  • [ ] Pre-commit hooks scan for accidentally committed secrets.

HTTP Security Checklist

  • [ ] Content-Security-Policy header is configured.
  • [ ] Strict-Transport-Security (HSTS) header is set.
  • [ ] X-Content-Type-Options: nosniff is set.
  • [ ] X-Frame-Options: DENY is set.
  • [ ] Referrer-Policy is configured.
  • [ ] CORS is configured with specific origins (not * in production).
  • [ ] All traffic uses HTTPS.
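
One way to satisfy the header items is a single table of headers applied to every response by middleware; in Flask this would live in an after_request hook. A framework-neutral sketch (the values are reasonable starting points, not a drop-in policy for your application):

```python
# Header values mirror the checklist above; tune them per application.
SECURITY_HEADERS = {
    "Content-Security-Policy": "default-src 'self'",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "strict-origin-when-cross-origin",
}

def apply_security_headers(response_headers: dict) -> dict:
    """Merge security headers into a response's headers; existing values win."""
    for name, value in SECURITY_HEADERS.items():
        response_headers.setdefault(name, value)
    return response_headers

print(apply_security_headers({"Content-Type": "text/html"}))
```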

Dependency Security Checklist

  • [ ] Dependencies are pinned to exact versions.
  • [ ] Dependency audit tools (pip-audit, Safety) are run regularly.
  • [ ] Dependabot or Renovate is configured for automated updates.
  • [ ] New dependencies are evaluated before adoption.
  • [ ] Lock files are committed to version control.

Testing and Monitoring Checklist

  • [ ] Security-focused unit tests cover authentication, authorization, and input validation.
  • [ ] SQL injection test cases are included.
  • [ ] XSS test cases are included.
  • [ ] Static analysis (Bandit, Semgrep) runs in CI/CD.
  • [ ] Dependency scanning runs in CI/CD.
  • [ ] Error messages do not leak sensitive information (stack traces, database errors, file paths).
  • [ ] Logging captures security events (failed logins, authorization failures) without logging sensitive data (passwords, tokens).

AI-Specific Security Checklist

  • [ ] All AI-generated code has been reviewed for hardcoded secrets.
  • [ ] All AI-generated database queries use parameterized statements.
  • [ ] All AI-generated templates use auto-escaping.
  • [ ] AI-generated authentication code has been compared against OWASP best practices.
  • [ ] AI-generated dependencies have been verified to exist and be actively maintained.
  • [ ] AI-generated error handling does not leak sensitive information.
  • [ ] AI-generated file operations include path traversal protection.

Best Practice: The Checklist as a Living Document

This checklist should be customized for your project and updated as new threats emerge. Store it in your repository (e.g., SECURITY_CHECKLIST.md) and require developers to review it before opening pull requests. Automate as many checks as possible through CI/CD.


Chapter Summary

Security-first development is not a phase—it is a mindset that permeates every aspect of the software development lifecycle. For vibe coders, who rely on AI to generate significant portions of their code, the stakes are even higher. AI assistants produce code that works, but "working" and "secure" are not synonyms.

In this chapter, we covered the critical security domains that every vibe coder must master:

  1. Input validation is the first line of defense. Whitelist-validate all inputs using schema-based tools like Pydantic.
  2. SQL injection is prevented entirely by using parameterized queries—never string interpolation in SQL.
  3. XSS is prevented by output encoding (auto-escaping in templates) and reinforced by Content Security Policy headers.
  4. Authentication requires strong password hashing (bcrypt/Argon2), careful JWT handling, and session security hardening.
  5. Authorization should follow the principle of least privilege, default to deny, and protect against IDOR and privilege escalation.
  6. Secrets management means never hardcoding credentials, using environment variables as a minimum, and secret managers for production.
  7. Dependency security demands pinned versions, automated auditing, and careful evaluation of new dependencies.
  8. Security testing leverages AI for code review, test generation, and penetration testing planning, complemented by automated static analysis.
  9. The security checklist transforms principles into verifiable actions that integrate into your development workflow.

The three pillars—never trust, always verify; least privilege; defense in depth—should guide every decision you make. When AI generates code that looks correct but lacks security controls, it is your responsibility to recognize the gap and close it. Security is not the AI's job. It is yours.

What's Next: In Chapter 28, we will explore Performance Optimization, where we apply AI-assisted profiling, algorithm analysis, and optimization techniques to make your applications fast without sacrificing the security foundations built in this chapter.


Key Terms

Trust boundary: A point where data crosses from an untrusted source to a trusted context.
Whitelist validation: Accepting only inputs that match known-good patterns.
Parameterized query: A SQL query that separates code from data, preventing injection.
Output encoding: Converting special characters to safe representations before rendering.
Content Security Policy (CSP): A browser header that restricts what resources a page can load.
bcrypt: A slow, salted hash function designed for password storage.
JWT (JSON Web Token): A compact, signed token format for stateless authentication.
RBAC: Role-Based Access Control; permissions are assigned to roles, and roles to users.
ABAC: Attribute-Based Access Control; decisions are based on user, resource, and environment attributes.
IDOR: Insecure Direct Object Reference; accessing other users' data by changing resource identifiers.
Supply chain attack: An attack that targets the software dependencies you rely on.
Secret sprawl: The proliferation of hardcoded secrets across codebases and configuration files.
Defense in depth: Layering multiple security controls so no single failure is catastrophic.