Chapter 24: Software Architecture with AI Assistance

"Architecture is about the important stuff. Whatever that is." -- Ralph Johnson


Learning Objectives

After completing this chapter, you will be able to:

  • Remember the fundamental vocabulary of software architecture, including patterns, principles, and quality attributes that shape system design
  • Understand how architectural decisions propagate through a system and why early structural choices constrain future possibilities
  • Apply SOLID principles, dependency injection, and event-driven patterns to Python codebases using AI-assisted workflows
  • Analyze the trade-offs between monolithic, microservice, and serverless architectures for a given set of requirements
  • Evaluate architectural proposals generated by AI assistants, identifying hidden assumptions, missing quality attributes, and scalability bottlenecks
  • Create Architecture Decision Records (ADRs) and system design documents using AI as a collaborative thinking partner

Introduction

In Parts I through III of this book, you learned to use AI assistants to write functions, build CLI tools, create web applications, design APIs, and work with databases. Each of those chapters focused on building something -- a tangible artifact you could run, test, and deploy.

This chapter asks a different question: how do those somethings fit together?

Software architecture is the discipline of organizing code into a coherent system. It is the difference between a pile of working functions and a maintainable, scalable, extensible application. It is the reason two teams can build identical features with identical technologies, yet one team's codebase becomes a joy to maintain while the other's becomes a nightmare within six months.

If individual coding is like writing sentences, architecture is like outlining the structure of a book. No amount of beautifully crafted sentences will save a book with no coherent structure, and no amount of clever functions will save a system with no coherent architecture.

The good news for vibe coders is that AI assistants are remarkably effective architecture partners. They have been trained on millions of codebases and can recall architectural patterns, articulate trade-offs, and generate structural proposals on demand. The challenge -- and this chapter's focus -- is learning how to direct that capability. An AI can propose an architecture, but only you understand your team, your constraints, your users, and your business context well enough to evaluate whether the proposal is right.

By the end of this chapter, you will know how to think architecturally, how to have productive system design conversations with AI, and how to document the decisions that shape your system's future.


24.1 Architecture Thinking for Vibe Coders

What Is Software Architecture?

Software architecture is the set of high-level structural decisions about a system. It encompasses which components exist, how they communicate, where data lives, and what principles govern their organization. More precisely, architecture is the set of decisions that are expensive to change later.

Consider the difference between these two decisions:

  1. "Should this variable be named user_count or num_users?"
  2. "Should user data live in a relational database or a document store?"

The first decision is trivially reversible. A find-and-replace operation takes seconds. The second decision, once implemented and deployed with production data, might take weeks or months to reverse. That second decision is architectural.

Key Concept

Architecture is not about getting every detail right. It is about getting the expensive decisions right -- the ones that are difficult, costly, or impossible to change once the system is built and running in production.

The Four Dimensions of Architecture

Architectural thinking operates across four dimensions:

| Dimension | Question It Answers | Example |
| --- | --- | --- |
| Structure | How is the code organized? | "We use a layered architecture with presentation, business logic, and data access layers." |
| Communication | How do components talk to each other? | "Services communicate via REST APIs and asynchronous message queues." |
| Data | Where does data live and how does it flow? | "Each microservice owns its database; cross-service data is synchronized via events." |
| Deployment | How is the system built, tested, and released? | "Each service is containerized and deployed independently via CI/CD pipelines." |

When you evaluate an architectural proposal from an AI assistant, ensure it addresses all four dimensions. A common failure mode is focusing exclusively on structure while ignoring communication patterns, data flow, or deployment constraints.

Why Architecture Matters More for AI-Assisted Development

Paradoxically, AI-assisted development makes architecture more important, not less. Here is why:

AI accelerates code generation, which accelerates the accumulation of technical debt. When you can generate hundreds of lines of code per hour, you can also generate hundreds of lines of poorly structured code per hour. Without architectural guardrails, the speed of AI-assisted development becomes a liability.

AI lacks the context of your system's history. An AI assistant does not inherently know that your team tried microservices two years ago and reverted to a monolith because operational complexity was too high. It does not know that your deployment pipeline cannot handle more than five services. It does not know that your database administrator is leaving next month. Architecture decisions require this kind of organizational context.

AI generates code that fits patterns, not necessarily your pattern. If you ask an AI to add a feature without specifying your architectural conventions, it will invent its own. Over time, this results in a codebase with three different approaches to error handling, two different database access patterns, and no consistent module structure.

Warning

The fastest way to create an unmaintainable codebase is to generate code rapidly without a coherent architecture. AI makes code generation fast; architecture makes code generation safe.

The Architect's Mindset

Effective architectural thinking requires a specific mindset that differs from day-to-day coding:

  1. Think in boundaries. Instead of asking "how does this function work?", ask "where does this responsibility begin and end?"
  2. Think in trade-offs. Every architectural decision involves giving something up to gain something else. There are no free lunches.
  3. Think in time. A good architecture is not the one that is easiest to build today, but the one that is easiest to change six months from now.
  4. Think in failure. Assume components will fail, networks will drop, databases will slow down, and users will do unexpected things. Architecture is your plan for when things go wrong.
  5. Think in teams. Architecture shapes how teams work. A monolith means everyone works in the same codebase. Microservices mean teams can work independently but must coordinate on interfaces.

24.2 System Design Conversations with AI

AI as an Architecture Partner

One of the most powerful uses of AI in software development is not code generation but design conversation. AI assistants can serve as tireless architecture partners who never get impatient, never judge a "stupid question," and can recall an enormous breadth of patterns and precedents.

The key to productive design conversations is understanding that you are not asking the AI to make the architecture decision. You are asking it to help you think through the decision by surfacing options, articulating trade-offs, and stress-testing your assumptions.

The RADIO Framework for Design Conversations

When starting a system design conversation with an AI, use the RADIO framework to structure your prompt:

  • Requirements: What must the system do? What are the functional and non-functional requirements?
  • Actors: Who interacts with the system? Users, administrators, external services, internal systems?
  • Data: What data does the system manage? How much? How fast does it change? What are the consistency requirements?
  • Interfaces: What are the external interfaces? APIs, UIs, file imports, third-party integrations?
  • Operational constraints: What are the deployment environment, budget, team size, timeline, and existing technology constraints?

Here is an example of a well-structured architecture prompt:

I'm designing a task management system for a team of 50 engineers.

Requirements:
- Users can create, assign, and track tasks
- Tasks have statuses, priorities, and due dates
- Teams need dashboards showing task progress
- The system should support 500 concurrent users
- Response time under 200ms for page loads

Actors:
- Engineers (create and update tasks)
- Team leads (view dashboards, assign tasks)
- Admins (manage users and teams)

Data:
- Approximately 10,000 active tasks at any time
- Historical data retained for 2 years
- Task updates happen ~100 times per minute during peak hours

Interfaces:
- Web UI (primary interface)
- REST API (for CI/CD integrations)
- Email notifications

Operational constraints:
- Small team (3 developers)
- Deploying to AWS
- Budget: $500/month for infrastructure
- Must be production-ready in 3 months

What architectural approach would you recommend, and why?

Best Practice

Always include operational constraints in your architecture prompts. An AI will default to suggesting architecturally "ideal" solutions that may be operationally infeasible for your team. A microservice architecture is elegant, but if you have three developers and three months, a well-structured monolith is almost certainly the better choice.

Iterative Design Conversations

The first response from an AI is never the final architecture. Treat it as a starting point and iterate:

Round 1: Initial proposal. Ask for a high-level architecture recommendation with justification.

Round 2: Stress testing. Challenge the proposal: "What happens if traffic spikes to 10x normal? What if the database goes down? What if we need to add real-time collaboration later?"

Round 3: Alternatives. Ask for the second-best option and compare: "What would this look like as a serverless architecture instead? What would we gain and lose?"

Round 4: Detail. Drill into specific components: "How should we structure the notification service? What queue system should we use? Show me the API contract between the task service and the dashboard."

Round 5: Documentation. Ask the AI to summarize the final architecture in a format suitable for your team: an Architecture Decision Record, a C4 diagram description, or a technical design document.

Evaluating AI Architecture Proposals

When an AI proposes an architecture, evaluate it against these criteria:

  1. Does it match your constraints? A beautiful architecture that requires five engineers when you have two is useless.
  2. Does it address all quality attributes? Check for performance, security, scalability, maintainability, reliability, and observability.
  3. Is it appropriately simple? The best architecture is the simplest one that meets all requirements. Complexity is a cost, not a feature.
  4. Does it account for what you do not know? Good architectures isolate uncertainty. If you are unsure whether you will need real-time features, the architecture should make it easy to add them later without rewriting everything.
  5. Can your team operate it? An architecture is not just built once -- it must be monitored, debugged, updated, and maintained indefinitely.

Real-World Application

Senior engineers at technology companies report that AI-assisted design conversations often surface considerations they had not thought of. The value is not that the AI knows better, but that the process of articulating requirements and evaluating proposals forces you to think more rigorously than you would in your own head.


24.3 Architectural Patterns: Monolith, Microservices, Serverless

The Monolithic Architecture

A monolith is a single deployable unit that contains all of the application's functionality. Despite its reputation in some circles, the monolith is a perfectly valid architectural pattern and is often the best choice for small-to-medium applications.

Structure:

my_application/
    api/
        routes.py
        middleware.py
    services/
        task_service.py
        user_service.py
        notification_service.py
    models/
        task.py
        user.py
    data/
        repositories.py
        database.py
    config/
        settings.py
    main.py
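To make this layout concrete, here is a minimal sketch of how main.py might wire the layers together in a single process. The class names mirror the directory listing above but are simplified stand-ins, not a definitive implementation:

```python
# main.py -- the monolith's composition root (illustrative sketch)

class Database:
    """Stand-in for data/database.py."""

    def __init__(self, url: str):
        self.url = url


class TaskRepository:
    """Stand-in for data/repositories.py."""

    def __init__(self, db: Database):
        self.db = db


class TaskService:
    """Stand-in for services/task_service.py."""

    def __init__(self, repository: TaskRepository):
        self.repository = repository


def build_app() -> TaskService:
    # Everything is wired in one process: one database connection,
    # one object graph, one deployable unit
    db = Database("sqlite:///tasks.db")
    return TaskService(TaskRepository(db))
```

Because the whole object graph lives in one process, a single function can assemble it; contrast this with microservices, where each service owns its own composition root.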

Advantages:

  • Simple to develop, test, and deploy
  • No network latency between components
  • Straightforward transaction management (one database, ACID transactions)
  • Easy to debug (single process, single log stream)
  • Low operational overhead (one thing to monitor, one thing to scale)

Disadvantages:

  • All components must be deployed together
  • A failure in one component can bring down the entire application
  • Scaling requires scaling everything, even if only one component is under load
  • Large monoliths can become difficult to understand and modify
  • Technology lock-in (one language, one framework, one database)

When to use it:

  • Small teams (fewer than 10 developers)
  • New products where requirements are still evolving
  • Applications with moderate scale (thousands, not millions, of users)
  • When time-to-market is critical

Intuition

Think of a monolith like a single-family home. Everything is under one roof, it is easy to navigate, and maintenance is straightforward. It works perfectly for a small family. Problems arise only when you try to house a hundred people in it.

The Microservices Architecture

Microservices decompose an application into small, independently deployable services, each responsible for a specific business capability.

Structure:

task-service/         (owns task data and logic)
user-service/         (owns user data and authentication)
notification-service/ (owns email/SMS/push notifications)
dashboard-service/    (aggregates data for dashboards)
api-gateway/          (routes requests to appropriate services)

Advantages:

  • Independent deployment (update one service without redeploying others)
  • Independent scaling (scale the notification service without scaling user management)
  • Technology flexibility (each service can use different languages or databases)
  • Fault isolation (one service crashing does not necessarily take down others)
  • Team autonomy (each team owns a service end-to-end)

Disadvantages:

  • Significant operational complexity (many things to deploy, monitor, and debug)
  • Network latency between services
  • Distributed transaction management is hard
  • Data consistency across services requires careful design
  • Requires robust CI/CD, service discovery, and observability infrastructure
  • Testing the full system is more complex

When to use it:

  • Large teams (more than 20 developers) that need to work independently
  • Systems with clearly separable business domains
  • When different components have drastically different scaling needs
  • Organizations with mature DevOps practices

Warning

Microservices are not a default best practice. They are a specific solution to a specific problem: enabling large teams to work independently on a large system. If you do not have that problem, microservices will add complexity without adding value. Many successful companies (Basecamp, Stack Overflow, Shopify) run primarily on monolithic architectures.

The Serverless Architecture

Serverless architecture delegates infrastructure management to a cloud provider. Instead of deploying applications to servers, you deploy individual functions that the cloud provider executes on demand.

Structure:

functions/
    create_task.py      (triggered by API Gateway POST /tasks)
    get_tasks.py        (triggered by API Gateway GET /tasks)
    send_notification.py (triggered by SQS message)
    generate_report.py  (triggered by CloudWatch schedule)
infrastructure/
    serverless.yml      (infrastructure as code)

Advantages:

  • No server management (the cloud provider handles scaling, patching, and availability)
  • Pay-per-execution pricing (no cost when idle)
  • Automatic scaling from zero to thousands of concurrent executions
  • Reduced operational burden

Disadvantages:

  • Cold start latency (functions that have not run recently take longer to start)
  • Vendor lock-in (your code is tightly coupled to the cloud provider's services)
  • Limited execution duration (typically 15 minutes maximum)
  • Debugging and local development can be difficult
  • Complex applications become a web of functions that is hard to reason about
  • State management is challenging (functions are stateless by design)

When to use it:

  • Event-driven workloads (processing uploads, sending notifications, handling webhooks)
  • Sporadic or unpredictable traffic patterns
  • Startups that want to minimize infrastructure costs
  • Background processing tasks (data pipelines, scheduled reports)
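As a sketch of what one of these functions might contain, here is a Lambda-style handler for creating a task. The event and response shapes follow API Gateway's proxy-integration convention; persistence and AWS wiring are omitted, and the field names are illustrative assumptions:

```python
import json


def handler(event: dict, context: object) -> dict:
    """Illustrative Lambda-style handler for POST /tasks.

    The event/response shapes follow API Gateway's proxy-integration
    convention; persistence is deliberately omitted.
    """
    body = json.loads(event.get("body") or "{}")
    title = body.get("title")
    if not title:
        return {"statusCode": 400,
                "body": json.dumps({"error": "title is required"})}
    task = {"title": title, "status": "pending"}
    return {"statusCode": 201, "body": json.dumps(task)}
```

Note how small the unit of deployment is: one request shape in, one response shape out, with all state pushed to external services.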

Comparing the Three Patterns

| Criterion | Monolith | Microservices | Serverless |
| --- | --- | --- | --- |
| Team size | 1-10 | 10+ | 1-5 |
| Deployment complexity | Low | High | Medium |
| Operational overhead | Low | High | Low |
| Scaling granularity | Coarse | Fine | Very fine |
| Development speed (initial) | Fast | Slow | Medium |
| Development speed (at scale) | Slows over time | Consistent | Varies |
| Technology flexibility | Low | High | Medium |
| Cost at low traffic | Fixed | Fixed (higher) | Near-zero |
| Cost at high traffic | Medium | Varies | Can be high |
| Debugging difficulty | Easy | Hard | Medium-Hard |

Asking AI to Compare Patterns

A powerful use of AI is asking it to evaluate patterns against your specific requirements:

Given these requirements:
- 3-person development team
- Expected 10,000 daily active users, growing to 100,000 in 12 months
- E-commerce platform with product catalog, shopping cart, payments, and order tracking
- Must integrate with Stripe for payments and SendGrid for emails
- Budget: $2,000/month for infrastructure
- Team has experience with Python and PostgreSQL

Compare monolith vs. microservices vs. serverless for this project.
For each, describe:
1. How the system would be structured
2. The main risks
3. When we would outgrow it
4. Estimated monthly infrastructure cost

This kind of structured comparison prompt yields far more useful results than simply asking "what architecture should I use?"


24.4 SOLID Principles in Practice

The SOLID principles are five design guidelines that, when followed, produce code that is easier to maintain, extend, and test. They were originally formulated for object-oriented programming, but their underlying insights apply broadly.

Single Responsibility Principle (SRP)

A class should have only one reason to change.

This means each class (or module, or function) should do one thing and do it well. When a class has multiple responsibilities, changes to one responsibility risk breaking the other.

import json

# Violation: This class handles both task logic AND persistence
class TaskManager:
    def create_task(self, title: str) -> dict:
        task = {"title": title, "status": "pending"}
        with open("tasks.json", "w") as f:
            json.dump(task, f)
        return task

# Fixed: Separate responsibilities
class Task:
    def __init__(self, title: str, status: str = "pending"):
        self.title = title
        self.status = status

class TaskRepository:
    def save(self, task: Task) -> None:
        # Handles persistence only
        ...

Open/Closed Principle (OCP)

Software entities should be open for extension but closed for modification.

You should be able to add new behavior without changing existing code. This is typically achieved through abstraction and polymorphism.

from abc import ABC, abstractmethod

class NotificationChannel(ABC):
    @abstractmethod
    def send(self, message: str, recipient: str) -> None:
        ...

class EmailNotification(NotificationChannel):
    def send(self, message: str, recipient: str) -> None:
        # Send via email
        ...

class SlackNotification(NotificationChannel):
    def send(self, message: str, recipient: str) -> None:
        # Send via Slack
        ...

# Adding SMS support requires NO changes to existing code --
# just a new class:
class SMSNotification(NotificationChannel):
    def send(self, message: str, recipient: str) -> None:
        # Send via SMS
        ...
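Because callers depend only on the NotificationChannel abstraction, new channels plug in without touching dispatch code. A sketch (the in-memory channel here is illustrative):

```python
from abc import ABC, abstractmethod


class NotificationChannel(ABC):
    @abstractmethod
    def send(self, message: str, recipient: str) -> None:
        ...


class LogNotification(NotificationChannel):
    """Illustrative channel that records messages in memory."""

    def __init__(self):
        self.sent: list[tuple[str, str]] = []

    def send(self, message: str, recipient: str) -> None:
        self.sent.append((recipient, message))


def broadcast(channels: list[NotificationChannel],
              message: str, recipient: str) -> None:
    # Works with any current or future channel -- no modification needed
    for channel in channels:
        channel.send(message, recipient)
```

Adding SMS support means adding one class and passing it to broadcast; the dispatch code stays closed for modification.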

Liskov Substitution Principle (LSP)

Objects of a superclass should be replaceable with objects of a subclass without breaking the program.

If your code works with a base type, it should work correctly with any subtype.

# Violation: Square changes the behavior contract of Rectangle
class Rectangle:
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def set_width(self, width: float) -> None:
        self.width = width

    def area(self) -> float:
        return self.width * self.height

class Square(Rectangle):
    def __init__(self, side: float):
        super().__init__(side, side)

    def set_width(self, width: float) -> None:
        # This setter violates LSP because it changes both dimensions,
        # which is not expected behavior for a Rectangle
        self.width = width
        self.height = width

Interface Segregation Principle (ISP)

Clients should not be forced to depend on interfaces they do not use.

Instead of one large interface, create several small, focused ones.

from abc import ABC, abstractmethod

# Violation: One large interface forces all implementations
# to provide methods they may not need
class Worker(ABC):
    @abstractmethod
    def code(self) -> None: ...
    @abstractmethod
    def review(self) -> None: ...
    @abstractmethod
    def manage(self) -> None: ...

# Fixed: Segregated interfaces
class Coder(ABC):
    @abstractmethod
    def code(self) -> None: ...

class Reviewer(ABC):
    @abstractmethod
    def review(self) -> None: ...

class Manager(ABC):
    @abstractmethod
    def manage(self) -> None: ...

Dependency Inversion Principle (DIP)

High-level modules should not depend on low-level modules. Both should depend on abstractions.

This is the most architecturally significant of the SOLID principles, and we will cover it in depth in Section 24.6.

# Violation: High-level TaskService depends on low-level SQLiteDatabase
class TaskService:
    def __init__(self):
        self.db = SQLiteDatabase()  # Hard-coded dependency

# Fixed: Depend on abstraction
class TaskService:
    def __init__(self, repository: TaskRepository):
        self.repository = repository  # Injected abstraction

Key Concept

SOLID principles are not rigid rules to follow dogmatically. They are guidelines that help you reason about code structure. The goal is not "SOLID-compliant code" but code that is easy to understand, test, and change. Sometimes violating a SOLID principle is the right pragmatic choice -- but you should do so consciously, not accidentally.

See code/example-01-solid-principles.py for a complete, runnable demonstration of all five SOLID principles applied to a realistic task management domain.


24.5 Module Boundaries and Interfaces

What Is a Module Boundary?

A module boundary is the line between "inside" and "outside" of a component. Inside the boundary, implementation details can change freely. Outside the boundary, other components interact only through a defined interface.

Good module boundaries are the foundation of maintainable software. They enable:

  • Independent development: Teams can work on different modules simultaneously
  • Independent testing: Modules can be tested in isolation using mock implementations of their dependencies
  • Independent deployment: In some architectures, modules can be deployed separately
  • Controlled change propagation: Changes inside a module do not ripple through the rest of the system

Identifying Natural Boundaries

How do you decide where to draw module boundaries? Look for these signals:

Business capability boundaries. If your application handles users, orders, and inventory, those are three natural modules. Each corresponds to a distinct business concept with its own data and rules.

Rate-of-change boundaries. Components that change together should live together. If the pricing logic changes every week but the user authentication logic changes once a year, they belong in different modules.

Team boundaries. If different teams own different parts of the system, module boundaries should align with team boundaries. This is Conway's Law in practice: "Organizations design systems that mirror their own communication structure."

Technology boundaries. If one part of your system uses machine learning and another handles real-time communication, the different technology stacks suggest different modules.

Defining Clean Interfaces

An interface is the contract between a module and its consumers. A clean interface:

  1. Exposes what, not how. The interface says what you can do ("create a task," "find tasks by user"), not how it is done internally.
  2. Uses domain language. Interface names should reflect business concepts, not implementation details. TaskRepository.find_by_assignee() is better than TaskRepository.query_sql_where_user_id().
  3. Minimizes surface area. Expose only what consumers need. Every public method is a commitment you must maintain.
  4. Is stable. The interface should change much less frequently than the implementation behind it.

Here is an example of a clean interface for task persistence:

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """A task in the task management system."""
    id: str
    title: str
    assignee: str
    status: str


class TaskRepository(ABC):
    """Interface for task persistence.

    This is the boundary between the business logic layer
    and the data access layer. Implementations may use
    SQL databases, document stores, or in-memory storage.
    """

    @abstractmethod
    def save(self, task: Task) -> None:
        """Persist a task."""
        ...

    @abstractmethod
    def find_by_id(self, task_id: str) -> Optional[Task]:
        """Retrieve a task by its unique identifier."""
        ...

    @abstractmethod
    def find_by_assignee(self, assignee: str) -> list[Task]:
        """Retrieve all tasks assigned to a specific person."""
        ...

    @abstractmethod
    def delete(self, task_id: str) -> None:
        """Remove a task from the repository."""
        ...

Best Practice

When using AI to generate module structures, prompt explicitly for the interfaces first: "Define the public interface for the notification module. Do not implement anything yet -- just show me the abstract base classes and data classes that other modules will depend on." This forces you to think about boundaries before implementation details.

The Dependency Rule

In a well-structured system, dependencies flow inward. The outermost layers (UI, API routes, database adapters) depend on the inner layers (business logic, domain models), never the reverse.

┌───────────────────────────────────┐
│  External (APIs, UI, Database)    │
│  ┌─────────────────────────────┐  │
│  │  Interface Adapters         │  │
│  │  ┌───────────────────────┐  │  │
│  │  │  Business Logic       │  │  │
│  │  │  ┌─────────────────┐  │  │  │
│  │  │  │  Domain Models  │  │  │  │
│  │  │  └─────────────────┘  │  │  │
│  │  └───────────────────────┘  │  │
│  └─────────────────────────────┘  │
└───────────────────────────────────┘

This is often called "Clean Architecture" or "Hexagonal Architecture." The core principle is the same: your business logic should not know or care whether data comes from PostgreSQL, MongoDB, or a CSV file. It should not know or care whether users interact via a web browser, a mobile app, or a CLI.
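As a minimal sketch of the dependency rule in Python (all names are illustrative): the domain model and business logic know nothing about storage, and the adapter at the edge implements an interface the inner layer owns:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Innermost layer: domain model, no outward dependencies
@dataclass
class Task:
    id: str
    title: str


# Business logic layer: depends only on an abstraction it owns
class TaskStore(ABC):
    @abstractmethod
    def get(self, task_id: str) -> Task: ...


def rename_task(store: TaskStore, task_id: str, title: str) -> Task:
    task = store.get(task_id)
    task.title = title
    return task


# Outermost layer: an adapter implements the inner interface.
# A real adapter might wrap PostgreSQL; this one is in-memory.
class InMemoryTaskStore(TaskStore):
    def __init__(self):
        self.tasks: dict[str, Task] = {}

    def get(self, task_id: str) -> Task:
        return self.tasks[task_id]
```

Swapping the in-memory adapter for a database-backed one would not touch rename_task or Task at all, which is exactly what the dependency rule promises.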


24.6 Dependency Management and Inversion

The Problem with Direct Dependencies

Consider this code:

import smtplib

class OrderService:
    def place_order(self, order: Order) -> None:
        # Business logic
        self.validate_order(order)
        self.calculate_total(order)
        self.save_to_database(order)

        # Direct dependency on email implementation
        server = smtplib.SMTP("smtp.gmail.com", 587)
        server.starttls()
        server.login("app@example.com", "password123")
        server.sendmail(
            "app@example.com",
            order.customer_email,
            f"Order {order.id} confirmed!"
        )
        server.quit()

This code has several problems:

  1. Untestable. You cannot test place_order without actually sending an email.
  2. Inflexible. Switching from email to SMS or push notifications requires modifying OrderService.
  3. Violates SRP. OrderService is responsible for both order logic and email delivery.
  4. Fragile. If the SMTP server is down, order placement fails entirely.

Dependency Inversion in Practice

The solution is to invert the dependency. Instead of OrderService reaching out to smtplib, we define an abstraction that OrderService depends on, and inject a concrete implementation:

from abc import ABC, abstractmethod


class NotificationService(ABC):
    """Abstraction for sending notifications."""

    @abstractmethod
    def notify(self, recipient: str, message: str) -> None:
        ...


class EmailNotificationService(NotificationService):
    """Concrete implementation using SMTP."""

    def __init__(self, smtp_host: str, smtp_port: int):
        self.smtp_host = smtp_host
        self.smtp_port = smtp_port

    def notify(self, recipient: str, message: str) -> None:
        # SMTP implementation here
        ...


class OrderService:
    """Depends on the abstraction, not the implementation."""

    def __init__(self, notification_service: NotificationService):
        self.notification_service = notification_service

    def place_order(self, order: Order) -> None:
        self.validate_order(order)
        self.calculate_total(order)
        self.save_to_database(order)
        self.notification_service.notify(
            order.customer_email,
            f"Order {order.id} confirmed!"
        )

Now OrderService depends on the NotificationService abstraction. During testing, you can inject a mock. In production, you inject the real email service. If you later add SMS support, you create a new implementation without touching OrderService.
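For example, a test can exercise the order flow with an in-memory fake in place of SMTP. The fake and the trimmed-down OrderService below are illustrative:

```python
from abc import ABC, abstractmethod


class NotificationService(ABC):
    @abstractmethod
    def notify(self, recipient: str, message: str) -> None: ...


class FakeNotificationService(NotificationService):
    """Test double: records notifications instead of sending them."""

    def __init__(self):
        self.sent: list[tuple[str, str]] = []

    def notify(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))


class OrderService:
    def __init__(self, notification_service: NotificationService):
        self.notification_service = notification_service

    def place_order(self, order_id: str, customer_email: str) -> None:
        # Validation and persistence omitted for brevity
        self.notification_service.notify(
            customer_email, f"Order {order_id} confirmed!"
        )


def test_place_order_sends_confirmation():
    fake = FakeNotificationService()
    OrderService(fake).place_order("42", "ada@example.com")
    assert fake.sent == [("ada@example.com", "Order 42 confirmed!")]
```

The test runs instantly, touches no network, and fails loudly if the confirmation logic regresses.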

Dependency Injection Patterns

There are three common patterns for injecting dependencies:

Constructor injection (most common and recommended):

class TaskService:
    def __init__(self, repository: TaskRepository, notifier: NotificationService):
        self.repository = repository
        self.notifier = notifier

Method injection (for dependencies that vary per operation):

class ReportGenerator:
    def generate(self, data: list, formatter: ReportFormatter) -> str:
        return formatter.format(data)

Container-based injection (for complex applications with many dependencies):

from typing import Callable

class Container:
    """Simple dependency injection container."""

    def __init__(self):
        self._factories: dict[type, Callable] = {}

    def register(self, interface_type: type, factory: Callable) -> None:
        self._factories[interface_type] = factory

    def resolve(self, interface_type: type):
        factory = self._factories.get(interface_type)
        if factory is None:
            raise ValueError(f"No registration for {interface_type}")
        return factory(self)
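Usage might look like the following sketch. The Clock example is an illustrative assumption, and the Container class is repeated so the snippet stands on its own:

```python
from abc import ABC, abstractmethod


class Container:
    """Container repeated from above so this sketch runs standalone."""

    def __init__(self):
        self._factories = {}

    def register(self, interface_type, factory) -> None:
        self._factories[interface_type] = factory

    def resolve(self, interface_type):
        factory = self._factories.get(interface_type)
        if factory is None:
            raise ValueError(f"No registration for {interface_type}")
        return factory(self)


class Clock(ABC):
    @abstractmethod
    def now(self) -> str: ...


class FixedClock(Clock):
    """Illustrative implementation with a hard-coded timestamp."""

    def now(self) -> str:
        return "2025-01-01"


# Wiring: each factory receives the container so it can resolve
# its own dependencies if it has any
container = Container()
container.register(Clock, lambda c: FixedClock())

clock = container.resolve(Clock)
```

The container centralizes all wiring in one place; the rest of the codebase only ever asks for abstractions.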

See code/example-03-dependency-injection.py for a complete, working dependency injection container with realistic examples.

Asking AI for Help

When you need to refactor code to use dependency injection, try this prompt: "I have this class [paste code]. It has hard-coded dependencies on [list them]. Refactor it to use constructor injection with abstract base classes. Show me the abstractions, the concrete implementations, and how to wire them together. Include a test that uses mock implementations."


24.7 Event-Driven Architecture

Beyond Request-Response

Most of the code in Part III (Chapters 15-23) used a request-response model: a client sends a request, the server processes it, and returns a response. This model is intuitive and works well for many scenarios, but it has limitations.

Consider what happens when a user places an order:

  1. Save the order to the database
  2. Send a confirmation email
  3. Update inventory counts
  4. Notify the warehouse
  5. Update the analytics dashboard
  6. Charge the payment method

In a request-response model, the order placement endpoint must do all of these things synchronously. If the email server is slow, the user waits. If the analytics service is down, the order fails. The components are tightly coupled through the request handler.

The Publish-Subscribe Pattern

Event-driven architecture decouples these concerns. Instead of the order service calling each downstream service directly, it publishes an event ("order placed"), and interested services subscribe to that event.

# Instead of this (tightly coupled):
def place_order(order):
    save_to_database(order)
    send_confirmation_email(order)
    update_inventory(order)
    notify_warehouse(order)
    update_analytics(order)
    charge_payment(order)

# You get this (loosely coupled):
def place_order(order):
    save_to_database(order)
    event_bus.publish("order_placed", order)

# Elsewhere, each service subscribes independently:
event_bus.subscribe("order_placed", send_confirmation_email)
event_bus.subscribe("order_placed", update_inventory)
event_bus.subscribe("order_placed", notify_warehouse)
event_bus.subscribe("order_placed", update_analytics)

Advantages of event-driven architecture:

  • Loose coupling: Publishers do not know or care about subscribers
  • Easy extensibility: Adding a new reaction to an event requires no changes to existing code
  • Resilience: If the email service is down, the order still succeeds; the email can be retried later
  • Scalability: Subscribers can process events at their own pace using message queues

Disadvantages:

  • Eventual consistency: Events are processed asynchronously, so the system may be temporarily inconsistent
  • Debugging complexity: Tracing a request through a chain of events is harder than following a single function call
  • Ordering challenges: Events may arrive out of order, and your system must handle this
  • Increased infrastructure: You need a message broker (RabbitMQ, Kafka, Redis Streams, AWS SQS)

Event Types

Events in an event-driven system typically fall into three categories:

Domain events represent something that happened in the business domain: "OrderPlaced," "UserRegistered," "PaymentFailed." These are the most common and most useful type.

Integration events are used to communicate between bounded contexts or services: "InventoryReserved," "ShipmentCreated." They carry only the data needed by the receiving service.

System events represent infrastructure-level occurrences: "DatabaseConnectionLost," "CacheCleared," "DeploymentCompleted." These are used for monitoring and operational automation.

Implementing Events in Python

A simple but effective event system in Python uses a central event bus:

from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass
class Event:
    """Base class for all domain events."""
    event_type: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    data: dict[str, Any] = field(default_factory=dict)


class EventBus:
    """Simple in-process event bus implementing publish-subscribe."""

    def __init__(self):
        self._subscribers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers.get(event.event_type, []):
            handler(event)
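
Putting the bus to work looks like this. The handlers are illustrative stand-ins; Event and EventBus are repeated in compact form so the snippet runs on its own:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass
class Event:
    event_type: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    data: dict[str, Any] = field(default_factory=dict)


class EventBus:
    def __init__(self):
        self._subscribers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers.get(event.event_type, []):
            handler(event)


# Illustrative handlers; in a real system these would live in separate modules
received: list[str] = []


def send_confirmation_email(event: Event) -> None:
    received.append(f"email for order {event.data['order_id']}")


def update_inventory(event: Event) -> None:
    received.append(f"inventory for order {event.data['order_id']}")


bus = EventBus()
bus.subscribe("order_placed", send_confirmation_email)
bus.subscribe("order_placed", update_inventory)
bus.publish(Event("order_placed", data={"order_id": "A-1"}))
# Both handlers ran, in subscription order
```

Adding a warehouse notification later means one new `subscribe` call; the publishing code is untouched.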

See code/example-02-event-driven.py for a complete implementation including event sourcing, async handlers, and error handling.

Key Concept

Event-driven architecture is not all-or-nothing. You can introduce events selectively into specific parts of your system where loose coupling and extensibility are most valuable, while keeping the rest of your system request-response. This hybrid approach is very common in practice.

Event Sourcing

Event sourcing takes the event-driven concept further: instead of storing the current state of an entity, you store the sequence of events that led to that state. The current state is derived by replaying all events.

For example, instead of storing that a bank account has a balance of $500, you store:

  1. AccountOpened (initial deposit: $1000)
  2. Withdrawal ($200)
  3. Deposit ($50)
  4. Withdrawal ($350)

The balance ($500) is computed by replaying these events. This provides a complete audit trail and enables powerful capabilities like temporal queries ("what was the balance on March 15?") and event replay ("reprocess all orders from last week with the corrected pricing logic").
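
The replay itself is a simple fold over the history. A sketch using the four events above (the event kinds and dataclass are illustrative, not a fixed schema):

```python
from dataclasses import dataclass


@dataclass
class AccountEvent:
    kind: str      # "opened", "deposit", or "withdrawal"
    amount: float


def replay_balance(events: list[AccountEvent]) -> float:
    """Derive the current balance by folding over the event history."""
    balance = 0.0
    for event in events:
        if event.kind in ("opened", "deposit"):
            balance += event.amount
        elif event.kind == "withdrawal":
            balance -= event.amount
    return balance


history = [
    AccountEvent("opened", 1000),
    AccountEvent("withdrawal", 200),
    AccountEvent("deposit", 50),
    AccountEvent("withdrawal", 350),
]
print(replay_balance(history))  # → 500.0, the balance from the example
```

A temporal query ("what was the balance on March 15?") is the same fold, stopped at the events recorded before that date.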


24.8 Scalability Considerations

Vertical vs. Horizontal Scaling

When your system needs to handle more load, there are two fundamental approaches:

Vertical scaling (scaling up): Give a single server more resources -- more CPU, more RAM, faster storage. This is the simplest approach but has hard limits (you cannot buy a server with 10,000 CPU cores) and creates a single point of failure.

Horizontal scaling (scaling out): Add more servers running the same application. This approach scales theoretically without limit but introduces complexity: how do requests get distributed? How is state shared? How do you deploy to multiple servers?

Best Practice

Start with vertical scaling. It is simpler, requires no architectural changes, and modern cloud instances are remarkably powerful. Move to horizontal scaling only when you have concrete evidence that a single server cannot handle your load. Premature horizontal scaling is a common source of unnecessary complexity.

Stateless Services

The prerequisite for horizontal scaling is statelessness. A stateless service does not store any data between requests. Each request contains all the information needed to process it.

# Stateful (cannot be horizontally scaled):
class TaskService:
    def __init__(self):
        self.tasks = {}  # State stored in memory

    def create_task(self, task: Task) -> None:
        self.tasks[task.id] = task  # Lost if this instance dies

# Stateless (can be horizontally scaled):
class TaskService:
    def __init__(self, repository: TaskRepository):
        self.repository = repository  # State lives in external store

    def create_task(self, task: Task) -> None:
        self.repository.save(task)  # Persisted externally

When a service is stateless, any instance can handle any request. A load balancer can distribute requests across instances freely, and instances can be added or removed without data loss.

Caching Layers

Caching stores frequently accessed data in a fast-access location to reduce load on the primary data store. Effective caching can improve performance by orders of magnitude.

Common caching strategies:

  • Cache-aside: The application checks the cache first; on a miss, it loads the data from the database and populates the cache. Best for read-heavy workloads that can tolerate slightly stale data.
  • Write-through: The application writes to the cache and the database simultaneously. Best for workloads requiring strong consistency.
  • Write-behind: The application writes to the cache, and the cache asynchronously writes to the database. Best for write-heavy workloads where slight data loss is acceptable.

Here is the cache-aside pattern applied to the task repository:

class CachedTaskRepository(TaskRepository):
    """Repository with cache-aside pattern."""

    def __init__(self, repository: TaskRepository, cache: Cache, ttl: int = 300):
        self.repository = repository
        self.cache = cache
        self.ttl = ttl

    def find_by_id(self, task_id: str) -> Task | None:
        # Check cache first
        cached = self.cache.get(f"task:{task_id}")
        if cached is not None:
            return cached

        # Cache miss: load from database
        task = self.repository.find_by_id(task_id)
        if task is not None:
            self.cache.set(f"task:{task_id}", task, ttl=self.ttl)
        return task

Load Balancing

A load balancer distributes incoming requests across multiple instances of a service. Common algorithms include:

  • Round-robin: Requests are distributed to instances in sequence
  • Least connections: Requests go to the instance with the fewest active connections
  • Weighted: Instances with more resources receive more requests
  • IP hash: Requests from the same client always go to the same instance (useful for session affinity)
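
Round-robin, the simplest of these algorithms, fits in a few lines (the instance names are placeholders):

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend instances in a fixed rotation."""

    def __init__(self, instances: list[str]):
        self._rotation = cycle(instances)

    def next_instance(self) -> str:
        return next(self._rotation)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.next_instance() for _ in range(4)]
# → ["app-1", "app-2", "app-3", "app-1"]
```

Real load balancers (nginx, HAProxy, cloud-provider offerings) add health checks so that a dead instance is removed from the rotation.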

Database Scaling

Databases are often the bottleneck in scaling. Strategies include:

Read replicas: Write to a primary database; read from one or more replicas. This works well when reads vastly outnumber writes.

Sharding: Split data across multiple databases based on a key (e.g., user ID). Each shard handles a subset of the data. This enables horizontal scaling but adds significant complexity to queries that span shards.

Connection pooling: Reuse database connections instead of creating new ones for each request. This is a simple optimization with significant impact.
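
The idea behind pooling can be sketched as a queue of pre-built connections. The connect_fn below is a stand-in for whatever your database driver provides; real pools, such as those in psycopg2 or SQLAlchemy, also handle health checks, timeouts, and reconnection:

```python
import queue
from typing import Any, Callable


class ConnectionPool:
    """Toy pool: pre-creates connections and recycles them via a queue."""

    def __init__(self, connect_fn: Callable[[], Any], size: int = 5):
        self._pool: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect_fn())  # pay the connection cost once, up front

    def acquire(self) -> Any:
        return self._pool.get()  # blocks if every connection is in use

    def release(self, conn: Any) -> None:
        self._pool.put(conn)


# Illustration with a fake connection factory that counts how often it runs
counter = {"created": 0}


def fake_connect():
    counter["created"] += 1
    return object()


pool = ConnectionPool(fake_connect, size=2)
conn = pool.acquire()
pool.release(conn)
pool.acquire()  # reuses an existing connection; nothing new is created
```

The point of the counter: however many requests flow through, only `size` connections are ever opened.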

Asking AI for Help

AI assistants excel at scalability analysis. Try prompts like: "My application currently serves 1,000 requests per minute on a single server. I expect traffic to grow to 50,000 requests per minute over the next year. The application is a Python Flask API backed by PostgreSQL. What scaling strategy would you recommend, and what architectural changes would I need to make?"


24.9 Trade-Off Analysis with AI

Every Decision Is a Trade-Off

The defining characteristic of architectural thinking is the recognition that every decision involves trade-offs. There is no architecture that maximizes performance, minimizes cost, maximizes flexibility, minimizes complexity, and maximizes reliability simultaneously. Choosing one quality attribute often means compromising another.

Common trade-off pairs include:

  • Consistency vs. availability (the CAP theorem)
  • Performance vs. maintainability
  • Flexibility vs. simplicity
  • Security vs. usability
  • Speed of delivery vs. code quality
  • Cost of build vs. cost of operation

Using AI to Surface Trade-Offs

AI assistants are excellent at surfacing trade-offs you might not have considered. Here are effective prompting strategies:

The "Devil's Advocate" prompt:

I've decided to use a microservices architecture for our new e-commerce
platform. Our team is 4 developers. Play devil's advocate -- what are all
the reasons this might be a bad decision?

The "What Could Go Wrong" prompt:

Here is our proposed architecture: [describe it]. For each component
and each connection between components, describe the most likely failure
mode and its impact on the overall system.

The "Hidden Assumptions" prompt:

Here is our architecture proposal: [describe it]. What implicit assumptions
are we making? Which of those assumptions are most likely to be wrong?

The "Five Years From Now" prompt:

Our system currently handles 100 users. If it grows to 100,000 users
over 5 years, which parts of this architecture will break first?
What changes would we need to make, and how painful would they be?

The Decision Matrix

When facing a significant architectural decision, use a weighted decision matrix. AI can help both construct and evaluate it:

I need to choose between PostgreSQL and MongoDB for our task management
system. Please create a weighted decision matrix with these criteria:
- Query flexibility (weight: 3)
- Schema evolution (weight: 2)
- Transaction support (weight: 3)
- Team familiarity (weight: 2) -- team knows PostgreSQL well
- Operational simplicity (weight: 1)
- Ecosystem/tooling (weight: 1)

Score each option 1-5 on each criterion and calculate weighted totals.
Explain your reasoning for each score.
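
The arithmetic behind a weighted matrix is simple enough to check the AI's totals yourself. The scores below are invented placeholders for illustration, not a real assessment of either database:

```python
criteria_weights = {
    "query_flexibility": 3,
    "schema_evolution": 2,
    "transaction_support": 3,
    "team_familiarity": 2,
    "operational_simplicity": 1,
    "ecosystem_tooling": 1,
}

# Hypothetical 1-5 scores, purely for illustration
scores = {
    "postgresql": {"query_flexibility": 5, "schema_evolution": 3,
                   "transaction_support": 5, "team_familiarity": 5,
                   "operational_simplicity": 4, "ecosystem_tooling": 5},
    "mongodb":    {"query_flexibility": 3, "schema_evolution": 5,
                   "transaction_support": 3, "team_familiarity": 2,
                   "operational_simplicity": 4, "ecosystem_tooling": 4},
}


def weighted_total(option: str) -> int:
    """Sum of weight x score across all criteria for one option."""
    return sum(criteria_weights[c] * scores[option][c] for c in criteria_weights)


for option in scores:
    print(option, weighted_total(option))
# postgresql 55, mongodb 40 with these placeholder scores
```

As the Real-World Application note below suggests, the totals matter less than the discussion the weights provoke.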

Real-World Application

Engineering teams at companies like Amazon and Google use structured trade-off analysis for every major architectural decision. The process is often more valuable than the result -- the act of explicitly identifying and weighting trade-offs forces the team to align on priorities and surface hidden disagreements.

Reversible vs. Irreversible Decisions

Not all architectural decisions deserve the same level of analysis. Amazon distinguishes between:

Type 1 decisions (irreversible): Once made, they are extremely difficult or impossible to undo. These deserve extensive analysis, multiple perspectives, and careful deliberation. Example: choosing your primary database technology for a system that will store billions of records.

Type 2 decisions (reversible): These can be undone with reasonable effort. They should be made quickly by individuals or small groups. Example: choosing a logging library or a JSON serialization format.

Best Practice

When an AI proposes an architectural choice, always ask yourself: "Is this a Type 1 or Type 2 decision?" If it is Type 2, make the decision quickly and move on. If it is Type 1, invest the time to analyze trade-offs thoroughly.


24.10 Documenting Architecture Decisions

Why Document Architecture Decisions?

Code tells you what a system does. Documentation tells you why it does it that way. Architecture Decision Records (ADRs) capture the reasoning behind significant design choices so that future developers -- including your future self -- understand not just what was decided, but what alternatives were considered and why they were rejected.

Without ADRs, teams repeatedly revisit the same decisions, waste time debating choices that were already made for good reasons, or accidentally reverse decisions without understanding the consequences.

The ADR Format

An ADR is a short document (typically one to two pages) with a standard format:

# ADR-001: Use PostgreSQL as Primary Database

## Status
Accepted

## Context
We are building a task management system that requires structured
data storage with complex querying capabilities. The data model
includes users, teams, tasks, comments, and activity logs with
many-to-many relationships. We expect 10,000 active users within
12 months.

## Decision
We will use PostgreSQL as our primary database.

## Alternatives Considered

### MongoDB
- Pros: Flexible schema, easy to get started, good for
  document-shaped data
- Cons: Weak support for relational queries we need,
  eventual consistency by default, team has no experience

### MySQL
- Pros: Mature, widely supported, team has some experience
- Cons: Less sophisticated query planner than PostgreSQL,
  weaker JSON support, fewer advanced features (CTEs,
  window functions)

## Consequences
- We commit to a relational data model with migrations
- We need team members with PostgreSQL experience (or training)
- We gain powerful querying capabilities and ACID transactions
- We may need to add a caching layer for read-heavy workloads
- Schema changes require migration scripts

## Decision Date
2025-03-15

Using AI to Write ADRs

AI is exceptionally useful for drafting ADRs because the process of writing one is essentially the process of articulating a decision with its context and trade-offs -- exactly the kind of structured reasoning AI excels at.

Prompt template for ADR generation:

Write an Architecture Decision Record for the following decision:

Decision: [what was decided]
Context: [why this decision was needed]
Constraints: [team size, budget, timeline, existing technology]
Alternatives we considered: [list them]

Use the standard ADR format with Status, Context, Decision,
Alternatives Considered (with pros and cons for each),
Consequences, and Decision Date.

Key Concept

ADRs are not bureaucratic overhead. They are a form of institutional memory. A well-maintained collection of ADRs is one of the most valuable assets a development team can have. When a new team member asks "why do we use X instead of Y?", you point them to the ADR instead of trying to reconstruct the reasoning from memory.

Maintaining an ADR Log

ADRs should be stored in your version control repository, typically in a docs/adr/ directory. They are numbered sequentially and never deleted -- if a decision is reversed, you write a new ADR that supersedes the old one.

docs/
    adr/
        0001-use-postgresql-as-primary-database.md
        0002-adopt-event-driven-architecture-for-notifications.md
        0003-use-redis-for-caching.md
        0004-migrate-from-rest-to-graphql-for-dashboard-api.md

This creates a chronological record of how your architecture evolved and why. It is invaluable during onboarding, post-mortems, and future planning.

ADRs and AI Context

ADRs serve another important function in AI-assisted development: they provide context for your AI assistant. When you include ADRs in your prompt context, the AI can generate code that is consistent with your existing architectural decisions.

Here are our current Architecture Decision Records:
[paste relevant ADRs]

Given these architectural decisions, implement a new feature
that [describe the feature]. Ensure the implementation is
consistent with the patterns and technologies described in
our ADRs.

This closes the loop between architectural thinking and AI-assisted implementation, ensuring that the code generated by AI fits within the architectural guardrails your team has established.


Chapter Summary

Software architecture is the art of making expensive decisions well. In AI-assisted development, this means learning to use AI as a thinking partner for system design while maintaining the human judgment needed to evaluate proposals against real-world constraints.

The key ideas in this chapter:

  1. Architecture is about the decisions that are expensive to change. Focus your architectural thinking on these high-stakes choices.

  2. AI assistants are powerful architecture partners, but they need structured input (use the RADIO framework) and iterative refinement (stress-test proposals, explore alternatives).

  3. Choose architectural patterns based on your constraints, not trends. A monolith is not inferior to microservices -- it is simply appropriate for different circumstances.

  4. SOLID principles guide the internal structure of your code. They produce code that is testable, extensible, and maintainable.

  5. Module boundaries and clean interfaces are the mechanism by which you manage complexity. Draw boundaries along business capabilities, rates of change, team boundaries, and technology boundaries.

  6. Dependency inversion decouples your business logic from infrastructure concerns, making your code testable and adaptable.

  7. Event-driven architecture enables loose coupling and extensibility for workflows that involve multiple downstream effects.

  8. Scalability is a spectrum, not a binary. Start simple (vertical scaling, stateless services, caching) and add complexity only when you have evidence it is needed.

  9. Every architectural decision is a trade-off. Use AI to surface trade-offs, challenge assumptions, and evaluate alternatives systematically.

  10. Document your decisions with ADRs. They are your system's institutional memory and a valuable source of context for AI-assisted development.

In the next chapter, we will zoom in from system-level architecture to code-level structure, exploring design patterns and clean code principles that make individual components well-crafted and maintainable.


References to Other Chapters

  • Chapter 15 (CLI Tools and Scripts): Example of a well-structured monolithic Python application
  • Chapter 17 (Backend Development and REST APIs): API design patterns and request-response architecture
  • Chapter 18 (Database Design and Data Modeling): Data layer architecture and persistence patterns
  • Chapter 19 (Full-Stack Development): End-to-end system architecture spanning frontend and backend
  • Chapter 20 (External APIs and Integrations): Interface design for external service integration
  • Chapter 25 (Design Patterns and Clean Code): Code-level patterns that complement system-level architecture
  • Chapter 29 (DevOps and Deployment): Deployment architecture and infrastructure as code