Chapter 16: OOP Design: Patterns and Principles

Prerequisites

15
13

Learning Objectives

Apply the Single Responsibility Principle and Open/Closed Principle to evaluate and improve class designs
Distinguish high cohesion from low cohesion and tight coupling from loose coupling in real code
Implement the Strategy, Observer, and Factory design patterns in Python
Use @dataclass to create clean data-holding classes with minimal boilerplate
Identify common code smells and apply systematic refactoring techniques
Evaluate when OOP is the right tool and when simpler approaches are better

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." — Martin Fowler, Refactoring

Chapter Overview

You know how to write classes. You know how to use inheritance and polymorphism. Congratulations — you've learned the mechanics of object-oriented programming. Now comes the harder question: how do you design classes that actually work well together?

There's a difference between code that runs and code that's good. Good code is easy to change, easy to test, and easy for someone else (including future-you) to understand. Bad code fights you at every turn — you fix one bug and three more appear, you add one feature and half the system breaks, you look at code you wrote three months ago and genuinely can't figure out what it does.

This chapter is about the bridge between those two worlds. We'll explore principles that experienced developers use to evaluate designs, patterns that solve recurring problems in elegant ways, and the practical skill of recognizing when code has gone wrong and fixing it systematically.

A word of caution: design is not something you master by reading about it. You master it by doing it badly, noticing what went wrong, and doing it better next time. The principles and patterns in this chapter are guardrails, not laws. They'll save you from the worst mistakes, but the real learning happens when you apply them to your own projects and feel the difference.

In this chapter, you will learn to:
- Evaluate class designs using the Single Responsibility and Open/Closed Principles
- Recognize tight coupling and low cohesion, and know how to fix them
- Implement Strategy, Observer, and Factory patterns in Python
- Use @dataclass to eliminate boilerplate in data-holding classes
- Identify code smells and refactor systematically
- Know when not to use OOP

🏃 Fast Track: If you're comfortable with design principles conceptually, skim 16.1–16.3 and focus on the pattern implementations (16.4–16.6) and dataclasses (16.7). Then work through the project checkpoint.

🔬 Deep Dive: After this chapter, read Case Study 1 for a full refactoring walkthrough and Case Study 2 to see how real Python libraries use these exact patterns.

16.1 Thinking About Design

Let's start with a story you'll recognize if you've built anything beyond a toy program.

You write a class. It works. You add a feature. Still works. You add another feature, then another. Somewhere around the fifth feature, you notice something uncomfortable: every change requires modifying three different methods. The __init__ method is 40 lines long. You have a method called do_everything() that truly does everything. You want to add a simple notification when a task is overdue, and you realize you'd have to rewrite half the class.

You've hit the wall that separates "code that works" from "code that's designed well."

Why Design Matters

Design isn't about making code look pretty or following rules for the sake of rules. It's about managing the cost of change. Software changes constantly — new features, bug fixes, new requirements from users. Well-designed code makes changes easy and safe. Poorly designed code makes every change a risk.

Here's a concrete example. Consider two versions of a grade calculator:

# Version A: Everything in one class
class GradeSystem:
    def __init__(self):
        self.students = []
        self.grades = {}
        self.display_mode = "text"
        self.storage_file = "grades.json"
        self.email_server = "smtp.school.edu"

    def add_student(self, name): ...
    def record_grade(self, student, grade): ...
    def calculate_gpa(self, student): ...
    def display_report(self): ...
    def save_to_file(self): ...
    def load_from_file(self): ...
    def email_report(self, recipient): ...
    def generate_html_report(self): ...
    def validate_grade(self, grade): ...
    def calculate_class_average(self): ...

This class does everything: grading, display, storage, email, HTML generation, and validation. Now ask yourself: what happens when you need to change how reports are displayed? You have to modify this giant class, which also handles email and storage. And what if a change to the email code accidentally breaks grade calculation? With everything in one class, nothing isolates one concern from another, so you may not find out until it blows up in production.

Now consider a different design:

# Version B: Separated responsibilities
class GradeBook:
    """Manages students and their grades."""
    def add_student(self, name): ...
    def record_grade(self, student, grade): ...
    def calculate_gpa(self, student): ...
    def calculate_class_average(self): ...

class GradeValidator:
    """Validates grade values."""
    def validate(self, grade): ...

class ReportGenerator:
    """Creates reports in various formats."""
    def text_report(self, gradebook): ...
    def html_report(self, gradebook): ...

class GradeStorage:
    """Saves and loads grade data."""
    def save(self, gradebook, filename): ...
    def load(self, filename): ...

Same functionality, but now each class has one job. You can change report formatting without touching grade calculations. You can swap file storage for database storage without affecting anything else. Each class is small enough to fit in your head.

That's what design is about: organizing code so that each piece has a clear purpose and changes don't ripple through the entire system.
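
To make the payoff of Version B concrete, here is a minimal sketch with two of its classes filled in. The method bodies, the layout of the grades dictionary, and the choice of JSON as the file format are all assumptions for illustration; the chapter itself only specifies the method names.

```python
import json
import tempfile
from pathlib import Path


class GradeBook:
    """Manages students and their grades. Knows nothing about storage."""

    def __init__(self):
        # Assumed layout: student name -> list of numeric grades
        self.grades: dict[str, list[float]] = {}

    def add_student(self, name):
        self.grades.setdefault(name, [])

    def record_grade(self, student, grade):
        self.grades.setdefault(student, []).append(grade)

    def calculate_class_average(self):
        all_grades = [g for grades in self.grades.values() for g in grades]
        return sum(all_grades) / len(all_grades) if all_grades else 0.0


class GradeStorage:
    """Saves and loads grade data as JSON (one possible format)."""

    def save(self, gradebook, filename):
        Path(filename).write_text(json.dumps(gradebook.grades))

    def load(self, filename):
        gradebook = GradeBook()
        gradebook.grades = json.loads(Path(filename).read_text())
        return gradebook


book = GradeBook()
book.record_grade("Alice", 92.0)
book.record_grade("Bob", 85.0)

# Round-trip through a temporary file so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "grades.json"
    storage = GradeStorage()
    storage.save(book, path)
    restored = storage.load(path)

print(restored.calculate_class_average())
```

Swapping JSON for a database would mean writing a different storage class with the same save() and load() methods. GradeBook would not change at all.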

🔗 Connection to Chapter 6 (Functions): We already applied this thinking to functions — each function should do one thing and do it well. OOP design is the same principle at a larger scale. Functions are your small units of organization; classes are your medium-sized ones.

16.2 SOLID Principles for Beginners

SOLID is an acronym for five design principles popularized by Robert C. Martin ("Uncle Bob"). We'll focus on the two that matter most at this stage: S and O. The others (Liskov Substitution, Interface Segregation, Dependency Inversion) build on these — you'll encounter them in a software engineering course.

Single Responsibility Principle (SRP)

A class should have one reason to change.

That's the formal statement. Here's the practical version: if you can't describe what a class does in one sentence without using the word "and," it probably has too many responsibilities.

Let's test this with our text adventure game:

# VIOLATES SRP: Player class does too many things
class Player:
    def __init__(self, name):
        self.name = name
        self.hp = 100
        self.inventory = []
        self.position = (0, 0)

    def move(self, direction): ...        # Navigation
    def attack(self, enemy): ...          # Combat
    def take_damage(self, amount): ...    # Combat
    def pick_up(self, item): ...          # Inventory
    def drop(self, item): ...             # Inventory
    def save_game(self, filename): ...    # Persistence
    def load_game(self, filename): ...    # Persistence
    def draw_on_screen(self): ...         # Display
    def play_sound(self, sound): ...      # Audio

What does this class do? "It manages the player's state and handles combat and manages inventory and saves/loads games and draws to the screen and plays sounds." That's a lot of "ands."

A better design splits these into focused classes:

# FOLLOWS SRP: Each class has one job
class Player:
    """Manages player state."""
    def __init__(self, name):
        self.name = name
        self.hp = 100
        self.inventory = Inventory()
        self.position = (0, 0)

class Inventory:
    """Manages items."""
    def pick_up(self, item): ...
    def drop(self, item): ...

class CombatSystem:
    """Handles combat between entities."""
    def attack(self, attacker, defender): ...
    def apply_damage(self, target, amount): ...

class GameSaver:
    """Persists game state."""
    def save(self, game_state, filename): ...
    def load(self, filename): ...
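
Here is one way the split classes might cooperate once a few method bodies are filled in. The bodies, the items list, and the BASE_DAMAGE constant are assumptions for illustration, not part of the game's actual code.

```python
class Inventory:
    """Manages items."""

    def __init__(self):
        self.items = []

    def pick_up(self, item):
        self.items.append(item)

    def drop(self, item):
        self.items.remove(item)


class Player:
    """Manages player state."""

    def __init__(self, name):
        self.name = name
        self.hp = 100
        self.inventory = Inventory()
        self.position = (0, 0)


class CombatSystem:
    """Handles combat between entities."""

    BASE_DAMAGE = 10  # Assumed flat damage value, for illustration only

    def attack(self, attacker, defender):
        self.apply_damage(defender, self.BASE_DAMAGE)
        return f"{attacker.name} hits {defender.name} for {self.BASE_DAMAGE}"

    def apply_damage(self, target, amount):
        target.hp = max(0, target.hp - amount)  # HP never drops below zero


hero = Player("Aldric")
goblin = Player("Goblin")
hero.inventory.pick_up("torch")

combat = CombatSystem()
print(combat.attack(hero, goblin))
print(goblin.hp)
```

Changing the damage rules now touches only CombatSystem, and changing how items are stored touches only Inventory. Player stays a small description of state.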

🔄 Check Your Understanding: Look at the TaskFlow classes from Chapter 14. Does TaskList follow SRP? What about TaskStorage? Identify any responsibilities that could be separated.

Open/Closed Principle (OCP)

Classes should be open for extension but closed for modification.

Translation: you should be able to add new behavior without changing existing code that already works.

This one sounds abstract, so let's make it concrete. Imagine Elena needs her report generator to support new output formats:

# VIOLATES OCP: Adding a format means modifying existing code
class ReportGenerator:
    def generate(self, data, format_type):
        if format_type == "csv":
            return self._make_csv(data)
        elif format_type == "html":
            return self._make_html(data)
        elif format_type == "pdf":     # Must modify this class!
            return self._make_pdf(data)
        # Every new format = another elif = modifying tested code

Every time Elena needs a new format, she has to open up this class and add another elif branch. That means retesting the entire class, because the change could break the existing CSV and HTML code.

Now consider this approach:

# FOLLOWS OCP: New formats don't touch existing code
from abc import ABC, abstractmethod

class ReportFormatter(ABC):
    @abstractmethod
    def format(self, data: list[dict]) -> str:
        pass

class CSVFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return ""
        # Naive join: assumes no value contains a comma, quote, or newline
        headers = ",".join(data[0].keys())
        rows = [",".join(str(v) for v in row.values()) for row in data]
        return headers + "\n" + "\n".join(rows)

class HTMLFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return "<table></table>"
        headers = "".join(f"<th>{h}</th>" for h in data[0].keys())
        rows = "".join(
            "<tr>" + "".join(f"<td>{v}</td>" for v in row.values()) + "</tr>"
            for row in data
        )
        return f"<table><tr>{headers}</tr>{rows}</table>"

class ReportGenerator:
    def __init__(self, formatter: ReportFormatter):
        self.formatter = formatter

    def generate(self, data: list[dict]) -> str:
        return self.formatter.format(data)

Now adding PDF support means creating a new PDFFormatter class — the existing ReportGenerator, CSVFormatter, and HTMLFormatter are completely untouched. That's the Open/Closed Principle in action.
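
Producing a real PDF needs a third-party library, so the sketch below uses a hypothetical JSONFormatter to show the same move with only the standard library. The ReportFormatter and ReportGenerator definitions are repeated so the example runs on its own; JSONFormatter is the only new code.

```python
import json
from abc import ABC, abstractmethod


class ReportFormatter(ABC):
    @abstractmethod
    def format(self, data: list[dict]) -> str:
        pass


class ReportGenerator:
    def __init__(self, formatter: ReportFormatter):
        self.formatter = formatter

    def generate(self, data: list[dict]) -> str:
        return self.formatter.format(data)


# New behavior lives in a new class; nothing above this line changes.
class JSONFormatter(ReportFormatter):
    """Hypothetical extra format, used here in place of PDF."""

    def format(self, data: list[dict]) -> str:
        return json.dumps(data, indent=2)


data = [{"name": "Meals Served", "count": 1247}]
report = ReportGenerator(JSONFormatter()).generate(data)
print(report)
```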

Notice how this connects to inheritance and polymorphism from Chapter 15. OCP works because of polymorphism: the ReportGenerator doesn't care which formatter it has — it just calls .format() and trusts the subclass to do the right thing.

💡 Intuition Builder: Think of a power strip. When you need to plug in a new device, you don't rewire the power strip — you just plug the new device into an open socket. The power strip is "closed for modification" (you don't change its wiring) but "open for extension" (you can add new devices). OCP is the same idea for code.

16.3 Coupling and Cohesion

Two concepts that experienced developers think about constantly: coupling (how much classes depend on each other) and cohesion (how focused a single class is on one job).

Coupling: How Connected Are Your Classes?

Coupling measures how much one class depends on the internal details of another. Tight coupling is bad — it means changing one class forces you to change another. Loose coupling is good — classes interact through clean interfaces and don't care about each other's internals.

# TIGHT coupling: Display knows everything about Task internals
class Task:
    def __init__(self, title, priority, due_date):
        self.title = title
        self.priority = priority  # 1-5
        self.due_date = due_date
        self.completed = False
        self._internal_id = id(self)

class TaskDisplay:
    def show(self, task):
        # Reaches directly into Task's attributes
        # If Task changes its attribute names, this breaks
        print(f"[{'X' if task.completed else ' '}] {task.title}")
        print(f"  Priority: {'!' * task.priority}")
        print(f"  Due: {task.due_date}")
        print(f"  ID: {task._internal_id}")  # Accessing private attribute!

The TaskDisplay class reaches deep into Task's internals, including _internal_id, which the leading underscore marks as private by convention. If Task renames priority to importance, or changes how IDs work, TaskDisplay breaks.

# LOOSE coupling: Display uses Task's public interface
class Task:
    def __init__(self, title, priority, due_date):
        self.title = title
        self.priority = priority
        self.due_date = due_date
        self.completed = False
        self._internal_id = id(self)

    def summary(self) -> str:
        """Public interface for displaying task info."""
        status = "X" if self.completed else " "
        return f"[{status}] {self.title} (Priority: {self.priority}, Due: {self.due_date})"

class TaskDisplay:
    def show(self, task):
        # Uses only the public interface
        print(task.summary())

Now TaskDisplay depends only on Task having a .summary() method. Task can change its internal structure all it wants — as long as .summary() still works, TaskDisplay doesn't care.
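
To see that claim hold up, suppose Task renames priority to importance, the very change that would break the tightly coupled version. This is a sketch of that scenario; the rename is hypothetical.

```python
class Task:
    def __init__(self, title, importance, due_date):
        self.title = title
        self.importance = importance  # Renamed from `priority` (hypothetical change)
        self.due_date = due_date
        self.completed = False

    def summary(self) -> str:
        """Public interface: the only place that knows the attribute names."""
        status = "X" if self.completed else " "
        return (f"[{status}] {self.title} "
                f"(Priority: {self.importance}, Due: {self.due_date})")


class TaskDisplay:
    def show(self, task):
        # Identical to the loosely coupled version: no edit was needed here
        print(task.summary())


task = Task("Write report", importance=2, due_date="2025-03-14")
TaskDisplay().show(task)
```

Only Task changed. TaskDisplay is the same code as before the rename.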

Cohesion: How Focused Is Your Class?

Cohesion measures how strongly the methods and attributes within a class belong together. High cohesion is good — everything in the class is related to one purpose. Low cohesion is bad — the class is a grab bag of unrelated stuff.

# LOW cohesion: This class is a junk drawer
class Utilities:
    def calculate_tax(self, amount, rate): ...
    def send_email(self, to, subject, body): ...
    def resize_image(self, image, width, height): ...
    def parse_date(self, date_string): ...
    def encrypt_password(self, password): ...

These methods have nothing to do with each other. Tax calculation, email, image processing, date parsing, and encryption are completely unrelated. This is a "utility class" — the software equivalent of a junk drawer.

# HIGH cohesion: Each class groups related functionality
class TaxCalculator:
    def calculate(self, amount, rate): ...
    def with_deductions(self, amount, rate, deductions): ...

class EmailSender:
    def send(self, to, subject, body): ...
    def send_bulk(self, recipients, subject, body): ...

class ImageProcessor:
    def resize(self, image, width, height): ...
    def crop(self, image, x, y, width, height): ...

Here's a quick diagnostic: if a class's methods don't use the same attributes, cohesion is probably low. In the Utilities class, calculate_tax() and send_email() don't share any state — they shouldn't be in the same class.

⚠️ Pitfall: Don't take this to the extreme. You don't need a separate class for every single method. A GradeBook class with add_grade(), remove_grade(), calculate_average(), and highest_grade() has high cohesion — all those methods work with grades. The goal is cohesion, not one-method-per-class.
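
The GradeBook described in that pitfall might look like the following sketch. The method bodies and the decision to return 0.0 for an empty grade book are assumptions; what matters is that every method touches the same attribute.

```python
class GradeBook:
    """Every method reads or writes the same attribute: self._grades."""

    def __init__(self):
        self._grades: list[float] = []

    def add_grade(self, grade: float) -> None:
        self._grades.append(grade)

    def remove_grade(self, grade: float) -> None:
        self._grades.remove(grade)  # Raises ValueError if the grade is absent

    def calculate_average(self) -> float:
        if not self._grades:
            return 0.0  # Assumed convention for an empty grade book
        return sum(self._grades) / len(self._grades)

    def highest_grade(self) -> float:
        return max(self._grades)  # Raises ValueError if there are no grades


book = GradeBook()
for grade in (90.0, 80.0, 70.0):
    book.add_grade(grade)
book.remove_grade(70.0)

print(book.calculate_average())
print(book.highest_grade())
```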

Concept	Good	Bad	Diagnostic Question
Coupling	Loose — classes interact through public interfaces	Tight — classes depend on each other's internal details	"If I change this class's internals, what else breaks?"
Cohesion	High — all methods relate to one responsibility	Low — methods are unrelated grab-bag	"Do all methods in this class use the same attributes?"

🔄 Check Your Understanding: Look at the GradeSystem class from Section 16.1 (Version A). Rate its coupling and cohesion. How many "reasons to change" does it have? Now rate Version B's individual classes.

16.4 Design Patterns: Strategy

A design pattern is a reusable solution to a common problem. It's not a library you install — it's a template for how to structure your classes. Think of it like a recipe: it tells you the ingredients and steps, but you adapt the specifics to your situation.

We'll cover three essential patterns. First up: Strategy.

The Problem Strategy Solves

You have an algorithm that needs to vary. Different situations call for different approaches, and you want to swap them easily without changing the code that uses them.

The Text Adventure: Combat Strategies

In Crypts of Pythonia, different character classes fight differently. An aggressive fighter does maximum damage but takes more hits. A defensive fighter blocks more but deals less damage. A magic user casts spells with varying effects.

from abc import ABC, abstractmethod

# The Strategy interface
class CombatStrategy(ABC):
    @abstractmethod
    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        """Returns dict with 'damage', 'description', and 'self_damage'."""
        pass

# Concrete strategies
class AggressiveStrategy(CombatStrategy):
    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        # All-out attack: high damage, but leaves attacker exposed
        raw_damage = int(attacker_power * 1.5)
        actual_damage = max(0, raw_damage - defender_armor // 2)
        return {
            "damage": actual_damage,
            "description": "launches a reckless all-out attack",
            "self_damage": 5  # Leaves self exposed
        }

class DefensiveStrategy(CombatStrategy):
    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        # Careful strike: lower damage, but no self-exposure
        raw_damage = attacker_power // 2
        actual_damage = max(0, raw_damage - defender_armor // 3)
        return {
            "damage": actual_damage,
            "description": "strikes cautiously from behind their shield",
            "self_damage": 0
        }

class MagicStrategy(CombatStrategy):
    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        # Magic bypasses armor but costs mana (represented as self_damage)
        actual_damage = attacker_power  # Ignores armor entirely
        return {
            "damage": actual_damage,
            "description": "channels arcane energy into a devastating spell",
            "self_damage": 10  # Mana cost represented as fatigue
        }

# The Context: uses a strategy without knowing which one
class Character:
    def __init__(self, name: str, power: int, armor: int, hp: int):
        self.name = name
        self.power = power
        self.armor = armor
        self.hp = hp
        self.strategy: CombatStrategy = AggressiveStrategy()  # Default

    def set_strategy(self, strategy: CombatStrategy) -> None:
        """Swap combat style at runtime."""
        self.strategy = strategy

    def attack(self, target: "Character") -> str:
        result = self.strategy.execute(self.power, target.armor)
        target.hp -= result["damage"]
        self.hp -= result["self_damage"]
        return (
            f"{self.name} {result['description']}!\n"
            f"  Deals {result['damage']} damage to {target.name}. "
            f"({target.name} HP: {target.hp})"
        )

# Usage
hero = Character("Aldric", power=20, armor=15, hp=100)
dragon = Character("Smolderfang", power=30, armor=25, hp=200)

hero.set_strategy(AggressiveStrategy())
print(hero.attack(dragon))

hero.set_strategy(MagicStrategy())
print(hero.attack(dragon))

hero.set_strategy(DefensiveStrategy())
print(hero.attack(dragon))

Expected output:

Aldric launches a reckless all-out attack!
  Deals 18 damage to Smolderfang. (Smolderfang HP: 182)
Aldric channels arcane energy into a devastating spell!
  Deals 20 damage to Smolderfang. (Smolderfang HP: 162)
Aldric strikes cautiously from behind their shield!
  Deals 2 damage to Smolderfang. (Smolderfang HP: 160)

The key insight: Character.attack() doesn't contain a single if/elif chain. It delegates to whatever strategy object it currently holds. Adding a new combat style — say, StealthStrategy — means writing one new class. The Character class never changes. That's the Open/Closed Principle in action, enabled by the Strategy pattern.
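
As a sketch of that claim, here is what a StealthStrategy could look like. Its damage formula and description are invented for illustration. The interface is repeated, and Character is a trimmed copy that takes its strategy as a constructor argument, so the block runs on its own.

```python
from abc import ABC, abstractmethod


class CombatStrategy(ABC):
    @abstractmethod
    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        pass


class StealthStrategy(CombatStrategy):
    """Hypothetical new style: a backstab that doubles attack power."""

    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        actual_damage = max(0, attacker_power * 2 - defender_armor)
        return {
            "damage": actual_damage,
            "description": "strikes from the shadows",
            "self_damage": 0,
        }


class Character:
    """Trimmed copy of Character; the attack() body is the chapter's."""

    def __init__(self, name: str, power: int, armor: int, hp: int,
                 strategy: CombatStrategy):
        self.name = name
        self.power = power
        self.armor = armor
        self.hp = hp
        self.strategy = strategy

    def attack(self, target: "Character") -> str:
        result = self.strategy.execute(self.power, target.armor)
        target.hp -= result["damage"]
        self.hp -= result["self_damage"]
        return (
            f"{self.name} {result['description']}!\n"
            f"  Deals {result['damage']} damage to {target.name}. "
            f"({target.name} HP: {target.hp})"
        )


hero = Character("Aldric", 20, 15, 100, StealthStrategy())
dragon = Character("Smolderfang", 30, 25, 200, StealthStrategy())
print(hero.attack(dragon))
```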

📊 Pattern Anatomy: Every Strategy pattern has three parts: (1) a Strategy interface (abstract base class defining the method), (2) Concrete strategies (classes implementing the interface), and (3) a Context (the class that uses a strategy). The context holds a reference to a strategy object and delegates work to it.
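
Because Python functions are first-class objects, a strategy that carries no state of its own can also be a plain function. The sketch below is an alternative to the class-based version, not a replacement for it. The function names are assumptions, and the damage formulas are copied from AggressiveStrategy and DefensiveStrategy.

```python
from typing import Callable

# A strategy is any callable taking (attacker_power, defender_armor)
CombatFn = Callable[[int, int], dict]


def aggressive(attacker_power: int, defender_armor: int) -> dict:
    raw_damage = int(attacker_power * 1.5)
    return {"damage": max(0, raw_damage - defender_armor // 2),
            "self_damage": 5}


def defensive(attacker_power: int, defender_armor: int) -> dict:
    raw_damage = attacker_power // 2
    return {"damage": max(0, raw_damage - defender_armor // 3),
            "self_damage": 0}


class Character:
    def __init__(self, name: str, power: int, armor: int, hp: int,
                 strategy: CombatFn = aggressive):
        self.name = name
        self.power = power
        self.armor = armor
        self.hp = hp
        self.strategy = strategy  # The context still just holds a strategy

    def attack(self, target: "Character") -> int:
        result = self.strategy(self.power, target.armor)
        target.hp -= result["damage"]
        self.hp -= result["self_damage"]
        return result["damage"]


hero = Character("Aldric", power=20, armor=15, hp=100)
dragon = Character("Smolderfang", power=30, armor=25, hp=200)

first = hero.attack(dragon)   # Uses the default, aggressive
hero.strategy = defensive     # Swap at runtime, as with set_strategy()
second = hero.attack(dragon)
print(first, second, dragon.hp, hero.hp)
```

The three parts are still there: the Callable type plays the interface, the functions are the concrete strategies, and Character is the context. Classes become the better fit once a strategy needs its own configuration or state.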

16.5 Design Patterns: Observer

The Problem Observer Solves

One object changes state, and several other objects need to react to that change — but you don't want the first object to know about all the specific reactors. In a grade calculator, when a grade changes, the display needs to update, the log needs to record it, and the GPA calculator needs to recalculate. But the GradeBook shouldn't have to know about all of those systems directly.

The Grade Calculator: Reacting to Changes

from abc import ABC, abstractmethod

# Observer interface
class GradeObserver(ABC):
    @abstractmethod
    def on_grade_changed(self, student: str, old_grade: float,
                         new_grade: float) -> None:
        pass

# Concrete observers
class DisplayUpdater(GradeObserver):
    def on_grade_changed(self, student: str, old_grade: float,
                         new_grade: float) -> None:
        print(f"[DISPLAY] {student}'s grade updated: "
              f"{old_grade:.1f} -> {new_grade:.1f}")

class GradeLogger(GradeObserver):
    def __init__(self):
        self.log: list[str] = []

    def on_grade_changed(self, student: str, old_grade: float,
                         new_grade: float) -> None:
        entry = f"{student}: {old_grade:.1f} -> {new_grade:.1f}"
        self.log.append(entry)
        print(f"[LOG] Recorded: {entry}")

class GPACalculator(GradeObserver):
    def __init__(self):
        self.grades: dict[str, float] = {}

    def on_grade_changed(self, student: str, old_grade: float,
                         new_grade: float) -> None:
        self.grades[student] = new_grade
        if self.grades:
            avg = sum(self.grades.values()) / len(self.grades)
            print(f"[GPA] Class average is now: {avg:.2f}")

# Subject (the thing being observed)
class GradeBook:
    def __init__(self):
        self._grades: dict[str, float] = {}
        self._observers: list[GradeObserver] = []

    def add_observer(self, observer: GradeObserver) -> None:
        self._observers.append(observer)

    def remove_observer(self, observer: GradeObserver) -> None:
        self._observers.remove(observer)

    def _notify_observers(self, student: str, old_grade: float,
                          new_grade: float) -> None:
        for observer in self._observers:
            observer.on_grade_changed(student, old_grade, new_grade)

    def set_grade(self, student: str, grade: float) -> None:
        old_grade = self._grades.get(student, 0.0)
        self._grades[student] = grade
        self._notify_observers(student, old_grade, grade)

# Wire it up
gradebook = GradeBook()
gradebook.add_observer(DisplayUpdater())
gradebook.add_observer(GradeLogger())
gradebook.add_observer(GPACalculator())

gradebook.set_grade("Alice", 92.0)
print()
gradebook.set_grade("Bob", 85.0)
print()
gradebook.set_grade("Alice", 95.0)

Expected output:

[DISPLAY] Alice's grade updated: 0.0 -> 92.0
[LOG] Recorded: Alice: 0.0 -> 92.0
[GPA] Class average is now: 92.00

[DISPLAY] Bob's grade updated: 0.0 -> 85.0
[LOG] Recorded: Bob: 0.0 -> 85.0
[GPA] Class average is now: 88.50

[DISPLAY] Alice's grade updated: 92.0 -> 95.0
[LOG] Recorded: Alice: 92.0 -> 95.0
[GPA] Class average is now: 90.00

Notice that GradeBook knows nothing about displays, logs, or GPA calculations. It just maintains a list of observers and calls their on_grade_changed() method when something changes. You can add new observers — an email notifier, a parent notification system, a statistics tracker — without changing GradeBook at all.
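
Here is a sketch of adding one such observer and later detaching it. HonorRollNotifier and its 90-point threshold are hypothetical. The interface and a trimmed GradeBook are repeated so the block runs on its own.

```python
from abc import ABC, abstractmethod


class GradeObserver(ABC):
    @abstractmethod
    def on_grade_changed(self, student: str, old_grade: float,
                         new_grade: float) -> None:
        pass


class GradeBook:
    """Trimmed copy of the subject class from the example above."""

    def __init__(self):
        self._grades: dict[str, float] = {}
        self._observers: list[GradeObserver] = []

    def add_observer(self, observer: GradeObserver) -> None:
        self._observers.append(observer)

    def remove_observer(self, observer: GradeObserver) -> None:
        self._observers.remove(observer)

    def set_grade(self, student: str, grade: float) -> None:
        old_grade = self._grades.get(student, 0.0)
        self._grades[student] = grade
        for observer in self._observers:
            observer.on_grade_changed(student, old_grade, grade)


class HonorRollNotifier(GradeObserver):
    """Hypothetical observer: records students who reach the threshold."""

    def __init__(self, threshold: float = 90.0):
        self.threshold = threshold
        self.messages: list[str] = []

    def on_grade_changed(self, student: str, old_grade: float,
                         new_grade: float) -> None:
        if new_grade >= self.threshold:
            self.messages.append(f"{student} made the honor roll")


gradebook = GradeBook()
notifier = HonorRollNotifier()
gradebook.add_observer(notifier)

gradebook.set_grade("Alice", 92.0)  # Notified, meets the threshold
gradebook.set_grade("Bob", 85.0)    # Notified, but below the threshold

gradebook.remove_observer(notifier)
gradebook.set_grade("Carol", 97.0)  # Not notified: observer was removed
print(notifier.messages)
```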

💡 Intuition Builder: Think of a YouTube subscription. When a creator uploads a video, every subscriber gets notified. The creator doesn't personally email each subscriber — YouTube's notification system handles it. The creator is the "subject," subscribers are "observers," and YouTube is the pattern infrastructure.

🔗 Bridge from Chapter 15: Observer uses polymorphism at its core. GradeBook iterates over a list of GradeObserver objects and calls .on_grade_changed(). It doesn't know (or care) whether each observer is a DisplayUpdater or a GPACalculator. This is exactly the polymorphic dispatch we learned in Chapter 15.

16.6 Design Patterns: Factory

The Problem Factory Solves

You need to create objects, but the exact class to instantiate depends on some input or configuration. You don't want the calling code littered with if/elif chains that decide which class to create.

Elena's Report Formats

Elena's nonprofit needs reports in CSV, HTML, and plain-text formats. Instead of the calling code deciding which formatter class to instantiate, we use a factory:

from abc import ABC, abstractmethod

class ReportFormatter(ABC):
    @abstractmethod
    def format(self, data: list[dict]) -> str:
        pass

    @abstractmethod
    def file_extension(self) -> str:
        pass

class CSVFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return ""
        headers = ",".join(data[0].keys())
        rows = [",".join(str(v) for v in row.values()) for row in data]
        return headers + "\n" + "\n".join(rows)

    def file_extension(self) -> str:
        return ".csv"

class HTMLFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return "<table></table>"
        headers = "".join(f"<th>{h}</th>" for h in data[0].keys())
        rows = "".join(
            "<tr>" + "".join(f"<td>{v}</td>" for v in row.values()) + "</tr>"
            for row in data
        )
        return f"<table><thead><tr>{headers}</tr></thead><tbody>{rows}</tbody></table>"

    def file_extension(self) -> str:
        return ".html"

class PlainTextFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return "(no data)"
        lines = []
        for row in data:
            lines.append(" | ".join(f"{k}: {v}" for k, v in row.items()))
        return "\n".join(lines)

    def file_extension(self) -> str:
        return ".txt"

# The Factory
class ReportFormatterFactory:
    """Creates the right formatter based on format name."""
    _formatters: dict[str, type[ReportFormatter]] = {
        "csv": CSVFormatter,
        "html": HTMLFormatter,
        "text": PlainTextFormatter,
    }

    @classmethod
    def create(cls, format_name: str) -> ReportFormatter:
        formatter_class = cls._formatters.get(format_name.lower())
        if formatter_class is None:
            available = ", ".join(cls._formatters.keys())
            raise ValueError(
                f"Unknown format '{format_name}'. "
                f"Available: {available}"
            )
        return formatter_class()

    @classmethod
    def register(cls, name: str, formatter_class: type[ReportFormatter]) -> None:
        """Register a new format without modifying factory code."""
        cls._formatters[name.lower()] = formatter_class

# Usage
sample_data = [
    {"name": "Meals Served", "count": 1247, "change": "+12%"},
    {"name": "Clients Housed", "count": 89, "change": "+3%"},
]

for fmt in ["csv", "html", "text"]:
    formatter = ReportFormatterFactory.create(fmt)
    print(f"--- {fmt.upper()} ({formatter.file_extension()}) ---")
    print(formatter.format(sample_data))
    print()

Expected output:

--- CSV (.csv) ---
name,count,change
Meals Served,1247,+12%
Clients Housed,89,+3%

--- HTML (.html) ---
<table><thead><tr><th>name</th><th>count</th><th>change</th></tr></thead><tbody><tr><td>Meals Served</td><td>1247</td><td>+12%</td></tr><tr><td>Clients Housed</td><td>89</td><td>+3%</td></tr></tbody></table>

--- TEXT (.txt) ---
name: Meals Served | count: 1247 | change: +12%
name: Clients Housed | count: 89 | change: +3%

The register() method is the real power move. Third-party code can add new formats without touching the factory's source:

class MarkdownFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return ""
        headers = " | ".join(data[0].keys())
        separator = " | ".join("---" for _ in data[0].keys())
        rows = "\n".join(
            " | ".join(str(v) for v in row.values()) for row in data
        )
        return f"{headers}\n{separator}\n{rows}"

    def file_extension(self) -> str:
        return ".md"

ReportFormatterFactory.register("markdown", MarkdownFormatter)
formatter = ReportFormatterFactory.create("markdown")
print(formatter.format(sample_data))
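
It is also worth seeing what the factory does before a format is registered. The sketch below repeats a trimmed factory (the file_extension() method is omitted for brevity) and uses a hypothetical TSVFormatter to show the error path and the case-insensitive lookup.

```python
from abc import ABC, abstractmethod


class ReportFormatter(ABC):
    @abstractmethod
    def format(self, data: list[dict]) -> str:
        pass


class CSVFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return ""
        headers = ",".join(data[0].keys())
        rows = [",".join(str(v) for v in row.values()) for row in data]
        return headers + "\n" + "\n".join(rows)


class TSVFormatter(ReportFormatter):
    """Hypothetical extra format: tab-separated values."""

    def format(self, data: list[dict]) -> str:
        if not data:
            return ""
        headers = "\t".join(data[0].keys())
        rows = ["\t".join(str(v) for v in row.values()) for row in data]
        return headers + "\n" + "\n".join(rows)


class ReportFormatterFactory:
    _formatters: dict[str, type[ReportFormatter]] = {"csv": CSVFormatter}

    @classmethod
    def create(cls, format_name: str) -> ReportFormatter:
        formatter_class = cls._formatters.get(format_name.lower())
        if formatter_class is None:
            available = ", ".join(cls._formatters.keys())
            raise ValueError(
                f"Unknown format '{format_name}'. Available: {available}"
            )
        return formatter_class()

    @classmethod
    def register(cls, name: str,
                 formatter_class: type[ReportFormatter]) -> None:
        cls._formatters[name.lower()] = formatter_class


# Before registration: the factory fails loudly and lists what it knows
try:
    ReportFormatterFactory.create("TSV")
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# After registration: the same call succeeds, regardless of letter case
ReportFormatterFactory.register("tsv", TSVFormatter)
formatter = ReportFormatterFactory.create("TSV")
tsv_output = formatter.format([{"name": "Meals Served", "count": 1247}])
print(tsv_output)
```

One thing to keep in mind: _formatters is a class attribute, so a registration is visible to every caller of the factory for the rest of the program.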

When to Use Each Pattern

Pattern	Use When...	Key Benefit	Example
Strategy	You need to swap algorithms at runtime	Eliminates conditional logic for algorithm selection	Combat styles, sorting algorithms, pricing rules
Observer	Multiple objects need to react when something changes	Subject doesn't know (or care) about its observers	Grade changes, event systems, UI updates
Factory	Object creation logic is complex or varies by type	Centralizes creation logic, easy to extend	Report formats, database connectors, game entities

🔄 Check Your Understanding: You're building a notification system where users can choose to be notified by email, SMS, or push notification. Which pattern would you use? What if the user can switch their notification preference at runtime? (Hint: one pattern handles creation, another handles runtime swapping.)

16.7 Dataclasses: Simple Data Objects

Sometimes you need a class that's basically a container for data — no complex behavior, just a way to group related values together. Python's @dataclass decorator (from the dataclasses module) eliminates the boilerplate.

The Problem with Boilerplate

Here's a regular class to hold task data:

# Without dataclass — lots of boilerplate
class Task:
    def __init__(self, title: str, priority: int, due_date: str,
                 completed: bool = False):
        self.title = title
        self.priority = priority
        self.due_date = due_date
        self.completed = completed

    def __repr__(self):
        return (f"Task(title={self.title!r}, priority={self.priority}, "
                f"due_date={self.due_date!r}, completed={self.completed})")

    def __eq__(self, other):
        if not isinstance(other, Task):
            return NotImplemented
        return (self.title == other.title and self.priority == other.priority
                and self.due_date == other.due_date
                and self.completed == other.completed)

That's 16 lines just to hold four fields and provide reasonable __repr__ and __eq__. And you have to write every attribute name in four places: the parameter list, the assignments, __repr__, and __eq__. If you add a field, you have to update all four, and forgetting one (say, __eq__) produces a bug that no error message will point out.

The Dataclass Solution

from dataclasses import dataclass

@dataclass
class Task:
    title: str
    priority: int
    due_date: str
    completed: bool = False

Four field declarations under one decorator. That's it. The @dataclass decorator auto-generates __init__, __repr__, and __eq__ for you. Let's see it in action:

from dataclasses import dataclass

@dataclass
class Task:
    title: str
    priority: int
    due_date: str
    completed: bool = False

# __init__ is generated automatically
task1 = Task("Write chapter 16", priority=1, due_date="2025-03-14")
task2 = Task("Review chapter 15", priority=2, due_date="2025-03-13", completed=True)
task3 = Task("Write chapter 16", priority=1, due_date="2025-03-14")

# __repr__ is generated automatically
print(task1)
print(task2)

# __eq__ compares all fields
print(f"task1 == task3: {task1 == task3}")  # Same values
print(f"task1 == task2: {task1 == task2}")  # Different values

Expected output:

Task(title='Write chapter 16', priority=1, due_date='2025-03-14', completed=False)
Task(title='Review chapter 15', priority=2, due_date='2025-03-13', completed=True)
task1 == task3: True
task1 == task2: False

Dataclass Features

You can customize behavior with parameters and field():

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    name: str
    location: str
    capacity: int
    attendees: list[str] = field(default_factory=list)  # Mutable default
    created_at: str = field(
        default_factory=lambda: datetime.now().isoformat(),
        repr=False  # Don't show in __repr__
    )

@dataclass(frozen=True)  # Immutable — can't change fields after creation
class Coordinate:
    x: float
    y: float

@dataclass(order=True)  # Generates <, <=, >, >= based on fields
class Student:
    gpa: float
    name: str = field(compare=False)  # Exclude name from ordering and from ==

event = Event("Python Meetup", "Room 101", 50)
event.attendees.append("Alice")
event.attendees.append("Bob")
print(event)

coord = Coordinate(3.0, 4.0)
print(coord)
# coord.x = 5.0  # Would raise FrozenInstanceError

students = [
    Student(3.8, "Alice"),
    Student(3.5, "Bob"),
    Student(3.9, "Carol"),
]
print(sorted(students))

Expected output:

Event(name='Python Meetup', location='Room 101', capacity=50, attendees=['Alice', 'Bob'])
Coordinate(x=3.0, y=4.0)
[Student(gpa=3.5, name='Bob'), Student(gpa=3.8, name='Alice'), Student(gpa=3.9, name='Carol')]
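
To make the frozen=True behavior concrete, here is a minimal sketch. The variable names (origin, moved, visited, assignment_blocked) are illustrative and not part of the chapter's code. It shows three things: assignment is rejected, dataclasses.replace() builds a modified copy, and a frozen dataclass with the default eq=True is hashable, so instances can go into sets or serve as dictionary keys.

```python
from dataclasses import dataclass, FrozenInstanceError, replace

@dataclass(frozen=True)
class Coordinate:
    x: float
    y: float

origin = Coordinate(0.0, 0.0)

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    origin.x = 5.0
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# replace() returns a new instance with the given fields changed.
moved = replace(origin, x=5.0)

# Equal frozen instances hash alike, so the set keeps only one of them.
visited = {origin, Coordinate(0.0, 0.0), moved}

print(assignment_blocked)  # True
print(moved)               # Coordinate(x=5.0, y=0.0)
print(len(visited))        # 2
```

If you need a changed value, build a new object with replace() rather than trying to work around the freeze.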

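⚠️ Pitfall — Mutable Defaults: Never write attendees: list[str] = [] in a dataclass. With function defaults (Chapter 6), a mutable default is silently shared between calls. @dataclass is stricter: it raises ValueError when the class is defined if a default is a list, dict, or set, precisely because every instance would otherwise share the same object. Always use field(default_factory=list) for mutable defaults.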
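
You can check this for yourself. In the sketch below, BadEvent and GoodEvent are hypothetical class names used only for the demonstration. Defining BadEvent with a list default fails at class-definition time, while GoodEvent gives each instance its own list.

```python
from dataclasses import dataclass, field

# Defining a dataclass with a list default fails immediately.
try:
    @dataclass
    class BadEvent:
        name: str
        attendees: list = []
    definition_error = None
except ValueError as exc:
    definition_error = str(exc)

@dataclass
class GoodEvent:
    name: str
    attendees: list = field(default_factory=list)  # Fresh list per instance

first = GoodEvent("Python Meetup")
second = GoodEvent("Study Group")
first.attendees.append("Alice")

print(definition_error is not None)  # True
print(first.attendees)               # ['Alice']
print(second.attendees)              # []
```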

When to Use Dataclasses vs. Regular Classes

Use Dataclasses When...	Use Regular Classes When...
The class is primarily data with minimal behavior	The class has significant behavior and business logic
You want auto-generated `__init__`, `__repr__`, `__eq__`	You need custom initialization logic
You want to compare instances by value	You want identity-based comparison (the default behavior of `==`)
The class is a record, config, or DTO	The class manages complex state with invariants
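
The value-versus-identity row is easy to verify. This minimal sketch uses two hypothetical classes, PlainPoint and DataPoint, that hold the same fields. The regular class inherits the default ==, which is true only when both sides are the same object, while the dataclass compares field values.

```python
from dataclasses import dataclass

class PlainPoint:
    """Regular class with no __eq__: == falls back to identity."""
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

@dataclass
class DataPoint:
    """Dataclass: == compares field values."""
    x: int
    y: int

plain_a, plain_b = PlainPoint(1, 2), PlainPoint(1, 2)
data_a, data_b = DataPoint(1, 2), DataPoint(1, 2)

print(plain_a == plain_b)  # False: two distinct objects
print(plain_a == plain_a)  # True: same object
print(data_a == data_b)    # True: same field values
print(data_a is data_b)    # False: still two distinct objects
```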

16.8 Code Smells and Refactoring

A code smell isn't a bug — your code runs fine. It's a surface indicator that something deeper might be wrong with the design. The term comes from Kent Beck and Martin Fowler: if code "smells bad," it's worth investigating.

Refactoring is the discipline of improving code's design without changing its behavior. You're not adding features or fixing bugs — you're making the code cleaner, more readable, and easier to change.

Common Code Smells

Smell	What It Looks Like	What It Suggests
God Class	One class with 500+ lines, 20+ methods	Violates SRP — break it up
Long Method	A method that's 50+ lines	Decompose into smaller methods
Feature Envy	A method that uses another class's data more than its own	Move the method to the class whose data it uses
Shotgun Surgery	One change requires modifying 5+ classes	Classes are too tightly coupled
Primitive Obsession	Using strings/ints where a custom class would be clearer	Create a class (or dataclass) to represent the concept
Duplicated Code	Same logic in multiple places	Extract into a shared method or class
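
Feature Envy is probably the hardest of these to picture from a one-line description, so here is a minimal before-and-after sketch. Order, InvoicePrinter, and their fields are hypothetical names invented for illustration. The "envious" method does all its work with another object's fields; the fix moves that calculation to the class that owns the data.

```python
from dataclasses import dataclass

@dataclass
class Order:
    unit_price: float
    quantity: int
    discount: float = 0.0  # Fraction between 0 and 1

    # After: the calculation lives with the data it uses.
    def total(self) -> float:
        return self.unit_price * self.quantity * (1 - self.discount)

class InvoicePrinter:
    # Before (the smell): this method works entirely with Order's fields.
    def line_with_envy(self, order: Order) -> str:
        total = order.unit_price * order.quantity * (1 - order.discount)
        return f"Total: {total:.2f}"

    # After: the printer asks the order for its total and only formats it.
    def line(self, order: Order) -> str:
        return f"Total: {order.total():.2f}"

order = Order(unit_price=10.0, quantity=3, discount=0.5)
printer = InvoicePrinter()
print(printer.line_with_envy(order))  # Total: 15.00
print(printer.line(order))            # Total: 15.00
```

Both methods print the same line, which is the point of a refactoring: the behavior stays put while the responsibility moves.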

🧩 Productive Struggle: Spot the Smells

Study the following code carefully. Before reading the analysis, identify as many design problems as you can. Write them down. Then compare with the analysis below.

class StudentManager:
    def __init__(self):
        self.students = []
        self.db_connection = None
        self.email_host = "smtp.school.edu"
        self.email_port = 587

    def add_student(self, name, age, email, gpa,
                    major, phone, address, emergency_contact):
        student = {
            "name": name, "age": age, "email": email,
            "gpa": gpa, "major": major, "phone": phone,
            "address": address, "emergency_contact": emergency_contact,
        }
        self.students.append(student)
        # Duplicate: also log to database
        if self.db_connection:
            self.db_connection.execute(
                "INSERT INTO students VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                (name, age, email, gpa, major, phone, address,
                 emergency_contact)
            )
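        # Also send welcome email
        import smtplib
        server = smtplib.SMTP(self.email_host, self.email_port)
        # sendmail() takes sender, recipients, and message text;
        # the sender address here is a placeholder
        server.sendmail(f"noreply@{self.email_host}", [email],
                        f"Subject: Welcome\n\nWelcome {name}!")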

    def get_honor_students(self):
        result = []
        for s in self.students:
            if s["gpa"] >= 3.5:
                result.append(s["name"] + " (" + s["major"] + ") - " +
                              s["email"] + " - GPA: " + str(s["gpa"]))
        return result

    def get_probation_students(self):
        result = []
        for s in self.students:
            if s["gpa"] < 2.0:
                result.append(s["name"] + " (" + s["major"] + ") - " +
                              s["email"] + " - GPA: " + str(s["gpa"]))
        return result

    def export_to_csv(self):
        lines = ["name,age,email,gpa,major,phone,address,emergency_contact"]
        for s in self.students:
            lines.append(f"{s['name']},{s['age']},{s['email']},{s['gpa']},"
                         f"{s['major']},{s['phone']},{s['address']},"
                         f"{s['emergency_contact']}")
        return "\n".join(lines)

Analysis — here's what smells:

God Class: StudentManager handles student data, database operations, email, filtering, AND CSV export. That's at least five responsibilities.
Primitive Obsession: Students are dictionaries with 8 string/number fields. A Student dataclass would be much clearer.
Long Parameter List: add_student() takes 8 parameters. That's a sign the parameters should be grouped into an object.
Duplicated Code: get_honor_students() and get_probation_students() have nearly identical formatting logic.
Feature Envy: The CSV export method reaches into each student dictionary's internals — it should be a method on a Student class or a dedicated exporter.
Tight Coupling: The class directly creates SMTP connections and database queries. You can't test add_student() without an email server and a database.

Refactored Version

from dataclasses import dataclass

@dataclass
class Student:
    name: str
    age: int
    email: str
    gpa: float
    major: str
    phone: str = ""
    address: str = ""
    emergency_contact: str = ""

    def summary(self) -> str:
        return f"{self.name} ({self.major}) - {self.email} - GPA: {self.gpa}"

class StudentRepository:
    """Manages student collection — one responsibility."""
    def __init__(self):
        self._students: list[Student] = []

    def add(self, student: Student) -> None:
        self._students.append(student)

    def find_by_gpa(self, min_gpa: float = 0.0,
                    max_gpa: float = 4.0) -> list[Student]:
        return [s for s in self._students
                if min_gpa <= s.gpa <= max_gpa]

    @property
    def all(self) -> list[Student]:
        return list(self._students)

class StudentCSVExporter:
    """Exports students to CSV — one responsibility."""
    def export(self, students: list[Student]) -> str:
        headers = "name,age,email,gpa,major,phone,address,emergency_contact"
        rows = [
            f"{s.name},{s.age},{s.email},{s.gpa},"
            f"{s.major},{s.phone},{s.address},{s.emergency_contact}"
            for s in students
        ]
        return "\n".join([headers, *rows])

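The refactored version spreads the same work across three small classes, and each piece is simpler, testable, and independently changeable. (Database persistence and the welcome email are left out here; each would move into its own class in the same way.) That's the trade-off: a little more structure now saves a lot of pain later.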
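
Isolating the export logic also makes it easy to improve later. Rows built by hand break as soon as a field contains a comma, which addresses often do. The sketch below is one possible variation, not the chapter's exporter: export_students is a hypothetical function that uses the standard csv module to handle quoting. Note that csv.DictWriter ends every row with a line terminator, so unlike a plain join the result ends with a newline.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

@dataclass
class Student:
    name: str
    age: int
    email: str
    gpa: float
    major: str
    phone: str = ""
    address: str = ""
    emergency_contact: str = ""

def export_students(students: list[Student]) -> str:
    """Return CSV text with a header row; the csv module handles quoting."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(Student)],
        lineterminator="\n",
    )
    writer.writeheader()
    for student in students:
        writer.writerow(asdict(student))
    return buffer.getvalue()

alice = Student("Alice", 20, "alice@example.com", 3.9, "Physics",
                address="12 Elm St, Apt 4")
csv_text = export_students([alice])
print(csv_text)

# Reading the text back recovers the address as a single field.
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows[1][6])  # 12 Elm St, Apt 4
```

Because the exporter is its own unit, a change like this touches nothing else in the design.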

🔗 Spaced Review — Chapter 13 (Testing): Notice how the refactored version is much easier to test. You can test StudentRepository without an email server. You can test StudentCSVExporter by creating Student objects directly — no database required. Good design and testability go hand in hand. If your code is hard to test, that's often a code smell pointing at a design problem.
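
The same idea handles the welcome email from the original class. Here is a minimal sketch, assuming the email responsibility is moved behind a small interface. Mailer, FakeMailer, and EnrollmentService are hypothetical names, not part of the chapter's code. The service receives its mailer from outside, so a test can hand it a fake that just records calls.

```python
from typing import Protocol

class Mailer(Protocol):
    """Anything with a send_welcome() method can act as a mailer."""
    def send_welcome(self, name: str, email: str) -> None: ...

class FakeMailer:
    """Test double: records messages instead of contacting a server."""
    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send_welcome(self, name: str, email: str) -> None:
        self.sent.append((name, email))

class EnrollmentService:
    """Receives its mailer from outside instead of creating an SMTP connection."""
    def __init__(self, mailer: Mailer) -> None:
        self._mailer = mailer
        self.enrolled: list[str] = []

    def enroll(self, name: str, email: str) -> None:
        self.enrolled.append(name)
        self._mailer.send_welcome(name, email)

# A test needs no network: pass in the fake and inspect what it recorded.
mailer = FakeMailer()
service = EnrollmentService(mailer)
service.enroll("Alice", "alice@example.com")

print(service.enrolled)  # ['Alice']
print(mailer.sent)       # [('Alice', 'alice@example.com')]
```

A production mailer that really talks to an SMTP server would implement the same send_welcome() method, and EnrollmentService would not need to change.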

16.9 When NOT to Use OOP

This might be the most important section in this chapter. OOP is powerful, but it's not always the right tool. One of the marks of a mature programmer is knowing when not to use something.

Simple Scripts

If your program is under 100 lines and does one thing, a few functions are probably better than a class hierarchy:

# DON'T: Over-engineering a simple script
class FileWordCounter:
    def __init__(self, filename):
        self.filename = filename
        self.word_count = 0

    def count(self):
        with open(self.filename) as f:
            self.word_count = len(f.read().split())
        return self.word_count

    def display(self):
        print(f"{self.filename}: {self.word_count} words")

counter = FileWordCounter("essay.txt")
counter.count()
counter.display()

# DO: Just use a function
def count_words(filename):
    with open(filename) as f:
        count = len(f.read().split())
    print(f"{filename}: {count} words")
    return count

count_words("essay.txt")

The class version is roughly twice as long, harder to read, and provides no benefit here. The function version is clear, concise, and does the same thing.

Data Pipelines

When you're transforming data through a series of steps, functions with clear inputs and outputs are often cleaner than objects with mutable state:

# Functional pipeline — clean and readable
def load_data(filename: str) -> list[dict]:
    ...

def clean_data(records: list[dict]) -> list[dict]:
    return [r for r in records if r.get("valid")]

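def calculate_stats(records: list[dict]) -> dict:
    values = [r["value"] for r in records]
    if not values:  # Avoid dividing by zero when no records survive cleaning
        return {"mean": 0.0, "count": 0}
    return {"mean": sum(values) / len(values), "count": len(values)}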

def format_report(stats: dict) -> str:
    return f"Count: {stats['count']}, Mean: {stats['mean']:.2f}"

# Clear flow: data goes in one end, results come out the other
# result = format_report(calculate_stats(clean_data(load_data("data.csv"))))

The Decision Checklist

Use OOP when:

- You have multiple objects of the same type that need to maintain individual state
- You need polymorphism: different objects responding to the same method differently
- The domain naturally has entities with behavior (players, accounts, vehicles)
- You're building a library or framework that others will extend

Use functions when:

- The task is a one-off script or data transformation
- There's no state to maintain between calls
- The logic is a pipeline (input -> transform -> output)
- Adding a class would just be wrapping a single function

✅ Best Practice: Start with functions. Refactor into classes when you notice that you're passing the same group of data between multiple functions, or when you need multiple instances that maintain their own state. Don't design a class hierarchy on day one — let the design emerge from the code you're actually writing.
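
Here is what that signal can look like in practice. This is a minimal sketch with hypothetical names (deposit, withdraw, Account): in the "before" half, every function takes and returns the same balance-and-history pair; in the "after" half, that pair becomes the state of a class.

```python
# Before: balance and history travel together through every function.
def deposit(balance: float, history: list[str], amount: float):
    return balance + amount, history + [f"deposit {amount:.2f}"]

def withdraw(balance: float, history: list[str], amount: float):
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount, history + [f"withdraw {amount:.2f}"]

balance, history = deposit(0.0, [], 50.0)
balance, history = withdraw(balance, history, 20.0)

# After: the data that always moved as a group becomes the object's state.
class Account:
    def __init__(self) -> None:
        self.balance = 0.0
        self.history: list[str] = []

    def deposit(self, amount: float) -> None:
        self.balance += amount
        self.history.append(f"deposit {amount:.2f}")

    def withdraw(self, amount: float) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        self.history.append(f"withdraw {amount:.2f}")

account = Account()
account.deposit(50.0)
account.withdraw(20.0)

print(balance, history)
# 30.0 ['deposit 50.00', 'withdraw 20.00']
print(account.balance, account.history)
# 30.0 ['deposit 50.00', 'withdraw 20.00']
```

With a single account the function version is arguably fine. The class starts to pay off once you have several accounts, each with its own balance and history.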

16.10 Project Checkpoint: TaskFlow v1.5

Time to apply what we've learned. In this checkpoint, we'll upgrade TaskFlow with three improvements:

Observer pattern for notifications (alert when a task is overdue)
Dataclasses for simple data objects
Loose coupling between storage and display

Here's a condensed but complete implementation. The full version is in code/project-checkpoint.py.

from dataclasses import dataclass, field
from datetime import datetime
from abc import ABC, abstractmethod

# --- Data objects using dataclasses ---

@dataclass
class Task:
    title: str
    priority: int = 3
    due_date: str = ""
    completed: bool = False
    created_at: str = field(
        default_factory=lambda: datetime.now().strftime("%Y-%m-%d %H:%M")
    )

    def is_overdue(self) -> bool:
        if not self.due_date or self.completed:
            return False
        try:
            due = datetime.strptime(self.due_date, "%Y-%m-%d").date()
            # Compare calendar dates so a task is not overdue on its due day
            return datetime.now().date() > due
        except ValueError:
            return False

    def summary(self) -> str:
        status = "X" if self.completed else " "
        overdue = " [OVERDUE]" if self.is_overdue() else ""
        return f"[{status}] {self.title} (P{self.priority}){overdue}"

# --- Observer pattern for notifications ---

class TaskObserver(ABC):
    @abstractmethod
    def on_task_added(self, task: Task) -> None:
        pass

    @abstractmethod
    def on_task_completed(self, task: Task) -> None:
        pass

    @abstractmethod
    def on_task_overdue(self, task: Task) -> None:
        pass

class ConsoleNotifier(TaskObserver):
    def on_task_added(self, task: Task) -> None:
        print(f"  [NOTIFY] New task: {task.title}")

    def on_task_completed(self, task: Task) -> None:
        print(f"  [NOTIFY] Completed: {task.title}")

    def on_task_overdue(self, task: Task) -> None:
        print(f"  [ALERT] OVERDUE: {task.title} (was due {task.due_date})")

class TaskLog(TaskObserver):
    def __init__(self):
        self.entries: list[str] = []

    def on_task_added(self, task: Task) -> None:
        self.entries.append(f"ADDED: {task.title}")

    def on_task_completed(self, task: Task) -> None:
        self.entries.append(f"COMPLETED: {task.title}")

    def on_task_overdue(self, task: Task) -> None:
        self.entries.append(f"OVERDUE: {task.title}")

# --- Loosely coupled TaskManager ---

class TaskManager:
    """Manages tasks with observer notifications."""
    def __init__(self):
        self._tasks: list[Task] = []
        self._observers: list[TaskObserver] = []

    def add_observer(self, observer: TaskObserver) -> None:
        self._observers.append(observer)

    def _notify(self, method_name: str, task: Task) -> None:
        for observer in self._observers:
            getattr(observer, method_name)(task)

    def add_task(self, task: Task) -> None:
        self._tasks.append(task)
        self._notify("on_task_added", task)

    def complete_task(self, index: int) -> None:
        if 0 <= index < len(self._tasks):
            self._tasks[index].completed = True
            self._notify("on_task_completed", self._tasks[index])

    def check_overdue(self) -> None:
        for task in self._tasks:
            if task.is_overdue():
                self._notify("on_task_overdue", task)

    def list_tasks(self) -> list[str]:
        return [f"  {i+1}. {t.summary()}" for i, t in enumerate(self._tasks)]

# --- Demo ---

def main():
    manager = TaskManager()
    notifier = ConsoleNotifier()
    log = TaskLog()

    manager.add_observer(notifier)
    manager.add_observer(log)

    print("=== Adding tasks ===")
    manager.add_task(Task("Write chapter 16", priority=1, due_date="2025-03-14"))
    manager.add_task(Task("Review pull request", priority=2, due_date="2024-01-01"))
    manager.add_task(Task("Buy groceries", priority=3))

    print("\n=== Current tasks ===")
    for line in manager.list_tasks():
        print(line)

    print("\n=== Checking for overdue tasks ===")
    manager.check_overdue()

    print("\n=== Completing a task ===")
    manager.complete_task(0)

    print("\n=== Log entries ===")
    for entry in log.entries:
        print(f"  {entry}")

if __name__ == "__main__":
    main()

Expected output (when run before 2025-03-14; on a later date the first task is also reported as overdue):

=== Adding tasks ===
  [NOTIFY] New task: Write chapter 16
  [NOTIFY] New task: Review pull request
  [NOTIFY] New task: Buy groceries

=== Current tasks ===
  1. [ ] Write chapter 16 (P1)
  2. [ ] Review pull request (P2) [OVERDUE]
  3. [ ] Buy groceries (P3)

=== Checking for overdue tasks ===
  [ALERT] OVERDUE: Review pull request (was due 2024-01-01)

=== Completing a task ===
  [NOTIFY] Completed: Write chapter 16

=== Log entries ===
  ADDED: Write chapter 16
  ADDED: Review pull request
  ADDED: Buy groceries
  OVERDUE: Review pull request
  COMPLETED: Write chapter 16
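
Because is_overdue() reads the system clock, this output depends on the day you run the program. One way to make the check predictable in tests is to let the caller supply the date. The sketch below is a trimmed, hypothetical variant of Task, not the checkpoint's version: the today parameter is an addition for illustration, and it compares calendar dates, which assumes a task should not count as overdue on its due day.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

@dataclass
class Task:
    title: str
    due_date: str = ""
    completed: bool = False

    def is_overdue(self, today: Optional[date] = None) -> bool:
        """Return True if the due date is strictly before `today`.

        `today` is a hypothetical extra parameter; it defaults to the
        real current date, so existing calls keep working.
        """
        if not self.due_date or self.completed:
            return False
        try:
            due = datetime.strptime(self.due_date, "%Y-%m-%d").date()
        except ValueError:
            return False
        return (today or date.today()) > due

task = Task("Write chapter 16", due_date="2025-03-14")
print(task.is_overdue(today=date(2025, 3, 1)))   # False
print(task.is_overdue(today=date(2025, 3, 14)))  # False
print(task.is_overdue(today=date(2025, 3, 15)))  # True
```

Production code calls is_overdue() with no argument; tests pass a fixed date and get the same answer every time.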

Notice the design improvements over previous versions:

Task is a dataclass — clean, minimal boilerplate, auto-generated __repr__ and __eq__
Observer pattern — TaskManager doesn't know about console output or logging details; it just notifies observers
Loose coupling — you can swap ConsoleNotifier for an EmailNotifier without touching TaskManager
High cohesion — TaskManager manages tasks, ConsoleNotifier handles notifications, TaskLog handles logging
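
To see the loose-coupling claim in action, here is a minimal sketch of adding a new observer. It uses trimmed copies of Task, TaskObserver, and TaskManager (overdue handling omitted) so that it runs on its own. EmailNotifier is hypothetical and only queues message strings in an outbox list; it does not send real email.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Task:
    title: str
    completed: bool = False

class TaskObserver(ABC):
    @abstractmethod
    def on_task_added(self, task: Task) -> None: ...

    @abstractmethod
    def on_task_completed(self, task: Task) -> None: ...

class TaskManager:
    """Trimmed copy of the checkpoint's manager (overdue handling omitted)."""
    def __init__(self) -> None:
        self._tasks: list[Task] = []
        self._observers: list[TaskObserver] = []

    def add_observer(self, observer: TaskObserver) -> None:
        self._observers.append(observer)

    def add_task(self, task: Task) -> None:
        self._tasks.append(task)
        for observer in self._observers:
            observer.on_task_added(task)

    def complete_task(self, index: int) -> None:
        if 0 <= index < len(self._tasks):
            self._tasks[index].completed = True
            for observer in self._observers:
                observer.on_task_completed(self._tasks[index])

class EmailNotifier(TaskObserver):
    """Hypothetical observer: queues messages rather than sending real email."""
    def __init__(self, recipient: str) -> None:
        self.recipient = recipient
        self.outbox: list[str] = []

    def on_task_added(self, task: Task) -> None:
        self.outbox.append(f"To {self.recipient}: new task '{task.title}'")

    def on_task_completed(self, task: Task) -> None:
        self.outbox.append(f"To {self.recipient}: completed '{task.title}'")

manager = TaskManager()
email = EmailNotifier("alice@example.com")
manager.add_observer(email)  # Registering the observer is the only wiring needed
manager.add_task(Task("Write chapter 16"))
manager.complete_task(0)

for message in email.outbox:
    print(message)
```

TaskManager is unchanged by the new class. All the new behavior lives in EmailNotifier and in the one line that registers it.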

🔗 Spaced Review — Chapter 6 (Functions): Look at how main() reads as a clean sequence of high-level steps. Each step is a method call with a clear name. This is the same principle we learned for functions in Chapter 6 — give things clear names, keep each unit focused, and let the top-level code tell a story.

Chapter Summary

This chapter bridges the gap between knowing OOP mechanics and using OOP effectively. Here's what you should take away:

Principles guide your thinking. SRP says each class should have one reason to change. OCP says you should extend behavior by adding new classes, not modifying existing ones. Coupling and cohesion give you a vocabulary for evaluating designs — aim for loose coupling and high cohesion.

Patterns solve recurring problems. Strategy lets you swap algorithms at runtime. Observer lets multiple objects react to state changes without tight coupling. Factory centralizes object creation decisions. These aren't the only patterns, but they're the ones you'll use most often.

Dataclasses eliminate busywork. When a class is primarily data with minimal behavior, @dataclass gives you __init__, __repr__, and __eq__ for free. Use field(default_factory=...) for mutable defaults and frozen=True for immutable data.

Code smells are your early warning system. God classes, duplicated code, long parameter lists, and feature envy are signals that your design needs attention. Refactoring — improving design without changing behavior — is a skill you'll use throughout your career.

OOP isn't always the answer. Simple scripts, data pipelines, and one-off transformations are often better served by plain functions. Start simple; refactor into classes when the complexity warrants it.

Design is a practice, not a destination. You won't get it right the first time, and that's fine. The principles and patterns in this chapter give you a vocabulary for discussing design, a toolkit for solving common problems, and — most importantly — the ability to look at code and say, "This could be better," and know how to make it so.

Next up: Chapter 17 takes us into algorithms and data structures, where we'll learn to think about efficiency — not just whether code works, but how fast it works and how much memory it uses.
