Learning Objectives
- Apply the Single Responsibility Principle and Open/Closed Principle to evaluate and improve class designs
- Distinguish high cohesion from low cohesion and tight coupling from loose coupling in real code
- Implement the Strategy, Observer, and Factory design patterns in Python
- Use @dataclass to create clean data-holding classes with minimal boilerplate
- Identify common code smells and apply systematic refactoring techniques
- Evaluate when OOP is the right tool and when simpler approaches are better
In This Chapter
- Chapter Overview
- 16.1 Thinking About Design
- 16.2 SOLID Principles for Beginners
- 16.3 Coupling and Cohesion
- 16.4 Design Patterns: Strategy
- 16.5 Design Patterns: Observer
- 16.6 Design Patterns: Factory
- 16.7 Dataclasses: Simple Data Objects
- 16.8 Code Smells and Refactoring
- 16.9 When NOT to Use OOP
- 16.10 Project Checkpoint: TaskFlow v1.5
- Chapter Summary
Chapter 16: OOP Design: Patterns and Principles
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." — Martin Fowler, Refactoring
Chapter Overview
You know how to write classes. You know how to use inheritance and polymorphism. Congratulations — you've learned the mechanics of object-oriented programming. Now comes the harder question: how do you design classes that actually work well together?
There's a difference between code that runs and code that's good. Good code is easy to change, easy to test, and easy for someone else (including future-you) to understand. Bad code fights you at every turn — you fix one bug and three more appear, you add one feature and half the system breaks, you look at code you wrote three months ago and genuinely can't figure out what it does.
This chapter is about the bridge between those two worlds. We'll explore principles that experienced developers use to evaluate designs, patterns that solve recurring problems in elegant ways, and the practical skill of recognizing when code has gone wrong and fixing it systematically.
A word of caution: design is not something you master by reading about it. You master it by doing it badly, noticing what went wrong, and doing it better next time. The principles and patterns in this chapter are guardrails, not laws. They'll save you from the worst mistakes, but the real learning happens when you apply them to your own projects and feel the difference.
In this chapter, you will learn to:
- Evaluate class designs using the Single Responsibility and Open/Closed Principles
- Recognize tight coupling and low cohesion — and know how to fix them
- Implement Strategy, Observer, and Factory patterns in Python
- Use @dataclass to eliminate boilerplate in data-holding classes
- Identify code smells and refactor systematically
- Know when not to use OOP
🏃 Fast Track: If you're comfortable with design principles conceptually, skim 16.1–16.3 and focus on the pattern implementations (16.4–16.6) and dataclasses (16.7). Then work through the project checkpoint.
🔬 Deep Dive: After this chapter, read Case Study 1 for a full refactoring walkthrough and Case Study 2 to see how real Python libraries use these exact patterns.
16.1 Thinking About Design
Let's start with a story you'll recognize if you've built anything beyond a toy program.
You write a class. It works. You add a feature. Still works. You add another feature, then another. Somewhere around the fifth feature, you notice something uncomfortable: every change requires modifying three different methods. The __init__ method is 40 lines long. You have a method called do_everything() that truly does everything. You want to add a simple notification when a task is overdue, and you realize you'd have to rewrite half the class.
You've hit the wall that separates "code that works" from "code that's designed well."
Why Design Matters
Design isn't about making code look pretty or following rules for the sake of rules. It's about managing the cost of change. Software changes constantly — new features, bug fixes, new requirements from users. Well-designed code makes changes easy and safe. Poorly designed code makes every change a risk.
Here's a concrete example. Consider two versions of a grade calculator:
# Version A: Everything in one class
class GradeSystem:
    def __init__(self):
        self.students = []
        self.grades = {}
        self.display_mode = "text"
        self.storage_file = "grades.json"
        self.email_server = "smtp.school.edu"

    def add_student(self, name): ...
    def record_grade(self, student, grade): ...
    def calculate_gpa(self, student): ...
    def display_report(self): ...
    def save_to_file(self): ...
    def load_from_file(self): ...
    def email_report(self, recipient): ...
    def generate_html_report(self): ...
    def validate_grade(self, grade): ...
    def calculate_class_average(self): ...
This class does everything: grading, display, storage, email, HTML generation, validation. Now ask yourself: what happens when you need to change how reports are displayed? You have to modify this giant class, which also handles email and storage. And what if a change to the email code accidentally breaks grade calculation? You won't know until it blows up in production.
Now consider a different design:
# Version B: Separated responsibilities
class GradeBook:
    """Manages students and their grades."""
    def add_student(self, name): ...
    def record_grade(self, student, grade): ...
    def calculate_gpa(self, student): ...
    def calculate_class_average(self): ...

class GradeValidator:
    """Validates grade values."""
    def validate(self, grade): ...

class ReportGenerator:
    """Creates reports in various formats."""
    def text_report(self, gradebook): ...
    def html_report(self, gradebook): ...

class GradeStorage:
    """Saves and loads grade data."""
    def save(self, gradebook, filename): ...
    def load(self, filename): ...
Same functionality, but now each class has one job. You can change report formatting without touching grade calculations. You can swap file storage for database storage without affecting anything else. Each class is small enough to fit in your head.
That's what design is about: organizing code so that each piece has a clear purpose and changes don't ripple through the entire system.
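To make the storage swap concrete, here is a minimal sketch. The class names JsonGradeStorage and InMemoryGradeStorage, the simplified GradeBook with a public grades dictionary, and the round_trip helper are all assumptions made for illustration; the chapter's own classes leave these details open.

```python
import json
import os
import tempfile


class GradeBook:
    """Manages students and their grades (simplified for this sketch)."""

    def __init__(self):
        self.grades = {}  # student name -> list of numeric grades

    def record_grade(self, student, grade):
        self.grades.setdefault(student, []).append(grade)


class JsonGradeStorage:
    """Hypothetical storage class: saves and loads grades as a JSON file."""

    def save(self, gradebook, filename):
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(gradebook.grades, f)

    def load(self, filename):
        gradebook = GradeBook()
        with open(filename, encoding="utf-8") as f:
            gradebook.grades = json.load(f)
        return gradebook


class InMemoryGradeStorage:
    """Hypothetical alternative: keeps saved data in a dict (handy in tests)."""

    def __init__(self):
        self._saved = {}

    def save(self, gradebook, filename):
        # Copy the lists so later changes to the gradebook don't leak in
        self._saved[filename] = {s: list(g) for s, g in gradebook.grades.items()}

    def load(self, filename):
        gradebook = GradeBook()
        gradebook.grades = {s: list(g) for s, g in self._saved[filename].items()}
        return gradebook


def round_trip(storage, filename):
    """Save a gradebook and load it back with whichever storage is given."""
    book = GradeBook()
    book.record_grade("Alice", 92)
    book.record_grade("Alice", 88)
    storage.save(book, filename)
    return storage.load(filename).grades


memory_result = round_trip(InMemoryGradeStorage(), "grades.json")
with tempfile.TemporaryDirectory() as tmp:
    json_result = round_trip(JsonGradeStorage(), os.path.join(tmp, "grades.json"))
print(memory_result == json_result)  # both backends return the same grades
```

GradeBook never mentions files or JSON, so swapping one storage class for the other requires no change to it.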
🔗 Connection to Chapter 6 (Functions): We already applied this thinking to functions — each function should do one thing and do it well. OOP design is the same principle at a larger scale. Functions are your small units of organization; classes are your medium-sized ones.
16.2 SOLID Principles for Beginners
SOLID is an acronym for five design principles popularized by Robert C. Martin ("Uncle Bob"). We'll focus on the two that matter most at this stage: S and O. The others (Liskov Substitution, Interface Segregation, Dependency Inversion) build on these — you'll encounter them in a software engineering course.
Single Responsibility Principle (SRP)
A class should have one reason to change.
That's the formal statement. Here's the practical version: if you can't describe what a class does in one sentence without using the word "and," it probably has too many responsibilities.
Let's test this with our text adventure game:
# VIOLATES SRP: Player class does too many things
class Player:
    def __init__(self, name):
        self.name = name
        self.hp = 100
        self.inventory = []
        self.position = (0, 0)

    def move(self, direction): ...        # Navigation
    def attack(self, enemy): ...          # Combat
    def take_damage(self, amount): ...    # Combat
    def pick_up(self, item): ...          # Inventory
    def drop(self, item): ...             # Inventory
    def save_game(self, filename): ...    # Persistence
    def load_game(self, filename): ...    # Persistence
    def draw_on_screen(self): ...         # Display
    def play_sound(self, sound): ...      # Audio
What does this class do? "It manages the player's state and handles combat and manages inventory and saves/loads games and draws to the screen and plays sounds." That's a lot of "ands."
A better design splits these into focused classes:
# FOLLOWS SRP: Each class has one job
class Player:
    """Manages player state."""
    def __init__(self, name):
        self.name = name
        self.hp = 100
        self.inventory = Inventory()
        self.position = (0, 0)

class Inventory:
    """Manages items."""
    def pick_up(self, item): ...
    def drop(self, item): ...

class CombatSystem:
    """Handles combat between entities."""
    def attack(self, attacker, defender): ...
    def apply_damage(self, target, amount): ...

class GameSaver:
    """Persists game state."""
    def save(self, game_state, filename): ...
    def load(self, filename): ...
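One way these focused classes might collaborate is sketched below. The method bodies, the items list inside Inventory, and the amount parameter on attack() are assumptions for illustration, since the outline above leaves the bodies as `...`.

```python
class Inventory:
    """Manages items."""

    def __init__(self):
        self.items = []

    def pick_up(self, item):
        self.items.append(item)

    def drop(self, item):
        self.items.remove(item)


class Player:
    """Manages player state."""

    def __init__(self, name):
        self.name = name
        self.hp = 100
        self.inventory = Inventory()
        self.position = (0, 0)


class CombatSystem:
    """Handles combat between entities."""

    def apply_damage(self, target, amount):
        # Assumed rule for this sketch: HP never drops below zero
        target.hp = max(0, target.hp - amount)

    def attack(self, attacker, defender, amount=10):
        # 'amount' is a hypothetical fixed damage value
        self.apply_damage(defender, amount)


hero = Player("Aldric")
goblin = Player("Goblin")
hero.inventory.pick_up("rusty sword")
CombatSystem().attack(hero, goblin, amount=30)
print(hero.inventory.items, goblin.hp)
```

Changing the damage rules now touches only CombatSystem, and changing how items are stored touches only Inventory.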
🔄 Check Your Understanding: Look at the TaskFlow classes from Chapter 14. Does TaskList follow SRP? What about TaskStorage? Identify any responsibilities that could be separated.
Open/Closed Principle (OCP)
Classes should be open for extension but closed for modification.
Translation: you should be able to add new behavior without changing existing code that already works.
This one sounds abstract, so let's make it concrete. Imagine Elena needs her report generator to support new output formats:
# VIOLATES OCP: Adding a format means modifying existing code
class ReportGenerator:
    def generate(self, data, format_type):
        if format_type == "csv":
            return self._make_csv(data)
        elif format_type == "html":
            return self._make_html(data)
        elif format_type == "pdf":  # Must modify this class!
            return self._make_pdf(data)
        # Every new format = another elif = modifying tested code
Every time Elena needs a new format, she has to open up this class and add another elif branch. That means retesting the entire class, because the change could break the existing CSV and HTML code.
Now consider this approach:
# FOLLOWS OCP: New formats don't touch existing code
from abc import ABC, abstractmethod

class ReportFormatter(ABC):
    @abstractmethod
    def format(self, data: list[dict]) -> str:
        pass

class CSVFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return ""
        headers = ",".join(data[0].keys())
        rows = [",".join(str(v) for v in row.values()) for row in data]
        return headers + "\n" + "\n".join(rows)

class HTMLFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return "<table></table>"
        headers = "".join(f"<th>{h}</th>" for h in data[0].keys())
        rows = "".join(
            "<tr>" + "".join(f"<td>{v}</td>" for v in row.values()) + "</tr>"
            for row in data
        )
        return f"<table><tr>{headers}</tr>{rows}</table>"

class ReportGenerator:
    def __init__(self, formatter: ReportFormatter):
        self.formatter = formatter

    def generate(self, data: list[dict]) -> str:
        return self.formatter.format(data)
Now adding PDF support means creating a new PDFFormatter class — the existing ReportGenerator, CSVFormatter, and HTMLFormatter are completely untouched. That's the Open/Closed Principle in action.
Notice how this connects to inheritance and polymorphism from Chapter 15. OCP works because of polymorphism: the ReportGenerator doesn't care which formatter it has — it just calls .format() and trusts the subclass to do the right thing.
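As a sketch of what "extension without modification" looks like, the block below adds a new format to the design above. JSONFormatter is a hypothetical stand-in for the PDF case (real PDF output would need a third-party library), and the abstract base class and generator are repeated only so the example runs on its own.

```python
import json
from abc import ABC, abstractmethod


class ReportFormatter(ABC):
    @abstractmethod
    def format(self, data: list[dict]) -> str:
        pass


class ReportGenerator:
    def __init__(self, formatter: ReportFormatter):
        self.formatter = formatter

    def generate(self, data: list[dict]) -> str:
        return self.formatter.format(data)


# New code only: nothing above this line had to change
class JSONFormatter(ReportFormatter):
    """Hypothetical extra format, used here in place of PDF."""

    def format(self, data: list[dict]) -> str:
        return json.dumps(data)


report = ReportGenerator(JSONFormatter()).generate([{"name": "Alice", "grade": 92}])
print(report)
```

The new class plugs into ReportGenerator through the same format() method, so the existing formatters need no retesting.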
💡 Intuition Builder: Think of a power strip. When you need to plug in a new device, you don't rewire the power strip — you just plug the new device into an open socket. The power strip is "closed for modification" (you don't change its wiring) but "open for extension" (you can add new devices). OCP is the same idea for code.
16.3 Coupling and Cohesion
Two concepts that experienced developers think about constantly: coupling (how much classes depend on each other) and cohesion (how focused a single class is on one job).
Coupling: How Connected Are Your Classes?
Coupling measures how much one class depends on the internal details of another. Tight coupling is bad — it means changing one class forces you to change another. Loose coupling is good — classes interact through clean interfaces and don't care about each other's internals.
# TIGHT coupling: Display knows everything about Task internals
class Task:
    def __init__(self, title, priority, due_date):
        self.title = title
        self.priority = priority  # 1-5
        self.due_date = due_date
        self.completed = False
        self._internal_id = id(self)

class TaskDisplay:
    def show(self, task):
        # Reaches directly into Task's attributes
        # If Task changes its attribute names, this breaks
        print(f"[{'X' if task.completed else ' '}] {task.title}")
        print(f" Priority: {'!' * task.priority}")
        print(f" Due: {task.due_date}")
        print(f" ID: {task._internal_id}")  # Accessing private attribute!
The TaskDisplay class reaches deep into Task's internals — including the private _internal_id. If Task renames priority to importance, or changes how IDs work, TaskDisplay breaks.
# LOOSE coupling: Display uses Task's public interface
class Task:
    def __init__(self, title, priority, due_date):
        self.title = title
        self.priority = priority
        self.due_date = due_date
        self.completed = False
        self._internal_id = id(self)

    def summary(self) -> str:
        """Public interface for displaying task info."""
        status = "X" if self.completed else " "
        return f"[{status}] {self.title} (Priority: {self.priority}, Due: {self.due_date})"

class TaskDisplay:
    def show(self, task):
        # Uses only the public interface
        print(task.summary())
Now TaskDisplay depends only on Task having a .summary() method. Task can change its internal structure all it wants — as long as .summary() still works, TaskDisplay doesn't care.
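The sketch below tests that claim by renaming an internal attribute. The new name importance is taken from the earlier discussion of what would break the tightly coupled version; the sample task values are made up for the example.

```python
class Task:
    def __init__(self, title, importance, due_date):
        self.title = title
        self.importance = importance  # renamed from 'priority'
        self.due_date = due_date
        self.completed = False

    def summary(self) -> str:
        """Public interface: the only thing TaskDisplay relies on."""
        status = "X" if self.completed else " "
        return f"[{status}] {self.title} (Priority: {self.importance}, Due: {self.due_date})"


class TaskDisplay:
    def show(self, task):
        # Unchanged from the loosely coupled version
        print(task.summary())


task = Task("Write report", 3, "2025-03-14")
TaskDisplay().show(task)
```

Only summary() had to follow the rename; TaskDisplay was not edited at all.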
Cohesion: How Focused Is Your Class?
Cohesion measures how strongly the methods and attributes within a class belong together. High cohesion is good — everything in the class is related to one purpose. Low cohesion is bad — the class is a grab bag of unrelated stuff.
# LOW cohesion: This class is a junk drawer
class Utilities:
    def calculate_tax(self, amount, rate): ...
    def send_email(self, to, subject, body): ...
    def resize_image(self, image, width, height): ...
    def parse_date(self, date_string): ...
    def encrypt_password(self, password): ...
These methods have nothing to do with each other. Tax calculation, email, image processing, date parsing, and encryption are completely unrelated. This is a "utility class" — the software equivalent of a junk drawer.
# HIGH cohesion: Each class groups related functionality
class TaxCalculator:
    def calculate(self, amount, rate): ...
    def with_deductions(self, amount, rate, deductions): ...

class EmailSender:
    def send(self, to, subject, body): ...
    def send_bulk(self, recipients, subject, body): ...

class ImageProcessor:
    def resize(self, image, width, height): ...
    def crop(self, image, x, y, width, height): ...
Here's a quick diagnostic: if a class's methods don't use the same attributes, cohesion is probably low. In the Utilities class, calculate_tax() and send_email() don't share any state — they shouldn't be in the same class.
⚠️ Pitfall: Don't take this to the extreme. You don't need a separate class for every single method. A GradeBook class with add_grade(), remove_grade(), calculate_average(), and highest_grade() has high cohesion: all those methods work with grades. The goal is cohesion, not one-method-per-class.
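A sketch of that cohesive GradeBook shows the shared-attribute diagnostic at work. The method bodies are assumptions, including the choice to return 0.0 as the average of an empty gradebook.

```python
class GradeBook:
    """Every method reads or writes the same attribute: self.grades."""

    def __init__(self):
        self.grades = []

    def add_grade(self, grade):
        self.grades.append(grade)

    def remove_grade(self, grade):
        self.grades.remove(grade)

    def calculate_average(self):
        # Assumption for this sketch: an empty gradebook averages to 0.0
        return sum(self.grades) / len(self.grades) if self.grades else 0.0

    def highest_grade(self):
        return max(self.grades)


book = GradeBook()
for g in (90, 70, 80):
    book.add_grade(g)
book.remove_grade(70)
print(book.calculate_average(), book.highest_grade())
```

All four methods touch self.grades, which is the signal that they belong together.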
| Concept | Good | Bad | Diagnostic Question |
|---|---|---|---|
| Coupling | Loose — classes interact through public interfaces | Tight — classes depend on each other's internal details | "If I change this class's internals, what else breaks?" |
| Cohesion | High — all methods relate to one responsibility | Low — methods are unrelated grab-bag | "Do all methods in this class use the same attributes?" |
🔄 Check Your Understanding: Look at the GradeSystem class from Section 16.1 (Version A). Rate its coupling and cohesion. How many "reasons to change" does it have? Now rate Version B's individual classes.
16.4 Design Patterns: Strategy
A design pattern is a reusable solution to a common problem. It's not a library you install — it's a template for how to structure your classes. Think of it like a recipe: it tells you the ingredients and steps, but you adapt the specifics to your situation.
We'll cover three essential patterns. First up: Strategy.
The Problem Strategy Solves
You have an algorithm that needs to vary. Different situations call for different approaches, and you want to swap them easily without changing the code that uses them.
The Text Adventure: Combat Strategies
In Crypts of Pythonia, different character classes fight differently. An aggressive fighter does maximum damage but takes more hits. A defensive fighter blocks more but deals less damage. A magic user casts spells with varying effects.
from abc import ABC, abstractmethod

# The Strategy interface
class CombatStrategy(ABC):
    @abstractmethod
    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        """Returns dict with 'damage', 'description', and 'self_damage'."""
        pass

# Concrete strategies
class AggressiveStrategy(CombatStrategy):
    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        # All-out attack: high damage, but leaves attacker exposed
        raw_damage = int(attacker_power * 1.5)
        actual_damage = max(0, raw_damage - defender_armor // 2)
        return {
            "damage": actual_damage,
            "description": "launches a reckless all-out attack",
            "self_damage": 5  # Leaves self exposed
        }

class DefensiveStrategy(CombatStrategy):
    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        # Careful strike: lower damage, but no self-exposure
        raw_damage = attacker_power // 2
        actual_damage = max(0, raw_damage - defender_armor // 3)
        return {
            "damage": actual_damage,
            "description": "strikes cautiously from behind their shield",
            "self_damage": 0
        }

class MagicStrategy(CombatStrategy):
    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        # Magic bypasses armor but costs mana (represented as self_damage)
        actual_damage = attacker_power  # Ignores armor entirely
        return {
            "damage": actual_damage,
            "description": "channels arcane energy into a devastating spell",
            "self_damage": 10  # Mana cost represented as fatigue
        }

# The Context: uses a strategy without knowing which one
class Character:
    def __init__(self, name: str, power: int, armor: int, hp: int):
        self.name = name
        self.power = power
        self.armor = armor
        self.hp = hp
        self.strategy: CombatStrategy = AggressiveStrategy()  # Default

    def set_strategy(self, strategy: CombatStrategy) -> None:
        """Swap combat style at runtime."""
        self.strategy = strategy

    def attack(self, target: "Character") -> str:
        result = self.strategy.execute(self.power, target.armor)
        target.hp -= result["damage"]
        self.hp -= result["self_damage"]
        return (
            f"{self.name} {result['description']}!\n"
            f" Deals {result['damage']} damage to {target.name}. "
            f"({target.name} HP: {target.hp})"
        )

# Usage
hero = Character("Aldric", power=20, armor=15, hp=100)
dragon = Character("Smolderfang", power=30, armor=25, hp=200)

hero.set_strategy(AggressiveStrategy())
print(hero.attack(dragon))

hero.set_strategy(MagicStrategy())
print(hero.attack(dragon))

hero.set_strategy(DefensiveStrategy())
print(hero.attack(dragon))
Expected output:
Aldric launches a reckless all-out attack!
Deals 18 damage to Smolderfang. (Smolderfang HP: 182)
Aldric channels arcane energy into a devastating spell!
Deals 20 damage to Smolderfang. (Smolderfang HP: 162)
Aldric strikes cautiously from behind their shield!
Deals 2 damage to Smolderfang. (Smolderfang HP: 160)
The key insight: Character.attack() doesn't contain a single if/elif chain. It delegates to whatever strategy object it currently holds. Adding a new combat style — say, StealthStrategy — means writing one new class. The Character class never changes. That's the Open/Closed Principle in action, enabled by the Strategy pattern.
📊 Pattern Anatomy: Every Strategy pattern has three parts: (1) a Strategy interface (abstract base class defining the method), (2) Concrete strategies (classes implementing the interface), and (3) a Context (the class that uses a strategy). The context holds a reference to a strategy object and delegates work to it.
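Here is a sketch of the StealthStrategy extension mentioned above. The damage rule (a sneak attack halves the defender's armor) is invented for the example, and Character is a trimmed copy of the context class, with a shortened message and no default strategy, so the block runs on its own.

```python
from abc import ABC, abstractmethod


class CombatStrategy(ABC):
    @abstractmethod
    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        pass


class StealthStrategy(CombatStrategy):
    def execute(self, attacker_power: int, defender_armor: int) -> dict:
        # Assumed rule: a sneak attack halves the defender's armor
        actual_damage = max(0, attacker_power - defender_armor // 2)
        return {
            "damage": actual_damage,
            "description": "strikes from the shadows",
            "self_damage": 0,
        }


class Character:
    """Trimmed context class: it delegates to whatever strategy it holds."""

    def __init__(self, name, power, armor, hp):
        self.name, self.power, self.armor, self.hp = name, power, armor, hp
        self.strategy = None

    def set_strategy(self, strategy):
        self.strategy = strategy

    def attack(self, target):
        result = self.strategy.execute(self.power, target.armor)
        target.hp -= result["damage"]
        self.hp -= result["self_damage"]
        return f"{self.name} {result['description']}! Deals {result['damage']} damage."


hero = Character("Aldric", power=20, armor=15, hp=100)
dragon = Character("Smolderfang", power=30, armor=25, hp=200)
hero.set_strategy(StealthStrategy())
message = hero.attack(dragon)
print(message)
```

Character.attack() contains no mention of stealth; the new behavior lives entirely in the new class.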
16.5 Design Patterns: Observer
The Problem Observer Solves
One object changes state, and several other objects need to react to that change — but you don't want the first object to know about all the specific reactors. In a grade calculator, when a grade changes, the display needs to update, the log needs to record it, and the GPA calculator needs to recalculate. But the GradeBook shouldn't have to know about all of those systems directly.
The Grade Calculator: Reacting to Changes
from abc import ABC, abstractmethod

# Observer interface
class GradeObserver(ABC):
    @abstractmethod
    def on_grade_changed(self, student: str, old_grade: float,
                         new_grade: float) -> None:
        pass

# Concrete observers
class DisplayUpdater(GradeObserver):
    def on_grade_changed(self, student: str, old_grade: float,
                         new_grade: float) -> None:
        print(f"[DISPLAY] {student}'s grade updated: "
              f"{old_grade:.1f} -> {new_grade:.1f}")

class GradeLogger(GradeObserver):
    def __init__(self):
        self.log: list[str] = []

    def on_grade_changed(self, student: str, old_grade: float,
                         new_grade: float) -> None:
        entry = f"{student}: {old_grade:.1f} -> {new_grade:.1f}"
        self.log.append(entry)
        print(f"[LOG] Recorded: {entry}")

class GPACalculator(GradeObserver):
    def __init__(self):
        self.grades: dict[str, float] = {}

    def on_grade_changed(self, student: str, old_grade: float,
                         new_grade: float) -> None:
        self.grades[student] = new_grade
        if self.grades:
            avg = sum(self.grades.values()) / len(self.grades)
            print(f"[GPA] Class average is now: {avg:.2f}")

# Subject (the thing being observed)
class GradeBook:
    def __init__(self):
        self._grades: dict[str, float] = {}
        self._observers: list[GradeObserver] = []

    def add_observer(self, observer: GradeObserver) -> None:
        self._observers.append(observer)

    def remove_observer(self, observer: GradeObserver) -> None:
        self._observers.remove(observer)

    def _notify_observers(self, student: str, old_grade: float,
                          new_grade: float) -> None:
        for observer in self._observers:
            observer.on_grade_changed(student, old_grade, new_grade)

    def set_grade(self, student: str, grade: float) -> None:
        old_grade = self._grades.get(student, 0.0)
        self._grades[student] = grade
        self._notify_observers(student, old_grade, grade)

# Wire it up
gradebook = GradeBook()
gradebook.add_observer(DisplayUpdater())
gradebook.add_observer(GradeLogger())
gradebook.add_observer(GPACalculator())

gradebook.set_grade("Alice", 92.0)
print()
gradebook.set_grade("Bob", 85.0)
print()
gradebook.set_grade("Alice", 95.0)
Expected output:
[DISPLAY] Alice's grade updated: 0.0 -> 92.0
[LOG] Recorded: Alice: 0.0 -> 92.0
[GPA] Class average is now: 92.00
[DISPLAY] Bob's grade updated: 0.0 -> 85.0
[LOG] Recorded: Bob: 0.0 -> 85.0
[GPA] Class average is now: 88.50
[DISPLAY] Alice's grade updated: 92.0 -> 95.0
[LOG] Recorded: Alice: 92.0 -> 95.0
[GPA] Class average is now: 90.00
Notice that GradeBook knows nothing about displays, logs, or GPA calculations. It just maintains a list of observers and calls their on_grade_changed() method when something changes. You can add new observers — an email notifier, a parent notification system, a statistics tracker — without changing GradeBook at all.
💡 Intuition Builder: Think of a YouTube subscription. When a creator uploads a video, every subscriber gets notified. The creator doesn't personally email each subscriber — YouTube's notification system handles it. The creator is the "subject," subscribers are "observers," and YouTube is the pattern infrastructure.
🔗 Bridge from Chapter 15: Observer uses polymorphism at its core. GradeBook iterates over a list of GradeObserver objects and calls .on_grade_changed(). It doesn't know (or care) whether each observer is a DisplayUpdater or a GPACalculator. This is exactly the polymorphic dispatch we learned in Chapter 15.
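The sketch below adds a new observer and then detaches it with remove_observer(). ImprovementTracker is a hypothetical observer invented for this example; GradeBook is repeated in condensed form (notification inlined into set_grade) so the block runs on its own.

```python
from abc import ABC, abstractmethod


class GradeObserver(ABC):
    @abstractmethod
    def on_grade_changed(self, student: str, old_grade: float,
                         new_grade: float) -> None:
        pass


class GradeBook:
    """Condensed copy of the subject, repeated so this sketch is runnable."""

    def __init__(self):
        self._grades: dict[str, float] = {}
        self._observers: list[GradeObserver] = []

    def add_observer(self, observer: GradeObserver) -> None:
        self._observers.append(observer)

    def remove_observer(self, observer: GradeObserver) -> None:
        self._observers.remove(observer)

    def set_grade(self, student: str, grade: float) -> None:
        old_grade = self._grades.get(student, 0.0)
        self._grades[student] = grade
        for observer in self._observers:
            observer.on_grade_changed(student, old_grade, grade)


class ImprovementTracker(GradeObserver):
    """Hypothetical new observer: counts how often a grade went up."""

    def __init__(self):
        self.improvements = 0

    def on_grade_changed(self, student, old_grade, new_grade):
        if new_grade > old_grade:
            self.improvements += 1


gradebook = GradeBook()
tracker = ImprovementTracker()
gradebook.add_observer(tracker)
gradebook.set_grade("Alice", 92.0)  # 0.0 -> 92.0 counts as an increase
gradebook.set_grade("Alice", 90.0)  # a drop, so not counted
gradebook.remove_observer(tracker)
gradebook.set_grade("Alice", 99.0)  # tracker is no longer listening
print(tracker.improvements)
```

Note that the first grade counts as an improvement only because set_grade() reports 0.0 as the old grade for a new student.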
16.6 Design Patterns: Factory
The Problem Factory Solves
You need to create objects, but the exact class to instantiate depends on some input or configuration. You don't want the calling code littered with if/elif chains that decide which class to create.
Elena's Report Formats
Elena's nonprofit needs reports in CSV, HTML, and PDF formats. Instead of the calling code deciding which formatter to create, we use a factory:
from abc import ABC, abstractmethod

class ReportFormatter(ABC):
    @abstractmethod
    def format(self, data: list[dict]) -> str:
        pass

    @abstractmethod
    def file_extension(self) -> str:
        pass

class CSVFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return ""
        headers = ",".join(data[0].keys())
        rows = [",".join(str(v) for v in row.values()) for row in data]
        return headers + "\n" + "\n".join(rows)

    def file_extension(self) -> str:
        return ".csv"

class HTMLFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return "<table></table>"
        headers = "".join(f"<th>{h}</th>" for h in data[0].keys())
        rows = "".join(
            "<tr>" + "".join(f"<td>{v}</td>" for v in row.values()) + "</tr>"
            for row in data
        )
        return f"<table><thead><tr>{headers}</tr></thead><tbody>{rows}</tbody></table>"

    def file_extension(self) -> str:
        return ".html"

class PlainTextFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return "(no data)"
        lines = []
        for row in data:
            lines.append(" | ".join(f"{k}: {v}" for k, v in row.items()))
        return "\n".join(lines)

    def file_extension(self) -> str:
        return ".txt"

# The Factory
class ReportFormatterFactory:
    """Creates the right formatter based on format name."""
    _formatters: dict[str, type[ReportFormatter]] = {
        "csv": CSVFormatter,
        "html": HTMLFormatter,
        "text": PlainTextFormatter,
    }

    @classmethod
    def create(cls, format_name: str) -> ReportFormatter:
        formatter_class = cls._formatters.get(format_name.lower())
        if formatter_class is None:
            available = ", ".join(cls._formatters.keys())
            raise ValueError(
                f"Unknown format '{format_name}'. "
                f"Available: {available}"
            )
        return formatter_class()

    @classmethod
    def register(cls, name: str, formatter_class: type[ReportFormatter]) -> None:
        """Register a new format without modifying factory code."""
        cls._formatters[name.lower()] = formatter_class

# Usage
sample_data = [
    {"name": "Meals Served", "count": 1247, "change": "+12%"},
    {"name": "Clients Housed", "count": 89, "change": "+3%"},
]

for fmt in ["csv", "html", "text"]:
    formatter = ReportFormatterFactory.create(fmt)
    print(f"--- {fmt.upper()} ({formatter.file_extension()}) ---")
    print(formatter.format(sample_data))
    print()
Expected output:
--- CSV (.csv) ---
name,count,change
Meals Served,1247,+12%
Clients Housed,89,+3%
--- HTML (.html) ---
<table><thead><tr><th>name</th><th>count</th><th>change</th></tr></thead><tbody><tr><td>Meals Served</td><td>1247</td><td>+12%</td></tr><tr><td>Clients Housed</td><td>89</td><td>+3%</td></tr></tbody></table>
--- TEXT (.txt) ---
name: Meals Served | count: 1247 | change: +12%
name: Clients Housed | count: 89 | change: +3%
The register() method is the real power move. Third-party code can add new formats without touching the factory's source:
class MarkdownFormatter(ReportFormatter):
    def format(self, data: list[dict]) -> str:
        if not data:
            return ""
        headers = " | ".join(data[0].keys())
        separator = " | ".join("---" for _ in data[0].keys())
        rows = "\n".join(
            " | ".join(str(v) for v in row.values()) for row in data
        )
        return f"{headers}\n{separator}\n{rows}"

    def file_extension(self) -> str:
        return ".md"

ReportFormatterFactory.register("markdown", MarkdownFormatter)
formatter = ReportFormatterFactory.create("markdown")
print(formatter.format(sample_data))
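The factory's error path is worth seeing too. The sketch below is a cut-down copy of the factory with a single stand-in formatter (no abstract base class), just enough to show the case-insensitive lookup and the ValueError raised for a format that was never registered.

```python
class PlainTextFormatter:
    """Stand-in formatter so this sketch runs on its own."""

    def file_extension(self) -> str:
        return ".txt"


class ReportFormatterFactory:
    """Cut-down copy of the factory with one registered format."""

    _formatters = {"text": PlainTextFormatter}

    @classmethod
    def create(cls, format_name: str):
        formatter_class = cls._formatters.get(format_name.lower())
        if formatter_class is None:
            available = ", ".join(cls._formatters.keys())
            raise ValueError(
                f"Unknown format '{format_name}'. Available: {available}"
            )
        return formatter_class()


# Lookup is case-insensitive because create() lowercases the name
print(ReportFormatterFactory.create("TEXT").file_extension())

try:
    ReportFormatterFactory.create("pdf")
except ValueError as error:
    message = str(error)
    print(message)
```

Because every caller goes through create(), the "unknown format" check lives in exactly one place.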
When to Use Each Pattern
| Pattern | Use When... | Key Benefit | Example |
|---|---|---|---|
| Strategy | You need to swap algorithms at runtime | Eliminates conditional logic for algorithm selection | Combat styles, sorting algorithms, pricing rules |
| Observer | Multiple objects need to react when something changes | Subject doesn't know (or care) about its observers | Grade changes, event systems, UI updates |
| Factory | Object creation logic is complex or varies by type | Centralizes creation logic, easy to extend | Report formats, database connectors, game entities |
🔄 Check Your Understanding: You're building a notification system where users can choose to be notified by email, SMS, or push notification. Which pattern would you use? What if the user can switch their notification preference at runtime? (Hint: one pattern handles creation, another handles runtime swapping.)
16.7 Dataclasses: Simple Data Objects
Sometimes you need a class that's basically a container for data — no complex behavior, just a way to group related values together. Python's @dataclass decorator (from the dataclasses module) eliminates the boilerplate.
The Problem with Boilerplate
Here's a regular class to hold task data:
# Without dataclass: lots of boilerplate
class Task:
    def __init__(self, title: str, priority: int, due_date: str,
                 completed: bool = False):
        self.title = title
        self.priority = priority
        self.due_date = due_date
        self.completed = completed

    def __repr__(self):
        return (f"Task(title={self.title!r}, priority={self.priority}, "
                f"due_date={self.due_date!r}, completed={self.completed})")

    def __eq__(self, other):
        if not isinstance(other, Task):
            return NotImplemented
        return (self.title == other.title and self.priority == other.priority
                and self.due_date == other.due_date
                and self.completed == other.completed)
That's 16 lines just to hold four fields and provide reasonable __repr__ and __eq__. And you have to write every attribute name at least four times: in the parameter list, in the assignments, in __repr__, and in __eq__. If you add a field, you have to update all four places.
The Dataclass Solution
from dataclasses import dataclass

@dataclass
class Task:
    title: str
    priority: int
    due_date: str
    completed: bool = False
Four field declarations. That's it. The @dataclass decorator auto-generates __init__, __repr__, and __eq__ for you. Let's see it in action:
from dataclasses import dataclass

@dataclass
class Task:
    title: str
    priority: int
    due_date: str
    completed: bool = False

# __init__ is generated automatically
task1 = Task("Write chapter 16", priority=1, due_date="2025-03-14")
task2 = Task("Review chapter 15", priority=2, due_date="2025-03-13", completed=True)
task3 = Task("Write chapter 16", priority=1, due_date="2025-03-14")

# __repr__ is generated automatically
print(task1)
print(task2)

# __eq__ compares all fields
print(f"task1 == task3: {task1 == task3}")  # Same values
print(f"task1 == task2: {task1 == task2}")  # Different values
Expected output:
Task(title='Write chapter 16', priority=1, due_date='2025-03-14', completed=False)
Task(title='Review chapter 15', priority=2, due_date='2025-03-13', completed=True)
task1 == task3: True
task1 == task2: False
Dataclass Features
You can customize behavior with parameters and field():
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    name: str
    location: str
    capacity: int
    attendees: list[str] = field(default_factory=list)  # Mutable default
    created_at: str = field(
        default_factory=lambda: datetime.now().isoformat(),
        repr=False  # Don't show in __repr__
    )

@dataclass(frozen=True)  # Immutable: can't change fields after creation
class Coordinate:
    x: float
    y: float

@dataclass(order=True)  # Generates <, <=, >, >= based on fields
class Student:
    gpa: float
    name: str = field(compare=False)  # Don't use name for ordering

event = Event("Python Meetup", "Room 101", 50)
event.attendees.append("Alice")
event.attendees.append("Bob")
print(event)

coord = Coordinate(3.0, 4.0)
print(coord)
# coord.x = 5.0  # Would raise FrozenInstanceError

students = [
    Student(3.8, "Alice"),
    Student(3.5, "Bob"),
    Student(3.9, "Carol"),
]
print(sorted(students))
Expected output:
Event(name='Python Meetup', location='Room 101', capacity=50, attendees=['Alice', 'Bob'])
Coordinate(x=3.0, y=4.0)
[Student(gpa=3.5, name='Bob'), Student(gpa=3.8, name='Alice'), Student(gpa=3.9, name='Carol')]
⚠️ Pitfall (Mutable Defaults): Never write attendees: list[str] = [] in a dataclass. With a plain default like this, all instances would share the same list, just as with function defaults (Chapter 6), so @dataclass refuses it and raises a ValueError when the class is defined. Always use field(default_factory=list) for mutable defaults.
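A short sketch of both halves of this pitfall follows. BrokenEvent is a deliberately invalid class, and the event names are made up for the example.

```python
from dataclasses import dataclass, field

try:
    @dataclass
    class BrokenEvent:
        name: str
        attendees: list = []  # a bare list default is rejected by @dataclass
except ValueError as error:
    rejected = True
    print(f"ValueError: {error}")
else:
    rejected = False


@dataclass
class Event:
    name: str
    attendees: list = field(default_factory=list)  # new list per instance


meetup = Event("Python Meetup")
workshop = Event("Data Workshop")
meetup.attendees.append("Alice")
print(meetup.attendees, workshop.attendees)
```

The error surfaces when the class is defined, not when an instance is created, so the mistake is caught before any shared state can exist.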
When to Use Dataclasses vs. Regular Classes
| Use Dataclasses When... | Use Regular Classes When... |
|---|---|
| The class is primarily data with minimal behavior | The class has significant behavior and business logic |
| You want auto-generated __init__, __repr__, __eq__ | You need custom initialization logic |
| You want to compare instances by value | You want identity-based comparison (the default for regular classes) |
| The class is a record, config, or DTO | The class manages complex state with invariants |
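The "compare by value" row is the easiest one to see in code. In this small sketch, PointData and PointObject are hypothetical names chosen for the comparison:

```python
from dataclasses import dataclass

@dataclass
class PointData:              # hypothetical dataclass: compared by value
    x: int
    y: int

class PointObject:            # hypothetical regular class: compared by identity
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

print(PointData(1, 2) == PointData(1, 2))      # True: same field values
print(PointObject(1, 2) == PointObject(1, 2))  # False: two distinct objects
same = PointObject(1, 2)
print(same == same)                            # True: the very same object
```

A regular class can still opt into value comparison by defining __eq__ itself; the dataclass simply writes that method for you.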
16.8 Code Smells and Refactoring
A code smell isn't a bug — your code runs fine. It's a surface indicator that something deeper might be wrong with the design. The term comes from Kent Beck and Martin Fowler: if code "smells bad," it's worth investigating.
Refactoring is the discipline of improving code's design without changing its behavior. You're not adding features or fixing bugs — you're making the code cleaner, more readable, and easier to change.
Common Code Smells
| Smell | What It Looks Like | What It Suggests |
|---|---|---|
| God Class | One class with 500+ lines, 20+ methods | Violates SRP — break it up |
| Long Method | A method that's 50+ lines | Decompose into smaller methods |
| Feature Envy | A method that uses another class's data more than its own | Move the method to the class whose data it uses |
| Shotgun Surgery | One change requires modifying 5+ classes | Classes are too tightly coupled |
| Primitive Obsession | Using strings/ints where a custom class would be clearer | Create a class (or dataclass) to represent the concept |
| Duplicated Code | Same logic in multiple places | Extract into a shared method or class |
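As one illustration of a row in this table, here is a rough sketch of fixing Primitive Obsession. The Gpa class and the 0.0 to 4.0 range are assumptions made for the example, not part of the chapter's project code.

```python
from dataclasses import dataclass

# Before: a bare float can hold any value, so every caller must re-check it.
def is_honor_roll_raw(gpa: float) -> bool:
    return gpa >= 3.5

# After: a small value class (hypothetical name) validates once, at creation.
@dataclass(frozen=True)
class Gpa:
    value: float

    def __post_init__(self) -> None:
        # Assumes a 4.0 scale for this sketch.
        if not 0.0 <= self.value <= 4.0:
            raise ValueError(f"GPA must be between 0.0 and 4.0, got {self.value}")

    def is_honor_roll(self) -> bool:
        return self.value >= 3.5

print(is_honor_roll_raw(35.0))   # True, although 35.0 is clearly a typo
print(Gpa(3.8).is_honor_roll())  # True
try:
    Gpa(35.0)
except ValueError as exc:
    print(exc)                   # GPA must be between 0.0 and 4.0, got 35.0
```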
🧩 Productive Struggle: Spot the Smells
Study the following code carefully. Before reading the analysis, identify as many design problems as you can. Write them down. Then compare with the analysis below.
class StudentManager:
    def __init__(self):
        self.students = []
        self.db_connection = None
        self.email_host = "smtp.school.edu"
        self.email_port = 587

    def add_student(self, name, age, email, gpa,
                    major, phone, address, emergency_contact):
        student = {
            "name": name, "age": age, "email": email,
            "gpa": gpa, "major": major, "phone": phone,
            "address": address, "emergency_contact": emergency_contact,
        }
        self.students.append(student)
        # Duplicate: also log to database
        if self.db_connection:
            self.db_connection.execute(
                "INSERT INTO students VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                (name, age, email, gpa, major, phone, address,
                 emergency_contact)
            )
        # Also send welcome email
        import smtplib
        server = smtplib.SMTP(self.email_host, self.email_port)
        server.send_message(f"Welcome {name}!")

    def get_honor_students(self):
        result = []
        for s in self.students:
            if s["gpa"] >= 3.5:
                result.append(s["name"] + " (" + s["major"] + ") - " +
                              s["email"] + " - GPA: " + str(s["gpa"]))
        return result

    def get_probation_students(self):
        result = []
        for s in self.students:
            if s["gpa"] < 2.0:
                result.append(s["name"] + " (" + s["major"] + ") - " +
                              s["email"] + " - GPA: " + str(s["gpa"]))
        return result

    def export_to_csv(self):
        lines = ["name,age,email,gpa,major,phone,address,emergency_contact"]
        for s in self.students:
            lines.append(f"{s['name']},{s['age']},{s['email']},{s['gpa']},"
                         f"{s['major']},{s['phone']},{s['address']},"
                         f"{s['emergency_contact']}")
        return "\n".join(lines)
Analysis — here's what smells:
- God Class: StudentManager handles student data, database operations, email, filtering, AND CSV export. That's at least five responsibilities.
- Primitive Obsession: Students are dictionaries with 8 string/number fields. A Student dataclass would be much clearer.
- Long Parameter List: add_student() takes 8 parameters. That's a sign the parameters should be grouped into an object.
- Duplicated Code: get_honor_students() and get_probation_students() have nearly identical formatting logic.
- Feature Envy: The CSV export method reaches into each student dictionary's internals. It should be a method on a Student class or a dedicated exporter.
- Tight Coupling: The class directly creates SMTP connections and database queries. You can't test add_student() without an email server and a database.
Refactored Version
from dataclasses import dataclass

@dataclass
class Student:
    name: str
    age: int
    email: str
    gpa: float
    major: str
    phone: str = ""
    address: str = ""
    emergency_contact: str = ""

    def summary(self) -> str:
        return f"{self.name} ({self.major}) - {self.email} - GPA: {self.gpa}"

class StudentRepository:
    """Manages the student collection (one responsibility)."""

    def __init__(self):
        self._students: list[Student] = []

    def add(self, student: Student) -> None:
        self._students.append(student)

    def find_by_gpa(self, min_gpa: float = 0.0,
                    max_gpa: float = 4.0) -> list[Student]:
        return [s for s in self._students
                if min_gpa <= s.gpa <= max_gpa]

    @property
    def all(self) -> list[Student]:
        return list(self._students)

class StudentCSVExporter:
    """Exports students to CSV (one responsibility)."""

    def export(self, students: list[Student]) -> str:
        headers = "name,age,email,gpa,major,phone,address,emergency_contact"
        rows = [
            f"{s.name},{s.age},{s.email},{s.gpa},"
            f"{s.major},{s.phone},{s.address},{s.emergency_contact}"
            for s in students
        ]
        return headers + "\n" + "\n".join(rows)
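The refactored version is longer in total lines, but each piece is simpler, testable, and independently changeable. (It also leaves out the database insert and the welcome email; following the same principle, each of those would get its own class rather than living inside the repository.) That's the trade-off: a little more structure now saves a lot of pain later.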
🔗 Spaced Review — Chapter 13 (Testing): Notice how the refactored version is much easier to test. You can test
StudentRepository without an email server. You can test StudentCSVExporter by creating Student objects directly, with no database required. Good design and testability go hand in hand. If your code is hard to test, that's often a code smell pointing at a design problem.
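To make the testability point concrete, here is one possible sketch of how the welcome email could be decoupled. Notifier, EnrollmentService, and FakeNotifier are hypothetical names invented for this example, and Student is trimmed to two fields to keep it short.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Student:                       # trimmed-down version for this sketch
    name: str
    email: str

class Notifier(Protocol):            # hypothetical interface
    def send_welcome(self, student: Student) -> None: ...

class EnrollmentService:             # hypothetical coordinator class
    def __init__(self, notifier: Notifier):
        self._notifier = notifier    # injected, not created internally
        self._students: list[Student] = []

    def enroll(self, student: Student) -> None:
        self._students.append(student)
        self._notifier.send_welcome(student)

class FakeNotifier:
    """Test double: records calls instead of contacting an SMTP server."""
    def __init__(self):
        self.welcomed: list[str] = []

    def send_welcome(self, student: Student) -> None:
        self.welcomed.append(student.email)

fake = FakeNotifier()
service = EnrollmentService(fake)
service.enroll(Student("Alice", "alice@example.edu"))
print(fake.welcomed)  # ['alice@example.edu']
```

Because the service receives its notifier from outside, a test can pass in the fake, while production code would pass in a class that really talks to a mail server.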
16.9 When NOT to Use OOP
This might be the most important section in this chapter. OOP is powerful, but it's not always the right tool. One of the marks of a mature programmer is knowing when not to use something.
Simple Scripts
If your program is under 100 lines and does one thing, a few functions are probably better than a class hierarchy:
# DON'T: Over-engineering a simple script
class FileWordCounter:
    def __init__(self, filename):
        self.filename = filename
        self.word_count = 0

    def count(self):
        with open(self.filename) as f:
            self.word_count = len(f.read().split())
        return self.word_count

    def display(self):
        print(f"{self.filename}: {self.word_count} words")

counter = FileWordCounter("essay.txt")
counter.count()
counter.display()

# DO: Just use a function
def count_words(filename):
    with open(filename) as f:
        count = len(f.read().split())
    print(f"{filename}: {count} words")
    return count

count_words("essay.txt")
The class version is about twice as long, harder to read, and adds nothing in return: there is no state worth keeping between calls. The function version is clear, concise, and does the same thing.
Data Pipelines
When you're transforming data through a series of steps, functions with clear inputs and outputs are often cleaner than objects with mutable state:
# Functional pipeline: clean and readable
def load_data(filename: str) -> list[dict]:
    ...

def clean_data(records: list[dict]) -> list[dict]:
    return [r for r in records if r.get("valid")]

def calculate_stats(records: list[dict]) -> dict:
    values = [r["value"] for r in records]
    return {"mean": sum(values) / len(values), "count": len(values)}

def format_report(stats: dict) -> str:
    return f"Count: {stats['count']}, Mean: {stats['mean']:.2f}"

# Clear flow: data goes in one end, results come out the other
# result = format_report(calculate_stats(clean_data(load_data("data.csv"))))
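To watch the pipeline run end to end without a file, the sketch below repeats the three transformation steps and feeds them a few made-up in-memory records. The sample data is purely illustrative, and load_data is deliberately left out.

```python
# Same steps as the pipeline above, run on made-up in-memory records.
def clean_data(records: list[dict]) -> list[dict]:
    return [r for r in records if r.get("valid")]

def calculate_stats(records: list[dict]) -> dict:
    values = [r["value"] for r in records]
    return {"mean": sum(values) / len(values), "count": len(values)}

def format_report(stats: dict) -> str:
    return f"Count: {stats['count']}, Mean: {stats['mean']:.2f}"

sample_records = [                    # illustrative data, not from a real file
    {"value": 10.0, "valid": True},
    {"value": 99.0, "valid": False},  # dropped by clean_data
    {"value": 20.0, "valid": True},
]

cleaned = clean_data(sample_records)
stats = calculate_stats(cleaned)
print(format_report(stats))           # Count: 2, Mean: 15.00
```

Each step can be tested alone with a small list of dictionaries. One thing to watch: calculate_stats divides by the record count, so it raises ZeroDivisionError if every record is filtered out.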
The Decision Checklist
Use OOP when:
- You have multiple objects of the same type that need to maintain individual state
- You need polymorphism: different objects responding to the same method differently
- The domain naturally has entities with behavior (players, accounts, vehicles)
- You're building a library or framework that others will extend
Use functions when:
- The task is a one-off script or data transformation
- There's no state to maintain between calls
- The logic is a pipeline (input -> transform -> output)
- Adding a class would just be wrapping a single function
✅ Best Practice: Start with functions. Refactor into classes when you notice that you're passing the same group of data between multiple functions, or when you need multiple instances that maintain their own state. Don't design a class hierarchy on day one — let the design emerge from the code you're actually writing.
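The "same group of data" signal might look like the following sketch. The function names and the Todo class are invented for illustration.

```python
from dataclasses import dataclass

# Before: the same three values travel together through every function.
def describe(title: str, priority: int, done: bool) -> str:
    mark = "X" if done else " "
    return f"[{mark}] {title} (P{priority})"

def is_urgent(title: str, priority: int, done: bool) -> bool:
    return priority == 1 and not done

# After: the group gets a name, and the functions become methods.
@dataclass
class Todo:                      # hypothetical class for this sketch
    title: str
    priority: int = 3
    done: bool = False

    def describe(self) -> str:
        mark = "X" if self.done else " "
        return f"[{mark}] {self.title} (P{self.priority})"

    def is_urgent(self) -> bool:
        return self.priority == 1 and not self.done

todo = Todo("Write chapter 16", priority=1)
print(describe("Write chapter 16", 1, False))  # [ ] Write chapter 16 (P1)
print(todo.describe())                         # [ ] Write chapter 16 (P1)
print(todo.is_urgent())                        # True
```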
16.10 Project Checkpoint: TaskFlow v1.5
Time to apply what we've learned. In this checkpoint, we'll upgrade TaskFlow with three improvements:
- Observer pattern for notifications (alert when a task is overdue)
- Dataclasses for simple data objects
- Loose coupling between storage and display
Here's a condensed but complete implementation. The full version is in code/project-checkpoint.py.
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from abc import ABC, abstractmethod

# --- Data objects using dataclasses ---
@dataclass
class Task:
    title: str
    priority: int = 3
    due_date: str = ""
    completed: bool = False
    created_at: str = field(
        default_factory=lambda: datetime.now().strftime("%Y-%m-%d %H:%M")
    )

    def is_overdue(self) -> bool:
        if not self.due_date or self.completed:
            return False
        try:
            due = datetime.strptime(self.due_date, "%Y-%m-%d")
            return datetime.now() > due
        except ValueError:
            return False

    def summary(self) -> str:
        status = "X" if self.completed else " "
        overdue = " [OVERDUE]" if self.is_overdue() else ""
        return f"[{status}] {self.title} (P{self.priority}){overdue}"

# --- Observer pattern for notifications ---
class TaskObserver(ABC):
    @abstractmethod
    def on_task_added(self, task: Task) -> None:
        pass

    @abstractmethod
    def on_task_completed(self, task: Task) -> None:
        pass

    @abstractmethod
    def on_task_overdue(self, task: Task) -> None:
        pass

class ConsoleNotifier(TaskObserver):
    def on_task_added(self, task: Task) -> None:
        print(f" [NOTIFY] New task: {task.title}")

    def on_task_completed(self, task: Task) -> None:
        print(f" [NOTIFY] Completed: {task.title}")

    def on_task_overdue(self, task: Task) -> None:
        print(f" [ALERT] OVERDUE: {task.title} (was due {task.due_date})")

class TaskLog(TaskObserver):
    def __init__(self):
        self.entries: list[str] = []

    def on_task_added(self, task: Task) -> None:
        self.entries.append(f"ADDED: {task.title}")

    def on_task_completed(self, task: Task) -> None:
        self.entries.append(f"COMPLETED: {task.title}")

    def on_task_overdue(self, task: Task) -> None:
        self.entries.append(f"OVERDUE: {task.title}")

# --- Loosely coupled TaskManager ---
class TaskManager:
    """Manages tasks with observer notifications."""

    def __init__(self):
        self._tasks: list[Task] = []
        self._observers: list[TaskObserver] = []

    def add_observer(self, observer: TaskObserver) -> None:
        self._observers.append(observer)

    def _notify(self, method_name: str, task: Task) -> None:
        for observer in self._observers:
            getattr(observer, method_name)(task)

    def add_task(self, task: Task) -> None:
        self._tasks.append(task)
        self._notify("on_task_added", task)

    def complete_task(self, index: int) -> None:
        if 0 <= index < len(self._tasks):
            self._tasks[index].completed = True
            self._notify("on_task_completed", self._tasks[index])

    def check_overdue(self) -> None:
        for task in self._tasks:
            if task.is_overdue():
                self._notify("on_task_overdue", task)

    def list_tasks(self) -> list[str]:
        return [f" {i+1}. {t.summary()}" for i, t in enumerate(self._tasks)]

# --- Demo ---
def main():
    manager = TaskManager()
    notifier = ConsoleNotifier()
    log = TaskLog()
    manager.add_observer(notifier)
    manager.add_observer(log)

    print("=== Adding tasks ===")
    manager.add_task(Task("Write chapter 16", priority=1, due_date="2025-03-14"))
    manager.add_task(Task("Review pull request", priority=2, due_date="2024-01-01"))
    manager.add_task(Task("Buy groceries", priority=3))

    print("\n=== Current tasks ===")
    for line in manager.list_tasks():
        print(line)

    print("\n=== Checking for overdue tasks ===")
    manager.check_overdue()

    print("\n=== Completing a task ===")
    manager.complete_task(0)

    print("\n=== Log entries ===")
    for entry in log.entries:
        print(f" {entry}")

if __name__ == "__main__":
    main()
Expected output, assuming the program runs before 2025-03-14 (from that date on, "Write chapter 16" is also flagged as overdue):
=== Adding tasks ===
 [NOTIFY] New task: Write chapter 16
 [NOTIFY] New task: Review pull request
 [NOTIFY] New task: Buy groceries

=== Current tasks ===
 1. [ ] Write chapter 16 (P1)
 2. [ ] Review pull request (P2) [OVERDUE]
 3. [ ] Buy groceries (P3)

=== Checking for overdue tasks ===
 [ALERT] OVERDUE: Review pull request (was due 2024-01-01)

=== Completing a task ===
 [NOTIFY] Completed: Write chapter 16

=== Log entries ===
 ADDED: Write chapter 16
 ADDED: Review pull request
 ADDED: Buy groceries
 OVERDUE: Review pull request
 COMPLETED: Write chapter 16
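One caveat about this output: is_overdue() calls datetime.now(), so the result depends on the day the program runs. A possible variation, sketched here and not part of the checkpoint code, accepts the current time as an optional parameter so the check becomes deterministic in tests. The now parameter is an assumption added for this example, and Task is trimmed to the fields the check needs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Task:                      # trimmed-down version for this sketch
    title: str
    due_date: str = ""
    completed: bool = False

    def is_overdue(self, now: Optional[datetime] = None) -> bool:
        # `now` is an added, hypothetical parameter; it defaults to the real clock.
        if not self.due_date or self.completed:
            return False
        try:
            due = datetime.strptime(self.due_date, "%Y-%m-%d")
        except ValueError:
            return False
        return (now or datetime.now()) > due

task = Task("Write chapter 16", due_date="2025-03-14")
print(task.is_overdue(now=datetime(2025, 3, 1)))   # False
print(task.is_overdue(now=datetime(2025, 3, 20)))  # True
```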
Notice the design improvements over previous versions:
- Task is a dataclass: clean, minimal boilerplate, auto-generated __repr__ and __eq__
- Observer pattern: TaskManager doesn't know about console output or logging details; it just notifies observers
- Loose coupling: you can swap ConsoleNotifier for an EmailNotifier without touching TaskManager
- High cohesion: TaskManager manages tasks, ConsoleNotifier handles notifications, TaskLog handles logging
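The loose-coupling claim can be checked with a small experiment. The sketch below uses pared-down versions of Task, TaskObserver, and TaskManager (one event only) plus a hypothetical EmailNotifier that queues messages in an outbox list instead of sending real email.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Task:                          # pared down for this sketch
    title: str

class TaskObserver(ABC):
    @abstractmethod
    def on_task_added(self, task: Task) -> None: ...

class TaskManager:                   # unchanged when new observers appear
    def __init__(self):
        self._tasks: list[Task] = []
        self._observers: list[TaskObserver] = []

    def add_observer(self, observer: TaskObserver) -> None:
        self._observers.append(observer)

    def add_task(self, task: Task) -> None:
        self._tasks.append(task)
        for observer in self._observers:
            observer.on_task_added(task)

class EmailNotifier(TaskObserver):
    """Hypothetical observer: queues messages rather than sending email."""
    def __init__(self, recipient: str):
        self.recipient = recipient
        self.outbox: list[str] = []

    def on_task_added(self, task: Task) -> None:
        self.outbox.append(f"To {self.recipient}: new task '{task.title}'")

manager = TaskManager()
emailer = EmailNotifier("team@example.com")
manager.add_observer(emailer)
manager.add_task(Task("Review pull request"))
print(emailer.outbox)  # ["To team@example.com: new task 'Review pull request'"]
```

Adding the new notifier required a new class and one add_observer() call; no line of TaskManager changed.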
🔗 Spaced Review — Chapter 6 (Functions): Look at how
main() reads as a clean sequence of high-level steps. Each step is a method call with a clear name. This is the same principle we learned for functions in Chapter 6: give things clear names, keep each unit focused, and let the top-level code tell a story.
Chapter Summary
This chapter bridges the gap between knowing OOP mechanics and using OOP effectively. Here's what you should take away:
Principles guide your thinking. SRP says each class should have one reason to change. OCP says you should extend behavior by adding new classes, not modifying existing ones. Coupling and cohesion give you a vocabulary for evaluating designs — aim for loose coupling and high cohesion.
Patterns solve recurring problems. Strategy lets you swap algorithms at runtime. Observer lets multiple objects react to state changes without tight coupling. Factory centralizes object creation decisions. These aren't the only patterns, but they're the ones you'll use most often.
Dataclasses eliminate busywork. When a class is primarily data with minimal behavior, @dataclass gives you __init__, __repr__, and __eq__ for free. Use field(default_factory=...) for mutable defaults and frozen=True for immutable data.
Code smells are your early warning system. God classes, duplicated code, long parameter lists, and feature envy are signals that your design needs attention. Refactoring — improving design without changing behavior — is a skill you'll use throughout your career.
OOP isn't always the answer. Simple scripts, data pipelines, and one-off transformations are often better served by plain functions. Start simple; refactor into classes when the complexity warrants it.
Design is a practice, not a destination. You won't get it right the first time, and that's fine. The principles and patterns in this chapter give you a vocabulary for discussing design, a toolkit for solving common problems, and — most importantly — the ability to look at code and say, "This could be better," and know how to make it so.
Next up: Chapter 17 takes us into algorithms and data structures, where we'll learn to think about efficiency — not just whether code works, but how fast it works and how much memory it uses.