Chapter 17: Entity-Relationship Modeling — Drawing the Blueprint Before Building

DataField.Dev

30 min read

> Where you are: Part III, Chapter 17 of 40 — the start of Database Design, where theme #1 takes center stage: design is the most important skill. You've mastered SQL; now you learn to design the schema that makes SQL clean, fast, and correct.

In This Chapter

Design first, build second
The three building blocks
Cardinality: how many relate to how many
Participation: must it relate, or may it?
ER diagram notation
Weak entities
The design process
Modeling in practice: tools and habits
From conceptual to logical: the bridge ahead
Common design patterns
From prose to model: a worked extraction
Cardinality, discovered
Participation and its real consequences
A worked model: the library, end to end
Common mistakes
Entity or attribute? The judgment calls
The model as a communication tool
Reviewing and critiquing a model
Progressive project: model your domain
Why design is the most important skill
Summary

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 17: Entity-Relationship Modeling — Drawing the Blueprint Before Building

Where you are: Part III, Chapter 17 of 40 — the start of Database Design, where theme #1 takes center stage: design is the most important skill. You've mastered SQL; now you learn to design the schema that makes SQL clean, fast, and correct.

Learning paths: 💻 📊 🔬 🏗️ — everyone. 💻 developers and 🏗️ DBAs live by schema quality; 🔬 CS students get the formal modeling concepts; 📊 analysts who understand why a schema is shaped a certain way write better queries against it.

Design first, build second

You wouldn't pour a building's foundation before drawing the plans. Yet developers routinely create tables ad hoc — an ALTER TABLE here, a column there — and end up trapped in a structure that fights every new feature (Chapter 14's "table with no rules"). Entity-Relationship (ER) modeling is the discipline of designing your data's structure on paper (or a diagram) before writing any DDL — capturing what things exist, what you know about them, and how they relate, in a form you can critique and refine before it's expensive to change.

An ER model is a conceptual design, independent of any particular database. It answers: What are the things? What do we know about each? How are they connected? Get this right and the tables (Chapter 18), normalization (Chapter 19), and queries (Part II) all fall into place. Get it wrong and no amount of clever SQL rescues you.

The three building blocks

Entities — the "things"

An entity is a thing your system tracks — a noun with independent existence: a customer, a product, an order, an employee. In the model it becomes (roughly) a table. An entity set is the collection of all entities of that type (all customers). In Mercado: customers, products, orders, categories, suppliers, warehouses, employees, reviews, and so on.

Finding entities: scan your requirements for the important nouns. "Customers place orders for products" → three entities: customer, order, product.

Attributes — what you know about an entity

An attribute is a property of an entity — a customer's email, a product's price. Attributes come in several flavors:

Simple (atomic): price, email.
Composite: an attribute made of parts — an address is really line1 + city + state + postal_code. You usually break composites into their parts.
Multi-valued: an entity can have several — a customer's multiple phone numbers or addresses. (These become separate tables — Chapter 18.)
Derived: computable from other data — a customer's total_spent or age. Often not stored (compute it in a query) unless performance demands it (Chapter 20).
Key: an attribute (or set) that uniquely identifies the entity — the basis of the primary key (Chapter 3).

Relationships — how entities connect

A relationship is an association between entities: a customer places an order; a product belongs to a category; an order contains products. Relationships are usually verbs between the noun-entities. They're the heart of the model — and where most design decisions live.

Finding relationships: scan requirements for verbs connecting nouns. "A customer places orders," "an order contains products," "an employee manages employees."

Cardinality: how many relate to how many

The most important property of a relationship is its cardinality — how many entities on each side participate. Three kinds:

One-to-one (1:1) — each A relates to at most one B, and vice versa. Rare; often a sign two entities could be one table (or a deliberate split for security/optional data). Example: an employee and their single parking-space assignment.

One-to-many (1:N) — one A relates to many B, but each B to one A. The most common relationship. In Mercado: one customer has many orders; one category has many products; one order has many shipments.

   customer ───< order          "one customer, many orders"
   (one)         (many)         the "crow's foot" (<) marks the many side

Many-to-many (M:N) — many A relate to many B. An order contains many products, and a product appears in many orders. M:N relationships can't be stored directly with a foreign key; they need a junction table (the resolution you'll do in Chapter 18). In Mercado, order_items is that junction:

   order >───< product          "many orders, many products"
   resolved by the order_items junction table:
   order ───< order_items >─── product

Recognizing M:N relationships and resolving them with junction tables is one of the most important design skills — and a place beginners often go wrong (trying to cram a list into one column).

Participation: must it relate, or may it?

Beyond "how many," a relationship has participation (or optionality):

Total (mandatory): every entity must participate. Every order must have a customer (orders.customer_id is NOT NULL).
Partial (optional): an entity may participate. A customer may have zero orders; an order may have no assigned sales rep (orders.employee_id is nullable).

Participation maps directly to NOT NULL (total) vs. nullable (partial) on the foreign key (Chapter 18), and to whether you use an INNER vs LEFT JOIN to query it (Chapter 6). Getting it right is the difference between "every order has a customer" being enforced or merely hoped.

ER diagram notation

Two notations dominate; learn to read both.

Crow's foot (the practical favorite, used in most tools) draws entities as boxes and relationships as lines, with symbols at each end showing cardinality and participation:

   ┌─────────────┐                      ┌─────────────┐
   │  customers  │──────────┼──────<    │   orders    │
   │             │ (one,                │             │
   │ customer_id │  mandatory)  (many,  │ order_id    │
   │ email       │              optional)│ customer_id │
   └─────────────┘                      └─────────────┘

   Symbols at a line end:
     │   "one (and only one)"        ───│
     ○   "zero (optional)"           ──○
     <   "many" (the crow's foot)    ──<
   Combined:  ──○<  = "zero or many"   ──│<  = "one or many"

Chen notation (the classic academic form) uses rectangles for entities, ovals for attributes, and diamonds for relationships, labeling cardinality with 1/N/M:

   [customers] ──1── <places> ──N── [orders]

Chen is more explicit and great for teaching; crow's foot is more compact and dominates real tools (dbdiagram, DBeaver, draw.io, Lucidchart). This book uses crow's-foot-style ASCII; Appendix B's Mercado diagram does too.

Weak entities

A weak entity can't be identified by its own attributes alone — it depends on another entity for its identity. order_items is weak: a line item has no meaning without its order; it's identified by (order_id, product_id), partly borrowing the order's key. Weak entities typically become tables with a composite key that includes the owner's key — exactly Mercado's order_items and inventory.

The design process

ER modeling fits into a three-stage flow from idea to implementation:

Requirements → Conceptual model. Read the requirements; extract entities (nouns), attributes (properties), and relationships (verbs) with cardinality and participation. Draw the ER diagram. Database-independent — pure thinking about the domain.
Conceptual → Logical model. Turn the ER diagram into tables, keys, and foreign keys (Chapter 18), and normalize (Chapter 19). Still mostly database-independent.
Logical → Physical model. Choose concrete PostgreSQL types, add constraints and indexes, write the DDL (Chapter 14). Database-specific.

This chapter is stage 1; Chapter 18 is the bridge to stage 2. The discipline of doing stage 1 first — thinking in entities and relationships before tables — is what separates designed databases from accreted ones.

Modeling in practice: tools and habits

Knowing the theory of ER modeling is one thing; doing it fluidly in real work is another, and a few practical habits make the difference. The first is where to model. You can model on paper or a whiteboard — excellent for thinking and for collaborative sessions, where the low friction encourages exploration and erasing. For anything you'll keep, digital tools are worth learning: dbdiagram.io (text-to-diagram, version-controllable), draw.io / diagrams.net (free, general-purpose), Lucidchart, and the ER-diagram features built into database tools like DBeaver and pgAdmin (which can even reverse-engineer a diagram from an existing database — invaluable for understanding a schema you've inherited). The text-based tools have a particular advantage: because the diagram is defined in text, it lives in version control alongside your code, and changes to the model are reviewable like any other change.

The second habit is iteration. A first model is a draft, not a deliverable. Sketch it quickly, then critique it (using the review questions above), then refine — and expect several passes. The conceptual model's whole value is that it's cheap to change, so use that cheapness: try a structure, find its flaws, restructure, repeat, all before any DDL exists. Resist the urge to "just start building" once you have a rough model; the extra hour spent refining the diagram routinely saves days of migration pain later. The third habit is naming discipline. Even in the conceptual model, name entities and relationships clearly and consistently (singular or plural for entities — pick one and stick to it; verbs for relationships). Good names make the model communicate, and they carry forward into the table names you'll live with for years.

The fourth, easily-forgotten habit is to keep the model alive. A diagram drawn once and never updated drifts from the actual schema until it's worse than useless — confidently wrong. Treat the model as documentation that evolves with the database: when the schema changes (Chapter 22), update the diagram. A current ER diagram is one of the most valuable pieces of documentation a project can have, because it lets any new team member grasp the data's structure at a glance, far faster than reading a hundred CREATE TABLE statements. Mercado's diagram lives in Appendix B for exactly this reason. The habits — model in the right tool, iterate, name well, keep it current — are what turn ER modeling from a textbook exercise into a working practice that pays dividends across a system's whole life.

From conceptual to logical: the bridge ahead

This chapter is stage one of the three-stage design process — the conceptual model, pure thinking about the domain, database-independent. It's worth previewing how it connects to what comes next, because the stages form a pipeline and each builds on the last. The conceptual ER model you've learned to build (entities, attributes, relationships, cardinalities, participations) becomes the input to the logical model of Chapter 18, where mechanical, learnable rules turn it into concrete tables, columns, primary keys, and foreign keys — including the resolution of M:N relationships into junction tables and the mapping of the link-entities you identified. Then Chapter 19's normalization refines that logical model to eliminate redundancy, and Chapter 14's DDL realizes it as physical PostgreSQL tables with types, constraints, and indexes.

The reassuring news, which the next chapter will demonstrate, is that once the conceptual model is right, the translation to tables is largely mechanical. There are rules — an entity becomes a table, a 1:N relationship becomes a foreign key on the many side, an M:N becomes a junction table, a multi-valued attribute becomes its own table — and applying them faithfully turns a good ER diagram into a good schema almost automatically. This is exactly why the conceptual stage deserves such care: it's where the genuine design thinking happens, the judgment about what things exist and how they relate, and once that thinking is sound, the rest is craft applied to a good foundation. Conversely, no amount of skill at the later stages rescues a flawed conceptual model — you can apply the table-mapping rules perfectly to a wrong model and get a perfectly-built wrong schema.

So the effort you invest in this chapter's skills — extracting entities from prose, determining cardinality in both directions, deciding participation, judging entity-versus-attribute, critiquing the result — is the highest-leverage effort in all of database design. It's the stage where getting it right makes everything downstream easy, and getting it wrong makes everything downstream a struggle against a bad foundation. Chapter 18 will feel almost mechanical because this chapter did the hard part. That's the design process working as intended: think hard about the domain first, in the cheap and flexible conceptual model, and let the disciplined translation rules carry that thinking faithfully into a sound, well-built database.

Common design patterns

Certain relationship shapes recur in every domain. Recognizing them speeds design:

Order–line-items (M:N via junction): orders ↔ products through order_items. Any "X contains many Y, and Y appears in many X" is this pattern.
Category hierarchy (self-referencing 1:N): categories.parent_category_id → a tree (Chapters 11, 18, 21).
Org chart (self-referencing 1:N): employees.manager_id.
User–roles (M:N): users ↔ roles through a user_roles junction (Chapter 32's permissions).
Tagging (M:N): items ↔ tags through an item_tags junction.
Polymorphic association (a comment that can belong to a post or a photo) — trickier; covered in Chapter 21.

You'll study these as reusable data modeling patterns in Chapter 21; for now, learn to spot them in requirements.

From prose to model: a worked extraction

The skill of ER modeling is largely the skill of reading requirements — usually written in plain prose by people who aren't thinking about databases — and extracting the entities, attributes, and relationships hidden in them. Let's do it on a concrete paragraph, because this translation from human language to structured model is where design begins and where it most often goes wrong.

Suppose a client describes their need like this: "We run a community library. Members can borrow books; each loan has a due date and we track when the book is returned. A book can have several authors, and an author writes many books. Books belong to a genre. Members sometimes place holds on books that are currently out, and they join a waiting list. We charge fines for overdue returns." Read it once for nouns, once for verbs, once for properties.

The nouns — the candidate entities — leap out: member, book, loan, author, genre, hold, fine. Each is a "thing with independent existence" the system must track, so each is likely a table. The verbs reveal relationships: members borrow books (through loans), books have authors, books belong to a genre, members place holds, the library charges fines. The properties attach to the nouns: a loan has a due date and a return date; a book presumably has a title and the genre it belongs to; a member has a name and contact details; a fine has an amount. In a single careful reading, the shape of the model emerges from the prose.

But the careful reading also surfaces the interesting decisions, which is where modeling becomes craft rather than transcription. "A book can have several authors, and an author writes many books" is explicitly a many-to-many relationship — neither a column on books nor a column on authors can hold it; it needs a junction. "Members can borrow books" with "each loan has a due date" tells you the loan isn't just a link but an entity in its own right (it has its own attributes — due date, return date), so it's not a bare M:N junction but a full entity with relationships to both member and book. "Holds" and a "waiting list" hint at ordering (position in the queue) that you'll need to capture. Each of these is a judgment the prose implies but doesn't spell out, and catching them is exactly what separates a model that fits the real need from one that technically stores the data but fights every feature. This extraction — nouns to entities, verbs to relationships, properties to attributes, with special attention to the M:N relationships and the link-entities-with-attributes — is the core move you'll make on every modeling task.

Cardinality, discovered

Cardinality — whether a relationship is 1:1, 1:N, or M:N — is the single most consequential property of a relationship, and learning to determine it from requirements (rather than guessing) is essential. The reliable technique is to ask the question in both directions, because cardinality is really two questions, one per side.

Take member and loan. Ask: "Can one member have many loans?" Yes — a member borrows many books over time. Now ask the reverse: "Can one loan belong to many members?" No — a loan is one member borrowing one book. One-many in one direction, one in the other: this is a one-to-many relationship (one member, many loans). Now take book and author. "Can one book have many authors?" Yes (co-authored books). "Can one author have many books?" Yes (prolific authors). Many in both directions: this is many-to-many, requiring a junction. Now take member and library-card, if each member has exactly one card and each card belongs to one member: one in both directions, a one-to-one (rare, and often a hint the two could merge or were split deliberately).

The both-directions test never fails you, and it catches the error beginners make most: assuming a relationship is 1:N when it's actually M:N. "Books belong to a genre" sounds 1:N (one genre, many books) — and if each book has exactly one genre, it is. But if the requirement were "books can be tagged with multiple genres," the reverse question ("can one book have many genres?") flips to yes, and it's M:N, needing a junction. The cardinality isn't a property of the entities; it's a property of how they relate in this domain, and only asking both directions reveals it. Get this wrong and you build the wrong structure: a 1:N foreign key where you needed an M:N junction means you literally cannot store the second author, the second genre, the second value — and you discover this only when real data arrives that the schema can't hold. Always ask both directions, explicitly, for every relationship.

Participation and its real consequences

If cardinality is "how many," participation is "must it, or may it" — and it has consequences that ripple all the way into your queries. A relationship is total (mandatory) on a side if every entity on that side must participate, and partial (optional) if it merely may. Every order must have a customer (total participation — an order without a customer is meaningless); a customer may have zero orders (partial — a newly registered customer has none yet). These aren't abstractions; each maps to a concrete implementation decision and a concrete query behavior.

Participation maps directly to nullability of the foreign key (Chapter 18). Total participation means the foreign key is NOT NULL — the database enforces that every order has a customer, making "an order with no customer" structurally impossible. Partial participation means the foreign key is nullable — an order may have no assigned sales rep, so orders.employee_id allows NULL. The participation you decide at modeling time becomes a NOT NULL constraint (or its absence) at build time, which is the difference between a rule the database guarantees and a rule you merely hope holds. Deciding participation deliberately is therefore deciding which integrity rules your schema will enforce.

Participation also determines how you query the relationship, connecting modeling all the way back to Chapter 6's joins. When you want "all orders with their customer," an INNER JOIN is correct because participation is total — every order has a customer, so no order is lost. But when you want "all customers with their orders," you need a LEFT JOIN, because participation is partial — some customers have no orders, and an inner join would silently drop them. The participation you modeled predicts which join preserves the rows you care about. This is why getting participation right at design time pays off repeatedly: it sets the NOT NULL constraints that guarantee integrity, and it tells every future query-writer whether to reach for an inner or outer join. A model that doesn't think through participation produces both nullable columns that should be required (silent integrity gaps) and confusion about which join to use (silent missing rows). Decide, for every relationship, whether each side must participate or merely may — and write it on the diagram.

A worked model: the library, end to end

Let's assemble the library requirements into a complete conceptual model, making every decision explicit, because seeing one done end to end is worth more than any list of rules. From the extraction above, the entities are: member, book, author, genre, loan, hold, fine. Now we work through each relationship with cardinality (both directions) and participation.

Book–genre: one genre has many books; one book has one genre (per the requirement). One-to-many. A book must have a genre (total on the book side); a genre may have zero books (partial). → a genre_id foreign key on book, NOT NULL. Book–author: many-to-many (both directions yes), so a junction entity book_authors linking the two, each side total within the junction. Member–loan: one member, many loans; each loan one member; one-to-many; a loan must have a member (total), a member may have no loans (partial). → a NOT NULL member_id on loan. Book–loan: one book, many loans over time; each loan one book; one-to-many; total on the loan side. → a NOT NULL book_id on loan. Member–hold and book–hold: holds link members and books, but a hold has its own attributes (position in queue, date placed), so it's a link-entity with one-to-many from each of member and book. Loan–fine: a fine arises from an overdue loan; one loan might incur one fine (or arguably many over time); a fine must reference a loan (total).

   genre ──< book >──< book_authors >── author      (book↔author is M:N via junction)
              │
   member ──< loan >── book                          (loan links member & book; has its own attrs)
              │
   member ──< hold >── book                          (hold links member & book; has position)
   loan ──< fine                                     (a fine belongs to a loan)

Notice the decisions the model captured that the prose only implied. The loan and the hold became full entities, not mere junctions, because they carry their own attributes (due date, return date, queue position) — a bare M:N junction has only the two foreign keys, but a relationship that describes itself needs to be an entity. The book–author relationship became a true junction (book_authors) because it's M:N. Every "must" became total participation and thus a NOT NULL foreign key; every "may" became partial and nullable. This conceptual model — entities, relationships, cardinalities, participations, all decided deliberately — is now a blueprint you can critique before a single table exists, and a blueprint Chapter 18 will turn mechanically into tables. The hard thinking happened here, on paper, where changing your mind costs nothing; the building, next chapter, is comparatively mechanical because the design is sound.

Common mistakes

Cramming a multi-valued attribute into one column (comma-separated tags, a list of phones in one field). Multi-valued → a separate table. (Chapters 18, 19.)
Missing an M:N relationship — trying to model "orders contain products" with a single foreign key. M:N needs a junction table.
Confusing an attribute with an entity (or vice versa). If a "thing" has its own attributes and relationships, it's probably an entity (a table), not an attribute.
Ignoring participation — not deciding whether a relationship is mandatory, then discovering nullable-vs-not-null bugs later.
Designing tables before modeling — jumping to CREATE TABLE without the conceptual model, and baking in a shape you'll regret.

Entity or attribute? The judgment calls

The trickiest modeling decisions are often not about relationships but about a deceptively simple question: is this thing an entity (its own table) or an attribute (a column on another entity)? The prose rarely tells you directly, and getting it wrong creates structures that fight you later. A few principles resolve most cases.

The core test: does this thing have its own attributes, its own relationships, or its own independent existence? If yes, it's an entity. Consider "genre." If a genre is just a label — a word like "fiction" — it might seem like an attribute (a genre text column on book). But if genres have their own properties (a description, a parent genre, a display order), or if you need to ensure consistency (so it's "Science Fiction" everywhere, not "Sci-Fi" in some rows and "SF" in others), or if you'll relate other things to genres, then genre is an entity with its own table, referenced by a foreign key. The same thing can be an attribute in one design and an entity in another, depending on how much the domain needs to know about it. A customer's country is an attribute if you just store the name; it's an entity if you track country-level data (tax rates, shipping zones, currency) and want referential integrity on which countries are valid.

A related judgment is when to extract a composite attribute into its own entity. An address is a composite attribute (line, city, state, postal code). If each customer has exactly one address you never share or reuse, it can be columns on the customer. But the moment a customer can have several addresses (billing, shipping, multiple shipping), or addresses are shared, or you query them independently, address becomes an entity with its own table and a relationship to customer — which is exactly Mercado's addresses table. The signal is multiplicity and independent use: a single, owned, never-queried-alone composite can be columns; anything multi-valued or independently meaningful wants its own table. This is the multi-valued-attribute rule (a value an entity can have several of becomes its own table) generalized into a habit of mind.

These judgments have no mechanical answer because they depend on what the domain actually needs, which is why modeling is design rather than transcription. The guiding question to keep asking is: how much does my system need to know about this thing, and in how many places? The more a thing has its own properties, participates in its own relationships, must be kept consistent, or appears multiply, the more it wants to be an entity. The less — a simple, single, owned value — the more it's just an attribute. Making these calls well is the heart of conceptual design, and like all design judgment, it improves with practice and with the willingness to ask "what will this need to do later?" before committing.

The model as a communication tool

An ER diagram is not only a blueprint for tables — it's a communication artifact, and this is a dimension of its value that beginners overlook. A good diagram lets you discuss the design with people who will never write SQL: the client describing the library, the product manager defining features, the teammate reviewing your work. Because it's visual and free of implementation detail, an ER diagram can be put in front of a domain expert who can say "no, actually a book can be in multiple genres" or "a member can have a family membership covering several people" — catching a fundamental modeling error while it's still just a line on a diagram, not a deployed schema holding production data.

This communicative power is precisely why the conceptual model comes first and stays database-independent. By deliberately excluding types, indexes, and PostgreSQL specifics, the conceptual model stays at the level where domain experts can engage with it. "Customers place orders, orders contain products" is a sentence anyone in the business can validate; "orders.customer_id integer NOT NULL REFERENCES customers" is not. The conceptual model is the shared language between the people who understand the domain and the people who will build the database — and often those are different people, or the same person wearing different hats at different times. Drawing the model and walking stakeholders through it surfaces misunderstandings early, when they're cheap to fix.

The practical discipline this suggests: treat your ER diagram as something to show and discuss, not just a private step on the way to DDL. Walk a colleague through it. Read it aloud to the person who described the requirements ("so, a member can have many loans, and each loan is for exactly one book — is that right?"). The act of explaining the model in plain language, and inviting correction, catches errors that no amount of solitary cleverness would. Databases are expensive to change once they hold data (Chapter 22), so the cheapest possible place to find a design flaw is in a conversation over a diagram, before any table exists. The diagram earns its keep not only as a blueprint but as the medium for that conversation.

Reviewing and critiquing a model

Because the conceptual model is where design errors are cheapest to fix, learning to critique a model — your own or someone else's — is a core skill. A good ER review runs through a checklist of questions that surface the most common and most costly mistakes, and developing the habit of asking them turns modeling from a one-shot guess into an iterative refinement.

Start with the relationships, since that's where most errors hide. For each one, ask: is the cardinality right in both directions? (The both-directions test catches mislabeled M:N relationships.) Is participation decided on each side, and does it match reality? (An overlooked optional side becomes a wrongly-NOT NULL column.) Are there any M:N relationships not yet resolved into junctions? Are there relationships that carry their own attributes and should therefore be entities, not bare links? Then examine the entities: does each have a clear identity (a key)? Are there things modeled as attributes that should be entities (multi-valued, independently meaningful, needing consistency)? Are there composite attributes hiding a structure? Finally, check completeness against the requirements: can the model actually answer every question the system needs to answer, and store every fact it needs to store? Walk through the key use cases mentally and confirm the data is there to support them.

The most valuable review question is often the forward-looking one: what's likely to change, and does the model accommodate it gracefully? If the client says each book has one genre today but might want multiple later, modeling it as M:N now (even though 1:N would suffice today) may save a painful migration. If a relationship is 1:1 today but might become 1:N, note it. Anticipating plausible evolution — without over-engineering for changes that will never come — is judgment that develops with experience, but even asking the question puts you ahead of designs that bake in today's assumptions as if they were permanent. A model reviewed against these questions, refined, and reviewed again is dramatically more likely to be one you don't regret. And every flaw caught here, on the diagram, is a flaw that never becomes a 2 a.m. migration on a production database. That asymmetry — cheap to fix now, expensive to fix later — is the entire economic case for taking the conceptual model seriously, and for reviewing it as carefully as you'd review code.

Progressive project: model your domain

Time to formalize the design you've been sketching since Chapter 1:

List your entities (the nouns): for a library — member, book, loan, author, category, hold, fine.
For each entity, list its attributes, marking the key, and noting any multi-valued (→ future table) or derived (→ maybe don't store) ones.
Identify the relationships (verbs) between entities, with cardinality (1:1 / 1:N / M:N) and participation (mandatory/optional). Watch for M:N (e.g., a book has many authors, an author writes many books → junction).
Draw the ER diagram in crow's-foot notation (on paper, or a free tool like dbdiagram.io / draw.io).
Identify which patterns appear (hierarchy? M:N junctions? tagging?).

Save the diagram and notes in your project. Chapter 18 turns it into tables.

Why design is the most important skill

This chapter opens Part III, and with it, theme #1 — design is the most important skill — moves from a recurring reminder to the main event. It's worth stating plainly why this theme has been emphasized from the very first chapter and why it deserves its own part of the book. The reason is asymmetry of cost. A query you write today, you can rewrite tomorrow in minutes if it's wrong. A schema you design today, once it holds production data and serves live traffic, is something you will live with for years — and changing it (Chapter 22) is among the hardest, riskiest operations in all of software, requiring careful migrations, backward-compatibility dances, and the constant risk of corrupting or losing data. The decisions you make at the conceptual modeling stage echo for the entire life of the system.

This asymmetry is why the discipline of modeling first — thinking in entities and relationships before touching CREATE TABLE — is not academic fussiness but hard-won professional wisdom. Every experienced engineer has a story about a schema designed in haste that fought every subsequent feature, that required heroic migrations to fix, that made simple requests inexplicably hard because the data was shaped wrong. And every experienced engineer has felt the opposite: working with a well-designed schema where new features slot in naturally, queries are clean, and the data's structure anticipated what you needed. The difference between those two experiences is almost entirely the quality of the conceptual design — the thinking done, or skipped, at the stage this chapter teaches.

The further truth is that good design makes everything else easier. A well-designed schema makes SQL clean (the joins follow the relationships you modeled), makes integrity automatic (the constraints follow the participation you decided), makes performance achievable (the structure supports the indexes you'll need), and makes the system comprehensible (the tables mirror the domain). Conversely, a poorly-designed schema makes every query awkward, every rule a battle, every optimization a workaround, and the whole system harder to reason about. Design isn't one skill among many; it's the foundation that determines how much the other skills can accomplish. The cleverest SQL in the world, written against a badly-modeled schema, is lipstick on a structural problem.

So as you move through Part III — this chapter's conceptual modeling, Chapter 18's translation to tables, Chapter 19's normalization, Chapter 20's deliberate denormalization, Chapter 21's patterns, Chapter 22's evolution — hold onto why it matters. You are learning to make the decisions that everything else in a database-backed system rests upon. The SQL mastery of Part II was essential, but it expresses queries against whatever structure exists; Part III is where you create that structure, and create it well. That is why design is the most important skill, and why this part, though it teaches no new SQL syntax, may be the most valuable in the book. Get the design right, and the rest of your work as a database practitioner becomes immeasurably easier; get it wrong, and you'll spend years working around decisions you could have gotten right in an afternoon with a diagram.

Summary

ER modeling designs your data conceptually — before building tables — in terms of entities (things/nouns), attributes (properties; simple/composite/multi-valued/derived/key), and relationships (associations/verbs). A relationship's cardinality (1:1, 1:N, M:N) and participation (total/mandatory vs. partial/optional) are its defining properties; M:N relationships require a junction table, and multi-valued attributes become separate tables. Weak entities depend on an owner for identity (composite keys). Diagrams use crow's-foot (practical) or Chen (academic) notation. The design process flows requirements → conceptual (ER) → logical (tables) → physical (DDL), and doing the conceptual step first is the discipline that produces databases you don't regret. Recurring patterns (order-items, hierarchies, M:N junctions, tagging) speed the work.

You can now: - Identify entities, attributes (all five kinds), and relationships from requirements. - Determine a relationship's cardinality (1:1/1:N/M:N) and participation (total/partial). - Recognize that M:N needs a junction table and multi-valued attributes need their own table. - Read and draw crow's-foot (and Chen) ER diagrams. - Follow the requirements → conceptual → logical → physical design process. - Spot common design patterns in a new domain.

What's next. Chapter 18 — From ER Diagram to Tables — the mechanical, learnable rules for turning your ER model into concrete tables, keys, and foreign keys (including resolving M:N with junction tables and mapping inheritance), producing the DDL that realizes your design.

Practice in exercises.md, test yourself with the quiz, apply it in the case studies, review the key takeaways, and go deeper with further reading.