> "A picture is worth a thousand words, but a graph is worth a thousand pictures."
Prerequisites
- 8
- 12
Learning Objectives
- Define a graph formally as a pair (V, E) and translate a real-world network into vertices and edges.
- Use core graph terminology precisely: degree, adjacency, path, cycle, and connectedness.
- Classify graphs as directed, weighted, bipartite, complete, or simple, and choose the right model for a problem.
- State and prove the handshaking lemma and apply its parity corollary to rule out impossible graphs.
- Recognize when two drawings depict the same graph (isomorphism) and explain why deciding this is subtle.
- Model a social network as a graph and implement a from-scratch Graph data structure in Python.
In This Chapter
- Overview
- Learning Paths
- 27.1 What is a graph?
- 27.2 Terminology: the language every algorithm assumes
- 27.3 Types of graphs: choosing the right model
- 27.4 The handshaking lemma
- 27.5 Graph isomorphism (an introduction)
- 27.6 Modeling with graphs: social networks, maps, and the web
- Project Checkpoint: starting graphs.py with a Graph class
- Summary
- Spaced Review
- What's Next
Chapter 27: Introduction to Graphs
"A picture is worth a thousand words, but a graph is worth a thousand pictures." — common saying among network scientists
Overview
Open any application you use and you are looking at a graph. Your contacts list is a graph of people connected by who-knows-whom. A road map is a graph of intersections connected by streets. The web is a graph of pages connected by links. Your program's modules form a graph of dependencies; the call stack is a path through a graph of functions; a database's foreign keys stitch tables into a graph of references. When a recruiter says "we need someone who can think about systems," what they often mean is someone who can see the graph hiding inside a problem and then reach for the right algorithm.
That is the payoff of this part of the book, and it starts here. A graph is the single most useful structure in computer science precisely because so many problems — routing, scheduling, matching, ranking, recommending, detecting fraud — turn out to be the same problem once you strip away the surface and look at the dots and the lines. Learn graphs once and you get a discount on all of them.
This chapter is the vocabulary chapter. Before you can run breadth-first search to find "degrees of separation" (Chapter 28) or Dijkstra's algorithm to find a route (Chapter 29), you need to say exactly what a graph is, name its parts without ambiguity, and prove your first theorem about them. We will build that vocabulary the way the rest of the book builds everything: starting from a concrete example you already understand — a small social network — generalizing to a precise definition, and returning to code you could ship.
You already have the raw material. In Chapter 12 you represented a relation as a digraph — dots with arrows — and reasoned about reachability. A graph is that idea, promoted from a side-view of relations to a first-class object of study, with its own theorems and its own algorithms.
🔗 Connection. A graph is, almost exactly, a relation (Chapter 12) drawn as a picture. An edge $\{u, v\}$ is an unordered pair; a relation's pair $(u, v)$ is ordered. That one difference — ordered versus unordered — is the difference between a directed and an undirected graph, and we will make it precise in §27.3. The set-theory of Chapter 8 (subsets, unordered and ordered pairs, cardinality) is the foundation underneath all of it.
In this chapter you will learn to:
- Translate a network — people, roads, web pages, tasks — into a precise mathematical object.
- Use the terminology (vertex, edge, degree, neighbor, path, cycle, connected) that every graph algorithm assumes you know.
- Tell directed from undirected, weighted from unweighted, simple from multigraph, bipartite from complete — and pick the model that fits.
- Prove the handshaking lemma and use it to instantly reject claims like "a party where everyone shook exactly three hands had seven guests."
- Decide, at least informally, when two differently-drawn graphs are really the same graph.
- Implement a
Graphclass from scratch and use it to begin the Toolkit'sgraphs.pymodule.
Learning Paths
🏃 Fast Track: If you have seen graphs in a data-structures course, skim §27.1–§27.2 for our exact notation (it is used by every later chapter), read the handshaking-lemma proof in §27.4 carefully, and jump to the Project Checkpoint to lock in the
Graphclass. Do the ⭐⭐ and ⭐⭐⭐ exercises.📖 Standard Path: Read in order. Work every
🔄 Check Your Understandingbefore moving on. Draw every example graph yourself — graph intuition lives in your pencil, not your eyes.🔬 Deep Dive: After the chapter, prove the two handshaking corollaries (§27.4) rigorously, then attempt the isomorphism-invariant exercises in §27.5 and convince yourself why "just try all relabelings" is not a practical algorithm.
27.1 What is a graph?
Start with something concrete. Five people — Ana, Ben, Cam, Dev, and Eve — are members of a small study group. Some of them are friends; some are not. The friendships are: Ana–Ben, Ana–Cam, Ben–Cam, Cam–Dev, and Dev–Eve. That is the whole social world we care about.
Notice what we just did. We named a set of things (the five people) and a set of connections between pairs of those things (the five friendships). We threw away everything else — ages, names' spellings, who sits where — because none of it matters for the questions we want to ask, such as "can a message pass from Ana to Eve through mutual friends?" That act of keeping only the things and their pairwise connections is the act of building a graph. Everything in this part of the book is a variation on it.
A natural way to record this is a picture: draw a dot for each person and a line between two dots when those people are friends.
Ana
/ \
Ben----Cam
\
Dev
\
Eve
The dots are the things; the lines are the connections. The picture is suggestive, but a picture is not a definition — two people might draw it differently, and a computer cannot store "a picture." We need to pin the object down with sets, which is exactly what Chapter 8 prepared us to do.
Definition (graph). A graph is a pair $G = (V, E)$ where $V$ is a non-empty finite set whose elements are called vertices (singular: vertex; also called nodes), and $E$ is a set of edges, where each edge is an unordered pair $\{u, v\}$ of distinct vertices with $u, v \in V$. We say the edge $\{u, v\}$ joins $u$ and $v$.
For the study group: $$V = \{\text{Ana}, \text{Ben}, \text{Cam}, \text{Dev}, \text{Eve}\},$$ $$E = \big\{\{\text{Ana},\text{Ben}\}, \{\text{Ana},\text{Cam}\}, \{\text{Ben},\text{Cam}\}, \{\text{Cam},\text{Dev}\}, \{\text{Dev},\text{Eve}\}\big\}.$$
That pair of sets is the graph. The drawing is just one rendering of it; the sets are the truth. Because an edge is an unordered pair, $\{\text{Ana}, \text{Ben}\}$ and $\{\text{Ben}, \text{Ana}\}$ are the same edge — friendship, in this model, has no direction. We will add direction back deliberately in §27.3 when we need it.
💡 Intuition: A graph is a relationship made into an object. The vertices are the nouns; the edges are a single verb ("is friends with", "links to", "depends on"). Choosing what becomes a vertex and what becomes an edge is the entire modeling decision — and it is a decision, not a given. The same situation can yield different graphs depending on the question you are asking (we return to this in §27.6).
The definition above describes the most common kind of graph, which has a name worth stating now because most theorems quietly assume it.
Definition (simple graph). A simple graph is a graph with no loops (no edge from a vertex to itself, i.e. no $\{v, v\}$) and no parallel edges (no two distinct edges joining the same pair of vertices). Equivalently: $E$ is a set of two-element subsets of $V$, so each pair of vertices is joined by at most one edge.
Unless we say otherwise, "graph" in this book means "simple graph." When a model genuinely needs loops or parallel edges (say, a road map where two distinct bridges connect the same pair of islands), we call the result a multigraph and say so explicitly.
How big can a simple graph's edge set be? Each edge is a 2-element subset of $V$, and the number of 2-element subsets of an $n$-element set is the binomial coefficient $\binom{n}{2}$ from Chapter 16. So a simple graph on $n$ vertices has at most $$\binom{n}{2} = \frac{n(n-1)}{2}$$ edges. This little fact recurs constantly: it is the size of a "fully connected" network, the worst-case number of edges an algorithm might have to scan, and (we will see) the maximum of the handshaking sum.
🔗 Connection. "$E$ is a set of 2-element subsets of $V$" is pure Chapter 8. An edge is an element of the power set $\mathcal{P}(V)$ — specifically one of the 2-element members. Counting those members is the Chapter 16 result $\binom{n}{2}$. Graphs are not a new universe; they are sets and counting wearing a diagram.
Here is the study-group graph as data in Python. We store the vertices in a set and the edges as a set of frozensets, because an edge is unordered and we need it to be hashable so it can live in a set.
V = {"Ana", "Ben", "Cam", "Dev", "Eve"}
E = {frozenset({"Ana", "Ben"}), frozenset({"Ana", "Cam"}),
frozenset({"Ben", "Cam"}), frozenset({"Cam", "Dev"}),
frozenset({"Dev", "Eve"})}
print(len(V), "vertices and", len(E), "edges")
print(frozenset({"Ben", "Ana"}) in E) # same edge, written backwards
# Expected output:
# 5 vertices and 5 edges
# True
The second line prints True because frozenset({"Ben", "Ana"}) and frozenset({"Ana", "Ben"}) are equal — that is exactly the "unordered" in the definition, enforced by the data structure. A frozenset of two names is our faithful encoding of an undirected edge.
🔄 Check Your Understanding 1. Write $V$ and $E$ for a graph of three vertices $a, b, c$ in which every pair is joined by an edge. How many edges is that, and does it match $\binom{3}{2}$? 2. Why did we store edges as
frozensetrather than as tuples("Ana", "Ben")? 3. Is $\{v, v\}$ allowed as an edge in a simple graph? What is such an edge called, and what kind of graph permits it?
Answers
- $V = \{a, b, c\}$, $E = \{\{a,b\}, \{a,c\}, \{b,c\}\}$ — three edges, and $\binom{3}{2} = \frac{3\cdot 2}{2} = 3$. ✓ 2. A tuple is ordered, so
("Ana","Ben")and("Ben","Ana")would be two different objects — but they are the same undirected edge. Afrozensetis unordered and hashable, so it both deduplicates and can be stored in a set. (An ordered tuple is exactly what we want for a directed edge in §27.3.) 3. No. $\{v, v\}$ is a loop; only a multigraph (or a graph explicitly allowing loops) permits it.
27.2 Terminology: the language every algorithm assumes
Graph theory has a lot of vocabulary, and skipping it is a false economy: every algorithm in the next seven chapters is described in these words, and a single misread term ("path" vs. "walk", "connected" vs. "complete") can flip an algorithm's meaning. We will introduce the terms in the order you meet them when you actually trace through a graph, anchoring each to the study-group example.
Adjacency and neighbors
Two vertices $u$ and $v$ are adjacent if $\{u, v\} \in E$ — that is, if an edge joins them. An edge $e = \{u, v\}$ is incident to each of its two endpoints $u$ and $v$. The set of all vertices adjacent to a vertex $v$ is its neighborhood, written $N(v)$: $$N(v) = \{\, u \in V \mid \{u, v\} \in E \,\}.$$ In the study group, $N(\text{Cam}) = \{\text{Ana}, \text{Ben}, \text{Dev}\}$: Cam is friends with three people. Eve's neighborhood is $N(\text{Eve}) = \{\text{Dev}\}$ — a single friend.
Degree
The degree of a vertex $v$, written $\deg(v)$, is the number of edges incident to it. In a simple graph (no loops, no parallel edges) this equals the size of its neighborhood: $$\deg(v) = \lvert N(v) \rvert.$$
Definition (degree). In a simple graph $G = (V, E)$, the degree $\deg(v)$ of a vertex $v$ is the number of edges incident to $v$, equivalently the number of neighbors $\lvert N(v)\rvert$. A vertex of degree 0 is isolated; a vertex of degree 1 is a leaf (or pendant vertex).
The degrees in our study group:
| Vertex | Neighbors | Degree |
|---|---|---|
| Ana | Ben, Cam | 2 |
| Ben | Ana, Cam | 2 |
| Cam | Ana, Ben, Dev | 3 |
| Dev | Cam, Eve | 2 |
| Eve | Dev | 1 |
Degree is the most important local statistic of a graph. In a social network it is a person's number of friends — a first, crude measure of influence. In a road map it is the number of streets meeting at an intersection. In a dependency graph it is how many things a module touches. Whole subfields (centrality, scale-free networks) are built on the distribution of degrees.
Walks, paths, and cycles
Now we move from local (one vertex) to global (getting around the graph). Suppose we want to pass a message from Ana to Eve. We need a route along edges.
Definition (walk, path, cycle). A walk of length $k$ from $v_0$ to $v_k$ is a sequence of vertices $v_0, v_1, \dots, v_k$ in which each consecutive pair $\{v_{i-1}, v_i\}$ is an edge. Its length is the number of edges, $k$. A path is a walk with no repeated vertex. A cycle is a walk of length at least 3 that starts and ends at the same vertex and otherwise repeats no vertex.
The distinctions matter. A walk may wander and repeat (Ana–Cam–Ana–Ben is a walk). A path may not reuse a vertex (Ana–Cam–Dev–Eve is a path of length 3). A cycle is a closed path (Ana–Ben–Cam–Ana is a cycle of length 3). When people say "the shortest path between two people," they mean a path, and they mean the one with the fewest edges — which is exactly what breadth-first search computes in Chapter 28.
⚠️ Common Pitfall: "Path" is overloaded. In casual speech and in some textbooks, "path" means any route, repeats allowed (what we call a walk). In this book — and in most algorithm statements — a path has no repeated vertices. When a result says "there is a path from $u$ to $v$," it is also true that there is a walk; the converse needs an argument (you can always shorten a walk to a path by snipping out any loop, an idea you will prove in the exercises). Read the definition local to whatever you are studying.
For the study group, is there a path from Ana to Eve? Yes: Ana–Cam–Dev–Eve. Its length is 3. Is there a shorter one? No edge skips ahead, so 3 is the minimum — Ana and Eve are "three degrees apart."
Connectedness
Definition (connected). A graph is connected if for every pair of vertices $u, v$ there is a path from $u$ to $v$. A graph that is not connected splits into maximal connected pieces called connected components.
Our study group is connected: from any person you can reach any other through friendships (check a few). If we deleted the edge $\{\text{Cam}, \text{Dev}\}$, the graph would break into two components, $\{\text{Ana}, \text{Ben}, \text{Cam}\}$ and $\{\text{Dev}, \text{Eve}\}$ — Ana could no longer reach Eve at all. Connectedness is a yes/no property of the whole graph; "is my network in one piece?" is one of the first questions you ask of any graph, and Chapter 28's traversal algorithms answer it directly.
🔗 Connection. "Reachable by a path" is an equivalence relation on vertices (Chapter 12): it is reflexive (a length-0 path from $v$ to itself), symmetric (reverse the path — edges are undirected), and transitive (concatenate two paths, then snip repeats). Its equivalence classes are exactly the connected components. The partition theorem from Chapter 12 is why components never overlap and always cover $V$.
Let us compute these quantities in code, on a tiny adjacency-list representation (a dictionary from each vertex to the set of its neighbors). We will formalize this representation in Chapter 28; here it is just a convenient way to use the terminology we just defined.
adj = {"Ana": {"Ben", "Cam"}, "Ben": {"Ana", "Cam"},
"Cam": {"Ana", "Ben", "Dev"}, "Dev": {"Cam", "Eve"},
"Eve": {"Dev"}}
degree = {v: len(nbrs) for v, nbrs in adj.items()}
print(degree)
print("Cam's neighbors:", sorted(adj["Cam"]))
print("Most-connected:", max(degree, key=degree.get))
# Expected output:
# {'Ana': 2, 'Ben': 2, 'Cam': 3, 'Dev': 2, 'Eve': 1}
# Cam's neighbors: ['Ana', 'Ben', 'Dev']
# Most-connected: Cam
The dictionary comprehension reads each vertex's neighbor set and records its size — that is the degree, straight from the definition $\deg(v) = \lvert N(v)\rvert$. max(degree, key=degree.get) returns the vertex with the largest degree, our crude "most influential person." Cam, with three friends, wins.
🔄 Check Your Understanding 1. List one walk, one path, and one cycle in the study-group graph, and give each one's length. 2. What is $\deg(\text{Ben})$ and what is $N(\text{Ben})$? 3. If you delete vertex Cam (and all edges touching it), is the remaining graph connected? Into how many components does it fall?
Answers
- Walk: Ana–Ben–Ana–Cam (length 3, repeats Ana). Path: Ana–Cam–Dev–Eve (length 3, no repeats). Cycle: Ana–Ben–Cam–Ana (length 3). (Other answers are fine.) 2. $\deg(\text{Ben}) = 2$ and $N(\text{Ben}) = \{\text{Ana}, \text{Cam}\}$. 3. Deleting Cam removes edges $\{$Ana,Cam$\}$, $\{$Ben,Cam$\}$, $\{$Cam,Dev$\}$. What remains: edge Ana–Ben, edge Dev–Eve, and nothing linking the two. So the graph is not connected; it falls into two components, $\{$Ana, Ben$\}$ and $\{$Dev, Eve$\}$. (Cam is a cut vertex — a single point of failure.)
27.3 Types of graphs: choosing the right model
The plain undirected simple graph is the default, but real problems come in flavors, and matching the flavor to the problem is most of the skill. We will meet four variations. Each adds exactly one feature to the basic definition, and each unlocks a different class of problems.
Directed graphs
Friendship is mutual, but plenty of relations are not. "Follows" on a social platform, "links to" on the web, "depends on" between software modules, "is a prerequisite for" between courses — all have a direction. To model them we let edges be ordered pairs.
Definition (directed graph). A directed graph (or digraph) is a pair $G = (V, E)$ where $E$ is a set of ordered pairs $(u, v)$ of vertices, called directed edges or arcs. The arc $(u, v)$ goes from $u$ (its tail) to $v$ (its head); it is drawn as an arrow $u \to v$. Each vertex now has two degrees: its in-degree $\deg^-(v)$ (number of arcs into $v$) and its out-degree $\deg^+(v)$ (number of arcs out of $v$).
The lone difference from §27.1 is ordered versus unordered pairs — and that is precisely the difference between a relation's pair $(u, v)$ and an undirected edge $\{u, v\}$.
🔗 Connection. A directed graph is a relation on $V$ (Chapter 12), drawn. Chapter 12 even called the drawing of a relation its "digraph." So everything you proved about relations — reflexive, symmetric, transitive, transitive closure (reachability) — is a statement about directed graphs. A symmetric relation is exactly an undirected graph in disguise: every arc has its reverse, so you may as well draw an undirected edge. The term directed graph is first defined here as a first-class object; Chapter 12 used it only as a view of a relation.
Example: a tiny "follows" network where Ana follows Ben, Ben follows Cam, Cam follows Ana, and Cam also follows Ben. $$V = \{\text{Ana}, \text{Ben}, \text{Cam}\}, \quad E = \{(\text{Ana},\text{Ben}), (\text{Ben},\text{Cam}), (\text{Cam},\text{Ana}), (\text{Cam},\text{Ben})\}.$$ Here $\deg^+(\text{Cam}) = 2$ (Cam follows two people) but $\deg^-(\text{Cam}) = 1$ (one person follows Cam). On a real platform, in-degree is followers and out-degree is followees — and the asymmetry is the whole point.
Weighted graphs
So far an edge either exists or does not. But a road has a length; a network link has a latency; a flight has a price. To carry that information we attach a number to each edge.
Definition (weighted graph). A weighted graph is a graph $G = (V, E)$ together with a weight function $w \colon E \to \mathbb{R}$ that assigns a real number $w(e)$ to each edge. The weight typically models a cost, distance, capacity, or strength. (A graph may be both directed and weighted — a one-way street with a length.)
The "shortest path" between two cities is no longer the fewest edges but the least total weight along a path — the sum $\sum_{e \in P} w(e)$ over the path's edges. Unweighted graphs are the special case where every weight is 1, and then "least total weight" collapses back to "fewest edges." That is why breadth-first search (Chapter 28) suffices for the unweighted case while Dijkstra's algorithm (Chapter 29) is needed once weights differ.
🔗 Connection. GPS routing, internet packet routing, and flight-price search are all "minimum-weight path in a weighted graph." We build the algorithm that solves them — Dijkstra's — in Chapter 29, and the weights are exactly the function $w$ defined here.
Bipartite graphs
Some networks have two kinds of vertices, and edges only ever go between kinds, never within. Students and the courses they take. Job applicants and the positions they apply to. Customers and the products they buy. These are bipartite.
Definition (bipartite graph). A graph $G = (V, E)$ is bipartite if its vertex set can be split into two disjoint sets $X$ and $Y$ (so $V = X \cup Y$ and $X \cap Y = \emptyset$) such that every edge joins a vertex in $X$ to a vertex in $Y$ — no edge lies within $X$ or within $Y$. The pair $(X, Y)$ is called a bipartition.
Example: applicants $X = \{\text{a}_1, \text{a}_2, \text{a}_3\}$ and jobs $Y = \{\text{j}_1, \text{j}_2\}$, with edges recording "applicant is qualified for job." Every edge crosses from $X$ to $Y$; no applicant connects to an applicant. Bipartite structure is what makes the matching problem (Chapter 34) tractable: pairing applicants to jobs so that as many as possible are filled.
There is a clean test for bipartiteness that you will prove later: a graph is bipartite if and only if it has no cycle of odd length. Intuitively, color $X$ red and $Y$ blue; every edge then connects red to blue, so following edges flips color each step — and you can only return to your starting vertex (same color) after an even number of steps. An odd cycle would demand a vertex be both colors. We will revisit this idea as 2-coloring in Chapter 33.
🔗 Connection. Bipartite matching — assign each applicant to at most one qualified job, maximizing assignments — is solved in Chapter 34 via network flow. The bipartition defined here is the input to that algorithm.
Complete graphs
At the opposite extreme from a sparse network is one where everyone is connected to everyone.
Definition (complete graph). A complete graph on $n$ vertices, written $K_n$, is the simple graph in which every pair of distinct vertices is joined by an edge. It has $\binom{n}{2} = \frac{n(n-1)}{2}$ edges, and every vertex has degree $n - 1$.
$K_n$ is the "fully connected" or worst-case-dense graph. It is the maximum a simple graph on $n$ vertices can be (any more edges would require loops or parallel edges). $K_5$ — five vertices, ten edges — will reappear in Chapter 33 as one of the two graphs that cannot be drawn in the plane without edges crossing. We also write $K_{m,n}$ for the complete bipartite graph: parts of size $m$ and $n$ with every cross-pair joined ($mn$ edges).
Here is a small Python sketch that builds $K_n$'s edge set and confirms two facts from the definitions — the edge count and that every degree is $n-1$.
from itertools import combinations
def complete_graph_edges(n):
"""Edges of K_n on vertices 0..n-1, as frozensets."""
return {frozenset(pair) for pair in combinations(range(n), 2)}
E = complete_graph_edges(5)
deg = [sum(1 for e in E if v in e) for v in range(5)]
print("K5 has", len(E), "edges")
print("degrees:", deg)
# Expected output:
# K5 has 10 edges
# degrees: [4, 4, 4, 4, 4]
combinations(range(5), 2) yields each 2-element subset of $\{0,1,2,3,4\}$ exactly once — there are $\binom{5}{2}=10$ of them, matching the printed edge count. Each vertex sits in $5 - 1 = 4$ of those edges, so every degree is 4, exactly as the definition promises.
🔄 Check Your Understanding 1. In the "follows" digraph above, compute $\deg^+(\text{Ana})$ and $\deg^-(\text{Ben})$. 2. Is the study-group friendship graph bipartite? (Hint: it contains the triangle Ana–Ben–Cam.) 3. How many edges does $K_{10}$ have? How many does the complete bipartite graph $K_{3,4}$ have?
Answers
- $\deg^+(\text{Ana}) = 1$ (Ana follows only Ben). $\deg^-(\text{Ben}) = 2$ (Ben is followed by Ana and Cam). 2. No. It contains the triangle Ana–Ben–Cam, a cycle of length 3 (odd). A bipartite graph has no odd cycle, so this graph is not bipartite. 3. $K_{10}$ has $\binom{10}{2} = \frac{10\cdot 9}{2} = 45$ edges. $K_{3,4}$ has $3 \times 4 = 12$ edges.
27.4 The handshaking lemma
We have enough vocabulary to prove our first real theorem about graphs — and it is a small gem, both because the proof is a one-line idea and because its corollary lets you reject impossible claims on sight.
Here is the question that motivates it. At a party, several people shake hands. Some statistician adds up, over all guests, the number of hands each person shook. Is there any constraint on that total? It cannot be just any number. Watch.
Every handshake involves exactly two people. So when we sum each person's handshake count over everyone, every single handshake gets counted twice — once for each of its two participants. Therefore the grand total must be even, and in fact must equal twice the number of handshakes. Translate "person" to vertex, "handshake" to edge, "number of hands shaken" to degree, and you have a theorem about every graph.
The Handshaking Lemma. For any graph $G = (V, E)$, $$\sum_{v \in V} \deg(v) = 2\lvert E\rvert.$$ The sum of all vertex degrees equals twice the number of edges.
The strategy first. This is a double-counting argument — one of the most powerful techniques in all of combinatorics. We count a single quantity two different ways and set the two counts equal. The quantity is the number of (vertex, edge) incident pairs: pairs $(v, e)$ where vertex $v$ is an endpoint of edge $e$. Counting by vertices gives the left side; counting by edges gives the right side. Because both expressions count the same set of pairs, they must be equal. No induction, no cases — just "count it twice."
Proof. Consider the set of all incidences $I = \{\, (v, e) : v \in V,\ e \in E,\ v \text{ is an endpoint of } e \,\}$. We count $\lvert I\rvert$ in two ways.
Count by vertices. Group the incidences by their first coordinate. For a fixed vertex $v$, the incidences with that vertex are exactly the edges meeting $v$ — and there are $\deg(v)$ of those, by the definition of degree. Summing over all vertices, $$\lvert I\rvert = \sum_{v \in V} \deg(v).$$
Count by edges. Group the incidences by their second coordinate. For a fixed edge $e = \{u, w\}$, there are exactly two incidences: $(u, e)$ and $(w, e)$, since an edge has exactly two endpoints (the graph is simple — no loops). Summing over all edges, $$\lvert I\rvert = \sum_{e \in E} 2 = 2\lvert E\rvert.$$
The two counts are of the same set $I$, so they are equal: $$\sum_{v \in V} \deg(v) = 2\lvert E\rvert. \qquad \blacksquare$$
The lemma is named for the handshake story, and the metaphor is exact: hands shaken summed over people equals twice the number of handshakes. Its real power is a corollary about parity that catches impossible graphs instantly.
Corollary (the even-degree-count rule). In any graph, the number of vertices of odd degree is even.
The strategy first. We already know the total degree is even ($= 2\lvert E\rvert$). Split the vertices into the even-degree ones and the odd-degree ones. The even-degree group contributes an even sum no matter what. For the whole sum to stay even, the odd-degree group must also contribute an even sum — and a sum of odd numbers is even only when there is an even count of them. That is the whole argument; let us write it cleanly.
Proof. Partition $V$ into $V_{\text{even}} = \{v : \deg(v) \text{ is even}\}$ and $V_{\text{odd}} = \{v : \deg(v) \text{ is odd}\}$. By the handshaking lemma, $$\sum_{v \in V_{\text{even}}} \deg(v) + \sum_{v \in V_{\text{odd}}} \deg(v) = 2\lvert E\rvert.$$ The right side is even. The first sum on the left is a sum of even numbers, hence even. Subtracting, the second sum — $\sum_{v \in V_{\text{odd}}} \deg(v)$ — must be even as well. But it is a sum of odd numbers, and a sum of odd numbers is even if and only if there is an even number of them (each odd number is $1 \bmod 2$, so $k$ of them sum to $k \bmod 2$, which is $0$ exactly when $k$ is even). Therefore $\lvert V_{\text{odd}}\rvert$ is even. $\blacksquare$
This corollary is a lie detector for graphs. Someone claims: "At our meetup, every one of the 7 attendees shook hands with exactly 3 others." Impossible — that describes a graph with 7 vertices each of degree 3, i.e. 7 vertices of odd degree, and 7 is odd. No such graph exists, and you knew it without drawing anything. (Equivalently, the handshaking lemma would force $\sum \deg = 7 \times 3 = 21 = 2\lvert E\rvert$, but $21$ is odd, so $\lvert E\rvert$ would not be an integer.)
💡 Intuition: The handshaking lemma is the graph-theory incarnation of "every edge has two ends, so it gets counted twice." Almost every basic counting fact about graphs traces back to it. When a quantity about a graph feels like it should be constrained, ask whether double-counting incidences pins it down.
Let's verify the lemma empirically on our study group — not as a proof (we have one) but as the Chapter 6 habit of testing a claim in code before and after proving it.
adj = {"Ana": {"Ben", "Cam"}, "Ben": {"Ana", "Cam"},
"Cam": {"Ana", "Ben", "Dev"}, "Dev": {"Cam", "Eve"},
"Eve": {"Dev"}}
degree_sum = sum(len(nbrs) for nbrs in adj.values())
num_edges = degree_sum // 2 # the lemma: |E| = (sum of degrees) / 2
odd_count = sum(1 for nbrs in adj.values() if len(nbrs) % 2 == 1)
print("sum of degrees:", degree_sum, "= 2 *", num_edges)
print("vertices of odd degree:", odd_count)
# Expected output:
# sum of degrees: 10 = 2 * 5
# vertices of odd degree: 2
The sum of degrees is $2+2+3+2+1 = 10 = 2 \times 5$, and the graph indeed has 5 edges — the lemma holds. The odd-degree vertices are Cam (degree 3) and Eve (degree 1): exactly two of them, an even count, as the corollary guarantees.
🔄 Check Your Understanding 1. A graph has degree sequence $(4, 3, 3, 2, 2)$ — five vertices with those degrees. How many edges does it have? Is the degree sequence even possible? (Check the corollary.) 2. Can a simple graph have exactly one vertex of odd degree? Why or why not? 3. Could there be a party of 6 people in which everyone shook hands with exactly 3 others? Give the edge count if so.
Answers
- Sum $= 4+3+3+2+2 = 14$, so $\lvert E\rvert = 14/2 = 7$ edges. The number of odd-degree vertices is two (the two 3's) — even, so the corollary is satisfied and the sequence is not ruled out. (It is in fact realizable.) 2. No. One odd-degree vertex would mean an odd number (one) of odd-degree vertices, violating the corollary. 3. Yes — 6 is even, so 6 vertices of degree 3 is allowed. Edge count: $\sum \deg = 6 \times 3 = 18$, so $\lvert E\rvert = 18/2 = 9$ edges. (One such graph: $K_{3,3}$, or the prism.)
27.5 Graph isomorphism (an introduction)
Here is a puzzle. Draw a square with its four corners, edges around the rim: top-left to top-right, top-right to bottom-right, bottom-right to bottom-left, bottom-left to top-left. Now draw a totally different-looking figure: four vertices $1, 2, 3, 4$ with edges $\{1,2\}, \{2,3\}, \{3,4\}, \{4,1\}$. Are these the same graph?
In every way that matters to graph theory, yes. A graph does not care where you place the dots or how you bend the lines — only which vertices are joined to which. Both figures are "four vertices in a single cycle." The drawings differ; the graphs are identical up to relabeling the vertices. That notion has a name.
Definition (graph isomorphism). Two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic, written $G_1 \cong G_2$, if there is a bijection $f \colon V_1 \to V_2$ such that for every pair $u, v \in V_1$, $$\{u, v\} \in E_1 \iff \{f(u), f(v)\} \in E_2.$$ Such an $f$ is an isomorphism: a relabeling of vertices that preserves adjacency exactly (both ways).
The condition says $f$ turns edges into edges and non-edges into non-edges — adjacency is preserved and reflected. Isomorphic graphs are "the same graph wearing different name tags." Every property we have defined — degree sequence, number of edges, connectedness, presence of a triangle — is identical in isomorphic graphs, because $f$ is just a renaming.
🔗 Connection. An isomorphism is a bijection (Chapter 9) with an extra structure-preserving property. This is the same pattern you will see throughout mathematics: two objects are "the same" when there is a bijection between them that respects the relevant structure. For sets, any bijection works (Chapter 9, Chapter 10). For graphs, the bijection must also preserve edges.
How do you show two graphs are isomorphic? Exhibit the bijection and check it. For the square and the cycle $1$–$2$–$3$–$4$–$1$, the map (top-left $\mapsto 1$, top-right $\mapsto 2$, bottom-right $\mapsto 3$, bottom-left $\mapsto 4$) sends rim edges to the listed edges and non-edges (the two diagonals) to non-edges. Done.
How do you show two graphs are not isomorphic? You cannot try every relabeling and report "none worked" as a casual argument — there are $n!$ relabelings. Instead, find a single invariant the two graphs disagree on. An invariant is a property preserved by every isomorphism; if two graphs differ on one, no isomorphism can exist. The cheap, powerful invariants:
| Invariant | Why it's preserved |
|---|---|
| Number of vertices $\lvert V\rvert$ | $f$ is a bijection on vertices |
| Number of edges $\lvert E\rvert$ | $f$ maps edges bijectively to edges |
| Degree sequence (multiset of all degrees) | $f$ preserves each vertex's degree |
| Number of connected components | paths map to paths |
| Number of cycles of a given length (e.g. triangles) | cycles map to cycles |
Strategy: proving two graphs are not isomorphic. Compute a cheap invariant for both — start with the degree sequence, sorted. If the sorted degree sequences differ, the graphs are not isomorphic, full stop. (If they match, you have not proven isomorphism — you must still exhibit a bijection or find a finer invariant; matching degree sequences is necessary but not sufficient.)
For example, a 4-vertex path $a$–$b$–$c$–$d$ has degree sequence $(1, 2, 2, 1)$ sorted to $(1,1,2,2)$, while the 4-cycle has degree sequence $(2,2,2,2)$. Different sorted sequences, so the path and the cycle are not isomorphic — no relabeling can turn a path into a cycle. We did not try $4! = 24$ relabelings; one invariant settled it.
Here is the degree-sequence test in code — a partial isomorphism check that can refute but not confirm.
def degree_sequence(adj):
"""Sorted list of vertex degrees from an adjacency dict."""
return sorted(len(nbrs) for nbrs in adj.values())
path4 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}
print(degree_sequence(path4))
print(degree_sequence(cycle4))
print("Could be isomorphic?", degree_sequence(path4) == degree_sequence(cycle4))
# Expected output:
# [1, 1, 2, 2]
# [2, 2, 2, 2]
# Could be isomorphic? False
The sorted degree sequences differ, so the function correctly reports the two graphs cannot be isomorphic. Had they matched, the honest answer would be "possibly — needs more checking," because the degree sequence is a necessary condition, not a sufficient one.
⚠️ Common Pitfall: Matching degree sequences do not prove isomorphism. There are non-isomorphic graphs with identical degree sequences (you will construct a pair in the exercises). The degree sequence is a fast filter: a mismatch rules isomorphism out, but a match only earns you the right to keep looking.
Why does this matter to a programmer? Deciding whether two arbitrary graphs are isomorphic is a famously hard problem — no efficient algorithm is known for the general case (it sits in a curious complexity limbo we will touch on in Part VI), yet the brute force of trying all $n!$ relabelings is hopeless for even modest $n$. Real systems that must compare graph structure — molecule databases, circuit-equivalence checkers, deduplicating program ASTs — lean heavily on invariants to avoid the brute force. The lesson, which echoes Theme 3 of this book, is that the right invariant (a well-chosen abstraction) can replace an astronomically expensive search.
🔄 Check Your Understanding 1. Graph $G$ has degree sequence $(3,3,2,2,2)$ and graph $H$ has $(3,3,3,2,1)$. Can they be isomorphic? How do you know without drawing them? 2. Two graphs each have degree sequence $(2,2,2,2,2,2)$. Does that prove they are isomorphic? Give the safe conclusion. 3. Why is "try all $n!$ relabelings" impractical even for $n = 15$? (Order-of-magnitude is enough.)
Answers
- No. Sorted, $G$ is $(2,2,2,3,3)$ and $H$ is $(1,2,3,3,3)$ — different multisets of degrees. Degree sequence is an isomorphism invariant, so a mismatch rules out isomorphism. 2. No — matching degree sequences are necessary but not sufficient. Safe conclusion: "they might be isomorphic; we must exhibit an adjacency-preserving bijection or find a finer invariant." (E.g. a single 6-cycle and two disjoint triangles both have all degrees 2 but differ in number of components.) 3. $15! \approx 1.3 \times 10^{12}$ — over a trillion relabelings; even at a billion checks per second that is roughly 20 minutes for one pair, and it explodes factorially from there. Invariants are essential.
27.6 Modeling with graphs: social networks, maps, and the web
The hardest part of using graphs is not the algorithms — it is the modeling: deciding what becomes a vertex, what becomes an edge, whether edges are directed or weighted, and what question you are actually asking. The same raw situation yields different graphs depending on the question. This section walks through the three canonical models you will meet for the rest of your career.
Social networks (our anchor)
This is the lens we will carry through Part V. Model a social platform by letting each person be a vertex and each relationship be an edge. The single most important modeling choice is directed or undirected:
- Friendship (Facebook-style, mutual) → undirected edges. If Ana is friends with Ben, Ben is friends with Ana; one unordered edge $\{$Ana, Ben$\}$ captures it.
- Following (Twitter/X- or Instagram-style, one-way) → directed edges. Ana may follow Ben without Ben following Ana; we need the arc $(\text{Ana}, \text{Ben})$.
Once the graph exists, the questions become algorithms you will learn by name:
| Real question | Graph question | Where in the book |
|---|---|---|
| "How are two people connected?" | shortest path between two vertices | BFS, Chapter 28 |
| "Degrees of separation" | path length in an unweighted graph | Chapter 28 |
| "Who is most influential?" | high degree / centrality | this chapter (degree); beyond |
| "Find tight-knit communities" | clustering / coloring | Chapter 33 |
| "Who shares the most mutual friends?" | common-neighbor counts | exercises |
🔗 Connection. This social-network graph is the anchor example for all of Part V. In Chapter 28 we run breadth-first search on it to compute the literal "six degrees of separation." In Chapter 29 we add edge weights (interaction strength) and find strongest-connection paths. In Chapter 33 we detect communities by coloring. In Chapter 34 we match people to resources. Every algorithm in this part will be demonstrated on a graph exactly like the study group you have been reading about — just bigger.
A worked modeling decision: suppose you are asked to find "people you may know" suggestions. You would model friendship as an undirected graph and look for vertices at distance 2 — friends-of-friends who are not yet your friends, ranked by the number of mutual friends (common neighbors). That single sentence is a complete, correct specification, and it is only expressible because you now have the words: vertex, edge, undirected, distance, neighbor.
Maps and road networks
Model a map by letting each intersection be a vertex and each road segment be an edge. Now the natural refinements appear:
- Two-way streets → undirected edges; one-way streets → directed edges. Real maps are mixed, usually stored as directed graphs with both arcs added for two-way streets.
- Distances or travel times → weighted edges, with $w(e)$ the length or expected time.
"Find the fastest route from home to work" is then exactly "find the minimum-weight path" — the problem Dijkstra's algorithm (Chapter 29) solves and your phone runs millions of times a second. "Is the city reachable as one road network?" is "is the graph connected?" "What's the cheapest set of roads to plow that still connects every neighborhood?" is a minimum spanning tree (Chapter 32).
The web and dependency graphs
Model the web by letting each page be a vertex and each hyperlink be a directed edge $(\text{from page}, \text{to page})$. Links are inherently one-way, so the web graph is directed. In-degree (how many pages link to you) is a first proxy for importance — the seed of the idea behind PageRank. The same directed-graph model describes:
- Software dependencies: module $A$ depends on module $B$ is an arc $A \to B$; a build order is a topological sort (you saw it in Chapter 13 via posets, and Chapter 28 re-derives it via DFS). A cycle in this graph is a circular dependency — a bug.
- Course prerequisites: "course X requires course Y" is an arc; a valid schedule is again a topological sort, and again a cycle (X requires Y requires X) makes the requirements unsatisfiable.
💡 Intuition: The modeling questions are always the same four. (1) What is a vertex? (2) What is an edge? (3) Is the relationship directed (one-way) or undirected (mutual)? (4) Does an edge carry a weight (cost, distance, capacity)? Answer those four and you have chosen your graph — and usually named the algorithm you need.
A compact way to encode and explore any of these models is a single Graph data structure. We will build a proper one in the Project Checkpoint, but here is the modeling payoff in miniature: the same dictionary-of-neighbors code answered "who is most connected?" for friendships, and the identical code would answer "which intersection has the most roads?" for a map. One abstraction, many domains — which is the entire reason graphs earn a place at the center of computer science.
🔄 Check Your Understanding 1. You are modeling "retweets" (Ana retweets Ben's post). Directed or undirected? What does in-degree mean? 2. For "find a build order for these software modules," which graph type do you use, and what does a cycle in that graph signify? 3. Give the four modeling questions you ask before building any graph.
Answers
- Directed — retweeting is one-way (Ana retweets Ben does not imply Ben retweets Ana). In-degree of a person/post is the number of times it was retweeted (its reach/popularity). 2. A directed graph (dependencies point one way); a cycle signifies a circular dependency — modules that mutually require each other — which makes a valid build order impossible (no topological sort exists). 3. (1) What is a vertex? (2) What is an edge? (3) Directed or undirected? (4) Weighted or unweighted?
Project Checkpoint: starting graphs.py with a Graph class
Part V's Toolkit module is graphs.py, and it will grow into the largest module of the whole dmtoolkit package — over the next seven chapters it gains breadth-first search, depth-first search, Dijkstra's algorithm, a minimum-spanning-tree builder, and a max-flow routine. Every one of those algorithms needs a graph to run on. So Chapter 27's job is to lay the foundation: the Graph class itself, the data structure all later algorithms will compose against.
We will implement an undirected simple graph backed by an adjacency dictionary (vertex → set of neighbors). This representation makes the operations we have used all chapter — add a vertex, add an edge, list neighbors, compute degree — clean and fast, and it is exactly what Chapter 28 formalizes as the adjacency list. We also add a degree_sum() method so the handshaking lemma becomes a one-line sanity check on any graph you build.
Create dmtoolkit/graphs.py and add:
class Graph:
"""An undirected simple graph backed by an adjacency dict (vertex -> set)."""
def __init__(self):
self.adj = {} # vertex -> set of neighbor vertices
def add_vertex(self, v):
self.adj.setdefault(v, set()) # no-op if v already present
def add_edge(self, u, v):
if u == v:
raise ValueError("simple graphs have no self-loops")
self.add_vertex(u) # endpoints exist before the edge
self.add_vertex(v)
self.adj[u].add(v) # undirected: record both directions
self.adj[v].add(u)
def neighbors(self, v):
return self.adj[v] # the set N(v)
def degree(self, v):
return len(self.adj[v]) # deg(v) = |N(v)| for a simple graph
def vertices(self):
return set(self.adj)
def num_edges(self):
return self.degree_sum() // 2 # handshaking lemma: |E| = (sum deg)/2
def degree_sum(self):
return sum(len(nbrs) for nbrs in self.adj.values())
A quick driver that rebuilds the study group and checks the lemma:
g = Graph()
for u, v in [("Ana","Ben"), ("Ana","Cam"), ("Ben","Cam"),
("Cam","Dev"), ("Dev","Eve")]:
g.add_edge(u, v)
print(sorted(g.vertices()))
print("deg(Cam) =", g.degree("Cam"))
print("sum of degrees =", g.degree_sum(), "-> |E| =", g.num_edges())
# Expected output:
# ['Ana', 'Ben', 'Cam', 'Dev', 'Eve']
# deg(Cam) = 3
# sum of degrees = 10 -> |E| = 5
Trace it by hand to be sure (the Chapter 6 habit — never trust unrun code on faith). Five add_edge calls each insert two directed entries into adj; Cam ends up with {Ana, Ben, Dev}, so degree("Cam") is 3. The degree sum is $2+2+3+2+1 = 10$, and num_edges() returns $10 // 2 = 5$ — the handshaking lemma doing real work as a //2.
How this builds toward the capstone. Every later graph chapter assumes a Graph it can traverse. Chapter 28 will add bfs(g, s) and dfs(g, s) that walk g.neighbors(v); Chapter 29 attaches weights and adds dijkstra; Chapters 32 and 34 add mst_kruskal and max_flow. By keeping add_edge, neighbors, and degree stable here, those chapters' code will drop in without rework — and in the Chapter 39 capstone's Track B, this exact class becomes the backbone of a working social-network analyzer.
Toolkit so far:
logic.py(Chapters 1–3),sets.py(Chapter 8),relations.py(Chapters 12–13),combinatorics.py(Chapters 15–17), and now the start ofgraphs.py— theGraphclass that the rest of Part V is built on.
Summary
A graph is the universal model for "things and the pairwise connections between them." The reference facts of this chapter:
Core definitions
| Term | Definition |
|---|---|
| Graph $G=(V,E)$ | vertex set $V$ (finite, non-empty) and edge set $E$ of unordered pairs $\{u,v\}$ |
| Simple graph | no loops, no parallel edges; default in this book |
| Degree $\deg(v)$ | number of edges at $v$; $=\lvert N(v)\rvert$ in a simple graph |
| Neighborhood $N(v)$ | the set of vertices adjacent to $v$ |
| Walk / path / cycle | edge sequence / walk with no repeated vertex / closed path of length $\ge 3$ |
| Connected | a path exists between every pair of vertices; pieces are components |
Graph types (each adds one feature)
| Type | Feature | Models |
|---|---|---|
| Directed (digraph) | edges are ordered pairs $(u,v)$; in/out-degree | follows, links, dependencies |
| Weighted | weight function $w\colon E\to\mathbb{R}$ | distances, costs, capacities |
| Bipartite | $V=X\cup Y$, edges only cross; $\iff$ no odd cycle | applicants–jobs, students–courses |
| Complete $K_n$ | every pair joined; $\binom{n}{2}$ edges, all degrees $n-1$ | worst-case-dense networks |
Key results
- A simple graph on $n$ vertices has at most $\binom{n}{2}=\frac{n(n-1)}{2}$ edges.
- Handshaking lemma: $\displaystyle\sum_{v\in V}\deg(v)=2\lvert E\rvert$ (proved by double-counting incidences).
- Corollary: the number of odd-degree vertices is always even — a fast impossibility test.
- Isomorphism ($G_1\cong G_2$): an adjacency-preserving bijection on vertices. To refute isomorphism, exhibit a differing invariant (degree sequence first). Matching degree sequences are necessary but not sufficient.
Modeling checklist (ask in order): What is a vertex? What is an edge? Directed or undirected? Weighted or unweighted?
Toolkit: graphs.py begun — the Graph class (add_vertex, add_edge, neighbors, degree, degree_sum, num_edges) on an adjacency dictionary.
Spaced Review
Retrieval keeps knowledge available. Answer from memory before checking.
- (Ch. 12) A directed graph is literally a relation drawn as a digraph. Which property of a relation corresponds to "every edge of the graph is undirected" (i.e. each arc has its reverse)? And which relation property corresponds to "no self-loops"?
- (Ch. 12) "Reachable by a path" partitions the vertices into connected components. Name the three properties that make reachability an equivalence relation, and say which one gives a length-0 path.
- (Ch. 8) Why is the maximum number of edges in a simple graph on $n$ vertices equal to $\binom{n}{2}$? Phrase the answer in terms of subsets of $V$.
- (Ch. 8) We stored an undirected edge as a
frozensetof two vertices. Which Chapter 8 notion does an unordered pair correspond to, and how does it differ from the ordered pair used for a directed edge?
Answers
- "Every arc has its reverse" is symmetry ($(u,v)\in R \Rightarrow (v,u)\in R$). "No self-loops" is irreflexivity (no $(v,v)\in R$); equivalently, a simple graph's relation is anti-reflexive. 2. Reflexive (the length-0 path from a vertex to itself), symmetric (reverse a path — edges are undirected), transitive (concatenate two paths). The length-0 path gives reflexivity. 3. Each edge is a 2-element subset of $V$, and the number of 2-element subsets of an $n$-element set is $\binom{n}{2}$; a simple graph can include each such subset at most once, so that count is the maximum. 4. An unordered pair $\{u,v\}$ is a 2-element set (set membership ignores order, so $\{u,v\}=\{v,u\}$). An ordered pair $(u,v)$ distinguishes first from second coordinate ($(u,v)\ne(v,u)$ unless $u=v$) — which is exactly the directionality of a directed edge.
What's Next
You can now describe a graph precisely, name its parts, classify its type, and prove the handshaking lemma — but you cannot yet do much with one efficiently. A computer needs the graph in memory, and the choice of representation (an adjacency matrix versus an adjacency list) turns out to dominate the speed of everything you build on top. With a representation in hand, the two foundational algorithms — breadth-first search and depth-first search — let you systematically visit every vertex, test whether the graph is connected, find the components, detect cycles, and compute the literal "degrees of separation" in our social network. That is Chapter 28, and it is where graphs stop being vocabulary and start being power.