40 min read

Every program you write is a chain of conclusions. The type checker concludes that x + y is an

Prerequisites

  • 1
  • 2

Learning Objectives

  • Distinguish an argument's validity (its form) from its soundness (form plus true premises).
  • Apply the standard propositional inference rules — modus ponens, modus tollens, hypothetical and disjunctive syllogism, and the rest — to draw conclusions from premises.
  • Identify the common formal fallacies, especially affirming the consequent and denying the antecedent, and explain why they fail.
  • Use the quantifier inference rules (universal/existential instantiation and generalization) correctly, respecting their side conditions.
  • Construct a multi-step formal derivation, citing a rule and its inputs on every line, and check that a derivation is well-formed.

Chapter 3: Rules of Inference

"Logic is the anatomy of thought." — commonly attributed to John Locke

Overview

Every program you write is a chain of conclusions. The type checker concludes that x + y is an int because it concluded x is an int and y is an int. The optimizer concludes it can delete a branch because it concluded the branch's condition is always false. A SQL planner concludes two queries are equivalent and swaps one for the faster one. None of these tools "guesses." Each one applies a small, fixed set of rules of inference — patterns that take statements you already believe and hand you new statements you are now entitled to believe.

In Chapter 1 you learned the vocabulary of logic (connectives and truth tables) and in Chapter 2 the grammar (predicates and quantifiers). This chapter is about reasoning: how to get from premises to a conclusion in steps that are guaranteed to preserve truth. That guarantee is the whole game. A valid argument is a machine that, fed true inputs, can only ever produce true outputs — and that is exactly what you want from a type system, a verifier, or your own proofs in the chapters ahead.

There is a second payoff, just as important: knowing the valid patterns lets you recognize the invalid ones. The most expensive bugs in reasoning are fallacies that look like good arguments. "The test passed, so the code is correct" has the exact shape of a classic fallacy, and by the end of this chapter you will be able to name it. Spotting bad inference is a debugging skill for thought.

In this chapter, you will learn to:

  • Define an argument precisely as premises plus a conclusion, and separate validity (good form) from soundness (good form and true premises).
  • Apply the workhorse inference rules — modus ponens, modus tollens, the syllogisms, simplification, conjunction, addition, and resolution — and justify each by a truth table.
  • Recognize the two fallacies you will actually meet in the wild, affirming the consequent and denying the antecedent, and explain the difference between a fallacy and a false premise.
  • Reason with quantifiers using the four instantiation/generalization rules, and avoid the trap of reusing a name.
  • Build a formal derivation line by line, citing a rule on each step, and read someone else's derivation critically.

This is the chapter where Part I pivots from describing logic to doing it. Everything in Chapters 4–7 — direct proof, contradiction, induction — is a disciplined application of the rules you meet here.


Learning Paths

🏃 Fast Track: If you can already state modus ponens and modus tollens in your sleep, skim 3.1–3.2, read 3.3 (fallacies) carefully because the named traps recur, then spend your time on 3.4 (quantifier rules — the side conditions are subtle) and 3.5 (building derivations). Do the ⭐⭐ and ⭐⭐⭐ exercises.

📖 Standard Path: Read in order. Work every 🔄 Check Your Understanding before continuing, and try each 🧩 Productive Struggle and 🐛 Find the Error before reading the resolution. Write out the derivation in 3.5 yourself before reading ours.

🔬 Deep Dive: After the chapter, prove that the rule set in 3.2 is sound (each rule's premises entail its conclusion) by building the relevant truth tables, and read about completeness — the deeper fact that these rules can derive every valid propositional consequence. Then connect 3.6 to a real type checker or a Prolog interpreter. The starred exercises and the case studies go here.


3.1 Arguments and validity

Start with something concrete. A teammate sends you this reasoning in a code review:

"If the cache is stale, the test fails. The test failed. So the cache is stale."

Is that correct reasoning? It feels persuasive. But before we can answer, we need to pin down what "correct reasoning" even means — and the surprising answer is that correctness here has nothing to do with whether the cache is actually stale. It has to do with the shape of the argument.

An argument is a finite list of statements called the premises, followed by one final statement called the conclusion. We are claiming that the premises, taken together, support the conclusion. The standard way to write one is to stack the premises, draw a line, and write the conclusion below, with the symbol $\therefore$ ("therefore"):

$$ \begin{array}{l} p \rightarrow q \\ q \\ \hline \therefore\ p \end{array} $$

Here $p$ is "the cache is stale" and $q$ is "the test fails." The premises are $p \rightarrow q$ and $q$; the conclusion is $p$. This is your teammate's argument, stripped to its skeleton.

Now the central definition. We want a notion of "good argument" that captures the idea of a truth-preserving machine: if you feed it true premises, it cannot hand you a false conclusion.

Definition (validity). An argument is valid if there is no assignment of truth values to its propositional variables that makes all the premises true while making the conclusion false. Equivalently, the argument is valid exactly when the single conditional $$(\text{premise}_1 \land \text{premise}_2 \land \dots \land \text{premise}_n) \rightarrow \text{conclusion}$$ is a tautology (true under every row of its truth table — the idea from Chapter 1).

Validity is a property of form, not of content. It asks one question: is there any way for the premises to be true and the conclusion false at the same time? If "no," the argument is valid. If "yes" — if even one such row exists — the argument is invalid, and that row is a counterexample to the argument (the same counterexample idea you met in Chapter 2, now applied to whole arguments).

Let's settle the cache argument the honest way: a truth table. We need the premises $p \rightarrow q$ and $q$, and the conclusion $p$.

$p$ $q$ $p \rightarrow q$ $q$ $p$ (conclusion)
T T T T T
T F F F T
F T T T F
F F T F F

Look at the third row: $p$ is false, $q$ is true. Both premises ($p \rightarrow q$ and $q$) are true, yet the conclusion $p$ is false. That row is a counterexample. The argument is invalid — and in plain English the counterexample says: the test could be failing for some other reason entirely (a real bug, a flaky network) while the cache is perfectly fresh. Your teammate's reasoning is a classic mistake, and in 3.3 it gets a name.

💡 Intuition: A valid argument is a pipe, not a pump. It does not generate truth; it only carries truth downhill from premises to conclusion without leaking. If you pour truth in the top, truth comes out the bottom. If you pour falsehood in the top, the pipe makes no promises about what comes out — and that is fine, because validity is only about preserving truth, never about manufacturing it.

⚠️ Common Pitfall: "The conclusion is true, so the argument is valid." No. Validity is about the connection between premises and conclusion, not about whether the conclusion happens to be true. You can reach a true conclusion through a broken argument (luck), and a valid argument can have a false conclusion (if you fed it a false premise). Judge the form, not the outcome.

Validity versus soundness

Validity guarantees truth flows correctly, but it never checks whether you started with truth. Consider:

$$ \begin{array}{l} \text{All integers are even.} \\ 7 \text{ is an integer.} \\ \hline \therefore\ 7 \text{ is even.} \end{array} $$

This argument is valid — its form is impeccable (it's the quantifier version of modus ponens, as we'll see in 3.4). If both premises were true, the conclusion would have to be true. But the first premise is false, so the conclusion is false too. A valid argument with a false premise tells you nothing about reality. We need a stronger notion for arguments we actually want to trust.

Definition (soundness). An argument is sound if it is (1) valid and (2) all of its premises are actually true. A sound argument's conclusion is guaranteed true.

🚪 Threshold Concept: validity is about form; soundness is about form plus facts. This split runs through all of computer science. A type checker proves your program is valid with respect to its typing rules — but if your specification is wrong, a well-typed program is still wrong. A theorem prover checks the derivation (validity); you are responsible for the axioms (the premises). When you internalize that "valid" and "true" are different properties, you stop trusting clever arguments built on shaky assumptions — and you start asking the two separate questions every careful engineer asks: Is the reasoning airtight? and Are the inputs actually true?

Here is the clean way to hold the two ideas apart:

Premises true? Reasoning valid? Conclusion guaranteed true?
Sound Yes Yes Yes
Valid but unsound No Yes No (could be either)
Invalid Either No No (could be either)

A division of labor falls out of this table, and we'll lean on it for the rest of Part I: establishing validity is the logician's job (this chapter gives you the rules), while establishing the premises is the domain expert's job (number theory, the problem statement, the spec). A proof is a sound argument whose validity you demonstrate step by step and whose premises are axioms or previously proven results.

🔄 Check Your Understanding 1. Is the cache argument above invalid, or is it valid but unsound? How do you know? 2. Give an everyday valid argument with two false premises but a true conclusion. 3. True or false: a valid argument can have all true premises and a false conclusion. Justify.

Answers

  1. Invalid. The truth table has a row (row 3) where the premises are all true and the conclusion is false — that is the definition of invalid, regardless of any particular facts about caches.
  2. For example: "All prime numbers are even; 4 is prime; therefore 4 is even." Both premises are false, the form is valid (universal modus ponens), and the conclusion "4 is even" happens to be true. (This shows a true conclusion tells you nothing about soundness.)
  3. False. That is exactly what validity forbids: validity means there is no assignment making all premises true and the conclusion false. If the premises are genuinely all true, a valid argument's conclusion must be true. (This is why a sound argument's conclusion is guaranteed.)

3.2 Inference rules for propositions

Truth tables settle validity, but they don't scale. An argument with 10 propositional variables has a $2^{10} = 1024$-row table; with 20 variables, over a million rows. Worse, truth tables don't show the reasoning — they verify an argument all at once instead of building the conclusion step by step, the way a proof does. (This blowup is not a quirk of our method; checking whether a propositional formula can be satisfied is the canonical hard problem, SAT, which Chapter 1 previewed and Chapter 37 will treat in depth.)

So we precompute. An inference rule is a small argument form that we have already verified to be valid by truth table, packaged so we can apply it by pattern-matching instead of re-checking. Think of each rule as a lemma about arguments: prove it valid once, reuse it forever. This is exactly the abstraction move — theme three of this book — that separates fluent reasoners from people re-deriving everything from scratch.

The single most important rule is modus ponens ("mode that affirms"), the rule your cache teammate wished they were using.

Modus ponens. $$ > \begin{array}{l} > p \rightarrow q \\ > p \\ > \hline > \therefore\ q > \end{array} > $$ If $p$ implies $q$, and $p$ holds, then $q$ holds.

Why is it valid? The associated conditional is $\big((p \rightarrow q) \land p\big) \rightarrow q$. Build its truth table:

$p$ $q$ $p \rightarrow q$ $(p\rightarrow q)\land p$ $\big((p\rightarrow q)\land p\big)\rightarrow q$
T T T T T
T F F F T
F T T F T
F F T F T

The last column is all T — a tautology — so modus ponens is valid. The only row where both premises hold is row 1, and there $q$ is true. That is the entire content of the rule.

Its mirror image is modus tollens ("mode that denies"): instead of affirming the antecedent to get the consequent, you deny the consequent to deny the antecedent.

Modus tollens. $$ > \begin{array}{l} > p \rightarrow q \\ > \neg q \\ > \hline > \therefore\ \neg p > \end{array} > $$ If $p$ implies $q$, and $q$ is false, then $p$ must be false.

The justification is one line if you remember the contrapositive from Chapter 1: $p \rightarrow q \equiv \neg q \rightarrow \neg p$. So "$p \rightarrow q$ and $\neg q$" is just modus ponens applied to the contrapositive $\neg q \rightarrow \neg p$, yielding $\neg p$. Concretely: "If the cache is stale, the test fails. The test did not fail. Therefore the cache is not stale." That reasoning is valid — feel the difference from the broken version in 3.1.

💡 Intuition: Modus ponens runs the implication forward (cause $\Rightarrow$ effect: we have the cause, so we get the effect). Modus tollens runs it backward through negation (no effect $\Rightarrow$ no cause: the effect is absent, so the cause must be too). Both are valid. The two invalid directions — affirming the effect or denying the cause — are the fallacies of 3.3. Keeping these four straight is most of what 3.2 and 3.3 are about.

Here is the standard toolkit. Each rule has been (or can be) verified by a truth table exactly as above; we list them with their patterns and a one-line gloss.

Rule Premises Conclusion In words
Modus ponens $p \rightarrow q,\ p$ $q$ Affirm the antecedent.
Modus tollens $p \rightarrow q,\ \neg q$ $\neg p$ Deny the consequent.
Hypothetical syllogism $p \rightarrow q,\ q \rightarrow r$ $p \rightarrow r$ Chain implications.
Disjunctive syllogism $p \lor q,\ \neg p$ $q$ Eliminate one option.
Addition $p$ $p \lor q$ Weaken to a disjunction.
Simplification $p \land q$ $p$ Drop a conjunct.
Conjunction $p,\ q$ $p \land q$ Combine two truths.
Resolution $p \lor q,\ \neg p \lor r$ $q \lor r$ Cancel $p$ / $\neg p$.

A few of these deserve a second look because programmers use them constantly without naming them:

  • Hypothetical syllogism is transitivity of implication. It is the formal version of chaining function contracts: if parse takes a string to a token list, and eval takes a token list to a value, then their composition takes a string to a value. Each in a chain of reasoning is one link.
  • Disjunctive syllogism is case elimination. "The error is a 404 or a 500. It's not a 404. Therefore it's a 500." Every time you rule out possibilities until one remains, you are applying it.
  • Addition looks too weak to be useful ("from $p$, conclude $p \lor q$ for any $q$?"), but it is valid — if $p$ is true, then $p \lor q$ is certainly true. You use it whenever you weaken a precise fact into a looser one a later rule can consume.
  • Simplification and conjunction are the obvious "take apart" and "put together" rules for $\land$. They feel trivial, and they are — but a derivation needs to cite even the trivial moves so a checker (human or machine) can verify every step.
  • Resolution is the engine of automated theorem provers and the SLD-resolution at the heart of Prolog (3.6). Notice it generalizes disjunctive syllogism: set $r$ to a contradiction and resolution collapses to "$p \lor q$, $\neg p \vdash q$."

A worked multi-rule argument

Rules become powerful when you compose them. Suppose we are given four premises and want to reach a conclusion that no single rule delivers.

Claim. From the premises $$p \rightarrow q,\qquad \neg q,\qquad p \lor r,\qquad r \rightarrow s,$$ the conclusion $s$ follows validly.

The strategy first. We can't get $s$ directly — only $r \rightarrow s$ mentions $s$, so by modus ponens we'll need $r$ first. How do we get $r$? We have $p \lor r$, so if we can rule out $p$, disjunctive syllogism hands us $r$. And we can rule out $p$: modus tollens on $p \rightarrow q$ and $\neg q$ gives $\neg p$. So the plan is: (1) $\neg p$ by modus tollens; (2) $r$ by disjunctive syllogism; (3) $s$ by modus ponens. Working backward from the goal to see which rule could produce it, then forward to fill the gaps, is the core skill of 3.5.

Derivation.

Step Statement Justification
1 $p \rightarrow q$ Premise
2 $\neg q$ Premise
3 $p \lor r$ Premise
4 $r \rightarrow s$ Premise
5 $\neg p$ Modus tollens, 1, 2
6 $r$ Disjunctive syllogism, 3, 5
7 $s$ Modus ponens, 4, 6

Every line is either a premise or follows from earlier lines by a named rule. Because each rule is valid (truth-preserving), and we only ever applied rules to lines already established, the final line $s$ is a valid consequence of the premises. We never built a $2^4 = 16$-row truth table; we proved the conclusion. That is the difference between verification and reasoning, and it is the shape of every proof in this book.

Inference rules in code

We can check an inference rule the same way Chapter 1 built truth tables: enumerate every assignment and confirm there is no row with all premises true and the conclusion false. Here is modus ponens, checked exhaustively from scratch (this is the seed of this chapter's Toolkit increment).

from itertools import product

def implies(a, b):           # p -> q is false only when p true, q false
    return (not a) or b

def is_valid(premises, conclusion, n):
    """premises, conclusion: functions of n booleans. Return True iff valid."""
    for vals in product([False, True], repeat=n):
        if all(prem(*vals) for prem in premises) and not conclusion(*vals):
            return False     # counterexample: premises hold, conclusion fails
    return True

# Modus ponens: from (p -> q) and p, conclude q.
premises = [lambda p, q: implies(p, q), lambda p, q: p]
conclusion = lambda p, q: q
print(is_valid(premises, conclusion, 2))
# Expected output:
# True

The function returns True because no assignment of (p, q) makes both premises true while making q false — precisely what the truth table showed. Point the same function at the cache argument (premises p -> q and q, conclusion p) and it returns False, because the assignment p=False, q=True is a counterexample. Computation tests validity here; the truth-table argument proves it — theme four again, the two halves working together.

🔄 Check Your Understanding 1. Which rule lets you conclude $p \rightarrow r$ from $p \rightarrow q$ and $q \rightarrow r$? 2. You have $\neg p$ and $p \lor q$. What can you conclude, and by which rule? 3. Is "from $p \land q$, conclude $q$" a valid rule? Name it.

Answers

  1. Hypothetical syllogism (transitivity of implication): $p \rightarrow r$.
  2. $q$, by disjunctive syllogism (eliminate the false disjunct $p$).
  3. Yes — simplification (drop a conjunct). From $p \land q$ you may conclude $q$ (or $p$).

3.3 Common fallacies

A fallacy is an argument form that is invalid — there is at least one assignment making all premises true and the conclusion false — but that resembles a valid form closely enough to fool people. Fallacies are dangerous precisely because they are persuasive. The two you must be able to name on sight are the evil twins of modus ponens and modus tollens.

Definition (fallacy). A fallacy is an invalid argument form that is commonly mistaken for a valid one. (We mean specifically formal fallacies — bad logical structure. Informal fallacies, like attacking the person instead of the argument, are about rhetoric, not form, and are out of scope here.)

Affirming the consequent

This is the cache argument from 3.1, now named. It takes the valid modus ponens and runs it backward illegitimately: it affirms the consequent $q$ and illegitimately concludes the antecedent $p$.

Affirming the consequent (fallacy). $$ > \begin{array}{l} > p \rightarrow q \\ > q \\ > \hline > \therefore\ p > \end{array} \qquad \text{(INVALID)} > $$

We already built its truth table in 3.1 and found the counterexample in row 3 ($p$ false, $q$ true). In words: an implication does not promise that $q$ can only arise from $p$. There may be other ways to make $q$ true. "If it rained, the grass is wet. The grass is wet. Therefore it rained" ignores the sprinkler. The CS version is the most important sentence in this chapter:

⚠️ Common Pitfall — the testing fallacy. "If the code is correct, the tests pass. The tests passed. Therefore the code is correct." This is affirming the consequent, and it is invalid. Passing tests are consistent with correct code, but they are also consistent with buggy code that the tests simply don't exercise. This is exactly why theme two of this book insists that "it passed the tests" is not the same as "it is correct." Tests can refute correctness (a failing test is modus tollens: correct code would pass, this didn't, so the code is not correct) but they cannot confirm it. Only a proof closes that gap — which is what the rest of Part I teaches you to write.

Denying the antecedent

The mirror mistake corrupts modus tollens. Modus tollens validly denies the consequent to deny the antecedent; this fallacy denies the antecedent and illegitimately concludes the negated consequent.

Denying the antecedent (fallacy). $$ > \begin{array}{l} > p \rightarrow q \\ > \neg p \\ > \hline > \therefore\ \neg q > \end{array} \qquad \text{(INVALID)} > $$

The truth table reveals the counterexample:

$p$ $q$ $p \rightarrow q$ $\neg p$ $\neg q$ (conclusion)
T T T F F
T F F F T
F T T T F
F F T T T

Row 3 again: $p$ false, $q$ true makes both premises ($p \rightarrow q$ and $\neg p$) true while the conclusion $\neg q$ is false. "If you're an admin, you can read the file. You're not an admin. Therefore you can't read the file" — but maybe the file is world-readable. Denying the antecedent forgets that $q$ might hold for reasons unrelated to $p$.

💡 Intuition: Line up all four patterns built from $p \rightarrow q$ and one extra premise. The two valid ones key off the direction the implication actually runs; the two fallacies try to run it the wrong way.

Extra premise Conclude Valid? Name
$p$ (affirm antecedent) $q$ modus ponens
$\neg q$ (deny consequent) $\neg p$ modus tollens
$q$ (affirm consequent) $p$ affirming the consequent
$\neg p$ (deny antecedent) $\neg q$ denying the antecedent

The diagonal pairs are valid; the "straight-across" pairs ($p \!\to\! p$-ish, affirming/denying the same side) are fallacies. When in doubt, ask: is there another way the consequent could be true? If yes, you're being asked to commit a fallacy.

A fallacy is not the same as a false premise

A subtle but crucial distinction: an argument can fail for two completely different reasons. It can be invalid (bad form — a fallacy), or it can be unsound because a premise is false (good form, bad inputs). These are not the same defect, and the fix is different.

  • Affirming the consequent is invalid: even if every premise is true, the conclusion can be false. The structure leaks. No choice of true premises rescues it.
  • "All birds can fly; a penguin is a bird; therefore a penguin can fly" is valid (it's universal modus ponens, 3.4) but unsound — the first premise is false. The structure is fine; an input was wrong.

Diagnosing which kind of failure you're looking at is a real skill. If the form is broken, no amount of fact-checking the premises will help. If the form is fine but a premise is false, the reasoning is salvageable once you fix the premise.

🐛 Find the Error. A reviewer argues: "If there's a memory leak, RSS grows without bound. RSS is growing without bound. So there's a memory leak — ship the fix." Name the flaw, give a concrete counterexample, and say whether the argument is invalid or merely unsound.

Answer

This is affirming the consequent (premises $p \rightarrow q$ and $q$, concluding $p$), so the argument is invalid — a form failure, not a premise failure. Counterexample: RSS can grow without bound for reasons other than a leak — for example, a deliberately growing in-memory cache, memory fragmentation, or a large legitimate working set. Growing RSS is consistent with "no leak," so the assignment $p=\text{false}, q=\text{true}$ makes both premises true and the conclusion false. The correct move is to gather evidence that distinguishes a leak from the alternatives, not to treat the symptom as proof of one cause.

🔄 Check Your Understanding 1. "If $n$ is divisible by 4, then $n$ is even. $n = 6$ is even. Therefore $n$ is divisible by 4." Name the fallacy and give the counterexample already embedded in the statement. 2. Your friend says modus tollens and denying the antecedent are "basically the same." In one sentence, what's the difference? 3. Can a valid argument also be a fallacy? Why or why not?

Answers

  1. Affirming the consequent. The counterexample is right there: $n = 6$ is even (consequent true) but not divisible by 4 (antecedent false) — premises true, conclusion false.
  2. Modus tollens denies the consequent ($\neg q$) to conclude $\neg p$ and is valid; denying the antecedent denies the antecedent ($\neg p$) to conclude $\neg q$ and is a fallacy. Same implication, opposite (and opposite-validity) moves.
  3. No, by definition. "Fallacy" means an invalid form mistaken for a valid one; validity and being a fallacy are mutually exclusive.

3.4 Inference with quantifiers

Propositional rules manipulate whole statements, but most real claims are quantified: "for all inputs, the function returns a sorted list," "there exists a key that decrypts the message." To reason about these we need rules that move between a quantified statement and a statement about a specific (or arbitrary) element. There are exactly four, organized as a tidy $2 \times 2$: for each quantifier ($\forall$, $\exists$) we can strip it off (instantiate) or put it on (generalize). These build directly on the quantifier machinery from Chapter 2.

Let $P(x)$ be a predicate over a domain (recall domain and predicate from Chapter 2). Write $c$ for a specific named element of the domain and $x$ for an arbitrary one.

Universal instantiation (UI). From $\forall x\, P(x)$, conclude $P(c)$ for any particular element $c$ of the domain. $$\forall x\, P(x) \;\;\vdash\;\; P(c)$$ What's true of everything is true of any specific thing.

This is the rule behind the "All integers are even; 7 is an integer; ∴ 7 is even" argument from 3.1: UI strips the $\forall$ to talk about 7, and then propositional modus ponens finishes the job. UI is the formal justification for applying a general theorem to your specific case — something you do in every proof. When you write "since $\gcd(a,b)$ divides both $a$ and $b$" about your particular $a$ and $b$, you are silently using UI on the general theorem.

Universal generalization (UG). If you prove $P(c)$ for an arbitrary element $c$ — one about which you assumed nothing specific — then you may conclude $\forall x\, P(x)$. $$P(c)\ \text{for arbitrary } c \;\;\vdash\;\; \forall x\, P(x)$$ What you can prove about a no-special-properties element, you've proved about all of them.

UG is the engine of every "let $n$ be an arbitrary integer…" proof you will ever write. The phrase "let $x$ be arbitrary" is a promise to use no special property of $x$, so that whatever you conclude about $x$ holds universally. This is the rule that makes general proofs possible at all — without it, you could only ever check examples.

⚠️ Common Pitfall — the side condition on UG. "Arbitrary" must mean genuinely arbitrary. If your argument secretly used a special feature of $c$ (you assumed $c$ is even, or $c > 0$, or $c$ equals some earlier element), you may not generalize. The classic abuse: prove $P(0)$ and then claim $\forall x\, P(x)$ by UG — illegal, because $0$ is special. UG requires that the element be unconstrained throughout the subproof. This single side condition is where most bogus "proofs" using quantifiers go wrong.

Existential instantiation (EI). From $\exists x\, P(x)$, introduce a fresh name $c$ (one not used anywhere earlier) and assert $P(c)$. $$\exists x\, P(x) \;\;\vdash\;\; P(c)\quad (c \text{ new})$$ If something satisfies $P$, give that something a brand-new name and reason about it.

Existential generalization (EG). From $P(c)$ for some particular $c$, conclude $\exists x\, P(x)$. $$P(c) \;\;\vdash\;\; \exists x\, P(x)$$ If a specific thing satisfies $P$, then something does.

EG is the easy one: exhibit a witness and you've proved existence. (We'll formalize existence proofs in Chapter 5.) EI is the subtle one, and its side condition is the mirror of UG's.

⚠️ Common Pitfall — the freshness condition on EI. When you instantiate an existential, the name $c$ must be brand new. You know something satisfies $P$, but you do not get to choose which thing — so you must not reuse a name that already carries commitments. The deadly error: from $\exists x\, P(x)$ and $\exists x\, Q(x)$, instantiate both with the same $c$ to get $P(c) \land Q(c)$, then conclude $\exists x\,(P(x) \land Q(x))$. That is invalid — the $x$ making $P$ true need not be the $x$ making $Q$ true. Some student passed and some student failed does not mean some single student both passed and failed. Always instantiate each existential with its own fresh name.

Here is the full table for reference:

Rule From Conclude Side condition
Universal instantiation (UI) $\forall x\, P(x)$ $P(c)$ $c$ any element of the domain
Universal generalization (UG) $P(c)$, $c$ arbitrary $\forall x\, P(x)$ $c$ truly arbitrary (no special assumptions)
Existential instantiation (EI) $\exists x\, P(x)$ $P(c)$ $c$ a fresh name
Existential generalization (EG) $P(c)$ $\exists x\, P(x)$ none

A worked quantified argument

Let's prove the classic "Socrates" syllogism formally — the textbook example of valid quantified reasoning — using our rules. Domain: all things.

Premises. $\forall x\,(H(x) \rightarrow M(x))$ ("all humans are mortal") and $H(s)$ ("Socrates is human", where $s$ names Socrates). Conclusion. $M(s)$ ("Socrates is mortal").

The strategy first. The conclusion $M(s)$ is about the specific individual $s$, but our general premise is quantified. So we strip the quantifier with UI to get a statement about $s$, then finish with propositional modus ponens. Quantified arguments almost always go: instantiate to a specific element, do propositional logic, generalize back if the goal is quantified.

Derivation.

Step Statement Justification
1 $\forall x\,(H(x) \rightarrow M(x))$ Premise
2 $H(s)$ Premise
3 $H(s) \rightarrow M(s)$ Universal instantiation, 1 (take $c = s$)
4 $M(s)$ Modus ponens, 2, 3

Clean and mechanical. Note the interplay: UI bridges from the quantified world to the propositional world, where modus ponens does the real work. Most quantified proofs have this two-phase rhythm.

Quantifier rules in code

Over a finite domain, instantiation and generalization are literally loops — the same connection Chapter 2 drew between quantifiers and all()/any(). Universal generalization is "check every element"; existential generalization is "find one witness." We can verify a quantified argument by brute force over a finite domain.

def forall(domain, pred):
    return all(pred(x) for x in domain)        # universal: every element

def exists(domain, pred):
    return any(pred(x) for x in domain)        # existential: some element

domain = range(1, 11)                          # the integers 1..10
H = lambda x: x % 2 == 0                        # "x is even"  (our 'human')
M = lambda x: x % 1 == 0                        # "x is an integer" (our 'mortal')

# Premise: forall x (H(x) -> M(x)); verify by checking every element.
premise = forall(domain, lambda x: (not H(x)) or M(x))
# Apply UI at c = 4, then modus ponens since H(4) holds:
conclusion = M(4) if (premise and H(4)) else None
print(premise, conclusion)
# Expected output:
# True True

The loop confirms the universal premise on this finite domain and then mimics UI + modus ponens at $c = 4$. On a finite domain this is a genuine check; over an infinite domain (all integers) you cannot loop, and you need the rules — which is exactly why UI/UG exist. Computation handles the finite case; inference handles the infinite case. (We'll see in Chapter 4 how a proof discharges the infinite case that no loop can reach.)

🔄 Check Your Understanding 1. From $\forall x\,(x^2 \ge 0)$, what does UI let you conclude about $x = -3$? 2. Why can't you instantiate $\exists x\, P(x)$ and $\exists x\, Q(x)$ with the same fresh name? 3. You prove "$P(c)$" where, midway, you assumed $c$ is even. Can you conclude $\forall x\, P(x)$?

Answers

  1. $(-3)^2 \ge 0$, i.e. $9 \ge 0$. UI lets you plug in any domain element, including $-3$.
  2. Because the witness for $P$ and the witness for $Q$ need not be the same element. Reusing the name falsely asserts that one element satisfies both. Each EI needs its own brand-new name.
  3. No. You used a special property ($c$ even), so $c$ was not arbitrary; UG's side condition fails. You've only proved $P$ for even elements: $\forall x\,(x \text{ even} \rightarrow P(x))$.

3.5 Building a derivation step by step

You now have the rules. The remaining skill is orchestration: assembling rules into a derivation (also called a formal proof) that carries you from premises to a target conclusion.

Definition (derivation). A derivation of a conclusion $C$ from premises $\Gamma = \{p_1, \dots, p_n\}$ is a finite numbered sequence of statements ending in $C$, in which every line is either (a) one of the premises, or (b) obtained from earlier lines by a single named inference rule. The existence of a derivation certifies that $C$ is a valid consequence of $\Gamma$, written $\Gamma \vdash C$ (read "$\Gamma$ proves $C$").

The symbol $\vdash$ ("turnstile") means syntactic consequence — "there is a derivation." It is the proof-step cousin of the semantic $\equiv$ from Chapter 1 (truth-table equivalence). The beautiful fact (the soundness of the rule system) is that anything you can derive is genuinely valid: if $\Gamma \vdash C$, then every assignment making all of $\Gamma$ true makes $C$ true. So a derivation isn't just bookkeeping — it is a certificate of validity that a machine can check line by line, which is exactly what a proof assistant does.

The method: work backward, then forward

Beginners stare at the premises and hope. Experts work backward from the goal:

  1. Look at the conclusion. Which rule could produce it? If the goal is $q$, candidates are modus ponens (need $p \rightarrow q$ and $p$) or disjunctive syllogism (need $p \lor q$ and $\neg p$), etc.
  2. Make the inputs to that rule your new sub-goals. Now you need, say, $p \rightarrow q$ and $p$.
  3. Repeat until every sub-goal is a premise or something you can already derive.
  4. Then write the derivation forward, premises first, so each line's justification points only backward to earlier lines.

This backward-then-forward discipline is the same one you'll use for every proof technique in Chapters 4–7. The strategy comes from working backward; the written proof always reads forward.

A fully worked derivation

Claim. $\{\,r,\ r \rightarrow (s \lor t),\ s \rightarrow u,\ \neg u,\ t \rightarrow w\,\} \vdash w$.

The strategy first (worked backward). The goal is $w$. Only $t \rightarrow w$ produces $w$, by modus ponens — so I need $t$. Where does $t$ come from? I can get $s \lor t$ (modus ponens on $r$ and $r \rightarrow (s \lor t)$), and if I can rule out $s$, disjunctive syllogism gives $t$. Can I rule out $s$? Yes: $s \rightarrow u$ and $\neg u$ give $\neg s$ by modus tollens. So the forward plan is: get $s \lor t$; get $\neg s$; get $t$; get $w$.

Derivation.

Step Statement Justification
1 $r$ Premise
2 $r \rightarrow (s \lor t)$ Premise
3 $s \rightarrow u$ Premise
4 $\neg u$ Premise
5 $t \rightarrow w$ Premise
6 $s \lor t$ Modus ponens, 1, 2
7 $\neg s$ Modus tollens, 3, 4
8 $t$ Disjunctive syllogism, 6, 7
9 $w$ Modus ponens, 5, 8

Read it forward and every step is locally checkable: line 6 cites lines 1 and 2 and the rule "modus ponens"; you can verify that those two lines really do match the pattern $p,\ p \rightarrow q$ with $q = (s \lor t)$. A reader (or a program) needs no creativity to check the derivation — all the creativity went into finding it, working backward. That asymmetry between hard-to-find and easy-to-check is one of the deepest ideas in computer science, and it returns in force when we study P versus NP in Chapter 37.

🧩 Productive Struggle. Before reading on, derive $\neg p$ from the three premises $p \rightarrow (q \land r)$, $\ \neg r$, and $\ q$. (Hint: work backward — what single rule produces a negation like $\neg p$? What does that rule need, and how do you build it from $\neg r$?)

One solution

Goal $\neg p$ suggests modus tollens on $p \rightarrow (q \land r)$, which needs $\neg(q \land r)$. By Chapter 1's De Morgan, $\neg(q \land r) \equiv \neg q \lor \neg r$; from $\neg r$, addition gives $\neg q \lor \neg r$, i.e. $\neg(q \land r)$. Then modus tollens finishes. Derivation: (1) $p \rightarrow (q \land r)$ [premise]; (2) $\neg r$ [premise]; (3) $\neg q \lor \neg r$ [addition, 2]; (4) $\neg(q \land r)$ [De Morgan, 3 — a logical equivalence from Ch. 1]; (5) $\neg p$ [modus tollens, 1, 4]. (The premise $q$ is a red herring — not every premise must be used.)

Checking a derivation mechanically

Because each step cites a rule and its input lines, a derivation is machine-checkable: walk the lines in order, and for each non-premise line confirm its cited rule legitimately produces it from the cited earlier lines. This is the heart of how proof assistants like Coq, Lean, and Isabelle work — they don't find your proof, but they ruthlessly check every step. Our Project Checkpoint builds a tiny checker in this spirit.

🔄 Check Your Understanding 1. In the worked derivation, which line is the witness that lets us discharge the disjunction $s \lor t$ down to $t$? 2. A derivation has a line "7. $q$ — Modus ponens, 5, 6" but line 5 is $q \rightarrow r$ and line 6 is $q$. Is line 7 justified? Explain. 3. Why does every justification cite only earlier lines, never later ones?

Answers

  1. Line 7, $\neg s$ — ruling out $s$ is what lets disjunctive syllogism extract $t$ from $s \lor t$.
  2. No. Modus ponens from $q \rightarrow r$ (line 5) and $q$ (line 6) yields $r$, not $q$. The cited rule does not produce the claimed line — the step is invalid. (From these you could derive $r$, not $q$.)
  3. To prevent circular reasoning. If a line could cite a later line, you could "prove" $A$ from $B$ and $B$ from $A$ with neither ever established. Forward-only citation guarantees every line rests on an already-justified foundation, bottoming out at the premises.

3.6 Where inference lives in computer science

Rules of inference are not a museum piece you leave behind after Part I. They are running, right now, in the tools on your machine. Three examples make the point — and each is a system you can go read about.

Type inference. A type system is, quite literally, a set of inference rules. Languages in the ML family (and, under the hood, the type checkers in Rust, Haskell, TypeScript, and Scala) are built on rules written in exactly the premises-over-conclusion form of this chapter. A typing rule like

$$ \frac{\Gamma \vdash e_1 : \text{int} \qquad \Gamma \vdash e_2 : \text{int}}{\Gamma \vdash e_1 + e_2 : \text{int}} $$

reads: from the premises "$e_1$ has type int" and "$e_2$ has type int," infer the conclusion "$e_1 + e_2$ has type int."* That horizontal bar is the same $\therefore$ line you've been drawing all chapter. When tsc or rustc reports a type, it has constructed a derivation in a system of such rules — a proof that your program is well-typed. A type error is the checker failing to find* such a derivation. The Hindley–Milner type inference algorithm (the basis of ML and a major influence on the others) is essentially a procedure for discovering these derivations automatically.

Logic programming. In Prolog, you write the premises and ask the engine to derive a conclusion. A clause like mortal(X) :- human(X). is the implication $\forall x\,(\text{human}(x) \rightarrow \text{mortal}(x))$, and the fact human(socrates). is a premise. Ask ?- mortal(socrates). and the engine runs the Socrates derivation from 3.4 — universal instantiation plus modus ponens — entirely on its own, using a strategy built on the resolution rule from 3.2. Prolog is, in a real sense, an inference engine you program by stating logical facts and rules.

Verification and SMT solvers. Tools like model checkers and SMT solvers (used to verify everything from CPU designs to distributed protocols and aircraft software) prove that a system always satisfies a property by, in effect, showing no counterexample assignment exists — the validity question from 3.1, scaled up with enormously clever search. When such a tool says "verified," it is asserting it found a proof (or exhausted the possibilities); when it says "counterexample," it hands you exactly the assignment that makes the premises true and your desired property false — the same kind of counterexample row you read off truth tables in this chapter.

🔗 Connection. The thread runs forward through the whole book. The "no counterexample exists" question is SAT, formalized as the first NP-complete problem in Chapter 37. The "find a derivation" skill is what every proof in Chapters 4–7 exercises by hand. And the premises-over-line notation you learned here is the same notation researchers use to define programming languages. You are not learning a toy; you are learning the notation and the moves that the foundations of computing are written in.

🔄 Check Your Understanding 1. A typing rule has two premises above the bar and one judgment below it. Which inference idea from this chapter does the bar correspond to? 2. When a type checker reports an error, in derivation terms, what has happened?

Answers

  1. The bar is the $\therefore$ of an argument: it says the judgment below follows from the premises above. A typing rule is an inference rule. 2. The checker failed to construct a derivation of any type for the expression — there is no valid sequence of typing-rule applications that types your program, so it rejects it.

Project Checkpoint: a validity checker for logic.py

Chapters 1–2 began the Toolkit's logic.py with a truth-table generator and a tautology checker. Chapter 3 adds the natural next piece: a function that decides whether an argument is valid by the definition in 3.1 — no assignment makes every premise true while the conclusion is false. This turns the chapter's central concept into one reusable function, and it embodies theme four: computation tests validity exhaustively on a finite variable set; the inference rules prove it for the shape once and for all.

Add this to dmtoolkit/logic.py (alongside truth_table and is_tautology from Chapters 1–2):

from itertools import product

def is_valid(premises, conclusion, names):
    """Return True iff the argument (premises ⊢ conclusion) is valid.

    premises  : list of callables taking len(names) booleans -> bool
    conclusion: one such callable
    names     : list of variable-name strings (fixes the arity & order)

    Valid means: no assignment makes every premise True and conclusion False.
    """
    n = len(names)
    for vals in product([False, True], repeat=n):
        if all(p(*vals) for p in premises) and not conclusion(*vals):
            return False                      # found a counterexample row
    return True

# Modus ponens is valid; affirming the consequent is not.
mp = is_valid([lambda p, q: (not p) or q, lambda p, q: p],
              lambda p, q: q, ["p", "q"])
ac = is_valid([lambda p, q: (not p) or q, lambda p, q: q],
              lambda p, q: p, ["p", "q"])
print(mp, ac)
# Expected output:
# True False

The output True False says exactly what the chapter argued: modus ponens is valid, and affirming the consequent is not — the checker finds the row $p=$False, $q=$True where the premises hold but the conclusion fails. Because is_valid consumes the same boolean-function representation as truth_table and is_tautology, the three compose: an argument is valid precisely when the conditional premises... -> conclusion is a tautology, so is_valid(prems, c, names) always agrees with feeding that big conditional to is_tautology. Two views of one idea, cross-checking each other.

Toolkit so far: logic.py now offers truth_table, is_tautology, equivalent, and is_valid — a complete propositional reasoning kit. By the capstone (Chapter 39) it will sit beside modules for sets, number theory, graphs, and more. For now you can mechanically certify any finite-variable argument and — thanks to this chapter's rules — prove the ones too large to enumerate.


Summary

This chapter turned logic from something you describe into something you do: reason from premises to conclusions in truth-preserving steps.

Core definitions.

Term Definition
Argument A list of premises followed by a conclusion.
Validity No assignment makes all premises true and the conclusion false (equivalently: $\bigwedge \text{premises} \rightarrow \text{conclusion}$ is a tautology). A property of form.
Soundness Valid and all premises actually true. Guarantees a true conclusion.
Fallacy An invalid form commonly mistaken for a valid one.
Derivation A numbered sequence ending in the conclusion, each line a premise or obtained from earlier lines by a named rule; certifies $\Gamma \vdash C$.

The propositional inference rules.

Rule Premises ⟹ Conclusion
Modus ponens $p \rightarrow q,\ p \ \Rightarrow\ q$
Modus tollens $p \rightarrow q,\ \neg q \ \Rightarrow\ \neg p$
Hypothetical syllogism $p \rightarrow q,\ q \rightarrow r \ \Rightarrow\ p \rightarrow r$
Disjunctive syllogism $p \lor q,\ \neg p \ \Rightarrow\ q$
Addition $p \ \Rightarrow\ p \lor q$
Simplification $p \land q \ \Rightarrow\ p$
Conjunction $p,\ q \ \Rightarrow\ p \land q$
Resolution $p \lor q,\ \neg p \lor r \ \Rightarrow\ q \lor r$

The two fallacies to memorize.

Fallacy Form Why invalid
Affirming the consequent $p \rightarrow q,\ q \ \Rightarrow\ p$ $q$ may hold for other reasons (the "tests passed ⇒ correct" trap)
Denying the antecedent $p \rightarrow q,\ \neg p \ \Rightarrow\ \neg q$ $q$ may hold even when $p$ doesn't

The four quantifier rules.

Rule From ⟹ To Side condition
Universal instantiation (UI) $\forall x\, P(x) \Rightarrow P(c)$ $c$ any element
Universal generalization (UG) $P(c) \Rightarrow \forall x\, P(x)$ $c$ truly arbitrary
Existential instantiation (EI) $\exists x\, P(x) \Rightarrow P(c)$ $c$ a fresh name
Existential generalization (EG) $P(c) \Rightarrow \exists x\, P(x)$ none

Decision aids.

  • Validity vs. soundness: ask two questions — Is the form airtight (valid)? Are the premises true (sound)? Both "yes" ⟹ trust the conclusion.
  • Spotting a fallacy: if an argument concludes the antecedent from the consequent, or the negated consequent from the negated antecedent, it is invalid. Ask "could the consequent be true another way?"
  • Building a derivation: work backward from the goal to pick rules and sub-goals; write the proof forward so each line cites only earlier lines.
  • Quantifier side conditions: UG needs a genuinely arbitrary element; EI needs a genuinely fresh name. Most quantifier-proof bugs violate one of these.

Toolkit added: is_valid(premises, conclusion, names) in logic.py — exhaustively certifies a finite-variable argument; agrees with is_tautology on the corresponding conditional.


Spaced Review

Retrieval keeps knowledge available. Answer from memory before checking.

  1. (Ch. 1) Validity says a certain conditional is a tautology. Write that conditional for the argument with premises $p \rightarrow q$, $\neg q$ and conclusion $\neg p$, and say in one phrase how you'd confirm it's a tautology.
  2. (Ch. 1) We justified modus tollens using the contrapositive equivalence. State the contrapositive of $p \rightarrow q$ and name the Chapter 1 fact that says it's equivalent.
  3. (Ch. 2) Existential instantiation requires a fresh name. Using negation of quantifiers from Chapter 2, write $\neg \forall x\, P(x)$ as an existential, and explain why the resulting witness must get a fresh name when you instantiate it.
  4. (Ch. 2) The Socrates argument used a predicate $H(x)$ over a domain. What is the domain of a predicate, and what does $H(s)$ mean once you fix the individual $s$?

Answers

  1. The conditional is $\big((p \rightarrow q) \land \neg q\big) \rightarrow \neg p$; confirm it's a tautology by checking its truth table is all-T (every row true) — the Chapter 1 test. (It is: this is modus tollens.) 2. The contrapositive is $\neg q \rightarrow \neg p$; Chapter 1's logical-equivalence $p \rightarrow q \equiv \neg q \rightarrow \neg p$ guarantees they have identical truth tables.
  2. $\neg \forall x\, P(x) \equiv \exists x\, \neg P(x)$ (pushing $\neg$ through the quantifier, Chapter 2). Instantiating that existential asserts some element satisfies $\neg P$, but not a particular known one — so it needs a fresh name to avoid falsely identifying it with an element already in play.
  3. The domain is the set of values the variable ranges over (Chapter 2); $H(s)$ is the proposition obtained by substituting the specific individual $s$ for $x$ in $H(x)$ — a concrete true-or-false statement, no longer quantified.

What's Next

You can now reason in airtight steps — but every derivation in this chapter started from premises handed to you. Real mathematics asks you to prove theorems where the premises are definitions and the conclusion is something nobody told you was true. Chapter 4 turns the rules of inference into full proof strategies: how to write a rigorous direct proof straight from definitions (even, odd, divisibility — the first appearance of the number theory that powers Part IV), and how proof by contraposition repackages a stubborn claim into the equivalent modus-tollens-shaped one you already know how to attack. The inference rules are the gears; from here on, you build the machines.