Case Study: Building a Witness-Checked Conjecture Lab
"First make the conjecture cheap to test. Then make the proof impossible to fake."
Executive Summary
In this build-focused case study you will construct a small tool — a conjecture lab — that captures the workflow this whole chapter is about: test a universal claim on many cases, then back it with a proof whose key step is a concrete witness. You will write it in three layers. First, a counterexample searcher that screens a $\forall$ claim cheaply (theme four: computation tests). Second, a witness checker that takes the constructive heart of a direct proof — the integer the proof "manufactures" — and verifies it does its job on a domain. Third, a tiny proof-shape recorder that distinguishes a direct proof from a proof by contraposition, so the lab forces you to name your strategy the way §4.5 demands.
The lab is deliberately small and never executed (every output is hand-derived). The point is not the code; it is that building the workflow makes the chapter's central distinction — evidence versus proof — concrete and reusable. By the end you will have a reusable harness, three proven theorems that ran through it, and a clear sense of what your harness does not certify.
Skills applied
- Building a small verification harness incrementally, layer by layer (§4.1, §4.3).
- Encoding the witness of a direct proof as a function and checking it (§4.2, §4.3).
- Encoding a proof by contraposition and seeing why its witness lives on the $\neg Q$ side (§4.4).
- Separating "no counterexample found" (evidence) from "proof" (certainty) in code (§4.1, theme four).
- Designing for the limits of testing — knowing what the harness cannot prove (§4.1).
Background: why build a lab at all?
You already have two halves of the workflow lying around. The chapter's Project Checkpoint gave you
find_counterexample(predicate, domain) — the testing half. And every direct proof in §4.3 had the
same shape: assume the hypothesis, then exhibit a witness integer (the "$m$" in $n^2 = 2m+1$, the
"$3c$" in $n = 2(3c)$). What is missing is a tool that holds those two halves together — that lets you
screen a conjecture, and then, once you have a candidate witness rule from a proof, check that the
witness actually works across a domain before you trust your algebra.
🔗 Connection. This is theme four — computation and proof are complementary — turned into a single object. The same instinct scales: when you build RSA in Chapter 25, you will test key generation on small primes before trusting the number theory that proves it correct (Chapter 23). A conjecture lab is training wheels for that habit.
💡 Intuition: A direct proof of "$P(n) \rightarrow Q(n)$" is really a function: feed it the data that makes $P(n)$ true, and it returns the witness that makes $Q(n)$ true. Our lab makes that function literal. If your witness function is wrong, the checker catches it on some $n$ — exactly where your proof's algebra slipped.
Phase 1: The screening layer (test the conjecture)
Start with the cheap screen. Given a conjecture as a predicate over a domain, look for a counterexample. This is the Project Checkpoint function, repackaged into the lab with a friendlier report.
def screen(claim, domain):
"""Screen 'for all x in domain, claim(x)'. Return ('refuted', x) at the
first failure, else ('no counterexample', None). NOT a proof either way."""
for x in domain:
if not claim(x):
return ("refuted", x)
return ("no counterexample", None)
# Conjecture 1: for every integer n, if n is odd then 5n+3 is even.
c1 = lambda n: (n % 2 == 0) or (5 * n + 3) % 2 == 0 # "n odd -> 5n+3 even"
print(screen(c1, range(-100, 101)))
# Expected output:
# ('no counterexample', None)
Hand-trace the report: c1(n) is True whenever n is even (the hypothesis fails, so the conditional
holds vacuously) and, for odd n, whenever 5n+3 is even. For odd n, $5n + 3$ is (odd $\times$ odd)
$+$ odd $=$ odd $+$ odd $=$ even, so c1(n) is True for every n in range. The loop finds no failure
and returns ('no counterexample', None).
⚠️ Common Pitfall: It is tempting to read
('no counterexample', None)as "proved." It is not. The screen examined $201$ integers; the claim is about infinitely many. The whole reason for Phases 2 and 3 is that this report is evidence, and evidence is where a proof begins, not ends (§4.1).🔄 Check Your Understanding Why does
c1returnTruefor evenneven though the claim only talks about oddn? Which row of the implication truth table is that?
Answer
For even
n, the hypothesis "$n$ is odd" is false, and a conditional with a false hypothesis is vacuously true (§4.5) — rows 3–4 of the $\rightarrow$ table. We encode the conditionalP -> Qas(not P) or Q, soc1correctly reportsTrueon the rows wherePfails.
Phase 2: The witness-checker (the heart of a direct proof)
Now the new piece. A direct proof of "$P(n) \rightarrow Q(n)$" works by exhibiting a witness — for an "even/odd/divides" conclusion, an integer $w$ such that the conclusion's defining equation holds. We encode three things: the hypothesis predicate, a witness function $w(n)$ extracted from the proof, and a check that the conclusion's equation holds with that witness.
def check_direct(hyp, witness, verify, domain):
"""For each n in domain with hyp(n) true, confirm verify(n, witness(n)).
Returns ('ok', None) if the witness works everywhere the hypothesis holds,
else ('witness fails', n). This is the constructive core of a direct proof."""
for n in domain:
if hyp(n) and not verify(n, witness(n)):
return ("witness fails", n)
return ("ok", None)
# Theorem (Example 1, sec 4.3): if n is odd, then n^2 is odd.
# Proof's witness: n = 2k+1 gives n^2 = 2(2k^2+2k)+1, so m = 2k^2 + 2k.
# In terms of n itself (n = 2k+1 => k = (n-1)//2):
hyp_odd = lambda n: n % 2 == 1
witness_m = lambda n: 2 * ((n - 1) // 2) ** 2 + 2 * ((n - 1) // 2)
verify_sq = lambda n, m: n * n == 2 * m + 1 # n^2 == 2m + 1 (def. of odd)
print(check_direct(hyp_odd, witness_m, verify_sq, range(-99, 100, 2)))
# Expected output:
# ('ok', None)
This is the part that makes the abstraction click. The proof of Example 1 manufactured the integer
$m = 2k^2 + 2k$ so that $n^2 = 2m + 1$. Here we turn that manufactured integer into the function
witness_m, and check_direct confirms $n^2 = 2m + 1$ for every odd $n$ in the domain. Let us
hand-trace one value to see the witness is real, not magic.
Take $n = 5$ (so $k = (5-1)//2 = 2$). Then witness_m(5) $= 2 \cdot 2^2 + 2 \cdot 2 = 8 + 4 = 12$, and
verify_sq(5, 12) checks $5^2 = 2 \cdot 12 + 1$, i.e. $25 = 25$. True. The witness $m = 12$ is exactly
the integer the proof promised. Repeating for every odd $n$ in range yields no failure, so the report is
('ok', None).
🚪 Threshold Concept. Here is the idea worth carrying out of this chapter: a direct proof is a construction, and its witness can be run. Once you see "$\exists m,\ n^2 = 2m+1$" as "here is a program computing $m$ from $n$," the line between proving and building dissolves. The same realization powers constructive mathematics, type theory, and the proof-carrying code that ships in real compilers. For now: if you can write the witness function and watch it pass, your direct proof's core is sound; if it fails on some $n$, your algebra slipped exactly there.
🐛 Find the Error. A teammate adds a "theorem": "if $n$ is odd, then $n^2$ is odd," but supplies the wrong witness
witness_bad = lambda n: ((n - 1) // 2) ** 2(they forgot the cross term). What doescheck_direct(hyp_odd, witness_bad, verify_sq, range(1, 10, 2))return, and what does that tell them about their proof?
Answer
It returns
('witness fails', 3). Trace $n = 3$: $k = 1$,witness_bad(3)$= 1^2 = 1$, andverify_sq(3, 1)checks $9 = 2 \cdot 1 + 1 = 3$ — false. (Even $n = 1$ passes by luck: witness $0$, $1 = 1$; the failure first shows at $3$.) The harness pinpoints $n = 3$ as where the algebra breaks — the missing cross term $2k$ matters. A wrong witness is a wrong proof, and the checker finds the exact case that exposes it.
Phase 3: Encoding a proof by contraposition
Direct proofs put the witness on the conclusion side. A proof by contraposition of
"$P(n) \rightarrow Q(n)$" instead proves "$\neg Q(n) \rightarrow \neg P(n)$," so its witness lives on the
$\neg P$ side. We do not need a new checker — we reuse check_direct, but feed it the contrapositive's
hypothesis ($\neg Q$) and a witness for $\neg P$. The lab just has to record which shape we used.
Take the theorem the chapter could not prove directly (Example 4, §4.4): "if $n^2$ is odd, then $n$ is odd." Its contrapositive is "if $n$ is even, then $n^2$ is even," whose direct proof exhibits the witness $n^2 = 2(2k^2)$, i.e. $m = 2k^2$.
def check_contrapositive(not_q, witness, verify, domain):
"""Check the contrapositive 'not_q(n) -> not_p(n)' directly, via a witness
for not_p. Same engine as check_direct -- only the labels differ."""
return check_direct(not_q, witness, verify, domain) # reuse the core
# Theorem (Example 4): if n^2 is odd then n is odd.
# Contrapositive: if n is EVEN (not_q) then n^2 is EVEN (not_p).
# Direct proof of contrapositive: n = 2k => n^2 = 2(2k^2), witness m = 2k^2.
not_q_even = lambda n: n % 2 == 0 # "n is even" = not "n is odd"
witness_even = lambda n: 2 * (n // 2) ** 2 # m with n^2 = 2m
verify_even = lambda n, m: n * n == 2 * m # n^2 == 2m (def. of even)
print(check_contrapositive(not_q_even, witness_even, verify_even, range(-100, 101, 2)))
# Expected output:
# ('ok', None)
Hand-trace $n = 6$: $k = 3$, witness_even(6) $= 2 \cdot 3^2 = 18$, and verify_even(6, 18) checks
$36 = 2 \cdot 18 = 36$. True. Across all even $n$ in range, no failure: ('ok', None). We have now
checked both halves of the biconditional "$n$ odd iff $n^2$ odd" — the forward half by check_direct
(Phase 2) and the reverse half by check_contrapositive (here) — mirroring exactly how §4.3–4.4
assembled the full theorem from a direct proof plus a contraposition.
⚠️ Common Pitfall: feeding the converse instead of the contrapositive. A natural slip is to test "if $n$ is even then $n^2$ is even" and believe you have certified the converse direction too. You have not — that conditional is the contrapositive of "$n^2$ odd $\rightarrow n$ odd," not of "$n$ odd $\rightarrow n^2$ odd." The harness is honest only if you wire $\neg Q$ and $\neg P$ correctly; garbage strategy in, garbage certificate out. Write $\neg P$ and $\neg Q$ explicitly (§4.4) before you build the predicates.
Phase 4: The lab in one place, and its limits
Tie the layers into a single driver that, for one theorem, screens it and then checks its witness — producing exactly the chapter's workflow: test, then prove (constructively).
def lab(name, claim, hyp, witness, verify, domain):
s = screen(claim, domain) # cheap evidence
c = check_direct(hyp, witness, verify, domain) # constructive core
return (name, s, c)
# Run the "odd -> odd square" theorem through both stages.
c1 = lambda n: (n % 2 == 0) or (n * n) % 2 == 1 # the conditional, for screening
print(lab("odd^2 is odd", c1, hyp_odd, witness_m, verify_sq, range(1, 60, 2)))
# Expected output:
# ('odd^2 is odd', ('no counterexample', None), ('ok', None))
The report says two different things, and the difference is the whole lesson. ('no counterexample',
None) is evidence: across the screened domain, nothing falsified the claim. ('ok', None) is the
constructive core of a proof: a witness rule, extracted from the §4.3 proof, demonstrably produces
the required equation on every hypothesis-satisfying input checked. Neither line is, by itself, a
finished proof — but together they encode the two activities the chapter insists you keep separate and
keep both.
The honest limits. What the lab does not do: it does not prove anything for the integers outside
domain.check_directreturning('ok', None)confirms your witness works on the sampled inputs; it cannot confirm your witness formula is correct for all $n$ — only the algebra in your written proof does that. The lab's value is catching a wrong witness fast (a('witness fails', n)) and giving you confidence to invest in the proof, not replacing the proof. Mistaking('ok', None)for QED is the very error §4.1 warned against, now wearing a function's clothing.🔄 Check Your Understanding Suppose
check_directreturns('ok', None)onrange(-99, 100, 2)for a witness that is actually wrong for some gigantic odd $n$ outside that range. Has the lab lied? What is the lab actually certifying?
Answer
The lab has not lied — it certifies exactly what it says: the witness works for every (odd) $n$ in the domain it checked. It makes no claim about $n$ outside that domain. A witness correct on a finite sample but wrong in general is precisely why the written proof's algebra (valid for an arbitrary $n$) is irreplaceable. The lab is a fast falsifier and a confidence-builder, not a universal certifier.
Discussion Questions
- Phase 2's
check_directand Phase 3'scheck_contrapositiveshare one engine. Explain why the same code checks both — and what changes (the predicates, not the algorithm) when you switch strategies. Tie this to "a contraposition is a direct proof of the contrapositive" (§4.4). - The lab can return
('witness fails', n)for a true theorem if you supply a wrong witness. Does that mean the theorem is false? Distinguish "the proof attempt is broken" from "the claim is false," referencing §4.6. screenandcheck_directcould disagree:screenfinds no counterexample, butcheck_directreports a witness failure. What would that combination tell you about your proof versus the claim? (Hint: the claim might be true while your witness rule is wrong.)- Why can no enhancement to this lab ever turn
('ok', None)into a genuine proof for all integers? Connect to §4.1's argument that a finite check cannot cover an infinite set. - The lab encodes "even," "odd," and "divides" conclusions as equations with a witness. What kinds of conclusions could it not check this way (e.g., "$\sqrt 2$ is irrational")? What does that reveal about the reach of constructive (witness-based) proof versus the contradiction proofs of Chapter 5?
Your Turn: Extensions
- Option A (divisibility witnesses). Extend the lab to the theorem "if $a \mid b$ then $a \mid bc$"
(Exercise 4.9). Encode the hypothesis on pairs $(a, b, c)$, write the witness ($b = a s \Rightarrow bc
= a(sc)$, so $w = sc$), and a `verify` that checks $bc = a \cdot w$. Run it over a small grid of
triples and hand-derive the
('ok', None). - Option B (catch a real bug). Re-implement the odd-square witness with the deliberate error from the
Find-the-Error box ($m = k^2$, missing the cross term) and run the full
lab. Show thatscreenstill says('no counterexample', None)(because the claim is true) whilecheck_directsays('witness fails', 3)(because the witness is wrong). Write one sentence on why this is the most instructive output the lab can produce. - Option C (strategy recorder). Add a field to
labthat records the string"direct"or"contraposition"and refuses to run unless the caller has named a strategy — enforcing the §4.5 rule "choose a strategy and announce it" in code. Discuss whether forcing the human to declare intent catches the converse error earlier. - Option D (compare to the Toolkit). Relate
screento the Toolkit'sfind_counterexample/holds_for_allfrom this chapter's Project Checkpoint. Are they the same function with different return conventions? Refactorscreento callfind_counterexampleand explain what you kept and what you added.
Key Takeaways
- Build the workflow, not just the answer. The lab makes the chapter's core distinction — evidence (a screen) versus the constructive core of a proof (a checked witness) — into a reusable object you can run on any new conjecture.
- A direct proof's witness can be coded and checked. Turning the manufactured integer ($m = 2k^2+2k$, $m = 2k^2$, $w = sc$) into a function and watching it pass is the most concrete possible grasp of "a proof is a construction" (§4.3).
- Contraposition reuses the direct engine. Because a contraposition is a direct proof of $\neg Q \rightarrow \neg P$, the same checker works — you only re-wire which predicate is the hypothesis and where the witness lives (§4.4).
- The lab falsifies fast but never certifies the infinite.
('witness fails', n)is gold — it pinpoints where a proof breaks;('ok', None)is only confidence over a finite domain. The written proof, valid for an arbitrary integer, is what makes the guarantee total (§4.1, theme four). - Know what a witness-checker cannot reach. Equation-with-a-witness conclusions (even/odd/divides) fit the lab; "irrational," "no largest prime," and other negative claims do not — foreshadowing the contradiction technique of Chapter 5.