Case Study: Building a Witness-Checked Conjecture Lab

DataField.Dev

Case Study: Building a Witness-Checked Conjecture Lab

"First make the conjecture cheap to test. Then make the proof impossible to fake."

Executive Summary

In this build-focused case study you will construct a small tool — a conjecture lab — that captures the workflow this whole chapter is about: test a universal claim on many cases, then back it with a proof whose key step is a concrete witness. You will write it in three layers. First, a counterexample searcher that screens a $\forall$ claim cheaply (theme four: computation tests). Second, a witness checker that takes the constructive heart of a direct proof — the integer the proof "manufactures" — and verifies it does its job on a domain. Third, a tiny proof-shape recorder that distinguishes a direct proof from a proof by contraposition, so the lab forces you to name your strategy the way §4.5 demands.

The lab is deliberately small and never executed (every output is hand-derived). The point is not the code; it is that building the workflow makes the chapter's central distinction — evidence versus proof — concrete and reusable. By the end you will have a reusable harness, three proven theorems that ran through it, and a clear sense of what your harness does not certify.

Skills applied

Building a small verification harness incrementally, layer by layer (§4.1, §4.3).
Encoding the witness of a direct proof as a function and checking it (§4.2, §4.3).
Encoding a proof by contraposition and seeing why its witness lives on the $\neg Q$ side (§4.4).
Separating "no counterexample found" (evidence) from "proof" (certainty) in code (§4.1, theme four).
Designing for the limits of testing — knowing what the harness cannot prove (§4.1).

Background: why build a lab at all?

You already have two halves of the workflow lying around. The chapter's Project Checkpoint gave you find_counterexample(predicate, domain) — the testing half. And every direct proof in §4.3 had the same shape: assume the hypothesis, then exhibit a witness integer (the "$m$" in $n^2 = 2m+1$, the "$3c$" in $n = 2(3c)$). What is missing is a tool that holds those two halves together — that lets you screen a conjecture, and then, once you have a candidate witness rule from a proof, check that the witness actually works across a domain before you trust your algebra.

🔗 Connection. This is theme four — computation and proof are complementary — turned into a single object. The same instinct scales: when you build RSA in Chapter 25, you will test key generation on small primes before trusting the number theory that proves it correct (Chapter 23). A conjecture lab is training wheels for that habit.

💡 Intuition: A direct proof of "$P(n) \rightarrow Q(n)$" is really a function: feed it the data that makes $P(n)$ true, and it returns the witness that makes $Q(n)$ true. Our lab makes that function literal. If your witness function is wrong, the checker catches it on some $n$ — exactly where your proof's algebra slipped.

Phase 1: The screening layer (test the conjecture)

Start with the cheap screen. Given a conjecture as a predicate over a domain, look for a counterexample. This is the Project Checkpoint function, repackaged into the lab with a friendlier report.

def screen(claim, domain):
    """Screen 'for all x in domain, claim(x)'. Return ('refuted', x) at the
    first failure, else ('no counterexample', None). NOT a proof either way."""
    for x in domain:
        if not claim(x):
            return ("refuted", x)
    return ("no counterexample", None)

# Conjecture 1: for every integer n, if n is odd then 5n+3 is even.
c1 = lambda n: (n % 2 == 0) or (5 * n + 3) % 2 == 0   # "n odd -> 5n+3 even"
print(screen(c1, range(-100, 101)))
# Expected output:
# ('no counterexample', None)

Hand-trace the report: c1(n) is True whenever n is even (the hypothesis fails, so the conditional holds vacuously) and, for odd n, whenever 5n+3 is even. For odd n, $5n + 3$ is (odd $\times$ odd) $+$ odd $=$ odd $+$ odd $=$ even, so c1(n) is True for every n in range. The loop finds no failure and returns ('no counterexample', None).

⚠️ Common Pitfall: It is tempting to read ('no counterexample', None) as "proved." It is not. The screen examined $201$ integers; the claim is about infinitely many. The whole reason for Phases 2 and 3 is that this report is evidence, and evidence is where a proof begins, not ends (§4.1).

🔄 Check Your Understanding Why does c1 return True for even n even though the claim only talks about odd n? Which row of the implication truth table is that?

Answer

For even n, the hypothesis "$n$ is odd" is false, and a conditional with a false hypothesis is vacuously true (§4.5) — rows 3–4 of the $\rightarrow$ table. We encode the conditional P -> Q as (not P) or Q, so c1 correctly reports True on the rows where P fails.

Phase 2: The witness-checker (the heart of a direct proof)

Now the new piece. A direct proof of "$P(n) \rightarrow Q(n)$" works by exhibiting a witness — for an "even/odd/divides" conclusion, an integer $w$ such that the conclusion's defining equation holds. We encode three things: the hypothesis predicate, a witness function $w(n)$ extracted from the proof, and a check that the conclusion's equation holds with that witness.

def check_direct(hyp, witness, verify, domain):
    """For each n in domain with hyp(n) true, confirm verify(n, witness(n)).
    Returns ('ok', None) if the witness works everywhere the hypothesis holds,
    else ('witness fails', n). This is the constructive core of a direct proof."""
    for n in domain:
        if hyp(n) and not verify(n, witness(n)):
            return ("witness fails", n)
    return ("ok", None)

# Theorem (Example 1, sec 4.3): if n is odd, then n^2 is odd.
# Proof's witness: n = 2k+1 gives n^2 = 2(2k^2+2k)+1, so m = 2k^2 + 2k.
# In terms of n itself (n = 2k+1 => k = (n-1)//2):
hyp_odd   = lambda n: n % 2 == 1
witness_m = lambda n: 2 * ((n - 1) // 2) ** 2 + 2 * ((n - 1) // 2)
verify_sq = lambda n, m: n * n == 2 * m + 1     # n^2 == 2m + 1  (def. of odd)
print(check_direct(hyp_odd, witness_m, verify_sq, range(-99, 100, 2)))
# Expected output:
# ('ok', None)

This is the part that makes the abstraction click. The proof of Example 1 manufactured the integer $m = 2k^2 + 2k$ so that $n^2 = 2m + 1$. Here we turn that manufactured integer into the function witness_m, and check_direct confirms $n^2 = 2m + 1$ for every odd $n$ in the domain. Let us hand-trace one value to see the witness is real, not magic.

Take $n = 5$ (so $k = (5-1)//2 = 2$). Then witness_m(5) $= 2 \cdot 2^2 + 2 \cdot 2 = 8 + 4 = 12$, and verify_sq(5, 12) checks $5^2 = 2 \cdot 12 + 1$, i.e. $25 = 25$. True. The witness $m = 12$ is exactly the integer the proof promised. Repeating for every odd $n$ in range yields no failure, so the report is ('ok', None).

🚪 Threshold Concept. Here is the idea worth carrying out of this chapter: a direct proof is a construction, and its witness can be run. Once you see "$\exists m,\ n^2 = 2m+1$" as "here is a program computing $m$ from $n$," the line between proving and building dissolves. The same realization powers constructive mathematics, type theory, and the proof-carrying code that ships in real compilers. For now: if you can write the witness function and watch it pass, your direct proof's core is sound; if it fails on some $n$, your algebra slipped exactly there.

🐛 Find the Error. A teammate adds a "theorem": "if $n$ is odd, then $n^2$ is odd," but supplies the wrong witness witness_bad = lambda n: ((n - 1) // 2) ** 2 (they forgot the cross term). What does check_direct(hyp_odd, witness_bad, verify_sq, range(1, 10, 2)) return, and what does that tell them about their proof?

Answer

It returns ('witness fails', 3). Trace $n = 3$: $k = 1$, witness_bad(3) $= 1^2 = 1$, and verify_sq(3, 1) checks $9 = 2 \cdot 1 + 1 = 3$ — false. (Even $n = 1$ passes by luck: witness $0$, $1 = 1$; the failure first shows at $3$.) The harness pinpoints $n = 3$ as where the algebra breaks — the missing cross term $2k$ matters. A wrong witness is a wrong proof, and the checker finds the exact case that exposes it.

Phase 3: Encoding a proof by contraposition

Direct proofs put the witness on the conclusion side. A proof by contraposition of "$P(n) \rightarrow Q(n)$" instead proves "$\neg Q(n) \rightarrow \neg P(n)$," so its witness lives on the $\neg P$ side. We do not need a new checker — we reuse check_direct, but feed it the contrapositive's hypothesis ($\neg Q$) and a witness for $\neg P$. The lab just has to record which shape we used.

Take the theorem the chapter could not prove directly (Example 4, §4.4): "if $n^2$ is odd, then $n$ is odd." Its contrapositive is "if $n$ is even, then $n^2$ is even," whose direct proof exhibits the witness $n^2 = 2(2k^2)$, i.e. $m = 2k^2$.

def check_contrapositive(not_q, witness, verify, domain):
    """Check the contrapositive 'not_q(n) -> not_p(n)' directly, via a witness
    for not_p. Same engine as check_direct -- only the labels differ."""
    return check_direct(not_q, witness, verify, domain)   # reuse the core

# Theorem (Example 4): if n^2 is odd then n is odd.
# Contrapositive: if n is EVEN (not_q) then n^2 is EVEN (not_p).
# Direct proof of contrapositive: n = 2k => n^2 = 2(2k^2), witness m = 2k^2.
not_q_even   = lambda n: n % 2 == 0                 # "n is even" = not "n is odd"
witness_even = lambda n: 2 * (n // 2) ** 2          # m with n^2 = 2m
verify_even  = lambda n, m: n * n == 2 * m          # n^2 == 2m  (def. of even)
print(check_contrapositive(not_q_even, witness_even, verify_even, range(-100, 101, 2)))
# Expected output:
# ('ok', None)

Hand-trace $n = 6$: $k = 3$, witness_even(6) $= 2 \cdot 3^2 = 18$, and verify_even(6, 18) checks $36 = 2 \cdot 18 = 36$. True. Across all even $n$ in range, no failure: ('ok', None). We have now checked both halves of the biconditional "$n$ odd iff $n^2$ odd" — the forward half by check_direct (Phase 2) and the reverse half by check_contrapositive (here) — mirroring exactly how §4.3–4.4 assembled the full theorem from a direct proof plus a contraposition.

⚠️ Common Pitfall: feeding the converse instead of the contrapositive. A natural slip is to test "if $n$ is even then $n^2$ is even" and believe you have certified the converse direction too. You have not — that conditional is the contrapositive of "$n^2$ odd $\rightarrow n$ odd," not of "$n$ odd $\rightarrow n^2$ odd." The harness is honest only if you wire $\neg Q$ and $\neg P$ correctly; garbage strategy in, garbage certificate out. Write $\neg P$ and $\neg Q$ explicitly (§4.4) before you build the predicates.

Phase 4: The lab in one place, and its limits

Tie the layers into a single driver that, for one theorem, screens it and then checks its witness — producing exactly the chapter's workflow: test, then prove (constructively).

def lab(name, claim, hyp, witness, verify, domain):
    s = screen(claim, domain)              # cheap evidence
    c = check_direct(hyp, witness, verify, domain)   # constructive core
    return (name, s, c)

# Run the "odd -> odd square" theorem through both stages.
c1   = lambda n: (n % 2 == 0) or (n * n) % 2 == 1     # the conditional, for screening
print(lab("odd^2 is odd", c1, hyp_odd, witness_m, verify_sq, range(1, 60, 2)))
# Expected output:
# ('odd^2 is odd', ('no counterexample', None), ('ok', None))

The report says two different things, and the difference is the whole lesson. ('no counterexample', None) is evidence: across the screened domain, nothing falsified the claim. ('ok', None) is the constructive core of a proof: a witness rule, extracted from the §4.3 proof, demonstrably produces the required equation on every hypothesis-satisfying input checked. Neither line is, by itself, a finished proof — but together they encode the two activities the chapter insists you keep separate and keep both.

The honest limits. What the lab does not do: it does not prove anything for the integers outside domain. check_direct returning ('ok', None) confirms your witness works on the sampled inputs; it cannot confirm your witness formula is correct for all $n$ — only the algebra in your written proof does that. The lab's value is catching a wrong witness fast (a ('witness fails', n)) and giving you confidence to invest in the proof, not replacing the proof. Mistaking ('ok', None) for QED is the very error §4.1 warned against, now wearing a function's clothing.

🔄 Check Your Understanding Suppose check_direct returns ('ok', None) on range(-99, 100, 2) for a witness that is actually wrong for some gigantic odd $n$ outside that range. Has the lab lied? What is the lab actually certifying?

Answer

The lab has not lied — it certifies exactly what it says: the witness works for every (odd) $n$ in the domain it checked. It makes no claim about $n$ outside that domain. A witness correct on a finite sample but wrong in general is precisely why the written proof's algebra (valid for an arbitrary $n$) is irreplaceable. The lab is a fast falsifier and a confidence-builder, not a universal certifier.

Discussion Questions

Phase 2's check_direct and Phase 3's check_contrapositive share one engine. Explain why the same code checks both — and what changes (the predicates, not the algorithm) when you switch strategies. Tie this to "a contraposition is a direct proof of the contrapositive" (§4.4).
The lab can return ('witness fails', n) for a true theorem if you supply a wrong witness. Does that mean the theorem is false? Distinguish "the proof attempt is broken" from "the claim is false," referencing §4.6.
screen and check_direct could disagree: screen finds no counterexample, but check_direct reports a witness failure. What would that combination tell you about your proof versus the claim? (Hint: the claim might be true while your witness rule is wrong.)
Why can no enhancement to this lab ever turn ('ok', None) into a genuine proof for all integers? Connect to §4.1's argument that a finite check cannot cover an infinite set.
The lab encodes "even," "odd," and "divides" conclusions as equations with a witness. What kinds of conclusions could it not check this way (e.g., "$\sqrt 2$ is irrational")? What does that reveal about the reach of constructive (witness-based) proof versus the contradiction proofs of Chapter 5?

Your Turn: Extensions

Option A (divisibility witnesses). Extend the lab to the theorem "if $a \mid b$ then $a \mid bc$" (Exercise 4.9). Encode the hypothesis on pairs $(a, b, c)$, write the witness ($b = a s \Rightarrow bc = a(sc)$, so $w = sc$), and a `verify` that checks $bc = a \cdot w$. Run it over a small grid of triples and hand-derive the ('ok', None).
Option B (catch a real bug). Re-implement the odd-square witness with the deliberate error from the Find-the-Error box ($m = k^2$, missing the cross term) and run the full lab. Show that screen still says ('no counterexample', None) (because the claim is true) while check_direct says ('witness fails', 3) (because the witness is wrong). Write one sentence on why this is the most instructive output the lab can produce.
Option C (strategy recorder). Add a field to lab that records the string "direct" or "contraposition" and refuses to run unless the caller has named a strategy — enforcing the §4.5 rule "choose a strategy and announce it" in code. Discuss whether forcing the human to declare intent catches the converse error earlier.
Option D (compare to the Toolkit). Relate screen to the Toolkit's find_counterexample / holds_for_all from this chapter's Project Checkpoint. Are they the same function with different return conventions? Refactor screen to call find_counterexample and explain what you kept and what you added.

Key Takeaways

Build the workflow, not just the answer. The lab makes the chapter's core distinction — evidence (a screen) versus the constructive core of a proof (a checked witness) — into a reusable object you can run on any new conjecture.
A direct proof's witness can be coded and checked. Turning the manufactured integer ($m = 2k^2+2k$, $m = 2k^2$, $w = sc$) into a function and watching it pass is the most concrete possible grasp of "a proof is a construction" (§4.3).
Contraposition reuses the direct engine. Because a contraposition is a direct proof of $\neg Q \rightarrow \neg P$, the same checker works — you only re-wire which predicate is the hypothesis and where the witness lives (§4.4).
The lab falsifies fast but never certifies the infinite. ('witness fails', n) is gold — it pinpoints where a proof breaks; ('ok', None) is only confidence over a finite domain. The written proof, valid for an arbitrary integer, is what makes the guarantee total (§4.1, theme four).
Know what a witness-checker cannot reach. Equation-with-a-witness conclusions (even/odd/divides) fit the lab; "irrational," "no largest prime," and other negative claims do not — foreshadowing the contradiction technique of Chapter 5.