Case Study: Building a Contract Checker from Quantified Specifications
"Every assertion you write is a quantifier you decided to check at runtime."
Executive Summary
In this build-focused case study you'll construct a small but real piece of infrastructure: a
contract checker that takes a function's precondition and postcondition — written as quantified
predicates — and enforces them every time the function runs. Then you'll build its partner, a
counterexample-searching property tester that hunts a finite domain for an input where a forall
claim breaks. Together they are the executable face of §2.6 (specifications as $\forall$/$\exists$
statements) and §2.4 (negation as counterexample search). You are, in miniature, building what tools
like Python's icontract and the property-based tester Hypothesis do — and you'll understand them
because you'll have written them from the quantifiers up.
Where Case Study 1 audited a fixed dataset, here you build reusable machinery and assemble it into a
verified sorted_merge function. The payoff is theme four made concrete: the contract checker
evaluates quantified specs over the inputs that actually occur, the property tester searches a finite
domain for counterexamples, and a proof (which we sketch) settles the rest.
Skills applied
- Turning a precondition / postcondition into
all()/any()predicates (§2.5, §2.6). - Building a decorator that evaluates a universal precondition and a universal/existential postcondition.
- Implementing counterexample search as the executable negation of a universal (§2.4).
- Specifying a real function (
mergeof two sorted lists) and checking it against its contract. - Distinguishing what runtime checking and finite search establish from what a proof establishes.
Background: assertions are quantifiers in disguise
You already write assert x >= 0 and assert is_sorted(arr). Each such assertion is a predicate you
chose to evaluate at a program point — and the interesting ones are quantified: assert all(x >= 0 for
x in arr) is literally "$\forall x \in arr,\ x \ge 0$," checked at runtime. §2.6 told us a function's
contract is two such statements: a precondition (the caller's obligation) and a postcondition
(the function's promise). This case study turns that idea into a tool: instead of scattering asserts by
hand, we'll attach the quantified contract to the function once and let a decorator check it on every
call.
🔗 Connection: This is the §2.6 binary-search contract, generalized. There the precondition was $\forall i\,(0 \le i < n-1 \rightarrow arr[i] \le arr[i+1])$ and the "found" postcondition was $\exists j\,(arr[j] = t)$. We'll make any such pair enforceable, not just hand-write the checks for one function.
Phase 1: Build the precondition checker
A precondition is a predicate on the function's inputs. We start with the simplest useful contract: a
single boolean function of the arguments that must be True on entry.
def require(precondition):
"""Decorator: check `precondition(*args)` before the function runs.
precondition returns True iff the call is legal (the caller's obligation, sec 2.6)."""
def decorate(fn):
def wrapped(*args):
if not precondition(*args):
raise AssertionError(f"precondition violated on {args}")
return fn(*args)
return wrapped
return decorate
@require(lambda a, b: a != 0) # precondition: divisor a is nonzero
def divide(a, b):
return b // a
print(divide(4, 20)) # legal: 4 != 0
# print(divide(0, 20)) # would raise AssertionError
# Expected output:
# 5
The decorator evaluates the precondition predicate and refuses the call when it fails — the executable
form of "the caller's obligation." The interesting case is when the precondition is itself a
universal over a collection, which is where §2.5's all() enters.
@require(lambda arr: all(arr[i] <= arr[i+1] for i in range(len(arr)-1)))
def binary_search_first(arr, target):
"""Precondition: arr is sorted ascending (forall i. arr[i] <= arr[i+1])."""
lo, hi, found = 0, len(arr) - 1, -1
while lo <= hi:
mid = (lo + hi) // 2
if arr[mid] == target:
found, hi = mid, mid - 1 # keep searching left for the FIRST index
elif arr[mid] < target:
lo = mid + 1
else:
hi = mid - 1
return found
print(binary_search_first([1, 3, 3, 3, 7], 3)) # sorted; first index of 3
# print(binary_search_first([3, 1, 2], 3)) # would raise: precondition violated
# Expected output:
# 1
The decorator now checks the quantified precondition $\forall i\,(arr[i] \le arr[i+1])$ on every call,
via exactly the all(...) from §2.5. A call on an unsorted list is rejected before the algorithm
whose correctness depends on sortedness ever runs.
💡 Intuition: The decorator is just "evaluate a $\forall$ at the door."
requiredoesn't know or care what the predicate is — it only knows that a precondition is a statement that must beTrueon entry, and that checking a universal means runningall(). The logic of §2.2 and §2.5 is the entire design.🔄 Check Your Understanding The
dividepreconditiona != 0is not visibly quantified. Is it still a predicate in the sense of §2.1? Over what (trivial) domain could you view its check as a degenerateall?
Answer
Yes —
a != 0is a unary predicate $P(a)$ on the single argument; checking it is the degenerate case of a quantifier over a one-element "domain" (just this call's argument). The universalall([P(a)])equalsP(a). The decorator treats both uniformly: evaluate the predicate, demandTrue.
Phase 2: Build the postcondition checker
A postcondition is a predicate on the inputs and the result — the function's promise on return. We extend the decorator to evaluate it after the call.
def contract(pre, post):
"""pre(*args) must hold on entry; post(result, *args) must hold on return.
pre is the caller's obligation; post is the function's guarantee (sec 2.6)."""
def decorate(fn):
def wrapped(*args):
if not pre(*args):
raise AssertionError(f"precondition violated on {args}")
result = fn(*args)
if not post(result, *args):
raise AssertionError(f"postcondition violated: f{args} = {result!r}")
return result
return wrapped
return decorate
Now we can state a universal postcondition. Consider absolutes(xs), which should return the
elementwise absolute values. Its postcondition is two universals: every output is non-negative, and
every output has the right magnitude.
@contract(
pre = lambda xs: True, # no obligation on input
post = lambda r, xs: (len(r) == len(xs)
and all(r[i] >= 0 for i in range(len(r))) # forall i. r[i] >= 0
and all(r[i] == abs(xs[i]) for i in range(len(r)))) # forall i. r[i] == |xs[i]|
)
def absolutes(xs):
return [abs(x) for x in xs]
print(absolutes([-3, 4, -1]))
# Expected output:
# [3, 4, 1]
The postcondition is the conjunction of two universal statements,
$\forall i\,(r[i] \ge 0)$ and $\forall i\,(r[i] = \lvert xs[i]\rvert)$, each evaluated with all(). A
buggy implementation — say return [x for x in xs] (forgetting abs) — would pass the length check but
fail the first universal on the first negative input, and the decorator would raise. The contract is
the spec; the body must earn it.
⚠️ Common Pitfall: a vacuously-true postcondition that checks nothing. If you wrote the postcondition as
all(r[i] >= 0 for i in range(len(r)))and forgot the length check, then on an empty resultr == []theall(...)is vacuously true (§2.5) — the contract would "pass" a function that wrongly returns[]for every input. A universal over an empty domain is always satisfied; guard the size explicitly (as we did withlen(r) == len(xs)) when emptiness would hide a bug. This is the §2.5 vacuous-truth pitfall biting a real contract.
Phase 3: Build a counterexample-searching property tester
The decorator checks the contract on the inputs that actually occur. A property tester is more
aggressive: it generates many inputs from a finite domain and searches for one that breaks a forall
claim — the executable negation of §2.4. The core is the chapter's counterexample, lifted to operate
on whole inputs rather than single domain elements.
def find_counterexample(property_fn, inputs):
"""Return the first x in `inputs` with property_fn(x) False
-- a witness to exists x. not property_fn(x) (sec 2.4),
or None if the property holds on every tested input (EVIDENCE, not proof)."""
for x in inputs:
if not property_fn(x):
return x
return None
This is the §2.4 counterexample finder, renamed for its job: property_fn is the universal we hope
holds, and a returned value is the negation $\exists x\,\neg P(x)$ made concrete. We test a claimed
property of absolutes — "the output equals the input elementwise" — which is false whenever an
input is negative, and watch the tester find the witness.
test_inputs = [[0, 1, 2], [5], [-3, 4], [10, 11, 12]] # a finite domain of inputs
claim = lambda xs: absolutes(xs) == xs # forall xs. abs(xs) == xs (FALSE!)
print(find_counterexample(claim, test_inputs))
# Expected output:
# [-3, 4]
Hand-trace it (you must, per the book's rule): [0,1,2] → absolutes returns [0,1,2] which equals the
input, property True; [5] → [5] == [5], True; [-3,4] → absolutes returns [3,4], which is
not [-3,4], property False → the loop returns [-3, 4] immediately. That returned list is the
counterexample: the executable form of $\exists xs\,\neg(\text{absolutes}(xs) = xs)$. The correct
property — absolutes(xs) == [abs(x) for x in xs] — would instead return None over these inputs
(no counterexample found).
🔗 Connection:
find_counterexamplereturningNonemeans "no counterexample in the tested inputs" — evidence the property holds, not proof, exactly as §2.4 and theme four stress. A real property-based tester (Hypothesis) is this function plus a clever input generator; the logic of "search a domain for a witness to the negation" is what you just built.🔄 Check Your Understanding If
find_counterexample(P, inputs)returnsNone, what is the strongest claim you may make, and what claim may you not make? Phrase both using §2.4's negation law.
Answer
You may claim: $\forall x \in \texttt{inputs},\ P(x)$ — the property holds over the finite set you tested (no witness to $\exists x\,\neg P(x)$ was found there). You may not claim $\forall x\,P(x)$ over the full, possibly infinite input space — an untested input could still be a counterexample. Only a proof closes that gap (theme four).
Phase 4: Specify and verify sorted_merge
Now assemble the pieces into something real: merging two sorted lists into one sorted list — the merge step of mergesort. We state its full contract as quantified statements, enforce it with the decorator, and stress it with the property tester.
The contract. Inputs a, b are each sorted ascending (precondition). The output m must be (i)
sorted ascending and (ii) a permutation of a + b — every element accounted for. Sortedness is a
universal; "permutation" we check via multiset equality (sorted(m) == sorted(a + b)), which is itself
the assertion that the same elements appear.
def is_sorted(xs):
"""forall i. 0 <= i < len-1 -> xs[i] <= xs[i+1] (sec 2.6 precondition predicate)."""
return all(xs[i] <= xs[i+1] for i in range(len(xs) - 1))
@contract(
pre = lambda a, b: is_sorted(a) and is_sorted(b),
post = lambda m, a, b: is_sorted(m) and sorted(m) == sorted(a + b)
)
def sorted_merge(a, b):
i = j = 0
out = []
while i < len(a) and j < len(b):
if a[i] <= b[j]:
out.append(a[i]); i += 1
else:
out.append(b[j]); j += 1
out.extend(a[i:]); out.extend(b[j:]) # one list is now exhausted
return out
print(sorted_merge([1, 4, 7], [2, 3, 8]))
# Expected output:
# [1, 2, 3, 4, 7, 8]
Hand-trace the merge on [1,4,7] and [2,3,8]: compare 1≤2 → take 1; 4 vs 2 → take 2; 4 vs 3 → take 3;
4 vs 8 → take 4; 7 vs 8 → take 7; a exhausted, extend with remaining b = [8]. Result
[1,2,3,4,7,8]. The postcondition checks: is_sorted([1,2,3,4,7,8]) is True, and
sorted([1,2,3,4,7,8]) == sorted([1,4,7,2,3,8]) is [1,2,3,4,7,8] == [1,2,3,4,7,8], True. Contract
satisfied.
Now hunt for bugs with the property tester. We feed pairs of sorted lists and assert the output is sorted — the postcondition's first half — as a standalone property:
pairs = [([1, 4, 7], [2, 3, 8]),
([], [5, 6]),
([1, 1, 1], [1, 1]),
([0, 10], [5])]
sorted_output = lambda pair: is_sorted(sorted_merge(pair[0], pair[1]))
print(find_counterexample(sorted_output, pairs))
# Expected output:
# None
None: every tested pair produced sorted output. Now introduce the classic bug — change a[i] <= b[j]
to a[i] < b[j] and mishandle equal elements, or more simply imagine the extend lines were
forgotten so leftover elements are dropped. The sorted(m) == sorted(a+b) half of the postcondition
would fail on the first call with leftovers (e.g. [1,4,7] & [2,3,8] drops 8), and the decorator
would raise postcondition violated. Two independent safety nets — the contract on real calls and the
property search over generated inputs — make the bug hard to miss.
💡 Intuition: The contract checks correctness where the program actually goes; the property tester checks it where you point it. Neither is a proof: both evaluate the quantifier over a finite set of inputs (§2.5). A proof that
sorted_mergemeets its postcondition for all sorted inputs uses a loop invariant — "outis always a sorted merge of the consumed prefixes" — which is the Chapter 6 technique. Spec now (this chapter), prove later (Ch. 6).🔗 Connection: Proving
sorted_mergecorrect for every input is a loop-invariant induction, the subject of Chapter 6; andmergeitself returns to center stage as the combine step of mergesort when we analyze divide-and-conquer recurrences in Chapter 18. The specification you wrote here is the $\forall$/$\exists$ statement those later proofs will discharge.
Discussion Questions
- The
contractdecorator evaluates the postcondition on every call in production, which costs time. When is paying that cost worth it, and when would you check contracts only in testing? Relate your answer to "a passing test is one true conjunct; correctness is the whole $\forall$" (theme two). - The
sorted_mergepostcondition checks "permutation ofa + b" withsorted(m) == sorted(a + b). That re-sorts inside the check — arguably defeating the purpose of an $O(n)$ merge. Propose a cheaper postcondition predicate that still pins down "same elements," and discuss what it does and doesn't guarantee. find_counterexamplereturningNoneis evidence, not proof. Construct a property ofsorted_mergethat is false but for which yourpairslist happens to contain no counterexample — illustrating exactly why finite search cannot certify a universal (§2.4, theme four).- The empty-list case
sorted_merge([], [5,6])exercised a vacuous truth:is_sorted([])isTrue. Walk through the precondition and postcondition for this call and confirm each quantifier's value over the empty domain (§2.5). Where could a vacuously-true check hide a real bug?
Your Turn: Extensions
- Option A (existential postcondition). Add
binary_searchwith the "found" postcondition $\exists j\,(arr[j] = target) \rightarrow arr[\text{result}] = target$, expressed in the decorator. Note that an existential postcondition is checked withany(...)— the dual of the universal checks above. - Option B (richer generator). Replace the hand-written
pairs/test_inputslists with a small generator that produces many sorted-list pairs from a fixed pool of values, then runfind_counterexampleover them. You'll have built a miniatureHypothesis. State precisely what aNoneover 10,000 generated inputs does and does not prove. - Option C (toolkit). Rebuild
find_counterexampleon top of the Toolkit'scounterexamplefrom this chapter's Project Checkpoint, andis_sortedon top offor_all. Confirm thatcounterexample(P, D) is Noneis exactlyfor_all(P, D)— the negation law $\neg\forall x\,P(x) \equiv \exists x\,\neg P(x)$ in code.
Key Takeaways
- A contract is a pair of quantified predicates, made enforceable. The decorator evaluates a
universal precondition with
all()on entry and a universal/existential postcondition on return — §2.6 turned into a tool. - Counterexample search is the executable negation of a universal.
find_counterexamplereturns the witness to $\exists x\,\neg P(x)$ — §2.4's law, running. - Watch vacuous truth in checks. A universal over an empty result is
Trueand can silently pass a buggy function; guard the size when emptiness would hide a fault (§2.5). - Runtime checks and finite search are evidence; a loop-invariant proof is certainty. Both safety nets evaluate the quantifier over finitely many inputs; only a proof (Chapter 6) settles the whole $\forall$ — theme four, built rather than just stated.