Case Study: Building a Contract Checker from Quantified Specifications

DataField.Dev

Case Study: Building a Contract Checker from Quantified Specifications

"Every assertion you write is a quantifier you decided to check at runtime."

Executive Summary

In this build-focused case study you'll construct a small but real piece of infrastructure: a contract checker that takes a function's precondition and postcondition — written as quantified predicates — and enforces them every time the function runs. Then you'll build its partner, a counterexample-searching property tester that hunts a finite domain for an input where a forall claim breaks. Together they are the executable face of §2.6 (specifications as $\forall$/$\exists$ statements) and §2.4 (negation as counterexample search). You are, in miniature, building what tools like Python's icontract and the property-based tester Hypothesis do — and you'll understand them because you'll have written them from the quantifiers up.

Where Case Study 1 audited a fixed dataset, here you build reusable machinery and assemble it into a verified sorted_merge function. The payoff is theme four made concrete: the contract checker evaluates quantified specs over the inputs that actually occur, the property tester searches a finite domain for counterexamples, and a proof (which we sketch) settles the rest.

Skills applied

Turning a precondition / postcondition into all() / any() predicates (§2.5, §2.6).
Building a decorator that evaluates a universal precondition and a universal/existential postcondition.
Implementing counterexample search as the executable negation of a universal (§2.4).
Specifying a real function (merge of two sorted lists) and checking it against its contract.
Distinguishing what runtime checking and finite search establish from what a proof establishes.

Background: assertions are quantifiers in disguise

You already write assert x >= 0 and assert is_sorted(arr). Each such assertion is a predicate you chose to evaluate at a program point — and the interesting ones are quantified: assert all(x >= 0 for x in arr) is literally "$\forall x \in arr,\ x \ge 0$," checked at runtime. §2.6 told us a function's contract is two such statements: a precondition (the caller's obligation) and a postcondition (the function's promise). This case study turns that idea into a tool: instead of scattering asserts by hand, we'll attach the quantified contract to the function once and let a decorator check it on every call.

🔗 Connection: This is the §2.6 binary-search contract, generalized. There the precondition was $\forall i\,(0 \le i < n-1 \rightarrow arr[i] \le arr[i+1])$ and the "found" postcondition was $\exists j\,(arr[j] = t)$. We'll make any such pair enforceable, not just hand-write the checks for one function.

Phase 1: Build the precondition checker

A precondition is a predicate on the function's inputs. We start with the simplest useful contract: a single boolean function of the arguments that must be True on entry.

def require(precondition):
    """Decorator: check `precondition(*args)` before the function runs.
    precondition returns True iff the call is legal (the caller's obligation, sec 2.6)."""
    def decorate(fn):
        def wrapped(*args):
            if not precondition(*args):
                raise AssertionError(f"precondition violated on {args}")
            return fn(*args)
        return wrapped
    return decorate

@require(lambda a, b: a != 0)            # precondition: divisor a is nonzero
def divide(a, b):
    return b // a

print(divide(4, 20))                     # legal: 4 != 0
# print(divide(0, 20))                   # would raise AssertionError
# Expected output:
# 5

The decorator evaluates the precondition predicate and refuses the call when it fails — the executable form of "the caller's obligation." The interesting case is when the precondition is itself a universal over a collection, which is where §2.5's all() enters.

@require(lambda arr: all(arr[i] <= arr[i+1] for i in range(len(arr)-1)))
def binary_search_first(arr, target):
    """Precondition: arr is sorted ascending  (forall i. arr[i] <= arr[i+1])."""
    lo, hi, found = 0, len(arr) - 1, -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            found, hi = mid, mid - 1     # keep searching left for the FIRST index
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return found

print(binary_search_first([1, 3, 3, 3, 7], 3))    # sorted; first index of 3
# print(binary_search_first([3, 1, 2], 3))         # would raise: precondition violated
# Expected output:
# 1

The decorator now checks the quantified precondition $\forall i\,(arr[i] \le arr[i+1])$ on every call, via exactly the all(...) from §2.5. A call on an unsorted list is rejected before the algorithm whose correctness depends on sortedness ever runs.

💡 Intuition: The decorator is just "evaluate a $\forall$ at the door." require doesn't know or care what the predicate is — it only knows that a precondition is a statement that must be True on entry, and that checking a universal means running all(). The logic of §2.2 and §2.5 is the entire design.

🔄 Check Your Understanding The divide precondition a != 0 is not visibly quantified. Is it still a predicate in the sense of §2.1? Over what (trivial) domain could you view its check as a degenerate all?

Answer

Yes — a != 0 is a unary predicate $P(a)$ on the single argument; checking it is the degenerate case of a quantifier over a one-element "domain" (just this call's argument). The universal all([P(a)]) equals P(a). The decorator treats both uniformly: evaluate the predicate, demand True.

Phase 2: Build the postcondition checker

A postcondition is a predicate on the inputs and the result — the function's promise on return. We extend the decorator to evaluate it after the call.

def contract(pre, post):
    """pre(*args) must hold on entry; post(result, *args) must hold on return.
    pre is the caller's obligation; post is the function's guarantee (sec 2.6)."""
    def decorate(fn):
        def wrapped(*args):
            if not pre(*args):
                raise AssertionError(f"precondition violated on {args}")
            result = fn(*args)
            if not post(result, *args):
                raise AssertionError(f"postcondition violated: f{args} = {result!r}")
            return result
        return wrapped
    return decorate

Now we can state a universal postcondition. Consider absolutes(xs), which should return the elementwise absolute values. Its postcondition is two universals: every output is non-negative, and every output has the right magnitude.

@contract(
    pre  = lambda xs: True,                                   # no obligation on input
    post = lambda r, xs: (len(r) == len(xs)
                          and all(r[i] >= 0 for i in range(len(r)))          # forall i. r[i] >= 0
                          and all(r[i] == abs(xs[i]) for i in range(len(r))))  # forall i. r[i] == |xs[i]|
)
def absolutes(xs):
    return [abs(x) for x in xs]

print(absolutes([-3, 4, -1]))
# Expected output:
# [3, 4, 1]

The postcondition is the conjunction of two universal statements, $\forall i\,(r[i] \ge 0)$ and $\forall i\,(r[i] = \lvert xs[i]\rvert)$, each evaluated with all(). A buggy implementation — say return [x for x in xs] (forgetting abs) — would pass the length check but fail the first universal on the first negative input, and the decorator would raise. The contract is the spec; the body must earn it.

⚠️ Common Pitfall: a vacuously-true postcondition that checks nothing. If you wrote the postcondition as all(r[i] >= 0 for i in range(len(r))) and forgot the length check, then on an empty result r == [] the all(...) is vacuously true (§2.5) — the contract would "pass" a function that wrongly returns [] for every input. A universal over an empty domain is always satisfied; guard the size explicitly (as we did with len(r) == len(xs)) when emptiness would hide a bug. This is the §2.5 vacuous-truth pitfall biting a real contract.

Phase 3: Build a counterexample-searching property tester

The decorator checks the contract on the inputs that actually occur. A property tester is more aggressive: it generates many inputs from a finite domain and searches for one that breaks a forall claim — the executable negation of §2.4. The core is the chapter's counterexample, lifted to operate on whole inputs rather than single domain elements.

def find_counterexample(property_fn, inputs):
    """Return the first x in `inputs` with property_fn(x) False
    -- a witness to  exists x. not property_fn(x)  (sec 2.4),
    or None if the property holds on every tested input (EVIDENCE, not proof)."""
    for x in inputs:
        if not property_fn(x):
            return x
    return None

This is the §2.4 counterexample finder, renamed for its job: property_fn is the universal we hope holds, and a returned value is the negation $\exists x\,\neg P(x)$ made concrete. We test a claimed property of absolutes — "the output equals the input elementwise" — which is false whenever an input is negative, and watch the tester find the witness.

test_inputs = [[0, 1, 2], [5], [-3, 4], [10, 11, 12]]     # a finite domain of inputs
claim = lambda xs: absolutes(xs) == xs                    # forall xs. abs(xs) == xs   (FALSE!)

print(find_counterexample(claim, test_inputs))
# Expected output:
# [-3, 4]

Hand-trace it (you must, per the book's rule): [0,1,2] → absolutes returns [0,1,2] which equals the input, property True; [5] → [5] == [5], True; [-3,4] → absolutes returns [3,4], which is not [-3,4], property False → the loop returns [-3, 4] immediately. That returned list is the counterexample: the executable form of $\exists xs\,\neg(\text{absolutes}(xs) = xs)$. The correct property — absolutes(xs) == [abs(x) for x in xs] — would instead return None over these inputs (no counterexample found).

🔗 Connection: find_counterexample returning None means "no counterexample in the tested inputs" — evidence the property holds, not proof, exactly as §2.4 and theme four stress. A real property-based tester (Hypothesis) is this function plus a clever input generator; the logic of "search a domain for a witness to the negation" is what you just built.

🔄 Check Your Understanding If find_counterexample(P, inputs) returns None, what is the strongest claim you may make, and what claim may you not make? Phrase both using §2.4's negation law.

Answer

You may claim: $\forall x \in \texttt{inputs},\ P(x)$ — the property holds over the finite set you tested (no witness to $\exists x\,\neg P(x)$ was found there). You may not claim $\forall x\,P(x)$ over the full, possibly infinite input space — an untested input could still be a counterexample. Only a proof closes that gap (theme four).

Phase 4: Specify and verify `sorted_merge`

Now assemble the pieces into something real: merging two sorted lists into one sorted list — the merge step of mergesort. We state its full contract as quantified statements, enforce it with the decorator, and stress it with the property tester.

The contract. Inputs a, b are each sorted ascending (precondition). The output m must be (i) sorted ascending and (ii) a permutation of a + b — every element accounted for. Sortedness is a universal; "permutation" we check via multiset equality (sorted(m) == sorted(a + b)), which is itself the assertion that the same elements appear.

def is_sorted(xs):
    """forall i. 0 <= i < len-1  ->  xs[i] <= xs[i+1]   (sec 2.6 precondition predicate)."""
    return all(xs[i] <= xs[i+1] for i in range(len(xs) - 1))

@contract(
    pre  = lambda a, b: is_sorted(a) and is_sorted(b),
    post = lambda m, a, b: is_sorted(m) and sorted(m) == sorted(a + b)
)
def sorted_merge(a, b):
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])     # one list is now exhausted
    return out

print(sorted_merge([1, 4, 7], [2, 3, 8]))
# Expected output:
# [1, 2, 3, 4, 7, 8]

Hand-trace the merge on [1,4,7] and [2,3,8]: compare 1≤2 → take 1; 4 vs 2 → take 2; 4 vs 3 → take 3; 4 vs 8 → take 4; 7 vs 8 → take 7; a exhausted, extend with remaining b = [8]. Result [1,2,3,4,7,8]. The postcondition checks: is_sorted([1,2,3,4,7,8]) is True, and sorted([1,2,3,4,7,8]) == sorted([1,4,7,2,3,8]) is [1,2,3,4,7,8] == [1,2,3,4,7,8], True. Contract satisfied.

Now hunt for bugs with the property tester. We feed pairs of sorted lists and assert the output is sorted — the postcondition's first half — as a standalone property:

pairs = [([1, 4, 7], [2, 3, 8]),
         ([], [5, 6]),
         ([1, 1, 1], [1, 1]),
         ([0, 10], [5])]
sorted_output = lambda pair: is_sorted(sorted_merge(pair[0], pair[1]))

print(find_counterexample(sorted_output, pairs))
# Expected output:
# None

None: every tested pair produced sorted output. Now introduce the classic bug — change a[i] <= b[j] to a[i] < b[j] and mishandle equal elements, or more simply imagine the extend lines were forgotten so leftover elements are dropped. The sorted(m) == sorted(a+b) half of the postcondition would fail on the first call with leftovers (e.g. [1,4,7] & [2,3,8] drops 8), and the decorator would raise postcondition violated. Two independent safety nets — the contract on real calls and the property search over generated inputs — make the bug hard to miss.

💡 Intuition: The contract checks correctness where the program actually goes; the property tester checks it where you point it. Neither is a proof: both evaluate the quantifier over a finite set of inputs (§2.5). A proof that sorted_merge meets its postcondition for all sorted inputs uses a loop invariant — "out is always a sorted merge of the consumed prefixes" — which is the Chapter 6 technique. Spec now (this chapter), prove later (Ch. 6).

🔗 Connection: Proving sorted_merge correct for every input is a loop-invariant induction, the subject of Chapter 6; and merge itself returns to center stage as the combine step of mergesort when we analyze divide-and-conquer recurrences in Chapter 18. The specification you wrote here is the $\forall$/$\exists$ statement those later proofs will discharge.

Discussion Questions

The contract decorator evaluates the postcondition on every call in production, which costs time. When is paying that cost worth it, and when would you check contracts only in testing? Relate your answer to "a passing test is one true conjunct; correctness is the whole $\forall$" (theme two).
The sorted_merge postcondition checks "permutation of a + b" with sorted(m) == sorted(a + b). That re-sorts inside the check — arguably defeating the purpose of an $O(n)$ merge. Propose a cheaper postcondition predicate that still pins down "same elements," and discuss what it does and doesn't guarantee.
find_counterexample returning None is evidence, not proof. Construct a property of sorted_merge that is false but for which your pairs list happens to contain no counterexample — illustrating exactly why finite search cannot certify a universal (§2.4, theme four).
The empty-list case sorted_merge([], [5,6]) exercised a vacuous truth: is_sorted([]) is True. Walk through the precondition and postcondition for this call and confirm each quantifier's value over the empty domain (§2.5). Where could a vacuously-true check hide a real bug?

Your Turn: Extensions

Option A (existential postcondition). Add binary_search with the "found" postcondition $\exists j\,(arr[j] = target) \rightarrow arr[\text{result}] = target$, expressed in the decorator. Note that an existential postcondition is checked with any(...) — the dual of the universal checks above.
Option B (richer generator). Replace the hand-written pairs/test_inputs lists with a small generator that produces many sorted-list pairs from a fixed pool of values, then run find_counterexample over them. You'll have built a miniature Hypothesis. State precisely what a None over 10,000 generated inputs does and does not prove.
Option C (toolkit). Rebuild find_counterexample on top of the Toolkit's counterexample from this chapter's Project Checkpoint, and is_sorted on top of for_all. Confirm that counterexample(P, D) is None is exactly for_all(P, D) — the negation law $\neg\forall x\,P(x) \equiv \exists x\,\neg P(x)$ in code.

Key Takeaways

A contract is a pair of quantified predicates, made enforceable. The decorator evaluates a universal precondition with all() on entry and a universal/existential postcondition on return — §2.6 turned into a tool.
Counterexample search is the executable negation of a universal. find_counterexample returns the witness to $\exists x\,\neg P(x)$ — §2.4's law, running.
Watch vacuous truth in checks. A universal over an empty result is True and can silently pass a buggy function; guard the size when emptiness would hide a fault (§2.5).
Runtime checks and finite search are evidence; a loop-invariant proof is certainty. Both safety nets evaluate the quantifier over finitely many inputs; only a proof (Chapter 6) settles the whole $\forall$ — theme four, built rather than just stated.

Case Study: Building a Contract Checker from Quantified Specifications

Executive Summary

Background: assertions are quantifiers in disguise

Phase 1: Build the precondition checker

Phase 2: Build the postcondition checker

Phase 3: Build a counterexample-searching property tester

Phase 4: Specify and verify sorted_merge

Discussion Questions

Your Turn: Extensions

Key Takeaways

Phase 4: Specify and verify `sorted_merge`