Case Study: Building a Feature-Flag Configuration Explorer

DataField.Dev

Case Study: Building a Feature-Flag Configuration Explorer

"A set is a Many that allows itself to be thought of as a One." — Georg Cantor

Executive Summary

In this build-focused case study you will construct, from scratch, a small tool that enumerates every possible configuration of a set of feature flags and pairs each configuration with each environment it might run in. Modern software ships behind feature flags — independent on/off switches like dark_mode, new_checkout, beta_search — and a tester needs to reason about all the on/off combinations, because a bug can hide in a combination nobody tried. "All on/off combinations of a set of flags" is precisely the power set of §8.2, and "every (configuration, environment) pair" is the Cartesian product of §8.4. You will build the power-set generator using the doubling algorithm from the chapter's Project Checkpoint, verify it produces exactly $2^n$ configurations as the theorem guarantees, then combine it with environments via a product, and finally hand-derive the full output so you can trust the code without running it.

Skills applied

Implementing the power set as a set of sets using frozenset (§8.2, §8.6).
Verifying $\lvert \mathcal{P}(S) \rvert = 2^{\lvert S \rvert}$ against the doubling algorithm (§8.2).
Building a Cartesian product to pair configurations with environments (§8.4).
Reasoning about the cardinality explosion $2^n \cdot m$ and what it means for test budgets.

Background

The scenario

This is a constructed teaching scenario (a hypothetical example), but the structure is exactly how feature-flag test matrices are reasoned about. A team has three independent feature flags:

$$F = \{\texttt{dark\_mode},\ \texttt{new\_checkout},\ \texttt{beta\_search}\}.$$

A configuration is a choice of which flags are on. Turning on dark_mode and beta_search while leaving new_checkout off is the configuration $\{\texttt{dark\_mode}, \texttt{beta\_search}\}$. Turning everything off is the empty configuration $\emptyset$ (a perfectly real configuration — the box is empty but it is still a box, §8.1). So:

The set of all configurations of the flags $F$ is exactly the power set $\mathcal{P}(F)$.

The team also runs in two environments, $E = \{\texttt{staging}, \texttt{prod}\}$, and wants a test matrix: every configuration paired with every environment. That is the Cartesian product $\mathcal{P}(F) \times E$.

💡 Intuition: A configuration is built by walking the flag list and making one independent yes/no decision per flag — "is this flag on?" That is the exact mental model behind the $2^n$ theorem in §8.2: $n$ independent binary choices give $2^n$ outcomes. A "set of flags that are on" is a subset of $F$, so the configurations are the subsets, and the subsets are $\mathcal{P}(F)$.

Why this matters

Feature flags multiply fast. The reason production incidents so often trace to "an untested combination of flags" is the $2^n$ growth you are about to build: ten independent flags already have $2^{10} = 1024$ configurations, and pairing them with a handful of environments multiplies again. A tool that enumerates the space makes the size visible and lets a tester sample it deliberately instead of hoping. Building the enumerator also cements the chapter's deepest point — that the power set is not an abstract curiosity but a constructible object whose size you can predict and verify.

Phase 1: Represent a single configuration

A configuration is a set of "on" flags. But we are about to put configurations inside another set (the set of all configurations), and §8.6 warns that the elements of a set must be hashable — a plain set is not. So each configuration must be a frozenset.

# A configuration = the frozenset of flags that are ON.
c1 = frozenset({"dark_mode", "beta_search"})   # two flags on, new_checkout off
c2 = frozenset()                                # the empty configuration: all off
c3 = frozenset({"dark_mode", "new_checkout", "beta_search"})  # all on

print(len(c1))                       # number of ON flags in c1
print("new_checkout" in c1)          # is new_checkout on in c1?
print(c2 == frozenset())             # the all-off config is the empty set
# Expected output:
# 2
# False
# True

⚠️ Common Pitfall: Do not represent a configuration as a list of on-flags. Two lists ["dark_mode", "beta_search"] and ["beta_search", "dark_mode"] are different objects (order matters for lists), but they describe the same configuration — order is irrelevant to "which flags are on." A frozenset enforces §8.1's "no order, no duplicates" automatically, so equal configurations compare equal, exactly as set equality (§8.2) requires.

Phase 2: Build the power set (the doubling algorithm)

Now generate all configurations. We implement the power set from scratch using the doubling algorithm from this chapter's Project Checkpoint — the $2^n$ proof of §8.2 turned into code. It starts with {∅} and, for each flag, keeps every configuration built so far and adds a copy of each with that flag switched on, doubling the collection per flag.

def all_configurations(flags):
    """{ A | A subseteq flags }, as a set of frozensets.
    Each pass over a flag DOUBLES the collection: keep each config,
    plus a copy with this flag turned on. After n flags: 2**n configs."""
    configs = {frozenset()}                  # start: just the all-off configuration
    for flag in flags:                       # process one flag at a time
        with_flag = {c | {flag} for c in configs}   # each config, now with `flag` on
        configs = configs | with_flag        # keep both "off" and "on" versions
    return configs

F = ["dark_mode", "new_checkout", "beta_search"]
configs = all_configurations(F)
print(len(configs))                          # should be 2 ** 3 = 8
print(frozenset({"dark_mode", "beta_search"}) in configs)   # is that config present?
print(frozenset() in configs)                # is the all-off config present?
# Expected output:
# 8
# True
# True

Let us hand-trace the doubling to be sure the count is right, processing the flags in order dark_mode, then new_checkout, then beta_search. Write d, n, b for the three flags.

After processing	`configs` (each row is one frozenset)	size
(start)	$\emptyset$	1
`d`	$\emptyset,\ \{d\}$	2
`n`	$\emptyset,\ \{d\},\ \{n\},\ \{d,n\}$	4
`b`	$\emptyset,\ \{d\},\ \{n\},\ \{d,n\},\ \{b\},\ \{d,b\},\ \{n,b\},\ \{d,n,b\}$	8

Each row doubled the one above it: $1 \to 2 \to 4 \to 8$. After three flags the count has doubled three times, $2^3 = 8$, exactly the §8.2 theorem $\lvert \mathcal{P}(S) \rvert = 2^{\lvert S \rvert}$. The "with flag / without flag" branch in the code is the independent binary choice from the theorem's combinatorial proof, running in a loop.

🔗 Connection. This doubling is the same structure as the induction in Exercise 8.27: the base case is $\{\emptyset\}$ (one subset of the empty set), and each new element splits every existing subset into two — with it and without it — doubling the count. The algorithm is the inductive proof, and the loop invariant is "after processing the first $j$ flags, configs is exactly $\mathcal{P}(\{$first $j$ flags$\})$, which has $2^j$ elements" (Exercise 8.28).

🔄 Check Your Understanding In the hand-trace, how many configurations have dark_mode on? Predict the count using the §8.2 reasoning before counting the rows.

Answer

Four. Fix dark_mode as "on," then the other two flags are free (each on or off): $2^2 = 4$ configurations — namely $\{d\}, \{d,n\}, \{d,b\}, \{d,n,b\}$. Exactly half of the $8$ configurations, matching the §8.2 Check-Your-Understanding result that half the subsets contain a fixed element.

Phase 3: Verify the count matches the theorem

A good build verifies its own invariant. The §8.2 theorem predicts $\lvert \mathcal{P}(F) \rvert = 2^{\lvert F \rvert}$ for *any* flag set, so we write a check that compares the generated count to $2^n$.

def power_set_count_matches(flags):
    """True iff the generated power set has exactly 2**n elements (n = #flags)."""
    n = len(set(flags))                       # distinct flags only
    generated = len(all_configurations(flags))
    return generated == 2 ** n

print(power_set_count_matches([]))                       # 2**0 = 1
print(power_set_count_matches(["a"]))                    # 2**1 = 2
print(power_set_count_matches(["a", "b", "c"]))          # 2**3 = 8
print(power_set_count_matches(["a", "b", "c", "d", "e"]))  # 2**5 = 32
# Expected output:
# True
# True
# True
# True

Each call returns True because the doubling algorithm produces exactly $2^n$ frozensets — once for the empty flag list (the lone configuration $\emptyset$, $2^0 = 1$), and doubling thereafter. This is theme four in miniature: the test confirms the theorem on these cases, and the theorem (plus the loop invariant) guarantees it for all cases the test cannot reach.

⚠️ Common Pitfall: If you had stored configurations as plain set objects instead of frozenset, the line configs = {frozenset()} would already fail — you cannot put a mutable set inside a set (it is unhashable, §8.6). Worse, a subtle version that "works" by storing tuples would let ("dark_mode","beta_search") and ("beta_search","dark_mode") count as two configurations, inflating the count past $2^n$ and silently breaking the verification. frozenset is what makes equal configurations equal.

Phase 4: Pair configurations with environments (Cartesian product)

The test matrix is every configuration paired with every environment — the Cartesian product $\mathcal{P}(F) \times E$ from §8.4. We build it with the comprehension that reads like the set-builder definition $\{(c, e) \mid c \in \mathcal{P}(F),\ e \in E\}$.

def test_matrix(flags, environments):
    """{ (config, env) | config in P(flags), env in environments }.
    Size = |P(flags)| * |environments| = 2**|flags| * |environments|."""
    configs = all_configurations(flags)
    return {(c, e) for c in configs for e in environments}   # one pair per (config, env)

F = ["dark_mode", "new_checkout", "beta_search"]
E = ["staging", "prod"]
matrix = test_matrix(F, E)
print(len(matrix))           # |P(F)| * |E| = 8 * 2
print((frozenset(), "prod") in matrix)        # all-off config in prod: a real test case
print((frozenset({"dark_mode"}), "staging") in matrix)
# Expected output:
# 16
# True
# True

By the product rule of §8.4, $\lvert \mathcal{P}(F) \times E \rvert = \lvert \mathcal{P}(F) \rvert \cdot \lvert E \rvert = 8 \cdot 2 = 16$. To build one element of the matrix you choose a configuration ($8$ ways) and an environment ($2$ ways), so there are $8 \cdot 2 = 16$ pairs — the printed count confirms it. Each pair $(c, e)$ is a concrete, runnable test case: "deploy with exactly these flags on, in this environment."

🚪 Threshold Concept. Watch the size: with $n$ flags and $m$ environments the matrix has $2^n \cdot m$ cases. At $n = 10$, $m = 3$ that is already $3072$ — too many to run by hand, and this is one small feature area. The power set's exponential growth is the mathematical reason exhaustive testing of flag combinations is infeasible at scale, and the reason engineers must sample the space (e.g., test all single-flag and all pairwise combinations) rather than enumerate it. Seeing $2^n$ here is seeing why a whole subfield (combinatorial testing) exists.

Phase 5: Putting it together — a configuration report

Finally, assemble the pieces into a small report the team can read: how big the space is, and a couple of sample cases. This is just the Toolkit operations combined.

def report(flags, environments):
    n = len(set(flags))
    configs = all_configurations(flags)
    matrix = test_matrix(flags, environments)
    print(f"flags: {n}  ->  configurations: {len(configs)} (= 2**{n})")
    print(f"environments: {len(environments)}  ->  test cases: {len(matrix)}")
    # show the all-on and all-off configurations explicitly:
    print("all-off config present:", frozenset() in configs)
    print("all-on  config present:", frozenset(set(flags)) in configs)

report(["dark_mode", "new_checkout", "beta_search"], ["staging", "prod"])
# Expected output:
# flags: 3  ->  configurations: 8 (= 2**3)
# environments: 2  ->  test cases: 16
# all-off config present: True
# all-on  config present: True

The two "always present" configurations are the two §8.2 reminded us never to forget: the empty set $\emptyset$ (all flags off) and the whole set $F$ itself (all flags on). Both always belong to $\mathcal{P}(F)$, and the report confirms the generator includes them.

Discussion Questions

The build represents a configuration as the set of on-flags. An alternative is a length-$n$ bitstring (one bit per flag, 1 = on). Describe the correspondence between subsets and bitstrings (§8.2's "subset $\leftrightarrow$ bitstring"), and argue why both have exactly $2^n$ elements. Which representation would you choose for $n = 64$ flags, and why?
Suppose two flags are mutually exclusive (light_mode and dark_mode can't both be on). The valid configurations are now a proper subset of $\mathcal{P}(F)$. How many configurations are forbidden, and how would you modify all_configurations to exclude them? Is the result still a power set of anything?
The Cartesian product $\mathcal{P}(F) \times E$ is not commutative: $\mathcal{P}(F) \times E \ne E \times \mathcal{P}(F)$ as sets (§8.4). Does that distinction matter for the test matrix's purpose? When (if ever) would the order of coordinates matter to the code that consumes the matrix?
The doubling algorithm runs in time proportional to the output size, $2^n$. Is that "slow"? Argue that no algorithm could enumerate the power set faster than $\Theta(2^n)$, regardless of cleverness. (Hint: it must produce $2^n$ things.)

Your Turn: Extensions

Option A (pairwise coverage). Instead of all $2^n$ configurations, generate just the configurations needed so that every pair of flags appears in all four on/off combinations at least once. Compare the size of your pairwise set to $2^n$ for $n = 5, 10$. (This is the practical sampling the Threshold callout motivated.)
Option B (use the Toolkit). Replace all_configurations with the Project Checkpoint's power_set and test_matrix's comprehension with the Toolkit's cartesian, importing from dmtoolkit.sets. Verify the counts are unchanged. This confirms your case-study code and the Toolkit compute the same sets (set equality, §8.2).
Option C (count without building). Write a function that returns the number of test cases for $n$ flags and $m$ environments without constructing the matrix (just compute $2^n \cdot m$). Explain why this is the right tool when you only need the size — and why building the full matrix for $n = 30$ would exhaust memory while the count is instant.

Key Takeaways

The power set is a constructible object. "All on/off combinations of $n$ flags" is $\mathcal{P}(F)$, built by the doubling algorithm — each flag doubles the collection, yielding $2^n$ configurations exactly as §8.2's theorem predicts.
frozenset is what makes a set of sets work. Configurations must be hashable to live inside the set of all configurations, and immutable so equal configurations compare equal (§8.6); a list or tuple would break order-insensitivity or the count.
Cartesian product builds the test matrix. Pairing configurations with environments is $\mathcal{P}(F) \times E$, with $\lvert \mathcal{P}(F) \times E \rvert = 2^{n} \cdot m$ by the product rule (§8.4).
Exponential growth is the real lesson. $2^n$ configurations make exhaustive combination-testing infeasible quickly — seeing the size before building is why the power set matters to anyone who ships software behind flags.