Chapter 17: Quiz
Test your understanding of graphical causal models. Answers follow each question.
Question 1
What is a directed acyclic graph (DAG), and why is the acyclicity constraint important for causal modeling?
Answer
A **directed acyclic graph (DAG)** is a graph consisting of nodes (representing variables) connected by directed edges (arrows representing direct causal effects), with the constraint that no directed path leads from any node back to itself (no cycles). The acyclicity constraint encodes the assumption that the causal system does not contain simultaneous feedback loops at the time scale of analysis. This is necessary for the structural causal model to have a well-defined solution: given the exogenous variables, the structural equations can be solved in topological order (parents before children). Systems with feedback (such as climate models or control systems) require either temporal unrolling (representing the same variable at different time steps) or different formalisms (simultaneous equations, differential equations).

Question 2
Name the three elementary junction types in a DAG. For each, state whether conditioning on the middle node blocks or opens the path.
Answer
1. **Fork** (common cause): $X \leftarrow Z \rightarrow Y$. Conditioning on $Z$ **blocks** the path.
2. **Chain** (mediation): $X \rightarrow Z \rightarrow Y$. Conditioning on $Z$ **blocks** the path.
3. **Collider** (common effect): $X \rightarrow Z \leftarrow Y$. Conditioning on $Z$ **opens** the path (the path is blocked by default when $Z$ is not conditioned on).

The critical asymmetry: forks and chains are active (open) by default and blocked by conditioning; colliders are inactive (blocked) by default and activated by conditioning.
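This asymmetry can be checked numerically. The sketch below (invented data, not the chapter's) simulates a fork and a collider and compares correlations before and after conditioning; a chain behaves like the fork. Conditioning on a continuous $Z$ is approximated by residualizing on it (fork) and by selecting on a range of it (collider).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Fork: X <- Z -> Y
Z = rng.normal(size=n)
X_f = Z + rng.normal(size=n)
Y_f = Z + rng.normal(size=n)
fork_open = corr(X_f, Y_f)             # dependent by default
fork_blocked = corr(X_f - Z, Y_f - Z)  # residualizing on Z blocks the path

# Collider: X -> Z <- Y
X_c = rng.normal(size=n)
Y_c = rng.normal(size=n)
Z_c = X_c + Y_c + rng.normal(size=n)
coll_blocked = corr(X_c, Y_c)          # independent by default
sel = Z_c > 1.0                        # conditioning on the collider...
coll_open = corr(X_c[sel], Y_c[sel])   # ...opens the path (explaining away)

print(fork_open, fork_blocked, coll_blocked, coll_open)
```

The fork correlation is clearly positive until $Z$ is accounted for; the collider correlation is near zero until we select on $Z$, at which point it turns negative.

Question 3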
What is d-separation? State the algorithm for determining whether $X$ and $Y$ are d-separated by $\mathbf{Z}$.
Answer
**d-separation** (directional separation) is the algorithm for determining whether two variables $X$ and $Y$ are conditionally independent given a set $\mathbf{Z}$ in a DAG. The algorithm is: $X$ and $Y$ are d-separated by $\mathbf{Z}$ if and only if **every path** between $X$ and $Y$ is **blocked** by $\mathbf{Z}$. A path is blocked if it contains at least one of:

- A **non-collider** (fork or chain node) that is in $\mathbf{Z}$ — conditioning on it blocks the path.
- A **collider** that is **not** in $\mathbf{Z}$ and has no descendant in $\mathbf{Z}$ — the collider blocks the path by default.

If any path has all non-colliders outside $\mathbf{Z}$ and all colliders either in $\mathbf{Z}$ or with a descendant in $\mathbf{Z}$, that path is active (d-connected), and $X$ and $Y$ are not d-separated.
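For pairwise queries, d-separation can also be decided without enumerating paths, via the equivalent ancestral moral graph criterion: restrict the DAG to the ancestors of $\{X, Y\} \cup \mathbf{Z}$, moralize (connect co-parents and drop directions), delete $\mathbf{Z}$, and test connectivity. A minimal sketch, with the graph represented as a plain node-to-parents dict:

```python
def d_separated(dag, x, y, z):
    """Check whether x and y are d-separated given the set z in a DAG.

    dag maps each node to the set of its parents. Uses the ancestral
    moral graph criterion, which is equivalent to path blocking.
    """
    z = set(z)
    # Keep only ancestors of {x, y} union z (including those nodes themselves).
    relevant, stack = set(), [x, y, *z]
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(dag.get(node, ()))
    # Moralize: undirected parent-child edges plus edges between co-parents.
    adj = {v: set() for v in relevant}
    for child in relevant:
        parents = sorted(dag.get(child, ()))
        for p in parents:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(parents):
            for q in parents[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Delete z, then test undirected reachability from x to y.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False        # reachable: d-connected
        if node in seen or node in z:
            continue
        seen.add(node)
        stack.extend(adj[node] - seen)
    return True                 # unreachable: d-separated

dag = {"A": set(), "B": {"A"}, "C": {"B", "D"}, "D": set()}  # A -> B -> C <- D
print(d_separated(dag, "A", "D", []))     # True: the collider C blocks the path
print(d_separated(dag, "A", "D", ["C"]))  # False: conditioning on C opens it
```

Co-parent edges in the moralization are what encode collider activation: conditioning on a collider (or its descendant) pulls it into the relevant set, linking its parents.

Question 4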
Consider the DAG: $A \to B \to C \leftarrow D$. Is $A$ d-separated from $D$ given $\emptyset$? Given $\{C\}$?
Answer
The only path from $A$ to $D$ is $A \to B \to C \leftarrow D$. The node $C$ is a **collider** on this path.

- **Given $\emptyset$:** The collider $C$ is not conditioned on and has no descendant in $\mathbf{Z}$, so the path is **blocked**. $A$ and $D$ are **d-separated** given $\emptyset$. (They are marginally independent.)
- **Given $\{C\}$:** The collider $C$ is now conditioned on, so the path is **opened**. $A$ and $D$ are **d-connected** given $\{C\}$. (They become conditionally dependent — this is collider bias.)

Question 5
What is the difference between a structural equation and a regression equation?
Answer
A **structural equation** $Y = f(\text{pa}(Y), U_Y)$ is a **causal** statement: it represents the mechanism by which $Y$ is generated from its direct causes plus exogenous noise. It tells us what happens under intervention: if we set a parent to a specific value, $Y$ changes according to the function $f$.

A **regression equation** $Y = \alpha + \beta X + \varepsilon$ is a **statistical** statement about the conditional expectation $\mathbb{E}[Y \mid X]$. It describes the best linear prediction of $Y$ given $X$ in the observed data. It does not tell us what happens under intervention, because the coefficient $\beta$ conflates causal effects with confounding associations.

The directions of structural equations are meaningful (they reflect causal direction); the direction of a regression equation is a modeling choice (we could equally regress $X$ on $Y$). In a structural equation, the noise $U_Y$ represents all unmeasured causes of $Y$; in a regression, $\varepsilon$ represents prediction error, which may include confounders.

Question 6
State the backdoor criterion. What two conditions must be satisfied?
Answer
A set of variables $\mathbf{Z}$ satisfies the **backdoor criterion** relative to $(X, Y)$ if:

1. **No node in $\mathbf{Z}$ is a descendant of $X$.** (This ensures we do not condition on mediators, colliders downstream of treatment, or other post-treatment variables.)
2. **$\mathbf{Z}$ blocks every backdoor path between $X$ and $Y$.** (A backdoor path is any path that starts with an arrow into $X$. These paths transmit non-causal associations.)

When both conditions hold, the causal effect is given by the backdoor adjustment formula:

$$P(Y \mid \text{do}(X = x)) = \sum_{\mathbf{z}} P(Y \mid X = x, \mathbf{Z} = \mathbf{z}) \, P(\mathbf{Z} = \mathbf{z})$$
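The adjustment formula can be applied directly to empirical frequencies. The sketch below uses an invented binary data-generating process (a single confounder $Z$; the true effect of $X$ on $Y$ is $0.3$ by construction) and compares the naive conditional contrast with the backdoor-adjusted one:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Assumed DGP: Z confounds X and Y; the true effect of X on Y is 0.3.
Z = rng.random(n) < 0.5
X = rng.random(n) < np.where(Z, 0.8, 0.2)
Y = rng.random(n) < (0.2 + 0.3 * X + 0.4 * Z)

def p_do(x_val):
    # Backdoor formula: sum_z P(Y=1 | X=x, Z=z) P(Z=z)
    return sum(Y[(X == x_val) & (Z == z)].mean() * (Z == z).mean()
               for z in (False, True))

naive = Y[X].mean() - Y[~X].mean()
adjusted = p_do(True) - p_do(False)
print(naive, adjusted)   # naive is inflated by confounding; adjusted is ~0.3
```

With these assumed parameters the naive contrast comes out around $0.54$ while the adjusted contrast recovers the true $0.3$.

Question 7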
In the MediCore DAG, why is $\{$Severity, Age$\}$ a valid adjustment set but $\{$Severity, Age, Biomarker$\}$ is not?
Answer
$\{$Severity, Age$\}$ is valid because: (1) neither Severity nor Age is a descendant of Drug X (they are upstream variables), and (2) conditioning on them blocks all backdoor paths from Drug X to Hospitalization (the fork through Severity and the fork through Age).

$\{$Severity, Age, Biomarker$\}$ is **not valid** because Biomarker is a **descendant of Drug X** (Drug X $\to$ Biomarker). This violates condition 1 of the backdoor criterion. Furthermore, Biomarker is a **mediator** on the causal path Drug X $\to$ Biomarker $\to$ Hospitalization. Conditioning on it blocks the causal pathway, attenuating the estimated total effect toward zero. In the chapter's empirical demonstration, the Drug X coefficient drops from $-1.01$ (correct) to $-0.02$ (nearly zero) when Biomarker is included.
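The attenuation is easy to reproduce on synthetic data. The sketch below is a linear version of the graph with assumed coefficients (not the chapter's simulation): the total effect of the drug is $-1.0$ and is fully mediated by the biomarker.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Assumed coefficients; only the qualitative pattern mirrors the chapter.
severity = rng.normal(size=n)
age = rng.normal(size=n)
drug = 0.8 * severity + 0.5 * age + rng.normal(size=n)   # confounded treatment
biomarker = drug + rng.normal(size=n)                    # mediator: Drug X -> Biomarker
hosp = -1.0 * biomarker + 1.2 * severity + 0.7 * age + rng.normal(size=n)

def drug_coef(*controls):
    design = np.column_stack([drug, *controls, np.ones(n)])
    beta, *_ = np.linalg.lstsq(design, hosp, rcond=None)
    return beta[0]

b_total = drug_coef(severity, age)                      # valid adjustment set
b_with_mediator = drug_coef(severity, age, biomarker)   # mediator included
print(b_total, b_with_mediator)
```

Adding the mediator shifts the drug coefficient from roughly $-1$ to roughly $0$: the regression now estimates only the (zero) direct effect.

Question 8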
Explain the concept of "collider bias" using a concrete example. Why is it dangerous to "control for everything"?
Answer
**Collider bias** occurs when you condition on a variable that is caused by two other variables (a collider), creating a spurious association between the collider's causes.

**Example:** Talent and Luck are independent. Both cause Fame (talent + luck = celebrity status). Among famous people, talent and luck are negatively correlated: a famous person who lacks talent must have been very lucky. Conditioning on Fame (the collider) creates a spurious negative association between Talent and Luck.

This makes "control for everything" dangerous because some variables are colliders. If you include a collider in your adjustment set, you **open** a spurious path that was previously blocked, **introducing** bias rather than removing it. The correct set of controls depends on the causal structure (the DAG), not on the statistical properties of the variables.
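The explaining-away effect can be reproduced in a few lines (illustrative numbers, with "famous" defined by an arbitrary fame threshold):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

talent = rng.normal(size=n)
luck = rng.normal(size=n)
fame = talent + luck + rng.normal(scale=0.5, size=n)   # collider of talent and luck

overall = np.corrcoef(talent, luck)[0, 1]               # ~0 in the full population
famous = fame > 2.0                                     # condition on the collider
among_famous = np.corrcoef(talent[famous], luck[famous])[0, 1]
print(overall, among_famous)   # among the famous, clearly negative
```

Question 9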
What is the front-door criterion? Under what conditions does it apply?
Answer
The **front-door criterion** identifies a causal effect through a mediating variable when the backdoor criterion fails due to unmeasured confounders. A set $\mathbf{M}$ satisfies the front-door criterion relative to $(X, Y)$ if:

1. $\mathbf{M}$ **intercepts all directed paths** from $X$ to $Y$.
2. There is **no unblocked backdoor path** from $X$ to $\mathbf{M}$.
3. All **backdoor paths from $\mathbf{M}$ to $Y$ are blocked by $X$**.

The adjustment formula is:

$$P(Y \mid \text{do}(X=x)) = \sum_m P(M=m \mid X=x) \sum_{x'} P(Y \mid M=m, X=x') \, P(X=x')$$

The front-door criterion applies when: there is an unmeasured confounder between $X$ and $Y$; the causal effect is fully mediated by $M$; and $X$ itself blocks the backdoor paths from $M$ to $Y$. These conditions are restrictive, making the front-door criterion less commonly applicable than the backdoor criterion in practice.
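The formula can be evaluated on empirical frequencies even though the confounder is never observed by the estimator. The sketch below invents a binary DGP in which $U$ is hidden, the effect of $X$ on $Y$ is fully mediated by $M$, and the true ATE works out to $0.4$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

# Assumed DGP: X <- U -> Y (U unobserved), X -> M -> Y.
U = rng.random(n) < 0.5
X = rng.random(n) < np.where(U, 0.8, 0.2)
M = rng.random(n) < np.where(X, 0.9, 0.1)
Y = rng.random(n) < (0.1 + 0.5 * M + 0.3 * U)

def p_do(x_val):
    # Front-door formula: sum_m P(m|x) sum_x' P(Y=1 | m, x') P(x')
    total = 0.0
    for m in (False, True):
        p_m_given_x = (M[X == x_val] == m).mean()
        inner = sum(Y[(M == m) & (X == xp)].mean() * (X == xp).mean()
                    for xp in (False, True))
        total += p_m_given_x * inner
    return total

ate = p_do(True) - p_do(False)
naive = Y[X].mean() - Y[~X].mean()
print(ate, naive)   # front-door recovers ~0.4; naive is biased upward by U
```

Note that $U$ appears only in the data generation, never in the estimator: that is the point of the front-door strategy.

Question 10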
What is the do-operator? Explain the difference between $P(Y \mid X = x)$ and $P(Y \mid \text{do}(X = x))$ using a concrete example.
Answer
The **do-operator** formalizes the concept of intervention. $P(Y \mid \text{do}(X = x))$ is the distribution of $Y$ when $X$ is **set to** $x$ by an external intervention, regardless of $X$'s natural causes. Graphically, it corresponds to "graph surgery": deleting all arrows into $X$ and fixing $X = x$.

**Example:** Consider Drug X and Hospitalization, confounded by Disease Severity.

- $P(\text{Hosp} \mid \text{Drug} = 1)$: the hospitalization rate among patients who **were prescribed** Drug X. This includes patients selected by physicians, who tend to be sicker. The observed association mixes the causal drug effect with the confounding severity effect.
- $P(\text{Hosp} \mid \text{do}(\text{Drug} = 1))$: the hospitalization rate if we **forced everyone** to take Drug X, severing the link between severity and drug assignment. This is the causal effect — what would happen in an ideal randomized trial.

In the chapter's simulation, $P(\text{Hosp} \mid \text{Drug} = 1)$ suggests Drug X *increases* hospitalization (because sicker patients receive it), while $P(\text{Hosp} \mid \text{do}(\text{Drug} = 1))$ correctly shows that Drug X *reduces* hospitalization.

Question 11
Why is $P(Y \mid \text{do}(X = x)) \neq P(Y \mid X = x)$ in general? Under what condition are they equal?
Answer
They differ because conditioning on $X = x$ (observing) provides information about $X$'s causes (since $X = x$ is more likely when its causes take certain values), while intervening on $X = x$ (doing) severs $X$ from its causes. When $X$ is confounded with $Y$ through a common cause $Z$, observing $X = x$ tells us something about $Z$ (which also affects $Y$), but intervening on $X = x$ tells us nothing about $Z$.

**They are equal when there is no confounding** — specifically, when $X$ has no common causes with $Y$, meaning all paths between $X$ and $Y$ are directed from $X$ to $Y$ (no backdoor paths). In this case, the observational conditional distribution already reflects the causal effect. Formally: if $\{Y(0), Y(1)\} \perp\!\!\!\perp X$ (unconditional ignorability), then $P(Y \mid X) = P(Y \mid \text{do}(X))$. This is the case in a randomized experiment.
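A quick simulation of both regimes under an assumed DGP (true effect $0.3$) makes the condition concrete: with a confounder feeding into $X$, the observed contrast overstates the effect; when $X$ is randomized, observing and doing coincide.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

def observed_contrast(randomized):
    """E[Y | X=1] - E[Y | X=0] under an assumed DGP with true effect 0.3."""
    Z = rng.random(n) < 0.5
    if randomized:
        X = rng.random(n) < 0.5                     # no backdoor path into X
    else:
        X = rng.random(n) < np.where(Z, 0.8, 0.2)   # Z confounds X and Y
    Y = rng.random(n) < (0.2 + 0.3 * X + 0.4 * Z)
    return Y[X].mean() - Y[~X].mean()

obs = observed_contrast(randomized=False)
rct = observed_contrast(randomized=True)
print(obs, rct)   # confounded contrast is inflated; randomized one is ~0.3
```

Question 12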
What are the three rules of do-calculus? Why are they important?
Answer
The three rules of do-calculus (Pearl, 1995) allow algebraic manipulation of expressions involving the do-operator:

1. **Rule 1 (Insertion/deletion of observations):** Irrelevant observations can be added or removed based on d-separation in the manipulated graph.
2. **Rule 2 (Action/observation exchange):** Under certain d-separation conditions, intervening on a variable can be replaced by observing it (or vice versa).
3. **Rule 3 (Insertion/deletion of actions):** Irrelevant interventions can be added or removed based on d-separation conditions.

**Importance:** The three rules are **complete** (Huang and Valtorta, 2006; Shpitser and Pearl, 2006). This means: if a causal effect is identifiable from the graph and observational data, these three rules are sufficient to derive the identification formula. If the rules cannot derive a formula, the effect is provably not identifiable without additional assumptions. This completeness result is one of the most important results in modern causal inference theory.

Question 13
Classify each of the following as a "good control" or "bad control" for estimating the effect of a recommendation on engagement:
(a) User age (affects both what is recommended and engagement). (b) Time spent browsing after the recommendation (downstream of recommendation). (c) Content quality (affects engagement but not the recommendation algorithm's decision).
Answer
**(a) User age: Good control.** Age is a confounder (it causes both the recommendation, through its influence on the algorithm's predictions, and engagement). Controlling for it blocks a backdoor path.

**(b) Time spent browsing after recommendation: Bad control.** This is a post-treatment variable — it is a consequence (descendant) of the recommendation. Controlling for it conditions on a variable downstream of treatment, which can introduce collider bias or block the causal pathway. It violates condition 1 of the backdoor criterion.

**(c) Content quality: Good control.** It is a cause of the outcome only (not a cause of treatment, and not a descendant of treatment). Controlling for it does not affect bias but improves precision by explaining variation in engagement.

Question 14
What is the Causal Markov Condition? How does it connect the DAG to the joint probability distribution?
Answer
The **Causal Markov Condition** states that, given a causal DAG $\mathcal{G}$, every variable $V_i$ is conditionally independent of its non-descendants given its parents:

$$V_i \perp\!\!\!\perp \text{NonDescendants}(V_i) \mid \text{Parents}(V_i)$$

This implies that the joint distribution factorizes as:

$$P(V_1, \ldots, V_p) = \prod_{i=1}^{p} P(V_i \mid \text{pa}(V_i))$$

The Markov condition connects the graph (a structural/causal object) to the probability distribution (a statistical object). It says that the only variables that directly influence $V_i$ are its parents in the graph; once you know the parents, all other non-descendant variables are irrelevant for predicting $V_i$. This is the probabilistic consequence of the structural equation $V_i = f_i(\text{pa}(V_i), U_i)$ when the exogenous variables are mutually independent.

Question 15
What is the faithfulness assumption? Give an example of when it might be violated.
Answer
The **faithfulness assumption** states that the only conditional independencies in the data are those implied by d-separation in the DAG. That is, if two variables are conditionally independent, it is because they are d-separated — not because of an "accidental" cancellation of effects along different paths.

**Example of a violation:** Consider the DAG with edges $X \to Y$, $X \to Z$, and $Z \to Y$. There are two paths from $X$ to $Y$: the direct edge $X \to Y$, with coefficient $+2$, and the chain $X \to Z \to Y$, with coefficient $+1$ from $X$ to $Z$ and $-2$ from $Z$ to $Y$. The total effect of $X$ on $Y$ is $2 + (1)(-2) = 0$: $X$ and $Y$ appear marginally independent even though $X$ is a direct cause of $Y$. This is a faithfulness violation: the graph says $X$ and $Y$ are d-connected, but the parameters are "fine-tuned" so the effects cancel exactly.
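The cancellation is easy to reproduce with the coefficients $+2$, $+1$, $-2$ from the example; conditioning on $Z$ then reveals the dependence that d-separation predicts.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

X = rng.normal(size=n)
Z = X + rng.normal(size=n)                  # X -> Z with coefficient +1
Y = 2.0 * X - 2.0 * Z + rng.normal(size=n)  # X -> Y (+2) and Z -> Y (-2)

marginal = np.corrcoef(X, Y)[0, 1]          # ~0 despite X directly causing Y

# Conditioning on Z (via residualization) exposes the dependence:
rx = X - (np.cov(X, Z)[0, 1] / np.var(Z)) * Z
ry = Y - (np.cov(Y, Z)[0, 1] / np.var(Z)) * Z
partial = np.corrcoef(rx, ry)[0, 1]
print(marginal, partial)   # marginal ~0, partial clearly nonzero
```

A constraint-based discovery algorithm fed this data would wrongly drop the edge $X \to Y$, which is exactly why faithfulness is an assumption rather than a theorem.

Question 16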
In DoWhy, what are "refutation tests"? Name two types and explain what they check.
Answer
**Refutation tests** in DoWhy are automated checks of the robustness and internal consistency of a causal estimate. They do not prove that the causal model is correct, but they can detect some forms of invalidity.

1. **Placebo treatment refuter:** Replaces the actual treatment with random noise (a permutation of the treatment variable). If the causal estimate with the placebo treatment is near zero, this confirms that the original estimate is driven by the specific treatment variable, not by random chance or data artifacts. If the placebo estimate is also large, the original estimate may be spurious.
2. **Random common cause refuter:** Adds a randomly generated variable as an additional confounder. If the estimate remains stable after adding this random variable, it suggests the estimate is not sensitive to the inclusion of irrelevant confounders. If the estimate changes substantially, it may indicate the original model is fragile.

Other refutation tests include the data subset refuter (re-estimates on subsets of the data) and the bootstrap refuter (checks stability under resampling).
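The logic of the placebo refuter can be sketched outside DoWhy in a few lines (invented data and coefficients, not DoWhy's implementation): permute the treatment, re-run the same estimator, and check that the estimate collapses toward zero.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# Assumed DGP: Z confounds X and Y; the true effect of X on Y is +0.5.
Z = rng.normal(size=n)
X = Z + rng.normal(size=n)
Y = 0.5 * X + Z + rng.normal(size=n)

def backdoor_ols(treatment):
    design = np.column_stack([treatment, Z, np.ones(n)])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return beta[0]

original = backdoor_ols(X)                   # ~0.5: the real estimate
placebo = backdoor_ols(rng.permutation(X))   # permuted treatment: should be ~0
print(original, placebo)
```

A placebo estimate near zero is what "passing" the refutation looks like; a large placebo estimate would suggest the pipeline is picking up an artifact rather than the treatment.

Question 17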
Explain why the DAG framework breaks down for climate systems with feedback loops. What is the standard workaround?
Answer
Climate systems contain **feedback loops** (e.g., temperature $\to$ ice melt $\to$ albedo change $\to$ temperature), which violate the acyclicity constraint of DAGs. In a DAG, there can be no directed path from a variable back to itself. Feedback loops create exactly such cycles.

The standard workaround is **temporal unrolling**: represent the same variable at different time points (e.g., $\text{Temp}(t)$, $\text{Temp}(t+1)$, $\text{Temp}(t+2)$) so that the feedback operates across time steps rather than within a single time step. Each time slice is a valid DAG. This requires specifying the time scale at which causal effects operate. If the feedback is faster than the observation frequency, the temporal DAG may not capture the relevant dynamics. For truly simultaneous feedback, alternative formalisms such as simultaneous equation models (econometrics) or differential equation models (physics) are required.

Question 18
What is the connection between the backdoor criterion (graphical framework) and conditional ignorability (potential outcomes framework)?
Answer
The backdoor criterion in Pearl's graphical framework **implies** conditional ignorability in Rubin's potential outcomes framework. Specifically, if a set $\mathbf{Z}$ satisfies the backdoor criterion for $(X, Y)$ in a causal DAG $\mathcal{G}$, and the causal Markov condition and faithfulness hold, then:

$$\{Y(0), Y(1)\} \perp\!\!\!\perp X \mid \mathbf{Z}$$

This is the conditional ignorability assumption from Chapter 16. The graphical framework provides a systematic, visual way to determine *which* sets $\mathbf{Z}$ satisfy conditional ignorability — something the potential outcomes framework leaves to the analyst's informal judgment. The backdoor criterion automates the search for valid adjustment sets. Conversely, if conditional ignorability holds for a set $\mathbf{Z}$, this is consistent with a DAG where $\mathbf{Z}$ satisfies the backdoor criterion (though the DAG may not be unique).

Question 19
For the StreamRec progressive project, what is the primary confounder in the causal DAG, and why does it make the causal effect of recommendations hard to identify?
Answer
The primary confounder is **User Preference** — the latent variable representing a user's underlying interest in specific content. User Preference is a common cause (fork) of both the Recommendation (the algorithm recommends items it predicts the user will like, which is driven by preferences) and Engagement (users engage more with content that matches their preferences).

This makes the causal effect hard to identify because: (1) User Preference is **largely unobserved** — we observe proxies (user history, demographics, past behavior) but not the latent preference itself. (2) The algorithm is specifically designed to exploit preference signals, creating strong confounding. (3) Because the confounder is unobserved, the backdoor criterion cannot be straightforwardly satisfied with available data. This motivates the alternative identification strategies (instrumental variables, difference-in-differences) covered in Chapter 18 and the machine learning approaches (double ML) covered in Chapter 19.

Question 20
Explain the concept of "graph surgery" in the context of the do-operator. How does it formalize the idea of an intervention?