Adjustment checker

In this article, we show how to implement an adjustment checker for CPDAGs using CIfly. We directly rely on the complete graphical validity criterion derived by Perković et al. (2018). Covariate adjustment is a fundamental technique for identifying causal effects. Applying covariate adjustment to CPDAGs (a class of causal graphs that can be learned from data), allows us to learn causal effects from observational data by combining causal discovery and causal effect estimation. As we will be purely focused on implementing an algorithm checking this criterion, we refer the reader to the paper for additional background on CPDAGs and adjustment.

Theorem [Perković et al. (2018)]
Consider disjoint node sets $X$ , $Y$ and $W$ in a CPDAG $G$ . Then, $W$ is a valid adjustment set relative to $(X, Y)$ in $G$ if, and only if,

all proper possibly directed paths from $X$ to $Y$ begin with a directed edge,
$\text{forb}_G(X, Y) \cap W = \emptyset$ and
all proper definite-status non-causal paths from $X$ to $Y$ are blocked by $W$ in $G$ .

The criterion is formulated for CPDAGs, which are graphs with both directed and undirected edges. They do not contain directed cycles and satisfy additional properties, which we, however, do not discuss in this article. The graphical terminology in the criterion is introduced below.

We call the first condition of the Theorem amenability, the second forbidden set and the third blocking. Our implementation proceeds by checking these conditions one-by-one.

Amenability condition

Let us first revisit the notion of a proper possibly directed path. A proper path between $X$ and $Y$ is one which contains exactly one node from $X$ and one node from $Y$ (the endpoints of the path). A path is possibly directed from $X$ to $Y$ if every edge on it is either undirected or points towards the node in $Y$ . For example, $x \rightarrow u - v \rightarrow y$ is, for $X = \{x\}$ and $Y = \{y\}$ , a possibly directed path from $X$ to $Y$ . The condition refers to those proper possibly directed paths that start with an undirected edge out of $X$ , for example $x - u - v \rightarrow y$ would be such a path.

In our documentation, we already used the rule table below for finding all non-amenable nodes as a running example. It finds all nodes $v$ for which there exists a proper possibly directed path starting with an undirected edge from a node $x \in X$ to $v$ . Effectively, these are the nodes that cannot be in set $Y$ .

not_amenable_cpdag.txt

EDGES --> <--, ---
SETS X
COLORS init, yield
START ... [init] AT X
OUTPUT ... [yield]

... [init]  | ---      [yield] | next not in X
... [yield] | ---, --> [yield] | next not in X

To check amenability, one can hence use this ruletable for set $X$ and check whether the intersection of returned nodes with $Y$ is empty.

Forbidden set condition

The forbidden set $\text{forb}_G(X, Y)$ contains all nodes that are possible descendants of a node $w$ that lies on a proper possibly directed path from $X$ to $Y$ . A possible descendant of $w$ is a node $v$ such that there exists a possibly directed path from $w$ to $v$ . We compute the forbidden set in multiple steps. First, we find all $w$ on a proper possibly directed path from $X$ to $Y$ . Afterwards, we find the possible descendants of these $w$ . The first step can be done by finding all nodes on a proper possibly directed path from $X$ that does not contain a node in $Y$ and, vice versa, all nodes from which there exists a proper possibly directed path to $Y$ not containing a node in $X$ . By definition, the intersection of these two node sets are precisely the nodes which lie on a proper possibly directed path from $X$ to $Y$ .

Hence, we need a table for finding nodes on proper possibly directed paths from $X$ with the option to stop before nodes of a passed set $W$ are reached. Clearly, this table can also be used to find possible descendants.

possible_descendants_cpdag.txt

EDGES --> <--, ---
SETS X, W
START --> AT X
OUTPUT ...

... | --> | next not in W
... | --- | next not in W

We also need to find the nodes from which there exists a proper possibly directed path to $Y$ . For this, we can simply start the reachability algorithm at $Y$ and traverse these paths the opposite way.

possible_ancestors_cpdag.txt

EDGES --> <--, ---
SETS X, W
START <-- AT X
OUTPUT ...

... | <-- | next not in W
... | --- | next not in W

Using these tables, we can compute the forbidden set as follows.

anc = cf.reach(g, {"X": Y, "W": X}, "./possible_ancestors_cpdag.txt")
des = cf.reach(g, {"X": X, "W": []}, "./possible_descendants_cpdag.txt")
cn = set(anc).intersection(des)
forb = cf.reach(g, {"X": cn}, "./possible_descendants_cpdag.txt")

anc <- reach(cpdag, list("X" = Y, "W" = X), "./possible_ancestors_cpdag.txt")
des <- reach(cpdag, list("X" = X, "W" = c()), "./possible_descendants_cpdag.txt")
cn <- intersect(anc, des)
forb <- reach(cpdag, list("X" = cn), "./possible_descendants_cpdag.txt")

Blocking condition

The blocking condition concerns itself with proper definite-status non-causal paths. Again, let us briefly introduce this terminology. A path from $X$ to $Y$ is called non-causal if it is not possibly directed from $X$ to $Y$ . Further, a path is definite-status if every non-endpoint node on it is either a collider or a definite non-collider on the path. A collider is a node $w$ such that it is connected to its neighbors on the path as $u \rightarrow w \leftarrow v$ . A definite non-collider is a node $w$ where the edge to its neighbor $u$ or to its neighbor $v$ points towards $u$ , respectively $v$ , or where both edges are undirected, that is, $u \leftarrow w - v,u - w \rightarrow v$ or $u - w - v$ .

We need to verify that all proper definite-status non-causal paths are blocked given $W$ , that is, for each path there exists a collider $w$ on the path such $w \not\in W$ or a definite-status non-colllider $w$ such that $w \in W$ . We again verify this by finding all reachable nodes and checking that $Y$ contains no such nodes. To do this, we use the following table.

backdoor_connected_cpdag.txt

EDGES --> <--, ---
SETS X, C, W
START <-- AT X
OUTPUT ...

--> | <-- | current in W
--- | --- | current not in W
... | --> | current not in W and (current not in X or next not in C)
<-- | ... | current not in W

This table tracks definite-status walks open given $W$ and it is easy to show that tracking walks is equivalent to tracking paths. To ensure that they are non-causal, any transition from $X$ to a node in $C$ (a set containing all nodes on a possible directed path from $X$ to $Y$ ) is excluded in the table. Here, we use that the amenability condition ensures that only directed edges from $X$ to $C$ may exist (else the exclusion of transitions from $X$ to $C$ would need to be added to the second and fourth rule as well).

Full implementation

In the final implementation, we use the four tables introduced above to check the conditions one-by-one. We note that we could add early returns for when either the first or the second condition is violated. This is omitted here in favor of a clearer presentation.

We rely on a function in utils that returns the path to the rule-table folder. In Python, this function returns a Path object from the pathlib library. In R, it returns a string holding this path, which is created using the here library.

cpdag_adjustment_check.py

import ciflypy as cf
import ciflypy_examples.utils as utils

ruletables = utils.get_ruletable_path()
not_amenable = cf.Ruletable(ruletables / "not_amenable_cpdag.txt")
possible_anc = cf.Ruletable(ruletables / "possible_ancestors_cpdag.txt")
possible_des = cf.Ruletable(ruletables / "possible_descendants_cpdag.txt")
backdoor_conn = cf.Ruletable(ruletables / "backdoor_connected_cpdag.txt")

def is_cpdag_adjustment(g, X, Y, W):
    nam = cf.reach(g, {"X": X}, not_amenable)

    anc = cf.reach(g, {"X": Y, "W": X}, possible_anc)
    des = cf.reach(g, {"X": X, "W": []}, possible_des)
    cn = set(anc).intersection(des)
    forb = cf.reach(g, {"X": cn}, possible_des)

    bconn = cf.reach(g, {"X": X, "C": cn, "W": W}, backdoor_conn)

    return (not set(nam).intersection(Y) 
            and not set(forb).intersection(W)
            and not set(bconn).intersection(Y))

cpdagAdjustmentCheck.R

library("ciflyr") 
library("here")

source(here("R", "utils.R"))

ruletables <- getRuletablePath()
notAmenable <- parseRuletable(file.path(ruletables, "not_amenable_cpdag.txt"))
possibleAncestors <- parseRuletable(file.path(ruletables, "possible_ancestors_cpdag.txt"))
possibleDescendants <- parseRuletable(file.path(ruletables, "possible_descendants_cpdag.txt"))
backdoorConnected <- parseRuletable(file.path(ruletables, "backdoor_connected_cpdag.txt"))

isCpdagAdjustment <- function(cpdag, X, Y, W) {
	nam <- reach(cpdag, list("X" = X), notAmenable)

	anc <- reach(cpdag, list("X" = Y, "W" = X), possibleAncestors)
	des <- reach(cpdag, list("X" = X), possibleDescendants)
	cn <- intersect(anc, des)
	forb <- reach(cpdag, list("X" = cn), possibleDescendants)

	bconn <- reach(cpdag, list("X" = X, "C" = cn, "W" = W), backdoorConnected)

	return  (length(intersect(nam, Y)) == 0 && length(intersect(forb, W)) == 0 && length(intersect(bconn, Y)) == 0)
}

References

Emilija Perković, Johannes Textor, Markus Kalisch, Marloes H. Maathuis.

Complete Graphical Characterization and Construction of Adjustment Sets in Markov Equivalence Classes of Ancestral Graphs. JMLR, 2018.

Verifying adjustment sets

Amenability condition

Forbidden set condition

Blocking condition

Full implementation

References