Verifying adjustment sets

In this article, we show how to implement an adjustment checker for CPDAGs using CIfly. We directly rely on the complete graphical validity criterion derived by Perković et al. (2018). Covariate adjustment is a fundamental technique for identifying causal effects. Applying covariate adjustment to CPDAGs (a class of causal graphs that can be learned from data), allows us to learn causal effects from observational data by combining causal discovery and causal effect estimation. As we will be purely focused on implementing an algorithm checking this criterion, we refer the reader to the paper for additional background on CPDAGs and adjustment.

Theorem [Perković et al. (2018)]
Consider disjoint node sets XX, YY and WW in a CPDAG GG. Then, WW is a valid adjustment set relative to (X,Y)(X, Y) in GG if, and only if,

  1. all proper possibly directed paths from XX to YY begin with a directed edge,
  2. forbG(X,Y)W=\text{forb}_G(X, Y) \cap W = \emptyset and
  3. all proper definite-status non-causal paths from XX to YY are blocked by WW in GG.

The criterion is formulated for CPDAGs, which are graphs with both directed and undirected edges. They do not contain directed cycles and satisfy additional properties, which we, however, do not discuss in this article. The graphical terminology in the criterion is introduced below.

We call the first condition of the Theorem amenability, the second forbidden set and the third blocking. Our implementation proceeds by checking these conditions one-by-one.

Amenability condition

Let us first revisit the notion of a proper possibly directed path. A proper path between XX and YY is one which contains exactly one node from XX and one node from YY (the endpoints of the path). A path is possibly directed from XX to YY if every edge on it is either undirected or points towards the node in YY. For example, xuvyx \rightarrow u - v \rightarrow y is, for X={x}X = \{x\} and Y={y}Y = \{y\}, a possibly directed path from XX to YY. The condition refers to those proper possibly directed paths that start with an undirected edge out of XX, for example xuvyx - u - v \rightarrow y would be such a path.

In our documentation, we already used the rule table below for finding all non-amenable nodes as a running example. It finds all nodes vv for which there exists a proper possibly directed path starting with an undirected edge from a node xXx \in X to vv. Effectively, these are the nodes that cannot be in set YY.

not_amenable_cpdag.txt
EDGES --> <--, ---
SETS X
COLORS init, yield
START ... [init] AT X
OUTPUT ... [yield]

... [init]  | ---      [yield] | next not in X
... [yield] | ---, --> [yield] | next not in X

To check amenability, one can hence use this ruletable for set XX and check whether the intersection of returned nodes with YY is empty.

Forbidden set condition

The forbidden set forbG(X,Y)\text{forb}_G(X, Y) contains all nodes that are possible descendants of a node ww that lies on a proper possibly directed path from XX to YY. A possible descendant of ww is a node vv such that there exists a possibly directed path from ww to vv. We compute the forbidden set in multiple steps. First, we find all ww on a proper possibly directed path from XX to YY. Afterwards, we find the possible descendants of these ww. The first step can be done by finding all nodes on a proper possibly directed path from XX that does not contain a node in YY and, vice versa, all nodes from which there exists a proper possibly directed path to YY not containing a node in XX. By definition, the intersection of these two node sets are precisely the nodes which lie on a proper possibly directed path from XX to YY.

Hence, we need a table for finding nodes on proper possibly directed paths from XX with the option to stop before nodes of a passed set WW are reached. Clearly, this table can also be used to find possible descendants.

possible_descendants_cpdag.txt
EDGES --> <--, ---
SETS X, W
START --> AT X
OUTPUT ...

... | --> | next not in W
... | --- | next not in W

We also need to find the nodes from which there exists a proper possibly directed path to YY. For this, we can simply start the reachability algorithm at YY and traverse these paths the opposite way.

possible_ancestors_cpdag.txt
EDGES --> <--, ---
SETS X, W
START <-- AT X
OUTPUT ...

... | <-- | next not in W
... | --- | next not in W

Using these tables, we can compute the forbidden set as follows.

anc = cf.reach(g, {"X": Y, "W": X}, "./possible_ancestors_cpdag.txt")
des = cf.reach(g, {"X": X, "W": []}, "./possible_descendants_cpdag.txt")
cn = set(anc).intersection(des)
forb = cf.reach(g, {"X": cn}, "./possible_descendants_cpdag.txt")

Blocking condition

The blocking condition concerns itself with proper definite-status non-causal paths. Again, let us briefly introduce this terminology. A path from XX to YY is called non-causal if it is not possibly directed from XX to YY. Further, a path is definite-status if every non-endpoint node on it is either a collider or a definite non-collider on the path. A collider is a node ww such that it is connected to its neighbors on the path as uwvu \rightarrow w \leftarrow v. A definite non-collider is a node ww where the edge to its neighbor uu or to its neighbor vv points towards uu, respectively vv, or where both edges are undirected, that is, uwv,uwvu \leftarrow w - v,u - w \rightarrow v or uwvu - w - v.

We need to verify that all proper definite-status non-causal paths are blocked given WW, that is, for each path there exists a collider ww on the path such w∉Ww \not\in W or a definite-status non-colllider ww such that wWw \in W. We again verify this by finding all reachable nodes and checking that YY contains no such nodes. To do this, we use the following table.

backdoor_connected_cpdag.txt
EDGES --> <--, ---
SETS X, C, W
START <-- AT X
OUTPUT ...

--> | <-- | current in W
--- | --- | current not in W
... | --> | current not in W and (current not in X or next not in C)
<-- | ... | current not in W 

This table tracks definite-status walks open given WW and it is easy to show that tracking walks is equivalent to tracking paths. To ensure that they are non-causal, any transition from XX to a node in CC (a set containing all nodes on a possible directed path from XX to YY) is excluded in the table. Here, we use that the amenability condition ensures that only directed edges from XX to CC may exist (else the exclusion of transitions from XX to CC would need to be added to the second and fourth rule as well).

Full implementation

In the final implementation, we use the four tables introduced above to check the conditions one-by-one. We note that we could add early returns for when either the first or the second condition is violated. This is omitted here in favor of a clearer presentation.

We rely on a function in utils that returns the path to the rule-table folder. In Python, this function returns a Path object from the pathlib library. In R, it returns a string holding this path, which is created using the here library.

cpdag_adjustment_check.py
import ciflypy as cf
import ciflypy_examples.utils as utils

ruletables = utils.get_ruletable_path()
not_amenable = cf.Ruletable(ruletables / "not_amenable_cpdag.txt")
possible_anc = cf.Ruletable(ruletables / "possible_ancestors_cpdag.txt")
possible_des = cf.Ruletable(ruletables / "possible_descendants_cpdag.txt")
backdoor_conn = cf.Ruletable(ruletables / "backdoor_connected_cpdag.txt")

def is_cpdag_adjustment(g, X, Y, W):
    nam = cf.reach(g, {"X": X}, not_amenable)

    anc = cf.reach(g, {"X": Y, "W": X}, possible_anc)
    des = cf.reach(g, {"X": X, "W": []}, possible_des)
    cn = set(anc).intersection(des)
    forb = cf.reach(g, {"X": cn}, possible_des)

    bconn = cf.reach(g, {"X": X, "C": cn, "W": W}, backdoor_conn)

    return (not set(nam).intersection(Y) 
            and not set(forb).intersection(W)
            and not set(bconn).intersection(Y))

References