Verifying adjustment sets
In this article, we show how to implement an adjustment checker for CPDAGs using CIfly. We directly rely on the complete graphical validity criterion derived by Perković et al. (2018). Covariate adjustment is a fundamental technique for identifying causal effects. Applying covariate adjustment to CPDAGs (a class of causal graphs that can be learned from data), allows us to learn causal effects from observational data by combining causal discovery and causal effect estimation. As we will be purely focused on implementing an algorithm checking this criterion, we refer the reader to the paper for additional background on CPDAGs and adjustment.
Theorem [Perković et al. (2018)]
Consider disjoint node sets , and in a CPDAG . Then, is a valid adjustment set relative to in if, and only if,
- all proper possibly directed paths from to begin with a directed edge,
- and
- all proper definite-status non-causal paths from to are blocked by in .
The criterion is formulated for CPDAGs, which are graphs with both directed and undirected edges. They do not contain directed cycles and satisfy additional properties, which we, however, do not discuss in this article. The graphical terminology in the criterion is introduced below.
We call the first condition of the Theorem amenability, the second forbidden set and the third blocking. Our implementation proceeds by checking these conditions one-by-one.
Amenability condition
Let us first revisit the notion of a proper possibly directed path. A proper path between and is one which contains exactly one node from and one node from (the endpoints of the path). A path is possibly directed from to if every edge on it is either undirected or points towards the node in . For example, is, for and , a possibly directed path from to . The condition refers to those proper possibly directed paths that start with an undirected edge out of , for example would be such a path.
In our documentation, we already used the rule table below for finding all non-amenable nodes as a running example. It finds all nodes for which there exists a proper possibly directed path starting with an undirected edge from a node to . Effectively, these are the nodes that cannot be in set .
EDGES --> <--, ---
SETS X
COLORS init, yield
START ... [init] AT X
OUTPUT ... [yield]
... [init] | --- [yield] | next not in X
... [yield] | ---, --> [yield] | next not in X
To check amenability, one can hence use this ruletable for set and check whether the intersection of returned nodes with is empty.
Forbidden set condition
The forbidden set contains all nodes that are possible descendants of a node that lies on a proper possibly directed path from to . A possible descendant of is a node such that there exists a possibly directed path from to . We compute the forbidden set in multiple steps. First, we find all on a proper possibly directed path from to . Afterwards, we find the possible descendants of these . The first step can be done by finding all nodes on a proper possibly directed path from that does not contain a node in and, vice versa, all nodes from which there exists a proper possibly directed path to not containing a node in . By definition, the intersection of these two node sets are precisely the nodes which lie on a proper possibly directed path from to .
Hence, we need a table for finding nodes on proper possibly directed paths from with the option to stop before nodes of a passed set are reached. Clearly, this table can also be used to find possible descendants.
EDGES --> <--, ---
SETS X, W
START --> AT X
OUTPUT ...
... | --> | next not in W
... | --- | next not in W
We also need to find the nodes from which there exists a proper possibly directed path to . For this, we can simply start the reachability algorithm at and traverse these paths the opposite way.
EDGES --> <--, ---
SETS X, W
START <-- AT X
OUTPUT ...
... | <-- | next not in W
... | --- | next not in W
Using these tables, we can compute the forbidden set as follows.
anc = cf.reach(g, {"X": Y, "W": X}, "./possible_ancestors_cpdag.txt")
des = cf.reach(g, {"X": X, "W": []}, "./possible_descendants_cpdag.txt")
cn = set(anc).intersection(des)
forb = cf.reach(g, {"X": cn}, "./possible_descendants_cpdag.txt")
anc <- reach(cpdag, list("X" = Y, "W" = X), "./possible_ancestors_cpdag.txt")
des <- reach(cpdag, list("X" = X, "W" = c()), "./possible_descendants_cpdag.txt")
cn <- intersect(anc, des)
forb <- reach(cpdag, list("X" = cn), "./possible_descendants_cpdag.txt")
Blocking condition
The blocking condition concerns itself with proper definite-status non-causal paths. Again, let us briefly introduce this terminology. A path from to is called non-causal if it is not possibly directed from to . Further, a path is definite-status if every non-endpoint node on it is either a collider or a definite non-collider on the path. A collider is a node such that it is connected to its neighbors on the path as . A definite non-collider is a node where the edge to its neighbor or to its neighbor points towards , respectively , or where both edges are undirected, that is, or .
We need to verify that all proper definite-status non-causal paths are blocked given , that is, for each path there exists a collider on the path such or a definite-status non-colllider such that . We again verify this by finding all reachable nodes and checking that contains no such nodes. To do this, we use the following table.
EDGES --> <--, ---
SETS X, C, W
START <-- AT X
OUTPUT ...
--> | <-- | current in W
--- | --- | current not in W
... | --> | current not in W and (current not in X or next not in C)
<-- | ... | current not in W
This table tracks definite-status walks open given and it is easy to show that tracking walks is equivalent to tracking paths. To ensure that they are non-causal, any transition from to a node in (a set containing all nodes on a possible directed path from to ) is excluded in the table. Here, we use that the amenability condition ensures that only directed edges from to may exist (else the exclusion of transitions from to would need to be added to the second and fourth rule as well).
Full implementation
In the final implementation, we use the four tables introduced above to check the conditions one-by-one. We note that we could add early returns for when either the first or the second condition is violated. This is omitted here in favor of a clearer presentation.
We rely on a function in utils
that returns the path to the rule-table folder. In Python, this function returns a Path
object from the pathlib
library. In R, it returns a string holding this path, which is created using the here
library.
import ciflypy as cf
import ciflypy_examples.utils as utils
ruletables = utils.get_ruletable_path()
not_amenable = cf.Ruletable(ruletables / "not_amenable_cpdag.txt")
possible_anc = cf.Ruletable(ruletables / "possible_ancestors_cpdag.txt")
possible_des = cf.Ruletable(ruletables / "possible_descendants_cpdag.txt")
backdoor_conn = cf.Ruletable(ruletables / "backdoor_connected_cpdag.txt")
def is_cpdag_adjustment(g, X, Y, W):
nam = cf.reach(g, {"X": X}, not_amenable)
anc = cf.reach(g, {"X": Y, "W": X}, possible_anc)
des = cf.reach(g, {"X": X, "W": []}, possible_des)
cn = set(anc).intersection(des)
forb = cf.reach(g, {"X": cn}, possible_des)
bconn = cf.reach(g, {"X": X, "C": cn, "W": W}, backdoor_conn)
return (not set(nam).intersection(Y)
and not set(forb).intersection(W)
and not set(bconn).intersection(Y))
library("ciflyr")
library("here")
source(here("R", "utils.R"))
ruletables <- getRuletablePath()
notAmenable <- parseRuletable(file.path(ruletables, "not_amenable_cpdag.txt"))
possibleAncestors <- parseRuletable(file.path(ruletables, "possible_ancestors_cpdag.txt"))
possibleDescendants <- parseRuletable(file.path(ruletables, "possible_descendants_cpdag.txt"))
backdoorConnected <- parseRuletable(file.path(ruletables, "backdoor_connected_cpdag.txt"))
isCpdagAdjustment <- function(cpdag, X, Y, W) {
nam <- reach(cpdag, list("X" = X), notAmenable)
anc <- reach(cpdag, list("X" = Y, "W" = X), possibleAncestors)
des <- reach(cpdag, list("X" = X), possibleDescendants)
cn <- intersect(anc, des)
forb <- reach(cpdag, list("X" = cn), possibleDescendants)
bconn <- reach(cpdag, list("X" = X, "C" = cn, "W" = W), backdoorConnected)
return (length(intersect(nam, Y)) == 0 && length(intersect(forb, W)) == 0 && length(intersect(bconn, Y)) == 0)
}