Finding minimal d-separators
In this article, we implement the algorithm developed by van der Zander, Liśkiewicz (2020) for finding a minimal d-separator using CIfly. The concept of d-separation lies at the center of graphical causality as a bridge between graphs and conditional independencies. We have already seen an simple CIfly algorithm for testing d-separation on the home page. However, now we consider the task of finding d-separators and, more precisely, minimal ones. A d-separator for sets of nodes and is minimal if no subset of d-separates and .
For our implementation, we follow the algorithmic techniques presented in Section 5 in (van der Zander, Liśkiewicz 2020) which apply to quite general classes of causal graph including CPDAGs and MAGs, both popular in the context of causal structure learning. However, here we only focus on acyclic directed mixed graphs (ADMGs). At the core of the algorithm lies the notion of a closure of a set of nodes in a causal graph. It is used to find a nearest d-separator that, in turn, can be used to find a minimal d-separator. We also use nearest d-separators later when implementing a sound-and-complete algorithm for finding conditional instrumental sets. Let us now discuss these concepts in more detail.
Preliminaries
Acyclic directed mixed graphs, or ADMGs for short, are graphs that include directed as well as bidirected edges and do not contain a directed cycle. A node is an ancestor of a node if there exists a directed path from to . The set of all ancestors of in a graph is denoted by . This can be generalized to sets of nodes , for which we define as the union of the ancestors of the individual nodes, that is . Moreover, a collider on a path is a node where the incident edges point towards it (for example ) and a non-collider is a node for which this does not hold.
Closure
Based on these notions, the closure of a set of nodes can be defined as follows.
Definition [van der Zander, Liśkiewicz (2020)]
Let be an ADMG, and set of nodes, and . Then, the closure of with respect to , written , is defined as the set of all nodes for which there exists a path from to in that only contains nodes in and no non-collider in .
The closure can be directly encoded into a CIfly rule table that tracks paths over nodes in and ensures that non-colliders, here in the bottom rule, are not in .
EDGES --> <--, <->
SETS X, Z, A
START <-- AT X
OUTPUT ...
-->, <-> | <--, <-> | next in A
... | ... | next in A and current not in Z
Nearest d-separators
As van der Zander, Liśkiewicz (2020) show, nearest separators can be used to compute minimal ones. We will follow this strategy, in particular because we will re-use the concept of nearest separators in a later article. Roughly speaking, a nearest separator relative to and d-separates these sets using nodes closer to than whenever possible. For a more precise definition, we refer to van der Zander, Liśkiewicz (2020).
The strategy for computing the nearest separator works as follows:
- determine a potential d-separator for and , namely ,
- compute , the closure of with respect to , and
- if contains nodes in , then return as there exists no d-separator, else return .
The idea behind this strategy is that the closure contains the nodes closest to that are needed to d-separate and . This is essentially the case because the search stops at non-colliders in . Due to the definition of d-separation non-colliders not in are not blocked and, because the search runs over ancestors of and , it also continues for colliders (no matter whether in or not). For the formal details, we refer to van der Zander, Liśkiewicz (2020).
This algorithm can be implemented as follows using CIfly and the rule table for computing the closure from above. Here, as in the original work, we have added and as arguments which ensure that only separators which satisfy are returned. This gives the user the option to further constrain the desired d-separator and is used below.
import ciflypy as cf
import ciflypy_examples.utils as utils
ruletables = utils.get_ruletable_path()
ancestors_table = cf.Ruletable(ruletables / "ancestors_admg.txt")
closure_table = cf.Ruletable(ruletables / "closure_admg.txt")
def find_nearest_separator(g, X, Y, I, R):
A = cf.reach(g, {"X": X + Y + I}, ancestors_table)
Z0 = set(R).intersection(set(A) - set(X + Y))
Xstar = cf.reach(g, {"X": X, "Z": Z0, "A": A}, closure_table)
if set(Xstar).intersection(Y):
return None
return list(Z0.intersection(Xstar).union(I))
library("ciflyr")
library("here")
source(here("R", "utils.R"))
ruletables <- getRuletablePath()
ancestorsTable <- parseRuletable(file.path(ruletables, "ancestors_admg.txt"))
closureTable = parseRuletable(file.path(ruletables, "closure_admg.txt"))
findNearestSeparator <- function(g, X, Y, I, R) {
A <- reach(g, list("X" = c(X, Y, I)), ancestorsTable)
Z0 <- intersect(R, setdiff(A, c(X, Y)))
Xstar <- reach(g, list("X" = X, "Z" = Z0, "A" = A), closureTable)
if (length(intersect(Xstar, Y)) > 0) {
return (NULL)
}
return (union(intersect(Z0, Xstar), I))
}
Minimal d-separators
Using the algorithm for finding nearest separators as a primitive, it is possible to compute a minimal d-separator with the following strategy:
- Compute as a nearest separator between and (or return if none exists),
- compute as a nearest d-separator between and with (or return if none exists) and
- return .
Here is a nearest separator between and that is used to constrain the nearest separator for in the second step. This strategy can be directly translated to Python and R code. Again, we allow for arguments and constraining the minimal d-separator to . For more details on intuition and the correctness of this procedure, we again refer to van der Zander, Liśkiewicz (2020).
from ciflypy_examples.nearest_dsep import find_nearest_separator
def find_min_separator(g, X, Y, I, R):
Zx = find_nearest_separator(g, X, Y, I, R)
if not Zx:
return None
Zy = find_nearest_separator(g, Y, X, I, Zx)
if not Zy:
return None
return set(Zx).intersection(Zy).union(I)
library("here")
source(here("R", "nearestDsep.R"))
findMinSeparator <- function(g, X, Y, I, R) {
Zx <- findNearestSeparator(g, X, Y, I, R)
if (is.null(Zx)) {
return (NULL)
}
Zy <- findNearestSeparator(g, Y, X, I, Zx)
if (is.null(Zy)) {
return (NULL)
}
return(union(intersect(Zx, Zy), I))
}