Finding conditional instruments
In this article, we provide a CIfly-based implementation of a sound-and-complete algorithm for finding a conditional instrument in an acyclic directed mixed graph (ADMG). Such graphs are very common in graphical causality. They contain directed and bidirected edges and represent causal models with potential latent confounding. For causal effect identification in such graphs, instrumental variables constitute a fundamental technique under a linearity assumption and they can be effective even in case of latent confounding between the outcome and treatment variables. Conditional instruments allow for an additional set to make this strategy more general.
The algorithm we develop here is the first sound and complete one for this setting, that is, one that finds a conditional instrument precisely if a valid one exists. This is achieved by building on the recently developed complete graphical criterion by Henckel et al. (2024a). Algorithmically, we rely on techniques from earlier work by van der Zander, Liśkiewicz (2020) that are, however, not based on an complete criterion. We begin by stating the criterion by Henckel et al. (2024a).
Theorem 1 [Henckel et al. (2024a)] Let and be disjoint node sets in an acyclic directed mixed graph . Then, is a valid conditional instrumental set relative to in if, and only if,
- ,
- and
- , where is with all directed edges from to a node in removed.
Let us introduce some of the terminology of this criterion more precisely. An acyclic directed mixed graph, ADMG for short, is a graph which may contain both directed and bidirected edges, however, no directed cycle. We denote d-separation in by and d-connectivity by . For the descendants of a node in graph , we write . We use to denote all nodes on directed paths from to , excluding itself, and the set of forbidden nodes relative to in as .
Let us begin by saying that checking this criterion for given , , and can be done straightforwardly with CIfly by verifying the conditions one-by-one. Conditions 2 and 3 are d-separation checks that we have already seen on the home page. The first condition can be checked by computing which amounts to chaining descendant and ancestor computations similar to what we have done in the CPDAG adjustment criterion verifier. We discuss this in more detail below.
Algorithm for finding conditional instruments
For finding a conditional instrument based on this criterion, one first has to develop a suitable algorithm which can then be implemented with CIfly. This is a non-trivial task and needs some sophistication. In our paper, we derive such an algorithm using the techniques introduced by van der Zander, Liśkiewicz (2020).
The algorithm is based on the notion of a nearest separator that we previously discussed in this article. More precisely, it relies on the following observations:
- if a conditional instrument , exists, then there also exists a conditional instrument , , that is one consisting of a single instrument node instead of a set ,
- if there exists a set for a node such that , is a conditional instrument, then , is also a conditional instrument with being a nearest separator between and restricted to nodes in .
While the first observation is straightforward, it would go beyond the scope of this article to prove the second observation. For more details, we refer to the appendix of the CIfly paper.
These observations enable the following algorithmic strategy:
- iterate over all that are not in ,
- compute as the nearest separator between and in restricted to nodes in ,
- if exists and , then return conditional instrument
As all non-forbidden are tried by computing a nearest separator (again only containing non-forbidden nodes, thereby ensuring Condition 1), which by definition satisfies Condition 3, and Condition 2 is checked explicitly, the correctness follows. If no conditional instrument is returned, then none exists, and, conversely, every returned instrument is valid. The algorithm runs in time because CIfly algorithms are called for each potential . For the implementation, we first need to discuss how to compute .
Computing the forbidden set
The forbidden set is defined as where contains all node on a directed path from to (excluding itself). It is easy to see that because a node is on a directed path between and it is a descendant of and an ancestor of . Hence, the forbidden set can be constructed by
- computing based on the descendants of and the ancestors of
- computing based on the descendants of
Thus, we need to compute the ancestors and descendants in an ADMG. Clearly, this is a trivial task that can be performed easily without CIfly as well. Still, we do this here with the following two straightforward rule tables.
EDGES --> <--, <->
SETS X
START <-- AT X
OUTPUT ...
... | <-- | true
The descendants can be computed analogously to the ancestors.
EDGES --> <--, <->
SETS X
START --> AT X
OUTPUT ...
... | --> | true
Hence, we obtain the following code for computing and .
anc = cf.reach(g, {"X": y}, anc_table)
des = cf.reach(g, {"X": x}, desc_table)
cn = set(anc).intersection(des) - set([x])
forb = cf.reach(g, {"X": cn}, desc_table) + [x]
anc <- reach(g, list("X" = y), ancTable)
des <- reach(g, list("X" = x), desTable)
cn <- setdiff(intersect(anc, des), c(x))
forb <- append(reach(g, list("X" = cn), desTable), x)
Full implementation
Now, we are able to provide a full implementation based on the algorithm described above. The only thing we haven’t discussed yet is how to compute , that is the graph obtained by deleting all directed edges from to a node in from . This graph modification can be done directly for graphs in the CIfly format (based on edgelists), but can also be done in your favorite graph library if you prefer to use the CIfly graph format only in reach
calls. We show code for doing this directly CIfly graphs below. We note that one can also avoid the graph modification altogether, when encoding in the rule table that edges between and can not be traversed (as we have done in this article).
The function removes edge between the sets from
and to
which have the specified edge type. This is done by constructing a new graph where these edges are filtered out. Hence, this function has no side effects and runs in time . You may implement this differently depending on your use case. We note that the function only works for ordered edges but can be easily adapted to unordered ones similar to the discussing for checking parents and siblings in this article.
def removed_ordered_edges(g, from_vars, to_vars, edge_type):
if edge_type not in g:
return g
# enable efficient lookup of membership in from_vars and to_vars
from_set = set(from_vars)
to_set = set(to_vars)
gmod = dict(g)
gmod[edge_type] = list(
filter(
lambda uv: uv[0] not in from_set or uv[1] not in to_set,
g[edge_type],
)
)
return gmod
removeOrderedEdges <- function(g, fromVars, toVars, edgeType) {
p <- highestNodeId(g)
if (p == 0 || !(edgeType %in% names(g))) {
return (g)
}
# enable efficient lookup of membership in fromVars and toVars
fromVarsLogical <- rep(FALSE, p)
fromVarsLogical[fromVars] <- TRUE
toVarsLogical <- rep(FALSE, p)
toVarsLogical[toVars] <- TRUE
edgeList <- g[[edgeType]]
edgeListNew <- edgeList[!fromVarsLogical[edgeList[,1]] | !toVarsLogical[edgeList[,2]], , drop=FALSE]
gNew <- g
gNew[[edgeType]] <- edgeListNew
return (gNew)
}
Now, we are able to give the full implementation of the first sound-and-complete algorithm for finding conditional instruments.
import ciflypy as cf
import ciflypy_examples.utils as utils
from ciflypy_examples.nearest_dsep import find_nearest_separator
ruletables = utils.get_ruletable_path()
anc_table = cf.Ruletable(ruletables / "ancestors_admg.txt")
desc_table = cf.Ruletable(ruletables / "descendants_admg.txt")
dsep_table = cf.Ruletable(ruletables / "dconnected_admg.txt")
def sound_and_complete_instrument(p, g, x, y):
anc = cf.reach(g, {"X": y}, anc_table)
des = cf.reach(g, {"X": x}, desc_table)
cn = set(anc).intersection(des) - set([x])
forb = cf.reach(g, {"X": cn}, desc_table) + [x]
non_forb = set(range(p)) - set(forb)
g_parsed = cf.Graph(g, anc_table)
g_mod = utils.removed_ordered_edges(g, [x], cn, "-->")
g_mod_parsed = cf.Graph(g_mod, anc_table)
for z in non_forb - set([y]):
W = find_nearest_separator(g_mod_parsed, [y], [z], [], non_forb)
if W is not None and z in cf.reach(g_parsed, {"X": x, "Z": W}, dsep_table):
return ([z], list(W))
return None
library("ciflyr")
library("here")
source(here("R", "utils.R"))
source(here("R", "nearestDsep.R"))
ruletables <- getRuletablePath()
ancTable <- parseRuletable(file.path(ruletables, "ancestors_admg.txt"))
desTable <- parseRuletable(file.path(ruletables, "descendants_admg.txt"))
dconTable <- parseRuletable(file.path(ruletables, "dconnected_admg.txt"))
soundAndCompleteInstrument <- function(p, g, x, y) {
anc <- reach(g, list("X" = y), ancTable)
des <- reach(g, list("X" = x), desTable)
cn <- setdiff(intersect(anc, des), c(x))
forb <- append(reach(g, list("X" = cn), desTable), x)
notForb <- setdiff(1:p, forb)
gParsed <- parseGraph(g, ancTable)
gMod <- removeOrderedEdges(g, x, cn, "-->")
gModParsed <- parseGraph(gMod, ancTable)
for (z in setdiff(notForb, c(y))) {
W = findNearestSeparator(gModParsed, c(y), c(z), c(), notForb)
if (!is.null(W) && z %in% reach(gParsed, list("X" = x, "Z" = W), dconTable)) {
return (list("Z" = z, "W" = W))
}
}
return (NULL)
}