Finding IVs · CIfly

In this article, we provide a CIfly-based implementation of a sound-and-complete algorithm for finding a conditional instrument in an acyclic directed mixed graph (ADMG). Such graphs are very common in graphical causality. They contain directed and bidirected edges and represent causal models with potential latent confounding. For causal effect identification in such graphs, instrumental variables constitute a fundamental technique under a linearity assumption and they can be effective even in case of latent confounding between the outcome and treatment variables. Conditional instruments allow for an additional set $W$ to make this strategy more general.

The algorithm we develop here is the first sound and complete one for this setting, that is, one that finds a conditional instrument precisely if a valid one exists. This is achieved by building on the recently developed complete graphical criterion by Henckel et al. (2024a). Algorithmically, we rely on techniques from earlier work by van der Zander, Liśkiewicz (2020) that are, however, not based on an complete criterion. We begin by stating the criterion by Henckel et al. (2024a).

Theorem 1 [Henckel et al. (2024a)] Let $\{x,y\},Z$ and $W$ be disjoint node sets in an acyclic directed mixed graph $G$ . Then, $(Z,W)$ is a valid conditional instrumental set relative to $(x,y)$ in $G$ if, and only if,

$(Z \cup W) \cap \text{forb}_G(x, y) = \emptyset$ ,
$x \not\perp_{G} Z \mid W$ and
$y \perp_{\tilde{G}} Z \mid W$ , where $\tilde{G}$ is $G$ with all directed edges from $x$ to a node in $\text{causal}_G(x,y)$ removed.

Let us introduce some of the terminology of this criterion more precisely. An acyclic directed mixed graph, ADMG for short, is a graph which may contain both directed and bidirected edges, however, no directed cycle. We denote d-separation in $G$ by $\perp_G$ and d-connectivity by $\not\perp_G$ . For the descendants of a node $x$ in graph $G$ , we write $\text{de}_G(x)$ . We use $\text{causal}_G(x, y)$ to denote all nodes on directed paths from $x$ to $y$ , excluding $x$ itself, and the set of forbidden nodes relative to $(x,y)$ in $G$ as $\text{forb}_G = \text{de}_G(\text{causal}_G(x, y)) \cup \{x\}$ .

Let us begin by saying that checking this criterion for given $x$ , $y$ , $Z$ and $W$ can be done straightforwardly with CIfly by verifying the conditions one-by-one. Conditions 2 and 3 are d-separation checks that we have already seen on the home page. The first condition can be checked by computing $\text{forb}$ which amounts to chaining descendant and ancestor computations similar to what we have done in the CPDAG adjustment criterion verifier. We discuss this in more detail below.

Algorithm for finding conditional instruments

For finding a conditional instrument based on this criterion, one first has to develop a suitable algorithm which can then be implemented with CIfly. This is a non-trivial task and needs some sophistication. In our paper, we derive such an algorithm using the techniques introduced by van der Zander, Liśkiewicz (2020).

The algorithm is based on the notion of a nearest separator that we previously discussed in this article. More precisely, it relies on the following observations:

if a conditional instrument $Z$ , $W$ exists, then there also exists a conditional instrument $z$ , $W$ , that is one consisting of a single instrument node $z$ instead of a set $Z$ ,
if there exists a set $W$ for a node $z$ such that $z$ , $W$ is a conditional instrument, then $z$ , $W'$ is also a conditional instrument with $W'$ being a nearest separator between $y$ and $z$ restricted to nodes in $\text{an}_{\tilde{G}}(y, z) \setminus \text{forb}_G(x, y)$ .

While the first observation is straightforward, it would go beyond the scope of this article to prove the second observation. For more details, we refer to the appendix of the CIfly paper.

These observations enable the following algorithmic strategy:

iterate over all $z \in V$ that are not in $\text{forb}_G(x, y) \cup \{x, y\}$ ,
compute $W$ as the nearest separator between $y$ and $z$ in $\tilde{G}$ restricted to nodes in $V \setminus \text{forb}_G(x, y) \cup \{x, y\}$ ,
if $W$ exists and $x \not\perp_G z \mid W$ , then return conditional instrument $(z, W)$

As all non-forbidden $z$ are tried by computing a nearest separator $W$ (again only containing non-forbidden nodes, thereby ensuring Condition 1), which by definition satisfies Condition 3, and Condition 2 is checked explicitly, the correctness follows. If no conditional instrument is returned, then none exists, and, conversely, every returned instrument is valid. The algorithm runs in time $O(p \cdot (p + m))$ because CIfly algorithms are called for each potential $z$ . For the implementation, we first need to discuss how to compute $\text{forb}_G(x, y)$ .

Computing the forbidden set

The forbidden set is defined as $\text{forb}_G = \text{de}_G(\text{causal}_G(x, y)) \cup \{x\}$ where $\text{causal}_G(x, y)$ contains all node on a directed path from $x$ to $y$ (excluding $x$ itself). It is easy to see that $\text{causal}_G(x, y) = \text{de}_G(x) \cap \text{an}_G(y) \setminus \{x\}$ because a node is on a directed path between $x$ and $y$ it is a descendant of $x$ and an ancestor of $y$ . Hence, the forbidden set can be constructed by

computing $\text{causal}_G(x, y)$ based on the descendants of $x$ and the ancestors of $y$
computing $\text{forb}_G(x, y)$ based on the descendants of $\text{causal}_G(x, y)$

Thus, we need to compute the ancestors and descendants in an ADMG. Clearly, this is a trivial task that can be performed easily without CIfly as well. Still, we do this here with the following two straightforward rule tables.

ancestors_admg.txt

EDGES --> <--, <->
SETS X
START <-- AT X
OUTPUT ...

... | <-- | true

The descendants can be computed analogously to the ancestors.

descendants_admg.txt

EDGES --> <--, <->
SETS X
START --> AT X
OUTPUT ...

... | --> | true

Hence, we obtain the following code for computing $\text{causal}_G(x, y)$ and $\text{forb}_G(x, y)$ .

anc = cf.reach(g, {"X": y}, anc_table)
des = cf.reach(g, {"X": x}, desc_table)
cn = set(anc).intersection(des) - set([x])
forb = cf.reach(g, {"X": cn}, desc_table) + [x]

Full implementation

Now, we are able to provide a full implementation based on the algorithm described above. The only thing we haven’t discussed yet is how to compute $\tilde{G}$ , that is the graph obtained by deleting all directed edges from $x$ to a node in $\text{causal}_G(x, y)$ from $G$ . This graph modification can be done directly for graphs in the CIfly format (based on edgelists), but can also be done in your favorite graph library if you prefer to use the CIfly graph format only in reach calls. We show code for doing this directly CIfly graphs below. We note that one can also avoid the graph modification altogether, when encoding in the rule table that edges between $x$ and $\text{causal}_G(x, y)$ can not be traversed (as we have done in this article).

The function removes edge between the sets from and to which have the specified edge type. This is done by constructing a new graph where these edges are filtered out. Hence, this function has no side effects and runs in time $O(p + m)$ . You may implement this differently depending on your use case. We note that the function only works for ordered edges but can be easily adapted to unordered ones similar to the discussing for checking parents and siblings in this article.

def removed_ordered_edges(g, from_vars, to_vars, edge_type):
  if edge_type not in g:
      return g

  # enable efficient lookup of membership in from_vars and to_vars
  from_set = set(from_vars)
  to_set = set(to_vars)
  gmod = dict(g)
  gmod[edge_type] = list(
      filter(
          lambda uv: uv[0] not in from_set or uv[1] not in to_set,
          g[edge_type],
      )
  )
  return gmod

removeOrderedEdges <- function(g, fromVars, toVars, edgeType) {
  p <- highestNodeId(g)
  if (p == 0 || !(edgeType %in% names(g))) {
      return (g)
  }

  # enable efficient lookup of membership in fromVars and toVars
  fromVarsLogical <- rep(FALSE, p)
  fromVarsLogical[fromVars] <- TRUE
  toVarsLogical <- rep(FALSE, p)
  toVarsLogical[toVars] <- TRUE

  edgeList <- g[[edgeType]]
  edgeListNew <- edgeList[!fromVarsLogical[edgeList[,1]] | !toVarsLogical[edgeList[,2]], , drop=FALSE]
  gNew <- g
 gNew[[edgeType]] <- edgeListNew
 return (gNew)
}

Now, we are able to give the full implementation of the first sound-and-complete algorithm for finding conditional instruments.

sound_and_complete_iv.py

import ciflypy as cf
import ciflypy_examples.utils as utils
from ciflypy_examples.nearest_dsep import find_nearest_separator

ruletables = utils.get_ruletable_path()
anc_table = cf.Ruletable(ruletables / "ancestors_admg.txt")
desc_table = cf.Ruletable(ruletables / "descendants_admg.txt")
dsep_table = cf.Ruletable(ruletables / "dconnected_admg.txt")

def sound_and_complete_instrument(p, g, x, y):
    anc = cf.reach(g, {"X": y}, anc_table)
    des = cf.reach(g, {"X": x}, desc_table)
    cn = set(anc).intersection(des) - set([x])
    forb = cf.reach(g, {"X": cn}, desc_table) + [x]
    non_forb = set(range(p)) - set(forb)

    g_parsed = cf.Graph(g, anc_table)
    g_mod = utils.removed_ordered_edges(g, [x], cn, "-->")
    g_mod_parsed = cf.Graph(g_mod, anc_table)

    for z in non_forb - set([y]):
        W = find_nearest_separator(g_mod_parsed, [y], [z], [], non_forb)
        if W is not None and z in cf.reach(g_parsed, {"X": x, "Z": W}, dsep_table):
            return ([z], list(W))
    return None

soundAndCompleteIV.R

library("ciflyr")
library("here")

source(here("R", "utils.R"))
source(here("R", "nearestDsep.R"))

ruletables <- getRuletablePath()
ancTable <- parseRuletable(file.path(ruletables, "ancestors_admg.txt"))
desTable <- parseRuletable(file.path(ruletables, "descendants_admg.txt"))
dconTable <- parseRuletable(file.path(ruletables, "dconnected_admg.txt"))

soundAndCompleteInstrument <- function(p, g, x, y) {
	anc <- reach(g, list("X" = y), ancTable)
	des <- reach(g, list("X" = x), desTable)
	cn <- setdiff(intersect(anc, des), c(x))
	forb <- append(reach(g, list("X" = cn), desTable), x)
	notForb <- setdiff(1:p, forb)

	gParsed <- parseGraph(g, ancTable)
	gMod <- removeOrderedEdges(g, x, cn, "-->")
	gModParsed <- parseGraph(gMod, ancTable)
	
	for (z in setdiff(notForb, c(y))) {
		W = findNearestSeparator(gModParsed, c(y), c(z), c(), notForb)
		if (!is.null(W) && z %in% reach(gParsed, list("X" = x, "Z" = W), dconTable)) {
			return (list("Z" = z, "W" = W))
		}
	}
	return (NULL)
}

References

Leonard Henckel, Martin Buttenschoen, Marloes H. Maathuis.

Graphical tools for selecting conditional instrumental sets. Biometrika, 2024.

Benito van der Zander, Maciej Liśkiewicz.

Finding minimal d-separators in linear time and applications. UAI, 2020.

Finding conditional instruments

Algorithm for finding conditional instruments

Computing the forbidden set

Full implementation

References