Optimal conditional instruments

In this article, we show how to implement an algorithm for constructing a graphically optimal conditional instrumental set. Instrumental variables are a popular tool for identifying and estimating causal effects in the presence of unmeasured confounding and conditional instrumental sets generalize this notion further. As there may be many different conditional instruments in a causal graph, it is an important question which intruments are preferable over others. Henckel et al. (2024a) provide a graphical criterion for a conditonal instrumental set with a statistical efficiency guarantees they call graphical optimality. The main result in this regard is stated in the following theorem. Note that we focus here purely on the graphical aspects of this problem, for more statistical and causal background, we refer the reader to the original paper.

Theorem 1 [Henckel et al. (2024a)]
Let xx and yy be distinct nodes in an acyclic directed mixed graph (ADMG) GG such that de(x)G={x,y}\text{de}(x)_G = \{x, y\}. Then, it holds for the sets Wopt=disG,{x}+(y){x,y}W^{\text{opt}} = \text{dis}^+_{G, \{x\}}(y) \setminus \{x, y\} and Zopt=disG,{y}+(x)(Wopt{x,y})Z^{\text{opt}} = \text{dis}^+_{G, \{y\}}(x) \setminus (W^{\text{opt}} \cup \{x, y\}) that

  1. if ZoptZ^\text{opt} \neq \emptyset, then (Zopt,Wopt)(Z^\text{opt}, W^\text{opt}) is a valid conditional instrumental set relative to (x,y)(x, y) in GG and
  2. if Zopt(paG(x)sibG(x))Z^\text{opt} \cap (\text{pa}_G(x) \cup \text{sib}_G(x)) \neq \emptyset, then (Zopt,Wopt)(Z^\text{opt}, W^\text{opt}) is graphically optimal relative to (x,y)(x, y) in GG.

The criterion is stated for acyclic directed mixed graph, from now on abbreviated as ADMGs, which are graphs with both directed and bidirected edges that do not contain a directed cycle. paG(x)\text{pa}_G(x) are the parents of xx in GG, i.e., nodes pp with pxp \rightarrow x, and sibG(x)\text{sib}_G(x) are the siblings of xx in GG, i.e., the nodes with pxp \leftrightarrow x. Further, disG,W(x)\text{dis}_{G, W}(x) contains all nodes connected to xx via a path of bidirected edges, with all nodes on the path not being in WW, and disG,W+(x)=(disG,W(x)paG(disG,W(x)))W\text{dis}_{G, W}^+(x) = (\text{dis}_{G, W}(x) \cup \text{pa}_G(\text{dis}_{G, W}(x))) \setminus W.

The criterion assumes that yy is the only descendant of xx, which may appear like a strong restriction. However, one may consider the latent projection over all other descendants of xx to satisfy this condition (for more details on the latent projection, we refer to the CIfly paper). The only consequence of this, is that no descendants of xx in the original graph can be part of the instrumental set. However, this is a mild assumption because if such variables could be used as part of the instrument, then the causal effect would also be identifiable using adjustment. One way to deal with it, is to first search for an adjustment set (a task that can also be solved using CIfly) and if none exists proceed with the methodology described in this article. Hence, in the following, we will always assume that the causal effect cannot be identified by adjustment.

While using the latent projection to remove all descendants of xx (apart from yy) is methodologically reasonable under this assumption, algorithmically, computing the latent projection is more expensive than CIfly reachability algorithms. Hence, we would like to avoid this operation here. To this aim, we use the following graphical characterization of WoptW^\text{opt} and ZoptZ^\text{opt} developed in the CIfly paper.

Theorem 2
Consider distinct nodes xx and yy in an ADMG GG and D=deG(x)D = \text{de}_G(x) such that yDy \in D and let A=VDA = V \setminus D. Then, WoptW^{\text{opt}} and ZoptZ^{\text{opt}} for the graph obtained by latent projection over D{x,y}D \setminus \{x, y\} are equivalent to the following constructions in GG:

  1. Wopt={vV    v is on a path pc1ckd1dly}AW^{\text{opt}} = \{ v \in V \; \mid \; v \text{ is on a path } p \rightarrow c_1 \leftrightarrow \cdots \leftrightarrow \cdots c_k \leftrightarrow d_1 \rightarrow \cdots \rightarrow \cdots d_l \rightarrow y \} \cap A, where p,c1,,ckAp, c_1, \dots, c_k \in A and d1,,dl(D{x})d_1, \dots, d_l \in (D \setminus \{ x \}) possibly with k=0k=0 or l=0l=0 and
  2. Zopt={vV    v is on a path pc1ckx}WoptZ^{\text{opt}} = \{ v \in V \; \mid \; v \text{ is on a path } p \rightarrow c_1 \leftrightarrow \cdots \leftrightarrow c_k \leftrightarrow x\} \setminus W^{\text{opt}}, where p,c1,,ckAp, c_1, \dots, c_k \in A possibly with k=0k=0.

Using this theorem, we proceed as follows:

  1. Check that ydeG(x)y \in \text{de}_G(x).
  2. Compute WoptW^{\text{opt}} and ZoptZ^{\text{opt}} as defined in Theorem 2.
  3. Check Condition 1. and 2. of Theorem 1.

This will guarantee that the resulting sets WoptW^{\text{opt}} and ZoptZ^{\text{opt}} constitute graphical optimal instrumental sets, assuming that there is no adjustment set relative to xx and yy in GG. Now, let us consider the three steps one-by-one.

Descendant check

Of course, the task of checking whether yy is a descendant of xx in GG is obvious and available in many off-the-shelf graph packages. Still, we show how it can be easily done in CIfly as well. For this, we use the following rule table.

descendants_admg.txt
EDGES --> <--, <->
SETS X
START --> AT X
OUTPUT ...

... | --> | true

Effectively, this table describes to start at the nodes in XX as if the last edge was \rightarrow and from there only follow directed edges. All reached nodes are returned as the descendants of XX. We can use this in as follows to check whether ydeG(x)y \in \text{de}_G(x).

descendants = cf.reach(g, {"X": x}, "descendants_admg.txt")
isDescendant = y in descendants

Compute WoptW^{\text{opt}} and ZoptZ^{\text{opt}}

We will compute both WoptW^{\text{opt}} and ZoptZ^{\text{opt}} using the same underlying ruletable. Let us begin with WoptW^{\text{opt}}. Here, we want to start at yy and track paths that, first, follow edges \leftarrow (assuming the respective nodes are in D{x}D \setminus \{x\}) until at some point we switch to bidirected edges for edges in AA. The path may end with final directed edge \leftarrow to a node in AA. To do this consider the following rule table.

optimal_iv_admg.txt
EDGES  --> <--, <->
SETS   S, D, F
COLORS pass, yield
START  <-- [pass] AT S
OUTPUT ... [yield]

...         | -->              | false
<-- [pass]  | <--      [pass]  | next in D     and next not in F
<-- [pass]  | <--, <-> [yield] | next not in D and next not in F
<-> [yield] | <--, <-> [yield] | next not in D and next not in F

Here, we use the fact that A=VDA = V \setminus D, hence it suffices to pass DD. The set SS forms the starting nodes and FF allows us to restrict which nodes are visited. This table tracks exactly the kind of paths we have described above. We ensure that it only returns nodes reached in AA by using the colors pass and yield. We can compute both WoptW^{\text{opt}} and ZoptZ^{\text{opt}} using this table as follows. For the latter, observe that descendants of xx can never be connected to it via a path of the form xx \leftarrow \cdots due to the acyclicity of the underlying graph.

W = set(cf.reach(g, {"S": y, "D": descendants, "F": x}, "./optimal_iv_admg.txt"))
Z = set(cf.reach(g, {"S": x, "D": descendants, "F": []}, "./optimal_iv_admg.txt")) - W

Checking the optimality condition

The resulting sets ZoptZ^{\text{opt}} and WoptW^{\text{opt}} are valid, provided ZoptZ^{\text{opt}} is non-empty, which is trivial to check outside of CIfly. The second condition is that ZoptZ^{\text{opt}} contains a parent or sibling of xx. While this could be checked using a CIfly algorithm, clearly it is easier to do this directly in Python or R as well. Here, we do this based on the CIfly graph representation, but you may prefer to do this using your favorite graph library or format instead.

Recall that a CIfly graph is stored as one edge list per edge type. In Python, each edge is given as a pair of node IDs. Hence, we may use the following helper functions to check whether ZZ contains a parent or sibling of xx.

def contains_parent(g, x, Z):
    return any(map(lambda uv: uv[0] in Z and uv[1] == x, g["-->"]))

def contains_sibling(g, x, Z):
    return any(map(lambda uv: (uv[0] in Z and uv[1] == x) or (uv[0] == x and uv[1] in Z), g["<->"]))

To do the parent check, the lambda function tests for a pair uv from the edge list of directed edges whether the first node is in ZZ and the second node is xx. The sibling check is a bit more verbose because both orderings have to be tested. Provided that testing membership in ZZ is constant-time (as is the case here because we pass ZZ as a Python set), these functions run in linear-time in the size of the graph as a whole edge list is searched. Of course, better run-times are possible if graph representations more suitable to this task are used. While this does not matter for the time complexity of this procedure, it can be useful to maintain such a representation in case the parent and sibling checks are needed repeatedly.

In R, the edge lists are given as r×2r \times 2 matrices in case there are rr edges of that type. In order to check whether some node in ZZ is a parent of xx we can filter the edge list matrix for such rows, quite similarly to the Python code above. However, because R has no native set data structure, we take some extra steps to ensure efficient look-up in the filtering (even for large sets ZZ). Namely, we first store a Boolean vector indicating membership in ZZ. Afterwards, we filter the edges in g[["-->"]] that have a node in ZZ as first and xx as second element and return TRUE or FALSE based on the length of the filtered matrix. Let us consider this for the parent check first.

containsParent <- function(g, x, Z) {
	# enable efficient lookup of membership in Z
	p <- highestNodeId(g)
	zLogical <- rep(FALSE, p)
	zLogical[Z] <- TRUE

	edgeList <- g[["-->"]]
	return (length(edgeList[zLogical[edgeList[,1]] & edgeList[,2] == x]) > 0)
}

This code uses a small helper function highestNodeId to find the highest node ID that is part of an edge in a CIfly graph. Note that the number of nodes is not stored in a CIfly graph object and, hence, has to be managed outside of it if needed.

highestNodeId <- function(g) {
	nodeIds <- unlist(g)
	if (length(nodeIds) == 0) {
		return (0)
	} else {
		return (max(nodeIds))
	}
}

Again, the siblings can be retrieved similarly, by checking for both possible orders of the endpoints in the edge list matrix.

containsSibling <- function(g, x, Z) {
	p <- highestNodeId(g)
	zLogical <- rep(FALSE, p)
	zLogical[Z] <- TRUE

	edgeList <- g[["<->"]]
	return (length(edgeList[(zLogical[edgeList[,1]] & edgeList[,2] == x) | (zLogical[edgeList[,2]] & edgeList[,1] == x)]) > 0)
}

We note that in case this functionality is used more often, a helper function

containsNeighborCifly(g, x, Z, edgeType, orderedEdge=TRUE)

may prove useful, which abstracts this code for different edge types including an argument that indicates whether the edge type is ordered.

Full implementation

Hence, we obtain the following implementation for finding graphically optimal instrumental sets ZZ and WW assuming that the causal effect of xx on yy cannot be identified with adjustment.

The functions for checking parents and siblings are loaded from utils. As before, we also rely on a simple function get_ruletable_path(), respectively getRuletablePath(), that returns a path to the rule-table folder.

optimal_iv.py
import ciflypy as cf
import ciflypy_examples.utils as utils

ruletables = utils.get_ruletable_path()
descendants_table = cf.Ruletable(ruletables / "descendants_admg.txt")
optimal_iv_table = cf.Ruletable(ruletables / "optimal_iv_admg.txt")

# find Z, W according to the criterion given by Henckel et al. (2024)
# NOTE: the returned instrument is only optimal if there is no adjustment set between x and y
# we do NOT check this here and thus if this assumption is violated
# the function alone only ensures soundness, but not optimality
def optimal_instrument(g, x, y):
    descendants = cf.reach(g, {"X": x}, descendants_table)
    if y not in descendants:
        return None

    W = set(cf.reach(g, {"S": y, "D": descendants, "F": x}, optimal_iv_table))
    Z = set(cf.reach(g, {"S": x, "D": descendants, "F": []}, optimal_iv_table)) - W

    if Z and (utils.contains_parent(g, x, Z) or utils.contains_sibling(g, x, Z)):
        return (list(Z), list(W))
    else:
        return None

References