Rule tables · CIfly

At the heart of CIfly lie rule tables. These objects specify CIfly algorithms. In this article, we describe the syntax of CIfly rule tables. A CIfly rule table consists of two parts. In the upper half, general configurations are specified, such as the edge types that the causal graph may contain or the states, where the reachability algorithm is started. In the bottom half, the rules are stated line-by-line. As a basic example, consider the rule table for d-connectivity in DAGs.

To showcase the CIfly rule-table syntax, we use a rule table for finding all non-amenable nodes from some starting set $X$ . This notion is relevant for checking adjustment sets in CPDAGs or more generally, whether a causal effect is identifiable from observational data. However, the causal theory behind this definition will not be important in this article. Essentially, a node $y$ is called non-amenable if there exists a path from a node $x \in X$ to $y$ such that the first edge is undirected, all other edges are either undirected or directed towards $y$ and that the path contains no further node in $x$ . This can be expressed in the following rule table.

EDGES --> <--, ---
SETS X
COLORS init, yield
START ... [init] AT X
OUTPUT ... [yield]

... [init]  | ---      [yield] | next not in X
... [yield] | ---, --> [yield] | next not in X

Let us briefly give some intuitive explanation of this rule table. In the rules at the bottom of the table, we ensure two things. First, when the current state has the init color and there is an undirected edge from the current node, we check whether there is a transition to a state with the yield color by testing whether the next node (the one connected to the current node by an undirected edge) is not in set $X$ . Second, there is a transition between states with color yield whenever there is an undirected edges or a directed edge pointing from the current towards the next node. In the specification at the top, specifically the OUTPUT line, we impose that only nodes reached with color yield be returned. We also impose that the reachability algorithm start at the nodes in $X$ with initial color init as specified in the START line. With these specifications, the rule table ensures that we return exactly those nodes on paths starting at a node $x \in X$ with the first edge being undirected, followed afterwards by undirected or directed edges pointing away from $x$ , with no other node in $X$ .

Below, we focus on the syntax of rule tables in more detail. Before we start, we note that one can add whole-line comments in a rule table by having # as first non-whitespace character. Empty lines are ignored.

Edges

A line beginning with EDGES specifies the allowed edge types, which are given as arbitrary strings (with the restriction of not being equal to ... nor containing a whitespace or comma). The edge descriptions are separated by comma and we distinguish symmetric edges (such as --- or <->), which are given by a single string, from asymmetric edges (such as -->), which are given as a whitespace-separated pair such as --> <--, stating that --> and <-- refer to the same underlying edges with left and right endpoint reversed. Thus, the edges of a CPDAG may be given as EDGES --> <--, ---, but one can also use other encodings like EDGES right left, undir.

Colors

A line beginning with COLORS specifies colors to extend the state space. Each color is given as a arbitrary string (as before, not being equal to ... nor containing a whitespace or comma) and these strings are comma separated. Common are colors such as init and yield, as in our example above, to specify initial and output states. These are specified as COLORS init, yield. This line can be omitted if no colors are used.

Sets

A line beginning with SETS specifies sets to use in the search rules. Each set is given as a string (subject to the same restrictions as before) and these strings are comma separated. A classical example would be a d-connectivity search with regard to a set Z, which is started at set X. In this case one would specify SETS Z, X. Or, in our example above, we have SETS X because we are only concerned with our starting set of nodes $X$ . Note that the sets which are specified have to be provided when running the CIfly algorithm later on. Only sets specified in this line can be used in the rules below.

Start

Any line beginning with START describes a starting configuration for the search. It should take the form START e [c] FOR s specifying a comma-separated list of edge types, a comma-separated list of colors in square brackets followed by a comma-separated list of sets (all of these need to be declared before this line). An example would be START <-- [init] FOR X, which states that the search is started for all vertices in X with color init as if the last edge was <--. It might seem counterintuitive to specify the non-existing previous edge as <--. Sometimes this allows for elegant specifications of the search rules. One alternative is to state START ... [init] for X with the ... being a placeholder for ‘all defined edge types’ meaning the search is started for all possible edge types. This placeholder for any edge type is supported in the search rule specification as well (more on that below). If one wants to be more explicit, it is also possible to use START <--, --- [init] for X to state multiple options, here the edge types <-- and ---. Another way to avoid the artificial use of <-- could be to specify a new edge type init, which never occurs in the given graph and is only present for handling the rules for the starting states START init [init] FOR X. However, we often prefer using ... as an edge type placeholder combined with the color init as explained above.

Note that the color can also be omitted, for example, writing START <-- FOR X. This is necessary, when no color was specified, but it can also be helpful if colors are used, in which case it has the same meaning as ... for edges and matches all possible colors. Moreover, it is possible to specify multiple START lines and the search will take every starting configuration into account.

Output

Any line beginning with OUTPUT describes a target configuration for the search. The search will return a list of all vertices, which are reached in a target configuration. The configuration is specified as pair e [c]. Similarly to the starting configurations, it is possible to use ... as an edge type placeholder, to give multiple edge types and colors as a comma-separated list, and to specify multiple OUTPUT lines. Above, we have OUTPUT ... [yield] to indicate that all vertices reached with color yield, no matter through which edge type, are returned.

Rules

After the edges, colors, sets, start and output states are specified, the rules can be stated. This is done in multiple lines. Each line is separated by two | characters into a pattern for a previous state, a pattern for the next state (this part is also called the case) and an expression, which is essentially a logical formula (syntax is specified below). Generally, to decide whether there exists a transition from one state current e [c] to next f [d] for vertices current and next, edge types e and f, and colors c and d, the lines are considered top to bottom. For the first case that matches, the corresponding expression is evaluated and the resulting boolean indicates whether the transition exists. If no case matches, then false is returned.

A case is described as a e [c] | f [d] referring to the edge types and colors stated in the previous paragraph. As above, it is possible to use ... to match against any edge type and to specify multiple options as a comma-separated list. Colors can be omitted with the same meaning as above. For example, the case ... [init] | --- [yield] matches transitions from states of color init to yield if the next edge is undirected, or ... [yield] | ---, --> [yield] matches all transitions following an undirected or directed edge from current to next, with both states having color yield. It is again possible to omit the colors, for example, to just have ... | ---, -->, assuming that colors are not needed in this transition.

With expression we mean a logical expression which supports the infix operators and, or, in, not in and the prefix operator not. Atoms true and false are available. Furthermore, one can refer to current and next, the current and next vertex, as well as the specified sets. This allows to test set membership as, for example, current in Z or next not in X, assuming Z and X were specified as described above. The operator precedences are such that in and not in have the highest precedence, followed by not. The operators and and or have the lowest precedence. Parentheses ( and ) can be used to group expressions.

As further example, a full line containing a rule may look like this

<-- [pass] | ... [yield] | next not in D and current not in F

assuming that the sets D and F were specified beforehand. In most applications, more than one rule is needed, for example, consider the basic d-connectivity example for ADMGs given set Z:

->, <-> | <-, <-> | current in Z
...     | ...     | current not in Z

Recall that the the evaluation of the rule for the first matching pattern is returned. That means that, for a collider, it is checked whether current in Z and if that does not hold, then false is returned. The second line is never considered in this case.