ciflyr · CIfly

In this article, we discuss how CIfly can be used from R with the ciflyr package. We plan to make it available on CRAN so that it can be installed with install.packages("ciflyr"). At its core, the package provides a single function reach that performs a CIfly algorithm for an input graph and input sets according to a specified rule table. As a simple example, consider the code below that tests amenability using the table for finding non-amenable nodes discussed in the previous article introducing rule tables.

library("ciflyr")

# it is assumed that X and Y are disjoint
isAmenable <- function(g, X, Y) {
    sets <- list("X" = X)
    tablePath <- "./not_amenable_cpdag.txt"
    reached <- reach(g, sets, tablePath)
    return (!(intersect(reached, Y)))
}

g <- list("-->" = list(c(1, 2), c(3, 2)), "---" = list(c(3, 4)))
X <- c(4)
Y <- c(2)
print(isAmenable(g, X, Y))

Here, reach is called with a graph g (as a list mapping edge types to lists of of edges), a list of sets (mapping the set names specified in the rule table to node vectors) and a path to the text file containing the rule table (in this case assuming it lies in the current working directory; for better options, see below). This function returns a vector of nodes, the output of the CIfly algorithm specified by the rule table. In the example above, these are all non-amenable nodes with respect to $X$ which we can use to test amenability of a pair $X$ and $Y$ . We now discuss how the arguments should be passed to reach.

Graph

The graph is represented by a list with each name describing an edge type and containing a $r \times 2$ matrix whose rows represent the edges of this type. For example, the graph which has edges $1 \rightarrow 2 \leftarrow 3 - 4$ would be represented as list("-->" = rbind(c(1, 2), c(3, 2)), "---" = rbind(c(3, 4))). This is equivalent to list("-->" = rbind(c(1,2)), "<--" = rbind(c(2, 3)), "---" = rbind(c(3,4))) provided the rule table contains the edge description edges: --> <--, --- specifying --> and <-- to be reciprocal.

For the node IDs, we strongly recommend to use the numbers $1, 2, \dots, p$ for a graph with $p$ nodes. Generally, any positive integers can be used, however, the memory demand of ciflyr is linear in the largest node ID because of the internal representation of graphs and sets. Hence, unnecessarily large node numbers lead to slower run times and potential memory problems. Non-integer node names have to be stored outside of the ciflyr graph format, e.g., in a vector indexed by the node IDs.

Sets

The sets are represented by a list as well with the names of the respective node sets as strings mapping to their instantiation represented as vector of positive integers. As an example, list("X" = c(1), "Z" = c(3, 4)) would represent the two sets $X = \{1\}$ and $Z = \{3, 4\}$ . The names of the sets must match the names specified in the rule table.

Rule Table

In its basic version, the rule table is passed to reach by its file path as string. If used outside of simple scripts, make sure that the rule table file is included within the package and that its path is specified relative to the project root and not the current working directory. Better code compared to the example above would be to use the here package to achieve this.

library("ciflyr")
library("here")

# path to the file "not_amenable_cpdag.txt" relative to the project root
tablePath = here("ruletables", "not_amenable_cpdag.txt")
# ...
# pass tablePath to cf.reach as third argument
reach(g, sets, tablePath)

If you do not want rule-table text files in your project, it is also possible to directly embed tables into your R code as multi-line strings.

notAmenableTable = "
EDGES --> <--, ---
SETS X
COLORS init, yield
START ... [init] AT X
OUTPUT ... [yield]

... [init]  | ---      [yield] | next not in X
... [yield] | ---, --> [yield] | next not in X
"
# ...
# pass table as string and set keyword argument tableAsString accordingly
reach(g, sets, notAmenableTable, tableAsString=TRUE)

Output

reach returns a vector of positive integers, representing the vertices that are reached in an output state (as specified in the rule table) when running the search on the given graph with the provided sets.

Debugging

Currently, the ciflyr error messages are quite verbose (due to limitations of the R-to-Rust interface) and look roughly as follows:

thread '<unnamed>' panicked at /path/to/ciflyr/src/vendor/extendr-api/src/robj/into_robj.rs:74:13:
called `Result::unwrap()` on an `Err` value: Other("Error: expected a string as ruletable argument.")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace`

The relevant information here is the string "Error: expected a string as ruletable argument." in the second line. This message gives a hint what went wrong (in this case that a string was expected for the argument passing the rule table).

In case your CIfly algorithms runs without error but returns an unexpected result, it is possible to pass the keyword argument verbose=TRUE to reach for debugging purposes. This logs the steps the reachability algorithm takes during its execution.

Advanced Usage

In every reach call, the implementation needs to parse the graph, sets and rule table. This can introduce unnecessary overhead in the case that the same graph, sets or rule table are used repeatedly. To address this, ciflyr offers the option to parse these objects outside of the reach function with the functions parseGraph, parseSets and parseRuletable that, respectively, return a Graph, Sets and Ruletable object.

parseRuletable can be called as parseRuletable(tablePath) or parseRuletable(tableString, tableAsString=TRUE). The first argument of parseGraph and parseSets are also in the same encoding as the respective reach arguments. However, it is necessary to pass a rule table as second argument (this can be done via file path, as a multi-line string with tableAsString set to TRUE or by passing a Ruletable object). Afterwards, these objects must be used in combination with rule tables that have the same EDGES, respectively SETS specification as the passed rule table. All three objects are internal CIfly representations and can only be passed as their respective arguments into reach. We offer no further methods on these classes.

As an example, consider the code below that computes all amenable nodes for every possible $x$ . Here, the same rule table and the same graph are used repeatedly. Hence, parsing them once outside of reach avoids unnecessary overhead. We also parse sets outside of reach. As this is done once per reach call it, however, offers no performance advantage in this setting.

library("ciflyr")

table = parseRuletable("./not_amenable_cpdag.txt")

# p stores the number of nodes
p = 4
g = parseGraph(list("-->" = list(c(1, 2), c(3, 2)), "---" = list(c(3, 4))), table)

for (x in seq(p)) {
    sets = parseSets(list("X" = c(x)), table)
    notAmenable = append(reach(g, sets, table), x)
    print(paste0("amenable for ", x, ": ", paste(setdiff(seq(p), notAmenable), collapse = ", ")))
}

Conversion to the ciflyr graph format

The ciflyr graph format is based on edge lists represented by $2 \times m$ matrices. Graphs can usually be converted to this format with little effort. For example, consider a directed graph in igraph:

# construct graph
g <- igraph::make_graph(edges = c(1, 2, 2, 3), n = 3, directed = TRUE)

# convert to ciflyr format
ciflyGraph <- list("-->" = as_edgelist(g))
print(ciflyGraph)

This assumes that the directed edges are referred to by the string "-->" in the rule tables.

As another example, consider graphs in the graph package, such as the graphNEL-class:

# construct graph (node names are required for graphNEL)
nodeNames <- LETTERS[1:3]
adjList <- vector("list", length=3)
names(adjList) <- nodeNames
adjList[[1]] <- list(edges=2)
adjList[[2]] <- list(edges=3)
g <- graph::graphNEL(nodes=nodeNames, edgeL=adjList, edgemode="directed")

# convert to ciflyr format
ciflyGraph <- list("-->" = t(graph::edgeMatrix(g)))
print(ciflyGraph)

Note that the node names are not part of the ciflyr format and have to be stored separately.

Adjacency matrices can be converted just as easily. We use the same example graph and offer a function that works for both possible matrix orientations, i --> j if g[i, j] = 1 or if g[j, i], as specified by the edgeDirection argument.

# function for conversion to ciflyr graph format
adjacencymatrix2cifly <- function(g, edgeDirection) {
    edges <- which(g == 1, arr.ind = TRUE)
    if (edgeDirection == "from row to column") {
        return (list("-->" = edges))
    } else if (edgeDirection == "from column to row") {
        return (list("<--" = edges))
    } else {
        stop("Invalid edgeDirection")
    }
}

# construct graph
g <- matrix(c(0, 0, 0, 1, 0, 0, 0, 1, 0), nrow = 3, ncol = 3)

# convert to ciflyr format using the option "from row to column"
# thus, the matrix orientation is specified as i --> j if g[i, j] = 1
# conversely, "from column to row" would imply i --> j if g[j, i] = 1
ciflyGraph <- adjacencymatrix2cifly(g, "from row to column")
print(ciflyGraph)