The ciflyr package
In this article, we discuss how CIfly can be used from R with the ciflyr
package. We plan to make it available on CRAN so that it can be installed with install.packages("ciflyr")
. At its core, the package provides a single function reach
that performs a CIfly algorithm for an input graph and input sets according to a specified rule table. As a simple example, consider the code below that tests amenability using the table for finding non-amenable nodes discussed in the previous article introducing rule tables.
library("ciflyr")
# it is assumed that X and Y are disjoint
isAmenable <- function(g, X, Y) {
sets <- list("X" = X)
tablePath <- "./not_amenable_cpdag.txt"
reached <- reach(g, sets, tablePath)
return (!(intersect(reached, Y)))
}
g <- list("-->" = list(c(1, 2), c(3, 2)), "---" = list(c(3, 4)))
X <- c(4)
Y <- c(2)
print(isAmenable(g, X, Y))
Here, reach
is called with a graph g
(as a list mapping edge types to lists of of edges), a list of sets (mapping the set names specified in the rule table to node vectors) and a path to the text file containing the rule table (in this case assuming it lies in the current working directory; for better options, see below). This function returns a vector of nodes, the output of the CIfly algorithm specified by the rule table. In the example above, these are all non-amenable nodes with respect to which we can use to test amenability of a pair and . We now discuss how the arguments should be passed to reach
.
Graph
The graph is represented by a list with each name describing an edge type and containing a matrix whose rows represent the edges of this type. For example, the graph which has edges would be represented as list("-->" = rbind(c(1, 2), c(3, 2)), "---" = rbind(c(3, 4)))
. This is equivalent to list("-->" = rbind(c(1,2)), "<--" = rbind(c(2, 3)), "---" = rbind(c(3,4)))
provided the rule table contains the edge description edges: --> <--, ---
specifying -->
and <--
to be reciprocal.
For the node IDs, we strongly recommend to use the numbers for a graph with nodes. Generally, any positive integers can be used, however, the memory demand of ciflyr
is linear in the largest node ID because of the internal representation of graphs and sets. Hence, unnecessarily large node numbers lead to slower run times and potential memory problems. Non-integer node names have to be stored outside of the ciflyr
graph format, e.g., in a vector indexed by the node IDs.
Sets
The sets are represented by a list as well with the names of the respective node sets as strings mapping to their instantiation represented as vector of positive integers. As an example, list("X" = c(1), "Z" = c(3, 4))
would represent the two sets and . The names of the sets must match the names specified in the rule table.
Rule Table
In its basic version, the rule table is passed to reach
by its file path as string. If used outside of simple scripts, make sure that the rule table file is included within the package and that its path is specified relative to the project root and not the current working directory. Better code compared to the example above would be to use the here
package to achieve this.
library("ciflyr")
library("here")
# path to the file "not_amenable_cpdag.txt" relative to the project root
tablePath = here("ruletables", "not_amenable_cpdag.txt")
# ...
# pass tablePath to cf.reach as third argument
reach(g, sets, tablePath)
If you do not want rule-table text files in your project, it is also possible to directly embed tables into your R code as multi-line strings.
notAmenableTable = "
EDGES --> <--, ---
SETS X
COLORS init, yield
START ... [init] AT X
OUTPUT ... [yield]
... [init] | --- [yield] | next not in X
... [yield] | ---, --> [yield] | next not in X
"
# ...
# pass table as string and set keyword argument tableAsString accordingly
reach(g, sets, notAmenableTable, tableAsString=TRUE)
Output
reach
returns a vector of positive integers, representing the vertices that are reached in an output state (as specified in the rule table) when running the search on the given graph with the provided sets.
Debugging
Currently, the ciflyr
error messages are quite verbose (due to limitations of the R-to-Rust interface) and look roughly as follows:
thread '<unnamed>' panicked at /path/to/ciflyr/src/vendor/extendr-api/src/robj/into_robj.rs:74:13:
called `Result::unwrap()` on an `Err` value: Other("Error: expected a string as ruletable argument.")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace`
The relevant information here is the string "Error: expected a string as ruletable argument."
in the second line. This message gives a hint what went wrong (in this case that a string was expected for the argument passing the rule table).
In case your CIfly algorithms runs without error but returns an unexpected result, it is possible to pass the keyword argument verbose=TRUE
to reach
for debugging purposes. This logs the steps the reachability algorithm takes during its execution.
Advanced Usage
In every reach
call, the implementation needs to parse the graph, sets and rule table. This can introduce unnecessary overhead in the case that the same graph, sets or rule table are used repeatedly. To address this, ciflyr
offers the option to parse these objects outside of the reach
function with the functions parseGraph
, parseSets
and parseRuletable
that, respectively, return a Graph
, Sets
and Ruletable
object.
parseRuletable
can be called as parseRuletable(tablePath)
or parseRuletable(tableString, tableAsString=TRUE)
. The first argument of parseGraph
and parseSets
are also in the same encoding as the respective reach
arguments. However, it is necessary to pass a rule table as second argument (this can be done via file path, as a multi-line string with tableAsString
set to TRUE
or by passing a Ruletable
object). Afterwards, these objects must be used in combination with rule tables that have the same EDGES
, respectively SETS
specification as the passed rule table. All three objects are internal CIfly representations and can only be passed as their respective arguments into reach
. We offer no further methods on these classes.
As an example, consider the code below that computes all amenable nodes for every possible . Here, the same rule table and the same graph are used repeatedly. Hence, parsing them once outside of reach
avoids unnecessary overhead. We also parse sets
outside of reach
. As this is done once per reach
call it, however, offers no performance advantage in this setting.
library("ciflyr")
table = parseRuletable("./not_amenable_cpdag.txt")
# p stores the number of nodes
p = 4
g = parseGraph(list("-->" = list(c(1, 2), c(3, 2)), "---" = list(c(3, 4))), table)
for (x in seq(p)) {
sets = parseSets(list("X" = c(x)), table)
notAmenable = append(reach(g, sets, table), x)
print(paste0("amenable for ", x, ": ", paste(setdiff(seq(p), notAmenable), collapse = ", ")))
}
Conversion to the ciflyr graph format
The ciflyr
graph format is based on edge lists represented by matrices. Graphs can usually be converted to this format with little effort. For example, consider a directed graph in igraph
:
# construct graph
g <- igraph::make_graph(edges = c(1, 2, 2, 3), n = 3, directed = TRUE)
# convert to ciflyr format
ciflyGraph <- list("-->" = as_edgelist(g))
print(ciflyGraph)
This assumes that the directed edges are referred to by the string "-->"
in the rule tables.
As another example, consider graphs in the graph
package, such as the graphNEL-class:
# construct graph (node names are required for graphNEL)
nodeNames <- LETTERS[1:3]
adjList <- vector("list", length=3)
names(adjList) <- nodeNames
adjList[[1]] <- list(edges=2)
adjList[[2]] <- list(edges=3)
g <- graph::graphNEL(nodes=nodeNames, edgeL=adjList, edgemode="directed")
# convert to ciflyr format
ciflyGraph <- list("-->" = t(graph::edgeMatrix(g)))
print(ciflyGraph)
Note that the node names are not part of the ciflyr
format and have to be stored separately.
Adjacency matrices can be converted just as easily. We use the same example graph and offer a function that works for both possible matrix orientations, i --> j
if g[i, j] = 1
or if g[j, i]
, as specified by the edgeDirection
argument.
# function for conversion to ciflyr graph format
adjacencymatrix2cifly <- function(g, edgeDirection) {
edges <- which(g == 1, arr.ind = TRUE)
if (edgeDirection == "from row to column") {
return (list("-->" = edges))
} else if (edgeDirection == "from column to row") {
return (list("<--" = edges))
} else {
stop("Invalid edgeDirection")
}
}
# construct graph
g <- matrix(c(0, 0, 0, 1, 0, 0, 0, 1, 0), nrow = 3, ncol = 3)
# convert to ciflyr format using the option "from row to column"
# thus, the matrix orientation is specified as i --> j if g[i, j] = 1
# conversely, "from column to row" would imply i --> j if g[j, i] = 1
ciflyGraph <- adjacencymatrix2cifly(g, "from row to column")
print(ciflyGraph)