
# Graph isomorphism testing boosted by path coloring


## Abstract

A method for improving the efficiency of graph isomorphism testing is presented. The method uses the structure of the graph colored by vertex hash codes as a means of partitioning vertices into equivalence classes, which in turn reduces the combinatorial burden of isomorphism testing. Unrolling the graph into a tree at each vertex allows structurally different regular graphs to be discriminated, a capability that the color refinement algorithm lacks.
Thomas E. Portegys, Dialectek, portegys@gmail.com
Key words: graph isomorphism, graph hashing, color refinement, vertex partitioning,
equivalence classes.
Introduction
Numerous uses can be found for unique and concise graph identifiers: graphs could then
be counted, sorted, compared and verified more easily. For example, chemical compounds
could be specified by identifying their constituent molecules represented by graphs of
spatial and bonding relationships between atoms. However, a problem with developing a
method for identifying graphs is that graphs are very general objects. Uniquely identifying
vertices and edges solves the problem but begs the question, since the problem then
becomes how to arrive at these identifiers in a uniform fashion (Sayers and Karp, 2004).
A method developed by Portegys (2008) identifies vertices by computing an MD5 hash
(Rivest, 1992) for a tree of nodes rooted at each vertex. A vertex tree is composed by
unrolling reachable vertices. Once each vertex is hashed, the vertex hashes are sorted and
hashed to yield a hash for the graph, a technique similar to that used by Melnik and Dunham
(2001) and Bhat (1980).
The vertex hashing can be seen as a coloring process, along the lines of the well-known
color refinement algorithm (Arvind et al., 2015; Grohe et al., 2014). Given a graph G, the
color refinement algorithm (or naive vertex classification) iteratively computes a sequence
of colorings $C_i$ of $V(G)$. The initial coloring $C_0$ is uniform. Then,
$$C_{i+1}(u) = \{\!\{\, C_i(a) : a \in N(u) \,\}\!\}, \tag{1}$$
where $\{\!\{ \ldots \}\!\}$ is a multiset operator and $N(u)$ is the set of neighbors of $u$.
Note that $C_1(u) = C_1(v)$ iff the two vertices have the same degree.
Thus the coloring begins with a uniform coloring of the vertices of the graph and refines it
step by step so that, if two vertices have equal colors but differently colored neighborhoods
(with the multiplicities of colors counted), then these vertices get new different colors in
the next refinement step. The algorithm terminates as soon as no further refinement is
possible.
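The refinement loop of Eq. (1) can be sketched in C++ as follows. This is a minimal illustration rather than a reference implementation: the names `Graph`, `addEdge`, `refineColors` and `refinementEquivalent` are hypothetical, colors are plain integers, and two graphs are compared by refining their disjoint union so that color names are shared between them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Adjacency lists: g[u] holds the neighbors of vertex u.
using Graph = std::vector<std::vector<int>>;

// Helper (hypothetical name): adds the undirected edge {a, b}.
void addEdge(Graph& g, int a, int b) {
    assert(a >= 0 && b >= 0);
    assert(static_cast<std::size_t>(a) < g.size());
    assert(static_cast<std::size_t>(b) < g.size());
    g[a].push_back(b);
    g[b].push_back(a);
}

// Refines colors per Eq. (1) until the number of color classes stops growing.
std::vector<int> refineColors(const Graph& g) {
    std::vector<int> color(g.size(), 0);  // C_0 is uniform
    std::size_t numClasses = g.empty() ? 0 : 1;
    while (true) {
        // Signature of u: sorted multiset of its neighbors' current colors.
        std::vector<std::vector<int>> sig(g.size());
        for (std::size_t u = 0; u < g.size(); ++u) {
            for (int a : g[u]) sig[u].push_back(color[a]);
            std::sort(sig[u].begin(), sig[u].end());
        }
        // Rename the distinct signatures to consecutive integers.
        std::map<std::vector<int>, int> ids;
        for (const auto& s : sig) ids.emplace(s, 0);
        int next = 0;
        for (auto& kv : ids) kv.second = next++;
        for (std::size_t u = 0; u < g.size(); ++u) color[u] = ids[sig[u]];
        if (ids.size() == numClasses) break;  // no further refinement
        numClasses = ids.size();
    }
    return color;
}

// Returns true when color refinement cannot tell g and h apart.
bool refinementEquivalent(const Graph& g, const Graph& h) {
    if (g.size() != h.size()) return false;
    Graph u = g;  // disjoint union of g and h
    const int offset = static_cast<int>(g.size());
    for (const auto& nbrs : h) {
        u.emplace_back();
        for (int a : nbrs) u.back().push_back(a + offset);
    }
    std::vector<int> c = refineColors(u);
    std::vector<int> cg(c.begin(), c.begin() + offset);
    std::vector<int> ch(c.begin() + offset, c.end());
    std::sort(cg.begin(), cg.end());
    std::sort(ch.begin(), ch.end());
    return cg == ch;
}
```

Applied to a 6-cycle and to two disjoint triangles (a standard pair of 2-regular graphs chosen here for illustration), every vertex keeps the initial color, so `refinementEquivalent` returns true even though the graphs are not isomorphic.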
Two graphs are isomorphic if there is a one-to-one mapping between their vertices that preserves adjacency (Karp, 1972).
For isomorphism testing of graphs G and H, the color refinement algorithm concludes that
G and H are non-isomorphic if the multisets of colors occurring in these graphs are
different. If this happens, the conclusion is correct. However, not all non-isomorphic
graphs are distinguishable by, or amenable to, color refinement. The simplest example is given
by any two non-isomorphic regular graphs of the same degree with the same number of
vertices, such as those shown in Figure 1.
Figure 1 Regular non-isomorphic graphs.
Graph isomorphism testing, a problem in NP that is not known to be solvable in polynomial
time and is not known to be NP-complete, has recently been the subject of renewed attention
(Babai, 2015). Graph isomorphism is also of practical use in a number of areas, including
mathematical chemistry and electronic design automation. A number of isomorphism testing
algorithms are in use, e.g. the Ullmann (1976) and Schmidt-Druffel (1976) algorithms. The
Ullmann algorithm is one of the most commonly used for graph isomorphism because of its
generality and effectiveness (Cordella et al., 2001).
Graph coloring partitions vertices into equivalence classes of identical colors, which can
reduce the complexity of isomorphism testing significantly. While the color refinement
algorithm uses vertex degree as a shallow means of grouping vertices, using deep vertex
hashing as a coloring method produces a finer discrimination of structure such that non-
isomorphic regular graphs can be differentiated. For example, the hashes for the graphs
depicted in Figure 1 will be different.
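The saving from partitioning can be made concrete with a small count. An unconstrained search over N vertices faces up to N! candidate vertex mappings, whereas a search that only maps vertices to vertices of the same color faces at most the product of the factorials of the class sizes. The sketch below computes that bound; the function names are hypothetical, and the bound is an illustration rather than the search measure reported later in the paper.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// n! as a 64-bit integer; only valid for n <= 20.
std::uint64_t factorial(unsigned n) {
    assert(n <= 20);
    std::uint64_t f = 1;
    for (unsigned k = 2; k <= n; ++k) f *= k;
    return f;
}

// Upper bound on the vertex bijections a search must consider when only
// vertices of equal color may be mapped to each other.
// colors[v] is the color (equivalence class id) of vertex v.
// Intended for small inputs; the product is not checked for overflow.
std::uint64_t colorPreservingMappings(const std::vector<int>& colors) {
    std::map<int, unsigned> classSize;
    for (int c : colors) ++classSize[c];
    std::uint64_t total = 1;
    for (const auto& kv : classSize) total *= factorial(kv.second);
    return total;
}
```

For ten vertices split into classes of sizes 4, 3, 2 and 1, the bound is 4! x 3! x 2! x 1! = 288, compared with 10! = 3628800 for an uncolored graph. When all vertices share one color, as color refinement produces for a regular graph, the bound stays at N!.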
Description
This section describes the hashing algorithm.
Graph format
Using a pseudo-C++ notation, the following define a graph vertex and edge:
Vertex
{
    int label;
    vector<Edge *> edges;
};
Edge
{
    int label;
    Vertex *source;
    Vertex *target;
    bool directed;
};
This general scheme allows for a number of graph variations: labeled/unlabeled (using null
labels), directed/undirected (for undirected, source and target are synonymous), and
multigraphs.
Algorithm
The following object is used to construct MD5 hash codes based on vertex graph
neighborhoods:
VertexCoder
{
    Vertex *vertex;
    vector<Vertex *> vertexBranch;
    unsigned char code[MD5_SIZE];
    void encode(bool hashLabels);
    void expand();
    void contract();
};
The algorithm iteratively expands each vertex in the graph into a tree of coder objects
representing the vertices and edges in its neighborhood. Branching terminates when a
duplicate vertex appears in a branch, at which point the terminal coder takes on the hashed
value of the distance of the first appearance of the vertex in the branch. The graph hash
code is then constructed by sorting and hashing the vertex codes.
// Encode graph.
// The boolean argument allows labels to be included in the
// hash calculation.
void encode(bool hashLabels)
{
    if (vertex != NULL)
    {
        expand();
    }
    int numChildren = children.size();
    for (i = 0; i < numChildren; i++)
    {
        children[i]->coder->encode(hashLabels);
        children[i]->coder->contract();
    }
    sort(children);
    input = new unsigned char[HASH_INPUT_SIZE];
    if (vertex != NULL)
    {
        if (hashLabels)
        {
            append(input, vertex->label);
        }
        if (numChildren > 0)
        {
            for (i = 0; i < numChildren; i++)
            {
                edge = children[i]->edge;
                if (hashLabels)
                {
                    append(input, edge->label);
                }
                if (edge->directed)
                {
                    if (edge->source == vertex)
                    {
                        append(input, 1);
                    }
                    else
                    {
                        append(input, 0);
                    }
                }
                else
                {
                    append(input, 2);
                }
            }
        }
        else
        {
            for (i = 0; i < vertexBranch.size(); i++)
            {
                if (vertex == vertexBranch[i])
                {
                    break;
                }
            }
            i++;
            append(input, i);
        }
    }
    for (i = 0; i < numChildren; i++)
    {
        append(input, children[i]->coder->code);
    }
    code = MD5hash(input);
}
// Expand coder.
void expand()
{
    vector<Vertex *> childVertexBranch;
    Vertex *childVertex;
    VertexCoder *child;
    for (i = 0; i < vertexBranch.size(); i++)
    {
        if (vertex == vertexBranch[i])
        {
            return;
        }
        childVertexBranch.push_back(vertexBranch[i]);
    }
    childVertexBranch.push_back(vertex);
    for (i = 0; i < vertex->edges.size(); i++)
    {
        if (vertex == vertex->edges[i]->source)
        {
            childVertex = vertex->edges[i]->target;
        }
        else
        {
            childVertex = vertex->edges[i]->source;
        }
        child = new VertexCoder(childVertex, childVertexBranch);
        children.push_back(child);
    }
}
// Contract coder.
void contract()
{
    for (i = 0; i < children.size(); i++)
    {
        delete children[i]->coder;
        delete children[i];
    }
    children.clear();
}
Since each of N vertices unrolls a tree of potentially N nodes, the algorithm complexity is
$O(N^2)$. A proof of the algorithm appears challenging, but is currently underway. These
initial results are provided with the hope of eliciting further insights.
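The expand, encode and contract cycle described in this section can be condensed into a short runnable sketch for unlabeled, undirected graphs. Several simplifications are assumptions of the sketch and not part of the method as published: a 64-bit FNV-1a hash stands in for MD5, vertices are integer indices into adjacency lists, recursion replaces the explicit coder objects, and the names `hashWords`, `encodeVertex`, `vertexCodes` and `graphCode` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adjacency lists: g[u] holds the neighbors of vertex u (undirected graph).
using Graph = std::vector<std::vector<int>>;

// Helper (hypothetical name): adds the undirected edge {a, b}.
void addEdge(Graph& g, int a, int b) {
    assert(a >= 0 && static_cast<std::size_t>(a) < g.size());
    assert(b >= 0 && static_cast<std::size_t>(b) < g.size());
    g[a].push_back(b);
    g[b].push_back(a);
}

// FNV-1a over a sequence of 64-bit words; stands in for MD5 in this sketch.
std::uint64_t hashWords(const std::vector<std::uint64_t>& words) {
    std::uint64_t h = 14695981039346656037ULL;
    for (std::uint64_t w : words) {
        for (int b = 0; b < 8; ++b) {
            h ^= (w >> (8 * b)) & 0xffULL;
            h *= 1099511628211ULL;
        }
    }
    return h;
}

// Code of the coder for vertex v, where branch is the path from the tree
// root down to v's parent.
std::uint64_t encodeVertex(const Graph& g, int v, std::vector<int>& branch) {
    // Terminal coder: v already occurs in the branch, or v has no edges.
    // Its value is the 1-based position of v's first appearance
    // (branch.size() + 1 when v does not appear).
    auto it = std::find(branch.begin(), branch.end(), v);
    if (it != branch.end() || g[v].empty()) {
        std::uint64_t pos = static_cast<std::uint64_t>(it - branch.begin()) + 1;
        return hashWords({pos});
    }
    branch.push_back(v);  // expand
    std::vector<std::uint64_t> childCodes;
    for (int a : g[v]) childCodes.push_back(encodeVertex(g, a, branch));
    branch.pop_back();    // contract
    std::sort(childCodes.begin(), childCodes.end());
    // One undirected-edge marker (2) per child, then the sorted child codes.
    std::vector<std::uint64_t> input(childCodes.size(), 2);
    input.insert(input.end(), childCodes.begin(), childCodes.end());
    return hashWords(input);
}

// Sorted per-vertex codes, i.e. the vertex colors.
std::vector<std::uint64_t> vertexCodes(const Graph& g) {
    std::vector<std::uint64_t> codes;
    std::vector<int> branch;
    for (int v = 0; v < static_cast<int>(g.size()); ++v)
        codes.push_back(encodeVertex(g, v, branch));
    std::sort(codes.begin(), codes.end());
    return codes;
}

// Graph code: hash of the sorted vertex codes.
std::uint64_t graphCode(const Graph& g) { return hashWords(vertexCodes(g)); }
```

Because child codes are sorted before hashing and terminal values depend only on positions within a branch, relabeling the vertices leaves `graphCode` unchanged. A 6-cycle and two disjoint triangles, which color refinement cannot separate, unroll into trees of different depth and therefore receive different codes. The sketch recomputes every subtree from scratch and is meant only for small graphs.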
Example
The method is illustrated through an example. Consider the simple directed graph shown
in Figure 2. The vertices and edges are labeled for illustrative purposes, but the algorithm
works for unlabeled vertices and edges as well as undirected edges.
Figure 2 A simple directed graph.
Before the first call to encode(), the coder is configured as in Figure 3. This configuration
reveals nothing about the edges in the graph, and thus must always be expanded.
Figure 3 Initial coder configuration.
After the first expansion, the coder appears as in Figure 4. The (f) and (b) notations on the
edges represent a directed edge in the source graph in the forward and backward direction,
respectively. Note that for vertex 0, there are 2 forward edges to vertices 1 and 2. For vertex
1, there is a forward and backward edge to vertex 2, and a backward edge to vertex 0.
Vertex 2 has a forward and backward edge to vertex 1, and a backward edge to vertex 0.
Although Figure 4 shows expanded vertices concurrently, in actuality a vertex is removed
through contraction after its hash value is obtained by its parent. The expansion continues
until each branch reaches a duplicate vertex.
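The forward and backward annotations of the first expansion can be reproduced with a few lines of C++. The edge set used in the usage note (0 to 1, 0 to 2, 1 to 2, 2 to 1) is inferred from the description above rather than read from the figure, and the names `DirectedEdge` and `firstExpansion` are hypothetical. The flag values follow encode(): 1 when the expanded vertex is the edge source, 0 when it is the target.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct DirectedEdge {
    int source;
    int target;
};

// For vertex v, lists (neighbor, flag) for every incident edge, where flag is
// 1 for a forward edge (v is the source) and 0 for a backward edge (v is the
// target). The result is sorted so that it does not depend on edge order.
std::vector<std::pair<int, int>> firstExpansion(
        const std::vector<DirectedEdge>& edges, int v) {
    std::vector<std::pair<int, int>> out;
    for (const auto& e : edges) {
        if (e.source == v) {
            out.emplace_back(e.target, 1);
        } else if (e.target == v) {
            out.emplace_back(e.source, 0);
        }
    }
    std::sort(out.begin(), out.end());
    return out;
}
```

With the inferred edge set, vertex 0 yields (1, forward) and (2, forward), while vertex 1 yields (0, backward), (2, backward) and (2, forward), matching the description of the first expansion.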
Figure 4 First expansion.
Figure 5 depicts how the terminal coder values for two branches in the expansion tree are
assigned. On the left branch, the terminal is assigned a value of 1, as it is a duplicate of the
first coder (shown double-bordered), which is at distance 1 from the root. Likewise, the right
terminal is a duplicate of the second coder and is assigned a value of 2. If the terminal coder
has no duplicate, it is assigned the length of the branch as a value.
Figure 5 Branch terminal coder values.
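The terminal value rule amounts to a linear scan of the branch, as in the final loop of encode(). A minimal sketch, assuming vertices are identified by integer ids and using the hypothetical name `terminalValue`:

```cpp
#include <cstddef>
#include <vector>

// Value assigned to a terminal coder: the 1-based position at which `vertex`
// first appears in `branch` (the path from the root to the terminal's parent),
// or branch.size() + 1 when it does not appear at all.
std::size_t terminalValue(const std::vector<int>& branch, int vertex) {
    std::size_t i = 0;
    while (i < branch.size() && branch[i] != vertex) {
        ++i;
    }
    return i + 1;
}
```

For a branch holding vertices 0 and 1, a terminal that repeats vertex 0 receives the value 1 and a terminal that repeats vertex 1 receives the value 2, as in the two branches described above.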
Results
To highlight the potency of vertex partitioning using a coloring algorithm such as hashing,
a comparison with brute force isomorphism testing is given in Table 1. For each
isomorphism test, a graph with random edge connections and random vertex and edge
labels is generated. Its isomorph is created by adding to each vertex and edge label the
maximum label value of the original graph plus one. The number of search combinations
needed to test isomorphism was measured. Even for these relatively small graphs, the cost of
the brute force method explodes (from 14.1 combinations at 5x5 to 5108975 at 15x15), while
the hashed method grows only from 15.5 to 44.2.
| Vertices x edges | Brute force | Hashed |
|------------------|-------------|--------|
| 5x5              | 14.1        | 15.5   |
| 5x10             | 16.2        | 26.9   |
| 10x10            | 5465.9      | 29.6   |
| 10x20            | 1593.8      | 43.2   |
| 15x15            | 5108975     | 44.2   |

Table 1 Isomorphism testing comparison.
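As a point of reference for the brute force column, an exhaustive search over vertex permutations can be sketched as below. The counting convention (one count per permutation examined) is an assumption of this sketch, since the text does not specify how search combinations were tallied, and the sketch ignores labels and edge direction. The names `Matrix`, `makeMatrix`, `SearchResult` and `bruteForceIsomorphism` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Adjacency matrix of an undirected graph: m[a][b] == 1 iff a and b are adjacent.
using Matrix = std::vector<std::vector<int>>;

// Helper (hypothetical name): builds the matrix of an n-vertex graph from an edge list.
Matrix makeMatrix(std::size_t n, const std::vector<std::pair<int, int>>& edges) {
    Matrix m(n, std::vector<int>(n, 0));
    for (const auto& e : edges) {
        assert(e.first >= 0 && static_cast<std::size_t>(e.first) < n);
        assert(e.second >= 0 && static_cast<std::size_t>(e.second) < n);
        m[e.first][e.second] = 1;
        m[e.second][e.first] = 1;
    }
    return m;
}

struct SearchResult {
    bool isomorphic;      // true if some permutation maps g onto h
    std::uint64_t tried;  // number of permutations examined
};

// Examines vertex permutations in lexicographic order until one maps g onto h.
SearchResult bruteForceIsomorphism(const Matrix& g, const Matrix& h) {
    const std::size_t n = g.size();
    if (h.size() != n) return {false, 0};
    std::vector<std::size_t> perm(n);
    std::iota(perm.begin(), perm.end(), 0);
    std::uint64_t tried = 0;
    do {
        ++tried;
        bool ok = true;
        for (std::size_t a = 0; a < n && ok; ++a)
            for (std::size_t b = 0; b < n && ok; ++b)
                ok = (g[a][b] == h[perm[a]][perm[b]]);
        if (ok) return {true, tried};
    } while (std::next_permutation(perm.begin(), perm.end()));
    return {false, tried};
}
```

For a pair of non-isomorphic graphs on six vertices, the search examines all 6! = 720 permutations before giving up. Restricting the permutations to those that map each vertex to a vertex with the same hash code is what keeps the hashed column small.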
The results presented in Portegys (2008) validated the ability of the algorithm to uniquely
hash unique graphs. Here we focused on the ability of the hash algorithm to discriminate
regular graphs in comparison to the color refinement algorithm. A graph generation
package (Johnsonbaugh and Kalin, 1991) was used to generate pairs of regular graphs with
varying numbers of vertices and degrees. The two graphs in each pair had the same number
of vertices and the same degree. Some pairs were by chance isomorphic and others
were non-isomorphic. The color refinement algorithm, as expected, could not tell any pair
apart and so classified all the pairs as isomorphic. The hash algorithm correctly
distinguished the isomorphic pairs from the non-isomorphic pairs.
Conclusion
A method for boosting the efficiency of graph isomorphism testing has been presented.
The method is able to discriminate graphs that elude the color refinement algorithm. The
method builds on a previously developed technique for identifying graphs using MD5
hashing.
The C++ code can be found here:
http://sourceforge.net/projects/graph-hashing/
References
V. Arvind, J. Köbler, G. Rattan, O. Verbitsky (2015). Graph Isomorphism, Color
Refinement, and Compactness. ECCC TR15-032.
L. Babai (2015). Graph Isomorphism in Quasipolynomial Time.
http://arxiv.org/abs/1512.03547
K. V. S. Bhat (1980). Refined vertex codes and vertex partitioning methodology for
graph isomorphism testing. IEEE Trans. Systems Man Cybernet., 10(10), pp. 610-615.
L. P. Cordella, P. Foggia, C. Sansone, and M. Vento (2001). An Improved
Algorithm for Matching Large Graphs. Proceedings of International
Workshop on Graph-based Representation in Pattern Recognition, Ischia,
Italy, pp. 149 - 159.
M. Grohe, K. Kersting, M. Mladenov, E. Selman (2014). Dimension Reduction via
Colour Refinement. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014. LNCS, vol. 8737, pp.
505-516. Springer, Heidelberg.
R. Johnsonbaugh and M. Kalin (1991). A graph generation software package. Proceedings
of the twenty-second SIGCSE technical symposium on Computer science education. ACM
New York, NY, USA. pp. 151-154.
R. M. Karp (1972). Reducibility among combinatorial problems, in: R. E. Miller and J.
W. Thatcher (Eds.), Complexity of Computer Computations (Plenum, New York) 85-103.
S. Melnik (2001). RDF API draft: Cryptographic digests of RDF models and statements,
http://www-db.stanford.edu/~melnik/rdf/api.html#digest.
T. E. Portegys (2008). General Graph Identification by Hashing.
http://arxiv.org/abs/1512.07263
R. Rivest (1992). RFC 1321 The MD5 Message-Digest Algorithm,
http://tools.ietf.org/html/rfc1321.
C. Sayers and A. H. Karp (2004). RDF Graph Digest Techniques and Potential
Applications. Mobile and Media Systems Laboratory, HP Laboratories Palo Alto, HPL-
2004-95.
D. C. Schmidt and L. E. Druffel (1976). A Fast Backtracking Algorithm to Test Directed
Graphs for Isomorphism Using Distance Matrices. Journal of the Association for
Computing Machinery, 23, pp. 433-445.
J. R. Ullmann (1976). An Algorithm for Subgraph Isomorphism. Journal of the
Association for Computing Machinery, vol. 23, pp. 31-42.