All content in this area was uploaded by Thomas E. Portegys on May 30, 2016

Graph isomorphism testing boosted by path coloring

Thomas E. Portegys, Dialectek, portegys@gmail.com

Abstract

A method for improving the efficiency of graph isomorphism testing is presented. The

method uses the structure of the graph colored by vertex hash codes as a means of

partitioning vertices into equivalence classes, which in turn reduces the combinatorial

burden of isomorphism testing. Unrolling the graph into a tree at each vertex allows

structurally different regular graphs to be discriminated, a capability that the color

refinement algorithm lacks.

Key words: graph isomorphism, graph hashing, color refinement, vertex partitioning,

equivalence classes.

Introduction

Numerous uses can be found for unique and concise graph identifiers: graphs could then

be counted, sorted, compared and verified more easily. For example, chemical compounds

could be specified by identifying their constituent molecules represented by graphs of

spatial and bonding relationships between atoms. However, a problem with developing a

method for identifying graphs is that graphs are very general objects. Uniquely identifying

vertices and edges solves the problem but begs the question, since the problem then

becomes how to arrive at these identifiers in a uniform fashion (Sayers and Karp, 2004).

A method developed by Portegys (2008) identifies vertices by computing an MD5 hash

(Rivest, 1992) for a tree of nodes rooted at each vertex. A vertex tree is composed by

unrolling reachable vertices. Once each vertex is hashed, the vertex hashes are sorted and

hashed to yield a hash for the graph, a technique similar to that used by Melnik and Dunham

(2001) and Bhat (1980).

The vertex hashing can be seen as a coloring process, along the lines of the well-known

color refinement algorithm (Arvind et al., 2015; Grohe et al., 2014). Given a graph G, the

color refinement algorithm (or naive vertex classification) iteratively computes a sequence

of colorings C_i of V(G). The initial coloring C_0 is uniform.

Then, C_{i+1}(u) = {{ C_i(a) : a ∈ N(u) }}, (1)

where N(u) is the set of neighbors of u and {{. . .}} is a multiset operator. Note that

C_1(u) = C_1(v) iff the two vertices have the same degree.

Thus the coloring begins with a uniform coloring of the vertices of the graph and refines it

step by step so that, if two vertices have equal colors but differently colored neighborhoods

(with the multiplicities of colors counted), then these vertices get new different colors in

the next refinement step. The algorithm terminates as soon as no further refinement is

possible.
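
The refinement loop can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not code from the paper: the Graph alias, the function name colorRefinementEquivalent, and the choice to refine the disjoint union of the two graphs (so that color ids are comparable between them) are all introduced here for the example. Graphs are assumed to be simple, undirected, and given as adjacency lists.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical representation: adj[u] lists the neighbours of vertex u.
using Graph = std::vector<std::vector<int>>;

// Returns true if color refinement assigns the same multiset of colors
// to g and h, i.e. if it cannot tell the two graphs apart.
bool colorRefinementEquivalent(const Graph& g, const Graph& h)
{
    // Refine the disjoint union so both graphs share one color numbering.
    const int offset = static_cast<int>(g.size());
    Graph u = g;
    for (const auto& nbrs : h) {
        std::vector<int> shifted;
        for (int v : nbrs) shifted.push_back(v + offset);
        u.push_back(shifted);
    }

    std::vector<int> color(u.size(), 0);  // C_0 is uniform
    std::size_t numClasses = 1;
    while (true) {
        // Equation (1): the new color is the multiset of neighbour colors,
        // stored here as a sorted vector.
        std::vector<std::vector<int>> sig(u.size());
        std::map<std::vector<int>, int> ids;
        for (std::size_t v = 0; v < u.size(); ++v) {
            for (int a : u[v]) sig[v].push_back(color[a]);
            std::sort(sig[v].begin(), sig[v].end());
            ids.emplace(sig[v], 0);
        }
        // Number the distinct multisets in sorted order.
        int next = 0;
        for (auto& entry : ids) entry.second = next++;
        for (std::size_t v = 0; v < u.size(); ++v) color[v] = ids[sig[v]];

        // Stop when the number of color classes no longer grows.
        if (ids.size() == numClasses) break;
        numClasses = ids.size();
    }

    // Compare the color multisets of the two halves of the union.
    std::vector<int> cg(color.begin(), color.begin() + offset);
    std::vector<int> ch(color.begin() + offset, color.end());
    std::sort(cg.begin(), cg.end());
    std::sort(ch.begin(), ch.end());
    return cg == ch;
}
```

Called on a 6-cycle and on two disjoint triangles, both 2-regular with six vertices, this sketch returns true: every vertex keeps the same color in every round, so the procedure stops after the first step without separating the graphs.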

Graphs are isomorphic if there is a bijection between their vertices that preserves adjacency (Karp, 1972).

For isomorphism testing of graphs G and H, the color refinement algorithm concludes that

G and H are non-isomorphic if the multisets of colors occurring in these graphs are

different. If this happens, the conclusion is correct. However, not all non-isomorphic

graphs are distinguishable by, or amenable to, color refinement. The simplest example is given

by any two non-isomorphic regular graphs of the same degree with the same number of

vertices, such as those shown in Figure 1.

Figure 1 – Regular non-isomorphic graphs.

Graph isomorphism testing, a problem in NP that is not known to be solvable in polynomial

time nor known to be NP-complete, has recently been the subject of renewed attention

(Babai, 2015). Graph isomorphism is also of practical use in a number of areas, including

mathematical chemistry and electronic design automation. A number of isomorphism testing

algorithms are in use, e.g. the Ullmann (1976) and Schmidt-Druffel (1976) algorithms. The

Ullmann algorithm is one of the most commonly used for graph isomorphism because of its

generality and effectiveness (Cordella et al., 2001).

Graph coloring partitions vertices into equivalence classes of identical colors, which can

reduce the complexity of isomorphism testing significantly. While the color refinement

algorithm uses vertex degree as a shallow means of grouping vertices, using deep vertex

hashing as a coloring method produces a finer discrimination of structure such that

non-isomorphic regular graphs can be differentiated. For example, the hashes for the graphs

depicted in Figure 1 will be different.
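
The effect of partitioning on the search space can be made concrete with a small calculation. The sketch below is an illustration only: the function names are hypothetical, and it counts an upper bound on candidate vertex mappings under the assumption that a vertex may only be mapped to a vertex of the same color. With no coloring, N vertices admit N! candidate mappings; with color classes of sizes n_1, ..., n_k the bound drops to n_1! · ... · n_k!.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// n! as a 64-bit value; overflows for n > 20, which is enough for a demo.
std::uint64_t factorial(unsigned n)
{
    std::uint64_t result = 1;
    for (unsigned k = 2; k <= n; ++k) result *= k;
    return result;
}

// Upper bound on the number of vertex mappings to try when a vertex may
// only be matched with a vertex of the same color: the product of the
// factorials of the color class sizes.
std::uint64_t candidateMappings(const std::vector<int>& colors)
{
    std::map<int, unsigned> classSize;
    for (int c : colors) ++classSize[c];

    std::uint64_t total = 1;
    for (const auto& entry : classSize) total *= factorial(entry.second);
    return total;
}
```

For ten vertices of a single color the bound is 10! = 3628800; if the same vertices split into classes of sizes 2, 3 and 5 it falls to 2! · 3! · 5! = 1440, and if every vertex has its own color only one mapping remains.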

Description

This section describes the hashing algorithm.

Graph format

Using a pseudo-C++ notation, the following define a graph vertex and edge:

struct Edge;

struct Vertex
{
    int label;
    vector<Edge *> edges;   // edges incident to this vertex
};

struct Edge
{
    int label;
    Vertex *source;
    Vertex *target;
    bool directed;
};

This general scheme allows for a number of graph variations: labeled/unlabeled (using null

labels), directed/undirected (for undirected, source and target are synonymous), and

multigraphs.

Algorithm

The following object is used to construct MD5 hash codes based on vertex graph

neighborhoods:

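VertexCoder
{
    Vertex *vertex;                  // vertex encoded by this coder
    vector<Vertex *> vertexBranch;   // vertices on the branch leading to this coder
    unsigned char code[MD5_SIZE];    // resulting MD5 hash code
    void encode(bool hashLabels);
    void expand();
    void contract();
};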

The algorithm iteratively expands each vertex in the graph into a tree of coder objects

representing the vertices and edges in its neighborhood. Branching terminates when a

duplicate vertex appears in a branch, at which point the terminal coder takes on the hashed

value of the distance of the first appearance of the vertex in the branch. The graph hash

code is then constructed by sorting and hashing the vertex codes.

// Encode graph.
// The boolean argument allows labels to be included in the
// hash calculation.
void encode(bool hashLabels)
{
    int i;

    // Unroll the edges of this vertex into child coders.
    if (vertex != NULL)
    {
        expand();
    }

    // Encode each child, then release the child's own subtree.
    int numChildren = children.size();
    for (i = 0; i < numChildren; i++)
    {
        children[i]->coder->encode(hashLabels);
        children[i]->coder->contract();
    }
    sort(children);

    unsigned char *input = new unsigned char[HASH_INPUT_SIZE];
    if (vertex != NULL)
    {
        if (hashLabels)
        {
            append(input, vertex->label);
        }
        if (numChildren > 0)
        {
            // Record each edge: its label and its direction relative to
            // this vertex (1 = outgoing, 0 = incoming, 2 = undirected).
            for (i = 0; i < numChildren; i++)
            {
                Edge *edge = children[i]->edge;
                if (hashLabels)
                {
                    append(input, edge->label);
                }
                if (edge->directed)
                {
                    if (edge->source == vertex)
                    {
                        append(input, 1);
                    }
                    else
                    {
                        append(input, 0);
                    }
                }
                else
                {
                    append(input, 2);
                }
            }
        }
        else
        {
            // Terminal coder: record the 1-based position at which this
            // vertex first appears in the branch.
            for (i = 0; i < vertexBranch.size(); i++)
            {
                if (vertex == vertexBranch[i])
                {
                    break;
                }
            }
            i++;
            append(input, i);
        }
    }

    // Append the child codes and hash the accumulated input.
    for (i = 0; i < numChildren; i++)
    {
        append(input, children[i]->coder->code);
    }
    code = MD5hash(input);
    delete[] input;
}

// Expand coder.
void expand()
{
    int i;
    vector<Vertex *> childVertexBranch;
    Vertex *childVertex;
    VertexCoder *child;

    // Stop if this vertex already appears in the branch (duplicate).
    for (i = 0; i < vertexBranch.size(); i++)
    {
        if (vertex == vertexBranch[i])
        {
            return;
        }
        childVertexBranch.push_back(vertexBranch[i]);
    }
    childVertexBranch.push_back(vertex);

    // Create one child coder per edge, for the vertex at the other end.
    for (i = 0; i < vertex->edges.size(); i++)
    {
        if (vertex == vertex->edges[i]->source)
        {
            childVertex = vertex->edges[i]->target;
        }
        else
        {
            childVertex = vertex->edges[i]->source;
        }
        child = new VertexCoder(childVertex, childVertexBranch);
        children.push_back(child);
    }
}

// Contract coder.
void contract()
{
    int i;

    for (i = 0; i < children.size(); i++)
    {
        delete children[i]->coder;
        delete children[i];
    }
    children.clear();
}
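
The pseudocode above can be condensed into a self-contained sketch for the simplest case of unlabeled, undirected graphs. It is an illustration under several assumptions rather than the published implementation: the graph is an adjacency list (the Graph alias), a 64-bit FNV-1a style mixing function stands in for MD5, the whole tree is built by plain recursion without the expand/contract bookkeeping, and the names mix, encodeVertex, graphHash and both seed constants are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical representation: adj[u] lists the neighbours of vertex u.
using Graph = std::vector<std::vector<int>>;

const std::uint64_t kInternalSeed = 14695981039346656037ull;
const std::uint64_t kTerminalSeed = 0x9e3779b97f4a7c15ull;

// FNV-1a style mixing of one 64-bit word into h (a stand-in for MD5).
std::uint64_t mix(std::uint64_t h, std::uint64_t x)
{
    for (int i = 0; i < 8; ++i) {
        h ^= (x >> (8 * i)) & 0xffu;
        h *= 1099511628211ull;
    }
    return h;
}

// Code for vertex v, given the vertices already on the current branch.
std::uint64_t encodeVertex(const Graph& g, int v, std::vector<int>& branch)
{
    // Terminal: v duplicates an earlier vertex on the branch. Hash the
    // 1-based position of its first appearance.
    for (std::size_t i = 0; i < branch.size(); ++i) {
        if (branch[i] == v) return mix(kTerminalSeed, i + 1);
    }
    // Terminal: nothing to expand and no duplicate.
    if (g[v].empty()) return mix(kTerminalSeed, branch.size() + 1);

    // Expand: one child per incident edge, including the edge back to
    // the parent, which terminates immediately as a duplicate.
    branch.push_back(v);
    std::vector<std::uint64_t> codes;
    for (int child : g[v]) codes.push_back(encodeVertex(g, child, branch));
    branch.pop_back();

    // Sort the child codes so the result is independent of edge order.
    std::sort(codes.begin(), codes.end());
    std::uint64_t h = kInternalSeed;
    for (std::uint64_t c : codes) h = mix(h, c);
    return h;
}

// Graph code: sort the vertex codes and hash them together.
std::uint64_t graphHash(const Graph& g)
{
    std::vector<std::uint64_t> codes;
    for (std::size_t v = 0; v < g.size(); ++v) {
        std::vector<int> branch;
        codes.push_back(encodeVertex(g, static_cast<int>(v), branch));
    }
    std::sort(codes.begin(), codes.end());
    std::uint64_t h = kInternalSeed;
    for (std::uint64_t c : codes) h = mix(h, c);
    return h;
}
```

Because terminal values depend only on positions within a branch and child codes are sorted, relabeling the vertices of a graph leaves graphHash unchanged. A 6-cycle and two disjoint triangles unroll into trees of different depth, so their hashes are expected to differ, barring a collision in the stand-in hash function.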

Since each of N vertices unrolls a tree of potentially N nodes, the algorithm complexity is

O(N^2). A proof of the algorithm's correctness appears challenging; work on one is currently

underway. These initial results are provided with the hope of eliciting further insights.

Example

The method is illustrated through an example. Consider the simple directed graph shown

in Figure 2. The vertices and edges are labeled for illustrative purposes, but the algorithm

works for unlabeled vertices and edges as well as undirected edges.

Figure 2 – A simple directed graph.

Before the first call to encode(), the coder is configured as in Figure 3. This configuration

reveals nothing about the edges in the graph, and thus must always be expanded.

Figure 3 – Initial coder configuration.

After the first expansion, the coder appears as in Figure 4. The (f) and (b) notations on the

edges denote a directed edge in the source graph in the forward and backward direction

respectively. Note that for vertex 0, there are 2 forward edges to vertices 1 and 2. For vertex

1, there is a forward and backward edge to vertex 2, and a backward edge to vertex 0.

Vertex 2 has a forward and backward edge to vertex 1, and a backward edge to vertex 0.

Although Figure 4 shows expanded vertices concurrently, in actuality a vertex is removed

through contraction after its hash value is obtained by its parent. The expansion continues

until each branch reaches a duplicate vertex.

Figure 4 – First expansion.

Figure 5 depicts how the terminal coder values for two branches in the expansion tree are

assigned. On the left branch, the terminal is assigned a value of 1, as it is a duplicate of the

first coder (shown double-bordered), which is 1 distant from the root. Likewise the right

terminal is a duplicate of the second coder and is assigned a value of 2. If the terminal coder

has no duplicate, it is assigned the length of the branch as a value.
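
The terminal value rule can be written as a small stand-alone function. This is a sketch that mirrors the duplicate search in encode(); the function name terminalValue and the use of integer vertex ids are assumptions made for the example. Here branch holds the vertices above the terminal coder, root first.

```cpp
#include <cstddef>
#include <vector>

// Value assigned to a terminal coder for `vertex`: the 1-based position
// of the first appearance of `vertex` in `branch`, or branch.size() + 1
// (the branch length counting the terminal itself) if it does not appear.
int terminalValue(const std::vector<int>& branch, int vertex)
{
    std::size_t i = 0;
    while (i < branch.size() && branch[i] != vertex) {
        ++i;
    }
    return static_cast<int>(i) + 1;
}
```

With a branch of vertices 7, 3, 9, a terminal coder for vertex 7 receives the value 1, one for vertex 3 receives 2, and one for a vertex not on the branch receives 4.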

Figure 5 – Branch terminal coder values.

Results

To highlight the potency of vertex partitioning using a coloring algorithm such as hashing,

a comparison with brute force isomorphism testing is given in Table 1. For each

isomorphism test, a graph with random edge connections and random vertex and edge

labels is generated. Its isomorph is created by adding to each vertex and edge label the

maximum label value of the original graph plus one. The number of search combinations

to test isomorphism was measured. Even for these relatively small graphs, the brute force

method rapidly explodes in complexity, while the hashed method remains remarkably flat.

Vertices x edges    Brute force    Hashed
5x5                 14.1           15.5
5x10                16.2           26.9
10x10               5465.9         29.6
10x20               1593.8         43.2
15x15               5108975        44.2

Table 1 – Isomorphism testing comparison.

The results presented in Portegys (2008) validated the ability of the algorithm to uniquely

hash unique graphs. Here we focused on the ability of the hash algorithm to discriminate

regular graphs in comparison to the color refinement algorithm. A graph generation

package (Johnsonbaugh and Kalin, 1991) was used to generate pairs of regular graphs with

varying numbers of vertices and degrees. Each pair consisted of graphs having equal

numbers of vertices and equal degree. Some pairs were by chance isomorphic and others

were non-isomorphic. The color refinement algorithm, as expected, classified all the pairs

as isomorphic. The hash algorithm correctly distinguished all isomorphic and

non-isomorphic pairs.

Conclusion

A method for boosting the efficiency of graph isomorphism testing has been presented.

The method is able to discriminate graphs that elude the color refinement algorithm. The

method builds on a previously developed technique for identifying graphs using MD5

hashing.

The C++ code can be found here:

http://sourceforge.net/projects/graph-hashing/

References

V. Arvind, J. Köbler, G. Rattan, O. Verbitsky (2015). Graph Isomorphism, Color

Refinement, and Compactness. ECCC TR15-032.

L. Babai (2015). Graph Isomorphism in Quasipolynomial Time.

http://arxiv.org/abs/1512.03547

K. V. S. Bhat (1980). Refined vertex codes and vertex partitioning methodology for

graph isomorphism testing. IEEE Trans. Systems Man Cybernet., 10(10), 610-615.

L. P. Cordella, P. Foggia, C. Sansone, and M. Vento (2001). An Improved

Algorithm for Matching Large Graphs. Proceedings of International

Workshop on Graph-based Representation in Pattern Recognition, Ischia,

Italy, pp. 149 - 159.

M. Grohe, K. Kersting, M. Mladenov, E. Selman (2014). Dimension Reduction via

Colour Refinement. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014. LNCS, vol. 8737, pp.

505-516. Springer, Heidelberg.

R. Johnsonbaugh and M. Kalin (1991). A graph generation software package. Proceedings

of the twenty-second SIGCSE technical symposium on Computer science education. ACM

New York, NY, USA. pp. 151-154.

R. M. Karp (1972). Reducibility among combinatorial problems, in: R. E. Miller and J.

W. Thatcher (Eds.), Complexity of Computer Computations (Plenum, New York) 85-103.

S. Melnik (2001). RDF API draft: Cryptographic digests of RDF models and statements,

http://www-db.stanford.edu/~melnik/rdf/api.html#digest.

T. E. Portegys (2008). General Graph Identification by Hashing.

http://arxiv.org/abs/1512.07263

R. Rivest (1992). RFC 1321 The MD5 Message-Digest Algorithm,

http://tools.ietf.org/html/rfc1321.

C. Sayers and A. H. Karp (2004). RDF Graph Digest Techniques and Potential

Applications. Mobile and Media Systems Laboratory, HP Laboratories Palo Alto,

HPL-2004-95.

D. C. Schmidt and L. E. Druffel (1976). A Fast Backtracking Algorithm to Test Directed

Graphs for Isomorphism Using Distance Matrices. Journal of the Association for

Computing Machinery, 23, pp. 433-445.

J. R. Ullmann (1976). An Algorithm for Subgraph Isomorphism. Journal of the

Association for Computing Machinery, vol. 23, pp. 31-42.