CCCG 2023, Montreal, QC, Canada, July 31 – August 4, 2023
Graph Mover's Distance: An Efficiently Computable Distance Measure for Geometric Graphs
Sushovan Majhi*
Abstract
Many applications in pattern recognition represent patterns as a geometric graph. The geometric graph distance (GGD) has recently been studied in [13] as a meaningful measure of similarity between two geometric graphs. Since computing the GGD is known to be NP-hard, the distance measure is an impractical choice for applications. As a computationally tractable alternative, we propose in this paper the Graph Mover's Distance (GMD), which is formulated as an instance of the earth mover's distance. Computing the GMD between two geometric graphs with at most $n$ vertices takes only $O(n^3)$ time. Alongside studying the metric properties of the GMD, we investigate the stability of the GGD and the GMD. The GMD also shows promising empirical performance in recognizing letter drawings from the LETTER dataset [18].
1 Introduction
Graphs are a widely accepted tool for the structural representation of patterns involving relational properties. While hierarchical patterns are commonly reduced to a string [7] or a tree representation [6], non-hierarchical patterns generally require a graph representation. Pattern recognition in such a representation then requires quantifying the (dis-)similarity between a query graph and a model or prototype graph. Defining a relevant distance measure for a class of graphs has been studied for almost five decades, and it has a myriad of applications, including chemical structure matching [21], fingerprint matching [16], face identification [11], and symbol recognition [12].
Depending on the class of graphs of interest and the area of application, several methods have been proposed. Graph isomorphisms [5] or subgraph isomorphisms can be considered.
These, however, cannot cope with (sometimes minor) local and structural deformations of the two graphs. To address this issue, several alternative distance measures have been studied; we particularly mention the edit distance [20, 9] and the inexact matching distance [3].
* School of Information, University of California, Berkeley, USA, smajhi@berkeley.edu
Although these distance measures have been well tested for attributed graphs (i.e., combinatorial graphs with finite label sets), their formulations seem inadequate for providing meaningful similarity measures for geometric graphs.
A geometric graph belongs to a special class of attributed graphs that have an embedding into a Euclidean space $\mathbb{R}^d$, where the vertex labels are inferred from the Euclidean locations of the vertices and the edge labels are the Euclidean lengths of the edges.
In the last decade, practical applications involving the comparison of geometric graphs have grown, such as road-network or map comparison [1] and the detection of chemical structures using their spatial bonding geometry. In addition, large datasets like [18] are being curated by the pattern recognition and machine learning communities.
1.1 Related Work and Our Contribution
We are inspired by the recently developed geometric graph distance (GGD) [4, 13]. Although the GGD succeeds in being a relevant distance measure for geometric graphs, its computation, unfortunately, is known to be NP-hard. Our motivation stems from applications that demand an efficiently computable measure of similarity for geometric graphs. The formulation of our graph mover's distance (GMD) builds on the theoretical underpinning of the GGD. The GMD provides a meaningful yet computationally efficient similarity measure between two geometric graphs.
In Section 2, we revisit the definition of the GGD and investigate its stability under Hausdorff perturbation. Section 3 is devoted to the study of the GMD, which we show to be a pseudo-metric on the class of (ordered) geometric graphs. Finally, we apply the GMD to classify letter drawings in Section 4. Our experiment involves matching each of 2250 test drawings, modeled as geometric graphs, to 15 prototype letters from the English alphabet. For the drawings with LOW distortion, the correct letter is found among the top 3 matches at a rate of 98.93%, whereas the benchmark accuracy of 99.6% is obtained using a k-nearest neighbor (k-NN) classifier with the graph edit distance [3].
35th Canadian Conference on Computational Geometry, 2022
2 Geometric Graph Distance (GGD)
We first formally define a geometric graph. Throughout the paper, the dimension of the ambient Euclidean space is denoted by $d\ge 1$. We also assume that the cost coefficients $C_V$ and $C_E$ are positive constants.
Definition 2.1 (Geometric Graph) A geometric graph of $\mathbb{R}^d$ is a (finite) combinatorial graph $G=(V_G,E_G)$ with vertex set $V_G\subset\mathbb{R}^d$ such that the Euclidean straight-line segments $\{\overline{ab}\mid(a,b)\in E_G\}$ intersect (possibly) only at their endpoints.
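Definition 2.1 can be checked mechanically. The following minimal Python sketch restricts itself to the plane ($d=2$) as a simplifying assumption and tests whether straight-line edges meet only at shared endpoints; the names `is_geometric` and its helpers are hypothetical and not part of any library.

```python
from itertools import combinations


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    val = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (val > 0) - (val < 0)


def _on_segment(a, b, c):
    """For c collinear with a and b: does c lie on the closed segment ab?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def _segments_meet(p1, p2, q1, q2):
    """True if the closed segments p1p2 and q1q2 share at least one point."""
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(p1, p2, q1))
            or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1))
            or (o4 == 0 and _on_segment(q1, q2, p2)))


def is_geometric(V, E):
    """Check Definition 2.1 in the plane: edges may meet only at shared endpoints.

    V: dict vertex -> (x, y); E: list of (a, b) pairs with a != b, no repeats.
    """
    for (a, b), (c, d) in combinations(E, 2):
        shared = {a, b} & {c, d}
        if len(shared) == 1:
            s = shared.pop()
            x = b if a == s else a  # far endpoint of the first edge
            y = d if c == s else c  # far endpoint of the second edge
            S, X, Y = V[s], V[x], V[y]
            same_ray = ((X[0] - S[0]) * (Y[0] - S[0])
                        + (X[1] - S[1]) * (Y[1] - S[1])) > 0
            # Two edges at s overlap beyond s only if collinear on the same side.
            if _orient(S, X, Y) == 0 and same_ray:
                return False
        elif _segments_meet(V[a], V[b], V[c], V[d]):
            return False
    return True


square = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}
cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(is_geometric(square, cycle))                       # True
print(is_geometric(square, cycle + [(1, 3), (2, 4)]))    # False: diagonals cross
```

The plane restriction keeps the orientation tests simple; in higher dimensions the same idea requires a segment-segment distance test instead.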
We denote the set of all geometric graphs of $\mathbb{R}^d$ by $\mathcal{G}(\mathbb{R}^d)$. Two geometric graphs $G=(V_G,E_G)$ and $H=(V_H,E_H)$ are said to be equal, written $G=H$, if and only if $V_G=V_H$ and $E_G=E_H$. We make no distinction between a geometric graph $G=(V_G,E_G)$ and its geometric realization as a subset of $\mathbb{R}^d$; an edge $(u,v)\in E_G$ can be identified with the line segment $\overline{uv}$ in $\mathbb{R}^d$, and its length with the Euclidean length $|\overline{uv}|$.
Following the style of [13], we first revisit the definition of the GGD. The definition uses the notion of an inexact matching. To denote a deleted vertex and a deleted edge, we introduce the dummy vertex $\epsilon_V$ and the dummy edge $\epsilon_E$, respectively.
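Definition 2.2 (Inexact Matching) Let $G,H\in\mathcal{G}(\mathbb{R}^d)$ be two geometric graphs. A relation $\pi\subseteq(V_G\cup\{\epsilon_V\})\times(V_H\cup\{\epsilon_V\})$ is called an (inexact) matching if for any $u\in V_G$ (resp. $v\in V_H$) there is exactly one $v\in V_H\cup\{\epsilon_V\}$ (resp. $u\in V_G\cup\{\epsilon_V\}$) such that $(u,v)\in\pi$.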
The set of all matchings between graphs $G,H$ is denoted by $\Pi(G,H)$. Intuitively, a matching $\pi$ is a relation that covers the vertex sets $V_G,V_H$ exactly once. As a result, when restricted to $V_G$ (resp. $V_H$), a matching $\pi$ can be expressed as a map $\pi:V_G\to V_H\cup\{\epsilon_V\}$ (resp. $\pi^{-1}:V_H\to V_G\cup\{\epsilon_V\}$). In other words, when $(u,v)\in\pi$ and $u\neq\epsilon_V$ (resp. $v\neq\epsilon_V$), it is justified to write $\pi(u)=v$ (resp. $\pi^{-1}(v)=u$). It is evident from the definition that the induced map
$$\pi:\{u\in V_G\mid\pi(u)\neq\epsilon_V\}\to\{v\in V_H\mid\pi^{-1}(v)\neq\epsilon_V\}$$
is a bijection. For edges $e=(u_1,u_2)\in E_G$ and $f=(v_1,v_2)\in E_H$, we introduce the shorthand $\pi(e):=(\pi(u_1),\pi(u_2))$ and $\pi^{-1}(f):=(\pi^{-1}(v_1),\pi^{-1}(v_2))$.
Another perspective on $\pi$ is to view it as a matching between portions of $G$ and $H$, (possibly) after applying some edits to the two graphs. For example, $\pi(u)=\epsilon_V$ (resp. $\pi^{-1}(v)=\epsilon_V$) encodes the deletion of the vertex $u$ from $G$ (resp. $v$ from $H$), whereas $\pi(e)=\epsilon_E$ (resp. $\pi^{-1}(f)=\epsilon_E$) encodes the deletion of the edge $e$ from $G$ (resp. $f$ from $H$). Once these deletion operations have been performed on the graphs, the resulting subgraphs of $G$ and $H$ become isomorphic, and they are finally matched by translating each remaining vertex $u$ to $\pi(u)$. The cost of the matching $\pi$ is defined as the total cost of all of these operations:
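Definition 2.3 (Cost of a Matching) Let $G,H\in\mathcal{G}(\mathbb{R}^d)$ be geometric graphs and $\pi\in\Pi(G,H)$ an inexact matching. The cost of $\pi$ is
$$\mathrm{Cost}(\pi)=\underbrace{\sum_{\substack{u\in V_G\\ \pi(u)\neq\epsilon_V}}C_V|u-\pi(u)|}_{\text{vertex translations}}+\underbrace{\sum_{\substack{e\in E_G\\ \pi(e)\neq\epsilon_E}}C_E\big||e|-|\pi(e)|\big|}_{\text{edge translations}}+\underbrace{\sum_{\substack{e\in E_G\\ \pi(e)=\epsilon_E}}C_E|e|}_{\text{edge deletions}}+\underbrace{\sum_{\substack{f\in E_H\\ \pi^{-1}(f)=\epsilon_E}}C_E|f|}_{\text{edge deletions}}.\qquad(1)$$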
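To make Eq. (1) concrete, the following minimal Python sketch evaluates $\mathrm{Cost}(\pi)$ for a given matching. It rests on labeled assumptions: edges are undirected and stored as frozensets, `None` plays the role of $\epsilon_V$, and $\pi(e)=\epsilon_E$ exactly when the image pair $(\pi(u_1),\pi(u_2))$ is not an edge of $H$ (in particular, when an endpoint is deleted). The function name `matching_cost` and the toy graphs are hypothetical.

```python
import math

EPS_V = None  # stands in for the dummy vertex epsilon_V


def matching_cost(VG, EG, VH, EH, pi, CV=1.0, CE=1.0):
    """Cost(pi) of Eq. (1).

    VG, VH: dict vertex -> coordinate tuple.
    EG, EH: sets of undirected edges, each a frozenset {a, b}.
    pi: dict sending every vertex of G to a vertex of H or to EPS_V;
        the non-EPS_V images must be distinct.
    Assumption: pi(e) = epsilon_E iff (pi(a), pi(b)) is not an edge of H.
    """
    def length(V, edge):
        a, b = tuple(edge)
        return math.dist(V[a], V[b])

    # Vertex translations.
    cost = sum(CV * math.dist(VG[u], VH[v]) for u, v in pi.items() if v is not EPS_V)
    kept_H = set()  # edges f of H with pi^{-1}(f) != epsilon_E
    for e in EG:
        a, b = tuple(e)
        image = frozenset({pi[a], pi[b]})
        if EPS_V not in image and image in EH:
            cost += CE * abs(length(VG, e) - length(VH, image))  # edge translation
            kept_H.add(image)
        else:
            cost += CE * length(VG, e)  # edge of G deleted
    cost += sum(CE * length(VH, f) for f in EH - kept_H)  # edges of H deleted
    return cost


VG = {"u1": (0, 0), "u2": (1, 0)}
VH = {"v1": (0, 0), "v2": (2, 0)}
EG = {frozenset({"u1", "u2"})}
EH = {frozenset({"v1", "v2"})}
print(matching_cost(VG, EG, VH, EH, {"u1": "v1", "u2": "v2"}))    # 2.0
print(matching_cost(VG, EG, VH, EH, {"u1": EPS_V, "u2": EPS_V}))  # 3.0
```

In the first matching, $u_2$ moves by 1 and the edge stretches by 1; in the second, both edges are deleted at the cost of their lengths 1 and 2.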
Definition 2.4 (GGD) For geometric graphs $G,H\in\mathcal{G}(\mathbb{R}^d)$, their geometric graph distance, $\mathrm{GGD}(G,H)$, is
$$\mathrm{GGD}(G,H)\overset{\text{def}}{=}\min_{\pi\in\Pi(G,H)}\mathrm{Cost}(\pi).$$
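Because computing the GGD is NP-hard, small instances can only be checked exhaustively. The Python sketch below is a hypothetical brute-force illustration, not an algorithm from this paper: it enumerates every inexact matching (a subset of surviving vertices of $G$ together with distinct images in $H$) and returns the minimum cost. It assumes undirected edges stored as frozensets, `None` for $\epsilon_V$, and $\pi(e)=\epsilon_E$ exactly when the image pair is not an edge of $H$. Its running time is exponential, so it is only meant for toy graphs.

```python
import itertools
import math


def _cost(VG, EG, VH, EH, pi, CV, CE):
    """Cost(pi) of Eq. (1); None plays the role of epsilon_V."""
    def length(V, edge):
        a, b = tuple(edge)
        return math.dist(V[a], V[b])

    cost = sum(CV * math.dist(VG[u], VH[v]) for u, v in pi.items() if v is not None)
    kept_H = set()
    for e in EG:
        a, b = tuple(e)
        image = frozenset({pi[a], pi[b]})
        if None not in image and image in EH:
            cost += CE * abs(length(VG, e) - length(VH, image))
            kept_H.add(image)
        else:
            cost += CE * length(VG, e)
    return cost + sum(CE * length(VH, f) for f in EH - kept_H)


def ggd_bruteforce(VG, EG, VH, EH, CV=1.0, CE=1.0):
    """GGD of Definition 2.4 by enumerating all inexact matchings (exponential)."""
    G_names, H_names = list(VG), list(VH)
    best = math.inf
    for k in range(min(len(G_names), len(H_names)) + 1):
        for kept in itertools.combinations(G_names, k):       # surviving vertices of G
            for targets in itertools.permutations(H_names, k):  # their distinct images
                pi = dict.fromkeys(G_names)                    # every vertex -> None
                pi.update(zip(kept, targets))
                best = min(best, _cost(VG, EG, VH, EH, pi, CV, CE))
    return best


VG = {"u1": (0, 0), "u2": (1, 0)}
VH = {"v1": (0, 0), "v2": (2, 0)}
EG = {frozenset({"u1", "u2"})}
EH = {frozenset({"v1", "v2"})}
print(ggd_bruteforce(VG, EG, VH, EH))  # 2.0
```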
2.1 Stability of GGD
A distance measure is said to be stable if it does not change much when the inputs are perturbed only slightly. Usually, the change is expected to be bounded above by the amount of perturbation inflicted on the inputs, measured under a suitable choice of metric. In the context of geometric graphs, it is natural to ask whether the GGD is stable under the Hausdorff distance between two graphs. To our disappointment, the example in Fig. 1 shows that the GGD can be positive while the Hausdorff distance between the realizations of the graphs is zero. So, the Hausdorff distance between the graphs cannot bound their GGD from above.
Figure 1: The graphs $G$ (top) and $H$ (bottom) are embedded in the real line; the distance between consecutive ticks is 1 unit. The Hausdorff distance between $G$ and $H$ is zero; however, $\mathrm{GGD}(G,H)=C_V+C_E$ is non-zero. The optimal matching is given by $\pi(u_1)=v_1$, $\pi(u_2)=v_2$, and $\pi(u_3)=\epsilon_V$.
One might think that the GGD is stable when only the Hausdorff distance between the vertex sets is considered. However, the graphs in Fig. 2 show otherwise. Under strong requirements, it is nevertheless not difficult to prove the following result on the stability of the GGD under the Hausdorff distance.
Figure 2: For the graphs $G,H\in\mathcal{G}(\mathbb{R}^2)$, the Hausdorff distance between the vertex sets is zero; however, $\mathrm{GGD}(G,H)=4C_E$ is non-zero. The optimal matching is given by $\pi(u_1)=v_1$, $\pi(u_3)=v_3$, $\pi(u_2)=\epsilon_V$, and $\pi^{-1}(v_2)=\epsilon_V$.
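Theorem 1 (Hausdorff Stability of GGD) Let $G,H\in\mathcal{G}(\mathbb{R}^d)$ be geometric graphs with a graph isomorphism $\pi:V_G\to V_H$. If $\delta>0$ is such that $|u-\pi(u)|\le\delta$ for all $u\in V_G$, then
$$\mathrm{GGD}(G,H)\le\left(C_V|V_G|+2C_E|E_G|\right)\delta.$$
Proof. The given graph isomorphism $\pi$ is a bijective mapping between the vertices of $G$ and $H$. So, $\pi\in\Pi(G,H)$, i.e., it defines an inexact matching. Since $\pi$ is a graph isomorphism, it does not delete any vertex or edge. More formally, for all $u\in V_G$ and $v\in V_H$, we have $\pi(u)\neq\epsilon_V$ and $\pi^{-1}(v)\neq\epsilon_V$, respectively. Also, for all $e\in E_G$ and $f\in E_H$, we have $\pi(e)\neq\epsilon_E$ and $\pi^{-1}(f)\neq\epsilon_E$, respectively, so both edge-deletion sums in (1) vanish. For an edge $e=(u_1,u_2)\in E_G$, the triangle inequality gives $\big||e|-|\pi(e)|\big|\le|u_1-\pi(u_1)|+|u_2-\pi(u_2)|\le2\delta$. From (1), the cost
$$\mathrm{Cost}(\pi)=\sum_{u\in V_G}C_V|u-\pi(u)|+\sum_{e\in E_G}C_E\big||e|-|\pi(e)|\big|\le C_V|V_G|\delta+2C_E|E_G|\delta.$$
So, $\mathrm{GGD}(G,H)\le\mathrm{Cost}(\pi)\le\left(C_V|V_G|+2C_E|E_G|\right)\delta$.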
3 Graph Mover’s Distance (GMD)
We define the Graph Mover's Distance for two ordered geometric graphs. A geometric graph is called ordered if its vertices are ordered or indexed. In that case, we denote the vertex set as a (finite) sequence $V_G=\{u_i\}_{i=1}^m$. Let us denote by $\mathcal{G}_O(\mathbb{R}^d)$ the set of all ordered geometric graphs of $\mathbb{R}^d$. The formulation of the GMD uses the framework known as the earth mover's distance (EMD).
3.1 Earth Mover’s Distance (EMD)
The EMD is a well-studied distance measure between weighted point sets, with many successful applications in a variety of domains; for example, see [8, 10, 17, 19]. The idea of the EMD was first conceived by Monge [14] in 1781, in the context of transportation theory. The name "earth mover's distance" was coined only recently and is well justified by the following analogy. The first weighted point set can be thought of as piles of earth (dirt) lying on the point sites, with the weight of a site indicating the amount of earth, whereas the other point set can be thought of as pits whose volumes are given by the corresponding weights. Given that the total amount of earth in the piles equals the total volume of the pits, the EMD computes the least (cumulative) cost needed to fill all the pits with earth. Here, a unit of cost corresponds to moving a unit of earth by a unit of "ground distance" between the pile and the pit.
The EMD can be cast as a transportation problem on a bipartite graph, for which several efficient algorithms exist, e.g., the network simplex algorithm [2, 15].
Let the weighted point sets $P=\{(p_i,w_{p_i})\}_{i=1}^m$ and $Q=\{(q_j,w_{q_j})\}_{j=1}^n$ be a set of suppliers and a set of consumers, respectively. The weight $w_{p_i}$ denotes the total supply of the supplier $p_i$, and $w_{q_j}$ the total demand of the consumer $q_j$. The matrix $[d_{i,j}]$ is the matrix of ground distances, where $d_{i,j}$ denotes the cost of transporting a unit of supply from $p_i$ to $q_j$. We also assume the feasibility condition that the total supply equals the total demand:
$$\sum_{i=1}^m w_{p_i}=\sum_{j=1}^n w_{q_j}.\qquad(2)$$
A flow of supply is given by a matrix $[f_{i,j}]$, with $f_{i,j}$ denoting the units of supply transported from $p_i$ to $q_j$. We want to find a flow that minimizes the overall cost
$$\sum_{i=1}^m\sum_{j=1}^n f_{i,j}\,d_{i,j}$$
subject to:
$$f_{i,j}\ge0\quad\text{for any } i=1,\dots,m \text{ and } j=1,\dots,n,\qquad(3)$$
$$\sum_{j=1}^n f_{i,j}=w_{p_i}\quad\text{for any } i=1,\dots,m,\qquad(4)$$
$$\sum_{i=1}^m f_{i,j}=w_{q_j}\quad\text{for any } j=1,\dots,n.\qquad(5)$$
Constraint (3) ensures that units flow from $P$ to $Q$, and not vice versa; constraint (4) dictates that each supplier sends all of its supply, no more and no less; constraint (5) guarantees that the demand of every consumer is exactly fulfilled.
The earth mover's distance (EMD) is then defined as the cost of an optimal flow. A solution always exists, provided condition (2) is satisfied. The weights and the ground distances can be chosen to be any non-negative numbers; we choose them specifically to encode our graph matching problem.
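The linear program above can be made tangible with a small checker. The Python sketch below is a hypothetical helper, not an EMD solver: it validates a candidate flow against constraints (2) to (5) and returns its cost, so the EMD is the smallest value it can return over all valid flows. The 2×2 instance is made up for illustration.

```python
def flow_cost(flow, d, supply, demand, tol=1e-9):
    """Validate a flow against constraints (2)-(5) and return sum_ij f_ij * d_ij."""
    m, n = len(supply), len(demand)
    if abs(sum(supply) - sum(demand)) > tol:
        raise ValueError("feasibility condition (2) fails")
    for i in range(m):
        for j in range(n):
            if flow[i][j] < -tol:
                raise ValueError(f"negative flow f[{i}][{j}] violates (3)")
    for i in range(m):
        if abs(sum(flow[i]) - supply[i]) > tol:
            raise ValueError(f"supplier {i} violates (4)")
    for j in range(n):
        if abs(sum(flow[i][j] for i in range(m)) - demand[j]) > tol:
            raise ValueError(f"consumer {j} violates (5)")
    return sum(flow[i][j] * d[i][j] for i in range(m) for j in range(n))


supply, demand = [2, 1], [1, 2]
d = [[1, 3],
     [2, 1]]
print(flow_cost([[1, 1], [0, 1]], d, supply, demand))  # 5
print(flow_cost([[0, 2], [1, 0]], d, supply, demand))  # 8
```

In this instance every feasible flow has the form $f_{1,1}=t$, $f_{1,2}=2-t$, $f_{2,1}=1-t$, $f_{2,2}=t$ with $t\in[0,1]$, whose cost is $8-3t$; hence the EMD is 5, attained by the first flow.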
3.2 Defining the GMD
Let $G,H\in\mathcal{G}_O(\mathbb{R}^d)$ be two ordered geometric graphs of $\mathbb{R}^d$ with $V_G=\{u_i\}_{i=1}^m$ and $V_H=\{v_j\}_{j=1}^n$. For each $i=1,\dots,m$, let $E^G_i$ denote the (row) $m$-vector containing the lengths of the (ordered) edges incident to the vertex $u_i$ of $G$. More precisely, the $k$-th element of $E^G_i$ is
$$(E^G_i)_k=\begin{cases}|e^G_{i,k}|, & \text{if } e^G_{i,k}:=(u_i,u_k)\in E_G,\\ 0, & \text{otherwise.}\end{cases}$$
[Figure 3: suppliers $u_1,u_2,u_3$ (weight 1 each) and dummy $u_4$ (weight 2); consumers $v_1,v_2$ (weight 1 each) and dummy $v_3$ (weight 3).]
Figure 3: The bipartite network used by the GMD, shown for two ordered graphs $G,H$ with vertex sets $V_G=\{u_1,u_2,u_3\}$ and $V_H=\{v_1,v_2\}$, respectively. The dummy nodes $u_4$ for $G$ and $v_3$ for $H$ are shown in gray. Below each node, the corresponding weight is shown. A particular flow is depicted: the gray edges do not transport anything, and each red edge carries a non-zero flow, with the transported units shown on it.
Similarly, for each $j=1,\dots,n$, we define $E^H_j$ to be the (row) $n$-vector with
$$(E^H_j)_k=\begin{cases}|e^H_{j,k}|, & \text{if } e^H_{j,k}:=(v_j,v_k)\in E_H,\\ 0, & \text{otherwise.}\end{cases}$$
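For concreteness, the vectors $E^G_i$ are simply the rows of an adjacency length matrix. The minimal Python sketch below assumes that vertices are given as an ordered list of coordinates and edges as undirected, 0-based index pairs; `edge_length_vectors` is a hypothetical helper name.

```python
import math


def edge_length_vectors(V, E):
    """Rows E_1, ..., E_m: entry k of row i is |u_i u_k| if (u_i, u_k) is an edge, else 0.

    V: ordered list of vertex coordinates; E: iterable of undirected index pairs (i, k).
    """
    rows = [[0.0] * len(V) for _ in V]
    for i, k in E:
        # An undirected edge is incident to both endpoints, so record it in both rows.
        rows[i][k] = rows[k][i] = math.dist(V[i], V[k])
    return rows


# Example: the path (0, 0) - (3, 4) - (3, 0).
print(edge_length_vectors([(0, 0), (3, 4), (3, 0)], [(0, 1), (1, 2)]))
# [[0.0, 5.0, 0.0], [5.0, 0.0, 4.0], [0.0, 4.0, 0.0]]
```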
To formulate the desired instance of the EMD, we take the point sets to be $P=\{u_i\}_{i=1}^{m+1}$ and $Q=\{v_j\}_{j=1}^{n+1}$. Here, $u_{m+1}$ and $v_{n+1}$ are a dummy supplier and a dummy consumer, respectively, which incorporate vertex deletion into our GMD framework. The weights on the sites are defined as follows:
$$w_{u_i}=1 \text{ for } i=1,\dots,m \quad\text{and}\quad w_{u_{m+1}}=n,$$
$$w_{v_j}=1 \text{ for } j=1,\dots,n \quad\text{and}\quad w_{v_{n+1}}=m.$$
The feasibility condition (2) is satisfied, since $m+n$ is the total weight of both $P$ and $Q$. An instance of the transportation problem is depicted in Fig. 3.
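Finally, the ground distance from $u_i$ to $v_j$ is defined by
$$d_{i,j}=\begin{cases}C_V|u_i-v_j|+C_E\left\|E^G_iD_{m\times p}-E^H_jD_{n\times p}\right\|_1, & \text{if } 1\le i\le m,\ 1\le j\le n,\\ C_E\|E^H_j\|_1, & \text{if } i=m+1,\ 1\le j\le n,\\ C_E\|E^G_i\|_1, & \text{if } 1\le i\le m,\ j=n+1,\\ 0, & \text{otherwise.}\end{cases}$$
Here, $p=\min\{m,n\}$, the 1-norm of a row vector is denoted by $\|\cdot\|_1$, and $D_{m\times p}$ (resp. $D_{n\times p}$) denotes the rectangular diagonal matrix with all diagonal entries equal to 1, so that $E^G_iD_{m\times p}$ (resp. $E^H_jD_{n\times p}$) consists of the first $p$ entries of $E^G_i$ (resp. $E^H_j$). The GMD between $G$ and $H$, denoted $\mathrm{GMD}(G,H)$, is the EMD of this instance.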
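The ground distances can be sketched in Python as follows, assuming undirected edges given as 0-based index pairs and reading $E^G_iD_{m\times p}$ as the first $p$ entries of $E^G_i$. The function names are hypothetical; the last row and column of the returned matrix correspond to the dummy supplier $u_{m+1}$ and the dummy consumer $v_{n+1}$.

```python
import math


def edge_length_vectors(V, E):
    """Row i: entry k is the length of the undirected edge (i, k), else 0."""
    rows = [[0.0] * len(V) for _ in V]
    for i, k in E:
        rows[i][k] = rows[k][i] = math.dist(V[i], V[k])
    return rows


def ground_distances(VG, EG, VH, EH, CV=1.0, CE=1.0):
    """(m+1) x (n+1) matrix [d_ij]; index m (resp. n) is the dummy site."""
    m, n = len(VG), len(VH)
    p = min(m, n)
    RG, RH = edge_length_vectors(VG, EG), edge_length_vectors(VH, EH)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]  # d[m][n] stays 0 (dummy to dummy)
    for i in range(m):
        for j in range(n):
            # Multiplying by D_{m x p} (resp. D_{n x p}) keeps the first p entries.
            edge_term = sum(abs(a - b) for a, b in zip(RG[i][:p], RH[j][:p]))
            d[i][j] = CV * math.dist(VG[i], VH[j]) + CE * edge_term
        d[i][n] = CE * sum(RG[i])  # u_i sent to the dummy consumer: pay its edges
    for j in range(n):
        d[m][j] = CE * sum(RH[j])  # dummy supplier serving v_j: pay v_j's edges
    return d


# A unit segment and its copy translated by 0.5 along the x-axis.
print(ground_distances([(0, 0), (1, 0)], [(0, 1)], [(0.5, 0), (1.5, 0)], [(0, 1)]))
# [[0.5, 3.5, 1.0], [2.5, 0.5, 1.0], [1.0, 1.0, 0.0]]
```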
Figure 4: For the geometric graphs $G,H\in\mathcal{G}_O(\mathbb{R}^2)$, the GMD is zero. The optimal flow is given by the matching $\pi(u_1)=v_2$, $\pi(u_2)=v_1$, $\pi(u_3)=v_4$, $\pi(u_4)=v_3$, and $\pi(u_5)=v_5$.
3.3 Metric Properties
The GMD induces a pseudo-metric on the space $\mathcal{G}_O(\mathbb{R}^d)$ of ordered geometric graphs. Non-negativity, symmetry, and the triangle inequality follow from those of the cost matrix $[d_{i,j}]$ defined in the GMD. In addition, if $G=H$ (as ordered graphs), then $m=n$ and $d_{i,j}=0$ whenever $i=j$. The trivial flow, where each $u_i$ sends its full supply to $v_i$ (including the dummy $u_{m+1}$, which sends its $n$ units to $v_{n+1}$), has zero cost. So, $\mathrm{GMD}(G,H)=0$. The GMD does not, however, satisfy the separability condition on $\mathcal{G}_O(\mathbb{R}^d)$: $\mathrm{GMD}(G,H)=0$ does not imply $G=H$.
For the graphs $G,H$ shown in Fig. 4, we have $\mathrm{GMD}(G,H)=0$. Note that $G$ and $H$ have the following adjacency length matrices $[E^G_i]_i$ and $[E^H_j]_j$, respectively:
$$\begin{pmatrix}0&0&0&2&2\\0&0&2&0&2\\0&2&0&0&0\\2&0&0&0&0\\2&2&0&0&0\end{pmatrix}\quad\text{and}\quad\begin{pmatrix}0&0&2&0&2\\0&0&0&2&2\\2&0&0&0&0\\0&2&0&0&0\\2&2&0&0&0\end{pmatrix}.$$
It can be easily checked that the flow that transports a unit of supply along $u_1\mapsto v_2$, $u_2\mapsto v_1$, $u_3\mapsto v_4$, $u_4\mapsto v_3$, $u_5\mapsto v_5$, and five units along $u_6\mapsto v_6$ has total cost zero. So, $\mathrm{GMD}(G,H)=0$. However, $G$ and $H$ are not the same geometric graph. The fact that $\mathrm{GGD}(G,H)\neq0$ implies that the GGD is not stable under the GMD.
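The vanishing of the edge terms can be verified mechanically. The Python sketch below copies the two adjacency length matrices above and evaluates the 1-norm edge term of $d_{i,\pi(i)}$ for the flow just described (here $m=n=p=5$, so the truncation by $D$ is the identity), using 0-based indices. It does not check the vertex terms $C_V|u_i-v_{\pi(i)}|$; a zero total cost additionally requires each matched pair of vertices to coincide in Fig. 4.

```python
# Adjacency length matrices [E^G_i]_i and [E^H_j]_j of Fig. 4 (row i = vertex i).
EG = [
    [0, 0, 0, 2, 2],
    [0, 0, 2, 0, 2],
    [0, 2, 0, 0, 0],
    [2, 0, 0, 0, 0],
    [2, 2, 0, 0, 0],
]
EH = [
    [0, 0, 2, 0, 2],
    [0, 0, 0, 2, 2],
    [2, 0, 0, 0, 0],
    [0, 2, 0, 0, 0],
    [2, 2, 0, 0, 0],
]
# Flow u1->v2, u2->v1, u3->v4, u4->v3, u5->v5, written with 0-based indices.
pi = [1, 0, 3, 2, 4]

edge_terms = [sum(abs(a - b) for a, b in zip(EG[i], EH[pi[i]])) for i in range(5)]
print(edge_terms)  # [0, 0, 0, 0, 0]

# Yet the edge sets differ: u1u4 is an edge of G, while its image v2v3 is not an edge of H.
print(EG[0][3], EH[1][2])  # 2 0
```

The rows are compared entry by entry in each graph's own vertex order, which is why matching rows can hide different adjacency structures.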
One can easily find even simpler configurations of two distinct geometric graphs with zero GMD if the graphs are allowed to have multiple connected components.
We conclude this section by stating a stability result for the GMD under the Hausdorff distance. We omit the proof, since it uses an argument similar to that of Theorem 1.
Theorem 2 (Hausdorff Stability of GMD) Let $G,H\in\mathcal{G}_O(\mathbb{R}^d)$ be ordered geometric graphs with a bijection $\pi:V_G\to V_H$ such that $e^G_{i,j}=e^H_{\pi(i),\pi(j)}$ for all $i,j$. If $\delta>0$ is such that $|u_i-\pi(u_i)|\le\delta$ for all $u_i\in V_G$, then
$$\mathrm{GMD}(G,H)\le C_V|V_G|\delta.$$
3.4 Computing the GMD
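As pointed out earlier, the GMD can be computed as an instance of the transportation problem, using, for example, the network simplex algorithm. If the graphs have at most $n$ vertices, computing the ground cost matrix $[d_{i,j}]$ takes $O(n^3)$ time, since each of its $O(n^2)$ entries involves the 1-norm of a vector of length at most $n$. The bipartite network has $O(n)$ vertices and $O(n^2)$ edges, and the network simplex algorithm typically performs well on such instances in practice. A worst-case $O(n^3)$ bound for this step follows, for instance, from splitting the dummy supplier and the dummy consumer into unit-weight copies, which turns the problem into an assignment problem on an $(m+n)\times(m+n)$ cost matrix that the Hungarian algorithm solves in $O(n^3)$ time. Overall, the time complexity of the GMD is $O(n^3)$.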
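The following Python sketch computes the GMD end to end. It uses a swapped-in technique rather than the network simplex implementation of Section 4: since all weights are integers, the transportation problem has an integral optimal flow, so splitting the dummy supplier into $n$ unit copies and the dummy consumer into $m$ unit copies yields an assignment problem on an $(m+n)\times(m+n)$ matrix, solved here by the Hungarian algorithm (potentials method) in $O((m+n)^3)$ time. All function names are hypothetical; vertices are 0-indexed and edges are assumed undirected.

```python
import math


def edge_length_vectors(V, E):
    """Rows E_i: entry k is the length of the undirected edge (i, k), else 0."""
    rows = [[0.0] * len(V) for _ in V]
    for i, k in E:
        rows[i][k] = rows[k][i] = math.dist(V[i], V[k])
    return rows


def ground_distances(VG, EG, VH, EH, CV, CE):
    """(m+1) x (n+1) ground-distance matrix; last row/column are the dummies."""
    m, n, p = len(VG), len(VH), min(len(VG), len(VH))
    RG, RH = edge_length_vectors(VG, EG), edge_length_vectors(VH, EH)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            edge_term = sum(abs(a - b) for a, b in zip(RG[i][:p], RH[j][:p]))
            d[i][j] = CV * math.dist(VG[i], VH[j]) + CE * edge_term
        d[i][n] = CE * sum(RG[i])
    for j in range(n):
        d[m][j] = CE * sum(RH[j])
    return d


def hungarian(cost):
    """Minimum total cost of a perfect assignment in a square matrix, O(N^3)."""
    N, INF = len(cost), math.inf
    u, v = [0.0] * (N + 1), [0.0] * (N + 1)  # dual potentials, 1-indexed
    p, way = [0] * (N + 1), [0] * (N + 1)    # p[j] = row assigned to column j
    for i in range(1, N + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (N + 1), [False] * (N + 1)
        while True:  # grow a shortest alternating path until a free column
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, N + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(N + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the path back to the root
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(cost[p[j] - 1][j - 1] for j in range(1, N + 1))


def gmd(VG, EG, VH, EH, CV=1.0, CE=1.0):
    """GMD via assignment: n unit copies of u_{m+1}, m unit copies of v_{n+1}."""
    m, n = len(VG), len(VH)
    d = ground_distances(VG, EG, VH, EH, CV, CE)
    rows = list(range(m)) + [m] * n  # site behind each unit of supply
    cols = list(range(n)) + [n] * m  # site behind each unit of demand
    return hungarian([[d[r][c] for c in cols] for r in rows])


G_V, G_E = [(0, 0), (1, 0)], [(0, 1)]
H_V, H_E = [(0.5, 0), (1.5, 0)], [(0, 1)]  # G translated by 0.5
print(gmd(G_V, G_E, G_V, G_E))  # 0.0
print(gmd(G_V, G_E, H_V, H_E))  # 1.0
```

With the ground distances as defined, an isolated vertex has no incident edges and therefore costs nothing to delete through the dummy nodes; in this sketch, two single-vertex graphs at different locations thus get a GMD of zero.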
4 Experimental Results
We have implemented the GMD in Python, using the network simplex algorithm from the networkx package. We ran a pattern retrieval experiment on letter drawings from the IAM Graph Database [18]. The repository provides an extensive collection of graphs, both geometric and labeled.
In particular, we performed our experiment on the LETTER database from the repository. The graphs in the database represent distorted letter drawings. The database considers only 15 uppercase letters from the English alphabet: A, E, F, H, I, K, L, M, N, T, V, W, X, Y, and Z. For each letter, a prototype line drawing has been manually constructed. Distortions of three different strengths (LOW, MED, and HIGH) are applied to the prototypes to produce 2250 letter graphs for each level. Each test letter drawing is a graph with straight-line edges; each node is labeled with its two-dimensional coordinates. Since some of the graphs in the dataset were not embedded, we computed the intersection points of crossing edges and inserted them as nodes. This preprocessing guaranteed that all the considered graphs were geometric; a prototype and a distorted graph are shown in Fig. 5.
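The preprocessing step of inserting edge crossings as nodes can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used for the experiments: vertices are (x, y) tuples, edges are 0-based index pairs, only proper crossings in the interiors of two edges are handled (contacts at endpoints and collinear overlaps are ignored), and no three edges are assumed to cross at one point. The names `crossing_point` and `planarize` are hypothetical.

```python
import math


def crossing_point(p1, p2, q1, q2):
    """Point where segments p1p2 and q1q2 cross in their interiors, else None.

    Solves p1 + t (p2 - p1) = q1 + s (q2 - q1) with 2D cross products.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx
    if denom == 0:
        return None  # parallel or collinear: ignored in this sketch
    wx, wy = q1[0] - p1[0], q1[1] - p1[1]
    t = (wx * sy - wy * sx) / denom
    s = (wx * ry - wy * rx) / denom
    if 0 < t < 1 and 0 < s < 1:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None


def planarize(V, E):
    """Insert every interior crossing as a new vertex and split the edges there.

    V: list of (x, y); E: list of (i, j) index pairs. Returns (new_V, new_E).
    """
    V = list(V)
    cuts = [[] for _ in E]  # per edge: (distance from its first endpoint, vertex index)
    for a in range(len(E)):
        for b in range(a + 1, len(E)):
            (i, j), (k, h) = E[a], E[b]
            x = crossing_point(V[i], V[j], V[k], V[h])
            if x is not None:
                V.append(x)
                cuts[a].append((math.dist(V[i], x), len(V) - 1))
                cuts[b].append((math.dist(V[k], x), len(V) - 1))
    new_E = []
    for (i, j), c in zip(E, cuts):
        # Walk each edge from i to j through its crossing vertices in order.
        chain = [i] + [idx for _, idx in sorted(c)] + [j]
        new_E.extend(zip(chain, chain[1:]))
    return V, new_E


# An "X": the two diagonals of the square [0, 2] x [0, 2] cross at (1, 1).
new_V, new_E = planarize([(0, 0), (2, 2), (0, 2), (2, 0)], [(0, 1), (2, 3)])
print(new_V[4], new_E)  # (1.0, 1.0) [(0, 4), (4, 1), (2, 4), (4, 3)]
```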
We devised a classifier for these letter drawings using the GMD. For this application, we chose $C_V=4.5$ and $C_E=1$. For a test letter, we computed its GMD to each of the 15 prototypes, then sorted the prototypes in increasing order of their distance to the test graph. We then checked whether the letter generating the test graph is among the first $k$ prototypes. For each level of distortion and various values of $k$, we report the rate at which the correct letter is found among the first $k$ models.
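The retrieval protocol above amounts to ranking prototypes by distance and counting top-$k$ hits. The Python sketch below is hypothetical: `distance` stands for any GMD implementation (for example, one with $C_V=4.5$ and $C_E=1$), and the toy data in the usage lines are made up rather than taken from the LETTER dataset.

```python
def top_k_rates(tests, prototypes, distance, ks=(1, 3, 5)):
    """Fraction of test graphs whose true label is among the k closest prototypes.

    tests: list of (graph, true_label); prototypes: dict label -> prototype graph;
    distance: callable(graph, graph) -> float, e.g. a GMD implementation.
    """
    hits = dict.fromkeys(ks, 0)
    for graph, label in tests:
        # Sort prototype labels by increasing distance to the test graph.
        ranking = sorted(prototypes, key=lambda name: distance(graph, prototypes[name]))
        for k in ks:
            hits[k] += label in ranking[:k]
    return {k: hits[k] / len(tests) for k in ks}


# Toy usage: numbers stand in for graphs and |a - b| for the distance.
toy_tests = [(0.1, "A"), (0.9, "B"), (0.6, "A")]
toy_prototypes = {"A": 0.0, "B": 1.0}
print(top_k_rates(toy_tests, toy_prototypes, lambda a, b: abs(a - b), ks=(1, 2)))
# {1: 0.6666666666666666, 2: 1.0}
```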
The empirical results are summarized in Table 1. Although the graph edit distance based k-NN classifier still outperforms the GMD on the LOW distortion level (99.6% accuracy versus 96.66% for the GMD at $k=1$), our results are very encouraging.
Figure 5: The prototype geometric graph of the letter A is shown on the left. On the right, a (MED) distorted letter A is shown.
Table 1: Empirical results on the LETTER dataset: rate at which the correct letter is found among the first $k$ models.
Distortion | k = 1 | k = 3 | k = 5
LOW | 96.66% | 98.93% | 99.37%
MED | 66.66% | 85.37% | 91.15%
HIGH | 73.73% | 90.48% | 95.51%
One possible reason why the GMD fails to correctly classify some of the graphs is that it lacks the separability property of a metric.
5 Discussions
We have successfully introduced an efficiently computable and meaningful similarity measure for geometric graphs. However, the GMD lacks some desirable properties, such as separability and stability. The currently presented stability results for the GGD and GMD have a factor that depends on the size of the input graphs. The question remains whether the distance measures are in fact stable under much weaker conditions, possibly with constant factors in the upper bounds. It will also be interesting to study the exact class of geometric graphs for which the GMD is, in fact, a metric.
References
[1] M. Ahmed, S. Karagiorgou, D. Pfoser, and C. Wenk. Map Construction Algorithms. Springer International Publishing, first edition, 2015.
[2] R. Ahuja, T. Magnanti, and J. Orlin. Network Flows: Theory, Algorithms, and Applications. Always learning. Pearson, 2013.
[3] H. Bunke and G. Allermann. Inexact graph matching for structural pattern recognition. Pattern Recognition Letters, 1(4):245–253, May 1983.
[4] O. Cheong, J. Gudmundsson, H.-S. Kim, D. Schymura, and F. Stehn. Measuring the Similarity of Geometric Graphs. In J. Vahrenhold, editor, Experimental Algorithms, volume 5526, pages 101–112. Springer, 2009.
[5] D. G. Corneil and C. C. Gotlieb. An efficient algorithm for graph isomorphism. J. ACM, 17(1):51–64, 1970.
[6] K.-S. Fu and B. Bhargava. Tree systems for syntactic pattern recognition. IEEE Transactions on Computers, C-22(12):1087–1099, 1973.
[7] K.-S. Fu and P. Swain. On syntactic pattern recognition. In J. T. Tou, editor, Computer and Information Sciences 1969, volume 2 of SEN Report Series Software Engineering, pages 155–182. Elsevier, 1971.
[8] C. J. Hargreaves, M. S. Dyer, M. W. Gaultois, V. A. Kurlin, and M. J. Rosseinsky. The Earth Mover's Distance as a Metric for the Space of Inorganic Compositions. Chemistry of Materials, 32(24):10610–10620, Dec. 2020.
[9] D. Justice and A. Hero. A binary linear programming formulation of the graph edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(8):1200–1214, Aug. 2006.
[10] M. Kusner, Y. Sun, N. Kolkin, and K. Weinberger. From Word Embeddings To Document Distances. In Proceedings of the 32nd International Conference on Machine Learning, pages 957–966. PMLR, June 2015. ISSN: 1938-7228.
[11] J. Liu and Y. T. Lee. Graph-based method for face identification from a single 2d line drawing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1106–1119, 2001.
[12] J. Llados, E. Marti, and J. Villanueva. Symbol recognition by error-tolerant subgraph matching between region adjacency graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1137–1143, 2001.
[13] S. Majhi and C. Wenk. Distance measures for geometric graphs. arXiv preprint arXiv:2209.12869, 2022.
[14] G. Monge. Mémoire sur la théorie des déblais et des remblais. Imprimerie royale, 1781.
[15] O. Pele and M. Werman. A Linear Time Histogram Metric for Improved SIFT Matching. In D. Forsyth, P. Torr, and A. Zisserman, editors, Computer Vision – ECCV 2008, Lecture Notes in Computer Science, pages 495–508, Berlin, Heidelberg, 2008. Springer.
[16] J. W. Raymond and P. Willett. Effectiveness of graph-based and fingerprint-based similarity measures for virtual screening of 2D chemical structure databases. Journal of Computer-Aided Molecular Design, 16(1):59–71, 2002.
[17] Z. Ren, J. Yuan, and Z. Zhang. Robust hand gesture recognition based on finger-earth mover's distance with a commodity depth camera. In Proceedings of the 19th ACM international conference on Multimedia, MM '11, pages 1093–1096, New York, NY, USA, Nov. 2011. Association for Computing Machinery.
[18] K. Riesen and H. Bunke. IAM Graph Database Repository for Graph Based Pattern Recognition and Machine Learning. In Structural, Syntactic, and Statistical Pattern Recognition, volume 5342, pages 287–297. Springer, 2008.
[19] Y. Rubner, C. Tomasi, and L. J. Guibas. The Earth Mover's Distance as a Metric for Image Retrieval. International Journal of Computer Vision, 40(2):99–121, Nov. 2000.
[20] A. Sanfeliu and K.-S. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(3):353–362, May 1983.
[21] P. Willett. Similarity Searching in Databases of Three-Dimensional Chemical Structures. In H.-H. Bock, W. Lenski, and M. M. Richter, editors, Information Systems and Data Analysis, pages 280–293. Springer, 1994.