A DISTRIBUTED BASIS FOR ANALOGICAL MAPPING

Ross W. Gayler
r.gayler@gmail.com
School of Communication, Arts and Critical Enquiry
La Trobe University
Victoria 3086 Australia
Simon D. Levy
levys@wlu.edu
Department of Computer Science
Washington and Lee University
Lexington, Virginia USA
ABSTRACT
We are concerned with the practical feasibility of the neural basis of analogical mapping. All existing connectionist models of analogical mapping rely to some degree on localist representation (each concept or relation is represented by a dedicated unit/neuron). These localist solutions are implausible because they need too many units for human-level competence or require the dynamic re-wiring of networks on a sub-second time-scale.

Analogical mapping can be formalised as finding an approximate isomorphism between graphs representing the source and target conceptual structures. Connectionist models of analogical mapping implement continuous heuristic processes for finding graph isomorphisms. We present a novel connectionist mechanism for finding graph isomorphisms that relies on distributed, high-dimensional representations of structure and mappings. Consequently, it does not suffer from the problems of the number of units scaling combinatorially with the number of concepts or requiring dynamic network re-wiring.
GRAPH ISOMORPHISM
Researchers tend to divide the process of analogy into three stages: retrieval (finding an appropriate source situation), mapping (identifying the corresponding elements of the source and target situations), and application. Our concern is with the mapping stage, which is essentially about structural correspondence. If the source and target situations are formally represented as graphs, the structural correspondence between them can be described as approximate graph isomorphism. Any mechanism for finding graph isomorphisms is, by definition, a mechanism for finding structural correspondence and a possible mechanism for implementing analogical mapping. We are concerned with the formal underpinning of analogical mapping (independently of whether any particular researcher chooses to describe their specific model in these terms).
It might be supposed that representing situations as graphs is unnecessarily restrictive. However, anything that can be formalised can be represented by a graph. Category theory, which is effectively a theory of structure and graphs, is an alternative to set theory as a foundation for mathematics (Marquis, 2009), so anything that can be mathematically represented can be represented as a graph.
It might also be supposed that by working solely with graph isomorphism we favour structural correspondence to the exclusion of other factors that are known to influence analogical mapping, such as semantic similarity and pragmatics. However, as any formal structure can be represented by graphs it follows that semantics and pragmatics can also be encoded as graphs. For example, some models of analogical mapping are based on labelled graphs with the process being sensitive to label
similarity. However, any label value can be encoded as a graph and label similarity captured by the degree of approximate isomorphism. Further, the mathematics of graph isomorphism has been extended to include attribute similarity and is commonly used this way in computer vision and pattern recognition (Bomze, Budinich, Pardalos & Pelillo, 1999).
The extent to which analogical mapping based on graph isomorphism is sensitive to different types of information depends on what information is encoded into the graphs. Our current research is concerned only with the practical feasibility of connectionist implementations of graph isomorphism. The question of what information is encoded in the graphs is separable. Consequently, we are not concerned with modelling the psychological properties of analogical mapping, as such questions belong to a completely different level of inquiry.
CONNECTIONIST IMPLEMENTATIONS
It is possible to model analogical mapping as a purely algorithmic process. However, we are concerned with physiological plausibility and consequently limit our attention to connectionist models of analogical mapping such as ACME (Holyoak & Thagard, 1989), AMBR (Kokinov, 1988), DRAMA (Eliasmith & Thagard, 2001), and LISA (Hummel & Holyoak, 1997). These models vary in their theoretical emphases and the details of their connectionist implementations, but they all share a problem in the scalability of the representation or construction of the connectionist mapping network. We contend that this is a consequence of using localist connectionist representations or processes. In essence, they either have to allow in advance for all combinatorial possibilities, which requires too many units (Stewart & Eliasmith, in press), or they have to construct the required network for each new mapping task in a fraction of a second.
Problems with localist implementation
Rather than review all the major connectionist models of analogical mapping, we will use ACME and DRAMA to illustrate the problem with localist representation. Localist and distributed connectionist models have often been compared in terms of properties such as neural plausibility and robustness. Here, we are concerned only with a single issue: dynamic re-wiring (i.e., the need for connections to be made between neurons as a function of the source and target situations to be mapped).
ACME constructs a localist network to represent possible mappings between the source and target structures. The network is a function of the source and target representations, and a new network has to be constructed for every source and target pair. A localist unit is constructed to represent each possible mapping between a source vertex and target vertex. The activation of each unit indicates the degree of support for the corresponding vertex mapping being part of the overall mapping between the source and target. The connections between the network units encode compatibility between the corresponding vertex mappings. These connections are a function of the source and target representations and constructed anew for each problem. Compatible vertex mappings are linked by excitatory connections so that support for plausibility of one vertex mapping transmits support to compatible mappings. Similarly, inhibitory connections are used to connect the units representing incompatible mappings. The network implements a relaxation labelling that finds a compatible set of mappings. The operation of the mapping network is neurally plausible, but the process of its construction is not.
The inputs to ACME are symbolic representations of the source and target structures. The mapping network is constructed by a symbolic process that traverses the source and target structures. The time complexity of the traversal will be a function of the size of the structures to be mapped. Given that we believe analogical mapping is a continually used core part of cognition and that all cognitive information is encoded as (large) graph structures, we strongly prefer mapping network setup to require approximately constant time independent of the structures to be mapped.
DRAMA is a variant of ACME with distributed source and target representations. However, it appears that the process of constructing the distributed representation of the mapping network is functionally localist, requiring a decomposition and sequential traversal of the source and target structures.
Ideally, the connectionist mapping network should have a fixed neural architecture. The units and their connections should be fixed in advance and not need to be re-wired in response to the source and target representations. The structure of the current mapping task should be encoded entirely in activations generated on the fixed neural architecture by the source and target representations, and the set-up process should be holistic rather than requiring decomposition of the source and target representations. Our research aims to achieve this by using distributed representation and processing from the VSA family of connectionist models.
We proceed by introducing replicator equations, a localist heuristic for finding graph isomorphisms. Then we introduce Vector Symbolic Architectures (VSA), a family of distributed connectionist mechanisms for the representation and manipulation of structured information. Our novel contribution is to implement replicator equations in a completely distributed fashion based on VSA. We conclude with a proof-of-concept demonstration of a distributed re-implementation of the principal example from the seminal paper on graph isomorphism via replicator equations.
REPLICATOR EQUATIONS
The approach we are pursuing for graph isomorphism is based on the work of Pelillo (1999), who casts subgraph isomorphism as the problem of finding a maximal clique (set of mutually adjacent vertices) in the association graph derived from the two graphs to be mapped. Given a graph $G$ of size $N$ with an $N \times N$ adjacency matrix $A = (a_{ij})$ and a graph $G'$ of size $N$ with an $N \times N$ adjacency matrix $A' = (a'_{hk})$, their association graph $G''$ of size $N^2$ can be represented by an $N^2 \times N^2$ adjacency matrix $A'' = (a''_{ih,jk})$ whose edges encode pairs of edges from $G$ and $G'$:

$$a''_{ih,jk} = \begin{cases} 1 - (a_{ij} - a'_{hk})^2 & \text{if } i \neq j \text{ and } h \neq k \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
The elements of $A''$ are 1 if the corresponding edges in $G$ and $G'$ have the same state of existence and 0 if the corresponding edges have different states of existence. Defined this way, the edges of the association graph $G''$ provide evidence about potential mappings between the vertices of $G$ and $G'$ based on whether the corresponding edges and non-edges are consistent. The presence of an edge between two vertices in one graph and an edge between two vertices in the other graph supports a possible mapping between the members of each pair of vertices (as does the absence of such an edge in both graphs).
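To make Equation 1 concrete, here is a minimal sketch (our own illustration in NumPy; the function name and representation choices are assumptions, not the paper's implementation) that builds the association-graph adjacency matrix from two adjacency matrices:

```python
import numpy as np

def association_adjacency(A, A2):
    """Equation 1: adjacency matrix of the association graph G''.

    A, A2 -- N x N adjacency matrices of graphs G and G'.
    Entry ((i,h),(j,k)) is 1 when edge/non-edge (i,j) of G has the
    same state of existence as edge/non-edge (h,k) of G'.
    """
    N = A.shape[0]
    W = np.zeros((N * N, N * N))
    for i in range(N):
        for h in range(N):
            for j in range(N):
                for k in range(N):
                    if i != j and h != k:
                        W[i * N + h, j * N + k] = 1 - (A[i, j] - A2[h, k]) ** 2
    return W
```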
By treating the graph isomorphism problem as a maximal-clique-finding problem, Pelillo exploits an important result in graph theory. Consider a graph $G$ with adjacency matrix $A$, a subset $C$ of vertices of $G$, and a characteristic vector $x^C$ (indicating membership of the subset $C$) defined as

$$x_i^C = \begin{cases} 1/|C| & \text{if } i \in C \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

where $|C|$ is the cardinality of $C$. It turns out that $C$ is a maximum clique of $G$ if and only if $x^C$ maximizes the function $f(x) = x^T A x$, where $x^T$ is the transpose of $x$, $x \in \mathbb{R}^N$, $\sum_{i=1}^{N} x_i = 1$, and $x_i \geq 0$ for all $i$.
Starting at some initial condition (typically the barycenter, $x_i = 1/N$, corresponding to all $x_i$ being equally supported as part of the solution), $x$ can be obtained through iterative application of the following equation:

$$x_i(t+1) = \frac{x_i(t)\,\pi_i(t)}{\sum_{j=1}^{N} x_j(t)\,\pi_j(t)} \qquad (3)$$
where

$$\pi_i(t) = \sum_{j=1}^{N} w_{ij}\,x_j(t) \qquad (4)$$

and $W$ is a matrix of weights $w_{ij}$, typically just the adjacency matrix $A''$ of the association graph or a linear function of $A''$. The $x$ vector can thus be considered to represent the state of the system's belief about the vertex mappings at a given time, with Equations 3 and 4 representing a dynamical system parameterized by the weights in $W$. $\pi_i$ can be interpreted as the evidence for $x_i$ obtained from all the compatible $x_j$, where the compatibility is encoded by $w_{ij}$. The denominator in Equation 3 is a normalizing factor ensuring that $\sum_{i=1}^{N} x_i = 1$.
Pelillo borrows Equations 3 and 4 from the literature on evolutionary game theory, in which $\pi_i$ is the overall payoff associated with playing strategy $i$, and $w_{ij}$ is the payoff associated with playing strategy $i$ against strategy $j$. In the context of the maximum-clique problem, these replicator equations can be used to derive a vector $x$ (vertex mappings) that maximizes the "payoff" (edge consistency) encoded in the adjacency matrix. Vertex mappings correspond to strategies, and as Equation 3 is iterated, mappings with higher fitness (consistency of mappings) come to dominate ones with lower fitness.
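As a concrete illustration, here is a minimal NumPy sketch of Equations 3 and 4 (our own, not the authors' MATLAB code), applied to a toy graph whose maximum clique is {0, 1, 2}; the state converges to the characteristic vector of Equation 2:

```python
import numpy as np

def replicator(W, x, iters=200):
    """Iterate Equations 3 and 4 on weight matrix W from state x."""
    for _ in range(iters):
        pi = W @ x             # Equation 4: evidence from compatible vertices
        x = x * pi / (x @ pi)  # Equation 3: normalized multiplicative update
    return x

# Toy graph: triangle {0, 1, 2} plus vertex 3 attached only to vertex 0.
W = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

x0 = np.full(4, 1 / 4)    # barycenter: all vertices equally supported
print(replicator(W, x0))  # ~[1/3, 1/3, 1/3, 0], the maximum clique
```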
Figure 1. A simple graph isomorphism problem.
Consider the simple graphs in Figure 1, used as the principal example by Pelillo (1999) and which we will later re-implement in a distributed fashion. The maximal isomorphism between these two graphs is {A=P, B=Q, C=R, D=S} or {A=P, B=Q, C=S, D=R}. Table 1 shows the first and last rows of the adjacency matrix for the association graph of these graphs, generated using Equation 1. Looking at the first row of the table, we see that the mapping A=P is consistent with the mappings B=Q, B=R, B=S, C=Q, C=R, C=S, D=Q, D=R, and D=S, but not with A=Q, A=R, A=S, B=P, etc.
     AP AQ AR AS BP BQ BR BS CP CQ CR CS DP DQ DR DS
AP    0  0  0  0  0  1  1  1  0  1  1  1  0  1  1  1
…     …  …  …  …  …  …  …  …  …  …  …  …  …  …  …  …
DS    1  0  1  0  0  1  0  0  1  0  1  0  0  0  0  0

Table 1. Fragment of the adjacency matrix for Fig. 1.
Initially, all values in the state vector $x$ are set to 0.0625 (= 1/16). Repeated application of Equations 3 and 4 produces a final state vector that encodes the two maximal isomorphisms, with 0.3 in the positions for A=P and B=Q, 0.1 in the positions for C=R, C=S, D=R, and D=S, and 0 in the others. The conflicting mappings for C, D, R, and S correspond to a saddle point in the dynamics of the replicator equations, created by the symmetry in the graphs. Adding a small amount of noise to the state breaks this symmetry, producing a final state vector with values of 0.25 for the optimal mappings A=P, B=Q, and C=R, D=S or C=S, D=R, and zero elsewhere. The top graph of Figure 4 shows the time course of the settling process from our implementation of Pelillo's localist algorithm.
This example is trivially small. However, the same approach has been successfully applied to graphs with more than 65,000 vertices (Pelillo & Torsello, 2006). It has also been extended to match hierarchical, attributed structures for computer vision problems (Pelillo, Siddiqi & Zucker, 1999). Thus, we are confident that replicator equations are a reasonable candidate mechanism for the structure matching at the heart of analogical mapping.
DISTRIBUTED IMPLEMENTATION
The replicator equation mechanism can be easily implemented as a localist connectionist circuit. This is qualitatively very similar to
ACME and suffers the same problems due to the localist representation. In this section we present a distributed connectionist scheme for representing edges, vertices, and mappings that does not suffer from these problems.
Vector Symbolic Architecture
Vector Symbolic Architecture is a name that we coined (Gayler, 2003) to describe a class of connectionist models that use high-dimensional vectors (typically around 10,000 dimensions) of low-precision numbers to encode structured information as distributed representations. VSA can represent complex entities such as trees and graphs as vectors. Every such entity, no matter how simple or complex, is represented by a pattern of activation distributed over all the elements of the vector. This general class of architectures traces its origins to the tensor product work of Smolensky (1990), but avoids the exponential growth in dimensionality of tensor products. VSAs employ three types of vector operator: a multiplication-like operator, an addition-like operator, and a permutation-like operator. The multiplication operator is used to associate or bind vectors. The addition operator is used to superpose vectors or add them to a set. The permutation operator is used to quote or protect vectors from the other operations.
The use of hyperdimensional vectors to represent symbols and their combinations provides a number of mathematically desirable and biologically realistic features (Kanerva, 2009). A hyperdimensional vector space contains as many mutually orthogonal vectors as there are dimensions and exponentially many almost-orthogonal vectors (Hecht-Nielsen, 1994), thereby supporting the representation of astronomically large numbers of distinct items. Such representations are also highly robust to noise. Approximately 30% of the values in a vector can be randomly changed before it becomes more similar to another meaningful (previously-defined) vector than to its original form. It is also possible to implement such vectors in a spiking neuron model (Eliasmith, 2005).
The main difference among types of VSAs is in the type of numbers used as vector elements and the related choice of multiplication-like operation. Holographic Reduced Representations (Plate, 2003) use real numbers and circular convolution. Kanerva's (1996) Binary Spatter Codes (BSC) use Boolean values and elementwise exclusive-or. Gayler's (1998) Multiply, Add, Permute coding (MAP) uses values from $\{-1, +1\}$ and elementwise multiplication. A useful feature of BSC and MAP is that each vector is its own multiplicative inverse: multiplying any vector by itself elementwise yields the identity vector. As in ordinary algebra, multiplication and addition are associative and commutative, and multiplication distributes over addition.
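As a concrete sketch of the MAP operators (our own illustration in NumPy; all names and parameter choices are ours, not the paper's):

```python
import numpy as np

D = 10_000                      # vector dimensionality
rng = np.random.default_rng(0)

def rand_vec():
    """A random MAP vector: elements drawn from {-1, +1}."""
    return rng.choice([-1, 1], size=D)

def bind(u, v):   return u * v          # multiplication-like operator
def bundle(u, v): return u + v          # addition-like operator
def permute(v):   return np.roll(v, 1)  # permutation-like operator

def cos(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

A, P = rand_vec(), rand_vec()
assert np.all(bind(A, A) == 1)       # each vector is its own inverse
print(cos(bind(A, bind(A, P)), P))   # 1.0: unbinding recovers P exactly
print(round(cos(permute(A), A), 2))  # ~0.0: permutation decorrelates
```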
We use MAP in the work described here. As an illustration of how VSA can be used to represent graph structure, consider again the optimal mapping {A=P, B=Q, C=R, D=S} for the graphs in Figure 1. We represent this set of mappings as the vector

$$A*P + B*Q + C*R + D*S \qquad (5)$$
where $A$, $B$, $C$, ... are arbitrarily chosen (random) vectors over $\{-1, +1\}$ and $*$ and $+$ represent elementwise vector multiplication and addition respectively. For any mapped vertex pair X=Y, the representation $Y$ of vertex Y can be retrieved by multiplying the mapping vector (containing the term $X*Y$) by $X$, and vice versa. The resulting vector will contain the representation of Y plus a set of representations not corresponding to any vertex, which can be treated as noise; e.g.:
$$\begin{aligned} A*(A*P + B*Q + C*R + D*S) &= A*A*P + A*B*Q + A*C*R + A*D*S \\ &= P + A*B*Q + A*C*R + A*D*S \\ &= P + \text{noise} \end{aligned} \qquad (6)$$
The noise can be removed from the retrieved vector by passing it through a "cleanup memory" that stores only the meaningful vectors $(A, B, C, D, P, Q, R, S)$. Cleanup memory can be implemented in a biologically plausible way as a Hopfield network that associates each meaningful vector to itself (a variant of Hebbian learning). Such networks can reconstruct
the original form of a vector from a highly degraded exemplar, via self-reinforcing feedback dynamics.
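A minimal sketch of Equations 5 and 6 plus cleanup, under the same conventions as the earlier sketch (a nearest-neighbour table stands in for the Hopfield network, as in the simulation described later; all names are ours):

```python
import numpy as np

D = 10_000
rng = np.random.default_rng(1)
names = list("ABCDPQRS")
vecs = {n: rng.choice([-1, 1], size=D) for n in names}  # meaningful vectors

# Equation 5: the mapping {A=P, B=Q, C=R, D=S} as a single vector.
m = sum(vecs[x] * vecs[y] for x, y in ["AP", "BQ", "CR", "DS"])

# Equation 6: unbinding with A yields P plus noise.
noisy = vecs["A"] * m

# Cleanup memory as a nearest-neighbour lookup over the meaningful vectors.
def cleanup(v):
    return max(names, key=lambda n: vecs[n] @ v)

print(cleanup(noisy))  # -> P
```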
Note that although the vectors depicted in Equations 5 and 6 appear complex, they are just vector values like any other. From the point of view of the implementing hardware all vectors are of equal computational complexity. This has profound implications for the resource requirements of VSA-based systems. For example, the computational cost of labelling a graph vertex with a simple attribute or a complex structure is exactly the same.
Our Model
Our goal is to build a distributed implementation of the replicator Equations 3 and 4 by representing the problem as distributed patterns of fixed, high dimension in VSA such that the distributed system has the same dynamics as the localist formulation. As in the localist version, we need a representation $x$ of the evolving state of the system's belief about the vertex mappings, and a representation $w$ of the adjacencies in the association graph.
In the VSA representation of a graph we represent vertices by random hyperdimensional vectors, edges by products of the vectors representing the vertices, and mappings by products of the mapped entities. It is natural to represent the set of vertices as the sum of the vectors representing the vertices. The product of the vertex sets of the two graphs is then identical to the sum of the possible mappings of vertices (Equation 7). That is, the initial value of $x$ can be calculated holistically from the representations of the graphs using only one product operation that does not require decomposition of the vertex set into component vertices. For the graphs in Figure 1:
$$x = (A+B+C+D)*(P+Q+R+S) = A*P + A*Q + \ldots + D*R + D*S \qquad (7)$$
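A quick check of Equation 7 under the same conventions (our illustrative sketch; the identity holds exactly, by distributivity):

```python
import numpy as np

D = 10_000
rng = np.random.default_rng(2)
V = {n: rng.choice([-1, 1], size=D) for n in "ABCDPQRS"}

# One holistic product of the two vertex-set sums ...
x = (V["A"] + V["B"] + V["C"] + V["D"]) * (V["P"] + V["Q"] + V["R"] + V["S"])

# ... is identical to the sum of all 16 possible vertex mappings.
x_explicit = sum(V[i] * V[h] for i in "ABCD" for h in "PQRS")
assert np.array_equal(x, x_explicit)
```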
For VSA it is natural to represent the set of edges of a graph as the sum of the products of the vertices connected by each edge. The product of the edge sets of the two graphs is identical to a sum of products of four vertices. This encodes information about mappings of edges, or equivalently, about compatibility of vertex mappings. That is, one holistic product operation applied to the edge sets is able to encode all the possible edge mappings in constant time no matter how many edges there are.
The reader may have noticed that the description above refers only to edges, whereas Pelillo's association graph also encodes information about the mapping of non-edges in the two graphs. We believe the explicit representation of non-edges is cognitively implausible. However, Pelillo was not concerned with cognitive plausibility. Since our aim here is to reproduce his work, we include non-edges in Equation 8. The distributed vector $w$ functions as the localist association matrix $W$. For the graphs in Figure 1:
$$\begin{aligned} w ={}& (A*B + A*C + A*D + C*D)*(P*Q + P*R + P*S + R*S) \\ &+ (B*C + B*D)*(Q*R + Q*S) \\ ={}& A*B*P*Q + A*B*P*R + \ldots + A*C*R*S + \ldots + C*D*R*S \\ &+ B*C*Q*R + B*C*Q*S + B*D*Q*R + B*D*Q*S \end{aligned} \qquad (8)$$
The terms of this sum correspond to the nonzero elements of Table 1 (allowing for the symmetries due to commutativity). With $x$ and $w$ set up this way, we can compute the payoff vector $\pi$ as the product of $x$ and $w$. As in the localist formulation (Equation 4), this product causes consistent mappings to reinforce each other: evidence is propagated from each vertex mapping to consistent vertex mappings via the edge compatibility information encoded in $w$, because the vertices shared between a mapping term of $x$ and a compatible term of $w$ cancel, leaving the partner mapping. (The terms of Equation 9 have been rearranged to highlight this cancellation.)
$$\pi = x * w = (A*P)*(A*B*P*Q) + \ldots = (A*A)*(P*P)*(B*Q) + \ldots = B*Q + \ldots \qquad (9)$$
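Under the same conventions, Equations 7-9 for the Figure 1 graphs look like this (edge lists follow Equation 8; the sketch and its names are ours). Printing the projections shows consistent mappings such as A=P receiving more support than inconsistent ones such as A=Q:

```python
import numpy as np

D = 10_000
rng = np.random.default_rng(3)
V = {n: rng.choice([-1, 1], size=D) for n in "ABCDPQRS"}

def pairs(ps):
    """Sum of bound pairs, e.g. the edge set A*B + A*C + ..."""
    return sum(V[a] * V[b] for a, b in ps)

# Equation 7: initial mapping state (sum of all 16 possible mappings).
x = pairs([i + h for i in "ABCD" for h in "PQRS"])

# Equation 8: edge*edge plus non-edge*non-edge compatibility terms.
w = pairs(["AB", "AC", "AD", "CD"]) * pairs(["PQ", "PR", "PS", "RS"]) \
  + pairs(["BC", "BD"]) * pairs(["QR", "QS"])

# Equation 9: the payoff vector.
pi = x * w
for m in ["AP", "BQ", "AQ", "BP"]:
    a, b = m
    print(m, round(pi @ (V[a] * V[b]) / D, 1))  # A=P, B=Q score highest
```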
Implementing the update of $x$ (Equation 3) is more challenging for the VSA formulation. As in the localist version, the idea is for corresponding vertex mappings in $x$ and $\pi$ to reinforce each other multiplicatively, in a kind of multiset intersection (denoted here as $\cap$): if $x = (k_1\,A*P + k_2\,B*Q + k_3\,B*R)$ and $\pi = (k_4\,A*P + k_5\,B*Q)$ then $x \cap \pi$ equals $(k_1 k_4\,A*P + k_2 k_5\,B*Q)$, for
non-negative weights $k_1$, $k_2$, $k_3$, $k_4$, and $k_5$. Because of the self-cancellation property of the MAP architecture, simple elementwise multiplication of $x$ and $\pi$ will not work. We could extract the $k_i$ by iterating through each of the pairwise mappings $(A*P, A*Q, \ldots, D*S)$ and dividing $x$ and $\pi$ elementwise by each mapping, but this is the kind of functionally localist approach we argue is neurally implausible. Instead, we need a holistic distributed intersection operator. This can be construed as a special case of lateral inhibition, a winner-takes-all competition, which has traditionally been considered a localist operation (Page, 2000; Levy & Gayler, in press).
Figure 2. A neural circuit for vector intersection.
To implement this intersection operator in a holistic, distributed manner we exploit the third component of the MAP architecture: permutation. Our solution, shown in Figure 2, works as follows: 1: and 2: are registers (vectors of units) loaded with the vectors representing the multisets to be intersected. $P_1()$ computes some arbitrary, fixed permutation of the vector in 1:, and $P_2()$ computes a different fixed permutation of the vector in 2:. Register 3: contains the product of these permuted vectors. Register 4: is a memory (a constant vector value) pre-loaded with each of the possible multiset elements transformed by multiplying it with both permutations of itself. That is, $4{:} = \sum_{i=1}^{M} X_i * P_1(X_i) * P_2(X_i)$, where $M$ is the number of items in the memory vector (4:).

To implement the replicator equations the clean-up memory 4: must be loaded with a pattern based on the sum of all the possible vertex mappings (similar to the initial value of the mapping vector $x$).
To see how this circuit implements intersection, consider the simple case of a system with three meaningful vectors $X$, $Y$, and $Z$ where we want to compute the intersection of $k_1 X$ with $(k_2 X + k_3 Y)$. The first vector is loaded into register 1:, the second into 2:, and the sum $X*P_1(X)*P_2(X) + Y*P_1(Y)*P_2(Y) + Z*P_1(Z)*P_2(Z)$ is loaded into 4:. After passing the register contents through their respective permutations and multiplying the results, register 3: will contain

$$P_1(k_1 X) * P_2(k_2 X + k_3 Y) = k_1 k_2\,P_1(X)*P_2(X) + k_1 k_3\,P_1(X)*P_2(Y)$$
Multiplying registers 3: and 4: together will then result in the desired intersection (relevant terms in bold) plus noise, which can be removed by standard cleanup techniques:

$$\begin{aligned} &(k_1 k_2\,P_1(X)*P_2(X) + k_1 k_3\,P_1(X)*P_2(Y)) \\ &\quad * (X*P_1(X)*P_2(X) + Y*P_1(Y)*P_2(Y) + Z*P_1(Z)*P_2(Z)) \\ &= \mathbf{k_1 k_2\,X} + \text{noise} \end{aligned}$$
In brief, the circuit in Figure 2 works by guaranteeing that the permutations will cancel only for those terms $X_i$ that are present in both input registers, with other terms being rendered as noise.
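A minimal NumPy sketch of the intersection circuit of Figure 2 under the same conventions (fixed random index permutations stand in for $P_1$ and $P_2$; all names are ours):

```python
import numpy as np

D = 10_000
rng = np.random.default_rng(4)
X, Y, Z = (rng.choice([-1, 1], size=D) for _ in range(3))

p1, p2 = rng.permutation(D), rng.permutation(D)  # fixed permutations P1, P2

# Register 4: each meaningful vector bound to both permutations of itself.
mem4 = X * X[p1] * X[p2] + Y * Y[p1] * Y[p2] + Z * Z[p1] * Z[p2]

def intersect(a, b):
    """Registers 1: and 2: in; the multiset intersection (5:) out."""
    reg3 = a[p1] * b[p2]  # register 3: product of the permuted inputs
    return reg3 * mem4    # permutations cancel only for common terms

out = intersect(2 * X, 3 * X + 5 * Y)  # k1=2, k2=3, k3=5
print(round(out @ X / D, 1))           # ~6.0 = k1*k2: X survives
print(round(out @ Y / D, 1))           # ~0.0: Y is rendered as noise
```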
In order to improve noise-reduction it is necessary to sum over several such intersection circuits, each based on different permutations. This sum over permutations has a natural interpretation in terms of sigma-pi units (Rumelhart, Hinton & McClelland, 1986), where each unit calculates the sum of many products of a few inputs from units in the prior layer. The apparent complexity of Figure 2 results from drawing it for ease of explanation rather than correspondence to implementation. The intersection network of Figure 2 could be implemented as a single layer of sigma-pi units.
COMPARING THE APPROACHES
Figure 3 shows the replicator equation approach to graph isomorphism as a recurrent neural circuit. Common to Pelillo's approach and ours is the initialization of a weight vector $w$ with evidence of compatibility of edges and non-edges from the association graph, as well as the computation of the payoff vector $\pi$
from multiplication ($*$) of $x$ and $w$, the computation of the intersection of $x$ and $\pi$ ($\cap$), and the normalization of $x$ ($/$). The VSA formulation additionally requires a cleanup memory ($c$) and an intersection-cleanup memory ($\cap c$), each initialized to a constant value.
Figure 3. A neural circuit for graph isomorphism.
Figure 3 also shows the commonality of the localist and VSA approaches, with the VSA-only components depicted in dashed lines. Note that the architecture is completely fixed and the specifics of the mapping problem to be solved are represented entirely in the patterns of activation loaded into the circuit. Likewise, the circuit does not make any decisions based on the contents of the vectors being manipulated. The product and intersection operators are applied to whatever vectors are present on their inputs and the circuit settles to a stable state representing the solution.
To demonstrate the viability of our approach, we used this circuit with a 10,000-dimensional VSA to deduce isomorphisms for the graphs in Figure 1. This example was chosen to allow direct comparison with Pelillo's results. Although it was not intended as an example of analogical mapping, it does directly address the underlying mechanism of graph isomorphism. Memory and processor limitations made it impractical to implement the main cleanup memory as a Hopfield net ($10^8$ weights), so we simulated the Hopfield net with a table that stored the meaningful vectors and returned the one closest to the noisy version. To implement the intersection circuit from Figure 2 we summed over 50 replicates of that circuit, differing only in their arbitrary permutations. The updated mapping vector was passed back through the circuit until the Euclidean distance between $x_t$ and $x_{t+1}$ was less than 0.001. At each iteration we computed the cosine of $x$ with each item in cleanup memory, in order to compare our VSA implementation with the localist version; however, nothing in our implementation depended on this functionally localist computation.
Figure 4. Convergence of localist (top) and VSA (bottom) implementations.
Figure 4 compares the results of Pelillo’s
localist approach to ours, for the graph is
o-
morphism problem shown in Figure 1. Time
(iterations)
t
is plotted on the abscissa, and the
corresponding values in the mapping ve
c
tor on
the ordinate. For the loca
l
ist version we added
a small amount of Gau
s
sian noise to the state
vector on the first iter
a
tion in order to keep it
from getting stuck on a sa
ddle point; the VSA
version, which starts with a noisy mapping
vector, does not suffer from this pro
b
lem. In
both versions one set of consistent vertex
mappings (shown in marked lines) comes to
dominate the other, inconsistent mappings
(shown in solid lines) in less than 100 iterations.
The obvious difference between the VSA and localist versions is that the localist version settles into a "clean" state corresponding to the characteristic vector in Equation 2, with four values equal to 0.25 and the others equal to zero, whereas in the VSA version the final state approximates this distribution. (The small negative values are an artifact of using the cosine as a metric for comparison.)
CONCLUSIONS AND FUTURE WORK
The work presented here has demonstrated a proof-of-concept that a distributed representation (Vector Symbolic Architecture) can be applied successfully to a problem (graph isomorphism) that until now has been considered the purview of localist modelling. The results achieved with VSA are qualitatively similar to those with the localist formulation. In the process, we have provided an example of how a distributed representation can implement an operation reminiscent of lateral inhibition, winner-takes-all competition, which likewise has been considered to be a localist operation. The ability to model competition among neurally encoded structures and relations, not just individual items or concepts, points to promising new directions for cognitive modelling in general.
The next steps in this research will be to demonstrate the technique on larger graphs and investigate how performance degrades as the graph size exceeds the representational capacity set by the vector dimensionality. We will also investigate the performance of the system in finding subgraph isomorphisms.
Graph isomorphism by itself does not constitute a psychologically realistic analogical mapping system. There are many related problems to be investigated in that broader context. The question of what conceptual information is encoded in the graphs, and how, is foremost. It also seems reasonable to expect constraints on the graphs encoding cognitive structures (e.g. constraints on the maximum and minimum numbers of edges from each vertex). It may be possible to exploit such constraints to improve some aspects of the mapping circuit. For example, it may be possible to avoid the cognitively implausible use of non-edges as evidence for mappings.
Another area we intend to investigate is the requirement for population of the clean-up memories. In this system the clean-up memories are populated from representations of the source and target graphs. This is not unreasonable if retrieval is completely separate from mapping. However, we wish to explore the possibility of intertwining retrieval and mapping. For this to be feasible we would need to reconfigure the mapping so that cleanup memory can be populated with items that have been previously encountered rather than items corresponding to potential mappings.
We expect this approach to provide fertile
lines of research for many years to come.
SOFTWARE DOWNLOAD
MATLAB code implementing the algorithm in (Pelillo, 1999) and our VSA version can be downloaded from tinyurl.com/gidemo
ACKNOWLEDGMENTS
We thank Pentti Kanerva, Tony Plate, and Roger Wales for many useful suggestions.
REFERENCES
Bomze, I. M., Budinich, M., Pardalos, P. M., & Pelillo, M. (1999). The maximum clique problem. In D.-Z. Du & P. M. Pardalos (Eds.), Handbook of combinatorial optimization, Supplement Volume A (pp. 1-74). Boston, MA, USA: Kluwer Academic Publishers.
Eliasmith, C. (2005). Cognition with neurons: A large-scale, biologically realistic model of the Wason task. In G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th Annual Meeting of the Cognitive Science Society.
Eliasmith, C., & Thagard, P. (2001). Integrating structure and meaning: A distributed model of analogical mapping. Cognitive Science, 25, 245-286.

Gayler, R. (1998). Multiplicative binding, representation operators, and analogy. In K. Holyoak, D. Gentner, & B. Kokinov (Eds.), Advances in analogy research: Integration of theory and data from the cognitive, computational, and neural sciences (p. 405). Sofia, Bulgaria: New Bulgarian University.
Gayler, R. W. (2003). Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience. In P. Slezak (Ed.), ICCS/ASCS International Conference on Cognitive Science (pp. 133-138). Sydney, Australia: University of New South Wales.

Hecht-Nielsen, R. (1994). Context vectors: General purpose approximate meaning representations self-organized from raw data. In J. Zurada, R. Marks II, & B. Robinson (Eds.), Computational intelligence: Imitating life (pp. 43-56). IEEE Press.
Holyoak, K., & Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13, 295-355.

Hummel, J., & Holyoak, K. (1997). Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104, 427-466.

Kanerva, P. (1996). Binary spatter-coding of ordered k-tuples. In C. von der Malsburg, W. von Seelen, J. Vorbrüggen, & B. Sendhoff (Eds.), Artificial neural networks (Proceedings of ICANN 96) (pp. 869-873). Berlin: Springer-Verlag.
Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation, 1, 139-159.

Kokinov, B. (1988). Associative memory-based reasoning: How to represent and retrieve cases. In T. O'Shea & V. Sgurev (Eds.), Artificial intelligence III: Methodology, systems, applications (pp. 51-58). Amsterdam: Elsevier Science Publishers B.V. (North Holland).

Levy, S. D., & Gayler, R. W. (in press). "Lateral inhibition" in a fully distributed connectionist architecture. In Proceedings of the Ninth International Conference on Cognitive Modeling (ICCM 2009). Manchester, UK.
Marquis, J.-P. (2009). Category theory. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Spring 2009 Edition). http://plato.stanford.edu/archives/spr2009/entries/category-theory/

Page, M. (2000). Connectionist modelling in psychology: A localist manifesto. Behavioral and Brain Sciences, 23, 443-512.

Pelillo, M. (1999). Replicator equations, maximal cliques, and graph isomorphism. Neural Computation, 11, 1933-1955.
Pelillo, M., Siddiqi, K., & Zucker, S. W. (1999). Matching hierarchical structures using association graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 1105-1120.

Pelillo, M., & Torsello, A. (2006). Payoff-monotonic game dynamics and the maximum clique problem. Neural Computation, 18, 1215-1258.

Plate, T. A. (2003). Holographic reduced representation: Distributed representation for cognitive science. Stanford, CA, USA: CSLI Publications.
Rumelhart, D. E., Hinton, G. E., & McClelland, J. L. (1986). A general framework for parallel distributed processing. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations (pp. 45-76). Cambridge, MA, USA: The MIT Press.

Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46, 159-216.

Stewart, T., & Eliasmith, C. (in press). Compositionality and biologically plausible models. In M. Werning, W. Hinzen, & E. Machery (Eds.), The Oxford handbook of compositionality. Oxford, UK: Oxford University Press.