
# A distributed basis for analogical mapping

Authors:
• Independent Researcher https://www.rossgayler.com

## Abstract

We are concerned with the practical feasibility of the neural basis of analogical mapping. All existing connectionist models of analogical mapping rely to some degree on localist representation (each concept or relation is represented by a dedicated unit/neuron). These localist solutions are implausible because they need too many units for human-level competence or require the dynamic re-wiring of networks on a sub-second time-scale.

Analogical mapping can be formalised as finding an approximate isomorphism between graphs representing the source and target conceptual structures. Connectionist models of analogical mapping implement continuous heuristic processes for finding graph isomorphisms. We present a novel connectionist mechanism for finding graph isomorphisms that relies on distributed, high-dimensional representations of structure and mappings. Consequently, it does not suffer from the problems of the number of units scaling combinatorially with the number of concepts or requiring dynamic network re-wiring.
Ross W. Gayler
r.gayler@gmail.com
School of Communication, Arts and Critical Enquiry
La Trobe University
Victoria 3086 Australia
Simon D. Levy
levys@wlu.edu
Department of Computer Science
Washington and Lee University
Lexington, Virginia USA
GRAPH ISOMORPHISM
Researchers tend to divide the process of
analogy into three stages: retrieval (finding an
appropriate source situation), mapping (identi-
fying the corresponding elements of the source
and target situations), and application. Our
concern is with the mapping stage, which is
essentially about structural correspondence. If
the source and target situations are formally
represented as graphs, the structural correspondence
between them can be described as
approximate graph isomorphism. Any mechanism
for finding graph isomorphisms is, by
definition, a mechanism for finding structural
correspondence and a possible mechanism for
implementing analogical mapping. We are
concerned with the formal underpinning of
analogical mapping (independently of whether
any particular researcher chooses to describe
their specific model in these terms).
It might be supposed that representing
situations as graphs is unnecessarily restrictive.
However, anything that can be formalised can
be represented by a graph. Category theory,
which is effectively a theory of structure and
graphs, is an alternative to set theory as a
foundation for mathematics (Marquis, 2009),
so anything that can be mathematically repre-
sented can be represented as a graph.
It might also be supposed that by working
solely with graph isomorphism we favour
structural correspondence to the exclusion of
other factors that are known to influence ana-
logical mapping, such as semantic similarity
and pragmatics. However, as any formal struc-
ture can be represented by graphs it follows
that semantics and pragmatics can also be en-
coded as graphs. For example, some models of
analogical mapping are based on labelled
graphs with the process being sensitive to label
similarity. However, any label value can be
encoded as a graph and label similarity cap-
tured by the degree of approximate isomor-
phism. Further, the mathematics of graph iso-
morphism has been extended to include attrib-
ute similarity and is commonly used this way
in computer vision and pattern recognition
(Bomze, Budinich, Pardalos & Pelillo, 1999).
The extent to which analogical mapping
based on graph isomorphism is sensitive to
different types of information depends on what
information is encoded into the graphs. Our
current research is concerned only with the
practical feasibility of connectionist implemen-
tations of graph isomorphism. The question of
what information is encoded in the graphs is
separable. Consequently, we are not concerned
with modelling the psychological properties of
analogical mapping as such questions belong
to a completely different level of inquiry.
CONNECTIONIST IMPLEMENTATIONS
It is possible to model analogical map-
ping as a purely algorithmic process. However,
we are concerned with physiological plausibil-
ity and consequently limit our attention to
connectionist models of analogical mapping
such as ACME (Holyoak & Thagard, 1989),
AMBR (Kokinov, 1988), DRAMA (Eliasmith
& Thagard, 2001), and LISA (Hummel &
Holyoak, 1997). These models vary in their
theoretical emphases and the details of their
connectionist implementations, but they all
share a problem in the scalability of the repre-
sentation or construction of the connectionist
mapping network. We contend that this is a
consequence of using localist connectionist
representations or processes. In essence, they
either have to allow in advance for all combi-
natorial possibilities, which requires too many
units (Stewart & Eliasmith, in press), or they
have to construct the required network for each
new mapping task in a fraction of a second.
Problems with localist implementation
Rather than review all the major connec-
tionist models of analogical mapping, we will
use ACME and DRAMA to illustrate the prob-
lem with localist representation. Localist and
distributed connectionist models have often
been compared in terms of properties such as
neural plausibility and robustness. Here, we
are concerned only with a single issue: dy-
namic re-wiring (i.e., the need for connections
to be made between neurons as a function of
the source and target situations to be mapped).
ACME constructs a localist network to
represent possible mappings between the
source and target structures. The network is a
function of the source and target representa-
tions, and a new network has to be constructed
for every source and target pair. A localist unit
is constructed to represent each possible map-
ping between a source vertex and target vertex.
The activation of each unit indicates the degree
of support for the corresponding vertex map-
ping being part of the overall mapping be-
tween the source and target. The connections
between the network units encode compatibil-
ity between the corresponding vertex map-
pings. These connections are a function of the
source and target representations and con-
structed anew for each problem. Compatible
vertex mappings are linked by excitatory con-
nections so that support for plausibility of one
vertex mapping transmits support to compati-
ble mappings. Similarly, inhibitory connec-
tions are used to connect the units representing
incompatible mappings. The network imple-
ments a relaxation labelling that finds a com-
patible set of mappings. The operation of the
mapping network is neurally plausible, but the
process of its construction is not.
The inputs to ACME are symbolic repre-
sentations of the source and target structures.
The mapping network is constructed by a
symbolic process that traverses the source and
target structures. The time complexity of the
traversal will be a function of the size of the
structures to be mapped. Given that we believe
analogical mapping is a continually used core
part of cognition and that all cognitive infor-
mation is encoded as (large) graph structures,
we strongly prefer mapping network setup to
require approximately constant time independ-
ent of the structures to be mapped.
DRAMA is a variant of ACME with dis-
tributed source and target representations.
However, it appears that the process of con-
structing the distributed representation of the
mapping network is functionally localist, re-
quiring a decomposition and sequential tra-
versal of the source and target structures.
Ideally, the connectionist mapping net-
work should have a fixed neural architecture.
The units and their connections should be
fixed in advance and not need to be re-wired in
response to the source and target representa-
tions. The structure of the current mapping
task should be encoded entirely in activations
generated on the fixed neural architecture by
the source and target representations and the
set-up process should be holistic rather than
requiring decomposition of the source and
target representations. Our research aims to
achieve this by using distributed representation
and processing from the VSA family of con-
nectionist models.
We proceed by introducing replicator
equations, a localist heuristic for finding graph
isomorphisms. Then we introduce Vector
Symbolic Architectures (VSA), a family of
distributed connectionist mechanisms for the
representation and manipulation of structured
information. Our novel contribution is to im-
plement replicator equations in a completely
distributed fashion based on VSA. We con-
clude with a proof-of-concept demonstration
of a distributed re-implementation of the prin-
cipal example from the seminal paper on graph
isomorphism via replicator equations.
REPLICATOR EQUATIONS
The approach we are pursuing for graph
isomorphism is based on the work of Pelillo
(1999), who casts subgraph isomorphism as
the problem of finding a maximal clique (set of
mutually adjacent vertices) in the association
graph derived from the two graphs to be
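mapped. Given a graph $G'$ of size $N$ with an $N \times N$ adjacency matrix $A' = (a'_{ij})$ and a graph $G''$ of size $N$ with an $N \times N$ adjacency matrix $A'' = (a''_{hk})$, their association graph $G$ of size $N^2$ can be represented by an $N^2 \times N^2$ adjacency matrix $A = (a_{ih,jk})$ whose edges encode pairs of edges from $G'$ and $G''$:

$$a_{ih,jk} = \begin{cases} 1 - (a'_{ij} - a''_{hk})^2 & \text{if } i \neq j \text{ and } h \neq k \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$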
The elements of $A$ are 1 if the corresponding edges in $G'$ and $G''$ have the same state of existence and 0 if the corresponding edges have different states of existence. Defined this way, the edges of the association graph $G$ provide evidence about potential mappings between the vertices of $G'$ and $G''$ based on whether the corresponding edges and non-edges are consistent. The presence of an edge between two vertices in one graph and an edge between two vertices in the other graph supports a possible mapping between the members of each pair of vertices (as does the absence of such an edge in both graphs).
By treating the graph isomorphism problem as a maximal-clique-finding problem, Pelillo exploits an important result in graph theory. Consider a graph $G$ with adjacency matrix $A$, a subset $C$ of vertices of $G$, and a characteristic vector $x^C$ (indicating membership of the subset $C$) defined as

$$x_i^C = \begin{cases} 1/|C| & \text{if } i \in C \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

where $|C|$ is the cardinality of $C$. It turns out that $C$ is a maximum clique of $G$ if and only if $x^C$ maximizes the function $f(x) = x^T A x$, where $x^T$ is the transpose of $x$, $x \in \mathbb{R}^N$, $\sum_{i=1}^{N} x_i = 1$, and $x_i \geq 0$ for all $i$.
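As a quick check of what the maximum of $f$ looks like (a worked evaluation added here for illustration, assuming the adjacency matrix has a zero diagonal, as the association graph does), substitute the characteristic vector of a clique $C$ with $|C| = k$ into $f$. Every ordered pair of distinct clique members contributes $1/k^2$ and all other terms vanish:

$$f(x^C) = \sum_{i \in C} \sum_{j \in C,\, j \neq i} \frac{1}{k} \cdot \frac{1}{k} = \frac{k(k-1)}{k^2} = 1 - \frac{1}{k}$$

Larger cliques therefore score higher; a clique of four vertices gives $f = 0.75$.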
Starting at some initial condition (typically the barycenter, $x_i = 1/N$, corresponding to all $x_i$ being equally supported as part of the solution), $x$ can be obtained through iterative application of the following equation:

$$x_i(t+1) = \frac{x_i(t)\,\pi_i(t)}{\sum_{j=1}^{N} x_j(t)\,\pi_j(t)} \qquad (3)$$

where

$$\pi_i(t) = \sum_{j=1}^{N} w_{ij}\,x_j(t) \qquad (4)$$
and $W$ is a matrix of weights, $w_{ij}$, typically the adjacency matrix $A$ of the association graph or a linear function of $A$. The $x$ vector can thus be considered to represent the state of the system’s belief about the vertex mappings at a given time, with Equations 3 and 4 representing a dynamical system parameterized by the weights in $W$. $\pi_i$ can be interpreted as the evidence for $x_i$ obtained from all the compatible $x_j$, where the compatibility is encoded by $w_{ij}$. The denominator in Equation 3 is a normalizing factor ensuring that $\sum_{i=1}^{N} x_i = 1$.
Pelillo borrows Equations 3 and 4 from the literature on evolutionary game theory, in which $\pi_i$ is the overall payoff associated with playing strategy $i$, and $w_{ij}$ is the payoff associated with playing strategy $i$ against strategy $j$. In the context of the maximum-clique problem, these replicator equations can be used to derive a vector $x$ (vertex mappings) that maximizes the “payoff” (edge consistency) encoded in the adjacency matrix. Vertex mappings correspond to strategies, and as Equation 3 is iterated, mappings with higher fitness (consistency of mappings) come to dominate ones with lower fitness.
Figure 1. A simple graph isomorphism problem.
Consider the simple graphs in Figure 1,
used as the principal example by Pelillo (1999)
and which we will later re-implement in a dis-
tributed fashion. The maximal isomorphism
between these two graphs is {A=P, B=Q, C=R,
D=S} or {A=P, B=Q, C=S, D=R}. Table 1
shows the first and last rows of the adjacency
matrix for the association graph of these
graphs, generated using Equation 1. Looking at
the first row of the table, we see that the map-
ping A=P is consistent with the mappings
B=Q, B=R, B=S, C=Q, C=R, C=S, D=Q,
D=R, and D=S, but not with A=Q, A=R, A=S,
B=P, etc.
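| | AP | AQ | AR | AS | BP | BQ | BR | BS | CP | CQ | CR | CS | DP | DQ | DR | DS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AP | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| DS | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |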
Table 1. Fragment of adjacency matrix for Fig. 1.
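The two rows of Table 1 can be regenerated directly from Equation 1. The following is a minimal sketch, not the authors' code. The edge lists are an assumption inferred from Table 1 and from the edge sets that appear later in Equation 8 (swapping each graph for its complement yields the same association graph), and the function names are hypothetical.

```python
from itertools import product

# Assumed edge lists for the two graphs of Figure 1 (inferred, not given
# explicitly in the text). Using the complements instead gives the same matrix.
SOURCE_VERTICES = "ABCD"
TARGET_VERTICES = "PQRS"
SOURCE_EDGES = {("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")}
TARGET_EDGES = {("P", "Q"), ("P", "R"), ("P", "S"), ("R", "S")}


def adjacent(edges, u, v):
    """Return 1 if the undirected edge {u, v} exists, else 0."""
    return int((u, v) in edges or (v, u) in edges)


def association_matrix(src_vertices, src_edges, tgt_vertices, tgt_edges):
    """Build the association-graph adjacency matrix of Equation 1."""
    pairs = list(product(src_vertices, tgt_vertices))
    labels = [i + h for i, h in pairs]
    matrix = []
    for i, h in pairs:
        row = []
        for j, k in pairs:
            if i != j and h != k:
                diff = adjacent(src_edges, i, j) - adjacent(tgt_edges, h, k)
                row.append(1 - diff ** 2)  # 1 when both edges exist or both are absent
            else:
                row.append(0)
        matrix.append(row)
    return labels, matrix


labels, A = association_matrix(SOURCE_VERTICES, SOURCE_EDGES,
                               TARGET_VERTICES, TARGET_EDGES)
for name in ("AP", "DS"):
    print(name, A[labels.index(name)])
```

Under these assumptions the printed AP and DS rows should agree with the fragment shown in Table 1.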
Initially, all values in the state vector $x$
are set to 0.0625 (1/16). Repeated application
of Equations 3 and 4 produces a final state
vector that encodes the two maximal isomor-
phisms, with 0.3 in the positions for A=P and
B=Q, 0.1 in the positions for C=R, C=S, D=R,
and D=S, and 0 in the others. The conflicting
mappings for C, D, R, and S correspond to a
saddle point in the dynamics of the replicator
equations, created by the symmetry in the
graphs. Adding a small amount of noise to the
state breaks this symmetry, producing a final
state vector with values of 0.25 for the optimal
mappings A=P, B=Q, and C=R, D=S or C=S,
D=R, and zero elsewhere. The top graph of
Figure 4 shows the time course of the settling
process from our implementation of Pelillo’s
localist algorithm.
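The settling behaviour described above can be sketched in a few lines. This is an illustrative localist implementation of Equations 3 and 4, not the authors' MATLAB code. The edge lists for Figure 1 are an assumption (see Equation 8 for the edge sets they are inferred from), and the noise level, seed, and iteration count are arbitrary hypothetical choices.

```python
import random
from itertools import product

# Assumed edges for the graphs of Figure 1 (inferred, not stated in the text).
SRC_EDGES = {frozenset(e) for e in ("AB", "AC", "AD", "CD")}
TGT_EDGES = {frozenset(e) for e in ("PQ", "PR", "PS", "RS")}
PAIRS = list(product("ABCD", "PQRS"))  # candidate vertex mappings
LABELS = [i + h for i, h in PAIRS]

# Equation 1: two mappings are compatible when the edges they pair up
# are both present or both absent.
W = [[int(i != j and h != k and
          (frozenset((i, j)) in SRC_EDGES) == (frozenset((h, k)) in TGT_EDGES))
      for j, k in PAIRS]
     for i, h in PAIRS]


def replicator_step(x, w):
    """One application of Equations 3 and 4."""
    payoff = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]
    total = sum(x_i * p_i for x_i, p_i in zip(x, payoff))
    return [x_i * p_i / total for x_i, p_i in zip(x, payoff)]


def settle(x, w, iterations=500):
    """Iterate the replicator equations a fixed number of times."""
    for _ in range(iterations):
        x = replicator_step(x, w)
    return x


n = len(LABELS)
barycenter = [1.0 / n] * n
saddle = settle(barycenter, W)  # symmetric start: stays on the saddle point

rng = random.Random(0)  # hypothetical seed
noisy = [v + rng.gauss(0.0, 1e-3) for v in barycenter]  # small symmetry-breaking noise
norm = sum(noisy)
settled = settle([v / norm for v in noisy], W)

for name, s, c in zip(LABELS, saddle, settled):
    print(f"{name}: no noise {s:.3f}, with noise {c:.3f}")
```

Without noise the state should settle at the 0.3 / 0.1 saddle point described in the text. With noise it should settle on one of the two maximal isomorphisms; which one depends on the random seed.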
This example is trivially small. However,
the same approach has been successfully ap-
plied to graphs with more than 65,000 vertices
(Pelillo & Torsello, 2006). It has also been
extended to match hierarchical, attributed
structures for computer vision problems (Pe-
lillo, Siddiqi & Zucker 1999). Thus, we are
confident that replicator equations are a rea-
sonable candidate mechanism for the structure
matching at the heart of analogical mapping.
DISTRIBUTED IMPLEMENTATION
The replicator equation mechanism can
be easily implemented as a localist connection-
ist circuit. This is qualitatively very similar to
ACME and suffers the same problems due to
the localist representation. In this section we
present a distributed connectionist scheme for
representing edges, vertices, and mappings that
does not suffer from these problems.
Vector Symbolic Architecture
Vector Symbolic Architecture is a name
that we coined (Gayler, 2003) to describe a
class of connectionist models that use high-
dimensional vectors (typically around 10,000
dimensions) of low-precision numbers to en-
code structured information as distributed rep-
resentations. VSA can represent complex enti-
ties such as trees and graphs as vectors. Every
such entity, no matter how simple or complex,
is represented by a pattern of activation dis-
tributed over all the elements of the vector.
This general class of architectures traces its
origins to the tensor product work of Smolen-
sky (1990), but avoids the exponential growth
in dimensionality of tensor products. VSAs
employ three types of vector operator: a multiplication-like
operator, an addition-like operator, and a permutation-like
operator. The multiplication operator is used to associate or bind
vectors. The addition operator is used to superpose
vectors or add them to a set. The permutation
operator is used to quote or protect
vectors from the other operations.
The use of hyperdimensional vectors to
represent symbols and their combinations pro-
vides a number of mathematically desirable
and biologically realistic features (Kanerva,
2009). A hyperdimensional vector space con-
tains as many mutually orthogonal vectors as
there are dimensions and exponentially many
almost-orthogonal vectors (Hecht-Nielsen,
1994), thereby supporting the representation of
astronomically large numbers of distinct items.
Such representations are also highly robust to
noise. Approximately 30% of the values in a
vector can be randomly changed before it be-
comes more similar to another meaningful
(previously-defined) vector than to its original
form. It is also possible to implement such
vectors in a spiking neuron model (Eliasmith,
2005).
The main difference among types of
VSAs is in the type of numbers used as vector
elements and the related choice of multiplica-
tion-like operation. Holographic Reduced Rep-
resentations (Plate, 2003) use real numbers and
circular convolution. Kanerva’s (1996) Binary
Spatter Codes (BSC) use Boolean values and
elementwise exclusive-or. Gayler’s (1998)
Multiply, Add, Permute coding (MAP) uses values from $\{-1, +1\}$ and elementwise multiplication. A useful feature of BSC and MAP is that each vector is its own multiplicative inverse. Multiplying any vector by itself elementwise yields the identity vector. As in ordinary algebra, multiplication and addition are associative and commutative, and multiplication distributes over addition.
We use MAP in the work described here. As an illustration of how VSA can be used to represent graph structure, consider again the optimal mapping {A=P, B=Q, C=R, D=S} for the graphs in Figure 1. We represent this set of mappings as the vector

$$A * P + B * Q + C * R + D * S \qquad (5)$$

where $A$, $B$, $C$, ... are arbitrarily chosen (random) vectors over $\{-1, +1\}$ and $*$ and $+$ represent elementwise vector multiplication and addition respectively. For any mapped vertex pair X=Y, the representation $Y$ of vertex Y can be retrieved by multiplying the mapping vector $(X * Y + \dots)$ by $X$, and vice-versa. The resulting vector will contain the representation of Y plus a set of representations not corresponding to any vertex, which can be treated as noise; e.g.:

$$\begin{aligned} A * (A * P + B * Q + C * R + D * S) &= A * A * P + A * B * Q + A * C * R + A * D * S \\ &= P + A * B * Q + A * C * R + A * D * S \\ &= P + \text{noise} \end{aligned} \qquad (6)$$
The noise can be removed from the retrieved vector by passing it through a “cleanup memory” that stores only the meaningful vectors $(A, B, C, D, P, Q, R, S)$. Cleanup memory can be implemented in a biologically plausible way as a Hopfield network that associates each meaningful vector to itself (a variant of Hebbian learning). Such networks can reconstruct the original form of a vector from a highly degraded exemplar, via self-reinforcing feedback dynamics.
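The binding, unbinding, and cleanup steps above can be sketched with MAP vectors as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the seed is arbitrary, and the cleanup memory is a nearest-match lookup table standing in for a Hopfield network.

```python
import random

DIM = 10_000  # typical VSA dimensionality mentioned in the text
rng = random.Random(1)  # hypothetical seed


def random_vector():
    """A random MAP vector with elements drawn from {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(u, v):
    """MAP binding: elementwise multiplication."""
    return [a * b for a, b in zip(u, v)]


def bundle(vectors):
    """MAP superposition: elementwise addition."""
    return [sum(column) for column in zip(*vectors)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cleanup(noisy, memory):
    """Return the name of the stored vector most similar to the noisy input."""
    return max(memory, key=lambda name: dot(noisy, memory[name]))


memory = {name: random_vector() for name in "ABCDPQRS"}

# Equation 5: the mapping {A=P, B=Q, C=R, D=S} as a single vector.
mapping = bundle(bind(memory[s], memory[t]) for s, t in ("AP", "BQ", "CR", "DS"))

# Equation 6: multiplying by A leaves P plus noise, which cleanup removes.
retrieved = bind(memory["A"], mapping)
print(cleanup(retrieved, memory))                    # partner of A
print(cleanup(bind(memory["R"], mapping), memory))   # partner of R
```

Because each vector is its own multiplicative inverse, the same `bind` call serves for both storing and retrieving a mapping, in either direction.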
Note that although the vectors depicted in
Equations 5 and 6 appear complex they are just
vector values like any other. From the point of
view of the implementing hardware all vectors
are of equal computational complexity. This
has profound implications for the resource
requirements of VSA-based systems. For ex-
ample, the computational cost of labelling a
graph vertex with a simple attribute or a com-
plex structure is exactly the same.
Our Model
Our goal is to build a distributed implementation of the replicator Equations 3 and 4 by representing the problem as distributed patterns of fixed, high dimension in VSA such that the distributed system has the same dynamics as the localist formulation. As in the localist version, we need a representation $x$ of the evolving state of the system’s belief about the vertex mappings, and a representation $w$ of the adjacencies in the association graph.
In the VSA representation of a graph we represent vertices by random hyperdimensional vectors, edges by products of the vectors representing the vertices, and mappings by products of the mapped entities. It is natural to represent the set of vertices as the sum of the vectors representing the vertices. The product of the vertex sets of the two graphs is then identical to the sum of the possible mappings of vertices (Equation 7). That is, the initial value of $x$ can be calculated holistically from the representations of the graphs using only one product operation that does not require decomposition of the vertex set into component vertices. For the graphs in Figure 1:

$$x = (A + B + C + D) * (P + Q + R + S) = A * P + A * Q + \dots + B * P + B * Q + \dots + D * R + D * S \qquad (7)$$
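The identity in Equation 7 follows from multiplication distributing over addition, and it can be checked numerically. The sketch below uses hypothetical helper names and an arbitrary seed. It compares the single holistic product of the two vertex sets with an explicit sum over all sixteen pairwise mappings.

```python
import random
from itertools import product

DIM = 10_000
rng = random.Random(2)  # hypothetical seed
vec = {name: [rng.choice((-1, 1)) for _ in range(DIM)] for name in "ABCDPQRS"}


def bind(u, v):
    """Elementwise multiplication."""
    return [a * b for a, b in zip(u, v)]


def bundle(vectors):
    """Elementwise addition of any number of vectors."""
    return [sum(column) for column in zip(*vectors)]


# One holistic product of the two vertex sets (left-hand side of Equation 7).
source_set = bundle(vec[v] for v in "ABCD")
target_set = bundle(vec[v] for v in "PQRS")
holistic = bind(source_set, target_set)

# Explicit enumeration of all 16 candidate mappings (right-hand side).
enumerated = bundle(bind(vec[s], vec[t]) for s, t in product("ABCD", "PQRS"))

print(holistic == enumerated)
```

The two vectors agree exactly, since the arithmetic is on integers. The holistic route uses one product however many vertices there are, whereas the enumeration grows with the number of vertex pairs.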
For VSA it is natural to represent the set
of edges of a graph as the sum of the products
of the vertices connected by each edge. The
product of the edge sets of the two graphs is
identical to a sum of products of four vertices.
This encodes information about mappings of
edges, or equivalently, about compatibility of
vertex mappings. That is, one holistic product
operation applied to the edge sets is able to
encode all the possible edge mappings in con-
stant time no matter how many edges there are.
The reader may have noticed that the de-
scription above refers only to edges, whereas
Pelillo’s association graph also encodes infor-
mation about the mapping of non-edges in the
two graphs. We believe the explicit representa-
tion of non-edges is cognitively implausible.
However, Pelillo was not concerned with cog-
nitive plausibility. Since our aim here is to
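reproduce his work, we include non-edges in Equation 8. The distributed vector $w$ functions as the localist association matrix $W$. For the graphs in Figure 1:

$$\begin{aligned} w &= (B * C + B * D) * (Q * R + Q * S) \\ &\quad + (A * B + A * C + A * D + C * D) * (P * Q + P * R + P * S + R * S) \\ &= B * C * Q * R + B * C * Q * S + B * D * Q * R + B * D * Q * S \\ &\quad + A * B * P * Q + A * B * P * R + \dots + A * B * R * S \\ &\quad + A * C * P * Q + A * C * P * R + \dots + A * C * R * S + \dots + C * D * R * S \end{aligned} \qquad (8)$$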
The terms of this sum correspond to the nonzero elements of Table 1 (allowing for the symmetries due to commutativity). With $x$ and $w$ set up this way, we can compute the payoff vector $\pi$ as the product of $x$ and $w$. As in the localist formulation (Equation 4), this product causes consistent mappings to reinforce each other. Evidence is propagated from each vertex mapping to consistent vertex mappings via the edge compatibility information encoded in $w$. (The terms of Equation 9 have been rearranged to highlight this cancellation.)

$$\begin{aligned} \pi = x * w &= (A * P + A * Q + \dots) * (A * B * P * Q + A * B * P * R + \dots) \\ &= B * Q + B * R + B * P + B * P * Q * R + \dots \end{aligned} \qquad (9)$$
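To see that the product of $x$ and $w$ carries the same evidence as the localist payoff of Equation 4, one can compare the two numerically. The sketch below is illustrative only. The edge and non-edge lists are assumptions read off Equation 8, the helper names are hypothetical, and reading the VSA payoff out with a dot product is the kind of functionally localist probe used purely for comparison.

```python
import math
import random
from itertools import product

DIM = 10_000
rng = random.Random(3)  # hypothetical seed
vec = {name: [rng.choice((-1, 1)) for _ in range(DIM)] for name in "ABCDPQRS"}

# Assumed edge and non-edge lists for Figure 1, taken from Equation 8.
SRC_EDGES, SRC_NON_EDGES = ["AB", "AC", "AD", "CD"], ["BC", "BD"]
TGT_EDGES, TGT_NON_EDGES = ["PQ", "PR", "PS", "RS"], ["QR", "QS"]


def bind(*vectors):
    """Elementwise product of any number of vectors."""
    return [math.prod(column) for column in zip(*vectors)]


def bundle(vectors):
    """Elementwise sum of any number of vectors."""
    return [sum(column) for column in zip(*vectors)]


def pair_set(pairs):
    """Sum of the products of the two vertices named by each pair."""
    return bundle(bind(vec[u], vec[v]) for u, v in pairs)


x = bind(bundle(vec[v] for v in "ABCD"), bundle(vec[v] for v in "PQRS"))  # Eq. 7
w = bundle([bind(pair_set(SRC_EDGES), pair_set(TGT_EDGES)),
            bind(pair_set(SRC_NON_EDGES), pair_set(TGT_NON_EDGES))])      # Eq. 8
payoff = bind(x, w)                                                       # Eq. 9


def vsa_payoff(i, h):
    """Noisy estimate of the evidence for mapping i=h, read out by dot product."""
    return sum(p * a * b for p, a, b in zip(payoff, vec[i], vec[h])) / DIM


def localist_payoff(i, h):
    """Equation 4 with every candidate mapping weighted 1."""
    def linked(edges, u, v):
        return u + v in edges or v + u in edges
    return sum(1 for j, k in product("ABCD", "PQRS")
               if j != i and k != h
               and linked(SRC_EDGES, i, j) == linked(TGT_EDGES, h, k))


for i, h in product("ABCD", "PQRS"):
    print(f"{i}{h}: localist {localist_payoff(i, h)}, VSA {vsa_payoff(i, h):.2f}")
```

The VSA estimates are noisy, because the cross terms that do not cancel act as noise, but they should track the localist counts, with A=P receiving the most support.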
Implementing the update of $x$ (Equation 3) is more challenging for the VSA formulation. As in the localist version, the idea is for corresponding vertex mappings in $x$ and $\pi$ to reinforce each other multiplicatively, in a kind of multiset intersection (denoted here as $\wedge$): if $x = (k_1 A * P + k_2 B * Q + k_3 B * R)$ and $\pi = (k_4 A * P + k_5 B * Q)$ then $x \wedge \pi$ equals $(k_1 k_4 A * P + k_2 k_5 B * Q)$, for non-negative weights $k_1$, $k_2$, $k_3$, $k_4$, and $k_5$.

Because of the self-cancellation property of the MAP architecture, simple elementwise multiplication of $x$ and $\pi$ will not work. We could extract the $k_i$ by iterating through each of the pairwise mappings $(A * P, A * Q, \dots, D * S)$ and dividing $x$ and $\pi$ elementwise by each mapping, but this is the kind of functionally localist approach we argue is neurally implausible. Instead, we need a holistic distributed intersection operator. This can be construed as a special case of lateral inhibition, a winner-takes-all competition, which has traditionally been considered a localist operation (Page, 2000; Levy & Gayler, in press).
Figure 2. A neural circuit for vector intersection.
To implement this intersection operator
in a holistic, distributed manner we exploit the
third component of the MAP architecture:
permutation. Our solution, shown in Figure 2,
works as follows: 1: and 2: are registers (vec-
tors of units) loaded with the vectors represent-
ing the multisets to be intersected. P1( ) com-
putes some arbitrary, fixed permutation of the
vector in 1:, and P2( ) computes a different
fixed permutation of the vector in 2:. Register
3: contains the product of these permuted vec-
tors. Register 4: is a memory (a constant vec-
tor value) pre-loaded with each of the possible
multiset elements transformed by multiplying
it with both permutations of itself. That is, $4\!: = \sum_{i=1}^{M} X_i * P_1(X_i) * P_2(X_i)$, where $M$ is the number of items in the memory vector (4:). To implement the replicator equations the clean-up memory 4: must be loaded with a pattern based on the sum of all the possible vertex mappings (similar to the initial value of the mapping vector $x$).
To see how this circuit implements intersection, consider the simple case of a system with three meaningful vectors $X$, $Y$, and $Z$ where we want to compute the intersection of $k_1 X$ with $(k_2 X + k_3 Y)$. The first vector is loaded into register 1:, the second into 2:, and the sum $X * P_1(X) * P_2(X) + Y * P_1(Y) * P_2(Y) + Z * P_1(Z) * P_2(Z)$ is loaded into 4:. After passing the register contents through their respective permutations and multiplying the results, register 3: will contain

$$P_1(k_1 X) * P_2(k_2 X + k_3 Y) = k_1 k_2 P_1(X) * P_2(X) + k_1 k_3 P_1(X) * P_2(Y)$$

Multiplying registers 3: and 4: together will then result in the desired intersection (relevant terms in bold) plus noise, which can be removed by standard cleanup techniques:

$$\begin{aligned} &(\mathbf{k_1 k_2 P_1(X) * P_2(X)} + k_1 k_3 P_1(X) * P_2(Y)) * (\mathbf{X * P_1(X) * P_2(X)} + Y * P_1(Y) * P_2(Y) + Z * P_1(Z) * P_2(Z)) \\ &\quad = \mathbf{k_1 k_2 X} + \text{noise} \end{aligned}$$
In brief, the circuit in Figure 2 works by guaranteeing that the permutations will cancel only for those terms $X_i$ that are present in both input registers, with other terms being rendered as noise.
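The three-vector example can be simulated directly. The following is a minimal sketch of a single intersection circuit, assuming random index shuffles for the two fixed permutations; the weights, seed, and helper names are hypothetical choices, and the result is read out with a dot product only to inspect it.

```python
import math
import random

DIM = 10_000
rng = random.Random(4)  # hypothetical seed


def random_vector():
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def random_permutation():
    """A fixed, arbitrary reordering of the vector indices."""
    indices = list(range(DIM))
    rng.shuffle(indices)
    return indices


def permute(v, perm):
    return [v[i] for i in perm]


def bind(*vectors):
    """Elementwise product of any number of vectors."""
    return [math.prod(column) for column in zip(*vectors)]


def bundle(vectors):
    """Elementwise sum of any number of vectors."""
    return [sum(column) for column in zip(*vectors)]


def scale(k, v):
    return [k * a for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


X, Y, Z = random_vector(), random_vector(), random_vector()
P1, P2 = random_permutation(), random_permutation()
k1, k2, k3 = 1, 2, 3  # illustrative weights

register1 = scale(k1, X)                                   # k1 X
register2 = bundle([scale(k2, X), scale(k3, Y)])           # k2 X + k3 Y
register3 = bind(permute(register1, P1), permute(register2, P2))
register4 = bundle(bind(v, permute(v, P1), permute(v, P2)) for v in (X, Y, Z))
output = bind(register3, register4)                        # k1 k2 X + noise

for name, v in (("X", X), ("Y", Y), ("Z", Z)):
    print(name, round(dot(output, v) / DIM, 2))
```

The readout for X should be close to $k_1 k_2 = 2$, while Y and Z should stay near zero, since only X is present in both input registers.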
In order to improve noise-reduction it is
necessary to sum over several such intersection
circuits, each based on different permutations.
This sum over permutations has a natural in-
terpretation in terms of sigma–pi units (Ru-
melhart, Hinton & McClelland, 1986), where
each unit calculates the sum of many products
of a few inputs from units in the prior layer.
The apparent complexity of Figure 2 results
from drawing it for ease of explanation rather
than correspondence to implementation. The
intersection network of Figure 2 could be im-
plemented as a single layer of sigma–pi units.
COMPARING THE APPROACHES
Figure 3 shows the replicator equation approach to graph isomorphism as a recurrent neural circuit. Common to Pelillo’s approach and ours is the initialization of a weight vector $w$ with evidence of compatibility of edges and non-edges from the association graph, as well as the computation of the payoff vector $\pi$ from multiplication ($*$) of $x$ and $w$, the computation of the intersection of $x$ and $\pi$ ($\wedge$), and the normalization of $x$ ($/\Sigma$). The VSA version additionally uses a cleanup memory ($c$) and an intersection-cleanup memory ($c_\wedge$), each initialized to a constant value.
Figure 3. A neural circuit for graph isomorphism.
Figure 3 also shows the commonality of
the localist and VSA approaches, with the
VSA-only components depicted in dashed
lines. Note that the architecture is completely
fixed and the specifics of the mapping problem
to be solved are represented entirely in the
patterns of activation loaded into the circuit.
Likewise, the circuit does not make any deci-
sions based on the contents of the vectors be-
ing manipulated. The product and intersection
operators are applied to whatever vectors are
present on their inputs and the circuit settles to
a stable state representing the solution.
To demonstrate the viability of our ap-
proach, we used this circuit with a 10,000-
dimensional VSA to deduce isomorphisms for
the graphs in Figure 1. This example was cho-
sen to allow direct comparison with Pelillo’s
results. Although it was not intended as an
example of analogical mapping, it does di-
rectly address the underlying mechanism of
graph isomorphism. Memory and processor
limitations made it impractical to implement
the main cleanup memory as a Hopfield net
($10^8$ weights), so we simulated the Hopfield
net with a table that stored the meaningful vectors
and returned the one closest to the noisy
version. To implement the intersection circuit
from Figure 2 we summed over 50 replicates
of that circuit, differing only in their arbitrary
permutations. The updated mapping vector
was passed back through the circuit until the
Euclidean distance between $x_t$ and $x_{t+1}$ was
less than 0.001. At each iteration we
computed the cosine of $x$ with each item in
cleanup memory, in order to compare our VSA
implementation with the localist version; how-
ever, nothing in our implementation depended
on this functionally localist computation.
Figure 4. Convergence of localist (top) and VSA
(bottom) implementation.
Figure 4 compares the results of Pelillo’s
localist approach to ours, for the graph isomorphism
problem shown in Figure 1. Time (iterations) $t$
is plotted on the abscissa, and the
corresponding values in the mapping vector on
the ordinate. For the localist version we added
a small amount of Gaussian noise to the state
vector on the first iteration in order to keep it
from getting stuck on a saddle point; the VSA
version, which starts with a noisy mapping
vector, does not suffer from this problem. In
both versions one set of consistent vertex
mappings (shown in marked lines) comes to
dominate the other, inconsistent mappings
(shown in solid lines) in less than 100 itera-
tions.
The obvious difference between the VSA and
localist versions is that the localist version
settles into a “clean” state corresponding to the
characteristic vector in Equation 2, with four
values equal to 0.25 and the others equal to
zero; whereas in the VSA version the final
state approximates this distribution. (The small
negative values are an artifact of using the
cosine as a metric for comparison.)
CONCLUSIONS AND FUTURE WORK
The work presented here has demon-
strated a proof-of-concept that a distributed
representation (Vector Symbolic Architecture)
can be applied successfully to a problem
(graph isomorphism) that until now has been
considered the purview of localist modelling.
The results achieved with VSA are qualita-
tively similar to those with the localist formu-
lation. In the process, we have provided an
example of how a distributed representation
can implement an operation reminiscent of
lateral inhibition, winner-takes-all competition,
which likewise has been considered to be a
localist operation. The ability to model compe-
tition among neurally encoded structures and
relations, not just individual items or concepts,
points to promising new directions for cogni-
tive modelling in general.
The next steps in this research will be to
demonstrate the technique on larger graphs and
investigate how performance degrades as the
graph size exceeds the representational capac-
ity set by the vector dimensionality. We will
also investigate the performance of the system
in finding subgraph isomorphisms.
Graph isomorphism by itself does not
constitute a psychologically realistic analogical
mapping system. There are many related prob-
lems to be investigated in that broader context.
The question of what conceptual information is
encoded in the graphs, and how, is foremost. It
also seems reasonable to expect constraints on
the graphs encoding cognitive structures (e.g.
constraints on the maximum and minimum
numbers of edges from each vertex). It may be
possible to exploit such constraints to improve
some aspects of the mapping circuit. For ex-
ample, it may be possible to avoid the cogni-
tively implausible use of non-edges as evi-
dence for mappings.
Another area we intend to investigate is
how the clean-up memories are populated. In
this system the clean-up memories are
populated from representations of the source
and target graphs. This is not unreasonable if
retrieval is completely separate from mapping.
However, we wish to explore the possibility of
intertwining retrieval and mapping. For this to
be feasible we would need to reconfigure the
mapping so that the clean-up memory can be
populated with items that have been previously
encountered rather than items corresponding to
potential mappings.
We expect this approach to provide fer-
tile lines of research for many years to come.
MATLAB code implementing the algo-
rithm in (Pelillo, 1999) and our VSA version
ACKNOWLEDGMENTS
We thank Pentti Kanerva, Tony Plate,
and Roger Wales for many useful suggestions.
REFERENCES
Bomze, I. M., Budinich, M., Pardalos, P. M.,
& Pelillo, M. (1999) The Maximum
Clique Problem. In D.-Z. Du & P. M.
Pardalos (Eds.) Handbook of combinato-
rial optimization. Supplement Volume A
(pp. 1-74). Boston, MA, USA: Kluwer
Academic Publishers.
Eliasmith, C. (2005). Cognition with neurons:
A large-scale, biologically realistic
model of the Wason task. In G. Bara, L.
Barsalou, & M. Bucciarelli (Eds.), Pro-
ceedings of the 27th Annual Meeting of
the Cognitive Science Society.
Eliasmith, C., & Thagard, P. (2001). Integrat-
ing structure and meaning: A distributed
model of analogical mapping. Cognitive
Science, 25, 245-286.
Gayler, R. (1998). Multiplicative binding,
representation operators, and analogy. In K.
Holyoak, D. Gentner, & B. Kokinov
(Eds.), Advances in analogy research: In-
tegration of theory and data from the
cognitive, computational, and neural sci-
ences (p. 405). Sofia, Bulgaria: New Bul-
garian University.
Gayler, R. W. (2003). Vector Symbolic
Architectures answer Jackendoff's challenges
for cognitive neuroscience. In P.
Slezak (Ed.), ICCS/ASCS International
Conference on Cognitive Science (pp.
133-138). Sydney, Australia: University
of New South Wales.
Hecht-Nielsen, R. (1994). Context vectors:
general purpose approximate meaning
representations self-organized from raw
data. In J. M. Zurada, R. J. Marks II, & C. J.
Robinson (Eds.), Computational intelligence:
Imitating life (pp. 43-56). IEEE Press.
Holyoak, K. J., & Thagard, P. (1989). Analogical
mapping by constraint satisfaction.
Cognitive Science, 13, 295-355.
Hummel, J., & Holyoak, K. (1997). Distrib-
uted representations of structure: A the-
ory of analogical access and mapping.
Psychological Review, 104, 427-466.
Kanerva, P. (1996). Binary spatter-coding of
ordered k-tuples. In C. von der Malsburg,
W. von Seelen, J. Vorbrüggen, & B.
Sendhoff (Eds.), Artificial neural net-
works (Proceedings of ICANN 96) (pp.
869-873). Berlin: Springer-Verlag.
Kanerva, P. (2009). Hyperdimensional com-
puting: An introduction to computing in
distributed representation with high-
dimensional random vectors. Cognitive
Computation, 1, 139-159.
Kokinov, B. (1988). Associative memory-
based reasoning: How to represent and
retrieve cases. In T. O’Shea & V. Sgurev
(Eds.), Artificial intelligence III: Meth-
odology, systems, applications (pp. 51-
58). Amsterdam: Elsevier Science Pub-
lishers B.V. (North Holland).
Levy, S. D., & Gayler, R. W. (in press). "Lat-
eral inhibition" in a fully distributed con-
nectionist architecture. In Proceedings of
the Ninth International Conference on
Cognitive Modeling (ICCM 2009). Man-
chester, UK.
Marquis, J.-P. (2009). Category theory. In E.
N. Zalta (Ed.), The Stanford Encyclope-
dia of Philosophy (Spring 2009 Edition),
http://plato.stanford.edu/archives/spr2009/entries/category-theory/
Page, M. (2000). Connectionist modelling in
psychology: A localist manifesto. Behav-
ioral and Brain Sciences, 23, 443-512.
Pelillo, M. (1999). Replicator equations,
maximal cliques, and graph isomorphism.
Neural Computation, 11, 1933-1955.
Pelillo, M., Siddiqi, K., & Zucker, S. W.
(1999). Matching hierarchical structures
using association graphs. IEEE Transac-
tions on Pattern Analysis and Machine
Intelligence, 21, 1105-1120.
Pelillo, M., & Torsello, A. (2006). Payoff-
monotonic game dynamics and the
maximum clique problem. Neural Com-
putation, 18, 1215-1258.
Plate, T. A. (2003). Holographic reduced rep-
resentation: Distributed representation
for cognitive science. Stanford, CA,
USA: CSLI Publications.
Rumelhart, D. E., Hinton, G. E., &
McClelland, J. L. (1986). A general
framework for parallel distributed proc-
essing. In D. E. Rumelhart & J. L.
McClelland (Eds.), Parallel distributed
processing: Explorations in the micro-
structure of cognition. Volume 1: Foun-
dations (pp. 45-76). Cambridge, MA,
USA: The MIT Press.
Smolensky, P. (1990). Tensor product variable
binding and the representation of sym-
bolic structures in connectionist systems.
Artificial Intelligence, 46, 159-216.
Stewart, T., & Eliasmith, C. (in press).
Compositionality and biologically plausi-
ble models. In M. Werning, W. Hinzen,
& E. Machery (Eds.), The Oxford hand-
book of compositionality. Oxford, UK:
Oxford University Press.
... The problem with the current models of HDC/VSA for analogical mapping is that they lack interaction and competition of consistent alternative mappings. They could probably be improved by using an approach involving the associative memory akin to [109]. ...
... The abstract formulation of isomorphism is graph isomorphism. In [109], an interesting scheme was proposed for finding the graph isomorphism with HDC/VSA and associative memory. The scheme used the mechanism proposed in [250]. ...
... Nevertheless, it was demonstrated in [193] that the Tensor Product Representations model (Section 2.3.2 in [222]) can be used to formalize MINERVA 2 as a fixed size tensor of order four. Moreover, it was demonstrated that the lateral inhibition mechanism for HDC/VSA [109] and HRR can be used to approximate MINERVA 2 with HVs. HVs allowed compressing the exact formulation of the model, which relies on tensors, into several HVs, thus making the model more computationally tractable at the cost of lossy representation in HVs. ...
Article
Full-text available
This is Part II of the two-part comprehensive survey devoted to a computing framework most commonly known under the names Hyperdimensional Computing and Vector Symbolic Architectures (HDC/VSA). Both names refer to a family of computational models that use high-dimensional distributed representations and rely on the algebraic properties of their key operations to incorporate the advantages of structured symbolic representations and vector distributed representations. Holographic Reduced Representations [322, 327] is an influential HDC/VSA model that is well-known in the machine learning domain and often used to refer to the whole family. However, for the sake of consistency, we use HDC/VSA to refer to the field. Part I of this survey [223] covered foundational aspects of the field, such as the historical context leading to the development of HDC/VSA, key elements of any HDC/VSA model, known HDC/VSA models, and the transformation of input data of various types into high-dimensional vectors suitable for HDC/VSA. This second part surveys existing applications, the role of HDC/VSA in cognitive computing and architectures, as well as directions for future work. Most of the applications lie within the Machine Learning/Artificial Intelligence domain, however, we also cover other applications to provide a complete picture. The survey is written to be useful for both newcomers and practitioners.
... The works above were focused on the case where a single HV was used to store information but as it was demonstrated in [14] the decoding from HVs can be improved if the redundant storage is used. As an example, the "multiset intersection" circuit in [35] effectively does this by summing over multiple copies of the same data (derived from multiple data permutations) to average away noise. ...
... Figure 3 presents examples of both directed and undirected graphs. First, we consider the following simple transformation of graphs into HVs [35]. A random HV is assigned to each node of the graph, following Figure 3 node HVs are denoted by letters (i.e., a for node "a" and so on). ...
... For graphs that have the same node HVs, the dot product is a measure of the number of overlapping edges. The described graph representations do not represent isolated vertices, but this could be fixed, by, e.g., separate representations of the vertex set and edge set [35]. ...
Article
Full-text available
This two-part comprehensive survey is devoted to a computing framework most commonly known under the names Hyperdimensional Computing and Vector Symbolic Architectures (HDC/VSA). Both names refer to a family of computational models that use high-dimensional distributed representations and rely on the algebraic properties of their key operations to incorporate the advantages of structured symbolic representations and vector distributed representations. Notable models in the HDC/VSA family are Tensor Product Representations, Holographic Reduced Representations, Multiply-Add-Permute, Binary Spatter Codes, and Sparse Binary Distributed Representations but there are other models too. HDC/VSA is a highly interdisciplinary field with connections to computer science, electrical engineering, artificial intelligence, mathematics, and cognitive science. This fact makes it challenging to create a thorough overview of the field. However, due to a surge of new researchers joining the field in recent years, the necessity for a comprehensive survey of the field has become extremely important. Therefore, amongst other aspects of the field, this Part I surveys important aspects such as: known computational models of HDC/VSA and transformations of various input data types to high-dimensional distributed representations. Part II of this survey [84]
... To be able to estimate these constituent factors, visual perception must begin with the observed luminance and solve an inverse problem that involves undoing the multiplication by which the attributes were combined 1,4 . This factorization problem is also at the core of other levels of the conceptual hierarchy, such as factoring time-varying pixel data of dynamic scenes into persistent and dynamic components [6][7][8][9] , factoring a sentence structure into roles and fillers 10,11 , and finally cognitive analogical reasoning [12][13][14][15][16] . How these factorization problems could be solved efficiently by biological neural circuits is still unclear to date. ...
... Note that the application of in-memory factorizers can go beyond visual perception, as factorization problems arise everywhere in perception and cognition, for instance in analogical reasoning [12][13][14][15][16]42 . Other applications include tree search 21 and the prime factorization of integers 43 . ...
Preprint
Full-text available
Disentanglement of constituent factors of a sensory signal is central to perception and cognition and hence is a critical task for future artificial intelligence systems. In this paper, we present a compute engine capable of efficiently factorizing holographic perceptual representations by exploiting the computation-in-superposition capability of brain-inspired hyperdimensional computing and the intrinsic stochasticity associated with analog in-memory computing based on nanoscale memristive devices. Such an iterative in-memory factorizer is shown to solve at least five orders of magnitude larger problems that cannot be solved otherwise, while also significantly lowering the computational time and space complexity. We present a large-scale experimental demonstration of the factorizer by employing two in-memory compute chips based on phase-change memristive devices. The dominant matrix-vector multiply operations are executed at O(1) thus reducing the computational time complexity to merely the number of iterations. Moreover, we experimentally demonstrate the ability to factorize visual perceptual representations reliably and efficiently.
... One promising approach is to factorize representations in which various aspects of knowledge are represented separately and can then be flexibly recombined to represent novel experiences [1] with better downstream performance for abstract reasoning tasks [2]. Solving such factorization problems is also fundamental to biological perception and cognition, e.g., factoring sensory and spatial M. Hersche, G. Karunaratne representations [3], [4], factoring time-varying pixel data [5]- [8], factoring a sentence structure [9], [10], and analogical reasoning [11]- [15]. Although it is unclear how the biological neural circuits may solve these factorization problems, one elegant solution is to cast the entanglement and disentanglement of neurally-encoded information as multiplication and unmultiplication (factorization) of high-dimensional distributed vectors representing neural activities. ...
Preprint
Full-text available
Distributed sparse block codes (SBCs) exhibit compact representations for encoding and manipulating symbolic data structures using fixed-with vectors. One major challenge however is to disentangle, or factorize, such data structures into their constituent elements without having to search through all possible combinations. This factorization becomes more challenging when queried by noisy SBCs wherein symbol representations are relaxed due to perceptual uncertainty and approximations made when modern neural networks are used to generate the query vectors. To address these challenges, we first propose a fast and highly accurate method for factorizing a more flexible and hence generalized form of SBCs, dubbed GSBCs. Our iterative factorizer introduces a threshold-based nonlinear activation, a conditional random sampling, and an $\ell_\infty$-based similarity metric. Its random sampling mechanism in combination with the search in superposition allows to analytically determine the expected number of decoding iterations, which matches the empirical observations up to the GSBC's bundling capacity. Secondly, the proposed factorizer maintains its high accuracy when queried by noisy product vectors generated using deep convolutional neural networks (CNNs). This facilitates its application in replacing the large fully connected layer (FCL) in CNNs, whereby C trainable class vectors, or attribute combinations, can be implicitly represented by our factorizer having F-factor codebooks, each with $\sqrt[\leftroot{-2}\uproot{2}F]{C}$ fixed codevectors. We provide a methodology to flexibly integrate our factorizer in the classification layer of CNNs with a novel loss function. We demonstrate the feasibility of our method on four deep CNN architectures over CIFAR-100, ImageNet-1K, and RAVEN datasets. In all use cases, the number of parameters and operations are significantly reduced compared to the FCL.
... The VSA representations can be composed, decomposed, probed and transformed in various ways using a set of well-defined operations, including binding, unbinding, bundling (that is, additive superposition), permutations and associative memory. The compositionality and transparency enabled the use of VSAs in analogical reasoning [34][35][36][37][38] , but these inspiring works lacked a perception module to process the raw sensory inputs. Instead, they assumed a perception system, for example, a symbolic parser, to provide the symbolic representations that support the reasoning. ...
Article
Full-text available
Neither deep neural networks nor symbolic artificial intelligence (AI) alone has approached the kind of intelligence expressed in humans. This is mainly because neural networks are not able to decompose joint representations to obtain distinct objects (the so-called binding problem), while symbolic AI suffers from exhaustive rule searches, among other problems. These two problems are still pronounced in neuro-symbolic AI, which aims to combine the best of the two paradigms. Here we show that the two problems can be addressed with our proposed neuro-vector-symbolic architecture (NVSA) by exploiting its powerful operators on high-dimensional distributed representations that serve as a common language between neural networks and symbolic AI. The efficacy of NVSA is demonstrated by solving Raven’s progressive matrices datasets. Compared with state-of-the-art deep neural network and neuro-symbolic approaches, end-to-end training of NVSA achieves a new record of 87.7% average accuracy in RAVEN, and 88.1% in I-RAVEN datasets. Moreover, compared with the symbolic reasoning within the neuro-symbolic approaches, the probabilistic reasoning of NVSA with less expensive operations on the distributed representations is two orders of magnitude faster.
... Holographic Graph Representation: There are existing research works focused on high-dimensional and holographic graph representation. Work in Gayler and Levy (2009) represented graphs in an HDC model by binding together vertices to represent edges and adding the vectors together. However, they specified only a single graph isomorphism problem that can be solved using their model, without specifying how their model can be generalized to solve additional problems. ...
Article
Full-text available
Memorization is an essential functionality that enables today's machine learning algorithms to provide a high quality of learning and reasoning for each prediction. Memorization gives algorithms prior knowledge to keep the context and define confidence for their decision. Unfortunately, the existing deep learning algorithms have a weak and nontransparent notion of memorization. Brain-inspired HyperDimensional Computing (HDC) is introduced as a model of human memory. Therefore, it mimics several important functionalities of the brain memory by operating with a vector that is computationally tractable and mathematically rigorous in describing human cognition. In this manuscript, we introduce a brain-inspired system that represents HDC memorization capability over a graph of relations. We propose GrapHD , hyperdimensional memorization that represents graph-based information in high-dimensional space. GrapHD defines an encoding method representing complex graph structure while supporting both weighted and unweighted graphs. Our encoder spreads the information of all nodes and edges across into a full holistic representation so that no component is more responsible for storing any piece of information than another. Then, GrapHD defines several important cognitive functionalities over the encoded memory graph. These operations include memory reconstruction, information retrieval, graph matching, and shortest path. Our extensive evaluation shows that GrapHD : (1) significantly enhances learning capability by giving the notion of short/long term memorization to learning algorithms, (2) enables cognitive computing and reasoning over memorization graph, and (3) enables holographic brain-like computation with substantial robustness to noise and failure.
... In the domain of language, it has been argued that a factorization of sentence structure into "roles" and "fillers" is required for robust and flexible processing (Smolensky, 1990;Jackendoff, 2002). Many cognitive tasks, such as analogical reasoning, also require a form of factorization (Hummel & Holyoak, 1997;Kanerva, 1998;Plate, 2000a;Gayler & Levy, 2009). However, to date, it has been unclear how these factorization problems could be represented and solved efficiently by neural circuits in the brain. ...
Article
The ability to encode and manipulate data structures with distributed neural representations could qualitatively enhance the capabilities of traditional neural networks by supporting rule-based symbolic reasoning, a central property of cognition. Here we show how this may be accomplished within the framework of vector symbolic architectures (VSA) (Plate, 1991; Gayler, 1998; Kanerva, 1996), whereby data structures are encoded by combining high-dimensional vectors with operations that together form an algebra on the space of distributed representations. In particular, we propose an efficient solution to a hard combinatorial search problem that arises when decoding elements of a VSA data structure: the factorization of products of multiple code vectors. Our proposed algorithm, called a resonator network, is a new type of recurrent neural network that interleaves VSA multiplication operations and pattern completion. We show in two examples—parsing of a tree-like data structure and parsing of a visual scene—how the factorization problem arises and how the resonator network can solve it. More broadly, resonator networks open the possibility of applying VSAs to myriad artificial intelligence problems in real-world domains. The companion paper in this issue (“Resonator Networks, 2: Factorization Performance and Capacity Compared to Optimization-Based Methods” by Kent, Frady, Sommer, and Olshausen) presents a rigorous analysis and evaluation of the performance of resonator networks, showing it outperforms alternative approaches.
... • and a recurrent VSA circuit to provide the dynamics to evolve the substitution vector 6 Maximal subgraph isomorphism circuit • Maximal subgraph isomorphism circuit (Gayler & Levy, 2009) • Finds the maximal subgraph isomorphism between two graphs represented as vectors • Implemented as a recurrent VSA circuit with a register containing a vertex substitution vector that evolves and settles over the course of the computation • Final state of substitution vector represents the set of vertex substitutions that best transforms each static graph into the other graph 7 ...
Presentation
Full-text available
It has been argued that analogy is at the core of cognition [7, 1]. My work in VSA is driven by the goal of building a practical, effective analogical memory/reasoning system. Analogy is commonly construed as structure mapping between a source and target [5], which in turn can be construed as representing the source and target as graphs and finding maximal graph isomorphisms between them. This can also be viewed as a kind of dynamic similarity in that the initially dissimilar source and target are effectively very similar after mapping. Similarity (the angle between vectors) is central to the mechanics of VSA/HDC. Introductory papers (e.g. [8]) necessarily devote space to vector similarityand the effect of the primitive operators (sum, product, permutation) on similarity. Most VSA examples rely on static similarity, where the vector representations are fixed over the time scale of the core computation (which is usually a single-pass, feed-forward computation). This emphasises encoding methods (e.g. [12, 13]) that create vector representations with the similarity structure required by the core computation. Random Indexing [13] is an instance of the vector embedding approach to representation [11] that is widely used in NLP and ML. The important point is that the vector embeddings are developed in advance and then used as static representations (with fixed similarity structure) in the subsequent computation of interest. Human similarity judgments are known to be context-dependent (see [3] for a brief review). It has also been argued that similarity and analogy are based on the same processes [6] and that cognition is so thoroughly context-dependent that representations are created on-the-fly in response to task demands [2]. 
This seems extreme, but doesn’t necessarily imply that the base representations are context-dependent as long as the cognitive process that compares them is context-dependent, which can be achieved by having dynamic representations that are derived from the static base representations by context-dependent transforms (or any functionally equivalent process). An obvious candidate for a dynamic transformation function in VSA is substitution by binding, because the substitution can be specified as a vector and dynamically generated (see Representing substitution with a computed mapping in [8]). This implies an internal degree of freedom (a register to hold the substitution vector while it evolves) and a recurrent VSA circuit to provide the dynamics to evolve the substitution vector. These essential aspects are present in [4], which finds the maximal subgraph isomorphism between two graphs represented as vectors. This is implemented as a recurrent VSA circuit with a register containing a substitution vector that evolves and settles over the course of the computation. The final state of the substitution vector represents the set of substitutions that transforms the static base representation of each graph into the best subgraph isomorphism to the static base representation of the other graph. This is a useful step along the path to an analogical memory system. Interestingly, the subgraph isomorphism circuit can be interpreted as related to the recently developed Resonator Circuits for factorisation of VSA representations [9], which have internal degrees of freedom for each of the factors to be calculated and a recurrent VSA dynamics that settles on the factorisation. The graph isomorphism circuit can be interpreted as finding a factor (the substitution vector) such that the product of that factor with each of the graphs is the best possible approximation to the other graph. 
This links the whole enterprise back to statistical modelling, where there is a long history of approximating matrices/tensors as the product of simpler factors [10]. References 1. Blokpoel, M., Wareham, T., Haselager, P., van Rooij, I.: Deep Analogical Inference as the Origin of Hypotheses. The Journal of Problem Solving 11(1), 1–24 (2018) 2. Chalmers, D.J., French, R.M., Hofstadter, D.R.: High-level perception, representation, and analogy: A critique of artificial intelligence methodology. Journal of Experimental & Theoretical Artificial Intelligence 4(3), 185–211 (1992) 3. Cheng, Y.: Context-dependent similarity. In: Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI’90), pp. 27–30. Cambridge, MA, USA (1990) 4. Gayler, R.W., Levy, S.D.: A distributed basis for analogical mapping. In: Proceedings of the Second International Conference on Analogy (ANALOGY-2009), pp. 165–174. New Bulgarian University, Sofia, Bulgaria (2009) 5. Gentner, D.: Structure-mapping: A theoretical framework for analogy. Cognitive Science 7(2), 155–170 (1983) 6. Gentner, D., Markman, A.B.: Structure mapping in analogy and similarity. American Psychologist 52(1), 45–56 (1997) 7. Gust, H., Krumnack, U., Kühnberger, K.-U., Schwering, A.: Analogical Reasoning: A core of cognition. KI - Künstliche Intelligenz 1(8), 8–12 (2008) 8. Kanerva, P.: Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation 1, 139–159 (2009) 9. Kent, S.J., Frady, E.P., Sommer, F.T., Olshausen, B.A.: Resonator Circuits for factoring high-dimensional vectors. http://arxiv.org/abs/1906.11684 (2019) 10. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Review 51(3), 455–500 (2009) 11. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 
1532–1543. Association for Computational Linguistics, Doha, Qatar (2014) 12. Purdy, S.: Encoding data for HTM systems. http://arxiv.org/abs/1602.05925 (2016) 13. Sahlgren, M.: An introduction to random indexing. In: Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering (TKE 2005), Copenhagen, Denmark (2005)
Preprint
Full-text available
Neither deep neural networks nor symbolic AI alone have approached the kind of intelligence expressed in humans. This is mainly because neural networks are not able to decompose distinct objects from their joint representation (the so-called binding problem), while symbolic AI suffers from exhaustive rule searches, among other problems. These two problems are still pronounced in neuro-symbolic AI which aims to combine the best of the two paradigms. Here, we show that the two problems can be addressed with our proposed neuro-vector-symbolic architecture (NVSA) by exploiting its powerful operators on fixed-width holographic vectorized representations that serve as a common language between neural networks and symbolic logical reasoning. The efficacy of NVSA is demonstrated by solving the Raven's progressive matrices. NVSA achieves a new record of 97.7% average accuracy in RAVEN, and 98.8% in I-RAVEN datasets, with two orders of magnitude faster execution than the symbolic logical reasoning on CPUs.
Preprint
Vector Symbolic Architectures (VSAs) combine a high-dimensional vector space with a set of carefully designed operators in order to perform symbolic computations with large numerical vectors. Major goals are the exploitation of their representational power and ability to deal with fuzziness and ambiguity. Over the past years, VSAs have been applied to a broad range of tasks and several VSA implementations have been proposed. The available implementations differ in the underlying vector space (e.g., binary vectors or complex-valued vectors) and the particular implementations of the required VSA operators - with important ramifications for the properties of these architectures. For example, not every VSA is equally well suited to address each task, including complete incompatibility. In this paper, we give an overview of eight available VSA implementations and discuss their commonalities and differences in the underlying vector space, bundling, and binding/unbinding operations. We create a taxonomy of available binding/unbinding operations and show an important ramification for non self-inverse binding operation using an example from analogical reasoning. A main contribution is the experimental comparison of the available implementations regarding (1) the capacity of bundles, (2) the approximation quality of non-exact unbinding operations, and (3) the influence of combined binding and bundling operations on the query answering performance. We expect this systematization and comparison to be relevant for development and evaluation of new VSAs, but most importantly, to support the selection of an appropriate VSA for a particular task.
Conference Paper
Full-text available
We present a fully distributed connectionist architecture supporting lateral inhibition / winner-takes all competition. All items (individuals, relations, and structures) are represented by high-dimensional distributed vectors, and (multi)sets of items as the sum of such vectors. The architecture uses a neurally plausible permutation circuit to support a multiset intersec- tion operation without decomposing the summed vector into its constituent items or requiring more hardware for more complex representations. Iterating this operation produces a vector in which an initially slightly favored item comes to dominate the others. This result (1) challenges the view that lateral inhibition calls for localist representation; and (2) points toward a neural implementation where more complex representations do not require more complex hardware.
Conference Paper
Full-text available
Jackendoff (2002) posed four challenges that linguistic combinatoriality and rules of language present to theories of brain function. The essence of these problems is the question of how to neurally instantiate the rapid construction and transformation of the compositional structures that are typically taken to be the domain of symbolic processing. He contended that typical connectionist approaches fail to meet these challenges and that the dialogue between linguistic theory and cognitive neuroscience will be relatively unproductive until the importance of these problems is widely recognised and the challenges answered by some technical innovation in connectionist modelling. This paper claims that a little-known family of connectionist models (Vector Symbolic Architectures) are able to meet Jackendoff’s challenges.
Article
This article describes an integrated theory of analogical access and mapping, instantiated in a computational model called LISA (Learning and Inference with Schemas and Analogies). LISA represents predicates and objects as distributed patterns of activation that are dynamically bound into propositional structures, thereby achieving both the flexibility of a connectionist system and the structure sensitivity of a symbolic system. The model treats access and mapping as types of guided pattern classification, differing only in that mapping is augmented by a capacity to learn new correspondences. The resulting model simulates a wide range of empirical findings concerning human analogical access and mapping. LISA also has a number of inherent limitations, including capacity limits, that arise in human reasoning and suggests a specific computational account of these limitations. Extensions of this approach also account for analogical inference and schema induction.
Article
A general method, the tensor product representation, is defined for the connectionist representation of value/variable bindings. The technique is a formalization of the idea that a set of value/variable pairs can be represented by accumulating activity in a collection of units each of which computes the product of a feature of a variable and a feature of its value. The method allows the fully distributed representation of bindings and symbolic structures. Fully and partially localized special cases of the tensor product representation reduce to existing cases of connectionist representations of structured data. The representation rests on a principled analysis of structure; it saturates gracefully as larger structures are represented; it permits recursive construction of complex representations from simpler ones; it respects the independence of the capacities to generate and maintain multiple bindings in parallel; it extends naturally to continuous structures and continuous representational patterns; it permits values to also serve as variables; and it enables analysis of the interference of symbolic structures stored in associative memories. It has also served as the basis for working connectionist models of high-level cognitive tasks.
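The core idea in the abstract above, units that each compute the product of a variable feature and a value feature, with bindings accumulated by summation, can be sketched as a sum of outer products. The example below is an illustration under simplifying assumptions: the variable vectors are orthonormal so that unbinding is exact, and the vectors and names (`agent`, `patient`, `john`, `mary`) are hypothetical.

```python
def outer(variable, value):
    """One binding: unit (i, j) holds variable[i] * value[j]."""
    return [[v * f for f in value] for v in variable]


def add(a, b):
    """Accumulate two bindings by element-wise addition."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def unbind(tensor, variable):
    """Recover a value by contracting the tensor with a variable vector."""
    width = len(tensor[0])
    return [
        sum(variable[i] * tensor[i][j] for i in range(len(variable)))
        for j in range(width)
    ]


# Hypothetical orthonormal variable (role) vectors.
agent = [1.0, 0.0]
patient = [0.0, 1.0]

# Hypothetical distributed value (filler) vectors.
john = [1.0, 0.0, 1.0]
mary = [0.0, 1.0, 1.0]

# Two bindings held in parallel in one 2 x 3 array of units.
structure = add(outer(agent, john), outer(patient, mary))

recovered_agent = unbind(structure, agent)      # [1.0, 0.0, 1.0]
recovered_patient = unbind(structure, patient)  # [0.0, 1.0, 1.0]
```

The number of units is the product of the variable and value dimensionalities, so the representation grows with each level of recursive binding. If the variable vectors were not orthogonal, unbinding would return the value plus interference from the other bindings.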
Article
In this paper we present Drama, a distributed model of analogical mapping that integrates semantic and structural constraints on constructing analogies. Specifically, Drama uses holographic reduced representations (Plate, 1994), a distributed representation scheme, to model the effects of structure and meaning on human performance of analogical mapping. Drama is compared to three symbolic models of analogy (SME, Copycat, and ACME) and one partially distributed model (LISA). We describe Drama’s performance on a number of example analogies and assess the model in terms of neurological and psychological plausibility. We argue that Drama’s successes are due largely to integrating structural and semantic constraints throughout the mapping process. We also claim that Drama is an existence proof of using distributed representations to model high-level cognitive phenomena.
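Holographic reduced representations bind two vectors with circular convolution, which keeps the result the same size as its inputs, and decode approximately by convolving with the involution of the cue. The sketch below shows only that binding scheme, not the Drama model itself; the dimensionality `N`, the random seed, and the vector names are assumptions made for illustration.

```python
import random

N = 512  # hypothetical dimensionality


def random_vector(rng):
    """Draw a vector with i.i.d. Gaussian elements of variance 1/N."""
    sd = (1.0 / N) ** 0.5
    return [rng.gauss(0.0, sd) for _ in range(N)]


def circular_convolution(a, b):
    """Bind a and b: c[k] = sum_j a[j] * b[(k - j) mod N]."""
    return [sum(a[j] * b[(k - j) % N] for j in range(N)) for k in range(N)]


def involution(a):
    """Approximate inverse for decoding: a*[j] = a[-j mod N]."""
    return [a[(-j) % N] for j in range(N)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


rng = random.Random(1)
role, filler, distractor = (random_vector(rng) for _ in range(3))

# Binding keeps the dimensionality at N.
trace = circular_convolution(role, filler)

# Decoding returns a noisy copy of the filler, not the filler itself.
decoded = circular_convolution(involution(role), trace)
sim_filler = cosine(decoded, filler)          # clearly positive
sim_distractor = cosine(decoded, distractor)  # close to zero
```

Since decoding is only approximate, a practical system would typically compare `decoded` against a memory of known vectors and select the most similar one.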
Article
A theory of analogical mapping between source and target analogs based upon interacting structural, semantic, and pragmatic constraints is proposed here. The structural constraint of isomorphism encourages mappings that maximize the consistency of relational correspondences between the elements of the two analogs. The constraint of semantic similarity supports mapping hypotheses to the degree that mapped predicates have similar meanings. The constraint of pragmatic centrality favors mappings involving elements the analogist believes to be important in order to achieve the purpose for which the analogy is being used. The theory is implemented in a computer program called ACME (Analogical Constraint Mapping Engine), which represents constraints by means of a network of supporting and competing hypotheses regarding what elements to map. A cooperative algorithm for parallel constraint satisfaction identifies mapping hypotheses that collectively represent the overall mapping that best fits the interacting constraints. ACME has been applied to a wide range of examples that include problem analogies, analogical arguments, explanatory analogies, story analogies, formal analogies, and metaphors. ACME is sensitive to semantic and pragmatic information if it is available, and yet able to compute mappings between formally isomorphic analogs without any similar or identical elements. The theory is able to account for empirical findings regarding the impact of consistency and similarity on human processing of analogies.