A DISTRIBUTED BASIS FOR ANALOGICAL MAPPING

Ross W. Gayler

r.gayler@gmail.com

School of Communication, Arts and Critical Enquiry

La Trobe University

Victoria 3086 Australia

Simon D. Levy

levys@wlu.edu

Department of Computer Science

Washington and Lee University

Lexington, Virginia USA

ABSTRACT

We are concerned with the practical feasibility of the neural basis of analogical mapping. All existing connectionist models of analogical mapping rely to some degree on localist representation (each concept or relation is represented by a dedicated unit/neuron). These localist solutions are implausible because they need too many units for human-level competence or require the dynamic re-wiring of networks on a sub-second time-scale.

Analogical mapping can be formalised as finding an approximate isomorphism between graphs representing the source and target conceptual structures. Connectionist models of analogical mapping implement continuous heuristic processes for finding graph isomorphisms. We present a novel connectionist mechanism for finding graph isomorphisms that relies on distributed, high-dimensional representations of structure and mappings. Consequently, it does not suffer from the problems of the number of units scaling combinatorially with the number of concepts or requiring dynamic network re-wiring.

GRAPH ISOMORPHISM

Researchers tend to divide the process of analogy into three stages: retrieval (finding an appropriate source situation), mapping (identifying the corresponding elements of the source and target situations), and application. Our concern is with the mapping stage, which is essentially about structural correspondence. If the source and target situations are formally represented as graphs, the structural correspondence between them can be described as approximate graph isomorphism. Any mechanism for finding graph isomorphisms is, by definition, a mechanism for finding structural correspondence and a possible mechanism for implementing analogical mapping. We are concerned with the formal underpinning of analogical mapping (independently of whether any particular researcher chooses to describe their specific model in these terms).

It might be supposed that representing situations as graphs is unnecessarily restrictive. However, anything that can be formalised can be represented by a graph. Category theory, which is effectively a theory of structure and graphs, is an alternative to set theory as a foundation for mathematics (Marquis, 2009), so anything that can be mathematically represented can be represented as a graph.

It might also be supposed that by working solely with graph isomorphism we favour structural correspondence to the exclusion of other factors that are known to influence analogical mapping, such as semantic similarity and pragmatics. However, as any formal structure can be represented by graphs, it follows that semantics and pragmatics can also be encoded as graphs. For example, some models of analogical mapping are based on labelled graphs with the process being sensitive to label similarity. However, any label value can be encoded as a graph and label similarity captured by the degree of approximate isomorphism. Further, the mathematics of graph isomorphism has been extended to include attribute similarity and is commonly used this way in computer vision and pattern recognition (Bomze, Budinich, Pardalos & Pelillo, 1999).

The extent to which analogical mapping based on graph isomorphism is sensitive to different types of information depends on what information is encoded into the graphs. Our current research is concerned only with the practical feasibility of connectionist implementations of graph isomorphism. The question of what information is encoded in the graphs is separable. Consequently, we are not concerned with modelling the psychological properties of analogical mapping, as such questions belong to a completely different level of inquiry.

CONNECTIONIST IMPLEMENTATIONS

It is possible to model analogical mapping as a purely algorithmic process. However, we are concerned with physiological plausibility and consequently limit our attention to connectionist models of analogical mapping such as ACME (Holyoak & Thagard, 1989), AMBR (Kokinov, 1988), DRAMA (Eliasmith & Thagard, 2001), and LISA (Hummel & Holyoak, 1997). These models vary in their theoretical emphases and the details of their connectionist implementations, but they all share a problem in the scalability of the representation or construction of the connectionist mapping network. We contend that this is a consequence of using localist connectionist representations or processes. In essence, they either have to allow in advance for all combinatorial possibilities, which requires too many units (Stewart & Eliasmith, in press), or they have to construct the required network for each new mapping task in a fraction of a second.

Problems with localist implementation

Rather than review all the major connectionist models of analogical mapping, we will use ACME and DRAMA to illustrate the problem with localist representation. Localist and distributed connectionist models have often been compared in terms of properties such as neural plausibility and robustness. Here, we are concerned only with a single issue: dynamic re-wiring (i.e., the need for connections to be made between neurons as a function of the source and target situations to be mapped).

ACME constructs a localist network to represent possible mappings between the source and target structures. The network is a function of the source and target representations, and a new network has to be constructed for every source and target pair. A localist unit is constructed to represent each possible mapping between a source vertex and a target vertex. The activation of each unit indicates the degree of support for the corresponding vertex mapping being part of the overall mapping between the source and target. The connections between the network units encode compatibility between the corresponding vertex mappings. These connections are a function of the source and target representations and are constructed anew for each problem. Compatible vertex mappings are linked by excitatory connections so that support for the plausibility of one vertex mapping transmits support to compatible mappings. Similarly, inhibitory connections are used to connect the units representing incompatible mappings. The network implements a relaxation labelling that finds a compatible set of mappings. The operation of the mapping network is neurally plausible, but the process of its construction is not.

The inputs to ACME are symbolic representations of the source and target structures. The mapping network is constructed by a symbolic process that traverses the source and target structures. The time complexity of the traversal will be a function of the size of the structures to be mapped. Given that we believe analogical mapping is a continually used core part of cognition and that all cognitive information is encoded as (large) graph structures, we strongly prefer mapping network setup to require approximately constant time independent of the structures to be mapped.

DRAMA is a variant of ACME with distributed source and target representations. However, it appears that the process of constructing the distributed representation of the mapping network is functionally localist, requiring a decomposition and sequential traversal of the source and target structures.

Ideally, the connectionist mapping network should have a fixed neural architecture. The units and their connections should be fixed in advance and not need to be re-wired in response to the source and target representations. The structure of the current mapping task should be encoded entirely in activations generated on the fixed neural architecture by the source and target representations, and the set-up process should be holistic rather than requiring decomposition of the source and target representations. Our research aims to achieve this by using distributed representation and processing from the VSA family of connectionist models.

We proceed by introducing replicator equations, a localist heuristic for finding graph isomorphisms. Then we introduce Vector Symbolic Architectures (VSA), a family of distributed connectionist mechanisms for the representation and manipulation of structured information. Our novel contribution is to implement replicator equations in a completely distributed fashion based on VSA. We conclude with a proof-of-concept demonstration of a distributed re-implementation of the principal example from the seminal paper on graph isomorphism via replicator equations.

REPLICATOR EQUATIONS

The approach we are pursuing for graph isomorphism is based on the work of Pelillo (1999), who casts subgraph isomorphism as the problem of finding a maximal clique (set of mutually adjacent vertices) in the association graph derived from the two graphs to be mapped. Given a graph G′ of size N with an N × N adjacency matrix A′ = (a′_ij) and a graph G″ of size N with an N × N adjacency matrix A″ = (a″_hk), their association graph G of size N² can be represented by an N² × N² adjacency matrix A = (a_ih,jk) whose edges encode pairs of edges from G′ and G″:

    a_ih,jk = 1 − (a′_ij − a″_hk)²  if i ≠ j and h ≠ k, and 0 otherwise    (1)

The elements of A are 1 if the corresponding edges in G′ and G″ have the same state of existence and 0 if the corresponding edges have different states of existence. Defined this way, the edges of the association graph G provide evidence about potential mappings between the vertices of G′ and G″ based on whether the corresponding edges and non-edges are consistent. The presence of an edge between two vertices in one graph and an edge between two vertices in the other graph supports a possible mapping between the members of each pair of vertices (as does the absence of such an edge in both graphs).
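As a concrete sketch of Equation 1, the association-graph adjacency matrix can be computed directly from the two graphs' adjacency matrices. This is a Python/NumPy illustration rather than the authors' released MATLAB code; the function name and looping style are ours:

```python
import numpy as np

def association_adjacency(A1, A2):
    """Adjacency matrix of the association graph (Equation 1).

    Entry ((i,h),(j,k)) is 1 when the edge status a'_ij in the first graph
    matches a''_hk in the second, and the candidate vertex mappings share
    no vertex (i != j and h != k); otherwise it is 0."""
    N = A1.shape[0]
    A = np.zeros((N * N, N * N))
    for i in range(N):
        for h in range(N):
            for j in range(N):
                for k in range(N):
                    if i != j and h != k:
                        A[i * N + h, j * N + k] = 1 - (A1[i, j] - A2[h, k]) ** 2
    return A
```

Row (i·N + h) of the result corresponds to the candidate mapping of vertex i onto vertex h, matching the row and column labels of Table 1.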

By treating the graph isomorphism problem as a maximal-clique-finding problem, Pelillo exploits an important result in graph theory. Consider a graph G with adjacency matrix A, a subset C of vertices of G, and a characteristic vector x^C (indicating membership of the subset C) defined as

    x^C_i = 1/|C|  if i ∈ C, and 0 otherwise    (2)

where |C| is the cardinality of C. It turns out that C is a maximum clique of G if and only if x^C maximizes the function f(x) = xᵀAx, where xᵀ is the transpose of x, x ∈ ℝ^N, Σ_{i=1..N} x_i = 1, and x_i ≥ 0 for all i.

Starting at some initial condition (typically the barycenter, x_i = 1/N, corresponding to all x_i being equally supported as part of the solution), x can be obtained through iterative application of the following equation:

    x_i(t+1) = x_i(t) π_i(t) / Σ_{j=1..N} x_j(t) π_j(t)    (3)

where

    π_i(t) = Σ_{j=1..N} w_ij x_j(t)    (4)

and W is a matrix of weights w_ij, typically just the adjacency matrix A of the association graph or a linear function of A. The vector x can thus be considered to represent the state of the system's belief about the vertex mappings at a given time, with Equations 3 and 4 representing a dynamical system parameterized by the weights in W. π_i can be interpreted as the evidence for x_i obtained from all the compatible x_j, where the compatibility is encoded by w_ij. The denominator in Equation 3 is a normalizing factor ensuring that Σ_{i=1..N} x_i = 1.

Pelillo borrows Equations 3 and 4 from the literature on evolutionary game theory, in which π_i is the overall payoff associated with playing strategy i, and w_ij is the payoff associated with playing strategy i against strategy j. In the context of the maximum-clique problem, these replicator equations can be used to derive a vector x (vertex mappings) that maximizes the "payoff" (edge consistency) encoded in the adjacency matrix. Vertex mappings correspond to strategies, and as Equation 3 is iterated, mappings with higher fitness (consistency of mappings) come to dominate ones with lower fitness.
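A single discrete-time step of Equations 3 and 4 can be sketched in a few lines (Python/NumPy here, rather than the MATLAB of the released code; the function name is ours):

```python
import numpy as np

def replicator_step(x, W):
    """One discrete-time replicator update (Equations 3 and 4)."""
    pi = W @ x                    # Equation 4: payoff (evidence) for each mapping
    x_new = x * pi                # mappings grow in proportion to their payoff
    return x_new / x_new.sum()    # Equation 3 denominator: keep sum(x) = 1
```

Iterating this step from the barycenter (all entries 1/N) drives the state toward the characteristic vector of a maximal clique.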

Figure 1. A simple graph isomorphism problem.

Consider the simple graphs in Figure 1, used as the principal example by Pelillo (1999) and which we will later re-implement in a distributed fashion. The maximal isomorphism between these two graphs is {A=P, B=Q, C=R, D=S} or {A=P, B=Q, C=S, D=R}. Table 1 shows the first and last rows of the adjacency matrix for the association graph of these graphs, generated using Equation 1. Looking at the first row of the table, we see that the mapping A=P is consistent with the mappings B=Q, B=R, B=S, C=Q, C=R, C=S, D=Q, D=R, and D=S, but not with A=Q, A=R, A=S, B=P, etc.

    AP AQ AR AS BP BQ BR BS CP CQ CR CS DP DQ DR DS
AP   0  0  0  0  0  1  1  1  0  1  1  1  0  1  1  1
…    …  …  …  …  …  …  …  …  …  …  …  …  …  …  …  …
DS   1  0  1  0  0  1  0  0  1  0  1  0  0  0  0  0

Table 1. Fragment of the adjacency matrix for Fig. 1.

Initially, all values in the state vector x are set to 0.0625 (1/16). Repeated application of Equations 3 and 4 produces a final state vector that encodes the two maximal isomorphisms, with 0.3 in the positions for A=P and B=Q, 0.1 in the positions for C=R, C=S, D=R, and D=S, and 0 in the others. The conflicting mappings for C, D, R, and S correspond to a saddle point in the dynamics of the replicator equations, created by the symmetry in the graphs. Adding a small amount of noise to the state breaks this symmetry, producing a final state vector with values of 0.25 for the optimal mappings A=P, B=Q, and C=R, D=S or C=S, D=R, and zero elsewhere. The top graph of Figure 4 shows the time course of the settling process from our implementation of Pelillo's localist algorithm.

This example is trivially small. However, the same approach has been successfully applied to graphs with more than 65,000 vertices (Pelillo & Torsello, 2006). It has also been extended to match hierarchical, attributed structures for computer vision problems (Pelillo, Siddiqi & Zucker, 1999). Thus, we are confident that replicator equations are a reasonable candidate mechanism for the structure matching at the heart of analogical mapping.

DISTRIBUTED IMPLEMENTATION

The replicator equation mechanism can be easily implemented as a localist connectionist circuit. This is qualitatively very similar to ACME and suffers the same problems due to the localist representation. In this section we present a distributed connectionist scheme for representing edges, vertices, and mappings that does not suffer from these problems.

Vector Symbolic Architecture

Vector Symbolic Architecture is a name that we coined (Gayler, 2003) to describe a class of connectionist models that use high-dimensional vectors (typically around 10,000 dimensions) of low-precision numbers to encode structured information as distributed representations. VSA can represent complex entities such as trees and graphs as vectors. Every such entity, no matter how simple or complex, is represented by a pattern of activation distributed over all the elements of the vector. This general class of architectures traces its origins to the tensor product work of Smolensky (1990), but avoids the exponential growth in dimensionality of tensor products. VSAs employ three types of vector operator: a multiplication-like operator, an addition-like operator, and a permutation-like operator. The multiplication operator is used to associate or bind vectors. The addition operator is used to superpose vectors or add them to a set. The permutation operator is used to quote or protect vectors from the other operations.

The use of hyperdimensional vectors to represent symbols and their combinations provides a number of mathematically desirable and biologically realistic features (Kanerva, 2009). A hyperdimensional vector space contains as many mutually orthogonal vectors as there are dimensions and exponentially many almost-orthogonal vectors (Hecht-Nielsen, 1994), thereby supporting the representation of astronomically large numbers of distinct items. Such representations are also highly robust to noise. Approximately 30% of the values in a vector can be randomly changed before it becomes more similar to another meaningful (previously-defined) vector than to its original form. It is also possible to implement such vectors in a spiking neuron model (Eliasmith, 2005).

The main difference among types of VSAs is in the type of numbers used as vector elements and the related choice of multiplication-like operation. Holographic Reduced Representations (Plate, 2003) use real numbers and circular convolution. Kanerva's (1996) Binary Spatter Codes (BSC) use Boolean values and elementwise exclusive-or. Gayler's (1998) Multiply, Add, Permute coding (MAP) uses values from {+1, −1} and elementwise multiplication. A useful feature of BSC and MAP is that each vector is its own multiplicative inverse. Multiplying any vector by itself elementwise yields the identity vector. As in ordinary algebra, multiplication and addition are associative and commutative, and multiplication distributes over addition.

We use MAP in the work described here. As an illustration of how VSA can be used to represent graph structure, consider again the optimal mapping {A=P, B=Q, C=R, D=S} for the graphs in Figure 1. We represent this set of mappings as the vector

    A∗P + B∗Q + C∗R + D∗S    (5)

where A, B, C, ... are arbitrarily chosen (random) vectors over {+1, −1}, and ∗ and + represent elementwise vector multiplication and addition respectively. For any mapped vertex pair X=Y, the representation Y of vertex Y can be retrieved by multiplying the mapping vector (X∗Y + …) by X, and vice versa. The resulting vector will contain the representation of Y plus a set of representations not corresponding to any vertex, which can be treated as noise; e.g.:

    A ∗ (A∗P + B∗Q + C∗R + D∗S)
      = A∗A∗P + A∗B∗Q + A∗C∗R + A∗D∗S
      = P + A∗B∗Q + A∗C∗R + A∗D∗S
      = P + noise    (6)

The noise can be removed from the retrieved vector by passing it through a "cleanup memory" that stores only the meaningful vectors (A, B, C, D, P, Q, R, S). Cleanup memory can be implemented in a biologically plausible way as a Hopfield network that associates each meaningful vector to itself (a variant of Hebbian learning). Such networks can reconstruct the original form of a vector from a highly degraded exemplar, via self-reinforcing feedback dynamics.
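Equations 5 and 6, plus a simple table-based cleanup, can be sketched numerically. This is a Python/NumPy illustration under our own naming (the authors' code is MATLAB); `D_` stands in for the vertex vector D, and cosine similarity against a lookup table replaces the Hopfield network, as the authors themselves do later in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 10_000  # hyperdimensional, as in the paper

# Random bipolar ({+1, -1}) vectors for the eight vertices (MAP coding)
A, B, C, D_, P, Q, R, S = (rng.choice([-1, 1], dim) for _ in range(8))

mapping = A * P + B * Q + C * R + D_ * S   # Equation 5
probe = A * mapping                        # Equation 6: A*A cancels, leaving P + noise

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Table-based cleanup: return the stored meaningful vector closest to the probe
store = {"A": A, "B": B, "C": C, "D": D_, "P": P, "Q": Q, "R": R, "S": S}
best = max(store, key=lambda name: cosine(probe, store[name]))
```

The probe's cosine with P is large (about 0.5 at this dimensionality) while its cosine with every other stored vector is near zero, so cleanup reliably recovers P.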

Note that although the vectors depicted in Equations 5 and 6 appear complex, they are just vector values like any other. From the point of view of the implementing hardware all vectors are of equal computational complexity. This has profound implications for the resource requirements of VSA-based systems. For example, the computational cost of labelling a graph vertex with a simple attribute or a complex structure is exactly the same.

Our Model

Our goal is to build a distributed implementation of the replicator Equations 3 and 4 by representing the problem as distributed patterns of fixed, high dimension in VSA, such that the distributed system has the same dynamics as the localist formulation. As in the localist version, we need a representation x of the evolving state of the system's belief about the vertex mappings, and a representation w of the adjacencies in the association graph.

In the VSA representation of a graph we represent vertices by random hyperdimensional vectors, edges by products of the vectors representing the vertices, and mappings by products of the mapped entities. It is natural to represent the set of vertices as the sum of the vectors representing the vertices. The product of the vertex sets of the two graphs is then identical to the sum of the possible mappings of vertices (Equation 7). That is, the initial value of x can be calculated holistically from the representations of the graphs using only one product operation that does not require decomposition of the vertex set into component vertices. For the graphs in Figure 1:

    x = (A + B + C + D) ∗ (P + Q + R + S)
      = A∗P + A∗Q + … + B∗P + B∗Q + … + D∗R + D∗S    (7)
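The holistic product in Equation 7 is just elementwise distributivity, so it holds exactly, not merely approximately. A two-vertex sketch in Python/NumPy (extending to A + B + C + D and P + Q + R + S is the same identity):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 10_000
A, B, P, Q = (rng.choice([-1, 1], dim) for _ in range(4))

# One holistic product of the vertex-set sums...
x0 = (A + B) * (P + Q)

# ...is exactly the sum of all candidate vertex mappings, by distributivity
expanded = A * P + A * Q + B * P + B * Q
```

No decomposition into individual vertices is needed: the single elementwise product computes all pairwise bindings at once, in time independent of the number of vertices.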

For VSA it is natural to represent the set of edges of a graph as the sum of the products of the vertices connected by each edge. The product of the edge sets of the two graphs is identical to a sum of products of four vertices. This encodes information about mappings of edges, or equivalently, about compatibility of vertex mappings. That is, one holistic product operation applied to the edge sets is able to encode all the possible edge mappings in constant time, no matter how many edges there are.

The reader may have noticed that the description above refers only to edges, whereas Pelillo's association graph also encodes information about the mapping of non-edges in the two graphs. We believe the explicit representation of non-edges is cognitively implausible. However, Pelillo was not concerned with cognitive plausibility. Since our aim here is to reproduce his work, we include non-edges in Equation 8. The distributed vector w functions as the localist association matrix W. For the graphs in Figure 1:

    w = (B∗C + B∗D) ∗ (Q∗R + Q∗S)
        + (A∗B + A∗C + A∗D + C∗D) ∗ (P∗Q + P∗R + P∗S + R∗S)
      = B∗C∗Q∗R + B∗C∗Q∗S + B∗D∗Q∗R + B∗D∗Q∗S
        + A∗B∗P∗Q + A∗B∗P∗R + … + C∗D∗R∗S    (8)

The terms of this sum correspond to the nonzero elements of Table 1 (allowing for the symmetries due to commutativity). With x and w set up this way, we can compute the payoff vector π as the product of x and w. As in the localist formulation (Equation 4), this product causes consistent mappings to reinforce each other. Evidence is propagated from each vertex mapping to consistent vertex mappings via the edge compatibility information encoded in w. (The terms of Equation 9 have been rearranged to highlight this cancellation.)

    π = x ∗ w
      = (A∗P) ∗ (A∗B∗P∗Q) + (A∗P) ∗ (A∗B∗P∗R) + …
      = B∗Q + B∗R + …    (9)
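The cancellation behind Equation 9 is exact in MAP, because every vector is its own multiplicative inverse. A minimal Python/NumPy sketch (our own variable names) of one x-term meeting one compatible w-term:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 10_000
A, B, P, Q, R = (rng.choice([-1, 1], dim) for _ in range(5))

# An x-term times a compatible w-term: the A and P factors cancel exactly
# (each bipolar vector is its own multiplicative inverse), leaving B*Q,
# i.e. evidence propagated to the mapping B=Q.
evidence = (A * P) * (A * B * P * Q)
```

Terms of w that share no factors with the x-term do not cancel and contribute only noise to π.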

Implementing the update of x (Equation 3) is more challenging for the VSA formulation. As in the localist version, the idea is for corresponding vertex mappings in x and π to reinforce each other multiplicatively, in a kind of multiset intersection (denoted here as ∧): if x = (k1·A∗P + k2·B∗Q + k3·B∗R) and π = (k4·A∗P + k5·B∗Q), then x ∧ π equals (k1k4·A∗P + k2k5·B∗Q), for non-negative weights k1, k2, k3, k4, and k5.


Because of the self-cancellation property of the MAP architecture, simple elementwise multiplication of x and π will not work. We could extract the k_i by iterating through each of the pairwise mappings (A∗P, A∗Q, …, D∗S) and dividing x and π elementwise by each mapping, but this is the kind of functionally localist approach we argue is neurally implausible. Instead, we need a holistic distributed intersection operator. This can be construed as a special case of lateral inhibition, a winner-takes-all competition, which has traditionally been considered a localist operation (Page, 2000; Levy & Gayler, in press).

Figure 2. A neural circuit for vector intersection.

To implement this intersection operator in a holistic, distributed manner we exploit the third component of the MAP architecture: permutation. Our solution, shown in Figure 2, works as follows: 1: and 2: are registers (vectors of units) loaded with the vectors representing the multisets to be intersected. P1( ) computes some arbitrary, fixed permutation of the vector in 1:, and P2( ) computes a different fixed permutation of the vector in 2:. Register 3: contains the product of these permuted vectors. Register 4: is a memory (a constant vector value) pre-loaded with each of the possible multiset elements transformed by multiplying it with both permutations of itself. That is,

    4: = Σ_{i=1..M} X_i ∗ P1(X_i) ∗ P2(X_i)

where M is the number of items in the memory vector (4:). To implement the replicator equations the clean-up memory 4: must be loaded with a pattern based on the sum of all the possible vertex mappings (similar to the initial value of the mapping vector x).

To see how this circuit implements intersection, consider the simple case of a system with three meaningful vectors X, Y, and Z, where we want to compute the intersection of k1·X with (k2·X + k3·Y). The first vector is loaded into register 1:, the second into 2:, and the sum

    X∗P1(X)∗P2(X) + Y∗P1(Y)∗P2(Y) + Z∗P1(Z)∗P2(Z)

is loaded into 4:. After passing the register contents through their respective permutations and multiplying the results, register 3: will contain

    P1(k1·X) ∗ P2(k2·X + k3·Y) = k1·k2·P1(X)∗P2(X) + k1·k3·P1(X)∗P2(Y)

Multiplying registers 3: and 4: together will then result in the desired intersection (the term k1·k2·X) plus noise, which can be removed by standard cleanup techniques:

    (k1·k2·P1(X)∗P2(X) + k1·k3·P1(X)∗P2(Y))
        ∗ (X∗P1(X)∗P2(X) + Y∗P1(Y)∗P2(Y) + Z∗P1(Z)∗P2(Z))
      = k1·k2·X + noise

In brief, the circuit in Figure 2 works by guaranteeing that the permutations will cancel only for those terms X_i that are present in both input registers, with other terms being rendered as noise.
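The register contents of Figure 2 can be simulated directly. Below is a Python/NumPy sketch under our own naming (`p1`, `p2` play the roles of P1( ) and P2( ); `r1` to `r5` mirror the numbered registers), computing the intersection of 2·X with (3·X + Y):

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 10_000
X, Y, Z = (rng.choice([-1, 1], dim) for _ in range(3))

# Two arbitrary, fixed permutations standing in for P1( ) and P2( )
p1 = rng.permutation(dim)
p2 = rng.permutation(dim)

r1 = 2 * X            # register 1: k1 = 2 copies of X
r2 = 3 * X + 1 * Y    # register 2: k2 = 3, k3 = 1
r3 = r1[p1] * r2[p2]  # register 3: P1(r1) * P2(r2)

# Register 4: every meaningful vector bound to both permutations of itself
r4 = X * X[p1] * X[p2] + Y * Y[p1] * Y[p2] + Z * Z[p1] * Z[p2]

# Register 5: the permutations cancel only for X, the item present in both
# inputs, yielding k1*k2*X = 6*X plus noise
r5 = r3 * r4

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
```

At this dimensionality r5 correlates strongly with X and negligibly with Y and Z, so cleanup recovers the intersection.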

In order to improve noise reduction it is necessary to sum over several such intersection circuits, each based on different permutations. This sum over permutations has a natural interpretation in terms of sigma-pi units (Rumelhart, Hinton & McClelland, 1986), where each unit calculates the sum of many products of a few inputs from units in the prior layer. The apparent complexity of Figure 2 results from drawing it for ease of explanation rather than correspondence to implementation. The intersection network of Figure 2 could be implemented as a single layer of sigma-pi units.

COMPARING THE APPROACHES

Figure 3 shows the replicator equation approach to graph isomorphism as a recurrent neural circuit. Common to Pelillo's approach and ours is the initialization of a weight vector w with evidence of compatibility of edges and non-edges from the association graph, as well as the computation of the payoff vector π from multiplication (∗) of x and w, the computation of the intersection of x and π (∧), and the normalization of x (/Σ). The VSA formulation additionally requires a cleanup memory (c) and an intersection-cleanup memory (c∧), each initialized to a constant value.

Figure 3. A neural circuit for graph isomorphism.

Figure 3 also shows the commonality of the localist and VSA approaches, with the VSA-only components depicted in dashed lines. Note that the architecture is completely fixed and the specifics of the mapping problem to be solved are represented entirely in the patterns of activation loaded into the circuit. Likewise, the circuit does not make any decisions based on the contents of the vectors being manipulated. The product and intersection operators are applied to whatever vectors are present on their inputs, and the circuit settles to a stable state representing the solution.

To demonstrate the viability of our approach, we used this circuit with a 10,000-dimensional VSA to deduce isomorphisms for the graphs in Figure 1. This example was chosen to allow direct comparison with Pelillo's results. Although it was not intended as an example of analogical mapping, it does directly address the underlying mechanism of graph isomorphism. Memory and processor limitations made it impractical to implement the main cleanup memory as a Hopfield net (10^8 weights), so we simulated the Hopfield net with a table that stored the meaningful vectors and returned the one closest to the noisy version. To implement the intersection circuit from Figure 2 we summed over 50 replicates of that circuit, differing only in their arbitrary permutations. The updated mapping vector was passed back through the circuit until the Euclidean distance between x_t and x_{t−1} was less than 0.001. At each iteration we computed the cosine of x with each item in cleanup memory, in order to compare our VSA implementation with the localist version; however, nothing in our implementation depended on this functionally localist computation.
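The stopping rule described above can be sketched as a generic settling loop (Python/NumPy; the function and parameter names are ours, and in the experiment `step` would be one pass through the Figure 3 circuit):

```python
import numpy as np

def settle(step, x0, tol=1e-3, max_iter=500):
    """Iterate `step` until successive states differ by less than `tol`
    in Euclidean distance (the stopping rule used in the text)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = step(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```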

Figure 4. Convergence of localist (top) and VSA (bottom) implementations.

Figure 4 compares the results of Pelillo's localist approach to ours, for the graph isomorphism problem shown in Figure 1. Time (iterations) t is plotted on the abscissa, and the corresponding values in the mapping vector on the ordinate. For the localist version we added a small amount of Gaussian noise to the state vector on the first iteration in order to keep it from getting stuck on a saddle point; the VSA version, which starts with a noisy mapping vector, does not suffer from this problem. In both versions one set of consistent vertex mappings (shown in marked lines) comes to dominate the other, inconsistent mappings (shown in solid lines) in less than 100 iterations.

The obvious difference between the VSA and localist versions is that the localist version settles into a "clean" state corresponding to the characteristic vector in Equation 2, with four values equal to 0.25 and the others equal to zero, whereas in the VSA version the final state approximates this distribution. (The small negative values are an artifact of using the cosine as a metric for comparison.)

CONCLUSIONS AND FUTURE WORK

The work presented here has demonstrated a proof-of-concept that a distributed representation (Vector Symbolic Architecture) can be applied successfully to a problem (graph isomorphism) that until now has been considered the purview of localist modelling. The results achieved with VSA are qualitatively similar to those with the localist formulation. In the process, we have provided an example of how a distributed representation can implement an operation reminiscent of lateral inhibition, winner-takes-all competition, which likewise has been considered to be a localist operation. The ability to model competition among neurally encoded structures and relations, not just individual items or concepts, points to promising new directions for cognitive modelling in general.

The next steps in this research will be to demonstrate the technique on larger graphs and investigate how performance degrades as the graph size exceeds the representational capacity set by the vector dimensionality. We will also investigate the performance of the system in finding subgraph isomorphisms.

Graph isomorphism by itself does not constitute a psychologically realistic analogical mapping system. There are many related problems to be investigated in that broader context. The question of what conceptual information is encoded in the graphs, and how, is foremost. It also seems reasonable to expect constraints on the graphs encoding cognitive structures (e.g. constraints on the maximum and minimum numbers of edges from each vertex). It may be possible to exploit such constraints to improve some aspects of the mapping circuit. For example, it may be possible to avoid the cognitively implausible use of non-edges as evidence for mappings.

Another area we intend to investigate is how the clean-up memories are populated. In this system the clean-up memories are populated from representations of the source and target graphs. This is not unreasonable if retrieval is completely separate from mapping. However, we wish to explore the possibility of intertwining retrieval and mapping. For this to be feasible we would need to reconfigure the mapping circuit so that the clean-up memory can be populated with items that have been encountered previously, rather than with items corresponding to potential mappings.

We expect this approach to provide fertile lines of research for many years to come.

SOFTWARE DOWNLOAD

MATLAB code implementing the algorithm in (Pelillo, 1999) and our VSA version can be downloaded from tinyurl.com/gidemo
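For readers without MATLAB, the core of Pelillo's (1999) replicator-equation approach can be sketched in a few lines of Python. This is an illustrative reimplementation on an example graph pair of our own choosing, not the downloadable code; note that agreeing non-edges are counted as evidence for a mapping, the property discussed above as cognitively implausible:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two small isomorphic graphs: a 4-cycle and a relabelled copy of it.
A1 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]], dtype=float)
perm = [2, 0, 3, 1]
A2 = A1[np.ix_(perm, perm)]

n = A1.shape[0]
N = n * n

# Association graph: vertex (i, j) hypothesises that vertex i of G1 maps
# to vertex j of G2.  Vertices (i, j) and (h, k) are adjacent iff i != h,
# j != k, and the edge/non-edge relation agrees in both graphs.
W = np.zeros((N, N))
for i in range(n):
    for j in range(n):
        for h in range(n):
            for k in range(n):
                if i != h and j != k and A1[i, h] == A2[j, k]:
                    W[i * n + j, h * n + k] = 1.0

# Discrete-time replicator dynamics, started from a noisy point near the
# barycentre of the simplex to avoid symmetric saddle points.
x = np.full(N, 1.0 / N) + 0.01 * rng.random(N)
x /= x.sum()
for _ in range(1000):
    x = x * (W @ x)   # each component grows in proportion to its payoff
    x /= x.sum()      # renormalise onto the simplex

# The n largest components of x encode one consistent vertex mapping.
top = np.argsort(x)[-n:]
mapping = {int(p) // n: int(p) % n for p in top}
```

The surviving components correspond to a maximum clique in the association graph, which here is one of the several isomorphisms between the two cycles.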

ACKNOWLEDGMENTS

We thank Pentti Kanerva, Tony Plate, and Roger Wales for many useful suggestions.

REFERENCES

Bomze, I. M., Budinich, M., Pardalos, P. M., & Pelillo, M. (1999). The maximum clique problem. In D.-Z. Du & P. M. Pardalos (Eds.), Handbook of combinatorial optimization, Supplement Volume A (pp. 1-74). Boston, MA, USA: Kluwer Academic Publishers.

Eliasmith, C. (2005). Cognition with neurons: A large-scale, biologically realistic model of the Wason task. In G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th Annual Meeting of the Cognitive Science Society.

Eliasmith, C., & Thagard, P. (2001). Integrating structure and meaning: A distributed model of analogical mapping. Cognitive Science, 25, 245-286.

Gayler, R. (1998). Multiplicative binding, representation operators, and analogy. In K. Holyoak, D. Gentner, & B. Kokinov (Eds.), Advances in analogy research: Integration of theory and data from the cognitive, computational, and neural sciences (p. 405). Sofia, Bulgaria: New Bulgarian University.

Gayler, R. W. (2003). Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience. In P. Slezak (Ed.), ICCS/ASCS International Conference on Cognitive Science (pp. 133-138). Sydney, Australia: University of New South Wales.

Hecht-Nielsen, R. (1994). Context vectors: General purpose approximate meaning representations self-organized from raw data. In J. Zurada, R. J. Marks II, & C. J. Robinson (Eds.), Computational intelligence: Imitating life (pp. 43-56). IEEE Press.

Holyoak, K. J., & Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13, 295-355.

Hummel, J., & Holyoak, K. (1997). Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104, 427-466.

Kanerva, P. (1996). Binary spatter-coding of ordered k-tuples. In C. von der Malsburg, W. von Seelen, J. Vorbrüggen, & B. Sendhoff (Eds.), Artificial neural networks (Proceedings of ICANN 96) (pp. 869-873). Berlin: Springer-Verlag.

Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation, 1, 139-159.

Kokinov, B. (1988). Associative memory-based reasoning: How to represent and retrieve cases. In T. O'Shea & V. Sgurev (Eds.), Artificial intelligence III: Methodology, systems, applications (pp. 51-58). Amsterdam: Elsevier Science Publishers B.V. (North Holland).

Levy, S. D., & Gayler, R. W. (in press). "Lateral inhibition" in a fully distributed connectionist architecture. In Proceedings of the Ninth International Conference on Cognitive Modeling (ICCM 2009). Manchester, UK.

Marquis, J.-P. (2009). Category theory. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Spring 2009 Edition). http://plato.stanford.edu/archives/spr2009/entries/category-theory/

Page, M. (2000). Connectionist modelling in psychology: A localist manifesto. Behavioral and Brain Sciences, 23, 443-512.

Pelillo, M. (1999). Replicator equations, maximal cliques, and graph isomorphism. Neural Computation, 11, 1933-1955.

Pelillo, M., Siddiqi, K., & Zucker, S. W. (1999). Matching hierarchical structures using association graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 1105-1120.

Pelillo, M., & Torsello, A. (2006). Payoff-monotonic game dynamics and the maximum clique problem. Neural Computation, 18, 1215-1258.

Plate, T. A. (2003). Holographic reduced representation: Distributed representation for cognitive science. Stanford, CA, USA: CSLI Publications.

Rumelhart, D. E., Hinton, G. E., & McClelland, J. L. (1986). A general framework for parallel distributed processing. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations (pp. 45-76). Cambridge, MA, USA: The MIT Press.

Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46, 159-216.

Stewart, T., & Eliasmith, C. (in press). Compositionality and biologically plausible models. In M. Werning, W. Hinzen, & E. Machery (Eds.), The Oxford handbook of compositionality. Oxford, UK: Oxford University Press.