
A Combinatorial Population Code (CPC) can simultaneously transmit the full similarity (likelihood) distribution via an atemporal first-spike code

Rod Rinkus, Chief Scientist, Neurithmic Systems (www.sparsey.com); Lead Research Scholar, Center for Brain-Inspired Computing, Purdue University

Summary

• Most prior spike codes fall into one of two classes, rate codes and time-of-spike codes (either absolute or relative), and both have also been generalized to populations. As shown in Fig. 2a, these are both fundamentally temporal. If both the source and target are viewed as single units, the main weaknesses are:
  a) Only one value can be sent at a time.
  b) To first approximation, sending an N-ary signal requires a decode window (T) of order N × the width of a single spike.

• Neural substrate ("hardware") is used unevenly: e.g., the leftmost source unit is on in every signal, while the rightmost unit is on only for signal "4", etc.

Fig. 2a: Temporal Spike Coding. Fig. 2b: Atemporal (purely spatial) Spike Coding.

• Rate (frequency): signal encoded in the number of spikes per unit time.
• Time: signal encoded in precise or relative time(s) of spike(s).
Note: to describe these temporal codes, both source and target CFs need only one cell.


a) The active source code, φ1, sends its message to the target CF. Cell 1 has the max input sum and wins, reads out, and turns on the inhibitory cell (pink).
b) The inhibitory cell sends to all target cells, but only affects the currently active cell, cell 1.
c) The remaining cells again compete. Cell 2 (which represents φ2) wins, reads out, and also reactivates the inhibitory cell.
d) The inhibitory cell sends to all target cells, but only affects the currently active cell, cell 2, etc., until all cells have read out.

Fig. 3: Serial readout of stored items in descending rank order.
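The readout behavior of Fig. 3 amounts to repeated winner-take-all with inhibition of each previous winner. Below is a minimal Python sketch of that behavior under stated assumptions; the function name, NumPy usage, and tie-breaking are illustrative, not the poster's circuit.

```python
import numpy as np

def serial_readout(input_sums):
    """Read out target cells in descending order of input sum (cf. Fig. 3).

    Each pass models one WTA competition: the cell with the largest remaining
    input sum wins and "reads out"; the inhibitory cell then silences only that
    cell, so the remaining cells compete on the next cycle.
    """
    sums = np.asarray(input_sums, dtype=float)
    not_yet_read = np.ones(sums.size, dtype=bool)    # cells that have not read out
    order = []
    while not_yet_read.any():
        winner = int(np.argmax(np.where(not_yet_read, sums, -np.inf)))
        order.append(winner)                          # winner reads out
        not_yet_read[winner] = False                  # inhibition removes it from competition
    return order

# Input sums like those arriving at the localist target CF in Fig. 2b-i / Fig. 3:
print(serial_readout([5, 4, 3, 2]))   # -> [0, 1, 2, 3]: items in descending likelihood order
```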

Fig. 6: Simulation results showing that the CSA approximately preserves similarity.
[Figure panels: a) the six learned inputs I1–I6 (top) and the test stimulus I7 (bottom); b) the code φ(I7); c, e, f) state variables (u, U, ρ, µ) for individual CMs (e.g., CMs 0, 7, and 15 of the 24); d) likelihood (L) of I1–I6. Model: Q = 24, K = 8. ∩ (in red) of I7 with I1–I6, as decimals: 0.417, 0.25, 0.167, 0.083, 0.083, 0.0.]

• "Fixed time": the # of steps to learn (store) a new item remains constant as the # of items stored grows.
• The learning algorithm is called the Code Selection Algorithm (CSA) (Rinkus, 1996, 2010, 2014, 2017).
• Table 1 states a simplified version of it. Figs. 4 and 5 then provide a semi-quantitative explanation of how the CSA approximately preserves similarity from inputs to codes (CPCs).

1. Fig. 4b shows a tiny instance of a CPC coding field (Q = 5, K = 3) connected to an input field comprised of eight binary pixels, with an active input, A, its CPC, φA, and the wts that would be increased to embed (learn) the association. Fig. 4a shows input A and three other inputs, B–D, with decreasing similarity (pixel overlap) to A.

2. Fig. 5a shows the learning event for A. A is presented, causing binary signals to be sent to the coding field (CF). However, all wts are initially zero (gray). Thus, all CF cells have zero input summation (u = 0), as shown in the u charts.

3. Because we assume that all inputs will be of the same weight (number of active pixels), we can normalize input sums to [0,1], denoted U. Here, all U values are also zero.

Fig. 4: a) Four inputs, A–D. b) Model instance.

5. Consistent with the key principle stated above, since familiarity (G) is zero (i.e., novelty is maximal), we add maximal noise into the process of selecting the code.
 • Algorithmically, this is done by creating uniform probability distributions (ρ) (highlighted in yellow) in the Q WTA CMs and choosing a winner in each one. Thus, the overall code, φA, assigned to A is completely random.
 • Neurally, we hypothesize this is done via some fast time-scale modulation of one or more neuromodulators (ACh, NE, DA), which boosts the intrinsic excitability of the competing cells, thus reducing the influence of the synaptic inputs (which reflect prior learning, thus signal). See Rinkus (2010) for the neural hypothesis.

6. Now, having learned (stored) A, Figs. 5b–e consider four possible next inputs to the model. Fig. 5b shows the state variables (u, U, ρ) if A were presented again, i.e., a retrieval test trial for A. In this case, due to the wts increased in Fig. 5a (black wts in Figs. 5b–e), the units that won (by chance) during the learning trial now have u = 5, thus U = 1, and thus G = 1, correctly indicating that the input is completely familiar.

7. In this case, we want the prior learning, i.e., the signal, to dominate the choice of winner in each CM. Thus, we want to add zero noise to the ρ distributions from which the winners will be chosen. If we really add zero noise, then the resulting ρ distribution in each CM would have all probability mass on the cell that was included in φA and zero mass on the other cells. Technically, this is a hard max, and it is the optimal policy if the model "knows" it is in a pure retrieval (test) mode. However, if the model does not know that, i.e., if it is operating autonomously in the world, then even when G = 1, some small probability is still given to the cells with non-maximal u (thus U) values, i.e., the winner in each CM is formally chosen by softmax. This is the case shown in Fig. 5b. Because the distributions are so peaked over the max-u cell (in each CM), we depict the statistically plausible case where the max-ρ (thus max-u) cell happens to be selected in all Q = 5 CMs, i.e., φA is reactivated in its entirety.

4. The CSA uses the U values across all Q CMs to compute a measure, G, of the familiarity (i.e., inverse novelty) of the input. As Table 1, Steps 3 and 4, show, G is the average of the max U values across the Q CMs. The semantics of G will become clearer in Figs. 5b–e, but in Fig. 5a, G = 0.
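A compact sketch of how u, U, and G relate, under assumptions about array layout (the full computation is Table 1's; the function name and shapes here are mine):

```python
import numpy as np

def input_sums_and_familiarity(W, x, Q, K):
    """Compute u, U, and G for a binary input x (illustrative sketch, cf. Table 1 Steps 3-4).

    W : (n_pixels, Q*K) binary weight matrix, with CF cells grouped by CM
    x : (n_pixels,) binary input vector
    """
    x = np.asarray(x)
    u = x @ W                                  # raw input summation for each CF cell
    U = u / max(int(x.sum()), 1)               # normalize by input weight (# of active pixels)
    G = U.reshape(Q, K).max(axis=1).mean()     # familiarity: mean of the per-CM max U values
    return u, U, G
```

With all weights zero (Fig. 5a) this gives G = 0; re-presenting A after the weights of φA have been increased (Fig. 5b) gives U = 1 on every cell of φA, hence G = 1.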

8. The key principle is then clearly seen by looking across Figs. 5c–e, from left to right. In Fig. 5c, item B, which has 4 out of 5 pixels in common with A, is presented (here, red indicates non-intersecting cells, both in the input and coding fields). Thus, the cells in φA each have u = 4, thus U = 0.8, yielding G = 0.8, a high, but non-maximal, familiarity. Accordingly, more noise is added to the ρ distributions (CSA Steps 5–7), making them slightly flatter than in Fig. 5b. Consequently, we show an overall code, φB, being chosen with high, but non-maximal, intersection with φA.

9. The same logic applies to the last two cases, Figs. 5d and 5e. The progressively smaller input similarities (to input A) yield progressively lower u, U, and thus G, values, which yield more noise (flatter ρ distributions) and progressively lower expected intersection of the resulting code with φA.

10. Overall, what this example depicts is the approximate preservation of similarity by adding novelty-contingent noise into the process of choosing a code.
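The novelty-contingent noise of items 5–10 can be sketched as a per-CM softmax whose sharpness is controlled by G. This is only a minimal illustration: the exact G-to-ρ transform is given by CSA Steps 5–7 in Table 1, and the linear G-to-beta mapping and the beta_max parameter below are assumed stand-ins.

```python
import numpy as np

def choose_code(U, Q, K, G, beta_max=20.0, rng=None):
    """Pick one winner per CM from a softmax whose sharpness grows with familiarity G.

    G = 0 (novel input)    -> uniform rho in each CM -> an entirely random code.
    G = 1 (familiar input) -> sharply peaked rho     -> the stored code is re-activated
                                                        with high probability.
    The linear G -> beta mapping is an illustrative stand-in for CSA Steps 5-7.
    """
    rng = np.random.default_rng() if rng is None else rng
    U_cm = np.asarray(U, dtype=float).reshape(Q, K)
    beta = beta_max * G                         # more familiarity -> less noise
    winners = []
    for cm_U in U_cm:
        rho = np.exp(beta * cm_U)
        rho /= rho.sum()                        # the rho distribution over this CM's K cells
        winners.append(int(rng.choice(K, p=rho)))
    return winners                              # the chosen CPC: one winning cell per CM
```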

Table 1: Code Selection Algorithm (CSA)

• Fig. 6 shows experimental (simulation) results for a slightly larger model (Fig. 6b). The input layer is 12x12 binary pixels. The CF has Q = 24 WTA CMs, each with K = 8 binary units.
• Panel a (top) shows six inputs, I1–I6, all with 12 active pixels, each learned with a single trial.
• Panel a (bottom) shows a test input, I7, which was manually created to have progressively smaller intersections with I1–I6. [Red pixels show intersections (n.b., this is opposite the convention for red in Fig. 2).]
• Panel b shows the code, φ(I7), that gets activated when I7 is presented. Here, black cells are those that intersect with the code φ(I1), where I1 is the stored input most similar to I7. Red cells are cells that won in their respective CMs but are not in φ(I1); green indicates a cell that did not win but is in φ(I1).
• Panel c shows the state variables [including µ, an auxiliary variable (see CSA Step 7)]. The main result appears in panel d, which shows that when I7 is presented, all the stored codes, φ(I1)–φ(I6), become active with strength approximately proportional to the similarities of their corresponding inputs to I7, where a code's activation strength is measured by the fraction of its units that are active.
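The code-activation measure used in panel d reduces to a set intersection; a minimal sketch follows (representing codes as sets of active cell indices is my own choice, and the example numbers are hypothetical):

```python
def activation_strength(active_code, stored_code):
    """Fraction of a stored code's units that are currently active (the measure in panel d).

    Codes are sets of active cell indices, one cell per CM, so |stored_code| = Q.
    """
    stored = set(stored_code)
    return len(stored & set(active_code)) / len(stored)

# Example: a retrieved code sharing 10 of Q = 24 winners with a stored code:
print(activation_strength(active_code=range(10), stored_code=range(24)))   # ≈ 0.417
```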

The two trials (for both learning and retrieval) shown above (CASE 3) repeat the first two of CASE 2, except that we also add a third level, a localist read-out level, at the top.

Fig. 5: a) Learning trial for input A; b) re-presentation (test) of A; c) learning B; d) learning C; e) learning D. [Each panel shows the state variables u, U, and ρ for the CF cells and the resulting code (φA, φB, φC, or φD).]

Note: Fig. 6 uses a slightly different notation for codes, e.g., φ(I7) vs. φI7.

ii) Fixed-size Sparse Spatial Code (Combinatorial Population Code)

[Fig. 2 diagram labels: source codes φ1–φ4 map to signals 1–4; i) variable-size "thermometer" code; rate code and time code; decode window T; Source Coding Field (CF); Target Coding Field (CF).]

• NOTE: we assume that likelihood correlates with similarity. True, it is easy to find input domains where this is not true. However, it is true for vast regions of naturalistic input spaces. E.g., for the visual case, as an object rotates in 3-space, all or much of it deforms continuously (approximately linearly) most of the time. Discontinuities, e.g., when the handle of a coffee cup becomes obscured, are by far the exception. The theory implicit herein applies specifically to the ranges of input domains for which assuming that likelihood approximately correlates with similarity is valid. Discontinuities are dealt with at the level of the overall hierarchy of CFs (e.g., macrocolumns) that comprise an overall model (not covered here).

• We know this because a simple circuit (Fig. 3) could read out the four items in descending likelihood order.

• Crucial Advantage: Each of the four signals sends more than two bits. This is because each of the four CPCs stored in the source CF represents the similarity structure over all four signals (again, see intersection charts), or in other words, the full similarity (likelihood) ordering over all four signals.
• Since there are 4! = 24 possible orderings of four items, each CPC contains log2(24) = 4.58 bits of info. Thus, the wave of single spikes sent from the active source CPC transmits 4.58 bits, instead of only 2 bits, as for the codes of Figs. 2a and 2b-i.
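As a quick check of that arithmetic (n = 4 items here; the same counting argument applies for other n):

```python
from math import factorial, log2

n = 4
print(log2(n))             # 2.0    bits: identity of one of 4 signals (Figs. 2a, 2b-i)
print(log2(factorial(n)))  # ~4.585 bits: one of 4! = 24 orderings of the stored items
```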

• However, if the source field represents signals using a fixed-size Combinatorial Population Code (CPC) (defined at left), then fundamentally more efficient signaling becomes possible. Specifically, if the code, i.e., the mapping, is learned and the learning process approximately preserves similarity, i.e., maps more similar inputs to more highly intersecting codes, then: when any one code is fully active in the source CF, the wave of single (thus, first) spikes sent simultaneously from the active units comprising that code transmits the explicit similarities (likelihoods) of all N stored codes, i.e., such that all N explicit similarities / likelihoods can be read out (decoded).

• Here I describe a new kind of atemporal population spiking code, the Combinatorial Population Code (CPC), which:
  a) Also requires a T of only order one spike width, and
  b) Sends the entire explicit likelihood distribution over all items (signal values) stored in the source coding field.
• Crucially, this contrasts with Probabilistic Population Code (PPC) models [e.g., Georgopoulos et al. (1986), Pouget et al. (1998), Jazayeri & Movshon (2006)], which also send only one value, possibly with additional information about the shape of the distribution or the uncertainty of the sent value, but do not send the entire explicit distribution, i.e., in a way where the likelihoods of the individual items stored in the source CF can be decoded from the sent signal.

• Only one signal can be sent at a time.
• For Rate / Thermometer codes, different signals have different energies (different #s of spikes).
• Each possible signal carries two bits, i.e., distinguishes one of four possible values.

• Thus, unlike the spike codes of Figs. 2a and 2b-i, the decoded signal in the target CF manifests as the identity of the winning cell, i.e., the one with the highest input sum, not the activation level(s) of any unit(s).
• Signal encoded in the source field by the number (fraction) of active cells (cf. Gerstner et al., Fig. 7.8).
• Signal encoded in the message as the input sum to the single target cell.

• A CPC Coding Field (CF) consists of Q WTA Competitive Modules (CMs).
• Each CM consists of K binary units (pyramidal cell analogs).
Input, A
• The weights increased in storing, i.e., learning, the association, A → φA.
• Single-trial learning.

Input Field
• Binary features (pixels)
• Fully connected to CF
• All wts initially 0
L2/3 Coding Field (CF)
• Q = 7 WTA CMs
• K = 7 units / CM
• Code space: K^Q

Macrocolumn; Minicolumn = CM (Fig. from Peters & Sethares, 1996)
Fig. 1: What is a Combinatorial Population Code (CPC)?
A particular CPC, φA
• A CPC is a particular kind of modular sparse distributed code (SDC).

i) Variable-size Spatial Code

• Decoded signal (red number) in the target field manifests as the activation level (e.g., spike rate or time) of the target cell.
• Requires a time window (T) of order one spike width to send any of the signals, i.e., transmit time is independent of N.

Disadvantages

• A variable-size atemporal (purely spatial) code (Fig. 2b-i) is better because T need only be of order one spike width to send any signal, i.e., T is independent of N.

• The sent signal, i.e., the vector of input sums across the target units (numbers over units), encodes the likelihoods of all items (hypotheses) stored in the source CF.
Source Coding Field (CF): Fixed-size CPC
Target Coding Field (CF): Localist

[CASE 1 figure: Learning column shows the four source codes φ1–φ4 associated with localist target cells 1–4; Retrieval column shows the input sums arriving at target cells 1–4 when each source code is reinstated: φ1 → (5, 4, 3, 2), φ2 → (4, 5, 4, 3), φ3 → (3, 4, 5, 4), φ4 → (2, 3, 4, 5). Source Coding Field (CF); Target Coding Field (CF).]

[Intersections chart: ∩ of each stored code with φ1, φ2, φ3, and φ4.]
CASE 1:

[CASE 2 figure: Learning and Retrieval columns for the associations between source codes φ1^A–φ4^A and target codes φ1^B–φ4^B; for each retrieval, the per-cell input sums at the target CF are shown, with the max-sum cell in each CM in red.]
Source Coding Field (CF): Fixed-size CPC
Target Coding Field (CF): Fixed-size CPC
CASE 2:

• The four CPCs (codes), φ1–φ4 (at right), were manually chosen so that their intersection structure correlates with scalar similarity, i.e., codes φ2, φ3, and φ4 have decreasing intersection with φ1 (cyan represents cells that do not intersect with φ1).

• In contrast to Case 1, in Case 2 the decoded signal in the target CF (also called CF B) manifests as a particular combination of cells, namely the Q = 5 winners in their respective CMs.

• Reading down the Learning column, we show the single-trial learning events for the four associations, φ1→1, φ2→2, φ3→3, and φ4→4. All wts are initially zero (gray) and are set to 1 (black) upon pre-post coincidence.

• Unlike the spike codes of Figs. 2a and 2b-i, there is no intrinsic relation between the scalar signals, 1–4, and the CPCs. That is, a CPC is just a particular combination of units. All CPCs are of size Q = 5, and the input summation for the most likely sent value is always Q = 5. So any such similarity-preserving relation must either be manually designed or learned.

• While similarity-preserving codes were manually chosen in this example, an extremely efficient, fixed-time, single-trial, unsupervised learning algorithm that approximately preserves similarity is described below.

• Together, the Retrieval and Intersections columns show that after having learned (stored) the four associations, reinstating any of the four source codes transmits the entire explicit similarity (likelihood) distribution to the target field with a wave of single spikes from the active code.
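A minimal sketch of the storage and retrieval just described, with codes represented as lists of active cell indices (the data layout and function names are mine; the poster's figures use hand-chosen Q = 5 codes):

```python
import numpy as np

def learn_associations(pairs, n_src, n_tgt):
    """Binary Hebbian storage: a weight is set to 1 upon pre-post coincidence.

    pairs : list of (source_code, target_code); each code is a list of active cell indices.
    """
    W = np.zeros((n_src, n_tgt), dtype=int)
    for src, tgt in pairs:
        W[np.ix_(src, tgt)] = 1          # strengthen every co-active (pre, post) pair
    return W

def retrieve(source_code, W, Q, K):
    """Send one spike per active source cell; return target input sums and per-CM winners."""
    sums = W[source_code].sum(axis=0)            # input sum at every target cell
    winners = sums.reshape(Q, K).argmax(axis=1)  # max-sum cell in each target CM
    return sums, winners
```

For a target cell that participates in only one stored association, its input sum equals the overlap between the currently active source code and that association's source code, which is why the Retrieval sums mirror the Intersections charts.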

The cortical learning algorithm that makes the efficient communication of probability distributions possible

The CSA's key principle: Add noise proportional to the novelty of an input X into the process of choosing a code (CPC) for X.

CASE 3 is included just to show more clearly that if the target CF also uses a CPC, then:
 a) Not only can the full explicit likelihood distribution be sent simultaneously to the target field, but
 b) It can also be decoded (read out) simultaneously!

• Requires a time window (T) of order N × spike width to send a signal, where N is the number of different signals that can be sent.

Alternate, "linear" views of the CPC CF (used in the figures below).

• An additional advantage of the CPC-based, atemporal spike code described here is that all signals have the same energy (same # of spikes). And this remains true as additional codes (associations) are stored.

• As in Case 1, the Retrieval column shows that when any of the four source signals is sent, the entire likelihood distribution is sent. That information is present in the vector of input sums across the target cells and, more importantly, in the Q = 5 cells with the max sum in their respective CMs (shown in red).
• Note: In these four retrieval trials, we can use a hard max in each CM to choose a winner.
• However, if the model is in learning mode, it instead uses softmax in each CM, as described in the bottom panel.

• In order to demonstrate the final, most powerful claim, we need to have the target CF (now also called CF B) also use a CPC. We manually chose a set of CF B codes whose intersection structure can also represent scalar similarity (as can be seen at right).
• During learning, we associated the source CF (CF A) codes with the CF B codes in a way that preserved similarity, as shown at right. Note: the source CF codes now have a superscript, A; similarly for the target CF codes.

For source CF A, active units are black or blue; blue indicates units not ∩'ing with φ1. For target CF B, active units are black or red; red indicates units not ∩'ing with φ1.

[CASE 3 figure: Learning column shows the first two associations of CASE 2, φ1^A → φ1^B and φ2^A → φ2^B, each further associated with a localist read-out cell (1 and 2); Retrieval column shows the input sums arriving at CF B and at the read-out field when each source code is reinstated.]

CASE 3:

Source Coding Field (CF): Fixed-size CPC

Target Coding Field (CF): Fixed-size CPC

Read-out Field: Localist


The full likelihood distribution is not only sent, but also decoded, simultaneously

Proof:
• Figs. 2b-i and 3 already established that the circuit of Fig. 3 can serially read out the entire distribution in descending likelihood order, from the localist read-out field of Fig. 2b-i.
• The information needed to do that is present in the wave of single (and, wlog, simultaneous) spikes that arrive at CF B.
• This is true in CASE 3 as well, even though the target CF, CF B, now also uses a CPC.
• Let CF B then send a wave of simultaneous single spikes to the localist read-out field.
• If the localist field can then be serially read (as in Fig. 3), then it must be the case that the decoded signal is present in that wave of spikes from CF B to the read-out field.
• Therefore, at the instant the code is activated in CF B (based on the wave of spikes from CF A), it has already been decoded.