
A combinatorial population code can simultaneously transmit the full similarity (likelihood) distribution via an atemporal first-spike code. Accepted as a poster at NAISys 2020 (Nov. 9-12).

Authors:
  • Neurithmic Systems

Abstract and Figures

A simple, atemporal, first-spike code, operating on Combinatorial Population Codes (CPCs) (a.k.a., binary sparse distributed representations) is described, which allows the similarities (more generally, likelihoods) of all items (hypotheses) stored in a CPC field to be simultaneously transmitted with a wave of single spikes from any single active code (i.e., the code of any one particular stored item). Moreover, the number of underlying binary signals sent remains constant as the number of stored items grows.
A Combinatorial Population Code (CPC) can simultaneously transmit the full similarity (likelihood) distribution via an atemporal first-spike code
Rod Rinkus, Chief Scientist, Neurithmic Systems (www.sparsey.com); Lead Research Scholar, Center for Brain-Inspired Computing, Purdue University

Summary
Most prior spike codes fall into one of two classes, rate codes and time-of-spike codes (either absolute or relative), and both have also been generalized to populations. As shown in Fig. 2a, these are both fundamentally temporal. If both the source and target are viewed as single units, the main weaknesses are:
a) Only one value can be sent at a time.
b) To a first approximation, sending an N-ary signal requires a decode window (T) that is of order N × the width of a single spike.
Fig. 2a: Temporal Spike Coding. Fig. 2b: Atemporal (purely spatial) Spike Coding.
Note: the neural substrate (“hardware”) is used unevenly; e.g., the leftmost source unit is on in every signal, whereas the rightmost unit is on only for signal “4”, etc.
Rate (frequency) code: signal encoded in the number of spikes per unit time. Time code: signal encoded in the precise or relative time(s) of spike(s). Note: to describe these temporal codes, both source and target CFs need only one cell. Either way, sending a signal requires a time window (T) of order N × spike width, where N is the number of different signals that can be sent.
Fig. 3: Serial readout of stored items in descending rank order.
a) The active source code, φ1, sends its message to the target CF. Cell 1 has the max input sum and wins, reads out, and turns on the inhibitory cell (pink).
b) The inhibitory cell sends to all target cells, but only affects the currently active cell, cell 1.
c) The remaining cells again compete. Cell 2 (which represents φ2) wins, reads out, and also reactivates the inhibitory cell.
d) The inhibitory cell sends to all target cells, but only affects the currently active cell, cell 2, and so on, until all cells have read out.
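As an illustrative sketch (not taken from the poster itself), the Fig. 3 behavior can be paraphrased as a loop that repeatedly picks the not-yet-read-out target cell with the highest input sum and then suppresses it, mimicking the inhibitory cell; the function and variable names below are hypothetical.

```python
# Minimal sketch of the Fig. 3 serial-readout circuit (hypothetical names).
# input_sums[i] is the input summation of target cell i while a source code is active.
def serial_readout(input_sums):
    """Read out target cells in descending order of input sum, emulating
    winner-take-all followed by inhibition of the current winner."""
    remaining = dict(enumerate(input_sums))
    order = []
    while remaining:
        winner = max(remaining, key=remaining.get)  # cell with the max input sum wins
        order.append(winner)                        # the winner "reads out"
        del remaining[winner]                       # inhibitory cell silences the winner
    return order

# Example: the CASE 1 retrieval sums produced when source code phi_1 is active.
print(serial_readout([5, 4, 3, 2]))  # -> [0, 1, 2, 3], i.e., descending likelihood order
```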
Fig. 6: Simulation results showing that the CSA approximately preserves similarity. [Panels: a) the learned inputs I1-I6 and the test stimulus I7; b) the code φ(I7); c, e, f) the state variables (u, U, ρ, µ) for representative CMs (e.g., CM 0, CM 7, CM 15); d) the likelihoods (L, in red, and as decimals) of I7 with respect to I1-I6: 0.417, 0.25, 0.167, 0.083, 0.083, 0.0. Model: Q = 24 WTA CMs, K = 8 units per CM.]
The cortical learning algorithm that makes the efficient communication of probability distributions possible

The CSA’s key principle: add noise proportional to the novelty of an input X into the process of choosing a code (CPC) for X.

“Fixed time”: the number of steps to learn (store) a new item remains constant as the number of items stored grows.

The learning algorithm is called the Code Selection Algorithm (CSA) (Rinkus, 1996, 2010, 2014, 2017). Table 1 states a simplified version of it. Figs. 4 and 5 then provide a semi-quantitative explanation of how the CSA approximately preserves similarity from inputs to codes (CPCs).
1. Fig. 4b shows a tiny instance of a CPC coding field (Q = 5, K = 3) connected to an input field comprised of eight binary pixels, with an active input, A, its CPC, φA, and the weights that would be increased to embed (learn) the association. Fig. 4a shows input A and three other inputs, B-D, with decreasing similarity (pixel overlap) to A.
2. Fig. 5a shows the learning event for A. A is presented, causing binary signals to be sent to the coding field (CF). However, all weights are initially zero (gray). Thus, all CF cells have zero input summation (u = 0), as shown in the u charts.
3. Because we assume that all inputs will be of the same weight (number of active pixels), we can normalize input sums to [0,1], denoted U. Here, all U values are also zero.
Fig. 4: a) Four inputs, A-D. b) Model instance.
4. The CSA uses the U values across all Q CMs to compute a measure, G, of the familiarity (i.e., inverse novelty) of the input. As Table 1 Steps 3 and 4 show, G is the average of the max U values across the Q CMs. The semantics of G will become clearer in Figs. 5b-e, but in Fig. 5a, G = 0.
5. Consistent with the key principle stated above, since familiarity (G) is zero (i.e., novelty is maximal), we add maximal noise into the process of selecting the code. Algorithmically, this is done by creating uniform probability distributions (ρ) (highlighted in yellow) in the Q WTA CMs and choosing a winner in each one. Thus, the overall code, φA, assigned to A is completely random. Neurally, we hypothesize this is done via some fast time-scale modulation of one or more neuromodulators (ACh, NE, DA), which boosts the intrinsic excitability of the competing cells, thus reducing the influence of the synaptic inputs (which reflect prior learning, thus signal). See Rinkus (2010) for the neural hypothesis.
6. Now, having learned (stored) A, Figs. 5b-e consider four possible next inputs to the model. Fig. 5b shows the state variables (u, U, ρ) if A were presented again, i.e., a retrieval (test) trial for A. In this case, due to the weights increased in Fig. 5a (black weights in Figs. 5b-e), the units that won (by chance) during the learning trial now have u = 5, thus U = 1, and thus G = 1, correctly indicating that the input is completely familiar.
7. In this case, we want the prior learning, i.e., the signal, to dominate the choice of winner in each CM. Thus, we want to add zero noise into the ρ distributions from which the winners will be chosen. If we really add zero noise, then the resulting ρ distribution in each CM would have all probability mass on the cell that was included in φA and zero mass for the other cells. Technically, this is a hard max and is the optimal policy if the model “knows” it is in a pure retrieval (test) mode. However, if the model does not know that, i.e., if it is operating autonomously in the world, then even when G = 1, some small probability is still given to the cells with non-maximal u (thus U) values, i.e., the winner in each CM is formally chosen by softmax. This is the case shown in Fig. 5b. Because the distributions are so peaked over the max-u cell (in each CM), we depict the statistically plausible case where the max-ρ (thus max-u) cell happens to be selected in all Q = 5 CMs, i.e., φA is re-activated in its entirety. (A code sketch of this novelty-contingent softmax is given after Table 1 below.)
8. The key principle is then clearly seen by looking across Figs. 5c-e, from left to right. In Fig. 5c, item B, which has 4 out of 5 pixels in common with A, is presented (here, red indicates non-intersecting cells, in both the input and coding fields). Thus, the cells in φA each have u = 4, thus U = 0.8, yielding G = 0.8, a high but non-maximal familiarity. Accordingly, more noise is added into the ρ distributions (CSA Steps 5-7), making them slightly flatter than in Fig. 5b. Consequently, we show an overall code, φB, being chosen with high, but non-maximal, intersection with φA.
9. The same logic applies to the last two cases, Figs. 5d and 5e. The progressively smaller input similarities (to input A) yield progressively lower u, U, and thus G, values, which yield more noise (flatter ρ distributions) and progressively lower expected intersection of the resulting code with φA.
10. Overall, what this example depicts is the approximate preservation of similarity by adding novelty-contingent noise into the process of choosing a code.
Table 1: Code Selection Algorithm (CSA)
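A hedged sketch of a simplified CSA, following steps 1-10 above (compute each cell's input sum u, normalize to U, take familiarity G as the mean of the per-CM max U, then pick one winner per CM from a ρ distribution whose flatness grows with novelty). The softmax-with-temperature form, the function names, and the parameter values are assumptions, and the auxiliary variable µ (CSA Step 7) is omitted.

```python
import numpy as np

def csa_choose_code(x, W, Q, K, rng, min_temp=0.05, max_temp=10.0):
    """Hedged sketch of a simplified Code Selection Algorithm (CSA).
    x: binary input vector; W: binary weight matrix of shape (Q*K, len(x)),
    whose rows are grouped into Q WTA CMs of K cells each."""
    x = np.asarray(x)
    u = W @ x                                    # raw input summation of each CF cell
    U = (u / max(x.sum(), 1)).reshape(Q, K)      # normalize to [0,1]; inputs assumed equal weight
    G = U.max(axis=1).mean()                     # familiarity: mean of per-CM max U (Table 1 Steps 3-4)
    temp = min_temp + (1.0 - G) * (max_temp - min_temp)   # novelty-contingent noise (cf. Steps 5-7)
    code = []
    for q in range(Q):
        z = U[q] / temp
        rho = np.exp(z - z.max())
        rho /= rho.sum()                         # the CM's rho distribution (softmax)
        code.append(q * K + rng.choice(K, p=rho))  # one winner drawn per CM
    return np.array(code), G

def store(x, code, W):
    """Single-trial Hebbian storage: weights from active inputs to the code's cells are set to 1."""
    W[np.ix_(code, np.flatnonzero(x))] = 1
    return W
```

With all weights initially zero, G = 0 and the ρ distributions are (nearly) uniform, so the code chosen for the first input is essentially random, matching the behavior described for Fig. 5a; with the weights of a stored input reinstated, G approaches 1 and the distributions become sharply peaked, as in Fig. 5b.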
Fig. 6 shows experimental (simulation) results for a slightly larger model (Fig. 6b). The input layer is 12x12 binary pixels. The CF has Q = 24 WTA CMs, each with K = 8 binary units. Panel a (top) shows six inputs, I1-I6, all with 12 active pixels, that were learned with single trials. Panel a (bottom) shows a test input, I7, which was manually created to have progressively smaller intersections with I1-I6 [red pixels show intersections (n.b., this is opposite the convention for red in Fig. 2)].
Panel b shows the code, φ(I7), that gets activated when I7 is presented. Here, black cells are those that intersect with the code φ(I1), where I1 is the stored input most similar to I7. Red cells are cells that won in their respective CMs but are not in φ(I1); green indicates a cell that did not win but is in φ(I1). Panel c shows the state variables [including µ, an auxiliary variable (see CSA Step 7)].
The main result appears in panel d, which shows that when I7 is presented, all the stored codes, φ(I1)-φ(I6), become active with strength approximately proportional to the similarities of their corresponding inputs to I7, where a code's activation strength is measured by the fraction of its units that are active.
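As a small illustration of the measure used in panel d (a stored code's activation strength is the fraction of its cells that are currently active); the function name and the example cell indices are hypothetical:

```python
# Hedged sketch: strength of a stored code, given the currently active code, measured
# as the fraction of the stored code's cells that are currently active (as in Fig. 6d).
def code_strength(active_code, stored_code):
    return len(set(active_code) & set(stored_code)) / len(set(stored_code))

# Hypothetical Q = 24 example: a stored code sharing 10 of its 24 cells with the
# active code has strength 10/24, about 0.417 (the cell indices here are made up).
active = set(range(24))
stored = set(range(10)) | set(range(100, 114))
print(round(code_strength(active, stored), 3))   # -> 0.417
```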
Fig. 5: a) Learning trial for input A; b) re-presentation (test) of A; c) learning B; d) learning C; e) learning D. [Each panel shows the state variables (u, ρ, U) across the CF and the resulting code (φA, φB, φC, φD).]
Note: Fig. 6 uses a slightly different notation for codes, e.g., φ(I7) rather than φI7.
ii) Fixed-size Sparse Spatial Code (Combinatorial Population Code)
[Fig. 2 diagrams: Fig. 2a shows the Rate code and the Time code, each with a source coding field (CF), a target coding field (CF), the signals 1-4, and the decode window T. Fig. 2b shows i) a variable-size “thermometer” spatial code and ii) a fixed-size sparse spatial code (CPC), with the four codes φ1-φ4 over the source and target CFs.]
NOTE: we assume that likelihood correlates with similarity. True, it is easy to find input domains where this is not the case. However, it is true for vast regions of naturalistic input spaces. E.g., for the visual case, as an object rotates in 3-space, all or much of it deforms continuously (approximately linearly) most of the time. Discontinuities, e.g., when the handle of a coffee cup becomes obscured, are by far the exception. The theory implicit herein applies specifically to the ranges of input domains for which the assumption that likelihood approximately correlates with similarity is valid. Discontinuities are dealt with at the level of the overall hierarchy of CFs (e.g., macrocolumns) that comprise an overall model (not covered here).
We know this because a simple circuit (Fig. 3) could read out the four items in descending likelihood order.
Crucial Advantage: Each of the four signals sends more than two bits. This is because each of the four CPCs stored in the source CF represents the similarity structure over all four signals (again, see the intersection charts), or in other words, the full similarity (likelihood) ordering over all four signals. Since there are 4! = 24 possible orderings of four items, each CPC contains log2(24) = 4.58 bits of info. Thus, the wave of single spikes sent from the active source CPC transmits 4.58 bits, instead of only 2 bits, as for the codes of Figs. 2a and 2b-i.
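As a worked restatement of that arithmetic, with a hedged generalization to N stored items (the general formula is an illustrative extension, not a claim made on the poster):

```latex
\log_2(4!) = \log_2 24 \approx 4.58\ \text{bits},
\qquad
\log_2(N!) \approx N\log_2 N - N\log_2 e \ \ \text{(Stirling)} \ \gg\ \log_2 N \ \ \text{for large } N.
```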
However, if the source field represents signals using a fixed-size Combinatorial Population Code (CPC) (defined at left), then fundamentally more efficient signaling becomes possible. Specifically, if the code, i.e., the mapping, is learned and the learning process approximately preserves similarity, i.e., maps more similar inputs to more highly intersecting codes, then: when any one code is fully active in the source CF, the wave of single (thus, first) spikes sent simultaneously from the active units comprising that code transmits the explicit similarities (likelihoods) of all N stored codes, i.e., such that all N explicit similarities / likelihoods can be read out (decoded).
Here I describe a new kind of atemporal population spiking code, the Combinatorial Population Code (CPC), which:
a) also requires a T of only order one spike width, and
b) sends the entire explicit likelihood distribution over all items (signal values) stored in the source coding field.
Crucially, this contrasts with Probabilistic Population Code (PPC) models [e.g., Georgopoulos et al. (1986), Pouget et al. (1998), Jazayeri & Movshon (2006)], which also send only one value, possibly with additional information about the shape of the distribution or the uncertainty of the sent value, but do not send the entire explicit distribution, i.e., in a way where the likelihoods of the individual items stored in the source CF can be decoded from the sent signal.
Disadvantages (of the codes of Figs. 2a and 2b-i):
• Only one signal can be sent at a time.
• For Rate / Thermometer codes, different signals have different energies (different numbers of spikes).
• Each possible signal carries only two bits, i.e., distinguishes between one of four possible values.
Thus, unlike the spike codes of Figs. 2a and 2b-i, the decoded signal in the target CF manifests as the identity of the winning cell, i.e., the one with the highest input sum, not the activation level(s) of any unit(s).
Signal encoded in the source field by the number (fraction) of active cells (cf. Gerstner et al., Fig. 7.8). Signal encoded in the message as the input sum to the single target cell.
Fig. 1: What is a Combinatorial Population Code (CPC)?
A CPC Coding Field (CF) consists of Q WTA Competitive Modules (CMs). Each CM consists of K binary units (pyramidal cell analogs). A CPC is a particular kind of modular sparse distributed code (SDC). The instance shown has Q = 7 WTA CMs and K = 7 units per CM, giving a code space of K^Q. The input field consists of binary features (pixels), fully connected to the CF, with all weights initially 0. A particular CPC, φA, is shown, along with the weights increased in storing, i.e., learning, the association A → φA (single-trial learning). The CF corresponds to L2/3 of a macrocolumn, with minicolumn = CM (macrocolumn figure from Peters & Sethares, 1996).
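A minimal sketch of the structure just described (Q WTA CMs of K binary units, a code = one active unit per CM); the class and method names are hypothetical:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class CPCField:
    """Minimal sketch of a CPC coding field: Q WTA competitive modules (CMs),
    each containing K binary units. A code activates exactly one unit per CM."""
    Q: int
    K: int

    def code_space(self) -> int:
        return self.K ** self.Q                      # number of distinct codes

    def random_code(self, rng) -> np.ndarray:
        return rng.integers(0, self.K, size=self.Q)  # winner index chosen in each CM

cf = CPCField(Q=7, K=7)                              # the instance shown in Fig. 1
print(cf.code_space())                               # 7**7 = 823543 possible codes
print(cf.random_code(np.random.default_rng(0)))      # e.g., a completely random code
```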
i) Variable-size Spatial Code
The decoded signal (red number) in the target field manifests as the activation level (e.g., spike rate or time) of the target cell.
Requires a time window (T) of only order one spike width to send any of the signals, i.e., transmit time is independent of N.
A variable-size atemporal (purely spatial) code (Fig. 2b-i) is better because T need only be of order one spike width to send any signal, i.e., T is independent of N.
The sent signal, i.e., the vector of input sums across the target units (the numbers over the units), encodes the likelihoods of all items (hypotheses) stored in the source CF.
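A hedged sketch of that mechanism for CASE 1 below (CPC source, localist target). The specific Q, K, and the four codes are hypothetical stand-ins, chosen so that φ2-φ4 have decreasing intersection with φ1 as in the figure; only the mechanism itself (Hebbian storage, then computing the target input sums) is what the text describes.

```python
import numpy as np

# Source CF: a CPC with Q = 5 CMs of K = 4 units each. Target CF: localist (one
# cell per stored item). Each code is given as the winner index in each CM.
Q, K = 5, 4
codes = {1: [0, 0, 0, 0, 0],
         2: [0, 0, 0, 0, 1],
         3: [0, 0, 0, 1, 1],
         4: [0, 0, 1, 1, 1]}

def to_binary(code):
    v = np.zeros(Q * K)
    for q, k in enumerate(code):
        v[q * K + k] = 1
    return v

# Hebbian storage: a weight from each active source unit to the item's target cell.
W = np.stack([to_binary(codes[i]) for i in (1, 2, 3, 4)])  # shape: (4 target cells, Q*K)

# Retrieval: reinstating phi_1 yields the target input-sum vector, which equals the
# overlap of phi_1 with every stored code, i.e., the explicit similarity (likelihood) profile.
print(W @ to_binary(codes[1]))   # -> [5. 4. 3. 2.], as in the CASE 1 figure
```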
CASE 1: Source Coding Field (CF): Fixed-size CPC. Target Coding Field (CF): Localist.
[Figure: the Learning column shows the four source codes φ1-φ4 being associated with the localist target cells 1-4. The Retrieval column shows the target input sums produced when each source code is reinstated: φ1 → (5, 4, 3, 2), φ2 → (4, 5, 4, 3), φ3 → (3, 4, 5, 4), φ4 → (2, 3, 4, 5) across target cells 1-4. The Intersections charts show each source code's intersections with φ1, φ2, φ3, and φ4.]
CASE 2: Source Coding Field (CF): Fixed-size CPC. Target Coding Field (CF): Fixed-size CPC.
[Figure: the Learning column shows the four source (CF A) codes being associated with the four target (CF B) codes. The Retrieval column shows, for each reinstated source code, the input sums across the target CF's cells, from which the Q winners (the max-sum cell in each CM) are taken as the decoded code.]
The four CPCs (codes), φ1-φ4 (at right), were manually chosen so their intersection structure correlates with scalar similarity, i.e., codes φ2, φ3, and φ4 have decreasing intersection with φ1 (cyan represents cells that do not intersect with φ1).
In contrast to Case 1, in Case 2 the decoded signal in the target CF (also called CF B) manifests as a particular combination of cells, namely the Q = 5 winners in their respective CMs.
Reading down the Learning column, we show the single-trial learning events for the four associations, φ1 → 1, φ2 → 2, φ3 → 3, and φ4 → 4. All weights are initially zero (gray) and are set to 1 (black) upon pre-post coincidence.
Unlike the spike codes of Figs. 2a and 2b-i, there is no intrinsic relation between the scalar signals, 1-4, and the CPCs. That is, a CPC is just a particular combination of units. All CPCs are of size Q = 5, and the input summation for the most likely sent value is always Q = 5. So any such similarity-preserving relation must either be manually designed or learned.
While similarity-preserving codes were manually chosen in this example, an extremely efficient, fixed-time, single-trial, unsupervised learning algorithm that approximately preserves similarity is described below.
Together, the Retrieval and Intersections columns show that after having learned (stored) the four associations, reinstating any of the four source codes transmits the entire explicit similarity (likelihood) distribution to the target field with a wave of single spikes from the active code.
CASE 3 is included just to show more clearly that if the target CF also uses a CPC, then:
a) not only can the full explicit likelihood distribution be sent simultaneously to the target field, but
b) it can also be decoded (read out) simultaneously!
Alternate, “linear” views of the CPC CF (used in the figures below).
An additional advantage of the CPC-based, atemporal spike code described here is that all signals have the same energy (the same number of spikes). And this remains true as additional codes (associations) are stored.
As in Case 1, the Retrieval column shows that when any of the four source signals is sent, the entire likelihood distribution is sent. That information is present in the vector of input sums across the target cells, and more importantly, in the Q = 5 cells with the max sum in their respective CMs (shown in red).
Note: In these four retrieval trials, we can use hard max in each CM to choose a winner. However, if the model is in learning mode, it instead uses softmax in each CM, as described in the bottom panel.
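A minimal sketch of that per-CM hard-max decoding for a CPC target CF; the function name, the Q and K values, and the example sums are hypothetical:

```python
import numpy as np

# Hedged sketch of CASE 2 decoding: given the vector of input sums to the target CF's
# cells, hard max within each CM picks the decoded code (softmax would be used
# instead during learning).
def decode_target_code(target_sums, Q, K):
    sums = np.asarray(target_sums).reshape(Q, K)
    winners = sums.argmax(axis=1)                # max-input-sum cell in each CM
    return [q * K + w for q, w in enumerate(winners)]

# Hypothetical example: Q = 2 CMs of K = 4 cells, with sums peaked on cell 0 of CM 0
# and cell 1 of CM 1.
print(decode_target_code([5, 4, 3, 2, 4, 5, 4, 3], Q=2, K=4))  # -> [0, 5]
```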
CASE 3: Source Coding Field (CF): Fixed-size CPC. Target Coding Field (CF): Fixed-size CPC. Read-out Field: Localist.
In order to demonstrate the final, most powerful claim, we need the target CF (now also called CF B) to also use a CPC. We manually chose a set of CF B codes whose intersection structure can also represent scalar similarity (as can be seen at right). During learning, we associated the source CF (CF A) codes with the CF B codes in a way that preserved similarity, as shown at right. Note: the source CF codes now have a superscript, A; similarly for the target CF codes.
For source CF A, active units are black or blue; blue indicates units not intersecting with φ1^A. For target CF B, active units are black or red; red indicates units not intersecting with φ1^B.
The two trials (for both learning and retrieval) shown here repeat the first two of CASE 2, except that we also add a third level, a localist read-out field, at the top.
[Figure: CASE 3 Learning and Retrieval panels for the two associations φ1^A → φ1^B and φ2^A → φ2^B, with the localist Read-out Field above CF B.]
The full likelihood distribution is not only sent, but also decoded, simultaneously.
Proof: Figs. 2b-i and 3 already established that the circuit of Fig. 3 can serially read out the entire distribution, in descending likelihood order, from the localist read-out field of Fig. 2b-i. The information needed to do that is present in the wave of single (and, wlog, simultaneous) spikes that arrive at CF B. This is true in CASE 3 as well, even though the target CF, CF B, now also uses a CPC. Let CF B then send a wave of simultaneous single spikes to the localist read-out field. If the localist field can then be serially read out (as in Fig. 3), then it must be the case that the decoded signal is present in that wave of spikes from CF B to the read-out field. Therefore, at the instant the code is activated in CF B (based on the wave of spikes from CF A), it has already been decoded.