A cell assembly transmits the full likelihood distribution via an atemporal combinatorial spike code
Summary. The “neuron doctrine” says the individual neuron is the functional unit of meaning. For a single
source neuron, spike coding schemes can be based on spike rate or precise spike time(s). Both are fundamentally
temporal codes with two key limitations. 1) Assuming M different messages from (activation levels of) the source
neuron are possible, the decode window duration, T, must be ≥ M times a single spike’s duration. 2) Only one
message (activation level) can be sent at a time. If instead, we define the cell assembly (CA), i.e., a set of co-
active neurons, as the functional unit of meaning, then a message is carried by the set of spikes simultaneously
propagating in the bundle of axons leading from the CA’s neurons. This admits a more efficient, faster,
fundamentally atemporal, combinatorial coding scheme, which removes the above limitations. First, T becomes
independent of M, in principle, shrinking to a single spike duration. Second, multiple messages, in fact, the entire
similarity (thus, likelihood) distribution over all items stored in the coding field can be sent simultaneously. This
requires defining CAs as sets of fixed cardinality, Q, which allows the similarity structure over a set of items
stored as CAs to be represented by their intersection structure. Moreover, when any one CA is fully active (all Q
of its neurons are active), all other CAs stored in the coding field are partially active proportional to their
intersections with the fully active CA. If M concepts are stored, there are M! possible similarity orderings. Thus,
sending any one of those orderings sends log2(M!) bits, far exceeding the log2(M) bits sent by any single message
using temporal spike codes. This marriage of a fixed-size CA representation and atemporal coding scheme may
explain the speed and efficiency of probabilistic computation in the brain.
For a single source neuron, two types of spike codes are possible: rate (frequency) coding (Fig. 1b) and latency coding, e.g., spike time(s) relative to an event such as gamma phase (Fig. 1c). Both are fundamentally temporal, requiring a
decode window duration T much longer than a single spike. [n.b. no relation between T and axon length intended.]
Further, T must grow with the number of unique values that need to be reliably sent/decoded. But if information
is represented by Hebbian cell assemblies (CAs), a particular kind of distributed code wherein items are
represented by sets of co-active neurons chosen from a (typically much larger) coding field, then messages are
carried by sets of spikes propagating in bundles of axons. N.b.: Some population-based models remain
fundamentally temporal: the signal depends on spike rates of the afferent axons, e.g., [1-4] (not shown in Fig. 1).
Figure 1. Temporal vs. non-temporal spike coding concepts. a) Scalar signal; temporal codes: b) rate, c) latency; non-temporal population codes: d) variable-size, e) fixed-size CA (a source field of Q=5 WTA modules, K=4 binary units each, projecting to a 4-cell target field; reactivating φ1–φ4 yields target-cell input sums (5,4,3,2), (4,5,4,3), (3,4,5,4), and (2,3,4,5), respectively).
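As a back-of-envelope illustration of these decode-window requirements (my own sketch, not a result from the paper; the 2 ms spike duration is an assumed illustrative value), the following Python fragment contrasts the window a single-neuron rate code needs to distinguish M levels with the single-spike window available to the atemporal population read described next:

SPIKE_DUR_MS = 2.0  # assumed duration of one spike "slot" (illustrative value only)

def rate_code_window_ms(M, spike_dur_ms=SPIKE_DUR_MS):
    # A single neuron signalling one of M activation levels by spike count
    # needs a decode window holding up to M distinguishable spikes: T >= M slots.
    return M * spike_dur_ms

def atemporal_window_ms(M, spike_dur_ms=SPIKE_DUR_MS):
    # A code read out from one simultaneous spike volley needs only a single
    # spike duration, independent of M.
    return spike_dur_ms

for M in (4, 16, 64, 256):
    print(M, rate_code_window_ms(M), atemporal_window_ms(M))
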
Distributed coding allows a fundamentally atemporal coding scheme where the signal is encoded in the
pattern of instantaneous sums of spikes arriving simultaneously via multiple afferent synapses onto the target
field neurons, in principle, allowing T to shrink to the duration of a single, i.e., first, spike. Fig. 1d illustrates one
such scheme [5] in which the fraction of active neurons in a source population carries the message [input summation (number of simultaneous spikes) shown next to the target cell for each of the four signal values].
This variable-size population (a.k.a. “thermometer”) coding scheme has the benefit that all signals are sent in the
same, short, time, but it is not combinatorial in nature, and has many limitations: a) uneven usage of the neurons;
b) different messages (signals) require different energies; c) individual neurons represent localistically, e.g., each
source field neuron represents a specific increment of a scalar encoded variable; d) the max number of
representable values (items/concepts) is the number of units comprising the coding field; and most importantly,
e) any single message sent represents only one value, e.g., a single value (level) of a scalar variable, implying that
any one message carries log2(M) bits, where M is the number of possible messages (values).
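To make the variable-size (thermometer) scheme of Fig. 1d concrete, here is a minimal Python sketch (my own, assuming M=4 levels as in the figure); it shows both the encoding and the single-sum decode, and makes limitations (a), (b) and (e) easy to see:

import numpy as np

M = 4  # number of source neurons = number of representable signal levels

def encode_thermometer(level, M=M):
    # Level v (1..M) is sent by activating the first v source neurons.
    code = np.zeros(M, dtype=int)
    code[:level] = 1
    return code

def decode_thermometer(code):
    # The instantaneous spike sum at a target cell recovers the level directly.
    return int(code.sum())

for level in range(1, M + 1):
    c = encode_thermometer(level)
    assert decode_thermometer(c) == level
    print(level, c.tolist(), decode_thermometer(c))

Note that level v costs v spikes (uneven neuron usage and energy), each unit is a localist increment detector, and any one volley names exactly one of the M levels.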
In contrast, consider the fixed-size CA representation of Fig. 1e. This CA coding field is organized as Q=5
WTA competitive modules (CMs), each with K=4 binary units. Thus, all codes are of the same fixed size, Q=5.
As explained in [6] and illustrated by charts at right of Fig. 1, if the learning algorithm preserves similarity, i.e.,
maps more similar values to more highly intersecting CAs, then any single CA, φi, represents (encodes) the similarity distribution over all items (values) stored in the field. Note: blue denotes active units not in the intersection with φ1. If we can further assume that value (e.g., input pattern) similarity correlates with likelihood (as is reasonable for vast portions of input spaces having natural statistics), then any φi encodes not just the single item to which it was assigned during learning, but the detailed likelihood distribution over all items stored in the field. By “detailed distribution”, we mean the individual likelihoods of all stored items [including items learned (stored) after φi], not just a few higher-order statistics describing the distribution. Rinkus [7, 8] described a
neurally-plausible, unsupervised, single-trial, on-line, similarity-preserving learning algorithm that runs in fixed-
time, i.e., the number of steps to learn (i.e., store) an item remains fixed as the number of stored items grows, as
does the number of steps needed to retrieve the most similar (likely) stored input.
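The following Python sketch illustrates this intersection-based similarity structure for the Q=5, K=4 field of Fig. 1e. The four winner-per-module codes are hand-picked by me so that |φi ∩ φj| = 5 - |i - j|, matching the sums shown in the figure; the similarity-preserving learning algorithm of [7, 8] would assign such codes automatically, which this sketch does not attempt to reproduce:

Q, K = 5, 4  # 5 WTA modules, 4 binary units per module
codes = {    # winner index (0..K-1) chosen in each of the Q modules
    1: (0, 0, 0, 0, 0),
    2: (1, 0, 0, 0, 0),
    3: (1, 1, 0, 0, 0),
    4: (1, 1, 1, 0, 0),
}

def intersection(a, b):
    # Number of modules in which two codes share the same winner.
    return sum(x == y for x, y in zip(a, b))

# With φ1 fully active, every other stored code is partially active in
# proportion to its intersection with φ1, so the similarity (likelihood)
# distribution over all stored items is present simultaneously.
active = 1
for item in sorted(codes, key=lambda j: -intersection(codes[active], codes[j])):
    print(f"item {item}: {intersection(codes[active], codes[item])}/{Q} winners shared")
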
Crucially, since any one CA encodes the entire likelihood distribution, the set of single spikes sent from such
a code simultaneously transmits that entire distribution: the instantaneous sums at the target cells carry the
information. Note: whenever any CA, φi, is sent, 20 lines (axons) are active (black), meaning that all four target
cells will have Q (=5) active inputs. Thus, due to the combinatorial nature of the fixed-size CA code, the specific
values of the binary weights are essential to describing the code, unlike the other codes where the weights could
all be assumed to be 1. This emphasizes that for the combinatorial, fixed-size CA code, we need to view the
“channel”, i.e., the weight matrix, as having an internal structure that is changed during learning (by the signals
that traverse the weights), whereas for the other codes, the channel can be viewed as a “neutral bus”. Thus, for
the example of Fig. 1e, we assume: a) all weights are initially 0; b) the four associations, φ1→target cell 1, φ2→target cell 2, etc., were previously learned with single trials; and c) on those trials, coactive pre-post synapses were increased to wt=1. Thus, if φ1 is reactivated, target cell 1’s input sum will be 5 and the other cells’ sums will be as shown (to the left of the target cells). If φ2 is reactivated, target cell 2’s input sum will be 5, etc. [Black line: active, increased wt; dotted line: non-active, increased wt; gray line: non-increased wt.] The four target cells could be
embedded in a recurrent field with inhibitory infrastructure that would allow them to be read out sequentially in
descending input summation order. That implies that the full similarity (likelihood) order information over all
four stored items is sent in each of the four cases. Since there are 4! orderings of the four items, each such message, a volley of 20 simultaneous spikes sent from the five active CA neurons, sends log2(4!) ≈ 4.58 bits. I
suggest this marriage of fixed-size CAs and an atemporal first-spike coding scheme is a crucial advance beyond
prior population-based models, i.e., the “distributional encoding” models (see [9-11] for reviews), and may be
key to explaining the speed and efficiency of probabilistic computation in the brain.
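Putting the pieces of Fig. 1e together, the sketch below (again my own illustration under the assumptions stated above, with hand-picked codes, not the model’s published implementation) learns the four binary associations with single-trial Hebbian updates, reactivates φ1, reads the instantaneous target-cell sums, and counts the information carried by the resulting ordering:

import math
import numpy as np

Q, K = 5, 4                        # 5 WTA modules of 4 binary units = 20 source units
winners = {1: [0, 0, 0, 0, 0],     # hand-picked codes with |φi ∩ φj| = 5 - |i - j|
           2: [1, 0, 0, 0, 0],
           3: [1, 1, 0, 0, 0],
           4: [1, 1, 1, 0, 0]}

def to_vector(w):
    # Expand a winner-per-module code into a 20-bit source-field vector.
    v = np.zeros(Q * K, dtype=int)
    for module, winner in enumerate(w):
        v[module * K + winner] = 1
    return v

# Single-trial Hebbian learning: wt=1 wherever a CA unit and its associated
# target cell were co-active (all weights initially 0).
W = np.zeros((Q * K, 4), dtype=int)
for item, w in winners.items():
    W[:, item - 1] |= to_vector(w)

phi1 = to_vector(winners[1])
sums = phi1 @ W                          # instantaneous target-cell sums: [5, 4, 3, 2]
readout_order = np.argsort(-sums) + 1    # descending-sum readout: 1, 2, 3, 4
bits = math.log2(math.factorial(4))      # log2(4!) ≈ 4.58 bits per volley
print(sums.tolist(), readout_order.tolist(), round(bits, 2))

Each reactivated φi yields a distinct sum profile and hence a distinct readout order, which is the sense in which one volley conveys the full similarity ordering rather than a single value.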
References
1. Georgopoulos, A.P., A.B. Schwartz, and R.E. Kettner, Neuronal population coding of movement direction. Science, 1986. 233.
2. Stewart, T.C., T. Bekolay, and C. Eliasmith, Neural representations of compositional structures: representing and manipulating
vector spaces with spiking neurons. Connection Science, 2011. 23(2): p. 145-153.
3. Jazayeri, M. and J.A. Movshon, Optimal representation of sensory information by neural populations. Nat Neurosci, 2006. 9(5).
4. Zemel, R., P. Dayan, and A. Pouget, Probabilistic interpretation of population codes. Neural Comput., 1998. 10: p. 403-430.
5. Gerstner, W., et al., Neuronal Dynamics: From single neurons to networks and models of cognition. 2014, NY: Cambridge U. Press.
6. Rinkus, G., Quantum Computing via Sparse Distributed Representation. NeuroQuantology, 2012. 10(2): p. 311-315.
7. Rinkus, G., A Combinatorial Neural Network Exhibiting Episodic and Semantic Memory Properties for Spatio-Temporal Patterns,
in Cognitive & Neural Systems. 1996, Boston U.: Boston.
8. Rinkus, G., A cortical sparse distributed coding model linking mini- and macrocolumn-scale functionality. Frontiers in Neuroanatomy, 2010. 4.
9. Pouget, A., et al., Probabilistic brains: knowns and unknowns. Nat Neurosci, 2013. 16(9): p. 1170-1178.
10. Pouget, A., P. Dayan, and R. Zemel, Information processing with population codes. Nat Rev Neurosci, 2000. 1(2): p. 125-132.
11. Pouget, A., P. Dayan, and R.S. Zemel, Inference and Computation with Population Codes. Ann. Rev. of Neuroscience, 2003. 26(1).