
World Model Formation as a Side-effect of Non-optimization-based Unsupervised Episodic Memory (NNPC 2023, accepted as poster)

World Model Formation as a Side-effect of Non-optimization-based Unsupervised Episodic Memory
Gerard (Rod) Rinkus1
1Neurithmic Systems, Newton, MA, USA
Summary. Evidence suggests information is represented in the brain, e.g., neocortex,
hippocampus, in the form of sparse distributed codes (SDCs), i.e., small sets of principal cells, a
kind of “cell assembly” concept. Two key questions are: a) how are such codes formed (learned) on
the basis of single trials, as is needed to account for episodic memory; and b) how do more similar
inputs get mapped to more similar SDCs, as needed to account for similarity-based responding, i.e.,
responding based on a world model? I describe a Modular Sparse Distributed Code (MSDC) and
associated single-trial, on-line, non-optimization-based, unsupervised learning algorithm that
answers both questions. An MSDC coding field (CF) consists of Q WTA competitive modules
(CMs), each comprised of K binary units (principal cell analogs). The key principle of the learning
algorithm is to add noise inversely proportional to input familiarity (directly proportional to novelty)
to the deterministic synaptic input sums of the CF units, yielding a distribution in each CM from
which a winner is chosen. The more familiar an input, X, the less noise added, allowing prior
learning to dominate winner selection, resulting in greater expected intersection of the code assigned
to X, not only with the code of the single most similar stored input, but, because all codes are stored
in superposition, with the codes of all stored inputs. I believe this is the first proposal for a normative
use of noise during learning, specifically, for the purpose of approximately preserving similarity
from inputs to neural codes. Thus, the model constructs a world model as a side-effect of its primary
action, storing episodic memories. Crucially, it runs in fixed time, i.e., the number of steps needed
to store an item (an episodic memory) remains constant as the number of stored items grows.
The “Neuron Doctrine”, the idea that the individual (principal) neuron is the atomic functional unit of
meaning, has long dominated neuroscience. But improving experimental methods yield increasing evidence
that the “cell assembly” [1], a set of co-active neurons, is the atomic functional unit of representation and
thus of cognition [2, 3]. This raises two key questions. 1) How can a cell assembly be assigned to represent
an input based on a single trial, as needed to explain episodic memory? 2) How can similarity relations, not
just pairwise but ideally, of all orders present in the input space, be preserved in cell assembly space, i.e.,
how might a world model be built, as needed to explain similarity-based responding / generalization?
I describe a novel cell assembly concept, Modular Sparse Distributed Coding (MSDC), providing a
neurally plausible, computationally efficient answer to both questions. MSDC admits a single-trial, on-line,
unsupervised learning algorithm that approximately preserves similarity (maps more similar inputs to
more highly intersecting MSDCs) and, crucially, runs in fixed time, i.e., the number of steps needed to store
(learn) a new item remains constant as the number of stored items increases. Further, since the MSDCs of
all items are stored in superposition, such that their intersection structure reflects the input space's
similarity structure, best-match (nearest-neighbor) retrieval also runs in fixed time.
The MSDC coding field (CF) is modular: it consists of Q winner-take-all (WTA) Competitive Modules
(CMs), each comprised of K binary units. Thus, all codes stored in the CF are of size Q, one winner per
CM. This modular structure: i) distinguishes it from many “flat” SDC models, e.g., [4-6]; and ii) admits an
extremely efficient way to compute an input’s familiarity, G (a generalized similarity measure, defined
shortly), without requiring explicit comparison to each individual stored input. The model’s most important
contribution is a novel, normative use of noise (randomness) in the learning process to statistically preserve
similarity. Specifically, an amount of noise inversely proportional to G (directly proportional to novelty) is
injected into the process of choosing winners in the Q CMs. This causes the expected intersection of the
code, φ(X), assigned to a new input, X, with the code, φ(Y), of a previously stored input, Y, to be an
increasing function of the similarity of X and Y. Higher input familiarity maximizes expected intersection
with previously stored codes, embedding the similarity structure over the inputs. Higher novelty minimizes
expected intersection with previously stored codes, maximizing storage capacity. The tradeoff between
embedding statistical structure and capacity maximization is an area of active research [7, 8].
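As a concrete illustration of the point above, the following is a minimal NumPy sketch, my own and not from the paper, of the fixed-time familiarity computation: G is read directly off the coding field's weight matrix in a single pass over the Q×K units, with no comparison against individual stored items, and the amount of noise to inject is then derived from G. The linear novelty-to-noise mapping at the end is an assumed placeholder; the text specifies only that the noise is inversely proportional to G.

```python
import numpy as np

def familiarity_and_noise(W, x, active_per_input):
    """Fixed-time familiarity computation for an MSDC coding field (CF).

    W : weight tensor of shape (Q, K, n_inputs); all stored codes are superposed
        in this one tensor, so the cost below does not grow with the number of
        stored items.
    x : binary input vector of shape (n_inputs,) with active_per_input ones.
    """
    U = (W @ x) / active_per_input     # normalized input sums, shape (Q, K)
    G = float(U.max(axis=1).mean())    # average of the max U value over the Q CMs
    noise = 1.0 - G                    # novelty-proportional noise level (assumed linear form)
    return U, G, noise
```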
Fig. 1 shows the MSDC coding format and succinctly explains how adding noise proportional to
novelty into the code selection process approximately preserves similarity. Fig. 1a (bottom) shows a small
MSDC model instance: an input level composed of 8 binary units is fully connected to an MSDC CF with
Q=5 CMs, each with K=3 binary units. All wts are binary and initially zero (gray). The first input, A [5
active (black) units], is presented, giving rise to bottom-up binary signals to the CF. The U charts show
the normalized input sums for the 15 CF units: all are 0 since there has not yet been any learning.
Normalization is possible because we assume all inputs have exactly five active units and so can divide raw
input sums by 5. The algorithm then computes G, defined as the average max U value across the Q CMs;
here, G=0, indicating A is completely novel. The key step is to then transform the U distribution in each
CM into a noisy distribution (ρ values) from which a winner is chosen. In this case (G=0), substantial noise
is added (albeit to an already flat U distribution), resulting in a uniform ρ distribution and thus, a completely
random code, φ(A) (top of Fig. 1a, black units). The 25 wts from the active input units to the units
comprising φ(A) are then permanently increased to 1 (increased wts are black lines in subsequent panels).
Figure 1. Approximate similarity preservation via injection of novelty-contingent noise
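The Fig. 1a storage step just walked through can be condensed into a short, self-contained sketch with the figure's toy dimensions (8 input units, Q=5 CMs, K=3 units each, binary weights). This is an illustrative reconstruction, not the author's code; in particular, the softmax-style U-to-ρ transform whose sharpness grows with G, and the constant beta_max, are assumptions standing in for whatever specific transform the model actually uses.

```python
import numpy as np

class MSDCMemory:
    """Toy MSDC coding field: Q WTA competitive modules (CMs) of K binary units each."""

    def __init__(self, n_inputs=8, Q=5, K=3, active_per_input=5, beta_max=5.0, seed=0):
        self.Q, self.K = Q, K
        self.active = active_per_input          # every input is assumed to have this many 1s
        self.W = np.zeros((Q, K, n_inputs))     # binary weights, initially all 0 ("gray")
        self.beta_max = beta_max
        self.rng = np.random.default_rng(seed)

    def U(self, x):
        """Normalized input sums for all Q*K coding-field units, shape (Q, K)."""
        return (self.W @ x) / self.active

    def familiarity(self, x):
        """G: average of the max U value across the Q CMs (0 = novel, 1 = familiar)."""
        return float(self.U(x).max(axis=1).mean())

    def choose_code(self, U, G):
        """Draw one winner per CM from a rho distribution over its K units.

        The distribution is sharper (dominated by prior learning) when G is high
        and flatter (noisier) when G is low; at G = 0 it is uniform.
        """
        beta = self.beta_max * G
        code = np.empty(self.Q, dtype=int)
        for q in range(self.Q):
            rho = np.exp(beta * (U[q] - U[q].max()))
            code[q] = self.rng.choice(self.K, p=rho / rho.sum())
        return code

    def store(self, x):
        """Single-trial storage: assign a code to x and set the weights from x's
        active input units to the Q winners to 1. The number of steps is fixed,
        independent of how many items are already stored."""
        U = self.U(x)
        G = float(U.max(axis=1).mean())
        code = self.choose_code(U, G)
        for q, k in enumerate(code):
            self.W[q, k, np.asarray(x).astype(bool)] = 1.0
        return code, G
```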
Fig. 1b shows re-presentation of A (i.e., exact-match retrieval). Here, the units comprising φ(A) have
U=1, all other units have U=0, thus G=1, indicating maximal familiarity, so very little noise is added in
the U-to-ρ transform, plausibly resulting in the max-ρ unit winning in all Q=5 CMs, i.e., reactivation of
φ(A). In Fig. 1c, a novel input, B, very similar to A [4 of 5 units (black) in common] is presented. Here,
the units comprising φ(A) have U=0.8, yielding G=0.8 (lower familiarity). So, more noise is added into the
U-to-ρ transform, yielding flatter ρ distributions, plausibly resulting in the max-ρ cell winning in four of
the five CMs, losing in one (red unit). Figs. 1d,e show decreasingly similar (to A) inputs, C and D, yielding
progressively flatter ρ distributions and thus, codes, φ(C) and φ(D), with decreasing (expected) intersections
with φ(A). The local (as part of a mesoscopic canonical cortical circuit) G computation could drive some
combination of fast (~100 ms) neuromodulatory signals, e.g., ACh [9], NE [10], to control noise: e.g., high
novelty acts to increase intrinsic excitability of principal cells comprising the CF, thus increasing noise
relative to the deterministic effects of their synaptic inputs (signal), which reflect prior learning. Results
demonstrating this model, for spatial and spatiotemporal input domains, will be reported.
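For concreteness, a short usage run of the MSDCMemory sketch above (assuming that class is in scope) reproduces the qualitative pattern of Fig. 1: after A is stored, re-presenting A yields G=1 and, with high probability, reactivates φ(A), while hypothetical inputs B, C, and D sharing 4, 3, and 2 of A's 5 active units yield G=0.8, 0.6, and 0.4, and the codes they would be assigned intersect φ(A) in progressively fewer of the Q CMs on average. Exact intersections vary from run to run because winner selection is stochastic, and the fall-off depends on the assumed U-to-ρ transform.

```python
import numpy as np

mem = MSDCMemory(n_inputs=8, Q=5, K=3, active_per_input=5, seed=1)

A = np.array([1, 1, 1, 1, 1, 0, 0, 0])
code_A, G_A = mem.store(A)                    # G_A = 0: A is completely novel, so phi(A) is random
print("G when storing A:", G_A, "| G on re-presenting A:", mem.familiarity(A))

# Inputs sharing 4, 3, and 2 of A's 5 active input units, respectively.
B = np.array([1, 1, 1, 1, 0, 1, 0, 0])
C = np.array([1, 1, 1, 0, 0, 1, 1, 0])
D = np.array([1, 1, 0, 0, 0, 1, 1, 1])

for name, x in [("B", B), ("C", C), ("D", D)]:
    G = mem.familiarity(x)
    code_x = mem.choose_code(mem.U(x), G)     # code that would be assigned (no weight update here)
    overlap = int((code_x == code_A).sum())   # number of CMs in which the same unit wins as in phi(A)
    print(f"{name}: G = {G:.2f}, overlap with phi(A) = {overlap} of {mem.Q} CMs")
```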
References
[1] D.O. Hebb, The organization of behavior; a neuropsychological theory, Wiley, NY, 1949.
[2] R. Yuste, From the neuron doctrine to neural networks, Nat Rev Neurosci, 16 (2015) 487-497.
[3] S.A. Josselyn, P.W. Frankland, Memory Allocation: Mechanisms and Function, Ann. Rev. Neuro, 41 (2018) 389.
[4] D.J. Willshaw, O.P. Buneman, H.C. Longuet-Higgins, Non-Holographic Associative Memory, Nature, 222 (1969).
[5] P. Kanerva, Sparse distributed memory, MIT Press, Cambridge, MA, 1988.
[6] G. Palm, Neural assemblies: An alternative approach to artificial intelligence, Springer, Berlin, 1982.
[7] C. Curto, V. Itskov, et al., Combin. Neural Codes from a Math. Coding Theory Perspect., Neural Comp, 25 (2013) 1891.
[8] P.E. Latham, Correlations demystified, Nat Neurosci, 20 (2017) 6-8.
[9] D.A. McCormick, et al., Mechanisms of acetylcholine in guinea-pig cereb. cortex, J Physiol, 375 (1986) 169.
[10] S.J. Sara, A. Vankov, A. Hervé, Locus coeruleus-evoked responses in behaving rats: A clue to the role of
noradrenaline in memory, Brain Research Bulletin, 35 (1994) 457-465.