World Model Formation as a Side-effect of Non-optimization-
based Unsupervised Episodic Memory
Gerard (Rod) Rinkus1
1Neurithmic Systems, Newton, MA, USA
Summary. Evidence suggests information is represented in the brain, e.g., in neocortex and
hippocampus, in the form of sparse distributed codes (SDCs), i.e., small sets of principal cells, a
kind of “cell assembly” concept. Two key questions are: a) how are such codes formed (learned) on
the basis of single trials, as is needed to account for episodic memory; and b) how do more similar
inputs get mapped to more similar SDCs, as needed to account for similarity-based responding, i.e.,
responding based on a world model? I describe a Modular Sparse Distributed Code (MSDC) and
associated single-trial, on-line, non-optimization-based, unsupervised learning algorithm that
answers both questions. An MSDC coding field (CF) consists of Q winner-take-all (WTA) competitive modules
(CMs), each comprised of K binary units (principal cell analogs). The key principle of the learning
algorithm is to add noise inversely proportional to input familiarity (directly proportional to novelty)
to the deterministic synaptic input sums of the CF units, yielding a distribution in each CM from
which a winner is chosen. The more familiar an input, X, the less noise added, allowing prior
learning to dominate winner selection, resulting in greater expected intersection of the code assigned
to X, not only with the code of the single most similar stored input, but, because all codes are stored
in superposition, with the codes of all stored inputs. I believe this is the first proposal for a normative
use of noise during learning, specifically, for the purpose of approximately preserving similarity
from inputs to neural codes. Thus, the model constructs a world model as a side-effect of its primary
action, storing episodic memories. Crucially, it runs in fixed time, i.e., the number of steps needed
to store an item (an episodic memory) remains constant as the number of stored items grows.
The “Neuron Doctrine”, the idea that the individual (principal) neuron is the atomic functional unit of
meaning, has long dominated neuroscience. But improving experimental methods yield increasing evidence
that the “cell assembly” [1], a set of co-active neurons, is the atomic functional unit of representation and
thus of cognition [2, 3]. This raises two key questions. 1) How can a cell assembly be assigned to represent
an input based on a single trial, as needed to explain episodic memory? 2) How can similarity relations, not
just pairwise but, ideally, of all orders present in the input space, be preserved in cell assembly space, i.e.,
how might a world model be built, as needed to explain similarity-based responding / generalization?
I describe a novel cell assembly concept, Modular Sparse Distributed Coding (MSDC), providing a
neurally plausible, computationally efficient answer to both questions. MSDC admits a single-trial, on-line,
unsupervised learning algorithm that approximately preserves similarity—maps more similar inputs to
more highly intersecting MSDCs—and crucially, runs in fixed time, i.e., the number of steps needed to store
(learn) a new item remains constant as the number of stored items increases. Further, since the MSDCs of
all items are stored in superposition and such that their intersection structure reflects the input space's
similarity structure, best-match (nearest-neighbor) retrieval also runs in fixed time.
The MSDC coding field (CF) is modular: it consists of Q winner-take-all (WTA) Competitive Modules
(CMs), each comprised of K binary units. Thus, all codes stored in the CF are of size Q, one winner per
CM. This modular structure: i) distinguishes it from many “flat” SDC models, e.g., [4-6]; and ii) admits an
extremely efficient way to compute an input’s familiarity, G (a generalized similarity measure, defined
shortly), without requiring explicit comparison to each individual stored input. The model’s most important
contribution is a novel, normative use of noise (randomness) in the learning process to statistically preserve
similarity. Specifically, an amount of noise inversely proportional to G (directly proportional to novelty) is
injected into the process of choosing winners in the Q CMs. This causes the expected intersection of the
code, φ(X), assigned to a new input, X, with the code, φ(Y), of a previously stored input, Y, to be an
increasing function of the similarity of X and Y. Higher input familiarity maximizes expected intersection
with previously stored codes, embedding the similarity structure over the inputs. Higher novelty minimizes
expected intersection with previously stored codes, maximizing storage capacity. The tradeoff between
embedding statistical structure and capacity maximization is an area of active research [7, 8].
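For concreteness, the modular coding field and the familiarity computation can be sketched in a few lines of Python. This is a minimal sketch assuming the toy dimensions of Fig. 1; the names and array layout (W, N_IN, familiarity) are illustrative choices, not a reference implementation.

import numpy as np

# Toy dimensions matching Fig. 1: Q CMs of K binary units each, fully connected
# to N_IN binary input units; all weights are binary and initially zero.
Q, K, N_IN = 5, 3, 8
W = np.zeros((Q, K, N_IN))

def familiarity(x, n_active):
    """Return the normalized input sums U (shape Q x K) and the familiarity G.

    x        : length-N_IN binary input vector
    n_active : number of active input units, assumed fixed across inputs,
               which is what makes the normalization valid
    """
    U = (W @ x) / n_active        # deterministic, normalized synaptic input sums
    G = U.max(axis=1).mean()      # average of the max U value over the Q CMs
    return U, G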
Fig. 1 shows the MSDC coding format and succinctly explains how adding noise proportional to
novelty into the code selection process approximately preserves similarity. Fig. 1a (bottom) shows a small
MSDC model instance: an input level composed of 8 binary units is fully connected to an MSDC CF with
Q=5 CMs, each with K=3 binary units. All wts are binary and initially zero (gray). The first input, A [5
active (black) units], is presented, giving rise to bottom-up binary signals to the CF. The U charts show
the normalized input sums for the 15 CF units: all are 0 since there has not yet been any learning.
Normalization is possible because we assume all inputs have exactly five active units, so raw input sums
can be divided by 5. The algorithm then computes G, defined as the average max U value across the Q CMs;
here, G=0, indicating A is completely novel. The key step is to then transform the U distribution in each
CM into a noisy distribution (ρ values) from which a winner is chosen. In this case (G=0), substantial noise
is added (albeit to an already flat U distribution), resulting in a uniform ρ distribution and thus, a completely
random code, φ(A) (top of Fig. 1a, black units). The 25 wts from the active input units to the units comprising φ(A) are then permanently increased to 1 (increased wts are black lines in subsequent panels).
Figure 1. Approximate similarity preservation via injection of novelty-contingent noise
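Continuing the sketch above (reusing W, Q, K, and familiarity), the storage step walked through in Fig. 1a might look roughly as follows. The exact U-to-ρ transform is not specified in this summary; a numerically stabilized softmax whose temperature grows with novelty (1 - G) is used here purely as one plausible stand-in for adding noise in proportion to novelty.

def store(x, n_active, rng):
    """Assign a code to input x (one winner per CM) and update the weights."""
    U, G = familiarity(x, n_active)
    temperature = 1e-3 + (1.0 - G)                       # more novelty -> flatter rho
    rho = np.exp((U - U.max(axis=1, keepdims=True)) / temperature)
    rho /= rho.sum(axis=1, keepdims=True)                # per-CM win probabilities
    code = np.array([rng.choice(K, p=rho[q]) for q in range(Q)])
    # Permanently set to 1 the wts from the active input units to the Q winners.
    W[np.arange(Q), code] = np.maximum(W[np.arange(Q), code], x)
    return code, G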
Fig. 1b shows re-presentation of A (i.e., exact-match retrieval). Here, the units comprising φ(A) have
U=1, all other units have U=0, thus G=1, indicating maximal familiarity, so very little noise is added in
the U-to-ρ transform, plausibly resulting in the max-ρ unit winning in all Q=5 CMs, i.e., reactivation of
φ(A). In Fig. 1c, a novel input, B, very similar to A [4 of 5 units (black) in common], is presented. Here,
the units comprising φ(A) have U=0.8, yielding G=0.8 (lower familiarity). So, more noise is added into the
U-to-ρ transform, yielding flatter ρ distributions, plausibly resulting in the max-ρ cell winning in four of
the five CMs, losing in one (red unit). Figs. 1d,e show decreasingly similar (to A) inputs, C and D, yielding
progressively flatter ρ distributions and thus, codes, φ(C) and φ(D), with decreasing (expected) intersections
with φ(A). The local (as part of a mesoscopic canonical cortical circuit) G computation could drive some
combination of fast (~100 ms) neuromodulatory signals, e.g., ACh [9], NE [10], to control noise: e.g., high
novelty acts to increase intrinsic excitability of principal cells comprising the CF, thus increasing noise
relative to the deterministic effects of their synaptic inputs (signal), which reflect prior learning. Results
demonstrating this model, for spatial and spatiotemporal input domains, will be reported.
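A short usage sketch (again hypothetical, building on the snippets above) reproduces the qualitative progression of Fig. 1: the code assigned to an input very similar to A should share most of its Q winners with φ(A), while a dissimilar input should share few.

rng = np.random.default_rng(0)

A = np.array([1, 1, 1, 1, 1, 0, 0, 0])   # first input, 5 active units
B = np.array([1, 1, 1, 1, 0, 1, 0, 0])   # shares 4 of 5 active units with A
D = np.array([0, 0, 0, 1, 1, 1, 1, 1])   # shares only 2 of 5 active units with A

code_A, G_A = store(A, 5, rng)   # G = 0: fully novel, code chosen at random
code_B, G_B = store(B, 5, rng)   # G = 0.8: most CMs should reuse phi(A)'s winners
code_D, G_D = store(D, 5, rng)   # lower G: few of phi(A)'s winners expected to recur
print("overlap(A,B) =", int((code_A == code_B).sum()),
      " overlap(A,D) =", int((code_A == code_D).sum()))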
References
[1] D.O. Hebb, The organization of behavior; a neuropsychological theory, Wiley, NY, 1949.
[2] R. Yuste, From the neuron doctrine to neural networks, Nat Rev Neurosci, 16 (2015) 487-497.
[3] S.A. Josselyn, P.W. Frankland, Memory allocation: mechanisms and function, Annu. Rev. Neurosci., 41 (2018) 389.
[4] D.J. Willshaw, O.P. Buneman, H.C. Longuet-Higgins, Non Holographic Associative Memory, Nature, 222 (1969).
[5] P. Kanerva, Sparse distributed memory, MIT Press, Cambridge, MA, 1988.
[6] G. Palm, Neural assemblies: An alternative approach to artificial intelligence, Springer, Berlin, 1982.
[7] C. Curto, V. Itskov, et al., Combinatorial neural codes from a mathematical coding theory perspective, Neural Comput, 25 (2013) 1891.
[8] P.E. Latham, Correlations demystified, Nat Neurosci, 20 (2017) 6-8.
[9] D.A. McCormick, et al., Mechanisms of acetylcholine in guinea-pig cerebral cortex, J Physiol, 375 (1986) 169.
[10] S.J. Sara, A. Vankov, A. Hervé, Locus coeruleus-evoked responses in behaving rats: A clue to the role of
noradrenaline in memory, Brain Research Bulletin, 35 (1994) 457-465.