Project

Canonical Cortical Circuit based on Sparse Distributed Representations

Goal: Understand the core circuit and algorithm of the neocortex, which is likely essentially similar to those of the hippocampus and olfactory bulb as well.

Updates: 0
Recommendations: 0
Followers: 6
Reads: 107

Project log

Rod Rinkus
added a research item
There is increasing realization in neuroscience that information is represented in the brain, e.g., neocortex, hippocampus, in the form of sparse distributed codes (SDCs), a kind of cell assembly. Two essential questions are: a) how are such codes formed on the basis of single trials; and b) how is similarity preserved during learning, i.e., how do more similar inputs get mapped to more similar SDCs?
Rod Rinkus
added a research item
There is increasing realization in neuroscience that information is represented in the brain, e.g., neocortex, hippocampus, in the form of sparse distributed codes (SDCs), a kind of cell assembly. Two essential questions are: a) how are such codes formed on the basis of single trials; and b) how is similarity preserved during learning, i.e., how do more similar inputs get mapped to more similar SDCs? I describe a novel Modular Sparse Distributed Code (MSDC) that provides simple, neurally plausible answers to both questions. An MSDC coding field (CF) consists of Q WTA competitive modules (CMs), each comprised of K binary units (analogs of principal cells). The modular nature of the CF makes possible a single-trial, unsupervised learning algorithm that approximately preserves similarity and, crucially, runs in fixed time, i.e., the number of steps needed to store an item remains constant as the number of stored items grows. Further, once items are stored as MSDCs in superposition, such that their intersection structure reflects input similarity, both fixed-time best-match retrieval and fixed-time belief update (updating the probabilities of all stored items) also become possible. The algorithm's core principle is simply to add noise into the process of choosing a code, i.e., choosing a winner in each CM, in an amount proportional to the novelty of the input. This causes the expected intersection of the code for an input, X, with the code of each previously stored input, Y, to be proportional to the similarity of X and Y. Results demonstrating these capabilities for spatial patterns are given in the appendix.
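A minimal sketch of the code-selection principle described above, assuming a numpy-style implementation; the field sizes Q and K, the novelty value, and all names here are illustrative rather than taken from the paper:

```python
import numpy as np

def choose_msdc_code(support, novelty, rng):
    """Choose an MSDC: one winner in each of Q WTA competitive modules (CMs),
    each containing K binary units.

    support: (Q, K) array of deterministic, experience-shaped evidence for
        each unit (e.g., summed afferent weights).
    novelty: scalar in [0, 1]; 0 = completely familiar input, 1 = completely novel.
    """
    Q, K = support.shape
    code = np.zeros(Q, dtype=int)
    for q in range(Q):
        # Deterministic preference: the unit with the strongest support.
        deterministic = np.zeros(K)
        deterministic[np.argmax(support[q])] = 1.0
        # Uniform noise: every unit in the CM equally likely.
        uniform = np.full(K, 1.0 / K)
        # Blend: the more novel the input, the more random the choice.
        probs = (1.0 - novelty) * deterministic + novelty * uniform
        code[q] = rng.choice(K, p=probs)
    return code  # index of the winning unit in each CM

rng = np.random.default_rng(0)
support = rng.random((20, 8))                       # Q=20 CMs of K=8 units
reinstated = choose_msdc_code(support, 0.0, rng)    # familiar: deterministic reinstatement
random_new = choose_msdc_code(support, 1.0, rng)    # novel: fully random new code
```

With novelty = 0 the most-supported unit wins in every CM, reinstating the stored code exactly; with novelty = 1 every CM's winner is chosen uniformly at random, so the new code's expected intersection with any stored code drops to chance; intermediate novelty yields intermediate expected intersections, which is what approximately preserves similarity.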
Rod Rinkus
added a research item
A simple, atemporal, first-spike code, operating on Combinatorial Population Codes (CPCs) (a.k.a., binary sparse distributed representations) is described, which allows the similarities (more generally, likelihoods) of all items (hypotheses) stored in a CPC field to be simultaneously transmitted with a wave of single spikes from any single active code (i.e., the code of any one particular stored item). Moreover, the number of underlying binary signals sent remains constant as the number of stored items grows.
Rod Rinkus
added a research item
(Web page: https://medium.com/@rod_83597/a-hebbian-cell-assembly-is-formed-at-full-strength-on-a-single-trial-d9def1d2fa89) Hebb defined a cell assembly (CA) as a group of reciprocally interconnected cells that represents a concept. It is surely true that a set of cortical cells that becomes a CA will have some, possibly large, degree of interconnectivity. However, a CA can be formed at full strength on a single trial even if there are no interconnections amongst those cells. That is, if a set of cells that receives a matrix of connections from some presynaptic field becomes active, then they will all experience correlated weight increases from the active cells in the presynaptic field. They become bound as a CA not because of weight increases amongst themselves, i.e., via a recurrent matrix, but simply by virtue of their correlated afferent weight increases. This possible functionality of CAs, i.e., full-strength formation on one trial, has not been emphasized in the literature, despite its having large consequences in both neuroscience and machine learning.
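A minimal numerical sketch of the point being made, with made-up sizes and a simple threshold rule (all names and parameters here are illustrative): a set of co-active cells becomes a full-strength assembly purely through correlated increases of their afferent weights, with no recurrent weights among them.

```python
import numpy as np

n_pre, n_post = 50, 40
W = np.zeros((n_post, n_pre))            # afferent weights from the presynaptic field

pre_active = np.zeros(n_pre, dtype=bool)
pre_active[:10] = True                   # the active presynaptic input pattern

assembly = np.zeros(n_post, dtype=bool)
assembly[[3, 7, 12, 21, 30]] = True      # the co-active postsynaptic cells

# One-trial Hebbian increase: every (active pre, active post) synapse is
# strengthened to its maximum in a single step.
W[np.ix_(assembly, pre_active)] = 1.0

# On re-presentation of the same input, every assembly cell receives the same
# maximal afferent drive, so the whole set reactivates together, even though
# no weights *among* the assembly cells were ever changed.
drive = W @ pre_active.astype(float)
reactivated = drive >= drive.max()
print(np.array_equal(reactivated, assembly))   # True
```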
Rod Rinkus
added a research item
For a single source neuron, spike coding schemes can be based on rate or on precise spike time(s) relative to an event, e.g., to a particular phase of gamma. Both are fundamentally temporal, requiring a decode window duration T much longer than a single spike. But, if information is represented by population activity (distributed codes, cell assemblies) then messages are carried by populations of spikes propagating in bundles of axons. This allows an atemporal coding scheme where the signal is encoded in the instantaneous sum of simultaneously arriving spikes, in principle, allowing T to shrink to the duration of a single spike. In one type of atemporal population coding scheme, the fraction of active neurons in a source population (thus, the fraction of active afferent synapses) carries the message. However, any single message carried by this variable-size code can represent only one value (signal). In contrast, if the source field uses fixed-size, combinatorial coding, a particular instantiation of Hebb's cell assembly concept, then any one active code can simultaneously represent multiple values, in fact, the likelihoods of all values, stored in the source field. Consequently, the vector of single, e.g., first, spikes sent by such a code can simultaneously transmit that full distribution. Combining fixed-size combinatorial coding and an atemporal first-spike coding scheme may be keys to explaining the speed and energy efficiency of probabilistic computation in the brain.
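A toy sketch of the claim, assuming binary combinatorial codes of Q active units each; codes here are assigned at random for brevity, whereas the model assigns them so that overlap tracks similarity, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, Q = 200, 20                       # coding field size; Q active units per code

def random_code():
    return set(rng.choice(n_units, size=Q, replace=False).tolist())

stored = {name: random_code() for name in ["A", "B", "C", "D"]}

# Suppose item "A" is the currently active code.  Its Q units each emit a
# single (first) spike; that one wave of Q simultaneous spikes is the message.
active = stored["A"]

# A downstream reader recovers a graded value for *every* stored item from the
# same Q spikes: the fraction of each item's code that just spiked.
likelihoods = {name: len(code & active) / Q for name, code in stored.items()}
print(likelihoods)                         # "A" -> 1.0; others -> their overlap with "A"

# Note: the number of spikes sent (Q) stays fixed no matter how many items
# are stored in the field.
```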
Rod Rinkus
added a research item
Four hallmarks of human intelligence are: 1) on-line, single/few-trial learning; 2) important/salient memories and knowledge are permanent over lifelong durations, though confabulation (semantically plausible retrieval errors) accrues with age; 3) the times to learn a new item and to retrieve the best-matching (most relevant) item(s) remain constant as the number of stored items grows; and 4) new items can be learned throughout life (storage capacity is never reached). No machine learning model has all these capabilities; the vast majority are optimization-centric, i.e., learning involves optimizing a global objective (loss, energy). Here, I describe a memory-centric model, Sparsey, which, in principle, has them all. I note prior results showing possession of Hallmarks 1 and 3 and sketch an argument, relying on hierarchy, critical periods, metaplasticity, and the recursive, compositional (part-whole) structure of natural objects/events, that it also possesses Hallmarks 2 and 4. Two of Sparsey's essential properties are: i) information is represented in the form of fixed-size sparse distributed representations (SDRs); and ii) its fixed-time learning algorithm maps more similar inputs to more highly intersecting SDRs. Thus, the similarity (statistical) structure over the inputs (not just pairwise but, in principle, of all orders present), in essence a generative model, emerges in the pattern of intersections of the SDRs of individual inputs. Consequently, semantic and episodic memory are fully superposed, and semantic memory emerges as a by-product of storing episodic memories, contrasting sharply with deep learning (DL) approaches in which semantic and episodic memory are physically separate.
Rod Rinkus
added a research item
Among the more important hallmarks of human intelligence, which any artificial general intelligence (AGI) should have, are the following. 1. It must be capable of on-line learning, including with single/few trials. 2. Memories/knowledge must be permanent over lifelong durations, safe from catastrophic forgetting. Some confabulation, i.e., semantically plausible retrieval errors, may gradually accumulate over time. 3. The time to both: a) learn a new item; and b) retrieve the best-matching / most relevant item(s), i.e., do similarity-based retrieval, must remain constant throughout the lifetime. 4. The system should never become full: it must remain able to store new information, i.e., make new permanent memories, throughout very long lifetimes. No artificial computational system has been shown to have all these properties. Here, we describe a neuromorphic associative memory model, Sparsey, which does, in principle, possess them all. We cite prior results supporting possession of hallmarks 1 and 3 and sketch an argument, hinging on the strongly recursive, hierarchical, part-whole compositional structure of natural data, that Sparsey also possesses hallmarks 2 and 4.
Rod Rinkus
added a research item
Abstract: For a single source neuron, spike coding schemes can be based on rate or on precise spike time(s) relative to an event, e.g., to a particular phase of gamma. Both are fundamentally temporal, requiring a decode window duration T much longer than a single spike. But, if information is represented by population activity (distributed codes, cell assemblies) then messages are carried by populations of spikes propagating in bundles of axons. This allows an atemporal coding scheme where the signal is encoded in the instantaneous sum of simultaneously arriving spikes, in principle, allowing T to shrink to the duration of a single spike. In one type of atemporal population coding scheme, the fraction of active neurons in a source population (thus, the fraction of active afferent synapses) carries the message. However, any single message carried by this variable-size code can represent only one value (signal). In contrast, if the source field uses fixed-size, combinatorial coding, then any one active code can represent multiple values, in fact, the entire likelihood distribution, e.g., over all values, e.g., of a scalar variable, stored in the field. Consequently, the vector of single, e.g., first, spikes sent by such a code can simultaneously transmit the full distribution. Combining fixed-size combinatorial coding and first-spike coding may be key to explaining the speed and energy efficiency of probabilistic computation in the brain.
Rod Rinkus
added a research item
Among the more important hallmarks of human intelligence, which any artificial general intelligence (AGI) should have, are the following. 1. It must be capable of on-line learning, including with single/few trials. 2. Memories/knowledge must be permanent over lifelong durations, safe from catastrophic forgetting. Some confabulation, i.e., semantically plausible retrieval errors, may gradually accumulate over time. 3. The time to both: a) learn a new item; and b) retrieve the best-matching / most relevant item(s), i.e., do similarity-based retrieval, must remain constant throughout the lifetime. 4. The system should never become full: it must remain able to store new information, i.e., make new permanent memories, throughout very long lifetimes. No artificial computational system has been shown to have all these properties. Here, we describe a neuromorphic associative memory model, Sparsey, which does, in principle, possess them all. We cite prior results supporting possession of hallmarks 1 and 3. Our current primary research focus is to show possession of hallmarks 2 and 4; to that end, we sketch an argument, hinging on the strongly recursive, hierarchical, part-whole compositional structure of natural data, that Sparsey also possesses them.
Rod Rinkus
added a research item
Machine learning (ML) representation formats have been dominated by: a) localism, wherein individual items are represented by single units, e.g., Bayes Nets, HMMs; and b) fully distributed representations (FDR), wherein items are represented by unique activation patterns over all the units, e.g., Deep Learning (DL) and its progenitors. DL has had great success vis-a-vis classification accuracy and learning complex mappings (e.g., AlphaGo). But without massive machine parallelism (MP), e.g., GPUs, TPUs, and thus high power, DL learning is intractably slow. The brain is also massively parallel, but uses only 20 watts; moreover, the forms of MP used in DL, model/data parallelism and shared parameters, are patently non-biological, suggesting that DL's core principles do not emulate biological intelligence. We claim that a basic disconnect between DL/ML and biology, and the key to biological intelligence, is that instead of FDR or localism, the brain uses sparse distributed representations (SDR), i.e., "cell assemblies", wherein items are represented by small sets of binary units, which may overlap, and where the pattern of overlaps embeds the similarity/statistical structure (generative model) of the domain. We have previously described an SDR-based, extremely efficient, one-shot learning algorithm in which the primary operation is permanent storage of experienced events based on single trials (episodic memory), but in which the generative model (semantic memory, classification) emerges automatically, as a computationally free (in terms of time and power) side effect of the episodic storage process. Here, we discuss fundamental differences between the mainstream localist/FDR-based approaches and our SDR-based approach.
Rod Rinkus
added a project reference
Rod Rinkus
added a project reference
Rod Rinkus
added a research item
A hyperessay ( http://www.sparsey.com/Sparsey_Hyperessay.html ) describing many elements of Neurithmic Systems' overall AGI theory, Sparsey®. Since the brain is the only known thing that possesses true intelligence, and since the brain's cortex is the locus of memory, reasoning, etc., our goal reduces to discovering the fundamental nature of cortical computation. Perhaps the most salient property of the ~2 mm thick sheet of neurons/neuropil that comprises the cortex is its extremely regular structure over most of its extent, i.e., 6-layer isocortex (even the older non-isocortical areas, e.g., paleo-/archicortex, have very regular structure that is phylogenetically and systematically related to isocortex). The gross anatomical/circuit motif repeats on the minicolumnar (~20-30 um) scale throughout isocortex, and there is appreciable evidence for functional/structural repetition at the macrocolumnar ("mac") scale (~300-500 um) as well, e.g., the hypercolumns of V1 and the barrel-related columns of rodent S1. We believe that the mac is the fundamental computational module of cortex, and Neurithmic's Sparsey model is proposed to capture the essential computational nature of the cortical mac.
Rod Rinkus
added 6 research items
The problem of representing large sets of complex state sequences (CSSs)---i.e., sequences in which states can recur multiple times---has thus far resisted solution. This paper describes a novel neural network model, TEMECOR, which has very large capacity for storing CSSs. Furthermore, in contrast to the various back-propagation-based attempts at solving the CSS problem, TEMECOR requires only a single presentation of each sequence. TEMECOR's power derives from a) its use of a combinatorial, distributed representation scheme, and b) its method of choosing internal representations of states at random. Simulation results are presented which show that the number of spatio-temporal binary feature patterns which can be stored to some criterion accuracy (e.g., 97%) increases faster-than-linearly in the size of the network. This is true for both uncorrelated and correlated pattern sets, although the rate is slightly slower for correlated patterns.
The remarkable structural homogeneity of isocortex strongly suggests a canonical cortical algorithm that performs the same essential function in all regions. That function is widely construed/modeled as probabilistic inference, i.e., the ability, given an input, to retrieve the best-matching item (or most likely hypothesis) stored in memory. Here we describe a model for which storage (learning) of new items into memory and probabilistic inference are constant-time operations, a level of performance not present in any other published information processing system. This efficiency depends critically on: a) representing inputs with sparse distributed representations (SDRs), i.e., relatively small sets of binary units chosen from a large pool; and b) choosing (learning) new SDRs so that more similar inputs are mapped to more highly intersecting SDRs. The macrocolumn (specifically, its pool of L2/3 pyramidals) was proposed as the large pool, with its minicolumns acting in winner-take-all fashion, ensuring that macrocolumnar codes consist of one winner per minicolumn. Here, I present results of models performing: a) single-trial learning of sets of sequences derived from natural video; and b) immediate (i.e., no search) retrieval of best-matching stored sequences. The change from localist representation of stored items, i.e., hypotheses, to SDRs has a potentially large impact on explaining the storage capacity of cortex, but more importantly on explaining the speed and other characteristics of probabilistic/approximate reasoning possessed by biological brains.
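A minimal sketch of what fixed-time storage and retrieval could look like under the macrocolumn/minicolumn reading above, restricted to single spatial patterns (no sequences), with binary weights and, for brevity, randomly assigned codes; all sizes and names are illustrative assumptions, not the published model:

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, Q, K = 100, 25, 10                  # input size; Q minicolumns of K L2/3 cells
W = np.zeros((Q, K, n_in))                # afferent weights; all items stored in superposition

def store(x, code):
    """Single-trial storage: strengthen synapses from active inputs onto the
    single winning cell of each minicolumn.  No iteration over stored items."""
    for q in range(Q):
        W[q, code[q]] = np.maximum(W[q, code[q]], x)

def retrieve(x):
    """Fixed-time retrieval: one pass over the Q x K cells, one winner per
    minicolumn.  The work done is independent of how many items are stored."""
    support = W @ x                        # (Q, K) summed afferent input per cell
    return support.argmax(axis=1)

items = [(rng.random(n_in) < 0.15).astype(float) for _ in range(30)]
codes = [rng.integers(K, size=Q) for _ in items]
for x, c in zip(items, codes):
    store(x, c)

recovered = retrieve(items[0])
print((recovered == codes[0]).mean())      # fraction of minicolumn winners recovered
```

Because every stored item lives in the same weight matrix, neither store nor retrieve ever enumerates the stored items; their cost is set by Q, K, and the input size only.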
A model is described in which three types of memory (episodic memory, complex sequence memory, and semantic memory) coexist within a single distributed associative memory. Episodic memory stores traces of specific events. Its basic properties are: high capacity, single-trial learning, memory trace permanence, and the ability to store non-orthogonal patterns. Complex sequence memory is the storage of sequences in which states can recur multiple times, e.g., [A B B A C B A]. Semantic memory is general knowledge of the degree of featural overlap between the various objects and events in the world. The model's initial version, TEMECOR-I, exhibits episodic and complex sequence memory properties for both uncorrelated and correlated spatiotemporal patterns. Simulations show that its capacity increases approximately quadratically with the size of the model. An enhanced version of the model, TEMECOR-II, adds semantic memory properties. The TEMECOR-I model is a two-layer network that uses a sparse, distributed internal representation (IR) scheme in its layer two (L2). Noise and competition allow the IR of each input state to be chosen in a random fashion. This randomness effects an orthogonalization in the input-to-IR mapping, thereby increasing capacity. Successively activated IRs are linked via Hebbian learning in a matrix of horizontal synapses. Each L2 cell participates in numerous episodic traces. A variable threshold prevents interference between traces during recall. The random choice of IRs in TEMECOR-I precludes the continuity property of semantic memory: that there be a relationship between the similarity (degree of overlap) of two IRs and the similarity of the corresponding inputs. To create continuity in TEMECOR-II, the choice of the IR is a function of both noise (Lambda) and signals propagating in the L2 horizontal matrix and the input-to-IR map. These signals are deterministic and shaped by prior experience. On each time slice, TEMECOR-II computes an expected input based on the history-dependent influences, and then computes the difference between the expected and actual inputs. When the current situation is completely familiar, Lambda = 0 and the choice of IRs is determined by the history-dependent influences. The resulting IR has large overlap with previously used IRs. As perceived novelty increases, so does Lambda, with the result that the overlap between the chosen IR and any previously used IRs decreases.
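A sketch of the expected-versus-actual comparison that sets Lambda, using Jaccard mismatch as a stand-in for whatever difference measure the model actually uses; the measure and names here are assumptions for illustration:

```python
import numpy as np

def novelty_lambda(expected_input, actual_input):
    """Map expected-vs-actual mismatch to the noise parameter Lambda:
    0 when the input is completely familiar (expected == actual),
    approaching 1 as perceived novelty grows."""
    expected = np.asarray(expected_input, dtype=bool)
    actual = np.asarray(actual_input, dtype=bool)
    union = np.logical_or(expected, actual).sum()
    if union == 0:
        return 0.0
    match = np.logical_and(expected, actual).sum() / union   # Jaccard similarity
    return 1.0 - match

# Completely familiar: Lambda = 0, so IR choice is fully determined by the
# history-dependent signals and overlaps strongly with previously used IRs.
print(novelty_lambda([1, 0, 1, 1, 0], [1, 0, 1, 1, 0]))      # 0.0

# Partially novel: Lambda > 0, so noise enters the choice and the chosen IR
# overlaps less with previously used IRs.
print(novelty_lambda([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))      # 0.5
```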
Rod Rinkus
added a project goal
Understand the core circuit and algorithm of the neocortex, which is likely essentially similar to those of the hippocampus and olfactory bulb as well.
 
Rod Rinkus
added 2 research items
No generic function for the minicolumn - i.e., one that would apply equally well to all cortical areas and species - has yet been proposed. I propose that the minicolumn does have a generic functionality, which only becomes clear when seen in the context of the function of the higher-level, subsuming unit, the macrocolumn. I propose that: (a) a macrocolumn's function is to store sparse distributed representations of its inputs and to be a recognizer of those inputs; and (b) the generic function of the minicolumn is to enforce macrocolumnar code sparseness. The minicolumn, defined here as a physically localized pool of approximately 20 L2/3 pyramidals, does this by acting as a winner-take-all (WTA) competitive module, implying that macrocolumnar codes consist of approximately 70 active L2/3 cells, assuming approximately 70 minicolumns per macrocolumn. I describe an algorithm for activating these codes during both learning and retrievals, which causes more similar inputs to map to more highly intersecting codes, a property which yields ultra-fast (immediate, first-shot) storage and retrieval. The algorithm achieves this by adding an amount of randomness (noise) into the code selection process, which is inversely proportional to an input's familiarity. I propose a possible mapping of the algorithm onto cortical circuitry, and adduce evidence for a neuromodulatory implementation of this familiarity-contingent noise mechanism. The model is distinguished from other recent columnar cortical circuit models in proposing a generic minicolumnar function in which a group of cells within the minicolumn, the L2/3 pyramidals, compete (WTA) to be part of the sparse distributed macrocolumnar code.
Quantum superposition says that any physical system simultaneously exists in all of its possible states, the number of which is exponential in the number of entities composing the system. The strength of presence of each possible state in the superposition, i.e., its probability of being observed, is represented by its probability amplitude coefficient. The assumption that these coefficients must be represented physically disjointly from each other, i.e., localistically, is nearly universal in the quantum theory/computing literature. Alternatively, these coefficients can be represented using sparse distributed representations (SDR), wherein each coefficient is represented by a small subset of an overall population of units, and the subsets can overlap. Specifically, I consider an SDR model in which the overall population consists of Q WTA clusters, each with K binary units, and each coefficient is represented by a set of Q units, one per cluster. Thus, K^Q coefficients can be represented with KQ units. The particular world state, X, whose coefficient's representation, R(X), is the set of Q units active at time t has the maximum probability, and the probability of every other state, Y_i, at time t, is measured by R(Y_i)'s intersection with R(X). Thus, R(X) simultaneously represents both the particular state, X, and the probability distribution over all states, and set intersection may be used to classically implement quantum superposition. If algorithms exist for which the time it takes to store (learn) new representations and to find the closest-matching stored representation (probabilistic inference) remains constant as additional representations are stored, this meets the criterion of quantum computing. Such an algorithm has already been described: it achieves this "quantum speed-up" without esoteric hardware, and in fact on a single-processor, classical (von Neumann) computer.
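A small sketch of the counting argument and the intersection-as-probability reading, with randomly chosen codes for brevity (in the model, codes would be assigned so that overlap tracks probability); all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

Q, K = 8, 4                    # 8 WTA clusters of 4 binary units each
print(K ** Q, "coefficients representable with", K * Q, "units")   # 65536 with 32

def random_sdr():
    """One coefficient's representation: one active unit per cluster."""
    return rng.integers(K, size=Q)

R_X = random_sdr()                                   # the currently active code, R(X)
others = {f"Y{i}": random_sdr() for i in range(5)}   # codes of other stored states

# The strength of presence (probability) of each other state is read off as
# its code's fractional intersection with the active code R(X); X itself
# would score 1.0.
strengths = {name: (code == R_X).sum() / Q for name, code in others.items()}
print(strengths)
```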