Neurosymbolic Systems of Perception
& Cognition: The Role of Attention
Hugo Latapie 1, Ozkan Kilic 1, Kristinn R. Thórisson 2, Pei Wang 3, and Patrick Hammer 4
1 Emerging Technologies & Incubation, Cisco Systems, San Jose, CA, USA
2 Icelandic Institute for Intelligent Machines and Department of Computer Science,
Reykjavik University, Reykjavik, Iceland
3 Department of Computer and Information Sciences, Temple University,
Philadelphia, PA, USA
4 Center for Digital Futures, KTH Royal Institute of Technology and Stockholm
University, Stockholm, Sweden
Ozkan Kilic
A cognitive architecture aimed at cumulative learning must provide the necessary information
and control structures to allow agents to learn incrementally and autonomously from their
experience. This involves managing an agent’s goals as well as continuously relating sensory
information to these in its perception-cognition information processing stack. The more varied the
environment of a learning agent is, the more general and flexible these mechanisms must be to
handle a wider variety of relevant patterns, tasks, and goal structures. While many researchers
agree that information at different levels of abstraction likely differs in its makeup, structure, and
processing mechanisms, there is no general agreement in the research community on the particulars
of such differences. A dual processing architecture (often referred to as System-1 and
System-2) has been proposed as a model of cognitive processing, with the two systems often considered
responsible for low- and high-level information, respectively. We posit that cognition is not
binary in this way and that knowledge at any level of abstraction involves what we refer to as
neurosymbolic information, meaning that data at both high and low levels must contain both
symbolic and subsymbolic information. Further, we argue that the main differentiating factor
between the processing of high and low levels of data abstraction can be largely attributed to the
nature of the involved attention mechanisms. We describe the key arguments behind this view
and review relevant evidence from the literature.
Keywords: Artificial Intelligence, Cognitive Architecture, Perception, Cognition, Levels of Abstraction, Neurosymbolic Models, Learning,
Cumulative Learning, Systems of Thinking
Cognitive architectures aim to capture the information and control structures necessary to create autonomous
learning agents. The sensory modalities of artificially intelligent (AI) agents operating in physical
environments must measure relevant information at relatively low levels of detail, commensurate with
the agent’s intended tasks. Self-supervised learning places additional requirements on the ability of an
agent to dynamically and continuously relate a wide variety of sensory information to high-level goals of
tasks. The more general an agent’s learning is, the larger the part of its perception-cognition “information
stack” that must provide the flexibility needed to accommodate a wide variety of patterns, plans, tasks, and
goal structures. Low levels of cognition (close to the perceptual senses) seem to quickly generate and use
predictions to generalize across similar problems. This is a key responsibility of a sensory system because
low-latency predictions (i.e. those that the agent can act on quickly) are vital for survival in a rapidly
changing world. Natural intelligence has several outstanding skills that Deep Learning does not have. Two
of these, as pointed out by e.g. Bengio et al. (2021), are that (a) it does not require thousands of samples to
learn, and (b) it can cope with out-of-distribution (OOD) samples. As detailed by e.g. Thórisson et al. (2019),
another equally important shortcoming is that Deep Learning does not handle learning after the system
leaves the laboratory (i.e. cumulative learning), in part because it does not harbour any means to verify
newly acquired information autonomously. Such skills require not only perception processes that categorize
the sensory data dynamically, so that the lower levels can recognize ‘familiar’ situations by reconfiguring
known pieces and trigger higher-level cognition in the case of surprises, but also the reasoning to evaluate
the new knowledge thus produced. Whenever high-level cognition solves a new problem,
the coordination allows the new knowledge to modify and improve the lower levels for similar future
situations, which also means that both systems have access to long-term memory. Architectures addressing
both sensory- and planning-levels of cognition are as yet few and far between.
While general agreement exists in the research community that information at different levels of
abstraction likely differs in makeup and structure, agreement on these differences (and thus on the
particulars of the required architecture and processes involved) is not widely shared. It is sometimes
assumed that lower levels of abstraction are subsymbolic and higher levels symbolic, which has led some
researchers to the idea that Deep Learning models are analogous to perceptual mechanisms, while higher
levels involve rule-based reasoning skills due to their symbolic nature and, according to e.g. Kahneman
(2011), constitute the only system that can use language. This view has been adopted in some AI research,
where ‘subsymbolic’ processing is classified as System-1 processing, while higher-level, ‘symbolic’
processing is considered to belong to System-2 (cf. Smolensky, 1988; Sloman, 1996; Kahneman, 2011;
Strack and Deutsch, 2004). According to this view, artificial neural networks, including Deep Learning,
are System-1 processes; rule-based systems are System-2 processes (see Bengio et al., 2021; and Bengio,
2019 for discussion). Similarly, William James (1890) proposed that the mind has two mechanisms of
thought, one which handled reasoning and another which was associative. We posit instead that cognition
is not binary in this way at all, and that any level of abstraction involves processes operating on what
might be called “neurosymbolic” knowledge, meaning that data at both high and low levels must
accommodate both symbolic and subsymbolic information. Further, we argue that a major differentiating
factor between the processing of high and low levels of data abstraction can be largely attributed to the
nature of the involved attention mechanisms.
More than a century ago, James (1890) defined attention as “taking possession by the mind, in clear and
vivid form, of one out of what may seem several simultaneously possible objects or trains of thought...It
implies withdrawal from some things in order to deal effectively with others.” We consider attention to
consist of a (potentially large) set of processes whose role is to steer the available resources of
a cognitive system, from moment to moment, including (but not limited to) its short-term focus, goal
pursuit, sensory control, deliberate memorization, memory retrieval, selection of sensory data, and many
other subconscious control mechanisms that we can only hypothesize at this point and thus have no
names for. Low-level cognition, like perception, is characterized by relatively high-speed, distributed
(“multi-threaded”), subconscious attention control, while higher-level cognition seems more
“single-threaded” and relatively slower. When we introspect, our conscious threads of attention seem to
consist primarily of the latter, while much of our low-level perception is subconscious and under the
control of autonomous attention mechanisms (see Koch and Tsuchiya, 2006; Marchetti, 2011; Sumner
et al., 2006 for evidence and discussion about decoupling attention from conscious introspection).
Low-level perception and cognitive operations may reflect autonomous access to long-term memory through
subconscious attention mechanisms, while higher-level operation may involve the recruitment of deliberate
(introspectively-accessible) cognitive control, working memory, and focused attention (Papaioannou et al.,
We classify data as ‘subsymbolic’ if it can only be manipulated through approximate similarity-mapping processes, i.e. cannot be grouped and addressed as a
(named) set.
By ‘symbolic’ here we mean that the information is at the level of abstraction close to human verbal description, not that it uses ‘symbols’ that must be
interpreted or ‘grounded’ to become meaningful.
This is a provisional file, not the final typeset article
Two separate issues in the System-1/System-2 discussion are often confused: (1) knowledge
representation and (2) information processing. The first is the (by now familiar) ‘symbolic vs. subsymbolic’
distinction, while the second involves the ‘automatic vs. controlled’ distinction. Not only are these two
distinctly different, they are also not perfectly aligned; while subsymbolic knowledge may more often be
processed ‘automatically’ and symbolic knowledge seems generally more accessible through voluntary
control and introspection, this mapping cannot be taken as given. A classic example is skill learning,
like riding a bike, which starts as a controlled process and gradually becomes automatic with increased
training. On the whole this process is largely subsymbolic, with hardly anything but the top-level goals
introspectively accessible to the learner of bicycle-riding (“I want to ride this bicycle without falling or
crashing into things”). Though we acknowledge the above differences, in this article our focus is on the
relations and correlations between these two distinctions.
The sharp distinction between two hypothesized systems that some AI researchers have interpreted
dual-process theory to entail (cf. Posner, 2020) does not seem very convincing when we look at the
dependencies between the necessary levels of processing. For instance, it has been demonstrated time and
again (cf. Spivey et al., 2013) that expectations created verbally (‘System-2 information’) have a
significant influence on low-level behavior like eye movements (‘System-1 information’). It is not obvious
why, or how, two sharply separated control systems would be the best (or even a good) way to achieve the
tight coupling between levels thus demonstrated, as has been noted by other authors (cf. Houwer, 2019).
Until more direct evidence is collected for the hypothesis that there really are two systems (as opposed to
three, four, fifty, or indeed a continuity), it is a fairly straightforward task to fit the available evidence
onto that theory (cf. Strack and Deutsch, 2004). In the context of AI, more direct evidence would include
a demonstration of an implemented control scheme that produced some of the same key properties as human
cognition from first principles.
We would expect high-level (abstract) and low-level (perceptual/concrete) cognition to work in
coordination, not competition, after millions of years of evolution. Rather than implementing a (strict, or
semi-strict) pipeline structure between S1 and S2, where only data would go upstream (from S1 to S2) and
only control downstream (from S2 to S1; cf. Evans and Elqayam, 2007; Evans and Stanovich, 2013; Keren,
2013; Monteiro and Norman, 2013), we hypothesize high-level and low-level cognition to be coupled
through a two-way control-and-data communication, as demonstrated in numerous experiments (see the
review article by Xu, 2020, on cross-modal processing between high- and low-level cognition). In other
words, low-level cognition does not work solely under the control of high-level cognition; rather, the two
levels cooperate to optimize resource utilization through joint control.
We consider ‘subconscious’ cognitive processes to be the set of processes that are necessary for thought and that a mind cannot make the subject of its own
cognitive processing, i.e. all the processes that it does not have direct introspective access to.
Frontiers
Some evidence seems to indicate that, through the evolution of the human brain, language-based
conceptual representations replaced sensory-based compositional concepts, explaining the slower reaction
times of humans compared with other mammals, e.g., chimpanzees (see for instance Martin et al., 2014).
However, this replacement may have pushed the boundaries of human higher-level cognition by allowing
complex propositional representations and mental simulations. While animals do not demonstrate the
propositional properties of human language, researchers have found some recursion in birdsong (Gentner
et al., 2006) and in syntax among bonobos (Clay and Zuberbuhler, 2011). Moreover, Camp (2009) found
evidence that some animals think in compositional representational systems. In other words, animals seem
to lack propositional thought, but they have compositional conceptual thought, which is mostly based on
integrated multisensory data. Since animals appear to have symbol-like mental representations, these
findings indicate that their lower levels can be neurosymbolic. Evidence for this can be found in a
significant number of studies from the animal-cognition literature (for review, see Camp, 2009; Hubbard
et al., 2008; Diester and Nieder, 2007; Hauser et al., 2007; Brannon, 2005).
Among the processes of key importance in skill learning, to continue with that example, is attention; a
major cognitive difference between a skilled bike rider and a learner of bike-riding is what they pay attention
to: The knowledgeable rider pays keen attention to the tilt angle and speed of the bicycle, responding
by changing the angle of the steering wheel dynamically, in a non-linear relationship. Capable as they
may already be of turning the front wheel to any desired angle, a learner is prone to fall over in large part
because they don’t know what to pay attention to. This is why one of the few obviously useful tips that
a teacher of bicycle-riding can give a learner is to “always turn the front wheel in the direction you are
Kahneman (1973) sees attention as a pool of resources which allows different processes to share cognitive
capabilities, and posits a System-1 that is fast, intrinsic, autonomous, emotional, and parallel, and a
System-2 that is slower, deliberate, conscious, and serial (Kahneman, 2011). For example, driving a car on
an empty road (with no unexpected events), recognizing your mother’s voice, and calculating 2+2 mostly
involve System-1, whereas counting the number of people with eyeglasses in a meeting, recalling and
dialing your significant other’s phone number, calculating 13x17, and filling out a tax form depend on
System-2. Kahneman’s System-1 is good at making quick predictions because it constantly models similar
situations based on experience. It should be noted that “experience” in this context relates to the process
of learning, and its transfer (i.e. generalization and adaptation), which presumably relies heavily on
higher-level cognition (and should thus be part of System-2). Learning achieved in conceptual symbolic
space can be projected to subsymbolic space. In other words, since symbolic and subsymbolic spaces are
in constant interaction, acquired knowledge in symbolic space has correspondences in subsymbolic space.
This allows System-1 to start quickly using the projections of the knowledge, even based on System-2
Several fMRI studies support the idea that sensory-specific areas, such as the thalamus, may be involved
in multi-sensory stimulus integrations (Miller and D’Esposito, 2005; Noesselt et al., 2007; Werner
and Noppeney, 2010), which are symbolic representations in nature. Sensory-specific brain regions are
considered to be networks specialized in subsymbolic data that originates from the outside world and
different body parts. Thalamo-cortical oscillation is known as a synchronization mechanism, or temporal
binding, between different cortical regions (Llinas, 2002). However, recent evidence shows that the thalamus,
previously assumed to be responsible only for relaying sensory impulses from body receptors to the cerebral
cortex, can actually integrate these low-level impulses (Tyll et al., 2011; Sampathkumar et al., 2021). In
other words, there are sensory-based integrations in the thalamus, and they are essential in sustaining
cortical cognitive functions.
Wolff and Vann (2019) use the term “cognitive thalamus” to describe a gateway to mental representations,
because recent findings support the idea that thalamocortical and corticothalamic pathways may play
complementary but dissociable cognitive roles (see Bolkan et al., 2017; Alcaraz et al., 2018). More
specifically, the thalamocortical pathway (the fibers connecting the thalamus to cortical regions) can
create and save task-related representations, not just purely sensory information, and this pathway is
essential for updating cortical representations. Similarly, corticothalamic pathways seem to have two major
functions: directing cognitive resources (focused attention) and contributing to learning. In a way, the
thalamocortical pathway defines the world for the cortex, and the corticothalamic pathway uses attention
to tell the thalamus what the cortex needs from it in order to focus. Furthermore, a growing body of
evidence shows that the thalamus plays a role in cognitive dysfunctions such as schizophrenia (Anticevic
et al., 2014), Down’s syndrome (Perry et al., 2018), drug addiction (Balleine et al., 2015), and ADHD
(Hua et al., 2021). These discoveries support other recent findings about the role of the thalamus in
cognition via the thalamocortical loop. The thalamus, a structure proficient in actively using and
integrating subsymbolic data, describes the world for the cortex by contributing to the symbolic
representations in it. On the other hand, the cortex uses attention to direct resources to refresh its
symbolic representations from the subsymbolic space. In the Non-Axiomatic Reasoning System (NARS; Wang,
2006), attention has the role of allocating processing power for producing and scheduling inference steps,
whereby inferences can compose new representations from existing components, seek out new ones, and
update the strength of existing relationships via knowledge revision. This control also leads to a
refreshing of representations in a certain sense, as the system will utilize the representations which are
most reliable and switch to alternatives if some of them turn out to be unreliable.
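To make the two roles described for attention in NARS more concrete, the following is a minimal Python sketch and not OpenNARS code. All names (`Belief`, `evidence`, `revise`, `pick_task`) are hypothetical, the constant `K` is an assumed value, the revision rule is the evidence-pooling form commonly given for Non-Axiomatic Logic, and plain priority-weighted sampling stands in for whatever scheduling structure a real implementation uses.

```python
import random
from dataclasses import dataclass

K = 1.0  # evidential horizon; an assumed constant, not taken from the text


@dataclass
class Belief:
    frequency: float   # share of positive evidence, in [0, 1]
    confidence: float  # in [0, 1); higher means more evidence


def evidence(belief):
    """Convert a confidence value back into an amount of evidence."""
    return K * belief.confidence / (1.0 - belief.confidence)


def revise(a, b):
    """Pool the evidence of two beliefs about the same statement.

    Assumes the two evidence bases are independent and that at least
    one of the beliefs has non-zero confidence.
    """
    wa, wb = evidence(a), evidence(b)
    total = wa + wb
    frequency = (wa * a.frequency + wb * b.frequency) / total
    return Belief(frequency, total / (total + K))


def pick_task(priorities, rng):
    """Pick one task name with probability proportional to its priority."""
    names = list(priorities)
    weights = [priorities[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)
tasks = {"check_surprise": 0.9, "background_recall": 0.1}  # illustrative priorities
chosen = pick_task(tasks, rng)
merged = revise(Belief(1.0, 0.5), Belief(0.0, 0.5))
print(chosen, merged)
```

In this sketch, two equally confident but contradictory beliefs merge into one with a middling frequency and a higher confidence than either input, which is one way to read "updating the strength of existing relationships"; high-priority tasks are selected more often without low-priority ones being starved completely.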
In the Auto-catalytic Endogenous Reflective Architecture (AERA), attention is implemented as
system-permeating control of computational/cognitive resources at very fine-grained levels of processing,
bounded by goals at one end and the current situation at the other (cf. Nivel et al., 2015; Helgason et al.,
2013). Studies on multitasking in humans have shown that a degree of parallelism among multiple tasks is
more likely if the tasks involve different data modalities, such as linguistic and tactile. Low-level
attention continuously monitors both the mind and the outside world and assesses situations (i.e., relates
them to active goals and plans) with little or no effort, through its access to long-term memory and
sensory information. Surprises and threats are detected early in the perceptual stream, while plans and
questions are handled at higher levels of abstraction, triggering higher levels of processing, which also
provide top-down control of attention and reasoning.
In contrast to so-called “attention” mechanisms in artificial neural networks (which are for the most
part rather narrow interpretations of resource control in general), mental resources (processing power and
storage in computer systems) are explicitly distributed in these systems, whereby filtering of input for
useful patterns is just a special case. Another aspect is priming for related information by activating it,
which is not limited to currently perceived information but can integrate long-term memory content rather
than just the content of a sliding window (as in Transformers) of recent stimuli in input space.
The idea of combining symbolic and sub-symbolic approaches, also known as the neurosymbolic approach,
is not new. Many researchers are working on integrated neural-symbolic systems which translate symbolic
knowledge into neural networks (or the other way around), because symbols, relations, and rules should
have counterparts in the sub-symbolic space. Moreover, a neurosymbolic network needs a form of symbol
manipulation that preserves the structural relations between the two systems without
losing the correspondences.
Currently, Deep Learning and related machine learning methods are primarily subsymbolic. Meanwhile,
rule-based systems and related reasoning systems are usually strictly symbolic. We consider it possible to
have a Deep Learning model that demonstrates symbolic cognition (without reasoning mechanisms) that
entails the transformation of symbolic representations into subsymbolic ML/DL/statistical models. One
of the costs associated with such transformation, however, is an inevitable loss of the underlying causal
model which may have existed in the symbolic representation (Parisi et al., 2019). Current subsymbolic
representations are exclusively correlational; information based on spurious correlation is indistinguishable
from other correlations and causal direction between correlating variables is not represented and thus not
separable from either of those knowledge sets.
There is an ongoing interest in bringing symbolic and abstract thinking to Deep Learning, which could
enable more powerful kinds of learning. Graph neural networks with distinct nodes (Kipf et al., 2018; Van
Steenkiste et al., 2018), transformers with discrete positional elements (Vaswani et al., 2017), and modular
models with bandwidth (Goyal and Bengio, 2020) are examples of attempts in this direction. Liu et al.
(2021) summarize the advantages of having discrete values (symbols) in a Deep Learning architecture.
First, using symbols provides a language for inter-modular interaction and learning, whereby the meaning
of symbols is not innate but determined by their relationships with others (as in Semiotics). Second, it
allows reusing previously learned symbols in unseen or out-of-distribution situations, by reinterpreting
them in a way suitable to the situation. Discretization in Deep Learning may provide systematic
generalization (recombining existing concepts) but it is currently not very successful (Lake and Baroni, 2018).
Current hybrid approaches attempt to combine symbolic and subsymbolic models so that each compensates
for the other’s drawbacks. However, the authors believe that there is a need for a metamodel which will
accommodate hierarchical knowledge representations. Latapie et al. (2021) proposed such a model inspired
by Korzybski’s (1994) idea about levels of abstraction. Their model promotes cognitive synergy and
metalearning, which refer to the use of different computational techniques and AGI approaches, e.g.,
probabilistic programming, machine learning/Deep Learning, AERA (Thórisson, 2020; Nivel et al., 2013),
and NARS (Wang, 2006; Wang, 2013), to enrich its knowledge and address combinatorial explosion issues.
The current paper extends the metamodel as a neurosymbolic architecture⁵ as in Figure 1.
In this metamodel, the levels of abstraction are marked with L. L0 is the closest to the raw data
collected from various sensors. L1 contains the links between raw data and higher-level abstractions. L2
corresponds to the highest integrated levels of abstraction learned through statistical learning, reasoning,
and other processes. The layer L2 can have an infinite number of sub-layers, since any level of abstraction
in L2 can have metadata existing at an even higher level of abstraction. L* holds the high-level goals
4With open-source implementation OpenNARS at — last accessed on Oct 20th, 2021.
5 This architecture has a “symbolic” aspect in the sense that there are components that can be accessed and manipulated using their identifiers. This is
different from traditional Symbolic AI where a “symbol” gets its meaning by referring to an external object or event, as stated by Newell and Simon (1976).
6 Korzybski (1994) states that knowledge is a multiordinal, hierarchical structure with varying levels of abstraction.
and motivations, such as self-monitoring, self-adjusting, self-repair, and the like. Similar to the previous
version, the neurosymbolic metamodel is based on the assumption of insufficient knowledge and resources
(Wang, 2005). The symbolic piece of the metamodel can be thought of as a knowledge graph with some
additional structure that includes both a formalized means of handling anti-symmetric and symmetric
relations, as well as a model of abstraction. The regions in the subsymbolic piece of the metamodel are
mapped to the nodes in the symbolic system in L1. In this approach, the symbolic representations are
always refreshed in a bottom-up manner.
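The layered structure described above can be pictured as a small typed graph. The Python sketch below is only an illustration under stated assumptions: the class `Metamodel`, its methods, and the example node and relation names (`bbox_17`, `maps_to`, `next_to`, `part_of`) are hypothetical and are not taken from the authors' implementation. It shows level labels on nodes, a symmetric versus an anti-symmetric relation, and a subsymbolic region at L0 mapped to a symbol at L1.

```python
from dataclasses import dataclass, field

LEVELS = ("L0", "L1", "L2", "L*")  # level labels as used in the text


@dataclass
class Metamodel:
    level_of: dict = field(default_factory=dict)  # node name -> level label
    relations: set = field(default_factory=set)   # (source, relation, target)

    def add_node(self, name, level):
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.level_of[name] = level

    def relate(self, source, relation, target, symmetric=False):
        """Store a directed relation; a symmetric one is stored both ways."""
        for name in (source, target):
            if name not in self.level_of:
                raise KeyError(name)
        self.relations.add((source, relation, target))
        if symmetric:
            self.relations.add((target, relation, source))

    def nodes_at(self, level):
        return sorted(n for n, lvl in self.level_of.items() if lvl == level)


m = Metamodel()
m.add_node("bbox_17", "L0")        # a subsymbolic region (hypothetical)
m.add_node("product", "L1")
m.add_node("shelf", "L1")
m.add_node("stocked_shelf", "L2")
m.relate("bbox_17", "maps_to", "product")                # region -> symbol
m.relate("product", "next_to", "shelf", symmetric=True)  # symmetric relation
m.relate("product", "part_of", "stocked_shelf")          # anti-symmetric relation
print(m.nodes_at("L1"))
```

Storing symmetric relations in both directions keeps lookups uniform; a real system would more likely record the symmetry as a property of the relation type.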
Depending on the system’s goal or subgoals, the metamodel can be readily partitioned into subgraphs
using the hierarchical abstraction substructure associated with the current focus of attention. This
partitioning mechanism is crucial to manage combinatorial explosion issues while enabling multiple
reasoners to operate in parallel. Each partition can trigger a sub-focus of attention (sFoA), which requests
subsymbolic data from System-1 or some answers from System-2. The bottom-up refreshing and the
neurosymbolic mapping between regions and symbols allow the metamodel to benefit from different
computational techniques (e.g., probabilistic programming, Machine Learning/Deep Learning and such) to
enrich its knowledge and benefit from the ‘blessing of dimensionality’ (cf. Gorban, 2018), also referred to
as ‘cognitive synergy’.
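One simple way to picture partitioning by focus of attention is to collect everything that lies below a chosen node in the abstraction hierarchy. The sketch below is a hedged illustration, not the authors' partitioning mechanism: the function `partition`, the `children` mapping, and the retail-style node names are all hypothetical, and a plain breadth-first traversal stands in for whatever criterion the metamodel actually applies.

```python
from collections import deque


def partition(children, focus):
    """Return the set of nodes reachable from `focus` via child links."""
    seen = {focus}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical abstraction hierarchy: parent -> more concrete children.
children = {
    "store": ["aisle_1", "aisle_2"],
    "aisle_1": ["shelf_a"],
    "aisle_2": ["shelf_b"],
    "shelf_a": ["bbox_1", "bbox_2"],
    "shelf_b": ["bbox_3"],
}

sub_focus_1 = partition(children, "aisle_1")
sub_focus_2 = partition(children, "aisle_2")
print(sorted(sub_focus_1), sorted(sub_focus_2))
```

Because the two partitions share no nodes in this example, separate reasoners could work on them independently, which is the property the text relies on for containing combinatorial explosion.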
A precursor to the metamodel as a neurosymbolic approach was first used by Hammer et al. (2019).
This version was the first commercial implementation of a neurosymbolic AGI-aspiring approach in the
smart city domain. Later, the use of levels of abstraction in the metamodel became mandatory
due to the combinatorial explosion issue. In other words, structural knowledge representation with
levels of abstraction became very important for partitioning the problem, processing subsymbolic or
symbolic information for each sub-problem (focus of attention, FoA), and then combining the symbolic
results in the metamodel. The metamodel with levels of abstraction was fully achieved in the retail domain
(see Latapie et al., 2021 for details). The flow of the retail use case with the metamodel is shown in Figure
2. An example of the levels of abstraction using the results of the retail use case is shown in Figure 3.
Latapie et al. (2021) emphasized that no Deep Learning model was trained with product or shelf images for
the retail use case. The system used for the retail use case is based solely on representing the subsymbolic
information in a world of bounding boxes with spatial semantics. The authors tested the metamodel in 4
different settings with and without the FoA and reported the results as in Table 1.
Table 1. Experimental results from Retail Use Case using Metamodel

                   without FoA (%)                 with FoA (%)
Category           precision  recall  f1-score     precision  recall  f1-score
product            80.70      39.32   52.88        96.36      99.07   97.70
shelf               8.82      18.75   12.00        82.35      87.50   84.85
other              36.61      89.66   52.00        96.00      82.76   88.89
overall accuracy   46.30 (min/max: 30.13/84.65)    94.73 (min/max: 88.10/100.00)
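As a reading aid for Table 1, the f1-score column is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The short Python check below assumes this standard definition; the function name `f1_score` is our own, and the two rows used are copied from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (works on percentages too)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows copied from Table 1 (values in percent).
shelf_without_foa = f1_score(8.82, 18.75)
other_with_foa = f1_score(96.00, 82.76)
print(round(shelf_without_foa, 2), round(other_with_foa, 2))
```

Both values agree with the reported f1-scores for those rows to two decimal places.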
Artificial general intelligence (AGI) is the research area closest to the original vision of the field of AI, namely, to create machines with intelligence on par
with humans.
Another use case for the metamodel is the processing of more than 200,000 time series with a total of
more than 30 million individual data points. The time series are network telemetry data. For this use case,
there are only two underlying assumptions. The first assumption is that the time series, or a subset of
them, are at least weakly related, such as time series from computer network devices. The second assumption
is that when a number of time series simultaneously change their behaviors, it might indicate that an
event-of-interest has happened. For detecting anomalies and finding regime change locations, Matrix
Profile algorithms are used (see Yeh et al., 2016; Gharghabi et al., 2017 for Matrix Profile and Semantic
Segmentation). Similar to the retail use case, millions of sensory data points are reduced to a much smaller
number of events based on the semantic segmentation points. These points are used to form a histogram
of regime changes as shown in Figure 4. The large spikes in the histogram are identified as the candidate
events-of-interest. Then the metamodel creates a descriptive model for all time series, which allows the
system to reduce millions of data points to a few thousand pieces of structured, actionable, and
explainable knowledge.
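The histogram step can be illustrated with a small Python sketch. It assumes that regime-change timestamps have already been computed for each time series by some segmentation method (the Matrix Profile step itself is not reproduced here), and the function names, device names, and the `min_series` threshold are hypothetical choices made for illustration.

```python
from collections import Counter


def regime_change_histogram(change_points):
    """Count how many series report a regime change at each timestamp."""
    counts = Counter()
    for points in change_points.values():
        counts.update(set(points))  # one vote per series per timestamp
    return counts


def candidate_events(counts, min_series):
    """Timestamps where at least `min_series` series change together."""
    return sorted(t for t, c in counts.items() if c >= min_series)


# Hypothetical, precomputed regime-change timestamps per device.
change_points = {
    "device_1": [60, 105],
    "device_2": [60],
    "device_3": [12, 60, 105],
    "device_4": [33],
}

histogram = regime_change_histogram(change_points)
print(candidate_events(histogram, min_series=3))
print(candidate_events(histogram, min_series=2))
```

A fixed threshold is the simplest possible spike criterion; the spikes referred to in the text may be identified differently, for example relative to a baseline rate of regime changes.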
To test the metamodel with time series, we first use a subset of the Cisco Open Telemetry Data Set.
After being able to identify the anomalies in the data set, we create our own data sets similar to the Open
Telemetry Data. For this purpose, 30 computer network events, such as memory leak, transceiver pull,
port flap, and port shut down, are injected into a physical computer network. The system is able to
identify 100% of the events with a maximum delay of 1 minute. For example, Figure 4 shows the
histogram of regime changes for a port shut down event, which is injected at the 50th timestamp. Since the
sampling interval is 6 seconds, one minute later (at the 60th timestamp) the system detects a spike as
an event-of-interest. It can take time for a single incident to display a cascading effect on multiple devices.
When the injection ends at the 100th timestamp, another spike is observed within 10 timestamps, which
represents a recovery behavior for the network. It should be noted that not all events necessarily mean
an error has happened. Some routine activities in the network, e.g., a routine firmware update on multiple
devices, are also captured by the metamodel as events-of-no-interest. The metamodel learns to classify
such activities by observing the network. Although the time series processing using the metamodel
does not require any knowledge of computer networking, it can easily incorporate features extracted
by networking-specific modules, e.g., Cisco Joy, or ingest some expert knowledge defined in the symbolic
world, specifically at the 2nd level of abstraction. This neurosymbolic approach with the metamodel can
quickly reduce the sensory data into knowledge, reason on this knowledge, and notify the network operators
for remediation or trigger a self-healing protocol.
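The relationship between timestamps and detection delay in the port shut down example can be checked with a few lines of Python. This is only a worked version of the arithmetic in the text; the function and constant names are illustrative.

```python
SAMPLING_INTERVAL_S = 6  # seconds between consecutive timestamps (from the text)

def delay_seconds(injected_at, detected_at, interval_s=SAMPLING_INTERVAL_S):
    """Convert a gap measured in timestamps into seconds."""
    return (detected_at - injected_at) * interval_s

# Port shut down injected at the 50th timestamp, spike detected at the 60th.
delay = delay_seconds(50, 60)

# A one-minute bound corresponds to this many timestamps.
max_timestamps = 60 // SAMPLING_INTERVAL_S

print(delay, max_timestamps)
```

Ten timestamps at 6 seconds each give the one-minute delay reported for this event, and the recovery spike observed within 10 timestamps of the 100th timestamp falls within the same bound.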
The neurosymbolic approach presented here evolved from several independent research efforts by four
core teams (NARS, AERA, OpenCog (Hart and Goertzel, 2008)) as well as efforts at Cisco over the
past 10 years focusing on hybrid state-of-the-art AI for commercial applications. This empirically-based
approach to AI took off (circa 2010) with deep-learning based computer vision, augmented by well-known
tracking algorithms (e.g., Kalman filtering and the Hungarian algorithm). The initial hybrid architecture resulted
in improved object detection and tracking functionality, but the types of errors, arguably related to weak
knowledge representation and poor ability to define and learn complex behaviors, resulted in systems
which did not meet our performance objectives. This initial hybrid architecture, called DFRE (Deep
Fusion Reasoning Engine), lacked the metamodel. In order to improve the system’s ability to
generalize, NARS was incorporated. The initial architecture used NARS to reason about objects and their
movements in busy city intersections with trains, busses, pedestrians, and heavy traffic. This initial attempt
at a commercial neurosymbolic system dramatically improved the ability of the system to generalize and
learn behaviors of interest, which in this case were all related to safety. In essence, the objective of the
system was to raise alerts if any two moving objects either made contact or were predicted to make contact,
as well as to learn other dangerous behaviors such as jaywalking and wrong-way driving. While
this system worked well as an initial prototype and is considered a success, there were early indications of
potential computational scalability issues if the number of objects requiring real-time processing were to
increase from the average of 100 or so to an order of magnitude more, such as 1000. In order to
explore this problem, we then focused on a retail inventory use case that required the processing of over
1000 objects. As expected, DFRE suffered from the predicted combinatorial explosion issues. In the retail
use case, this problem was solved via the metamodel’s abstraction hierarchy, which provides a natural
knowledge partitioning mechanism. This partitioning mechanism was used to address the exponential time
complexity problem and convert it to a linear time complexity problem.
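One way to see why partitioning helps is to count the work done with and without it. The sketch below is an illustration under stated assumptions, not the metamodel's actual mechanism: it uses pairwise object comparisons as a stand-in for the reasoning workload (which grows quadratically rather than exponentially), and the fixed partition size of 50 is a hypothetical value.

```python
from math import comb

def all_pairs(n):
    """Number of object pairs when every object is compared with every other."""
    return comb(n, 2)

def partitioned_pairs(n, partition_size):
    """Number of pairs when comparisons happen only inside fixed-size partitions."""
    full, rest = divmod(n, partition_size)
    return full * comb(partition_size, 2) + comb(rest, 2)

for n in (100, 1000, 10000):
    print(n, all_pairs(n), partitioned_pairs(n, 50))
```

With the partition size held fixed, the partitioned count grows in proportion to the number of objects, whereas the unpartitioned count grows much faster. The same intuition applies, more strongly, when the unpartitioned cost is combinatorial.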
While NARS enabled the system to learn by reasoning in an unsupervised manner, there was a
growing need in commercial applications for a principled mechanism for unsupervised learning directly
from temporal data streams such as sensor data, video data, and telemetry data. This is the focus of
AERA as well as the internal Cisco project Kronos, which is based on Matrix Profile (Yeh et al., 2016). While there is
a large body of work on time series processing (FFT, wavelets, Matrix Profile, etc.), the problem of
dealing with large-scale time series and incorporating contextual knowledge to produce descriptive and
predictive models with explanatory capability seems relatively unsolved at the time of this writing. In
our preliminary experimentation, both AERA and Cisco’s Kronos project are demonstrating promising
results. Incorporating AERA and Kronos into the hybrid architecture is expected to result in enhanced
unsupervised learning and attention mechanisms directly from large-scale time series.
This evolved hybrid architecture (ML/DL/NARS/Kronos metamodel) is expected to promote cognitive
synergy while preserving levels of abstraction and the symmetric and anti-symmetric properties of knowledge, and
using a bottom-up approach to refresh System-2 symbols from System-1 data integration (see Latapie et al.,
2021 for details). Moreover, System-1 provides rapid responses to the outside world and activates System-2
in case of a surprise such as an emergency or other significant event that requires further analysis and
potential action. System-2 uses conscious attention to request subsymbolic knowledge and sensory data
from System-1, to be integrated into the levels of abstraction inspired by Korzybski’s work. Korzybski’s
two major works (Korzybski, 1921; Korzybski, 1994) emphasize the importance of bottom-up knowledge.
The corticothalamic and thalamocortical connections play different but complementary roles.
A balanced interplay between System-1 and System-2 is important. System-1’s innate role is to ensure
the many-faceted health of the organism. System-2 is ideally used to help humans better contend with
surprises, threats, complex situations, and important goals, and to achieve higher levels in Maslow’s hierarchy
of needs. From an AI systems perspective, contemporary machine learning methods (including
deep learning) have it the other way around: causal modeling and advanced reasoning are being solved
in System-1, leveraging statistical models, which can be seen as an inversion of proper thalamocortical
While not conclusive, findings about natural intelligence from psychology, neuroscience, cognitive science,
and animal cognition imply that both low-level perceptual knowledge and higher-level more abstract
knowledge may be neurosymbolic. The difference between high and low levels of abstraction may be that
lower levels involve a greater amount of unconscious (automatic) processing and attention, while higher
levels are introspectable to a greater extent (in humans, at least) and involve conscious (i.e. steerable)
attention. The neurosymbolic metamodel and framework introduced in this paper for artificial general
intelligence is based on these findings, and the nature of the distinction between the two systems will be
subject to further research. One may ask whether artificial intelligence needs to mimic natural intelligence
as a key performance indicator. The answer is yes and no. No, because natural intelligence, a result of
billions of years of evolution, is full of imperfections and mistakes. Yes, because it is the best way known
to help organisms survive for countless generations.
Both natural and artificial intelligences can exhibit astounding generalizability, performance, ability
to learn, and other important adaptive behaviors when symbolic-originating attention and subsymbolic-originating
attention are properly handled. Allowing one system of attention to dominate, or inverting
the natural order (e.g., reasoning in the subsymbolic space or projecting symbolic space stressors into the
subsymbolic space), may lead to suboptimal results for engineered systems, individuals, and societies.
H. Latapie and O. Kilic conceived of the presented idea. H. Latapie designed the framework and the
experiments. H. Latapie and O. Kilic implemented the framework. O. Kilic ran the tests and collected data.
K. R. Thórisson, P. Wang and P. Hammer contributed to the theoretical framework. All authors contributed
to the writing of this manuscript and approved the final version.
The authors would like to thank Tony Lofthouse for his valuable comments.
Alcaraz, F., Fresno, V., Marchand, A. R., Kremer, E. J., Coutureau, E., and Wolff, M. (2018).
Thalamocortical and corticothalamic pathways differentially contribute to goal-directed behaviors in the
rat. eLife 7, e32517. doi:10.7554/eLife.32517
Anticevic, A., Cole, M. W., Repovs, G., Murray, J. D., Brumbaugh, M. S., Savic, A. M. W. A., et al. (2014).
Characterizing thalamo-cortical disturbances in schizophrenia and bipolar illness. Cerebral Cortex 24,
3116–3130. doi:10.1093/cercor/bht165. pmid:23825317
Balleine, B. W. and Leung, R. W. M. B. K. (2015). Thalamocortical integration of instrumental learning
and performance and their disintegration in addiction. Brain Research 1628, 104–116. doi:10.1016/j.
Bengio, Y. (2019). From system1 deep learning to system2 deep learning [conference presentation].
NeurIPS 2019 Posner Lecture
Bengio, Y., Lecun, Y., and Hinton, G. (2021). Deep learning for ai. Communications of the ACM 64,
Bolkan, S. S., Stujenske, J. M., Parnaudeau, S., Spellman, T. J., Rauffenbart, C., Abbas, A. I., et al.
(2017). Thalamic projections sustain prefrontal activity during working memory maintenance. Nature
Neuroscience 20, 987–996
Brannon, E. M. (2005). What animals know about number. In Handbook of mathematical cognition, ed.
J. I. D. Campbell (New York: Psychology Press). 85–108
Camp, E. (2009). Language of baboon thought. In The Philosophy of Animal Minds, ed. R. W. Lurz
(Cambridge: Cambridge University). 108–127
Clay, Z. and Zuberbühler, K. (2011). Bonobos extract meaning from call sequences. PLoS ONE 6, e18786
Diester, I. and Nieder, A. (2007). Semantic associations between signs and numerical categories in the
prefrontal cortex. PLoS Biol 5
Evans, J. S. B. and Elqayam, S. (2007). Dual-processing explains base-rate neglect, but which dual-process
theory and how? Behavioral and Brain Sciences 30, 261–262. doi:10.1017/S0140525X07001720
Evans, J. S. B. T. and Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the
debate. Perspectives on Psychological Science 8, 223–241
Gentner, T. Q., Fenn, K. M., Margoliash, D., and Nusbaum, H. C. (2006). Recursive syntactic pattern
learning by songbirds. Nature 440, 1204–1207
Gharghabi, S., Ding, Y., Yeh, C.-C. M., Kamgar, K., Ulanova, L., and Keogh, E. (2017). Matrix profile
viii: Domain agnostic online semantic segmentation at superhuman performance levels. In 2017 IEEE
International Conference on Data Mining (ICDM). 117–126. doi:10.1109/ICDM.2017.21
Gorban, A. N. and Tyukin, I. Y. (2018). Blessing of dimensionality: mathematical foundations of the
statistical physics of data. Philosophical Transactions of the Royal Society Mathematical Physical
and Engineering Sciences 440, 1204–1207
Goyal, A. and Bengio, Y. (2020). Inductive biases for deep learning of higher-level cognition. arXiv
Hammer, P., Lofthouse, T., Fenoglio, E., and Latapie, H. (2019). A reasoning based model for anomaly
detection in the smart city domain. In NARS Workshop in AGI-19, Shenzhen, China, August 6. 1–10
Hart, D. and Goertzel, B. (2008). Opencog: A software framework for integrative artificial general
intelligence. In Proc. of AGI2008, eds. P. Wang, B. Goertzel, and S. Franklin. 468–472
Hauser, M. D., Dehaene, S., Dehaene-Lambertz, G., and Patalano, A. L. (2007). Spontaneous number
discrimination of multi-format auditory stimuli in cotton-top tamarins (Saguinus oedipus). Cognition 86,
Helgason, H. P., Thórisson, K. R., Garrett, D., and Nivel, E. (2013). Towards a general attention mechanism
for embedded intelligent systems 4, 1–7
Houwer, J. D. (2019). Moving beyond system 1 and system 2: Conditioning, implicit evaluation, and
habitual responding might be mediated by relational knowledge. Experimental Psychology 66, 257–265
Hua, M., Chen, Y., Chen, M., Huang, K., Hsu, J., Bai, Y., et al. (2021). Network-specific corticothalamic
dysconnection in attention-deficit hyperactivity disorder. Journal of Developmental and Behavioral
Pediatrics 42, 122–127. doi:10.1097/DBP.0000000000000875
Hubbard, E. M., Diester, I., Cantlon, J. F., Ansar, D., van Opstal, F., and Troiani, V. (2008). The evolution
of numerical cognition: From number neurons to linguistic quantifiers. The Journal of Neuroscience 28,
11819–11824
James, W. (1890). The Principles of Psychology, Vol. 2 (NY: Dover Publication)
Kahneman, D. (1973). Attention and Effort (NJ: Prentice-Hall)
Kahneman, D. (2011). Thinking, fast and slow (NY: Farrar, Straus and Giroux)
Keren, G. (2013). A tale of two systems: A scientific advance or a theoretical stone soup? Commentary on
Evans & Stanovich. Perspectives on Psychological Science 8, 257–262
Kipf, T., Fetaya, E., Wang, K. C., Welling, M., and Zemel, R. (2018). Neural relational inference for
interacting systems. arXiv preprint
Koch, C. and Tsuchiya, N. (2006). Attention and consciousness: two distinct brain processes. Trends
Cognitive Science 11, 16–22. doi:10.1016/j.tics.2006.10.012
Korzybski, A. (1921). Manhood Of Humanity, The Science and Art of Human Engineering (NY: E. P.
Dutton and Company)
Korzybski, A. (1994). Science and Sanity: An Introduction to Non-Aristotelian Systems, 5th edn, (NY:
Institute of General Semantics)
Lake, B. and Baroni, M. (2018). Generalization without systematicity: On the compositional skills of
sequence-to-sequence recurrent networks. In Proc. of International Conference on Machine Learning
(PMLR), 2873–2882
Latapie, H., Liu, O. K. G., Kompella, R., Lawrence, A., Sun, Y., Srinivasa, J., et al. (2021). A metamodel
and framework for artificial general intelligence from theory to practice. Journal of Artificial Intelligence
and Consciousness 8, 205–227. doi:10.1142/S2705078521500119
Liu, D., Lamb, A., Kawaguchi, K., Goyal, A., Mozer, C. S. M. C., and Bengio, Y. (2021). Discrete-valued
neural communication. arXiv preprint
Llinas, R. R. (2002). Thalamocortical assemblies: How ion channels, single neurons and large-scale
networks organize sleep oscillations. In Thalamus and Related Systems, eds. A. Destexhe and T. J.
Sejnowski (Oxford: Oxford University). 87–88
Marchetti, M. (2011). Against the view that consciousness and attention are fully dissociable. Frontiers in
Psychology 3. doi:
Martin, C., Bhui, R., and Bossaerts, P. (2014). Chimpanzee choice rates in competitive games match
equilibrium game theory predictions. Sci Rep 4, 51–81
Miller, L. M. and D’Esposito, M. (2005). Perceptual fusion and stimulus coincidence in the cross-modal
integration of speech. Journal of Neuroscience 25, 5884–5893
Monteiro, S. M. and Norman, G. (2013). Diagnostic reasoning: Where we’ve been, where we’re going.
Teaching and Learning in Medicine 25, S26–S3. doi:10.1080/10401334.2013.842911
Newell, A. and Simon, H. A. (1976). Computer science as empirical inquiry: symbols and search.
Communications of the ACM 19, 113–126
Nivel, E., Thórisson, K. R., Steunebrink, B., Dindo, H., Pezzulo, G., Rodriguez, M., et al. (2013). Bounded
recursive self-improvement. Tech report RUTR-SCS13006, Reykjavik University – School of Computer
Nivel, E., Thórisson, K. R., Steunebrink, B., and Schmidhuber, J. (2015). Anytime bounded rationality. In
Proc. 8th International Conference on Artificial General Intelligence (AGI-15). 121–130
Noesselt, T., Rieger, J. W., Schoenfeld, M. A., Kanowski, M., Hinrichs, H., and Heinze, H. J. (2007).
Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus
primary sensory cortices. Journal of Neuroscience 27, 11431–11441
Papaioannou, A. G., Kalantzi, E., Papageorgiou, C. C., and Korombili, K. (2021). Complexity analysis of
the brain activity in autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD)
due to cognitive loads/demands induced by Aristotle’s type of syllogism/reasoning. A power spectral
density and multiscale entropy (MSE) analysis. Heliyon 7, e07984
Parisi, G. I., Kemker, R., Part, J. L., Kanan, C., and Wermter, S. (2019). Continual lifelong learning with
neural networks: A review. Neural Networks 113, 54–71. doi:
Perry, J. C., Pakkenberg, B., and Vann, S. D. (2018). Striking reduction in neurons and glial cells in anterior
thalamic nuclei of older patients with Down’s syndrome. bioRxiv 449678. doi:10.1101/449678
Posner, I. (2020). Robots thinking fast and slow: On dual process theory and metacognition in embodied AI.
In RSS 2020 Workshop RobRetro
Sampathkumar, V., Miller-Hansen, A., Sherman, S. M., and Kasthuri, N. (2021). Integration of signals
from different cortical areas in higher order thalamic neurons. PNAS 118, e2104137118. doi:10.1073/
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin 119, 3–24
Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences 11, 1–43
Spivey, M. J., Tanenhaus, M. K., Eberhard, K. M., and Sedivy, J. C. (2013). Eye movements and spoken
language comprehension: Effects of visual context on syntactic ambiguity resolution 45, 447–481
Steenkiste, S. V., Chang, M., Greff, K., and Schmidhuber, J. (2018). Relational neural expectation
maximization: Unsupervised discovery of objects and their interactions. arXiv preprint
Strack, F. and Deutsch, R. (2004). Reflective and impulsive determinants of social behavior. Personality
and Social Psychology Review 8, 220–247
Sumner, P., Tsai, P. C., Yu, K., and Nachev, P. (2006). Attentional modulation of sensorimotor processes in
the absence of perceptual awareness. PNAS 103, 10520–10525
Thórisson, K. R. (2020). Seed-programmed autonomous general learning. In Proceedings of Machine
Learning Research. 32–70
Thórisson, K. R., Bieger, J., Li, X., and Wang, P. (2019). Cumulative learning. In Proc. International
Conference on Artificial General Intelligence (AGI-19). 198–209
Tyll, S., Budinger, E., and Noesselt, T. (2011). Thalamic influences on multisensory integration. Commun.
Integr. Biol 4, 145–171
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Kaiser, A. N. G. L., et al. (2017). Attention
is all you need. In Proc. Advances in Neural Information Processing Systems. 5998–6008
Wang, P. (2005). Experience-grounded semantics: A theory for intelligent systems. Cogn. Syst. Res. 6,
282–302. doi:10.1016/j.cogsys.2004.08.003
Wang, P. (2006). Rigid Flexibility: The Logic of Intelligence (Dordrecht: Springer)
Wang, P. (2013). Non-Axiomatic Logic: A Model of Intelligent Reasoning (Singapore: World Scientific)
Werner, S. and Noppeney, U. (2010). Superadditive responses in superior temporal sulcus predict
audiovisual benefits in object categorization. Cerebral Cortex 20, 1829–1842
Wolff, M. and Vann, S. D. (2019). The cognitive thalamus as a gateway to mental representations. J
Neurosci 39, 3–14. doi:10.1523/JNEUROSCI.0479-18
Xu, X., Hanganu-Opatz, I. L., and Bieler, M. (2020). Cross-talk of low-level sensory and high-level
cognitive processing: Development, mechanisms, and relevance for cross-modal abilities of the brain.
Frontiers in Neurorobotics 14. doi:10.3389/fnbot.2020.00007
Yeh, C.-C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, A., et al. (2016). Matrix profile i: All pairs
similarity joins for time series: A unifying view that includes motifs, discords and shapelets. 1317–1322.
Figure 1. Neurosymbolic Metamodel and Framework for Artificial General Intelligence
Figure 2. Flow of Retail Use Case for Metamodel (from Latapie et al., 2021)
Figure 3. Levels of Abstraction for Retail Use Case (from Latapie et al., 2021)
Figure 4. A Histogram of Regime Changes from Network Telemetry Data (A port shut down event started
at the 50th timestamp and ended at the 100th)