Neurosymbolic Systems of Perception
& Cognition: The Role of Attention
Hugo Latapie¹, Ozkan Kilic¹, Kristinn R. Thórisson², Pei Wang³, and Patrick Hammer⁴

¹Emerging Technologies & Incubation, Cisco Systems, San Jose, CA, USA
²Icelandic Institute for Intelligent Machines and Department of Computer Science, Reykjavik University, Reykjavik, Iceland
³Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
⁴Center for Digital Futures, KTH Royal Institute of Technology and Stockholm University, Stockholm, Sweden
Correspondence*:
Ozkan Kilic
okilic@cisco.com
ABSTRACT

A cognitive architecture aimed at cumulative learning must provide the necessary information and control structures to allow agents to learn incrementally and autonomously from their experience. This involves managing an agent's goals as well as continuously relating sensory information to these in its perception-cognition information processing stack. The more varied the environment of a learning agent is, the more general and flexible these mechanisms must be to handle a wider variety of relevant patterns, tasks, and goal structures. While many researchers agree that information at different levels of abstraction likely differs in its makeup, structure, and processing mechanisms, agreement on the particulars of such differences is not generally shared in the research community. A dual processing architecture (often referred to as System-1 and System-2) has been proposed as a model of cognitive processing, with the two systems often considered responsible for low- and high-level information, respectively. We posit that cognition is not binary in this way and that knowledge at any level of abstraction involves what we refer to as neurosymbolic information, meaning that data at both high and low levels must contain both symbolic and subsymbolic information. Further, we argue that the main differentiating factor between the processing of high and low levels of data abstraction can be largely attributed to the nature of the involved attention mechanisms. We describe the key arguments behind this view and review relevant evidence from the literature.

Keywords: Artificial Intelligence, Cognitive Architecture, Perception, Cognition, Levels of Abstraction, Neurosymbolic Models, Learning, Cumulative Learning, Systems of Thinking
1 INTRODUCTION
Cognitive architectures aim to capture the information and control structures necessary to create autonomous learning agents. The sensory modalities of artificially intelligent (AI) agents operating in physical environments must measure relevant information at relatively low levels of detail, commensurate with the agent's intended tasks. Self-supervised learning places additional requirements on the ability of an agent to dynamically and continuously relate a wide variety of sensory information to the high-level goals of tasks. The more general an agent's learning is, the larger the part of its perception-cognition "information stack" that must capture the necessary flexibility to accommodate a wide variety of patterns, plans, tasks, and goal structures. Low levels of cognition (close to the perceptual senses) seem to quickly generate and use predictions to generalize across similar problems. This is a key responsibility of a sensory system because low-latency predictions (i.e. those that the agent can act on quickly) are vital for survival in a rapidly changing world. Natural intelligence has several outstanding skills that Deep Learning does not have. Two of these, as pointed out by e.g. Bengio et al. (2021), are that (a) it does not require thousands of samples to learn, and (b) it can cope with out-of-distribution (OOD) samples. As detailed by e.g. Thórisson et al. (2019), another equally important shortcoming is that Deep Learning does not handle learning after the system leaves the laboratory – i.e. cumulative learning – in part because it does not harbour any means to verify newly acquired information autonomously. Such skills require not only perception processes that categorize the sensory data dynamically, so that the lower levels can recognize 'familiar' situations by reconfiguring known pieces and trigger higher-level cognition in the case of surprises, but also the reasoning to evaluate the new knowledge that has been thus produced. Whenever high-level cognition solves a new problem, this coordination allows the new knowledge to modify and improve the lower levels for similar future situations, which also means that both systems have access to long-term memory. Architectures addressing both sensory- and planning-levels of cognition are as yet few and far between.
While general agreement exists in the research community that information at different levels of abstraction likely differs in makeup and structure, agreement on these differences – and thus the particulars of the required architecture and processes involved – is not widely shared. It is sometimes assumed that lower levels of abstraction are subsymbolic[1] and higher levels symbolic, which has led some researchers to the idea that Deep Learning models are analogous to perceptual mechanisms while higher levels involve rule-based reasoning skills due to their symbolic nature and, according to e.g. Kahneman (2011), constitute the only system that can use language. This view has been adopted in some AI research, where 'subsymbolic' processing is classified as System-1 processing, while higher-level and 'symbolic' processing is considered as belonging to System-2 (cf. Smolensky, 1988; Sloman, 1996; Kahneman, 2011; Strack & Deutsch, 2004). According to this view, artificial neural networks, including Deep Learning, are System-1 processes; rule-based systems are System-2 processes (see Bengio et al., 2021; and Bengio, 2019 for discussion). Similarly, William James (1890) proposed that the mind has two mechanisms of thought, one which handled reasoning and another which was associative. We posit instead that cognition is not binary in this way at all, and that any level of abstraction involves processes operating on what might be called "neurosymbolic" knowledge, meaning that data at both high and low levels must accommodate both symbolic and subsymbolic information.[2] Further, we argue that a major differentiating factor between the processing of high and low levels of data abstraction can be largely attributed to the nature of the involved attention mechanisms.

[1] We classify data as 'subsymbolic' if it can only be manipulated through approximate similarity-mapping processes, i.e. cannot be grouped and addressed as a (named) set.
[2] By 'symbolic' here we mean that the information is at a level of abstraction close to human verbal description, not that it uses 'symbols' that must be interpreted or 'grounded' to become meaningful.
More than a century ago, James (1890) defined attention as "taking possession by the mind, in clear and vivid form, of one out of what may seem several simultaneously possible objects or trains of thought... It implies withdrawal from some things in order to deal effectively with others." We consider attention to consist of a (potentially large) set of processes whose role is to steer the available resources of a cognitive system, from moment to moment, including (but not limited to) its short-term focus, goal pursuit, sensory control, deliberate memorization, memory retrieval, selection of sensory data, and many other subconscious control mechanisms that we can only hypothesize at this point and thus have no names for. Low-level cognition, like perception, is characterized by relatively high-speed, distributed ("multi-threaded"), subconscious[3] attention control, while higher-level cognition seems more "single-threaded", and relatively slower. When people introspect, our conscious threads of attention seem to consist primarily of the latter, while much of our low-level perception is subconscious and under the control of autonomous attention mechanisms (see Koch and Tsuchiya, 2006; Marchetti, 2011; Sumner et al., 2006 for evidence and discussion about decoupling attention from conscious introspection). Low-level perception and cognitive operations may reflect autonomous access to long-term memory through subconscious attention mechanisms, while higher-level operation may involve the recruitment of deliberate (introspectively-accessible) cognitive control, working memory, and focused attention (Papaioannou et al., 2021).

[3] We consider 'subconscious' cognitive processes to be the set of processes that are necessary for thought and that a mind cannot make the subject of its own cognitive processing, i.e. all its processes that it does not have direct introspective access to.
Two separate issues in the System-1/System-2 discussion are often confused: (1) knowledge representation and (2) information processing. The first is the (by now familiar) 'symbolic vs. subsymbolic' distinction, while the second involves the 'automatic vs. controlled' distinction. Not only are these two distinctly different, they are also not perfectly aligned; while subsymbolic knowledge may more often be processed 'automatically', and symbolic knowledge seems generally more accessible through voluntary control and introspection, this mapping cannot be taken as given. A classic example is skill learning like riding a bike, which starts as a controlled process and gradually becomes automatic with increased training. On the whole this process is largely subsymbolic, with hardly anything but the top-level goals introspectively accessible to the learner of bicycle-riding ("I want to ride this bicycle without falling or crashing into things"). Though we acknowledge the above differences, in this article our focus is on the relations and correlations between these two distinctions.
2 RELATED WORK & ATTENTION’S ROLE IN COGNITION
The sharp distinction between two hypothesized systems that some AI researchers have interpreted dual-process theory to entail (cf. Posner, 2020) does not seem very convincing when we look at the dependencies between the necessary levels of processing. For instance, it has been demonstrated time and again (cf. Spivey et al., 2013) that expectations created verbally ('System-2 information') have a significant influence on low-level behavior like eye movements ('System-1 information'). It is not obvious why – or how – two sharply separated control systems would be the best – or even a good – way to achieve the tight coupling between levels thus demonstrated, as has been noted by other authors (cf. Houwer, 2019). Until more direct evidence is collected for the hypothesis that there really are two systems (as opposed to three, four, fifty, or indeed a continuity), it is a fairly straightforward task to fit the available evidence onto that theory (cf. Strack and Deutsch, 2004). In the context of AI, more direct evidence would include a demonstration of an implemented control scheme that produced some of the same key properties as human cognition from first principles.
We would expect high-level (abstract) and low-level (perceptual/concrete) cognition to work in coordination, not competition, after millions of years of evolution. Rather than implementing a (strict or semi-strict) pipeline structure between S1 and S2, where only data would go upstream (from S1 to S2) and only control downstream (from S2 to S1; cf. Evans and Elqayam, 2007; Evans and Stanovich, 2013; Keren, 2013; Monteiro and Norman, 2013), we hypothesize high-level and low-level cognition to be coupled through two-way control-and-data communication, as demonstrated in numerous experiments (see Xu, 2020 for a review of cross-modal processing between high- and low-level cognition). In other words, low-level cognition does not solely work under the control of the high-level one; rather, the two levels cooperate to optimize resource utilization through joint control.
Through the evolution of the human brain, some evidence seems to indicate that language-based conceptual representations replaced sensory-based compositional concepts, explaining the slower reaction times in humans than in other mammals, e.g., chimpanzees (see for instance Martin et al., 2014). However, this replacement may have pushed the boundaries of human higher-level cognition by allowing complex propositional representations and mental simulations. While animals do not demonstrate the propositional properties of human language, researchers have found some recursion in birdsong (Gentner et al., 2006) and in syntax among bonobos (Clay and Zuberbühler, 2011). Moreover, Camp (2009) found evidence that some animals think in compositional representational systems. In other words, animals seem to lack propositional thought, but they have compositional conceptual thought, which is mostly based on integrated multisensory data. Since animals appear to have symbol-like mental representations, these findings indicate that their lower levels can be neurosymbolic. Evidence for this can be found in a significant number of studies from the animal-cognition literature (for review, see Camp, 2009; Hubbard et al., 2008; Diester and Nieder, 2007; Hauser et al., 2007; Brannon, 2005).
Among the processes of key importance in skill learning, to continue with that example, is attention; a major cognitive difference between a skilled bike rider and a learner of bike-riding is what they pay attention to: the knowledgeable rider pays keen attention to the tilt angle and speed of the bicycle, responding by changing the angle of the steering wheel dynamically, in a non-linear relationship. Capable as they may already be of turning the front wheel to any desired angle, a learner is prone to fall over in large part because they don't know what to pay attention to. This is why one of the few obviously useful tips that a teacher of bicycle-riding can give a learner is to "always turn the front wheel in the direction you are falling."
Kahneman (1973) sees attention as a pool of resources which allows different processes to share cognitive capabilities, and posits a System-1 that is fast, intrinsic, autonomous, emotional, and parallel, and a System-2 that is slower, deliberate, conscious, and serial (Kahneman, 2011). For example, driving a car on an empty road (with no unexpected events), recognizing your mother's voice, and calculating 2+2 mostly involve System-1, whereas counting the number of people with eyeglasses in a meeting, recalling and dialing your significant other's phone number, calculating 13×17, and filling out a tax form depend on System-2. Kahneman's System-1 is good at making quick predictions because it constantly models similar situations based on experience. It should be noted that "experience" in this context relates to the process of learning and its transfer – i.e. generalization and adaptation – which presumably relies heavily on higher-level cognition (and should thus be part of System-2). Learning achieved in conceptual symbolic space can be projected to subsymbolic space. In other words, since symbolic and subsymbolic spaces are in constant interaction, knowledge acquired in symbolic space has correspondences in subsymbolic space. This allows System-1 to start quickly using the projections of the knowledge, even when it is based on System-2 experience.
Several fMRI studies support the idea that sensory-specific areas, such as the thalamus, may be involved in multi-sensory stimulus integrations (Miller and D'Esposito, 2005; Noesselt et al., 2007; Werner and Noppeney, 2010), which are symbolic representations in nature. Sensory-specific brain regions are considered to be networks specialized in subsymbolic data that originates from the outside world and different body parts. Thalamo-cortical oscillation is known as a synchronization mechanism, or temporal binding, between different cortical regions (Llinas, 2002). However, recent evidence shows that the thalamus, previously assumed to be responsible only for relaying sensory impulses from body receptors to the cerebral cortex, can actually integrate these low-level impulses (Tyll et al., 2011; Sampathkumar et al., 2021). In other words, there are sensory-based integrations in the thalamus, and they are essential in sustaining cortical cognitive functions.
Wolff and Vann (2019) use the term "cognitive thalamus" to describe the thalamus as a gateway to mental representations, because recent findings support the idea that thalamocortical and corticothalamic pathways may play complementary but dissociable cognitive roles (see Bolkan et al., 2017; Alcaraz et al., 2018). More specifically, the thalamocortical pathway (the fibers connecting the thalamus to cortical regions) can create and save task-related representations, not just purely sensory information, and this pathway is essential for updating cortical representations. Similarly, corticothalamic pathways seem to have two major functions: directing cognitive resources (focused attention) and contributing to learning. In a way, the thalamocortical pathway defines the world for the cortex, and the corticothalamic pathway uses attention to tell the thalamus what the cortex needs it to focus on. Furthermore, a growing body of evidence shows that the thalamus plays a role in cognitive dysfunction, such as schizophrenia (Anticevic et al., 2014), Down's syndrome (Perry et al., 2018), drug addiction (Balleine et al., 2015), and ADHD (Hua et al., 2021). These discoveries support other recent findings about the role of the thalamus in cognition via the thalamocortical loop. The thalamus, a structure proficient in actively using and integrating subsymbolic data, describes the world for the cortex by contributing to the symbolic representations in it. The cortex, on the other hand, uses attention to direct resources to refresh its symbolic representations from the subsymbolic space. In the Non-Axiomatic Reasoning System (NARS; Wang, 2006), attention has the role of allocating processing power for producing and scheduling inference steps, whereby inferences can compose new representations from existing components, seek out new ones, and update the strength of existing relationships via knowledge revision. This control also leads to a refreshing of representations in a certain sense, as the system will utilize the representations which are most reliable and switch to alternatives if some of them turn out to be unreliable.
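To make this resource-allocation view of attention concrete, the following is a minimal sketch of budget-based task selection in the spirit of NARS; it is our own illustration, not OpenNARS code, and all names and parameter values are hypothetical. Each task carries a priority that determines its share of inference cycles and a durability that governs how quickly it fades from the focus of attention:

```python
import random

class Task:
    """A reasoning task with a NARS-style budget."""
    def __init__(self, content, priority, durability):
        self.content = content        # e.g. a statement to revise or a question to answer
        self.priority = priority      # in [0, 1]: share of attention it receives now
        self.durability = durability  # in [0, 1]: how slowly its priority decays

def select_task(bag):
    """Sample a task with probability proportional to its priority."""
    r = random.uniform(0, sum(t.priority for t in bag))
    acc = 0.0
    for task in bag:
        acc += task.priority
        if r <= acc:
            return task
    return bag[-1]

def step(bag):
    """One attention cycle: process the selected task, then decay all priorities."""
    task = select_task(bag)
    # ... run one inference step on task.content here ...
    for t in bag:
        t.priority *= t.durability  # unattended tasks gradually fade from focus

bag = [Task("<robin --> bird>?", 0.9, 0.8), Task("<bird --> animal>.", 0.5, 0.9)]
for _ in range(10):
    step(bag)
```

Under such a scheme, "refreshing" falls out naturally: tasks whose conclusions keep proving reliable keep being re-selected, while unreliable ones decay out of the system's focus.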
In the Auto-catalytic Endogenous Reflective Architecture (AERA), attention is implemented as system-permeating control of computational/cognitive resources at very fine-grained levels of processing, bounded by goals at one end and the current situation at the other (cf. Nivel et al., 2015; Helgason et al., 2013). Studies on multitasking in humans have shown that a degree of parallelism among multiple tasks is more likely if the tasks involve different data modalities, such as linguistic and tactile. Low-level attention continuously monitors both the mind and the outside world and assesses situations (i.e., relates them to active goals and plans) with little or no effort, through its access to long-term memory and the sensory information. Surprises and threats are detected early in the perceptual stream, while plans and questions are handled at higher levels of abstraction, triggering higher levels of processing, which also provide top-down control of attention and reasoning.
In contrast to the so-called "attention" mechanisms in artificial neural networks, which are for the most part rather narrow interpretations of resource control in general, attention in these architectures explicitly distributes mental resources (processing power and storage, in computer systems), whereby filtering the input for useful patterns is just a special case. Another aspect is priming for related information by activating it; this is not limited to currently perceived information but can integrate long-term memory content, rather than just the content of a sliding window of recent stimuli in input space (as in Transformers).
3 A NEUROSYMBOLIC ARCHITECTURE AS SYSTEMS OF THINKING
The idea of combining symbolic and sub-symbolic approaches, also known as the neurosymbolic approach, is not new. Many researchers are working on integrated neural-symbolic systems which translate symbolic knowledge into neural networks (or the other way around), because symbols, relations, and rules should have counterparts in the sub-symbolic space. Moreover, a neurosymbolic network needs a form of symbol manipulation that also supports preservation of the structural relations between the two systems without losing the correspondences.
Currently, Deep Learning and related machine learning methods are primarily subsymbolic, while rule-based systems and related reasoning systems are usually strictly symbolic. We consider it possible to have a Deep Learning model that demonstrates symbolic cognition (without reasoning mechanisms), which entails the transformation of symbolic representations into subsymbolic ML/DL/statistical models. One of the costs associated with such a transformation, however, is an inevitable loss of the underlying causal model which may have existed in the symbolic representation (Parisi et al., 2019). Current subsymbolic representations are exclusively correlational; information based on spurious correlation is indistinguishable from other correlations, and the causal direction between correlating variables is not represented and thus not separable from either of those knowledge sets.
There is ongoing interest in bringing symbolic and abstract thinking to Deep Learning, which could enable more powerful kinds of learning. Graph neural networks with distinct nodes (Kipf et al., 2018; Van Steenkiste et al., 2018), transformers with discrete positional elements (Vaswani et al., 2017), and modular models with limited communication bandwidth (Goyal and Bengio, 2020) are examples of attempts in this direction. Liu et al. (2021) summarize the advantages of having discrete values (symbols) in a Deep Learning architecture. First, using symbols provides a language for inter-modular interaction and learning, whereby the meaning of symbols is not innate but determined by their relationships with others (as in Semiotics). Second, it allows reusing previously learned symbols in unseen or out-of-distribution situations, by reinterpreting them in a way suitable to the situation. Discretization in Deep Learning may provide systematic generalization (recombining existing concepts), but it is currently not very successful (Lake and Baroni, 2018).
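As a concrete illustration of discretization, the following is a minimal sketch, in the spirit of vector-quantization approaches such as the discrete-valued communication of Liu et al. (2021) but not their actual implementation (the codebook size, dimensions, and names are our own hypothetical choices). A continuous activation vector is snapped to the nearest entry of a codebook, yielding a discrete symbol index that downstream modules can consume:

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))  # 16 discrete "symbols", each an 8-d code vector

def quantize(h):
    """Map a continuous activation h to (symbol index, code vector)."""
    dists = np.linalg.norm(codebook - h, axis=1)  # distance to every codebook entry
    k = int(np.argmin(dists))                     # index of the nearest symbol
    return k, codebook[k]

h = rng.normal(size=8)   # e.g. the output of one neural module
k, z = quantize(h)       # k is the discrete token passed between modules
# In training, a straight-through estimator typically copies gradients through z back to h.
```

The symbol index k is meaningful only relative to the other codebook entries and to how downstream modules use it, which mirrors the relational view of symbol meaning described above.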
Current hybrid approaches attempt to combine symbolic and subsymbolic models to compensate for each other's drawbacks. However, the authors believe that there is a need for a metamodel which can accommodate hierarchical knowledge representations. Latapie et al. (2021) proposed such a model, inspired by Korzybski's (1994) idea about levels of abstraction. Their model promotes cognitive synergy and metalearning, which refer to the use of different computational techniques and AGI approaches, e.g., probabilistic programming, machine learning/Deep Learning, AERA (Thórisson, 2020; Nivel et al., 2013), and NARS[4] (Wang, 2006; Wang, 2013), to enrich its knowledge and address combinatorial explosion issues. The current paper extends the metamodel as a neurosymbolic architecture,[5] as in Figure 1.
In this metamodel, the levels of abstraction[6] are marked with L. L0 is the closest to the raw data collected from various sensors. L1 contains the links between raw data and higher-level abstractions. L2 corresponds to the highest integrated levels of abstraction, learned through statistical learning, reasoning, and other processes. The layer L2 can have an infinite number of sub-layers, since any level of abstraction in L2 can have metadata existing at an even higher level of abstraction. L* holds the high-level goals and motivations, such as self-monitoring, self-adjusting, self-repair, and the like. Similar to the previous version, the neurosymbolic metamodel is based on the assumption of insufficient knowledge and resources (Wang, 2005). The symbolic piece of the metamodel can be thought of as a knowledge graph with some additional structure that includes both a formalized means of handling anti-symmetric and symmetric relations, as well as a model of abstraction. The regions in the subsymbolic piece of the metamodel are mapped to the nodes of the symbolic system in L1. In this approach, the symbolic representations are always refreshed in a bottom-up manner.

[4] With an open-source implementation, OpenNARS, at https://github.com/opennars/opennars (last accessed on Oct 20th, 2021).
[5] This architecture has a "symbolic" aspect in the sense that there are components that can be accessed and manipulated using their identifiers. This is different from traditional Symbolic AI, where a "symbol" gets its meaning by referring to an external object or event, as stated by Newell and Simon (1976).
[6] Korzybski (1994) states that knowledge is a multiordinal, hierarchical structure with varying levels of abstraction.
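The layered structure just described can be sketched as a simple data structure. The following is our own illustrative rendering, not the published implementation, and all names are hypothetical: subsymbolic regions at L0 are mapped to symbolic nodes through L1 links, higher abstractions live in L2 and L*, and refreshing always propagates bottom-up.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A symbolic node in the metamodel's knowledge graph."""
    name: str
    level: str                                     # "L1", "L2", or "L*"
    abstracts: list = field(default_factory=list)  # anti-symmetric links to higher abstractions

class Metamodel:
    def __init__(self):
        self.l0 = {}   # region id -> raw feature vector (subsymbolic, from sensors)
        self.l1 = {}   # region id -> Node (the neurosymbolic mapping)
        self.graph = []  # L2 and L* nodes

    def refresh(self, region_id, features):
        """Bottom-up refresh: new subsymbolic data updates the mapped L1 symbol,
        which in turn marks the L2 abstractions built on it for re-evaluation."""
        self.l0[region_id] = features
        node = self.l1.setdefault(region_id, Node(f"region:{region_id}", "L1"))
        for parent in node.abstracts:  # propagate upward only, never top-down
            parent.stale = True
```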
Depending on the system's goal or subgoals, the metamodel can be readily partitioned into subgraphs using the hierarchical abstraction substructure associated with the current focus of attention. This partitioning mechanism is crucial for managing combinatorial explosion issues while enabling multiple reasoners to operate in parallel; a sketch of the idea follows below. Each partition can trigger a sub-focus of attention (sFoA), which requests subsymbolic data from System-1 or answers from System-2. The bottom-up refreshing and the neurosymbolic mapping between regions and symbols allow the metamodel to benefit from different computational techniques (e.g., probabilistic programming, Machine Learning/Deep Learning, and such) to enrich its knowledge and benefit from the 'blessing of dimensionality' (cf. Gorban, 2018), also referred to as 'cognitive synergy'.
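A minimal sketch of the partitioning idea, again our own illustration under the assumption that abstraction links form a directed graph: given a goal node, the subgraph reachable along abstraction links becomes one focus of attention, so each reasoner only ever sees a small slice of the full knowledge graph.

```python
from collections import deque

def focus_of_attention(graph, goal):
    """Collect the subgraph reachable from `goal` along abstraction links.

    `graph` maps a node name to the list of nodes it abstracts (its children)."""
    seen, queue = {goal}, deque([goal])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# Toy knowledge graph: a 'shelf-audit' goal only pulls in shelf- and product-related
# nodes, keeping reasoning proportional to the partition size rather than the graph size.
kg = {"shelf-audit": ["shelf", "product"], "shelf": ["bbox-12"],
      "product": ["bbox-7", "bbox-9"], "traffic-safety": ["vehicle"]}
print(focus_of_attention(kg, "shelf-audit"))
# -> {'shelf-audit', 'shelf', 'product', 'bbox-12', 'bbox-7', 'bbox-9'}
```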
A precursor to the metamodel as a neurosymbolic approach was first used by Hammer et al. (2019). This version was the first commercial implementation of a neurosymbolic AGI-aspiring[7] approach in the smart-city domain. Later, the use of levels of abstraction in the metamodel became mandatory due to the combinatorial explosion issue. In other words, structural knowledge representation with levels of abstraction became very important for partitioning the problem, processing subsymbolic or symbolic information for each sub-problem (focus of attention, FoA), and then combining the symbolic results in the metamodel. The metamodel with levels of abstraction was first fully realized in the retail domain (see Latapie et al., 2021 for details). The flow of the retail use case with the metamodel is shown in Figure 2, and an example of the levels of abstraction using the results of the retail use case is shown in Figure 3. Latapie et al. (2021) emphasized that no Deep Learning model was trained with product or shelf images for the retail use case; the system is solely based on representing the subsymbolic information in a world of bounding boxes with spatial semantics. The authors tested the metamodel in four different settings, with and without the FoA, and reported the results as in Table 1.

[7] Artificial general intelligence (AGI) is the research area closest to the original vision of the field of AI, namely, to create machines with intelligence on par with humans.
Table 1. Experimental results from the retail use case using the metamodel

                   without FoA (%)                  with FoA (%)
Category           precision  recall  f1-score      precision  recall  f1-score
product            80.70      29.32   52.88         96.36      99.07   97.70
shelf              8.82       18.75   12.00         82.35      87.50   88.85
other              36.61      89.66   52.00         96.00      82.76   88.89
overall accuracy   46.30 (min/max: 30.13/84.65)     94.73 (min/max: 88.10/100.00)
Another use case for the metamodel is the processing of more than 200,000 time series with a total of more than 30 million individual data points. The time series are network telemetry data. For this use case, there are only two underlying assumptions. The first assumption is that the time series, or a subset of them, are at least weakly related, such as time series from computer network devices. The second assumption is that when a number of time series simultaneously change their behaviors, it might indicate that an event-of-interest has happened. For detecting anomalies and finding regime-change locations, Matrix Profile algorithms are used (see Yeh et al., 2016; Gharghabi et al., 2017 for Matrix Profile and semantic segmentation). Similar to the retail use case, millions of sensory data points are reduced to a much smaller number of events based on the semantic segmentation points. These points are used to form a histogram of regime changes, as shown in Figure 4. The large spikes in the histogram are identified as candidate events-of-interest. The metamodel then creates a descriptive model for all time series, which allows the system to reduce millions of data points to a few thousand structured, actionable, and explainable knowledge items.
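The regime-change step can be approximated with off-the-shelf Matrix Profile tooling. The following is a minimal sketch, our own illustration using the open-source stumpy library rather than the internal implementation; the window size, number of regimes, and spike threshold are arbitrary choices for illustration. Each telemetry series is segmented with FLUSS (Gharghabi et al., 2017), and the detected regime-change locations are accumulated into the kind of histogram shown in Figure 4:

```python
import numpy as np
import stumpy

def regime_changes(ts, m=24, n_regimes=2):
    """Return candidate regime-change indices for one telemetry series.

    m is the subsequence window length; the corrected arc curve (CAC)
    dips where the series' behavior changes."""
    mp = stumpy.stump(ts, m)  # matrix profile (distances and nearest-neighbor indices)
    cac, locs = stumpy.fluss(mp[:, 1], L=m, n_regimes=n_regimes, excl_factor=1)
    return locs

# Aggregate regime changes across many weakly-related series into one histogram;
# simultaneous spikes mark candidate events-of-interest.
series = [np.random.rand(500) for _ in range(100)]  # stand-in for real telemetry
hist = np.zeros(500, dtype=int)
for ts in series:
    for loc in regime_changes(ts):
        hist[loc] += 1

events = np.where(hist > 10)[0]  # arbitrary spike threshold for this sketch
print("candidate events-of-interest at timestamps:", events)
```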
To test the metamodel with time series, we first use a subset of the Cisco Open Telemetry Data Set.[8] After being able to identify the anomalies in that data set, we create our own data sets similar to the Open Telemetry Data. For this purpose, 30 computer network events, such as memory leak, transceiver pull, port flap, port shutdown, and such, are injected into a physical computer network. The system is able to identify 100% of the events with a maximum delay of one minute. For example, Figure 4 represents the histogram of regime changes for a port shutdown event, which is injected at the 50th timestamp. Since the sampling rate is 6 seconds, one minute later (at the 60th timestamp) the system detects a spike as an event-of-interest. It can take time for a single incident to display a cascading effect on multiple devices. When the injection ends at the 100th timestamp, another spike is observed within 10 timestamps, which represents a recovery behavior for the network. It should be noted that not all events necessarily mean an error has happened. Some usual activities in the network, e.g., a routine firmware update on multiple devices, are events-of-no-interest that are also captured by the metamodel; the metamodel learns to classify such activities simply by observing the network. Although time series processing using the metamodel does not require any knowledge of computer networking, it can easily incorporate features extracted by networking-specific modules, e.g., Cisco Joy,[9] or ingest expert knowledge defined in the symbolic world, specifically at the 2nd level of abstraction. This neurosymbolic approach with the metamodel can quickly reduce the sensory data into knowledge, reason on this knowledge, and notify the network operators for remediation or trigger a self-healing protocol.

[8] https://github.com/cisco-ie/telemetry
[9] https://github.com/cisco/joy
4 DISCUSSION
The neurosymbolic approach presented here evolved from several independent research efforts by four core teams (those behind NARS, AERA, and OpenCog (Hart and Goertzel, 2008), plus efforts at Cisco) over the past 10 years, focusing on hybrid state-of-the-art AI for commercial applications. This empirically-based approach to AI took off (circa 2010) with deep-learning-based computer vision, augmented by well-known tracking algorithms (e.g. Kalman filtering and the Hungarian algorithm). The initial hybrid architecture resulted in improved object detection and tracking functionality, but the types of errors, arguably related to weak knowledge representation and a poor ability to define and learn complex behaviors, resulted in systems which did not meet our performance objectives. This initial hybrid architecture was called DFRE, the Deep Fusion Reasoning Engine, and it lacked the metamodel. In order to improve the system's ability to generalize, NARS was incorporated. The initial architecture used NARS to reason about objects and their movements in a busy city intersection with trains, buses, pedestrians, and heavy traffic. This initial attempt at a commercial neurosymbolic system dramatically improved the ability of the system to generalize and learn behaviors of interest, which in this case were all related to safety. In essence, the objective of the system was to raise alerts if any two moving objects either made contact or were predicted to make contact, as well as to learn other dangerous behaviors such as jaywalking, wrong-way driving, and such. While this system worked well as an initial prototype and is considered a success, there were early indications of potential computational scalability issues if the number of objects requiring real-time processing were to increase from the average of 100 or so by an order of magnitude, to say 1000 objects. In order to explore this problem we then focused on a retail inventory use case that required the processing of over 1000 objects. As expected, DFRE suffered from the predicted combinatorial explosion issues. In the retail use case, this problem was solved via the metamodel's abstraction hierarchy, which provides a natural knowledge partitioning mechanism. This partitioning mechanism was used to convert the exponential time complexity problem into a linear one.
While NARS enabled the system to learn by reasoning in an unsupervised manner, there was a growing need in commercial applications for a principled mechanism for unsupervised learning directly from temporal data streams such as sensor data, video data, telemetry data, etc. This is the focus of AERA as well as of the internal Cisco project Kronos, which is based on Matrix Profile (Yeh et al., 2016). While there is a large body of work on time series processing (FFT, wavelets, Matrix Profile, etc.), the problem of dealing with large-scale time series and incorporating contextual knowledge to produce descriptive and predictive models with explanatory capability seems relatively unsolved at the time of this writing. In our preliminary experimentation, both AERA and Cisco's Kronos project are demonstrating promising results. Incorporating AERA and Kronos into the hybrid architecture is expected to result in enhanced unsupervised learning and attention mechanisms operating directly on large-scale time series.
This evolved hybrid architecture (an ML/DL/NARS/Kronos metamodel) is expected to promote cognitive synergy while preserving levels of abstraction and the symmetric and anti-symmetric properties of knowledge, using a bottom-up approach to refresh System-2 symbols from System-1 data integration (see Latapie et al., 2021 for details). Moreover, System-1 provides rapid responses to the outside world and activates System-2 in case of a surprise, such as an emergency or other significant event that requires further analysis and potential action. System-2 uses conscious attention to request subsymbolic knowledge and sensory data from System-1, to be integrated into the levels of abstraction inspired by Korzybski's work. Korzybski's two major works (Korzybski, 1921; Korzybski, 1994) emphasize the importance of bottom-up knowledge. The corticothalamic and thalamocortical connections play different but complementary roles.

A balanced interplay between System-1 and System-2 is important. System-1's innate role is to ensure the many-faceted health of the organism. System-2 is ideally used to help humans better contend with surprises, threats, complex situations, and important goals, and to achieve higher levels in Maslow's hierarchy of needs. From an AI systems perspective, contemporary machine learning methods (including Deep Learning) have it the other way around: causal modeling and advanced reasoning are being solved in System-1, leveraging statistical models, which can be seen as an inversion of proper thalamocortical integration.
5 CONCLUSIONS
While not conclusive, findings about natural intelligence from psychology, neuroscience, cognitive science, and animal cognition imply that both low-level perceptual knowledge and higher-level, more abstract knowledge may be neurosymbolic. The difference between high and low levels of abstraction may be that lower levels involve a greater amount of unconscious (automatic) processing and attention, while higher levels are introspectable to a greater extent (in humans, at least) and involve conscious (i.e. steerable) attention. The neurosymbolic metamodel and framework introduced in this paper for artificial general intelligence is based on these findings, and the nature of the distinction between the two systems will be subject to further research. One may ask whether artificial intelligence needs to mimic natural intelligence as a key performance indicator. The answer is yes and no. No, because natural intelligence, a result of billions of years of evolution, is full of imperfections and mistakes. Yes, because it is the best way known to help organisms survive for countless generations.

Both natural and artificial intelligences can exhibit astounding generalizability, performance, ability to learn, and other important adaptive behaviors when attention originating in the symbolic space and attention originating in the subsymbolic space are properly handled. Allowing one system of attention to dominate, or inverting the natural order (e.g. reasoning in the subsymbolic space, or projecting symbolic-space stressors into the subsymbolic space), may lead to suboptimal results for engineered systems, individuals, and societies.
AUTHOR CONTRIBUTIONS
H. Latapie and O. Kilic conceived of the presented idea. H. Latapie designed the framework and the experiments. H. Latapie and O. Kilic implemented the framework. O. Kilic ran the tests and collected data. K. R. Thórisson, P. Wang and P. Hammer contributed to the theoretical framework. All authors contributed to the writing of this manuscript and approved the final version.
ACKNOWLEDGMENTS
The authors would like to thank Tony Lofthouse for his valuable comments.
REFERENCES
Alcaraz, F., Fresno, V., Marchand, A. R., Kremer, E. J., Coutureau, E., and Wolff, M. (2018). Thalamocortical and corticothalamic pathways differentially contribute to goal-directed behaviors in the rat. eLife 7, e32517. doi:10.7554/eLife.32517
Anticevic, A., Cole, M. W., Repovs, G., Murray, J. D., Brumbaugh, M. S., Savic, A. M. W. A., et al. (2014). Characterizing thalamo-cortical disturbances in schizophrenia and bipolar illness. Cerebral Cortex 24, 3116–3130. doi:10.1093/cercor/bht165, PMID: 23825317
Balleine, B. W. and Leung, R. W. M. B. K. (2015). Thalamocortical integration of instrumental learning and performance and their disintegration in addiction. Brain Research 1628, 104–116. doi:10.1016/j.brainres.2014.12.023
Bengio, Y. (2019). From System 1 deep learning to System 2 deep learning [conference presentation]. NeurIPS 2019 Posner Lecture.
Bengio, Y., Lecun, Y., and Hinton, G. (2021). Deep learning for AI. Communications of the ACM 64, 58–65.
Bolkan, S. S., Stujenske, J. M., Parnaudeau, S., Spellman, T. J., Rauffenbart, C., Abbas, A. I., et al. (2017). Thalamic projections sustain prefrontal activity during working memory maintenance. Nature Neuroscience 20, 987–996.
Brannon, E. M. (2005). What animals know about number. In Handbook of Mathematical Cognition, ed. J. I. D. Campbell (New York: Psychology Press), 85–108.
Camp, E. (2009). Language of baboon thought. In The Philosophy of Animal Minds, ed. R. W. Lurz (Cambridge: Cambridge University Press), 108–127.
Clay, Z. and Zuberbühler, K. (2011). Bonobos extract meaning from call sequences. PLoS ONE 6, e18786.
Diester, I. and Nieder, A. (2007). Semantic associations between signs and numerical categories in the prefrontal cortex. PLoS Biology 5.
Evans, J. S. B. and Elqayam, S. (2007). Dual-processing explains base-rate neglect, but which dual-process theory and how? Behavioral and Brain Sciences 30, 261–262. doi:10.1017/S0140525X07001720
Evans, J. S. B. T. and Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science 8, 223–241.
Gentner, T. Q., Fenn, K. M., Margoliash, D., and Nusbaum, H. C. (2006). Recursive syntactic pattern learning by songbirds. Nature 440, 1204–1207.
Gharghabi, S., Ding, Y., Yeh, C.-C. M., Kamgar, K., Ulanova, L., and Keogh, E. (2017). Matrix Profile VIII: Domain agnostic online semantic segmentation at superhuman performance levels. In 2017 IEEE International Conference on Data Mining (ICDM), 117–126. doi:10.1109/ICDM.2017.21
Gorban, A. N. and Tyukin, I. Y. (2018). Blessing of dimensionality: mathematical foundations of the statistical physics of data. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 376, 20170237.
Goyal, A. and Bengio, Y. (2020). Inductive biases for deep learning of higher-level cognition. arXiv preprint.
Hammer, P., Lofthouse, T., Fenoglio, E., and Latapie, H. (2019). A reasoning based model for anomaly detection in the smart city domain. In NARS Workshop at AGI-19, Shenzhen, China, August 6, 1–10.
Hart, D. and Goertzel, B. (2008). OpenCog: A software framework for integrative artificial general intelligence. In Proc. of AGI 2008, eds. P. Wang, B. Goertzel, and S. Franklin, 468–472.
Hauser, M. D., Dehaene, S., Dehaene-Lambertz, G., and Patalano, A. L. (2007). Spontaneous number discrimination of multi-format auditory stimuli in cotton-top tamarins (Saguinus oedipus). Cognition 86, B23–B32.
Helgason, H. P., Thórisson, K. R., Garrett, D., and Nivel, E. (2013). Towards a general attention mechanism for embedded intelligent systems. 4, 1–7.
Houwer, J. D. (2019). Moving beyond System 1 and System 2: Conditioning, implicit evaluation, and habitual responding might be mediated by relational knowledge. Experimental Psychology 66, 257–265.
Hua, M., Chen, Y., Chen, M., Huang, K., Hsu, J., Bai, Y., et al. (2021). Network-specific corticothalamic dysconnection in attention-deficit hyperactivity disorder. Journal of Developmental and Behavioral Pediatrics 42, 122–127. doi:10.1097/DBP.0000000000000875
Hubbard, E. M., Diester, I., Cantlon, J. F., Ansari, D., van Opstal, F., and Troiani, V. (2008). The evolution of numerical cognition: From number neurons to linguistic quantifiers. The Journal of Neuroscience 28, 11819–11824.
James, W. (1890). The Principles of Psychology, Vol. 2 (NY: Dover Publications).
Kahneman, D. (1973). Attention and Effort (NJ: Prentice-Hall).
Kahneman, D. (2011). Thinking, Fast and Slow (NY: Farrar, Straus and Giroux).
Keren, G. (2013). A tale of two systems: A scientific advance or a theoretical stone soup? Commentary on Evans & Stanovich. Perspectives on Psychological Science 8, 257–262.
Kipf, T., Fetaya, E., Wang, K. C., Welling, M., and Zemel, R. (2018). Neural relational inference for interacting systems. arXiv preprint.
Koch, C. and Tsuchiya, N. (2006). Attention and consciousness: two distinct brain processes. Trends in Cognitive Sciences 11, 16–22. doi:10.1016/j.tics.2006.10.012
Korzybski, A. (1921). Manhood of Humanity: The Science and Art of Human Engineering (NY: E. P. Dutton and Company).
Korzybski, A. (1994). Science and Sanity: An Introduction to Non-Aristotelian Systems, 5th edn. (NY: Institute of General Semantics).
Lake, B. and Baroni, M. (2018). Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In Proc. of the International Conference on Machine Learning (PMLR), 2873–2882.
Latapie, H., Liu, O. K. G., Kompella, R., Lawrence, A., Sun, Y., Srinivasa, J., et al. (2021). A metamodel and framework for artificial general intelligence from theory to practice. Journal of Artificial Intelligence and Consciousness 8, 205–227. doi:10.1142/S2705078521500119
Liu, D., Lamb, A., Kawaguchi, K., Goyal, A., Mozer, C. S. M. C., and Bengio, Y. (2021). Discrete-valued neural communication. arXiv preprint.
Llinas, R. R. (2002). Thalamocortical assemblies: How ion channels, single neurons and large-scale networks organize sleep oscillations. In Thalamus and Related Systems, eds. A. Destexhe and T. J. Sejnowski (Oxford: Oxford University Press), 87–88.
Marchetti, M. (2011). Against the view that consciousness and attention are fully dissociable. Frontiers in Psychology 3. doi:10.3389/fpsyg.2012.00036
Martin, C., Bhui, R., and Bossaerts, P. (2014). Chimpanzee choice rates in competitive games match equilibrium game theory predictions. Scientific Reports 4, 5182.
Miller, L. M. and D'Esposito, M. (2005). Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. Journal of Neuroscience 25, 5884–5893.
Monteiro, S. M. and Norman, G. (2013). Diagnostic reasoning: Where we've been, where we're going. Teaching and Learning in Medicine 25, S26–S32. doi:10.1080/10401334.2013.842911
Newell, A. and Simon, H. A. (1976). Computer science as empirical inquiry: symbols and search. Communications of the ACM 19, 113–126.
Nivel, E., Thórisson, K. R., Steunebrink, B., Dindo, H., Pezzulo, G., Rodriguez, M., et al. (2013). Bounded recursive self-improvement. Tech Report RUTR-SCS13006, Reykjavik University School of Computer Science.
Nivel, E., Thórisson, K. R., Steunebrink, B., and Schmidhuber, J. (2015). Anytime bounded rationality. In Proc. 8th International Conference on Artificial General Intelligence (AGI-15), 121–130.
Noesselt, T., Rieger, J. W., Schoenfeld, M. A., Kanowski, M., Hinrichs, H., and Heinze, H. J. (2007). Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience 27, 11431–11441.
Papaioannou, A. G., Kalantzi, E., Papageorgiou, C. C., and Korombili, K. (2021). Complexity analysis of the brain activity in autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD) due to cognitive loads/demands induced by Aristotle's type of syllogism/reasoning: A power spectral density and multiscale entropy (MSE) analysis. Heliyon 7, e07984.
Parisi, G. I., Kemker, R., Part, J. L., Kanan, C., and Wermter, S. (2019). Continual lifelong learning with neural networks: A review. Neural Networks 113, 54–71. doi:10.1016/j.neunet.2019.01.012
Perry, J. C., Pakkenberg, B., and Vann, S. D. (2018). Striking reduction in neurons and glial cells in anterior thalamic nuclei of older patients with Down's syndrome. bioRxiv 449678. doi:10.1101/449678
Posner, I. (2020). Robots thinking fast and slow: On dual process theory and metacognition in embodied AI. In RSS 2020 Workshop RobRetro.
Sampathkumar, V., Miller-Hansen, A., Sherman, S. M., and Kasthuri, N. (2021). Integration of signals from different cortical areas in higher order thalamic neurons. PNAS 118, e2104137118. doi:10.1073/pnas.2104137118
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin 119, 3–24.
Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences 11, 1–43.
Spivey, M. J., Tanenhaus, M. K., Eberhard, K. M., and Sedivy, J. C. (2013). Eye movements and spoken language comprehension: Effects of visual context on syntactic ambiguity resolution. 45, 447–481.
Steenkiste, S. V., Chang, M., Greff, K., and Schmidhuber, J. (2018). Relational neural expectation maximization: Unsupervised discovery of objects and their interactions. arXiv preprint.
Strack, F. and Deutsch, R. (2004). Reflective and impulsive determinants of social behavior. Personality and Social Psychology Review 8, 220–247.
Sumner, P., Tsai, P. C., Yu, K., and Nachev, P. (2006). Attentional modulation of sensorimotor processes in the absence of perceptual awareness. PNAS 103, 10520–10525.
Thórisson, K. R. (2020). Seed-programmed autonomous general learning. In Proceedings of Machine Learning Research, 32–70.
Thórisson, K. R., Bieger, J., Li, X., and Wang, P. (2019). Cumulative learning. In Proc. International Conference on Artificial General Intelligence (AGI-19), 198–209.
Tyll, S., Budinger, E., and Noesselt, T. (2011). Thalamic influences on multisensory integration. Communicative & Integrative Biology 4, 145–171.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Kaiser, A. N. G. L., et al. (2017). Attention is all you need. In Proc. Advances in Neural Information Processing Systems, 5998–6008.
Wang, P. (2005). Experience-grounded semantics: A theory for intelligent systems. Cognitive Systems Research 6, 282–302. doi:10.1016/j.cogsys.2004.08.003
Wang, P. (2006). Rigid Flexibility: The Logic of Intelligence (Dordrecht: Springer).
Wang, P. (2013). Non-Axiomatic Logic: A Model of Intelligent Reasoning (Singapore: World Scientific).
Werner, S. and Noppeney, U. (2010). Superadditive responses in superior temporal sulcus predict audiovisual benefits in object categorization. Cerebral Cortex 20, 1829–1842.
Wolff, M. and Vann, S. D. (2019). The cognitive thalamus as a gateway to mental representations. Journal of Neuroscience 39, 3–14. doi:10.1523/JNEUROSCI.0479-18
Xu, X., Hanganu-Opatz, I. L., and Bieler, M. (2020). Cross-talk of low-level sensory and high-level cognitive processing: Development, mechanisms, and relevance for cross-modal abilities of the brain. Frontiers in Neurorobotics 14. doi:10.3389/fnbot.2020.00007
Yeh, C.-C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, A., et al. (2016). Matrix Profile I: All pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets. In 2016 IEEE International Conference on Data Mining (ICDM), 1317–1322. doi:10.1109/ICDM.2016.0179
FIGURE CAPTIONS
Figure 1. Neurosymbolic Metamodel and Framework for Artificial General Intelligence

Figure 2. Flow of Retail Use Case for Metamodel (from Latapie et al., 2021)

Figure 3. Levels of Abstraction for Retail Use Case (from Latapie et al., 2021)

Figure 4. A Histogram of Regime Changes from Network Telemetry Data (a port shutdown event started at the 50th timestamp and ended at the 100th)
Preprint
Full-text available
The anterior thalamic nuclei are important for spatial and episodic memory; however, there is surprisingly little information about how these nuclei are affected in many conditions that present with memory impairments, including Down syndrome. To assess the status of the anterior thalamic nuclei in Down syndrome we quantified neurons and glial cells in the brains from four older patients with this condition. There was a striking reduction in the volume of the anterior thalamic nuclei and this appeared to reflect the loss of approximately 70% of neurons. The number of glial cells was also reduced but to a lesser degree than neurons. The anterior thalamic nuclei appear to be particularly sensitive to effects of aging in Down syndrome and the pathology in this region likely contributes to the memory impairments observed. These findings re-affirm the importance of assessing the status of the anterior thalamic nuclei in conditions where memory impairments have been principally assigned to pathology in the medial temporal lobe.
Article
Full-text available
Common-sense physical reasoning is an essential ingredient for any intelligent agent operating in the real-world. For example, it can be used to simulate the environment, or to infer the state of parts of the world that are currently unobserved. In order to match real-world conditions this causal knowledge must be learned without access to supervised data. To address this problem we present a novel method that learns to discover objects and model their physical interactions from raw visual images in a purely \emph{unsupervised} fashion. It incorporates prior knowledge about the compositional nature of human perception to factor interactions between object-pairs and learn efficiently. On videos of bouncing balls we show the superior modelling capabilities of our method compared to other unsupervised neural approaches that do not incorporate such prior knowledge. We demonstrate its ability to handle occlusion and show that it can extrapolate learned knowledge to scenes with different numbers of objects.
Article
Full-text available
Highly distributed neural circuits are thought to support adaptive decision-making in volatile and complex environments. Notably, the functional interactions between prefrontal and reciprocally connected thalamic nuclei areas may be important when choices are guided by current goal value or action-outcome contingency. We examined the functional involvement of selected thalamocortical and corticothalamic pathways connecting the dorsomedial prefrontal cortex (dmPFC) and the mediodorsal thalamus (MD) in the behaving rat. Using a chemogenetic approach to inhibit projection-defined dmPFC and MD neurons during an instrumental learning task, we show that thalamocortical and corticothalamic pathways differentially support goal attributes. Both pathways participate in adaptation to the current goal value, but only thalamocortical neurons are required to integrate current causal relationships. These data indicate that antiparallel flow of information within thalamocortical circuits may convey qualitatively distinct aspects of adaptive decision-making and highlight the importance of the direction of information flow within neural circuits.
Article
Full-text available
The concentrations of measure phenomena were discovered as the mathematical background to statistical mechanics at the end of the nineteenth/beginning of the twentieth century and have been explored in mathematics ever since. At the beginning of the twenty-first century, it became clear that the proper utilization of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality . This paper summarizes recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension, and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median-level set of energy or another Lipschitz function, etc.). The new stochastic separation theorems describe the thin structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fisher’s discriminant. All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier. The stochastic separation theorems provide us with such classifiers and determine a non-iterative (one-shot) procedure for their construction. This article is part of the theme issue ‘Hilbert’s sixth problem’.
Article
Higher order thalamic neurons receive driving inputs from cortical layer 5 and project back to the cortex, reflecting a transthalamic route for corticocortical communication. To determine whether or not individual neurons integrate signals from different cortical populations, we combined electron microscopy “connectomics” in mice with genetic labeling to disambiguate layer 5 synapses from somatosensory and motor cortices to the higher order thalamic posterior medial nucleus. A significant convergence of these inputs was found on 19 of 33 reconstructed thalamic cells, and as a population, the layer 5 synapses were larger and located more proximally on dendrites than were unlabeled synapses. Thus, many or most of these thalamic neurons do not simply relay afferent information but instead integrate signals as disparate in this case as those emanating from sensory and motor cortices. These findings add further depth and complexity to the role of the higher order thalamus in overall cortical functioning.
Article
Background: Functional connectivity (FC) is believed to be abnormal in attention-deficit hyperactivity disorder (ADHD). Most studies have focused on frontostriatal systems, and the role of the thalamic network in ADHD remains unclear. The current study used FC magnetic resonance imaging (fcMRI) to explore corticothalamic network properties and correlated network dysconnection with ADHD symptom severity. Methods: Eighteen adolescents with ADHD and 16 healthy controls aged 12 to 17 years underwent resting functional MRI scans, clinical evaluations, and 2 parent rating scales, namely the Swanson, Nolan, and Pelham IV scale and the Child Behavior Checklist. Six a priori cortical regions of interest were used to derive 6 networks: the dorsal default mode network, frontoparietal network, cingulo-opercular network (CON), primary sensorimotor network (SM1), primary auditory network, and primary visual network (V1). The corticothalamic connectivity for each network was calculated for each participant and then compared between the groups. We also compared the 2 scales with the network connectivity. Results: The corticothalamic connectivity within the CON was significantly reduced (p < 0.05) among adolescents with ADHD compared with the controls. The corticothalamic dysconnection within the CON, SM1, and V1 networks negatively correlated with ADHD symptom severity. Conclusion: This network analysis indicates that corticothalamic dysconnection in ADHD involves the CON, SM1, and V1 networks and relates to symptom severity. The findings provide evidence of dysfunctional thalamus-related networks in ADHD.