Neurosymbolic Systems of Perception & Cognition: The Role of Attention

Hugo Latapie,1 Ozkan Kilic,1,∗ Kristinn R. Thórisson,2 Pei Wang,3 and Patrick Hammer4
1Emerging Technologies & Incubation, Cisco Systems, San Jose, CA, USA
2Icelandic Institute for Intelligent Machines and Department of Computer Science,
Reykjavik University, Reykjavik, Iceland
3Department of Computer and Information Sciences, Temple University,
Philadelphia, PA, USA
4Center for Digital Futures, KTH Royal Institute of Technology and Stockholm
University, Stockholm, Sweden
Correspondence*:
Ozkan Kilic
okilic@cisco.com
ABSTRACT

A cognitive architecture aimed at cumulative learning must provide the necessary information and control structures to allow agents to learn incrementally and autonomously from their experience. This involves managing an agent's goals as well as continuously relating sensory information to these in its perception-cognition information processing stack. The more varied the environment of a learning agent is, the more general and flexible these mechanisms must be to handle a wider variety of relevant patterns, tasks, and goal structures. While many researchers agree that information at different levels of abstraction likely differs in its makeup, structure, and processing mechanisms, agreement on the particulars of such differences is not generally shared in the research community. A dual processing architecture (often referred to as System-1 and System-2) has been proposed as a model of cognitive processing, with the two systems often considered responsible for low- and high-level information, respectively. We posit that cognition is not binary in this way and that knowledge at any level of abstraction involves what we refer to as neurosymbolic information, meaning that data at both high and low levels must contain both symbolic and subsymbolic information. Further, we argue that the main differentiating factor between the processing of high and low levels of data abstraction can be largely attributed to the nature of the involved attention mechanisms. We describe the key arguments behind this view and review relevant evidence from the literature.

Keywords: Artificial Intelligence, Cognitive Architecture, Perception, Cognition, Levels of Abstraction, Neurosymbolic Models, Learning, Cumulative Learning, Systems of Thinking
1 INTRODUCTION
Cognitive architectures aim to capture the information and control structures necessary to create autonomous learning agents. The sensory modalities of artificially intelligent (AI) agents operating in physical environments must measure relevant information at relatively low levels of detail, commensurate with the agent's intended tasks. Self-supervised learning places additional requirements on the ability of an agent to dynamically and continuously relate a wide variety of sensory information to the high-level goals of tasks. The more general an agent's learning is, the larger the part of its perception-cognition "information stack" that must capture the necessary flexibility to accommodate a wide variety of patterns, plans, tasks, and goal structures. Low levels of cognition (close to the perceptual senses) seem to quickly generate and use predictions to generalize across similar problems. This is a key responsibility of a sensory system, because low-latency predictions (i.e. those that the agent can act on quickly) are vital for survival in a rapidly changing world. Natural intelligence has several outstanding skills that Deep Learning does not have. Two of these, as pointed out by e.g. Bengio et al. (2021), are that (a) it does not require thousands of samples to learn, and (b) it can cope with out-of-distribution (OOD) samples. As detailed by e.g. Thórisson et al. (2019), another equally important shortcoming is that Deep Learning does not handle learning after the system leaves the laboratory – i.e. cumulative learning – in part because it does not harbour any means to verify newly acquired information autonomously. Such skills require not only perception processes that categorize the sensory data dynamically, so that the lower levels can recognize 'familiar' situations by reconfiguring known pieces and trigger higher-level cognition in the case of surprises, but also the reasoning to evaluate the new knowledge that has been thus produced. Whenever high-level cognition solves a new problem, this coordination allows the new knowledge to modify and improve the lower levels for similar future situations, which also means that both systems have access to long-term memory. Architectures addressing both sensory- and planning-levels of cognition are as yet few and far between.
While general agreement exists in the research community that information at different levels of abstraction likely differs in makeup and structure, agreement on these differences – and thus on the particulars of the required architecture and processes involved – is not widely shared. It is sometimes assumed that lower levels of abstraction are subsymbolic¹ and higher levels symbolic, which has led some researchers to the idea that Deep Learning models are analogous to perceptual mechanisms, while higher levels involve rule-based reasoning skills due to their symbolic nature and, according to e.g. Kahneman (2011), constitute the only system that can use language. This view has been adopted in some AI research, where 'subsymbolic' processing is classified as System-1, while higher-level, 'symbolic' processing is considered to belong to System-2 (cf. Smolensky, 1988; Sloman, 1996; Kahneman, 2011; Strack and Deutsch, 2004). According to this view, artificial neural networks, including Deep Learning, are System-1 processes, while rule-based systems are System-2 processes (see Bengio et al., 2021, and Bengio, 2019 for discussion). Similarly, William James (1890) proposed that the mind has two mechanisms of thought, one that handles reasoning and another that is associative. We posit instead that cognition is not binary in this way at all, and that any level of abstraction involves processes operating on what might be called "neurosymbolic" knowledge, meaning that data at both high and low levels must accommodate both symbolic and subsymbolic information.² Further, we argue that a major differentiating factor between the processing of high and low levels of data abstraction can be largely attributed to the nature of the involved attention mechanisms.
More than a century ago, James (1890) defined attention as "taking possession by the mind, in clear and vivid form, of one out of what may seem several simultaneously possible objects or trains of thought... It implies withdrawal from some things in order to deal effectively with others." We consider attention to consist of a (potentially large) set of processes whose role is to steer the available resources of a cognitive system, from moment to moment, including (but not limited to) its short-term focus, goal pursuit, sensory control, deliberate memorization, memory retrieval, selection of sensory data, and many other subconscious control mechanisms that we can only hypothesize at this point and thus have no names for. Low-level cognition, like perception, is characterized by relatively high-speed, distributed ("multi-threaded"), subconscious³ attention control, while higher-level cognition seems more "single-threaded" and relatively slower. When people introspect, our conscious threads of attention seem to consist primarily of the latter, while much of our low-level perception is subconscious and under the control of autonomous attention mechanisms (see Koch and Tsuchiya, 2006; Marchetti, 2011; Sumner et al., 2006 for evidence and discussion about decoupling attention from conscious introspection). Low-level perception and cognitive operations may reflect autonomous access to long-term memory through subconscious attention mechanisms, while higher-level operation may involve the recruitment of deliberate (introspectively accessible) cognitive control, working memory, and focused attention (Papaioannou et al., 2021).

¹ We classify data as 'subsymbolic' if it can only be manipulated through approximate similarity-mapping processes, i.e. it cannot be grouped and addressed as a (named) set.
² By 'symbolic' here we mean that the information is at a level of abstraction close to human verbal description, not that it uses 'symbols' that must be interpreted or 'grounded' to become meaningful.
³ We consider 'subconscious' cognitive processes to be the set of processes that are necessary for thought and that a mind cannot make the subject of its own cognitive processing, i.e. all its processes that it does not have direct introspective access to.
Two separate issues in the System-1/System-2 discussion are often confused: (1) knowledge representation and (2) information processing. The first is the (by now familiar) 'symbolic vs. subsymbolic' distinction, while the second involves the 'automatic vs. controlled' distinction. Not only are these two distinctly different, they are also not perfectly aligned; while subsymbolic knowledge may more often be processed 'automatically', and symbolic knowledge seems generally more accessible through voluntary control and introspection, this mapping cannot be taken as given. A classic example is skill learning, like riding a bike, which starts as a controlled process and gradually becomes automatic with increased training. On the whole this process is largely subsymbolic, with hardly anything but the top-level goals introspectively accessible to the learner of bicycle-riding ("I want to ride this bicycle without falling or crashing into things"). Though we acknowledge the above differences, in this article our focus is on the relations and correlations between these two distinctions.
2 RELATED WORK & ATTENTION’S ROLE IN COGNITION
The sharp distinction between two hypothesized systems that some AI researchers have interpreted dual-process theory to entail (cf. Posner, 2020) does not seem very convincing when we look at the dependencies between the necessary levels of processing. For instance, it has been demonstrated time and again (cf. Spivey et al., 2013) that expectations created verbally ('System-2 information') have a significant influence on low-level behavior like eye movements ('System-1 information'). It is not obvious why – or how – two sharply separated control systems would be the best – or even a good – way to achieve the tight coupling between levels thus demonstrated, as has been noted by other authors (cf. Houwer, 2019). Until more direct evidence is collected for the hypothesis that there really are two systems (as opposed to three, four, fifty, or indeed a continuity), it is a fairly straightforward task to fit the available evidence onto that theory (cf. Strack and Deutsch, 2004). In the context of AI, more direct evidence would include a demonstration of an implemented control scheme that produced some of the same key properties as human cognition from first principles.
We would expect high-level (abstract) and low-level (perceptual/concrete) cognition to work in coordination, not competition, after millions of years of evolution. Rather than implementing a (strict or semi-strict) pipeline structure between S1 and S2, where only data would go upstream (from S1 to S2) and only control downstream (from S2 to S1; cf. Evans and Elqayam, 2007; Evans and Stanovich, 2013; Keren, 2013; Monteiro and Norman, 2013), we hypothesize high-level and low-level cognition to be coupled through two-way control-and-data communication, as demonstrated in numerous experiments (see Xu et al., 2020, a review article on cross-modal processing between high- and low-level cognition). In other words, low-level cognition does not solely work under the control of the high-level one; rather, the two levels cooperate to optimize resource utilization through joint control.
Through the evolution of the human brain, some evidence seems to indicate that language-based conceptual representations replaced sensory-based compositional concepts, explaining the slower reaction times in humans than in other mammals, e.g., chimpanzees (see for instance Martin et al., 2014). However, this replacement may have pushed the boundaries of human higher-level cognition by allowing complex propositional representations and mental simulations. While animals do not demonstrate the propositional properties of human language, researchers have found some recursion in birdsong (Gentner et al., 2006) and in syntax among bonobos (Clay and Zuberbühler, 2011). Moreover, Camp (2009) found evidence that some animals think in compositional representational systems. In other words, animals seem to lack propositional thought, but they have compositional conceptual thought, which is mostly based on integrated multisensory data. Since animals appear to have symbol-like mental representations, these findings indicate that their lower levels can be neurosymbolic. Evidence for this can be found in a significant number of studies from the animal-cognition literature (for review, see Camp, 2009; Hubbard et al., 2008; Diester and Nieder, 2007; Hauser et al., 2007; Brannon, 2005).
Among the processes of key importance in skill learning, to continue with that example, is attention; a major cognitive difference between a skilled bike rider and a learner of bike-riding is what they pay attention to: the knowledgeable rider pays keen attention to the tilt angle and speed of the bicycle, responding by changing the angle of the steering wheel dynamically, in a non-linear relationship. Capable as they may already be of turning the front wheel to any desired angle, a learner is prone to fall over in large part because they do not know what to pay attention to. This is why one of the few obviously useful tips that a teacher of bicycle-riding can give a learner is to "always turn the front wheel in the direction you are falling."
Kahneman (1973) sees attention as a pool of resources which allows different processes to share cognitive capabilities, and posits a System-1 that is fast, intrinsic, autonomous, emotional, and parallel, and a System-2 that is slower, deliberate, conscious, and serial (Kahneman, 2011). For example, driving a car on an empty road (with no unexpected events), recognizing your mother's voice, and calculating 2+2 mostly involve System-1, whereas counting the number of people with eyeglasses in a meeting, recalling and dialing your significant other's phone number, calculating 13×17, and filling out a tax form depend on System-2. Kahneman's System-1 is good at making quick predictions because it constantly models similar situations based on experience. It should be noted that "experience" in this context relates to the process of learning and its transfer – i.e. generalization and adaptation – which presumably relies heavily on higher-level cognition (and should thus be part of System-2). Learning achieved in conceptual symbolic space can be projected onto subsymbolic space. In other words, since the symbolic and subsymbolic spaces are in constant interaction, knowledge acquired in symbolic space has correspondences in subsymbolic space. This allows System-1 to quickly start using the projections of that knowledge, even when it is based on System-2 experience.
Several fMRI studies support the idea that sensory-specific areas, such as the thalamus, may be involved in multi-sensory stimulus integrations (Miller and D'Esposito, 2005; Noesselt et al., 2007; Werner and Noppeney, 2010), integrations that are symbolic representations in nature. Sensory-specific brain regions are considered to be networks specialized in subsymbolic data that originates from the outside world and from different body parts. Thalamo-cortical oscillation is known as a synchronization mechanism, or temporal binding, between different cortical regions (Llinas, 2002). However, recent evidence shows that the thalamus, previously assumed to be responsible only for relaying sensory impulses from body receptors to the cerebral cortex, can actually integrate these low-level impulses (Tyll et al., 2011; Sampathkumar et al., 2021). In other words, there are sensory-based integrations in the thalamus, and they are essential in sustaining cortical cognitive functions.
Wolff and Vann (2019) use the term "cognitive thalamus" to describe a gateway to mental representations, because recent findings support the idea that thalamocortical and corticothalamic pathways may play complementary but dissociable cognitive roles (see Bolkan et al., 2017; Alcaraz et al., 2018). More specifically, the thalamocortical pathway (the fibers connecting the thalamus to cortical regions) can create and save task-related representations, not just purely sensory information, and this pathway is essential for updating cortical representations. Similarly, corticothalamic pathways seem to have two major functions: directing cognitive resources (focused attention) and contributing to learning. In a way, the thalamocortical pathway defines the world for the cortex, and the corticothalamic pathway uses attention to tell the thalamus what the cortex needs it to focus on. Furthermore, a growing body of evidence shows that the thalamus plays a role in cognitive dysfunction, such as schizophrenia (Anticevic et al., 2014), Down's syndrome (Perry et al., 2018), drug addiction (Balleine et al., 2015), and ADHD (Hua et al., 2021). These discoveries support other recent findings about the role of the thalamus in cognition via the thalamocortical loop. The thalamus, a structure proficient in actively using and integrating subsymbolic data, describes the world for the cortex by contributing to the symbolic representations in it. The cortex, in turn, uses attention to direct resources to refresh its symbolic representations from the subsymbolic space. In the Non-Axiomatic Reasoning System (NARS; Wang, 2006), attention has the role of allocating processing power for producing and scheduling inference steps, whereby inferences can compose new representations from existing components, seek out new ones, and update the strength of existing relationships via knowledge revision. This control also leads to a refreshing of representations in a certain sense, as the system will utilize the representations that are most reliable and switch to alternatives if some of them turn out to be unreliable.
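To make this notion of attention as resource allocation concrete, the following is a minimal sketch of budget-based task selection loosely in the spirit of NARS: tasks carry a priority that biases how often they are chosen for inference, and a durability that controls how quickly that priority decays. The class and field names are our own illustration, not the OpenNARS API.

```python
# Illustrative sketch (not the actual NARS implementation) of budgeted
# attention: selection probability is proportional to priority, and
# priorities decay each cycle according to each task's durability.
import random

class Task:
    def __init__(self, content, priority, durability):
        self.content = content        # e.g. a statement to reason about
        self.priority = priority      # share of processing power, in [0, 1]
        self.durability = durability  # decay factor per cycle, in [0, 1]

def step(tasks):
    """Pick one task to process, biased by priority, then decay all."""
    total = sum(t.priority for t in tasks)
    r, acc = random.uniform(0, total), 0.0
    chosen = tasks[-1]
    for t in tasks:
        acc += t.priority
        if r <= acc:
            chosen = t
            break
    for t in tasks:
        t.priority *= t.durability    # unattended items fade over time
    return chosen

tasks = [Task("bird --> flyer?", 0.9, 0.8), Task("cat --> pet?", 0.2, 0.95)]
print(step(tasks).content)
```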
In the Auto-catalytic Endogenous Reflective Architecture (AERA), attention is implemented as system-permeating control of computational/cognitive resources at very fine-grained levels of processing, bounded by goals at one end and the current situation at the other (cf. Nivel et al., 2015; Helgason et al., 2013). Studies on multitasking in humans have shown that a degree of parallelism among multiple tasks is more likely if the tasks involve different data modalities, such as linguistic and tactile. Low-level attention continuously monitors both the mind and the outside world and assesses situations (i.e., relates them to active goals and plans) with little or no effort, through its access to long-term memory and the sensory information. Surprises and threats are detected early in the perceptual stream, while plans and questions are handled at higher levels of abstraction, triggering higher levels of processing, which also provide top-down control of attention and reasoning.
In contrast to the so-called "attention" mechanisms in artificial neural networks (which are for the most part rather narrow interpretations of resource control in general), mental resources (processing power and storage, in computer systems) are explicitly distributed, whereby filtering of input for useful input patterns is just a special case. Another aspect is priming for related information by activating it, which is not limited to currently perceived information but can integrate long-term memory content, rather than just the content of a sliding window (as in Transformers) over recent stimuli in input space.
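This contrast can be made concrete with a toy example. The sketch below (illustrative only; all names and sizes are our own choices) applies the same softmax attention first to a sliding window of recent stimuli, as in a Transformer, and then to that window augmented with items retrieved from a long-term store, standing in for the priming of related information:

```python
# Toy contrast: attention confined to a recent-stimuli window vs. the same
# query also attending over items recalled from a long-term memory store.
import numpy as np

def softmax_attend(query, keys, values):
    scores = keys @ query / np.sqrt(len(query))
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ values

rng = np.random.default_rng(0)
d = 8
stream = rng.normal(size=(100, d))       # recent stimuli in input space
long_term = rng.normal(size=(1000, d))   # long-term memory content
query = rng.normal(size=d)

window = stream[-16:]                                      # sliding window
recalled = long_term[np.argsort(long_term @ query)[-4:]]   # primed items

ctx_window = softmax_attend(query, window, window)
ctx_full = softmax_attend(query, np.vstack([window, recalled]),
                          np.vstack([window, recalled]))
```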
3 A NEUROSYMBOLIC ARCHITECTURE AS SYSTEMS OF THINKING
The idea of combining symbolic and subsymbolic approaches, also known as the neurosymbolic approach, is not new. Many researchers are working on integrated neural-symbolic systems which translate symbolic knowledge into neural networks (or the other way around), because symbols, relations, and rules should have counterparts in the subsymbolic space. Moreover, a neurosymbolic network needs symbol manipulation that preserves the structural relations between the two systems without losing the correspondences.
Currently, Deep Learning and related machine learning methods are primarily subsymbolic. Meanwhile, rule-based systems and related reasoning systems are usually strictly symbolic. We consider it possible to have a Deep Learning model that demonstrates symbolic cognition (without reasoning mechanisms), which entails the transformation of symbolic representations into subsymbolic ML/DL/statistical models. One of the costs associated with such a transformation, however, is an inevitable loss of the underlying causal model which may have existed in the symbolic representation (Parisi et al., 2019). Current subsymbolic representations are exclusively correlational; information based on spurious correlation is indistinguishable from other correlations, and the causal direction between correlating variables is not represented and thus not separable from either of those knowledge sets.
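A small numeric illustration of this point (our example, not from the cited work): when a hidden common cause drives two variables, a purely correlational representation records a strong, perfectly symmetric relationship between them, with nothing to mark it as spurious or to encode direction.

```python
# Correlation is symmetric and blind to causal structure: the stored value
# is the same whether x drives y, y drives x, or a hidden z drives both.
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=10_000)               # hidden common cause
x = z + 0.3 * rng.normal(size=10_000)
y = z + 0.3 * rng.normal(size=10_000)     # x and y never interact directly

print(np.corrcoef(x, y)[0, 1])   # high (~0.9): looks like a relationship
print(np.corrcoef(y, x)[0, 1])   # identical: no direction is represented
```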
There is ongoing interest in bringing symbolic and abstract thinking to Deep Learning, which could enable more powerful kinds of learning. Graph neural networks with distinct nodes (Kipf et al., 2018; Van Steenkiste et al., 2018), transformers with discrete positional elements (Vaswani et al., 2017), and modular models with bandwidth (Goyal and Bengio, 2020) are examples of attempts in this direction. Liu et al. (2021) summarize the advantages of having discrete values (symbols) in a Deep Learning architecture. First, using symbols allows a language for inter-modular interaction and learning, whereby the meaning of symbols is not innate but determined by their relationships with others (as in semiotics). Second, it allows reusing previously learned symbols in unseen or out-of-distribution situations, by reinterpreting them in a way suitable to the situation. Discretization in Deep Learning may provide systematic generalization (recombining existing concepts), but it is currently not very successful at it (Lake and Baroni, 2018).
Current hybrid approaches attempt to combine symbolic and subsymbolic models to compensate for each other's drawbacks. However, the authors believe that there is a need for a metamodel which accommodates hierarchical knowledge representations. Latapie et al. (2021) proposed such a model, inspired by Korzybski's (1994) idea about levels of abstraction. Their model promotes cognitive synergy and metalearning, which refer to the use of different computational techniques and AGI approaches, e.g., probabilistic programming, machine learning/Deep Learning, AERA (Thórisson, 2020; Nivel et al., 2013), and NARS⁴ (Wang, 2006; Wang, 2013), to enrich its knowledge and address combinatorial explosion issues. The current paper extends the metamodel into a neurosymbolic architecture⁵ as shown in Figure 1.
In this metamodel, the levels of abstraction⁶ are marked with L. L0 is closest to the raw data collected from various sensors. L1 contains the links between raw data and higher-level abstractions. L2 corresponds to the highest integrated levels of abstraction, learned through statistical learning, reasoning, and other processes. The layer L2 can have an infinite number of sub-layers, since any level of abstraction in L2 can have metadata existing at an even higher level of abstraction. L* holds the high-level goals and motivations, such as self-monitoring, self-adjusting, self-repair, and the like. Similar to the previous version, the neurosymbolic metamodel is based on the assumption of insufficient knowledge and resources (Wang, 2005). The symbolic piece of the metamodel can be thought of as a knowledge graph with some additional structure that includes both a formalized means of handling anti-symmetric and symmetric relations and a model of abstraction. The regions in the subsymbolic piece of the metamodel are mapped to the nodes in the symbolic system in L1. In this approach, the symbolic representations are always refreshed in a bottom-up manner.

⁴ With the open-source implementation OpenNARS at https://github.com/opennars/opennars — last accessed on Oct 20th, 2021.
⁵ This architecture has a "symbolic" aspect in the sense that there are components that can be accessed and manipulated using their identifiers. This is different from traditional Symbolic AI, where a "symbol" gets its meaning by referring to an external object or event, as stated by Newell and Simon (1976).
⁶ Korzybski (1994) states that knowledge is a multiordinal, hierarchical structure with varying levels of abstraction.
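To make the layering concrete, the following is a minimal sketch of the metamodel's basic structures under our reading of the text; it is an illustration, not the published implementation. It shows an L* goal set, a symbolic graph whose nodes carry a level of abstraction plus anti-symmetric and symmetric relations, and an L1 mapping from subsymbolic regions to symbols that is refreshed bottom-up:

```python
# A minimal, hypothetical sketch of the metamodel layering: L0 holds
# near-raw sensor data, L1 binds subsymbolic "regions" to symbols, L2
# holds arbitrarily nested abstractions, and L* holds high-level goals.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    level: float            # 0, 1, 2, 2.1, ...; floats allow L2 sub-layers
    antisymmetric: dict = field(default_factory=dict)  # e.g. "abstracts"
    symmetric: set = field(default_factory=set)        # e.g. "adjacent-to"

class Metamodel:
    def __init__(self, goals):
        self.goals = goals               # L*: self-monitoring, repair, ...
        self.nodes = {}                  # the symbolic knowledge graph
        self.region_to_symbol = {}       # L1 mapping, refreshed bottom-up

    def refresh(self, region_id, symbol_name, level=1):
        """Bottom-up refresh: (re)bind a subsymbolic region to a symbol."""
        node = self.nodes.setdefault(symbol_name, Node(symbol_name, level))
        self.region_to_symbol[region_id] = node
```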
Depending on the system's goal or subgoals, the metamodel can be readily partitioned into subgraphs using the hierarchical abstraction substructure associated with the current focus of attention. This partitioning mechanism is crucial for managing combinatorial explosion issues while enabling multiple reasoners to operate in parallel. Each partition can trigger a sub-focus of attention (sFoA), which requests subsymbolic data from System-1 or answers from System-2. The bottom-up refreshing and the neurosymbolic mapping between regions and symbols allow the metamodel to benefit from different computational techniques (e.g., probabilistic programming, Machine Learning/Deep Learning, and such) to enrich its knowledge and benefit from the 'blessing of dimensionality' (cf. Gorban and Tyukin, 2018), also referred to as 'cognitive synergy.'
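As a rough illustration of such partitioning (a sketch under our own simplifying assumptions, with a toy knowledge graph), one can collect the subgraph reachable from the symbol in focus through a bounded number of abstraction links, so that each sFoA hands a small, bounded problem to a reasoner rather than the whole graph:

```python
# Illustrative attention-driven partitioning: breadth-limited traversal of
# abstraction links from the symbol currently in focus.
def partition(graph, focus, max_depth=2):
    """graph: dict mapping a symbol to the symbols it is linked to."""
    seen, frontier = {focus}, {focus}
    for _ in range(max_depth):
        frontier = {n for s in frontier for n in graph.get(s, ())} - seen
        seen |= frontier
    return seen  # one partition; other symbols stay outside this sFoA

kg = {"shelf": {"product", "price-tag"}, "product": {"bounding-box"},
      "vehicle": {"bus", "car"}}
print(partition(kg, "shelf"))  # {'shelf', 'product', 'price-tag', 'bounding-box'}
```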
A precursor to the metamodel as a neurosymbolic approach was first used by Hammer et al. (2019). This version was the first commercial implementation of a neurosymbolic AGI-aspiring⁷ approach in the smart city domain. Later, the use of levels of abstraction in the metamodel became mandatory due to combinatorial explosion issues. In other words, structural knowledge representation with levels of abstraction became very important for partitioning the problem, processing subsymbolic or symbolic information for each subproblem (focus of attention, FoA), and then combining the symbolic results in the metamodel. The metamodel with levels of abstraction was fully realized in the retail domain (see Latapie et al., 2021 for details). The flow of the retail use case with the metamodel is shown in Figure 2, and an example of the levels of abstraction using the results of the retail use case is shown in Figure 3. Latapie et al. (2021) emphasized that no Deep Learning model was trained with product or shelf images for the retail use case. The system used for the retail use case is based solely on representing the subsymbolic information in a world of bounding boxes with spatial semantics. The authors tested the metamodel in four different settings, with and without the FoA, and report the results in Table 1.

⁷ Artificial general intelligence (AGI) is the research area closest to the original vision of the field of AI, namely, to create machines with intelligence on par with humans.
Table 1. Experimental results from the retail use case using the metamodel

                       without FoA (%)                    with FoA (%)
Category         precision  recall  f1-score        precision  recall  f1-score
product             80.70    29.32     52.88           96.36    99.07     97.70
shelf                8.82    18.75     12.00           82.35    87.50     88.85
other               36.61    89.66     52.00           96.00    82.76     88.89
overall accuracy    46.30 (min/max: 30.13/84.65)       94.73 (min/max: 88.10/100.00)
Another use case for the metamodel is the processing of more than 200,000 time series with a total of more than 30 million individual data points. The time series are network telemetry data. For this use case, there are only two underlying assumptions. The first is that the time series, or a subset of them, are at least weakly related, as is the case for time series from computer network devices. The second is that when a number of time series simultaneously change their behavior, it might indicate that an event-of-interest has happened. For detecting anomalies and finding regime-change locations, Matrix Profile algorithms are used (see Yeh et al., 2016; Gharghabi et al., 2017 for the Matrix Profile and semantic segmentation). Similar to the retail use case, millions of sensory data points are reduced to a much smaller number of events based on the semantic segmentation points. These points are used to form a histogram of regime changes, as shown in Figure 4. The large spikes in the histogram are identified as candidate events-of-interest. The metamodel then creates a descriptive model for all time series, which allows the system to reduce millions of data points to a few thousand structural, actionable, and explainable pieces of knowledge.
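For the per-series segmentation step, the following sketch uses stumpy, an open-source Matrix Profile library; the choice of library, window length, and regime count are our assumptions for illustration (the paper cites the Matrix Profile and FLUSS algorithms, not a specific implementation):

```python
# Sketch: compute the matrix profile for one telemetry series, then locate
# semantic-segmentation (regime change) points with FLUSS.
import numpy as np
import stumpy

def regime_changes(series, window=50, n_regimes=2):
    mp = stumpy.stump(series, m=window)           # matrix profile + index
    cac, locations = stumpy.fluss(mp[:, 1], L=window, n_regimes=n_regimes)
    return locations                              # candidate change points

ts = np.random.default_rng(2).normal(size=1000)
ts[500:] += 5.0                                   # synthetic regime change
print(regime_changes(ts))
```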
To test the metamodel with time series, we first use a subset of the Cisco Open Telemetry Data Set.⁸ After being able to identify the anomalies in that data set, we create our own data sets similar to the Open Telemetry Data. For this purpose, 30 computer network events, such as memory leak, transceiver pull, port flap, port shutdown, and the like, are injected into a physical computer network. The system is able to identify 100% of the events with a maximum delay of 1 minute. For example, Figure 4 shows the histogram of regime changes for a port shutdown event injected at the 50th timestamp. Since the sampling rate is 6 seconds, one minute later (at the 60th timestamp) the system detects a spike as an event-of-interest. It can take time for a single incident to display a cascading effect on multiple devices. When the injection ends at the 100th timestamp, another spike is observed within 10 timestamps, representing a recovery behavior of the network. It should be noted that not all events necessarily mean an error has happened. Some usual activities in the network, e.g., a routine firmware update on multiple devices, are also captured by the metamodel as events-of-no-interest. The metamodel learns to classify such activities simply by observing the network. Although time series processing using the metamodel does not require any knowledge of computer networking, it can easily incorporate features extracted by networking-specific modules, e.g., Cisco Joy,⁹ or ingest expert knowledge defined in the symbolic world, specifically at the 2nd level of abstraction. This neurosymbolic approach with the metamodel can quickly reduce the sensory data into knowledge, reason on this knowledge, and notify network operators for remediation or trigger a self-healing protocol.

⁸ https://github.com/cisco-ie/telemetry
⁹ https://github.com/cisco/joy
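The aggregation step can be sketched as follows (illustrative only; the z-score threshold is our choice, not a value reported above): regime-change locations from all series are binned into a shared timeline, and timestamps where unusually many series change at once become candidate events-of-interest.

```python
# Bin every regime-change location into a shared timeline, then flag
# timestamps where many series change simultaneously as candidate
# events-of-interest.
import numpy as np

def events_of_interest(change_points_per_series, n_timestamps, z_thresh=3.0):
    hist = np.zeros(n_timestamps)
    for points in change_points_per_series:   # one list per time series
        for t in points:
            hist[t] += 1
    z = (hist - hist.mean()) / (hist.std() + 1e-9)
    return np.flatnonzero(z > z_thresh)       # spike timestamps

# E.g., with a 6-second sampling rate, a spike at timestamp 60 corresponds
# to one minute after an event injected at timestamp 50.
```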
4 DISCUSSION
The neurosymbolic approach presented here evolved from several independent research efforts by four core teams (NARS, AERA, OpenCog (Hart and Goertzel, 2008)), as well as efforts at Cisco over the past 10 years focusing on hybrid state-of-the-art AI for commercial applications. This empirically-based approach to AI took off (circa 2010) with deep-learning-based computer vision, augmented by well-known tracking algorithms (e.g. Kalman filtering, the Hungarian algorithm). The initial hybrid architecture resulted in improved object detection and tracking functionality, but the types of errors, arguably related to weak knowledge representation and a poor ability to define and learn complex behaviors, resulted in systems which did not meet our performance objectives. This initial hybrid architecture was called DFRE, the Deep Fusion Reasoning Engine, and it lacked the metamodel. In order to improve the system's ability to generalize, NARS was incorporated. The initial architecture used NARS to reason about objects and their movements in a busy city intersection with trains, buses, pedestrians, and heavy traffic. This initial attempt at a commercial neurosymbolic system dramatically improved the ability of the system to generalize and learn behaviors of interest, which in this case were all related to safety. In essence, the objective of the system was to raise alerts if any two moving objects either made contact or were predicted to make contact, as well as to learn other dangerous behaviors such as jaywalking, wrong-way driving, and the like. While this system worked well as an initial prototype and is considered a success, there were early indications of potential computational scalability issues if the number of objects requiring real-time processing were to increase from the average of around 100 to an order of magnitude more, say 1000. In order to explore this problem, we then focused on a retail inventory use case that required the processing of over 1000 objects. As expected, DFRE suffered from the predicted combinatorial explosion issues. In the retail use case, this problem was solved via the metamodel's abstraction hierarchy, which provides a natural knowledge partitioning mechanism. This partitioning mechanism was used to address the exponential time complexity problem and convert it to a linear time complexity problem.
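A back-of-envelope calculation (ours, simplified to pairwise relations rather than the full combinatorial case) shows why bounded partitions help: with all objects in one reasoning scope, the number of candidate relations grows quadratically with the object count, whereas attention-bounded partitions of fixed size keep the total linear in it.

```python
# Candidate pairwise relations: one global scope vs. fixed-size partitions.
def pairs(n):
    return n * (n - 1) // 2

n, s = 1000, 20               # 1000 retail objects, partitions of size 20
print(pairs(n))               # 499500 candidate relations in one scope
print((n // s) * pairs(s))    # 9500 across 50 partitions, linear in n
```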
While NARS enabled the system to learn by reasoning in an unsupervised manner, there was a growing need in commercial applications for a principled mechanism for unsupervised learning directly from temporal data streams such as sensor data, video data, telemetry data, etc. This is the focus of AERA, as well as of the internal Cisco project Kronos, which is based on the Matrix Profile (Yeh et al., 2016). While there is a large body of work on time series processing (FFT, wavelets, Matrix Profile, etc.), the problem of dealing with large-scale time series and incorporating contextual knowledge to produce descriptive and predictive models with explanatory capability seems relatively unsolved at the time of this writing. In our preliminary experimentation, both AERA and Cisco's Kronos project are demonstrating promising results. Incorporating AERA and Kronos into the hybrid architecture is expected to result in enhanced unsupervised learning and attention mechanisms operating directly on large-scale time series.
This evolved hybrid architecture (the ML/DL/NARS/Kronos metamodel) is expected to promote cognitive synergy while preserving levels of abstraction and the symmetric and anti-symmetric properties of knowledge, using a bottom-up approach to refresh System-2 symbols from System-1 data integration (see Latapie et al., 2021 for details). Moreover, System-1 provides rapid responses to the outside world and activates System-2 in case of a surprise, such as an emergency or other significant event that requires further analysis and potential action. System-2 uses conscious attention to request subsymbolic knowledge and sensory data from System-1, to be integrated into the levels of abstraction inspired by Korzybski's work. Korzybski's two major works (Korzybski, 1921; Korzybski, 1994) emphasize the importance of bottom-up knowledge. The corticothalamic and thalamocortical connections play different but complementary roles.
A balanced interplay between System-1 and System-2 is important. System-1's innate role is to ensure the many-faceted health of the organism. System-2 is ideally used to help humans better contend with surprises, threats, complex situations, and important goals, and to achieve higher levels in Maslow's hierarchy of needs. From an AI systems perspective, contemporary Machine Learning methods (including Deep Learning) have it the other way around: causal modeling and advanced reasoning are being solved in System-1, leveraging statistical models, which can be seen as an inversion of proper thalamocortical integration.
5 CONCLUSIONS
While not conclusive, findings about natural intelligence from psychology, neuroscience, cognitive science, and animal cognition imply that both low-level perceptual knowledge and higher-level, more abstract knowledge may be neurosymbolic. The difference between high and low levels of abstraction may be that lower levels involve a greater amount of unconscious (automatic) processing and attention, while higher levels are introspectable to a greater extent (in humans, at least) and involve conscious (i.e. steerable) attention. The neurosymbolic metamodel and framework introduced in this paper for artificial general intelligence is based on these findings, and the nature of the distinction between the two systems will be subject to further research. One may ask whether artificial intelligence needs to mimic natural intelligence as a key performance indicator. The answer is yes and no. No, because natural intelligence, a result of billions of years of evolution, is full of imperfections and mistakes. Yes, because it is the best way known to help organisms survive for countless generations.
Both natural and artificial intelligences can exhibit astounding generalizability, performance, ability to learn, and other important adaptive behaviors when attention originating in the symbolic space and attention originating in the subsymbolic space are properly handled. Allowing one system of attention to dominate, or inverting the natural order (e.g. reasoning in the subsymbolic space or projecting symbolic-space stressors into the subsymbolic space), may lead to suboptimal results for engineered systems, individuals, and societies.
AUTHOR CONTRIBUTIONS
H. Latapie and O. Kilic conceived of the presented idea. H. Latapie designed the framework and the experiments. H. Latapie and O. Kilic implemented the framework. O. Kilic ran the tests and collected data. K. R. Thórisson, P. Wang and P. Hammer contributed to the theoretical framework. All authors contributed to the writing of this manuscript and approved the final version.
ACKNOWLEDGMENTS
The authors would like to thank Tony Lofthouse for his valuable comments.
REFERENCES
Alcaraz, F., Fresno, V., Marchand, A. R., Kremer, E. J., Coutureau, E., and Wolff, M. (2018). Thalamocortical and corticothalamic pathways differentially contribute to goal-directed behaviors in the rat. eLife 7, e32517. doi:10.7554/eLife.32517

Anticevic, A., Cole, M. W., Repovs, G., Murray, J. D., Brumbaugh, M. S., Savic, A., et al. (2014). Characterizing thalamo-cortical disturbances in schizophrenia and bipolar illness. Cerebral Cortex 24, 3116–3130. doi:10.1093/cercor/bht165; PMID: 23825317

Balleine, B. W., Morris, R. W., and Leung, B. K. (2015). Thalamocortical integration of instrumental learning and performance and their disintegration in addiction. Brain Research 1628, 104–116. doi:10.1016/j.brainres.2014.12.023

Bengio, Y. (2019). From System 1 deep learning to System 2 deep learning [conference presentation]. NeurIPS 2019 Posner Lecture.

Bengio, Y., LeCun, Y., and Hinton, G. (2021). Deep learning for AI. Communications of the ACM 64, 58–65.

Bolkan, S. S., Stujenske, J. M., Parnaudeau, S., Spellman, T. J., Rauffenbart, C., Abbas, A. I., et al. (2017). Thalamic projections sustain prefrontal activity during working memory maintenance. Nature Neuroscience 20, 987–996.

Brannon, E. M. (2005). What animals know about number. In Handbook of Mathematical Cognition, ed. J. I. D. Campbell (New York: Psychology Press), 85–108.

Camp, E. (2009). Language of baboon thought. In The Philosophy of Animal Minds, ed. R. W. Lurz (Cambridge: Cambridge University Press), 108–127.

Clay, Z. and Zuberbühler, K. (2011). Bonobos extract meaning from call sequences. PLoS ONE 6, e18786.

Diester, I. and Nieder, A. (2007). Semantic associations between signs and numerical categories in the prefrontal cortex. PLoS Biology 5.

Evans, J. S. B. and Elqayam, S. (2007). Dual-processing explains base-rate neglect, but which dual-process theory and how? Behavioral and Brain Sciences 30, 261–262. doi:10.1017/S0140525X07001720

Evans, J. S. B. T. and Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science 8, 223–241.

Gentner, T. Q., Fenn, K. M., Margoliash, D., and Nusbaum, H. C. (2006). Recursive syntactic pattern learning by songbirds. Nature 440, 1204–1207.

Gharghabi, S., Ding, Y., Yeh, C.-C. M., Kamgar, K., Ulanova, L., and Keogh, E. (2017). Matrix Profile VIII: Domain agnostic online semantic segmentation at superhuman performance levels. In 2017 IEEE International Conference on Data Mining (ICDM), 117–126. doi:10.1109/ICDM.2017.21

Gorban, A. N. and Tyukin, I. Y. (2018). Blessing of dimensionality: mathematical foundations of the statistical physics of data. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 376, 20170237.
Goyal, A. and Bengio, Y. (2020). Inductive biases for deep learning of higher-level cognition. arXiv preprint.

Hammer, P., Lofthouse, T., Fenoglio, E., and Latapie, H. (2019). A reasoning based model for anomaly detection in the smart city domain. In NARS Workshop at AGI-19, Shenzhen, China, August 6, 1–10.

Hart, D. and Goertzel, B. (2008). OpenCog: A software framework for integrative artificial general intelligence. In Proc. of AGI 2008, eds. P. Wang, B. Goertzel, and S. Franklin, 468–472.

Hauser, M. D., Dehaene, S., Dehaene-Lambertz, G., and Patalano, A. L. (2007). Spontaneous number discrimination of multi-format auditory stimuli in cotton-top tamarins (Saguinus oedipus). Cognition 86, B23–B32.

Helgason, H. P., Thórisson, K. R., Garrett, D., and Nivel, E. (2013). Towards a general attention mechanism for embedded intelligent systems. 4, 1–7.

Houwer, J. D. (2019). Moving beyond System 1 and System 2: Conditioning, implicit evaluation, and habitual responding might be mediated by relational knowledge. Experimental Psychology 66, 257–265.

Hua, M., Chen, Y., Chen, M., Huang, K., Hsu, J., Bai, Y., et al. (2021). Network-specific corticothalamic dysconnection in attention-deficit hyperactivity disorder. Journal of Developmental and Behavioral Pediatrics 42, 122–127. doi:10.1097/DBP.0000000000000875

Hubbard, E. M., Diester, I., Cantlon, J. F., Ansari, D., van Opstal, F., and Troiani, V. (2008). The evolution of numerical cognition: From number neurons to linguistic quantifiers. The Journal of Neuroscience 28, 11819–11824.

James, W. (1890). The Principles of Psychology, Vol. 2 (NY: Dover Publications).

Kahneman, D. (1973). Attention and Effort (NJ: Prentice-Hall).

Kahneman, D. (2011). Thinking, Fast and Slow (NY: Farrar, Straus and Giroux).

Keren, G. (2013). A tale of two systems: A scientific advance or a theoretical stone soup? Commentary on Evans & Stanovich. Perspectives on Psychological Science 8, 257–262.

Kipf, T., Fetaya, E., Wang, K. C., Welling, M., and Zemel, R. (2018). Neural relational inference for interacting systems. arXiv preprint.

Koch, C. and Tsuchiya, N. (2006). Attention and consciousness: two distinct brain processes. Trends in Cognitive Sciences 11, 16–22. doi:10.1016/j.tics.2006.10.012

Korzybski, A. (1921). Manhood of Humanity: The Science and Art of Human Engineering (NY: E. P. Dutton and Company).

Korzybski, A. (1994). Science and Sanity: An Introduction to Non-Aristotelian Systems, 5th edn. (NY: Institute of General Semantics).

Lake, B. and Baroni, M. (2018). Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In Proc. of the International Conference on Machine Learning (PMLR), 2873–2882.

Latapie, H., Kilic, O., Liu, G., Kompella, R., Lawrence, A., Sun, Y., Srinivasa, J., et al. (2021). A metamodel and framework for artificial general intelligence from theory to practice. Journal of Artificial Intelligence and Consciousness 8, 205–227. doi:10.1142/S2705078521500119
Liu, D., Lamb, A., Kawaguchi, K., Goyal, A., Sun, C., Mozer, M. C., and Bengio, Y. (2021). Discrete-valued neural communication. arXiv preprint.

Llinas, R. R. (2002). Thalamocortical assemblies: How ion channels, single neurons and large-scale networks organize sleep oscillations. In Thalamus and Related Systems, eds. A. Destexhe and T. J. Sejnowski (Oxford: Oxford University Press), 87–88.

Marchetti, M. (2011). Against the view that consciousness and attention are fully dissociable. Frontiers in Psychology 3. doi:10.3389/fpsyg.2012.00036

Martin, C., Bhui, R., and Bossaerts, P. (2014). Chimpanzee choice rates in competitive games match equilibrium game theory predictions. Scientific Reports 4, 5182.

Miller, L. M. and D'Esposito, M. (2005). Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. Journal of Neuroscience 25, 5884–5893.

Monteiro, S. M. and Norman, G. (2013). Diagnostic reasoning: Where we've been, where we're going. Teaching and Learning in Medicine 25, S26–S32. doi:10.1080/10401334.2013.842911

Newell, A. and Simon, H. A. (1976). Computer science as empirical inquiry: symbols and search. Communications of the ACM 19, 113–126.

Nivel, E., Thórisson, K. R., Steunebrink, B., Dindo, H., Pezzulo, G., Rodriguez, M., et al. (2013). Bounded recursive self-improvement. Tech report RUTR-SCS13006, Reykjavik University – School of Computer Science.

Nivel, E., Thórisson, K. R., Steunebrink, B., and Schmidhuber, J. (2015). Anytime bounded rationality. In Proc. 8th International Conference on Artificial General Intelligence (AGI-15), 121–130.

Noesselt, T., Rieger, J. W., Schoenfeld, M. A., Kanowski, M., Hinrichs, H., and Heinze, H. J. (2007). Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience 27, 11431–11441.

Papaioannou, A. G., Kalantzi, E., Papageorgiou, C. C., and Korombili, K. (2021). Complexity analysis of the brain activity in autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD) due to cognitive loads/demands induced by Aristotle's type of syllogism/reasoning: A power spectral density and multiscale entropy (MSE) analysis. Heliyon 7, e07984.

Parisi, G. I., Kemker, R., Part, J. L., Kanan, C., and Wermter, S. (2019). Continual lifelong learning with neural networks: A review. Neural Networks 113, 54–71. doi:10.1016/j.neunet.2019.01.012

Perry, J. C., Pakkenberg, B., and Vann, S. D. (2018). Striking reduction in neurons and glial cells in anterior thalamic nuclei of older patients with Down's syndrome. bioRxiv 449678. doi:10.1101/449678

Posner, I. (2020). Robots thinking fast and slow: On dual process theory and metacognition in embodied AI. In RSS 2020 Workshop RobRetro.

Sampathkumar, V., Miller-Hansen, A., Sherman, S. M., and Kasthuri, N. (2021). Integration of signals from different cortical areas in higher order thalamic neurons. PNAS 118, e2104137118. doi:10.1073/pnas.2104137118
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin 119, 3–24.

Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences 11, 1–43.

Spivey, M. J., Tanenhaus, M. K., Eberhard, K. M., and Sedivy, J. C. (2013). Eye movements and spoken language comprehension: Effects of visual context on syntactic ambiguity resolution. 45, 447–481.

Steenkiste, S. V., Chang, M., Greff, K., and Schmidhuber, J. (2018). Relational neural expectation maximization: Unsupervised discovery of objects and their interactions. arXiv preprint.

Strack, F. and Deutsch, R. (2004). Reflective and impulsive determinants of social behavior. Personality and Social Psychology Review 8, 220–247.

Sumner, P., Tsai, P. C., Yu, K., and Nachev, P. (2006). Attentional modulation of sensorimotor processes in the absence of perceptual awareness. PNAS 103, 10520–10525.

Thórisson, K. R. (2020). Seed-programmed autonomous general learning. In Proceedings of Machine Learning Research, 32–70.

Thórisson, K. R., Bieger, J., Li, X., and Wang, P. (2019). Cumulative learning. In Proc. International Conference on Artificial General Intelligence (AGI-19), 198–209.

Tyll, S., Budinger, E., and Noesselt, T. (2011). Thalamic influences on multisensory integration. Communicative & Integrative Biology 4, 145–171.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., et al. (2017). Attention is all you need. In Proc. Advances in Neural Information Processing Systems, 5998–6008.

Wang, P. (2005). Experience-grounded semantics: A theory for intelligent systems. Cognitive Systems Research 6, 282–302. doi:10.1016/j.cogsys.2004.08.003

Wang, P. (2006). Rigid Flexibility: The Logic of Intelligence (Dordrecht: Springer).

Wang, P. (2013). Non-Axiomatic Logic: A Model of Intelligent Reasoning (Singapore: World Scientific).

Werner, S. and Noppeney, U. (2010). Superadditive responses in superior temporal sulcus predict audiovisual benefits in object categorization. Cerebral Cortex 20, 1829–1842.

Wolff, M. and Vann, S. D. (2019). The cognitive thalamus as a gateway to mental representations. Journal of Neuroscience 39, 3–14. doi:10.1523/JNEUROSCI.0479-18

Xu, X., Hanganu-Opatz, I. L., and Bieler, M. (2020). Cross-talk of low-level sensory and high-level cognitive processing: Development, mechanisms, and relevance for cross-modal abilities of the brain. Frontiers in Neurorobotics 14. doi:10.3389/fnbot.2020.00007

Yeh, C.-C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, A., et al. (2016). Matrix Profile I: All pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets. In 2016 IEEE International Conference on Data Mining (ICDM), 1317–1322. doi:10.1109/ICDM.2016.0179
FIGURE CAPTIONS
Figure 1. Neurosymbolic Metamodel and Framework for Artificial General Intelligence

Figure 2. Flow of the Retail Use Case for the Metamodel (from Latapie et al., 2021)

Figure 4. A Histogram of Regime Changes from Network Telemetry Data (a port shutdown event started at the 50th timestamp and ended at the 100th)