
Cognitive Chunks, Neural Engrams and Natural Concepts: Bridging the Gap between Connectionism and Symbolism

XXX-X-XXXX-XXXX-X/XX/$XX.00 ©2024 IEEE
Dmitry Bennett
Centre for Philosophy of Natural and
Social Science
London School of Economics
London, UK
D.Bennett5@lse.ac.uk
Fernand Gobet
Centre for Philosophy of Natural and
Social Science
London School of Economics
London, UK
F.Gobet@lse.ac.uk
Abstract: Chunking theory is among the most established
theories in cognitive psychology. However, little work has been
done to connect the key ideas of chunks and chunking to the
neural substrate. The current study addresses this issue by
investigating the convergence of a cognitive CHREST model
(the computational embodiment of chunking theory) and its
neuroscience-based counterpart (based on deep learning). Both
models were trained from raw data to categorise novel stimuli
in the real-life domains of literature and music. Despite having
vastly different mechanisms and structures, both models largely
converged in their predictions of classical writers and
composers, in both qualitative and quantitative terms.
Moreover, the use of the same chunk/engram activation
mechanism for CHREST and deep learning models
demonstrated functional equivalence between cognitive chunks
and neural engrams. The study addresses a historical feud
between symbolic/serial and subsymbolic/parallel processing
approaches to modelling cognition. The findings also further
bridge the gap between cognition and its neural substrate,
connect the mechanisms proposed by chunking theory to the
neural network modelling approach, and make further inroads
towards integrating concept formation theories into a Unified
Theory of Cognition (Newell, 1990).
Keywords: chunking, symbolic, deep learning, subsymbolic,
CHREST
I. INTRODUCTION
The brain is often said to be the most complex object in the
known universe. There are multiple levels of investigating
brain functions: starting from the subatomic relations between
particles, to the molecular level, to whole neurons that
generate action potentials, to networks of networks that link
various brain regions, to cognition and behaviour that emerge
from all of these multilevel relations (Purves et al., 2013).
How do we approach this level of complexity?
The approach of cognitive psychology was to unravel the
mechanisms that underlie cognition starting with the top levels
(cognition and behaviour). For example, experiments on
human processing have shed light on the cognitive functions
of human attention, long-term and short-term memory
modules (LTM and STM, respectively), and their
interconnectedness with the perceptual apparatus.
Neuroscience, in turn, focused on the investigation of the
lower-level neural functions (e.g., synapses, neuronal
structure, firing rates, and refractory periods) and their
relationship to higher cognition. Understandably, both levels
of explanation rely on different sets of assumptions and
proposed mechanisms.
While many of the findings remain in the shape of verbal
theories, a sizable part has been captured via computational
formalisms (Anderson & Lebiere, 1998; Laird, Lebiere, &
Rosenbloom, 2017; Newell, 1990; Ritter, Tehranchi, & Oury,
2019). Computational formal models form a solution to the
problem of “magic parameters” associated with purely verbal
theories and integrate proposed mechanisms into a more
unified whole (Byrne, 2012; Lane & Gobet, 2012a; Newell,
1990).
The currently ongoing AI revolution is powered by formal
models of artificial neural networks (ANNs, historically
known under umbrella terms of connectionism and, more
recently, deep learning) (Hambling, 2020; Jo, Nho, & Saykin,
2019; Mnih et al., 2013; Silver et al., 2016; Silver et al., 2017;
Vaswani et al., 2017). Deep learning is also commonly used
in psychological models (e.g., Battleday, Peterson, &
Griffiths, 2020; Hoffman, McClelland, & Lambon Ralph,
2018; Sanders & Nosofsky, 2020). Indeed, its set of
fundamental mechanisms was largely developed and refined
through research in neuroscience and psychology (Hahnloser
et al., 2000; Hinton & McClelland, 1987; Hinton et al., 2012;
McCulloch & Pitts, 1943; Nair & Hinton, 2010; Rosenblatt,
1958, 1962; Rumelhart, Hinton, & Williams, 1986). A review
of computational neuroscience models concluded that despite
the problem of oversimplification, deep learning models can
already provide profound insights into the processing of the
brain (Richards et al., 2019).
On the cognitive psychology side, one of its most
established theories, chunking theory, has also been
embodied in computational cognitive architectures, first
EPAM (Feigenbaum, 1963; Richman, Staszewski, & Simon,
1995) and now CHREST (Chunking Hierarchy REtrieval
STructures) (Gobet, 1993, 2000; Gobet & Lane, 2012; Gobet
& Simon, 2000). The key idea of chunking theory is the chunk,
defined as a meaningful unit of information made from
elements that have strong associations with each other
(e.g., several digits making up a telephone number). Hence,
chunking is the process of forming and updating chunks in the
cognitive system (Simon, 1974). Although the chunks
themselves vary between people due to personal differences,
the chunking mechanism is mostly invariant across domains,
individuals and cultures (Chase & Simon, 1973; Gobet et al.,
2001; Miller, 1956).
Since its emergence in 1959, cognitive chunking has been
found to be central in verbal learning (Feigenbaum & Simon,
1984; Richman & Simon, 1989; Richman, Simon, &
Feigenbaum, 2002), perception and memory systems involved
in expert behaviour (Gobet & Simon, 2000; Richman et al.,
1996; Richman et al., 1995; Simon & Chase, 1973; Simon &
Gilmartin, 1973), concept learning and categorisation
(Bennett, Gobet, & Lane, 2020; Lane & Gobet, 2012b),
developmental abilities and cognitive decline due to ageing
(Mathy et al., 2016; R. Smith, Gobet, & Lane, 2007),
acquisition of grammar in children (Freudenthal et al., 2016;
Gegov et al., 2012), and the list goes on. Thus, the idea of a
chunk is one of the key ideas in all of cognitive psychology.
One classic example of chunking theory is the finding that
stronger chess players are able to recall more novel chess
positions from a given chessboard when compared to weaker
players, but this effect is much more pronounced when the
said novel positions come from an actual game, and not just
randomly placed chess pieces. According to CHREST (a
computational model based on chunking), this is due to the
experts possessing more chunks in their LTM (Gobet &
Simon, 1996) (see Figure 1).
Figure 1. Recollection of game and random chess positions as a
function of ELO rating in humans and number of chunks in
CHREST. From Gobet and Simon (1996).
The neural basis of chunking was investigated using
neuroimaging techniques. It was found that experts possess
large domain-specific knowledge structures that activate in the
areas of the brain associated with episodic LTM.
While novices primarily rely on the prefrontal cortex to form
new primitives and update their shallow chunking networks,
experts show less activation in the prefrontal areas, but large
activations in the medial temporal lobe, presumably due to
rapid utilisation of large knowledge structures (see Figure 2)
(for a review, see Guida et al., 2012). While these findings
were important for establishing a link between chunking and
the neural function, they were presented in the form of a verbal
theory that was difficult to operationalise, e.g., using a
connectionist model of chunking.
The current paper aims to address this issue by
investigating the correspondence between the rigorous
cognitive CHREST model that is based on chunking and a
neuroscience-based model based on deep learning.
Figure 2. Experts’ brains (on the right) have more activations in the
temporal regions associated with LTM, and fewer activations in the
prefrontal STM regions when compared to novices (on the left).
Darker shades of green signify stronger activation. Adapted from
Guida et al. (2012).
II. CHREST AND DEEP LEARNING
CHREST is a self-organising computer model that simulates
human learning processes via interacting cognitive
mechanisms and structures. For CHREST, learning implies
gradual growth of a network of chunks in LTM, a process
influenced both by the environmental stimuli and the data that
have already been stored (Gobet & Lane, 2012). CHREST’s
STM structure allows for additional ways to create links
between chunks, such as linking chunks across visual and
verbal modalities.
Another way to present CHREST is to say that it is
analogous to deep learning both in terms of its power and
simplicity, with the caveat that CHREST’s level of
investigating the brain function starts with the top level
(cognition and behaviour) as opposed to neural mechanisms.
With regard to power, like the multi-layer artificial neural
nets, CHREST is an example of a universal function
approximator (Fredkin, 1960; Gobet, 1996; Hornik,
Stinchcombe, & White, 1989). Thus, like deep learning,
CHREST is capable of classifying complex multidimensional
stimuli while learning from raw data, using both supervised
and unsupervised approaches.
As for simplicity, like deep learning, CHREST is very
simple at its core. While a perceptron is an idealised model of
a neuron, CHREST presents an idealised model of a cognitive
system. But, where deep learning relies on linear algebra,
partial differentials and the differentiation chain rule,
CHREST relies on a different set of formalisms. They include
graph data structures and first-in-first-out queues, with the
whole system being trained by a process of chunking that is
functionally equivalent to deep learning’s backpropagation
(backpropagation is the process of adjusting synaptic weights
based on the error rate of the artificial neural network;
Rumelhart et al., 1986). Chunks are operationalised as graph
nodes and chunking is the process of adding new data to the
LTM (see Figure 3). This is done via two psychologically
plausible cognitive processes: discrimination and
familiarisation. Discrimination is the process of adding a new
node to the network. Familiarisation updates existing nodes
with new information.
Figure 3. Some of the core neural (on the left) and cognitive
mechanisms (on the right), and their respective formalisms (below)
in deep learning and CHREST respectively. The microscopy image
was taken from Olexik (2015).
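The two learning processes can be illustrated with a toy discrimination
network. The sketch below is a deliberately simplified illustration and
not the CHREST implementation: the `Node` class, the `learn` function,
the one-element-per-step familiarisation, and the use of the next
unmatched element as the test for a new node are all assumptions made
for this example.

```python
class Node:
    """A chunk, stored as a graph node in a toy LTM network."""

    def __init__(self):
        self.image = []      # pattern elements this chunk has stored so far
        self.children = {}   # test element -> child Node


def learn(root, pattern):
    """Present one pattern and apply at most one learning step.

    Returns the name of the step taken (hypothetical labels).
    """
    # Sort the pattern through the network as far as the existing tests allow.
    node, depth = root, 0
    while depth < len(pattern) and pattern[depth] in node.children:
        node = node.children[pattern[depth]]
        depth += 1

    if node is not root:
        if node.image == pattern:
            return "recognised"              # nothing new to learn
        if node.image == pattern[:len(node.image)]:
            # Familiarisation: update the existing node with one more element.
            node.image.append(pattern[len(node.image)])
            return "familiarisation"

    if depth < len(pattern):
        # Discrimination: add a new node below the node that was reached.
        node.children[pattern[depth]] = Node()
        return "discrimination"
    return "recognised"


root = Node()
steps = [learn(root, list("cat")) for _ in range(5)]
new_step = learn(root, list("cow"))
```

In this toy setting, presenting "cat" five times yields one
discrimination, three familiarisations and then recognition, while the
mismatching pattern "cow" triggers a further discrimination below the
existing node.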
An important difference between CHREST and
connectionism/deep learning is that CHREST is an example
of a symbolic architecture, while the connectionist neural nets
are subsymbolic (the question of which approach better
models cognition was hotly debated) (Simon, 1991). In
practice, this means that CHREST’s patterns are meaningful
and are represented as symbols (i.e., text) for objects inside
(cognition) and outside (input) the architecture. This is in
contrast to deep learning, where, for example, meaningful
input text is converted into numbers which are then
manipulated by the internal functions to generate a desired
output (see Figure 4). We should also add that CHREST is
different to many symbolic models (like “expert systems”)
and is closer to deep learning in its focus on perception as the
primary driver of intelligence. Gobet and Lane (2012) offer an
in-depth introduction to the chunking theory; for deep
learning, see LeCun, Bengio, and Hinton (2015).
Figure 4. Representations in cognitive, ANN and biological systems.
From left to right: a chunk with letter “A” in CHREST; numerical
neural heatmap corresponding to letter “A” in a deep learning
model; biological neuronal engram corresponding to “safe place”
in an optogenetically modified mouse (from Liu, Ramirez, and
Tonegawa (2013)). Yellow colour represents strong positive
activations and dark blue represents strong negative activations.
III. CONCEPTS AND CHUNKS
As was mentioned above, chunking plays a crucial role in a
wide range of cognitive phenomena. We focused on chunking
in concept learning/categorisation as this field is particularly
complex, and, “in some way, everything is concepts”
(Murphy, 2002, p. 3).
What are concepts? One definition is that concepts are
“mental representations of classes of things”, with “classes of
things” themselves being categories (Murphy, 2002).
Historically, the psychological literature on concept formation
was dominated by formal models that operated on artificial
categories with a few and often binary dimensions (e.g.,
Anderson, 1991; Love, Medin, & Gureckis, 2004; Nosofsky,
2011), or natural categories that were pre-processed (into a
few and often binary dimensions) (e.g., Nosofsky et al., 2017;
Nosofsky et al., 2018). This led to reformulating the definition
of concepts as either prototypes (summary descriptions)
(Frixione & Lieto, 2012), clusters of specific instances
(exemplars) (Nosofsky, 2011), or clusters based on Bayesian
inference (Anderson, 1991) and other clustering algorithms
(Murphy, 2002). More recently, a number of psychological
deep learning models moved towards processing raw natural
categories, for example, classifying real-life images using
their pixel data (Battleday et al., 2020; Sanders & Nosofsky,
2020), or finding false sentences (e.g., “fur has cat”) in a
natural language text (Bhatia & Richie, 2022). Moreover,
Battleday et al. (2020) concluded that intuitions about theory
and model performance for low-dimensional categories do not
transfer to higher-dimensional ones.
Chunking theory’s CHREST has also been used to model
concept learning in tasks with high-dimensional real-life
complexity. One CHREST model was able to categorise novel
chess positions as one of two types of opening, a French or
a Sicilian (in chess, an opening concerns the first 10-20 moves
of a game, with billions of potential different
sequences of moves) (Lane & Gobet, 2012b). A more recent
CHREST model was able to categorise novel literature pieces
and music scores by predicting their respective author or
composer (Bennett et al., 2020). The latter model was also
notable for being domain general. For example, while the
chess model relied on chess-specific heuristics/mechanisms
that were hand-crafted by a chess expert (e.g., one of the
heuristics guided the model’s attention towards chess pieces
under attack), the literature and music categorisation model
did not have such pre-built knowledge structures and feature
detectors. Instead, the model automatically formed chunking
hierarchies during the learning phase (exposure to different
literature and music pieces). During the test phase, the model
automatically activated the largest of the formed chunks to
“vote” for a category. In broader terms, this meant that a
concept (e.g., a mental representation such as “Mozart” or
“Homer”) was a collection of chunks in a cognitive LTM-like
structure as operationalised by chunking theory’s CHREST.
IV. THE PRESENT STUDY
The present study intended to establish the level of
convergence between the chunking theory and
connectionism/deep learning. We replicated CHREST’s
artificial category learning performance, as well as its
literature and music categorisation experiment, with a deep
learning model and compared the two models. While deep
learning models have a rich
history in text classification, with over 150 models built in
recent years alone (Minaee et al., 2020), models of music
scores classification are less numerous. Dor and Reich (2011)
analysed MIDI data and achieved over 90% accuracy on
classification of composer pairs (e.g., Bach or Chopin, Bach
or Mozart). Herremans, Martens, and Sörensen (2015)
achieved over 80% accuracy on a 3-way classification of a
large dataset containing MIDI music by Bach, Beethoven and
Haydn. However, both of the studies above relied on
hand-engineered musical features such as “melodic fifth
frequency”, “note count feature” and “melodic octave
frequency”, instead of training from raw music score data.
Indeed, Dor and Reich (2011) considered classification of raw
music scores to be impossible for the then-current machine
learning methods.
The approach of the current study included training
linguistic and non-linguistic domains in one pass (i.e.
simultaneous learning of both the literature authors and music
composers as was done in the CHREST study). Also, the
training sets were kept to raw data only. To our knowledge,
this approach is novel. We should also note that our deep
learning model is meant to be supplementary to CHREST
research: while our model is not trivial, it makes no claim to
state-of-the-art categorisation performance. Instead, it was
designed to aid comparison and to provide important
theoretical neuroscience context to the state-of-the-art
cognitive model of concept learning (CHREST). We trained
both CHREST and deep learning models on the same set of
unabridged works by various authors and composers. We
tested categorisation on previously unseen pieces produced by
the same authors and composers.
V. METHOD
A. The Training and Testing
The training and testing procedure was meant to mostly
replicate the experiment by Bennett et al. (2020). There were
10 categories altogether, with 6 literature authors (Homer,
Chaucer, Shakespeare, Scott, Dickens, Joyce) and 4 music
composers (Bach, Mozart, Beethoven, Chopin). For each
category (i.e., a literature author or a music composer), the
training set contained approximately 300Kb of text in total.
The test dataset was expanded from the original study’s 50
pieces of literature and music to 120 pieces (60 literature
pieces and 60 music scores; these were not part of the training
set). This was done both to further test the CHREST model
and to broaden the scope of comparison to the ANN model.
B. CHREST Model
Our CHREST model architecture was replicated from the
original experiment by Bennett et al. (2020): it once again
contained an LTM data structure that acquired chunks through
learning, an STM structure that allowed the creation of
category-naming links between chunks, and a sliding attention
window.
The model also had a “chunk activation” mechanism: if there
are m categories, the vector of categories is
$c = [c_1, c_2, \ldots, c_m]$, the vector of category-specific
chunk activations is $a = [a_1, a_2, \ldots, a_m]$, and the
confidence of a prediction that a stimulus belongs to category
$c_i$ is calculated using the equation

$$P(c_i \mid x) = \frac{a_i}{\sum_{j=1}^{m} a_j}$$

where $P(c_i \mid x)$ is the confidence that the category is
$c_i$, given a literature or music stimulus x; $a_i$ is the LTM
chunks’ activation corresponding to that category; and the
denominator is the sum of chunk activations across all m
categories. See Bennett et al. (2020) for the full details of
the CHREST categorisation model.
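As a minimal sketch of the activation mechanism described above, the
function below normalises per-category activations so that they sum to
one. The function name, the dictionary representation and the example
activation values are illustrative assumptions and are not taken from
the CHREST source code.

```python
def confidence(activations):
    """Return each category's share of the total activation."""
    total = sum(activations.values())
    if total == 0:
        # No chunk was activated: no category receives any confidence.
        return {category: 0.0 for category in activations}
    return {category: a / total for category, a in activations.items()}


# Hypothetical chunk activations for one music stimulus.
scores = confidence({"Bach": 6, "Mozart": 3, "Chopin": 1})
winner = max(scores, key=scores.get)
```

With these made-up activations, the category with the largest share of
the total activation ("Bach") is returned as the prediction.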
C. Deep Learning Model
A common way to model sequence processing in neural
networks is with a recurrent type of neural architecture, also
referred to as Recurrent Neural Networks (RNN) (Elman,
1990). Our model was also of this type. An RNN neuron has
an axon that branches and outputs signal into that neuron
itself, as well as the subsequent neurons. Concretely, an RNN
neuron’s output at time t is $y_t = f(w_x x_t + w_y y_{t-1})$,
where $f$ is the threshold activation function (see below
for more details), $x_t$ is the input at time t, $w_x$ is the
synapse weight between the input and the neuron, $y_{t-1}$ is
the output of that neuron at the previous time step t-1, and
$w_y$ is the synapse weight between the neuron’s output and
itself. The
backward propagation of RNN is also known as “backward
propagation through time” (Elman, 1990), but, despite the
added time component, the basic logic remains the same.
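The recurrence described above can be sketched for a single neuron as
follows. This is an illustration only: the names `w_x` (input weight),
`w_y` (recurrent weight) and `y0` (initial output), the example weights,
and the choice of ReLU as the activation are assumptions made for this
sketch rather than details of the model used in the study.

```python
def relu(z):
    """Rectified linear unit, used here as the activation function."""
    return max(0.0, z)


def rnn_neuron(inputs, w_x, w_y, f=relu, y0=0.0):
    """Run one recurrent neuron over a sequence of scalar inputs."""
    y = y0
    outputs = []
    for x_t in inputs:
        # The new output depends on the current input and on the
        # neuron's own output from the previous time step.
        y = f(w_x * x_t + w_y * y)
        outputs.append(y)
    return outputs


# A single input pulse followed by silence: the recurrent connection
# keeps a decaying trace of the pulse.
trace = rnn_neuron([1.0, 0.0, 0.0], w_x=1.0, w_y=0.5)
```

With a recurrent weight below one, the trace of an earlier input halves
at each step in this example, which gives an intuition for why a simple
RNN gradually loses information about distant time steps.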
As was mentioned above, simple off-the-shelf RNNs
could not categorise raw music scores, presumably due to the
network “forgetting” input that is more than approximately ten
time steps in the past (Bengio, Frasconi, & Schmidhuber, 2001;
Goodfellow, Bengio, & Courville, 2016). Thus, our model
was enriched with four additional psychologically plausible
mechanisms.
Firstly, the model featured random shutdown of neurons
(also known as “Dropout”) (Hinton et al., 2012). This can be
viewed as an approximation of the neural refractory period,
the short period of time when the neuron may not fire again
(Deutsch, 1964). Functionally, the dropping out of neurons
from the learning process prevents overfitting and excessive
synaptic co-adaptation to patterns. Indeed, recent
neuroscience research suggests that artificially inducing
higher levels of neuronal dropout in biological brains (e.g., via
Deep Brain Stimulation) can both disrupt and improve
memory (Tan et al., 2020).
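The random shutdown of neurons can be sketched in a few lines. The
dropout rate of 0.5, the layer size and the "inverted" variant (which
rescales the surviving activations during training) are assumptions
made for illustration; the text above does not state which rate or
variant was used.

```python
import random


def dropout(activations, rate, rng):
    """Silence each unit with probability `rate`; rescale the survivors."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)                  # fixed seed for reproducibility
layer = [1.0] * 1000                    # hypothetical layer outputs
dropped = dropout(layer, rate=0.5, rng=rng)
silenced = dropped.count(0.0)           # roughly half of the units
```

Because a different random subset of units is silenced on every
training step, no unit can rely on the presence of any particular
other unit, which is the co-adaptation argument made above.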
Secondly, the activation function for the neurons was
chosen to be “ReLU” (rectified linear unit, f(x) = max(0, x)).
Originally proposed as a more biologically plausible depiction
of the neural threshold function that is analogue as well as
digital in nature (Hahnloser et al., 2000), ReLU became one
of the key drivers behind the breakthrough in training
artificial neural networks with many layers (Nair & Hinton,
2010). Because ReLU is so similar to a linear function while
still being non-linear, it significantly diminished gradient
saturation (and the ensuing vanishing gradient problem) that
was associated with the traditional neural activation function
sigmoid(x) and its variants such as tanh(x).
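The saturation argument can be checked numerically. The sketch below
compares the derivatives of the three activation functions at a few
input values; the helper names and the chosen inputs are illustrative
assumptions.

```python
import math


def sigmoid_grad(x):
    """Derivative of sigmoid(x), which peaks at 0.25 when x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh(x), which peaks at 1 when x = 0."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    """Derivative of max(0, x): constant for any positive input."""
    return 1.0 if x > 0 else 0.0


for x in (0.0, 2.0, 10.0):
    print(x, sigmoid_grad(x), tanh_grad(x), relu_grad(x))

# Backpropagation multiplies one such factor per layer, so even the
# largest possible sigmoid gradient shrinks quickly across 10 layers.
best_case_sigmoid = 0.25 ** 10
```

For large positive inputs the sigmoid and tanh derivatives approach
zero, whereas the ReLU derivative stays at one, so the error signal is
not attenuated by the activation function itself. Note that the ReLU
derivative is zero for negative inputs.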
Our third addition was the “sliding attention window”, as
used with CHREST. The sliding attention window passed the
retrieved short word sequences to the model, and the model
generated a vote/prediction for each of the sequences.
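A sliding attention window over a tokenised piece can be sketched as a
simple generator. The function name and the step of one token between
consecutive windows are assumptions for this illustration; the text
does not state whether successive windows overlap.

```python
def sliding_windows(tokens, size, step=1):
    """Yield consecutive windows of `size` tokens, moving `step` tokens each time."""
    last_start = max(len(tokens) - size, 0)
    for start in range(0, last_start + 1, step):
        yield tokens[start:start + size]


# Toy example with a window of 3 tokens.
windows = list(sliding_windows(["to", "be", "or", "not", "to", "be"], size=3))
```

Each window would then be passed to the model to obtain one
vote/prediction per window. A piece shorter than the window yields a
single, shorter window in this sketch.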
The fourth addition was the “LTM activation” mechanism
that aimed to resolve conflicts between votes for different
categories. As the model generated category “votes”, these
votes were then aggregated and the overall winner was
declared by the confidence formula. The multidomain
confidence criterion was calculated by the same formula as
was used in CHREST model and still had the aim of resolving
conflicting “voting” among category representations. The
important difference was that, this time, the voting conflict
was among different neural activations/engrams (as opposed
to different cognitive chunks with CHREST):
󰇛󰇜 󰇛󰇜

where 󰇛󰇜is confidence that category (i.e., author or
composer) is , given a novel literature or music stimulus x;
is the neural engrams’ activation score corresponding to
that category (i.e., author or composer), and the summation
part being the sum of neural engram activations across all m
categories.
Training and testing text files were converted to numeric
vectors using the TensorFlow Tokenizer library to streamline
processing. It should be noted that text vectorisation was done
in the name of convenience despite the psychological
questionability of such a mechanism. The plausibility of the
overall model would not be affected by this particular decision
as artificial neural nets are capable of character recognition at
approximately human level of performance (LeCun et al.,
1999). The order of the training samples was randomised. No
notes or words were removed from either training or testing
patterns.
The model had around 8.5 million trainable parameters
and was trained for 20 epochs. The vocabulary size was set to
50,000; the size of the attention window was set to 20
words/chords/notes; the embedding dimension was set to 1.
In all other aspects, the current experiment was a complete
replica of the CHREST music and literature categorisation
experiment discussed above.
See https://github.com/Voskod for Python3 source code;
for Java implementation of CHREST with graphical user
interface and more documentation see www.chrest.info.
VI. RESULTS
Both CHREST and deep learning models were able to learn
complex categories in the real-world music and literature
domains. They required no ad hoc additions to the core
architecture in order to simultaneously process music- or
literature-specific nuances. The descriptive statistics for both
models are summarised in Table 1.
CHREST’s categorisation performance was substantially
above chance: of the 120 tests across 10 categories (implying
12 correct answers by pure chance), 83 were classed correctly.
Within modalities, CHREST correctly categorised 41 out of
60 literature works and 42 out of 60 music scores. The deep
learning model’s categorisation performance was also
substantially above chance: of the 120 tests across 10
categories, 93 were classed correctly.
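The chance level and overall accuracies quoted above follow from simple
arithmetic, reproduced in the sketch below. The counts are those
reported in the text; the variable names, and the reading of "pure
chance" as uniform guessing across the 10 categories, are assumptions
of this sketch.

```python
n_tests, n_categories = 120, 10
chance_correct = n_tests / n_categories        # expected hits when guessing uniformly

correct = {"CHREST": 83, "deep learning": 93}  # counts reported in the text
accuracy = {model: hits / n_tests for model, hits in correct.items()}
times_chance = {model: hits / chance_correct for model, hits in correct.items()}

for model in correct:
    print(f"{model}: {accuracy[model]:.1%} correct, "
          f"{times_chance[model]:.1f}x the chance level")
```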
CHREST and ANN models made similar quantitative
predictions. The CHREST model had the highest true
predictions for Bach, Mozart, and Beethoven in music and
Chaucer, Homer, Shakespeare, and Dickens in literature. Bach
and Chaucer had the highest confidence scores for their
respective modalities. Chopin and Joyce had the lowest
confidence scores as well as the lowest true prediction rates.
The same pattern of results was true for the deep learning
model. One notable outlier was the discrepancy on the Walter
Scott category, where CHREST had 4/10 correct predictions
while the deep learning model had a 9/10 score.
There were no mistakes across modalities by either model:
literature was never categorised as music and vice versa.
This implies that while the models were taught to classify 10
types of regularities, they formed (empirically) distinct
memory chunks/engrams that separate the domains of music
and literature as was evident from the overall winning
confidence scores. However, while CHREST had no
activations across modalities, they were occasionally present
for the deep learning model. For example, the first four Scott
pieces generated various activations across literature authors
but had zero activations for any music composer. On the other
hand, some stimuli did generate exactly that kind of “multi-
modality” engram activation pattern. For instance, Mozart’s
Fantasia in D activated an engram that was made up from
76% of Mozart’s representations, but also with 3% of Joyce’s
(see Table 2).
VII. DISCUSSION
A. Summarising the Results
Both cognitive and neural models were able to learn real-life
highly multidimensional categories while learning from raw
data only, without bootstrapping to pre-made knowledge
structures and feature detectors.
The comparison of the performance obtained by the CHREST
and deep learning models provides for intriguing analysis and
warrants further investigation. From a high-level perspective,
both models demonstrated the capability of learning concepts
in complex, dissimilar domains (linguistic in the case of
literature and non-linguistic in the case of music).
Beyond the qualitative similarity, the CHREST and deep-
learning models also made similar quantitative predictions.
CHREST and deep learning also share functional similarities.
This was demonstrated using the same activation mechanism
for both CHREST and deep-learning models. Indeed, the
proposed activation formula/code was completely
interchangeable between the models and required no
adjustment to measure the activation of chunks or the
activation of neural engrams. In this technical sense, cognitive
chunks may be said to be equivalent to neural engrams. The
attention window mechanism was also similar for the two
types of models.
Nevertheless, this similarity of the two models was not
absolute. While CHREST had no cross-modality activations
at all, the deep-learning model had slight (mostly around 1-
5%) activations on some of the cross-modal representations.
This being said, the overall distinct clustering of literature and
music was true for both models. The occasional and mostly
small cross-modal activations of the deep-learning model may
need further investigation: while the result above may
represent statistical noise of the artificial neurons, a similar
mechanism may also potentially elucidate the complex
phenomenon of synaesthesia (Ward & Simner, 2022).
B. Constraining the Infinite Space of Candidate Theories
One potential criticism of our study is that having two models
arriving at similar behaviour by different means does not
necessarily imply that the models are equivalent in any but the
broadest sense. Indeed, in a trivial example where x = 2 and
f(x) = 4, it does not make sense to talk about the equivalence
of functions such as $f(x) = 2x$; $f(x) = x^3 - x^2$;
$f(x) = 4\sin(\pi/x)$, and so on. Of the similarly infinite
number of functions that
may potentially represent a working cognitive system, which
one did nature choose to make a human categorise a novel
music piece as a “Beethoven”? Similarly, to what extent can
psychological models claim convergence with humans and
with each other? To put it yet another way, given two points
“A” and “B”, there is an infinite number of paths that lead
from point “A” to point “B”; how would we decide on which
path to take? On the one hand, there is no real answer, a
“solution” that is commonly known as Hume’s problem of
induction (Hume, 1748). On the other hand, we relied on three
pragmatic means to address this problem in the current study.

Table 1. Categorisation performance summary for CHREST and
deep learning models.
                         CHREST                              Deep Learning
Domain      Category     Accuracy rate (%)  Mean confidence  Accuracy rate (%)  Mean confidence
Music       Bach         100                0.50             93                 0.56
            Beethoven    93                 0.44             87                 0.46
            Chopin       20                 0.31             27                 0.34
            Mozart       67                 0.42             67                 0.52
Literature  Chaucer      100                0.50             100                0.85
            Dickens      70                 0.26             100                0.49
            Homer        90                 0.31             100                0.58
            Joyce        20                 0.17             40                 0.37
            Scott        40                 0.20             90                 0.39
            Shakespeare  90                 0.34             90                 0.50

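The trivial example above can be made concrete. Reading the three
candidate functions as 2x, x^3 - x^2 and 4*sin(pi/x) (an assumption
about the intended formulas), the sketch below confirms that all of
them pass through the point (2, 4) yet disagree at a different input.
The dictionary of candidates and the probe value x = 3 are illustrative
choices.

```python
import math

candidates = {
    "2x": lambda x: 2 * x,
    "x^3 - x^2": lambda x: x ** 3 - x ** 2,
    "4*sin(pi/x)": lambda x: 4 * math.sin(math.pi / x),
}

# All candidates agree on the single observed data point...
at_two = {name: f(2) for name, f in candidates.items()}

# ...but they make different predictions for an unobserved input.
at_three = {name: f(3) for name, f in candidates.items()}

for name in candidates:
    print(f"{name}: f(2) = {at_two[name]:.3f}, f(3) = {at_three[name]:.3f}")
```

Agreement on observed behaviour alone therefore cannot single out one
function, which is why additional constraints are needed to narrow
down the space of candidate models.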
Firstly, the simplest answer would be to choose the path
that satisfies some other constraints. For example, the shortest
path, a path through the gates “X”, “Y”, “Z” and so on. In
terms of choosing a psychologically plausible computational
approach, one should choose a method that satisfies multiple
constraints, e.g., postdicting past psychological experimental
data as well as predicting findings that have not yet been
reported (Newell, 1990). Our CHREST and deep-learning
models both satisfy these constraints as they incorporate
fundamental psychological mechanisms and structures, and
are rooted in decades of psychological research. For example,
our CHREST model features the STM, LTM, chunking,
familiarisation, discrimination, association and attention;
while our ANN model has dendrites, axons, threshold
activation and a refractory period. While not the focus of the
current study, the CHREST family of models also successfully
simulates human timings and learning rates in a variety of
cognitive experiments (Gobet, Lane, & Lloyd-Kelly, 2015).
As was mentioned above, a recent Nature review concluded
that deep learning models offer profound insights into the
working of the brain (Richards et al., 2019). This means that,
in terms of the “path from A to B” analogy above, our models
not only connect the “A” and “B”, but also pass the “X”, “Y”,
“Z” gates/constraints that are relevant to psychology. This is
unlike other hypothetical models of categorisation (e.g.,
semantic parsers, support vector machines, etc.) that connect
the “A” and “B” without passing the gates and thus make up
the unconstrained infinite space of candidate models. More
broadly, this aspect forms a crucial difference between
computer science (where all models are “equal” as long as
they succeed at a task) and psychology (where
“psychologically plausible” models are desired). For instance,
Deep Blue is not considered to be a psychologically plausible
model of human chess playing (Gobet, 1997a) due to its
reliance on brute search, but AlphaZero is more psychological
in how it relies on pattern recognition (as well as incorporating
some neural mechanisms) (Gobet & Waters, 2023; Silver et
al., 2017).

Table 2. An excerpt of individual categorisation confidence scores by CHREST and deep learning models. Red colour signifies zero memory activation and darker shades of green signify stronger activation. Numbers in bold denote the highest confidence prediction on a given novel test.

CHREST model

| Author | File | Chaucer | Dickens | Homer | Joyce | Scott | Shakespeare | Bach | Beethoven | Chopin | Mozart | Correct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mozart | A Piece For Piano K176 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.60 | 0.16 | 0.03 | 0.21 | FALSE |
| Mozart | Adagio In B Flat | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.12 | 0.04 | 0.29 | 0.55 | TRUE |
| Mozart | Fantasia In C K.475 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.14 | 0.20 | 0.28 | 0.38 | TRUE |
| Mozart | Fantasia In D, K397 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.05 | 0.06 | 0.31 | 0.58 | TRUE |
| Mozart | K309 Piano Sonata N10 1mov | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.44 | 0.24 | 0.11 | 0.22 | FALSE |
| Scott | Talisman Part 2 | 0.05 | 0.22 | 0.21 | 0.08 | 0.26 | 0.17 | 0.00 | 0.00 | 0.00 | 0.00 | TRUE |
| Scott | Talisman Part 3 | 0.05 | 0.21 | 0.20 | 0.14 | 0.25 | 0.16 | 0.00 | 0.00 | 0.00 | 0.00 | TRUE |
| Scott | Talisman Part 4 | 0.11 | 0.21 | 0.21 | 0.10 | 0.16 | 0.22 | 0.00 | 0.00 | 0.00 | 0.00 | FALSE |
| Scott | Talisman Part 5 | 0.11 | 0.17 | 0.26 | 0.11 | 0.17 | 0.19 | 0.00 | 0.00 | 0.00 | 0.00 | FALSE |
| Scott | The Lay Of The Last Minstrel | 0.04 | 0.13 | 0.11 | 0.15 | 0.17 | 0.39 | 0.00 | 0.00 | 0.00 | 0.00 | FALSE |
| Beethoven | Piano Sonata N08 Op13 1mov | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.31 | 0.44 | 0.14 | 0.10 | TRUE |
| Beethoven | Piano Sonata N08 Op13 3mov | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.46 | 0.32 | 0.12 | 0.11 | FALSE |
| Beethoven | Piano Sonata N09_1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.33 | 0.50 | 0.10 | 0.06 | TRUE |
| Beethoven | Piano Sonata N09_2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.34 | 0.51 | 0.09 | 0.06 | TRUE |
| Beethoven | Piano Sonata N10 1mov | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.41 | 0.41 | 0.08 | 0.09 | TRUE |
| Joyce | A_Portrait_Of_The_Artist_1 | 0.05 | 0.25 | 0.14 | 0.24 | 0.20 | 0.11 | 0.00 | 0.00 | 0.00 | 0.00 | FALSE |
| Joyce | A_Portrait_Of_The_Artist_2 | 0.04 | 0.20 | 0.14 | 0.24 | 0.21 | 0.16 | 0.00 | 0.00 | 0.00 | 0.00 | TRUE |
| Joyce | A_Portrait_Of_The_Artist_3 | 0.02 | 0.23 | 0.20 | 0.23 | 0.10 | 0.21 | 0.00 | 0.00 | 0.00 | 0.00 | TRUE |
| Joyce | Finnegans_Wake_1 | 0.12 | 0.12 | 0.19 | 0.18 | 0.10 | 0.29 | 0.00 | 0.00 | 0.00 | 0.00 | FALSE |
| Joyce | Finnegans_Wake_2 | 0.13 | 0.16 | 0.18 | 0.09 | 0.16 | 0.29 | 0.00 | 0.00 | 0.00 | 0.00 | FALSE |

Deep Learning model

| Author | File | Chaucer | Dickens | Homer | Joyce | Scott | Shakespeare | Bach | Beethoven | Chopin | Mozart | Correct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mozart | A Piece For Piano K176 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.57 | 0.14 | 0.05 | 0.24 | FALSE |
| Mozart | Adagio In B Flat | 0.00 | 0.00 | 0.02 | 0.10 | 0.00 | 0.01 | 0.01 | 0.08 | 0.16 | 0.61 | TRUE |
| Mozart | Fantasia In C K.475 | 0.00 | 0.00 | 0.00 | 0.05 | 0.00 | 0.01 | 0.01 | 0.09 | 0.40 | 0.43 | TRUE |
| Mozart | Fantasia In D, K397 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.21 | 0.76 | TRUE |
| Mozart | K309 Piano Sonata N10 1mov | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.46 | 0.11 | 0.18 | FALSE |
| Scott | Talisman Part 2 | 0.00 | 0.13 | 0.20 | 0.04 | 0.51 | 0.11 | 0.00 | 0.00 | 0.00 | 0.00 | TRUE |
| Scott | Talisman Part 3 | 0.00 | 0.11 | 0.09 | 0.05 | 0.54 | 0.21 | 0.00 | 0.00 | 0.00 | 0.00 | TRUE |
| Scott | Talisman Part 4 | 0.00 | 0.15 | 0.15 | 0.09 | 0.38 | 0.24 | 0.00 | 0.00 | 0.00 | 0.00 | TRUE |
| Scott | Talisman Part 5 | 0.00 | 0.11 | 0.25 | 0.05 | 0.38 | 0.22 | 0.00 | 0.00 | 0.00 | 0.00 | TRUE |
| Scott | The Lay Of The Last Minstrel | 0.04 | 0.02 | 0.07 | 0.11 | 0.26 | 0.48 | 0.00 | 0.00 | 0.02 | 0.00 | FALSE |
| Beethoven | Piano Sonata N08 Op13 1mov | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.02 | 0.13 | 0.38 | 0.14 | 0.30 | TRUE |
| Beethoven | Piano Sonata N08 Op13 3mov | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.43 | 0.42 | 0.05 | 0.10 | FALSE |
| Beethoven | Piano Sonata N09_1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.18 | 0.49 | 0.18 | 0.16 | TRUE |
| Beethoven | Piano Sonata N09_2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.23 | 0.58 | 0.12 | 0.05 | TRUE |
| Beethoven | Piano Sonata N10 1mov | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.41 | 0.40 | 0.05 | 0.15 | FALSE |
| Joyce | A_Portrait_Of_The_Artist_1 | 0.00 | 0.45 | 0.07 | 0.35 | 0.11 | 0.02 | 0.00 | 0.00 | 0.01 | 0.01 | FALSE |
| Joyce | A_Portrait_Of_The_Artist_2 | 0.00 | 0.47 | 0.01 | 0.34 | 0.09 | 0.09 | 0.00 | 0.00 | 0.00 | 0.00 | FALSE |
| Joyce | A_Portrait_Of_The_Artist_3 | 0.00 | 0.38 | 0.04 | 0.46 | 0.07 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | TRUE |
| Joyce | Finnegans_Wake_1 | 0.00 | 0.42 | 0.05 | 0.41 | 0.03 | 0.09 | 0.00 | 0.00 | 0.00 | 0.00 | FALSE |
| Joyce | Finnegans_Wake_2 | 0.03 | 0.41 | 0.03 | 0.34 | 0.06 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | FALSE |

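
The confidence scores in Table 2 can be turned into the kind of summary statistics shown in Table 1. The following is only a minimal sketch, under two assumptions that are ours rather than the paper's: that the predicted category is the one with the highest confidence score, and that "mean confidence" is the average score assigned to the true author. The names `predict` and `summarise` are hypothetical. The input is the five CHREST rows for Mozart in the Table 2 excerpt, so the output describes the excerpt only, not the full test set behind Table 1.

```python
# Music categories in the column order used by Table 2.
CATEGORIES = ["Bach", "Beethoven", "Chopin", "Mozart"]

# CHREST confidence scores for the five Mozart pieces in the Table 2 excerpt.
# The literature columns are all 0.00 for these rows and are omitted here.
MOZART_CHREST = {
    "A Piece For Piano K176": [0.60, 0.16, 0.03, 0.21],
    "Adagio In B Flat": [0.12, 0.04, 0.29, 0.55],
    "Fantasia In C K.475": [0.14, 0.20, 0.28, 0.38],
    "Fantasia In D, K397": [0.05, 0.06, 0.31, 0.58],
    "K309 Piano Sonata N10 1mov": [0.44, 0.24, 0.11, 0.22],
}


def predict(scores):
    """Return the category with the highest confidence (assumed decision rule)."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return CATEGORIES[best]


def summarise(rows, true_author):
    """Return (accuracy, mean confidence assigned to the true author)."""
    true_idx = CATEGORIES.index(true_author)
    hits = sum(predict(scores) == true_author for scores in rows.values())
    accuracy = hits / len(rows)
    mean_conf = sum(scores[true_idx] for scores in rows.values()) / len(rows)
    return accuracy, mean_conf


accuracy, mean_conf = summarise(MOZART_CHREST, "Mozart")
print(f"accuracy={accuracy:.2f}, mean confidence in true author={mean_conf:.3f}")
# prints: accuracy=0.60, mean confidence in true author=0.388
```

Several rows in the excerpt contain ties at two decimal places (e.g., 0.41 versus 0.41), so a faithful reimplementation would need the unrounded scores.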
The second important constraint is the inherent difficulty
of problems, such as inverse optics and inverse kinematics, that
were long considered to be “ill-posed” yet are routinely solved
by the brains/minds of various organisms (Palmer, 1999;
Pizlo, 2001). Music and literature categorisation is one type
of this “ill-posed” problem. Psychological models of concept
learning traditionally struggled with such tasks and resorted to
either artificial categories with a few dimensions (e.g.,
Braunlich & Love, 2022; Nosofsky, 2011), pre-processed
natural categories into a few dimensions (e.g., Konovalova &
Le Mens, 2018; Nosofsky, 2011), or bootstrapping to hand-
crafted knowledge structures such as semantic dictionaries
(e.g., Lieto, 2019). More recently, there was a move to
combine psychological models with deep learning, where
ANNs do the heavy lifting of learning from raw data
(Battleday et al., 2020). In the current study, both models learn
from highly multidimensional raw data without bootstrapping,
with CHREST performing all the learning required with its
own mechanisms.
The third crucial constraint is the “single algorithm
hypothesis”, which proposes that visual, auditory, motor, and
somato-sensory cortices utilise approximately the same
algorithm to extract approximately one type of data structure
from various types of information (i.e., visual, auditory,
motor, etc.) (Hawkins & Blakeslee, 2004; Mountcastle, 1978).
In this context, the similarity between CHREST and deep
learning is constrained in two important ways. Firstly, chunk-based
and engram-based models converged in classification
of relatively dissimilar complex domains. Secondly, both
approximations of this “single algorithm” shared a common
activation mechanism which works on both cognitive chunks
and neural engrams. Having said that, of course, a conclusive
guarantee that the models’ overlap provides a unique
explanation is impossible (Lakatos, 1970). See Lieto (2022)
for more discussion on “function only” (functionalist) versus
“function + constraints” (structural) models as well as a more
general framework of evaluating bio-inspired models.
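
As a rough illustration of the quantitative convergence discussed above, the per-category accuracy rates in Table 1 can be compared directly. The sketch below computes a Pearson correlation between the two models' accuracy rates. This is an assumption-laden illustration, not the convergence analysis reported in the paper: the choice of statistic and the helper name `pearson` are ours, and ten categories are too few for a strong inference.

```python
from math import sqrt

# Accuracy rates (%) copied from Table 1, in the order: Bach, Beethoven,
# Chopin, Mozart, Chaucer, Dickens, Homer, Joyce, Scott, Shakespeare.
CHREST_ACC = [100, 93, 20, 67, 100, 70, 90, 20, 40, 90]
DEEP_ACC = [93, 87, 27, 67, 100, 100, 100, 40, 90, 90]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(CHREST_ACC, DEEP_ACC)
print(f"Pearson r between per-category accuracy rates: {r:.2f}")
```

The same helper could be applied to the mean confidence columns of Table 1, or to the item-level scores in Table 2.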
C. Future Research and Conclusions
We focused on one of the most complex, yet most
fundamental psychological processes: concept learning.
Future research, while utilising a similar methodology, may
focus on other cognitive phenomena that involve chunking
(e.g., working memory, expertise, acquisition of grammar,
verbal learning, reasoning, and the list goes on) to further
ground chunking mechanisms in the neural substrate. For
example, it has been established that human working memory
contains around seven chunks at a time (Baddeley, 1986;
Miller, 1956; Robbins, 1995), and that human experts’ LTM
typically possesses between one and five hundred thousand
chunks with information specific to their domain (Gobet &
Simon, 1998; Richman et al., 1996). One natural extension of
the current study would be to adapt the chunk/engram
activation mechanisms proposed in the current paper to
translate the above work on chunking to connectionist models.
Such work would be of obvious benefit to both psychology
and AI. On the psychology side, this would further bridge
cognitive psychology, the neuroimaging studies of chunking
(Guida et al., 2012) and computational neuroscience. On the
AI side, establishing chunking mechanisms in deep learning
architectures would allow for better interpretability of the
models.
The correspondence of chunking and other, non-RNN,
deep learning architectures also warrants further investigation.
We chose the RNNs as they have deep roots in psychology
(this was important for the constraint saturation aspects that
were discussed above). RNNs may also be considered to be
closely related to CNNs (Convolutional Neural Networks)
(LeCun et al., 1999), LSTMs (Long Short-term Memory)
(Hochreiter & Schmidhuber, 1997) and GRUs (Gated
Recurrent Units) (Cho et al., 2014); indeed, our current
engram activation mechanism is compatible with all of these
architectures. On the other hand, the latest advancements in
deep learning based on the transformer architecture
(Vaswani et al., 2017) are radically different in important
ways (e.g., in modelling of the attention function) and require
a separate study.
The current study addresses a long historical feud between
the symbolic/serial processing and subsymbolic/parallel
processing approaches to modelling cognition. Generally, the
focus on perception and bottom-up processes is attributed to
the subsymbolic approach, while the focus on heuristics,
symbol manipulation and high levels of abstraction is
considered to be the way of the symbolic approach (for a
review, see Lieto, 2021). It is important to note that such
differentiation was not universally accepted. Indeed, Simon
and Newell considered perception to be a vital component in
symbolic models of intelligence, together with a physical
symbol system (Newell, 1990; Simon, 1981). Our CHREST
model also adheres to their view (which is natural, as it is
closely related to their models). However, contrary to Simon’s
intuitions (Simon, 1991, pp. 81-83) and more in line with
Newell’s thinking (Newell, 1990, p. 487), our results
demonstrate both serial and parallel approaches to be
convergent in important ways (with cognitive chunks and
neural engrams being equivalent in a narrow technical sense),
and with there being multiple levels of explicit representation:
a mind-level representation and a brain-level representation.
To conclude, the current paper further bridges the gap
between cognition and its neural substrate by demonstrating
profound convergence between a rigorous cognitive
psychology-based model and its neuroscience-based
counterpart. Our findings connect the mechanisms proposed
by chunking theory to the neural network modelling approach,
and make further inroads towards a Unified Theory of
Cognition (Newell, 1990).
REFERENCES
[1] Battleday, R. M., Peterson, J. C., & Griffiths, T. L. (2020). Capturing
human categorization of natural images by combining deep networks
and cognitive models. Nature Communications, 11.
[2] Bennett, D., Gobet, F., & Lane, P. (2020). Forming concepts of Mozart
and Homer using short-term and long-term memory: A computational
model based on chunking. In S. Denison, M. Mack, Y. Xu, & B. C.
Armstrong (Eds.), Proceedings of the 42nd annual conference of the
cognitive science society (pp. 178-184). Toronto.
[3] Anderson, J. R. (1991). The adaptive nature of human categorization.
Psychological Review, 98, 409-429.
[4] Anderson, J. R., & Lebiere, C. (1998). The atomic components of
thought. Mahwah, NJ: Erlbaum.
[5] Baddeley, A. (1986). Working memory. Oxford: Clarendon Press.
[6] Battleday, R. M., Peterson, J. C., & Griffiths, T. L. (2020). Capturing
human categorization of natural images by combining deep networks
and cognitive models. Nature Communications, 11.
[7] Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in
recurrent nets: The difficulty of learning long-term dependencies. In J.
Kolen & S. Kremer (Eds.), A field guide to dynamical recurrent neural
networks (pp. 237-240). New York: Wiley.
[8] Bennett, D., Gobet, F., & Lane, P. (2020). Forming concepts of Mozart
and Homer using short-term and long-term memory: A computational
model based on chunking. In S. Denison, M. Mack, Y. Xu, & B. C.
Armstrong (Eds.), Proceedings of the 42nd annual conference of the
cognitive science society (pp. 178-184). Toronto.
[9] Bhatia, S., & Richie, R. (2022). Transformer networks of human
conceptual knowledge. Psychological Review.
[10] Braunlich, K., & Love, B. (2022). Bidirectional influences of
information sampling and concept learning. Psychological Review,
129(2), 213-234. doi:10.1037/rev0000287
[11] Byrne, M. D. (2012). Unified theories of cognition. WIREs Cognitive
Science, 3(4), 431-438. doi:https://doi.org/10.1002/wcs.1180
[12] Chase, W. G., & Simon, H. (1973). Perception in chess. Cognitive
Psychology, 4, 55-81.
[13] Cho, K., van Merrienboer, B., Gulcehre, C., Bougares, F., Schwenk,
H., & Bengio, Y. (2014). Learning phrase representations using RNN
encoder-decoder for statistical machine translation. In Conference on
Empirical Methods in Natural Language Processing (EMNLP 2014).
[14] Deutsch, J. A. (1964). Behavioral measurement of the neural refractory
period and its application to intracranial self-stimulation. Journal of
Comparative and Physiological Psychology, 58(1), 1-9.
[15] Dor, O., & Reich, Y. (2011). An evaluation of musical score
characteristics for automatic classification of composers. Computer
Music Journal, 35, 86-97.
[16] Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14,
179-211.
[17] Feigenbaum, E. A. (1963). The simulation of verbal learning behavior.
Proceedings of the Western joint computer conference, 19, 121-132.
[18] Feigenbaum, E. A., & Simon, H. (1984). EPAM-like models of
recognition and learning. Cognitive Science, 8, 305-336.
[19] Fredkin, E. (1960). Trie memory. Communications of the ACM, 3(9),
490-499. doi:10.1145/367390.367400
[20] Freudenthal, D., Pine, J., Jones, G., & Gobet, F. (2016).
Developmentally plausible learning of word categories from
distributional statistics. In D. Papafragou, D. Grodner, D. Mirman, &
J. Trueswell (Eds.), 38th annual conference of the cognitive science
society. Austin, TX.
[21] Frixione, M., & Lieto, A. (2012). Prototypes vs exemplars in concept
representation. Paper presented at the International Conference on
Knowledge Engineering and Ontology Development, Barcelona.
[22] Gegov, E., Gegov, A., Gobet, F., Atherton, M., Freudenthal, D., &
Pine, J. (2012). Cognitive modelling of language acquisition with
complex networks. In A. Floares (Ed.), Computational intelligence (pp.
95-106). New York: Nova Science Publishers.
[23] Gobet, F. (1993). A computer model of chess memory. In W. Kintsch
(Ed.), Fifteenth annual meeting of the cognitive science society (pp.
463-468): Erlbaum.
[24] Gobet, F., & Simon, H. (1996). Recall of rapidly presented random
chess positions is a function of skill. Psychonomic Bulletin and Review,
3(2), 159-163.
[25] Gobet, F. (1996). Discrimination nets, production systems and
semantic networks: Elements of a unified framework. In D. Edelson &
E. Domeshek (Eds.), Proceedings of the 2nd international conference
on the learning sciences (pp. 398-403). Evanston, IL: Northwestern
University.
[26] Gobet, F. (1997a). Can Deep Blue make us happy? Reflections on
human and artificial expertise. In B. Kuipers & B. Webber (Eds.),
American association for artificial intelligence-97 workshop: Deep
Blue vs. Kasparov: The significance for artificial intelligence (pp. 20-
23). New York, US: American Association for Artificial Intelligence
Press.
[27] Gobet, F. (2000). Discrimination nets, production systems and
semantic networks: Elements of a unified framework. Evanston: The
Association for the Advancement of Computing in Education.
[28] Gobet, F., & Lane, P. (2012). Chunking mechanisms and learning. In
M. M. Seel (Ed.), Encyclopedia of the sciences of learning (pp. 541-
544). New York, NY: Springer.
[29] Gobet, F., Lane, P., & Lloyd-Kelly, M. (2015). Chunks, schemata and
retrieval structures: Past and current computational models. Frontiers
in Psychology, 6(6).
[30] Gobet, F., Lane, P. C., Croker, S., Cheng, P. C., Jones, G., Oliver, I., &
Pine, J. M. (2001). Chunking mechanisms in human learning. Trends
in Cognitive Sciences, 5(6), 236.
[31] Gobet, F., & Simon, H. (1998). Expert chess memory: Revisiting the
chunking hypothesis. Memory, 6, 225-255.
[32] Gobet, F., & Simon, H. (2000). Five seconds or sixty? Presentation
time in expert memory. Cognitive Science, 24, 651-682.
[33] Gobet, F., & Waters, A. (2023). Searching for answers: Expert pattern
recognition and planning. Trends in Cognitive Sciences, 27.
doi:10.1016/j.tics.2023.07.006
[34] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning.
New York: MIT Press.
[35] Guida, A., Gobet, F., Tardieu, H., & Nicolas, S. (2012a). How chunks,
long-term working memory and templates offer a cognitive explanation
for neuroimaging data on expertise acquisition: A two-stage
framework. Brain and Cognition, 79(3), 221-244.
[36] Guida, A., Gobet, F., Tardieu, H., & Nicolas, S. (2012b). How chunks,
long-term working memory and templates offer a cognitive explanation
for neuroimaging data on expertise acquisition: A two-stage
framework. Brain and Cognition, 79, 221-244.
[37] Hahnloser, R. H. R., Sarpeshkar, R., Mahowald, M. A., Douglas, R. J.,
& Seung, H. S. (2000). Digital selection and analogue amplification
coexist in a cortex-inspired silicon circuit. Nature, 405(6789), 947-951.
doi:10.1038/35016072
[38] Hambling, D. (2020). AI outguns a human fighter pilot. New Scientist,
247(3297), 12. doi:https://doi.org/10.1016/S0262-4079(20)31477-9
[39] Hawkins, J., & Blakeslee, S. (2004). On intelligence. London: Henry
Holt and Company.
[40] Herremans, D., Martens, D., & Sörensen, K. (2015). Composer
classification models for music-theory building. In D. Meredith (Ed.),
Computational music analysis (pp. 369-392). NY: Springer.
[41] Hinton, G., & McClelland, J. (1987). Learning representations by
recirculation. In Proceedings of the 1987 international conference on
neural information processing systems (pp. 358-366): MIT Press.
[42] Hinton, G., Srivastava, N., Krizhevsky, A., Sutskever, I., &
Salakhutdinov, R. (2012). Improving neural networks by preventing
co-adaptation of feature detectors. arXiv preprint, arXiv.
[43] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory.
Neural Computation, 9, 1735-1780. doi:10.1162/neco.1997.9.8.1735
[44] Hoffman, P., McClelland, J., & Lambon Ralph, M. (2018). Concepts,
control, and context: A connectionist account of normal and disordered
semantic cognition. Psychological Review, 125, 293-328.
[45] Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer
feedforward networks are universal approximators. Neural Networks,
2(5), 359-366. doi:https://doi.org/10.1016/0893-6080(89)90020-8
[46] Hume, D. (1748). Treatise on human nature. Oxford: Blackwell.
[47] Jo, T., Nho, K., & Saykin, A. J. (2019). Deep learning in Alzheimer's
disease: Diagnostic classification and prognostic prediction using
neuroimaging data. Frontiers in Aging Neuroscience, 11(220).
doi:10.3389/fnagi.2019.00220
[48] Konovalova, E., & Le Mens, G. L. (2018). Feature inference with
uncertain categorization: Re-assessing Anderson’s rational model.
Psychonomic Bulletin & Review, 25(5), 1666. doi:10.3758/s13423-
017-1372-y
[49] Laird, J. E., Lebiere, C., & Rosenbloom, P. S. (2017). A standard model
of the mind: Toward a common computational framework across
artificial intelligence, cognitive science, neuroscience, and robotics. AI
Magazine, 38(4), 13-26.
[50] Lakatos, I. (1970). Falsification and methodology of scientific research
programmes. In I. Lakatos & A. Musgrave (Eds.), Criticism and the
growth of knowledge. Cambridge: CUP.
[51] Lane, P., & Gobet, F. (2012a). A theory-driven testing methodology
for developing scientific software. Journal of Experimental &
Theoretical Artificial Intelligence, 24(4), 421-456.
[52] Lane, P., & Gobet, F. (2012b). Using chunks to categorise chess
positions. In M. Bramer & M. Petrides (Eds.), Specialist group on
artificial intelligence international conference 2012 (pp. 93-106).
London: Springer-Verlag.
[53] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature,
521(7553), 436-444.
[54] LeCun, Y., Haffner, P., Bottou, L., & Bengio, Y. (1999). Object
recognition with gradient-based learning. In Shape, contour and
grouping in computer vision (pp. 319-345). Berlin: Springer.
[55] Lieto, A. (2019). Heterogeneous proxytypes extended: Integrating
theory-like representations and mechanisms with prototypes and
exemplars. In A. Samsonovich (Ed.), Biologically inspired cognitive
architectures 2018 (Vol. 848, pp. 217-227). London: Springer.
[56] Lieto, A. (2021). Cognitive design for artificial minds. London:
Routledge.
[57] Lieto, A. (2022). Analyzing the explanatory power of bionic systems
with the minimal cognitive grid. Frontiers in Robotics and AI, 9.
Retrieved from
https://www.frontiersin.org/articles/10.3389/frobt.2022.888199
[58] Liu, X., Ramirez, S., & Tonegawa, S. (2013). Inception of a false
memory by optogenetic manipulation of a hippocampal memory
engram. Philosophical transactions of the Royal Society of London.
Series B, Biological sciences, 369(1633), 20130142-20130142.
doi:10.1098/rstb.2013.0142
[59] Love, B., & Medin, D. (1998). SUSTAIN: A model of human category
learning. Paper presented at the National Conference on Artificial
Intelligence, Wisconsin, US.
[60] Love, B., Medin, D., & Gureckis, T. (2004). SUSTAIN: A network
model of category learning. Psychological Review, 111(2), 309-332.
doi:10.1037/0033-295x.111.2.309
[61] Mathy, F., Fartoukh, M., Gauvrit, N., & Guida, A. (2016).
Developmental abilities to form chunks in immediate memory and its
non-relationship to span development. Frontiers in Psychology, 7, 201.
[62] McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas
immanent in nervous activity. 1943. Bulletin of Mathematical Biology,
52(1-2), 99-115.
[63] Miller, G. A. (1956). The magical number seven, plus or minus two:
Some limits on our capacity for processing information. Psychological
Review, 63, 81-97.
[64] Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M.,
& Gao, J. (2020). Deep learning based text classification: A
comprehensive review. arXiv preprint arXiv:2004.03705.
[65] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I.,
Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep
reinforcement learning. arXiv preprint arXiv:1312.5602.
[66] Mountcastle, V. (1978). An organizing principle for cerebral function:
The unit module and the distributed system. In G. Edelman (Ed.),
Mindful brain. New York: MIT Press.
[67] Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT
Press.
[68] Nair, V., & Hinton, G. (2010). Rectified linear units improve restricted
Boltzmann machines. In J. Fürnkranz & T. Joachims (Eds.),
Proceedings of the 27th international conference on machine learning
(pp. 807-814). Madison, WI: Omnipress.
[69] Newell, A. (1990). Unified theories of cognition. Cambridge, MA, US:
Harvard University Press.
[70] Nosofsky, R. M. (2011). The generalized context model: An exemplar
model of classification. In Formal approaches in categorization. (pp.
18-39). New York: Cambridge University Press.
[71] Nosofsky, R. M., Sanders, C. A., Gerdom, A., Douglas, B. J., &
McDaniel, M. A. (2017). On learning natural-science categories that
violate the family-resemblance principle. Psychological Science,
28(1), 104-114.
[72] Nosofsky, R. M., Sanders, C. A., Meagher, B. J., & Douglas, B. J.
(2018). Toward the development of a feature-space representation for
a complex natural category domain. Behavior Research Methods,
50(2), 530-556.
[73] Olexik, W. (2015). Motor neurons miscroscopy. In B. Neurons (Ed.).
[74] Palmer, S. E. (1999). Vision science: Photons to phenomenology.
Cambridge, MA, US: The MIT Press.
[75] Pizlo, Z. (2001). Perception viewed as an inverse problem. Vision
Research, 41(24), 3145-3161. doi:https://doi.org/10.1016/S0042-
6989(01)00173-0
[76] Purves, D., Cabeza, R., Huettel, S., LaBar, K., Platt, M. L., & Woldorff,
M. (2013). Principles of cognitive neuroscience. Sunderland, MA:
Sinauer Associates.
[77] Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R.,
Christensen, A., . . . Kording, K. P. (2019). A deep learning framework
for neuroscience. Nature Neuroscience, 22(11), 1761-1770.
[78] Richman, H. B., Gobet, F., Staszewski, J., & Simon, H. (1996).
Perceptual and memory processes in the acquisition of expert
performance: The EPAM model. In K. A. Ericsson (Ed.), The road to
excellence: The acquisition of expert performance in the arts and
sciences, sports, and games (pp. 167-187). Mahwah, NJ: Erlbaum.
[79] Richman, H. B., & Simon, H. (1989). Context effects in letter
perception: Comparison of two theories. Psychological Review, 96(3),
417-432. doi:10.1037/0033-295X.96.3.417
[80] Richman, H. B., Simon, H., & Feigenbaum, E. A. (2002). Simulations
of paired associate learning using EPAM VI. Unpublished.
[81] Richman, H. B., Staszewski, J. J., & Simon, H. (1995). Simulation of
expert memory using EPAM IV. Psychological Review, 102(2), 305-
330.
[82] Ritter, F. E., Tehranchi, F., & Oury, J. D. (2019). ACT-R: A cognitive
architecture for modeling cognition. WIREs Cognitive Science, 10(3),
e1488. doi:https://doi.org/10.1002/wcs.1488
[83] Robbins, T. W., Anderson, E., Barker, D. R., Bradley, A. C.,
Fearnyhough, C., Henson, R., Hudson, S. R., Baddeley, A. D. (1995).
Working memory in chess. Memory and Cognition, 24, 83-93.
[84] Rosenblatt, F. (1958). The perceptron: A probabilistic model for
information storage and organization in the brain. Psychological
Review, 65(6), 386-408. doi:10.1037/h0042519
[85] Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and
the theory of brain mechanisms. MA: Spartan Books.
[86] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning
representations by back-propagating errors. Nature, 323(6088), 533.
[87] Sanders, C. A., & Nosofsky, R. M. (2020). Training deep networks to
construct a psychological feature space for a natural-object category
domain. Computational Brain & Behavior, 3(3), 229-251.
doi:10.1007/s42113-020-00073-z
[88] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den
Driessche, G., . . . Hassabis, D. (2016). Mastering the game of Go with
deep neural networks and tree search. Nature, 529(7587), 484-489.
doi:10.1038/nature16961
[89] Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez,
A., . . . Graepel, T. (2017). Mastering chess and shogi by self-play with
a general reinforcement learning algorithm. Science, 362(6419).
[90] Simon, H. (1974). How big is a chunk? Science, 183(4124), 482-488.
[91] Simon, H. (1981). Information-processing models of cognition. Journal
of the American Society for Information Science, 32, 364-377.
[92] Simon, H. (1991). The sciences of the artificial. New York: MIT Press.
[93] Simon, H., & Chase, W. G. (1973). Skill in chess. American Scientist,
61(4), 394-403.
[94] Simon, H., & Gilmartin, K. (1973). A simulation of memory for chess
positions. Cognitive Psychology, 5, 29-46.
[95] Smith, D. J., & Minda, J. P. (2000). Thirty categorization results in
search of a model. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 26(1), 3-27.
[96] Smith, R., Gobet, F., & Lane, P. (2007). An investigation into the effect
of ageing on expert memory with CHREST. Paper presented at the
Proceedings of The Seventh UK Workshop on Computational
Intelligence, Aberdeen.
[97] Tan, S. Z. K., Du, R., Perucho, J. A. U., Chopra, S. S., Vardhanabhuti,
V., & Lim, L. W. (2020). Dropout in neural networks simulates the
paradoxical effects of deep brain stimulation on memory. Frontiers in
Aging Neuroscience, 12(273).
[98] Vaswani, A., Shazeer, N. M., Parmar, N., Uszkoreit, J., Jones, L.,
Gomez, A. N., . . . Polosukhin, I. (2017). Attention is all you need.
Paper presented at the Neural Information Processing Systems.
[99] Ward, J., & Simner, J. (2022). How do different types of synesthesia
cluster together? Implications for causal mechanisms. Perception,
51(2), 91-113.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Does expertise mostly stem from pattern recognition or look-ahead search? van Opheusden et al. contribute to this important debate in cognitive psychology and artificial intelligence (AI) with a multi-method, multi-experiment study and a new model. Using a novel, relatively simple board game, they show that players increase depth of search when improving their skill.
Article
Full-text available
We present a computational model capable of simulating aspects of human knowledge for thousands of real-world concepts. Our approach involves a pretrained transformer network that is further fine-tuned on large data sets of participant-generated feature norms. We show that such a model can successfully extrapolate from its training data, and predict human knowledge for new concepts and features. We apply our model to stimuli from 25 previous experiments in semantic cognition research and show that it reproduces many findings on semantic verification, concept typicality, feature distribution, and semantic similarity. We also compare our model against several variants, and by doing so, establish the model properties that are necessary for good prediction. The success of our approach shows how a combination of language data and (laboratory-based) psychological data can be used to build models with rich world knowledge. Such models can be used in the service of new psychological applications, such as the modeling of naturalistic semantic verification and knowledge retrieval, as well as the modeling of real-world categorization, decision-making, and reasoning.
Article
Full-text available
In this article, I argue that the artificial components of hybrid bionic systems do not play a direct explanatory role, i.e., in simulative terms, in the overall context of the systems in which they are embedded in. More precisely, I claim that the internal procedures determining the output of such artificial devices, replacing biological tissues and connected to other biological tissues, cannot be used to directly explain the corresponding mechanisms of the biological component(s) they substitute (and therefore cannot be used to explain the local mechanisms determining an overall biological or cognitive function replicated by such bionic models). I ground this analysis on the use of the Minimal Cognitive Grid (MCG), a novel framework proposed in Lieto (Cognitive design for artificial minds, 2021) to rank the epistemological and explanatory status of biologically and cognitively inspred artificial systems. Despite the lack of such a direct mechanistic explanation from the artificial component, however, I also argue that the hybrid bionic systems can have an indirect explanatory role similar to the one played by some AI systems built by using an overall structural design approach (but including the partial adoption of functional components). In particular, the artificial replacement of part(s) of a biological system can provide i) a local functional account of that part(s) in the context of the overall functioning of the hybrid biological–artificial system and ii) global insights about the structural mechanisms of the biological elements connected to such artificial devices.
Article
Full-text available
It is unclear whether synesthesia is one condition or many, and this has implications for whether theories should postulate a single cause or multiple independent causes. Study 1 analyses data from a large sample of self-referred synesthetes ( N = 2,925), who answered a questionnaire about N = 164 potential types of synesthesia. Clustering and factor analysis methods identified around seven coherent groupings of synesthesia, as well as showing that some common types of synesthesia do not fall into any grouping at all (mirror-touch, hearing-motion, tickertape). There was a residual positive correlation between clusters (they tend to associate rather than compete). Moreover, we observed a “snowball effect” whereby the chances of having a given cluster of synesthesia go up in proportion to the number of other clusters a person has (again suggesting non-independence). Clusters tended to be distinguished by shared concurrent experiences rather than shared triggering stimuli (inducers). We speculate that modulatory feedback pathways from the concurrent to inducers may play a key role in the emergence of synesthesia. Study 2 assessed the external validity of these clusters by showing that they predict performance on other measures known to be linked to synesthesia.
Article
Contemporary models of categorization typically tend to sidestep the problem of how information is initially encoded during decision making. Instead, a focus of this work has been to investigate how, through selective attention, stimulus representations are "contorted" such that behaviorally relevant dimensions are accentuated (or "stretched"), and the representations of irrelevant dimensions are ignored (or "compressed"). In high-dimensional real-world environments, it is computationally infeasible to sample all available information, and human decision makers selectively sample information from sources expected to provide relevant information. To address these and other shortcomings, we develop an active sampling model, Sampling Emergent Attention (SEA), which sequentially and strategically samples information sources until the expected cost of information exceeds the expected benefit. The model specifies the interplay of two components, one involved in determining the expected utility of different information sources and the other in representing knowledge and beliefs about the environment. These two components interact such that knowledge of the world guides information sampling, and what is sampled updates knowledge. Like human decision makers, the model displays strategic sampling behavior, such as terminating information search when sufficient information has been sampled and adaptively adjusting the search path in response to previously sampled information. The model also shows human-like failure modes. For example, when information exploitation is prioritized over exploration, the bidirectional influences between information sampling and learning can lead to the development of beliefs that systematically differ from reality.
Book
Cognitive Design for Artificial Minds explains the crucial role that human cognition research plays in the design and realization of artificial intelligence systems, illustrating the steps necessary for the design of artificial models of cognition. It bridges the gap between the theoretical, experimental, and technological issues addressed in the context of AI of cognitive inspiration and computational cognitive science. Beginning with an overview of the historical, methodological, and technical issues in the field of cognitively inspired artificial intelligence, Lieto illustrates how the cognitive design approach has an important role to play in the development of intelligent AI technologies and plausible computational models of cognition. Introducing a unique perspective that draws upon Cybernetics and early AI principles, Lieto emphasizes the need for an equivalence between cognitive processes and implemented AI procedures, in order to realize biologically and cognitively inspired artificial minds. He also introduces the Minimal Cognitive Grid, a pragmatic method to rank the different degrees of biological and cognitive accuracy of artificial systems in order to project and predict their explanatory power with respect to the natural systems taken as a source of inspiration. Providing a comprehensive overview of cognitive design principles in constructing artificial minds, this text will be essential reading for students and researchers of artificial intelligence and cognitive science.
Article
Human categorization is one of the most important and successful targets of cognitive modeling, with decades of model development and assessment using simple, low-dimensional artificial stimuli. However, it remains unclear how these findings relate to categorization in more natural settings, involving complex, high-dimensional stimuli. Here, we take a step towards addressing this question by modeling human categorization over a large behavioral dataset, comprising more than 500,000 judgments over 10,000 natural images from ten object categories. We apply a range of machine learning methods to generate candidate representations for these images, and show that combining rich image representations with flexible cognitive models captures human decisions best. We also find that in the high-dimensional representational spaces these methods generate, simple prototype models can perform comparably to the more complex memory-based exemplar models dominant in laboratory settings. Theories of human categorization have traditionally been evaluated in the context of simple, low-dimensional stimuli. In this work, the authors use a large dataset of human behavior over 10,000 natural images to re-evaluate these theories, revealing interesting differences from previous results.
Chapter
Computational modeling plays a central role in cognitive science. This book provides a comprehensive introduction to computational models of human cognition. It covers major approaches and architectures, both neural network and symbolic; major theoretical issues; and specific computational models of a variety of cognitive processes, ranging from low-level (e.g., attention and memory) to higher-level (e.g., language and reasoning). The articles included in the book provide original descriptions of developments in the field. The emphasis is on implemented computational models rather than on mathematical or nonformal approaches, and on modeling empirical data from human subjects.
Chapter
Two books have been particularly influential in contemporary philosophy of science: Karl R. Popper's Logic of Scientific Discovery, and Thomas S. Kuhn's Structure of Scientific Revolutions. Both agree upon the importance of revolutions in science, but differ about the role of criticism in science's revolutionary growth. This volume arose out of a symposium on Kuhn's work, with Popper in the chair, at an international colloquium held in London in 1965. The book begins with Kuhn's statement of his position followed by seven essays offering criticism and analysis, and finally by Kuhn's reply. The book will interest senior undergraduates and graduate students of the philosophy and history of science, as well as professional philosophers, philosophically inclined scientists, and some psychologists and sociologists.
Chapter
This collection of texts on the Sublime provides the historical context for the foundation and discussion of one of the most important aesthetic debates of the Enlightenment. The significance of the Sublime in the eighteenth century ranged across a number of fields - literary criticism, empirical psychology, political economy, connoisseurship, landscape design and aesthetics, painting and the fine arts, and moral philosophy - and has continued to animate aesthetic and theoretical debates to this day. However, the unavailability of many of the crucial texts of the founding tradition has resulted in a conception of the Sublime often limited to the definitions of its most famous theorist Edmund Burke. Andrew Ashfield and Peter de Bolla's anthology, which includes an introduction and notes to each entry, offers students and scholars ready access to a much deeper and more complex tradition of writings on the Sublime, many of them never before printed in modern editions.