Cognitive Chunks, Neural Engrams and Natural Concepts: Bridging the Gap between Connectionism and Symbolism


Abstract

Chunking theory is among the most established theories in cognitive psychology. However, little work has been done to connect the key ideas of chunks and chunking to the neural substrate. The current study addresses this issue by investigating the convergence of a cognitive CHREST model (the computational embodiment of chunking theory) and its neuroscience-based counterpart (based on deep learning). Both models were trained from raw data to categorise novel stimuli in the real-life domains of literature and music. Despite having vastly different mechanisms and structures, both models largely converged in their predictions of classical writers and composers, in both qualitative and quantitative terms. Moreover, the use of the same chunk/engram activation mechanism for CHREST and deep learning models demonstrated functional equivalence between cognitive chunks and neural engrams. The study addresses a historical feud between symbolic/serial and subsymbolic/parallel processing approaches to modelling cognition. The findings also further bridge the gap between cognition and its neural substrate, connect the mechanisms proposed by chunking theory to the neural network modelling approach, and make further inroads towards integrating concept formation theories into a Unified Theory of Cognition (Newell, 1990).
The brain is often said to be the most complex object in the
known universe. There are multiple levels of investigating
brain functions: starting from the subatomic relations between
particles, to the molecular level, to whole neurons that
generate action potentials, to networks of networks that link
various brain regions, to cognition and behaviour that emerge
from all of these multilevel relations (Purves et al., 2013).
How do we approach this level of complexity?
The approach of cognitive psychology was to unravel the
mechanisms that underlie cognition starting with the top levels
(cognition and behaviour). For example, experiments on
human information processing have shed light on the cognitive functions
of human attention, long-term and short-term memory
modules (LTM and STM, respectively), and their
interconnectedness with the perceptual apparatus.
Neuroscience, in turn, focused on the investigation of the
lower-level neural functions (e.g., synapses, neuronal
structure, firing rates, and refractory periods) and their
relationship to higher cognition. Understandably, both levels
of explanation rely on different sets of assumptions and
proposed mechanisms.
While many of the findings remain in the shape of verbal
theories, a sizable part has been captured via computational
formalisms (Anderson & Lebiere, 1998; Laird, Lebiere, &
Rosenbloom, 2017; Newell, 1990; Ritter, Tehranchi, & Oury,
2019). Computational formal models form a solution to the
problem of “magic parameters” associated with purely verbal
theories and integrate proposed mechanisms into a more
unified whole (Byrne, 2012; Lane & Gobet, 2012a; Newell, 1990).
The ongoing AI revolution is powered by formal
models of artificial neural networks (ANNs, historically
known under the umbrella terms of connectionism and, more
recently, deep learning) (Hambling, 2020; Jo, Nho, & Saykin,
2019; Mnih et al., 2013; Silver et al., 2016; Silver et al., 2017;
Vaswani et al., 2017). Deep learning is also commonly used
in psychological models (e.g., Battleday, Peterson, &
Griffiths, 2020; Hoffman, McClelland, & Lambon Ralph,
2018; Sanders & Nosofsky, 2020). Indeed, its set of
fundamental mechanisms was largely developed and refined
through research in neuroscience and psychology (Hahnloser
et al., 2000; Hinton & McClelland, 1987; Hinton et al., 2012;
McCulloch & Pitts, 1943; Nair & Hinton, 2010; Rosenblatt,
1958, 1962; Rumelhart, Hinton, & Williams, 1986). A review
of computational neuroscience models concluded that despite
the problem of oversimplification, deep learning models can
already provide profound insights into the processing of the
brain (Richards et al., 2019).
On the cognitive psychology side, one of the field’s most
established theories, chunking theory, has also been
embodied in computational cognitive architectures: first
EPAM (Feigenbaum, 1963; Richman, Staszewski, & Simon,
1995) and now CHREST (Chunking Hierarchy REtrieval
STructures) (Gobet, 1993, 2000; Gobet & Lane, 2012; Gobet
& Simon, 2000). Chunking theory’s key idea is the chunk,
defined as a meaningful unit of information made from
elements that have strong associations with each other
(e.g., several digits making up a telephone number). Hence,
chunking is the process of forming and updating chunks in the
cognitive system (Simon, 1974). Although the chunks
themselves vary between people due to personal differences,
the chunking mechanism is mostly invariant across domains,
individuals and cultures (Chase & Simon, 1973; Gobet et al.,
2001; Miller, 1956).
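A minimal sketch of the telephone-number example: a digit string is recoded into the longest chunks found in a small set of LTM chunks. The `LTM_CHUNKS` contents and the greedy longest-match `chunk` helper are hypothetical illustrations, not CHREST code; the point is only that known chunks shrink the number of items that must be held in STM.

```python
# Hypothetical LTM contents (illustrative only, not CHREST data).
LTM_CHUNKS = {"0207", "946", "1066", "1945"}


def chunk(sequence: str, ltm: set[str]) -> list[str]:
    """Greedily recode a sequence into the longest known chunks.

    Elements that belong to no known chunk are kept as single-item chunks.
    """
    longest = max((len(c) for c in ltm), default=1)
    items, i = [], 0
    while i < len(sequence):
        # Try the largest possible chunk first, falling back to one element.
        for size in range(min(longest, len(sequence) - i), 0, -1):
            candidate = sequence[i:i + size]
            if size == 1 or candidate in ltm:
                items.append(candidate)
                i += size
                break
    return items


digits = "02079461066"
print(chunk(digits, LTM_CHUNKS))    # ['0207', '946', '1066']: 3 items to hold
print(len(chunk(digits, set())))    # 11: one item per digit without chunks
```

With an empty LTM every digit remains a separate item, whereas an LTM containing the relevant chunks reduces the same input to three items.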
Since its emergence in 1959, cognitive chunking has been
found to be central in verbal learning (Feigenbaum & Simon,
1984; Richman & Simon, 1989; Richman, Simon, &
Feigenbaum, 2002), perception and memory systems involved
in expert behaviour (Gobet & Simon, 2000; Richman et al.,
1996; Richman et al., 1995; Simon & Chase, 1973; Simon &
Gilmartin, 1973), concept learning and categorisation
(Bennett, Gobet, & Lane, 2020; Lane & Gobet, 2012b),
developmental abilities and cognitive decline due to ageing
(Mathy et al., 2016; R. Smith, Gobet, & Lane, 2007),
acquisition of grammar in children (Freudenthal et al., 2016;
Gegov et al., 2012), and the list goes on. Thus, the idea of a
chunk is one of the key ideas in all of cognitive psychology.
One classic example of chunking theory is the finding that
stronger chess players recall novel chess positions more
accurately than weaker players, and that this advantage is much
more pronounced when the positions come from actual games
rather than from randomly placed pieces. According to CHREST,
a computational model based on chunking, this is because
experts possess more chunks in their LTM (Gobet &
Simon, 1996) (see Figure 1).
Figure 1. Recollection of game and random chess positions as a
function of Elo rating in humans and number of chunks in
CHREST. From Gobet and Simon (1996).
The neural basis of chunking was investigated using
neuroimaging techniques. It was found that experts possess
large domain-specific knowledge structures that activate in the
areas of the brain associated with episodic LTM.
While novices primarily rely on the prefrontal cortex to form
new primitives and update their shallow chunking networks,
experts show less activation in the prefrontal areas, but large
activations in the medial temporal lobe, presumably due to
rapid utilisation of large knowledge structures (see Figure 2)
(for a review, see Guida et al., 2012). While these findings
were important for establishing a link between chunking and
neural function, they were presented in the form of a verbal
theory that was difficult to operationalise, for example as a
connectionist model of chunking.
The current paper aims to address this issue by
investigating the correspondence between CHREST, a rigorous
cognitive model based on chunking, and a neuroscience-based
model built on deep learning.
Figure 2. Experts’ brains (on the right) have more activations in the
temporal regions associated with LTM, and fewer activations in the
prefrontal STM regions when compared to novices (on the left).
Darker shades of green signify stronger activation. Adapted from
Guida et al. (2012).
CHREST is a self-organising computer model that simulates
human learning processes via interacting cognitive
mechanisms and structures. For CHREST, learning implies
gradual growth of a network of chunks in LTM, a process
influenced both by the environmental stimuli and the data that
have already been stored (Gobet & Lane, 2012). CHREST’s
STM structure allows for additional ways to create links
between chunks, such as linking chunks across visual and
verbal modalities.
Another way to present CHREST is to say that it is
analogous to deep learning both in terms of its power and
simplicity, with the caveat that CHREST’s level of
investigating the brain function starts with the top level
(cognition and behaviour) as opposed to neural mechanisms.
With regard to power, like the multi-layer artificial neural
nets, CHREST is an example of a universal function
approximator (Fredkin, 1960; Gobet, 1996; Hornik,
Stinchcombe, & White, 1989). Thus, like deep learning,
CHREST is capable of classifying complex multidimensional
stimuli while learning from raw data, using both supervised
and unsupervised approaches.
As for simplicity, like deep learning, CHREST is very
simple at its core. While a perceptron is an idealised model of
a neuron, CHREST presents an idealised model of a cognitive
system. But where deep learning relies on linear algebra,
partial derivatives and the chain rule of differentiation,
CHREST relies on a different set of formalisms. They include
graph data structures and first-in-first-out queues, with the
whole system being trained by a process of chunking that is
functionally equivalent to deep learning’s backpropagation.
(Backpropagation is the process of adjusting synaptic weights
based on the error of the artificial neural network’s output
(Rumelhart et al., 1986).) Chunks are operationalised as graph
nodes, and chunking is the process of adding new data to the
LTM (see Figure 3). This is done via two psychologically
plausible cognitive processes: discrimination and
familiarisation. Discrimination is the process of adding a new
node to the network. Familiarisation updates existing nodes
with new information.
Figure 3. Some of the core neural (on the left) and cognitive
mechanisms (on the right), and their respective formalisms (below)
in deep learning and CHREST respectively. The microscopy image
was taken from Olexik (2015).
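The discrimination and familiarisation processes can be illustrated with a toy discrimination net in the spirit of EPAM and CHREST. This is a heavily simplified sketch under stated assumptions: patterns are token sequences, every test link checks a single next token, and the class and method names are hypothetical. It is not the CHREST implementation, which uses richer tests and additional mechanisms.

```python
class Node:
    """A chunk in LTM: its image plus test links to more specific chunks."""

    def __init__(self, image):
        self.image = list(image)
        self.children = {}  # test (next token) -> child Node


class DiscriminationNet:
    """Toy discrimination net with single-token tests (illustrative only)."""

    def __init__(self):
        self.root = Node([])

    def sort(self, pattern):
        """Follow matching test links as deep as possible; return (node, depth)."""
        node, depth = self.root, 0
        while depth < len(pattern) and pattern[depth] in node.children:
            node = node.children[pattern[depth]]
            depth += 1
        return node, depth

    def recognise(self, pattern):
        """Return the image of the chunk that the pattern sorts to."""
        return self.sort(list(pattern))[0].image

    def learn(self, pattern):
        """Apply one learning step and report which process fired."""
        pattern = list(pattern)
        node, depth = self.sort(pattern)
        image = node.image
        if node is not self.root:
            if image[:len(pattern)] == pattern:
                return "known"  # the pattern is already covered by this chunk
            if pattern[:len(image)] == image:
                # Familiarisation: the image is consistent, so extend it by one token.
                image.append(pattern[len(image)])
                return "familiarise"
        if depth < len(pattern):
            # Discrimination: add a new node reached by testing the next token.
            node.children[pattern[depth]] = Node(pattern[:depth + 1])
            return "discriminate"
        return "known"


net = DiscriminationNet()
for _ in range(4):
    print(net.learn("cat"))    # discriminate, familiarise, familiarise, known
print(net.recognise("cat"))    # ['c', 'a', 't']
print(net.learn("cap"))        # discriminate: "cap" conflicts with the image "cat"
```

Repeated exposure first creates a node (discrimination) and then enriches its image (familiarisation). A conflicting pattern triggers a new test link rather than overwriting the existing chunk.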
An important difference between CHREST and
connectionism/deep learning is that CHREST is an example
of a symbolic architecture, while the connectionist neural nets
are subsymbolic (the question of which approach better
models cognition was hotly debated) (Simon, 1991). In
practice, this means that CHREST’s patterns are meaningful
and are represented as symbols (i.e., text) for objects inside
(cognition) and outside (input) the architecture. This is in
contrast to deep learning, where, for example, meaningful
input text is converted into numbers which are then
manipulated by the internal functions to generate a desired
output (see Figure 4). We should also add that CHREST is
different from many symbolic models (such as “expert systems”)
and is closer to deep learning in its focus on perception as the
primary driver of intelligence. Gobet and Lane (2012) offer an
in-depth introduction to the chunking theory; for deep
learning, see LeCun, Bengio, and Hinton (2015).
Figure 4. Representations in cognitive, ANN and biological systems.
From left to right: a chunk with letter “A” in CHREST; numerical
neural heatmap corresponding to letter “A” in a deep learning
model; biological neuronal engram corresponding to “safe place”
in an optogenetically modified mouse (from Liu, Ramirez, and
Tonegawa (2013)). Yellow colour represents strong positive
activations and dark blue represents strong negative activations.
As was mentioned above, chunking plays a crucial role in a
wide range of cognitive phenomena. We focused on chunking
in concept learning/categorisation as this field is particularly
complex, and, “in some way, everything is concepts”
(Murphy, 2002, p. 3).
What are concepts? One definition is that concepts are
“mental representations of classes of things”, with “classes of
things” themselves being categories (Murphy, 2002).
Historically, the psychological literature on concept formation
was dominated by formal models that operated on artificial
categories with a few and often binary dimensions (e.g.,
Anderson, 1991; Love, Medin, & Gureckis, 2004; Nosofsky,
2011), or natural categories that were pre-processed (into a
few and often binary dimensions) (e.g., Nosofsky et al., 2017;
Nosofsky et al., 2018). This led to reformulating the definition
of concepts as either prototypes (summary descriptions)
(Frixione & Lieto, 2012), clusters of specific instances
(exemplars) (Nosofsky, 2011), or clusters based on Bayesian
inference (Anderson, 1991) and other clustering algorithms
(Murphy, 2002). More recently, a number of psychological
deep learning models moved towards processing raw natural
categories, for example, classifying real-life images using
their pixel data (Battleday et al., 2020; Sanders & Nosofsky,
2020), or finding false sentences (e.g., “fur has cat”) in a
natural language text (Bhatia & Richie, 2022). Moreover,
Battleday et al. (2020) concluded that intuitions about theory
and model performance for low-dimensional categories do not
transfer to higher-dimensional ones.
Chunking theory’s CHREST has also been used to model
concept learning in tasks with high-dimensional real-life
complexity. One CHREST model was able to categorise novel
chess positions as one of two types of opening: a French or
a Sicilian (in chess, an opening concerns the first 10-20 moves
of a game, with there being billions of potential different
sequences of moves) (Lane & Gobet, 2012b). A more recent
CHREST model was able to categorise novel literature pieces
and music scores by predicting their respective author or
composer (Bennett et al., 2020). The latter model was also
notable for being domain general. For example, while the
chess model relied on chess-specific heuristics/mechanisms
that were hand-crafted by a chess expert (e.g., one of the
heuristics guided the model’s attention towards chess pieces
under attack), the literature and music categorisation did not
have such pre-built knowledge structures and feature
detectors. Instead, the model automatically formed chunking
hierarchies during the learning phase (exposure to different
literature and music pieces). During the test phase, the model
automatically activated the largest of the formed chunks to
“vote” for a category. In broader terms, this meant that a
concept (e.g., a mental representation such as “Mozart” or
“Homer”) was a collection of chunks in a cognitive LTM-like
structure as operationalised by chunking theory’s CHREST.
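One simplified reading of this voting step is sketched below, under the following assumptions. Chunks carry naming links to a category. The attention window slides one token at a time. The largest stored chunk matching the start of each window casts one vote. `NAMED_CHUNKS`, the window size and the helper names are hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical chunk -> category naming links (illustrative only).
NAMED_CHUNKS = {
    ("to", "be", "or"): "Shakespeare",
    ("to", "be"): "Dickens",
    ("sing", "o", "muse"): "Homer",
}


def largest_chunk(window, chunks):
    """Return the longest stored chunk that matches the start of the window."""
    for size in range(len(window), 0, -1):
        candidate = tuple(window[:size])
        if candidate in chunks:
            return candidate
    return None


def vote(tokens, chunks, window_size=3):
    """Slide an attention window over the stimulus; each window's largest chunk votes."""
    votes = Counter()
    for start in range(len(tokens)):
        best = largest_chunk(tokens[start:start + window_size], chunks)
        if best is not None:
            votes[chunks[best]] += 1
    return votes


print(vote("to be or not to be".split(), NAMED_CHUNKS))
# Counter({'Shakespeare': 1, 'Dickens': 1})
```

Preferring the largest matching chunk means that a more specific chunk, such as "to be or", overrides a shorter one that it contains.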
The present study intended to establish the level of
convergence between the chunking theory and
connectionism/deep learning. We replicated and compared
CHREST’s artificial category learning performance, as well
as its literature and music categorisation experiment, with a deep
learning model. While deep learning models have a rich
history in text classification, with over 150 models built in
recent years alone (Minaee et al., 2020), models of music
scores classification are less numerous. Dor and Reich (2011)
analysed MIDI data and achieved over 90% accuracy on
classification of composer pairs (e.g., Bach or Chopin, Bach
or Mozart). Herremans, Martens, and Sörensen (2015)
achieved over 80% accuracy on a 3-way classification of a
large dataset containing MIDI music by Bach, Beethoven and
Haydn. However, both of the studies above relied on
hand-engineered musical features such as “melodic fifth
frequency”, “note count feature” and “melodic octave
frequency”, instead of training from raw music score data.
Indeed, Dor and Reich (2011) considered classification of raw
music scores to be impossible for the then-current machine
learning methods.
The approach of the current study included training
linguistic and non-linguistic domains in one pass (i.e.
simultaneous learning of both the literature authors and music
composers as was done in the CHREST study). Also, the
training sets were kept to raw data only. To our knowledge,
this approach is novel. We should also note that our deep
learning model is meant to be supplementary to CHREST
research: while our model is not trivial, it makes no claim to
state-of-the-art categorisation performance. Instead, it was
designed to aid comparison and to provide important
theoretical neuroscience context to the state-of-the-art
cognitive model of concept learning (CHREST). We trained
both CHREST and deep learning models on the same set of
unabridged works by various authors and composers. We
tested categorisation on previously unseen pieces produced by
the same authors and composers.
A. The Training and Testing
The training and testing procedure was meant to mostly
replicate the experiment by Bennett et al. (2020). There were
10 categories altogether, with 6 literature authors (Homer,
Chaucer, Shakespeare, Scott, Dickens, Joyce) and 4 music
composers (Bach, Mozart, Beethoven, Chopin). For each
category (i.e., a literature author or a music composer), the
training set contained approximately 300 KB of text in total.
The test dataset was expanded from the original study’s 50
pieces of literature and music to 120 pieces (60 literature
pieces and 60 music scores; these were not part of the training
set). This was done both to further test the CHREST model
and to broaden the scope of comparison to the ANN model.
Our CHREST model architecture was replicated from the
original experiment by Bennett et al. (2020): it once again
contained an LTM data structure that acquired chunks through
learning, an STM structure that allowed category-naming links
to be created between chunks, and a sliding attention window.
The model also had a “chunk activation” mechanism: if there
are $m$ categories, the vector of categories is
$\mathbf{c} = [c_1, c_2, \ldots, c_m]$, the vector of category-specific
chunk activations is $\mathbf{a} = [a_1, a_2, \ldots, a_m]$, and the
confidence of a prediction that a stimulus belongs to category $c_i$
is calculated using the formula

$$P(c = c_i \mid x) = \frac{a_i}{\sum_{j=1}^{m} a_j}$$

where $P(c = c_i \mid x)$ is the confidence that the category is $c_i$,
given a literature or music stimulus $x$; $a_i$ is the activation of the
LTM chunks corresponding to that category; and the denominator
is the sum of chunk activations across all $m$ categories.
See Bennett et al. (2020) for the full details of the
CHREST categorisation model.
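The confidence formula is a simple normalisation of category activations. A minimal sketch follows. The activation values are hypothetical, not data from the study. Returning all-zero confidences when nothing is activated is an assumption, because the text does not cover that case.

```python
def confidence(activations):
    """Normalise category activations: P(c = c_i | x) = a_i / sum_j a_j."""
    total = sum(activations.values())
    if total == 0:
        # Assumption: with no activated chunks, every category gets zero confidence.
        return {category: 0.0 for category in activations}
    return {category: a / total for category, a in activations.items()}


# Hypothetical chunk activations for one music stimulus (not data from the study).
activations = {"Bach": 12, "Beethoven": 3, "Chopin": 1, "Mozart": 4}
scores = confidence(activations)
prediction = max(scores, key=scores.get)   # the category with the highest confidence
print(scores)       # {'Bach': 0.6, 'Beethoven': 0.15, 'Chopin': 0.05, 'Mozart': 0.2}
print(prediction)   # Bach
```

Because the scores sum to 1 for each stimulus, they can be read as a distribution over categories. The CHREST rows in Table 2 are consistent with this; for example, the scores for Mozart's K176 sum to 1.00.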
C. Deep Learning Model
A common way to model sequence processing in neural
networks is with a recurrent type of neural architecture, also
referred to as Recurrent Neural Networks (RNN) (Elman,
1990). Our model was also of this type. An RNN neuron has
an axon that branches and outputs its signal both into the neuron
itself and into subsequent neurons. Concretely, an RNN
neuron’s output at time $t$ is $h_t = f(w_x x_t + w_h h_{t-1})$,
where $f$ is the threshold activation function (see below
for more details), $x_t$ is the input at time $t$, $w_x$ is the synapse
weight between the input and the neuron, $h_{t-1}$ is the output of
that neuron at the previous time step $t-1$, and $w_h$ is the
synapse weight between the neuron’s output and itself. The
backward pass of an RNN is also known as “backpropagation
through time” (Elman, 1990), but, despite the
added time component, the basic logic remains the same.
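A minimal NumPy sketch of the recurrence $h_t = f(w_x x_t + w_h h_{t-1})$ follows. It generalises the single weights to weight matrices and, like the equation, omits bias terms. The toy weights are hypothetical. With a recurrent weight of 0.5, a single input pulse halves at every step, which gives a small-scale picture of how older inputs fade in a simple RNN.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def rnn_forward(inputs, W_x, W_h, f=relu):
    """Run h_t = f(W_x x_t + W_h h_{t-1}) over a sequence, starting from h_0 = 0.

    inputs: array of shape (T, input_dim); returns hidden states of shape (T, hidden_dim).
    """
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = f(W_x @ x_t + W_h @ h)   # the same weights are reused at every time step
        states.append(h)
    return np.stack(states)


# Hypothetical one-neuron example: input weight 1.0, recurrent (self) weight 0.5.
W_x = np.array([[1.0]])
W_h = np.array([[0.5]])
inputs = np.array([[1.0], [0.0], [0.0]])      # a single pulse at t = 1
print(rnn_forward(inputs, W_x, W_h)[:, 0].tolist())   # [1.0, 0.5, 0.25]
```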
As was mentioned above, simple off-the-shelf RNNs
could not categorise raw music scores, presumably due to the
network “forgetting” input from more than approximately ten
time steps earlier (Bengio, Frasconi, & Schmidhuber, 2001;
Goodfellow, Bengio, & Courville, 2016). Thus, our model
was enriched with four additional psychologically plausible
mechanisms.
Firstly, the model featured random shutdown of neurons
(also known as “Dropout”) (Hinton et al., 2012). This can be
viewed as an approximation of the neural refractory period,
the short period of time during which a neuron cannot readily
fire again (Deutsch, 1964). Functionally, dropping neurons out
from the learning process prevents overfitting and excessive
synaptic co-adaptation to patterns. Indeed, recent
neuroscience research suggests that artificially inducing
higher levels of neuronal dropout in biological brains (e.g., via
Deep Brain Stimulation) can both disrupt and improve
memory (Tan et al., 2020).
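A minimal NumPy sketch of dropout follows. It uses the "inverted" variant common in current libraries, where surviving units are scaled up during training instead of rescaling weights at test time as in the original formulation. The dropout rate of 0.5 is a hypothetical value; the text does not report the rate used.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Randomly silence a fraction `rate` of units during training (0 <= rate < 1).

    Surviving units are scaled by 1 / (1 - rate) so the expected activation is
    unchanged, and nothing is dropped at inference time.
    """
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate   # True = neuron stays active
    return activations * keep / (1.0 - rate)


rng = np.random.default_rng(0)
h = np.ones(10_000)
out = dropout(h, rate=0.5, rng=rng)   # hypothetical rate
print((out == 0).mean())   # roughly 0.5 of the units are silenced
print(out.mean())          # roughly 1.0: the expected activation is preserved
```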
Secondly, the activation function for the neurons was
chosen to be “ReLU” (rectified linear unit f(x) = max(0,x)).
Originally proposed as a more biologically plausible depiction
of the neural threshold function that is analogue as well as
digital in nature (Hahnloser et al., 2000), ReLU became one
of the key drivers behind the breakthrough in training
artificial neural networks with many layers (Nair & Hinton,
2010). Because ReLU is piecewise linear (its gradient is 1 for
every positive input) while remaining non-linear overall, it
substantially reduces the gradient saturation (and the ensuing
vanishing gradient problem) associated with traditional
neural activation functions such as the sigmoid and its variants,
like tanh.
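The saturation argument can be made concrete with the chain rule. Ignoring weights as a simplification, the gradient that reaches an early layer (or an early time step of an unrolled RNN) is a product of local activation derivatives. The sketch below compares that product for the sigmoid, whose derivative never exceeds 0.25, and for ReLU, whose derivative is 1 for positive inputs, over a hypothetical depth of 10.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # peaks at 0.25 when z = 0


def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # 1 for every positive input


depth = 10  # hypothetical number of chained units
print(sigmoid_grad(0.0) ** depth)  # 0.25**10, about 9.5e-07, even in the best case
print(sigmoid_grad(5.0) ** depth)  # far smaller once units saturate
print(relu_grad(5.0) ** depth)     # 1.0
```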
Our third addition was the “sliding attention window”, as
used with CHREST. The sliding attention window
passed the retrieved short word sequences to the model, and
the model generated a vote/prediction for each of the sequences.
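A minimal sketch of how a tokenised stimulus might be cut into fixed-length windows for the network follows. The window size of 20 tokens comes from the text. The step size (non-overlapping by default), the right-padding of the last window and the padding id 0 are assumptions made for illustration.

```python
def sliding_windows(token_ids, size=20, step=20, pad_id=0):
    """Split a tokenised stimulus into windows of exactly `size` tokens.

    step < size gives overlapping windows; the final window is right-padded
    with `pad_id` so every window has the same length.
    """
    windows = []
    for start in range(0, len(token_ids), step):
        window = list(token_ids[start:start + size])
        window += [pad_id] * (size - len(window))   # pad the last, shorter window
        windows.append(window)
        if start + size >= len(token_ids):
            break                                    # this window already reaches the end
    return windows


stimulus = list(range(1, 46))        # 45 hypothetical token ids
windows = sliding_windows(stimulus)
print(len(windows))                  # 3
print(windows[-1])                   # ids 41..45 followed by 15 padding zeros
```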
The fourth addition was the “LTM activation” mechanism
that aimed to resolve conflicts between votes for different
categories. As the model generated category “votes”, these
votes were aggregated and the overall winner was
declared by the confidence formula. The multidomain
confidence criterion was calculated by the same formula as
was used in the CHREST model and still had the aim of resolving
conflicting “voting” among category representations. The
important difference was that, this time, the voting conflict
was among different neural activations/engrams (as opposed
to different cognitive chunks with CHREST):

$$P(c = c_i \mid x) = \frac{a_i}{\sum_{j=1}^{m} a_j}$$

where $P(c = c_i \mid x)$ is the confidence that the category (i.e., author or
composer) is $c_i$, given a novel literature or music stimulus $x$;
$a_i$ is the neural engrams’ activation score corresponding to
that category (i.e., author or composer); and the denominator
is the sum of neural engram activations across all $m$ categories.
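The text does not specify how per-window votes are turned into engram activations, so the sketch below offers two hypothetical options. Either the network's per-window output probabilities are summed (soft votes), or each window's top category is counted (hard votes). Both are then passed through the same normalisation used for CHREST chunks. The softmax outputs are invented for illustration.

```python
import numpy as np

# Column order follows Table 2.
CATEGORIES = ["Chaucer", "Dickens", "Homer", "Joyce", "Scott", "Shakespeare",
              "Bach", "Beethoven", "Chopin", "Mozart"]


def engram_activations(window_probs, hard=False):
    """Aggregate per-window outputs (windows x categories) into one activation per
    category: summed probabilities, or argmax vote counts if hard=True."""
    window_probs = np.asarray(window_probs, dtype=float)
    if hard:
        winners = window_probs.argmax(axis=1)
        return np.bincount(winners, minlength=window_probs.shape[1]).astype(float)
    return window_probs.sum(axis=0)


def confidence(activations):
    """Same normalisation as for CHREST chunks: a_i / sum_j a_j."""
    return activations / activations.sum()


# Hypothetical softmax outputs for four windows of one music stimulus.
probs = np.zeros((4, len(CATEGORIES)))
probs[0, [8, 9]] = [0.3, 0.7]     # Chopin, Mozart
probs[1, [6, 9]] = [0.4, 0.6]     # Bach, Mozart
probs[2, [3, 8]] = [0.1, 0.9]     # Joyce (cross-modal), Chopin
probs[3, 9] = 1.0                 # Mozart

soft = confidence(engram_activations(probs))
hard = confidence(engram_activations(probs, hard=True))
print(CATEGORIES[int(soft.argmax())], round(float(soft[9]), 3))   # Mozart 0.575
print(CATEGORIES[int(hard.argmax())], float(hard[9]))             # Mozart 0.75
```

Under soft voting, a small probability assigned to a category from the other modality (Joyce here) survives into the final confidence scores. Hard voting discards it.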
Training and testing text files were converted to numeric
vectors using the TensorFlow Tokenizer library to streamline
processing. It should be noted that text vectorisation was done
for convenience, despite the questionable psychological
plausibility of such a mechanism. The plausibility of the
overall model would not be affected by this particular decision,
as artificial neural nets are capable of character recognition at
approximately human-level performance (LeCun et al.,
1999). The order of the training samples was randomised. No
notes or words were removed from either the training or testing data.
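A minimal pure-Python stand-in for the vectorisation step is sketched below. It is not the TensorFlow Tokenizer API. The reserved ids (0 for padding, 1 for unknown tokens), lower-casing, whitespace splitting and frequency ranking are assumptions. Only the 50,000 vocabulary size comes from the text. Out-of-vocabulary tokens are mapped to an "unknown" id rather than dropped, in line with the statement that no notes or words were removed.

```python
from collections import Counter

PAD_ID, UNKNOWN_ID = 0, 1   # reserved ids (assumption)


def build_vocab(texts, vocab_size=50_000):
    """Assign ids 2, 3, ... to the most frequent whitespace-separated tokens,
    keeping at most vocab_size ids in total (including the two reserved ones)."""
    counts = Counter(token for text in texts for token in text.lower().split())
    kept = [token for token, _ in counts.most_common(vocab_size - 2)]
    return {token: i for i, token in enumerate(kept, start=2)}


def to_ids(text, vocab):
    """Map every token to its id; unseen tokens become UNKNOWN_ID instead of being dropped."""
    return [vocab.get(token, UNKNOWN_ID) for token in text.lower().split()]


vocab = build_vocab(["the cat sat on the mat", "the dog sat"])
print(vocab)                                   # {'the': 2, 'sat': 3, 'cat': 4, 'on': 5, 'mat': 6, 'dog': 7}
print(to_ids("The dog sat on a mat", vocab))   # [2, 7, 3, 5, 1, 6]
```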
The model had around 8.5 million trainable parameters
and was trained for 20 epochs. The vocabulary size was set to
50,000; the size of the attention window was set to 20
words/chords/notes; the embedding dimension was set to 1.
In all other aspects, the current experiment was a complete
replica of the CHREST music and literature categorisation
experiment discussed above.
See for Python3 source code;
for Java implementation of CHREST with graphical user
interface and more documentation see
Both CHREST and deep learning models were able to learn
complex categories in the real-world music and literature
domains. They required no ad hoc additions to the core
architecture in order to simultaneously process music- or
literature-specific nuances. The descriptive statistics for both
models are summarised in Table 1.
CHREST’s categorisation performance was substantially
above chance: of the 120 tests across 10 categories (implying
12 correct answers by pure chance), 83 were classed correctly.
Within modalities, CHREST correctly categorised 41 out of
60 literature works and 42 out of 60 music scores. The deep
learning model’s categorisation performance was also
substantially above chance: of the 120 tests across 10
categories, 93 were classed correctly.
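The phrase "substantially above chance" can be expressed in numbers. The sketch below computes the chance baseline and an exact binomial tail probability, assuming each of the 120 tests is an independent guess with a 1/10 probability of being correct. This is an illustrative check, not an analysis reported in the study.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more hits by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_tests, n_categories = 120, 10
p_guess = 1 / n_categories
print(n_tests / n_categories)   # 12.0 correct answers expected by chance

for model, correct in [("CHREST", 83), ("Deep learning", 93)]:
    print(f"{model}: {correct / n_tests:.1%} correct, "
          f"P(>= {correct} by chance) = {binomial_tail(correct, n_tests, p_guess):.1e}")
```

Under this assumption, both tail probabilities are vanishingly small, far below any conventional significance threshold.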
CHREST and ANN models made similar quantitative
predictions. The CHREST model had the highest rates of correct
predictions for Bach, Mozart, and Beethoven in music and
Chaucer, Homer, Shakespeare, and Dickens in literature. Bach
and Chaucer had the highest confidence scores for their
respective modalities. Chopin and Joyce had the lowest
confidence scores as well as the lowest rates of correct predictions.
The same pattern of results was true for the deep learning
model. One notable outlier was the discrepancy on the Walter
Scott category, where CHREST had 4/10 correct predictions
while the deep learning model had a 9/10 score.
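The per-category rates in Table 1 can be converted back to counts as a consistency check. The sketch assumes the 60 music scores were split evenly (15 per composer) and the 60 literature pieces evenly (10 per author). Under that assumption, the rounded counts reproduce the totals given above: 83 for CHREST (41 literature, 42 music) and 93 for the deep learning model.

```python
# Accuracy rates (%) from Table 1 as (CHREST, deep learning).
TABLE1_ACCURACY = {
    "Bach": (100, 93), "Beethoven": (93, 87), "Chopin": (20, 27), "Mozart": (67, 67),
    "Chaucer": (100, 100), "Dickens": (70, 100), "Homer": (90, 100),
    "Joyce": (20, 40), "Scott": (40, 90), "Shakespeare": (90, 90),
}
MUSIC = {"Bach", "Beethoven", "Chopin", "Mozart"}


def correct_counts(model_index):
    """Turn rounded percentages into counts, assuming 15 test scores per composer
    (60 / 4) and 10 test works per author (60 / 6)."""
    return {
        name: round(rates[model_index] * (15 if name in MUSIC else 10) / 100)
        for name, rates in TABLE1_ACCURACY.items()
    }


for index, model in enumerate(["CHREST", "Deep learning"]):
    counts = correct_counts(index)
    music = sum(n for name, n in counts.items() if name in MUSIC)
    literature = sum(counts.values()) - music
    print(model, "music:", music, "literature:", literature, "total:", music + literature)
# CHREST music: 42 literature: 41 total: 83
# Deep learning music: 41 literature: 52 total: 93
```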
There were no mistakes across modalities by either model:
literature was never categorised as music and vice versa.
This implies that while the models were taught to classify 10
types of regularities, they formed (empirically) distinct
memory chunks/engrams that separate the domains of music
and literature, as was evident from the overall winning
confidence scores. However, while CHREST had no
activations across modalities, they were occasionally present
for the deep learning model. Many stimuli showed none: for
example, the first four Scott pieces generated various
activations across literature authors but had zero activations
for any music composer. Other stimuli, however, did generate
a “multi-modality” engram activation pattern. For instance,
Mozart’s Fantasia in D activated an engram that was made up
of 76% of Mozart’s representations, but also 3% of Joyce’s
(see Table 2).
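Cross-modal activation can be quantified as the share of confidence that a stimulus assigns to categories from the other modality. The sketch below applies this measure to four deep learning rows copied from Table 2. The measure and the function name are illustrative; they are not part of the original analysis.

```python
LITERATURE = ["Chaucer", "Dickens", "Homer", "Joyce", "Scott", "Shakespeare"]
MUSIC = ["Bach", "Beethoven", "Chopin", "Mozart"]

# Deep learning confidence scores copied from Table 2 (columns: LITERATURE, then MUSIC).
DL_ROWS = {
    ("Mozart", "Fantasia In D, K397"): [0.00, 0.00, 0.00, 0.03, 0.00, 0.00, 0.00, 0.00, 0.21, 0.76],
    ("Mozart", "Adagio In B Flat"): [0.00, 0.00, 0.02, 0.10, 0.00, 0.01, 0.01, 0.08, 0.16, 0.61],
    ("Scott", "Talisman Part 2"): [0.00, 0.13, 0.20, 0.04, 0.51, 0.11, 0.00, 0.00, 0.00, 0.00],
    ("Scott", "The Lay Of The Last Minstrel"): [0.04, 0.02, 0.07, 0.11, 0.26, 0.48, 0.00, 0.00, 0.02, 0.00],
}


def cross_modal_share(author, scores):
    """Sum of the confidence given to categories from the other modality."""
    literature_scores, music_scores = scores[:len(LITERATURE)], scores[len(LITERATURE):]
    return sum(literature_scores) if author in MUSIC else sum(music_scores)


for (author, piece), scores in DL_ROWS.items():
    print(f"{author} / {piece}: {cross_modal_share(author, scores):.2f}")
# Mozart / Fantasia In D, K397: 0.03
# Mozart / Adagio In B Flat: 0.13
# Scott / Talisman Part 2: 0.00
# Scott / The Lay Of The Last Minstrel: 0.02
```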
A. Summarising the Results
Both cognitive and neural models were able to learn real-life
highly multidimensional categories while learning from raw
data only, without bootstrapping to pre-made knowledge
structures and feature detectors.
The comparison of the performance obtained by the CHREST
and deep learning models is intriguing and
warrants further investigation. From a high-level perspective,
both models demonstrated the capability of learning concepts
in complex, dissimilar domains (linguistic in the case of
literature and non-linguistic in the case of music).
Beyond the qualitative similarity, the CHREST and deep-
learning models also made similar quantitative predictions.
CHREST and deep learning also share functional similarities.
This was demonstrated using the same activation mechanism
for both CHREST and deep-learning models. Indeed, the
proposed activation formula/code was completely
interchangeable between the models and required no
adjustment to measure the activation of chunks or the
activation of neural engrams. In this technical sense, cognitive
chunks may be said to be equivalent to neural engrams. The
attention window mechanism was also similar for the two
types of models.
Nevertheless, this similarity of the two models was not
absolute. While CHREST had no cross-modality activations
at all, the deep-learning model had slight (mostly around 1-
5%) activations on some of the cross-modal representations.
This being said, the overall distinct clustering of literature and
music was true for both models. The occasional and mostly
small cross modal activations of the deep-learning model may
need further investigation: while the result above may
represent statistical noise of the artificial neurons, a similar
mechanism may also help to elucidate the complex
phenomenon of synaesthesia (Ward & Simner, 2022).
B. Constraining the Infinite Space of Candidate Theories
One potential criticism of our study is that having two models
arriving at similar behaviour by different means does not
necessarily imply that the models are equivalent in any but the
broadest sense. Indeed, in a trivial example where x = 2 and
f(x) = 4, it does not make sense to talk about the equivalence
of functions such as $f(x) = 2x$, $f(x) = x^3 - x^2$, $f(x) = 4\sin(\pi / x)$,
and so on. Of the similarly infinite number of functions that
may potentially represent a working cognitive system, which
one did nature choose to make a human categorise a novel
music piece as a “Beethoven”? Similarly, to what extent can
psychological models claim convergence with humans and
with each other? To put it yet another way, given two points
“A” and “B”, there is an infinite number of paths that lead
from point “A” to point “B”; how would we decide which
path to take? In principle, there is no definitive answer, a
predicament commonly known as Hume’s problem of
induction (Hume, 1748). We relied on three pragmatic means
to address this problem in the current study.
Table 1. Categorisation performance summary for CHREST and
deep learning models.
                          CHREST                          Deep Learning
Modality    Category      Accuracy (%)  Mean confidence   Accuracy (%)  Mean confidence
Music       Bach          100           0.50              93            0.56
            Beethoven     93            0.44              87            0.46
            Chopin        20            0.31              27            0.34
            Mozart        67            0.42              67            0.52
Literature  Chaucer       100           0.50              100           0.85
            Dickens       70            0.26              100           0.49
            Homer         90            0.31              100           0.58
            Joyce         20            0.17              40            0.37
            Scott         40            0.20              90            0.39
            Shakespeare   90            0.34              90            0.50
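The underdetermination point in the $f(x)$ example can be checked directly. The three functions named in the text all fit the single observation $f(2) = 4$ but disagree as soon as a second input is tried. The choice of $x = 3$ is arbitrary.

```python
import math

# Three candidate functions that all fit the single observation f(2) = 4.
candidates = {
    "2x": lambda x: 2 * x,
    "x^3 - x^2": lambda x: x**3 - x**2,
    "4 sin(pi / x)": lambda x: 4 * math.sin(math.pi / x),
}

for name, f in candidates.items():
    # All agree at x = 2 (up to floating-point error) but diverge at x = 3.
    print(f"{name:>14}: f(2) = {f(2):.3f}, f(3) = {f(3):.3f}")
```

Agreement on one data point says little about which function generated it. This is why additional constraints, rather than task success alone, are needed to choose between candidate models.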
Firstly, the simplest answer would be to choose the path
that satisfies some other constraints: for example, the shortest
path, or a path through the gates “X”, “Y”, “Z”, and so on. In
terms of choosing a psychologically plausible computational
approach, one should choose a method that satisfies multiple
constraints, e.g., postdicting past psychological experimental
data as well as predicting findings that have not yet been
reported (Newell, 1990). Our CHREST and deep-learning
models both satisfy these constraints as they incorporate
fundamental psychological mechanisms and structures, and
are rooted in decades of psychological research. For example,
our CHREST model features the STM, LTM, chunking,
familiarisation, discrimination, association and attention;
while our ANN model has dendrites, axons, threshold
activation and a refractory period. While not the focus of the
current study, the CHREST family of models also successfully
simulates human timings and learning rates in a variety of
cognitive experiments (Gobet, Lane, & Lloyd-Kelly, 2015).
As was mentioned above, a recent review in Nature Neuroscience concluded
that deep learning models offer profound insights into the
working of the brain (Richards et al., 2019). This means that,
in terms of the “path from A to B” analogy above, our models
not only connect the “A” and “B”, but also pass the “X”, “Y”,
“Z” gates/constraints that are relevant to psychology. This is
unlike other hypothetical models of categorisation (e.g.,
semantic parsers, support vector machines, etc.) that connect
the “A” and “B” without passing the gates and thus make up
the unconstrained infinite space of candidate models. More
broadly, this aspect forms a crucial difference between
computer science (where all models are “equal” as long as
they succeed at a task) and psychology (where
“psychologically plausible” models are desired). For instance,
Deep Blue is not considered to be a psychologically plausible
model of human chess playing (Gobet, 1997a) due to its
reliance on brute-force search, whereas AlphaZero is more psychologically plausible
in how it relies on pattern recognition (as well as incorporating
Table 2. An excerpt of individual categorisation confidence scores by CHREST and deep learning models. Red colour signifies zero memory
activation and darker shades of green signify stronger activation. Numbers in bold denote the highest confidence prediction on a given novel stimulus.
CHREST model
Author File Chaucer Dickens Homer Joyce Scott Shakespeare Bach Beethoven Chopin Mozart Correct
Mozart A Piece For Piano K176 0.00 0.00 0.00 0.00 0.00 0.00 0.60 0.16 0.03 0.21 FALSE
Adagio In B Flat 0.00 0.00 0.00 0.00 0.00 0.00 0.12 0.04 0.29 0.55 TRUE
Fantasia In C K.475 0.00 0.00 0.00 0.00 0.00 0.00 0.14 0.20 0.28 0.38 TRUE
Fantasia In D, K397 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.06 0.31 0.58 TRUE
K309 Piano Sonata N10 1mov 0.00 0.00 0.00 0.00 0.00 0.00 0.44 0.24 0.11 0.22 FALSE
Scott Talisman Part 2 0.05 0.22 0.21 0.08 0.26 0.17 0.00 0.00 0.00 0.00 TRUE
Talisman Part 3 0.05 0.21 0.20 0.14 0.25 0.16 0.00 0.00 0.00 0.00 TRUE
Talisman Part 4 0.11 0.21 0.21 0.10 0.16 0.22 0.00 0.00 0.00 0.00 FALSE
Talisman Part 5 0.11 0.17 0.26 0.11 0.17 0.19 0.00 0.00 0.00 0.00 FALSE
The Lay Of The Last Minstrel 0.04 0.13 0.11 0.15 0.17 0.39 0.00 0.00 0.00 0.00 FALSE
Beethoven Piano Sonata N08 Op13 1mov 0.00 0.00 0.00 0.00 0.00 0.00 0.31 0.44 0.14 0.10 TRUE
Piano Sonata N08 Op13 3mov 0.00 0.00 0.00 0.00 0.00 0.00 0.46 0.32 0.12 0.11 FALSE
Piano Sonata N09_1 0.00 0.00 0.00 0.00 0.00 0.00 0.33 0.50 0.10 0.06 TRUE
Piano Sonata N09_2 0.00 0.00 0.00 0.00 0.00 0.00 0.34 0.51 0.09 0.06 TRUE
Piano Sonata N10 1mov 0.00 0.00 0.00 0.00 0.00 0.00 0.41 0.41 0.08 0.09 TRUE
Joyce A_Portrait_Of_The_Artist _1 0.05 0.25 0.14 0.24 0.20 0.11 0.00 0.00 0.00 0.00 FALSE
A_Portrait_Of_The_Artist _2 0.04 0.20 0.14 0.24 0.21 0.16 0.00 0.00 0.00 0.00 TRUE
A_Portrait_Of_The_Artist _3 0.02 0.23 0.20 0.23 0.10 0.21 0.00 0.00 0.00 0.00 TRUE
Finnegans_Wake_1 0.12 0.12 0.19 0.18 0.10 0.29 0.00 0.00 0.00 0.00 FALSE
Finnegans_Wake_2 0.13 0.16 0.18 0.09 0.16 0.29 0.00 0.00 0.00 0.00 FALSE
Deep Learning model
Author File Chaucer Dickens Homer Joyce Scott Shakespeare Bach Beethoven Chopin Mozart Correct
Mozart A Piece For Piano K176 0.00 0.00 0.00 0.00 0.00 0.00 0.57 0.14 0.05 0.24 FALSE
Adagio In B Flat 0.00 0.00 0.02 0.10 0.00 0.01 0.01 0.08 0.16 0.61 TRUE
Fantasia In C K.475 0.00 0.00 0.00 0.05 0.00 0.01 0.01 0.09 0.40 0.43 TRUE
Fantasia In D, K397 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.00 0.21 0.76 TRUE
K309 Piano Sonata N10 1mov 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.46 0.11 0.18 FALSE
Scott Talisman Part 2 0.00 0.13 0.20 0.04 0.51 0.11 0.00 0.00 0.00 0.00 TRUE
Talisman Part 3 0.00 0.11 0.09 0.05 0.54 0.21 0.00 0.00 0.00 0.00 TRUE
Talisman Part 4 0.00 0.15 0.15 0.09 0.38 0.24 0.00 0.00 0.00 0.00 TRUE
Talisman Part 5 0.00 0.11 0.25 0.05 0.38 0.22 0.00 0.00 0.00 0.00 TRUE
The Lay Of The Last Minstrel 0.04 0.02 0.07 0.11 0.26 0.48 0.00 0.00 0.02 0.00 FALSE
Beethoven Piano Sonata N08 Op13 1mov 0.00 0.00 0.00 0.02 0.00 0.02 0.13 0.38 0.14 0.30 TRUE
Piano Sonata N08 Op13 3mov 0.00 0.00 0.00 0.00 0.00 0.00 0.43 0.42 0.05 0.10 FALSE
Piano Sonata N09_1 0.00 0.00 0.00 0.00 0.00 0.00 0.18 0.49 0.18 0.16 TRUE
Piano Sonata N09_2 0.00 0.00 0.00 0.00 0.00 0.02 0.23 0.58 0.12 0.05 TRUE
Piano Sonata N10 1mov 0.00 0.00 0.00 0.00 0.00 0.00 0.41 0.40 0.05 0.15 FALSE
Joyce A_Portrait_Of_The_Artist _1 0.00 0.45 0.07 0.35 0.11 0.02 0.00 0.00 0.01 0.01 FALSE
A_Portrait_Of_The_Artist _2 0.00 0.47 0.01 0.34 0.09 0.09 0.00 0.00 0.00 0.00 FALSE
A_Portrait_Of_The_Artist _3 0.00 0.38 0.04 0.46 0.07 0.04 0.00 0.00 0.00 0.00 TRUE
Finnegans_Wake_1 0.00 0.42 0.05 0.41 0.03 0.09 0.00 0.00 0.00 0.00 FALSE
Finnegans_Wake_2 0.03 0.41 0.03 0.34 0.06 0.12 0.00 0.00 0.00 0.00 FALSE
some neural mechanisms) (Gobet & Waters, 2023; Silver et
al., 2017).
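To make the reading of Table 2 concrete, the short sketch below recomputes the "Correct" column for the Scott rows, assuming (as the caption suggests) that a prediction counts as correct when the highest-scoring category is the true author. It also reports a simple per-file agreement rate between the two models. The helper names `predict` and `summarise` and the agreement measure are illustrative assumptions, not the paper's own convergence statistic; the Scott rows are used because the published scores are rounded to two decimals, so rows with rounded ties (e.g., CHREST's 0.41/0.41 on Piano Sonata N10 1mov) cannot be resolved from the table alone.

```python
# Column order used by Table 2.
LABELS = ["Chaucer", "Dickens", "Homer", "Joyce", "Scott", "Shakespeare",
          "Bach", "Beethoven", "Chopin", "Mozart"]

# Scott rows copied from Table 2 (true author: Scott).
CHREST = {
    "Talisman Part 2": [0.05, 0.22, 0.21, 0.08, 0.26, 0.17, 0.00, 0.00, 0.00, 0.00],
    "Talisman Part 3": [0.05, 0.21, 0.20, 0.14, 0.25, 0.16, 0.00, 0.00, 0.00, 0.00],
    "Talisman Part 4": [0.11, 0.21, 0.21, 0.10, 0.16, 0.22, 0.00, 0.00, 0.00, 0.00],
    "Talisman Part 5": [0.11, 0.17, 0.26, 0.11, 0.17, 0.19, 0.00, 0.00, 0.00, 0.00],
    "The Lay Of The Last Minstrel": [0.04, 0.13, 0.11, 0.15, 0.17, 0.39, 0.00, 0.00, 0.00, 0.00],
}
DEEP = {
    "Talisman Part 2": [0.00, 0.13, 0.20, 0.04, 0.51, 0.11, 0.00, 0.00, 0.00, 0.00],
    "Talisman Part 3": [0.00, 0.11, 0.09, 0.05, 0.54, 0.21, 0.00, 0.00, 0.00, 0.00],
    "Talisman Part 4": [0.00, 0.15, 0.15, 0.09, 0.38, 0.24, 0.00, 0.00, 0.00, 0.00],
    "Talisman Part 5": [0.00, 0.11, 0.25, 0.05, 0.38, 0.22, 0.00, 0.00, 0.00, 0.00],
    "The Lay Of The Last Minstrel": [0.04, 0.02, 0.07, 0.11, 0.26, 0.48, 0.00, 0.00, 0.02, 0.00],
}


def predict(scores):
    """Return the label with the highest confidence (the first one on ties)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return LABELS[best]


def summarise(chrest, deep, true_label):
    """Accuracy of each model and how often the two models make the same call."""
    files = list(chrest)
    c_pred = [predict(chrest[f]) for f in files]
    d_pred = [predict(deep[f]) for f in files]
    n = len(files)
    return {
        "chrest_accuracy": sum(p == true_label for p in c_pred) / n,
        "deep_accuracy": sum(p == true_label for p in d_pred) / n,
        "agreement": sum(c == d for c, d in zip(c_pred, d_pred)) / n,
    }


result = summarise(CHREST, DEEP, "Scott")
print(result)  # {'chrest_accuracy': 0.4, 'deep_accuracy': 0.8, 'agreement': 0.6}
```

The recomputed correctness matches the TRUE/FALSE entries in Table 2 for these five files; note that the two models can agree with each other even when both are wrong (both pick Shakespeare for The Lay Of The Last Minstrel).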
The second important constraint is the inherent difficulty
of problems, such as inverse optics and inverse kinematics,
that were long considered “ill-posed” yet are routinely solved
by the brains/minds of various organisms (Palmer, 1999;
Pizlo, 2001). Categorising music and literature is one such
“ill-posed” problem. Psychological models of concept
learning have traditionally struggled with such tasks and resorted to
artificial categories with a few dimensions (e.g.,
Braunlich & Love, 2022; Nosofsky, 2011), natural categories
pre-processed into a few dimensions (e.g., Konovalova &
Le Mens, 2018; Nosofsky, 2011), or bootstrapping from
hand-crafted knowledge structures such as semantic dictionaries
(e.g., Lieto, 2019). More recently, there has been a move to
combine psychological models with deep learning, where
ANNs do the heavy lifting of learning from raw data
(Battleday et al., 2020). In the current study, both models learn
from highly multidimensional raw data without bootstrapping,
with CHREST performing all the required learning with its
own mechanisms.
The third crucial constraint is the “single algorithm
hypothesis”, which proposes that the visual, auditory, motor and
somatosensory cortices use approximately the same
algorithm to extract approximately one type of data structure
from various types of information (i.e., visual, auditory,
motor, etc.) (Hawkins & Blakeslee, 2004; Mountcastle, 1978).
In this context, the similarity between CHREST and deep
learning is constrained in two important ways. Firstly, the
chunk-based and engram-based models converged in their
classification of two relatively dissimilar complex domains
(literature and music). Secondly, both
approximations of this “single algorithm” shared a common
activation mechanism that works on both cognitive chunks
and neural engrams. That said, a conclusive
guarantee that the models’ overlap provides a unique
explanation is impossible (Lakatos, 1970). See Lieto (2022)
for further discussion of “function only” (functionalist) versus
“function + constraints” (structural) models, as well as a more
general framework for evaluating bio-inspired models.
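The idea of one activation mechanism serving both chunks and engrams can be illustrated with a deliberately simple, hypothetical rule; this section does not specify the paper's actual mechanism, so everything below is an assumption. The sketch assumes that a stored unit is activated when its whole pattern occurs in the input, that each activated unit votes for its category, and that votes are normalised to sum to one (the rows of Table 2 do sum to approximately one, which is the only grounding for this choice). The function `activation_confidences`, the `Unit` type, the 0.5 threshold and the toy chunks and engrams are all invented for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, Hashable, Iterable, List, Sequence, Tuple

# Hypothetical stored unit: a pattern plus a category label. For a chunk the
# pattern could be a set of words; for an engram, a set of active hidden units.
Unit = Tuple[FrozenSet[Hashable], str]


def activation_confidences(
    stimulus: Iterable[Hashable],
    memory: Sequence[Unit],
    categories: Sequence[str],
) -> Dict[str, float]:
    """Hypothetical shared activation rule for chunks and engrams.

    A unit is activated when its whole pattern occurs in the stimulus; each
    activated unit votes for its label, and votes are normalised to sum to 1.
    """
    present = set(stimulus)
    votes = Counter(label for pattern, label in memory if pattern <= present)
    total = sum(votes.values())
    if total == 0:
        # Nothing activated: zero confidence for every category.
        return {c: 0.0 for c in categories}
    return {c: votes[c] / total for c in categories}


# Chunks: toy word patterns (invented examples).
chunks: List[Unit] = [
    (frozenset({"the", "knight"}), "Scott"),
    (frozenset({"thou", "art"}), "Shakespeare"),
    (frozenset({"the"}), "Dickens"),
]
print(activation_confidences("the knight rode on".split(), chunks,
                             ["Dickens", "Scott", "Shakespeare"]))
# {'Dickens': 0.5, 'Scott': 0.5, 'Shakespeare': 0.0}

# Engrams: toy sets of hidden-unit indices (invented examples).
engrams: List[Unit] = [
    (frozenset({0}), "Bach"),
    (frozenset({0, 2}), "Mozart"),
    (frozenset({1}), "Chopin"),
]
hidden = [0.9, 0.1, 0.7, 0.2]  # toy hidden-layer activations
active_units = [i for i, a in enumerate(hidden) if a > 0.5]  # assumed threshold
print(activation_confidences(active_units, engrams, ["Bach", "Chopin", "Mozart"]))
# {'Bach': 0.5, 'Chopin': 0.0, 'Mozart': 0.5}
```

The point of the sketch is only structural: the same function is applied unchanged to a store of symbolic chunks and to a store of hidden-unit patterns, which is one way to read "a common activation mechanism".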
C. Future Research and Conclusions
We focused on one of the most complex, yet most
fundamental, psychological processes: concept learning.
Future research using a similar methodology may
focus on other cognitive phenomena that involve chunking
(e.g., working memory, expertise, acquisition of grammar,
verbal learning and reasoning, among many others) to further
ground chunking mechanisms in the neural substrate. For
example, it has been established that human working memory
holds around seven chunks at a time (Baddeley, 1986;
Miller, 1956; Robbins, 1995), and that the LTM of human
experts typically holds between one hundred thousand and five hundred thousand
chunks of domain-specific information (Gobet &
Simon, 1998; Richman et al., 1996). One natural extension of
the current study would be to adapt the chunk/engram
activation mechanism proposed in the current paper to
translate the above work on chunking to connectionist models.
Such work would be of obvious benefit to both psychology
and AI. On the psychology side, it would further bridge
cognitive psychology, neuroimaging studies of chunking
(Guida et al., 2012) and computational neuroscience. On the
AI side, establishing chunking mechanisms in deep learning
architectures would allow for better interpretability of the
The correspondence between chunking and other, non-RNN,
deep learning architectures also warrants further investigation.
We chose RNNs because they have deep roots in psychology
(this was important for the constraint saturation aspects that
were discussed above). RNNs may also be considered
closely related to CNNs (Convolutional Neural Networks)
(LeCun et al., 1999), LSTMs (Long Short-Term Memory networks)
(Hochreiter & Schmidhuber, 1997) and GRUs (Gated
Recurrent Units) (Cho et al., 2014); indeed, our current
engram activation mechanism is compatible with all of these
architectures. On the other hand, the latest advances in
deep learning based on the transformer architecture
(Vaswani et al., 2017) differ radically in important
ways (e.g., in how they model attention) and require
a separate study.
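One way to see why an engram read-out could carry over across recurrent architectures is that it only needs a hidden-state vector, whatever cell produced it. The sketch below runs a simple (Elman) recurrent cell and a GRU cell (in the form of Cho et al., 2014, with biases omitted) over the same toy sequence and applies one read-out to both final states. It assumes, purely for illustration, that an engram is the set of hidden units whose activation exceeds a threshold; the weights are random and untrained, and the names `elman_step`, `gru_step`, `final_state` and `engram` are hypothetical, not the paper's implementation. An LSTM would slot in the same way by reading out its hidden state.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def elman_step(params, x, h):
    """Simple (Elman) recurrent step: h' = tanh(W x + U h); biases omitted."""
    W, U = params
    return [math.tanh(a) for a in vadd(matvec(W, x), matvec(U, h))]


def gru_step(params, x, h):
    """GRU step in the form of Cho et al. (2014); biases omitted."""
    Wz, Uz, Wr, Ur, Wn, Un = params
    z = [sigmoid(a) for a in vadd(matvec(Wz, x), matvec(Uz, h))]  # update gate
    r = [sigmoid(a) for a in vadd(matvec(Wr, x), matvec(Ur, h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a) for a in vadd(matvec(Wn, x), matvec(Un, rh))]  # candidate state
    # Interpolate between the previous state and the candidate state.
    return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def final_state(step, params, xs, n_hid):
    """Run a recurrent cell over a sequence and return the last hidden state."""
    h = [0.0] * n_hid
    for x in xs:
        h = step(params, x, h)
    return h


def engram(h, threshold=0.0):
    """Hypothetical read-out: the set of hidden units active above threshold."""
    return frozenset(i for i, a in enumerate(h) if a > threshold)


rng = random.Random(0)
n_in, n_hid = 3, 5
elman = (rand_matrix(n_hid, n_in, rng), rand_matrix(n_hid, n_hid, rng))
# GRU weights in the order Wz, Uz, Wr, Ur, Wn, Un (input vs recurrent matrices).
gru = tuple(rand_matrix(n_hid, n_in if k % 2 == 0 else n_hid, rng) for k in range(6))
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy one-hot sequence

# The same read-out applies unchanged to both cells' hidden states.
print("Elman engram:", sorted(engram(final_state(elman_step, elman, xs, n_hid))))
print("GRU engram:  ", sorted(engram(final_state(gru_step, gru, xs, n_hid))))
```

Because the read-out touches only the hidden state, swapping the cell does not change it; a transformer, by contrast, has no single recurrent state of this kind, which is consistent with treating it as a separate question.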
The current study addresses a long-standing feud between
the symbolic/serial processing and subsymbolic/parallel
processing approaches to modelling cognition. Generally, a
focus on perception and bottom-up processes is attributed to
the subsymbolic approach, while a focus on heuristics,
symbol manipulation and high levels of abstraction is
considered characteristic of the symbolic approach (for a
review, see Lieto, 2021). It is important to note that this
distinction was not universally accepted. Indeed, Simon
and Newell considered perception to be a vital component of
symbolic models of intelligence, together with a physical
symbol system (Newell, 1990; Simon, 1981). Our CHREST
model also adheres to their view (which is natural, as it is
closely related to their models). However, contrary to Simon’s
intuitions (Simon, 1991, pp. 81-83) and more in line with
Newell’s thinking (Newell, 1990, p. 487), our results
show that the serial and parallel approaches converge in
important ways (with cognitive chunks and
neural engrams being equivalent in a narrow technical sense),
and that there are multiple levels of explicit representation:
a mind-level representation and a brain-level representation.
To conclude, the current paper further bridges the gap
between cognition and its neural substrate by demonstrating
profound convergence between a rigorous cognitive
psychology-based model and its neuroscience-based
counterpart. Our findings connect the mechanisms proposed
by chunking theory to the neural network modelling approach,
and make further inroads towards a Unified Theory of
Cognition (Newell, 1990).
