Topics in Cognitive Science (2019) 1–15
©2019 The Authors Topics in Cognitive Science published by Wiley Periodicals, Inc. on behalf of Cognitive Science Society
ISSN: 1756-8765 online
DOI: 10.1111/tops.12442
This article is part of the topic “Learning Grammatical Structures: Developmental, Cross-species and Computational Approaches,” Carel ten Cate, Clara Levelt, Judit Gervain, Chris Petkov and Willem Zuidema (Topic Editors).
Hierarchical Structure in Sequence Processing: How to
Measure It and Determine Its Neural Implementation
Julia Uddén, Mauricio de Jesus Dias Martins, Willem Zuidema, W. Tecumseh Fitch
Department of Psychology, Department of Linguistics, Stockholm University
Swedish Collegium for Advanced Study (SCAS)
Berlin School of Mind and Brain, Humboldt-Universität zu Berlin
Max Planck Institute for Human Cognitive and Brain Sciences
Clinic for Cognitive Neurology, University Hospital Leipzig
Institute for Logic, Language and Computation, University of Amsterdam
Department of Cognitive Biology, Faculty of Life Sciences, University of Vienna
Received 1 April 2018; received in revised form 17 June 2019; accepted 17 June 2019
In many domains of human cognition, hierarchically structured representations are thought to
play a key role. In this paper, we start with some foundational definitions of key phenomena like
“sequence” and “hierarchy," and then outline potential signatures of hierarchical structure that can
be observed in behavioral and neuroimaging data. Appropriate behavioral methods include classic
ones from psycholinguistics along with some from the more recent artificial grammar learning and
sentence processing literature. We then turn to neuroimaging evidence for hierarchical structure with
a focus on the functional MRI literature. We conclude that, although a broad consensus exists about
a role for a neural circuit incorporating the inferior frontal gyrus, the superior temporal sulcus, and
the arcuate fasciculus, considerable uncertainty remains about the precise computational function(s) of this circuitry. An explicit theoretical framework, combined with an empirical approach focusing on distinguishing between plausible alternative hypotheses, will be necessary for further progress.

Correspondence should be sent to Julia Uddén, Department of Psychology, Stockholm University, SE-106 91 Stockholm, Sweden; or to W. Tecumseh Fitch, Department of Cognitive Biology, University of Vienna, Althanstrasse 14, 1090 Vienna, Austria.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
Keywords: Hierarchical structure; Sequence processing; Nested grouping; Neural signatures
1. The challenge of hierarchy
Since the cognitive revolution, the cognitive and neurosciences have sought an account
of perception, motor control, and higher cognitive faculties such as language and memory in
terms of specific representations. In several cognitive domains, including most promi-
nently language, a seminal suggestion (Chomsky, 1957; Lashley, 1951; Simon, 1962) is
that the human mind creates hierarchical representations, even when the sensory input is
sequentially presented (or the output is a sequence of actions).
For most linguists, the hierarchical nature of linguistic representations is self-evident, and
most explicit theories of language processing take hierarchical abilities as a given, as do
several theories of musical structure (Fitch, 2013; Lerdahl & Jackendoff, 1983). However,
empirically demonstrating the existence of hierarchical structure in cognition, particularly
outside of language, remains a challenge for at least two reasons. One is terminological:
because scholars use the term “hierarchical” in different ways, a valid test for one concep-
tion of hierarchy may not apply to another. The second is more interesting and substantial:
Our lack of direct access to cognitive structures means that certain types of hierarchies can-
not be distinguished, empirically, from other structures (e.g., sequences).
Current controversy in neurolinguistics illustrates this point. Distinct theories of syntax
posit different hierarchical operations, leading researchers to analyze the neural basis of
syntax in terms of “move” (Caplan, 1987), “merge” (Berwick et al., 2013), or “unify”
(Hagoort, 2005). Additional hierarchy-building operators are also available (e.g., “adjoin”
in tree adjoining grammar; Joshi, 2003). However, empirically distinguishing among such
fine theoretical distinctions is a true challenge, if it is possible at all, due to the “granularity
mismatch” problem (Embick & Poeppel, 2015). Future progress will require a unified
perspective broad enough to capture what these syntactic operators have in common, but
specific enough to distinguish hierarchical from sequential processing.
In this paper, we thus begin with unambiguous, explicit definitions of key concepts for
the following discussion, especially “hierarchy,” “sequence,” and “tree” (see also Fitch,
2013; Zuidema, Hupkes, Wiggins, Scharff, & Rohrmeier, 2018). This provides an explicit
framework for the following review, which focuses on how to empirically distinguish
between hierarchical and sequential processing in different domains. Our goal is not to
develop a new theory of syntax or hierarchy, but rather to use well-established terminol-
ogy from mathematics (especially graph theory) to clarify and sharpen our subsequent
data-focused review. Only from such a general perspective will it be possible to deter-
mine whether the behavioral and neural signatures of hierarchy differ between domains.
With these clarifications in hand, we then turn to our main focus: critically reviewing
possible empirical indicators of hierarchical structure and/or processing that have been
proposed, beginning with behavioral data and then turning to neuroimaging data. We
argue that despite considerable controversy concerning terminology and theory, there is
consistent converging evidence that a specific frontotemporal network is involved in hier-
archy building, and this network is similarly activated by hierarchical processing in dif-
ferent domains (especially music and language).
2. Definitions
Our goal is to formulate a general but explicit classification of different hierarchical and
non-hierarchical structures, allowing comparisons of linguistic hierarchical structure and
processing with that in other domains such as music or social cognition. Given this goal,
we must avoid formulations that prematurely embody or entail language- or music-specific
constructs (e.g., c-command or musical key) while allowing space for those constructs as
special cases. To achieve this, we adopt the overarching terminology of graph theory, in
which such fundamental notions as “sequence," “tree," and “network” can be explicitly
defined. Graph theory is clear, well-developed, widely known, and widely used in computer
science, as well as neuroscience (Bullmore & Sporns, 2009). Although other formulations
are possible, particularly for domain-specific applications, they lack the combination of
generality and clarity that we aim for here. For example, there are set-theoretic models of
syntax, providing an alternative formulation of hierarchical containment relations via sub-
sets, as well as model theoretic accounts and vector-space models (see, e.g., Pullum &
Scholz, 2001; Repplinger, Beinborn, & Zuidema, 2018). The difficulty is that sets are by
definition unordered, thus excluding the core notion of “sequence.” Furthermore, mathe-
matical sets cannot contain duplicates (while graphs can) and thus are ill-suited as repre-
sentations of sentences with repeated words or melodies with repeated notes.
2.1. Hierarchical and sequential structure
The notion of hierarchical structure that we are interested in contrasts with sequential
structure. How can we define these terms formally? We examine these concepts from the
perspective of graph theory and computing theory (see Harel & Feldman, 1987; Wilson,
1996 for accessible introductions). We will consider structures over a collection of dis-
crete, categorical items: These could be collections of words, notes, syllables, phones,
primitive actions, or any other entities that are represented by the brain, but also cognitive
categories that encompass multiple items, such as the categories of birds, nouns, or nasal consonants.
A graph is a mathematical structure composed of nodes representing items and edges
connecting nodes. Edges represent arbitrary relationships between these items (such as
“close to,” “resembles,” “implies”, etc.). There is no further limitation on graphs, though
we will confine ourselves here to connected graphs, where every item is linked to the
group via at least one link (Fig. 1A). An intuitive example of a graph would be the
nations of the world, with the distances to their neighbors indicated by the edges.
A graph is a very general notion; restrictions on the “graph” definition create subtypes,
such as directed acyclic graphs (DAGs), where the edges have a directionality (Fig. 1B),
pointing from parent node to child node. “Acyclic” means that there are no cycles or
loops in this structure, implying that no node can (indirectly) dominate itself. In DAGs, a
terminal may have more than one parent node, but the graph nonetheless remains acyclic.
Terminal nodes are connected to only their parent and have no dependent nodes (they
have "one end free"): Terminals often represent explicit, perceptibly expressed items
(e.g., numbers, words, musical notes, individuals, etc.) but sometimes also “null ele-
ments," “traces," etc. (in linguistics) or silent rests (in music). Non-terminal nodes are
connected with at least one child, and they can be either perceptually expressed (as in
Fig. 1F) or not (as in Fig. 1E). When these nodes are not explicitly represented and need
to be inferred by the listener/viewer, they are often termed internal nodes.
An important subtype is the “rooted” graph, a graph which has a root node (Fig. 1C).
The notion of a root node is intuitive: There is some single node from which the entire
rest of the graph can be said to emanate, meaning that this node is the (indirect) parent
of all others.
A simple example of a rooted DAG is a sequence, which is a group of items that is
accessed in a specified order (e.g., the alphabet [a,b,c, ..., z]). In sequences, each node
has exactly one child (except for the terminal, which has none) and one parent (except
for the root) (Fig. 1F), thus enforcing an obligatory reading order. Accordingly, sequences
have a single terminal. These limitations do not apply to hierarchies.
Hierarchy entails a more complex rooted DAG in which at least one node has more
than one child, and every node has exactly one parent (except for the root). Since a par-
ent with two children implies “branching” of the directed graph, hierarchies are com-
monly called trees.
Fig. 1. (A) Connected graph; (B) directed acyclic graph (DAG); (C) rooted DAG; (D) right-branching tree;
(E) multiply nested tree; (F) sequence. Non-terminal nodes in C, D, and E are represented as black dots;
other items as letters. The crucial difference between hierarchies (C, D, and E) and sequences (F), both
rooted DAGs, is that in the former, at least one node has more than one child, which implies that hierarchies
have more than one terminal. Although it is conventional to represent terminal nodes as ordered from left to
right, these terminal nodes can be either unordered or ordered using some supplemental enumeration method
(e.g., alphabetic).
Both sequences and trees are rooted DAGs, in which items are ordered or ranked along
a “root-to-terminal” axis. In the case of sequences, there is only one path from root-to-
terminal (the final element). In the case of hierarchies, “branching” implies several root-
to-terminal paths and, therefore, more than one terminal. This crucial difference endows
hierarchies with an additional group of items (the set of terminals = {terminal1, terminal2, ...}) which is unordered. This unordered set creates a secondary representational
dimension along a “terminal-to-terminal” axis, which can acquire any arbitrary perceptual
organization (spatial or temporal), independent of the root-to-terminal order.
We can now use these structural definitions to define our central concepts:
Sequential structure: a rooted DAG in which no node has more than one child, thus
being limited to a single order along the root-to-terminal axis and possessing a single terminal.
Hierarchical structure: a rooted DAG in which at least one node has more than one
child, thus forming a branching tree. In this structure, items are ordered along a root-to-
terminal axis. In addition, due to branching, there is more than one terminal. Unless spec-
ified by some secondary method, the set of terminals is unordered along the terminal-to-
terminal axis.
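These two definitions are simple enough to state in code. The following Python sketch (the child-list encoding and node names are ours, purely for illustration) classifies a rooted structure by checking for branching:

```python
# A rooted structure encoded as a mapping from each node to its children.
def terminals(children):
    """Terminal nodes: those with no children."""
    return [n for n, kids in children.items() if not kids]

def classify(children):
    """'hierarchical' if any node branches, otherwise 'sequential'."""
    branches = any(len(kids) > 1 for kids in children.values())
    return "hierarchical" if branches else "sequential"

# A sequence in the style of Fig. 1F: each node has at most one child.
sequence = {"r": ["a"], "a": ["b"], "b": []}
# A branching tree: the root and node x each have two children.
tree = {"r": ["x", "y"], "x": ["a", "b"], "y": [], "a": [], "b": []}
```

As the definitions require, the sequence has exactly one terminal, while the branching tree has more than one.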
This distinction allows us to frame a central aspect of trees in human cognition: They
frequently embody both hierarchical and sequential structure. In language, utterances con-
tain words in a sequence, while musical melodies have notes in sequence. Perceptually,
words or notes are expressed through time in a sequential manner. At the same time, syn-
tactic relations between these elements typically imply hierarchical structure which can-
not be fully represented by the perceptually explicit sequential structure. This means that
a listener processing a string of items (where only the sequential structure is explicit)
must infer the internal nodes that determine the hierarchical structure, of which words or
notes are only the set of terminals. Although clues to this hierarchical structure may be
present, including speech prosody, embedding markers like “that," or musical phrasing,
these do not fully specify the structure. Thus, trees exist in the mind of the beholder, not
in the perceptual stimulus itself. A key desideratum in understanding hierarchical cogni-
tion is thus understanding how and why hierarchical structures can be reliably output as
sequences (cf. Kayne, 1994), and how those sequences are converted (“parsed”) back into
hierarchical structures. The existence of additional hierarchical representations that per-
ceivers impose or “project” onto a sequentially presented stimulus affords several key sig-
natures of hierarchy, discussed below.
Hierarchical representations of linguistic structure are central in all major linguistic
theories, including theories of phonological structure, theories of morphological structure,
theories of sentence-level semantics, theories of dialogue and discourse structure, and
both phrase-structure and dependency-structure-based theories of syntax. Trees are the
simplest graphs that can account for argument structure (“who does what to whom”) and
the productivity of language. However, they are not complex enough to account for cer-
tain syntactic phenomena such as pronouns and anaphora, or sentences such as “Mary
pretends not to hear” (where Mary is the subject of both “pretend” and “hear”). Linguists
would argue that such phenomena necessitate more complex graphs than trees, as do
more unusual (and controversial) phonological phenomena such as ambisyllabicity,
where the same consonant is “owned” by two different syllables. Hierarchical structure is
also assumed in many theories of musical structure (Lerdahl & Jackendoff, 1983; Rohr-
meier, 2011), although empirical demonstrations distinguishing hierarchical from sequen-
tial structure turn out to be challenging. The difficulties stem from the fact that, as
mentioned above, in many cognitive domains including language, music, and action, tree
structures are “serialized” for performance, so that each hierarchical terminal (word, note,
action, etc.) is perceptually expressed in a specific sequential order.
The central difficulty in clearly distinguishing hierarchical from sequential structure is
illustrated by Fig. 1C, 1D, and 1E, which show three examples of structures that are unam-
biguously hierarchical, theoretically, but if read out serially (from left to right) are very
difficult to distinguish from purely sequential structures.
We will focus on sequentially presented stimuli, discussing signatures of hierarchical
structure (i.e., representation) and generation processes separately. An overview of the
methods is presented in Table 1.
2.2. Signatures of hierarchical structures in representation: Distance methods
One class of approaches for demonstrating the cognitive reality of hierarchy distinguishes “hierarchical distance,” which is the number of intervening superior
nodes in the path from one terminal to another, from “sequential distance,” which is sim-
ply how many intervening terminals we see in the sequential output. This distinction lies
at the heart of many empirical indicators of hierarchical structure.
This method cannot, however, be applied to all hierarchies. For instance, in Fig. 1C all
terminals are hierarchically next-door neighbors, even though sequentially at different dis-
tances. Only if we had unambiguous measures of hierarchical and sequential distance
could we demonstrate that the terminals are hierarchically arranged. In the “right branch-
ing” tree, Fig. 1D, the difficulty is the opposite: Sequential and hierarchical distances are
perfectly correlated. Terminal ‘a’ is just as far, hierarchically, from terminal ‘e’ as it is
sequentially. In both cases, it will be challenging to evaluate the hierarchical structure
empirically using distance methods. Fig. 1E shows the type of tree that supports
Table 1
Overview of methods. Some methods can formally establish the presence of hierarchical structure, while others are simply compatible with the presence of such structure (see text)
Distance methods:
- Hierarchical distance shorter than sequential distance
- Levelt’s analysis of similarity/relatedness
- Automatic hierarchical clustering methods
- Presence of long-distance dependencies
Generalization and error-based methods:
- Hierarchical generalization and foils
- Structural priming
- Deletions and insertions
unambiguous attribution of hierarchy. Here, a multiply nested tree has terminal pairs
where sequential and hierarchical distances are clearly different: Although the sequential
distance from c to d is the same as d to e, hierarchically c and d are neighbors while d
and e are four nodes apart.
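This contrast can be made concrete with a small Python sketch. The tree below is hypothetical, built only to mimic the property of Fig. 1E described above (it is not the figure's actual structure): terminals c and d are hierarchical neighbors, while d and e, equally close in the sequence, are four nodes apart.

```python
# Hypothetical multiply nested tree: root -> (N1, e); N1 -> (a, N2);
# N2 -> (b, N3); N3 -> (c, d). Encoded as a child-to-parent mapping.
parent = {"N1": "root", "e": "root", "a": "N1", "N2": "N1",
          "b": "N2", "N3": "N2", "c": "N3", "d": "N3"}

def ancestors(node):
    """Chain of parents from a node up to (and including) the root."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def hierarchical_distance(t1, t2):
    """Number of intervening superior nodes on the path between terminals."""
    up1, up2 = ancestors(t1), ancestors(t2)
    lca = next(n for n in up1 if n in up2)   # lowest common ancestor
    return up1.index(lca) + up2.index(lca) + 1

serialized = ["a", "b", "c", "d", "e"]       # left-to-right readout

def sequential_distance(t1, t2):
    """Number of intervening terminals in the serialized output."""
    return abs(serialized.index(t1) - serialized.index(t2)) - 1
```

Here the sequential distances for (c, d) and (d, e) are both 0, yet their hierarchical distances are 1 and 4, exactly the kind of dissociation that makes such trees empirically testable.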
In natural language, the sequential/hierarchical distance distinction provides the clear-
est demonstration of hierarchy, using semantic interpretation. Given the sentence “the
boy who fed the dog chased the girl," we can ask the semantically based question “who
chased the girl." The answer is “the boy”: Although “the dog” is closer to “chased” in
the sequence, its hierarchical distance is longer than the hierarchical distance from “boy”
to “chased." This and many other phenomena in syntax make language a domain where
the presence of tree structure is practically undisputed (although its pervasiveness has
recently been questioned (Frank, Bod, & Christiansen, 2012)).
Levelt (1970a,1970b) developed a behavioral paradigm to test the psychological reality
of hierarchical structure in a sentence processing experiment, based on Johnson (1967).
Levelt first investigated how the probability of identification/recall of each word in a sen-
tence presented in noise depended on the identification/recall of other words in that sen-
tence (Levelt, 1970a). High conditional probabilities suggest that a cluster (a “subgroup”
in our terms) was formed between these words. Additionally, participants ranked similari-
ties between all possible pairs of three randomly selected words. High similarity rankings
between two words suggest these words form a subgroup. Levelt then used each hierar-
chical structure derived from these data as a model to generate predictions then tested on
the data, creating a measure of fit of the best hierarchical model. A very good fit implies
the psychological reality of the hierarchical structured analysis. In Levelt’s case, only
about 5% of the model predictions failed to show up in the data (Levelt, 1970a); he thus
concluded that hierarchical structure was indeed present. Outside the domain of language,
analyses of response times across sequences of keypresses during motor learning have
also been used to demonstrate patterns consistent with the representation of motor clus-
ters, which cannot be explained by simple sequential associations (Hunt & Aslin, 2001;
Verwey et al., 2011; Verwey & Wright, 2014).
Demonstration of long-distance dependencies, where the interpretation of one part of
the sequence depends on another, distant, part, is also indicative of hierarchical structure
(like the “boy chased” example above). To establish that a long-distance dependency is
present, we can generate stimuli using a suitable artificial grammar and then test if partic-
ipants parse the stimuli hierarchically using the similarity/relatedness method. Long-dis-
tance dependencies require memory, which can also be investigated (see Section 3.1).
This permits investigation of sequences with multiple long-distance dependencies,
whether crossed or nested (for crossing dependencies see, e.g., Uddén, Ingvar, Hagoort,
& Petersson, 2017). Successful processing of nested long-distance dependencies in classi-
cal music has also been demonstrated, using a violation paradigm (Koelsch et al., 2013).
These authors, however, point out that it is still unclear whether multiple (more than one)
simultaneous embedded dependencies are processed in music.
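The nested/crossed contrast can be made explicit with a toy stack check (the indexed a/b notation pairing each opener with its closer is our own convention):

```python
# Each "a" opens a dependency and each "b" must close one; the index
# marks which opener a closer belongs to. Nested dependencies can be
# resolved with a stack (last opened, first closed); crossed ones cannot.
def resolvable_by_stack(tokens):
    stack = []
    for tok in tokens:
        kind, idx = tok[0], tok[1:]
        if kind == "a":
            stack.append(idx)
        elif not stack or stack.pop() != idx:
            return False
    return not stack

nested = ["a1", "a2", "b2", "b1"]    # a1 ... b1 encloses a2 ... b2
crossed = ["a1", "a2", "b1", "b2"]   # the two dependencies interleave
```

The failure of the stack on crossed dependencies is one reason crossed and nested patterns are thought to load on memory differently.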
These methods have been applied in domains where semantically based diagnostics do
not apply, such as prosody. Certain prosodic phenomena, such as phrase-final
lengthening, provide indications of phrase structure, and if these are nested are consistent
with a hierarchical interpretation (Morgan, 1996).
2.3. Generalization and error-based methods
Convincing evidence for the presence of hierarchical structure is also provided by hier-
archical generalization, when a set of terminals can be flexibly rearranged in a way that
obeys a posited hierarchical structure (e.g., “the girl who fed the boy chased the dog,"
“the dog who fed the girl chased the boy," etc.) without generating ill-formed alternatives
(“the dog fed chased the boy the girl”). In an artificial grammar learning (AGL) experi-
ment, for instance, we can investigate whether a participant generalizes to new sequence
exemplars following a hierarchical grammar, while rejecting sequences violating the
grammar (but including the same collection of terminals) as non-grammatical. To be con-
vincing, such experiments should evaluate whether participants exposed to training stim-
uli generated by hierarchical rules generalize to new exemplars of different lengths and
reject carefully-selected foils (cf. Fitch & Friederici, 2012). The approach of testing the
ability to generate well-formed hierarchical structures by acquiring the appropriate gener-
ative rules and applying them beyond the given perceptual stimuli has also been success-
fully used in the visual-spatial, motor, and tonal domains (Jiang et al., 2018; Martins
et al., 2019, 2014, 2017).
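As a concrete sketch of such a design, the toy grammar below (our illustration, not a grammar from the cited studies) generates nested a^n b^n strings of arbitrary length, while foils reuse exactly the same terminals but violate the pattern:

```python
import random

# Toy nesting grammar S -> a S b | a b, whose string set is a^n b^n.
def grammatical(n):
    return "a" * n + "b" * n

def is_grammatical(s):
    n = len(s) // 2
    return len(s) % 2 == 0 and n > 0 and s == grammatical(n)

def foil(n, rng=random.Random(0)):
    """Permute a grammatical string until the pattern is violated,
    keeping the collection of terminals identical."""
    chars = list(grammatical(n))
    while True:
        rng.shuffle(chars)
        candidate = "".join(chars)
        if not is_grammatical(candidate):
            return candidate
```

Testing generalization then means training on, say, n = 2 and 3 and probing acceptance of grammatical n = 4 items against such length-matched foils.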
Another behavioral method is termed structural priming (cf. Branigan & Pickering,
2017). Structural priming experiments can establish that, for example, sentence structure
is primed, rather than specific terminals, by replacing terminals from the “prime”
sequence in the “target” sequence. Recognition (or production) of the target sequence is
then facilitated by recent exposure to the prime sequence. Priming effects are typically
quantified by decreased reaction times or decreased neural activity. Structural priming
does not, however, provide definitive proof of hierarchical structure: A priming effect
only shows that some kind of structure was primed, but it does not necessarily distinguish
hierarchical from sequential structure (unless this difference is specifically addressed).
Finally, in sequences, a node is only connected to other adjacent nodes. Thus, deletion
of any node (except for the first and last) should hinder generation of the sequence. If
participants halt when facing deletions or insertions, this suggests sequential rather than
hierarchical structure. However, this method also does not provide a definitive proof,
because hesitations or increased error rates are the probable observable outcomes of such
a halt, and these effects might also be predicted (albeit to a lesser extent) by hierarchical structure.
2.4. Signatures of hierarchical structures in generating processes
Online behavioral and neural data can be analyzed to test the psychological reality of
the processes that generate hierarchical structures, if these processes are made computa-
tionally explicit by means of a processing model (e.g., the grouping, ordering and nested
grouping/hierarchical branching processes discussed above). A processing model,
including, for example, nested grouping, must also specify that increased load on some
part of the process will lead to increased effort (“effort” in this context does not imply
deliberate thought processes). For behavioral data, online measures such as reading/re-
sponse time, or performance under dual task conditions, provide metrics to measure
effort. Additional online methods for measuring effort include eye-tracking fixation time
data, or neural measures including deflections of particular ERP-responses, oscillatory
MEG responses, electrocorticography (ECoG) data, or fMRI BOLD-responses. An under-
lying hierarchical structure is suggested when increased load in a putative nested group-
ing process correlates with increased effort as measured by such behavioral and neural measures.
A seminal example of this approach is an fMRI study by Pallier et al. (2011), which
presented word groups of different sizes, varying from 2 to 12 words, but always in 12-
word sequences (thus one to six groups per sequence). Assuming larger constituent sizes
require increased activity in group-generating processes, any neural signal that parametri-
cally increases with constituent size is potentially diagnostic of hierarchical structure
building. The location of activity can furthermore indicate where such computational pro-
cesses are implemented. Using a Jabberwocky (nonce word) condition, where content
words are replaced with non-words to control for semantic processes, Pallier et al. (2011)
located the non-semantic structure-building processes to the left inferior frontal gyrus (LIFG) and the left posterior superior temporal sulcus (LpSTS).
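The parametric logic of the design is simple enough to spell out (the condition list follows the description above; the rest is a schematic sketch, not Pallier et al.'s actual analysis):

```python
# 12-word sequences built from constituents of 2 to 12 words yield a
# parametric factor running from six groups down to one per sequence.
sizes = [2, 3, 4, 6, 12]                  # words per constituent (condition)
groups = [12 // size for size in sizes]   # constituents per 12-word sequence
# Any signal rising parametrically with constituent size (equivalently,
# falling with group count) is a candidate signature of structure building.
```

Regressing such a per-trial factor against the BOLD signal then localizes, voxel by voxel, where the putative group-generating computation is implemented.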
Timing and incremental processing can provide further evidence for a process-based
signature of hierarchical structure. To parse a sequence into a tree structure, the listener
needs to place “open” terminals (those requiring additional terminal(s) to satisfy their
relations) into some auxiliary memory store (e.g., a “stack” or a random access memory)
until their appropriate completion terminal(s) arrive, so that they can be inserted into the
nested grouping structure. Just as the presence of long-distance dependencies is indicative
of hierarchical structure, increased activation of an external memory store with increasing
“open” terminals also provides a signature of hierarchical processing.
Multiple assumptions underlie this approach, for example, that all levels of nested
groups that can be formed (i.e., all structures that can be built) at a certain time step will
be formed at that time step. The more deeply nested a group is, the more it depends on
the completion of higher nesting levels, so that it will be completed later. The number of
groups that can be formed at each time step can thus be translated to a time course of
nested grouping effort that can be matched to the online effort data. Examples of this
approach, where the incremental dimension of hierarchical processing (of sentences)
emerges, include Nelson et al. (2017), using ECoG data, and Uddén et al. (2019a).
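A minimal version of this translation, from a bracketed parse to a per-word count of open nested groups, can be sketched as follows (the bracket format is our own convention):

```python
# Count, at each word, how many constituents are currently open; the
# resulting series is a toy time course of nested-grouping load.
def open_node_counts(tokens):
    counts, depth = [], 0
    for tok in tokens:
        if tok == "[":
            depth += 1            # a new group opens
        elif tok == "]":
            depth -= 1            # the innermost open group completes
        else:
            counts.append(depth)  # record load at this word
    return counts

parse = "[ [ the boy ] [ chased [ the girl ] ] ]".split()
```

For this parse the counts are [2, 2, 2, 3, 3]; matching such a series against online effort measures (reading times, ERP deflections, BOLD responses) implements the comparison described above.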
To conclude this section, the list of methods in Table 1 implies multiple potential indi-
cators of hierarchical structure, no one of which will apply in all cases or to all cognitive
domains. The most convincing evidence would be cumulative, when multiple signatures
are demonstrated for a particular cognitive domain (or species). In addition, a “model
selection” approach (Chamberlin, 1890; Platt, 1964) will be crucial for experiments attempting
to distinguish sequential and hierarchical structure, since the data may be consistent with
a hierarchical model, but more (or equally) consistent with a sequential one. Only when
the hierarchical model clearly provides a superior fit can we confidently conclude that
hierarchy is the best explanation (e.g., Ravignani, Westphal-Fitch, Aust, Schlumpp, &
Fitch, 2015; van Heijningen, de Visser, Zuidema, & ten Cate, 2009).
3. Neural signatures of hierarchical processing
Recall that we identified “nested grouping” as a key process by which hierarchical
structure emerges (Sections 2.2–2.3), and that additional demands on memory are a signa-
ture for hierarchical structure. Both nested grouping and such involvement of “auxiliary
memory” play important roles in the literature on the neural signatures of hierarchical pro-
cessing discussed next.
We will restrict our analysis to language, even though there is an interesting literature
on hierarchical processing in other domains. The reason is that in those other domains,
information is often not accessed via sequences with fixed order (e.g., visuospatial, social,
or navigation hierarchies, in which hierarchical processing also involves nested grouping),
and may therefore not involve the same auxiliary memory systems as those used to pro-
cess structured sequences (as in speech or music). Moreover, two recent papers suggest
specializations to particular kinds of content, even in domains where the information is
presented sequentially (e.g., visual vs. verbal; Milne, Wilson, & Christiansen, 2018;
Uddén & Männel, 2019b).
3.1. Divisions of nested grouping and auxiliary memory
What memory systems are used in processing hierarchical structure? It is important to
distinguish between different auxiliary memory systems, which may include activation of
long-term memory stores (e.g., lexical retrieval), or different forms of working memory
(McElree, 2006). When viewing working memory capacity as distinct from long-term
memory, we should be precise regarding the domain-specificity of the memory store. Is it
“just” the phonological loop (or echoic memory, or iconic memory, etc.) that is used,
or do stores specific to sequence processing exist? Several recent proposals help sharpen
this distinction.
Based on data from dynamic causal modeling (Makuuchi & Friederici, 2013), Friederici
(2017) proposes a generic working memory system for sentence processing in the inferior
frontal sulcus (IFS) and the inferior parietal sulcus (IPS), connected via the superior
longitudinal fasciculus, and further suggests that Merge (a process forming nested
groupings) takes place in ventral BA44. This suggestion is reminiscent of Fedorenko’s
proposal of a domain-general system interacting with a neighboring domain-specific
language system; the proposed domain-general system is located just dorsal to the
endpoint of the arcuate fasciculus connecting the LIFG and the posterior temporal lobe
(Fedorenko, 2014).
In Matchin’s (2017) model, the posterior LIFG (BA45) supports syntactic working
memory specifically, by applying a general working memory function to domain-specific
syntactic representations. In this model, even though this working memory function is
syntax-specific, it is still separate from the structure-building process per se, which is
suggested to rely on the LpSTS and/or distributed oscillatory signals.
3.2. Neural signatures of nested grouping
Turning to the processes generating nested grouping per se, a common suggestion is
that nested grouping processes may be accomplished by oscillations nested in time, a pro-
posal complementary to the spatial localization approaches discussed above. Investigating
this suggestion requires computational approaches to modelling neurophysiology, most
importantly intrinsic neural oscillations. A recent seminal paper suggesting that nested
grouping could be implemented using nested oscillations (Ding, Melloni, Zhang, Tian, &
Poeppel, 2016) has inspired further suggestions integrating this hypothesis with theoreti-
cal linguistics (e.g., that brain rhythms can be naturally linked to the linguistic notion of
phases; Boeckx, 2017).
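A toy illustration of this frequency-tagging logic (not the actual analysis of Ding et al.): a signal carrying nested rhythms at invented syllable, phrase, and sentence rates shows a distinct spectral peak at each rate.

```python
import numpy as np

fs = 100                       # sampling rate in Hz (invented for illustration)
t = np.arange(10 * fs) / fs    # 10 s of signal

# Toy signal with nested rhythms at the syllable (4 Hz), phrase (2 Hz), and
# sentence (1 Hz) rates; the rates and amplitudes are arbitrary choices.
signal = (1.0 * np.sin(2 * np.pi * 4 * t)
          + 0.6 * np.sin(2 * np.pi * 2 * t)
          + 0.3 * np.sin(2 * np.pi * 1 * t))

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), 1 / fs)

# The three nested rates appear as the three largest spectral peaks.
top3 = sorted(np.round(freqs[np.argsort(spectrum)[-3:]], 6).tolist())
```

Here `top3` recovers the three rates (1, 2, and 4 Hz); in real data the interesting question is whether phrase- and sentence-rate peaks appear even when those rhythms are not physically present in the acoustics.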
In Section 2.1, we noted that the operations Merge and Unify characterize specific
bottom-up views of nested group formation. Recent work explicitly aims to localize Merge,
linking theoretical linguistics and neuroimaging, but results differ across two laboratories.
Friederici’s group (Zaccarella & Friederici, 2015; Zaccarella, Meyer, Makuuchi, & Frie-
derici, 2017) localized Merge to ventral BA44, in line with Friederici’s model (2017). In
contrast, Sakai’s laboratory, using a parametric design with a varying number of Merge
applications needed to comprehend sentences (Ohta, Fukui, & Sakai, 2013), observed
increased activity along the left IFS and in the left parietal cortex with the number of
merges applied. These two lines of work build on an earlier paper (Musso et al., 2003),
which no longer meets today’s power standards (Button et al., 2013).
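The quantity manipulated in such parametric designs can be made concrete with a minimal sketch, in which constituents are represented as nested tuples (an illustrative assumption, not the representation used in these studies):

```python
def merge(x, y):
    """Binary Merge: combine two constituents into a new, unlabeled grouping."""
    return (x, y)

def count_merges(constituent):
    """Number of Merge applications needed to build a constituent bottom-up.

    For a binary structure over n lexical items this is always n - 1.
    """
    if not isinstance(constituent, tuple):
        return 0                     # a bare lexical item involves no Merge
    left, right = constituent
    return 1 + count_merges(left) + count_merges(right)

# "the old man" built bottom-up: merge "old" with "man", then "the" with the result.
phrase = merge("the", merge("old", "man"))
```

Building "the old man" takes two Merge applications; a parametric design varies this count across sentences and looks for correlated activity.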
Furthermore, in a recent meta-analysis, multiple studies including sentence versus word-
list conditions were reinterpreted as a Merge versus no-Merge contrast (Zaccarella, Schell,
& Friederici, 2017), yielding observations similar to those of Pallier et al. (2011).
Although neither the fMRI nor the meta-analytic data distinguished the LIFG from the
pSTS, BA44 was nonetheless interpreted as the location of Merge, while the left pSTS was
interpreted as subserving a later integration with semantics. Another recent review specifies
the function of left pSTS as labeling (making the tree a rooted tree by categorization/
headedness, cf. Goucha, Zaccarella, & Friederici, 2017).
3.3. Neural signatures of auxiliary memory
An ECoG study by Nelson et al. (2017) used a model of incremental structure-generating
processes in which the number of open syntactic nodes varies across the presented words.
This count served as an explanatory variable for the high-frequency component of the
intracranial ECoG signal. As in the Pallier study, activity in the LIFG and left pSTS
corresponded well with this index of hierarchical structure generation. In a model
comparison approach, their results were interpreted as supporting a hierarchical (rather
than sequential) model of sentences.
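A toy version of such a word-by-word predictor, assuming a simple space-separated bracket notation for the parse (the actual analyses follow a full syntactic parse):

```python
def open_nodes_per_word(bracketed):
    """Count the currently open syntactic nodes at each word of a bracketed string.

    A toy version of the word-by-word node count of Nelson et al. (2017); the
    bracket format here is an assumption made for illustration.
    """
    depth, counts = 0, []
    for token in bracketed.split():
        if token == "[":
            depth += 1
        elif token == "]":
            depth -= 1
        else:
            counts.append(depth)   # nodes still open when this word appears
    return counts

# Deeper embedding -> more open nodes at the embedded words.
counts = open_nodes_per_word("[ [ the man ] [ ate [ an apple ] ] ]")
```

For this example the embedded words "an" and "apple" sit under one more open node than the rest, yielding the per-word counts [2, 2, 2, 3, 3].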
In a recent fMRI study by Uddén et al. (2019a) using both visual and auditory sentence
processing, these results were extended by showing a functional asymmetry in the neural
processing of dependencies that go from the head to the dependent (i.e., left-branching)
compared to the other way around (right-branching). The crucial difference is that
non-attached constituents must be maintained online (e.g., pushed onto a stack) only in
the left-branching case. This occurs only if an asymmetric hierarchical structure is
present in the sentences, an analysis which the study thus supports. Parametrically
increased stack depth (the number of simultaneously open left-branching dependencies)
correlated with activity in the LIFG and left pSTS. The corresponding measure for
right-branching dependencies (not requiring syntactic working memory) activated the anterior
temporal lobe; another complexity measure, not distinguishing left from right
dependencies (total dependency length), activated the left pSTS only. Note, however,
that both studies are still limited in the extent to which they can strictly distinguish
auxiliary memory from structure-building processes, as sentences with a high memory load
are also structurally complex.
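The stack-depth measure can be sketched in a few lines; the bookkeeping below is a simplified assumption rather than the exact procedure used in the study:

```python
def left_branching_depth(deps, n_words):
    """Per-word count of simultaneously open left-branching dependencies.

    deps holds (dependent, head) word positions (0-based). A dependency is
    open at word i when the dependent has been read but its head has not
    (dep <= i < head), i.e., the dependent must be held on a stack. A toy
    version of the stack-depth measure of Uddén et al. (2019a); the exact
    bookkeeping in that study may differ.
    """
    return [sum(1 for dep, head in deps if dep <= i < head)
            for i in range(n_words)]

# Four words with two left-branching dependencies: word 0 attaches to word 3
# and word 1 to word 2, so both are simultaneously open at word 1.
depths = left_branching_depth([(0, 3), (1, 2)], n_words=4)
```

In this toy example two dependents are simultaneously unattached at word 1, giving a maximal stack depth of 2.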
In order to disambiguate between these components, it can be useful to look at
domains with different auxiliary memory systems. For instance, during visual-spatial and
motor hierarchical processing, neither LIFG nor pSTS seemed to support hierarchical
branching (Martins et al., 2014, 2019). However, in these studies, the production of
hierarchies was highly automatized. A recent voxel-based lesion-symptom mapping
replication with untrained participants suggests that in the visual domain, while the
pMTG is crucial for the acquisition of hierarchical branching rules, the LIFG rather
supports cognitive control (Martins et al., in press).
3.4. Concluding observations
In summary, by gradually clarifying the central theoretical distinctions and common-
alities, different authors are increasingly recognizing closely related theoretical notions
of hierarchy. The empirical studies on linguistic syntax reviewed above reveal consis-
tent activations of LIFG and left pSTS, whether they are focused on nested grouping or
external memory processes (noteworthy since the studies were selected for review based
on their empirical approach, not their results). Models based on the sequential/hierarchi-
cal distinction all include activation of these regions. Although some models suggest
LIFG as a structure building “hotspot” and others suggest left pSTS, there is a broad
consensus that sequence processing over hierarchical structures in language can be
assigned to a circuit incorporating the LIFG and LpSTS, via their connections in the
arcuate fasciculus.
Extending this approach beyond music and language to further cognitive domains (e.g.,
vision, action or spatial navigation) may help to further clarify the division of labor
between brain areas. Neither spatial location nor oscillatory neural activity alone can pre-
cisely specify the computations underlying hierarchical processing. However, explicit
processing models that allow stimulus material to be parameterized permit a precise focus
on the key hierarchy/sequence distinction. Combined with a model comparison approach,
this kind of theoretical clarity will provide the necessary basis for future progress,
allowing us to identify the signatures of hierarchical structure in human cognition, and
further pinpoint and understand its computational and neural underpinnings.
Acknowledgments

JU was supported by “Stiftelsen Riksbankens Jubileumsfond” via the Swedish Col-
legium for Advanced Study (SCAS) Pro Futura Scientia program. Preparation of this
paper was also supported by Austrian Science Fund (FWF) DK Grant “Cognition &
Communication” (#W1262-B29) to WTF.
References

Berwick, R. C., Friederici, A. D., Chomsky, N., & Bolhuis, J. J. (2013). Evolution, brain, and the nature of
language. Trends in Cognitive Sciences, 17, 89–98.
Boeckx, C. (2017). A conjecture about the neural basis of recursion in light of descent with modification.
Journal of Neurolinguistics, 43, 193–198.
Branigan, H. P., & Pickering, M. J. (2017). Structural priming and the representation of language. Behavioral
and Brain Sciences, e282.
Bullmore, E., & Sporns, O. (2009). Complex brain networks: Graph theoretical analysis of structural and
functional systems. Nature Reviews Neuroscience, 10, 186–198.
Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafo, M. R.
(2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews
Neuroscience, 14, 365–376.
Caplan, D. (1987). Neurolinguistics and linguistic aphasiology. New York: McGraw Hill.
Chamberlin, T. C. (1890). The method of multiple working hypotheses. Science, 148, 754–759.
Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.
Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic
structures in connected speech. Nature Neuroscience, 19, 158–164.
Embick, D., & Poeppel, D. (2015). Towards a computational(ist) neurobiology of language: Correlational,
integrated and explanatory neurolinguistics. Language, Cognition and Neuroscience, 30, 357–366.
Fedorenko, E. (2014). The role of domain-general cognitive control in language comprehension. Frontiers in
Psychology, 5, 335.
Fitch, W. T. (2013). Rhythmic cognition in humans and animals: Distinguishing meter and pulse perception.
Frontiers in Systems Neuroscience, 7, 68.
Fitch, W. T., & Friederici, A. D. (2012). Artificial grammar learning meets formal language theory: An
overview. Philosophical Transactions of the Royal Society B: Biological Sciences, 367, 1933–1955.
Frank, S. L., Bod, R., & Christiansen, M. H. (2012). How hierarchical is language use? Proceedings of the
Royal Society of London B: Biological Sciences, 279, 4522–4531.
Friederici, A. D. (2017). Language in our brain: The origins of a uniquely human capacity. Cambridge, MA:
MIT Press.
Goucha, T., Zaccarella, E., & Friederici, A. D. (2017). A revival of Homo loquens as a builder of labeled
structures: Neurocognitive considerations. Neuroscience and Biobehavioral Reviews, 81, 213–224.
Hagoort, P. (2005). On Broca, brain, and binding: A new framework. Trends in Cognitive Sciences, 9, 416–423.
Harel, D., & Feldman, Y. A. (1987). Algorithmics: The spirit of computing. Berlin: Springer-Verlag.
Hunt, R. H., & Aslin, R. N. (2001). Statistical learning in a serial reaction time task: Access to separable
statistical cues by individual learners. Journal of Experimental Psychology: General, 130, 658–680.
Jiang, X., Long, T., Cao, W., Li, J., Dehaene, S., & Wang, L. (2018). Production of supra-regular spatial
sequences by macaque monkeys. Current Biology, 28, 1851–1859.
Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32, 241–254.
Joshi, A. (2003). Tree-adjoining grammars. In R. Mitkov (Ed.), Oxford handbook of computational
linguistics (pp. 483–501). New York: Oxford University Press.
Kayne, R. S. (1994). The antisymmetry of syntax. Cambridge, MA: MIT Press.
Koelsch, S., Rohrmeier, M., Torrecuso, R., & Jentschke, S. (2013). Processing of hierarchical syntactic
structure in music. Proceedings of the National Academy of Sciences, 110, 15443–15448.
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms
in behavior; the Hixon symposium (pp. 112–146). New York: Wiley.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Levelt, W. J. (1970a). Hierarchical chunking in sentence processing. Perception & Psychophysics, 8, 99–103.
Levelt, W. J. (1970b). A scaling approach to the study of syntactic relations. In G. B. Flores d’Arcais & W. J.
M. Levelt (Eds.), Advances in psycholinguistics (pp. 109–121). Amsterdam: Elsevier.
Makuuchi, M., & Friederici, A. D. (2013). Hierarchical functional connectivity between the core language
system and the working memory system. Cortex, 49, 2416–2423.
Martins, M. J. D., Bianco, R., Sammler, D., & Villringer, A. (2019). Recursion in action: An fMRI study on
the generation of new hierarchical levels in motor sequences. Human Brain Mapping, 40, 2623–2638.
Martins, M. J. D., Fischmeister, F. P., Puig-Waldmüller, E., Oh, J., Geißler, A., Robinson, S., Fitch, W. T., &
Beisteiner, R. (2014). Fractal image perception provides novel insights into hierarchical cognition.
NeuroImage, 96, 300–308.
Martins, M. J. D., Gingras, B., Puig-Waldmueller, E., & Fitch, W. T. (2017). Cognitive representation of
“musical fractals”: Processing hierarchy and recursion in the auditory domain. Cognition, 161, 31–45.
Martins, M. J. D., Krause, C., Neville, D. A., Pino, D., Villringer, A., & Obrig, H. (in press). Recursive
hierarchical embedding in vision is impaired by posterior middle temporal gyrus lesions. Brain.
Matchin, W. G. (2017). A neuronal retuning hypothesis of sentence-specificity in Broca’s area. Psychonomic
Bulletin & Review, 25, 1682–1694.
McElree, B. (2006). Accessing recent events. Psychology of Learning and Motivation, 46, 155–200.
Milne, A., Wilson, B., & Christiansen, M. (2018). Structured sequence learning across sensory modalities in
humans and nonhuman primates. Current Opinion in Behavioral Sciences, 21, 39–48.
Morgan, J. L. (1996). Prosody and the roots of parsing. Language and Cognitive Processes, 11, 69–106.
Musso, M., Moro, A., Glauche, V., Rijntjes, M., Reichenbach, J., Büchel, C., & Weiller, C. (2003). Broca’s
area and the language instinct. Nature Neuroscience, 6, 774–781.
Nelson, M. J., El Karoui, I., Giber, K., Yang, X., Cohen, L., Koopman, H., Cash, S. S., Naccache, L., Hale,
J. T., Pallier, C., & Dehaene, S. (2017). Neurophysiological dynamics of phrase-structure building during
sentence processing. Proceedings of the National Academy of Sciences of the United States of America,
114, E3669–E3678.
Ohta, S., Fukui, N., & Sakai, K. L. (2013). Syntactic computation in the human brain: The degree of merger
as a key factor. PLoS ONE, 8, e56230.
Pallier, C., Devauchelle, A. D., & Dehaene, S. (2011). Cortical representation of the constituent structure of
sentences. Proceedings of the National Academy of Sciences of the United States of America, 108, 2522–2527.
Platt, J. R. (1964). Strong inference. Science, 146, 347–353.
Pullum, G. K., & Scholz, B. C. (2001). On the distinction between model-theoretic and generative-
enumerative syntactic frameworks. In Proceedings of the International Conference on Logical Aspects of
Computational Linguistics (pp. 17–43). Berlin: Springer.
Ravignani, A., Westphal-Fitch, G., Aust, U., Schlumpp, M., & Fitch, W. T. (2015). More than one way to
see it: Individual heuristics in avian visual cognition. Cognition, 143, 13–24.
Repplinger, M., Beinborn, L. M., & Zuidema, W. H. (2018). Vector-space models of words and sentences.
Nieuw Archief voor Wiskunde, 19(3), 167–174.
Rohrmeier, M. (2011). Towards a generative syntax of tonal harmony. Journal of Mathematics and Music, 5, 35–53.
Simon, H. A. (1962). The architecture of complexity. Proceedings of the American Philosophical Society,
106, 467–482.
Uddén, J., Hultén, A., Schoffelen, J.-M., Lam, N., Harbusch, K., van den Bosch, A., Kempen, G., Petersson,
K. M., & Hagoort, P. (2019a). Supramodal sentence processing in the human brain: fMRI evidence for the
influence of syntactic complexity in more than 200 participants. bioRxiv:576769.
Uddén, J., Ingvar, M., Hagoort, P., & Petersson, K. M. (2017). Broca’s region: A causal role in implicit
processing of grammars with crossed non-adjacent dependencies. Cognition, 164, 188–198.
Uddén, J., & Männel, C. (2019b). Artificial grammar learning and its neurobiology in relation to language
processing and development. In S.-A. Rueschemeyer & G. Gaskell (Eds.), Oxford handbook of
psycholinguistics (pp. 755–783). Oxford, UK: Oxford University Press.
van Heijningen, C. A. A., de Visser, J., Zuidema, W., & ten Cate, C. (2009). Simple rules can explain
discrimination of putative recursive syntactic structures by a songbird species. Proceedings of the National
Academy of Sciences, 106, 20538–20543.
Verwey, W. B., Abrahamse, E. L., Ruitenberg, M. F., Jiménez, L., & de Kleine, E. (2011). Motor skill
learning in the middle-aged: Limited development of motor chunks and explicit sequence knowledge.
Psychological Research, 75, 406–422.
Verwey, W. B., & Wright, D. L. (2014). Learning a keying sequence you never executed: Evidence for
independent associative and motor chunk learning. Acta Psychologica, 151, 24–31.
Wilson, R. J. (1996). Introduction to graph theory (4th ed.). Essex, UK: Addison Wesley Longman.
Zaccarella, E., & Friederici, A. D. (2015). Merge in the human brain: A sub-region based functional
investigation in the left pars opercularis. Frontiers in Psychology,6, 1818.
Zaccarella, E., Meyer, L., Makuuchi, M., & Friederici, A. D. (2017). Building by syntax: The neural basis of
minimal linguistic structures. Cerebral Cortex, 27, 411–421.
Zaccarella, E., Schell, M., & Friederici, A. D. (2017). Reviewing the functional basis of the syntactic Merge
mechanism for language: A coordinate-based activation likelihood estimation meta-analysis. Neuroscience
& Biobehavioral Reviews, 80, 646–656.
Zuidema, W., Hupkes, D., Wiggins, G., Scharff, C., & Rohrmeier, M. (2018). Formal models of structure
building in music, language and animal song. In H. Honing (Ed.), The origins of musicality (p. 253).
Cambridge, MA: MIT Press.
... In line with [1], [79], [12], [80] who supports the view that the brain holds some exclusive mechanisms for manipulating symbolic nested trees, the Broca area appears clearly to hold one of those mechanisms for the detection of the complexity pattern in sequences [4]. We might suspect that the Broca area is functional very rapidly during infancy since babies and even neonates appear to be sensitive to syntax in proto-words [81], [82], [83], [9], [10]; see also the computational models of Dominey in [84], [76], [54]. ...
... It has been suggested that conjunctive cells in frontal areas play an important role for goal-based behaviors [123], [58]. We suggest further that hierarchical tree codes and rank-order codes may allow the structural learning of tree representations in temporal sequences and that they are necessary for grammar and language [12], [79]. ...
Full-text available
In order to keep trace of information and grow up, the infant brain has to resolve the problem about where old information is located and how to index new ones. We propose that the immature prefrontal cortex (PFC) uses its primary functionality of detecting hierarchical patterns in temporal signals as a second feature to organize the spatial ordering of the cortical networks in the developing brain itself. Our hypothesis is that the PFC detects the hierarchical structure in temporal sequences in the shape of ordinal patterns and use them to index information hierarchically in different parts of the brain. Henceforth, we propose that this mechanism for detecting ordinal patterns participates also in the hierarchical organization of the brain during development; i.e., the bootstrapping of the connectome. By doing so, it gives the tools to the language-ready brain for manipulating abstract knowledge and for planning temporally ordered information; i.e., the emergence of causality and symbolic thinking. In this position paper, we will review several neural models from the literature that support serial ordering and propose an original one. We will confront then our ideas with evidences from developmental, behavioral and brain results.
... According to hierarchical complexity theory (Commons, 2007;Commons et al., 1998), nested and sequential control structures differ in their hierarchical complexity and horizontal complexity, which can be clearly depicted using nodes and edges rooted in graph theory (Uddén et al., 2020;West, 2001). As shown in Fig. 1, there are two types of nodes. ...
Full-text available
Background Coding has become an integral part of STEM education. However, novice learners face difficulties in processing codes within embedded structures (also termed nested structures). This study aimed to investigate the cognitive mechanism underlying the processing of embedded coding structures based on hierarchical complexity theory, which suggests that more complex hierarchies are involved in embedded versus sequential coding structures. Hierarchical processing is expected to place a great load on the working memory system to maintain, update, and manipulate information. We therefore examined the difference in cognitive load induced by embedded versus sequential structures, and the relations between the difference in cognitive load and working memory capacity. Results The results of Experiment 1 did not fully support our hypotheses, possibly due to the unexpected use of cognitive strategies and the way stimuli were presented. With these factors well controlled, a new paradigm was designed in Experiment 2. Results indicate that the cognitive load, as measured by the accuracy and response times of a code comprehension task, was greater in embedded versus sequential conditions. Additionally, the extra cognitive load induced by embedded coding structures was significantly related to working memory capacity. Conclusions The findings of these analyses suggest that processing embedded coding structures exerts great demands on the working memory system to maintain and manipulate hierarchical information. It is therefore important to provide scaffolding strategies to help novice learners process codes across different hierarchical levels within embedded coding structures.
... More crucially, these non-adjacent rules, generating structures containing either single-or multilevel associations, are not sufficient to capture the hierarchical nature of human language. This lack of sufficiency is because a mere association (e.g., X-Y) is unable to generate a syntactic node higher than XP in a well-merged phrase { XP X Y} (Perruchet and Rey, 2005;Friederici et al., 2011;Jeon, 2014;Goucha et al., 2017;Chen et al., 2021a) and because the processor should infer the internal nodes of the syntactic hierarchy during the processing of a human language sequence (Uddén et al., 2020). Such a problem is more severe in the adjacent dependency rules, as in (AB) n grammar, which could generate minimal adjacent two-word pairs such as "AB" without necessarily generating a hierarchical structure. ...
Full-text available
Introduction: Human language allows us to generate an infinite number of linguistic expressions. It's proposed that this competence is based on a binary syntactic operation, Merge, combining two elements to form a new constituent. An increasing number of recent studies have shifted from complex syntactic structures to two-word constructions to investigate the neural representation of this operation at the most basic level. Methods: This fMRI study aimed to develop a highly flexible artificial grammar paradigm for testing the neurobiology of human syntax at a basic level. During scanning, participants had to apply abstract syntactic rules to assess whether a given two-word artificial phrase could be further merged with a third word. To control for lower-level template-matching and working memory strategies, an additional non-mergeable word-list task was set up. Results: Behavioral data indicated that participants complied with the experiment. Whole brain and region of interest (ROI) analyses were performed under the contrast of "structure > word-list." Whole brain analysis confirmed significant involvement of the posterior inferior frontal gyrus [pIFG, corresponding to Brodmann area (BA) 44]. Furthermore, both the signal intensity in Broca's area and the behavioral performance showed significant correlations with natural language performance in the same participants. ROI analysis within the language atlas and anatomically defined Broca's area revealed that only the pIFG was reliably activated. Discussion: Taken together, these results support the notion that Broca's area, particularly BA 44, works as a combinatorial engine where words are merged together according to syntactic information. Furthermore, this study suggests that the present artificial grammar may serve as promising material for investigating the neurobiological basis of syntax, fostering future cross-species studies.
... The problem of learning the hierarchical relationships from sequential input is a challenging unsolved one. 30 This becomes even harder when the problem structure is dynamic, e.g., variable correlations change over time. Since any information can be serialized, the pattern recognition over sequences is a general one that can be applied ubiquitously to any type of serialized data. ...
Full-text available
Self-organization is ubiquitous in nature and mind. However, machine learning and theories of cognition still barely touch the subject. The hurdle is that general patterns are difficult to define in terms of dynamical equations and designing a system that could learn by reordering itself is still to be seen. Here, we propose a learning system, where patterns are defined within the realm of nonlinear dynamics with positive and negative feedback loops, allowing attractor-repeller pairs to emerge for each pattern observed. Experiments reveal that such a system can map temporal to spatial correlation, enabling hierarchical structures to be learned from sequential data. The results are accurate enough to surpass state-of-the-art unsupervised learning algorithms in seven out of eight experiments as well as two real-world problems. Interestingly, the dynamic nature of the system makes it inherently adaptive, giving rise to phenomena similar to phase transitions in chemistry/thermodynamics when the input structure changes. Thus, the work here sheds light on how self-organization can allow for pattern recognition and hints at how intelligent behavior might emerge from simple dynamic equations without any objective/loss function.
... We hypothesize that these effects would be due to the self-similarity of the signal which 22 would act as a reinforcement of the structure elaborated by the participants. 23 24 underlying hierarchical organization [8,9]. The question is therefore to know which properties of the 32 signal lead the system to organize it hierarchically. ...
Full-text available
In this paper, we explore the extraction of recursive nested structure in the processing of self-similar binary sequences generated by two Lindenmayer grammars: the Fibonacci grammar and the Skip grammar. In each of these grammars only sequential order information marks the hierarchical structure. Although closely related, these grammars differ from a formal point of view: the Fibonacci grammar is perfectly scale-free and presents an isomorphism between its surface and derivational properties while the Skip grammar, although also self-similar, does not present this isomorphism. Our goal was to explore the influence of these formal differences on the extraction of hierarchical structure by the participants. To this end, we implemented these grammars in a serial reaction time task. The results show that in both the Fibonacci grammar and the Skip grammar, participants elaborated a hierarchical structure from the signal. This suggests the involvement of at least partially similar mechanisms during processing. However, some processing differences remained that cannot be explained by the hypotheses proposed so far regarding the processing of strings generated by L-systems. We hypothesize that these effects would be due to the self-similarity of the signal which would act as a reinforcement of the structure elaborated by the participants.
... Thus, research on other species is not conclusive and remains a fascinating question (Dehaene et al., 2015). In humans, evidence that adjacent and nonadjacent dependencies recruit different brain substrates has gained strength (Calmus et al., 2020;Friederici et al., 2006;Uddén et al., 2020). Nonadjacency has been shown to be acquired later than adjacency in development (Gómez & Maye, 2005;Teinonen et al., 2009), and direct comparisons between adjacent and nonadjacent dependencies showed increased learning for adjacent ones (Friederici et al., 2006;Öttl et al., 2017). ...
Full-text available
Formal language hierarchy describes levels of increasing syntactic complexity (adjacent dependencies, nonadjacent nested, nonadjacent crossed) of which the transcription into a hierarchy of cognitive complexity remains under debate. The cognitive foundations of formal language hierarchy have been contradicted by two types of evidence: First, adjacent dependencies are not easier to learn compared to nonadjacent; second, crossed nonadjacent dependencies may be easier than nested. However, studies providing these findings may have engaged confounds: Repetition monitoring strategies may have accounted for participants' high performance in nonadjacent dependencies, and linguistic experience may have accounted for the advantage of crossed dependencies. We conducted two artificial grammar learning experiments where we addressed these confounds by manipulating reliance on repetition monitoring and by testing participants inexperienced with crossed dependencies. Results showed relevant differences in learning adjacent versus nonadjacent dependencies and advantages of nested over crossed, suggesting that formal language hierarchy may indeed translate into a hierarchy of cognitive complexity. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
... In this study, we test whether there is an anterior-posterior gradient organisation of frontal lobe involvement in language comprehension and processing. Such an organisation would provide evidence for hierarchically organised processing steps involved in language processing (Udden et al., 2020). ...
Full-text available
Frontal lobe organisation displays a functional gradient, with overarching processing goals located in parts anterior to more subordinate goals, processed more posteriorly. Functional specialisation for syntax and phonology within language relevant areas has been supported by meta-analyses and reviews, but never directly tested experimentally. We tested for organised functional specialisation by manipulating syntactic case and phonotactics, creating violations at the end of otherwise matched and predictable sentences. Both violations led to increased activation in expected language regions. We observe the clearest signs of a functional gradient for language processing in the medial frontal cortex, where syntactic violations activated a more anterior portion compared to the phonotactic violations. A large overlap of syntactic and phonotactic processing in the left inferior frontal gyrus (LIFG) supports the view that general structured sequence processes are located in this area. These findings are relevant for understanding how sentence processing is implemented in hierarchically organised processing steps in the frontal lobe.
... We further distinguish between complex grammars, which use hierarchy and recursion, and simple grammars, which use single units or linear sequences (Jackendoff and Wittenberg 2014), both of which may equally interface with all modalities. This classification is in line with psycholinguistic models differentiating the processing of linearity and hierarchy, and with models of the neurocognition of sequencing (Dehaene et al. 2015; Uddén et al. 2020). This distinction yields differences in the complexity of the utterances (Figure 1b), such as between gestures, which are composed with simple grammars, versus sign languages, composed with complex grammars. ...
Since its inception, the study of language has been a central pillar to Cognitive Science. Despite an “amodal view,” where language is thought to “flow into” modalities indiscriminately, speech has always been considered the prototypical form of the linguistic system. However, this view does not hold up to the evidence about language and expressive modalities. While acknowledgment of both the nonvocal modalities and multimodality has grown over the last 40 years in linguistics and psycholinguistics, this has not yet led to a necessary shift in the mainstream linguistic paradigm. Such a shift requires reconfiguring models of language to account for multimodality, and demands a different view on what the linguistic system is and how it works, necessitating a Cognitive Science sensitive to the full richness of human communication.
Understanding what someone says requires relating words in a sentence to one another as instructed by the grammatical rules of a language. In recent years, the neurophysiological basis for this process has become a prominent topic of discussion in cognitive neuroscience. Current proposals about the neural mechanisms of syntactic structure building converge on a key role for neural oscillations in this process, but they differ in terms of the exact function that is assigned to them. In this Perspective, Kazanina and Tavano discuss two proposed functions for neural oscillations — chunking and multiscale information integration — and evaluate their merits and limitations, taking into account the fundamentally hierarchical nature of syntactic representations in natural languages. They highlight insights that provide a tangible starting point for a neurocognitive model of syntactic structure building.
This second edition of The Oxford Handbook of Computational Linguistics has been substantially revised, updated, and expanded. Alongside updated accounts of the topics covered in the first edition, it includes 17 new chapters on subjects such as deep learning, word representation, semantic role labelling, translation technology, opinion mining and sentiment analysis, and the application of Natural Language Processing in educational and biomedical contexts, among many others. The volume is divided into four parts that examine, respectively: the linguistic fundamentals of computational linguistics; the methods and resources used, such as statistical modelling, machine learning, and corpora; key language processing tasks including text segmentation, anaphora resolution, and speech recognition; and the major applications of Natural Language Processing, from machine translation to author profiling. The book will be an essential reference for researchers and students in computational linguistics and Natural Language Processing, as well as those working in related industries.
The generation of hierarchical structures is central to language, music and complex action. Understanding this capacity and its potential impairments requires mapping its underlying cognitive processes to the respective neuronal underpinnings. In language, left inferior frontal gyrus and left posterior temporal cortex (superior temporal sulcus/middle temporal gyrus) are considered hubs for syntactic processing. However, it is unclear whether these regions support computations specific to language or more generally support analyses of hierarchical structure. Here, we address this issue by investigating hierarchical processing in a non-linguistic task. We test the ability to represent recursive hierarchical embedding in the visual domain by contrasting a recursion task with an iteration task. The recursion task requires participants to correctly identify continuations of a hierarchy-generating procedure, while the iteration task applies a serial procedure that does not generate new hierarchical levels. In a lesion-based approach, we asked 44 patients with left hemispheric chronic brain lesions to perform recursion and iteration tasks. We modelled accuracies and response times with a drift diffusion model and for each participant obtained parametric estimates for the velocity of information accumulation (drift rates) and for the amount of information accumulated before a decision (boundary separation). We then used these estimates in lesion-behaviour analyses to investigate how brain lesions affect specific aspects of recursive hierarchical embedding. We found that lesions in the posterior temporal cortex decreased drift rate in recursive hierarchical embedding, suggesting an impaired process of rule extraction from recursive structures. Moreover, lesions in the inferior frontal gyrus decreased boundary separation. The latter finding does not survive conservative correction but suggests a shift in the decision criterion.
As patients also participated in a grammar comprehension experiment, we performed explorative correlation analyses and found that visual and linguistic recursive hierarchical embedding accuracies are correlated when the latter is instantiated as sentences with two nested embedding levels. While the roles of the inferior frontal gyrus and posterior temporal cortex in linguistic processes are well established, here we show that posterior temporal cortex lesions slow information accumulation (drift rate) in the visual domain. This suggests that posterior temporal cortex is essential to acquire the (knowledge) representations necessary to parse recursive hierarchical embedding in visual structures, a finding mimicking language acquisition in young children. By contrast, inferior frontal gyrus lesions seem to affect recursive hierarchical embedding processing by interfering with more general cognitive control (boundary separation). This interesting separation of roles, rooted in a domain-general taxonomy, raises the question of whether such cognitive framing is also applicable to other domains.
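The drift diffusion model used in this lesion study can be illustrated with a minimal simulation. The sketch below is ours, not the authors' analysis code; the parameter names mirror the abstract's terms (drift rate as the speed of evidence accumulation, boundary separation as the amount of evidence required for a decision), while the numeric settings are arbitrary assumptions:

```python
import random

# Minimal drift diffusion model (DDM) sketch, illustrating the model class
# used in the lesion study; NOT the authors' analysis code. The numeric
# settings below are arbitrary assumptions chosen for illustration.

def ddm_trial(drift, boundary, dt=0.005, noise=1.0, rng=random):
    """Accumulate noisy evidence from 0 until it hits +boundary (correct
    response) or -boundary (error); return (correct, reaction_time)."""
    x, t = 0.0, 0.0
    while abs(x) < boundary:
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    return x >= boundary, t

def accuracy(drift, boundary, n_trials=1000, seed=1):
    """Proportion of correct responses over simulated trials."""
    rng = random.Random(seed)
    hits = sum(ddm_trial(drift, boundary, rng=rng)[0] for _ in range(n_trials))
    return hits / n_trials

# A lower drift rate (slower information accumulation) reduces accuracy
# and lengthens decisions; a lower boundary trades accuracy for speed.
print(accuracy(drift=1.0, boundary=1.0))   # intact accumulation
print(accuracy(drift=0.3, boundary=1.0))   # impaired accumulation: lower accuracy
```

In this framework, reducing the drift rate (as the posterior temporal lesions did) lowers accuracy and slows responses, whereas narrowing the boundary mainly trades accuracy for speed, which is the dissociation the lesion-behaviour analyses exploit.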
Generation of hierarchical structures, such as the embedding of subordinate elements into larger structures, is a core feature of human cognition. Processing of hierarchies is thought to rely on lateral prefrontal cortex (PFC). However, the neural underpinnings supporting active generation of new hierarchical levels remain poorly understood. Here, we created a new motor paradigm to isolate this active generative process by means of fMRI. Participants planned and executed identical movement sequences by using different rules: a Recursive hierarchical embedding rule, generating new hierarchical levels; an Iterative rule linearly adding items to existing hierarchical levels, without generating new levels; and a Repetition condition tapping into short-term memory, without a transformation rule. We found that planning involving generation of new hierarchical levels (Recursive condition vs. both Iterative and Repetition) activated a bilateral motor imagery network, including cortical and subcortical structures. No evidence was found for lateral PFC involvement in the generation of new hierarchical levels. Activity in basal ganglia persisted through execution of the motor sequences in the contrast Recursive versus Iteration, but also Repetition versus Iteration, suggesting a role of these structures in motor short-term memory. These results showed that the motor network is involved in the generation of new hierarchical levels during motor sequence planning, while lateral PFC activity was neither robust nor specific. We hypothesize that lateral PFC might be important to parse hierarchical sequences in a multi-domain fashion but not to generate new hierarchical levels.
Language makes us human. It is an intrinsic part of us, although we seldom think about it. Language is also an extremely complex entity with subcomponents responsible for its phonological, syntactic, and semantic aspects. In this landmark work, Angela Friederici offers a comprehensive account of these subcomponents and how they are integrated. Tracing the neurobiological basis of language across brain regions in humans and other primate species, she argues that species-specific brain differences may be at the root of the human capacity for language. Friederici shows which brain regions support the different language processes and, more important, how these brain regions are connected structurally and functionally to make language processes that take place in milliseconds possible. She finds that one particular brain structure (a white matter dorsal tract), connecting syntax-relevant brain regions, is present only in the mature human brain and only weakly present in other primate brains. Is this the “missing link” that explains humans’ capacity for language? Friederici describes the basic language functions and their brain basis; the language networks connecting different language-related brain regions; the brain basis of language acquisition during early childhood and when learning a second language, proposing a neurocognitive model of the ontogeny of language; and the evolution of language and underlying neural constraints. She finds that it is the information exchange between the relevant brain regions, supported by the white matter tract, that is the crucial factor in both language development and evolution. © 2017 Massachusetts Institute of Technology. All rights reserved.
Interdisciplinary perspectives on the capacity to perceive, appreciate, and make music. Research shows that all humans have a predisposition for music, just as they do for language. All of us can perceive and enjoy music, even if we can't carry a tune and consider ourselves “unmusical.” This volume offers interdisciplinary perspectives on the capacity to perceive, appreciate, and make music. Scholars from biology, musicology, neurology, genetics, computer science, anthropology, psychology, and other fields consider what music is for and why every human culture has it; whether musicality is a uniquely human capacity; and what biological and cognitive mechanisms underlie it. Contributors outline a research program in musicality, and discuss issues in studying the evolution of music; consider principles, constraints, and theories of origins; review musicality from cross-cultural, cross-species, and cross-domain perspectives; discuss the computational modeling of animal song and creativity; and offer a historical context for the study of musicality. The volume aims to identify the basic neurocognitive mechanisms that constitute musicality (and effective ways to study these in human and nonhuman animals) and to develop a method for analyzing musical phenotypes that point to the biological basis of musicality. Contributors: Jorge L. Armony, Judith Becker, Simon E. Fisher, W. Tecumseh Fitch, Bruno Gingras, Jessica Grahn, Yuko Hattori, Marisa Hoeschele, Henkjan Honing, David Huron, Dieuwke Hupkes, Yukiko Kikuchi, Julia Kursell, Marie-Élaine Lagrois, Hugo Merchant, Björn Merker, Iain Morley, Aniruddh D. Patel, Isabelle Peretz, Martin Rohrmeier, Constance Scharff, Carel ten Cate, Laurel J. Trainor, Sandra E. Trehub, Peter Tyack, Dominique Vuvan, Geraint Wiggins, Willem Zuidema
The language faculty is grounded in the human brain and allows any infant to learn any language. In her book, Angela D. Friederici offers a neurobiological theory of human language by integrating data from adult language processing, language development and brain evolution across primates. Describing the brain basis of language in its functional and structural neuroanatomy as well as its neurodynamics, she argues that differences in the brain that are species-specific may be at the root of human language.
This textbook provides a comprehensive introduction to the emerging fields of neurolinguistics and linguistic aphasiology. Reflecting the dramatic changes that have taken place in the study of language disorders over the last decade, David Caplan's approach is firmly interdisciplinary. He introduces concepts from the main contributing disciplines - neurology, linguistics, psychology and speech pathology - in such a way that they will be clearly understood by all students, whatever their particular background. The topics covered have been carefully selected to demonstrate how the more sophisticated topical neurolinguistic approaches have developed from traditional clinical models. The critical and detailed discussion of all the main theoretical issues in the fields makes this a fundamental work not only for students but also for specialists.
The revised edition of the Handbook offers the only guide on how to conduct, report and maintain a Cochrane Review. The second edition of The Cochrane Handbook for Systematic Reviews of Interventions contains essential guidance for preparing and maintaining Cochrane Reviews of the effects of health interventions. Designed to be an accessible resource, the Handbook will also be of interest to anyone undertaking systematic reviews of interventions outside Cochrane, and many of the principles and methods presented are appropriate for systematic reviews addressing research questions other than effects of interventions. This fully updated edition contains extensive new material on systematic review methods addressing a wide range of topics including network meta-analysis, equity, complex interventions, narrative synthesis, and automation. Also new to this edition, integrated throughout the Handbook, is the set of standards Cochrane expects its reviews to meet. Written for review authors, editors, trainers and others with an interest in Cochrane Reviews, the second edition of The Cochrane Handbook for Systematic Reviews of Interventions continues to offer an invaluable resource for understanding the role of systematic reviews, critically appraising health research studies and conducting reviews.