Recursive music elucidates neural mechanisms supporting the generation and detection of melodic hierarchies

Brain Structure and Function (2020) 225:1997–2015
https://doi.org/10.1007/s00429-020-02105-7
ORIGINAL ARTICLE
MauricioJ.D.Martins1,2,3,4 · FlorianPh.S.Fischmeister5,6· BrunoGingras7· RobertaBianco8·
EstelaPuig‑Waldmueller9· ArnoVillringer1,2,3· W.TecumsehFitch9· RolandBeisteiner10
Received: 12 September 2019 / Accepted: 16 June 2020 / Published online: 26 June 2020
© The Author(s) 2020
Abstract
The ability to generate complex hierarchical structures is a crucial component of human cognition which can be expressed in
the musical domain in the form of hierarchical melodic relations. The neural underpinnings of this ability have been investi-
gated by comparing the perception of well-formed melodies with unexpected sequences of tones. However, these contrasts do
not target specifically the representation of rules generating hierarchical structure. Here, we present a novel paradigm in which
identical melodic sequences are generated in four steps, according to three different rules: The Recursive rule, generating
new hierarchical levels at each step; The Iterative rule, adding tones within a fixed hierarchical level without generating new
levels; and a control rule that simply repeats the third step. Using fMRI, we compared brain activity across these rules when
participants are imagining the fourth step after listening to the third (generation phase), and when participants listened to a
fourth step (test sound phase), either well-formed or a violation. We found that, in comparison with Repetition and Iteration,
imagining the fourth step using the Recursive rule activated the superior temporal gyrus (STG). During the test sound phase,
we found fronto-temporo-parietal activity and hippocampal de-activation when processing violations, but no differences
between rules. STG activation during the generation phase suggests that generating new hierarchical levels from previous
steps might rely on retrieving appropriate melodic hierarchy schemas. Previous findings highlighting the role of hippocampus
and inferior frontal gyrus may reflect processing of unexpected melodic sequences, rather than hierarchy generation per se.
Keywords IFG· Hippocampus· Recursion· Hierarchy· STG· Music
Electronic supplementary material The online version of this
article (https://doi.org/10.1007/s00429-020-02105-7) contains
supplementary material, which is available to authorized users.
* Mauricio J. D. Martins
mmartins@cbs.mpg.de
1 Berlin School of Mind and Brain, Humboldt Universität zu
Berlin, Berlin, Germany
2 Max Planck Institute for Human Cognitive and Brain
Sciences, Leipzig, Germany
3 Clinic for Cognitive Neurology, University Hospital Leipzig,
Leipzig, Germany
4 Institut Jean Nicod, Département d’Etudes Cognitives, ENS,
EHESS, CNRS, PSL Research University, Paris, France
5 Institute of Psychology, University of Graz, Graz, Austria
6 Department of Biomedical Imaging and Image-Guided
Therapy, Medical University of Vienna, Vienna, Austria
7 Institute of Psychology, University of Innsbruck, Innsbruck,
Austria
8 UCL Ear Institute, University College London, London, UK
9 Department of Behavioral and Cognitive Biology, University
of Vienna, Vienna, Austria
10 Department of Neurology, High-Field MR Center
of Excellence, Medical University of Vienna, Vienna, Austria
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Introduction
The human ability to represent and generate complex
hierarchies is an intriguing phenomenon. Although some
animal species seem able to represent simple hierarchies
during social and spatial navigation (Buzsáki and Moser
2013; McKenzie et al. 2014; Seyfarth and Cheney 2014),
human hierarchical cognition has both a larger scope and
depth. First, humans can generate hierarchical structures in
multiple domains including language, music, and complex
action sequencing (Fitch and Martins 2014), and second,
there is no limit, in principle, to the depth we can add
to hierarchical structures (Hauser et al. 2002), except for
those imposed by the limits of human working memory.
This generalized and unbounded generativity is sup-
ported by capacities for hierarchical embedding and recur-
sion. Hierarchical embedding is a process through which
an element, or set of elements, is made ‘subordinate’ to
another ‘dominant’ element. For instance, in English,
when the word ‘film’ is embedded in ‘committee’ to form
[[film] committee], it refers to a kind of committee, not
a kind of film. Recursion is the process through which a
function’s output is used again as input to the same function.
For instance, the natural numbers are described by
the recursive function $N_i = N_{i-1} + 1$, which generates the
infinite set {1, 2, 3, …}. By combining these two properties
(recursion and hierarchical embedding) we can
generate hierarchies of unbounded depth. For instance,
by using the recursive embedding rule NP → [[NP] NP]
we can add ‘student’ to ‘film committee’ and obtain
[[[student] film] committee] and so on.
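The recursive embedding rule above can be illustrated with a minimal Python sketch. The function name `embed` and the bracket-string representation are illustrative assumptions, not part of the original study; the sketch only shows how one rule, applied to its own output, yields structures of arbitrary depth.

```python
def embed(words):
    """Recursively bracket a list of nouns, mirroring NP -> [[NP] NP].

    Hypothetical helper for illustration: the last word is the head, and
    everything before it is itself an NP that gets embedded under the head.
    """
    if len(words) == 1:
        # Base case: a single noun is a minimal NP.
        return f"[{words[0]}]"
    # Recursive case: embed the NP built from the preceding words.
    return f"[{embed(words[:-1])} {words[-1]}]"


print(embed(["film", "committee"]))             # [[film] committee]
print(embed(["student", "film", "committee"]))  # [[[student] film] committee]
```

Each additional noun adds one level of bracketing, so the depth of the output is bounded only by the length of the input.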
The ability to use recursive hierarchical embedding
(RHE) has been demonstrated in the domains of language
(Perfors et al. 2010), music (Martins et al. 2017), vision
(Martins et al. 2014a, b, 2015) and in the motor domain
(Martins et al. 2019). While behavioural research suggests
that RHE is instantiated by similar cognitive resources
across these domains (Martins et al. 2017), it is not clear
to what extent it is also supported by similar neural mechanisms.
In previous research we have investigated the neural
implementation of RHE in the visual and motor domains
(Martins et al. 2014a, 2019). Here, we will use music-like
stimuli to extend this research to the auditory domain.
Within the auditory domain, previous research has
focused on musical harmonic syntax, which describes a
set of rules governing hierarchical tonal relations between
notes and chords (Lerdahl and Jackendoff 1977). These relations
are learned through processes of music enculturation
and thus create expectations which, when violated, cause
certain sequences to be perceived as incorrect or surprising,
similar to the effects of grammatical violations in language
(Beisteiner et al. 1999; Rohrmeier and Koelsch 2012;
Rohrmeier et al. 2015; Tillmann 2012). Violations of music
syntax consistently activate the inferior frontal gyrus (IFG)
and the superior temporal gyrus (STG) (Bianco et al. 2016;
Koelsch et al. 2002, 2005; rev. Salimpoor et al. 2015; Seger
et al. 2013). Moreover, these areas are also active when contrasting
melodies vs unstructured tones (Minati et al. 2008).
The specific roles of IFG and STG in the processing of
hierarchical structures are unclear. Because IFG is also
involved in processing syntax in language and in action,
this area has been thought to be essential to the processing
of hierarchies in general (Fadiga et al. 2009; Fazio et al.
2009; Fitch and Martins 2014; Maess et al. 2001; Musso
et al. 2015; Patel 2003). Shared activation patterns between
language and music have been described under the shared syntactic
integration resource hypothesis (SSIRH) (Patel 2003), and since then
several neuroimaging studies have highlighted these commonalities
(reviewed in Peretz et al. 2015). However, it has
been pointed out that, because the contrasted stimuli have
different surface characteristics (correct vs. incorrect, local
vs. long-distance dependencies), these similar patterns of
activity may reflect domain-general resources such as working
memory or cognitive control, rather than specifically
reflecting increased structural load in the combinatorial
activities used to generate hierarchies (Bigand et al. 2014;
Novick et al. 2005; Patel and Morgan 2017; Rogalsky et al.
2011).
STG has also been found to be active during the processing
of music and linguistic syntax (Sammler et al. 2013),
and during the processing of both lyrics and tones (Sammler
et al. 2010). This region seems to be generally active in the
processing of auditory stimuli, but it also stores tonal maps
(Rogalsky et al. 2011) and schemas of sound events (Lee
et al. 2011), and is active in music imagery and familiarity
for structural relations (Herholz et al. 2012). This evidence
for structure sensitivity suggests that processing music
is biased by expectations based on such stored schemas
(reviewed in Salimpoor et al. 2015). Furthermore, STG activations
seem to differ between local and global violations
of music syntax, with the former generating bilateral, and
the latter left-lateralized, activations (Stewart et al. 2008).
This inter-level segregation is an essential building block
for the processing of hierarchies, since the understanding of
hierarchy necessitates the ability to represent different levels
of structural organization.
In addition to these areas, recent research has highlighted
the role of the hippocampus in the processing of hierarchical
structures across a variety of domains (Garvert et al.
2017; McKenzie et al. 2014; Schapiro et al. 2013; Stachenfeld
et al. 2017). The hippocampus is generally involved in
the formation of schemas through memory generalization
processes (Berens and Bird 2017). When multiple experiences
with similar features occur, these can form context-independent
memory structures. These memory structures
can contain schema nodes activated in a hierarchical fashion
navigating between local and global features of the stimuli
(Cooper and Shallice 2006; Stachenfeld et al. 2017). In
the musical domain, hippocampus activity increases with
music expertise while processing musical syntax (Groussard
et al. 2010), similarly to STG (Groussard et al. 2010;
Koelsch etal. 2005). This suggests that both areas may be
important for the formation and retrieval of tonal schemas.
Furthermore, when the recognition of previously learned
music structures is measured, instead of violation process-
ing, hippocampus activity increases with familiarity while
IFG activity decreases (Watanabe et al. 2008).
Summarizing, previous studies leave unclear to what
extent current experimental paradigms specifically isolate
hierarchical generativity or, more specifically, recursive
hierarchical embedding. As noted above, studies investi-
gating the processing of music syntax usually rely on the
contrast between violations vs. well-formed tone (or chord)
sequences, or melodies vs. scrambled tones. Because dif-
ferent stimuli are contrasted, brain activity in these studies
could reflect general working memory, cognitive control,
or other processes involved while parsing superficial fea-
tures of these stimuli. This makes it difficult to isolate the
mechanisms supporting the representation of the underlying
hierarchical structure using established paradigms.
In this fMRI experiment, we introduce a novel paradigm
which partially reproduces features of previous studies but
allows us to isolate the cognitive processes underlying inter-
nal representations of hierarchical tone sequences. Here, we
repeatedly apply RHE rules to musical tones, forming audi-
tory ‘fractal’ hierarchies in discrete steps. ‘Fractal’ refers to
the structural similarity across hierarchical levels that results
from applying a recursive rule. Well-trained participants lis-
ten to the first three steps, each of which generates a new
level of the hierarchy. They then listen to a fourth step, and
are asked to determine whether the new tone sequence con-
taining an additional hierarchical level is consistent with the
previous three steps.
We contrasted accuracy and brain activity associated with
the ‘Recursive’ rule with a simpler ‘Iterative’ rule, which
followed the same stepwise procedure. However, in Itera-
tion, each step added elements within the same single level
of the hierarchy, without generating new levels. Behavioural
studies have shown that accuracy in using the Recursive rule
in the music domain correlates with the same ability in the
visual and action sequencing domains (Martins et al. 2017).
Importantly, these shared capacities dissociated from those
seen in Iteration. Our current procedure also included a con-
trol Repetition condition, in which participants were given
a certain melodic structure in step three, and then asked
whether step four was identical to step three or not. Cru-
cially, the stimuli were identical across all rules in the fourth
step. Hence, the stimuli to be imagined and heard were the
same. This aspect of our design eliminates the potential con-
found of stimuli being perceptually different, isolating how
identical stimuli were internally represented (as hierarchical
vs. iterative).
Finally, we divided our neural activity analysis to contrast
between two processing periods: (i) the generation phase, in
a silent period between the third and fourth steps, in which
participants use the rule parameters to imagine the tone
sequence corresponding to the fourth step, and (ii) the test
sound phase (the fourth step), in which participants listen
to a tone sequence that is either the correct continuation
of the third step, or a violation. This analysis allows us to
separate the activity related to the internal generative act
from the activity related to external stimulus processing.
By separating these phases, we evaluated whether any of
the brain regions discussed above is specifically involved in
the generative act versus playing a more general role in the
processing of melodic structures. We performed both whole
brain analyses and ROI analyses targeting IFG, hippocampus
and STG in both hemispheres, the most likely candidates
to support generation of hierarchical structures, based on
previous research.
Methods
Participants
Fifteen healthy participants (seven males and eight females,
age range 20–35, M = 25.5) took part in the study. All participants
were non-musicians: none had more than 2 years of
music training and none practiced regularly with a musical
instrument. All had normal or corrected-to-normal vision
and audition, no history of neurological or psychiatric disease
and no current use of psychoactive medications. All
completed a short questionnaire screening for previous clinical
history, a paper-and-pencil version of Raven’s progressive
matrices (a test of non-verbal intelligence) (Raven
et al. 1998) and the Melodic Memory Task from the Gold-MSI
test battery (Müllensiefen et al. 2014). Participants
were recruited online and most were university students.
All participants were right-handed German native speakers.
Participants gave informed written consent before the
experiment in accordance with guidelines of the local ethics
committee. Before the functional magnetic resonance imaging
(fMRI) session, each participant was explicitly briefed
about both generating rules and practiced one or two blocks
of the experimental task (with different stimuli), after which
s/he received feedback. Participants were paid 30 Euros for
their participation. The overall procedure comprised one
hour of practice plus cognitive testing and approximately
one and a half hours of fMRI scanning.
Stimuli andbehavioral tasks
The central cognitive construct that we aim to isolate is the
capacity to represent rules which enable the generation of
hierarchical structures. In previous work, we defined the
distinct concepts of hierarchy and sequence using a graph
theoretical framework (Udden et al. 2019): a Sequence is a
rooted directed acyclic graph (DAG) in which no node
has more than one child, thus being limited to a single order
along the root-to-terminal axis and to a single terminal
(Fig. 1, bottom). On the other hand, a Hierarchy is a rooted
DAG in which at least one node has more than one child,
thus forming a branching tree (Fig. 1, top). The terminals
(Fig. 1a–f) of a hierarchical structure are unordered (top)
unless a sequence is imposed upon them (bottom). When
this is the case, we obtain two distinct ordering structures:
(1) a sequential terminal-to-terminal (horizontal) ordering,
which in music corresponds to items unfolding in time; and
(2) a hierarchical root-to-terminal (vertical) ordering, which
in music can correspond to either a tonal or harmonic intervallic
structure.
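The graph-theoretical distinction above can be sketched in a few lines of Python. This is an illustration only: the function name `classify`, the dictionary representation and both example graphs are assumptions made here, and the example tree is not meant to reproduce the exact structure drawn in Fig. 1.

```python
def classify(children):
    """Label a rooted DAG as 'sequence' or 'hierarchy'.

    `children` maps every node to a list of its child nodes. The input is
    assumed (not checked) to already be a rooted directed acyclic graph.
    """
    max_children = max((len(kids) for kids in children.values()), default=0)
    # A hierarchy needs at least one branching node; otherwise it is a chain.
    return "hierarchy" if max_children > 1 else "sequence"


# A chain a -> b -> c: no node has more than one child.
chain = {"a": ["b"], "b": ["c"], "c": []}

# A tree: the root dominates two nodes, each dominating three terminals (a-f).
tree = {
    "root": ["left", "right"],
    "left": ["a", "b", "c"],
    "right": ["d", "e", "f"],
    "a": [], "b": [], "c": [], "d": [], "e": [], "f": [],
}

print(classify(chain))  # sequence
print(classify(tree))   # hierarchy
```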
Following this conceptual framework, we designed a set
of tasks described in detail in Martins et al. (2017), which
we use in the current study. In this earlier study, we showed
that both musicians and non-musicians can acquire recursive
rules in the domain of melodic hierarchies and that this
capacity (when contrasted with simple iteration) is predicted
by the ability to understand recursion in the visual domain
and by the capacity to solve the Tower of Hanoi, a recursive
planning task.
Our stimuli are based on the properties of melodic fractals
(Fig. 2a). Melodic fractals are structures with several hierarchical
levels in which the hierarchical relations between
dominant and subordinate elements are kept the same across
levels. Here, the elements of different levels are identified by
their pitch and duration, with lower pitch and long duration
elements being dominant over elements with higher pitch
and shorter duration (Martins et al. 2017; Tamir-Ostrover
and Eitan 2015).
Each note at level $n$, $(x)_n$, is dominant over a sequence of
subordinate tones at level $n + 1$: $[(x_1)_{n+1}, (x_2)_{n+1}, (x_3)_{n+1}]$.
The generative rule determining the relationship between
hierarchical levels includes four variables: inter-level interval,
inter-tone interval, melodic contour and tone duration.
First, the inter-level interval $l$ is the pitch interval between
a dominant tone $(x)_n$ and the subordinate with the lowest
pitch $(x_1)_{n+1}$. $l$ can be either 4 or 8 semitones ($l \in \{4, 8\}$) and
is constant across levels. We chose these particular musical
intervals to avoid dissonance, because the new levels
were added cumulatively and were played back simultaneously
with dominant level tones (see Martins et al. 2017 for
details).
Second, the inter-tone distance $t$ is the pitch interval
between each pair of adjacent items in the subordinate
sequence. The distance $t$ can be either 4 semitones (major
third) or 8 semitones (minor sixth) ($t \in \{4, 8\}$) and is constant
across levels. Since the reference note within each triplet is
$(x_2)_{n+1}$, we can write $(x_2)_{n+1}$ as $x_n + l + t$, or for simplicity
$(x_2)_{n+1} = x_n + \phi$ (with $\phi = l + t$). Putting together the relationships between
and within levels, we obtain the generative rule:
$(x)_n \rightarrow [(x)_n\,[(x - t + \phi)_{n+1}, (x + \phi)_{n+1}, (x + t + \phi)_{n+1}]]$,
in which the dominant level $(x)_n$ remains present, in addition to the new subordinate
sequence $[(x - t + \phi)_{n+1}, (x + \phi)_{n+1}, (x + t + \phi)_{n+1}]$.
Third, the subordinate sequence can have either an
ascending or descending contour $c \in \{-1, 1\}$ (ascending = 1
or descending = −1). Adding this parameter to the generative
rule, we obtain the recursive hierarchical embedding
rule:
$$(x)_n \rightarrow [(x)_n\,[(x - tc + \phi)_{n+1},\ (x + \phi)_{n+1},\ (x + tc + \phi)_{n+1}]].$$
Finally, the duration of each $(x)_{n+1}$ tone in the
subordinate sequence is approximately 1/3 of the duration
of the dominant $(x)_n$ tone. We added short silent pauses
between tones to facilitate discrimination of each individual
tone. Pauses between clusters of three tones were longer than
pauses within each cluster, which facilitated the perceptual
separation between clusters. This resulted in a sound with
a total duration of 7.4 s. For more details see Martins et al.
(2017).
In addition to contour, inter- and intra-level intervals,
which gave us 2 × 2 × 2 = 8 different fractal stimuli, there
were four different starting tones for each stimulus, resulting
in a pool of 8 × 4 = 32 different fractals.
Fig. 1 A collection of items (a–f) unfolding in time can be represented
simultaneously as a sequence (bottom arrows) and a cognitive
hierarchy (top structure)
As follows from above, the ‘Recursive rule’ (Fig. 2b)
generated the final structure (Fig. 2a) in four steps, each
step adding a new hierarchical level consistent with the
previous. In contrast, the ‘Iterative rule’, which also generated
the same structures in 4 steps, simply added new
elements within a fixed hierarchical level, without generating
new levels (Fig. 2c). The contrast between these rules
thus taps into the representation of recursive processes that
specifically generate several levels with a single rule. For
each melodic hierarchy, in addition to a correct 4th step, we
generated an incorrect 4th step or foil (Fig. 2d). This foil
was built by violating the rule contour parameter c from
the third to the fourth step (“positional” foil), i.e. changing
from ascending to descending melodic contour or vice
versa. For each condition, we presented well-formed stimuli
in half of the trials and foils in the other half.
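The recursive rule and the contour-flip foil described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' stimulus-generation code: pitches are represented as semitone numbers, step 1 is assumed to consist of a single tone, tone durations and pauses are ignored, and the function names and the four starting tones in `START_TONES` are hypothetical (the text does not list the actual starting tones).

```python
from itertools import product


def subordinate_triplet(x, l, t, c):
    """Three subordinate pitches (in semitones) for a dominant pitch x."""
    phi = l + t  # offset of the middle (reference) tone, as defined in the text
    return [x - t * c + phi, x + phi, x + t * c + phi]


def build_levels(x0, l, t, contours):
    """Apply the rule once per entry in `contours`, adding one level each time."""
    levels = [[x0]]
    for c in contours:
        levels.append(
            [p for x in levels[-1] for p in subordinate_triplet(x, l, t, c)]
        )
    return levels


def fourth_step(x0, l, t, c, foil=False):
    """Four-level structure; a foil flips the contour for the last level only."""
    last_c = -c if foil else c
    return build_levels(x0, l, t, [c, c, last_c])


# Hypothetical starting tones (MIDI note numbers); the source does not list them.
START_TONES = (48, 50, 52, 53)
# Inter-level interval x inter-tone interval x contour x starting tone.
pool = list(product((4, 8), (4, 8), (-1, 1), START_TONES))

correct = fourth_step(60, l=4, t=4, c=1)
foil = fourth_step(60, l=4, t=4, c=1, foil=True)
print([len(level) for level in correct])  # [1, 3, 9, 27]
print(correct[3][:3], foil[3][:3])        # [72, 76, 80] [80, 76, 72]
print(len(pool))                          # 32
```

In this sketch the first three levels of a foil are identical to those of the correct structure, so the violation only becomes detectable in the newly added fourth level.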
Our third, control condition was the ‘Repetition rule’,
in which participants were first exposed to three unrelated
melodic hierarchies and then asked to determine whether
the 4th step was an exact repetition of the 3rd step, or a
different melodic hierarchy.
Fig. 2 a Melodic hierarchical sequences. Colored items denote musical notes of a
particular pitch and duration. Letters within these items denote the musical note
(E, C and Ab) and the color denotes their hierarchical level (1, 2, 3 or 4). Items
within dominant (lower frequency) levels were of a longer duration than items in
subordinate levels. Each item in Levels 1, 2 and 3 was dominant over a set of three
other items of a higher pitch and with a certain melodic contour (in this figure
‘ascending’, see text for details). The pitch of a dominant item determined the
pitches of the subordinate set according to pitch relations (major third or minor
sixth) that were consistent across different levels. The melodic relations within
each set of three and their contour were also consistent across levels. These
sequences could be generated using: b Recursive rules, which added new hierarchical
levels at each application step (1, 2, 3 and 4) or c Iterative rules, which added
elements within a fixed level, without generating a new level. d A Repetition rule
was also run, in which the first two steps were unrelated and then participants were
asked whether step 4 was a repetition of step 3. Our test stimuli in the MR-scanner
were the first four steps resulting from the application of these rules (or unrelated
tone sequences in Repetition) plus an incorrect 4th step (e), which was used as a
foil. e These foils were generated by applying a rule to generate the 4th step which
was different from the rule used to generate the previous 3 steps (see Fig. 4 for
details)
Two aspects are important in our design. First, our
generative rule $(x)_n \rightarrow [(x)_n\,[(x - tc + \phi)_{n+1}, (x + \phi)_{n+1}, (x + tc + \phi)_{n+1}]]$
creates both a melodic sequence
$[(x - tc + \phi)_{n+1}, (x + \phi)_{n+1}, (x + tc + \phi)_{n+1}]$ and a sequence of
harmonic intervals (of the tones in the same sequence in
relation to the baseline tone (x)n). Presumably, participants
could use either or both aspects to infer the rule. We ensured
that the baseline tones (x)n were perceptually available in
the stimulus to facilitate the detection of the relationship
between two hierarchical levels while reducing short term
memory demands. While we cannot determine whether par-
ticipants focused on the sequence of increasing/decreasing
harmonic intervals or on the ascending/descending melodic
sequence when listening to the stimuli, both perceptual
experiences correspond to the same underlying rule, and
are formally equivalent.
Second, in the 4th iteration, it is possible that participants
paid attention only to the last two levels without tracking the
full vertical harmonic structure. Importantly, before hearing
the complete test stimuli with all levels played at once in the
4th iteration, they heard each level being introduced individually,
step-by-step, in a 4-step procedure. In order to decode
the hierarchical structure of the stimuli, and form correct
expectations, they thus had to represent the generating rule
binding the two highest levels of each step/iteration (and
understand that the generating rule was consistent with that
binding the previously heard levels). Thus, participants who
were able to correctly identify continuations in the Recursive
task needed to cognitively represent levels of the hierarchy,
based on the formal hierarchical definition adopted here
(see Udden et al. 2019), whether or not they perceptually
attended to them in the final stimulus. Moreover, it is important
to note that participants’ ability to represent this binding rule
correlated strongly and specifically with similar abilities in
the visual and motor domains (Martins et al. 2017), supporting
the hypothesis that the Recursive task employed here
isolates some aspects of hierarchical generativity.
fMRI procedure
Before each trial (Fig.3) participants were shown a letter
indicating the trial rule [Recursion (R), Iteration (I) or Repe-
tition (S)]. Then they listened sequentially to the first 3 steps
resulting from the application of the rule (or three unrelated
hierarchies in ‘Repetition’). Each step was accompanied by
1, 2, or 3 crosshairs on the screen indicating the correspond-
ing step. After the 3rd step there was a generation phase,
ranging between 2 and 4s in which participants were asked
to imagine how the 4th step would sound like. Then, in the
test sound phase, participants were asked to listen to the test
sound, and to determine whether this tone sequence was a
correct 4th step or a foil. They delivered the response in the
decision phase after the test phase by pressing a button on a
button box (using LEFT thumb if it was correct and RIGHT
thumb if it was incorrect).
Fig. 3 Trial structure. Inside the scanner, participants performed
4 sessions of 18 trials each [6 trials of Recursion (R), 6 of Iteration
(I) and 6 of Repetition (S)]. All trials were constructed as follows:
first, there was a letter indicating the category of the trial (F, I,
R), then the first three steps were played while crosshairs were presented
on the screen. After the first three steps were presented, participants
had a period of 2–4 s to imagine what the 4th step would
sound like (generation phase). Then they were presented with the
test sound sequence (test sound phase), which could be either a correct
or an incorrect 4th step. They were asked to decide whether the
tone sequence was correct or incorrect and to report their choice by
pressing one of two buttons on a button box (decision phase, which
ran until a button press up to a maximum of 5 s). In addition, we
separated the test sound phase into two equal parts and focused our
analysis on the first part, where potential violations could already be
detected (see methods for details). By doing so we could model activations
related to preparation for response and to potential attention
drifts. max. maximum
Crucially, the same final test sound sequences were used
in the different task rules (Recursion, Iteration or Repetition).
Thus, cognitive differences in the generation and test
phases would relate to how identical test sound sequences
were generated and represented. Importantly, in order to
account for potential differences in the BOLD signal due to
the first three steps, we included these steps in the first level
fMRI analysis.
Each participant performed 4 sessions of 18 trials each
with intermixed conditions [6 trials of Recursion (R), 6
of Iteration (I) and 6 of Repetition (S)]. Half of the trials
presented a correct 4th step and half presented a Foil. The
set of six trials of each ‘correctness’ category was further
divided in three stimuli with ascending contour, and three
stimuli with descending contour. The number of stimuli with each
inter-level (IL) and inter-tone (IT) interval (8 vs. 4) was balanced
for each rule across the full 4-session set of trials (4 × 6 = 24):
12 stimuli (6 correct and 6 foils) with IT/IL interval 8 and 12 stimuli
with IT/IL interval 4. Session order was pseudorandomized across
participants. The optimal trial sequence and jittering parameters in
the planning phase were obtained using Optseq2 (Greve 2002).
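The balancing described above can be checked by enumeration. The following Python sketch is illustrative only: the factor labels are ours, the value of three exemplars per cell is an assumption inferred from the reported counts, and the sketch does not reproduce the Optseq2 scheduling or the assignment of trials to sessions.

```python
from collections import Counter
from itertools import product

# Factor levels taken from the text; the label strings are illustrative.
RULES = ("Recursion", "Iteration", "Repetition")
CORRECTNESS = ("Correct", "Foil")
INTERVALS = (8, 4)                      # IT/IL interval
CONTOURS = ("ascending", "descending")
EXEMPLARS_PER_CELL = 3                  # assumption inferred from the counts

# Full crossing of all factors, one dictionary per trial.
trials = [
    {"rule": r, "correctness": c, "interval": i, "contour": k, "exemplar": e}
    for r, c, i, k, e in product(
        RULES, CORRECTNESS, INTERVALS, CONTOURS, range(EXEMPLARS_PER_CELL)
    )
]

per_rule = Counter(t["rule"] for t in trials)
per_rule_interval = Counter((t["rule"], t["interval"]) for t in trials)
per_rule_interval_correct = Counter(
    (t["rule"], t["interval"], t["correctness"]) for t in trials
)

print("total trials:", len(trials))
print("trials per rule:", per_rule["Recursion"])
print("trials per rule and interval:", per_rule_interval[("Recursion", 8)])
```

Under these assumptions the crossing yields the 4 × 18 trials of the experiment, with 24 trials per rule and 12 per rule and interval.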
Pretesting
No more than one week before the MRI testing, participants performed
a 2-h pretesting session. In this session, participants were explicitly
and verbally instructed about the recursive and iterative task rules,
aided by slides shown on a screen (PowerPoint presentation and sounds
available as Supplementary Materials). We then familiarized
participants with the stimuli and assessed whether their performance
was adequate using 2-forced choice “discrimination” tasks with 12
Recursion and 12 Iteration trials (described in detail in Martins
et al. 2017). These discrimination tasks were similar to the procedure
described above, except that two tone sequences were presented
consecutively (in a random order) in the test sound phase instead of
one. One of the test sound sequences was a correct (step 4)
continuation and the other was a foil. Participants had to indicate
which was correct by selecting the appropriate box on the screen using
the mouse. In the pretesting phase, we used two foil categories in
addition to the one used in the fMRI procedure: (1) a ‘Repeat 3’ foil,
which is simply a repetition of the third step; and (2) an ‘Odd’ foil,
in which one element in each set of three tones was misplaced in the
subordinate level (Fig. 4). Participants performed a maximum of two
runs of each task, until their performance was at least 8/12 correct.
Then, on the scanning day, we performed one warm-up session with the
same experimental procedure used in the MR scanner, but outside it. In
this session, we used all three foil categories: (1) Repetition,
(2) Odd, and (3) Positional. We kept the three foil categories in this
phase to prevent participants from developing simple auditory
heuristics based on the detection of features specific to ‘Positional’
foils, thus incentivizing participants to try to imagine the correct
4th step before the test sound phase. However, in the final sessions in
the scanner we used only the ‘Positional’ foils so that we could
present exactly the same test sound stimuli in the Recursion, Iteration
and Repetition trials, thus facilitating analysis and interpretation.
In addition to the pretesting familiarization phase, participants’
melodic memory was assessed using the Melodic Memory Task from the
Gold-MSI test battery (Müllensiefen et al. 2014) to ensure normal
musical perception abilities. In this task, participants were asked to
listen to pairs of short melodies (containing between 10 and 17 notes)
and to indicate whether the two melodies had an identical pitch
interval structure or not (by selecting “same” or “different”). In
“same” trials, the second melody had the same pitch interval structure
as the first one, but was transposed by a semitone or by a fifth. In
“different” trials, in addition to being transposed, the second melody
was modified by changing two notes by an interval varying between 1 and
4 semitones (for details, see Müllensiefen et al. 2014). The task was
composed of 13 trials, including 2 initial training trials, and had a
total duration of around 10 min. Percentage correct was calculated and
all participants scored within 2 standard deviations of the norm for
the UK (Müllensiefen et al. 2014).
Fig. 4 Correct 4th step and foil types in the first and second
pretesting sessions. The first pretesting session was a 2-forced choice
discrimination task, while the second was a 1-forced choice detection
task. During both sessions, we used three foil categories (Positional,
Odd and Repeat), as in Martins et al. (2017), to prevent participants
from developing simple auditory heuristic strategies and to incentivize
participants to imagine the test sound sequence during the generation
phase. However, in the fMRI experiment we used only the ‘Positional’
foils to homogenize stimuli across conditions and facilitate analysis
(the Repeat foil differed between Iteration and Recursion, and the Odd
foil has a salient contour which is easy to detect)
fMRI data acquisition
Functional and anatomical data were acquired with a 3T TIM Trio system
(Siemens, Erlangen, Germany) using a 32-channel Siemens head coil. For
the functional magnetic resonance images (fMRI), an optimized 2D
single-shot echo planar imaging (EPI) sequence with TR 2000 ms and
TE 32 ms was used. Altogether, in four sessions of 420 volumes each,
functional images were acquired with a FOV of 210 × 210 mm, an in-plane
matrix of 90 × 90, and 36 slices of 2.7 mm thickness with a 20% gap
(voxel size 2.3 mm × 2.3 mm × 2.7 mm), aligned parallel to the AC-PC
plane, and a flip angle of 73°. The total acquisition time was 56 min.
Additionally, anatomical high-resolution T1-weighted MR images were
collected using a 3D MPRAGE sequence (TE = 3.02 ms, TR = 2190 ms,
inversion time [TI] = 1300 ms) with a matrix size of 250 × 250 × 256,
isotropic voxels with a nominal side length of 0.9 mm, a flip angle of
9° and GRAPPA acceleration factor 2.
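The reported functional acquisition parameters are mutually consistent, as the following short Python sketch shows. It uses only the values stated above. Interpreting the 20% gap as a fraction of the slice thickness added between adjacent slices is our assumption.

```python
# Acquisition parameters as reported in the text.
TR_S = 2.0                    # repetition time in seconds
VOLUMES_PER_SESSION = 420
N_SESSIONS = 4
FOV_MM = 210.0                # in-plane field of view
MATRIX = 90                   # in-plane matrix size
SLICE_THICKNESS_MM = 2.7
GAP_FRACTION = 0.20           # assumed to be relative to slice thickness

# Total scan time across sessions, in minutes.
total_minutes = N_SESSIONS * VOLUMES_PER_SESSION * TR_S / 60.0
# In-plane voxel edge length.
in_plane_mm = FOV_MM / MATRIX
# Distance between the centres of adjacent slices.
slice_spacing_mm = SLICE_THICKNESS_MM * (1.0 + GAP_FRACTION)

print(f"total functional acquisition: {total_minutes:.0f} min")
print(f"in-plane voxel size: {in_plane_mm:.2f} mm")
print(f"centre-to-centre slice spacing: {slice_spacing_mm:.2f} mm")
```

The first two values correspond to the reported 56 min and the (rounded) 2.3 mm in-plane voxel size.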
fMRI data preprocessing
fMRI data of 15 participants were analysed with statistical parametric
mapping (SPM8; Wellcome Trust Centre for Neuroimaging;
https://www.fil.ion.ucl.ac.uk/spm/software/spm8/). Functional data were
pre-processed following standard spatial pre-processing procedures,
which consisted of slice time correction (by means of a cubic spline
interpolation method), spatial realignment and co-registration of
functional and anatomical data. Then, we performed a classical spatial
normalisation into the MNI (Montreal Neurological Institute)
stereotactic space that included resampling to a 2 × 2 × 2 mm voxel
size. Finally, data were spatially low-pass filtered using a 3D
Gaussian kernel with a full-width at half-maximum (FWHM) of 8 mm.
For single-subject analyses, evoked hemodynamic responses for the
different event types were modelled within a comprehensive general
linear model (GLM). This first level model included the generation
phase, the test sound phase and the decision phase (Fig. 3). We also
included an event comprising the period between the beginning of the
trial and the end of step 3 (the prior phase). By including this prior
phase in the first level analysis, we modelled the BOLD differences
between trial types (Recursion, Iteration and Repetition) within steps
1, 2 and 3, and sought to separate these effects from the generation
and test sound phases.
To summarize, the first level GLM included: (1) the prior phase, which
was the period between trial onset and the end of step 3, with duration
d = 26.04 s; (2) the generation phase, with onset 2–4 s prior to the
test sound (step 4), and with d = 2–4 s; (3) the test sound phase,
d = 7.4 s, corresponding to step 4; and (4) the decision phase,
comprising the period between the end of step 4 and the response button
press.
We further divided the test sound phase (composed of 3 clusters of 9
tones each) into two parts, the first comprising the first cluster
(d = 2.4 s) and the second comprising the remaining duration of the
tone sequence. The rationale behind this division was the following:
because the auditory stimuli were organized in 3 clusters with
identical structure (Fig. 2a), it was possible to detect violations of
well-formed tone sequences within the first cluster (Fig. 2e). In order
to model potential attention drifts or other artifacts related to motor
response preparation in the later part of the test sound phase, both
parts were included in the first level GLM. For the second-level
analysis, only the first part of the test sound phase was included in
the GLM. For comparison, results for the full test sound phase of 7.4 s
are depicted in Supplementary Materials; they are identical, except for
greater activity in the motor cortex areas, lateralized according to
the response button: LEFT for incorrect and RIGHT for correct.
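The event structure of a single trial, as entered into the first level model, can be sketched as a list of onsets and durations. The Python example below is a minimal illustration under the assumption that the phases follow one another without gaps. The function and event names are hypothetical, and the generation and decision durations passed in the usage line are arbitrary example values.

```python
PRIOR_S = 26.04        # trial onset to end of step 3 (from the text)
TEST_S = 7.4           # full test sound, step 4 (from the text)
TEST_PART1_S = 2.4     # first cluster of the test sound (from the text)


def trial_events(trial_onset: float, generation_s: float, decision_s: float):
    """Return (name, onset, duration) tuples for one trial, in seconds.

    Assumes that the phases are contiguous, i.e. each phase starts when
    the previous one ends.
    """
    if not 2.0 <= generation_s <= 4.0:
        raise ValueError("generation phase is jittered between 2 and 4 s")
    if not 0.0 <= decision_s <= 5.0:
        raise ValueError("decision phase lasts at most 5 s")
    durations = [
        ("prior", PRIOR_S),
        ("generation", generation_s),
        ("test_part1", TEST_PART1_S),
        ("test_part2", TEST_S - TEST_PART1_S),
        ("decision", decision_s),
    ]
    events, onset = [], trial_onset
    for name, duration in durations:
        events.append((name, onset, duration))
        onset += duration
    return events


for name, onset, duration in trial_events(0.0, generation_s=3.0, decision_s=0.74):
    print(f"{name:<11s} onset={onset:6.2f} s  duration={duration:5.2f} s")
```

In an actual analysis, each of these events would be convolved with a hemodynamic response function to form a regressor. That step is omitted here.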
To this design, we added estimated motion realignment
parameters as covariates of no interest to regress out residual
motion artefacts and increase statistical sensitivity. In addition,
a 128 s cutoff high-pass filter was applied to account for
low-frequency drifts and signal fluctuations.
Responses corresponding to the generation [RULE:
Recursion (R), Iteration (I) and Repetition (S)], the test
sound [RULE × CORRECTNESS: Correct (Co) vs. Foil
(Fo)], and the decision (BUTTON PRESS: left vs. right)
phases were then summarized across the four sessions and
entered into a second-level GLM.
fMRI statistical analysis
Group analyses were conducted in the context of the general linear
model (GLM) separately for the generation and test sound phases. For
these analyses, flexible factorial within-subject ANOVAs were performed
with the factor RULE (Recursion, Iteration and Repetition); for the
test sound phase analyses, CORRECTNESS (Correct vs. Foil) was added as
an additional factor. All analyses were restricted to grey matter, i.e.
individual whole brain data were masked using a voxel-wise global grey
matter threshold of 0.25, and finally thresholded at FWE-corrected
p < 0.05 for significance. We also modelled button press in the
decision phase.
Within these models further statistical parametric maps
using t contrasts were constructed to disentangle significant
main effects. In the generation phase, t contrasts were cal-
culated between each RULE. In the test sound phase, t con-
trasts were calculated between RULE and between Correct
and Foil sounds.
We controlled family-wise error rate (FWER) of clus-
ters below 0.05 with a cluster-forming height-threshold
of 0.001. Anatomical labels are based on the Harvard–Oxford cortical
structural atlas implemented in FSL
(https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases).
Region ofinterest (ROI) analyses
To investigate the role of IFG, hippocampus, and STG in the processing
of hierarchical structures, we extracted 8 ROIs from the Jülich
Histological Atlas comprising IFG (BA 44 and 45, left and right)
(Amunts and Zilles 2012) and hippocampus (Cornu Ammonis and Dentate
Gyrus, left and right) (Amunts et al. 1999) (Fig. 5), and 4 ROIs from
the Harvard–Oxford atlas (anterior and posterior STG, left and right)
(Fischl et al. 2004). The population map of these regions was truncated
at 50%.
Using the model structure of the flexible factorial within-subject
ANOVAs described above, we extracted the mean of the single-subject
beta values across each ROI mask using the REX toolbox
(https://web.mit.edu/swg/software.htm), with global scaling. We then
computed linear mixed models with these beta values as the dependent
variable, with RULE (Recursion, Repetition and Iteration) as
within-subject factor in the generation phase, and RULE, CORRECTNESS
(Correct, Foil) and their interaction in the test sound phase.
Statistical analyses were performed in RStudio (version 1.1.453).
Models were computed with the function lmer() from the package lme4
(Bates et al. 2014), using participants as random factor. Models are
reported using type II ANOVA tables, with p values obtained from the R
function Anova(). When main effects were found, we tested for pairwise
differences with emmeans() (Russell 2018), using the Kenward-Roger
method to calculate the degrees of freedom, and Tukey p value
adjustment when comparing 3 parameters.
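The denominator degrees of freedom reported for the ROI models in the Results can be recovered from the design. The Python sketch below assumes a balanced design with one mean beta value per participant and condition, and a random intercept per participant. The helper name is hypothetical, and the calculation is the classical within-subject formula rather than the Kenward-Roger procedure itself, with which it should coincide in a balanced design of this kind.

```python
def residual_df(n_participants: int, n_cells: int) -> int:
    """Classical denominator degrees of freedom for a balanced design with
    one observation per participant per cell and a random intercept per
    participant: observations - cell means - (participants - 1)."""
    n_obs = n_participants * n_cells
    return n_obs - n_cells - (n_participants - 1)


N_PARTICIPANTS = 15  # participants analysed, as reported in the text

generation_df = residual_df(N_PARTICIPANTS, n_cells=3)       # RULE
test_sound_df = residual_df(N_PARTICIPANTS, n_cells=3 * 2)   # RULE x CORRECTNESS
print(generation_df, test_sound_df)
```

These values match the F(2,28) and t(70) statistics reported for the generation and test sound phases, respectively.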
Results
Pretesting
Our sample scored on average 69% (SD = 14) in the Melodic Memory Task,
a result within the normative range for non-musicians in the United
Kingdom (Müllensiefen et al. 2014).
Results for the discrimination 2-forced choice task used
to increase experience with the stimulus material showed an
average percentage of correct answers of 77% (SD = 20) in
the Iteration Rule and 85% (SD = 17) in the Recursion Rule.
Participants could correctly reject all foil categories (accu-
racy > 70% for all), establishing that they were not using
simple heuristics to accomplish successful discrimination
(see “Methods” for details).
On the same day as the MR testing, participants per-
formed one session of 18 trials with the same 1-forced
choice detection task used within the scanner (but with three
foil categories instead of one). Mean accuracy was 73% in
Iteration trials, 73% in Recursion and 82% in Repetition.
The corresponding discriminability values (d’) were 0.92
for Iteration (SD = 0.60), 0.83 for Recursion (SD = 0.70),
and 1.08 for Repetition trials (SD = 0.36), indicating that
participants discriminated well above chance levels.
Behavioral
During MRI scanning, participants scored on average 75%
in Iteration trials (SD = 23), 79.2% in Recursion (SD = 18)
and 89% in Repetition (SD = 7). The corresponding discrimi-
nability values (d’) were 0.88 for Iteration (SD = 0.91), 1.06
for Recursion (SD = 0.79), and 1.31 for Repetition trials
(SD = 0.37). We found no significant effect of RULE on discriminability
scores (F(2, 28) = 2.2, p = 0.13, within-subjects ANOVA).
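For reference, discriminability (d’) in a detection task is conventionally computed as the difference between the z-transformed hit and false-alarm rates. The Python sketch below illustrates this standard formula. The log-linear correction for extreme rates, the definition of a hit as a “correct” response to a Correct sequence, and the example counts are assumptions made for illustration and are not taken from the analysis reported here.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int,
            correct_rejections: int) -> float:
    """d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count) keeps the rates away
    from exactly 0 or 1, where the z-transform would be infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 12 Correct and 12 Foil trials for one rule.
example = d_prime(hits=10, misses=2, false_alarms=3, correct_rejections=9)
print(round(example, 2))
```

A value of 0 corresponds to chance performance, and larger values indicate better discrimination between Correct and Foil sequences.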
Fig. 5 Regions of interest
(ROIs). ROIs were defined on
the Jülich Histological (IFG and
Hippocampus) and Harvard–
Oxford (superior temporal
gyrus) Atlases. Population map
for all areas was truncated at
50%. BA Brodmann’s area, L
left, R right
Response time in the decision phase was on average 740 ms in Iteration
trials (SD = 290), 780 ms in Recursion (SD = 330) and 580 ms in
Repetition (SD = 140). We found an effect of RULE on response time
(F(2, 28) = 6.2, p = 0.006). Specifically, there was a statistically
significant difference between Repetition and both Recursion and
Iteration (both p values < 0.02), but not between Iteration and
Recursion (p = 0.5).
fMRI
Generation phase
Data is depicted in Fig.6 and Table1. In comparison with
Repetition, we found that imagining new hierarchical levels
using the Recursion rule activated a bilateral network com-
prising Planum Temporale (PT) and Heschl’s Gyrus (HG),
extending to the posterior superior temporal gyrus (STG).
The linear contrast Recursion > Iteration yielded a similar
pattern on the right hemisphere, but did not extend to PT
and pSTG on the left. The contrasts Iteration > Repetition
Fig. 6 Brain activation during
the generation phase (between
steps 3 and 4)
Table 1 Rule effect in the generation phase
Whole-brain activation cluster sizes (k), MNI coordinates (x, y, z) and T values for the Rule contrast in the
generation phase (voxel-level p < 0.001; cluster-level p < 0.05, FWE corrected). Repeated labels within each
cluster are not depicted
Hem hemisphere, HG Heschl’s Gyrus, pSTG superior temporal gyrus, posterior division, PT planum temporale
Region                     Hem   k       x     y     z    T value
Recursion > Iteration
pSTG                       R     584     70   −26    10   6.06
                                         66   −20     2   5.93
HG                                       56   −10     4   4.13
HG                         L     243    −48   −16     2   4.86
Central opercular cortex                −50    −6    12   4.20
Insular cortex                          −38    −8    12   3.61
Recursion > Repetition
pSTG                       R     1055    66   −20     4   6.72
PT                                       56   −24     6   6.10
HG                                       52   −14     4   5.52
PT                         L     807    −48   −18     4   5.58
pSTG                                    −58   −34     8   5.15
HG                                      −56   −10     2   4.90
and Iteration > Recursion yielded no significant activations.
The application of the Recursion rule during the generation
phase yielded specific activations in contrast with both the
simple Repetition and Iterative rule. Compared to Repeti-
tion, the Recursive rule activated a bilateral network com-
prising Heschl’s Gyrus (HG), posterior Superior Temporal
Gyrus (pSTG) and Planum Temporale (PT). The same net-
work was active in the contrast Recursion > Iteration for the
right hemisphere, but for the left hemisphere, activity was
restricted to the HG.
To test whether the IFG, STG or hippocampus, or any of
their sub-regions played a significant role in the generation
of melodic hierarchies, we performed ROI analyses. In
particular, we tested whether the mean activation differed
between rules for each ROI individually (Fig. 7). Significant main
effects of Rule were found only for right anterior STG [aSTG R:
F(2,28) = 7.7, p < 0.001], right posterior STG [pSTG R: F(2,28) = 8.1,
p < 0.001] and left posterior STG [pSTG L: F(2,28) = 4.1, p = 0.03].
Within all other ROIs the effect of Rule was not significant (all
ps > 0.05).
In particular, we found that mean activity for Recursion
was higher than for Repetition in all three regions [aSTG R:
t(28) = 3.2, b = 1.0, p < 0.001; pSTG R: t(28) = 3.6, b = 0.9,
p < 0.001; pSTG L: t(28) = 3.7, b = 0.9, p = 0.03], and
Fig. 7 ROI analysis during the generation phase. For each of the 12
ROIs, we performed linear mixed models for single-subject beta val-
ues, with Rule as fixed factor. We found significant main effects of
Rule only within the STG (see text for details). In particular, Recur-
sion activity was higher than Iteration only within right posterior
Superior Temporal Gyrus [t(28) = 3.3, p < 0.001]. p posterior, a ante-
rior, L left, R right, STG superior temporal gyrus, IFG inferior frontal
gyrus. *p < 0.05 **p < 0.001. Mean Beta (global scaling): mean beta
values divided by the global mean across all voxels and scaled to 100
Recursion activity was higher than Iteration within pSTG R
[t(28) = 3.3, b = 0.9, p < 0.001].
Comparing ROI mean activity can potentially conceal
interesting activity cluster differences between rules within
each ROI. To address this issue, we ran several Small
Volume Correction (SVC) analyses within the same ROI
masks. Other than the results already reported for mean
ROI activity there were no other significant differences
during the generation phase (with uncorrected p < 0.01).
Fig. 8 Brain activation during the test sound phase (first part of
step 4). A fronto-temporo-parietal network was activated when par-
ticipants heard sequences of tones that violated the underlying rule
(Foil) in comparison with well-formed tone sequences (Correct)
(main effect of CORRECTNESS). This finding was consistent across
all rules (Iteration, Recursion and Repetition). There was no effect of
RULE in the test sound phase and no interaction between RULE and
CORRECTNESS
Table 2 Rule effect in the test sound phase
Whole-brain activation cluster sizes (k), MNI coordinates (x, y, z) and Z scores for the Rule contrast in the
test sound phase (voxel-level p < 0.001; cluster-level p < 0.05, FWE corrected). Repeated labels within each
cluster are not depicted
Hem hemisphere, Lat Lateral, pSTG superior temporal gyrus, posterior division, SFG superior frontal
gyrus, MFG middle frontal gyrus, IFGpo inferior frontal gyrus, pars opercularis, PT planum temporale,
AG angular gyrus, pSMG supramarginal gyrus, posterior division
Region                     Hem   k       x     y     z    T value
Foil > Correct
MFG                        R     4633    48    12    40   6.98
IFGpo                                    58    18    18   6.56
                                         42    14    32   6.53
SFG                        L     2247     0    30    44   6.90
Paracingulate cortex                     −6    18    46   6.50
MFG                        L     4131   −38    12    28   6.67
                                        −38     0    52   5.78
Frontal pole                            −46    40     8   5.57
pSTG                       L     1135   −64   −26     6   6.20
PT                                      −56   −30     6   5.68
pSMG                                    −62   −42    12   5.29
pSMG                       L     999    −42   −46    42   5.90
                                        −32   −56    36   5.70
pSMG                       L     514     36   −46    36   4.73
AG                                       44   −50    46   4.39
Lat. occipital                           40   −58    46   4.27
Supplementary motor        R     234     12    −2     4   4.27
Caudate                                  12    16     2   3.79
Pallidum                                 12     6     0   3.69
Test sound phase
Data is depicted in Fig.8 and Table2. In the test sound
phase, we modelled both Rule and Correctness (Correct vs.
Foil). We found no main effect of Rule and no interaction
between Rule and Correctness. In this phase, we found that
when participants heard melodic sequences that violated the
correct rules (vs. well-formed sequences), there was activ-
ity in a bilateral fronto-temporo-parietal network. This net-
work included clusters in the Superior Frontal Gyrus (SFG)
extending to Paracingulate Gyrus, Middle Frontal Gyrus
(MFG) extending to IFG, STG, Supra Marginal Gyrus
(SMG), Angular Gyrus (AG) extending to Lateral Occipital
Cortex, and finally a Supplementary Motor Cortex cluster
extending to the basal ganglia.
In addition to whole brain analysis, we performed ROI
analyses, using the same regions as for the generation phase
(Fig.9). For each region, we performed a linear mixed
model with Rule, Correctness and their interaction as fixed
factors, and participant as random factor. Within the left
Fig. 9 ROI analysis during the test sound phase. Mimicking the
whole brain analysis, we found an increase of activity in the con-
trast Foil > Correct in all IFG sub-regions, right anterior and pos-
terior STG, and left posterior STG (all p < 0.001). In addition, the
same contrast was associated with a decrease in activity in the left
hippocampal subregions Cornu Ammonis and Dentate Gyrus (both
p = 0.02). Finally, we found a main effect of Rule within the right
anterior STG, in particular, activity was higher in Recursion than
Repetition [t(70) = 2.6, p = 0.04]. p posterior, a anterior, L left, R
right, STG superior temporal gyrus, IFG inferior frontal gyrus.
*p < 0.05 **p < 0.001. Mean Beta (global scaling) mean beta values
divided by the global mean across all voxels and scaled to 100
hippocampus, we found a main effect of Correctness in
the Cornu Ammonis [F(1,70) = 5.7, p = 0.02] and Dentate
Gyrus [F(1,70) = 5.4, p = 0.02]. In particular, activity in
these regions was higher during processing of Correct tone
sequences vs. Foils [Cornu Ammonis: t(70) = 2.4, b = 0.5,
p = 0.02; Dentate Gyrus: t(70) = 2.3, b = 0.9, p = 0.02]. Simi-
larly, with the exception of left anterior STG, we found a
main effect of Correctness within all IFG regions and STG
regions, [BA44 L: F(1,70) = 27.5, p < 0.001; BA44 R:
F(1,70) = 42.9, p < 0.001; BA45 L: F(1,70) = 19.1, p < 0.001;
BA45 R: F(1,70) = 34.2, p < 0.001; aSTG R: F(1,70) = 11.1,
p < 0.001; pSTG L: F(1,70) = 25.3, p < 0.001; pSTG R:
F(1,70) = 35.8, p < 0.001]. Contrary to the hippocampus,
activity in these regions was lower during the processing
of Correct tone sequences vs. Foils, [BA44 L: t(70) = − 5.2,
b = − 2.2, p < 0.001; BA44 R: t(70) = − 6.6, b = − 2.2,
p < 0.001; BA45 L: t(70) = − 4.4, b = − 1.6, p < 0.001; BA45
R: t(70) = − 5.9, b = − 1.9, p < 0.001; aSTG R: t(70) = − 3.3,
b = − 1.2, p = 0.001; pSTG L: t(70) = − 5.0, b = − 1.4,
p < 0.001; pSTG R: t(70) = − 6.0, b = − 2.2, b = − 1.4,
p < 0.001].
Finally, we found a main effect of Rule within the right
anterior STG [F(2,70) = 3.4, p = 0.04]. In particular, activity
was higher in Recursion than Repetition [t(70) = 2.6, b = 1.2,
p = 0.04]. We found no other significant main effects or
interactions within the ROIs.
As in the generation phase, we performed SVC analyses
to detect whether there were particular clusters of activity
differentiating task rules in addition to the mean ROI analy-
sis. We replicated the mean ROI results and found addi-
tional activity clusters within the left posterior STG for the
contrast Recursion > Iteration (T = 3.58, x = − 66, y = − 22,
z = 6, K = 8, pFWE-cluster = 0.028), and within the right poste-
rior STG for the contrast Recursion > Repetition (T = 3.54,
x = 66, y = − 14, z = 0, K = 6, pFWE-cluster = 0.031). There were
no other significant clusters within STG, hippocampus or
IFG (with threshold uncorrected p < 0.01).
Discussion
Our goal was to isolate and investigate the neural bases sup-
porting the generation and representation of hierarchies in
melodic sequences. To that aim, we devised a paradigm in
which identical melodic sequences were generated accord-
ing to different rules: A Recursion rule, which added hier-
archical levels via recursive embedding; an Iterative rule,
which successively added items to a fixed hierarchical level,
without creating new levels; and a control Repetition rule,
which simply required short term memory of a complete
melodic sequence without any cognitive transformation.
In our procedure, we primed participants with a certain
rule by successively presenting three melodic sequences
corresponding to the first three steps that resulted from the
application of each rule. After the third step, we asked par-
ticipants to apply the rule one step further and to imagine
the next melodic sequence (generation phase). Then we pre-
sented a fourth sequence (test sound phase), correct or foil,
and asked participants to evaluate whether it matched their
predictions. Using this paradigm we could isolate the neural
structures active in the representation of Recursive Hierar-
chical Embedding (RHE), both in anticipation (generation
phase) and during the perception (test sound phase) of a
melodic hierarchy.
As in our previous behavioral work (Martins et al. 2017), we found
that, after training, participants were able to achieve comparable
accuracy in the Recursion and Iteration rules, and to reject different
foil categories as incorrect, indicating that they did not rely upon
any simple auditory heuristic to solve the tasks. They rejected
different foil categories in both a 2-forced choice discrimination task
and a 1-forced choice detection task similar to the fMRI procedure.
Crucially, the Recursion rule in the auditory paradigm has been shown
behaviorally to share cognitive resources with similar tasks in the
visual and action sequencing domains (Martins et al. 2017). This
suggests that the Recursion rule in our auditory task is adequate to
isolate the mechanisms underlying the representation of RHE. In the
following paragraphs, we summarize the results we obtained using this
reliable methodology.
First, we found that during the generation phase, between
steps 3 and 4, there was increased activity in STG, HG and
PT for the Recursion rule, in comparison with Iteration and
with Repetition, indicating that these regions are particu-
larly involved in the cognitive generation of new hierarchi-
cal levels. This activity was more robust in the right hemi-
sphere. Based on previous hypotheses implicating IFG, STG
and hippocampus in the processing of hierarchies across
domains (see introduction for a review), we specifically
tested whether these regions were active in the generation
phase. In line with whole brain analysis, we found signifi-
cantly increased activity only in the right posterior STG for
both contrasts Recursion vs. Iteration and Recursion vs.
Repetition.
Second, in the test sound phase, during which participants listened to
melodic sequences that were identical across conditions, the whole
brain analysis revealed no significant difference between task rules.
However, ROI analyses suggest that some clusters within the posterior
STG are more active in Recursion than in either Iteration (left
hemisphere) or Repetition (right hemisphere), thus partially
replicating our results for the generation phase.
In addition, we found strong effects of Correctness, suggesting that
participants evaluated whether their expectations were met. In line
with previous studies (Koelsch et al. 2005; Musso et al. 2015;
Salimpoor et al. 2015; Seger et al. 2013),
in this analysis we found increased activity in a fronto-
temporo-parietal network during the processing of viola-
tions relative to well-formed structures. Our ROI analyses
confirmed this finding for each subregion within IFG (BA
44 and 45, both left and right) and STG (pSTG L, aSTG R,
pSTG R). Interestingly, processing violations also decreased
activity across several regions within left hippocampus.
We now turn to a discussion of how these results contrib-
ute to the understanding of the neural bases of the represen-
tation of recursive hierarchical embedding rules.
STG inthegeneration ofnew levels inmelodic
hierarchies
Previous research has implicated both IFG and STG in the processing of
music syntax (Koelsch et al. 2005, 2002; Minati et al. 2008; Musso
et al. 2015; Seger et al. 2013). However, the specific roles of these
areas in the generation of hierarchies remain unknown (Bianco et al.
2016; Fadiga et al. 2009; Fitch and Martins 2014; Friederici 2011;
Koelsch et al. 2002; Maess et al. 2001; Makuuchi et al. 2009; Patel
2003; Zaccarella et al. 2015).
In our experiment, STG (especially the right posterior
STG) was robustly more active in both Recursion > Itera-
tion and Recursion > Repetition during the stimulus-free
generation phase. However, activity in this area did not differ
between Iteration and Repetition, suggesting that this effect is more
likely to reflect hierarchical generative effort than any simple
difference in the number of tones maintained in memory. This region is
not only associated with the processing of music syntax, but is more
generally thought to be a repository of tonal sequence schemas (Janata
2002) and tonal relations, being active in auditory imagery and
prediction tasks (Salimpoor et al. 2015, for a review). Relevant for
our task, STG has also been shown to differentiate between ascending
and descending melodic contours (Lee et al. 2011) and between local and
global level violations (Stewart et al. 2008). During the generation
phase, participants were required to take step 3, which provides a global
context, and to add a new local hierarchical level according
to a rule which determined whether this local contour was
ascending or descending. Then they were asked to build a
guiding prediction of this new structure before the test sound
(step 4). Combined with results from the previous literature,
our results suggest that the representation of RHE is cru-
cially dependent on the retrieval and manipulation of the
appropriate tonal sequence schemas from STG.
In our study, we found no indication that IFG was active
in the generation of new hierarchical levels. IFG is commonly active
when well-formed structures are contrasted with violations, not only in
music syntax (Bianco et al. 2016; Koelsch et al. 2002; Maess et al.
2001; Patel 2003), but also in language (Friederici 2011, for a
review). These findings led to the hypothesis that IFG supports the
generation of hierarchies across domains (Fadiga et al. 2009; Fitch and
Martins 2014; Patel 2003; Fitch 2014). However, instead of supporting
functions specific to the generation of hierarchies, IFG might rather
support domain-general cognitive functions associated with working
memory, cognitive control, or other computations necessary to process
unexpected or complex sequences (Bigand et al. 2014; Patel and Morgan
2017; Rogalsky et al. 2011). While our study does not demonstrate
a domain-general role of IFG, it is more consistent
with this hypothesis than with the hypothesis of a specific
role for IFG in the generation of hierarchies.
Finally, against our prior hypothesis, we also did not find
hippocampus to be active in the representation of RHE dur-
ing the generation phase. While this region may be impor-
tant for the initial formation of new hierarchical schemas
(Berens and Bird 2017), with training on a particular cat-
egory of stimuli, as in the current study, these functions may
migrate to other cortical regions, e.g. the superior temporal
cortex (Gilboa and Marlatte 2017).
IFG, STG andhippocampus intheprocessing
ofmelodic sequences
In contrast to the generation phase, we did not find signifi-
cant differences in brain activity between Recursion and
Iteration when participants heard the melodic sequences
during the test sound phase. Because brain activity in all
trial phases (i.e. generation, test sound, and decision) was
included in the first-level GLM, this null result means that the
test sound phase did not explain additional rule differences once
the generation phase was accounted for. In other words, the stimuli
themselves were not represented differently once the effects
of expectancy were also modelled.
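The logic of modelling all trial phases in one first-level GLM can be illustrated with a minimal, noise-free sketch. It is not the analysis pipeline used in this study: the boxcar regressors (without haemodynamic convolution), the onsets and durations, the simulated voxel, and the helper names boxcar and fit_glm are all illustrative assumptions.

```python
import numpy as np

def boxcar(n_scans, onsets, duration):
    """Return a 0/1 regressor that is 1 during each phase occurrence."""
    x = np.zeros(n_scans)
    for onset in onsets:
        x[onset:onset + duration] = 1.0
    return x

def fit_glm(y, regressors):
    """Ordinary least-squares fit with an intercept column first."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas

# Hypothetical timing: six trials, each with three non-overlapping phases.
n_scans = 120
generation = boxcar(n_scans, range(0, n_scans, 20), duration=4)
test_sound = boxcar(n_scans, range(6, n_scans, 20), duration=4)
decision = boxcar(n_scans, range(12, n_scans, 20), duration=3)

# Simulated voxel that responds during the generation phase only.
y = 0.5 + 2.0 * generation

# Order of estimates: intercept, generation, test sound, decision.
betas = fit_glm(y, [generation, test_sound, decision])
```

In this toy case the whole effect is assigned to the generation regressor, and the test sound regressor receives an estimate of zero because it explains nothing beyond what the generation regressor already captures.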
Although we did not find an effect of rule, we replicated
the classical activity pattern for the processing of violations
vs. well-formed tonal structures (Koelsch et al. 2005, 2002;
Musso et al. 2015; Seger et al. 2013), which included IFG
and STG, but also portions of the fronto-parietal network.
Again, this pattern of activity seems less likely to reflect
increased structural load specific to hierarchical processing
than the recruitment of other mechanisms required to
detect and resolve expectancy violations, such as increased
attention, cognitive control and auditory working memory
(Bigand et al. 2014; Patel and Morgan 2017; Rogalsky et al.
2011).
Another interesting finding in the test sound phase was
the deactivation of the hippocampus in the processing of
violations. The hippocampus is known to guide reactivation
of memory schemas during perceptual experience (Schlicht-
ing and Preston 2015), biasing processing from the input
system (Gilboa and Marlatte 2017). When new stimuli are
presented, these are either assimilated into existing schemas,
or the old schemas are modified to accommodate the new
stimuli. However, when a certain item strongly violates
expectations, activity in the hippocampus is reduced to facilitate
violation detection (Armelin et al. 2017). This process
might be essential to inhibit accommodation of schemas in
response to incorrect stimuli. Interestingly, the accuracy of
music schema retrieval is associated with decreased activity
in left IFG and increased activity in right hippocampus
(Watanabe et al. 2008). Our pattern of activity during violations
was the mirror image (increased IFG activity and decreased
hippocampal activity), hinting that these regions may play
complementary roles in detecting and resolving violations.
Limitations
First, a minor oversight is that we did not counterbalance the
button-press mapping (LEFT, RIGHT) across participants, so each
response category (CORRECT/INCORRECT) was always mapped to the
same button. We minimized the influence of this design shortcoming by
including activity only from the first melodic structure cluster
of the test sound phase (we excluded the later part in
which the dominant cognitive process was the preparation
for response) and by including the button press phase in the
first-level model.
Second, in the fMRI experiment the detection task in the
test sound phase was very simple, since there was only one
kind of foil: participants only needed to detect ascending vs.
descending contour. Tasks were simpler in the scanner to
keep the stimuli exactly the same across rules: as shown in
Fig. 4, a repetition foil in the Recursion condition sounds
different from one in the Iteration condition. Thus, using these foils
in the scanner task would have introduced a perceptual confound.
However, participants were trained twice, using both
discrimination and detection tasks, with a more complex
set of stimuli and with more foils. Thus, it is unlikely that
they acquired simple ascending/descending response heuristics,
since these would be insufficient to solve the training
tasks, where they also scored adequately (and equivalently
to their performance in the scanner).
Third, accepting that participants were not employing
simple heuristics to determine the contour of the fourth
level, how can we determine if they were truly able to rep-
resent hierarchical relations? Determining the exact compu-
tations underlying a behavioral task is always a challenge.
However, we surmise that there are minimal representational
requirements necessary to solve our task: in the Recursive
rule (but not the other rules) participants necessarily have to
bind information from two different hierarchical levels. In
particular, they have to apply the information derived from a
given hierarchical level (with a particular rhythmic structure,
pitch range, and melodic contour) to form an expectation
about the next level (with a similar but more rapid rhythmic
structure, higher pitch range, and same contour). The
hierarchical structure is not in the stimulus itself, but rather
in the rule that binds each parent tone from level n to a set of
three child tones in level n + 1: the frequency and duration
of a parent tone determine the melodic and rhythmic
structure of the triplet. Since this relation is a rooted directed
acyclic graph with a branched structure, it is by definition
hierarchical (Udden et al. 2019). In addition, the previous
finding that the ability to perform the Recursive rule strongly
and specifically correlates with similar abilities in the visual
and motor domains (Martins et al. 2017) supports the
assumption that the Recursive task isolates some aspects of
hierarchical generativity.
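The parent-to-child binding described above can be illustrated with a minimal sketch. It is not the stimulus-generation code used in this study: the MIDI pitch encoding, the one-octave shift between levels, the stacked major-third offsets for an ascending contour, the division of duration by three, and the names Tone, expand, generate and depth are all illustrative assumptions.

```python
from dataclasses import dataclass, field

OCTAVE = 12              # assumed shift that places children above the parent
ASCENDING = (0, 4, 8)    # assumed offsets in semitones (stacked major thirds)

@dataclass
class Tone:
    pitch: int           # MIDI note number (assumed encoding)
    duration: float      # arbitrary time units
    level: int           # hierarchical level, 1 = root
    children: list = field(default_factory=list)

def expand(parent, contour=ASCENDING):
    """Bind one parent tone at level n to three child tones at level n + 1."""
    parent.children = [
        Tone(parent.pitch + OCTAVE + offset, parent.duration / 3, parent.level + 1)
        for offset in contour
    ]
    return parent.children

def generate(root, steps):
    """Apply the rule to every tone of the newest level, once per extra step."""
    newest = [root]
    for _ in range(steps - 1):
        newest = [child for tone in newest for child in expand(tone)]
    return newest

def depth(tone):
    """Number of hierarchical levels in the tree rooted at this tone."""
    return 1 + max((depth(child) for child in tone.children), default=0)

root = Tone(pitch=48, duration=9.0, level=1)
leaves = generate(root, steps=4)   # 27 tones, all at level 4
```

Because every tone has exactly one parent and expansion only ever points from level n to level n + 1, the resulting structure is a rooted, branching, acyclic graph. Under these assumptions the same rule applied to step 3 yields step 4 without any additional rule being needed.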
Finally, it could be argued that activity in the generation
phase reflects “spillover” activation from steps 1, 2 and 3.
However, (1) we modelled all trial phases in the first-level
analysis, which reduces this spillover, and (2) stimuli in these
early steps were actually more complex (more items and
hierarchical levels) in Iteration and Repetition than in the
Recursion rule. Hence, activity in the generation phase is
more likely to reflect the additional cognitive load required
for the transformation of step 3 into step 4, i.e., the genera-
tion of a new hierarchical level, than any long-lasting effects
of previous phases. Moreover, patterns of activity in these
early trial phases (steps 1, 2 and 3, see Supplementary Mate-
rials) clearly show increased activity for Repetition within
the Fronto–Parietal Network, and increased activity for
both Iteration and Recursion (vs. Repetition) within midline
structures. Crucially, in this phase there was no increased
activity in Recursion vs. Iteration within the temporal corti-
ces, making it unlikely that spillover effects account for the
STG activations we found.
Conclusion
In this study we used a novel paradigm with musical stimuli
to clarify the neural basis of hierarchical cognition in the
auditory domain. Previous research has uncovered a perisylvian
network, incorporating both temporal and frontal cortices,
and strikingly similar to that used in language, apparently
involved in the processing of musical syntax (Koelsch
et al. 2002; Musso et al. 2015; Patel 2003; Sammler et al.
2013). However, these previous studies used a violation
paradigm which left unclear whether the observed activa-
tions might reflect surprise, attentional change, and cognitive
control, rather than factors specific to hierarchical process-
ing of melodic stimuli. Our new paradigm allowed us to
tease these factors apart, and revealed robust activation of
superior temporal gyrus (particularly right posterior STG),
specifically during the process of cognitively generating a
new level in an auditory hierarchical structure.
In contrast, inferior frontal regions were only robustly
activated when detecting violations versus correct stimuli
in all conditions (not specifically for hierarchical genera-
tion). This result suggests that the IFG is mostly involved in
violation detection/cognitive control in our task, rather than
hierarchy generation per se.
This division of labor mirrors recent findings in the visual
domain with stroke patients (Martins et al. 2019) and suggests
that future work aiming to probe the role of the IFG
and STG in music (or other cognitive domains) should use
test paradigms that do not rely solely on a violation/correct
discrimination, but rather isolate the generative acts involved
in processing and manipulating hierarchical representations
(Fitch 2014; Fitch and Martins 2014).
From a music cognition perspective, we note that our melodic
fractals, despite their simplicity and limited aesthetic appeal,
have a vertical harmonic structure that paves
the way for the study of musical hierarchies in more complex
and musically relevant stimuli, such as musical excerpts
making use of contrapuntal techniques.
Acknowledgement Open Access funding provided by Projekt DEAL.
Funding This work was supported by an FCT Grant SFRH/
BD/64206/2009 to MM, by ERC Advanced Grant SOMACCA, Pro-
ject No. 230604 to WTF, by a research cluster grant “Shared Neural
Resources for Music and Language” to WTF and R. Beisteiner (Univer-
sity of Vienna and Medical University of Vienna), by the Grant EUR
FrontCog ANR-17-EURE-0017 and ANR-10-IDEX-0001-02 PSL.
Compliance with ethical standards
Conflict of interest The authors report no competing interests.
Research involving human participants and/or animals All procedures
performed in studies involving human participants were in accordance
with the ethical standards of the institutional review board and with the
Helsinki declaration ethical standards. This article does not contain any
studies with animals performed by any of the authors.
Informed consent Informed consent was obtained from all individual
participants included in the study.
Open Access This article is licensed under a Creative Commons Attri-
bution 4.0 International License, which permits use, sharing, adapta-
tion, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons licence, and indicate if changes
were made. The images or other third party material in this article are
included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in
the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder. To view a
copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Amunts K, Zilles K (2012) Architecture and organizational principles of Broca’s region. Trends Cogn Sci 16(8):418–426. https://doi.org/10.1016/j.tics.2012.06.005
Amunts K, Schleicher A, Buerger U, Mohlberg H, Uylings HBM, Zilles K (1999) Broca’s region revisited: cytoarchitecture and intersubject variability. J Comp Neurol 412(2):319–341
Armelin A, Heinemann U, de Hoz L (2017) The hippocampus influences assimilation and accommodation of schemata that are not hippocampus-dependent. Hippocampus 27(3):315–331. https://doi.org/10.1002/hipo.22687
Bates D, Maechler M, Bolker B, Walker S (2014) lme4: linear mixed-effects models using Eigen and S4. R package version 1.1-7
Beisteiner R, Erdler M, Mayer D, Gartus A, Edward V, Kaindl T, Deecke L (1999) A marker for differentiation of capabilities for processing of musical harmonies as detected by magnetoencephalography in musicians. Neurosci Lett 277(1):37–40. https://doi.org/10.1016/S0304-3940(99)00836-8
Berens SC, Bird CM (2017) The role of the hippocampus in generalizing configural relationships. Hippocampus 27(3):223–228. https://doi.org/10.1002/hipo.22688
Bianco R, Novembre G, Keller PEE, Kim S-GG, Scharf F, Friederici AD, Sammler D (2016) Neural networks for harmonic structure in music perception and action. NeuroImage 142:454–464. https://doi.org/10.1016/j.neuroimage.2016.08.025
Bigand E, Delbé C, Poulin-Charronnat B, Leman M, Tillmann B (2014) Empirical evidence for musical syntax processing? Computer simulations reveal the contribution of auditory short-term memory. Front Syst Neurosci 8(June):1–27. https://doi.org/10.3389/fnsys.2014.00094
Buzsáki G, Moser EI (2013) Memory, navigation and theta rhythm in the hippocampal-entorhinal system. Nat Neurosci 16(2):130–138. https://doi.org/10.1038/nn.3304
Cooper RP, Shallice T (2006) Hierarchical schemas and goals in the control of sequential behavior. Psychol Rev 113(4):887–916. https://doi.org/10.1037/0033-295X.113.4.887
Fadiga L, Craighero L, D’Ausilio A (2009) Broca’s area in language, action, and music. Ann N Y Acad Sci 1169(1):448–458. https://doi.org/10.1111/j.1749-6632.2009.04582.x
Fazio P, Cantagallo A, Craighero L, D’Ausilio A, Roy AC, Pozzo T, Fadiga L (2009) Encoding of human action in Broca’s area. Brain 132(7):1980–1988. https://doi.org/10.1093/brain/awp118
Fischl B, Van Der Kouwe A, Destrieux C, Halgren E, Ségonne F, Salat DH, Dale AM (2004) Automatically parcellating the human cerebral cortex. Cereb Cortex 14(1):11–22. https://doi.org/10.1093/cercor/bhg087
Fitch WT (2014) Toward a computational framework for cognitive biology: unifying approaches from cognitive neuroscience and comparative cognition. Phys Life Rev 11(3):329–364. https://doi.org/10.1016/j.plrev.2014.04.005
Fitch WT, Martins MD (2014) Hierarchical processing in music, language, and action: Lashley revisited. Ann N Y Acad Sci 1316(1):87–104. https://doi.org/10.1111/nyas.12406
Friederici AD (2011) The brain basis of language processing: from structure to function. Physiol Rev 91(4):1357–1392. https://doi.org/10.1152/physrev.00006.2011
Garvert MM, Dolan RJ, Behrens TEJ (2017) A map of abstract relational knowledge in the human hippocampal–entorhinal cortex. eLife 6:1–20. https://doi.org/10.7554/eLife.17086
Gilboa A, Marlatte H (2017) Neurobiology of schemas and schema-mediated memory. Trends Cogn Sci 21(8):618–631. https://doi.org/10.1016/j.tics.2017.04.013
Greve DN (2002) Optseq Home Page. Retrieved from https://surfer.nmr.mgh.harvard.edu/optseq
Groussard M, La Joie R, Rauchs G, Landeau B, Chételat G, Viader F, Platel H (2010) When music and long-term memory interact: effects of musical expertise on functional and structural plasticity in the hippocampus. PLoS ONE 5(10):1–8. https://doi.org/10.1371/journal.pone.0013225
Hauser MD, Chomsky N, Fitch WT (2002) The faculty of language: what is it, who has it, and how did it evolve? Science 298(5598):1569–1579. https://doi.org/10.1126/science.298.5598.1569
Herholz SC, Halpern AR, Zatorre RJ (2012) Neuronal correlates of perception, imagery, and memory for familiar tunes. J Cogn Neurosci 24(6):1382–1397. https://doi.org/10.1162/jocn_a_00216
Janata P (2002) The cortical topography of tonal structures underlying western music. Science 298(5601):2167–2170. https://doi.org/10.1126/science.1076262
Koelsch S, Gunter TC, Cramon DY, Zysset S, Lohmann G, Friederici AD (2002) Bach speaks: a cortical “language-network” serves the processing of music. NeuroImage 17(2):956–966. https://doi.org/10.1016/S1053-8119(02)91154-7
Koelsch S, Fritz T, Schulze K, Alsop D, Schlaug G (2005) Adults and children processing music: an fMRI study. NeuroImage 25(4):1068–1076. https://doi.org/10.1016/j.neuroimage.2004.12.050
Lee YS, Janata P, Frost C, Hanke M, Granger R (2011) Investigation of melodic contour processing in the brain using multivariate pattern-based fMRI. NeuroImage 57(1):293–300. https://doi.org/10.1016/j.neuroimage.2011.02.006
Lerdahl F, Jackendoff R (1977) Toward a formal theory of tonal music. J Music Theory 21(1):111–171
Maess B, Koelsch S, Gunter TC, Friederici AD (2001) Musical syntax is processed in Broca’s area: an MEG study. Nat Neurosci 4(5):540–545. https://doi.org/10.1038/87502
Makuuchi M, Bahlmann J, Anwander A, Friederici AD (2009) Segregating the core computational faculty of human language from working memory. Proc Natl Acad Sci USA 106(20):8362–8367. https://doi.org/10.1073/pnas.0810928106
Martins MD, Fischmeister FP, Puig-Waldmüller E, Oh J, Geißler A, Robinson S, Beisteiner R (2014a) Fractal image perception provides novel insights into hierarchical cognition. NeuroImage 96:300–308. https://doi.org/10.1016/j.neuroimage.2014.03.064
Martins MD, Laaha S, Freiberger EMEM, Choi S, Fitch WT (2014b) How children perceive fractals: hierarchical self-similarity and cognitive development. Cognition 133(1):10–24. https://doi.org/10.1016/j.cognition.2014.05.010
Martins MD, Martins IP, Fitch WT (2015) A novel approach to investigate recursion and iteration in visual hierarchical processing. Behav Res Methods. https://doi.org/10.3758/s13428-015-0657-1
Martins MD, Gingras B, Puig-Waldmueller E, Fitch WT (2017) Cognitive representation of “musical fractals”: processing hierarchy and recursion in the auditory domain. Cognition. https://doi.org/10.1016/j.cognition.2017.01.001
Martins MD, Bianco R, Sammler D, Villringer A (2019) Recursion in action: an fMRI study on the generation of new hierarchical levels in motor sequences. Hum Brain Mapp. https://doi.org/10.1002/hbm.24549
McKenzie S, Frank AJ, Kinsky NR, Porter B, Rivière PD, Eichenbaum H (2014) Hippocampal representation of related and opposing memories develop within distinct, hierarchically organized neural schemas. Neuron 83(1):202–215. https://doi.org/10.1016/j.neuron.2014.05.019
Minati L, Rosazza C, D’Incerti L, Pietrocini E, Valentini L, Scaioli V, Bruzzone MG (2008) FMRI/ERP of musical syntax: comparison of melodies and unstructured note sequences. NeuroReport 19(14):1381–1385. https://doi.org/10.1097/WNR.0b013e32830c694b
Müllensiefen D, Gingras B, Musil J, Stewart L, Levitin D, Hallam S, Winner E (2014) The musicality of non-musicians: an index for assessing musical sophistication in the general population. PLoS ONE 9(2):e89642. https://doi.org/10.1371/journal.pone.0089642
Musso M, Weiller C, Horn A, Glauche V, Umarova R, Hennig J, Rijntjes M (2015) A single dual-stream framework for syntactic computations in music and language. NeuroImage 117:267–283. https://doi.org/10.1016/j.neuroimage.2015.05.020
Novick JM, Trueswell JC, Thompson-Schill SL (2005) Cognitive control and parsing: reexamining the role of Broca’s area in sentence comprehension. Cogn Affect Behav Neurosci 5(3):263–281. https://doi.org/10.3758/CABN.5.3.263
Patel AD (2003) Language, music, syntax and the brain. Nat Neurosci 6(7):674–681. https://doi.org/10.1038/nn1082
Patel AD, Morgan E (2017) Exploring cognitive relations between prediction in language and music. Cogn Sci 41:303–320. https://doi.org/10.1111/cogs.12411
Peretz I, Vuvan D, Lagrois MÉ, Armony JL (2015) Neural overlap in processing music and speech. Philos Trans R Soc B Biol Sci 370(1664):20140090
Perfors A, Tenenbaum JB, Gibson E, Regier T (2010) How recursive is language? A Bayesian exploration. In: van der Hulst H (ed) Recursion and human language. de Gruyter Mouton, Berlin/New York, pp 159–175
Raven J, Raven JC, Court J (1998) Manual for Raven’s progressive matrices and vocabulary scales. Raven Man. https://doi.org/10.1006/cogp.1999.0735
Rogalsky C, Rong F, Saberi K, Hickok G (2011) Functional anatomy of language and music perception: temporal and structural factors investigated using functional magnetic resonance imaging. J Neurosci 31(10):3843–3852. https://doi.org/10.1523/jneurosci.4515-10.2011
Rohrmeier M, Koelsch S (2012) Predictive information processing in music cognition. A critical review. Int J Psychophysiol 83(2):164–175. https://doi.org/10.1016/j.ijpsycho.2011.12.010
Rohrmeier M, Zuidema W, Wiggins GA, Scharff C (2015) Principles of structure building in music, language and animal song. Philos Trans R Soc Lond Ser B, Biol Sci. https://doi.org/10.1098/rstb.2014.0097
Russell L (2018) emmeans: estimated marginal means, aka least-squares means. R package version 1.4.2
Salimpoor VN, Zald DH, Zatorre RJ, Dagher A, McIntosh AR (2015) Predictions and the brain: how musical sounds become rewarding. Trends Cogn Sci 19(2):86–91. https://doi.org/10.1016/j.tics.2014.12.001
Sammler D, Baird A, Valabrègue R, Clément S, Dupont S, Belin P, Samson S (2010) The relationship of lyrics and tunes in the processing of unfamiliar songs: a functional magnetic resonance adaptation study. J Neurosci 30(10):3572–3578. https://doi.org/10.1523/JNEUROSCI.2751-09.2010
Sammler D, Koelsch S, Ball T, Brandt A, Grigutsch M, Huppertz H, Schulze-Bonhage A (2013) Co-localizing linguistic and musical syntax with intracranial EEG. NeuroImage 64:134–146. https://doi.org/10.1016/j.neuroimage.2012.09.035
Schapiro AC, Rogers TT, Cordova NI, Turk-Browne NB, Botvinick MM (2013) Neural representations of events arise from temporal community structure. Nat Neurosci 16(4):486–492. https://doi.org/10.1038/nn.3331
Schlichting ML, Preston AR (2015) Memory integration: neural mechanisms and implications for behavior. Curr Opin Behav Sci 1:1–8. https://doi.org/10.1016/j.cobeha.2014.07.005
Seger CA, Spiering BJ, Sares AG, Quraini SI, Alpeter C, James D, Thaut MH (2013) Corticostriatal contributions to musical expectancy perception. J Cogn Neurosci 25(7):1062–1077. https://doi.org/10.1162/jocn
Seyfarth RM, Cheney D (2014) The evolution of language from social cognition. Curr Opin Neurobiol. https://doi.org/10.1016/j.conb.2014.04.003
Stachenfeld KL, Botvinick MM, Gershman SJ (2017) The hippocampus as a predictive map. Nat Neurosci 20(11):1643–1653. https://doi.org/10.1038/nn.4650
Stewart L, Overath T, Warren JD, Foxton JM, Griffiths TD (2008) fMRI evidence for a cortical hierarchy of pitch pattern processing. PLoS ONE. https://doi.org/10.1371/journal.pone.0001470
Tamir-Ostrover H, Eitan Z (2015) Higher is faster. Music Percept 33(2):179–198. https://doi.org/10.1525/mp.2015.33.2.179
Tillmann B (2012) Music and language perception: expectations, structural integration, and cognitive sequencing. Top Cogn Sci 4(4):568–584. https://doi.org/10.1111/j.1756-8765.2012.01209.x
Udden J, Martins MD, Zuidema W, Fitch WT (2019) Hierarchical structure in sequence processing: how do we measure it and what’s the neural implementation? Top Cogn Sci. https://doi.org/10.1111/tops.12442
Watanabe T, Yagishita S, Kikyo H (2008) Memory of music: roles of right hippocampus and left inferior frontal gyrus. NeuroImage 39(1):483–491. https://doi.org/10.1016/j.neuroimage.2007.08.024
Zaccarella E, Meyer L, Makuuchi M, Friederici AD (2015) Building by syntax: the neural basis of minimal linguistic structures. Cereb Cortex. https://doi.org/10.1093/cercor/bhv234
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
... Interestingly, these were the same conditions that elicited the greatest pleasure ratings in listeners. Other fMRI work has been more equivocal as to whether hippocampus tracks uncertainty in auditory sequences (Tobia et al., 2012), and in one study the hippocampal BOLD signal was reduced in tone sequences in which simple or hierarchical rules concerning pitch and duration were violated compared to when they were met (Martins et al., 2020). Disparate findings may relate to functional heterogeneity of hippocampal fields, position of activity along the long axis, or subtle task differences. ...
Article
The hippocampus has a well-established role in spatial and episodic memory but a broader function has been proposed including aspects of perception and relational processing. Neural bases of sound analysis have been described in the pathway to auditory cortex, but wider networks supporting auditory cognition are still being established. We review what is known about the role of the hippocampus in processing auditory information, and how the hippocampus itself is shaped by sound. In examining imaging, recording, and lesion studies in species from rodents to humans, we uncover a hierarchy of hippocampal responses to sound including during passive exposure, active listening, and the learning of associations between sounds and other stimuli. We describe how the hippocampus' connectivity and computational architecture allow it to track and manipulate auditory information – whether in the form of speech, music, or environmental, emotional, or phantom sounds. Functional and structural correlates of auditory experience are also identified. The extent of auditory-hippocampal interactions is consistent with the view that the hippocampus makes broad contributions to perception and cognition, beyond spatial and episodic memory. More deeply understanding these interactions may unlock applications including entraining hippocampal rhythms to support cognition, and intervening in links between hearing loss and dementia.
... Recent work has used fractal stimuli to explore hierarchical processing in the visual modality (Martins et al., 2015;Martins et al., 2014;, the auditory modality (Martins et al., 2017;Martins et al., 2020), and in the motor domain . In this series of studies, participants were performing a completion task on periodic fractals. ...
Preprint
Full-text available
In this article, we explore the extraction of recursive nested structure in the processing of binary sequences. Our aim was to determine whether the brain learns the higher order regularities of a highly simplified input where only sequential order information marks the hierarchical structure. To this end, we implemented sequence generated by the Fibonacci grammar in a serial reaction time task. This deterministic grammar generates aperiodic but self-similar sequences. The combination of these two properties allowed us to evaluate hierarchical learning while controlling for the use of low-level strategies like detecting recurring patterns. The deterministic aspect of the grammar allowed us to predict precisely which points in the sequence should be subject to anticipation. Results showed that participants' pattern of anticipation could not be accounted for by "flat" statistical learning processes and was consistent with them anticipating upcoming points based on hierarchical assumptions. We also found that participants were sensitive to the structure constituency, suggesting that they organized the signal into embedded constituents. We hypothesized that the participants built this structure by merging recursively deterministic transitions.
... In the latter these were Interestingly, these were the same conditions that elicited the greatest pleasure ratings in listeners. Other fMRI work has been more equivocal as to whether hippocampus tracks uncertainty in auditory sequences (Tobia et al., 2012), and in another study the hippocampal BOLD signal was reduced in tone sequences in which simple or hierarchical rules concerning pitch and duration were violated compared to when they were met (Martins et al., 2020). Disparate findings may relate to functional heterogeneity of hippocampal fields, or position of activity along the long axis. ...
Preprint
Full-text available
The hippocampus has a well-established role in spatial and episodic memory but a broader function has been proposed including aspects of perception and relational processing. Neural bases of sound analysis have been described in the pathway to auditory cortex, but wider networks supporting auditory cognition are still being established. We review what is known about the role of the hippocampus in processing auditory information, and how the hippocampus itself is shaped by sound. In examining imaging, recording, and lesion studies in species from rodents to humans, we uncover a hierarchy of hippocampal responses to sound including during passive exposure, active listening, and the learning of associations between sounds and other stimuli. We describe how the hippocampus' connectivity and computational architecture allow it to track and manipulate auditory information – whether in the form of speech, music, or environmental, emotional, or phantom sounds. Functional and structural correlates of auditory experience are also identified. The extent of auditory-hippocampal interactions is consistent with the view that the hippocampus makes broad contributions to perception and cognition, beyond spatial and episodic memory. More deeply understanding these interactions may unlock applications including entraining hippocampal rhythms to support cognition, and intervening in links between hearing loss and dementia.
Article
Music is used as an important medium for communication in human societies, often to enhance the emotional meaning of narrative scenarios and ritual events. Music has a number of domain-specific tonal devices for doing this, spanning from scale structure to harmonic progressions and beyond. In order to explore the neural basis of tonal processing in music, we carried out an activation likelihood estimation (ALE) meta-analysis of 20 published functional magnetic resonance imaging studies of tonal cognition, with an emphasis on harmony processing. The most concordant areas of activation across these studies occurred at the junction of the inferior frontal gyrus, anterior insula, and orbitofrontal cortex in Brodmann areas 47 and 13 in the right hemisphere. This region is associated not only with emotion in general, but with the conveyance of affective meanings during communication processes, including speech prosody and music.
Article
Several previous authors have proposed a kind of specious or subjective present moment that covers a few seconds of recent information. This article proposes a new hypothesis about the subjective present, renamed the extended present, defined not in terms of time covered but as a thematically connected information structure held in working memory and in transiently accessible form in long-term memory. The three key features of the extended present are that information in it is thematically connected, both internally and to current attended perceptual input, it is organised in a hierarchical structure, and all information in it is marked with temporal information, specifically ordinal and duration information. Temporal boundaries to the information structure are determined by hierarchical structure processing and by limits on processing and storage capacity. Supporting evidence for the importance of hierarchical structure analysis is found in the domains of music perception, speech and language processing, perception and production of goal-directed action, and exact arithmetical calculation. Temporal information marking is also discussed and a possible mechanism for representing ordinal and duration information on the time scale of the extended present is proposed. It is hypothesised that the extended present functions primarily as an informational context for making sense of current perceptual input, and as an enabler for perception and generation of complex structures and operations in language, action, music, exact calculation, and other domains.
Article
Although comparative research has made substantial progress in clarifying the relationship between language and music as neurocognitive systems from both a theoretical and empirical perspective, there is still no consensus about which mechanisms, if any, are shared and how they bring about different neurocognitive systems. In this paper, we tackle these two questions by focusing on hierarchical control as a neurocognitive mechanism underlying syntax in language and music. We put forward the Coordinated Hierarchical Control (CHC) hypothesis: linguistic and musical syntax rely on hierarchical control, but engage this shared mechanism differently depending on the current control demand. While linguistic syntax preferably engages the abstract rule-based control circuit, musical syntax rather employs the coordination of the abstract rule-based and the more concrete motor-based control circuits. We provide evidence for our hypothesis by reviewing neuroimaging as well as neuropsychological studies on linguistic and musical syntax. The CHC hypothesis makes a set of novel testable predictions to guide future work on the relationship between language and music.
Article
In many domains of human cognition, hierarchically structured representations are thought to play a key role. In this paper, we start with some foundational definitions of key phenomena like “sequence” and “hierarchy,” and then outline potential signatures of hierarchical structure that can be observed in behavioral and neuroimaging data. Appropriate behavioral methods include classic ones from psycholinguistics along with some from the more recent artificial grammar learning and sentence processing literature. We then turn to neuroimaging evidence for hierarchical structure with a focus on the functional MRI literature. We conclude that, although a broad consensus exists about a role for a neural circuit incorporating the inferior frontal gyrus, the superior temporal sulcus, and the arcuate fasciculus, considerable uncertainty remains about the precise computational function(s) of this circuitry. An explicit theoretical framework, combined with an empirical approach focusing on distinguishing between plausible alternative hypotheses, will be necessary for further progress.
Article
Generation of hierarchical structures, such as the embedding of subordinate elements into larger structures, is a core feature of human cognition. Processing of hierarchies is thought to rely on lateral prefrontal cortex (PFC). However, the neural underpinnings supporting active generation of new hierarchical levels remain poorly understood. Here, we created a new motor paradigm to isolate this active generative process by means of fMRI. Participants planned and executed identical movement sequences by using different rules: a Recursive hierarchical embedding rule, generating new hierarchical levels; an Iterative rule linearly adding items to existing hierarchical levels, without generating new levels; and a Repetition condition tapping into short term memory, without a transformation rule. We found that planning involving generation of new hierarchical levels (Recursive condition vs. both Iterative and Repetition) activated a bilateral motor imagery network, including cortical and subcortical structures. No evidence was found for lateral PFC involvement in the generation of new hierarchical levels. Activity in basal ganglia persisted through execution of the motor sequences in the contrast Recursive versus Iteration, but also Repetition versus Iteration, suggesting a role of these structures in motor short term memory. These results showed that the motor network is involved in the generation of new hierarchical levels during motor sequence planning, while lateral PFC activity was neither robust nor specific. We hypothesize that lateral PFC might be important to parse hierarchical sequences in a multi‐domain fashion but not to generate new hierarchical levels.
Article
A cognitive map has long been the dominant metaphor for hippocampal function, embracing the idea that place cells encode a geometric representation of space. However, evidence for predictive coding, reward sensitivity and policy dependence in place cells suggests that the representation is not purely spatial. We approach this puzzle from a reinforcement learning perspective: what kind of spatial representation is most useful for maximizing future reward? We show that the answer takes the form of a predictive representation. This representation captures many aspects of place cell responses that fall outside the traditional view of a cognitive map. Furthermore, we argue that entorhinal grid cells encode a low-dimensionality basis set for the predictive representation, useful for suppressing noise in predictions and extracting multiscale structure for hierarchical planning.
Article
The human ability to process hierarchical structures has been a longstanding research topic. However, the nature of the cognitive machinery underlying this faculty remains controversial. Recursion, the ability to embed structures within structures of the same kind, has been proposed as a key component of our ability to parse and generate complex hierarchies. Here, we investigated the cognitive representation of both recursive and iterative processes in the auditory domain. The experiment used a two-alternative forced-choice paradigm: participants were exposed to three-step processes in which pure-tone sequences were built either through recursive or iterative processes, and had to choose the correct completion. Foils were constructed according to generative processes that did not match the previous steps. Both musicians and non-musicians were able to represent recursion in the auditory domain, although musicians performed better. We also observed that general ‘musical’ aptitudes played a role in both recursion and iteration, although the influence of musical training was somehow independent from melodic memory. Moreover, unlike iteration, recursion in audition was well correlated with its non-auditory (recursive) analogues in the visual and action sequencing domains. These results suggest that the cognitive machinery involved in establishing recursive representations is domain-general, even though this machinery requires access to information resulting from domain-specific processes.
Article
The hippocampus has been implicated in integrating information across separate events in support of mnemonic generalizations. These generalizations may be underpinned by processes at both encoding (linking similar information across events) and retrieval (“on-the-fly” generalization). However, the relative contribution of the hippocampus to encoding- and retrieval-based generalizations is poorly understood. Using fMRI in humans, we investigated the hippocampal role in gradually learning a set of spatial discriminations and subsequently generalizing them in an acquired equivalence task. We found a highly significant correlation between individuals' performance on a generalization test and hippocampal activity during the test, providing evidence that hippocampal processes support on-the-fly generalizations at retrieval. Within the same hippocampal region there was also a correlation between activity during the final stage of learning (when all associations had been learnt but no generalization was required) and subsequent generalization performance. We suggest that the hippocampus spontaneously retrieves prior events that share overlapping features with the current event. This process may also support the creation of generalized representations during encoding. These findings are supportive of the view that the hippocampus contributes to both encoding- and retrieval-based generalization via the same basic mechanism; retrieval of similar events sharing common features.
Article
The hippocampal-entorhinal system encodes a map of space that guides spatial navigation. Goal-directed behaviour outside of spatial navigation similarly requires a representation of abstract forms of relational knowledge. This information relies on the same neural system, but it is not known whether the organisational principles governing continuous maps may extend to the implicit encoding of discrete, non-spatial graphs. Here, we show that the human hippocampal-entorhinal system can represent relationships between objects using a metric that depends on associative strength. We reconstruct a map-like knowledge structure directly from a hippocampal-entorhinal functional magnetic resonance imaging adaptation signal in a situation where relationships are non-spatial rather than spatial, discrete rather than continuous, and unavailable to conscious awareness. Notably, the measure that best predicted a behavioural signature of implicit knowledge and blood oxygen level-dependent adaptation was a weighted sum of future states, akin to the successor representation that has been proposed to account for place and grid-cell firing patterns.
Article
Schemas are superordinate knowledge structures that reflect abstracted commonalities across multiple experiences, exerting powerful influences over how events are perceived, interpreted, and remembered. Activated schema templates modulate early perceptual processing, as they get populated with specific informational instances (schema instantiation). Instantiated schemas, in turn, can enhance or distort mnemonic processing from the outset (at encoding), impact offline memory transformation and accelerate neocortical integration. Recent studies demonstrate distinctive neurobiological processes underlying schema-related learning. Interactions between the ventromedial prefrontal cortex (vmPFC), hippocampus, angular gyrus (AG), and unimodal associative cortices support context-relevant schema instantiation and schema mnemonic effects. The vmPFC and hippocampus may compete (as suggested by some models) or synchronize (as suggested by others) to optimize schema-related learning depending on the specific operationalization of schema memory. This highlights the need for more precise definitions of memory schemas.
Article
Learning is facilitated when information can be incorporated into an already learned set of rules or 'mental schema'. The location of a new restaurant, for example, is learned more easily if the neighborhood's general layout is already known. This type of information is processed by the hippocampus and stored as a schema in the cortex, but it is not known whether the hippocampus can also map new stimuli to cortical schemata that are hippocampus-independent, such as odour classification. Using a hippocampus-independent odour-rule task we found that animals without a functional hippocampus learnt which odours did not fit the rule faster than sham animals, which persistently applied the rule to all odours. Conversely, when non-fitting odours were linked to a new rule sham animals were faster to link these odours to the new rule. The hippocampus, thus, regulates the association of stimuli with existing schemata even when the schemata are hippocampus-independent.
Article
The online processing of both music and language involves making predictions about upcoming material, but the relationship between prediction in these two domains is not well understood. Electrophysiological methods for studying individual differences in prediction in language processing have opened the door to new questions. Specifically, we ask whether individuals with musical training predict upcoming linguistic material more strongly and/or more accurately than non-musicians. We propose two reasons why prediction in these two domains might be linked: (a) Musicians may have greater verbal short-term/working memory; (b) music may specifically reward predictions based on hierarchical structure. We provide suggestions as to how to expand upon recent work on individual differences in language processing to test these hypotheses.
Article
The ability to predict upcoming structured events based on long-term knowledge and contextual priors is a fundamental principle of human cognition. Tonal music triggers predictive processes based on structural properties of harmony, i.e., regularities defining the arrangement of chords into well-formed musical sequences. While the neural architecture of structure-based predictions during music perception is well described, little is known about the neural networks for analogous predictions in musical actions and how they relate to auditory perception. To fill this gap, expert pianists were presented with harmonically congruent or incongruent chord progressions, either as musical actions (photos of a hand playing chords) that they were required to watch and imitate without sound, or in an auditory format that they listened to without playing. By combining task-based functional magnetic resonance imaging (fMRI) with functional connectivity at rest, we identified distinct sub-regions in right inferior frontal gyrus (rIFG) interconnected with parietal and temporal areas for processing action and audio sequences, respectively. We argue that the differential contribution of parietal and temporal areas is tied to motoric and auditory long-term representations of harmonic regularities that dynamically interact with computations in rIFG. Parsing of the structural dependencies in rIFG is co-determined by both stimulus and task demands. In line with contemporary models of prefrontal cortex organization and dual stream models of visual-spatial and auditory processing, we show that the processing of musical harmony is a network capacity with dissociated dorsal and ventral motor and auditory circuits, which both provide the infrastructure for predictive mechanisms optimising action and perception performance.