Temporal Processing in Audition: Insights from Music
Vani G. Rajendran (a), Sundeep Teki (a,†) and Jan W. H. Schnupp (b,†,*)

(a) Auditory Neuroscience Group, University of Oxford, Department of Physiology, Anatomy, and Genetics, Oxford, UK
(b) City University of Hong Kong, Department of Biomedical Sciences, 31 To Yuen Street, Kowloon Tong, Hong Kong

* Corresponding author. E-mail address: wschnupp@cityu.edu.hk (J. W. H. Schnupp).
† Sundeep Teki and Jan W. H. Schnupp contributed equally as last authors.
Abstract—
Music is a curious example of a temporally patterned acoustic stimulus, and a compelling pan-cultural
phenomenon. This review strives to bring some insights from decades of music psychology and sensorimotor
synchronization (SMS) literature into the mainstream auditory domain, arguing that musical rhythm perception
is shaped in important ways by temporal processing mechanisms in the brain. The feature that unites these dis-
parate disciplines is an appreciation of the central importance of timing, sequencing, and anticipation. Perception
of musical rhythms relies on an ability to form temporal predictions, a general feature of temporal processing that
is equally relevant to auditory scene analysis, pattern detection, and speech perception. By bringing together
findings from the music and auditory literature, we hope to inspire researchers to look beyond the conventions
of their respective fields and consider the cross-disciplinary implications of studying auditory temporal sequence
processing. We begin by highlighting music as an interesting sound stimulus that may provide clues to how
temporal patterning in sound drives perception. Next, we review the SMS literature and discuss possible neural
substrates for the perception of, and synchronization to, musical beat. We then move away from music to explore
the perceptual effects of rhythmic timing in pattern detection, auditory scene analysis, and speech perception.
Finally, we review the neurophysiology of general timing processes that may underlie aspects of the perception
of rhythmic patterns. We conclude with a brief summary and outlook for future research.
This article is part of a Special Issue entitled: Sequence Processing. © 2017 The Authors. Published by Elsevier Ltd on behalf of IBRO. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Key words: music psychology, sensorimotor synchronization, beat perception, rhythm perception, auditory scene analysis, temporal prediction.
Abbreviations: BI, beat induction; EEG, electroencephalography; ERPs, event-related potentials; IOI, inter-onset interval; NMA, negative mean asynchrony; SMS, sensorimotor synchronization; SSA, stimulus-specific adaptation.
WHAT MUSIC PSYCHOLOGY REVEALS ABOUT
THE NATURAL BOUNDS OF HUMAN
TEMPORAL PROCESSING
Rhythm is an aspect of music that occurs on a medium
temporal scale (hundreds of milliseconds to one or two
seconds), longer than that of pitch (up to tens of
milliseconds), but shorter than that of global musical
form and structure (several seconds to minutes, e.g.
phrases, sections, movements). Crucially, it is at the
temporal scale of rhythm that a number of overt motor
processes in humans tend to occur, such as the swing
of the arms and legs during walking or the inhaling and
exhaling of air during breathing. Dance, for example, is
movement to the rhythm of music. In the Western music
tradition, movements such as dance are typically
synchronized to a periodic pulse, or beat. It is important
to highlight that pulse and beat are not physical
properties of the music itself, but are perceptual
phenomena that arise from music through beat
induction (BI). BI refers to our ability to extract a
periodic pulse from music and is widely considered a
cognitive skill, though its species-specificity and domain-
specificity are topics of current debate (Honing, 2012).
The neurophysiology underlying beat perception will later
be discussed at length, but a brief review of music psy-
chology research into perceptual aspects of rhythmic tim-
ing will first offer a number of practical observations from
which to embark on this investigation.
Timescales
Studies into sensorimotor synchronization (SMS) tend to
employ simple movements such as tapping a finger as a
readout of the perceived beat. These studies find that
beat is generally perceived between 0.5 and 4 Hz,
corresponding to time intervals of 250 ms to 2 s, a
range beyond which precise coordination of motor
movements becomes difficult (Repp, 2005; McAuley
et al., 2006; Repp and Su, 2013). Even within this range,
perception of time differs between shorter and longer time intervals. When asked to judge the duration of time intervals, human listeners show a systematic tendency to overestimate shorter time intervals (roughly 250–400 ms) and underestimate longer ones (600 ms to 2 s). The transition point in between, measured by various researchers to lie between 400 and 600 ms, is termed the indifference interval and also corresponds to the rate at which people spontaneously tap (Fraisse et al., 1958; Fraisse, 1963, 1978, 1982; Clarke, 1999).
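These regularities are easy to state quantitatively. The short sketch below (in Python; the function names are ours, chosen for illustration) converts tempo to inter-onset interval and labels intervals against the 400–600-ms indifference interval just described:

```python
def bpm_to_ioi_ms(bpm: float) -> float:
    """Inter-onset interval (ms) for a tempo given in beats per minute."""
    return 60_000.0 / bpm

def classify_interval(ioi_ms: float) -> str:
    """Label an interval relative to the ~400-600-ms indifference interval."""
    if ioi_ms < 400:
        return "short interval (tends to be overestimated)"
    if ioi_ms > 600:
        return "long interval (tends to be underestimated)"
    return "indifference interval (near the spontaneous tapping rate)"

for bpm in (75, 120, 150):
    ioi = bpm_to_ioi_ms(bpm)
    print(f"{bpm} BPM -> {ioi:.0f} ms: {classify_interval(ioi)}")
```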
In the context of rhythm perception, it is also the
boundary between temps courts and temps longs
(Clarke, 1999). When human subjects are asked to tap
rhythmically, they almost invariably employ 1:1 or 1:2 ratios between the time intervals of successive taps. A
ratio of 1:2 refers to a tapping pattern of long and short
intervals where the short intervals are precisely half the
duration of the longer ones. This alludes to the theory that
temps longs are intervals during which a listener is aware
of the passage of time, whereas temps courts do not
evoke a sense of time passage by themselves, but listen-
ers are aware that a certain number of them grouped
together make up a longer interval. When 1:2 tapping ratios are observed, they almost always span the indifference interval,
with the longer interval belonging to temps longs and
the shorter one to temps courts (see Fig. 1). A preference
for time intervals with integer ratios also shapes the way
rhythmic patterns are perceived (Jacoby and
McDermott, 2017). Compared with intervals with noninte-
ger ratios, intervals with integer ratios are more accurately
reproduced by listeners (Essens and Povel, 1985) and
show a distinct pattern of neural activity (Sakai et al., 1999; see also the later section entitled Neurophysiology of beat perception). Interestingly, while a preference for integer ratios spans different cultures, the specific ratios preferred by listeners are primarily determined by their music listening experience and are not strongly
affected by musical expertise (Jacoby and McDermott,
2017).
Fig. 1. (A) The beat perceived depends on the tempo at which a musical rhythm is played. In this simple, recognizable example rhythm, notes represent sound events; notes with a single stem are quarter notes, notes with joined stems are eighth notes, and the remaining symbol is a quarter rest (silence). The basic unit of time here is the quarter note; a quarter rest is the same duration as a quarter note, and each eighth note is half the duration of the quarter note. Tempo is conventionally specified in beats per minute, so for the slow tempo (in red), there would be 75 quarter notes per minute, and each quarter note is therefore 800 ms in duration. The fast tempo (in blue) is twice the speed of the slow tempo. In both cases, the beat may be comfortably perceived at 800-ms intervals or 1.25 Hz (filled circles), but depending on the tempo this may coincide with different events in the music. The alternation of strong (solid lines) and weak beats (dotted lines) is illustrated for each tempo. Syncopation (green triangle), or when a beat is felt where there is silence, is very common in music. (B) This schematic illustrates the time scales over which common auditory events unfold. Time is on a log scale from small intervals (fast rates) to large intervals (slow rates), with values shown in milliseconds and in Hz. The indifference interval is marked in purple; shorter intervals are temps courts, longer intervals are temps longs.
Beat – a perceptual accent
The timescales are one aspect of what determines where
a musical beat might be felt, but not all sound events in
music are equally likely to induce a beat percept.
Certain events in music have been described as giving
rise to perceptual accents, which, together with the
temporal constraints described earlier, form the basis of
where the beat is felt.
Perceptual accents may be felt at points that differ in
loudness or in frequency relative to surrounding events.
However, perceptual accents can also arise purely
through temporal context. Povel and Essens (1985) proposed a theoretical framework for metrical complexity based on empirical observations. They
posit that (1) an isolated acoustic event will be perceived
as accented, (2) the second of a set of two similar or iden-
tical acoustic events played in sequence will be perceived
as accented, and (3) the first and last of a run of three or more similar events in a sequence will be perceptually accented. Based on the location of perceptual accents
within a rhythm (which themselves may not be periodic),
the period and phase of a periodic pulse can be
determined.
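These three rules are concrete enough to state as an algorithm. The sketch below (Python; the function name and binary-grid representation are ours, and the pattern is treated as non-cyclic for simplicity, whereas Povel and Essens analyzed cyclic patterns) marks accents on a rhythm grid:

```python
def perceptual_accents(grid):
    """Mark accents on a binary rhythm grid (1 = sound event, 0 = silence)
    following the three accent rules paraphrased above."""
    accents = [0] * len(grid)
    i = 0
    while i < len(grid):
        if grid[i] == 0:
            i += 1
            continue
        j = i
        while j < len(grid) and grid[j] == 1:
            j += 1                     # j now marks the end of this run of events
        run_length = j - i
        if run_length == 1:
            accents[i] = 1             # rule 1: an isolated event is accented
        elif run_length == 2:
            accents[i + 1] = 1         # rule 2: the second of a pair is accented
        else:
            accents[i] = 1             # rule 3: first and last of three or more
            accents[j - 1] = 1
        i = j
    return accents

# The pattern x.xx.xxx on an eighth-note grid:
print(perceptual_accents([1, 0, 1, 1, 0, 1, 1, 1]))  # -> [1, 0, 0, 1, 0, 1, 0, 1]
```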
Not all beats are created equal, nor is there always an
accent: subjective rhythmization
The basic temporal structure of a piece of music can be
described by its meter, or its recurring pattern of strong
beats and weak beats. Again, ‘strong’ and ‘weak’ in this
context are perceptual notions, much akin to identical
ticks of a clock being instinctively perceived as tick-tock-
tick-tock (Bolton, 1894; van Noorden and Moelants, 1999; Brochard et al., 2003; Bååth, 2015). This tick-tock of a clock could be described as having a binary meter, or a beat pattern based on the number two (most commonly two or four beats in a bar), with the beat pattern strong-weak-strong-weak. Ternary meters, or
bars based on the number three, have a pattern of
strong-weak-weak-strong-weak-weak, the most common
example being a waltz. Other more complex meters, for
example based on 5 or 7 beats in a bar are also common
in Western music, though binary and ternary meters are
more often studied because they are generally more
effective in inducing a clear beat percept. The preference
or natural acceptance of binary meter could be due to a
likeness of such meters to common rhythmic motor pat-
terns such as breathing or walking.
To summarize
Within the range of frequencies that a periodic pulse can
typically be perceived, there is a further distinction
between longer timescales across which the passage of
time is noticeable, and shorter timescales of which
several together are perceived to fit into a longer
timescale. The boundary between the two is the
indifference interval, which lies somewhere between 400
and 600 ms. This is where temporal perception is most
accurate in humans (Fraisse, 1978), and incidentally also
corresponds to a comfortable walking pace (Styns et al.,
2007). Beats themselves arise as a result of the combina-
tion of perceptual accents and the constraint of a periodic
pulse within the range of perceivable beat frequencies. A repeating pattern of strong and weak beats groups together to form the musical meter of a piece, and some meters (binary and ternary) are more easily interpreted
generally, perhaps due to their semblance to binary motor
patterns or to the harmonic series on a fundamental
frequency.
THE PSYCHOACOUSTICS OF BEAT
PERCEPTION: SENSORIMOTOR
SYNCHRONIZATION
With beat and meter defined, we are now equipped to
explore how we synchronize to beat. When we hear a
beat in music, we almost instinctively want to move
with it, and it has been shown that listeners often
cannot maintain movements that are out of sync
(Repp, 2002a). The synchronization of our movements to
an external rhythm is known as sensorimotor synchroniza-
tion (SMS). SMS has been studied extensively
(see Repp, 2005 and Repp and Su, 2013 for reviews), and
we highlight a few observations from SMS studies that may
be of particular relevance to a discussion of the neurophys-
iological processes that underlie rhythm perception.
When tapping along, we are usually early
Negative mean asynchrony (NMA) is a testament to the
predictive nature of synchronizing a motor action with an
expected stimulus. NMA refers to the observation that
listeners, when asked to tap along with an isochronous
pacing stimulus such as a metronome, tend to
anticipate stimulus onsets with their taps by tens of
milliseconds, rather than tapping with a distribution that
is symmetric around sound onsets (sometimes early,
sometimes late). Interestingly, listeners are often
unaware of their own NMA, suggesting a general
incongruence between objective and subjective
synchrony. Musicians tend to show less NMA than
nonmusicians, and the neurophysiological differences
between the two groups may therefore shed some
insight into the interaction between the sensory, motor,
and cognitive processes involved. A final observation is
that NMA decreases as the tempo of the pacing
stimulus increases, which may allude to the tendency to
overestimate short time intervals and underestimate
longer ones described in the previous section on
Timescales. For a more comprehensive review of NMA,
see (Aschersleben, 2002).
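As a toy illustration of how NMA is quantified: asynchrony is conventionally computed as tap time minus stimulus onset time, so anticipatory tapping yields a negative mean. The numbers below are fabricated for illustration only:

```python
import numpy as np

onsets = np.arange(10) * 0.6                    # metronome clicks every 600 ms
rng = np.random.default_rng(0)
taps = onsets - 0.03 + rng.normal(0, 0.01, 10)  # taps ~30 ms early on average

asynchronies = taps - onsets                    # negative values = anticipation
print(f"mean asynchrony: {asynchronies.mean() * 1000:.1f} ms")
```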
Beat period and phase may have distinct underlying
representations
A number of intriguing insights into SMS have also been
uncovered through studies that systematically perturb
the pacing stimulus, for example by introducing a phase
offset, tempo change, or a sequence of distractors.
Overwhelmingly, the evidence points to an interesting
behavioral dissociation between phase correction and
period correction (Repp, 2005). Phase correction in an
isochronous sound sequence refers to a subtle adjust-
ment of tapping so that it returns to synchrony following
an unexpected inter-onset interval (IOI) that is abnormally
short or long, which would result in an abrupt phase shift
in the sequence that is either temporary or persistent
(Anomaly and Phase shift in Fig. 2, respectively). Period
correction refers to the adjustment of taps to a sudden
tempo change, or an abruptly shorter or longer IOI
(Tempo change in Fig. 2).
The phase correction mechanism appears to be
automatic; the timing of the tap subsequent to the
perturbation shifts according to whether the preceding
IOI was shortened or lengthened, even when the phase
offset is imperceptible to listeners (Repp, 2002b). Simi-
larly, shifting a single tone in an isochronous stimulus
such that it results in a shorter IOI on one side of it and
a longer IOI on the other induces an involuntary shift in
tap times after the perturbation, even when participants
were told to ignore the perturbation. If a distractor
sequence of isochronous tones is introduced, taps shift
toward it, and interestingly this effect appears to be insen-
sitive to the pitch difference between the tones of the pac-
ing and distractor sequence. In this case, temporal
coherence seems to be key: if a target and distractor tone
are within 120 ms of each other, tapping behavior would
suggest that they are treated as a joint referent. In con-
trast, period correction to a step change in tempo appears
to require the change in tempo to be perceptible (Repp
and Keller, 2004). Listeners’ ability to completely ignore
a tempo change and continue tapping at the original
tempo without showing any period correction is further
evidence that period correction requires cognitive control,
in contrast to phase correction, which in the same task
proved impossible for participants to suppress. Under
the looser constraint of self-paced movements, there
does appear to be a natural tendency to synchronize
movements to the period and phase of a musical beat
(Peckel et al., 2014).
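This behavioral dissociation can be captured in a minimal two-parameter error-correction model of the general kind reviewed by Repp (2005); the sketch below is our simplification, not a published parameterization. The parameter alpha scales phase correction and beta scales period correction; setting beta to zero mimics a listener who deliberately ignores a tempo change, whereas real listeners cannot suppress the alpha-driven phase correction in the same way:

```python
import numpy as np

def simulate_tapping(stim_onsets, alpha=0.5, beta=0.1, period0=0.6):
    """Two-process error correction: each tap is shifted by the last phase
    error (alpha), and the internal period drifts toward the most recently
    heard IOI (beta)."""
    taps = [stim_onsets[0]]                       # assume the first tap aligns
    period = period0
    for n in range(1, len(stim_onsets)):
        asyn = taps[-1] - stim_onsets[n - 1]      # phase error at the last tap
        taps.append(taps[-1] + period - alpha * asyn)
        ioi = stim_onsets[n] - stim_onsets[n - 1]
        period += beta * (ioi - period)           # period correction
    return np.array(taps)

# Tempo change: the IOI steps from 600 ms to 500 ms halfway through.
onsets = np.concatenate([np.arange(10) * 0.6,
                         0.6 * 9 + np.arange(1, 11) * 0.5])
asyn_ms = (simulate_tapping(onsets) - onsets) * 1000
print(np.round(asyn_ms, 1))   # the error after the step shrinks away
```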
How fast we can tap along depends on what we are
tapping to
Depending on the nature of the task, different studies report somewhat different ranges within which beat perception and SMS can occur. In truth, the context-dependent nature of SMS is itself a reflection of the different sensory and biomechanical constraints. At the
slow extreme, IOIs longer than 1.8 s result in inaccurate
synchronization where taps begin to lag the pacing
stimulus. At the fast extreme, finger tapping with an
isochronous pacing stimulus can be done at a rate of up
to 5–7 taps per second. However, if the task is 1:n
synchronization, the IOIs in the pacing signal can be as
short as 100–120 ms for trained musicians, which
suggests that audiomotor processing can cope with
these fast rates. This so-called subdivision benefit too
depends on the exact subdivision required. 1:2, 1:3, 1:4,
and 1:8 tapping can be done at lower IOIs than 1:5 or
1:7, with 1:6 and 1:9 tapping falling somewhere in
between. This suggests a certain level of automaticity to
subdivision by 2, 3, 4, and 8, while the cognitive
demands of counting groups of 5 interferes with sensory
processing. A similar effect is observed when listeners
tap an isochronous beat in non-isochronous rhythmic
patterns. Rhythmic patterns differ in their complexity,
and while very complex rhythms are difficult to
synchronize to (Povel and Essens, 1985), rhythmic pat-
terns of medium complexity are what elicit the greatest
desire from listeners to move (Witek et al., 2014). This
may relate to beat salience, which has been shown to cor-
relate with listeners’ desire to move (Madison et al.,
2011). Rhythmic complexity and the strength of the beat
percept also influence the precision of temporal judg-
ments (Grube and Griffiths, 2009) and may be due to dif-
ferences in neural representation of metrically simple,
complex, and non-metrical sound patterns (Sakai et al.,
1999; Vuust and Witek, 2014).
Fig. 2. Illustration of a selection of perturbations used to study period and phase correction in sensorimotor synchronization. The x-axis represents
time, and here a temporal grid representing 600-ms intervals is marked by the vertical dotted lines. Circles represent clicks to which a listener would
align their taps, and blue circles mark sounds whose timing would be a departure from the isochronous condition where there is no perturbation. In
the Isochronous condition (top), a click is played every 600 ms. In Anomaly, a single click in the sequence is manipulated such that the IOIs that
flank it are too long and too short by 100 ms, allowing the remainder of the sequence to remain unchanged. In Phase Shift, a single IOI is lengthened
by 100 ms but this time is not gained back, resulting in a phase shift of 100 ms that persists for all remaining clicks, even though their IOI remains
600 ms. In Tempo Change, the IOI changes abruptly from 600 ms to 500 ms. This would be perceived as a faster tempo and would require an adjustment in the period of taps.
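For concreteness, the perturbed click sequences of Fig. 2 can be generated in a few lines. This is a sketch with our own function names, using the 600-ms grid and 100-ms shifts given in the caption:

```python
import numpy as np

def isochronous(n=12, ioi=0.6):
    """Click times for an unperturbed sequence."""
    return np.arange(n) * ioi

def anomaly(n=12, ioi=0.6, k=6, shift=0.1):
    """One click displaced, lengthening one flanking IOI and shortening
    the other; the rest of the grid is unchanged."""
    t = isochronous(n, ioi)
    t[k] += shift
    return t

def phase_shift(n=12, ioi=0.6, k=6, shift=0.1):
    """One IOI lengthened; the delay persists for all later clicks."""
    t = isochronous(n, ioi)
    t[k:] += shift
    return t

def tempo_change(n=12, ioi1=0.6, ioi2=0.5, k=6):
    """The IOI steps abruptly from ioi1 to ioi2 at click k."""
    t = isochronous(k, ioi1)
    return np.concatenate([t, t[-1] + np.arange(1, n - k + 1) * ioi2])

print(np.round(phase_shift(), 2))
```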
To summarize
Perception appears to be based on a judgment of
intervals, whereas action appears to be the result of a
joint computation based on stimulus onsets and ongoing
taps. Beat has both a period and a phase, and
perturbation studies suggest that dissociable processes
underlie adjustment of each. Specifically, phase
correction appears to be automatic and involuntary,
while period correction requires cognizance of a tempo
change and can be suppressed at will. The temporal
limits of synchronization ability are also context-
dependent and are a result of biomechanical, sensory,
and cognitive constraints. These factors are also at play
in the context of more complicated rhythmic patterns
and real music, where the temporal structure of the
sound affects listeners’ ability and desire to synchronize.
NEUROPHYSIOLOGY OF BEAT PERCEPTION
A number of electrophysiological studies in humans have
attempted to identify the neural correlate of the beat
percept. Neural signatures of beat perception have been
identified through direct and indirect means and involve
distributed cortical and subcortical networks (Teki et al.,
2012). Comparisons with studies in newborn humans
(Winkler et al., 2009b) and nonhuman species would sug-
gest that some aspects of rhythm perception may be
innate to humans and to some nonhuman species,
whereas other aspects may be unique to humans.
Strong beats differ physiologically from weak beats
As described earlier, subjective rhythmization can
generate a metrical percept of alternating strong and
weak beats even in an isochronous sequence of
identical sounds. This paradigm arguably would allow
for the dissociation between the cognitive and the
sensory aspects of beat perception in the context of
identical isochronous sounds. Electroencephalography
(EEG) studies that investigate the neural correlates of
subjective accenting do so either directly or indirectly.
Indirect methods involve the measurement of event-related potentials (ERPs) evoked by rare "deviant" events (e.g. an omission or change in loudness) in a series of "standard" or expected sounds. Differences in the ERP
to perturbations coinciding with strong and weak beats
may therefore signify neurophysiological differences in
processing that result from subjective accenting. Early
components of the ERP, such as the mismatch negativity (MMN), are considered to be pre-attentive, in contrast to later components (300–600 ms post-stimulus onset), which are presumed to reflect cognitive mechanisms. Though subjective accenting is cognitive
by definition, the setting of temporal expectations may
influence the processing of forthcoming sounds in a
predictive manner, and indeed both early and late ERP
differences have been found between deviants at strong
and weak beat positions (Brochard et al., 2003;
Abecasis, 2005; Geiser et al., 2010; Schaefer et al.,
2010; Bouwer et al., 2014;Honing et al., 2014).
The more direct approach compares sound-evoked
responses at strong and weak positions in rhythmic
sequences. Here too, strong beats evoke higher source
current activity than weak beats in temporal and frontal
areas, despite sounds being acoustically identical (Todd
and Lee, 2015). Similarly, a target sound played over a
background of pop music evokes stronger cortical and
brainstem responses if it was presented on the beat,
rather than shifted off the beat by one quarter of the inter-beat interval (Tierney and Kraus, 2013). Altogether, these event-based studies suggest that metrically strong positions
are accompanied by larger source currents than metri-
cally weak positions, and that these differences may be
pre-attentive. It is worth noting, however, that by design
these studies look at differences in predictions of not only
‘‘when” an auditory event is expected, but also ‘‘what” that
auditory event should be (Teki and Kononowicz, 2016).
Behavioral evidence suggests that these two types of pre-
dictions may have distinct neural substrates (Morillon
et al., 2016; Rajendran and Teki, 2016), and it is therefore
not yet possible to say whether pre-attentive responses
are a result of temporal expectation alone or a combina-
tion of expectations of ‘‘what” and ‘‘when” (Arnal, 2012;
Arnal and Giraud, 2012; Schwartze et al., 2013).
Entrainment of oscillatory activity to musical beat
In addition to event-based descriptions of the beat
percept, cortical oscillations have also been shown to
reflect metrical structure. This is noteworthy because it
suggests that neural oscillations, in addition to
entraining to the rate of individual events in a rhythmic
sequence, are also able to entrain to higher-level
temporal regularities, but the precise mechanism behind
this is still unknown. Modulation of auditory cortical
activity in the beta band has been shown to track the
clicks of a metronome, while gamma oscillations appear
to encode anticipated stimulus timing as evidenced by a
peak in gamma activity even in the absence of a click
(Fujioka et al., 2009). Beta oscillations have also been
demonstrated to encode beat and meter imagery
(Iversen et al., 2009; Fujioka et al., 2015), and the dynam-
ics of induced beta oscillatory activity both in humans
(Teki, 2014) and in nonhuman primates (Bartolo et al.,
2014; Bartolo and Merchant, 2015) (see the later section
on Beat processing in nonhuman species), have been
shown to vary according to the temporal regularity of
sound sequences. In addition to beta, gamma band oscil-
lations also appear to encode beat and meter (Snyder and
Large, 2005; Zanto et al., 2006), and entrainment in the
low-frequency delta-theta band (<8 Hz) has also been
shown to correlate with years of musical training
(Doelling and Poeppel, 2015). Low-frequency entrain-
ment to the beat has also been observed in the bulk elec-
troencephalogram signal (Nozaradan et al., 2011; Henry
et al., 2014; see Zhou et al., 2016 for a guide on the inter-
pretation of low-frequency components in the Fourier
spectrum). A hierarchical organization of oscillatory activ-
ity in the auditory cortex is thought to facilitate temporal
processing of auditory stimuli and coordinate activity
between sensory and other brain areas (Lakatos, 2005).
Cortical oscillations have furthermore been hypothesized
to provide a mechanism for attentional selection and may
be entrained by rhythmic auditory stimuli (Lakatos et al.,
2008; Schroeder and Lakatos, 2009; Gomez-Ramirez
et al., 2011; Lakatos et al., 2013).
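The logic of the frequency-tagging analyses cited above (e.g. Nozaradan et al., 2011) can be sketched simply: a sustained neural response that tracks the beat shows up as a spectral peak at the beat frequency (and its harmonics) in the Fourier transform of the recording. The signal below is synthetic; this illustrates the analysis logic only, not a reanalysis of published data:

```python
import numpy as np

fs = 250.0                                    # sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)                  # 60 s of simulated "EEG"
beat_f = 2.4                                  # beat frequency (2.4 Hz ~ 144 BPM)
rng = np.random.default_rng(0)
eeg = 0.5 * np.sin(2 * np.pi * beat_f * t) + rng.standard_normal(t.size)

spectrum = np.abs(np.fft.rfft(eeg)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
band = freqs > 0.5                            # ignore slow drift below 0.5 Hz
peak = freqs[band][np.argmax(spectrum[band])]
print(f"largest spectral peak above 0.5 Hz: {peak:.2f} Hz")   # ~2.40 Hz
```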
Brain areas involved in beat perception
In addition to the auditory cortex, musical rhythms have
been shown to engage a number of distributed brain
areas, including several that would traditionally be
considered part of the brain’s motor system, and hence
might not immediately be thought of as playing a key
role in beat perception. These include the basal ganglia,
supplementary motor area, striatum, cerebellum,
sensorimotor cortex, and premotor cortex (Parsons,
2001; Grahn and Brett, 2007; Zatorre et al., 2007; Chen
et al., 2008; Grahn, 2009; Teki et al., 2011). Engagement
of motor-related areas appears to be automatic since it is
observed consistently even when listeners are instructed
not to make overt movements (Chen et al., 2008). Activa-
tion in auditory and motor areas furthermore correlates
with individual differences in beat perception (Grahn and
McAuley, 2009).
The activation of brain areas during beat perception
depends on several factors including the duration of
intervals (Lewis and Miall, 2003), temporal context (Teki
et al., 2011), and task demands (Merchant et al., 2013).
The core timing areas of the brain, specifically the stria-
tum and the cerebellum (Ivry and Schlerf, 2008) are acti-
vated in perceptual timing depending on the temporal
regularity of the sequences. For isochronous sequences,
where a clear beat can be perceived, timing relies more
on a network involving the striatum, while for jittered
sequences, where the percept of a beat is negligible
and intervals are encoded in an absolute manner, timing
relies more on an olivocerebellar network (Teki et al., 2011, 2012). Examination of individuals who exhibit "beat deafness" (Phillips-Silver et al., 2011), a rare condition that is associated with poor motor synchronization and/or impoverished beat perception (Sowiński and Dalla Bella, 2013), provides further evidence that beat perception may recruit distinct circuits depending on the implicit/explicit timing aspect of the task (Bégel et al., 2017). The
dissociation of striatal and cerebellar responses for
beat-based versus duration-based sequences has
recently been observed to hold not only for perception
but also for working memory for single time intervals in
sequences with different rhythmic structures (Teki and Griffiths, 2014, 2016).
Beat perception itself may be subcategorized into the
processes of finding, continuing, and adjusting the beat,
and the evidence points strongly toward the basal
ganglia being involved in the continued representation of
beat rather than its detection or adjustment (Chapin
et al., 2010; Grahn and Rowe, 2013). In one fMRI study
(Chapin et al., 2010), participants were played six cycles of each of a set of complex rhythms and were tasked with attending to the rhythm, holding it in memory over 12 s,
then reproducing it by tapping. During the attending
phase, the basal ganglia showed significant activation
only if the auditory stimulus was attended to, and if
sufficient cycles of the rhythm had passed for listeners
to perceive the beat. The basal ganglia also remained
active during the rehearsal period. Similarly, in another
fMRI study (Grahn and Rowe, 2013) where beat and non-
beat rhythms were played consecutively, the preceding
rhythm determined whether the beat in the subsequent
rhythm, if any, was a continuation from the previous
rhythm (beat continuation), was sped up or slowed down
(beat adjustment), or needed to be found afresh (beat
finding). Here, the basal ganglia were most active in beat
continuation conditions and less active for beat adjust-
ment conditions, with no apparent difference between
the beat finding and the nonbeat (where no beat was pre-
sent) conditions.
The superior temporal gyrus, premotor cortex, and
ventrolateral prefrontal cortex show activity during beat
detection and synchronization through tapping (Kung
et al., 2013). When tapping to rhythmic sequences that
contain syncopation (the absence of sound on a per-
ceived beat, see Fig. 1), differences in activation of the
premotor cortex, supplemental motor area, basal ganglia,
and lateral cerebellum were observed, and these differ-
ences were present even when motor actions were not
executed and the beat was simply imagined (Oullier
et al., 2005). Syncopation is among the factors that deter-
mine how engaging listeners find a piece of music, and
pleasant music appears to more effectively entrain neural
responses in the caudate nucleus of the basal ganglia
(Trost et al., 2014). Premotor and cerebellar areas are
also more heavily recruited in response to subjectively
more ‘‘beautiful” rhythms, and activity in the ventral pre-
motor cortex (PMv) is enhanced by rhythms that are at
a preferred tempo (Kornysheva et al., 2010). Repetitive
transcranial magnetic stimulation (TMS) over the PMv
changed people’s preferred tempo, suggesting that the
PMv may be involved in beat rate preference
(Kornysheva and Schubotz, 2011).
Findings from a number of functional imaging studies
begin to allude back to some of the observations from
early studies on temporal processing in the context of
music. For example, beat induction is poorer for a slow
(1500 ms) tempo compared to a faster one (600 ms),
and activity in the basal ganglia, premotor and
supplementary motor regions, and thalamus is
correspondingly reduced (McAuley et al., 2012). This is
consistent with accounts that the motor system is prefer-
entially engaged in the measurement of sub-second time
intervals (Lewis and Miall, 2003). Basal ganglia activity
peaks around 500–600 ms (Riecker et al., 2003), which
is comparable to the indifference interval and the rate of
spontaneous tapping in humans (Repp and Su, 2013).
The upper tempo limit to beat perception (around 200 ms)
may be determined by the time constant for temporal
integration (Loveless et al., 1996), which is comparable
to the duration of auditory short term sensory memory,
or ‘‘short auditory store” (Cowan, 1984). Recent work,
however, suggests that temporal memory resources
may not be fixed for a discrete number of items but flexibly
distributed according to the number of intervals to be
encoded in a sequence (Teki and Griffiths, 2014;
Joseph et al., 2016).
Model-based accounts of beat perception
A number of theoretical models have been proposed that
capture neural and behavioral aspects of beat perception.
Neural resonance theory is an influential computational
model that consists of two sets of dynamic nonlinear
oscillators, one that receives sensory input (an
‘‘auditory” layer) and one that receives input and
projects back to the auditory layer (a ‘‘motor” layer). The
interaction between these layers can be modeled as a
dynamical system, and the results resemble both
neurophysiological and behavioral aspects of beat
perception (Large et al., 2015). Neural resonance theory
is compatible with the dynamic attending theory, which
postulates oscillatory fluctuations in attention (Large and
Jones, 1999). The active sensing hypothesis (Schroeder
et al., 2010) postulates similar interactions between the
auditory and motor system (see Henry and Herrmann, 2014, for a comparison of the two hypotheses). The
‘‘action simulation for auditory prediction” (ASAP) hypoth-
esis goes a step further by suggesting that auditory per-
ception is sharpened by the explicit simulation of
periodic movement in motor planning regions of the brain
(Patel and Iversen, 2014). The precise mechanism for
beat induction remains unknown, though the entrainment
of neural oscillations is a common thread between these
competing hypotheses.
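As a toy illustration of the resonance idea, one can numerically integrate a single driven nonlinear oscillator of the general form used in this literature, dz/dt = z(a + i2πf0 + b|z|²) + F·s(t), where s(t) is a rhythmic pulse train. The parameters below are arbitrary choices for illustration and are not those of the published model:

```python
import numpy as np

dt = 0.001
t = np.arange(0, 20, dt)
f0, a, b, F = 2.0, 1.0, -10.0, 1.0        # 2-Hz oscillator with a soft limit cycle
s = (np.mod(t, 0.5) < dt).astype(float)   # one click every 500 ms (2-Hz beat)

z = 0.01 + 0j                             # oscillator state (complex amplitude)
amp = np.empty(t.size)
for i in range(t.size):
    dz = z * (a + 1j * 2 * np.pi * f0 + b * abs(z) ** 2) + F * s[i]
    z += dt * dz                          # forward Euler step
    amp[i] = abs(z)

print(f"steady-state amplitude ~ {amp[-2000:].mean():.2f}")  # settles near sqrt(a/-b)
```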
Beat processing in nonhuman species
So far, the discussion has focused on findings from
human studies. Beat perception studies in nonhuman
species are numerous, but apart from notable
exceptions such as a cockatoo (Patel et al., 2009) and a
sea lion (Cook et al., 2013; Rouse et al., 2016), nonhu-
man species have shown little compelling evidence of
being able to perceive and synchronize to the beat as pre-
cisely as humans (Geiser et al., 2014). Though chim-
panzees appear to show some synchronization ability
(Hattori et al., 2013), it appears to be weak and quite limited in tempo range compared to that of humans. This
may be somewhat surprising given that humans are not
the only species that relies on rhythmic sounds such as
vocalizations and produces rhythmic movements. Indeed,
some signatures of rhythm perception in humans have
also been observed in macaques, such as interval
duration-selective modulation of beta oscillations
(Bartolo et al., 2014). In this and other related studies,
the macaques were given a serial continuation task where
they tapped along to a metronome and continued tapping
at the same rate after the metronome stopped. Though tap
times tended to lag metronome clicks by 100–250 ms,
these lags were shorter than the macaque’s reaction
times, suggesting that there was a predictive element,
though not strong enough to mimic the near-zero or neg-
ative lags in humans. As in humans, beta oscillations in the basal ganglia (putamen) show a preference for the con-
tinuation of a beat, and overall, similar timing circuits have
been identified in both human and nonhuman primates,
though macaques show better performance when syn-
chronizing their movements to a visual rather than audi-
tory metronome (Merchant et al., 2015). This is in
contrast to a clear auditory bias in humans (Honing and
Merchant, 2014). Larger responses in primary auditory
cortex to tones at ‘‘strong beat” positions in a rhythmic
sequence than to the same tones in a rhythmically irregu-
lar sequence have also been observed in macaques, in
addition to enhanced deviance detection ability
(Selezneva et al., 2013). However, this may be due to
sensitivity to rhythmic grouping rather than to beat per-
ception itself, since certain aspects of beat-specific neural
activity observed in human adults and newborns are not
observed in macaques (Honing et al., 2012). From the
perspective of low-level auditory processing, firing rate
adaptation as early as the midbrain results in higher aver-
age firing rates on the beat than off the beat; this may
explain why some beat interpretations are more likely to
be felt than others, and may also be a relevant precursor
to the entrainment of cortical oscillations to beat
(Rajendran et al., 2017).
To summarize
Human imaging studies have provided glimpses into the
complex and highly distributed neural dynamics that are
set into motion by musical rhythms. A key conceptual
advance is the finding that rhythmic sequences engage
auditory and motor areas more strongly than arrhythmic
sequences, even during passive listening and in the
absence of movement. Another is that perceptually
strong beats evoke stronger neural activity than weak
beats, which suggests a close link between neural
activity and perception. Underlying both are oscillatory
processes that are capable of entraining to the beat, but
are also coordinated across sensory, frontal, parietal,
and motor-related areas both cortically and
subcortically. Some of these neural dynamics have
been observed in nonhuman primates, and it therefore
remains an open question to what extent humans are
unique in their ability to perceive musical beat, and what
differences in connectivity and neural response
dynamics give rise to humans’ seemingly superior ability
to spontaneously synchronize movements to music.
PREDICTABLE TIMING IN AUDITORY
PERCEPTION
As alluded to in the introduction, an appreciation of music
is only one of many consequences of our ability to
perceive rhythmic patterns. We now begin to shift our
focus away from music to explore rhythm perception in
a more general context. Intrinsic to the perception of a
musical beat is the prediction of when the next beat will
occur, and the perceptual advantages afforded by our
general ability to form temporal predictions will be the
subject of this section.
Temporal predictability in pattern detection
Humans show a remarkable ability to detect repeating
patterns that are quite complex in their acoustic content
(Agus et al., 2010; McDermott and Simoncelli, 2011;
Kumar et al., 2014; Barascud et al., 2016). To do so is
an impressive feat; the brain must be able to hold arbitrary
sounds of arbitrary length and complexity in memory over
timescales that can range from milliseconds up to tens of
seconds (Kaernbach, 2004). It is therefore relevant that a
feature of repeating sounds in nature is that they tend to
be rhythmic and indicative of animate sound sources.
Rhythm detection may therefore be an advantageous
sensory capability, and it has been shown that rhythmic
presentation of repeating sounds facilitates detection of
complex acoustic patterns (Rajendran et al., 2016) and
decreases detection thresholds (Lawrance et al., 2014).
The entrainment of oscillatory activity in the brain,
mentioned earlier in the context of beat perception (see
Entrainment of oscillatory activity to musical beat),
provides a likely explanation for these results too.
Rhythmic input is widely thought to entrain attentional
resources (Lakatos et al., 2008; Bolger et al., 2013;
Calderone et al., 2014) such that neuronal excitability is
highest when the next stimulus is predicted to occur
(Lakatos et al., 2009; Besle et al., 2011). Low-frequency
entrainment of oscillations may therefore serve as a mech-
anism for sensory selection (Schroeder and Lakatos, 2009)
and improve the quality of sensory information received
(Rohenkohl et al., 2014). It is worth noting that the rhythmic
form of temporal expectation is just one of several forms of
temporal expectation, each resulting in subtle differences
in perception that may arise from differences in the under-
lying neural substrates (Nobre et al., 2007; Breska and
Deouell, 2017). For example, an enhancement of percep-
tual sensitivity has been demonstrated in both periodic and nonperiodic sequences that are temporally predictable, but motor facilitation in the form of faster response latencies was only observed in the periodic condition (Morillon et al., 2016; Rajendran and Teki, 2016).
However, it is also important to note that there is an as
yet unresolved tension, or apparent conflict, in the
physiological literature regarding the nature of the neural
responses involved in the processing of periodic or
rhythmic stimuli. The aforementioned studies posit that
entrainment due to temporal expectation and attention
would result in periods of heightened sensitivity in phase
with the rhythm, which would be expected to lead to
enhanced, stronger responses. This is in contrast to
well documented phenomena such as ‘‘repetition
suppression” in auditory-evoked responses measured
through EEG (Costa-Faidella et al., 2011) and
‘‘stimulus-specific adaptation (SSA)” observed in neural
responses recorded extracellularly in auditory cortex
and non-lemniscal parts of the inferior colliculus and tha-
lamus (Malmierca, 2014; Khouri and Nelken, 2015; Nieto-
Diego and Malmierca, 2016), which find that responses to
simple periodic stimuli are reduced or suppressed, rather
than enhanced. How can isochronous stimuli on the one
hand produce entrainment that is suggestive of periodi-
cally heightened sensitivity but at the same time elicit
reduced response amplitudes as evidenced by SSA or
repetition suppression? The simple answer is that we do
not yet know. The methodologies of studies of entrain-
ment versus SSA are too different to allow direct compar-
isons. Entrainment studies typically use EEG or LFP
measures, the amplitude of which depends at least as
much on the degree of synchronization of neural activity
as on net response amplitudes of individual neurons.
Additionally, they are often carried out on awake human
volunteers or animal subjects who may be attending to
the rhythmic sounds, while SSA studies typically use
anesthetized preparations to measure extracellular
response amplitudes that are essentially independent of
neural synchrony. Consequently, while the take-home
messages from studies of entrainment and of SSA at pre-
sent appear somewhat contradictory, how they may be
reconciled will need to be addressed in future studies
using unified methodologies.
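To make the tension concrete, the two accounts make opposite qualitative predictions for an isochronous sequence, as the toy simulation below illustrates (our caricature, not a published model): a resource-depletion account of adaptation predicts responses that shrink with repetition, while an entrainment account predicts a response gain that grows as the oscillation locks on.

```python
import numpy as np

n, ioi, tau = 20, 0.6, 1.0     # 20 stimuli at a 600-ms IOI; 1-s recovery constant

# Adaptation caricature: each response depletes a resource that then
# recovers exponentially toward 1 until the next stimulus arrives.
resource, adapted = 1.0, []
for _ in range(n):
    adapted.append(resource)
    resource *= 0.5                                     # depletion by the response
    resource = 1 - (1 - resource) * np.exp(-ioi / tau)  # partial recovery

# Entrainment caricature: gain at the expected time ramps up over cycles.
entrained = 1 - 0.5 * np.exp(-np.arange(n) / 5.0)

print(np.round(adapted[:5], 2))     # decreasing: [1.   0.73 0.65 0.63 0.62]
print(np.round(entrained[:5], 2))   # increasing: [0.5  0.59 0.66 0.73 0.78]
```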
Temporal predictability in auditory scene analysis
Another practical advantage of forming predictions based
on temporal patterns is that it allows us to parse a
complex auditory scene into distinct perceptual objects
(Winkler et al., 2009a). In addition to temporal coherence
of sound features (Turgeon et al., 2002, 2005; Shamma
et al., 2011), the predictability of features such as loca-
tion, pitch, loudness, and timbre play a pivotal role in audi-
tory scene analysis (Bendixen, 2014). The segregation of
a set of sounds from another set of sounds is known as
auditory stream segregation and has often been probed
experimentally using an alternating tone paradigm of
A-B-A, where A and B tones are typically different fre-
quencies of a certain frequency separation (Bregman
and Campbell, 1971; van Noorden, 1975). Temporal reg-
ularity within these paradigms influences whether these
sequences are perceived as integrated (A-B-A-B-A) or
whether they segregate into two perceptually distinct
streams (A---A---A and --B---B) (Bendixen et al., 2010;
Andreou et al., 2011; Rajendran et al., 2013). Together
with attentional effects, predictive coding based on tem-
poral and other feature regularities may account for the
stability of auditory objects (Denham and Winkler, 2006;
Pressnitzer and Hupe, 2006; Chait et al., 2007; Winkler
et al., 2012).
Current theories suggest that the formation of auditory
objects may rely on both basic sensory neural
mechanisms (Fishman et al., 2012) and attention-driven
oscillatory mechanisms (Lakatos et al., 2008; Schroeder
and Lakatos, 2009). Though the question of how a per-
ceptual object is formed from potentially noisy and con-
flicting information is still an open one, the final
representation of an auditory object is remarkably distinct
and robust, even if it overlaps spectrotemporally with the
unattended background (Ding and Simon, 2012, 2013). A
key question here, which may also be relevant to how
beat and meter emerge from music, is whether and how
different oscillatory populations of neurons entrain to dif-
ferent time-varying sound features, and how their relative
contributions are weighted and integrated to form a coher-
ent percept of a single speaker in a noisy room.
Rhythms in speech perception
Speech is perhaps the most pervasive and critical context
in which we rely on our ability to derive meaning from
complex temporal patterns. The intelligibility of speech
has been shown to correlate with the entrainment of
oscillatory neural responses to the speech envelope
(Ahissar et al., 2001; Peelle and Davis, 2012), particularly
in the 4–8-Hz range (Luo and Poeppel, 2007). This range
corresponds to the syllable rate of speech production
(Greenberg et al., 1999) and dominates the temporal
modulations in the speech envelope (Chi et al., 1999;
Chandrasekaran et al., 2009; Elliott and Theunissen,
2009). The syllabic rate is nearly an order of magnitude
slower than fine structure elements such as formants
(30–50 Hz), and a few-fold faster than intonation contours
that are typical of phrasal units (1–2 Hz). Content at all of these timescales is parsed concurrently to extract meaning.
Speech, like music, is built hierarchically from
elements that span short to long timescales. A recent
survey of temporal modulations in speech and music
reveals a consistent peak in temporal modulations
around 5 Hz for speech across nine languages, and
around 2 Hz for music across several (Western) musical
genres (Ding et al., 2017). It is worth emphasizing, how-
ever, that the temporal structure present in speech is
not periodic like it is in music (Nolan and Jeon, 2014).
The temporal modulations in speech are nevertheless constrained by the motor system, specifically by the biomechanics of the articulators, and this results in clear temporal structure in both the auditory and visual components of speech (Chandrasekaran et al., 2009). There is strong
evidence that speech contains sufficient temporal struc-
ture to robustly entrain oscillatory neural activity (Giraud
and Poeppel, 2012), and that this entrainment serves to
maximize processing efficiency of future inputs by ensur-
ing that intervals of high neuronal excitability coincide with
when critical information is expected to arrive (Peelle and
Davis, 2012; Ding et al., 2017).
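The modulation-spectrum measurements cited above follow a common recipe: extract the amplitude envelope of the sound, then examine the spectrum of the envelope itself. The sketch below applies this recipe to a synthetic amplitude-modulated noise standing in for speech with a 5-Hz syllable-like rate (an illustration of the method, not actual speech data):

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
carrier = rng.standard_normal(t.size)
sound = (1 + np.sin(2 * np.pi * 5 * t)) * carrier   # 5-Hz modulated noise

envelope = np.abs(hilbert(sound))                   # amplitude envelope
envelope -= envelope.mean()
spectrum = np.abs(np.fft.rfft(envelope)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

band = (freqs > 0.5) & (freqs < 32)                 # typical modulation range
peak = freqs[band][np.argmax(spectrum[band])]
print(f"peak modulation frequency: {peak:.1f} Hz")  # ~5.0 Hz
```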
Interestingly, temporal manipulations to speech more
drastically impair intelligibility (Adank and Janse, 2009)
than extreme spectral manipulations do (Shannon et al.,
1995). Model-based accounts (Ghitza, 2011; Giraud and
Poeppel, 2012) suggest that phase-locking and nested
theta-gamma oscillations could explain why an extremely
impoverished speech signal can be understood if the syl-
labic rhythm is preserved (Ghitza and Greenberg, 2009).
The ‘‘asymmetric sampling in time” (AST) hypothesis sug-
gests that the two cerebral hemispheres sample an audi-
tory signal at different rates; the left auditory areas extract
information from 20 to 40-ms temporal integration win-
dows, while auditory areas in the right hemisphere sample
using 150–250-ms temporal integration windows
(Poeppel, 2003). A related hypothesis suggests that the
left hemisphere has better temporal resolution and the
right hemisphere has better spectral resolution, and that
this functional organization reflects an optimization of pro-
cessing for speech and music, respectively (Zatorre et al.,
2002). Both of these ideas are consistent with the obser-
vation that the left hemisphere dominates during speech
processing while the right hemisphere dominates during
music processing (Tervaniemi and Hugdahl, 2003). The
parallels drawn here between music and speech deal
strictly with timing and do not suggest that music has
any meaning that is analogous to the semantic meaning
of speech (Lerdahl and Jackendoff, 1983). However,
given these parallels, it is possible that music and speech
co-evolved (Fitch, 2000; Hauser and McDermott, 2003;
Fitch, 2006; Patel, 2007) and are built on overlapping cir-
cuit mechanisms for auditory working memory (Hickok
et al., 2003; Joseph et al., 2015) and timing (Patel, 2011).
To summarize
The temporal predictability that results from rhythmic
stimulation helps us detect patterns, parse an auditory
scene into distinct auditory objects, and understand
speech. Entrainment of neural oscillations, which by
virtue of aligning to temporal modulations in a rhythmic
acoustic signal generates predictions about future input,
is thought to underlie all of these abilities. The acoustic
stimuli used in these studies range from extremely
simple alternating tone paradigms, to the parsing of two
people speaking simultaneously. Much remains to be
understood regarding what periodically or quasi-
periodically repeating features in a spectrotemporally
complex sound entrain oscillations, and how such
oscillations are ultimately integrated to form distinct
auditory objects or extract meaning. Knowing these
answers would likely shed light on the mechanism and
functional role of oscillatory entrainment to musical
rhythms.
NEURAL MECHANISMS OF TIMING
Music, speech, and the parsing of complex auditory
scenes all rely on an ability to detect temporal
regularities in order to form the temporal predictions that
drive how sounds in the future are perceived. This
requires some form of timekeeping in the brain. The
timing literature is vast, likely reflecting the complexity of the neural circuits that are able to track time (Teki, 2016). Those findings most relevant to our dis-
cussion are reviewed here.
Dedicated timekeeping circuits?
The neuronal mechanisms underlying timing have been a
subject of investigation for several decades. Braitenberg
(1967) proposed the cerebellum as a biological clock in
the millisecond range. Since then, the concept of a central
clock or internal timekeeper has dominated timing
research. Early work highlighted the unique synaptic cir-
cuitry of the cerebellum and the inferior olive as being
capable of generating precise timing signals. Specifically,
inferior olive neurons, which provide climbing fiber input to
the Purkinje cells in the cerebellum, possess unique
voltage-gated conductances that exhibit rhythmic sub-
threshold membrane potential oscillations (5–15 Hz) as
well as electrical gap-junctions that synchronize mem-
brane potential oscillations across cells into distinct neu-
ronal clusters that show temporally coherent activity
(Llinas et al., 1974; Llinas and Yarom, 1981). The deep
cerebellar nuclei like the dentate nucleus modulate the
electrical activity of olivary neurons and decouple them
into dynamic cell assemblies. Furthermore, these deep
cerebellar nuclei are inhibited by the Purkinje cells,
completing a feed-forward inhibitory loop. These
neurophysiological properties provide the olivocerebellar
network with the capacity to generate accurate absolute
timing signals for motor and perceptual timing (Welsh
et al., 1995; Yarom and Cohen, 2002; Jacobson et al.,
2008; Mathy et al., 2009). The use of timing signals from
the olivocerebellar network has been demonstrated
across several timing paradigms in human studies as well
(Xu, 2006; Teki et al., 2011; Wu et al., 2011; Lusk et al.,
2016).
Motivated by neuropsychological evidence from
Parkinson’s patients who showed perceptual timing
deficits, parallel work focused on the basal ganglia as a
core timing network in the brain (Artieda et al., 1992).
Matell and Meck (2004) proposed an oscillatory timing
model: medium spiny neurons in the dorsal striatum act
as coincidence detectors of oscillatory cortical activity
(5–15 Hz; Miall, 1989). The cortical oscillations are pro-
posed to be synchronized at interval onset by phasic
dopamine release from the ventral tegmental area, while
dopaminergic input from the substantia nigra modulates
the activity of the dorsal striatum (Buhusi and Meck,
2005). Cortico-striatal synapses are strengthened or
weakened over experience through long-term potentiation
and depression, and after repeated stimulus presentation,
medium spiny neurons learn to encode the duration of
reinforced time intervals (Gu et al., 2011). Several studies
point to the importance of the dopaminergic basal ganglia
network in mediating accurate timing signals (Jin et al.,
2009; Bartolo et al., 2014; Gershman et al., 2014; Chiba
et al., 2015; Gouvêa et al., 2015; Mello et al., 2015; Soares et al., 2016).
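The coincidence-detection scheme can be caricatured in a few lines: oscillators at different frequencies are reset together at interval onset, so the vector of their phases is unique to each elapsed time, and a detector whose weights match the phase pattern at the reinforced interval responds whenever that pattern recurs. All numbers below are illustrative:

```python
import numpy as np

freqs = np.linspace(5, 15, 40)              # cortical oscillator frequencies (Hz)

def phase_pattern(t):
    """Oscillator outputs t seconds after a common reset at interval onset."""
    return np.cos(2 * np.pi * freqs * t)

T = 0.6                                     # reinforced interval (s)
weights = phase_pattern(T)                  # learned cortico-striatal weights

ts = np.arange(0, 2, 0.01)
response = np.array([weights @ phase_pattern(t) for t in ts]) / freqs.size
print(f"detector response peaks at t = {ts[np.argmax(response)]:.2f} s")  # ~0.60
```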
Cortical networks such as primary visual, auditory,
parietal and frontal cortices have also been implicated in
sensory timing functions (e.g. Leon and Shadlen, 2003;
Bueti et al., 2008; Bueti and Macaluso, 2010; Bueti, 2011; Schneider and Ghose, 2012; Hayashi et al., 2015;
Jazayeri and Shadlen, 2015; Namboodiri et al., 2015;
Bakhurin et al., 2016; Shuler, 2016). However, it is not fully understood which aspects of timing each of these networks mediates, nor are the dynamics of temporal processing across sensory and higher-order cortical networks completely clear (see Rao et al., 2001). A plausible hypothesis is that early sensory cortices process stimulus-related features while parietal and frontal cortices are engaged by task demands such as memory and attention (Finnerty et al., 2015). More
recently, the hippocampus (CA1) has been shown to have
‘time cells’ that display increased firing rates in relation to
elapsing durations, independent of space and distance
(MacDonald et al., 2011). The prevailing view suggests
the existence of ‘time cells’ in the striatum, cerebellum
and the hippocampus whose output is integrated to obtain
a common percept of time (Lusk et al., 2016).
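One simple way to picture such a population code is as a bank of units whose firing fields tile the elapsed interval, with a downstream decoder estimating time from the pattern of activity. The sketch below is a caricature under stated assumptions (Gaussian tuning, field widths that grow with preferred time in scalar-timing fashion, a population-vector readout, and an arbitrary noise level), not a model fitted to recorded time cells.

```python
# Minimal sketch of a "time cell" population code read out by a
# population-vector decoder. Tuning widths, cell count, and noise level
# are illustrative assumptions, not fitted to data.
import numpy as np

rng = np.random.default_rng(2)
n_cells, duration = 40, 10.0
centers = np.linspace(0.0, duration, n_cells)   # preferred elapsed times (s)
widths = 0.1 * centers + 0.2                    # fields broaden with time

def rates(t):
    """Population firing rates at elapsed time t (Gaussian time fields)."""
    return np.exp(-0.5 * ((t - centers) / widths) ** 2)

def decode(r):
    """Population-vector estimate of elapsed time from rates r."""
    return np.sum(r * centers) / np.sum(r)

for t_true in (1.0, 4.0, 8.0):
    r = rates(t_true) + 0.02 * rng.standard_normal(n_cells)  # noisy readout
    print(f"true {t_true:.1f} s -> decoded ~{decode(r):.2f} s")
```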
Is the passage of time implicit in neural responses?
An alternative to the hypothesis that time is kept by dedicated sensorimotor circuits is that timing is an intrinsic computation that emerges from network-wide neural dynamics (Goel and Buonomano, 2014). Hardy and Buonomano (2016) have
recently reviewed a number of plausible neurocomputa-
tional models of timing. Here, we briefly summarize the
primary models and their principles of operation.
One of the simplest network models of timing is based
on ‘synfire chains’ where groups of neurons are
connected in a feed-forward fashion such that each
neuronal population is activated at a different instant in
time (Haß et al., 2008). Synfire chains represent a neurobiologically plausible mechanism for interval timing, but are limited by their purely feed-forward architecture and absence of recurrent connections. Positive feedback
models, on the other hand, use recurrent excitatory con-
nections and are compatible with experimental findings
on cortical representation of time (e.g. Namboodiri et al.,
2015). The limitation of these models, however, is that it is not known whether they generalize to sequences of intervals. Finally, state-dependent network models of timing and temporal processing are based on the
hypothesis that sensory events interact with current
states of recurrent networks to form a sequence of net-
work states that encode each event in the context of
recent stimulus history (Karmarkar and Buonomano,
2007). Several studies have demonstrated that cortical
networks can be trained to represent time intervals in the hundreds-of-milliseconds range, where timing is proposed to emerge from network-wide and pathway-
specific changes in evoked neural dynamics (e.g. Goel
and Buonomano, 2016).
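Two of these schemes are simple enough to caricature in code. First, a rate-based synfire-like chain (loosely in the spirit of Haß et al., 2008; the pool count, gain, and time constant are our own illustrative choices): each pool excites the next, so the identity of the currently most active pool encodes elapsed time.

```python
# Minimal sketch of a synfire-chain timer: feed-forward pools of rate
# units, where the leading pool's identity marks elapsed time.
# Parameters are illustrative assumptions.
import numpy as np

n_pools, dt, tau, gain = 20, 1e-3, 0.01, 2.0
rate = np.zeros(n_pools)
rate[0] = 1.0                                   # stimulus onset kicks pool 0
first_led = {}                                  # pool -> time it first led

for step in range(1, 501):                      # simulate 0.5 s
    drive = np.zeros(n_pools)
    drive[1:] = rate[:-1]                       # pool k excites pool k + 1
    rate += dt / tau * (-rate + gain * drive)   # leaky rate dynamics
    first_led.setdefault(int(np.argmax(rate)), step * dt)

print("pool -> time (s) it first became the most active pool:")
print({k: round(v, 3) for k, v in sorted(first_led.items())})
```

Second, a state-dependent sketch in the spirit of Karmarkar and Buonomano (2007): a random recurrent network is perturbed at stimulus onset, and a linear readout trained on the resulting trajectory reports elapsed time, so timing is implicit in the evolving network state rather than in any dedicated clock. Network size, recurrent gain, and the ridge penalty are again illustrative assumptions, and the readout is fit on the training trajectory itself rather than on held-out trials.

```python
# Minimal sketch of state-dependent timing: elapsed time decoded linearly
# from the trajectory of a random recurrent network kicked at t = 0.
import numpy as np

rng = np.random.default_rng(3)
n, steps, dt, tau = 200, 1000, 1e-3, 0.05
W = 1.2 * rng.standard_normal((n, n)) / np.sqrt(n)   # recurrent weights
x = 0.5 * rng.standard_normal(n)                     # state kicked at onset

states, times = [], []
for step in range(steps):                            # simulate 1 s
    x += dt / tau * (-x + np.tanh(W @ x))            # leaky tanh dynamics
    states.append(x.copy())
    times.append((step + 1) * dt)
X, y = np.array(states), np.array(times)

lam = 1e-3                                           # ridge penalty
w = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
rmse = np.sqrt(np.mean((X @ w - y) ** 2))
print(f"readout RMSE on the training trajectory: {rmse:.4f} s")
```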
To summarize
Although the notion of population clocks is gaining traction
(Hardy and Buonomano, 2016), there is as yet no compelling, biologically plausible model that generalizes from the computation of single time intervals to complex sequences such as those observed in music. Natural motor commands as well as sound
sequences like speech and music consist of dynamically
varying time intervals with different temporal structures.
Several of the circuits reviewed above are specialized
for processing time on the scale of tens of milliseconds
to a few seconds, but it is not yet clear which of these
mechanisms apply in the context of beat perception, as this has not been tested directly. Integrating basic mechanisms of sound processing observed along the auditory pathways with models of timing may provide novel insights into pattern timing.
Timing functions are distributed across the brain and
are expressed in subcortical motor structures like the
basal ganglia and the cerebellum, sensory and motor
cortices, as well as higher order areas like the parietal
and frontal cortical networks. While the timing computations performed by each of these individual brain regions are not fully understood, it is evident that particular areas are specialized for mediating specific timing functions. Future research may benefit from dissecting the precise role of each brain area, both on its own and as part of a distributed timing network.
CONCLUSIONS AND OUTLOOK
A lot of ground has been covered in this review, largely
because the work comprising each section draws from a
different field (or several different fields) of research that has so far shown little overlap with the others, despite common themes that unite these topics. For example, the timescales that are relevant in
music are also relevant in other contexts such as in the
production and perception of movement (walking,
running, breathing) and speech, and in the parsing of
complex acoustic scenes (see Fig. 1B). Furthermore,
the entrainment of neural oscillations through
sensorimotor loops may be a central mechanism
governing perception and action in all of these contexts.
By presenting an overview of these diverse topics that
likely rely on similar temporal encoding mechanisms, we
hope that this review will provide an insightful point of
departure for future investigations into auditory temporal
sequence processing.
We conclude by leaving the reader with an open
question that we believe will be pivotal to advancing our
understanding of temporal sequence processing,
namely a mechanistic understanding of the entrainment
of neural oscillations. While a large body of work points
to the importance of neural oscillations (the studies
mentioned in the second half of this review only scratch
the surface), this topic is not without controversy, and many questions remain unresolved, starting with the functional role that oscillations in different frequency bands play in information coding and retrieval. A number of theories
have been proposed that describe functional aspects of
oscillatory dynamics, including communication between
neuronal groups through coherence of oscillations
(Fries, 2015), the prioritization of sensory streams
through pulsed inhibition via alpha oscillations (Haegens
et al., 2011; Mathewson et al., 2011; Jensen et al.,
2014; Strauß et al., 2014), the retrieval of memories
through spiking that is phase-locked to theta oscillations
(Hsieh and Ranganath, 2014), and the active sensing of
sound through rhythmic temporal priors provided by the
motor system (Morillon et al., 2015). Of particular rele-
vance are the behavioral (Morillon et al., 2016) and neu-
ronal dissociations (Breska and Deouell, 2017) that
have been observed between auditory sequences that
are periodic versus temporally predictable but not peri-
odic, suggesting that the underlying neural dynamics
manifest differently according to the nature of the tempo-
ral predictions being maintained. Further work is required
to develop a unified understanding of the function served
by the entrainment of neural oscillations.
A second question relates to the dynamics of
entrainment itself, specifically how entrainment arises
and unfolds, how it extracts higher-order temporal
regularities in a rhythmic sequence, how it behaves in
response to new sensory input, and how multiple, possibly distinct, simultaneous processes interact to guide what we perceive. Much of our current knowledge about
the role of neural oscillations and entrainment in the
perception of temporally structured stimuli is based on
the interpretation of data obtained with non-invasive
techniques (EEG, MEG, fMRI), which lack the fine
resolution required to provide insights into these
phenomena at the level of individual neurons and neural
networks. Deeper insights will require data at higher spatial resolution, such as those typically obtained from invasive recordings in experimental animals, but the use of richly structured auditory stimuli such as music in animal electrophysiology experiments remains highly unusual (see Rajendran et al., 2017 for a first step in this direction). However, we would suggest that
employing music, in addition to traditional paradigms,
may be especially fruitful since much is known already
about our perception of music (see the first two sections
of this review), and because it is a finely controllable stim-
ulus paradigm within which nested periodicities across dif-
ferent sound features (frequency, loudness, duration) can
be simultaneously present and tuned. Furthermore, we
suggest that, since nonhuman model organisms do show
some capacity to perceive and discriminate rhythms (see
the section on Beat Processing in Nonhuman Species),
and since recognizing rhythmic patterns in environmental
sounds such as footsteps or vocalizations is of great
importance to a wide range of organisms, complementary
studies in nonhuman species should begin to fill in the
gaps in our knowledge that non-invasive psychoacoustic
and physiological studies on humans alone cannot
answer.
Acknowledgments—VGR (Wellcome Trust Doctoral Programme
in Neuroscience: WT099750MA) and ST (Sir Henry Wellcome
Postdoctoral Fellowship: WT106084/Z/14/Z) are supported by
the Wellcome Trust.
REFERENCES
Abecasis D (2005) Differential brain response to metrical accents in
isochronous auditory sequences. Music Percept 22:549–562.
Adank P, Janse E (2009) Perceptual learning of time-compressed
and natural fast speech. J Acoust Soc Am 126:2649–2659.
Agus TR, Thorpe SJ, Pressnitzer D (2010) Rapid formation of robust
auditory memories: insights from noise. Neuron 66:610–618.
Ahissar E, Nagarajan S, Ahissar M, Protopapas A, Mahncke H,
Merzenich MM (2001) Speech comprehension is correlated with
temporal response patterns recorded from auditory cortex. Proc
Natl Acad Sci USA 98:13367–13372.
Andreou L-V, Kashino M, Chait M (2011) The role of temporal
regularity in auditory segregation. Hear Res 280:8.
Arnal LH (2012) Predicting "When" Using the Motor System's Beta-
Band Oscillations. Front Hum Neurosci 6.
Arnal LH, Giraud A-L (2012) Cortical oscillations and sensory
predictions. Trends Cogn Sci 16:390–398.
Artieda J, Pastor MA, Lacruz F, Obeso JA (1992) Temporal
discrimination is abnormal in Parkinson’s disease. Brain
115:199–210.
Aschersleben G (2002) Temporal control of movements in
sensorimotor synchronization. Brain Cogn. 48:66–79.
Bakhurin KI, Goudar V, Shobe JL, Claar LD, Buonomano DV,
Masmanidis SC (2016) Differential encoding of time by prefrontal
and striatal network dynamics. J Neurosci. 1789–16.
Barascud N, Pearce MT, Griffiths TD, Friston KJ, Chait M (2016)
Brain responses in humans reveal ideal observer-like sensitivity to
complex acoustic patterns. Proc Natl Acad Sci USA 113:
E616–E625.
Bartolo R, Merchant H (2015) Beta oscillations are linked to the
initiation of sensory-cued movement sequences and the internal
guidance of regular tapping in the monkey. J Neurosci
35:4635–4640.
Bartolo R, Prado L, Merchant H (2014) Information processing in the
primate basal ganglia during sensory-guided and internally driven
rhythmic tapping. J Neurosci 34:3910–3923.
Bååth R (2015) Subjective rhythmization. Music Percept 33:244–254.
Bendixen A (2014) Predictability effects in auditory scene analysis: a
review. Front Neurosci 8:60.
Bendixen A, Denham SL, Gyimesi K, Winkler I (2010) Regular
patterns stabilize auditory streams. J Acoust Soc Am 128:3658.
Besle J, Schevon CA, Mehta AD, Lakatos P, Goodman RR, McKhann
GM, Emerson RG, Schroeder CE (2011) Tuning of the human
neocortex to the temporal dynamics of attended events. J
Neurosci 31:3176–3185.
Bégel V, Benoit C-E, Correa Á, Cutanda D, Kotz SA, Bella SD (2017) "Lost in time" but still moving to the beat. Neuropsychologia 94:129–138.
Bolger D, Trost W, Schön D (2013) Rhythm implicitly affects temporal orienting of attention across modalities. Acta Psychol 142:238–244.
Bolton TL (1894) Rhythm. Am J Psychol 6:145.
Bouwer FL, van Zuijen TL, Honing H (2014) Beat processing is pre-
attentive for metrically simple rhythms with clear accents: An ERP
study Johnson B, ed. PLoS ONE 9:e97467.
Braitenberg V (1967) Is the cerebellar cortex a biological clock in the
millisecond range? In: The Cerebellum, pp 334–346 Progress in
Brain Research. Elsevier.
Bregman AS, Campbell J (1971) Primary auditory stream segregation
and perception of order in rapid sequences of tones. J Exp
Psychol 89:244–249.
Breska A, Deouell LY (2017) Neural mechanisms of rhythm-based
temporal prediction: delta phase-locking reflects temporal
predictability but not rhythmic entrainment Poeppel D, ed. PLoS
Biol 15:e2001665–30.
Brochard R, Abecasis D, Potter D, Ragot R, Drake C (2003) The
"Ticktock" of our internal clock: direct brain evidence of subjective
accents in isochronous sequences. Psychol Sci 14:362–366.
Bueti D (2011) The sensory representation of time. Front Integr
Neurosci 5.
Bueti D, Bahrami B, Walsh V (2008) Sensory and association cortex
in time perception. J Cogn Neurosci 20:1054–1062.
Bueti D, Macaluso E (2010) Auditory temporal expectations modulate
activity in visual cortex. NeuroImage 51:1168–1183.
Buhusi CV, Meck WH (2005) What makes us tick? Functional and
neural mechanisms of interval timing. Nat Rev Neurosci
6:755–765.
Calderone DJ, Lakatos P, Butler PD, Castellanos FX (2014)
Entrainment of neural oscillations as a modifiable substrate of
attention. Trends Cogn Sci 18:300–309.
Chait M, Poeppel D, de Cheveigné A, Simon JZ (2007) Processing
asymmetry of transitions between order and disorder in human
auditory cortex. J Neurosci 27:5207–5214.
Chandrasekaran C, Trubanova A, Stillittano S, Caplier A, Ghazanfar
AA (2009) The natural statistics of audiovisual speech Friston KJ,
ed. PLoS Comput Biol 5:e1000436.
Chapin HL, Zanto T, Jantzen KJ, Kelso SJA, Steinberg F, Large EW
(2010) Neural responses to complex auditory rhythms: the role of
attending. Front Psychol 1.
Chen JL, Penhune VB, Zatorre RJ (2008) Moving on time: brain
network for auditory-motor synchronization is modulated by
rhythm complexity and musical training. J Cogn Neurosci
20:226–239. https://doi.org/10.1162/jocn200820018.
Chi T, Gao Y, Guyton MC, Ru P, Shamma S (1999) Spectro-temporal
modulation transfer functions and speech intelligibility. J Acoust
Soc Am 106:2719–2732.
Chiba A, Oshio KI, Inase M (2015) Neuronal representation of
duration discrimination in the monkey striatum. Physiol Rep 3:
e12283.
Clarke EF (1999) Rhythm and timing in music. In: The psychology of
music. Elsevier. p. 473–500.
Cook P, Rouse A, Wilson M, Reichmuth C (2013) A California sea
lion (Zalophus californianus) can keep the beat: motor
entrainment to rhythmic auditory stimuli in a non vocal mimic. J
Comp Psychol 127:412–427.
Costa-Faidella J, Baldeweg T, Grimm S, Escera C (2011)
Interactions between "What" and "When" in the Auditory
System: Temporal Predictability Enhances Repetition
Suppression. J Neurosci 31:18590–18597.
Cowan N (1984) On short and long auditory stores. Psychol Bull
96:341–370.
Denham SL, Winkler I (2006) The role of predictive models in the
formation of auditory streams. J Physiol Paris 100:154–170.
Ding N, Patel AD, Chen L, Butler H, Luo C, Poeppel D (2017)
Temporal modulations in speech and music. Neurosci Biobehav
Rev:1–7.
Ding N, Simon JZ (2012) Emergence of neural encoding of auditory
objects while listening to competing speakers. Proc Natl Acad Sci
USA 109:11854–11859.
Ding N, Simon JZ (2013) Adaptive temporal encoding leads to a
background-insensitive cortical representation of speech. J
Neurosci 33:5728–5735.
Doelling KB, Poeppel D (2015) Cortical entrainment to music and its
modulation by expertise. Proc Natl Acad Sci USA 112:
E6233–E6242.
Elliott TM, Theunissen FE (2009) The modulation transfer function for
speech intelligibility Friston KJ, ed. PLoS Comput Biol 5:
e1000302.
Essens PJ, Povel DJ (1985) Metrical and nonmetrical representations
of temporal patterns. Percept Psychophys 37:1–7.
Finnerty GT, Shadlen MN, Jazayeri M, Nobre AC, Buonomano DV
(2015) Time in cortical circuits. J Neurosci 35:13912–13916.
Fishman YI, Micheyl C, Steinschneider M (2012) Neural mechanisms
of rhythmic masking release in monkey primary auditory cortex:
implications for models of auditory scene analysis. J Neurophysiol
107:2366–2382.
Fitch WT (2000) The evolution of speech: a comparative review.
Trends Cogn Sci 4:258–267.
Fitch WT (2006) The biology and evolution of music: a comparative
perspective. Cognition 100:173–215.
Fraisse P (1963) The psychology of time. Harper & Row.
Fraisse P (1978) Time and rhythm perception. In: Carterette E,
Friedman M, editors. Handbook of perception, Vol. VIII. New
York: Academic Press. p. 203–254.
Fraisse P (1982) Rhythm and tempo. In: The psychology of
music. Elsevier.
Fraisse P, Oléron G, Paillard J (1958) Sur les repères sensoriels qui permettent de contrôler les mouvements d'accompagnement de stimuli périodiques. Année Psychol 58:321–338.
Fries P (2015) Rhythms for cognition: communication through
coherence. Neuron 88:220–235.
Fujioka T, Ross B, Trainor LJ (2015) Beta-band oscillations represent
auditory beat and its metrical hierarchy in perception and imagery.
J Neurosci 35:15187–15198.
Fujioka T, Trainor LJ, Large EW, Ross B (2009) Beta and gamma
rhythms in human auditory cortex during musical beat processing.
Ann N Y Acad Sci 1169:89–92.
Geiser E, Sandmann P, Jäncke L, Meyer M (2010) Refinement of
metre perception - training increases hierarchical metre
processing. Eur J Neurosci 32:1979–1985.
Geiser E, Walker KMM, Bendor D (2014) Global timing: a conceptual
framework to investigate the neural basis of rhythm perception in
humans and non-human species. Front Psychol 5:159.
Gershman SJ, Moustafa AA, Ludvig EA (2014) Time representation
in reinforcement learning models of the basal ganglia. Front
Comput Neurosci 7.
Ghitza O (2011) Linking speech perception and neurophysiology:
speech decoding guided by cascaded oscillators locked to the
input rhythm. Front Psychol 2.
Ghitza O, Greenberg S (2009) On the possible role of brain rhythms
in speech perception: intelligibility of time-compressed speech
with periodic and aperiodic insertions of silence. Phonetica
66:113–126.
Giraud A-L, Poeppel D (2012) Cortical oscillations and speech
processing: emerging computational principles and operations.
Nat Neurosci 15:511–517.
Goel A, Buonomano DV (2014) Timing as an intrinsic property of
neural networks: evidence from in vivo and in vitro experiments.
Phil Trans R Soc B 369:20120460.
Goel A, Buonomano DV (2016) Temporal interval learning in cortical
cultures is encoded in intrinsic network dynamics. Neuron
91:320–327.
Gomez-Ramirez M, Kelly SP, Molholm S, Sehatpour P, Schwartz TH,
Foxe JJ (2011) Oscillatory sensory selection mechanisms during
intersensory attention to rhythmic auditory and visual inputs: a
human electrocorticographic investigation. J Neurosci
31:18556–18567.
Gouvêa TS, Monteiro T, Motiwala A, Soares S, Machens C, Paton JJ
(2015) Striatal dynamics explain duration judgments. eLife
4:2473.
Grahn JA (2009) The role of the basal ganglia in beat perception. Ann
N Y Acad Sci 1169:35–45.
Grahn JA, Brett M (2007) Rhythm and beat perception in motor areas
of the brain. J Cogn Neurosci 19:893–906.
Grahn JA, McAuley JD (2009) Neural bases of individual differences
in beat perception. NeuroImage 47:1894–1903.
Grahn JA, Rowe JB (2013) Finding and feeling the musical beat:
striatal dissociations between detection and prediction of
regularity. Cereb Cortex 23:913–921.
Greenberg S, Arai T, Kingsbury B, Morgan N, Shire M, Silipo R, Wu
SL (1999) Syllable-based speech recognition using auditory like
features. J Acoust Soc Am 105:1157–1158.
Grube M, Griffiths TD (2009) Metricality-enhanced temporal encoding
and the subjective perception of rhythmic sequences. Cortex
45:72–79.
Gu BM, Cheng RK, Yin B, Meck WH (2011) Quinpirole-induced
sensitization to noisy/sparse periodic input: temporal
synchronization as a component of obsessive-compulsive
disorder. Neuroscience 179:143–150.
Haegens S, Nácher V, Luna R (2011) α-Oscillations in the monkey sensorimotor network influence discrimination performance by rhythmical inhibition of neuronal spiking.
Hardy NF, Buonomano DV (2016) Neurocomputational models of
interval and pattern timing. Curr Opin Behav Sci 8:250–257.
Haß J, Blaschke S, Rammsayer T, Herrmann JM (2008) A
neurocomputational model for optimal temporal processing. J
Comput Neurosci 25:449–464.
Hattori Y, Tomonaga M, Matsuzawa T (2013) Spontaneous
synchronized tapping to an auditory rhythm in a chimpanzee.
Sci Rep 3:1566.
Hauser MD, McDermott J (2003) The evolution of the music faculty: a
comparative perspective. Nat Neurosci 6:663–668.
Hayashi MJ, Ditye T, Harada T, Hashiguchi M, Sadato N, Carlson S,
Walsh V, Kanai R (2015) Time adaptation shows duration
selectivity in the human parietal cortex Zatorre R, ed. PLoS Biol
13:e1002262.
Henry MJ, Herrmann B (2014) Low-frequency neural oscillations
support dynamic attending in temporal context. Timing Time
Percept 2:62–86.
Henry MJ, Herrmann B, Obleser J (2014) Entrained neural
oscillations in multiple frequency bands comodulate behavior.
Proc Natl Acad Sci USA 111:14935–14940.
Hickok G, Buchsbaum B, Humphries C, Muftuler T (2003) Auditory-
motor interaction revealed by fMRI: speech, music, and working
memory in area spt. J Cogn Neurosci 15:673–682.
Honing H (2012) Without it no music: beat induction as a fundamental
musical trait. Ann N Y Acad Sci 1252:85–91.
Honing H, Bouwer FL, Háden GP (2014) Perceiving Temporal
Regularity in Music: The Role of Auditory Event-Related
Potentials (ERPs) in Probing Beat Perception. In: Neurobiology
of Interval Timing, pp 305–323 Advances in Experimental
Medicine and Biology. New York, NY: Springer.
Honing H, Merchant H (2014) Differences in auditory timing between
human and nonhuman primates. Behav Brain Sci 37:557–558.
Honing H, Merchant H, Háden GP, Prado L, Bartolo R (2012) Rhesus
monkeys (Macaca mulatta) detect rhythmic groups in music, but
not the beat Larson CR, ed. PLoS ONE 7:e51369.
Hsieh L-T, Ranganath C (2014) Frontal midline theta oscillations
during working memory maintenance and episodic encoding and
retrieval. NeuroImage 85:721–729.
Iversen JR, Repp BH, Patel AD (2009) Top-down control of rhythm
perception modulates early auditory responses. Ann N Y Acad Sci
1169:58–73.
Ivry RB, Schlerf JE (2008) Dedicated and intrinsic models of time
perception. Trends Cogn Sci 12:273–280.
Jacobson GA, Rokni D, Yarom Y (2008) A model of the olivo-
cerebellar system as a temporal pattern generator. Trends
Neurosci. 31:617–625.
Jacoby N, McDermott JH (2017) Integer ratio priors on musical
rhythm revealed cross-culturally by iterated reproduction. Curr
Biol 27:359–370.
Jazayeri M, Shadlen MN (2015) A neural mechanism for sensing and
reproducing a time interval. Curr Biol 25:2599–2609.
Jensen O, Gips B, Bergmann TO, Bonnefond M (2014) Temporal
coding organized by coupled alpha and gamma oscillations
prioritize visual processing. Trends Neurosci:1–14.
Jin DZ, Fujii N, Graybiel AM (2009) Neural representation of time in
cortico-basal ganglia circuits. Proc Natl Acad Sci USA
106:19156–19161.
Joseph S, Teki S, Kumar S, Husain M, Griffiths TD (2016) Resource
allocation models of auditory working memory. Brain Res
1640:183–192.
Kaernbach C (2004) The memory of noise. Exp Psychol 51:240–248.
Karmarkar UR, Buonomano DV (2007) Timing in the absence of
clocks: encoding time in neural network states. Neuron
53:427–438.
Khouri L, Nelken I (2015) Detecting the unexpected. Curr Opin
Neurobiol 35:142–147.
Kornysheva K, Anshelm-Schiffer von A-M, Schubotz RI (2010)
Inhibitory stimulation of the ventral premotor cortex temporarily
interferes with musical beat rate preference. Hum Brain Mapp
32:1300–1310.
Kornysheva K, Schubotz RI (2011) Impairment of auditory-motor
timing and compensatory reorganization after ventral premotor
cortex stimulation Tsakiris M, ed. PLoS ONE 6:e21421.
Kumar S, Bonnici HM, Teki S, Agus TR, Pressnitzer D, Maguire EA,
Griffiths TD (2014) Representations of specific acoustic patterns
in the auditory cortex and hippocampus. Proc Biol Sci
281:20141000.
Kung S-J, Chen JL, Zatorre RJ, Penhune VB (2013) Interacting
cortical and basal ganglia networks underlying finding and tapping
to the musical beat. J Cogn Neurosci 25:401–420.
Lakatos P (2005) An oscillatory hierarchy controlling neuronal
excitability and stimulus processing in the auditory cortex. J
Neurophysiol 94:1904–1911.
Lakatos P, Karmos G, Mehta AD, Ulbert I, Schroeder CE (2008)
Entrainment of neuronal oscillations as a mechanism of
attentional selection. Science 320:110–113.
Lakatos P, Musacchia G, O’Connel MN, Falchier AY, Javitt DC,
Schroeder CE (2013) The spectrotemporal filter mechanism of
auditory selective attention. Neuron 77:750–761.
Lakatos P, O’Connell MN, Barczak A, Mills A, Javitt DC, Schroeder
CE (2009) The leading sense: supramodal control of
neurophysiological context by attention. Neuron 64:419–430.
Large EW, Herrera JA, Velasco MJ (2015) Neural networks for beat
perception in musical rhythm. Front Systems Neurosci
9:514–583.
Large EW, Jones MR (1999) The dynamics of attending: how people
track time-varying events. Psychol Rev 106:119–159.
Lawrance ELA, Harper NS, Cooke JE, Schnupp JWH (2014)
Temporal predictability enhances auditory detection. J Acoust
Soc Am 135:EL357–EL363.
Leon MI, Shadlen MN (2003) Representation of time by neurons in
the posterior parietal cortex of the Macaque. Neuron 38:
317–327.
Lerdahl F, Jackendoff R (1983) A generative theory of tonal
music. MIT Press.
Lewis PA, Miall RC (2003) Brain activation patterns during
measurement of sub- and supra-second intervals.
Neuropsychologia 41:1583–1592.
Llinas R, Baker R, Sotelo C (1974) Electrotonic coupling between
neurons in cat inferior olive. J Neurophysiol.
Llinas R, Yarom Y (1981) Electrophysiology of mammalian inferior
olivary neurones in vitro. Different types of voltage-dependent
ionic conductances. J Physiol 315:549–567.
Loveless N, Levänen S, Jousmäki V, Sams M, Hari R (1996) Temporal integration in auditory sensory memory: neuromagnetic evidence. Electroencephalogr Clin Neurophysiol 100:220–228.
Luo H, Poeppel D (2007) Phase patterns of neuronal responses
reliably discriminate speech in human auditory cortex. Neuron
54:1001–1010.
Lusk NA, Petter EA, MacDonald CJ, Meck WH (2016) Cerebellar,
hippocampal, and striatal time cells. Curr Opin Behav Sci
8:186–192.
MacDonald CJ, Lepage KQ, Eden UT, Eichenbaum H (2011)
Hippocampal "Time Cells" bridge the gap in memory for
discontiguous events. Neuron 71:737–749.
Madison G, Gouyon F, Ullén F, Hörnström K (2011) Modeling the
tendency for music to induce movement in humans: first
correlations with low-level audio descriptors across music
genres. J Exp Psychol Hum Percept Perform 37:1578–1594.
Malmierca M (2014) Neuronal adaptation, novelty detection and
regularity encoding in audition. Front Syst Neurosci 8.
Matell MS, Meck WH (2004) Cortico-striatal circuits and interval
timing: coincidence detection of oscillatory processes. Cogn Brain
Res 21:139–170.
Mathewson KE, Lleras A, Beck DM, Fabiani M, Ro T, Gratton G
(2011) Pulsed out of awareness: EEG alpha oscillations represent
a pulsed-inhibition of ongoing cortical processing. Front Psychol
2.
Mathy A, Ho SSN, Davie JT, Duguid IC, Clark BA, Häusser M (2009)
Encoding of oscillations by axonal bursts in inferior olive neurons.
Neuron 62:388–399.
McAuley JD, Henry MJ, Tkach J (2012) Tempo mediates the
involvement of motor areas in beat perception. Ann N Y Acad
Sci 1252:77–84.
McAuley JD, Jones MR, Holub S, Johnston HM, Miller NS (2006) The
time of our lives: life span development of timing and event
tracking. J Exp Psychol Gen 135:348–367.
McDermott JH, Simoncelli EP (2011) Sound texture perception via
statistics of the auditory periphery: evidence from sound
synthesis. Neuron 71:926–940.
Mello GBM, Soares S, Paton JJ (2015) A scalable population code for
time in the striatum. Curr Biol 25:1113–1122.
Merchant H, Grahn J, Trainor L, Rohrmeier M, Fitch WT (2015)
Finding the beat: a neural perspective across humans and non-
human primates. Phil Trans R Soc B 370:20140093.
Merchant H, Harrington DL, Meck WH (2013) Neural basis of the
perception and estimation of time. Annu Rev Neurosci
36:313–336.
Miall C (1989) The storage of time intervals using oscillating neurons.
Neural Comput. 1:359–371.
Morillon B, Hackett TA, Kajikawa Y, Schroeder CE (2015) Predictive
motor control of sensory dynamics in auditory active sensing. Curr
Opin Neurobiol 31:230–238.
Morillon B, Schroeder CE, Wyart V, Arnal LH (2016) Temporal
prediction in lieu of periodic stimulation. J Neurosci
36:2342–2347.
Namboodiri VMK, Huertas MA, Monk KJ, Shouval HZ, Shuler MG
(2015) Visually cued action timing in the primary visual cortex.
Neuron 86:319–330.
Nieto-Diego J, Malmierca MS (2016) Topographic distribution of
stimulus-specific adaptation across auditory cortical fields in the
anesthetized rat Zatorre R, ed. PLoS Biol 14:e1002397.
Nobre AC, Correa A, Coull JT (2007) The hazards of time. Curr Opin
Neurobiol 17:465–470.
Nolan F, Jeon HS (2014) Speech rhythm: a metaphor? Phil Trans R
Soc B 369:20130396.
Nozaradan S, Peretz I, Missal M, Mouraux A (2011) Tagging the
neuronal entrainment to beat and meter. J Neurosci.
Oullier O, Jantzen KJ, Steinberg FL, Kelso JAS (2005) Neural
substrates of real and imagined sensorimotor coordination. Cereb
Cortex 15:975–985.
Parsons LM (2001) Exploring the functional neuroanatomy of music
performance, perception, and comprehension. Ann N Y Acad Sci
930:211–231.
Patel AD (2007) Music, language, and the brain. Oxford University
Press.
Patel AD (2011) Why would musical training benefit the neural
encoding of speech? The OPERA hypothesis. Front Psychol 2.
Patel AD, Iversen JR (2014) The evolutionary neuroscience of
musical beat perception: the Action Simulation for Auditory
Prediction (ASAP) hypothesis. Front Syst Neurosci.
Patel AD, Iversen JR, Bregman MR, Schulz I (2009) Experimental
evidence for synchronization to a musical beat in a nonhuman
animal. Curr Biol 19:827–830.
Peckel M, Pozzo T, Bigand E (2014) The impact of the perception of
rhythmic music on self-paced oscillatory movements. Front
Psychol 5:917.
Peelle JE, Davis MH (2012) Neural oscillations carry speech rhythm
through to comprehension. Front Psychol 3.
Phillips-Silver J, Toiviainen P, Gosselin N, Piché O, Nozaradan S,
Palmer C, Peretz I (2011) Born to dance but beat deaf: a new form
of congenital amusia. Neuropsychologia 49:961–969.
Poeppel D (2003) The analysis of speech in different temporal
integration windows: cerebral lateralization as "asymmetric sampling in time". Speech Commun 41:245–255.
Povel D-J, Essens P (1985) Perception of temporal patterns. Music
Percept 2:411–440.
Pressnitzer D, Hupe J-M (2006) Temporal dynamics of auditory and
visual bistability reveal common principles of perceptual
organization. Curr Biol 16:1351–1357.
Rajendran VG, Harper NS, Abdel-Latif KHA, Schnupp JWH (2016)
Rhythm facilitates the detection of repeating sound patterns.
Front Neurosci 10:464–467.
Rajendran VG, Harper NS, Garcia-Lazaro JA, Lesica NA, Schnupp
JWH (2017) Midbrain adaptation may set the stage for the
perception of musical beat. Proc Biol Sci 284:1455.
Rajendran VG, Harper NS, Willmore BD, Hartmann WM, Schnupp
JWH (2013) Temporal predictability as a grouping cue in the
perception of auditory streams. J Acoust Soc Am 134:
EL98–EL104.
Rajendran VG, Teki S (2016) Periodicity versus prediction in sensory
perception. J Neurosci 36:7343–7345.
Rao RPN, Eagleman DM, Sejnowski TJ (2001) Optimal smoothing in
visual motion perception. Neural Comput. 13:1243–1253.
Repp BH (2002a) Perception of timing is more context sensitive than
sensorimotor synchronization. Percept Psychophys 64:703–716.
Repp BH (2002b) Automaticity and voluntary control of phase
correction following event onset shifts in sensorimotor
synchronization. J Exp Psychol Hum Percept Perform
28:410–430.
Repp BH (2005) Sensorimotor synchronization: a review of the
tapping literature. Psychon Bull Rev 12:969–992.
Repp BH, Keller PE (2004) Adaptation to tempo changes in
sensorimotor synchronization: effects of intention, attention, and
awareness. Q J Exp Psychol A 57:499–521.
Repp BH, Su Y-H (2013) Sensorimotor synchronization: a review of
recent research (2006–2012). Psychon Bull Rev 20:403–452.
Riecker A, Wildgruber D, Mathiak K, Grodd W, Ackermann H (2003)
Parametric analysis of rate-dependent hemodynamic response
functions of cortical and subcortical brain structures during
auditorily cued finger tapping: a fMRI study. NeuroImage
18:731–739.
Rohenkohl G, Gould IC, Pessoa J, Nobre AC (2014) Combining
spatial and temporal expectations to improve visual perception. J
Vis 14:8.
Rouse AA, Cook PF, Large EW, Reichmuth C (2016) Beat keeping in
a sea lion as coupled oscillation: implications for comparative
understanding of human rhythm. Front Neurosci 10:403.
Sakai K, Hikosaka O, Miyauchi S, Takino R, Tamada T, Iwata NK,
Nielsen M (1999) Neural representation of a rhythm depends on
its interval ratio. J Neurosci 19:10074–10081.
Schaefer RS, Vlek RJ, Desain P (2010) Decomposing rhythm
processing: electroencephalography of perceived and self-
imposed rhythmic patterns. Psychol Res 75:95–106.
Schneider BA, Ghose GM (2012) Temporal production signals in
parietal cortex Pack CC, ed. PLoS Biol 10:e1001413.
Schroeder CE, Lakatos P (2009) Low-frequency neuronal oscillations
as instruments of sensory selection. Trends Neurosci. 32:
9–18.
Schroeder CE, Wilson DA, Radman T, Scharfman H, Lakatos P
(2010) Dynamics of active sensing and perceptual selection. Curr
Opin Neurobiol 20:172–176.
Schwartze M, Farrugia N, Kotz SA (2013) Dissociation of formal and
temporal predictability in early auditory evoked potentials.
Neuropsychologia 51:320–325.
Selezneva E, Deike S, Knyazeva S, Scheich H, Brechmann A,
Brosch M (2013) Rhythm sensitivity in macaque monkeys. Front
Syst Neurosci 7.
Shamma SA, Elhilali M, Micheyl C (2011) Temporal coherence and
attention in auditory scene analysis. Trends Neurosci.
34:114–123.
Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M (1995)
Speech recognition with primarily temporal cues. Science
270:303–304.
Shuler MG (2016) Timing in the visual cortex and its investigation.
Curr Opin Behav Sci 8:73–77.
Snyder JS, Large EW (2005) Gamma-band activity reflects the metric
structure of rhythmic tone sequences. Cogn Brain Res
24:117–126.
Soares S, Atallah BV, Paton JJ (2016) Midbrain dopamine neurons
control judgment of time. Science 354:1273–1277.
Sowiński J, Bella SD (2013) Poor synchronization to the beat may
result from deficient auditory-motor mapping. Neuropsychologia
51:1952–1963.
Strauß A, Wöstmann M, Obleser J (2014) Cortical alpha oscillations
as a tool for auditory selective inhibition. Front Hum Neurosci
8:1926.
Styns F, van Noorden L, Moelants D, Leman M (2007) Walking on
music. Hum Mov Sci 26:769–785.
Teki S (2014) Beta drives brain beats. Front Syst Neurosci 8:743.
Teki S (2016) A citation-based analysis and review of
significant papers on timing and time perception. Front Neurosci
10:656.
Teki S, Griffiths TD (2014) Working memory for time intervals in
auditory rhythmic sequences. Front Psychol 5:1329.
Teki S, Griffiths TD (2016) Brain bases of working memory for time
intervals in rhythmic sequences. Front Neurosci 10:743.
Teki S, Grube M, Griffiths TD (2012) A unified model of time
perception accounts for duration-based and beat-based timing
mechanisms. Front Integr Neurosci 5.
Teki S, Grube M, Kumar S, Griffiths TD (2011) Distinct neural
substrates of duration-based and beat-based auditory timing. J
Neurosci 31:3805–3812.
Teki S, Kononowicz TW (2016) Commentary: beta-band oscillations
represent auditory beat and its metrical hierarchy in perception
and imagery. Front Neurosci 10:743.
Tervaniemi M, Hugdahl K (2003) Lateralization of auditory-cortex
functions. Brain Res Rev 43:231–246.
Tierney A, Kraus N (2013) Neural responses to sounds presented on
and off the beat of ecologically valid music. Front Syst Neurosci 7.
Todd NPM, Lee CS (2015) The sensory-motor theory of rhythm and
beat induction 20 years on: a new synthesis and future
perspectives. Front Hum Neurosci 9:357.
Trost W, Frühholz S, Schön D, Labbé C, Pichon S, Grandjean D,
Vuilleumier P (2014) Getting the beat: entrainment of brain activity
by musical rhythm and pleasantness. NeuroImage 103:55–64.
Turgeon M, Bregman AS, Ahad PA (2002) Rhythmic masking
release: contribution of cues for perceptual organization to the
cross-spectral fusion of concurrent narrow-band noises. J Acoust
Soc Am 111:1819–1831.
Turgeon M, Bregman AS, Roberts B (2005) Rhythmic masking
release: effects of asynchrony, temporal overlap, harmonic
relations, and source separation on cross-spectral grouping. J
Exp Psychol Hum Percept Perform 31:939–953.
van Noorden L (1975) Temporal coherence in the perception of tone
sequences. Doctoral Thesis.
van Noorden L, Moelants D (1999) Resonance in the perception of
musical pulse. J New Music Res 28:43–66.
Vuust P, Witek MAG (2014) Rhythmic complexity and predictive
coding: a novel approach to modeling rhythm and meter
perception in music. Front Psychol 5:273.
Welsh JP, Lang EJ, Sugihara I, Llinás R (1995) Dynamic organization
of motor control within the olivocerebellar system. Nature
374:453–457.
Winkler I, Denham S, Mill R, Bohm TM, Bendixen A (2012)
Multistability in auditory stream segregation: a predictive coding
view. Philos Trans Royal Soc B: Biol Sci 367:1001–1012.
Winkler I, Denham SL, Nelken I (2009a) Modeling the auditory scene:
predictive regularity representations and perceptual objects.
Trends Cogn Sci 13:532–540.
Winkler I, Háden GP, Ladinig O, Sziller I, Honing H (2009b) Newborn
infants detect the beat in music. Proc Natl Acad Sci USA
106:2468–2471.
Witek MAG, Clarke EF, Wallentin M, Kringelbach ML, Vuust P (2014)
Syncopation, body-movement and pleasure in groove music
Canal-Bruland R, ed. PLoS ONE 9:e94446.
Wu X, Ashe J, Bushara KO (2011) Role of olivocerebellar system in
timing without awareness. Proc Natl Acad Sci USA
108:13818–13822.
Xu D (2006) Role of the olivo-cerebellar system in timing. J Neurosci
26:5990–5995.
Yarom Y, Cohen D (2002) The olivocerebellar system as a generator
of temporal patterns. Ann N Y Acad Sci 978:122–134.
Zanto TP, Snyder JS, Large EW (2006) Neural correlates of rhythmic
expectancy. Adv Cogn Psychol 2:221–231.
Zatorre RJ, Belin P, Penhune VB (2002) Structure and function of
auditory cortex: music and speech. Trends Cogn Sci 6:37–46.
Zatorre RJ, Chen JL, Penhune VB (2007) When the brain plays
music: auditory-motor interactions in music perception and
production. Nat Rev Neurosci 8:547–558.
Zhou H, Melloni L, Poeppel D, Ding N (2016) Interpretations of
frequency domain analyses of neural entrainment: periodicity,
fundamental frequency, and harmonics. Front Hum Neurosci
10:508–509.
(Received 25 August 2017, Accepted 27 October 2017)
V. G. Rajendran et al. / Neuroscience xxx (2017) xxx–xxx 15
Please cite this article in press as: Rajendran VG et al. Temporal Processing in Audition: Insights from Music. Neuroscience (2017), https://doi.org/10.1016/j.neuroscience.2017.10.041
... The current study focuses on subtle asynchronies that may favor selfother segregation. At the neural level, cortico-cerebellar and striato-thalamo-cortical loops are known to be involved in action timing and time perception, as well as in processes of sensorimotor error-correction in single performers (Brown et al. 2006;Molinari et al. 2007;Chen et al. 2009;Kornysheva and Schubotz 2011;Teki et al. 2011b;Rajendran et al. 2018;Cannon and Patel 2021). However, whether and how these regions contribute to the processing of temporal asynchronies during joint music performance and the balancing of self-other integration and segregation remains to date unclear. ...
... On the behavioral level, we expected pianists to adapt less to their partners during subtly asynchronous compared to synchronous performance, i.e. to shift balance towards self-other segregation, ref lected in reduced cross-correlations of partners' inter-keystroke intervals (IKIs) at lag +1 and at lag −1 (see section "Materials and methods" for details ;Fairhurst et al. 2013;Novembre et al. 2016;Koban et al. 2019). On the neural level, the cerebellum and the basal ganglia (BG) were hypothesized as plausible candidates for the detection of and adaptation to these subtle temporal asynchronies, due to the involvement of these subcortical structures in temporal and rhythm processing (Ivry et al. 2002;Grahn and Brett 2007;Chen et al. 2008a;Teki et al. 2011b;Rajendran et al. 2018;Cannon and Patel 2021), and audio-motor coordination (Zatorre et al. 2007;Chen et al. 2008b;Teki et al. 2011a). ...
... These simulation-based sensory predictions may be supported by the results of our psychophysiological interaction analysis that showed increased functional connectivity between (pre)motor areas, BG, and bilateral temporal regions during performance of familiar conditions, including Heschl's gyrus (HG), planum temporale and polare (PT and PP), as well as the temporal pole (TP). The BG are known to support movement sequencing and rhythm processing (Grahn and Brett 2007;Teki et al. 2011b;Rajendran et al. 2018;Cannon and Patel 2021). The observed temporal areas are typically involved in auditory perception and musical imagery (Zatorre and Halpern 2005;Zhang et al. 2017;Martin et al. 2018), with TP being linked to higher order processes such as the recognition of familiar tunes (Hsieh et al. 2011), and the processing of musical melody and harmony (Brown et al. 2004), both functions that are plausibly relevant for sensory predictions of familiar partner actions. ...
Article
Full-text available
Joint music performance requires flexible sensorimotor coordination between self and other. Cognitive and sensory parameters of joint action—such as shared knowledge or temporal (a)synchrony—influence this coordination by shifting the balance between self-other segregation and integration. To investigate the neural bases of these parameters and their interaction during joint action, we asked pianists to play on an MR-compatible piano, in duet with a partner outside of the scanner room. Motor knowledge of the partner’s musical part and the temporal compatibility of the partner’s action feedback were manipulated. First, we found stronger activity and functional connectivity within cortico-cerebellar audio-motor networks when pianists had practiced their partner’s part before. This indicates that they simulated and anticipated the auditory feedback of the partner by virtue of an internal model. Second, we observed stronger cerebellar activity and reduced behavioral adaptation when pianists encountered subtle asynchronies between these model-based anticipations and the perceived sensory outcome of (familiar) partner actions, indicating a shift towards self-other segregation. These combined findings demonstrate that cortico-cerebellar audio-motor networks link motor knowledge and other-produced sounds depending on cognitive and sensory factors of the joint performance, and play a crucial role in balancing self-other integration and segregation.
... For future work, the analysis of other oscillation bands should be considered. In auditory events, neural oscillations in delta band have been observed in several cases: activation of the temporal cortex during sound localization tasks [80], cortical entrainment to stimuli, and years of musical training [81]; and alignment in the phase of neural oscillations during rhythmic stimulation, increasing stimulus processing and decreasing reaction times [82]. In theta band, there is evidence reporting that desynchronization and synchronization change levels in line with experience and learning [66], and that synchronization increases in task demands [68]. ...
... In theta band, there is evidence reporting that desynchronization and synchronization change levels in line with experience and learning [66], and that synchronization increases in task demands [68]. Neural activity modulation in beta band has been linked to rhythmic patterns, such as clicks of a metronome [81]. Gamma band activity has also been shown to encode rhythmic and beat information [81]. ...
... Neural activity modulation in beta band has been linked to rhythmic patterns, such as clicks of a metronome [81]. Gamma band activity has also been shown to encode rhythmic and beat information [81]. Another limitation was that this study only simulated the frequency responses of two headphones on ATVIO. ...
Article
The perception of sound can differ significantly, regardless of its physical properties. The frequency content of a sound can provoke different psychoacoustic effects on humans, and simultaneously, it modulates differently brain oscillations. Audio devices such as headphones are variables that have not been taken into consideration in many studies concerning acoustic therapies, which strongly depend on the transmission means. Headphones become a fundamental key in acoustic treatments since they are responsible for transmitting auditory stimuli. It is known that the frequency response of audio devices can change the frequency content of a signal. However, it is still unknown if their limited (or even inappropriate) functioning could affect acoustic therapy effectiveness. Therefore, the present work aims to analyze alpha (resonance frequency at which neurons decode auditory information) brain oscillations in healthy individuals when listening to the same sound but through different frequency responses of headphones. For this purpose, the frequency responses of three headphone models were firstly obtained: Atvio® supra-aural headphones (ATVIO), Shure® SRH1840 circum-aural headphones (SHURE), and Apple® EarPods® intra-aural headphones (APPLE). The estimated frequency responses showed that ATVIO and APPLE headphones presented a major difference in comparison with an ideal frequency response (same gain at every frequency of the headphone bandwidth, i.e., a flat frequency response). Then, inverse filters on the difference of the responses were implemented. Finally, the alpha brain oscillations of 29 individuals were analyzed while they were listening to pink noise through the filters of the three models. The findings suggested that the three headphones modulated alpha brain oscillations accordingly their frequency response. Namely, mapping of alpha brain oscillations for ATVIO and APPLE headphones reflected lower neural activity associated with decoding of auditory information, possibly due to the loss of gain at different frequencies. In contrast, SHURE headphones, which presented a flatter frequency response, reflected higher neural activity related to decoding and interpretation of acoustic information since frequency response was much less affected.
... Entrainment in humans involves an interplay of stimulus and temporal expectation [9]. Nowhere is this clearer than in interaction with music, humankind's playground for auditory temporal expectation and entrainment [10]. But the precise nature of this interplay is an open question. ...
... They are therefore ill-suited to characterizing moment-by-moment errors in timing prediction, which are made sporadically and separated by intervals mostly devoid of informative prediction error. This may be a fundamental shortcoming in modeling inference in the brain: behavior and neurophysiology suggests that information about "when" is carried by its own distinctive pathways and represented separately from "what," both in perceptual and motor tasks [6,10,19]. Bayesian methods have been applied to describe inferences about timing in the brain [20][21][22], but in these cases the problem the brain solves has been formulated as discrete inferences about consecutive intervals rather than a continuous inference process. ...
Article
Full-text available
When presented with complex rhythmic auditory stimuli, humans are able to track underlying temporal structure (e.g., a “beat”), both covertly and with their movements. This capacity goes far beyond that of a simple entrained oscillator, drawing on contextual and enculturated timing expectations and adjusting rapidly to perturbations in event timing, phase, and tempo. Previous modeling work has described how entrainment to rhythms may be shaped by event timing expectations, but sheds little light on any underlying computational principles that could unify the phenomenon of expectation-based entrainment with other brain processes. Inspired by the predictive processing framework, we propose that the problem of rhythm tracking is naturally characterized as a problem of continuously estimating an underlying phase and tempo based on precise event times and their correspondence to timing expectations. We present two inference problems formalizing this insight: PIPPET (Phase Inference from Point Process Event Timing) and PATIPPET (Phase and Tempo Inference). Variational solutions to these inference problems resemble previous “Dynamic Attending” models of perceptual entrainment, but introduce new terms representing the dynamics of uncertainty and the influence of expectations in the absence of sensory events. These terms allow us to model multiple characteristics of covert and motor human rhythm tracking not addressed by other models, including sensitivity of error corrections to inter-event interval and perceived tempo changes induced by event omissions. We show that positing these novel influences in human entrainment yields a range of testable behavioral predictions. Guided by recent neurophysiological observations, we attempt to align the phase inference framework with a specific brain implementation. We also explore the potential of this normative framework to guide the interpretation of experimental data and serve as building blocks for even richer predictive processing and active inference models of timing.
... This article proposes brain design principles, mechanisms, and architectures that enable humans to learn and consciously perform lyrics and melodies with variable rhythms and beats. There are currently a number of excellent articles and books that discuss facts about music and about how our minds perceive it (e.g., Gjerdingen, 1989;Howell et al., 1991;Deutsch, 1992Deutsch, , 2013Krumhansl, 2000;Repp, 2005Repp, , 2006aLevitin, 2006;Zatorre et al., 2007;Thompson, 2009;Large, 2010;Patel and Iversen, 2014;Large et al., 2015;Nguyen et al., 2018;Rajendran et al., 2018;Damm et al., 2020). The current article complements these contributions by developing a neural model of the brain mechanisms that regulate how humans consciously perceive, learn, and perform music. ...
Article
Full-text available
A neural network architecture models how humans learn and consciously perform musical lyrics and melodies with variable rhythms and beats, using brain design principles and mechanisms that evolved earlier than human musical capabilities, and that have explained and predicted many kinds of psychological and neurobiological data. One principle is called factorization of order and rhythm : Working memories store sequential information in a rate-invariant and speaker-invariant way to avoid using excessive memory and to support learning of language, spatial, and motor skills. Stored invariant representations can be flexibly performed in a rate-dependent and speaker-dependent way under volitional control. A canonical working memory design stores linguistic, spatial, motoric, and musical sequences, including sequences with repeated words in lyrics, or repeated pitches in songs. Stored sequences of individual word chunks and pitch chunks are categorized through learning into lyrics chunks and pitches chunks . Pitches chunks respond selectively to stored sequences of individual pitch chunks that categorize harmonics of each pitch, thereby supporting tonal music. Bottom-up and top-down learning between working memory and chunking networks dynamically stabilizes the memory of learned music. Songs are learned by associatively linking sequences of lyrics and pitches chunks. Performance begins when list chunks read word chunk and pitch chunk sequences into working memory. Learning and performance of regular rhythms exploits cortical modulation of beats that are generated in the basal ganglia. Arbitrary performance rhythms are learned by adaptive timing circuits in the cerebellum interacting with prefrontal cortex and basal ganglia. The same network design that controls walking, running, and finger tapping also generates beats and the urge to move with a beat.
... We assessed auditory processing in terms of temporal resolution and speech perception in noise. We opted to measure temporal resolution as it helps in music perception (Nakajima et al., 2018;Rajendran et al., 2018) and aids in discrimination of musical notes (Kumar et al., 2016). The temporal release of masking test measured the temporal resolution ability. ...
Article
We assessed fatigue's effect on temporal resolution and speech perception in noise abilities in trained instrumental musicians. In a pretest-posttest quasiexperimental research design, trained instrumental musicians (n = 39) and theater artists as nonmusicians (n = 37) participated. Fatigue was measured using a visual analog scale (VAS) under eight fatigue categories. The temporal release of masking measured the temporal resolution, and auditory stream segregation assessed speech perception in noise. Entire testing was carried out at two time-points: before and after rehearsal. Each participant rehearsed for five to six hours: musicians playing musical instruments and theater artists conducted stage practice. The results revealed significantly lower VAS scores for both musicians and nonmusicians after rehearsal, indicating that both musicians and nonmusicians were fatigued after rehearsal. The musicians had higher scores for temporal release of masking and lower scores for auditory stream segregation abilities than nonmusicians in the pre-fatigue condition, indicating musicians’ edge in auditory processing abilities. However, no such differences in the scores of musicians and nonmusicians were observed in the post-fatigue testing. The results were inferred as the music training related advantage in temporal resolution, and speech perception in noise might have been reduced due to fatigue. In the end, we recommend that musicians consider fatigue a significant factor, as it might affect their performance in auditory processing tasks. Future researchers must also consider fatigue as a variable while measuring auditory processing in musicians. However, we restricted the auditory processing to temporal resolution and speech perception in noise only. Generalizing these results to other auditory processes requires further investigation.
... According to the existing literature, the average duration of music is 57.1 s, with some lasting up to 30 min [12,26]. During music playback, human emotion perception will also change accordingly, with inter-individual differences [27]. Recent studies have used music as the inducing material, which has problems, such as long time duration, variable emotional states, and poor adaptability. ...
Article
Full-text available
Music can regulate and improve the emotions of the brain. Traditional emotional regulation approaches often adopt complete music. As is well-known, complete music may vary in pitch, volume, and other ups and downs. An individual’s emotions may also adopt multiple states, and music preference varies from person to person. Therefore, traditional music regulation methods have problems, such as long duration, variable emotional states, and poor adaptability. In view of these problems, we use different music processing methods and stacked sparse auto-encoder neural networks to identify and regulate the emotional state of the brain in this paper. We construct a multi-channel EEG sensor network, divide brainwave signals and the corresponding music separately, and build a personalized reconfigurable music-EEG library. The 17 features in the EEG signal are extracted as joint features, and the stacked sparse auto-encoder neural network is used to classify the emotions, in order to establish a music emotion evaluation index. According to the goal of emotional regulation, music fragments are selected from the personalized reconfigurable music-EEG library, then reconstructed and combined for emotional adjustment. The results show that, compared with complete music, the reconfigurable combined music was less time-consuming for emotional regulation (76.29% less), and the number of irrelevant emotional states was reduced by 69.92%. In terms of adaptability to different participants, the reconfigurable music improved the recognition rate of emotional states by 31.32%.
... Interactions between the auditory and motor systems of the brain allow the detection and anticipation of temporally predictable moments in a song (Penhune and Zatorre, 2019). In this sense, the premotor cortex (PMC) and the supplementary motor area (SMA) are involved in rhythm detection (Kung et al., 2013; Merchant et al., 2015; Rajendran et al., 2018), while the inferior frontal gyrus (IFG) and the precentral gyrus (PreCG) show music-related activity linked to movements associated with the music, i.e., a match between music and the movement related to it (e.g., while dancing or playing an instrument) (Lahav et al., 2005; Furukawa et al., 2017). Auditory-motor interactions are important for understanding the link between music perception, dancing, and even speech (Gordon et al., 2018). Music genre or style is an essential category for understanding human preferences in music, but it is largely unknown how abstract categories of music genre are represented in the brain (Nakai et al., 2018). ...
Article
The neuroscience of music has recently attracted significant attention, but the effect of music style on the activation of auditory-motor regions has not been explored. The aim of the present study is to analyze differences in brain activity during passive listening to non-vocal excerpts of four music genres (classical, reggaeton, electronic, and folk). A functional magnetic resonance imaging (fMRI) experiment was performed. Twenty-eight participants with no musical training were included in the study; they passively listened to music excerpts of the above genres during fMRI acquisition. Imaging analysis was performed at the whole-brain level and in auditory-motor regions of interest (ROIs). Furthermore, the musical competence of each participant was measured, and its relationship with brain activity in the studied ROIs was analyzed. The whole-brain analysis showed higher activity in auditory-related areas during reggaeton listening than during the other music genres. The ROI analysis showed that reggaeton led to higher activity not only in auditory-related areas but also in some motor-related areas, mainly when compared with classical music. A positive relationship between the melodic-MET score and brain activity during reggaeton listening was identified in some auditory- and motor-related areas. The findings reveal that listening to different music styles elicits different brain activity in auditory- and motor-related areas in musically inexperienced subjects. Among the studied genres, reggaeton evoked the highest activity in the auditory-motor network. These findings are discussed in connection with acoustic analyses of the musical stimuli.
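For readers unfamiliar with ROI-based fMRI analysis of the kind described above, the sketch below extracts per-region BOLD time series with nilearn. It is a minimal sketch under stated assumptions: the atlas and the sample functional image are public stand-ins chosen for illustration, not the study's data or its auditory-motor ROI definitions.

# Minimal sketch of ROI time-series extraction, assuming nilearn.
# The atlas and sample functional image are public stand-ins, not the
# study's data or its auditory-motor ROI definitions.
from nilearn import datasets
from nilearn.maskers import NiftiLabelsMasker

atlas = datasets.fetch_atlas_harvard_oxford("cort-maxprob-thr25-2mm")
func = datasets.fetch_development_fmri(n_subjects=1).func[0]

# Average the BOLD signal within each atlas region at every time point.
masker = NiftiLabelsMasker(labels_img=atlas.maps, standardize=True)
roi_timeseries = masker.fit_transform(func)   # (n_timepoints, n_regions)
print(roi_timeseries.shape)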
... Also, speech can be used for information intake, forecasting, and responding in everyday life and in sports situations, although its content is registered slightly later (Schreiber & McMurray, 2019). In a review concerning the perception of music, rhythm, and speech, Rajendran, Teki and Schnupp (2018) demonstrated that auditory scene analysis, pattern detection, and speech perception can serve perception and anticipation. Auksztulewicz et al. (2019) analyzed the brain areas responsible for "what" and "when" information and showed that auditory information is processed in the prefrontal cortex, as is the case with some visual information. ...
Article
It is well known that visual information is essential for anticipation in table tennis, but it has not been clarified whether auditory cues are also used. We therefore performed two in-situ studies in which novices (study A) and advanced players (study B) responded to strokes from a real opponent or a ball machine by returning with forehand counters (study A) and forehand topspins (study B) to a given target area on the table. We assessed the parameters "hit quality" and "subjective effort". In study A, we provided four conditions: normal; noise-cancelling headphones and earplugs to dampen auditory information; a different set of noise-cancelling headphones and earplugs to remove almost all environmental sounds; and the same headphones with additional bright noise to remove all sounds. In study B, we performed three tests (irregular play and regular play with an opponent, and responding to regular balls from a ball machine) under two conditions: normal, and noise-cancelling headphones with the additional bright noise. In both studies, no significant differences between conditions were found for "hit quality" or "subjective effort" (all p > 0.05). We conclude that auditory information, as well as its volume, has no influence on hit quality in table tennis for novices and advanced players.
Chapter
This paper examines the individuation of concepts in the music faculty (MF) based on their intrinsic congruent structure (extending the theory of congruence in Rawbone (2017) and Rawbone and Jan (2020)), and explores the aggregation and chaining of concepts in a language of musical thought (LMT). It is proposed that music perception is enacted through 'input modules', components of the MF that ground basic uniparametric concepts through congruence in the realms of rhythm and pitch; these systems are innate, domain-specific, automatic, bottom-up, and informationally encapsulated. More intricate modules of the MF, here described as sub-central systems, build complex multiparametric concepts from basic concepts, generally preserving congruence and comprising a compositional syntax that constrains the LMT. The LMT can be characterised as a sequencing of causal–functional tokens of congruent conceptual representations. While the LMT is located inside the MF, it is suggested that the sub-central systems that assemble it are mediated partly by 'central', domain-general systems of thought situated outside the MF. Central systems are needed for thinking and reasoning about information that is ambiguous or noncongruent, and for integrating various sources of information, such as consolidating the representations of perception and memory. There are two key considerations for the music modularity and LMT hypotheses: firstly, determining the extent to which the grounding of multiparametrically congruent concepts is automatic, bottom-up, innate, and encapsulated; and secondly, establishing why noncongruent terms are significant when there is no perceptual imperative for coining such concepts.