Temporal Processing in Audition: Insights from Music
Vani G. Rajendran,
and Jan W. H. Schnupp
Auditory Neuroscience Group, University of Oxford, Department of Physiology, Anatomy, and Genetics, Oxford, UK
City University of Hong Kong, Department of Biomedical Sciences, 31 To Yuen Street, Kowloon Tong, Hong Kong
Music is a curious example of a temporally patterned acoustic stimulus, and a compelling pan-cultural
phenomenon. This review strives to bring some insights from decades of music psychology and sensorimotor
synchronization (SMS) literature into the mainstream auditory domain, arguing that musical rhythm perception
is shaped in important ways by temporal processing mechanisms in the brain. The feature that unites these dis-
parate disciplines is an appreciation of the central importance of timing, sequencing, and anticipation. Perception
of musical rhythms relies on an ability to form temporal predictions, a general feature of temporal processing that
is equally relevant to auditory scene analysis, pattern detection, and speech perception. By bringing together
ﬁndings from the music and auditory literature, we hope to inspire researchers to look beyond the conventions
of their respective ﬁelds and consider the cross-disciplinary implications of studying auditory temporal sequence
processing. We begin by highlighting music as an interesting sound stimulus that may provide clues to how
temporal patterning in sound drives perception. Next, we review the SMS literature and discuss possible neural
substrates for the perception of, and synchronization to, musical beat. We then move away from music to explore
the perceptual eﬀects of rhythmic timing in pattern detection, auditory scene analysis, and speech perception.
Finally, we review the neurophysiology of general timing processes that may underlie aspects of the perception
of rhythmic patterns. We conclude with a brief summary and outlook for future research.
This article is part of a Special Issue entitled: Sequence Processing. © 2017 The Authors. Published by Elsevier Ltd on behalf
of IBRO. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Key words: music psychology, sensorimotor synchronization, beat perception, rhythm perception, auditory scene analysis, tem-
WHAT MUSIC PSYCHOLOGY REVEALS ABOUT
THE NATURAL BOUNDS OF HUMAN
Rhythm is an aspect of music that occurs on a medium
temporal scale (hundreds of milliseconds to one or two
seconds), longer than that of pitch (up to tens of
milliseconds), but shorter than that of global musical
form and structure (several seconds to minutes, e.g.
phrases, sections, movements). Crucially, it is at the
temporal scale of rhythm that a number of overt motor
processes in humans tend to occur, such as the swing
of the arms and legs during walking or the inhaling and
exhaling of air during breathing. Dance, for example, is
movement to the rhythm of music. In the Western music
tradition, movements such as dance are typically
synchronized to a periodic pulse, or beat. It is important
to highlight that pulse and beat are not physical
properties of the music itself, but are perceptual
phenomena that arise from music through beat
induction (BI). BI refers to our ability to extract a
periodic pulse from music and is widely considered a
cognitive skill, though its species-speciﬁcity and domain-
speciﬁcity are topics of current debate (Honing, 2012).
The neurophysiology underlying beat perception will later
be discussed at length, but a brief review of music psy-
chology research into perceptual aspects of rhythmic tim-
ing will ﬁrst oﬀer a number of practical observations from
which to embark on this investigation.
Studies into sensorimotor synchronization (SMS) tend to
employ simple movements such as tapping a ﬁnger as a
readout of the perceived beat. These studies ﬁnd that
beat is generally perceived between 0.5 and 4 Hz,
corresponding to time intervals of 250 ms to 2 s, a
range beyond which precise coordination of motor
movements becomes diﬃcult (Repp, 2005; McAuley
et al., 2006; Repp and Su, 2013). Even within this range,
E-mail address: email@example.com (J. W. H. Schnupp).
Sundeep Teki and Jan W. H. Schnupp contributed equally as last
Abbreviations: BI, beat induction; EEG, electroencephalography;
ERPs, event-related potentials; IOI, inter-onset interval; NMA,
negative mean asynchrony; SMS, sensorimotor synchronization;
SSA, stimulus-speciﬁc adaptation.
V. G. Rajendran et al. / Neuroscience xxx (2017) xxx–xxx
Please cite this article in press as: Rajendran VG et al. Temporal Processing in Audition: Insights from Music. Neuroscience (2017), https://doi.org/10.1016/j.neuroscience.2017.10.041
perception of time diﬀers between shorter and longer time
intervals. When asked to judge the duration of time inter-
vals, there is a systematic tendency for human listeners to
overestimate shorter time intervals (roughly 250–400 ms)
and underestimate long ones (600 ms to 2 s). The tran-
sition point in between, measured by various researchers
to lie between 400 and 600 ms, is termed the indiﬀerence
interval and also corresponds to the rate at which people
spontaneously tap (Fraisse et al., 1958; Fraisse, 1963;
In the context of rhythm perception, it is also the
boundary between temps courts and temps longs
(Clarke, 1999). When human subjects are asked to tap
rhythmically, they almost invariably use 1:1 or 1:2
ratios between the time intervals separating successive taps. A
ratio of 1:2 refers to a tapping pattern of long and short
intervals where the short intervals are precisely half the
duration of the longer ones. This alludes to the theory that
temps longs are intervals during which a listener is aware
of the passage of time, whereas temps courts do not
evoke a sense of time passage by themselves, but listen-
ers are aware that a certain number of them grouped
together make up a longer interval. Tapping ratios of 1:2
observed almost always span the indiﬀerence interval,
with the longer interval belonging to temps longs and
the shorter one to temps courts (see Fig. 1). A preference
for time intervals with integer ratios also shapes the way
rhythmic patterns are perceived (Jacoby and
McDermott, 2017). Compared with intervals with noninte-
ger ratios, intervals with integer ratios are more accurately
reproduced by listeners (Essens and Povel, 1985) and
show a distinct pattern of neural activity (Sakai et al.,
1999; see also the later section entitled Neurophysiology
of beat perception). Interestingly, while a preference
for integer ratios spans different cultures, the specific
ratios preferred by listeners are primarily determined by
their music listening experience and are not strongly
affected by musical expertise (Jacoby and McDermott,
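The interval ranges described above can be made concrete with a small sketch. It is only an illustration: the boundaries are the approximate values quoted in the text (a beat range of 0.5 to 4 Hz and an indifference interval of roughly 400 to 600 ms), and the function names are hypothetical rather than taken from any cited study.

```python
# Illustrative sketch only. Boundaries are the approximate values quoted in
# the text; all names here are hypothetical.

BEAT_RANGE_MS = (250.0, 2000.0)    # intervals at which a beat is generally perceived
INDIFFERENCE_MS = (400.0, 600.0)   # reported span of the indifference interval


def hz_to_ms(rate_hz):
    """Convert an event rate in Hz to the interval between events in ms."""
    return 1000.0 / rate_hz


def classify_interval(interval_ms):
    """Label an interval relative to the beat range and indifference interval."""
    low, high = BEAT_RANGE_MS
    if interval_ms < low or interval_ms > high:
        return "outside beat range"
    if interval_ms < INDIFFERENCE_MS[0]:
        return "temps courts"   # tends to be overestimated
    if interval_ms > INDIFFERENCE_MS[1]:
        return "temps longs"    # tends to be underestimated
    return "indifference interval"


def spans_indifference_interval(short_ms, long_ms):
    """True if a short/long interval pair falls on either side of the boundary."""
    return (classify_interval(short_ms) == "temps courts"
            and classify_interval(long_ms) == "temps longs")
```

Under these assumed boundaries, a 1:2 tapping pattern of 350-ms and 700-ms intervals would span the indifference interval, with the shorter interval in temps courts and the longer in temps longs.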
Beat – a perceptual accent
The timescales are one aspect of what determines where
a musical beat might be felt, but not all sound events in
music are equally likely to induce a beat percept.
Certain events in music have been described as giving
rise to perceptual accents, which, together with the
temporal constraints described earlier, form the basis of
where the beat is felt.
Perceptual accents may be felt at points that diﬀer in
loudness or in frequency relative to surrounding events.
However, perceptual accents can also arise purely
through temporal context. Povel and Essens (1985)
proposed a theoretical framework for metrical
complexity based on empirical observations. They
posit that (1) an isolated acoustic event will be perceived
as accented, (2) the second of a set of two similar or
identical acoustic events played in sequence will be perceived
as accented, and (3) the first and last of three or more
similar events in a sequence will be perceived as
accented. Based on the location of perceptual accents
within a rhythm (which themselves may not be periodic),
the period and phase of a periodic pulse can be
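The three accent rules listed above can be sketched in a few lines of code. This is a simplified illustration rather than the published model: it assumes the rhythm is written on a grid of equal time steps (1 for a sound onset, 0 for silence), that "a set" of events means a run of onsets on adjacent grid steps, and that the pattern does not wrap around. The function name is hypothetical.

```python
# Simplified sketch of the three accent rules. Assumptions (not from the
# source): the rhythm is a list of 0/1 values on an equal-step grid, a
# "set" of events is a run of adjacent onsets, and the pattern is not cyclic.

def perceptual_accents(grid):
    """Return the grid indices of onsets predicted to be accented."""
    accents = []
    i = 0
    n = len(grid)
    while i < n:
        if grid[i] == 0:
            i += 1
            continue
        # Find the end of the current run of adjacent onsets.
        j = i
        while j < n and grid[j] == 1:
            j += 1
        run_length = j - i
        if run_length == 1:
            accents.append(i)            # rule 1: isolated event
        elif run_length == 2:
            accents.append(i + 1)        # rule 2: second of two events
        else:
            accents.extend([i, j - 1])   # rule 3: first and last of three or more
        i = j
    return accents
```

For the pattern `[1, 1, 1, 0, 1, 1, 0, 1, 0, 0]`, the run of three contributes accents at positions 0 and 2, the pair contributes position 5, and the isolated onset contributes position 7.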
Not all beats are created equal, nor is there always an
accent: subjective rhythmization
The basic temporal structure of a piece of music can be
described by its meter, or its recurring pattern of strong
beats and weak beats. Again, ‘strong’ and ‘weak’ in this
context are perceptual notions, much akin to identical
ticks of a clock being instinctively perceived as tick-tock-
tick-tock (Bolton, 1894; van Noorden and Moelants,
1999; Brochard et al., 2003; Bååth, 2015). This tick-tock
of a clock could be described as having a binary meter,
that is, a meter based on the number two (most commonly
two or four beats in a bar) with a beat pattern of
strong-weak-strong-weak. Ternary meters, or
bars based on the number three, have a pattern of
strong-weak-weak-strong-weak-weak, the most common
example being a waltz. Other, more complex meters, for
example those based on 5 or 7 beats in a bar, are also common
in Western music, though binary and ternary meters are
more often studied because they are generally more
eﬀective in inducing a clear beat percept. The preference
or natural acceptance of binary meter could be due to a
likeness of such meters to common rhythmic motor pat-
terns such as breathing or walking.
Within the range of frequencies at which a periodic pulse can
typically be perceived, there is a further distinction
between longer timescales across which the passage of
time is noticeable, and shorter timescales of which
several together are perceived to ﬁt into a longer
timescale. The boundary between the two is the
indiﬀerence interval, which lies somewhere between 400
and 600 ms. This is where temporal perception is most
accurate in humans (Fraisse, 1978), and incidentally also
corresponds to a comfortable walking pace (Styns et al.,
2007). Beats themselves arise as a result of the combina-
tion of perceptual accents and the constraint of a periodic
pulse within the range of perceivable beat frequencies.
Strong and weak beats group together in a repeating pattern
to form the musical meter of a piece, and some
meters (binary and ternary) are more easily interpreted
in general, perhaps due to their resemblance to binary motor
patterns or to the harmonic series on a fundamental
THE PSYCHOACOUSTICS OF BEAT
With beat and meter deﬁned, we are now equipped to
explore how we synchronize to beat. When we hear a
beat in music, we almost instinctively want to move
with it, and it has been shown that listeners often
cannot maintain movements that are out of sync
(Repp, 2002a). The synchronization of our movements to
an external rhythm is known as sensorimotor synchroniza-
tion (SMS). SMS has been studied extensively
(see Repp, 2005 and Repp and Su, 2013 for reviews), and
we highlight a few observations from SMS studies that may
be of particular relevance to a discussion of the neurophys-
iological processes that underlie rhythm perception.
When tapping along, we are usually early
Negative mean asynchrony (NMA) is a testament to the
predictive nature of synchronizing a motor action with an
expected stimulus. NMA refers to the observation that
listeners, when asked to tap along with an isochronous
pacing stimulus such as a metronome, tend to
anticipate stimulus onsets with their taps by tens of
milliseconds, rather than tapping with a distribution that
is symmetric around sound onsets (sometimes early,
sometimes late). Interestingly, listeners are often
unaware of their own NMA, suggesting a general
incongruence between objective and subjective
synchrony. Musicians tend to show less NMA than
nonmusicians, and the neurophysiological diﬀerences
between the two groups may therefore offer some
insight into the interaction between the sensory, motor,
and cognitive processes involved. A final observation is
that NMA decreases as the tempo of the pacing
stimulus increases, which may relate to the tendency to
overestimate short time intervals and underestimate
longer ones described in the previous section on
Timescales. For a more comprehensive review of NMA,
see Aschersleben (2002).
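As a minimal sketch of how NMA might be quantified, the code below subtracts the nearest stimulus onset from each tap and averages the result, so that a negative mean indicates taps that anticipate the sound. The pairing rule (nearest onset) and the function names are assumptions made for illustration, and the numbers in the usage note are invented, not data from any study.

```python
from statistics import mean

# Minimal sketch, assuming each tap is paired with its nearest stimulus
# onset. Function names are hypothetical.

def asynchronies(tap_times_ms, onset_times_ms):
    """Tap time minus the nearest stimulus onset, for each tap (ms)."""
    result = []
    for tap in tap_times_ms:
        nearest = min(onset_times_ms, key=lambda onset: abs(tap - onset))
        result.append(tap - nearest)
    return result


def mean_asynchrony(tap_times_ms, onset_times_ms):
    """Mean tap-to-onset asynchrony; negative values mean taps lead the sound."""
    return mean(asynchronies(tap_times_ms, onset_times_ms))
```

With made-up onsets at 0, 600, 1200 and 1800 ms and taps at -40, 570, 1150 and 1780 ms, the individual asynchronies are -40, -30, -50 and -20 ms, giving a mean asynchrony of -35 ms.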
Beat period and phase may have distinct underlying
A number of intriguing insights into SMS have also been
uncovered through studies that systematically perturb
the pacing stimulus, for example by introducing a phase
oﬀset, tempo change, or a sequence of distractors.
Overwhelmingly, the evidence points to an interesting
behavioral dissociation between phase correction and
period correction (Repp, 2005). Phase correction in an
isochronous sound sequence refers to a subtle adjust-
ment of tapping so that it returns to synchrony following
an unexpected inter-onset interval (IOI) that is abnormally
short or long, which would result in an abrupt phase shift
in the sequence that is either temporary or persistent
(Anomaly and Phase shift in Fig. 2, respectively). Period
correction refers to the adjustment of taps to a sudden
tempo change, that is, an abrupt and lasting shortening or lengthening of the IOI
(Tempo change in Fig. 2).
The phase correction mechanism appears to be
automatic; the timing of the tap subsequent to the
perturbation shifts according to whether the preceding
IOI was shortened or lengthened, even when the phase
oﬀset is imperceptible to listeners (Repp, 2002b). Simi-
larly, shifting a single tone in an isochronous stimulus
such that it results in a shorter IOI on one side of it and
a longer IOI on the other induces an involuntary shift in
tap times after the perturbation, even when participants
are told to ignore the perturbation. If a distractor
Fig. 1. (A) The beat perceived depends on the tempo at which a musical rhythm is played. In this simple, recognizable example rhythm, notes
represent sound events and those with a single stem are quarter notes, notes with the attached stem are eighth notes, and the remaining symbol is
a quarter rest (silence). The basic unit of time here is the quarter note; a quarter rest is the same duration as a quarter note, and each eighth note is
half the duration of the quarter note. Tempo is conventionally speciﬁed in beats per minute, so for the slow tempo (in red), there would be 75 quarter
notes per minute, and each quarter note is therefore 800 ms in duration. The fast tempo (in blue) is twice the speed of the slow tempo. In both cases,
the beat may be comfortably perceived at 800-ms intervals or 1.25 Hz (ﬁlled circles), but depending on the tempo this may coincide with diﬀerent
events in the music. The alternation of strong (solid lines) and weak beats (dotted lines) is illustrated for each tempo. Syncopation (green triangle),
or when a beat is felt where there is silence, is very common in music. (B) This schematic illustrates the time scales over which common auditory
events unfold. Time is on a log scale from small intervals (fast rates) to large intervals (slow rates), with values shown in milliseconds and in Hz. The
indiﬀerence interval is marked in purple; shorter intervals are temps courts, longer intervals are temps longs.
sequence of isochronous tones is introduced, taps shift
toward it, and interestingly this eﬀect appears to be insen-
sitive to the pitch diﬀerence between the tones of the pac-
ing and distractor sequence. In this case, temporal
coherence seems to be key: if a target and distractor tone
are within 120 ms of each other, tapping behavior would
suggest that they are treated as a joint referent. In con-
trast, period correction to a step change in tempo appears
to require the change in tempo to be perceptible (Repp
and Keller, 2004). Listeners’ ability to completely ignore
a tempo change and continue tapping at the original
tempo without showing any period correction is further
evidence that period correction requires cognitive control,
in contrast to phase correction, which in the same task
proved impossible for participants to suppress. Under
the looser constraint of self-paced movements, there
does appear to be a natural tendency to synchronize
movements to the period and phase of a musical beat
(Peckel et al., 2014).
How fast we can tap along depends on what we are
Depending on the nature of the task, diﬀerent studies
report somewhat different ranges within which beat
perception and SMS can occur. In truth, the context-
dependent nature of SMS is in itself a reﬂection of the
diﬀerent sensory and biomechanical constraints. At the
slow extreme, IOIs longer than 1.8 s result in inaccurate
synchronization where taps begin to lag the pacing
stimulus. At the fast extreme, ﬁnger tapping with an
isochronous pacing stimulus can be done at a rate of up
to 5–7 taps per second. However, if the task is 1:n
synchronization, the IOIs in the pacing signal can be as
short as 100–120 ms for trained musicians, which
suggests that audiomotor processing can cope with
these fast rates. This so-called subdivision benefit also
depends on the exact subdivision required. 1:2, 1:3, 1:4,
and 1:8 tapping can be done at lower IOIs than 1:5 or
1:7, with 1:6 and 1:9 tapping falling somewhere in
between. This suggests a certain level of automaticity to
subdivision by 2, 3, 4, and 8, while the cognitive
demands of counting groups of 5 or 7 interfere with sensory
processing. A similar eﬀect is observed when listeners
tap an isochronous beat in non-isochronous rhythmic
patterns. Rhythmic patterns diﬀer in their complexity,
and while very complex rhythms are diﬃcult to
synchronize to (Povel and Essens, 1985), rhythmic pat-
terns of medium complexity are what elicit the greatest
desire from listeners to move (Witek et al., 2014). This
may relate to beat salience, which has been shown to cor-
relate with listeners’ desire to move (Madison et al.,
2011). Rhythmic complexity and the strength of the beat
percept also influence the precision of temporal judgments
(Grube and Griffiths, 2009), which may be due to differences
in the neural representation of metrically simple,
complex, and non-metrical sound patterns (Sakai et al.,
1999; Vuust and Witek, 2014).
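The arithmetic behind the 1:n synchronization limits discussed above can be sketched briefly. In 1:n tapping, one tap is made for every n pacing events, so the interval between taps is n times the pacing IOI. The function names below are hypothetical, and the sketch captures only this arithmetic, not the cognitive cost of particular subdivisions.

```python
# Sketch of the 1:n tapping arithmetic. Function names are hypothetical.

def tap_interval_ms(ioi_ms, n):
    """Interval between taps when tapping once per n pacing events."""
    return n * ioi_ms


def tap_rate_hz(ioi_ms, n):
    """Number of taps per second required for 1:n tapping."""
    return 1000.0 / tap_interval_ms(ioi_ms, n)
```

At a pacing IOI of 100 ms, 1:1 tapping would require 10 taps per second, which exceeds the 5 to 7 taps per second quoted above, whereas 1:2 tapping requires 5 taps per second and 1:4 tapping only 2.5.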
Fig. 2. Illustration of a selection of perturbations used to study period and phase correction in sensorimotor synchronization. The x-axis represents
time, and here a temporal grid representing 600-ms intervals is marked by the vertical dotted lines. Circles represent clicks to which a listener would
align their taps, and blue circles mark sounds whose timing would be a departure from the isochronous condition where there is no perturbation. In
the Isochronous condition (top), a click is played every 600 ms. In Anomaly, a single click in the sequence is manipulated such that the IOIs that
ﬂank it are too long and too short by 100 ms, allowing the remainder of the sequence to remain unchanged. In Phase Shift, a single IOI is lengthened
by 100 ms but this time is not gained back, resulting in a phase shift of 100 ms that persists for all remaining clicks, even though their IOI remains
600 ms. In Tempo Change, the IOI changes abruptly from 600 ms to 500 ms. This would be perceived as a faster tempo and would require an
adjustment in the period of taps.
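The four conditions in Fig. 2 can be reproduced as lists of click onset times. The sketch below uses the values given in the caption (600-ms base IOI, 100-ms perturbation, 500-ms IOI after the tempo change). The sequence length, the position of the perturbation, the choice to delay rather than advance the anomalous click, and all names are illustrative assumptions.

```python
# Sketch of the perturbation types in Fig. 2. The position of the
# perturbation (`at`), the sequence length, and the direction of the
# anomaly (long IOI first, then short) are assumptions for illustration.

def onset_times(iois_ms):
    """Cumulative click onset times (ms), starting at 0."""
    times = [0]
    for ioi in iois_ms:
        times.append(times[-1] + ioi)
    return times


def make_sequence(kind, n_intervals=8, base_ioi=600, shift=100, at=3, new_ioi=500):
    """Return click onset times for one of the four conditions."""
    iois = [base_ioi] * n_intervals
    if kind == "isochronous":
        pass                                  # no perturbation
    elif kind == "anomaly":
        iois[at] = base_ioi + shift           # one click arrives late...
        iois[at + 1] = base_ioi - shift       # ...and the time is gained back
    elif kind == "phase_shift":
        iois[at] = base_ioi + shift           # lengthened IOI, never gained back
    elif kind == "tempo_change":
        iois[at:] = [new_ioi] * (n_intervals - at)   # all later IOIs change
    else:
        raise ValueError("unknown condition: " + kind)
    return onset_times(iois)
```

Comparing the final onset across conditions shows the difference: the anomaly sequence ends at the same time as the isochronous one (4800 ms), the phase-shift sequence ends 100 ms later (4900 ms), and the tempo-change sequence ends progressively earlier (4300 ms).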
Perception appears to be based on a judgment of
intervals, whereas action appears to be the result of a
joint computation based on stimulus onsets and ongoing
taps. Beat has both a period and a phase, and
perturbation studies suggest that dissociable processes
underlie adjustment of each. Speciﬁcally, phase
correction appears to be automatic and involuntary,
while period correction requires cognizance of a tempo
change and can be suppressed at will. The temporal
limits of synchronization ability are also context-
dependent and are a result of biomechanical, sensory,
and cognitive constraints. These factors are also at play
in the context of more complicated rhythmic patterns
and real music, where the temporal structure of the
sound aﬀects listeners’ ability and desire to synchronize.
NEUROPHYSIOLOGY OF BEAT PERCEPTION
A number of electrophysiological studies in humans have
attempted to identify the neural correlate of the beat
percept. Neural signatures of beat perception have been
identiﬁed through direct and indirect means and involve
distributed cortical and subcortical networks (Teki et al.,
2012). Comparisons with studies in newborn humans
(Winkler et al., 2009b) and nonhuman species would sug-
gest that some aspects of rhythm perception may be
innate to humans and to some nonhuman species,
whereas other aspects may be unique to humans.
Strong beats diﬀer physiologically from weak beats
As described earlier, subjective rhythmization can
generate a metrical percept of alternating strong and
weak beats even in an isochronous sequence of
identical sounds. This paradigm arguably would allow
for the dissociation between the cognitive and the
sensory aspects of beat perception in the context of
identical isochronous sounds. Electroencephalography
(EEG) studies that investigate the neural correlates of
subjective accenting do so either directly or indirectly.
Indirect methods involve the measurement of event-related
potentials (ERPs) evoked by rare “deviants”
(e.g. an omission or change in loudness) in a series of
“standard” or expected sounds. Differences in the ERP
to perturbations coinciding with strong and weak beats
may therefore signify neurophysiological diﬀerences in
processing that result from subjective accenting. An early
component of the ERP to deviants, the mismatch
negativity (MMN), is considered to be pre-attentive,
in contrast to later components (300–600 ms post-
stimulus onset), which are presumed to reﬂect cognitive
mechanisms. Though subjective accenting is cognitive
by deﬁnition, the setting of temporal expectations may
inﬂuence the processing of forthcoming sounds in a
predictive manner, and indeed both early and late ERP
diﬀerences have been found between deviants at strong
and weak beat positions (Brochard et al., 2003;
Abecasis, 2005; Geiser et al., 2010; Schaefer et al.,
2010; Bouwer et al., 2014; Honing et al., 2014).
The more direct approach compares sound-evoked
responses at strong and weak positions in rhythmic
sequences. Here too, strong beats evoke higher source
current activity than weak beats in temporal and frontal
areas, despite sounds being acoustically identical (Todd
and Lee, 2015). Similarly, a target sound played over a
background of pop music evokes stronger cortical and
brainstem responses if it is presented on the beat,
rather than shifted off the beat by ¼ of the inter-beat
interval (Tierney and Kraus, 2013). Altogether, these
event-based studies suggest that metrically strong positions
are accompanied by larger source currents than metri-
cally weak positions, and that these diﬀerences may be
pre-attentive. It is worth noting, however, that by design
these studies look at diﬀerences in predictions of not only
‘‘when” an auditory event is expected, but also ‘‘what” that
auditory event should be (Teki and Kononowicz, 2016).
Behavioral evidence suggests that these two types of pre-
dictions may have distinct neural substrates (Morillon
et al., 2016; Rajendran and Teki, 2016), and it is therefore
not yet possible to say whether pre-attentive responses
are a result of temporal expectation alone or a combina-
tion of expectations of ‘‘what” and ‘‘when” (Arnal, 2012;
Arnal and Giraud, 2012; Schwartze et al., 2013).
Entrainment of oscillatory activity to musical beat
In addition to event-based descriptions of the beat
percept, cortical oscillations have also been shown to
reﬂect metrical structure. This is noteworthy because it
suggests that neural oscillations, in addition to
entraining to the rate of individual events in a rhythmic
sequence, are also able to entrain to higher-level
temporal regularities, but the precise mechanism behind
this is still unknown. Modulation of auditory cortical
activity in the beta band has been shown to track the
clicks of a metronome, while gamma oscillations appear
to encode anticipated stimulus timing as evidenced by a
peak in gamma activity even in the absence of a click
(Fujioka et al., 2009). Beta oscillations have also been
demonstrated to encode beat and meter imagery
(Iversen et al., 2009; Fujioka et al., 2015), and the dynam-
ics of induced beta oscillatory activity both in humans
(Teki, 2014) and in nonhuman primates (Bartolo et al.,
2014; Bartolo and Merchant, 2015) (see the later section
on Beat processing in nonhuman species) have been
shown to vary according to the temporal regularity of
sound sequences. In addition to beta, gamma band oscil-
lations also appear to encode beat and meter (Snyder and
Large, 2005; Zanto et al., 2006), and entrainment in the
low-frequency delta-theta band (<8 Hz) has also been
shown to correlate with years of musical training
(Doelling and Poeppel, 2015). Low-frequency entrain-
ment to the beat has also been observed in the bulk elec-
troencephalogram signal (Nozaradan et al., 2011; Henry
et al., 2014; see Zhou et al., 2016 for a guide on the inter-
pretation of low-frequency components in the Fourier
spectrum). A hierarchical organization of oscillatory activ-
ity in the auditory cortex is thought to facilitate temporal
processing of auditory stimuli and coordinate activity
between sensory and other brain areas (Lakatos, 2005).
Cortical oscillations have furthermore been hypothesized
to provide a mechanism for attentional selection and may
be entrained by rhythmic auditory stimuli (Lakatos et al.,
2008; Schroeder and Lakatos, 2009; Gomez-Ramirez
et al., 2011; Lakatos et al., 2013).
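One way to picture the frequency-domain analyses mentioned above is to measure the amplitude of a signal's Fourier component at the beat frequency. The sketch below does this for a toy sinusoid, not for real EEG data, and is not the analysis pipeline of any cited study. The function name, sampling rate, duration and beat frequency are all illustrative assumptions.

```python
import cmath
import math

# Toy sketch: amplitude of the Fourier component at a chosen frequency.
# The signal is a synthetic sinusoid standing in for a beat-locked response.

def amplitude_at(signal, sample_rate_hz, freq_hz):
    """Amplitude of the component of `signal` at `freq_hz` (direct DFT)."""
    n = len(signal)
    total = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
        for k, x in enumerate(signal)
    )
    return 2.0 * abs(total) / n


sample_rate_hz = 100   # assumed sampling rate
duration_s = 10        # assumed recording length
beat_hz = 2.0          # assumed beat frequency (500-ms inter-beat interval)

# Unit-amplitude sinusoid at the beat frequency.
signal = [
    math.sin(2 * math.pi * beat_hz * k / sample_rate_hz)
    for k in range(sample_rate_hz * duration_s)
]

beat_amplitude = amplitude_at(signal, sample_rate_hz, beat_hz)
off_beat_amplitude = amplitude_at(signal, sample_rate_hz, 3.0)
```

In this toy case the amplitude at 2 Hz is close to 1 and the amplitude at 3 Hz is close to 0. With real recordings, a peak at the beat frequency would need to be interpreted with the caution noted above regarding low-frequency components in the Fourier spectrum.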
Brain areas involved in beat perception
In addition to the auditory cortex, musical rhythms have
been shown to engage a number of distributed brain
areas, including several that would traditionally be
considered part of the brain’s motor system, and hence
might not immediately be thought of as playing a key
role in beat perception. These include the basal ganglia,
supplementary motor area, striatum, cerebellum,
sensorimotor cortex, and premotor cortex (Parsons,
2001; Grahn and Brett, 2007; Zatorre et al., 2007; Chen
et al., 2008; Grahn, 2009; Teki et al., 2011). Engagement
of motor-related areas appears to be automatic since it is
observed consistently even when listeners are instructed
not to make overt movements (Chen et al., 2008). Activa-
tion in auditory and motor areas furthermore correlates
with individual diﬀerences in beat perception (Grahn and
The activation of brain areas during beat perception
depends on several factors including the duration of
intervals (Lewis and Miall, 2003), temporal context (Teki
et al., 2011), and task demands (Merchant et al., 2013).
The core timing areas of the brain, specifically the striatum
and the cerebellum (Ivry and Schlerf, 2008), are activated
in perceptual timing depending on the temporal
regularity of the sequences. For isochronous sequences,
where a clear beat can be perceived, timing relies more
on a network involving the striatum, while for jittered
sequences, where the percept of a beat is negligible
and intervals are encoded in an absolute manner, timing
relies more on an olivocerebellar network (Teki et al.,
2011, 2012). Examination of individuals who exhibit “beat
deafness” (Phillips-Silver et al., 2011), a rare condition
that is associated with poor motor synchronization and/or
impoverished beat perception (Sowiński and Bella,
2013), provides further evidence that beat perception
may recruit distinct circuits depending on the implicit/explicit
timing aspect of the task (Bégel et al., 2017). The
dissociation of striatal and cerebellar responses for
beat-based versus duration-based sequences has
recently been observed to hold not only for perception
but also for working memory for single time intervals in
sequences with diﬀerent rhythmic structures (Teki and
Beat perception itself may be subcategorized into the
processes of ﬁnding, continuing, and adjusting the beat,
and the evidence points strongly toward the basal
ganglia being involved in the continued representation of
beat rather than its detection or adjustment (Chapin
et al., 2010; Grahn and Rowe, 2013). In one fMRI study
(Chapin et al., 2010), participants were played six cycles
of each of a set of complex rhythms and were tasked with
attending to the rhythm, holding it in memory over 12 s,
then reproducing it by tapping. During the attending
phase, the basal ganglia showed signiﬁcant activation
only if the auditory stimulus was attended to, and if
suﬃcient cycles of the rhythm had passed for listeners
to perceive the beat. The basal ganglia also remained
active during the rehearsal period. Similarly, in another
fMRI study (Grahn and Rowe, 2013) where beat and non-
beat rhythms were played consecutively, the preceding
rhythm determined whether the beat in the subsequent
rhythm, if any, was a continuation from the previous
rhythm (beat continuation), was sped up or slowed down
(beat adjustment), or needed to be found afresh (beat
ﬁnding). Here, the basal ganglia were most active in beat
continuation conditions and less active for beat adjust-
ment conditions, with no apparent diﬀerence between
the beat ﬁnding and the nonbeat (where no beat was pre-
The superior temporal gyrus, premotor cortex, and
ventrolateral prefrontal cortex show activity during beat
detection and synchronization through tapping (Kung
et al., 2013). When tapping to rhythmic sequences that
contain syncopation (the absence of sound on a per-
ceived beat, see Fig. 1), diﬀerences in activation of the
premotor cortex, supplementary motor area, basal ganglia,
and lateral cerebellum were observed, and these diﬀer-
ences were present even when motor actions were not
executed and the beat was simply imagined (Oullier
et al., 2005). Syncopation is among the factors that deter-
mine how engaging listeners ﬁnd a piece of music, and
pleasant music appears to more eﬀectively entrain neural
responses in the caudate nucleus of the basal ganglia
(Trost et al., 2014). Premotor and cerebellar areas are
also more heavily recruited in response to subjectively
more ‘‘beautiful” rhythms, and activity in the ventral pre-
motor cortex (PMv) is enhanced by rhythms that are at
a preferred tempo (Kornysheva et al., 2010). Repetitive
transcranial magnetic stimulation (TMS) over the PMv
changed people’s preferred tempo, suggesting that the
PMv may be involved in beat rate preference
(Kornysheva and Schubotz, 2011).
Findings from a number of functional imaging studies
begin to echo some of the observations from
early studies on temporal processing in the context of
music. For example, beat induction is poorer for a slow
(1500 ms) tempo compared to a faster one (600 ms),
and activity in the basal ganglia, premotor and
supplementary motor regions, and thalamus is
correspondingly reduced (McAuley et al., 2012). This is
consistent with accounts that the motor system is prefer-
entially engaged in the measurement of sub-second time
intervals (Lewis and Miall, 2003). Basal ganglia activity
peaks around 500–600 ms (Riecker et al., 2003), which
is comparable to the indiﬀerence interval and the rate of
spontaneous tapping in humans (Repp and Su, 2013).
The upper tempo limit to beat perception (200 ms)
may be determined by the time constant for temporal
integration (Loveless et al., 1996), which is comparable
to the duration of auditory short-term sensory memory,
or “short auditory store” (Cowan, 1984). Recent work,
however, suggests that temporal memory resources
may not be ﬁxed for a discrete number of items but ﬂexibly
distributed according to the number of intervals to be
encoded in a sequence (Teki and Griﬃths, 2014;
Joseph et al., 2016).
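The intervals in this paragraph are quoted in milliseconds, whereas musical tempo is usually expressed in beats per minute (BPM). The following minimal sketch converts between the two. It is purely illustrative arithmetic: the function name and the dictionary labels are hypothetical and are not taken from the cited studies.

```python
def interval_to_tempo(interval_ms):
    """Convert an inter-beat interval in ms to (beats per minute, beats per second)."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    hz = 1000.0 / interval_ms      # beats per second
    return 60.0 * hz, hz           # beats per minute, beats per second


# Intervals quoted in the paragraph above; the labels are descriptive only.
QUOTED_INTERVALS_MS = {
    "slow tempo": 1500,
    "fast tempo": 600,
    "lower end of the basal ganglia peak": 500,
    "upper tempo limit": 200,
}

for label, ms in QUOTED_INTERVALS_MS.items():
    bpm, hz = interval_to_tempo(ms)
    print(f"{label}: {ms} ms -> {bpm:.0f} BPM ({hz:.2f} Hz)")
```

On this scale, the 500–600 ms range corresponds to roughly 100–120 BPM.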
Model-based accounts of beat perception
A number of theoretical models have been proposed that
capture neural and behavioral aspects of beat perception.
Neural resonance theory is an inﬂuential computational
model that consists of two sets of dynamic nonlinear
oscillators, one that receives sensory input (an
“auditory” layer) and one that receives input from, and
projects back to, the auditory layer (a “motor” layer). The
interaction between these layers can be modeled as a
dynamical system, and the results resemble both
neurophysiological and behavioral aspects of beat
perception (Large et al., 2015). Neural resonance theory
is compatible with the dynamic attending theory, which
postulates oscillatory ﬂuctuations in attention (Large and
Jones, 1999). The active sensing hypothesis (Schroeder
et al., 2010) postulates similar interactions between the
auditory and motor systems (see Henry and Herrmann,
2014, for a comparison of the two hypotheses). The
“action simulation for auditory prediction” (ASAP) hypothesis
goes a step further by suggesting that auditory perception
is sharpened by the explicit simulation of
periodic movement in motor planning regions of the brain
(Patel and Iversen, 2014). The precise mechanism for
beat induction remains unknown, though the entrainment
of neural oscillations is a common thread between these
Beat processing in nonhuman species
So far, the discussion has focused on ﬁndings from
human studies. Beat perception studies in nonhuman
species are numerous, but apart from notable
exceptions such as a cockatoo (Patel et al., 2009) and a
sea lion (Cook et al., 2013; Rouse et al., 2016), nonhu-
man species have shown little compelling evidence of
being able to perceive and synchronize to the beat as precisely
as humans (Geiser et al., 2014). Though chimpanzees
show some synchronization ability
(Hattori et al., 2013), it is weak and quite limited
in tempo range compared to that of humans. This
may be somewhat surprising given that humans are not
the only species that relies on rhythmic sounds such as
vocalizations and produces rhythmic movements. Indeed,
some signatures of rhythm perception in humans have
also been observed in macaques, such as interval
duration-selective modulation of beta oscillations
(Bartolo et al., 2014). In this and other related studies,
the macaques were given a serial continuation task in which
they tapped along to a metronome and continued tapping
at the same rate after the metronome stopped. Though tap
times tended to lag metronome clicks by 100–250 ms,
these lags were shorter than the macaques’ reaction
times, suggesting that there was a predictive element,
though not one strong enough to produce the near-zero or negative
lags seen in humans. As in humans, beta oscillations in
the basal ganglia (putamen) show a preference for the continuation
of a beat, and overall, similar timing circuits have
been identiﬁed in both human and nonhuman primates,
though macaques show better performance when syn-
chronizing their movements to a visual rather than audi-
tory metronome (Merchant et al., 2015). This is in
contrast to a clear auditory bias in humans (Honing and
Merchant, 2014). Larger responses in primary auditory
cortex to tones at “strong beat” positions in a rhythmic
sequence than to the same tones in a rhythmically irregu-
lar sequence have also been observed in macaques, in
addition to enhanced deviance detection ability
(Selezneva et al., 2013). However, this may be due to
sensitivity to rhythmic grouping rather than to beat per-
ception itself, since certain aspects of beat-speciﬁc neural
activity observed in human adults and newborns are not
observed in macaques (Honing et al., 2012). From the
perspective of low-level auditory processing, ﬁring rate
adaptation as early as the midbrain results in higher aver-
age ﬁring rates on the beat than oﬀ the beat; this may
explain why some beat interpretations are more likely to
be felt than others, and may also be a relevant precursor
to the entrainment of cortical oscillations to beat
(Rajendran et al., 2017).
Human imaging studies have provided glimpses into the
complex and highly distributed neural dynamics that are
set into motion by musical rhythms. A key conceptual
advance is the ﬁnding that rhythmic sequences engage
auditory and motor areas more strongly than arrhythmic
sequences, even during passive listening and in the
absence of movement. Another is that perceptually
strong beats evoke stronger neural activity than weak
beats, which suggests a close link between neural
activity and perception. Underlying both are oscillatory
processes that are capable of entraining to the beat, but
are also coordinated across sensory, frontal, parietal,
and motor-related areas both cortically and
subcortically. Some of these neural dynamics have
been observed in nonhuman primates, and it therefore
remains an open question to what extent humans are
unique in their ability to perceive musical beat, and what
diﬀerences in connectivity and neural response
dynamics give rise to humans’ seemingly superior ability
to spontaneously synchronize movements to music.
PREDICTABLE TIMING IN AUDITORY
As alluded to in the introduction, an appreciation of music
is only one of many consequences of our ability to
perceive rhythmic patterns. We now begin to shift our
focus away from music to explore rhythm perception in
a more general context. Intrinsic to the perception of a
musical beat is the prediction of when the next beat will
occur, and the perceptual advantages aﬀorded by our
general ability to form temporal predictions will be the
subject of this section.
Temporal predictability in pattern detection
Humans show a remarkable ability to detect repeating
patterns that are quite complex in their acoustic content
(Agus et al., 2010; McDermott and Simoncelli, 2011;
Kumar et al., 2014; Barascud et al., 2016). To do so is
an impressive feat; the brain must be able to hold arbitrary
sounds of arbitrary length and complexity in memory over
timescales that can range from milliseconds up to tens of
seconds (Kaernbach, 2004). It is therefore relevant that a
feature of repeating sounds in nature is that they tend to
be rhythmic and indicative of animate sound sources.
Rhythm detection may therefore be an advantageous
sensory capability, and it has been shown that rhythmic
presentation of repeating sounds facilitates detection of
complex acoustic patterns (Rajendran et al., 2016) and
decreases detection thresholds (Lawrance et al., 2014).
The entrainment of oscillatory activity in the brain,
mentioned earlier in the context of beat perception (see
Entrainment of oscillatory activity to musical beat),
provides a likely explanation for these results too.
Rhythmic input is widely thought to entrain attentional
resources (Lakatos et al., 2008; Bolger et al., 2013;
Calderone et al., 2014) such that neuronal excitability is
highest when the next stimulus is predicted to occur
(Lakatos et al., 2009; Besle et al., 2011). Low-frequency
entrainment of oscillations may therefore serve as a mech-
anism for sensory selection (Schroeder and Lakatos, 2009)
and improve the quality of sensory information received
(Rohenkohl et al., 2014). It is worth noting that the rhythmic
form of temporal expectation is just one of several forms of
temporal expectation, each resulting in subtle diﬀerences
in perception that may arise from diﬀerences in the under-
lying neural substrates (Nobre et al., 2007; Breska and
Deouell, 2017). For example, an enhancement of perceptual
sensitivity has been demonstrated in both periodic
and non-periodic sequences that are temporally predictable,
but motor facilitation, in the form of faster response
latencies, was observed only in the periodic condition
(Morillon et al., 2016; Rajendran and Teki, 2016).
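To make the idea of rhythm-based temporal prediction concrete, the sketch below tracks an isochronous sequence by predicting each onset and then correcting its phase and period in proportion to the prediction error. This is a generic linear error-correction scheme used here as a stand-in. It is not a model of neuronal excitability and is not the mechanism proposed in any of the studies cited above. The function name, the gains `alpha` and `beta`, and the starting period are all assumptions chosen for illustration.

```python
def track_beat(onsets, period0, alpha=0.5, beta=0.2):
    """Predict successive onsets, correcting phase (alpha) and period (beta).

    Returns the final period estimate and the list of prediction errors.
    """
    period = period0
    expected = onsets[0] + period          # first prediction: one period after onset 0
    errors = []
    for t in onsets[1:]:
        err = t - expected                 # positive: the sound came later than predicted
        errors.append(err)
        period += beta * err               # period correction
        expected += period + alpha * err   # next prediction, with phase correction
    return period, errors


# Isochronous sequence with a 600 ms inter-onset interval (times in seconds).
onsets = [0.6 * n for n in range(40)]

# The tracker deliberately starts with a wrong period guess of 500 ms.
period, errors = track_beat(onsets, period0=0.5)
print(f"final period estimate: {period:.4f} s")
print(f"first error: {errors[0]:.4f} s, last error: {errors[-1]:.2e} s")
```

With these gains the prediction error shrinks over successive onsets, so the predicted time converges on the moment at which the next sound actually occurs.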
However, it is also important to note that there is an as
yet unresolved tension, or apparent conﬂict, in the
physiological literature regarding the nature of the neural
responses involved in the processing of periodic or
rhythmic stimuli. The aforementioned studies posit that
entrainment due to temporal expectation and attention
would result in periods of heightened sensitivity in phase
with the rhythm, which would be expected to lead to
enhanced, stronger responses. This is in contrast to
well-documented phenomena such as “repetition
suppression” in auditory-evoked responses measured
through EEG (Costa-Faidella et al., 2011) and
“stimulus-specific adaptation” (SSA) observed in neural
responses recorded extracellularly in auditory cortex
and non-lemniscal parts of the inferior colliculus and tha-
lamus (Malmierca, 2014; Khouri and Nelken, 2015; Nieto-
Diego and Malmierca, 2016), which ﬁnd that responses to
simple periodic stimuli are reduced or suppressed, rather
than enhanced. How can isochronous stimuli on the one
hand produce entrainment that is suggestive of periodi-
cally heightened sensitivity but at the same time elicit
reduced response amplitudes as evidenced by SSA or
repetition suppression? The simple answer is that we do
not yet know. The methodologies of studies of entrain-
ment versus SSA are too diﬀerent to allow direct compar-
isons. Entrainment studies typically use EEG or LFP
measures, the amplitude of which depends at least as
much on the degree of synchronization of neural activity
as on net response amplitudes of individual neurons.
Additionally, they are often carried out on awake human
volunteers or animal subjects who may be attending to
the rhythmic sounds, while SSA studies typically use
anesthetized preparations to measure extracellular
response amplitudes that are essentially independent of
neural synchrony. Consequently, while the take-home
messages from studies of entrainment and of SSA at pre-
sent appear somewhat contradictory, how they may be
reconciled will need to be addressed in future studies
using uniﬁed methodologies.
Temporal predictability in auditory scene analysis
Another practical advantage of forming predictions based
on temporal patterns is that it allows us to parse a
complex auditory scene into distinct perceptual objects
(Winkler et al., 2009a). In addition to temporal coherence
of sound features (Turgeon et al., 2002, 2005; Shamma
et al., 2011), the predictability of features such as location,
pitch, loudness, and timbre plays a pivotal role in auditory
scene analysis (Bendixen, 2014). The segregation of
a set of sounds from another set of sounds is known as
auditory stream segregation and has often been probed
experimentally using an alternating tone paradigm of
A-B-A, where A and B are tones that typically differ in
frequency by a fixed separation (Bregman
and Campbell, 1971; van Noorden, 1975). Temporal regularity
within these paradigms influences whether these
sequences are perceived as integrated (A-B-A-B-A) or
whether they segregate into two perceptually distinct
streams (A---A---A and --B---B) (Bendixen et al., 2010;
Andreou et al., 2011; Rajendran et al., 2013). Together
with attentional eﬀects, predictive coding based on tem-
poral and other feature regularities may account for the
stability of auditory objects (Denham and Winkler, 2006;
Pressnitzer and Hupe, 2006; Chait et al., 2007; Winkler
et al., 2012).
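For readers unfamiliar with the alternating tone paradigm, the sketch below synthesizes a strictly regular A-B-A-B sequence of the kind described above. All parameter values (a 500 Hz A tone, a six-semitone separation, 50 ms tones and gaps, a 16 kHz sampling rate) are arbitrary assumptions for illustration and are not taken from any of the cited experiments. The function name is hypothetical.

```python
import numpy as np


def alternating_tones(f_a=500.0, semitones=6.0, tone_ms=50.0, gap_ms=50.0,
                      n_pairs=10, fs=16000):
    """Return (waveform, f_b) for a regular A-B-A-B... pure-tone sequence."""
    f_b = f_a * 2.0 ** (semitones / 12.0)      # B lies `semitones` above A
    n_tone = int(round(fs * tone_ms / 1000.0))
    n_gap = int(round(fs * gap_ms / 1000.0))
    t = np.arange(n_tone) / fs
    ramp = np.hanning(n_tone)                  # smooth onset and offset to avoid clicks

    def slot(freq):
        """One tone followed by a silent gap."""
        tone = ramp * np.sin(2 * np.pi * freq * t)
        return np.concatenate([tone, np.zeros(n_gap)])

    pair = np.concatenate([slot(f_a), slot(f_b)])
    return np.tile(pair, n_pairs), f_b


wave, f_b = alternating_tones()
print(f"B tone: {f_b:.1f} Hz, duration: {wave.size / 16000:.1f} s")
```

In experiments of this kind, the frequency separation and the tone onset times are the variables typically manipulated. Here they correspond to `semitones` and to the fixed slot durations, which could be jittered to make a stream temporally irregular.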
Current theories suggest that the formation of auditory
objects may rely on both basic sensory neural
mechanisms (Fishman et al., 2012) and attention-driven
oscillatory mechanisms (Lakatos et al., 2008; Schroeder
and Lakatos, 2009). Though the question of how a per-
ceptual object is formed from potentially noisy and con-
ﬂicting information is still an open one, the ﬁnal
representation of an auditory object is remarkably distinct
and robust, even if it overlaps spectrotemporally with the
unattended background (Ding and Simon, 2012, 2013). A
key question here, which may also be relevant to how
beat and meter emerge from music, is whether and how
diﬀerent oscillatory populations of neurons entrain to dif-
ferent time-varying sound features, and how their relative
contributions are weighted and integrated to form a coher-
ent percept of a single speaker in a noisy room.
Rhythms in speech perception
Speech is perhaps the most pervasive and critical context
in which we rely on our ability to derive meaning from
complex temporal patterns. The intelligibility of speech
has been shown to correlate with the entrainment of
oscillatory neural responses to the speech envelope
(Ahissar et al., 2001; Peelle and Davis, 2012), particularly
in the 4–8-Hz range (Luo and Poeppel, 2007). This range
corresponds to the syllable rate of speech production
(Greenberg et al., 1999) and dominates the temporal
modulations in the speech envelope (Chi et al., 1999;
Chandrasekaran et al., 2009; Elliott and Theunissen,
2009). The syllabic rate is nearly an order of magnitude
slower than ﬁne structure elements such as formants
(30–50 Hz), and a few-fold faster than intonation contours
that are typical of phrasal units (1–2 Hz). Content at all of
these timescales is parsed concurrently to extract
Speech, like music, is built hierarchically from
elements that span short to long timescales. A recent
survey of temporal modulations in speech and music
reveals a consistent peak in temporal modulations
around 5 Hz for speech across nine languages, and
around 2 Hz for music across several (Western) musical
genres (Ding et al., 2017). It is worth emphasizing, however,
that the temporal structure present in speech is
not periodic as it is in music (Nolan and Jeon, 2014).
The temporal modulations in speech are nonetheless constrained by
the motor system, specifically by the biomechanics
of the articulators, and this results in clear temporal
structure in both the auditory and visual components of
speech (Chandrasekaran et al., 2009). There is strong
evidence that speech contains suﬃcient temporal struc-
ture to robustly entrain oscillatory neural activity (Giraud
and Poeppel, 2012), and that this entrainment serves to
maximize processing eﬃciency of future inputs by ensur-
ing that intervals of high neuronal excitability coincide with
when critical information is expected to arrive (Peelle and
Davis, 2012; Ding et al., 2017).
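A temporal modulation spectrum of the kind referred to above can be approximated, in its simplest form, by extracting the amplitude envelope of a sound and taking the Fourier transform of that envelope. The sketch below does this for a synthetic signal, a 1 kHz tone amplitude-modulated at 5 Hz, rather than for real speech or music. The full-wave rectification used as the envelope estimate is a deliberately crude assumption and is not the analysis pipeline of Ding et al. (2017). The function name and the 1–32 Hz search band are likewise hypothetical.

```python
import numpy as np


def peak_modulation_hz(signal, fs, lo=1.0, hi=32.0):
    """Return the modulation frequency (Hz) with the largest envelope energy."""
    envelope = np.abs(signal)                  # crude envelope: full-wave rectification
    envelope = envelope - envelope.mean()      # remove the DC component
    spectrum = np.abs(np.fft.rfft(envelope))
    freqs = np.fft.rfftfreq(envelope.size, d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)       # restrict to slow modulations
    return float(freqs[band][np.argmax(spectrum[band])])


fs = 8000
t = np.arange(4 * fs) / fs                              # 4 s of samples
carrier = np.sin(2 * np.pi * 1000.0 * t)                # stand-in for fine structure
modulator = 1.0 + 0.8 * np.cos(2 * np.pi * 5.0 * t)     # 5 Hz "syllable-like" rhythm
toy_signal = modulator * carrier

peak = peak_modulation_hz(toy_signal, fs)
print(f"peak modulation frequency: {peak:.2f} Hz")
```

For this toy signal the peak falls at the imposed 5 Hz modulation rate. Applying the same function to recorded speech or music would require a more careful envelope estimate, for example band-wise filtering before rectification.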
Interestingly, temporal manipulations to speech more
drastically impair intelligibility (Adank and Janse, 2009)
than extreme spectral manipulations do (Shannon et al.,
1995). Model-based accounts (Ghitza, 2011; Giraud and
Poeppel, 2012) suggest that phase-locking and nested
theta-gamma oscillations could explain why an extremely
impoverished speech signal can be understood if the syl-
labic rhythm is preserved (Ghitza and Greenberg, 2009).
The “asymmetric sampling in time” (AST) hypothesis suggests
that the two cerebral hemispheres sample an auditory
signal at different rates; the left auditory areas extract
information from 20–40-ms temporal integration windows,
while auditory areas in the right hemisphere sample
using 150–250-ms temporal integration windows
(Poeppel, 2003). A related hypothesis suggests that the
left hemisphere has better temporal resolution and the
right hemisphere has better spectral resolution, and that
this functional organization reﬂects an optimization of pro-
cessing for speech and music, respectively (Zatorre et al.,
2002). Both of these ideas are consistent with the obser-
vation that the left hemisphere dominates during speech
processing while the right hemisphere dominates during
music processing (Tervaniemi and Hugdahl, 2003). The
parallels drawn here between music and speech deal
strictly with timing and do not suggest that music has
any meaning that is analogous to the semantic meaning
of speech (Lerdahl and Jackendoﬀ, 1983). However,
given these parallels, it is possible that music and speech
co-evolved (Fitch, 2000; Hauser and McDermott, 2003;
Fitch, 2006; Patel, 2007) and are built on overlapping cir-
cuit mechanisms for auditory working memory (Hickok
et al., 2003; Joseph et al., 2015) and timing (Patel, 2011).
The temporal predictability that results from rhythmic
stimulation helps us detect patterns, parse an auditory
scene into distinct auditory objects, and understand
speech. Entrainment of neural oscillations, which by
virtue of aligning to temporal modulations in a rhythmic
acoustic signal generates predictions about future input,
is thought to underlie all of these abilities. The acoustic
stimuli used in these studies range from extremely
simple alternating tone paradigms, to the parsing of two
people speaking simultaneously. Much remains to be
understood regarding what periodically or quasi-
periodically repeating features in a spectrotemporally
complex sound entrain oscillations, and how such
oscillations are ultimately integrated to form distinct
auditory objects or extract meaning. Knowing these
answers would likely shed light on the mechanism and
functional role of oscillatory entrainment to musical
NEURAL MECHANISMS OF TIMING
Music, speech, and the parsing of complex auditory
scenes all rely on an ability to detect temporal
regularities in order to form the temporal predictions that
drive how sounds in the future are perceived. This
requires some form of timekeeping in the brain. The
timing field is vast, likely reflecting the
complexity of the neural circuits that are able to track
time (Teki, 2016). The findings most relevant to our discussion
are reviewed here.
Dedicated timekeeping circuits?
The neuronal mechanisms underlying timing have been a
subject of investigation for several decades. Braitenberg
(1967) proposed the cerebellum as a biological clock in
the millisecond range. Since then, the concept of a central
clock or internal timekeeper has dominated timing
research. Early work highlighted the unique synaptic cir-
cuitry of the cerebellum and the inferior olive as being
capable of generating precise timing signals. Speciﬁcally,
inferior olive neurons, which provide climbing ﬁber input to
the Purkinje cells in the cerebellum, possess unique
voltage-gated conductances that exhibit rhythmic sub-
threshold membrane potential oscillations (5–15 Hz) as
well as electrical gap-junctions that synchronize mem-
brane potential oscillations across cells into distinct neu-
ronal clusters that show temporally coherent activity
(Llinas et al., 1974; Llinas and Yarom, 1981). The deep
cerebellar nuclei, such as the dentate nucleus, modulate the
electrical activity of olivary neurons and decouple them
into dynamic cell assemblies. Furthermore, these deep
cerebellar nuclei are inhibited by the Purkinje cells,
completing a feed-forward inhibitory loop. These
neurophysiological properties provide the olivocerebellar
network with the capacity to generate accurate absolute
timing signals for motor and perceptual timing (Welsh
et al., 1995; Yarom and Cohen, 2002; Jacobson et al.,
2008; Mathy et al., 2009). The use of timing signals from
the olivocerebellar network has been demonstrated
across several timing paradigms in human studies as well
(Xu, 2006; Teki et al., 2011; Wu et al., 2011; Lusk et al.,
Motivated by neuropsychological evidence from
Parkinson’s patients who showed perceptual timing
deﬁcits, parallel work focused on the basal ganglia as a
core timing network in the brain (Artieda et al., 1992).
Matell and Meck (2004) proposed an oscillatory timing
model: medium spiny neurons in the dorsal striatum act
as coincidence detectors of oscillatory cortical activity
(5–15 Hz; Miall, 1989). The cortical oscillations are pro-
posed to be synchronized at interval onset by phasic
dopamine release from the ventral tegmental area, while
dopaminergic input from the substantia nigra modulates
the activity of the dorsal striatum (Buhusi and Meck,
2005). Cortico-striatal synapses are strengthened or
weakened over experience through long-term potentiation
and depression, and after repeated stimulus presentation,
medium spiny neurons learn to encode the duration of
reinforced time intervals (Gu et al., 2011). Several studies
point to the importance of the dopaminergic basal ganglia
network in mediating accurate timing signals (Jin et al.,
2009; Bartolo et al., 2014; Gershman et al., 2014; Chiba
et al., 2015; Gouvêa et al., 2015; Mello et al., 2015;
Soares et al., 2016).
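The coincidence-detection idea can be illustrated with a toy simulation. In the sketch below, a bank of oscillators spanning 5–15 Hz is reset to a common phase at interval onset, the pattern of oscillator states at a reinforced duration is stored, and a detector then reports how closely the current pattern matches the stored one. This is only loosely inspired by the oscillatory timing model described above and is not the published implementation. It omits dopaminergic modulation, synaptic learning rules, and noise. The number of oscillators, the cosine readout, and all names are assumptions.

```python
import numpy as np


def coincidence_trace(target_s, freqs_hz, times_s):
    """Match between the current oscillator pattern and the one stored at target_s.

    All oscillators are assumed to be reset to phase 0 at interval onset (t = 0).
    """
    stored = np.cos(2 * np.pi * freqs_hz * target_s)             # pattern at reinforcement
    current = np.cos(2 * np.pi * np.outer(times_s, freqs_hz))    # shape: (time, oscillator)
    return current @ stored / freqs_hz.size                      # normalized dot product


freqs = np.linspace(5.0, 15.0, 101)    # hypothetical bank of oscillators, 5-15 Hz
times = np.arange(0, 3001) / 1000.0    # 0 to 3 s in 1 ms steps
trace = coincidence_trace(0.6, freqs, times)

decoded = float(times[np.argmax(trace)])
print(f"detector output peaks at {decoded:.3f} s")
```

Because oscillators of different frequencies drift out of phase after the reset, the stored pattern recurs strongly only near the reinforced duration within this time window, which is what allows a coincidence detector to signal that the interval has elapsed.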
Cortical networks such as primary visual, auditory,
parietal and frontal cortices have also been implicated in
sensory timing functions (e.g. Leon and Shadlen, 2003;
Bueti et al., 2008; Bueti and Macaluso, 2010; Bueti,
2011; Schneider and Ghose, 2012; Hayashi et al., 2015;
Jazayeri and Shadlen, 2015; Namboodiri et al., 2015;
Bakhurin et al., 2016; Shuler, 2016). However, it is not
completely understood what aspects of timing are respec-
tively mediated by each of these networks, nor are the
dynamics of temporal processing across sensory and
higher order cortical networks completely clear (see Rao
et al., 2001). A likely hypothesis is that early-stage sensory
cortices process the stimulus-related features while
parietal and frontal cortices are engaged by task demands
like memory and attention (Finnerty et al., 2015). More
recently, the hippocampus (CA1) has been shown to have
‘time cells’ that display increased ﬁring rates in relation to
elapsing durations, independent of space and distance
(MacDonald et al., 2011). The prevailing view suggests
the existence of ‘time cells’ in the striatum, cerebellum
and the hippocampus whose output is integrated to obtain
a common percept of time (Lusk et al., 2016).
Is the passage of time implicit in neural responses?
An alternative hypothesis to the one positing that time is
kept by dedicated sensorimotor circuits is one that
suggests that timing is an intrinsic computation that
emerges from network-wide neural dynamics (Goel and
Buonomano, 2014). Hardy and Buonomano (2016) have
recently reviewed a number of plausible neurocomputa-
tional models of timing. Here, we brieﬂy summarize the
primary models and their principles of operation.
One of the simplest network models of timing is based
on ‘synﬁre chains’ where groups of neurons are
connected in a feed-forward fashion such that each
neuronal population is activated at a diﬀerent instant in
time (Haß et al., 2008). Synfire chains represent a neurobiologically
plausible mechanism for interval timing but are
limited because of their purely feed-forward architecture
and absence of recurrent connections. Positive feedback
models, on the other hand, use recurrent excitatory con-
nections and are compatible with experimental ﬁndings
on cortical representation of time (e.g. Namboodiri et al.,
2015). A limitation of these models, however, is that it
is not known whether they can be generalized to
sequences of intervals. Finally, state-dependent networks
of timing and temporal processing are based on the
hypothesis that sensory events interact with current
states of recurrent networks to form a sequence of net-
work states that encode each event in the context of
recent stimulus history (Karmarkar and Buonomano,
2007). Several studies have demonstrated that cortical
networks can be trained to represent time intervals in
the hundreds-of-milliseconds range, where timing is proposed
to emerge from network-wide and pathway-specific
changes in evoked neural dynamics (e.g. Goel
and Buonomano, 2016).
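The synfire chain is the easiest of these models to sketch. In the idealized version below, each group of neurons is reduced to a single binary unit, activity passes to the next group after a fixed delay, and elapsed time is read out from which group is currently active. The 10 ms step, the chain length, and both function names are assumptions made for illustration. Real synfire chains involve spiking populations with variable transmission delays.

```python
import numpy as np


def synfire_activity(n_groups=20, step_ms=10, duration_ms=200):
    """Binary activity matrix (time step x group) for an idealized synfire chain."""
    n_steps = duration_ms // step_ms
    activity = np.zeros((n_steps, n_groups), dtype=int)
    active = 0                                # group 0 fires at interval onset
    for step in range(n_steps):
        if active < n_groups:
            activity[step, active] = 1
            active += 1                       # feed-forward: hand activity to the next group
    return activity


def decode_elapsed_ms(population_vector, step_ms=10):
    """Read out elapsed time from the active group (None once the chain has ended)."""
    idx = np.flatnonzero(population_vector)
    return int(idx[0]) * step_ms if idx.size else None


chain = synfire_activity()
print(decode_elapsed_ms(chain[7]))            # elapsed time at the eighth time step

short_chain = synfire_activity(n_groups=5)
print(decode_elapsed_ms(short_chain[10]))     # chain has run out: no readout
```

The second call shows one practical consequence of a purely feed-forward design: once activity has passed through the last group, the sketch carries no further information about elapsed time.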
Although the notion of population clocks is gaining traction
(Hardy and Buonomano, 2016), there is no compelling
biologically plausible model that generalizes these results
from studies based on computation of single time inter-
vals to complex sequences such as those observed in
music. Natural motor commands as well as sound
sequences like speech and music consist of dynamically
varying time intervals with diﬀerent temporal structures.
Several of the circuits reviewed above are specialized
for processing time on the scale of tens of milliseconds
to a few seconds, but it is not yet clear which of these
mechanisms apply in the context of beat perception as
this has not directly been tested. Integrating basic mech-
anisms of sound processing observed along the auditory
pathways with models of timing may provide some novel
insights into the examination of pattern timing.
Timing functions are distributed across the brain and
are expressed in subcortical motor structures like the
basal ganglia and the cerebellum, sensory and motor
cortices, as well as higher order areas like the parietal
and frontal cortical networks. While the timing
computations performed by each of these individual
brain regions are not fully understood, it is evident that
particular areas are specialized for mediating speciﬁc
timing functions. Future research may beneﬁt from
dissecting the precise role of each brain area, both individually and as
part of a distributed timing network.
CONCLUSIONS AND OUTLOOK
A lot of ground has been covered in this review, largely
because the work comprising each section draws from a
diﬀerent ﬁeld (or several diﬀerent ﬁelds) of research that
have so far shown little overlap with the others. This is
despite these topics sharing common themes.
For example, the timescales that are relevant in
music are also relevant in other contexts such as in the
production and perception of movement (walking,
running, breathing) and speech, and in the parsing of
complex acoustic scenes (see Fig. 1B). Furthermore,
the entrainment of neural oscillations through
sensorimotor loops may be a central mechanism
governing perception and action in all of these contexts.
By presenting an overview of these diverse topics that
likely rely on similar temporal encoding mechanisms, we
hope that this review will provide an insightful point of
departure for future investigations into auditory temporal
We conclude by leaving the reader with an open
question that we believe will be pivotal to advancing our
understanding of temporal sequence processing,
namely a mechanistic understanding of the entrainment
of neural oscillations. While a large body of work points
to the importance of neural oscillations (the studies
mentioned in the second half of this review only scratch
the surface), this topic is nevertheless not without
controversy, and many questions remain
unresolved, starting with the functional role that
oscillations in diﬀerent frequency bands play in
information coding and retrieval. A number of theories
have been proposed that describe functional aspects of
oscillatory dynamics, including communication between
neuronal groups through coherence of oscillations
(Fries, 2015), the prioritization of sensory streams
through pulsed inhibition via alpha oscillations (Haegens
et al., 2011; Mathewson et al., 2011; Jensen et al.,
2014; Strauß et al., 2014), the retrieval of memories
through spiking that is phase-locked to theta oscillations
(Hsieh and Ranganath, 2014), and the active sensing of
sound through rhythmic temporal priors provided by the
motor system (Morillon et al., 2015). Of particular rele-
vance are the behavioral (Morillon et al., 2016) and neu-
ronal dissociations (Breska and Deouell, 2017) that
have been observed between auditory sequences that
are periodic versus temporally predictable but not peri-
odic, suggesting that the underlying neural dynamics
manifest diﬀerently according to the nature of the tempo-
ral predictions being maintained. Further work is required
to develop a uniﬁed understanding of the function served
by the entrainment of neural oscillations.
A second question relates to the dynamics of
entrainment itself, speciﬁcally how entrainment arises
and unfolds, how it extracts higher-order temporal
regularities in a rhythmic sequence, how it behaves in
response to new sensory input, and how possibly
diﬀerent and simultaneous processes interact to guide
what we perceive. Much of our current knowledge about
the role of neural oscillations and entrainment in the
perception of temporally structured stimuli is based on
the interpretation of data obtained with non-invasive
techniques (EEG, MEG, fMRI), which lack the ﬁne
resolution required to provide insights into these
phenomena at the level of individual neurons and neural
networks. Deeper insights will need data obtained at
higher spatial resolution, as is typically obtained from
invasive recordings in experimental animals, but the use
of richly structured auditory stimuli such as music in
animal electrophysiology experiments remains highly
unusual (see Rajendran et al., 2017 for a first
step in this direction). However, we would suggest that
employing music, in addition to traditional paradigms,
may be especially fruitful since much is known already
about our perception of music (see the ﬁrst two sections
of this review), and because it is a ﬁnely controllable stim-
ulus paradigm within which nested periodicities across dif-
ferent sound features (frequency, loudness, duration) can
be simultaneously present and tuned. Furthermore, we
suggest that, since nonhuman model organisms do show
some capacity to perceive and discriminate rhythms (see
the section on Beat Processing in Nonhuman Species),
and since recognizing rhythmic patterns in environmental
sounds such as footsteps or vocalizations is of great
importance to a wide range of organisms, complementary
studies in nonhuman species should begin to ﬁll in the
gaps in our knowledge that non-invasive psychoacoustic
and physiological studies on humans alone cannot
Acknowledgments—VGR (Wellcome Trust Doctoral Programme
in Neuroscience: WT099750MA) and ST (Sir Henry Wellcome
Postdoctoral Fellowship: WT106084/Z/14/Z) are supported by
the Wellcome Trust.
Abecasis D (2005) Diﬀerential brain response to metrical accents in
isochronous auditory sequences. Music Percept 22:549–562.
Adank P, Janse E (2009) Perceptual learning of time-compressed
and natural fast speech. J Acoust Soc Am 126:2649–2659.
Agus TR, Thorpe SJ, Pressnitzer D (2010) Rapid formation of robust
auditory memories: insights from noise. Neuron 66:610–618.
Ahissar E, Nagarajan S, Ahissar M, Protopapas A, Mahncke H,
Merzenich MM (2001) Speech comprehension is correlated with
temporal response patterns recorded from auditory cortex. Proc
Natl Acad Sci USA 98:13367–13372.
Andreou L-V, Kashino M, Chait M (2011) The role of temporal
regularity in auditory segregation. Hear Res 280:8.
Arnal LH (2012) Predicting ‘‘When” Using the Motor System’s Beta-
Band Oscillations. Front Hum Neurosci 6.
Arnal LH, Giraud A-L (2012) Cortical oscillations and sensory
predictions. Trends Cogn Sci 16:390–398.
Artieda J, Pastor MA, Lacruz F, Obeso JA (1992) Temporal
discrimination is abnormal in Parkinson’s disease. Brain
Aschersleben G (2002) Temporal control of movements in
sensorimotor synchronization. Brain Cogn. 48:66–79.
Bakhurin KI, Goudar V, Shobe JL, Claar LD, Buonomano DV,
Masmanidis SC (2016) Diﬀerential encoding of time by prefrontal
and striatal network dynamics. J Neurosci. 1789–16.
Barascud N, Pearce MT, Griﬃths TD, Friston KJ, Chait M (2016)
Brain responses in humans reveal ideal observer-like sensitivity to
complex acoustic patterns. Proc Natl Acad Sci USA 113:
Bartolo R, Merchant H (2015) Beta oscillations are linked to the
initiation of sensory-cued movement sequences and the internal
guidance of regular tapping in the monkey. J Neurosci
Bartolo R, Prado L, Merchant H (2014) Information processing in the
primate basal ganglia during sensory-guided and internally driven
rhythmic tapping. J Neurosci 34:3910–3923.
Bååth R (2015) Subjective rhythmization. Music Percept 33:244–254.
Bendixen A (2014) Predictability effects in auditory scene analysis: a
review. Front Neurosci 8:60.
Bendixen A, Denham SL, Gyimesi K, Winkler I (2010) Regular
patterns stabilize auditory streams. J Acoust Soc Am 128:3658.
Besle J, Schevon CA, Mehta AD, Lakatos P, Goodman RR, McKhann
GM, Emerson RG, Schroeder CE (2011) Tuning of the human
neocortex to the temporal dynamics of attended events. J
Bégel V, Benoit C-E, Correa Á, Cutanda D, Kotz SA, Bella SD (2017)
“Lost in time” but still moving to the beat. Neuropsychologia
Bolger D, Trost W, Schön D (2013) Rhythm implicitly affects temporal
orienting of attention across modalities. Acta Psychol.
Bolton TL (1894) Rhythm. Am J Psychol 6:145.
Bouwer FL, van Zuijen TL, Honing H (2014) Beat processing is pre-
attentive for metrically simple rhythms with clear accents: An ERP
study Johnson B, ed. PLoS ONE 9:e97467.
Braitenberg V (1967) Is the cerebellar cortex a biological clock in the
millisecond range? In: The Cerebellum, pp 334–346 Progress in
Brain Research. Elsevier.
Bregman AS, Campbell J (1971) Primary auditory stream segregation
and perception of order in rapid sequences of tones. J Exp
Breska A, Deouell LY (2017) Neural mechanisms of rhythm-based
temporal prediction: delta phase-locking reflects temporal
predictability but not rhythmic entrainment Poeppel D, ed. PLoS
Brochard R, Abecasis D, Potter D, Ragot R, Drake C (2003) The
“Ticktock” of our internal clock: direct brain evidence of subjective
accents in isochronous sequences. Psychol Sci 14:362–366.
Bueti D (2011) The sensory representation of time. Front Integr
Bueti D, Bahrami B, Walsh V (2008) Sensory and association cortex
in time perception. J Cogn Neurosci 20:1054–1062.
Bueti D, Macaluso E (2010) Auditory temporal expectations modulate
activity in visual cortex. NeuroImage 51:1168–1183.
Buhusi CV, Meck WH (2005) What makes us tick? Functional and
neural mechanisms of interval timing. Nat Rev Neurosci
Calderone DJ, Lakatos P, Butler PD, Castellanos FX (2014)
Entrainment of neural oscillations as a modifiable substrate of
attention. Trends Cogn Sci 18:300–309.
Chait M, Poeppel D, de Cheveigné A, Simon JZ (2007) Processing
asymmetry of transitions between order and disorder in human
auditory cortex. J Neurosci 27:5207–5214.
Chandrasekaran C, Trubanova A, Stillittano S, Caplier A, Ghazanfar
AA (2009) The natural statistics of audiovisual speech Friston KJ,
ed. PLoS Comput Biol 5:e1000436.
Chapin HL, Zanto T, Jantzen KJ, Kelso SJA, Steinberg F, Large EW
(2010) Neural responses to complex auditory rhythms: the role of
attending. Front Psychol 1.
Chen JL, Penhune VB, Zatorre RJ (2008) Moving on time: brain
network for auditory-motor synchronization is modulated by
rhythm complexity and musical training. J Cogn Neurosci
Chi T, Gao Y, Guyton MC, Ru P, Shamma S (1999) Spectro-temporal
modulation transfer functions and speech intelligibility. J Acoust
Soc Am 106:2719–2732.
Chiba A, Oshio KI, Inase M (2015) Neuronal representation of
duration discrimination in the monkey striatum. Physiol Rep 3:
Clarke EF (1999) Rhythm and timing in music. In: The psychology of
music. Elsevier. p. 473–500.
Cook P, Rouse A, Wilson M, Reichmuth C (2013) A California sea
lion (Zalophus californianus) can keep the beat: motor
entrainment to rhythmic auditory stimuli in a non vocal mimic. J
Comp Psychol 127:412–427.
Costa-Faidella J, Baldeweg T, Grimm S, Escera C (2011)
Interactions between “What” and “When” in the Auditory
System: Temporal Predictability Enhances Repetition
Suppression. J Neurosci 31:18590–18597.
Cowan N (1984) On short and long auditory stores. Psychol Bull
Denham SL, Winkler I (2006) The role of predictive models in the
formation of auditory streams. J Physiol Paris 100:154–170.
Ding N, Patel AD, Chen L, Butler H, Luo C, Poeppel D (2017)
Temporal modulations in speech and music. Neurosci Biobehav
Ding N, Simon JZ (2012) Emergence of neural encoding of auditory
objects while listening to competing speakers. Proc Natl Acad Sci
Ding N, Simon JZ (2013) Adaptive temporal encoding leads to a
background-insensitive cortical representation of speech. J
Doelling KB, Poeppel D (2015) Cortical entrainment to music and its
modulation by expertise. Proc Natl Acad Sci USA 112:
Elliott TM, Theunissen FE (2009) The modulation transfer function for
speech intelligibility Friston KJ, ed. PLoS Comput Biol 5:
Essens PJ, Povel DJ (1985) Metrical and nonmetrical representations
of temporal patterns. Percept Psychophys 37:1–7.
Finnerty GT, Shadlen MN, Jazayeri M, Nobre AC, Buonomano DV
(2015) Time in cortical circuits. J Neurosci 35:13912–13916.
Fishman YI, Micheyl C, Steinschneider M (2012) Neural mechanisms
of rhythmic masking release in monkey primary auditory cortex:
implications for models of auditory scene analysis. J Neurophysiol
Fitch WT (2000) The evolution of speech: a comparative review.
Trends Cogn Sci 4:258–267.
Fitch WT (2006) The biology and evolution of music: a comparative
perspective. Cognition 100:173–215.
Fraisse P (1963) The psychology of time. Harper & Row.
Fraisse P (1978) Time and rhythm perception. In: Carterette E,
Friedman M, editors. Handbook of perception, Vol. VIII. New
York: Academic Press. p. 203–254.
Fraisse P (1982) Rhythm and tempo. In: The psychology of
Fraisse P, Oléron G, Paillard J (1958) Sur les repères sensoriels qui
permettent de contrôler les mouvements d’accompagnement de
stimuli périodiques. psy 58:321–338.
Fries P (2015) Rhythms for cognition: communication through
coherence. Neuron 88:220–235.
Fujioka T, Ross B, Trainor LJ (2015) Beta-band oscillations represent
auditory beat and its metrical hierarchy in perception and imagery.
J Neurosci 35:15187–15198.
Fujioka T, Trainor LJ, Large EW, Ross B (2009) Beta and gamma
rhythms in human auditory cortex during musical beat processing.
Ann N Y Acad Sci 1169:89–92.
Geiser E, Sandmann P, Jäncke L, Meyer M (2010) Refinement of
metre perception - training increases hierarchical metre
processing. Eur J Neurosci 32:1979–1985.
Geiser E, Walker KMM, Bendor D (2014) Global timing: a conceptual
framework to investigate the neural basis of rhythm perception in
humans and non-human species. Front Psychol 5:159.
Gershman SJ, Moustafa AA, Ludvig EA (2014) Time representation
in reinforcement learning models of the basal ganglia. Front
Comput Neurosci 7.
Ghitza O (2011) Linking speech perception and neurophysiology:
speech decoding guided by cascaded oscillators locked to the
input rhythm. Front Psychol 2.
Ghitza O, Greenberg S (2009) On the possible role of brain rhythms
in speech perception: intelligibility of time-compressed speech
with periodic and aperiodic insertions of silence. Phonetica
Giraud A-L, Poeppel D (2012) Cortical oscillations and speech
processing: emerging computational principles and operations.
Nat Neurosci 15:511–517.
Goel A, Buonomano DV (2014) Timing as an intrinsic property of
neural networks: evidence from in vivo and in vitro experiments.
Phil Trans R Soc B 369:20120460.
Goel A, Buonomano DV (2016) Temporal interval learning in cortical
cultures is encoded in intrinsic network dynamics. Neuron
Gomez-Ramirez M, Kelly SP, Molholm S, Sehatpour P, Schwartz TH,
Foxe JJ (2011) Oscillatory sensory selection mechanisms during
intersensory attention to rhythmic auditory and visual inputs: a
human electrocorticographic investigation. J Neurosci
Gouvêa TS, Monteiro T, Motiwala A, Soares S, Machens C, Paton JJ
(2015) Striatal dynamics explain duration judgments. eLife
Grahn JA (2009) The role of the basal ganglia in beat perception. Ann
N Y Acad Sci 1169:35–45.
Grahn JA, Brett M (2007) Rhythm and beat perception in motor areas
of the brain. J Cogn Neurosci 19:893–906.
Grahn JA, McAuley JD (2009) Neural bases of individual differences
in beat perception. NeuroImage 47:1894–1903.
Grahn JA, Rowe JB (2013) Finding and feeling the musical beat:
striatal dissociations between detection and prediction of
regularity. Cereb Cortex 23:913–921.
Greenberg S, Arai T, Kingsbury B, Morgan N, Shire M, Silipo R, Wu
SL (1999) Syllable-based speech recognition using auditory like
features. J Acoust Soc Am 105:1157–1158.
Grube M, Griffiths TD (2009) Metricality-enhanced temporal encoding
and the subjective perception of rhythmic sequences. Cortex
Gu BM, Cheng RK, Yin B, Meck WH (2011) Quinpirole-induced
sensitization to noisy/sparse periodic input: temporal
synchronization as a component of obsessive-compulsive
disorder. Neuroscience 179:143–150.
Haegens S, Nácher V, Luna R (2011) α-Oscillations in the monkey
sensorimotor network influence discrimination performance by
rhythmical inhibition of neuronal spiking.
Hardy NF, Buonomano DV (2016) Neurocomputational models of
interval and pattern timing. Curr Opin Behav Sci 8:250–257.
Haß J, Blaschke S, Rammsayer T, Herrmann JM (2008) A
neurocomputational model for optimal temporal processing. J
Comput Neurosci 25:449–464.
Hattori Y, Tomonaga M, Matsuzawa T (2013) Spontaneous
synchronized tapping to an auditory rhythm in a chimpanzee.
Sci Rep 3:1566.
Hauser MD, McDermott J (2003) The evolution of the music faculty: a
comparative perspective. Nat Neurosci 6:663–668.
Hayashi MJ, Ditye T, Harada T, Hashiguchi M, Sadato N, Carlson S,
Walsh V, Kanai R (2015) Time adaptation shows duration
selectivity in the human parietal cortex Zatorre R, ed. PLoS Biol
Henry MJ, Herrmann B (2014) Low-frequency neural oscillations
support dynamic attending in temporal context. Timing Time
Henry MJ, Herrmann B, Obleser J (2014) Entrained neural
oscillations in multiple frequency bands comodulate behavior.
Proc Natl Acad Sci USA 111:14935–14940.
Hickok G, Buchsbaum B, Humphries C, Muftuler T (2003) Auditory-
motor interaction revealed by fMRI: speech, music, and working
memory in area Spt. J Cogn Neurosci 15:673–682.
Honing H (2012) Without it no music: beat induction as a fundamental
musical trait. Ann N Y Acad Sci 1252:85–91.
Honing H, Bouwer FL, Háden GP (2014) Perceiving Temporal
Regularity in Music: The Role of Auditory Event-Related
Potentials (ERPs) in Probing Beat Perception. In: Neurobiology
of Interval Timing, pp 305–323 Advances in Experimental
Medicine and Biology. New York, NY: Springer.
Honing H, Merchant H (2014) Differences in auditory timing between
human and nonhuman primates. Behav Brain Sci 37:557–558.
Honing H, Merchant H, Háden GP, Prado L, Bartolo R (2012) Rhesus
monkeys (Macaca mulatta) detect rhythmic groups in music, but
not the beat Larson CR, ed. PLoS ONE 7:e51369.
Hsieh L-T, Ranganath C (2014) Frontal midline theta oscillations
during working memory maintenance and episodic encoding and
retrieval. NeuroImage 85:721–729.
Iversen JR, Repp BH, Patel AD (2009) Top-down control of rhythm
perception modulates early auditory responses. Ann N Y Acad Sci
Ivry RB, Schlerf JE (2008) Dedicated and intrinsic models of time
perception. Trends Cogn Sci 12:273–280.
Jacobson GA, Rokni D, Yarom Y (2008) A model of the olivo-
cerebellar system as a temporal pattern generator. Trends
Jacoby N, McDermott JH (2017) Integer ratio priors on musical
rhythm revealed cross-culturally by iterated reproduction. Curr
Jazayeri M, Shadlen MN (2015) A neural mechanism for sensing and
reproducing a time interval. Curr Biol 25:2599–2609.
Jensen O, Gips B, Bergmann TO, Bonnefond M (2014) Temporal
coding organized by coupled alpha and gamma oscillations
prioritize visual processing. Trends Neurosci:1–14.
Jin DZ, Fujii N, Graybiel AM (2009) Neural representation of time in
cortico-basal ganglia circuits. Proc Natl Acad Sci USA
Joseph S, Teki S, Kumar S, Husain M, Griffiths TD (2016) Resource
allocation models of auditory working memory. Brain Res
Kaernbach C (2004) The memory of noise. Exp Psychol 51:240–248.
Karmarkar UR, Buonomano DV (2007) Timing in the absence of
clocks: encoding time in neural network states. Neuron
Khouri L, Nelken I (2015) Detecting the unexpected. Curr Opin
Kornysheva K, Anshelm-Schiffer von A-M, Schubotz RI (2010)
Inhibitory stimulation of the ventral premotor cortex temporarily
interferes with musical beat rate preference. Hum Brain Mapp
Kornysheva K, Schubotz RI (2011) Impairment of auditory-motor
timing and compensatory reorganization after ventral premotor
cortex stimulation Tsakiris M, ed. PLoS ONE 6:e21421.
Kumar S, Bonnici HM, Teki S, Agus TR, Pressnitzer D, Maguire EA,
Griffiths TD (2014) Representations of specific acoustic patterns
in the auditory cortex and hippocampus. Proc Biol Sci
Kung S-J, Chen JL, Zatorre RJ, Penhune VB (2013) Interacting
cortical and basal ganglia networks underlying finding and tapping
to the musical beat. J Cogn Neurosci 25:401–420.
Lakatos P (2005) An oscillatory hierarchy controlling neuronal
excitability and stimulus processing in the auditory cortex. J
Lakatos P, Karmos G, Mehta AD, Ulbert I, Schroeder CE (2008)
Entrainment of neuronal oscillations as a mechanism of
attentional selection. Science 320:110–113.
Lakatos P, Musacchia G, O’Connell MN, Falchier AY, Javitt DC,
Schroeder CE (2013) The spectrotemporal filter mechanism of
auditory selective attention. Neuron 77:750–761.
Lakatos P, O’Connell MN, Barczak A, Mills A, Javitt DC, Schroeder
CE (2009) The leading sense: supramodal control of
neurophysiological context by attention. Neuron 64:419–430.
Large EW, Herrera JA, Velasco MJ (2015) Neural networks for beat
perception in musical rhythm. Front Systems Neurosci
Large EW, Jones MR (1999) The dynamics of attending: how people
track time-varying events. Psychol Rev 106:119–159.
Lawrance ELA, Harper NS, Cooke JE, Schnupp JWH (2014)
Temporal predictability enhances auditory detection. J Acoust
Soc Am 135:EL357–EL363.
Leon MI, Shadlen MN (2003) Representation of time by neurons in
the posterior parietal cortex of the Macaque. Neuron 38:
Lerdahl F, Jackendoff R (1983) A generative theory of tonal
music. MIT Press.
Lewis PA, Miall RC (2003) Brain activation patterns during
measurement of sub- and supra-second intervals.
Llinas R, Baker R, Sotelo C (1974) Electrotonic coupling between
neurons in cat inferior olive. J Neurophysiol.
Llinas R, Yarom Y (1981) Electrophysiology of mammalian inferior
olivary neurones in vitro. Different types of voltage-dependent
ionic conductances. J Physiol 315:549–567.
Loveless N, Levänen S, Jousmäki V, Sams M, Hari R (1996)
Temporal integration in auditory sensory memory: neuromagnetic
evidence. Electroencephalography Clinical Neurophysiology/
Evoked Potentials Section 100:220–228.
Luo H, Poeppel D (2007) Phase patterns of neuronal responses
reliably discriminate speech in human auditory cortex. Neuron
Lusk NA, Petter EA, MacDonald CJ, Meck WH (2016) Cerebellar,
hippocampal, and striatal time cells. Curr Opin Behav Sci
MacDonald CJ, Lepage KQ, Eden UT, Eichenbaum H (2011)
Hippocampal “Time Cells” bridge the gap in memory for
discontiguous events. Neuron 71:737–749.
Madison G, Gouyon F, Ullén F, Hörnström K (2011) Modeling the
tendency for music to induce movement in humans: first
correlations with low-level audio descriptors across music
genres. J Exp Psychol Hum Percept Perform 37:1578–1594.
Malmierca M (2014) Neuronal adaptation, novelty detection and
regularity encoding in audition. Front Syst Neurosci 8.
Matell MS, Meck WH (2004) Cortico-striatal circuits and interval
timing: coincidence detection of oscillatory processes. Cogn Brain
Mathewson KE, Lleras A, Beck DM, Fabiani M, Ro T, Gratton G
(2011) Pulsed out of awareness: EEG alpha oscillations represent
a pulsed-inhibition of ongoing cortical processing. Front Psychol
Mathy A, Ho SSN, Davie JT, Duguid IC, Clark BA, Häusser M (2009)
Encoding of oscillations by axonal bursts in inferior olive neurons.
McAuley JD, Henry MJ, Tkach J (2012) Tempo mediates the
involvement of motor areas in beat perception. Ann N Y Acad
McAuley JD, Jones MR, Holub S, Johnston HM, Miller NS (2006) The
time of our lives: life span development of timing and event
tracking. J Exp Psychol Gen 135:348–367.
McDermott JH, Simoncelli EP (2011) Sound texture perception via
statistics of the auditory periphery: evidence from sound
synthesis. Neuron 71:926–940.
Mello GBM, Soares S, Paton JJ (2015) A scalable population code for
time in the striatum. Curr Biol 25:1113–1122.
Merchant H, Grahn J, Trainor L, Rohrmeier M, Fitch WT (2015)
Finding the beat: a neural perspective across humans and non-
human primates. Phil Trans R Soc B 370:20140093.
Merchant H, Harrington DL, Meck WH (2013) Neural basis of the
perception and estimation of time. Annu Rev Neurosci
Miall C (1989) The storage of time intervals using oscillating neurons.
Neural Comput. 1:359–371.
Morillon B, Hackett TA, Kajikawa Y, Schroeder CE (2015) Predictive
motor control of sensory dynamics in auditory active sensing. Curr
Opin Neurobiol 31:230–238.
Morillon B, Schroeder CE, Wyart V, Arnal LH (2016) Temporal
prediction in lieu of periodic stimulation. J Neurosci
Namboodiri VMK, Huertas MA, Monk KJ, Shouval HZ, Shuler MG
(2015) Visually cued action timing in the primary visual cortex.
Nieto-Diego J, Malmierca MS (2016) Topographic distribution of
stimulus-specific adaptation across auditory cortical fields in the
anesthetized rat Zatorre R, ed. PLoS Biol 14:e1002397.
Nobre AC, Correa A, Coull JT (2007) The hazards of time. Curr Opin
Nolan F, Jeon HS (2014) Speech rhythm: a metaphor? Phil Trans R
Soc B 369:20130396.
Nozaradan S, Peretz I, Missal M, Mouraux A (2011) Tagging the
neuronal entrainment to beat and meter. J Neurosci.
Oullier O, Jantzen KJ, Steinberg FL, Kelso JAS (2005) Neural
substrates of real and imagined sensorimotor coordination. Cereb
Parsons LM (2001) Exploring the functional neuroanatomy of music
performance, perception, and comprehension. Ann N Y Acad Sci
Patel AD (2007) Music, language, and the brain. Oxford University
Patel AD (2011) Why would musical training benefit the neural
encoding of speech? The OPERA hypothesis. Front Psychol 2.
Patel AD, Iversen JR (2014) The evolutionary neuroscience of
musical beat perception: the Action Simulation for Auditory
Prediction (ASAP) hypothesis. Front Syst Neurosci.
Patel AD, Iversen JR, Bregman MR, Schulz I (2009) Experimental
evidence for synchronization to a musical beat in a nonhuman
animal. Curr Biol 19:827–830.
Peckel M, Pozzo T, Bigand E (2014) The impact of the perception of
rhythmic music on self-paced oscillatory movements. Front
Peelle JE, Davis MH (2012) Neural oscillations carry speech rhythm
through to comprehension. Front Psychol 3.
Phillips-Silver J, Toiviainen P, Gosselin N, Piché O, Nozaradan S,
Palmer C, Peretz I (2011) Born to dance but beat deaf: a new form
of congenital amusia. Neuropsychologia 49:961–969.
Poeppel D (2003) The analysis of speech in different temporal
integration windows: cerebral lateralization as “asymmetric
sampling in time”. Speech Commun. 41:245–255.
Povel D-J, Essens P (1985) Perception of temporal patterns. Music
Pressnitzer D, Hupé J-M (2006) Temporal dynamics of auditory and
visual bistability reveal common principles of perceptual
organization. Curr Biol 16:1351–1357.
Rajendran VG, Harper NS, Abdel-Latif KHA, Schnupp JWH (2016)
Rhythm facilitates the detection of repeating sound patterns.
Front Neurosci 10:464–467.
Rajendran VG, Harper NS, Garcia-Lazaro JA, Lesica NA, Schnupp
JWH (2017) Midbrain adaptation may set the stage for the
perception of musical beat. Proc Biol Sci 284:1455.
Rajendran VG, Harper NS, Willmore BD, Hartmann WM, Schnupp
JWH (2013) Temporal predictability as a grouping cue in the
perception of auditory streams. J Acoust Soc Am 134:
Rajendran VG, Teki S (2016) Periodicity versus prediction in sensory
perception. J Neurosci 36:7343–7345.
Rao RPN, Eagleman DM, Sejnowski TJ (2001) Optimal smoothing in
visual motion perception. Neural Comput. 13:1243–1253.
Repp BH (2002a) Perception of timing is more context sensitive than
sensorimotor synchronization. Percept Psychophys 64:703–716.
Repp BH (2002b) Automaticity and voluntary control of phase
correction following event onset shifts in sensorimotor
synchronization. J Exp Psychol Hum Percept Perform
Repp BH (2005) Sensorimotor synchronization: a review of the
tapping literature. Psychon Bull Rev 12:969–992.
Repp BH, Keller PE (2004) Adaptation to tempo changes in
sensorimotor synchronization: effects of intention, attention, and
awareness. Q J Exp Psychol A 57:499–521.
Repp BH, Su Y-H (2013) Sensorimotor synchronization: a review of
recent research (2006–2012). Psychon Bull Rev 20:403–452.
Riecker A, Wildgruber D, Mathiak K, Grodd W, Ackermann H (2003)
Parametric analysis of rate-dependent hemodynamic response
functions of cortical and subcortical brain structures during
auditorily cued finger tapping: a fMRI study. NeuroImage
Rohenkohl G, Gould IC, Pessoa J, Nobre AC (2014) Combining
spatial and temporal expectations to improve visual perception. J
Rouse AA, Cook PF, Large EW, Reichmuth C (2016) Beat keeping in
a sea lion as coupled oscillation: implications for comparative
understanding of human rhythm. Front Neurosci 10:403.
Sakai K, Hikosaka O, Miyauchi S, Takino R, Tamada T, Iwata NK,
Nielsen M (1999) Neural representation of a rhythm depends on
its interval ratio. J Neurosci 19:10074–10081.
Schaefer RS, Vlek RJ, Desain P (2010) Decomposing rhythm
processing: electroencephalography of perceived and self-
imposed rhythmic patterns. Psychol Res 75:95–106.
Schneider BA, Ghose GM (2012) Temporal production signals in
parietal cortex Pack CC, ed. PLoS Biol 10:e1001413.
Schroeder CE, Lakatos P (2009) Low-frequency neuronal oscillations
as instruments of sensory selection. Trends Neurosci. 32:
Schroeder CE, Wilson DA, Radman T, Scharfman H, Lakatos P
(2010) Dynamics of active sensing and perceptual selection. Curr
Opin Neurobiol 20:172–176.
Schwartze M, Farrugia N, Kotz SA (2013) Dissociation of formal and
temporal predictability in early auditory evoked potentials.
Selezneva E, Deike S, Knyazeva S, Scheich H, Brechmann A,
Brosch M (2013) Rhythm sensitivity in macaque monkeys. Front
Syst Neurosci 7.
Shamma SA, Elhilali M, Micheyl C (2011) Temporal coherence and
attention in auditory scene analysis. Trends Neurosci.
Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M (1995)
Speech recognition with primarily temporal cues. Science
Shuler MG (2016) Timing in the visual cortex and its investigation.
Curr Opin Behav Sci 8:73–77.
Snyder JS, Large EW (2005) Gamma-band activity reflects the metric
structure of rhythmic tone sequences. Cogn Brain Res
Soares S, Atallah BV, Paton JJ (2016) Midbrain dopamine neurons
control judgment of time. Science 354:1273–1277.
Sowiński J, Bella SD (2013) Poor synchronization to the beat may
result from deficient auditory-motor mapping. Neuropsychologia
Strauß A, Wöstmann M, Obleser J (2014) Cortical alpha oscillations
as a tool for auditory selective inhibition. Front Hum Neurosci
Styns F, van Noorden L, Moelants D, Leman M (2007) Walking on
music. Hum Mov Sci 26:769–785.
Teki S (2014) Beta drives brain beats. Front Syst Neurosci 8:743.
Teki S (2016) A citation-based analysis and review of
significant papers on timing and time perception. Front Neurosci
Teki S, Griffiths TD (2014) Working memory for time intervals in
auditory rhythmic sequences. Front Psychol 5:1329.
Teki S, Griffiths TD (2016) Brain bases of working memory for time
intervals in rhythmic sequences. Front Neurosci 10:743.
Teki S, Grube M, Griffiths TD (2012) A unified model of time
perception accounts for duration-based and beat-based timing
mechanisms. Front Integr Neurosci 5.
Teki S, Grube M, Kumar S, Griffiths TD (2011) Distinct neural
substrates of duration-based and beat-based auditory timing. J
Teki S, Kononowicz TW (2016) Commentary: beta-band oscillations
represent auditory beat and its metrical hierarchy in perception
and imagery. Front Neurosci 10:743.
Tervaniemi M, Hugdahl K (2003) Lateralization of auditory-cortex
functions. Brain Res Rev 43:231–246.
Tierney A, Kraus N (2013) Neural responses to sounds presented on
and off the beat of ecologically valid music. Front Syst Neurosci 7.
Todd NPM, Lee CS (2015) The sensory-motor theory of rhythm and
beat induction 20 years on: a new synthesis and future
perspectives. Front Hum Neurosci 9:357.
Trost W, Frühholz S, Schön D, Labbé C, Pichon S, Grandjean D,
Vuilleumier P (2014) Getting the beat: entrainment of brain activity
by musical rhythm and pleasantness. NeuroImage 103:55–64.
Turgeon M, Bregman AS, Ahad PA (2002) Rhythmic masking
release: contribution of cues for perceptual organization to the
cross-spectral fusion of concurrent narrow-band noises. J Acoust
Soc Am 111:1819–1831.
Turgeon M, Bregman AS, Roberts B (2005) Rhythmic masking
release: effects of asynchrony, temporal overlap, harmonic
relations, and source separation on cross-spectral grouping. J
Exp Psychol Hum Percept Perform 31:939–953.
van Noorden L (1975) Temporal coherence in the perception of tone
sequences. Doctoral Thesis.
van Noorden L, Moelants D (1999) Resonance in the perception of
musical pulse. J New Music Res 28:43–66.
Vuust P, Witek MAG (2014) Rhythmic complexity and predictive
coding: a novel approach to modeling rhythm and meter
perception in music. Front Psychol 5:273.
Welsh JP, Lang EJ, Sugihara I, Llinás R (1995) Dynamic organization
of motor control within the olivocerebellar system. Nature
Winkler I, Denham S, Mill R, Bohm TM, Bendixen A (2012)
Multistability in auditory stream segregation: a predictive coding
view. Philos Trans Royal Soc B: Biol Sci 367:1001–1012.
Winkler I, Denham SL, Nelken I (2009a) Modeling the auditory scene:
predictive regularity representations and perceptual objects.
Trends Cogn Sci 13:532–540.
Winkler I, Háden GP, Ladinig O, Sziller I, Honing H (2009b) Newborn
infants detect the beat in music. Proc Natl Acad Sci USA
Witek MAG, Clarke EF, Wallentin M, Kringelbach ML, Vuust P (2014)
Syncopation, body-movement and pleasure in groove music
Canal-Bruland R, ed. PLoS ONE 9:e94446.
Wu X, Ashe J, Bushara KO (2011) Role of olivocerebellar system in
timing without awareness. Proc Natl Acad Sci USA
Xu D (2006) Role of the olivo-cerebellar system in timing. J Neurosci
Yarom Y, Cohen D (2002) The olivocerebellar system as a generator
of temporal patterns. Ann N Y Acad Sci 978:122–134.
Zanto TP, Snyder JS, Large EW (2006) Neural correlates of rhythmic
expectancy. Adv Cogn Psychol 2:221–231.
Zatorre RJ, Belin P, Penhune VB (2002) Structure and function of
auditory cortex: music and speech. Trends Cogn Sci 6:37–46.
Zatorre RJ, Chen JL, Penhune VB (2007) When the brain plays
music: auditory-motor interactions in music perception and
production. Nat Rev Neurosci 8:547–558.
Zhou H, Melloni L, Poeppel D, Ding N (2016) Interpretations of
frequency domain analyses of neural entrainment: periodicity,
fundamental frequency, and harmonics. Front Hum Neurosci
(Received 25 August 2017, Accepted 27 October 2017)
(Available online xxxx)