ICMPC17-APSCOM7, Tokyo, August 24-28, 2023
Multiple exposures and repeating musical patterns shape perception of musical
ideas and structure
Stamatia Kalaitzidou (a), Asterios Zacharakis (a),
Konstantinos Velenis (a) and Emilios Cambouropoulos (a)
(a) School of Music Studies, Aristotle University of Thessaloniki
mat.kal@hotmail.com
In this study we look into how repeated listening to a classical music excerpt may affect the perception of its
structure and to what extent types of repetition may shape the perception of musical entities. Thirty-eight
participants with musical training were asked to indicate in real-time the points where they identified the
introduction of ‘musical ideas’ while listening to the exposition of Sonata No. 18 in E flat major by L.V.
Beethoven. The first condition involved listening to an audio recording of the piece (performed by Stephen
Kovacevich), while the second condition involved listening to an inexpressive MIDI-generated audio file. Both
conditions comprised four successive relistenings of the same excerpt during which participants were asked to
repeatedly perform the above task. Our findings suggest that the identification of musical ideas constitutes a viable
means of capturing perceived musical structure. In addition, multiple exposures not only resulted in a
progressively clearer perception of musical structure but also prompted a perceptual organization dominated by
larger structures. Finally, it seems that expressive performance leads to fewer markings of repeated patterns
whereas in inexpressive mechanical renditions listeners rely more on immediate repetitions, thus marking more
beginnings of repeated patterns as musical ideas.
Keywords: musical idea, repetition, multiple exposures, musical structure
1. Introduction
A variety of empirical studies have dealt with the
question of how musical repetition helps identify and
categorize musical elements and how this influences
the perception of musical structure (see Margulis,
2014 for an overview). Repetition often acts as an
umbrella under which the perception of various
musical parameters is studied. Such perspectives
concern either how a stimulus is assigned to
perceptual categories in the brain (Vosniadou, 1989)
or how it is classified together with other stimuli
based on similarities (Ziv & Eitan, 2007).
Cambouropoulos (2006) has studied how repetition
influences the segmentation of the musical surface
and has presented a computational model for melodic
segmentation in relation to parallelism (especially
immediately repeated melodic patterns).
Studies have also looked into the way continuous
repetition perceptually transforms spoken phrases into
music (Deutsch, Henthorn, & Lapidis, 2011). More
specifically, identically repeated spoken phrases were
more likely to be rated as song than as speech. The
same study also showed that when the repeated phrase
was transposed or the order of its syllables was
slightly changed, listeners were less likely to identify
it as song rather than speech. A more recent study
(Simchy-Gross & Margulis, 2018) empirically
examined environmental sounds
which were identically repeated or transformed while
repeated (jumbled order). Results showed that both
forms of repetition transformed the perception of
environmental sounds into music.
Margulis’ (2012) research investigated the types of
repetition that can be identified by listeners and how
these change across multiple exposures of the same
music stimuli. Listeners indicated beginnings of
repetitions by means of pressing a button. Results
showed that repeated exposure to a certain piece of
music facilitated a gradual detection of longer
repeated units and revealed different higher-level
structural relations. In terms of immediately repeating
units, results indicated that repetitions longer than 3.5
seconds were more noticeable than shorter ones, and
that an immediately repeated unit was more noticeable
when occurring within the same phrase rather than as
a beginning of a new phrase (with intervening
closure).
The current research investigates repetition
following a similar strategy to Margulis’ study in
terms of experimental design and methodology (e.g.,
repeated listenings of the music stimulus and real-time
mouse clicks by listeners). It may be argued that one
important challenge in empirically inferring possible
structural effects of musical repetition types is to
obtain perceived musical structures without an
explicit reference to repetition (thus minimising a
possible introduction of bias) during the experimental
process. In this study we opted to ask listeners to identify
“musical ideas” rather than “repetitions” per se.
The term “musical idea” has been adopted by a
number of scholars in the past. It has been suggested
to indicate elements that characterize either a musical
pattern or entire phrase relationships (Boss, 2001, p.
213) and has also been described as a musical pattern
that creates the desire to be heard again (Burnham,
2002, p. 885). Furthermore, the transformation of a
musical idea into an element of musical structure has
been argued to directly depend on its repetition
(Schachter, 1999, p. 74). Given that the concept of
musical idea is closely linked to musical patterns (and
thus repetition and similarity), asking listeners to
identify musical ideas seems like a promising path for
inferring perceived musical structure. In addition, by
obtaining beginnings of musical ideas we are also
aiming to study repetition indirectly. Of course,
musical ideas may be also introduced by other means
(e.g., abrupt change of rhythm, harmony or texture);
this problem, however, may be diminished by
selecting a musical stimulus that is clearly structured
around repetition (see subsection 2.2 and Figure 1).
What is more, the proposed indirect study of
repetition via musical ideas may feature an additional
advantage. It is, generally speaking, good practice to
ask subjects to perform tasks that are as musical and
intuitive as possible. Identifying repetitions explicitly
is a conscious cognitive task that requires retrieval
from memory together with subsequent comparison
and similarity assessment, and is thus potentially
demanding. Indicating ‘musical ideas’, on the other
hand, seems to be a more intuitive task that listeners
may feel more comfortable performing based on
their overall musical
knowledge/understanding. Thus, one general
objective of this preliminary work is to examine
whether it would be possible to capture perceptual
aspects of repetition indirectly via concepts such as
‘musical idea’.
It is worth pointing out that repetition within a
musical work may appear in various guises (e.g.,
immediate or distant, identical or variant repetition)
on both microstructural and macrostructural levels.
These different types of repetition at various
hierarchical levels, in different combinations and
roles may well influence the perception of musical
structure in complex ways; some types of repetition
may assist in establishing independent musical units
perceived at many hierarchical levels, whereas others
remain in the background shaping the motivic identity
and texture of larger entities. A further point of
interest of the current study is to examine whether
some types of repetition are more relevant than others
for establishing ‘musical ideas’.
Finally, a number of additional questions that this
study will attempt to address relate to the repeated
exposure to the musical stimulus in respect to
different levels of expressivity: human performance
vs. a MIDI-generated rendition. Do multiple listenings
of a musical excerpt improve the identification
accuracy of musical ideas? Can the establishment of
higher-level structures (e.g., identification of longer
repeated patterns) that has been suggested by
Margulis (2012) as a consequence of multiple
exposure also emerge through the noting of musical
ideas? And finally, what is the role of expressive
performance on the perception of repetition and
structure?
2. Methods
2.1. Participants
Thirty-eight musically trained listeners took part in
the experiment. The participants were recruited
through the School of Music Studies of the Aristotle
University of Thessaloniki and social networking
platforms. The average musical training was eight
years.
2.2. Materials
The presented excerpt was the exposition (mm. 1-64)
of the Sonata No. 18 in E flat major, Op. 31, No. 3, by
L.V. Beethoven (see Figure 1). The selected excerpt
features most types and modes of repetition (i.e.,
identical, intervallically transferred, spatially varied,
immediate or with an intervening time gap), some of
which occur in different structural parts or in parts
that bear structural ambiguity. This excerpt therefore
establishes repetition as a key element in the
perception of structure at various levels, from its
microstructure to its macrostructure.
Two different types of stimuli originating from this
musical score were utilized for this study. The first,
consisted of a classical recording of the piece by
Stephen Kovacevich while the second was a
computer-generated playback from a MIDI file that
was completely stripped of expressive qualities. The
latter served to neutralize any expressive element that
may facilitate the perception of musical structure by
the listeners.
2.3. Procedure
The experiment was carried out using an online
platform for stimulus presentation and data
acquisition which randomly assigned each participant
to one of the two experimental conditions (human
performer or MIDI). This resulted in 20 participants
for the former and 18 for the latter condition. They all
listened to the two-minute-long music excerpt four
times in succession. The task of the listeners was to
indicate onsets of musical ideas by clicking a button
on the keyboard.
The question was phrased in this way so as to make
clear that what was requested was the recording of the
beginning of complete musical entities. The choice of
the verb “begin” is supported by the parallelism rule
GPR6 by Lerdahl & Jackendoff (1983, p. 51) and also
by Deliège (2001, p. 400) regarding pattern
recognition and the importance of initiation in this
process. Furthermore, Ahlbäck (2004, p. 251) states
that “similarity grouping is a process-oriented from
the beginning/start (of the melodic patterns), as
similarity is recognized through repetition, and the
repetition of what has already been listened to is
identified from the beginning”.
3. Results
Data analysis involved a comparison between the
two experimental conditions (rendition by human
performer and MIDI) and the four listenings of the
excerpt. This quantitative approach was
complemented by a theoretical qualitative analysis of
the musical work in relation to the listeners’
responses.
The number of clicks per piece was represented as
frequency histograms with 1-second bins. The
window of 1 second was deemed optimal to represent
the reaction times of the listeners, since both shorter
(.5 s) and longer (2 s) time windows resulted in flatter
distributions of clicks in time.
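The binning step described above can be sketched as follows; this is a minimal illustration under our own assumptions (the function name, timestamps, and excerpt duration are hypothetical, not taken from the study's actual analysis code):

```python
import math

def click_histogram(click_times, duration_s, bin_s=1.0):
    """Count clicks falling into consecutive bins of bin_s seconds.

    click_times: listener response timestamps in seconds;
    duration_s: length of the excerpt. All names are illustrative.
    """
    n_bins = int(math.ceil(duration_s / bin_s))
    counts = [0] * n_bins
    for t in click_times:
        if 0 <= t < duration_s:
            counts[int(t // bin_s)] += 1
    return counts

# Four clicks over a 4-second excerpt with 1-second bins.
print(click_histogram([0.2, 0.9, 1.5, 3.1], 4.0))  # [2, 1, 0, 1]
```

Wider bins (e.g., 2 s) merge neighbouring responses into the same count, which is one way the flatter distributions reported for longer windows can arise.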
An initial observation was that listeners in the
human performance condition tended to report
musical ideas with some time delay during the 1st
listening compared to the next ones. To quantify this,
we constructed a time series consisting of events with
structural importance for both conditions. We
then performed a cross-correlation analysis between
these time series for each condition and the four
different frequency histograms corresponding to each
relistening. The time lag featuring the highest
correlation was 2 seconds for the 1st listening; this
fell to 1 second from the 2nd relistening on (with
progressively increasing correlation strength). On the
contrary, for the MIDI rendition, the time lag with the
highest correlation was 1 second even from the 1st
exposure. This time lag fluctuated from 1 to 2 seconds
for the relistenings, showing a slight loss of accuracy
from the 3rd exposure on. Figure 2 shows the
frequency histograms and the ground truth structural
points in time for both experimental conditions.
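Such a lag analysis can be sketched as below, assuming 1-second bins and a Pearson correlation computed at each candidate click delay; the function names and toy data are our own illustration, not the study's code:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def best_lag(events, clicks, max_lag=4):
    """Return the click delay (in bins) maximizing the correlation
    between a binary ground-truth event series and a click histogram."""
    scores = {}
    for lag in range(max_lag + 1):
        # Shift clicks earlier by `lag` bins to test a response delay of `lag` s.
        shifted = clicks[lag:] + [0] * lag
        scores[lag] = pearson(events, shifted)
    return max(scores, key=scores.get)

events = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # structurally important seconds
clicks = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]  # listeners responding ~2 s late
print(best_lag(events, clicks))  # 2
```

The reported shrinking of the best lag from 2 s to 1 s across relistenings corresponds, in this sketch, to the argmax moving toward smaller shifts.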
In order to quantify the distributions of the number
of clicks in time, we calculated flatness values as the
ratio between the geometric mean and the arithmetic
mean (Peeters et al., 2011) for each one. Smaller
values of flatness (minimum 0) indicate a peaky
distribution while larger values (maximum 1) indicate
a flatter distribution. Table 1 shows that the
distribution of the number of clicks in time is flatter
for the MIDI condition in comparison to the human
performer condition for the 1st exposure (flatness
value = .73 vs .66, respectively). This means that participants
demonstrated a higher agreement in the temporal
identification of musical ideas when they listened to
an actual performance of the piece. The inter-
participant accuracy increases with the number of
exposures (distributions become less flat) for both
conditions, with the largest decrease in flatness taking
place at the 2nd exposure. The difference in
flatness between the two conditions is even slightly
increased at the 4th exposure implying that the
contribution of relistenings to a cohesive
identification of musical ideas is more prominent in
the human performer condition.
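The flatness measure used here can be sketched as below; the small epsilon guarding against empty bins is an assumption of this sketch, since the handling of zero counts is not specified in the text:

```python
import math

def flatness(counts, eps=1e-9):
    """Flatness of a click histogram: geometric mean / arithmetic mean.

    Near 0: peaky distribution (high inter-listener agreement);
    near 1: flat distribution. The eps offset for empty bins is our
    own assumption, not taken from the paper.
    """
    vals = [c + eps for c in counts]
    geometric = math.exp(sum(math.log(v) for v in vals) / len(vals))
    arithmetic = sum(vals) / len(vals)
    return geometric / arithmetic

peaky = [0, 0, 30, 1, 0, 0]    # clicks concentrated in one bin
uniform = [5, 5, 5, 5, 5, 5]   # clicks spread evenly
print(flatness(peaky) < flatness(uniform))  # True
```

A perfectly uniform histogram yields a flatness of 1, while concentrating all clicks into a few bins drives the geometric mean, and hence the ratio, toward 0.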
Table 1. Flatness as a metric of distribution for the
number of clicks in time. Smaller values (minimum 0)
indicate a peakier distribution (meaning higher
coincidence/agreement between listeners) while
larger values (maximum 1) indicate a flatter
distribution.

Exposure   Human perform.   MIDI rendition
1st        .66              .73
2nd        .56              .64
3rd        .55              .65
4th        .52              .62

Figure 1. Music score of the exposition of Beethoven's Sonata No. 18 in E flat major, Op. 31, No. 3. The
highlighted patterns A1-A3 and B1-B2 correspond to the patterns represented by the dotted vertical stems
of Figure 2. Same color indicates repeated occurrence of a pattern.

The perceived structural organisation implied by
the distribution of clicks shown in Figure 2 seems to
be in agreement with what would be expected based
on the ground truth structural points for both
conditions. This is evident even from the 1st listening
but the salience of the peaks (i.e., confidence in
indicating an idea) generally increases with
additional exposures (e.g., sec. 49 & sec. 66 for
the human performer or sec. 18 & sec. 72 for the MIDI
rendition). A1 and B1, which are the starting points of
the first and second thematic groups of the sonata
exposition, gather the highest number of clicks.
Patterns A2, A3a-b and B2 which are less important
for higher structural levels are all noted less strongly
and also exhibit differences between the experimental
conditions. Such specific differences are highlighted
below:
Human performance: Pattern A2 was generally not
identified as the beginning of a musical idea. A2 is a
chord succession that follows A1 (see Figure 1) and
consists of four measures, the last two of which are an
immediate modified repetition of the first two. A3a,
which is a differentiated pattern containing a cadence,
is noted, although not as strongly as A1 and B1.
However, A3b, which constitutes the continuation of
Α3a and later on transforms into the transitional
segment, is generally not identified in its first
appearance (sec. 15) except in the 1st listening; A3b
accumulated a small number of clicks later at secs. 39
and 41. B2 is the consequent phrase of the period
and is indicated by listeners, although less
prominently than B1.
MIDI rendition: In contrast to the human
performance condition, the A2 patterns were
indicated for the MIDI rendition, especially in the first
listening. Interestingly, in a total reversal from the
human condition, A3a is not noted but A3b is noted
instead. Finally, the number of clicks for B2 was
substantially suppressed in comparison to the human
performance.
4. Discussion
The analysis of our data gave rise to a number of
interesting findings regarding the relationships
between repeated exposure, internal repetition of
musical material and musical expressivity.
To begin with, it seems that requesting to identify
musical ideas is indeed a valid way to empirically
obtain information on perceived musical structure.
Let us take a look at one simple example from the
excerpt in Figure 1. A1 consists of two measures,
with the second being an immediate identical
repetition of the first. Had listeners been asked to
identify repetition, it would have been expected that
the repetition of the second measure would be
indicated (by clicking at the beginning of measure 2).
Since listeners were asked to indicate musical ideas
instead, we hardly received any clicks at the beginning
of measure 2 (or the internal repetition of A2). It
seems that the immediate repetition of the first
measure is a kind of repetition that assists in forming and
establishing the autonomy/independence of the first
appearance of the pattern (related to ‘formative
repetition’ by Lidov (2005)) rather than appearing as
a distinct autonomous structural element. When
listeners are asked to mark ‘musical ideas’ they ignore
this type of low-level immediate repetition and
provide responses that are more telling about the
musical structure.

Figure 2. Distribution of the number of clicks within 1-second windows that indicate the beginning of
a musical idea for the human performance (top) and the MIDI rendition (bottom). The vertical dotted
stems are annotations of the points in time where beginnings of musical patterns appear in the audio.
The fundamental musical structure was indicated
even from the 1st listening but became more accurate
and clearer through additional exposures. The
improvement in accuracy for identifying structurally
important points in time was particularly true for the
human performance condition, as is manifest through
a decrease in the listeners’ reaction time while they
became increasingly familiar with the musical
excerpt. This effect was less prominent in the MIDI
condition which featured high accuracy even from the
1st listening. One possible explanation could be that
the lack of expressive gestures might have made
listeners more sensitive to elements of the musical
surface which, in turn, might account for the smaller
time lag between the actual events and their markings.
Apart from temporal accuracy, the clarity of
perceived structure (i.e., inter-participant agreement
in noting musical ideas) was also progressively
increased for both conditions. This was demonstrated
by a decreasing trend in the flatness of the number-of-
clicks distributions with exposure. A comparison
between the two conditions showed that the
identification of clicks for the human performance
excerpt was more cohesive from the 1st exposure and
retained this positive margin throughout the rest of the
exposures. This could be attributed to the stronger
emphasis on structural elements by the performer in
contrast to the lack of expressive elements in the MIDI
rendition that may have resulted in some ambiguity
regarding structural perception.
In addition, we observed a trend for progressive
(through exposure) attribution of increased salience to
repeating units that signified starting points of
structural segments. This may be illustrated in the
following instances. First, the A1 patterns at secs. 30
and 35 belong to the same structural segment. In the
1st listening of the human performance excerpt, these
two successive appearances of A1 received a similar
number of clicks. However, through the additional
exposures —and especially in the 4th relistening—
the first A1 received almost double the number of
clicks compared to its immediate repetition. In the
MIDI rendition, the decrease of clicks in the second
A1 seems to have taken place already from the second
listening.
An additional decrease in the number of clicks took
place at A1' (secs. 64/67), which might indicate that
repeated exposure helped listeners identify points of
structural importance more clearly. This pattern is a
modified repetition of A1 and serves as the end of the
first section (see Fig. 1). Taken together, all the above
support Margulis’ (2012) findings that multiple
exposures can lead to a perceptual organization of
music dominated by larger structures.
Interestingly, the repetition of musical units —and
immediate repetition in particular— seemed to give
rise to the perception of musical ideas through
interaction with musical expressivity. For example,
the A3b (~ sec. 40) presents a difference in the
responses between the two listening conditions. In the
human performance, the immediate repetition of A3b
(sec. 42) was noted only in the 1st listening and was
suppressed in the next exposures. However, the
listeners’ reaction to the MIDI rendition enhanced this
point by raising its number of clicks for each
additional exposure. This implies that in the absence
of expressive cues, the immediate repetition was
identified as a musical idea. Similarly, A2 was also
most notable in the MIDI rendition and less so for the
human performer. Specifically, some participants of
the MIDI condition clicked on points of the internal
immediate repetition of the pattern (last 2 measures)
(secs. 9, 23 and 57), which is reflected in the
apparent delay of clicks at A2. The above responses to
immediate repetition of musical material seem to
weakly support the assertion by Cambouropoulos
(2006, p. 254) according to which a specific type of
repetition (i.e., immediate), can affect the way in
which the melodic surface is segmented. In our case,
something similar occurs not only in the melodic
surface but also throughout the whole musical surface,
including its vertical dimension.
In addition, there is an interesting contrast observed
between the human and MIDI rendition in the
transitional segment whose exact starting point is a
disputed matter among analysts. The two opposing
opinions are that it either begins at the second instance
of A3b (Rosen, 2002, p.176) or in the next A1
reappearance in the minor mode (Tovey, 1931, p.131).
It seems that multiple exposures in the absence of
expressivity (MIDI rendition) result in a transition of
perceptual weight from A1 to A3b leading to an
almost equivalent number of clicks for the 4th
listening. Here, the immediate repetition functions as
a starting point of a structural segment. On the
contrary, the interpretation of the human performer in
combination with multiple exposures increased the
number of clicks for A1 at the expense of A3b. This
can probably be attributed to the emphasis applied by
Kovacevich through a rhythmical slowdown
(ritenuto) just prior to the reappearance of A1.
Finally, two points are worth noting with respect to
the second thematic group of the sonata. Firstly, it can be
noticed that there is a delay in the marking of B1
especially for the MIDI condition. The B1 pattern
starts with a levare: the starting point of the pattern is
at the upbeat of the previous measure. According to
Hasty (1997, p. 34) rhythmic regularity derives from
the periodicity of equal beats and their internal
relationships. In our musical excerpt, B1 is the first
point where a rhythmic irregularity (the pattern
begins on an upbeat) occurs. This rhythmic
innovation might have confused the listeners and led
them to click slightly off the
beginning of the pattern. Secondly, B2 was indicated
in the human performance but not in the MIDI
rendition. It is useful to state that B2 is the second
phrase of a period. The period consists of the B1
phrase in the tonic and the B2 phrase, which is an
immediate repetition of B1, in the dominant. It seems
that the repetition of the material in the dominant is
not considered as the beginning of a musical idea, but
rather as a continuation of the pre-existing appearance
in the tonic. The consequent phrase of a period
follows, and is bound to, the antecedent (Schoenberg,
1967, p. 25; Caplin, 2013, p. 75). The listeners in the
MIDI rendition seem to have become accustomed to
hearing the period as a whole, which may be why they
did not note B2. This could indicate that the function of
repetition operates at a deeper level of perception, in this
case the perception of the period itself. By contrast,
we can observe that the listeners of the human
performance do mark B2 (although with fewer clicks
than B1). At this point, it is again useful to underline
the power of interpretative tools employed by the
performer. In our excerpt, the pianist decides to
highlight the pattern in the dominant (B2) with a
rhythmic change. We can assume that the emphasis
given to B2 by the performer led the listeners to
separate the consequent from the antecedent phrase of
the period.
The findings of this empirical experiment resulted
from a comparison between separate groups of
musicians in two different listening conditions. It
would be of great interest to apply the above
methodology to non-musically trained listeners and
search for potential differences or similarities in the
perception of musical structure with respect to
repetition and multiple exposures. Future work should
also expand its scope beyond classical style to a
variety of musical idioms in order to provide more
generalised insight into the function of repetition and
multiple exposures on structural perception.
References
Ahlbäck, S., (2004). Melody beyond notes: A study
of melodic cognition. Ph.D. thesis, Göteborgs
Universitet, Sweden.
Boss, J., (2000/2001), The "Musical Idea" and
Global Coherence in Schoenberg's Atonal and
Serial Music, Intégral, Vol. 14/15, 209-264.
Burnham S., (2002). Form, in The Cambridge
History of Western Music Theory, Christensen
Τ., (ed), Cambridge University Press, 880-906.
Cambouropoulos E., (2006). Musical Parallelism
and Melodic Segmentation: A Computational
Approach, Music Perception, 23, 249-269.
Caplin W.E., (2013). Analyzing classical form: An
approach for the classroom, Oxford University
Press.
Dai, S., Yu H., Dannenberg, R., B., (2022). What is
missing in deep music generation? A study of
repetition and structure in popular music, In
Proceedings of the 23rd International Society
for Music Information Retrieval (ISMIR), 659-
666.
Deliège, I., (2001). Prototype effects in music
listening: An empirical approach to the notion
of imprint. Music Perception, 18, 371-407.
Deutsch, D., Henthorn, T., & Lapidis, R., (2011).
Illusory transformation from speech to song,
The Journal of the Acoustical Society of
America, 129, 2245–2252.
Hasty, C., (1997). Meter as Rhythm, Oxford
University Press.
Lerdahl, F. & Jackendoff, R., (1983). A Generative
Theory of Tonal Music, MIT Press.
Lidov, D., (2005). Is Language a Music? Writings
on Musical Form and Signification, Indiana
University press.
Margulis E. H., (2014). On Repeat: How Music Plays
the Mind, Oxford University Press.
Margulis, E. H., (2012). Musical repetition detection
across multiple exposures. Music Perception,
29, 377-385.
Rosen C., (2002). Beethoven's Piano Sonatas: A
Short Companion, Yale University Press.
Schachter, C., (1999). Unfoldings: Essays in
Schenkerian Theory and Analysis, Oxford
University Press.
Schoenberg A., (1967). Fundamentals of musical
composition (G. Strang & L. Stein, ed.), Faber
& Faber.
Simchy-Gross, R., & Margulis, E. H., (2018). The
sound-to-music illusion: Repetition can
musicalize nonspeech sounds, Music &
Science, 1, 1-6.
Tovey, D. F., (1931). A companion to Beethoven's
pianoforte sonatas: complete analyses, The
Associated Board of the Royal Schools of
Music.
Vosniadou S., Ortony A., (1989). Similarity and
analogical reasoning, Cambridge University
Press, 199-241.
Ziv N., Eitan Z., (2007). Themes as prototypes:
Similarity judgments and categorization tasks
in musical contexts, Musicae Scientiae
Discussion Forum 4A, 99-133.