
Evidence for Pianist-specific Rubato Style in Chopin Nocturnes.



The performance of music usually involves a great deal of interpretation by the musician. In classical music, the final ritardando is a good example of the expressive aspect of music performance. Even though expressive timing data is expected to have a strong component that is determined by the piece itself, in this paper we investigate to what degree individual performance style has an effect on the timing of final ritardandi. The particular approach taken here uses Friberg and Sundberg's kinematic rubato model in order to characterize performed ritardandi. Using a machine-learning classifier, we carry out a pianist identification task to assess the suitability of the data for characterizing the individual playing style of pianists. The results indicate that in spite of an extremely reduced data representation, when cancelling the piece-specific aspects, pianists can often be identified with accuracy above baseline. This fact suggests the existence of a performer-specific style of playing ritardandi.
Miguel Molina-Solana
Dpt. Computer Science and AI
University of Granada, Spain
miguelmolina at
Maarten Grachten
IPEM - Dept. of Musicology
Ghent University, Belgium
Gerhard Widmer
Dpt. of Computational Perception
Johannes Kepler Univ., Austria
Performance of music involves a great deal of interpre-
tation by the musician. This is particularly true of piano
music from the Romantic period, where performances are
characterized by large fluctuations of tempo and dynam-
ics. In music performance research it is generally acknowl-
edged that, although widely used, the mechanical perfor-
mance (with a constant tempo throughout the piece) is not
an adequate norm when studying expressive timing, since
it is not the way a performance should naturally sound.
As an alternative, models of expressive timing could be used, as argued in [18]. However, only a few models exist that deal with expressive timing in general [2, 16]. Due to the complexity and heterogeneity of expressive timing, most models only describe specific phenomena, such as the timing of grace notes [15] or the final ritardando.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
2010 International Society for Music Information Retrieval.
Precisely, the final ritardando (the slowing down toward the end of a musical performance to conclude the piece gracefully) is one of the clearest manifestations of expressive timing in music. Several models have been proposed in the related literature [3, 14] to account for its specific shape. Those models generally come in the form of a mathematical function that describes how the tempo of the performance changes with score position.
In a previous empirical study by Grachten et al. [4] on
the performance of final ritardandi, a kinematic model [3]
was fitted to a set of performances. Even though some sys-
tematic differences were found between pianists, in gen-
eral the model parameters tend to reflect primarily aspects
of the piece rather than the individual style of the pianist
(i.e. expressive timing data is expected to have a strong
component that is determined by piece-specific aspects).
This fact is relevant to a recurrent discussion in the field of musicology about which factor (the piece or the performer) most influences a performance [9]. Some experts argue that the performance should be preceded by a thorough study of the piece, while others indicate that the personal feeling of music is the first and main point to be considered. Works supporting both views can be found in [12]. A study by Lindström et al. [7] involving a questionnaire showed that music students consider both the structure of the piece and the feelings of the performer as relevant in a performance.
The current paper extends that previous work by Grachten
et al., by investigating whether or not canceling piece-specific
aspects leads to a better performer characterization. Musi-
cologically speaking, the validation of this hypothesis im-
plies that performers’ signatures do exist in music inter-
pretation regardless of the particular piece. We present a
study of how final ritardandi in piano works can be used
for identifying the pianist performing the piece. Our pro-
posal consists in applying a model to timing data, normal-
izing the fitted model parameters per piece and searching
for performer-specific patterns.
Performer characterization and identification [8, 13] is a challenging task, since it compares not only performances of the same piece by several performers but also performances of different pieces by the same performer. Opposed to performer identification (where performers are supposed to have distinctive ways of performing) is piece identification, which requires the structure of the piece to imply a particular expressive behavior, regardless of the performer.
A further implication of this work would be that, when
an estimation can be made of the prototypical performance
based on the musical score, this estimation could be a use-
ful reference for judging the characteristics of performances.
This knowledge can also allow the artificial interpretation
of musical works by a computer in expressive and realistic
ways [17].
This paper is organized as follows: Section 2 describes
the dataset used for this study, including the original timing
data and the model we fit them to. Section 3 deals with the
data processing procedure. Results of the pianist classifi-
cation task are presented and discussed in Section 4, while
Section 5 states conclusions and future work.
The data used in this paper come from measurements of timing data of musical performances taken from commercial CD recordings of Chopin’s Nocturnes. This collection has been chosen since these pieces exemplify classical piano music from the Romantic period, a genre that is characterized by the prominent role of expressive interpretation in terms of tempo and dynamics. Furthermore, Chopin’s Nocturnes are a well-known repertoire, performed by many pianists, which facilitates large-scale studies.
As explained before, models of expressive timing generally focus on a certain phenomenon. In our study, we focus on the final ritardando of the pieces. Hence, we select those Nocturnes whose final passages have a relatively high note density and are more or less homogeneous in terms of rhythm. With these constraints we avoid the need to estimate a tempo curve from only a few interonset intervals, and reduce the impact of rhythmic particularities on the tempo curve.
In particular, we used ritardandi from the following pieces: Op. 9 nr. 3, Op. 15 nr. 1, Op. 15 nr. 2, Op. 27 nr. 1, Op. 27 nr. 2 and Op. 48 nr. 1. In two cases (Op. 9 nr. 3 and Op. 48 nr. 1), the final passage consists of two clearly separated parts, each of which is performed with its own ritardando. These ritardandi were treated separately as rit1 and rit2. We therefore have 8 different ritardandi for our study.
The data were obtained in a semi-automated manner,
using a software tool [10] for automatic transcription of the
audio recordings. From these transcriptions, the segments
corresponding to the final ritardandi were then extracted
and corrected manually by means of Sonic Visualiser, a
software tool for audio annotation and analysis [1].
The dataset in this paper is a subset of that used in previous work [4], as we only consider those pianists for whom all eight recordings are available. Table 1 shows the names of these pianists and the year of their recordings. Hence, the dataset for the current study contains a total of 136 ritardandi from 17 different pianists.
2.1 Friberg & Sundberg’s kinematic model
As mentioned in Section 1, we wish to establish to what degree the specific form of the final ritardando in a musical performance is dependent on the identity of the performing pianist. We address this question by fitting a model to the data, and investigating the relation between the piece/pianist identity and the parameter values of the fitted model. For this task, we employ the kinematic model by Friberg & Sundberg [3].

Arrau (1978)        Falvai (1997)        Pires (1996)
Ashkenazy (1985)    Harasiewicz (1961)   Pollini (2005)
Barenboim (1981)    Hewitt (2003)        Rubinstein (1965)
Biret (1991)        Leonskaja (1992)     Tsong (1978)
d’Ascoli (2005)     Mertanen (2001)      Woodward (2006)
Engerer (1993)      Ohlsson (1979)
Table 1. Performer and year of the recordings analyzed in the experiments.

[Figure 1: panels labelled q = -4, q = 1, q = 5 and w = .3, w = .5, w = .7]
Figure 1. Examples of tempo curves generated by the model using different values of parameters w and q. In each plot, the x and y axis represent score position and tempo respectively, both in arbitrary units.
This model is based on the hypothesized analogy of musical tempo and physical motion, and is derived from a study of the motion of runners when slowing down. From a variety of decelerations by various runners, the decelerations judged by a jury to be most aesthetically pleasing turned out to be those where the braking power is held roughly constant. This observation implies that velocity is proportional to a square root function of time, and to a cubic root function of position. Equating physical position to score position, Friberg and Sundberg used this velocity function as a model for tempo in musical ritardandi.
Thus, the model describes the tempo v(x) of a ritardando as a function of score position x:

$$v(x) = \left[1 + (w^q - 1)\,x\right]^{1/q} \qquad (1)$$

The parameter q is added to account for variation in curvature, as the function is not necessarily a cubic root of position. The parameter w represents the final tempo, and was added since the tempo in music cannot reach zero. The model can be fitted to ritardandi performed by particular pianists by means of its parameters.
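The step from the runner analogy to Equation 1 can be made explicit with a short derivation. The sketch below assumes a body of mass m slowing down under constant braking power P from initial velocity v_0 (all three symbols are introduced here for illustration only); this is the case that yields the square-root and cube-root relations mentioned above, whereas a constant braking force would instead give a velocity that is linear in time.

```latex
% Constant braking power: the kinetic energy decreases at rate P.
-m\,v\,\frac{dv}{dt} = P
\;\Longrightarrow\;
\frac{d(v^2)}{dt} = -\frac{2P}{m}
\;\Longrightarrow\;
v(t) = \sqrt{v_0^2 - \tfrac{2P}{m}\,t}

% Change of variable from time to position, using dv/dt = v\,dv/dx.
v^2\,\frac{dv}{dx} = -\frac{P}{m}
\;\Longrightarrow\;
\frac{d(v^3)}{dx} = -\frac{3P}{m}
\;\Longrightarrow\;
v(x) = \left(v_0^3 - \tfrac{3P}{m}\,x\right)^{1/3}

% Normalizing so that v(0) = 1 and v(1) = w:
v(x) = \left[1 + (w^3 - 1)\,x\right]^{1/3}
```

The last line is Equation 1 with q = 3, which is why the model treats q as a free curvature parameter around the cubic-root case.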
Figure 2. Original data representation in the w-q plane (axes: parameter w, with ticks from 0.3 to 0.9, and parameter q).

Parameters w and q generate different plots of tempo curves (see Figure 1). Values of q > 1 lead to convex tempo curves, whereas values of q < 1 lead to concave curves. The parameter w determines the vertical end position of the curve.
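As a minimal sketch, Equation 1 can be evaluated directly. The function names `kinematic_tempo` and `tempo_curve` are hypothetical, and the checks that 0 < w <= 1 and q is non-zero are assumptions added here, not restrictions stated in the paper.

```python
def kinematic_tempo(x, w, q):
    """Normalized tempo at normalized score position x (Equation 1).

    x: score position in [0, 1]
    w: final tempo relative to the initial tempo (assumed 0 < w <= 1)
    q: curvature parameter (must be non-zero)
    """
    if q == 0:
        raise ValueError("q must be non-zero")
    if not 0 < w <= 1:
        raise ValueError("w is assumed to lie in (0, 1]")
    return (1.0 + (w ** q - 1.0) * x) ** (1.0 / q)


def tempo_curve(w, q, n=11):
    """Sample the model at n evenly spaced positions from 0 to 1."""
    return [kinematic_tempo(i / (n - 1), w, q) for i in range(n)]
```

By construction the curve starts at tempo 1 for x = 0 and ends at tempo w for x = 1, so w only sets the end point while q controls how the slowing down is distributed along the way.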
Even though models of this kind are incomplete, as they ignore several musical characteristics [6], the kinematic model described above was reported to predict the evolution of tempo during the final ritardando quite accurately when matched to empirical data [3]. An additional advantage of this model is its simplicity, both conceptually (it contains few parameters) and computationally (it is easy to implement).
The model is designed to work with normalized score position and tempo. More specifically, the ritardando is assumed to span the score positions in the range [0,1], and the initial tempo is defined to be 1. Although in most cases there is a ritardando instruction written in the score, the ritardando may start slightly before or after this instruction. When normalizing, we must ensure that normalized position 0 coincides with the actual start of the ritardando. A manual inspection of the data showed that the starting position of the ritardandi strongly tended to coincide among pianists. For each piece, the predominant starting position was determined and the normalization of score positions was done accordingly.
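The normalization can be illustrated with a small sketch. It assumes that local tempo is estimated from consecutive onsets as score distance divided by elapsed time, and that each tempo value is attached to the start of its inter-onset interval; the paper does not spell out these details, so both choices and the name `normalize_ritardando` are assumptions made for illustration.

```python
def normalize_ritardando(score_pos, onset_times):
    """Return (positions, tempi) normalized as the model expects.

    score_pos:   score positions of the notes (e.g. in beats), increasing,
                 with the first entry at the chosen start of the ritardando.
    onset_times: performed onset times in seconds, increasing.
    """
    if len(score_pos) != len(onset_times) or len(score_pos) < 3:
        raise ValueError("need at least three aligned onsets")
    # Local tempo of each inter-onset interval: score distance per second.
    tempi = [
        (score_pos[i + 1] - score_pos[i]) / (onset_times[i + 1] - onset_times[i])
        for i in range(len(score_pos) - 1)
    ]
    # Attach each tempo value to the start of its interval and rescale the
    # positions so that the first one maps to 0 and the last one to 1.
    starts = score_pos[:-1]
    span = starts[-1] - starts[0]
    positions = [(p - starts[0]) / span for p in starts]
    # Divide by the first tempo value so that the initial tempo is 1.
    return positions, [t / tempi[0] for t in tempi]
```

For example, onsets on beats 0 to 4 played at 0.0, 0.5, 1.0, 1.75 and 2.75 seconds give normalized tempi that start at 1 and end at 0.5.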
The model is fitted to the data by non-linear least-squares fitting through the Levenberg-Marquardt algorithm, using the implementation from gnuplot (the fitting must be done by numerical approximation, since the model is non-linear in the parameters w and q). The model fitting is applied to each performance individually, so for each combination of pianist and piece, three values are obtained: w, q and the root mean square of the error after fitting, which serves as a goodness-of-fit measure.
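To show what the fitting step produces, here is a sketch that returns w, q and the RMS error for one performance. Note that it does not implement Levenberg-Marquardt: it swaps in an exhaustive search over a coarse (w, q) grid so that it needs only the standard library. The grid ranges, step sizes and function names are assumptions chosen for illustration.

```python
import math


def kinematic_tempo(x, w, q):
    """Equation 1: normalized tempo at normalized score position x."""
    return (1.0 + (w ** q - 1.0) * x) ** (1.0 / q)


def rms_error(positions, tempi, w, q):
    """Root mean square difference between model and measured tempi."""
    squared = [(kinematic_tempo(x, w, q) - t) ** 2 for x, t in zip(positions, tempi)]
    return math.sqrt(sum(squared) / len(squared))


def fit_grid(positions, tempi):
    """Least-squares fit by grid search (a stand-in for Levenberg-Marquardt).

    positions must lie in [0, 1]. Returns (w, q, rms) for the best grid point.
    """
    best = None
    for wi in range(1, 101):             # w = 0.01, 0.02, ..., 1.00
        w = wi / 100.0
        for qi in range(-100, 101):      # q = -10.0, -9.9, ..., 10.0
            if qi == 0:
                continue                 # Equation 1 is undefined at q = 0
            q = qi / 10.0
            err = rms_error(positions, tempi, w, q)
            if best is None or err < best[2]:
                best = (w, q, err)
    return best
```

A gradient-based optimizer such as Levenberg-Marquardt reaches a comparable minimum with far fewer model evaluations and without being tied to a grid resolution; the grid version is only meant to make the objective being minimized explicit.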
At this point, we can represent each particular ritardando in the corpus as a combination of those two attributes: w and q. In Figure 2 (see footnote 2), the values obtained from fitting are displayed as a scatter plot on the two-dimensional attribute space q versus w. The whole dataset (136 instances) is shown in this plot. Each point location corresponds to a certain curve with parameters w and q. We refer the reader to Figure 1 to visualize the shape of different combinations of parameters.
As can be seen from Figure 2, there are no clusters that
can be easily identified from this representation. Hence,
the performer identification task using these original data
is expected to have a low success rate.
In Section 1, we already mentioned that the expressive tim-
ing data is expected (as stated in [4]) to have a strong com-
ponent that is determined by piece-specific aspects such as
rhythmical structure and harmony. In order to focus on
pianist-specific aspects of timing, it would be helpful to
remove this piece-specific component.
Let X be the set of all instances (i.e. ritardando performances) in our dataset. Each instance x ∈ X is a pair (w, q). Given a ritardando i, X_i is the subset of X that contains those instances x ∈ X corresponding to that particular ritardando.
Footnote 2: Figure 2 is best viewed in color.
In order to remove the piece-specific components, we propose to apply a linear transformation to the 2-attribute representation of ritardandi. This transformation consists in calculating the performance norm for a given piece and subtracting it from the actual examples of that piece. To do so, we first group the instances according to the piece they belong to. We then calculate the centroid of each group (i.e. the mean value of all these instances) and move it to the origin, moving all the instances within that group accordingly.
We are aware that modelling the performance norm of a given ritardando as the mean of the performances of that ritardando is not the only option and probably not the best one. In fact, which performance is the best and which one is the most representative is still an open problem with no clear results. Moreover, several performance norms can be equally valid for the same score. In spite of these difficulties, we chose to use the mean to represent the performance norm, for its simplicity and for the lack of an obvious alternative.
Two approaches were then devised in order to calculate that performance norm. In the first one, the mean performance curve is calculated as an unweighted mean of the attributes w and q (see Equation 2); whereas in the second one, fit serves to weight the mean (see Equation 3).
In the first approach, the performance norm for a given ritardando i can be calculated as:

$$\mathrm{norm}_i = \frac{1}{|X_i|} \sum_{x_i \in X_i} x_i \qquad (2)$$

In the second approach, it is calculated as a weighted mean, where fit_i stands for the fit value of instance x_i:

$$\mathrm{norm}_i = \frac{\sum_{x_i \in X_i} \mathit{fit}_i \, x_i}{\sum_{x_i \in X_i} \mathit{fit}_i} \qquad (3)$$

In either case, all instances x_i are then transformed into x'_i by subtracting the corresponding performance norm:

$$x'_i = x_i - \mathrm{norm}_i \qquad (4)$$

X' would then be the dataset that contains all x'. After this transformation, all x' contain mainly information about the performer of the ritardando, as we have removed the common component of the performances per piece.
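The transformation described above can be sketched in a few lines. The record layout (dicts with keys `rit`, `pianist`, `w`, `q`, `fit`) and the function names are hypothetical. The sketch also assumes that `fit` is used directly as a non-negative weight, so that a larger value means more influence on the norm; how the RMS error from the fitting step is turned into such a weight is not specified here.

```python
from collections import defaultdict


def performance_norms(instances, weighted=False):
    """Per-ritardando norm: mean (w, q), optionally weighted by fit.

    instances: list of dicts with keys 'rit', 'pianist', 'w', 'q', 'fit'.
    Returns a dict mapping each ritardando id to its (w, q) norm.
    """
    # For each ritardando: [sum of weight*w, sum of weight*q, sum of weights]
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for inst in instances:
        weight = inst["fit"] if weighted else 1.0
        acc = sums[inst["rit"]]
        acc[0] += weight * inst["w"]
        acc[1] += weight * inst["q"]
        acc[2] += weight
    return {rit: (sw / total, sq / total) for rit, (sw, sq, total) in sums.items()}


def remove_piece_component(instances, weighted=False):
    """Subtract the norm of its ritardando from every instance."""
    norms = performance_norms(instances, weighted)
    return [
        dict(inst,
             w=inst["w"] - norms[inst["rit"]][0],
             q=inst["q"] - norms[inst["rit"]][1])
        for inst in instances
    ]
```

With the unweighted norm, the transformed instances of each ritardando are centered on the origin of the w-q plane; with the weighted norm they are centered on a point pulled toward the instances with larger weights.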
In order to verify whether pianists have a personal way of playing ritardandi, independent of the piece they play, we have designed a classification experiment with different conditions, in which performers are identified by their ritardandi. The ritardandi are represented by the fitted model parameters. In one condition, the data instances are the set X, i.e. the fitted model parameters are used as such, without modification. In the second and third conditions, the piece-specific component in every performance is subtracted (data set X'). The second condition uses the unweighted average as the performance norm, the third condition uses the weighted average.

Figure 3. % success rate in the performer identification task using the whole dataset, with different k-NN classifiers. Baseline value (5.88%) from random classification is also shown.
Note that accurate performer identification in this setup is unlikely. Firstly, the current setting, in which the number of classes (17) is much higher than the number of instances per class (8), is rather austere as a classification problem. Secondly, the representation of the performer’s rubato by a model with two parameters is very constrained, and is unlikely to capture all (if any) of the performer’s individual rubato style. Nevertheless, by comparing results between the different conditions, we hope to determine the presence of individual performer style independent of piece.
As previously explained, the training instances (ritardandi of a particular piece performed by a particular pianist) consist of two attributes (w and q) that describe the shape of the ritardando in terms of timing. Those attributes come from matching the original timing data with the kinematic model previously cited.
The pianist classification task is executed as follows. We employ k-NN (Nearest Neighbor) classification, with k ∈ {1, ..., 7}. The target concept is the pianist in all the cases, and two attributes (w and q) are used. For validation, we employ leave-one-out cross-validation over a dataset of 136 instances (see Section 2). The experiments are carried out by using the Weka framework [5].
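The evaluation protocol can be sketched as a plain k-NN classifier wrapped in a leave-one-out loop. This is a standard-library illustration, not the Weka setup used in the experiments: Euclidean distance on the raw attributes and the tie-breaking rule (prefer the label of the closest neighbor among the tied ones) are assumptions, and Weka's implementation may differ in attribute scaling and tie handling.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict a label for query from its k nearest training points.

    train: list of ((w, q), label) pairs; query: a (w, q) pair.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    # Break ties in favor of the closest neighbor among the tied labels.
    for _, label in nearest:
        if votes[label] == top:
            return label


def leave_one_out_accuracy(data, k):
    """Fraction of instances classified correctly when each is held out."""
    hits = 0
    for i, (point, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, point, k) == label:
            hits += 1
    return hits / len(data)
```

Running `leave_one_out_accuracy` once on the original (w, q) pairs and once on each transformed version, for every k from 1 to 7, would give the three curves that the experiment compares.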
Figure 3 shows the results for the previously described setups, employing a range of k-NN classifiers with different values of k ∈ {1, ..., 7}. We also carry out the classification task using the original data (without the transformation) that were shown in Figure 2, in order to compare the effect of the transformation.
The first conclusion we can extract from the results is that the success rate is practically always better when transforming the data than when not. In other words, by removing the (predominant) piece-specific component, it becomes easier to recognize performers. This is particularly interesting as it provides evidence for the existence of a performer-specific style of playing ritardandi, which was our initial hypothesis.
Note, however, that even in the best case the success rate is not high enough for this representation to serve as a reliable estimate of the performer of a piece. A model with only two parameters cannot capture the complexity of a performer's expressive fingerprint. Although improving performer identification is an interesting problem, that is not the point of this work.
As can be seen, employing a weighted mean of w and q for calculating the performance norm of a piece, with fit as the weight, leads to better results when k is small (i.e. k < 3). However, this approach, which is methodologically the most valid, does not make a remarkable difference with respect to the original data for larger values of k.
An interesting and unexpected result is that the transformation with the unweighted mean (see Equation 2) gives better results for medium-large k values. The lower results for smaller k could be explained by the fact that instances with a low fit (which are actually noisy data) interfere with the nearest-neighbor classification process. The better results for higher k suggest that in the wider neighborhood of the instance to be classified, the instances of the correct target dominate, and thus that the noise due to low fit is only limited.
Note also that this approach is more stable with respect to the size of k than the original or the weighted ones. It also outperforms the random classification baseline (5.88% with 17 classes) for all the different values of k.
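The baseline figure follows directly from the size of the dataset. As a worked check, assuming the baseline corresponds to guessing uniformly at random among the 17 pianists:

```latex
\underbrace{17}_{\text{pianists}} \times \underbrace{8}_{\text{ritardandi each}} = 136 \ \text{instances},
\qquad
\text{baseline} = \frac{1}{17} \approx 0.0588 = 5.88\%
```

In absolute terms, this baseline corresponds to 8 of the 136 instances being classified correctly.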
Further experiments show that those are the trends for those two different transformations of the data. Employing the weighted mean leads to the highest accuracy using a 1-NN classifier, but it quickly degrades as k is increased. On the other hand, an unweighted mean leads to more stable results, with the maximum reached with an intermediate number of neighbors.
Although (as expected with many classes, few instances and a simplistic model) the classification results are not satisfactory from the perspective of performer identification, the improvement in classification results obtained by transforming the data (removing piece-specific aspects) suggests that there is a performer-specific aspect of rubato timing. Moreover, it can be located specifically in the curvature and depth of the rubato (the q and w parameters).
Ritardandi in musical performances are good examples of the expressive interpretation of the score by the pianist. However, in addition to personal style, ritardando performances tend to be substantially determined by the musical context they appear in. Because of this fact, we propose in this paper a procedure for canceling these piece-specific aspects and focusing on the personal style of pianists.
To do so, we make use of timing variations collected during ritardandi in performances of Chopin Nocturnes by famous pianists. We obtain a two-attribute (w, q) representation of each ritardando by fitting Friberg and Sundberg’s kinematic model to the data.
A performer identification task was carried out using k-Nearest Neighbor classification, comparing the (w, q) representation to another condition in which average w and q values per piece are subtracted from each (w, q) pair.
The results indicate that even in this reduced representation of ritardandi, pianists can often be identified by the tempo curve of the ritardandi above baseline accuracy. More importantly, removing the piece-specific component in the w and q values leads to better performer identification.
This suggests that even very global features of ritardandi, such as their depth (w) and curvature (q), carry some performer-specific information. We expect that a more detailed representation of the timing variation of ritardandi performances will reveal more of the individual style of pianists.
A more detailed analysis of the results is necessary to
answer further questions. For instance, do all pianists have
a quantifiable individual style or only some? Also, there is
a need for alternative models of rubato (such as the model
proposed by Repp [11]), to represent and study ritardandi
in more detail.
Finally, we intend to relate our empirical findings to the musicological issue of the factors affecting music performances. Experiments testing whether or not the structure of the piece and the feelings of the performer are present in renditions could be of interest to musicologists.
This research is supported by the Austrian Research Fund
FWF under grants P19349 and Z159 (‘Wittgenstein Award’).
M. Molina-Solana is supported by the Spanish Ministry of
Education (FPU grant AP2007-02119).
[1] Chris Cannam, Christian Landone, Mark Sandler, and
Juan Pablo Bello. The sonic visualiser: A visualisation
platform for semantic descriptors from musical signals.
In Proc. Seventh International Conference on Music
Information Retrieval (ISMIR 2006), Victoria, Canada,
October 8-12 2006.
[2] Anders Friberg. Generative rules for music perfor-
mance: A formal description of a rule system. Com-
puter Music Journal, 15(2):56–71, 1991.
[3] Anders Friberg and Johan Sundberg. Does musical per-
formance allude to locomotion? A model of final ri-
tardandi derived from measurements of stopping run-
ners. Journal of the Acoustical Society of America,
105(3):1469–1484, 1999.
[4] Maarten Grachten and Gerhard Widmer. The kinematic
rubato model as a means of studying final ritards across
pieces and pianists. In Proc. Sixth Sound and Music
Computing Conference (SMC 2009), pages 173–178,
Porto, Portugal, July 23-25 2009.
[5] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard
Pfahringer, Peter Reutemann, and Ian H. Witten. The
WEKA Data Mining Software: An update. SIGKDD
Explorations, 11(1):10–18, 2009.
[6] Henkjan Honing. When a good fit is not good enough:
a case study on the final ritard. In Proc. Eighth Inter-
national Conference on Music Perception & Cognition
(ICMPC8), pages 510–513, Evanston, IL, USA, Au-
gust 3-7 2004.
[7] Erik Lindström, Patrik N. Juslin, Roberto Bresin, and Aaron Williamon. “Expressivity comes from within your soul”: A questionnaire study of music students’ perspectives on expressivity. Research Studies in Music Education, 20:23–47, 2003.
[8] Miguel Molina-Solana, Josep Lluis Arcos, and Emilia
Gomez. Using expressive trends for identifying violin
performers. In Proc. Ninth Int. Conf. on Music Infor-
mation Retrieval (ISMIR2008), pages 495–500, 2008.
[9] Miguel Molina-Solana and Maarten Grachten. Na-
ture versus culture in ritardando performances. In
Proc. Sixth Conference on Interdisciplinary Musicol-
ogy (CIM10), Sheffield, United Kingdom, July 23-24
[10] Bernhard Niedermayer. Non-negative matrix division
for the automatic transcription of polyphonic music.
In Proc. Ninth International Conference on Music In-
formation Retrieval (ISMIR 2008), Philadelphia, USA,
September 14-18 2008.
[11] Bruno H. Repp. Diversity and commonality in music performance - An analysis of timing microstructure in Schumann’s “Träumerei”. Journal of the Acoustical Society of America, 92(5):2546–2568, 1992.
[12] John Rink, editor. The Practice of Performance: Stud-
ies in Musical Interpretation. Cambridge University
Press, 1996.
[13] Efstathios Stamatatos and Gerhard Widmer. Automatic
identification of music performers with learning en-
sembles. Artificial Intelligence, 165(1):37–56, 2005.
[14] Johan Sundberg and Violet Verrillo. On the anatomy of
the retard: A study of timing in music. Journal of the
Acoustical Society of America, 68(3):772–779, 1980.
[15] Renee Timmers, Richard Ashley, Peter Desain, Henk-
jan Honing, and W. Luke Windsor. Timing of orna-
ments in the theme of Beethoven’s Paisiello Varia-
tions: Empirical data and a model. Music Perception,
20(1):3–33, 2002.
[16] Neil P. Todd. A computational model of rubato. Con-
temporary Music Review, 3(1):69–88, 1989.
[17] Gerhard Widmer, Sebastian Flossmann, and Maarten
Grachten. YQX plays Chopin. AI Magazine, 30(3):35–
48, 2009.
[18] W. Luke Windsor and E.F. Clarke. Expressive timing and dynamics in real and artificial musical performances: Using an algorithm as an analytical tool. Music Perception, 15(2):127–152, 1997.
... Methods for comparing performances can be used for identifying musicians by their individual performance styles. This has been demonstrated for violinists (Molina-Solana et al., 2008, 2010a, saxophone players (Ramírez et al., 2007), and pianists (Stamatatos and Widmer, 2005;Saunders et al., 2008;Molina-Solana et al., 2010b). ...
Full-text available
Expressive performance is an indispensable part of music making. When playing a piece, expert performers shape various parameters (tempo, timing, dynamics, intonation, articulation, etc.) in ways that are not prescribed by the notated score, in this way producing an expressive rendition that brings out dramatic, affective, and emotional qualities that may engage and affect the listeners. Given the central importance of this skill for many kinds of music, expressive performance has become an important research topic for disciplines like musicology, music psychology, etc. This paper focuses on a specific thread of research: work on computational music performance models. Computational models are attempts at codifying hypotheses about expressive performance in terms of mathematical formulas or computer programs, so that they can be evaluated in systematic and quantitative ways. Such models can serve at least two purposes: they permit us to systematically study certain hypotheses regarding performance; and they can be used as tools to generate automated or semi-automated performances, in artistic or educational contexts. The present article presents an up-to-date overview of the state of the art in this domain. We explore recent trends in the field, such as a strong focus on data-driven (machine learning) approaches; a growing interest in interactive expressive systems, such as conductor simulators and automatic accompaniment systems; and an increased interest in exploring cognitively plausible features and models. We provide an in-depth discussion of several important design choices in such computer models, and discuss a crucial (and still largely unsolved) problem that is hindering systematic progress: the question of how to evaluate such models in scientifically and musically meaningful ways. 
From all this, we finally derive some research directions that should be pursued with priority, in order to advance the field and our understanding of expressive music performance.
... To this end, we have developed a technique for audio-to-score alignment, the task of temporally aligning a musical score to an audio rendition of that score. Score alignment has been used extensively, especially in interactive applications such as automatic accompaniment (Hu, Dannenberg, and Tzanetakis 2003;Cont 2010) or automatic page-turning (Arzt, Widmer, and Dixon 2008), and in performance-analysis applications (Sapp 2007;Molina-Solana and Widmer 2010). Score-alignment methods geared toward interactivity must operate in real time and thus require online inference, whereas those aimed for performance analysis may use offline inference but must be accurate enough to uncover useful information about interpretation. ...
This article presents an offline method for aligning an audio signal to individual instrumental parts constituting a musical score. The proposed method is based on fitting multiple hidden semi-Markov models (HSMMs) to the observed audio signal. The emission probability of each state of the HSMM is described using latent harmonic allocation (LHA), a Bayesian model of a harmonic sound mixture. Each HSMM corresponds to one musical instrument’s part, and the state duration probability is conditioned on a linear dynamics system (LDS) tempo model. Variational Bayesian inference is used to jointly infer LHA, HSMM, and the LDS. We evaluate the capability of the method to align musical audio to its score, under reverberation, structural variations, and fluctuations in onset timing among different parts.
... Audio-to-score alignment, the task of finding a temporal mapping between a music score and audio signal, is a critical task in music information retrieval. It is required whenever a system needs to coordinate a music score (e.g. a standard MIDI file) and an audio signal, such as score-informed source separation [1] [2], or score-informed analysis of music performance [3] [4] [5] [6] [7]. ...
Conference Paper
Full-text available
This paper presents a Bayesian method for temporally aligning a music score and an audio rendition. A critical problem in audio-to-score alignment is in dealing with the wide variety of timbre and volume of the audio rendition. In contrast with existing works that achieve this through ad-hoc feature design or careful training of tone models, we propose a Bayesian audio-to-score alignment method by modeling music performance as a Bayesian Hidden Markov Model, each state of which emits a Bayesian signal model based on Latent Harmonic Allocation. After attenuating reverberation, variational Bayes method is used to iteratively adapt the alignment, instrument tone model and the volume balance at each position of the score. The method is evaluated using sixty works of classical music of a variety of instrumentation ranging from solo piano to full orchestra. We verify that our method improves the alignment accuracy compared to dynamic time warping based on chroma vector for orchestral music, or our method employed in a maximum likelihood setting.
This is a detailed technical presentation of performance rules resulting from a project in which music performance has been analyzed by means of an analysis-by-synthesis procedure. The purpose of the rules is to convert the written score, complemented with chord symbols and phrase markers, to a musically acceptable performance. The rules are currently implemented in the programming language Lisp on a Macintosh computer.
This paper compares timing and key-velocity data collected from a skilled performance of Schubert's G♭-major Impromptu (Opus 90) with a number of performances generated by a version of a musical expression algorithm proposed by Todd (1992). Regression analysis is used to demonstrate both the shortcomings of this model as a complete explanation of musical expression and how it might be more successfully used as a tool for analyzing data from real performances. Used in this second manner, the algorithm is shown to provide a general expressive baseline against which other aspects of expression may be highlighted. It is also suggested that such a baseline provides a method of decomposing performances into continuous and discrete forms of expression. It is concluded that using algorithmic models as heuristic tools, rather than as explanations in themselves, may better serve our increased understanding of the flexible and multiple nature of musical expression.
Much has been written about expressivity by philosophers, composers, musicologists, and psychologists, but little is known about how the musicians of tomorrow (music students) approach this subject. This paper reports an exploratory study in which 135 students from music conservatories in three countries (England, Italy, Sweden) filled out a questionnaire that addressed four themes: (a) conceptualizing expressivity, (b) expressivity in everyday practice, (c) expressivity in music teaching, and (d) novel teaching strategies. The results suggest that students define expressivity mainly in terms of communicating emotions and 'playing with feeling'. Expressive skills are regarded as highly important by students, and they would like to practice more on expressivity than is currently the case. However, most students are skeptical toward using computers in teaching of expressivity since they cannot see how such applications could work. The results suggest that expressivity deserves more attention in music education than has hitherto been the case.
This investigation explores the common assumption that music and motion are closely related by comparing the stopping of running and the termination of a piece of music. Video recordings were made of professional dancers' stopping from running under different deceleration conditions, and instant values of body velocity, step frequency, and step length were estimated. In decelerations that were highly rated for aesthetic quality by a panel of choreographers, the mean body velocity could be approximated by a square-root function of time, which is equivalent to a cubic-root function of position. This implies a linear relationship between kinetic energy and time, i.e., a constant braking power. The mean body velocity showed a striking similarity with the mean tempo pattern of final ritardandi in music performances. The constant braking power was used as the basis for a model describing both the changes of tempo in final ritardandi and the changes of velocity in runners' decelerations. The translation of physical motion to musical tempo was realized by assuming that velocity and musical tempo are equivalent. Two parameters were added to the model to account for the variation observed in individual ritardandi and in individual decelerations: (1) the parameter q controlling the curvature, q = 3 corresponding to the runners' deceleration, and (2) the parameter v(end) for the final velocity and tempo value, respectively. A listening experiment was carried out presenting music examples with final ritardandi according to the model with different q values or to an alternative function. Highest ratings were obtained for the model with q = 2 and q = 3. Out of three functions, the model produced the best fit to individual measured ritardandi as well as to individual decelerations. A function previously used for modeling phrase-related tempo variations (interonset duration as a quadratic function of score position) produced the lowest ratings and the poorest fits to individual ritardandi. The results thus seem to substantiate the commonly assumed analogies between motion and music.
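The two-parameter model described in this abstract can be sketched in a few lines. The closed form below is an assumption rather than something the abstract states: it is one common way of writing the model, with tempo normalized to 1 at the start of the ritardando and score position `x` normalized to the range 0 to 1. It agrees with the abstract in that q = 3 yields a cubic-root function of position. The names `ritardando_tempo`, `q`, and `v_end` (the final tempo) are chosen here for illustration.

```python
def ritardando_tempo(x, q=3.0, v_end=0.5):
    """Normalized tempo at normalized score position x (0 = start, 1 = end).

    Assumed form: v(x) = (1 + (v_end**q - 1) * x) ** (1 / q),
    where q controls the curvature and v_end is the final tempo
    relative to the pre-ritardando tempo.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if q <= 0 or v_end <= 0:
        raise ValueError("q and v_end must be positive")
    return (1.0 + (v_end ** q - 1.0) * x) ** (1.0 / q)


# Sample the curve at five evenly spaced score positions for two curvatures.
positions = [i / 4 for i in range(5)]
curve_q2 = [ritardando_tempo(x, q=2.0, v_end=0.5) for x in positions]
curve_q3 = [ritardando_tempo(x, q=3.0, v_end=0.5) for x in positions]
```

Both curves start at tempo 1 and end at `v_end`. A larger `q` keeps the tempo high for longer and concentrates the slowing near the end, which is the sense in which `q` controls curvature.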
This paper presents an empirical study of the performance of final ritards in classical piano music by a collection of famous pianists. The particular approach taken here uses Friberg and Sundberg's kinematic rubato model in order to characterize the variability of performed ritards across pieces and pianists. The variability is studied in terms of the model parameters controlling the depth and curvature of the ritard, after the model has been fitted to the data. Apart from finding a strong positive correlation of both parameters, we derive curvature values from the current data set that are substantially higher than curvature values deemed appropriate in previous studies. Although the model is too simple to capture all meaningful fluctuations in tempo, its parameters seem to be musically relevant, since performances of the same piece tend to be strongly concentrated in the parameter space. Unsurprisingly, the model parameters are generally not discriminative for pianist identity. Still, in some cases systematic differences between pianists are observed.
The relation between music and motion has been a topic of much theoretical and empirical research. An important contribution is made by a family of computational theories, so-called kinematic models, that make an explicit relation between the laws of physical motion in the real world and expressive timing in music performance (see Friberg & Sundberg, 1999). These models were shown to have a good fit with a variety of empirical data, most notably that of the final ritard in music performance: the typical slowing down at the end of a music performance. However, the predictions of these kinematic models are independent of (1) the number of events, (2) the rhythmic structure, and (3) the overall tempo of the performance; these factors have no effect on the predicted shape of the ritardando. Computer simulations of a number of rhythm perception models show, however, a large effect of these structural and temporal factors. They are therefore proposed as a perception-based alternative to the kinematic approach. While a final ritard might coarsely resemble a square root function (according to a kinematic model), the predictions made by perception-based models are also influenced by the temporal structure of the musical material that constrains possible shapes of the ritard, and the perception-based approach can therefore be considered a potentially stronger theory than one that simply has a good fit (Roberts & Pashler, 2000).
The Practice of Performance: Studies in Musical Interpretation, edited by John Rink. Cambridge, Cambridge University Press, 1995. xxi + 290 pp. ISBN 0 521 45374 7. Reviewed by José A. Boiven
The timing of the last tones constituting the final retard is studied in performances of motor music, i.e., music dominated by long sequences of short and equal note values frequently accompanied by similar series of twice as long note values. The results suggest that the retard length is related to the length of the final cadence and that the retards are divided into two phases, the first of which is variable while the second is more regular; its length and decrease in tempo depends on the length of the last conceptual unit (motive) of the piece and, as regards the decrease in tempo, also the preretard mean tempo, with which the piece is played. The same preretard mean tempo also determines the duration of the note preceding the final chord. These observations are expressed in a set of equations by means of which retards are computed for a set of compositions. The musical quality of such rule generated retards is assessed by a jury of experienced musicians and music listeners.
Presented is a model of rubato, implemented in Lisp, in which expression is viewed as the mapping of musical structure into the variables of expression. The basic idea is that the performer uses “phrase final lengthening” as a device to reflect some internal representation of the phrase structure. The representation is based on Lerdahl and Jackendoff's time-span reduction. The basic heuristic in the model is recursive, involving look-ahead and planning at a number of levels. The planned phrasings are superposed beat by beat, and the output from the program is a list of durations which could easily be adapted to be sent to a synthesiser given a suitable system.