
Bass Playing Style Detection Based on High-level Features and Pattern Similarity.


Jakob Abeßer
Fraunhofer IDMT
Ilmenau, Germany
Paul Bräuer
Piranha Musik & IT
Berlin, Germany
Hanna Lukashevich, Gerald Schuller
Fraunhofer IDMT
Ilmenau, Germany
Abstract

In this paper, we compare two approaches for automatic
classification of bass playing styles, one based on high-
level features and another one based on similarity mea-
sures between bass patterns. For both approaches, we com-
pare two different strategies: classification of patterns as a
whole and classification of all measures of a pattern with a
subsequent accumulation of the classification results. Fur-
thermore, we investigate the influence of potential tran-
scription errors on the classification accuracy, which tend
to occur when real audio data is analyzed. We achieve best
classification accuracy values of 60.8% for the
feature-based classification and 68.5% for the classifica-
tion based on pattern similarity based on a taxonomy con-
sisting of 8 different bass playing styles.
1. Introduction

Melodic and harmonic structures have often been studied in the field of Music Information Retrieval. In genre discrimination tasks, however, mainly timbre-related features achieve somewhat satisfying results to the present day. The authors assume that bass patterns and playing styles are a missing complement. Bass provides central acoustic features
of music as a social phenomenon, namely its territorial
range and simultaneous bodily grasp. These qualities come
in different forms, which are what defines musical genres
to a large degree. Western popular music with its world-
wide influence on other styles is based upon compositional
principles of its classical roots, harmonically structured
around the deepest note. African styles also often use tonal
bass patterns as ground structure, while Asian and Latin
American styles traditionally prefer percussive bass sounds.
In contrast to the melody (which can easily be interpreted
in “cover versions” of different styles), the bass pattern
most often carries the main harmonic information as well
as a central part of the rhythmic and structural information.
A more detailed stylistic characterization of the bass in-
strument within music recordings will inevitably improve
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
© 2010 International Society for Music Information Retrieval.
classification results in genre and artist classification tasks.
Within the field of Computational Ethnomusicology (CE)
[19], the automatic detection of the playing styles of the
participating instruments such as the bass constitutes a
meaningful approach to unravel the fusion of different mu-
sical influences of a song. This holds true for many con-
temporary music genres and especially for those of a global
music background.
The remainder of this paper is organized as follows. Af-
ter outlining the goals and challenges in Sec. 2 and Sec. 3,
we provide a brief overview of related work in Sec. 4.
In Sec. 5, we introduce novel high-level features for the
analysis of transcribed bass lines. Furthermore, we pro-
pose different classification strategies, which we apply and
compare later in this paper. We introduce the used data set
and describe the performed experiments in Sec. 6. After
the results are discussed, we conclude this paper in Sec. 7.
2. Goals

The goal of this publication is to compare different approaches for automatic playing style classification. For this purpose, we compare classification approaches based on common statistical pattern recognition algorithms as well as on the similarity between bass patterns. In both scenarios, we want to investigate the applicability of an aggregated classification based on the sub-patterns of an unknown pattern.
3. Challenges

The extraction of score parameters such as note pitch and
onset from real audio recordings requires reliable auto-
matic transcription methods, which nowadays are still error-
prone when it comes to analyzing multi-timbral and poly-
phonic audio mixtures [4, 13]. This drawback impedes a
reliable extraction of high-level features that are designed
to capture important rhythmic and tonal properties for a
description of an instrumental track. This is one problem
addressed in our experiments. Another general challenge
is the translation of musical high-level terms such as syn-
copations, scale, or pattern periodicity into parameters that
are automatically retrievable by algorithms. Information
regarding micro-timing, which is by the nature of things
impossible to encompass in a score [9], is left out.
11th International Society for Music Information Retrieval Conference (ISMIR 2010)
4. Related Work

In recent years, the use of score-based high-level features has become more popular for tasks such as automatic genre classification. To derive a score-based representation from real audio recordings, various automatic transcription algorithms have been proposed. The authors of [18], [13], and [4] presented algorithms to transcribe bass lines. Musical high-level features allow capturing different properties from musical domains such as melody, harmony, and rhythm [1, 3, 10, 11]. Bass-related audio features were used for genre classification in [18], [1], and [17].
An excellent overview of existing approaches for the analysis of expressive music performance and artist-specific playing styles is provided in [23] and [24]. In [7],
different melodic and rhythmic high-level features are ex-
tracted before the performed melody is modeled with an
evolutionary regression tree model. The authors of [15]
also used features derived from the onset, inter-onset-
interval and loudness values of note progression to quan-
tify the performance style of piano players in terms of their
timing, articulation and dynamics. To compare different
performances in terms of rhythmic and dynamic similarity,
the authors of [14] proposed a numerical method based on
the correlation at different timescales.
5.1 Feature extraction
In this paper, we use 23 multi-dimensional high-level fea-
tures that capture various musical properties for the tonal
and rhythmic description of bass lines. The feature vec-
tor consists of 136 dimensions in total. The basic note parameters, which we investigate in this paper, are the absolute pitch Θ_P, the loudness Θ_V, the onset Θ_O^[s], and the durations Θ_D^[s] and Θ_D^[M] of each note. The indices [s] and [M] indicate that both the onset and the duration of a note can be measured in seconds as well as in lengths of measures. All these parameters are extracted from symbolic MIDI files by using the MIDI Toolbox for MATLAB [5].
Afterwards, further advanced note parameters are derived before features are extracted. From the pitch differences ∆Θ_P between adjacent notes in semitones, we obtain vectors containing the interval directions ∆Θ_P^(D) (being either ascending, constant, or descending) and the pitch differences in terms of functional interval types ∆Θ_P^(F). To derive the functional type of an interval, we map its size to a maximum absolute value of 12 semitones or one octave by using the modulo-12 operation in case it is larger than one octave upwards or downwards. Then each interval is assigned to a functional interval type (prime, second, third, etc.) according to well-known music principles. In addition to the high-level features presented in [1], we use various additional features related to tonality and rhythm, which are explained in the following subsections.
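The interval pre-processing described above can be sketched as follows. The function names are hypothetical, and the folding rule assumes that only intervals larger than one octave are reduced modulo 12, so that an exact octave keeps its size of 12 semitones:

```python
# Hypothetical helpers sketching the interval parameters described above:
# a pitch difference in semitones yields a direction and a functional size
# folded to at most one octave (12 semitones) via modulo 12.
def interval_direction(delta):
    """+1 for ascending, 0 for constant, -1 for descending intervals."""
    return (delta > 0) - (delta < 0)

def fold_interval(delta):
    """Map an interval to a maximum absolute size of 12 semitones."""
    size = abs(delta)
    if size > 12:
        size = size % 12  # only intervals beyond one octave are folded down
    return size
```

The folded size can then be looked up in a table of functional interval types (prime, second, third, etc.).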
Features related to tonality
We derive features to measure whether a certain scale is applied in a bass pattern. To this end, we take different binary scale templates into account: natural minor (which includes the major scale), harmonic minor, melodic minor, pentatonic minor (a subset of natural minor which also includes the pentatonic major scale), blues minor, whole tone, whole tone half tone, Arabian, minor gypsy, and Hungarian gypsy [21].
Each scale template consists of 12 values representing all
semitones of an octave. The value 1 is set for all semi-
tones that are part of the scale, the value 0 for those that
are not. All notes within a given pattern, which are related to a certain scale, are accumulated by adding their normalized note loudness values Θ_V / Θ_V,max, with Θ_V,max being the maximum note loudness in a pattern. The same is done for all notes which are not contained in the scale.
The ratio of both sums is calculated over all investigated
scales and over all 12 possible cyclic shifts of the scale
template. This cyclic shift is performed to cope with each
possible root note position. The maximum ratio value over
all shifts is determined for each scale template and used as
a feature value, which measures the presence of each considered scale. We obtain the relative frequencies p_i of all possible values in the vector that contains the interval directions (∆Θ_P^(D)) as well as in the vector that contains the functional interval types (∆Θ_P^(F)), and use them as features to characterize the variety of different pitch transitions between adjacent notes.
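The loudness-weighted scale test described above can be sketched as follows. This is a minimal sketch with hypothetical names; the handling of patterns without any out-of-scale note is an assumption, since the paper does not define the ratio for a zero denominator:

```python
# Binary semitone template for natural minor relative to the root note.
NATURAL_MINOR = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

def scale_feature(pitches, loudness, template):
    """Maximum in-scale vs. out-of-scale loudness ratio over all 12 roots."""
    v_max = max(loudness)
    best = 0.0
    for shift in range(12):  # cyclic shift emulates every root note position
        rotated = template[shift:] + template[:shift]
        inside = sum(v / v_max for p, v in zip(pitches, loudness)
                     if rotated[p % 12])
        outside = sum(v / v_max for p, v in zip(pitches, loudness)
                      if not rotated[p % 12])
        # assumption: if no note falls outside the scale, use the in-scale
        # sum itself (the zero-denominator case is unspecified in the text)
        ratio = inside / outside if outside > 0 else inside
        best = max(best, ratio)
    return best
```

The same routine would be run once per scale template, yielding one feature value per considered scale.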
Features related to rhythm
Syncopation embodies an important stylistic means in dif-
ferent music genres. It represents the accentuation on weak
beats of a measure instead of an accentuation on a neighboring strong beat that usually would be emphasized. To
detect syncopated note sequences within a bass-line, we
investigate different temporal grids in terms of equidistant partitionings of single measures. For instance, for an eighth-note grid, we map each note inside a measure to one of eight segments according to its onset position inside the measure. In a 4/4 time signature, these segments correspond to the 4 quarter notes (on-beats) and their off-beats in between. If at least one note is mapped to a segment, it is associated with the value 1, otherwise with 0. For each grid, we count the occurrences of the following segment sequences: (1001), (0110), (0001), or (0111). These sequences correspond to alternating on-beat and off-beat accentuations that are labeled as syncopations. The ratios between the number of syncopation sequences and the number of segments are applied as features for the rhythmical grids 4, 8, 16, and 32.
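A minimal sketch of this syncopation feature follows; the names are hypothetical, and the sliding alignment of the 4-segment windows within the grid is an assumption not stated in the text:

```python
# Segment sequences labeled as syncopations in the text above.
SYNCOPATION_PATTERNS = ["1001", "0110", "0001", "0111"]

def syncopation_ratio(onsets_in_measure, grid=8):
    """Ratio of syncopated 4-segment sequences to grid size for one measure."""
    segments = [0] * grid
    for onset in onsets_in_measure:           # onset in [0, 1) measure lengths
        segments[int(onset * grid) % grid] = 1
    binary = "".join(map(str, segments))
    # assumption: a sliding window over the grid detects the listed patterns
    count = sum(binary[i:i + 4] in SYNCOPATION_PATTERNS
                for i in range(grid - 3))
    return count / grid
```

Running this for grid sizes 4, 8, 16, and 32 would yield the four rhythmic feature values described above.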
We calculate the ratio between the duration Θ_D^[M](k) of the k-th note (measured in measure lengths) and the inter-onset interval between the k-th note and its succeeding note. Then we derive the mean and the variance of this value over all notes as features. A high or low mean value indicates whether notes are played legato or staccato. The variance over all ratios captures the variation between these two types of rhythmic articulation within a
given bass pattern. To measure whether notes are mostly played on on-beats or off-beats, we investigate the distribution of notes over the segments in the rhythmical grids as explained above for the syncopation feature. For example, segments 1, 3, 5, and 7 are associated with on-beat positions for an eighth-note grid and a 4/4 time signature. Again,
this ratio is calculated over all notes and mean and vari-
ance are taken as feature values. As additional rhythmic
properties, we derive the frequencies of occurrence of all
commonly used note lengths from half notes to 64th notes,
each in its normal, dotted, and triplet version. In addition,
the relative frequencies from all note-note, note-break and
break-note sequences over the complete pattern are taken
as features.
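The articulation statistics described above (duration over inter-onset interval, summarized by mean and variance) might be sketched as follows, with hypothetical names:

```python
# Sketch of the articulation feature: mean and variance of the
# duration-over-IOI ratio; values near 1 suggest legato, near 0 staccato.
from statistics import mean, pvariance

def articulation_stats(onsets, durations):
    """Ratio statistics over all notes that have a succeeding note."""
    ratios = [d / (b - a)                      # duration / inter-onset interval
              for a, b, d in zip(onsets, onsets[1:], durations)]
    return mean(ratios), pvariance(ratios)
```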
5.2 Classification based on statistical pattern recognition
We investigate the applicability of the well-established Sup-
port Vector Machines (SVM) using the Radial Basis Func-
tion (RBF) as kernel combined with a preceding feature
selection using the Inertia Ratio Maximization using Fea-
ture Space Projection (IRMFSP) as a baseline experiment.
The feature selection is applied to choose the most discrim-
inative features and thus to reduce the dimensionality of
the feature space prior to the classification. Therefore, we
calculate the high-level features introduced in Sec. 5.1 for each bass pattern, which results in a 136-dimensional feature space. Details on both the SVM and the IRMFSP can be found, for instance, in [1].
5.3 Classification based on pattern similarity
In this paper, we apply two different kinds of pattern similarity measures: pairwise similarity measures and similarity measures based on the Levenshtein distance. To compute similarity values between patterns, the values of the onset vector Θ_O^[M] and the absolute pitch vector Θ_P are simply converted into character strings. In the latter case, we initially subtract the minimum value of Θ_P for each pattern separately to remain independent of pitch transpositions. This approach can, of course, be affected by potential outliers that do not belong to the pattern.
5.3.1 Similarity measures based on the Levenshtein distance
The Levenshtein distance D_L offers a metric for the computation of the similarity of strings [6]. It measures the minimum number of edits in terms of insertions, deletions, and substitutions that are necessary to convert one string into the other. We use the Wagner-Fischer algorithm [20] to compute D_L and derive a similarity measure S_L between two strings of lengths l_1 and l_2 from

S_L = 1 − D_L / D_L,max . (1)

The lengths l_1 and l_2 correspond to the numbers of notes in both patterns; D_L,max equals the maximum of l_1 and l_2. In the experiments, we use the rhythmic similarity measure S_L,R and the tonal similarity measure S_L,T derived from the Levenshtein distance between the onsets Θ_O^[M] and the pitches Θ_P as explained in the previous section. Furthermore, we investigate

S_L,RT,Max = S_L,R if S_L,R ≥ S_L,T, else S_L,T (2)

S_L,RT,Mean = (S_L,R + S_L,T) / 2 (3)

by using the maximum and the arithmetic mean of S_L,R and S_L,T as aggregated similarity measures.
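The Wagner-Fischer computation of D_L and the normalization of Eq. 1 can be sketched as follows (a standard dynamic-programming implementation; function names are hypothetical):

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))             # distances for the empty prefix
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """S_L = 1 - D_L / D_L,max with D_L,max = max(len(a), len(b))."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

Applied to the onset strings this yields S_L,R, and applied to the transposed pitch strings it yields S_L,T.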
5.3.2 Pairwise similarity measures
In general, we derive a pairwise similarity measure S_P (Eq. 4) from the counts N_n,m and N_m,n. N_n,m denotes the number of notes in pattern n for which at least one note in pattern m exists that has the same absolute pitch value (for the similarity measure S_P,T) or onset value (for the similarity measure S_P,R). N_m,n is defined vice versa. By applying the constraint that both onset and absolute pitch need to be equal in Eq. 4, we obtain the measure S_P,RT. Furthermore, we derive the aggregated similarity measures S_P,RT,Max and S_P,RT,Mean analogously to Eq. 2 and Eq. 3.
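Since Eq. 4 itself did not survive extraction, the sketch below assumes the symmetric normalization (N_n,m + N_m,n) / (N_n + N_m), which matches the surrounding description but is an assumption; the helper names are hypothetical:

```python
def pairwise_similarity(notes_n, notes_m, key):
    """notes_* are lists of (onset, pitch) tuples; key selects the compared value.

    Assumed normalization: (N_nm + N_mn) / (N_n + N_m); Eq. 4 is not
    legible in the source, so this is an illustrative choice only.
    """
    values_m = {key(note) for note in notes_m}
    values_n = {key(note) for note in notes_n}
    n_nm = sum(key(note) in values_m for note in notes_n)  # matches of n in m
    n_mn = sum(key(note) in values_n for note in notes_m)  # matches of m in n
    return (n_nm + n_mn) / (len(notes_n) + len(notes_m))

# S_P,T compares pitches, S_P,R onsets, S_P,RT requires both to match:
s_pt = lambda n, m: pairwise_similarity(n, m, key=lambda note: note[1])
s_pr = lambda n, m: pairwise_similarity(n, m, key=lambda note: note[0])
s_prt = lambda n, m: pairwise_similarity(n, m, key=lambda note: note)
```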
6.1 Data-set
We assembled a novel dataset from instructional bass lit-
erature [12, 21], which consists of bass patterns from the
8 genres Swing (SWI), Funk (FUN), Blues (BLU), Reggae
(REG), Salsa & Mambo (SAL), Rock (ROC), Soul & Mo-
town (SOU) and Africa (AFR), a rather general term which
here signifies Sub-Saharan Popular Music Styles [16]. For
each genre, 40 bass lines of 4 measures length have been stored as symbolic MIDI files. Initial listening tests revealed that in this data set, which was assembled and categorized by professional bass players, a certain amount of stylistic overlap between genres occurs, for instance between Blues and Swing or between Soul & Motown and Funk. The overlap is partly inherent to the approach of the data sets, which treat all examples of a style (e.g., Rock) as homogeneous although the sets include typical patterns of several decades. In some features, early
Rock patterns might resemble early Blues patterns more
than they resemble late patterns of their own style [22].
Thus, the data set will be extended further and revised by
educated musicologists for future experiments.
6.2 Experiments & Results
6.2.1 Experiment 1 - Feature-based classification
As described in Sec. 5.2, we performed a baseline experi-
ment that consists of the IRMFSP for choosing the best N = 80 features and the SVM as classifier. The parameter N has
Rows denote the correct bass playing style (first row: AFR), columns the classified style (all values in %):

66.2  5.9  2.0  8.8 10.8  0.0  6.4  0.0
 0.0 46.1  0.0 22.4  0.0 11.8  3.9 15.7
 7.4  4.2 72.8  1.4 10.6  3.6  0.0  0.0
 2.0  2.9  6.9 51.6  4.6 21.8 10.3  0.0
21.0  0.0  4.2 10.6 49.4  8.3  6.5  0.0
 2.6  0.0  0.0 10.7  0.0 70.4 16.2  0.0
25.0  0.0  1.2  5.6  6.7 14.0 47.5  0.0
 0.0 17.6  0.0  0.0  0.0  0.0  0.0 82.4
Figure 1. Exp. 1 - Confusion matrix for the feature-based
pattern-wise classification (all values given in %). Mean
classification accuracy is 60.8% with a standard deviation
of 2.4%.
been determined to perform best in previous tests on the
data-set. A 20-fold cross validation was applied to de-
termine the mean and standard deviation of the classifi-
cation accuracy. For a feature extraction and classification
based on complete patterns, we achieved 60.8% of accu-
racy with a standard deviation of 2.4%. The correspond-
ing confusion matrix is shown in Fig. 1. It can be seen that the best classification results were achieved for the styles
Funk, Rock, and Swing. Strong confusions can be identified between Blues and Motown as well as Swing, between Motown and Rock, between Reggae and Africa, and between Salsa and Africa. These confusions support the musicological
assessment of the data-set given in Sec. 6.1. In addition,
they coincide with historical relations between the styles in
Africa, the Caribbean, and Latin America, as well as rela-
tions within North America as it is common musicological
knowledge [8].
As a second classification strategy, we performed the
feature extraction and classification based on sub-patterns.
To this end, we divided each pattern within the test set into N = 4 sub-patterns of one measure length. It was ensured that no sub-patterns of patterns in the test set were used as training data. After all sub-patterns were classified, the estimated playing style for the corresponding test set pattern was derived from a majority decision over all sub-pattern classifications. In case of multiple winning classes, a random decision was made between the winning classes. For the accumulated measure-wise classification, we achieved only 56.4% of accuracy. Thus, this approach did not improve the classification accuracy. We assume that the majority of the applied high-level features, which are based on different statistical descriptors (see Sec. 5.1 for details), cannot provide an appropriate characterization of the sub-patterns, which themselves only consist of 6 to 9 notes on average.
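The majority decision with a random tie-break described above might be sketched as follows (hypothetical names):

```python
# Majority vote over sub-pattern class estimates; ties between winning
# classes are resolved by a random decision, as described in the text.
import random
from collections import Counter

def majority_vote(sub_pattern_labels, rng=random):
    counts = Counter(sub_pattern_labels)
    top = max(counts.values())
    winners = [label for label, c in counts.items() if c == top]
    return rng.choice(winners)
```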
6.2.2 Experiment 2 - Pattern Similarity
This experiment is based on a leave-one-out cross-
validation scheme and thus consists of N = 320 evaluation steps according to the 320 patterns in the data-set.
Within each evaluation step, the current pattern P_k is used as test data while all remaining patterns P_l with l ≠ k are used as training data. We derive the class estimate ĉ_k of
Rows denote the correct bass playing style (first row: AFR), columns the classified style (all values in %):

57.4  2.1  6.4 17.0  6.4  0.0  8.5  2.1
 4.2 50.0  4.2 18.8  2.1  6.3  4.2 10.4
 4.4  6.7 62.2 11.1  2.2  6.7  4.4  2.2
 0.0  0.0  0.0 95.1  0.0  0.0  2.4  2.4
 4.7  0.0  7.0 11.6 65.1  7.0  2.3  2.3
 0.0  4.7  0.0 14.0  0.0 69.8  2.3  9.3
 6.8  4.5  4.5  6.8  4.5  0.0 68.2  4.5
 0.0 12.5  0.0  7.5  0.0  0.0  0.0 80.0
Figure 2. Exp. 2 - Confusion matrix for the best similarity-based configuration (measure-wise classification using the S_P,RT,Max similarity measure; all values given in %).
Mean classification accuracy is 68.5% with a standard de-
viation of 3.1%.
P_k from the class label ĉ of the best-fitting pattern P_l̂, where

l̂ = arg max_l S_k,l (5)

with S_k,l representing the similarity measure between P_k and P_l in the given case. As in Sec. 6.2.1, if multiple
patterns have the same (highest) similarity, we perform
a random decision among these candidates. This experi-
ment is performed for all similarity measures introduced
in Sec. 5.3.
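The leave-one-out nearest-neighbor rule of Eq. 5, including the random tie-break, can be sketched as follows (hypothetical names; the similarity measure is passed in as a generic callback):

```python
# Each test pattern takes the class label of its most similar training
# pattern; ties for the highest similarity are resolved at random.
import random

def classify(k, patterns, labels, similarity, rng=random):
    scores = {l: similarity(patterns[k], patterns[l])
              for l in range(len(patterns)) if l != k}   # leave pattern k out
    best = max(scores.values())
    candidates = [l for l, s in scores.items() if s == best]
    return labels[rng.choice(candidates)]
```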
Exp. 2a: Pattern-wise classification. The basic approach
for a pattern-based classification is to use each pattern of 4
measures length as one item to be classified.
Exp. 2b: Accumulated measure-wise classification. Bass patterns are often structured in such a way that the measure, or the part of the measure, which precedes the pattern repetition is altered rhythmically or tonally and thus often varies greatly from the rest of the pattern. These figures separating or introducing a pattern repetition are commonly referred to as pickups or upbeats, meaning that they do not vary or overlap the following pattern repetition, which starts on the first beat of the new measure. A pattern-wise classification as described above thus might overemphasize the differences between the last measures because the patterns are compared over their complete length. Hence, we investigate
another decision aggregation strategy in this experiment.
As described in Sec. 6.2.1, we divide each bass pattern
into sub-patterns of one measure length each. Within each
fold k, we classify each sub-pattern SP_k,l of the current test pattern P_k separately. At the same time, we ensure that only sub-patterns of the other patterns P_i with i ≠ k are used as training set for the current fold. To accumulate the classification results in each fold, we add all similarity values S_k,l between each sub-pattern SP_k,l and its assigned winning pattern(s) P_k,l,win. The summation is done for each of the 8 genres separately. The genre that achieves the highest sum is considered the winning genre.
As depicted in Fig. 3, the proposed accumulated
measure-wise classification strategy led to higher classifi-
cation accuracy values (blue bars) in comparison to a
pattern-wise classification (red bars). This approach can
be generalized and adopted to patterns of arbitrary length.
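The genre-wise similarity summation described above might be sketched as follows (hypothetical names; each sub-pattern contributes the similarity value of its winning pattern to that pattern's genre):

```python
# Accumulated measure-wise decision: sum the winning similarity values
# per genre and pick the genre with the highest total.
from collections import defaultdict

def accumulate(sub_pattern_results):
    """sub_pattern_results: list of (winning_genre, similarity) pairs."""
    totals = defaultdict(float)
    for genre, sim in sub_pattern_results:
        totals[genre] += sim
    return max(totals, key=totals.get)
```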
Figure 3. Mean classification accuracy results for experiment 2 (accumulated measure-wise vs. pattern-wise classification; y-axis: mean accuracy).
Figure 4. Exp. 3 - Mean classification accuracy vs. percentage ε of pattern variation (dotted line: pattern-wise similarity, solid line: accumulated measure-wise similarity).
The similarity measure S_P,RT,Max clearly outperforms the other similarity measures by over 10 percentage points of accuracy. The corresponding confusion matrix is shown in
Fig. 2. We therefore assume that it is beneficial to use sim-
ilarity information both based on pitch and onset similarity
of bass patterns. For the pattern-wise classification, it can
be seen that similarity measures based on tonal similar-
ity generally achieve lower accuracy results in comparison
to measures based on the rhythmic similarity. This might
be explained by the frequently occurring tonal variation of
patterns according to the given harmonic context such as a
certain chord of a changed key in different parts of a song.
The most remarkable result in the confusion matrix is the very high accuracy of 95.1% for the Motown genre.
6.2.3 Experiment 3 - Influence of pattern variations
For the extraction of bass-patterns from audio recordings,
two potential sources of error exist. In most music gen-
res, the dominant bass patterns are subject to small variations throughout a music piece. An automatic system
might recognize the basic pattern or a variation of the basic
pattern. Furthermore, automatic music transcription sys-
tems are prone to errors in terms of incorrect pitch, onset,
and duration values of the notes. Both phenomena directly
have a negative effect on the computed high-level features.
We therefore investigate the achievable classification accu-
racy dependent on the percentage of notes with erroneous
note parameters.
We simulate the mentioned scenarios by manipulating a random selection of ε percent of all notes from each unknown pattern and vary ε from 0% to 50%. The manipulation of a single note consists of either a modification of the onset Θ_O^[M] by a randomly chosen difference −0.25 ≤ ∆Θ_O^[M] ≤ 0.25 (which corresponds to a maximum shift distance of one beat for a 4/4 time signature), a modification of the absolute pitch Θ_P by a randomly chosen difference −2 ≤ ∆Θ_P ≤ 2 (which corresponds to a maximum distance of 2 semitones), or a simple deletion of
the current note from the pattern. Octave pitch errors that
often appear in automatic transcription algorithms were not
considered because of the mapping of each interval to a
maximum size of one octave as described in Sec. 5.1. Insertions in terms of additional notes that are not part of the pattern will be taken into account in future experiments.
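The note manipulation of experiment 3 might be simulated as follows. The note representation and function names are hypothetical, and the uniform choice among the three error types is an assumption the paper does not specify:

```python
# Error simulation: epsilon percent of the notes receive a random onset
# shift (up to one beat in 4/4), a pitch shift (up to 2 semitones), or
# are deleted from the pattern.
import random

def perturb(notes, epsilon, rng=random):
    """notes: list of dicts with 'onset' (measures) and 'pitch' (semitones)."""
    notes = [dict(n) for n in notes]               # work on a copy
    n_errors = round(len(notes) * epsilon / 100.0)
    for idx in rng.sample(range(len(notes)), n_errors):
        kind = rng.choice(["onset", "pitch", "delete"])  # assumed uniform
        if kind == "onset":
            notes[idx]["onset"] += rng.uniform(-0.25, 0.25)
        elif kind == "pitch":
            notes[idx]["pitch"] += rng.randint(-2, 2)
        else:
            notes[idx] = None                       # deletion
    return [n for n in notes if n is not None]
```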
As depicted in Fig. 4, the accuracy curves of the three different pairwise similarity measures S_P,R, S_P,T, and S_P,RT,Max fall to about 40% for a transcription error rate of 50%. Interestingly, the pattern-wise classification based on S_P,R seems to be more robust to transcription errors above 15% in comparison to the accumulated measure-wise classification, even though it has a lower accuracy rate under the assumption of a perfect transcription.
6.2.4 Comparison to the related work
A direct comparison of the achieved results to the related work is not feasible. On the one hand, different data sets have been utilized. Tsunoo et al. [18] reported an accuracy of 44.8% for the GTZAN data set¹ while using only bass-line features. On the other hand, the performance of bass-line features alone was not always stated. The work of Tsuchihashi et al. [17] showed an improvement of classification accuracy from 53.6% to 62.7% when applying bass-line features complementary to other timbral and rhythmical features, but the results of genre classification with only bass features were not reported.
7. Conclusions

In this paper, different approaches for the automatic detection of playing styles from score parameters were compared. These parameters can be extracted from symbolic audio data (e.g., MIDI) or from real audio data by means of automatic transcription. For the feature-based approach, a best result of 60.8% of accuracy was achieved using a combination of feature selection (IRMFSP) and classifier (SVM) and a pattern-wise classification. Regarding the classification based on pattern similarity, we achieved 68.5% of accuracy using the combined similarity measure S_P,RT,Max and a measure-wise aggregation strategy based on the classification of sub-patterns. The random baseline is 12.5%. This approach outperformed the common approach of classifying the complete pattern at once.
For analyzing real-world audio recordings, further mu-
sical aspects such as micro-timing, tempo range, applied
plucking & expression styles [2], as well as the interac-
¹ G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293-302, 2002.
tion with other participating instruments need to be incorporated into an all-embracing style description of a specific instrument in a music recording. The results of experiment 3 emphasize the need for a well-performing transcription
system for a high-level classification task such as playing
style detection.
Acknowledgments

This work has been partly supported by the German research project GlobalMusic2One, funded by the Federal Ministry of Education and Research (BMBF-FKZ: 01/S08039B). Additionally, the Thuringian Ministry of Economy, Employment and Technology supported this research by granting funds of the European Fund for Regional Development to the project Songs2See, enabling transnational cooperation between Thuringian companies and their partners from other European regions.
References

[1] J. Abeßer, H. Lukashevich, C. Dittmar, and
G. Schuller. Genre classification using bass-related
high-level features and playing styles. In Proc. of the
Int. Society of Music Information Retrieval (ISMIR
Conference), Kobe, Japan, 2009.
[2] J. Abeßer, H. Lukashevich, and G. Schuller. Feature-
based extraction of plucking and expression styles of
the electric bass guitar. In Proc. of the IEEE Int.
Conf. on Acoustics, Speech, and Signal Processing
(ICASSP), 2010.
[3] P. J. Ponce de León and J. M. Iñesta. Pattern recogni-
tion approach for music style identification using shal-
low statistical descriptors. IEEE Transactions on Sys-
tem, Man and Cybernetics - Part C : Applications and
Reviews, 37(2):248-257, March 2007.
[4] C. Dittmar, K. Dressler, and K. Rosenbauer. A tool-
box for automatic transcription of polyphonic music.
In Proc. of the Audio Mostly, 2007.
[5] Tuomas Eerola and Petri Toiviainen. MIDI Toolbox:
MATLAB Tools for Music Research. University of
Jyväskylä, Jyväskylä, Finland, 2004.
[6] D. Gusfield. Algorithms on strings, trees, and se-
quences: computer science and computational biol-
ogy. Cambridge University Press, Cambridge, UK, 1997.
[7] A. Hazan, M. Grachten, and R. Ramirez. Evolving per-
formance models by performance similarity: Beyond
note-to-note transformations. In Proc. of the Int. Symposium, 2006.
[8] Ellen Koskoff, editor. The Garland Encyclopedia of
World Music - The United States and Canada. Garland
Publishing, New York, 2001.
[9] Gerhard Kubik. Zum Verstehen afrikanischer Musik.
Lit Verlag, Wien, 2004.
[10] C. McKay and I. Fujinaga. Automatic genre classifi-
cation using large high-level musical feature sets. In
Proc. of the Int. Symposium of Music Information Re-
trieval (ISMIR), 2004.
[11] C. McKay and I. Fujinaga. jSymbolic: A feature ex-
tractor for MIDI files. In Int. Computer Music Confer-
ence (ICMC), pages 302–305, 2006.
[12] H.-J. Reznicek. I’m Walking - Jazz Bass. AMA, 2001.
[13] M. P. Ryynänen and A. P. Klapuri. Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32:72-86, 2008.
[14] C. S. Sapp. Hybrid numeric/rank similarity metrics for
musical performance analysis. In Proc. of the Int. Sym-
posium on Music Information Retrieval (ISMIR), pages
501–506, 2008.
[15] E. Stamatatos and G. Widmer. Automatic identification
of music performers with learning ensembles. Artificial
Intelligence, 165:37–56, 2005.
[16] Ruth M. Stone, editor. The Garland Encyclopedia of
World Music - Africa, volume 1. Garland Publishing,
New York, 1998.
[17] Y. Tsuchihashi, T. Kitahara, and H. Katayose. Using
bass-line features for content-based MIR. In Proc. of
the Int. Conference on Music Information Retrieval
(ISMIR), Philadelphia, USA, pages 620–625, 2008.
[18] E. Tsunoo, N. Ono, and S. Sagayama. Musical bass-line clustering and its application to audio genre classification. In Proc. of the Int. Society for Music Information Retrieval Conference (ISMIR), Kobe, Japan, 2009.
[19] George Tzanetakis, Ajay Kapur, W. Andrew Schloss, and Matthew Wright. Computational ethnomusicology. Journal of Interdisciplinary Music Studies, 1(2):1–24, 2007.
[20] Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. Journal of the ACM (JACM), 21(1):168–173, 1974.
[21] Paul Westwood. Bass Bible. AMA, 1997.
[22] Peter Wicke. Handbuch der populären Musik: Geschichte, Stile, Praxis, Industrie. Schott, Mainz.
[23] G. Widmer, S. Dixon, W. Goebl, E. Pampalk, and A. Tobudic. In search of the Horowitz factor. AI Magazine, 24:111–130, 2003.
[24] G. Widmer and W. Goebl. Computational models of expressive music performance: The state of the art. Journal of New Music Research, 33(3):203–216, 2004.
11th International Society for Music Information Retrieval Conference (ISMIR 2010)