
Bass Playing Style Detection Based on High-level Features and Pattern Similarity.


Jakob Abeßer
Fraunhofer IDMT
Ilmenau, Germany
Paul Bräuer
Piranha Musik & IT
Berlin, Germany
Hanna Lukashevich, Gerald Schuller
Fraunhofer IDMT
Ilmenau, Germany
Abstract

In this paper, we compare two approaches for automatic
classification of bass playing styles, one based on high-
level features and another one based on similarity mea-
sures between bass patterns. For both approaches, we com-
pare two different strategies: classification of patterns as a
whole and classification of all measures of a pattern with a
subsequent accumulation of the classification results. Fur-
thermore, we investigate the influence of potential tran-
scription errors on the classification accuracy, which tend
to occur when real audio data is analyzed. We achieve best
classification accuracy values of 60.8% for the
feature-based classification and 68.5% for the classifica-
tion based on pattern similarity based on a taxonomy con-
sisting of 8 different bass playing styles.
1. Introduction

Melodic and harmonic structures have often been studied in the field of Music Information Retrieval. In genre discrimination tasks, however, mainly timbre-related features achieve somewhat satisfying results to the present day. The authors assume that bass patterns and playing styles are a missing complement. Bass provides central acoustic features
of music as a social phenomenon, namely its territorial
range and simultaneous bodily grasp. These qualities come
in different forms, which are what defines musical genres
to a large degree. Western popular music with its world-
wide influence on other styles is based upon compositional
principles of its classical roots, harmonically structured
around the deepest note. African styles also often use tonal
bass patterns as ground structure, while Asian and Latin
American styles traditionally prefer percussive bass sounds.
In contrast to the melody (which can easily be interpreted
in “cover versions” of different styles), the bass pattern
most often carries the main harmonic information as well
as a central part of the rhythmic and structural information.
A more detailed stylistic characterization of the bass in-
strument within music recordings will inevitably improve
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
© 2010 International Society for Music Information Retrieval.
classification results in genre and artist classification tasks.
Within the field of Computational Ethnomusicology (CE)
[19], the automatic detection of the playing styles of the
participating instruments such as the bass constitutes a
meaningful approach to unravel the fusion of different mu-
sical influences of a song. This holds true for many con-
temporary music genres and especially for those of a global
music background.
The remainder of this paper is organized as follows. Af-
ter outlining the goals and challenges in Sec. 2 and Sec. 3,
we provide a brief overview of related work in Sec. 4.
In Sec. 5, we introduce novel high-level features for the
analysis of transcribed bass lines. Furthermore, we pro-
pose different classification strategies, which we apply and
compare later in this paper. We introduce the used data set
and describe the performed experiments in Sec. 6. After
the results are discussed, we conclude this paper in Sec. 7.
2. Goals

The goal of this publication is to compare different approaches for automatic playing style classification. For this purpose, we compare classification approaches based on common statistical pattern recognition algorithms as well as on the similarity between bass patterns. In both scenarios, we want to investigate the applicability of an aggregated classification based on the sub-patterns of an unknown pattern.
3. Challenges

The extraction of score parameters such as note pitch and
onset from real audio recordings requires reliable auto-
matic transcription methods, which nowadays are still error-
prone when it comes to analyzing multi-timbral and poly-
phonic audio mixtures [4, 13]. This drawback impedes a
reliable extraction of high-level features that are designed
to capture important rhythmic and tonal properties for a
description of an instrumental track. This is one problem
addressed in our experiments. Another general challenge
is the translation of musical high-level terms such as syn-
copations, scale, or pattern periodicity into parameters that
are automatically retrievable by algorithms. Information
regarding micro-timing, which is by the nature of things
impossible to encompass in a score [9], is left out.
11th International Society for Music Information Retrieval Conference (ISMIR 2010)
4. Related Work

In recent years, the use of score-based high-level features has become more popular for tasks such as automatic genre classification. To derive a score-based representation from real audio recordings, various automatic transcription algorithms have been proposed. The authors of [18], [13], and [4] presented algorithms to transcribe bass lines. Musical high-level features allow capturing different properties from musical domains such as melody, harmony, and rhythm [1, 3, 10, 11]. Bass-related audio features were used for genre classification in [18], [1], and [17].
An excellent overview of existing approaches for the analysis of expressive music performance and artist-specific playing styles is provided in [23] and [24]. In [7],
different melodic and rhythmic high-level features are ex-
tracted before the performed melody is modeled with an
evolutionary regression tree model. The authors of [15]
also used features derived from the onset, inter-onset-
interval and loudness values of note progression to quan-
tify the performance style of piano players in terms of their
timing, articulation and dynamics. To compare different
performances in terms of rhythmic and dynamic similarity,
the authors of [14] proposed a numerical method based on
the correlation at different timescales.
5.1 Feature extraction
In this paper, we use 23 multi-dimensional high-level fea-
tures that capture various musical properties for the tonal
and rhythmic description of bass lines. The feature vec-
tor consists of 136 dimensions in total. The basic note parameters, which we investigate in this paper, are the absolute pitch Θ_P, the loudness Θ_V, the onset Θ_O^[s], and the durations Θ_D^[s] and Θ_D^[M] of each note. The indices [s] and [M] indicate that both the onset and the duration of a note can be measured in seconds as well as in lengths of measures. All these parameters are extracted from symbolic MIDI files by using the MIDI Toolbox for MATLAB [5].
Afterwards, further advanced note parameters are derived before features are extracted. From the pitch differences ∆Θ_P between adjacent notes in semitones, we obtain vectors containing the interval directions ∆Θ_P^(D) (being either ascending, constant, or descending) and the pitch differences in terms of functional interval types ∆Θ_P^(F). To derive the functional type of an interval, we map its size to a maximum absolute value of 12 semitones or one octave by using the modulo-12 operation in case it is larger than one octave upwards or downwards. Then each interval is assigned to a functional interval type (prime, second, third, etc.) according to well-known music principles. In addition to the high-level features presented in [1], we use various additional features related to tonality and rhythm, which are explained in the following subsections.
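The interval pre-processing described above can be sketched as follows. The function names are hypothetical, and the folding rule assumes that only intervals larger than one octave are reduced modulo 12, so that an exact octave keeps its size of 12 semitones:

```python
# Hypothetical helpers sketching the interval parameters described above:
# a pitch difference in semitones yields a direction and a functional size
# folded to at most one octave (12 semitones) via modulo 12.
def interval_direction(delta):
    """+1 for ascending, 0 for constant, -1 for descending intervals."""
    return (delta > 0) - (delta < 0)

def fold_interval(delta):
    """Map an interval to a maximum absolute size of 12 semitones."""
    size = abs(delta)
    if size > 12:
        size = size % 12  # only intervals beyond one octave are folded down
    return size
```

The folded size can then be looked up in a table of functional interval types (prime, second, third, etc.).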
Features related to tonality
We derive features to measure whether a certain scale is applied in a bass pattern. To this end, we take different binary scale templates into account: natural minor (which includes the major scale), harmonic minor, melodic minor, pentatonic minor (a subset of natural minor which also includes the pentatonic major scale), blues minor, whole tone, whole tone half tone, Arabian, minor gypsy, and Hungarian gypsy [21].
Each scale template consists of 12 values representing all
semitones of an octave. The value 1 is set for all semi-
tones that are part of the scale, the value 0 for those that
are not. All notes within a given pattern, which are related to a certain scale, are accumulated by adding their normalized note loudness values Θ_V / Θ_V,max, with Θ_V,max being the maximum note loudness in a pattern. The same is done for all notes which are not contained in the scale.
The ratio of both sums is calculated over all investigated
scales and over all 12 possible cyclic shifts of the scale
template. This cyclic shift is performed to cope with each
possible root note position. The maximum ratio value over
all shifts is determined for each scale template and used as
a feature value, which measures the presence of each considered scale. We obtain the relative frequencies p_i of all possible values in the vector that contains the interval directions (∆Θ_P^(D)) as well as in the vector that contains the functional interval types (∆Θ_P^(F)), and use them as features to characterize the variety of different pitch transitions between adjacent notes.
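The loudness-weighted scale test described above can be sketched as follows. This is a minimal sketch with hypothetical names; the handling of patterns without any out-of-scale note is an assumption, since the paper does not define the ratio for a zero denominator:

```python
# Binary semitone template for natural minor relative to the root note.
NATURAL_MINOR = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

def scale_feature(pitches, loudness, template):
    """Maximum in-scale vs. out-of-scale loudness ratio over all 12 roots."""
    v_max = max(loudness)
    best = 0.0
    for shift in range(12):  # cyclic shift emulates every root note position
        rotated = template[shift:] + template[:shift]
        inside = sum(v / v_max for p, v in zip(pitches, loudness)
                     if rotated[p % 12])
        outside = sum(v / v_max for p, v in zip(pitches, loudness)
                      if not rotated[p % 12])
        # assumption: if no note falls outside the scale, use the in-scale
        # sum itself (the zero-denominator case is unspecified in the text)
        ratio = inside / outside if outside > 0 else inside
        best = max(best, ratio)
    return best
```

The same routine would be run once per scale template, yielding one feature value per considered scale.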
Features related to rhythm
Syncopation embodies an important stylistic means in dif-
ferent music genres. It represents the accentuation on weak
beats of a measure instead of an accentuation on a neighboring strong beat that usually would be emphasized. To
detect syncopated note sequences within a bass-line, we
investigate different temporal grids in terms of equidistant partitionings of single measures. For instance, for an eighth-note grid, we map each note inside a measure to one of eight segments according to its onset position inside the measure. In a 4/4 time signature, these segments correspond to the 4 quarter notes (on-beats) and their off-beats in between. If at least one note is mapped to a segment, it is associated with the value 1, otherwise with 0. For each grid, we count the occurrences of the following segment sequences: (1001), (0110), (0001), or (0111). These sequences correspond to alternating on-beat and off-beat accentuations that are labeled as syncopations. The ratios between the number of syncopation sequences and the number of segments are applied as features for the rhythmical grids 4, 8, 16, and 32.
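A minimal sketch of this syncopation feature follows; the names are hypothetical, and the sliding alignment of the 4-segment windows within the grid is an assumption not stated in the text:

```python
# Segment sequences labeled as syncopations in the text above.
SYNCOPATION_PATTERNS = ["1001", "0110", "0001", "0111"]

def syncopation_ratio(onsets_in_measure, grid=8):
    """Ratio of syncopated 4-segment sequences to grid size for one measure."""
    segments = [0] * grid
    for onset in onsets_in_measure:           # onset in [0, 1) measure lengths
        segments[int(onset * grid) % grid] = 1
    binary = "".join(map(str, segments))
    # assumption: a sliding window over the grid detects the listed patterns
    count = sum(binary[i:i + 4] in SYNCOPATION_PATTERNS
                for i in range(grid - 3))
    return count / grid
```

Running this for grid sizes 4, 8, 16, and 32 would yield the four rhythmic feature values described above.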
We calculate the ratio between the duration Θ_D^[M](k) of the k-th note (measured in measure lengths) and the inter-onset interval between the k-th note and its succeeding note. Then we derive the mean and the variance of this value over all notes as features. A high or low mean value indicates whether notes are played legato or staccato. The variance over all ratios captures the variation between these two types of rhythmic articulation within a
given bass pattern. To measure whether notes are mostly played on on-beats or off-beats, we investigate the distribution of notes over the segments in the rhythmical grids as explained above for the syncopation feature. For example, segments 1, 3, 5, and 7 are associated with on-beat positions for an eighth-note grid and a 4/4 time signature. Again,
this ratio is calculated over all notes and mean and vari-
ance are taken as feature values. As additional rhythmic
properties, we derive the frequencies of occurrence of all
commonly used note lengths from half notes to 64th notes,
each in its normal, dotted, and triplet version. In addition,
the relative frequencies from all note-note, note-break and
break-note sequences over the complete pattern are taken
as features.
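The articulation statistics described above (duration over inter-onset interval, summarized by mean and variance) might be sketched as follows, with hypothetical names:

```python
# Sketch of the articulation feature: mean and variance of the
# duration-over-IOI ratio; values near 1 suggest legato, near 0 staccato.
from statistics import mean, pvariance

def articulation_stats(onsets, durations):
    """Ratio statistics over all notes that have a succeeding note."""
    ratios = [d / (b - a)                      # duration / inter-onset interval
              for a, b, d in zip(onsets, onsets[1:], durations)]
    return mean(ratios), pvariance(ratios)
```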
5.2 Classification based on statistical pattern recognition
We investigate the applicability of the well-established Sup-
port Vector Machines (SVM) using the Radial Basis Func-
tion (RBF) as kernel combined with a preceding feature
selection using the Inertia Ratio Maximization using Fea-
ture Space Projection (IRMFSP) as a baseline experiment.
The feature selection is applied to choose the most discrim-
inative features and thus to reduce the dimensionality of
the feature space prior to the classification. Therefore, we
calculate the high-level features introduced in Sec. 5.1 for each bass pattern, which results in a 136-dimensional feature space. Details on both the SVM and the IRMFSP can be found, for instance, in [1].
5.3 Classification based on pattern similarity
In this paper, we apply two different kinds of pattern similarity measures: pairwise similarity measures and similarity measures based on the Levenshtein distance. To compute similarity values between patterns, the values of the onset vector Θ_O^[M] and the absolute pitch vector Θ_P are simply converted into character strings. In the latter case, we initially subtract the minimum value of Θ_P for each pattern separately to remain independent of pitch transpositions. This approach can, of course, be affected by potential outliers that do not belong to the pattern.
5.3.1 Similarity measures based on the Levenshtein distance
The Levenshtein distance D_L offers a metric for the computation of the similarity of strings [6]. It measures the minimum number of edits in terms of insertions, deletions, and substitutions that are necessary to convert one string into the other. We use the Wagner-Fischer algorithm [20] to compute D_L and derive a similarity measure S_L between two strings of lengths l_1 and l_2 from

S_L = 1 − D_L / D_L,max . (1)

The lengths l_1 and l_2 correspond to the numbers of notes in both patterns; D_L,max equals the maximum of l_1 and l_2. In the experiments, we use the rhythmic similarity measure S_L,R and the tonal similarity measure S_L,T derived from the Levenshtein distance between the onsets Θ_O^[M] and the pitches Θ_P as explained in the previous section. Furthermore, we investigate

S_L,RT,Max = S_L,R if S_L,R ≥ S_L,T, else S_L,T (2)

S_L,RT,Mean = (S_L,R + S_L,T) / 2 (3)

by using the maximum and the arithmetic mean of S_L,R and S_L,T as aggregated similarity measures.
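The Wagner-Fischer computation of D_L and the normalization of Eq. 1 can be sketched as follows (a standard dynamic-programming implementation; function names are hypothetical):

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))             # distances for the empty prefix
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """S_L = 1 - D_L / D_L,max with D_L,max = max(len(a), len(b))."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

Applied to the onset strings this yields S_L,R, and applied to the transposed pitch strings it yields S_L,T.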
5.3.2 Pairwise similarity measures
In general, we derive a pairwise similarity measure S_P (Eq. 4) from the counts N_n,m and N_m,n. N_n,m denotes the number of notes in pattern n for which at least one note in pattern m exists that has the same absolute pitch value (for the similarity measure S_P,T) or onset value (for the similarity measure S_P,R). N_m,n is defined vice versa. By applying the constraint that both onset and absolute pitch need to be equal in Eq. 4, we obtain the measure S_P,RT. Furthermore, we derive the aggregated similarity measures S_P,RT,Max and S_P,RT,Mean analogously to Eq. 2 and Eq. 3.
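Since Eq. 4 itself did not survive extraction, the sketch below assumes the symmetric normalization (N_n,m + N_m,n) / (N_n + N_m), which matches the surrounding description but is an assumption; the helper names are hypothetical:

```python
def pairwise_similarity(notes_n, notes_m, key):
    """notes_* are lists of (onset, pitch) tuples; key selects the compared value.

    Assumed normalization: (N_nm + N_mn) / (N_n + N_m); Eq. 4 is not
    legible in the source, so this is an illustrative choice only.
    """
    values_m = {key(note) for note in notes_m}
    values_n = {key(note) for note in notes_n}
    n_nm = sum(key(note) in values_m for note in notes_n)  # matches of n in m
    n_mn = sum(key(note) in values_n for note in notes_m)  # matches of m in n
    return (n_nm + n_mn) / (len(notes_n) + len(notes_m))

# S_P,T compares pitches, S_P,R onsets, S_P,RT requires both to match:
s_pt = lambda n, m: pairwise_similarity(n, m, key=lambda note: note[1])
s_pr = lambda n, m: pairwise_similarity(n, m, key=lambda note: note[0])
s_prt = lambda n, m: pairwise_similarity(n, m, key=lambda note: note)
```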
6.1 Data-set
We assembled a novel dataset from instructional bass lit-
erature [12, 21], which consists of bass patterns from the
8 genres Swing (SWI), Funk (FUN), Blues (BLU), Reggae
(REG), Salsa & Mambo (SAL), Rock (ROC), Soul & Mo-
town (SOU) and Africa (AFR), a rather general term which
here signifies Sub-Saharan Popular Music Styles [16]. For
each genre, 40 bass lines of 4 measures length have been stored as symbolic MIDI files. Initial listening tests revealed that in this data set, which was assembled and categorized by professional bass players, a certain amount of stylistic overlap between genres occurs, for instance between Blues and Swing or between Soul & Motown and Funk. The overlap is partly inherent to the approach of the data sets, which treat all examples of a style (e.g., Rock) as homogeneous although the sets include typical patterns of several decades. In some features, early
Rock patterns might resemble early Blues patterns more
than they resemble late patterns of their own style [22].
Thus, the data set will be extended further and revised by
educated musicologists for future experiments.
6.2 Experiments & Results
6.2.1 Experiment 1 - Feature-based classification
As described in Sec. 5.2, we performed a baseline experi-
ment that consists of the IRMFSP for choosing the best N = 80 features and the SVM as classifier. The parameter N has
Rows denote the correct bass playing style (first row: AFR), columns the classified style (all values in %):

66.2  5.9  2.0  8.8 10.8  0.0  6.4  0.0
 0.0 46.1  0.0 22.4  0.0 11.8  3.9 15.7
 7.4  4.2 72.8  1.4 10.6  3.6  0.0  0.0
 2.0  2.9  6.9 51.6  4.6 21.8 10.3  0.0
21.0  0.0  4.2 10.6 49.4  8.3  6.5  0.0
 2.6  0.0  0.0 10.7  0.0 70.4 16.2  0.0
25.0  0.0  1.2  5.6  6.7 14.0 47.5  0.0
 0.0 17.6  0.0  0.0  0.0  0.0  0.0 82.4
Figure 1. Exp. 1 - Confusion matrix for the feature-based
pattern-wise classification (all values given in %). Mean
classification accuracy is 60.8% with a standard deviation
of 2.4%.
been determined to perform best in previous tests on the
data-set. A 20-fold cross validation was applied to de-
termine the mean and standard deviation of the classifi-
cation accuracy. For a feature extraction and classification
based on complete patterns, we achieved 60.8% of accu-
racy with a standard deviation of 2.4%. The correspond-
ing confusion matrix is shown in Fig. 1. It can be seen that the best classification results were achieved for the styles
Funk, Rock, and Swing. Strong confusions can be identified between Blues and Motown as well as Swing, between Motown and Rock, between Reggae and Africa, and between Salsa and Africa. These confusions support the musicological
assessment of the data-set given in Sec. 6.1. In addition,
they coincide with historical relations between the styles in
Africa, the Caribbean, and Latin America, as well as rela-
tions within North America as it is common musicological
knowledge [8].
As a second classification strategy, we performed the
feature extraction and classification based on sub-patterns.
To this end, we divided each pattern within the test set into N = 4 sub-patterns of one measure length. It was ensured that no sub-patterns of patterns in the test set were used as training data. After all sub-patterns were classified, the estimated playing style for the corresponding test set pattern was derived from a majority decision over all sub-pattern classifications. In case of multiple winning classes, a random decision was made between the winning classes. For the accumulated measure-wise classification, we achieved only 56.4% of accuracy. Thus, this approach did not improve the classification accuracy. We assume that the majority of the applied high-level features, which are based on different statistical descriptors (see Sec. 5.1 for details), cannot provide an appropriate characterization of the sub-patterns, which themselves only consist of 6 to 9 notes on average.
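The majority decision with a random tie-break described above might be sketched as follows (hypothetical names):

```python
# Majority vote over sub-pattern class estimates; ties between winning
# classes are resolved by a random decision, as described in the text.
import random
from collections import Counter

def majority_vote(sub_pattern_labels, rng=random):
    counts = Counter(sub_pattern_labels)
    top = max(counts.values())
    winners = [label for label, c in counts.items() if c == top]
    return rng.choice(winners)
```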
6.2.2 Experiment 2 - Pattern Similarity
This experiment is based on a leave-one-out cross-
validation scheme and thus consists of N = 320 evaluation steps according to the 320 patterns in the data-set.
Within each evaluation step, the current pattern P_k is used as test data while all remaining patterns P_l with l ≠ k are used as training data. We derive the class estimate ĉ_k of
Rows denote the correct bass playing style (first row: AFR), columns the classified style (all values in %):

57.4  2.1  6.4 17.0  6.4  0.0  8.5  2.1
 4.2 50.0  4.2 18.8  2.1  6.3  4.2 10.4
 4.4  6.7 62.2 11.1  2.2  6.7  4.4  2.2
 0.0  0.0  0.0 95.1  0.0  0.0  2.4  2.4
 4.7  0.0  7.0 11.6 65.1  7.0  2.3  2.3
 0.0  4.7  0.0 14.0  0.0 69.8  2.3  9.3
 6.8  4.5  4.5  6.8  4.5  0.0 68.2  4.5
 0.0 12.5  0.0  7.5  0.0  0.0  0.0 80.0
Figure 2. Exp. 2 - Confusion matrix for the best similarity-based configuration (measure-wise classification using the S_P,RT,Max similarity measure; all values given in %).
Mean classification accuracy is 68.5% with a standard de-
viation of 3.1%.
P_k from the class label ĉ of the best-fitting pattern P_l̂, where

l̂ = arg max_l S_k,l (5)

with S_k,l representing the similarity measure between P_k and P_l in the given case. As in Sec. 6.2.1, if multiple
patterns have the same (highest) similarity, we perform
a random decision among these candidates. This experi-
ment is performed for all similarity measures introduced
in Sec. 5.3.
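The leave-one-out nearest-neighbor rule of Eq. 5, including the random tie-break, can be sketched as follows (hypothetical names; the similarity measure is passed in as a generic callback):

```python
# Each test pattern takes the class label of its most similar training
# pattern; ties for the highest similarity are resolved at random.
import random

def classify(k, patterns, labels, similarity, rng=random):
    scores = {l: similarity(patterns[k], patterns[l])
              for l in range(len(patterns)) if l != k}   # leave pattern k out
    best = max(scores.values())
    candidates = [l for l, s in scores.items() if s == best]
    return labels[rng.choice(candidates)]
```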
Exp. 2a: Pattern-wise classification. The basic approach
for a pattern-based classification is to use each pattern of 4
measures length as one item to be classified.
Exp. 2b: Accumulated measure-wise classification. Bass patterns are often structured in such a way that the measure, or the part of the measure, which precedes the pattern repetition is altered rhythmically or tonally and thus often varies greatly from the rest of the pattern. These figures separating or introducing a pattern repetition are commonly referred to as pickups or upbeats, meaning that they do not vary or overlap the following pattern repetition, which starts on the first beat of the new measure. A pattern-wise classification as described above thus might overemphasize the differences between the last measures because the patterns are compared over their complete length. Hence, we investigate
another decision aggregation strategy in this experiment.
As described in Sec. 6.2.1, we divide each bass pattern
into sub-patterns of one measure length each. Within each
fold k, we classify each sub-pattern SP_k,l of the current test pattern P_k separately. At the same time, we ensure that only sub-patterns of the other patterns P_i with i ≠ k are used as training set for the current fold. To accumulate the classification results in each fold, we add all similarity values S_k,l between each sub-pattern SP_k,l and its assigned winning pattern(s) P_k,l,win. The summation is done for each of the 8 genres separately. The genre that achieves the highest sum is considered the winning genre.
As depicted in Fig. 3, the proposed accumulated
measure-wise classification strategy led to higher classifi-
cation accuracy values (blue bars) in comparison to a
pattern-wise classification (red bars). This approach can
be generalized and adopted to patterns of arbitrary length.
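The genre-wise similarity summation described above might be sketched as follows (hypothetical names; each sub-pattern contributes the similarity value of its winning pattern to that pattern's genre):

```python
# Accumulated measure-wise decision: sum the winning similarity values
# per genre and pick the genre with the highest total.
from collections import defaultdict

def accumulate(sub_pattern_results):
    """sub_pattern_results: list of (winning_genre, similarity) pairs."""
    totals = defaultdict(float)
    for genre, sim in sub_pattern_results:
        totals[genre] += sim
    return max(totals, key=totals.get)
```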
Figure 3. Mean classification accuracy results for experiment 2 (accumulated measure-wise vs. pattern-wise classification; y-axis: mean accuracy).
Figure 4. Exp. 3 - Mean classification accuracy vs. percentage ε of pattern variation (dotted line: pattern-wise similarity, solid line: accumulated measure-wise similarity).
The similarity measure S_P,RT,Max clearly outperforms the other similarity measures by over 10 percentage points of accuracy. The corresponding confusion matrix is shown in
Fig. 2. We therefore assume that it is beneficial to use sim-
ilarity information both based on pitch and onset similarity
of bass patterns. For the pattern-wise classification, it can
be seen that similarity measures based on tonal similar-
ity generally achieve lower accuracy results in comparison
to measures based on the rhythmic similarity. This might
be explained by the frequently occurring tonal variation of
patterns according to the given harmonic context such as a
certain chord of a changed key in different parts of a song.
The most remarkable result in the confusion matrix is the very high accuracy of 95.1% for the Motown genre.
6.2.3 Experiment 3 - Influence of pattern variations
For the extraction of bass-patterns from audio recordings,
two potential sources of error exist. In most music gen-
res, the dominant bass patterns are subject to small variations throughout a music piece. An automatic system
might recognize the basic pattern or a variation of the basic
pattern. Furthermore, automatic music transcription sys-
tems are prone to errors in terms of incorrect pitch, onset,
and duration values of the notes. Both phenomena directly
have a negative effect on the computed high-level features.
We therefore investigate the achievable classification accu-
racy dependent on the percentage of notes with erroneous
note parameters.
We simulate the mentioned scenarios by manipulating a random selection of ε percent of all notes from each unknown pattern and vary ε from 0% to 50%. The manipulation of a single note consists of either a modification of the onset Θ_O^[M] by a randomly chosen difference −0.25 ≤ ∆Θ_O^[M] ≤ 0.25 (which corresponds to a maximum shift distance of one beat for a 4/4 time signature), a modification of the absolute pitch Θ_P by a randomly chosen difference −2 ≤ ∆Θ_P ≤ 2 (which corresponds to a maximum distance of 2 semitones), or a simple deletion of
the current note from the pattern. Octave pitch errors that
often appear in automatic transcription algorithms were not
considered because of the mapping of each interval to a
maximum size of one octave as described in Sec. 5.1. Insertions in terms of additional notes that are not part of the pattern will be taken into account in future experiments.
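The note manipulation of experiment 3 might be simulated as follows. The note representation and function names are hypothetical, and the uniform choice among the three error types is an assumption the paper does not specify:

```python
# Error simulation: epsilon percent of the notes receive a random onset
# shift (up to one beat in 4/4), a pitch shift (up to 2 semitones), or
# are deleted from the pattern.
import random

def perturb(notes, epsilon, rng=random):
    """notes: list of dicts with 'onset' (measures) and 'pitch' (semitones)."""
    notes = [dict(n) for n in notes]               # work on a copy
    n_errors = round(len(notes) * epsilon / 100.0)
    for idx in rng.sample(range(len(notes)), n_errors):
        kind = rng.choice(["onset", "pitch", "delete"])  # assumed uniform
        if kind == "onset":
            notes[idx]["onset"] += rng.uniform(-0.25, 0.25)
        elif kind == "pitch":
            notes[idx]["pitch"] += rng.randint(-2, 2)
        else:
            notes[idx] = None                       # deletion
    return [n for n in notes if n is not None]
```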
As depicted in Fig. 4, the accuracy curves of the three different pairwise similarity measures S_P,R, S_P,T, and S_P,RT,Max fall to about 40% for a transcription error rate of 50%. Interestingly, the pattern-wise classification based on S_P,R seems to be more robust to transcription errors above 15% in comparison to the accumulated measure-wise classification, even though it has a lower accuracy rate under the assumption of a perfect transcription.
6.2.4 Comparison to the related work
A direct comparison of the achieved results to the related work is not feasible. On the one hand, different data sets have been utilized. Tsunoo et al. [18] reported an accuracy of 44.8% for the GTZAN data set¹ while using only bass-line features. On the other hand, the performance of bass-line features alone was not always stated. The work of Tsuchihashi et al. [17] showed an improvement of classification accuracy from 53.6% to 62.7% when applying bass-line features complementary to other timbral and rhythmical features, but the results of genre classification with only bass features were not reported.
7. Conclusions

In this paper, different approaches for the automatic detection of playing styles from score parameters were compared. These parameters can be extracted from symbolic audio data (e.g., MIDI) or from real audio data by means of automatic transcription. For the feature-based approach, a best result of 60.8% of accuracy was achieved using a combination of feature selection (IRMFSP) and classifier (SVM) and a pattern-wise classification. Regarding the classification based on pattern similarity, we achieved 68.5% of accuracy using the combined similarity measure S_P,RT,Max and a measure-wise aggregation strategy based on the classification of sub-patterns. The random baseline is 12.5%. This approach outperformed the common approach of classifying the complete pattern at once.
For analyzing real-world audio recordings, further mu-
sical aspects such as micro-timing, tempo range, applied
plucking & expression styles [2], as well as the interac-
¹ G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293-302, 2002.
tion with other participating instruments need to be incorporated into an all-embracing style description of a specific instrument in a music recording. The results of experiment 3 emphasize the need for a well-performing transcription
system for a high-level classification task such as playing
style detection.
Acknowledgments

This work has been partly supported by the German research project GlobalMusic2One, funded by the Federal Ministry of Education and Research (BMBF-FKZ: 01/S08039B). Additionally, the Thuringian Ministry of Economy, Employment and Technology supported this research by granting funds of the European Fund for Regional Development to the project Songs2See, enabling transnational cooperation between Thuringian companies and their partners from other European regions.
References

[1] J. Abeßer, H. Lukashevich, C. Dittmar, and
G. Schuller. Genre classification using bass-related
high-level features and playing styles. In Proc. of the
Int. Society of Music Information Retrieval (ISMIR
Conference), Kobe, Japan, 2009.
[2] J. Abeßer, H. Lukashevich, and G. Schuller. Feature-
based extraction of plucking and expression styles of
the electric bass guitar. In Proc. of the IEEE Int.
Conf. on Acoustics, Speech, and Signal Processing
(ICASSP), 2010.
[3] P. J. Ponce de León and J. M. Iñesta. Pattern recogni-
tion approach for music style identification using shal-
low statistical descriptors. IEEE Transactions on Sys-
tem, Man and Cybernetics - Part C : Applications and
Reviews, 37(2):248-257, March 2007.
[4] C. Dittmar, K. Dressler, and K. Rosenbauer. A tool-
box for automatic transcription of polyphonic music.
In Proc. of the Audio Mostly, 2007.
[5] Tuomas Eerola and Petri Toiviainen. MIDI Toolbox:
MATLAB Tools for Music Research. University of
Jyväskylä, Jyväskylä, Finland, 2004.
[6] D. Gusfield. Algorithms on strings, trees, and se-
quences: computer science and computational biol-
ogy. Cambridge University Press, Cambridge, UK, 1997.
[7] A. Hazan, M. Grachten, and R. Ramirez. Evolving per-
formance models by performance similarity: Beyond
note-to-note transformations. In Proc. of the Int. Symposium, 2006.
[8] Ellen Koskoff, editor. The Garland Encyclopedia of
World Music - The United States and Canada. Garland
Publishing, New York, 2001.
[9] Gerhard Kubik. Zum Verstehen afrikanischer Musik.
Lit Verlag, Wien, 2004.
[10] C. McKay and I. Fujinaga. Automatic genre classifi-
cation using large high-level musical feature sets. In
Proc. of the Int. Symposium of Music Information Re-
trieval (ISMIR), 2004.
[11] C. McKay and I. Fujinaga. jSymbolic: A feature ex-
tractor for MIDI files. In Int. Computer Music Confer-
ence (ICMC), pages 302–305, 2006.
[12] H.-J. Reznicek. I’m Walking - Jazz Bass. AMA, 2001.
[13] M. P. Ryynänen and A. P. Klapuri. Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32:72-86, 2008.
[14] C. S. Sapp. Hybrid numeric/rank similarity metrics for
musical performance analysis. In Proc. of the Int. Sym-
posium on Music Information Retrieval (ISMIR), pages
501–506, 2008.
[15] E. Stamatatos and G. Widmer. Automatic identification
of music performers with learning ensembles. Artificial
Intelligence, 165:37–56, 2005.
[16] Ruth M. Stone, editor. The Garland Encyclopedia of
World Music - Africa, volume 1. Garland Publishing,
New York, 1998.
[17] Y. Tsuchihashi, T. Kitahara, and H. Katayose. Using
bass-line features for content-based MIR. In Proc. of
the Int. Conference on Music Information Retrieval
(ISMIR), Philadelphia, USA, pages 620–625, 2008.
[18] E. Tsunoo, N. Ono, and S. Sagayama. Musical bass-line clustering and its application to audio genre classification. In Proc. of the Int. Society for Music Information Retrieval Conference (ISMIR), Kobe, Japan, 2009.
[19] George Tzanetakis, Ajay Kapur, W. Andrew Schloss, and Matthew Wright. Computational ethnomusicology. Journal of Interdisciplinary Music Studies, 1(2):1–24, 2007.
[20] Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. Journal of the ACM (JACM), 21(1):168–173, 1974.
[21] Paul Westwood. Bass Bible. AMA, 1997.
[22] Peter Wicke. Handbuch der populären Musik: Geschichte, Stile, Praxis, Industrie. Schott, Mainz.
[23] G. Widmer, S. Dixon, W. Goebl, E. Pampalk, and A. Tobudic. In search of the Horowitz factor. AI Magazine, 24:111–130, 2003.
[24] G. Widmer and W. Goebl. Computational models of expressive music performance: The state of the art. Journal of New Music Research, 33(3):203–216, 2004.
11th International Society for Music Information Retrieval Conference (ISMIR 2010)