Logic-based Modelling of Musical Harmony for Automatic Characterisation and Classification


Abstract

Harmony is the aspect of music concerned with the structure, progression, and relation of chords. In Western tonal music each period had different rules and practices of harmony. Similarly some composers and musicians are recognised for their characteristic harmonic patterns which differ from the chord sequences used by other musicians of the same period or genre. This thesis is concerned with the automatic induction of the harmony rules and patterns underlying a genre, a composer, or more generally a ‘style’. Many of the existing approaches for music classification or pattern extraction make use of statistical methods which present several limitations. Typically they are black boxes, cannot be fed with background knowledge, do not take into account the intricate temporal dimension of the musical data, and ignore rare but informative events. To overcome these limitations we adopt first-order logic representations of chord sequences and Inductive Logic Programming techniques to infer models of style. We introduce a fixed-length representation of chord sequences similar to n-grams but based on first-order logic, and use it to characterise symbolic corpora of pop and jazz music. We extend our knowledge representation scheme using context-free definite-clause grammars, which support chord sequences of any length and allow ornamental chords to be skipped, and test it on genre classification problems, on both symbolic and audio data. Through these experiments we also compare various chord and harmony characteristics such as degree, root note, intervals between root notes, and chord labels, and assess their characterisation and classification accuracy, expressiveness, and computational cost. Moreover we extend a state-of-the-art genre classifier based on low-level audio features with such harmony-based models and prove that this can lead to statistically significant classification improvements.
We show our logic-based modelling approach can not only compete with and improve on statistical approaches but also provides expressive, transparent and musicologically meaningful models of harmony which makes it suitable for knowledge discovery purposes.
Logic-based Modelling of Musical Harmony
for Automatic Characterisation and Classification
Amélie Anglade
Thesis submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
of the
University of London
School of Electronic Engineering and Computer Science
Queen Mary, University of London
January 2014
I, Amélie Anglade, confirm that the research included within this thesis is my own work or
that where it has been carried out in collaboration with, or supported by others, that this is
duly acknowledged below and my contribution indicated. Previously published material is
also acknowledged below.
I attest that I have exercised reasonable care to ensure that the work is original, and does not
to the best of my knowledge break any UK law, infringe any third party’s copyright or other
Intellectual Property Right, or contain any confidential material.
I accept that the College has the right to use plagiarism detection software to check the
electronic version of the thesis.
I confirm that this thesis has not been previously submitted for the award of a degree by this
or any other university.
The copyright of this thesis rests with the author and no quotation from it or information
derived from it may be published without the prior written consent of the author.
Date: January 29, 2014
Details of collaboration and publications:
All collaborations and earlier publications that have influenced the work and writing of this
thesis are fully detailed in Section 1.4.
To all women engineers,
in particular to those who inspired me,
but most importantly to all those to come.
As I am writing this acknowledgments section I realise that throughout the years of my PhD
research and writing-up I have met and have been influenced by a great number of people
without whom my research and life would have been dramatically different.
Thanks to my supervisor, Simon Dixon, for his guidance throughout the PhD, but most
importantly for remaining calm and never losing faith in me. Thanks to Rafael Ramirez for
sharing with me his expertise on Inductive Logic Programming applied to music, for clearing
up the questions I had on this topic and for collaborating with me on what was then turned
into two publications and forms the central part of my thesis.
Thanks to the Pattern Recognition and Artificial Intelligence Group of the University of
Alicante for sharing their invaluable dataset, and to the Declarative Languages and Artificial
Intelligence Group of the KU Leuven for swiftly and kindly answering all my queries about
ACE-ILP, often with in-depth analyses of my application specific problems.
Thanks to Emmanouil Benetos, Matthias Mauch, Mathieu Barthet, Gyorgy Fazekas,
Sefki Kolozali, Robert Macrae, Thierry Bertin-Mahieux, Jason Hockman, Mohamed Sordo,
Òscar Celma, Paul Lamere, Ben Fields, Brian McFee, Claudio Baccigalupo and Norman
Casagrande for helping me escape the lonely experience PhD research can sometimes be with
our fulfilling and close collaborations, be they experiments, publications, development of the
Hotttabs app or organisation of the f(MIR) and WOMRAD workshops.
Thanks to those whose feedback greatly influenced my work: Lesley Mearns for the
insightful musicological discussions, and Claudio Baccigalupo for his handsome LaTeX
template and for the interesting conversations on music recommendations we had together.
Thanks to Mathieu Barthet, Matthew Davies, Sheila Stark and Johanna Brewer for their
indispensable guidance while I was writing-up.
Thanks to my OMRAS2 collaborators with whom I enjoyed sharing ideas and projects:
Ben Fields, Yves Raimond, Kurt Jacobson, Matthias Mauch, Robert Macrae, Chris Cannam,
Christopher Sutton, Michael Casey, Tim Crawford, Dan Tidhar, Thomas Wilmering,
Mathieu Barthet, Gyorgy Fazekas, Xue Wen, Mark Levy, Panos Kudumakis, Samer Abdallah,
Christophe Rhodes, Mark D’Inverno, Michaela Magas, Polina Proutskova.
Thanks to Mark Sandler and Mark Plumbley for welcoming me in the Centre for Digital
Music and providing all the resources for me to thrive as a researcher. Thanks also to
those co-workers I have not mentioned yet who made my time at QMUL an enjoyable one:
Louis Matignon, Katy Noland, Andrew Robertson, Andrew Nesbit, Enrique Perez Gonzalez,
Dan Stowell, Asteris Zacharakis, Magdalena Chudy, Martin Morrell, Michael Terrell, Alice
Clifford, Josh Reiss, Anssi Klapuri, Daniele Barchiesi, Chris Harte, Aris Gretsistas, Maria
Jafari, Adam Stark, Tim Murray Browne, Jean-Batiste Thiebaut, Nela Brown, Steve Welburn,
Steven Hargreaves, Holger Kirchhoff, Peter Foster, Luis Figueira.
I would like to express my sincere gratitude to the established Music Information Retrieval
researchers that made me feel part of the community, inspired me, encouraged me, mentored
me, and even though I was only a PhD student, treated me as an equal. In no particular order:
Òscar Celma, Paul Lamere, Norman Casagrande, Douglas Eck, Jean-Julien Aucouturier,
George Tzanetakis, Frans Wiering, Emilia Gómez, Fabien Guyon, Xavier Serra, Masataka
Goto, Juan Pablo Bello, Xavier Amatriain, Fabio Vignoli, Steffen Pauws.
Thanks to my amazing past and current managers and colleagues at SoundCloud and frestyl
who over the past months gave me the space to complete my writing-up with flexible working
hours and assignments.
Thanks to my family and my friends for their invaluable support. Thanks especially to my
mother who so many times took on some of my personal tasks to allow me to focus on my
PhD. Thanks to my father, brother, grand-parents and family in-law for constantly checking
on me and for tolerating my absences as I was writing-up. Thanks to Becky for being such an
inspiring and supportive colleague and friend and for her and Ben’s kind hospitality. Thank you
also for being, like me, an active woman in Science, Engineering and Technology with whom
I could talk about gender questions and simply share girly moments during my PhD.
For the same reason I would like to thank Nela, Magdalena, Claire and Rita as well.
Thanks to all those fellow volunteers I was so lucky to collaborate with over the past years,
for their inspiringly untiring dedication and for proving to me that engineers can contribute to
making the world a better place.
My endless gratitude goes to my friend Benjamin whose unreserved coaching over the
past year made the difference between a plan and its execution. Above all, I would like to
thank my husband, Benoît, for his writing guidance and unconditional emotional support.
This work was supported by the EPSRC grants EP/E017614/1 and EP/E045235/1.
Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.1.1 Music Characterisation
1.1.2 The Expressive Power of Logic-based Modelling
1.1.3 The Characterisation Power of Harmony
1.2 Research Goal
1.3 Contributions
1.4 Related Publications by the Author
1.5 Thesis Outline
2 Background and Related Work in Music Information Retrieval
2.1 Introduction
2.2 Musical Concept Definitions
2.2.1 Harmony-Related Terms
2.2.2 Chord Notations
2.3 Related Work in Music Information Retrieval
2.3.1 Characterisation
2.3.2 Automatic Genre Classification
2.4 Conclusions
3 Background and Related Work in Inductive Logic Programming
3.1 Introduction
3.2 A Definition of ILP
3.2.1 Inductive Concept Learning
3.2.2 Inductive Concept Learning with Background Knowledge
3.2.3 Relational Learning
3.2.4 A Simple Inductive Logic Programming Problem
3.3 ILP Techniques and Frameworks
3.4 Relational Data Mining
3.5 Applications of ILP
3.5.1 A Tool Used in Many Disciplines
3.5.2 Musical Applications of ILP
3.6 Conclusions
4 Automatic Characterisation of the Harmony of Song Sets
4.1 Introduction
4.2 Methodology
4.2.1 Harmonic Content Description
4.2.2 Rule Induction with ILP
4.3 Experiments and Results
4.3.1 Independent Characterisation of The Beatles and Real Book Chord Sequences
4.3.2 Characterisation of The Beatles vs. Real Book Songs
4.3.3 Considerations About the Size of the Corpora and the Computation Time
4.4 Discussion and Conclusions
5 Automatic Genre Classification
5.1 Introduction
5.2 Learning Harmony Rules
5.2.1 Representing Harmony with Context-Free Definite-Clause Grammars
5.2.2 Learning Algorithm
5.2.3 Dataset
5.2.4 Chord Transcription Algorithm
5.2.5 Data Post-Processing
5.3 Experiments and Results
5.3.1 Statistical Considerations
5.3.2 Choosing the Knowledge Representation
5.3.3 From Symbolic Data to Automatic Chord Transcriptions
5.3.4 Towards Ensemble Methods
5.3.5 Examples of Rules
5.4 Conclusions
6 Improving on State-of-the-art Genre Classification
6.1 Introduction
6.2 Combining Audio and Harmony-based Classifiers
6.2.1 Feature Extraction
6.2.2 Feature Selection
6.2.3 Classification System
6.3 Datasets
6.4 Experiments
6.4.1 Training Results
6.4.2 Testing on Real Audio Data
6.4.3 Discussion
6.5 Conclusions
7 Conclusions
7.1 Summary
7.2 Conclusions and Answers to Research Questions
7.3 Directions for Future Research
7.3.1 Further Developments of our Approach
7.3.2 Potential Applications of our Work
Bibliography
List of Figures
1.1 Logic-based framework for musical rule induction.
2.1 Types of triads (and neutral chord) used in this work.
2.2 Types of seventh and major sixth chords used in this work.
2.3 Example of the major mode: C Major scale, with pitch labels and degrees.
2.4 Example of the minor mode: A (natural) minor scale, with pitch labels and degrees.
2.5 A short extract of music in C major with different harmony notations
2.6 Model of a chord in the Chord Ontology
3.1 Michalski’s train problem
5.1 A piece of music (i.e. list of chords) assumed to be in C major, and its Definite Clause Grammar (difference-list Prolog clausal) representation.
5.2 Schematic example illustrating the induction of a first-order logic tree for a 3-genre classification problem
6.1 Block diagram of the genre classifier
6.2 Classification accuracy for the GTZAN dataset using various feature subsets.
6.3 Classification accuracy for the ISMIR04 dataset using various feature subsets.
List of Tables
2.1 List of the most important harmonic intervals
2.2 Diatonic scale degrees.
2.3 Shorthand notations of main chord categories as used in this work.
4.1 Beatles harmony rules whose coverage is larger than 1%
4.2 Real Book harmony rules whose coverage is larger than 1%
4.3 Beatles root interval and chord category rules (whose coverage is larger than 1%) and the associated degree and chord category rules
4.4 Top ten Real Book harmony rules when considering root interval progressions and chord category progressions
4.5 Top ten Beatles harmony rules when the Real Book is taken as the source of negative examples
4.6 Top ten Beatles root interval and chord category rules when the Real Book is taken as the source of negative examples
5.1 Background knowledge used in the first-order logic decision tree induction algorithm.
5.2 Classification results on symbolic data using 10-fold cross-validation
5.3 Confusion matrices (test results aggregated over all folds) for all main-genre classification problems using the Degree & Category representation on the symbolic dataset.
5.4 Confusion matrices (test results aggregated over all folds) for 9-subgenre and popular subgenres classification problems using the Degree & Category representation on the symbolic dataset.
5.5 Classification results of models trained on symbolic data when tested on audio data
5.6 Classification results of models trained and tested on audio data using 10-fold cross-validation.
5.7 Classification results of Random Forests models trained and applied on symbolic data, trained on symbolic data and applied audio data, and trained on audio data and applied on audio data.
5.8 Comparison of classification accuracies for Single Tree and Random Forest models with Degree & Category representation
5.9 Comparison of classification accuracies for our Random Forest (RF) models and Pérez-Sancho et al.’s n-gram models
6.1 Extracted Features.
6.2 The subset of 10 selected features.
6.3 Confusion matrix (cumulative results over the 5 folds of the cross-validation) for the harmony-based classifier applied on the classical-jazz/blues-pop restricted and re-organised version of the Perez-9-genres Corpus (symbolic dataset).
6.4 Confusion matrix (cumulative results over the 5 folds of the cross-validation) for the harmony-based classifier applied on the classical-jazz/blues-pop restricted and re-organised version of the Perez-9-genres Corpus (synthesised audio dataset).
6.5 Confusion matrix for the harmony-based classifier trained on symbolic data and applied on the GTZAN dataset.
6.6 Confusion matrix for the harmony-based classifier trained on synthesised audio data and applied on the GTZAN dataset.
6.7 Confusion matrix for the harmony-based classifier trained on symbolic data and applied on the ISMIR04 dataset.
6.8 Confusion matrix for the harmony-based classifier trained on synthesised audio data and applied on the ISMIR04 dataset.
6.9 Best mean accuracy achieved by the various classifiers for the GTZAN and ISMIR04 datasets using 5x5-fold cross-validation.
6.10 Mean accuracy achieved by the various classifiers for the GTZAN and ISMIR04 datasets, using the MARSYAS feature set and 5x5-fold cross-validation.
6.11 Confusion matrix for one 5-fold cross validation run of the SVM-H classifier applied on the GTZAN dataset using the 60 selected features set.
6.12 Confusion matrix for one 5-fold cross validation run of the SVM+H classifier applied on the GTZAN dataset using the 50 selected features set.
6.13 Confusion matrix for one 5-fold cross validation run of the SVM-H classifier applied on the ISMIR04 dataset using the 70 selected features set.
6.14 Confusion matrix for one 5-fold cross validation run of the SVM+H classifier applied on the ISMIR04 dataset using the 80 selected features set.
18 | Introduction
Music, like other online media, is undergoing an information explosion. Massive online
music stores such as the iTunes Store¹ or Amazon MP3², and their counterparts, the streaming
platforms such as Spotify³, Rdio⁴ and Deezer⁵, offer more than 30 million⁶ pieces of music to
their customers, that is to say anybody with a smartphone. Indeed these ubiquitous devices
offer vast storage capacities and cloud-based apps that can cater to any music request. As Paul
Lamere puts it⁷:
we can now have a virtually endless supply of music in our pocket. The ‘bottomless iPod’
will have as big an effect on how we listen to music as the original iPod had back in 2001.
But with millions of songs to chose from, we will need help finding music that we want to
hear [...]. We will need new tools that help us manage our listening experience.
Retrieval, organisation, recommendation, annotation and characterisation of musical data is
precisely what the Music Information Retrieval (MIR) community has been working on for
at least 15 years (Byrd and Crawford, 2002). It is clear from its historical roots in practical
fields such as Information Retrieval, Information Systems, Digital Resources and Digital
Libraries but also from the publications presented at the first International Symposium on Music
Information Retrieval in 2000 that MIR has been aiming to build tools to help people to navigate,
explore and make sense of music collections (Downie et al., 2009). That also includes analytical
tools to support for instance computational musicology, in which the user is then an expert, a
² Download/b?node=163856011
⁶ new-mac-app-discovery-features-hits-5m-subscribers-12m-monthly-active-users
⁷ In “Is that a million songs in your pocket, or are you just glad to see me?”, posted on September 2, 2010 in Music Machinery: a-million-songs-in-your-pocket-or-are-you-just-glad-to-see-me/
1.1.1 Music Characterisation
Although the MIR field remains largely application-oriented, the end user seems to have
been neglected (Schedl et al., 2013). We believe this is because MIR approaches often
focus too much on the result, treating the prediction as the end goal. Retrieval methods
are evaluated on their precision and recall. Classification techniques are judged on their
accuracy. Recommendations are gauged on their mean absolute error. All are assessed
on their predictive capacities, while the descriptive power of the models they employ
goes untapped. Although many of them are based on acoustical or musicological
concepts, the signal-based and statistical tools generally adopted to represent and model
those concepts often obscure the underlying musical phenomena, which then remain
invisible to the end user. One example, which we describe further in Chapter 2,
is the Bag-of-Features, the most popular approach to genre classification. The underlying
properties of timbre, pitch or rhythm it uses are reduced to low-level representations through
signal-based descriptors, and lose their temporal dimension as they are processed by statistical
models. Such unexpressive black boxes result in opaque predictions. This goes against what
the Expert Systems (Swartout et al., 1991) and later the Recommender Systems (Herlocker
et al., 2000; Sinha and Swearingen, 2002) communities suggested, which is to provide users
with explanations for the predictions. As Tintarev and Masthoff (2007) summarise in their
survey of explanations in recommender systems:
“among other things, good explanations [for recommendations] could [and do (as shown
in the survey)] help inspire user trust and loyalty, increase satisfaction, make it quicker
and easier for users to find what they want, and persuade them”.
Accordingly, we focus in this work on adding description to prediction, by concentrating
on music characterisation, which is an intrinsically transparent MIR task. What we try to
characterise in the music is what we call style, the underlying common traits of a genre, a
composer, a musical period, or even a user’s preference.
1.1.2 The Expressive Power of Logic-based Modelling
Automatic characterisation of music requires machine learning modelling techniques to
support the discovery and extraction of patterns from data, that is to say inductive learning.
Since we are pursuing transparency we need tools that enable it. This is precisely what a
logic-based representation can bring. Logic rules are written in a symbolic language that has
the advantage of being human readable. Automatically extracted musical patterns expressed
in logical formulae can be transmitted to musicologists who can in turn analyse them. In the
musical domain we could obtain rules describing the relationship between musical phenomena
and structured descriptions of local musical content. When using first-order (or relational)
logic those descriptors can be not only low level concepts but also high level ones involving
relations between individual data points. For example temporal relations between local
musical events can easily be represented.
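As a hedged illustration of the last point, the sketch below stores chords as relational facts and derives a temporal relation between them. The names `Chord`, `follows` and `successions`, and the example chords, are assumptions made for this illustration only; they are not the representation used in the thesis.

```python
from typing import NamedTuple


class Chord(NamedTuple):
    """A local musical event: one chord at a given position in a piece."""
    piece: str
    position: int
    root: str
    category: str  # e.g. "maj", "min", "7"


# Hypothetical facts, analogous to ground atoms chord(Piece, Pos, Root, Cat).
FACTS = [
    Chord("tune1", 0, "C", "maj"),
    Chord("tune1", 1, "F", "maj"),
    Chord("tune1", 2, "G", "7"),
    Chord("tune1", 3, "C", "maj"),
]


def follows(a, b):
    """Temporal relation: b immediately follows a in the same piece."""
    return a.piece == b.piece and b.position == a.position + 1


def successions(facts):
    """All (earlier, later) pairs of facts related by `follows`."""
    return [(a, b) for a in facts for b in facts if follows(a, b)]


# Root-note successions found in the example facts.
pairs = [(a.root, b.root) for a, b in successions(FACTS)]
```

The relation is stated once, over pairs of events, instead of being encoded as a fixed-size feature vector; this is the sense in which a relational description keeps the temporal dimension explicit.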
To extend this expressive power from the representation of the data to the representation
of models and even to the inductive learning process discovering them, Inductive Logic
Programming (ILP) is a fitting framework. This field, which we describe in more detail in
Chapter 3, is at the intersection of Machine Learning and Logic Programming (Muggleton,
1991). It “is concerned with inductive inference. It generalizes from individual instances/observations
in the presence of background knowledge, finding regularities/hypotheses about yet unseen instances”
(Džeroski and Lavrac, 2001a). Inductive inference learns relations from a training dataset that
can then be applied to new data points. The use of prior knowledge is one of the distinctive strengths
of ILP. Moreover the expression of this background information is quite natural thanks again
to the use of a relational representation. It is also compact as only a fraction of it has to be
explicitly expressed and the rest can be derived as part of the mining process.
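To make the compactness argument concrete, here is a minimal sketch of background knowledge from which further facts are derived on demand. It assumes natural root notes and major keys only, and the function name `degree` and the lookup tables are hypothetical choices for this illustration, not the background knowledge used in the thesis.

```python
# Explicit background knowledge: pitch classes of the natural notes and the
# semitone steps of the major scale above its tonic.
NOTE_TO_PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]
DEGREE_NAMES = ["I", "II", "III", "IV", "V", "VI", "VII"]


def degree(root, tonic):
    """Derive the scale degree of a chord root in a major key.

    Returns None when the root does not belong to the major scale of the
    tonic. No (root, tonic, degree) triple is stored explicitly.
    """
    interval = (NOTE_TO_PITCH_CLASS[root] - NOTE_TO_PITCH_CLASS[tonic]) % 12
    if interval not in MAJOR_SCALE_STEPS:
        return None
    return DEGREE_NAMES[MAJOR_SCALE_STEPS.index(interval)]


# Derived facts: G is degree V in C major, C is degree IV in G major.
examples = [degree("G", "C"), degree("C", "G"), degree("F", "G")]
```

Only the two tables are written down; every degree fact for every key follows from the single rule in `degree`, which mirrors the idea that most of the background information can be derived instead of listed.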
This short presentation of the expressive power of logic modelling triggers our first research question:
RQ1: How can logic and in particular ILP support music characterisation? How can musical
properties be represented and extracted?
1.1.3 The Characterisation Power of Harmony
Having found a tool for characterisation, we now need to find a set of features from which
patterns capturing an important aspect of music can be extracted. We could focus on
characterising aspects such as rhythm, melody or timbre, but this thesis limits its
scope to another one: harmony. Harmony is a high-level descriptor of music focusing on
the structure, progression, and relation of chords. In Western tonal music, to which this
thesis also limits its scope, harmony can be used to characterise a musical period (e.g. Baroque
music) or a musical genre. Indeed, as described by Piston (1987), each period had different
rules and practices of harmony. Some harmonic patterns forbidden in one period became
common practice afterwards: for instance the tritone was considered the diabolus in musica
until the early 18th century and later became a key component of the tension/release
mechanism of the tonal system. “Modern” musical genres are also characterised by typical
chord sequences: while pop-rock tunes mainly follow the tonic-subdominant-dominant chord
sequence, jazz standards usually follow more complex chord progressions. Similarly some
composers, musicians and bands are recognised for their characteristic harmonic patterns
which differ from the chord sequences used by other musicians of the same period or genre.
Despite its richness, harmony is a concept that can be understood not only by experts but
also by amateur musicians, through simplified notations such as lead sheets or tabs containing
chord charts. The availability of guitar tabs all over the internet and their success with
amateur guitarists attest to a popular interest in, and understanding of, at least basic harmonic
Hence we believe harmony has good discriminative power (it can distinguish between genres)
and good expressive power (it can be understood by experts and amateurs alike). This leads to
our second research question:
RQ2: Is it possible to leverage harmony’s descriptive and expressive power to characterise music
automatically? To what extent can harmony be used by itself to characterise styles?
1.2 Research Goal
Our working hypothesis is that we can combine ILP and harmony to build characterisation
models of musical styles. As mentioned above, a widespread way of thinking about harmony
is as sequences of chords. There is obviously more to harmony than sequences of chords,
and we will describe more harmony concepts such as tonality in Chapter 2, but at its core
harmony describes chords and temporal patterns between them. Sequences imply temporal
relations, which can be represented in ILP. This brings us to the next research question:
RQ3: Which representation schemes are suited to represent harmony and in particular chord
sequences in ILP?
Figure 1.1: Logic-based framework for musical rule induction. (Block diagram; labels recoverable from the extraction: “Extraction of musical events”, “Symbolic or Audio Features”, “Inference system”, “Automatic inference of rules applying to the database”.)
The idea in this work is to create high-level music descriptors based on logical expressions.
High-level harmony descriptors are closer to the description of music given by musicologists
than the low-level signal-based descriptors commonly used in MIR, which are unintelligible and
cannot be presented to non-technical users. Using ILP on a learning database representing
a particular style, we derive logical rules describing this style. Therefore the goal of this thesis
is to build a harmony- and logic-based reasoning framework for music characterisation.
An illustration of this framework is given in Figure 1.1. It takes as input a database of
examples to characterise. Notice that these examples can either be audio signals or symbolic
examples. They are then analysed by a musical event extractor using either symbolic or audio
features. Finally the relational description of the examples resulting from this analysis is given
to an inference system which derives musical rules that cover the examples. To implement
this framework we need to answer the question:
RQ4: Which induction algorithms can best extract logic rules from harmony?
1.3 Contributions
This thesis describes the development, tuning and testing of such a harmony- and logic-based
framework for musical style characterisation. The main contributions can be found in Chapters
4, 5 and 6. To build our framework we answer our research questions with an explorative
approach. Throughout the thesis, we experiment in five main directions:
• representation of chord sequences
• ILP induction method
• granularity of the musical styles to characterise (looking at different levels of a genre
• different datasets representing these styles
• symbolic and audio domains
The first representation scheme for chord sequences is presented in Chapter 4. It consists
of using fixed-length blocks of chord progressions of length four. We introduce in that chapter
the idea of expressing various chord properties to describe the chord progressions: root note,
bass note, chord category, root interval, bass interval and degree. In Chapter 5 we keep the
same chord properties but exchange the chord progression representation for a more
flexible one. We relax the constraints to allow for chord sequences of any length and add
the concept of gap to skip non-characterising chords. Inspired by biological studies, this is
implemented with a context-free definite-clause grammar and a difference-list representation.
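To make the two representation schemes more concrete, the following Python sketch cuts a chord sequence into blocks of four chords, derives one chord property (the root interval) for a block, and tests whether a pattern occurs in a sequence when intermediate chords may be skipped. It is only an illustration under stated assumptions: the sliding-window extraction, the function names and the toy song are hypothetical, and the plain subsequence test merely stands in for the grammar-based gap mechanism of Chapter 5.

```python
from typing import List, Sequence, Tuple

# Pitch classes in semitones above C (an illustrative convention).
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

Chord = Tuple[str, str]  # (root note, chord category), e.g. ("G", "dom")


def root_to_pc(root: str) -> int:
    """Convert a root such as 'F#' or 'Bb' to a pitch-class number 0-11."""
    pc = PITCH_CLASS[root[0]]
    for accidental in root[1:]:
        pc += 1 if accidental == "#" else -1
    return pc % 12


def blocks_of_four(chords: Sequence[Chord]) -> List[Tuple[Chord, ...]]:
    """Slide a window of length four over the sequence (assumed extraction)."""
    return [tuple(chords[i:i + 4]) for i in range(len(chords) - 3)]


def root_intervals(block: Sequence[Chord]) -> List[int]:
    """Ascending root interval, in semitones, between consecutive chords."""
    pcs = [root_to_pc(root) for root, _ in block]
    return [(b - a) % 12 for a, b in zip(pcs, pcs[1:])]


def matches_with_gaps(pattern: Sequence[Chord], chords: Sequence[Chord]) -> bool:
    """True if the pattern occurs in order, with any number of chords skipped."""
    remaining = iter(chords)  # shared iterator: each match resumes after the last
    return all(any(c == p for c in remaining) for p in pattern)


# Toy example (not taken from any of the datasets).
song = [("C", "maj"), ("A", "min"), ("F", "maj"), ("G", "dom"), ("C", "maj")]
blocks = blocks_of_four(song)
```

On the toy song, the pattern C major, G dominant, C major is found even though A minor and F major lie in between, which a fixed-length block cannot express.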
For each induction method we exploit an existing piece of ILP software. In Chapter
4 we base the research on Aleph, an inverse entailment ILP tool. Our experiments show
our implementation scales to datasets of musicologically meaningful sizes while keeping
computation short and lightweight. However Aleph cannot infer rules from flexible-length
chord progressions. As a consequence, in Chapter 5, we move to TILDE, an ILP decision tree
induction software. We consider both single tree models and random forests.
We explore characterisation of an artist, in this case actually a band, The Beatles, thanks
to Harte’s dataset (2010), containing transcriptions of the 180 songs featured on all their 12
studio albums (Section 4.2). We also differentiate genres with several datasets:
• transcriptions of 244 Jazz standards from the Real Book (various, 2004), representing
Jazz – presented in Section 4.2
• the Perez-9-genres dataset consisting of 856 pieces of music representing three main
genres (popular, jazz and academic music), that are further separated into 9 subgenres
(pop, blues and Celtic; pre-bop, bop and bossanova; Baroque, Classical and Romantic
periods) (Pérez-Sancho, 2009) – presented in Section 5.2.3
• the GTZAN database which contains 1000 audio recordings equally distributed across
10 music genres (Tzanetakis and Cook, 2002) – presented in Section 6.3
• the dataset created for the ISMIR 2004 Genre Classification Contest which covers seven
genre classes with varying numbers of audio recordings per class (ISMIR, 2004), from
which we use 447 pieces – presented in Section 6.3.
Additionally we develop techniques to cover both symbolic and audio data types. Symbolic
data in our case comes in several formats: the Real Book and the Perez-9-genres datasets are
encoded in the Band in a Box file format, while the Beatles dataset is in a Resource Description
Framework (RDF) format. Both formats are described in more detail in Section 2.2.2. To extend
our framework from symbolic to audio data, which are more readily available and are the main
target of industrial applications, we use a chord transcription algorithm from Mauch (2010).
We investigate in Chapter 5 the use on audio data of ILP models trained on symbolic files, but
obtain better results with models trained on audio files. The audio datasets we run experiments
on are: Perez-9-genres, GTZAN and ISMIR 2004 Genre Classification Contest.
From the experiments, the following research question arises:
RQ5: How can characterisation accuracy be evaluated?
Our first attempt at answering it is to examine the transparent models resulting from the
experiments where we confirm that the logical rules in them agree with generally accepted
musicological findings. This work can be read in Sections 4.3 and 5.3.5. Although Computational
Musicology is not the focus of this thesis, this analysis shows that automatic rule generation
assisting musicology could still be an application of our research. We contribute to the field
not only the methods, but also the sets of transparent and human readable rules generated
from our experiments that characterise various styles.
In order to get a quantifiable measure of accuracy we pursue in Chapter 5 the neighbouring
task of genre classification, focusing on extracting transparent models of classification to
still allow for characterisation. This task of genre classification has the advantage of having
established measures of success, due to its clear binary classifying decision (matching predicted
labels with ground truth labels). Moreover, as described in more detail in Section 2.3.2, the
genre classification task has received a lot of interest from the MIR community, which allows us
to test our classification methods on multiple existing and well studied datasets. In Chapter
5 we compare our transparent approach to classification with some of the many statistical
methods applied on the same datasets. We show that our level of accuracy is equivalent to
those other methods.
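The measure of success mentioned above, matching predicted labels with ground truth labels, can be sketched in a few lines of Python. The function name and the toy labels are hypothetical and serve only to illustrate the computation; they are not results from our experiments.

```python
from typing import Sequence


def classification_accuracy(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of pieces whose predicted genre equals the ground truth genre."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("at least one labelled piece is required")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


# Toy labels, for illustration only.
predicted = ["jazz", "popular", "jazz", "academic"]
truth = ["jazz", "popular", "popular", "academic"]
accuracy = classification_accuracy(predicted, truth)
```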
Finally in Chapter 6, in order to improve on a state of the art genre classifier we pair
our harmony models with more traditional low level signal-based descriptors, resulting in a
meta-classifier which obtains statistically significantly better results than the same framework
without our contributed models.
In a nutshell, this thesis work brings the following novel points to the state of the art:
• A variable- and arbitrary-length representation of chord sequences which enriches the
fixed-length approaches (n-grams or similar) employed up to now for their representation.
The original use of context-free definite-clause grammars is the key ingredient for
such a flexible representation.
• A modelling of gaps of unspecified length in between harmonic sequences of interest
to skip ornamental and passing chords, which we were the first to introduce. We capture
gaps of flexible length thanks to the recursive power of Inductive Logic Programming.
• The proof of the appropriateness and usefulness of the combination of degree
and chord category in automatic characterisation and classification of style. We
show that despite the complexity of such models, which include both a function relative
to the tonality (degree) and the chord's internal structure (category), they obtain
statistically significantly better results when used for genre classification. This composite
representation also results in less complex models requiring shorter computation times.
Finally we show that it is also more musicologically meaningful than using each
component independently.
• The incorporation of harmony models into state-of-the-art signal-based genre
classifiers. Where current genre-classification approaches suffer from a semantic gap due
to acoustical low-level features, ours includes a temporal modelling of harmony to retain
musical meaning. This results in a statistically significant increase in genre classification
accuracy of up to 2.5%.
• Automatically generated datasets of harmony rules characterising several genres and
styles for further musicological studies. Our experiments produce sets of rules for all
characterisation and classification problems we study. These are available upon request
to musicologists and other researchers, either in a human-readable format or as logic
1.4 Related Publications by the Author
We list below those of our publications that have influenced the work and writing of this thesis.
When they were the results of collaborations, the contributions of the co-authors are also
specified accordingly.
Simon Dixon was the author’s supervisor and provided guidance, support and feedback for
the entire duration of her thesis. Rafael Ramirez acted as her local supervisor for the 2-month
research exchange she spent at the Music Technology Group (Universitat Pompeu Fabra,
Barcelona) in January-March 2009 and for the subsequent publications.
Peer-Reviewed Conference Papers
(Anglade and Dixon, 2008a) – Amélie Anglade and Simon Dixon. Characterisation
of Harmony with Inductive Logic Programming. In Proceedings of the 9th International
Conference on Music Information Retrieval (ISMIR), pages 63–68, Philadelphia, PA, USA, 2008.
The poster presenting this publication at the ISMIR 2008 conference also received the Best
Poster Award at the London Hopper Colloquium for women in computing research, British
Computer Society Headquarters, London, 2009.
(Anglade and Dixon, 2008b) – Amélie Anglade and Simon Dixon. Towards Logic-
based Representations of Musical Harmony for Classification, Retrieval and Knowledge
Discovery. In Proceedings of the 1st International Workshop on Machine Learning and Music
2008 (MML), co-located with the 25th International Conference on Machine Learning (ICML), the
24th Conference on Uncertainty in Artificial Intelligence (UAI) and the 21st Annual Conference on
Learning Theory (COLT), pages 11–12, Helsinki, Finland, 2008.
(Anglade et al., 2009b) – Amélie Anglade, Rafael Ramirez and Simon Dixon. First-
order Logic Classification Models of Musical Genres Based on Harmony. In Proceedings
of the 6th Sound and Music Computing Conference (SMC), pages 309–314, Porto, Portugal, 2009.
(Anglade et al., 2009c) – Amélie Anglade, Rafael Ramirez and Simon Dixon. Genre
classification using harmony rules induced from automatic chord transcriptions. In
Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR),
pages 669–674, Kobe, Japan, 2009.
(Barthet et al., 2011) – Mathieu Barthet, Amélie Anglade, Gyorgy Fazekas, Sefki Kolozali
and Robert Macrae. Music Recommendation for Music Learning: Hotttabs, a Multimedia
Guitar Tutor. In Proceedings of the 2nd Workshop on Music Recommendation and Discovery
(WOMRAD), co-located with the 5th ACM Recommender Systems Conference (ACM RecSys),
pages 7–13, Chicago, IL, USA, 2011.
This publication describes a web application developed at the London Music Hack Day
2009 and Barcelona Music Hack Day 2009. All co-authors have built or provided pieces
(algorithms or infrastructure) of the application and worked on their integration together.
The author’s work lies mainly in the design, implementation and refinement of the guitar
tab clustering component, as well as the conceptualisation of the original idea for the web
application and contribution to the definition and refinement of its functionalities.
Journal Article
(Anglade et al., 2010) – Amélie Anglade, Emmanouil Benetos, Matthias Mauch and Simon
Dixon. Improving Music Genre Classification Using Automatically Induced Harmony
Rules. Journal of New Music Research, 39(4):349–361, 2010.
Emmanouil Benetos and the author equally collaborated on this work. Emmanouil Benetos
provided his prior implementations of the signal-based features and statistical classification
algorithms, while the author contributed her harmony-based ILP classification models.
The integration of those components as well as the experimental design, execution and
interpretation were the fruit of their collaboration. Matthias Mauch contributed the chord
transcription algorithm and ran the transcription tasks.
Other Publications
(Anglade et al., 2009a) – Amélie Anglade, Rafael Ramirez and Simon Dixon. Computing
genre statistics of chord sequences with a flexible query system. Demo at the SIGMUS
Symposium, 2009.
(Dixon et al., 2011) – Simon Dixon, Matthias Mauch and Amélie Anglade. Probabilistic
and Logic-Based Modelling of Harmony. In Ystad, S., Aramaki, M., Kronland-Martinet, R.,
and Jensen, K., editors, Exploring Music Contents, 7th International Symposium, CMMR 2010,
volume 6684 of Lecture Notes in Computer Science, pages 1–19. Springer Berlin Heidelberg, 2011.
This publication provides a summary and comparison of Matthias Mauch’s and the author’s
thesis research.
1.5 Thesis Outline
Chapter 1: Introduction
In this chapter we describe the context and motivation for this work. The research goal and
research questions are presented and the thesis contributions are discussed.
Chapter 2: Background and Related Work in Music Information Retrieval
In this chapter we define music theory and in particular harmony-related concepts that will
be used in the remainder of the thesis. We also review prior work in the field of Music
Information Retrieval focusing on music characterisation and classification.
Chapter 3: Background and Related Work in Inductive Logic Programming
This chapter covers the theory, background as well as related work in Inductive Logic
Programming. A review of related logic-based approaches to musical tasks concludes the chapter.
Chapter 4: Automatic Characterisation of the Harmony of Song Sets
Based on work also published in (Anglade and Dixon, 2008a) and (Anglade and Dixon, 2008b).
In this chapter we present a style characterisation approach using fixed-length chord
sequences as the representation scheme and Aleph as the ILP induction algorithm. We study and
characterise a band, the Beatles, represented by their entire studio albums, and a genre, Jazz,
represented by a set of pieces from the Real Book. A qualitative analysis of the extracted
rules, as well as a comparison with a statistical study of the same corpora is provided. We
conclude with an analysis and description of the constraints and limitations of this approach.
Chapter 5: Automatic Genre Classification
Extends work published in (Anglade et al., 2009b) and (Anglade et al., 2009c).
In this chapter we move to the task of genre classification, and use the TILDE algorithm
which is designed for building decision tree models. We introduce a new Context-Free
Definite-Clause Grammar scheme to represent harmony sequences of any length including
gaps. We report on the results of classification experiments on 9 genres, comparing the
discriminative power of several chord properties but also of single decision trees vs. random
forests. We also perform a comparison of our method with another study on the same
dataset using a statistical approach. We conclude with a musicological analysis of some of the
extracted rules.
Chapter 6: Improving on State-of-the-art Genre Classification
Based on work also published in (Anglade et al., 2010).
In this chapter we present a new meta-classification framework extending a state-of-the-art
statistical genre classifier based on timbral features with our harmony models. The latter
are the first-order random forests built in the previous chapter using the best representation
scheme and trained on audio data. Tests on two new genre datasets (not used in our previous
experiments) indicate that the proposed harmony-based rules combined with the timbral
descriptor-based genre classification system lead to significantly improved genre classification
Chapter 7: Conclusions
This chapter provides a summary of our findings and discusses future research topics and
applications of this work.
32 | Background and Related Work in Music Information Retrieval
2.1
In this chapter we review background music theory concepts as well as related work in the
field of Music Information Retrieval. We begin by providing music theory definitions which
are essential to the understanding of our work. We look closely at harmony-related
concepts, terms and notation conventions which we will later use. We go on to
summarise and discuss MIR research to date covering the two tasks we are interested in,
namely automatic characterisation and music genre classification. We provide a description of
those tasks and how they relate to each other, a review of state-of-the-art approaches to tackle
them as well as a discussion of their limitations and recent endeavours to overcome them.
2.2 Musical Concept Definitions
The work done in this thesis is based on music theory concepts that we define here. We
will limit those definitions to the scope of this work, which as seen in Chapter 1 does not
have musicological goals but instead borrows useful concepts from musicology for music
information retrieval purposes – some of which might be of interest to musicologists too.
In particular, although our work spans three genres – classical, jazz and popular music – which have
their own independent bodies of musicological research, it is outside the scope of this work to
explain in detail how their views of the following concepts differ; instead we will focus on
their common traits.
The pitch is the perceptual interpretation of the frequency of a sound wave. It consists of
a pitch-class, notated with the 7 pitch labels of the diatonic scale (A, B, C, D, E, F, G) which
can all be altered with one or several sharps (♯) or flats (♭), and the octave in which the pitch is
found. In this work we will often omit the octave and treat pitch and pitch-class as synonyms.
Similarly for a musical note, which consists of a pitch and a duration, we will often omit the
duration and use it to refer to its pitch-class. Although in music a tone is usually the audio instantiation
of a note, here we will only employ that term to refer to the intervals called semitone and (whole)
tone. A semitone is the distance between adjacent notes in the chromatic scale, as on the piano
keyboard, or also the distance between a note and the same note altered by one sharp (raising
by a semitone) or one flat (lowering by a semitone). A tone equals two semitones.
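As a small illustration of these definitions, the Python sketch below maps a pitch label with optional sharps and flats to a pitch-class number and measures the ascending distance between two pitch-classes in semitones. The numbering (C = 0), the ASCII spelling of accidentals ("#" for sharp, "b" for flat) and the function names are assumptions made for this example.

```python
# Semitones above C for the seven unaltered pitch labels.
NATURAL_PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_class(label: str) -> int:
    """Map a label such as 'C', 'F#' or 'Bbb' to a number between 0 and 11."""
    value = NATURAL_PITCH_CLASS[label[0]]
    for accidental in label[1:]:
        if accidental == "#":    # a sharp raises the note by one semitone
            value += 1
        elif accidental == "b":  # a flat lowers the note by one semitone
            value -= 1
        else:
            raise ValueError(f"unknown accidental: {accidental!r}")
    return value % 12            # the octave is deliberately ignored


def semitones_up(lower: str, upper: str) -> int:
    """Ascending distance in semitones from one pitch-class to another."""
    return (pitch_class(upper) - pitch_class(lower)) % 12
```

For instance, E to F is a semitone while C to D is a tone, and C sharp and D flat share the same pitch-class number under the tuning assumptions used in this work.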
2.2.1 Harmony-Related Terms
The main musical concept addressed throughout this work is harmony. In music, harmony
is concerned with the study of chords and of their structures, progressions and relations. In
Western Music, harmony is one of the fundamental elements of music, clearly as important
as melody and rhythm to which it is linked, the three supporting each other. We use the
most generally accepted definition for a chord: the simultaneous combination of 3 or more
notes. We will exceptionally extend this definition to some 2-note chords when it is clear from
their context that the third note is implied. Another important building block of harmony
is the interval, “the distance between two notes” as Piston (1987) defines it. When the two
tones are sounded simultaneously, he calls it a harmonic interval, which he uses to provide
another definition of the chord: “the combination of two or more harmonic intervals”. A
(non-exhaustive) list of the most important harmonic intervals is provided in Table 2.1.
Table 2.1: List of the most important harmonic intervals. 4 and 5 in the pitch names symbolise
the (fourth and fifth) octave, where the fourth octave extends from middle C up to, but not
including, the next C.
Name               Diatonic steps  Semitones  Examples
Perfect unison     0               0          C4-C4
Minor second       1               1          C-D♭, B-C, E-F
Major second       1               2          C-D
Minor third        2               3          C-E♭, A-C, E-G
Major third        2               4          C-E
Diminished fourth  3               4          C-F♭, C♯-F
Perfect fourth     3               5          C-F
Augmented fourth   3               6          C-F♯, F-B
Diminished fifth   4               6          C-G♭, B-F
Perfect fifth      4               7          C-G
Augmented fifth    4               8          C-G♯, C♭-G
Minor sixth        5               8          C-A♭, A-F
Major sixth        5               9          C-A
Minor seventh      6               10         C-B♭, A-G
Major seventh      6               11         C-B
Perfect Octave     7               12         C4-C5
Minor ninth        8               13         C4-D♭5, B4-C5
Major ninth        8               14         C4-D5
Perfect eleventh   10              17         C4-F5
Major thirteenth   12              21         C4-A5
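The two numeric columns of Table 2.1 can be computed from spelled pitches, as in the following Python sketch. It assumes pitches written as a letter, optional ASCII accidentals ("#" or "b") and a single-digit octave; the function names and the partial name lookup are hypothetical and only cover a few rows of the table.

```python
from typing import Tuple

LETTERS = "CDEFGAB"
NATURAL_SEMITONES = [0, 2, 4, 5, 7, 9, 11]  # semitones above C for each letter

# A few rows of Table 2.1, keyed by (diatonic steps, semitones).
INTERVAL_NAMES = {
    (0, 0): "perfect unison",
    (3, 6): "augmented fourth",
    (4, 6): "diminished fifth",
    (4, 7): "perfect fifth",
    (7, 12): "perfect octave",
}


def parse(pitch: str) -> Tuple[int, int, int]:
    """Split e.g. 'Db5' into (letter index, accidental offset, octave)."""
    letter = LETTERS.index(pitch[0])
    accidentals, octave = pitch[1:-1], int(pitch[-1])
    offset = accidentals.count("#") - accidentals.count("b")
    return letter, offset, octave


def interval(lower: str, upper: str) -> Tuple[int, int]:
    """Return (diatonic steps, semitones) from the lower to the upper pitch."""
    l1, a1, o1 = parse(lower)
    l2, a2, o2 = parse(upper)
    steps = (o2 * 7 + l2) - (o1 * 7 + l1)
    semitones = (o2 * 12 + NATURAL_SEMITONES[l2] + a2) - (
        o1 * 12 + NATURAL_SEMITONES[l1] + a1
    )
    return steps, semitones
```

The sketch shows why both columns are needed: C4 to F#4 and C4 to Gb4 span the same six semitones but a different number of diatonic steps, and therefore carry different interval names.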
Figure 2.1: Types of triads (and neutral chord) used in this work. Panels: major triad, minor triad, diminished triad, augmented triad, suspended triads (sus2, sus4) and “neutral” chord, each labelled with its component intervals.
Figure 2.2: Types of seventh and major sixth chords used in this work. Panels: dominant 7th, minor seventh, major seventh and major sixth, each labelled with its component intervals.
In this thesis we will focus on a limited number of chord types, or as we will refer to them,
chord categories. They are defined and characterised by the number of notes and specific (up
to octave equivalence) intervals they contain. We will work with triads, seventh chords and
major sixth chords. Triads are chords of three notes that are separated by 2 intervals of a third.
Figure 2.1 illustrates the types of triads that will be used in the next chapters: major, minor,
diminished, augmented, suspended, neutral (when the third is neither present nor suspended).
A seventh chord is a triad with an additional note a third above the top note, which is also a
seventh above the root note. The root note is the note used to name the chord, and the lowest
note in that chord when it is played in its root position, i.e. when all notes are separated by
thirds. The chord is inverted when its notes are reorganised. The lowest note in an inverted
chord is called the bass note. Figure 2.2 shows the types of seventh chords that will be used
in this thesis: dominant 7th, minor 7th, major 7th. Finally we also use the major sixth chord,
also known in classical music as an added sixth chord and in popular music as a sixth chord,
which is a triad with an added sixth interval from the root. It is included in our study due
to its extensive use in modern popular music. It can also be found in Figure 2.2. Hence we
only consider tertian chords, i.e. chords which are built of superimposed thirds, their inversions
and chords based on tertian chords where some intervals are omitted (such as the 3rd in the
“neutral” triad), replaced with other intervals (e.g. suspended triads where the 3rd is replaced
with a 2nd or a 4th) or added (such as the added 6th in the major sixth chord). Moreover,
chords with larger intervals than the 7th (such as 9th, 11th and 13th chords)
have the same functions as the 7th chords they contain and extend, which is why we omit them
from our vocabulary of chords for our experiments and instead focus on the underlying seventh
chords and triads. We acknowledge though that this is a simplification: at the perception level
these more complex chords still sound different and can be used by composers and
musicians to add harmonic colours to their music.
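The chord categories described above can be written down as sets of intervals above the root, as in the Python sketch below. The interval content follows the standard definitions of these chords; the dictionary keys, the function names and the convention of writing extensions as intervals of 12 semitones or more are assumptions made for this illustration, not the encoding used in our experiments.

```python
from typing import Iterable, List, Optional

# Intervals above the root, in semitones, for each chord category.
CHORD_CATEGORIES = {
    "maj": (0, 4, 7),       # major third + minor third
    "min": (0, 3, 7),       # minor third + major third
    "dim": (0, 3, 6),       # two minor thirds
    "aug": (0, 4, 8),       # two major thirds
    "sus2": (0, 2, 7),      # third replaced with a major second
    "sus4": (0, 5, 7),      # third replaced with a perfect fourth
    "neut": (0, 7),         # third neither present nor suspended
    "dom": (0, 4, 7, 10),   # major triad + minor seventh
    "maj7": (0, 4, 7, 11),  # major triad + major seventh
    "min7": (0, 3, 7, 10),  # minor triad + minor seventh
    "maj6": (0, 4, 7, 9),   # major triad + added major sixth
}


def chord_pitch_classes(root: int, category: str) -> List[int]:
    """Pitch classes (0-11) of a chord in root position."""
    return [(root + i) % 12 for i in CHORD_CATEGORIES[category]]


def categorise(intervals: Iterable[int]) -> Optional[str]:
    """Name the category of a chord, ignoring extensions beyond the 7th.

    Extensions (9th, 11th, 13th) are assumed to be given as intervals of
    12 semitones or more and are dropped before the lookup.
    """
    core = tuple(sorted({i for i in intervals if i < 12}))
    for name, shape in CHORD_CATEGORIES.items():
        if shape == core:
            return name
    return None
```

With this convention a dominant ninth chord, written as (0, 4, 7, 10, 14), is reduced to the dominant seventh it contains and extends.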
Figure 2.3: Example of the major mode: C Major scale, with pitch labels and degrees. The steps between successive degrees are tone, tone, semitone, tone, tone, tone, semitone.
Another essential concept in harmony is tonality, the hierarchical organisation of pitches
in a scale and around a tonal centre called the tonic. In harmony analysis the seven pitches of
a diatonic scale are identified with Roman numerals which correspond to their position in the
scale, also called their degree, the tonic being the first degree (I). The full list of degrees and
their names is provided in Table 2.2. Each chord can then be characterised by its type, its
root note or the degree of its root note, and potentially its inversion or bass note. When degrees
are used we talk about Roman numeral analysis, which makes it possible to identify patterns across music
pieces with different tonal centres. The tonic together with the type of scale, the mode, are
grouped into the key. Although some of the genres studied for this work might occasionally
use different modes, we will limit the modes used to those of common practice: major and
minor. This has the advantage of providing a common frame of reference in which we can compare
various genres that have all been represented or transcribed into those two modes. Studies
comparing genres sharing other modes could however employ those as well or instead. Major
and minor modes are characterised by the respective positions of the tones and semitones in
the diatonic scale which are illustrated in Figures 2.3 and 2.4. The tuning system assumed
to be underlying all pieces studied in this work is the 12-tone equal temperament, in which the
octave is divided into 12 equal semitones, or chromatic intervals, allowing for transposition to
other keys: movement of the tonal centre and all pitches while keeping the intervals between
all notes identical. When a change of tonal centre is only temporary and occurs inside a piece
of music then we talk about modulation.
Table 2.2: Diatonic scale degrees.
Degree Name
I Tonic
II Supertonic
III Mediant
IV Subdominant
V Dominant
VI Submediant
VII Leading note
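The relation between key, scale degrees and transposition can be illustrated with a short Python sketch. It assumes pitch classes numbered in semitones above C and the natural minor scale for the minor mode; the function names are hypothetical. The example at the end checks that the same progression written in two different keys yields the same sequence of degrees, which is the property that makes degree-based descriptions comparable across pieces.

```python
from typing import List, Optional

MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # tone tone semitone tone tone tone semitone
MINOR_STEPS = [2, 1, 2, 2, 1, 2, 2]  # natural minor
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII"]


def scale(tonic: int, mode: str) -> List[int]:
    """Pitch classes of the seven degrees of a major or minor scale."""
    if mode not in ("major", "minor"):
        raise ValueError("mode must be 'major' or 'minor'")
    steps = MAJOR_STEPS if mode == "major" else MINOR_STEPS
    pcs = [tonic % 12]
    for step in steps[:-1]:  # the last step only leads back to the tonic
        pcs.append((pcs[-1] + step) % 12)
    return pcs


def degree(root: int, tonic: int, mode: str) -> Optional[str]:
    """Roman numeral of a chord root in a key, or None if it is not diatonic."""
    pcs = scale(tonic, mode)
    return ROMAN[pcs.index(root % 12)] if root % 12 in pcs else None


def transpose(pcs: List[int], semitones: int) -> List[int]:
    """Move all pitch classes by the same number of semitones."""
    return [(p + semitones) % 12 for p in pcs]


# The same progression in C major and, transposed up a tone, in D major.
roots_in_c = [0, 5, 7, 0]
roots_in_d = transpose(roots_in_c, 2)
degrees_in_c = [degree(r, 0, "major") for r in roots_in_c]
degrees_in_d = [degree(r, 2, "major") for r in roots_in_d]
```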
Figure 2.4: Example of the minor mode: A (natural) minor scale, with pitch labels and degrees.
Finally higher level concepts of harmony that will be mentioned in this thesis are cadence
and harmonic rhythm. A harmonic cadence is a chord sequence that acts as a punctuation
mark at the end of a musical phrase or a musical piece. Specific examples of cadences will be
introduced in later chapters when they are needed or identified. Harmonic rhythm refers to
either the specific rhythm (original definition) or more generally the tempo at which the chord
changes happen in a piece of music.
2.2.2 Chord Notations
To study and analyse the harmony of a piece of music one looks at all the chords it contains and
labels them. In classical music all the individual notes are usually provided in the score, without
explicit notation of their harmonic function, and it is the task of the musician or musicologist
to group them together into chords and then perform harmonic analysis to identify degrees,
types and inversions of the chords, with notation conventions such as classical Roman numeral
analysis. In popular music and jazz however it is more common to directly represent the
chords in a shorthand fashion without specifying the individual notes, and root note is often
preferred over degree. Those shorthand labels are explicit so that they can be played at sight.
They juxtapose root note, chord type and inversion (preceded by a forward slash: “/”). Such
jazz/pop/rock shorthand chord labels are found in lead sheets (e.g. on top of lyrics) and real or
fake books for instance. The various chord syntaxes mentioned here are illustrated in Figure
2.5 (taken from (Harte et al., 2005) where they are also described in more detail).
Figure 2.5: A short extract of music in C major with different harmony notations: a) Musical
score, b) Figured bass, c) Classical Roman numeral, d) Classical letter, e) Typical Popular music
guitar style, f) Typical jazz notation. From (Harte et al., 2005).
The datasets we use for our experiments were provided by their creators in what can
now be considered standard formats in the Music Information Retrieval community.
The first format is the Band in a Box file format. Band in a Box is an accompaniment
software for musicians who need a computer generated “band” to play along with them. The
general interface displays a list of bars in which the user can type in chords (annotated in
a jazz/pop/rock shorthand fashion). More parameters allow the user to define the tempo,
the style, the repetitions, etc. Thus, a Band in a Box file contains a user supplied list of
pairs (time, chord) and as such can be seen as a simplified music score. The second format
is the chord notation introduced by Harte et al. (2005) which was then integrated into the
Music Ontology (Raimond et al., 2007) as the Chord Ontology to allow for structured RDF
(Resource Description Framework) descriptions of harmonic events. In this representation
each harmonic event (or chord) is associated with a start time, an end time and a web identifier
from which one can retrieve an RDF description of the chord. As shown in Figure 2.6, in
the Chord Ontology each RDF description of a chord contains in turn none (if the chord is
unknown), one or several of the following:
• the root note of the chord,
• the bass note of the chord,
• the component intervals of the chord (additive description), or a base chord (i.e. maj, 7,
sus4, etc.) and optionally the intervals from that base chord that are not contained in
the current chord (subtractive description).
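To illustrate how the two formats relate, the Python sketch below models a harmonic event with a start time, an end time and a chord description holding the optional fields listed above, and flattens a list of such events into (time, chord) pairs. The class names, field names and label layout are hypothetical and chosen for this example only; they are not the vocabulary of the Chord Ontology nor the Band in a Box file structure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ChordDescription:
    """Optional parts of a chord description; all empty means 'unknown'."""
    root: Optional[str] = None
    bass: Optional[str] = None
    intervals: Tuple[str, ...] = ()          # additive description
    base_chord: Optional[str] = None         # subtractive description, e.g. "maj"
    without_intervals: Tuple[str, ...] = ()  # intervals omitted from base_chord

    def label(self) -> str:
        """Shorthand-like label: root, chord type, then '/' and the bass note."""
        if self.root is None:
            return "unknown"
        kind = self.base_chord or "(" + ",".join(self.intervals) + ")"
        text = self.root + kind
        return text + "/" + self.bass if self.bass else text


@dataclass
class HarmonicEvent:
    start: float  # seconds
    end: float    # seconds
    chord: ChordDescription


def to_time_chord_pairs(events: List[HarmonicEvent]) -> List[Tuple[float, str]]:
    """Flatten events into (time, chord) pairs, ordered by start time."""
    return [(e.start, e.chord.label()) for e in sorted(events, key=lambda e: e.start)]


# Toy events, for illustration only.
events = [
    HarmonicEvent(2.0, 4.0, ChordDescription(root="G", bass="B", base_chord="7")),
    HarmonicEvent(0.0, 2.0, ChordDescription(root="C", base_chord="maj")),
    HarmonicEvent(4.0, 5.0, ChordDescription()),
]
pairs = to_time_chord_pairs(events)
```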
All datasets used in our experiments will be pre-processed with specific parsers or converters
extracting from the aforementioned formats the chord types and transforming them into
jazz/pop/rock shorthand representations of the chord types. The shorthand notations for the
chord categories used in this work are shown in Table 2.3. All parsers and converters will be
described and introduced in later chapters, together with the datasets they will be used on.
Figure 2.6: Model of a chord in the Chord Ontology. From http://motools.sourceforge.net/chord/,
licensed by Christopher Sutton, Yves Raimond and Matthias Mauch under
Creative Commons Attribution 1.0 Generic.
Table 2.3: Shorthand notations of main chord categories as used in this work.
Full chord name                     Frequently abbreviated as       Shorthand label
major triad                         major chord                     maj
minor triad                         minor chord                     min
diminished triad                    diminished chord                dim
augmented triad                     augmented chord                 aug
suspended (second or fourth) triad  sus chord                       sus
“neutral” triad                     neutral chord                   neut
dominant seventh                    dominant chord or dominant 7th  dom or 7
major seventh                       major 7th                       maj7
minor seventh                       minor 7th                       min7
major sixth                         major 6th                       maj6
2.3 Related Work in Music Information Retrieval
In this section we describe the Music Information Retrieval state-of-the-art approaches to the
tasks of characterisation and classification and how they relate to each other.
2.3.1 Characterisation
The discussion of music theoretical terms in the previous section is important because, as Piston
(1987) describes it in the introduction of his book, Harmony, “a secure grounding in [music]
theory is [...] a necessity [...], since it forms the basis for any intelligent appraisal of individual styles
of the past or present”, suggesting that harmony is one of the areas of music theory that makes it
possible to identify both common practices and individual styles. The act of finding and “describ[ing] the
distinctive nature or features of” an item or concept is what the Oxford Dictionary defines as
characterisation. Because of its descriptive nature, in Music Information Retrieval, the task
of automatic characterisation lies at the border with computational (ethno)musicology. For
instance it is for its descriptive power that Taminau et al. (2009) employ the rule learning
technique Descriptive Subgroup. It enables them to discover in a dataset of folk tunes
both subgroups and interpretable rules describing them. The latter are of great importance
for ethnomusicologists. Characterisation studies are also conducted on composers (van
Kranenburg, 2006), genres (Pérez-Sancho, 2009) and on musical corpora representing or
exhibiting specific styles. For instance in search of chord idioms, Mauch et al. (2007) made
an inventory of chord sequences present in the Real Book (a corpus representing an entire
genre, Jazz) and in the Beatles’ studio albums (a corpus representing a specific band). Their
approach is entirely statistical and resulted in an exhaustive list of chord sequences together
with their relative frequencies.
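To make the idea of an inventory of chord sequences with relative frequencies concrete, the following minimal sketch counts chord n-grams over a corpus. It is only an illustration under our own assumptions: the function names, the chord label syntax and the toy corpus are hypothetical, and it is not the procedure of Mauch et al. (2007).

```python
from collections import Counter


def chord_ngrams(sequence, n):
    """Return every window of n consecutive chords as a tuple."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def relative_frequencies(corpus, n):
    """Count all chord n-grams in the corpus and normalise by their total number."""
    counts = Counter()
    for song in corpus:
        counts.update(chord_ngrams(song, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical toy corpus of two chord sequences (invented for illustration).
toy_corpus = [
    ["C:maj", "A:min", "F:maj", "G:7", "C:maj"],
    ["D:min", "G:7", "C:maj", "A:min", "D:min", "G:7", "C:maj"],
]

freqs = relative_frequencies(toy_corpus, 2)
# List the three most frequent chord bigrams with their relative frequency.
for gram, freq in sorted(freqs.items(), key=lambda kv: -kv[1])[:3]:
    print(gram, round(freq, 2))
```

Such a list is exhaustive but purely descriptive: it records how often each fixed-length sequence occurs and nothing about what distinguishes one corpus from another.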
Additionally the methods employed for characterisation often belong to the pattern
recognition domain, since, as van Kranenburg (2006) describes it, it fits within Meyer’s theory
of musical style which states that “style is a replication of patterning, whether in human behavior
or in the artifacts produced by human behavior, that results from a series of choices made within some
set of constraints” (Meyer, 1989). McKay and Fujinaga (2007) for instance have developed
an entire computer-based framework implementing state-of-the-art Pattern Recognition and
Data Mining techniques. It is meant to be used by musicologists for exploratory analysis
of large amounts of data, considering many musical aspects (or features) at a time.
Furthermore, for accuracy reasons, many of those pattern recognition studies are conducted
on symbolic data, extracted from scores or score-like data (e.g. extracted from audio by means
of transcription techniques), such as in (Pérez-Sancho, 2009) where naïve Bayes and n-gram
models are first tested on symbolic melodic and harmonic data and then extended to audio
40 | Background and Related Work in Music Information Retrieval
data using polyphonic transcription and chord recognition algorithms respectively.
However, although what we mean precisely by characterisation in this work is the descriptive
analysis whose goal is the (human-readable) description itself, it is important to note
that characterisation techniques are often used to another end. Indeed the result of a
characterisation process, the description itself, can be seen and used as a model of the style
it represents. Cope (1996), after modelling characteristic compositional traits of various
classical composers, automatically composed new musical works in their styles with impressive
results. Similarly, after characterising Johann Sebastian Bach’s fugue compositions and those
of 9 other composers (his son and students) using 20 features mostly focusing on polyphonic
characteristics, van Kranenburg (2006) uses a Fisher transformation to project the multi-dimensional
representations of the fugues onto a two-dimensional space where the compositions
of each composer are expected to form a separate cluster. As seen in this work, a few of
Bach’s disputed compositions actually cluster closer to other composers, which means that this
characterisation technique is a useful tool for discussions of authorship attribution. Finally,
the most common application of characterisation is certainly to the task of classification.
2.3.2 Automatic Genre Classification
In Music Information Retrieval, classification consists in the automatic tasks of learning and
assigning labels to pieces of music. Because genre is a characteristic of music that has been
historically and widely used to organise music, and even though it is an “ill-defined” concept that
even experts would not agree on (Aucouturier and Pachet, 2003), music genre classification –
also sometimes called music genre recognition – has been one of the earliest and most widely
investigated MIR tasks (Lee et al., 2009; Sturm, 2012). Most of the works in music genre
classification to date focus on the task of assigning a single label to each piece of music
– which we will also limit this work to – but some work on multi-label and multi-domain
approaches to this problem has also been published (Lukashevich et al., 2009). Other MIR
classification tasks include mood and emotion classification which are outside the scope of
this work. Automatic tagging – cf. (Bertin-Mahieux et al., 2010) for an overview – is a larger
problem that we will also not describe here.
The topic of music genre classification itself being so popular we can not possibly cover the
entire music genre recognition literature but we refer the reader to surveys such as the one
from Scaringella et al. (2006), as well as the thorough review work from Sturm (2012) who,
even though he focuses on evaluation of music genre recognition algorithms, references all
publications up to December 2012 on this topic.
Bag-of-Frames / Bag-of-Features Approach
The majority of genre classification systems are signal-based – cf. (Scaringella et al., 2006) for
an overview of these systems – and most of them are based on the so-called “Bag-of-Frames”
or “Bag-of-Features” (BOF) approach. It proceeds as follows:
1. Each class is represented by several audio examples.
2. For each of these examples, the acoustic signal is cut into short overlapping frames
(typically 50 ms frames with an overlap of 50%).
3. For each frame a feature vector is computed (typically spectral features such as Mel
Frequency Cepstrum Coefficients).
4. These feature vectors are given to a statistical classifier (e.g. Gaussian Mixture Models)
which models the global distributions or average values of these vectors over the whole
piece or passage for each class. Interestingly such distributions have not only been used
to separate the examples into classes but also to compute similarity measures between
examples for tasks such as retrieval or recommendation (Aucouturier and Pachet, 2008),
making the background literature on music genre classification and music similarity
Typically the features used in the BOF are low level descriptors of music, focusing mostly on
timbral texture (Aucouturier and Pachet, 2004), rhythmic content (Gouyon and Dixon, 2005),
pitch content (melody or harmony) (Gómez, 2006) or, as suggested by Tzanetakis and Cook
(2002), and in accordance with the modular architecture of music processing in the human
brain pointed out by the neuroscientists Peretz and Coltheart (2003), a combination of the
three (Basili et al., 2004; Berenzweig et al., 2004; Cano et al., 2005; Shen et al., 2006).
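The bag-of-frames steps listed above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not any published system: RMS energy and zero-crossing rate stand in for MFCCs, the per-piece summary is a plain mean rather than a Gaussian Mixture Model, and the sampling rate, function names and test tones are hypothetical.

```python
import math


def frames(signal, frame_len, hop):
    """Step 2: cut the signal into overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def frame_features(frame):
    """Step 3: toy feature vector (RMS energy, zero-crossing rate) standing in for MFCCs."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(frame) - 1))


def bag_of_frames(signal, frame_len=400, hop=200):
    """Step 4, simplified: average the frame features over the whole piece.

    With an assumed 8 kHz sampling rate, 400 samples is 50 ms and a hop of
    200 samples gives 50% overlap. The order of the frames is discarded.
    """
    feats = [frame_features(f) for f in frames(signal, frame_len, hop)]
    dims = len(feats[0])
    return tuple(sum(f[d] for f in feats) / len(feats) for d in range(dims))


def tone(freq, sr=8000, seconds=0.5):
    """Hypothetical test signal: a pure sine tone."""
    return [math.sin(2 * math.pi * freq * t / sr) for t in range(int(sr * seconds))]


low = bag_of_frames(tone(110))
high = bag_of_frames(tone(1760))
print(low, high)
```

Because `bag_of_frames` only keeps an average, any reordering of the frames of a piece yields the same summary, which is the property criticised later in this section.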
One interesting example of the use of the BOF is the Extractor Discovery System (EDS),
an expert system developed at Sony CSL (Zils, 2004). Its particularity lies in optimising
combinations of signal processing features with genetic programming. It is able to distinguish
sounds with different timbres, even when they are played on the same instrument with only
slight modifications of timbre (Roy et al., 2007), to learn subjective measures such as the
perceived intensity (Zils and Pachet, 2003) and to build a classifier “modelling [urban sounds]
to near-perfect precision”, but fails in classifying polyphonic music with the same precision
(Aucouturier et al., 2007).
It has indeed been suggested by many that the BOF presents a glass-ceiling, or in other
words a maximum accuracy that can not be surpassed, even when optimising its various steps
(e.g. design of the signal-based features, feature selection, etc.) (Aucouturier and Pachet,
2004). Other reported and connected limitations of the BOF include the creation of false
positive hubs (Aucouturier and Pachet, 2008) and false negative orphans (Pampalk, 2006),
respectively abnormally similar and dissimilar to any other piece. Explanations for these
behaviours such as the curse of dimensionality – the feature space being high-dimensional
– (Karydis et al., 2010), as well as fixes for them e.g. using mutual proximity (Flexer et al.,
2012) have also been provided by the community.
While those shortcomings are purely statistical, the MIR community has also criticised the
BOF approach for ignoring the musical nature and properties of the content it classifies.
For instance Aucouturier and Pachet (2008) explain that most of the time there is no direct
mapping between the acoustical properties and mental representation of a musical entity, such
as genre or mood. They also point out that contrary to the bag-of-frames assumption the
contribution of a musical event to the perceptual similarity is not proportional to its statistical
importance – rare musical events can even be the most informative ones to determine its
genre (Aucouturier et al., 2007). Moreover the bag-of-frames approach ignores the temporal
organisation of the acoustic signal. Indeed rarely does time modelling go beyond delta features
– comparing values of the current frame with those of the preceding one. And yet when
comparing pieces from similar genres or passages of the same song it is crucial in the retrieval
process to use sequences and not only average values or global statistical distributions of
features over a whole passage or piece (Casey and Slaney, 2006). In summary, the BOF
approach, based on low-level signal-based content descriptors, lacks high-level, contextual
concepts which are equally important for the human perception and characterisation of music
genres (McKay and Fujinaga, 2006).
Combining Low and Higher Level Descriptors/Features
Thus, recently, several attempts have been made to use, or integrate with state-of-the-art
low-level audio features, such higher-level or contextual features, including: long-time audio
features (Meng et al., 2005), statistical (Lidy et al., 2007) or distance-based (Cataltepe et al.,
2007) symbolic features, text features derived from song lyrics (Neumayer and Rauber, 2007),
cultural features or contextual features extracted from the web (Whitman and Smaragdis,
2002), social tags (Chen et al., 2009) or combinations of several of these high-level features
(McKay and Fujinaga, 2008).
Using Sequences
When dealing with symbolic data, the Bag-of-Frames approach can obviously not be applied.
However, as Hillewaere et al. (2009) explain, much of the work on genre classification of
symbolic musical data uses global features, i.e. features at the level of the entire piece of music.
They nonetheless show that models using event features, i.e. features representing the pieces
of music as sequences of events, outperform global feature models. In their case the models
employed with such event features are n-grams, and also their own multiple viewpoint model.
Pérez-Sancho et al. (2009) also employ n-grams to represent melodic and harmonic sequences
and perform genre classification on symbolic data. They also prove that the same sequence-
based approach can be applied to audio data (Pérez-Sancho et al., 2010).
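As an illustration of how n-grams over event sequences can drive genre classification, the sketch below trains one chord bigram model per genre and assigns a new sequence to the genre whose model gives it the highest likelihood. It is a simplified example of the general technique, assuming add-one smoothing, chord degrees as symbols and an invented toy training set; it does not reproduce the models of Hillewaere et al. or Pérez-Sancho et al.

```python
import math
from collections import Counter


def train_bigram_model(songs, vocabulary):
    """Estimate add-one-smoothed log P(next chord | previous chord)."""
    pair_counts, prev_counts = Counter(), Counter()
    for song in songs:
        for prev, nxt in zip(song, song[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    size = len(vocabulary)

    def log_prob(prev, nxt):
        return math.log((pair_counts[(prev, nxt)] + 1) / (prev_counts[prev] + size))

    return log_prob


def classify(song, models):
    """Return the genre whose bigram model gives the song the highest log-likelihood."""
    def score(log_prob):
        return sum(log_prob(p, n) for p, n in zip(song, song[1:]))
    return max(models, key=lambda genre: score(models[genre]))


# Hypothetical training data: chord degrees for two invented "genres".
vocabulary = {"I", "ii", "IV", "V", "vi"}
training = {
    "pop": [["I", "V", "vi", "IV", "I", "V", "vi", "IV"]],
    "jazz": [["ii", "V", "I", "ii", "V", "I", "vi", "ii", "V", "I"]],
}
models = {genre: train_bigram_model(songs, vocabulary)
          for genre, songs in training.items()}

print(classify(["ii", "V", "I"], models))
print(classify(["I", "V", "vi", "IV"], models))
```

The model only sees pairs of adjacent chords, so any regularity spanning more than n symbols is invisible to it.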
Harmony-Based Approaches to Music Genre Classification
Although some harmonic (or chord) sequences are famous for being used by a composer or
in a given genre, harmony is scarcely found in the automatic genre recognition literature as
a means to that end. Pérez-Sancho et al. (2008) investigated whether stochastic language
models of harmony including naïve Bayes classifiers and 2-, 3- and 4-grams could be used
for automatic genre classification on both symbolic and audio data. They reported better
classification results when using a richer vocabulary (i.e. including seventh chords), reaching
3-genre classification accuracies on symbolic data of 86% with naïve Bayes models and 87%
using bi-grams (Pérez-Sancho et al., 2009). To deal with audio data generated from MIDI they
used a chord transcription algorithm and obtained accuracies of 75% with naïve Bayes (Pérez-Sancho,
2009) and 89% when using bi-grams (Pérez-Sancho et al., 2010). Earlier attempts at
using harmony include Tzanetakis et al. (2003), who introduced pitch histograms as a feature
describing the harmonic content of music. Statistical pattern recognition classifiers were
trained to extract the genres. Classification of audio data covering 5 genres yielded recognition
rates around 70%, and for audio generated from MIDI files rates reached 75%. However
this study focused on low-level harmony features. Only a few studies have considered using
higher-level harmonic structures, such as chord progressions, for automatic genre recognition.
In (Shan et al., 2002), a frequent pattern technique was used to classify sequences of chords
into three categories: Enya, Beatles and Chinese folk songs. The algorithm looked for
frequent sets, bi-grams and sequences of chords. A vocabulary of 60 different chords was
extracted from MIDI files through heuristic rules: major, minor, diminished and augmented
triads as well as dominant, major, minor, half and fully diminished seventh chords. The best
two-way classifications were obtained using sequences, with accuracies between 70% and
84%. Lee (2007) considered automatic chord transcription based on chord progression. He
used hidden Markov models on audio generated from MIDI and trained by genre to predict
the chords. It turned out he could not only improve chord transcription but also estimate
the genre of a song. He generated 6 genre-specific models, and although he tested the
transcription only on the Beatles’ songs, frame-rate accuracy reached its highest level when using
blues- and rock-specific models, indicating that such models could be used to identify genres.
2.4 Conclusions
In this chapter we have provided definitions and context information for the music theory
and musicological concepts and terms we will be using in the following chapters. Harmony
being the domain we have decided to explore as a high-level descriptor of music we have
taken the time to define and explain the terms and notations associated with it, including
in particular chord symbols and notation conventions. We also reviewed related work on
style characterisation and music genre classification which are the tasks we will explore
using harmony only in Chapters 4 and 5 respectively. We saw that what we defined as
characterisation, the task of analysing and extracting patterns of interest in pieces of music
representing a unified musical style, has not only been explored by the MIR community as a
task in itself for its musicological and ethnomusicological applications, but has also been used
as an intermediate step in retrieval and identification tasks. The most popular of such tasks
is the extensively studied problem of music genre classification which we have also reviewed
and for which we have also described the most commonly employed approaches. Many of
those in fact build models which are black-boxes (due to the low-level signal-based features
they employ) and ignore high-level and temporal musical properties of the items they classify.
We saw that promising solutions in music genre classification have employed either high-level
and contextual features or sequences of musically meaningful events. It is at the intersection
of these that harmony-based music genre classification approaches lie. It is clear from our
literature review that most of those harmony-based methods use n-grams or other statistical
sequential models of fixed length. Although such sequential harmony models have been shown to have
a distinctive characterisation power, which we shall build upon in this thesis, much could be
2.4. Conclusions | 45
done to better capture the essence of harmony. One limitation of these models is their
lack of flexibility, which we address by applying techniques from another domain to both
represent and infer harmony-based models for characterisation and classification: Inductive
Logic Programming, which we will now review in Chapter 3.
48 | Background and Related Work in Inductive Logic Programming
3.1
Inductive Logic Programming (ILP) is a field at the intersection of Machine Learning and
Logic Programming (Muggleton, 1991). It is a technique that learns from examples (i.e.
that induces general rules from specific observations). Because it is based on a first-order logic
framework, it can express concepts that might not be expressible in a traditional attribute-value
framework (Lavrac and Džeroski, 1994). Moreover it supports background knowledge and
can handle imperfect data. At first, ILP was restricted to binary classification tasks but recently
it has been adapted to many more data mining tasks. Finally ILP has already been successfully
used for knowledge discovery and to build expert/reasoning systems in various engineering and
research domains including MIR.
3.2 A Definition of ILP
To define what Inductive Logic Programming is we first describe the tasks of inductive concept
learning (without and with background knowledge) and relational learning.
3.2.1 Inductive Concept Learning
Given a universal set of objects $U$, a concept $C$ is a subset of objects in $U$ ($C \subseteq U$). The problem
of inductive concept learning can be defined as follows: given instances and non-instances of
$C$, find a hypothesis able to tell for each $x \in U$ whether $x \in C$.
To perform an inductive concept learning task one needs to specify a language of examples $L_E$
which defines the space of instances considered (i.e. $U$) and a language of concept description
$L_H$ which defines the space of hypotheses considered. If an example $e$ expressed in $L_E$ is an
instance of the concept $C$ then $e$ is a positive example of $C$, otherwise it is said to be a negative
example of $C$. A coverage relation between $L_H$ and $L_E$, $\mathit{covers}(H, e)$, also needs to be specified.
It returns true when the example $e$ belongs to the concept defined by the hypothesis $H$ and
false otherwise. We can then define a new relation:
$$\mathit{covered}(H, E) = \{ e \in E \mid \mathit{covers}(H, e) = \mathit{true} \}$$
which returns the set of examples in $E$ which are covered by $H$. So the problem of inductive
concept learning can be reformulated as follows: given a set of examples $E$ containing positive
(set $E^+$) and negative (set $E^-$) examples of a concept $C$ expressed in a given language of
examples $L_E$, find a hypothesis $H$ described in a given language of concept description $L_H$
such that:
• every positive example $e \in E^+$ is covered by $H$ (completeness): $\mathit{covered}(H, E^+) = E^+$
• no negative example $e \in E^-$ is covered by $H$ (consistency): $\mathit{covered}(H, E^-) = \emptyset$
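The coverage, completeness and consistency conditions can be checked mechanically once a hypothesis is available as a boolean test. The sketch below is a direct transcription of these definitions under illustrative assumptions: examples are hypothetical (root, category) chord pairs using the shorthand labels of Table 2.3, and the three candidate hypotheses are invented for the purpose of the example.

```python
def covered(hypothesis, examples):
    """covered(H, E): the subset of E for which covers(H, e) is true."""
    return {e for e in examples if hypothesis(e)}


def is_complete(hypothesis, positives):
    """Completeness: every positive example is covered."""
    return covered(hypothesis, positives) == set(positives)


def is_consistent(hypothesis, negatives):
    """Consistency: no negative example is covered."""
    return covered(hypothesis, negatives) == set()


# Hypothetical examples of an invented concept, written as (root, category) pairs.
positives = {("C", "maj"), ("G", "maj7"), ("F", "dom")}
negatives = {("A", "min"), ("D", "min7"), ("B", "dim")}

# Three hypothetical hypotheses, each a boolean test on one example.
hypotheses = {
    "starts with maj": lambda e: e[1].startswith("maj"),
    "not minor": lambda e: not e[1].startswith("min"),
    "maj, maj7 or dom": lambda e: e[1] in {"maj", "maj7", "dom"},
}

for name, h in hypotheses.items():
    print(name, is_complete(h, positives), is_consistent(h, negatives))
```

In this toy setting the first hypothesis is consistent but not complete, the second is complete but not consistent, and only the third satisfies both conditions.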
3.2.2 Inductive Concept Learning with Background Knowledge
When a concept learner has also access to prior knowledge, this prior knowledge is called
background knowledge.
The task of inductive concept learning with background knowledge is described as follows:
given a set of examples $E$ and background knowledge $B$, find a hypothesis $H$ described
in a given language of description $L_H$ such that it is complete and consistent with respect
to the set of examples $E$ and the background knowledge $B$ ($\mathit{covered}(B, H, E^+) = E^+$ and
$\mathit{covered}(B, H, E^-) = \emptyset$).
Notice that the covers relation is extended as follows: $\mathit{covers}(B, H, e) = \mathit{covers}(B \cup H, e)$.
3.2.3 Relational Learning
One class of learning systems is the class of relational learners. They deal with structured
concepts and structured objects defined in terms of their components and relations among
them. These relations constitute the background knowledge.
The languages of examples and of concept description used by relational learners are typically
subsets of first-order logic. When the hypothesis language used by a relational learner is the
language of logic programs (Lloyd, 1987) it is called an inductive logic programming system. It
turns out that in most ILP systems not only the hypotheses are expressed in logic program
form but also the examples and the background knowledge (with additional restrictions for
each of the languages).
So in the case of ILP the coverage relation can be written: $\mathit{covers}(B, H, e) \equiv B \cup H \models e$, where
$\models$ stands for logical implication or entailment.
Figure 3.1: Michalski’s train problem. From (Michalski, 1980).
3.2.4 A Simple Inductive Logic Programming Problem
To understand in more detail how ILP works we illustrate its principle using a simple and well-
known relational learning problem: Michalski’s train challenge (Michalski, 1980).
Descriptions are provided for ten trains, five eastbound trains and five westbound trains.
Each description contains information about the number of carriages of a train, the length of
each carriage (which can be long or short), the roof of each carriage (open or closed roof), the
number of wheels each carriage has, and the loads carried or not in each carriage (information
about their presence and about their shapes). The description of these ten trains is illustrated
in Figure 3.1. The challenge consists in finding a way to generalise from the examples and
automatically distinguish the eastbound trains from the westbound trains (binary classification).
An ILP system can learn a rule that defines what is an eastbound train. In ILP the positive
examples are described as Prolog facts, so in this problem the positive examples can be
expressed as follows¹:
Similarly the negative examples (i.e. facts that are false) are:
¹ The predicates employed in this example are the ones used by Srinivasan (2003) to express the same problem.
Then we need to store the descriptions of each train in our background knowledge. For
instance the description of the first eastbound train in Prolog facts can be expressed as follows:
Notice that the background knowledge is not limited to facts and can contain rules. For
instance imagine that we want to add information about the carriages’ positions in the train in
terms of which carriage follows which other carriage. We could add the following facts:
where succ(X,Y) means X follows Y.
Using the predicates from the positive examples, negative examples and background
knowledge the ILP system can then generate the following hypothesis which covers all the
positive examples and none of the negative examples:
eastbound(A) :- has_car(A, B), short(B), closed(B).
which says that an eastbound train always has a carriage which is short and closed.
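To show how this clause is evaluated against background knowledge, the following sketch re-expresses it in Python. It is an illustration only: the two trains and their carriages are invented and are not the ten trains of Michalski's problem, and the dictionary layout is our own assumption rather than the representation used by any ILP system.

```python
# Hypothetical background knowledge for two invented trains (not Michalski's data).
background = {
    "has_car": {("east1", "car_11"), ("east1", "car_12"),
                ("west6", "car_61"), ("west6", "car_62")},
    "short": {"car_12", "car_62"},
    "closed": {"car_12", "car_61"},
}


def eastbound(train, bk):
    """Python reading of: eastbound(A) :- has_car(A, B), short(B), closed(B)."""
    return any(car in bk["short"] and car in bk["closed"]
               for owner, car in bk["has_car"] if owner == train)


positives = {"east1"}
negatives = {"west6"}

# Completeness and consistency of the rule on the toy data.
print(all(eastbound(t, background) for t in positives))
print(not any(eastbound(t, background) for t in negatives))
```

Note that the invented westbound train has one short carriage and one closed carriage, but no single carriage that is both, so the shared variable B in the clause is what makes the rule discriminative.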
3.3 ILP Techniques and Frameworks
Let us go back to the basic ILP problem of relational rule induction. To induce a hypothesis
Hwhich is complete and consistent with respect to a set of examples Eand background
knowledge Bwithout enumerating all the possible results, several ILP techniques have been
developed including least general generalisation, inverse resolution and inverse entailment. It is
beyond the scope of this thesis to enumerate and explain all the possible ILP techniques to
search the space of clauses. For a good overview and description of these techniques we refer
the reader to (Džeroski et al., 2000).
The goal of this thesis is not to develop a new ILP technique or framework. That is why
we looked at established ILP frameworks, starting from the in-depth comparison provided
in (Maclaren, 2003, Section 3.2). After testing we selected Aleph (Srinivasan, 2003) for its
usability, the responsiveness of its user community and the existing examples of its use in MIR tasks.
We later moved on to TILDE as we wanted to perform classification. Not only is TILDE a
classification algorithm, it also benefits from a very active and responsive maintenance team
which constantly optimises its performance. The few other candidates were rejected for their
non-maintained state, lack of support or poor performance.
Aleph (used in Chapter 4) is based on inverse entailment. Inverse entailment consists in
selecting an uncovered example, saturating it (i.e. looking for all the facts that are true about
this example using the example itself and the background knowledge) to obtain a bottom clause
(the disjunction of all the facts found in the saturation phase) and searching the space of clauses
that subsume this bottom clause in a top-down manner starting from the shortest clauses. The
clause that covers the maximum number of positive examples and the minimum number of
negative examples (i.e. which maximises a score function based on the number of positive and
negative examples covered by this clause) is kept as a hypothesis. The examples covered by
this hypothesis are removed and the next uncovered example is selected to be saturated, and so
on until no uncovered example is left. Finally Aleph returns a set of hypotheses that covers all
the positive examples. Note that like most of the recent ILP systems, Aleph is able to handle
noise and imperfect data. One of the parameters the user can change is the noise level, which
is the number of negative examples that can be covered by a hypothesis.
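The outer loop described above (select an uncovered example, keep the best-scoring clause that covers it, remove what it covers, repeat) can be sketched as follows. This is a deliberately simplified greedy covering loop and not Aleph itself: saturation and the search bounded by the bottom clause are replaced by a fixed, hand-written list of candidate clauses, the score is assumed to be the number of positives minus the number of negatives covered, and all names and examples are hypothetical.

```python
def covering(positives, negatives, candidates, noise=0):
    """Greedy covering loop over a fixed set of candidate clauses."""
    uncovered = list(positives)
    theory = []
    while uncovered:
        seed = uncovered[0]
        best_name, best_score = None, None
        for name, clause in candidates.items():
            if not clause(seed):
                continue  # the clause must explain the selected example
            neg = sum(1 for e in negatives if clause(e))
            if neg > noise:
                continue  # covers more negatives than the noise level allows
            pos = sum(1 for e in uncovered if clause(e))
            if best_score is None or pos - neg > best_score:
                best_name, best_score = name, pos - neg
        if best_name is None:
            # Simplification: keep the example itself when no clause is acceptable.
            theory.append(("fact", seed))
            uncovered = uncovered[1:]
        else:
            theory.append(("rule", best_name))
            clause = candidates[best_name]
            uncovered = [e for e in uncovered if not clause(e)]
    return theory


# Hypothetical chord-degree sequences and candidate clauses.
positives = [("ii", "V", "I"), ("I", "vi", "ii", "V"), ("IV", "iv", "I")]
negatives = [("I", "IV", "V", "I"), ("I", "V", "vi", "IV")]
candidates = {
    "has_ii_V": lambda s: ("ii", "V") in zip(s, s[1:]),
    "ends_on_I": lambda s: s[-1] == "I",
    "has_minor_iv": lambda s: "iv" in s,
}

print(covering(positives, negatives, candidates, noise=0))
```

Raising `noise` to 1 makes the `ends_on_I` clause admissible even though it covers one negative example, which mirrors the role of the noise parameter described above.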
To build classification models, we use TILDE in Chapter 5. It is a first-order logic
extension of the C4.5 decision tree induction algorithm (Quinlan, 1993). Like C4.5 it is a
top-down decision tree induction algorithm. The difference is that at each node of the trees
conjunctions of literals are tested instead of attribute-value pairs. At each step the test (i.e.
conjunction of literals) resulting in the best split of the classification examples is kept. As
explained in (Blockeel and De Raedt, 1998) “the best split means that the subsets that are obtained
are as homogeneous as possible with respect to the classes of the examples”. By default TILDE uses
the information gain-ratio criterion (Quinlan, 1993) to determine the best split. TILDE builds
first-order logic decision trees expressed as ordered sets of rules (or Prolog programs). For an
example illustrating the induction of a classification tree from a set of examples covering three
musical genres, we refer the reader to Figure 5.2 in Chapter 5.
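The gain-ratio criterion used to pick the best split can be computed as in the sketch below. It follows the standard definition (information gain divided by split information) for a binary test, under our own simplifying assumptions: plain Python boolean functions stand in for TILDE's conjunctions of literals, and the labelled chord-degree sequences are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(examples, test):
    """Gain ratio of a binary test over (item, label) pairs."""
    labels = [label for _, label in examples]
    branches = {True: [], False: []}
    for item, label in examples:
        branches[bool(test(item))].append(label)
    parts = [b for b in branches.values() if b]
    if len(parts) < 2:
        return 0.0  # the test does not split the examples at all
    n = len(examples)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    return gain / split_info


# Hypothetical labelled chord-degree sequences covering three genres.
examples = [
    (("ii", "V", "I"), "jazz"),
    (("I", "vi", "ii", "V"), "jazz"),
    (("I", "V", "vi", "IV"), "pop"),
    (("vi", "IV", "I", "V"), "pop"),
    (("I", "IV", "V", "I"), "classical"),
    (("V", "I", "IV", "V"), "classical"),
]


def has_ii_V(seq):
    return ("ii", "V") in zip(seq, seq[1:])


def starts_on_I(seq):
    return seq[0] == "I"


print(gain_ratio(examples, has_ii_V), gain_ratio(examples, starts_on_I))
```

On this toy data the first test isolates one genre completely while the second leaves the class proportions unchanged in both branches, so a top-down learner would place the first test at the node.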
3.4 Relational Data Mining
Although relational rule induction was the first ILP task and is still the most common one, ILP is no
longer restricted to it. The ILP approach has been extended to most data mining tasks. For
each data mining technique using a propositional approach a relational approach using first-
order logic has been suggested and classified under the umbrella term Relational Data Mining
(Džeroski and Lavrac, 2001b). Note that there is a trade-off between the expressiveness of
first-order logic and computational complexity of the algorithms using such an approach. This
explains why these relational data mining techniques were successfully developed only recently.
Džeroski (2006) gives an overview of all the relational data mining techniques one can now
use. Among them:
• induction of relational classification rules – with the ICL software (Van Laer and De
  Raedt, 2001),
• relational classification using nearest-neighbors – with RIBL (Emde and Wettschereck,
  1996) and RIBL2 (Horváth et al., 2001; Kirsten et al., 2001),
• relational decision trees – TILDE (Blockeel and De Raedt, 1998),
• first-order random forests (Van Assche, 2008) – also implemented in TILDE,
• relational regression trees and rules – TILDE, S-CART (Kramer, 1996) and RIBL2,
• relational clustering (Kirsten et al., 2001),
• frequent pattern discovery (Dehaspe, 1999),
• discovery of relational association rules (Dehaspe and Toivonen, 1999, 2001).
3.5 Applications of ILP
3.5.1 A Tool Used in Many Disciplines
ILP has been successfully used for knowledge discovery and to build expert/reasoning systems
in various engineering and research domains. For instance it has been used to learn rules for
early diagnosis of rheumatic diseases (using examples and background knowledge provided
by an expert), to design finite element meshes (by constructing rules deciding appropriate
mesh resolution, a decision usually made by experts), to predict protein secondary structure,
to design drugs (by finding structure-activity relations of the chemical components), and to learn
diagnosis rules from qualitative models. For a detailed description of these examples we refer
the reader to (Lavrac and Džeroski, 1994) and (Bratko and Muggleton, 1995). It is also
extensively used in Natural Language Processing (Džeroski et al., 2000; Claveau et al., 2003).
3.5.2 Musical Applications of ILP
Not surprisingly, ILP and similar inductive logic approaches have also been successfully used
on musical data.
Widmer worked on identifying relevant rules of expressive performance from MIDI
recordings of W.A. Mozart’s sonatas performed by different pianists on a Bösendorfer SE290
computer-monitored grand piano (Widmer et al., 2003). Because it was not possible to
build completely discriminative models, which would mean that the artists who perform are
“perfectly consistent and predictable” (Widmer, 2001), he developed the PLCG (for Partition
Learn Cluster Generalize) algorithm, an inductive rule learning system which builds partial
models, i.e. models that explain only the examples that can be explained (Widmer, 2003).
The target was to learn local rules (i.e. for each note) concerning the tempo (accelerando or
ritardando), dynamics (crescendo or diminuendo) and articulation properties (staccato, legato or
portato) of the note. To illustrate each concept, positive examples were given to the system and
they were also used as negative examples of the competing classes. The background knowledge
was fed with descriptions of each note containing information about intrinsic properties (such
as duration and metrical position) and information about the context of the note (such as the
interval between a note and its predecessor, and the duration of surrounding notes). The
PLCG algorithm extracted 17 expressive performance rules (2 for accelerando, 4 for ritardando,
3 for crescendo, 3 for diminuendo, 4 for staccato, 1 for legato) among which some were surprising
but nevertheless relevant performance rules, such as:
“Given two notes of equal duration followed by a longer note, lengthen the note (i.e., play
it more slowly) that precedes the final, longer one, if this note is in a metrically weak
position [...]; none of the existing theories of expressive performance were aware of this
simple pattern”.
In a similar study, Dovey (1995) analysed and extracted rules from piano performances of
Rachmaninoff recorded in the 1920’s on an Ampico Recording Piano. For that he used the
PROGOL ILP system.
His work was extended by Van Baelen and De Raedt (1996) who used both Ampico recordings
and MIDI performance data analysed using the Sound Harmony Melody Rhythm and Growth
(SHMRG) model (LaRue, 1970). With additional context information (i.e. more background
knowledge containing also non-local rules) coming from the analysis of the MIDI pieces they
obtained better rules of performance regularities than Dovey’s and used them to predict the
performance of each note. These predictions were then encoded as MIDI information. A
listening analysis of these files showed that expressive performance was not modelled well
at a global level, but at a local level, some bars were actually very well interpreted by the
automatic system.
But Inductive Logic Programming has not only been employed for musical performance
analysis. Morales (1997) implemented a pattern-based first-order inductive system called
PAL to learn counterpoint rules. The system looks for patterns in the notes, described
by their pitch (including octave) and voice, using background knowledge restricted to
the classification of intervals between pairs of notes into perfect or imperfect consonant,
and dissonant, valid and invalid intervals. PAL was fed with a small number of examples of
the four counterpoint rules of the first species and was able to induce those rules automatically.
The most recent work using ILP for MIR is Ramirez’s. His first ILP-based application learns
rules in popular music harmonisation using Aleph (Ramirez, 2003). The rules were constructed
at a bar level (and not at a note level) to capture chord patterns. The structure (i.e. musical
phrases) of the songs given as examples was manually annotated, which provided the system
with a rich background knowledge containing not only local but also global information. The
system proved to be capable of inducing very simple and very general rules. But the fact that
manually annotated data is necessary limits the scalability of such a system.
Later on, Ramirez et al. (2004) studied jazz performance, this time starting from audio
examples. Monophonic recordings of jazz standards were automatically analysed, extracting
low level descriptors (instantaneous energy and fundamental frequency), performing some
note segmentation and using those results to compute note descriptors. The positive and
negative examples given to the ILP system (Aleph) were these automatically extracted note
descriptors. Ramirez et al. were interested in differences between the score indication and
the actual interpretation of a note. So they asked the system to induce rules related to the
duration transformation (lengthen, shorten or same) of a note, its onset deviation (advance,
delay, or same), its energy (soft, loud and same) and note alteration which refers to alteration
of the score melody by adding or deleting notes (consolidation, ornamentation and none). The
background knowledge was composed of information about the neighbouring notes and the
Narmour group(s), i.e. basic melodic structural units based on the Implication-Realisation
model of Narmour (1990), to which each note belongs. The tempo of the performance was
also given to the ILP system in order to study if it had an influence on the performance rules.
Some rules induced by the system turned out to have a high coverage, which confirmed the
presence of patterns in jazz expressive performance.
Finally, following Van Baelen and De Raedt’s idea, Ramirez and Hazan (2006)
implemented a framework which analyses classical violin performance by means of both
an ILP technique (the relational decision tree learner called TILDE) and a numerical
method. Another component of this system then uses these results to synthesise expressive
performances from unexpressive melody descriptions.
3.6 Conclusions
In this chapter we have provided an introduction to Inductive Logic Programming and its core
concepts through definitions and a simple example. We have also chosen and described two
ILP techniques that we will use in our experiments. We finally reviewed applications of ILP,
emphasising its use in MIR.
We have seen that harmony has already been modelled with ILP with promising results.
Additionally the numerous studies on musical performance with ILP allowed us to compare
and identify interesting practices and algorithms. Building on the experience gathered by
Ramirez et al. we will combine harmony and the ILP systems Aleph and TILDE in our own
experiments in Chapters 4 and 5.
60 | Automatic Characterisation of the Harmony of Song Sets
4.1
In this chapter we present our first attempt at describing sets of songs using harmony-based
representation and relational induction of logical rules. The starting point of this first approach
is a paper by Mauch et al. (2007) in which the authors study two distinct corpora of two distinct
genres which might still exhibit shared harmony practices. The two genres are British pop,
represented by The Beatles, and jazz, represented by a set of songs from the Real Book.
They extract the most common chord sequences of each corpus using a statistical approach. We
present here our own analysis of the exact same symbolic corpora, which in our case is entirely
based on Inductive Logic Programming, and compare the two approaches, stating how our
methodology overcomes the limitations of theirs. In Section 4.2 we explain our methodology to automatically
extract logical harmony rules from manually annotated chords. In Section 4.3 the details and
results of our automatic analysis of the Beatles and Real Book with ILP are presented. As in
the next chapters, the primary focus is on methodology and knowledge representation, rather
than on the presentation of new musical knowledge extracted by the system. However, we
qualitatively evaluate the characterisation power of our methodology by performing a short
musicological analysis of the harmony rules we automatically extracted. We conclude with
an analysis and description of the constraints and limitations of the specific Inductive Logic
Programming software used in this study, Aleph, and explain how that led us to experiment
with other knowledge representations and ILP induction techniques and software (Section
4.2 Methodology
As seen in Section 2.3.1, in search of chord idioms, Mauch et al. (2007) made an inventory
of chord sequences present in a subset of the Real Book and in The Beatles’ studio albums.
Their approach is entirely statistical and resulted in an exhaustive list of chord sequences
together with their relative frequencies. To compare the results of our relational methodology
with the results they obtained with a statistical method, we examine RDF (Resource Description
Framework) descriptions of the two manually annotated collections they use:
• Harte’s transcriptions of the 180 songs featured on the Beatles’ studio albums¹, containing
a total of 14,132 chords (Harte, 2010),
• transcriptions of 244 jazz standards from the Real Book², containing 24,409 chords
(various, 2004).
These transcriptions constitute a compact symbolic representation of the songs: the chords
are manually labelled in a jazz/pop/rock shorthand fashion (explained in more detail in Section
2.2.2) and their start and end times are provided.
The steps to extract harmony rules from these song transcriptions are summarised
as follows: first the RDF representation of the harmonic events is pre-processed and
transcribed into a logic programming format that can be understood by an Inductive Logic
Programming system. This logic programming representation is passed to the ILP software
Aleph (Srinivasan, 2003) which induces the logical harmony rules underlying the harmonic
4.2.1 Harmonic Content Description
The RDF files describing the Beatles and Real Book songs we study contain a structured
representation of the harmonic events based on the Music Ontology (Raimond et al., 2007)
as described in Section 2.2.2.
We implemented an RDF chord parser to transcribe the RDF chord representations into Prolog
files that can be directly given as input to Aleph. For each of these chords it extracts the root
note, bass note, component intervals (extracted from the additive or subtractive description
of the chord), start time and end time from the RDF description. It then computes the chord
category and degree (if key is given) of a chord and the root interval and bass interval between
two consecutive chords.
As we do not know in which octaves the root and bass notes are (since this is not relevant
to our harmony analysis), we chose to measure all intervals upwards, i.e. assuming that the
second note always has a higher pitch than the first one. For instance the interval between C
and Bb is a minor seventh (and not a downward major second). Similarly the interval between
G and C is a perfect fourth (and not a downward perfect fifth). This choice guarantees that
we consistently measure intervals and can find interval patterns in the chord sequences.
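The upward-interval convention can be illustrated with a minimal Python sketch. This is not the RDF chord parser used in this work: the function names and the interval labels (modelled on the abbreviations used in our result tables, e.g. perf4th, perfU) are assumptions made for illustration, and only single or no accidentals on plain note names are handled.

```python
LETTERS = "CDEFGAB"
NATURAL_PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
# Size in semitones of the perfect or major interval for each generic step
# (unison, 2nd, 3rd, 4th, 5th, 6th, 7th).
REFERENCE_SEMITONES = [0, 2, 4, 5, 7, 9, 11]
STEP_NAMES = ["U", "2nd", "3rd", "4th", "5th", "6th", "7th"]
PERFECT_STEPS = {0, 3, 4}  # unison, 4th and 5th have no major/minor form


def parse_note(name):
    """Return (letter index, pitch class) for a note name such as 'Bb' or 'F#'."""
    letter = name[0].upper()
    accidentals = name[1:]
    shift = accidentals.count("#") - accidentals.count("b")
    return LETTERS.index(letter), (NATURAL_PITCH_CLASS[letter] + shift) % 12


def upward_interval(first, second):
    """Name the interval from `first` up to `second`, ignoring octaves."""
    step1, pc1 = parse_note(first)
    step2, pc2 = parse_note(second)
    step = (step2 - step1) % 7        # always measured upwards
    semitones = (pc2 - pc1) % 12      # always measured upwards
    # Signed deviation from the perfect/major size of this generic interval.
    offset = (semitones - REFERENCE_SEMITONES[step] + 6) % 12 - 6
    if step in PERFECT_STEPS:
        qualities = {-1: "dim", 0: "perf", 1: "aug"}
    else:
        qualities = {-2: "dim", -1: "min", 0: "maj", 1: "aug"}
    # Doubly augmented or doubly diminished intervals raise KeyError here.
    return qualities[offset] + STEP_NAMES[step]
```

With this convention `upward_interval("C", "Bb")` is a minor seventh and `upward_interval("G", "C")` is a perfect fourth, matching the two examples given above.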
¹ these RDF files are available at beatles
² available at
For this study we limit the chord categories (or chord types) to ‘major’, ‘minor’, ‘augmented’,
‘diminished’, ‘suspended’, ‘dominant’, ‘neutral’ (when the 3rd is neither present nor
suspended) and ‘unknown’ (for every chord that does not belong to the previous categories).
For each chord, the intervals are analysed by the RDF chord parser which then assigns the
chord to one of these categories. First it reduces the chord to a 7th chord and checks if this
reduced chord is a dominant 7th, in which case the chord is labelled ‘dominant’. Otherwise
the chord is reduced to a triad and the type of this triad is kept as the chord category.
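The two-stage reduction described above can be sketched as follows. This is an illustrative approximation only: the representation of a chord as a set of degree strings relative to the root (such as "3", "b3", "b7"), the lookup table and the function name are assumptions, and the actual parser may treat omitted fifths or altered chords differently.

```python
# Degrees that survive the reduction to a 7th chord and to a triad (assumed).
SEVENTH_CHORD_DEGREES = {"2", "b3", "3", "4", "b5", "5", "#5", "bb7", "b7", "7"}
TRIAD_DEGREES = {"2", "b3", "3", "4", "b5", "5", "#5"}

TRIAD_CATEGORIES = {
    frozenset({"3", "5"}): "major",
    frozenset({"b3", "5"}): "minor",
    frozenset({"3", "#5"}): "augmented",
    frozenset({"b3", "b5"}): "diminished",
    frozenset({"4", "5"}): "suspended",
    frozenset({"2", "5"}): "suspended",
}


def chord_category(intervals):
    """Assign one of the eight chord categories to a set of chord degrees."""
    intervals = set(intervals)
    # Step 1: reduce to a 7th chord and test for a dominant 7th.
    if intervals & SEVENTH_CHORD_DEGREES == {"3", "5", "b7"}:
        return "dominant"
    # Step 2: reduce to a triad and keep the triad type.
    triad = frozenset(intervals & TRIAD_DEGREES)
    if triad in TRIAD_CATEGORIES:
        return TRIAD_CATEGORIES[triad]
    # No 3rd and no suspension (2nd or 4th) present: 'neutral'.
    if not triad & {"b3", "3", "2", "4"}:
        return "neutral"
    return "unknown"
```

For example a dominant ninth, given as `{"3", "5", "b7", "9"}`, is reduced to a dominant 7th and labelled 'dominant', whereas a major seventh chord `{"3", "5", "7"}` falls through to its triad and is labelled 'major'.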
The degrees are computed by our RDF chord parser using the current key. Key information
was added by hand when available. We only had access to tonality information for the Beatles,
so no degree details were added for the Real Book songs. For the Beatles we performed two
studies: one without degree over the whole set of songs and one with degree in which only the
songs where there was no key modulation were kept. In this second study we also filtered out
the songs which were not tonal songs (i.e. songs that were not following major or minor scales)
which yielded a remaining dataset of 73.9% of the Beatles’ songs.
Our sole interest is in sequences of chords between which there is a harmonic modification
(i.e. at least the root, bass or chord category differs from one chord to the next one). Although
harmonic rhythm is important (cf. Section 2.2.1) we do not take it into account in this work.
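As a hedged illustration of this filtering step, the sketch below keeps a chord only when its root, bass or category differs from the chord before it. The dictionary keys and the function name are hypothetical and do not reflect the actual data structures of our parser.

```python
def harmonic_changes(chords):
    """Drop chords that repeat the root, bass and category of their predecessor."""
    kept = []
    previous = None
    for chord in chords:
        signature = (chord["root"], chord["bass"], chord["category"])
        if signature != previous:
            kept.append(chord)
            previous = signature
    return kept


# Hypothetical annotated chords: the second E major only restates the first.
song = [
    {"root": "E", "bass": "E", "category": "major"},
    {"root": "E", "bass": "E", "category": "major"},
    {"root": "A", "bass": "A", "category": "major"},
    {"root": "A", "bass": "C#", "category": "major"},
]
filtered = harmonic_changes(song)
```

Durations are deliberately discarded here, which mirrors the decision not to model harmonic rhythm.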
4.2.2 Rule Induction with ILP
We restrict our focus to chord sequences of length 4 as in Mauch et al.’s study (2007). A four-
chord sequence is a typical phrase length for the studied corpora. This choice is also the result
of an empirical process: we also studied shorter sequences, but the results consist of only a few
rules (25 for the Beatles and 30 for the Real Book) with high coverage and little interest (such
as ‘2 consecutive major chords’ covering 51% of the Beatles chord sequences of length 2). For
longer sequences, the extracted patterns are less general, i.e. have a smaller coverage and thus
are less characteristic of the corpus. The concept we want to characterise is the harmony of a
set of songs e.g. all the Beatles songs, all the Real Book songs. Therefore the positive examples
given to the ILP system are all the chord sequences of length 4 (predicate chord_prog_4/4)
found in such a set of songs. These chord sequences overlap: from a chord sequence of length
n, with n ≥ 4, we extract n − 4 + 1 overlapping chord sequences of length 4. For instance the
Aleph file containing all the positive examples for the Beatles looks like this:
Where chordX_Y_Z means the Zth chord in the Yth song of the Xth album. These are Prolog
atoms which uniquely identify each chord in each song. Hence all chord sequences of length
4 starting from any position of any song from any album of The Beatles (at least all those in
our annotated corpus) are listed in this file.
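The construction of the overlapping positive examples can be sketched in a few lines of Python. The predicate name chord_prog_4/4 and the chordX_Y_Z naming scheme come from the description above; the helper function and the exact textual layout of the generated facts are assumptions.

```python
def chord_prog_4_facts(chord_ids):
    """Return one chord_prog_4/4 fact per window of 4 consecutive chords."""
    return [
        "chord_prog_4(" + ",".join(chord_ids[i:i + 4]) + ")."
        for i in range(len(chord_ids) - 4 + 1)
    ]


# A hypothetical song of n = 6 chords yields n - 4 + 1 = 3 examples.
song = [f"chord1_1_{z}" for z in range(1, 7)]
facts = chord_prog_4_facts(song)
```

A song with fewer than four chords contributes no example, since the range is then empty.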
The background knowledge is composed of the descriptions of all the chords previously
derived by the RDF chord parser. So for each of those uniquely identified chords a full
description of its attributes is stored in the background knowledge, in the following format:
This code says that the first chord of the first song of the first album of the Beatles is a chord
(first line), which is a major chord (second line), whose root note is E (third line) and bass
note is E (fourth line). That chord starts 2.612267 seconds into the song (fifth line) and it is on
the tonic, i.e. degree I (sixth line). Then the next lines describe the connections between that
first chord and the second one: there is a perfect fourth between the root notes of chord1_1_1
and chord1_1_2 (seventh line) and similarly for the bass notes (eighth line) and chord1_1_1
precedes chord1_1_2 (ninth line).
In the ILP system we use to induce harmony rules, Aleph (Srinivasan, 2003), we can either
provide negative examples of a concept (in our case, chord progressions of length 4 from
another set of songs not included in the current one) or force Aleph to explain the positive
examples using a well-designed negative example (we will refer to this mode as the one negative
example mode). In the latter case our negative example consists of the first chord sequence of
our corpus in which we exchanged the position of the first and second chords as shown below:
It is a valid negative example because in our background knowledge the position of each
uniquely identified individual chord relative to the other chords is specified, using the predicate
pred/2. So it is impossible for the second chord in the first song of the corpus (chord1_1_2) to
precede the first one (chord1_1_1). We found out that by limiting the set of negative examples
to this very simple one we obtained a more complete set of rules than when using the positive
examples only mode of Aleph which randomly generates a limited number of negative examples.
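The construction of this single negative example amounts to swapping the first two chords of the first positive example. A minimal sketch, in which the function name and the output layout are assumptions:

```python
def one_negative_example(first_positive):
    """Swap the first two chords of the first positive chord sequence."""
    first, second, *rest = first_positive
    return "chord_prog_4(" + ",".join([second, first, *rest]) + ")."


negative = one_negative_example(
    ["chord1_1_1", "chord1_1_2", "chord1_1_3", "chord1_1_4"]
)
```

The resulting sequence can never be explained by the background knowledge, because pred/2 states that chord1_1_1 precedes chord1_1_2 and not the reverse.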
To generate hypotheses Aleph uses inverse entailment (cf. Section 3.3 for more details).
It consists of selecting an uncovered example, saturating it to obtain a bottom clause and
searching the space of clauses that subsumes this bottom clause in a top-down manner starting
from the shortest clauses. The clause that is kept as a hypothesis is the one that maximises the
evaluation function, which in our case is the default Aleph evaluation function called ‘coverage’
and equal to P − N, where P and N are the number of positive and negative examples covered by
the clause. The examples covered by the found hypothesis are removed and the next uncovered
example is selected to be saturated, and so on until no uncovered example is left. Finally Aleph
returns a set of hypotheses that covers all the positive examples. The set of generated rules
depends on the order in which the examples are selected by Aleph (which is the order in which
the examples are given to Aleph). So the resulting set of rules is only one of the sets of rules that
could be induced from the set of examples. However since Aleph looks for the most general
rules at each step, the final set of rules is a sufficient description of the data (it explains all chord
sequences) and is non-redundant (no subset of the rules explains all the chord sequences). This
minimal sufficient description of a data set could be very useful for classification purposes since
only a few characteristics need to be computed to classify a new example. This is one of the
advantages of our method over the purely statistical method employed by Mauch et al.
(2007) which only computes the frequencies of each chord sequence and does not try to build
a sufficient model of the corpora.
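The covering loop described above can be illustrated with a toy Python sketch. It is a strong simplification and not Aleph's algorithm: clauses are replaced by fixed-length patterns with wildcards (None), the candidate generalisations of a seed stand in for the clauses subsuming the bottom clause, and candidates covering any negative example are discarded, which is an assumption of this sketch. All names are hypothetical.

```python
from itertools import product


def covers(pattern, example):
    """A pattern covers an example if every non-wildcard position matches."""
    return all(p is None or p == e for p, e in zip(pattern, example))


def candidates(seed):
    """All generalisations of the seed obtained by wildcarding positions."""
    for mask in product([True, False], repeat=len(seed)):
        yield tuple(value if keep else None for value, keep in zip(seed, mask))


def coverage_score(pattern, positives, negatives):
    """The 'coverage' evaluation function P - N."""
    p = sum(covers(pattern, e) for e in positives)
    n = sum(covers(pattern, e) for e in negatives)
    return p - n


def greedy_cover(positives, negatives):
    """Pick a seed, keep its best generalisation, remove what it covers, repeat."""
    uncovered = list(positives)
    rules = []
    while uncovered:
        seed = uncovered[0]  # examples are processed in the order given
        # Assumes no positive example is identical to a negative one.
        consistent = [c for c in candidates(seed)
                      if not any(covers(c, e) for e in negatives)]
        best = max(consistent,
                   key=lambda c: (coverage_score(c, positives, negatives),
                                  c.count(None)))  # ties: most general wins
        rules.append(best)
        uncovered = [e for e in uncovered if not covers(best, e)]
    return rules


positives = [("maj", "maj", "maj", "maj"),
             ("maj", "maj", "maj", "min"),
             ("min", "maj", "maj", "maj")]
negatives = [("dom", "dom", "dom", "dom")]
rules = greedy_cover(positives, negatives)
```

Because the seed is always the first uncovered example, reordering `positives` can change which rules are returned, which is the order dependence noted above.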
To obtain meaningful rules we also constrain Aleph to look for a hypothesis
explaining the chord progressions only in terms of specific root note progressions
(root_prog_4/8), bass note progressions (bassNote_prog_4/8), chord category progressions
(category_prog_4/8), root interval progressions (rootInterval_prog_3/7), bass interval
progressions (bassInterval_prog_3/7) and degree progressions (degree_prog_4/8). The
following lines of code in the background knowledge file tell Aleph that chord_prog_4/4 can
only be described with the six predicates aforementioned:
Additionally each of these six predicates is described, again in the background knowledge file,
with the predicates used to describe the individual or pairs of chords:
4.3 Experiments and Results
4.3.1 Independent Characterisation of The Beatles and Real Book Chord
We run two experiments. In the first experiment we want to characterise the chord sequences
present in the Beatles’ songs and compare them to the chord sequences present in the Real
Book songs. Therefore we extract all the chord sequences of length 4 in the Beatles’ tonal
songs with no modulation (10,096 chord sequences), all the chord sequences of length 4 in all
the Beatles’ songs (13,593 chord sequences) and all the chord sequences of length 4 from the
Real Book songs (23,677 chord sequences). Then for each of these sets of chord sequences we
induce rules characterising them using the one negative example mode in Aleph. It is important to
realise that we run our experiments on all chord sequences of each group without considering
the individual songs they are extracted from anymore.
Our system induces sets of 333 and 267 rules for the two Beatles collections (all chord
sequences in tonal songs with no modulation and all chord sequences in all songs, respectively) and a set of
646 rules for the Real Book. The positive coverage of a rule is the number of positive examples
covered by this rule. We want to consider only the patterns characteristic of the corpus, i.e.
the ones occurring in multiple songs. For that we leave out the rules with too small a coverage
(smaller than 1%). The top rules for our first experiment are shown in Tables 4.1 and 4.2. For
analysis purposes they have been re-ordered by decreasing coverage.
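The filtering and re-ordering of rules by coverage can be sketched as follows. The function name, the rule labels used as dictionary keys and the entry labelled 'rare rule' are hypothetical; the two counts and the total of 13,593 chord sequences are taken from the Beatles experiment.

```python
def characteristic_rules(rule_counts, total_examples, min_share=0.01):
    """Keep rules covering at least min_share of the examples, largest first."""
    kept = [
        (rule, count, count / total_examples)
        for rule, count in rule_counts.items()
        if count / total_examples >= min_share
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


counts = {
    "maj maj maj min": 632,
    "rare rule": 100,          # hypothetical rule below the 1% threshold
    "maj maj maj maj": 4752,
}
top = characteristic_rules(counts, total_examples=13593)
```

Here the relative coverage is simply the positive coverage divided by the number of positive examples, e.g. 632 / 13,593 ≈ 4.65%.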
For readability we show here a compact representation of the body of rules:
• degrees are represented with roman numerals,
• “/” precedes a bass note as in jazz chord notation,
• the intervals between roots (written first) or bass notes of the chords (following a “/”) are
put on top of the arrows,
• a bullet symbolises the absence of information about some characteristics of the chord.
In accordance with Mauch et al.’s conclusions (2007), some patterns extracted in these
experiments are very common pop and jazz harmonic patterns. For instance, the Beatles rule
with the highest coverage (more than a third of the chord sequences) is a sequence of 4 major
chords. The minor chord is the second most frequent chord category in the Beatles and the
dominant chord ranks quite low in the chord category rules (rule 25). For the Real Book, the
rule with the highest coverage is a sequence of three perfect fourth intervals between chord
roots. An interpretation of this rule is the very common jazz progression ii-V-I-IV. Another
common jazz chord progression, I-VI-II-V (often used as a “turnaround” in jazz), is captured
by rule 8 in Table 4.2. Moreover the dominant chord is the most frequent chord category in
the Real Book which clearly distinguishes the jazz standards of the Real Book from the pop
songs of the Beatles.
Note that due to the fact that the chord sequences overlap and due to the cyclic nature of
some of the pop and jazz songs, many rules are not independent. For instance rules 2, 3, 6 and
7 in Table 4.1 can represent the same chord sequence maj-maj-maj-min repeated several times.
Moreover we can also derive rules that make use of degree information. For this we
constrain Aleph to derive rules about the intervals between the chord roots associated with
chord category in order to capture harmonic patterns which can then be interpreted in terms
of scale degrees. The top root interval and category rules for each corpus are presented in
Tables 4.3 and 4.4. Furthermore, since we have key information for some of the Beatles songs
we can actually obtain degree rules for them and an analysis of the degree rules allows us to
Table 4.1: Beatles harmony rules whose coverage is larger than 1%. C1 and C2 represent the
positive coverage over all the Beatles songs and over the Beatles tonal songs with no modulation
respectively. “perfU” means perfect unison.
Rule                      C1              C2
1.  maj maj maj maj       4752 (35%)      3951 (39%)
2.  maj maj maj min       632 (4.65%)     431 (4.27%)
3.  min maj maj maj       628 (4.62%)     448 (4.44%)
4.  perf4th               586 (4.31%)     -
5.                        584 (4.30%)     -
6.  maj min maj maj       522 (3.84%)     384 (3.80%)
7.  maj maj min maj       494 (3.63%)     363 (3.60%)
8.                        463 (3.41%)     346 (3.43%)
9.  maj maj min min       344 (2.53%)     217 (2.15%)
10. perfU                 336 (2.47%)     237 (2.38%)
11. min min maj maj       331 (2.44%)     216 (2.14%)
12. maj min min maj       308 (2.27%)     197 (1.95%)
13. perf4th               260 (1.91%)     209 (2.07%)
14. maj2nd                251 (1.85%)     195 (1.93%)
15. /A /A /A /A           -               176 (1.74%)
16. min maj min maj       232 (1.71%)     167 (1.65%)
17. min min min min       226 (1.66%)     104 (1.03%)
18. perf4th               219 (1.61%)     146 (1.45%)
19. perf4th               216 (1.59%)     165 (1.63%)
20. maj min maj min       212 (1.56%)     157 (1.56%)
21. perf4th               211 (1.55%)     160 (1.58%)
22. min maj maj min       205 (1.51%)     132 (1.31%)
23. min min min maj       204 (1.50%)     113 (1.12%)
24. maj min min min       203 (1.49%)     119 (1.18%)
25. maj dom maj maj       200 (1.47%)     174 (1.72%)
26. maj maj dom maj       192 (1.41%)     170 (1.68%)
27. perf5th               188 (1.38%)     -
28. maj maj maj dom       187 (1.38%)     166 (1.64%)
29. dom maj maj maj       183 (1.35%)     153 (1.52%)
30. min min maj min       176 (1.29%)     86 (0.85%)
31. perfU                 172 (1.27%)     -
32. perf4th               169 (1.24%)     112 (1.11%)
33. perf4th               163 (1.20%)     152 (1.51%)
34. min maj min min       163 (1.20%)     92 (0.91%)
35. perf5th               160 (1.18%)     -
36. dom dom dom dom       147 (1.08%)     110 (1.09%)
37.                       142 (1.04%)     132 (1.31%)
38. I VIV I               -               111 (1.10%)
39. perf4th               138 (1.02%)     88 (0.87%)
40. perfU                 138 (1.01%)     100 (0.99%)
41. perf4th               135 (0.99%)     112 (1.11%)
42. perf5th               114 (0.84%)     103 (1.02%)
Table 4.2: Real Book harmony rules whose coverage is larger than 1%. C is the positive coverage.
Rule                      C
1.  perf4th               1861 (7.86%)
2.  min dom min dom       969 (4.09%)
3.  min dom maj min       727 (3.07%)
4.  dom min dom min       726 (3.07%)
5.  min min min min       708 (2.99%)
6.  dom dom dom dom       674 (2.85%)
7.  perf4th               615 (2.60%)
8.  maj6th                611 (2.58%)
9.  perf4th               608 (2.57%)
10. dom min dom maj       594 (2.51%)
11. dom maj min dom       586 (2.47%)
12. perf4th               579 (2.45%)
13.                       547 (2.31%)
14. maj min dom maj       478 (2.02%)
15.                       477 (2.01%)
16.                       440 (1.86%)
17. perf4th               436 (1.84%)
18. min dom maj dom       424 (1.79%)
19. min min dom maj       413 (1.74%)
20. perfU