Thesis

Harmonic analysis of music using combinatory categorial grammar


Abstract

Various patterns of the organization of Western tonal music exhibit hierarchical structure, among them the harmonic progressions underlying melodies and the metre underlying rhythmic patterns. Recognizing these structures is an important part of unconscious human cognitive processing of music. Since the prosody and syntax of natural languages are commonly analysed with similar hierarchical structures, it is reasonable to expect that the techniques used to identify these structures automatically in natural language might also be applied to the automatic interpretation of music. In natural language processing (NLP), analysing the syntactic structure of a sentence is a prerequisite for semantic interpretation. The analysis is made difficult by the high degree of ambiguity in even moderately long sentences. In music, a similar sort of structural analysis, with a similar degree of ambiguity, is fundamental to tasks such as key identification and score transcription. These and other tasks depend on harmonic and rhythmic analyses. There is a long history of applying linguistic analysis techniques to musical analysis. In recent years, statistical modelling, in particular in the form of probabilistic models, has become ubiquitous in NLP for large-scale practical analysis of language. The focus of the present work is the application of statistical parsing to automatic harmonic analysis of music. This thesis demonstrates that statistical parsing techniques, adapted from NLP with little modification, can be successfully applied to recovering the harmonic structure underlying music. It shows first how a type of formal grammar based on one used for linguistic syntactic processing, Combinatory Categorial Grammar (CCG), can be used to analyse the hierarchical structure of chord sequences. I introduce a formal language similar to first-order predicate logic to express the hierarchical tonal harmonic relationships between chords.
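To make the idea of mapping a chord sequence onto a structured analysis concrete, here is a minimal toy sketch in Python. It assumes a hypothetical two-entry lexicon (any chord can stand alone as an atomic category; a chord symbol ending in "7" may also act as a dominant that forward-applies to a following constituent whose first chord lies a perfect fifth below) and uses only CCG forward application inside a CKY-style chart. The `Atom`/`Fwd` classes, the `dom` predicate and the chord-symbol parsing are illustrative assumptions, not the thesis's actual lexicon or logical-form language.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union


@dataclass(frozen=True)
class Atom:
    """Atomic category: a constituent whose first chord has this root."""
    root: int


@dataclass(frozen=True)
class Fwd:
    """Forward-slash functor result/arg: expects arg immediately to its right."""
    result: "Cat"
    arg: "Cat"


Cat = Union[Atom, Fwd]

PITCH = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def root_of(symbol: str) -> int:
    """Root pitch class of a chord symbol such as 'C', 'Bb7' or 'F#'."""
    pc = PITCH[symbol[0]]
    if symbol[1:2] == "#":
        pc += 1
    elif symbol[1:2] == "b":
        pc -= 1
    return pc % 12


def lexicon(symbol: str):
    """Hypothetical lexical entries as (category, logical form) pairs."""
    r = root_of(symbol)
    entries = [(Atom(r), ("chord", r))]
    if symbol.endswith("7"):
        target = (r + 5) % 12  # root a perfect fifth below
        entries.append((Fwd(Atom(r), Atom(target)),
                        lambda x, r=r: ("dom", r, x)))
    return entries


def forward_apply(left, right):
    """X/Y  Y  =>  X, applying the functor's logical form to the argument's."""
    (lcat, lsem), (rcat, rsem) = left, right
    if isinstance(lcat, Fwd) and lcat.arg == rcat:
        return (lcat.result, lsem(rsem))
    return None


def parse(symbols):
    """CKY-style chart over spans; returns logical forms of complete analyses."""
    n = len(symbols)
    if n == 0:
        return []
    chart = {(i, i + 1): lexicon(s) for i, s in enumerate(symbols)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = []
            for k in range(i + 1, j):
                for left, right in product(chart[(i, k)], chart[(k, j)]):
                    combined = forward_apply(left, right)
                    if combined is not None:
                        cell.append(combined)
            chart[(i, j)] = cell
    return [sem for cat, sem in chart[(0, n)] if isinstance(cat, Atom)]


print(parse(["D7", "G7", "C"]))  # [('dom', 2, ('dom', 7, ('chord', 0)))]
```

Under this toy lexicon, a dominant chain such as D7 G7 C receives a single right-branching analysis, while C G7 receives none, because no rule lets an atomic category combine with what follows it.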
The syntactic grammar formalism then serves as a mechanism to map an unstructured chord sequence onto its structured analysis. In NLP, the high degree of ambiguity of the analysis means that a parser must consider a huge number of possible structures. Chart parsing provides an efficient mechanism to explore them. Statistical models allow the parser to use information about structures seen before in a training corpus to eliminate improbable interpretations early on in the process and to rank the final analyses by plausibility. To apply the same techniques to harmonic analysis of chord sequences, a corpus of tonal jazz chord sequences annotated by hand with harmonic analyses is constructed. Two statistical parsing techniques are adapted to the present task and evaluated on their success at recovering the annotated structures. The experiments show that parsing using a statistical model of syntactic derivations is more successful than a Markovian baseline model at recovering harmonic structure. In addition, the practical technique of statistical supertagging serves to speed up parsing without any loss in accuracy. This approach to recovering harmonic structure can be extended to the analysis of performance data symbolically represented as notes. Experiments using some simple proof-of-concept extensions of the above parsing models demonstrate one probabilistic approach to this. The results reported provide a baseline for future work on the task of harmonic analysis of performances.
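The supertagging step can be sketched as beta pruning over per-chord category distributions, in the style of adaptive supertagging for CCG parsing in NLP: keep every category whose probability is within a factor beta of the best one for that chord, and loosen beta only if the parser fails. The category labels, the probabilities and the `try_parse` callback below are hypothetical placeholders; the thesis's actual supertagger model and beta values may differ.

```python
from typing import Dict, List


def supertag(probs: List[Dict[str, float]], beta: float) -> List[List[str]]:
    """For each chord, keep categories with p >= beta * (best p for that chord),
    most probable first."""
    tagged = []
    for dist in probs:
        best = max(dist.values())
        ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
        tagged.append([cat for cat, p in ranked if p >= beta * best])
    return tagged


def adaptive_parse(probs, betas, try_parse):
    """Try the tightest beta first; move to a looser beta only if parsing fails.
    try_parse is a hypothetical parser callback returning None on failure."""
    for beta in betas:
        result = try_parse(supertag(probs, beta))
        if result is not None:
            return beta, result
    return None, None


# Hypothetical category distributions for a two-chord sequence.
probs = [{"D/T": 0.70, "T": 0.25, "S/T": 0.05},
         {"T": 0.90, "D/T": 0.10}]

print(supertag(probs, beta=0.1))  # [['D/T', 'T'], ['T', 'D/T']]
print(supertag(probs, beta=0.5))  # [['D/T'], ['T']]
```

A tight beta gives the parser fewer categories per chord and hence a smaller chart, which is where the speed-up comes from; the backoff loop keeps accuracy from suffering when the pruned set is too small.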


... In this paper, we propose an innovative, rule-based model for automatic chord labelling that considers both the musical surface and figured bass. Compared to existing methods that only considered the musical surface [5,6,9,13,15,18,19], the advantages of our approach are: ... (Figure 1: The first measures of BWV 33.06 "Allein zu dir, Herr Jesu Christ" from our Bach Chorale Figured Bass (BCFB) dataset. FBAs and chord labels are shown below the bass line.) ...
... In this paper, we apply our automatic chord labelling model to the Bach Chorale Figured Bass (BCFB) dataset [14], a corpus we constructed containing FBAs in MusicXML, **kern, and MEI (Music Encoding Initiative) formats (see footnote 9). We chose this repertoire because of its key role in modern music pedagogy and its general historical importance. It consists of all 139 Johann Sebastian Bach chorales that include figured bass written by Bach himself, based on the Neue Bach Ausgabe (NBA) [3], the most up-to-date scholarly critical edition. ...
... Other types of NCTs, such as passing tones or neighbor tones, are implied by the absence of the corresponding notes in FBAs. Footnote 7: In this paper, we consider each slice with figures to represent an individual chord. Footnote 8: Although figured bass is primarily a notation for performance rather than a strict prescription of harmony, it nonetheless provides a useful description of harmony, as acknowledged by others [2,7]. Footnote 9: Available at: https://github.com/juyaolongpaul/Bach_chorale_FB. ...
... Computational systems developed for harmonic analysis and/or harmonic generation (e.g. melodic harmonisation) rely on chord labelling schemes that are relevant to and characteristic of particular idioms [7,10,20,21,26]. There exist various typologies for encoding note simultaneities that embody different levels of harmonic information/abstraction and cover different harmonic idioms. ...
... For instance, in a diatonic major context, while c1 = [0, [0, 4, 7]] and c2 = [0, [0, 4, 7, 10]] fulfil criteria 1 and 2, according to criterion 3 they are not grouped together, since c2 includes the interval 10, which maps to the non-diatonic pitch class 10. In a major context, [0, [0, 4, 7, 10]] is the secondary dominant of IV (V/IV) and is differentiated from the I major chord. Each GCT group includes the GCT types that satisfy the three criteria above. ...
... half-diminished chords [0,3,6,10] were labelled as minor chords with added sixth [0,3,7,9]; e.g. [B, D, F, A] was re-ordered as [D, F, A, B]. ...
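The relabelling described above amounts to re-expressing the same pitch-class set relative to a different root. The sketch below assumes a simplified GCT-like form `[root, intervals]` with intervals sorted upwards from the root (the full GCT orders and extends chord types more carefully) and checks criterion 3 by testing whether every chord tone falls in the major scale of a tonic at pitch class 0. The function names are illustrative.

```python
MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}


def relative_to_root(pcs, root):
    """Simplified [root, intervals] form: intervals in semitones above the root."""
    return [root % 12, sorted((p - root) % 12 for p in set(pcs))]


def is_diatonic(chord, scale=MAJOR_SCALE, tonic=0):
    """True if every chord tone lies in the scale built on the tonic."""
    root, intervals = chord
    return all((root + i - tonic) % 12 in scale for i in intervals)


b_half_dim = [11, 2, 5, 9]  # B, D, F, A
print(relative_to_root(b_half_dim, 11))  # [11, [0, 3, 6, 10]]
print(relative_to_root(b_half_dim, 2))   # [2, [0, 3, 7, 9]]
print(is_diatonic([0, [0, 4, 7]]), is_diatonic([0, [0, 4, 7, 10]]))  # True False
```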
Conference Paper
The General Chord Type (GCT) representation is appropriate for encoding tone simultaneities in any harmonic context (such as tonal, modal, jazz, octatonic, atonal). The GCT allows the rearrangement of the notes of a harmonic sonority such that abstract idiom-specific types of chords may be derived. This encoding is inspired by the standard roman numeral chord type labelling and is, therefore, ideal for hierarchic harmonic systems such as the tonal system and its many variations; at the same time, it adjusts to any other harmonic system such as post-tonal, atonal music, or traditional polyphonic systems. In this paper the descriptive potential of the GCT is assessed in the tonal idiom by comparing GCT harmonic labels with human expert annotations (Kostka & Payne harmonic dataset). Additionally, novel methods for grouping and clustering chords, according to their GCT encoding and their functional role in chord sequences, are introduced. The results of both harmonic labelling and functional clustering indicate that the GCT representation constitutes a suitable scheme for effectively representing harmony in computational systems.
... In recent years, many melodic harmonisation systems have been developed: some rule-based [4,5], some evolutionary approaches that use rule-based fitness evaluation [6,7], and others relying on machine learning techniques such as probabilistic approaches [8,9] and neural networks [10], on grammars [11], or on hybrid systems (e.g. [12]). ...
... [12]). Almost all of these systems model aspects of tonal harmony, from "standard" Bach-like chorale harmonisation ([4,10] among many others) to tonal systems such as "classic" jazz or pop ([9,11] among others). These systems aim to produce harmonizations of melodies that reflect the style of the idiom in question, which they pursue by using chords and chord annotations that are characteristic of the idiom. ...
... For instance, the algorithm gives two possible encodings for a [0,2,5,9] pc-set, namely a minor seventh chord or a major chord with added sixth (see Table 1, example 2); such ambiguity may be resolved if tonal context is taken into account. For the [0,3,4,7] pc-set with root 0, the algorithm produces two answers, namely, a ... [remainder of sentence and Table 1 content lost in extraction] ... harmonic analysis, but rather to find a practical and efficient encoding for tone simultaneities (to be used, for instance, in statistical learning and automatic harmonic generation; see end of Section 4), we decided to extend the algorithm so that it always reaches a single chord type for each simultaneity (no ambiguity). ...
Conference Paper
In this paper we focus on issues of harmonic representation and computational analysis. A new idiom-independent representation of chord types is proposed that is appropriate for encoding tone simultaneities in any harmonic context (such as tonal, modal, jazz, octatonic, atonal). The General Chord Type (GCT) representation allows the rearrangement of the notes of a harmonic simultaneity such that abstract idiom-specific types of chords may be derived; this encoding is inspired by the standard roman numeral chord type labeling, but is more general and flexible. Given a consonance-dissonance classification of intervals (that reflects culturally-dependent notions of consonance/dissonance) and a scale, the GCT algorithm finds the maximal subset of notes of a given note simultaneity that contains only consonant intervals; this maximal subset forms the base upon which the chord type is built. The proposed representation is ideal for hierarchic harmonic systems such as the tonal system and its many variations, but adjusts to any other harmonic system such as post-tonal, atonal music, or traditional polyphonic systems. The GCT representation is applied to a small set of examples from diverse musical idioms, and its output is illustrated and analysed, showing its potential especially for computational music analysis and music information retrieval.
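A minimal sketch of the first GCT step described above (finding the maximal subsets of a simultaneity whose pairwise intervals are all consonant), assuming a tonal consonance set of unison/octave, minor and major thirds and sixths, perfect fourth and perfect fifth. Ordering the subset, choosing the root and adding the remaining notes as extensions are omitted, so this is not the complete GCT algorithm.

```python
from itertools import combinations

# Assumed tonal consonance set (intervals in semitones, mod 12).
TONAL_CONSONANT = frozenset({0, 3, 4, 5, 7, 8, 9})


def maximal_consonant_subsets(pcs, consonant=TONAL_CONSONANT):
    """All largest subsets of the pitch classes whose pairwise intervals are consonant."""
    notes = sorted({p % 12 for p in pcs})
    for size in range(len(notes), 0, -1):
        found = [list(sub) for sub in combinations(notes, size)
                 if all((b - a) % 12 in consonant
                        for a, b in combinations(sub, 2))]
        if found:
            return found
    return []


print(maximal_consonant_subsets([0, 4, 7, 10]))  # [[0, 4, 7]]
print(maximal_consonant_subsets([0, 2, 5, 9]))   # [[0, 5, 9], [2, 5, 9]]
```

With this consonance set, [0,2,5,9] yields two maximal subsets, [0,5,9] and [2,5,9], which is the kind of ambiguity (major chord with added sixth versus minor seventh chord) that the GCT authors resolve by extending the algorithm.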
... Even though shared structural commonalities can be identified within one or across a set of several musical pieces, these, ... Footnote 2: While the task of defining and finding similarities between music and language is complex, notable attempts have been made in this area. For instance, a previous, rather extensive publication (Granroth-Wilding, 2013) demonstrates the application of Combinatory Categorial Grammar (CCG) to analyze the hierarchical structure of chord sequences. This approach introduces a formal language, similar to first-order predicate logic, to express the tonal harmonic relationships between chords, serving as a mechanism to map unstructured chord sequences into structured analyses. ...
Conference Paper
This study introduces the novel concepts of Explicit and Implicit Musical Parameters (EMPs and IMPs) and demonstrates their application in digital musicology. Furthermore, it discusses the concept of 'musical words', which proposes representing explicit and implicit musical parameters as words or textual entities. This 'music-to-text' approach allows advanced techniques and tools commonly used in computational linguistics to be applied to the analysis of musical data, highlighting the structural parallels between music and language. Lastly, the findings of this paper not only illustrate the feasibility of this approach but also pave the way for further interdisciplinary studies and for user-friendly analytical tools applicable in both computational linguistics and digital musicology.
... Alternatively, one can interpret the words in the sentence, as well as the resulting expression, set-theoretically: it describes a set of situations in which Kim walked and fed the dog. This model is technically a grammar: there are proof mechanisms to show that a given collection of tokens is a valid sentence [8], parts of speech can be assigned according to the types of the words, and it can be used to show how individual parts make up a greater whole. However, it is more commonly studied by semanticists than by syntacticians, and it is the foundation of much research in natural language meaning. ...
Conference Paper
Like natural language, music can be described as being composed of various parts, which combine together to form a set-theoretic or logical entity. The conceptualized parts are more basic than the music seen on a page; they are the musical objects subject to music-theoretic analysis, and can be described using the language of functional programming and lambda calculus. This paper introduces the types of musical objects seen in tonal and modern music, as well as the combinators that allow them to combine to create other musical objects. We propose a method for automatically generating melodies by searching for combinations of musical objects which together produce a valid program corresponding to a melody or set of melodies.
... More recent years have seen developments in computational linguistics, especially probabilistic grammars, filtering back into musicological work. For example, Steedman's chord grammar has inspired a number of researchers to apply more sophisticated grammar formalisms to more general harmonic models (Granroth-Wilding, 2013; Granroth-Wilding and Steedman, 2012; Rohrmeier, 2006, 2011; Steedman, 2003). ...
Chapter
Recent developments in computational linguistics offer ways to approach the analysis of musical structure by inducing probabilistic models (in the form of grammars) over a corpus of music. These can produce idiomatic sentences from a probabilistic model of the musical language and thus offer explanations of the musical structures they model. This chapter surveys historical and current work in musical analysis using grammars, based on computational linguistic approaches. We outline the theory of probabilistic grammars and illustrate their implementation in Prolog using PRISM. Our experiments on learning the probabilities for simple grammars from pitch sequences in two kinds of symbolic musical corpora are summarized. The results support our claim that probabilistic grammars are a promising framework for computational music analysis, but also indicate that further work is required to establish their superiority over Markov models.
... From the computational point of view, the various aspects of musical analysis have all been addressed since the 1960s (Forte, 1967; Rothgeb, 1968; Winograd, 1968), and there has been a sustained interest in the area up to the present day. In the last few years, several theses (bachelor, master and Ph.D.) have been published from this point of view (de Haas, 2012; Granroth-Wilding, 2013; Mearns, 2013; Sapp, 2011; Tracy, 2013; Willingham, 2013), which underlines the importance of this area of study. ...
Chapter
In a harmonic analysis task, melodic analysis determines the importance and role of each note in a particular harmonic context. Thus, a note is classified as a harmonic tone when it belongs to the underlying chord, and as a non-harmonic tone otherwise, with a number of categories in this latter case. Automatic systems for fully solving this task without errors are still far from being available, so it must be assumed that, in a practical scenario in which the melodic analysis is the system’s final output, the human expert must make corrections to the output in order to achieve the final result. Interactive systems allow for turning the user into a source of high-quality and high-confidence ground-truth data, so online machine learning and interactive pattern recognition provide tools that have proven to be very convenient in this context. Experimental evidence will be presented showing that this seems to be a suitable way to approach melodic analysis.
... relating to tonicisations/modulations/phrase endings) are represented using simple probabilistic grammars that capture harmonic dependencies among distant events. There is an extensive literature on techniques that use grammars for harmonization, among them [21,22,23,24]; however, the grammatical rules employed are based on theoretical considerations of specific idioms, whereas the approach followed in COINVENT applies a probabilistic context. Additionally, efficient voice leading is also tackled through a statistical learning technique, which encapsulates statistical information about pitch height contour relations between the constituent pitches of chords. ...
Conference Paper
Conceptual blending is a cognitive theory whereby elements from diverse, but structurally-related, mental spaces are "blended", giving rise to new conceptual spaces that often possess new powerful interpretative properties, allowing better understanding of known concepts or the emergence of novel concepts altogether. This paper provides an overview of the wide computational methodological spectrum that is being developed towards building an automatic melodic harmonization system that employs conceptual blending, yielding harmonizations that inherit characteristics from multiple idioms. Examples of conceptual blending in harmony are presented that exhibit the effectiveness of the developed model in inventing novel harmonic concepts. These examples discuss the invention of well-known jazz cadences through blending the underlying concepts of classical music cadences, as well as the construction of larger chord sequences. Furthermore, examples of a conceptual blending interpretation in human compositions that motivated the goals of the system's design are given. Finally, conceptual blending between harmonic and non-harmonic domains is discussed, offering tools that allow for intuitive human intervention in the harmonization process.
... It seems to us that inductive approaches like ours still use simplistic representations of harmony, while the neighbouring domain of computational models of harmony applies its rich expert models to small hand-picked examples. Examples of such rich models can be found in the seminal work of Steedman (1984) as well as in more recent studies (Bergeron, 2010; de Haas, 2012; Mearns, 2013; Granroth-Wilding, 2013). A long-term plan would be to merge the strong points of both these branches of research. ...
Thesis
Harmony is the aspect of music concerned with the structure, progression, and relation of chords. In Western tonal music each period had different rules and practices of harmony. Similarly, some composers and musicians are recognised for their characteristic harmonic patterns, which differ from the chord sequences used by other musicians of the same period or genre. This thesis is concerned with the automatic induction of the harmony rules and patterns underlying a genre, a composer, or more generally a ‘style’. Many of the existing approaches for music classification or pattern extraction make use of statistical methods which present several limitations. Typically they are black boxes, cannot be fed with background knowledge, do not take into account the intricate temporal dimension of the musical data, and ignore rare but informative events. To overcome these limitations we adopt first-order logic representations of chord sequences and Inductive Logic Programming techniques to infer models of style. We introduce a fixed-length representation of chord sequences similar to n-grams but based on first-order logic, and use it to characterise symbolic corpora of pop and jazz music. We extend our knowledge representation scheme using context-free definite-clause grammars, which support chord sequences of any length and allow ornamental chords to be skipped, and test it on genre classification problems, on both symbolic and audio data. Through these experiments we also compare various chord and harmony characteristics, such as degree, root note, intervals between root notes and chord labels, and assess their characterisation and classification accuracy, expressiveness, and computational cost. Moreover, we extend a state-of-the-art genre classifier based on low-level audio features with such harmony-based models and show that this can lead to statistically significant classification improvements.
We show that our logic-based modelling approach not only can compete with and improve on statistical approaches but also provides expressive, transparent and musicologically meaningful models of harmony, which makes it suitable for knowledge discovery purposes.
Article
Repetition and structure have a significant place in music theory, but the structure hierarchy and its influences are often ignored in both music analysis and music generation. In this article, we first describe novel algorithms based on repetition to extract music structure hierarchy from a MIDI data set of popular music and show its effectiveness through evaluation. Then, we introduce new data-driven approaches to estimate and validate structural influences in music. Results show that the automatically detected hierarchical repetition structures reveal significant interactions between structure and harmony, melody, rhythm, and predictivity. Different levels of hierarchy interact differently, providing evidence that structural hierarchy plays an important role in our popular music data set beyond simple notions of repetition or similarity. We further study how musical structure has evolved over decades of popular music writing. Finally, we discuss the importance of this work in highlighting roles that structure can play in music analysis, music similarity, music generation, music evaluation, and other music information retrieval tasks.
Chapter
We introduce a novel perspective on set-class analysis combining the DFT magnitudes with the music visualisation technique of wavescapes. With such a combination, we create a visual representation of a piece’s multidimensional qualia, where different colours indicate saliency in chromaticity, diadicity, triadicity, octatonicity, diatonicity, and whole-tone quality. At the centre of our methods are: 1) the formal definition of the Fourier Qualia Space (FQS), 2) its particular ordering of DFT coefficients that delineate regions linked to different musical aesthetics, and 3) the mapping of such regions into a coloured wavescape. Furthermore, we demonstrate the intrinsic capability of the FQS to express qualia ambiguity and map it into a synopsis wavescape. Finally, we showcase the application of our methods by presenting a few analytical remarks on Bach’s Three-part Invention BWV 795, Debussy’s Reflets dans l’eau, and Webern’s Four Pieces for Violin and Piano, Op. 7, No. 1, unveiling increasingly ambiguous wavescapes.
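The DFT magnitudes that the Fourier Qualia Space builds on can be computed directly from a 12-dimensional pitch-class vector. This sketch covers only coefficients 1 to 6, labelled with the qualia names listed above; the FQS ordering, the ambiguity measure and the wavescape colour mapping are not reproduced. The function and dictionary names are illustrative.

```python
import cmath

QUALIA = {1: "chromaticity", 2: "diadicity", 3: "triadicity",
          4: "octatonicity", 5: "diatonicity", 6: "whole-tone quality"}


def dft_magnitudes(pcs):
    """|F_k| for k = 1..6 of the pitch-class vector of a multiset of pitch classes."""
    vec = [0.0] * 12
    for p in pcs:
        vec[p % 12] += 1.0
    return {k: abs(sum(vec[n] * cmath.exp(-2j * cmath.pi * k * n / 12)
                       for n in range(12)))
            for k in range(1, 7)}


for k, mag in dft_magnitudes([0, 2, 4, 6, 8, 10]).items():
    print(f"{QUALIA[k]:>18}: {mag:.3f}")
```

For the whole-tone scale this prints 0.000 for coefficients 1 to 5 and 6.000 for the whole-tone coefficient, since that set is maximally even with respect to period 2 semitones.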
Article
In order to reveal normative prototypes undergirding various formal sections, this article introduces the ‘Anchoring vi Schema’: a medium-length major-mode passage (typically eight or 16 bars) that initiates on an unambiguous hypermetric downbeat (for example, the beginning of a verse or chorus). The Anchoring vi Schema must begin with tonic harmony and deploy submediant harmony at its midpoint – the second most hypermetrically strong beat. The identification of the Anchoring vi Schema enables closer readings of phrase expansion and deletion in popular music. A comparison of the common harmonies used in eight- and 16-measure passages reveals some striking similarities, particularly in terms of where tonic and subdominant chords are likely to occur. Although the endings of formal sections can take a variety of paths – including arriving at various tonal goals within a range of possible times – hypermetrically accented beginnings and midpoints show greater consistency in their organisation.
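As a reading aid, the schema's two defining constraints can be expressed as a check on a passage given as one Roman-numeral label per bar. The per-bar labelling and the restriction to 8- or 16-bar passages (the abstract says "typically") are assumptions of this sketch, and `is_anchoring_vi` is a hypothetical name.

```python
def is_anchoring_vi(bars, lengths=(8, 16)):
    """bars: one Roman numeral per bar of a major-mode passage."""
    if len(bars) not in lengths:
        return False
    midpoint = len(bars) // 2  # bar 5 of 8, bar 9 of 16
    return bars[0] == "I" and bars[midpoint] == "vi"


print(is_anchoring_vi(["I", "V", "IV", "I", "vi", "IV", "V", "I"]))  # True
print(is_anchoring_vi(["I", "V", "IV", "I", "IV", "vi", "V", "I"]))  # False
```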
Thesis
The main objective of the master's thesis has been to examine how the entropy of harmony affects the auditory acceptance or rejection of a musical composition, and thereby to contribute to further research on entropy in music, especially on the entropy of harmony. A total of 160 musical examples have been evaluated in terms of four main attributes: difficulty in listening to a musical example, the impact of pleasantness of the musical example, recognition of the musical example, and repeatability. The analysis of the 160 musical examples has also answered the following questions: a) whether there is a relationship between symmetry of the harmony and pleasantness in listening to a musical example, b) whether there is a relationship between symmetry in the use of chords in harmony and difficulty in listening to a musical example, c) whether there is a difference in symmetry, from the point of view of the use of chords in harmony, between favourite and less favourite musical examples, and d) whether the entropy of harmony can predict the hearing acceptance of a musical composition. KEY WORDS: entropy, harmony, symmetry, auditory acceptance, predicting hearing acceptability.
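The abstract does not specify how the entropy of harmony is computed; one common choice, assumed here, is the Shannon entropy (in bits) of the relative frequencies of chord labels in a piece. `harmonic_entropy` is a hypothetical name for this sketch.

```python
from collections import Counter
from math import log2


def harmonic_entropy(chords):
    """Shannon entropy (bits) of the chord-label distribution of a sequence."""
    counts = Counter(chords)
    total = len(chords)
    return -sum((c / total) * log2(c / total) for c in counts.values())


print(harmonic_entropy(["I", "IV", "V", "I"]))  # 1.5
```

A piece that uses one chord throughout scores 0 bits; the more evenly a piece spreads over many chord labels, the higher the score.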
Article
Automatic Music Transcription is the extraction of an acceptable notation from performed music. One important task in this problem is rhythm quantization, which refers to the categorization of note durations. Although quantization of a purely mechanical performance is rather straightforward, the task becomes increasingly difficult in the presence of musical expression, i.e. systematic variations in the timing of notes and in tempo. For transcription of natural performances, we employ a framework based on Bayesian statistics. Expressive deviations are modelled by a probabilistic performance model from which the corresponding optimal quantizer is derived by Bayes' theorem. We demonstrate that many different quantization schemata can be derived in this framework by proposing suitable prior and likelihood distributions.
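A minimal sketch of Bayesian quantization, assuming durations already normalized to beats (tempo known), a hypothetical prior over a few score durations, and a Gaussian likelihood with fixed standard deviation `sigma`. The quantizer returns the score duration with the highest posterior, which by Bayes' theorem is proportional to prior times likelihood. This is an illustration of the idea, not the paper's performance model.

```python
from math import log

# Hypothetical prior over score durations, in beats.
PRIOR = {0.25: 0.20, 1 / 3: 0.10, 0.5: 0.35, 1.0: 0.35}


def quantize(observed, prior=PRIOR, sigma=0.05):
    """MAP score duration for an observed duration (in beats)."""
    def log_posterior(c):
        # log prior + Gaussian log-likelihood (terms constant in c dropped)
        return log(prior[c]) - (observed - c) ** 2 / (2 * sigma ** 2)
    return max(prior, key=log_posterior)


print(quantize(0.52))  # 0.5
print(quantize(0.30))  # 0.25
```

Note that 0.30 is numerically closer to a triplet value (1/3) than to 0.25, but the prior tips the decision towards 0.25; with a tighter likelihood (`sigma=0.01`), an input of 0.33 is quantized to 1/3.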
Conference Paper
We present novel metrics for parse evaluation in joint segmentation and parsing scenarios where the gold sequence of terminals is not known in advance. The protocol uses distance-based metrics defined for the space of trees over lattices. Our metrics allow us to precisely quantify the performance gap between non-realistic parsing scenarios (assuming gold segmented and tagged input) and realistic ones (not assuming gold segmentation and tags). Our evaluation of segmentation and parsing for Modern Hebrew sheds new light on the performance of the best parsing systems to date in the different scenarios.
Article
Ever since infancy, we humans have shown great perceptual sensitivity to the melodic, rhythmic and dynamic aspects of both speech and music. As far as we now know, this is a uniquely human predisposition for perceiving, interpreting and appreciating music, even before a word has been spoken, or even conceived. Musical listening is steeped in this preverbal and preliterate stage. Music plays in an intriguing way with our hearing, our memory, our emotions and our expectations. As listeners we are often unaware of it, but we play an active role in what makes music suspenseful, comforting or thrilling, because listening takes place not in the outer world of sounding music but in the silent inner world of our head and our brain.
Conference Paper
Automated harmonic analysis is an important and interesting music research topic. Although many researchers have studied solutions to this problem, there is no comprehensive and systematic comparison of the many techniques proposed. In this paper we present Rameau, a framework for automatic harmonic analysis we are developing. With Rameau we are able to reimplement and analyze previous techniques, and develop new ones as well. We present a performance evaluation of ten algorithms on a corpus of 140 Bach chorales. We also evaluate four of them using precision and recall and discuss possible improvements. We also present a numeric codification for tonal music with interesting properties, such as easy transposition, preservation of enharmonic information and easy conversion to standard pitch-class notation.
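The abstract does not spell out Rameau's codification, so the sketch below illustrates the same three properties with a different, well-known scheme: the line of fifths, where each spelled pitch is an integer (C = 0, G = 1, F = -1, each sharp +7, each flat -7). Transposition is integer addition, enharmonic spellings such as C# and Db stay distinct, and the pitch class is `(7 * pos) % 12`. The function names and the interval table are illustrative.

```python
LETTERS = "FCGDAEB"  # letter names in fifths order; F sits at position -1


def encode(name):
    """Spelled pitch such as 'C#', 'Db' or 'Bbb' -> position on the line of fifths."""
    pos = LETTERS.index(name[0]) - 1
    for acc in name[1:]:
        if acc == "#":
            pos += 7
        elif acc == "b":
            pos -= 7
        else:
            raise ValueError(f"unexpected accidental {acc!r} in {name!r}")
    return pos


def decode(pos):
    """Inverse of encode: position on the line of fifths -> spelled pitch."""
    letter, shift = LETTERS[(pos + 1) % 7], (pos + 1) // 7
    return letter + ("#" * shift if shift > 0 else "b" * -shift)


def pitch_class(pos):
    """Conversion to standard pitch-class notation (C = 0)."""
    return (7 * pos) % 12


# Spelled intervals as steps along the line of fifths.
INTERVALS = {"P5": 1, "M2": 2, "M3": 4, "m3": -3, "A2": 9, "m2": -5}


def transpose(pos, interval):
    return pos + INTERVALS[interval]


print(decode(transpose(encode("Eb"), "M3")))  # G
print(decode(transpose(encode("C"), "A2")), decode(transpose(encode("C"), "m3")))  # D# Eb
```

The last line shows the enharmonic property: an augmented second and a minor third span the same number of semitones but yield differently spelled results.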
Article
We argue for a memory-based approach to music analysis which works with concrete musical experiences rather than with abstract rules or principles. New pieces of music are analyzed by combining fragments from structures of previously encountered pieces. The occurrence frequencies of the fragments are used to determine the preferred analysis of a piece. We test some instances of this approach against a set of 1,000 manually annotated folksongs from the Essen Folksong Collection, yielding up to 85.9% phrase accuracy. A qualitative analysis of our results indicates that there are grouping phenomena that challenge the commonly accepted Gestalt principles of proximity, similarity and parallelism. Nor can these grouping phenomena be explained by other musical factors, such as meter and harmony. We argue that music perception may be much more memory-based than previously assumed.
Article
This chapter investigates the role of the computer in musical analysis. Starting with a survey of the main approaches in computer analysis, we focus on the particular problem of harmonic analysis of jazz chord sequences. We propose a theory of chord sequence analysis based on an explicit conceptual hierarchy of analysis objects. We discuss the implementation of the theory and its results on a typical example (Blues for Alice, by Charlie Parker), for which the system produces an analysis that conforms exactly to human interpretation. We also exhibit a chord sequence, Solar (by Miles Davis), for which the results of the system do not conform to human perception, i.e. the system does not recognize it as a blues. We conclude with a discussion of the role of the computer in musical analysis.
Book
Exploring the application of Bayesian probabilistic modeling techniques to musical issues, including the perception of key and meter. In Music and Probability, David Temperley explores issues in music perception and cognition from a probabilistic perspective. The application of probabilistic ideas to music has been pursued only sporadically over the past four decades, but the time is ripe, Temperley argues, for a reconsideration of how probabilities shape music perception and even music itself. Recent advances in the application of probability theory to other domains of cognitive modeling, coupled with new evidence and theoretical insights about the working of the musical mind, have laid the groundwork for more fruitful investigations. Temperley proposes computational models for two basic cognitive processes, the perception of key and the perception of meter, using techniques of Bayesian probabilistic modeling. Drawing on his own research and surveying recent work by others, Temperley explores a range of further issues in music and probability, including transcription, phrase perception, pattern perception, harmony, improvisation, and musical styles. Music and Probability—the first full-length book to explore the application of probabilistic techniques to musical issues—includes a concise survey of probability theory, with simple examples and a discussion of its application in other domains. Temperley relies most heavily on a Bayesian approach, which not only allows him to model the perception of meter and tonality but also sheds light on such perceptual processes as error detection, expectation, and pitch identification. Bayesian techniques also provide insights into such subtle and advanced issues as musical ambiguity, tension, and "grammaticality," and lead to interesting and novel predictions about compositional practice and differences between musical styles.
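As an illustration of the Bayesian approach to key perception described above, the sketch below scores the 12 major keys by P(key | pitch classes), proportional to P(key) times the product over all 12 pitch classes of P(present or absent | key), with a uniform prior. The key-profile probabilities are illustrative values chosen for this example, not Temperley's published parameters, and minor keys are omitted for brevity.

```python
from math import exp, log

# Illustrative P(pitch class occurs | major key), indexed by semitones above the tonic.
MAJOR_PROFILE = [0.75, 0.10, 0.50, 0.10, 0.65, 0.45,
                 0.10, 0.70, 0.10, 0.40, 0.10, 0.40]


def key_posterior(pcs, profile=MAJOR_PROFILE):
    """Posterior over the 12 major keys (index = tonic pitch class)."""
    present = {p % 12 for p in pcs}
    log_scores = []
    for tonic in range(12):
        score = log(1 / 12)  # uniform prior over the 12 major keys
        for pc in range(12):
            p = profile[(pc - tonic) % 12]
            score += log(p) if pc in present else log(1 - p)
        log_scores.append(score)
    top = max(log_scores)
    weights = [exp(s - top) for s in log_scores]  # normalise in a numerically stable way
    total = sum(weights)
    return [w / total for w in weights]


posterior = key_posterior([0, 2, 4, 5, 7, 9, 11])
print(max(range(12), key=posterior.__getitem__))  # 0, i.e. C major
```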
Chapter
Machine Models of Music brings together representative models ranging from Mozart's "Musical Dice Game" to a classic article by Marvin Minsky and current research to illustrate the rich impact that artificial intelligence has had on the understanding and composition of traditional music and to demonstrate the ways in which music can push the boundaries of traditional AI research. Major sections of the book take up pioneering research in generate-and-test composition (Lejaren Hiller, Barry Brooks, Jr., Stanley Gill); composition parsing (Allen Forte, Herbert Simon, Terry Winograd); heuristic composition (John Rothgeb, James Moorer, Steven Smoliar); generative grammars (Otto Laske, Gary Rader, Johan Sundberg, Fred Lerdahl); alternative theories (Marvin Minsky, James Meehan); composition tools (Charles Ames, Kemal Ebcioglu, David Cope, C. Fry); and new directions (David Levitt, Christopher Longuet-Higgins, Jamshed Bharucha, Stephan Schwanauer). Stephan Schwanauer is President of Mediasoft Corporation. David Levitt is the founder of HIP Software and head of audio products at VPL Research.
Thesis
State-of-the-art parsers suffer from incomplete lexicons, as evidenced by the fact that they all contain built-in methods for dealing with out-of-lexicon items at parse time. Since new labelled data is expensive to produce and no amount of it will conquer the long tail, we attempt to address this problem by leveraging the enormous amount of raw text available for free, and expanding the lexicon offline, with a semi-supervised word learner. We accomplish this with a method similar to self-training, where a fully trained parser is used to generate new parses with which the next generation of parser is trained. This thesis introduces Chart Inference (CI), a two-phase word-learning method with Combinatory Categorial Grammar (CCG), operating on the level of the partial parse as produced by a trained parser. CI uses the parsing model and lexicon to identify the CCG category type for one unknown word in a context of known words by inferring the type of the sentence using a model of end punctuation, then traversing the chart from the top down, filling in each empty cell as a function of its mother and its sister. We first specify the CI algorithm, and then compare it to two baseline word-learning systems over a battery of learning tasks. CI is shown to outperform the baselines in every task, and to function in a number of applications, including grammar acquisition and domain adaptation. This method performs consistently better than self-training, and improves upon the standard POS-backoff strategy employed by the baseline StatCCG parser by adding new entries to the lexicon. The first learning task establishes lexical convergence over a toy corpus, showing that CI’s ability to accurately model a target lexicon is more robust to initial conditions than either of the baseline methods. We then introduce a novel natural language corpus based on children’s educational materials, which is fully annotated with CCG derivations. 
We use this corpus as a testbed to establish that CI is capable in principle of recovering the whole range of category types necessary for a wide-coverage lexicon. The complexity of the learning task is then increased by using the CCGbank corpus, a version of the Penn Treebank, and we show that CI improves as its initial seed corpus is increased. The next experiment uses CCGbank as the seed and attempts to recover missing question-type categories in the TREC question answering corpus. The final task extends the coverage of the CCGbank-trained parser by running CI over the raw text of the Gigaword corpus. Where appropriate, a fine-grained error analysis supplements the quantitative evaluation of parser performance with a deeper linguistic analysis of the lexicon and the parsing model.
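To make the top-down step concrete, here is a minimal Python sketch of the local inference "fill a cell from its mother and its sister", restricted to forward and backward function application. The `Functor` class and `infer_right_child` function are hypothetical names; the actual CI algorithm also uses the parsing model to rank candidates and handles more combinators, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Functor:
    """A complex CCG category: result/arg (looks right) or result\\arg (looks left)."""
    result: "Category"
    slash: str  # "/" or "\\"
    arg: "Category"

    def __str__(self) -> str:
        def wrap(c: "Category") -> str:
            return f"({c})" if isinstance(c, Functor) else str(c)
        return f"{wrap(self.result)}{self.slash}{wrap(self.arg)}"


Category = Union[str, Functor]  # atomic categories are plain strings such as "S" or "NP"


def infer_right_child(mother: Category, left: Category) -> List[Category]:
    """Categories for an unknown right daughter that combine with `left` to give `mother`
    by function application only."""
    candidates: List[Category] = []
    # Forward application: left is mother/X, so the unknown word can be X.
    if isinstance(left, Functor) and left.slash == "/" and left.result == mother:
        candidates.append(left.arg)
    # Backward application: the unknown word can be the functor mother\left.
    candidates.append(Functor(mother, "\\", left))
    return candidates


vp = Functor("S", "\\", "NP")  # S\NP
tv = Functor(vp, "/", "NP")    # (S\NP)/NP
print(", ".join(str(c) for c in infer_right_child("S", "NP")))  # S\NP
print(", ".join(str(c) for c in infer_right_child(vp, tv)))     # NP, (S\NP)\((S\NP)/NP)
```

The second call shows why a model is needed: the local step alone proposes both an ordinary object `NP` and a rarer functor category, and some scoring must choose between them.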
Thesis
Parsing is the task of assigning syntactic or semantic structure to a natural language sentence. This thesis focuses on syntactic parsing with Combinatory Categorial Grammar (CCG; Steedman 2000). CCG allows incremental processing, which is essential for speech recognition and some machine translation models, and it can build semantic structure in tandem with syntactic parsing. Supertagging solves a subset of the parsing task by assigning lexical types to words in a sentence using a sequence model. It has emerged as a way to improve the efficiency of full CCG parsing (Clark and Curran, 2007) by reducing the parser’s search space. This has been very successful, and it is the central theme of this thesis. We begin with an analysis of how efficiency is traded for accuracy in supertagging. Pruning the search space by supertagging is inherently approximate; to contrast this, we include in our analysis A*, a classic exact search technique. Interestingly, we find that combining the two methods improves efficiency, but we also demonstrate that excessive pruning by a supertagger significantly lowers the upper bound on the accuracy of a CCG parser. Inspired by this analysis, we design a single integrated model with both supertagging and parsing features, rather than separating them into distinct models chained together in a pipeline. To overcome the resulting complexity, we experiment with both loopy belief propagation and dual decomposition approaches to inference, the first empirical comparison of these algorithms that we are aware of on a structured natural language processing problem. Finally, we address training the integrated model. We adopt the idea of optimising directly for a task-specific metric, as is common in other areas such as statistical machine translation. 
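The accuracy cost of pruning can be seen in a small sketch of multitagging in the style described by Clark and Curran (2007): for each word, the supertagger keeps every category whose probability is at least a factor β of the most probable one. The function name and the probabilities below are hypothetical.

```python
from typing import Dict


def multitag(category_probs: Dict[str, float], beta: float) -> Dict[str, float]:
    """Keep every category whose probability is at least beta times the best one."""
    best = max(category_probs.values())
    return {cat: p for cat, p in category_probs.items() if p >= beta * best}


# Hypothetical supertagger output for one word.
probs = {"NP": 0.6, "N": 0.3, "S/S": 0.01}
print(sorted(multitag(probs, 0.1)))   # ['N', 'NP']
print(sorted(multitag(probs, 0.75)))  # ['NP']
```

If the correct category for this word were `N`, the tighter setting would remove it before parsing, so no parser, exact or not, could recover the right analysis; this is the lowered upper bound on accuracy that the abstract refers to.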
We demonstrate how a novel dynamic programming algorithm enables us to optimise for F-measure, our task-specific evaluation metric, and we experiment with approximations, which prove to be excellent substitutes. Each of the presented methods improves on the state of the art in CCG parsing. Moreover, the improvements are additive, achieving a labelled/unlabelled dependency F-measure on CCGbank of 89.3%/94.0% with gold part-of-speech tags, and 87.2%/92.8% with automatic part-of-speech tags, the best reported results for this task to date. Our techniques are general and we expect them to apply to other parsing problems, including lexicalised tree adjoining grammar and context-free grammar parsing.
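Dependency F-measure is the harmonic mean of precision and recall over predicted dependencies, F = 2PR/(P + R). The sketch below assumes, for illustration only, that a labelled dependency is a (head index, head category, argument slot, dependent index) tuple and that the unlabelled score keeps only the head and dependent indices; the example dependencies are made up.

```python
from typing import Set, Tuple

Dep = Tuple[int, str, int, int]  # (head index, head category, argument slot, dependent index)


def f_measure(gold: Set, predicted: Set) -> float:
    """Harmonic mean of precision and recall over sets of dependencies."""
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def unlabelled(deps: Set[Dep]) -> Set[Tuple[int, int]]:
    """Drop the category and slot, keeping only who depends on whom."""
    return {(head, dep) for head, _, _, dep in deps}


gold = {(2, "(S\\NP)/NP", 1, 1), (2, "(S\\NP)/NP", 2, 3)}
pred = {(2, "(S\\NP)/NP", 1, 1), (2, "S\\NP", 1, 3)}
print(f_measure(gold, pred))                          # 0.5
print(f_measure(unlabelled(gold), unlabelled(pred)))  # 1.0
```

The gap between the two numbers mirrors the gap between the labelled and unlabelled scores reported above: attaching a word to the right head with the wrong category counts only in the unlabelled metric.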
Article
The idea that natural language grammar and planned action are related systems has been implicit in psychological theory for more than a century. However, formal theories in the two domains have tended to look very different. This article argues that both faculties share the formal character of applicative systems based on operations corresponding to the same two combinatory operations, namely functional composition and type-raising. Viewing them in this way suggests simpler and more cognitively plausible accounts of both systems, and suggests that the language faculty evolved in the species and develops in children by a rather direct adaptation of a more primitive apparatus for planning purposive action in the world by composing affordances of objects or tools. The knowledge representation that underlies such planning is also reflected in the natural language semantics of tense, mood, and aspect, which, the paper begins by arguing, provides the key to understanding both systems.
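The two operations the article names can be written as higher-order functions. In this minimal Python sketch, `compose` is functional composition (the B combinator) and `type_raise` turns an argument into a function over functions (the T combinator). The actions `climb_box` and `grab_banana` are hypothetical stand-ins for affordances, modelled as functions from a state to a new state.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(f: Callable[[B], C], g: Callable[[A], B]) -> Callable[[A], C]:
    """Functional composition (B combinator): compose(f, g)(x) == f(g(x))."""
    return lambda x: f(g(x))


def type_raise(x: A) -> Callable[[Callable[[A], B]], B]:
    """Type-raising (T combinator): x becomes a function that applies its argument to x."""
    return lambda f: f(x)


# Hypothetical actions: each maps a state (a list of facts) to a new state.
def climb_box(state: List[str]) -> List[str]:
    return state + ["on_box"]


def grab_banana(state: List[str]) -> List[str]:
    return state + ["has_banana"]


plan = compose(grab_banana, climb_box)  # first climb, then grab
print(plan([]))                         # ['on_box', 'has_banana']
print(type_raise([])(plan))             # same result, with the state raised to a function
```

Composing actions without yet knowing the state they apply to is the planning analogue of composing partial constituents in CCG.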
Article
In this essay I consider how Schenkerian theory might be evaluated as a theory of composition (describing composers' mental representations) and as a theory of perception (describing listeners' mental representations). I propose to evaluate the theory in the usual way: by examining its predictions and seeing if they are true. The first problem is simply to interpret and formulate the theory in such a way that substantive, testable predictions can be made. While I consider some empirical evidence that bears on these predictions, my approach is, for the most part, informal and intuitive: I simply present my own thoughts as to which of the theory's possible predictions seem most promising—that is, which ones seem from informal observation to be borne out in ways that support the theory.
Conference Paper
Analysing music resembles natural language parsing in requiring the derivation of structure from an unstructured and highly ambiguous sequence of elements, whether they are notes or words. Such analysis is fundamental to many music processing tasks, such as key identification and score transcription. The focus of the present paper is on harmonic analysis. We use the three-dimensional tonal harmonic space developed by [4, 13, 14] to define a theory of tonal harmonic progression, which plays a role analogous to semantics in language. Our parser applies techniques from natural language processing (NLP) to the problem of analysing harmonic progression. It uses a formal grammar of jazz chord sequences of a kind that is widely used for NLP, together with the statistically based modelling techniques standardly used in wide-coverage parsing, to map music onto underlying harmonic progressions in the tonal space. Using supervised learning over a small corpus of jazz chord sequences annotated with harmonic analyses, we show that grammar-based musical parsing using simple statistical parsing models is more accurate than a baseline Markovian model trained on the same corpus.
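In the tonal space of Longuet-Higgins, a note can be located by how many perfect fifths and major thirds separate it from a reference note (octaves give the third dimension, which this sketch ignores). The sketch below, with hypothetical function names, shows why the space is finer than the twelve pitch classes: four fifths above C and one major third above C both give the equal-tempered E, but they are different points, a syntonic comma (81/80) apart in just intonation.

```python
from fractions import Fraction


def pitch_class(x: int, y: int) -> int:
    """Equal-tempered pitch class of tonal-space point (x fifths, y major thirds) above C."""
    return (7 * x + 4 * y) % 12  # a fifth is 7 semitones, a major third 4


def just_ratio(x: int, y: int) -> Fraction:
    """Just-intonation frequency ratio 3^x * 5^y, reduced into the octave [1, 2)."""
    r = Fraction(3) ** x * Fraction(5) ** y
    while r >= 2:
        r /= 2
    while r < 1:
        r *= 2
    return r


print(pitch_class(4, 0), pitch_class(0, 1))        # 4 4
print(just_ratio(4, 0), just_ratio(0, 1))          # 81/64 5/4
print(just_ratio(4, 0) / just_ratio(0, 1))         # 81/80
```

A harmonic analysis in the tonal space therefore records which of these paths a progression takes, information that a pitch-class representation discards.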
Article
This study concerns the use of formal grammars commonly applied to language to model the process of harmonic analysis and the human understanding of the language of jazz harmony. It builds on the Combinatory Categorial Grammar (CCG) of Steedman (1996) for jazz chord sequences. The coverage of the grammar is extended, and its semantic productions based on Longuet-Higgins’ tonal space theory are developed according to literature on functional harmonic analysis and examination of bodies of jazz chord sequences. A language of underspecified harmonic semantic expressions is developed that can be used to express generalizations over movements in Longuet-Higgins’ tonal space. The language is applied successfully to the problem of recognizing chord sequences that are variations on a general harmonic form; in particular, it is used to recognize examples of the 12-bar blues. A parser for the harmonic grammar has been implemented and applied to jazz chord sequences. The grammar is evaluated with respect to its applicability to jazz standards outside the domain of the blues. Shortcomings of the grammar as a model of musical analysis are discussed and suggestions are made for future development. Some further examples are considered from outside the domain of jazz standards. The high lexical ambiguity of the grammar calls for statistical approaches similar to those used for natural language parsing. These are discussed but not implemented, due to the current lack of a suitable annotated corpus. The grammatical approach to harmonic analysis appears to provide a good means for modelling human perception of jazz chord sequences, which promises to generalize well to interpretation of harmony in a wider spectrum of Western tonal music.
Article
This paper describes a system for chordal analysis of tonal music. We establish that, in the worst case, segmenting the music and labeling the harmonies takes on the order of 2^n steps, where n is the number of notes in a piece of music. We show that, when segments of the music can be analyzed locally, the problem becomes O(n^2). We then show that the results of the O(n^2) search can be closely approximated through the use of a heuristic that allows O(n) time search. The results of the segmenting and chord labeling algorithms are then evaluated empirically against analyses derived from a basic music theory text, and the statistical results are reported.
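The gap between the exponential and quadratic bounds comes from the structure of the search. With n notes there are 2^(n-1) ways to place segment boundaries, but if each segment can be scored on its own, a dynamic programme over boundary positions needs only O(n^2) segment evaluations. The Python sketch below illustrates that idea; its scoring rule (plus one per chord tone, minus two per non-chord tone, minus one per segment), its 24-triad label set, and the function names are made-up assumptions, not the paper's.

```python
from typing import List, Tuple

# Candidate labels: the 24 major and minor triads, as ((root pitch class, quality), tones).
TRIADS = [((root, q), {root, (root + (4 if q == "maj" else 3)) % 12, (root + 7) % 12})
          for root in range(12) for q in ("maj", "min")]


def label_segment(pcs: List[int]) -> Tuple[Tuple[int, str], int]:
    """Best triad for a segment: +1 per chord tone, -2 per non-chord tone, -1 per segment."""
    best_label, best_score = None, None
    for label, tones in TRIADS:
        inside = sum(1 for p in pcs if p in tones)
        score = inside - 2 * (len(pcs) - inside) - 1
        if best_score is None or score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


def segment(notes: List[int]):
    """Dynamic programme over segment boundaries: O(n^2) segment evaluations."""
    n = len(notes)
    best = [0] + [float("-inf")] * n  # best[i]: best total score for notes[:i]
    back = [0] * (n + 1)              # back[i]: start of the last segment in that analysis
    for i in range(1, n + 1):
        for j in range(i):
            _, s = label_segment(notes[j:i])
            if best[j] + s > best[i]:
                best[i], back[i] = best[j] + s, j
    # Walk the back-pointers to recover the segments and their labels.
    segments, i = [], n
    while i > 0:
        j = back[i]
        segments.append((j, i, label_segment(notes[j:i])[0]))
        i = j
    return best[n], segments[::-1]


score, segs = segment([0, 4, 7, 0, 5, 9, 5, 0])  # C E G C F A F C
print(score, [label for _, _, label in segs])    # 6 [(0, 'maj'), (5, 'maj')]
```

Because each segment's score here does not depend on its neighbours, the best analysis of a prefix can be reused; that locality assumption is exactly what turns the exponential search into a quadratic one.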
Article
This paper is intended to complete and supplement three papers on Music which I have already read before the Royal Society. It contains a more complete theory of temperament, embracing that indicated by Helmholtz, but not worked out by him, and its application to the theory of constructing musical instruments with an intonation practically just, without change of fingering, and, if there are three or four performers, without change of mechanism. The name Duodene refers to that collection of twelve notes, suitable to the present manuals, which is made the unit of construction.
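The abstract does not say how the twelve notes of a Duodene are chosen. Purely as an assumption for illustration, the sketch below builds a duodene-like block of four perfect fifths (F C G D) by three rows a just major third apart and checks that it supplies exactly one just-intoned note for each of the twelve keys of a standard manual. The names `duodene_block` and `NAMES` are hypothetical, and the arrangement is not taken from Ellis's paper.

```python
from fractions import Fraction

NAMES = ["C", "Db", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def octave_reduce(r: Fraction) -> Fraction:
    """Bring a frequency ratio into the octave [1, 2)."""
    while r >= 2:
        r /= 2
    while r < 1:
        r *= 2
    return r


def duodene_block(fifths=range(-1, 3), thirds=range(-1, 2)):
    """(name, just ratio) for each point of a fifths-by-thirds block, ordered by pitch class."""
    notes = {}
    for y in thirds:
        for x in fifths:
            pc = (7 * x + 4 * y) % 12  # the equal-tempered key this just note would sit on
            notes[pc] = (NAMES[pc], octave_reduce(Fraction(3) ** x * Fraction(5) ** y))
    return [notes[pc] for pc in sorted(notes)]


for name, ratio in duodene_block():
    print(name, ratio)
```

Under this assumed arrangement the twelve ratios are all distinct and rise with the keys of the manual, which is the property a "unit of construction" for a just-intoned keyboard needs.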
Article
This paper proposes a music analysing system called the automatic time-span tree analyser (ATTA). ATTA derives a time-span tree that assigns a hierarchy of “structural importance” to the notes of a piece of music based on the generative theory of tonal music (GTTM). Although the time-span tree has been applied in music summarization and collaborative music creation systems, these systems use time-span trees manually analysed by experts in musicology. Current systems based on GTTM cannot acquire a time-span tree without manual application of most of the rules, since GTTM does not resolve much of the ambiguity involved in the application of the rules. To solve this problem, we propose a novel computational model of GTTM that re-formalizes the rules through a computer implementation. The main advantage of our approach is that we can introduce adjustable parameters, which enable us to assign priorities to the rules. Our analyser automatically acquires time-span trees by configuring the parameters that cover 17 out of 26 GTTM rules for constructing a time-span tree. Experimental results show that after these parameters were tuned, our method could outperform a baseline. We hope to distribute the time-span tree analyser as a tool for various musical tasks, such as searching and arranging music.
Article
Lerdahl and Jackendoff's recent book A Generative Theory of Tonal Music brings together a range of ideas that are of common interest to music theorists, music analysts and psychologists of music. For this reason, as well as for its specific theoretical contributions, the book is an important landmark. However, it is argued that not only are there shortcomings in the theory when viewed from each of these three perspectives, but that it also fails to pay sufficient attention to the different aims and conceptual frameworks of the three disciplines. This leads to certain difficulties in applying the theory and assessing the status of its constructs. If the three disciplines are to contribute significantly to one another, their methodological and conceptual differences must be recognised and accommodated.
Article
This paper explores the issue of what is going on in a listener's mind during the real-time processing of music, such that it is possible to account for the listener's understanding of the music. The issue will be approached through evidence internal to music itself, and also by analogy with evidence from the processing of language. I will then examine how processing of the sort I propose provides a basis for considering a particular issue in the theory of musical affect.
Conference Paper
We present a new system for chord transcription from polyphonic musical audio that uses domain-specific knowledge about tonal harmony and metrical position to improve chord transcription performance. Low-level pulse and spectral features are extracted from an audio source using the Vamp plugin architecture. Subsequently, for each beat-synchronised chromagram we compute a list of chord candidates matching that chromagram, together with the confidence in each candidate. When one particular chord candidate matches the chromagram significantly better than all others, this chord is selected to represent the segment. However, when multiple chords match the chromagram similarly well, we use a formal music theoretical model of tonal harmony to select the chord candidate that best matches the sequence based on the surrounding chords. In an experiment we show that exploiting metrical and harmonic knowledge yields statistically significant chord transcription improvements on a corpus of 217 Beatles, Queen, and Zweieck songs.
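The selection rule described here (accept a clear winner, otherwise let the harmonic context decide) can be sketched as follows. The names `select_chord` and `context_score`, the confidence margin of 0.1, and the toy scores are all hypothetical; `context_score` merely stands in for the paper's formal model of tonal harmony, which this sketch does not implement.

```python
from typing import Callable, List, Tuple


def select_chord(candidates: List[Tuple[str, float]],
                 context_score: Callable[[str], float],
                 margin: float = 0.1) -> str:
    """Pick a chord label for one beat-synchronised segment.

    candidates: (chord label, confidence) pairs from matching the chromagram.
    context_score: higher means the chord fits the surrounding chords better.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_label, best_conf = ranked[0]
    # A clear winner on the chromagram alone is accepted directly.
    if len(ranked) == 1 or best_conf - ranked[1][1] >= margin:
        return best_label
    # Otherwise the harmonic context decides among the near-ties only.
    close = [label for label, conf in ranked if best_conf - conf < margin]
    return max(close, key=context_score)


print(select_chord([("C", 0.9), ("Am", 0.5)], lambda c: 0.0))  # C
fit = {"C": 0.1, "Am": 0.8, "F": 0.9}  # hypothetical context scores
print(select_chord([("C", 0.62), ("Am", 0.6), ("F", 0.2)], fit.get))  # Am
```

In the second call F is not chosen despite its high context score, because its chromagram confidence is too far below the best match; the context model only arbitrates between candidates the audio evidence cannot separate.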
Article
A Generative Theory of Tonal Music. By Fred Lerdahl and Ray Jackendoff. MIT Press: 1983. Pp.368. $35, £31.50.