
Composer Attribution by Quantifying Compositional Strategies


Abstract

Taking a theory of musical style developed by Leonard B. Meyer as a starting point, this paper describes an experiment in which statistical pattern recognition algorithms are used to characterize a particular musical style with respect to other styles. The resulting description can be used in authorship discussions. In the current study, a number of disputed organ works from the Bach catalogue are used to illustrate the possibilities of this approach.
Peter van Kranenburg
Department of Information and Computing Sciences
Utrecht University
Keywords: Musical Style, Pattern Recognition, Classical
Music, Composer Attribution, Johann Sebastian Bach.
1. Introduction
A theory of style is necessary in order to describe a musical style, the differences between styles, or the historical development of particular styles. This applies both to “traditional” descriptions of musical style and to studies that use tools and algorithms from information technology.
In [5], Leonard Meyer develops a theory of musical style that can serve as a starting point for studies in which statistical pattern recognition algorithms are used to examine and compare musical styles. Meyer defines (musical) style as follows: “Style is a replication of patterning, whether in human behavior or in the artifacts produced by human behavior, that results from a series of choices made within some set of constraints.”
Without repeated patterns, there would be no style at all. The constraints are important because they shape a musical style by allowing certain patterns and disallowing others. Meyer distinguishes three levels of constraints: laws, rules and strategies. Laws are universal constraints; e.g., one cannot ask a piccolo to play a contra G. Rules, the second level, are intracultural constraints: it is in the rules that music of the Renaissance differs from music of the Baroque. Strategies, the third level, are constraints that composers impose on themselves within the rules of a culturally established style. Thus it is in the strategies that the music of G.F. Handel differs from the music of G.Ph. Telemann. Strategies operate on conscious as well as unconscious levels: certain patterns are ingrained during the training and development of a composer and are not replicated consciously.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
© 2006 University of Victoria
In the second part of his book, Meyer applies his theory to nineteenth-century Western classical music. He addresses some general patterns that recur in many compositions of that period and connects these patterns to the underlying Romantic aesthetic and ideology. In doing so, he is forced to limit himself to proof by example. A more thorough evaluation of musical styles would require making extensive use of all available data (in this case, everything in all the scores under consideration). To achieve this, statistical pattern recognition algorithms can be of great use. As Meyer himself states: “Since all classification and all generalization about stylistic traits are based on some estimate of relative frequency, statistics are inescapable.” ([5], p. 64).
2. A Pattern Recognition Approach
Meyer’s theory offers a foundation for designing experiments that use algorithms from statistical pattern recognition. The features that represent (parts of) compositions can be identified with the replicated patterns mentioned in Meyer’s definition. Assuming that the scores involved in a given musicological problem are available in electronic form, a major task is extracting the feature values from those scores. From the perspective of “traditional” style analysis, large-scale features are more interesting than small-scale ones: for example, determining the way in which a composition resembles a sonata form requires a global overview of the entire composition. From the perspective of algorithmic extraction, however, small-scale features are more attractive, because the algorithms that extract them are less complicated and their results less ambiguous. It is, for example, unclear how to quantify the extent to which a composition resembles a particular sonata form, but it is much easier to determine the proportion of parallel thirds among all interval successions in the composition. We therefore need small-scale patterns that can be easily detected and counted, and that occur in large numbers.
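As an illustration of such a small-scale, countable pattern, the sketch below computes the proportion of parallel thirds between two voices. It is a minimal sketch, not the feature definition used in [1]. It assumes a hypothetical representation in which each voice is a list of MIDI pitch numbers sampled at the same time slices; it counts an interval succession whenever at least one voice moves, treats any interval of 3 or 4 semitones modulo 12 as a third, and calls a succession parallel when both voices move in the same direction from one third to another.

```python
from typing import Sequence

def is_third(upper: int, lower: int) -> bool:
    """True for a minor or major third, including compound thirds (tenths).

    Crossed voices give a negative difference; Python's % then yields the inversion
    (e.g. -3 % 12 == 9, a sixth), which is not counted as a third.
    """
    return (upper - lower) % 12 in (3, 4)

def parallel_third_ratio(upper: Sequence[int], lower: Sequence[int]) -> float:
    """Share of interval successions between two voices that are parallel thirds.

    Hypothetical representation: both voices are MIDI pitch numbers sampled at the
    same time slices. A succession is counted only if at least one voice moves.
    """
    if len(upper) != len(lower):
        raise ValueError("both voices need the same number of time slices")
    successions = 0
    parallel = 0
    for i in range(1, len(upper)):
        du = upper[i] - upper[i - 1]  # melodic step in the upper voice
        dl = lower[i] - lower[i - 1]  # melodic step in the lower voice
        if du == 0 and dl == 0:
            continue  # nothing moves: not an interval succession
        successions += 1
        same_direction = du * dl > 0
        if same_direction and is_third(upper[i - 1], lower[i - 1]) and is_third(upper[i], lower[i]):
            parallel += 1
    return parallel / successions if successions else 0.0

upper = [64, 65, 67, 67, 72]  # E4 F4 G4 G4 C5
lower = [60, 62, 64, 60, 60]  # C4 D4 E4 C4 C4
print(parallel_third_ratio(upper, lower))  # 0.5
```

In this toy example the first two successions are parallel thirds and the last two are not, so the ratio is 2/4. Whether held notes, rests or voice crossings should count differently is a design decision that the full feature definition would have to settle.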
With this limitation in mind, a set of twenty features has been designed. The smallest scale in a score is the relation of a single note to the notes around it. A note that is part of a voice in a polyphonic composition is more independent than a note that is part of a chord. For this reason, most features quantify aspects of the relations between the voices, which means that only polyphonic compositions can be represented with this feature set. Since the representation is used here to study the authorship of organ fugues, this is not a problem. A few other features in the set describe more global characteristics. The features are described in [1]; they are listed below:
1. StabTimeslice
2. DissPart
3. BeginBarDiss
4. SonorityEntropy
5. HarmonyEntropy
6. PitchEntropy
7. VoiceDensity
8. PartSeconds
9. PartThirds
10. PartFourths
11. PartAugFourths
12. PartDimFifths
13. PartFifths
14. PartSixths
15. PartSevenths
16. PartOctaves
17. ParThirds
18. ParFourths
19. ParSixths
20. StepSuspension
By measuring all these features, each composition is represented as a vector in a 20-dimensional space. Various kinds of pattern recognition algorithms can then be applied to such a data set.
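A minimal sketch of how a composition becomes one point in this feature space is given below. Only one feature is implemented, and its definition is an assumption: PitchEntropy is taken here to be the Shannon entropy (in bits) of the pitch-class distribution of all notes, which may differ from the definition in [1]. The registry `FEATURES` and the voice representation (one list of MIDI pitch numbers per voice) are hypothetical; a full version would register all twenty features in the order listed above.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

Voices = List[List[int]]  # hypothetical: one list of MIDI pitch numbers per voice

def pitch_entropy(pitches: Sequence[int]) -> float:
    """Shannon entropy (bits) of the pitch-class distribution (assumed reading of PitchEntropy)."""
    counts = Counter(p % 12 for p in pitches)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical feature registry; dict order fixes the coordinate order of the vector.
FEATURES: Dict[str, Callable[[Voices], float]] = {
    "PitchEntropy": lambda voices: pitch_entropy([p for v in voices for p in v]),
    # ... the remaining nineteen features would be registered here ...
}

def feature_vector(voices: Voices) -> List[float]:
    """Measure every registered feature: one point in the feature space."""
    return [measure(voices) for measure in FEATURES.values()]

# Four equally frequent pitch classes give log2(4) = 2 bits.
print(feature_vector([[60, 64], [62, 65]]))  # [2.0]
```

Keeping the features in an ordered registry makes it explicit that every composition (or segment) is measured with the same features in the same order, which is what allows the resulting vectors to be compared at all.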
3. Organ Fugues Ascribed to J.S. Bach
As a pilot experiment, a data set is assembled containing 16 organ fugues that are listed in the catalogue of compositions of Johann Sebastian Bach ([7]). The authorship of six of these fugues has been disputed. Five fugues by his eldest son, Wilhelm Friedemann Bach, and eight by his most important student, Johann Ludwig Krebs, are also included, which gives a three-class data set.¹ Each composition is segmented using the method described in [1], so that each composition is represented by a “cloud” of points.
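The idea of turning one composition into a cloud of points can be sketched as follows. The segmentation method of [1] is not reproduced here; as a stand-in, this sketch assumes fixed-length, possibly overlapping windows of time slices (the parameters `window` and `hop` are hypothetical), and takes the per-segment measuring function as an argument so that any feature-vector function can be plugged in.

```python
from typing import Callable, List

Voices = List[List[int]]  # hypothetical: one pitch list per voice, aligned on time slices

def segment(voices: Voices, window: int, hop: int) -> List[Voices]:
    """Cut aligned voices into windows of `window` time slices, starting every `hop` slices.

    Trailing slices that do not fill a whole window are dropped; a piece shorter
    than one window yields a single (short) segment.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    length = len(voices[0])
    starts = range(0, max(length - window, 0) + 1, hop)
    return [[v[s:s + window] for v in voices] for s in starts]

def point_cloud(voices: Voices, window: int, hop: int,
                measure: Callable[[Voices], List[float]]) -> List[List[float]]:
    """One feature vector per segment, so a composition becomes a cloud of points."""
    return [measure(seg) for seg in segment(voices, window, hop)]

# Toy example: two voices of 10 slices, windows of 4 slices with hop 2 -> 4 segments.
toy = [list(range(60, 70)), list(range(48, 58))]
cloud = point_cloud(toy, window=4, hop=2, measure=lambda seg: [sum(seg[0]) / len(seg[0])])
print(cloud)  # [[61.5], [63.5], [65.5], [67.5]]
```

Representing a piece by many segment vectors rather than a single average makes it possible to see where different parts of one composition fall, which matters for works suspected of mixed authorship.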
The Fisher transformation (described in [8], p. 145ff) can be used to project the data points onto a two-dimensional space in such a way that the classes are optimally separated (Figure 1). This projection shows that the compositions of each composer do form a cluster. It can therefore serve as a reference for classifying compositions that might have been composed by one of the three composers involved.
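For readers who want to reproduce a projection of this kind, the following is a minimal NumPy sketch of Fisher's linear discriminant for several classes. It follows the standard textbook formulation (see, e.g., [8]) rather than any code from this study. It seeks directions w that maximize the ratio of between-class scatter S_B to within-class scatter S_W; with three composers there are at most two such directions, which is why a two-dimensional plot suffices. The small ridge term `reg` is an assumption added because, with few segments and twenty features, S_W may be close to singular.

```python
import numpy as np

def fisher_projection(X: np.ndarray, y: np.ndarray, n_components: int = 2,
                      reg: float = 1e-6) -> np.ndarray:
    """Return a (d, n_components) matrix W whose columns are Fisher discriminant directions.

    X: (n_samples, d) feature vectors; y: (n_samples,) class labels.
    """
    classes = np.unique(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    S_w += reg * np.eye(d)  # assumed ridge term for numerical stability
    # Eigenvectors of S_w^{-1} S_b with the largest eigenvalues maximize the Fisher criterion.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real

# Synthetic three-class example (not the fugue data): clusters around three means in 3-D.
rng = np.random.default_rng(0)
means = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
X = np.vstack([rng.normal(m, 0.1, size=(20, 3)) for m in means])
y = np.repeat([0, 1, 2], 20)
W = fisher_projection(X, y)
Z = X @ W  # 2-D coordinates, one row per sample
print(Z.shape)  # (60, 2)
```

The rows of Z are the kind of coordinates plotted in a figure such as Figure 1. scikit-learn's `LinearDiscriminantAnalysis(n_components=2)` with `fit_transform(X, y)` offers a well-tested alternative; its axes may differ from this sketch in scale and sign.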
Figure 1 indicates where the data points of the disputed fugues are projected, and some interesting observations can be made. The F minor fugue BWV 534 is projected among the fugues of J.L. Krebs. This fugue has been ascribed to W.F. Bach ([3]). On the basis of the current result, that ascription can be rejected; an ascription to J.L. Krebs seems more likely. A suggested composer for BWV 536 is J.P. Kellner ([4]). If this is true, Kellner’s style resembles the style of J.S. Bach more than that of the other two composers. BWV 537 is said to have been composed partly by J.S. Bach (bars 1–40) and partly by J.L. Krebs ([6]). The first part is indeed projected among the works of J.S. Bach. The second part, however, lies outside both the Bach region and the Krebs region, and the ending of the fugue falls in the region between J.S. Bach and Krebs. This does not fully support the hypothesis, but it does show that a large part of the fugue is not Bach-like. Bach’s authorship of the fugue in C minor, BWV 546, has also been doubted ([2]). The current evaluation shows that, with respect to the styles of W.F. Bach and J.L. Krebs, this fugue has the characteristics of the style of J.S. Bach. The Toccata and Fugue in D minor, BWV 565, arguably the most famous organ work in existence, is not projected among the other compositions of Bach. This supports the doubts expressed in [9].

Figure 1. Projection of the disputed fugues on top of the compositions of J.S. Bach (+), W.F. Bach (o) and J.L. Krebs (*).

¹ The dataset is available from: http://www.musical-style-
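The observations above are read off the projection by eye. One hypothetical way to make such readings explicit is sketched below: given projected reference points with composer labels and the projected segments of a disputed work, it reports the nearest composer centroid for each segment, and how far the segment lies from each centroid relative to that composer's typical spread. The nearest-centroid rule and the spread ratio are illustrative assumptions, not part of the method described in this paper; a ratio well above 1 for every composer would correspond to a point lying outside all composer regions, as described for the second part of BWV 537.

```python
from typing import List, Sequence, Tuple
import numpy as np

def compare_to_composers(Z_ref: np.ndarray, labels: Sequence[str],
                         Z_disputed: np.ndarray) -> Tuple[List[str], np.ndarray]:
    """Nearest composer centroid per disputed point, plus distance/spread ratios.

    Z_ref: (n, 2) projected reference points; labels: composer per row;
    Z_disputed: (m, 2) projected segments of one disputed work.
    Each composer needs at least two distinct points so that the spread is non-zero.
    """
    labels = np.asarray(labels)
    names = sorted(set(labels.tolist()))
    centroids = np.array([Z_ref[labels == n].mean(axis=0) for n in names])
    # Typical spread: mean distance of a composer's own points to its centroid.
    spread = np.array([np.linalg.norm(Z_ref[labels == n] - c, axis=1).mean()
                       for n, c in zip(names, centroids)])
    dists = np.linalg.norm(Z_disputed[:, None, :] - centroids[None, :, :], axis=2)
    nearest = [names[i] for i in dists.argmin(axis=1)]
    return nearest, dists / spread

# Toy projected data (hypothetical coordinates, not taken from Figure 1).
Z_ref = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                  [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels = ["Bach"] * 4 + ["Krebs"] * 4
Z_disputed = np.array([[0.5, 0.5], [10.0, 10.5], [6.0, 6.0]])
nearest, ratios = compare_to_composers(Z_ref, labels, Z_disputed)
print(nearest)                           # ['Bach', 'Krebs', 'Krebs']
print((ratios.min(axis=1) > 3).tolist())  # [False, False, True]: last point is far from both
```

Reporting the spread ratio alongside the nearest composer guards against over-reading the nearest-centroid label: a segment can be nearest to one composer while still lying well outside that composer's cluster.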
4. Conclusion
Although the current results do not offer enough evidence to draw conclusions about the authorship of the compositions involved, it is clear that the proposed method can be very helpful in generating hypotheses about differences between personal styles, and thus in studying authorship problems.
References
[1] E. Backer and P. van Kranenburg, “On musical stylometry: a pattern recognition approach”, in Pattern Recognition Letters 26 (2005), 299–309.
[2] W. Breig, “Versuch einer Theorie der Bachschen Orgelfuge”, in Die Musikforschung 48 (1995), 14–52.
[3] P. Dirksen, “Het auteurschap van Praeludium en fuga in f (BWV 534)”, in Het Orgel 96 (2000), nr. 5, 5–14.
[4] D. Humphreys, “A Bach Polyglot: the A major Prelude & Fugue BWV 536”, in The Organ Yearbook XX (1989), 72–87.
[5] L.B. Meyer, Style and Music: Theory, History, and Ideology, Chicago, 1989.
[6] J. O’Donnell, “Mattheson, Krebs and the Fantasia & Fugue in C minor BWV 537”, in The Organ Yearbook XX (1989).
[7] W. Schmieder, Thematisch-systematisches Verzeichnis der musikalischen Werke von Johann Sebastian Bach. Bach-Werke-Verzeichnis, 2. überarbeitete und erweiterte Ausgabe, Wiesbaden, 1990.
[8] A. Webb, Statistical Pattern Recognition, 2nd ed., Chichester, 2002.
[9] P. Williams, “BWV 565: a toccata in D minor for organ by J. S. Bach?”, in Early Music 9 (1981), 330–337.