A Multi-Level Tonal Interval Space for Modelling Pitch Relatedness
and Musical Consonance
Corresponding first author: Gilberto Bernardes1 (
Second author: Diogo Cocharro1 (
Third author: Marcelo Caetano1 (
Fourth author: Carlos Guedes1,2 (
Fifth author: Matthew E. P. Davies1 (
1 INESC TEC (Sound and Music Computing Group)
FEUP campus
Rua Dr. Roberto Frias
4200 - 465 Porto
+351 22 209 42 17
2 New York University Abu Dhabi — Faculty of Music
PO Box 129188
Abu Dhabi, United Arab Emirates
+971 2 628 5240
This research was supported by the Project "NORTE-01-0145-FEDER-000020",
financed by the North Portugal Regional Operational Programme (NORTE 2020),
under the PORTUGAL 2020 Partnership Agreement, and through the European
Regional Development Fund (ERDF) and by the Portuguese Foundation for Science
and Technology under the post-doctoral grants SFRH/BPD/109457/2015 and
This is a preprint version of an article published by Taylor & Francis in Journal of
New Music Research on 27 May 2016 available online at
In this paper we present a 12-dimensional tonal space in the context of the Tonnetz,
Chew's Spiral Array, and Harte's 6-dimensional Tonal Centroid Space. The proposed
Tonal Interval Space is calculated as the weighted Discrete Fourier Transform of
normalized 12-element chroma vectors, which we represent as 6 circles covering the
set of all possible pitch intervals in the chroma space. By weighting the contribution
of each circle (and hence pitch interval) independently, we can create a space in
which angular and Euclidean distances among pitches, chords, and regions concur
with music theory principles. Furthermore, the Euclidean distance of pitch
configurations from the centre of the space acts as an indicator of consonance.
Keywords: tonal pitch space, consonance, tonal hierarchy.
1 Introduction
A number of tonal pitch spaces have been presented in the literature since the
18th century (Euler, 1739). These tonal spaces relate spatial distance with perceived
proximity among pitch configurations at three levels: pitches, chords, and regions (or
keys). For example, a tonal space that aims to minimise distances among
perceptually related pitch configurations should place the region of C major closer to
G major than to B major, because the first two regions are understood to be more
closely related within the Western tonal music context. For similar reasons, within the
C major region, a G major chord should be closer to a C major chord than to a D minor
chord, and the pitch G should be closer to A than to G#.
The intelligibility and high explanatory power of tonal pitch spaces usually
hide complex theories, which need to account for a variety of subjective and
contextual factors. To a certain extent, the large number of different, and sometimes
contradictory, tonal pitch spaces presented in the literature help us understand the
complexity of such representations. Existing tonal spaces can be roughly divided into
two categories, each anchored to a specific discipline and applied methods. On the
one hand we have models grounded in music theory (Weber, 1817-21; Lewin, 1987;
Cohn, 1997, 1998; Tymoczko, 2011), and on the other hand, models based on
cognitive psychology (Longuet-Higgins, 1962; Shepard, 1982; Krumhansl, 1990).
Tonal pitch spaces based on music theory rely on musical knowledge,
experience, and the ability to imagine complex musical structures to explain which of
these structures work. Tonal pitch spaces based on cognitive psychology intend to
capture the mental processes underlying musical activities such as listening,
understanding, performing, and composing tonal music by interpreting the results of
listening experiments. Despite their divergence in terms of specific methods and
goals, music theory and cognitive psychology tonal pitch spaces share the same
motivation to capture intuitions about the closeness of tonal pitch configurations,
which is an important aspect of our experience of tonal music (Deutsch, 1984).
Recent research has attempted to bridge the gap between these two approaches by
proposing models that share methods and compare results from both disciplines, such
as the contributions of Balzano (1980, 1982), Lerdahl (1988, 2001), and Chew (2000, 2008).
Both music theory and cognitive tonal pitch spaces have been implemented
computationally to allow computers to better model and generate sounds and music.
Among the computational problems that have been addressed by tonal pitch spaces
we can highlight key estimation (Chew, 2000, 2008; Temperley, 2001; Bernardes et
al., 2016), harmonic change detection (Harte et al., 2006; Peiszer, Lidy & Rauber,
2008), automatic chord recognition (Mauch, 2010), and algorithmic-assisted
composition (Gatzsche, Mehnert, & Stöcklmeier, 2008; Behringer & Elliot, 2009;
Bernardes et al., 2015).
Following research into tonal pitch spaces, we present the Tonal Interval
Space, a new tonal pitch space inspired by the Tonnetz, Chew’s (2000) Spiral Array,
and Harte et al.’s (2006) 6-dimensional (6-D) Tonal Centroid Space. We describe the
mathematical formulation of the Tonal Interval Space and we discuss properties of the
space related to music theory. The innovations introduced in this paper constitute a
series of controlled distortions of the chroma space calculated as the weighted
Discrete Fourier Transform (DFT) of normalized 12-element chroma vectors, in
which we can measure the proximity of multi-level pitch configurations and their
level of consonance.
Primarily, our approach extends the Tonnetz, as well as the work of Chew
(2000, 2008) and Harte et al. (2006) in four fundamental aspects. First, it offers the
ability to represent and relate pitch configurations at three fundamental levels of
Western tonal music, namely pitch, chord and region within a single space. Second,
we compute the space by means of the DFT and furthermore demonstrate how Harte
et al.’s 6-D space can also be calculated in this way. Third, it allows the calculation of
a tonal pitch consonance indicator. Fourth, it projects pitch configurations that have a
different representation in the chroma space as unique locations in our space—thus
expanding Harte et al.’s 6-D space to include all possible intervallic relations.
The remainder of this paper is structured as follows. In Section 2, we review
the problems and limitations of existing tonal pitch spaces. In Section 3, we detail the
three tonal pitch spaces most closely related to our work, which form the basis of our
approach. In Section 4, we describe the computation of Tonal Interval Vectors (TIVs)
that define the location of pitch configurations in a 12-dimensional (12-D) tonal pitch
space. In Section 5, we detail the representation of multi-level pitch configurations in
the space as well as the implications of the symmetry of the DFT for defining a
transposition-invariant space. In Section 6, we detail the distance metrics used in the
Tonal Interval Space. In Section 7, we describe a strategy to adjust the distances
among pitch configurations in the 12-D space in order to better represent music theory
principles. In Section 8, we discuss the relations among different pitch configurations
on three fundamental tonal pitch levels, namely pitch classes, chords, and regions and
we compare the effect of different DFT weightings on the measurement of
consonance. Finally, in Sections 9 and 10 we reflect on the original contributions of
our work, draw conclusions and propose future directions.
2 Tonal pitch spaces: existing approaches and current limitations
The relations among tonal pitch structures, fundamental to the study of tonal
pitch spaces, have been a research topic extensively investigated in different
disciplines including music theory (Weber, 1817-21; Schoenberg, 1969), psychology
(Deutsch, 1984), psychoacoustics (Parncutt, 1989), and music cognition (Longuet-
Higgins, 1962; Shepard, 1982; Krumhansl, 1990). Different models and
interpretations of the same phenomena have been presented in these disciplines. We
argue that their discrepancy is due to historical, cultural, and aesthetic factors.
Therefore, tonal pitch spaces cannot be disassociated from the context where they
have been presented, and, more importantly, their understanding requires exposure to
tonal schemes (Deutsch, 1984).
Even though recent cognitive psychology research has managed to reduce
confounding factors and offer a more general view on the subject of perceptual
proximity of tonal pitch (Krumhansl, 1990), it is not averse to the idiosyncratic factors
that regulate listening expectancies within the Western tonal music context. For
example, many musical idioms that exist at the edge of tonality are clearly
misrepresented by tonal spaces resulting from empirical studies, such as the post-
romantic works of Richard Strauss and Gustav Mahler (Kross, 2006). Therefore, it is
important to bear in mind that, whichever method is applied, a tonal space is only a
partial explanation of the entire Western tonal music corpus.
Given the limitations of tonal spaces to provide a universal explanation for the
cognitive foundations of pitch perception, related research must necessarily clarify
their basis, applied methodology, and most importantly their limitations. For the
purpose of this work we follow Lerdahl (2001) and most cognitive psychological
studies in the area, which position themselves in the extrapolation of ‘hierarchical
relations that accrue to an entire tonal system beyond its instantiation in a particular
piece’ (Lerdahl, 2001, p. 41). In other words, we are concerned with ‘tonal hierarchy’
that diverges from the concept of ‘event hierarchy’ (Bharucha, 1984) in the sense that
basic tonal structures of the first apply to the majority of Western tonal music rather
than a specific response to a particular style or composer’s idiom.
Tonal music structures result from the interaction of several levels of pitch
configuration, most importantly pitch, chords and regions (in increasing order of
abstraction). In the resulting tonal hierarchy, the upper levels embed lower ones and
all levels are inter-dependent. Therefore, as Lerdahl (2001) claims, a tonal pitch space
must account for the proximity of individual pitches, chords, and regions in the same
framework, as well as explain their interconnection.
In his Tonal Pitch Space theory, Lerdahl (2001) contextualizes all low-level
pitch configurations with top-level regions by representing pitch classes, chords, and
regions according to a similar method and all in the same space. Nevertheless, in
Lerdahl’s space, in order to represent low-level pitch configurations, we must define
the top-level region(s) to measure distances among their lower level pitch
configurations. Therefore, in order to measure the distance between two chords, for
example, we must define their region(s) in advance. Despite this compelling solution,
Lerdahl’s theory cannot be used in contexts where the regional level is unknown,
such as in Music Information Retrieval (MIR) problems like the automatic estimation
of keys and chords from a musical input. Therefore, we strive for a model that
explains all fundamental tonal pitch levels in a single space, without the need to
define a priori information.
Another commonly raised issue in the tonal pitch space literature, particularly
when discussing tonal spaces grounded in music theory, is their symmetry, which
does not equate with how humans perceive pitch distances (Krumhansl, 1990, pp.
119-123). However, the cyclical nature of the tonal system embeds operations like
transposition, which naturally create symmetrical spaces. By disregarding the cyclical
nature of the tonal system, we risk lacking an explanation for some of its most
fundamental operations. Additionally, the human ability to understand, abstract, and
group pitch contours invariant to their key or transposition factor stresses the
importance of relational distances that account for these operations, resulting in
symmetrical pitch space organisations (Shepard, 1982).
Figure 1. Representation of the Tonnetz or harmonic network, in which triangular
heavier strokes emphasise major/minor triads’ formation and shaded areas the
complete set of diatonic triads within the C major region—represented by their degree
in Roman numerals.
Consonance and dissonance, so closely related to the perception of pitch
proximity and musical tension, are poorly addressed in all theories supporting tonal
pitch spaces. Consonance and dissonance are at best implicitly considered in
Lerdahl’s (1988, 2001) Tonal Pitch Space, but never explicitly modelled (or
measurable) as a property of the space. Similarly, Krumhansl’s (1990, pp. 59-60)
analysis comparing Krumhansl and Kessler’s (1982) 24 major and minor key profiles
with several ratings of consonance and dissonance shows poor results for intervals
formed within the minor keys.
3 The Tonnetz and its derivations
The Tonnetz is a planar representation of pitch relations first attributed to the
eighteenth-century mathematician Leonhard Euler (Cohn, 1998). In its most
traditional representation, the Tonnetz organizes (equal-tempered) pitch on a
conceptual plane according to intervallic relations, favouring perfect fifths, major
thirds, and minor thirds (see Figure 1). Fifths run horizontally from left to right, minor
thirds run diagonally from bottom left to top right, and major thirds run diagonally
from top left to bottom right.
Despite its original basis as a pitch class space, the Tonnetz has been
extensively used as a chordal space since the 19th century by music theorists such as
Riemann and Oettingen and more recently by neo-Riemannian music theorists
(Lewin, 1987; Hyer, 1995; Cohn, 1997). Chords are represented on the Tonnetz as
patterns formed by adjacent pitches, whose shapes are constant for chords with the
same quality. For example, major triads always form a downward pointing triangle,
whereas minor triads always form an upward pointing triangle (see Figure 1).
Music theorists following the Riemannian tradition adopted the Tonnetz to
explain significant tonal relationships between harmonic functions, which are near
one another in the Tonnetz (Cohn, 1998). For example, the dominant and the
subdominant chords are at close distances on either side of the chord of the tonic in a
given key. In Figure 1, if we draw a horizontal line traversing the centre of the C
major region tonic (I) we find its dominant (V) and subdominant (IV) chords in the
neighbourhood and its parallel (C minor triad), mediant (iii), and submediant (vi)
chords in edge-adjacent triangles. Moreover, in the Tonnetz, chord distances also
equate with the number of common tones. The closer chord configurations are, the
greater their number of common tones. In addition to the large amount of music
theory literature on the Tonnetz, Krumhansl (1998) presented experimental support
for the psychological reality of one of its most important theoretical branches, the
neo-Riemannian theory.
Various derivations and models of the Tonnetz have been proposed. Of
interest here are those that have a mathematical formulation and that can be
computationally modelled, notably Chew’s (2000) Spiral Array and Harte et al.’s
(2006) 6-D space. Chew’s Spiral Array results from wrapping the Tonnetz into a tube
in which the line of fifths becomes a helix on its surface and major third intervals are
directly above each other. Chew’s model allows chords and keys to be projected into
the interior of the tube by the centre of mass of their constituent pitches.
The spatial location of pitches on the Spiral Array ensures that some pitch
configurations understood as perceptually related within the Western tonal music
context correspond to small Euclidean distances. That is, pitch distances are
minimised for intervals that play an important role in tonal music, such as unisons,
octaves, fifths, and thirds. These distances result from the helix representation of pitch
locations in the Tonnetz and from further defining the ratio of height to diameter, akin
to stretching out a spring coil. The Spiral Array has been applied to problems such as
key estimation (Chew, 2000) and pitch spelling (Chew & Chen, 2003) from music
encoded as symbolic data.
Following Chew’s research, Harte et al. (2006) proposed a tonal space that
projects pitch configurations encoded as 12-element chroma vectors to the interior of
a 6-D polytope visualised as three circles. Inter-pitch-class distances in the 6-D space
mirror the spatial arrangement of the perfect fifths, major thirds, and minor thirds of
the Tonnetz, weighted in a similar fashion to Chew’s Spiral Array to favour perfect
fifths and minor thirds over major thirds. The fundamental difference from Chew’s
Spiral Array is the possibility to represent harmonic information in a single octave by
invoking enharmonic equivalence. Distances between pitch configurations with
variable numbers of notes are represented in the space by the centroid of their
component pitches, whose distances emphasise harmonic changes in musical audio
(Harte et al., 2006). Additionally, the 6-D tonal space has been applied in a variety of
MIR problems, including chord recognition (Lee, 2007), key estimation (Lee &
Slaney, 2007) and structural segmentation (Peiszer, Lidy & Rauber, 2008).
In the following section, we introduce the Tonal Interval Space which inherits
features from the Tonnetz and its derivative spaces (Chew’s Spiral Array and Harte et
al.’s 6-D space), concerning the organisation of pitch classes. We extend Harte et al.’s
6-D space by including all intervallic relationships, reinforcing and controlling the
contribution of each interval in the space according to empirical consonance and
dissonance ratings (Malmberg, 1918; Kameoka & Kuriyagawa, 1969; Hutchinson &
Knopoff, 1979). This allows us to measure the interpreted proximity of pitch
configurations within Western tonal music at various levels of abstraction, as well
as their level of consonance, in a single space.
4 Tonal Interval Space
The Tonal Interval Space maps 12-D chroma vectors to complex-valued TIVs
with the DFT.1 On the one hand, the chroma vector can be used to represent different
levels of pitch configurations such as pitches, chords, and regions. On the other hand,
Fourier analysis has been widely used to explore the harmonic relations between pitch
classes, primarily to investigate intervallic differences between two pitch class sets
and expand on the notion of maximal evenness (Clough and Douthett, 1991; Lewin,
2001; Quinn, 2006, 2007; Callender, 2007; Amiot & Sethares, 2011; Amiot, 2013)
and to a lesser extent tonal pitch relations (Bernardes et al., 2015; Yust, 2015). In this
paper, we explore the effect of all coefficients of the DFT of chroma vectors,
including coefficients discarded by Harte et al. (2006) towards enhancing the
description of tonal pitch and the computation of a tonal pitch consonance indicator.
4.1. Chroma vectors
In this work, we restrict our analysis of musical notation to symbolic
representations, and hence we consider chroma vectors $c(n)$, which express the pitch
class content of pitch configurations as binary activations in a 12-element vector.
Each element corresponds to a pitch class of the equal-tempered chromatic scale. The
chroma vector $c(n)$ in Table 1 represents the C major chord, so it activates pitch
classes [0, 4, 7] with the value 1. Table 1 supposes the enharmonic and octave
equivalence characteristic of equal-tempered tuning. There is no information about
pitch height encoded in $c(n)$. Consequently, the octave cannot be represented by $c(n)$
with binary encoding because all the octaves are collapsed into one.
The chroma vector $c(n)$ allows the representation of multi-level pitch
configurations by simply indicating the presence of the respective pitch classes. For
example, the active pitch classes are [0] for the pitch class C, [2, 7, 11] for the
G major chord, and [0, 2, 4, 5, 7, 9, 11] for the diatonic C major scale (or the
A natural minor scale).
The chroma vector $c(n)$ occupies a 12-D space independently of the pitch
configuration it represents. However, the geometric properties of the space spanned
by the chroma vector do not capture harmonic or musical properties of the pitch
configurations that it represents. In other words, chroma vectors $c(n)$ that represent
perceptually similar harmonic relations are not necessarily close together in the space.
For example, consider the following three dyads: a minor second [0, 1], a major third
[0, 4] and a perfect fifth [0, 7]. All three share a single pitch class [0], and the
Euclidean distance between any two of their chroma representations is the same
($\sqrt{2}$); yet, from a perceptual standpoint, the minor second is further from the
other two. The DFT maps chroma vectors to TIVs in a space that exhibits useful properties to
explore the harmonic relationships of the tonal system, which we detail in Section 8.
Chroma vector c(n)   1   0   0   0   1   0   0   1   0   0   0   0
Position n           0   1   2   3   4   5   6   7   8   9   10  11
Pitch class          C   C#  D   D#  E   F   F#  G   G#  A   A#  B
Table 1. Chroma vector $c(n)$ representation of the C major chord.
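As a minimal sketch of the point made in Section 4.1, the following Python snippet builds binary chroma vectors and checks that the three example dyads are mutually equidistant in the raw chroma space. The helper name `chroma` and the use of NumPy are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def chroma(pitch_classes, size=12):
    """Return a binary chroma vector with 1.0 at every active pitch class."""
    c = np.zeros(size)
    for pc in pitch_classes:
        c[pc % size] = 1.0  # octave equivalence: fold every pitch into 0..11
    return c

c_major = chroma([0, 4, 7])        # C, E, G
minor_second = chroma([0, 1])
major_third = chroma([0, 4])
perfect_fifth = chroma([0, 7])

# Pairwise Euclidean distances between the three dyads in chroma space.
dyads = [minor_second, major_third, perfect_fifth]
distances = [
    float(np.linalg.norm(a - b))
    for i, a in enumerate(dyads)
    for b in dyads[i + 1:]
]

print(c_major.astype(int).tolist())  # [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(distances)                     # three equal values, each sqrt(2)
```

Every pair of dyads differs in exactly two positions, so the chroma space alone cannot separate the minor second from the more consonant intervals.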
4.2 Tonal interval vectors
1 The use of the DFT in the context of our work was inspired by Ueda et al. (2010), who identified a
correspondence between the DFT coefficients of a chroma vector and Harte et al.’s (2006) 6-D space.
TIVs $T(k)$ are calculated as the DFT of the chroma vector $c(n)$ as follows
$$T(k) = w(k) \sum_{n=0}^{N-1} \bar{c}(n)\, e^{-j 2\pi k n / N}, \quad k \in \mathbb{Z}, \quad \text{with} \quad \bar{c}(n) = \frac{c(n)}{\sum_{n=0}^{N-1} c(n)}, \qquad (1)$$
where $N = 12$ is the dimension of the chroma vector and $w(k)$ are weights derived
from empirical dissonance ratings of dyads used to adjust the contribution of each
dimension $k$ of the space, which we detail at length in Section 7. $T(k)$ uses $\bar{c}(n)$,
which is $c(n)$ normalized by the DC component $T(0) = \sum_{n=0}^{N-1} c(n)$ to allow the
representation of all levels of tonal pitch represented by $c(n)$ in the same space. In
doing so, $T(k)$ can be compared amongst different hierarchical levels of tonal pitch.
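Equation (1) can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function name `tiv` is hypothetical, the weights are those reported in Section 7, and we assume that `numpy.fft.fft` (which uses the negative-exponent convention) is an acceptable stand-in for the DFT.

```python
import numpy as np

# Weights w(k) for k = 1..6, as reported in Section 7.
WEIGHTS = np.array([2, 11, 17, 16, 19, 7], dtype=float)
N = 12  # dimension of the chroma vector

def tiv(pitch_classes, weights=WEIGHTS):
    """Tonal Interval Vector of a pitch configuration (sketch of Equation 1)."""
    c = np.zeros(N)
    for pc in pitch_classes:
        c[pc % N] = 1.0
    c_bar = c / c.sum()              # normalize by the DC component
    spectrum = np.fft.fft(c_bar)     # sum over n of c_bar(n) * exp(-j 2 pi k n / N)
    return weights * spectrum[1:7]   # keep coefficients k = 1..6 only

t_c = tiv([0])               # single pitch class C
t_c_major = tiv([0, 4, 7])   # C major chord

print(np.round(t_c, 3))
print(np.round(t_c_major, 3))
```

For a single pitch class at position 0 every DFT coefficient equals 1, so the TIV reduces to the weights themselves. This shows that isolated pitch classes lie on the circumference of each circle.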
Equation (1) can be interpreted from the point of view of Fourier analysis or
complex algebra. The Fourier view is useful to visualize TIVs and interpret $k$ as
musical intervals, whereas the algebra view (explained in Section 6) is used to define
objective measures that capture perceptual features of the pitch sets represented by the
TIVs. The Fourier view interprets $T(k)$ as a sequence of complex numbers with
$k \in \mathbb{Z}$. When $0 \leq n \leq 11$, $k$ is usually set to $0 \leq k \leq 11$. In practice, we use
$1 \leq k \leq 6$ for $T(k)$ since, for a real-valued chroma vector, the DFT coefficients for
$7 \leq k \leq 11$ are the complex conjugates of those for $1 \leq k \leq 5$ and therefore carry no
additional information, owing to the symmetry properties of the DFT (Oppenheim et al.,
1989). In this section, $T(k)$ is represented as magnitude $|T(k)|$ versus $k$ and phase
$\varphi(k)$ versus $k$. For each index $k$, we have
$$|T(k)| = \sqrt{\Re\{T(k)\}^2 + \Im\{T(k)\}^2}, \quad 1 \leq k \leq 6,$$
where $\Re\{T(k)\}$ and $\Im\{T(k)\}$ denote the real and imaginary parts of $T(k)$.
The Tonal Interval Space uses the interpretation in Table 2, which we show in
Figure 2, using a strategy borrowed from Harte et al. (2006) to depict their 6-D space.
Each circle in Figure 2 corresponds to $T(k)$ when $1 \leq k \leq 6$ in Equation (1). The
circle representing the intervals of m2/M7 has the real part of $T(1)$ on the x axis and
the imaginary part of $T(1)$ on the y axis and so on. The integers around each circle
represent $0 \leq n \leq N-1$ for $N = 12$, corresponding to the positions in the chroma
vector $c(n)$. A fixed $k$ in Equation (1) generates $N = 12$ points equally spaced by
$\varphi(k) = -2\pi k / N$. Both in Table 2 and Figure 2, a musical nomenclature is adopted to
denote each of the DFT coefficients that arise from the interpretation of these points
as musical intervals. The musical interpretation assigned to each coefficient
corresponds to the musical interval that is furthest from the origin of the plane (i.e.,
the centre of the circles shown in Figure 2). For $k=1$ and $k=5$, the furthest musical
interval from the centre is formed between adjacent positions. For $k=2$, $k=3$,
$k=4$, and $k=6$, the furthest interval from the centre is formed between overlapping
Figure 2. Visualisation of the TIV for the C major chord (pitch classes 0, 4, and 7) in
the 12-D Tonal Interval Space. Each circle corresponds to the coefficients of 𝑇(𝑘)
labelled according to the complementary musical intervals they represent. The TIVs
of isolated pitch classes lie on the circumference and the TIV corresponding to the
linear combination lies inside the region bounded by the straight lines connecting the
points. Shaded grey areas denote the regions that TIVs can occupy for each circle.
Position k         1       2            3       4       5       6
Steps n            1       6            4       3       5       2
Musical interval   m2/M7   TT (A4/D5)   M3/m6   m3/M6   P4/P5   M2/m7
Table 2. Intervallic interpretation of $k$ for $T(k)$.
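The intervallic reading of each coefficient can be checked numerically. The sketch below computes, for each $k$, the unweighted magnitude of every normalized dyad {0, n} and reports the interval whose TIV component lies furthest from the centre. Two assumptions are ours: only interval classes 1 to 6 are scanned (complementary intervals give identical magnitudes), and ties are broken in favour of the smallest interval. The function names are hypothetical.

```python
import numpy as np

N = 12

def dyad_magnitude(k, interval):
    """|T(k)| of the normalized, unweighted dyad {0, interval}."""
    return abs(1 + np.exp(-2j * np.pi * k * interval / N)) / 2

def peak_interval(k):
    """Smallest interval class (in semitones) with maximal magnitude on circle k."""
    mags = {n: dyad_magnitude(k, n) for n in range(1, 7)}
    best = max(mags.values())
    # Assumption: several intervals can tie (e.g. n = 3 and n = 6 for k = 4);
    # we keep the smallest one.
    return min(n for n, m in mags.items() if np.isclose(m, best))

peaks = {k: peak_interval(k) for k in range(1, 7)}
print(peaks)  # {1: 1, 2: 6, 3: 4, 4: 3, 5: 5, 6: 2}
```

Under these assumptions the peaks fall on 1, 6, 4, 3, 5, and 2 semitones, which matches the interval labels m2/M7, TT, M3/m6, m3/M6, P4/P5, and M2/m7 attached to the six weights in Section 7.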
5 Multi-level pitch configurations and transposition
Section 4.1 demonstrated that the chroma vector $c(n)$ can represent multi-level
pitch configurations as the sum of $c(n)$ for each single pitch class. For example,
the chroma vector of the C major chord can be obtained as the sum of the chroma
vectors of its constituent pitch classes C, E, and G. Mathematically,
$c_{C,E,G}(0,4,7) = c_C(0) + c_E(4) + c_G(7)$. Due to the linearity of the DFT, multi-level pitch
configurations in the Tonal Interval Space can be represented as a linear combination
of the DFTs of their component pitch classes. Mathematically, for the C major chord,
$T_{C,E,G}(k) = \tfrac{1}{3}\left[T_C(k) + T_E(k) + T_G(k)\right]$, where the factor $1/3$ follows
from the normalization of $c(n)$ in Equation (1).
Figure 2 illustrates $T(k)$ for the C major chord as a convex combination of
$T(k)$ for its component pitch classes. Convex combinations are linear combinations
$\sum_i \alpha_i T_i(k)$ where the coefficients $\alpha_i$ are non-negative (i.e., $\alpha_i \geq 0$) and
$\sum_i \alpha_i = 1$. Geometrically, a convex combination always lies within the region
bounded by the elements being combined. So the convex combination of TIVs lies
inside the shaded regions shown in Figure 2 due to the normalization of $c(n)$ in
Equation (1). These regions can be obtained by connecting the adjacent TIVs of
isolated pitch classes.
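The convex-combination property can be verified directly. The sketch below, which assumes the weights reported in Section 7 and a hypothetical helper `tiv`, checks that the TIV of the C major chord equals the equally weighted average of the TIVs of C, E, and G, and that each of its components stays inside the corresponding circle of radius $w(k)$.

```python
import numpy as np

WEIGHTS = np.array([2, 11, 17, 16, 19, 7], dtype=float)  # w(k), k = 1..6 (Section 7)
N = 12

def tiv(pitch_classes):
    """Sketch of Equation (1): weighted DFT of the normalized chroma vector."""
    c = np.zeros(N)
    for pc in pitch_classes:
        c[pc % N] = 1.0
    return WEIGHTS * np.fft.fft(c / c.sum())[1:7]

chord = tiv([0, 4, 7])
parts = [tiv([pc]) for pc in (0, 4, 7)]

# Convex combination with equal coefficients alpha = 1/3 (they sum to 1).
combination = sum(parts) / 3

print(np.allclose(chord, combination))          # True
print(np.all(np.abs(chord) <= WEIGHTS + 1e-9))  # True: inside every circle
```

The equal coefficients arise because a binary chroma vector with three active pitch classes is divided by 3 during normalization.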
An important feature of Western tonal music arising from 12-tone equal-tempered
tuning is the possibility to modulate across regions. This attribute
establishes hierarchies in tonal pitch, in which low-level components relate to, and are
commonly defined by their regional level. For example, we commonly define the
chords formed by the diatonic pitch set of C major region by the function they play
within that region, such as the chords of the tonic, sub-dominant, dominant, etc.
Perceptually, Western listeners also understand interval relations in different regions
as analogous (Deutsch, 1984). For example, the intervals from C to G in C major and
from C# to G# in C# major are perceived as equivalent. As Shepard (1982) claims,
this theoretical and perceptual aspect of Western tonal music is an important attribute
that should be modelled by tonal pitch spaces, which ‘must have properties of great
regularity, symmetry, and transformational invariance’ (p. 350). Briefly, a tonal space
must be transposition invariant.
In the Tonal Interval Space, transpositions by $p$ semitones result in rotations of
$T(k)$ by $\varphi(p) = -2\pi k p / N$ radians. Transpositions of $c(n)$, which by definition are
circular in the chroma domain, are represented as $c(n-p)$. So, transposing C by
$p=7$ results in G and by $p=12$ results in C. Using the properties of the Fourier
transform (Oppenheim et al., 1989), the pair $c(n) \leftrightarrow T(k)$ becomes
$c(n-p) \leftrightarrow T(k)\, e^{-j 2\pi k p / N}$, where $\leftrightarrow$ represents the DFT. Denoting $T_p(k)$
as the TIV of $c(n-p)$, we have
$$T_p(k) = T(k)\, e^{-j 2\pi k p / N}.$$
Hence, any transposition $c(n-p)$ resulting in $T_p(k)$ has the same magnitude $|T(k)|$
as the original sequence $c(n)$ and a linear phase component $e^{-j 2\pi k p / N}$. Figure 3
illustrates the rotation of the TIV of the C major chord by one semitone.
Figure 3. Visualisation of the C major triad (pitch classes 0, 4, and 7, represented as
a square) and the rotation of its TIV to transpose it one semitone higher (i.e. pitch
classes 1, 5, and 8, represented as a star).
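The equivalence between transposition and rotation can be checked with a short sketch. We assume the negative-exponent DFT convention of `numpy.fft.fft` and the weights reported in Section 7; the helper names `tiv` and `rotate` are hypothetical.

```python
import numpy as np

WEIGHTS = np.array([2, 11, 17, 16, 19, 7], dtype=float)  # w(k), k = 1..6 (Section 7)
N = 12

def tiv(pitch_classes):
    """Sketch of Equation (1): weighted DFT of the normalized chroma vector."""
    c = np.zeros(N)
    for pc in pitch_classes:
        c[pc % N] = 1.0
    return WEIGHTS * np.fft.fft(c / c.sum())[1:7]

def rotate(t, p):
    """Rotate each coefficient k of a TIV by -2 pi k p / N radians."""
    k = np.arange(1, 7)
    return t * np.exp(-2j * np.pi * k * p / N)

c_major = [0, 4, 7]
p = 1  # transpose one semitone higher, as in Figure 3

transposed = tiv([pc + p for pc in c_major])  # TIV of pitch classes 1, 5, 8
rotated = rotate(tiv(c_major), p)

print(np.allclose(transposed, rotated))                        # True
print(np.allclose(np.abs(transposed), np.abs(tiv(c_major))))   # True
```

Because only the phase changes, any quantity computed from the magnitudes alone, such as the norm used later as a consonance indicator, is unaffected by transposition.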
6 Distance metrics in the Tonal Interval Space
This section illustrates the properties of the Tonal Interval Space which rely on
the complex algebra view of Equation (1), where $T(k) \in \mathbb{C}^M$; $M=6$. Here, $T(k)$ is
interpreted as a 6-D complex-valued vector in the space spanned by the Fourier basis
when $1 \leq k \leq 6$. Note that 6 complex dimensions correspond to 12 real dimensions
because the real and imaginary axes are orthogonal. Using the $L_2$ norm in $\mathbb{C}^M$, we can
define the inner product between $T_1(k)$ and $T_2(k)$, the norm of $T_1(k)$, and the
Euclidean distance between $T_1(k)$ and $T_2(k)$ as follows
$$T_1(k) \cdot T_2(k) = \|T_1(k)\|\,\|T_2(k)\| \cos\theta = T_2^{H}(k)\, T_1(k), \qquad (5)$$
$$\|T_1(k) - T_2(k)\| = \sqrt{\sum_{k=1}^{M} |T_1(k) - T_2(k)|^2}, \qquad (6)$$
$$\|T_1(k)\| = \sqrt{T_1(k) \cdot T_1(k)} = \sqrt{\sum_{k=1}^{M} |T_1(k)|^2}, \qquad (7)$$
where $M = 6$ is the dimension of the complex space, $\theta$ is the angle between $T_1(k)$
and $T_2(k)$, and $T_2^{H}(k)$ denotes the conjugate transpose of $T_2(k)$. Equation (5) is the
inner product and Equation (6) is the Euclidean distance between $T_1(k)$ and $T_2(k)$.
Equation (7) is the norm of $T_1(k)$, which can also be calculated as the Euclidean
distance from the centre of the Tonal Interval Space $0$ as $\|T_1(k) - 0\|$.
We use Equations (5), (6), and (7) within the Tonal Interval Space in order to
measure tonal pitch relations and consonance using complex algebra. The musical
interpretation of the algebraic properties is detailed at length in Section 8.
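The three measures of this section can be sketched as follows. One detail is our own assumption rather than something stated in the text: to obtain a real-valued angle we take the real part of the complex inner product, which amounts to treating the 6 complex dimensions as 12 real ones. The helper names and the use of the Section 7 weights are likewise illustrative.

```python
import numpy as np

WEIGHTS = np.array([2, 11, 17, 16, 19, 7], dtype=float)  # w(k), k = 1..6 (Section 7)
N = 12

def tiv(pitch_classes):
    """Sketch of Equation (1): weighted DFT of the normalized chroma vector."""
    c = np.zeros(N)
    for pc in pitch_classes:
        c[pc % N] = 1.0
    return WEIGHTS * np.fft.fft(c / c.sum())[1:7]

def inner(t1, t2):
    """Inner product T2^H T1 (Equation 5); np.vdot conjugates its first argument."""
    return np.vdot(t2, t1)

def norm(t):
    """L2 norm of a TIV (Equation 7), i.e. its distance from the centre."""
    return float(np.linalg.norm(t))

def euclidean(t1, t2):
    """Euclidean distance between two TIVs (Equation 6)."""
    return float(np.linalg.norm(t1 - t2))

def angle(t1, t2):
    """Angle between two TIVs; assumption: uses the real part of the inner product."""
    cos_theta = inner(t1, t2).real / (norm(t1) * norm(t2))
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

c_major, g_major = tiv([0, 4, 7]), tiv([2, 7, 11])
print(euclidean(c_major, g_major), angle(c_major, g_major), norm(c_major))
```

With this choice of angle, the three quantities are tied together by the law of cosines, so angular and Euclidean distances remain mutually consistent.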
7 Improving the perceptual basis of the space
Following Chew (2000) and Harte et al. (2006), we apply a strategy to adjust
pitch class distances in our space. To this end, we apply weights $w(k)$ to each circle
when calculating $T(k)$ using Equation (1). By controlling the weights we can regulate
the contribution of the musical intervals associated with each of the DFT coefficients,
as described in Section 4. Specifically, we intend to use the weights as a means to
allow the computation of consonance of pitch configurations in the Tonal Interval
Space, which we calculate as the norm of a TIV (see Section 6).
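As an illustration of consonance computed as the norm of a TIV, the sketch below evaluates the six dyad classes and three triads using the weights reported at the end of this section. The function names and dictionary labels are hypothetical, and no empirical ratings are reproduced here; the snippet only shows how the indicator is obtained.

```python
import numpy as np

WEIGHTS = np.array([2, 11, 17, 16, 19, 7], dtype=float)  # w(k), k = 1..6
N = 12

def tiv(pitch_classes):
    """Sketch of Equation (1): weighted DFT of the normalized chroma vector."""
    c = np.zeros(N)
    for pc in pitch_classes:
        c[pc % N] = 1.0
    return WEIGHTS * np.fft.fft(c / c.sum())[1:7]

def consonance(pitch_classes):
    """Consonance indicator: distance of the TIV from the centre of the space."""
    return float(np.linalg.norm(tiv(pitch_classes)))

dyads = {"m2/M7": [0, 1], "M2/m7": [0, 2], "m3/M6": [0, 3],
         "M3/m6": [0, 4], "P4/P5": [0, 5], "TT": [0, 6]}
triads = {"maj": [0, 4, 7], "min": [0, 3, 7], "sus4": [0, 5, 7]}

dyad_consonance = {name: consonance(pcs) for name, pcs in dyads.items()}
triad_consonance = {name: consonance(pcs) for name, pcs in triads.items()}

for name, value in sorted(dyad_consonance.items(), key=lambda kv: -kv[1]):
    print(f"{name:6s} {value:.2f}")
for name, value in triad_consonance.items():
    print(f"{name:6s} {value:.2f}")
```

Under these assumptions the P4/P5 dyad lies furthest from the centre and the m2/M7 dyad closest to it. Major and minor triads receive identical values because their chroma vectors contain the same relative intervals.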
We rely on two complementary sources of information to derive the set of
weights: first, the set of composite consonance ratings of dyads (Huron, 1994), as
shown in Table 3, and second, the relative ordering of triads according to increasing
dissonance (Cook et al., 2007): {maj/min, sus4, aug, dim}.2 Our goal is to find a set of
weights which both maximises the linear correlation with Huron’s composite
consonance ratings of dyads and preserves Cook et al.’s relative
ordering of triads.3 While the search for weights can be considered a
multidimensional optimization problem, by applying two simplifying constraints we
can perform an exhaustive brute-force search and thus consider all possible
combinations of weights. In this way, we can guarantee a near-optimal result subject
to our constraints.
2 Because the pitch configurations for major and minor triads contain identical relative intervals when
represented as chroma vectors, the Tonal Interval Space cannot disambiguate them, hence we must
consider them equally ranked.
3 While Roberts (1986) provides consonance ratings of triads, these were obtained from an
experimental design that relied heavily on a preceding musical context and not listener judgements of
To allow a computationally tractable search for weights, we restrict the
properties of the weights as follows: we allow only integer values in a defined range,
such that each $w(k)$ can only take values between 1 and 20. Consequently, this creates
a secondary constraint that the largest weight (i.e. the most important interval) can be,
at most, 20 times the smallest (i.e. the least important interval). Given that each
individual weight $w(k)$ can take any value between 1 and 20 independently of all the
others, this provides a total of $B = 20^6 - 1$ possible combinations of weights (i.e. 64
million) ranging from $w_1(k) = \{1,1,1,1,1,1\}$ to $w_B(k) = \{20,20,20,20,20,20\}$. For
simplicity, we do not discard any sets of weights that are trivially related to one
another in terms of scalar multiples.
For each set of weights 𝑤(𝑘) we first calculate the corresponding TIV,
𝑇(𝑘), using Equation (1) for each dyad interval in Table 3 and the following set of
triads {maj, sus4, aug, dim}. We then calculate the consonance (i.e. the distance to the
centre of the Tonal Interval Space) as the magnitude ‖𝑇(𝑘)‖ using Equation (7). We
then measure the linear correlation to Huron’s dyad consonance ratings, and verify the
ordering of the triads’ consonance according to Cook et al. From the complete set of
20⁶ − 1 combinations of weights, we found 46 solutions (each of which is plotted in
Figure 4) that resulted in a linear correlation greater than .995 and preserved the triad
consonance ordering. Given the inherent similarity in shape of the different sets of
weights, we do not believe the exact choice among them to be critical.
However, we ultimately selected the weights with the greatest mutual separation
between the triads according to consonance, thus 𝑤(𝑘) = {2 (m2/M7), 11 (TT), 17
(M3/m6), 16 (m3/M6), 19 (P4/P5), 7 (M2/m7)}.
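The evaluation of one candidate set of weights, and the exhaustive scan built on it, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Equation (1) is taken to be the weighted DFT (coefficients k = 1..6) of a binary chroma vector normalized to unit sum, as the abstract describes, and Equation (7) is taken to be the Euclidean norm. The names `consonance`, `pearson`, `evaluate` and `search` are hypothetical, and the dyad ratings of Table 3 and the triad ordering are passed in as arguments rather than hard-coded.

```python
import cmath
import itertools
import math

# Interval classes 1..6 as dyads above C, and the triads used in the ordering test.
DYADS = [(0, p) for p in range(1, 7)]
TRIADS = {"maj": (0, 4, 7), "sus4": (0, 5, 7), "aug": (0, 4, 8), "dim": (0, 3, 6)}


def consonance(pitch_classes, weights):
    """Norm of the weighted DFT (k = 1..6) of a unit-sum binary chroma vector (assumed Eqs. 1 and 7)."""
    m = len(pitch_classes)
    total = 0.0
    for k, w in enumerate(weights, start=1):
        coeff = sum(cmath.exp(-2j * math.pi * k * n / 12) for n in pitch_classes) / m
        total += (w * abs(coeff)) ** 2
    return math.sqrt(total)


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def evaluate(weights, dyad_ratings, triad_order):
    """Return (r against the dyad ratings, True if consonance strictly decreases along triad_order)."""
    r = pearson([consonance(d, weights) for d in DYADS], dyad_ratings)
    levels = [consonance(TRIADS[name], weights) for name in triad_order]
    return r, all(a > b for a, b in zip(levels, levels[1:]))


def search(dyad_ratings, triad_order, lo=1, hi=20, min_r=0.995):
    """Scan every integer weight set in [lo, hi]^6 and yield those passing both criteria."""
    for weights in itertools.product(range(lo, hi + 1), repeat=6):
        r, ordered = evaluate(weights, dyad_ratings, triad_order)
        if ordered and r > min_r:
            yield weights, r
```

Here `dyad_ratings` must list the Table 3 values in interval-class order (m2/M7, M2/m7, m3/M6, M3/m6, P4/P5, TT). A scan over the full default range is slow in pure Python, so narrowing `lo` and `hi` is a practical way to try the sketch.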
Table 3. Composite consonance ratings based on normalized data from Malmberg
(1918), Kameoka and Kuriyagawa (1969), and Hutchinson and Knopoff (1979) (as
presented in Huron, 1994).
isolated triads. Therefore we do not attempt to directly incorporate these absolute ratings when
determining the weights 𝑤(𝑘).
Figure 4. The set of weights that maximise the linear correlation with the composite
consonance ratings of dyads as shown in Table 3 while simultaneously preserving the
relative ordering of triads shown in Table 7. The bold black line corresponds to the set
of weights 𝑤(𝑘) used in Equation (1).
8 Musical properties of the multi-level Tonal Interval Space
Pitch configurations are separated in the 12-D tonal pitch space by spatial and
angular distances whose metrics were presented in Section 6. In this section, we
discuss how these distances translate into musical properties within the most salient
hierarchical layers of the tonal system from lower to higher levels of abstraction, i.e.,
starting with the spatial relations between pitch classes, then chords, and finally keys
or regions.
The musical properties of the Tonal Interval Space can be split into two major
groups. The first is detailed in Section 8.1 and reports the ability of the space to place
pitch configurations that share harmonic relations close to one another. The second is
reported in Section 8.2 and explains how ‖𝑇(𝑘)‖ can be used as a measure of
consonance.
8.1 Perceptual similarity among multi-level pitch configurations
Proximity in the Tonal Interval Space equates with how pitch structures are
understood within the Western tonal music context rather than objective pitch
frequency ratios. In other words, the closeness between pitch classes in our space
corresponds to interpreted proximity between pitch classes as used in the context of
Western tonal music rather than distances on a keyboard. For example, pitches placed
at a close distance on the keyboard, such as C and C#, are quite distant in our space.
In fact, objective frequency ratios among pitch classes are immediately
misrepresented in the chroma vector by collapsing all octaves into one, and even
further distorted in the weighted DFT of chroma vectors expressed by the TIVs.
Table 4. Angular (𝜃, in rad) and Euclidean distances of complementary dyads in the Tonal
Interval Space presented from left to right in descending order of consonance.
Similar to Harte et al.’s (2006) 6-D space, the resulting structure of the Tonal
Interval Space inherits the pitch organization of the Tonnetz by wrapping the plane
into a toroid (see Harte et al., 2006, for a detailed explanation and illustration of this
operation). Therefore, in the Tonal Interval Space, as in the Tonnetz, the proximity of
dyads under both the angular and Euclidean distances computed by Equations (5) and
(6) is ranked as follows: unisons; perfect fourths/fifths; minor thirds/major sixths;
major thirds/minor sixths; tritone (augmented fourth or diminished fifth); major
seconds/minor sevenths; and finally, minor seconds/major sevenths (see Table 4).
Additionally, as a result of the symmetry of the Tonal Interval Space imposed by the
DFT, complementary intervals are at equidistant locations (see Section 5).
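The dyad ranking described above can be checked numerically with a short sketch. It assumes that Equation (1) is the weighted DFT (k = 1..6) of a unit-sum binary chroma vector, that Equation (6) is the ordinary Euclidean distance, and that Equation (5) is the angle obtained from the real inner product of two TIVs. The helper names `tiv`, `euclidean` and `angular` are hypothetical.

```python
import cmath
import math

WEIGHTS = (2, 11, 17, 16, 19, 7)  # w(k) for k = 1..6, as selected in Section 7


def tiv(pitch_classes, weights=WEIGHTS):
    """Weighted DFT coefficients k = 1..6 of a unit-sum binary chroma vector (assumed Eq. 1)."""
    pcs = sorted({pc % 12 for pc in pitch_classes})
    return [w * sum(cmath.exp(-2j * math.pi * k * n / 12) for n in pcs) / len(pcs)
            for k, w in enumerate(weights, start=1)]


def euclidean(a, b):
    """Euclidean distance between two TIVs in the 12-D space."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


def angular(a, b):
    """Angle in radians between two TIVs, from their real inner product."""
    dot = sum((x * y.conjugate()).real for x, y in zip(a, b))
    norms = math.sqrt(sum(abs(x) ** 2 for x in a)) * math.sqrt(sum(abs(y) ** 2 for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norms)))


INTERVALS = ["unison", "m2/M7", "M2/m7", "m3/M6", "M3/m6", "P4/P5", "TT"]
c = tiv([0])

# Rank the interval classes 0..6 by the distance between C and the other pitch class.
ranking = sorted(range(7), key=lambda p: euclidean(c, tiv([p])))
print([INTERVALS[p] for p in ranking])
```

Under these assumptions the Euclidean and angular rankings coincide, and an interval and its complement (for example 5 and 7 semitones) lie at the same distance from C.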
At the chordal level, major and minor triads group pitch classes that lie close
together in the space, matching the triads commonly highlighted in the Tonnetz
(see Figure 1). Therefore, motions between adjacent triads
in the Tonal Interval Space indicate a chord progression that maximises the number of
common tones while minimising the displacement of moving voices (known as
voice-leading parsimony). In every region, the mediant and submediant chords, which
each share two pitch classes with the tonic, lie in the neighbourhood of the tonic
chord. Neo-Riemannian theorists refer to these motions as primary transformations
(Cohn, 1997, 1998). Chords that share fewer pitch classes are placed
further apart in the space.
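As an illustration of this common-tone logic, the sketch below lists the triads closest to C major among all 24 major and minor triads. It assumes the same reading of Equation (1) as a weighted DFT of a unit-sum chroma vector and uses Euclidean distance; `tiv`, `euclidean` and `nearest_triads` are hypothetical helper names.

```python
import cmath
import math

WEIGHTS = (2, 11, 17, 16, 19, 7)  # w(k) for k = 1..6, as selected in Section 7
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def tiv(pitch_classes, weights=WEIGHTS):
    """Weighted DFT coefficients k = 1..6 of a unit-sum binary chroma vector (assumed Eq. 1)."""
    pcs = sorted({pc % 12 for pc in pitch_classes})
    return [w * sum(cmath.exp(-2j * math.pi * k * n / 12) for n in pcs) / len(pcs)
            for k, w in enumerate(weights, start=1)]


def euclidean(a, b):
    """Euclidean distance between two TIVs."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


# All 24 major and minor triads, keyed by name ("C", "Cm", ...).
triads = {}
for root, name in enumerate(NOTE_NAMES):
    triads[name] = tiv([root, root + 4, root + 7])
    triads[name + "m"] = tiv([root, root + 3, root + 7])


def nearest_triads(name, count=3):
    """Names of the `count` triads closest to the named triad."""
    ref = triads[name]
    others = [n for n in triads if n != name]
    return sorted(others, key=lambda n: euclidean(ref, triads[n]))[:count]


print(nearest_triads("C"))
```

With the weights of Section 7, the three nearest triads are A minor, E minor and C minor, which are exactly the triads sharing two pitch classes with C major.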
Within the context of Western tonal music, we can also say that chords with close
harmonic functions, which can substitute for one another, are placed near each other
in our space. Typical harmonic progressions in Western tonal music remain at relatively
close distances but are not explicitly minimised in the space. Briefly, the chordal
level in our space minimises distances for common-tone chord progressions, whose
chords commonly substitute for one another, rather than for typical harmonic sequences.
Therefore, the Tonal Interval Space shows great potential to explore voice-
leading parsimony (as applied in Bernardes et al., 2015) and offers the possibility to
explore formal transformations that have been derived from Riemann's fundamental
harmonic theory (Lewin, 1982, 1987, 1992; Hyer 1995; Kopp, 1995; Mooney 1996;
Cohn 1997, 1998).
Figure 5. 2-D visualisation of the interkey distances in the Tonal Interval Space
amongst all major and minor regions using multidimensional scaling (De Leeuw &
Mair, 2009).4 The neighbour dominant (D), subdominant (SD), and relative (R)
regions of C major are emphasised.
Interkey distances in the Tonal Interval Space result in two concentric layers
which position keys by intervals of fifths. The outer layer (corresponding to vectors
with larger magnitude) contains the circle of fifths for all major keys and an inner
layer (corresponding to vectors with smaller magnitude) contains the circle of fifths
for all minor keys. Figure 5 illustrates interkey distances in a 2-D space. There, the
spatial proximity of each key to its dominant, subdominant and relative keys
corresponds to our expectation of the proximity between the 24 major and minor keys
and adheres to Schoenberg’s (1969) map of key regions, which is a geometrical
representation of proximity between keys (Lerdahl, 1988, 2001).
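The two-layer arrangement of keys can be explored with the sketch below. It rests on an assumption that should be stated loudly: a key TIV is computed here as the TIV of the region's pitch-class set (the major scale, or the harmonic minor scale as in Table 5), which may differ from the construction used for Figure 5. Equation (1) is again read as the weighted DFT of a unit-sum chroma vector, and `tiv`, `euclidean`, `norm` and `key_tiv` are hypothetical names.

```python
import cmath
import math

WEIGHTS = (2, 11, 17, 16, 19, 7)  # w(k) for k = 1..6, as selected in Section 7
MAJOR = (0, 2, 4, 5, 7, 9, 11)
HARMONIC_MINOR = (0, 2, 3, 5, 7, 8, 11)


def tiv(pitch_classes, weights=WEIGHTS):
    """Weighted DFT coefficients k = 1..6 of a unit-sum binary chroma vector (assumed Eq. 1)."""
    pcs = sorted({pc % 12 for pc in pitch_classes})
    return [w * sum(cmath.exp(-2j * math.pi * k * n / 12) for n in pcs) / len(pcs)
            for k, w in enumerate(weights, start=1)]


def euclidean(a, b):
    """Euclidean distance between two TIVs."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


def norm(t):
    """Distance of a TIV from the centre of the space."""
    return math.sqrt(sum(abs(x) ** 2 for x in t))


def key_tiv(tonic, scale):
    """TIV of a region, assumed here to be the TIV of its pitch-class set."""
    return tiv([tonic + step for step in scale])


c_major = key_tiv(0, MAJOR)

# Tonics of the two major keys closest to C major.
nearest_major = sorted(range(1, 12), key=lambda t: euclidean(c_major, key_tiv(t, MAJOR)))[:2]
print(sorted(nearest_major))

# Major keys lie further from the centre than (harmonic) minor keys.
print(norm(c_major) > norm(key_tiv(0, HARMONIC_MINOR)))
```

Under this assumption the nearest major keys to C major are F and G (tonics 5 and 7), all twelve major keys share one magnitude, and that magnitude exceeds the one shared by the minor keys, in line with the outer and inner layers described above.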
The next consideration concerns the degree to which our space can explain the
interconnection of the three tonal pitch levels, and particularly the relation of the
lower abstraction levels with the top regional ones. This aspect is especially relevant
within the Western tonal music context because our understanding of pitch classes
and chords is dependent on their upper hierarchical levels (Krumhansl, 1990, pp.18-
21). Ideally, the three tonal pitch levels should interconnect and the distances among
4 In order to illustrate distances among pitch configurations in the 12-D Tonal Interval Space, we use
nonmetric multidimensional scaling (MDS) to plot it into a 2-dimensional plane. Shepard (1962) and
Kruskal (1964) first used this method, which has been extensively applied to visualise representations
of multidimensional pitch structures (Krumhansl & Kessler, 1982; Barlow, 2012; Lerdahl, 2001).
Briefly, nonmetric MDS attempts to transform a set of n-dimensional vectors, expressed by their
distance in the item-item matrix, into a spatial representation that exposes the interrelationships among
a set of input cases. We use the smacof library from the statistical analysis package ‘R’ to compute
dimensionality reduction using a nonmetric MDS algorithm. More specifically, we use the function
smacofSym, with ‘ordinal’ type and ‘primary’ ties.
pitch classes, chords and regions should be meaningful.
In the Tonal Interval Space, the pitch class set of each diatonic region
occupies a compact neighbourhood. The same property applies to the set of diatonic
triads within a region because the location of each triad is the convex combination of
the 𝑇(𝑘) of its component pitch classes, as explained in Section 5. Table 5 supports
this assertion by showing the angular and Euclidean distances between all
individual pitch classes and the C major and C harmonic minor regions. The
diatonic pitch classes (in bold) of each region are at smaller distances than the
remaining pitch classes. Due to the transposition invariance of the Tonal Interval
Space, these results hold true for all remaining major and minor regions in our space.
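The claim that diatonic pitch classes sit closest to their region can be tested with a few lines. The sketch assumes that a region TIV is the TIV of its pitch-class set and that Equation (1) is the weighted DFT of a unit-sum chroma vector; `tiv`, `euclidean` and `nearest_pitch_classes` are hypothetical names.

```python
import cmath
import math

WEIGHTS = (2, 11, 17, 16, 19, 7)  # w(k) for k = 1..6, as selected in Section 7
MAJOR = (0, 2, 4, 5, 7, 9, 11)            # C major
HARMONIC_MINOR = (0, 2, 3, 5, 7, 8, 11)   # C harmonic minor


def tiv(pitch_classes, weights=WEIGHTS):
    """Weighted DFT coefficients k = 1..6 of a unit-sum binary chroma vector (assumed Eq. 1)."""
    pcs = sorted({pc % 12 for pc in pitch_classes})
    return [w * sum(cmath.exp(-2j * math.pi * k * n / 12) for n in pcs) / len(pcs)
            for k, w in enumerate(weights, start=1)]


def euclidean(a, b):
    """Euclidean distance between two TIVs."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


def nearest_pitch_classes(region, count=7):
    """The `count` pitch classes closest to the region TIV, returned in ascending order."""
    key = tiv(region)
    closest = sorted(range(12), key=lambda pc: euclidean(key, tiv([pc])))[:count]
    return sorted(closest)


print(nearest_pitch_classes(MAJOR))
print(nearest_pitch_classes(HARMONIC_MINOR))
```

Under these assumptions the seven nearest pitch classes recover the diatonic set of each region. Because every single pitch class has the same norm, ranking by angular distance gives the same result.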
Table 5. Angular (θ) and Euclidean (d) distances between the C major and C harmonic
minor region TIVs (labelled as upper and lower case ‘c’, respectively) and the entire
set of pitch classes. The diatonic pitch class set of each region is presented in bold.
Figure 6. 2-D visualisation of the diatonic triads of the C major region in the Tonal
Interval Space using nonmetric MDS. Riemann’s harmonic categories (tonic,
subdominant, and dominant) are well represented in the space and typical motions
between these are denoted by dashed lines.
Finally, as illustrated in Figure 6, the diatonic set of chords around a key TIV
is organized according to Riemann’s categorical harmonic functions and distributed
at roughly equal angular distances around its key centre. Chords common to more
than one region, also referred to as pivot chords, are located at the edge of the regions.
This allows the Tonal Interval Space to explain the modulation between keys or
regions, as these chords are typically used to smoothly transition between them. See
Bernardes et al. (2015) for a more comprehensive explanation of the angular distance
between key TIVs and their diatonic chordal set and an application of this property to
generate musical harmony and estimate the key of a musical input.
Table 6. Consonance level of all interval dyads within an octave.
Table 7. Consonance level of chords measured by our model (presented by increasing
order of consonance).
8.2 Measuring consonance
The Tonal Interval Space follows the pitch organization of the Tonnetz and
expands this geometric pitch representation with the possibility to compute indicators
of tonal consonance. Two important elements in Equation (1) allow the computation
of consonance: the normalization by 𝑇(0) and the weights 𝑤(𝑘). The former was
discussed in Section 4.2 and constrains the space to a limited area for all possible
(multi-)pitch configurations that a chroma vector can represent. The latter was
discussed in Section 7 and distorts the DFT coefficients to regulate the contribution of
each interval according to empirical ratings of consonance. These two elements create
a space in which pitch classes (at the edge of the space and furthest from the centre)
are considered the most consonant configurations. A chroma vector 𝑐(𝑛) with all
elements active will be located at the centre of the space, 0, which we consider the most
dissonant. Within this range, the consonance of any pitch configuration can then be
measured. Hence, we take the norm ‖𝑇(𝑘)‖ given by Equation (7) as the consonance
measure of the TIV.
Due to the symmetry of the Tonal Interval Space, complementary intervals
and transpositions share the same level of consonance, as indicated in Equation (4). In
fact, the 12 transpositions of 𝑇(𝑘) by 𝑝 = 1 semitone create a concentric layer of 12
instances with the same magnitude ‖𝑇(𝑘)‖. Given this formulation, we present the
level of consonance for all interval dyads within an octave in Table 6 and the
consonance level of common triads and tetrads in tonal music in Table 7.
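The consonance indicator and its invariances can be sketched as follows, assuming that Equation (1) is the weighted DFT (k = 1..6) of a unit-sum binary chroma vector and that Equation (7) is its Euclidean norm. The function name `consonance` is hypothetical.

```python
import cmath
import math

WEIGHTS = (2, 11, 17, 16, 19, 7)  # w(k) for k = 1..6, as selected in Section 7


def consonance(pitch_classes, weights=WEIGHTS):
    """Norm of the TIV of a set of pitch classes (assumed Eqs. 1 and 7)."""
    pcs = sorted({pc % 12 for pc in pitch_classes})
    total = 0.0
    for k, w in enumerate(weights, start=1):
        coeff = sum(cmath.exp(-2j * math.pi * k * n / 12) for n in pcs) / len(pcs)
        total += (w * abs(coeff)) ** 2
    return math.sqrt(total)


# A single pitch class lies on the edge of the space; the full chromatic set at its centre.
print(round(consonance([0]), 2), round(consonance(range(12)), 2))

# Transposition leaves the magnitude unchanged (C major versus D major).
print(math.isclose(consonance([0, 4, 7]), consonance([2, 6, 9])))

# Dyads above C, ordered from most to least consonant by their size in semitones.
print(sorted(range(1, 7), key=lambda p: -consonance([0, p])))
```

For a single pitch class every coefficient has magnitude 𝑤(𝑘), so its consonance equals the square root of the sum of the squared weights, which bounds the indicator from above.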
By comparing the values presented in Tables 6 and 7, we note that our
consonance measure avoids a limitation of sensory dissonance models, which are among
the most popular models to measure innate aspects of consonance. As Huron
emphasises (cited in Mashinter, 2006), in sensory dissonance models adding spectral
components always results in an increase of sensory dissonance. In the Tonal Interval
Space, sonorities with fewer notes or partials may have a higher level of dissonance
than sonorities with more notes or partials, depending on the degree to which they ‘fit’
triadic harmony and tonal structures.
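A small comparison makes this point concrete. The sketch assumes the same reading of Equations (1) and (7) as a weighted DFT of a unit-sum chroma vector and its norm. The sonorities chosen and the name `consonance` are illustrative only.

```python
import cmath
import math

WEIGHTS = (2, 11, 17, 16, 19, 7)  # w(k) for k = 1..6, as selected in Section 7


def consonance(pitch_classes, weights=WEIGHTS):
    """Norm of the TIV of a set of pitch classes (assumed Eqs. 1 and 7)."""
    pcs = sorted({pc % 12 for pc in pitch_classes})
    total = 0.0
    for k, w in enumerate(weights, start=1):
        coeff = sum(cmath.exp(-2j * math.pi * k * n / 12) for n in pcs) / len(pcs)
        total += (w * abs(coeff)) ** 2
    return math.sqrt(total)


SONORITIES = {
    "m2 dyad": [0, 1],
    "M3 dyad": [0, 4],
    "major triad": [0, 4, 7],
    "dominant seventh": [0, 4, 7, 10],
}

# Names ordered from most to least consonant.
by_consonance = sorted(SONORITIES, key=lambda name: -consonance(SONORITIES[name]))
print(by_consonance)
```

Under these assumptions the three-note major triad ranks above the two-note minor second, so the indicator does not simply decrease as notes are added.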
9 Evaluation and discussion
The consonance level modelled in the Tonal Interval Space constitutes an
innovative aspect that has not been investigated in any other Tonnetz-derived spaces.
While our method provides the possibility to compute a consonance indicator in the
space by design, we now investigate whether other spaces are equally adept at that
task. In particular, given the resemblance of the Tonal Interval Space to Harte et
al.’s (2006) Tonal Centroid Space, we assess if the latter embeds properties for
computing tonal pitch consonance. Additionally, we further assess the role of the
weights 𝑤(𝑘) in the Tonal Interval Space by comparing the consonance measurement
in a uniform version of the space. Both aforementioned spaces can be computed using
Equation (1) by assigning different weights 𝑤(𝑘). In Harte et al.’s 6-D space the weights
are {0, 0, 1, .5, 1, 0}, whose non-zero weights correspond to the musical interpretation of
major thirds/minor sixths, minor thirds/major sixths, and perfect fourths/perfect fifths,
respectively (see Figure 3). In the uniform version of the Tonal Interval Space the
weights are {1, 1, 1, 1, 1, 1}. To investigate the behaviour of the spaces in measuring
tonal pitch consonance, we will adopt the same consonance measure used in the Tonal
Interval Space, computed by Equation (7), for all dyads and common triads.
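The three weightings can be compared on the dyads with the sketch below, which reuses the assumed form of Equations (1) and (7) (weighted DFT of a unit-sum chroma vector, then the norm). The dictionary `SPACES` and the functions `consonance` and `dyad_profile` are hypothetical names. The empirical ratings of Table 3 are not included, so the correlation itself is left to the reader.

```python
import cmath
import math

SPACES = {
    "Tonal Interval Space": (2, 11, 17, 16, 19, 7),
    "uniform": (1, 1, 1, 1, 1, 1),
    "Harte et al. (2006)": (0, 0, 1, 0.5, 1, 0),
}


def consonance(pitch_classes, weights):
    """Norm of the weighted DFT (k = 1..6) of a unit-sum binary chroma vector."""
    pcs = sorted({pc % 12 for pc in pitch_classes})
    total = 0.0
    for k, w in enumerate(weights, start=1):
        coeff = sum(cmath.exp(-2j * math.pi * k * n / 12) for n in pcs) / len(pcs)
        total += (w * abs(coeff)) ** 2
    return math.sqrt(total)


def dyad_profile(weights):
    """Consonance of the dyads of 1..6 semitones (m2, M2, m3, M3, P4, TT)."""
    return [consonance([0, p], weights) for p in range(1, 7)]


for name, weights in SPACES.items():
    print(name, [round(v, 3) for v in dyad_profile(weights)])
```

Two features stand out under these assumptions. With uniform weights the dyads take only two distinct values, one for odd and one for even numbers of semitones. With the Harte et al. weights the tritone receives the lowest value of all dyads, which is consistent with its role as an outlier.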
To analyse the results we will use the Pearson correlation coefficient to
compare the tonal pitch consonance indicators computed in the spaces with empirical
ratings of dyads’ consonance (used to build the model and shown in Table 3), and the
ranking order of common triads’ consonance derived from both listening experiments
(Roberts, 1986; Cook et al., 2012) along with psychoacoustic models of sensory
dissonance (Plomp & Levelt, 1965; Parncutt, 1989; Sethares, 1999). Our hypothesis is
that, since we explicitly choose weights to control consonance, the weights of Harte et
al.’s 6-D space and the uniform weights will be less effective in highlighting the
consonance of pitch configurations, due respectively to the exclusion of three intervals
in the former and the omission of any meaningful distortion of the weights in the
latter.
Figure 7 shows the correlation for the spaces under evaluation, in which
we included our proposed Tonal Interval Space for the purpose of visual comparison.
The correlation between empirical data and the uniform Tonal Interval Space
(r=-.201, p=.703) shows that the DFT of chroma vectors alone carries no information
about tonal consonance, reinforcing the positive impact of explicitly designing the
weights in the Tonal Interval Space. The correlation between empirical data and Harte
et al.’s 6-D space (r=.741, p=.09) shows that the space, while positively correlated, is
limited as an indicator of tonal consonance. In particular, this is shown in Figure 7 by
the outlier corresponding to the consonance of the tritone interval in Harte et al.’s 6-D
space, which is explicitly not modelled in their space.
Figure 7. Scatter plot exposing the correlation between empirical consonance ratings
for complementary dyads (Huron, 1994) and their consonance level in three
theoretical models: Tonal Interval Space (bold line), uniform Tonal Interval Space
(dashed line) and Harte et al.’s 6-D space (dotted line). Plotted data is normalized to
zero mean and unit variance for enhanced visualisation.
Table 8 column groups: empirical ratings (Cook et al.); theoretical models, comprising sensory dissonance models and tonal pitch spaces (including Harte et al., 2006).
Table 8. Ranking order of chord consonance based on Cook et al. (2007) comparing
empirical data derived from listening experiments and theoretical models. 1
corresponds to the most consonant chord and 5 the most dissonant.
We additionally assess how the consonance level of common triads measured
in Harte et al.’s (2006) 6-D space and the uniform version of the Tonal Interval Space
compares to empirical studies (Roberts, 1986; Cook et al., 2007) and psychoacoustic
models of sensory dissonance (Plomp & Levelt, 1965; Parncutt, 1989; Sethares,
1999).5 To this end, we compared the ranking order of common triads’ consonance of
5 A major difference between the two empirical studies conducted relies on their population. While
Roberts’ (1986) study was conducted among Western listeners, Cook et al.’s (2007) study involved
the three theoretical models with both empirical ratings and psychoacoustic models of
sensory dissonance. As shown in Table 8, Harte et al.'s 6-D space and the uniform
Tonal Interval Space fail to predict the relative consonance of common triads,
further reinforcing the impact the weights and extended intervallic
representations have on the Tonal Interval Space. Table 8 additionally shows that
psychoacoustic models of sensory dissonance also fail to predict the relative
dissonance of common triads as expressed by the results of the empirical listening
experiments.
While preserving the pitch organization of the Tonnetz, the Tonal Interval
Space constitutes an extension of Tonnetz-derived spaces towards the possibility to
compute a consonance indicator in the space. Furthermore, by expanding the number
of dimensions in relation to similar spaces, and in particular to Harte et al.’s (2006) 6-
D space, we obtain a finer definition of the intervallic content of chroma vectors,
whose contribution we were able to fine-tune by adopting the set of weights 𝑤(𝑘). In
doing so, our model not only ensures that all information from the chroma vector is
retained in the TIV, but also guarantees that each TIV occupies a unique location in
the 12-D Tonal Interval Space.6 Neither of these properties is found in any of the
existing tonal pitch spaces. Finally, despite its larger number of dimensions and the
increased complexity of the Tonal Interval Space in relation to similar spaces, we
believe that using the DFT makes it particularly accessible to the music, signal
processing, and MIR communities.
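The uniqueness property can be illustrated with the pitch-class sets listed in footnote 6. The sketch assumes Equation (1) is the weighted DFT (k = 1..6) of a unit-sum binary chroma vector and models Harte et al.'s space through the weights {0, 0, 1, .5, 1, 0} given above. `tiv` and `same_location` are hypothetical names.

```python
import cmath
import math

WEIGHTS = (2, 11, 17, 16, 19, 7)   # Tonal Interval Space weights (Section 7)
HARTE = (0, 0, 1, 0.5, 1, 0)       # weights reproducing Harte et al.'s 6-D space


def tiv(pitch_classes, weights):
    """Weighted DFT coefficients k = 1..6 of a unit-sum binary chroma vector (assumed Eq. 1)."""
    pcs = sorted({pc % 12 for pc in pitch_classes})
    return [w * sum(cmath.exp(-2j * math.pi * k * n / 12) for n in pcs) / len(pcs)
            for k, w in enumerate(weights, start=1)]


def same_location(a, b, weights, tol=1e-9):
    """True if two pitch-class sets map to the same point under the given weights."""
    return all(abs(x - y) < tol for x, y in zip(tiv(a, weights), tiv(b, weights)))


pairs = [([5, 11], [2, 8]), ([2, 5, 8, 11], [2, 8])]
for a, b in pairs:
    print(a, b, same_location(a, b, HARTE), same_location(a, b, WEIGHTS))
```

Under these assumptions both pairs coincide when three of the six coefficients are zeroed, and they separate once all six coefficients carry non-zero weight.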
10 Conclusions and future work
In this paper we presented a 12-D Tonal Interval Space that represents pitch
configurations by the location of Tonal Interval Vectors, which are calculated as the
DFT of 12-element chroma vectors. A visualisation of the 12-D space is provided by
6 circles, each representing a DFT coefficient, from which we devised a musical
interpretation. The contribution of each DFT coefficient (or circle in the
visualisation) is then weighted according to empirical ratings of dyad consonance to
improve the relationship among pitch configurations at the three most important
levels of tonal pitch in Western music, i.e., pitches, chords, and regions, as well as
to allow the computation of a consonance indicator in the space.
While preserving the pitch organization and common-tone logic of the
Tonnetz, our 12-D space expands its range of representable pitch configurations
beyond major and minor triads. In relation to Chew’s Spiral Array, the input of our
space is more flexible in the sense that it allows the codification of any sonority
representable as a chroma vector albeit subject to enharmonic equivalence. In relation
to Harte et al.’s research, we expand their 6-D space to include all possible interval
relations within one octave, and hence the ability to represent all pitch configurations
by a unique location in space.
Two major indicators can be computed in the Tonal Interval Space. The first
explains the relation among pitch configurations in light of the Western tonal music
East Asian listeners. The population of both studies included individuals with and without music
training. The remaining psychoacoustic-based models aim at measuring auditory roughness, which
largely equates with sensory dissonance (Sethares, 1999).
6 By guaranteeing uniqueness, our space avoids the overlap between relevant tonal pitch configurations
found in Harte et al.’s (2006) Tonal Centroid Space, such as the pair of dyads F#-B (P4) and D-G#
(D5) (pitch classes [5, 11] and [2, 8]), and the D diminished seventh chord and the dyad D-A#
(A4) (pitch classes [2, 5, 8, 11] and [2, 8]).
theory principles by the angular and Euclidean distances among TIVs. Additionally,
due to the possibility to represent all hierarchical levels of tonal music in the same
space, given by the normalisation strategy applied in Equation (1), we can equally
compare and relate multi-level TIVs.
The second, and most innovative, aspect of the Tonal Interval Space is the
possibility to compute indicators of tonal consonance for multi-level pitch
configurations as the norm of the TIVs. To the best of our knowledge, this attribute
has not been considered in any other Tonnetz-derived spaces, nor any other tonal
pitch spaces. By encoding all intervallic content of chroma vectors, distorted by both
the DFT and weights derived from dyad and triad consonance, we enhance the pitch
organization by allowing the measurement of consonance without disrupting the
Tonnetz-like pitch organization.
Our goal in this paper was to present the Tonal Interval Space from a
theoretical perspective, hence aspects concerning its scope and wider applicability are
somewhat superficially treated. Nonetheless, the space has been successfully used in
different application areas within the scope of generative music and MIR. In
generative music, we have explored its potential to generate a corpus of chords related
to a user-defined region (Navarro et al., 2015) as well as the possibility to smoothly
transition (or modulate) between regions in real-time (Bernardes et al., 2015). In
Bernardes et al. (2015) we further explored the capabilities of the Tonal Interval
Space to harmonize a given input using its ability to generate tonal harmony with
consonance and perceptual relatedness as parameters. Additionally, in order to
identify the region of the musical input we proposed a key induction algorithm which
outperforms the current state-of-the-art.
Despite the robustness of our consonance measurement in the context of
Western tonal music, we are aware that our measure may fail to capture some aspects
of consonance and dissonance, because it does not take into account the physical or
physiological aspects of this phenomenon, which are directly related to frequency
ratios among the partials of a sonority (Sethares, 1999). Despite these limitations, our
consonance measure sheds some light on the future development of musical
consonance models that consider both schemata learned culturally and innate physical
and physiological principles.
Another limitation of our space is that it currently ignores the temporal
dimension of music, or simply put, the order of musical events. Therefore, even
though the perceived relation of tonal pitch events is known to depend on the order in
which they are sounded (Krumhansl, 1990, pp.121-123), we cannot yet account for
that feature in our space due to its symmetry, which is inherent to Fourier spaces. On
the other hand, the symmetry of the space imposed by the DFT is particularly relevant
to create a transposition invariant space, seminal to tonal pitch structures.
Additionally, we believe that many other mathematical properties of the DFT may have
useful musical counterparts, and we plan to study these further in the future. Among
them, we can highlight the capability to transform the Tonal Interval Space
back to the chroma space by computing the inverse Fourier transform.
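The return trip to the chroma space might look like the sketch below. It assumes Equation (1) is the weighted DFT (k = 1..6) of a unit-sum chroma vector, so that the missing coefficient 0 equals 1 and coefficients 7 to 11 are the complex conjugates of 5 to 1. It only works when every weight is non-zero. `tiv` and `chroma_from_tiv` are hypothetical names.

```python
import cmath
import math

WEIGHTS = (2, 11, 17, 16, 19, 7)  # w(k) for k = 1..6, as selected in Section 7


def tiv(pitch_classes, weights=WEIGHTS):
    """Weighted DFT coefficients k = 1..6 of a unit-sum binary chroma vector (assumed Eq. 1)."""
    pcs = sorted({pc % 12 for pc in pitch_classes})
    return [w * sum(cmath.exp(-2j * math.pi * k * n / 12) for n in pcs) / len(pcs)
            for k, w in enumerate(weights, start=1)]


def chroma_from_tiv(t, weights=WEIGHTS):
    """Recover the unit-sum chroma vector from a TIV with an inverse DFT (non-zero weights only)."""
    # Undo the weights; coefficient 0 of a unit-sum chroma vector equals 1.
    spectrum = [1 + 0j] + [c / w for c, w in zip(t, weights)]
    chroma = []
    for n in range(12):
        # For a real signal each k = 1..5 term counts twice (with its conjugate);
        # k = 0 and k = 6 count once.
        total = spectrum[0].real
        total += 2 * sum((spectrum[k] * cmath.exp(2j * math.pi * k * n / 12)).real
                         for k in range(1, 6))
        total += (spectrum[6] * cmath.exp(1j * math.pi * n)).real
        chroma.append(total / 12)
    return chroma


recovered = chroma_from_tiv(tiv([0, 4, 7]))
print([round(v, 3) for v in recovered])
```

The recovered vector is the normalized chroma vector, so a C major triad returns one third at pitch classes 0, 4 and 7 and zero elsewhere. The original amplitude scale is not recoverable, since it is removed by the normalization.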
In future work, we aim to assess the level to which the Tonal Interval Space
conforms to empirical judgments of tonal pitch relatedness, with the ultimate goal of
improving the distances among multi-level TIVs. For example, despite the current
possibility to compute the set of diatonic pitch classes of a given region, the distances
among pitch classes and key TIVs do not express the goodness of fit of the pitch
classes into that region.
Finally, the initial experiments reported here were conducted under very
controlled conditions by manually encoding pitch configurations as binary chroma
vectors. However, despite the possibility to represent musical audio (e.g. with chroma
vectors calculated from audio signals), further tests must be conducted in order to
understand the robustness of our space under such a non-binary input. In doing so, we
aim to study and expand our model with relevant dimensions in musical practice,
notably timbre/spectral and amplitude information. Ultimately, we want to describe
musical audio as robustly as symbolic music representations and apply our model
within the realm of performed music.
References
Amiot, E., & Sethares, B. (2011). An Algebra for Periodic Rhythms and Scales, JMM, 3.
Amiot, E. (2013). The Torii of Phases. In J. Yust, J. Wild, & J. A. Burgoyne (Eds.),
Mathematics and Computation in Music, Lecture Notes in Computer Science,
Vol. 7937 (pp. 1–18). Springer Berlin Heidelberg.
Barlow, C. (2012). On Musiquantics. Mainz: Johannes Gutenberg-Universität Mainz.
Balzano, G. J. (1980). The Group-theoretic Description of 12-fold and Microtonal
Pitch Systems. Computer Music Journal, 4(4):66-84.
Balzano, G. J. (1982). The pitch set as a level of description for studying musical
pitch perception. In Music, Mind, and Brain (pp. 321-351). Springer US.
Behringer, R., & Elliot, J. (2009). Linking Physical Space with the Riemann Tonnetz
for Exploration of Western Tonality. In Hermida, J. & Ferrero, M. (Eds.) Music
Education (pp. 131-143). Hauppauge, NY: Nova Science Publishers, Inc.
Bharucha, J. J. (1984). Event Hierarchies, Tonal Hierarchies, and Assimilation: A
Reply to Deutsch and Dowling. Journal of Experimental Psychology: General,
113, 421-425.
Bernardes, G., Cocharro, D., Guedes, C., & Davies, M.E.P. (2015). "Conchord: An
Application for Generating Musical Harmony by Navigating in a Perceptually
Motivated Tonal Interval Space". Proceedings of the 11th International
Symposium on Computer Music Modeling and Retrieval (CMMR), (pp. 71-86).
Plymouth, UK.
Bernardes, G., Cocharro, D., Guedes, C., Davies, M.E.P. (2016). "Harmony
Generation Driven by a Perceptually Motivated Tonal Interval Space". ACM
Computers in Entertainment. In Press.
Chew, E. (2000). Towards a Mathematical Model of Tonality. Ph.D. dissertation,
Chew, E., & Chen, Y. (2003). Determining Context-defining Windows: Pitch spelling
Using the Spiral Array. In Proceedings of the International Society for Music
Chew, E. (2008). Out of the Grid and Into the Spiral: Geometric Interpretations of and
Comparisons with the Spiral-Array Model. Computing in Musicology, 15: 51-
Clough, J. & Douthett, J. (1991). Maximally Even Sets. Journal of Music Theory, 35:
Cohn, R. (1997). Neo-Riemannian Operations, Parsimonious Trichords, and Their
Tonnetz Representations. Journal of Music Theory, 41: 1–66.
Cohn, R. (1998). Introduction to Neo-Riemannian Theory: A Survey and a Historical
Perspective. The Journal of Music Theory, 42(2): 167-180.
Cook, N. D., Fujisawa, T., & Konaka, H. (2007). Why Not Study Polytonal
Psychophysics? Empirical Musicology Review, 2(1): 38-44.
Cook, N. D. (2012). Harmony, Perspective, and Triadic Cognition. New York:
Cambridge University Press.
Callender, C. (2007). Continuous Harmonic Spaces. Journal of Music Theory, 51(2):
De Leeuw, J., & Mair, P. (2009). Multidimensional Scaling Using Majorization:
SMACOF in R. Journal of Statistical Software, 31(3): 1-30.
Deutsch, D. (1984). Two Issues Concerning Tonal Hierarchies: Comment on
Castellano, Bharucha, and Krumhansl. Journal of Experimental Psychology:
General, 113(3): 413-416.
Euler, L. (1739). Tentamen novae theoriae musicae. St. Petersburg. New York:
Broude, 1968.
Gatzsche, G., Mehnert, M., & Stöcklmeier, C. (2008). Interaction with Tonal Pitch
Spaces. In Proceedings of the International Conference on New Interfaces for
Musical Expression (pp. 325-330). Genova: NIME.
Harte, C., Sandler, M., & Gasser, M. (2006). Detecting Harmonic Change in Musical
Audio. In Proceedings of the 1st ACM Workshop on Audio and Music
Computing Multimedia (pp. 21-26). New York: ACM.
Hyer, B. (1995). Re-Imagining Riemann. Journal of Music Theory, 39(1), 101–138.
Huron, D. (1994). Interval-Class Content in Equally Tempered Pitch-Class Sets:
Common Scales Exhibit Optimum Tonal Consonance. Music Perception: An
Interdisciplinary Journal, 11(3): 289-305.
Hutchinson, W. & Knopoff, L. (1979). The Acoustic Component of Western
Consonance. Interface, 10(2): 129-149.
Kameoka, A. & Kuriyagawa, M. (1969). Consonance Theory. Part I: Consonance of
Dyads. Journal of Acoustical Society of America, 45: 1451-1459.
Kendall, M. (1938). A New Measure of Rank Correlation. Biometrika, 30(1–2): 81–
Kopp, D. (1995). A Comprehensive Theory of Chromatic Mediant Relations in Mid-
Nineteenth-Century Music. Ph.D. dissertation, Brandeis University.
Kross, D. (2006). Modernism and Tradition and the Traditions of Modernism.
Muzikologija, (6): 19-42.
Krumhansl, C. L. & Kessler, E. J. (1982). Tracing the Dynamic Changes in Perceived
Tonal Organization in a Spatial Representation of Musical Keys. Psychological
Review, 89: 334-368.
Krumhansl, C. (1990). Cognitive Foundations of Musical Pitch. Oxford: Oxford
University Press.
Krumhansl, C. L. (1998). Evidence Supporting the Psychological Reality of Neo-
Riemannian Transformations. Journal of Music Theory, 42: 265–281.
Kruskal, J. B. (1964). Nonmetric Multidimensional Scaling: A Numerical Method.
Psychometrika, 29: 28-42.
Lee, K. (2007). A System for Automatic Chord Transcription from Audio Using
Genre-specific HMM. In Boujemaa, N., Detyniecki, M., & Nürnberger (Eds.)
Adaptive Multimedial Retrieval: Retrieval, User, and Semantics (pp. 134–146).
Berlin-Heidelberg: Springer-Verlag.
Lee, K. & Slaney, M. (2007). A Unified System for Chord Transcription and Key
Extraction using Hidden Markov Models. In Proceedings of International
Conference on Music Information Retrieval (pp. 245-250). Vienna: ISMIR.
Lerdahl, F. (1988). Tonal Pitch Space. Music Perception, 5(3): 315-350.
Lerdahl, F. (2001). Tonal Pitch Space. New York: Oxford University Press.
Lewin, D. (1982). A Formal Theory of Generalized Tonal Functions. Journal of
Music Theory, 26: 23–60.
Lewin, D. (1987). Generalized Musical Intervals and Transformations. New Haven:
Yale University Press.
Lewin, D. (1992). Some Notes on Analyzing Wagner: The Ring and Parsifal.
Nineteenth Century Music, 16: 49–58.
Lewin, D. (2001). Special Cases of the Interval Function between Pitch-Class Sets X
and Y. Journal of Music Theory, 45: 1-29.
Longuet-Higgins, H. C. (1962). Two Letters to a Musical Friend. Music Review, 23:
244-248 and 271-280.
Malmberg, C. F. (1918). The Perception of Consonance and Dissonance.
Psychological Monographs, 25(2): 93-133.
Mashinter, K. (2006). Calculating Sensory Dissonance: Some Discrepancies Arising
from the Models of Kameoka & Kuriyagawa, and Hutchinson & Knopoff.
Empirical Musicology Review, 1(2): 65-84.
Mauch, M. (2010). Automatic Chord Transcription from Audio Using Computational
Models of Musical Context. Ph.D. dissertation, University of London.
Mooney, M. K. (1996). The Table of Relations and Music Psychology in Hugo
Riemann's Harmonic Theory. Ph.D. dissertation, Columbia University.
Navarro, M., Caetano, M., Bernardes, G., de Castro, L. N., & Corchado, J. M. (2015).
Automatic Generation of Chord Progressions with an
Artificial Immune System. In Evolutionary and Biologically Inspired Music,
Sound, Art and Design (pp. 175-186). Springer International Publishing.
Parncutt, R. (1989). Harmony: A Psychoacoustical Approach. Berlin: Springer.
Peiszer, E., Lidy, T., & Rauber, A. (2008). Automatic Audio Segmentation: Segment
Boundary and Structure Detection in Popular Music. In Proceedings of the
International Workshop on Learning the Semantics of Audio Signals (pp. 45-
59). Paris: LSAS.
Plomp, R. & Levelt, W. J. M. (1965). Tonal Consonance and Critical Bandwidth.
Journal of the Acoustical Society of America, 38: 548-60.
Quinn, I. (2006). General Equal-tempered Harmony (Introduction and Part I).
Perspectives of New Music, 44(2): 114–158.
Quinn, I. (2007). General Equal-tempered Harmony: Parts 2 and 3. Perspectives of
New Music, 45(1): 4–63.
Roberts, L. A. (1986). Consonance Judgements of Musical Chords by Musicians and
Untrained Listeners. Acta Acustica United with Acustica, 62(2): 163-171.
Shepard, R. N. (1962). The Analysis of Proximities: Multidimensional Scaling with
an Unknown Distance Function. I & II. Psychometrika, 27: 125-140 and 219-
Shepard, R. N. (1982). Structural Representations of Musical Pitch. In Deutsch, D.
(Ed.) The Psychology of Music (pp. 335–353). New York: Academic Press.
Sethares, W. (1999). Tuning, Timbre, Spectrum, Scale. London: Springer-Verlag.
Schoenberg, A. (1969). Structural Functions of Harmony [1954], 2nd edition (revised
by Leonard Stein). New York: W. W. Norton, Inc.
Oppenheim, A., Schafer, R., & Buck, J. (1989). Discrete-Time Signal Processing. 2nd
Edition. Englewood Cliffs: Prentice-Hall.
Temperley, D. (2001). The Cognition of Basic Musical Structures. Cambridge, MA:
MIT Press.
Tymoczko, D. (2011). A Geometry of Music: Harmony and Counterpoint in the
Extended Common Practice. New York: Oxford University Press.
Ueda, Y., Uchiyama, Y., Nishimoto, T., Ono, N., & Sagayama, S. (2010). HMM-
Based Approach for Automatic Chord Detection Using Refined Acoustic
Features. In Proceedings of the 2010 IEEE International Conference on
Acoustics Speech and Signal Processing (pp. 5518-5521). Dallas, TX: IEEE.
Weber, G. (1817-21). Versuch Einer Geordneten Theorie der Tonsetzkunst [Theory
of Musical Composition]. Mainz: B. Schotts Söhne.
Werts, D. (1983). A Theory of Scale References. Ph.D. dissertation, Princeton
Yust, J. (2015). Schubert’s Harmonic Language and Fourier Phase Space. Journal of
Music Theory, 59(1): 121-181.