Solving the Antisymmetry Problem Caused by Pitch Interval and Duration Ratio in Geometric Matching of Music.
Hung-Hsuan Wu*, Hwei-Jen Lin*, Shwu-Huey Yen, and Hsiao-Wei Chang
Department of Computer Science and Information Engineering, Tamkang University,
151 Ying-Chuan Road, Tamsui, Taipei County, Taiwan, R.O.C.
Email: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com
Abstract—Music representation with pitch interval and
duration ratio can achieve invariance to transposition and
tempo. However, altering the pitch or duration of a single
music note will cause an antisymmetry effect. This paper
proposes an algorithm for computing a geometric measure
between two music fragments represented with pitch
interval and duration ratio, with the capability of detecting
and reducing these effects to improve search effectiveness.
Index Terms—content-based music retrieval, geometric
matching, pitch interval, duration ratio, antisymmetry effect
I. INTRODUCTION
The two common forms of music data used by content-
based music retrieval systems are audio data and notated
music. Some systems first transcribe a query of the audio
data into notated music and then search in a database of
notated music. Systems searching in a database of notated
music are more common than those searching in a
database of audio data since notated music specifies
plenty of information, including pitch, duration, timbre,
loudness, and others. Therefore, this paper will focus on
systems searching in databases of notated music.
Techniques based on string matching, such as edit
distance and n-gram, have been extensively explored in
the literature and can handle approximate matching.
Although the pitch string is the most commonly used
feature in these techniques, it alone is not sufficient for
searching a large database. Smith et al. concluded that
pitch and duration are the most important attributes, and
this research follows their finding. Matching methods
based on edit distance are quick and flexible. However,
they process pitch and duration information separately
and, as a result, cannot effectively handle pitch and
tempo fluctuation.
When it comes to comparing two melodic fragments,
techniques based on geometric matching have been
explored over the past few years, including point set
matching and sweeping-line matching. The advantage of
geometric matching is its high retrieval accuracy; its
drawback is that it is time-consuming, with a time
complexity that is usually higher than quadratic. For the
sake of retrieval accuracy, we adopt geometric matching
and improve its effectiveness by using the representation
with pitch interval and duration ratio, with the
corresponding antisymmetry effects reduced. For a
detailed survey of geometric matching, please refer to our
previous work.
In the cognitive theory of music, the melodic
motion (characterized by successive pitch intervals) and
contour are very important for the perception of a
melody. If a melody is transposed to a different key, the
melodic motion and contour remain the same; this
property is called ‘invariance to transposition’. When the
tempo of a melody is changed uniformly, the rhythmic
pattern does not change; this property is referred to as
‘invariance to tempo’. Thus, for better performance, we
use the pitch interval instead of the pitch and the duration
ratio instead of the duration in order to achieve both types
of invariance. However, changing the pitch or duration of
a single note affects two consecutive pitch intervals or
duration ratios; this is called an antisymmetry effect. For
geometric matching, the antisymmetry effect increases
the distance between two similar melodies to a multiple of
the expected amount. The main goal of this paper is to
detect and reduce the antisymmetry effects of both pitch
and duration.
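The two invariances and the antisymmetry effect can be illustrated with a minimal sketch. The melody below is hypothetical (MIDI pitch numbers and durations counted in sixteenth notes), and the helper names `pitch_intervals` and `duration_ratios` are introduced here only for illustration; they are not taken from the paper.

```python
from fractions import Fraction

def pitch_intervals(pitches):
    """Differences between consecutive pitches (in semitones)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def duration_ratios(durations):
    """Ratio of each duration to the preceding one, kept exact with Fraction."""
    return [Fraction(b, a) for a, b in zip(durations, durations[1:])]

# Hypothetical melody: MIDI pitches and durations in sixteenth notes.
pitches = [60, 62, 64, 65, 67]
durations = [4, 4, 2, 2, 8]

# Transposing by 5 semitones leaves the pitch intervals unchanged.
transposed = [p + 5 for p in pitches]
assert pitch_intervals(transposed) == pitch_intervals(pitches)

# Doubling every duration (a uniform tempo change) leaves the ratios unchanged.
slower = [2 * d for d in durations]
assert duration_ratios(slower) == duration_ratios(durations)

# Replacing one pitch (third note, 64 -> 66) alters two consecutive intervals.
variant = pitches.copy()
variant[2] = 66
original_pi = pitch_intervals(pitches)  # [2, 2, 1, 2]
variant_pi = pitch_intervals(variant)   # [2, 4, -1, 2]
changed = [k for k, (a, b) in enumerate(zip(original_pi, variant_pi)) if a != b]
print(changed)  # [1, 2]
```

Note that the two changed intervals still sum to the same value in both versions (2 + 1 = 4 - 1), which is the cue the detection rule in Section II relies on.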
The remainder of this paper is organized as follows.
Section 2 describes how to solve the antisymmetry
problems. Section 3 shows the experimental results and
compares the performance of the representation by
pitch/pitch interval and duration/duration ratio, and shows
how the performance of the representation by pitch
interval and duration ratio can be improved by reducing
the antisymmetry effect. Finally, we draw our
conclusions and provide suggestions for future work in
Section 4.
II. SOLVING ANTISYMMETRY PROBLEM
In this section, we first briefly introduce our previously
proposed geometric matching method for pitch
interval-duration sequences, and show how to solve the
antisymmetry problem caused by pitch replacement. Then
we introduce a modified version, pitch interval-duration
ratio sequences, and show how to solve the antisymmetry
problem caused by duration change.
* Authors contributed equally to this work.
522 JOURNAL OF MULTIMEDIA, VOL. 5, NO. 5, OCTOBER 2010
© 2010 ACADEMY PUBLISHER
A. Pitch Interval-Duration Geometric Matching
In our previous work  , a music fragment was
represented as a pitch interval-duration sequence (PID
sequence), where a pitch interval is defined as the
difference of the pitches of two consecutive notes. Each
note could be described by a horizontal line segment, of
which the height and the width denote its pitch interval
and duration, respectively. For every two consecutive line
segments, we connect the end point of the former and the
starting point of the latter by a vertical edge. In such a
way, a PID sequence can be expressed as an orthogonal
chain, as shown in Figure 1. The distance between two
music fragments can then be measured based on the
minimum of the areas of regions enclosed by their
representing orthogonal chains over all possible
alignments. This kind of matching mechanism is called
pitch interval geometric matching.
Using the pitch interval instead of the absolute pitch
not only provides transposition invariance but also
removes the need to shift the query in the vertical direction
during the matching process. Without loss of generality,
we assume that the reference sequence is fixed. Thus we
need only shift the query sequence horizontally to find
the best alignment (the one yielding the minimum area). As
depicted in Figure 1, both the query and the reference are
modeled as monotonic pitch interval rectilinear functions
of time. The region between the two orthogonal chains is
partitioned into rectangles Cα, α = 1, 2, …, k, each
defined by two vertical edges and two horizontal
segments, where each vertical edge occurs at an endpoint
of a note and each horizontal segment corresponds to a
pitch interval. The area is measured as
$$A = \sum_{\alpha=1}^{k} A_\alpha = \sum_{\alpha=1}^{k} w_\alpha h_\alpha,$$
where wα, hα, and Aα denote the width, the height, and
the area of the rectangle Cα, respectively.
Since the minimum area between two melodies must
occur when a vertical edge of one melody coincides with
a vertical edge of the other, and some of these cases may
be duplicated (i.e., several pairs of edges may coincide at
the same time), there are at most mn possible horizontal
positions to be evaluated, where m and n are the numbers
of notes in the query and the reference, respectively.
Therefore, there are at most O(mn) different regions to be
evaluated for area, and thus it takes O(m²n) time to find
the minimum area. A detailed description of the algorithm
can be found in our previous work.
Figure 1. Two PID sequences are represented as orthogonal chains, the
region enclosed by which is shaded.
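The matching described above can be sketched by brute force. This is a minimal illustration, not the authors' O(m²n) algorithm: each sequence is a list of (pitch interval, duration) pairs, the area is measured over the query's time span only, and the query is assumed to be no longer than the reference; all of these are assumptions made for the sketch. The function names `edges`, `height_at`, `area`, and `min_area` are hypothetical.

```python
from itertools import accumulate

def edges(seq):
    """Start times of all notes plus the final end time, for (height, duration) pairs."""
    return [0] + list(accumulate(d for _, d in seq))

def height_at(seq, bounds, t):
    """Height of the step function at time t, where bounds[0] <= t < bounds[-1]."""
    for (h, _), end in zip(seq, bounds[1:]):
        if t < end:
            return h
    raise ValueError("t lies outside the sequence")

def area(query, reference, shift):
    """Area between the two chains over the query's span, query moved right by shift."""
    q_bounds = [b + shift for b in edges(query)]
    r_bounds = edges(reference)
    # Vertical edges of either chain cut the region into rectangles.
    inner = {b for b in r_bounds if q_bounds[0] < b < q_bounds[-1]}
    cuts = sorted(set(q_bounds) | inner)
    total = 0
    for left, right in zip(cuts, cuts[1:]):
        mid = (left + right) / 2
        h = abs(height_at(query, q_bounds, mid) - height_at(reference, r_bounds, mid))
        total += (right - left) * h  # width times height of one rectangle
    return total

def min_area(query, reference):
    """Try every shift that aligns a query edge with a reference edge."""
    q_bounds, r_bounds = edges(query), edges(reference)
    slack = r_bounds[-1] - q_bounds[-1]  # keep the query inside the reference span
    shifts = {r - q for r in r_bounds for q in q_bounds if 0 <= r - q <= slack}
    return min((area(query, reference, s), s) for s in shifts)

# Hypothetical data: the query is an exact excerpt of the reference starting at time 1.
reference = [(0, 1), (2, 1), (-1, 2), (1, 1), (0, 1)]
query = [(2, 1), (-1, 2), (1, 1)]
print(min_area(query, reference))  # (0, 1): zero area at shift 1
```

Only shifts that make two vertical edges coincide are tried, which mirrors the observation that the minimum area must occur at such an alignment.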
B. Solving Antisymmetry Problem for Pitch Interval
Although the use of pitch interval works well for
transposition invariance in music matching, a single pitch
replacement causes an antisymmetry effect: the areas of
two consecutive rectangles are changed. This effect can
be reduced by deducting a fraction δ (0 ≤ δ ≤ 1) of the
area of these rectangles. The affected rectangles can be
detected by checking whether the sum of two successive
pitch intervals in the query is equal to the sum of the two
corresponding successive intervals in the reference.
As shown in Figure 2(a), the pitch of the second note
of the query Q is changed to form a variant query Q’,
which is then matched with a given reference R. When
the (starting time of the) first note of the variant query
and the first note of the reference are aligned as shown in
Figure 2(b), the area enclosed by the two sequences is
$A = \sum_i A_i$. Observing that the sum of the pitch
intervals of the second and third notes of the variant query
and that of the reference are equal (-2+2 = +1-1), we
deduce that the pitch replacement occurs in the second
note of the query, and the areas $A_2$ and $A_3$ of the
second and third rectangles are considered to be caused
by the antisymmetry effect. If we set the deducting factor
δ to 0.5, both evaluated areas $A_2$ and $A_3$ are halved,
and the total area is recomputed as
$A' = A - (0.5 A_2 + 0.5 A_3)$. In particular, when δ is
set to 1, the antisymmetry effect is completely removed;
when δ is set to 0, the antisymmetry effect is fully
retained. In our experiments, we set the deducting factor
to δ = 0.75, which was found to improve retrieval
performance.
Let $\langle n_1, n_2, \ldots, n_k \rangle$ be a PID sequence
representing a music fragment of k notes, where each
$n_i$ is represented as ($n\_pi_i$, $n\_d_i$), and $n\_pi_i$
and $n\_d_i$ denote its pitch interval and duration,
respectively. To detect a pitch replacement causing an
antisymmetry effect while matching two PID sequences
$Q = \langle q_1, q_2, \ldots, q_m \rangle$ and
$R = \langle r_1, r_2, \ldots, r_n \rangle$, we follow the rule:
if $q\_pi_i + q\_pi_{i+1} = r\_pi_j + r\_pi_{j+1}$,
$q\_pi_i \neq r\_pi_j$, $q\_pi_{i-1} = r\_pi_{j-1}$, and
$q\_pi_{i+2} = r\_pi_{j+2}$, then the note $q_i$ is said to
have a pitch replacement. As shown in Figure 2, the
second note of the query (i = 2 and j = 2) satisfies the
above condition, so the pitch replacement can be
detected. This proposed module for solving the
antisymmetry problem for pitch interval is called SAP-PI.
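The detection rule and the deduction by δ can be sketched as follows. The code uses 0-based indices (the text uses 1-based ones), treats notes that lack both neighbours as untestable, and uses hypothetical interval and area values chosen to agree with the totals quoted for Figure 2; these choices, and the names `has_pitch_replacement` and `deducted_area`, are assumptions for illustration.

```python
def has_pitch_replacement(q_pi, r_pi, i, j):
    """Check the four detection conditions at query index i and reference index j."""
    # Assumption: positions without a preceding note and two following notes are skipped.
    if i < 1 or j < 1 or i + 2 >= len(q_pi) or j + 2 >= len(r_pi):
        return False
    return (q_pi[i] + q_pi[i + 1] == r_pi[j] + r_pi[j + 1]
            and q_pi[i] != r_pi[j]
            and q_pi[i - 1] == r_pi[j - 1]
            and q_pi[i + 2] == r_pi[j + 2])

def deducted_area(areas, affected, delta=0.75):
    """Total area after removing a fraction delta of the areas at the affected indices."""
    return sum(areas) - delta * sum(areas[k] for k in affected)

# Hypothetical pitch intervals: the second note of the query has been replaced.
q_pi = [0, -2, 2, 1]
r_pi = [0, 1, -1, 1]
assert has_pitch_replacement(q_pi, r_pi, 1, 1)

# Hypothetical rectangle areas that sum to A = 4.5, as in Figure 2.
areas = [0.0, 3.0, 1.5, 0.0]
print(deducted_area(areas, [1, 2], delta=0.5))   # 2.25
print(deducted_area(areas, [1, 2], delta=0.75))  # 1.125
```

With δ = 0.5 the sketch reproduces the reduction from A = 4.5 to A’ = 2.25 stated in the caption of Figure 2.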
C. Solving Antisymmetry Problem for Duration Ratio
To achieve tempo invariance, we may adopt a pitch
interval-duration ratio sequence (called PIDR sequence)
instead of a PID sequence to represent a melody
fragment. If we let $\langle n_1, n_2, \ldots, n_k \rangle$ be a
PIDR sequence representing a melody fragment of k
notes, then each note $n_i$ is represented by its pitch
interval $n\_pi_i$ and its duration ratio $n\_dr_i$ as
($n\_pi_i$, $n\_dr_i$), where the duration ratio $n\_dr_i$ is
defined as the ratio of the duration of note $n_i$ to that of
the preceding note $n_{i-1}$; that is,
$n\_dr_i = n\_d_i / n\_d_{i-1}$. As with pitch intervals, the
representation with duration ratios has an antisymmetry
problem caused by duration replacement. We reduce the
problem by adjusting the duration of the corresponding
query note and adding a penalty to the reevaluated area.
Let $Q = \langle q_1, q_2, \ldots, q_m \rangle$ and
$R = \langle r_1, r_2, \ldots, r_n \rangle$ be two PIDR
sequences; then the conditions for detecting a duration
replacement on note $q_i$ are as follows:
$q\_dr_i \cdot q\_dr_{i+1} = r\_dr_j \cdot r\_dr_{j+1}$,
$q\_dr_i \neq r\_dr_j$, $q\_dr_{i-1} = r\_dr_{j-1}$, and
$q\_dr_{i+2} = r\_dr_{j+2}$.
Figure 2. (a) The PID sequences of an original query, a variant query,
and a reference; (b) the antisymmetry effect occurs at the second and
third rectangles. If δ = 0.5, the total area is reduced from A = 4.5
to A’ = 2.25.
Suppose the third note of the query shown in Figure
3(a) undergoes a duration replacement and becomes the
melody shown in Figure 3(b). When the (starting time of
the) first note of the new query and the first note of the
reference given in Figure 3(c) are aligned, the area
enclosed by the two sequences is A = 29.5, as shown in
Figure 4(a). We detect the duration replacement by
observing that
$q\_dr_3 \cdot q\_dr_4 = 1 \cdot 2 = \tfrac{1}{2} \cdot 4 = r\_dr_3 \cdot r\_dr_4$,
$q\_dr_3 = 1 \neq \tfrac{1}{2} = r\_dr_3$,
$q\_dr_2 = \tfrac{1}{2} = r\_dr_2$, and $q\_dr_5 = 1 = r\_dr_5$.
We then adjust the duration of the corresponding note in
the query to $q\_d_3 = r\_d_3$ so that the reevaluated area
A’ becomes zero, as shown in Figure 4(b). Finally, we add
a penalty value λ(A – A’) to A’ to produce the final
evaluated area A’’ = A’ + λ(A – A’) corresponding to this
alignment. In this research we set λ = 0.1, and thus
A’’ = 2.95 in this example. This proposed module for
solving the antisymmetry problem for duration ratio is
called SAP-DR.
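The duration-ratio conditions and the penalty step can be sketched in the same way. The ratios below are the ones quoted in the worked example; the ratio of the first note, which has no preceding note, is set to 1 here purely as an assumption, indices are 0-based, and the names `has_duration_replacement` and `penalized_area` are hypothetical.

```python
from fractions import Fraction as F

def has_duration_replacement(q_dr, r_dr, i, j):
    """Check the four detection conditions at query index i and reference index j."""
    # Assumption: positions without a preceding note and two following notes are skipped.
    if i < 1 or j < 1 or i + 2 >= len(q_dr) or j + 2 >= len(r_dr):
        return False
    return (q_dr[i] * q_dr[i + 1] == r_dr[j] * r_dr[j + 1]
            and q_dr[i] != r_dr[j]
            and q_dr[i - 1] == r_dr[j - 1]
            and q_dr[i + 2] == r_dr[j + 2])

def penalized_area(a, a_adjusted, lam=0.1):
    """Final area: the adjusted area plus a fraction lam of the area that was removed."""
    return a_adjusted + lam * (a - a_adjusted)

# Duration ratios from the worked example; the first entry is an assumed placeholder.
q_dr = [F(1), F(1, 2), F(1), F(2), F(1)]
r_dr = [F(1), F(1, 2), F(1, 2), F(4), F(1)]

# The third note (index 2) is flagged as a duration replacement.
assert has_duration_replacement(q_dr, r_dr, 2, 2)

# A = 29.5 before the adjustment and A' = 0 after it; lam = 0.1 gives about 2.95.
final_area = penalized_area(29.5, 0.0, lam=0.1)
print(round(final_area, 2))
```

A small λ keeps most of the benefit of the adjustment while still ranking an exact match ahead of a match that needed a correction.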
Figure 3. (a). Query Q, (b). query Q’ with duration replacement in the
third note, (c). reference R.
D. Simultaneously Solving the Two Antisymmetry Problems
The antisymmetry problems caused by the replacement
of pitch and duration can be solved simultaneously as
follows. We first detect the duration replacements and
adjust the detected durations, if any; we then detect the
pitch replacements and reevaluate the area. This proposed
module for solving the antisymmetry problem for both
pitch interval and duration ratio is called SAP-PIDR.
Since the running time of our previously proposed
geometric matching algorithm, called the
Pitch-Interval-GeoMatching module, is O(m²n) and the
detection of pitch or duration replacement is also O(m²n),
the computational complexity of the proposed SAP-PIDR
module, which accounts for the antisymmetry effect,
remains the same.
Figure 4. (a). The area of the region enclosed by query Q and
reference R is A = 29.5, (b). the area of the region enclosed by query Q’
and reference R is A’ = 0.
III. EXPERIMENTAL RESULTS
We used a corpus of 807 pieces of music obtained
from the internet, consisting of a total of 3,065,420 music
notes. A subset of 50 pieces of music were randomly
selected, from each of which a theme, as suggested by
Barlow et al. , was extracted to form a query
consisting of 10 to 36 music notes. As a result, a set of 50
queries were introduced. In order to evaluate the
performance of the proposed method, we generated three
variant versions of the query set. The first version is
generated by randomly changing the pitches of some
notes in each query. The second version is generated by
randomly changing the durations of some notes in each
query. The third version is generated by randomly
changing the pitches of some notes and the durations of
some notes in each query. The variation rate for each
query is between 10% and 20%.
The rank of the corresponding reference in the
retrieved list is called the retrieval rank. As shown in
Figure 5, we tested the first variant version of the query
set over various values of the parameter δ, and found that
the best setting is δ = 0.5 ~ 1.0, from which we chose δ =
0.75.
Figure 6 compares the retrieval rates on top-n lists for
the SAP-PI module and the Pitch-Interval-GeoMatching
module proposed in our previous work, and
demonstrates that the former (SAP-PI) indeed reduces the
antisymmetry effect caused by pitch interval, outperforms
the latter, and achieves a retrieval rate of 100% on top-5
retrieval lists.
Figure 5. The average retrieval rank for the first variant version of the
query set over various settings for δ.
Figure 6. The retrieval rates on top-n lists for two algorithms.
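The two evaluation measures used in these experiments, the retrieval rank and the retrieval rate on top-n lists, can be computed as in the sketch below. The distances and ranks are made-up numbers, ties are broken by insertion order, and the names `retrieval_rank` and `top_n_rate` are hypothetical; none of this is taken from the paper's implementation.

```python
def retrieval_rank(distances, target):
    """1-based rank of target when references are sorted by increasing distance."""
    ranked = sorted(distances, key=distances.get)  # stable sort: ties keep insertion order
    return ranked.index(target) + 1

def top_n_rate(ranks, n):
    """Fraction of queries whose correct reference appears within the first n results."""
    return sum(r <= n for r in ranks) / len(ranks)

# Hypothetical distances from one query to three references.
distances = {"ref_a": 3.0, "ref_b": 0.5, "ref_c": 1.25}
print(retrieval_rank(distances, "ref_c"))  # 2

# Hypothetical retrieval ranks for four queries.
ranks = [1, 2, 5, 1]
print(top_n_rate(ranks, 2))  # 0.75
print(top_n_rate(ranks, 5))  # 1.0
```

A retrieval rate of 100% on top-n lists then means that `top_n_rate(ranks, n)` equals 1.0, i.e., every query finds its reference within the first n results.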
As shown in Figure 7, we tested the second variant
version of the query set over various values of the
parameter λ, and found that the best setting is λ < 0.2,
from which we chose λ = 0.1.
Figure 8 compares the retrieval rates on top-n lists for
the SAP-DR module and the Pitch-Interval-GeoMatching
module, and demonstrates that the former (SAP-DR),
which uses duration ratio instead of duration with the
antisymmetry effect solved, outperforms the latter and
achieves a retrieval rate of 100% on top-16 retrieval lists.
Figure 7. The average retrieval rank for the second variant version of
the query set over various settings for λ.
Figure 8. The retrieval rates on top-n lists for two algorithms.
Finally, we tested our final module SAP-PIDR and the
Pitch-Interval-GeoMatching module on the third variant
version of query set, with δ = 0.75 and λ = 0.1. As shown
in Figure 9, the three modules achieve 100% on top-37,
top-288 and top-565 lists, respectively.
Figure 9. The retrieval rates on top-n lists for three algorithms.
The comparison of precision rates and recall rates for
our SAP-PIDR module and the Pitch-Interval-GeoMatching
module is given in Figure 10. In this experiment, 30 pieces
of music were randomly selected from the corpus of 655
pieces of music to serve as queries. For each of the
selected queries, 15 variant versions were made and added
to the corpus to form a new corpus consisting of 1105
pieces of music. Each relevant reference was generated by
randomly changing the pitches of some notes and the
durations of some notes in a selected query. The variation
rate for each relevant reference is between 2% and 30%.
Figure 10 demonstrates that the SAP-PIDR module
outperforms the Pitch-Interval-GeoMatching module.
[Figure 10 plot: precision rate and recall rate of the Pitch-Interval-GeoMatching and SAP-PIDR modules versus the number of retrieved pieces.]
Figure 10. The precision and recall rates for two algorithms.
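Precision and recall for a list of retrieved pieces can be computed as in the following sketch. It uses the standard definitions, which the paper does not spell out, so the exact procedure is an assumption; the piece identifiers and the function name `precision_recall` are hypothetical. The choice of 15 relevant pieces mirrors the 15 variant versions generated per query.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a set of relevant items."""
    hits = sum(1 for item in retrieved if item in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query with 15 relevant variants in the corpus.
relevant = {f"variant_{k}" for k in range(15)}

# Hypothetical top-20 list containing 12 of the variants and 8 unrelated pieces.
retrieved = [f"variant_{k}" for k in range(12)] + [f"other_{k}" for k in range(8)]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.6 0.8
```

Evaluating this for increasing list lengths gives one precision curve and one recall curve per module, which is the kind of comparison plotted in Figure 10.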
IV. CONCLUSIONS AND FUTURE WORK
In this paper we proposed an algorithm for computing
a geometric measure between two music fragments
represented with pitch interval and duration ratio, with
the capability of detecting and reducing antisymmetry
effects to improve search effectiveness. The experimental
results show that the proposed method successfully
detects the antisymmetry effect and effectively preserves
invariance to both transposition and tempo.
However, the proposed method only deals with the
pitch/duration replacement of discrete (non-adjacent)
notes. In our future work, we shall investigate the
replacement of multiple adjacent notes. Besides, we would
like to integrate a pitch-tracking algorithm with our
matching technique to construct a more practical system.
 G. Aloupis, T. Fevens, S. Langerman, T. Matsui, A. Mesa,
Y. Nunez, D. Rappaport, and G. Toussaint, “Algorithms
for computing geometric measures of melodic similarity,”
Computer Music Journal 30(3) (2006), pp. 67–76.
 D. Bainbridge, M. Dewsnip, and I. H. Witten, “Searching
digital music libraries,” Information Processing and
Management 41(1) (2005), pp. 41–56.
 H. Barlow and S. Morgenstern, “A dictionary of musical
themes,” Crown Publishers, Inc., New York, 1975.
 D. Byrd and T. Crawford, “Problems of music information
retrieval in the real world,” Information Processing and
Management 38 (2002), pp. 249–272.
 A. de Cheveigné and H. Kawahara, “Yin, a fundamental
frequency estimator for speech and music,” Journal of
Acoustical Society of America, 111(4) (2002), pp. 1917–
 R. Clifford, M. Christodoulakis, T. Crawford, D. Meredith,
and G. Wiggins, “A fast, randomised, maximal subset
matching algorithm for document–level music retrieval,”
in Proceedings of ISMIR 2006, pp. 150–155.
 S. Doraisamy and S. Rüger, “A Polyphonic Music
Retrieval System Using N-grams,” in Proceedings of the
Fifth International Conference on Music Information
Retrieval, 2004, pp. 204–209.
 C. Francu and C. G. Nevill–Manning, “Distance metrics
and indexing strategies for a digital library of popular
music,” in Proceedings of the IEEE International
Conference on Multimedia and EXPO, 2000, pp. 889–892.
 A. Guo and H. Siegelmann, “Time-warped longest
common subsequence algorithm for music retrieval,” in
Proceedings of the Fifth International Conference on
Music Information Retrieval, 2004, pp. 10–15.
 K. Lemström and E. Ukkonen, “Including interval
encoding into edit distance based music comparison and
retrieval,” in Proceedings of Symposium on Creative &
Cultural Aspects and Applications of AI & Cognitive
Science (AISB 2000), Birmingham, United Kingdom,
2000, pp. 53–60.
 Hwei-Jen Lin, Hung-Hsuan Wu, and Yang-Ta Kao,
“Geometric measures of distance between two pitch
contour sequences,” Journal of Computers 19(2) (2008),
 Hwei-Jen Lin and Hung-Hsuan Wu, “Efficient geometric
measure of music similarity,” Information Processing
Letter 109(2) (2008), pp. 116–120.
 A. Lubiw and L. Tanur, “Pattern matching in polyphonic
music as a weighted geometric translation problem,” in
Proceedings of the Fifth International Conference on
Music Information Retrieval, 2004, pp. 289-296.
 D. Ó. Maidín, “A geometrical algorithm for melodic
difference,” Computing in Musicology 11 (1998), pp. 65–
 M. Mongeau and D. Sankoff, “Comparison of musical
sequences,” Computers and the Humanities 24 (1990), pp.
 L. A. Smith, R. J. McNab, and I. H. Witten, “Sequence–
based melodic comparison: a dynamic programming
approach,” Computing in Musicology 11 (1998), pp. 101–
 B. Snyder, “Music and memory: an introduction,” MIT
 I. S. H. Suyoto and A. L. Uitdenbogerd, “Effectiveness of
note duration information for music retrieval,” in
Proceedings of the 10th Database Systems for Advanced
Applications Conference, DASFAA 2005, Beijing, China,
2005, pp. 265–275.
 F. Wiering, R. Typke, and R. C. Veltkamp,
“Transportation distances and their application in music–
notation retrieval,” Computing in Musicology 13 (2004),
Hung-Hsuan Wu, born in I-Lan, Taiwan, R.O.C., received the
B.S. and M.S. degrees in Computer Science and Information
Engineering from Tamkang University, Taipei, Taiwan, in 1996
and 2002, respectively. He is currently a Ph.D. candidate at the
Department of Information Engineering of Tamkang University,
Taipei, Taiwan, R.O.C. His research interests include music
information retrieval, pattern recognition, and image processing.
He worked for K. C. Trade Co. as a sales assistant during 1997-
1999, and as a software engineer at Kinpo Electronics, Inc.
Hwei-Jen Lin, born in I-Lan, Taiwan, R.O.C., received the B.S.
degree in applied mathematics from National Chiao Tung