Solving the Antisymmetry Problem Caused by

Pitch Interval and Duration Ratio in Geometric

Matching of Music

Hung-Hsuan Wu*, Hwei-Jen Lin*, Shwu-Huey Yen, and Hsiao-Wei Chang

Department of Computer Science and Information Engineering, Tamkang University,

151 Ying-Chuan Road, Tamsui, Taipei County, Taiwan, R.O.C.

Email: joseph5@seed.net.tw, hjlin@cs.tku.edu.tw, 105390@mail.tku.edu.tw, changhw@cc.cust.edu.tw

Abstract—Music representation with pitch interval and

duration ratio can achieve invariance to transposition and

tempo. However, altering the pitch or duration of a single

music note will cause an antisymmetry effect. This paper

proposes an algorithm for computing a geometric measure

between two music fragments represented with pitch

interval and duration ratio, with the capability of detecting

and reducing these effects to improve search effectiveness.

Index Terms—content-based music retrieval, geometric

matching, pitch interval, duration ratio, antisymmetry effect

I. INTRODUCTION

The two common forms of music data used by content-

based music retrieval systems are audio data and notated

music. Some systems first transcribe a query of the audio

data into notated music and then search in a database of

notated music. Systems searching in a database of notated

music are more common than those searching in a

database of audio data since notated music specifies

plenty of information, including pitch, duration, timbre,

loudness, and others. Therefore, this paper will focus on

systems searching in databases of notated music.

Techniques based on string matching, such as edit

distance [9] [10] [15] [16] [18] and n-gram [2] [7], have

been extensively explored in the literature, and can tackle

approximate matching. Although pitch string is the most

commonly used feature in these techniques, it alone is not

sufficient for searching in a large database [4]. Smith et

al. [16] concluded that both pitch and duration are the

most important attributes, and their finding will be

followed in this research. Matching methods based on

edit distance are quick and flexible. However, because the

pitch and duration information are processed separately,

these methods cannot effectively handle the problem of pitch

and tempo fluctuation.

When it comes to comparing two melodic fragments,

techniques based on geometric matching have been

explored over the past few years, including point set

matching [6] [19] and sweeping-line matching [1] [8] [11]

[13] [14]. The advantage of geometric matching is its high

retrieval accuracy; its drawback is that

it is time-consuming, with a time complexity that is usually

worse than quadratic. For retrieval accuracy,

we adopted geometric matching, and improved its

effectiveness by using the representation with pitch

interval and duration ratio with the corresponding

antisymmetry effects reduced. For a detailed survey of

geometric matching please refer to our previous work

[11] [12].

In the cognitive theory of music [17], the melodic

motion (characterized by successive pitch intervals) and

contour are very important for the perception of a

melody. If a melody is transposed to a different key, the

melodic motion and contour remain the same; this property is

called ‘invariance to transposition’. When the tempo of a melody is

changed uniformly, the rhythmic pattern does not

change; this property is referred to as ‘invariance to tempo’.

Thus, for a better performance, we shall use the pitch

interval instead of the pitch and the duration ratio instead

of the duration in order to achieve both types of

invariance. However, changing the pitch/duration of a single

note affects two consecutive pitch intervals/duration ratios;

this is called an antisymmetry effect. For

geometric matching, the antisymmetry effect increases

the distance between two similar melodies by a multiple of the

expected amount. The main issue of this paper is to detect

and reduce the antisymmetry effect of pitch and of

duration.
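The two invariances and the antisymmetry effect can be illustrated with a minimal sketch. The melody, the helper names `pitch_intervals` and `duration_ratios`, and the use of MIDI pitch numbers are assumptions made for illustration only; they are not taken from the system described in this paper.

```python
from fractions import Fraction

def pitch_intervals(pitches):
    """Differences between consecutive pitches (in semitones)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def duration_ratios(durations):
    """Ratio of each duration to the preceding one, kept exact."""
    return [Fraction(b) / Fraction(a) for a, b in zip(durations, durations[1:])]

# Hypothetical melody: MIDI pitches and durations in beats.
pitches = [60, 62, 64, 62, 60]
durations = [1, 1, 2, 1, 1]

# Transposition and a uniform tempo change leave both representations unchanged.
transposed = [p + 5 for p in pitches]
faster = [Fraction(d, 2) for d in durations]
assert pitch_intervals(transposed) == pitch_intervals(pitches)
assert duration_ratios(faster) == duration_ratios(durations)

# Replacing one pitch (third note, 64 -> 65) changes two consecutive intervals,
# while their sum stays the same.
variant = list(pitches)
variant[2] = 65
print(pitch_intervals(pitches))  # [2, 2, -2, -2]
print(pitch_intervals(variant))  # [2, 3, -3, -2]
```

The last two lines show why a single altered note is counted twice by a distance computed on intervals: the second and third intervals both differ, although only one note was changed.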

The remainder of this paper is organized as follows.

Section 2 describes how to solve the antisymmetry

problems. Section 3 shows the experimental results and

compares the performance of the representation by

pitch/pitch interval and duration/duration ratio, and shows

how the performance of the representation by pitch

interval and duration ratio can be improved by reducing

the antisymmetry effect. Finally, we draw our

conclusions and provide suggestions for future work in

Section 4.

II. SOLVING ANTISYMMETRY PROBLEM

In this section, we first briefly introduce our previously

proposed geometric matching method for pitch interval-

duration sequences [11], and show how to solve the

antisymmetry problem caused by pitch replacement. Then

we introduce a modified version, pitch interval-duration

ratio sequences, and show how to solve the antisymmetry

problem caused by duration change.

* Authors contributed equally to this work.

JOURNAL OF MULTIMEDIA, VOL. 5, NO. 5, OCTOBER 2010

© 2010 ACADEMY PUBLISHER

doi:10.4304/jmm.5.5.522-527

A. Pitch Interval-Duration Geometric Matching

In our previous work [11] [12], a music fragment was

represented as a pitch interval-duration sequence (PID

sequence), where a pitch interval is defined as the

difference of the pitches of two consecutive notes. Each

note could be described by a horizontal line segment, of

which the height and the width denote its pitch interval

and duration, respectively. For every two consecutive line

segments, we connect the end point of the former and the

starting point of the latter by a vertical edge. In such a

way, a PID sequence can be expressed as an orthogonal

chain, as shown in Figure 1. The distance between two

music fragments can then be measured based on the

minimum of the areas of regions enclosed by their

representing orthogonal chains over all possible

alignments. This kind of matching mechanism is called

pitch interval geometric matching.

Using the pitch interval instead of the absolute pitch not only provides transposition invariance but also removes the need to shift the query in the vertical direction during the matching process. Without loss of generality, we assume that the reference sequence is fixed. Thus we need only shift the query sequence horizontally to find the best alignment (the one yielding the minimum area). As depicted in Figure 1, both the query and the reference are modeled as monotonic pitch interval rectilinear functions of time. The region between the two orthogonal chains is partitioned into rectangles $C_\alpha$, $\alpha = 1, 2, \ldots, k$, each defined by two vertical edges and two horizontal segments, where each vertical edge occurs at an endpoint of a note and each horizontal segment corresponds to a pitch interval. The area is measured as $A = \sum_{\alpha=1}^{k} A_\alpha = \sum_{\alpha=1}^{k} w_\alpha \cdot h_\alpha$, where $w_\alpha$, $h_\alpha$, and $A_\alpha$ denote the width, the height, and the area of the rectangle $C_\alpha$, respectively.

Since the minimum area between two melodies must occur when two vertical edges of the two melodies coincide, and some of these cases may be duplicates (i.e., multiple coinciding edge pairs occur at the same time) [1], there are at most $mn$ horizontal positions to be evaluated. Therefore, there are at most $O(mn)$ different regions whose areas must be evaluated, and thus it takes $O(m^2 n)$ time to find the minimum area. A detailed description of the algorithm can be found in our previous work [11].


Figure 1. Two PID sequences are represented as orthogonal chains, the

region enclosed by which is shaded.
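The area measure and the search over horizontal alignments can be sketched as follows. This is a brute-force illustration and not the $O(m^2 n)$ algorithm of [11]. It assumes that each chain is given as a list of (height, duration) pairs, that the query is no longer than the reference, and that only alignments keeping the query inside the time span of the reference are tried; the function names and the example sequences are hypothetical.

```python
def breakpoints(seq, start=0):
    """Start time of every segment of a chain, followed by its end time."""
    times = [start]
    for _, duration in seq:
        times.append(times[-1] + duration)
    return times

def height_at(seq, times, t):
    """Height of the chain at time t (None outside the chain)."""
    for (height, _), lo, hi in zip(seq, times, times[1:]):
        if lo <= t < hi:
            return height
    return None

def area_between(query, reference, offset):
    """Sum of w * h over the rectangles between the two chains."""
    q_times = breakpoints(query, offset)
    r_times = breakpoints(reference)
    lo = max(q_times[0], r_times[0])
    hi = min(q_times[-1], r_times[-1])
    # Every vertical edge of either chain starts a new rectangle.
    cuts = sorted({t for t in q_times + r_times if lo <= t <= hi})
    area = 0.0
    for a, b in zip(cuts, cuts[1:]):
        mid = (a + b) / 2
        h = abs(height_at(query, q_times, mid) - height_at(reference, r_times, mid))
        area += (b - a) * h
    return area

def best_alignment(query, reference):
    """Try every offset at which two vertical edges coincide; return (area, offset)."""
    q_times = breakpoints(query)
    r_times = breakpoints(reference)
    q_len, r_len = q_times[-1], r_times[-1]
    offsets = {r - q for r in r_times for q in q_times
               if r - q >= 0 and (r - q) + q_len <= r_len}
    return min((area_between(query, reference, off), off) for off in sorted(offsets))

reference = [(0, 1), (2, 1), (-2, 2), (1, 1)]
query = [(2, 1), (-1, 2)]
print(best_alignment(query, reference))  # (2.0, 1)
```

Restricting the candidate offsets to those where two vertical edges coincide follows the observation cited from [1]; how the region outside the overlap should be treated is not specified here, so the sketch simply avoids that case.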

B. Solving Antisymmetry Problem for Pitch Interval

(SAP-PI)

Although the use of pitch interval works well for transposition invariance in music matching, a single pitch replacement causes an antisymmetry effect: the areas of two consecutive rectangles change. The replacement can be detected by checking whether the sum of two successive pitch intervals in the query is equal to the sum of the two corresponding successive intervals in the reference, and the effect can then be reduced by deducting a fraction δ (0 ≤ δ ≤ 1) of the area of these rectangles.

As shown in Figure 2(a), the pitch of the second note of the query Q is changed to form a variant query Q’, which is then matched with a given reference R. When the (starting time of the) first note of the variant query and the first note of the reference are aligned as shown in Figure 2(b), the area enclosed by the two sequences is $A = \sum_{1 \le i \le 5} A_i$. Observing that the sum of the pitch intervals of the second and third notes of the variant query and that of the reference are equal (-2+2 = +1-1), we deduce that the pitch replacement occurs in the second note of the query, and the areas $A_2$ and $A_3$ of the second and third rectangles are considered to be caused by the antisymmetry effect. If we set the deducting factor δ to 0.5, both evaluated areas $A_2$ and $A_3$ are halved, and the total area is recomputed as $A' = A - (0.5 A_2 + 0.5 A_3)$. In particular, when δ is set to 1 the antisymmetry effect is completely removed, and when δ is set to 0 it is completely retained. In our experiments we set δ = 0.75, which was found to improve the retrieval results.

Let $\langle n_1, n_2, \ldots, n_k \rangle$ be a PID sequence representing a music fragment of k notes, where each $n_i$ is represented as $(n\_pi_i, n\_d_i)$, and $n\_pi_i$ and $n\_d_i$ denote its pitch interval and duration, respectively. To detect a pitch replacement causing an antisymmetry effect while matching two PID sequences $Q = \langle q_1, q_2, \ldots, q_m \rangle$ and $R = \langle r_1, r_2, \ldots, r_n \rangle$, we follow the rule: if $q\_pi_i + q\_pi_{i+1} = r\_pi_j + r\_pi_{j+1}$, $q\_pi_i \ne r\_pi_j$, $q\_pi_{i-1} = r\_pi_{j-1}$, and $q\_pi_{i+2} = r\_pi_{j+2}$, then the note $q_i$ is said to have a pitch replacement. As shown in Figure 2, the second (i = 2 and j = 2) note of the query satisfies the above condition, so the pitch replacement is detected. This proposed module solving the antisymmetry problem for pitch interval is called SAP-PI.
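The detection rule and the deduction by δ can be written down as a short sketch. It uses 0-based indices, treats a neighbor that falls outside either sequence as matching, and flags rectangles by their index; these choices, the function names, and the interval and area values other than the (-2, +2) and (+1, -1) pairs and the totals 4.5 and 2.25 quoted for Figure 2 are assumptions made for illustration.

```python
def neighbor_matches(q, r, i, j):
    """True if q[i] == r[j]; a neighbor outside either sequence is accepted."""
    if not (0 <= i < len(q) and 0 <= j < len(r)):
        return True  # assumption: a missing neighbor does not block detection
    return q[i] == r[j]

def has_pitch_replacement(q_pi, r_pi, i, j):
    """Check the SAP-PI rule for query index i and reference index j (0-based)."""
    if i < 0 or j < 0 or i + 1 >= len(q_pi) or j + 1 >= len(r_pi):
        return False
    return (q_pi[i] + q_pi[i + 1] == r_pi[j] + r_pi[j + 1]
            and q_pi[i] != r_pi[j]
            and neighbor_matches(q_pi, r_pi, i - 1, j - 1)
            and neighbor_matches(q_pi, r_pi, i + 2, j + 2))

def reduced_area(areas, flagged, delta=0.75):
    """Total area after deducting delta times the area of each flagged rectangle."""
    return sum(areas) - delta * sum(areas[k] for k in flagged)

# Hypothetical pitch-interval sequences; only the second and third entries
# follow the example in the text.
q_pi = [1, -2, 2, 0, 1]
r_pi = [1, 1, -1, 0, 1]
print(has_pitch_replacement(q_pi, r_pi, 1, 1))  # True

# Hypothetical rectangle areas with A = 4.5 and A_2 + A_3 = 4.5, as in Figure 2.
areas = [0.0, 3.0, 1.5, 0.0, 0.0]
print(reduced_area(areas, flagged=[1, 2], delta=0.5))  # 2.25
```

With δ = 1 the flagged rectangles contribute nothing, and with δ = 0 the original area is returned unchanged, matching the two extreme cases described above.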

C. Solving Antisymmetry Problem for Duration Ratio

(SAP-DR)

To achieve tempo invariance, we may adopt a pitch interval-duration ratio sequence (called a PIDR sequence) instead of a PID sequence to represent a melody fragment. If we let $\langle n_1, n_2, \ldots, n_k \rangle$ be a PIDR sequence representing a melody fragment of k notes, then each note $n_i$ is represented by its pitch interval $n\_pi_i$ and its duration ratio $n\_dr_i$ as $(n\_pi_i, n\_dr_i)$, where the duration ratio $n\_dr_i$ is defined as the ratio of the duration of note $n_i$ to that of the preceding note $n_{i-1}$; that is, $n\_dr_i = n\_d_i / n\_d_{i-1}$. As with pitch intervals, the representation with duration ratios has an antisymmetry problem, here caused by duration replacement. We reduce the problem by adjusting the duration of the corresponding query note and adding a penalty to the reevaluated area. Let $Q = \langle q_1, q_2, \ldots, q_m \rangle$ and $R = \langle r_1, r_2, \ldots, r_n \rangle$ be two PIDR sequences; then the conditions for detecting a duration replacement on note $q_i$ are as follows: $q\_dr_i \cdot q\_dr_{i+1} = r\_dr_j \cdot r\_dr_{j+1}$, $q\_dr_i \ne r\_dr_j$, $q\_dr_{i-1} = r\_dr_{j-1}$, and $q\_dr_{i+2} = r\_dr_{j+2}$.

Figure 2. (a) The PID sequences of an original query, a variant query, and a reference; (b) the antisymmetry effect occurs at the second and third rectangles. If δ = 0.5, the total area is reduced from A = 4.5 to A’ = 2.25.

Suppose the third note of the query shown in Figure 3(a) undergoes a duration replacement and becomes the melody shown in Figure 3(b). When the (starting time of the) first note of the new query and the first note of the reference given in Figure 3(c) are aligned, the area enclosed by the two sequences is $A = \sum_{1 \le i \le 9} A_i = 29.5$, as shown in Figure 4(a). We detect the duration replacement by observing that $q\_dr_3 \cdot q\_dr_4 = 1 \cdot 2 = 1/2 \cdot 4 = r\_dr_3 \cdot r\_dr_4$, $q\_dr_3 = 1 \ne 1/2 = r\_dr_3$, $q\_dr_2 = 1/2 = r\_dr_2$, and $q\_dr_5 = 1 = r\_dr_5$. We then adjust the duration of the corresponding note in the query to $q\_d_3 = r\_d_3$ so that the reevaluated area $A'$ becomes zero, as shown in Figure 4(b). Finally, we add a penalty value $\lambda (A - A')$ to $A'$ to produce the final evaluated area $A'' = A' + \lambda (A - A')$ corresponding to this alignment. In this research we set λ = 0.1, and thus $A'' = 0 + 0.1 \times 29.5 = 2.95$ in this example. This proposed module solving the antisymmetry problem for duration ratio is called SAP-DR.
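A minimal sketch of the duration-ratio rule and the penalty, under stated assumptions: indices are 0-based, ratios are stored as exact fractions, the ratio of the first note (which has no predecessor) is set to 1 as a placeholder, and a neighbor outside either sequence is accepted. The function names are hypothetical; the ratio values for the second to fifth notes and the areas 29.5 and 0 are those of the example above.

```python
from fractions import Fraction as F

def neighbor_matches(q, r, i, j):
    """True if q[i] == r[j]; a neighbor outside either sequence is accepted."""
    if not (0 <= i < len(q) and 0 <= j < len(r)):
        return True  # assumption: a missing neighbor does not block detection
    return q[i] == r[j]

def has_duration_replacement(q_dr, r_dr, i, j):
    """Check the SAP-DR conditions for query index i and reference index j."""
    if i < 0 or j < 0 or i + 1 >= len(q_dr) or j + 1 >= len(r_dr):
        return False
    return (q_dr[i] * q_dr[i + 1] == r_dr[j] * r_dr[j + 1]
            and q_dr[i] != r_dr[j]
            and neighbor_matches(q_dr, r_dr, i - 1, j - 1)
            and neighbor_matches(q_dr, r_dr, i + 2, j + 2))

def penalized_area(area, adjusted_area, lam=0.1):
    """A'' = A' + lambda * (A - A')."""
    return adjusted_area + lam * (area - adjusted_area)

# Duration ratios of notes 1..5; the first entry is a placeholder.
q_dr = [F(1), F(1, 2), F(1), F(2), F(1)]
r_dr = [F(1), F(1, 2), F(1, 2), F(4), F(1)]
print(has_duration_replacement(q_dr, r_dr, 2, 2))  # True (third note)

print(f"{penalized_area(29.5, 0.0):.2f}")  # 2.95
```

Exact fractions are used so that the product test is not affected by floating-point rounding; with λ = 0 the penalty vanishes and only the reevaluated area remains.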

Figure 3. (a) Query Q; (b) query Q’ with a duration replacement in the third note; (c) reference R.

D. Simultaneously Solving the Two Antisymmetry

Problems (SAP-PIDR)

The antisymmetry problems caused by the replacement of pitch and duration can be solved simultaneously as follows. We first detect duration replacements and adjust the detected durations, if any; we then detect pitch replacements and reevaluate the area. This proposed module solving the antisymmetry problem for both pitch interval and duration ratio is called SAP-PIDR.

Since the running time of our previously proposed geometric matching algorithm, called the Pitch-Interval-GeoMatching module, is $O(m^2 n)$ and the detection of a replacement of pitch or duration is also $O(m^2 n)$, the computational complexity of the proposed module SAP-PIDR, which takes the antisymmetry effect into account, remains the same.

Figure 4. (a) The area of the region enclosed by query Q and reference R is A = 29.5; (b) the area of the region enclosed by query Q’ and reference R is A’ = 0.

III. EXPERIMENTAL RESULTS

We used a corpus of 807 pieces of music obtained

from the internet, consisting of a total of 3,065,420 music

notes. A subset of 50 pieces of music were randomly

selected, from each of which a theme, as suggested by

Barlow et al. [3], was extracted to form a query

consisting of 10 to 36 music notes. As a result, a set of 50

queries were introduced. In order to evaluate the

performance of the proposed method, we generated three

variant versions of the query set. The first version is

generated by randomly changing the pitches of some

notes in each query. The second version is generated by

randomly changing the durations of some notes in each

query. The third version is generated by randomly

changing the pitches of some notes and the durations of

some notes in each query. The variation rate for each

query is between 10% and 20%.
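The generation of variant queries can be sketched as below. The text does not state how the altered notes or the new values were chosen, so the sketch assumes that notes are picked uniformly at random, that a pitch is shifted by one or two semitones, and that a duration is halved or doubled; the function name `alter_notes` and the example query are hypothetical.

```python
import random

def alter_notes(notes, rate, field, rng):
    """Return a copy of notes in which about rate * len(notes) notes are changed.

    notes is a list of (pitch, duration) pairs; field is "pitch" or "duration".
    """
    count = max(1, round(rate * len(notes)))
    variant = list(notes)
    for k in rng.sample(range(len(notes)), count):
        pitch, duration = variant[k]
        if field == "pitch":
            pitch += rng.choice([-2, -1, 1, 2])   # assumed semitone shifts
        else:
            duration *= rng.choice([0.5, 2.0])    # assumed halving or doubling
        variant[k] = (pitch, duration)
    return variant

rng = random.Random(0)                 # fixed seed for reproducibility
query = [(60 + i % 5, 1.0) for i in range(20)]
rate = rng.uniform(0.10, 0.20)         # variation rate between 10% and 20%

first_version = alter_notes(query, rate, "pitch", rng)
second_version = alter_notes(query, rate, "duration", rng)
third_version = alter_notes(alter_notes(query, rate, "pitch", rng), rate, "duration", rng)
```

Note that this sketch does not prevent two adjacent notes from being altered, a case the detection rules of Section II do not cover.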

The rank of the corresponding reference in the

retrieved list is called the retrieval rank. As shown in

Figure 5, we tested the first variant version of the query

set over various values of the parameter δ, and found that

the best setting is δ = 0.5 ~ 1.0, from which we chose δ =

0.75.

Figure 6 compares the retrieval rates on top-n lists for

the SAP-PI module and the Pitch-Interval-GeoMatching

module proposed in our previous work [11], and

demonstrates that the former (SAP-PI) indeed reduces the

antisymmetry effect on pitch intervals, outperforms the

latter, and achieves a retrieval rate of 100% on top-5

retrieval lists.
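The evaluation measures used in this section can be sketched as follows, assuming that the retrieved list is ordered by ascending distance and that the retrieval rate on a top-n list is the fraction of queries whose retrieval rank is at most n. The function names, the piece identifiers, and the rank values are hypothetical and are not experimental data.

```python
def retrieval_rank(distances, target):
    """1-based rank of target when pieces are sorted by ascending distance.

    distances maps a piece identifier to its distance from the query;
    ties keep the insertion order of the dictionary.
    """
    ordered = sorted(distances, key=distances.get)
    return ordered.index(target) + 1

def average_rank(ranks):
    """Mean retrieval rank over all queries."""
    return sum(ranks) / len(ranks)

def top_n_rate(ranks, n):
    """Fraction of queries whose target appears within the top n results."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

print(retrieval_rank({"a": 3.0, "b": 0.5, "c": 1.2}, "c"))  # 2

ranks = [1, 1, 2, 5, 1]        # hypothetical retrieval ranks of five queries
print(average_rank(ranks))     # 2.0
print(top_n_rate(ranks, 1))    # 0.6
print(top_n_rate(ranks, 5))    # 1.0
```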

[Figure 5 plot: horizontal axis δ (0, 0.5, 0.75, 0.875, 1); vertical axis average rank (0.0 to 35.0); value labels as extracted: 1.211, 1.105, 1.105, 1.079, 31.421.]

Figure 5. The average retrieval rank for the first variant version of the query set over various settings for δ.

[Figure 6 plot: horizontal axis n (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 807); vertical axis retrieval rate (50% to 100%); curves: Pitch-Interval-GeoMatching, SAP-PI.]

Figure 6. The retrieval rates on top-n lists for two algorithms.

As shown in Figure 7, we tested the second variant

version of the query set over various values of the

parameter λ, and found that the best setting is λ < 0.2,

from which we chose λ = 0.1.

Figure 8 compares the retrieval rates on top-n lists for

the SAP-DR module and the Pitch-Interval-GeoMatching

module, and demonstrates that the former (SAP-DR),

which uses duration ratio instead of duration with the

antisymmetry effect solved, outperforms the latter and

achieves a retrieval rate of 100% on top-16 retrieval lists.

[Figure 7 plot: horizontal axis λ (0.6, 0.4, 0.2, 0.1); vertical axis average rank (0 to 25); value labels as extracted: 22.84, 2.03, 1.24, 14.14.]

Figure 7. The average retrieval rank for the second variant version of the query set over various settings for λ.

[Figure 8 plot: horizontal axis n (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 807); vertical axis retrieval rate (60% to 100%); curves: Pitch-Interval-GeoMatching, SAP-DR.]

Figure 8. The retrieval rates on top-n lists for two algorithms.

Finally, we tested our final module SAP-PIDR and the

Pitch-Interval-GeoMatching module on the third variant

version of query set, with δ = 0.75 and λ = 0.1. As shown

in Figure 9, the three modules achieve 100% on top-37,

top-288 and top-565 lists, respectively.

[Figure 9 plot: horizontal axis n (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 807); vertical axis retrieval rate (30% to 100%); legend entries as extracted: SAP-PIDR, Pitch-Interval-GeoMatching.]

Figure 9. The retrieval rates on top-n lists for three algorithms.

The comparison of precision rates and recall rates for

our SAP-PIDR module and the Pitch-Interval-

GeoMatching module is given in Figure 10. In this

experiment, 30 pieces of music were randomly selected

from the corpus of 655 pieces of music to serve as

queries. For each of the selected queries 15 variant

versions were made and added to the corpus to form a

new corpus consisting of 1105 pieces of music. Each

relevant reference was generated by randomly changing

the pitches of some notes and durations of some notes in

a selected query. The variation rate for each relevant

reference is between 2% and 30%. Figure 10

demonstrates that the SAP-PIDR module outperforms the

Pitch-Interval-GeoMatching module.
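Precision and recall as functions of the number of retrieved pieces can be computed as in the following sketch. It assumes the standard definitions (precision is the fraction of retrieved pieces that are relevant, recall is the fraction of relevant pieces that are retrieved); the function name and the piece identifiers are hypothetical.

```python
def precision_recall_at(retrieved, relevant, k):
    """Precision and recall over the first k retrieved pieces.

    retrieved is a ranked list of piece identifiers; relevant is the set of
    pieces that count as correct answers for the query.
    """
    top = retrieved[:k]
    if not top or not relevant:
        return 0.0, 0.0
    hits = sum(1 for piece in top if piece in relevant)
    return hits / len(top), hits / len(relevant)

# Hypothetical ranked list for one query with four relevant variants.
retrieved = ["v1", "x", "v2", "y", "v3"]
relevant = {"v1", "v2", "v3", "v4"}

for k in (1, 3, 5):
    precision, recall = precision_recall_at(retrieved, relevant, k)
    print(k, round(precision, 3), round(recall, 3))
```

Averaging these two values over all queries for each number of retrieved pieces yields curves of the kind compared in Figure 10.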

[Figure 10 plot: horizontal axis number of retrieved pieces (1, 10, 100, 1000); vertical axis rate (0 to 1); curves: precision rate of Pitch-Interval-GeoMatching, precision rate of SAP-PIDR, recall rate of Pitch-Interval-GeoMatching, recall rate of SAP-PIDR.]

Figure 10. The precision and recall rates for two algorithms.

IV. CONCLUSIONS AND FUTURE WORK

In this paper we proposed an algorithm for computing a geometric measure between two music fragments represented with pitch interval and duration ratio, with the capability of detecting and reducing the antisymmetry effects to improve search effectiveness. The experimental results show that the proposed method successfully detects the antisymmetry effect and effectively preserves invariance to both transposition and tempo.

However, the proposed method only deals with the pitch/duration replacement of discrete (non-adjacent) notes. In our future work, we shall investigate the replacement of multiple adjacent notes. In addition, we would like to integrate a pitch-tracking algorithm [5] with our matching technique to construct a more practical system.

REFERENCES

[1] G. Aloupis, T. Fevens, S. Langerman, T. Matsui, A. Mesa,

Y. Nunez, D. Rappaport, and G. Toussaint, “Algorithms

for computing geometric measures of melodic similarity,”

Computer Music Journal 30(3) (2006), pp. 67–76.

[2] D. Bainbridge, M. Dewsnip, and I. H. Witten, “Searching

digital music libraries,” Information Processing and

Management 41(1) (2005), pp. 41–56.

[3] H. Barlow and S. Morgenstern, “A dictionary of musical

themes,” Crown Publishers, Inc., New York, 1975.

[4] D. Byrd and T. Crawford, “Problems of music information

retrieval in the real world,” Information Processing and

Management 38 (2002), pp. 249–272.

[5] A. de Cheveigné and H. Kawahara, “Yin, a fundamental

frequency estimator for speech and music,” Journal of

Acoustical Society of America, 111(4) (2002), pp. 1917–

1930.

[6] R. Clifford, M. Christodoulakis, T. Crawford, D. Meredith,

and G. Wiggins, “A fast, randomised, maximal subset

matching algorithm for document–level music retrieval,”

in Proceedings of ISMIR 2006, pp. 150–155.

[7] S. Doraisamy and S. Rüger, “A Polyphonic Music

Retrieval System Using N-grams,” in Proceedings of the

Fifth International Conference on Music Information

Retrieval, 2004, pp. 204–209.

[8] C. Francu and C. G. Nevill–Manning, “Distance metrics

and indexing strategies for a digital library of popular

music,” in Proceedings of the IEEE International

Conference on Multimedia and EXPO, 2000, pp. 889–892.

[9] A. Guo and H. Siegelmann, “Time-warped longest

common subsequence algorithm for music retrieval,” in

Proceedings of the Fifth International Conference on

Music Information Retrieval, 2004, pp. 10–15.

[10] K. Lemström and E. Ukkonen, “Including interval

encoding into edit distance based music comparison and

retrieval,” in Proceedings of Symposium on Creative &

Cultural Aspects and Applications of AI & Cognitive

Science (AISB 2000), Birmingham, United Kingdom,

2000, pp. 53–60.

[11] Hwei-Jen Lin, Hung-Hsuan Wu, and Yang-Ta Kao,

“Geometric measures of distance between two pitch

contour sequences,” Journal of Computers 19(2) (2008),

pp. 44–66.

[12] Hwei-Jen Lin and Hung-Hsuan Wu, “Efficient geometric

measure of music similarity,” Information Processing

Letters 109(2) (2008), pp. 116–120.

[13] A. Lubiw and L. Tanur, “Pattern matching in polyphonic

music as a weighted geometric translation problem,” in

Proceedings of the Fifth International Conference on

Music Information Retrieval, 2004, pp. 289-296.

[14] D. Ó. Maidín, “A geometrical algorithm for melodic

difference,” Computing in Musicology 11 (1998), pp. 65–

72.

[15] M. Mongeau and D. Sankoff, “Comparison of musical

sequences,” Computers and the Humanities 24 (1990), pp.

161–175.

[16] L. A. Smith, R. J. McNab, and I. H. Witten, “Sequence–

based melodic comparison: a dynamic programming

approach,” Computing in Musicology 11 (1998), pp. 101–

117.

[17] B. Snyder, “Music and memory: an introduction,” MIT

Press, 2000.

[18] I. S. H. Suyoto and A. L. Uitdenbogerd, “Effectiveness of

note duration information for music retrieval,” in

Proceedings of the 10th Database Systems for Advanced

Applications Conference, DASFAA 2005, Beijing, China,

2005, pp. 265–275.

[19] F. Wiering, R. Typke, and R. C. Veltkamp,

“Transportation distances and their application in music–

notation retrieval,” Computing in Musicology 13 (2004),

pp. 113–128.

Hung-Hsuan Wu, born in I-Lan, Taiwan, R.O.C., received the

B.S. and M.S. degrees in Computer Science and Information

Engineering from Tamkang University, Taipei, Taiwan in 1996

and 2002, respectively. He is currently a Ph.D. candidate at the

Department of Information Engineering of Tamkang University,

Taipei, Taiwan, R.O.C. His research interests include music

information retrieval, pattern recognition and image processing.

He worked for K. C. Trade Co. as a sales assistant during 1997-

1999, and as a software engineer at Kinpo Electronics, Inc.

in 2002-2003.

Hwei-Jen Lin, born in I-Lan, Taiwan, R.O.C., received the B.S.

degree in applied mathematics from National Chiao Tung

University, Hsinchu, Taiwan in 1981 and the M.S. and the

Ph.D. degrees in mathematics from Northeastern University,

Boston, U.S.A. in 1983 and 1989, respectively. She is currently

an associate professor at the Department of Computer Science

and Information Engineering of Tamkang University, Taipei,

Taiwan, R.O.C. Her current research interests include pattern

recognition, image processing, and intelligent computation

algorithms. She worked for Northeastern University, Boston

and Rhode College, Providence, Rhode Island in U.S.A., as a

lecturer in 1989-1990 and an assistant professor in 1990-1991,

respectively.

Shwu-Huey Yen, born in Taipei, Taiwan, R.O.C., received the

B.S. degree in applied mathematics from Fu-Jen Catholic

University, Taipei, Taiwan in 1980 and the M.S. and the Ph.D.

degrees in mathematics from Northeastern University, Boston,

U.S.A. in 1982 and 1986, respectively.

She is currently an associate professor at the Department of

Computer Science and Information Engineering of Tamkang

University, Taipei, Taiwan, R.O.C. Her research interest is in

multimedia processing including security, feature matching and

information retrieval.

Hsiao-Wei Chang, born in Tainan, Taiwan, R.O.C., received

the B.S. degree in Applied Mathematics from Fu Jen Catholic

University, Taipei, Taiwan in 1980 and the M.S. degree in

Computer Science from Texas A&I University, Texas, U.S.A.

in 1986. He is currently a Ph.D. candidate at the Department of

Computer Science and Information Engineering of Tamkang

University, Taipei, Taiwan, R.O.C. His research interests

include pattern recognition and image processing.

He has been a lecturer at the Department of Computer

Science and Information Engineering, China University of

Science and Technology, Taipei, Taiwan, R.O.C., since 1988.
