ABSTRACT: This paper introduces a new music signal processing method to extract multiple fundamental frequencies, which we call specmurt analysis. In contrast with cepstrum which is the inverse Fourier transform of log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic sound have a common harmonic pattern, the sound spectrum can be regarded as a sum of linearly stretched common harmonic structures along frequency. In the log-frequency domain, it is formulated as the convolution of a common harmonic structure and the distribution density of the fundamental frequencies of multiple tones. The fundamental frequency distribution can be found by deconvolving the observed spectrum with the assumed common harmonic structure, where the common harmonic structure is given heuristically or quasi-optimized with an iterative algorithm. The efficiency of specmurt analysis is experimentally demonstrated through generation of a piano-roll-like display from a polyphonic music signal and automatic sound-to-MIDI conversion. Multipitch estimation accuracy is evaluated over several polyphonic music signals and compared with manually annotated MIDI data.
IEEE Transactions on Audio Speech and Language Processing 04/2008; 16(3-16):639 - 650. DOI:10.1109/TASL.2007.912998 · 2.48 Impact Factor
ABSTRACT: In this paper, we propose a new signal processing technique, "specmurt anasylis," that provides piano-roll-like visual display of multi-tone signals (e.g., polyphonic music). Specmurt is defined as inverse Fourier transform of linear spectrum with logarithmic frequency, unlike familiar cepstrum defined as inverse Fourier transform of logarithmic spectrum with linear frequency. We apply this technique to music signals frencyque anasylis using specmurt filreting instead of quefrency alanysis using cepstrum liftering. Suppose that each sound contained in the multi-pitch signal has exactly the same harmonic structure pattern (i.e., the energy ratio of harmonic components), in logarithmic frequency domain the overall shape of the multi-pitch spectrum is a superposition of the common spectral patterns with different degrees of parallel shift. The overall shape can be expressed as a convolution of a fundamental frequency pattern (degrees of parallel shift and power) and the common harmonic structure pattern. The fundamental frequency pattern is restored by division of the inverse Fourier transform of a given log-frequency spectrum, i.e., specmurt, by that of the common harmonic structure pattern. The proposed method was successfully tested on several pieces of music recordings.
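The deconvolution step described in the abstracts above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it is a toy example under stated assumptions: the log-frequency axis is sampled uniformly at a hypothetical 24 bins per octave, the common harmonic structure is fixed with amplitudes 1/k^2 for the first six harmonics (chosen so the division is well conditioned), and the observed spectrum is synthetic and noise-free. The names `specmurt_deconvolve`, `u_true`, `h`, and `v` are illustrative.

```python
import numpy as np

def specmurt_deconvolve(v, h, eps=1e-8):
    """Recover a fundamental-frequency pattern u from v = u (*) h.

    v: observed linear-power spectrum on a uniform log-frequency grid.
    h: assumed common harmonic structure on the same grid.
    Both arrays must be long enough that the convolution does not wrap around.
    """
    n = len(v)
    V = np.fft.ifft(v)          # specmurt of the observed spectrum
    H = np.fft.ifft(h) * n      # scaled so that V = U * H for NumPy's ifft convention
    # Guard against division by (near-)zero values; eps is an assumed threshold.
    H_safe = np.where(np.abs(H) < eps, eps, H)
    U = V / H_safe              # division in the specmurt domain
    return np.real(np.fft.fft(U))  # back to the log-frequency domain

# --- Synthetic example (all values below are assumptions) ---
n_bins = 256
bins_per_octave = 24

# Common harmonic structure: harmonic k sits log2(k) octaves above the fundamental.
harmonics = np.arange(1, 7)
offsets = np.round(bins_per_octave * np.log2(harmonics)).astype(int)
h = np.zeros(n_bins)
h[offsets] = 1.0 / harmonics**2

# Fundamental-frequency pattern: three simultaneous tones with different powers.
u_true = np.zeros(n_bins)
u_true[[10, 17, 40]] = [1.0, 0.8, 0.6]

# Observed spectrum: superposition of shifted copies of h (a convolution).
v = np.convolve(u_true, h)[:n_bins]

u_est = specmurt_deconvolve(v, h)
print(np.flatnonzero(u_est > 0.5))  # bins where tones were detected
```

On real recordings the harmonic structure is not known exactly and varies between tones, which is why the first abstract mentions choosing it heuristically or refining it iteratively; the sketch skips that step and any noise handling.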