Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription

Centre for Digital Music, Queen Mary Univ. of London, London, UK
IEEE Journal of Selected Topics in Signal Processing. 11/2011; 5(6):1111-1123. DOI: 10.1109/JSTSP.2011.2162394
Source: IEEE Xplore

ABSTRACT In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on a pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classical and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, indicating encouraging results.
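The abstract describes the pitch-candidate selection stage only at a high level. As an illustration of the underlying idea, the sketch below combines the stiff-string inharmonicity law f_h = h·f0·√(1 + B·h²) with a toy salience function that sums spectral magnitude at the predicted partial positions. Note that the paper's actual salience function operates on a constant-Q log-frequency representation with per-pitch tuning refinement; `partial_frequencies` and `salience` here are simplified, hypothetical stand-ins on a linear-frequency grid.

```python
import numpy as np

def partial_frequencies(f0, B, n_partials=8):
    """Frequencies of the first n partials of an inharmonic (stiff) string:
    f_h = h * f0 * sqrt(1 + B * h^2), with B the inharmonicity coefficient."""
    h = np.arange(1, n_partials + 1)
    return h * f0 * np.sqrt(1.0 + B * h ** 2)

def salience(spectrum, freqs, f0, B, n_partials=8):
    """Toy salience: sum of spectral magnitude at the bins nearest each
    predicted partial of the candidate pitch (f0, B)."""
    score = 0.0
    for fh in partial_frequencies(f0, B, n_partials):
        if fh < freqs[-1]:
            score += spectrum[np.argmin(np.abs(freqs - fh))]
    return score
```

A spectrum synthesized from the partials of A3 (220 Hz) then yields a higher salience at 220 Hz than at unrelated candidate pitches, which is the property the candidate-selection stage exploits.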

Available from: Emmanouil Benetos, Mar 10, 2014
  • "For example, in [4], the (B, F0) parameters are learned on some single note recordings and interpolated on the tessitura. In [5], [6], they are jointly, roughly estimated during a preprocessing step."
    ABSTRACT: This paper presents a method for estimating the tuning and the inharmonicity coefficient of piano tones, from single notes or chord recordings. It is based on the Non-negative Matrix Factorization (NMF) framework, with a parametric model for the dictionary atoms. The key point here is to include as a relaxed constraint the inharmonicity law modelling the frequencies of transverse vibrations for stiff strings. Applications show that this can be used to finely estimate the tuning and the inharmonicity coefficient of several notes, even in the case of high polyphony. The use of NMF makes this method relevant when tasks like music transcription or source/note separation are targeted.
    Signal Processing Conference (EUSIPCO), 2012 Proceedings of the 20th European; 01/2012
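The NMF model in this citing paper constrains its dictionary atoms with the same stiff-string inharmonicity law, f_k = k·f0·√(1 + B·k²). The sketch below only illustrates that constraint: a parametric atom with Gaussian peaks at the predicted partial frequencies, plus a crude correlation-based grid search for B. The 1/k partial amplitudes, the Gaussian peak shape, and the grid search are illustrative assumptions, not the paper's NMF updates.

```python
import numpy as np

def inharmonic_atom(f0, B, freqs, n_partials=10, width=5.0):
    """Parametric dictionary atom: Gaussian peaks at the stiff-string partial
    frequencies f_k = k * f0 * sqrt(1 + B * k^2). Amplitudes 1/k are an
    illustrative assumption, not the model's learned values."""
    atom = np.zeros_like(freqs)
    for k in range(1, n_partials + 1):
        fk = k * f0 * np.sqrt(1.0 + B * k ** 2)
        atom += (1.0 / k) * np.exp(-0.5 * ((freqs - fk) / width) ** 2)
    return atom / np.linalg.norm(atom)

def estimate_B(observed, f0, freqs, grid=np.logspace(-5, -3, 50)):
    """Grid-search the inharmonicity coefficient by correlating parametric
    atoms with an observed magnitude spectrum (a stand-in for fitting B
    inside the NMF framework)."""
    scores = [observed @ inharmonic_atom(f0, B, freqs) for B in grid]
    return grid[int(np.argmax(scores))]
```

Because the partial shift grows with k (roughly 4400·50·ΔB Hz at the tenth partial of A4), the upper partials are what make B identifiable, which is why the constraint is only "relaxed" rather than dropped in the NMF model.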
  • ABSTRACT: We develop a general robust fundamental frequency estimator that allows for non-parametric inharmonicities in the observed signal. To this end, we incorporate the recently developed multi-dimensional covariance fitting approach by allowing the Fourier vector corresponding to each perturbed harmonic to lie within a small uncertainty hypersphere centered around its strictly harmonic counterpart. Within these hyperspheres, we find the best perturbed vectors fitting the covariance of the observed data. The proposed approach provides the estimate of the fundamental frequency in two steps, and, unlike other recent methods, involves only a single 1-D search over a range of candidate fundamental frequencies. The proposed algorithm is numerically shown to outperform the current competitors under a variety of practical conditions, including various degrees of inharmonicity and different levels of noise.
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on; 01/2013
  • ABSTRACT: A method for automatic transcription of polyphonic music is proposed in this work that models the temporal evolution of musical tones. The model extends the shift-invariant probabilistic latent component analysis method by supporting the use of spectral templates that correspond to sound states such as attack, sustain, and decay. The order of these templates is controlled using hidden Markov model-based temporal constraints. In addition, the model can exploit multiple templates per pitch and instrument source. The shift-invariant aspect of the model makes it suitable for music signals that exhibit frequency modulations or tuning changes. Pitch-wise hidden Markov models are also utilized in a postprocessing step for note tracking. For training, sound state templates were extracted for various orchestral instruments using isolated note samples. The proposed transcription system was tested on multiple-instrument recordings from various datasets. Experimental results show that the proposed model is superior to a non-temporally constrained model and also outperforms various state-of-the-art transcription systems for the same experiment.
    The Journal of the Acoustical Society of America. 03/2013; 133(3):1727-1741. DOI: 10.1121/1.4790351
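Both this citing work and the main paper use pitch-wise HMMs as a postprocessing step for note tracking. A minimal sketch of that idea is a two-state (note off/on) Viterbi smoother over frame-wise note-on posteriors for a single pitch, which removes spurious single-frame dips and spikes. The self-transition probability `p_stay` is an assumed value here, not one taken from either paper, and the real systems train their transition priors on MIDI data.

```python
import numpy as np

def smooth_activations(post, p_stay=0.9):
    """Two-state (0 = note off, 1 = note on) Viterbi smoothing of frame-wise
    posteriors for one pitch -- a minimal stand-in for pitch-wise HMM note
    tracking. `post[t]` holds P(note on) at frame t."""
    post = np.asarray(post, float)
    emis = np.stack([1.0 - post, post])           # P(obs | off), P(obs | on)
    logA = np.log([[p_stay, 1 - p_stay],          # transition log-probs
                   [1 - p_stay, p_stay]])
    logd = np.log(np.maximum(emis[:, 0], 1e-12))  # uniform initial prior
    back = np.zeros((2, len(post)), dtype=int)
    for t in range(1, len(post)):
        cand = logd[:, None] + logA               # cand[i, j]: from i to j
        back[:, t] = np.argmax(cand, axis=0)
        logd = np.max(cand, axis=0) + np.log(np.maximum(emis[:, t], 1e-12))
    path = np.zeros(len(post), dtype=int)
    path[-1] = int(np.argmax(logd))
    for t in range(len(post) - 1, 0, -1):         # backtrace best state path
        path[t - 1] = back[path[t], t]
    return path
```

With `p_stay=0.9`, a single low-confidence frame inside a sustained note is bridged over (and an isolated one-frame spike is suppressed), since a pair of off/on transitions costs more log-probability than one weak emission.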