Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription

Centre for Digital Music, Queen Mary Univ. of London, London, UK
IEEE Journal of Selected Topics in Signal Processing (Impact Factor: 3.63). 11/2011; 5(6):1111 - 1123. DOI: 10.1109/JSTSP.2011.2162394
Source: IEEE Xplore

ABSTRACT: In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. The constant-Q resonator time-frequency image is employed as the time-frequency representation, and a novel noise suppression technique based on a pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed, and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used to compute the harmonic envelope of the candidate pitches; this procedure is based on a novel spectral envelope estimation procedure for the log-frequency domain. In order to select the optimal pitch combination for each time frame, a score function is proposed that combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classical and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, and the results are encouraging.
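The salience stage described above selects pitch candidates after the tuning and inharmonicity parameters have been estimated. The sketch below is not the authors' salience function; it is a generic harmonic-summation salience that illustrates how an inharmonicity parameter enters the search. It assumes the standard stiff-string partial model f_h = h * f0 * sqrt(1 + B * (h^2 - 1)) and a linear-frequency magnitude spectrum (the paper works on a constant-Q representation). The names inharmonic_partials and salience are hypothetical.

```python
import numpy as np


def inharmonic_partials(f0, B, n_harmonics):
    """Partial frequencies of a stiff string: f_h = h * f0 * sqrt(1 + B * (h^2 - 1))."""
    h = np.arange(1, n_harmonics + 1)
    return h * f0 * np.sqrt(1.0 + B * (h ** 2 - 1))


def salience(spectrum, freqs, f0_grid, B_grid, n_harmonics=8):
    """Harmonic-summation salience (hypothetical helper).

    For every candidate f0, try each inharmonicity value B, sum the spectral
    magnitude read off at the predicted partial positions, and keep the best B.
    Returns (salience per f0, best B per f0).
    """
    sal = np.zeros(len(f0_grid))
    best_B = np.zeros(len(f0_grid))
    for i, f0 in enumerate(f0_grid):
        for B in B_grid:
            partials = inharmonic_partials(f0, B, n_harmonics)
            partials = partials[partials <= freqs[-1]]  # ignore partials above the analysed band
            # Linear interpolation between bins gives the magnitude at each partial.
            score = np.interp(partials, freqs, spectrum).sum()
            if score > sal[i]:
                sal[i], best_B[i] = score, B
    return sal, best_B
```

For a synthetic spectrum with Gaussian peaks at the inharmonic partials of a 220 Hz tone with B = 1e-3, the salience peaks at f0 = 220 Hz with B = 1e-3; the sub-octave candidate at 110 Hz collects far less energy because its odd partials fall between peaks and its higher even partials drift away from the stretched true partials.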

  • ABSTRACT: In this paper, we propose a novel method to estimate the fundamental frequencies and directions of arrival (DOAs) of multi-pitch signals impinging on a sensor array. Formulating the estimation as a group-sparse convex optimization problem, we use the alternating direction method of multipliers (ADMM) to estimate both the temporal and the spatial correlation of the array signal. By first jointly estimating the fundamental frequencies and the times of arrival (TOAs) for each sensor and sound source, we then form a non-linear least squares estimate to obtain the DOAs. Numerical simulations indicate that the proposed estimator outperforms current state-of-the-art methods.
    ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 05/2014
  • ABSTRACT: We study the problem of estimating the fundamental frequencies of a signal containing multiple harmonically related sinusoidal components using a novel block sparse signal representation. An efficient algorithm for solving the resulting optimization problem is devised, exploiting a novel variable step-size alternating direction method of multipliers (ADMM). The resulting algorithm has guaranteed convergence and shows notable robustness to the f0 vs. f0/2 ambiguity problem. The proposed method is shown to outperform earlier estimation techniques on both simulated and measured audio signals.
    Signal Processing 04/2015; 109. DOI:10.1016/j.sigpro.2014.10.014 · 2.24 Impact Factor
  • ABSTRACT: We develop a general robust fundamental frequency estimator that allows for non-parametric inharmonicities in the observed signal. To this end, we build on the recently developed multi-dimensional covariance fitting approach, allowing the Fourier vector corresponding to each perturbed harmonic to lie within a small uncertainty hypersphere centered on its strictly harmonic counterpart. Within these hyperspheres, we find the perturbed vectors that best fit the covariance of the observed data. The proposed approach estimates the fundamental frequency in two steps and, unlike other recent methods, involves only a single 1-D search over a range of candidate fundamental frequencies. The proposed algorithm is shown numerically to outperform competing methods under a variety of practical conditions, including various degrees of inharmonicity and different noise levels.
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on; 01/2013
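The ICASSP 2014 abstract ends with a non-linear least squares (NLS) step that maps per-sensor TOA estimates to a DOA. The sketch below illustrates only that final step, not the ADMM estimator, under stated assumptions: a uniform linear array with known sensor spacing, a far-field source, a propagation speed of 343 m/s, and a grid search over angle. The function name doa_from_toas and the constant SPEED_OF_SOUND are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed propagation speed in air


def doa_from_toas(toas, spacing, theta_grid_deg):
    """Grid-search NLS DOA estimate from per-sensor times of arrival (hypothetical helper).

    Assumed model (uniform linear array, far-field source):
        toa_m = t0 + m * spacing * sin(theta) / c,   m = 0..M-1
    The unknown offset t0 enters linearly, so it is removed in closed form
    (mean of the residual) and only theta is searched.
    """
    toas = np.asarray(toas, dtype=float)
    grid = np.asarray(theta_grid_deg, dtype=float)
    m = np.arange(len(toas))
    # Predicted relative delays for every candidate angle: shape (n_angles, M)
    model = np.sin(np.deg2rad(grid))[:, None] * m[None, :] * spacing / SPEED_OF_SOUND
    resid = toas[None, :] - model
    resid -= resid.mean(axis=1, keepdims=True)  # least-squares estimate of t0 per angle
    cost = (resid ** 2).sum(axis=1)
    return float(grid[np.argmin(cost)])
```

Eliminating the linear offset t0 in closed form leaves a single 1-D search over theta, which is why a simple grid is enough for this illustration.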
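The Signal Processing (2015) abstract describes multi-pitch estimation as a block sparse problem solved with ADMM. A minimal sketch of the general idea follows, assuming a real-valued dictionary with one block of cosine/sine columns per candidate f0 and a group-lasso penalty. It uses plain fixed step-size ADMM, not the variable step-size variant the abstract refers to, and the names harmonic_dictionary and block_sparse_admm are hypothetical.

```python
import numpy as np


def harmonic_dictionary(t, f0_grid, n_harmonics):
    """One block of 2*H unit-norm cos/sin columns per candidate f0 (hypothetical helper)."""
    blocks = []
    for f0 in f0_grid:
        cols = []
        for h in range(1, n_harmonics + 1):
            w = 2 * np.pi * h * f0 * t
            cols += [np.cos(w), np.sin(w)]
        blocks.append(np.stack(cols, axis=1))
    A = np.concatenate(blocks, axis=1)
    return A / np.linalg.norm(A, axis=0)


def block_sparse_admm(y, A, block_size, lam, rho=1.0, n_iter=300):
    """Fixed step-size ADMM for min 0.5*||y - A x||^2 + lam * sum_k ||x_k||_2."""
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)
    # The x-update system matrix never changes, so invert it once.
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))
    Aty = A.T @ y
    for _ in range(n_iter):
        x = M @ (Aty + rho * (z - u))                    # least-squares step
        v = (x + u).reshape(-1, block_size)
        norms = np.linalg.norm(v, axis=1, keepdims=True)
        # Block soft-thresholding: shrink whole harmonic blocks toward zero.
        shrink = np.maximum(0.0, 1.0 - (lam / rho) / np.maximum(norms, 1e-12))
        z = (shrink * v).ravel()
        u = u + x - z                                    # scaled dual update
    return z.reshape(-1, block_size)
```

The block penalty is what counters the sub-octave error: a 200 Hz tone with three harmonics is explained by one 200 Hz block at lower cost than by splitting its partials across the 100, 300 and 400 Hz blocks, so the largest block norm lands at 200 Hz.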
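The ICASSP 2013 abstract tolerates inharmonicity by letting each harmonic deviate within a small uncertainty region while keeping a single 1-D search over f0. The sketch below is a simplified periodogram-based stand-in for that idea, not the covariance-fitting estimator itself: each harmonic may move within a relative tolerance delta (an assumed parameter), and a second step refits f0 to the located partials by least squares. The function name robust_f0 and all default values are hypothetical.

```python
import numpy as np


def robust_f0(x, fs, f0_grid, n_harmonics=5, delta=0.02, nfft=1 << 15):
    """Two-step f0 estimate tolerant to small per-harmonic deviations (hypothetical helper)."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)), n=nfft)) ** 2
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    best_score, best_peaks = -np.inf, None
    # Step 1: single 1-D search; harmonic h may lie anywhere in h*f0*(1 +/- delta).
    for f0 in f0_grid:
        score, peaks = 0.0, []
        for h in range(1, n_harmonics + 1):
            lo, hi = np.searchsorted(freqs, [h * f0 * (1 - delta), h * f0 * (1 + delta)])
            if hi <= lo:  # window empty or beyond Nyquist: discard this candidate
                break
            k = lo + int(np.argmax(spec[lo:hi]))
            score += spec[k]
            peaks.append(freqs[k])
        else:  # runs only if every harmonic window was valid
            if score > best_score:
                best_score, best_peaks = score, np.array(peaks)
    # Step 2: least-squares fit of f_h ~ h * f0 to the located partial frequencies.
    h = np.arange(1, len(best_peaks) + 1)
    return float(h @ best_peaks / (h @ h))
```

Any candidate near the true f0 catches the same perturbed peaks, so the score is flat over a small range; the least-squares refit in step 2 is what turns that plateau into a single estimate.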


Available from
May 29, 2014