Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription

Centre for Digital Music, Queen Mary Univ. of London, London, UK
IEEE Journal of Selected Topics in Signal Processing (Impact Factor: 3.63). 11/2011; DOI: 10.1109/JSTSP.2011.2162394
Source: IEEE Xplore

ABSTRACT: In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on a pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, which is based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed, in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classic and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, where encouraging results are indicated.
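
The abstract mentions inharmonicity parameters without stating a formula. As a rough illustration only, the sketch below uses the standard stiff-string model, in which partial h of a note with fundamental f0 lies at h * f0 * sqrt(1 + B * h^2). Whether the paper uses exactly this form is an assumption here, and the function name and the value of the coefficient B are hypothetical.

```python
import math


def partial_frequencies(f0, inharmonicity, n_partials):
    """Partial frequencies (Hz) under the standard stiff-string model.

    Assumption: partial h lies at h * f0 * sqrt(1 + B * h**2), where B is
    the inharmonicity coefficient. B = 0 gives exact harmonics.
    """
    return [
        h * f0 * math.sqrt(1.0 + inharmonicity * h * h)
        for h in range(1, n_partials + 1)
    ]


if __name__ == "__main__":
    # Hypothetical values: A4 (440 Hz) with an illustrative coefficient B.
    for h, freq in enumerate(partial_frequencies(440.0, 1e-4, 5), start=1):
        print(f"partial {h}: {freq:.2f} Hz (ideal harmonic {h * 440.0:.2f} Hz)")
```

With B greater than zero, each partial sits slightly above its ideal harmonic, and the offset grows with the partial index. This is why a multiple-F0 estimator for piano benefits from estimating B per note rather than assuming exact harmonics.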

  • ABSTRACT: This paper presents a method for estimating the tuning and the inharmonicity coefficient of piano tones, from single notes or chord recordings. It is based on the Non-negative Matrix Factorization (NMF) framework, with a parametric model for the dictionary atoms. The key point here is to include as a relaxed constraint the inharmonicity law modelling the frequencies of transverse vibrations for stiff strings. Applications show that this can be used to finely estimate the tuning and the inharmonicity coefficient of several notes, even in the case of high polyphony. The use of NMF makes this method relevant when tasks like music transcription or source/note separation are targeted.
    Proceedings of the 20th European Signal Processing Conference (EUSIPCO), 2012; 01/2012
  • ABSTRACT: Conditional random fields (CRFs) are probabilistic sequence models that have been applied in the last decade to a number of applications in audio, speech, and language processing. In this paper, we provide a tutorial overview of CRF technologies, pointing to other resources for more in-depth discussion; in particular, we describe the common linear-chain model as well as a number of common extensions within the CRF family of models. An overview of the mathematical techniques used in training and evaluating these models is also provided, as well as a discussion of the relationships with other probabilistic models. Finally, we survey recent work in speech, audio, and language processing to show how the same CRF technology can be deployed in different scenarios.
    Proceedings of the IEEE 04/2013; 101(5):1054-1075. · 5.47 Impact Factor
  • ABSTRACT: An automatic music transcriber is a device that detects, without human interference, the musical gestures required to play a particular piece. Many techniques have been proposed to solve the problem of automatic music transcription. This paper presents an overview on the theme, discussing digital signal processing techniques, pattern classification techniques and heuristic assumptions derived from music knowledge that were used to build some of the main systems found in the literature. The paper is focused on the motivations behind each technique, aiming to serve both as an introduction to the theme and as resource for the development of new solutions for automatic transcription.
    Journal of the Brazilian Computer Society 11/2013; 19(4):589-604.

