Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription

Centre for Digital Music, Queen Mary Univ. of London, London, UK
IEEE Journal of Selected Topics in Signal Processing (Impact Factor: 2.37). 11/2011; 5(6):1111 - 1123. DOI: 10.1109/JSTSP.2011.2162394
Source: IEEE Xplore


In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on a pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, which is based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classical and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, with encouraging results.
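The candidate-selection step described in the abstract can be illustrated with a minimal sketch. It is not the salience function proposed in the paper, whose exact form is not given here. It assumes the common stiff-string law f_h = h * F0 * sqrt(1 + B * h^2) for the partial frequencies and a simple sum of square-rooted peak magnitudes as the salience. The function names, the 3% matching tolerance, the number of partials, and the peak-list input format are all hypothetical choices made for illustration.

```python
import math


def partial_frequencies(f0, inharmonicity, num_partials):
    """Expected partial frequencies (Hz) of a stiff string.

    Assumed law: f_h = h * f0 * sqrt(1 + B * h^2), with B the
    inharmonicity coefficient (B = 0 gives exact harmonics).
    """
    return [
        h * f0 * math.sqrt(1.0 + inharmonicity * h * h)
        for h in range(1, num_partials + 1)
    ]


def salience(peaks, f0, inharmonicity=0.0, num_partials=8, tolerance=0.03):
    """Illustrative salience of one pitch candidate.

    peaks: list of (frequency_hz, magnitude) spectral peaks.
    For every expected partial, take the strongest peak within a
    relative tolerance and add the square root of its magnitude.
    """
    total = 0.0
    for expected in partial_frequencies(f0, inharmonicity, num_partials):
        nearby = [
            magnitude
            for frequency, magnitude in peaks
            if abs(frequency - expected) <= tolerance * expected
        ]
        if nearby:
            total += math.sqrt(max(nearby))
    return total


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def top_candidates(peaks, midi_notes=range(21, 109), count=3):
    """Return the `count` MIDI notes with the highest salience."""
    scores = {note: salience(peaks, midi_to_hz(note)) for note in midi_notes}
    return sorted(scores, key=scores.get, reverse=True)[:count]


# Synthetic example: eight exact harmonics of A4 with 1/h magnitudes.
a4_peaks = [(440.0 * h, 1.0 / h) for h in range(1, 9)]
best_note = top_candidates(a4_peaks, count=1)[0]
```

In a joint estimation scheme such as the one the abstract outlines, a ranking like this would only shortlist candidates; combinations of the shortlisted pitches would then be evaluated together by a separate score function.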



Available from: Emmanouil Benetos, Mar 10, 2014
  • "For example, in [4], the (B, F0) parameters are learned on some single note recordings and interpolated on the tessitura. In [5] [6], they are jointly, roughly estimated during a preprocessing step."
    ABSTRACT: This paper presents a method for estimating the tuning and the inharmonicity coefficient of piano tones, from single notes or chord recordings. It is based on the Non-negative Matrix Factorization (NMF) framework, with a parametric model for the dictionary atoms. The key point here is to include as a relaxed constraint the inharmonicity law modelling the frequencies of transverse vibrations for stiff strings. Applications show that this can be used to finely estimate the tuning and the inharmonicity coefficient of several notes, even in the case of high polyphony. The use of NMF makes this method relevant when tasks like music transcription or source/note separation are targeted.
    Full-text · Conference Paper · Aug 2012
  • ABSTRACT: ASO [1] is an adaptive embedding scheme that has proved its efficiency compared to the HUGO [2] algorithm. It is based on the use of a detectability map that is correlated to the security of the embedding process. The detectability map is calculated using Kodovský's ensemble classifiers [3] as an oracle, which preserves the distribution of the cover image and of the sender's database. In this article, we give the technical points about ASO. We give the details of the detectability map computation, then we study the security of the communication phase of ASO through the paradigm of steganography by database. Since the introduced paradigm allows the sender to choose the most secure stego image(s) during the transmission of his message, we propose some security metrics that can help him to distinguish between secure and insecure images. We thus significantly increase the security of ASO.
    Full-text · Conference Paper · Aug 2012
  • ABSTRACT: This research proposes an automatic music transcription-feedback system which helps people learn musical instruments by themselves. The focus of this research is the piano. We develop a real-time polyphonic pitch detection-feedback system. For polyphonic pitch detection, we use an inner-product-based similarity measure with a discriminant note detection threshold and top-down attention. We also develop two separate parallel processes in Simulink and MATLAB for the real-time system. In the Simulink workspace, real-time recording and signal flow management are implemented. The system takes 2 min 12 s to analyze a 1 min piece and achieves a pitch detection accuracy of 79.33% on the test case (Chopin, Nocturne Op. 9 No. 2).
    No preview · Chapter · Nov 2012
