Conference Paper

Optimal filtering of dynamics in short-time features for music organization

Conference: ISMIR 2006, 7th International Conference on Music Information Retrieval, Victoria, Canada, 8-12 October 2006, Proceedings
Source: DBLP


There is an increasing interest in customizable methods for organizing music collections. Relevant music characterization can be obtained from short-time features, but it is not obvious how to combine them to get useful information. In this work, a novel method, denoted Positive Constrained Orthonormalized Partial Least Squares (POPLS), is proposed. Working on the periodograms of MFCC time series, this supervised method finds optimal filters that pick up the most discriminative temporal information for any music organization task. Two examples are presented in the paper. The first is a simple proof of concept, in which an alto sax with and without vibrato is modelled. A more complex 11-genre music classification setup is also investigated to illustrate the robustness and validity of the proposed method on larger datasets. Both experiments showed the good properties of our method, as well as superior performance when compared to a fixed filter bank approach suggested previously in the MIR literature. We think that the proposed method is a natural step towards a customized MIR application that generalizes well to a wide range of different music organization tasks.
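
As an illustration only, the following Python sketch shows the kind of processing the abstract refers to: a periodogram is computed over the time series of each short-time feature and then summarized by a bank of non-negative filters. It implements a fixed filter bank, that is, the baseline the abstract compares against, not POPLS itself, which according to the abstract learns the filters in a supervised way. The function names, the 100 Hz frame rate, the toy features and the band edges are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def periodogram(frames, frame_rate):
    """Periodogram of each feature dimension along the time axis.

    frames: array of shape (n_frames, n_features), e.g. one MFCC vector per frame.
    Returns (freqs, power), where power has shape (n_freqs, n_features).
    """
    frames = np.asarray(frames, dtype=float)
    n_frames = frames.shape[0]
    spectrum = np.fft.rfft(frames, axis=0)
    power = np.abs(spectrum) ** 2 / n_frames
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)
    return freqs, power


def band_filter_bank(freqs, band_edges):
    """Rectangular, non-negative filters: one row per (low, high) band in Hz."""
    bank = np.zeros((len(band_edges), len(freqs)))
    for row, (low, high) in enumerate(band_edges):
        bank[row, (freqs >= low) & (freqs <= high)] = 1.0
    return bank


# Hypothetical setup: 2 seconds of features at an assumed rate of 100 frames/s.
frame_rate = 100.0
t = np.arange(200) / frame_rate

# Toy features: one modulated at 5 Hz (vibrato-like), one constant over time.
frames = np.stack([np.sin(2 * np.pi * 5.0 * t), np.ones_like(t)], axis=1)

freqs, power = periodogram(frames, frame_rate)

# Assumed band edges in Hz, chosen only for this example.
bands = [(0.0, 0.0), (0.5, 2.0), (2.5, 15.0), (15.5, 50.0)]
bank = band_filter_bank(freqs, bands)

# One row per band, one column per feature.
band_power = bank @ power
```

In this toy case the modulated feature puts nearly all of its power in the third band, while the constant feature puts its power in the DC band. A supervised method would replace the hand-picked rows of `bank` with filter weights learned from labelled data.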


Available from: Anders Meng, Feb 09, 2015
  • Source
    • "In contrast with the previous case, in which the filter bank used is defined a priori, in [3] "
    ABSTRACT: The identification of acoustic events produced in meeting rooms or classrooms could help to identify and describe the social and human activities that take place in these rooms. This article focuses on the classification of 20 acoustic events using short-time acoustic features based on MFCCs (Mel-Frequency Cepstral Coefficients), combined with three different time integration techniques and a classifier based on an SVM (Support Vector Machine). We analyze the performance obtained with each of these techniques and then combine them with the aim of improving the overall performance of the classification system.
    Full-text · Article · Dec 2012
  • Source
    • "Each periodogram was then summarized by its power in 4 predefined frequency bands using a fixed filter bank. This approach was pursued by Arenas-Garcia et al. [1] who trained the filter-bank in a supervised fashion to optimally suit a particular music organization task. Rauber et al. [20] used critical band energies periodograms with a much longer context, i.e. 6 seconds. "
    ABSTRACT: The enormous growth of digital music databases has led to a comparable growth in the need for methods that help users organize and access such information. One area in particular that has seen much recent research activity is the use of automated techniques to describe audio content and to allow for its identification, browsing and retrieval. Conventional approaches to music content description rely on features characterizing the shape of the signal spectrum in relatively short-term frames. In the context of Automatic Speech Recognition (ASR), Hermansky described an interesting alternative to short-term spectrum features, the TRAP-TANDEM approach, which uses long-term band-limited features trained in a supervised fashion. We adapt this idea to the specific case of music signals and propose a generic system for the description of temporal patterns. The same system with different settings is able to extract features describing either timbre or rhythmic content. The quality of the generated features is demonstrated in a set of music retrieval experiments and compared to other state-of-the-art models.
    Preview · Article · Jan 2008
  • Source
    • "genre information for the songs in a training data set (using one-out-of-C encoding), but in previous works we also considered feature extraction for instrument classification [7] and for detecting the presence of vibrato in instrument recordings [4]. Furthermore, MVA algorithms can also be used in problems with multiple labels (e.g., when soft or multiple membership is allowed), and in regression problems (e.g., if Y is used to encode the ratings given by a user to different songs). "
    ABSTRACT: There is an increasing interest in customizable methods for organizing music collections. Relevant music characterization can be obtained from short-time features, but it is not obvious how to combine them to get useful information. First, the relevant information might not be evident at the short-time level, and these features have to be combined at a larger temporal level into a new feature vector in order to capture the relevant information. Second, we need to learn a model for the new features that generalizes well to new data. In this contribution, we will study how multivariate analysis (MVA) and kernel methods can be of great help in this task. More precisely, we will present two modified versions of an MVA method known as Orthonormalized Partial Least Squares (OPLS), one of them being a kernel extension, that are well suited for discovering relevant dynamics in large music collections. The performance of both schemes will be illustrated in a music genre classification task.
    Full-text · Article · Jan 2006