ABSTRACT: Nowadays there is increasing interest in developing methods for building music recommendation systems. To obtain satisfactory performance from such a system, one needs to incorporate as much information about song similarity as possible; however, how to do so is not obvious. In this paper, we build on the ideas of Probabilistic Latent Semantic Analysis (PLSA), which has been used successfully in the document retrieval community. Under this probabilistic framework, any song is projected into a relatively low-dimensional space of "latent semantics", in such a way that all observed similarities can be satisfactorily explained using the latent semantics. Additionally, this approach significantly simplifies the song retrieval phase, leading to a more practical system implementation. The suitability of the PLSA model for representing music structure is studied in a simplified scenario consisting of 10,000 songs and two similarity measures among them. The results suggest that the PLSA model is a useful framework for combining different sources of information and provides a reasonable space for song representation.
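As an illustration of the latent-semantics idea, the following is a minimal PLSA sketch on a toy song-by-feature count matrix. The data, dimensions, and topic count are invented for the example; the paper's actual similarity measures and scale differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: a song-by-feature co-occurrence count matrix
# (rows = songs, columns = discretized similarity observations).
counts = rng.poisson(2.0, size=(20, 12)).astype(float)

n_songs, n_feats = counts.shape
n_topics = 3  # dimensionality of the latent semantic space

# Random initialization of P(z), P(song|z), P(feat|z).
p_z = np.full(n_topics, 1.0 / n_topics)
p_s_z = rng.dirichlet(np.ones(n_songs), size=n_topics).T   # (songs, topics)
p_f_z = rng.dirichlet(np.ones(n_feats), size=n_topics).T   # (feats, topics)

for _ in range(50):
    # E-step: responsibilities P(z | song, feat), shape (songs, feats, topics)
    joint = p_z[None, None, :] * p_s_z[:, None, :] * p_f_z[None, :, :]
    resp = joint / joint.sum(axis=2, keepdims=True)

    # M-step: re-estimate the multinomials from expected counts
    expected = counts[:, :, None] * resp
    p_s_z = expected.sum(axis=1)
    p_f_z = expected.sum(axis=0)
    p_z = p_s_z.sum(axis=0)
    p_s_z /= p_s_z.sum(axis=0, keepdims=True)
    p_f_z /= p_f_z.sum(axis=0, keepdims=True)
    p_z /= p_z.sum()

# Each song is now represented by P(z | song) in the latent space.
song_repr = p_s_z * p_z[None, :]
song_repr /= song_repr.sum(axis=1, keepdims=True)
```

Retrieval then reduces to comparing songs by their low-dimensional P(z | song) vectors rather than by all raw pairwise similarities.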
ABSTRACT: Slow convergence is observed in the EM algorithm for linear state-space models. We propose to circumvent the problem by applying any off-the-shelf quasi-Newton-type optimizer, which operates on the gradient of the log-likelihood function. Such an algorithm is a practical alternative because the exact gradient of the log-likelihood function can be computed by recycling components of the expectation-maximization (EM) algorithm. We demonstrate the efficiency of the proposed method in three relevant instances of the linear state-space model. At high signal-to-noise ratios, where EM is particularly prone to slow convergence, we show that gradient-based learning yields a sizable reduction in computation time.
Article · Apr 2007 · Neural Computation
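To illustrate the kind of learning problem involved, here is a hedged sketch that fits a scalar linear state-space model by running a quasi-Newton optimizer (SciPy's L-BFGS-B) on the exact Kalman-filter log-likelihood. For brevity the gradient is finite-differenced here, whereas the paper obtains it exactly by recycling EM quantities; the model and parameter values are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulate a scalar linear state-space model:
#   x_t = a * x_{t-1} + sqrt(q) * v_t,   y_t = x_t + sqrt(r) * e_t
a_true, q_true, r_true = 0.9, 0.1, 0.2
T = 500
x = np.zeros(T)
for t in range(1, T):
    x[t] = a_true * x[t - 1] + np.sqrt(q_true) * rng.standard_normal()
y = x + np.sqrt(r_true) * rng.standard_normal(T)

def neg_log_lik(theta):
    """Exact negative log-likelihood via the Kalman filter
    (prediction-error decomposition)."""
    a, log_q, log_r = theta
    q, r = np.exp(log_q), np.exp(log_r)
    m, p = 0.0, 1.0          # filtered mean and variance
    nll = 0.0
    for t in range(T):
        m_pred = a * m
        p_pred = a * a * p + q
        s = p_pred + r       # innovation variance
        innov = y[t] - m_pred
        nll += 0.5 * (np.log(2 * np.pi * s) + innov ** 2 / s)
        k = p_pred / s       # Kalman gain
        m = m_pred + k * innov
        p = (1 - k) * p_pred
    return nll

# Quasi-Newton optimization of the likelihood.
theta0 = np.array([0.5, np.log(0.5), np.log(0.5)])
res = minimize(neg_log_lik, x0=theta0, method="L-BFGS-B")
a_hat = res.x[0]
```

Working on log-variances keeps q and r positive without explicit bound constraints.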
ABSTRACT: In this paper we propose to use an instantaneous ICA method (BLUES) to separate the instruments in a real stereo music recording. We combine two strong separation techniques to segregate instruments from a mixture: ICA and binary time-frequency masking. By combining the methods, we are able to exploit the fact that the sources are distributed differently in space, time, and frequency. Our method can segregate an arbitrary number of instruments, and the segregated sources are maintained as stereo signals. We have evaluated our method on real stereo recordings, and we can segregate instruments that are spatially distinct from the other instruments.
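The binary time-frequency masking half of the approach can be sketched in isolation. The ICA front end is omitted, and the two "instruments" are synthetic tones chosen so that a simple frequency threshold separates them; real recordings would require a data-driven mask.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 8000
t = np.arange(0, 1.0, 1 / fs)

# Two synthetic "instruments" occupying different frequency bands.
low = np.sin(2 * np.pi * 200 * t)
high = np.sin(2 * np.pi * 2000 * t)
mix = low + high

f, frames, Z = stft(mix, fs=fs, nperseg=256)

# Binary time-frequency mask: assign each bin to one source.
mask_low = f[:, None] < 1000                     # broadcast over frames
_, low_hat = istft(np.where(mask_low, Z, 0), fs=fs, nperseg=256)
_, high_hat = istft(np.where(~mask_low, Z, 0), fs=fs, nperseg=256)
```

Because the mask is applied in the STFT domain and inverted per channel, the same scheme preserves stereo signals when run on each channel separately.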
ABSTRACT: In large MP3 databases, files are typically generated with different parameter settings, e.g., bit rates and sampling rates. This is of concern for MIR applications, as encoding differences can potentially confound meta-data estimation and similarity evaluation. In this paper we discuss the influence of MP3 coding on the Mel-frequency cepstral coefficients (MFCCs). The main result is that the widely used subset of the MFCCs is robust at bit rates equal to or higher than 128 kbit/s for the implementations we have investigated. However, for lower bit rates, e.g., 64 kbit/s, the implementation of the Mel filter bank becomes an issue.
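Since the result hinges on how the Mel filter bank is implemented, a common textbook-style construction is sketched below. The parameter choices (26 triangular filters, 512-point FFT, 13 cepstra) are illustrative defaults, not those of the specific implementations compared in the paper.

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, fs=16000, f_low=0.0, f_high=None):
    """Triangular mel-spaced filter bank (one common implementation choice)."""
    f_high = f_high or fs / 2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(mel(f_low), mel(f_high), n_filters + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc_frame(frame, fb, n_ceps=13):
    """MFCCs of one windowed frame:
    power spectrum -> mel energies -> log -> DCT-II."""
    spec = np.abs(np.fft.rfft(frame, n=2 * (fb.shape[1] - 1))) ** 2
    logmel = np.log(fb @ spec + 1e-10)
    n = len(logmel)
    # Unnormalized DCT-II basis, written out to avoid a scipy dependency.
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                    (2 * np.arange(n) + 1)) / (2 * n))
    return basis @ logmel
```

Details such as the filter spacing, normalization, and edge frequencies vary between libraries, which is exactly where low-bit-rate coding artifacts can interact with the implementation.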
ABSTRACT: In the Bayesian modeling framework there is a close relation between regularization and the prior distribution over parameters. For prior distributions in the exponential family, we show that the optimal hyper-parameter, i.e., the optimal strength of regularization, satisfies a simple relation: the expectation of the regularization function takes the same value under the posterior and the prior distribution. We present three examples: two simulations, and an application in fMRI neuroimaging.
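For a Gaussian prior with precision alpha and a quadratic regularizer, the stated relation can be demonstrated numerically with a simple fixed-point iteration. The linear-Gaussian setup below is a toy example, not one of the paper's experiments: for the prior, E[||w||^2] = d/alpha, so the condition becomes d/alpha = ||m||^2 + tr(S) with posterior mean m and covariance S.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear-Gaussian setup: y = X w + noise, prior w ~ N(0, I / alpha).
d, n, sigma2 = 5, 200, 0.5
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + np.sqrt(sigma2) * rng.standard_normal(n)

alpha = 1.0
for _ in range(100):
    S = np.linalg.inv(X.T @ X / sigma2 + alpha * np.eye(d))  # posterior covariance
    m = S @ X.T @ y / sigma2                                 # posterior mean
    # Enforce E_prior[||w||^2] = E_posterior[||w||^2],
    # i.e. d / alpha = ||m||^2 + tr(S).
    alpha = d / (m @ m + np.trace(S))

prior_expect = d / alpha
post_expect = m @ m + np.trace(S)
```

At the fixed point the two expectations agree, giving a data-driven regularization strength without cross-validation.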
ABSTRACT: This demonstration illustrates how the methods developed in the MIR community can be used to provide real-time feedback to music users. By creating a genre classifier plug-in for a popular media player, we present users with relevant information as they play their songs. The plug-in can furthermore be used as a data collection platform. After informed consent from a selected set of users, the plug-in will report on music consumption behavior back to a central server.
ABSTRACT: The process of representing a large data set with a smaller number of vectors in the best possible way, also known as vector quantization, has been studied intensively in recent years. Very efficient algorithms such as the Kohonen Self-Organizing Map (SOM) and the Linde-Buzo-Gray (LBG) algorithm have been devised. In this paper a physical approach to the problem is taken, and it is shown that by considering the processing elements as points moving in a potential field, an algorithm can be derived that is as efficient as the aforementioned ones. Unlike SOM and LBG, this algorithm has a clear physical interpretation and relies on minimization of a well-defined cost function. It is also shown how the potential field approach can be linked to information theory by use of the Parzen density estimator. In the light of information theory, it becomes clear that minimizing the free energy of the system is in fact equivalent to minimizing a divergence measure between the distribution of the data and the distribution of the processing elements; hence, the algorithm can be seen as a density matching method.
Article · Jan 2005 · Natural Computing
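A minimal sketch of the potential-field view: processing elements descend an integrated-squared-error "free energy" between Parzen estimates of the data and of the elements, feeling attraction toward the data and repulsion from each other. The 2-D data, kernel width, and learning rate are invented for illustration; the paper's exact divergence and dynamics may differ.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: two Gaussian clusters in 2-D.
data = np.vstack([rng.normal(0.0, 0.3, (150, 2)),
                  rng.normal(3.0, 0.3, (150, 2))])
N = len(data)
M, h = 8, 0.5                                      # elements, kernel width
W = data[rng.choice(N, M, replace=False)].copy()   # initialize on data points

def kernels(A, B):
    """Pairwise Gaussian kernel values and their gradients w.r.t. rows of A."""
    diff = A[:, None, :] - B[None, :, :]
    k = np.exp(-np.sum(diff ** 2, axis=2) / (2 * h * h))
    g = -diff * k[:, :, None] / (h * h)
    return k, g

def free_energy(W):
    """Integrated squared error between the two Parzen estimates
    (up to a W-independent constant): repulsion minus attraction."""
    k_ww, _ = kernels(W, W)
    k_wx, _ = kernels(W, data)
    return k_ww.sum() / M ** 2 - 2.0 * k_wx.sum() / (M * N)

J0 = free_energy(W)
lr = 0.05
for _ in range(500):
    _, g_rep = kernels(W, W)     # repulsive force between processing elements
    _, g_att = kernels(W, data)  # attractive force toward the data density
    W -= lr * (g_rep.sum(axis=1) / M ** 2 - g_att.sum(axis=1) / (M * N))
J1 = free_energy(W)
```

The repulsion term spreads the elements out while the attraction term pulls them toward regions of high data density, which is the density-matching behavior described in the abstract.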
ABSTRACT: Representation of a large set of high-dimensional data is a fundamental problem in many applications such as communications and biomedical systems. The problem has been tackled by encoding the data with a compact set of code vectors called processing elements. In this study, we propose a vector quantization technique that encodes the information in the data using concepts derived from information-theoretic learning. The algorithm minimizes a cost function based on the Kullback-Leibler divergence to match the distribution of the processing elements with the distribution of the data. The performance of this algorithm is demonstrated on synthetic data as well as on an edge image of a face. Comparisons are provided with some existing algorithms such as LBG and SOM.
ABSTRACT: In this paper a system that transforms speech waveforms into animated faces is proposed. The system relies on continuous state-space models to perform the mapping, which makes it possible to ensure video with no sudden jumps and allows continuous control of the parameters in ’face space’.
The performance of the system depends critically on the number of hidden variables: with too few variables the model cannot represent the data, and with too many, overfitting is observed.
Simulations are performed on recordings of 3-5 sec. video sequences with sentences from the TIMIT database. From a subjective point of view, the model is able to construct an image sequence from an unknown noisy speech sequence even though the number of training examples is limited.
ABSTRACT: Using a Parzen density estimator, any distribution can be approximated arbitrarily closely by a sum of kernels. In particle filtering, this fact is utilized to estimate a probability density function with Dirac delta kernels; once the distribution is discretized, it becomes possible to solve an otherwise intractable integral. In this work, we propose to extend the idea and use any kernel to approximate the distribution. The extra work involved in propagating small kernels through the nonlinear function can be made up for by decreasing the number of kernels needed, especially for high-dimensional problems. A further advantage of using kernels with nonzero width is that the density estimate becomes continuous.
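A hedged sketch of the kernel-propagation idea: each Gaussian kernel is pushed through a nonlinearity by local linearization, so the mixture stays continuous with far fewer components than a Dirac particle cloud would need. The transition function and mixture below are invented for illustration; the paper's propagation scheme may differ.

```python
import numpy as np

def f(x):
    return np.sin(x) + 0.5 * x          # an illustrative nonlinear function

def f_prime(x):
    return np.cos(x) + 0.5

# Represent p(x) with a few Gaussian kernels instead of many Dirac particles.
means = np.array([-1.0, 0.0, 1.0])
var = 0.1                               # common (nonzero) kernel width
weights = np.array([0.3, 0.4, 0.3])

# Propagate each kernel through f by local linearization:
# N(mu, var) maps approximately to N(f(mu), f'(mu)^2 * var).
new_means = f(means)
new_vars = f_prime(means) ** 2 * var

# Mean of the propagated mixture.
kernel_mean = weights @ new_means
```

Because each component keeps nonzero width after propagation, the resulting density estimate is continuous, in contrast to the discrete Dirac approximation.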
ABSTRACT: In this paper, a system that transforms speech waveforms into animated faces is proposed. The system relies on a state-space model to perform the mapping. To create a photo-realistic image, an active appearance model is used. The main contribution of the paper is to compare a Kalman filter and a hidden Markov model approach to the mapping. It is shown that even though the HMM can attain a higher test likelihood, the Kalman filter is much easier to train and yields better animation quality.
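The Kalman-filter side of the comparison can be sketched generically: the filter below estimates a smooth latent trajectory from a stream of observations (audio features in the paper's setting; here the matrices and data are placeholders, and the readout to active-appearance-model parameters is omitted).

```python
import numpy as np

def kalman_filter(Y, A, C, Q, R, m0, P0):
    """Standard Kalman filter; returns the filtered latent means, shape (T, d)."""
    d = len(m0)
    m, P = np.asarray(m0, float).copy(), np.asarray(P0, float).copy()
    out = np.zeros((len(Y), d))
    for t, y in enumerate(Y):
        m = A @ m                        # predict
        P = A @ P @ A.T + Q
        S = C @ P @ C.T + R              # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)   # Kalman gain
        m = m + K @ (y - C @ m)          # update with observation y_t
        P = (np.eye(d) - K @ C) @ P
        out[t] = m
    return out
```

The smoothness of the filtered trajectory is what avoids sudden jumps in the animated face, consistent with the abstract's finding that the Kalman approach gives better animation quality.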
ABSTRACT: The procedure of obtaining a two-degrees-of-freedom, finite-dimensional, nonlinear mathematical model, which captures the nonlinear features of aircraft flutter at transonic speed, is reported. The model can explain every feature of the transonic flutter data from the wind tunnel tests conducted at the National Aerospace Laboratory in Japan for a high-aspect-ratio wing. It explains the nonlinear features of transonic flutter, such as the subcritical Hopf bifurcation of a limit cycle oscillation (LCO), a saddle-node bifurcation, and an unstable limit cycle, as well as the normal (linear) flutter condition with its linear part. In the final step, to improve the quantitative match with the test data, the continuation method for analyzing the bifurcations is used extensively.
ABSTRACT: The paper describes the effects of random external excitations on the onset and dynamical characteristics of transonic flutter (i.e., large-amplitude, self-sustained oscillations) for a high-aspect-ratio wing. Wind tunnel experiments performed at the National Aerospace Laboratory (NAL) in Japan have shown that the self-sustained oscillations arise through a subcritical Hopf bifurcation. However, analysis of the experimental data also reveals that this bifurcation is modified in various ways. We present an outline of the construction of a 6-DOF model of the aeroelastic behavior of the wing structure. When this model is extended by the introduction of nonlinear terms, it can reproduce the subcritical Hopf bifurcation. We then consider the effects of subjecting simplified versions of the model to random external excitations representing the fluctuations present in the airflow. These models can reproduce several of the experimentally observed modifications of the flutter transition. In particular, the models display the characteristic phenomena of coherence resonance.
Article · Mar 2002 · Mathematics and Computers in Simulation
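The subcritical Hopf scenario described above can be reproduced with the one-dimensional radial normal form dr/dt = r(mu + r^2 - r^4), a deliberately minimal stand-in for the 6-DOF model: below the linear flutter boundary (mu < 0) a stable limit cycle coexists with the quiescent state, so the observed amplitude depends on the size of the initial disturbance.

```python
def amplitude(r0, mu=-0.1, dt=1e-3, steps=200_000):
    """Euler-integrate the radial normal form dr/dt = r (mu + r^2 - r^4)."""
    r = r0
    for _ in range(steps):
        r += dt * r * (mu + r ** 2 - r ** 4)
    return r

# Two attractors coexist below the linear stability boundary (mu < 0):
small = amplitude(0.2)   # a small disturbance decays back to rest
large = amplitude(0.5)   # a larger one locks onto the stable limit cycle
```

This bistability is what makes the flutter onset sensitive to random excitations: noise can kick the system across the unstable limit cycle separating the two attractors, which is the setting in which coherence resonance appears.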
ABSTRACT: We have collected a database of musical features from radio broadcasts and CD collections (N > 10^5). The database poses a number of hard modelling challenges, including segmentation problems and missing and wrong meta-data. We describe our efforts towards cleaning the data using probability density estimation. We train conditional densities for checking the relation between meta-data and music features, and unconditional densities for spotting unlikely music features. We show that the rejected samples indeed represent various types of problems in the music data. The models may in some cases assist reconstruction of meta-data.
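The rejection step can be sketched with a deliberately simple unconditional density: a single Gaussian standing in for the paper's density models. Fit the density to the pooled features and flag the least likely samples for inspection; the data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical music features: mostly well-behaved, plus some corrupted entries.
clean = rng.normal(0.0, 1.0, (500, 4))
corrupt = rng.normal(8.0, 1.0, (10, 4))      # e.g. segmentation errors
features = np.vstack([clean, corrupt])
d = features.shape[1]

# Fit a single Gaussian to the pooled data (a simple stand-in for the
# conditional/unconditional density models described in the abstract).
mu = features.mean(axis=0)
cov = np.cov(features.T)
inv = np.linalg.inv(cov)
_, logdet = np.linalg.slogdet(cov)

diff = features - mu
log_lik = -0.5 * (np.einsum('ij,jk,ik->i', diff, inv, diff)
                  + logdet + d * np.log(2 * np.pi))

# Reject the least likely samples for manual inspection / meta-data repair.
threshold = np.quantile(log_lik, 0.02)
rejected = np.where(log_lik < threshold)[0]
```

In practice a mixture model or conditional density per genre would replace the single Gaussian, but the reject-by-likelihood logic is the same.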