Publications (8)
5.9 Total impact
Article: Compact Modeling of Data Using Independent Variable Group Analysis
ABSTRACT: In this paper, we introduce a modeling approach called independent variable group analysis (IVGA) which can be used for finding an efficient structural representation for a given data set. The basic idea is to determine such a grouping for the variables of the data set that mutually dependent variables are grouped together whereas mutually independent or weakly dependent variables end up in separate groups. Computation of an IVGA model requires a combinatorial algorithm for grouping of the variables and a modeling algorithm for the groups. In order to be able to compare different groupings, a cost function which reflects the quality of a grouping is also required. Such a cost function can be derived, for example, using the variational Bayesian approach, which is employed in our study. This approach is also shown to be approximately equivalent to minimizing the mutual information between the groups. The modeling task is computationally demanding. We describe an efficient heuristic grouping algorithm for the variables and derive a computationally light nonlinear mixture model for modeling of the dependencies within the groups. Finally, we carry out a set of experiments which indicate that IVGA may turn out to be beneficial in many different applications.
IEEE Transactions on Neural Networks 12/2007; · 2.95 Impact Factor
Conference Proceeding: Approximating nonlinear transformations of probability distributions for nonlinear independent component analysis
ABSTRACT: The nonlinear independent component analysis method introduced by Lappalainen and Honkela in 2000 uses a truncated Taylor series representation to approximate the nonlinear transformation from sources to observations. The approach uses information only at the single point of input mean and can produce poor results if the input variance is large. This feature has recently been identified to be the cause of instability of the algorithm with large source dimensionalities. In this paper, an improved approximation is presented. The derivatives used in the Taylor scheme are replaced with slopes evaluated by global Gauss-Hermite quadrature. The resulting approximation is more accurate under high input variance and the new learning algorithm is more stable with high source dimensionalities.
Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on; 08/2004
Article: Variational learning and bits-back coding: an information-theoretic view to Bayesian learning
ABSTRACT: The bits-back coding first introduced by Wallace in 1990 and later by Hinton and van Camp in 1993 provides an interesting link between Bayesian learning and information-theoretic minimum-description-length (MDL) learning approaches. The bits-back coding allows interpreting the cost function used in the variational Bayesian method called ensemble learning as a code length in addition to the Bayesian view of misfit of the posterior approximation and a lower bound of model evidence. Combining these two viewpoints provides interesting insights to the learning process and the functions of different parts of the model. In this paper, the problem of variational Bayesian learning of hierarchical latent variable models is used to demonstrate the benefits of the two views. The code-length interpretation provides new views to many parts of the problem such as model comparison and pruning and helps explain many phenomena occurring in learning.
IEEE Transactions on Neural Networks 08/2004; · 2.95 Impact Factor
Conference Proceeding: Speeding up cyclic update schemes by pattern searches
ABSTRACT: A popular strategy for dealing with large parameter estimation problems is to split the problem into manageable subproblems and solve them cyclically one by one until convergence. We address a well-known problem with this strategy, namely slow convergence under low noise. We propose using so-called pattern searches which consist of a parameter-wise update phase followed by a line search. The search direction of the line search is computed by combining the individual updates of all subproblems. The approach can be used to accelerate learning of several methods proposed in the literature without the need for large algorithmic modifications such as evaluation of global gradients. The proposed modification is shown to reduce the convergence time in a realistic independent component analysis (ICA) problem by more than 85%.
Neural Information Processing, 2002. ICONIP '02. Proceedings of the 9th International Conference on; 12/2002
Conference Proceeding: An ensemble learning approach to nonlinear dynamic blind source separation using state-space models
ABSTRACT: We propose a new method for learning a nonlinear dynamical state-space model in an unsupervised manner. The proposed method can be viewed as a nonlinear dynamic generalization of standard linear blind source separation (BSS) or independent component analysis (ICA). Using ensemble learning, the method finds a nonlinear dynamical process which can explain the observations. The nonlinearities are modeled with multilayer perceptron networks. In ensemble learning, a simpler approximative distribution is fitted to the true posterior distribution by minimizing their Kullback-Leibler divergence. This also regularizes the studied highly ill-posed problem. In an experiment with a difficult chaotic data set, the proposed method found a much better model for the underlying dynamical process and source signals used for generating the data than the compared methods.
Neural Networks, 2002. IJCNN '02. Proceedings of the 2002 International Joint Conference on; 02/2002
Conference Proceeding: Nonlinear static and dynamic blind source separation using ensemble learning
ABSTRACT: Blind separation of sources from their nonlinear mixtures is generally a very difficult problem. This is because both the nonlinear mapping and the underlying sources are unknown, and must be learned in a completely unsupervised manner from the data. We use multilayer perceptrons as nonlinear generative models for the data, and apply Bayesian ensemble learning for finding the sources. In this paper, we first consider a static nonlinear mixture model, with a successful application to real-world speech data. Then we briefly discuss extraction of sources from nonlinear dynamic processes. In a difficult test problem with chaotic data, our approach clearly outperforms currently available nonlinear prediction techniques. The proposed methods are computationally demanding especially in the dynamic case, but they allow the use of higher dimensional nonlinear latent variable models than other existing approaches.
Neural Networks, 2001. Proceedings. IJCNN '01. International Joint Conference on; 02/2001
Conference Proceeding: Nonlinear source separation using ensemble learning and MLP networks
ABSTRACT: We consider extraction of independent sources from their nonlinear mixtures. Generally, this problem is very difficult, because both the nonlinear mapping and the underlying sources are unknown and should be learned from the data. We use multilayer perceptrons as nonlinear generative models for the data. The model indeterminacy problem is resolved by applying ensemble learning. This Bayesian method selects the most probable generative data model. In simulations with artificial data, the network is able to find the underlying sources from the observations only, even though the data generating mapping is strongly nonlinear. We have applied the developed method also to real-world process data.
Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE 2000; 02/2000
Conference Proceeding: A state-space method for language modeling
ABSTRACT: A new state-space method for language modeling is presented. The complexity of the model is controlled by choosing the dimension of the state instead of the smoothing and back-off methods common in n-gram modeling. The model complexity also controls the generalization ability of the model, allowing the model to handle similar words in a similar manner. We compare the state-space model to a traditional n-gram model in a task of letter prediction. In this proof-of-concept experiment, the state-space model gives similar results as the n-gram model with sparse training data, but performs clearly worse with dense training data. While the initial results are encouraging, the training algorithm should be made more effective, so that it can fully exploit the model structure and scale up to larger token sets, such as words.
Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on;
Institutions
2002–2004
University of Helsinki
Helsinki, Province of Southern Finland, Finland