Erkki Oja
Aalto University · Department of Computer Science

About

356 Publications
47,548 Reads
40,356 Citations

Citations since 2016: 2 research items, 10,101 citations
[Chart: citations per year, 2016–2022]

Publications (356)
Article
Full-text available
Cluster analysis by nonnegative low-rank approximations has experienced remarkable progress in the past decade. However, the majority of such approximation approaches are still restricted to nonnegative matrix factorization (NMF) and suffer from the following two drawbacks: 1) they are unable to produce balanced partitions for large-scale manifold...
Article
In this work, we consider the Bayesian optimization (BO) approach for parametric tuning of complex chaotic systems. Such problems arise, for instance, in tuning the sub-grid-scale parameterizations in weather and climate models. For such problems, the tuning procedure is generally based on a performance metric which measures how well the tuned mode...
Article
Full-text available
Data visualization is one of the major applications of nonlinear dimensionality reduction. From the information retrieval perspective, the quality of a visualization can be evaluated by considering the extent that the neighborhood relation of each data point is maintained while the number of unrelated points that are retrieved is minimized. This pr...
Article
Affective classification and retrieval of multimedia such as audio, image, and video have become emerging research areas in recent years. The previous research focused on designing features and developing feature extraction methods. Generally, multimedia content can be represented with different feature representations (i.e., views). However, the...
Conference Paper
Full-text available
With an ever-growing number of published scientific studies, there is a need for automated search methods able to collect and extract as much information as possible from those articles. We propose a framework for the extraction and characterization of brain activity areas published in neuroscientific reports, as well as a suitable clustering stra...
Article
Full-text available
In this work, we consider the Bayesian optimization (BO) approach for tuning parameters of complex chaotic systems. Such problems arise, for instance, in tuning the sub-grid scale parameterizations in weather and climate models. For such problems, the tuning procedure is generally based on a performance metric which measures how well the tuned mode...
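For readers unfamiliar with the BO loop referred to above, the following is a minimal sketch, assuming a scikit-learn Gaussian-process surrogate and a hypothetical model_skill objective that stands in for the chaotic-model performance metric; the paper's actual metric and parameterizations are not reproduced here.

```python
# Minimal Bayesian-optimization loop: a GP surrogate plus an expected-improvement
# acquisition. `model_skill` is a hypothetical stand-in for the expensive
# performance metric discussed in the abstract (smaller is better).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def model_skill(theta):
    # Hypothetical one-parameter objective, e.g. a forecast-error score.
    return np.sin(3 * theta[0]) + 0.5 * theta[0] ** 2

def expected_improvement(X_cand, gp, y_best):
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
bounds = (-2.0, 2.0)

# Initial design: a few random evaluations of the expensive objective.
X = rng.uniform(*bounds, size=(5, 1))
y = np.array([model_skill(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):
    gp.fit(X, y)
    # Optimize the acquisition on a dense grid (adequate for one parameter).
    cand = np.linspace(*bounds, 500).reshape(-1, 1)
    x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
    X = np.vstack([X, x_next])
    y = np.append(y, model_skill(x_next))

print("best parameter:", X[np.argmin(y)], "skill:", y.min())
```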
Article
Full-text available
Information divergence that measures the difference between two nonnegative matrices or tensors has found its use in a variety of machine learning problems. Examples are Nonnegative Matrix/Tensor Factorization, Stochastic Neighbor Embedding, topic models, and Bayesian network optimization. The success of such a learning task depends heavily on a su...
Article
Many modern clustering methods employ a non-convex objective function and use iterative optimization algorithms to find local minima. Thus initialization of the algorithms is very important. Conventionally the starting guess of the iterations is randomly chosen; however, such a simple initialization often leads to poor clusterings. Here we propose...
Conference Paper
Full-text available
Images usually convey information that can influence people’s emotional states. Such affective information can be used by search engines and social networks for better understanding the user’s preferences. We propose here a novel Bayesian multiple kernel learning method for predicting the emotions evoked by images. The proposed method can make use...
Conference Paper
Full-text available
Emotional semantic image retrieval systems aim at incorporating the user’s affective states for responding adequately to the user’s interests. One challenge is to select features specific to image affect detection. Another challenge is to build effective learning models or classifiers to bridge the so-called “affective gap”. In this work, we study...
Conference Paper
Full-text available
Stochastic matrices are arrays whose elements are discrete probabilities. They are widely used in techniques such as Markov Chains, probabilistic latent semantic analysis, etc. In such learning problems, the learned matrices, being stochastic matrices, are non-negative and all or part of the elements sum up to one. Conventional multiplicative updat...
Conference Paper
Full-text available
Projective Nonnegative Matrix Factorization (PNMF) is able to extract sparse features and provide good approximation for discrete problems such as clustering. However, the original PNMF optimization algorithm cannot guarantee theoretical convergence during the iterative learning. We propose here an adaptive multiplicative algorithm for PNMF which...
Conference Paper
Full-text available
Projective Nonnegative Matrix Factorization (PNMF) is one of the recent methods for computing low-rank approximations to data matrices. It is advantageous in many practical application domains such as clustering, graph partitioning, and sparse feature extraction. However, up to now a scalable implementation of PNMF for large-scale machine learning...
Conference Paper
Full-text available
Nonnegative Matrix Factorization (NMF) based on the family of β-divergences has been shown to be advantageous in several signal processing and data analysis tasks. However, how to automatically select the best divergence among the family for given data remains unknown. Here we propose a new estimation criterion to resolve the problem of selecting β. Our...
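For reference, the β-divergence family mentioned here is commonly defined elementwise as (a standard textbook form, not quoted from the paper):

$$
d_\beta(x \,\|\, y) =
\begin{cases}
\dfrac{x^{\beta} + (\beta-1)\,y^{\beta} - \beta\, x\, y^{\beta-1}}{\beta(\beta-1)}, & \beta \neq 0,\, 1,\\[1ex]
x \log\dfrac{x}{y} - x + y, & \beta = 1 \ \text{(generalized Kullback-Leibler)},\\[1ex]
\dfrac{x}{y} - \log\dfrac{x}{y} - 1, & \beta = 0 \ \text{(Itakura-Saito)},
\end{cases}
$$

with β = 2 giving half the squared Euclidean distance; selecting β thus means selecting one member of this continuum.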
Conference Paper
Full-text available
In the past decade, Probabilistic Latent Semantic Indexing (PLSI) has become an important modeling technique, widely used in clustering or graph partitioning analysis. However, the original PLSI is designed for multinomial data and may not handle other data types. To overcome this restriction, we generalize PLSI to t-exponential family based on a r...
Article
Full-text available
Independent Subspace Analysis (ISA) consists in separating sets (subspaces) of dependent sources, with different sets being independent of each other. While a few algorithms have been proposed to solve this problem, they are all completely general in the sense that they do not make any assumptions on the intra-subspace dependency. In this paper, we...
Article
Independent component analysis (ICA) is possibly the most widespread approach to solve the blind source separation problem. Many different algorithms have been proposed, together with several highly successful applications. There is also an extensive body of work on the theoretical foundations and limits of the ICA methodology. One practical concern...
Article
Full-text available
Clustering analysis by nonnegative low-rank approximations has achieved remarkable progress in the past decade. However, most approximation approaches in this direction are still restricted to matrix factorization. We propose a new low-rank learning method to improve the clustering performance, which is beyond matrix factorization. The approximatio...
Article
In Nonnegative Matrix Factorization (NMF), a nonnegative matrix is approximated by a product of lower-rank factorizing matrices. Most NMF methods assume that each factorizing matrix appears only once in the approximation, thus the approximation is linear in the factorizing matrices. We present a new class of approximative NMF methods, called Quadra...
Article
Full-text available
Many dynamical models, such as numerical weather prediction and climate models, contain so-called closure parameters. These parameters usually appear in physical parameterizations of sub-grid scale processes, and they act as "tuning handles" of the models. Currently, the values of these parameters are specified mostly manually, but the increasing...
Conference Paper
Full-text available
Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least-squares error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data sam...
Article
Full-text available
Multiplicative updates have been widely used in approximative nonnegative matrix factorization (NMF) optimization because they are convenient to deploy. Their convergence proof is usually based on the minimization of an auxiliary upper-bounding function, the construction of which however remains specific and only available for limited types of diss...
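The auxiliary-function argument referred to here follows the usual majorization-minimization pattern; stated generally (not in the paper's notation):

$$
G(W, W') \ \text{is auxiliary for}\ F(W) \iff G(W, W') \ge F(W) \ \text{and} \ G(W, W) = F(W),
$$

so the update $W^{(t+1)} = \arg\min_{W} G(W, W^{(t)})$ yields $F(W^{(t+1)}) \le G(W^{(t+1)}, W^{(t)}) \le G(W^{(t)}, W^{(t)}) = F(W^{(t)})$, i.e. the objective never increases.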
Conference Paper
Full-text available
In document clustering, semantically similar documents are grouped together. The dimensionality of document collections is often very large, thousands or tens of thousands of terms. Thus, it is common to reduce the original dimensionality before clustering for computational reasons. Cosine distance is widely seen as the best choice for measuring th...
Conference Paper
Full-text available
Explicit relevance feedback requires the user to explicitly refine the search queries for content-based image retrieval. This may become laborious or even impossible due to the ever-increasing volume of digital databases. We present a multimodal information collector that can unobtrusively record and asynchronously transmit the user’s implicit rele...
Conference Paper
Full-text available
The I-divergence or unnormalized generalization of Kullback-Leibler (KL) divergence is commonly used in Nonnegative Matrix Factorization (NMF). This divergence has the drawback that its gradients with respect to the factorizing matrices depend heavily on the scales of the matrices, and learning the scales in gradient-descent optimization may requir...
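For reference, the I-divergence (unnormalized Kullback-Leibler divergence) between a nonnegative matrix X and its approximation X̂ is

$$
D_I(X \,\|\, \hat{X}) = \sum_{ij} \left( X_{ij} \log\frac{X_{ij}}{\hat{X}_{ij}} - X_{ij} + \hat{X}_{ij} \right),
$$

which scales linearly with the data, $D_I(cX \,\|\, c\hat{X}) = c\, D_I(X \,\|\, \hat{X})$; this is the scale dependence alluded to above.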
Article
Full-text available
What is Blind and Semi-blind Source Separation? Blind source separation (BSS) is a class of computational data analysis techniques for revealing hidden factors that underlie sets of measurements or signals. BSS assumes a statistical model whereby the
Article
Full-text available
The well-known Nonnegative Matrix Factorization (NMF) method can be provided with more flexibility by generalizing the non-normalized Kullback-Leibler divergence to α-divergences. However, the resulting α-NMF method can only achieve mediocre sparsity for the factorizing matrices. We have earlier proposed a variant of NMF, called Projective NMF (PN...
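For completeness, the α-divergence family used in α-NMF is commonly written as (a standard form, assumed rather than quoted from the paper):

$$
D_\alpha(X \,\|\, \hat{X}) = \frac{1}{\alpha(\alpha-1)} \sum_{ij} \left( X_{ij}^{\alpha}\, \hat{X}_{ij}^{\,1-\alpha} - \alpha X_{ij} + (\alpha-1)\hat{X}_{ij} \right),
$$

which recovers the unnormalized Kullback-Leibler divergence in the limit α → 1.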
Article
Uncertainties in future climate projections are often evaluated based on the perceived spread of ensembles of multi-model climate projections, such as those generated in different phases of the Coupled Model Intercomparison Project. In this paper we concentrate on uncertainties of a single climate model and the propagation of these uncertainties i...
Article
Full-text available
Climate models contain closure parameters to which the model climate is sensitive. These parameters appear in physical parameterization schemes where some unresolved variables are expressed by predefined parameters rather than being explicitly modeled. Currently, best expert knowledge is used to define the optimal closure parameter values, based on...
Conference Paper
Full-text available
Projective Nonnegative Matrix Factorization (PNMF) has demonstrated advantages in both sparse feature extraction and clustering. However, PNMF requires users to specify the column rank of the approximative projection matrix, the value of which is unknown beforehand. In this paper, we propose a method called ARDPNMF to automatically determine the co...
Article
Full-text available
Nonnegativity has been shown to be a powerful principle in linear matrix decompositions, leading to sparse component matrices in feature analysis and data compression. The classical method is Lee and Seung’s Nonnegative Matrix Factorization. A standard way to form learning rules is by multiplicative updates, maintaining nonnegativity. Here, a gener...
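As a concrete baseline for the multiplicative-update principle described here, below is a minimal NumPy sketch of the classical Lee-Seung multiplicative updates for the least-squares NMF objective; the generalized learning rules proposed in the paper are not reproduced.

```python
# Classical multiplicative updates for NMF with the Frobenius (least-squares)
# objective ||X - W H||_F^2; nonnegativity of W and H is preserved because
# every factor in the update ratios is nonnegative.
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

X = np.abs(np.random.default_rng(1).random((50, 30)))
W, H = nmf_multiplicative(X, rank=5)
print("reconstruction error:", np.linalg.norm(X - W @ H))
```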
Conference Paper
Full-text available
We introduce a probabilistic version of the self-organizing map (SOM) where we model the uncertainty of both the model vectors and the data. While uncertainty information about the data is often not available, this property becomes very useful when the method is combined in a hierarchical manner with probabilistic principal component analysis (PCA)...
Conference Paper
Full-text available
It has been demonstrated that Student t-Distributed Stochastic Neighbor Embedding (t-SNE) can enhance discovery of clusters of data. However, the original t-SNE implementation employs an additive gradient-based algorithm which requires suitable learning step size and momentum rate, the tuning of which can be laborious. We propose a novel fixed-poin...
Article
Full-text available
Climate models contain closure parameters to which the model climate is sensitive. These parameters appear in physical parameterization schemes where some unresolved variables are expressed by predefined parameters rather than being explicitly modeled. Currently, best expert knowledge is used to define the optimal closure parameter values, based on...
Article
Full-text available
A variant of nonnegative matrix factorization (NMF) which was proposed earlier is analyzed here. It is called projective nonnegative matrix factorization (PNMF). The new method approximately factorizes a projection matrix, minimizing the reconstruction error, into a positive low-rank matrix and its transpose. The dissimilarity between the original...
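In symbols, and as described in this abstract, PNMF replaces the two-factor NMF model X ≈ WH by a single nonnegative low-rank matrix W acting as an approximate projection:

$$
\min_{W \ge 0} \; D\!\left( X \,\Big\|\, W W^{\top} X \right),
$$

where D is the chosen dissimilarity (e.g. the Frobenius norm or a divergence) and $W W^{\top}$ plays the role of a projection matrix.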
Conference Paper
Full-text available
This paper presents an approach that allows for performing regression on large data sets in reasonable time. The main component of the approach consists in speeding up the slowest operation of the algorithm used by running it on the Graphics Processing Unit (GPU) of the video card, instead of the processor (CPU). The experiments show a speedup of...
Conference Paper
Full-text available
A new matrix factorization algorithm which combines two recently proposed nonnegative learning techniques is presented. Our new algorithm, α-PNMF, inherits the advantages of Projective Nonnegative Matrix Factorization (PNMF) for learning a highly orthogonal factor matrix. When the Kullback-Leibler (KL) divergence is generalized to α-divergence, it...
Conference Paper
Full-text available
In time series prediction, one often does not know the properties of the underlying system generating the time series. For example, is it a closed system that is generating the time series or are there any external factors influencing the system? As a result, one often does not know beforehand whether a time series is stationary or nonstation...
Conference Paper
Full-text available
Stochastic Neighbor Embedding (SNE) has shown to be quite promising for data visualization. Currently, the most popular implementation, t-SNE, is restricted to a particular Student t-distribution as its embedding distribution. Moreover, it uses a gradient descent algorithm that may require users to tune parameters such as the learning step size, mo...
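For context, the fixed Student t embedding distribution of t-SNE and the gradient that the usual additive algorithm descends are

$$
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
\frac{\partial C}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})\,(y_i - y_j)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1},
$$

with $C = \mathrm{KL}(P \,\|\, Q)$; the work above addresses both the restriction to this particular embedding distribution and the step-size tuning of the gradient descent.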
Article
Full-text available
The derivation of the Cramér-Rao bound (CRB) in ["Performance Analysis of the FastICA Algorithm and Cramér-Rao Bounds for Linear Independent Component Analysis," IEEE Trans. Signal Process., vol. 54, no. 4, Apr. 2006, pp. 1189-1203] contains errors, which influence the matrix form of the CRB but not the CRB on variance of relevant off-diago...
Article
We give a general overview of the use and possible misuse of blind source separation (BSS) and independent component analysis (ICA) in the context of neuroinformatics data processing. A clear emphasis is given to the analysis of electrophysiological recordings, as well as to functional magnetic resonance images (fMRI). Two illustrative examples inc...
Article
This paper presents the CATS Benchmark and the results of the competition organised during the IJCNN'04 conference in Budapest. Twenty-four papers and predictions have been submitted and seventeen have been selected. The goal of the competition was the prediction of 100 missing values divided into five groups of twenty consecutive values.
Article
The Publisher regrets that this article is an accidental duplication of an article that has already been published in Neurocomputing, 70(2007), 2325–2329, doi:10.1016/j.neucom.2007.02.013. The duplicate article has therefore been withdrawn.
Article
We propose here new variants of the Non-negative Matrix Factorization (NMF) method for learning spatially localized, sparse, part-based subspace representations of visual or other patterns. The algorithms are based on positively constrained projections and are related both to NMF and to the conventional SVD or PCA decomposition. A crucial question...
Conference Paper
Full-text available
Many linear ICA techniques are based on minimizing a nonlinear contrast function and many of them use a hyperbolic tangent (tanh) as their built-in nonlinearity. In this paper we propose two rational functions to replace the tanh and other popular functions that are tailored for separating supergaussian (long-tailed) sources. The advantage of the r...
Article
The fast independent component analysis (FastICA) algorithm is one of the most popular methods to solve problems in ICA and blind source separation. It has been shown experimentally that it outperforms most of the commonly used ICA algorithms in convergence speed. A rigorous local convergence analysis has been presented only for the so-called one-u...
Article
Full-text available
FastICA is one of the most popular algorithms for independent component analysis (ICA), demixing a set of statistically independent sources that have been mixed linearly. A key question is how accurate the method is for finite data samples. We propose an improved version of the FastICA algorithm which is asymptotically efficient, i.e., its accuracy...
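For orientation, the standard one-unit FastICA fixed-point iteration that these analyses build on is sketched below in NumPy (with the tanh nonlinearity); the asymptotically efficient refinement proposed in the paper is not included.

```python
# One-unit FastICA fixed-point iteration with the tanh nonlinearity.
# Data are whitened first; w converges (up to sign) to one row of the
# demixing matrix.
import numpy as np

def whiten(X):
    # Center and decorrelate: after whitening, cov(Z) = I.
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    return E @ np.diag(d ** -0.5) @ E.T @ Xc

def fastica_one_unit(Z, n_iter=100, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        wx = w @ Z
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        # Fixed-point update: w <- E[z g(w'z)] - E[g'(w'z)] w, then renormalize.
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:
            return w_new
        w = w_new
    return w

# Toy example: two linearly mixed super-Gaussian (Laplacian) sources.
rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 5000))
X = np.array([[1.0, 0.5], [0.3, 1.0]]) @ S
Z = whiten(X)
w = fastica_one_unit(Z)
print("estimated component variance:", np.var(w @ Z))
```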
Article
In this paper, we enhance and analyze the Evolving Tree (ETree) data analysis algorithm. The suggested improvements aim to make the system perform better while still maintaining the simple nature of the basic algorithm. We also examine the system's behavior with many different kinds of tests, measurements and visualizations. We compare the ETree's...
Article
Full-text available
The FastICA or fixed-point algorithm is one of the most successful algorithms for linear independent component analysis (ICA) in terms of accuracy and computational complexity. Two versions of the algorithm are available in literature and software: a one-unit (deflation) algorithm and a symmetric algorithm. The main results of this paper are analyti...
Chapter
We introduce a neural network for the analysis of local independent components of an input signal. The network is a modification of Kohonen's adaptive-subspace self-organizing map. The map units consist of weight matrices adapted to represent linear transformations which locally minimize statistical dependence among pattern vector components. Train...
Chapter
A new and efficient version of the Hough Transform for curve detection, the Randomized Hough Transform (RHT), has been recently suggested. The RHT selects n pixels from an edge image by random sampling to solve n parameters of a curve and then accumulates only one cell in a parameter space. In this paper, the RHT is related to other recent developm...
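A minimal sketch of the core RHT idea for straight-line detection, assuming a binary edge image as input; the accumulator keys, step sizes, and the rht_lines helper are illustrative choices, not the paper's implementation.

```python
# Randomized Hough Transform, line-detection sketch: each random pair of edge
# pixels determines one line, which accumulates a single cell in a sparse
# parameter-space accumulator (a dict keyed by quantized (theta, rho)).
import numpy as np
from collections import defaultdict

def rht_lines(edge_image, n_samples=5000, theta_step=np.pi / 180, rho_step=2.0, seed=0):
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(edge_image)
    points = np.column_stack([xs, ys])
    accumulator = defaultdict(int)
    for _ in range(n_samples):
        (x1, y1), (x2, y2) = points[rng.choice(len(points), size=2, replace=False)]
        if x1 == x2 and y1 == y2:
            continue
        # Line through the two points in normal form: x cos(theta) + y sin(theta) = rho.
        theta = np.arctan2(x2 - x1, -(y2 - y1)) % np.pi
        rho = x1 * np.cos(theta) + y1 * np.sin(theta)
        key = (round(theta / theta_step), round(rho / rho_step))
        accumulator[key] += 1
    # Return the best-supported cells as (theta, rho, votes) triples.
    best = sorted(accumulator.items(), key=lambda kv: kv[1], reverse=True)[:5]
    return [(k[0] * theta_step, k[1] * rho_step, votes) for k, votes in best]

# Toy edge image containing one straight (diagonal) line.
img = np.zeros((100, 100), dtype=bool)
img[np.arange(100), np.arange(100)] = True
print(rht_lines(img))
```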
Article
We present an example of exploratory data analysis of climate measurements using a recently developed denoising source separation (DSS) framework. We analyzed a combined dataset containing daily measurements of three variables: surface temperature, sea level pressure and precipitation around the globe, for a period of 56 years. Components exhibitin...
Conference Paper
The FastICA algorithm is a popular procedure for independent component analysis and blind source separation. In this paper, we analyze the average convergence behavior of the single-unit FastICA algorithm with kurtosis contrast for general m-source noiseless mixtures. We prove that this algorithm causes the average inter-channel interference...
Conference Paper
In this paper, we tested the efficiency of a two-step blind source separation (BSS) approach for the extraction of independent sources of α-activity from ongoing electroencephalograms (EEG). The method starts with a denoising source separation (DSS) of the recordings, and is followed by either an independent component analysis (ICA) or a temporal d...
Article
Full-text available
We propose using Independent Component Analysis (ICA) as an advanced pre-processing tool for blind suppression of interfering jammer signals in direct sequence spread spectrum communication systems utilizing antenna arrays. The role of ICA is to provide a jammer-mitigated signal to the conventional detection. If the jammer signal is weak or absent,...
Conference Paper
Full-text available
We present a method for exploratory data analysis of large spatiotemporal data sets such as global longtime climate measurements, extending our previous work on semiblind source separation of climate data. The method seeks fast changing components whose variances exhibit slow behavior with specific temporal structure. The algorithm is developed in...
Article
I. INTRODUCTION Information in all its forms is becoming more and more important in the world of today. Modern computer systems can store huge amounts of data, and new data are acquired at an ever-increasing rate. In a recent study [1] it was estimated that we collectively produced around 5 exabytes (5×10¹⁸ bytes) of new information in the year 2...
Book
This book includes the proceedings of the International Conference on Artificial Neural Networks (ICANN 2006) held on September 10-14, 2006 in Athens, Greece, with tutorials being presented on September 10, the main conference taking place during September 11-13 and accompanying workshops on perception, cognition and interaction held on September 1...
Conference Paper
In many fields of science, engineering, medicine and economics, large or huge data sets are routinely collected. Processing and transforming such data to intelligible form for the human user is becoming one of the most urgent problems in near future. Neural networks and related statistical machine learning methods have turned out to be promising so...
Conference Paper
Full-text available
The fixed-point algorithm, known as FastICA, is one of the most successful algorithms for independent component analysis in terms of accuracy and low computational complexity. This paper derives analytic closed-form expressions that characterize the separating ability of both the one-unit and symmetric versions of the algorithm in a local sense. Based on th...
Conference Paper
In image compression and feature extraction, linear expansions are standardly used. It was recently pointed out by Lee and Seung that the positivity or non-negativity of a linear expansion is a very powerful constraint that seems to lead to sparse representations for the images. Their technique, called Non-negative Matrix Factorization (NMF),...
Conference Paper
Full-text available
This paper derives a closed-form expression for the Cramer-Rao bound (CRB) on estimating the source signals in the linear independent component analysis problem, assuming that all independent components have finite variance. It is also shown that the fixed-point algorithm known as FastICA can approach the CRB (the estimate can be nearly efficient)...
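For context, the Cramér-Rao bound referred to states that any unbiased estimator $\hat{\theta}$ of a parameter vector $\theta$ satisfies

$$
\operatorname{cov}(\hat{\theta}) \succeq \mathcal{I}(\theta)^{-1},
\qquad
\mathcal{I}(\theta) = \mathbb{E}\!\left[ \frac{\partial \log p(x;\theta)}{\partial \theta} \, \frac{\partial \log p(x;\theta)}{\partial \theta}^{\!\top} \right],
$$

i.e. its covariance is bounded below, in the positive-semidefinite sense, by the inverse Fisher information; the contribution above is a closed-form expression of this bound for the linear ICA model with finite-variance sources.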