Michel Verleysen

Université Catholique de Louvain - UCLouvain | UCLouvain · Department of Electrical Engineering

About

381
Publications
83,984
Reads
11,347
Citations

Publications (381)
Preprint
Full-text available
Multidimensional scaling is a statistical process that aims to embed high-dimensional data into a lower-dimensional space; this process is often used for the purpose of data visualisation. Common multidimensional scaling algorithms tend to have high computational complexities, making them inapplicable to large data sets. This work introduces a stoc...
Conference Paper
Full-text available
Fast multi-scale neighbor embedding (f-ms-NE) is an algorithm that maps high-dimensional data to a low-dimensional space by preserving the multi-scale data neighborhoods. To lower its time complexity, f-ms-NE uses random subsamplings to estimate the data properties at multiple scales. To improve this estimation and study the f-ms-NE sensitivity to...
Conference Paper
Full-text available
Multidimensional scaling is a statistical process that aims to embed high-dimensional data into a lower-dimensional, more manageable space. Common MDS algorithms tend to have some limitations when facing large data sets due to their high time and spatial complexities. This paper attempts to tackle the problem by using a stochastic approach to MDS w...
Article
Feature selection is an important preprocessing step in machine learning. It helps to better understand the importance of some features and to reduce the dimensionality of a dataset, which improves machine learning and information extraction. Among the different existing methods for selecting features, filters are popular because they are independe...
Article
Dimension reduction (DR) computes faithful low-dimensional (LD) representations of high-dimensional (HD) data. Outstanding performances are achieved by recent neighbor embedding (NE) algorithms such as t-SNE, which mitigate the curse of dimensionality. The single-scale or multiscale nature of NE schemes drives the HD neighborhood preservation in th...
Preprint
Full-text available
The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm is a ubiquitously employed dimensionality reduction (DR) method. Its non-parametric nature and impressive efficacy motivated its parametric extension. It is however bound to a user-defined perplexity parameter, restricting its DR quality compared to recently developed multi-scale p...
Article
Full-text available
This work proposes a new algorithm for training neural networks to solve the problems of feature selection and function approximation. The algorithm applies different weight constraint functions for the hidden and the output layers of a multilayer perceptron neural network. The LASSO operator is applied to the hidden layer; therefore, the training...
Chapter
High-dimensional data are ubiquitous in regression. To obtain a better understanding of the data or to ease the learning process, reducing the data to a subset of the most relevant features is important. Among the different methods of feature selection, filter methods are popular because they are independent from the model, which makes them fast an...
Chapter
Selecting the best group of features from high-dimensional datasets is an important challenge in machine learning. Indeed problems with hundreds of features have now become usual. In the context of filter methods, the selected relevance criterion used for filtering is the key factor of a feature selection method. To select an appropriate criterion...
Conference Paper
Full-text available
Stochastic Neighbor Embedding (SNE) and variants like t-distributed SNE are popular methods of unsupervised dimensionality reduction (DR) that deliver outstanding experimental results. Regular t-SNE is often used to visualize data with class labels in colored scatterplots, even if those labels are actually not involved in the DR process. This paper...
Conference Paper
Full-text available
Noisy multi-way data sets are ubiquitous in many domains. In neuroscience, electroencephalogram (EEG) data are recorded during periodic stimulation from different sensory modalities, leading to steady-state (SS) recordings with at least four ways: the channels, the time, the subjects and the modalities. Improving the signal-to-noise ratio (SNR) of...
Chapter
Selecting features from high-dimensional datasets is an important problem in machine learning. This paper shows that in the context of filter methods for feature selection, the estimator of the criterion used to select features plays an important role; in particular the estimators may suffer from a bias when comparing smooth and non-smooth features...
Conference Paper
Full-text available
Stochastic Neighbor Embedding (SNE) and variants are dimensionality reduction (DR) methods able to foil the curse of dimensionality to deliver outstanding experimental results. Mitigating the crowding problem, t-SNE became an extremely popular DR scheme. Its quadratic time complexity in the number of samples is nevertheless unaffordable for big dat...
Conference Paper
Full-text available
In dimensionality reduction and data visualisation, t-SNE has become a popular method. In this paper, we propose two variants to the Gaussian similarities used to characterise the neighbourhoods around each high-dimensional datum in t-SNE. A first alternative is to use t distributions, as already used in the low-dimensional embedding space; a vari...
Article
Full-text available
Introduction: In the context of metabolomics analyses, partial least squares (PLS) represents the standard tool to perform regression and classification. OPLS, the orthogonal extension of PLS, which has proved to be very useful when interpretation is the main issue, is a more recent way to decompose the PLS solution into predictive components correla...
Article
Full-text available
In order to maintain life, living organisms produce and transform small molecules called metabolites. Metabolomics aims at studying the development of biological reactions resulting from a contact with a physio-pathological stimulus, through these metabolites. 1H-NMR spectroscopy is widely used to graphically describe a metabolite composition...
Article
Dimensionality reduction (DR) aims to reveal salient properties of high-dimensional (HD) data in a low-dimensional (LD) representation space. Two elements stipulate success of a DR approach: definition of a notion of pairwise relations in the HD and LD spaces, and measuring the mismatch between these relationships in the HD and LD representations o...
Conference Paper
Full-text available
Stochastic neighbor embedding (SNE) is a method of dimensionality reduction that involves softmax similarities measured between all pairs of data points. To build a suitable embedding, SNE tries to reproduce in a low-dimensional space the similarities that are observed in the high-dimensional data space. Previous work has investigated the immunity...
Conference Paper
Full-text available
This work presents an approach allowing for an interactive visualization of dimensionality reduction outcomes, which is based on an extended view of conventional homotopy. The pairwise functional followed from a simple homotopic function can be incorporated within a geometrical framework in order to yield a bi-parametric approach able to combine se...
Conference Paper
Full-text available
This work introduces a generalized kernel perspective for spectral dimensionality reduction approaches. Firstly, an elegant matrix view of kernel principal component analysis (PCA) is described. We show the relationship between kernel PCA, and conventional PCA using a parametric distance. Secondly, we introduce a weighted kernel PCA framework follo...
Article
Usual multi-class classification techniques often rely on the availability of all relevant features. In practice, however, this requirement restricts the type of features that can be considered. Features whose value depends on some partial, intermediate classification results, can convey precious information but their nature hinders their use. A ty...
Conference Paper
Full-text available
Feature selection is essential in many machine learning problems, but it is often not clear on which grounds variables should be included or excluded. This paper shows that the mean squared leave-one-out error of the first-nearest-neighbour estimator is effective as a cost function when selecting input variables for regression tasks. A theoretical a...
Conference Paper
Full-text available
Dimensionality reduction methods aimed at preserving the data topology have been shown to be suitable for reaching high-quality embedded data, in particular those based on divergences, such as stochastic neighbour embedding (SNE). The big advantage of SNE and its variants is that the neighbor preservation is done by optimizing the similarities in both...
Article
Full-text available
Label noise is an important issue in classification, with many potential negative consequences. For example, the accuracy of predictions may decrease, whereas the complexity of inferred models and the number of necessary training samples may increase. Many works in the literature have been devoted to the study of label noise and the development of...
Article
A way to achieve feature selection for classification problems polluted by label noise is proposed. The performances of traditional feature selection algorithms often decrease sharply when some samples are wrongly labelled. A method based on a probabilistic label noise model combined with a nearest neighbours-based entropy estimator is introduced t...
Conference Paper
Full-text available
The aim of this paper is to propose a new generalized formulation for feature extraction based on distances from a feature relevance point of view. This is done within an unsupervised framework. To do so, the formal concept of feature relevance is first outlined. Then, a novel feature extraction approach is introduced. Such an approach employs t...
Conference Paper
Full-text available
Dimensionality reduction is a key stage for both the design of a pattern recognition system and data visualization. Recently, there has been an increasing interest in those methods aimed at preserving the data topology. Among them, Laplacian eigenmaps (LE) and stochastic neighbour embedding (SNE) are the most representative. In this work, we present...
Article
This paper introduces a new methodology to perform feature selection in multi-label classification problems. Unlike previous works based on the χ² statistic, the proposed approach uses the multivariate mutual information criterion combined with a problem transformation and a pruning strategy. This allows us to consider the possible dependencies...
Article
Feature selection is a task of fundamental importance for many data mining or machine learning applications, including regression. Surprisingly, most of the existing feature selection algorithms assume the problems to address are either supervised or unsupervised, while supervised and unsupervised samples are often simultaneously available in real-...
Article
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased tow...
Conference Paper
Full-text available
Radiotherapy treatment planning requires physicians to delineate the target volumes and organs at risk on 3D images of the patient. This segmentation task consumes a lot of time and can be partly automated with atlases (reference images segmented by experts). To segment any new image, the atlas is non-rigidly registered and the organ contours are t...
Article
Feature selection is an important preprocessing step for many high-dimensional regression problems. One of the most common strategies is to select a relevant feature subset based on the mutual information criterion. However, no connection has been established yet between the use of mutual information and a regression error criterion in the machine...
Article
Stochastic neighbor embedding (SNE) and its variants are methods of dimensionality reduction (DR) that involve normalized softmax similarities derived from pairwise distances. These methods try to reproduce in the low-dimensional embedding space the similarities observed in the high-dimensional data space. Their outstanding experimental results, co...
Article
Mutual information is a widely used performance criterion for filter feature selection. However, despite its popularity and its appealing properties, mutual information is not always the most appropriate criterion. Indeed, contrary to what is sometimes hypothesized in the literature, looking for a feature subset maximizing the mutual information do...
Article
Full-text available
We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely id...
Data
Full-text available
Online Support Material for: Unique in the Crowd: The privacy bounds of human mobility
Conference Paper
Generating effective visual embedding of high-dimensional data is difficult - the analyst expects to see the structure of the data in the visualization, as well as patterns and relations. Given the high dimensionality, noise and imperfect embedding techniques, it is hard to come up with a satisfactory embedding that preserves the data structure wel...
Chapter
Mutual information is one of the most popular criteria used in feature selection, for which many estimation techniques have been proposed. The large majority of them are based on probability density estimation and perform badly when faced with high-dimensional data, because of the curse of dimensionality. However, being able to evaluate robustly the...
Article
Feature selection is an important preprocessing task for many machine learning and pattern recognition applications, including regression and classification. Missing data are encountered in many real-world problems and have to be considered in practice. This paper addresses the problem of feature selection in prediction problems where some occurren...
Conference Paper
Full-text available
Image segmentation problems can be solved with classification algorithms. However, their use is limited to features derived from intensities of pixels or patches. Features such as contiguity of two regions cannot be considered without prior knowledge of one of the two class labels. Instead of stacking various classification algorithms, we describ...
Article
Full-text available
The issue of detecting abnormal vibrations from spectra is addressed in this article, when little is known about both the mechanical behavior of the system and the characteristic patterns of potential faults. With vibration measured from a bearing test rig and from an aircraft engine, we show that when only a small learning set is available, proba...
Article
Full-text available
Probability density function (PDF) estimation is a task of primary importance in many contexts, including Bayesian learning and novelty detection. Despite the wide variety of methods available to estimate PDFs, only a few of them are widely used in practice by data analysts. Among the most used methods are the histograms, Parzen windows, vector quantiza...
Article
In the context of feature selection, there is a trade-off between the number of selected features and the generalisation error. Two plots may help to summarise feature selection: the feature selection path and the sparsity-error trade-off curve. The feature selection path shows the best feature subset for each subset size, whereas the sparsity-erro...
Article
Mutual Information estimation is an important task for many data mining and machine learning applications. In particular, many feature selection algorithms make use of the mutual information criterion and could thus benefit greatly from a reliable way to estimate this criterion. More precisely, the multivariate mutual information (computed between...
Article
Feature selection is a preprocessing step of great importance for a lot of pattern recognition and machine learning applications, including classification. Even if feature selection has been extensively studied for classical problems, very little work has been done to take into account a possible imprecision or uncertainty in the assignment of the...
Article
This paper proposes a method for the automatic classification of heartbeats in an ECG signal. Since this task has specific characteristics such as time dependences between observations and a strong class unbalance, a specific classifier is proposed and evaluated on real ECG signals from the MIT arrhythmia database. This classifier is a weighted var...
Article
Full-text available
Dimensionality reduction aims at representing high-dimensional data in low-dimensional spaces, mainly for visualization and exploratory purposes. As an alternative to projections on linear subspaces, nonlinear dimensionality reduction, also known as manifold learning, can provide data representations that preserve structural properties such as pair...
Conference Paper
Full-text available
The issue of detecting abnormal vibrations is addressed in this article, when little is known about both the mechanical behavior of the system and the characteristic patterns of potential faults. With data from a bearing test rig and from an aircraft engine, we show that when only a small learning set is available, Bayesian inference has several a...
Conference Paper
Full-text available
Dimensionality reduction is a well-known technique in signal processing, intended to improve both the computational cost and the performance of classifiers. We use an electroencephalogram (EEG) feature matrix based on three extraction methods: tracks extraction, wavelet coefficients and Fractional Fourier Transform. The dimension reduction is perfo...
Conference Paper
The performance of traditional classification models can be adversely affected by the presence of label noise in training observations. The pioneering work of Lawrence and Schölkopf tackled this issue in datasets with independent observations by incorporating a statistical noise model within the inference algorithm. In this paper, the specific case of...
Article
Support vector regression (SVR) is a state-of-the-art method for regression which uses the ε-insensitive loss and produces sparse models. However, non-linear SVRs are difficult to tune because of the additional kernel parameter. In this paper, a new parameter-insensitive kernel inspired from extreme learning is used for non-linear SVR. Hence, the pra...
Conference Paper
In many real-world situations, the data cannot be assumed to be precise. Indeed uncertain data are often encountered, due for example to the imprecision of measurement devices or to continuously moving objects for which the exact position is impossible to obtain. One way to model this uncertainty is to represent each data value as a probability dis...
Chapter
Full-text available
One of the earliest challenges a practitioner is faced with when using distance-based tools lies in the choice of the distance, for which there is often very little information to rely on. This chapter proposes to find a compromise between an a priori unoptimized choice (e.g. the Euclidean distance) and a fully-optimized, but computationally expensive, c...
Conference Paper
Full-text available
Aircraft engines are designed to be used for several decades. Ensuring a proper operation of engines over their lifetime is therefore an important and difficult task. The maintenance can be improved if efficient procedures for the understanding of data flows produced by sensors for monitoring purposes are implemented. This paper details su...
Conference Paper
Feature selection is fundamental in many data mining or machine learning applications. Most of the algorithms proposed for this task make the assumption that the data are either supervised or unsupervised, while in practice supervised and unsupervised samples are often simultaneously available. Semi-supervised feature selection is thus needed, and...
Conference Paper
This paper proposes the use of mutual information for feature selection in multi-label classification, a surprisingly little-studied problem. A pruned problem transformation method is first applied, transforming the multi-label problem into a single-label one. A greedy feature selection procedure based on multidimensional mutual information is...
Article
Mode estimation is extensively studied in statistics. One of the most widely used methods of mode estimation is hill-climbing on a kernel density estimator with gradient ascent or a fixed-point approach. Within this framework, Gaussian kernels prove to be a natural and intuitive option for non-parametric density estimation. This paper shows that i...
Conference Paper
Feature selection is an important task for many machine learning applications; moreover, missing data are encountered very often in practice. This paper proposes to adapt a nearest-neighbors-based mutual information estimator to handle missing data and to use it to achieve feature selection. Results on artificial and real world datasets show that the...
Conference Paper
Full-text available
This paper proposes a method to perform class-specific feature selection in multiclass support vector machines addressed with the one-against-all strategy. The main issue arises at the final step of the classification process, where binary classifier outputs must be compared one against another to elect the winning class. This comparison may be bia...
Conference Paper
The diagnosis of cardiac dysfunctions requires the analysis of long-term ECG signal recordings, often containing hundreds to thousands of heart beats. In this work, automatic inter-patient classification of heart beats following AAMI guidelines is investigated. The prior of the normal class is by far larger than the other classes, and the classifi...
Article
This paper proposes an algorithm for feature selection in the case of mixed data. It consists in ranking independently the categorical and the continuous features before recombining them according to the accuracy of a classifier. The popular mutual information criterion is used in both ranking procedures. The proposed algorithm thus avoids the use...
Article
Full-text available
Supervised and interpatient classification of heart beats is essential in many applications requiring long-term monitoring of the cardiac function. Several classification models able to cope with the strong class unbalance and a large variety of feature sets have been proposed for this task. In practice, over 200 features are often considered, and...
Conference Paper
Full-text available
This paper presents a semi-supervised feature selection method based on a univariate relevance measure applied to a multiobjective approach to the problem. While deciding on the optimal solution within the Pareto-optimal set, attempting to maximize the relevance indexes of each feature, it is possible to determi...
Article
Dimensionality reduction aims at representing high-dimensional data in low-dimensional spaces, in order to facilitate their visual interpretation. Many techniques exist, ranging from simple linear projections to more complex nonlinear transformations. The large variety of methods emphasizes the need of quality criteria that allow for fair compariso...
Article
Full-text available
Fractal dimension is an index which can be used to characterize urban areas. The use of the curve of scaling behaviour is less common. However, its shape gives local information about the morphology of the built-up area. This paper suggests a method based on a k-medoid for clustering these curves. It is applied to forty-nine wards of Eu...
Conference Paper
Full-text available
The problem of aircraft engine condition monitoring based on vibration signals is addressed. To do so, we compare two estimators of the Frequency Response Function of an aircraft engine whose input is its shaft angular position and whose output is an accelerometric signal that measures vibrations. It is shown that this problem can be seen as a smoo...
Article
Full-text available
The large number of methods for EEG feature extraction demands a good choice for EEG features for every task. This paper compares three subsets of features obtained by tracks extraction method, wavelet transform and fractional Fourier transform. Particularly, we compare the performance of each subset in classification tasks using support vector mac...
Conference Paper
Full-text available
Unsupervised dimensionality reduction aims at representing high-dimensional data in lower-dimensional spaces in a faithful way. Dimensionality reduction can be used for compression or denoising purposes, but data visualization remains one of its most prominent applications. This paper attempts to give a broad overview of the domain. Past developments...
Conference Paper
Full-text available
Dimensionality reduction techniques aim at representing high-dimensional data in low-dimensional spaces. To be faithful and reliable, the representation is usually required to preserve proximity relationships. In practice, methods like multidimensional scaling try to fulfill this requirement by preserving pairwise distances in the low-dimensional r...