Gauthier Doquire's research while affiliated with Université Catholique de Louvain - UCLouvain and other places

Publications (18)

Article
A way to achieve feature selection for classification problems polluted by label noise is proposed. The performance of traditional feature selection algorithms often decreases sharply when some samples are wrongly labelled. A method based on a probabilistic label noise model combined with a nearest neighbours-based entropy estimator is introduced t...
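The entry above mentions a nearest neighbours-based entropy estimator. As a point of reference only, here is a minimal sketch of the generic Kozachenko-Leonenko k-NN estimator of differential entropy; the function name and the choice k=3 are illustrative, and the probabilistic label noise model of the paper is not reproduced.

```python
# Minimal sketch of the generic Kozachenko-Leonenko k-NN entropy estimator.
# This is NOT the label-noise-aware estimator of the paper, only the standard
# nearest-neighbours building block such methods rely on.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(X, k=3):
    """Differential entropy (in nats) of samples X with shape (n, d)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    n, d = X.shape
    # distance to the k-th nearest neighbour (index 0 of the query is the point itself)
    eps = cKDTree(X).query(X, k=k + 1)[0][:, -1]
    # log-volume of the d-dimensional unit ball
    log_c_d = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
    return digamma(n) - digamma(k) + log_c_d + d * np.mean(np.log(2.0 * eps + 1e-15))

if __name__ == "__main__":
    x = np.random.default_rng(0).normal(size=(5000, 1))
    print(knn_entropy(x))  # ~0.5 * ln(2 * pi * e) ≈ 1.42 nats for a standard Gaussian
```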
Article
This paper introduces a new methodology to perform feature selection in multi-label classification problems. Unlike previous works based on the χ² statistic, the proposed approach uses the multivariate mutual information criterion combined with a problem transformation and a pruning strategy. This allows us to consider the possible dependencies...
Article
Feature selection is a task of fundamental importance for many data mining or machine learning applications, including regression. Surprisingly, most of the existing feature selection algorithms assume that the problems to be addressed are either supervised or unsupervised, while supervised and unsupervised samples are often simultaneously available in real-...
Article
Feature selection is an important preprocessing step for many high-dimensional regression problems. One of the most common strategies is to select a relevant feature subset based on the mutual information criterion. However, no connection has been established yet between the use of mutual information and a regression error criterion in the machine...
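The connection at stake here, between a mutual information criterion and the regression error, can be illustrated with a small, hedged experiment: score each feature with a k-NN mutual information estimate of its dependence on the target, and compare it to the cross-validated error of a simple regressor using that feature alone. The dataset, model and parameters below are illustrative and do not follow the paper's protocol.

```python
# Illustrative comparison: per-feature mutual information with the target versus the
# cross-validated error of a one-feature k-NN regressor (assumed setup, not the paper's).
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import mutual_info_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

X, y = make_friedman1(n_samples=500, n_features=10, random_state=0)
mi = mutual_info_regression(X, y, n_neighbors=3, random_state=0)
for j in range(X.shape[1]):
    mse = -cross_val_score(KNeighborsRegressor(n_neighbors=5), X[:, [j]], y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    print(f"feature {j}: MI = {mi[j]:.3f}, single-feature CV MSE = {mse:.3f}")
```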
Article
Mutual information is a widely used performance criterion for filter feature selection. However, despite its popularity and its appealing properties, mutual information is not always the most appropriate criterion. Indeed, contrary to what is sometimes hypothesized in the literature, looking for a feature subset maximizing the mutual information do...
Chapter
Mutual information is one of the most popular criteria used in feature selection, for which many estimation techniques have been proposed. The large majority of them are based on probability density estimation and perform badly when faced with high-dimensional data, because of the curse of dimensionality. However, being able to evaluate robustly the...
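A common nearest-neighbours alternative to density estimation is the Kraskov-Stögbauer-Grassberger (KSG) estimator of mutual information. The sketch below is a generic implementation of that estimator, assuming continuous data without tied distances; it is not necessarily the exact variant discussed in this chapter.

```python
# Generic KSG (Kraskov et al., 2004, algorithm 1) estimator of I(X; Y) in nats.
# Assumes continuous data with no duplicate points.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mutual_information(x, y, k=3):
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    # epsilon_i: max-norm distance to the k-th neighbour in the joint space
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    # number of strictly closer neighbours in each marginal space (minus the point itself)
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=2000)
    b = a + rng.normal(size=2000)
    print(ksg_mutual_information(a, b))  # analytic value: 0.5 * ln(2) ≈ 0.35 nats
```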
Article
Feature selection is an important preprocessing task for many machine learning and pattern recognition applications, including regression and classification. Missing data are encountered in many real-world problems and have to be considered in practice. This paper addresses the problem of feature selection in prediction problems where some occurren...
Article
Feature selection is a preprocessing step of great importance for many pattern recognition and machine learning applications, including classification. Although feature selection has been extensively studied for classical problems, very little work has been done to take into account a possible imprecision or uncertainty in the assignment of the...
Article
Mutual Information estimation is an important task for many data mining and machine learning applications. In particular, many feature selection algorithms make use of the mutual information criterion and could thus benefit greatly from a reliable way to estimate this criterion. More precisely, the multivariate mutual information (computed between...
Conference Paper
In many real-world situations, the data cannot be assumed to be precise. Indeed uncertain data are often encountered, due for example to the imprecision of measurement devices or to continuously moving objects for which the exact position is impossible to obtain. One way to model this uncertainty is to represent each data value as a probability dis...
Conference Paper
Feature selection is fundamental in many data mining or machine learning applications. Most of the algorithms proposed for this task make the assumption that the data are either supervised or unsupervised, while in practice supervised and unsupervised samples are often simultaneously available. Semi-supervised feature selection is thus needed, and...
Conference Paper
This paper proposes the use of mutual information for feature selection in multi-label classification, a problem that has received surprisingly little attention. A pruned problem transformation method is first applied, transforming the multi-label problem into a single-label one. A greedy feature selection procedure based on multidimensional mutual information is...
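As an illustration only, the sketch below follows the same general recipe: a pruned problem transformation (here, label sets occurring fewer than min_count times are simply dropped, one possible pruning strategy) followed by a mutual-information ranking of the features on the resulting single-label problem. Names and parameters are assumptions, and the paper's greedy multidimensional-MI search is not reproduced.

```python
# Assumed illustration of a pruned problem transformation (PPT) followed by a
# univariate mutual-information ranking; not the paper's exact algorithm.
import numpy as np
from collections import Counter
from sklearn.feature_selection import mutual_info_classif

def ppt_mi_ranking(X, Y, min_count=5, n_neighbors=3):
    """X: (n, d) feature matrix, Y: (n, L) binary label matrix."""
    labelsets = [tuple(row) for row in np.asarray(Y, dtype=int)]
    counts = Counter(labelsets)
    keep = np.array([counts[ls] >= min_count for ls in labelsets])      # pruning step
    kept_sets = sorted({ls for ls in labelsets if counts[ls] >= min_count})
    class_of = {ls: i for i, ls in enumerate(kept_sets)}                 # label powerset
    y_single = np.array([class_of[ls] for ls, ok in zip(labelsets, keep) if ok])
    mi = mutual_info_classif(X[keep], y_single, n_neighbors=n_neighbors)
    return np.argsort(mi)[::-1]  # feature indices, most informative first
```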
Article
This paper proposes an algorithm for feature selection in the case of mixed data. It consists in ranking independently the categorical and the continuous features before recombining them according to the accuracy of a classifier. The popular mutual information criterion is used in both ranking procedures. The proposed algorithm thus avoids the use...
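A hedged sketch of that two-ranking idea: rank the categorical and the continuous features separately by mutual information with the class, then recombine the two rankings by letting a cross-validated classifier decide how many features to take from each. Function names, the k-NN classifier and the feature budget are illustrative assumptions, not the paper's exact procedure.

```python
# Assumed illustration: separate MI rankings for categorical and continuous features,
# recombined according to the cross-validated accuracy of a simple classifier.
# Assumes both feature types are present in X.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def mixed_mi_selection(X, y, categorical_mask, max_features=10):
    mask = np.asarray(categorical_mask, dtype=bool)
    cat, num = np.where(mask)[0], np.where(~mask)[0]
    rank_cat = cat[np.argsort(mutual_info_classif(X[:, cat], y, discrete_features=True))[::-1]]
    rank_num = num[np.argsort(mutual_info_classif(X[:, num], y))[::-1]]
    best_cols, best_acc = [], -np.inf
    # recombination: try every split of the feature budget between the two rankings
    for n_cat in range(max_features + 1):
        n_num = max_features - n_cat
        if n_cat > len(rank_cat) or n_num > len(rank_num):
            continue
        cols = list(rank_cat[:n_cat]) + list(rank_num[:n_num])
        acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), X[:, cols], y, cv=3).mean()
        if acc > best_acc:
            best_cols, best_acc = cols, acc
    return best_cols
```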
Conference Paper
Feature selection is an important task for many machine learning applications; moreover, missing data are encountered very often in practice. This paper proposes to adapt a nearest neighbors based mutual information estimator to handle missing data and to use it to achieve feature selection. Results on artificial and real world datasets show that the...
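One generic way to make a nearest-neighbours search tolerate missing values, shown below purely as an illustration, is a partial distance computed over the coordinates observed in both points and rescaled by the fraction of usable coordinates; this is not necessarily the adaptation proposed in the paper.

```python
# Assumed illustration of a "partial distance" for vectors with missing values (NaN);
# not necessarily the estimator adaptation proposed in the paper.
import numpy as np

def partial_distance(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = ~np.isnan(a) & ~np.isnan(b)  # coordinates known in both vectors
    if not observed.any():
        return np.inf
    diff = a[observed] - b[observed]
    # rescale so distances remain comparable when few coordinates are shared
    return np.sqrt((diff @ diff) * a.size / observed.sum())
```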
Article
Supervised and interpatient classification of heart beats is of primary importance in many applications requiring long-term monitoring of the cardiac function. Several classification models able to cope with the strong class unbalance and a large variety of feature sets have been proposed for this task. In practice, over 200 features are often considered, and...

Citations

... have least impact on entropy when removed. The entropy measure used is calculated by summing similarities between instance pairs. Mixed data is considered in the associated distance calculation, whereby numeric values are discretised prior to the calculation. This process of discretisation via binning, by its nature, loses precision of information (Doquire & Verleysen, 2011), in addition to the dependence on the chosen number of bins. ...
... Also, numerical experiments suggest that for the estimation of information-theoretic functionals, kNN methods can usually outperform kernel methods [47, 11, 16]. As a result, kNN methods are widely used for nonparametric statistical problems [25]. ...
... Constant features contain only one value, and the variance threshold value is 0. In total, 994 constant features were identified among the 25088 high-dimensional features (Doquire & Verleysen, 2013). However, the constant features do not affect the models and are removed, leaving 24094 high-dimensional features. ...
... The effect of measurement errors on covariate importance values and regression coefficient estimates has not been fully investigated and, to the best of our knowledge, it is unclear what caused the differences. However, previous studies have reported that measurement errors could affect machine learning by decreasing prediction performance (Nettleton et al., 2010), increasing model complexity (Brodley and Friedl, 1999) and affecting feature selection and covariate importance values (Frénay et al., 2014). In this study, the insignificant influence of measurement errors on prediction performance could be because of the homogeneity of the measurement error variances. ...
... A larger MI indicates more significant dependencies [32]. More recently, MI has been widely employed before machine learning or deep learning (DL) [33]-[35]. We computed MI values between the considered features and the POD. ...
... Our scEpiAge setup, illustrated in Fig 2 A&B, is similar to the genetic and epigenetic distance based prediction methods used in other settings (Eirola et al., 2013; Han et al., 2020a, 2020b; Karimzadeh and Olafsson, 2019) and also by Trapp et al. (2021), but includes the following major modifications: (1) we generalised it to work on both bulk and single-cell data, (2) we used a significantly larger number of reference samples (2.3 times more samples for liver and 3.6 times more samples for blood) originating from five different studies, and (3) we optimised feature selection and modelling. In short, we selected per tissue the top age correlating sites pruned to only capture independent age associated CpGs. ...
... Other ML approaches have been suggested to approach the case when the independent and identically distributed data assumption cannot be made, such as Hidden Markov Models (HMMs), Conditional Random Fields (CRFs), and certain types of neural networks such as recurrent networks [8]. Methods to select a certain number of features for the classifier among those available can also be used at this stage, in particular wrapper methods that test all the possible feature combinations to find the optimal solution in terms of best classification performance, or filter methods that rank the available features according to a certain metric (e.g. the T-test or the mutual information) and then select the best N features [25]-[27]. It should be noted that similar processing steps are also followed for different types of sensors, such as wearable inertial measurement units (IMUs), as detailed in [13]. ...
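As a toy illustration of the filter strategy just described, the snippet below scores every feature with a univariate criterion (mutual information here; an F-test could stand in for the T-test mentioned above) and keeps the N best; the dataset and N are arbitrary choices for the example.

```python
# Toy filter-style feature selection: rank features by mutual information with the
# class labels and keep the N best (dataset and N chosen only for illustration).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
selector = SelectKBest(score_func=mutual_info_classif, k=10).fit(X, y)
print(selector.get_support(indices=True))  # indices of the 10 retained features
```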
... In this context, a correlated but important feature may be discarded, leading to the wrong conclusion. Another well-known feature selection approach is the mutual information (MI)-based method, which builds on a measure of the uncertainty of random variables, Shannon's entropy [21]. A recent feature subset selection method based on a merit score, also implemented by Kathirgamanathan and Cunningham [22], was used to identify relevant features in the MTS domain. ...
... Primarily, we demonstrate how the cost function can help identify appropriate data preprocessing to mitigate problems with ill-behaved manifolds. While in some cases insights are available as to which data preprocessing results in accurate modeling [83,122,125,174,178,220], such decisions still have to be largely made through trial and error. Finally, with the many linear and nonlinear dimensionality reduction techniques available in the research community, we show how they can be assessed in their capacity to generate well-defined manifolds. ...
... The proposed method is compared with three representative classical methods: the PPT + MI [46], LP + CHI [47], and PPT + Relief-F [37], described as Method 1, Method 2, and Method 3, respectively. In Method 1, multi-label data was initially transformed to single-label data using the pruned problem transformation (PPT), after which the optimal features in the converted feature set were chosen based on mutual information. ...