
... The MI between class assignments and feature values may then be informative as to which features have good predictive power for class membership. This relation was also used in the exploratory analyses, and has been suggested by others as a way to identify the best features [10,20,60]. ...
... All but one of these were also used for the project described in this report. The features that are used in BBCI are features 12, 13, 14, 15, 18, 19, 20, and 21. Additionally, one feature, the "current density norm", is used in the BBCI algorithm, but not here. ...
... In this subsection we present an algorithm based on permutation testing, described in Algorithm 1 and inspired by François et al. [14], to compare the difference in information loss due to removing various quantiles. Let q_1, ..., q_ℓ be the quantile values. ...
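The snippet above refers to an Algorithm 1 whose pseudocode is not reproduced here. As a rough illustration of the underlying idea, the sketch below shows a generic permutation test for the significance of an estimated mutual information value; the k-NN estimator from scikit-learn and all names are assumptions, not the authors' Algorithm 1.

```python
# Minimal sketch of an MI permutation test (illustrative, not the cited Algorithm 1).
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mi_permutation_test(x, y, n_perm=1000, seed=0):
    """Return the observed MI between x and y and a permutation p-value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x).reshape(-1, 1)
    mi_obs = mutual_info_regression(x, y, random_state=seed)[0]
    null = np.empty(n_perm)
    for b in range(n_perm):
        y_perm = rng.permutation(y)  # shuffling y breaks any x-y dependence
        null[b] = mutual_info_regression(x, y_perm, random_state=seed)[0]
    p_value = (np.sum(null >= mi_obs) + 1) / (n_perm + 1)
    return mi_obs, p_value
```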
Preprint
Microbial communities are widely studied using high-throughput sequencing techniques, such as 16S rRNA gene sequencing. These techniques have attracted biologists as they offer powerful tools to explore microbial communities and investigate their patterns of diversity in biological and biomedical samples at remarkable resolution. However, the accuracy of these methods can be negatively affected by the presence of contamination. Several studies have recognized that contamination is a common problem in microbial studies and have offered promising computational and laboratory-based approaches to assess and remove contaminants. Here we propose a novel strategy, an MI-based (mutual information based) filtering method, which uses information theoretic functionals and graph theory to identify and remove contaminants. We applied the MI-based filtering method to a mock community data set and evaluated the amount of information loss due to filtering taxa. We also compared our method to commonly practiced traditional filtering methods. In a mock community data set, the MI-based filtering approach maintained the true bacteria in the community without significant loss of information. Our results indicate that the MI-based filtering method effectively identifies and removes contaminants in microbial communities and hence can be beneficial as a filtering method for microbiome studies. We believe our filtering method has two advantages over traditional filtering methods. First, it does not require an arbitrary choice of threshold and second, it is able to detect true taxa with low abundance.
... Many researchers have used mutual information to estimate the association between pairs of variables in terms of the information of the joint distribution of expression values that the variables hold [148,149,165-167]. Recently, Zhuang et al. [168] used mutual information to compare the information content between the kinematic variables from two monkeys and their frequency bands, showing a high performance of connectivity in the higher frequency bands using this metric. ...
Thesis
The analysis of complex physiologic time series has been the focus of considerable attention since simple mathematical models cannot be found to describe them. Signals derived from skin microvascular networks using Laser Doppler flowmetry (LDF) have been broadly investigated using both linear and nonlinear dynamical methods, providing significant information about microvascular function. This study aims to explore complexity methods that can quantify the changes in the complex flow motion characteristics of the human microcirculation in a range of pathophysiological states. Time and frequency domain analyses were used to characterise the microvascular perfusion signals and, via spectral analysis, their power contributions, in order to quantify the different properties modulating network perfusion. Nonlinear complexity methods were used to quantify signal regularity by evaluating the presence of repeated patterns, providing complexity measures at single and across multiple spatial and temporal scales. Further, a new approach, attractor reconstruction analysis, was used to provide quantitative measures of the microvascular system in phase space and a visual representation of the shape and variability of the signal, producing a two-dimensional attractor with features such as density and symmetry. The skin blood flux (BF) and tissue oxygenation (OXY) signals obtained from a combined Laser Doppler flowmetry (LDF) and white light spectroscopy (WLS) device were investigated using time domain, frequency domain and nonlinear methods in the skin of a healthy cohort during increased local warming. This study revealed multiple oscillatory components with a remarkable increase in cardiac activity during thermally induced vasodilation. A significant attenuation in the complexity across multiple scales and a significant drop in the attractor density measures were also observed during increased local warming. Subsequently, both linear and nonlinear methods were used to investigate the LDF signals obtained from groups of individuals at increased cardiovascular disease (CVD) risk, categorised by presence or absence of type 2 diabetes and use of calcium channel blocker (CB) medication. The results showed an increase in the high-frequency cardiac activity with CB treatment. There was a significant decrease in the complexity of the blood flux signals as the CVD risk increased across multiple time scales. The measures derived from attractor reconstruction analysis also declined with progression of CVD risk. The highest separability between these groups was achieved using the attractor and complexity measures combined. In conclusion, time and frequency domain analysis alone were insufficient to estimate the complex dynamics of the microvascular network during the application of a standard stressor. Nonlinear analysis provides a better characterisation of the flexibility of the system in a range of pathophysiological conditions. Together these mathematical approaches were able to quantify different microvascular functional states. With machine learning techniques this should allow the classification of tissue perfusion features for clinical assessment.
... There are four to eight feature-level models and 12 method-level models for each method, so some models may score well purely by chance. To address this critical issue, the permutation test (Venkatraman, 2000; Francois et al., 2006; Ojala and Garriga, 2010) was implemented. When training a model A based on specific features, another model B was also trained on the same samples but with randomly shuffled labels. ...
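A compact way to realise the label-shuffling test described in this snippet is scikit-learn's permutation_test_score; the sketch below is a hypothetical stand-in for the study's own implementation, and the classifier choice and cross-validation setup are assumptions.

```python
# Hedged sketch: compare a classifier trained on true labels against the score
# distribution obtained from models trained on randomly shuffled labels.
from sklearn.model_selection import permutation_test_score
from sklearn.svm import SVC

def classifier_permutation_test(X, y, n_permutations=100):
    clf = SVC(kernel="rbf")  # model A; the shuffled-label "model B" runs are internal
    score, perm_scores, p_value = permutation_test_score(
        clf, X, y, cv=5, n_permutations=n_permutations, scoring="accuracy"
    )
    return score, perm_scores, p_value
```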
Article
The electroencephalogram (EEG) is an informative neuroimaging tool for studying attention-deficit/hyperactivity disorder (ADHD); one main goal is to characterize the EEG of children with ADHD. In this study, we employed the power spectrum, complexity and bicoherence, biomarker candidates for identifying ADHD children in a machine learning approach, to characterize resting-state EEG (rsEEG). We built support vector machine classifiers using a single type of feature, all features from a method (relative spectral power, spectral power ratio, complexity or bicoherence), or all features from all four methods. We evaluated effectiveness and performance of the classifiers using the permutation test and the area under the receiver operating characteristic curve (AUC). We analyzed the rsEEG from 50 ADHD children and 58 age-matched controls. The results show that though spectral features can be used to build a convincing model, the prediction accuracy of the model was unfortunately unstable. Bicoherence features had significant between-group differences, but classifier performance was sensitive to brain region used. rsEEG complexity of ADHD children was significantly lower than controls and may be a suitable biomarker candidate. Through a machine learning approach, 14 features from various brain regions using different methods were selected; the classifier based on these features had an AUC of 0.9158 and an accuracy of 84.59%. These findings strongly suggest that the combination of rsEEG characteristics obtained by various methods may be a tool for identifying ADHD.
... values of the signal in this three-dimensional grid. Filter methods are methods in which features are selected without involving the translation algorithm. This is done by statistical analysis of the interdependence between features and the target (e.g., correlation analysis, mutual information, etc.) (Battiti, 1991; Francois et al., 2006; de Siqueira Santos et al., 2013). These techniques are less computationally intensive and provide better generalisation compared to wrapper methods. ...
Thesis
Full-text available
Brain computer interfaces (BCIs) are alternative communication pathways between human and artificial agents based on direct brain signal measurement and processing. Particularly, endogenous BCIs are BCIs in which the control signal is internally generated by the user with no need for external stimuli. Due to their independence, endogenous BCIs offer high mobility and independence to the users and encompass the natural idea of brain control. Currently, sensorimotor rhythms (SMR) are the neuronal phenomenon of choice in non-invasive endogenous BCIs, and the BCI systems that use this neuronal phenomenon have achieved the highest degree of freedom control in non-invasive design. Nevertheless, they are subject to a long period of training prior to attaining a satisfactory level of control, requiring users to learn to modulate their rhythms. A BCI system that requires months of training prior to everyday use will be of little use in a practical sense. Hence, the following study has analysed the causes of this slow rise to performance and provided recommendations on alternative solutions to mitigate this problem. This research has demonstrated that the slow control attainment in SMR-based BCIs is centrally due to its reliance on user training (neural adaptation), which is adaptive but slow in the context of SMR modulations, and due to the weak decoding of the neurological phenomenon utilised by the user. This was essentially assessed by evaluation of the correlation (r²) between user features and target position over sessions as an estimate of neural adaptation, and by evaluation of the predictive power of the translation algorithm (10x10 fold cross-validation) over the different practice sessions as an estimate of artificial adaptation. The results obtained show that the features-target correlation increases over sessions, while at the same time the predictive accuracy (R²) of the translation algorithm remains, on average, steady and very low (best R² = 0.04). Additional performance indexes were used to substantiate this conclusion. This research essentially proposes that the most plausible solution to speed up control attainment will be to increase the predictive power of the translation algorithm driving the BCI, towards a self-paced decoding of the neuronal component. This strategy could diminish the burden laid on the user (i.e., neural adaptation) as this novel approach will emphasise machine learning.
... Likewise, the correlation coefficient r and additionally mutual information MI were both calculated between these structural metrics and node entropy for each modeled FC matrix across the coupling strength parameter. To test for significance of mutual information (François et al. 2006), all node entropy values were shuffled 10 000 times creating 10 000 randomized vectors. A randomized mutual information MI R was computed between each of these vectors and vectors corresponding to all structural metrics. ...
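The shuffling scheme described in this snippet can be sketched as follows: node entropy values are shuffled many times, and a randomized correlation and mutual information are recomputed against a structural metric to form a null distribution. This is an illustration only; variable names are hypothetical and the MI estimator choice is an assumption, not the authors' code.

```python
# Illustrative null distribution for r and MI via repeated shuffling of node entropy.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def shuffled_null(structural_metric, node_entropy, n_shuffles=10_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(structural_metric)
    r_obs = np.corrcoef(x, node_entropy)[0, 1]
    mi_obs = mutual_info_regression(x.reshape(-1, 1), node_entropy, random_state=seed)[0]
    r_rand, mi_rand = np.empty(n_shuffles), np.empty(n_shuffles)
    for b in range(n_shuffles):  # reduce n_shuffles for a quick run; MI is the slow part
        shuffled = rng.permutation(node_entropy)
        r_rand[b] = np.corrcoef(x, shuffled)[0, 1]
        mi_rand[b] = mutual_info_regression(x.reshape(-1, 1), shuffled, random_state=seed)[0]
    return (r_obs, r_rand), (mi_obs, mi_rand)
```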
Article
The brain is a network that mediates information processing through a wide range of states. The extent of state diversity is a reflection of the entropy of the network. Here we measured the entropy of brain regions (nodes) in empirical and modeled functional networks reconstructed from resting state fMRI to address the connection of entropy at rest with the underlying structure measured through diffusion spectrum imaging. Using 18 empirical and 18 modeled stroke networks, we also investigated the effect that focal lesions have on node entropy and information diffusion. Overall, positive correlations between node entropy and structure were observed, especially between node entropy and node strength in both empirical and modeled data. Although lesions were restricted to one hemisphere in all stroke patients, entropy reduction was not only present in nodes from the damaged hemisphere, but also in nodes from the contralesional hemisphere, an effect replicated in modeled stroke networks. Globally, information diffusion was also affected in empirical and modeled strokes compared with healthy controls. This is the first study showing that artificial lesions affect local and global network aspects in very similar ways compared with empirical strokes, shedding new light on the functional nature of stroke.
... A similar kind of permutation test was used with mutual information in the context of feature selection in François et al. (2006). Pseudocode for the conditional independence test is presented in Algorithm 1. ...
Article
Full-text available
We propose a method for learning Markov network structures for continuous data without invoking any assumptions about the distribution of the variables. The method makes use of previous work on a non-parametric estimator for mutual information which is used to create a non-parametric test for multivariate conditional independence. This independence test is then combined with an efficient constraint-based algorithm for learning the graph structure. The performance of the method is evaluated on several synthetic data sets and it is shown to learn considerably more accurate structures than competing methods when the dependencies between the variables involve non-linearities.
... I(x_i; y) = -(1/2) ln(1 - ρ(x_i, y)²), i = 1, ..., p [39], where ρ(x_i, y) is the Pearson correlation coefficient between trait y and the ith gene expression. Rather than using an arbitrary threshold, here we applied a permutation test to determine the significance of the MI measure for each gene expression [40]. Under the null hypothesis that a gene expression is not associated with the trait, i.e. ...
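The formula quoted in this snippet is the Gaussian (correlation-based) mutual information. A minimal sketch of it, paired with a permutation test for significance in place of a fixed threshold, is shown below; this is an illustration of the idea, not the code of reference [40].

```python
# Gaussian MI from the Pearson correlation, plus an empirical permutation p-value.
import numpy as np

def gaussian_mi(x, y):
    """I(x; y) = -1/2 * ln(1 - rho(x, y)^2) under a bivariate Gaussian assumption."""
    rho = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

def mi_pvalue(x, y, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    mi_obs = gaussian_mi(x, y)
    null = np.array([gaussian_mi(x, rng.permutation(y)) for _ in range(n_perm)])
    return mi_obs, (np.sum(null >= mi_obs) + 1) / (n_perm + 1)
```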
Article
With the advancement of biotechniques, a vast amount of genomic data is generated with no limit. Predicting a disease trait based on these data offers a cost-effective and time-efficient way for early disease screening. Here we proposed a composite kernel partial least squares (CKPLS) regression model for quantitative disease trait prediction focusing on genomic data. It can efficiently capture nonlinear relationships among features compared with linear learning algorithms such as Least Absolute Shrinkage and Selection Operator or ridge regression. We proposed to optimize the kernel parameters and kernel weights with the genetic algorithm (GA). In addition to improved performance for parameter optimization, the proposed GA-CKPLS approach also has better learning capacity and generalization ability compared with single kernel-based KPLS method as well as other nonlinear prediction models such as the support vector regression. Extensive simulation studies demonstrated that GA-CKPLS had better prediction performance than its counterparts under different scenarios. The utility of the method was further demonstrated through two case studies. Our method provides an efficient quantitative platform for disease trait prediction based on increasing volume of omics data.
... This is not a problem as in this analysis the direction of the relationship is not particularly important and can be easily found anyway. Similar analyses have been performed outside of economic systems [68]. Nonetheless a less computationally expensive method can be presented, without introducing very strong assumptions. ...
Article
Recently the interest of researchers has shifted from the analysis of synchronous relationships of financial instruments to the analysis of more meaningful asynchronous relationships. Both of those analyses have concentrated only on Pearson's correlation coefficient, and thus on the intraday lead-lag relationships associated with it. Under the Efficient Market Hypothesis such relationships are not possible as all information is embedded in the prices. In this paper we analyse lead-lag relationships of financial instruments and extend the known methodology by using mutual information instead of Pearson's correlation coefficient, which not only is a more general measure, sensitive to non-linear dependencies, but also can lead to a simpler procedure of statistical validation of links between financial instruments. We analyse lagged relationships using NYSE 100 data not only at the intraday level but also for daily stock returns, which have usually been ignored.
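As a rough illustration of substituting mutual information for Pearson's correlation in a lead-lag scan, the sketch below estimates MI between one return series and lagged values of another. The variable names and the k-NN MI estimator are assumptions for illustration, not the paper's exact procedure.

```python
# Lagged mutual information between two return series (illustrative sketch).
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def lagged_mi(returns_a, returns_b, max_lag=5):
    """MI between returns_a[t] and returns_b[t + lag], for lag = 1..max_lag."""
    a, b = np.asarray(returns_a), np.asarray(returns_b)
    return {
        lag: mutual_info_regression(a[:-lag].reshape(-1, 1), b[lag:], random_state=0)[0]
        for lag in range(1, max_lag + 1)
    }
```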
... Feature filtering with the mutual information and the permutation test was also recently proposed [6,28,26], in a pure feature ranking approach where the permutation test is used to automatically set a threshold on the value of the mutual information. ...
Article
Combining the mutual information criterion with a forward feature selection strategy offers a good trade-off between optimality of the selected feature subset and computation time. However, it requires setting the parameter(s) of the mutual information estimator and determining when to halt the forward procedure. These two choices are difficult to make because, as the dimensionality of the subset increases, the estimation of the mutual information becomes less and less reliable. This paper proposes to use resampling methods, K-fold cross-validation and the permutation test, to address both issues. The resampling methods bring information about the variance of the estimator, information which can then be used to automatically set the parameter and to calculate a threshold to stop the forward procedure. The procedure is illustrated on a synthetic dataset as well as on real-world examples.
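A minimal sketch of the general idea summarised in this abstract, forward selection driven by mutual information with a permutation-derived stopping threshold, follows. It is not the authors' reference implementation: for brevity the multivariate MI estimation that makes the original procedure attractive is replaced by a univariate estimator, and all helper names are hypothetical.

```python
# Greedy MI-based forward selection with a permutation-test stopping rule (simplified).
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def permutation_threshold(xj, y, n_perm=200, quantile=95, seed=0):
    """MI threshold obtained from permuted targets for a single candidate feature."""
    rng = np.random.default_rng(seed)
    null = [mutual_info_regression(xj, rng.permutation(y), random_state=seed)[0]
            for _ in range(n_perm)]
    return np.percentile(null, quantile)

def forward_mi_selection(X, y, n_perm=200):
    remaining, selected = list(range(X.shape[1])), []
    while remaining:
        # Rank the remaining candidates by their (univariate) MI with the target.
        scores = {j: mutual_info_regression(X[:, [j]], y, random_state=0)[0]
                  for j in remaining}
        best = max(scores, key=scores.get)
        # Halt when the best remaining feature is not significantly informative.
        if scores[best] <= permutation_threshold(X[:, [best]], y, n_perm):
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```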
Article
For propeller-driven vessels, cavitation is the most dominant noise source, producing both structure-borne and radiated noise impacting wildlife, passenger comfort, and underwater warfare. Physically plausible and accurate predictions of the underwater radiated noise at design stage, i.e., for previously untested geometries and operating conditions, are fundamental for designing silent and efficient propellers. State-of-the-art predictive models are based on physical, data-driven, and hybrid approaches. Physical models (PMs) meet the need for physically plausible predictions but are either too computationally demanding or not accurate enough at design stage. Data-driven models (DDMs) are computationally inexpensive and accurate on average but sometimes produce physically implausible results. Hybrid models (HMs) combine PMs and DDMs trying to take advantage of their strengths while limiting their weaknesses, but state-of-the-art hybridisation strategies do not actually blend them, failing to achieve the HMs' full potential. In this work, for the first time, we propose a novel HM that recursively corrects a state-of-the-art PM by means of a DDM which simultaneously exploits the prior physical knowledge in the definition of its feature set and the data coming from a vast experimental campaign at the Emerson Cavitation Tunnel on the Meridian standard propeller series behind different severities of the axial wake. Results in different extrapolating conditions, i.e., extrapolation with respect to propeller rotational speed, wakefield, and geometry, support our proposal both in terms of accuracy and physical plausibility.
Article
Full-text available
Microbial communities are widely studied using high-throughput sequencing techniques, such as 16S rRNA gene sequencing. These techniques have attracted biologists as they offer powerful tools to explore microbial communities and investigate their patterns of diversity in biological and biomedical samples at remarkable resolution. However, the accuracy of these methods can be negatively affected by the presence of contamination. Several studies have recognized that contamination is a common problem in microbial studies and have offered promising computational and laboratory-based approaches to assess and remove contaminants. Here we propose a novel strategy, an MI-based (mutual information based) filtering method, which uses information theoretic functionals and graph theory to identify and remove contaminants. We applied the MI-based filtering method to a mock community data set and evaluated the amount of information loss due to filtering taxa. We also compared our method to commonly practiced traditional filtering methods. In a mock community data set, the MI-based filtering approach maintained the true bacteria in the community without significant loss of information. Our results indicate that the MI-based filtering method effectively identifies and removes contaminants in microbial communities and hence can be beneficial as a filtering method for microbiome studies. We believe our filtering method has two advantages over traditional filtering methods. First, it does not require an arbitrary choice of threshold and second, it is able to detect true taxa with low abundance.
Article
Full-text available
Feature selection for mixed data is an active research area with many applications in practical problems where numerical and non-numerical features describe the objects of study. This paper provides the first comprehensive and structured review of the existing supervised and unsupervised feature selection methods for mixed data reported in the literature. Additionally, we present an analysis of the main characteristics, advantages, and disadvantages of the feature selection methods reviewed in this survey and discuss some important open challenges and potential future research opportunities in this field.
Article
Full-text available
People living with HIV are at increased risk for experiencing trauma, which may be linked to reduced adherence to antiretroviral therapy (ART), making it more difficult to achieve and maintain viral suppression. The current study sought to assess whether traumatic life experiences were associated with lower ART adherence among a diverse sample of people living with HIV in South Carolina. A cross-sectional survey was completed by 402 individuals receiving HIV care from a large immunology center. Principal component analysis revealed three primary categories of trauma experience (extreme violence/death-related trauma, physical and sexual assault, and accidental/disaster-related trauma). Multivariable logistic regression models using complete case analysis and multiple imputation were used to determine the associations between experiencing each trauma category and ART adherence. Complete case analysis showed that overall, participants who reported exposure to any trauma were 58% less likely to be adherent to their ART (adjusted OR 0.42; 95% CI 0.21–0.86) compared to respondents who did not experience trauma. Participants exposed to extreme violence/death-related trauma were 63% less likely to be adherent to their ART (adjusted OR 0.37; 95% CI 0.15–0.95) compared to respondents who did not experience trauma. Participants exposed to physical and sexual assault were 65% less likely (adjusted OR 0.35; 95% CI 0.16–0.77) and those who reported experiencing accidental/disaster-related trauma were 56% less likely (adjusted OR 0.44; 95% CI 0.21–0.93) to report being ART adherent compared to participants who did not experience trauma. Analyses with multiple imputation yielded similar findings as the complete case analyses. When the data were analyzed separately by gender, the associations between overall trauma, extreme violence/death-related trauma, and physical and sexual assault were statistically significant for men using complete case and multiple imputation analyses. There were no statistically significant associations between trauma and ART adherence among women. Findings highlight the need to adopt trauma-informed approaches and integrate trauma- and gender-specific interventions into HIV clinical care in the Southern United States.
Article
Full-text available
This paper presents a new relevance index, based on mutual information, that uses both labeled and unlabeled data. The proposed index takes into account the similarity between features and their joint influence on the output variable. Based on this principle, a method to select features is developed that eliminates redundant and irrelevant features when the relevance index value is less than a threshold value. A strategy to set the threshold is also proposed in this work. Experiments show that the new method is capable of capturing important joint relations between input and output variables, which are incorporated into a new feature selection clustering approach.
Article
Full-text available
Mutual information is currently widely used in pattern recognition and feature selection problems. It may be used as a measure of redundancy between features as well as a measure of dependency evaluating the relevance of each feature. Since marginal densities of real datasets are not usually known in advance, mutual information should be evaluated by estimation. There are mutual information estimators in the literature that were specifically designed for continuous or for discrete variables; however, most real problems are composed of a mixture of both. There is, of course, some implicit loss of information when using one of them to deal with mixed continuous and discrete variables. This paper presents a new estimator that is able to deal with a mixed set of variables. It is shown in experiments with synthetic and real datasets that the method yields reliable results in such circumstances.
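As a practical illustration of MI estimation over a mixed set of variables, scikit-learn's estimators accept a per-feature discreteness mask; the sketch below is a common stand-in and is not the estimator proposed in the paper above. All data here are synthetic and purely illustrative.

```python
# MI estimation with mixed continuous and discrete features (illustrative only).
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(size=200),            # continuous feature
    rng.integers(0, 3, size=200),    # discrete feature
])
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# discrete_features marks which columns should be treated as discrete.
mi = mutual_info_classif(X, y, discrete_features=[False, True], random_state=0)
print(mi)  # one MI estimate per (mixed-type) feature
```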
Chapter
Adapting classification models to concept changes is one of the main challenges associated with learning in dynamic environments, where the definition of the target concept may change over time under the influence of varying contextual factors. Existing adaptive approaches, however, are limited in terms of the extent to which such contextual factors are explicitly identified and utilised, despite their importance. In response, we propose an information-theoretic approach for systematic context identification, aiming to learn from data the contextual characteristics of the domain by identifying the context variables contributing to concept changes. Such explicit identification of context enables capturing the causes of drift, and hence facilitates more effective adaptation. We conduct experimental analyses to demonstrate the effectiveness of the approach on both simulated datasets with various change scenarios, and on an actual benchmark dataset from an electricity market.
Article
A small number of labeled samples results in incorrect computation of mutual information, which may lower the classification accuracy of minimal-redundancy-maximal-relevance (mRMR) selective Bayesian classifiers. In order to solve the above problem, a selective Bayesian classifier based on a semi-supervised clustering algorithm is proposed. At first, a new semi-supervised K-representative clustering algorithm is designed by using the Bayesian posterior probability, which is applied to labeling the unlabeled samples so as to enlarge the scale of labeled samples. Then a novel feature selection criterion is proposed by combining mRMR and the concept of the Markov blanket to automatically determine a reasonably compact subset of features. In addition, a risk-regulation factor is introduced into the feature selection criterion to reduce the risk of mislabeling. Finally, a Bayesian classifier is constructed based on the preprocessed samples. Experimental results indicate that the proposed Bayesian classifier can select optimal features to obtain high classification accuracy.
Article
Classification of structured data has gained importance recently. One important problem that exploits structured data is to computationally estimate some properties of small molecules. Among the algorithms for graph classification, kernel machines constitute a large portion. Although there are a number of graph kernels proposed in the literature, feature selection has only recently been considered in this domain. In this paper, we propose a feature selection method based on permutation tests, which not only improves the classification performance, but also provides space efficiency by eliminating uninformative features at the beginning. We demonstrate the performance of the method on a number of data sets in chemical compound classification.
Article
Full-text available
Predictive assessment of the risk of developing cardiovascular diseases is usually provided by computational approaches centred on Cox models. The complex interdependence structure underlying clinical data patterns can limit the performance of Cox analysis and complicate the interpretation of results, thus calling for complementary and integrative methods. Prognostic models are proposed for studying the risk associated with patients with known or suspected coronary artery disease (CAD) undergoing vasodilator stress echocardiography, an established technique for CAD detection and prognostication. In order to complement standard Cox models, network inference is considered a possible solution to quantify the complex relationships between heterogeneous data categories. In particular, a mutual information network is designed to explore the paths linking patient-associated variables to endpoint events, to reveal prognostic factors and to identify the best possible predictors of death. Data from a prospective, multicentre, observational study are available from a previous study, based on 4313 patients (2532 men; 64±11 years) with known (n=1547) or suspected (n=2766) CAD, who underwent high-dose dipyridamole (0.84 mg kg(-1) over 6 min) stress echocardiography with coronary flow reserve (CFR) evaluation of left anterior descending (LAD) artery by Doppler. The overall mortality was the only endpoint analysed by Cox models. The estimated connectivity between clinical variables assigns a complementary value to the proposed network approach in relation to the established Cox model, for instance revealing connectivity paths. Depending on the use of multiple metrics, the constraints of regression analysis in measuring the association strength among clinical variables can be relaxed, and identification of communities and prognostic paths can be provided. On the basis of evidence from various model comparisons, we show in this CAD study that there may be characteristic factors involved in prognostic stratification whose complexity suggests an exploration beyond the analysis provided by the still fundamental Cox approach.
Article
An improved mRMR SBC is proposed, using K-means clustering and incremental learning algorithms to enlarge the scale of training samples. On one hand, the testing samples are labeled using the K-means clustering algorithm and are added to the training set. A regulatory factor is introduced into the process of attribute selection to reduce the risk of mislabeling resulting from K-means clustering. On the other hand, some samples that are most helpful for improving the current classification accuracy are selected from the testing set and are added to the training set. Based on the enlarged training set, parameters in the Bayesian classifier are adjusted incrementally. Experimental results show that, compared with mRMR SBC, the proposed Bayesian classifier has better classification results and is applicable to the classification problem for high-dimensional datasets with few labels.
Article
Research on keystroke-based authentication has traditionally assumed human impostors who generate forgeries by physically typing on the keyboard. With bots now well understood to have the capacity to originate precisely timed keystroke sequences, this model of attack is likely to underestimate the threat facing a keystroke-based system in practice. In this work, we investigate how a keystroke-based authentication system would perform if it were subjected to synthetic attacks designed to mimic the typical user. To implement the attacks, we perform a rigorous statistical analysis on keystroke biometrics data collected over a 2-year period from more than 3000 users, and then use the observed statistical traits to design and launch algorithmic attacks against three state-of-the-art password-based keystroke verification systems. Relative to the zero-effort attacks typically used to test the performance of keystroke biometric systems, we show that our algorithmic attack increases the mean Equal Error Rates (EERs) of three high performance keystroke verifiers by between 28.6% and 84.4%. We also find that the impact of the attack is more pronounced when the keystroke profiles subjected to the attack are based on shorter strings, and that some users see considerably greater performance degradation under the attack than others. This article calls for a shift from the traditional zero-effort approach of testing the performance of password-based keystroke verifiers, to a more rigorous algorithmic approach that captures the threat posed by today’s bots.
Conference Paper
Using radial basis function networks for function approximation tasks suffers from the lack of knowledge about an adequate network size. In this work, a measuring technique is proposed which can control the model complexity and is based on the correlation coefficient between two basis functions. Simulation results show good performance and, therefore, this technique can be integrated into the RBF training procedure.
Conference Paper
Full-text available
A hybrid filter/wrapper feature subset selection algorithm for regression is proposed. First, features are filtered by means of a relevance and redundancy filter using mutual information between regression and target variables. We introduce permutation tests to find statistically significant relevant and redundant features. Second, a wrapper searches for good candidate feature subsets by taking the regression model into account. The advantage of a hybrid approach is threefold. First, the filter provides interesting features independently from the regression model and, hence, allows for an easier interpretation. Secondly, because the filter part is computationally less expensive, the global algorithm will provide good candidate subsets faster compared to a stand-alone wrapper approach. Finally, the wrapper takes the bias of the regression model into account, because the regression model guides the search for optimal features. Results are shown for the 'Boston housing' and 'orange juice' benchmarks based on the multilayer perceptron regression model.
Conference Paper
Full-text available
Monitoring the dynamics of networks in the brain is of central importance in normal and disease states. Current methods of detecting networks in the recorded EEG such as correlation and coherence only explore linear dependencies, which may be unsatisfactory. We propose applying mutual information as an alternative metric for assessing possible nonlinear statistical dependencies between EEG channels. However, EEG data are complicated by the fact that data are inherently non-stationary and also the brain may not work on the task continually. To address these concerns, we propose a novel EEG segmentation method based on the temporal dynamics of the cross-spectra of computed independent components. A real case study in Parkinson's disease and further group analysis employing ANOVA demonstrate different brain connectivity between tasks and between subject groups and also a plausible mechanism for the beneficial effects of medication used in this disease. The proposed method appears to be a promising approach for EEG analysis and warrants further study.
Article
Full-text available
In life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in the past years effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support vector machines and RandomForest (RF) models. Recently, it has been observed that RF models are biased in such a way that categorical variables with a large number of categories are preferred. In this work, we introduce a heuristic for normalizing feature importance measures that can correct the feature importance bias. The method is based on repeated permutations of the outcome vector for estimating the distribution of measured importance for each variable in a non-informative setting. The P-value of the observed importance provides a corrected measure of feature importance. We apply our method to simulated data and demonstrate that (i) non-informative predictors do not receive significant P-values, (ii) informative variables can successfully be recovered among non-informative variables and (iii) P-values computed with permutation importance (PIMP) are very helpful for deciding the significance of variables, and therefore improve model interpretability. Furthermore, PIMP was used to correct RF-based importance measures for two real-world case studies. We propose an improved RF model that uses the significant variables with respect to the PIMP measure and show that its prediction accuracy is superior to that of other existing models. R code for the method presented in this article is available at http://www.mpi-inf.mpg.de/~altmann/download/PIMP.R. Contact: altmann@mpi-inf.mpg.de, laura.tolosi@mpi-inf.mpg.de. Supplementary data are available at Bioinformatics online.
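A condensed sketch of the permutation-of-outcome idea behind PIMP follows: refit a random forest on label-permuted data to build a null distribution of each feature's importance, then derive a per-feature P-value. This is a simplified illustration in Python, not the authors' released R implementation (PIMP.R), and hyperparameters are assumptions.

```python
# Simplified PIMP-style permutation importance with empirical P-values.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pimp_pvalues(X, y, n_perm=100, seed=0):
    rng = np.random.default_rng(seed)
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    observed = rf.feature_importances_
    null = np.empty((n_perm, X.shape[1]))
    for b in range(n_perm):
        y_perm = rng.permutation(y)  # permute the outcome vector only
        null[b] = RandomForestClassifier(
            n_estimators=200, random_state=seed
        ).fit(X, y_perm).feature_importances_
    # Empirical P-value: fraction of null importances at least as large as observed.
    return (np.sum(null >= observed, axis=0) + 1) / (n_perm + 1)
```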