Chapter

A Comparative Study of Microarray Data Classification Methods Based on Ensemble Biological Relevant Gene Sets

DOI: 10.1007/978-3-642-13214-8_4 · In book: Advances in Bioinformatics, pp. 25-32

ABSTRACT

In this work we study the utilization of several ensemble alternatives for the task of classifying microarray data by using
prior knowledge known to be biologically relevant to the target disease. The purpose of the work is to obtain an accurate
ensemble classification model able to outperform baseline classifiers by introducing diversity in the form of different gene
sets. The proposed model takes advantage of WhichGenes, a powerful gene set building tool that allows the automatic extraction
of lists of genes from multiple sparse data sources. Preliminary results using different datasets and several gene sets show
that the proposal is able to outperform basic classifiers by using existing prior knowledge.

Keywords: microarray data classification, ensemble classifiers, gene sets, prior knowledge

Available from: Miguel Reboiro-Jato, Apr 07, 2014
  • ABSTRACT: An important emerging medical application domain for microarray technology is clinical decision support in the form of disease diagnosis. For this task, several computational methods, ranging from statistical alternatives to more complex hybrid systems, have previously been proposed in the literature. In this work we study the utilisation of several ensemble alternatives for the task of classifying microarray data by using prior knowledge known to be biologically relevant to the target disease. The experimental results using different datasets and several gene sets show that the proposal is able to outperform previous approaches by introducing diversity in the form of different gene sets.
    Article · Jan 2012 · International Journal of Data Mining and Bioinformatics
  • ABSTRACT: In recent years, microarray technology has become widely used in relevant biomedical areas such as drug target identification, pharmacogenomics and clinical research. However, the necessary prerequisites for the development of valuable translational microarray-based diagnostic tools are (i) a solid understanding of the relative strengths and weaknesses of the underlying classification methods and (ii) biologically plausible and understandable behaviour of such models. In this paper we propose a novel classifier able to combine the advantages of ensemble approaches with the benefits obtained from the true integration of biological knowledge in the classification process of different microarray samples. The aim of the current work is to guarantee the robustness of the proposed classification model when applied to several microarray datasets in an inter-dataset scenario. The comparative experimental results demonstrated that our proposal working with biological knowledge outperform
    Article · Jan 2013 · Expert Systems with Applications
  • ABSTRACT: Since the introduction of DNA microarray technology, there has been increasing interest in its clinical application to cancer diagnosis. However, in order to effectively translate the advances in the field of microarray-based classification into clinical practice, some problems remain concerning both model performance and the biological interpretability of the results. In this paper, we propose a novel ensemble model that integrates prior knowledge in the form of gene sets into the whole microarray classification process. Each gene set is used as an informed feature selection subset to train several base classifiers in order to estimate their accuracy. This information is later used to select the classifiers that make up the final ensemble model. The internal architecture of the proposed ensemble allows the replacement of both the base classifiers and the heuristics employed to carry out classifier fusion, thereby achieving a high level of flexibility and making it possible to configure and adapt the model to different contexts. Experimental results using different datasets and several gene sets show that the proposal is able to outperform classical alternatives by using existing prior knowledge adapted from publicly available databases.
    Article · Apr 2014 · Applied Soft Computing
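
The gene-set ensemble idea described in the abstracts above (train one base classifier per gene set, estimate its accuracy, keep the best ones, then fuse their predictions) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the papers: the nearest-centroid base classifier, the leave-one-out accuracy estimate, the majority-vote fusion, the `keep` parameter, the function names, and the toy expression values and gene set names.

```python
from collections import Counter
from statistics import mean


def centroid_fit(samples, labels, genes):
    """Return the per-class mean expression over the given gene indices."""
    centroids = {}
    for cls in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def centroid_predict(centroids, sample, genes):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    def dist(center):
        return sum((sample[g] - v) ** 2 for g, v in zip(genes, center))
    # Sorting the class names makes tie-breaking deterministic.
    return min(sorted(centroids), key=lambda cls: dist(centroids[cls]))


def loo_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of the base classifier on one gene set."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = centroid_fit(train_x, train_y, genes)
        hits += centroid_predict(model, samples[i], genes) == labels[i]
    return hits / len(samples)


def build_ensemble(samples, labels, gene_sets, keep=2):
    """Keep the `keep` gene sets whose base classifiers score best."""
    ranked = sorted(
        gene_sets.items(),
        key=lambda item: loo_accuracy(samples, labels, item[1]),
        reverse=True,
    )
    return [
        (name, genes, centroid_fit(samples, labels, genes))
        for name, genes in ranked[:keep]
    ]


def ensemble_predict(ensemble, sample):
    """Fuse base predictions by majority vote (assumed fusion heuristic)."""
    votes = Counter(
        centroid_predict(model, sample, genes) for _, genes, model in ensemble
    )
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): rows are samples, columns are gene expression values.
samples = [
    [5.0, 4.8, 1.0, 2.0],
    [5.2, 5.1, 2.0, 1.0],
    [4.9, 5.0, 1.5, 1.8],
    [1.0, 1.2, 1.1, 2.1],
    [0.8, 1.0, 2.1, 0.9],
    [1.1, 0.9, 1.4, 1.9],
]
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

# Hypothetical gene sets, given as column indices into each sample.
gene_sets = {"pathway_a": [0, 1], "pathway_b": [0], "noise": [2, 3]}

ensemble = build_ensemble(samples, labels, gene_sets, keep=2)
prediction = ensemble_predict(ensemble, [5.1, 4.9, 1.2, 1.7])
```

In this sketch each gene set acts purely as a feature subset, so swapping in a different base classifier or fusion rule only requires replacing `centroid_fit`/`centroid_predict` or `ensemble_predict`, which mirrors the flexibility the last abstract describes. Real microarray data would have thousands of genes and would call for a more careful validation scheme than leave-one-out on the training set.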