A Comparative Study of Microarray Data Classification Methods Based on Ensemble Biological Relevant Gene Sets

DOI: 10.1007/978-3-642-13214-8_4

ABSTRACT: In this work we study several ensemble alternatives for classifying microarray data using prior knowledge
known to be biologically relevant to the target disease. The aim is to obtain an accurate ensemble classification model
that outperforms baseline classifiers by introducing diversity in the form of different gene sets. The proposed model
takes advantage of WhichGenes, a powerful gene set building tool that allows the automatic extraction of lists of genes
from multiple sparse data sources. Preliminary results on different datasets and several gene sets show that the
proposal is able to outperform basic classifiers by using existing prior knowledge.
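
The idea of gaining ensemble diversity from different gene sets can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method of the paper: the base learner is a simple nearest-centroid classifier, the votes are combined by plain majority, the gene sets are hard-coded column indices (in the paper they would come from WhichGenes, which is not called here), and the expression values are toy numbers.

```python
from collections import Counter
from statistics import mean


class NearestCentroid:
    """Hypothetical base classifier that only sees the genes of one gene set."""

    def __init__(self, gene_idx):
        self.gene_idx = gene_idx  # column indices of the genes in this set
        self.centroids = {}       # class label -> mean expression profile

    def _project(self, sample):
        # Keep only the expression values of the genes in this gene set.
        return [sample[i] for i in self.gene_idx]

    def fit(self, X, y):
        for label in set(y):
            rows = [self._project(x) for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, sample):
        v = self._project(sample)

        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(v, centroid))

        # Return the label whose centroid is closest to the sample.
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def ensemble_predict(models, sample):
    """Combine the base classifiers by simple majority vote."""
    votes = Counter(m.predict_one(sample) for m in models)
    return votes.most_common(1)[0][0]


# Toy expression matrix: rows are samples, columns are four genes (made-up values).
X = [
    [5.0, 4.8, 5.1, 0.9],
    [4.9, 5.2, 4.7, 1.1],
    [1.0, 1.2, 0.8, 1.0],
    [0.9, 0.8, 1.1, 0.9],
]
y = ["tumor", "tumor", "normal", "normal"]

# Hypothetical gene sets, standing in for lists built from prior knowledge.
gene_sets = {"set_a": [0, 1], "set_b": [1, 2], "set_c": [2, 3]}

# One base classifier per gene set; the differing gene sets supply the diversity.
models = [NearestCentroid(idx).fit(X, y) for idx in gene_sets.values()]
prediction = ensemble_predict(models, [4.7, 5.0, 4.9, 1.0])
```

An odd number of gene sets is used here so that a two-class majority vote cannot tie; other combination schemes, such as weighted voting, would fit the same structure.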

Keywords: microarray data classification, ensemble classifiers, gene sets, prior knowledge

  • ABSTRACT: The protein structure prediction (PSP) problem is concerned with the prediction of the folded, native, tertiary structure of a protein given its sequence of amino acids. It is a challenging and computationally open problem, as proven by the numerous methodological attempts and the research effort applied to it in the last few years. The potential energy functions used in the literature to evaluate the conformation of a protein are based on the calculations of two different interaction energies: local (bond atoms) and non-local (non-bond atoms). In this paper, we show experimentally that those types of interactions are in conflict, and do so by using the potential energy function Chemistry at HARvard Macromolecular Mechanics (CHARMM). A multi-objective formulation of the PSP problem is introduced and its applicability studied. We use a multi-objective evolutionary algorithm as a search procedure for exploring the conformational space of the PSP problem.
    Journal of The Royal Society Interface 03/2006; 3(6):139-51. · 3.86 Impact Factor
  • ABSTRACT: This paper reviews the application of multiobjective optimization in the fields of bioinformatics and computational biology. A survey of existing work, organized by application area, forms the main body of the review, following an introduction to the key concepts in multiobjective optimization. An original contribution of the review is the identification of five distinct "contexts" giving rise to multiple objectives. These are used to explain the reasons behind the use of multiobjective optimization in each application area and also to point the way to potential future uses of the technique.
    IEEE/ACM Transactions on Computational Biology and Bioinformatics 05/2007; 4(2):279-92. · 1.54 Impact Factor
  • ABSTRACT: Distant homologies between proteins are often discovered only after three-dimensional structures of both proteins are solved. The sequence divergence for such proteins can be so large that simple comparison of their sequences fails to identify any similarity. A new generation of sensitive alignment tools uses averaged sequences of entire homologous families (profiles) to detect such homologies. Several algorithms, including the newest generation of BLAST algorithms and BASIC, an algorithm used in our group to assign fold predictions for proteins from several genomes, are compared to each other on a large set of structurally similar proteins with little sequence similarity. Proteins in the benchmark are classified according to the level of their similarity, which allows us to demonstrate that most of the improvement of the new algorithms is achieved for proteins with strong functional similarities, with almost no progress in recognizing distant fold similarities. It is also shown that details of profile calculation strongly influence its sensitivity in recognizing distant homologies. The most important choice is how to include information from diverging members of the family, avoiding false predictions while accounting for the entire sequence divergence within a family. PSI-BLAST takes a conservative approach, deriving a profile from core members of the family, which provides a solid improvement with almost no false predictions. BASIC strives for better sensitivity by increasing the weight of divergent family members, paying the price in lower reliability. A new FFAS algorithm introduced here uses a new procedure for profile generation that takes into account all the relations within the family and matches BASIC sensitivity with PSI-BLAST-like reliability.
    Protein Science 12/1999; 9(2):232-241. · 2.86 Impact Factor
