
A GO-driven semantic similarity measure for quantifying the biological relatedness of gene products.

Intelligent Decision Technologies 01/2009; 3:239-248. DOI: 10.3233/IDT-2009-0059
Source: DBLP

ABSTRACT: Advances in biological experiments, such as DNA microarrays, have produced large multidimensional data sets for examination and retrospective analysis. Scientists, however, rely heavily on existing biomedical knowledge to fully analyze and comprehend such data sets. Our proposed framework relies on the Gene Ontology to integrate a priori biomedical knowledge into traditional data analysis approaches. We explore the impact of considering each aspect of the Gene Ontology individually when quantifying the biological relatedness between gene products. We discuss two figure-of-merit scores: one quantifies the pair-wise biological relatedness between gene products, and the other quantifies the intra-cluster biological coherency of groups of gene products. Finally, we perform cluster deterioration simulation experiments on a well-scrutinized Saccharomyces cerevisiae data set consisting of hybridization measurements. The results illustrate a strong correlation between the devised cluster coherency figure of merit and the randomization of cluster membership.
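
The cluster deterioration experiment mentioned in the abstract can be illustrated with a small sketch. The abstract does not define the two figure-of-merit scores, so everything here is an assumption made for illustration: the pair-wise score is a plain Jaccard overlap of annotation sets, the cluster score is the mean pair-wise score, and the gene names and term labels (`a1`, `T1`, and so on) are placeholders rather than real gene products or GO identifiers.

```python
import random
from itertools import combinations

# Hypothetical annotations: gene -> set of placeholder term labels (not real GO IDs).
ANNOTATIONS = {
    "a1": {"T1", "T2"}, "a2": {"T1", "T2"}, "a3": {"T1", "T2"}, "a4": {"T1", "T2"},
    "b1": {"T3", "T4"}, "b2": {"T3", "T4"}, "b3": {"T3", "T4"}, "b4": {"T3", "T4"},
}
CLUSTERS = [["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"]]


def pair_similarity(terms_a, terms_b):
    """Jaccard overlap of two term sets (a stand-in, not the paper's measure)."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def cluster_coherency(cluster, annotations):
    """Mean pair-wise similarity over all gene pairs in one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    total = sum(pair_similarity(annotations[a], annotations[b]) for a, b in pairs)
    return total / len(pairs)


def mean_coherency(clusters, annotations):
    """Average coherency across clusters."""
    return sum(cluster_coherency(c, annotations) for c in clusters) / len(clusters)


def deteriorate(clusters, fraction, rng):
    """Shuffle a fraction of gene positions across clusters, keeping cluster sizes."""
    genes = [g for c in clusters for g in c]
    k = round(fraction * len(genes))
    positions = rng.sample(range(len(genes)), k)
    moved = [genes[p] for p in positions]
    rng.shuffle(moved)
    for p, g in zip(positions, moved):
        genes[p] = g
    out, start = [], 0
    for c in clusters:
        out.append(genes[start:start + len(c)])
        start += len(c)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    for fraction in (0.0, 0.25, 0.5, 1.0):
        noisy = deteriorate(CLUSTERS, fraction, rng)
        print(fraction, round(mean_coherency(noisy, ANNOTATIONS), 3))
```

In this toy setup the intact clusters score 1.0 by construction, so any reassignment that mixes the two groups can only lower the score. That is the kind of relationship between coherency and membership randomization the abstract reports, although the paper's own measure and data are not reproduced here.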

Available from: Christos Tjortjis, Aug 25, 2015
  • Source
    "The Gene Ontology (GO) represents an important knowledge resource for describing the function of genes [1], and has been widely used for identifying similarities between gene functions based on the GO structure [2] [3]. Recently, research has been done to explore the relationships between the GO-based similarity and gene expression profiles [3] [4] [5] [6] [7] and the relationships between gene function annotation and gene sequence [8]."
    ABSTRACT: Research has been done to explore the relationships between Gene Ontology-based similarity and gene expression profiles in the mammalian brain. However, little attention has been paid to the location information of a gene's expression. Gene expression maps, which contain spatial information about the expression of genes in the mouse brain, are obtained by combining voxelation and microarrays. Based on the hypothesis that genes with similar gene expression maps may have similar gene functions, we propose an approach to identify pair-wise gene functional similarities from gene expression maps. We treat pairs of genes from an original dataset as samples, whose features are extracted from the expression maps and whose labels are the functional similarities of the pairs, and we explore the relationship between the similarity of gene maps and gene functions. We restrict the dataset to genes that are associated with previously detected functional expression profiles to strengthen the relationship. We use AdaBoost, coupled with our proposed weak classifier, to analyze the dataset and predict the functional similarities. The experimental results show that as the similarity of gene expression maps increases, functional similarity increases as well. The boosting analysis can predict the functional similarities between genes to a certain degree. The weights of the features in the model indicate which features are significant for this prediction. These findings can potentially assist biologists by providing helpful clues for predicting gene functions.
  • Source
    ABSTRACT: The relationships between gene functional similarity and gene expression profiles, and between gene function annotation and gene sequence, have been studied extensively. However, little work has considered the connection between gene function and the location of a gene's expression in mammalian tissues. Moreover, although unsupervised learning methods are commonly used in functional genomics, supervised learning cannot be applied directly to a set of normal genes that lacks a target (class) attribute. Here, we propose a supervised learning methodology to predict pair-wise gene functional similarity from multiplex gene expression maps, which provide information about the location of gene expression. Pairs of genes serve as samples: the features are extracted from the expression maps and the labels denote the functional similarity of each pair. We use wavelet features, original expression values, difference and average values of neighboring voxels, and other features to perform boosting analysis. Using AdaBoost coupled with our proposed weak classifier, we analyze a large-scale gene expression dataset, predict gene functional similarities, and explore the relationship between the similarity of gene maps and gene functions. The experimental results show that as the similarity of gene expression maps increases, functional similarity increases as well. The model predicts the functional similarities between genes to a certain degree, and the weights of the features in the model indicate which features are more significant for this prediction. We also detect the most significant single voxels and pairs of neighboring voxels and visualize them in the expression map image of a mouse brain. This work is important for predicting the functions of unknown genes. It also has broader applicability, since the methodology can be applied to any large-scale dataset without a target attribute and is not restricted to gene expression.
    BMC Bioinformatics 03/2012; 13(Suppl 3):S1. DOI: 10.1186/1471-2105-13-S3-S1 · 2.67 Impact Factor
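
The pair-wise boosting setup described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not specify the proposed weak classifier, so a one-feature threshold stump stands in for it; the only pair feature used is the per-voxel absolute difference; and the four three-voxel "expression maps" and their similarity labels are invented toy values.

```python
import math

# Hypothetical toy expression maps (3 voxels each); values are invented.
MAPS = {
    "g1": [1.0, 5.0, 2.0],
    "g2": [1.1, 2.0, 2.5],
    "g3": [4.0, 5.1, 2.1],
    "g4": [4.2, 1.9, 0.3],
}
# Hypothetical labels: +1 = functionally similar pair, -1 = dissimilar pair.
PAIRS = [("g1", "g2", 1), ("g3", "g4", 1), ("g1", "g3", -1),
         ("g2", "g4", -1), ("g1", "g4", -1), ("g2", "g3", -1)]


def pair_features(map_a, map_b):
    """Per-voxel absolute difference between two maps (an assumed feature)."""
    return [abs(x - y) for x, y in zip(map_a, map_b)]


def stump_predict(row, j, thr, pol):
    """Threshold stump: predicts pol when feature j is <= thr, else -pol."""
    return pol if row[j] <= thr else -pol


def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost(X, y, rounds):
    """Discrete AdaBoost; returns a list of (alpha, feature, threshold, polarity)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1


def feature_weights(model, n_features):
    """Total alpha assigned to each feature across boosting rounds."""
    totals = [0.0] * n_features
    for alpha, j, _, _ in model:
        totals[j] += alpha
    return totals


X = [pair_features(MAPS[a], MAPS[b]) for a, b, _ in PAIRS]
y = [label for _, _, label in PAIRS]
model = adaboost(X, y, rounds=3)
```

Summing the stump weights per feature, as `feature_weights` does, is one simple way to read off which voxels the ensemble relies on, in the spirit of the abstract's remark that the feature weights indicate significance. The wavelet and neighboring-voxel features mentioned in the abstract are not reproduced in this sketch.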