GENECODIS: A web-based tool for finding significant concurrent annotations in gene lists

BioComputing Unit, National Center of Biotechnology (CNB-CSIC), C/Darwin 3, Campus Universidad Autónoma de Madrid, 28049 Madrid, Spain.
Genome Biology (Impact Factor: 10.47). 02/2007; 8(1):R3. DOI: 10.1186/gb-2007-8-1-r3
Source: PubMed

ABSTRACT We present GENECODIS, a web-based tool that integrates different sources of information to search for annotations that frequently co-occur in a set of genes and rank them by statistical significance. The analysis of concurrent annotations provides significant information for the biologic interpretation of high-throughput experiments and may outperform the results of standard methods for the functional analysis of gene lists. GENECODIS is publicly available at http://genecodis.dacya.ucm.es/.
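
The idea in the abstract can be illustrated with a minimal sketch. This is not the GENECODIS implementation: it assumes a hypergeometric upper-tail test (the abstract does not name the test), treats every annotated gene as the reference set, and uses hypothetical names and toy data (`hypergeom_tail`, `concurrent_annotations`, genes `g1` to `g8`) invented for the example.

```python
from itertools import combinations
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k): n genes drawn from N, of which K carry the annotation set."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def concurrent_annotations(gene_list, annotations, max_size=2, min_support=2):
    """Rank annotation combinations that co-occur in gene_list.

    annotations maps gene -> set of labels; its keys form the reference set
    (an assumption of this sketch). Returns (combo, k, K, p_value) tuples
    sorted by ascending p-value.
    """
    N = len(annotations)
    genes = [g for g in gene_list if g in annotations]
    n = len(genes)

    # Candidate combinations: every label subset (up to max_size) seen in a list gene.
    candidates = set()
    for g in genes:
        labels = sorted(annotations[g])
        for size in range(1, max_size + 1):
            candidates.update(combinations(labels, size))

    results = []
    for combo in candidates:
        wanted = set(combo)
        k = sum(1 for g in genes if wanted <= annotations[g])      # in the list
        if k < min_support:
            continue
        K = sum(1 for a in annotations.values() if wanted <= a)    # in the reference
        results.append((combo, k, K, hypergeom_tail(k, n, K, N)))

    results.sort(key=lambda r: (r[3], r[0]))
    return results


# Toy data (hypothetical): labels from two annotation sources.
annotations = {
    "g1": {"GO:apoptosis", "KEGG:p53"},
    "g2": {"GO:apoptosis", "KEGG:p53"},
    "g3": {"GO:apoptosis", "KEGG:p53"},
    "g4": {"GO:apoptosis"},
    "g5": {"KEGG:p53"},
    "g6": {"GO:transport"},
    "g7": {"GO:transport"},
    "g8": {"GO:transport"},
}
ranked = concurrent_annotations(["g1", "g2", "g3"], annotations)
for combo, k, K, p in ranked:
    print(combo, k, K, round(p, 4))
```

In this toy example the pair of annotations ranks above either annotation alone (p = 1/56 against 1/14), which is the kind of signal a single-annotation analysis would miss.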

Available from: Pedro Carmona-Saez, Apr 02, 2014
  • Source
    • "Current methods for overrepresentation analysis have become a de facto standard for molecular biological research [33]. A large number of gene set enrichment methods have been developed for analyzing gene set enrichment [6] [14] [15] [18] [23] [31] [32] [33]. Each of these methods employs a different test statistic and null hypothesis to estimate the amount of differential expression of genes. "
    ABSTRACT: Biologists often need to know the set S′ of genes that are the most functionally and semantically related to a given set S of genes. For determining the set S′, most current gene similarity measures overlook the structural dependencies among the Gene Ontology (GO) terms annotating the set S, which may lead to erroneous results. We introduce in this paper a biological search engine called RGFinder that considers the structural dependencies among GO terms by employing the concept of existence dependency. RGFinder assigns a weight to each edge in GO graph to represent the degree of relatedness between the two GO terms connected by the edge. The value of the weight is determined based on the following factors: (1) type of the relation represented by the edge (e.g., an "is-a" relation is assigned a different weight than a "part-of" relation), (2) the functional relationship between the two GO terms connected by the edge, and (3) the string-substring relationship between the names of the two GO terms connected by the edge. RGFinder then constructs a minimum spanning tree of GO graph based on these weights. In the framework of RGFinder, the set S′ is annotated to the GO terms located at the lowest convergences of the subtree of the minimum spanning tree that passes through the GO terms annotating set S. We evaluated RGFinder experimentally and compared it with four gene set enrichment systems. Results showed marked improvement.
    IEEE Transactions on NanoBioscience 10/2014; DOI:10.1109/TNB.2014.2363295 · 1.77 Impact Factor
  • Source
    • "The methods for overrepresentation analysis have become a de facto standard for molecular biological research [20]. A large number of gene set enrichment methods have been developed for analyzing gene set enrichment [4] [6] [8] [9] [13] [18] [19] [20]. Each of these methods employs a different test statistic and null hypothesis to estimate the amount of differential expression of genes. "
    ABSTRACT: GO relation embodies some aspects of existence dependency. If GO term x is existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years for analyzing gene set enrichment. However, most of these methods overlook the structural dependencies between GO terms in GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term x cannot be existence-dependent on GO term y, if x and y have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.
    IEEE/ACM Transactions on Computational Biology and Bioinformatics 09/2014; 11(6). DOI:10.1109/TCBB.2014.2344668 · 1.54 Impact Factor
  • Source
    • "GO enrichment indicates the relationship between genes and GO terms. For each gene g and each GO term GO_j, a score is generated, which is typically referred to as the gene ontology enrichment score and defined as the −log10 of the hypergeometric test P value [33] [34] [35] for a gene set G consisting of g's direct neighbors in STRING and the GO term GO_j that can be computed as follows: "
    ABSTRACT: Identifying disease genes is one of the most important topics in biomedicine and may facilitate studies on the mechanisms underlying disease. Age-related macular degeneration (AMD) is a serious eye disease; it typically affects older adults and results in a loss of vision due to retina damage. In this study, we attempt to develop an effective method for distinguishing AMD-related genes. Gene ontology and KEGG enrichment analyses of known AMD-related genes were performed, and a classification system was established. In detail, each gene was encoded into a vector by extracting enrichment scores of the gene set, including it and its direct neighbors in STRING, and gene ontology terms or KEGG pathways. Then certain feature-selection methods, including minimum redundancy maximum relevance and incremental feature selection, were adopted to extract key features for the classification system. As a result, 720 GO terms and 11 KEGG pathways were deemed the most important factors for predicting AMD-related genes.
    BioMed Research International 08/2014; 2014:450386. DOI:10.1155/2014/450386 · 2.71 Impact Factor
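
The excerpt above ends before its formula. As a hedged illustration only, the commonly used upper-tail form of a hypergeometric enrichment score is written out below. The symbols N, M, n, and m are assumptions introduced here, not the cited paper's notation: N is the number of genes in the background, M the number of those annotated to GO_j, n the size of the gene set G, and m the number of genes in G annotated to GO_j.

```latex
\mathrm{ES}(g, \mathrm{GO}_j)
  = -\log_{10} P
  = -\log_{10} \sum_{k=m}^{\min(n,\,M)}
      \frac{\binom{M}{k}\,\binom{N-M}{\,n-k\,}}{\binom{N}{n}}
```

A larger score corresponds to a smaller P value, that is, to stronger overrepresentation of GO_j among the neighbors of g.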