The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species

PLoS Computational Biology (Impact Factor: 4.62). 07/2009; 5(7). DOI: 10.1371/journal.pcbi.1000431
Source: OAI


The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products in a highly systematic way and in a species-neutral manner with the aim of unifying the representation of gene function across different organisms. Each contributing member of the GO Consortium independently associates GO terms to gene products from the organism(s) they are annotating. Here we introduce the Reference Genome project, which brings together those independent efforts into a unified framework based on the evolutionary relationships between genes in these different organisms. The Reference Genome project has two primary goals: to increase the depth and breadth of annotations for genes in each of the organisms in the project, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their organisms. In addition, the project has several important incidental benefits, such as increasing annotation consistency across genome databases, and providing important improvements to the GO's logical structure and biological content.
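
The second goal, letting other annotation efforts infer GO annotations for homologous genes, can be illustrated with a minimal sketch. It is not the project's actual procedure: the gene identifiers, the family table, and the function name `infer_annotations` are hypothetical, and pooling every term across a homolog family is a deliberately naive stand-in for inference based on evolutionary relationships.

```python
from collections import defaultdict

# Hypothetical curated annotations: gene -> set of GO term IDs.
# GO:0004672 = protein kinase activity, GO:0005634 = nucleus.
reference_annotations = {
    "yeast:GENE_A": {"GO:0004672"},
    "human:GENE_B": {"GO:0004672", "GO:0005634"},
}

# Hypothetical homolog families: family ID -> member genes.
families = {
    "FAM001": ["yeast:GENE_A", "human:GENE_B", "beetle:GENE_C"],
}


def infer_annotations(families, annotations):
    """Propose GO terms for unannotated genes from annotated homologs.

    Naive assumption: every term seen on any annotated family member
    is proposed for every unannotated member of the same family.
    """
    inferred = defaultdict(set)
    for members in families.values():
        # Pool the terms carried by annotated members of this family.
        pooled = set()
        for gene in members:
            pooled |= annotations.get(gene, set())
        # Offer the pooled terms only to genes lacking annotations.
        for gene in members:
            if gene not in annotations and pooled:
                inferred[gene] |= pooled
    return dict(inferred)


if __name__ == "__main__":
    for gene, terms in infer_annotations(families, reference_annotations).items():
        print(gene, sorted(terms))
```

In practice an inferred term would be recorded with an evidence code marking it as inferred rather than experimental, and a curator or a phylogeny-aware method would decide which terms are safe to transfer.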

  • Source
    • "First, iBeetle genes were matched against the official Beetlebase gene identifiers by using a best reciprocal Blast hits based association table kindly provided by the iBeetle consortium [58, 59]. Using the official Beetlebase gene identifiers, gene descriptions, Gene Ontology (GO) terms [64] and InterPro attributes were downloaded from the EnsemblMetazoa database, release 17 [65]. In order to further improve the annotation of iBeetle genes, especially for those without a best reciprocal Blast hit among the Beetlebase gene identifiers, peptide sequences for genes of interest were downloaded from the iBeetle webpage and were blasted against the nr protein database using blastp from Blast2GO, version 2.6.6, and were then mapped and annotated with the Blast2GO default parameters [58, 59, 66]. "
    ABSTRACT: Background: Pathogens can infect their hosts through different routes. For studying the consequences for host resistance, we here used the entomopathogen Bacillus thuringiensis and the red flour beetle Tribolium castaneum for oral and systemic (i.e., pricking the cuticle) experimental infection. In order to characterize the molecular mechanisms underpinning the two different infection routes, the transcriptomes of beetles of two different T. castaneum populations – one recently collected population (Cro1) and a commonly used laboratory strain (SB) – were analyzed using a next generation RNA sequencing approach. Results: The genetically more diverse population Cro1 showed a significantly larger number of differentially expressed genes. While both populations exhibited similar reactions to pricking, their expression patterns in response to oral infection differed remarkably. In particular, the Cro1 population showed a strong response of cuticular proteins and developmental genes, which might indicate an adaptive developmental flexibility that was lost in the SB population presumably as a result of inbreeding. The immune response of SB was primarily based on antimicrobial peptides, while Cro1 relied on responses mediated by phenoloxidase and reactive oxygen species, which may explain the higher resistance of this strain against oral infection. Conclusions: Our data demonstrate that immunological and physiological processes underpinning the two different routes of infection are clearly distinct, and that host populations particularly differ in responses to oral infection. Furthermore, gene expression upon pricking infection entailed a strong signal of wounding, highlighting the importance of pricking controls in future infection studies. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-445) contains supplementary material, which is available to authorized users.
    Full-text · Article · Jun 2014 · BMC Genomics
  • Source
    • "This work introduced an interesting new tool for analyzing and visualizing the gene datasets. Because the typical Arabidopsis gene ontology (GO) (Gaudet et al., 2009) annotation provided limited understanding regarding which class of genes was important in dormancy and germination, the authors reannotated the genes in relation to previously described roles in germination- and dormancy-related terms (Microsoft Excel TAGGIT macro) (Carrera et al., 2007). This TAGGIT workflow was used for reanalyzing new and previous microarray data and has been shown to give a distinct visual gene signature for dormant and after-ripened seeds (Holdsworth et al., 2008b). "
    ABSTRACT: The success of flowering plants in most ecosystems could be seen as a result of the emergence of the seed structure as a reproduction vehicle. If plants are broadly defined as immobile living organisms, in contrast to animals, the seed would be the exception to that definition. The seed, enclosing an embryonic plant and nutrient sources, represents the final stage of the plant reproduction and allows the safe dispersion of the progeny. For doing that, the seed needs to survive a challenging and changing environment and to preserve the next generation until conditions are favorable for survival. The delay between seed formation and seed germination is one of the most important times during the entire plant cycle and has to be carefully synchronized with the environment to maximize seedling survival. This timing is principally determined by seed dormancy, which is a biological condition (physiological, morphological, and physical) that temporarily blocks germination, keeping the seed quiescent (Baskin and Baskin, 2004). Physiological dormancy is the most common form and generally includes components of embryo- and seed coat–based dormancy. Seed dormancy is an adaptive trait with high variability across species that has enormous importance in both wild and domesticated plants. Seed dormancy programs determine the ecological niche in which the seed germinates and prospers and are related to different factors such as climate, moisture, soil characteristics, light, temperature, nutrients, abiotic and biotic stress factors, and many others (Finch-Savage and Leubner-Metzger, 2006). In domesticated species, the control of seed dormancy is crucial and influences important agricultural traits, such as uniform germination and stand establishment, preharvest sprouting susceptibility (Gubler et al., 2005), and seed storage requirements.
    Full-text · Chapter · Feb 2013
  • Source
    • "Finally, in order to provide a standard framework for data integration and a reliable engine for SNPs selection, the database has been built on a strong ontology layer. Whenever available, data have been annotated using ontological terms: Gene Ontology [22] for genes and KEGG Pathway ontology (derived from the hierarchical organization of KEGG pathways) for pathways are just some of the hierarchically structured vocabularies that underlie the infrastructure. Additionally, ontology structures allow to improve the performance of statistical and analytical evaluations by means of the graphs that undergo the hierarchically structured vocabularies and that shed light on the relationships between biological components. "
    ABSTRACT: The identification of genes and SNPs involved in human diseases remains a challenge. Many public resources, databases, and applications collect biological data and perform annotations, increasing global biological knowledge. The need for SNP prioritization is emerging with the development of new high-throughput genotyping technologies, which allow the development of customized disease-oriented chips. Therefore, given a list of genes related to a specific biological process or disease as input, a crucial issue is finding the most relevant SNPs to analyse. The selection of these SNPs may rely on the relevant a-priori knowledge of biomolecular features characterising all the annotated SNPs and genes of the provided list. The bioinformatics approach described here makes it possible to retrieve a ranked list of significant SNPs from a set of input genes, such as candidate genes associated with a specific disease. The system enriches the gene set by including other genes associated with the original ones through ontological similarity evaluation. The proposed method relies on the integration of data from public resources in a vertical perspective (from genomics to systems biology data), the evaluation of features from biomolecular knowledge, the computation of partial scores for SNPs and finally their ranking, relying on their global score. Our approach has been implemented in a web-based tool called SNPRanker, which is accessible at the URL . An interesting application of the presented system is the prioritisation of SNPs related to genes involved in specific pathologies, in order to produce custom arrays.
    Full-text · Article · Jan 2010 · Journal of integrative bioinformatics