Gene Ontology annotations: What they mean and where they come from

The Jackson Laboratory, Bar Harbor, ME, USA.
BMC Bioinformatics (Impact Factor: 2.58). 02/2008; 9(Suppl 5):S2. DOI: 10.1186/1471-2105-9-S5-S2

To address the challenges of information integration and retrieval, the computational genomics community has increasingly come to rely on the methodology of creating annotations of scientific literature using terms from controlled structured vocabularies such as the Gene Ontology (GO). Here we address the question of what such annotations signify and of how they are created by working biologists. Our goal is to promote a better understanding of how the results of experiments are captured in annotations, in the hope that this will lead both to better representations of biological reality through annotation and ontology development and to more informed use of GO resources by experimental scientists.

Available from: Judith A Blake
  • Source
    • "Efforts in the representation of more structured annotations have tended to be idiosyncratic, specific to a particular type of annotation or task, and not broadly interoperable. For example, for the task of Gene Ontology (GO) annotation, in which the functionalities of genes and gene products represented in biomedical databases are associated to GO terms [46], the Gene Association File format (GAF 2.0) [31] enables the representation of constraints on the context in which a given annotation might be valid (e.g., the type of cell in which the functionality is asserted to be present); however, this format is specific to this narrow task. Analogously, the corpus and computational linguistics communities have developed solutions for representing complex syntax and semantics for documents, e.g., the Penn Treebank format [15], but these representations are mostly idiosyncratic and not interoperable. "
    ABSTRACT: Though the annotation of digital artifacts with metadata has a long history, the bulk of that work focuses on the association of single terms or concepts to single targets. As annotation efforts expand to capture more complex information, annotations will need to be able to refer to knowledge structures formally defined in terms of more atomic knowledge structures. Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations. We present a task- and domain-independent ontological model for capturing annotations and their linkage to their denoted knowledge representations, which can be singular concepts or more complex sets of assertions. We have implemented this model as an extension of the Information Artifact Ontology in OWL and made it freely available, and we show how it can be integrated with several prominent annotation and provenance models. We present several application areas for the model, ranging from linguistic annotation of text to the annotation of disease-associations in genome sequences. With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking.
    Full-text · Article · Nov 2013 · Journal of Biomedical Semantics
  • Source
    • "One informatics resource that has transformed the analysis of large biological datasets is the Gene Ontology (GO) [3], which provides a computable description of the functional aspects of an increasing number of genes and gene products spanning a diverse range of species. The GO is used by gene product annotators to assign attributes to protein and functional RNA gene products based on experimental reports in the primary literature [4-12]. These annotation sets are used for large dataset interrogation to determine similarities and differences in the attributes of gene products within those datasets. "
    ABSTRACT: The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI. We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI. The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from
    Full-text · Article · Jul 2013 · BMC Genomics
  • Source
    • "Using this protein set, a comprehensive literature-based manual annotation drive was embarked on to capture all the experimental instances of peroxisomal subcellular location as well as recording the functional information for each protein. GO annotations were created with the appropriate evidence codes to inform the user of the type of supporting evidence that exists for making a particular functional statement (9). A total of 88 human proteins were identified as having peroxisomal localization based on experimental evidence from published literature. "
    ABSTRACT: The Gene Ontology (GO) is the de facto standard for the functional description of gene products, providing a consistent, information-rich terminology applicable across species and information repositories. The UniProt Consortium uses both manual and automatic GO annotation approaches to curate UniProt Knowledgebase (UniProtKB) entries. The selection of a protein set prioritized for manual annotation has implications for the characteristics of the information provided to users working in a specific field or interested in particular pathways or processes. In this article, we describe an organelle-focused, manual curation initiative targeting proteins from the human peroxisome. We discuss the steps taken to define the peroxisome proteome and the challenges encountered in defining the boundaries of this protein set. We illustrate with the use of examples how GO annotations now capture cell and tissue type information and the advantages that such an annotation approach provides to users. Database URL: and
    Full-text · Article · Jan 2013 · Database: The Journal of Biological Databases and Curation