Gene Ontology annotations: what they mean and where they come from
ABSTRACT To address the challenges of information integration and retrieval, the computational genomics community increasingly has come to rely on the methodology of creating annotations of scientific literature using terms from controlled structured vocabularies such as the Gene Ontology (GO). Here we address the question of what such annotations signify and of how they are created by working biologists. Our goal is to promote a better understanding of how the results of experiments are captured in annotations, in the hope that this will lead both to better representations of biological reality through annotation and ontology development and to more informed use of GO resources by experimental scientists.
Full-textDOI: · Available from: Judith A Blake, May 30, 2015
SourceAvailable from: Barbara Zdrazil[Show abstract] [Hide abstract]
ABSTRACT: Integration of open access, curated, high-quality information from multiple disciplines in the Life and Biomedical Sciences provides a holistic understanding of the domain. Additionally, the effective linking of diverse data sources can unearth hidden relationships and guide potential research strategies. However, given the lack of consistency between descriptors and identifiers used in different resources and the absence of a simple mechanism to link them, gathering and combining relevant, comprehensive information from diverse databases remains a challenge. The Open Pharmacological Concepts Triple Store (Open PHACTS) is an Innovative Medicines Initiative project that uses semantic web technology approaches to enable scientists to easily access and process data from multiple sources to solve real-world drug discovery problems. The project draws together sources of publicly-available pharmacological, physicochemical and biomolecular data, represents it in a stable infrastructure and provides well-defined information exploration and retrieval methods. Here, we highlight the utility of this platform in conjunction with workflow tools to solve pharmacological research questions that require interoperability between target, compound, and pathway data. Use cases presented herein cover 1) the comprehensive identification of chemical matter for a dopamine receptor drug discovery program 2) the identification of compounds active against all targets in the Epidermal growth factor receptor (ErbB) signaling pathway that have a relevance to disease and 3) the evaluation of established targets in the Vitamin D metabolism pathway to aid novel Vitamin D analogue design. The example workflows presented illustrate how the Open PHACTS Discovery Platform can be used to exploit existing knowledge and generate new hypotheses in the process of drug discovery.PLoS ONE 12/2014; 9(12):e115460. DOI:10.1371/journal.pone.0115460 · 3.53 Impact Factor
[Show abstract] [Hide abstract]
ABSTRACT: Realist ontologies organize knowledge by strict adherence to philosophical principles, ensuring robustness and coherence. According to those principles, only entities empirically verifiable can be represented. Our study aimed to analyze medical records to evaluate which kinds of entities should be represented for physicians. We classified the entities and found several entities that cannot be represented in realist ontologies. After due analysis, results suggest that a categorization that distinguishes reality from medical knowledge about reality and observations under both of them are useful to describe entities present in medical records.
[Show abstract] [Hide abstract]
ABSTRACT: Background Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures.ResultsWe have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function.We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes.Conclusions We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.BMC Bioinformatics 12/2014; 15(1):405. DOI:10.1186/s12859-014-0405-z · 2.67 Impact Factor