Conference Paper

Estimating the Quality of Ontology-Based Annotations by Considering Evolutionary Changes.

DOI: 10.1007/978-3-642-02879-3_7
Conference: Data Integration in the Life Sciences, 6th International Workshop, DILS 2009, Manchester, UK, July 20-22, 2009. Proceedings
Source: DBLP

ABSTRACT Ontology-based annotations associate objects, such as genes and proteins, with well-defined ontology concepts to semantically
and uniformly describe object properties. Such annotation mappings are utilized in different applications and analysis studies
whose results strongly depend on the quality of the annotations used. To study the quality of annotations, we propose a generic
evaluation approach considering the annotation generation methods (provenance) as well as the evolution of ontologies, object
sources, and annotations. Thus, it facilitates the identification of reliable annotations, e.g., for use in analysis applications.
We evaluate our approach for functional protein annotations in Ensembl and Swiss-Prot using the Gene Ontology.

  • Source
    ABSTRACT: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time-consuming, and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid unannotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, which we aim to address in this article. Specifically, we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study to demonstrate this approach since it allows us to compare annotation change, both over time and between automated and manually curated annotations. By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality on free-text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality. Source code is available at the authors' website: http://homepages.cs.ncl.ac.uk/m.j.bell1/annotation. Contact: phillip.lord@newcastle.ac.uk.
    Bioinformatics 09/2012; 28(18):i562-i568. · 5.47 Impact Factor
  • Source
    ABSTRACT: The Gene Ontology and its associated annotations are critical tools for interpreting lists of genes. Here, we introduce a method for evaluating the Gene Ontology annotations and structure based on the impact they have on gene set enrichment analysis, along with an example implementation. This task-based approach yields quantitative assessments grounded in experimental data and anchored tightly to the primary use of the annotations. Applied to specific areas of biological interest, our framework allowed us to understand the progress of annotation and structural ontology changes from 2004 to 2012. Our framework was also able to determine that the quality of annotations and structure in the area under test has been improving in its ability to recall underlying biological traits. Furthermore, we were able to distinguish between the impact of changes to the annotation sets and ontology structure. Our framework and implementation lay the groundwork for a powerful tool in evaluating the usefulness of the Gene Ontology. We demonstrate both the flexibility and the power of this approach in evaluating the current and past state of the Gene Ontology as well as its applicability in developing new methods for creating gene annotations.
    Journal of Biomedical Semantics 04/2013; 4 Suppl 1:S4.
  • Source
    ABSTRACT: The continuous evolution of life science ontologies requires the adaptation of their associated mappings. We propose two approaches for tackling this problem in a largely automatic way: (1) a composition-based adaptation relying on the principle of mapping composition and (2) a diff-based adaptation algorithm individually handling change operations to update the mapping. Both techniques reuse unaffected correspondences, and adapt only the affected mapping part. We experimentally assess and confirm the effectiveness of our approaches for evolving mappings between large life science ontologies.
    Proceedings of the 9th International Conference on Data Integration in the Life Sciences (DILS’13); 01/2013
