Conference Paper

Estimating the Quality of Ontology-Based Annotations by Considering Evolutionary Changes.

DOI: 10.1007/978-3-642-02879-3_7
Conference: Data Integration in the Life Sciences, 6th International Workshop, DILS 2009, Manchester, UK, July 20-22, 2009. Proceedings
Source: DBLP

ABSTRACT Ontology-based annotations associate objects, such as genes and proteins, with well-defined ontology concepts to semantically
and uniformly describe object properties. Such annotation mappings are utilized in different applications and analysis studies
whose results strongly depend on the quality of the used annotations. To study the quality of annotations we propose a generic
evaluation approach considering the annotation generation methods (provenance) as well as the evolution of ontologies, object
sources, and annotations. Thus, it facilitates the identification of reliable annotations, e.g., for use in analysis applications.
We evaluate our approach for functional protein annotations in Ensembl and Swiss-Prot using the Gene Ontology.

Available from: Anika Groß, Sep 28, 2015
  • Source
    • "Likewise, measures of accuracy based on term specificity have been called into question [5]. Other approaches that address annotation error rates or accuracy such as [6] and [8] downplay the role of ontology structural quality, and ignore the effect that the ontology structure can have on real-world applications. "
    ABSTRACT: The Gene Ontology and its associated annotations are critical tools for interpreting lists of genes. Here, we introduce a method for evaluating the Gene Ontology annotations and structure based on the impact they have on gene set enrichment analysis, along with an example implementation. This task-based approach yields quantitative assessments grounded in experimental data and anchored tightly to the primary use of the annotations. Applied to specific areas of biological interest, our framework allowed us to understand the progress of annotation and structural ontology changes from 2004 to 2012. Our framework was also able to determine that the quality of annotations and structure in the area under test have been improving in their ability to recall underlying biological traits. Furthermore, we were able to distinguish between the impact of changes to the annotation sets and ontology structure. Our framework and implementation lay the groundwork for a powerful tool in evaluating the usefulness of the Gene Ontology. We demonstrate both the flexibility and the power of this approach in evaluating the current and past state of the Gene Ontology as well as its applicability in developing new methods for creating gene annotations.
    Journal of Biomedical Semantics 04/2013; 4 Suppl 1(Suppl 1):S4. DOI:10.1186/2041-1480-4-S1-S4 · 2.26 Impact Factor
  • Source
    • "Annotations are frequently assigned a score, using a variety of methods. These approaches include assigning confidence scores to annotations based on their stability (Gross et al., 2009) or combining the breadth (coverage of gene product) and the depth (level of detail) for the terms in the Gene Ontology (GO) (Buza et al. 2008). However, while deeper nodes within an ontology are generally more specialized, these measures are problematic; first GO has three root domains and second an ontology, such as GO, is a graph not a tree, therefore depth is not necessarily meaningful. "
    ABSTRACT: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually, however manual curation is costly, time consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, which we look at addressing within this article. Specifically we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study to demonstrate this approach since it allows us to compare annotation change, both over time and between automated and manually curated annotations. By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality on free text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality. Source code is available at the authors website:
    Bioinformatics 09/2012; 28(18):i562-i568. DOI:10.1093/bioinformatics/bts372 · 4.98 Impact Factor
  • Source
    • "Typically, there are several new versions per ontology and year; new versions of the heavily used Gene Ontology are even released on a daily basis. Ontology modifications may invalidate annotations [10] and influence applications such as ontology-based functional profiling of gene sets [11,12]. GOMMA includes algorithms to automatically detect the changes between ontology versions and, can thus, help to identify, study and resolve problems caused by such changes. "
    ABSTRACT: Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in life sciences. Their increasing size and the high frequency of updates resulting in a large set of ontology versions necessitates efficient management and analysis of this data. We present GOMMA, a generic infrastructure for managing and analyzing life science ontologies and their evolution. GOMMA utilizes a generic repository to uniformly and efficiently manage ontology versions and different kinds of mappings. Furthermore, it provides components for ontology matching, and determining evolutionary ontology changes. These components are used by analysis tools, such as the Ontology Evolution Explorer (OnEX) and the detection of unstable ontology regions. We introduce the component-based infrastructure and show analysis results for selected components and life science applications. GOMMA is available at GOMMA provides a comprehensive and scalable infrastructure to manage large life science ontologies and analyze their evolution. Key functions include a generic storage of ontology versions and mappings, support for ontology matching and determining ontology changes. The supported features for analyzing ontology changes are helpful to assess their impact on ontology-dependent applications such as for term enrichment. GOMMA complements OnEX by providing functionalities to manage various versions of mappings between two ontologies and allows combining different match approaches.
    Journal of Biomedical Semantics 09/2011; 2(1):6. DOI:10.1186/2041-1480-2-6 · 2.26 Impact Factor