Estimating the Quality of Ontology-Based Annotations by Considering Evolutionary Changes.
N.W. Paton, P. Missier, and C. Hedeler (Eds.): DILS 2009, LNBI 5647, pp. 71–87, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Anika Gross1, Michael Hartung1, Toralf Kirsten1,2, and Erhard Rahm1,3
1 Interdisciplinary Centre for Bioinformatics, University of Leipzig
2 Institute for Medical Informatics, Statistics and Epidemiology,
University of Leipzig
3 Department of Computer Science, University of Leipzig
{gross,hartung,tkirsten}@izbi.uni-leipzig.de,
rahm@informatik.uni-leipzig.de
Abstract. Ontology-based annotations associate objects, such as genes and pro-
teins, with well-defined ontology concepts to semantically and uniformly
describe object properties. Such annotation mappings are utilized in different
applications and analysis studies whose results strongly depend on the quality
of the used annotations. To study the quality of annotations we propose a ge-
neric evaluation approach considering the annotation generation methods
(provenance) as well as the evolution of ontologies, object sources, and annota-
tions. Thus, it facilitates the identification of reliable annotations, e.g., for use in
analysis applications. We evaluate our approach for functional protein annota-
tions in Ensembl and Swiss-Prot using the Gene Ontology.
Keywords: annotation, evolution, quality.
1 Introduction
Ontologies and their application have become increasingly important especially in the
life sciences. Typically, they are used to semantically describe or annotate properties
of real world objects, such as genes and proteins. The associations between object
descriptions and the elements (concepts) of an ontology form a so-called annotation
mapping. For instance, the protein objects of Ensembl [11] and Swiss-Prot [3] are
associated with concepts of the popular Gene Ontology [9] to describe the molecular
functions and biological processes in which the proteins are involved. Annotation
mappings are utilized in different analysis scenarios and applications. These include
functional profiling of large datasets such as gene expression microarrays (e.g., [1,4]),
network reconstruction and retrieval [7], or instance-based ontology matching [13].
Computed results of these applications significantly depend on which annotations
are used and hence rely on a good quality of the annotations, e.g., with respect to their
correctness and completeness. A particularly important quality aspect is the stability
of annotations since major changes in the annotation mappings may substantially
influence or even invalidate earlier findings. This is potentially a major issue since
annotation mappings change frequently, e.g., due to changes (additions, deletions,
modifications) in the underlying ontologies [10], objects and annotation associations.
Furthermore, annotation quality is influenced by the method that has been used to
create the annotation because it likely affects how biologically founded or reliable an
annotation is. The relevance of the creation method is underlined by the increasing
use of predefined evidence codes (EC) to classify functional annotations based on the
Gene Ontology [8]. These evidence codes allow a distinction of whether annotations
are experimentally founded, are based on author or curator statements or generated by
automatic algorithms, e.g., data mining techniques or homology mappings. The evi-
dence codes represent provenance information (sometimes also called lineage1 [2,5])
that can be utilized by analysis applications to focus on specific annotation sets, e.g.,
manually curated or automatically generated annotations.
For illustration, Figure 1 shows the evolution of selected functional protein annota-
tions in five succeeding Ensembl versions (v48-v52). The first annotation
(ENSP00000344151, GO:0015808) was continuously available with unchanged evi-
dence code (IDA, inferred from direct assay) indicating a stable annotation. Con-
versely, the evidence code of the second annotation for protein ENSP00000230480
has changed from traceable author statement (TAS) via IDA to inferred from
electronic annotation (IEA). Such a frequent revision of the provenance information
indicates reduced reliability of the annotation. Furthermore, the last annotation
(Figure 1, line 3) was temporarily absent also indicating a reduced stability.
So far, the quality of annotation mappings w.r.t. their stability and provenance informa-
tion is largely unexplored despite their potential importance for many analysis applica-
tions. We therefore present and evaluate a general approach to analyze annotation
mappings by taking their evolution and evidence information into account. To that end we
first propose an evolution model for annotation mappings including change operators and
quality measures (Section 2). The model captures ontology, instance and quality changes
w.r.t. annotation changes. Based on the evolution model, we propose evolution-based
quality measures to identify reliable annotations (Section 3). Finally, we evaluate our
evolution model by comparatively analyzing the annotation evolution in two large life
science annotation sources, namely Ensembl and Swiss-Prot (Section 4). In particular, we
study typical annotation changes and classify current annotations by applying the pro-
posed assessment method. Section 5 discusses related work before we conclude.
The analysis results and the proposed assessment method for annotations are ex-
pected to be valuable for users and applications of life science annotations. In particu-
lar, algorithms may utilize information of annotation history and annotation quality to
derive more robust / reliable results.
1 We further use the term provenance to determine the original source of data.
Instance ID       Concept ID                         v48   v49   v50   v51   v52
ENSP00000344151   GO:0015808 (L-alanine transport)   IDA   IDA   IDA   IDA   IDA
ENSP00000230480   GO:0005615 (extracellular space)   TAS   TAS   IDA   TAS   IEA
ENSP00000352999   GO:0006915 (apoptosis)             IDA   -     -     -     IDA
Fig. 1. Evolution of functional protein annotations in Ensembl versions (v48-v52 = Dec.2007-Dec.2008)
2 Annotation Models
The stability of annotation mappings is affected by the changes in the involved in-
stance (object) sources, ontologies and object-ontology associations. In the following
we first introduce our model of annotation mappings including models for instance
sources, ontologies and annotation quality. We will assume that annotations (object-
ontology associations) include several quality indicators whose values may be taken
from predefined quality taxonomies. In Section 2.2 we will introduce our evolution
model including change operators for instances, ontologies and annotations. Further-
more, measures are proposed in order to quantify the evolution of annotations.
2.1 Annotation Mapping and Quality Models
As usual in life sciences, we assume that ontologies and instance sources are ver-
sioned so that a specific version reflects a stable data snapshot from a specific point in
time. The versioning scheme is assumed to be linear, i.e., a particular version vi has
exactly one successor version vi+1 and one predecessor version vi-1. The latest and the
first version are exceptions since they have no successor and no predecessor, respectively.
As illustrated in Figure 2, annotation mappings interrelate a specific version of an
instance source with a specific version of an ontology. Furthermore, annotation map-
pings can refer to common quality taxonomies to specify the quality of individual
annotation associations by different criteria, e.g., provenance or stability. Before we
define the details of annotation mappings we briefly introduce our models for instance
sources and ontologies which are based on [10].
An instance source of version v is denoted by Iv = (I, t) consisting of a set of in-
stances I = {i1, …, in} and a release timestamp t. An instance item i of I is described
by a set of attributes, e.g., name or current status. A special attribute called accession
number identifies instance items unambiguously. Accession numbers are utilized to
reference instance items within annotation mappings.
An ontology ONv = (C, R, t) of version number v and release timestamp t consists
of concepts C = {c1, …, cn} and relationships R = {r1, …, rm}. A concept c∈C
[Figure: instance source versions Sk = (Ik, tk), Sk+1, ...; ontology versions Xi = (Ci, Ri, ti), Xi+1, ... and Yj = (Cj, Rj, tj), Yj+1, ...; annotation mappings (Sk, Xi, Q, A) and (Sk, Yj, Q, A); quality taxonomies Q = (Q1,…,Qm)]
Fig. 2. Model of instance sources, ontologies and annotation mappings with versioning and quality
comprises attributes for its detailed description, e.g., synonyms or a definition. An
accession number is utilized for unambiguous identification of concepts and the obso-
lete status signals whether a concept is active or not within the ontology. Furthermore,
concepts can be interconnected by directed relationships r = (c1, c2) ∈ R, e.g., is-a or
part-of relationships. Overall, concepts C and relationships R form the graph structure
of an ontology which is usually a directed acyclic graph (DAG) with root concepts
(concepts of C that have no relationships to a super concept).
An annotation mapping AM = (Iu, ONv, Q, A) associates an instance source version
Iu with an ontology version ONv by a set of correspondences A. A single association
or annotation a∈A is denoted by a = (i, c, {q}), i.e., an instance item i ∈ Iu is annotated
with an ontology concept c ∈ ONv and a set of quality indicators (ratings) {q}.
The quality indicators {q} of annotations may be numerical values or come from pre-
defined quality taxonomies Q1,…,Qm∈Q. Quality taxonomies represent predefined
criteria for uniform quality characterization, e.g., the evidence codes for provenance
information or stability indicators. Note that for each quality taxonomy at most one
quality indicator can be utilized in an annotation. Typically, the quality ratings of an
annotation are specified when an annotation is first generated. However annotation
ratings may be modified, as seen in the examples of Figure 1, e.g., when changed
information about the annotation becomes available.
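The model above can be sketched as a handful of record types. This is a minimal illustration, not the authors' implementation: the class and field names are assumptions, quality indicators are restricted to categorical terms, and the only rules enforced are the ones stated in the text (both ends of an association must exist in the referenced versions, and at most one indicator per quality taxonomy).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One association a = (i, c, {q})."""
    instance: str                      # accession number of instance item i
    concept: str                       # accession number of ontology concept c
    quality: frozenset = frozenset()   # categorical quality indicators {q}


@dataclass
class InstanceSource:
    """Instance source version I_u = (I, t)."""
    version: str
    instances: set
    timestamp: str


@dataclass
class Ontology:
    """Ontology version ON_v = (C, R, t)."""
    version: str
    concepts: set
    relationships: set                 # directed pairs (c1, c2), e.g. is-a
    timestamp: str


@dataclass
class AnnotationMapping:
    """Annotation mapping AM = (I_u, ON_v, Q, A)."""
    source: InstanceSource
    ontology: Ontology
    taxonomies: dict                   # taxonomy name -> set of quality terms
    annotations: set = field(default_factory=set)

    def add(self, a: Annotation) -> None:
        # Both ends of the association must exist in the referenced versions.
        if a.instance not in self.source.instances:
            raise ValueError(f"unknown instance {a.instance}")
        if a.concept not in self.ontology.concepts:
            raise ValueError(f"unknown concept {a.concept}")
        # At most one quality indicator per quality taxonomy.
        for name, terms in self.taxonomies.items():
            if len(a.quality & terms) > 1:
                raise ValueError(f"several indicators from taxonomy {name}")
        self.annotations.add(a)
```

Keeping the annotation record immutable lets annotations be stored in sets and compared across versions, which the evolution analysis relies on.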
A quality taxonomy representing a particular quality criterion consists of a set of
predefined quality terms {q1, …, qn} which may be arranged in an is-a-like hierarchy.
In the general case, a quality term q = (q’, type) of name q is defined by a type and an
optional super term q’. Every quality term has at most one parent term; if no parent
term exists, the quality term is assumed to be the root of the quality taxonomy. Quality
terms can be of two different types: instantiable and abstract. While instantiable
quality terms are applicable for rating an annotation, abstract ones are not utilized in
annotations, i.e., they only act as aggregation nodes within the taxonomy. For our
study, we assume that quality taxonomies remain unchanged.
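A quality taxonomy of this kind is a small tree of named terms. The following sketch is illustrative only: the class names, the single-root check and the cycle guard are assumptions added for robustness, and the abstract root term "stability" in the usage note is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualityTerm:
    name: str
    parent: Optional[str]    # name of the super term q', None for the root
    instantiable: bool       # False marks an abstract aggregation node


class QualityTaxonomy:
    def __init__(self, terms):
        self.terms = {t.name: t for t in terms}
        roots = [t.name for t in self.terms.values() if t.parent is None]
        if len(roots) != 1:
            raise ValueError("a taxonomy needs exactly one root term")
        for t in self.terms.values():
            if t.parent is not None and t.parent not in self.terms:
                raise ValueError(f"unknown parent {t.parent} of {t.name}")

    def ancestors(self, name):
        """Return the super terms of `name`, nearest first, up to the root."""
        chain, seen = [], {name}
        parent = self.terms[name].parent
        while parent is not None:
            if parent in seen:
                raise ValueError("cycle in taxonomy")
            seen.add(parent)
            chain.append(parent)
            parent = self.terms[parent].parent
        return chain

    def can_rate(self, name):
        """Only instantiable terms may be attached to an annotation."""
        return self.terms[name].instantiable
```

For example, a taxonomy built from an abstract root "stability" with the instantiable children "stable" and "unstable" allows `can_rate("stable")` but not `can_rate("stability")`.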
We will utilize three different types of quality indicators to specify (1) provenance
type, (2) stability and (3) age of annotations. First, for provenance information we
utilize and analyze the existing Evidence Codes (EC) [8] for GO annotations which
specify their generation method. Figure 3 shows the current EC quality taxonomy
including different groups, in particular ‘Manually assigned’ (man), ‘Automatically
assigned’ (auto) and ‘Obsolete’ (obs). Manually determined annotations are further
All ECs
  Manually assigned (man)
    Experimental (exp): EXP, IDA, IPI, IMP, IGI, IEP
    Computational Analysis (comp): ISS, ISO, ISA, ISM, IGC, RCA
    Author Statement (auth): TAS, NAS
    Curator Statement (cur): IC, ND
  Automatically assigned (auto): IEA
  Obsolete (obs): NR
Fig. 3. Evidence Code Taxonomy
refined by the exp, auth, cur and comp groups. In contrast, auto annotations are un-
verified but have been generated by algorithms such as homology or keyword map-
pings. For stability and age, we do not directly use numerical values but map them
into categorical terms of a quality taxonomy to simplify their use and evaluation. Our
stability quality taxonomy consists of only two terms to differentiate stable and unsta-
ble annotations based on their evolution history. Our age quality taxonomy differenti-
ates between novel, middle and old annotations. Hence, an automatically generated,
stable and middle-aged annotation between instance item i and ontology concept c
can be described by a = (i,c,{IEA,stable,middle}). The introduced quality taxonomies
will be used in our evaluation in Section 4. Note that the EC information is frequently
available for GO annotations but has not yet been comparatively evaluated. Further-
more, to the best of our knowledge the stability and age of annotations has not yet
been analyzed and utilized.
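To make the three quality indicators concrete, the sketch below maps evidence codes to the EC groups named above and builds an annotation of the form a = (i,c,{IEA,stable,middle}). The assignment of individual codes to groups follows the Gene Ontology evidence code categories as commonly documented and is an assumption as far as this text is concerned; the function names are hypothetical.

```python
# Evidence code -> EC group (assumed assignment, see lead-in).
EC_GROUP = {
    "EXP": "exp", "IDA": "exp", "IPI": "exp",
    "IMP": "exp", "IGI": "exp", "IEP": "exp",
    "ISS": "comp", "ISO": "comp", "ISA": "comp",
    "ISM": "comp", "IGC": "comp", "RCA": "comp",
    "TAS": "auth", "NAS": "auth",
    "IC": "cur", "ND": "cur",
    "IEA": "auto",
    "NR": "obs",
}

# Groups that refine 'Manually assigned' (man).
MANUAL_GROUPS = {"exp", "comp", "auth", "cur"}


def ec_path(code):
    """Path from the taxonomy root down to an evidence code."""
    group = EC_GROUP[code]
    if group in MANUAL_GROUPS:
        return ["All ECs", "man", group, code]
    return ["All ECs", group, code]


def annotate(instance, concept, code, stability, age):
    """Build a = (i, c, {q}) with one indicator per quality taxonomy."""
    if code not in EC_GROUP:
        raise ValueError(f"unknown evidence code {code}")
    if stability not in {"stable", "unstable"}:
        raise ValueError(f"unknown stability term {stability}")
    if age not in {"novel", "middle", "old"}:
        raise ValueError(f"unknown age term {age}")
    return (instance, concept, frozenset({code, stability, age}))
```

With this mapping, an analysis can restrict itself to, say, experimentally founded annotations by filtering on `EC_GROUP[code] == "exp"`.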
In life sciences, annotation mapping versioning usually follows the versioning
scheme of the instance source, i.e., a new instance source version may include
changed annotations and may refer to current or older versions of the
respective ontologies. On the other hand, a new ontology version is generally not
released with a new version of annotation mappings. Furthermore, succeeding ver-
sions of an instance source may refer to the same ontology version.
2.2 Evolution Model
We extend the evolution model for ontologies and mappings of [10] which is limited
to simple addition and deletion changes. In order to study evolution in annotations in
more detail, we introduce new change types and consider quality changes in annota-
tions as well as the influence of instance / ontology changes on annotations.
Figure 4 summarizes the possible change operations for instances, ontologies and
annotations in a simple taxonomy. For instance sources (object ≙ instance item) and
ontologies (object ≙ ontology concept), we distinguish between the following
operations:
• add: addition of a new object
• del: deletion of an existing object
• toObs: marking an existing object as obsolete, i.e., the object becomes inactive
• subs: substitution of an existing object by a new object
• merge: merging of an object into an existing object
For annotations we differentiate between the following change operations based on
the operations for instance sources and ontologies:
• add: addition of a new annotation
• delann: deletion of an existing annotation
• delont: deletion of an annotation caused by ontology concept change or delete
• delins: deletion of an annotation caused by instance item change or delete
• chgont: adaptation of an annotation caused by ontology concept change
• chgins: adaptation of an annotation caused by instance item change
• chgqual: change of the quality indicator of an annotation
Several dependencies exist between instance/ontology changes and annotation
changes (see leadsTo dependencies in Figure 4) leading to a corresponding propaga-
tion of changes when ontologies and instances evolve. Deletions of ontology concepts
and instances always lead to the removal of dependent annotations (delont, delins
changes). Furthermore, a change (subs, merge, toObs) of an instance item or ontology
concept may cause the deletion or adaptation of dependent annotations as described
with the delins, delont, chgins and chgont operations. In addition, we distinguish quality
changes (chgqual), e.g., when an automatically generated annotation is later confirmed
by an experiment, and conventional additions and deletions of annotations (add,
delann).
Figure 5 illustrates the various change operators by a rather comprehensive exam-
ple of annotation evolution. The example displays an evolution step between two
versions for an instance source I (I1→I2), an ontology ON (ON1→ON2) and an annota-
tion mapping AM ((I1,ON1)→(I2,ON2)). The table on the left summarizes the change
Instance and Ontology Changes (all): add, del, change (subs, merge, toObs)
Annotation Changes (all): add, del (delann, delont, delins), change (chgqual, chgont, chgins)
leadsTo: dependencies from instance and ontology changes to annotation changes
Fig. 4. Effects of instance and ontology changes on annotations

Source                    Operation
ontology (ON)             add(c5), merge(c2,c3), toObs(c4)
instance source (I)       add(i6), del(i2), subs(i3,i5)
annotation mapping (AM)   add(i4,c5,q1), add(i4,c1,q4)
                          delins(i2,c3,q1), delont(i4,c4,q3)
                          chgont((i1,c2,q3),(i1,c3,q3))
                          chgins((i3,c1,q4),(i5,c1,q4))
                          chgqual((i1,c1,q1),(i1,c1,q3))

ON1 = {c1, c2, c3, c4}; I1 = {i1, i2, i3, i4}; AM = (I1, O1): (i1,c1,q1), (i1,c2,q3), (i2,c3,q1), (i3,c1,q4), (i4,c4,q3)
ON2 = {c1, c3, c4, c5}; I2 = {i1, i4, i5, i6}; AM = (I2, O2): (i1,c1,q3), (i1,c3,q3), (i5,c1,q4), (i4,c5,q1), (i4,c1,q4)
Q: quality taxonomy with terms q0, q1, q2, q3, q4
Fig. 5. Evolution example with possible change operations
operations resulting in the new versions for I, ON and AM, shown on the right of
Figure 5. Both the instance source and the ontology gained new objects (i6, c5), and
instance item i2 was deleted. Concept c2 was merged into concept c3, and c4 has
become obsolete. Furthermore, i3 was replaced by the new instance item i5. As a result
some annotations were adapted, e.g., (i1,c2) to (i1,c3) and (i3,c1) to (i5,c1), or deleted,
e.g., (i2,c3) and (i4,c4). Moreover, (i1,c1) changed its quality from q1 to q3 in the new
version. New annotations were also added: (i4,c5,q1) and (i4,c1,q4).
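The propagation step of this example can be sketched in a few lines. The policy used here is an assumption made for illustration: deletions and toObs operations always remove dependent annotations, while subs and merge always redirect them, whereas the text only states that such changes may cause a deletion or an adaptation. Annotations whose instance and concept both change are handled by the first matching rule only. The function name and the tuple encoding of operations are hypothetical.

```python
def propagate(annotations, instance_ops, ontology_ops):
    """Apply instance/ontology changes to a set of (i, c, q) annotations.

    Returns the surviving annotations and a log of annotation changes.
    """
    drop_i = {op[1] for op in instance_ops if op[0] in ("del", "toObs")}
    drop_c = {op[1] for op in ontology_ops if op[0] in ("del", "toObs")}
    move_i = {op[1]: op[2] for op in instance_ops if op[0] in ("subs", "merge")}
    move_c = {op[1]: op[2] for op in ontology_ops if op[0] in ("subs", "merge")}

    result, log = set(), []
    for (i, c, q) in sorted(annotations):
        if i in drop_i:
            log.append(("delins", (i, c, q)))
        elif c in drop_c:
            log.append(("delont", (i, c, q)))
        elif i in move_i:
            new = (move_i[i], c, q)
            log.append(("chgins", (i, c, q), new))
            result.add(new)
        elif c in move_c:
            new = (i, move_c[c], q)
            log.append(("chgont", (i, c, q), new))
            result.add(new)
        else:
            result.add((i, c, q))
    return result, log


# Data of the example in Figure 5.
am1 = {("i1", "c1", "q1"), ("i1", "c2", "q3"), ("i2", "c3", "q1"),
       ("i3", "c1", "q4"), ("i4", "c4", "q3")}
instance_ops = [("add", "i6"), ("del", "i2"), ("subs", "i3", "i5")]
ontology_ops = [("add", "c5"), ("merge", "c2", "c3"), ("toObs", "c4")]

am2, log = propagate(am1, instance_ops, ontology_ops)

# chgqual and add are not derivable from I and ON; they are applied explicitly.
am2.remove(("i1", "c1", "q1"))
am2.add(("i1", "c1", "q3"))
am2 |= {("i4", "c5", "q1"), ("i4", "c1", "q4")}
```

After these steps `am2` contains the five annotations listed for the new version in the example.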
2.3 Measures to Quantify Annotation Evolution and Changes
For our evaluation, we will utilize several measures to quantitatively assess the evolu-
tion of life science annotations. In addition to some general cardinality and growth
measures we want to specifically evaluate annotation changes such as the change
propagations between instances/ontologies and annotations as well as changes in the
quality of annotations.
By using quality-specific statistics we can quantify how annotations with different
quality indicators evolve over time, e.g., to discover which quality groups (annota-
tions with a particular quality q) changed heavily or remained almost stable in a pe-
riod p under review. For these purposes, we use the following measures:
$|A_{v_i}|$                                         number of annotations in version $v_i$ of an annotation mapping
$|A_{v_i,q}|$                                       number of annotations with quality q in version $v_i$
$|A_{v_i,q}| / |A_{v_i}|$                           relative share of annotations with quality q to the overall number of annotations in version $v_i$
$Add_{v_i,v_j,q}$, $Del_{v_i,v_j,q}$, $Chg_{v_i,v_j,q}$   number of added, deleted or changed annotations with quality q between version $v_i$ and $v_j$
$Add_{p,q}$, $Del_{p,q}$, $Chg_{p,q}$               number of added, deleted or changed annotations with quality q within an observation period p
$growth_{A,q,v_i,v_j} = |A_{v_j,q}| / |A_{v_i,q}|$  growth rate of annotations with quality q between version $v_i$ and $v_j$
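These measures reduce to simple set operations once each version of an annotation mapping is available as a set of (i, c, q) triples. The sketch below is an assumption-laden illustration: annotations are matched by their (instance, concept) pair, so an annotation that was adapted by an instance or ontology change shows up as one deletion plus one addition unless additional evolution information is used. Function names are hypothetical, and the ratios are undefined (division by zero) when the denominator set is empty.

```python
def count(annotations, q=None):
    """|A_vi|, or |A_vi,q| if a quality q is given."""
    return sum(1 for a in annotations if q is None or a[2] == q)


def share(annotations, q):
    """Relative share |A_vi,q| / |A_vi|."""
    return count(annotations, q) / count(annotations)


def growth(old, new, q):
    """growth_A,q,vi,vj = |A_vj,q| / |A_vi,q|."""
    return count(new, q) / count(old, q)


def diff(old, new):
    """Added, deleted and quality-changed annotations between two versions."""
    old_q = {(i, c): q for (i, c, q) in old}
    new_q = {(i, c): q for (i, c, q) in new}
    added = {k: new_q[k] for k in new_q.keys() - old_q.keys()}
    deleted = {k: old_q[k] for k in old_q.keys() - new_q.keys()}
    changed = {k: (old_q[k], new_q[k])
               for k in old_q.keys() & new_q.keys()
               if old_q[k] != new_q[k]}
    return added, deleted, changed
```

Per-quality counts such as the number of added annotations with quality q follow by filtering the returned dictionaries on their values.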
We further investigate the impact of instance/ontology changes on annotation
changes. Since instance/ontology changes, especially deletions, merges or substitu-
tions, affect changes in annotations, we propose measures that assess these influences
w.r.t. a version change ($v_i \rightarrow v_j$) or an observation period (p):
$Chg_{ont}$, $Chg_{ins}$   number of annotations that have changed because of a change of the referenced ontology concept or instance item, respectively
$Chg_{qual}$               number of annotations that changed their quality
$Del_{ont}$, $Del_{ins}$   number of annotations that have been deleted because of a change or a deletion of the referenced ontology concept or instance item, respectively
3 Assessment of Annotation Stability
In this section we propose a method to assess the stability of annotations based on
their evolution history and changes in quality indicators. To assess the evolution his-
tory without considering quality criteria, we define the history h of an annotation
$a = (i,c)_n$ of version $v_n$:
$h((i,c)_n) = \big( (i,c)_0, (i,c)_1, \ldots, (i,c)_n \big) \;\mid\; 0 \le i < n:\ (i,c)_i \rightarrow (i,c)_{i+1}$
So an annotation $(i,c)_{i+1}$ in $v_{i+1}$ has evolved from $(i,c)_i$ in $v_i$, e.g., caused by an in-
stance merge or substitution (see change taxonomy in Figure 4), or remained un-
changed. The non-existence of an annotation in a version is denoted by a null value,
e.g., after a deletion or before the first occurrence. The computation occurs with re-
spect to all versions of a predefined observation period p, e.g., the last year. Given the
history h for an annotation a we can determine different measures for its evolution
within an observation period p.
First, the age of an annotation (in number of versions) is defined as
• $a_{age} = (n - fo) + 1$
where n is the number of the current version ($v_n$) and fo denotes the number of the
version ($v_{fo}$) in which the annotation occurs for the first time within p. In addition, we
count the number of versions in p in which an annotation appeared ($a_{present}$). Note that
the counts ignore all versions of the annotation mapping before the first occurrence of
an annotation. Based on $a_{age}$ and $a_{present}$ we define a simple existence stability measure
that evaluates the relative existence of a single annotation a:
• $stab_{exis}(a) = a_{present} / a_{age}$
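As a minimal sketch of these two definitions, a history can be represented as a list with one entry per version of the observation period, using None for versions in which the annotation does not exist. The list encoding and the function names are assumptions; the code also assumes that the annotation occurs at least once in the period.

```python
def age(history):
    """a_age = (n - fo) + 1, with n the index of the current version."""
    fo = next(k for k, state in enumerate(history) if state is not None)
    n = len(history) - 1
    return (n - fo) + 1


def present(history):
    """a_present: number of versions in which the annotation appears."""
    return sum(1 for state in history if state is not None)


def stab_exis(history):
    """Existence stability a_present / a_age."""
    return present(history) / age(history)


# Hypothetical history over v0..v4: present in v0, missing in v1 and v2.
example = ["q1", None, None, "q1", "q1"]
```

For the hypothetical history above, the annotation has age 5 and is present in 3 versions, so its existence stability is 3/5.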
To evaluate quality changes of annotations within p we use an extended history $h_Q$
of an annotation with respect to a quality indicator (e.g., provenance):
$h_Q((i,c,q)_n) = \big( (i,c,q)_0, (i,c,q)_1, \ldots, (i,c,q)_n \big) \;\mid\; 0 \le i < n:\ (i,c,q)_i \rightarrow (i,c,q)_{i+1}$
The extended history $h_Q$ incorporates the values of the considered quality indicator
w.r.t. a particular quality taxonomy Q. Note that the consideration of quality changes
in an annotation history may only be useful for some quality criteria. For instance, we
will focus on provenance changes in our evaluation, e.g., when the evidence code of
an annotation is modified due to new experimental findings. We count quality
changes by determining the number of versions in the history of a where a quality
change occurred ($a_{changed}$). Conversely, $a_{unchanged}$ specifies the number of versions
without quality modification. Versions for which an annotation was temporarily miss-
ing are skipped in the change comparison of the quality indicator.
Utilizing the counts we define a stability measure for quality stability as well as a
combined stability for a single annotation a:
• $stab_{qual}(a) = a_{unchanged} / (a_{unchanged} + a_{changed})$
• $stab_{comb}(a) = \min(stab_{qual}(a), stab_{exis}(a))$
While $stab_{qual}$ assesses the frequency of quality changes of an annotation, the com-
bined stability measure $stab_{comb}$ conservatively integrates $stab_{exis}$ and $stab_{qual}$ by calcu-
lating the minimum. Note that the proposed measures have a value range of [0,1].
Thereby, a low value signals instability. Perfect stability is achieved in case of 1, e.g.,
if an annotation is permanently present since its first occurrence (perfect existence
stability) or possesses no quality changes (perfect quality stability). In our evaluation
(Section 4) we will utilize these measures to classify annotations w.r.t. the two quality
criteria age and stability discussed in Section 2.1. Particularly, we use a threshold
criterion to map numerical stability values into corresponding terms of the stability
taxonomy.
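The quality and combined stability measures, together with the threshold mapping, can be sketched as follows. Several points are assumptions rather than statements of the text: histories are lists with None for missing versions, quality changes are counted between consecutive versions in which the annotation exists, an annotation observed only once is given a quality stability of 1, and the threshold value 0.9 is a placeholder, since no concrete value is fixed here.

```python
def stab_exis(history):
    """Existence stability a_present / a_age."""
    fo = next(k for k, s in enumerate(history) if s is not None)
    a_present = sum(1 for s in history if s is not None)
    return a_present / (len(history) - fo)


def stab_qual(history):
    """Quality stability a_unchanged / (a_unchanged + a_changed)."""
    states = [s for s in history if s is not None]   # skip missing versions
    changed = sum(1 for a, b in zip(states, states[1:]) if a != b)
    unchanged = len(states) - 1 - changed
    if changed + unchanged == 0:
        return 1.0    # assumption: no observed transition, no change
    return unchanged / (unchanged + changed)


def stab_comb(history):
    """Conservative combination: minimum of both stability measures."""
    return min(stab_qual(history), stab_exis(history))


def classify(history, threshold=0.9):
    """Map the numerical stability to a term of the stability taxonomy."""
    return "stable" if stab_comb(history) >= threshold else "unstable"


# Hypothetical history over v0..v4 with two quality changes (q2, q1, q3).
changing = [None, "q2", "q2", "q1", "q3"]
```

The history `changing` has one unchanged and two changed transitions, giving a quality stability of 1/3 and therefore the term "unstable" for any threshold above that value.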
The example in Figure 6 illustrates the proposed measures for four annotations. An
observation period with 5 versions of an annotation mapping (v0-v4) is considered. For
each version the quality term of an annotation is displayed, an empty cell denotes the
temporal non-existence of an annotation in the respective version. The four histories
of (i1,c1,q1), (i2,c2,q1), (i3,c3,q3) and (i4,c4,q2) of version v4 exhibit different evolution
characteristics. Annotation (i1,c1,q1) has been introduced in v0 (i.e., $a_{age}$=5) and shows
a perfect stability of 1 in $stab_{exis}$ as well as $stab_{qual}$ and thus also in $stab_{comb}$. By con-
trast, annotation (i2,c2,q1) of the same age was temporarily absent
(v1,v2), resulting in a low existence stability of 0.6. Furthermore, (i3,c3,q3) is continu-
ously present in 4 versions of p but received two quality changes (q2→q1→q3). Hence,
the quality and the combined stability are poor (0.33). The last annotation (i4,c4,q2)
shows a perfect combined stability; however, it is quite novel ($a_{age}$=2) due to its first
occurrence in version v3.
4 Evaluation
In our evaluation experiments we comparatively analyze the evolution of annotations
in the two large annotation sources Ensembl [11] and Swiss-Prot [3] which annotate
their proteins with concepts of the Gene Ontology [9]. We first analyze how the anno-
tations evolved for the different provenance types, i.e., different kinds of evidence
codes, and how instance (protein) and ontology changes propagated to annotations. In
Section 4.2, we additionally analyze the age and stability indicators of Section 3.
4.1 Provenance Analysis
For our study we use available Swiss-Prot and Ensembl versions between March 2004
and December 2008. During this observation period Swiss-Prot (Ensembl) released 14
(28) major versions, namely versions 43-56 (25-52). Both sources provide many func-
tional protein annotations for various species. Whereas Swiss-Prot primarily contains
manually curated entries, Ensembl focuses on the automatic generation and integra-
tion of data. We consider the functional annotations of human proteins with the
concepts of the Gene Ontology (GO) [9] which consists of the three sub ontologies
‘biological process’, ‘molecular function’ and ‘cellular component’. In the following
we do not differentiate between these sub-ontologies and treat GO as one ontology.
annotation a   a_age   stab_exis    stab_qual         stab_comb
(i1,c1,q1)     5       5/5 = 1      4/(4+0) = 1       1
(i2,c2,q1)     5       3/5 = 0.6    2/(2+0) = 1       0.6
(i3,c3,q3)     4       4/4 = 1      1/(1+2) = 0.33    0.33
(i4,c4,q2)     2       2/2 = 1      1/(1+0) = 1       1
Fig. 6. History and measure results of four example annotations (histories over versions v0-v4 of period p)
Fig. 7. Evolution of annotations in different EC groups (x-axis: version; y-axis: # annotations)
(a) Manually curated vs. automatically assigned (Ensembl); series: man, auto
(b) “Subclasses” of manually-curated (Ensembl); series: auth, exp, comp, cur
(c) All annotations (Swiss-Prot); series: auth, exp, comp, auto, cur
Note that Swiss-Prot always attempts to incorporate the current GO release whereas
Ensembl often relies on older GO releases in several versions.
Figure 7 shows how the number of GO annotations evolved for different evidence
code groups of the EC taxonomy for Ensembl and Swiss-Prot, respectively. Figure 7(a)
indicates that Ensembl is dominated by automatically assigned GO annotations (about
78% of the 265,000 annotations in the last version). Furthermore, the growth in the
number of automatically determined annotations is very high (factor 4.6 within the last
four years). In addition, there is a substantial number of deletions between v40 and v42. By
contrast, the manually curated annotations grew only modestly by a factor of 1.7. Figure
7(b) shows the development for the manually determined annotations in more detail.
We observe a strong increase for experimentally validated annotations (growthexp: 8.9)
while author statement annotations increased only slightly (growthauth: 1.1). The number
of curator-assigned and computationally assigned annotations remained at a very low level.
Figure 7(c) illustrates the evolution of annotations in Swiss-Prot which currently
covers about 45,000 annotations, i.e., about one sixth as many as Ensembl. In contrast to
Ensembl, Swiss-Prot contains very few automatically generated annotations (1,440)
which were recently introduced. The majority of Swiss-Prot annotations are
auth annotations (about 24,000 in v56). Note that their number has been slightly
decreasing since v51. The number of exp annotations has significantly increased
(growthexp: 18.5) to about 16,000 at present. Overall, Swiss-Prot provides predomi-
nantly manually curated annotations that exhibit a continuous, stable evolution with-
out remarkable fluctuations.
The table in Figure 8 summarizes the number of evolution operations that have been
carried out since March 2004 in Swiss-Prot and Ensembl. To determine the changes, we
compared objects of different versions based on their accession numbers to generate
sets of added or deleted objects. More complex changes such as the substitution or
merge of proteins that may cause annotation changes (Chgins) or deletions (Delins) were
identified with the help of evolution information provided by the source distributors.
In particular, Swiss-Prot offers web services to keep track of the protein history, e.g.,
accession number changes, while Ensembl logs change events between released versions,
e.g., which proteins were replaced by others in a new version. Whereas the majority of
changes are additions (60% in Ensembl, 53% in Swiss-Prot), there is a surprising number
of deletions and changes, apparently influenced by some major reorganization such as
the introduction of new accession numbers. For example, in Swiss-Prot about 30% of all
evolution changes are annotation changes (Chg), which were primarily caused by instance
changes that keep the corresponding annotations alive instead of deleting them. By
contrast, annotation changes in Ensembl are dominated by quality (here: EC code)
changes. In both sources, ontology changes influence annotation changes only
marginally. One reason is that annotations are administered within the instance sources
while ontologies are developed independently of the instances. Finally, the number of
deletions is non-negligible in both sources, especially in Ensembl, where 32% of all
changes are annotation deletions.

      Swiss-Prot                                         Ensembl
EC         Add              Chg              Del              Add              Chg              Del
exp     15,751  48.2%     1,830  10.0%     1,784  17.0%    25,979   6.6%     5,826  12.2%     7,575   3.6%
auth    11,307  34.6%    15,177  83.3%     7,350  70.0%    34,046   8.7%    16,381  34.3%    29,148  14.0%
cur        339   1.0%        65   0.4%        73   0.7%     6,362   1.6%       300   0.6%     6,318   3.0%
comp     3,730  11.4%     1,107   6.1%     1,214  11.6%     6,734   1.7%     5,720  12.0%     4,362   2.1%
auto     1,541   4.7%        35   0.2%        81   0.8%   316,979  80.9%    18,344  38.4%   157,632  75.6%
obs          0   0.0%         0   0.0%         0   0.0%     1,826   0.5%     1,234   2.6%     3,550   1.7%
sum     32,668           18,214           10,502          391,926           47,805          208,585
Fig. 9. Distribution of the operations add, change, delete in different EC groups in Ensembl and
Swiss-Prot
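The version comparison described above (matching objects by accession number to derive sets of added and deleted objects) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the keying of an annotation by a (protein accession, GO term) pair, the function name `diff_versions` and the sample accessions are assumptions. Instance substitutions and merges (Chgins, Delins) are not covered, since the text states that they require history information from the source distributors.

```python
from typing import Dict, Set, Tuple

# Assumption: an annotation is identified by (protein accession, GO accession)
# and carries its evidence code as value.
Annotation = Tuple[str, str]
Version = Dict[Annotation, str]


def diff_versions(old: Version, new: Version) -> Dict[str, Set[Annotation]]:
    """Compare two released versions by accession keys."""
    old_keys, new_keys = set(old), set(new)
    common = old_keys & new_keys
    return {
        "add": new_keys - old_keys,   # present only in the newer version
        "del": old_keys - new_keys,   # present only in the older version
        # same annotation in both versions, but a different evidence code
        "chg_qual": {key for key in common if old[key] != new[key]},
    }


# Hypothetical sample data for two consecutive versions.
v1 = {("P12345", "GO:0005515"): "TAS", ("P12345", "GO:0006412"): "IEA"}
v2 = {("P12345", "GO:0005515"): "IDA", ("Q99999", "GO:0003677"): "IEA"}
changes = diff_versions(v1, v2)
```

Aggregating such per-version differences over all consecutive version pairs would yield operation counts of the kind reported in Figure 8.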
We now analyze the distribution of the evolution operations add, change and delete
for the different EC groups, as summarized in Figure 9. In Swiss-Prot about one half of
the additions are experimentally validated annotations and a third are auth
annotations. By contrast, change (83%) and delete operations (70%) primarily occur for
auth annotations, indicating a rather high instability for this provenance type.
Ensembl, on the other hand, predominantly adds and deletes automatically generated
annotations (81% and 75% of all adds/deletes, respectively). Annotation changes are
distributed mainly over automatically assigned (38%) and author statement annotations
(34%). In summary, the evolution of existing annotations occurs primarily for auto and
auth annotations.
     Add            Chg            Chgins  Chgont  Chgqual  Del             Delann   Delins  Delont
Sp    32,613 (53%)  18,214 (30%)   16,106      56    2,052   10,502 (17%)     8,511    1,369     622
E    391,771 (60%)  47,805  (8%)    4,310     171   43,324  208,585 (32%)   145,209   60,788   2,588
Fig. 8. Number (and percentage) of evolution operations aggregated over all versions in Swiss-
Prot (Sp) and Ensembl (E); Chg and Del are the totals of their respective sub-columns

We further analyze provenance (EC) changes in more detail to see which new EC
codes are chosen for improved annotation quality. The tables in Figure 10 aggregate
EC changes in Swiss-Prot and Ensembl for versions since March 2004. Each cell states
how many annotations changed from one evidence code (rows) to another (columns).
Note that we aggregate changes into the EC groups exp, auth, cur, comp, auto and obs,
e.g., changes from ISS to TAS are summarized in “from comp to auth” while changes
from IPI to IDA are mapped into “from exp to exp”. We observe that annotation changes
in Swiss-Prot primarily (72%) occur for author statement (auth) annotations and that
most new annotations (66%) are experimentally proven (exp). This shows the progress of
annotation development in recent years: biologically proven annotations are
increasingly used and are preferred over mere author statements. In Ensembl, the vast
number of automatically generated annotations leads to a somewhat different picture.
Only two EC groups, auto and exp, have a larger share among the new EC codes than
among the original ones. All other EC groups reduced their shares due to EC changes,
especially auth annotations. Most EC changes occurred, in both directions, between
auto and auth annotations, indicating a high instability of these provenance categories.
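The aggregation of evidence code changes into EC groups can be illustrated with a short sketch. Only the mappings ISS to comp, TAS to auth, and IPI/IDA to exp are stated in the text; the remaining entries of `EC_GROUP` follow the Gene Ontology evidence code categories and are assumed to match the EC taxonomy used here. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Evidence code -> EC group. ISS, TAS, IPI and IDA are taken from the text;
# all other entries are assumptions based on the GO evidence code categories.
EC_GROUP = {
    "EXP": "exp", "IDA": "exp", "IPI": "exp", "IMP": "exp", "IGI": "exp", "IEP": "exp",
    "TAS": "auth", "NAS": "auth",
    "IC": "cur",
    "ISS": "comp", "RCA": "comp", "IGC": "comp",
    "IEA": "auto",
    "NR": "obs",
}


def aggregate_ec_changes(changes: Iterable[Tuple[str, str]]) -> Counter:
    """Count (old code, new code) pairs per (from group, to group) cell."""
    matrix = Counter()
    for old_code, new_code in changes:
        matrix[(EC_GROUP[old_code], EC_GROUP[new_code])] += 1
    return matrix


def target_shares(matrix: Counter) -> Dict[str, float]:
    """Share of each target ("to") group among all EC changes."""
    total = sum(matrix.values())
    per_target = Counter()
    for (_, to_group), count in matrix.items():
        per_target[to_group] += count
    return {group: count / total for group, count in per_target.items()}


# Hypothetical change log: ISS->TAS, IPI->IDA, TAS->IDA.
matrix = aggregate_ec_changes([("ISS", "TAS"), ("IPI", "IDA"), ("TAS", "IDA")])
shares = target_shares(matrix)
```

The cells of `matrix` correspond to the cells of the tables in Figure 10, and `target_shares` corresponds to the percentages given for the target groups.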
4.2 Age and Stability Analysis
In addition to the evidence code (provenance) information, we now analyze the age
and stability measures introduced in Section 3. This analysis is performed on the
currently available annotations in the latest versions of Ensembl and Swiss-Prot. We
compare these annotations with all versions in the last three years (p), i.e., we use
the versions 26-52 of Ensembl and versions 47-56 of Swiss-Prot.
We map the age and stability values into the quality taxonomies mentioned in Section 2.
We differentiate three age groups: annotations that have existed for at most half a
year (novel), those that were generated between half a year and one and a half years
ago (middle), and annotations that are older than one and a half years (old). For the
stability criteria stabexis, stabqual and the combination stabcomb we use a minimum
threshold of 0.9 for stable annotations; lower values indicate unstable annotations.
Hence, a stable annotation must be present in at least 90% of the versions since its
first occurrence, and at most 10% quality (EC) changes can occur in the history of an
annotation. Note that we leave out all annotations with evidence code NR (not recorded)
and ND (no biological data available) since these annotations provide no valuable
information.
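The classification rules above can be sketched in a few lines. The exact definitions of stabexis, stabqual and stabcomb are given in Section 3 and are not repeated here, so the formulas below are assumptions derived from the verbal description: existence stability as the fraction of versions containing the annotation since its first occurrence, quality stability as one minus the fraction of EC changes, and the combination as the minimum of both. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STABILITY_THRESHOLD = 0.9  # minimum value for a stable annotation


@dataclass
class AnnotationHistory:
    # One entry per released version since the first occurrence: the evidence
    # code, or None if the annotation was temporarily missing. The first entry
    # is assumed to be a code, because the history starts at first occurrence.
    codes: List[Optional[str]]
    age_years: float  # time since the first occurrence


def age_group(history: AnnotationHistory) -> str:
    if history.age_years <= 0.5:
        return "novel"
    if history.age_years <= 1.5:
        return "middle"
    return "old"


def stab_exis(history: AnnotationHistory) -> float:
    # Assumed form: share of versions in which the annotation is present.
    present = sum(code is not None for code in history.codes)
    return present / len(history.codes)


def stab_qual(history: AnnotationHistory) -> float:
    # Assumed form: 1 minus the share of evidence code changes.
    present = [code for code in history.codes if code is not None]
    changes = sum(a != b for a, b in zip(present, present[1:]))
    return 1.0 - changes / len(present)


def classify(history: AnnotationHistory) -> Tuple[str, str]:
    comb = min(stab_exis(history), stab_qual(history))  # assumed combination
    label = "stable" if comb >= STABILITY_THRESHOLD else "unstable"
    return age_group(history), label


# Hypothetical histories: one EC change in 20 versions, and 3 gaps in 10 versions.
steady = AnnotationHistory(codes=["TAS"] * 19 + ["IDA"], age_years=2.0)
gappy = AnnotationHistory(
    codes=["IEA", None, None, "IEA", "IEA", None, "IEA", "IEA", "IEA", "IEA"],
    age_years=1.0,
)
```

With these assumed formulas, `steady` is classified as old and stable, and `gappy` as middle and unstable because it is present in only 70% of the versions.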
Swiss-Prot
from / to     exp    auth     cur    comp    auto     Sum   share
exp           147      24       0      42       1     214     10%
auth        1,121     270      34     165       0   1,590     72%
cur             7       9       0       3       0      19      1%
comp          160     197       7       0       0     364     16%
auto           16       4       0       1       0      21      1%
Sum         1,451     504      41     211       1   2,208
share         66%     23%      2%     10%      0%

Ensembl
from / to     exp    auth     cur    comp    auto     obs     Sum   share
exp           896     413      11   1,259   2,966       3   5,548     13%
auth        1,592     798      73   1,038  11,901      23  15,425     35%
cur            21      27       0      16     182       0     246      1%
comp        1,280   1,206      26       0   3,101       0   5,613     13%
auto        3,311  10,169     228   2,329       0     116  16,153     37%
obs            79     391       9      12     725       0   1,216      3%
Sum         7,179  13,004     347   4,654  18,875     142  44,201
share         16%     29%      1%     11%     43%      0%
Fig. 10. Evidence codes changes in Swiss-Prot (left) and Ensembl (right)

Figure 11 displays the classification results of our method for both annotation
sources. The 45,000 (263,000) annotations in Swiss-Prot (Ensembl) are classified
using the three mentioned criteria: provenance (rows) and age (columns), further
separated by the three stability criteria. White (grey) rows denote the number of
annotations that lie above (below) the stability threshold. Swiss-Prot covers
proportionately more old annotations (72%) than Ensembl (49%). By contrast, the use of
automatic annotations gives Ensembl a relatively high share (24%) of novel annotations.
Despite the high share of old annotations, only 4% of the Swiss-Prot annotations are
classified as unstable compared to 13% in Ensembl (using stabcomb). In other words, 96%
(87%) of the annotations in Swiss-Prot (Ensembl) are stable.

Swiss-Prot
                               old                      middle                      novel
EC    rows                 exis     qual     comb     exis     qual     comb     exis     qual     comb
exp   stable (white)      7,980    6,965    6,905    2,306    2,266    2,266    5,655    5,637    5,637
      unstable (grey)        84    1,099    1,159        0       40       40        0       18       18
auth  stable (white)     22,064   21,913   21,760    1,107    1,101    1,101    1,054    1,054    1,054
      unstable (grey)       169      320      473        0        6        6        0        0        0
cur   stable (white)        184      160      160       36       36       36      115      115      115
      unstable (grey)         0       24       24        0        0        0        0        0        0
comp  stable (white)      1,651    1,599    1,589      364      362      362      845      844      844
      unstable (grey)        16       68       78        0        2        2        0        1        1
auto  stable (white)         96       96       96       35       35       35    1,308    1,308    1,308
      unstable (grey)         1        1        1        0        0        0        0        0        0
sum   stable (white)     31,975   30,733   30,510    3,848    3,800    3,800    8,977    8,958    8,958
      unstable (grey)       270    1,512    1,735        0       48       48        0       19       19

Ensembl
                               old                      middle                      novel
EC    rows                 exis     qual     comb     exis     qual     comb     exis     qual     comb
exp   stable (white)      9,473    8,774    8,415    3,378    3,062    3,057    8,808    8,650    8,650
      unstable (grey)       641    1,340    1,699        9      325      330        0      215      158
auth  stable (white)     22,421   20,488   19,700    4,244    3,949    3,942    2,492    2,425    2,425
      unstable (grey)     1,024    2,957    3,745        9      124      311        0       35       67
cur   stable (white)        238      190      184       67       60       60      157      149      149
      unstable (grey)        15       63       69        0        7        7        0        8        8
comp  stable (white)      1,715    1,170    1,079      470      354      353      942      885      885
      unstable (grey)       198      743      834        7      303      124        0       32       57
auto  stable (white)     71,082   89,115   68,440   62,136   63,245   61,442   49,909   49,608   49,608
      unstable (grey)    21,392    3,359   24,034    1,818      709    2,512        0      301      301
sum   stable (white)    104,929  119,737   97,818   70,295   70,670   68,854   62,308   61,717   61,717
      unstable (grey)    23,270    8,462   30,381    1,843    1,468    3,284        0      591      591
Fig. 11. Classification of annotations in Swiss-Prot and Ensembl by provenance, age and
stability (columns exis, qual, comb: stabexis, stabqual, stabcomb); stab > 0.9 (white),
stab <= 0.9 (grey)
Considering the three stability criteria, one can recognize for both sources that
novel and middle-aged annotations are rarely classified as unstable due to their short
history compared to old annotations. Hence, we examine old annotations more closely
w.r.t. their stability. In Swiss-Prot the majority of unstable annotations is due to
EC changes (stabqual, stabcomb) while relatively few annotations had an existence
instability. Most of the existentially unstable annotations (stabexis) are of type auth
while the absolute majority of unstable Swiss-Prot annotations are of type exp. This is
in accordance with our observations for EC changes (Figure 10), where many annotations
changed to experimentally proven annotations. Such instabilities for the current
annotations may thus be seen as a provenance improvement. In Ensembl, instability is
primarily existential (stabexis), i.e., caused by the temporary non-existence of
annotations. The majority of unstable annotations occurs for auto (79%) and auth (12%)
annotations, confirming their high instability observed earlier.
Our assessment approach seems especially valuable for annotation sources such as
Ensembl that contain many unverified, automatically generated annotations. The
approach allows the identification of reliable and less reliable annotations w.r.t.
three significant criteria: age, provenance and stability. The measures stabexis and
stabqual are orthogonal and lead to different classification results. Users can thus
filter a set of annotations, e.g., using only those annotations that have existed for a
longer time, are experimentally proven or do not show existence or provenance
instabilities. For example, one may consider annotations as reliable if they are
stable, of middle or old age, and of manual provenance. For these criteria, 34,179
(36,790) annotations of Swiss-Prot (Ensembl) would qualify, i.e., 76% (14%) of all
available annotations. Naturally, the selection of quality criteria and the
corresponding thresholds (e.g., for age or stability) is highly dependent on the
application. Users could also be interested in novel or unstable annotations, as these
are under strong revision due to a high research interest.
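The counts of reliable annotations quoted above can be reproduced from Figure 11 by summing the stabcomb-stable cells of the old and middle age groups over the manual EC groups (exp, auth, cur, comp). The sketch below does this arithmetic; the cell values are transcribed from Figure 11, the totals are the sums of its sum rows over all three age groups, and the variable and function names are hypothetical.

```python
# stabcomb-stable counts per age group and EC group, transcribed from Figure 11.
STABLE_COMB = {
    "Swiss-Prot": {
        "old": {"exp": 6905, "auth": 21760, "cur": 160, "comp": 1589, "auto": 96},
        "middle": {"exp": 2266, "auth": 1101, "cur": 36, "comp": 362, "auto": 35},
    },
    "Ensembl": {
        "old": {"exp": 8415, "auth": 19700, "cur": 184, "comp": 1079, "auto": 68440},
        "middle": {"exp": 3057, "auth": 3942, "cur": 60, "comp": 353, "auto": 61442},
    },
}

# All classified annotations per source (old + middle + novel in Figure 11).
TOTAL = {"Swiss-Prot": 32245 + 3848 + 8977, "Ensembl": 128199 + 72138 + 62308}

MANUAL_GROUPS = ("exp", "auth", "cur", "comp")


def reliable_count(source: str) -> int:
    """Stable (stabcomb), middle or old, manually curated annotations."""
    return sum(
        STABLE_COMB[source][age][group]
        for age in ("old", "middle")
        for group in MANUAL_GROUPS
    )


for source in ("Swiss-Prot", "Ensembl"):
    count = reliable_count(source)
    print(f"{source}: {count} reliable ({100 * count / TOTAL[source]:.0f}%)")
```

Other selections, such as novel or unstable annotations only, follow the same pattern with a different choice of cells.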
The last aspect underlines that annotation instability is not necessarily a negative
feature but may indicate interesting objects or significant new biological findings.
Conversely, a high stability may be observed for objects of little interest. The
proposed evaluation method allows the selection of either stable or unstable annotations
and can thus meet the requirements of different applications and annotation use cases.
5 Related Work
Our work is related to the areas of ontology-based data quality and change
management, which have received only little attention so far. The current work on
change management mainly focuses on ontologies instead of annotations. There are
several approaches that investigate ontology versioning [14,15], define change
operations describing differences between two ontologies [17], and formalize the
evolution process [20,21]. In addition, only a few approaches analyze ontology
evolution quantitatively [10,24]. In [10] we utilized a generic framework to study the
evolution of existing ontologies and to quantify changes of annotation and ontology
mappings. Our approach in this paper refines the proposed framework by capturing
causes of mapping changes. Hence, we can quantify the changes that have been
influenced by ontology and instance changes (additions and deletions) and those
resulting from provenance changes, whereas [10] only quantifies added and deleted
mapping correspondences (annotations). Furthermore, we introduce and analyze several
quality indicators of annotations in this paper.
Data or information quality [19] has been primarily addressed in the context of data
integration [16,18]. In the life sciences, the quality of annotations, especially Gene
Ontology annotations including evidence codes, has been studied in [6,12,22]. In
particular, the case study in [6] assesses annotation quality using quality scores for
ECs, where the scores are defined intuitively by the authors. They show descriptive and
comparative statistics w.r.t. the quality scores and annotations in model eukaryotes.
Furthermore, [12] developed a method to estimate the error rate of curated sequence
annotations for a particular evidence code (ISS). The approach utilizes the GOSeqLite
database to compare annotations that were generated by sequence similarity with those
that were not. In [22] the authors recommend the utilization of ECs as an indicator of
annotation reliability. In addition, they show simple distribution statistics of
annotations for three self-defined classes (homology-based, literature-based and
others) and different species but do not examine the annotation evolution. In contrast
to previous work on annotation quality, we propose a generic evolution model allowing a
multidimensional analysis of annotations w.r.t. different quality taxonomies (age,
stability, provenance). The model makes heavy use of quantified evolutionary changes on
instance and ontology level but also includes annotation (quality) modifications.
Like our work, [23] provides stability measures to rate correspondences of available
mappings but is focused on ontology mappings interconnecting two ontologies. The idea
behind this approach is to consider the correspondence stability in addition to the
computed element similarity. By contrast, our approach in this paper focuses on
annotation mappings and takes multiple quality taxonomies into account to specify
and classify the quality of annotations.
6 Conclusion and Future Work
We propose a generic approach to estimate the quality of ontology-based annotations
by taking their evolution history into account. The approach considers instance and
ontology changes and their influence on annotation mappings. Our annotation model
supports different quality measures, such as provenance, age, and stability, and the
use of quality taxonomies. For provenance information we utilize existing information
on evidence codes. We propose different stability measures for annotations that take
temporary non-existence and provenance changes into account. Our approach can be used
in different scenarios, e.g., by various analysis applications to filter incoming
annotations and by annotation providers to improve their data quality, especially when
they integrate annotations from other data sources.
We applied our model-based approach in a comparative evaluation to study functional
protein annotations provided by two large life science annotation sources, namely
Swiss-Prot and Ensembl. We observed that most annotation changes are additions of
new annotations but there are also many changes and deletions of existing annotations.
Most of the annotation changes are caused by instance changes or evidence code
changes while ontology changes had a minor impact on existing annotations. We also
observed that new experimental findings frequently cause the evidence code of existing
annotations to be updated. High instability was observed for automatically generated
annotations (in Ensembl) and annotations based on author statements.
We see several directions for future work. First, our annotation model can be applied
to additional annotation data sets, e.g., for different species. Second, the proposed
approach can be utilized for enhancing instance-based matching techniques that
heavily depend on the reliability of input annotations. Likewise, the quality of
automatically generated annotations can probably be improved when they are based on
existing high quality annotations, e.g., to prevent verified annotations from being
overwritten by automatically determined ones or to mark them as new when they are
generated for the first time.