A Survey on Ontology Evaluation Methods
Joe Raad and Christophe Cruz
CheckSem Team, Le2i, University of Burgundy, Dijon, France,
Keywords: Semantic Web, Ontology, Evaluation.
Abstract: Ontologies nowadays have become widely used for knowledge representation, and are considered as the foundation of the Semantic Web. However, with their widespread usage, the question of their evaluation has become even more pressing. This paper addresses the issue of finding an efficient ontology evaluation method by presenting the existing ontology evaluation techniques and discussing their advantages and drawbacks. The presented techniques can be grouped into four categories: gold standard-based, corpus-based, task-based and criteria-based approaches.
1 INTRODUCTION
For most people, the World Wide Web has long been an indispensable means of providing and searching for information. However, searching the web in its current form usually returns a large number of irrelevant answers, while leaving behind some relevant ones. The main reason for these unwanted results is that existing Web resources are mostly understandable only by humans. Therefore, we can clearly see the necessity of extending this web and transforming it into a web of data that can also be processed and analysed by machines.
This extension of the web through defined standards is called the Semantic Web, also known as Web 3.0. This extended web will ensure that machines and human users share a common language, by annotating web pages with information on their contents. Such annotations will be given in some standardized, expressive language and make use of certain terms. Therefore, one needs ontologies to provide a description of such terms.
Ontologies are fundamental Semantic Web
technologies, and are considered as its backbone.
Ontologies define the formal semantics of the terms
used for describing data, and the relations between
these terms. They provide an “explicit specification
of a conceptualization” (Gruber, 1993). The use of
ontologies is rapidly growing nowadays, as they are
now considered as the main knowledge base for
several semantic services like information retrieval,
recommendation, question answering, and decision
making services. A knowledge base is a technology used to store complex information so that it can be used by a computer system. A knowledge base is to machines what the level of knowledge is to humans. A human's decision is not only affected by how that person thinks (which corresponds to reasoning for machines); it is significantly affected by the level of knowledge they have (the knowledge base for machines).
For instance, the relationship of the two terms
“Titanic” and “Avatar” does not exist at all for a
given person. But, another person identifies them as
related since these terms are both movie titles.
Furthermore, a movie addict strongly relates these
two terms, as they are not only movie titles, but
these movies also share the same writer and director.
We can see the influence and the importance of the
knowledge base (level of knowledge for humans) in
every resulting decision. Therefore we can state that
having a “good” ontology can massively contribute
to the success of several semantic services and
various knowledge management applications. In this
paper, we investigate what makes a “good” ontology
by studying different ontology evaluation methods
and discuss their advantages. These methods are
mostly used to evaluate the quality of automatically
constructed ontologies.
The remainder of this paper is organized as
follows. The next section presents an introduction on
ontologies and the criteria that need to be evaluated.
Section three presents different types of ontology
evaluation methods. Finally, before concluding, the
last section presents the advantages of each type of
evaluation method and proposes an evaluation
method based on the previous existing ones.
2 ONTOLOGIES
The word ontology is frequently used to mean
different things, (e.g. glossaries and data
dictionaries, thesauri and taxonomies, schemas and
data models, and formal ontologies and inference).
Despite having different functionalities, these
different knowledge sources are very similar and
connected in their main purpose to provide
information on the meaning of elements. Therefore, due to the similarity of these knowledge sources, and in order to simplify the issue, we use the term ontology in the rest of this paper, even though some of the cited papers consider taxonomies in their work.
An example of one of the most used knowledge
sources is the large English lexical database
WordNet. In WordNet, there are four commonly
used semantic relations for nouns, which are
hyponym/hypernym (is-a), part meronym/part
holonym (part-of), member meronym/member
holonym (member-of) and substance
meronym/substance holonym (substance-of). A
fragment of (is-a) relation between concepts in
WordNet is shown in Figure 1. We can also find
many other popular general purpose ontologies like
YAGO and SENSUS, and some domain specific
ontologies like UMLS and MeSH (for biomedical
and health related concepts), SNOMED (for clinical
healthcare concepts), GO (for gene proteins and all
concerns of organisms) and STDS (for earth-
referenced spatial data).
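As an illustration, WordNet-style relations can be stored as plain triples and queried directly. The following sketch is a toy store (all entries are illustrative assumptions, not actual WordNet data) covering the four noun relations listed above.

```python
# Toy, WordNet-style relation store for the four noun relations
# mentioned above: is-a, part-of, member-of, substance-of.
# All triples are illustrative assumptions, not real WordNet content.
RELATIONS = {
    ("car", "is-a", "vehicle"),
    ("wheel", "part-of", "car"),
    ("tree", "member-of", "forest"),
    ("oxygen", "substance-of", "water"),
}

def related(term, relation):
    """Return every target linked to `term` by `relation`."""
    return {t for (s, r, t) in RELATIONS if s == term and r == relation}

print(related("car", "is-a"))       # {'vehicle'}
print(related("wheel", "part-of"))  # {'car'}
```

Real WordNet exposes the same relations through richer APIs (e.g. hypernym and meronym accessors), but the underlying data model is the same set of typed links between concepts.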
However, the information provided by ontologies can be very subjective. This is mainly due to the fact that ontologies heavily depend on the level of knowledge of their creators (e.g. the case of an ontology constructed by human experts) or on their information sources (e.g. the case of an automatically constructed ontology).
Figure 1: A Fragment of (is-a) Relation in WordNet.
In addition, while being useful for many
applications, the size of ontologies can cause new
problems that affect different steps of the ontology
life cycle (d’Aquin et al., 2009). For instance, real
world domain ontologies, and especially complex
domain ontologies such as medicine, can contain
thousands of concepts. Therefore these ontologies
can be very difficult to create and normally require a
team of experts to be maintained and reused.
Another problem caused by large ontologies, is their
processing. Very large ontologies usually cause
serious scalability problems and increase the
complexity of reasoning. Finally, the most important
problem of large ontologies is their validation. Since ontologies are considered as reference models, one must ensure their evaluation from the view of two important perspectives (Hlomani & Stacey, 2014): quality and correctness. These two perspectives address several criteria (Vrandečić, 2009; Obrst et al., 2007; Gruber, 1995; Gómez-Pérez, 2004; Gangemi et al., 2005):
Accuracy states whether the definitions and descriptions of classes, properties, and individuals in an ontology are correct.
Completeness measures whether the domain of interest is appropriately covered by the ontology.
Conciseness states whether the ontology includes irrelevant elements with regard to the domain to be covered.
Adaptability measures how far the ontology anticipates its uses. An ontology should offer the conceptual foundation for a range of anticipated tasks.
Clarity measures how effectively the ontology communicates the intended meaning of the defined terms. Definitions should be objective and independent of the context.
Computational efficiency measures the ability of the used tools to work with the ontology, in particular the speed that reasoners need to fulfil the required tasks.
Consistency states that the ontology does not include or allow for any contradictions.
In summary, we can state that ontology
evaluation is the problem of assessing a given
ontology from the point of view of these previously
mentioned criteria, typically in order to determine
which of several ontologies would better suit a
particular purpose. In fact, an ontology contains both
taxonomic and factual information that need to be
evaluated. Taxonomic information includes
information about concepts and their association
usually organized into a hierarchical structure. Some
approaches evaluate taxonomies by comparing them
with a reference taxonomy or a reference corpus.
This comparison is based on comparing the concepts
of the two taxonomies according to one or several
semantic measures. However, semantic measure is a generic term covering several concepts (Raad et al., 2015):
Semantic relatedness, which is the most general
semantic link between two concepts. Two concepts
do not have to share a common meaning to be
considered semantically related or close, they can be
linked by a functional relationship or frequent
association relationship like meronym or antonym
concepts (e.g. Pilot “is related to” Airplane).
Semantic similarity, which is a specific case of semantic relatedness. Two concepts are considered similar if they share common meanings and characteristics, like synonym, hyponym and hypernym concepts (e.g. Old "is similar to" Ancient).
Semantic distance, which is the inverse of semantic relatedness, as it indicates how much two concepts are unrelated to one another.
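The path-based intuition behind these notions can be sketched on a toy is-a hierarchy. Both the taxonomy and the exact formula below are illustrative assumptions, not a measure taken from the surveyed papers; the point is only that similarity shrinks (and semantic distance grows) as the path between two concepts lengthens.

```python
# Toy is-a hierarchy (child -> parent); an illustrative assumption.
PARENT = {
    "cat": "feline", "feline": "mammal",
    "dog": "canine", "canine": "mammal",
    "mammal": "animal",
}

def ancestors(c):
    """Concept followed by its chain of is-a ancestors up to the root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain

def path_similarity(a, b):
    """1 / (1 + number of is-a edges between a and b via their LCA)."""
    pa, pb = ancestors(a), ancestors(b)
    common = next(x for x in pa if x in pb)     # lowest common ancestor
    dist = pa.index(common) + pb.index(common)  # the semantic distance
    return 1.0 / (1.0 + dist)

print(path_similarity("cat", "dog"))  # 0.2 (4 edges via "mammal")
print(path_similarity("cat", "cat"))  # 1.0
```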
The following section presents the different
existing types of ontology evaluation methods.
3 ONTOLOGY EVALUATION METHODS
Ontology evaluation is based on measures and
methods to examine a set of criteria. The ontology evaluation approaches basically differ in how many of these criteria are targeted, and in their main motivation for evaluating the ontology. These
existing approaches can be grouped into four
categories: gold standard, corpus-based, task-based,
and finally criteria based approaches.
This paper aims to distinguish between these
categories of approaches and their characteristics
while presenting some of the most popular works.
3.1 Gold Standard-based
Gold standard-based approaches, which are also known as ontology alignment or ontology mapping, are the most straightforward type of approach (Ulanov et al., 2010). This type of approach attempts to compare the learned ontology with a previously created reference ontology known as the gold standard. This gold standard represents an idealized outcome of the learning algorithm. However, finding a suitable gold ontology can be challenging, since it should be one that was created under similar conditions and with similar goals to the learned ontology. For this reason, some approaches create specific taxonomies with the help of human experts to use them as the gold standard, while other approaches prefer to use reliable, popular taxonomies in a similar domain as their reference taxonomy, since this saves a considerable amount of work.
For instance, Maedche and Staab (2002)
consider ontologies as two-layered systems,
consisting of a lexical and a conceptual layer. Based
on this core ontology model, this approach measures
similarity between the learned ontology and a
tourism domain ontology modelled by experts. It
measures similarity based on the notion of lexicon,
reference functions, and semantic cotopy, which are described in detail in (Maedche & Staab, 2002).
In addition, Ponzetto and Strube (2007) evaluate their taxonomy derived from Wikipedia by comparing it with two benchmark taxonomies. First, this approach maps the learned taxonomy to ResearchCyc using a lexeme-to-concept denotational mapper. Then it computes semantic similarity with WordNet using different scenarios and measures: Rada et al. (1989), Wu and Palmer (1994), Leacock and Chodorow (1998), and Resnik's measure (1995).
Treeratpituk et al. (2013) evaluate the quality of their taxonomy constructed from a large text corpus by
comparing it with six topic specific gold standard
taxonomies. These six reference taxonomies are
generated from Wikipedia using their proposed
GraBTax algorithm.
Zavitsanos et al. (2011) also evaluate the learned
ontology against a gold reference. This novel
approach transforms the ontology concepts and their
properties into a vector space representation, and
calculates the similarity and dissimilarity of the two
ontologies at the lexical and relational levels.
This type of approach is also used by Kashyap et al. (2005). They use the MEDLINE
database as the document corpus, and the MeSH
thesaurus as the gold standard to evaluate their
constructed taxonomy. The evaluation process
compares the generated taxonomy with the reference
taxonomy using two classes of metrics: (1) Content Quality Metric, which measures the overlap in the labels between the two taxonomies in order to compute precision and recall; (2) Structural Quality Metric, which measures the structural validity of the labels, i.e. when two labels appear in a parent-child relationship in one taxonomy, they should appear in a consistent relationship (parent-child or ancestor-descendant) in the other taxonomy.
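The two metric classes can be sketched as follows. The label sets and parent maps are toy assumptions; the code only illustrates the idea of lexical precision/recall and of checking that a parent-child pair in the learned taxonomy is an ancestor-descendant pair in the gold one.

```python
# Toy label sets and taxonomies; all data are illustrative assumptions.
gold_labels    = {"disease", "infection", "virus", "bacteria"}
learned_labels = {"disease", "virus", "symptom"}

# (1) Content Quality: precision/recall of the label overlap.
overlap   = gold_labels & learned_labels
precision = len(overlap) / len(learned_labels)   # 2/3
recall    = len(overlap) / len(gold_labels)      # 2/4

# (2) Structural Quality: child -> parent maps for both taxonomies.
gold_parent    = {"infection": "disease", "virus": "infection"}
learned_parent = {"virus": "disease"}

def is_ancestor(parent_map, anc, node):
    """True if `anc` lies on `node`'s chain of parents in `parent_map`."""
    while node in parent_map:
        node = parent_map[node]
        if node == anc:
            return True
    return False

# "virus under disease" in the learned taxonomy is structurally valid,
# since disease is an ancestor of virus in the gold taxonomy.
valid = all(is_ancestor(gold_parent, p, c)
            for c, p in learned_parent.items()
            if p in gold_labels and c in gold_labels)
print(precision, recall, valid)
```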
Gold standard-based approaches are efficient in
evaluating the accuracy of an ontology. High
accuracy comes from correct definitions and
descriptions of classes, properties and individuals.
Correctness in this case may mean compliance to
defined gold standards. In addition, since a gold standard represents an ideal ontology of the specific domain, comparing the learned ontology with this gold reference can efficiently evaluate whether the ontology covers the domain well and whether it includes irrelevant elements with regard to the domain.
3.2 Corpus-based
Corpus-based approaches, also known as data-driven approaches, are used to evaluate how well an ontology covers a given domain. The idea of this type of approach is to compare the learned ontology with the content of a text corpus that significantly covers the given domain. The advantage is to compare one or more ontologies with a corpus, rather than comparing one ontology with another existing one.
One basic approach is to perform an automated
term extraction on the corpus and simply count the
number of concepts that overlap between the
ontology and the corpus. Another approach is to use
a vector space representation of the concepts in both
the corpus and the ontology under evaluation in
order to measure the fit between them. In addition,
Brewster et al. (2004) evaluate the learned ontology
by firstly applying Latent Semantic Analysis and
clustering methods to identify keywords in a corpus.
Since every keyword can be represented in a
different lexical way, this approach uses WordNet to
expand queries. Finally, the ontology can be
evaluated by mapping the set of concepts identified
in the corpus to the learned ontology.
Similarly, Patel et al. (2003) evaluate the
coverage of the ontology by extracting textual data
from it, such as names of concepts and relations. The
extracted textual data are used as input to a text
classification model trained using standard machine
learning algorithms.
Since this type of evaluation approach can be
considered similar in many aspects to the gold-
standard based approach, the two types of
approaches practically cover the same evaluation
criteria: accuracy, completeness and conciseness. In
addition, the main challenge in this type of approach is similar to the challenge in the gold standard-based approaches, but easier to address: finding a corpus that covers the same domain as the learned ontology is notably easier than finding a well-represented domain-specific ontology. For example,
Jones and Alani (2006) use the Google search
engine to find a corpus based on a user query. After
extending the user query using WordNet, the first
100 pages from Google results are considered as the
corpus for evaluation.
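A minimal sketch of the basic term-overlap idea mentioned above: tokenise the corpus and count which ontology concepts appear in it. The corpus, concept list, and the naive plural handling are all illustrative assumptions.

```python
import re

# Toy corpus and concept set; illustrative assumptions only.
corpus = "The pilot landed the airplane. Airplane engines need fuel."
ontology_concepts = {"pilot", "airplane", "engine", "runway"}

# Naive term extraction: lowercase word tokens.
tokens = set(re.findall(r"[a-z]+", corpus.lower()))

# A concept counts as covered if it (or a crude plural) occurs.
covered = {c for c in ontology_concepts
           if c in tokens or c + "s" in tokens}
coverage = len(covered) / len(ontology_concepts)
print(covered, coverage)   # "runway" is absent -> coverage 0.75
```

Real systems replace this keyword match with proper term extraction, query expansion via WordNet, or vector-space comparison, as in the approaches above, but the coverage figure they produce has the same interpretation.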
3.3 Task-based
Task-based approaches try to measure how far an ontology helps to improve the results of a certain task. This type of evaluation considers that a given ontology is intended for a particular task, and evaluates it only according to its performance in this task, regardless of its structural characteristics.
For example, if one designs an ontology for
improving the performance of a web search engine,
one may collect several example queries and
compare whether the search results contain more
relevant documents if a certain ontology is used
(Welty et al., 2003).
Haase and Sure (2005) evaluate the quality of an
ontology by determining how efficiently it allows
users to obtain relevant individuals in their search. In
order to measure the efficiency, the authors
introduce a cost model to quantify the necessary
user’s effort to arrive at the desired information.
This cost is determined by the complexity of the
hierarchy in terms of its breadth and depth.
Task-based approaches are considered the most
efficient in evaluating the adaptability of an
ontology, by applying the ontology to several tasks
and evaluating its performance for these tasks. In
addition, task-based approaches are mostly used in
evaluating the compatibility between the used tool
and the ontology, and computing the speed to fulfil
the intended task. Finally, this type of approach can
also detect inconsistent concepts by studying the
performance of an ontology in a specified task.
3.4 Criteria based
Criteria-based approaches measure how far an ontology or taxonomy adheres to certain desirable criteria. One can distinguish between measures related to the structure of an ontology and more sophisticated measures.
3.4.1 Structure-based
Structure-based approaches compute various
structure properties in order to evaluate a given
taxonomy. For this type of measure, it is usually no
problem to have a fully automatic evaluation since
these measures are quite straightforward and easy to
understand. For instance, one may measure the
average taxonomic depth and relational density of
nodes. Others might evaluate taxonomies according
to the number of nodes, etc. For instance, Fernandez
et al. (2009) study the effect of several structural
ontology measures on the ontology quality. From
these experiments, the authors conclude that richly
populated ontologies with a high breadth and depth
variance are more likely to be correct. On the other
hand, Gangemi et al. (2006) evaluate ontologies
based on whether there are cycles in the directed graph underlying the ontology.
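The structural measures mentioned above (taxonomic depth, branching, cycle detection) can be sketched on a toy parent-to-children taxonomy; the taxonomy itself is an illustrative assumption.

```python
# Toy taxonomy, parent -> list of children; an illustrative assumption.
CHILDREN = {
    "entity": ["object", "event"],
    "object": ["vehicle", "animal"],
    "vehicle": ["car"],
}

def max_depth(node):
    """Number of nodes on the longest root-to-leaf path from `node`."""
    kids = CHILDREN.get(node, [])
    return 1 + max((max_depth(k) for k in kids), default=0)

def has_cycle(node, seen=()):
    """True if a node repeats along any downward path."""
    if node in seen:
        return True
    return any(has_cycle(k, seen + (node,))
               for k in CHILDREN.get(node, []))

# Average branching factor over non-leaf nodes.
inner = [n for n in CHILDREN if CHILDREN[n]]
avg_branching = sum(len(CHILDREN[n]) for n in inner) / len(inner)

print(max_depth("entity"))   # 4
print(avg_branching)         # 5/3
print(has_cycle("entity"))   # False: a valid is-a hierarchy is acyclic
```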
3.4.2 Complex and Expert based
There are a lot of complex ontology evaluation
measures that try to incorporate many aspects of
ontology quality. For example, Alani and Brewster
(2006) include several measures of ontology
evaluation in the prototype system AKTiveRank,
like the class match measure, density and betweenness, which are described in detail in (Alani & Brewster, 2006).
In addition, Guarino and Welty (2004) evaluate
ontologies using the OntoClean system, which is
based on philosophical notions like the essence,
identity and unity. These notions are used to
characterize relevant aspects of the intended
meaning of the properties, classes, and relations that
make up an ontology.
Lozano-Tello and Gómez-Pérez (2004) evaluate taxonomies based on the notion of a multilevel tree of
characteristics with scores, which includes design
qualities, cost, tools, and language characteristics.
Criteria based approaches are the most efficient
in evaluating the clarity of an ontology. The clarity
could be evaluated using simple structure-based
measures, or more complex measures like
OntoClean. In addition, this type of approach is capable of measuring the ability of the used tools to work with the ontology by evaluating ontology properties such as size and complexity.
Finally, criteria-based measures and especially the
more complex ones are efficient in detecting the
presence of contradictions by evaluating the axioms
in an ontology.
4 DISCUSSION
4.1 Overview
In section two, we presented the criteria that need
to be available in a “good” ontology. Then in section
three, we presented several ontology evaluation
methods that tackle some of these criteria. The
relationship between these criteria and methods is
more or less complex: criteria provide justifications
for the methods, whereas the result of a method will
provide an indicator for how well one or more
criteria are met. Most methods provide indicators for more than one criterion. Table I presents an overview
of the discussed ontology evaluation methods.
Table I: An overview of ontology evaluation methods.
It is difficult to construct a comparative table that
compares the ontology evaluation methods based on
their addressed criteria. This is mainly due to the
diversity of every evaluation approach, even the
ones that are grouped under the same category. In
Table I we present a comparison of the evaluation
methods, based on the previously presented criteria.
A darker colour in the table represents a better
coverage for the corresponding criterion.
Accuracy is a criterion that shows if the axioms
of an ontology comply with the domain knowledge.
A higher accuracy comes from correct definitions
and descriptions of classes, properties and
individuals. Evaluating if an ontology has a high
accuracy can typically be achieved by comparing the
ontology to a gold reference taxonomy or to a text
corpus that covers the domain.
Completeness measures if the domain of interest
is appropriately covered. An obvious method is to
compare the ontology with a text corpus that covers
significantly the domain, or with a gold reference
ontology if available.
Conciseness states whether the ontology includes irrelevant elements with regard to the domain to be covered, or redundant representations of the semantics. Comparing the
ontology to a text corpus or a reference ontology that
only contain relevant elements is an efficient method
to evaluate the conciseness of a given ontology. One
basic approach is to check if every concept in the
ontology (and its synonym) is available in the text
corpus or the gold ontology.
Adaptability measures how far the ontology
anticipates its use. In order to evaluate how efficiently the ontology can be used by new tools and in unexpected situations, it is recommended to apply the ontology in these new situations and evaluate its performance depending on the task.
Clarity measures how effectively the ontology
communicates the intended meaning of the defined
terms. Clarity depends on several criteria: definitions
should be objective and independent, ontologies
should use definitions instead of description for
classes, entities should be documented sufficiently
and be fully labelled in all necessary languages, etc.
Most of these criteria can ideally be evaluated using
criteria based approaches like OntoClean (Guarino
& Welty, 2004).
Computational Efficiency measures the ability
of the used tools to work with the ontology, in
particular the speed that reasoners need to fulfil the
required tasks. Some types of axioms, in addition to
the size of the ontology may cause problems for
certain reasoners. Therefore evaluating the
computational efficiency of an ontology could be
done by checking its performance in different tasks.
This will allow us to compute the compatibility
between the tool and the ontology, and the speed to
fulfil the task. Furthermore, structure based
approaches that evaluate the ontology size, in
addition to more sophisticated criteria based
approaches that evaluate the axioms of the ontology
can also prove to be a solution to evaluate the
computational efficiency in a given ontology.
Consistency describes that the ontology does not
include or allow for any contradictions. An example
for an inconsistency is the description of the element
Lion being “A lion is a large tawny-coloured cat that
lives in prides”, but having a logical axiom
ClassAssertion(ex:Type_of_chocolate ex:Lion).
Consistency can ideally be evaluated using criteria
based approaches that focus on axioms, or also can
be detected and evaluated according to the
performance of the ontology in a certain task.
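The Lion example can be sketched as a naive disjointness check: the individual is asserted both under a subclass of Animal and under Type_of_chocolate, while the two classes are declared disjoint. The mini ontology and the check below are illustrative assumptions, not a real OWL reasoner.

```python
# Toy ontology fragments; illustrative assumptions only.
subclass_of = {"Lion": {"Cat"}, "Cat": {"Animal"}}
type_assertions = {"lion_1": {"Lion", "Type_of_chocolate"}}
disjoint = {frozenset({"Animal", "Type_of_chocolate"})}

def superclasses(c):
    """A class together with all its (transitive) superclasses."""
    out = {c}
    for parent in subclass_of.get(c, ()):
        out |= superclasses(parent)
    return out

def inconsistent(individual):
    """True if the individual's inferred types hit a disjointness axiom."""
    classes = set()
    for c in type_assertions[individual]:
        classes |= superclasses(c)
    return any(frozenset({a, b}) in disjoint
               for a in classes for b in classes if a != b)

print(inconsistent("lion_1"))  # True: Animal vs Type_of_chocolate clash
```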
As shown in Table I, all types of approaches provide indicators for more than one criterion.
However, still none of the mentioned approaches
can evaluate an ontology according to all the
mentioned criteria. In order to target as many criteria
as possible, one can evaluate an ontology by
combining two or more type of approaches.
According to Table I, we clearly see the resemblance
of the gold standard and corpus based approaches.
We also see the resemblance of the criteria evaluated
by the task based and criteria based approaches,
despite having completely different evaluation
principles. Therefore, evaluating an ontology using a
gold standard based or a corpus based approach, in
addition of evaluating the ontology based on a task
based or criteria based approach can target at least
six out of seven evaluation criteria. However, the challenging part is to find the most efficient and compatible measures in every type of approach in order to succeed in combining two (or more) types of approaches.
4.2 Proposition
Having studied different ontology evaluation methods, which approach is the most efficient one? Unfortunately, we cannot conclude
from this survey which approach is the “best” to
evaluate an ontology in general. We believe that the
motivation behind evaluating an ontology can give
one approach the upper hand on the others. In this
context, and according to Dellschaft and Staab
(2008), we should distinguish between two
scenarios. The first scenario is choosing the best
approach to evaluate the learned ontology, and the
second scenario is choosing the best approach to
evaluate the ontology learning algorithm itself.
According to (Dellschaft & Staab, 2008), task-based, corpus-based and structure-based approaches are identified as more efficient in evaluating the learned ontology, while gold standard-based and complex and expert-based approaches are identified as more efficient in evaluating the ontology learning algorithm.
We propose, based on Porzel and Malaka's approach (2004), to evaluate the learned ontology using a task-based approach that also requires the use of a gold standard. For instance, let us consider that
the learned ontology is intended to be used in a
system that classifies a large number of documents.
This system will classify documents based on
several criteria like their themes and authors, and
will use the learned ontology as its knowledge base.
Therefore, the classification process is influenced by
two main factors: the classification algorithm and
the ontology being used as a knowledge base.
We propose to evaluate the ontology by
comparing the classification results obtained using
the automatically learned ontology with the
classification results obtained using a gold standard
ontology. We should mention that all the classification factors, and mainly the classification algorithm, should be kept unchanged between the two experiments. Figure 2 illustrates the evaluation process.
Figure 2: Ontology Evaluation Proposition.
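The proposed setup can be sketched as follows: the same documents are classified twice by an unchanged, deliberately trivial keyword classifier, with only the ontology swapped between the two runs. The documents, both ontologies, and the classifier are illustrative assumptions.

```python
# Toy keyword -> theme ontologies; illustrative assumptions only.
gold_ontology    = {"virus": "health", "goal": "sport", "match": "sport"}
learned_ontology = {"virus": "health", "goal": "sport"}  # misses "match"

docs = [("a new virus spreads", "health"),
        ("late goal wins the match", "sport"),
        ("match ends in a draw", "sport")]

def classify(doc, ontology):
    """Label a document by the first keyword found in the ontology."""
    for word in doc.split():
        if word in ontology:
            return ontology[word]
    return "unknown"

def accuracy(ontology):
    """Accuracy of the fixed classifier with a given knowledge base."""
    hits = sum(classify(d, ontology) == label for d, label in docs)
    return hits / len(docs)

print(accuracy(gold_ontology))     # 1.0
print(accuracy(learned_ontology))  # 2/3: the gap is the quality signal
```

The drop in accuracy when replacing the gold ontology by the learned one quantifies, for this task, how much the learned ontology falls short of the reference.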
With this proposition, we manage to cover the two mentioned scenarios and the maximum number of criteria by combining the task-based approach with the gold standard approach. This
approach benefits from the simplicity of the task-
based measures compared to the complexity of the
similarity measures used in the gold-standard based
approaches. It also benefits from the importance of
having an ideal reference ontology for comparison.
However it carries the main drawback of the gold-
standard based approaches, which is finding or
constructing a matching reference ontology to
compare the performance.
5 CONCLUSION
This survey can be considered as an introduction to a large topic. Finding an efficient approach to evaluate any ontology is still an unresolved issue, despite the large number of research efforts targeting it for many years.
After presenting several evaluation methods and discussing their drawbacks and advantages, our next objective is to directly compare the efficiency of our proposed approach with the other evaluation methods. Our aim is to finally have a unified (semi-)automatic approach to evaluate an ontology with minimum involvement of human experts.
In recent years, the development of ontology-based applications has increased considerably. This
growth increases the necessity of finding an efficient
approach to evaluate these ontologies. Finding
efficient evaluation schemes contributed heavily to
the overwhelming success of disciplines like
information retrieval, machine learning or speech
recognition. Therefore having a sound and
systematic approach to ontology evaluation is
required to transform ontology engineering into a
true scientific and engineering discipline.
In this paper, we presented the importance of
ontologies, and the criteria expected to be available
in these ontologies. Then we presented different
approaches that aim to guarantee the maintenance of some of these criteria in automatically constructed ontologies. These approaches can be grouped into four categories: gold standard-based, corpus-based, task-based, and finally criteria-based approaches. Finally,
we proposed an approach to evaluate ontologies by
combining the task-based and the gold-standard
approaches in order to cover the maximum number
of criteria.
ACKNOWLEDGEMENTS
The authors would like to thank the “Conseil
Régional de Bourgogne” for their valuable support.
REFERENCES
Alani, H., & Brewster, C. (2006). Metrics for ranking ontologies.
d’Aquin, M., Schlicht, A., Stuckenschmidt, H., & Sabou,
M. (2009). Criteria and evaluation for ontology
modularization techniques. In Modular ontologies (pp.
67-89). Springer Berlin Heidelberg.
Brewster, C., Alani, H., Dasmahapatra, S., & Wilks, Y.
(2004). Data driven ontology evaluation.
Dellschaft, K., & Staab, S. (2008, June). Strategies for the
evaluation of ontology learning. In Proceedings of the
2008 Conference on Ontology Learning and
Population: Bridging the Gap between Text and
Knowledge, Frontiers in Artificial Intelligence and
Applications (Vol. 167, pp. 253-272).
Zavitsanos, E., Paliouras, G., & Vouros, G. A. (2011, November). Gold standard evaluation of ontology learning methods through ontology transformation and alignment. IEEE Transactions on Knowledge and Data Engineering, 23(11), 1635-1648.
Fernández, M., Overbeeke, C., Sabou, M., & Motta, E.
(2009). What makes a good ontology? A case-study in
fine-grained knowledge reuse. In The semantic
web (pp. 61-75). Springer Berlin Heidelberg.
Gangemi, A., Catenacci, C., Ciaramita, M., & Lehmann, J.
(2005). Ontology evaluation and validation: an
integrated formal model for the quality diagnostic
task. Online: http://www.loa-cnr.it/Files/OntoEval4OntoDev_Final.pdf.
Gangemi, A., Catenacci, C., Ciaramita, M., & Lehmann, J.
(2006). Modelling ontology evaluation and
validation (pp. 140-154). Springer Berlin Heidelberg.
Gómez-Pérez, A. (2004). Ontology evaluation.
In Handbook on ontologies (pp. 251-273). Springer
Berlin Heidelberg.
Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 199-220.
Gruber, T. R. (1995). Toward principles for the design of
ontologies used for knowledge sharing. International
journal of human-computer studies, 43(5), 907-928.
Guarino, N., & Welty, C. (2004). An overview of OntoClean. In S. Staab & R. Studer (Eds.), Handbook on Ontologies. Springer Berlin Heidelberg.
Haase, P., & Sure, Y. (2005). D3.2.1 Usage Tracking for Ontology Evolution.
Hlomani, H., & Stacey, D. (2014). Approaches, methods, metrics, measures, and subjectivity in ontology evaluation: A survey. Semantic Web Journal.
Jones, M., & Alani, H. (2006). Content-based ontology ranking.
Kashyap, V., Ramakrishnan, C., Thomas, C., & Sheth, A.
(2005). TaxaMiner: an experimentation framework for
automated taxonomy bootstrapping. International Journal of Web and Grid Services, 1(2), 240-266.
Leacock, C., & Chodorow, M. (1998). Combining local
context and WordNet similarity for word sense
identification. WordNet: An electronic lexical
database, 49(2), 265-283.
Lozano-Tello, A., & Gómez-Pérez, A. (2004). Ontometric:
A method to choose the appropriate ontology. Journal
of Database Management, 15(2), 1-18.
Maedche, A., & Staab, S. (2002). Measuring similarity
between ontologies. In Knowledge engineering and
knowledge management: Ontologies and the semantic
web (pp. 251-263). Springer Berlin Heidelberg.
Obrst, L., Ceusters, W., Mani, I., Ray, S., & Smith, B.
(2007). The evaluation of ontologies. In Semantic
Web (pp. 139-158). Springer US.
Patel, C., Supekar, K., Lee, Y., & Park, E. K. (2003,
November). OntoKhoj: a semantic web portal for
ontology searching, ranking and classification.
In Proceedings of the 5th ACM international workshop
on Web information and data management (pp. 58-61).
ACM.
Ponzetto, S. P., & Strube, M. (2007, July). Deriving a
large scale taxonomy from Wikipedia. In AAAI (Vol.
7, pp. 1440-1445).
Porzel, R., & Malaka, R. (2004, August). A task-based
approach for ontology evaluation. In ECAI Workshop
on Ontology Learning and Population, Valencia, Spain.
Raad, J., Bertaux, A., & Cruz, C. (2015, July). A survey
on how to cross-reference web information sources.
In Science and Information Conference (SAI),
2015 (pp. 609-618). IEEE.
Rada, R., Mili, H., Bicknell, E., & Blettner, M. (1989).
Development and application of a metric on semantic
nets. IEEE Transactions on Systems, Man, and
Cybernetics, 19(1), 17-30.
Resnik, P. (1995). Using information content to evaluate
semantic similarity in a taxonomy. arXiv preprint
cmp-lg/9511007.
Treeratpituk, P., Khabsa, M., & Giles, C. L. (2013).
Graph-based Approach to Automatic Taxonomy
Generation (GraBTax). arXiv preprint.
Ulanov, A., Shevlyakov, G., Lyubomishchenko, N.,
Mehra, P., & Polutin, V. (2010, August). Monte Carlo
Study of Taxonomy Evaluation. In Database and
Expert Systems Applications (DEXA), 2010 Workshops
(pp. 164-168). IEEE.
Vrandečić, D. (2009). Ontology evaluation (pp. 293-313).
Springer Berlin Heidelberg.
Welty, C. A., Mahindru, R., & Chu-Carroll, J. (2003,
October). Evaluating ontological analysis. In Semantic
Integration Workshop (SI-2003) (p. 92).
Wu, Z., & Palmer, M. (1994, June). Verb semantics and
lexical selection. In Proceedings of the 32nd annual
meeting on Association for Computational
Linguistics (pp. 133-138). Association for
Computational Linguistics.