Towards Ontology Quality Assessment
Silvio Mc Gurk1, Charlie Abela1, and Jeremy Debattista2
1Department of Intelligent Systems, Faculty of ICT, University of Malta, Malta,
silvio.mcgurk.15@um.edu.mt, charlie.abela@um.edu.mt
2Enterprise Information Systems, Fraunhofer IAIS / University of Bonn, Germany,
debattis@cs.iai.uni-bonn.de
Abstract. The success of systems making use of ontology schemas depends
mainly on the quality of their underlying ontologies. This has been
acknowledged by researchers who responded by suggesting metrics to
measure different aspects of quality. Tools have also been designed, but
determining the set of quality metrics to use may not be a straightfor-
ward task. Research on ontology quality shows that detection of problems
at an early stage of the ontology development cycle is necessary to re-
duce costs and maintenance at later stages, which is more difficult to
achieve and requires more effort. Assessment using the right metrics is
therefore crucial to identify key quality problems. This ensures that the
data and instances of the ontology schema are sound and fit for purpose.
Our contribution is a systematic survey on quality metrics applicable to
ontologies in the Semantic Web, and preliminary investigation towards
methods to visualise quality problems in ontologies.
Keywords: ontology quality metrics, ontology engineering, ontology
evaluation, quality visualisation
1 Introduction
Many ontologies have been designed and developed over time, spanning a number
of domains and including a number of concepts. Ontologies have been used in var-
ious domains including gene ontologies [2] and as unification tools in biomedicine
[17], in education to enhance learning experiences [19] and in information re-
trieval systems [4]. As ontologies are being developed and reused, the need to
address quality issues becomes an important factor as having a true understand-
ing of the quality of an ontology helps future data publishers to choose ontologies
based on ‘fitness for use’ [13]. Extensive research has been carried out over the
years to help identify quality problems in ontologies [7, 23, 20, 3, 21, 10, 18, 11].
As a result of this research, a number of quality metrics have been suggested.
These are coupled with tools and quality frameworks [5, 15, 7, 23, 25, 21] that
assess either the data, the ontology schema, or both. Unlike in Linked Data
Quality [27] and Data Profiling [1],
there is still a lack of concentrated effort to consolidate the various approaches
and methods taken by different researchers to identify and obtain a subset of
metrics that best represent the quality of ontologies. More effort is also needed
to design tools that help ontology engineers, data producers and data publish-
ers, not only to obtain metric measures, but also provide valuable insights into
possible lack of quality in the ontologies under test. Visualisation tools have so
far been mainly used to obtain a visual representation of ontologies, but not as
an alternative way to visualise quality aspects.
The main objectives and contributions of this paper are the following:
Objective 1: Identify and survey existing ontology and data quality metrics
Contribution 1: This will be achieved through a systematic review of existing
literature on quality metrics that have been used in various research fields in-
cluding ontologies, database schemas, XML schemas, object-oriented designs,
software engineering and hierarchical designs in general.
Objective 2: Investigate frameworks and tools that enable the quality assess-
ment of ontologies and visualise different quality aspects
Contribution 2: In this article we will propose a preliminary framework that
merges two known Linked Data tools with regard to data quality and ontology
visualisation, in order to enable the visualisation of ontology quality.
The remaining sections of this paper are organised as follows: Section 2 presents
the methodology and initial results of the survey to identify important metrics.
The section shows how metrics are classified according to the categories and
dimensions pertaining to the ISO Standard 25012 for Data Quality. Section 3
discusses and reviews existing visualisation tools and proposes an alternative
way of looking at the quality of ontologies through the use of visualisation tech-
niques.
2 Classifying Quality Metrics for Ontologies
Various metrics have been proposed in recent years, some of which are now
widely accepted and implemented in a number of frameworks and tools, such
as those in OQuaRE [7], OntoQualitas [23] and OntoQA [25]. Yang, Z. et al.
[26] describe how the quality of an ontology should be managed and evaluated
in terms of its engineering and visualisation. The authors describe how quality
metrics help engineers in their ontology design, in that they are:
(1) expected to lessen the need for maintenance, and
(2) expected to provide means to find the most fit-for-use ontologies.
2.1 ISO/IEC 25012 Data Quality Standard
The ISO/IEC 25012 [12] is an approved standard, forming part of a series of
International Standards for Software Product Quality Requirements and Eval-
uation (SQuaRE). The model has been adopted in various areas, such as soft-
ware engineering [9], ontologies [6], and data on the World Wide Web and its
applications [22], to define quality measures and perform quality evaluations.
It categorises fifteen quality dimensions into three main categories. We aim to
classify the metrics using this standard since, for ontologies, we are interested
both in the inherent category (such as detecting inconsistencies) and in the
system category (such as detecting dereferenceability).
2.2 Survey Methodology
In order to ensure that research is thorough and fair, a systematic review was
deemed necessary. The review was carried out according to the methods men-
tioned in [14].
Search Strategy: Based on the objective of surveying quality metrics from
different research areas, several search terms deemed appropriate for this sys-
tematic review were used. These included:
data quality, assessment, evaluation, linked data, ontology quality, quality met-
rics, software quality metrics, database quality metrics.
Repositories: The following three repositories were considered in the survey:
ScienceDirect
IEEE Xplore Digital Library
ACM Digital Library
2.3 Metrics Survey
An exercise was carried out to map the metrics identified in the survey to a cate-
gory and dimension of the ISO/IEC 25012 Data Quality Standard. The standard
identifies three categories, as follows:
The Inherent Category caters for metrics that measure the degree to which
the model itself has quality characteristics of intrinsic nature to satisfy ‘fitness
for use’. This includes domain values, relationships and other metadata. In our
work, we refer to the accuracy, completeness, consistency and currentness di-
mensions of this category. The System Category refers to quality metrics that
measure the degree to which quality is maintained when the system is under
specific use, and includes availability, reliability and portability. The Inherent-
System Category includes dimensions that look at both Inherent and System
aspects, such as compliance and understandability, to which we make reference
in our work.
Table 1 to Table 7 show the metrics in their respective dimensions. Some met-
rics may belong to multiple dimensions or categories; however, we categorise
each metric under the most appropriate dimension.
Inherent Category Metrics Table 1 to Table 4 show the association of the
metrics to the ISO 25012 Inherent Category. For example IA refers to the asso-
ciation between the Inherent Category and the Accuracy dimension.
Table 1. Accuracy Dimension
Ref. Metric Dimension Reference
IA1 Incorrect Relationship Accuracy [20], [21]
IA2 Merging of Different Concepts in same Class Accuracy [21]
IA3 Hierarchy Overspecialisation Accuracy [21], [3]
IA4 Using a Miscellaneous Class Accuracy [21]
IA5 Chain of Inheritance Accuracy [3]
IA6 Class Precision Accuracy [23]
IA7 Number of Deprecated Classes and Properties Accuracy [11]
IA1: Incorrect Relationship: An incorrect relationship typically occurs with
the vague use of ‘is’, instead of ‘subClassOf’, ‘type’ or ‘sameAs’. As mentioned in
[20], the correct use of the type of relationship is required to accurately represent
the domain. As explained by [21], the relationship ‘rdfs:subClassOf’ is reserved
for subclass relationship, ‘rdf:type’ for objects that belong to a particular class,
and ‘owl:sameAs’ is used to indicate that two instances are equivalent.
IA2: Merging of Different Concepts in same Class: Every different con-
cept should be in its own class. The anomaly occurs when two different concepts
are put in the same class.
IA3: Hierarchy Overspecialisation: Overspecialisation occurs when a leaf
class of an ontology (a class that is not a superclass of some other classes) does
not have any instances associated with it.
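As a rough sketch of how such a check could look (the dict-based representation and the function name are ours, not from the paper), assuming the hierarchy maps each class to its subclasses and a second dict maps classes to their instances:

```python
# Sketch (illustrative names): flag leaf classes that have no instances.

def overspecialised_leaves(subclasses, instances_of):
    """Return leaf classes (no subclasses) with no associated instances."""
    all_classes = set(subclasses) | {c for cs in subclasses.values() for c in cs}
    leaves = {c for c in all_classes if not subclasses.get(c)}
    return {c for c in leaves if not instances_of.get(c)}

# 'Pear' is a leaf with no instances, so it is flagged as overspecialised.
hierarchy = {"Fruit": ["Apple", "Pear"]}
instances = {"Apple": ["apple1"]}
print(overspecialised_leaves(hierarchy, instances))  # {'Pear'}
```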
IA4: Using a Miscellaneous Class: A class within the hierarchy of the on-
tology which is simply used to represent instances that do not belong to any
of its siblings. For instance, having the class ‘Fruit’ with subclasses ‘Orange’,
‘Apple’, ‘Pear’ and ‘Miscellaneous’. The ‘Miscellaneous’ class might simply be
capturing the rest of the fruits, without any distinction between them, thereby
lacking accuracy.
IA5: Chain of Inheritance: An undesirable inheritance chain may occur when
a long section of an ontology exists in which each class has only one subclass
(for example, a chain of six classes, each with a single subclass and no siblings).
This might mean that some aggregation of the concepts defined in that section
is required.
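A minimal sketch of detecting such chains, again assuming a dict mapping each class to its list of subclasses (the function and threshold are illustrative, not from any ontology toolkit):

```python
# Sketch (assumed representation): find maximal single-child inheritance chains.

def inheritance_chains(subclasses, min_length=3):
    """Return chains of at least min_length classes, each with exactly one subclass."""
    all_classes = set(subclasses) | {c for cs in subclasses.values() for c in cs}
    # a class starts a chain only if it is not itself someone's only child
    single_children = {cs[0] for cs in subclasses.values() if len(cs) == 1}
    chains = []
    for cls in all_classes:
        if cls in single_children:
            continue
        chain = [cls]
        while len(subclasses.get(chain[-1], [])) == 1:
            chain.append(subclasses[chain[-1]][0])
        if len(chain) >= min_length:
            chains.append(chain)
    return chains

print(inheritance_chains({"A": ["B"], "B": ["C"], "C": ["D"]}))  # [['A', 'B', 'C', 'D']]
```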
IA6: Class Precision: This metric is calculated over a given frame of reference
(existing resources or sources of data with which the ontology may be evaluated)
and tests precision of the ontology. It is defined as the cardinality of the inter-
section between classes in the ontology and classes in the frame, divided by the
total number of classes in the ontology. Effectively this is a percentage of the
number of classes common between the ontology and the test data source, with
respect to the total number of classes in the ontology. For example, assuming
an ontology of fifty classes, of which, forty are present in the test data source,
the ontology precision would be 80%; the remaining 20% of the ontology is not
relevant to the test data source.
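The definition above reduces to a one-line set computation; a sketch with illustrative names (the sets of class identifiers are assumed inputs):

```python
# Sketch: class precision = |ontology classes ∩ frame classes| / |ontology classes|

def class_precision(ontology_classes, frame_classes):
    onto = set(ontology_classes)
    if not onto:
        return 0.0
    return len(onto & set(frame_classes)) / len(onto)

# 40 of the 50 ontology classes appear in the frame of reference -> 0.8
onto = {f"C{i}" for i in range(50)}
frame = {f"C{i}" for i in range(40)}
print(class_precision(onto, frame))  # 0.8
```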
IA7: Number of Deprecated Classes and Properties: This metric ad-
dresses parts of an ontology which are marked as deprecated, identified by
‘owl:DeprecatedClass’ or ‘owl:DeprecatedProperty’. Deprecated sections are nor-
mally not updated anymore and might be superseded by newer classes or prop-
erties. This problem could either be within the ontology itself, or pointing to
external references that have since been deprecated. It must be noted here that,
having an ontology with a deprecated class or property is not necessarily a qual-
ity problem. In fact, in certain situations it might be desirable to leave the classes
and properties within the ontology and mark them as deprecated (rather than
deleting them), as there might be other ontologies that are currently referencing
the deprecated elements. Deleting those elements might make the other ontolo-
gies unusuable. What we mean here is that, new ontologies developed after an
element or property has been deprecated, should not ideally make use of those
elements (but rather use the new elements).
Table 2. Completeness Dimension
Ref. Metric Dimension Reference
IC1 Number of Isolated Elements Completeness [21]
IC2 Missing Domain or Range in Properties Completeness [21]
IC3 Class Coverage Completeness [23]
IC4 Relation Coverage Completeness [23]
IC1: Number of Isolated Elements: Elements, including classes, properties
and datatypes are considered isolated if they do not have any relation to the rest
of the ontology (declared but not used).
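One plausible way to compute this, assuming the ontology is given as a set of declared element names plus a list of triples (representation and names are ours):

```python
# Sketch: an element is isolated if it is declared but appears in no triple
# relating it to the rest of the ontology.

def isolated_elements(declared, triples):
    """Return declared elements that appear in no (subject, predicate, object) triple."""
    used = set()
    for s, p, o in triples:
        used.update((s, p, o))
    return set(declared) - used

triples = [("Apple", "subClassOf", "Fruit")]
print(isolated_elements({"Apple", "Fruit", "Vegetable"}, triples))  # {'Vegetable'}
```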
IC2: Missing Domain or Range in Properties: Properties should be ac-
companied by their domain and range. Missing information about the properties
may cause lack of completeness and may result in less accuracy and more in-
consistencies. This does not necessarily indicate a quality problem: there might
be cases, for instance in Linked Data, where it is desirable for a property to be
open (not bound to a particular domain or specific range).
IC3: Class Coverage: This metric is calculated over a given frame of refer-
ence and determines the amount of coverage of a given ontology. It is defined as
the cardinality of the intersection between classes in the ontology and classes in
the frame, divided by the total number of classes in frame. Effectively this is a
percentage of the number of classes common between the ontology and the test
data source, with respect to the total number of classes in the test data source.
For example, assuming a test data source of sixty classes, of which, forty are
present in the ontology, the ontology coverage would be 67%; the remaining
33% of the test data source is not covered by the ontology.
IC4: Relation Coverage: This is similar to class coverage, but is defined as
the cardinality of the intersection between relations in the ontology and relations
in the frame, divided by the total number of relations in frame.
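Coverage (IC3, IC4) is the mirror image of precision, dividing by the size of the frame of reference instead of the ontology; a sketch that works for either classes or relations (input sets are assumed):

```python
# Sketch: coverage = |ontology elements ∩ frame elements| / |frame elements|

def coverage(ontology_elements, frame_elements):
    """Coverage over a frame of reference, for classes (IC3) or relations (IC4)."""
    frame = set(frame_elements)
    if not frame:
        return 0.0
    return len(set(ontology_elements) & frame) / len(frame)

# 40 of the 60 frame classes are present in the ontology -> ~0.67
print(round(coverage({f"C{i}" for i in range(40)}, {f"C{i}" for i in range(60)}), 2))  # 0.67
```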
Table 3. Consistency Dimension
Ref. Metric Dimension Reference
IO1 Number of Polysemous Elements Consistency [21]
IO2 Including Cycles in a Class Hierarchy Consistency [20],[10],[21]
IO3 Missing Disjointness Consistency [20],[10],[21]
IO4 Defining Multiple Domains/Ranges Consistency [21]
IO5 Creating a Property Chain with One Property Consistency [21]
IO6 Lonely Disjoints Consistency [3]
IO7 Tangledness (two methods) Consistency [7]
IO8 Semantically Identical Classes Consistency [23]
IO1: Number of Polysemous Elements: The number of properties, objects or
datatypes that are referred to by the same identifier. A quality issue arises if, in a
given ontology, there are multiple classes and/or properties which are concep-
tually different but have the same identifier. For example, ‘man’ might refer to
different but related concepts, such as referring to ‘the human species’ or a ‘male
person’.
IO2: Including Cycles in a Class Hierarchy: Identified by [10] as circu-
latory errors, this condition typically occurs, for example, when a class C1 is
defined as a superclass of class C2, and C2 is defined as a superclass of C1 at the
same time. C1 and C2 may not necessarily be directly linked, thus cycles may
form at different depths, d.
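Such cycles, at any depth, can be detected with a standard depth-first search over the superclass edges; a sketch assuming a dict mapping each class to its direct superclasses (names are illustrative):

```python
# Sketch: detect cycles in a class hierarchy by following superclass links.

def has_hierarchy_cycle(superclasses):
    """True if following superclass links from any class revisits a class."""
    visited, in_progress = set(), set()

    def visit(cls):
        if cls in in_progress:   # back-edge: a cycle
            return True
        if cls in visited:
            return False
        in_progress.add(cls)
        for parent in superclasses.get(cls, []):
            if visit(parent):
                return True
        in_progress.discard(cls)
        visited.add(cls)
        return False

    return any(visit(c) for c in superclasses)

# C1 subClassOf C2 and C2 subClassOf C1 -> a cycle at depth 1
print(has_hierarchy_cycle({"C1": ["C2"], "C2": ["C1"]}))  # True
print(has_hierarchy_cycle({"C2": ["C1"]}))                # False
```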
IO3: Missing Disjointness: Gomez-Perez et al. [10] specify that subclasses
of a class which are disjoint from each other (a subclass can only be of one type)
should declare this disjointness in the ontology.
IO4: Defining Multiple Domains/Ranges: Multiple domains and ranges
are allowed, however, these should not be in conflict with each other (that is,
no two domains or ranges should contradict each other). A quality issue arises
when multiple definitions are inconsistent.
IO5: Creating a Property Chain with One Property: This metric refers
to the use of the OWL construct ‘owl:propertyChainAxiom’ to set a property as
being composed of several other properties. The anomaly occurs when a prop-
erty chain includes only one property in the compositional part. For example,
declaring the property ‘grandparent’ as a property chain, but including only one
property ‘parent’ within it (instead of the required two ‘parent’ properties).
IO6: Lonely Disjoints: As mentioned in [3], a class C is referred to as a lonely
disjoint when the ontology specifies that this class is disjoint with some other
classes CA and CB, but C is not a sibling of CA and CB.
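Under one plausible reading, "siblings" means sharing a direct superclass; a sketch of the check under that assumption (the representation and names are ours, not from [3]):

```python
# Sketch: a disjointness declaration is "lonely" if the two classes
# share no direct superclass (i.e. they are not siblings).

def lonely_disjoints(superclasses, disjoint_pairs):
    """Return declared-disjoint pairs whose members are not siblings."""
    lonely = []
    for a, b in disjoint_pairs:
        parents_a = set(superclasses.get(a, []))
        parents_b = set(superclasses.get(b, []))
        if not (parents_a & parents_b):
            lonely.append((a, b))
    return lonely

# 'Car' and 'Apple' are declared disjoint but are not siblings.
print(lonely_disjoints({"Car": ["Vehicle"], "Apple": ["Fruit"]},
                       [("Car", "Apple")]))  # [('Car', 'Apple')]
```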
IO7: Tangledness: This is defined as the mean number of classes with more
than one direct ancestor. Another measure of tangledness is defined as the mean
number of direct ancestors of classes with more than one direct ancestor.
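A sketch of both measures as we read them, taking the first to be the proportion of classes with more than one direct ancestor (the representation, reading, and names are our own assumptions):

```python
# Sketch: the two tangledness measures over a dict of direct superclasses.
# Only classes appearing as keys are counted.

def tangledness(superclasses):
    """Return (ratio of multi-parent classes, mean parent count among them)."""
    multi = {c: ps for c, ps in superclasses.items() if len(ps) > 1}
    n = len(superclasses)
    ratio = len(multi) / n if n else 0.0
    mean_parents = (sum(len(ps) for ps in multi.values()) / len(multi)) if multi else 0.0
    return ratio, mean_parents

# One of three classes has two direct ancestors.
ratio, mean = tangledness({"A": [], "B": ["A"], "C": ["A", "B"]})
print(round(ratio, 2), mean)  # 0.33 2.0
```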
IO8: Semantically Identical Classes: This anomaly occurs when an ontology
includes multiple classes with the same semantics (referring to the same concept).
Table 4. Currentness Dimension
Ref. Metric Dimension Reference
IU1 Freshness Currentness [18]
IU1: Freshness: This is defined by [18] as a measure indicating how up to date
a given piece of information is. The authors define a similar metric, ‘newness’,
as a measure of whether data was created in a timely manner.
Inherent-System Category Metrics Table 5 and Table 6 show the associa-
tion of metrics to the ISO 25012 Inherent-System Category (IS).
Table 5. Compliance Dimension
Ref. Metric Dimension Reference
ISM1 No OWL Ontology Declaration Compliance [21]
ISM2 Ambiguous Namespace Compliance [21]
ISM3 Namespace Hijacking Compliance [21]
ISM4 Number of Syntax Errors Compliance [11]
ISM1: No OWL Ontology Declaration: Ontologies should provide the
‘owl:Ontology’ declaration, which includes metadata specific to the ontology,
such as version, license and dates, and which makes reference to other ontologies.
ISM2: Ambiguous Namespace: The absence of the ontology URI and the
namespace ‘xml:base’ will cause the ontology namespace to be matched to its
location. This may result in an unstable ontology whose namespace changes
depending on its location.
ISM3: Namespace Hijacking: Hijacking occurs when an ontology makes ref-
erence to terms T, properties P or objects O from another namespace K, where
that namespace K does not really have any definitions for T, P and O.
ISM4: Number of Syntax Errors: This is a running total of the number of
syntax errors found in a given ontology.
Table 6. Understandability Dimension
Ref. Metric Dimension Reference
ISU1 Missing Annotations Understandability [21]
ISU2 Property Clumps Understandability [3]
ISU3 Using Different Naming Conventions Understandability [21]
ISU1: Missing Annotations: Elements of an ontology should have human
readable annotations that label them, such as the use of ‘rdfs:label’ or the label
‘skos:prefLabel’.
ISU2: Property Clumps: Clumps occur when a collection of elements (prop-
erties, objects) is included as a group in a number of class definitions. In such
cases, [3] argue that the ontology may be improved by defining an abstract con-
cept as an aggregation of the clump. A trivial example would be the common use
of properties ‘house’, ‘street’, ‘town’ and ‘country’, together in different places
within an ontology. An abstract single concept ‘address’ may be defined to in-
clude such properties.
ISU3: Using Different Naming Conventions: This is an inconsistency in
the way concepts, classes, properties and datatypes are written.
System Category Metrics Table 7 shows the association of metrics to the
ISO 25012 System Category (S).
Table 7. Availability Dimension
Ref. Metric Dimension Reference
SA1 Dereferenceability Availability [21]
SA1: Dereferenceability: This indicates whether a given ontology is readily
available online.
3 Visualisation
3.1 Visualising Ontologies
Various attempts have been made at visualising ontologies, mostly representing
them as graphs which depict the way concepts are connected together. Typically,
these attempts render force-directed hierarchical structures that present a nice,
intuitive and useful way of displaying ontologies. Lohmann et al. [16] argue
that most visualisations are lacking in some respect. Some implementations,
such as OWLViz3 and OntoTrack [15], just present the user with the hierarchy
of concepts. Other systems provide more detail but lack aspects such as data-
types and characteristics that are necessary to better understand what ontolo-
gies are really representing. These include systems such as OntoGraf4 and
FlexViz [8].
The authors further argue that VOWL is built with a comprehensive language
for representation and visualisation of ontologies which can be understood by
both engineers with expertise in ontologies and design, as well as by others who
may be less knowledgeable in the area. Their implementation is designed for the
Web Ontology Language, OWL. This, along with the fact that VOWL is released
under the MIT license and is freely available and extensible, is the main reason
why it is used in this work to study how visualisation techniques may help
ontology engineers and users to assess quality.
3.2 Visualising Ontology Quality - A Preliminary Investigation in
Building a Pipeline between Luzzu and VOWL
In order to tackle Objective 2, we try to merge the efforts made in Linked Data
quality assessment frameworks and ontology visualisation tools. To achieve
this, we plan to investigate the outcomes of Luzzu [5], and re-use its interopera-
ble quality results and problem reports within VOWL [16], in a proposed system
(work in progress) as shown in Figure 1.
Luzzu was selected since it is a generic assessment framework, allowing for
the custom definition of quality metrics. Furthermore, the output generated by
Luzzu following the quality assessment is interoperable, in the sense that we
can use the same schemas Luzzu uses to output the problem report and quality
metadata in order to visualise ontology quality in VOWL. Our aim is to create
an additional layer on top of VOWL to visualise ontology quality and identify
quality weaknesses, as shown in Figure 2.
Areas of interest among concepts and properties are calculated according to
the number of different metrics, the different groups and the nature of the met-
rics that fail. Different methods and visualisation techniques will be studied to
determine how these can help ontology engineers and users to visualise quality
3http://protegewiki.stanford.edu/wiki/OWLViz
4http://protegewiki.stanford.edu/wiki/OntoGraf
Fig. 1. Proposed System
problems as clearly as possible in such a way that they could be easily under-
stood and interpreted correctly. The system would provide information about
which metrics were used in the assessment, in such a way that it would be pos-
sible to compare two visualised quality assessments with different metrics and
evaluate the effect on the given ontology.
Figure 2 shows an ontology which has been subjected to analysis. The three
areas identified (highlighted) represent locations of the ontology which failed
one or more tests. In this particular example, concept C5 failed a number of
tests, represented here by the overlap of the three highlighted groups. An inter-
pretation of this could be that concept C5 might require immediate attention
since it has a higher degree of weakness.
Fig. 2. Projecting Metric Information onto the Visualised Ontology
4 Final Remarks and Future Work
Ontological quality is desirable given the popularity and the important role of
ontologies in communication and sharing of information across systems. This
work aims at providing a comprehensive view of quality metrics for ontologies. It
also looks at how visualisations can help in this process. These aims are addressed
through a survey of existing metrics from the literature, drawn from different
areas of computing. Correlation tests will be performed
to determine sets of metrics that address the same aspects of quality. The results
of the survey and correlation tests will help in identifying metrics that will
then be implemented in the Luzzu framework. Ontologies are assessed using
this framework, and its quality metadata and problem reports are fed into the
VOWL framework, whereby an additional layer will be implemented to provide
a visualisation of the quality assessment for the given ontology. As a result, we
aim to provide an alternative and more intuitive way of looking at the level of
quality in an ontology, achieved through visualisation techniques.
References
1. Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. The
VLDB Journal. 24, 557-581 (2015).
2. Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., Cherry, J., Davis, A.,
Dolinski, K., Dwight, S., Eppig, J., Harris, M., Hill, D., Issel-Tarver, L., Kasarskis,
A., Lewis, S., Matese, J., Richardson, J., Ringwald, M., Rubin, G., Sherlock, G.:
Gene Ontology: tool for the unification of biology. Nature Genetics. 25, 25-29
(2000).
3. Baumeister, J., Seipel, D.: Smelly owls - Design anomalies in ontologies. Proceed-
ings of the Eighteenth International Florida Artificial Intelligence Research Soci-
ety Conference, FLAIRS 2005 - Recent Advances in Artifical Intelligence. 215-220
(2005).
4. Besbes, G., Baazaoui-Zghal, H.: Modular ontologies and CBR-based hybrid system
for web information retrieval. Multimedia Tools and Applications. 74, 8053-8077
(2014).
5. Debattista, J., Auer, S., Lange, C.: Luzzu - A Methodology and Framework for
Linked Data Quality Assessment. Journal of Data and Information Quality. 8, 1-32
(2016).
6. Duque-Ramos, A., Boeker, M., Jansen, L., Schulz, S., Iniesta, M., Fernández-Breis,
J.: Evaluating the Good Ontology Design Guideline (GoodOD) with the Ontol-
ogy Quality Requirements and Evaluation Method and Metrics (OQuaRE). PLoS
ONE. 9, e104463 (2014).
7. Duque-Ramos, A., Fernández-Breis, J., Iniesta, M., Dumontier, M., Egaña
Aranguren, M., Schulz, S., Aussenac-Gilles, N., Stevens, R.: Evaluation of the
OQuaRE framework for ontology quality. Expert Systems with Applications. 40,
2696-2703 (2013).
8. Falconer, S.M., Callendar, C., Storey, M.: FLEXVIZ: Visualizing Biomedical On-
tologies on the Web. International Conference on Biomedical Ontology, Software
Demonstration, Buffalo, NY. 0-1, (2009).
9. Febrero, F., Calero, C., Angeles Moraga, M.: Software reliability modeling based
on ISO/IEC SQuaRE. Information and Software Technology. 70, 18-29 (2016).
10. Gomez-Perez, A., Fernandez-Lopez, M., Corcho, O.: Ontological engineering.
Springer, London. (2010).
11. Hogan, A., Harth, A., Passant, A., Decker, S., Polleres, A.: Weaving the pedantic
Web. CEUR Workshop Proceedings. 628 (2010).
12. ISO: ISO/IEC 25012:2008, Software engineering. Software product quality require-
ments and evaluation (SQuaRE). Data quality model. Report, International Or-
ganization for Standardization (2009).
13. Juran, J., Godfrey, A.: Juran’s Quality Handbook (5th Edition). McGraw-Hill
Professional Publishing, New York, USA. (1998).
14. Kitchenham, B.: Procedures for performing systematic reviews. Technical re-
port, Joint Technical Report Keele University Technical Report TR/SE-0401 and
NICTA Technical Report 0400011T.1 (2004).
15. Liebig, T., Noppens, O.: OntoTrack - A New Ontology Authoring Approach. 4
(2004).
16. Lohmann, S., Negru, S., Haag, F., Ertl, T.: Visualizing ontologies with VOWL.
Semantic Web. 7, 399-419 (2016).
17. McCray, A.: An Upper-Level Ontology for the Biomedical Domain. Comparative
and Functional Genomics. 4, 80-84 (2003).
18. Mendes, C.P.N., Bizer, C., Miklos, Z., Calbimonte, J., Moraru, A., Flouris, G.:
PlanetData D2.1 Conceptual model and best practices for high-quality metadata
publishing.
19. Miranda, S., Orciuoli, F., Sampson, D.: A SKOS-based framework for Subject
Ontologies to improve learning experiences. Computers in Human Behavior. 61,
609-621 (2016).
20. Noy, N.F., McGuinness, D.L.: Ontology Development 101: A Guide to Creating
Your First Ontology. Stanford Knowledge Systems Laboratory. 25 (2001).
21. Poveda-Villalón, M., Gómez-Pérez, A., Suárez-Figueroa, M.: OOPS! (OntOlogy
Pitfall Scanner!). International Journal on Semantic Web and Information Systems.
10, 7-34 (2014).
22. Rafique I., Lew P., Qanber Abbasi M., Li, Z.: Information Quality Evaluation
Framework: Extending ISO 25012 Data Quality Model, World Academy of Sci-
ence, Engineering and Technology - International Journal of Computer, Electrical,
Automation, Control and Information Engineering. 6, 568-573 (2012).
23. Rico, M., Caliusco, M., Chiotti, O., Galli, M.: OntoQualitas: A framework for
ontology quality assessment in information interchanges between heterogeneous
systems. Computers in Industry. 65, 1291-1300 (2014).
24. Srinivasan, K., Devi, T.: A Comprehensive Review and Analysis on Object-
Oriented Software Metrics in Software Measurement. International Journal on
Computer Science and Engineering. 6, 7, 247-261 (2014).
25. Tartir, S., Arpinar, I.B., Moore, M., Sheth, A.P., Aleman-Meza, B.: OntoQA:
Metric-Based Ontology Quality Analysis. IEEE Workshop on Knowledge Acquisi-
tion from Distributed, Autonomous, Semantically Heterogeneous Data and Knowl-
edge Sources. 45-53 (2005).
26. Yang, Z., Zhang, D., Ye, C.: Evaluation metrics for ontology complexity and evo-
lution analysis. Proceedings - IEEE International Conference on e-Business Engi-
neering, ICEBE 2006. 162-169 (2006).
27. Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality
assessment for Linked Data: A Survey. Semantic Web. 7, 63-93 (2015).
... We analyzed the existing survey studies which have focused on ontology evaluation criteria, metrics, and approaches. Among them, a countable number of survey studies [16][17][18][19][20] were reviewed the related works comprehensively or systematically. However, none of them have provided a model or matrix, or overview among quality criteria, approaches, and ontology levels. ...
... As a diverse set of ontology quality criteria exist, it is difficult for researchers to find a suitable set of quality criteria for assessing a particular ontology based on the intended purpose. To mitigate this issue, scholars have adopted well-defined theories and standards in the software engineering discipline [3,20]. In the article [20], the authors have conducted a systematic review to identify the ontology quality criteria and grouped the measures of quality criteria into categories namely Inherent and Inherent-System, which have been defined in ISO/IEC 25012 Data Quality Standard. ...
... To mitigate this issue, scholars have adopted well-defined theories and standards in the software engineering discipline [3,20]. In the article [20], the authors have conducted a systematic review to identify the ontology quality criteria and grouped the measures of quality criteria into categories namely Inherent and Inherent-System, which have been defined in ISO/IEC 25012 Data Quality Standard. The adapted inherent quality criteria from this standard are accuracy, completeness, consistency, and currentness. ...
Chapter
Ontology quality assessment needs to be performed across the ontology development life cycle to ensure that the ontology being modeled meets the intended purpose. To this end, a set of quality criteria and metrics provides a basis to assess the quality with respect to the quality requirements. However, the existing criteria and metrics defined in the literature so far are messy and vague. Thus, it is difficult to determine what set of criteria and measures would be applicable to assess the quality of an ontology for the intended purpose. Moreover, there are no well-accepted methodologies for ontology quality assessment as the way it is in the software engineering discipline. Therefore, a comprehensive review was performed to identify the existing contribution on ontology quality criteria and metrics. As a result, it was identified that the existing criteria can be classified under five dimensions namely syntactic, structural, semantic, pragmatic, and social. Moreover, a matrix with ontology levels, approaches, and criteria/metrics was presented to guide the researchers when they perform a quality assessment.
... We analyzed existing survey studies that discuss ontology evaluation criteria, metrics, and approaches. Among them, only a small number of studies [16][17][18][19][20] review related work comprehensively or systematically. However, none of them provides a matrix or overview relating quality criteria, approaches, and ontology layers, so it remains difficult for researchers to gain insight into which quality criteria to consider when performing an ontology evaluation and which criteria are most appropriate for assessing each aspect of an ontology. ...
... As diverse ontology quality criteria exist, it is difficult for researchers to find a suitable set of quality criteria for assessing a particular ontology based on its intended purpose. To mitigate this issue, scholars have adopted well-defined theories and standards from software engineering [4,20]. Mc Gurk et al. (2017) conducted a systematic review to identify ontology quality criteria and grouped the corresponding measures into two categories: Inherent and Inherent-System, as defined in the ISO/IEC 25012 Data Quality Standard [20]. ...
... To mitigate this issue, scholars have adopted well-defined theories and standards from software engineering [4,20]. Mc Gurk et al. (2017) conducted a systematic review to identify ontology quality criteria and grouped the corresponding measures into two categories: Inherent and Inherent-System, as defined in the ISO/IEC 25012 Data Quality Standard [20]. The inherent quality criteria adapted from the standard are accuracy, completeness, consistency, and currentness. ...
Preprint
An ontology represents domain knowledge in a formal way and can act as a component in an information system or as part of an ontology network. In such environments, ontology quality is a key attribute that influences the quality of the entire system or network. A diverse set of quality criteria and measures is available that provides insight into ontology quality on an objective, quantitative basis. These can be used to compare the quality of a collection of ontologies in order to select the best one for the intended purpose, or to assess whether an ontology incorporated in a system achieves the required level of quality. However, there are no accepted standards for applying ontology quality criteria across the ontology life cycle, as there are in software engineering. Thus, although many theories have been defined, ontology quality assessment remains limited in practice. Based on this, we explore the ontology quality criteria and measures identified in previous works. To this end, we identified a set of dimensions, namely syntactic, structural, semantic, and context (i.e., pragmatic and usage), to which most ontology quality criteria and metrics can be mapped. Finally, an overview of metrics in relation to the ontology layers and evaluation approaches is provided.
... There are 13 articles that propose metrics for schema completeness [20], [22], [24], [27], [31], [53], [55], [59], [61]-[63], [66], [67]. These approaches define sets of metrics to assess schema completeness, for example by applying fusion methods, defining quality indicators, or assessing completeness based on extracting a set of frequent/required predicates. ...
... These approaches define sets of metrics to assess schema completeness, for example by applying fusion methods, defining quality indicators, or assessing completeness based on extracting a set of frequent/required predicates. Several metrics measure completeness as the ratio of the number of classes/properties present in a dataset to the total number of classes/properties [22], [53], [61], [63], [66], [67]. Other metrics take into account only the mandatory properties when assessing completeness [24], [31], [72]. ...
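The ratio-style metrics just cited reduce to a one-line computation over two term sets. The sketch below is a minimal, hypothetical illustration (the term sets are invented, not drawn from any cited dataset):

```python
def schema_completeness(expected: set, observed: set) -> float:
    """Ratio of expected schema terms (classes/properties) that
    actually appear in the dataset, in the spirit of the cited
    ratio-based completeness metrics."""
    if not expected:
        return 1.0  # an empty reference schema is trivially covered
    return len(expected & observed) / len(expected)

# Hypothetical example: a reference schema with four terms,
# of which the dataset uses three.
expected = {"Person", "name", "knows", "birthDate"}
observed = {"Person", "name", "knows", "height"}
print(schema_completeness(expected, observed))  # 0.75
```

Restricting `expected` to mandatory terms only yields the stricter variant mentioned above.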
... Definition 2 (Property completeness): Property completeness is the degree to which values for a specific property are available for a given task. [A table mapping the surveyed works ([14]-[72], e.g. Bronselaer et al. [20], Cappiello et al. [24], Mendes et al. [61], McGurk et al. [67]) to the completeness types they address is flattened here in the original extraction.] One of the earliest and most influential works addressing property completeness is that of Mendes et al. [61]. They developed Sieve, a versatile quality assessment tool, as part of a Linked Data Integration Framework that handles data access, schema mapping, and identity resolution. ...
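Definition 2 can be read as a simple ratio over the entities relevant to a task. The sketch below is a hypothetical simplification (entities as dictionaries, with missing and null values treated alike), not the formulation of any single cited work:

```python
def property_completeness(entities, prop):
    """Fraction of entities carrying a non-null value for `prop`,
    in the spirit of Definition 2 above."""
    if not entities:
        return 1.0  # vacuously complete by convention
    have = sum(1 for e in entities if e.get(prop) is not None)
    return have / len(entities)

# Hypothetical toy data: one of three entities has a birthDate value.
people = [
    {"name": "Ada", "birthDate": "1815-12-10"},
    {"name": "Alan", "birthDate": None},
    {"name": "Grace"},
]
print(round(property_completeness(people, "birthDate"), 3))  # 0.333
```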
Article
The quality of a Knowledge Graph (also known as Linked Data) is an important aspect that indicates its fitness for use in an application. Several quality dimensions are identified, such as accuracy, completeness, timeliness, provenance, and accessibility, which are used to assess the quality. While many prior studies offer a landscape view of data quality dimensions, here we focus on presenting a systematic literature review for assessing the completeness of Knowledge Graphs. We gather existing approaches from the literature and analyze them qualitatively and quantitatively. In particular, we unify and formalize commonly used terminologies across 56 articles related to the completeness dimension of data quality and provide a comprehensive list of methodologies and metrics used to evaluate the different types of completeness. We identify seven types of completeness, including three types that were not identified in previous surveys. We also analyze nine different tools capable of assessing Knowledge Graph completeness. The aim of this Systematic Literature Review is to provide researchers and data curators with a comprehensive and deeper understanding of existing works on completeness and its properties, thereby encouraging further experimentation and development of new approaches focused on completeness as a data quality dimension of Knowledge Graphs.
... Evaluating the quality of the merged ontology strongly depends on the quality of the input ontologies. We assess the quality of the two input ontologies and the resulting merged ontology by adapting and reformulating the metrics defined in [28], [21], [27]. ...
... where HRD ∈ {label, comment, description} and R represents the ontology resources. • Isolated Elements (IE) [21]: refers to classes and properties that are defined but not connected to the rest of the ontology, i.e. not used. The quality score function f_IE : O → R for an input ontology O is defined as follows: ...
... where R_isolated represents resources that are defined but not used in O. • Missing Domain or Range in Properties (MP) [21]: refers to missing information about properties. The less information about properties is missing, the more complete the ontology. ...
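Both snippets above can be sketched over a bare set of triples. The helper below is a deliberately simplified, hypothetical reading of the MP and IE metrics (string-encoded terms, property declarations via rdf:type), not the exact MULON formulation:

```python
RDF_TYPE, RDFS_DOMAIN, RDFS_RANGE = "rdf:type", "rdfs:domain", "rdfs:range"
OBJECT_PROPERTY = "owl:ObjectProperty"

def missing_domain_or_range(triples):
    """MP sketch: fraction of declared properties lacking a domain
    or a range statement (lower means more complete)."""
    props = {s for (s, p, o) in triples
             if p == RDF_TYPE and o == OBJECT_PROPERTY}
    if not props:
        return 0.0
    def has(subj, pred):
        return any(s == subj and p == pred for (s, p, o) in triples)
    missing = sum(1 for pr in props
                  if not has(pr, RDFS_DOMAIN) or not has(pr, RDFS_RANGE))
    return missing / len(props)

def isolated_elements(triples, declared):
    """IE sketch: fraction of declared resources that never occur in
    any non-declaration triple, i.e. defined but unused."""
    if not declared:
        return 0.0
    used = set()
    for (s, p, o) in triples:
        if p != RDF_TYPE:  # ignore the declaration triples themselves
            used.update({s, p, o})
    return len(declared - used) / len(declared)

# Hypothetical toy ontology: ex:knows is fully described and used,
# ex:worksFor is declared but never described or used.
triples = {
    ("ex:knows", "rdf:type", "owl:ObjectProperty"),
    ("ex:knows", "rdfs:domain", "ex:Person"),
    ("ex:knows", "rdfs:range", "ex:Person"),
    ("ex:worksFor", "rdf:type", "owl:ObjectProperty"),
}
print(missing_domain_or_range(triples))                         # 0.5
print(isolated_elements(triples, {"ex:knows", "ex:worksFor"}))  # 0.5
```

A production implementation would operate on a parsed RDF graph rather than raw tuples; the scoring logic stays the same.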
Preprint
With the growing amount of multilingual data on the Semantic Web, several ontologies (in different natural languages) have been developed to model the same domain. Creating multilingual ontologies by merging such monolingual ones is important to promote semantic interoperability among different ontologies in different natural languages. This is a step towards achieving the multilingual Semantic Web. In this paper, we propose MULON, an approach for merging monolingual ontologies in different natural languages producing a multilingual ontology. MULON approach comprises three modules; Preparation Module, Merging Module, and Assessment Module. We consider both classes and properties in the merging process. We present three real-world use cases describing the usability of the MULON approach in different domains. We assess the quality of the merged ontologies using a set of predefined assessment metrics. MULON has been implemented using Scala and Apache Spark under an open-source license. We have compared our cross-lingual matching results with the results from the Ontology Alignment Evaluation Initiative (OAEI 2019). MULON has achieved relatively high precision, recall, and F-measure in comparison to three state-of-the-art approaches in the matching process and significantly higher coverage without any redundancy in the merging process.
... Ontologies are assessed along different dimensions such as accuracy, completeness, conciseness, consistency, clarity, adaptability, and computational efficiency, but also compliance, currentness, understandability, or availability [46], [47]. Moreover, the methods for ontology assessment can be categorised as gold-standard based, task based (or application based), corpus based, or criteria based [47]. ...
... Approaches and metrics. There are 13 articles that propose metrics for schema completeness. These approaches define sets of metrics to assess schema completeness [Mendes, Mühleisen, and Bizer 2012a; Assaf, Senart, and Troncy 2016; McGurk, Abela, and Debattista 2017], such as applying fusion methods, defining quality indicators, or assessing completeness based on extracting a set of frequent/required predicates [Bronselaer, De Mol, and De Tre 2018; Cappiello, Di Noia, Marcu, and Matera 2016; Issa, Paris, and Hamdi 2017b; Knap and Michelfeit 2012; Färber, Bartscherer, Menne, and Rettinger 2018; Balaraman, Razniewski, and Nutt 2018; Issa, Paris, Hamdi, and Cherfi 2019]. Several metrics measure completeness as the ratio of the number of classes/properties present in a dataset to the total number of classes/properties [Behkamal, Kahani, Bagheri, and Jeremic 2014; Knap and Michelfeit 2012; Färber, Bartscherer, Menne, and Rettinger 2018; Mendes, Mühleisen, and Bizer 2012a; Kontokostas, Westphal, Auer, Hellmann, Lehmann, Cornelissen, and Zaveri 2014; Assaf, Senart, and Troncy 2016; McGurk, Abela, and Debattista 2017]. ...
Thesis
The broad adoption of Semantic Web technologies such as the Resource Description Framework (RDF) enables individuals to build their databases on the Web, to write vocabularies, and to define rules to arrange and explain the relationships between data according to the Linked Data principles. As a consequence, a large amount of structured and interlinked data is being generated daily. A close examination of the quality of this data can be critical, especially if important research and professional decisions depend on it. The quality of Linked Data is an important aspect indicating its fitness for use in applications. Several dimensions for assessing the quality of Linked Data have been identified, such as accuracy, completeness, provenance, and conciseness. This thesis focuses on assessing completeness and enhancing conciseness of Linked Data. In particular, we first proposed a completeness calculation approach based on a generated schema. Indeed, as a reference schema is required to assess completeness, we proposed a mining-based approach to derive a suitable schema (i.e., a set of properties) from data. This approach distinguishes between essential properties and marginal ones to generate, for a given dataset, a conceptual schema that meets the user's expectations regarding data completeness constraints. We implemented a prototype called “LOD-CM” to illustrate the process of deriving a conceptual schema of a dataset based on the user's requirements. We further proposed an approach to discover equivalent predicates to improve the conciseness of Linked Data. This approach is based on a deep semantic analysis of data and on learning algorithms, in addition to a statistical analysis. We argue that studying the meaning of predicates can help to improve the accuracy of results. Finally, a set of experiments was conducted on real-world datasets to evaluate our proposed approaches.
... [62][63][64] The semantic assets usually take the form of metadata schemas or ontologies, stating what classes of objects exist (in a certain domain, i.e., the application field for which the schema or ontology is designed) and how they can relate to each other. 62,65,66 The approach based on semantic interoperability has the advantage that the agreement on both the format and the meaning is codified on the basis of definitions that can be processed computationally, e.g., by automated logical reasoning. In this way, the internal consistency of data sets can be checked, and data from multiple sources can be integrated, 2 facilitating more effective decision support systems. ...
Preprint
By introducing a common representational system for metadata that describe the employed simulation workflows, diverse sources of data and platforms in computational molecular engineering, such as workflow management systems, can become interoperable at the semantic level. To achieve semantic interoperability, the present work introduces two ontologies that provide a formal specification of the entities occurring in a simulation workflow and the relations between them: The software ontology VISO is developed to represent software packages and their features, and OSMO, an ontology for simulation, modelling, and optimization, is introduced on the basis of MODA, a previously developed semi-intuitive graph notation for workflows in materials modelling. As a proof of concept, OSMO is employed to describe a use case of the TaLPas workflow management system, a scheduler and workflow optimizer for particle-based simulations.
Chapter
Brazilian organizations must comply with the Brazilian General Data Protection Law (LGPD), and this must be accomplished in harmony with legacy systems as well as with the new systems developed and used by organizations. In this article we present an overview of the LGPD implementation process in public and private organizations in Brazil. We conducted a literature review and a survey with Information and Communication Technology (ICT) professionals to investigate and understand how organizations are adapting to the LGPD. The results show that more than 46% of the organizations have a Data Protection Officer (DPO) and only 54% of data holders have free access to information on how long and in what form their data is being processed, being able to consult this information freely and easily. However, 59% of the participants stated that personal data stored by the organization is shared only with the organization's partners, in accordance with the LGPD and when strictly necessary, and 51% stated that the organization logs all accesses to personal data. In addition, 96.7% of organizations have already received some sanction or notification from the National Data Protection Agency (ANPD). According to our findings, we conclude that Brazilian organizations are not yet in full compliance with the LGPD.
Article
Purpose Health-care ontologies and their terminologies play a vital role in knowledge representation and data integration for health information. In health-care systems, Internet of Things (IoT) technologies provide data exchange among various entities, and ontologies offer a formal description of the knowledge of health-care domains. These ontologies should be assessed to assure the quality of their adoption and applicability in the real world. Design/methodology/approach Ontology assessment is an integral part of ontology construction and maintenance. It is performed by experts during ontology development to identify inconsistencies and modeling errors. A smart health-care ontology (SHCO) has been designed to deal with health-care information and IoT devices. In this paper, an integrated approach is proposed to assess the SHCO with different assessment tools, such as Themis, TDDonto, Protégé, and OOPS!. Several test cases are framed to assess the ontology with these tools: Themis and TDDonto provide verification of the test cases, while Protégé and OOPS! provide validation of the knowledge modeled in the ontology. Findings To the best of our knowledge, no earlier study has conducted an integrated assessment across different tools. All test cases were successfully analyzed with these tools, and the results were drawn and compared with other ontologies. Originality/value The developed ontology is analyzed with different verification and validation tools to assure the quality of ontologies.
Article
The increasing variety of Linked Data on the Web makes it challenging to determine the quality of this data and, subsequently, to make this information explicit to data consumers. Despite the availability of a number of tools and frameworks to assess Linked Data Quality, the output of such tools is not suitable for machine consumption, and thus consumers can hardly compare and rank datasets in order of fitness for use. This article describes a conceptual methodology for assessing Linked Datasets, and Luzzu, a framework for Linked Data Quality Assessment. Luzzu is based on four major components: (1) an extensible interface for defining new quality metrics; (2) an interoperable, ontology-driven back-end for representing quality metadata and quality problems that can be re-used within different semantic frameworks; (3) scalable dataset processors for data dumps, SPARQL endpoints, and big data infrastructures; and (4) a customisable ranking algorithm taking into account user-defined weights. We show that Luzzu scales linearly with the number of triples in a dataset. We also demonstrate the applicability of the Luzzu framework by evaluating and analysing a number of statistical datasets against a variety of metrics. This article contributes towards the definition of a holistic data quality lifecycle, in terms of the co-evolution of linked datasets, with the final aim of improving their quality.
Article
The development and standardization of Semantic Web technologies has resulted in an unprecedented volume of data being published on the Web as Linked Data (LD). However, we observe widely varying data quality ranging from extensively curated datasets to crowdsourced and extracted data of relatively low quality. In this article, we present the results of a systematic review of approaches for assessing the quality of LD. We gather existing approaches and analyze them qualitatively. In particular, we unify and formalize commonly used terminologies across papers related to data quality and provide a comprehensive list of 18 quality dimensions and 69 metrics. Additionally, we qualitatively analyze the 30 core approaches and 12 tools using a set of attributes. The aim of this article is to provide researchers and data curators a comprehensive understanding of existing work, thereby encouraging further experimentation and development of new approaches focused towards data quality, specifically for LD.
Article
Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use-cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases.
Conference Paper
As the Semantic Web gains importance for sharing knowledge on the Internet, this has led to the development and publishing of many ontologies in different domains. When trying to reuse existing ontologies in their applications, users are faced with the problem of determining whether an ontology is suitable for their needs. In this paper, we introduce OntoQA, an approach that analyzes ontology schemas and their populations (i.e. knowledge bases) and describes them through a well-defined set of metrics. These metrics can highlight key characteristics of an ontology schema as well as its population and enable users to make an informed decision quickly. We present an evaluation of several ontologies using these metrics to demonstrate their applicability.
Chapter
Ontologies are formal, explicit specifications of shared conceptualizations. There is much literature on what they are, how they can be engineered, and where they can be used inside applications. All this literature can be grouped under the term “ontological engineering,” which is defined as the set of activities that concern the ontology development process, the ontology life cycle, the principles, methods and methodologies for building ontologies, and the tool suites and languages that support them. In this chapter we provide an overview of ontological engineering, describing the current trends, issues, and problems.
Article
The Visual Notation for OWL Ontologies (VOWL) is a well-specified visual language for the user-oriented representation of ontologies. It defines graphical depictions for most elements of the Web Ontology Language (OWL) that are combined into a force-directed graph layout visualizing the ontology. In contrast to related work, VOWL aims for an intuitive and comprehensive representation that is also understandable to users less familiar with ontologies. This article presents VOWL in detail and describes its implementation in two different tools: ProtégéVOWL and WebVOWL. The first is a plugin for the ontology editor Protégé, the second a standalone web application. Both tools demonstrate the applicability of VOWL by means of various ontologies. In addition, the results of three user studies that evaluate the comprehensibility and usability of VOWL are summarized. They are complemented by findings from an interview with experienced ontology users and from testing the visual scope and completeness of VOWL with a benchmark ontology. The evaluations helped to improve VOWL and confirm that it produces comparatively intuitive and comprehensible ontology visualizations.
Article
Subject Ontologies represent conceptualizations of disciplinary domains in which concepts symbolize topics that are relevant for the considered domain and are associated with each other by means of specific relations. Usually, these kinds of lightweight ontologies are adopted in knowledge-based educational environments to enable semantic organization and search of resources and, in other cases, to support personalization and adaptation features for learning and teaching experiences. For this reason, applying effective management methodologies for Subject Ontologies is a crucial aspect of engineering such environments. In particular, this paper proposes an approach to using SKOS (a Semantic Web-based vocabulary providing a standard way to represent knowledge organization systems) for modelling subject ontologies. Moreover, the paper underlines the main benefits of SKOS. It focuses on alternative strategies for storing and accessing ontologies in order to support the knowledge sharing, knowledge reuse, planning, assessment, customization, and adaptation processes related to learning scenarios. The results of an early experimentation allowed the authors to define a framework able to support, from both methodological and technological viewpoints, the use of Subject Ontologies in the context of a Semantic Web-based Educational System. The defined framework performs well in terms of response time, which may substantially improve the user experience.
Article
Context: The increasing dependence of our society on software-driven systems has made Software Reliability a key factor as well as a highly active research area, with hundreds of works being published every year. This activity appears, however, to be far more limited when it comes to applying representative international standards on Product Quality to industrial environments, with just a few works on Standard-Based software reliability modeling (SB-SRM). This is surprising given the relevance of such international standards in industry. Objective: To identify and analyze the existing works on modeling Software Reliability based on international standards, as the starting point for a reliability assessment proposal based on the ISO/IEC 25000 "Software Product Quality Requirements and Evaluation" (SQuaRE) series. Method: The work methodology is based on the guidelines provided in Evidence-Based Software Engineering for Systematic Literature Reviews (SLR). Results: A total of 1820 works were obtained from the SLR search; more than 800 primary studies were selected after data filtering. After scrutiny, over thirty of those were thoroughly analyzed. The results show a very limited application of SB-SRM, particularly in industrial environments. Conclusion: Our analysis points to the complexity of the proposed models, together with the difficulties involved in applying them to the management of engineering activities, as a root cause of such limited application. The needs of the various stakeholders are also of paramount importance and should be better covered if the industrial applicability of the proposed models is to be increased.