A framework to aggregate multiple ontology matchers
Jairo Francisco de Souza
Department of Computer Science, Federal University of Juiz de Fora,
Juiz de Fora, Brazil, and
Sean Wolfgand Matsui Siqueira and Bernardo Nunes
Department of Informatics, Federal University of the State of Rio de Janeiro,
Rio de Janeiro, Brazil
Abstract
Purpose: Although ontology matchers are annually proposed to address different aspects of the semantic heterogeneity problem, finding the most suitable alignment approach is still an issue. This study aims to propose a computational solution for ontology meta-matching (OMM) and a framework designed for developers to make use of alignment techniques in their applications.
Design/methodology/approach: The framework includes some similarity functions that can be chosen by developers and then automatically sets weights for each function to obtain better alignments. To evaluate the framework, several simulations were performed with a data set from the Ontology Alignment Evaluation Initiative. Simple similarity functions were used, rather than aligners known in the literature, to demonstrate that the results would be more influenced by the proposed meta-alignment approach than by the functions used.
Findings: The results showed that the framework is able to adapt to different test cases. The approach achieved better results when compared with existing ontology meta-matchers.
Originality/value: Although approaches for OMM have been proposed, it is not easy to use them during software development. On the other hand, this work presents a framework that can be used by developers to align ontologies. New ontology matchers can be added and the framework is extensible to new methods. Moreover, this work presents a novel OMM approach modeled as a linear equation system which can be easily computed.
Keywords Metadata and ontologies, Semantic interoperability, Schema matching
Paper type Research paper
1. Introduction
With the dissemination of the Semantic Web, ontologies have been built and made available to formally represent domain concepts. Ontologies have been used to structure knowledge and to facilitate the exchange of messages and data among systems. They are built for different purposes and by people with distinct specializations and skills as well as different perspectives of the domain. Therefore, there exist distinct ontologies of the same domain as well as complementary ones, which are built with contrasting structures, names and/or characteristics. To reconcile these ontologies, alignment techniques have been used. There are many techniques to align ontologies and they have been reviewed in the recent literature (Thiéblin et al., 2019; Mohammadi et al., 2019; Chauhan et al., 2018; Abubakar et al., 2018; Babalou et al., 2016; Otero-Cerdeira et al., 2015).
Received 17 May 2019
Revised 16 September 2019
Accepted 17 September 2019
International Journal of Web Information Systems
Vol. 16 No. 2, 2020
pp. 151-169
© Emerald Publishing Limited
1744-0084
DOI 10.1108/IJWIS-05-2019-0023
The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/1744-0084.htm
Different matchers have been proposed on a yearly basis (Thiéblin et al., 2019; Mohammadi et al., 2019, 2018; Chauhan et al., 2018; Xue and Wang, 2017) because ontology alignment is a complex matter, which can be tackled by many approaches, such as linguistic techniques (Kolyvakis et al., 2018; Wu et al., 2016; Van Hage et al., 2005), graph theory (Quintero et al., 2018; Li et al., 2018; Zang et al., 2016) and mathematical logic (Karimi and Kamandi, 2019; Jayasri and Rajmohan, 2015; Sengupta and Hitzler, 2015; Janowicz and Wilkes, 2009). In addition, each matcher tries to solve the problem taking into account a subset of characteristics. Therefore, there is no alignment approach that clearly stands out in relation to all others (Xue and Wang, 2017) and fits all possible scenarios. To solve the alignment problem in a more general way that depends less on the specific scenario, it is possible to aggregate the results of a heterogeneous set of matchers, combining the characteristics of each matcher to obtain better results. This context leads to the emergence of semi-automatic approaches such as interactive matching (da Silva et al., 2017) and automatic approaches such as meta-matchers. However, the issue of meta-matching is far from being trivial (Xue and Pan, 2018; Gulić et al., 2018; Xue and Liu, 2017). This is a problem usually solved through an optimization approach, mainly by using population-based meta-heuristics (Souza et al., 2014).
Even though ontology meta-matching (OMM) has become more frequent over the past
few years, only a few papers deal with its usage in practice (Xue et al., 2018b; Martinez-Gil
et al., 2012). There are tools for ontology alignment which have been proposed in the
literature and made available by researchers, but this is not the case for meta-matching
approaches. In addition, OMM is time-consuming and it is often used as a batch service.
We introduce GNOSIS+, a framework for OMM that can be used by developers to align ontologies. GNOSIS+ was conceived with an architecture that allows the insertion of new matchers and the adaptation to combine different types of matchers, methods of correspondence choice, parameterization of the approach for tuning the weights associated with each matcher and format of output alignment. It was designed to support developers in using alignment techniques in their applications. Moreover, the OMM approach is modeled as a linear equation system, thus the fitness function is less costly than classical functions used by other meta-matchers.
To facilitate the understanding of the work, this paper is organized as follows: Section 2 discusses related works. The architecture of GNOSIS+ is presented in detail in Section 3. To demonstrate that the proposed solution can be adapted to different scenarios, we applied the developed tool on a benchmark. The methodology and results are discussed in Section 4. Finally, Section 5 presents our conclusions. Our experiments show that the tool was able to reach results close to the state of the art while requiring little training data.
2. Related work
According to Xue and Wang (2017) and Martinez-Gil and Aldana-Montes (2012), even though there are many ontology matching techniques, none has been proved to be fully efficient in all cases. Usually, there is a need for knowledge of the context in which the techniques will be applied, of the data available and of the existing differences according to the model to be applied. Still, Xue and Wang (2017) highlight that ontology matchers do not necessarily find the same correspondences. The combined, coordinated use of distinct techniques, preferably complementary, may benefit the achievement of better ontology alignment; this is the context for the use of OMM.
Research on ontology alignment generally explores a fixed solution. Considering the tools submitted to Ontology Alignment Evaluation Initiative (OAEI) campaigns[1], few can be set up by users or easily adapted to new types of matchers (Mathur et al., 2014; Martinez-Gil et al., 2012). According to Martinez-Gil and Aldana-Montes (2012), these tools are generally prepared exclusively to reach the best results in predetermined tests; thus, the results lose their importance in practice.
IJWIS
16,2
152
The use of more than one alignment technique has been applied in some matchers, such as the family of matchers named risk minimization based ontology mapping (RiMOM) (Li et al., 2009; Wang et al., 2011; Shao et al., 2016). Although the tools belonging to the RiMOM family are used for distinct alignment tasks, they all make use of different matchers and combine their results to generate the final alignment. The RiMOM2 tool (Wang et al., 2011), for example, uses up to eight distinct matchers. Each matcher is specialized in treating a type of heterogeneity, through structural analysis, syntactic analysis, etc. However, the values used for the weights and the strategies used to generate the final result are all predefined. According to Xue and Wang (2017), the same occurs with FALCON-AO (Hu and Qu, 2008), the pair SAMBO and SAMBOdtf (Lambrix et al., 2008), and more recent tools such as AgreementMakerLight (Faria et al., 2018), Lily (Tang et al., 2018), XMap (Djeddi et al., 2016) and OMI-DL (Liu et al., 2016). Therefore, the approach chosen for these tools cannot be considered meta-matching, as the combination of results is predetermined.
Adaptable architectures for ontology alignment have been proposed in Mathur et al. (2014) and Martinez-Gil et al. (2012). The proposal from Mathur et al. (2014) makes use of edit distance techniques and a graph similarity method in an architecture that allows insertion of new similarity measures. The result of each measure is added to a score matrix. This matrix represents a bipartite graph in which the rows represent the entities of one ontology, the columns represent the entities of the other ontology, and each cell is the result of the similarity functions applied to these two entities. Once the matrix is generated, a graph union method is used to decide on the best pairs of the matrix to generate the final alignment. Even though the approach by Mathur et al. (2014) is easily adapted to add new metrics, the authors do not provide a method for defining the weights of the metrics in use, and consequently it is not possible to adjust the weight of each similarity function. Thus, the meta-matching process needs to be executed from scratch for a new ontology version. The tool described in Martinez-Gil et al. (2012), in turn, enables the use of 136 pre-implemented similarity functions, from which the user is able to choose a subset of functions to match two ontologies. This tool was implemented as an extension to Eclipse. In addition to the decision on the similarity functions, the user is also able to decide on the method of aggregation of results to be used. Thus, the tool becomes very useful to test different solutions, as the number of possible combinations of similarity functions and parameters is quite high.
There are approaches to define weights for similarity functions, as reviewed in Otero-Cerdeira et al. (2015) and Souza et al. (2014). Recent approaches were developed by Xue et al. (2018a), Biniz and El Ayachi (2018), Xue and Wang (2017) and Xue and Liu (2017), which present a genetic algorithm used to solve multi-objective problems. The authors use multi-objective evolutionary algorithms, such as non-dominated sorting genetic algorithm-II (Deb et al., 2000) and multiobjective evolutionary algorithm based on decomposition (Zhang and Li, 2007), to solve the problem of OMM and demonstrate that the algorithm is able to generate good results on the OAEI data set. However, these authors do not offer a tool to extend the matchers used in their approaches.
This paper describes GNOSIS+, a solution created to adjust the weights of different matchers regarding different application domains. Therefore, just like the tools developed by Martinez-Gil et al. (2012) and Mathur et al. (2014), GNOSIS+ enables the insertion of new matchers as well as the decision on a number of matchers to process the alignment. In contrast to these two approaches, GNOSIS+ follows the contributions of Xue and Wang (2017) and Xue and Liu (2017) in carrying out a process of meta-matching through a genetic algorithm. However, GNOSIS+ presents a new problem representation. The problem is represented as a linear equation system which is easily computed. The user can adapt the alignment process by defining the form of decision regarding the final alignments and adjusting different parameters of the genetic algorithm.
3. GNOSIS+ framework
GNOSIS+ was proposed for supporting ontology alignment through the use of different algorithms to calculate similarity degrees, thus allowing new algorithms to be incorporated into the calculation of similarities according to the user's intention. The ideal weights for each matcher chosen by the user are assigned through a genetic algorithm. To resolve the issue of calibration of weights, GNOSIS+ uses as input a small set of correspondences provided by the ontology engineers. Therefore, the approach becomes more applicable to scenarios in which the reference alignments are unknown, but the ontology engineer is able to easily point out some correspondences. By using a small set for training, the objective function is able to assess intermediate solutions faster.
Figure 1 illustrates the architecture of the framework. The tool is organized in three
major modules (compositor module, analyzer module and persistence module), which may
contain sub-modules in addition to an application program interface (API) used for
extension of the tool.
The tool was developed to process ontologies described in ontology web language (OWL) or resource description framework (RDF) languages, in which the input is always pairs of ontologies. Apache Jena was used to provide basic methods to read and write both RDF and OWL documents[2], in addition to allowing reasoners to be used for the processing of ontologies.
All of the three major modules are available in the GNOSIS+ API, which can be considered an extension layer of the Jena API specialized in similarity functions. The GNOSIS+ API enables GNOSIS+ to be integrated into other applications and provides a set of methods to manipulate the results calculated by GNOSIS+. In addition, the API offers a set of classes for new similarity functions to be integrated into the GNOSIS+ tool.
The following sections detail each module of the tool as well as some functions of the API
that are related to the module.
3.1 Compositor module
The compositor module is responsible for composing the functions, i.e. aggregating the functions implemented in the tool. The GNOSIS+ tool has some similarity functions available and others that can be implemented or integrated with the GNOSIS+ API.
3.1.1 Input files. GNOSIS+ receives ontologies to be matched as input. In the current implementation, the inputs must be in OWL or RDF formats. GNOSIS+ also requires a description of the similarity functions to be used to match the two input ontologies, as well as an optional description of the alignments already known between the two input ontologies. These inputs should be in XML format.
Figure 1. Architecture of the framework
Figure 2 shows a simplified XML file with the description of similarity functions. This file contains the definition of the functions to be used to match the ontologies described through the attribute id of XML elements ontology.
The similarity functions available in the tool are called through the XML element function. Each function in the XML file may have an associated weight (XML element weight). If the weights of each function or container are defined, the system uses these values to define the final similarity degree of each pair of concepts. Otherwise, if the user wishes the system to calibrate the weights automatically, the XML element pre-alignment is filled with the path of the file containing the alignments provided by specialists. Figure 3 illustrates an extract of the file containing the input alignments.
The file containing the example alignments must describe the correspondences already known between the entities of ontologies described by their namespaces (XML elements onto1 and onto2) as well as by their physical path (XML elements uri1 and uri2). The file describes a set of cell elements, each one representing a correspondence between two entities, in which XML elements entity1 and entity2 identify a correspondence between the ontology entities onto1 and onto2. The XML element measure describes the similarity degree between entity1 and entity2.
Figure 2. Example of XML document to describe the composed functions
Figure 3. Example of XML document with input alignment
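The alignment file just described can be read with a few lines of code. The following is only a sketch: the element names (onto1, onto2, uri1, uri2, cell, entity1, entity2, measure) come from the text, while the root element, the nesting, the example URIs and the function name `read_known_correspondences` are assumptions made for illustration and may not match the actual GNOSIS+ file layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical alignment file. The element names come from the text; the root
# element, the nesting and the example URIs are assumptions for illustration.
SAMPLE_ALIGNMENT = """
<alignment>
  <onto1>http://example.org/library#</onto1>
  <onto2>http://example.org/bookshop#</onto2>
  <uri1>file:/tmp/library.owl</uri1>
  <uri2>file:/tmp/bookshop.owl</uri2>
  <cell>
    <entity1>http://example.org/library#Author</entity1>
    <entity2>http://example.org/bookshop#Writer</entity2>
    <measure>1.0</measure>
  </cell>
  <cell>
    <entity1>http://example.org/library#Book</entity1>
    <entity2>http://example.org/bookshop#Volume</entity2>
    <measure>0.9</measure>
  </cell>
</alignment>
"""


def read_known_correspondences(xml_text):
    """Return the known correspondences as (entity1, entity2, measure) triples."""
    root = ET.fromstring(xml_text.strip())
    correspondences = []
    for cell in root.iter("cell"):
        entity1 = cell.findtext("entity1")
        entity2 = cell.findtext("entity2")
        measure = float(cell.findtext("measure"))
        # The similarity degree is expected to lie in [0, 1].
        if not 0.0 <= measure <= 1.0:
            raise ValueError(f"measure out of [0, 1]: {measure}")
        correspondences.append((entity1, entity2, measure))
    return correspondences
```

Calling `read_known_correspondences(SAMPLE_ALIGNMENT)` yields one triple per cell element, which is the form of training data the function calibrator works with.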
3.1.2 Similarity functions. A set of similarity functions was implemented and is available for use in GNOSIS+. Similarity functions can make use of syntactic analysis to calculate the similarity degree between entities. Therefore, functions can incorporate any measure of distance between terms. Thus, the user is able to specify for each function, whenever possible, which comparison algorithm will be used (see XML element strategy in Figure 2). Some classic algorithms were implemented for the syntactic measures in the tool; they are listed in Table I.
Each class in Table I receives two character strings as input and computes the similarity according to each algorithm. New algorithms to calculate the edit distance can be incorporated in the tool through the implementation of the IeditDistance interface. Similarity functions that make use of edit measures implement an interface called StringBasedFunction.
Among the similarity functions available in GNOSIS+, there are functions to analyze different elements of ontologies, such as relations among classes, instances and comments of elements. The similarity functions currently available were created to analyze only one type of ontology element or, in some cases, small groups of elements. In addition to the functions available, new functions can be integrated into the tool through the implementation of an interface called ISimilarityFunction. Thus, new, more complex algorithms may be easily incorporated into GNOSIS+. Table II presents the similarity functions available in the tool.
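To make the plug-in idea concrete, here is a minimal Python sketch of a string measure that a name-based similarity function can delegate to. It is not the GNOSIS+ API: the `EditDistance` protocol is a hypothetical stand-in for the IeditDistance interface, `concept_name_similarity` only imitates the role of a string-based function, and turning the Levenshtein distance into a similarity by dividing by the longer string length is an assumption.

```python
from typing import Protocol


class EditDistance(Protocol):
    """Hypothetical Python stand-in for the IeditDistance interface."""

    def similarity(self, a: str, b: str) -> float: ...


class LevenshteinSimilarity:
    """Levenshtein distance normalized to a similarity in [0, 1] (assumed normalization)."""

    def similarity(self, a: str, b: str) -> float:
        if not a and not b:
            return 1.0
        previous = list(range(len(b) + 1))  # distances for the empty prefix of a
        for i, char_a in enumerate(a, start=1):
            current = [i]
            for j, char_b in enumerate(b, start=1):
                cost = 0 if char_a == char_b else 1
                current.append(min(previous[j] + 1,          # deletion
                                   current[j - 1] + 1,       # insertion
                                   previous[j - 1] + cost))  # substitution
            previous = current
        return 1.0 - previous[-1] / max(len(a), len(b))


def concept_name_similarity(name1: str, name2: str, strategy: EditDistance) -> float:
    """Compare two entity names with whichever string measure was configured."""
    return strategy.similarity(name1.lower(), name2.lower())
```

Swapping in another measure only requires another class with the same `similarity` method, which mirrors how the strategy element selects a comparison algorithm per function.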
The basic functioning of each similarity function is described below:
CommentSimilarity: Calculates the similarity of two entities regarding the description contained in the comment fields of the entities.
ConceptNameSimilarity: Calculates the similarity of two entities regarding the similarity between their terms.
DirectCommonGraphSimilarity: Calculates the similarity of two entities regarding the common identifiers of elements which form a subgraph of all of their relations.
DirectDataTypePropertybyNameSimilarity: Calculates the similarity of two entities regarding the similarity between the terms of their properties of data type.
DirectDataTypePropertybyRangeEqualSimilarity: Calculates the similarity of two entities regarding the range of their properties of data type.
Table I. Algorithms for string comparison

| Class name | Measure |
|---|---|
| DamerauLevenshteinEditDistance | Damerau (1964) |
| EqualsEditDistance | |
| HammingEditDistance | Hamming (1950) |
| JaroWinklerEditDistance | Winkler (1999) |
| LevenshteinEditDistance | Levenshtein (1966) |
| NGramDistance | Kondrak (2005) |
| TokenizerEditDistance | Robertson and Jones (1976) |

DirectDataTypePropertybyRangeSimilarity: Calculates the similarity of two entities regarding the similarity between the sets which form the range of their properties of data type.
DirectIndividualbyNameSimilarity: Calculates the similarity of two entities regarding the existence of instances with the same identifier.
DirectIndividualbyPropertySimilarity: Calculates the similarity of two entities regarding the values of properties of their instances.
DirectIndividualSimilarity: Calculates the similarity of two entities regarding the values of properties as well as the identifiers of their instances.
DirectObjectPropertybyNameSimilarity: Calculates the similarity of two entities regarding the similarity between the terms of their properties of object type.
DirectObjectPropertybyRangeSimilarity: Calculates the similarity of two entities regarding the similarity between the terms which identify the range of their properties of object type.
DirectPropertybyNameSimilarity: Calculates the similarity of two entities regarding the similarity between the terms which identify their properties of data and object type.
DirectPropertybyRangeDomainSimilarity: Calculates the similarity of two entities regarding the similarity between the sets which form the range as well as the domain of their properties of data and object type.
DirectSubClassSimilarity: Calculates the similarity of two entities regarding the similarity between the sets of their subclasses (or sub-properties, in the case of relationships).
DirectSuperClassSimilarity: Calculates the similarity of two entities regarding the similarity between the sets of their superclasses (or super-properties, in the case of relationships).
3.1.3 Function calibrator. We call OMM the problem of choosing a set of parameters for a composed similarity function $f(\cdot): E \to [0, 1]$, where $E$ is a set of pairs of entities. The best parameters for the function $f(\cdot)$ are the ones that make $f(\cdot)$ close to the expected similarity for each pair in $E$.

Table II. Similarity functions available in GNOSIS+

| Class name | Processed elements |
|---|---|
| CommentSimilarity | comments |
| ConceptNameSimilarity | identifiers |
| DirectCommonGraphSimilarity | relations |
| DirectDataTypePropertybyNameSimilarity | properties |
| DirectDataTypePropertybyRangeEqualSimilarity | properties |
| DirectDataTypePropertybyRangeSimilarity | properties |
| DirectIndividualbyNameSimilarity | instances |
| DirectIndividualbyPropertySimilarity | instances |
| DirectIndividualSimilarity | instances |
| DirectObjectPropertybyNameSimilarity | properties |
| DirectObjectPropertybyRangeSimilarity | properties |
| DirectPropertybyNameSimilarity | properties |
| DirectPropertybyRangeDomainSimilarity | properties |
| DirectSubClassSimilarity | hierarchy |
| DirectSuperClassSimilarity | hierarchy |

Let $S$ be a set of known correspondences between two entities. The $i$th element of $S$ is a triple $(\langle x_i, y_i \rangle, \equiv, s_i)$, where the pair $\langle x_i, y_i \rangle \in E$, $\equiv$ represents the relation of equivalence and $s_i \in [0, 1]$ is the expected similarity between $x_i$ and $y_i$. In practice, the number of pairs in $S$ is much smaller than $|E|$. OMM aims to obtain the function $f$ that best approximates the expected similarity for all triples in $S$.
For instance, consider the set $S = \{(x_1, y_1, \equiv, s_1), (x_2, y_2, \equiv, s_2), (x_3, y_3, \equiv, s_3)\}$ and the composed function $f(x, y) = g_1(x, y)\,w_1 + g_2(x, y)\,w_2 + g_3(x, y)\,w_3$. For that input, we have the following system of linear equations:

$$
\begin{aligned}
f(x_1, y_1) &= g_1(x_1, y_1)\,w_1 + g_2(x_1, y_1)\,w_2 + g_3(x_1, y_1)\,w_3 = s_1 \\
f(x_2, y_2) &= g_1(x_2, y_2)\,w_1 + g_2(x_2, y_2)\,w_2 + g_3(x_2, y_2)\,w_3 = s_2 \\
f(x_3, y_3) &= g_1(x_3, y_3)\,w_1 + g_2(x_3, y_3)\,w_2 + g_3(x_3, y_3)\,w_3 = s_3
\end{aligned}
$$

Once we know the values of the member functions $g_k$ for the pairs of entities in $S$, the problem is to select the weights $w_k \in [0, 1]$ that solve the system. However, as the system can be inconsistent, the objective is to select the weights $w_k$ that best approximate the $s_i$ for all triples $(x_i, y_i)$. Thus, the OMM problem can be defined as: let $S$ be a set of known correspondences and $f$ a composition of $K$ member functions $g_k$; find the weights $w_k$ for each member function of $f$ that minimize $Z$ given by equation (1):

$$\min Z = \sum_{i=1}^{|S|} \left( s_i - \sum_{k=1}^{K} w_k\, g_k(x_i, y_i) \right) \qquad (1)$$

subject to:

$$\sum_{k=1}^{K} w_k = 1 \qquad (2)$$

$$w_k \in [0, 1] \quad \forall k \in \{1, \ldots, K\} \qquad (3)$$

$$s_i \in [0, 1] \quad \forall i \in \{1, \ldots, |S|\} \qquad (4)$$

Based on the choice of the weights $w_k$ for the members of $f$, the function will be applied to all pairs in $E$ to select the best alignment for the pair of ontologies.
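The objective can be evaluated with a few lines of code once the member functions have been computed for the training pairs. The sketch below is illustrative only: the function names and the sample numbers are made up, and it sums absolute residuals so that the value is never negative, which is an assumption about how the difference in equation (1) is aggregated.

```python
def composed_similarity(weights, member_values):
    """f(x, y): weighted sum of the member function values g_k(x, y)."""
    return sum(w * g for w, g in zip(weights, member_values))


def is_feasible(weights, tolerance=1e-9):
    """Constraints (2) and (3): weights lie in [0, 1] and add up to 1."""
    in_range = all(0.0 <= w <= 1.0 for w in weights)
    return in_range and abs(sum(weights) - 1.0) <= tolerance


def fitness(weights, training):
    """Z: total deviation between expected and composed similarity.

    `training` holds one (member_values, expected_similarity) pair per known
    correspondence. Absolute residuals are an assumption of this sketch.
    """
    return sum(abs(s - composed_similarity(weights, g)) for g, s in training)


# Made-up values of g_1, g_2, g_3 and expected similarities s_i for three pairs.
TRAINING = [
    ((1.0, 0.5, 0.0), 0.8),
    ((0.2, 0.4, 0.6), 0.3),
    ((0.9, 0.9, 0.9), 0.9),
]
```

With these numbers, the weights (0.6, 0.4, 0.0) give a much smaller value of Z than uniform weights, so a calibrator would prefer them.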
The genetic algorithm is used to find an approximate solution for this linear system. To find a solution that represents the best calibration for the above functions, chromosomes are created as $C = \{w_1, w_2, \ldots, w_K\}$, in which $w_i$ represents the weight to be applied to the function $g_i$, $1 \le i \le K$. According to equation (1), the closer to zero the value of the fitness function, the better adapted the individual.
The solution space is reduced by discretization. Let $\tau \in [0, 1]$ be the discretization granularity. Thus, all weights are multiples of $\tau$. The domain of the decision variables $w_i$ is the set $\{0, \tau, 2\tau, 3\tau, \ldots, 1\}$. So, the problem consists of finding values that minimize $Z$ [equation (1)]. Let $C = (w_1, w_2, \ldots, w_K)$ be a chromosome; the size of the solution space is at most $\left(1 + \frac{1}{\tau}\right)^{K}$.
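As a worked instance of this bound, take $\tau = 0.1$ and three member functions (values chosen here purely for illustration). Each weight can take $1 + 1/\tau = 11$ values, so:

$$\left(1 + \frac{1}{\tau}\right)^{3} = (1 + 10)^{3} = 1331$$

Only the weight vectors that also add up to 1, as required by constraint (2), are feasible. With multiples of 0.1 this leaves $\binom{12}{2} = 66$ candidates, which is why the bound is stated as "at most".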
After each $t$ generations, a solution improvement process is used to search the winner's neighborhood for better solutions. A local search is used in the improvement process. Let $C_v$ be the best solution (winner) for a given population. The local search algorithm aims to explore the neighborhood of $C_v$ to identify a chromosome $C_z$, where $C_z \in V = V^{+} \cup V^{-}$ has better fitness than $C_v$. $V$ is a set of neighbors of $C_v = (w_1, w_2, \ldots, w_n)$, where each neighbor has only one gene value distinct from $C_v$. Neighborhood $V^{+}$ is composed of chromosomes that differ from $C_v$ only in gene positions $w_i$ with values $w_i + \tau$. Similarly, $V^{-}$ is composed of chromosomes that differ from $C_v$ only in gene positions $w_i$ with values $w_i - \tau \ge 0$. So, $V^{+} = \{(w_1 + \tau, w_2, \ldots, w_n), (w_1, w_2 + \tau, \ldots, w_n), \ldots, (w_1, w_2, \ldots, w_n + \tau)\}$ and $V^{-} = \{(w_1 - \tau, w_2, \ldots, w_n), (w_1, w_2 - \tau, \ldots, w_n), \ldots, (w_1, w_2, \ldots, w_n - \tau)\}$. Local search is performed on the whole neighborhood $V$ and the best neighbor solution is chosen. Thus, local search complexity is $O(|C|)$, where $|V| \le 2|C|$ and $|C|$ represents the number of genes of a chromosome. Periodicity $t$ is usually set to 10 per cent of the total number of generations.
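The neighborhood search can be sketched as follows. This is an illustration under stated assumptions rather than the GNOSIS+ implementation: the function names are hypothetical, neighbors are kept inside [0, 1] on both sides in line with constraint (3), and the sketch follows the text literally by changing a single gene, without renormalizing the weights afterwards (how the framework restores constraint (2) is not stated in this section).

```python
def neighbors(chromosome, tau):
    """Build V = V+ ∪ V-: change exactly one gene by +tau or -tau."""
    result = []
    for i, weight in enumerate(chromosome):
        for step in (tau, -tau):
            # Round to avoid floating-point drift such as 0.30000000000000004.
            candidate = round(weight + step, 10)
            if 0.0 <= candidate <= 1.0:
                neighbor = list(chromosome)
                neighbor[i] = candidate
                result.append(tuple(neighbor))
    return result


def local_search(winner, tau, fitness):
    """Return the best neighbor if it improves on the winner, else the winner.

    `fitness` maps a chromosome to a value where lower is better.
    At most 2 * len(winner) neighbors are evaluated.
    """
    best = min(neighbors(winner, tau), key=fitness, default=winner)
    return best if fitness(best) < fitness(winner) else winner
```

For a chromosome with two genes, `neighbors((0.5, 0.5), 0.1)` produces four candidates, matching the bound of 2|C| neighbors; genes already at 0 or 1 contribute only one neighbor each.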
The module allows the user to alter many parameters set by the calibrator, such as population size, number of generations, and rates of reproduction, mutation and survival. Therefore, the user is able to define the execution of the module according to the processing capacity and the available resources.
3.2 Analyzer
The analyzer module is responsible for calculating the degrees of similarity according to the similarity functions selected by the user. At this point, the weights for each similarity function have already been established, either by the user or by the function calibrator.
Considering two ontologies $o$ and $o'$, the analyzer $a(e, e') = s$ generates a set $T$ of tuples such that $|T| \le |o|\,|o'|$, as $\forall e\, \forall e'\; (e, e', s) \in T$ iff $s \ge \gamma$, $e \in o$, $e' \in o'$. The value $\gamma$ denotes the threshold, or minimum confidence value, of the similarity degree. The user can specify the minimum value in a system set-up file. If it is not specified, then $\gamma = 0$.
To fill the set $T$, the analyzer also considers the types of combination as well as the values of penalties specified by the user.
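The thresholding step can be pictured with a short sketch. The function name `analyze`, its parameters and the toy similarity scores are hypothetical; the only behavior taken from the text is that a tuple is kept when its similarity reaches the threshold, which defaults to 0.

```python
def analyze(entities1, entities2, similarity, threshold=0.0):
    """Build the set T of (e, e', s) tuples whose similarity reaches the threshold."""
    tuples = []
    for e1 in entities1:
        for e2 in entities2:
            s = similarity(e1, e2)
            if s >= threshold:
                tuples.append((e1, e2, s))
    return tuples


# Toy similarity scores, made up for illustration.
SCORES = {
    ("Author", "Writer"): 0.9,
    ("Author", "Volume"): 0.1,
    ("Book", "Writer"): 0.2,
    ("Book", "Volume"): 0.7,
}


def toy_similarity(e1, e2):
    return SCORES[(e1, e2)]
```

With the default threshold every pair is kept, so T reaches its maximum size of |o| |o'|; raising the threshold to 0.5 keeps only the two strongest pairs.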
3.2.1 Types of combination. In this paper, combination is the process of choosing the alignments that a similarity function uses to calculate the similarity degree between two entities. Similarity functions that analyze subclasses of a class, for example, may generate a similarity degree according to the average of all similarities calculated among all of the subclasses in each class. Thus, if one assumes that a class $c$ has two subclasses $sc_1$ and $sc_2$ and, similarly, a class $c'$ has two subclasses $sc'_1$ and $sc'_2$, calculating the similarities between the subclasses yields the tuples $(sc_1, sc'_1, s_1)$, $(sc_1, sc'_2, s_2)$, $(sc_2, sc'_1, s_3)$, $(sc_2, sc'_2, s_4)$. That is, a similarity function can calculate the similarity between $c$ and $c'$ as the arithmetic mean of the similarities calculated for each pair of subclasses:

$$\frac{s_1 + s_2 + s_3 + s_4}{4} \qquad (5)$$

Another form of calculating the similarity between $c$ and $c'$ is the average between the similarity degrees without repetition of entities. Therefore, there would be two ways to choose the tuples: $\{(sc_1, sc'_1, s_1), (sc_2, sc'_2, s_4)\}$ or $\{(sc_1, sc'_2, s_2), (sc_2, sc'_1, s_3)\}$. That is, as each class has two subclasses, the similarity between these classes would be calculated as the average of the similarities of only two tuples.
However, the decision on which tuples form the final similarity of the similarity function can be carried out in different ways. GNOSIS+ allows the similarity functions to be implemented by using strategies of combination (see XML element combination in Figure 2). These strategies can be used by any similarity function through a set of similarities, subclasses, superclasses, properties and instances, among others.
Two strategies are implemented in the tool, provided by the classes FirstMatchCombination and DeepCombination. Class FirstMatchCombination implements a greedy combination strategy, which sorts the set of tuples in decreasing order of similarity degree. The algorithm then inserts the first tuple into the initially empty set of selected tuples, called $S$. Subsequently, the list of tuples is read from top to bottom and a new tuple $(x, y, z)$ is chosen such that $\forall (e, e', s) \in S$, $x \neq e \wedge y \neq e'$. The chosen tuple is inserted in $S$. The process ends when there are no more tuples to be chosen. Even though it is a simple strategy, it is fast, as it does not require testing other combinations of tuples.
For example, consider the set of tuples with the similarity degrees between entities $e$ and $e'$:
$(e_1, e'_2, 0.8)$;
$(e_2, e'_2, 0.7)$;
$(e_1, e'_3, 0.6)$;
$(e_2, e'_1, 0.4)$;
$(e_1, e'_1, 0.3)$; and
$(e_2, e'_3, 0.3)$.
Given the sorted set of tuples listed above, the greedy combination strategy is applied through the following steps:
decision on the tuple $(e_1, e'_2, 0.8)$;
insertion of $(e_1, e'_2, 0.8)$ in $S$;
decision on the tuple $(e_2, e'_1, 0.4)$, as the previous tuples have elements $e_1$ or $e'_2$, which are already in $S$; and
insertion of $(e_2, e'_1, 0.4)$ in $S$.
In the end, $S = \{(e_1, e'_2, 0.8), (e_2, e'_1, 0.4)\}$ and the final similarity is given by $(0.8 + 0.4)/2 = 0.6$. The greedy strategy does not guarantee the best results. It is worth emphasizing that finding the subset of $T$ which yields the highest value is an NP-complete problem (Souza, 1986) and it can be impractical to calculate this subset for a very large number of tuples.
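The greedy steps above can be reproduced with a short sketch. The function name `first_match_combination` is only borrowed from the class name for readability; the code is an illustration of the described strategy, not the tool's implementation, and dividing the sum by 2 follows the worked example.

```python
def first_match_combination(tuples):
    """Greedy selection: take tuples by decreasing similarity, never reusing an entity."""
    selected = []
    used_left, used_right = set(), set()
    for e, e_prime, s in sorted(tuples, key=lambda t: t[2], reverse=True):
        if e not in used_left and e_prime not in used_right:
            selected.append((e, e_prime, s))
            used_left.add(e)
            used_right.add(e_prime)
    return selected


# The example tuples from the text.
EXAMPLE_TUPLES = [
    ("e1", "e'2", 0.8),
    ("e2", "e'2", 0.7),
    ("e1", "e'3", 0.6),
    ("e2", "e'1", 0.4),
    ("e1", "e'1", 0.3),
    ("e2", "e'3", 0.3),
]

greedy = first_match_combination(EXAMPLE_TUPLES)
greedy_similarity = sum(s for _, _, s in greedy) / 2
```

On the example, the selection is the pair of tuples with similarities 0.8 and 0.4, and the resulting similarity is 0.6 up to floating-point rounding.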
Class DeepCombination implements a brute-force strategy to test all of the solutions. Such a strategy guarantees the best result to the detriment of performance. Given a sorted set of tuples $T$, all possible subsets $S_i$ of viable solutions of $T$ are created, such that for each pair of tuples $t = (t_e, t_{e'}, t_s)$ and $w = (w_e, w_{e'}, w_s)$ belonging to $S_i$ we have $t_e \neq w_e \wedge t_{e'} \neq w_{e'}$. After generating all possible combinations between tuples, the algorithm selects the subset $S_i$ which has the highest final similarity. For the sorted set of tuples listed above, the brute-force combination strategy creates all possible subsets $S_i$ of viable solutions of $T$. For this example, at the end of the process the subset $S = \{(e_2, e'_2, 0.7), (e_1, e'_3, 0.6)\}$ is chosen; the final similarity is given by $(0.7 + 0.6)/2 = 0.65$ and there is no other subset with a higher final similarity.
New combination strategies can be introduced in the tool by implementing an interface
called ICombination.
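The brute-force strategy described for DeepCombination can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the tool's code: the function names are hypothetical, viability is taken to mean that no entity appears in more than one tuple of a subset, and the final similarity is assumed to divide the summed degrees by the smaller number of distinct entities on either side, so that the best subset is the viable one with the largest sum.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# (entity e, entity e', similarity degree); labels are illustrative only
Pair = Tuple[str, str, float]

TUPLES: List[Pair] = [
    ("e1", "e'2", 0.8), ("e2", "e'2", 0.7), ("e1", "e'3", 0.6),
    ("e2", "e'1", 0.4), ("e1", "e'1", 0.3), ("e2", "e'3", 0.3),
]


def is_viable(subset: Sequence[Pair]) -> bool:
    """A subset is viable when no entity occurs in two of its tuples."""
    lefts = [e for e, _, _ in subset]
    rights = [e2 for _, e2, _ in subset]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)


def deep_combination(tuples: List[Pair]) -> List[Pair]:
    """Enumerate every subset and keep the viable one with the largest sum.
    The cost grows exponentially with the number of tuples."""
    best: List[Pair] = []
    best_total = 0.0
    for size in range(1, len(tuples) + 1):
        for subset in combinations(tuples, size):
            if not is_viable(subset):
                continue
            total = sum(s for _, _, s in subset)
            if total > best_total:
                best, best_total = list(subset), total
    return best


def final_similarity(selected: List[Pair], tuples: List[Pair]) -> float:
    """Assumed aggregation: summed degrees over the smaller side size."""
    smaller = min(len({e for e, _, _ in tuples}),
                  len({e2 for _, e2, _ in tuples}))
    if smaller == 0:
        return 0.0
    return sum(s for _, _, s in selected) / smaller
```

For the example tuples, deep_combination(TUPLES) returns the tuples with degrees 0.7 and 0.6, whose final similarity is approximately 0.65, higher than the 0.6 reached by the greedy strategy.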
3.2.2 Penalties. When a combination process is used in similarity functions that
analyze sets of entities, as described in the previous section, it is possible to define a penalty
specified by the user (see XML element penalty in Figure 2). The penalty can be considered a
negative weight applied to sets of entities that differ in their cardinality.
This type of penalty can be applied, for example, in similarity functions which assess the
properties of a class. If class c has x properties and another class c' has y properties, with
$x \neq y$, then the ontology engineer can decide to penalize the similarity degree calculated
for these two classes.
Therefore, let U be a penalty value and let g be a function that adds up the similarity
degrees generated by the occurrence combination function c. Consider that c combines the
occurrences of a similarity function f applied to two entities e and e'. The final similarity degree
r of a similarity function that allows the use of penalties is calculated as in equation (6):
$$r(e, e') = \frac{g(c(f(e, e')))}{l_{min} + U\,(l_{max} - l_{min})} \tag{6}$$
To understand equation (6), consider $n_e$ as the number of elements related to the entity e
which were analyzed to calculate the similarity F(e, e'). In addition, consider $n_{e'}$ as the
number of elements related to the entity e' which were analyzed to calculate the similarity
F(e, e'). For example, suppose e is a class with three subclasses and e' is a class with four subclasses,
and F(e, e') is a similarity function which analyzes the similarity between two classes
through a similarity function f applied to the subclasses of e and e'. In this example, we
would have $n_e = 3$ and $n_{e'} = 4$. Variables $l_{min}$ and $l_{max}$ represent, respectively, the minimum
and the maximum numbers of elements related to each entity, that is, $l_{min} = \min(n_e, n_{e'})$
and $l_{max} = \max(n_e, n_{e'})$.

Table III. Tests used in the tool evaluation
#    Summary
101  Identical ontologies
102  Irrelevant ontology
103  Language generalization
104  Language restriction
201  Unnamed entities
202  Entities with random names and no comments
203  No comments in the ontology
204  Distinct naming conventions
205  Names changed by synonyms
206  Distinct language (translation)
207  Name translations only
208  Changes applied on #203 and #204
209  Changes applied on #203 and #205
210  Changes applied on #203 and #206
221  No specialization
222  Flattened hierarchy
223  Expanded hierarchy
224  Instances removed
225  Restrictions removed
228  Properties removed
230  Flattened classes
232  Changes applied on #221 and #224
233  Changes applied on #221 and #228
236  Changes applied on #224 and #228
237  Changes applied on #222 and #224
238  Changes applied on #223 and #224
239  Changes applied on #222 and #228
240  Changes applied on #223 and #228
241  Changes applied on #221, #224 and #228
246  Changes applied on #222, #224 and #228
247  Changes applied on #223, #224 and #228
301  Real ontology: BibTeX/MIT
302  Real ontology: BibTeX/UMBC
303  Real ontology: BibTeX/Karlsruhe
304  Real ontology: BibTeX/INRIA
Penalty U is a real number belonging to [0,1]. If U = 0, then the difference in the number
of elements that have a given relation to an entity does not matter, and the result is a simple
arithmetic mean. Otherwise, the difference in the number of elements is relevant for the
calculation of similarity between the elements, and the higher the penalty value, the lower the
final value of r.
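A short Python sketch may help to see how the penalty changes the denominator of equation (6). The function name penalized_similarity and the sample value 2.4 for the summed similarity degrees are hypothetical, and returning 0.0 when the denominator is zero is an assumption made here, not something stated in the text.

```python
def penalized_similarity(total: float, n_e: int, n_e2: int,
                         penalty: float = 0.0) -> float:
    """Equation (6): divide the summed similarity degrees by
    l_min + U * (l_max - l_min), where U is the penalty in [0, 1]."""
    if not 0.0 <= penalty <= 1.0:
        raise ValueError("penalty must belong to [0, 1]")
    l_min, l_max = min(n_e, n_e2), max(n_e, n_e2)
    denominator = l_min + penalty * (l_max - l_min)
    if denominator == 0:
        return 0.0  # assumption: nothing to compare yields zero similarity
    return total / denominator


# Class e has three subclasses and class e' has four, as in the example.
# The summed degree 2.4 is a made-up value used only for illustration.
for u in (0.0, 0.5, 1.0):
    print(u, round(penalized_similarity(2.4, 3, 4, u), 3))
```

With U = 0 the sum is divided by $l_{min} = 3$, with U = 1 it is divided by $l_{max} = 4$, and intermediate penalties interpolate linearly between the two denominators.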
Penalties can be applied to functions that analyze any set of entities related to a given
entity, such as functions analyzing sets of instances, hierarchies or property values,
among others. The GNOSIS+ API provides an abstract class PenaltyFunction containing
basic methods for these types of functions, allowing penalties to be applied in new
similarity functions added to the application.
3.3 Persistence module
The persistence module is responsible for persisting the set of tuples received from the
analyzer module in comma-separated values (CSV), HTML and alignment format (AF). AF
is a format defined by Euzenat (2003, 2004), proposed so that most matchers would be able
to produce alignments in it. The XML file presented in Figure 3 is an example of an alignment file in
AF format. New formats can be added to the tool through the GNOSIS+ API.
4. Evaluation and results discussion
To evaluate this approach, this section describes the experiment conducted with the
OAEI benchmark, which provides a set of tests to verify the conformity of the
generated alignments with the expected alignments. The benchmark used for
the tests contains a reference ontology for the domain of bibliographical references. The
reference ontology, described in OWL-DL and serialized in RDF/XML format, has 33 named
classes, 24 object properties, 40 data properties, 56 named individuals and 20 anonymous
individuals, in addition to referencing external resources to express non-bibliographical
information. These include resources from Friend of a Friend (FOAF)[3] and
iCalendar[4] to express the concepts of person, organization and event.
Each test consists of analyzing a second ontology (which differs from test to test) and
generating an alignment file that contains as many as possible of the correspondences described in a
reference alignment file, which is provided to assess the quality of the alignment. The tests are
systematically generated by degrading the reference ontology, discarding or altering
a certain amount of information to assess how an algorithm behaves in the absence of
such information.
Specifically for the OMM problem, these tests are adequate because each one requires the
meta-matcher to assign a higher weight to the algorithms that are more
relevant to the final result of the alignment and, conversely, to discard or assign a
lower weight to the similarity functions that will not be decisive for the alignment.
The tests are divided into three major categories. The tests beginning with 1 (or 1xx) do not have
significant changes in the formalization of concepts; the tests beginning with 2 (or 2xx) consist of
systematic alterations in the formalization of concepts; and the tests beginning with 3 (or 3xx) consist
of comparisons with external ontologies. Table III summarizes the altered or suppressed
information of the reference ontology in each test selected for our experiment. Detailed
descriptions can be found on the tests website[5]. Six elements can be altered:
Names of entities can be replaced by random characters or synonyms, rewritten with a distinct naming convention or translated into another language.
Comments can be suppressed or translated into another language.
Hierarchies can be suppressed, expanded or reduced.
Instances can be suppressed.
Properties can be suppressed or have their restrictions discarded.
Classes can be expanded, that is, related to many other classes, or reduced.
Table IV. Results obtained from the tests
# test  Precision  Recall  Fall-out  F-measure
101     1.00       1.00    0.00      1.00
102     N/A        N/A     N/A       N/A
103     1.00       1.00    0.00      1.00
104     1.00       1.00    0.00      1.00
201     1.00       1.00    0.00      1.00
202     0.63       0.63    0.37      0.63
203     1.00       1.00    0.00      1.00
204     1.00       1.00    0.00      1.00
205     1.00       0.98    0.00      0.99
206     0.97       0.98    0.03      0.97
207     0.98       0.97    0.02      0.97
208     0.97       0.97    0.03      0.97
209     0.81       0.80    0.19      0.81
210     0.87       0.87    0.13      0.87
221     1.00       1.00    0.00      1.00
222     1.00       1.00    0.00      1.00
223     1.00       1.00    0.00      1.00
224     1.00       1.00    0.00      1.00
225     1.00       1.00    0.00      1.00
228     1.00       1.00    0.00      1.00
230     0.94       1.00    0.06      0.97
232     1.00       1.00    0.00      1.00
233     1.00       1.00    0.00      1.00
236     1.00       1.00    0.00      1.00
237     1.00       1.00    0.00      1.00
238     1.00       1.00    0.00      1.00
239     1.00       1.00    0.00      1.00
240     1.00       1.00    0.00      1.00
241     1.00       1.00    0.00      1.00
246     1.00       1.00    0.00      1.00
247     1.00       1.00    0.00      1.00
301     0.87       0.79    0.13      0.83
302     1.00       0.71    0.00      0.83
303     0.75       0.84    0.25      0.79
304     0.88       0.95    0.12      0.91

To assess the conformity of the generated alignments with the expected alignments,
the OAEI benchmark uses metrics from the area of information retrieval, such as
precision, recall, f-measure and fall-out rate. Given a reference alignment A and an
obtained alignment B, the metrics are defined according to equations (7)-(10):
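$$\mathrm{precision}(A, B) = \frac{|A \cap B|}{|B|} \tag{7}$$
$$\mathrm{recall}(A, B) = \frac{|A \cap B|}{|A|} \tag{8}$$
$$\mathrm{fmeasure}(A, B) = \frac{2PR}{P + R} \tag{9}$$
$$\mathrm{fall\text{-}out}(A, B) = \frac{|B| - |A \cap B|}{|B|} \tag{10}$$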
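The four metrics can be computed directly from sets of correspondences, as in the following Python sketch. Here P and R in equation (9) are read as precision(A, B) and recall(A, B). The function name evaluate_alignment and the sample correspondences are hypothetical, and returning 0.0 for empty alignments is an assumption made to avoid division by zero.

```python
from typing import Dict, Set, Tuple

Correspondence = Tuple[str, str]  # (entity of ontology 1, entity of ontology 2)


def evaluate_alignment(reference: Set[Correspondence],
                       obtained: Set[Correspondence]) -> Dict[str, float]:
    """Compute equations (7)-(10) for a reference alignment A and an
    obtained alignment B."""
    correct = len(reference & obtained)  # |A ∩ B|
    precision = correct / len(obtained) if obtained else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    fall_out = (len(obtained) - correct) / len(obtained) if obtained else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "fall_out": fall_out}


# Made-up correspondences used only to exercise the formulas.
A = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
B = {("a", "x"), ("b", "y"), ("c", "q")}
metrics = evaluate_alignment(A, B)
```

Note that, with these definitions, fall-out equals 1 minus precision whenever B is not empty, which agrees with the values reported in Table IV.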
The system was set up to use simple preprogrammed similarity functions instead of the
more efficient alignment algorithms proposed in the literature. Each function assessed
similarity using simple heuristics such as the names of concepts, the intersection of
individuals and the relationships between classes through properties. Thus, each function
implements a heuristic that analyzes a single type of entity of the ontologies. As a result, the
assessment of the weight calibration for each function becomes clearer, as the OAEI tests
are carried out with systematic distortions of the model. To resolve each distortion, a
stronger emphasis is required on the analysis of certain types of entities than on others.
Had more complex similarity functions been used, that is, functions which
analyze different types of entities at the same time, the reason why each function
received the weight attributed by the genetic algorithm would not have been evident.
Furthermore, by using simple functions, the processing time of the system decreases, and it
can be demonstrated that, given a set of algorithms covering the possible changes that the
model may suffer, the relevance of the meta-matcher lies in revealing how to combine these
algorithms to obtain the best result.
The tests were carried out with seven similarity functions, which were used to analyze
names of entities, instances, comments, relationships, specializations, domain and range of
properties, and properties of classes. The tests were conducted with the calibration module
set up to generate 100 individuals in the initial generation and run for 200 generations. The
problem was discretized with a granularity of 0.005. To feed the calibration module, each
pre-alignment file was used to store six known correspondences (about 6-10 per cent of the
reference alignment, which is far below the 100 per cent that some meta-aligners use).
Table IV presents the results of GNOSIS+ for each test. Table V presents the total
harmonic mean reached, grouped per type of test.
The results obtained demonstrate that the tool is able to adapt to different kinds of
heterogeneity, fulfilling what is expected from matchers. More complex similarity
functions can be added to the tool to handle other types of relationships or scenarios which
are not represented by the test data set (such as real situations in which it becomes
necessary to use external databases to support the alignment process). The experiments
were carried out on a computer with a 2.9 GHz Intel Core i7 and 8 GB of RAM. The average time
was 33 s per test case using our approach, whereas the average time using the f-measure fitness
function from related work was 162 s.
Table VI shows the average of the metrics reached by other ontology meta-matchers.
The averages were calculated from the results reported in the original papers. It is worth
mentioning that authors[6] of related work usually choose a subset of the tests for their
experiments. GNOSIS+ achieved slightly better results even when using a larger set of tests.
The input size can influence the effectiveness of the approach. However, we argue that a
small set of correspondences is enough to reach good results. Figure 4 shows the
effectiveness according to the input size. For an input subset that contains 1 per cent of the
reference alignment, the mean f-measure was equal to 0.2 with variance equal to 1.6. The
variance represents how much the f-measure varies across benchmark evaluations. The mean
f-measure was calculated over 20 evaluations. This variance occurs because GNOSIS+
implements a stochastic population-based approach, that is, it addresses the OMM problem as
an approximation problem. Thus, the smaller the set of input correspondences, the worse the
algorithm's convergence. On the other hand, a large input subset is not required, because input
subsets with 10 per cent of the reference alignment do not produce significantly better results than
input subsets with 6 per cent of the reference alignment.
Table V. Results grouped per type of test
# test  Precision  Recall  Fall-out  F-measure
1xx     1.00       1.00    0.00      1.00
2xx     0.96       0.96    0.04      0.96
3xx     0.86       0.83    0.14      0.85
All     0.96       0.96    0.04      0.96
Table VI. Results in OAEI data set
Tools                        Precision  Recall  F-measure
GNOSIS+                      0.96       0.96    0.96
Martinez-Gil et al. (2012)   0.86       0.81    0.82
Mathur et al. (2014)         0.93       0.72    0.79
Biniz and El Ayachi (2018)   N/A        N/A     0.93
Xue and Wang (2017)          0.97       0.94    0.95
Xue and Liu (2017)           0.96       0.92    0.94
Figure 4. Mean f-measure (y-axis) when x per cent of the reference alignment is set as input (x-axis). The bar height represents the variance.
5. Concluding remarks
This paper presented a solution for ontology meta-matching designed to be
embedded in other applications. The OMM problem was modeled in GNOSIS+ as a
system of linear equations in which the weights applied to the ontology matchers are the
variables of the system. Although ontology meta-matching approaches in the literature usually
make use of the reference ontology for meta-heuristic training, we show that GNOSIS+ can reach
good results with little training data.
The tool developed in this work benefits the alignment process by using matcher
calibration techniques, allowing the user to make use of several matchers to reach better
results. The tool can be used to match pairs of ontologies in RDF or OWL formats and
return the alignments in AF, CSV or HTML formats. In addition, new alignment algorithms,
output formats and techniques for choosing the alignments can be added to the tool by
implementing predetermined interfaces. Therefore, the tool can be used to meet the users'
needs, either by finding the ideal weights for the matchers or by receiving new
algorithms.
Further studies should include new approaches for calibrating the weights of the
matchers. GNOSIS+ implements a single-objective approach; however, multi-objective
approaches, such as those of Xue and Wang (2017) and Marjit (2015), can be used to
adapt to users' needs when the user wishes to balance precision and recall or has
other restrictions which may influence the weights of the matchers, such as maximizing
the number of correspondences or minimizing the number of matchers under performance
restrictions. In addition, probabilistic approaches (Kimmig et al., 2017;
Jiménez-Ruiz et al., 2016) can make the calibrator module less dependent on the initial
input data. Recent interactive ontology matching approaches (Xue and Yao, 2018; da
Silva et al., 2018) may be used with GNOSIS+ to improve the results over the fully
automatic approach, as the new correspondences provided by experts may better guide
the algorithm.
Notes
1. Available at: http://oaei.ontologyalignment.org/
2. Available at: http://jena.apache.org/
3. Available at: http://xmlns.com/foaf/0.1/
4. Available at: www.w3.org/2002/12/cal/
5. Available at: http://oaei.ontologymatching.org/tests/
6. For instance, in Martinez-Gil et al. (2012), the authors made available only the results of tests 201,
202, 203, 204, 205, 206 and 301.
References
Abubakar, M., Hamdan, H., Mustapha, N. and Aris, T.N.M. (2018), Instance-based ontology matching: a literature review, International Conference on Soft Computing and Data Mining, Springer, pp. 455-469.
Babalou, S., Kargar, M.J. and Davarpanah, S.H. (2016), Large-scale ontology matching: a review of the literature, 2016 Second International Conference on Web Research (ICWR), IEEE, pp. 158-165.
Biniz, M. and El Ayachi, R. (2018), Optimizing ontology alignments by using Neural NSGA-II, Journal of Electronic Commerce in Organizations, Vol. 16 No. 1, pp. 29-42.
Chauhan, A., Varadarajan, V. and Sliman, L. (2018), Ontology matching techniques: a gold standard model, arXiv preprint arXiv:1811.10191.
da Silva, J., Baiao, F.A., Revoredo, K. and Euzenat, J. (2017), Semantic interactive ontology matching: synergistic combination of techniques to improve the set of candidate correspondences, International Workshop on Ontology Matching.
Da Silva, J., Revoredo, K., Baiao, F.A. and Euzenat, J. (2018), Interactive ontology matching: using expert feedback to select attribute mappings, 13th ISWC Workshop on Ontology Matching (OM), pp. 25-36.
Damerau, F. (1964), A technique for computer detection and correction of spelling errors, Communications of the ACM, Vol. 7 No. 3, pp. 171-176.
Deb, K., Agrawal, S., Pratap, A. and Meyarivan, T. (2000), A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II, International Conference on Parallel Problem Solving from Nature, Springer, pp. 849-858.
Djeddi, W.E., Khadir, M.T. and Yahia, S.B. (2016), Xmap: results for OAEI 2016, OM, pp. 216-221.
Euzenat, J. (2003), Towards composing and benchmarking ontology alignments, Proceedings of the ISWC Workshop on Semantic Information Integration, Sanibel Island (FL US), Sanibel Island, pp. 165-166.
Euzenat, J. (2004), An api for ontology alignment, Proceedings of the International Semantic Web Conference, Volume 3298 of Lecture Notes in Computer Science, Springer, Hiroshima, pp. 698-712.
Faria, D., Pesquita, C., Balasubramani, B.S., Tervo, T., Carriço, D., Garrilha, R., Couto, F.M. and Cruz, I.F. (2018), Results of AML participation in OAEI 2018, Ontology Matching: OM-2018: Proceedings of the ISWC Workshop, p. 125.
Gulić, M., Vrdoljak, B. and Ptiček, M. (2018), Automatically specifying a parallel composition of matchers in ontology matching process by using genetic algorithm, Information, Vol. 9 No. 6, pp. 138.
Hamming, R. (1950), Error detecting and error correcting codes, Bell System Technical Journal, Vol. 29 No. 2, pp. 147-160.
Hu, W. and Qu, Y. (2008), Falcon-ao: a practical ontology matching system, Journal of Web Semantics, Vol. 6 No. 3, pp. 237-239.
Janowicz, K. and Wilkes, M. (2009), Sim-dl a: a novel semantic similarity measure for description logics reducing inter-concept to inter-instance similarity, The Semantic Web: Research and Applications, pp. 353-367.
Jayasri, K. and Rajmohan, R. (2015), Exploration on service matching methodology based on description logic using similarity performance parameters, International Journal of Research and Development Organisation, Vol. 2 No. 3, p. 2.
Jiménez-Ruiz, E., Grau, B.C. and Cross, V.V. (2016), Logmap family participation in the OAEI 2016, International Semantic Web Conference, pp. 185-189.
Karimi, H. and Kamandi, A. (2019), A learning-based ontology alignment approach using inductive logic programming, Expert Systems with Applications, Vol. 125, pp. 412-424.
Kimmig, A., Memory, A., Miller, R.J. and Getoor, L. (2017), A collective, probabilistic approach to schema mapping, 2017 IEEE 33rd International Conference on Data Engineering (ICDE), IEEE, pp. 921-932.
Kolyvakis, P., Kalousis, A. and Kiritsis, D. (2018), Deepalignment: unsupervised ontology matching with refined word vectors, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, New Orleans, Long Papers, Vol. 1, pp. 787-798.
Kondrak, G. (2005), N-gram similarity and distance, String Processing and Information Retrieval, Springer, pp. 115-126.
Lambrix, P., Tan, H. and Liu, Q. (2008), Sambo and sambodtf results for the ontology alignment evaluation initiative 2008, Proceedings of the 3rd International Conference on Ontology Matching, Vol. 431, CEUR-WS.org, pp. 190-198.
Levenshtein, V. (1966), Binary codes capable of correcting deletions, insertions, and reversals, Vol. 10 No. 8, pp. 707-710.
Li, W., Zhang, S. and Qi, G. (2018), A graph-based approach for resolving incoherent ontology mappings, Web Intelligence, Vol. 16 No. 1, pp. 15-35.
Li, J., Tang, J., Li, Y. and Luo, Q. (2009), Rimom: a dynamic multistrategy ontology alignment framework, IEEE Transactions on Knowledge and Data Engineering, Vol. 21 No. 8, pp. 1218-1232.
Liu, X., Cheng, B., Liao, J., Barnaghi, P., Wan, L. and Wang, J. (2016), Omi-dl: an ontology matching framework, IEEE Transactions on Services Computing, Vol. 9 No. 4, pp. 580-593.
Marjit, U. (2015), Aggregated similarity optimization in ontology alignment through multiobjective particle swarm optimization, IJARCCE, Vol. 4 No. 2, pp. 258-263.
Martinez-Gil, J. and Aldana-Montes, J.F. (2012), An overview of current ontology meta-matching solutions, The Knowledge Engineering Review, Vol. 27 No. 4, pp. 393-412.
Martinez-Gil, J., Navas-Delgado, I. and Aldana-Montes, J.F. (2012), Maf: an ontology matching framework, Journal of Universal Computer Science, Vol. 18 No. 2, pp. 194-217.
Mathur, I., Joshi, N., Darbari, H. and Kumar, A. (2014), Shiva: a framework for graph based ontology matching, International Journal of Computer Applications, Vol. 89 No. 11, pp. 30-34.
Mohammadi, M., Hofman, W. and Tan, Y.H. (2019), A comparative study of ontology matching systems via inferential statistics, IEEE Transactions on Knowledge and Data Engineering, Vol. 31 No. 4, pp. 615-628.
Mohammadi, M., Atashin, A.A., Hofman, W. and Tan, Y. (2018), Comparison of ontology alignment systems across single matching task via the McNemar's test, ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 12 No. 4, p. 51.
Otero-Cerdeira, L., Rodríguez-Martínez, F.J. and Gómez-Rodríguez, A. (2015), Ontology matching: a literature review, Expert Systems with Applications, Vol. 42 No. 2, pp. 949-971.
Quintero, R., Torres-Ruiz, M., Menchaca-Mendez, R., Moreno-Armendariz, M.A., Guzman, G. and Moreno-Ibarra, M. (2018), Dis-c: conceptual distance in ontologies, a graph-based approach, Knowledge and Information Systems, pp. 1-33.
Robertson, S. and Jones, K. (1976), Relevance weighting of search terms, Journal of the American Society for Information Science, Vol. 27 No. 3, pp. 129-146.
Sengupta, K. and Hitzler, P. (2015), Towards defeasible mappings for tractable description logics, International Semantic Web Conference, Springer, pp. 237-252.
Shao, C., Hu, L.-M., Li, J.-Z., Wang, Z.-C., Chung, T. and Xia, J.-B. (2016), Rimom-im: a novel iterative framework for instance matching, Journal of Computer Science and Technology, Vol. 31 No. 1, pp. 185-197.
Souza, J.F., Siqueira, S.W.M., Melo, R.N. and de Lucena, C.J.P. (2014), Análise de abordagens populacionais Para Meta-alinhamento de ontologias, iSys-Revista Brasileira de Sistemas de Informação, Vol. 7 No. 4, pp. 75-97.
Souza, J.M. (1986), Software tools for conceptual schema integration, PhD thesis, University of East Anglia.
Tang, Y., Wang, P., Pan, Z. and Liu, H. (2018), Lily results for OAEI 2018, Ontology Matching: OM-2018: Proceedings of the ISWC Workshop, p. 179.
Thiéblin, É., Haemmerlé, O., Hernandez, N. and Trojahn, C. (2019), Survey on complex ontology matching, Semantic Web Journal.
Van Hage, W.R., Katrenko, S. and Schreiber, G. (2005), A method to combine linguistic ontology-mapping techniques, International Semantic Web Conference, Springer, pp. 732-744.
Wang, Z., Zhang, X., Hou, L. and Li, J. (2011), Rimom2: a flexible ontology matching framework, Proceedings of the ACM WebSci'11, pp. 1-2.
Winkler, W.E. (1999), The state of record linkage and current research problems, Technical Report 99/04, Statistical Research Division, US Census Bureau.
Wu, T., Qi, G., Wang, H., Xu, K. and Cui, X. (2016), Cross-lingual taxonomy alignment with bilingual biterm topic model, The AAAI Conference on Artificial Intelligence, pp. 287-293.
Xue, X. and Liu, J. (2017), Optimizing ontology alignment through compact MOEA/D, International Journal of Pattern Recognition and Artificial Intelligence, Vol. 31 No. 4, p. 1759004.
Xue, X. and Pan, J.-S. (2018), A compact co-evolutionary algorithm for sensor ontology meta-matching, Knowledge and Information Systems, Vol. 56 No. 2, pp. 335-353.
Xue, X. and Wang, Y. (2017), Improving the efficiency of nsga-ii based ontology aligning technology, Data and Knowledge Engineering, Vol. 108, pp. 1-14.
Xue, X., Chen, J., Chen, J. and Chen, D. (2018a), A hybrid nsga-ii for matching biomedical ontology, International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Springer, pp. 3-10.
Xue, X., Chen, J., Chen, J. and Chen, D. (2018b), Using compact coevolutionary algorithm for matching biomedical ontologies, Computational Intelligence and Neuroscience.
Xue, X. and Yao, X. (2018), Interactive ontology matching based on partial reference alignment, Applied Soft Computing, Vol. 72, pp. 355-370.
Zang, Y., Wang, J. and Zhu, X. (2016), A general framework for graph matching and its application in ontology matching, International Conference on Web-Age Information Management, Springer, pp. 365-377.
Zhang, Q. and Li, H. (2007), Moea/d: a multiobjective evolutionary algorithm based on decomposition, IEEE Transactions on Evolutionary Computation, Vol. 11 No. 6, pp. 712-731.
Corresponding author
Jairo Francisco de Souza can be contacted at: jairo.souza@ice.ufjf.br
... This research evaluated three different metaheuristics: Genetic Algorithm (GA), Prey-Predator Algorithm (PPA), and a Greedy Randomized Adaptive Search Procedure (GRASP). GA is based on the classical algorithm proposed by Holland [20] inspired by the evolutionary biology and implementation details can be found in [21]. PPA algorithm is based on the movement pressure that forces a set of preys (average solutions) to run away from a predator (worst solution); this algorithm was based on the approach proposed by [22] and implementation details can be found in [23]. ...
... It is a harmonic mean between MatchRatio and MatchCoverage, where MatchRatio and MatchCoverage are substitutes for Precision and Recall, respectively [25]. -Linear System-based objective function evaluates the ability of a solution to solve a linear system built from a set of reference correspondences [21]. It is considered a semisupervised function due to the size of the reference set, which is from 3 to 4% of the correct alignment. ...
... After defining the hypotheses about the similarity measures used in the Modeling step, the next hypotheses refer to the meta-heuristics and objective functions, dividing between those used in the Optimization step and the ones used in Correspondence Selection step. There are several works addressing the OMM problem using different meta-heuristics, such as Genetic Algorithm (GA) [21], Memetic Algorithm [36], and Particle Swarm Optimization (PSO) [37]. Each meta-heuristic can make use of one or more objective functions, which are responsible for guiding the meta-heuristic solution(s) in the search space. ...
Article
Full-text available
Every year, new ontology matching approaches have been published to address the heterogeneity problem in ontologies. It is well known that no one is able to stand out from others in all aspects. An ontology meta-matcher combines different alignment techniques to explore various aspects of heterogeneity to avoid the alignment performance being restricted to some ontology characteristics. The meta-matching process consists of several stages of execution, and sometimes the contribution/cost of each algorithm is not clear when evaluating an approach. This article presents the evaluation of solutions commonly used in the literature in order to provide more knowledge about the ontology meta-matching problem. Results showed that the more characteristics of the entities that can be captured by similarity measures set, the greater the accuracy of the model. It was also possible to observe the good performance and accuracy of local search-based meta-heuristics when compared to global optimization meta-heuristics. Experiments with different objective functions have shown that semi-supervised methods can shorten the execution time of the experiment but, on the other hand, bring more instability to the result.
... Throughout the evolution of Web, several challenges have emerged to optimize information exchange and facilitate the consumption of Web-published data. Ontologies have emerged to determine a well-defined meaning for data and to reduce semantic heterogeneity problems when this data is consumed (De Souza et al. 2019). However, ontologies themselves have become the cause of a new problem in data semantics: using more than one ontology in an application can lead to ambiguity in data interpretation, mainly because ontologies are built by engineers with different needs and domain views . ...
... In these systems, traditional information retrieval measures are used: precision (rate of correct correspondences returned), recall (the rate of expected correct correspondences returned) and f-measure (harmonic mean between precision and recall), as in Xue et al. (2014), Marjit (2015), and Biniz and Ayachi (2018). In De Souza et al. (2019), the reference ontology is used in the construction of a linear system whose objective function seeks to find the best solution to solve the system. Due to the different types of meta-heuristics and objective functions, frameworks for comparing different proposals are important to understand what actually happens in each approach. ...
... Although simple, the front-end interface for users brings the solution closer to the real world applications, partly attending to some discussions on how to popularize and increase confidence in the OMM approaches . Several matchers can be used in the MaF, including hybrid matchers where atomic matchers are combined (hybrid matchers are also explored in other works (De Souza et al. 2019)). The variety of combinations of algorithms implemented in the tool makes scientific experimentation through the MaF customizable and easily performed from the point of view of the front-end user. ...
Article
Full-text available
Ontology matching has become a key issue to solve problems of semantic heterogeneity. Several researchers propose diverse techniques that can be used in distinct scenarios. Ontology meta-matching approaches are a specialization of ontology matching and have achieved good results in pairs of ontologies with different types of heterogeneities. However, developing a new ontology meta-matcher can be a costly process and a lot of experiments are often carried out to analyze the behavior of the matcher. This article presents a modularized framework that covers the main stages of the ontology meta-matching evaluation process. This framework aims to aid researchers to develop and analyze algorithms for ontology meta-matching, mainly metaheuristic-based supervised and unsupervised approaches. As the main contribution of the research, the framework proposed will facilitate the evaluation of ontology meta-matching approaches and, as the secondary contribution, a data provenance model that captures the main information generated and consumed throughout experiments is presented in the framework.
... For example, the OpenBiodiv 4 ontology introduces classes, properties, and axioms that align with several important domain ontologies (FaBiO, DoCO, DwC, Darwin-SW, NOMEN, ENVO) [16]. A meta-alignment approach is also proposed to align distinctive standards using simple similarity functions [17]. ...
... In addition, new tools for the aggregation of matchers (Ferranti et al., 2021;de Souza et al., 2020) as well as new techniques to achieve better results such as grasshopper optimization (Lv & Peng, 2020) have recently appeared. Or even evolutionary strategies that provide answers to problems such as premature convergence and the requirement of a reference alignment (Lv et al., 2021). ...
Article
Ontology meta-matching techniques have been consolidated as one of the best approaches to the problem of discovering semantic relationships between knowledge models that belong to the same domain but have been developed independently. After more than a decade of research, the community has reached a stage of maturity characterized by increasingly better results, and aspects such as the robustness and scalability of solutions have been addressed. However, the resulting models remain practically unintelligible to a human operator. In this work, we present a novel approach based on Mamdani fuzzy inference that exploits a model very close to natural language. The objective is twofold: to achieve results with high degrees of accuracy and, at the same time, to guarantee the interpretability of the resulting models. After validating our proposal with several ontological models popular in the biomedical field, we can conclude that the results obtained are promising.
Article
Today, the amount of available data is increasing rapidly because of advances in information and communications technology. As a result, many mutually heterogeneous data sources exist that describe the same domain of interest. To facilitate the integration of these heterogeneous data sources, an ontology can be used, as it enriches the knowledge of a data source by giving a detailed description of entities and their mutual relations within the domain of interest. Ontology matching is a key issue in integrating heterogeneous data sources described by ontologies, as it eases the management of data coming from various sources. An ontology matching system consists of several basic matchers. To determine high-quality correspondences between entities of the compared ontologies, the matching results of these basic matchers should be aggregated by an aggregation method. In this paper, a new weighted aggregation method for the parallel composition of basic matchers, based on a genetic algorithm, is presented. The evaluation has confirmed the high quality of the new aggregation method: it improved the process of matching two ontologies by obtaining higher confidence values for correctly found correspondences, thus increasing the quality of the matching results.
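The weighted aggregation step described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the function names, the example scores, and the threshold are hypothetical, and the weights are fixed by hand here, whereas the abstract describes searching for them with a genetic algorithm.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (entity in ontology A, entity in ontology B)


def aggregate(matcher_scores: List[Dict[Pair, float]],
              weights: List[float]) -> Dict[Pair, float]:
    """Combine per-matcher similarity scores into one weighted average per pair."""
    if len(matcher_scores) != len(weights):
        raise ValueError("one weight is required per matcher")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Every pair scored by at least one matcher; a missing score counts as 0.0.
    pairs = set().union(*matcher_scores)
    return {
        pair: sum(w * s.get(pair, 0.0) for w, s in zip(weights, matcher_scores)) / total
        for pair in pairs
    }


def select(aggregated: Dict[Pair, float], threshold: float) -> List[Pair]:
    """Keep the correspondences whose aggregated confidence reaches the threshold."""
    return sorted(pair for pair, score in aggregated.items() if score >= threshold)


# Hypothetical outputs of two basic matchers (string-based and structure-based).
string_sim = {("Author", "Writer"): 0.2, ("Author", "Author"): 1.0}
struct_sim = {("Author", "Writer"): 0.8, ("Author", "Author"): 0.6}

combined = aggregate([string_sim, struct_sim], weights=[1.0, 3.0])
alignment = select(combined, threshold=0.68)
```

In a meta-matching setting, an optimizer would evaluate many candidate weight vectors by calling something like `aggregate` and scoring the resulting alignment; the sketch shows only a single evaluation.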
Article
Ontology matching systems are typically compared by their average performance over multiple datasets. This paper instead examines alignment systems using statistical inference, since averaging is statistically unsafe and inappropriate. The statistical tests for comparing two or more alignment systems are reviewed theoretically and empirically. For the comparison of two systems, the Wilcoxon signed-rank test and McNemar's mid-p and asymptotic tests are recommended because of their robustness and statistical safety in different circumstances. The Friedman and Quade tests, with their corresponding post-hoc procedures, are studied for the comparison of multiple systems, and their advantages and disadvantages are discussed. The statistical methods are then applied to the benchmark and multifarm tracks of the Ontology Alignment Evaluation Initiative (OAEI) 2015, and the results are reported and visualized with critical difference diagrams.
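As a rough illustration of the paired comparison mentioned in this abstract, the sketch below computes the rank sums used by the Wilcoxon signed-rank test for two systems evaluated on the same test cases. The function name and the scores are hypothetical, and the sketch stops at the test statistic: turning it into a p-value requires a table or a statistics library.

```python
from typing import List, Tuple


def wilcoxon_rank_sums(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Return (W+, W-): rank sums of the positive and negative paired differences."""
    if len(a) != len(b):
        raise ValueError("both systems must be scored on the same test cases")
    # Zero differences carry no information about direction and are dropped.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical F-measures (in percent) of two systems on five test cases.
system_a = [80, 75, 90, 60, 85]
system_b = [70, 78, 80, 55, 70]

w_plus, w_minus = wilcoxon_rank_sums(system_a, system_b)
statistic = min(w_plus, w_minus)
```

Integer percentages are used in the example so that tied differences are detected exactly; with floating-point scores, differences would need rounding before ranking.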
Article
This paper presents the DIS-C approach, a novel method for assessing the conceptual distance between concepts within an ontology. DIS-C is graph-based in the sense that the whole topology of the ontology is considered when computing the weight of the relationships between concepts. The methodology is composed of two main steps. First, in order to take advantage of previous knowledge, an expert in the ontology domain assigns initial weight values to each of the relations in the ontology. Then, an automatic method for computing the conceptual relations refines the weights assigned to each relation until a stable state is reached. We introduce a metric called generality, defined to evaluate the accessibility of each concept by treating the ontology as a strongly connected graph. Unlike most previous approaches, the DIS-C algorithm computes similarity between concepts in ontologies that are not necessarily represented in a hierarchical or taxonomic structure. DIS-C is therefore capable of incorporating a wide variety of relationships between concepts, such as meronymy, antonymy, functionality and causality.
Article
Ontology mappings are regarded as the semantic bridges that link entities from different yet overlapping ontologies in order to support knowledge sharing and reuse on the Semantic Web. However, mappings can be wrong and result in logical conflicts among ontologies. Such mappings are called incoherent mappings. As an important part of ontology matching, mapping validation aims at detecting the conflicts and restoring the coherence of mappings. In this paper, we propose a graph-based approach that is complete for detecting incoherent mappings among DL-Lite ontologies. The lightweight DL-Lite family of description logics stands out for its tractable reasoning and efficient query answering capabilities. Our approach consists of a set of graph construction rules, a graph-based incoherence detection algorithm, and a graph-based incoherence repair algorithm. We propose and formalize three repair principles in an attempt to measure the wrong mappings, where the notion of common closures w.r.t. a mapping arc in the constructed graph is introduced. These principles feature a global removal strategy that is independent of individual ontology matchers. To mitigate the loss of information among ontologies in the repair process, we further define a mapping revision operator so that common closures related to the removed mappings can be preserved in the graph. We implement the graph-based algorithms and evaluate their performance in a comparison with state-of-the-art systems on real-world ontologies. Experimental results show that our approach can remove more wrong mappings and achieve better repair results in most cases.
Article
In this article, the authors propose a new hybrid approach based on a continuous Non-dominated Sorting Genetic Algorithm II (NSGA-II) and a neural network to refine the alignment results. The approach consists of three phases: (i) a pre-alignment phase, which identifies the formats of the input ontologies, adapts them, and transforms them into the Web Ontology Language (OWL) in order to solve the problem of heterogeneity of representation; (ii) an alignment phase, which combines syntactic and linguistic matching techniques and methods, based on the relevant attributes from the syntactic and structural points of view; and (iii) a post-alignment phase, which optimizes the matching with a hybrid technique combining continuous NSGA-II and neural networks. The approach is compared with the top-performing systems according to the Ontology Alignment Evaluation Initiative (OAEI) standard. The experimental results indicate that the proposed approach is effective.
Article
With the proliferation of sensors, semantic web technologies are becoming closely related to sensor networks. The linking of elements from semantic web technologies with sensor networks is called the semantic sensor web, whose main feature is the use of sensor ontologies. However, due to the subjectivity of different sensor ontology designers, different sensor ontologies may define the same entities with different names or in different ways, raising the so-called sensor ontology heterogeneity problem. There are many application scenarios where solving the problem of semantic heterogeneity may have a big impact, and it is urgent to provide techniques that enable the processing, interpretation and sharing of data from a sensor web whose information is organized into different ontological schemes. Although the sensor ontology heterogeneity problem can be effectively solved by Evolutionary Algorithm (EA)-based ontology meta-matching technologies, the drawbacks of traditional EAs, such as premature convergence and long runtime, seriously hamper their application in practical dynamic settings. To solve this problem, we propose a novel Compact Co-Evolutionary Algorithm (CCEA) to improve the ontology alignment’s quality and reduce the runtime. In particular, CCEA works with one better probability vector (PV) \(PV_{better}\) and one worse PV \(PV_{worse}\), where \(PV_{better}\) mainly focuses on exploitation, which is dedicated to increasing the speed of convergence, and \(PV_{worse}\) pays more attention to exploration, which aims at preventing premature convergence. In the experiment, we use Ontology Alignment Evaluation Initiative (OAEI) test cases and two pairs of real sensor ontologies to test the performance of our approach.
The experimental results show that the CCEA-based ontology matching approach is both effective and efficient when matching ontologies of various scales and under different heterogeneous situations, and that, compared with state-of-the-art sensor ontology matching systems, it can significantly improve the ontology alignment’s quality.
Article
Ontologies are key concepts in the semantic web and make up the largest and most prominent part of the infrastructure in this area of web research. With the fast growth of the semantic web and the variety of its applications, ontology mapping (ontology alignment) has become a crucial issue in computer science. Several approaches to ontology alignment have been introduced in recent years, but developing more accurate and efficient algorithms and finding new effective techniques for this problem remains an interesting research area, since real-world applications, with their more complicated concepts, need more efficient algorithms. In this paper, we illustrate a new ontology mapping method based on learning with Inductive Logic Programming (ILP), and show how ILP can be used to solve the ontology mapping problem. In this approach, an ontology described in OWL format is translated into first-order logic. Then, through learning based on inductive logic, the hidden rules and relationships between concepts are discovered and presented. Since inductive logic is highly flexible in solving problems such as discovering relationships between concepts and links, it can also be applied effectively to the ontology alignment problem. Our experimental results show that this technique yields more accurate results than other matching algorithms and systems, achieving an F-measure of 95.6% and 91% on two well-known reference datasets, Anatomy and Library, respectively.
Article
The technique that enables the user and the automatic ontology matching tool to cooperate with each other to generate high-quality alignments in a reasonable amount of time is referred to as interactive ontology matching. Interactive ontology matching poses a new challenge: how to efficiently leverage user validation to improve the ontology alignment. To address this challenge, this paper presents an innovative interactive ontology matching technique based on Partial Reference Alignment (PRA) that better balances the large workload imposed on users against the demand for improving the quality of the ontology alignment. In particular, a PRA-based Interactive Compact Hybrid Evolutionary Algorithm (ICHEA) is proposed to reduce user workload by adaptively determining when to involve users, showing them the most problematic mappings, and helping them deal with multiple conflicting mappings simultaneously. Meanwhile, it increases the value of user involvement by propagating the confidences of validated mappings and by reducing the negative effects of erroneous user validations. The well-known OAEI 2016 benchmark track and interactive track are used to test the performance of this approach. The experimental results on the benchmark track show that both the f-measure and the f-measure per second of this approach outperform those of the OAEI participants and three state-of-the-art Evolutionary Algorithm (EA)-based ontology matching techniques. In addition, the experimental results on three interactive test cases further show that ICHEA can efficiently determine high-quality ontology alignments under different user error rates, and that the performance of the approach is generally better than that of state-of-the-art interactive ontology matching systems.