Article

Ontology Matching: State of the Art and Future Challenges

Authors: Pavel Shvaiko, Jérôme Euzenat

Abstract

After years of research on ontology matching, it is reasonable to consider several questions: is the field of ontology matching still making progress? Is this progress significant enough to pursue further research? If so, what are the particularly promising directions? To answer these questions, we review the state of the art of ontology matching and analyze the results of recent ontology matching evaluations. These results show a measurable improvement in the field, the speed of which is albeit slowing down. We conjecture that significant improvements can be obtained only by addressing important challenges for ontology matching. We present such challenges with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.


... However, a user's time and effort are limited resources and therefore it is necessary to consider strategies and approaches which would both limit interaction with the user and facilitate the user's involvement. The relevance of user involvement is evidenced by the fact that nearly half of the future challenges of the ontology matching area [111] are directly related to it. ...
... There are a number of surveys on the state of the art (e.g. [65,85,95,97,109,110,111]) as well as books (e.g. [14,41]) discussing different aspects of the field. ...
... [14,41]) discussing different aspects of the field. However, even with more than 20 years of research there are still a number of challenges facing the community [110,111]. The first challenge is related to the issue of evaluating large-scale ontology matching. ...
... The second and equally critical stage is the synthesis of these SFs into an optimal ensemble, a structure that can integrate disparate SFs into a harmonious whole. However, it is the interaction between SFs, often a web of complex and subtle interactions, that presents a formidable challenge, one that precludes simplistic or one-dimensional selection strategies [12]. ...
... Their union is called the set of ontology entities. Due to the absence of uniform design standards, ontologies often suffer from the heterogeneity problem [12], and OM addresses this issue by identifying correspondences between semantically similar entities [8]. ...
... Due to the challenges posed by the high-dimensional search space and complex interactions between SFs in OM, specialized FS methods are essential [12] to improve the quality of matching results. These methods aim to enhance matching accuracy by removing irrelevant or redundant SFs while retaining semantically significant ones, which can be divided into three categories, i.e., embedding-based, filter-based, and wrapper-based FS methods [22]. ...
Article
Full-text available
Ontology serves as a structured knowledge representation that models domain-specific concepts, properties, and relationships. Ontology matching (OM) aims to identify similar entities across distinct ontologies, which is essential for enabling communication between them. At the heart of OM lies the similarity feature (SF), which measures the likeness of entities from different perspectives. Due to the intricate nature of entity diversity, no single SF can be universally effective in heterogeneous scenarios, which underscores the urgency of constructing an SF with high discriminative power. However, the intricate interactions among SFs make the selection and combination of SFs an open challenge. To address this issue, this work proposes a novel kernel principal component analysis (kPCA) and evolutionary algorithm (EA) approach to automatically construct SFs for OM. First, a two-stage framework is designed to optimize SF selection and combination, ensuring holistic SF construction. Second, a cosine similarity-driven kPCA is presented to capture intricate SF relationships, offering precise SF selection. Finally, to bolster the practical application of EA in the SF combination, a novel evaluation metric is developed to automatically guide the algorithm toward more reliable ontology alignments. In the experiment, our method is compared with state-of-the-art OM methods on the Benchmark and Conference datasets provided by the Ontology Alignment Evaluation Initiative. The experimental results show its effectiveness in producing high-quality ontology alignments across various matching tasks, significantly outperforming the state-of-the-art matching methods.
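The abstract above centers on constructing and combining similarity features (SFs). As a minimal illustration of what individual SFs and their combination can look like, the following Python sketch computes two simple lexical features for a hypothetical entity-label pair and mixes them with placeholder weights; it is not the kPCA/EA construction proposed in the paper, and all names and weights are assumptions.

# Illustrative sketch only: two lexical similarity features (SFs) combined
# with placeholder weights; not the kPCA/EA method of the cited work.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def sf_edit(a: str, b: str) -> float:
    """SF 1: edit distance normalized to a [0, 1] similarity."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a.lower(), b.lower()) / longest

def sf_token_jaccard(a: str, b: str) -> float:
    """SF 2: Jaccard overlap of the labels' token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def combined_similarity(a: str, b: str, weights=(0.6, 0.4)) -> float:
    """Weighted combination of the SFs; the weights are placeholders."""
    features = (sf_edit(a, b), sf_token_jaccard(a, b))
    return sum(w * f for w, f in zip(weights, features))

print(combined_similarity("Conference Paper", "conference article"))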
... Dataset. Integration dataset. The practice of merging ontologies is well established (Shvaiko and Euzenat, 2013). However, no research has yet combined hate speech taxonomies to make existing datasets suitable for iterative federated learning. ...
... As seen in the work already, certain subparts of the two chosen example taxonomies could not be merged. The problems seen here are similar to the problems arising and handled within the ontology matching community (Shvaiko and Euzenat, 2013), and the solutions found in that field will greatly contribute to the future development of the approach. Furthermore, a significant challenge is that at least the first round of training is done with possibly mislabeled data, which could lead to underperformance in the field. ...
Preprint
Full-text available
Algorithmic hate speech detection faces significant challenges due to the diverse definitions and datasets used in research and practice. Social media platforms, legal frameworks, and institutions each apply distinct yet overlapping definitions, complicating classification efforts. This study addresses these challenges by demonstrating that existing datasets and taxonomies can be integrated into a unified model, enhancing prediction performance and reducing reliance on multiple specialized classifiers. The work introduces a universal taxonomy and a hate speech classifier capable of detecting a wide range of definitions within a single framework. Our approach is validated by combining two widely used but differently annotated datasets, showing improved classification performance on an independent test set. This work highlights the potential of dataset and taxonomy integration in advancing hate speech detection, increasing efficiency, and ensuring broader applicability across contexts.
... Such fusion must be conducted using manually created ontologies. Another approach to knowledge capture is learning ontologies [28][29][30] by automating their construction from texts. This often requires carrying out several processes, such as the identification of relevant terms [31][32][33][34], the determination of concepts [35], the extraction of taxonomic relations through semantic similarity [36], or the extraction of partonomic relations. ...
... All these sets have to be pre-defined since well-written texts are required as inputs to find patterns with a certain guarantee of reliability. The approach we propose in this study focuses on the fusion of manually built ontologies [29]. This approach relies only on the expert's capacity for expressing her/his knowledge in an intuitive manner, that is, by selecting concepts, expressing attributes of each of them, and defining the (taxonomic and partonomic) relations between such concepts. ...
Article
Full-text available
In the construction of knowledge bases, it is very important to evaluate the quality of the knowledge entered into them. This is exacerbated in public administrations, where knowledge should be oriented towards public services. In this study, an artificial intelligence-based method for the evaluation of knowledge is described. This method takes advantage of the structure and contents of the knowledge representation schemas (representing the knowledge of the corresponding experts) to carry out knowledge evaluation. More precisely, the method allows the various comparisons between the schemas to be integrated into an overall schema, which is then used to evaluate the contribution of each schema.
... While the problem of EA was introduced a few years ago, the more generic version of the problem, identifying entity records referring to the same real-world entity from different data sources, has been investigated from various angles by different communities, under the names of entity resolution (ER) [15,18,45], entity matching [13,42], record linkage [8,34], deduplication [16], instance/ontology matching [20,35,[49][50][51], link discovery [43,44], and entity linking/entity disambiguation [11,29]. Next, we describe the related work and the scope of this book. ...
... Certain methods for ER are created with the purpose of managing KGs and focus solely on binary connections, or data shaped like a graph. These methods are sometimes called instance/ontology matching approaches [49,50]. The graph-shaped data comes with its own challenges: (1) Entities in graph-shaped data often lack detailed textual descriptions and may only be represented by their name, with a minimal amount of accompanying information. ...
Chapter
Full-text available
In this section, we provide a concise overview of the entity alignment task and also discuss other related tasks that have a close connection to entity alignment.
... Before the emergence of embedding-based EA, there have already been many conventional frameworks that match KGs in symbolic spaces [17,41,42]. While some are based on equivalence reasoning mandated by OWL semantics [17], some leverage similarity computation to compare the symbolic features of entities [42]. ...
... The matching of relations (or ontology) between KGs has also been studied by prior symbolic works [41,42]. Nevertheless, compared with entities, relations are usually fewer in number, of various granularities [36], and under-explored in embedding-based approaches [51]. ...
Chapter
Full-text available
In this chapter, we introduce recent progress of the alignment inference stage.
... Graph theory and ontology-based approaches, while offering improved interpretability, can become computationally intensive when dealing with large, complex datasets [2], [12], [36]. Advanced methods in high-performance computing, such as entity alignment from knowledge graphs, require sophisticated computer systems with high memory capacity, limiting their accessibility [16], [32], [33], [38]. Given these limitations, a logic-based approach to knowledge similarity could offer a promising and innovative solution. ...
Preprint
In this article, we present a novel method for assessing the similarity of information within knowledge-bases using a logical point of view. This proposal introduces the concept of a similarity property space ΞP for each knowledge K, offering a nuanced approach to understanding and quantifying similarity. By defining the similarity knowledge space ΞK through its properties and incorporating similarity source information, the framework reinforces the idea that similarity is deeply rooted in the characteristics of the knowledge being compared. Inclusion of super-categories within the similarity knowledge space ΞK allows for a hierarchical organization of knowledge, facilitating more sophisticated analysis and comparison. On the one hand, it provides a structured framework for organizing and understanding similarity. The existence of super-categories within this space further allows for hierarchical organization of knowledge, which can be particularly useful in complex domains. On the other hand, the finite nature of these categories might be restrictive in certain contexts, especially when dealing with evolving or highly nuanced forms of knowledge. Future research and applications of this framework focus on addressing its potential limitations, particularly in handling dynamic and highly specialized knowledge domains.
... The selection of basic matchers is a major challenge when it comes to achieving optimal matching results in the system. In addition, the architecture of the ontology matching system itself, which integrates these matchers to compute reliable correspondences between ontologies [13], plays a crucial role. In our previous work [12], we presented the architecture of the first CroMatcher version. ...
Article
Full-text available
One of the main challenges in ontology matching is to match ontologies with high accuracy. Therefore, ontology matching systems typically use multiple basic matchers, each targeting a specific ontology component for the matching process. However, optimizing the combination of these matchers remains an open problem. In this paper, we present CroMatcher 2.0, an improved ontology matching system that aims to overcome these challenges. We introduce two new basic matchers. The first matcher determines correspondence between entities by comparing strings obtained from entity IDs and annotations using the English lexical database to find similarities between tokens of the compared strings, considering their mutual relations (synonyms, hypernyms, etc.). The second matcher determines the correspondence between entities using the special mediator ontology, which is very valuable for ontology matching as it can contain additional information about the compared ontologies. In this paper, we tested this matcher on the Ontology Alignment Evaluation Initiative Anatomy track by using the Uberon mediator ontology, which contains a lot of information about anatomical structures. We also introduce a new weighted aggregation method (Autoweight 3.0) that automatically determines the weighting factors of the basic matchers in the parallel composition. CroMatcher 2.0 was evaluated in three test cases of the Ontology Alignment Evaluation Initiative (Benchmark, Anatomy, and Conference) and showed competitive performance compared to other state-of-the-art systems. The results position CroMatcher 2.0 among the best ontology matching systems for these datasets and confirm the effectiveness of the newly introduced methods.
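The CroMatcher abstract describes running several basic matchers in parallel and aggregating their results with weights. The sketch below only illustrates the general shape of such a parallel composition with fixed, assumed weights and a threshold-based extraction; it is not the Autoweight 3.0 procedure, and both matchers are invented placeholders.

# Sketch of a parallel composition of basic matchers with a fixed weighted
# aggregation; CroMatcher's Autoweight method derives such weights
# automatically, which is not reproduced here.

def matcher_exact_label(e1: str, e2: str) -> float:
    """Placeholder basic matcher: case-insensitive label equality."""
    return 1.0 if e1.lower() == e2.lower() else 0.0

def matcher_common_prefix(e1: str, e2: str) -> float:
    """Placeholder basic matcher: normalized length of the common prefix."""
    a, b = e1.lower(), e2.lower()
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n / max(len(a), len(b), 1)

def aggregate(entities1, entities2, matchers, weights, threshold=0.7):
    """Weighted aggregation of matcher scores; keep pairs above the threshold."""
    alignment = []
    for e1 in entities1:
        for e2 in entities2:
            score = sum(w * m(e1, e2) for m, w in zip(matchers, weights))
            if score >= threshold:
                alignment.append((e1, e2, round(score, 2)))
    return alignment

print(aggregate(["Paper", "Author"], ["paper", "Writer"],
                [matcher_exact_label, matcher_common_prefix], [0.5, 0.5]))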
... Graph theory and ontology-based approaches, while offering improved interpretability, can become computationally intensive when dealing with large, complex datasets [2], [12], [36]. Advanced methods in high-performance computing, such as entity alignment from knowledge graphs, require sophisticated computer systems with high memory capacity, limiting their accessibility [16], [32], [33], [38]. Given these limitations, a logic-based approach to knowledge similarity could offer a promising and innovative solution. ...
Conference Paper
Full-text available
In this work, we present a novel method for assessing the similarity of information within knowledge-bases using a logical point of view. This proposal introduces the concept of a similarity property space ΞP for each knowledge K, offering a nuanced approach to understanding and quantifying similarity. By defining the similarity knowledge space ΞK through its properties and incorporating similarity source information, the framework reinforces the idea that similarity is deeply rooted in the characteristics of the knowledge being compared. Inclusion of super-categories within the similarity knowledge space ΞK allows for a hierarchical organization of knowledge, facilitating more sophisticated analysis and comparison. On the one hand, it provides a structured framework for organizing and understanding similarity. The existence of super-categories within this space further allows for hierarchical organization of knowledge, which can be particularly useful in complex domains. On the other hand, the finite nature of these categories might be restrictive in certain contexts, especially when dealing with evolving or highly nuanced forms of knowledge.
... Ontology mappings. The construction of mappings (or alignments) between ontologies is an important challenge in ontology engineering and integration [95]. Given two ontologies O1 and O2 in different signatures Σ1 and Σ2, the problem is to align the vocabulary items in Σ1 with those in Σ2 using a TBox T12 that states logical relationships between Σ1 and Σ2. ...
Preprint
The question whether an ontology can safely be replaced by another, possibly simpler, one is fundamental for many ontology engineering and maintenance tasks. It underpins, for example, ontology versioning, ontology modularization, forgetting, and knowledge exchange. What safe replacement means depends on the intended application of the ontology. If, for example, it is used to query data, then the answers to any relevant ontology-mediated query should be the same over any relevant data set; if, in contrast, the ontology is used for conceptual reasoning, then the entailed subsumptions between concept expressions should coincide. This gives rise to different notions of ontology inseparability such as query inseparability and concept inseparability, which generalize corresponding notions of conservative extensions. We survey results on various notions of inseparability in the context of description logic ontologies, discussing their applications, useful model-theoretic characterizations, algorithms for determining whether two ontologies are inseparable (and, sometimes, for computing the difference between them if they are not), and the computational complexity of this problem.
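The excerpt above formalizes an ontology mapping as a TBox T12 that relates the signatures Σ1 and Σ2 of two ontologies O1 and O2. As a purely hypothetical illustration (all concept and role names are invented), such a mapping TBox could contain axioms like the following:

T_{12} = \{\,
  \Sigma_1{:}\mathsf{Author} \sqsubseteq \Sigma_2{:}\mathsf{Person},\quad
  \Sigma_1{:}\mathsf{Paper} \equiv \Sigma_2{:}\mathsf{Publication},\quad
  \exists\,\Sigma_1{:}\mathsf{writes}.\top \sqsubseteq \Sigma_2{:}\mathsf{Creator}
\,\}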
... Transformed ontologies were aligned to provide better integrated coverage of the domain than was provided by any single ontology on its own. Due to the nature of the data, we used a terminological approach to the alignment (for other approaches see [Shvaiko and Euzenat 2013]), more specifically a character-based similarity measure and the I-SUB technique [Stoilos et al. 2005]. The concepts in the ontologies also made use of MACS [Clavel-Merrin 2004] to facilitate multilingual access to the resources for English, German and French. ...
Preprint
Full-text available
The CENDARI infrastructure is a research-supporting platform designed to provide tools for transnational historical research, focusing on two topics: Medieval culture and World War I. It exposes to the end users modern web-based tools relying on a sophisticated infrastructure to collect, enrich, annotate, and search through large document corpora. Supporting researchers in their daily work is a novel concern for infrastructures. We describe how we gathered requirements through multiple methods to understand the historians' needs and derive an abstract workflow to support them. We then outline the tools we have built, tying their technical descriptions to the user requirements. The main tools are the Note Taking Environment and its faceted search capabilities, the Data Integration platform including the Data API, supporting semantic enrichment through entity recognition, and the environment supporting the software development processes throughout the project to keep both technical partners and researchers in the loop. The outcomes include technical results, the new resources developed and gathered, and the research workflow that has been described and documented.
... E-learning is one of the application fields of ontology. • Ontology Matching: It semi-automatically identifies correspondences between ontology entities for purposes such as merging and question answering (Shvaiko and Euzenat, 2013). • Ontology Merging: This operation merges entire (or partial) heterogeneous ontology sources to form a new ontology (De Bruijn et al., 2006). ...
Preprint
Ontologies provide a common vocabulary, reusability, and machine-readable content, and they also allow for semantic search, facilitate agent interaction, and support the ordering and structuring of knowledge for Semantic Web (Web 3.0) applications. However, a key challenge in ontology engineering is automatic learning, i.e., there is still a lack of fully automatic approaches that form an ontology from a text corpus or dataset of various topics using machine learning techniques. In this paper, two topic modeling algorithms are explored, namely LSI & SVD and Mr.LDA, for learning a topic ontology. The objective is to determine the statistical relationship between documents and terms to build a topic ontology and ontology graph with minimum human intervention. Experimental analysis on building a topic ontology and semantically retrieving the corresponding topic ontology for the user's query demonstrates the effectiveness of the proposed approach.
... algorithms for knowledge base linking (Shvaiko and Euzenat 2013): here, we build upon simple, yet high-performing previous approaches to linking LRs that achieved state-of-the-art performance. These rely at their core on computing the overlap between the bags of words built from the LRs' concept lexicalizations, e.g., (Navigli and Ponzetto 2012a;Gurevych et al. 2012) (inter alia). ...
Preprint
We present an approach to combining distributional semantic representations induced from text corpora with manually constructed lexical-semantic networks. While both kinds of semantic resources are available with high lexical coverage, our aligned resource combines the domain specificity and availability of contextual information from distributional models with the conciseness and high quality of manually crafted lexical networks. We start with a distributional representation of induced senses of vocabulary terms, which are accompanied with rich context information given by related lexical items. We then automatically disambiguate such representations to obtain a full-fledged proto-conceptualization, i.e. a typed graph of induced word senses. In a final step, this proto-conceptualization is aligned to a lexical ontology, resulting in a hybrid aligned resource. Moreover, unmapped induced senses are associated with a semantic type in order to connect them to the core resource. Manual evaluations against ground-truth judgments for different stages of our method as well as an extrinsic evaluation on a knowledge-based Word Sense Disambiguation benchmark all indicate the high quality of the new hybrid resource. Additionally, we show the benefits of enriching top-down lexical knowledge resources with bottom-up distributional information from text for addressing high-end knowledge acquisition tasks such as cleaning hypernym graphs and learning taxonomies from scratch.
... Researchers have also explored the use of knowledge bases to create and utilize semantically aware associations in the trace creation process, where a knowledge base includes basic domain terms and sentences that describe the relationships between those terms [36], [27], [62]. Data is typically represented as an ontology in which relationships are represented using AND, OR, implication, and negation operators [36]. ...
Preprint
In most safety-critical domains the need for traceability is prescribed by certifying bodies. Trace links are generally created among requirements, design, source code, test cases and other artifacts; however, creating such links manually is time consuming and error prone. Automated solutions use information retrieval and machine learning techniques to generate trace links; however, current techniques fail to understand the semantics of the software artifacts or to integrate domain knowledge into the tracing process and therefore tend to deliver imprecise and inaccurate results. In this paper, we present a solution that uses deep learning to incorporate requirements artifact semantics and domain knowledge into the tracing solution. We propose a tracing network architecture that utilizes Word Embedding and Recurrent Neural Network (RNN) models to generate trace links. Word embedding learns word vectors that represent knowledge of the domain corpus and RNN uses these word vectors to learn the sentence semantics of requirements artifacts. We trained 360 different configurations of the tracing network using existing trace links in the Positive Train Control domain and identified the Bidirectional Gated Recurrent Unit (BI-GRU) as the best model for the tracing task. BI-GRU significantly outperformed state-of-the-art tracing methods including the Vector Space Model and Latent Semantic Indexing.
... The second aspect focuses on knowledge graph fusion. Overall, Shvaiko et al. [4] reviewed the current state and challenges in the field of ontology matching. They introduced various matching methods, providing a detailed analysis of the advantages and disadvantages of existing techniques, such as string matching, linguistic methods, structural matching, and semantic matching. ...
Article
Full-text available
This paper mainly explores the application of artificial intelligence (AI) technologies in knowledge graphs (KGs), focusing on how natural language processing (NLP), machine learning, and deep learning methods can achieve the automated construction of KGs. First, the paper introduces the basic concepts of KGs and the limitations of traditional construction methods. Then, it analyzes recent technological advancements in knowledge graph construction, data fusion, and reasoning, with particular emphasis on the application of graph convolutional neural networks (GCNs) in handling multi-relational data. Finally, the practical applications of KGs in business analytics, healthcare information systems, and recommendation systems are discussed, demonstrating their broad potential in data management and reasoning.
... Euzenat and Shvaiko [35] cover a number of systems that employ a similar mix of structure-based and lexical approaches. These include, for example, SAMBO [36], which combines terminological matching (n-gram, edit distance, and comparison of word lists), structural analysis of is-a and part-of hierarchies, and background knowledge; Falcon [37], which uses a divide-and-conquer approach where structural proximity between concepts is the basis for subsequently partitioning ontologies into smaller clusters; RiMOM [38], which employs linguistic data from WordNet as background knowledge; and a variation of the Similarity Flooding Algorithm [39] to assess structural similarity. ...
Article
Full-text available
Knowledge representation and manipulation in knowledge-based systems typically rely on ontologies. The aim of this work is to provide a novel weak unification-based method and an automatic tool for OWL ontology merging to ensure well-coordinated task completion in the context of collaborative agents. We employ a technique based on integrating string and semantic matching with the additional consideration of structural heterogeneity of concepts. The tool is implemented in Prolog and makes use of its inherent unification mechanism. Experiments were run on an OAEI data set with a matching accuracy of 60% across 42 tests. Additionally, we ran the tool on several ontologies from the domain of robotics, producing a small, but generally accurate, set of matched concepts. These results clearly show a good capability of the method and the tool to match semantically similar concepts. The results also highlight the challenges related to the evaluation of ontology-merging algorithms without a definite ground truth.
... Also known as ontology alignment, ontology matching [19][20][21][22][23][24][25][26][27] is the process of finding semantic correspondences (mainly similarities) among entities from different ontologies. Each type of entity (classes, object properties, datatype properties, and instances) is normally matched in isolation, so class-to-property, class-to-individual, or property-to-individual correspondences are not addressed. ...
Article
Full-text available
Data integration is considered a classic research field and a pressing need within the information science community. Ontologies play a critical role in such processes by providing well-consolidated support to link and semantically integrate datasets via interoperability. This paper approaches data integration from an application perspective by looking at ontology matching techniques. As the manual matching of different sources of information becomes unrealistic once the system scales up, the automation of the matching process becomes a compelling need. Therefore, we have conducted experiments on actual non-semantically enriched relational data with the support of existing tools (pre-LLM technology) for automatic ontology matching from the scientific community. Even considering a relatively simple case study—i.e., the spatio–temporal alignment of macro indicators—outcomes clearly show significant uncertainty resulting from errors and inaccuracies along the automated matching process. More concretely, this paper aims to test on real-world data a bottom-up knowledge-building approach, discuss the lessons learned from the experimental results of the case study, and draw conclusions about uncertainty and uncertainty management in an automated ontology matching process. While the most common evaluation metrics clearly demonstrate the unreliability of fully automated matching solutions, properly designed semi-supervised approaches seem to be mature for more generalized application.
... One noteworthy strategy revolves around the utilisation of recurrent neural networks (RNNs) for the purpose of time-series analysis concerning the dissemination of diseases [20][21][22][23]. Building upon the SIR model discussed previously, RNNs come into play to capture the temporal dependencies within disease data, culminating in substantially more precise predictions [24]. ...
Article
Full-text available
With the current COVID‐19 pandemic, sophisticated epidemiological surveillance systems are more important than ever because conventional approaches have not been able to handle the scope and complexity of this global emergency. In response to this challenge, the authors present the state‐of‐the‐art SEIR‐Driven Semantic Integration Framework (SDSIF), which leverages the Internet of Things (IoT) to handle a variety of data sources. The primary innovation of SDSIF is the development of an extensive COVID‐19 ontology, which makes unmatched data interoperability and semantic inference possible. The framework facilitates not only real‐time data integration but also advanced analytics, anomaly detection, and predictive modelling through the use of Recurrent Neural Networks (RNNs). By being scalable and flexible enough to fit into different healthcare environments and geographical areas, SDSIF is revolutionising epidemiological surveillance for COVID‐19 outbreak management. Metrics such as Mean Absolute Error (MAE) and Mean Squared Error (MSE) are used in a rigorous evaluation. The evaluation also includes an exceptional R‐squared score, which attests to the effectiveness and ingenuity of SDSIF. Notably, a modest RMSE value of 8.70 highlights its accuracy, while a low MSE of 3.03 highlights its high predictive precision. The framework's remarkable R‐squared score of 0.99 emphasises its resilience in explaining variations in disease data even more.
... Regardless of the storage paradigm employed (e.g., structured through SQL, semi-structured via No-SQL), all the paradigms support specific domain modeling, incorporating entities, their relationships, and their attributes. To execute a successful migration, several key factors must be considered: (1) understanding the structure of both the source and target information systems, (2) establishing a mapping that defines the correspondence between entities, and (3) identifying the precise steps necessary to migrate all information while minimizing potential data loss [30]. The usual approach to achieve this migration is through the design and implementation of scripts [23]. ...
... Conventional approaches for mapping entities between KGs include concept-level matching [25][26][27], instance-level matching [28,29], or a combination ...
Article
Full-text available
Entity alignment plays an essential role in the integration of knowledge graphs (KGs) as it seeks to identify entities that refer to the same real-world objects across different KGs. Recent research has primarily centred on embedding-based approaches. Among these approaches, there is a growing interest in graph neural networks (GNNs) due to their ability to capture complex relationships and incorporate node attributes within KGs. Despite the presence of several surveys in this area, they often lack comprehensive investigations specifically targeting GNN-based approaches. Moreover, they tend to evaluate overall performance without analysing the impact of individual components and methods. To bridge these gaps, this paper presents a framework for GNN-based entity alignment that captures the key characteristics of these approaches. We conduct a fine-grained analysis of individual components and assess their influences on alignment results. Our findings highlight specific module options that significantly affect the alignment outcomes. By carefully selecting suitable methods for combination, even basic GNN networks can achieve competitive alignment results.
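The survey above decomposes GNN-based entity alignment into individual components. As a minimal, hedged illustration of the shared core of such approaches (not of any specific surveyed model), the sketch below runs one neighbor-averaging propagation step over toy embeddings and then aligns the entities of two knowledge graphs greedily by cosine similarity; the graphs, dimensions, and averaging rule are simplifying assumptions.

# Toy sketch of embedding-based entity alignment: one neighbor-averaging
# propagation step followed by greedy cosine-similarity matching. The graphs
# and embeddings are invented; real GNN models learn these representations.
import numpy as np

def propagate(emb: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """One propagation step: average each node with its neighbors."""
    degree = adj.sum(axis=1, keepdims=True) + 1.0   # +1 for the self-loop
    return (emb + adj @ emb) / degree

def align(emb1: np.ndarray, emb2: np.ndarray):
    """Greedily match each entity of KG1 to its most similar entity in KG2."""
    a = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    b = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    sim = a @ b.T
    return [(i, int(sim[i].argmax()), float(sim[i].max()))
            for i in range(sim.shape[0])]

rng = np.random.default_rng(0)
emb1, emb2 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))   # toy features
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # small path graph
print(align(propagate(emb1, adj), propagate(emb2, adj)))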
... An analysis of the existing literature revealed no relevant deviations or innovations with respect to standard alignment and integration approaches or techniques [56] for the Materials Science area. Notably, automatic techniques used in many contexts (e.g. ...
Article
Full-text available
The growing complexity and interdisciplinary nature of Materials Science research demand efficient data management and exchange through structured knowledge representation. Domain-Level Ontologies (DLOs) for Materials Science have emerged as a valuable tool for describing materials properties, processes, and structures, enabling effective data integration, interoperability, and knowledge discovery. However, the harmonization of DLOs, and, more generally, the establishment of fully interoperable multi-level ecosystems, remains a challenge due to various factors, including the diverse landscape of existing ontologies. This work provides a comprehensive overview of the state-of-the-art in DLOs for Materials Science, highlighting their main features and purposes. More than 40 DLOs in Materials Science are considered. Furthermore, an alignment methodology including both manual and automated steps, making use of Top-Level Ontologies’ (TLO) capability of promoting interoperability, and revolving around the engineering of FAIR standalone entities acting as minimal data pipelines (“bridge concepts”), is presented. A proof of concept is also provided. The primary aspiration of this undertaking is to make a meaningful contribution towards the establishment of a unified ontology framework for Materials Science, facilitating more effective data integration and fostering interoperability across Materials Science subdomains.
... However, the size of this domain often requires using several ontologies whose elements are linked through mappings. Mappings are the materialization of semantic relations between elements of interrelated ontologies [1]. ...
Article
Full-text available
Background: Biomedical computational systems benefit from ontologies and their associated mappings. Indeed, aligned ontologies in life sciences play a central role in several semantic-enabled tasks, especially in data exchange. It is crucial to maintain up-to-date alignments according to new knowledge inserted in novel ontology releases. Refining ontology mappings in place, based on adding concepts, demands further research. Results: This article studies the mapping refinement phenomenon by proposing techniques to refine a set of established mappings based on the evolution of biomedical ontologies. In our first analysis, we investigate ways of suggesting correspondences with the new ontology version without applying a matching operation to the whole set of ontology entities. In the second analysis, the refinement technique enables deriving new mappings and updating the semantic type of the mapping beyond equivalence. Our study explores the neighborhood of concepts in the alignment process to refine mapping sets. Conclusion: Experimental evaluations with several versions of aligned biomedical ontologies were conducted. Those experiments demonstrated the usefulness of ontology evolution changes to support the process of mapping refinement. Furthermore, using context in ontological concepts was effective in our techniques.
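The article above exploits the neighborhood of concepts affected by ontology evolution to refine existing mappings. The sketch below phrases one simple variant of such a neighborhood heuristic (proposing candidate targets for a newly added concept from the existing mappings of its parent concepts); the data structures and the heuristic are assumptions made for illustration, not the authors' refinement technique.

# Illustrative neighborhood heuristic for mapping refinement: when a concept
# is added in a new ontology release, propose candidate correspondences taken
# from the existing mappings of its parent concepts. This is a sketch of the
# general idea only, not the refinement technique of the cited article.

parents = {                      # hypothetical source ontology (child -> parents)
    "GranularCellTumor": ["Tumor"],
}
mappings = {                     # mappings established for the previous release
    "Tumor": ["Neoplasm"],
}

def candidates_for_added_concept(concept: str):
    """Collect targets mapped from the concept's parents as candidate matches."""
    found = []
    for parent in parents.get(concept, []):
        for target in mappings.get(parent, []):
            # Candidates still need validation; the relation may be weaker
            # than equivalence (e.g. subsumption rather than equality).
            found.append((concept, target, f"derived from parent {parent}"))
    return found

print(candidates_for_added_concept("GranularCellTumor"))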
... The ontology alignment process at the element and structure levels can be classified according to the type of input information into 1) instance-based matching, 2) schema-based matching, 3) instance and schema-based matching, and 4) usage-based matching (Anam & Kim, n.d.; Shvaiko & Euzenat, 2013). Instance-based matching relies on instance similarity, which tends to yield high-quality matches. ...
Article
Full-text available
Higher education today operates in a globally competitive environment. Competition is increasingly focused on quality. The quality of higher education reflects the relationship of higher education with users. Higher education uses various standards in the internal quality assurance system. It makes improvements in performance, features, suitability, reliability, durability, service, responsiveness, aesthetics, and reputation to support the progress of the quality of its performance. Implementing standards requires considerable effort because it demands quality fulfillment, and the satisfaction of each standard criterion requires internal and external audit processes. Standard alignment is needed for cost efficiency in implementing standards. Standard alignment can be done with ontology alignment technology. However, before ontology alignment can be applied, an ontology of each standard is needed. This study explores literature that has implemented ontology and ontology alignment in education and aims to find out whether there has been research on the alignment of educational quality standards. The results of this study show that ontology has been applied in education, namely on the topics of curriculum, e-learning, learning assessment, system integration, syllabus, learning style, service, and accreditation. The implementation of ontology alignment has been carried out on the topics of Profile Learning, Learning Design, E-Learning, Curriculum, and System Integration. No research was found that applies ontology or ontology alignment to educational quality standards, although quality standards have been applied to quality management models based on ISO 9000 requirements and to software quality standards based on CMMI.
... Many matching systems focus on combining and extending the known methods. There are a number of popular matching algorithms, such as edit distance, WordNet matchers and iterative similarity matchers [46], [47]. It is important to note that there are different methods for ontology matching and ontology merging, and here only one of the most suitable methods for implementation is discussed. ...
Article
Full-text available
With the ongoing digital transformation and multi-domain interaction occurring in buildings, a huge amount of heterogeneous data is generated and stored on a daily basis. To take advantage of the gathered data and support better decision making, suitable methods are needed to meet the demand for building operations and reinvestment planning. Ontology, which provides not only the vocabulary of a certain domain but also the relationships between its terms, has been used in multiple engineering fields to manage heterogeneous data. A plethora of ontology development methodologies have been proposed in the last decade, yet these methods are still time-consuming and offer a low degree of automation. In this paper, we approach the problem by first presenting a semi-automatic ontology development framework that integrates existing automatic ontology tools and reuses existing ontologies and data models. Based on this framework, we create a building energy management ontology and evaluate its data coverage on several real-life data sets.
... For example, if two concepts are semantically synonymous but defined differently in their terminological definitions, the similarity between the two is not captured and a reasoner would fail to find the match between the two concepts. Moreover, knowledge-based semantic matchmaking approaches are complex, require long design time, demand high maintenance to keep the knowledge-base up to date, and are associated with long processing time [16], [17]. ...
Chapter
Full-text available
This chapter discusses the use of Machine Learning (ML) to support edge-based semantic matchmaking to handle a large-scale integration of IoT data sources with IoT platforms. The chapter starts by addressing the interoperability challenges currently faced by integrators and the role of ontologies in this context. It continues with a perspective on semantic matchmaking approaches and ML solutions that can best support cognitive matchmaking. The chapter then covers a use case and pilots that are being developed with a new open-source middleware, TSMatch, in the context of the Horizon 2020 EFPF project, for the purpose of environmental monitoring in smart manufacturing.
... It serves as a common vocabulary for researchers and forms the basis for semantic interoperability among various systems. OM is the process of identifying semantically equivalent entities (such as classes or properties) from different ontologies [22]. The output of an OM process is a set of correspondences, the so-called ontology alignment, where each correspondence is a tuple <e1, e2, r, conf> indicating that the relationship r holds between the entities e1 and e2 from the two different ontologies, with conf being the confidence that this relationship holds. ...
Article
Full-text available
Ontology serves as a central technique in the semantic web to elucidate domain knowledge. The challenge of dealing with the heterogeneity introduced by diverse domain ontologies necessitates ontology matching, a process designed to identify semantically interconnected entities within these ontologies. This task is inherently complex due to the broad, diverse entities and the rich semantics inherent in vocabularies. To tackle this challenge, we bring forth a new interactive ontology matching method with local and global similarity deviations (IOM-LGSD), which consists of three novel components. First, local and global similarity deviation (LGSD) metrics are presented to measure the consistency of similarity measures (SMs) and single out the less consistent SMs for user validation. Second, we present a genetic algorithm (GA) based SM selector to evolve the SM subsets. Lastly, a problem-specific induced ordered weighted averaging (IOWA) operator based SM aggregator is proposed to assess the quality of selected SMs. The experiment evaluates IOM-LGSD with the ontology alignment evaluation initiative (OAEI) Benchmark and three real-world sensor ontologies. The evaluation underscores the effectiveness of IOM-LGSD in efficiently identifying high-quality ontology alignments, which consistently outperforms comparative methods in terms of effectiveness and efficiency.
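The excerpt above defines an ontology alignment as a set of correspondences <e1, e2, r, conf>. A minimal data structure mirroring that definition could look as follows; the field names and the example correspondences are illustrative only.

# Minimal data structure for the correspondence tuple <e1, e2, r, conf>
# described in the excerpt above; the example values are purely illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Correspondence:
    e1: str      # entity from the first ontology
    e2: str      # entity from the second ontology
    r: str       # relation, e.g. "=" (equivalence) or "<" (subsumption)
    conf: float  # confidence in [0, 1]

# An alignment is simply a set of such correspondences.
alignment = {
    Correspondence("onto1#Author", "onto2#Writer", "=", 0.92),
    Correspondence("onto1#Paper", "onto2#Publication", "<", 0.75),
}

for c in sorted(alignment, key=lambda corr: corr.conf, reverse=True):
    print(c)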
... Ontology alignment (OA), also referred to as ontology matching, is a central task in semantic web technologies that aims to find semantic correspondences between two ontologies with overlapping domains. As the use of ontologies extends to many different fields, the importance of this task is increasing, and ontology matching is required to bridge the semantic gap between various ontologies [1]. Although OA looks back on many years of research, the task remains challenging, often requiring expert intervention to ensure accurate results. ...
Preprint
Full-text available
This study evaluates the applicability and efficiency of ChatGPT for ontology alignment using a naive approach. ChatGPT's output is compared to the results of the Ontology Alignment Evaluation Initiative 2022 campaign using conference track ontologies. This comparison is intended to provide insights into the capabilities of a conversational large language model when used in a naive way for ontology matching, and to investigate the potential advantages and disadvantages of this approach.
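The preprint above applies a conversational LLM to ontology alignment in a deliberately naive way. One naive prompt of the kind such a setup might use can be assembled as in the sketch below; the prompt wording is an invented assumption, and the actual call to a model API and the parsing of its reply are intentionally left out.

# Sketch of a naive prompt for LLM-based ontology matching. The prompt text is
# an invented example; sending it to a model and parsing the answer are omitted
# because they depend on the specific API being used.

def build_matching_prompt(concept1: str, context1: str,
                          concept2: str, context2: str) -> str:
    return (
        "You are matching two ontologies.\n"
        f"Concept A (ontology 1): {concept1}. Context: {context1}\n"
        f"Concept B (ontology 2): {concept2}. Context: {context2}\n"
        "Do A and B refer to the same real-world notion? "
        "Answer strictly with 'yes' or 'no'."
    )

prompt = build_matching_prompt(
    "PaperAuthor", "subclass of Person; writes Paper",
    "Contributor", "subclass of Agent; creates Document",
)
print(prompt)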
... In the rest of this chapter, we assume structural homogeneity as well, except where explicitly indicated. In practice, this assumption is not problematic because, even in the rare case of matching entities between PKGs with drastically different ontologies, an ontology matching solution can be applied as a first step to homogenize the data sources [72,104,117,123]. ...
Preprint
Full-text available
Entity Resolution (ER) is the problem of determining when two entities refer to the same underlying entity. The problem has been studied for over 50 years, and most recently, has taken on new importance in an era of large, heterogeneous 'knowledge graphs' published on the Web and used widely in domains as wide ranging as social media, e-commerce and search. This chapter will discuss the specific problem of named ER in the context of personal knowledge graphs (PKGs). We begin with a formal definition of the problem, and the components necessary for doing high-quality and efficient ER. We also discuss some challenges that are expected to arise for Web-scale data. Next, we provide a brief literature review, with a special focus on how existing techniques can potentially apply to PKGs. We conclude the chapter by covering some applications, as well as promising directions for future research.
... Hence, it is much easier to keep the ontological knowledge base up to date. To resolve the problem of semantic heterogeneity, where the same facts are expressed in different ways, AI-CPPS uses specific ontology matching techniques to merge ontologies [43]. ...
Article
Full-text available
Digital transformation is both an opportunity and a challenge. To take advantage of this opportunity for humans and the environment, the transformation process must be understood as a design process that affects almost all areas of life. In this paper, we investigate AI-Based Self-Adaptive Cyber-Physical Process Systems (AI-CPPS) as an extension of the traditional CPS view. As a contribution, we present a framework that addresses challenges that arise from recent literature. The aim of the AI-CPPS framework is to enable an adaptive integration of IoT environments with higher-level process-oriented systems. In addition, the framework integrates humans as actors into the system, which is often neglected by recent related approaches. The framework consists of three layers, i.e., processes, semantic modeling, and systems and actors, and for each layer we describe challenges and outline solutions for application. We also address the requirement to enable the integration of new networked devices under the premise of a targeted process that is optimally designed for humans, while profitably integrating AI and IoT. It is expected that AI-CPPS can contribute significantly to increasing sustainability and quality of life and offer solutions to pressing problems such as environmental protection, mobility, or demographic change. Thus, it is all the more important that the systems themselves do not become a driver of resource consumption.
Preprint
Several approaches have been proposed to deal with the problem of Automatic Schema Matching (ASM). The challenges and difficulties caused by the complexity and uncertainty characterizing both the process and the outcome of schema matching motivated us to investigate how the emerging bio-inspired paradigm can help with understanding, managing, and ultimately overcoming those challenges. In this paper, we explain how we approached Automatic Schema Matching as a systemic and Complex Adaptive System (CAS) and how we modeled it using the approach of Agent-Based Modeling and Simulation (ABMS). This effort gave birth to a prototype tool for schema matching called Reflex-SMAS. A set of experiments demonstrates the viability of our approach on two main aspects: (i) effectiveness (increasing the quality of the found matchings) and (ii) efficiency (reducing the effort required for the matching). Our approach represents a significant paradigm shift in the field of automatic schema matching.
Chapter
The exponential growth of heterogeneous data from diverse sources, such as social media, IoT sensors, and transactional databases, poses significant challenges for effective processing and analysis. This data is often characterized by poor quality, diverse formats, and complex structures, which hinders its utilization for extracting valuable insights and supporting informed decision-making. Machine learning (ML) emerges as a powerful tool to address these challenges by automating heterogeneous data processing tasks and enhancing data quality, integration, and analysis. In this context, this paper explores the contribution of machine learning methods to the different stages of the data management process: preparation, integration, and analytics. We aim to provide a comprehensive study of the role these methods play throughout the entire pipeline, as well as highlighting a set of challenges in this field.
Article
The Concept Model of Mission Space (CMMS) can be regarded as an ontology that systematically represents knowledge within a military domain. Ensuring the consistency of this ontology is crucial, and its consistency must be verifiable. This paper presents a case study on the consistency verification of the Korean CMMS, referred to as CMMS-K. The verification feature can detect inconsistencies such as duplication, missing links, and circular definitions within the ontology elements. This capability is achieved through the formality of the Methontology template, which provides a structured specification of ontology elements. The implemented feature demonstrates the practical ability to perform verification and highlights the future prospects of the Korean CMMS.
Chapter
Knowledge graphs have emerged as a powerful paradigm for organizing and integrating structured knowledge in smart digital libraries (SDLs). This chapter provides an overview of knowledge graphs, their key concepts, applications, and the underlying techniques involved in their construction and utilization. The role of knowledge graphs in SDLs is explored, highlighting their capacity to enhance discoverability, accuracy, and personalization of library services by integrating diverse collections, metadata, and external resources. The chapter delves into implementing knowledge graphs in library settings, discussing data modeling, technology selection, and the importance of collaboration among stakeholders. Various real-world applications and case studies are presented, showcasing the benefits of knowledge graphs in enhancing resource discovery, data integration, and user experience. Challenges associated with knowledge graph implementation, such as data complexity, scalability, and maintenance, are addressed, along with potential solutions and best practices. The chapter also explores the interoperability and integration of knowledge graphs with existing library systems and emphasizes the importance of maintenance, quality assurance, and continuous enhancement of knowledge graphs. Looking towards the future, the chapter discusses emerging trends and directions, including the impact of semantic web technologies, artificial intelligence, and machine learning on developing intelligent and user-centric information environments. Potential applications, such as intelligent knowledge discovery, decision support systems, automation, personalization, and collaboration, are highlighted, along with associated challenges and considerations. This chapter comprehensively overviews SDL knowledge graphs, their current applications, challenges, and future potential. It emphasizes their transformative role in enhancing knowledge organization, discovery, and user experience in the digital library landscape.
Thesis
Full-text available
Networks of Ontologies research deals with the need to combine several ontologies simultaneously. In a world of integrated systems (systems of systems), isolated systems will soon become increasingly rare. Their integration creates opportunities to change and validate information and to add more value to an information system. Such a system of systems may contain ontologies to support the corresponding knowledge model. Consequently, new integration requirements may have to deal with matching networks of ontologies rather than single ontologies. This work delves into network matching and proposes new ways of approaching a particular case of matching significant ontologies. The work's contribution is the use of algebraic operations on networks to eliminate candidates before matching and the use of stochastic search techniques to discover the relevant nodes. These nodes should be retained, as they increase the accuracy and recovery of the final matching, even though they are identical and would be removed by the algebraic operation. To find out the particular relevance of each node, we propose a random walk combined with a frequent itemset approach that outperforms brute-force approaches in processing time as the size of the networks grows, while achieving close precision. The approach was validated using networks of ontologies created from the OAEI ontologies. The approach selected entities to send to the matcher without losing significant preexisting matchings. Finally, two different matchers were used to obtain metrics and compare the results with the pairwise brute-force approach.
Article
The process of defining content from different ontologies is time-consuming, tedious and error-prone. To solve these problems, new methods for ontology comparison have been developed. This process focuses on the integration of ontologies for various applications, but also requires maintaining the integrity of the integrated ontologies. The concept ontology associated with integration is designed to be more efficient, accurate and useful. These two aspects can be combined so that they complement each other to increase accuracy. Comparing this approach with existing methods should provide greater accuracy and efficiency in ontology comparison.
Thesis
Full-text available
A predominant kind of software application is mashups of third-party web services. The third-party services are usually black-boxes that have private source code and public Application Programming Interfaces (APIs) with operations and parameters. Requesters of such black-box services are developers who compose services into mashups. Today, the creation of mashups is manual and thereby inefficient because the services requested and offered are insufficiently specified. Specifically, three main problems make the mashup creation inefficient: (1) Finding suitable services is cumbersome because request and API specification often mismatch due to terminological heterogeneity. This makes search algorithms ineffective in finding relevant APIs. (2) API specifications lack API protocols, which makes it impossible for requesters to determine all operation call sequences that are required for their mashup. Operations often are not used in isolation, as they have control or data flow interdependencies. (3) Enabling data communication inside the mashups is laborious for the requesters because different APIs use different parameter names and have incompatible data types and formats. This dissertation introduces Brokerage as a Service (BaaS) that pursues these objectives: (1) Resolving the terminological heterogeneity between requests and API specifications by linking them to a global ontology. Terminological normalization improves the effectiveness of finding relevant APIs. A systematic method to choose the most effective techniques to link APIs and ontologies is presented. (2) Deriving operation dependencies by mining API protocols from call-logs. The languages OWL-S, BPMN, and WS-BPEL are examined to identify control constructs needed to describe API protocols and which of these control constructs can be discovered through process mining. It is analyzed which mining algorithms are suitable for deriving API protocols. (3) Making APIs interoperable by generating glue code from parameter mappings. The code generator emits executable program code that makes API calls, extracts relevant parameters from requests and responses, and translates the data. In summary, this dissertation shows how to facilitate mashup creation by adding missing information to requests and API specifications and how to make APIs interoperable.
Article
Domain knowledge is gradually renovating its attributes to exhibit distinct features in autonomy, propelled by the shift of modern transportation systems (TS) towards autonomous TS (ATS) comprising three progressive generations. Knowledge graph (KG) and its corresponding versions can help depict the evolving TS. Given that KG versions exhibit asymmetry primarily due to variations in evolved knowledge, it is imperative to harmonize the evolved knowledge embodied by the entity across disparate KG versions. Hence, this paper proposes a siamese-based graph convolutional network (GCN) model, namely SiG, to address unresolved issues of low accuracy, efficiency, and effectiveness in aligning asymmetric KGs. SiG can optimize entity alignment in ATS and support the analysis of future-stage ATS development. Such a goal is attained through: a) generating unified KGs to enhance data quality, b) defining graph split to facilitate entire-graph computation, c) enhancing GCN to extract intrinsic features, and d) designing siamese network to train asymmetric KGs. The evaluation results suggest that SiG surpasses other commonly employed models, resulting in average improvements of 23.90% and 37.89% in accuracy and efficiency, respectively. These findings have significant implications for TS evolution analysis and offer a novel perspective for research on complex systems limited by continuously updated knowledge.
Chapter
The aggregation, fusion, sharing, opening, development, and utilization of public data provide a solid basis for promoting the development of e-government and the digital economy. These activities rely on infrastructure software called a public data open platform (PDOP) to provide enabling services. While China’s national PDOP has yet to be completed, one pathway is to integrate the hundreds of existing provincial-level and prefectural-level PDOPs. However, these local PDOPs exhibit high heterogeneity, e.g., using different data catalogs and different metadata formats. In this system paper, we meet the challenge by crawling and integrating metadata records for datasets registered in existing local PDOPs, and we develop a prototype PDOP that provides unified search services over the integrated metadata. We conduct experiments to evaluate the core components of our prototype.
Article
Ontologies are the prime way of organizing data in the Semantic Web. Often, it is necessary to combine several independently developed ontologies to obtain a complete representation of a domain of interest. The complementarity of existing ontologies can be leveraged by merging them. Existing approaches for ontology merging mostly implement a binary merge. However, with the growing number and size of relevant ontologies across domains, scalability becomes a central challenge. A multi-ontology merging technique offers a potential solution to this problem. We present CoMerger, a scalable method for merging multiple ontologies. It takes as input a set of source ontologies and existing mappings across them and generates a merged ontology. For efficient processing, rather than successively merging complete ontologies pairwise, we group related concepts across ontologies into partitions and merge first within and then across those partitions. In both steps, user-specified subsets of generic merge requirements (GMRs) are taken into account and used to optimize the outputs. The experimental results on well-known datasets confirm the feasibility of our approach and demonstrate its superiority over binary strategies. A prototypical implementation is freely accessible through a live web portal.
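The partition-first strategy can be sketched (purely illustratively, not as CoMerger's actual algorithm or its GMR handling) with a union-find over the input mappings: concepts connected by mappings, directly or transitively across any number of ontologies, fall into one partition, and merging then happens locally within each partition.

```python
from collections import defaultdict

def partition_concepts(concepts, mappings):
    """Union-find: concepts linked by cross-ontology mappings share a partition."""
    parent = {c: c for c in concepts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for a, b in mappings:
        union(a, b)

    groups = defaultdict(list)
    for c in concepts:
        groups[find(c)].append(c)
    return list(groups.values())

# Toy input: concepts from three ontologies (prefix = ontology) and
# the mappings that were computed between them beforehand.
concepts = ["o1:Person", "o2:Human", "o3:Individual", "o1:Paper", "o2:Article"]
mappings = [("o1:Person", "o2:Human"), ("o2:Human", "o3:Individual"),
            ("o1:Paper", "o2:Article")]

for group in partition_concepts(concepts, mappings):
    merged_label = group[0].split(":", 1)[1]    # pick one label for the merged concept
    print(merged_label, "<-", group)
```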
Chapter
Integrating heterogeneous and complementary data in clinical decision support systems (e.g., electronic health records, drug databases, scientific articles, etc.) could improve the accuracy of these systems. Based on this finding, the PreDiBioOntoL (Predicting Clinical Diagnosis by combining BioMedical Ontologies and Language Models) project aims to develop a computer-aided clinical and predictive diagnosis tool to help clinicians better manage their patients. This tool will combine deep neural networks trained on heterogeneous data sources with biomedical ontologies. The first results obtained in PreDiBioOntoL are presented in this paper. We propose new siamese neural models (BioSTransformers and BioS-MiniLM) that embed the texts to be compared in a vector space and then find their similarities. The models optimize a self-supervised contrastive learning objective on articles from the scientific literature (the MEDLINE bibliographic database) associated with their MeSH (Medical Subject Headings) keywords. The results obtained on several benchmarks show that the proposed models can solve different biomedical tasks without examples (zero-shot). These results are comparable to those of other biomedical transformers that are fine-tuned on supervised data specific to the problems being addressed. Moreover, we show in this paper how these new siamese models are exploited to semantically map entities from several biomedical ontologies.
Book
Full-text available
The process of user-centered innovation: how it can benefit both users and manufacturers and how its emergence will bring changes in business models and in public policy. Innovation is rapidly becoming democratized. Users, aided by improvements in computer and communications technology, increasingly can develop their own new products and services. These innovating users—both individuals and firms—often freely share their innovations with others, creating user-innovation communities and a rich intellectual commons. In Democratizing Innovation, Eric von Hippel looks closely at this emerging system of user-centered innovation. He explains why and when users find it profitable to develop new products and services for themselves, and why it often pays users to reveal their innovations freely for the use of all. The trend toward democratized innovation can be seen in software and information products—most notably in the free and open-source software movement—but also in physical products. Von Hippel's many examples of user innovation in action range from surgical equipment to surfboards to software security features. He shows that product and service development is concentrated among "lead users," who are ahead on marketplace trends and whose innovations are often commercially attractive. Von Hippel argues that manufacturers should redesign their innovation processes and that they should systematically seek out innovations developed by users. He points to businesses—the custom semiconductor industry is one example—that have learned to assist user-innovators by providing them with toolkits for developing new products. User innovation has a positive impact on social welfare, and von Hippel proposes that government policies, including R&D subsidies and tax credits, should be realigned to eliminate biases against it. The goal of a democratized user-centered innovation system, says von Hippel, is well worth striving for. An electronic version of this book is available under a Creative Commons license.
Conference Paper
Full-text available
The Linked Open Data (LOD) is a major milestone towards realizing the Semantic Web vision, and can enable applications such as robust Question Answering (QA) systems that can answer queries requiring multiple, disparate information sources. However, realizing these applications requires relationships at both the schema and instance level, but currently the LOD only provides relationships for the latter. To address this limitation, we present a solution for automatically finding schema-level links between two LOD ontologies – in the sense of ontology alignment. Our solution, called BLOOMS+, extends our previous solution (i.e. BLOOMS) in two significant ways. BLOOMS+ 1) uses a more sophisticated metric to determine which classes between two ontologies to align, and 2) considers contextual information to further support (or reject) an alignment. We present a comprehensive evaluation of our solution using schema-level mappings from LOD ontologies to Proton (an upper level ontology) – created manually by human experts for a real world application called FactForge. We show that our solution performed well on this task. We also show that our solution significantly outperformed existing ontology alignment solutions (including our previously published work on BLOOMS) on this same task.
Article
Full-text available
Data integration systems often provide a uniform interface, called a mediated schema, to a multitude of disparate data sources. To answer user queries posed over the mediated schema, such systems employ a set of semantic matches between this schema and the local schemas of the data sources. Finding such matches is well known to be difficult. Hence much work has focused on developing semi-automatic techniques to efficiently find the matches. In this paper, however, we consider the complementary problem of improving the mediated schema, to make finding such matches easier. Specifically, a mediated schema S will typically be matched with many source schemas. Thus, can the developer of S analyze and revise S in a way that preserves S's semantics, and yet makes it easier to match with in the future? We describe mSeer, a solution to this problem. Given a mediated schema S, mSeer first computes a matchability score that quantifies how well S can be matched against. Next, mSeer generates a matchability report that shows where the problems in matching S come from. Finally, mSeer automatically suggests changes to S (e.g., renaming an attribute, reformatting data values, etc.) that it believes will preserve the semantics of S and yet make it more amenable to matching. The creator of S is free to accept or revise the changes suggested by mSeer. We present extensive experiments over several real-world domains that demonstrate the effectiveness of our approach.
Article
Full-text available
The need for the establishment of evaluation methods that can measure respective improvements or degradations of ontological models (e.g., yielded by a precursory ontology population stage) is undisputed. We propose an evaluation scheme that allows a number of different ontologies to be employed and their performance on specific tasks to be measured. In this paper we present the resulting task-based approach for quantitative ontology evaluation, which also allows for a bootstrapping approach to ontology population. Benchmark tasks commonly feature a so-called gold standard defining perfect performance. By selecting ontology-based approaches for the respective tasks, the ontology-dependent part of the performance can be measured. Following this scheme, we present the results of an experiment for testing and incrementally augmenting ontologies using a well-defined benchmark problem based on an evaluation gold standard.
Article
Full-text available
Using ontologies as background knowledge in ontology matching is being actively investigated. Recently the idea has attracted attention because of the growing number of available ontologies, which in turn opens up new opportunities and reduces the problem of finding candidate background knowledge. Particularly interesting is the approach of using multiple ontologies as background knowledge, which we explore in this paper. We report on an experimental study conducted using real-life ontologies published online. The first contribution of this paper is an exploration of how the matching performance behaves when multiple background ontologies are used cumulatively. As a second contribution, we analyze the impact that different types of background ontologies have on the matching performance. With respect to precision and recall, more background knowledge monotonically increases the recall, while the precision depends on the quality of the added background ontology, with high quality tending to increase, and low quality tending to decrease, the precision.
Article
Full-text available
This paper describes Orchid, a system that converts declarative mapping specifications into data flow specifications (ETL jobs) and vice versa. Orchid provides an abstract operator model that serves as a common model for both transformation paradigms; both mappings and ETL jobs are transformed into instances of this common model. As an additional benefit, instances of this common model can be optimized and deployed into multiple target environments. Orchid is being deployed in FastTrack, a data transformation toolkit in IBM Information Server.
Article
Full-text available
Recently, the number of ontology matching techniques and systems has increased significantly. This makes the issue of their evaluation and comparison more severe. One of the challenges of ontology matching evaluation is building large-scale evaluation datasets. In fact, the number of possible correspondences between two ontologies grows quadratically with respect to the numbers of entities in these ontologies. This often makes the manual construction of evaluation datasets demanding to the point of being infeasible for large-scale matching tasks. In this paper, we present an ontology matching evaluation dataset composed of thousands of matching tasks, called TaxME2. It was built semi-automatically out of the Google, Yahoo, and Looksmart web directories. We evaluated TaxME2 by exploiting the results of almost two dozen state-of-the-art ontology matching systems. The experiments indicate that the dataset possesses the desired key properties, namely it is error-free, incremental, discriminative, monotonic, and hard for state-of-the-art ontology matching systems.
Article
Full-text available
In this paper we focus on the semantic heterogeneity problem as one of the main challenges in current Spatial Data Infrastructures (SDIs). We first report on the state of the art in reducing such heterogeneity in SDIs. We then consider a particular geo-service integration scenario and discuss an approach to semantically coordinating geographic services, based on a view of the semantics of web service coordination and implemented using the Lightweight Coordination Calculus (LCC) language. In this approach, service providers share explicit knowledge of the interactions in which their services are engaged, and these models of interaction are used operationally as the anchor for describing the semantics of the interaction. We achieve web service discovery and integration by using semantic matching between particular interactions and web service descriptions. For this purpose we introduce a specific solution, called structure-preserving semantic matching. We present a real-world application scenario to illustrate how semantic integration of geo web services can be performed using this approach. Finally, we provide a preliminary evaluation of the solution discussed.
Chapter
Full-text available
Existing mature ontology engineering approaches are based on some basic assumptions that are often neglected in practice. Ontologies often need to be built in a decentralized way; ontologies must be given to a community in a way such that individuals have partial autonomy over them; ontologies have a life cycle that involves an iteration back and forth between construction/modification and use; and ontologies should support the participation of non-expert users in ontology engineering processes. While recently there have been some initial proposals to consider these issues, they lack the rigor of mature approaches, i.e., they lack the depth of methodological description that would make the methodology usable, and they lack a proof of concept by concrete case studies. In this paper, we describe the DILIGENT methodology that takes decentralization, partial autonomy, iteration and non-expert builders into account, and we demonstrate its proof of concept in two real-world organizational case studies.
Chapter
Full-text available
Matching of concepts describing the meaning of data in heterogeneous distributed information sources, such as database schemas and other metadata models, grouped here under the heading of an ontology, is one of the basic operations of semantic heterogeneity reconciliation. The aim of this chapter is to motivate the need for ontology matching, introduce the basics of ontology matching, and then discuss several promising themes in the area as reflected in recent research works. In particular, we focus on such themes as uncertainty in ontology matching, matching ensembles, and matcher self-tuning. Finally, we outline some important directions for future research.
Chapter
Full-text available
We introduce oPLMap, a formal framework for automatically learning mapping rules between heterogeneous Web directories, a crucial step towards integrating ontologies and their instances in the Semantic Web. This approach is based on Horn predicate logics and probability theory, which allows for dealing with uncertain mappings (for cases where there is no exact correspondence between classes), and can be extended towards complex ontology models. Different components are combined for finding suitable mapping candidates (together with their weights), and the set of rules with maximum matching probability is selected. Our system oPLMap, with different variants, has been evaluated on a large test set.
Conference Paper
Full-text available
Ontology matching plays a key role for semantic interoperability. Many methods have been proposed for automatically finding the alignment between heterogeneous ontologies. However, in many real-world applications, finding the alignment in a completely automatic way is highly infeasible. Ideally, an ontology matching system would have an interactive interface to allow users to provide feedback to guide the automatic algorithm. Fundamentally, we need to answer the following questions: How can a system carry out an efficient interactive process with the user? How many interactions are sufficient for finding a more accurate matching? To address these questions, we propose an active learning framework for ontology matching, which tries to find the most informative candidate matches to query the user about. The user’s feedback is used to: 1) correct mistaken matches and 2) propagate the supervision information to help the entire matching process. Three measures are proposed to estimate the confidence of each matching candidate. A correct propagation algorithm is further proposed to maximize the spread of the user’s “guidance”. Experimental results on several public data sets show that the proposed approach can significantly improve the matching accuracy (+8.0% better than the baseline methods).
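A minimal sketch of the selection step, under the simplifying assumption that a candidate is "informative" when its similarity score lies close to the acceptance threshold: the system asks the user about the most ambiguous candidates first and overrides the automatic decision with the answer. The paper's three confidence measures and its correct propagation algorithm are richer than this.

```python
def most_informative(candidates, threshold=0.5, budget=3):
    """Rank candidate matches by ambiguity: the closer the similarity
    score is to the acceptance threshold, the less confident the matcher."""
    return sorted(candidates, key=lambda c: abs(c[2] - threshold))[:budget]

def interactive_matching(candidates, oracle, threshold=0.5, budget=3):
    """Query the user (oracle) on the most ambiguous candidates and
    override the automatic decision with the user's answer."""
    decisions = {(s, t): sim >= threshold for s, t, sim in candidates}
    for s, t, _ in most_informative(candidates, threshold, budget):
        decisions[(s, t)] = oracle(s, t)      # user feedback replaces the guess
    return decisions

# Toy candidates: (source entity, target entity, similarity score).
candidates = [("Author", "Writer", 0.52), ("Paper", "Article", 0.91),
              ("Chair", "Seat", 0.49), ("Topic", "Subject", 0.77)]

# A stand-in for the human expert; a real system would ask interactively.
truth = {("Author", "Writer"): True, ("Chair", "Seat"): False}
oracle = lambda s, t: truth.get((s, t), False)

print(interactive_matching(candidates, oracle))
```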
Conference Paper
Full-text available
In a semantic P2P network, peers use separate ontologies and rely on alignments between their ontologies for translating queries. Nonetheless, alignments may be limited (unsound or incomplete) and generate flawed translations, leading to unsatisfactory answers. In this paper we present a trust mechanism that can assist peers in selecting those in the network that are better suited to answer their queries. The trust that a peer has towards another peer depends on a specific query and represents the probability that the latter peer will provide a satisfactory answer. In order to compute trust, we exploit both alignments and peers’ direct experience, and perform Bayesian inference. We have implemented our technique and conducted an evaluation. Experimental results showed that trust values converge as more queries are sent and answers received. Furthermore, the use of trust improves both precision and recall.
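A simplified stand-in for the trust computation is a Beta-Bernoulli model: each peer's answers are treated as Bernoulli trials, and the posterior mean estimates the probability that the next answer will be satisfactory. The actual mechanism conditions trust on the specific query and also exploits the alignments, which this sketch omits.

```python
class PeerTrust:
    """Beta-Bernoulli trust: start from a uniform prior Beta(1, 1) and
    update with observed satisfactory / unsatisfactory answers."""

    def __init__(self):
        self.satisfactory = 0
        self.unsatisfactory = 0

    def update(self, answer_was_satisfactory: bool):
        if answer_was_satisfactory:
            self.satisfactory += 1
        else:
            self.unsatisfactory += 1

    def trust(self) -> float:
        """Posterior mean: estimated P(next answer is satisfactory)."""
        return (self.satisfactory + 1) / (self.satisfactory + self.unsatisfactory + 2)

# A querying peer keeps one trust estimate per acquaintance and
# routes new queries to the most trusted one.
peers = {"peer_A": PeerTrust(), "peer_B": PeerTrust()}
for outcome in [True, True, False, True]:
    peers["peer_A"].update(outcome)
peers["peer_B"].update(False)

best = max(peers, key=lambda p: peers[p].trust())
print({p: round(t.trust(), 2) for p, t in peers.items()}, "->", best)
```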
Conference Paper
Full-text available
We present the Auto Mapping Core (AMC), a new framework that supports fast construction and tuning of schema matching approaches for specific domains such as ontology alignment, model matching or database-schema matching. Distinctive features of our framework are new visualisation techniques for modelling matching processes, stepwise tuning of parameters, intermediate result analysis and performance-oriented rewrites. Furthermore, existing matchers can be plugged into the framework to comparatively evaluate them in a common environment. This allows deeper analysis of behaviour and shortcomings in existing complex matching systems.
Article
Full-text available
Current schema matching approaches still have to improve for large and complex schemas. The large search space increases the likelihood of false matches as well as execution times. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces. To better assist the user in matching complex schemas, we have developed a new generic schema matching tool, COMA++, providing a library of individual matchers and a flexible infrastructure to combine the matchers and refine their results. Different match strategies can be applied, including a new scalable approach to identify context-dependent correspondences between schemas with shared elements and a fragment-based match approach which decomposes a large match task into smaller tasks. We conducted a comprehensive evaluation of the match strategies using large e-business standard schemas. Besides providing helpful insights for future match implementations, the evaluation demonstrated the practicability of our system for matching large schemas.
Article
Full-text available
Clustering, in data mining, is useful for discovering distribution patterns in the underlying data. Clustering algorithms usually employ a distance-based (e.g., Euclidean) similarity measure in order to partition the database such that data points in the same partition are more similar than points in different partitions. In this paper, we study clustering algorithms for data with boolean and categorical attributes. We show that traditional clustering algorithms that use distances between points for clustering are not appropriate for boolean and categorical attributes. Instead, we propose a novel concept of links to measure the similarity/proximity between a pair of data points. We develop a robust hierarchical clustering algorithm, ROCK, that employs links and not distances when merging clusters. Our methods naturally extend to non-metric similarity measures that are relevant in situations where a domain expert/similarity table is the only source of knowledge. In addition to presenting detailed complexity results for ROCK, we also conduct an experimental study with real-life as well as synthetic data sets to demonstrate the effectiveness of our techniques. For data with categorical attributes, our findings indicate that ROCK not only generates better quality clusters than traditional algorithms, but also exhibits good scalability properties.
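The link concept can be shown in a few lines: two points are neighbours when their Jaccard similarity reaches a threshold theta, and link(p, q) counts their common neighbours; ROCK then merges clusters using a goodness measure over these link counts, which this illustrative sketch omits.

```python
from itertools import combinations

def jaccard(a, b):
    return len(a & b) / len(a | b)

def links(points, theta=0.5):
    """link(p, q) = number of common neighbours; p and q are neighbours
    iff jaccard(p, q) >= theta (a point counts as its own neighbour)."""
    names = list(points)
    neighbours = {
        n: {m for m in names if jaccard(points[n], points[m]) >= theta}
        for n in names
    }
    return {(p, q): len(neighbours[p] & neighbours[q])
            for p, q in combinations(names, 2)}

# Toy market-basket data with categorical (boolean) attributes.
baskets = {
    "t1": {"bread", "milk", "butter"},
    "t2": {"bread", "milk"},
    "t3": {"bread", "butter"},
    "t4": {"beer", "chips"},
}
for pair, count in links(baskets).items():
    print(pair, "links:", count)
```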
Conference Paper
Full-text available
Schema and ontology matching approaches address the central problems of interoperability and data integration in the Semantic Web. Today it takes an expert to determine the best algorithm, and a decision can usually be made only after experimentation, so that neither the necessary scaling nor off-the-shelf use of matching algorithms is possible. To tackle these issues, we present a rule-based evaluation method in which the best algorithms are determined semi-automatically and the selection is performed prior to the execution of an algorithm.
Conference Paper
Full-text available
It is increasingly important to develop scalable integration techniques for the growing number of XML data sources. A practical starting point for the integration of large numbers of Document Type Definitions (DTDs) of XML sources would be to first find clusters of DTDs that are similar in structure and semantics. Reconciling similar DTDs within such a cluster is an easier task than reconciling DTDs that are different in structure and semantics, as the latter would involve more restructuring. We introduce XClust, a novel integration strategy that involves the clustering of DTDs. A matching algorithm based on the semantics, immediate descendants and leaf-context similarity of DTD elements is developed. Our experiments to integrate real-world DTDs demonstrate the effectiveness of the XClust approach.
Conference Paper
Full-text available
In this paper we describe a novel approach to the visualization of the mapping between two schemas. Current approaches to visually defining such a mapping fail when the schemas or maps become large. The new approach uses various information visualization techniques to simplify the view, making it possible for users to effectively deal with much larger schemas and maps. A user study verifies that the new approach is useful, usable, and effective. The primary contribution is a demonstration of novel ways to effectively present highly complex information.
Conference Paper
Full-text available
The Web of Data currently coming into existence through the Linked Open Data (LOD) effort is a major milestone in realizing the Semantic Web vision. However, the development of applications based on LOD faces difficulties due to the fact that the different LOD datasets are rather loosely connected pieces of information. In particular, links between LOD datasets are almost exclusively on the level of instances, and schema-level information is being ignored. In this paper, we therefore present a system for finding schema-level links between LOD datasets in the sense of ontology alignment. Our system, called BLOOMS, is based on the idea of bootstrapping information already present on the LOD cloud. We also present a comprehensive evaluation which shows that BLOOMS outperforms state-of-the-art ontology alignment systems on LOD datasets. At the same time, BLOOMS is also competitive with these other systems on the Ontology Alignment Evaluation Initiative benchmark datasets.
Conference Paper
Full-text available
Ontology matching consists of finding correspondences between ontology entities. OAEI campaigns aim at comparing ontology matching systems on precisely defined test sets. Test sets can use ontologies of different nature (from expressive OWL ontologies to simple directories) and use different modalities, e.g., blind evaluation, open evaluation, consensus. OAEI-2008 builds over previous campaigns by having 4 tracks with 8 test sets followed by 13 participants. Following the trend of previous years, more participants reach the forefront. The official results of the campaign are those published on the OAEI web site.
Article
Ontology mapping is seen as a solution provider in today's landscape of ontology research. As the number of ontologies that are made publicly available and accessible on the Web increases steadily, so does the need for applications to use them. A single ontology is no longer enough to support the tasks envisaged by a distributed environment like the Semantic Web. Multiple ontologies need to be accessed from several applications. Mapping could provide a common layer from which several ontologies could be accessed and hence could exchange information in semantically sound manners. Developing such mappings has been the focus of a variety of works originating from diverse communities over a number of years. In this article we comprehensively review and present these works. We also provide insights on the pragmatics of ontology mapping and elaborate on a theoretical approach for defining ontology mapping.
Conference Paper
A data-integration system provides access to a multitude of data sources through a single mediated schema. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the source schemas and the mediated schema. We describe LSD, a system that employs and extends current machine-learning techniques to semi-automatically find such mappings. LSD first asks the user to provide the semantic mappings for a small set of data sources, then uses these mappings together with the sources to train a set of learners. Each learner exploits a different type of information either in the source schemas or in their data. Once the learners have been trained, LSD finds semantic mappings for a new data source by applying the learners, then combining their predictions using a meta-learner. To further improve matching accuracy, we extend machine learning techniques so that LSD can incorporate domain constraints as an additional source of knowledge, and develop a novel learner that utilizes the structural information in XML documents. Our approach thus is distinguished in that it incorporates multiple types of knowledge. Importantly, its architecture is extensible to additional learners that may exploit new kinds of information. We describe a set of experiments on several real-world domains, and show that LSD proposes semantic mappings with a high degree of accuracy.
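The combination step can be illustrated as a weighted vote: each base learner returns a confidence per mediated-schema label, and the meta-learner combines them with per-learner weights (which LSD learns from the training sources; the weights and toy learners below are invented for illustration).

```python
def meta_predict(column, base_learners, weights):
    """Combine base-learner confidences into a single prediction
    for which mediated-schema element a source column matches."""
    scores = {}
    for name, learner in base_learners.items():
        for label, confidence in learner(column).items():
            scores[label] = scores.get(label, 0.0) + weights[name] * confidence
    return max(scores, key=scores.get), scores

# Two toy base learners: one looks at the column name, one at the values.
def name_learner(column):
    if "phone" in column["name"]:
        return {"contact-phone": 0.9, "address": 0.1}
    return {"address": 0.8, "contact-phone": 0.2}

def value_learner(column):
    digits = sum(ch.isdigit() for v in column["values"] for ch in v)
    total = sum(len(v) for v in column["values"])
    p = digits / max(total, 1)
    return {"contact-phone": p, "address": 1 - p}

base_learners = {"name": name_learner, "value": value_learner}
weights = {"name": 0.6, "value": 0.4}   # in LSD these weights come from the meta-learner

column = {"name": "phone_nr", "values": ["235-4567", "849-1234"]}
print(meta_predict(column, base_learners, weights))
```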
Book
Ontologies can be found everywhere. They are viewed as the silver bullet for many applications, such as database integration, peer-to-peer systems, e-commerce, semantic web services, or social networks. However, in open or evolving systems, such as the semantic web, different parties would, in general, adopt different ontologies. Thus, merely using ontologies, like using XML, does not reduce heterogeneity: it just raises heterogeneity problems to a higher level. Euzenat and Shvaiko’s book is devoted to ontology matching as a solution to the semantic heterogeneity problem faced by computer systems. Ontology matching aims at finding correspondences between semantically related entities of different ontologies. These correspondences may stand for equivalence as well as other relations, such as consequence, subsumption, or disjointness, between ontology entities. Many different matching solutions have been proposed so far from various viewpoints, e.g., databases, information systems, artificial intelligence. With Ontology Matching, researchers and practitioners will find a reference book which presents currently available work in a uniform framework. In particular, the work and the techniques presented in this book can equally be applied to database schema matching, catalog integration, XML schema matching and other related problems. The objectives of the book include presenting (i) the state of the art and (ii) the latest research results in ontology matching by providing a detailed account of matching techniques and matching systems in a systematic way from theoretical, practical and application perspectives.
Book
Requiring heterogeneous information systems to cooperate and communicate has now become crucial, especially in application areas like e-business, Web-based mash-ups and the life sciences. Such cooperating systems have to automatically and efficiently match, exchange, transform and integrate large data sets from different sources and of different structure in order to enable seamless data exchange and transformation. The book edited by Bellahsene, Bonifati and Rahm provides an overview of the ways in which the schema and ontology matching and mapping tools have addressed the above requirements and points to the open technical challenges. The contributions from leading experts are structured into three parts: large-scale and knowledge-driven schema matching, quality-driven schema mapping and evolution, and evaluation and tuning of matching tasks. The authors describe the state of the art by discussing the latest achievements such as more effective methods for matching data, mapping transformation verification, adaptation to the context and size of the matching and mapping tasks, mapping-driven schema evolution and merging, and mapping evaluation and tuning. The overall result is a coherent, comprehensive picture of the field. With this book, the editors introduce graduate students and advanced professionals to this exciting field. For researchers, they provide an up-to-date source of reference about schema and ontology matching, schema and ontology evolution, and schema merging.
Article
This paper addresses the problem of synthesizing ontology alignment methods by maximizing the social welfare within a group of interacting agents: specifically, each agent is responsible for computing mappings concerning a specific ontology element, using a specific alignment method. Each agent interacts with other agents with whom it shares constraints concerning the validity of the mappings it computes. Interacting agents form a bipartite factor graph, composed of variable and function nodes, representing alignment decisions and utilities, respectively. Agents need to reach an agreement on the mapping of the ontology elements that is consistent with the semantics of the specifications and with respect to their mapping preferences. Addressing the synthesis problem in such a way allows us to use an extension of the max-sum algorithm to generate near-to-optimal solutions to the alignment of ontologies through local decentralized message passing. We show the potential of such an approach by synthesizing a number of alignment methods and studying their performance in the OAEI benchmark series.
Article
Web services are not the only applications requiring ontology matching and mediation. Agent communication, peer-to-peer systems, etc. also need to find relationships between ontologies. However, they do not necessarily require the same kind of mediation as web services. In order to maximise the utility of the semantic web infrastructure, it seems reasonable to share the mediation services among these applications. To that end we propose an infrastructure based on the reified notion of alignments and show how it can be used in these various cases.
Chapter
Within open, distributed and dynamic environments, agents frequently encounter and communicate with new agents and services that were previously unknown. However, to overcome the ontological heterogeneity which may exist within such environments, agents first need to reach agreement over the vocabulary and underlying conceptualisation of the shared domain that will be used to support their subsequent communication. Whilst there are many existing mechanisms for matching the agents’ individual ontologies, some are better suited to certain ontologies or tasks than others, and many are unsuited for use in a real-time, autonomous environment. Agents have to agree on which correspondences between their ontologies are mutually acceptable to both agents. As the rationale behind the preferences of each agent may well be private, one cannot always expect agents to disclose their strategy or rationale for communicating. This prevents the use of a centralised mediator or facilitator which could reconcile the ontological differences. The use of argumentation allows two agents to iteratively explore candidate correspondences within a matching process, through a series of proposals and counter-proposals, i.e., arguments. Thus, two agents can reason over the acceptability of these correspondences without explicitly disclosing the rationale for preferring one type of correspondence over another. In this chapter we present an overview of the approaches for alignment agreement based on argumentation.
Chapter
In this paper we propose an ontology matching paradigm based on the idea of harvesting the Semantic Web, i.e., automatically finding and exploring multiple and heterogeneous online knowledge sources to derive mappings. We adopt an experimental approach in the context of matching two real life, large-scale ontologies to investigate the potential of this paradigm, its limitations, and its relation to other techniques. Our experiments yielded a promising baseline precision of 70% and identified a set of critical issues that need to be considered to achieve the full potential of the paradigm. Besides providing a good performance as a stand-alone matcher, our paradigm is complementary to existing techniques and therefore could be used in hybrid tools that would further advance the state of the art in the ontology matching field.
Article
It has been a formidable task to achieve efficiency and scalability for the alignment between two massive, conceptually similar ontologies. Here we assume an ontology is typically given in RDF (Resource Description Framework) or OWL (Web Ontology Language) and can be represented by a directed graph. A straightforward approach to the alignment of two ontologies entails an O(N²) computation, comparing every combination of pairs of nodes from the two given ontologies, where N denotes the average number of nodes in each ontology. Our proposed algorithm, called the Anchor-Flood algorithm, achieves efficient computation on average: it starts off with an anchor, a pair of “look-alike” concepts from each ontology, and gradually explores concepts by collecting neighboring concepts, thereby taking advantage of locality of reference in the graph data structure. It outputs a set of alignments between concepts and properties within semantically connected subsets of the two entire graphs, which we call segments. When similarity comparison between a pair of nodes in the directed graph has to be made to determine whether two given ontologies are aligned, we repeat the similarity comparison between pairs of nodes within the neighborhood pairs of the two ontologies surrounding the anchor, iteratively, until either all the collected concepts are explored or no new aligned pair is found. In this way, we can significantly reduce the computational time for the alignment. Moreover, since we only focus on segment-to-segment comparison, regardless of the entire size of the ontologies, our algorithm not only achieves high performance, but also resolves the scalability problem in aligning ontologies. Our proposed algorithm also reduces the number of seemingly aligned but actually misaligned pairs. Through several examples with large ontologies, we demonstrate the features of our Anchor-Flood algorithm.
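Purely as an illustration of the flooding idea (not the paper's algorithm): starting from an anchor pair, only the neighbourhoods of already aligned pairs are compared, and new pairs are added when their labels are sufficiently similar, so comparisons stay local to small segments instead of covering all node pairs. The string-based label similarity below is a trivial stand-in for the paper's similarity measures.

```python
from collections import deque
from difflib import SequenceMatcher

def label_sim(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def anchor_flood(graph1, graph2, anchor, threshold=0.8):
    """Expand alignments outwards from an anchor pair, comparing only
    neighbourhoods of already aligned pairs (segment-to-segment)."""
    aligned = {anchor}
    queue = deque([anchor])
    while queue:
        n1, n2 = queue.popleft()
        for c1 in graph1.get(n1, []):
            for c2 in graph2.get(n2, []):
                pair = (c1, c2)
                if pair not in aligned and label_sim(c1, c2) >= threshold:
                    aligned.add(pair)
                    queue.append(pair)      # flood further from the new pair
    return aligned

# Toy ontologies as adjacency lists (concept -> neighbouring concepts).
onto1 = {"Publication": ["Article", "Author"], "Article": ["Journal"]}
onto2 = {"Publication": ["Articles", "Writer"], "Articles": ["Journal"]}

print(anchor_flood(onto1, onto2, ("Publication", "Publication")))
```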
Article
Ontology mapping seeks to find semantic correspondences between similar elements of different ontologies. It is a key challenge to achieve semantic interoperability in building the Semantic Web. This paper proposes a new generic and adaptive ontology mapping approach, called the PRIOR+, based on propagation theory, information retrieval techniques and artificial intelligence. The approach consists of three major modules, i.e., the IR-based similarity generator, the adaptive similarity filter and weighted similarity aggregator, and the neural network based constraint satisfaction solver. The approach first measures both linguistic and structural similarity of ontologies in a vector space model, and then aggregates them using an adaptive method based on their harmonies, which is defined as an estimator of performance of similarity. Finally to improve mapping accuracy the interactive activation and competition neural network is activated, if necessary, to search for a solution that can satisfy ontology constraints. The experimental results show that harmony is a good estimator of f-measure; the harmony based adaptive aggregation outperforms other aggregation methods; neural network approach significantly boosts the performance in most cases. Our approach is competitive with top-ranked systems on benchmark tests at OAEI campaign 2007, and performs the best on real cases in OAEI benchmark tests.
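Harmony can be sketched as the fraction of correspondences whose similarity is the maximum of both its row and its column in a similarity matrix; each matcher's matrix is then weighted in proportion to its harmony. The toy matrices below are invented, and PRIOR+'s filtering and neural constraint-satisfaction stages are not shown.

```python
import numpy as np

def harmony(sim):
    """Fraction of (row, column) pairs whose similarity is the maximum
    of both its row and its column -- an estimator of matcher quality."""
    row_max = sim == sim.max(axis=1, keepdims=True)
    col_max = sim == sim.max(axis=0, keepdims=True)
    return (row_max & col_max).sum() / min(sim.shape)

def harmony_aggregate(sim_matrices):
    """Weight each similarity matrix by its harmony and average them."""
    weights = np.array([harmony(s) for s in sim_matrices])
    weights = weights / weights.sum()
    return sum(w * s for w, s in zip(weights, sim_matrices)), weights

# Toy similarity matrices over 3 x 3 entities: one linguistic, one structural.
linguistic = np.array([[0.9, 0.2, 0.1],
                       [0.1, 0.8, 0.3],
                       [0.2, 0.1, 0.7]])
structural = np.array([[0.6, 0.5, 0.4],
                       [0.5, 0.6, 0.5],
                       [0.4, 0.7, 0.6]])

aggregated, weights = harmony_aggregate([linguistic, structural])
print("weights:", weights.round(2))     # the cleaner matrix gets the larger weight
print(aggregated.round(2))
```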
Article
In this article, we present extensional mappings, which are based on second-order tuple-generating dependencies between models in our Generic Role-based Metamodel GeRoMe. Our mappings support data translation between heterogeneous models, such as XML schemas, relational schemas, or OWL ontologies. The mapping language provides grouping functionalities that allow for complete restructuring of data, which is necessary for handling object-oriented models and nested data structures such as XML. Furthermore, we present algorithms for mapping composition and optimization of the composition result. To verify the genericness, correctness, and composability of our approach we implemented a data translation tool and mapping export for several data manipulation languages. Furthermore, we address the question of how generic schema mappings can be harnessed for answering queries against an integrated global schema.
Article
Ontology mapping is the key to reaching interoperability over ontologies. In the semantic web environment, ontologies are usually distributed and heterogeneous, and thus it is necessary to find the mappings between them before processing across them. Many efforts have been made to automate the discovery of ontology mappings. However, some problems are still evident. In this paper, ontology mapping is formalized as a problem of decision making. In this way, discovery of the optimal mapping is cast as finding the decision with minimal risk. An approach called Risk Minimization based Ontology Mapping (RiMOM) is proposed, which automates the process of discovering 1:1, n:1, 1:null and null:1 mappings. Based on normalization and NLP techniques, the problem of instance heterogeneity in ontology mapping is resolved to a certain extent. To deal with the problem of name conflicts in the mapping process, we use a thesaurus and statistical techniques. Experimental results indicate that the proposed method can significantly outperform the baseline methods, and also obtains improvement over the existing methods.
Article
In distributed geospatial applications with heterogeneous databases, an ontology-driven approach to data integration relies on the alignment of the concepts of a global ontology that describe the domain, with the concepts of the ontologies that describe the data in the distributed databases. Once the alignment between the global ontology and each distributed ontology is established, agreements that encode a variety of mappings between concepts are derived. In this way, users can potentially query hundreds of geospatial databases using a single query. Using our approach, querying can be easily extended to new data sources and, therefore, to new regions. In this paper, we describe the AgreementMaker, a tool that displays the ontologies, supports several mapping layers visually, presents automatically generated mappings, and finally produces the agreements.
Article
For the effective alignment of ontologies, the subsumption mappings between the elements of the source and target ontologies play a crucial role, as much as equivalence mappings do. This paper presents the “Classification-Based Learning of Subsumption Relations” (CSR) method for the alignment of ontologies. Given a pair of two ontologies, the objective of CSR is to learn patterns of features that provide evidence for the subsumption relation among concepts, and thus, decide whether a pair of concepts from these ontologies is related via a subsumption relation. This is achieved by means of a classification task, using state of the art supervised machine learning methods. The paper describes thoroughly the method, provides experimental results over an extended version of benchmarking series of both artificially created and real world cases, and discusses the potential of the method.
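The classification step can be sketched with any off-the-shelf classifier: each concept pair becomes a feature vector, and a supervised model decides whether the first concept is subsumed by the second. The features and training pairs below are invented; CSR's actual feature patterns and learners are considerably richer.

```python
from sklearn.linear_model import LogisticRegression

# Invented features for a concept pair (c1, c2):
# [label of c2 is a suffix of c1's label, number of shared tokens, depth(c1) - depth(c2)]
X_train = [
    [1, 1, 1],   # "ConferencePaper" vs "Paper"    -> subsumed
    [1, 1, 2],   # "AcceptedFullPaper" vs "Paper"  -> subsumed
    [0, 0, 0],   # "Person" vs "Paper"             -> not subsumed
    [0, 1, -1],  # "Document" vs "ReviewDocument"  -> not subsumed (wrong direction)
]
y_train = [1, 1, 0, 0]

# Train a simple classifier on labelled concept pairs.
clf = LogisticRegression().fit(X_train, y_train)

# New pair: "JournalArticle" vs "Article" with the same feature encoding.
print(clf.predict([[1, 1, 1]]))          # [1] means a subsumption relation is predicted
print(clf.predict_proba([[1, 1, 1]]))
```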
Article
Ontologies proliferate with the progress of the Semantic Web. Ontology matching is an important way of establishing interoperability between (Semantic) Web applications that use different but related ontologies. Due to their sizes and monolithic nature, large ontologies regarding real world domains bring a new challenge to the state of the art ontology matching technology. In this paper, we propose a divide-and-conquer approach to matching large ontologies. We develop a structure-based partitioning algorithm, which partitions entities of each ontology into a set of small clusters and constructs blocks by assigning RDF Sentences to those clusters. Then, the blocks from different ontologies are matched based on precalculated anchors, and the block mappings holding high similarities are selected. Finally, two powerful matchers, V-Doc and Gmo, are employed to discover alignments in the block mappings. Comprehensive evaluation on both synthetic and real world data sets demonstrates that our approach both solves the scalability problem and achieves good precision and recall with significant reduction of execution time.
Article
We present a detailed experimental investigation of the easy-hard-easy phase transition for randomly generated instances of satisfiability problems. Problems in the hard part of the phase transition have been extensively used for benchmarking satisfiability algorithms. This study demonstrates that problem classes and regions of the phase transition previously thought to be easy can sometimes be orders of magnitude more difficult than the worst problems in problem classes and regions of the phase transition considered hard. These difficult problems are either hard unsatisfiable problems or are satisfiable problems which give a hard unsatisfiable subproblem following a wrong split. Whilst these hard unsatisfiable problems may have short proofs, these appear to be difficult to find, and other proofs are long and hard.
Article
Semantic matching of schemas in heterogeneous data sharing systems is time consuming and error prone. Existing mapping tools employ semi-automatic techniques for mapping two schemas at a time. In a large-scale scenario, where data sharing involves a large number of data sources, such techniques are not suitable. We present a new robust automatic method which discovers semantic schema matches in a large set of XML schemas, incrementally creates an integrated schema encompassing all schema trees, and defines mappings from the contributing schemas to the integrated schema. Our method, PORSCHE (Performance ORiented SCHEma mediation), utilises a holistic approach which first clusters the nodes based on linguistic label similarity. Then it applies a tree mining technique using node ranks calculated during depth-first traversal. This minimises the target node search space and improves performance, which makes the technique suitable for large-scale data sharing. The PORSCHE framework is hybrid in nature and flexible enough to incorporate more matching techniques or algorithms. We report on experiments with up to 80 schemas containing 83,770 nodes, with our prototype implementation taking 587 s on average to match and merge them, resulting in an integrated schema and returning mappings from all input schemas to the integrated schema. The quality of matching in PORSCHE is shown using precision, recall and F-measure on randomly selected pairs of schemas from the same domain. We also discuss the integrity of the mediated schema in the light of completeness and minimality measures.
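The first, linguistic step can be sketched as follows (an illustration only, with a naive label normalization standing in for PORSCHE's linguistic matching): nodes of all schema trees are grouped by normalized label, and pre-order ranks computed during a depth-first traversal are attached so that a later tree-mining step could exploit them; that mining step is not shown.

```python
def dfs_ranks(tree, root):
    """Pre-order node ranks computed during a depth-first traversal."""
    ranks, stack, counter = {}, [root], 0
    while stack:
        node = stack.pop()
        ranks[node] = counter
        counter += 1
        stack.extend(reversed(tree.get(node, [])))
    return ranks

def label_clusters(schemas):
    """Group nodes of all schema trees by a crudely normalized label
    (the linguistic step); the ranks would feed a later tree-mining step."""
    clusters = {}
    for schema_id, tree in schemas.items():
        root = next(iter(tree))               # assumption: first key is the root
        for node, rank in dfs_ranks(tree, root).items():
            key = node.lower().rstrip("s")    # naive singularization for the sketch
            clusters.setdefault(key, []).append((schema_id, node, rank))
    return clusters

# Toy schema trees given as adjacency lists (node -> children).
schemas = {
    "s1": {"book": ["title", "authors"], "authors": ["author"]},
    "s2": {"Book": ["Title", "Author"]},
}
for label, members in label_clusters(schemas).items():
    print(label, "->", members)
```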
Article
Researchers in the ontology-design field have developed the content for ontologies in many domain areas. This distributed nature of ontology development has led to a large number of ontologies covering overlapping domains. In order for these ontologies to be reused, they first need to be merged or aligned to one another. We developed a suite of tools for managing multiple ontologies. This suite provides users with a uniform framework for comparing, aligning, and merging ontologies, maintaining versions, and translating between different formalisms. Two of the tools in the suite support semi-automatic ontology merging: iPrompt is an interactive ontology-merging tool that guides the user through the merging process, presenting suggestions for next steps and identifying inconsistencies and potential problems. AnchorPrompt uses a graph structure of ontologies to find correlations between concepts and to provide additional information for iPrompt.
Article
Preprint available at http://www.ida.liu.se/~patla00/publications.shtml ----------------------- Due to the recent explosion of the amount of on-line accessible biomedical data and tools, finding and retrieving the relevant information is not an easy task. The vision of a Semantic Web for life sciences alleviates these difficulties. A key technology for the Semantic Web is ontologies. In recent years many biomedical ontologies have been developed and many of these ontologies contain overlapping information. To be able to use multiple ontologies they have to be aligned or merged. In this paper we propose a framework for aligning and merging ontologies. Further, we developed a system for aligning and merging biomedical ontologies (SAMBO) based on this framework. The framework is also a first step towards a general framework that can be used for comparative evaluations of alignment strategies and their combinations. In this paper we evaluated different strategies and their combinations in terms of quality and processing time and compared SAMBO with two other systems.
Conference Paper
Much research in information management begins by asking how to manage a given information corpus. But information management systems can only be as good as the information they manage. They struggle and often fail to correctly infer meaning from large ...
Conference Paper
In the extensive usage of ontologies envisaged by the Semantic Web there is a compelling need for expressing mappings between the components of heterogeneous ontologies. These mappings take many different forms and involve the different components of ontologies. State-of-the-art languages for ontology mapping make it possible to express semantic relations between homogeneous components of different ontologies, namely they allow concepts to be mapped into concepts, individuals into individuals, and properties into properties. Many real cases, however, highlight the necessity of establishing semantic relations between heterogeneous components, for example to map a concept into a relation or vice versa. To support the interoperability of ontologies we therefore need to enrich mapping languages with constructs for the representation of heterogeneous mappings. In this paper, we propose an extension of Distributed Description Logics (DDL) that allows for the representation of mappings between concepts and relations. We provide a semantics for the proposed language and show its main logical properties.