Article

Abstract

Although it is widely acknowledged that adding class disjointness to ontologies enables a wide range of interesting applications, this type of axiom is rarely used on today's Semantic Web, owing to the considerable skill and effort required to make the necessary modeling decisions. Automatically generating disjointness axioms could lower the barrier of entry and lead to more widespread adoption. Different methods have been proposed for this automatic generation, including supervised, top-down approaches, which base their results on heterogeneous types of evidence, and unsupervised, bottom-up approaches, which rely solely on the instance data available for the ontology. However, the current literature lacks a thorough comparison of these approaches. In this article, we provide this comparison by presenting two fundamentally different state-of-the-art approaches and evaluating their relative ability to enrich a well-known, multi-purpose ontology with class disjointness. To do so, we introduce a high-quality gold standard for class disjointness. We describe the creation of this standard in detail and provide a thorough analysis. Finally, we also present improvements to both approaches, based in part on discoveries made during our analysis and evaluation.


... Most of them are able to cope with expressive representation languages such as Description Logics (DLs) [11], the theoretical foundation of OWL, and with the Open World Assumption (OWA) typically adopted, in contrast to the Closed World Assumption (CWA) usually made in traditional ML settings. Approaches to problems such as ontology refinement and enrichment at the terminology/schema level have also been proposed [46,47,102,186,157]. ...
... To tackle this problem, automated methods for discovering disjointness axioms from the data distribution have been devised. A solution grounded in association rule mining [4] has been proposed in [186]. It studies the correlation between classes comparatively, namely by considering association rules, negative association rules and the correlation coefficient. ...
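To make the extensional idea behind this line of work concrete, here is a minimal sketch, not the cited implementation; the class names, toy data and the decision criterion are illustrative assumptions. It scores a candidate disjointness axiom by the correlation (phi coefficient) between binary class-membership vectors:

import math

def membership_vector(cls_instances, all_instances):
    # 1 if the individual is asserted to belong to the class, else 0
    return [1 if ind in cls_instances else 0 for ind in all_instances]

def phi_correlation(x, y):
    # Pearson correlation of two binary vectors (the "phi coefficient")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

# A strongly negative correlation between the membership vectors of two
# well-populated classes is evidence for a candidate disjointness axiom.
persons = {"alice", "bob"}
places = {"paris", "rome"}
individuals = sorted(persons | places)
x = membership_vector(persons, individuals)
y = membership_vector(places, individuals)
print(phi_correlation(x, y))  # -1.0 here: candidate Person ⊓ Place ⊑ ⊥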
... Once the TCT is grown, groups of (disjoint) clusters located at sibling nodes identify concepts involved in candidate disjointness axioms to be derived. Unlike [186], which is based on the statistical correlation between instances, the empirical evaluation of [157,156] showed the system's ability to discover disjointness axioms also involving complex concept descriptions, thanks to the exploitation of the underlying ontology as background knowledge. ...
Article
Full-text available
The graph model is nowadays largely adopted to model a wide range of knowledge and data, spanning from social networks to knowledge graphs (KGs), representing a successful paradigm of how symbolic and transparent AI can scale on the World Wide Web. However, due to their unprecedented volume, they are generally tackled by Machine Learning (ML) and mostly numeric-based methods such as knowledge graph embedding (KGE) models and deep neural networks (DNNs). The latter methods have lately proved very efficient, leading the current AI spring. In this vision paper, we introduce some of the main existing methods for combining KGs and ML, divided into two categories: those using ML to improve KGs, and those using KGs to improve results on ML tasks. From this introduction, we highlight research gaps and perspectives that we deem promising and currently under-explored for the involved research communities, spanning from KG support for LLM prompting and integration of KG semantics in ML models, to symbol-based methods, interpretability of ML models, and the need for improved benchmark datasets. In our opinion, such perspectives are stepping stones towards an ultimate view of KGs as central assets for neuro-symbolic and explainable AI.
... A number of works [8,18,20,28,34,35] exist in the literature for discovering disjointness between Linked Data classes. However, there are few works on discovering disjointness axioms between properties in Linked Data. ...
... However, none of the abovementioned works talk about the role that time plays in deciding the disjointness of property pairs. The survey presented in Völker et al. [34] mentions that deciding the disjointness of two classes is subjective as it often depends on the underlying contexts such as time. However, Völker et al. [34] do not propose any technique to decide whether two classes are disjoint or not with respect to time. ...
... The survey presented in Völker et al. [34] mentions that deciding the disjointness of two classes is subjective as it often depends on the underlying contexts such as time. However, Völker et al. [34] do not propose any technique to decide whether two classes are disjoint or not with respect to time. As far as we know, ours is the first work to propose a systematic method to decide the potential disjointness of two properties with respect to time. ...
Article
Full-text available
Although Knowledge Graphs (KGs) have become a popular and powerful tool in industry, the major focus of most researchers has been on adding more and more triples to the A-Boxes of KGs. An often overlooked but important part of a KG is its T-Box. If the T-Box contains incorrect statements, or if certain correct statements are absent from it, the result can be inconsistent knowledge in the KG or information loss, respectively. In this paper, we propose a novel system, DOPLEX, based on Probabilistic Soft Logic (PSL), to detect disjointness between pairs of object properties present in a KG. Current approaches mainly rely on checking the absence of common triples and miss out on exploiting the semantics of property names. In the proposed system, in addition to checking common triples, PSL is used to determine whether property names imply disjointness. We particularly focus on knowledge graphs that are auto-extracted from large text corpora. Our evaluation demonstrates that the proposed approach discovers disjoint property pairs with better precision than the state-of-the-art system, without compromising much on the number of disjoint pairs discovered. Towards the end of the paper, we discuss the disjointness of properties in the context of time, propose a new notion called temporal non-disjointness, and discuss its importance and characteristics. We also present an approach for the discovery of property pairs that are potentially temporally non-disjoint.
... In particular, class disjointness axioms are useful for checking the logical consistency and detecting undesired usage patterns or incorrect assertions. As for the definition of disjointness [21], two classes are disjoint if they do not possess any common individual according to their intended interpretation, i.e., the intersection of these classes is empty in a particular KB. A simple example can demonstrate the potential advantages obtained by the addition of this kind of axiom to an ontology. ...
... In the case of axiom learning, i.e., learning class disjointness axioms, recent methods [11,22] apply top-down or intensional approaches, which rely on schema-level information, i.e., logical and lexical descriptions of the classes. The contributions based on bottom-up or extensional approaches [1,21], on the other hand, require the instances in the dataset to induce instance-driven patterns that suggest axioms, e.g., class disjointness axioms. ...
... The most prominent related work on learning disjointness axioms consists of the contributions by Johanna Völker and her collaborators [5,21,22]. In early work, Völker developed supervised classifiers from LOD, incorporated in the LeDA tool [22]. ...
Conference Paper
Full-text available
Axiom learning is an essential task in enhancing the quality of an ontology, a task that sometimes goes under the name of ontology enrichment. To overcome some limitations of recent work and to contribute to the growing library of ontology learning algorithms, we propose an evolutionary approach to automatically discover axioms from the abundant RDF data resource of the Semantic Web. We describe a method applying an instance of an Evolutionary Algorithm, namely Grammatical Evolution, to the acquisition of OWL class disjointness axioms, one important type of OWL axioms which makes it possible to detect logical inconsistencies and infer implicit information from a knowledge base. The proposed method uses an axiom scoring function based on possibility theory and is evaluated against a Gold Standard, manually constructed by knowledge engineers. Experimental results show that the given method possesses high accuracy and good coverage.
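As a rough illustration of the possibility-based scoring mentioned in this abstract, the sketch below evaluates a candidate disjointness axiom against instance data. The formula mirrors the general shape used in this line of work, but the exact definitions in the cited papers may differ, so treat it as an assumption-laden approximation:

import math

def possibility_score(applicable, counterexamples):
    # applicable: individuals providing evidence about the axiom (here,
    # members of either class); counterexamples: individuals typed with both.
    # Possibility degrades quickly as counterexamples accumulate.
    if applicable == 0:
        return 1.0  # no evidence either way: fully possible
    ratio = (applicable - counterexamples) / applicable
    return 1.0 - math.sqrt(1.0 - ratio ** 2)

# Candidate axiom: DisjointClasses(C, D)
c_instances = {"a", "b", "c", "d"}
d_instances = {"e", "f", "g", "c"}           # "c" is typed with both classes
applicable = len(c_instances | d_instances)  # 7
counter = len(c_instances & d_instances)     # 1
print(possibility_score(applicable, counter))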
... In particular, class disjointness axioms are useful for checking the logical consistency and detecting undesired usage patterns or incorrect assertions. As for the definition of disjointness [10], two classes are disjoint if they do not possess any common individual according to their intended interpretation, i.e., the intersection of these classes is empty in a particular KB. ...
... Recent methods [11,12] apply top-down or intensional approaches to learning disjointness, which rely on schema-level information, i.e., logical and lexical descriptions of the classes. The contributions based on bottom-up or extensional approaches [9,10], on the other hand, require the instances in the dataset to induce instance-driven patterns that suggest axioms, e.g., class disjointness axioms. ...
... The most prominent related work on learning disjointness axioms consists of the contributions by Johanna Völker and her collaborators [12,10,13]. In early work, Völker developed supervised classifiers from LOD, incorporated in the LeDA tool [12]. ...
Chapter
Full-text available
Today, with the development of the Semantic Web, Linked Open Data (LOD), expressed using the Resource Description Framework (RDF), has reached the status of "big data" and can be considered a giant data resource from which knowledge can be discovered. The process of learning knowledge defined in terms of OWL 2 axioms from RDF datasets can be viewed as a special case of knowledge discovery from data or "data mining", which can be called "RDF mining". The approaches to automated generation of axioms from recorded RDF facts on the Web may be regarded as a case of inductive reasoning and ontology learning. The instances, represented by RDF triples, play the role of specific observations, from which axioms can be extracted by generalization. Based on the insight that discovering new knowledge is essentially an evolutionary process, whereby hypotheses are generated by some heuristic mechanism and then tested against the available evidence, so that only the best hypotheses survive, we propose the use of Grammatical Evolution, one type of evolutionary algorithm, for mining OWL 2 disjointness axioms from an RDF data repository such as DBpedia. For the evaluation of candidate axioms against the DBpedia dataset, we adopt an approach based on possibility theory.
... For this reason, semi-automated labeling of disjoint classes could be advantageous. Recent approaches [2,3,4] propose supervised and unsupervised models that use various features to learn disjointness axioms. However, the generalizability of these methods is limited to their specific datasets, and they cannot be applied at large scale. ...
... This method employs a hierarchical conceptual clustering technique capable of providing intensional cluster descriptions and utilizes a novel form of semi-distances over individuals in an ontological knowledge base, incorporating available background knowledge. In the supervised category, Völker et al. [2,3] gather syntactic and semantic evidence, such as positive and negative association rules as well as correlation coefficients, from various sources to establish a strong foundation for learning disjointness. However, their work exploits background knowledge and reasoning only to a limited extent. ...
Preprint
Full-text available
Ontologies often lack explicit disjointness declarations between classes, despite their usefulness for sophisticated reasoning and consistency checking in Knowledge Graphs. In this study, we explore the potential of Large Language Models (LLMs) to enrich ontologies by identifying and asserting class disjointness axioms. Our approach aims at leveraging the implicit knowledge embedded in LLMs, using prompt engineering to elicit this knowledge for classifying ontological disjointness. We validate our methodology on the DBpedia ontology, focusing on open-source LLMs. Our findings suggest that LLMs, when guided by effective prompt strategies, can reliably identify disjoint class relationships, thus streamlining the process of ontology completion without extensive manual input. For comprehensive disjointness enrichment, we propose a process that takes logical relationships between disjointness and subclass statements into account in order to maintain satisfiability and reduce the number of calls to the LLM. This work provides a foundation for future applications of LLMs in automated ontology enhancement and offers insights into optimizing LLM performance through strategic prompt design. Our code is publicly available on GitHub at https://github.com/n28div/llm-disjointness.
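A minimal sketch of the kind of prompt-based classification this abstract describes; the prompt wording, the llm_complete helper and the propagation rule are illustrative assumptions, not the authors' actual setup:

PROMPT_TEMPLATE = """You are an ontology engineer. Two OWL classes are disjoint
if no individual can be an instance of both at the same time.
Are the classes '{a}' and '{b}' disjoint? Answer only YES or NO."""

def classify_disjointness(class_a, class_b, llm_complete):
    # llm_complete is a placeholder for any chat-completion client call
    answer = llm_complete(PROMPT_TEMPLATE.format(a=class_a, b=class_b))
    return answer.strip().upper().startswith("YES")

# One plausible way to realise the call reduction mentioned above: if two
# superclasses are already known to be disjoint, the subclass pair inherits
# disjointness and needs no LLM call. superclasses[x] is assumed to contain
# x itself as well as all of its superclasses.
def needs_llm_call(a, b, known_disjoint, superclasses):
    return not any((sa, sb) in known_disjoint or (sb, sa) in known_disjoint
                   for sa in superclasses[a] for sb in superclasses[b])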
... Among systems mining specific types of axioms, disjointness axioms are a popular target; for example, the disjointness axiom DomesticAirport ⊓ InternationalAirport ≡ ⊥ states that the intersection of the two classes is equivalent to the empty class, or in simpler terms, no node can simultaneously be of type Domestic Airport and International Airport. The system proposed by Völker et al. [540] extracts disjointness axioms based on (negative) association rule mining [5], which finds pairs of classes where each has many instances in the knowledge graph but there are relatively few (or no) instances of both classes. Töpper et al. [524] instead extract disjointness axioms for pairs of classes whose cosine similarity is below a fixed threshold. ...
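The cosine-similarity heuristic attributed to Töpper et al. can be sketched as follows; the feature construction and the threshold value are assumptions for illustration (the original work derives class vectors from the properties used by their instances):

import math
from collections import Counter

def class_vector(instances, properties_of):
    # Aggregate, over all instances of a class, how often each property occurs
    vec = Counter()
    for ind in instances:
        vec.update(properties_of.get(ind, []))
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def disjoint_candidate(c_insts, d_insts, properties_of, threshold=0.1):
    # Classes whose property-usage profiles barely overlap are proposed
    # as disjoint; 0.1 is an arbitrary illustrative threshold
    return cosine(class_vector(c_insts, properties_of),
                  class_vector(d_insts, properties_of)) < threshold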
... Textual definitions can also be harvested from large texts to extract hypernym relations and induce a taxonomy from scratch [534]. More recent works aim to extract more expressive axioms from text, including disjointness axioms [540]; and axioms involving the union and intersection of classes, along with existential, universal, and qualified-cardinality restrictions [412]. The results of an ontology learning process can then serve as input to a more general ontology engineering methodology, allowing us to validate the terminological coverage of an ontology, to identify new classes and axioms, etc. ...
... The growth of average fitness over generations of our approach in discovering class disjointness axioms (Precision = 0.95 ± 0.02). The recall value is higher than that of GoldMiner [VFS15]. In addition, a number of the discovered class disjointness axioms are absent from the results of GoldMiner. ...
... However, the learning algorithms need a set of labeled training data, which may demand expensive work by domain experts. In contrast to LeDA, statistical schema induction via association rule mining [VFS15] was given in the tool GoldMiner, where association rules are representations of implicit patterns extracted from large amounts of data and no training data is required. Association rules are compiled based on a statistical analysis of a transaction table, which is built from the results of SPARQL queries. That research only focused ...
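A rough sketch of that pipeline, querying a public SPARQL endpoint with the SPARQLWrapper library; the endpoint, query shape and the simple negative-rule score are assumptions about one plausible realization, not GoldMiner's actual code:

from collections import defaultdict
from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

def fetch_transactions(endpoint="https://dbpedia.org/sparql", limit=10000):
    # One "transaction" per individual: the set of classes it is typed with
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(f"""
        SELECT ?s ?c WHERE {{ ?s a ?c .
            FILTER(STRSTARTS(STR(?c), "http://dbpedia.org/ontology/")) }}
        LIMIT {limit}""")
    sparql.setReturnFormat(JSON)
    transactions = defaultdict(set)
    for row in sparql.query().convert()["results"]["bindings"]:
        transactions[row["s"]["value"]].add(row["c"]["value"])
    return transactions

def negative_rule_confidence(transactions, c, d):
    # Confidence of the negative rule c -> NOT d; values near 1.0 suggest
    # a candidate disjointness axiom between c and d
    with_c = [t for t in transactions.values() if c in t]
    if not with_c:
        return None
    return sum(1 for t in with_c if d not in t) / len(with_c)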
Thesis
Full-text available
In the Semantic Web era, Linked Open Data (LOD) is its most successful implementation, currently containing billions of RDF (Resource Description Framework) triples derived from multiple, distributed, heterogeneous sources. The role of a general semantic schema, represented as an ontology, is essential to ensure correctness and consistency in LOD and to make it possible to infer implicit knowledge by reasoning. The growth of LOD creates an opportunity for discovering ontological knowledge from the raw RDF data itself to enrich relevant knowledge bases. In this work, we aim at discovering schema-level knowledge in the form of axioms encoded in OWL (Web Ontology Language) from RDF data. The approaches to automated generation of axioms from recorded RDF facts on the Web may be regarded as a case of inductive reasoning and ontology learning. The instances, represented by RDF triples, play the role of specific observations, from which axioms can be extracted by generalization. Based on the insight that discovering new knowledge is essentially an evolutionary process, whereby hypotheses are generated by some heuristic mechanism and then tested against the available evidence, so that only the best hypotheses survive, we propose a model applying Grammatical Evolution, one type of evolutionary algorithm, to mine OWL axioms from an RDF data repository. In addition, we specialize the model for the specific problem of learning OWL class disjointness axioms, along with experiments performed on DBpedia, one of the prominent examples of LOD. Furthermore, we use different axiom scoring functions based on possibility theory, which are well suited to the open world assumption scenario of LOD, to evaluate the quality of discovered axioms. Specifically, we propose a set of measures to build objective functions based on single-objective and multi-objective models, respectively. Finally, to validate the approach, its performance is evaluated against subjective and objective benchmarks and compared to the main state-of-the-art systems.
... Among systems mining specific types of axioms, disjointness axioms are a popular target; for example, the disjointness axiom DomesticAirport ⊓ InternationalAirport ≡ ⊥ states that the intersection of the two classes is equivalent to the empty class, or in simpler terms, no node can simultaneously be of type Domestic Airport and International Airport. The system proposed by Völker et al. [540] extracts disjointness axioms based on (negative) association rule mining [5], which finds pairs of classes where each has many instances in the knowledge graph but there are relatively few (or no) instances of both classes. Töpper et al. [524] instead extract disjointness axioms for pairs of classes whose cosine similarity is below a fixed threshold. ...
... Textual definitions can also be harvested from large texts to extract hypernym relations and induce a taxonomy from scratch [534]. More recent works aim to extract more expressive axioms from text, including disjointness axioms [540]; and axioms involving the union and intersection of classes, along with existential, universal, and qualified-cardinality restrictions [412]. The results of an ontology learning process can then serve as input to a more general ontology engineering methodology, allowing us to validate the terminological coverage of an ontology, to identify new classes and axioms, etc. ...
Article
Full-text available
In this article, we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data models, as well as languages used to query and validate knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We conclude with high-level future research directions for knowledge graphs.
... Among systems mining specific types of axioms, disjointness axioms are a popular target; for example, the disjointness axiom DomesticAirport ⊓ InternationalAirport ≡ ⊥ states that the intersection of the two classes is equivalent to the empty class, or in simpler terms, no node can simultaneously be of type Domestic Airport and International Airport. The system proposed by [512] extracts disjointness axioms based on (negative) association rule mining [4], which finds pairs of classes where each has many instances in the knowledge graph but there are relatively few (or no) instances of both classes. Töpper et al. [497] instead extract disjointness axioms for pairs of classes whose cosine similarity is below a fixed threshold. ...
... Textual definitions can also be harvested from large texts to extract hypernym relations and induce a taxonomy from scratch [506]. More recent works aim to extract more expressive axioms from text, including disjointness axioms [512]; and axioms involving the union and intersection of classes, along with existential, universal, and qualified-cardinality restrictions [387]. The results of an ontology learning process can then serve as input to a more general ontology engineering methodology, allowing us to validate the terminological coverage of an ontology, to identify new classes and axioms, etc. ...
Article
Full-text available
In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After a general introduction, we motivate and contrast various graph-based data models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment, refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques. We conclude with high-level future research directions for knowledge graphs.
... Among systems mining specific types of axioms, disjointness axioms are a popular target; for example, the disjointness axiom DomesticAirport ⊓ InternationalAirport ≡ ⊥ states that the intersection of the two classes is equivalent to the empty class, or in simpler terms, no node can simultaneously be of type Domestic Airport and International Airport. The system proposed by [512] extracts disjointness axioms based on (negative) association rule mining [4], which finds pairs of classes where each has many instances in the knowledge graph but there are relatively few (or no) instances of both classes. Töpper et al. [497] instead extract disjointness axioms for pairs of classes whose cosine similarity is below a fixed threshold. ...
... Textual definitions can also be harvested from large texts to extract hypernym relations and induce a taxonomy from scratch [506]. More recent works aim to extract more expressive axioms from text, including disjointness axioms [512]; and axioms involving the union and intersection of classes, along with existential, universal, and qualified-cardinality restrictions [387]. The results of an ontology learning process can then serve as input to a more general ontology engineering methodology, allowing us to validate the terminological coverage of an ontology, to identify new classes and axioms, etc. ...
Preprint
Full-text available
In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After a general introduction, we motivate and contrast various graph-based data models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment, refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques. We conclude with high-level future research directions for knowledge graphs.
... For ARM-based approaches, the first was given in [21], which was extended by [5] and [6] to learn disjointness and property axioms separately by defining various association rule patterns. It was also extended by [20] to generate negative association rules for learning disjointness. In addition, by modifying some technical details of [21], the work in [4] can induce not only independent domain and range restrictions but also coupled ones. ...
... They used various external resources like manually constructed training examples, background ontologies, textual resources and WordNet to obtain features. This work was extended in [20] to use more logical features and external resources like Wikipedia. The work in [22] integrated the probabilistic inference capability of Bayesian Networks with the logical formalism of Description Logics. ...
Chapter
With the rapid growth of knowledge graphs, schema induction, the task of extracting relations or constraints for classes and properties from a knowledge graph, becomes increasingly critical and urgent. Schema induction plays an important role in facilitating many applications like integrating, querying and maintaining knowledge graphs. To provide a comprehensive survey of schema induction, in this paper we overview existing schema induction approaches, mainly considering their learning methods, the types of learned axioms and the external resources that may be used during the learning process. Based on this comparison, we point out the challenges and directions for schema induction.
... In the literature, sundry approaches for discovering disjointness axioms have been proposed. Recent methods apply association rule mining [18,19]. However, they can capitalize on the available intensional knowledge only to a marginal extent. ...
... Besides, methods based on relational learning [15] and formal concept analysis [3] have been proposed, but none of them specifically aimed at assessing the quality of the induced axioms. This is also pointed out in [19], and additional approaches [11,18] based on association rule mining have been introduced to better address this limitation. Their goal was to study the correlation between classes. ...
Chapter
Full-text available
Despite the benefits deriving from explicitly modeling concept disjointness to increase the quality of ontologies, the number of disjointness axioms in vocabularies for the Web of Data is still limited, thus risking leaving important constraints underspecified. Automated methods for discovering these axioms may represent a powerful modeling tool for knowledge engineers. For this purpose, we propose a machine learning solution that combines (unsupervised) distance-based clustering with a divide-and-conquer strategy. The resulting terminological cluster trees can be used to detect candidate disjointness axioms from emerging concept descriptions. A comparative empirical evaluation on different types of ontologies shows the feasibility and effectiveness of the proposed solution, which may be regarded as complementary to current methods that require supervision or consider atomic concepts only.
... It is thus difficult to compare it to benchmarks targeting the current formalisation task. Nevertheless, when we go through the literature, we find experimental set-ups for tasks in both the fields of NL processing and the Semantic Web, like Question Answering (QA) [1], Named Entity Recognition and Disambiguation (NERD) [11] and acquisition of class disjointness [13]. ...
... In acquisition of class disjointness, ontology designers do not always agree on whether to assert two classes as disjoint. Völker and colleagues [13] thus face cases of ambiguity in the set of correct answers. To handle ambiguities, they calculate the F1-measure in two main cases: for the subset of the gold standard (i) where all the ontology designers agree and (ii) where at least 50% of the designers agree. ...
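A small sketch of that evaluation scheme; the data layout and helper names are illustrative assumptions. The gold standard is restricted to pairs meeting an agreement threshold, and predictions are scored only on that subset:

def gold_subset(annotations, min_agreement):
    # annotations: {pair: [True/False disjointness vote per annotator]}
    gold = {}
    for pair, votes in annotations.items():
        share = votes.count(True) / len(votes)
        if share >= min_agreement or (1 - share) >= min_agreement:
            gold[pair] = share >= 0.5  # majority label
    return gold

def f1_on_subset(predictions, gold):
    tp = sum(1 for p, v in gold.items() if v and predictions.get(p))
    fp = sum(1 for p, v in gold.items() if not v and predictions.get(p))
    fn = sum(1 for p, v in gold.items() if v and not predictions.get(p))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# F1 where all annotators agree vs. where at least 50% agree:
# f1_on_subset(preds, gold_subset(votes, 1.0))
# f1_on_subset(preds, gold_subset(votes, 0.5))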
Article
Full-text available
In this paper we present BEAUFORD, a benchmark for methods which aim to provide formal expressions of concepts from the natural language (NL) definitions of these concepts. Adding formal expressions of concepts to a given ontology allows reasoners to infer more useful pieces of information or to detect inconsistencies in the ontology. To the best of our knowledge, BEAUFORD is the first benchmark to tackle this ontology enrichment problem. BEAUFORD allows the breaking down of a given formalisation approach by identifying its key features. In addition, BEAUFORD provides strong mechanisms to evaluate an approach efficiently even in the case of ambiguity, a major challenge in the formalisation of NL resources. Indeed, BEAUFORD takes into account the fact that a given NL phrase can be formalised in many ways. Hence, it proposes a suitable specification to represent these multiple formalisations. Taking advantage of this specification, BEAUFORD redefines classical precision and recall and introduces other metrics to take into account the fact that there is not one unique way to formalise a definition. Finally, BEAUFORD comprises a well-suited dataset to judge concretely the efficiency of formalisation methods. Using BEAUFORD, current approaches to the formalisation of definitions can be compared accurately using a suitable gold standard.
... Potoniec et al. considered mining subclass axioms with a fixed superclass and an arbitrary OWL 2 EL class expression as a subclass [11]. Völker et al. proposed approaches to automatically discover class disjointness axioms by using association rule mining and by posing the problem as a classification task [12]. Potoniec and Ławrynowicz presented an approach to split an existing class into a set of subclasses, each equipped with a formal definition through an EquivalentTo axiom. ...
... For example, Potoniec et al. proposed Swift Linked Data Miner, a method for mining class expressions serving as super-classes in class inclusion axioms [4]. Völker et al. proposed algorithms for mining class hierarchies [5] and class disjointness [6] using association rule mining. Li and Sima developed a method for parallel learning of an OWL 2 EL ontology from a large Linked Data repository [7]. ...
Article
Full-text available
We present an algorithm to inductively learn Web Ontology Language (OWL) 2 property chains to be used in object subproperty axioms. For efficiency, it uses specialized encodings and data structures based on hash-maps and sparse matrices. The algorithm is based on frequent pattern search principles and uses a novel measure called s-support. We prove soundness and termination of the algorithm, and report on an evaluation in which we mine axioms from DBpedia 2016-10. We extensively discuss the 36 mined axioms and conclude that 30 (83%) of them are correct and could be added to the ontology.
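To illustrate the sparse-matrix idea, here is a rough sketch, not the paper's algorithm; the score below is plain rule confidence, not the s-support measure the paper defines. It tests a candidate chain axiom p ∘ q ⊑ r with sparse adjacency matrices:

import numpy as np
from scipy.sparse import csr_matrix

def adjacency(triples, prop, index, n):
    # 0/1 adjacency matrix of one property over n entities
    pairs = [(index[s], index[o]) for s, p, o in triples if p == prop]
    rows = [r for r, _ in pairs]
    cols = [c for _, c in pairs]
    return csr_matrix((np.ones(len(pairs), dtype=np.int32), (rows, cols)),
                      shape=(n, n))

def chain_confidence(triples, p, q, r, entities):
    # Candidate axiom: the chain p followed by q is a subproperty of r
    index = {e: i for i, e in enumerate(entities)}
    n = len(entities)
    P, Q, R = (adjacency(triples, x, index, n) for x in (p, q, r))
    composed = (P @ Q) > 0            # entity pairs linked by p then q
    hits = composed.multiply(R > 0)   # ... that are also linked directly by r
    return hits.nnz / composed.nnz if composed.nnz else 0.0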
... Methods for learning disjointness axioms aim to discover axioms that are overlooked during the modelling process, which leads to misrepresenting the negative knowledge of the target domain. Indicative methods that tackle this problem are proposed in other works [28,29]. These methods study the correlation between classes by means of association rules, negative association rules, and the correlation coefficient. ...
Article
Full-text available
Remarkable progress in research has shown the efficiency of Knowledge Graphs (KGs) in extracting valuable external knowledge in various domains. A Knowledge Graph (KG) can illustrate high-order relations that connect two objects with one or multiple related attributes. The emerging Graph Neural Networks (GNN) can extract both object characteristics and relations from KGs. This paper presents how Machine Learning (ML) meets the Semantic Web and how KGs are related to Neural Networks and Deep Learning. The paper also highlights important aspects of this area of research, discussing open issues such as the bias hidden in KGs at different levels of graph representation.
... It may be the case that, during learning, the negative examples point out that they have some difference. This also happens when the sets of examples of two concepts are disjoint (Völker et al., 2015). ...
Article
An ontology formalises a number of dependent and related concepts in a domain, encapsulated as a terminology. Manually defining such terminologies is a complex, time-consuming and error-prone task. Thus, there is great interest in strategies to learn terminologies automatically. However, most of the existing approaches induce a single concept definition at a time, disregarding dependencies that may exist among the concepts. As a consequence, terminologies that are difficult to interpret may be induced. Thus, systems capable of learning all concepts within a single task, respecting their dependencies, are essential for reaching concise and readable ontologies. In this paper, we tackle this issue by presenting three terminology learning strategies that aim at finding dependencies among concepts before, during or after they have been defined. Experimental results show the advantages of taking the dependencies among concepts into account to achieve readable and concise terminologies, compared to a system that learns a single concept at a time. Moreover, the three strategies are compared and analysed to discuss the strong and weak points of each one.
... Using this technique, e.g., DeputyDirector ⊑ CivilServicePost and MinisterialDepartment ⊑ Department were extracted from data.gov.uk [48,14,47] (see also [43] for expressive DLs with fixed length). ...
Preprint
Full-text available
The quest for acquiring a formal representation of the knowledge of a domain of interest has attracted researchers with various backgrounds into a diverse field called ontology learning. We highlight classical machine learning and data mining approaches that have been proposed for (semi-)automating the creation of description logic (DL) ontologies. These are based on association rule mining, formal concept analysis, inductive logic programming, computational learning theory, and neural networks. We provide an overview of each approach and how it has been adapted for dealing with DL ontologies. Finally, we discuss the benefits and limitations of each of them for learning DL ontologies.
... The support is a metric for measuring statistical significance, while confidence measures the 'strength' of a rule, in this case expressed as a CI (concept inclusion) in an ontology language. Many authors have already employed this method for building DL ontologies [19,48,49] (see also [42]) and for finding relational rules in knowledge graphs [22]. The usual approach is to fix the depth of the CIs in order to restrict the search space. ...
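In instance-data terms, support and confidence for a candidate concept inclusion C ⊑ D can be sketched as follows; this is a simplification under a closed-world reading, with illustrative names and data:

def support_confidence(c_instances, d_instances, all_individuals):
    # Candidate concept inclusion: C ⊑ D, read as the rule C -> D
    both = c_instances & d_instances
    support = len(both) / len(all_individuals)   # statistical significance
    confidence = len(both) / len(c_instances)    # "strength" of the rule
    return support, confidence

c = {"a", "b", "c"}
d = {"a", "b", "c", "e"}
print(support_confidence(c, d, {"a", "b", "c", "d", "e"}))  # (0.6, 1.0)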
Preprint
Full-text available
Ontologies are a popular way of representing domain knowledge, in particular, knowledge in domains related to life sciences. (Semi-)automating the process of building an ontology has attracted researchers from different communities into a field called "Ontology Learning". We provide a formal specification of the exact and the probably approximately correct learning models from computational learning theory. Then, we recall from the literature complexity results for learning lightweight description logic (DL) ontologies in these models. Finally, we highlight other approaches proposed in the literature for learning DL ontologies.
... Völker et al. [35] proposed an approach to discover disjointness axioms, i.e., axioms stating that objects of one class cannot belong to the other class and vice versa. Their approach is based on association rule mining and is able to incorporate additional information, such as lexical information from the labels of the considered instances. ...
Article
Full-text available
We present an approach to mine cardinality restriction axioms from an existing knowledge graph, in order to extend an ontology describing the graph. We compare frequency estimation with kernel density estimation as approaches to obtain the cardinalities in restrictions. We also propose numerous strategies for filtering the obtained axioms in order to make them more accessible to the ontology engineer. We report the results of an experimental evaluation on DBpedia 2016-10 and show that using kernel density estimation to compute the cardinalities in cardinality restrictions yields more robust results than using frequency estimation. We also show that while filtering is of limited usability for minimum cardinality restrictions, it is much more important for maximum cardinality restrictions. The presented findings can be used to extend existing ontology engineering tools in order to support ontology construction and enable more efficient creation of knowledge-intensive artificial intelligence systems.
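A toy contrast of the two estimators for a maximum-cardinality restriction, using scipy's gaussian_kde; the data and the rule for turning the density estimate into a cardinality bound are illustrative assumptions, not the paper's method:

import numpy as np
from scipy.stats import gaussian_kde

# Observed number of values of some property per subject in the graph
counts = np.array([1, 1, 1, 2, 2, 2, 2, 3, 3, 17])  # 17 looks like a data error

# Frequency estimation takes the raw maximum at face value
freq_bound = int(counts.max())  # 17, fully driven by the outlier

# KDE smooths the counts; take the smallest bound covering 90% of the
# estimated mass (the 0.9 cutoff is an arbitrary illustrative choice)
kde = gaussian_kde(counts)
kde_bound = next(n for n in range(1, freq_bound + 1)
                 if kde.integrate_box_1d(-np.inf, n + 0.5) >= 0.9)

print(freq_bound, kde_bound)  # the KDE bound largely discounts the outlier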
... The support is a metric for measuring statistical significance, while confidence measures the strength of a rule, in this case expressed as a CI (concept inclusion) in an ontology language. Many authors have already employed this method for building DL ontologies [19,48,49] (see also [42]) and for finding relational rules in knowledge graphs [22]. The usual approach is to fix the depth of the CIs in order to restrict the search space. ...
Chapter
Ontologies are a popular way of representing domain knowledge, in particular, knowledge in domains related to life sciences. (Semi-) automating the process of building an ontology has attracted researchers from different communities into a field called “Ontology Learning”. We provide a formal specification of the exact and the probably approximately correct learning models from computational learning theory. Then, we recall from the literature complexity results for learning lightweight description logic (DL) ontologies in these models. Finally, we highlight other approaches proposed in the literature for learning DL ontologies.
... Learnability of EL ontologies from finite interpretations has also been investigated (Klarman and Britz 2015). Association rule mining has been used to learn DL ontologies (with concept expressions of limited depth) (Sazonau and Sattler 2017; Völker and Niepert 2011; Fleischhacker, Völker, and Stuckenschmidt 2012; Völker, Fleischhacker, and Stuckenschmidt 2015). ...
Article
We investigate the complexity of learning query inseparable ℰℒℋ ontologies in a variant of Angluin's exact learning model. Given a fixed data instance A* and a query language 𝒬, we are interested in computing an ontology ℋ that entails the same queries as a target ontology 𝒯 on A*, that is, ℋ and 𝒯 are inseparable w.r.t. A* and 𝒬. The learner is allowed to pose two kinds of questions. The first is 'Does (𝒯,A) ⊨ q?', with A an arbitrary data instance and q a query in 𝒬, which an oracle answers with 'yes' or 'no'. In the second, the learner asks 'Are ℋ and 𝒯 inseparable w.r.t. A* and 𝒬?'. If so, the learning process finishes; otherwise, the learner receives (A*,q) with q ∈ 𝒬, (𝒯,A*) ⊨ q and (ℋ,A*) ⊭ q (or vice-versa). Then, we analyse conditions under which query inseparability is preserved if A* changes. Finally, we consider the PAC learning model and a setting where the algorithms learn from a batch of classified data, limiting interactions with the oracles.
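Schematically, the interaction protocol described in this abstract looks as follows; the oracle interface and the refine step are placeholders, not an implementation of the paper's algorithms:

def exact_learning(oracle, hypothesis, refine):
    # Loop of the exact-learning protocol: ask inseparability queries until
    # the hypothesis H answers all queries in Q like the target T does on A*.
    while True:
        verdict = oracle.inseparable(hypothesis)  # 'Are H and T inseparable?'
        if verdict is True:
            return hypothesis
        data_instance, query = verdict            # counterexample (A*, q)
        # membership queries ('does (T, A) entail q?') may guide the repair
        hypothesis = refine(hypothesis, data_instance, query,
                            ask=oracle.entails)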
... Using this technique, e.g., DeputyDirector ⊑ CivilServicePost and MinisterialDepartment ⊑ Department were extracted from data.gov.uk [13,45,46] (see also [41] for expressive DLs with fixed length). ...
Article
Full-text available
The quest for acquiring a formal representation of the knowledge of a domain of interest has attracted researchers with various backgrounds into a diverse field called ontology learning. We highlight classical machine learning and data mining approaches that have been proposed for (semi-)automating the creation of description logic (DL) ontologies. These are based on association rule mining, formal concept analysis, inductive logic programming, computational learning theory, and neural networks. We provide an overview of each approach and how it has been adapted for dealing with DL ontologies. Finally, we discuss the benefits and limitations of each of them for learning DL ontologies.
... There are notable examples showcasing the influence of neural approaches to knowledge acquisition and representation learning on the broad area of Semantic Web technologies. These include, among others, ontology learning [40,49,65], learning structured query languages from natural language [69], ontology alignment [20,28,35,52], ontology annotation [15,58], joined relational and multi-modal knowledge representations [62], and relation prediction [1,59]. Ontologies, on the other hand, have been repeatedly utilized as background knowledge for machine learning tasks. ...
... It then uses the ARM-based method from the transaction table to extract some of the associated conceptual relationships. In their follow-up work, Fleischhacker and Völker [23] used negative-association rule-extracting techniques to study the concept of non-conceptual relations, and rich experimental results are given in Ref. [24]. Fig. 1 describes the approach to build a cybersecurity knowledge base. ...
Article
Full-text available
Cyberattack forms are complex and varied, and the detection and prediction of dynamic attack types are always challenging tasks. Research on knowledge graphs is becoming increasingly mature in many fields. Notably, certain scholars have combined the concept of the knowledge graph with cybersecurity in order to construct a cybersecurity knowledge base. This paper presents a cybersecurity knowledge base and deduction rules based on a quintuple model. Using machine learning, we extract entities and build an ontology to obtain a cybersecurity knowledge base. New rules are then deduced by calculating formulas and using the path-ranking algorithm. The Stanford named entity recognizer (NER) is also used to train an extractor to extract useful information. Experimental results show that the Stanford NER provides many features and that the useGazettes parameter may be used to train a recognizer in the cybersecurity domain in preparation for future work.
... The learned axioms mostly include disjointness axioms between concepts of the DBpedia ontology and have ALCH expressiveness. Völker et al. (2015) published a high-quality gold standard for class disjointness of DBpedia concepts. Fleischhacker et al. (2013) used this ontology to construct a dataset of 11 ontologies named O₀, O₁, …, O₁₀. ...
Article
Due to modeling errors made when designing ontologies, an ontology may carry incorrect information. Ontology debugging can be helpful in detecting errors in ontologies that are growing in size and expressiveness day by day. While current ontology debugging methods can detect logical errors (incoherences and inconsistencies), they are incapable of detecting hidden modeling errors in coherent and consistent ontologies. From the logical perspective, there are no errors in such ontologies, but this study shows that some modeling errors may not break the coherency of the ontology because they do not participate in any contradiction. In this paper, contextual knowledge is exploited to detect such hidden errors. Our experiments show that adding general ontologies like DBpedia as contextual knowledge in the ontology debugging process results in detecting contradictions in ontologies that are coherent.
... This limitation has also been pointed out in [6], where an approach based on association rule mining has been introduced. The method reported in [3,20] also relies on association rules. Specifically, three distinct approaches that aim at studying the correlation between classes have been considered comparatively: association rules, negative association rules and the correlation coefficient. ...
Chapter
Despite the benefits deriving from explicitly stating concepts as disjoint to model high-quality ontologies, the number of disjointness axioms in ontologies adopted as vocabularies for the Web of Data is limited. As a result, while the limited expressiveness fosters their use, these vocabularies fail to specify important constraints. Therefore, devising automated methods for discovering these axioms may represent a powerful tool for supporting a knowledge engineer. In this perspective, the paper proposes a machine learning solution that combines distance-based clustering and the divide-and-conquer strategy employed to grow terminological decision trees. The resulting model, called a terminological cluster tree, allows an engineer to derive candidate concept descriptions to be stated as disjoint. An empirical evaluation on different types of ontologies suggests the feasibility and usefulness of the proposed solution.
Chapter
Recent developments in the context of semantic technologies have given rise to ontologies for modelling scientific information in various fields of science. Over the past years, we have been engaged in the development of the Science Knowledge Graph Ontologies (SKGO), a set of ontologies for modelling research findings in various fields of science. This paper introduces the Modern Science Ontology (ModSci), an upper ontology for modelling relationships between modern science branches and related entities, including scientific discoveries, phenomena, prominent scientists, instruments, etc. ModSci provides a unifying framework for the various domain ontologies that make up the Science Knowledge Graph Ontology suite. Well-known ontology development guidelines and principles have been followed in the development and publication of the resource. We present several use cases and motivational scenarios to express the motivation behind developing the ontology and, therefore, its potential uses. We deem that within the next few years, a science knowledge graph is likely to become a crucial component for organizing and exploring scientific work. Keywords: Ontology Engineering, Knowledge Representation, Taxonomy, Modern Science, Hierarchical Classification
Chapter
The Semantic Web (SW) is characterized by the availability of a vast amount of semantically annotated data collections. Annotations are provided by exploiting ontologies acting as shared vocabularies. Additionally, ontologies are endowed with deductive reasoning capabilities, which allow making explicit the knowledge that is formalized implicitly. Over the years, a large number of data collections have been developed and interconnected, as testified by the Linked Open Data Cloud. Currently, seminal examples are represented by the numerous Knowledge Graphs (KGs) that have been built, either as enterprise KGs or as open KGs, which are freely available. All of them are characterized by very large data volumes, but also by incompleteness and noise. These characteristics have made the exploitation of deductive reasoning services less feasible from a practical viewpoint, opening up to alternative solutions, grounded on Machine Learning (ML), for mining knowledge from the vast amount of information available. Actually, ML methods have been exploited in the SW for solving several problems such as link and type prediction, ontology enrichment and completion (both at the terminological and the assertional level), and concept learning. Whilst initially symbol-based solutions were mostly targeted, recently numeric-based approaches have been receiving major attention because of the need to scale to very large data volumes. Nevertheless, data collections in the SW have peculiarities that can hardly be found in other fields. As such, the application of ML methods for solving the targeted problems is not straightforward. This paper extends [20] by surveying the most representative symbol-based and numeric-based solutions and related problems, with a special focus on the main issues that need to be considered and solved when ML methods are adopted in the SW field, as well as by analyzing the main peculiarities and drawbacks of each solution. Keywords: Semantic Web, Machine learning, Symbol-based methods, Numeric-based methods
Conference Paper
The huge wealth of linked data available on the Web (also known as the Web of data), organized according to the standards of the Semantic Web, can be exploited to automatically discover new knowledge, expressed in the form of axioms, one of the essential components of ontologies. In order to overcome the limitations of existing methods for axiom discovery, we propose a two-objective grammar-based genetic programming approach that casts axiom discovery as a genetic programming problem involving the two independent criteria of axiom credibility and generality. We demonstrate the power of the proposed approach by applying it to the task of discovering class disjointness axioms involving complex class expressions, a type of axiom that plays an important role in improving the quality of ontologies. We carry out experiments to determine the most appropriate parameter settings and perform an empirical comparison of the proposed method with state-of-the-art methods from the literature.
Conference Paper
Discovering disjointness axioms is a very important task in ontology learning and knowledge base enrichment. To help overcome the knowledge-acquisition bottleneck, we propose a grammar-based genetic programming method for mining OWL class disjointness axioms from the Web of data. The effectiveness of the method is evaluated by training on a sample of a large RDF dataset and testing the discovered axioms on the full dataset. First, we applied Grammatical Evolution to discover axioms based on a random sample of DBpedia, a large open knowledge graph consisting of billions of elementary assertions (RDF triples). Then, the discovered axioms were tested for accuracy on the whole of DBpedia. We carried out experiments with different parameter settings, analyzed the output results, and suggest possible extensions.
Chapter
In the context of the Semantic Web, learning implicit knowledge in terms of axioms from Linked Open Data has been the object of much current research. In this paper, we propose a method based on grammar-based genetic programming to automatically discover disjointness axioms between concepts from the Web of Data. A training-testing model is also implemented to overcome the lack of benchmarks and comparable research. The acquisition of axioms is performed on a small sample of DBpedia with the help of a Grammatical Evolution algorithm. The accuracy evaluation of mined axioms is carried out on the whole DBpedia. Experimental results show that the proposed method gives high accuracy in mining class disjointness axioms involving complex expressions.
Article
In the context of the Semantic Web regarded as a Web of Data, research efforts have been devoted to improving the quality of the ontologies that are used as vocabularies to enable complex services based on automated reasoning. Various surveys reveal that many domains would require better ontologies that include non-negligible constraints for properly conveying the intended semantics. In this respect, disjointness axioms are representative of this general problem: these axioms are essential for making the negative knowledge about the domain of interest explicit, yet they are often overlooked during the modeling process (thus affecting the efficacy of the reasoning services). To tackle this problem, automated methods for discovering these axioms can be used as a tool for supporting knowledge engineers in modeling new ontologies or evolving existing ones. The current solutions, either based on statistical correlations or relying on external corpora, often do not fully exploit the terminology. Stemming from this consideration, we have been investigating alternative methods to elicit disjointness axioms from existing ontologies based on the induction of terminological cluster trees, which are logic trees in which each node stands for a cluster of individuals that emerges as a sub-concept. The growth of such trees relies on a divide-and-conquer procedure that assigns, to the cluster representing the root node, one of the concept descriptions generated via a refinement operator and selected according to a heuristic based on the minimization of the risk of overlap between the candidate sub-clusters (quantified in terms of the distance between two prototypical individuals). Preliminary works showed some shortcomings that are tackled in this paper. To tackle the task of disjointness axiom discovery, we have extended the terminological cluster tree induction framework with various contributions: 1) the adoption of different distance measures for clustering the individuals of a knowledge base; 2) the adoption of different heuristics for selecting the most promising concept descriptions; and 3) a modified version of the refinement operator to prevent the introduction of inconsistency during the elicitation of the new axioms. A wide empirical evaluation showed the feasibility of the proposed extensions and the improvement with respect to alternative approaches.
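A highly simplified sketch of the divide-and-conquer growth described above; the refinement operator, the distance function and the concept.covers interface are stand-ins for illustration, whereas the real system works over DL concept descriptions with a reasoner:

def grow_tct(individuals, distance, refine, min_size=5):
    # individuals is a set; pick the candidate concept whose two
    # prototype-centred sub-clusters overlap least, then recurse.
    if len(individuals) < min_size:
        return {"leaf": individuals}
    best = None
    for concept in refine(individuals):      # candidate concept descriptions
        members = {i for i in individuals if concept.covers(i)}
        rest = individuals - members
        if not members or not rest:
            continue
        p1, p2 = prototype(members, distance), prototype(rest, distance)
        score = distance(p1, p2)  # farther prototypes = lower overlap risk
        if best is None or score > best[0]:
            best = (score, concept, members, rest)
    if best is None:
        return {"leaf": individuals}
    _, concept, members, rest = best
    return {"concept": concept,  # sibling subtrees yield candidate
            "yes": grow_tct(members, distance, refine, min_size),  # disjointness
            "no": grow_tct(rest, distance, refine, min_size)}      # pairs

def prototype(cluster, distance):
    # medoid: the individual minimising total distance to the others
    return min(cluster, key=lambda x: sum(distance(x, y) for y in cluster))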
Chapter
In this article, we present an original use of Redescription Mining (RM) for discovering definitions of classes and incompatibility (disjointness) axioms between classes of individuals in the Web of Data. RM is aimed at mining alternate descriptions from two datasets related to the same set of individuals. We reuse this process for providing definitions, in terms of necessary and sufficient conditions, to categories in DBpedia. Firstly, we recall the basics of redescription mining and make precise the principles of our definitional process. Then we detail experiments carried out on datasets extracted from DBpedia. Based on the output of the experiments, we discuss the strengths and the possible extensions of our approach. Keywords: Redescription mining, Linked Open Data, Definition of categories, Disjointness axioms, Formal Concept Analysis
Article
With the development of the Semantic Web, more and more semantic data, including many useful knowledge bases, has been published on the Web. Such knowledge bases always lack expressive schema information, especially disjointness axioms and subclass axioms. This makes it difficult to perform many critical Semantic Web tasks like ontology reasoning, inconsistency handling and ontology mapping. To deal with this problem, a few approaches have been proposed to generate terminology axioms. However, they often adopt the closed world assumption, which is the opposite of the assumption under which the semantic data is published. This may lead to a lot of noisy negative examples, so that existing learning approaches fail to perform well on such incomplete data. In this paper, a novel framework is proposed to automatically obtain disjointness axioms and subclass axioms from incomplete semantic data. This framework first obtains probabilistic type assertions by exploiting a type inference algorithm. Then a mining approach based on association rule mining is proposed to learn high-quality schema information. To address the incompleteness problem of semantic data, the mining model introduces novel definitions to compute the support and confidence for pruning false axioms. Our experimental evaluation shows promising results over several real-life incomplete knowledge bases like DBpedia and LUBM, compared with existing relevant approaches.
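One plausible way to adapt support and confidence to probabilistic type assertions, in the spirit of this framework; the weighting scheme is an assumption, not the paper's exact definitions:

def weighted_support_confidence(prob_types, c, d):
    # prob_types: {individual: {class: probability of membership}}
    # Candidate disjointness axiom: C ⊓ D ⊑ ⊥. Each individual is weighted
    # by the probability of its type assertions instead of counted as certain.
    pc = sum(t.get(c, 0.0) for t in prob_types.values())
    both = sum(t.get(c, 0.0) * t.get(d, 0.0) for t in prob_types.values())
    support = pc / len(prob_types) if prob_types else 0.0
    # Confidence of the negative rule C -> NOT D under the soft type weights
    confidence = 1.0 - (both / pc) if pc else 0.0
    return support, confidence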
Article
Full-text available
We study the problem of learning description logic (DL) ontologies in Angluin et al.'s framework of exact learning via queries. We admit membership queries ("is a given subsumption entailed by the target ontology?") and equivalence queries ("is a given ontology equivalent to the target ontology?"). We present three main results: (1) ontologies formulated in (two relevant versions of) the description logic DL-Lite can be learned with polynomially many queries of polynomial size; (2) this is not the case for ontologies formulated in the description logic EL, even when only acyclic ontologies are admitted; and (3) ontologies formulated in a fragment of EL related to the web ontology language OWL 2 RL can be learned in polynomial time. We also show that neither membership nor equivalence queries alone are sufficient in cases (1) and (3).
Conference Paper
Full-text available
RDF is structured, dynamic, and schemaless data, which enables a big deal of flexibility for Linked Data to be available in an open environment such as the Web. However, for RDF data, flexibility turns out to be the source of many data quality and knowledge representation issues. Tasks such as assessing data quality in RDF require a different set of techniques and tools compared to other data models. Furthermore, since the use of existing schema, ontology and constraint languages is not mandatory, there is always room for misunderstanding the structure of the data. Neglecting this problem can represent a threat to the widespread use and adoption of RDF and Linked Data. Users should be able to learn the characteristics of RDF data in order to determine its fitness for a given use case, for example. For that purpose, in this doctoral research, we propose the use of constraints to inform users about characteristics that RDF data naturally exhibits, in cases where ontologies (or any other form of explicitly given constraints or schemata) are not present or not expressive enough. We aim to address the problems of defining and discovering classes of constraints to help users in data analysis and assessment of RDF and Linked Data quality.
Conference Paper
Full-text available
Although an increasing number of RDF knowledge bases are published, many of them consist primarily of instance data and lack sophisticated schemata. Having such schemata allows more powerful querying, consistency checking and debugging, as well as improved inference. One of the reasons why schemata are still rare is the effort required to create them. In this article, we propose a semi-automatic schema construction approach addressing this problem: first, the frequency of axiom patterns in existing knowledge bases is determined; afterwards, those patterns are converted into SPARQL-based pattern detection algorithms, which can be used to enrich knowledge base schemata. We argue that this is the first scalable knowledge base enrichment approach based on real schema usage patterns. The approach is evaluated on a large set of knowledge bases with a quantitative and qualitative analysis of the results.
Article
Full-text available
In recent years the Web of Data has undergone extraordinary development: an increasing amount of Linked Data is available on the World Wide Web (WWW) and new use cases are emerging continually. However, the provided data is only valuable if it is accurate and free of contradictions. One essential part of the Web of Data is DBpedia, which covers the structured data of Wikipedia. Because it is extracted automatically from Wikipedia resources created by many different contributors, DBpedia data is often error-prone. In order to enable the detection of inconsistencies, this work focuses on enriching the DBpedia ontology by statistical methods. Taking the enriched ontology as a basis, the Wikipedia extraction process is adapted so that inconsistencies are detected during extraction. Suitable correction suggestions should encourage users to resolve existing errors and thus create a knowledge base of higher quality.
Conference Paper
Full-text available
The Semantic Web has seen a rise in the availability and usage of knowledge bases over the past years, in particular in the Linked Open Data initiative. Despite this growth, there is still a lack of knowledge bases that combine high-quality schema information with instance data adhering to that schema. Several knowledge bases consist only of schema information, while others are, to a large extent, mere collections of facts without a clear structure. The combination of rich schema and instance data would allow powerful reasoning, consistency checking, and improved querying, as well as more generic ways to interact with the underlying data. In this article, we present a lightweight method to enrich knowledge bases accessible via SPARQL endpoints with almost all types of OWL 2 axioms. This makes it possible to create schemata semi-automatically, which we evaluate and discuss using DBpedia.
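As a rough sketch of the kind of endpoint probing such a method relies on (the query, the class pair, and the use of the SPARQLWrapper client are our illustrative choices, not necessarily the authors'):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Probe a public endpoint for evidence about one candidate axiom:
# how many instances do two classes share? (Classes chosen for illustration.)
endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
endpoint.setReturnFormat(JSON)
endpoint.setQuery("""
    SELECT (COUNT(DISTINCT ?x) AS ?overlap) WHERE {
        ?x a <http://dbpedia.org/ontology/Person> .
        ?x a <http://dbpedia.org/ontology/Place> .
    }
""")

result = endpoint.query().convert()
overlap = int(result["results"]["bindings"][0]["overlap"]["value"])

# Zero (or near-zero) overlap is weak evidence for a disjointness axiom;
# a real system would normalise by class sizes and inspect many pairs.
print("shared instances:", overlap)
```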
Conference Paper
Full-text available
State-of-the-art research on automated learning of ontologies from text currently focuses on inexpressive ontologies. The acquisition of complex axioms involving logical connectives, role restrictions, and other expressive features of the Web Ontology Language OWL remains largely unexplored. In this paper, we present a method and implementation for enriching inexpressive OWL ontologies with expressive axioms which is based on a deep syntactic analysis of natural language definitions. We argue that it can serve as a core for a semi-automatic ontology engineering process supported by a methodology that integrates methods for both ontology learning and evaluation. The feasibility of our approach is demonstrated by generating complex class descriptions from Wikipedia definitions and from a fishery glossary provided by the Food and Agriculture Organization of the United Nations.
Conference Paper
Full-text available
Dealing with heterogeneous ontologies by means of semantic mappings has become an important area of research, and a number of systems for discovering mappings between ontologies have been developed. Most of these systems rely on general heuristics for finding mappings and are hence bound to fail in many situations. Consequently, automatically generated mappings often contain logical inconsistencies that hinder a sensible use of these mappings. In previous work, we presented an approach for debugging mappings between expressive ontologies that eliminates inconsistencies by means of diagnostic reasoning. A shortcoming of this method was its need for expressive class definitions. More specifically, the applicability of this method critically relies on the existence of a high-quality disjointness axiomatization. This paper deals with the application of the debugging approach to mappings between lightweight ontologies that do not contain any, or very few, disjointness axioms, as is the case for most of today's practical ontologies. After discussing different approaches to dealing with the absence of disjointness axioms, we propose the application of supervised machine learning for detecting disjointness in a fully automatic manner. We present a detailed evaluation of our approach to learning disjointness and its impact on mapping debugging. The results show that debugging automatically created mappings with the help of learned disjointness axioms significantly improves the overall quality of these mappings.
Conference Paper
Full-text available
Understanding the logical meaning of any description logic or similar formalism is difficult for most people, and OWL-DL is no exception. This paper presents the most common difficulties encountered by newcomers to the language, observed during the course of more than a dozen workshops, tutorials and modules about OWL-DL and its predecessor languages. It emphasises understanding the exact meaning of OWL expressions, proving that understanding by paraphrasing them in pedantic but explicit language. It addresses, specifically, the confusion which OWL's open world assumption presents to users accustomed to closed world systems such as databases, logic programming and frame languages. Our experience has had a major influence on formulating the requirements for a new set of user interfaces for OWL, the first of which are now available as prototypes. A summary of the guidelines and paraphrases, and examples of the new interface, are provided. The example ontologies are available online.
Conference Paper
Full-text available
Ontology learning from text aims at generating domain ontologies from textual resources by applying natural language processing and machine learning techniques. It is inherent in the ontology learning process that the acquired ontologies represent uncertain and possibly contradicting knowledge. From a logical perspective, the learned ontologies are potentially inconsistent knowledge bases that thus do not allow meaningful reasoning directly. In this paper we present an approach to generate consistent OWL ontologies from learned ontology models by taking the uncertainty of the knowledge into account. We further present evaluation results from experiments with ontologies learned from a Digital Library.
Conference Paper
Full-text available
An increasing number of applications benefit from lightweight ontologies, or, to put it differently, "a little semantics goes a long way". However, our experience indicates that more expressiveness can offer significant advantages. Introducing disjointness axioms, for instance, greatly facilitates consistency checking and the automatic evaluation of ontologies. In an extensive user study we discovered that properly modeling disjointness is a difficult and very time-consuming task. We therefore developed an approach to automatically enrich learned or manually engineered ontologies with disjointness axioms. This approach relies on several methods for obtaining syntactic and semantic evidence from different sources, which we believe provide a solid basis for learning disjointness. After thoroughly evaluating the implementation of our approach, we believe that in future ontology engineering environments the automatic discovery of disjointness axioms may help to increase the richness, quality and usefulness of any given ontology.
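The actual evidence sources and learner used in this work are described in the paper itself; purely as a hedged illustration of the supervised setup, one might train an off-the-shelf classifier on hand-labelled class pairs (the feature names, values and labels below are fabricated):

```python
from sklearn.ensemble import RandomForestClassifier

# Each row describes one class pair via heterogeneous evidence features,
# e.g. [taxonomic distance, label similarity, instance overlap ratio].
# Feature choice and all numbers are invented for illustration only.
X_train = [
    [2, 0.10, 0.00],   # e.g. Person vs. Place
    [1, 0.80, 0.65],   # e.g. Person vs. Agent
    [3, 0.05, 0.00],   # e.g. City vs. Species
    [1, 0.70, 0.40],   # e.g. Writer vs. Artist
]
y_train = [1, 0, 1, 0]  # 1 = disjoint, 0 = not disjoint

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predict disjointness for an unseen class pair.
print(clf.predict([[2, 0.15, 0.01]]))
```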
Conference Paper
Full-text available
Ontologies are the backbone of the Semantic Web, as they allow vocabulary to be shared in a semantically sound way. For ontologies specified in OWL or a related web ontology language, Description Logic reasoners can often detect logical contradictions. Unfortunately, there are two drawbacks: they lack support for debugging incoherence in ontologies, and they can only be applied to reasonably expressive ontologies (containing at least some sort of negation). In this paper, we attempt to close these gaps using a technique called pinpointing. In pinpointing we identify minimal sets of axioms which need to be removed or ignored to make an ontology coherent. We then show how pinpointing can be used for debugging web ontologies in two typical cases. More unusual is the application of pinpointing to the semantic clarification of underspecified web ontologies, which we evaluate experimentally on a number of well-known web ontologies. Our findings are encouraging: even though semantic ambiguity remains an issue, we show that pinpointing can be useful for debugging, and that it can significantly improve the quality of our semantic enrichment in a fully automatic way.
Conference Paper
Full-text available
A common assumption in most machine learning algorithms is that labeled (source) data and unlabeled (target) data are sampled from the same distribution. However, many real-world tasks violate this assumption: in temporal domains, feature distributions may vary over time; clinical studies may have sampling bias; or sufficient labeled data for the domain of interest may not exist, so labeled data from a related domain must be utilized. In such settings, knowing in which dimensions source and target data vary is extremely important for reducing the distance between domains and accurately transferring knowledge. In this paper, we present a novel method to identify variant and invariant features between two datasets. Our contribution is twofold: first, we present a novel transfer learning approach for domain adaptation, and second, we formalize the problem of finding differently distributed features as a convex optimization problem. Experimental studies on synthetic and benchmark real-world datasets show that our approach outperforms other transfer learning approaches and significantly improves prediction accuracy.
Article
Full-text available
In this paper, we introduce DL-Learner, a framework for learning in description logics and OWL. OWL is the official W3C standard ontology language for the Semantic Web. Concepts in this language can be learned for constructing and maintaining OWL ontologies or for solving problems similar to those in Inductive Logic Programming. DL-Learner includes several learning algorithms, support for different OWL formats, reasoner interfaces, and learning problems. It is a cross-platform framework implemented in Java. The framework allows easy programmatic access and provides a command line interface, a graphical interface as well as a WSDL-based web service.
Article
Full-text available
The vision of the Semantic Web is to make use of semantic representations on the largest possible scale - the Web. Large knowledge bases such as DBpedia, OpenCyc, GovTrack, and others are emerging and are freely available as Linked Data and SPARQL endpoints. Exploring and analyzing such knowledge bases is a significant hurdle for Semantic Web research and practice. As one possible direction for tackling this problem, we present an approach for obtaining complex class descriptions from objects in knowledge bases by using Machine Learning techniques. We describe how we leverage existing techniques to achieve scalability on large knowledge bases available as SPARQL endpoints or Linked Data. Our algorithms are made available in the open source DL-Learner project and can be used in real-life scenarios by Semantic Web applications.
Article
Full-text available
More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Article
Full-text available
We describe an implementation of the well-known Apriori algorithm for the induction of association rules [Agrawal et al. (1993), Agrawal et al. (1996)] that is based on the concept of a prefix tree. While the idea of using this type of data structure is not new, there are several ways to organize the nodes of such a tree, to encode the items, and to organize the transactions, which may be used to minimize the time needed to find the frequent itemsets as well as to reduce the amount of memory needed to store the counters. Consequently, our emphasis is less on concepts than on implementation issues, which, however, can make a considerable difference in applications.
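For orientation, the level-wise search that the prefix tree accelerates can be sketched in a few lines of naive Python (this deliberately omits the prefix-tree machinery and candidate pruning that are the paper's actual subject):

```python
from itertools import combinations  # not needed below, shown for extension

def apriori(transactions, min_support):
    """Level-wise frequent itemset mining (naive; no prefix tree)."""
    n = len(transactions)
    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]) for i in items
                if sum(i in t for t in transactions) / n >= min_support}
    result = set(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Support counting: the step the prefix tree is designed to speed up.
        frequent = {c for c in candidates
                    if sum(c <= t for t in transactions) / n >= min_support}
        result |= frequent
        k += 1
    return result

txns = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
print(apriori(txns, min_support=0.6))  # all singletons and pairs, no triple
```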
Conference Paper
Full-text available
Ontologies will play a pivotal role in the "Semantic Web", where they will provide a source of precisely defined terms that can be communicated across people and applications. OilEd is an ontology editor that has an easy-to-use frame interface, yet at the same time allows users to exploit the full power of an expressive web ontology language (OIL). OilEd uses reasoning to support ontology design, facilitating the development of ontologies that are both more detailed and more accurate.
Conference Paper
The Linked Data cloud grows rapidly as more and more knowledge bases become available as Linked Data. Knowledge-based applications have to rely on efficient implementations of query languages like SPARQL in order to access the information contained in large datasets such as DBpedia, Freebase or one of the many domain-specific RDF repositories. However, the retrieval of specific facts from an RDF dataset is often hindered by the lack of schema knowledge that would allow for query-time inference or the materialization of implicit facts. For example, if an RDF graph contains information about films and actors, but only Titanic starring Leonardo_DiCaprio is stated explicitly, a query for all movies Leonardo DiCaprio acted in might not yield the expected answer. Only if the two properties starring and actedIn are declared inverse by a suitable schema can the missing link between the RDF entities be derived. In this work, we present an approach to enriching the schema of any RDF dataset with property axioms by means of statistical schema induction. The scalability of our implementation, which is based on association rule mining, as well as the quality of the automatically acquired property axioms, are demonstrated by an evaluation on DBpedia.
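As a toy illustration of the underlying idea (our own simplification, reusing the abstract's starring/actedIn example, and not the paper's actual encoding), each ordered entity pair can be treated as a transaction over the properties that connect it, in either direction:

```python
from collections import defaultdict

# Hypothetical RDF triples (subject, property, object).
triples = [
    ("Titanic", "starring", "Leonardo_DiCaprio"),
    ("Leonardo_DiCaprio", "actedIn", "Titanic"),
    ("Inception", "starring", "Leonardo_DiCaprio"),
    ("Leonardo_DiCaprio", "actedIn", "Inception"),
]

# One transaction per ordered entity pair: the properties holding between
# the two entities, with "^-" marking a property read against its direction.
transactions = defaultdict(set)
for s, p, o in triples:
    transactions[(s, o)].add(p)
    transactions[(o, s)].add(p + "^-")

def confidence(antecedent, consequent):
    """Estimate P(consequent | antecedent) over the pair transactions."""
    with_a = [t for t in transactions.values() if antecedent in t]
    if not with_a:
        return 0.0
    return sum(consequent in t for t in with_a) / len(with_a)

# High confidence in both directions suggests owl:inverseOf(starring, actedIn).
print(confidence("starring", "actedIn^-"))  # 1.0 on this toy data
print(confidence("actedIn", "starring^-"))  # 1.0 on this toy data
```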
Conference Paper
Recent years have seen a dramatic growth of the Semantic Web on the data level, but unfortunately not on the schema level, which contains mostly concept hierarchies. The shortage of schemas makes Semantic Web data difficult to use in many Semantic Web applications, so schema learning from Semantic Web data becomes an increasingly pressing issue. In this paper we propose a novel schema learning approach, BelNet, which combines description logics (DLs) with Bayesian networks. In this way BelNet is able to understand and capture the semantics of the data on the one hand, and to handle incompleteness during the learning procedure on the other. The main contributions of this work are: (i) we introduce the architecture of BelNet and propose the corresponding ontology learning techniques, and (ii) we compare the experimental results of our approach with those of state-of-the-art ontology learning approaches, and provide discussion from different aspects.
Conference Paper
We consider the problem of analyzing market-basket data and present several important contributions. First, we present a new algorithm for finding large itemsets which uses fewer passes over the data than classic algorithms, yet uses fewer candidate itemsets than methods based on sampling. We investigate the idea of item reordering, which can improve the low-level efficiency of the algorithm. Second, we present a new way of generating "implication rules", which are normalized based on both the antecedent and the consequent and are truly implications (not simply a measure of co-occurrence), and we show how they produce more intuitive results than other methods. Finally, we show how different characteristics of real data, as opposed to synthetic data, can dramatically affect the performance of the system and the form of the results.
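The implication measure introduced in this line of work is commonly known as conviction, conv(A -> B) = (1 - supp(B)) / (1 - conf(A -> B)); a direct transcription of that formula (the example numbers are ours):

```python
def conviction(supp_b, conf_ab):
    """Conviction of the rule A -> B.

    supp_b:  support of the consequent B, an estimate of P(B)
    conf_ab: confidence of the rule,      an estimate of P(B | A)
    Conviction compares how often A occurs without B to what independence
    would predict; 1.0 means independence, higher means a stronger implication.
    """
    if conf_ab >= 1.0:
        return float("inf")  # A is never seen without B: a perfect implication
    return (1.0 - supp_b) / (1.0 - conf_ab)

# Unlike plain confidence, conviction penalises rules whose consequent is
# frequent anyway: a 90%-confident rule for an 89%-frequent consequent is
# barely better than chance.
print(conviction(supp_b=0.89, conf_ab=0.90))  # 1.1, weak
print(conviction(supp_b=0.30, conf_ab=0.90))  # 7.0, strong
```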
Article
In this paper we present Text2Onto, a framework for ontology learning from textual resources. Three main features distinguish Text2Onto from our earlier framework TextToOnto as well as other state-of-the-art ontology learning frameworks. First, by representing the learned knowledge at a meta-level in the form of instantiated modeling primitives within a so-called Probabilistic Ontology Model (POM), we remain independent of a concrete target language while being able to translate the instantiated primitives into any (reasonably expressive) knowledge representation formalism. Second, user interaction is a core aspect of Text2Onto, and the fact that the system calculates a confidence for each learned object allows the design of sophisticated visualizations of the POM. Third, by incorporating strategies for data-driven change discovery, we avoid processing the whole corpus from scratch each time it changes, instead selectively updating the POM according to the corpus changes. Besides increasing efficiency in this way, it also allows a user to trace the evolution of the ontology with respect to the changes in the underlying corpus.
Chapter
The problem of finding all approximate occurrences P' of a pattern string P in a text string T, such that the edit distance between P and P' is at most k, is considered. We concentrate on a scheme in which T is first preprocessed to make subsequent searches with different patterns fast. Two preprocessing methods and the corresponding search algorithms are described. The first is based on suffix automata and is applicable to edit distances with general edit operation costs. The second is specially designed for the unit-cost edit distance and is based on q-gram lists. In both cases the preprocessing needs time and space O(|T|). The search algorithms run in the worst case in time O(|P||T|) or O(k|T|), and in the best case in time O(|P|).
Article
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data-labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.
Conference Paper
In this paper we present i.com, a tool for intelligent conceptual modelling. i.com allows for the specification of multiple EER diagrams and inter- and intra-schema constraints. Complete logical reasoning is employed by the tool to verify the specification, infer implicit facts, and manifest any inconsistencies. i.com supports the conceptual design phase of an information system, and in particular of an integration information system, such as a data warehouse.
Conference Paper
While the realization of the Semantic Web as once envisioned by Tim Berners-Lee remains in a distant future, the Web of Data has already become a reality. Billions of RDF statements on the Internet, facts about a variety of different domains, are ready to be used by semantic applications. Some of these applications, however, crucially hinge on the availability of expressive schemas suitable for logical inference that yields non-trivial conclusions. In this paper, we present a statistical approach to the induction of expressive schemas from large RDF repositories. We describe in detail the implementation of this approach and report on an evaluation that we conducted using several data sets including DBpedia.
Conference Paper
Ontology matching has become an important field of research over the last years. Although many different approaches have been proposed, only few of them are committed to a well defined semantics. As a consequence, the possibilities of reasoning are not exploited to their full extent. A reasoning based approach will not only improve ontology matching, but will also be necessary to solve certain problems that hinder the progress of the whole field. We focus on the notion of alignment incoherence to understand the capabilities of reasoning in ontology matching.
Conference Paper
The tremendous amounts of linked data available on the web are a valuable resource for a variety of semantic applications. However, these applications often have to face the challenges posed by flawed or underspecified representations. The sheer size of these data sets, one of their most appealing features, is at the same time a hurdle on the way towards more accurate data, because that size, together with the dynamics of the data, often hinders manual maintenance and quality assurance. Schemas or ontologies constraining, e.g., the possible instantiations of classes and properties could facilitate the automated detection of undesired usage patterns or incorrect assertions, but only few knowledge repositories feature schema-level knowledge of sufficient expressivity. In this paper, we present several approaches to enriching learned or manually engineered ontologies with disjointness axioms, an important prerequisite for the applicability of logical approaches to knowledge base debugging. We describe the strengths and weaknesses of these approaches and report on a detailed evaluation based on the DBpedia dataset.
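One standard way to score candidate disjointness axioms, also mentioned in the citing text above, is the statistical correlation between class memberships; a small sketch of the phi coefficient over binary membership indicators (the data is invented, and this is only one of several possible scoring approaches):

```python
import math

def phi(memberships_a, memberships_b):
    """Phi (Pearson) correlation between two binary membership vectors."""
    n = len(memberships_a)
    n11 = sum(a and b for a, b in zip(memberships_a, memberships_b))
    n1_ = sum(memberships_a)  # instances in class A
    n_1 = sum(memberships_b)  # instances in class B
    denom = math.sqrt(n1_ * n_1 * (n - n1_) * (n - n_1))
    if denom == 0:
        return 0.0
    return (n * n11 - n1_ * n_1) / denom

# One indicator per instance: does it belong to the class? (Invented data.)
person = [1, 1, 1, 0, 0, 0]
place  = [0, 0, 0, 1, 1, 0]

# A strongly negative correlation is evidence that the classes are disjoint.
print(phi(person, place))  # about -0.71 on this toy data
```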
Conference Paper
The Internet is a giant semiotic system. It is a massive collection of Peirce's three kinds of signs: icons, which show the form of something; indices, which point to something; and symbols, which represent something according to some convention. But current proposals for ontologies and metadata have overlooked some of the most important features of signs. A sign has three aspects: it is (1) an entity that represents (2) another entity to (3) an agent. By looking only at the signs themselves, some metadata proposals have lost sight of the entities they represent and the agents (human, animal, or robot) which interpret them. With its three branches of syntax, semantics, and pragmatics, semiotics provides guidelines for organizing and using signs to represent something to someone for some purpose. Besides representation, semiotics also supports methods for translating patterns of signs intended for one purpose into other patterns intended for different but related purposes. This article shows how the fundamental semiotic primitives are represented in semantically equivalent notations for logic, including controlled natural languages and various computer languages.