Chapter

Abstract

Despite the benefits of explicitly modeling concept disjointness for the quality of ontologies, the number of disjointness axioms in vocabularies for the Web of Data is still limited, which risks leaving important constraints underspecified. Automated methods for discovering these axioms may represent a powerful modeling tool for knowledge engineers. To this end, we propose a machine learning solution that combines (unsupervised) distance-based clustering with the divide-and-conquer strategy. The resulting terminological cluster trees can be used to detect candidate disjointness axioms from emerging concept descriptions. A comparative empirical evaluation on different types of ontologies shows the feasibility and effectiveness of the proposed solution, which may be regarded as complementary to current methods that require supervision or consider atomic concepts only.
Conference Paper
A new framework for the induction of logical decision trees is presented. Unlike in the original setting, tests at the tree nodes are expressed with Description Logic concepts. This has a number of advantages: expressive terminological languages are endowed with full negation, thus allowing for a more natural division of the individuals at each test node; these logics support the standard ontology languages for representing knowledge bases in the Semantic Web. A top-down method for inducing terminological decision trees is proposed as an adaptation of well-known tree-induction methods. This offers an alternative way of learning in Description Logics, as concept descriptions can be associated with the terminological trees. A new version of the system TermiTIS, implementing the methods, is experimentally evaluated on ontologies from popular repositories.
Conference Paper
We survey nearly 1300 OWL ontologies and RDFS schemas. The collection of statistical data allows us to perform analysis and report some trends. Though most of the documents are syntactically OWL Full, very few stay in OWL Full when they are syntactically patched by adding type triples. We also report the frequency of occurrences of OWL language constructs and the shape of class hierarchies in the ontologies. Finally, we note that of the largest ontologies surveyed here, most do not exceed the description logic expressivity of ALC.
Conference Paper
While the number and size of Semantic Web knowledge bases increase, their maintenance and quality assurance are still difficult. In this article, we present ORE, a tool for repairing and enriching OWL ontologies. State-of-the-art methods in ontology debugging and supervised machine learning form the basis of ORE and are adapted or extended so as to work well in practice. ORE supports the detection of a variety of ontology modelling problems and guides the user through the process of resolving them. Furthermore, the tool allows an ontology to be extended through (semi-)automatic supervised learning. A wizard-like process helps the user to resolve potential issues after axioms are added.
Conference Paper
Ontology Learning from text aims at generating domain ontologies from textual resources by applying natural language processing and machine learning techniques. It is inherent in the ontology learning process that the acquired ontologies represent uncertain and possibly contradicting knowledge. From a logical perspective, the learned ontologies are potentially inconsistent knowledge bases that thus do not allow meaningful reasoning directly. In this paper we present an approach to generate consistent OWL ontologies from learned ontology models by taking the uncertainty of the knowledge into account. We further present evaluation results from experiments with ontologies learned from a Digital Library.
Conference Paper
In order to overcome the limitations of deductive logic-based approaches to deriving operational knowledge from ontologies, especially when data come from distributed sources, inductive (instance-based) methods may be better suited, since they are usually efficient and noise-tolerant. In this paper we propose an inductive method for improving instance retrieval and enriching the ontology population. By casting retrieval as a classification problem with the goal of assessing the individual class-memberships w.r.t. the query concepts, we propose an extension of the k-Nearest Neighbor algorithm for OWL ontologies based on an entropic distance measure. The procedure can classify the individuals w.r.t. the known concepts, but it can also be used to retrieve individuals belonging to query concepts. Experimentally we show that the behavior of the classifier is comparable to that of a standard reasoner. Moreover, we show that new knowledge (not logically derivable) is induced. It can be suggested to the knowledge engineer for validation during the ontology population task.
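The idea of casting class-membership assessment as nearest-neighbor classification can be illustrated with a minimal sketch. It is not the method of the cited paper: the entropic distance measure is replaced here by a plain averaged absolute difference, and the feature vectors, the labels (+1 member, -1 non-member, 0 unknown), and the function names `distance` and `knn_membership` are all assumptions made for illustration.

```python
from collections import Counter


def distance(a, b):
    """Averaged absolute difference between two feature vectors.

    Assumption: a simple stand-in for the entropic measure named in the abstract.
    """
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def knn_membership(query, training, k=3):
    """Predict the membership label of `query` by majority vote among its k nearest examples."""
    nearest = sorted(training, key=lambda example: distance(query, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training individuals: (feature vector, membership w.r.t. a query concept).
training = [
    ((1.0, 0.5), +1),
    ((0.5, 1.0), +1),
    ((0.0, 0.0), -1),
    ((0.0, 0.5), -1),
    ((0.5, 0.5), 0),
]

prediction = knn_membership((1.0, 1.0), training, k=3)
print(prediction)
```

Keeping a third label for "unknown" reflects the open-world setting of ontologies, where the absence of an assertion does not imply non-membership.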
Article
Although it is widely acknowledged that adding class disjointness to ontologies enables a wide range of interesting applications, this type of axiom is rarely used on today’s Semantic Web. This is due to the enormous skill and effort required to make the necessary modeling decisions. Automatically generating disjointness axioms could lower the barrier to entry and lead to more widespread adoption. Different methods have been proposed for this automatic generation. These include supervised, top-down approaches, which base their results on heterogeneous types of evidence, and unsupervised, bottom-up approaches, which rely solely on the instance data available for the ontology. However, the current literature is missing a thorough comparison of these approaches. In this article, we provide this comparison by presenting two fundamentally different state-of-the-art approaches and evaluating their relative ability to enrich a well-known, multi-purpose ontology with class disjointness. To do so, we introduce a high-quality gold standard for class disjointness. We describe the creation of this standard in detail and provide a thorough analysis. Finally, we also present improvements to both approaches, based in part on discoveries made during our analysis and evaluation.
Conference Paper
This work presents a clustering method which can be applied to relational knowledge bases. Namely, it can be used to discover interesting groupings of semantically annotated resources in a wide range of concept languages. The method exploits a novel dissimilarity measure that is based on the resource semantics w.r.t. a number of dimensions corresponding to a committee of features, represented by a group of concept descriptions (discriminating features). The algorithm is an adaptation of the classic Bisecting k-Means to the complex representations typical of ontologies in the Semantic Web. We discuss its complexity and the potential applications to a variety of important tasks.
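A dissimilarity measure over a committee of features, followed by a bisecting step, might look roughly like the sketch below. Everything specific in it is an assumption rather than the cited paper's definition: each individual is represented by a tuple of projection values (1.0 if it is known to be an instance of a feature concept, 0.0 if it is known to be an instance of its negation, 0.5 otherwise), the measure is a Minkowski-style distance normalised by the number of features, and `bisect` performs a single simplified split around the two farthest-apart individuals instead of the full iterative Bisecting k-Means.

```python
from itertools import combinations


def dissimilarity(a, b, p=2):
    """Minkowski-style distance over feature projections, divided by the number of features."""
    m = len(a)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p) / m


def bisect(points):
    """Split named projection vectors into two groups around the two farthest-apart seeds."""
    names = sorted(points)
    seed1, seed2 = max(
        combinations(names, 2),
        key=lambda pair: dissimilarity(points[pair[0]], points[pair[1]]),
    )
    left, right = [], []
    for name in names:
        d1 = dissimilarity(points[name], points[seed1])
        d2 = dissimilarity(points[name], points[seed2])
        # Assign each individual to the closer seed (ties go to the first seed).
        (left if d1 <= d2 else right).append(name)
    return left, right


# Hypothetical projections of four individuals on three feature concepts.
individuals = {
    "anna": (1.0, 1.0, 0.5),
    "bob": (1.0, 0.5, 0.5),
    "rex": (0.0, 0.0, 1.0),
    "tom": (0.0, 0.5, 1.0),
}

left, right = bisect(individuals)
print(left, right)
```

Applying `bisect` recursively to the larger or less cohesive group would give a divisive hierarchy; the stopping rule and the choice of which group to split are left open here.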
Article
A first order framework for top-down induction of logical decision trees is introduced. Logical decision trees are more expressive than the flat logic programs typically induced by empirical inductive logic programming systems because logical decision trees introduce invented predicates and mix existential and universal quantification of variables. An implementation of the framework, the Tilde system, is presented and empirically evaluated.
1 Introduction
Top-down induction of decision trees (TDIDT) [Qui86] is the best known and most successful machine learning technique. It has been used to solve numerous practical problems. It employs a divide-and-conquer strategy, and in this it differs from its rule-based competitors (e.g. AQ [MMHL86]), which are based on covering strategies (cf. [Bos95]). Within attribute-value learning (or propositional concept-learning) TDIDT is more popular than the covering approach. Yet, within first order approaches to concept-learning, only a few learning sy...
Conference Paper
While the realization of the Semantic Web as once envisioned by Tim Berners-Lee remains in a distant future, the Web of Data has already become a reality. Billions of RDF statements on the Internet, facts about a variety of different domains, are ready to be used by semantic applications. Some of these applications, however, crucially hinge on the availability of expressive schemas suitable for logical inference that yields non-trivial conclusions. In this paper, we present a statistical approach to the induction of expressive schemas from large RDF repositories. We describe in detail the implementation of this approach and report on an evaluation that we conducted using several data sets including DBpedia.
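One common statistical signal for schema induction is the confidence of an association rule between class memberships, which can hint at a subclass axiom when it is high. The sketch below is only an illustration under stated assumptions: the abstract does not specify the statistics used, and the function name `rule_confidence`, the `ex:` identifiers, and the type assertions are hypothetical.

```python
def rule_confidence(types_by_individual, antecedent, consequent):
    """Fraction of individuals typed with `antecedent` that are also typed with `consequent`."""
    having_antecedent = [
        types for types in types_by_individual.values() if antecedent in types
    ]
    if not having_antecedent:
        return 0.0  # no evidence for the rule at all
    matching = sum(1 for types in having_antecedent if consequent in types)
    return matching / len(having_antecedent)


# Hypothetical rdf:type assertions, grouped per individual.
types_by_individual = {
    "ex:Berlin": {"City", "Place"},
    "ex:Paris": {"City", "Place"},
    "ex:Rhine": {"River", "Place"},
    "ex:Oslo": {"City"},
}

city_place = rule_confidence(types_by_individual, "City", "Place")
river_place = rule_confidence(types_by_individual, "River", "Place")
print(city_place, river_place)
```

A threshold on such a confidence value would decide whether an axiom is proposed; incomplete typing, as with the `ex:Oslo` entry, lowers the confidence even when the axiom holds in the domain.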