Chapter

# Rapid Restart Hill Climbing for Learning Description Logic Concepts


## Abstract

Recent approaches to learning description logic (DL) concepts usually employ a downward refinement operator to traverse the search space and construct hypotheses. However, theoretical research has proved that no ideal refinement operator exists for expressive DLs, including the language $$\mathcal{ALC}$$. The state-of-the-art learning framework DL-Learner suggests using a complete and proper refinement operator and handling infiniteness algorithmically. For example, the CELOE algorithm follows an iterative widening approach to build a search tree of concept hypotheses. To select a tree node for expansion, CELOE adopts a simple greedy strategy that neglects the structure of the search tree. In this paper, we present the Rapid Restart Hill Climbing (RRHC) algorithm, which selects a node for expansion by traversing the search tree in a hill-climbing manner and rapidly restarts with one-step backtracking after each expansion. We provide an implementation of RRHC in the DL-Learner framework and compare its performance with CELOE on standard benchmarks.
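
As a rough illustration of the node selection described in the abstract, the following Python sketch shows one possible reading of "hill climbing with a rapid restart and one-step backtracking". It is not the DL-Learner implementation, and the abstract does not fix these details: the names `Node`, `climb`, `rrhc`, `toy_refine`, and `TOY_REFINEMENTS`, the tie-breaking rule, and all scores are hypothetical. The sketch assumes that each climb descends to the best-scoring child until no child is at least as good, that the reached node is then expanded, and that the next climb starts from the parent of the expanded node.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """A search-tree node holding one concept hypothesis and its score."""
    concept: str
    score: float
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)


def climb(start: Node) -> Node:
    """Descend from `start`, moving to the best child while it is at least as good."""
    current = start
    while current.children:
        best = max(current.children, key=lambda n: n.score)
        if best.score < current.score:
            break  # local optimum: every child scores worse than the current node
        current = best
    return current


def rrhc(
    root: Node,
    refine: Callable[[Node], Iterable[Tuple[str, float]]],
    is_solution: Callable[[Node], bool],
    max_expansions: int,
) -> Optional[Node]:
    """Hill climbing with a restart after every expansion (assumed reading)."""
    start = root
    for _ in range(max_expansions):
        node = climb(start)
        # Expand the selected node with the refinements the operator returns.
        for concept, score in refine(node):
            child = Node(concept, score, parent=node)
            node.children.append(child)
            if is_solution(child):
                return child
        # One-step backtracking (assumption): restart from the expanded node's parent.
        start = node.parent if node.parent is not None else root
    return None


# Hypothetical refinement table with made-up scores, in Manchester-like syntax.
TOY_REFINEMENTS: Dict[str, List[Tuple[str, float]]] = {
    "Thing": [("Person", 0.5), ("Animal", 0.2)],
    "Person": [("Person and Male", 0.6), ("Person and (hasChild some Thing)", 0.8)],
    "Person and (hasChild some Thing)": [
        ("Person and Male and (hasChild some Thing)", 1.0)
    ],
}


def toy_refine(node: Node) -> List[Tuple[str, float]]:
    """Return refinements of `node` that are not already among its children."""
    seen = {child.concept for child in node.children}
    return [(c, s) for c, s in TOY_REFINEMENTS.get(node.concept, []) if c not in seen]


result = rrhc(Node("Thing", 0.0), toy_refine, lambda n: n.score >= 1.0, 10)
```

By contrast, a strategy that picks the globally best-scoring node ignores where that node sits in the tree. In this sketch, the selection always follows a path from the restart point, so the tree structure directly shapes which node is expanded next.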

Conference Paper
The paper presents the ultimate version of a concept learning system which can support typical ontology construction / evolution tasks through the induction of class expressions from groups of individual resources labeled by a domain expert. Stating the target task as a search problem, a Foil-like algorithm was devised based on the employment of refinement operators to traverse the version-space of candidate definitions for the target class. The algorithm has been further enhanced by including a more general definition for the scoring function and better refinement operators. An experimental evaluation of the resulting new release of DL-Foil, which implements these improvements, was carried out to assess its performance, also in comparison with other concept learning systems.
Conference Paper
In this paper we focus on learning concept descriptions expressed in Description Logics. After stating the learning problem in this context, a FOIL-like algorithm is presented that can be applied to general DL languages, discussing related theoretical aspects of learning with the inherent incompleteness underlying the semantics of this representation. Subsequently we present an experimental evaluation of the implementation of this algorithm performed on some real ontologies in order to empirically assess its performance.
Conference Paper
In order to leverage techniques from Inductive Logic Programming for the learning in description logics (DLs), which are the foundation of ontology languages in the Semantic Web, it is important to acquire a thorough understanding of the theoretical potential and limitations of using refinement operators within the description logic paradigm. In this paper, we present a comprehensive study which analyses desirable properties such operators should have. In particular, we show that ideal refinement operators in general do not exist, which is indicative of the hardness inherent in learning in DLs. We also show which combinations of desirable properties are theoretically possible, thus providing an important step towards the definition of practically applicable operators.
Conference Paper
While the problem of learning logic programs has been extensively studied in ILP, the problem of learning in description logics (DLs) has been tackled mostly by empirical means. Learning in DLs is however worthwhile, since both Horn logic and description logics are widely used knowledge representation formalisms, their expressive powers being incomparable (neither includes the other as a fragment). Unlike most approaches to learning in description logics, which provide bottom-up (and typically overly specific) least generalizations of the examples, this paper addresses learning in DLs using downward (and upward) refinement operators. Technically, we construct a complete and proper refinement operator for the ALER description logic (to avoid overfitting, we disallow disjunctions from the target DL). Although no minimal refinement operators exist for ALER, we show that we can achieve minimality of all refinement steps, except the ones that introduce the ⊥ concept. We additionally prove that complete refinement operators for ALER cannot be locally finite and suggest how this problem can be overcome by an MDL search heuristic. We also discuss the influence of the Open World Assumption (typically made in DLs) on example coverage.
Conference Paper
We focus on the induction and revision of terminologies from metadata. Following a Machine Learning approach, this setting can be cast as a search problem to be solved employing operators that traverse the search space expressed in a structural representation, aiming at correct concept definitions. The progressive refinement of such definitions in a terminology is driven by the available extensional knowledge (metadata). A knowledge-intensive inductive approach to this task is presented that can deal with the expressive Semantic Web representations based on Description Logics, which are endowed with well-founded reasoning capabilities. The core inferential mechanism, based on multilevel counterfactuals, can be used for either inducing new concept descriptions or refining existing (incorrect) ones. The soundness of the approach and its applicability are also proved and discussed.
Conference Paper
Hill-climbing search is the most commonly used search algorithm in ILP systems because it permits the generation of theories in short running times. However, a well known drawback of this greedy search strategy is its myopia. Macro-operators (or macros for short), a recently proposed technique to reduce the search space explored by exhaustive search, can also be argued to reduce the myopia of hill-climbing search by automatically performing a variable-depth look-ahead in the search space. Surprisingly, macros have not been employed in a greedy learner. In this paper, we integrate macros into a hill-climbing learner. In a detailed comparative study in several domains, we show that indeed a hill-climbing learner using macros performs significantly better than current state-of-the-art systems involving other techniques for reducing myopia, such as fixed-depth look-ahead, template-based look-ahead, beam-search, or determinate literals. In addition, macros, in contrast to some of the other approaches, can be computed fully automatically and do not require user involvement nor special domain properties such as determinacy.
Conference Paper
Refinement operators are frequently used in the area of multirelational learning (Inductive Logic Programming, ILP) in order to search systematically through a generality order on clauses for a correct theory. Only the clauses reachable by a finite number of applications of a refinement operator are considered by a learning system using this refinement operator; i.e., the refinement operator determines the search space of the system. For efficiency reasons, we would like a refinement operator to compute the smallest set of clauses necessary to find a correct theory. In this paper we present a formal method based on macro-operators to reduce the search space defined by a downward refinement operator while finding the same theory as the original operator. Basically we define a refinement operator which adds to a clause not only single-literals but also automatically created sequences of literals (macro-operators). This in turn allows us to discard clauses which do not belong to a correct theory. Experimental results show that this technique significantly reduces the search-space and thus accelerates the learning process.
Conference Paper
We focus on the problem of specialization in a description logics (DL) representation, specifically the ALN language. Standard approaches to learning in these representations are based on bottom-up algorithms that employ the lcs operator, which, in turn, produces overly specific (overfitting) and still redundant concept definitions. In the dual (top-down) perspective, this issue can be tackled by means of an ILP downward operator. Indeed, using a mapping from DL descriptions onto a clausal representation, we define a specialization operator computing maximal specializations of a concept description on the grounds of the available positive and negative examples.
Article
With the advent of the Semantic Web, description logics have become one of the most prominent paradigms for knowledge representation and reasoning. Progress in research and applications, however, is constrained by the lack of well-structured knowledge bases consisting of a sophisticated schema and instance data adhering to this schema. It is paramount that suitable automated methods for their acquisition, maintenance, and evolution will be developed. In this paper, we provide a learning algorithm based on refinement operators for the description logic ALCQ including support for concrete roles. We develop the algorithm from thorough theoretical foundations by identifying possible abstract property combinations which refinement operators for description logics can have. Using these investigations as a basis, we derive a practically useful complete and proper refinement operator. The operator is then cast into a learning algorithm and evaluated using our implementation DL-Learner. The results of the evaluation show that our approach is superior to other learning approaches on description logics, and is competitive with established ILP systems.
Article
In machine learning, one often encounters data sets where a general pattern is violated by a relatively small number of exceptions (for example, a rule that says that all birds can fly is violated by examples such as penguins). This complicates the concept learning process and may lead to the rejection of some simple and expressive rules that cover many cases. In this paper we present an approach to this problem in description logic learning by computing partial descriptions (which are not necessarily entirely complete) of both positive and negative examples and combining them. Our Symmetric Parallel Class Expression Learning approach enables the generation of general rules with exception patterns included. We demonstrate that this algorithm provides significantly better results (in terms of metrics such as accuracy, search space covered, and learning time) than standard approaches on some typical data sets. Further, the approach has the added benefit that it can be parallelised relatively simply, leading to much faster exploration of the search tree on modern computers.
Article
In this system paper, we describe the DL-Learner framework, which supports supervised machine learning using OWL and RDF for background knowledge representation. It can be beneficial in various data and schema analysis tasks with applications in different standard machine learning scenarios, e.g. in the life sciences, as well as Semantic Web specific applications such as ontology learning and enrichment. Since its creation in 2007, it has become the main OWL and RDF-based software framework for supervised structured machine learning and includes several algorithm implementations, usage examples and has applications building on top of the framework. The article gives an overview of the framework with a focus on algorithms and use cases.
Conference Paper
We propose a Parallel Class Expression Learning algorithm that is inspired by the OWL Class Expression Learner (OCEL) and its extension, Class Expression Learning for Ontology Engineering (CELOE), proposed by Lehmann et al. in the DL-Learner framework. Our algorithm separates the computation of partial definitions from the aggregation of those solutions to an overall complete definition, which lends itself to parallelisation. Our algorithm is implemented based on the DL-Learner infrastructure and evaluated using a selection of datasets that have been used in other ILP systems. It is shown that the proposed algorithm is suitable for learning problems that can only be solved by complex (long) definitions. Our approach is part of an ontology-based abnormality detection framework that is developed to be used in smart homes.
Conference Paper
Recent empirical studies show that runtime distributions of backtrack procedures for solving hard combinatorial problems often have intriguing properties. Unlike standard distributions (such as the normal), such distributions decay slower than exponentially and have “heavy tails”. Procedures characterized by heavy-tailed runtime distributions exhibit large variability in efficiency, but a very straightforward method called rapid randomized restarts has been designed to essentially improve their average performance. We show on two experimental domains that heavy-tailed phenomena can be observed in ILP, namely in the search for a clause in the subsumption lattice. We also reformulate the technique of randomized rapid restarts to make it applicable in ILP and show that it can reduce the average search-time.
Article
Recent statistical performance studies of search algorithms in difficult combinatorial problems have demonstrated the benefits of randomising and restarting the search procedure. Specifically, it has been found that if the search cost distribution of the non-restarted randomised search exhibits a slower-than-exponential decay (that is, a “heavy tail”), restarts can reduce the search cost expectation. We report on an empirical study of randomised restarted search in ILP. Our experiments conducted on a high-performance distributed computing platform provide an extensive statistical performance sample of five search algorithms operating on two principally different classes of ILP problems, one represented by an artificially generated graph problem and the other by three traditional classification benchmarks (mutagenicity, carcinogenicity, finite element mesh design). The sample allows us to (1) estimate the conditional expected value of the search cost (measured by the total number of clauses explored) given the minimum clause score required and a “cutoff” value (the number of clauses examined before the search is restarted), (2) estimate the conditional expected clause score given the cutoff value and the invested search cost, and (3) compare the performance of randomised restarted search strategies to a deterministic non-restarted search. Our findings indicate striking similarities across the five search algorithms and the four domains, in terms of the basic trends of both the statistics (1) and (2). Also, we observe that the cutoff value is critical for the performance of the search algorithm, and using its optimal value in a randomised restarted search may decrease the mean search cost (by several orders of magnitude) or increase the mean achieved score significantly with respect to that obtained with a deterministic non-restarted search.
Article
FOIL is a first-order learning system that uses information in a collection of relations to construct theories expressed in a dialect of Prolog. This paper provides an overview of the principal ideas and methods used in the current version of the system, including two recent additions. We present examples of tasks tackled by FOIL and of systems that adapt and extend its approach.
Article
While the number of knowledge bases in the Semantic Web increases, the maintenance and creation of ontology schemata still remain a challenge. In particular, creating class expressions constitutes one of the more demanding aspects of ontology engineering. In this article we describe how to adapt a semi-automatic method for learning OWL class expressions to the ontology engineering use case. Specifically, we describe how to extend an existing learning algorithm for the class learning problem. We perform rigorous performance optimization of the underlying algorithms for providing instant suggestions to the user. We also present two plugins, which use the algorithm, for the popular Protégé and OntoWiki ontology editors and provide a preliminary evaluation on real ontologies.
Article
In the line of realizing the Semantic Web by means of mechanized practices, we tackle the problem of building ontologies, assisting the knowledge engineers' job by means of Machine Learning techniques. In particular, we investigate solutions for the induction of concept descriptions in a semi-automatic fashion. We present an algorithm that is able to infer definitions in the $$\mathcal{ALC}$$ Description Logic (a sub-language of OWL-DL) from instances made available by domain experts. The effectiveness of the method with respect to past algorithms is also empirically evaluated with an experimentation in the document image understanding domain.