S. Kramer and B. Pfahringer (Eds.): ILP 2005, LNAI 3625, pp. 38–50, 2005.
© Springer-Verlag Berlin Heidelberg 2005
Converting Semantic Meta-knowledge
into Inductive Bias
John Cabral, Robert C. Kahlert, Cynthia Matuszek, Michael Witbrock,
and Brett Summers
Cycorp, Inc., 3721 Executive Center Drive, Suite 100,
Austin, TX 78739
{jcabral, rck, cynthia, witbrock, bsummers}@cyc.com
Abstract. The Cyc KB has a rich pre-existing ontology for representing
common sense knowledge. To clarify and enforce its terms’ semantics and to
improve inferential efficiency, the Cyc ontology contains substantial meta-level
knowledge that provides definitional information about its terms, such as a type
hierarchy. This paper introduces a method for converting that meta-knowledge
into biases for ILP systems. The process has three stages. First, a “focal
position” for the target predicate is selected, based on the induction goal.
Second, the system determines type compatibility or conflicts among predicate
argument positions, and creates a compact, efficient representation that allows
for syntactic processing. Finally, mode declarations are generated, taking
advantage of information generated during the first and second phases.
1 Introduction
Because of the general complexity of machine learning, the discipline has devoted
significant attention to the use of inductive bias in improving algorithmic efficiency.
Inductive logic programming (ILP) is in an interesting position on this topic because
its declarative representations can be extended to include the representation of the ILP
system itself, including the biases its algorithms employ. Tausend [1], [2], Nedellec et
al. [3] have investigated the utility of declaratively representing biases as part of the
background knowledge for ILP systems. They have noted that such representations
allow for an explicit and modular representation of bias, such that biases used by
different systems can be meaningfully compared and the settings of an individual
system can be easily manipulated. McCreath [4], McCreath and Sharma [5], and
DiMauro, et. al. [6] have begun the next phase of this work by developing algorithms
for inducing type and mode biases from data. The implementation of these algorithms
in pre-processors promises to reduce the burden on the human users of ILP systems.
An alternative source for inductive biases is the language of the data, when that
language is part of a larger ontology. Using the knowledge contained in the Cyc
knowledge base (KB) – a very large, non-domain-specific ontology of formalized
knowledge – provides a novel approach to automating the generation of inductive
biases.
Distribution Statement A: Approved for public release; distribution is unlimited.
2 An Overview of the Cyc Project
The Cyc project is an ambitious, decades-long effort to generate a store of common-
sense knowledge that enables reasoning in a broad array of domains. Human
knowledge engineers have so far constructed most of the knowledge base (KB).
However, one of the underlying premises of the Cyc project is that certain kinds of
machine learning require the existence of a pre-existing body of knowledge into
which new knowledge can be integrated (Lenat and Guha [7]). So, the long-term goal
of the project is to invest human effort in the creation of a necessary foundation for
effective, future machine learning. The goal of the research described in this paper is
to begin the work of converting the existing knowledge into a basis for more effective
automated learning. Since the representational language of Cyc is an extension of
first-order predicate logic, inductive logic programming is the natural fit for the type
of machine learning that would eventually drive the growth of the knowledge base.
The Cyc KB is represented using the language CycL, a LISP-like language that
includes features like quantification over predicates and complete sentences, modal
operators, and an extensive meta-language. All expressions are wrapped in parentheses, with the leftmost entity being the predicate or function. Constants of the language are tagged with a prefixed '#$' and variables are
composed of all capital letters and a prefixed '?'. All aspects of the language are
represented within CycL. So, logical connectives follow the same syntactic rules. The
following Prolog clause (a translation of "For any birth event E in which some M
plays the role of female parent and some C is the child born during E, then M is C's
biological mother"):
biologicalMother(C,M) :-
birthEvent(E), femaleParentActor(E,M), birthChild(E,C)
would be represented in CycL as:
(#$implies
(#$and
(#$isa ?BIRTH #$BirthEvent)
(#$femaleParentActor ?BIRTH ?MOTHER)
(#$birthChild ?BIRTH ?CHILD))
(#$biologicalMother ?CHILD ?MOTHER)).1
The above example illustrates a significant feature of the Cyc ontology: classes are denoted with constant names, not predicates. This choice is based on certain requirements of ontological engineering. First, it allows multiple instance-to-class relations to be introduced into the vocabulary. Thus, the generic instance-to-class relation can be specialized into more precise relations for specific types of collection, such as occupations, ethnicities, and nationalities. Introducing these more specific relations increases the
precision of the language and, as a consequence, allows for greater inferential
1 When variables are unbound, the system assumes universal quantifiers with scope over the
entire sentence.
efficiency because problems can be more narrowly defined. Second, because there is
a significant inferential cost to reasoning with unbound predicates and it is quite common to quantify over classes, introducing classes as named entities allows rules to
be written at the right level of generality without incurring a significant computational
cost.
The Cyc KB is structured using a hierarchical arrangement of microtheories.
Microtheories represent contexts in which sentences are true. Microtheories inherit
content from other microtheories to which they are linked. So, if MT1 is more general than MT2, then everything that is true in MT1 is also true in MT2. This relationship
among microtheories is transitive and reflexive, but not symmetric. Sentences that
hold universally are those which are expressed in the "highest," most general
microtheories, while sentences that are true in very limited contexts (i.e., the content
of a person's testimony or a work of fiction) are represented in microtheories that are
much "lower" in the hierarchy. The great advantage of microtheories is that
contradictory assertions can be represented within the KB without introducing formal
contradictions because those assertions can be represented in microtheories that are
not linked to one another.
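The inheritance behavior described above can be sketched as reachability over the links between microtheories. The following Python fragment is our own illustrative encoding, not Cyc's implementation; the microtheory names, toy assertions, and function names are all assumptions for the example:

```python
# Hypothetical sketch of microtheory inheritance: a sentence true in a
# more general microtheory is inherited by every more specific one
# linked below it, and unlinked microtheories may disagree freely.

# genl_mt[mt] = set of strictly more general microtheories mt links to
genl_mt = {
    "FictionMt":   {"BaseKB"},
    "TestimonyMt": {"BaseKB"},
    "BaseKB":      set(),
}

assertions = {
    "BaseKB":      {"(isa Munich City)"},
    "FictionMt":   {"(isa Sherlock Detective)"},
    # No formal contradiction: TestimonyMt and FictionMt are not linked.
    "TestimonyMt": {"(not (isa Sherlock Detective))"},
}

def visible_from(mt):
    """All microtheories whose content mt inherits (reflexive, transitive)."""
    seen, stack = set(), [mt]
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(genl_mt.get(m, ()))
    return seen

def true_in(sentence, mt):
    return any(sentence in assertions.get(m, ()) for m in visible_from(mt))
```

Under this encoding, a sentence asserted in BaseKB is visible from FictionMt, while FictionMt's content never leaks into TestimonyMt.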
As of March 2005, the Cyc KB contains over 296,000 reified terms, including over
55,000 classes, 198,000 individuals, and 22,000 relations (predicates and functions).
These terms are linked by 3.3 million assertions. The Cyc KB operates in conjunction
with an inference engine, with deductive and abductive capabilities. The inference
engine is composed of more than 800 special-purpose reasoning modules, each of
which handles a very specific type of query. These modules range in complexity
from the very simple (special index-lookup code for a specific predicate) to the
extremely complex (modules for temporal reasoning). The modules are interlinked
via a blackboard system, which allows very general modules to handle cases for which more specific support does not exist; the final fallback is a general theorem
prover. This architecture allows for the addition of new modules as needed.
An important aspect in the design of the system comes from the design of the
ontology. Specifically, because of the explosive growth of search spaces during
deductive inferences with transformation, the ontology needs to be designed such that
it can make inference more efficient. The choice not to represent classes with unary
predicates is one illustration of this. Another significant illustration of this point is the
inclusion of definitional assertions on the elements of the ontology and the
enforcement of the semantics represented by those assertions at different stages within
the system. The definitional assertions for predicates are used to constrain the values
that the predicate's arguments can take (we will refer to the content of these assertions as "semantic meta-knowledge"). During the knowledge entry process, this enables the
system to exclude assertions as "semantically ill-formed" and, thus, ensure the
integrity of the content of the KB. Thus, if #$biologicalMother can hold only
between an animal and a female animal and female animal and integer are disjoint
classes (i.e., they do not and cannot have any common instances), then an attempt to
enter into the KB that someone's mother is the number 12 would be rejected.
By preserving the integrity of the knowledge entered into the KB, these semantic
restrictions maintain the correctness of the answers derived through deductive
inference and spare inference the cost of checking for the satisfaction of constraints.
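The rejection of semantically ill-formed assertions can be sketched as a simple check against argument-type constraints and class disjointness. The data and function names below are toy assumptions of ours, not Cyc's machinery:

```python
# Illustrative sketch: an assertion is rejected at knowledge-entry time
# if an argument value can be proved to violate its type constraint.

arg_isa = {"biologicalMother": ["Animal", "FemaleAnimal"]}  # per-argument types
instance_of = {"Sam": {"Animal"}, "Eve": {"FemaleAnimal", "Animal"}}
disjoint_with = {"FemaleAnimal": {"Integer"}, "Integer": {"FemaleAnimal"}}

def types_of(value):
    if isinstance(value, int):
        return {"Integer"}
    return instance_of.get(value, set())

def well_formed(pred, args):
    """True unless some argument provably violates its type constraint."""
    for constraint, value in zip(arg_isa[pred], args):
        for t in types_of(value):
            if constraint in disjoint_with.get(t, ()):
                return False  # value's type is disjoint with the required type
    return True
```

With these toy facts, asserting that someone's mother is the number 12 fails the check, mirroring the example in the text.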
Another application of this knowledge is with abductive inference. Cyc
implements abduction through hypothesizing entities or relations such that the
hypothesized facts satisfy the antecedents of deductive implication rules. The
inference engine uses this knowledge about the meaning and proper usage of
predicates to reject hypotheses that could never be true. Thus, it will not hypothesize that an office building was destroyed because it was a meteor that burned up upon entering the atmosphere, because no office building is a meteor.
As a first step toward using the knowledge in the Cyc ontology and KB to enhance
machine learning, we will be investigating the use of semantic meta-knowledge to
provide inductive biases to ILP systems. The next two sections of this paper discuss
the declarative representation of bias used by ALEPH, and McCreath and Sharma’s
[5] algorithm for learning modes and types from data. The remainder of the paper will
provide more detail on one form of semantic meta-knowledge for predicates, discuss
how to convert them into inductive biases, and discuss additional approaches for
handling more complex cases.
3 The Declarative Representation of Bias in ALEPH
ALEPH is an implementation of the inverse entailment algorithm underlying the
Progol system (Muggleton [8]). We are interested in this formalism because it
combines mode and type information and, so, offers a scheme that could be used
among different ILP systems. The system represents mode and type biases using the
predicate mode/2.2 The general form of these clauses is:
mode(RecallNumber,pred(ModeType1,…,ModeTypen)).
RecallNumber is either the maximum number of successful calls to the predicate
that appears in the second argument position, or an asterisk, meaning the predicate has
unbounded indeterminacy. The number of instances of ModeTypei equals the arity
of the predicate pred. If these mode types are simple, the expressions that will
replace ModeTypei begin with a symbol designating the mode of the variables that
will fill that argument position, followed by a name for the type constraint on that
argument. In ALEPH, input variables are designated with a ‘+’, output variables are
designated with a ‘-’, and constant arguments are designated with a ‘#.’3 For example,
if the predicate addition/3 has all of its arguments constrained to integers and an
expression of the form addition(X,Y,Z) means that Z is the sum of X and Y,
then the following is an appropriate mode declaration for the predicate:
mode(*,addition(+integer, +integer,-integer)).
This clause states that addition/3 has unbounded indeterminacy, that all of its
arguments take instances of the same type, and that the first two arguments are input
2 The following description of ALEPH is based on the documentation available at: http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/aleph.html
3 This paper will not discuss the creation of constants, but that is an important piece of future
research.
variables while the third argument is an output variable. It is important to note that the
names given to the types are not used for type reasoning by ALEPH. So, no special
significance attaches to the names used in the mode/2 statements.
The mode settings contained in mode/2 clauses are used by ALEPH to constrain
the set of hypotheses, by shaping the construction of the bottom clause from which
the algorithm generalizes. The modes constrain the hypotheses’ variables in the
following ways:
– An input variable of a given type in a body literal must appear as an input variable of the same type in the head literal, or as an output variable of the same type in an earlier body literal of that clause.
– An output variable of a given type in the head literal must appear as an output variable of the same type in some body literal of that clause.
– Any predicate argument declared to be a constant of a given type must take a ground instance of that type as a value in that position.
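The first of these constraints, the chaining of input variables, can be illustrated with a small checker. The clause encoding and names below are our own simplification; it checks modes only and ignores the type component that ALEPH also enforces:

```python
# Minimal sketch of the input-variable chaining rule: every '+' variable
# in a body literal must already be bound, either as a '+' variable of
# the head or as a '-' variable of an earlier body literal.

def modes_ok(head, body, modes):
    """head/body literals are (pred, (var, ...)); modes maps pred -> ('+'|'-', ...)."""
    bound = {v for v, m in zip(head[1], modes[head[0]]) if m == "+"}
    for pred, args in body:
        for v, m in zip(args, modes[pred]):
            if m == "+" and v not in bound:
                return False          # input variable used before it is bound
        bound.update(v for v, m in zip(args, modes[pred]) if m == "-")
    return True

modes = {"gp": ("+", "-"), "parent": ("+", "-")}
# gp(X,Z) :- parent(X,Y), parent(Y,Z)   -- legal chaining through Y
good = modes_ok(("gp", ("X", "Z")),
                [("parent", ("X", "Y")), ("parent", ("Y", "Z"))], modes)
# gp(X,Z) :- parent(Y,Z)                -- Y is an unbound input variable
bad = modes_ok(("gp", ("X", "Z")), [("parent", ("Y", "Z"))], modes)
```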
The user need not program the mode and type information. ALEPH includes an
implementation of McCreath and Sharma’s (1995) algorithm for inferring mode and
type constraints from the background knowledge and examples.
4 McCreath and Sharma’s Algorithm: Bias Induction from Data
4.1 The Algorithm
McCreath and Sharma's [5] algorithm aims to induce meta-knowledge from only the
data given to the system. For determining mode biases, the algorithm initially
assumes that all possible combinations of modes for a predicate's arguments are valid.
It then works through the available data to find counter-examples. When a counter-
example is found, the contradicted modes are eliminated. Counter-examples are
based on the assumption that modes represent functional dependencies. So, a
counter-example would involve the values for a set of input arguments being matched
to different values for an associated output argument. For example, this algorithm
would originally assign the binary predicate p/2 the set of possible modes:
p(+,+), p(+,-), p(-,+) and p(-,-). If p(a,b) and p(a,c) were
positive examples, then the mode setting p(+,-) would be eliminated from the set
because the examples contradict the claim that the value of the second argument is a
function of the first argument.
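A simplified Python rendering of this elimination step treats every candidate mode as a strict functional dependency from its input positions to its output positions. Note that under this strict reading the all-output mode is also pruned whenever two examples differ; the published algorithm's handling of indeterminacy and recall is richer than this sketch:

```python
# Sketch of functional-dependency pruning of modes: a mode such as
# p(+,-) survives only if the input arguments determine the output
# arguments across all positive examples.
from itertools import product

def surviving_modes(examples, arity):
    """examples: list of argument tuples for one predicate."""
    alive = []
    for mode in product("+-", repeat=arity):
        inp = [i for i, m in enumerate(mode) if m == "+"]
        out = [i for i, m in enumerate(mode) if m == "-"]
        seen, ok = {}, True
        for ex in examples:
            key = tuple(ex[i] for i in inp)
            val = tuple(ex[i] for i in out)
            if seen.setdefault(key, val) != val:
                ok = False   # same inputs, different outputs: counter-example
                break
        if ok:
            alive.append("p(" + ",".join(mode) + ")")
    return alive
```

Applied to the positive examples p(a,b) and p(a,c), the mode p(+,-) is eliminated, as in the text, while p(+,+) and p(-,+) survive.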
For determining types, McCreath and Sharma's algorithm begins with the
assumption that every argument of every predicate in the language of the learning
problem is constrained to a different type. As the algorithm processes the data, if the
same value appears in two different argument positions, then the algorithm
redistributes the type constraints and assigns the same type to the argument positions
that shared that value. For example, if we have two binary predicates p/2 and r/2,
then the algorithm would initially assign every argument a different type constraint:
p(type1,type2) and r(type3,type4). If p(a,a) and r(a,b) appeared
within the examples, then the types would be modified to show that the same type can
appear in both of p/2's arguments and in the first argument of r/2. So, the revised
types would be p(type1,type1) and r(type1,type4).
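The type-merging step behaves like a union-find over argument positions: two positions are unioned whenever a value appears in both. The encoding below is our own sketch, not the authors' implementation:

```python
# Sketch of type merging: start with one type per argument position and
# union positions that share a value in the data.

def merge_types(facts, arities):
    """facts: list of (pred, args). Returns position -> representative position."""
    positions = [(p, i) for p, n in arities.items() for i in range(n)]
    parent = {pos: pos for pos in positions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    seen = {}  # value -> first argument position it appeared in
    for pred, args in facts:
        for i, v in enumerate(args):
            if v in seen:
                union((pred, i), seen[v])   # shared value: same type
            else:
                seen[v] = (pred, i)
    return {pos: find(pos) for pos in positions}
```

On the example from the text, p(a,a) and r(a,b) collapse both arguments of p/2 and the first argument of r/2 into one type, leaving r/2's second argument distinct.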
4.2 Weaknesses of the Algorithm
With any form of learning, there is the possibility of error; but there are two types of
error to which McCreath and Sharma’s original algorithm and a modified version
implemented within ALEPH are susceptible. Both relate to type constraints.
Excluding available type constraints reduces efficiency, while including too many type constraints over-restricts the search space and, so, causes the system to miss legitimate hypotheses.
McCreath and Sharma’s basic algorithm is susceptible to including too few type
restrictions. For instance, consider the following representation of a section of a
family tree.
father(abe,bob).
mother(abe,carol).
father(bob,david).
father(carol,fred).
In this example, Bob and Carol both appear in the first argument positions of
father/2 clauses. This causes the types on the second argument positions of
mother/2 and father/2 to be merged, even though they should be constrained to
disjoint types: female and male animals, respectively. So, in this situation, there
would be only one type for all of the argument positions.
ALEPH includes an alternative implementation of the algorithm that allows for the
merging of types between argument positions based on the degree of overlap in the
values for two different positions, where the degree of overlap is the proportion of
values of one position that are also values for the second position. This approach
faces the problem of not recognizing that argument positions can be merged. For
instance, in the above example, because there is only one value that appears in both
the second argument of mother and the first argument of father, a sufficiently high
overlap threshold between those arguments would prevent them from being linked.
Thus hypotheses that link mothers to their fathers could be missed.
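The overlap criterion, and the way a high threshold keeps the two positions of this example apart, can be sketched as follows; the 0.5 threshold and the set encoding are illustrative assumptions of ours:

```python
# Sketch of ALEPH's overlap-based variant: two argument positions are
# merged only if the proportion of one position's values also seen in
# the other exceeds a threshold.

def overlap(vals_a, vals_b):
    return len(vals_a & vals_b) / len(vals_a) if vals_a else 0.0

# Values from the family-tree example in the text.
mother_arg2 = {"carol"}
father_arg1 = {"abe", "bob", "carol"}

# Only one of father/2's three first-argument values also appears as a
# mother/2 second argument, so a 0.5 threshold keeps the positions
# apart and hypotheses linking them are never generated.
merged = overlap(father_arg1, mother_arg2) >= 0.5
```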
5 Semantic Meta-knowledge in Cyc
5.1 Basic Meta-knowledge for Predicates
A first step in the addition of a new predicate to the Cyc ontology is the creation of
definitional assertions that express the predicate’s meaning by setting constraints on its
applicability. For example, the predicate #$hasHeadquartersInRegion relates
an organization to the geographical region where that organization’s headquarters are
located. If #$BMWInc denotes the Bayerische Motoren Werke corporation and
#$CityOfMunich denotes the city of Munich, Germany, one could assert:
(#$hasHeadquartersInRegion #$BMWInc #$CityOfMunich)
To enforce the semantics of this predicate, a number of other assertions that constrain
#$hasHeadquartersInRegion would be used. For example, #$arg1Isa
and #$arg2Isa are, respectively, used to state that the first and second argument
positions of a particular predicate must be instances of particular collections.4 In
order to express that the first argument of #$hasHeadquartersInRegion must
be an organization and second argument must be a geographical area, the following
assertions would be made:
(#$arg1Isa #$hasHeadquartersInRegion #$Organization)
(#$arg2Isa #$hasHeadquartersInRegion
#$GeographicalRegion)
The effect of these two statements is that if a knowledge engineer tries to enter a new
assertion using the predicate #$hasHeadquartersInRegion and the value of
one of its arguments can be proved to not be an instance of the required type, then the
semantic validation process would reject the assertion.
Many predicates in the Cyc ontology take collections as values. The predicate
#$argIsa can be used to state that a value must be a collection. Additional
assertions can further specify the collection. First, the predicate #$argGenl can be
used to state that the value must be a sub-collection of some collection. Alternatively,
instead of #$Collection, #$argIsa could refer to a second-order collection –
a collection whose instances are collections. #$PersonTypeByPositionInOrg
is a second-order collection whose instances are types of occupation that refer to
organizations (e.g., #$PrimeMinister, #$ChiefExecutiveOfficer).
The predicate #$positionOfPersonInOrg relates a person to a type of occupation and an organization. The following definitional assertions hold of its second argument:
(#$arg2Isa #$positionOfPersonInOrg
#$PersonTypeByPositionInOrg)
(#$arg2Genl #$positionOfPersonInOrg #$Person)
So, although an instance of the collection #$PrimeMinister (e.g., Tony Blair)
could fill the first argument of a #$positionOfPersonInOrg assertion, the
appearance of the same value in the second argument position would not be
semantically well-formed because the collection of individuals is disjoint with the
collection of occupation types, #$PersonTypeByPositionInOrg.
5.2 Representing the Determination
The first step in the process is the representation of the problem, or determination.
Specifically, there is the representation of the target predicate (i.e. the predicate for
4 The Web Ontology Language, OWL, based on its inheritance of certain portions of the
Resource Description Framework, includes similar vocabulary for defining the properties of
predicates. The Cyc ontology includes a larger vocabulary with that function, but the
particular process described here can be applied to OWL ontologies as well.
the head of learned clauses) and the set of predicates for the background knowledge
(i.e., the predicates that could appear in the bodies of learned clauses). The Cyc
ontology uses the function #$ILPDeterminationFn for denoting determinations.
It is a binary function whose first argument position is the target predicate and whose
second argument is the list of background predicates. In order for the target predicate
to be considered a possible body predicate, it would need to be included in the list of
predicates as well.
For example, the following expression represents a determination that targets the
predicate #$hasHeadquartersInRegion with the possible body predicates:
#$residesInRegion, #$seniorExecutives, #$importantCompany,
and #$positionOfPersonInOrg:
(#$ILPDeterminationFn
#$hasHeadquartersInRegion
(#$TheList #$residesInRegion
#$seniorExecutives #$importantCompany
#$positionOfPersonInOrg))
For simplicity’s sake, during the remainder of this section the term
#$Determination01 will be used as a name for the determination denoted by the
above expression.
5.3 Deriving Types
Given a specific determination, Cyc uses its semantic meta-knowledge and its type
hierarchy to determine a more compact representation of the relevant type constraints.
This more compact representation allows us to communicate information to ILP
engines like ALEPH that do not do type reasoning when using inductive biases. The
goal is to try to capture all and only the appropriate linkages among arguments, while
minimizing the size of the representation. Because of the size of the Cyc ontology and the different representational tasks of different predicates, predicates have type constraints at different levels of generality. The basic strategy implemented here is to
look for subsumption relations among argument constraints and then to use only the
most specific types that are used as constraints.
The first step in the process is to generate the set of collections that constrain the
argument positions for the predicates in the determination. The set of these collections
for #$Determination01 is:
{#$Animal, #$PersonTypeByPositionInOrg, #$Person,
#$Organization, #$CommercialOrganization,
#$GeographicalRegion}
The next step is to reduce the set by eliminating any collection that has at least one sub-collection that is also an element of the set. #$Person is a sub-collection of
#$Animal and #$CommercialOrganization is a sub-collection of
#$Organization. So, #$Organization and #$Animal would be eliminated
to produce the new set:
{#$Person, #$PersonTypeByPositionInOrg,
#$CommercialOrganization, #$GeographicalRegion}
For the remainder of this paper, any reference to a “relativized type” should be taken
to refer to an element of this reduced set. Given the original semantic meta-knowledge about the predicates and knowledge of the subsumption and disjointness of collections, a set of rules is invoked whose conclusions use the predicate #$argIsaWRTSpec (“argument is an instance of type with regard to determination”). This predicate is used to represent more focused type constraints for
a predicate within the context of a given determination. For example, relative to the
determination above, Cyc would infer
(#$argIsaWRTSpec
#$hasHeadquartersInRegion 1 #$CommercialOrganization
#$Determination01)
This assertion states that, relative to #$Determination01, the first argument of
the predicate #$hasHeadquartersInRegion should be treated as if it were
constrained to #$CommercialOrganization. In cases where the reduced set
contains several collections that are subsumed by the predicate’s original argument-
constraining collection, they are all represented with #$argIsaWRTSpec
assertions for that predicate’s argument.
5.4 Deriving Modes
For certain predicates, modes are readily definable based on the content of the
underlying relation. For example, in the case of a predicate that relates a list to one of
its members, a mode that requires the list value to be provided (i.e., an input) and the
member value to be derived (i.e., an output) is more efficient than the inverse would
be since the member value is a member of an unlimited number of possible lists. For
the vast majority of predicates in the Cyc ontology, a similar judgment is not as
readily available. The issue is magnified by the fact that the choice of mode settings
can greatly influence the behavior of the ILP system.
As Cyc is an engineering project, the choice of how to proceed with establishing
those connections occurs in the context of improving Cyc’s ability to answer
questions. Specific applications of Cyc will likely focus on a central set of queries that share a common set of predicates, and most of those predicates will be used to describe a small set of types of entities. Assuming that induction will be geared to improving
inference with regard to a particular topic, the current strategy for generating modes is
to focus on one argument of the target predicate. We expect that this focal argument
will be bound during a query. In practice, this binding would be the entity that is the
topic of the question. So, if we wanted to know information about commercial
organizations, we would ask the following sort of query:
(#$hasHeadquartersInRegion #$BMWInc ?PLACE)
Given that the focal type for these queries is #$Organization and that the first
argument of #$hasHeadquartersInRegion is the only one constrained to
#$Organization, we would include the following assertion to specify that the
organization argument is focal for this determination:
(#$focalArgumentOfDeterminationTarget
#$Determination01 1)
Once a focal argument has been designated, the following rules are used to
determine the modes for the determination’s predicates:
– The focal argument of the target predicate has an input mode.
– If the target predicate has an input argument of relativized type T, and a predicate (either target or background) also has an argument of relativized type T, then that second argument has an input mode.
– If the target predicate has an input argument of relativized type T1, a predicate (either target or background) has an argument of relativized type T2, and T1 and T2 are disjoint collections, then that second argument has an output mode.5
This algorithm utilizes the user-provided information of the focal argument for the
target predicate in conjunction with semantic knowledge regarding all of the
predicates in the determination to derive modes for each argument.
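A minimal sketch of these three rules follows. The encoding is ours, and it simplifies by letting only the focal argument seed the set of input types; arguments matching neither rule are left unmoded:

```python
# Sketch of mode derivation from relativized types and a focal argument.

def derive_modes(target, focal_arg, rel_types, disjoint):
    """rel_types: pred -> tuple of relativized types; disjoint: set of frozensets."""
    input_types = {rel_types[target][focal_arg]}   # rule 1: focal argument is input
    modes = {}
    for pred, types in rel_types.items():
        ms = []
        for t in types:
            if t in input_types:
                ms.append("+")                      # rule 2: shares a relativized type
            elif any(frozenset((t, it)) in disjoint for it in input_types):
                ms.append("-")                      # rule 3: disjoint with an input type
            else:
                ms.append("?")                      # neither rule applies
        modes[pred] = tuple(ms)
    return modes

rel_types = {
    "hasHeadquartersInRegion": ("CommercialOrganization", "GeographicalRegion"),
    "importantCompany":        ("CommercialOrganization", "GeographicalRegion"),
}
disjoint = {frozenset(("CommercialOrganization", "GeographicalRegion"))}
modes = derive_modes("hasHeadquartersInRegion", 0, rel_types, disjoint)
```

On the example determination, both arguments of #$importantCompany receive the modes derived in the text: input for the commercial-organization argument, output for the geographical-region argument.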
In CycL, the process concludes with the derivation of assertions that represent the combination of the relativized type for a predicate's argument and that argument's
mode. The predicates #$argIsaInputModeForDetermination and
#$argIsaOutputModeForDetermination are used for this task. In the case
of #$Determination01, we have assumed a focal argument that is constrained to
organizations. Since organizations are disjoint with geographical regions, the system
would draw the following conclusions for the predicate #$importantCompany,
which relates a commercial organization to a geographical region in which it plays a
significant economic role:
(#$argIsaInputModeForDetermination
#$importantCompany
#$CommercialOrganization 1
#$Determination01)
(#$argIsaOutputModeForDetermination
#$importantCompany
#$GeographicalRegion 2
#$Determination01)
The first argument is an input mode because it shares the same relativized type as the
focal argument, #$CommercialOrganization. The second argument is an
output mode because its relativized type, #$GeographicalRegion, is disjoint
with the focal argument’s relativized type.
5 This is a rough reconstruction of the general pattern found in the data sets in the archive at: http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/applications.html. The development of more sophisticated algorithms is a topic for future research.
5.5 More Complex Cases
The method described above is most efficient in cases where the set of type
constraints all involve disjointness and subsumption relationships. First, the
subsumption allows us to minimize the number of collections that need to be
considered. So, we can send out a more compact representation to the ILP engine.
Further, when two collections are disjoint, there should be no relationship
between two arguments that are constrained by those collections. However, for any
pair of collections that are chosen from the ontology, a significant chance exists that
they are neither disjoint nor does one subsume the other. As of March 2005, the
probability that two randomly selected collections from the Cyc KB are orthogonal is
0.35. While such ‘orthogonal collections’ are a minority in the set of collections used
for semantic constraints on predicates, they do occur. For example, the second
argument of #$father is constrained to #$MaleAnimal, while the first argument
of #$rulerInDynasty is constrained to #$Monarch-HeadOfState. Neither
of these collections subsumes the other and they share instances. Depending on the
focal argument of the determination, the rules described above might not return modes in such cases. Four solutions present themselves for handling them.
First, one could introduce a framework that allows for type conversions of
individuals within the rules that govern the use of modes. Such conversions would be
made in the case of individuals that are instances of multiple collections. A second
strategy is the introduction of more general collections that subsume the orthogonal collections to link the different arguments. Third, a more specific collection could be
introduced that represents the intersection of the various orthogonal collections.
Finally, temporarily created specialized predicates could be introduced when communicating the background knowledge to ALEPH. The new predicates would be specialized in the sense that their argument constraints would be the intersection of the orthogonal collections. This would establish subsumption relations and make them
compatible with the simple algorithm described above.
First, the “Casting” strategy is intended for cases where predicates are constrained
by orthogonal collections and have values that are instances of both orthogonal
classes (e.g., a father who is a ruler in a dynasty). The strategy focuses on the type
constraints of predicates, but also takes into account the fact that some entities are
instances of more than one collection. So, if an argument is constrained to a
particular type that is orthogonal to another predicate’s constraining type, and the
value of that argument is also an instance of the orthogonal collection, then that value
should be usable with the other predicate. The principal cost of this strategy is a
dramatic increase in the number of predicates, in proportion to the number of individuals
that are instances of multiple collections. In addition, since type conversion has to be
bi-directional, there is the potential of catching ALEPH in a conceptual loop, as it
oscillates back and forth between two facets of a term.
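Under hypothetical names (nothing here is actual Cyc or ALEPH vocabulary), the casting idea can be sketched as generating bridging facts for each individual known to instantiate both orthogonal collections:

```python
# Sketch of the "Casting" strategy: for each individual that is an
# instance of two orthogonal collections, emit a bridging fact so a
# value produced under one type constraint can be consumed under the
# other. All predicate and collection names are illustrative.

instances = {
    "GeorgeVI": {"MaleAnimal", "Monarch-HeadOfState"},
    "Rex": {"MaleAnimal"},
}

def casting_facts(individuals, type_a, type_b):
    facts = []
    for ind, types in individuals.items():
        if type_a in types and type_b in types:
            # Casts are bi-directional, which is the source of the
            # oscillation risk described in the text.
            facts.append(f"castAs({ind},{type_a},{type_b})")
            facts.append(f"castAs({ind},{type_b},{type_a})")
    return facts

print(casting_facts(instances, "MaleAnimal", "Monarch-HeadOfState"))
```

Only "GeorgeVI" yields casts here; "Rex" instantiates a single collection and needs none.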
Second, the “Type Generalization” strategy walks up the type hierarchy to find the
least general collection that subsumes the orthogonal collections. From a strictly
semantic perspective, the use of the more general types violates the meaning of the
predicate. This violation translates into a decrease in the utility of type knowledge:
since fewer distinctions are being made, the search space is partitioned more weakly.
In the degenerate case, all type knowledge is lost if the least general collection
is general enough to subsume all the argument constraints of all the predicates
in the determination.
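The walk up the hierarchy can be sketched as a least-general-subsumer computation. The hierarchy and collection names below are a hypothetical stand-in for Cyc's #$genls lattice:

```python
# Sketch of the "Type Generalization" strategy: find the least general
# collection subsuming both orthogonal types in a toy hierarchy.

HIERARCHY = {
    "MaleAnimal": {"Animal"},
    "Monarch-HeadOfState": {"Person"},
    "Person": {"Animal"},
    "Animal": {"Thing"},
    "Thing": set(),
}

def ancestors_with_depth(c):
    """Map each subsumer of c (including c) to its distance from c."""
    depths, frontier, d = {c: 0}, {c}, 0
    while frontier:
        d += 1
        frontier = {p for x in frontier for p in HIERARCHY[x]
                    if p not in depths}
        for p in frontier:
            depths[p] = d
    return depths

def least_general_subsumer(a, b):
    da, db = ancestors_with_depth(a), ancestors_with_depth(b)
    common = set(da) & set(db)
    # Pick the common subsumer closest to the two arguments; in the
    # degenerate case this climbs all the way to the root ("Thing"),
    # and all type knowledge is lost.
    return min(common, key=lambda c: da[c] + db[c]) if common else None

print(least_general_subsumer("MaleAnimal", "Monarch-HeadOfState"))  # Animal
```

Generalizing #$MaleAnimal and #$Monarch-HeadOfState to "Animal" links the arguments, but at the cost of a much weaker constraint.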
Third, the “Type Specialization” strategy introduces more specific types that allow
the system to maintain the semantic integrity of the predicates and the benefits that
follow from partitioning the search space. However, this strategy requires the
generation of significantly more types and a proportionately larger group of mode
statements to properly connect them to the predicates.
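The growth in types and mode statements can be sketched as follows. The mode strings mimic ALEPH's modeb syntax, but the intersection-type naming scheme and the simple first-argument-is-input convention are illustrative assumptions, not the paper's actual generation rules:

```python
# Sketch of the "Type Specialization" strategy: invent an intersection
# type for each orthogonal pair and regenerate mode declarations for
# every predicate whose constraints mention either member of the pair.

def specialize(orthogonal_pairs, predicates):
    """predicates: {name: tuple of argument type constraints}"""
    new_types, modes = {}, []
    for a, b in orthogonal_pairs:
        t = f"{a}And{b}"  # invented intersection collection (illustrative)
        new_types[(a, b)] = t
        for pred, args in predicates.items():
            if any(arg in (a, b) for arg in args):
                # Re-declare the predicate with the tighter type in place
                # of the orthogonal one; first argument treated as input.
                typed = [t if arg in (a, b) else arg for arg in args]
                argstr = ",".join(("+" if i == 0 else "-") + x
                                  for i, x in enumerate(typed))
                modes.append(f"modeb(*,{pred}({argstr}))")
    return new_types, modes

pairs = [("MaleAnimal", "Monarch-HeadOfState")]
preds = {"father": ("Animal", "MaleAnimal"),
         "rulerInDynasty": ("Monarch-HeadOfState", "Dynasty")}
types, modes = specialize(pairs, preds)
print(modes)
```

Each orthogonal pair multiplies the mode statements by the number of predicates it touches, which is the cost the text notes.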
Fourth, the “Predicate Specialization” strategy introduces more specific predicates.
It requires creating specialized predicates whose argument positions have
significantly tighter constraints, and then distributing the original data to the new
predicates based on the types of their values. Here, the problem of having multiple
linkages to the same argument positions is eliminated, because the examples are
partitioned according to the new predicates’ tighter semantics. However, there are two
problems with this approach. First, the strategy increases the likelihood of the system
returning over-fitted rules: if an invented predicate has a very small extent, the
likelihood increases that it will satisfy some of the positive examples even though the
more general predicate from which it was derived would not. Second, if the
system’s evaluation function has a relatively high threshold for acceptability (e.g., a
minimum number of positive examples covered), the split might be such that
hypotheses using the new predicate fall beneath that threshold while a rule formed
with the original predicate would be above it.
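The partitioning step can be sketched directly. The predicate and collection names are illustrative, and the type test is a flat lookup rather than a real subsumption check:

```python
# Sketch of the "Predicate Specialization" strategy: split a predicate's
# extent into specialized predicates according to the type of the value
# filling a chosen argument position.

def instance_of(ind, coll, instances):
    return coll in instances.get(ind, set())

def specialize_predicate(pred, facts, arg_index, subtypes, instances):
    """Distribute (pred, args) facts to pred_<subtype> predicates."""
    partition = {f"{pred}_{t}": [] for t in subtypes}
    for args in facts:
        for t in subtypes:
            if instance_of(args[arg_index], t, instances):
                partition[f"{pred}_{t}"].append(args)
    return partition

instances = {
    "GeorgeVI": {"MaleAnimal", "Monarch-HeadOfState"},
    "Rex": {"MaleAnimal"},
}
facts = [("Elizabeth", "GeorgeVI"), ("Puppy", "Rex")]
out = specialize_predicate("father", facts, 1,
                           ["Monarch-HeadOfState"], instances)
print(out)
```

Only the fact whose second argument instantiates the tighter type survives into the specialized predicate; the shrunken extent is precisely what raises the over-fitting and threshold concerns above.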
6 Conclusions and Future Research
The next and most important stage in this work is the empirical investigation of the
utility of this strategy for ILP systems. Existing data sets need to be ontologized so
that a system like ALEPH can be tested with ontologically derived settings versus
hand-generated settings and the settings provided by other mode-learning algorithms.
In addition, our presentation has focused on only a subset of the types of semantic
meta-knowledge available within Cyc. Cyc contains a large amount of
unexploited meta-knowledge, such as that expressed by #$interArgReln and
#$interArgIsa. The former predicate is used to say that a particular binary
relation must hold between the values of some predicate. The latter dictates
that when an instance of one collection appears in one argument of a predicate,
an instance of a specified collection must appear in some other argument (e.g., a
human child can only have a human as a parent).
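An #$interArgIsa-style constraint can be sketched as a conditional type check. The collection names and the flat instance lookup below are illustrative assumptions, not Cyc's actual vocabulary or implementation:

```python
# Sketch of enforcing an #$interArgIsa-style constraint: if the value in
# one argument position instantiates a trigger collection, the value in
# another position must instantiate a second collection.

def check_inter_arg_isa(fact_args, pos1, coll1, pos2, coll2, instances):
    if coll1 in instances.get(fact_args[pos1], set()):
        return coll2 in instances.get(fact_args[pos2], set())
    return True  # the constraint only fires when the trigger type matches

instances = {"Alice": {"HumanChild"}, "Bob": {"Human"}, "Rex": {"Dog"}}

# "a human child can only have a human as a parent"
assert check_inter_arg_isa(("Alice", "Bob"), 0, "HumanChild",
                           1, "Human", instances)
assert not check_inter_arg_isa(("Alice", "Rex"), 0, "HumanChild",
                               1, "Human", instances)
```

Translated into inductive bias, such a constraint would prune any hypothesis that pairs argument values violating the conditional type requirement.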
The content of the Cyc KB itself can also serve as a resource for generating
inductive biases. It contains a large body of rules expressed as either implications or
as ground assertions that can be expanded to implications via templates. These rules
could be used to divert the search away from known rules or hypotheses that are
entailed by existing rules. Preliminary work has been done to convert rules into
formats that use ALEPH’s pruning mechanism toward this end.
Finally, we have introduced only a simple methodology for generating mode
settings based on type constraints. More sophisticated strategies for deriving modes
from the focal type and the relationships among the type constraints of the
determination should become available with further research. Given that the rules for
generating modes are declaratively represented, representing the modes themselves,
along with rules for interpreting them, could lead to further rule-learning improvements.