R. Meersman and Z. Tari et al. (Eds.): OTM 2007, Part I, LNCS 4803, pp. 897–914, 2007.
© Springer-Verlag Berlin Heidelberg 2007
OntoPath: A Language for Retrieving Ontology Fragments
E. Jiménez-Ruiz¹, R. Berlanga¹, V. Nebot¹, and I. Sanz²
¹ Departamento de Lenguajes y Sistemas Informáticos
² Departamento de Ingeniería y Ciencia de los Computadores
Universitat Jaume I (Castellón)
{ejimenez, victoria.nebot, berlanga, isanz}@uji.es
Abstract. In this work we introduce a novel retrieval language, named
OntoPath, for specifying and retrieving relevant ontology fragments. This
language is intended to extract customized self-standing ontologies from very
large, general-purpose ones. Through OntoPath, users can specify the desired
detail level in the concept taxonomies as well as the properties between
concepts that are required by the target applications. The syntax and aims of
OntoPath resemble XPath’s in that they are simple enough to be handled by
non-expert users and they are designed to be included in other XML-based
applications (e.g. transformation sheets, semantic annotation of web services,
etc.). OntoPath has been implemented on top of the graph-based database
G, for which a Protégé OWL plug-in has been designed to access and retrieve
ontology fragments.
Keywords: Query Languages, Ontologies, Semantic Web, Ontology Fragments.
1 Introduction
The Semantic Web is intended to extend the current web by creating knowledge
objects that allow both users and programs to better exploit the huge amount of
resources. The cornerstone of the Semantic Web is the definition of widely agreed
ontologies conceptualizing the knowledge behind web resources. Nowadays, several
efforts are focused on building very large ontologies that are continuously growing as
new knowledge is added to them by the respective communities. This happens mainly
in the biomedical domain (e.g. GO¹, GALEN², FMA³, NCI Thesaurus⁴, Tambis⁵, BioPax⁶, etc.).
¹ Gene Ontology: http://www.geneontology.org/
² Galen Ontology: http://www.opengalen.org/
³ Foundational Model of Anatomy: http://fma.biostr.washington.edu/
⁴ NCI taxonomy/ontology: http://nciterms.nci.nih.gov/NCIBrowser
⁵ Tambis Ontology: http://www.cs.man.ac.uk/~stevensr/tambis-oil.html
⁶ BioPax Ontology: http://www.biopax.org/
However, the very large size of these ontologies, as well as their variety, makes it
difficult to deploy them in particular applications. The first problem is scalability.
Most of these ontologies are expressed in OWL-DL, with different degrees of
expressivity. The classification of new concepts, queries and assertions requires the use
of reasoners (e.g. Pellet⁷, Racer⁸, etc.), but these are not able to handle even medium-size
ontologies [1, 2]. A second problem is the use of these ontologies in
concrete applications. Such an application usually does not require comprehensive
descriptions of the domains but rather a small subset of concepts and properties
from them (i.e. a local view of the domain [3]). Another important issue is
visualization: ontology editors have difficulty loading and properly displaying
large ontologies.
This paper presents a new query language, OntoPath, for retrieving consistent
fragments from domain ontologies. It is based on the XPath [24] syntax but it is
applied over ontologies described in OWL. In contrast to other Semantic Web query
languages such as SparQL [4] and RQL [5], this language is intended to extract the
necessary knowledge to build new ontologies according to user and application
requests. As a consequence, query results are also OWL files.
OntoPath is being applied within the Health-e-Child project [25] as a mechanism
to explore large biomedical ontologies and also to perform the vertical integration of
the biomedical knowledge via the definition of ontology fragments and connections
between them (i.e. semantic bridges) [6]. Another direct application of OntoPath is
the extraction of analysis dimensions for building OLAP-based multidimensional
models for biomedical research [7]. Finally, it is also being applied to the
composition of Semantic Web Services. In this case, semantic services are
represented as concepts and their input/output parameters as their properties. Service
chains can then be obtained through OntoPath queries [8].
The rest of the paper is organized as follows. Section 2 gives a brief review of
related work. An overview of the proposal is presented in Section 3. Section 4
describes how ontologies are parsed and stored into the semi-structured database G.
OntoPath syntax and semantics are presented in Section 5. Section 6 is dedicated to
OntoPath query processing. Experiments and examples are presented in Section 7.
Finally, Section 8 gives some conclusions and future work.
2 Related Work: Ontology Modularization and Querying
The necessity of working with ontology modules is well known in the literature and
several approaches can be found to modularize and query ontologies. In [1] a good
review of ontology modularization formalisms is presented, among them: Distributed
Description Logic [9], Modular Reuse of Ontologies [10], Package Based DL [11]
and Network Partitioning [12]. Although these works propose good formalisms to
automatically define ontology modules and their connections, they do not provide
mechanisms to guide the modularization, that is, users do not participate in the
partitioning of the ontology.
⁷ Pellet reasoner: http://pellet.owldl.com/
⁸ RacerPro reasoner: http://www.racer-systems.com/
As an alternative to these approaches, traversal-based methods aim to extract
ontology fragments according to user preferences, usually specified as a set of
concepts and properties of interest. PROMPT [13], MOVE [14] and OntoPathView
[18] are examples of traversal-based methods. All of them deal with pure frame-based
ontologies. This is a serious limitation since most ontologies are now expressed in
the standard language OWL, whose underlying model is Description Logic (DL).
Recently, Seidenberg and Rector [2] have proposed a segmentation algorithm to
extract fragments (i.e. segments) from large ontologies in OWL, more specifically the
GALEN ontology. This algorithm resembles traversal-based approaches in that it
starts from a concept of interest and then navigates through the concept hierarchy to
identify not only related properties and concepts but also any element necessary to
describe and classify them. Thus, their main aim is to keep as much as possible of the
inference capabilities of the original ontology in the resulting segments. However,
the resulting segments tend to be very large, and some of them remain intractable for
current editors and reasoners. This is because the affected axioms can be scattered
across the whole concept hierarchy.
In this paper, we present a different approach for extracting fragments from OWL
ontologies. Our method differs from the previous one in that we do not preserve all
the axioms that are necessary to classify the extracted concepts. Instead, our approach
makes explicit much of this knowledge in the resulting fragments. In other words,
instead of generating new OWL with implicit knowledge to be inferred, we generate
specific and explicit fragments that will not require reasoning capabilities. As a
consequence, much smaller and more specific fragments can be obtained. This is especially
useful when generating OWL fragments that will be used in specific final applications
(e.g. data warehouses).
It is worth mentioning that there exist other query languages for the Semantic Web,
such as RQL [5], SparQL [4], KAON views [15] and RVL [23]. However, these
languages deal with RDF/S ontologies, which are less expressive than OWL ones.
Moreover, their aim is not to build new closed and consistent ontologies, as ours is, but
to provide relational-like data satisfying user requests (e.g. all elements with a specific
date and author).
3 Ontology Management System Architecture
The development of OntoPath has been carried out in the context of a new database
management system aimed at storing, organising and retrieving customized pieces of
knowledge to be used in specific applications [16]. This system relies on the
semi-structured database G⁹ and the language OWL. The former was chosen because of its
great capability for storing and retrieving large graph-like data. Figure 1 summarizes
the system architecture, whose main modules are described in turn.
⁹ A. Rios, “G Platform (Helide S.A)”: http://www.helide.com, http://www.maat-g.com
- OWL parser¹⁰ and builder. The former consists of an ad-hoc SAX-based parser,
which builds the necessary objects from an input ontology to make it persistent in
the database. The latter allows OWL files to be generated from database objects.
Fig. 1. Architecture of the Ontology Management System
- G database. It is used as a backend to store, index and retrieve the OWL
ontologies as graphs. To store ontologies, four object types are defined, namely
ontology, property, concept, and enumeration, the last one for nominal lists. Figure 2
summarizes the four object types and the existing references between them. All
OWL constructors have been taken into account so that ontologies can be loaded and
retrieved without semantic loss.
Fig. 2. Meta-schema for storing OWL Ontologies into G
- OntoPath. It is the proposed query language for retrieving ontology fragments
(personalized modules or views) from the domain ontologies. By using OntoPath,
users can specify the desired detail level in the concept taxonomies as well as the
properties that are required by the target applications. OntoPath is the main focus
of this work.
- Protégé-OWL. It is the front-end of the system, which allows users to visualize and
edit the stored ontologies (fragments or whole ones). A specific plug-in has been
recently designed to create and browse ontology modules and fragments [19].
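The SAX-based parsing step mentioned for the OWL parser can be illustrated with a minimal sketch. It is not the authors' parser: it relies on Python's standard xml.sax module, assumes RDF/XML input that uses the conventional owl:, rdf: and rdfs: prefixes, and only collects named classes with their named parents. The names OwlClassHandler, parse_owl_classes and SAMPLE are hypothetical.

```python
import xml.sax


class OwlClassHandler(xml.sax.ContentHandler):
    """Collects named OWL classes and their named superclasses."""

    def __init__(self):
        super().__init__()
        self.sub_class_of = {}  # class name -> list of named parents
        self._stack = []        # currently open owl:Class elements

    @staticmethod
    def _local(ref):
        # "#Person" or "http://...#Person" -> "Person"
        return ref.split("#")[-1]

    def startElement(self, name, attrs):
        if name == "owl:Class":
            ref = attrs.get("rdf:ID") or attrs.get("rdf:about")
            cls = self._local(ref) if ref else None  # None = anonymous class
            self._stack.append(cls)
            if cls is not None:
                self.sub_class_of.setdefault(cls, [])
        elif name == "rdfs:subClassOf" and self._stack and self._stack[-1]:
            parent = attrs.get("rdf:resource")
            if parent:
                self.sub_class_of[self._stack[-1]].append(self._local(parent))

    def endElement(self, name):
        if name == "owl:Class":
            self._stack.pop()


def parse_owl_classes(xml_bytes):
    """Parse an RDF/XML document (bytes) and return the class hierarchy."""
    handler = OwlClassHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.sub_class_of


# Hypothetical two-class ontology used only for illustration.
SAMPLE = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Person"/>
  <owl:Class rdf:ID="Student">
    <rdfs:subClassOf rdf:resource="#Person"/>
  </owl:Class>
</rdf:RDF>"""

classes = parse_owl_classes(SAMPLE)
```

A streaming parser of this kind never holds the whole document in memory, which is presumably why a SAX-style design suits very large ontologies.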
¹⁰ Python OWL Parser: http://krono.act.uji.es/people/Ernesto/Owl_Parser/
4 Ontology Parsing and Storage
From now on, we assume that ontologies are expressed in OWL-DL 1.0 (which uses a
subset of the DL $\mathcal{SHOIN}(D)$ [20]), and therefore we use standard Description
Logic (DL) syntax when discussing semantic issues in the paper.
As previously mentioned, ontologies are parsed and stored in the G database as
graphs (see Figure 2), where nodes represent ontology entities (e.g. concepts,
properties and nominals) and edges between nodes represent their different
relationships (e.g. subClassOf, domain, range, intersectionOf, etc.). From now on, we
represent G database records as follows:
$R = Type(att_1 = v_1, \dots, att_n = v_n)$
where Type is the type of the record (e.g. ontology, concept, etc.), and $att_i = v_i$
states that the record attribute $att_i$ takes the value $v_i$. Multi-valued attributes are also
allowed, taking the form $att_i = v_1, \dots, v_k$. In order to support the heterogeneity of the
database records, the attribute lists associated with a record type can be of arbitrary length
(i.e. optional attributes are allowed) and format (i.e. an attribute can take values of
different data types in each record). Additionally, values can be references to other
database records. For this purpose, unique record identifiers are used. As an example,
the following records represent a simple ontology with two concepts and one
property:
O = ontology(name='Simple.owl', rootConcept=C1, rootProperty=P1)
C1 = concept(name='Thing')
P1 = property(name='PropertyThing')
C2 = concept(name='Person', subClassOf=C1)
P2 = property(name='hasFriend', range=C2, domain=C2, subPropertyOf=P1)
P1, P2, C1, C2 and O are the unique identifiers of the corresponding data records.
To denote an attribute att of a record R, we use R.att, for example C2.subClassOf.
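The record notation above maps naturally onto a small data structure. The following is a minimal sketch, assuming an in-memory stand-in for G records; the class Record is hypothetical and is not part of the G platform API. Optional attributes, heterogeneous value types and references between records are all covered by storing the attributes in a dictionary.

```python
class Record:
    """One G-style record: a type name plus an open set of attributes."""

    def __init__(self, rtype, **attrs):
        self.rtype = rtype
        self.attrs = dict(attrs)

    def __getattr__(self, att):
        # Only called when normal lookup fails; mirrors the R.att notation.
        try:
            return self.__dict__["attrs"][att]
        except KeyError:
            raise AttributeError(att) from None


# The Simple.owl example; values that are Record objects act as references.
C1 = Record("concept", name="Thing")
P1 = Record("property", name="PropertyThing")
C2 = Record("concept", name="Person", subClassOf=C1)
P2 = Record("property", name="hasFriend", range=C2, domain=C2, subPropertyOf=P1)
O = Record("ontology", name="Simple.owl", rootConcept=C1, rootProperty=P1)

parent_of_person = C2.subClassOf.name  # follows the reference C2 -> C1
```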
As explained in the next sections, OntoPath query processing relies on traversing both
the concept hierarchy and the association graph between concepts. The former
consists of the links present in subClassOf record attributes, whereas the latter
consists of the property records linking concepts through their domains and ranges.
As OWL properties can appear in many different contexts (i.e. class restrictions),
we introduce another record type named inferred property (i-property). An i-property
record has a similar structure to original property records, but its interpretation must
be restricted to the context in which it occurs. For example, considering the class definition
$C \sqsubseteq A \sqcap \exists R.B$, the record i-property(name=R, domain=C, range=B) means that in the
context of C, some instances of A are related through R to some instances of B. For
this reason, i-property records have attached the DL expression from which they
were extracted.
Ontology instances (i.e. individuals in DL jargon) are also stored as database
records, preserving their references through database object references. For example,
the following instances belong to the previous ontology:
I1 = instance(name='John', hasFriend=I2, type=C2)
I2 = instance(name='Peter', hasFriend=I1, type=C2)
The following subsections describe how complex DL expressions resulting from OWL
constructors can be approximated to enrich database records as much as possible, and
thereby make explicit the knowledge necessary for solving OntoPath queries.
4.1 Approximating DL Expressions
As mentioned in the introduction, large ontologies cannot be classified by current DL
reasoners due to their inherent exponential complexity. This means that concepts that
are not explicitly related to others cannot be properly classified, and will be
unconnected to the rest of the ontology. However, there are some simple and frequent
patterns appearing in OWL definitions that can be treated without requiring tableau-
like reasoning. The application of these patterns establishes further relations between
concepts and properties in the database, and they will provide a more complete vision
of the ontology as a graph.
The current list of patterns is as follows:
1. Inferred parents: if a concept is defined as an intersection of a set of concepts
(either as an equivalence or as a subset), we can infer that it is a subclass of each of them.
$C \; (\equiv \mid \sqsubseteq) \; C_1 \sqcap \dots \sqcap C_n \;\Rightarrow\;$ C.subClassOf = $C_1, \dots, C_n$
2. Inferred children: if a class is defined as equivalent to the union of a set of
classes, these classes are its subclasses.
$C \equiv C_1 \sqcup \dots \sqcup C_n \;\Rightarrow\;$ $C_1$.subClassOf.append(C), …, $C_n$.subClassOf.append(C)
3. Inferred domains: if a restriction of the form $\exists R.D$ or $\forall R.D$ is found in a class
definition, the corresponding property record is built:
$C \sqsubseteq (\exists \mid \forall) R.D \;\Rightarrow\;$ i-property(name=R, domain=C, range=D).
Both C and D must be named classes. Notice that this pattern is also applied to
cardinalities, but the i-property is created without a defined range (qualified
cardinalities are not considered).
4. Creation of new classes: if an axiom of the form $C \sqsubseteq \exists R.(D \sqcap \exists S.E)$ is found,
C and D being named classes, then a new named class D’ ($D' \equiv D \sqcap \exists S.E$) is
created, with D’.subClassOf=D. Notice that by applying pattern 3, the following
property record is also created: i-property(name=R, domain=C, range=D’).
Finally, pattern 3 is recursively applied to the sub-expression ($\exists S.E$). New
class names are generated from the elements of the expression:
D’.name=D_with_S_E.
5. Nominals: the occurrence of nominals is treated as a special case of pattern 3 if
they appear inside a restriction (the allValuesFrom, someValuesFrom and hasValue
cases). The corresponding i-property record is created (inferred domain), and also
an enumeration record for the set of nominals. Notice that for hasValue cases
the enumeration record will have only one instance value associated.
$C \sqsubseteq (\exists \mid \forall) R.\{i_1, i_2, \dots, i_n\} \;\Rightarrow\;$ i-property(name=R, domain=C, enumeration=E).
$C \sqsubseteq \exists R.\{i_1\} \;\Rightarrow\;$ i-property(name=R, domain=C, enumeration=E).
E = enumeration(name=R-nominals, list-of-values=$i_1, \dots, i_n$)
These patterns are applied when parsing the OWL file. For each class definition, a
DL expression is built from the corresponding OWL constructors, and it is normalized
(conjunctive form) before applying the patterns. These DL expressions are also stored
in the database records of their classes.
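As an illustration of how patterns 1, 3 and 4 might be applied to a normalized class definition, consider the following sketch. The tuple encoding of DL expressions, the output record tuples and the function names filler_class and approximate are all assumptions made for this example; only existential restrictions are handled, and every conjunction used as a filler is assumed to contain a named class.

```python
def filler_class(expr, records):
    """Return a named class for a restriction filler, creating D' if needed (pattern 4)."""
    if isinstance(expr, str):
        return expr  # already a named class
    # expr is ("and", [D, ("some", S, E), ...]) where D is a named class
    _, operands = expr
    base = next(o for o in operands if isinstance(o, str))
    parts, targets = [base], []
    for _, role, inner in (o for o in operands if not isinstance(o, str)):
        target = filler_class(inner, records)  # recursion handles nested fillers
        targets.append((role, target))
        parts.append(f"with_{role}_{target}")
    new_name = "_".join(parts)  # e.g. D_with_S_E
    records.append(("concept", new_name, base))  # D'.subClassOf = D
    for role, target in targets:
        records.append(("i-property", role, new_name, target))  # pattern 3 on D'
    return new_name


def approximate(name, expr, records):
    """Apply patterns 1, 3 and 4 to a definition of the class `name`."""
    if isinstance(expr, tuple) and expr[0] == "and":
        operands = expr[1]
    else:
        operands = [expr]
    for op in operands:
        if isinstance(op, str):
            records.append(("subClassOf", name, op))  # pattern 1
        else:
            _, role, filler = op
            target = filler_class(filler, records)
            records.append(("i-property", role, name, target))  # pattern 3


# The generic axiom of pattern 4: C defined through R.(D and S.E)
records = []
approximate("C", ("some", "R", ("and", ["D", ("some", "S", "E")])), records)
```

After the call, records holds the new class D_with_S_E (a subclass of D) and two i-property entries, one from C through R and one from D_with_S_E through S.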
In order to better understand the proposed patterns, let us consider the following
example class definitions in DL. The first one has been extracted from the Pizza¹¹
ontology and the second one has been taken from a portion of the GALEN ontology
expressed in OWL¹²:
Pizza Example
$SpicyPizza \equiv Pizza \sqcap \exists hasTopping.(PizzaTopping \sqcap \exists hasSpiciness.Hot)$
By applying the previous patterns, the following records are generated; notice that a
new class has been created (PizzaTopping_with_hasSpiciness_Hot) and new explicit
relationships are stored (i-properties i-hS and i-hT):
P = concept(name=Pizza, …)
SP = concept(name=SpicyPizza, subClassOf=P)
PT = concept(name=PizzaTopping, …)
H = concept(name=Hot, …)
PTH = concept(name=PizzaTopping_with_hasSpiciness_Hot, subClassOf=PT)
hT = property(name=hasTopping, …)
hS = property(name=hasSpiciness, …)
i-hT = i-property(name=hasTopping, subPropertyOf=hT, domain=SP, range=PTH)
i-hS = i-property(name=hasSpiciness, subPropertyOf=hS, domain=PTH, range=H)
Figure 3 shows the representation of the previous records, with all the DL
constructors represented explicitly in the graph by means of relationships between
concepts.
Fig. 3. Graph representation for Pizza example
Let us emphasize that the creation of class PizzaTopping_with_hasSpiciness_Hot
(by means of pattern 4) is also proposed in the Pizza-OWL Tutorial¹³, in which
the class HotPizzaTopping is created manually.
¹¹ Pizza OWL: http://www.co-ode.org/ontologies/pizza/
¹² Galen ontology in OWL: http://www.cs.man.ac.uk/~horrocks/OWL/Ontologies/galen.owl
¹³ Pizza-OWL Tutorial: http://www.co-ode.org/resources/tutorials/protege-owl-tutorial.php
Galen Example
$AcuteIschaemicCardiacPathology \equiv CardiacPathology \sqcap \exists isConsequenceOf.(Ischaemia \sqcap \exists hasChronicity.(Chronicity \sqcap \exists hasState.Acute))$
For the previous DL expression two classes are created by applying pattern 4
recursively (Ischaemia_with_hasChronicity_Chronicity_with_hasState_Acute and
Chronicity_with_hasState_Acute). Next we present the records created for the DL
expression:
CP = concept(name=CardiacPathology, …)
AICP = concept(name=AcuteIschaemicCardiacPathology, subClassOf=CP, …)
I = concept(name=Ischaemia, …)
C = concept(name=Chronicity, …)
A = concept(name=Acute, …)
ICA = concept(name=Ischaemia_with_hasChronicity_Chronicity_with_hasState_Acute, subClassOf=I, …)
CA = concept(name=Chronicity_with_hasState_Acute, subClassOf=C)
iCO = property(name=isConsequenceOf, …)
hC = property(name=hasChronicity, …)
hS = property(name=hasState, …)
i-iCO = i-property(name=isConsequenceOf, subPropertyOf=iCO, domain=AICP, range=ICA, …)
i-hC = i-property(name=hasChronicity, subPropertyOf=hC, domain=ICA, range=CA, …)
i-hS = i-property(name=hasState, subPropertyOf=hS, domain=CA, range=A, …)
Finally, it is worth mentioning that for the Galen portion (3002 classes and 413
properties), more than two hundred classes are generated following pattern 4, and nearly
two thousand inferred properties following patterns 3 and 5.
4.2 Expressivity of Stored and Retrieved Ontologies
During the storage of an ontology, a DL reasoner can be applied to classify all its
concepts and instances according to the logic subsumption relationship. However, if
such a reasoner is not available or it cannot be applied to the ontology due to its size,
relations between concepts will be only approximate as some subsumption
relationships cannot be detected. This is especially critical when union, negations or
complex axioms (i.e. C
R.D S.E F ¬G ) are included in the concept
definitions, as these constructors can only be treated under very strict conditions. In
these cases, a set of anonymous classes is created to maintain the graph structure,
and the original DL expressions are kept in the concept records so that they can be
reconstructed when required.
Table 1 summarizes the treatment applied during storage to each class or role
constructor. Notice that OWL DL ontologies using constructors that are not treated
will produce approximate answers (i.e. possibly incomplete) to OntoPath queries.
Fortunately, many domain ontologies seldom use negation and union in the concept
definitions.
Table 1. Treated DL constructors
Constructors                                          Treatment
$\forall R.C$ - Universal Restrictions                Pattern 3
$\sqcap$ - Concept Intersection                       Pattern 1
$\exists R.C$ - Existential Qualifications            Pattern 3
$\neg$ - Complex Concept Negation                     Not treated
$\sqcup$ - Concept Union                              Pattern 2, only the case of concept equivalence
$\mathcal{H}$ - Role Hierarchy                        Stored role attribute subPropertyOf
$\mathcal{R}$ - Complex Role Inclusion Axioms         Not considered for OWL 1.0
$\mathcal{I}$ - Inverse Properties                    Stored role attribute inverseOf
$\mathcal{O}$ - Nominals                              Pattern 5
$\mathcal{N}$ - Cardinality Restrictions              Pattern 4
$\mathcal{Q}$ - Qualified Cardinality Restrictions    Not considered for OWL 1.0
(D) - Data Type Properties                            Stored role attribute PropertyType
5 OntoPath Queries
An OntoPath query is a tree relating concepts and properties of the ontology. As in
XPath, trees are expressed through paths with predicates that impose conditions
through the traversed nodes. However, unlike XPath (and SparQL), the answer to an
OntoPath query is not a set of pointers to the nodes in the graph satisfying the
specified conditions, but a consistent and closed fragment of the ontology (i.e. a sub-
graph) containing the required concepts and properties. Consistent means that all
concepts in the fragment are satisfiable, whereas closed means that the fragment
includes all the necessary concepts and properties to make it consistent. Notice that
closedness and consistency may involve changes over the original knowledge, mainly
over frontier concepts of the ontology fragment, in which a partial definition of them
has been extracted. Nevertheless, the main objective is to define a party’s view of a
domain [3].
5.1 Query Expressions
An OntoPath query contains two types of tree nodes: concept nodes and property
nodes. A property node has only one child concept node, whereas a concept node
can have more than one child property node. Tree nodes also contain predicates that
must be satisfied in the retrieved fragment. Figure 4 shows the interpretation of a
simple OntoPath query like Disease/related-to/Gene, where an ontology fragment is
extracted with the diseases (if any exist) directly related by means of the related-to
property to the Gene concept.
The simplest query in OntoPath consists of specifying a concept C. The result
consists of an ontology fragment containing all its sub-concepts (i.e. the concept
taxonomy under C) and all its instances. Neither object properties nor relations
between the instances involved in the fragment are retrieved. Data type properties
associated with these concepts (direct and inferred ones) are always included in the fragment.
Fig. 4. Graphical interpretation of the OntoPath query Disease/related-to/Gene
Let sub(C) denote the set of all the sub-concepts of C, and sub(P) be the set of all
the sub-properties of P and their respective i-property records. To state aggregation
conditions over concepts, we use the syntax: C/P/C’, where C and C’ represent
concepts, and P denotes a property. A fragment satisfying this query contains those
sub-concepts of C and C’, denoted R(C) and R(C’) respectively, as well as those
sub-properties of P, denoted R(P), such that:
- $R(C) \subseteq sub(C)$, $R(C') \subseteq sub(C')$ and $R(P) \subseteq sub(P)$
- $\forall p \in R(P)$, $dom(p) \in R(C)$ and $range(p) \in R(C')$
- $\forall c \in R(C)$, $\exists p \in R(P)$ such that $c = dom(p)$
- $\forall c \in R(C')$, $\exists p \in R(P)$ such that $c = range(p)$
We denote by I(C) the direct instances (individuals) of C. Thus, the query
C/P/C’ also retrieves all the direct instances of all concepts in sub(C) and sub(C’).
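The conditions on R(C), R(P) and R(C’) can be made concrete with a small sketch over a toy ontology. The function evaluate_step and all concept and property names below are hypothetical, and sub(C) is assumed to include C itself. The sketch first keeps the properties whose domain and range fall inside the requested taxonomies, then derives the concept sets from them, which yields sets satisfying the conditions stated above.

```python
def evaluate_step(sub_c, sub_p, sub_c2, dom, rng):
    """Compute R(C), R(P), R(C') for a single step C/P/C'."""
    # Keep the properties connecting the two taxonomies.
    r_p = {p for p in sub_p if dom[p] in sub_c and rng[p] in sub_c2}
    # Every retained concept is the domain (or range) of some retained property.
    r_c = {dom[p] for p in r_p}
    r_c2 = {rng[p] for p in r_p}
    return r_c, r_p, r_c2


# Toy data for a query of the form Disease/related-to/Gene (names are illustrative).
sub_disease = {"Disease", "RheumatoidArthritis", "Flu"}
sub_related = {"related-to-1", "related-to-2"}
sub_gene = {"Gene", "GenePTPN22"}
dom = {"related-to-1": "RheumatoidArthritis", "related-to-2": "Flu"}
rng = {"related-to-1": "GenePTPN22", "related-to-2": "Virus"}

r_c, r_p, r_c2 = evaluate_step(sub_disease, sub_related, sub_gene, dom, rng)
```

Here related-to-2 is discarded because its range lies outside the Gene taxonomy, so Flu does not appear in R(C).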
Notice that we can extend the previous query definition to paths of any length:
$C_1/P_1/C_2/P_2/C_3/P_3/\dots$
Query expressions can contain the symbol “*” to denote the concept $\top$ (Thing), and
the symbol “?” to denote any property (i.e. sub(?) denotes all the properties in the
ontology).
Some examples of OntoPath queries are the following ones:
- RheumatoidArthritis: The taxonomy and instances below RA diseases are retrieved.
- Disease/?/RheumatoidFactor: Returns a fragment with diseases (if any exist) directly
related by means of any property to Rheumatoid Factor.
- AutoinmuneDisease/?/*/?/GenePTPN22: Returns a fragment with autoimmune
diseases related to gene PTPN22 along with all the concepts and properties
participating in their relationships.
- RheumatoidArthritis/hasTreatment/*: Returns a fragment with RA diseases and
their treatments.
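To show how such path expressions decompose into alternating concept and property nodes, here is a minimal sketch of a splitter for predicate-free paths. The function parse_path is hypothetical and is not the grammar used by the OntoPath processor; it assumes that a path always starts and ends with a concept step, that “*” is only valid in concept positions and that “?” is only valid in property positions.

```python
def parse_path(query):
    """Split a predicate-free OntoPath path into (kind, name) steps."""
    steps = [s.strip() for s in query.split("/")]
    if len(steps) % 2 == 0 or any(not s for s in steps):
        raise ValueError("a path must alternate concept/property/.../concept")
    nodes = []
    for i, step in enumerate(steps):
        kind = "concept" if i % 2 == 0 else "property"
        if kind == "concept" and step == "?":
            raise ValueError("'?' can only stand for a property")
        if kind == "property" and step == "*":
            raise ValueError("'*' can only stand for a concept")
        nodes.append((kind, step))
    return nodes


nodes = parse_path("RheumatoidArthritis/hasTreatment/ *")
```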
Notice that the query C/P/C’ is equivalent to the concept $C \sqcap \exists P.C'$. Similarly, the
query C/P/* is equivalent to $C \sqcap \exists P.\top$ and */P/C to $\exists P.C$. One might think that
fragments for them could be obtained by including their equivalent concepts in the ontology and
applying some reasoner to them. This has several limitations. Firstly, for each query we
need to modify the original ontology and consequently to check the new ontology from
scratch. Secondly, as previously mentioned, reasoning is not feasible for large ontologies.
Finally, extracting the ontology fragment involved in a query from the
completion graphs generated by tableau algorithms is a difficult task. For these reasons, we propose an
ad-hoc and fast algorithm for processing OntoPath queries (see Section 6).
5.2 Predicates
As in XPath, we use square brackets [ ] to specify a predicate that must be satisfied by
the involved concept or property of the query expression. If multiple conditions are
required we use one []-expression for each one. They are always interpreted as a
conjunction of predicates. Unlike XPath, we use only []-expressions
instead of the logically equivalent boolean operators.
Twig Queries: Predicates allow us to express twig queries, that is, tree-like aggregate
conditions over the concepts and properties of the query. These have the form:
$C[P_1/C_1][P_2/C_2] \dots$
This is equivalent to the DL query $C \sqcap \exists P_1.C_1 \sqcap \exists P_2.C_2$.
When a twig query is required, the condition for the ontology fragment associated
with each branching node (e.g. C) is as follows:
$\forall c \in R(C)$, $\forall P_i \in children(C)$, $\exists p \in R(P_i)$ such that $c = dom(p)$
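The branching condition can be sketched as an intersection over the branches of the twig: a concept survives only if every branch contributes at least one property whose domain is that concept. The function twig_domains and the toy names below are hypothetical and only illustrate the condition; they are not the authors' algorithm.

```python
def twig_domains(sub_c, branches, dom, rng):
    """Concepts of sub(C) that are the domain of a qualifying property in every branch."""
    result = set(sub_c)
    for sub_p, sub_child in branches:
        # Domains of the properties of this branch that reach the child taxonomy.
        result &= {dom[p] for p in sub_p if rng[p] in sub_child}
    return result


# Toy data for a twig of the form Disease[hasTreatment/Drug][related-to/Gene].
sub_disease = {"Disease", "RheumatoidArthritis", "Flu"}
dom = {"ra-hasTreatment": "RheumatoidArthritis",
       "flu-hasTreatment": "Flu",
       "ra-related-to": "RheumatoidArthritis"}
rng = {"ra-hasTreatment": "Drug", "flu-hasTreatment": "Drug", "ra-related-to": "Gene"}
branches = [({"ra-hasTreatment", "flu-hasTreatment"}, {"Drug"}),
            ({"ra-related-to"}, {"Gene"})]

r_c = twig_domains(sub_disease, branches, dom, rng)
```

Flu satisfies the first branch but not the second, so only RheumatoidArthritis remains in R(C).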
Future extensions of OntoPath will include negation and disjunction in the
predicates. However, such an extension requires the inclusion of reasoning
mechanisms for classifying the appropriate property domains and ranges participating
in the results.
Querying Metadata: These predicates are useful for selecting parts of the ontology
that were created by different authors or that contain annotations useful for retrieving
concepts relevant to certain applications (e.g. lexicon, keywords, etc.). They have the
following syntax:
Node[@annotation]
Node[@annotation <op> value]
Here Node is either a concept or a property of the ontology. The first expression
means that the specified annotation for the OWL element exists, whereas the second
one indicates that the assigned value for the specified annotation must satisfy the
expressed condition. For this purpose, <op> represents any binary operator over
atomic data types (e.g. <, >, ==, !=, like, etc.). Instances including annotated elements
will be included in the resulting fragment. For example, the query
Disease[@source=NCI]
retrieves all the disease concepts stemming from the NCI thesaurus, along with the
instances created for them.
Notice that reasoning over annotations is undecidable (it falls into the OWL Full
category), and therefore we cannot use a reasoner to build fragments under metadata
conditions.
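Since metadata predicates are evaluated directly on stored annotations rather than by a reasoner, they amount to a simple filter. The sketch below assumes annotations are kept as a dictionary per concept; the function satisfies, the OPERATORS table and the sample data are hypothetical. The “=” spelling used in the example query is treated here as an alias of “==”, and the “like” operator is left out.

```python
import operator

# Binary operators over atomic values; "=" is accepted as an alias of "==".
OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
    "=": operator.eq,
    "!=": operator.ne,
}


def satisfies(annotations, name, op=None, value=None):
    """Evaluate Node[@name] or Node[@name <op> value] on one node's annotations."""
    if name not in annotations:
        return False
    if op is None:
        return True  # existence test only
    return OPERATORS[op](annotations[name], value)


# Illustrative annotation store: concept name -> annotations.
concepts = {
    "Disease": {"source": "NCI"},
    "Gene": {"source": "GO"},
    "Treatment": {},
}

selected = sorted(c for c, ann in concepts.items() if satisfies(ann, "source", "=", "NCI"))
```

A full evaluation of Disease[@source=NCI] would additionally restrict the candidates to sub(Disease); the sketch shows only the annotation test.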
Data filters: With these predicates, only concepts having the specified features are
selected. Moreover, only instances satisfying the data filter will be included in the
result. The syntax is as follows:
Concept[Datatype_Property <op> value]
Here <op> also represents any binary operator over atomic data types. In DL these
conditions are expressed via concrete domains. However, these are not fully
supported by current versions of OWL. In the future, it would be interesting to have a
classification of concepts involving data type properties (for example concepts
involving age intervals for disease onsets) and then filter the ontology concepts
according to them (for example, selecting only concepts that involve disease onsets
for infants). However, at present this kind of filter can only be applied to instances that
explicitly set the specified data type property.
6 Query Processing
The query processor of OntoPath has been implemented as an interpreted grammar.
The grammar output consists of a tree where each node contains the necessary actions
to retrieve the required objects from the database and to build the result set with them.
All these actions require some basic query over the G database (API), see Figure 5.
Due to size limitations, a full description of the underlying algebra cannot be included
in the paper.
Fig. 5. Query Processing for OntoPath
Query Trees
In the rest of the section, concept nodes are denoted with C, C’, etc., property nodes
with P, P’, etc. parent(X) denotes the parent node of X in the query tree, whereas
children(C) and child(P) denote the children nodes of C and the unique child node of
P respectively. The concepts or properties denoted by the query node X are accessed
with name(X).
Our approach for query processing relies on the following principles:
-
For each query concept node C we must keep updated the concepts in
sub(name(C)) that covers all the sub-concepts required by the parent and
children query nodes. We call it R(C).
OntoPath: A Language for Retrieving Ontology Fragments 909
- For each property query node P, we must keep up to date the set of properties in
sub(name(P)) that satisfy the conditions imposed by its parent and child query
nodes. We call this set R(P).
- We call any unsatisfiable concept the empty concept, denoted ⊥. Thus, R(C) = ⊥
means that the query has no solution. Similarly, if some property node has
R(P) = ∅, then the query has no solution either.
- Two basic functions are necessary to calculate R(C) and R(P), namely:
o Nearest Common Descendant: NCD(u, v) returns the nearest concept that is a
descendant of both u and v. It follows that NCD(u, *) = u, NCD(*, v) = v,
NCD(⊥, v) = ⊥ and NCD(u, ⊥) = ⊥. If nodes u and v have no common descendant
(i.e. sub(u) and sub(v) are disjoint), then NCD(u, v) = ⊥. This function can also be
defined over a set of concepts S, denoted NCD(S), so that it returns the nearest
common descendant of all the nodes in S.
o Nearest Common Ancestor: NCA(u, v) returns the nearest common ancestor of both
u and v. It follows that NCA(u, *) = *, NCA(*, v) = *, NCA(⊥, v) = ⊥ and
NCA(u, ⊥) = ⊥. This function can also be defined over a set of concepts S, denoted
NCA(S), so that it returns the nearest common ancestor of all the nodes in S.
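As an illustration of these two functions, the following minimal Python sketch evaluates NCD and NCA over a toy taxonomy. It is not the implementation used in the system: the taxonomy, the names PARENT, ANY and BOTTOM, and the restriction to a tree-shaped hierarchy (one parent per concept) are assumptions made for the example. The value returned by NCA when an argument is the empty concept is likewise assumed to be the empty concept.

```python
from typing import Dict, List, Optional

ANY = "*"        # wildcard concept used in queries
BOTTOM = None    # stands in for the empty (unsatisfiable) concept

# Hypothetical toy taxonomy, child -> parent (tree-shaped by assumption).
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Cell": "Thing",
    "BloodCell": "Cell",
    "Tissue": "Thing",
}

def ancestors(concept: str) -> List[str]:
    """Return the concept itself followed by its ancestors, nearest first."""
    chain = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def ncd(u, v):
    """Nearest common descendant of u and v in a tree-shaped taxonomy."""
    if u is BOTTOM or v is BOTTOM:
        return BOTTOM
    if u == ANY:
        return v
    if v == ANY:
        return u
    if v in ancestors(u):    # u is v or lies below v
        return u
    if u in ancestors(v):    # v lies below u
        return v
    return BOTTOM            # in a tree, sub(u) and sub(v) are then disjoint

def nca(u, v):
    """Nearest common ancestor of u and v."""
    if u is BOTTOM or v is BOTTOM:
        return BOTTOM        # assumption, see the note above
    if u == ANY or v == ANY:
        return ANY
    above_v = set(ancestors(v))
    for a in ancestors(u):   # walk upwards from u, nearest ancestor first
        if a in above_v:
            return a
    return ANY               # not reached when the tree has a single root
```

In a taxonomy with multiple inheritance two concepts may share descendants without one subsuming the other, so the tree-only test used in ncd would have to be replaced by a search over the common descendants.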
The initial values of the R sets are calculated as follows:
- R(P) = sub(name(P)) if P ≠ ?. Otherwise, it is initialized taking into account the
values of its parent and child nodes as follows:
\[
R(P) = \{\, p \mid p \in sub(?),\ \mathrm{NCD}(dom(p), R(parent(P))) \neq \bot \,\wedge\, \mathrm{NCD}(range(p), R(child(P))) \neq \bot \,\}
\]
- For concept nodes C, R(C) = {name(C)}, that is, they just contain the name
associated with the concept specified in the query node.
R sets are updated as follows:
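\[
\begin{aligned}
R(C)^{new} = {} & \mathrm{NCD}\big(\{\{\mathrm{NCD}(dom(p), R(C)^{old})\}_{p \in name(P)}\}_{P \in children(C)}\big) \\
& \mathrm{NCD}\big(range(parent(C)), R(C)^{old}\big)
\end{aligned}
\tag{1}
\]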
\[
R(P)^{new} = \{\, p \mid p \in R(P)^{old},\ \mathrm{NCD}(dom(p), R(parent(P))) \neq \bot \,\wedge\, \mathrm{NCD}(range(p), R(child(P))) \neq \bot \,\}
\tag{2}
\]
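The revision of a property node can be sketched in Python as follows. The sketch reads expression (2) as keeping a property only when the NCD of its domain with the parent concept and the NCD of its range with the child concept are both non-empty. That reading, the Prop record, the toy taxonomy, the property names and the treatment of R(parent(P)) and R(child(P)) as single concepts are all assumptions made for illustration. Wildcard concepts are not handled.

```python
from typing import Dict, NamedTuple, Optional, Set

class Prop(NamedTuple):
    """Hypothetical property record with its declared domain and range."""
    name: str
    dom: str
    rng: str

# Hypothetical toy taxonomy, child -> parent (tree-shaped by assumption).
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Anatomy": "Thing",
    "Cell": "Anatomy",
    "Disease": "Thing",
    "Gene": "Thing",
}

def is_sub(c: Optional[str], d: str) -> bool:
    """True if c equals d or is a descendant of d."""
    while c is not None:
        if c == d:
            return True
        c = PARENT[c]
    return False

def ncd(u: Optional[str], v: Optional[str]) -> Optional[str]:
    """Nearest common descendant in a tree; None plays the empty concept."""
    if u is None or v is None:
        return None
    if is_sub(u, v):
        return u
    if is_sub(v, u):
        return v
    return None

def revise_property(r_p: Set[Prop], parent_concept: str,
                    child_concept: str) -> Set[Prop]:
    """Keep the properties compatible with both neighbouring concept nodes."""
    return {
        p for p in r_p
        if ncd(p.dom, parent_concept) is not None
        and ncd(p.rng, child_concept) is not None
    }

# Toy query Disease/?/Cell over two hypothetical properties.
PROPS = {
    Prop("hasAssociatedAnatomy", "Disease", "Anatomy"),
    Prop("isAssociatedWithDisease", "Gene", "Disease"),
}
KEPT = revise_property(PROPS, "Disease", "Cell")
```

Here only hasAssociatedAnatomy survives: its domain matches Disease and its range Anatomy has Cell as a descendant, whereas the domain of the other property is disjoint from Disease.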
Notice that whenever a node C changes its R(C), its parent and children must be
revised. Similarly, whenever a node P changes its R(P), its parent and child must be
revised.
Graph Indexes
The efficient evaluation of queries requires the use of index structures. The navigation
of trees and DAGs (finding ancestors, descendants and siblings) has been extensively
studied in the literature; following Christophides et al.’s analysis in [21], in our
implementation we have adopted a variation of Agrawal, Borgida and Jagadish’s
interval-based encoding technique [22].
Fig. 6. Example of interval of type postorder-index assigned to a concept hierarchy
Agrawal et al.'s technique consists of numbering each node in the graph with
integers based on a preorder and postorder traversal of the spanning tree, and then
computing a set of intervals which compactly represent all the descendants of each
node. This encoding is especially suited for the computation of subsumption and
NCD, which are reduced to simple interval arithmetic operations. In contrast, the
computation of NCA is linear in the worst case. In our tests this has not been an issue,
since NCA is the least frequently used primitive; in any case, there are well-known
approaches to compute NCA in constant time at little extra spatial cost, should it be
deemed necessary.
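The following minimal sketch illustrates the interval idea on a tree. Each node receives its postorder number together with the smallest postorder number found in its subtree, so that subsumption becomes an interval containment test. It is only an approximation of the encoding described above: the hierarchy and all function names are hypothetical, only postorder numbers are used, and the extra intervals that Agrawal et al.'s scheme stores for nodes with several parents are omitted.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]

# Hypothetical concept hierarchy, parent -> children (a tree by assumption).
CHILDREN: Dict[str, List[str]] = {
    "Thing": ["Cell", "Tissue"],
    "Cell": ["BloodCell", "NerveCell"],
    "BloodCell": [],
    "NerveCell": [],
    "Tissue": [],
}

def label(root: str, children: Dict[str, List[str]]) -> Dict[str, Interval]:
    """Assign (lowest postorder number in subtree, own postorder number)."""
    intervals: Dict[str, Interval] = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        low = counter + 1              # first number handed out below node
        for child in children[node]:
            visit(child)
        counter += 1                   # the node's own postorder number
        intervals[node] = (low, counter)

    visit(root)
    return intervals

def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def is_descendant(iv: Dict[str, Interval], c: str, d: str) -> bool:
    """True if c equals d or lies below d."""
    return contains(iv[d], iv[c])

def ncd(iv: Dict[str, Interval], u: str, v: str) -> Optional[str]:
    """NCD by interval arithmetic; None plays the empty concept (tree case)."""
    if contains(iv[v], iv[u]):
        return u
    if contains(iv[u], iv[v]):
        return v
    return None

INTERVALS = label("Thing", CHILDREN)
```

With this labelling the descendant test and NCD need two integer comparisons per pair of nodes, with no traversal of the hierarchy at query time.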
6.1 Query Processing Algorithm
The query processing algorithm is as follows:
1. Initialize the R sets of all query tree nodes.
2. Perform a first calculation of the R sets for all query nodes. Put in a queue Q all
the children and parents of the updated nodes. Finish the query processing if some
concept node contains ⊥ or some property node is empty.
3. While Q is not empty:
Pop a node from Q and revise it, that is, apply expressions (1) and (2) to update its
R set. Stop query processing if it contains ⊥ or is empty. If the node was updated,
put its parent and children nodes in Q.
4. For each concept node C and each concept c ∈ R(C), construct an OWL fragment
taking into account the selected properties and groups generated in the previous
steps. DL expressions are expanded only if all the concepts and properties they
contain are included in the fragment.
5. For each property node P, construct an OWL fragment taking into account the
inferred domains and ranges as well as the selected property taxonomy.
6. Retrieve all the instances of the selected concepts that satisfy the filter
conditions stated in the corresponding concept node.
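Steps 2 and 3 follow a worklist scheme, which the sketch below shows in a generic form. The revise callback stands in for expressions (1) and (2). The toy callback used here simply intersects each node's set with those of its neighbours, which is not the actual revision rule. All names and data are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple

def process_query(nodes: Iterable[str],
                  neighbours: Dict[str, List[str]],
                  revise: Callable[[str], Tuple[bool, bool]]) -> bool:
    """Run steps 2 and 3; return False as soon as some R set becomes empty.

    revise(node) must update the node's R set and return (changed, empty).
    """
    queue = deque()
    for node in nodes:                       # step 2: first calculation
        changed, empty = revise(node)
        if empty:
            return False
        if changed:
            queue.extend(neighbours[node])   # parent and children
    while queue:                             # step 3: revise until stable
        node = queue.popleft()
        changed, empty = revise(node)
        if empty:
            return False
        if changed:
            queue.extend(neighbours[node])
    return True

# Toy query tree a - b - c with hypothetical R sets.
R: Dict[str, Set[int]] = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 4}}
NEIGHBOURS: Dict[str, List[str]] = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

def toy_revise(node: str) -> Tuple[bool, bool]:
    """Stand-in for expressions (1) and (2): intersect with the neighbours."""
    new = set(R[node])
    for other in NEIGHBOURS[node]:
        new &= R[other]
    changed = new != R[node]
    R[node] = new
    return changed, not new

HAS_SOLUTION = process_query(["a", "b", "c"], NEIGHBOURS, toy_revise)
```

Because the toy R sets can only shrink, the loop terminates: every revision that reports a change removes at least one element, and a node is re-queued only after such a change.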
If predicates over metadata are included in the query tree, we must first proceed as
follows. For each concept node C, we select all the concepts in sub(C) that satisfy
the predicate. We then update sub(C) and reconstruct the concept taxonomy under C
for the selected concepts. Notice that the NCA and NCD functions must now be applied
to the reconstructed hierarchy.
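One possible way to reconstruct the taxonomy is sketched below: every selected concept is attached to its nearest selected ancestor, or to C itself when no ancestor in between was selected. This re-attachment rule is an assumption, since the text does not state how the hierarchy is rebuilt. The function name, the toy taxonomy and the predicate are hypothetical.

```python
from typing import Callable, Dict

def restrict_taxonomy(root: str,
                      parent: Dict[str, str],
                      keep: Callable[[str], bool]) -> Dict[str, str]:
    """Rebuild the child -> parent map under root for the kept concepts.

    parent maps every concept strictly below root to its direct parent.
    """
    new_parent: Dict[str, str] = {}
    for concept in parent:
        if not keep(concept):
            continue                      # concept fails the predicate
        anc = parent[concept]
        while anc != root and not keep(anc):
            anc = parent[anc]             # skip over dropped ancestors
        new_parent[concept] = anc
    return new_parent

# Hypothetical taxonomy under the query concept "Anatomy".
PARENT = {
    "Cell": "Anatomy",
    "BloodCell": "Cell",
    "RedBloodCell": "BloodCell",
    "Tissue": "Anatomy",
}
# Hypothetical metadata predicate: the concepts that carry some annotation.
ANNOTATED = {"Cell", "RedBloodCell"}

RESTRICTED = restrict_taxonomy("Anatomy", PARENT, lambda c: c in ANNOTATED)
```

In this example BloodCell and Tissue are dropped, and RedBloodCell is re-attached directly under Cell.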
The complexity of the query processing algorithm is linear with respect to the
number of property records included in the query tree. This is because each iteration
in step 3 can drop at least one property record from R(P), and the algorithm stops
when either no changes are produced or some set R(P) becomes empty.
7 Experiments and Results
In order to check the results obtained with OntoPath, we have designed a set of
queries over two well-known ontologies in the biomedical domain, namely GALEN (a
fragment in OWL format) and NCI (version 03.09d¹⁴). The former is more expressive
and smaller than the latter.
Tables 2 and 3 summarize the results of the retrieved fragments for several queries.
These queries range from very simple taxonomy retrieval to complex twig queries with
different detail levels. As can be seen, the generated fragments are relatively small
(see the last column), although their size clearly depends on the generality of the
query. In the NCI ontology-thesaurus, class definitions do not have complex axioms.
For example, the restrictions included in classes just specify the concrete range of
a certain property (i-properties). Consequently, queries with a solution are reduced
to paths of three elements. In the GALEN ontology, class definitions include complex
axioms, with numerous anonymous classes and properties. Thus, richer queries can be
posed, obtaining very specific and useful fragments. For example, the first query in
the GALEN ontology allows users to know which counting
Table 2. Query examples and fragment features for the NCI ontology
nciThesaurus.owl
OntoPath Query                           Size (Kb)  Classes  Properties  i-properties    Red.
Whole Ontology                               32850    27652         109         13961    100%
AnatomyKind                                    413     2204           0             0    1.2%
*/?/Cell                                      4581    16969          16           444   13.3%
FindingsAndDisordersKind/?/*                  3403    10672           6          2556    9.9%
OrganismKind/?/FindingsAndDisordersKind        578     2217           1             0    1.7%
GeneKind/?/FindingsAndDisordersKind           2977     9345           2           844    8.6%
*/rDiseaseHasAssociatedAnatomy/*              1509     4826           1           630    4.4%
Cell/?/*                                       447     2204           2           274    1.3%
Tissue/?/*                                     425     2204           2            92    1.2%
*/?/Tissue                                    4563    16969          16           290   13.2%
ProteinKind/?/*                               5333    14851          13         11006   15.5%
*/?/ProteinKind                               1458     5420           8          1712    4.2%
GeneKind/?/*                                  4593    13143           6          8392   13.3%
OrganismKind/?/*                              3225    10331           3             0    9.4%
*/?/OrganismKind                              1893     7544           5          2152    5.5%
¹⁴ EVS-NCI (current version: 07.04e): ftp://ftp1.nci.nih.gov/pub/cacore/EVS/NCI_Thesaurus/
methods are associated with blood cells, and the last one allows users to know the
resistant sensitivity cases produced by proteins.
Finally, Table 4 shows some examples of OntoPath queries that also retrieve
instances. As GALEN and NCI do not provide instances, we used the wine¹⁵ ontology
instead, which illustrates several constructors for individuals and nominal entities.
Table 3. Query examples and fragment features for the GALEN ontology
Galen.owl
OntoPath Query                                                 Classes  Created¹⁶  Properties  i-properties
Whole Ontology                                                    3002        221         413          2168
AbsoluteMeasurement/?/Cell/?/LiquidBlood                            29         10           2            18
*/hasState/resistant                                                 8          0           1             2
*/Attribute/resistant                                                8          0           1             2
*[hasSubprocess/*][isFunctionOf/*]                                   3          1           2             2
*/isFunctionOf/*                                                    86         20           2            24
*/hasSubprocess/*                                                   26          5           1            11
Sensitivity[hasState/resistant][Attribute/presence/?/Protein]       10          2           3             3
CardiacPathology/?/Ischaemia/?/Chronicity/?/acute                    7          2           3             5
*/isStructuralComponentOf/*/?/Extremity                             34          0           2            23
*/isSolidDivisionOf/UpperExtremity                                   7          0           1             6
Table 4. Query examples for retrieval of instances in the wine ontology
Wine.owl
OntoPath Query            Classes  Properties  i-properties  Instances
Whole Ontology                 76          13           101  161
*/locatedIn/Region             64           1             4  42
Region/?/*                      1           2             0  36 (regions)
Wine                           63           0             0  53 (wines)
Wine/?/*                       72           9            61  160
Wine/?/Region                  64           1            25  89
*/hasSugar/*                   64           1            24  56
*[hasSugar/*][hasBody/*]       65           2            31  59
8 Conclusions
Ontology modularization [1] and segmentation [2] have gained considerable importance
in domains such as biomedicine, where the available ontologies are huge. In these
cases, several issues force final users to work with a subset of the ontology:
scalability problems when reasoning over the whole ontology, visualization in
ontology editors, partial knowledge of the domain, maintenance and extension, and
use in concrete applications (e.g. information extraction guided by ontologies).
The general aim of this work is to extract consistent, closed, and useful ontology
fragments, suitable for concrete applications or for knowledge exploration purposes.
In contrast with ontology modularization approaches, we do not advocate automatic
¹⁵ Wine Ontology: http://protege.cim3.net/file/pub/ontologies/wine/wine.owl
¹⁶ Classes created by means of pattern 4.
(formal or semi-formal) techniques that partition ontologies without the participation
of the final user of the ontology, which is an important limitation because the
generated modules may not be useful for concrete applications. Our work follows an
approach similar to [13, 2] with respect to guided module/fragment definition, in
which final users define their custom knowledge instead of working with a previously
defined module or with the whole ontology.
Currently, we are working on the definition of a well-founded framework for our
fragment extraction mechanism, in order to maintain and formally express the
connections with the original ontology and between fragments. By "formally" we mean
a kind of representation that a reasoning system can understand; the references
currently maintained do not follow any formalism and are purely syntactic. These
connections will allow final users to expand their previously defined fragments with
more knowledge from the original ontology (or from other fragments). From the
literature we highlight, as a good starting point, the Safe Ontology Modularization
[10] and the E-Connections [17] approaches. The former attempts to define safe
ontology modules, whereas the latter aims at extending the OWL syntax and semantics
to represent the connections between ontology modules.
Acknowledgements
This work has been partially funded by the PhD Fellowship Program of the
Generalitat Valenciana, the CICYT Project TIN2005-09098-C05-04 from the Spanish
Ministry of Education and Science, and the Health-e-Child European Project.
References
1. Wang, Y., Haase, P., Bao, J.: A Survey of Formalisms for Modular Ontologies. In: IJCAI
2007. Workshop SWeCKa, Hyderabad, India (January 2007)
2. Seidenberg, J., Rector, A.: Web ontology segmentation: Analysis, classification and use.
In: WWW. Proceedings of the World Wide Web Conference, Edinburgh (June 2006)
3. Bouquet, P., Giunchiglia, F., van Harmelen, F., Serafini, L., Stuckenschmidt, H.: COWL:
Contextualizing Ontologies. In: Fensel, D., Sycara, K.P., Mylopoulos, J. (eds.) ISWC
2003. LNCS, vol. 2870, pp. 164–179. Springer, Heidelberg (2003)
4. Seaborne, A., Prud’hommeaux, E.: SparQL Query Language for RDF (February 2005)
http://www.w3.org/TR/rdf-sparql-query/
5. Karvounarakis, G., Alexaki, S., Christophides, V., Plexousakis, D., Scholl, M.: RQL: A
Declarative Query Language for RDF. In: Proceedings WWW 2002, Hawaii, USA
(2002)
6. Jimenez-Ruiz, E., et al.: The Management and Integration of Biomedical Knowledge:
Application in the Health-e-Child Project. In: Meersman, R., Tari, Z., Herrero, P. (eds.)
OTM 2006. LNCS, vol. 4278, Springer, Heidelberg (2006)
7. Wang, L., et al.: Biostar models of clinical and genomic data for biomedical data
warehouse design. Int. Journal of Bioinformatics Research and Applications (2005)
8. Paraire, J., Berlanga, R., Llidó, D.M.: Resolution of Semantic Queries on a Set of Web
Services. In: Andersen, K.V., Debenham, J., Wagner, R. (eds.) DEXA 2005. LNCS,
vol. 3588, pp. 385–394. Springer, Heidelberg (2005)
9. Borgida, A., Serafini, L.: Distributed description logics: Directed domain correspondences
in federated information sources. In: CoopIS/DOA/ODBASE, pp. 36–53 (2002)
10. Cuenca-Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Extracting Modules from
Ontologies: A Logic-based Approach. In: OWLED 2007. Proc. of the Third International
OWL Experiences and Directions Workshop (2007)
11. Bao, J., Caragea, D., Honavar, V.: Towards collaborative environments for ontology
construction and sharing. In: CTS 2006. International Symposium on Collaborative
Technologies and Systems, pp. 99–108. IEEE Computer Society Press, Los Alamitos
(2006)
12. Stuckenschmidt, H., Klein, M.: Structure-based partitioning of large concept hierarchies.
In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS,
vol. 3298, Springer, Heidelberg (2004)
13. Noy, N., Musen, M.A.: Specifying ontology views by traversal. In: McIlraith, S.A.,
Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 713–725.
Springer, Heidelberg (2004)
14. Bhatt, M., et al.: Semantic completeness in sub-ontology extraction using distributed
methods. In: Laganà, A., Gavrilova, M., Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O.
(eds.) ICCSA 2004. LNCS, vol. 3045, pp. 508–517. Springer, Heidelberg (2004)
15. Volz, R., Oberle, D., Studer, R.: Implementing Views for Light-weight Web Ontologies.
IEEE Database Engineering and Application Symposium (2003)
16. Jiménez-Ruiz, E., Berlanga, R.: A View-based Methodology for Collaborative Ontology
Engineering: an Approach for Complex Applications (VIMethCOE). In: STICA. 1st
International Workshop on Semantic Technologies in Collaborative Applications (June
2006)
17. Cuenca-Grau, B., et al.: Automatic Partitioning of OWL Ontologies Using E-Connections.
In: DL 2005. International Workshop on Description Logics (2005)
18. Jiménez, E., Berlanga, R., Sanz, I., Aramburu, M.J., Danger, R.: OntoPathView: A Simple
View Definition Language for the Collaborative Development of Ontologies. In: López,
B., et al. (eds.) Artificial Intelligence Research and Development, IOS Press, Amsterdam
(2005)
19. Jiménez-Ruiz, E., Nebot, V., Berlanga, R., Sanz, I., Rios, A.: A Protégé Plug-in-Based
System to Manage and Query Large Domain Ontologies. In: 10th Intl. Protégé Conference,
Budapest, Hungary (2007), http://protege.stanford.edu/conference/2007/schedule.html
20. Horrocks, I., Patel-Schneider, P.F., van Harmelen, F.: From SHIQ and RDF to OWL: The
making of a web ontology language. Journal of Web Semantics 1(1), 7–26 (2003)
21. Christophides, V., Plexousakis, D., Scholl, M., Tourtounis, S.: Optimizing Taxonomic
Semantic Web Queries Using Labeling Schemes. Journal of Web Semantics 1(2) (2004)
22. Agrawal, R., Borgida, A., Jagadish, H.V.: Efficient management of transitive relationships
in large data and knowledge bases. In: SIGMOD 1989, ACM Press, New York (1989)
23. Magkanaraki, A., Tannen, V., Christophides, V., Plexousakis, D.: Viewing the Semantic
Web through RVL Lenses. In: Fensel, D., Sycara, K.P., Mylopoulos, J. (eds.) ISWC 2003.
LNCS, vol. 2870, Springer, Heidelberg (2003)
24. Means, W.S., Harold, E.R.: XML in a Nutshell A Desktop Quick Reference, January 2001,
Chapter 9: XPath: http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html
25. Freund, J., et al.: Health-e-Child: An Integrated Biomedical Platform for Grid-Based
Pediatrics. In: Health-Grid Conference, Valencia (2006)