Developing Efficient Ontology-based Systems
Using A-box Data from Running Applications
Heiko Paulheim¹ and Lars Meyer²
¹ Technische Universität Darmstadt
Knowledge Engineering Group
paulheim@ke.tu-darmstadt.de
² SAP Research
lars.meyer@sap.com
Abstract. Today, information is typically contained in different IT sys-
tems. Ontologies and semantic integration have been proposed for inte-
grating this information with the purpose of meaningful processing and
providing useful services on top of that information. As applications, at
the same time, modify the information during run-time, the information
to be integrated has a dynamic nature. In this paper, we discuss per-
formance aspects of integrating such dynamically changing A-box data
from running applications, point out several technical alternatives, and
present performance measures for those alternatives. We show how the
findings are applied in two different application examples.
1 Introduction
In a typical software landscape, information is contained in different systems:
databases, legacy systems, desktop applications, and so forth. As users have
to work with information from those different systems, integration is required.
For providing a meaningful, semantic integration, ontologies have been widely
discussed. There are several possible utilizations of such an integration, each
providing different benefits to end users [1]:
- On a semantic desktop, novel ways of searching for data in applications
are made possible by extracting semantically annotated information from
applications [2, 3].
- User interfaces can be automatically adapted according to the users' needs
by having a reasoner analyze the UI components, the data they contain, and
the user's needs [4].
- Help on applications can be provided at run-time, adapted according to the
system's current state and/or a user model [5, 6].
- Software components can be automatically integrated by having a reasoner
process events raised by different components, thereby coordinating user
interactions across different, heterogeneous components [7].
- Interactive, ontology-based visualizations of the information contained in
different related applications can assist the user in fulfilling
knowledge-intensive tasks [8].
In all of those cases, the information is contained in running applications, which
means that it is highly dynamic and thus needs to be integrated at run-time.
Furthermore, reasoning on that data is essential for providing valuable informa-
tion to the end user. Thus, it is required that a reasoner has efficient access to the
data as its A-box. At the same time, user interaction is involved in all of those cases,
which imposes strict requirements in terms of performance. Thus, high perfor-
mance mechanisms for reasoning on dynamic data from running applications are
needed.
With this paper, we investigate different architectural alternatives for building
systems which support efficient integration of and reasoning about running
software applications, and we analyze the performance impact of the different
alternatives with respect to dynamic data. In two examples, we show how the
findings can be applied to improve the performance of real-world applications of
semantic integration.
The rest of this paper is structured as follows. In section 2, we introduce our
basic reasoning framework. Section 3 discusses different approaches for optimiza-
tion, which are evaluated in section 4. In section 5, we introduce two example
use cases for our framework and discuss how they benefit from the optimiza-
tion strategies. We conclude with a review on related work, a summary, and an
outlook on future work.
2 Basic Architecture
From our work in application integration, we have derived a generic architec-
ture for reasoning on running software applications. In [9], we have analyzed
two different basic architectural variants for providing A-box data from running
software components to a reasoner:
1. In a pushing approach, software components inform the reasoner about up-
dates, and the reasoner keeps an up-to-date representation in its own A-box,
which duplicates the original data contained in the different components.
2. In a pulling approach, the reasoner does not maintain an A-box. Instead, it
dynamically pulls instance data from the components whenever the evaluation
of a query requires that instance data.
Our experiments have shown that only pulling approaches are feasible for
building a scalable solution [9]. The reason is that given a highly dynamic com-
ponent which changes its state quite frequently, the reasoner is kept busy with
processing the updates on its A-box. Once the update frequency exceeds a certain
threshold, the reasoner is overloaded and cannot answer any queries anymore.
On the other hand, many of those updates are unnecessary, e.g., if the same facts
are overwritten many times without being used in a query. A pulling approach
avoids those unnecessary updates, which only create system load without any
benefit.
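The contrast between the two approaches can be sketched in code. The following is a minimal, hypothetical Java sketch of a pull-based connector; the interface and method names are our own illustration, not the framework's actual API. The point is that component updates stay cheap local operations, and data is only assembled when a query actually asks for it.

```java
import java.util.*;

// Hypothetical sketch of a pull-based A-box connector: the reasoner calls
// retrieveInstances() only at query time, so frequent component updates
// cost nothing until the data is actually needed.
interface AboxConnector {
    // Returns the URIs of all instances of the given ontology class.
    Set<String> retrieveInstances(String ontologyClass);
}

class AddressBookConnector implements AboxConnector {
    // Live objects held by the component, keyed by ontology class.
    private final Map<String, Set<String>> liveObjects = new HashMap<>();

    void update(String ontologyClass, String objectUri) {
        // An update is a cheap local operation; the reasoner is not notified.
        liveObjects.computeIfAbsent(ontologyClass, k -> new HashSet<>())
                   .add(objectUri);
    }

    @Override
    public Set<String> retrieveInstances(String ontologyClass) {
        // Instance data is assembled on demand, at query time.
        return liveObjects.getOrDefault(ontologyClass, Collections.emptySet());
    }
}
```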
Furthermore, maintaining an A-box with information which is also kept in
the system leads to an overhead due to double bookkeeping and can therefore
cause consistency problems: with a pushing approach, the updates and queries
need to be properly queued in order to guarantee correct answers to each query.

[Fig. 1. General architecture of our framework for reasoning about running
applications: n component containers, each wrapping a component and its objects
with a component adapter (object registry, triple cache, triple factory, URI
factory); a central reasoning module whose A-box connector queries the adapters,
with a T-box connector providing the ontology T-box and a query interface
answering clients' queries about the running system.]
Fig. 1 shows the basic building blocks of our framework. It depicts a number
of software components which are to be integrated. Each component (which we
treat as a black box) is encapsulated in a container, which provides an adapter
to the reasoner. The adapter consists of four essential parts:
- Integrated components create and hold objects, which they register at the
adapter's object registry to reveal them to the reasoner. Each component may
process different types of objects, and each type of object may be processed
by different components. There are different possible strategies for informing
the registry of updates to the registered objects (e.g., using the observer
pattern or change listeners); in our current implementation, the component
actively sends updates to the object registry.
- From those objects, a triple factory creates data that is usable by the
reasoner, i.e., RDF triples. The triple factory is also responsible for
processing mappings between the component's class model and the ontology used
for information exchange. Those mappings are defined in a flexible, rule-based
language, which is also applicable to conceptually heterogeneous class
models [10].
- To create such triples, a URI is needed for each object. The URI factory
is responsible for creating URIs which are unambiguous and unique
throughout the whole integrated system.
- To improve performance, triples may be cached by the component. There
are different variants, such as lazy or eager caches, which we will analyze in
more detail in the subsequent sections.
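The URI factory's contract can be sketched as follows. This is a hypothetical Java sketch; the naming scheme (a URN prefix plus the component id and a per-object serial number) is our own assumption for illustration, not the format prescribed by the framework.

```java
import java.util.*;

// Hypothetical sketch of a URI factory: it hands out one stable, system-wide
// unique URI per registered object, combining the component's id with a
// per-object serial number.
class UriFactory {
    private final String componentId;
    // IdentityHashMap: two distinct objects get distinct URIs even if equal().
    private final Map<Object, String> assigned = new IdentityHashMap<>();
    private int nextSerial = 0;

    UriFactory(String componentId) { this.componentId = componentId; }

    // Returns the same URI for repeated calls with the same object,
    // and a fresh one for each newly registered object.
    synchronized String uriFor(Object o) {
        return assigned.computeIfAbsent(o,
            k -> "urn:app:" + componentId + ":" + (nextSerial++));
    }
}
```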
These adapters are used by the reasoner’s A-box connector to dynamically
resolve queries. In addition to that A-box connector, a T-box connector provides
the T-box part of the ontology (i.e., the definition of classes and relations) used as
a common ground for integrating information from the different components. In
contrast to the A-box, the T-box is considered as static, and the T-box connector
loads it once when the system starts.
The reasoner has a query interface that allows client components to pose
queries about the running system. Client components may be, e.g., internal
components, such as an event processing logic, as discussed in section 5.1, or
a graphical user interface providing an endpoint for querying the system, as dis-
cussed in section 5.2. The query interface may use languages such as SPARQL
or F-Logic. For the prototype implementation of our framework, we have used
OntoBroker [11] as a reasoner, and F-Logic [12] as a query language.
When a query runs, the results of that query have to be consistent. Thus,
updates occurring between the start of a query and its end should not be
considered when computing the result. To provide such a consistent query answering
mechanism, the reasoner sends a lock signal to the component wrappers when
a query is started. Updates coming from the component are then queued until
the reasoner notifies the component that the query has been finished.
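The lock-and-queue protocol can be sketched as follows; this is a simplified, hypothetical Java sketch with invented names, not the framework's actual implementation.

```java
import java.util.*;

// Hypothetical sketch of the lock-and-queue protocol: while a query runs,
// component updates are buffered and only applied after the reasoner
// releases the lock, so each query sees one consistent A-box state.
class LockingAdapter {
    private final List<String> triples = new ArrayList<>();
    private final Deque<String> pending = new ArrayDeque<>();
    private boolean locked = false;

    synchronized void lock() { locked = true; }      // reasoner starts a query

    synchronized void update(String triple) {
        if (locked) pending.add(triple);             // buffered during a query
        else triples.add(triple);
    }

    synchronized void unlock() {                     // query has finished
        locked = false;
        while (!pending.isEmpty()) triples.add(pending.poll());
    }

    synchronized List<String> snapshot() { return new ArrayList<>(triples); }
}
```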
3 Aspects of Optimization
In the previous section, we have sketched the basic architecture of our system.
There are different variants of implementing that architecture. In [9], we have
discussed two basic aspects: centralized and decentralized processing, and using a
redundant A-box vs. using connectors for retrieving instance data at query time.
These results led to the architecture introduced in Sect. 2, using a centralized
reasoner and connectors for retrieving A-box data. In this section, we have a
closer look at two design aspects which allow several variations: the design of
the rules which make the reasoner invoke the A-box connector, and the use of
caches.
3.1 Design of Connector Invocation Rules
To make the reasoner invoke a connector, a rule is needed whose head indicates
the type of information the connector will deliver, and whose body contains a
statement for actually calling that connector. Technically, a connector is wired
to the reasoner with a predicate. For example, a connector providing instance
data for an object can be addressed with a generic rule as follows¹:

    instance_connector(?I, ?C) → ?C(?I)    (1)

¹ We use the common SWRL human-readable syntax for rules, although in SWRL,
variables are not allowed for predicates. In our implemented prototype, we have
used F-Logic for formulating the rules, which allows using variables for
predicates.
The reasoning framework, OntoBroker in our case, is responsible for dispatching
the use of the predicate instance_connector to an implementation of that
connector, i.e., a Java method. This method then provides a set of bindings for
the variables (in this case: ?I and ?C). If some of the variables are already
bound, the contract is that the method returns the valid bindings for the
unbound variables which yield a true statement given the already bound ones.
Consider the following example query, asking for all instances of a class
#Person:
SELECT ?I WHERE { ?I rdf:type #Person }
The resolution of this query leads to the invocation of rule (1) with the
variable ?C bound to #Person. The connector method now returns a set of bindings
for the variable ?I for which the statement #Person(?I) is true. The reasoner
then substitutes the results into the query and returns the result.
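The binding contract described above can be illustrated with a hypothetical Java sketch; the class and method names are ours, and a null argument stands in for an unbound variable.

```java
import java.util.*;

// Hypothetical sketch of the connector's binding contract: given partially
// bound variables (?I, ?C), return every binding of the unbound variables
// that makes instance_connector(?I, ?C) a true statement.
class InstanceConnector {
    // class -> instances, standing in for the component's live objects
    private final Map<String, Set<String>> extension = new HashMap<>();

    void add(String clazz, String instance) {
        extension.computeIfAbsent(clazz, k -> new HashSet<>()).add(instance);
    }

    // i and c may be null (unbound); each result is an (instance, class)
    // pair consistent with the bound inputs.
    List<String[]> resolve(String i, String c) {
        List<String[]> bindings = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : extension.entrySet()) {
            if (c != null && !c.equals(e.getKey())) continue;
            for (String inst : e.getValue()) {
                if (i != null && !i.equals(inst)) continue;
                bindings.add(new String[]{inst, e.getKey()});
            }
        }
        return bindings;
    }
}
```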
The mechanism defined in rule (1) is the most basic way of integrating a
connector which delivers information about instances and the classes they belong
to. As it has to be evaluated for each condition in a rule's body where statements
like ?C(?I) occur (either unbound or with one or both of the variables bound),
the connector is invoked very frequently. Since invoking a connector may be a
relatively costly operation (even with caches involved, as described below), this
solution may imply some performance issues.
A possible refinement is the use of additional constraints. The idea is that
for each integrated software component, the set of possible ontology classes the
data objects may belong to is known. Given that the union of those sets over all
components is #Class1 through #ClassN, the above rule can be refined to an
extended rule:
    (equal(?C, #Class1) ∨ equal(?C, #Class2) ∨ ... ∨ equal(?C, #ClassN))
        ∧ instance_connector(?I, ?C) → ?C(?I)    (2)
Assuming a left-to-right order of evaluation of the rule's body, the connector is
now only invoked in cases where the variable ?C is bound to one of the given
values. Therefore, the number of the connector's invocations can be drastically
reduced.
A variant of that solution is the use of single rules instead of one large rule:
    instance_connector(?I, #Class1) → #Class1(?I)
    instance_connector(?I, #Class2) → #Class2(?I)
    ...
    instance_connector(?I, #ClassN) → #ClassN(?I)    (3)
In that case, the connector is not always invoked when evaluating a statement
of type ?C(?I). Instead, each rule is only invoked for exactly one binding of ?C.
In the example query above, only the first rule’s body would be evaluated at all,
invoking the connector once with one bound variable. On the other hand, the
number of rules the reasoner has to evaluate for answering a query is increased.
[Fig. 2. Different variants for using caches: (a) without a cache, queries are
passed through to the original component; (b) with an eager cache, cache entries
are replaced when updates occur; (c) with a lazy cache, cache entries are
invalidated when updates occur and re-created on a cache miss.]
The above example rules show how to invoke the instance_connector wrapper,
which returns statements about category membership of instances. The other
important predicate is relation_connector(?X, ?R, ?Y), which has three
variables. It returns the set of all triples where object ?X is in relation ?R
with object ?Y. As for the instance_connector wrapper, the corresponding types
of invocation rules exist.
For this paper, we have analyzed the performance impact of all three rule
types: the generic rule (1), the use of an extended rule (2), and the use of single
rules (3).
3.2 Distributed Caching of A-Box Fragments
To speed up the answers of our connectors, partly caching instance data in the
connector is a good strategy [13], although it slightly contradicts the idea
of avoiding double bookkeeping; it is the classic trade-off of redundancy vs.
performance. We have analyzed three different variants: using no caches at all,
i.e., each query for instance data is directly passed to the underlying objects,
and statements are assembled at query time (see Fig. 2(a)); and using eager and
lazy caching. While the eager cache updates the required statements for each
object when that object changes (see Fig. 2(b)), the lazy cache flags statements
as invalid upon a change of the represented object and re-creates them only if
they are requested (see Fig. 2(c)).
While using no caches at all avoids unnecessary workload when an update
occurs, eager caches are supposed to be the fastest to respond to queries. Lazy
caches can provide a compromise between the two, allowing fast responses to
queries as well as avoiding unnecessary workload. In the next section, we will
analyze those effects in more detail.
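The difference between the two caching disciplines can be sketched in Java as follows. The classes below are a simplified, hypothetical illustration (one cached value per object rather than a set of triples), not the framework's actual triple cache.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the two caching disciplines for one object's triples:
// the eager cache recomputes on every update, the lazy cache only marks the
// entry stale and recomputes on the next read (cache miss).
class LazyTripleCache {
    private final Supplier<String> tripleSource; // assembles triples on demand
    private String cached;
    private boolean valid = false;
    int recomputations = 0;

    LazyTripleCache(Supplier<String> tripleSource) { this.tripleSource = tripleSource; }

    void onUpdate() { valid = false; }             // just invalidate; cheap

    String read() {                                // recompute only on a miss
        if (!valid) { cached = tripleSource.get(); valid = true; recomputations++; }
        return cached;
    }
}

class EagerTripleCache {
    private final Supplier<String> tripleSource;
    private String cached;
    int recomputations = 0;

    EagerTripleCache(Supplier<String> tripleSource) {
        this.tripleSource = tripleSource;
        onUpdate();                                // fill the cache up front
    }

    void onUpdate() { cached = tripleSource.get(); recomputations++; } // recompute now

    String read() { return cached; }               // always a direct hit
}
```

Under a burst of updates, the lazy cache does a single recomputation at the next read, while the eager cache recomputes once per update; this mirrors the robustness results in Sect. 4.3.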
1    Get all objects of type C:
     SELECT ?I WHERE {?I rdf:type C.}
2a   Get all objects of type C in a relation R with object O:
     SELECT ?I WHERE {?I rdf:type C. ?I R O.}
2b   SELECT ?I WHERE {?I rdf:type C. O R ?I.}
2a+b SELECT ?I WHERE {{?I rdf:type C. ?I R O.}
     UNION {?I rdf:type C. O R ?I.}}
3a   Get all objects of type C in any relation with object O:
     SELECT ?I WHERE {?I rdf:type C. ?I ?R O.}
3b   SELECT ?I WHERE {?I rdf:type C. O ?R ?I.}
3a+b SELECT ?I WHERE {{?I rdf:type C. ?I ?R O.}
     UNION {?I rdf:type C. O ?R ?I.}}
4a   Get all objects of any type in a relation R with object O:
     SELECT ?I WHERE {?I R O.}
4b   SELECT ?I WHERE {O R ?I.}
4a+b SELECT ?I WHERE {{?I R O.} UNION {O R ?I.}}
5a   Get all objects of any type in any relation with object O:
     SELECT ?I WHERE {?I ?R O.}
5b   SELECT ?I WHERE {O ?R ?I.}
5a+b SELECT ?I WHERE {{?I ?R O.} UNION {O ?R ?I.}}

Table 1. The different query types we used to analyze the performance impact of
different variants, and their SPARQL representation.
4 Evaluation
4.1 Query Times
In this section, we analyze the performance impact of the different connector
rule and caching variants on different reasoning tasks which involve information
about and in the running system. We have examined both the query times for
different query types, as well as the maximum degree of dynamics, i.e. the max-
imum frequency of updates to the A-box which can be handled by the different
implementation variants.
4.2 Setup
To evaluate the impact of the different variants, we have analyzed various el-
ementary query types, as shown in table 1. The queries were selected to cover
a variety of cases with known or unknown predicate types, as well as known
or unknown type restrictions for the queried objects. For each query, we have
measured the average time to process that query. All tests have been carried
out on a Windows XP 64Bit PC with an Intel Core Duo 3.00GHz processor and
4GB of RAM, using Java 1.6 and OntoBroker 5.3.
The variables in our evaluation are the number of software components (5,
10, and 20), the number of object instances maintained by each component (250
and 500, randomly distributed over 10 classes per component), and the average
update frequency (25 and 50 updates per second). Each instance has been given
5 relations to other instances. Therefore, our maximum test set has 10000 object
instances with 50000 relations. As our focus is on dynamic data, we altered the
data at a frequency of 50 updates per second in the first set of evaluations.
In Fig. 3, we show the results of selected typical queries, which illustrate the
main findings of our analysis². While the figure depicts the results for the a+b
flavor of each query type (see table 1), the results for the a and b flavors are
similar.
Generally, the query times using eager caching are faster than those using
lazy caching. The actual factor between the two ranges from double speed (e.g.,
in case of type 4 queries) to only marginal improvements (e.g., in case of type
3 queries). While lazy caches have to re-create the invalidated triples to answer
a query, eager caches can always serve the requested triples directly; therefore,
the latter can answer queries faster.
When looking at scalability, it can be observed that doubling the number
of integrated components (which also doubles the number of A-box instances)
roughly doubles the query answering time; thus, complexity grows linearly in
most cases.
Multiple observations can be made when looking at the individual results
for different queries. For type 3 queries, it is single rules which produce
significant outliers (see Fig. 3(b)), while for type 5 queries, it is generic
rules (see Fig. 3(d)). Thus, only extended rules guarantee reasonable response
times in all cases without any outliers, although they are outperformed by
generic rules in type 2 and 4 queries.
In case of type 1, 2, and 4 queries, the relation in which the objects are sought
is already fixed (e.g., "find all persons which are married to another person"),
while in the case of type 3 and 5 queries, the relation is variable (e.g., "find
all persons which have any relation to another person"). The first kind of query
is rather target-oriented, while the latter is rather explorative. The key
finding of the results is that target-oriented queries do not pose any
significant problems, while explorative queries do.
The bad behavior of single rules in the case of explorative queries can be
explained by the fact that when an explorative query is answered, the various
single rules all fire, causing many potentially expensive invocations of the
connector: for N components and M types, an explorative query may cause up
to O(N × M) connector invocations. The generic and extended rules, on the
other hand, invoke the connector less often. For a similar reason, the generic
rule variant behaves badly for the explorative query types: here, the reasoner
determines the possible relation types by invoking the wrapper multiple times,
each time trying another relation type. Extended and single rules, on the other
hand, already restrict the relation types in the rule body, thus requiring fewer
invocations of the wrapper.
² Type 1 queries are not shown; they are generally answered very quickly, and
there are no significant differences between the approaches.
4.3 Maximum Frequency of A-box Updates
Besides the time it takes to answer a query, another important performance cri-
terion is the robustness of the system regarding A-box dynamics. While the rule
design only influences the query times as such, a careful design of the wrappers’
caches has a significant impact on the system’s scalability with respect to the
maximum possible frequency of A-box updates, as shown in Fig. 4.
The figure shows that while both eager and lazy caches do not drop in per-
formance too strongly when scaling up the number of instances involved, lazy
caching is drastically more robust regarding A-box dynamics. While several
thousand updates per second on the A-box are possible with lazy caching, eager
caching allows for fewer than 100. As assumed, lazy caches thus scale up much
better regarding A-box dynamics, but at the cost of longer query response
times, as shown above.
5 Examples
To illustrate the relevance of the findings presented in the previous sections,
we introduce two examples: one using goal-directed and one using explorative
queries.
5.1 Example for Goal-Directed Queries: Semantic Event Processing
In [14], we have discussed the use of ontologies for application integration on
the user interface level. The approach relies on using ontologies for formally
describing user interface components and the information objects they process.
Reasoning is used to facilitate semantic event processing as an indirection for
decoupling the integrated applications [9].
By annotating the events produced by different user interface components, a
reasoner can analyze those events, compute possible reactions by other compo-
nents, and notify those components for triggering those reactions. This reason-
ing process requires instance information about the different applications, their
states, and the data they process, which is delivered by the framework explained
in Sect. 2.
An example for an integration rule could state the following: “When the user
performs a select action with an object representing a customer who has an
address, the address book component will display that address, if it is visible
on the screen.” If this rule is evaluated by a reasoner, it has to be able to
validate certain conditions, e.g. whether there is an address book component
which is visible, or whether the customer in question has an address. It therefore
needs access to information about both the system's components and the
information objects they process. More sophisticated reasoning may come into
play, e.g., when implementing different behaviors for standard and premium
customers, where the distinction between the two may involve the evaluation of
different business rules.
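Such a rule evaluation can be illustrated with a hypothetical Java sketch; the ontology class and relation names (#Customer, #hasAddress, #hasState) are invented for this example and are not the paper's actual integration ontology.

```java
import java.util.*;

// Hypothetical sketch of evaluating the integration rule's conditions against
// A-box triples pulled from the components: the reasoner checks that the
// address book is visible and that the selected customer has an address.
class IntegrationRule {
    // triples stored as subject -> predicate -> objects
    private final Map<String, Map<String, Set<String>>> abox = new HashMap<>();

    void add(String s, String p, String o) {
        abox.computeIfAbsent(s, k -> new HashMap<>())
            .computeIfAbsent(p, k -> new HashSet<>()).add(o);
    }

    private Set<String> get(String s, String p) {
        return abox.getOrDefault(s, Collections.emptyMap())
                   .getOrDefault(p, Collections.emptySet());
    }

    // "When the user selects a customer who has an address, and the address
    // book component is visible, display that address."
    Optional<String> reactToSelect(String selectedObject, String addressBook) {
        boolean visible = get(addressBook, "#hasState").contains("#Visible");
        boolean isCustomer = get(selectedObject, "rdf:type").contains("#Customer");
        Set<String> addresses = get(selectedObject, "#hasAddress");
        if (visible && isCustomer && !addresses.isEmpty()) {
            return Optional.of(addresses.iterator().next()); // address to show
        }
        return Optional.empty();                             // rule does not fire
    }
}
```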
[Fig. 3. Query times for selected query types ((a) 2a+b, (b) 3a+b, (c) 4a+b,
(d) 5a+b), each for 500 object instances per component and 50 updates per
second, comparing generic, extended, and single rules with eager and lazy
caches. The x axis shows the number of components (there have been no explicit
measurements for 15 components), and the y axis shows the query time in
seconds.]
A typical query used for event processing asks: given a particular event #E,
which other events are triggered by that event:
SELECT ?E1 WHERE {?E1 #triggeredBy #E.}
Thus, the predicate is fixed, and the query is goal-directed. As discussed for
the general results above, we have experienced that the different invocation rule
variants do not affect performance too much, while eager caching leads to a
significant speed-up. Details on the example can be found in [9].
5.2 Example for Explorative Queries: Semantic Data Visualization
Gathering and aggregating information from different IT systems can be a cum-
bersome and time consuming task for the end user. Combining that data with
a reasoner can provide direct benefit for the end user.
In [8], we have introduced the Semantic Data Explorer. The Semantic Data
Explorer provides a uniform graphical visualization of the data contained in
the connected applications, using a central reasoning module based on the
architecture discussed above. The user can drag objects from connected
applications to the Semantic Data Explorer and navigate the corresponding
graph view.

[Fig. 4. Evaluation of robustness of different caching approaches regarding
A-box dynamics, comparing eager and lazy caches. The graph shows the maximum
number of updates per second that the system can process. Note that the y-axis
has a logarithmic scale.]
The Semantic Data Explorer uses the reasoner as an indirection for construct-
ing the graph view. From an implementation point of view, this architecture pro-
vides a decoupling of the visualization and the data sources. More importantly,
the reasoner may also reveal implicit knowledge gathered from the A-box infor-
mation using T-box axioms and rules. This implicit knowledge is then included
in the visualization as well, providing additional value to the end user.
A user study has shown that the Semantic Data Explorer can lead to
significantly faster task completion times when gathering information, as well
as to enhanced user satisfaction³. Implementation details on the tool, as well
as a detailed description of the user study, can be found in [8].
Displaying a node in the Semantic Data Explorer requires finding all incoming
and outgoing edges to other objects and data properties. Thus, the underlying
queries are explorative:
SELECT ?R ?V WHERE {<#x> ?R ?V}
SELECT ?R ?V WHERE {?V ?R <#x>}
The results above suggest using extended rules for this sort of query. In fact,
experiments with the Semantic Data Explorer showed that using an extended rule
leads to a perceivably faster system, which in turn increases the end users'
satisfaction.
³ A demo video is available at http://soknos.de/index.php?id=470&L=0.
6 Related Work
One of the best researched approaches for reasoning on objects from integrated
systems is the use of so-called wrappers [11] or mediators [13], which collect
objects from databases or structured documents and provide them to a reasoner
as instance data.
D2RQ [15] is an example for a wrapper platform that integrates standard,
non-RDF databases as RDF data sources and thus makes them available to a
reasoner. Based on mapping rules, data entries from database tables are lifted as
RDF instance data. The authors present an evaluation based on different query
types that shows that retrieval of the data is feasible in reasonable time.
Lixto [16] is an example that uses wrappers to gather RDF data from non-
annotated web pages. It provides graphical interfaces for defining the mechanisms
used to extract data from the HTML documents. The authors show different use
cases where RDF data gathered from the web is utilized. Those applications do
not perform the retrieval at run-time, but offline, i.e. they parse web sites and
build an RDF data store. The user’s queries are then posed against that RDF
data store.
OntoBroker [11] is a reasoning engine that provides different means for inte-
grating data from various sources, including access to databases, web pages and
web services via so-called connectors. As the API also foresees the integration of
own connectors accessing arbitrary sources of instance data, we have based the
prototype described in this paper on OntoBroker.
Various approaches have been proposed for directly accessing objects of run-
ning software applications [17]. There are two main variants of making the in-
stance data known to the reasoner. The first relies on semantic annotation of the
underlying class models, such as sommer⁴ or otm-j [18]. The second uses class
models generated directly from an ontology, with the necessary access classes for
reasoning access being generated as well, such as RDFReactor [19], or OntoJava
[20]. With dynamically typed scripting languages, the corresponding classes may
also be generated on the fly, as shown, e.g., with Tramp⁵ for Python. A detailed
comparison of such approaches is given in [10]. However, analyses of efficiency
and scalability of these approaches are rarely found.
Most of those approaches are not very flexible with respect to conceptual
heterogeneity (i.e., class models that are different from the common ontology
used for integration) as well as technological heterogeneity (i.e., using class mod-
els in different programming languages in parallel). The framework discussed
in this paper uses flexible mapping rules and allows for containers for different
programming languages [10, 21].
One of the best-known and most compelling applications of making data from
various applications known to a reasoner is the semantic desktop [3]. It allows
users to browse, analyze, and relate data stored in different applications and
provides new means of accessing data stored on a personal computer. Different
⁴ https://sommer.dev.java.net/
⁵ http://www.aaronsw.com/2002/tramp/
adapters exist which wrap data from databases, file systems, or e-mail clients.
While there are various publications concerning impressive applications of the
semantic desktop as well as various architectural aspects, systematic approaches
of assessing the performance of the underlying technology are still hard to find.
7 Conclusion and Outlook
In this paper, we have introduced a framework for integrating dynamic A-box data from running software systems with a central reasoner. There are several use cases for such a framework, e.g., searching for information across different applications on a semantic desktop, dynamically adapting user interfaces to users' needs, or automatically integrating existing user interface components into a seamless application at run-time. In all of those use cases, a reasoner is employed which may need access both to the data contained in software components and to data about those components as such. As reasoning is performed while those components are running, the A-box can be highly dynamic.
In most of the use cases of reasoning on dynamic systems sketched above,
good performance is an essential requirement, as user interactions are involved.
Based on the prototype implementation of our architecture, we have conducted several experiments to evaluate the performance impact of different implementation variants. Those variants encompass different caching strategies as well as the design of the rules by which the connectors are invoked. We have tested the variants with 13 different query types.
In this paper, we have analyzed the performance impact of three different
rule types for rules invoking connectors to software components: generic rules,
extended rules, and single rules. In some test cases, the query answering times differ by up to a factor of 100 between the approaches. This shows that the design of the rules has a significant impact on system performance.
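The shape of these rule types can be sketched in F-Logic-style notation (the syntax and the `connector` predicate are illustrative, not the exact rules of our prototype):

```
// Generic rule: a single rule serves every relation R of a component;
// the connector may be invoked even for relations it cannot provide.
FORALL X, R, Y   X[R -> Y] <- connector(X, R, Y).

// Single rule: one rule is generated per concrete relation.
FORALL X, Y      X[marriedTo -> Y] <- connector(X, marriedTo, Y).
```

Extended rules, plausibly, sit between the two: they remain generic but add guards restricting the rule to the entities and relations a connector actually provides, which is why they behave stably across both query categories.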
One major finding is that there is no solution that provides optimal results for every usage scenario. In summary, we have shown that there are significant differences between goal-directed and explorative queries: in the first case, queries contain a fixed relation (e.g., "find all persons that are married to another person"), while in the latter, the relation is a variable (e.g., "find all persons that have any relation to another person"). While some queries are handled almost equally well by all three rule types, only extended rules guarantee reasonable and stable query answering times in all cases. To illustrate the significance of the results, we have introduced two example use cases, one using goal-directed and one using explorative queries.
In addition, we have analyzed two different strategies for caching data in the wrappers: eager and lazy caching. Eager caching allows for response times up to twice as fast as those of lazy caching. On the other hand, lazy caching supports much more dynamic A-boxes: eager caching only copes with less than 100 A-box updates per second, while with lazy caches, several thousand A-box updates per second can be processed. There is thus a trade-off between A-box dynamics and query times. When implementing an actual system, a solution should be chosen according to that system's actual requirements.
In this paper, we have analyzed the performance effects using a set of elementary queries, and we have shown that different implementation variants perform better or worse with certain query types. More complex query types may reveal deeper insights into the performance optimization of semantic applications.
While semantic technologies and reasoning on running software applications allow for interesting and valuable functionality, poor performance can be, and in fact often is, a show stopper. Thus, such applications should be carefully designed in order to be adopted by end users on a larger scale. With this paper, we have given insight into some strategies which can be carried over to the development of high-performance systems using semantic technology. We are confident that this contribution will help developers of semantic-web-based software in creating systems which become widely accepted.
Acknowledgements
The work presented in this paper has been partly funded by the German Federal
Ministry of Education and Research under grant no. 01ISO7009 and 01IA08006.
References
1. Paulheim, H., Probst, F.: Ontology-Enhanced User Interfaces: A Survey. Interna-
tional Journal on Semantic Web and Information Systems 6(2) (2010) 36–59
2. Cheyer, A., Park, J., Giuli, R.: IRIS: Integrate. Relate. Infer. Share. [22]
3. Sauermann, L., Bernardi, A., Dengel, A.: Overview and Outlook on the Semantic
Desktop. [22]
4. Karim, S., Tjoa, A.M.: Towards the Use of Ontologies for Improving User Interac-
tion for People with Special Needs. In Miesenberger, K., Klaus, J., Zagler, W.L.,
Karshmer, A.I., eds.: ICCHP. Volume 4061 of Lecture Notes in Computer Science.,
Springer (2006) 77–84
5. Gribova, V.: Automatic Generation of Context-Sensitive Help Using a User Inter-
face Project. In Gladun, V.P., Markov, K.K., Voloshin, A.F., Ivanova, K.M., eds.:
Proceedings of the 8th International Conference ”Knowledge-Dialogue-Solution”.
Volume 2. (2007) 417–422
6. Kohlhase, A., Kohlhase, M.: Semantic Transparency in User Assistance Systems.
In: Proceedings of the 27th annual ACM international conference on Design of
Communication. Special Interest Group on Design of Communication (SIGDOC-
09), Bloomington, IN, United States, ACM Special Interest Group for Design of
Communication, ACM Press (2009) 89–96
7. Paulheim, H.: Ontologies for User Interface Integration. In Bernstein, A., Karger,
D.R., Heath, T., Feigenbaum, L., Maynard, D., Motta, E., Thirunarayan, K., eds.:
The Semantic Web - ISWC 2009. Volume 5823 of LNCS., Springer (2009) 973–981
8. Paulheim, H.: Improving the Usability of Integrated Applications by Using Visu-
alizations of Linked Data. In: Proceedings of the International Conference on Web
Intelligence, Mining and Semantics (WIMS’11). (2011)
9. Paulheim, H.: Efficient Semantic Event Processing: Lessons Learned in User In-
terface Integration. In Aroyo, L., Antoniou, G., Hyvönen, E., ten Teije, A., Stuck-
enschmidt, H., Cabral, L., Tudorache, T., eds.: The Semantic Web: Research and
Applications (ESWC 2010), Part II. Volume 6089 of LNCS., Springer (2010) 60–74
10. Paulheim, H., Plendl, R., Probst, F., Oberle, D.: Mapping Pragmatic Class Models
to Reference Ontologies. In: The 2011 IEEE 27th International Conference on Data
Engineering Workshops - 2nd International Workshop on Data Engineering meets
the Semantic Web (DESWeb). (2011) 200–205
11. Decker, S., Erdmann, M., Fensel, D., Studer, R.: Ontobroker: Ontology Based
Access to Distributed and Semi-Structured Information. In Meersman, R., Tari,
Z., Stevens, S.M., eds.: Database Semantics - Semantic Issues in Multimedia Sys-
tems, IFIP TC2/WG2.6 Eighth Working Conference on Database Semantics (DS-
8), Rotorua, New Zealand, January 4-8, 1999. Volume 138 of IFIP Conference
Proceedings., Kluwer (1999) 351–369
12. Angele, J., Lausen, G.: Ontologies in F-Logic. In Staab, S., Studer, R., eds.:
Handbook on Ontologies. International Handbooks on Information Systems. 2nd
edn. Springer (2009) 45–70
13. Wiederhold, G., Genesereth, M.: The Conceptual Basis for Mediation Services.
IEEE Expert 12(5) (1997) 38–47
14. Paulheim, H., Probst, F.: Application Integration on the User Interface Level: an
Ontology-Based Approach. Data & Knowledge Engineering Journal 69(11) (2010)
1103–1116
15. Bizer, C., Seaborne, A.: D2RQ - Treating Non-RDF Databases as Virtual RDF
Graphs. In: ISWC 2004 Posters. (2004)
16. Baumgartner, R., Eiter, T., Gottlob, G., Herzog, M., Koch, C.: Information Ex-
traction for the Semantic Web. In Eisinger, N., Maluszynski, J., eds.: Reasoning
Web. Volume 3564 of Lecture Notes in Computer Science., Springer (2005) 275–289
17. Puleston, C., Parsia, B., Cunningham, J., Rector, A.: Integrating Object-Oriented
and Ontological Representations: A Case Study in Java and OWL. In Sheth, A.P.,
Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T.W., Thirunarayan, K.,
eds.: The Semantic Web - ISWC 2008. Volume 5318 of LNCS., Springer (2008)
130–145
18. Quasthoff, M., Meinel, C.: Semantic Web Admission Free - Obtaining RDF and
OWL Data from Application Source Code. In Kendall, E.F., Pan, J.Z., Sabbouh,
M., Stojanovic, L., Bontcheva, K., eds.: Proceedings of the 4th International Work-
shop on Semantic Web Enabled Software Engineering (SWESE). (2008)
19. Völkel, M., Sure, Y.: RDFReactor - From Ontologies to Programmatic Data Ac-
cess. In: Posters and Demos at International Semantic Web Conference (ISWC)
2005, Galway, Ireland. (2005)
20. Eberhart, A.: Automatic Generation of Java/SQL Based Inference Engines from
RDF Schema and RuleML. In Horrocks, I., Hendler, J.A., eds.: The Semantic
Web - ISWC 2002, First International Semantic Web Conference, Sardinia, Italy,
June 9-12, 2002, Proceedings. Volume 2342 of Lecture Notes in Computer Science.,
Springer (2002) 102–116
21. Paulheim, H.: Seamlessly Integrated, but Loosely Coupled - Building UIs from
Heterogeneous Components. In: ASE ’10: Proceedings of the IEEE/ACM Inter-
national Conference on Automated Software Engineering, New York, NY, USA,
ACM (2010) 123–126
22. Decker, S., Park, J., Quan, D., Sauermann, L., eds.: Proceedings of the ISWC 2005
Workshop on The Semantic Desktop - Next Generation Information Management
& Collaboration Infrastructure. Volume 175 of CEUR-WS. (2005)