Semantic Query-by-Example for RDF data
Inchul Song
Computer Science
KAIST
Daejeon, Korea
Email: icsong@dbserver.kaist.ac.kr
Myoung Ho Kim
Computer Science
KAIST
Daejeon, Korea
Email: mhkim@dbserver.kaist.ac.kr
Abstract—As a new way of information management emerges,
the fixed and static nature of the relational model is no longer
appropriate for new applications such as life sciences and lifelog
management. The Semantic Web provides a very flexible data
model called Resource Description Framework (RDF) and a
query language called SPARQL to represent and share data
for such applications. Easy access to RDF data is the most
important factor in increasing the use of the Semantic Web and
promoting information sharing. However, the complex structure of
RDF data and possible concept mismatch make it difficult to
express a query by using a textual, structured query language
like SPARQL. This paper presents a visual query interface
called Semantic Query-by-Example (SQBE) that can be used
to express a query over RDF data visually. The schema-guided
query construction feature of SQBE makes it easy to formulate a
query without knowledge of the structure of RDF data. A visual
query drawn by SQBE is translated into a SPARQL query by
a visual query-to-SPARQL translation algorithm for execution
and result output. This paper presents various example queries
that illustrate how simply and intuitively SQBE can formulate
user queries. A prototype implementation of SQBE for use in a
lifelog management system is also described in this paper.
I. INTRODUCTION
The Semantic Web is a Web of data and information. It
allows describing and sharing of resources and relationships
on the Web. It has been used for many applications such as
life sciences [1] and lifelog management [2] [3].
Resource Description Framework (RDF) [4] is the data
model for the Semantic Web. Resources and relationships
between them are modeled as a labelled graph in RDF.
Uniform Resource Identifiers (URIs) are used to uniquely
identify resources and relationships in an RDF graph. The
RDF data model is flexible enough to represent any form
of data such as relational data and XML data. For example,
data stored in a relational database can be easily translated
into RDF data [5].
The SPARQL language [6] is the standard query language
for RDF data. It is a graph pattern-based query language that
extracts the parts of RDF data that match the graph
structure specified by the user. To promote the use of the
Semantic Web, it is crucial that the user can easily access
RDF data. However, there are a couple of difficulties when
the user uses SPARQL to express a query over RDF data.
First, since SPARQL is based on graph pattern matching, the
user needs to know the underlying structure of RDF data to
express a query correctly. For large RDF data with complex
structure, expressing a query is a difficult task.
Second, a concept mismatch can exist between the user and the
domain specialist [7]. The user who expresses a query
and the domain specialist who designs and generates RDF data
can express the same concept differently. For example, suppose
that the domain specialist uses the term ‘business’ in RDF data
but the user expresses a query by using the term ‘commerce’.
Then, no result will be returned to the user.
This paper presents a visual query interface called Seman-
tic Query-by-Example (SQBE) to remedy the inconvenience
of using a graph pattern-based, textual query language like
SPARQL. SQBE provides various visual elements so that the
user expresses a query visually. Formulation of a query is
guided by schema information available in RDF data. This
guided query formulation allows the user to express a query
without knowing the structure of RDF data in advance. This
also helps to reduce the possibility of concept mismatch.
A query visually expressed by the user is translated into a
SPARQL query to extract the requested information from RDF
data.
This paper describes SQBE in detail and presents a number
of visual query examples that can be expressed by SQBE.
These examples will show how intuitively a variety of queries
can be expressed in SQBE. This paper also describes a
visual query-to-SPARQL translation algorithm and presents a
prototype of SQBE and how it can be used to express a query
in a lifelog management system.
This paper is organized as follows. Section II describes
background information. Section III explains SQBE in detail
and Section IV presents a prototype implementation of SQBE.
Related work is described in Section V. Finally, Section VI
concludes the paper.
II. PRELIMINARIES
A. Beyond the Relational Data Model
The relational data model has been successfully used in
many applications such as banking, enterprise applications,
etc. Data modelling in the relational data model is usually
done at a conceptual level by Entity-Relationship diagrams
[8]. An ER diagram models objects and their relationships by
entity sets and relationships. An ER diagram is then translated
into a number of tables in a relational database. In general,
the relational model assumes that there is a schema, or a set
of tables, which rarely changes. However, as a new way of
information management emerges, the fixed and static nature
of the relational model is no longer appropriate for new
applications such as life sciences and lifelog management.
Fig. 1. RDF data representing contact information
For example, in a lifelog management system, there are
many kinds of lifelogs such as pictures, videos, music, con-
versation records, schedules, medical records, and so on. And
any relationship can be made between any two lifelogs. In
addition, as the lifestyle or job of a person changes during the
lifetime of the person, new kinds of lifelogs and relationships
can appear. In this case, a static-table approach of the relational
data model is not appropriate for lifelog management.
The Semantic Web provides a data model suitable for
such applications. The RDF data model is flexible enough to
accommodate changes in the data. New pieces of information
can be easily added to existing RDF data. Thus, RDF has been
popularly used and adopted as ‘the data model’ for many new
applications.
B. RDF Data Model
The RDF data model [4] is the data model for the Semantic
Web. It represents resource descriptions on the Web as a la-
belled graph or a set of triples. Each triple consists of a subject,
property and object. A triple says that “the subject has the object as
the value of its property.” To uniquely identify each component of a triple,
URIs, which are similar to URLs, are used as identifiers in the
RDF data model. Because the length of a URI is typically long,
a shortened form is usually used instead. For example, suppose
that the URI “http://example.org/contact/joesmith” represents
a person whose name is Joe Smith. This URI can be abbre-
viated to “contact:joesmith”. Here the “contact” part of the
shortened URI actually means “http://example.org/contact/”.
The subject and property must be URIs. In the position of the object,
however, string values or literals can be used to contain textual
descriptions of resources.
Figure 1 shows an example of RDF data representing
contact information. It states that the resource identified by
“contact:joesmith” has the type “contact:Person”, that is, it is an
instance¹ of the Person class. His name is “Joe Smith”, as
expressed by the property “foaf:name” and the literal “Joe Smith”,
and his email address is “smith@example.org”. The data also
contains relationship information describing that Joe Smith’s
father is John Smith. The RDF data shown in Figure 1 can be
represented equivalently in a textual form as follows (a period
indicates the end of a triple):
contact:joesmith rdf:type contact:Person .
contact:joesmith foaf:mbox smith@example.org .
contact:joesmith foaf:name "Joe Smith" .
contact:joesmith family:father contact:johnsmith .
...
¹ In the standard documents related to RDF, the term individual is used
instead of instance. We use instance in this paper.
As shown in Figure 1, RDF data can contain schema
information that describes what kinds of classes are available.
It may also contain which properties can be used to connect
instances of specific classes.
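As a minimal sketch of the triple model described above, the contact data can be held as plain (subject, property, object) tuples with a prefix table for shortened URIs. The `contact:` and `foaf:` namespaces are taken from the text; the `family:` namespace URI, the function names, and the tuple representation are assumptions made only for illustration.

```python
# Prefix table for shortened URIs. The family namespace is a placeholder
# (assumption); the paper does not give its full URI.
PREFIXES = {
    "contact": "http://example.org/contact/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "family": "http://example.org/family/",  # assumed
}


def expand(term: str) -> str:
    """Expand a shortened URI such as 'contact:joesmith' to its full form."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term  # literals and unknown prefixes are returned unchanged


# Each triple is (subject, property, object); the object may be a literal.
TRIPLES = [
    ("contact:joesmith", "rdf:type", "contact:Person"),
    ("contact:joesmith", "foaf:mbox", "smith@example.org"),
    ("contact:joesmith", "foaf:name", "Joe Smith"),
    ("contact:joesmith", "family:father", "contact:johnsmith"),
]


def describe(subject: str) -> dict:
    """Collect the property/object pairs recorded for one subject."""
    return {p: o for s, p, o in TRIPLES if s == subject}
```

Adding a new fact about Joe Smith amounts to appending one more tuple, which is the flexibility the RDF model is credited with here; no table definition has to change.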
C. The SPARQL query language
SPARQL is a query language based on the concept of graph
pattern matching. In SPARQL, the user expresses a query as a
graph pattern, which is basically a labelled graph. Then, the part
of the RDF data that matches the graph pattern is extracted
and returned to the user.
For example, the following SPARQL query
asks for the name and email address of every person in the RDF
data shown in Figure 1.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE {
?x foaf:name ?name .
?x foaf:mbox ?mbox . }
The first line of this query means that “foaf:” is a shortened
form of “http://xmlns.com/foaf/0.1/”. The subsequent lines are
the select clause and the where clause, which are the main
elements of a simple SPARQL query. In the where clause, a
graph pattern is specified in a textual form. A query variable
is used at the positions where the user does not know its value
exactly. Specified in the select clause is a list of query variables
whose values will be retrieved.
The SPARQL query shown above matches the following
part of the RDF data in Figure 1:
contact:joesmith foaf:mbox smith@example.org .
contact:joesmith foaf:name "Joe Smith" .
As the values of the query variables ?name and ?mbox,
"Joe Smith" and smith@example.org will be returned
to the user. This query may also match other parts of the
RDF data, in which case further pairs of a name and an email
address will be returned.
As we mentioned in Section I, for large RDF data, complex
structure and possible concept mismatch make it difficult to
express a query by using a textual, structured query language
like SPARQL.
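The graph pattern matching idea can be illustrated with a small sketch that binds query variables (terms starting with `?`) against a list of triples. This is only a naive illustration of basic pattern matching under assumed names (`match`, `DATA`); it is not how a SPARQL engine is actually implemented.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# The contact data from Figure 1, written as plain tuples (assumed encoding).
DATA: List[Triple] = [
    ("contact:joesmith", "rdf:type", "contact:Person"),
    ("contact:joesmith", "foaf:mbox", "smith@example.org"),
    ("contact:joesmith", "foaf:name", "Joe Smith"),
    ("contact:joesmith", "family:father", "contact:johnsmith"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(patterns: List[Triple], data: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding under which all patterns occur in data."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        new = dict(binding)
        ok = True
        for p_term, d_term in zip(first, triple):
            if is_var(p_term):
                # A variable must keep the value it was bound to earlier.
                if new.setdefault(p_term, d_term) != d_term:
                    ok = False
                    break
            elif p_term != d_term:
                ok = False
                break
        if ok:
            yield from match(rest, data, new)


# The where clause of the example query, one tuple per triple pattern.
where = [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")]
results = [(b["?name"], b["?mbox"]) for b in match(where, DATA)]
```

Because `?x` appears in both patterns, a name and a mailbox are paired only when they belong to the same resource, which is why the user has to know how the data is structured before writing the pattern.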
III. SEMANTIC QUERY BY EXAMPLE
In this section, we describe SQBE in detail. SQBE provides
a visual query interface that can be used to express a query
for RDF data visually. Once a visual query is drawn, it is
translated into a SPARQL query for execution.
We make the following observations that lead to the devel-
opment of SQBE:
1) The user typically has partial or no knowledge of RDF
data, especially for large RDF data having complex
structure such as SwetoDblp [9] and DBpedia [10].
2) The user does not remember precisely the names of
resources and relationships that appear in RDF data.
There is also possible concept mismatch between the
user and the domain specialist.
3) Learning a new structured query language like SPARQL
is a burden for the user.
SQBE allows the user to express a query visually by using
various visual elements provided by SQBE. This makes it
possible to formulate a query intuitively without learning a
new language syntax (see observation 3 above). SQBE also
supports schema-guided query construction by showing which
classes and properties are available in RDF data, based on
schema information stored in the data. This allows the user to
express a query without complete knowledge of the structure
of RDF data (see observations 1 and 2).
In what follows, we first define a query graph, which models
what the user expresses visually by SQBE, and describe its
semantics. Next, we explain a query graph-to-SPARQL translation
algorithm. Finally, we present a number of examples in
the context of lifelog management that show how easily many
kinds of queries can be expressed in SQBE.
A. Query Graphs
What the user expresses visually by SQBE is modelled as
a query graph. A query graph $Q = (V, E, p)$ consists of three
components. $V$ is a vertex set and is the union of a Class Node
(C-Node) set $V_C$ and a Literal Node (L-Node) set $V_L$. Each
C-Node $c$ has a label $label(c)$, which is the name of the class
it represents. Each L-Node $l$ also has a label $label(l)$, which
is the string value of the L-Node. A C-Node is visually drawn
as a circle containing a class name and an L-Node as a
rectangle containing a string value.
The edge set $E$ is the union of a Property Edge (P-Edge) set
$E_P \subseteq V_C \times (V_C \cup V_L)$ and a Connect Edge (C-Edge) set
$E_C \subseteq V_C \times (V_C \cup V_L)$. Each edge $e_p$ in the P-Edge set $E_P$
is associated with a label $label(e_p)$, which is the name of the
property it represents. The label $label(e_c)$ of every edge $e_c$ in
the C-Edge set $E_C$ is Connect.
A P-Edge (C-Edge, respectively) can connect either a C-
Node to another C-Node, or a C-Node to an L-Node. A P-
Edge (C-Edge) connecting a C-Node and another C-Node is
called an object P-Edge (C-Edge) while that connecting a C-
Node and an L-Node is called a literal P-Edge (C-Edge). Both
P-Edges and C-Edges are visually drawn as a labeled arrow
whose head points to the target C-Node or L-Node and whose
label is the edge label.
Finally, the Project Node (P-Node) p is a C-Node designated
by the user to be the P-Node. It must be specified in every
query graph. The P-Node is visually drawn as a black-colored
circle containing a class name in it.
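The definition above maps naturally onto a small data structure. The following is a sketch only: the class names (`CNode`, `LNode`, `Edge`, `QueryGraph`) and the `validate` helper are hypothetical, and for simplicity two C-Nodes with the same class label compare equal, which a real implementation would avoid by giving nodes identifiers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(frozen=True)
class CNode:
    label: str  # name of the class the node represents


@dataclass(frozen=True)
class LNode:
    label: str  # string value of the node


@dataclass(frozen=True)
class Edge:
    source: CNode
    target: Union[CNode, LNode]
    label: str = "Connect"  # property name for a P-Edge, "Connect" for a C-Edge

    @property
    def is_connect(self) -> bool:
        return self.label == "Connect"

    @property
    def is_literal(self) -> bool:
        # A literal P-Edge (C-Edge) ends at an L-Node.
        return isinstance(self.target, LNode)


@dataclass
class QueryGraph:
    nodes: List[Union[CNode, LNode]] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)
    p_node: Optional[CNode] = None

    def validate(self) -> None:
        """Check the structural constraints stated in the definition."""
        if not isinstance(self.p_node, CNode) or self.p_node not in self.nodes:
            raise ValueError("one C-Node of the graph must be the P-Node")
        for e in self.edges:
            if not isinstance(e.source, CNode):
                raise ValueError("every edge must start at a C-Node")
            if e.source not in self.nodes or e.target not in self.nodes:
                raise ValueError("edge endpoints must belong to the graph")


# A Picture C-Node connected to an L-Node by a literal P-Edge.
picture = CNode("Picture")
states = LNode("states")
graph = QueryGraph([picture, states], [Edge(picture, states, "description")], picture)
graph.validate()
```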
B. Semantics
The semantics of a query graph is defined as follows. If
there is only one C-Node representing class C in a query
graph, then this query graph returns all instances of class C.
If there are two nodes in a query graph, and a P-Edge $e_p$
connects the two, the meaning of the query graph is defined
as follows:
• For an object P-Edge $e_p = (c_1, c_2)$, the meaning is to
retrieve only those instances of classes $label(c_1)$ and $label(c_2)$ that are
connected to each other by property $label(e_p)$.
• For a literal P-Edge $e_p = (c, l)$, the meaning is to return only
those instances of class $label(c)$ whose value of property
$label(e_p)$ is $label(l)$.
If two nodes are connected by a C-Edge, the meaning is
similar to the case when they are connected by a P-Edge.
However, a C-Edge does not constrain how the
two are connected to each other: any number of properties, of any
kind, can be used to connect the two. In other words, connecting
two nodes with a C-Edge means that the user does not know
exactly how the two are connected in RDF data.
Not all instances are returned to the user as a final result.
The user designates one of the C-Nodes in a query graph as
the P-Node. As a final result, only those instances of the class
represented by the P-Node are extracted and returned to the
user.
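The semantics above can be sketched as a brute-force evaluator over a list of triples. Everything here is an illustrative assumption: the sample data, the function names, the encoding of literal targets as `("lit", value)`, and in particular the reading of a C-Edge as directed reachability over one or more triples (the text leaves the direction open).

```python
from itertools import product

# Hypothetical lifelog data used only for this sketch.
DATA = [
    ("pic1", "rdf:type", "Picture"),
    ("pic2", "rdf:type", "Picture"),
    ("ann", "rdf:type", "Person"),
    ("acme", "rdf:type", "Company"),
    ("pic1", "personAppeared", "ann"),
    ("ann", "worksFor", "acme"),
    ("acme", "name", "Samsung"),
]


def instances(cls, data):
    """All instances of a class, found through rdf:type triples."""
    return sorted(s for s, p, o in data if p == "rdf:type" and o == cls)


def reachable(start, goal, data):
    """True if goal is reached from start by following one or more triples."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for s, _, o in data:
            if s == node and o not in seen:
                if o == goal:
                    return True
                seen.add(o)
                frontier.append(o)
    return False


def evaluate(c_nodes, edges, p_node, data):
    """c_nodes maps node ids to class names; edges are (source, label, target)
    with target a node id (object edge) or ("lit", value) (literal edge).
    Returns the instances of the P-Node that satisfy every edge."""
    ids = list(c_nodes)
    answers = set()
    for combo in product(*(instances(c_nodes[i], data) for i in ids)):
        bound = dict(zip(ids, combo))

        def holds(edge):
            src, label, tgt = edge
            obj = tgt[1] if isinstance(tgt, tuple) else bound[tgt]
            if label == "Connect":  # C-Edge: any path is acceptable
                return reachable(bound[src], obj, data)
            return (bound[src], label, obj) in data  # P-Edge: exact property

        if all(holds(e) for e in edges):
            answers.add(bound[p_node])
    return sorted(answers)


all_pictures = evaluate({"p": "Picture"}, [], "p", DATA)
with_person = evaluate({"p": "Picture", "q": "Person"},
                       [("p", "personAppeared", "q")], "p", DATA)
linked = evaluate({"p": "Picture", "c": "Company"},
                  [("p", "Connect", "c")], "p", DATA)
```

The last query shows the point of a C-Edge: the user does not have to know that a picture reaches a company through personAppeared followed by worksFor.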
C. Schema Guided Construction
SQBE supports schema guided construction of a query
graph based on schema information in RDF data. At first,
SQBE shows the classes available in RDF data as a class
hierarchy. The user initially starts with an empty query graph.
To add a C-Node to the empty query graph, the user selects a
class from the class hierarchy. Then a C-Node for the selected
class is added to the empty query graph.
Next, when the user selects a C-Node in the current query
graph, all P-Edges that can be used to connect the selected C-
Node to another C-Node or L-Node are shown for selection.
The user can expand the current query graph by selecting one
of the P-Edges presented. When an object P-Edge is selected,
the target C-Node is added to the query graph, and the source
and target C-Nodes are connected by the P-Edge. If a literal P-
Edge is selected, the user is prompted to enter a string value.
Then an L-Node with the string value input by the user is
added to the query graph, and the source C-Node and the
target L-Node are connected by the P-Edge. The user can also
connect any two nodes in the query graph with a C-Edge.
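The construction steps above can be sketched as follows. The schema is the lifelog schema used in the examples of this paper, written as (property, domain, range) rows; how SQBE actually reads schema information from RDF data is not specified here, so this encoding, the dictionary-based graph, and the function names are assumptions.

```python
# (property, class it is defined over, target class or "xsd:string")
SCHEMA = [
    ("description", "Picture", "xsd:string"),
    ("musicListened", "Picture", "Music"),
    ("singer", "Music", "xsd:string"),
    ("personAppeared", "Picture", "Person"),
    ("worksFor", "Person", "Company"),
    ("name", "Company", "xsd:string"),
]


def candidate_edges(class_name, schema=SCHEMA):
    """List the P-Edges that may leave a C-Node of the given class."""
    out = []
    for prop, domain, rng in schema:
        if domain == class_name:
            kind = "literal" if rng == "xsd:string" else "object"
            out.append((prop, kind, rng))
    return out


def add_p_edge(graph, source, prop, literal=None, schema=SCHEMA):
    """Grow the query graph by one P-Edge offered for the source C-Node."""
    source_class = graph["nodes"][source]
    for name, kind, rng in candidate_edges(source_class, schema):
        if name == prop:
            break
    else:
        raise ValueError(f"{prop} is not defined over {source_class}")
    target = f"n{len(graph['nodes']) + len(graph['literals'])}"
    if kind == "literal":
        if literal is None:
            raise ValueError("a literal P-Edge needs a string value")
        graph["literals"][target] = literal  # new L-Node
    else:
        graph["nodes"][target] = rng  # new C-Node for the target class
    graph["edges"].append((source, prop, target))
    return target


# Start from a single C-Node, then expand only along offered properties.
graph = {"nodes": {"n0": "Picture"}, "literals": {}, "edges": []}
person = add_p_edge(graph, "n0", "personAppeared")
company = add_p_edge(graph, person, "worksFor")
add_p_edge(graph, company, "name", "Samsung")
```

Since every edge is chosen from `candidate_edges`, the user never has to type a property name, which is how guided construction reduces the chance of a concept mismatch.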
D. SPARQL Translation
Once a query graph is specified, it should be translated into
a SPARQL query for execution. Algorithm 1 shows a query
graph-to-SPARQL translation algorithm, TRANSLATE, and
Algorithm 2 shows supporting procedures used in Algorithm
1.
TRANSLATE accepts a query graph Q and an empty
SPARQL query Output and adds query variables to the select clause
(select(Output)) and triples to the where clause (where(Output)) of
the SPARQL query appropriately. TRANSLATE translates each
C-Node (line 3) and the edges leaving it (line 8). If
the current C-Node is designated as the P-Node, the query
variable assigned to the C-Node is added to the select clause
of the SPARQL query (lines 4 and 5). During the translation
process, query variables are generated and assigned to
C-Nodes appropriately.
Algorithm 1 Query graph-to-SPARQL translation algorithm
 1: procedure TRANSLATE(Q = (V, E, p), Output)
 2:   for each c ∈ V_C do
 3:     subvar ← TRANSCNODE(c, Output)
 4:     if c = p then
 5:       Add subvar to select(Output)
 6:     end if
 7:     for each e = (c, d) ∈ E do
 8:       TRANSEDGE(e, Output)
 9:     end for
10:   end for
11: end procedure
To translate a C-Node, TRANSCNODE in Algorithm 2 adds
a triple of the form ‘var rdf:type label(c).’ to the where clause
(line 5). This triple indicates that var is an instance of class
label(c). TRANSEDGE translates an object P-Edge (C-Edge,
respectively) and a literal P-Edge (C-Edge) differently. The case
for an object P-Edge (C-Edge) is simple (lines 12 to 14): a
triple of the form ‘subvar label(e) objvar.’ is added to the
where clause. For a literal P-Edge (C-Edge)
(lines 15 to 18), one more triple, ‘objvar pf:textMatch
label(d).’, is added to the where clause. This triple indicates
that objvar is a literal value that is matched to label(d) by
free text search.
Algorithm 2 Supporting procedures for Algorithm 1
 1: procedure TRANSCNODE(c, Output)
 2:   if no variable has been assigned to c then
 3:     var ← Generate a new variable
 4:     Assign var to c
 5:     Add ‘var rdf:type label(c).’ to where(Output)
 6:   end if
 7:   var ← the variable assigned to c
 8:   return var
 9: end procedure
10: procedure TRANSEDGE(e = (c, d), Output)
11:   subvar ← TRANSCNODE(c, Output)
12:   if e is an object P-Edge or C-Edge then
13:     objvar ← TRANSCNODE(d, Output)
14:     Add ‘subvar label(e) objvar.’ to where(Output)
15:   else
16:     objvar ← Generate a new variable
17:     Add ‘subvar label(e) objvar.’ to where(Output)
18:     Add ‘objvar pf:textMatch label(d).’ to where(Output)
19:   end if
20: end procedure
The query graph-to-SPARQL translation algorithm uses two
properties that are not currently supported by the SPARQL
query language: Connect for C-Edges and pf:textMatch for
free text search. For the Connect property, refer to the query
language proposed in [11]. pf:textMatch is a property supported
by LARQ [12], a text indexer for SPARQL.
Fig. 2. Query graph for Example 1.
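The translation procedures can be sketched in a few lines of Python. This is an illustration, not the SQBE implementation: the dictionary-based graph encoding, the `Output` class, and the `?v0`, `?v1`, ... variable naming are assumptions, class and property names are emitted without a namespace prefix, and a Connect label is written out verbatim even though it is not standard SPARQL.

```python
class Output:
    """Holds the select and where clauses of the SPARQL query being built."""

    def __init__(self):
        self.select, self.where, self.vars = [], [], {}
        self.counter = 0

    def new_var(self):
        var = f"?v{self.counter}"
        self.counter += 1
        return var

    def render(self):
        lines = ["SELECT " + " ".join(self.select), "WHERE {"]
        lines += ["  " + t for t in self.where]
        lines.append("}")
        return "\n".join(lines)


def trans_cnode(c, graph, out):
    # On first visit, assign a variable and state the class membership.
    if c not in out.vars:
        out.vars[c] = out.new_var()
        out.where.append(f"{out.vars[c]} rdf:type {graph['c_nodes'][c]} .")
    return out.vars[c]


def trans_edge(edge, graph, out):
    src, label, tgt = edge
    subvar = trans_cnode(src, graph, out)
    if tgt in graph["c_nodes"]:  # object P-Edge or C-Edge
        objvar = trans_cnode(tgt, graph, out)
        out.where.append(f"{subvar} {label} {objvar} .")
    else:  # literal edge: bind a fresh variable and text-match it
        objvar = out.new_var()
        out.where.append(f"{subvar} {label} {objvar} .")
        out.where.append(objvar + ' pf:textMatch "' + graph["l_nodes"][tgt] + '" .')


def translate(graph, out):
    for c in graph["c_nodes"]:
        subvar = trans_cnode(c, graph, out)
        if c == graph["p_node"]:
            out.select.append(subvar)
        for edge in graph["edges"]:
            if edge[0] == c:  # edges leaving the current C-Node
                trans_edge(edge, graph, out)
    return out


# A query graph with three C-Nodes, one L-Node, and the P-Node on Picture.
graph = {
    "c_nodes": {"picture": "Picture", "person": "Person", "company": "Company"},
    "l_nodes": {"lit": "Samsung"},
    "edges": [("picture", "personAppeared", "person"),
              ("person", "worksFor", "company"),
              ("company", "name", "lit")],
    "p_node": "picture",
}
out = translate(graph, Output())
```

Calling `out.render()` yields the query text; running it would still require prefix declarations and an engine that understands pf:textMatch.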
E. Examples
In this section, we present several examples of using SQBE
to express a query over RDF data in a lifelog management
system. We shall show how simply and intuitively SQBE can
formulate user queries.
Before presenting examples, we first describe an example
schema prepared for use in examples. The example schema
models lifelog data and contains four classes: Picture, Person,
Music, and Company, whose meanings should be clear from
their names.
There are also several properties defined over these classes:
• A property description defined over the Picture class and
xsd:string² describes pictures by textual descriptions.
• A property musicListened defined over the Picture class
and the Music class describes what music is listened to
when a picture is taken.
• A property singer defined over the Music class and
xsd:string describes who sings a song.
• A property personAppeared defined over the Picture class
and the Person class describes who appears in a picture.
• A property worksFor defined over the Person class and
the Company class describes which company a person
is working for.
• A property name defined over the Company class and
xsd:string describes the name of a company.
For brevity, we do not write URI prefixes in the examples. All
class names and property names that appear in the examples are
assumed to be prefixed by “lifelog:”.
Example 1. Suppose that the user wants to find all pictures
from lifelog data. In this case, the user simply adds a C-Node
for class Picture to an empty query graph and marks the C-
Node as the P-Node of the query graph. Figure 2 shows the
query graph for this example.
Example 2. The user wishes to find all pictures that were
taken in the United States. The user first adds a C-Node for
class Picture to an empty graph as before, but this time the user
adds an L-Node containing the string value “states” and connects
the C-Node and the L-Node by a P-Edge description. Finally,
the user marks the C-Node for class Picture as the P-Node.
Figure 3 shows the query graph for this example.
² xsd:string represents a string value.
Fig. 3. Query graph for Example 2.
Example 3. The user wants to find some pictures but does
not remember when or where those pictures were taken.
However, the user remembers that, when the pictures
were taken, the user was listening to music sung by Michael
Jackson. With this clue, the user first adds a C-Node for
class Picture and another C-Node for class Music. Next, the
user connects them with a P-Edge musicListened. Finally, to
specify what music was listened to, the user adds an L-Node
containing the string value “Jackson” and connects the C-Node
for class Music to the L-Node with a P-Edge singer. Figure
4 shows the query graph for this example.
Fig. 4. Query graph for Example 3.
Example 4. In this last example, the user is searching for
pictures taken together with someone whom the user
does not remember exactly. The user remembers only that
this person works for a company named Samsung. As before,
the user starts by adding a C-Node for class Picture and
another C-Node for class Person and connects them with a
P-Edge personAppeared. Next, the user adds yet another C-Node
for class Company and connects the C-Node for
class Person to it with a P-Edge worksFor. Then, the user adds an
L-Node containing the string value “Samsung” and connects
the C-Node for class Company to it with a P-Edge name. Finally,
the user marks the C-Node for class Picture as the P-Node.
Figure 5 shows the query graph for this example.
Fig. 5. Query graph for Example 4.
Compare the query graph for Example 4 with the following
SPARQL query that has the same meaning (simplified for
brevity).
SELECT ?picture
WHERE {
?picture rdf:type ex:Picture .
?person rdf:type ex:Person .
?company rdf:type ex:Company .
?picture ex:personAppeared ?person .
?person ex:worksFor ?company .
?company ex:name "Samsung"
}
We believe that the query graph specified by SQBE is simpler
and more intuitive than the SPARQL query.
IV. IMPLEMENTATION
We implemented SQBE in Java SE 6.0. We use Jena
[13] to access ontology information modelled by the RDF
data model. Jena supports various storage subsystems such
as main memory storage, SQL-backed storage subsystem,
and non-SQL storage subsystem. We use TDB [14], a high
performance, pure-Java, non-SQL storage subsystem. TDB can
handle billions of triples. We use Visual Library 2.0 [15]
from NetBeans for visualization in SQBE. Visual Library 2.0
supports many features useful to implement SQBE such as
graph oriented visualization, zoom in/out, layouts, in-place
editing, satellite view, etc.
Figure 6 shows a screenshot of SQBE. In the figure, the
class hierarchy is shown on the left. By selecting a class and
clicking the “Add” button, the user can add a C-Node to the
query graph shown on the right. A C-Node in the query graph
is visualized as a table labelled with a class name. The table
shows a list of properties that can be used to connect the
C-Node to another C-Node or L-Node. It also has a check
box used to change a C-Node into the P-Node. By clicking
one of the properties or filling a string value in the condition
column, the user can expand the current query graph further.
A satellite view is shown on the rightmost side of the figure.
On the bottom are buttons for executing and clearing a query,
and checkboxes and text fields that control how to generate a
final SPARQL query from the query graph.
V. RELATED WORK
Zloof [16] proposed a visual query interface called Query-
by-Example (QBE) for relational databases. The user of QBE
formulates a query by filling in table space appropriately. When
the user types a table name, an empty table with column
headings corresponding to its attributes appears. The user then
fills in some of the columns with some values and designates
some other columns as output columns. Then the tuples having
values specified by the user are returned, projected onto the
output columns.
Since the Query-by-Example approach was first proposed,
this simple and intuitive concept has been applied to many
other data models as well. PESTO [17] is an integrated
query/browser for object databases. It supports both browsing
and querying in the same user interface. When the user selects
a class from an object database, a window for the selected class
appears on the screen. The attributes and values of the selected
class are shown in the window, for one object of the class at
a time. When the user changes into a query mode, the places
where attribute values appear are replaced by empty text fields,
and the user enters attribute values in some of the empty text
fields. Then only those objects that are filtered by the specified
attribute values appear one at a time in the window.
OZONE [18] is a visual query interface for ontologies.
Similar to our work, it provides visual elements that are used
to express a query for RDF data visually. However, there are
a couple of differences between OZONE and SQBE. First,