Weaving a Social Data Web
with Semantic Pingback
Sebastian Tramp, Philipp Frischmuth, Timofey Ermilov, and Sören Auer
Universität Leipzig, Institut für Informatik, AKSW,
Postfach 100920, D-04009 Leipzig, Germany,
{lastname}@informatik.uni-leipzig.de
http://aksw.org
Abstract. In this paper we tackle some of the most pressing obstacles of the
emerging Linked Data Web, namely the quality, timeliness and coherence of data
as well as direct end-user benefits. We present an approach for complementing
the Linked Data Web with a social dimension by extending the well-known
Pingback mechanism, which is a technological cornerstone of the blogosphere,
towards a Semantic Pingback. It is based on the advertising of an RPC service
for propagating typed RDF links between Data Web resources. Semantic Pingback
is downwards compatible with conventional Pingback implementations, thus making
it possible to connect and interlink resources on the Social Web with resources
on the Data Web. We demonstrate its usefulness by showcasing use cases of the
Semantic Pingback implementations in the semantic wiki OntoWiki and in
Triplify, a Linked Data interface for database-backed Web applications.
Introduction
Recently, the publishing of structured, semantic information as Linked Data has
gained much momentum. A number of Linked Data providers meanwhile publish more
than 200 interlinked datasets amounting to 13 billion facts¹. Despite this
initial success, there are a number of substantial obstacles which hinder the
large-scale deployment and use of the Linked Data Web. These obstacles are
primarily related to the quality, timeliness and coherence of Linked Data as
well as to providing direct benefits to end users. In particular for ordinary
users of the Internet, Linked Data is not yet sufficiently visible and
(re-)usable. Once information is published as Linked Data, authors hardly
receive any feedback on its use, and the opportunity to realize a network
effect of mutually referring data sources remains largely untapped.

¹ http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics
In this paper we present an approach for complementing the Linked Data
Web with a social dimension. The approach is based on an extension of the well-
known Pingback technology [9], which is one of the technological cornerstones of
the overwhelming success of the blogosphere in the Social Web. The Pingback
mechanism enables bi-directional links between weblogs and websites in general
as well as author/user notifications once a link has been newly established.
It is based on the advertising of a lightweight RPC service, in the HTTP or
HTML header of a certain Web resource, which should be called as soon as a
link to that resource is established. The Pingback mechanism enables authors
of a weblog entry or article to obtain immediate feedback when other people
reference their work, thus facilitating reactions and social interactions. It
also allows backlinks from the original article to comments or references of
the article elsewhere on the Web to be published automatically, thus improving
the timeliness and coherence of the Social Web. As a result, the distributed
network of social websites using the Pingback mechanism (such as the
blogosphere) is much more tightly and promptly interlinked than conventional
websites, thus producing a network effect, which is one of the major success
factors of the Social Web.
With this work we aim to transfer this success of the Social Web to the Linked
Data Web. We extend the Pingback mechanism towards a Semantic Pingback by
adding support for typed RDF links to Pingback clients, servers and the
autodiscovery process.
When an RDF link is established from one Semantic Pingback enabled Linked
Data resource to another, the latter can be automatically enriched with the
RDF link itself, with an RDF link using an inverse property, or with additional
information. When the author of a publication, for example, adds bibliographic
information including RDF links to co-authors of this publication to her
semantic wiki, the co-authors' FOAF profiles can be enriched with backlinks to
the bibliographic entry in an automated or moderated fashion. Semantic Pingback
supports provenance by tracking the lineage of information by means of a
provenance vocabulary. In addition, it allows a variety of spam prevention
measures to be implemented.
Semantic Pingback is completely downwards compatible with conventional
Pingback implementations, thus making it possible to seamlessly connect and
interlink resources on the Social Web with resources on the Data Web. A weblog
author can, for example, refer to a certain Data Web resource, whereupon the
publisher of this resource is immediately notified and owl:seeAlso links can
be automatically added to the Data Web resource. In order to facilitate the
adoption of the Semantic Pingback mechanism we developed three complementary
implementations: a Semantic Pingback implementation was included in the
semantic data wiki OntoWiki, support for Semantic Pingbacks was added to the
Triplify database-to-RDF mapping tool, and a standalone implementation is
provided for use by other tools or services.
The paper is structured as follows: We describe the requirements which guided
the development of Semantic Pingback in section 1 and give an architectural
overview of our solution in section 2. The communication behaviour and
autodiscovery algorithms are detailed in sections 3 and 4, which describe the
client and server behaviour respectively. A description of our implementations
based on OntoWiki and Triplify as well as the standalone software is given in
section 5. Finally, we survey related work in section 6 and conclude with an
outlook on future work in section 7.
1 Requirements
In this section we discuss the requirements that guided the development of our
Semantic Pingback approach.
Semantic links. The conventional Pingback mechanism propagates untyped
(X)HTML links between websites. The Semantic Pingback mechanism should, in
addition, be able to propagate typed links (e.g. OWL object properties)
between RDF resources.
Use RDFa-enhanced content where available. Since most traditional weblog
and wiki systems are able to create semantically enriched content based on
RDFa annotations², these systems should be able to propagate typed links
derived from the RDFa annotations to a Semantic Pingback server without any
additional modification or manual effort.

² This should be possible at least manually by using the system's HTML source
editor, but can be supported by extensions, as described for example in [6]
for Drupal.
Downward compatibility with conventional Pingback servers. Conventional
Pingback servers should be able to receive and accept requests from Semantic
Pingback clients. Thus, widely used Social Web software such as WordPress or
Serendipity can be pinged by a Linked Data resource to announce the referencing
of one of their posts. A common use case for this is a Linked Data SIOC [4]
comment which replies to and references a blog post or wiki page on the Social
Web. Such a SIOC comment typically uses the sioc:reply_of object property to
establish a link between the comment and the original post³.

³ Since SIOC is a very generic vocabulary, people can also use more specific
relations such as disagreesWith or alternativeTo from the Scientific Discourse
Relationships Ontology [5].
Downward compatibility for conventional Pingback clients. Conventional
Pingback clients should be able to send Pingbacks to Semantic Pingback
servers. Thus, a blogger can refer to any pingback-enabled Linked Data
resource in any post of her weblog, and her conventional Pingback client can
simply send a conventional Pingback to the Linked Data server. Unlike a
conventional Pingback server, the Semantic Pingback server should not create
a comment with an abstract of the blog post within the Linked Data resource
description. Instead, an additional triple should be added to the Linked Data
resource, which links to the referring blog post.
Support Pingback server autodiscovery from within RDF resources.
The conventional Pingback specification keeps the requirements on the client
side at a minimum, thus supporting the announcement of a Pingback server
through a <link> element in an HTML document. Since the Semantic Pingback
approach aims to apply the Pingback mechanism to the Web of Data, the
autodiscovery process should be extended to support the announcement of a
Pingback server from within RDF documents.
Provenance tracking. In order to establish trust on the Data Web it is
paramount to preserve the lineage of information. The Semantic Pingback
mechanism should incorporate provenance tracking for information that was
added to a knowledge base as the result of a Pingback.
Spam prevention. Another aspect of trust is the prevention of unsolicited
proliferation of data. The Semantic Pingback mechanism should enable the
integration of measures to prevent spamming of the Data Web. These measures
should incorporate methods based on data content analysis and social
relationship analysis.
2 Architectural Overview
The general architecture of the Semantic Pingback approach is depicted in
Figure 1. A linking resource (depicted in the upper left) links to another
(Data) Web resource, here called the linked resource (arrow 1). The linking
resource can be either a conventional Web resource (e.g. a wiki page or blog
post) or a Linked Data resource. Links originating from Linked Data resources
are always typed (based on the property used), while links from conventional
Web resources can be either untyped (i.e. plain HTML links) or typed (e.g. by
means of RDFa annotations). The Pingback client (lower left) is either
integrated into the data/content management system or realized as a separate
service, which observes changes of the Web resource (arrow 2). Once the
establishment of a link has been noted, the Pingback client tries to
autodiscover a Pingback server from the linked resource (arrow 3). If the
autodiscovery was successful, the respective Pingback RPC server is called
(arrow 4), with the linking resource (i.e. source) and the linked resource
(i.e. target) as parameters. In order to verify the received request (and to
obtain information about the type of the link in the semantic case), the
Pingback server fetches (or dereferences) the linking resource (arrow 5).
Subsequently, the Pingback server can perform a number of actions (arrows 6,
7), such as updating the linked resource (e.g. adding inverse links) or
notifying the publisher of the linked resource (e.g. via email). This approach
is compatible with the conventional Pingback specification [9].

Fig. 1. Architecture of the Semantic Pingback approach: on the resource layer,
a linking resource (source) links to a linked resource (target); on the RPC
layer, a Pingback client (link propagator) observes the source, performs the
autodiscovery and issues the RPC request, while the Pingback server (link
receiver) fetches the source, updates the target and notifies its publisher.
The following scenario, which was introduced in the above-mentioned
specification, illustrates the chain of communication steps executed for a
single Pingback request:
1. Alice posts to her blog. The post she has made (source resource) includes a
link to a post on Bob's blog (target resource).
2. Alice's blogging system (Pingback client) contacts Bob's blogging system
(Pingback server) and announces that a link to a post inside Bob's environment
was established.
3. Bob's blogging system then verifies that Alice's post indeed includes a
link to Bob and adds a link back to Alice's post on his original post.
4. Readers of Bob's article can follow this link to Alice's post to read her
opinion.
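For illustration, the remote procedure call issued in step 2 is the
pingback.ping method defined by the Pingback specification, which takes the
source and target URIs as its only parameters. With hypothetical URLs for
Alice's and Bob's posts, the XML-RPC request body could look as follows:

<?xml version="1.0"?>
<methodCall>
  <methodName>pingback.ping</methodName>
  <params>
    <param><value><string>http://alice.example.org/blog/post-1</string></value></param>
    <param><value><string>http://bob.example.org/blog/post-7</string></value></param>
  </params>
</methodCall>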
This scenario as well as the general architecture introduces four components,
which we now describe in more detail:
Pingback client. Alice’s blogging system comprises the Pingback client. The
Pingback client establishes a connection to the Pingback server on a certain
event (e.g. on submitting a new blog post) and starts the Pingback request.
Pingback server. Bob’s blogging system acts as the Pingback server. The
Pingback server accepts Pingback request via XML-RPC and reacts as config-
ured by the owner. In most cases, the Pingback server saves information about
the Pingback in conjunction with the target resource.
Target resource. Bob’s article is called the target resource and is identified
by the target URI. The target resource can be either a web page or an RDF
resource, which is accessible through the Linked Data mechanism. A target re-
source is called pingback-enabled, if a Pingback client is able to glean information
about the target resource’s Pingback server (see section 3.1 for autodiscovery of
Pingback server information).
Fig. 2. Sequence diagram illustrating the (Semantic) Pingback workflow: the
source publisher updates the source, which the Pingback client observes and
scans for links; the client performs the server autodiscovery and sends the
XML-RPC ping; the Pingback server fetches and checks the source document,
updates the target, informs the target publisher and returns an XML response.
Source resource. Alice’s post is called the source resource and is identified by
the source URI. Similar as the target resource, the source resource can be either
a web page or an RDF resource. The source resource contains some relevant
information chunks regarding the target resource.
These information chunks can belong to one or more of the following cate-
gories:
- An untyped (X)HTML link in the body of the web page (this does not apply
to Linked Data resources).
- A (possibly RDFa-encoded) RDF triple linking the source URI with the
target URI through an arbitrary RDF property. That is, the extracted source
resource model contains a direct relation between the source and the target
resource. This relation can be directed either from the source to the target
or in the opposite direction.
- A (possibly RDFa-encoded) RDF triple where either the subject or the
object of the triple is the target resource. This category represents
additional information about the target resource, including textual
information (e.g. an additional description) as well as assertions about
relations between the target resource and a third resource. This last category
will most likely appear only in RDFa-enhanced web pages, since Linked Data
endpoints are less likely to return triples describing foreign resources.
Depending on these categories, a Semantic Pingback server will handle the
Pingback request in different ways. We describe this in more detail later in
section 4.
Figure 2 illustrates the complete life-cycle sequence of a (Semantic) Pingback.
Firstly, the source publisher updates the source resource, which is observed by
a Pingback client. The Pingback client then scans the source resource for links
(typed or untyped) to other resources. Each time the client detects a suitable
link, it tries to determine a Pingback server by means of an autodiscovery
process. Once a Pingback server has been determined, the client pings that
server via an XML-RPC request. Section 3 contains a more detailed description
of these steps. Since the requested Pingback server only receives the source
and target URIs as input, it tries to gather additional information. At least
the source document is fetched and (possibly typed) links are extracted.
Furthermore, the target resource is updated and the publisher of the target
resource is notified about the changes. In section 4 the server behavior is
described in more detail. Finally, the Pingback server responds with an XML
result.
3 Client Behavior
One basic design principle of the original Pingback specification is to keep
the implementation requirements of a Pingback client as simple as possible.
Consequently, Pingback clients do not even need an XML/HTML parser for basic
functionality. There are three simple actions to be followed by a Pingback
client: (1) determine suitable links to external target resources, (2) detect
the Pingback server for a certain target resource and (3) send an XML-RPC post
request via HTTP to that server. Conventional Pingback clients would naturally
detect (untyped) links by scanning HTML documents for <a> elements and use the
href attribute to determine the target. Semantic Pingback clients will
furthermore derive suitable links by examining RDFa-annotated HTML or RDF
documents. Both conventional and Semantic Pingback clients are able to
communicate with a Semantic Pingback server, since Semantic Pingback uses
exactly the same communication interface. In particular, we did not change the
remote procedure call, but we introduce a third possible autodiscovery
mechanism for Semantic Pingback clients in order to allow the propagation of
server information from within RDF documents. On the one hand, this enables
the publisher of a resource to name a Pingback server even if the HTTP header
cannot be modified. On the other hand, this allows the caching and indexing of
Pingback server information in a Semantic Web application.
3.1 Server autodiscovery
The server autodiscovery is a protocol followed by a Pingback client to
determine the Pingback server of a given target resource. The Pingback
mechanism supports two different autodiscovery mechanisms which can be used by
the Pingback client:
- an HTTP header attribute X-Pingback and
- a link element in the HTML head with a relation attribute rel="pingback".
Both mechanisms interpret the respective attribute value as the URL of a
Pingback XML-RPC service, thus enabling the Pingback client to start the
request.
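As a minimal illustration of these two mechanisms, the following Python sketch
(not part of the specification; function names and the regular expression are
our own simplifications) discovers a Pingback server and issues the ping via
XML-RPC:

import re
import xmlrpc.client
from urllib.request import urlopen

def discover_pingback_server(target_uri):
    """Return the Pingback XML-RPC URL advertised by a target, or None."""
    response = urlopen(target_uri)
    # Preferred mechanism: the X-Pingback HTTP header.
    server_url = response.headers.get("X-Pingback")
    if server_url:
        return server_url.strip()
    # Fallback: a <link rel="pingback" href="..."> element; a regex scan
    # suffices, so no full HTML parser is needed.
    html = response.read().decode("utf-8", errors="replace")
    match = re.search(r'<link[^>]*rel="pingback"[^>]*href="([^"]+)"', html)
    return match.group(1) if match else None

def send_ping(source_uri, target_uri):
    """Issue the pingback.ping remote procedure call, if possible."""
    server_url = discover_pingback_server(target_uri)
    if server_url is None:
        return None  # the target is not pingback-enabled
    server = xmlrpc.client.ServerProxy(server_url)
    return server.pingback.ping(source_uri, target_uri)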
The X-Pingback HTTP header is the preferred autodiscovery mechanism, and all
Semantic Pingback servers must implement it in order to achieve the required
downward compatibility. We define an additional autodiscovery method for
Linked Data resources which is based on RDF and integrates better with
Semantic Web technologies.
To this end, we define an OWL object property service⁴, which is part of the
Pingback namespace and links an RDF resource with a Pingback XML-RPC server
URL. The advantage compared to an HTTP header attribute is that this
information can be stored along with a cached resource in an RDF knowledge
base. Another benefit is that different resources identified by hash URIs can
be linked with different Pingback servers. However, a disadvantage (as with
the HTML link element) is that Pingback clients need to retrieve and parse the
document instead of requesting the HTTP header only.

⁴ http://purl.net/pingback/service
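A Semantic Pingback client could implement this third mechanism roughly as
follows; the sketch assumes the rdflib library for fetching and parsing the
RDF document and is, of course, only one possible realization:

from rdflib import Graph, URIRef

PINGBACK_SERVICE = URIRef("http://purl.net/pingback/service")

def discover_from_rdf(target_uri):
    """Look for a pingback:service statement about the target resource
    and return the advertised XML-RPC server URL, or None."""
    graph = Graph()
    graph.parse(target_uri)  # dereference; rdflib negotiates RDF formats
    # Hash URIs work naturally here, since the exact target URI is used
    # as the subject of the lookup.
    for server_url in graph.objects(URIRef(target_uri), PINGBACK_SERVICE):
        return str(server_url)
    return None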
4 Server Behavior
While the communication behavior of the server is completely compatible with
the conventional Pingback mechanism (as described in [9]), the manipulation of
the target resource and other request handling functionality (e.g. sending
email notifications) is implementation and configuration dependent.
Consequently, in this section we focus on describing guidelines for the
important server-side manipulation and request handling issues: spam
prevention, backlinking and provenance tracking.
4.1 Spam Prevention
At some point every popular service on the Internet, be it email, weblogs,
wikis, newsgroups or instant messaging, had to face increasing abuse of its
communication service through unsolicited bulk messages sent indiscriminately.
Each service dealt with the problem by implementing technical as well as
organizational measures, such as black- and whitelists, spam filters, captchas
etc.
The Semantic Pingback mechanism prevents spamming by the following
verification method. When the Pingback server receives the notification
signal, it automatically fetches the linking resource, checking for the
existence of a valid incoming link or an admissible assertion about the target
resource. The Pingback server defines which types of links and information are
admissible. This can be based on two general strategies:
Information analysis. When analysing the links or assertions, the Pingback
server can, for example, dismiss assertions which have logical implications
(such as domain, range or cardinality restrictions), but allow label and
comment translations into other languages.
Publisher relationship analysis. This can be based e.g. on the trust level of
the publisher of the linking resource. A possibility to determine the trust level
is to resolve foaf:knows relationships from the linked resource publisher to
the linking resource publisher.
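A naive realization of the second strategy could, for example, dereference the
WebID of the linked resource publisher and check for a foaf:knows assertion
towards the linking resource publisher. The sketch below assumes rdflib and
deliberately ignores longer acquaintance chains:

from rdflib import Graph, URIRef
from rdflib.namespace import FOAF

def is_trusted(linked_publisher, linking_publisher):
    """True if the linked resource publisher's WebID profile asserts
    foaf:knows towards the linking resource publisher."""
    graph = Graph()
    graph.parse(linked_publisher)  # dereference the WebID profile
    return (URIRef(linked_publisher), FOAF.knows,
            URIRef(linking_publisher)) in graph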
If admissible links or assertions exist, the Pingback is recorded successfully,
e.g. by adding the additional information to the target resource and notifying
its publisher. This makes Pingbacks less prone to spam than e.g. Trackbacks⁵.

⁵ http://en.wikipedia.org/wiki/Trackback
In order to allow conventional Pingback servers (e.g. WordPress) to receive
links from the Data Web, such a link must be represented in a respective HTML
representation of the linking resource (managed by the Pingback client) at
least as an untyped (X)HTML link. This enables the server to verify the given
source resource even without being aware of Linked Data and RDF.
4.2 Backlinking
The initial idea behind propagating links from the publisher of the source
resource to the publisher of the target resource is to automate the creation
of backlinks to the source resource. In typical Pingback-enabled blogging
systems, a backlink is rendered in the feedback area of a target post together
with the title and a short text excerpt of the source resource.
To retrieve all information required for verifying the link and to gather
additional data from the source resource, a Semantic Pingback server will
follow these three steps:
1. Try to fetch an RDF representation (e.g. RDF/XML) of the source resource
by requesting Linked Data with an HTTP Accept header.
2. If this is not possible, the server should try to gather an RDF model from
the source resource employing an RDFa parser.
3. If this fails, the server should at least verify the existence of an
untyped (X)HTML link in the body of the source resource.
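A sketch of this cascade is shown below; steps 1 and 2 are covered, while the
RDFa extraction is delegated to a configurable distiller service (as in our
OntoWiki implementation, see section 5.1) whose URL is a placeholder:

from urllib.request import Request, urlopen
from rdflib import Graph

RDFA_DISTILLER = "http://example.org/rdfa-distiller?uri="  # placeholder

def fetch_source_model(source_uri):
    """Return an rdflib Graph for the source resource, or None if only
    an (X)HTML representation without RDFa is available."""
    # Step 1: ask for RDF/XML via content negotiation.
    request = Request(source_uri,
                      headers={"Accept": "application/rdf+xml"})
    response = urlopen(request)
    if "rdf+xml" in response.headers.get("Content-Type", ""):
        graph = Graph()
        graph.parse(data=response.read(), format="xml")
        return graph
    # Step 2: try to distill RDFa from the HTML representation.
    try:
        graph = Graph()
        graph.parse(RDFA_DISTILLER + source_uri, format="xml")
        if len(graph) > 0:
            return graph
    except Exception:
        pass
    # Step 3 (verification of an untyped link) is left to the caller.
    return None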
Depending on the category of data which was retrieved from the source
resource, the server can react in different ways:
- If there is only an untyped (X)HTML link in the source resource, this link
can be recorded as an RDF triple with a generic RDF property like
dc:references or sioc:links_to in the server's knowledge base.
- If there is at least one direct link from the source resource to the target
resource, this triple should be added to the server's knowledge base.
- If there is any other triple in the source resource where either the subject
or the object of the triple corresponds to the target resource, the target
resource can be linked to the source resource using the rdfs:seeAlso property.
In addition to the statements which link the source and the target resource,
metadata about the source resource (e.g. a label and a description) can be stored
as well.
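The resulting decision logic is compact; in the following sketch, graph is the
model obtained from the source resource (None or empty if only an untyped link
could be verified), and sioc:links_to stands in for the generic property:

from rdflib import URIRef, Namespace
from rdflib.namespace import RDFS

SIOC = Namespace("http://rdfs.org/sioc/ns#")

def triples_to_store(graph, source_uri, target_uri):
    source, target = URIRef(source_uri), URIRef(target_uri)
    if graph is None or len(graph) == 0:
        # Only an untyped (X)HTML link was found: record a generic link.
        return [(source, SIOC.links_to, target)]
    # Direct links between source and target, in either direction.
    direct = [(s, p, o) for s, p, o in graph
              if {s, o} == {source, target}]
    if direct:
        return direct
    # Other statements about the target: record an rdfs:seeAlso pointer.
    if any(target in (s, o) for s, p, o in graph):
        return [(target, RDFS.seeAlso, source)]
    return []  # nothing admissible; the ping is rejected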
4.3 Provenance Tracking
Provenance information can be recorded using the provenance vocabulary [8]⁶.
This vocabulary describes provenance information based on data access and data
creation attributes as well as three basic provenance-related types:
executions, actors and artifacts. Following the specification in [8], we
define a creation guideline for Pingback requests as described in this paper,
identified by the URI http://purl.net/pingback/Request. A specific Pingback
request execution is then performed by a Pingback data creating service, which
uses the defined creation guideline.

⁶ The Provenance Vocabulary Core Ontology Specification is available at
http://trdf.sourceforge.net/provenance/ns.html.
The following listing shows an example provenance model represented in N3:
@prefix :     <http://purl.org/net/provenance/ns#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .

[ a rdf:Statement ;
  rdf:subject   <http://example1.org/Source> ;
  rdf:predicate sioc:links_to ;
  rdf:object    <http://example2.org/Target> ;
  :containedBy [
    a :DataItem ;
    :createdBy [
      a :DataCreation ;
      :performedAt "2010-02-12T12:00:00Z" ;
      :performedBy [
        a :DataCreatingService ;
        rdfs:label "Semantic Pingback Service" ] ;
      :usedData [
        a :DataItem ;
        :containedBy <http://example1.org/Source> ] ;
      :usedGuideline <http://purl.net/pingback/Request> ] ] ] .
This provenance model describes a Pingback from http://example1.org/Source to
http://example2.org/Target. The Pingback was performed on Friday, 12 February
2010, at noon and resulted in a single statement, which links the source
resource to the target resource using the sioc:links_to property.
5 Implementation and Evaluation
In this section we describe the implementation and evaluation of Semantic
Pingback in three different scenarios. We implemented Semantic Pingback server
and client functionality for OntoWiki in order to showcase the semantic
features of the approach. Semantic Pingback server functionality was
integrated into Triplify, thus supporting the interlinking with relational
data on the Data Web. Finally, we implemented a standalone Semantic Pingback
server (also available as a service) that can be utilized by arbitrary
resources that do not provide a Pingback service themselves.
5.1 OntoWiki
OntoWiki [2]⁷ is a tool for browsing and collaboratively editing RDF knowledge
bases. Since OntoWiki enables users to add typed links to external resources,
we integrated a Semantic Pingback client component. A recently added feature
is the ability to expose the data stored in OntoWiki via the Linked Data
mechanism. Based on that functionality, a Semantic Pingback server component
was also integrated.

⁷ http://ontowiki.net
OntoWiki Pingback client. The Pingback client consists of a plugin that
handles a number of events triggered when statements are added to or removed
from the knowledge base. Each time a statement is added or removed, the plugin
first checks whether:
- the subject resource is a URI inside the namespace of the OntoWiki
environment,
- the subject resource is (anonymously) accessible via the Linked Data
mechanism⁸, and
- the object of the statement is a resource with a dereferenceable URI outside
the namespace of the OntoWiki environment.

⁸ This step is added to the process since OntoWiki is able to handle various
access control mechanisms; we thus ensure that the Pingback server of the
target resource is definitely able to access either the RDF or the (X)HTML
representation of the source resource.
If the above checks are successfully passed, the plugin tries to autodiscover
a Pingback server. This process follows the algorithm described in the
original Pingback specification, but adds support for target resources
represented in RDF as described in section 3.1. If a server was discovered, an
XML-RPC post request is sent.
OntoWiki Pingback server. The OntoWiki Pingback server is an extension
consisting of a plugin handling some request-cycle-related events, as well as
a component that provides a Pingback XML-RPC service. The plugin is
responsible for exposing the X-Pingback HTTP header in conjunction with the
URL of the RPC service.
The provided Pingback service initially checks whether the target resource is
valid, i.e. is inside the namespace of the OntoWiki environment and accessible
via the Linked Data mechanism. If a valid target resource was passed, the
service takes the following steps:
1. The server tries to request the source resource as RDF/XML. If an RDF/XML
document is retrieved, all relevant triples are extracted.
2. If the above step fails or no relevant triples are found, the OntoWiki
Pingback server utilizes a configurable RDFa extraction service (e.g. the W3C
RDFa Distiller⁹), which dynamically creates an RDF/XML representation from the
source Web page.
3. If the second step fails, the source resource is requested without an
additional Accept header. If an HTML document is retrieved, all links in the
document are checked. If a link to the target resource is found, a generic
triple with the property sioc:links_to is formed, with the source resource as
subject and the target resource as object.
Relevant triples are all triples that have either the source resource as
subject and the target resource as object or vice versa. If no such statements
were found, but the graph contains at least one statement that has the target
resource as subject, an rdfs:seeAlso link is established from the target
resource to the source resource.
All relevant statements are added to the knowledge base containing the target
resource. By using the versioning functionality of OntoWiki, provenance
information of statements added via Pingback requests can be determined, thus
allowing the service to delete statements that are no longer contained in the
source resource.
⁹ http://www.w3.org/2007/08/pyRdfa/
Backlinks that were established via the Pingback service are displayed in the
standard OntoWiki user interface. The "Instances Linking Here" box shows all
incoming links for a given resource in conjunction with the type of the link,
as visualized in Figure 3.

Fig. 3. OntoWiki backlinks are rendered in the "Instances Linking Here" side
box. The example visualizes a personal WebID with three different backlinks
using different relations.
5.2 Triplify
Triplify [1] enables the publication of Linked Data from relational databases.
It utilizes simple mappings from HTTP URLs to SQL queries and transforms the
relational results into RDF statements. Since a large quantity of currently
available web data is stored in relational databases, Triplify substantially
increases the number of available Linked Data resources. As people start to
link to those resources, it becomes useful to notify the respective owners. We
therefore integrated a Semantic Pingback server into Triplify, which exposes
an X-Pingback HTTP header and handles incoming RPC requests.
The RPC service creates a new database table and stores all registered
Pingbacks persistently. Pingbacks are unique for a given source, target and
relation and hence can be registered only once. Each time the Pingback service
is executed for a given source and target, invalid Pingbacks are removed
automatically.
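Conceptually, the registration logic boils down to a table with a uniqueness
constraint over source, target and relation. The following sketch uses SQLite
for brevity; the actual Triplify implementation targets the database of the
underlying Web application:

import sqlite3

connection = sqlite3.connect("pingbacks.db")
connection.execute("""
    CREATE TABLE IF NOT EXISTS pingbacks (
        source   TEXT NOT NULL,
        target   TEXT NOT NULL,
        relation TEXT NOT NULL,
        UNIQUE (source, target, relation)  -- each Pingback only once
    )""")

def refresh_pingbacks(source, target, current_relations):
    """Re-register the Pingbacks for a source/target pair; stale
    registrations are dropped, mirroring the automatic cleanup."""
    connection.execute(
        "DELETE FROM pingbacks WHERE source = ? AND target = ?",
        (source, target))
    connection.executemany(
        "INSERT OR IGNORE INTO pingbacks VALUES (?, ?, ?)",
        [(source, target, r) for r in current_relations])
    connection.commit()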
Triplify was extended to export statements for all registered Pingbacks
regarding a given target resource along with the instance data. The following
listing shows an excerpt of a Triplify export:
# ...

<post/1>
    a sioc:Post ;
    sioc:has_creator <user/1> ;
    dcterms:created "2010-02-17T05:48:11" ;
    dcterms:title "Hello world!" ;
    sioc:content "Welcome to WordPress. This is your ..." .

# ...

<http://blog.aksw.org/2008/pingback-test/>
    sioc:links_to <post/1> .
5.3 Standalone implementation
Since a large amount of the RDF data available on the Web is contained in
plain RDF files (e.g. FOAF files), we implemented a standalone Semantic
Pingback server¹⁰, which can be configured to also allow Pingbacks on external
resources. Based on this implementation, we offer a Semantic Pingback service
at http://pingback.aksw.org. It is sufficient to add an RDF statement to an
arbitrary web-accessible RDF document stating, by means of the
pingback:service property, that the AKSW Pingback service should be used. Once
a Pingback has been sent to that service, the owner of the document is
notified via email. This works well for FOAF profiles, since the service can
detect a foaf:mbox statement in the profile, which relates the WebID to a
mailto: URI. If no such statement is found, the service looks for statements
that relate the target resource via a foaf:maker, dc:creator,
sioc:has_creator or sioc:has_owner relation to a resource for which an email
address can be obtained.

¹⁰ Available at http://aksw.org/Projects/SemanticPingBack
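Under the assumption that the target's RDF document has already been loaded
into an rdflib graph, the address lookup described above can be sketched as
follows:

from rdflib import URIRef, Namespace
from rdflib.namespace import FOAF, DC

SIOC = Namespace("http://rdfs.org/sioc/ns#")
CREATOR_PROPERTIES = (FOAF.maker, DC.creator,
                      SIOC.has_creator, SIOC.has_owner)

def notification_address(graph, target_uri):
    """Return a mailto: URI for the owner of the target, or None."""
    target = URIRef(target_uri)
    # Preferred case: a foaf:mbox statement on the WebID itself.
    for mbox in graph.objects(target, FOAF.mbox):
        return str(mbox)
    # Otherwise follow creator/owner relations and use their mailbox.
    for prop in CREATOR_PROPERTIES:
        for agent in graph.objects(target, prop):
            for mbox in graph.objects(agent, FOAF.mbox):
                return str(mbox)
    return None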
6 Related Work
Pingback [9] is one of three approaches which allow the automated generation
of backlinks on the Social Web. We have chosen the Pingback mechanism as the
foundation for this work, since it is widely used and less prone to spam than,
for example, Trackbacks¹¹. Pingback supports the propagation of untyped links
only and is hence not directly applicable to the Data Web.

¹¹ http://www.sixapart.com/pronet/docs/trackback_spec
The PSI BackLinking Service for the Web of Data¹² supports the manual creation
of backlinks on the Data Web by employing a number of large-scale knowledge
bases, for example data from the UK Public Sector Information domain. Since it
is based on crawling a fixed set of knowledge bases, it cannot be applied to
the entire Data Web. Another service that, amongst others, is integrated with
the PSI BackLinking Service is SameAs.org¹³ [7]. Unlike Semantic Pingback, it
crawls the Web of Data in order to determine URIs describing the same
resources. OKKAM [3] is a system that aims at unifying resource identifiers by
employing metadata about resources in order to match them to entities.

¹² http://backlinks.psi.enakting.org
¹³ http://sameas.org
The above approaches support the interlinking of resources employing
centralized hubs, but do not support decentralized, on-the-fly backlinking,
since they are based on crawling the Data Web on a regular basis.
Consequently, the primary goal of these approaches is to reveal resource
identifiers describing the same entities, rather than interlinking different
resources, which is a key feature of the Semantic Pingback approach.
7 Conclusion and Future Work
Although the Data Web is currently growing substantially, it still lacks a
network effect like the one we could observe, for example, with the
blogosphere in the Social Web. In particular, coherence, information quality,
timeliness and direct end-user benefits are still obstacles for the Data Web
to become a Web-wide reality. With this work we aimed at extending and
transferring the technological cornerstone of the Social Web, the Pingback
mechanism, towards the Data Web. The resulting Semantic Pingback mechanism has
the potential to significantly improve the coherence on the Data Web, since
linking becomes bi-directional. With its integrated provenance and spam
prevention measures it helps to increase information quality. Notification
services based on Semantic Pingbacks represent direct end-user benefits and
increase timeliness. In addition, these different benefits will mutually
strengthen each other. Due to its complete downwards compatibility, our
Semantic Pingback also bridges the gap between the Social Web and the Data
Web. We also expect the Semantic Pingback mechanism to support the transition
process from data silos to flexible, decentralized structured information
assets.
Future Work. Currently the Semantic Pingback mechanism is applicable to
relatively static resources, i.e. RDF documents or RDFa-annotated Web pages.
We plan to extend the Semantic Pingback mechanism in such a way that it is
also usable in conjunction with dynamically generated views on the Data Web,
i.e. SPARQL query results. This would allow end users as well as applications
using remote SPARQL endpoints to get notified once the results of a query
change.
References
1. S. Auer, S. Dietzold, J. Lehmann, S. Hellmann, and D. Aumueller. Triplify –
Lightweight Linked Data Publication from Relational Databases. In Proceedings
of the 18th International Conference on World Wide Web, WWW 2009, 2009.
2. S. Auer, S. Dietzold, and T. Riechert. OntoWiki – A Tool for Social,
Semantic Collaboration. In The Semantic Web – ISWC 2006, 5th International
Semantic Web Conference, ISWC 2006, 2006.
3. P. Bouquet, H. Stoermer, C. Niederée, and A. Mana. Entity Name System: The
Back-Bone of an Open and Scalable Web of Data. In Proceedings of the 2nd IEEE
International Conference on Semantic Computing (ICSC 2008), 2008.
4. J. Breslin, A. Harth, U. Bojars, and S. Decker. Towards
Semantically-Interlinked Online Communities. In The Semantic Web: Research and
Applications, Second European Semantic Web Conference, ESWC 2005, 2005.
5. P. Ciccarese, E. Wu, G. T. Wong, M. Ocana, J. Kinoshita, A. Ruttenberg, and
T. Clark. The SWAN biomedical discourse ontology. Journal of Biomedical
Informatics, 41(5):739–751, 2008.
6. S. Corlosquet, R. Cyganiak, A. Polleres, and S. Decker. RDFa in Drupal:
Bringing Cheese to the Web of Data. In Proc. of 5th Workshop on Scripting and
Development for the Semantic Web at ESWC 2009, 2009.
7. H. Glaser, A. Jaffri, and I. Millard. Managing Co-reference on the Semantic
Web. In Proceedings of the Linked Data on the Web Workshop (LDOW2009), 2009.
8. O. Hartig. Provenance Information in the Web of Data. In Proceedings of the
Linked Data on the Web Workshop (LDOW2009), Madrid, Spain, April 20, 2009.
9. S. Langridge and I. Hickson. Pingback 1.0. Technical report,
http://hixie.ch/specs/pingback/pingback, 2002.
... [18]), in particular the Semantic Pingback protocol (cf. [17]) and can be extended by a publish-subscribe system using PubSubHubbub (cf. [8]). ...
... Semantic Pingback 3 (cf. [17]) is an approach for bringing a social dimension to the Linked Data Web by adding semantic to the well-known Pingback mechanism (cf. [12]), a technological cornerstone of the blogosphere. ...
... After the resource was successfully written to the Resource Hosting Service the Annotation Client directly sends the pingback request to the Semantic Pingback Service. This behavior is in contras to the specification in [17], where the pingback request would be sent by the Resource Hosting Service. We have decided to implement it in this way to keep the requirements towards a Hosting Service as minimal as possible and increase the selection for a service as flexible as possible, to allow a more decentralized infrastructure in the end. ...
Conference Paper
Full-text available
The World Wide Web is an infrastructure to publish and retrieve information through web resources. It evolved from a static Web 1.0 to a multimodal and interactive communication and information space which is used to collaboratively contribute and discuss web resources, which is better known as Web 2.0. The evolution into a Semantic Web (Web 3.0) proceeds. One of its remarkable advantages is the decentralized and interlinked data composition. Hence, in contrast to its data distribution, workflows and technologies for decentralized collaborative contribution are missing. In this paper we propose the Structured Feedback protocol as an interactive addition to the Web of Data. It offers support for users to contribute to the evolution of web resources, by providing structured data artifacts as patches for web resources, as well as simple plain text comments. Based on this approach it enables crowd-supported quality assessment and web data cleansing processes in an ad-hoc fashion most web users are familiar with.
... Since the early days of the Linked Data Web, several attempts have been created and failed to sustain exhaustive Linked Data Search engines, such as Sindice [19], SWSE [20], Watson [21], Swoogle [22], just to name a few. Typically based on bespoke, crawler-based architectures, these search engines relied on either (i) collecting data published under the Linked Data principles and particularly applying the "follow-your-nose" approach enabled through these principles (i.e., find more Linked Data by dereferencing links appearing in Linked Data), and sometimes (ii) relying on "registry" or "pingback" services to collect and advertise linked data assets, such as Semantic Pingback [23]. In the meantime, unfortunately all of these search engines have been discontinued, and we are not aware of any active, public Semantic Pingback services. ...
... Yet, out of the 251 potential respondent endpoint addresses mentioned above only 136 respond to this recipe, out of which in fact 63 return HTML (mostly query forms), even if attempting CONNEG. 23 We note that while some of these mentioned HTML responses might contain RDFa, it is still an extra step to extract and parse and each such extra step will bloat a potential consuming client unnecessarily. Similarly, when attempting to find data dumps, without a semantic sitemap or a VoID file in place, our best guess would be to guess and try parsers from "format" descriptors in the metadata or from filename suffixes. ...
... 22 https://www.sitemaps.org/protocol.html 23 with sending an 'Accept: text/turtle, application/n-triples, application/trig, application/n-quads, application/rdf+xml, * ' header. ...
Article
Full-text available
In this deliberately provocative position paper, we claim that more than ten years into Linked Data there are still (too?) many unresolved challenges towards arriving at a truly machine-readable and decentralized Web of data. We take a deeper look at key challenges in usage and adoption of Linked Data from the ever-present "LOD cloud" diagram. Herein, we try to highlight and exemplify both key technical and non-technical challenges to the success of LOD, and we outline potential solution strategies. We hope that this paper will serve as a discussion basis for a fresh start towards more actionable, truly decentralized Linked Data, and as a call to the community to join forces.
... The patch request needs to be submitted to the PatchR Instance belonging to the dataset. To achieve this, either a PatchR side API can be applied to send the patch request directly, or the client publishes the patch request in an own repository and announces this publication via semantic pingback RPC service (Tramp, Frischmuth, Ermilov, & Auer, 2010). The decision on the execution of particular patch requests is left to the publisher's Moderator instance based on individual rules. ...
... Alternatively, the patch request can also be published at a client side web server. In this case the collector needs to be informed about the creation, which can be carried out through the semantic pingback mechanism (Tramp et al., 2010). Hereby, the client calls an RPC method on the Collector side having the URL of the patch as an argument. ...
Article
Full-text available
Incorrect or outdated data is a common problem when working with Linked Data in real world applications. Linked Data is distributed over the web and under control of various dataset publishers. It is difficult for data publishers to ensure the quality and timeliness of the data all by themselves, though they might receive individual complaints by data users, who identified incorrect or missing data. Indeed, we see Linked Data consumers equally responsible for the quality of the datasets they use. PatchR provides a vocabulary to report incorrect data and to propose changes to correct them. Based on the PatchR ontology a framework is suggested that allows users to efficiently report and data publishers to handle change requests for their datasets.
... Monitoring tools [28] also provide information about the availability of LOD. DSNotify [29] and Semantic Pingback [30] are often considered generic frameworks. DSNotify uses the time blocking technique to detect and fix broken links between resources. ...
Article
Full-text available
The Linked Open Data (LOD) cloud is a global information space with a wealth of structured facts, which are useful for a wide range of usage scenarios. The LOD cloud handles a large number of requests from applications consuming the data. However, the performance of retrieving data from LOD repositories is one of the major challenge. Overcome with this challenge, we argue that it is advantageous to maintain a local cache for efficient querying and processing. Due to the continuous evolution of the LOD cloud, local copies become outdated. In order to utilize the best resources, improvised scheduling is required to maintain the freshness of the local data cache. In this paper, we have proposed an approach to efficiently capture the changes and update the cache. Our proposed approach, called Application- Aware Change Prioritization (AACP), consists of a change metric that quantifies the changes in LOD, and a weight function that assigns importance to recent changes. We have also proposed a mechanism to update policies, called Preference-Aware Source Update (PASU), which incorporates the previous estimation of changes and establishes when the local data cache needs to be updated. In the experimental evaluation, several state-ofthe- art strategies are compared against the proposed approach. The performance of each policy is measured by computing the precision and recall between the local data cache update using the policy under consideration and the data source, which is the ground truth. Both cases of a single update and iterative update are evaluated in this study. The proposed approach is reported to outperform all the other policies by achieving an F1-score of 88% and effectivity of 93.5%.
... For instance, Endris et al. introduce an approach to monitor the changesets of DBpedia Live for relevant updates [2] (such a changeset is a log of removed and inserted triples). Tools for dataset update notification, such as DSNotify [12] and Semantic Pingback [14], are available but rarely deployed. ...
Conference Paper
Many datasets change over time. As a consequence, long-running applications that cache and repeatedly use query results obtained from a SPARQL endpoint may resubmit the queries regularly to ensure up-to-dateness of the results. While this approach may be feasible if the number of such regular refresh queries is manageable, with an increasing number of applications adopting this approach, the SPARQL endpoint may become overloaded with such refresh queries. A more scalable approach would be to use a middle-ware component at which the applications register their queries and get notified with updated query results once the results have changed. Then, this middle-ware can schedule the repeated execution of the refresh queries without overloading the endpoint. In this paper, we study the problem of scheduling refresh queries for a large number of registered queries by assuming an overload-avoiding upper bound on the length of a regular time slot available for testing refresh queries. We investigate a variety of scheduling strategies and compare them experimentally in terms of time slots needed before they recognize changes and number of changes that they miss.
... For instance, Endris et al. introduce an approach to monitor the changesets of DBpedia Live for relevant updates [3] (such a changeset is a log of removed and inserted triples). Tools for dataset update notification, such as DSNotify [15] and Semantic Pingback [17], are available but extremely rarely deployed. Further hints for possible changes may be obtained from metadata about datasets; for instance, the DCAT recommendation suggests to use dcterms:modified or dcterms:accrualPeriodicity to describe update frequencies of a dataset. ...
Technical Report
Full-text available
Many datasets change over time. As a consequence, long-running applications that cache and repeatedly use query results obtained from a SPARQL endpoint may resubmit the queries regularly to ensure up-to-dateness of the results. While this approach may be feasible if the number of such regular refresh queries is manageable, with an increasing number of applications adopting this approach, the SPARQL endpoint may become overloaded with such refresh queries. A more scalable approach would be to use a middle-ware component at which the applications register their queries and get notified with updated query results once the results have changed. Then, this middle-ware can schedule the repeated execution of the refresh queries without overloading the endpoint. In this paper, we study the problem of scheduling refresh queries for a large number of registered queries by assuming an overload-avoiding upper bound on the length of a regular time slot available for testing refresh queries. We investigate a variety of scheduling strategies and compare them experimentally in terms of time slots needed before they recognize changes and number of changes that they miss.
... One possibility is that various algorithms make use of shared vocabularies for publishing results of mapping, merging, repair or enrichment steps. After one service published its new findings in one of these commonly understood vocabularies, notification mechanisms (such as Semantic Pingback [11]) can notify relevant other services (which subscribed to updates for this particular data domain), or the original data publisher, that new improvement suggestions are available. Given proper management of provenance information, improvement suggestions can later (after acceptance by the publisher) become part of the original dataset. ...
Chapter
Full-text available
In this introductory chapter we give a brief overview on the Linked Data concept, the Linked Data lifecycle as well as the LOD2 Stack – an integrated distribution of aligned tools which support the whole life cycle of Linked Data from extraction, authoring/creation via enrichment, interlinking, fusing to maintenance. The stack is designed to be versatile; for all functionality we define clear interfaces, which enable the plugging in of alternative third-party implementations. The architecture of the LOD2 Stack is based on three pillars: (1) Software integration and deployment using the Debian packaging system. (2) Use of a central SPARQL endpoint and standardized vocabularies for knowledge base access and integration between the different tools of the LOD2 Stack. (3) Integration of the LOD2 Stack user interfaces based on REST enabled Web Applications. These three pillars comprise the methodological and technological framework for integrating the very heterogeneous LOD2 Stack components into a consistent framework.
... In response to these characteristics, the following Linked Data design principles were conceived: 3 • identifying information units by HTTP URIs, • returning uniformly structured data when these URIs are dereferenced, i.e. treated as URLs, and • linking to other, related data. We propose to extend these principles as follows in organisational settings: • evolving existing thesauri, taxonomies, wikis and master data management systems into corporate knowledge bases and knowledge hubs, • establishing an organisation-wide URI naming scheme, • extending existing information systems in the intranet by Linked Data interfaces, and • establishing links between sources of related information. ...
Article
Full-text available
The Linked Data paradigm has emerged as a powerful enabler for data and knowledge interlinking and exchange using standardised Web technologies. In this article, we discuss our vision how the Linked Data paradigm can be employed to evolve the intranets of large organisations-be it enterprises, research organisations or governmental and public administrations-into networks of internal data and knowledge. In particular for large enterprises data integration is still a key challenge. The Linked Data paradigm seems a promising approach for integrating enterprise data. Like the Web of Data, which now complements the original document-centred Web, data intranets may help to enhance and flexibilise the intranets and service-oriented architectures that exist in large organisations. Furthermore, using Linked Data gives enterprises access to 50+ billion facts from the growing Linked Open Data (LOD) cloud. As a result, a data intranet can help to bridge the gap between structured data management (in ERP, CRM or SCM systems) and semi-structured or unstructured information in documents, wikis or web portals, and make all of these sources searchable in a coherent way.
Chapter
Linked Data is one of the emerging ways to publish and link structured and machine-processable data on the Web, however, the existing techniques to perform live query Linked Data are based on recursive URI look-up process. These techniques contain a limitation for the query patterns having subject unbound and object containing a foreign URI. In such cases, the live query does not produce any answers to the query as the querying process could not be initiated due to unavailability of subject field in the triple pattern. In this paper, we make use of backlinking to extract and store foreign URIs and using this information for executing the queries live where the subject is unbound.
Chapter
Datenintegration ist in großen Unternehmen nach wie vor eine zentrale Herausforderung und wird es auch auf absehbare Zeit bleiben. Ein erfolgversprechender Ansatz ist die Verwendung des Linked Data Paradigmas für die Integration von Unternehmensdaten. Ebenso wie inzwischen ein Web der Daten das Dokumenten-zentrierte Web ergänzt, können Daten-Intranets die existierenden Intranet- und SOA-Landschaften in großen Unternehmen erweitern und flexibilisieren. Ein weiterer Vorteil des Linked Data Paradigmas ist die Möglichkeit der Nutzung von Daten aus der inzwischen auf über 50 Mrd. Fakten angewachsenen Linked Open Data (LOD) Cloud. Im Ergebnis kann ein unternehmensinternes Daten-Intranet dazu beitragen die Brücke zwischen strukturiertem Datenmanagement (in ERP, CRM, SCM Systemen) sowie semi- und unstrukturierten Informationen (Dokumente, Wikis, Portale) der Intranetsuche zu schlagen.
Article
Full-text available
The Web of Data is built upon two simple ideas: Employ the RDF data model to publish structured data on the Web and to set explicit RDF links between entities within different data sources. This paper presents the Silk – Link Discovery Framework, a tool for finding relationships between entities within different data sources. Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web. Silk features a declarative language for specifying which types of RDF links should be discovered between data sources as well as which conditions entities must fulfill in order to be interlinked. Link conditions may be based on various similarity metrics and can take the graph around entities into account, which is addressed using a path-based selector language. Silk accesses data sources over the SPARQL protocol and can thus be used without having to replicate datasets locally.
Conference Paper
Full-text available
The openness of the Web and the ease to combine linked data from different sources creates new challenges. Systems that consume linked data must evaluate quality and trust- worthiness of the data. A common approach for data quality assessment is the analysis of provenance information. For this reason, this paper discusses provenance of data on the Web and proposes a suitable provenance model. While tra- ditional provenance research usually addresses the creation of data, our provenance model also represents data access, a dimension of provenance that is particularly relevant in the context of Web data. Based on our model we identify options to obtain provenance information and we raise open questions concerning the publication of provenance-related metadata for linked data on the Web.
Conference Paper
Full-text available
Recognizing that information from different sources refers to the same (real world) entity is a crucial challenge in instance-level information integration, as it is a pre-requisite for combining the information about one entity from different sources. The required entity matching is time consuming and thus imposes a crucial limit for large-scale, dynamic information integration. An increased re-use of entity identifiers (or names) across different information collections such as RDF repositories, databases and document collections, eases this situation.In the ideal case, entity matching can be reduced to the trivial problem of spotting the same entity identifier in different information collections. In this paper we propose the use of an entity name system (ENS) - as it is currently under development in the EU-funded project OKKAM - for systematically supporting the re-use of entity identifiers. The main purpose of the ENS is to provide unique and uniform names for entities for the use in information collections, so that the same name is used for an entity, even when it is referenced in different contexts. Of course the creation of an ENS that can efficiently deal with entities on the Web scale raises scalability issues of its own. This paper focuses on the role of an ENS in contributing to the scalability of ad-hoc and on demand information integration tasks.
Conference Paper
In this paper we present Triplify, a simplistic but effective approach to publish Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relational database queries. Triplify transforms the resulting relations into RDF statements and publishes the data on the Web in various RDF serializations, in particular as Linked Data. The rationale for developing Triplify is that the largest part of information on the Web is already stored in structured form, often as data contained in relational databases, but usually published by Web applications only as HTML mixing structure, layout and content. In order to reveal the pure structured information behind the current Web, we have implemented Triplify as a light-weight software component, which can be easily integrated into and deployed by the numerous, widely installed Web applications. Our approach includes a method for publishing update logs to enable incremental crawling of linked data sources. Triplify is complemented by a library of configurations for common relational schemata and a REST-enabled data source registry. Triplify configurations containing mappings are provided for many popular Web applications, including osCommerce, WordPress, Drupal, Gallery, and phpBB. We will show that despite its light-weight architecture Triplify is usable to publish very large datasets, such as 160GB of geo data from the OpenStreetMap project.
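The core mapping idea, a URI pattern resolved to a SQL query whose result rows become triples, can be sketched in a few lines of Python. Note that Triplify itself is a PHP component driven by SQL-based configuration files; the table, request path and vocabulary URIs below are invented for illustration:

```python
# Conceptual sketch of Triplify's core idea: map an HTTP request path
# onto a SQL query and serialize each result row as RDF triples.
import sqlite3

# Hypothetical mapping: request path -> SQL query whose first column is
# the local identifier; the remaining column names act as RDF properties.
MAPPINGS = {
    "/triplify/posts": "SELECT id, title, author FROM posts",
}
BASE = "http://example.org"

def handle(path: str) -> str:
    # In-memory demo database standing in for a Web application's backend.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER, title TEXT, author TEXT)")
    conn.execute("INSERT INTO posts VALUES (1, 'Hello Data Web', 'alice')")
    cursor = conn.execute(MAPPINGS[path])
    columns = [c[0] for c in cursor.description]
    lines = []
    for row in cursor:
        subject = f"<{BASE}{path}/{row[0]}>"
        for col, value in zip(columns[1:], row[1:]):
            lines.append(f'{subject} <{BASE}/vocab/{col}> "{value}" .')
    return "\n".join(lines)

print(handle("/triplify/posts"))
# <http://example.org/triplify/posts/1> <http://example.org/vocab/title> "Hello Data Web" .
# <http://example.org/triplify/posts/1> <http://example.org/vocab/author> "alice" .
```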
Conference Paper
We present OntoWiki, a tool providing support for agile, distributed knowledge engineering scenarios. OntoWiki facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents. It fosters social collaboration by keeping track of changes, allowing users to comment on and discuss every single part of a knowledge base, enabling them to rate and measure the popularity of content, and honoring the activity of users. OntoWiki enhances browsing and retrieval by offering semantically enhanced search strategies. All these techniques are applied with the ultimate goal of decreasing the entrance barrier for projects and domain experts to collaborate using semantic technologies. In the spirit of Web 2.0, OntoWiki implements an "architecture of participation" that allows users to add value to the application as they use it. It is available as open-source software and a demonstration platform can be accessed at http://3ba.se.
Conference Paper
Online community sites have replaced the traditional means of keeping a community informed via libraries and publishing. At present, online communities are islands that are not interlinked. We describe different types of online communities and tools that are currently used to build and support such communities. Ontologies and Semantic Web technologies offer an upgrade path to providing more complex services. Fusing information and inferring links between the various applications and types of information provides relevant insights that make the available information on the Internet more valuable. We present the SIOC ontology, which combines terms from vocabularies that already exist with new terms needed to describe the relationships between concepts in the realm of online community sites.
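For illustration, a post in an online community could be described with SIOC terms as follows. The sketch uses the rdflib library and invented resource URIs, while the vocabulary terms (sioc:Post, sioc:has_creator, sioc:has_container, sioc:content) are genuine SIOC terms:

```python
# Describing a community post with the SIOC vocabulary using rdflib;
# the resource URIs are hypothetical examples.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

SIOC = Namespace("http://rdfs.org/sioc/ns#")
g = Graph()
g.bind("sioc", SIOC)

post = URIRef("http://example.org/forum/post/1")
user = URIRef("http://example.org/user/alice")
forum = URIRef("http://example.org/forum")

g.add((post, RDF.type, SIOC.Post))
g.add((post, SIOC.has_creator, user))
g.add((post, SIOC.has_container, forum))
g.add((post, SIOC.content, Literal("First post on the Social Data Web!")))
g.add((user, RDF.type, SIOC.UserAccount))
g.add((forum, RDF.type, SIOC.Forum))

print(g.serialize(format="turtle"))
```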
Article
Co-reference resolution, or the determination of 'equivalent' URIs referring to the same concept or entity, is a significant hurdle to overcome in the realisation of large scale Semantic Web applications. However, it has only recently gained the attention of research communities in the Semantic Web context, and while activities are now underway in identifying co-referent or conflated URIs, little consideration has been given to tools and techniques for storing, manipulating, and reusing co-reference information. This paper provides an overview of the specification, implementation, interactions and experiences in using the Co-reference Resolution Service (CRS) to facilitate rigorous management of URI co-reference data, and enable interoperation between multiple Linked Open Data sources. Comparisons are made throughout the paper contrasting the differences in the way the CRS manages multiple URIs for the same resource with the emerging practice of using owl:sameAs to identify duplicate URIs. The advantages and benefits that have been gained from deploying the CRS on a site with multiple Linked Data repositories are also highlighted.
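The contrast between managed co-reference bundles and flat owl:sameAs assertions can be sketched as follows; the bundle class is an illustrative simplification, not the CRS's actual interface:

```python
# Sketch of a co-reference "bundle": one canonical URI plus its managed
# equivalents, which can be flattened into owl:sameAs statements on demand.
class CoreferenceBundle:
    def __init__(self, canonical: str):
        self.canonical = canonical
        self.equivalents: set[str] = set()

    def add(self, uri: str) -> None:
        self.equivalents.add(uri)

    def as_sameas_triples(self) -> list[str]:
        """Flatten the bundle into owl:sameAs statements."""
        return [f"<{self.canonical}> "
                f"<http://www.w3.org/2002/07/owl#sameAs> <{u}> ."
                for u in sorted(self.equivalents)]

bundle = CoreferenceBundle("http://example.org/id/leipzig")
bundle.add("http://dbpedia.org/resource/Leipzig")
bundle.add("http://sws.geonames.org/2879139/")
for triple in bundle.as_sameas_triples():
    print(triple)
```

One apparent advantage of such grouping is that a wrong equivalence can be retracted in a single place, rather than hunting down owl:sameAs triples scattered across datasets.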
Conference Paper
A large number of websites are driven by content management systems (CMS), which manage not only textual content but also structured data related to the site's topic. Exposing this information to the Web of Data has so far required considerable expertise in RDF modelling and programming. We present a plugin for the popular CMS Drupal that enables high-quality RDF output with minimal effort from site administrators. This has the potential of greatly increasing the amount and topical range of information available on the Web of Data.
Article
Developing cures for highly complex diseases, such as neurodegenerative disorders, requires extensive interdisciplinary collaboration and exchange of biomedical information in context. Our ability to exchange such information across sub-specialties today is limited by the current scientific knowledge ecosystem's inability to properly contextualize and integrate data and discourse in machine-interpretable form. This inherently limits the productivity of research and the progress toward cures for devastating diseases such as Alzheimer's and Parkinson's. SWAN (Semantic Web Applications in Neuromedicine) is an interdisciplinary project to develop a practical, common, semantically structured framework for biomedical discourse, initially applied, but not limited, to significant problems in Alzheimer Disease (AD) research. The SWAN ontology has been developed in the context of building a series of applications for biomedical researchers, as well as in extensive discussions and collaborations with the larger bio-ontologies community. In this paper, we present and discuss the SWAN ontology of biomedical discourse. We ground its development theoretically, present its design approach, explain its main classes and their application, and show its relationship to other ongoing activities in biomedicine and bio-ontologies.