Sustainable Linked Open Data Creation: An Experience Report


Eduard KLEIN a,1, Stephan HALLER a, Adrian GSCHWEND a,b and
a E-Government Institute, University of Applied Sciences, Bern, Switzerland
b Zazuko GmbH, Biel, Switzerland
c Faculty of Computer Science and Engineering,
Ss. Cyril and Methodius University in Skopje, Macedonia
d OpenLink Software, United Kingdom
Abstract: A flexible platform supporting the linked data life-cycle has been developed
and applied in various use cases in the context of the large scale linked open data
project Fusepool P3. Besides the description of the aims and achievements, experiences
from publishing and reusing linked data in public sector and business are summarized.
It is highlighted that without further help it is difficult for domain experts to
estimate the time, effort and necessary skills when trying to transfer the platform to
other use cases. Applying a new publishing methodology turned out to be useful in these
cases.
Keywords: linked data, semantic enrichment, linked data life-cycle, data publishing,
data integration, resource description framework, data management
1. Introduction
The exploitation of the Internet for intelligent knowledge management has been
worked on for many years and it still remains one of the main challenges for the scien-
tific community with added value for business, public bodies and civil society. In this
attempt, the web is not only used in a classical way for publishing (unstructured) doc-
uments as HTML pages, offering online services like shopping, booking or text-based
search engines, but also as a platform for processing and managing structured infor-
mation. It appears in the form of data, which is published, interlinked and integrated
with other structured information as linked data [1], that can subsequently be browsed
or queried.
Annotated with appropriate vocabulary terms from ontologies, this interlinked
structured information can be searched not only by keywords but also on a semantic level,
thus laying the foundation for the Semantic Web [2]. Through linked data, information
and services on the Internet and in web-based applications and mobile apps can be, and
already have been, enriched in a sophisticated way, although the broad public has not
noticed this as a big bang, since it comes in the form of a quiet revolution [3]. Facebook’s
Knowledge Graph, Google’s Hummingbird and Bing’s Satori are examples of improving
services through semantic search technologies; they reveal the revolution’s silence
by improving the services incrementally in small iterations while constantly digesting
information from different sources.
Corresponding Author:
Electronic Government and Electronic Participation
H.J. Scholl et al. (Eds.)
© 2016 The authors and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
In the e-Government domain, the use of linked open data (LOD) is spreading, as
public authorities realize its benefits – not only regarding the transparency of
governmental processes, but also as a driver for economic innovation: the availability of
machine-readable, semantically enriched open data enables SMEs and other entities to
develop and provide new value-added services and applications. However, while public
authorities in democratic countries around the globe have developed or are developing a
strategy for Open Government Data (OGD), only a fraction of them take the
additional step of provisioning the data as LOD through SPARQL endpoints. Take for
example Switzerland: an e-Government strategy is in place both at the federal level
(since 2007) and in most cantons; in addition, an OGD portal as single point of entry for
all OGD data in Switzerland was established in February 2016. A service platform
for LOD, however, is only available in a pilot stage with currently only a limited set of
One of the main roadblocks hindering a wider adoption of linked open data is
that authorities shy away from the additional effort needed to convert OGD to LOD.
This was also one of the key motivators to start the Fusepool P3 project.
Meanwhile, the Linked Data paradigm has fostered and propelled the emergence
of numerous research projects and software products with a focus on LOD [4]. Currently,
the most prominent output of the LOD movement is visualized in the LOD cloud,
the core of which is formed by the data sets of DBpedia [5] and GeoNames.
Many domain-specific applications have evolved [6], often with an exploratory focus.
Inherent to LOD applications is the processing of data analogous to ETL processing
in the data warehouse domain, but with more complex operations such as data
extraction, enrichment, interlinking, fusing and maintenance. While these can be automated
to a certain degree for a specific domain, a lot of manual work is still necessary,
e.g., for mapping tasks. This data processing is part of the linked data life-cycle [7],
whose complexity varies depending, among other things, on the data sources and
the requirements of the target applications. In one way or another, the linked data
life-cycle is integral to research projects like LOD2 [8], LATC [9], GeoKnow [10] and
Fusepool [11].
In this paper we describe experiences from Fusepool P3 [12], a large scale EC-
funded FP7 project with a focus on publishing and reusing linked data. The research
goal was to develop enhanced products and services based on the exploitation of linked
data in the context of the tourism domain. In the next section, the project goals are
summarized, followed by a description of the architecture of the integrated data plat-
form. Next, experiences from the project are pointed out, before concluding with as-
pects about the transfer of the research results to other application contexts.
E. Klein et al. / Sustainable Linked Open Data Creation: An Experience Report100
Among the main findings, we learned that the Fusepool platform can significantly
simplify the publishing of data as linked open data. Regional authorities in Trento and
Tuscany were thus enabled to provide tourism-related data that form the basis for novel
applications. Reflecting on several completed use cases showed that additional advice and
recommendations are essential for transferring project results to other use cases. A new
publishing methodology, described below, allows information on completed LOD projects
to be recorded and helps in estimating and planning new LOD use cases.
2. The Fusepool P3 Platform
The main goal was to facilitate publishing and reuse of linked data in a more seamless
way, based on a thriving data market economy with data providers, enhancers, and
component developers along the linked data value chain [11]. In order to facilitate pub-
lishing and processing of Linked Data within a single platform, a set of loosely coupled,
modular software components, compatible with the Linked Data Platform (LDP) best
practices [13], has been developed. These software components work closely together,
supporting the multilingual data value chain, to achieve the following tasks: revealing
data from structured and unstructured sources, refining data through text extraction and
enrichment, and running the linked data ecosystem through data-driven applications
(Fig. 1).
Supported by appropriate backend tools described below, and a high degree of au-
tomation in data processing, the Fusepool platform has successfully been deployed in
several research projects, including the preservation of intellectual property of SMEs in
the patent domain and in tourism use cases.
Figure 1: Elements of the Fusepool P3 data value chain; Fusepool derives its name from the idea of fusing
and pooling linked data with analytical processing on top of it, and P3 abbreviates Linked Data Publish-
2.1 Architecture
We aim at providing a single platform for the linked data life-cycle. To achieve this,
the Fusepool platform architecture is based on loosely coupled components communi-
cating via HTTP and exposing RESTful APIs exchanging RDF [14]. This leads to re-
usability of components, enables distributed development and makes it easier for de-
velopers to understand and extend the software, thus ensuring its longevity.
RESTful RDF is the platform's native interaction method, meaning that there are
no proprietary data access APIs in place. Platform components, as well as third party
applications, communicate using generic RDF APIs. In Fig. 2, the Fusepool platform
architecture is depicted.
[Figure 2 diagram omitted. Components shown: Client Applications / Fusepool P3 Dashboard;
LDP Transforming Proxy; Transforming Container API (Extension of LDP 1.0); LDP 1.0;
RDF Triple Store; Transformer API; Pipeline Transformer; Single Transformers;
User Interaction Request API; User Interaction Request Registry; Factory Registry.]
Figure 2: The P3 platform in Fundamental Modeling Concepts (FMC) Notation
The diagram shows how the Fusepool P3 dashboard – the main user interface to
interact with the platform – and other clients access the Fusepool platform primarily via
an LDP Transforming Proxy, an extension of the LDP 1.0 specification which uses the
REST-based Transforming Container API to enable RDF data generation and annota-
tion from input data. The proxy transparently handles transformation processes by call-
ing the actual transformers in the background, and once the process has finished, it
sends back the data to the LDP Server. The clients can also directly access transformers
via their REST API (the Transformer API) or use a SPARQL 1.1 endpoint.
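As a rough illustration of the last access path, the sketch below builds (but does not
send) a query request following the SPARQL 1.1 Protocol, which allows a query to be passed
via HTTP GET in a `query` parameter. The endpoint URL and the query are assumptions made
for this example; the paper does not state them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; not taken from the paper.
ENDPOINT = "http://localhost:8181/sparql"

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (without sending) a SPARQL 1.1 Protocol query request via GET."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )

# Illustrative query: fetch any ten triples from the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
request = build_sparql_request(ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would execute the query against a running
endpoint; that step is left out so the sketch stays independent of any deployment.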
As a result, the architecture does not require a common runtime for its components.
Every component, including all transformers, is by default run as an individual process
acting via HTTP as the interaction interface. The exception to this are the backend
related components (LDP, SPARQL, the RDF Triple Store and possible custom
backend services) which may be more tightly coupled, i.e., they may be run in the same
runtime environment, due to non-functional requirements such as performance or other
resource cost.
2.2 Components
The P3 platform is composed of three core components: transformers, the LDP Trans-
forming Proxy and backends. Applications such as the Fusepool dashboard are external
components which mainly use standard interfaces such as LDP or SPARQL. The plat-
form components communicate with each other via REST over HTTP. RDF is used as
the data model and exchange format in all communications, with the exception of the
use of SPARQL. All standard RDF serializations may be used, with Turtle being ex-
plicitly supported by all components implemented to date. Besides LDP and SPARQL,
the interaction between the components, as well as with external clients, is regulated by
APIs and the Fusepool Annotation Model (FAM) which are briefly explained below.
Transformers. Data transformation components are responsible for transforming data
from legacy formats (structured and unstructured) into RDF, and adding or refining
annotations to input data. In the Fusepool platform there are two families of transform-
ers: RDFizers and Annotators. The former transform non-RDF data to RDF, and the
latter enrich data in any format with RDF annotations.
Transformers are identified by a URI, which is the entry point for the RESTful
Transformer API defining the interaction with the transformer components. This API
supports both synchronous and asynchronous transformers. While a synchronous transformer
returns the transformation result right away in the response to the transformation
request, an asynchronous transformer delivers its result at a later time. Asynchronous
transformers may also require some user interaction in order to deliver their
results.
A pipeline transformer invokes a list of transformers in sequence, passing the out-
put of a transformer as input to the next transformer. This enables chaining of multiple
transformers to perform more complex tasks.
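The chaining idea can be sketched in a few lines. This is a minimal in-process model
only: on the platform, transformers are separate components addressed by URI over HTTP,
whereas here each transformer is assumed to be a plain Python function, and the two toy
transformers and their example.org URIs are hypothetical.

```python
from typing import Callable, List

# A transformer is modelled as a function from input text to output text.
Transformer = Callable[[str], str]

def make_pipeline(transformers: List[Transformer]) -> Transformer:
    """Return a transformer that runs the given transformers in sequence."""
    def pipeline(data: str) -> str:
        for transform in transformers:
            data = transform(data)  # output of one step is input to the next
        return data
    return pipeline

# Toy stand-in for an RDFizer (hypothetical): wraps text in one N-Triples line.
def toy_rdfizer(text: str) -> str:
    return f'<http://example.org/doc/1> <http://example.org/text> "{text}" .'

# Toy stand-in for an annotator (hypothetical): appends one annotation triple.
def toy_annotator(ntriples: str) -> str:
    extra = '<http://example.org/doc/1> <http://example.org/annotated> "true" .'
    return ntriples + "\n" + extra

pipeline = make_pipeline([toy_rdfizer, toy_annotator])
result = pipeline("Trento")
```

Because the pipeline itself has the same shape as a single transformer, it can be used
wherever a transformer is expected, which mirrors the role of the pipeline transformer.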
The above-mentioned annotators are expected to produce annotations from textual
content, either unstructured or extracted from any other structured format. All annotators
produce RDF using FAM [14]. This common model is important for piping annotators
and for allowing configurations that use multiple annotation services. The base structure of
FAM is fully compatible with Open Annotation [15], but defines some additional relations
to ease the consumption of annotator results – especially the retrieval of selectors
for annotations.
LDP Transforming Proxy. This is an HTTP proxy that is used as a reverse proxy in
front of an LDP Server. It intercepts POST requests against LDP Containers (LDPCs)
which are marked as Transforming Containers and then it (a) forwards the request to
the proxied LDP instance, and (b) sends the contents to the transformer associated with
the container. Once the result of the transformation is available, the LDP Transforming
Proxy will post it to the LDPC as well. In this way, the Transforming LDPC holds both
the original and the transformed data. A Transforming LDPC can have a pipeline transformer
associated with it, should multiple transformers be executed over the POSTed
data.
The Transforming Container API is defined as an extension to the LDP specification
to allow special containers to execute a transformer when a member resource is
added via a POST request. This allows documents to be automatically transformed
when they are added to an LDPC, with both the original data and the transformed
data available as resources inside the Transforming LDPC. This process is supported via the
LDP Transforming Proxy.
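The proxy behaviour described above can be modelled with a small in-memory sketch. It
is an illustration under simplifying assumptions, not the platform's implementation: the
class names are hypothetical, containers are plain lists instead of LDP resources, no
HTTP is involved, and the transformer is called synchronously.

```python
from typing import Callable, Dict, List, Optional

class Container:
    """In-memory stand-in for an LDP Container (LDPC)."""
    def __init__(self, transformer: Optional[Callable[[str], str]] = None):
        self.transformer = transformer  # set only for transforming containers
        self.members: List[str] = []

class TransformingProxy:
    """Toy model of the proxy: stores the original, then the transformed data."""
    def __init__(self) -> None:
        self.containers: Dict[str, Container] = {}

    def post(self, container_uri: str, body: str) -> None:
        container = self.containers[container_uri]
        container.members.append(body)  # (a) forward the original to the LDPC
        if container.transformer is not None:
            transformed = container.transformer(body)  # (b) call the transformer
            container.members.append(transformed)      # post the result as well

proxy = TransformingProxy()
proxy.containers["/events"] = Container(transformer=str.upper)  # toy transformer
proxy.containers["/plain"] = Container()                        # ordinary LDPC
proxy.post("/events", "concert in trento")
proxy.post("/plain", "untouched")
```

After the two POSTs, the transforming container holds both the original and the
transformed member, while the ordinary container holds only what was posted.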
The User Interaction Request API describes how an LDPC is used to maintain a
registry of requests for interaction. Its purpose is to provide support for components
which require user interaction during their lifetime, such as transformers requesting a
user input. According to the API, components submit a URI to the mentioned registry,
and remove the URI once the interaction is completed. A UI component can then pro-
vide the user with a link to the submitted URI. The component is free to present any
web-application at the denoted URI suitable for performing the required interaction.
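The submit-and-remove protocol of the registry can be sketched as follows. The class
and method names are hypothetical, and a Python set stands in for the LDPC that holds the
registry on the platform.

```python
from typing import List, Set

class InteractionRegistry:
    """In-memory stand-in for the LDPC holding pending interaction requests."""
    def __init__(self) -> None:
        self._pending: Set[str] = set()

    def submit(self, uri: str) -> None:
        """A component registers the URI where the user interaction takes place."""
        self._pending.add(uri)

    def complete(self, uri: str) -> None:
        """The component removes the URI once the interaction is finished."""
        self._pending.discard(uri)

    def pending_links(self) -> List[str]:
        """What a UI component would present to the user as links."""
        return sorted(self._pending)

registry = InteractionRegistry()
registry.submit("http://example.org/transformer/7/confirm-mapping")
links_during = registry.pending_links()
registry.complete("http://example.org/transformer/7/confirm-mapping")
links_after = registry.pending_links()
```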
Backends. The platform can use both Apache Marmotta and Virtuoso Universal Server
as backends, which provide the generic LDP and SPARQL interfaces and data persis-
tence in an RDF Triple Store. However, based on the architectural approach, any other
tool which supports the LDP and/or the SPARQL standards can be used as the platform
backend as well.
3. Experiences
Our experiences with the Fusepool platform are best explained by the example of our
two initial stakeholders in the Fusepool P3 project, namely two touristic regions in
Italy: Provincia Autonoma di Trento (PAT) and Regione Toscana (RET). They have
been publishing open data and are supporting the development of applications and
services in the tourism domain for some time. During this time both partners gained
valuable experience in data creation, maintenance and publication.
3.1. Limitations in Publishing Open Data
PAT and RET first started publishing data sets which were considered strategic. In
Italy in general but also in the two regions Tuscany and Trentino, one of the most im-
portant businesses is tourism. This also includes linked and related industrial activities
around tourism. Thus the regions are struggling with one particular question: How can
they support and push tourism by changing their daily operations?
Both partners provide a CKAN-based open data portal, which is aimed at data publishers
and provides tools to find and use data. The data quality depends on the data provider.
Except for some added meta information, the data that gets pushed into the system
is the data which is made available to the user.
At project start, open data from PAT and RET was only available in particular data
formats like CSV, KML, XML and JSON. App developers had to download the raw
data and process it using their own ETL processes. With every update of the raw data,
this process had to be triggered manually for every single application using this data. If
the format of the raw data changed, the process had to be adjusted and could not be
automated. With every new data source, the maintenance complexity of these open data
sets and their apps increased.
3.2. Linked Data Life-Cycle
Reducing the complexity for consuming open data requires that the necessary ETL
work is done up-front, ideally by the data owner or someone with domain knowledge.
Furthermore, the data should preferably be published as a service and without the need
for running separate database servers and other services. This is where linked data and
its RDF technology stack come into play. With its open, non-proprietary data model
using open web standards such as SPARQL and HTTP, RDF serves as a lingua franca
built on well-known schemas and ontologies.
In the classic document-centric web not much is known about the relationship between
two pages, as links between them are untyped. RDF links far more granular entities
than pages, i.e. single attributes of an object, and defines relations between data
items in schemas and ontologies. Best practices recommend publishing these schemas
and ontologies also as RDF, thus making them publicly available in a machine-readable
form.
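To make the notion of typed links concrete, the sketch below represents a few statements
as subject-predicate-object triples and serializes them as N-Triples. The example.org
resources are invented for illustration; the property names `name` and `location` are
taken from the schema.org vocabulary, and the simple serializer assumes literals need no
escaping.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/"      # hypothetical namespace for the data
SCHEMA = "http://schema.org/"   # schema.org vocabulary

# Each predicate names the type of the link between subject and object.
triples: List[Triple] = [
    (EX + "event/42", SCHEMA + "name", '"Jazz Night"'),
    (EX + "event/42", SCHEMA + "location", EX + "place/7"),
    (EX + "place/7", SCHEMA + "name", '"Piazza Duomo"'),
]

def to_ntriples(items: List[Triple]) -> str:
    """Serialize triples as N-Triples; objects starting with '"' are literals."""
    lines = []
    for s, p, o in items:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

serialized = to_ntriples(triples)
```

Unlike an untyped hyperlink, the second triple states how the event relates to the
place, which is what allows queries on a semantic level.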
3.3. Applying the Linked Data Life-Cycle
Experiences with applying the linked data life-cycle using the Fusepool platform were
made in preparation for and during a hackathon at the Spaghetti Open Data Event,
where the initial versions of two linked open data applications based on data from the
Province of Trento were developed.
The first one, a web application called “LOD events eXplorer”, allows events in
the Trento region to be browsed and also shows information and pictures of nearby
points of interest (POIs) (see Fig. 3). The developers could easily transform the
original data set provided as an XML feed into RDF using the XSLT transformer pro-
vided by the Fusepool platform and store the results in the data store of the platform,
making it accessible through SPARQL queries.
The most time-consuming manual task in doing so was to develop the XSLT file
that defines the mapping from the XML elements to the appropriate RDF model; crea-
tion of the mapping required developer skills and was a matter of a few hours, includ-
ing familiarization with the tool and environmental setting. The subsequent transfor-
mation of the data itself however took place in a matter of seconds only. RDFizing and
interlinking other data such as nearby POIs and images from DBpedia turned out to be
an easy and less complex task compared to the development of the initial mapping.
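The kind of mapping involved can be illustrated with a short sketch. It is not the XSLT
file used in the project: Python's standard XML parser stands in for the XSLT
transformer, and the feed layout, the element names and the URI prefix are assumptions
made for this example. Literal values are emitted without escaping to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout; the actual Trento feed is not reproduced in the paper.
FEED = """<events>
  <event id="e1"><title>Jazz Night</title><date>2015-03-28</date></event>
  <event id="e2"><title>Wine Tasting</title><date>2015-03-29</date></event>
</events>"""

BASE = "http://example.org/event/"  # assumed URI prefix for event resources

# The mapping itself: XML element name -> RDF property (schema.org terms).
MAPPING = {
    "title": "http://schema.org/name",
    "date": "http://schema.org/startDate",
}

def rdfize(xml_text: str) -> list:
    """Turn each <event> into one N-Triples line per mapped child element."""
    lines = []
    for event in ET.fromstring(xml_text).findall("event"):
        subject = BASE + event.get("id")
        for tag, prop in MAPPING.items():
            value = event.findtext(tag)
            if value is not None:  # skip elements missing from this event
                lines.append(f'<{subject}> <{prop}> "{value}" .')
    return lines

ntriples = rdfize(FEED)
```

The `MAPPING` table corresponds to the manual, knowledge-intensive part of the work;
once it exists, running the transformation over a new version of the feed is mechanical.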
Figure 3: The LOD events eXplorer application, showing events in the Trento region
A second application enables tourists to follow the footsteps of historical figures
from Trentino. They can read about these people, see where they lived and find POIs
and restaurants nearby. This mobile application – available in the respective app stores
for iOS and Android under the name “In The Footsteps: Trentino” – is based on several
open data sets available on the CKAN site operated by the region of Trento: namely,
historical characters, restaurants, architectural and artistic heritage plus POIs. These
were transformed in a similar fashion to linked open data on the Fusepool platform. For
additional information, data from Wikipedia and Yelp was also linked in. The devel-
opment time and the necessary skills turned out to be comparable to the LOD events
eXplorer application.
3.4. A Linked Data Publishing Methodology
Reflecting on several LOD use cases, including those from the previous section, a
common methodology could be distilled, comprising analysis, design and implementation
steps. It turned out to be very helpful to externalize the findings in a LInked DAta
PUblishing MEthodology (LIDAPUME), consisting of a methodology schema (Fig.
4) and a template for orientation and guidance (Table 1).
Compared to other publishing methodologies, such as those used in LOD2 [8] or
LATC [9], non-technical steps are also considered here, as opposed to the
solely technical data life-cycle steps which are often used in related approaches. This
more holistic approach promotes the documentation of essential tasks, which proved to
be helpful in answering questions like “How long will it take to develop a use case with
this platform?” or “How many technical skills are necessary in order to achieve this?”.
The LIDAPUME steps are described in more detail in [16].
Figure 4: LIDAPUME schema, a linked data publishing methodology (from [16])
The template in Table 1 shows an instance of LIDAPUME, allowing annotations
of essential use case aspects. The template has been completed in the context of the
Swiss Archive Use Case, described below.
Table 1: LIDAPUME template for the Swiss Archive Use Case (1D=1 effort-day)
Using the methodology and the template turned out to be a good starting point for
LOD use case planning, with regard to completeness of the planning, necessary project
skills and project duration. Having experience from completed projects at hand
allows for better estimation and shortens the learning curve.
The LIDAPUME methodology and template have been validated for several use
cases which are described in more detail in [16]. Besides the above described use cases,
it has been applied in enhancing the FU Berlin library content through an LOD use case,
called Library Keyword Clustering, and in the Swiss Archive Use Case [17].
4. Conclusion and Outlook
In the past, a lot of time and energy was invested in providing tools for converting par-
ticular sets of data to linked data. Several FP7 projects such as LOD2 [8], LATC [9]
and GeoKnow [10] funded transformation of large linked data datasets which are now
available within the linked open data cloud. The Fusepool platform provides additional
value in the domain as it brings an integrated set of components that allow open data
from various sources to be easily published as linked open data, enabling development
of useful applications, like the examples described in this paper. The tools provided are
not domain-specific. While the current use cases have mainly been in the tourism domain,
the methods can be applied equally well to other domains; we recently used the
platform to successfully transform around five million public records from the Swiss
Federal Archive and four Swiss cantons and to interlink them with GND, a universal
authority file.
The most time-consuming task in promoting data to the 5-star level [18] is
defining the mappings of the original data sets to a linked data model. This requires
domain knowledge and close cooperation with domain experts. Once that one-time
effort has been made, the actual transformation of data can be automated, so that when new
data sets of the same type are to be published, they are transformed to linked data and
added to the RDF triple store.
To address this one-time effort, it turned out that two basic questions have to be
answered, namely “What are stable identifiers in the particular dataset?”, and “What is
the meaning of the data and how does it map to existing schemas and vocabularies?”
Answering the first question helps to coin stable URIs, while answering the second
makes the data more useful for new data publishers. Integrating services like
Linked Open Vocabularies (LOV) [19] in P3 transformers supports domain specialists
in mapping data to commonly used vocabularies. It is commonly recommended to
focus on reusing existing vocabularies where possible, and to repurpose and
extend them only where necessary.
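The first question can be illustrated with a minimal sketch of URI coining. The base
URI, the dataset name and the record identifier are hypothetical; the point is only that
the URI is derived from an identifier that does not change between data updates, rather
than from a row number or a label.

```python
from urllib.parse import quote

BASE = "http://data.example.org/"  # hypothetical base URI of the publisher

def mint_uri(dataset: str, stable_id: str) -> str:
    """Build a URI from a dataset name and a stable identifier of the record."""
    # Percent-encode both parts so that spaces or slashes cannot break the path.
    return f"{BASE}{quote(dataset, safe='')}/{quote(stable_id.strip(), safe='')}"

uri = mint_uri("restaurants", "TN 0042")
```

Since the function depends only on its inputs, re-running the transformation on an
updated data set yields the same URI for the same record, so existing links keep working.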
Experience has shown that the tools and technologies of the Fusepool platform for
publishing and reusing linked data are well suited for data publishers with technical
skills. For users with fewer technical skills additional help is necessary, whether it is in
the form of advice from developers or – preferably – in the form of guidelines and
more intuitive wizard-style tool guidance. Even for developers the learning curve is not
insignificant: in our use cases several iteration steps were necessary in order to become
familiar with the tooling environment and the data life-cycle processes.
To make sure these datasets and tools are maintainable, it is important to empower
data owners to run these processes on their own. Fusepool P3 provides some of the
necessary glue to integrate standalone components that were developed in the past and
will be developed in the future. Because Docker images are provided, the Fusepool platform
can be deployed within an organization within a few hours.
To have a sustainable linked data ecosystem, more work is still necessary on the
user interface level. In a follow-up project, it is thus planned to work with data publishers
to simplify the dashboard UI and to add wizard-style tool guidance. For example,
when users select an XML-based data set in a CKAN site that they want to publish
as linked data, the wizard will suggest using the XSLT transformer. Users still have
the option to choose another transformer like BatchRefine (which adds batch processing
capabilities to OpenRefine), but the wizard limits the possible selections
to transformers that can take an XML file as input.
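The selection logic of such a wizard could look roughly like the sketch below. The
registry contents are assumptions: the paper only states that the XSLT transformer and
BatchRefine can take XML input, so the media types listed and the plain-text entry are
hypothetical.

```python
from typing import List, Set, Tuple

# Accepted input media types per transformer, in order of preference.
# The entries are assumed for illustration, not read from the platform.
REGISTRY: List[Tuple[str, Set[str]]] = [
    ("XSLT transformer", {"application/xml"}),
    ("BatchRefine", {"application/xml", "text/csv"}),
    ("Plain-text annotator", {"text/plain"}),  # hypothetical entry
]

def candidates(media_type: str) -> List[str]:
    """Return the transformers accepting the media type, suggestion first."""
    return [name for name, accepted in REGISTRY if media_type in accepted]

xml_options = candidates("application/xml")
```

The first element of the returned list plays the role of the wizard's suggestion, and
the remaining elements are the alternatives the user may still choose.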
In addition, it is planned to develop a cookbook that gives non-technical users
step-by-step instructions including screen casts on how to use the platform. It will be
based on three typical user scenarios, considering first data and subsequently technical
1. Based on a concrete data set in a CKAN site. The cookbook explains the steps
and the usage of additional tools that may be needed, e.g., how to create an
OpenRefine configuration in order to publish data from a CSV-based format.
2. Based on a concrete data file. This is very similar to the first scenario, the dif-
ference being that the file is not retrieved from a CKAN site but available on a
local drive.
3. Based on a known data structure and some sample data.
These changes and additions will hopefully simplify and improve the platform, al-
lowing data publishers to use it without further help, hence significantly simplifying the
task of publishing data as linked open data.
Acknowledgement: The research leading to these results has received funding
from the European Union Seventh Framework Programme (FP7/2007-2013) under
grant agreement n° 609696.
[1] T. Heath and C. Bizer, “Linked Data: Evolving the Web into a Global Data Space,” Synthesis Lectures
on the Semantic Web: Theory and Technology, vol. 1, no. 1. pp. 1–136, 2011.
[2] T. Berners-Lee, J. Hendler, and O. Lassila, “The Semantic Web,” Sci. Am., vol. 284, no. 5, pp. 35–43,
[3] W. Hall, “Linked Data: The Quiet Revolution,” ERCIM News, vol. 96, p. 4, 2014.
[4] F. Bauer and M. Kaltenböck, Linked Open Data: The Essentials. edition mono, Vienna, Austria, 2012.
[5] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and S. Hellmann, “DBpedia - A
crystallization point for the Web of Data,” J. Web Semant., vol. 7, no. 3, pp. 154–165, 2009.
[6] ERCIM, “ERCIM News,” News, 2014. [Online]. Available: [Ac-
cessed: 22-Oct-2015].
[7] S. Auer, J. Lehmann, A. C. Ngonga Ngomo, and A. Zaveri, “Introduction to linked data and its lifecycle
on the web,” LNAI, vol. 8067, pp. 1–90, 2013.
[8] S. Auer, L. Bühmann, C. Dirschl, O. Erling, M. Hausenblas, R. Isele, J. Lehmann, M. Martin, P. N.
Mendes, B. Van Nuffelen, C. Stadler, S. Tramp, and H. Williams, “Managing the Life-Cycle of Linked
Data with the LOD2 Stack,” in The Semantic Web–ISWC 2012, 2012, pp. 1–16.
[9] LATC, “LATC,” LOD Around-the-Clock, 2012. [Online]. Available:
[Accessed: 22-Oct-2015].
[10] S. Athanasiou, D. Hladky, G. Giannopoulos, G.-R. Alejandra, and J. Lehmann, “GeoKnow: Making the
Web an Exploratory Place for Geospatial Knowledge,” ERCIM News, vol. 96, pp. 12–13, 2014.
[11] M. Kaschesky and L. Selmi, “Fusepool R5 linked data framework,” Proc. 14th Annu. Int. Conf. Digit.
Gov. Res. - dg.o ’13, p. 156, 2013.
[12] A. Gschwend, A. C. Neuroni, T. Gehrig, and M. Combetto, “Publication and Reuse of Linked Data:
The Fusepool Publish-Process-Perform Platform for Linked Data,” in Electronic Government and Elec-
tronic Participation: Joint Proceedings of Ongoing Research of IFIP EGOV and IFIP ePart 2015, vol.
22, pp. 116–123.
[13] S. Speicher, J. Arwe, and A. Malhotra, “Linked Data Platform,” W3C Recommendation, 2015. [Online].
[14] R. Gmür, S. Fernández, J. Frank, R. Westenthaler, C. Blakeley, I. Kingsley, and A. Gschwend, “The
Fusepool P3 Platform,” 2014. [Online]. Available: [Accessed: 22-
[15] R. Sanderson, P. Ciccarese, and H. Van de Sompel, “Designing the W3C open annotation data model,”
Proc. 5th Annu. ACM Web Sci. Conf. - WebSci ’13, pp. 366–375, 2013.
[16] E. Klein, A. Gschwend, and A. C. Neuroni, “Towards a Linked Data Publishing Methodology,” in
CeDEM 2016 (Conf. for e-Democracy and Open Government).
[17] J.-L. Cochard, A. Dubois, A. D. Gonzenbach, A. Gschwend, K. Lambert, S. Kwasnitza, M. Luggen, U.
Meyer, F. Noyer, and T. Wildi, “Archival-Linked Open Data: practical and technical approach - A
swiss collaborative project,” in Archives: Evidence, Security & Civil Rights (3rd ICA Conf.), 2015.
[18] T. Berners-Lee, “Linked data-design issues,” 2009. [Online]. Available: [Accessed: 22-Oct-2015].
[19] P.-Y. Vandenbussche and B. Vatant, “Linked Open Vocabularies,” ERCIM News, vol. 96, p. 21, 2014.
E. Klein et al. / Sustainable Linked Open Data Creation: An Experience Report110
Full-text available
The vast amount of data available over the distributed infrastructure of the Web has initiated the development of techniques for their representation, storage and usage. One of these techniques is the Linked Data paradigm, which aims to provide unified practices for publishing and contextually interlinking data on the Web, by using the World Wide Web Consortium (W3C) standards and the Semantic Web technologies. This approach enables the transformation of the Web from a web of documents, to a web of data. With it, the Web transforms into a distributed network of data which can be used by software agents and machines. The interlinked nature of the distributed datasets enables the creation of advanced use-case scenarios for the end users and their applications , scenarios previously unavailable over isolated data silos. This creates opportunities for generating new business values in the industry. The adoption of the Linked Data principles by data publishers from the research community and the industry has led to the creation of the Linked Open Data (LOD) Cloud, a vast collection of interlinked data published on and accessible via the existing infrastructure of the Web. The experience in creating these Linked Data datasets has led to the development of a few methodo-logies for transforming and publishing Linked Data. However, even though these methodologies cover the process of modeling, transforming / generating and publishing Linked Data, they do not consider reuse of the steps from the life-cycle. This results in separate and independent efforts to generate Linked Data within a given domain, which always go through the entire set of life-cycle steps. In this PhD thesis, based on our experience with generating Linked Data in various domains and based on the existing Linked Data methodologies, we define a new Linked Data methodology with a focus on reuse. 
It consists of five steps which encompass the tasks of studying the domain, modeling the data, transforming the data, publishing it and exploiting it. In each of the steps, the methodology provides guidance to data publishers on defining reusable components in the form of tools, schemas and services, for the given domain. With this, future Linked Data publishers in the domain would be able to reuse these components to go through the life-cycle steps in a more efficient and productive manner. With the reuse of schemas from the domain, the resulting Linked Data dataset will be compatible and aligned with other datasets generated by reusing the same components, which further increases the value of the datasets. This approach aims to encourage data publishers to generate high-quality, aligned Linked Data datasets from various domains, leading to further growth of the number of datasets on the LOD Cloud, their quality and the exploitation scenarios. With the emergence of data-driven scientific fields, such as Data Science, creating and publishing high-quality Linked Data datasets on the Web is becoming even more important, as it provides an open dataspace built on existing Web standards. Such a dataspace enables data scientists to perform data analytics over the cleaned, structured and aligned data in it, in order to produce new knowledge and introduce new value in a given domain. As the Linked Data principles are also applicable within closed environments over proprietary data, the same methods and approaches are applicable in the enterprise domain as well.
The work reported here was done within the applied research project Fusepool. Fusepool is an international project to develop a set of integrated software components to ease the publishing of open data based on an open-source Linked Data Platform and a set of associated best practices. This article provides researchers and laypersons interested in Linked Open Government Data an overview of the concepts, methodologies, and tools. It introduces the main concepts, such as open government and open government data along with enabling technological approaches such as the semantic web and linked data. It then presents the methodologies for publishing Linked Open Government Data along the R5 framework: Reveal, Refine, Reuse, Release, and Run. The final section discusses the potential impact of the Linked Data Platform as well as areas for further improvement.
The LOD2 Stack is an integrated distribution of aligned tools which support the whole life cycle of Linked Data from extraction, authoring/creation via enrichment, interlinking, fusing to maintenance. The LOD2 Stack comprises new and substantially extended existing tools from the LOD2 project partners and third parties. The stack is designed to be versatile; for all functionality we define clear interfaces, which enable the plugging in of alternative third-party implementations. The architecture of the LOD2 Stack is based on three pillars: (1) Software integration and deployment using the Debian packaging system. (2) Use of a central SPARQL endpoint and standardized vocabularies for knowledge base access and integration between the different tools of the LOD2 Stack. (3) Integration of the LOD2 Stack user interfaces based on REST enabled Web Applications. These three pillars comprise the methodological and technological framework for integrating the very heterogeneous LOD2 Stack components into a consistent framework. In this article we describe these pillars in more detail and give an overview of the individual LOD2 Stack components. The article also includes a description of a real-world usage scenario in the publishing domain.
With Linked Data, a very pragmatic approach towards achieving the vision of the Semantic Web has recently gained much traction. The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. While many standards, methods and technologies developed by the Semantic Web community are applicable for Linked Data, there are also a number of specific characteristics of Linked Data, which have to be considered. In this article we introduce the main concepts of Linked Data. We present an overview of the Linked Data lifecycle and discuss individual approaches as well as the state-of-the-art with regard to extraction, authoring, linking, enrichment as well as evolution of Linked Data. We conclude the chapter with a discussion of issues, limitations and further research and development challenges of Linked Data.
Many public administrations have been publishing data sets as open data for the last few years through portals based on CKAN or other platforms. Applications have mostly been developed using one single data set, as developers, data journalists and small businesses cannot afford the cost of re-using and integrating different data sets. This is due to many factors, such as the different formats used for the data, lack of documentation, different metadata and lack of public, trustworthy registries for entities of interest that could ease the task of connecting information provided by different authors about them. The Linked Data principles offer the guidelines to solve all these issues by leveraging the Semantic Web standards for describing resources on the Web. Moreover, the Linked Data Platform specification, a recent W3C Recommendation, provides a realization of the guidelines, defining some basic interactions between a client and a server to manage resources using the HTTP protocol. The purpose of this article is to present the Fusepool P3 platform that extends the LDP specification to support services which transform raw data sets into RDF format and enrich them.
The Open Annotation Core Data Model specifies an interoperable framework for creating associations between related resources, called annotations, using a methodology that conforms to the Architecture of the World Wide Web. Open Annotations can easily be shared between platforms, with sufficient richness of expression to satisfy complex requirements while remaining simple enough to also allow for the most common use cases, such as attaching a piece of text to a single web resource. This paper presents the W3C Open Annotation Community Group specification and the rationale behind the scoping and technical decisions that were made. It also motivates interoperable Annotations via use cases, and provides a brief analysis of the advantages over previous specifications.
The DBpedia project is a community effort to extract structured information from Wikipedia and to make this information accessible on the Web. The resulting DBpedia knowledge base currently describes over 2.6 million entities. For each of these entities, DBpedia defines a globally unique identifier that can be dereferenced over the Web into a rich RDF description of the entity, including human-readable definitions in 30 languages, relationships to other resources, classifications in four concept hierarchies, various facts as well as data-level links to other Web data sources describing the entity. Over the last year, an increasing number of data publishers have begun to set data-level links to DBpedia resources, making DBpedia a central interlinking hub for the emerging Web of Data. Currently, the Web of interlinked data sources around DBpedia provides approximately 4.7 billion pieces of information and covers domains such as geographic information, people, companies, films, music, genes, drugs, books, and scientific publications. This article describes the extraction of the DBpedia knowledge base, the current status of interlinking DBpedia with other data sources on the Web, and gives an overview of applications that facilitate the Web of Data around DBpedia.