
An Access Control Model for Linked Data


Serena Villata, Nicolas Delaforge, Fabien Gandon, Amelie Gyrard
To cite this version:
Serena Villata, Nicolas Delaforge, Fabien Gandon, Amelie Gyrard. An Access Control Model
for Linked Data. OTM Workshops, Oct 2011, Heraklion, Greece. Springer, 7046, pp.454-463,
2011, Lecture Notes in Computer Science. <10.1007/978-3-642-25126-9_57>. <hal-00695229>
HAL Id: hal-00695229
Submitted on 14 May 2012
An Access Control Model for Linked Data*
Serena Villata, Nicolas Delaforge, Fabien Gandon, and Amelie Gyrard
INRIA Sophia Antipolis {firstname.lastname}
Abstract. Linked Open Data refers to a set of best practices for the
publication and interlinking of structured data on the Web in order to
create a global interconnected data space called Web of Data. To ensure
the resources featured in a dataset are richly described and, at the same
time, protected against malicious users, we need to specify the conditions
under which a dataset is accessible. Being able to specify access
terms should also encourage data providers to publish their data. We
introduce a lightweight vocabulary, called Social Semantic SPARQL Security
for Access Control Ontology (S4AC), allowing the definition of
fine-grained access control policies formalized in SPARQL, and enforced
when querying Linked Data. In particular, we define an access control
model providing the users with means to define policies for restricting
the access to specific RDF data, based on social tags, and contextual
information.
Keywords: LOD, Security, SPARQL 1.1, Context, Named Graphs
1 Introduction
Linked Data [6] enables us to set links between items in different data sources,
and to connect these sources into a single global data space. These data are
provided with machine-readable annotations called metadata. Metadata aim
to provide a flexible way to describe things, and how they relate to
other things. However, one of the challenges of Linked Data is access control. As
underlined by Bizer et al. [2, 6], the datasets are published in the Linked Open
Data (LOD) cloud without the addition of any kind of metadata specifying the
access control conditions under which the data is accessible.
This paper addresses the research question: How to define an access control
model for Linked Data? This is important in order to encourage as many data
providers as possible to publish data on their own terms, and not only fully
public data. The research question breaks down into two sub-questions: (i) how
to define fine-grained access policies? and (ii) how to define context-based access
policies?
The issue of defining access control policies for the Web has been addressed
by the Web Access Control vocabulary (WAC), which allows the user to specify
access control lists (ACL). The ACL are of the form [acl:accessTo <card.rdf>;
acl:mode acl:Read, acl:Write; acl:agentClass <groups/fam#group>], which means
that anyone in the group <> may read and
write card.rdf. The WAC vocabulary distinguishes four classes of access control
privileges: Read (read the content), Write (delete or update the content),
Control (set the ACL for the content), and Append (add information at the end
of the content). This vocabulary grants the access to a whole RDF document,
e.g., card.rdf. In this paper, we aim at providing fine-grained access control
policies which grant the access to specific RDF data, i.e., the information providers
may want to restrict the access to a few named graphs [4]. Moreover, we enable
the requester to submit any SPARQL query, and the resource provider to further
specify the access control privilege granted to the user, and we distinguish the
Delete and Update classes of privileges, included in the Write WAC class.
* DataLift is funded by the French National Research Agency: ANR-10-CORD-09.
We introduce the Social Semantic SPARQL Security for Access Control
vocabulary (S4AC), a lightweight ontology which allows the information providers
to specify fine-grained access control policies for their RDF data (Figure 1). At
the core of S4AC is the Access Condition, which is a SPARQL 1.1 ASK clause
that specifies the condition to be satisfied in order to grant the access to a
named graph. Moreover, the information providers can define Access Conditions
based on tags, which restrict the conditions to named graphs tagged with
such tags, e.g., named graphs tagged “friends”, “amici”, “ami”. The conditions
can be bound to specific values to provide an access evaluation context, e.g.,
<"?user", <>> where the URI of the user is
bound to <>. Finally, the Access Condition is
associated with a temporal validity. The Access Privilege defines which kind of
privilege is granted to the user satisfying the Access Conditions, e.g., s4ac:Update
grants the user the privilege to modify the requested named graph.
Fig. 1. An overview of the S4AC Ontology. [Figure not reproduced; recoverable
labels: DisjunctiveACS and ConjunctiveACS, each with a subClassOf link.]
A key feature of our approach is to rely only on Semantic Web languages. As a
consequence, our access control model is platform independent, and can be used
by any kind of system based on those languages. In particular, the semantics of
our access control policies is grounded in SPARQL 1.1 ASK queries. Relying on
SPARQL semantics, our model allows the user to submit arbitrary queries while
enforcing fine-grained access rules on the results he will receive. If the result of
the ASK query is true, then the user is provided with the information he requires.
If the result is false, then the model returns to the user a denial coupled with
one or more rule labels explaining the denial.
The remainder of the paper is organized as follows: Section 2 presents a use
case of the proposed access control model. Section 3 introduces the S4AC
ontology, and it details and analyses the access control policies which can be defined
using our model. Related work and conclusions end the paper.
2 The DataLift use case
The DataLift project aims at providing a platform to ease the publication and
interlinking of datasets on the Web of Data. Figure 2 illustrates the Access
Control Manager (ACM) which is the core module of our access control model.
We now describe the features of the Access Control Manager first from the point
of view of the user, and then from the point of view of the system.
Fig. 2. The Access Control Manager. [Figure not reproduced; recoverable labels:
SPARQL 1.1 query, Access Control Manager, Query result, Access Denied.]
Consider a user who wants to access some of the information published on
the Web of Data by means of the DataLift platform. The user first authenticates
to the ACM of the platform using the WebID protocol. The user queries the
datasets using a SPARQL 1.1 SELECT, MODIFY, INSERT, or DELETE query (the MODIFY,
INSERT, and DELETE queries are provided by SPARQL 1.1), depending on the kind
of operation the user intends to perform on the requested
data. The ACM returns the user an answer of the kind YES/NO together with
the query result, or the labels of the rules that caused the failure.
The ACM receives an authentication request from the user by means of the
WebID protocol. Then, after a successful authentication, it receives the query of
the user. The ACM aims to grant or restrict the access to the RDF data
published using the DataLift platform, where each SPARQL endpoint manages
its requests. Once the request of the user is received, the ACM selects, by means
of the module called Access Control Policies Selector (ACPS), which policies
apply, depending on the requested operation. For instance, if the user submitted
a MODIFY query, then the ACPS identifies all the policies which apply and
concern an Update access privilege. The ACPS handles two kinds of operations: (i)
it checks the S4AC ontology in order to identify which access conditions apply,
and (ii) it checks whether the contextual information, e.g., the temporal validity
of the selected policies, is satisfied. Note that we check whether the contextual
constraints hold before checking the remainder of the policy. If the contextual
constraints are not satisfied, we already know that the access will not be granted.
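The selection step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the DataLift implementation: the names Policy, OPERATION_TO_PRIVILEGE, is_valid_at and select_policies are hypothetical, and only the MODIFY to Update row of the mapping is stated in the text; the other rows are assumed by analogy with the Read, Create, Update and Delete privileges.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Hypothetical mapping from query operation to access privilege.
# Only MODIFY -> Update comes from the text; the other rows are assumptions.
OPERATION_TO_PRIVILEGE = {
    "SELECT": "Read",
    "INSERT": "Create",
    "MODIFY": "Update",
    "DELETE": "Delete",
}


@dataclass
class Policy:
    name: str
    privilege: str                          # Read, Create, Update or Delete
    valid_from: Optional[datetime] = None   # None means no lower bound
    valid_until: Optional[datetime] = None  # None means no upper bound


def is_valid_at(policy: Policy, now: datetime) -> bool:
    """Contextual check: is the request time inside the policy's validity?"""
    if policy.valid_from is not None and now < policy.valid_from:
        return False
    if policy.valid_until is not None and now > policy.valid_until:
        return False
    return True


def select_policies(policies: List[Policy], operation: str, now: datetime) -> List[Policy]:
    """Keep the policies matching the requested privilege, then drop those
    whose temporal validity does not hold, before any ASK query is run."""
    privilege = OPERATION_TO_PRIVILEGE[operation.upper()]
    matching = [p for p in policies if p.privilege == privilege]
    return [p for p in matching if is_valid_at(p, now)]
```

Checking the validity interval first mirrors the ordering stated in the text: a policy that fails the contextual check is discarded without evaluating its access conditions.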
After the identification of the policies, and a positive checking of the
contextual constraints, the Access Controller module matches the policies against
the user’s profile to test what he can access. The Access Controller issues
a SPARQL ASK query which returns true if the access to the named graph is
granted to the user. The Access Controller selects the set of named graphs to
which the user has access, and queries this dataset by adding FROM and FROM NAMED
clauses to the user’s query. If the answer is false, then the Access Controller returns
a failure, coupled with the categories causing the failure. These categories are
provided to the Access Controller by the ACPS when it checks the ontology.
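The behaviour of the Access Controller described above can be illustrated with the following Python sketch. It is only a sketch under assumptions: the function names are hypothetical, the `ask` argument stands in for a real SPARQL endpoint, the query rewriting is a naive string operation that requires an explicit WHERE keyword, and emitting both a FROM and a FROM NAMED clause for each granted graph is one possible reading of the text. Returning a denial only when no graph at all is granted is also an assumption.

```python
from typing import Callable, Dict, List, Tuple


def accessible_graphs(
    graph_conditions: Dict[str, Tuple[str, str]],
    ask: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split named graphs into granted ones and labels of failed conditions.

    graph_conditions maps a named graph URI to (ask_query, category_label).
    ask is a stand-in for an endpoint that evaluates an ASK query.
    """
    granted: List[str] = []
    failed_labels: List[str] = []
    for graph_uri, (ask_query, label) in graph_conditions.items():
        if ask(ask_query):
            granted.append(graph_uri)
        elif label not in failed_labels:
            failed_labels.append(label)
    return granted, failed_labels


def add_dataset_clauses(query: str, graphs: List[str]) -> str:
    """Naively insert FROM / FROM NAMED clauses before the first WHERE."""
    head, keyword, tail = query.partition("WHERE")
    if not keyword:
        raise ValueError("this sketch only handles queries with an explicit WHERE")
    clauses = "".join(f"FROM <{g}>\nFROM NAMED <{g}>\n" for g in graphs)
    return f"{head.rstrip()}\n{clauses}{keyword}{tail}"


def answer(query: str, graph_conditions: Dict[str, Tuple[str, str]],
           ask: Callable[[str], bool]) -> Tuple[str, object]:
    """Return ("granted", rewritten_query) or ("denied", category_labels)."""
    granted, failed_labels = accessible_graphs(graph_conditions, ask)
    if not granted:
        return ("denied", failed_labels)
    return ("granted", add_dataset_clauses(query, granted))
```

Only the category labels are returned on denial, in line with the requirement that the provider's policies themselves are never disclosed to the requester.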
3 Access control for Linked Data
3.1 Social Semantic SPARQL Security for Access Control Ontology
The Social Semantic SPARQL Security for Access Control Ontology (S4AC),
online at, is detailed in Figure 3. One of the
key features of our access control approach is to be integrated with the models
adopted in the fields of the Social Web, and of the Web of Data. In particular,
S4AC reuses concepts from SIOC, SCOT, NiceTag, WAC, TIME, and the
access control model as a whole is grounded on further existing ontologies, such as
FOAF, Dublin Core, and RELATIONSHIPS.
The main class of the S4AC ontology is the class AccessCondition, which is
a subclass of the class Condition, itself a subclass of sioc:Item.
Fig. 3. The S4AC Ontology. [Figure not reproduced; recoverable labels:
s4ac:AccessCondition and three rdfs:subClassOf links.]
Definition 1. An Access Condition (AC) is a SPARQL 1.1 ASK query. If a
solution exists, the ASK query returns true, and the Access Condition is said
to be verified. If no solution exists the ASK query returns false, and the Access
Condition is said not to be verified.
The Access Condition grants or restricts the access to the data. If the ASK
returns true, the access is granted to the user. In order to return the user a more
informative answer if the access is denied, we introduce the property
hasCategoryLabel. This property associates with each AC one or more natural
language labels which “identify” the access condition; they are returned to
the user to give him the reasons for the denial. We cannot return the user
all the access conditions, because this would make him aware of the policies
of the provider. The AccessCondition class also defines the property
hasValidity, which expresses the validity of an Access Condition. Thanks to
the use of the concept time:TemporalEntity, the validity can be expressed in
various ways: valid from/through a specific date/time, or valid in a specific
interval of time. This property is used to express policies in which not only the
identity of the user requesting the data is checked, but also the contextual
information related to the time at which the request is performed. A further class
is MaxResource, which defines the number of times the user can access all or
a specified named graph. This class has the property maxOnResource, which is
used to specify which resource is accessible only a limited number of times.
Definition 2. An Access Evaluation Context (AEC) is a list $L$ of predetermined
bound variables of the form
$L = (\langle var_1, val_1 \rangle, \langle var_2, val_2 \rangle, \ldots, \langle var_n, val_n \rangle)$
that is turned into a SPARQL 1.1 Binding Clause to constrain the ASK query
evaluation when verifying the Access Conditions.
The AEC is represented in the ontology as the class AccessEvaluationCon-
text which has two properties, hasVariable and hasValue, which are respec-
tively the variable, and the value to which the variable is bound. It is used
to provide a standard evaluation context to the access conditions, e.g., request-
ing user, resource provider. Consider the following example: L=(<‘‘?resource’’,
This list can be used to generate an additional Binding Clause for the access con-
ditions of the form: BINDINGS ?resource ?user {(<>,
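The translation from an Access Evaluation Context to a Binding Clause can be sketched in Python as below. This is an illustrative sketch, not the authors' code: the function name `bindings_clause` and the example URIs are hypothetical, and the comma between bound values simply mirrors the fragment visible in the text. Note that BINDINGS belongs to the SPARQL 1.1 working drafts; the final SPARQL 1.1 Recommendation provides the VALUES clause instead, whose rows are whitespace-separated.

```python
from typing import List, Tuple


def bindings_clause(context: List[Tuple[str, str]]) -> str:
    """Render a list of (variable, value) pairs as a draft-style BINDINGS clause.

    Variables are expected with their leading '?', values as already
    serialized terms (e.g. '<http://example.org/graph>'). No escaping is done.
    """
    if not context:
        return ""
    variables = " ".join(var for var, _ in context)
    values = ", ".join(val for _, val in context)
    return f"BINDINGS {variables} {{({values})}}"


# Hypothetical evaluation context: the requested resource and the requester.
aec = [
    ("?resource", "<http://example.org/graph>"),
    ("?user", "<http://example.org/user>"),
]
clause = bindings_clause(aec)
```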
Definition 3. An Access Condition Set (ACS) is a set of Access Conditions.
The AccessConditionSet class has a property hasAccessCondition which iden-
tifies which Access Conditions form the ACS. Two subclasses of AccessCondi-
tionSet are introduced: conjunctive, and disjunctive ACS.
Definition 4. A Conjunctive Access Condition Set (CACS) is a logical
conjunction of Access Conditions of the form
$CACS = AC_1 \wedge AC_2 \wedge \ldots \wedge AC_n$. A
CACS is verified if and only if every access condition it contains is verified.
Definition 5. A Disjunctive Access Condition Set (DACS) is a logical
disjunction of Access Conditions of the form
$DACS = AC_1 \vee AC_2 \vee \ldots \vee AC_n$. A
DACS is verified if and only if at least one of the access conditions it contains
is verified.
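Definitions 4 and 5 map directly onto boolean folds. The Python sketch below models each Access Condition as a zero-argument callable returning the boolean result of its ASK query; this modelling choice and the function names are assumptions made for illustration only.

```python
from typing import Callable, Iterable

# An Access Condition is modelled as a callable returning the ASK result.
AccessCondition = Callable[[], bool]


def verify_cacs(conditions: Iterable[AccessCondition]) -> bool:
    """Conjunctive set: verified iff every condition is verified."""
    return all(condition() for condition in conditions)


def verify_dacs(conditions: Iterable[AccessCondition]) -> bool:
    """Disjunctive set: verified iff at least one condition is verified."""
    return any(condition() for condition in conditions)
```

Both functions short-circuit, so a CACS stops at the first failing condition and a DACS at the first verified one. With this encoding an empty CACS is vacuously verified and an empty DACS is not; the definitions do not address the empty case, so a real implementation would have to decide how to treat it.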
Definition 6. An Access Tagging Rule (ATR) is a triple
$R = \langle ACS, TagSet, Bindings \rangle$ where ACS is an Access Condition Set,
TagSet is a set of tags $\{tag_1, tag_2, \ldots, tag_m\}$, and Bindings is an
Access Evaluation Context. An ATR is verified for a named graph tagged with one
or more tags from TagSet if and only if the ACS is verified for that named graph.
An ATR declares that the access conditions in the ACS apply to any named
graph tagged with one or more tags from TagSet. Notice that the ACS may be
reduced to a single access condition. In this case, the ATR is said to be verified
if and only if the single access condition is verified. Note that TagSet may be
empty, in which case the ATR applies to any named graph. The class
AccessTaggingRule has four properties: hasAccessConditionSet, associating an ACS with the
ATR; hasTag, providing a set of tags to the ATR; hasAccessEvaluationContext,
associating with the ATR the AEC, i.e., the bindings applied to the rule; and
hasAccessPrivilege. The hasAccessPrivilege property defines the access privilege
the user is granted: Read, Create, Update, Delete. We expand the acl:Write
class, which is used for every kind of modification on the content, and we allow
fine-grained access control privileges. The class AccessTag, used to define the set
of tags, is a sub-class of scot:Tag.
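The role of the tag set and of the access privilege in an ATR can be sketched in Python as follows. The class and function names are hypothetical, the verification of the ACS is abstracted as a callable taking the named graph URI, and the Access Evaluation Context is left out for brevity; this is a sketch of the rule semantics stated above, not of the S4AC implementation.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable


@dataclass(frozen=True)
class AccessTaggingRule:
    acs_verified: Callable[[str], bool]  # does the ACS hold for this named graph?
    tag_set: FrozenSet[str]              # an empty set means "any named graph"
    privilege: str                       # Read, Create, Update or Delete


def applies_to(rule: AccessTaggingRule, graph_tags: Iterable[str]) -> bool:
    """An ATR applies if its TagSet is empty or shares a tag with the graph."""
    return not rule.tag_set or not rule.tag_set.isdisjoint(graph_tags)


def grants(rule: AccessTaggingRule, graph_uri: str,
           graph_tags: Iterable[str], requested_privilege: str) -> bool:
    """Grant only if the privilege matches, the rule applies to the graph's
    tags, and the rule's Access Condition Set is verified for the graph."""
    return (
        rule.privilege == requested_privilege
        and applies_to(rule, graph_tags)
        and rule.acs_verified(graph_uri)
    )
```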
3.2 The access control policies
We now show which kinds of access control policies are enabled by the proposed
access control model. Consider the policy defined below: the data provider defines
an access policy such that only his named graphs tagged with the tag “family” are
constrained by the access condition, which grants the access to those users who
have a hasParent relationship with the provider, i.e., the parents of the provider.
The Access Condition Set is composed of only one access condition, thus this
is the only one which needs to be evaluated. The access privilege is Update.
Thus, given a MODIFY query of the user, if he is granted the access, then
he is allowed to Update the requested named graphs. Concerning the contextual
information, the Access Tagging Rule grants the access to the user if the date
of the access is after December 31st, 2011 at 23:59. If the user is not granted the
access, then the label the system returns him together with the failure message
is “parents”, to explain that the failure is due to
the fact that the user is not a parent of the provider; the system does not return
the entire policy to the user.
    a s4ac:AccessTaggingRule;
    s4ac:hasAccessConditionSet [
        s4ac:hasAccessCondition [
            s4ac:hasValidity [
                time:hasBeginning [
                    time:inXSDDateTime 2011-12-31T23:59:00
                ]
            ];
            s4ac:hasCategoryLabel skos:PrefLabel "parents"@en;
            s4ac:hasQueryAsk [
                ASK { ?resource dcterms:creator ?provider .
                      ?provider rel:hasParent ?user }
            ]
        ]
    ];
    s4ac:hasAccessPrivilege s4ac:Update;
    s4ac:hasTag scot:Tag "family"@en.
The table below presents some examples of the ASK queries which may be
associated with the access conditions. Cond1 grants the access to those users
who have a relationship of kind “colleagues” with the provider. Cond2 grants
the access to the friends of the provider, and Cond3 extends this access condition
also to the friends of friends. Cond4 is more complicated (the GRAPH keyword is
used to match patterns against named graphs). It grants the access
to those users that are marked with a specified tag. For specifying the tag, we
use the NiceTag ontology. Negative access conditions are also allowed, where we
specify which specific user cannot access the data. This is expressed, as shown
in Cond5, by means of the FILTER clause, and the access is granted to every
user except sery. Cond6 expresses an access condition where the user can access
the data only if he is at least somewhat lucky, e.g., one chance out of two. Finally,
Cond7 grants the access to those users who are members of at least one group
the provider belongs to.
Cond1   ASK { ?resource dcterms:creator ?provider .
              ?provider rel:hasColleague ?user }
Cond2   ASK { ?resource dcterms:creator ?provider .
              ?provider rel:hasFriend ?user }
Cond3   ASK { ?resource dcterms:creator ?provider .
              ?provider rel:hasFriend{1,2} ?user }
Cond4   ASK { ?resource dcterms:creator ?provider .
              ?provider dcterms:creator ?g .
              GRAPH ?g { ?user nicetag:hasCommunitySign ?tag }}
Cond5   ASK { FILTER(! (?user= <>))}
Cond6   ASK { FILTER(random()>0.5) }
Cond7   ASK { ?resource dcterms:creator ?provider .
              ?provider sioc:member_of ?g .
              ?user sioc:member_of ?g }
An example of conjunctive ACS is as follows:
$CACS_{friendsbutsery} = Cond_2 \wedge Cond_5$, where the access is granted to
the users who are friends of the provider, but the user <>, even if a friend of
the provider, cannot access the data. An example of disjunctive ACS is
$DACS_{colleaguesorfriends} = Cond_1 \vee Cond_2$, where it is ensured that the
users who are colleagues or friends of the provider are allowed to access the data.
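To make the two example sets concrete, the following Python sketch evaluates Cond1, Cond2 and Cond5 over a tiny in-memory set of triples instead of a SPARQL engine. Everything in the data is hypothetical (the `ex:` identifiers, the users, and the choice of `ex:sery` as the excluded user); the functions only imitate what the corresponding ASK queries would return on such data.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: alice created graph1; bob and sery are her friends,
# carol is her colleague.
TRIPLES: Set[Triple] = {
    ("ex:graph1", "dcterms:creator", "ex:alice"),
    ("ex:alice", "rel:hasFriend", "ex:bob"),
    ("ex:alice", "rel:hasFriend", "ex:sery"),
    ("ex:alice", "rel:hasColleague", "ex:carol"),
}


def provider_relates_to(resource: str, relation: str, user: str) -> bool:
    """True if some creator of `resource` has `relation` to `user`."""
    providers = {o for s, p, o in TRIPLES if s == resource and p == "dcterms:creator"}
    return any((provider, relation, user) in TRIPLES for provider in providers)


def cond1(resource: str, user: str) -> bool:
    return provider_relates_to(resource, "rel:hasColleague", user)


def cond2(resource: str, user: str) -> bool:
    return provider_relates_to(resource, "rel:hasFriend", user)


def cond5(user: str, excluded: str = "ex:sery") -> bool:
    return user != excluded


def cacs_friends_but_sery(resource: str, user: str) -> bool:
    """Conjunction of Cond2 and Cond5."""
    return cond2(resource, user) and cond5(user)


def dacs_colleagues_or_friends(resource: str, user: str) -> bool:
    """Disjunction of Cond1 and Cond2."""
    return cond1(resource, user) or cond2(resource, user)
```

On this data, bob satisfies the conjunctive set while sery does not, even though both are friends of the provider; carol satisfies only the disjunctive set.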
The ATR detailed above can be constrained to a wider set of tags such as
$ATR_{parents} = \langle Cond, \{\text{"parent"}, \text{"parents"}, \text{"family"}, \text{"relatives"}\}, \emptyset \rangle$
where no AEC is provided. Further examples of ATRs are: (i)
$ATR_{friends} = \langle Cond_2, \{\text{"friends"}, \text{"amici"}, \text{"ami"}\}, \emptyset \rangle$
where the access condition constrains the access to friends, and three tags are
provided without an AEC; (ii)
$ATR_{group} = \langle Cond_7, \{\text{"common"}, \text{"group"}, \text{"close"}\}, \emptyset \rangle$
is the analogous rule for users belonging to a group of the provider; (iii)
$ATR_{hiking} = \langle Cond_4, \emptyset, \langle \text{"?tag"}, \text{"hiking"} \rangle \rangle$
where the user can access the data
if he is tagged with the tag “hiking” in the graph created by the provider; (iv)
$ATR_{fun} = \langle DACS_{colleaguesorfriends}, \{\text{"fun"}, \text{"funny"}, \text{":)"}\}, \emptyset \rangle$
where the user can access the data if the disjunctive ACS above is satisfied on
the named graphs tagged with these three tags.
The prototype under development relies on the SPARQL query engine
KGRAM/CORESE. Briefly, the system uses the SPARQL 1.1 Binding Clause to
substitute the variable ?resource with the URI of the named graphs to be accessed.
The query is executed to obtain all the ATRs associated with the named graphs
and the data provider. CORESE returns these ATRs, which contain the ACS.
The ASK queries inside each AC are executed on CORESE, and the returned
booleans are conjunctively or disjunctively evaluated to grant or deny the access.
4 Related work
Sacco and Passant [9] present a Privacy Preference Ontology (PPO), built on
top of WAC, in order to express fine-grained access control policies for an RDF
file. They also specify the access queries with a SPARQL ASK, but their
vocabulary does not consider the temporal validity of the privacy preferences, nor
the number of accesses allowed for each named graph. They rely entirely on the
WAC vocabulary without distinguishing different kinds of Write actions. Their
model does not allow specifying sets of tags to limit the application of the policies
to the named graphs marked with those tags, nor specifying conjunctive and
disjunctive sets of privacy preferences. Muhleisen et al. [8] present a policy-enabled
server for Linked Data called PeLDS, where the access policies are expressed
using a descriptive language called PsSF, based on SWRL. They distinguish
only Read and Update actions, and they do not consider contextual information.
Moreover, the system is based on an ontology of the actions that can be
performed on the datasets, i.e., Action, Rule, TriplePattern, but no further description
is provided in [8].
Giunchiglia et al. [5] propose a Relation Based Access Control model
(RelBAC), providing a formal model of permissions based on description logics. They
require the provider to specify who can access the data, while in our model and in [9] the
provider can rely on specifying the attributes the user must satisfy. The Access
Management Ontology (AMO) [3] defines a role-based access control model. Such
a role-based access control model applied to the world of Linked Data
does not provide enough flexibility, since it again needs to specify who can access
the data. Abel et al. [1] present a model of context-dependent access control at
triple level, where contextual predicates are also allowed, e.g., related to time,
location, credentials. The policies are not expressed using Web languages;
instead, they introduce a high-level syntax which is then mapped to existing policy languages.
They enforce access control as a layer on top of RDF stores. After the evaluation
of the contextual information, the queries are expanded, and then sent to the
database. Hollenbach and Presbrey [7] present a system where the users can
define access controls on RDF documents, and these access controls are expressed
using the WAC. Our model extends WAC to allow the construction of more
fine-grained access control policies.
5 Conclusions
In this paper, we introduce a fine-grained model of access control for Linked
Data. We rely only on Semantic Web languages, namely SPARQL 1.1 queries,
Update language, and Binding Clause. We present the S4AC vocabulary, which
allows the definition of various kinds of fine-grained access policies on named graphs.
These policies involve both social aspects of the user who wants to access the
data, e.g., social relationship with the provider, being a member of a group, being
tagged with a specific tag, and contextual information, e.g., the day on which
the request is performed is in a particular time interval, or the user is allowed to
access the named graph at most five times. Policies are evaluated together
with a set of tags, which restrict the policies to data tagged in such a way, and
an evaluation context which binds the variables of the query to specific values.
Moreover, we introduce the four access privileges as defined by C.R.U.D.,
and we map them to the SPARQL 1.1 query to identify the policies regarding
this privilege which are defined on the requested named graph.
There are different research lines for future work. First, a prototype of the
Access Control Manager is under definition together with a user-friendly interface
allowing also non-expert users to define their own access terms. Our prototype
for the DataLift platform will show a real world application of the proposed
model with the aim to test its effectiveness. Second, we plan to introduce
delegation in the model, in order to allow the provider to delegate some authority. An
open issue remains whether this kind of delegation also involves the authority to
modify the access policies defined by the provider. Third, we plan to introduce
licenses, e.g., Creative Commons and Waivers, as a further description
of the datasets. These licenses then have to be returned together with the
requested data, even if the user does not ask explicitly for this information. This
is needed to allow data providers to openly publish their datasets together with
their own terms of reuse.
References
1. Abel, F., Coi, J.L.D., Henze, N., Koesling, A.W., Krause, D., Olmedilla, D.: Enabling advanced and context-dependent access control in RDF stores. In: Proceedings of the 6th International Semantic Web Conference (ISWC-2007), LNCS 4825. pp. 1–14 (2007)
2. Bizer, C., Heath, T., Berners-Lee, T.: Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems 5, 1–22 (2009)
3. Buffa, M., Faron-Zucker, C., Kolomoyskaya, A.: Gestion sémantique des droits d’accès au contenu : l’ontologie AMO. In: Yahia, S.B., Petit, J.M. (eds.) EGC. Revue des Nouvelles Technologies de l’Information, vol. RNTI-E-19, pp. 471–482. Editions (2010)
4. Carroll, J.J., Bizer, C., Hayes, P.J., Stickler, P.: Named graphs. J. Web Sem. 3(4), 247–267 (2005)
5. Giunchiglia, F., Zhang, R., Crispo, B.: Ontology driven community access control. In: Proceedings of the 1st Workshop on Trust and Privacy on the Social and Semantic Web (SPOT-2009) (2009)
6. Heath, T., Bizer, C.: Linked Data: Evolving the Web into a Global Data Space (1st edition), Synthesis Lectures on the Semantic Web: Theory and Technology, vol. 1:1. Morgan & Claypool (2011)
7. Hollenbach, J., Presbrey, J., Berners-Lee, T.: Using RDF Metadata To Enable Access Control on the Social Semantic Web. In: Proceedings of the Workshop on Collaborative Construction, Management and Linking of Structured Knowledge (CK-2009) (2009)
8. Muhleisen, H., Kost, M., Freytag, J.C.: SWRL-based Access Policies for Linked Data. In: Proceedings of the 2nd Workshop on Trust and Privacy on the Social and Semantic Web (SPOT-2010) (2010)
9. Sacco, O., Passant, A.: A Privacy Preference Ontology (PPO) for Linked Data. In: Proceedings of the 4th Workshop about Linked Data on the Web (LDOW-2011) (2011)
Personal data are increasingly disseminated over the Web through mobile devices and smart environments, and are exploited for developing more and more sophisticated services and applications. All these advances come with serious risks for privacy breaches that may reveal private information wanted to remain undisclosed by data producers. It is therefore of utmost importance to help them to identify privacy risks raised by requests of service providers for utility purposes. In this paper, we first formalize privacy risks by privacy queries expressed (and kept secret) by each data producer to specify the data they do not want to be disclosed. Then, we develop a formal approach for detecting incompatibility between privacy and utility queries expressed as temporal aggregate conjunctive queries. The distinguishing point of our approach is to be data-independent and to come with an explanation based on the query expressions only. This explanation is intended to help data producers understand the detected privacy breaches and guide their choice of the appropriate technique to correct it.KeywordsTemporal aggregated conjunctive queriesUtility queriesPrivacy queries
Full-text available
Despite the need for data in a time of general digitization of organizations, many challenges are still hampering its shared use. Technical, organizational, legal, and commercial issues remain to leverage data satisfactorily, specially when the data is distributed among different locations and confidentiality must be preserved. Data platforms can offer “ad hoc” solutions to tackle specific matters within a data space. MUSKETEER develops an Industrial Data Platform (IDP) including algorithms for federated and privacy-preserving machine learning techniques on a distributed setup, detection and mitigation of adversarial attacks, and a rewarding model capable of monetizing datasets according to the real data value. The platform can offer an adequate response for organizations in demand of high security standards such as industrial companies with sensitive data or hospitals with personal data. From the architectural point of view, trust is enforced in such a way that data has never to leave out its provider’s premises, thanks to federated learning. This approach can help to better comply with the European regulation as confirmed from a legal perspective. Besides, MUSKETEER explores several rewarding models based on the availability of objective and quantitative data value estimations, which further increases the trust of the participants in the data space as a whole.
Full-text available
Digital transformation, data ecosystems, and Data Spaces are inevitable parts of our future. The book aims to educate the reader on data sharing and exchange techniques using Data Spaces. It will address and explore the cutting-edge theory, technologies, methodologies, and best practices for Data Spaces for both industrial and personal data. The book provides the reader with a basis for understanding the scientific foundation of Data Spaces, how they can be designed and deployed, and future directions.
Full-text available
Today, the need for “end-to-end” coordination between the electricity sector stakeholders, not only in business terms but also in securely exchanging real-time data, is becoming a necessity to increase electricity networks’ stability and resilience while satisfying individual operational optimization objectives and business case targets of all stakeholders. To this end, the SYNERGY energy data platform builds on state-of-the-art data management, sharing, and analytics technologies, driven by the actual needs of the electricity data value chain. This paper will describe the layered SYNERGY Reference Architecture that consists of a Cloud Infrastructure, On-Premise Environments, and Energy Apps and discuss the main challenges and solutions adopted for (a) the design of custom pipelines for batch and streaming data collection and for data manipulation and analytics (based on baseline or pre-trained machine learning and deep learning algorithms) and (b) their scheduled, on-event, or real-time execution on the cloud, on-premise and in gateways, toward an energy data space. Particular focus will be laid on the design of the SYNERGY AI analytics marketplace that allows for trustful sharing of data assets (i.e., datasets, pipelines, trained AI models, analytics results) which belong to different stakeholders, through a multi-party smart contract mechanism powered by blockchain technologies.
Full-text available
The path that the European Commission foresees to leverage data in the best possible way for the sake of European citizens and the digital single market clearly addresses the need for a European Data Space. This data space must follow rules derived from European values. The European Data Strategy rests on four pillars: (1) Governance framework for access and use; (2) Investments in Europe’s data capabilities and infrastructures; (3) Competences and skills of individuals and SMEs; (4) Common European Data Spaces in nine strategic areas such as industrial manufacturing, mobility, health, and energy. The project BOOST 4.0 developed a prototype for the industrial manufacturing sector, called European Industrial Data Space (EIDS), an endeavour of 53 companies. The publication will show the developed architectural pattern as well as the developed components and introduce the required infrastructure that was developed for the EIDS. Additionally, the population of such a data space with Big Data enabled services and platforms is described and will be enriched with the perspective of the pilots that have been built based on EIDS. Keywords: Data Spaces; Data treasures; Data sharing; Trust; Digital sovereignty; EIDS; IDSA; Open source; Interoperability; Semantic model; QIF; FIWARE; Certification
In our societies, there is a growing demand for the production and use of more data. Data is reaching the point where it drives social and economic activities in every industry sector. Technology is not going to be a barrier anymore; however, where there is large deployment of technology, the production of data creates a growing demand for better data-driven services, and at the same time the benefits of producing the data are, at large, an impulse for a global data economy. Data has become the business’s most valuable asset. In order to achieve its full value and help data-driven organizations to gain competitive advantages, we need effective and reliable ecosystems that support the cross-border flow of data. To this end, data ecosystems are the key enablers of data sharing and reuse within or across organizations. Data ecosystems need to tackle the various fundamental challenges of data management, including technical and nontechnical aspects (e.g., legal and ethical concerns). This chapter explores the Big Data value ecosystems and provides a detailed overview of several data platform implementations as best-effort approaches for sharing and trading industrial and personal data. We also introduce several key enabling technologies for implementing data platforms. The chapter concludes with common challenges encountered by data platform projects and details best practices to address these challenges.
This chapter focuses on data interoperability best practices related to semantic technologies and data management systems. It introduces a particular view on how relevant data interoperability is achieved and its effects on developing technologies for the financial and insurance sectors. Financial technology (FinTech) and insurance technology (InsuranceTech) are rapidly developing and have created new business models and transformed the financial and insurance services industry in the last few years. The transformation is ongoing, and as in many other domains, the vast amount of information available today known as Big Data, the data generated by IoT and AI applications, and the technologies for data interoperability, which allow data to be reused, shared, and exchanged, will have a strong influence. It is evident that the entire financial sector is in a moment of new opportunities with a new vision for substantial growth. This book chapter analyzes the basis of data space design and discusses the best practices for data interoperability by introducing concepts and illustrating how to enable the interoperability of information using a methodological approach to formalize and represent financial data by using semantic technologies and information models (knowledge engineering). This chapter provides a state-of-the-art offer called INFINITECH Way using the discussed best practices and explains how semantics for data interoperability are introduced as part of the FinTechs and InsuranceTech.
Data has been identified as a valuable input to boost enterprises. Nowadays, with the vast quantity of data available, a favorable scenario is established to exploit it, but crucial challenges must be addressed, notably its sharing and governance. In this context, the data space ecosystem is the cornerstone which enables companies to share and use valuable data assets. However, appropriate Data Governance techniques must be established to benefit from such an opportunity, considering two levels: internal to the organization and at the level of sharing between organizations. At a technological level, to reach this scenario, companies need to design and provision adequate data platforms to deal with Data Governance in order to cover the data life-cycle. In this chapter, we will address questions such as: How to share data and extract value while maintaining sovereignty over data, confidentiality, and fulfilling the applicable policies and regulations? How do the Big Data paradigm and its analytical approach affect correct Data Governance? What are the key characteristics of the data platforms to be covered to ensure the correct management of data without losing value? This chapter explores these challenges, providing an overview of state-of-the-art techniques.
The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article we present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. We describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
The structure of the Semantic Web gives users the power to share and collaboratively generate decentralized linked data. In many cases, though, collaboration requires some form of authentication and authorization to ensure the security and integrity of the data being generated. Traditional authorization systems that rely on centralized databases are insufficient in this scenario, since they rely on the existence of a central authority that is not available on the Semantic Web. In this paper, we present a scalable system that allows for decentralized user authentication and authorization. The system described supports per-document access control via an RDF metadata file containing an access control list (ACL). A simple interface allows authorized users to view and edit the RDF ACL directly in a Web browser. The system allows users to efficiently manage read and write access to linked data without a centralized authority, enabling a collaborative authoring environment suited to the Semantic Web.
Social applications are one of the fastest growing areas in the Web. However, privacy issues ensue if all information of all users of these applications is stored on a single computer system. With small extensions to Semantic Web technologies and Linked Data concepts, a distributed approach to the social web is possible, where users retain fine-grained control over their data and are still able to combine their data with users on different systems. We describe our concept of a Policy-enabled Linked Data Server (PeLDS) obeying user-defined access policies for the stored information. PeLDS also supports configuration-free distributed authentication. Access policies are expressed in a newly developed compact notation for the Semantic Web Rule Language. Authentication is performed using SSL certificates and the FOAF+SSL verification approach. We evaluate our concept using a prototype implementation and a distributed address book application.
The term "Linked Data" refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
In this paper we present RelBAC (for Relation Based Access Control), a model and a logic for access control which models communities, possibly nested, and resources, possibly organized inside complex file systems, as lightweight ontologies, and permissions as relations between subjects and objects. RelBAC allows us to represent expressive access control rules beyond the current state of the art, and to deal with the strong dynamics of subjects, objects and permissions which arise in Web 2.0 applications (e.g. social networks). Finally, as shown in the paper, using RelBAC, it becomes possible to reason about access control policies and, in particular, to compute candidate permissions by matching subject ontologies (representing their interests) with resource ontologies (describing their characteristics).
The Semantic Web consists of many RDF graphs nameable by URIs. This paper extends the syntax and semantics of RDF to cover such named graphs. This enables RDF statements that describe graphs, which is beneficial in many Semantic Web application areas. Named graphs are given an abstract syntax, a formal semantics, an XML syntax, and a syntax based on N3. SPARQL is a query language applicable to named graphs. A specific application area discussed in detail is that of describing provenance information. This paper provides a formally defined framework suited to being a foundation for the Semantic Web trust layer.
Conference Paper
Semantic Web databases allow efficient storage and access to RDF statements. Applications are able to use expressive query languages in order to retrieve relevant metadata to perform different tasks. However, access to metadata may not be public to just any application or service. Instead, powerful and flexible mechanisms for protecting sets of RDF statements are required for many Semantic Web applications. Unfortunately, current RDF stores do not provide fine-grained protection. This paper fills this gap and presents a mechanism by which complex and expressive policies can be specified in order to protect access to metadata in multi-service environments.
The World Wide Web has enabled the creation of a global information space comprising linked documents. As the Web becomes ever more enmeshed with our daily lives, there is a growing desire for direct access to raw data not currently available on the Web or bound up in hypertext documents. Linked Data provides a publishing paradigm in which not only documents, but also data, can be a first-class citizen of the Web, thereby enabling the extension of the Web with a global data space based on open standards: the Web of Data. In this Synthesis lecture we provide readers with a detailed technical introduction to Linked Data. We begin by outlining the basic principles of Linked Data, including coverage of relevant aspects of Web architecture. The remainder of the text is based around two main themes: the publication and consumption of Linked Data. Drawing on a practical Linked Data scenario, we provide guidance and best practices on: architectural approaches to publishing Linked Data; choosing URIs and vocabularies to identify and describe resources; deciding what data to return in a description of a resource on the Web; methods and frameworks for automated linking of data sets; and testing and debugging approaches for Linked Data deployments. We give an overview of existing Linked Data applications and then examine the architectures that are used to consume Linked Data from the Web, alongside existing tools and frameworks that enable these. Readers can expect to gain a rich technical understanding of Linked Data fundamentals, as the basis for application development, research or further study.