Creating A Vocabulary for Data Privacy*
The First-Year Report of Data Privacy Vocabularies and
Controls Community Group (DPVCG)
Harshvardhan J. Pandit¹, Axel Polleres², Bert Bos³, Rob Brennan⁴,
Bud Bruegger⁵, Fajar J. Ekaputra⁶, Ramisa Gachpaz Hamed¹, Elmar Kiesling⁶,
Mark Lizar⁷, Eva Schlehan⁵, Simon Steyskal⁸, and Rigo Wenning³
¹ Trinity College Dublin, Ireland
² Vienna University of Economics and Business, Austria
³ W3C/ERCIM
⁴ Dublin City University, Ireland
⁵ Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein, Germany
⁶ Vienna University of Technology, Austria
⁷ OpenConsent/Kantara Initiative, United Kingdom
⁸ Siemens, Austria
Abstract. Managing privacy and understanding the handling of personal
data have become a fundamental right, at least within the European
Union, with the General Data Protection Regulation (GDPR) being
enforced since May 25th 2018. This has led to tools and services
that promise compliance with the GDPR in terms of consent management
and keeping track of the personal data being processed. The information
recorded within such tools, as well as that needed for compliance itself,
needs to be interoperable to provide sufficient transparency in its usage.
Additionally, interoperability is necessary for addressing the right
to data portability under the GDPR, as well as for creating user-configurable
and manageable privacy policies. We argue that such interoperability
can be enabled through agreement over vocabularies using linked data
principles. The W3C Data Privacy Vocabularies and Controls Community
Group (DPVCG) was set up to jointly develop such vocabularies towards
interoperability in the context of data privacy. This paper presents the
resulting Data Privacy Vocabulary (DPV), along with a discussion of
its potential uses, and an invitation for feedback and participation.
Keywords: Privacy · GDPR · Interoperability · Semantic Web
* Corresponding authors: Harshvardhan J. Pandit (pandith@tcd.ie) and Axel
Polleres (axel.polleres@wu.ac.at). We thank all members of the W3C DPVCG for
their feedback and input to this work: a preliminary outline of the goals of the CG
was presented at ISWC2018’s SWSG workshop [5], where we also gathered valuable
feedback from the participants; this work is the first complete presentation of the
resulting vocabulary proposed by the DPVCG since. This work was supported
by the European Union’s Horizon 2020 research and innovation programme under
grant 73160. Harshvardhan J. Pandit is funded by the ADAPT Centre for Digital
Excellence, funded by the SFI Research Centres Programme (Grant 13/RC/2106) and
co-funded by the European Regional Development Fund.
1 Introduction
Concerns regarding privacy and trust have been raised to a point where regula-
tors, citizens, and companies have started to take action. Services on the Web
are often very complex orchestrations of co-operation between multiple actors,
and the processing of personal data in Big Data environments is becoming more
complex while being less transparent.
Yet, while from a legal point of view the General Data Protection Regulation
(GDPR) [9], adopted in April 2016, and the California Consumer Privacy Act
(CCPA) [1] of 2018 regulate the processing of personal data, their technical
implementation in operative IT systems is far from being standardised.
While building privacy-by-design [7] into systems has a much wider scope, we lack
the tools, standards, and best practices that would allow those wanting to be good
citizens of the Web to provide interoperable and understandable privacy controls,
or to keep records of data processing in an accountable manner. Possible
exceptions are the work on permissions [14] and tracking protection [11], but even
those only cover partial aspects.
To this end, the work presented in this paper aims to set a basis for the es-
tablishment of interoperable standards in this domain. In particular, it addresses
the following gaps by complementing existing (W3C) standards:
– There are no standard vocabularies to describe and interchange personal
data. Such vocabularies are relevant, for instance, to support data subjects’
right to data portability under Article 20 of the GDPR [9].
– There are no agreed-upon vocabularies or taxonomies for describing purposes
of personal data handling and categories of processing: the GDPR requires
that legal bases for data processing, including consent, be tied to the specific
purposes and processing of personal data to justify their lawful use.
Consequently, personal data processing should be logged with a standard
reference to a purpose that complies with the norms set by the legal bases,
such as the individual’s consent. The concrete taxonomies for representing
this information in the context of personal data handling are not yet
standardised.
– There are no agreed-upon vocabularies or ontologies that align the
terminology of privacy legislation, such as the GDPR, to allow organisations
to claim compliance with such regulations using machine-readable information.
The Data Privacy Vocabulary (DPV) presented herein aims to address
these challenges by providing a comprehensive, standardised set of terms
for annotating privacy policies, consent receipts, and, in general, records of
personal data handling. To this end, the rest of the paper is structured as
follows: Section 2 explains the setup and governance of the DPV Community Group
within the World Wide Web Consortium (W3C), whereafter Section 3 summarises
pre-existing relevant vocabularies and standards that served as inputs.
Section 4 describes the methodology that we applied in reconciling these into
the DPV. The vocabulary itself and its modules are described in
Section 5 (omitting detailed descriptions of all classes and properties, which
can be found in the published W3C CG draft at https://www.w3.org/ns/dpv).
We close with a discussion of applications and adoption (Section 6) followed by
conclusions and a call for participation and feedback (Section 7).
2 DPVCG: Data Privacy Vocabularies and Controls CG
To address the gaps mentioned in Section 1, a W3C workshop was announced9,
which received 32 position statements and expressions of interest. These were
used to create an agenda based on standards and solutions for interoperable
privacy. The workshop took place on 17th and 18th April 2018 in Vienna and
was attended by about 40 participants. Discussions and interactions were structured
into sessions around the four themes of: (1) ‘relevant vocabularies and initiatives’,
(2) ‘industry perspective’, (3) ‘research topics’, and (4) ‘governmental side and
initiatives’. The workshop concluded with a discussion of the next steps and pri-
orities in terms of standardisation and interoperability. The identified goals were
(from highest to lowest priority): taxonomies for regulatory privacy terms (in-
cluding GDPR), personal data, purposes, disclosure and consent (as well as other
legal bases), details of anonymisation (and measures taken to protect personal
data), and for recording logs of personal data processing.
Following this, a W3C Community Group (CG) with the title ‘Data Privacy
Vocabularies and Controls CG’ (DPVCG) was formally established on 25th May
2018 - the implementation date of the GDPR. The group has a total of 55
participants to date representing academia, industry, legal experts, and other
stakeholders. Its discussions are open via the public mailing list10, along with a
wiki11 documenting meetings, resources, and general information.
The CG had its first face-to-face meeting on 30th August 2018 co-located
with the MyData 201812 conference at Helsinki, Finland. The goal of this meet-
ing was agreement on the first steps and deliverables of the CG as well as estab-
lishment of meeting and management procedures. The outcome of this meeting
was agreement on working towards the following deliverables:
– Use cases and requirements: Collect and align common requirements
from industry and stakeholders to identify areas where interoperability is
needed in the handling of personal data. The outcome of this was a prioritised
list of requirements to enable interoperability in the identified use-cases.
– Alignment of vocabularies and identification of overlaps: Collect
existing vocabularies and standardisation efforts, and identify their overlaps
and suitability for covering the requirements prioritised in step one. The
identified vocabularies are presented in Section 3.
– Glossary of GDPR terms: An understandable and interoperable glossary
of common terms from the GDPR and an analysis of how they are covered
by the agreed vocabularies.
– Vocabularies: Based on the heterogeneity or homogeneity of identified
use-cases and requirements, create a set of (modular) vocabularies for exchanging
and representing information in an interoperable form for personal data,
purposes, processing, consent, anonymisation, and transparency logs. The
resulting vocabulary is presented in Section 5.
9 https://www.w3.org/2018/vocabws/
10 https://lists.w3.org/Archives/Public/public-dpvcg/
11 https://www.w3.org/community/dpvcg/wiki/Main_Page
12 https://mydata2018.org/
A second face-to-face meeting was held on 3rd and 4th December 2018 in
Vienna, Austria. The goal of this meeting was to analyse the collected use-cases
and vocabularies, to establish agreement on the requirements for the vocabularies
to be delivered, and to plan ahead towards their conception and completion. A
third face-to-face meeting was organised on 4th and 5th April 2019 in Vienna and
Dublin to finalise the vocabulary and reach an agreement towards the first public
draft. The outcome of the meeting was agreement on the terms used and their
expression using RDF and OWL. The meeting also provided agreement over the
namespace of the vocabulary, its hosting, and documentation. After over a year of
collaborative effort, the CG published the ‘Data Privacy Vocabulary’ (DPV) on
25th July 2019. The CG is currently welcoming feedback on DPV from the community
and stakeholders in terms of comments, suggestions, and contributions.
3 Existing and Relevant Vocabularies
Existing relevant use cases and vocabularies were collected and documented
in the wiki13 through individual submissions by CG members. The wiki page
for each vocabulary presents a summary, its relevance, covered requirements,
uptake, and applicable use-cases. Relevant terms were then identified from each
vocabulary and categorised as per requirements. These were used as the basis
for discussions regarding terms to be included and aligned in the DPV.
3.1 Existing Standards and Standardisation Efforts
The CG considered several web-relevant standards for terms relevant towards
identified requirements: PROV-O [16] (and its extension P-Plan [12]) for prove-
nance, ODRL [14] for expressing policies, vCard [13] for describing people and
organisations, Activity Streams [26] for describing activities on the web, and
Schema.org [25] for metadata used in description of web pages.
The CG also considered standardisation efforts undertaken by bodies rel-
evant to the areas of privacy and interoperability. Classification of Everyday
Living (COEL) [8] describes a privacy-by-design framework for the collection
and processing of behavioural data with a focus on transparency and pseudo-
anonymisation. It was developed by OASIS, which is a non-profit organisation
dedicated to the development of open standards.
ISA² is a programme of the European Parliament and the Council of the
European Union for the development of interoperable frameworks and solutions,
which includes a set of vocabularies, termed ‘Core Vocabularies’ [2], for
person, business, location, criterion and evidence, and public organisation. IEEE
P7012 [19] is a work-in-progress effort to standardise privacy terms in a
machine-readable manner for use and sharing on the web.
13 https://www.w3.org/community/dpvcg/wiki/Use-Cases,_Requirements,_Vocabularies
Consent Receipt [17] is an interoperable standard developed by the Kantara
Initiative for capturing the consent given by a person regarding use of their
personal data. The standard enables creation of receipts in human as well as
machine readable formats for expressing information using pre-defined categories
for personal data collection, purposes, and its use and disclosure. However, it does
not address the requirements specified by the GDPR.
The Platform for Privacy Preferences Project (P3P) [18] is a (now-abandoned)
protocol for websites to declare how they intend to collect and use personal
data, with an emphasis on providing users with more control of their personal
information when browsing the web. P3P provided a machine-readable vocabu-
lary for websites and users to define their policies, which were then compared to
determine privacy actions.
3.2 Vocabularies addressing Privacy and GDPR
The Scalable Policy-aware Linked Data Architecture For Privacy, Transparency
and Compliance (SPECIAL) is a European H2020 project that uses semantic-web
technologies in the expression and evaluation of information for GDPR
compliance. SPECIAL has developed vocabularies for expressing Usage Policy
[6] and Policy Log [4] in order to evaluate whether the recorded use of personal
data is compliant with a given consent.
Mining and Reasoning with Legal Texts (MIREL) is another European H2020
project that uses semantic-web technologies for GDPR compliance. It has devel-
oped PrOnto (Privacy Ontology for Legal Reasoning) [20] - a legal ontology of
concepts consisting of privacy agents, personal data types, processing operations,
rights and obligations.
GDPRtEXT [22] provides a linked data version of the text of the GDPR that
makes it possible for links to be established between information and the text of
the GDPR by using RDF and OWL. It also provides a thesaurus or vocabulary of
concepts defined or referred to within the GDPR in a machine-readable manner
using SKOS.
GDPRov [23] is an ontology to represent processes and activities associated
with the lifecycle of personal data and consent as an abstract model or plan
indicating what is supposed to happen, as well as the corresponding activity
logs indicating things that have happened. It extends PROV-O and P-Plan with
GDPR-specific terminology. GConsent [21] is an ontology for expressing the
information necessary for managing consent and evaluating its compliance with
the obligations and requirements of the GDPR.
Other ontologies considered, developed prior to the implementation of the GDPR,
include an ontology to express privacy preferences [24], a data protection ontology
based on the GDPR [3], and an ontology for expressing consent [10].
4 Methodology
Following the collection of vocabularies, relevant terms were documented in the
wiki14, and were used as the basis for further discussion for addressing the re-
quirements. While initially working towards a taxonomy of terms, the necessity
of representing relationships and logic led towards an RDF/OWL based ontology.
The process of ontology development was loosely based on the NeOn
methodology scenarios [27], with the CG using the SPECIAL Usage Policy Lan-
guage [6] as the base ontology combined with modular ontologies representing
personal data categories, purposes, processing, technical and organisational mea-
sures, legal basis, and consent.
The stated aim of the ontology was to provide an extendable mechanism
for representing information by providing the necessary top-level concepts and
relationships in a hierarchical structure. To this end, an analysis of existing
vocabularies was carried out to determine their suitability, which revealed a lack
of top-level concepts that could be readily incorporated. Therefore, the CG
created the necessary concepts by inviting contributions and reviewing them
through discussions.
The agreement over how terms were proposed, discussed, and added was
documented through a collaborative spreadsheet hosted on the Google Sheets
platform15. The spreadsheet contained a separate tab for each ‘modular’ ontology
and for a base ontology representing their combined usage to represent
personal data handling. The columns in the spreadsheet were mapped to Semantic
Web representations, as depicted in Table 1. The vocabulary was created
using the Google Drive API in a script16 that extracted terms and generated
RDF serialisations using rdflib17 and documentation using ReSpec18.
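To make the spreadsheet-to-RDF step concrete, the following Python sketch turns rows of a sheet into Turtle statements. It is not the CG’s extract-sheets script: it assumes the tab has been exported to CSV with some of the column names from Table 1, uses only the standard library rather than rdflib and the Google Drive API, maps the parent column to rdfs:subClassOf for classes and rdfs:subPropertyOf for properties, and omits @prefix declarations. The two example rows are hypothetical.

```python
import csv
import io

# Hypothetical CSV export of one spreadsheet tab; the column names follow a
# subset of Table 1 and the two rows are illustrative only.
SHEET_CSV = """Class/Property,term,description,domain,range,super classes/properties,status
Class,Purpose,The purpose of processing personal data,,,,accepted
Property,hasPurpose,Indicates the purpose of processing,dpv:PersonalDataHandling,dpv:Purpose,,accepted
"""


def literal(text: str) -> str:
    """Quote text as a Turtle string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def row_to_turtle(row: dict) -> str:
    """Serialise one spreadsheet row as a Turtle statement about dpv:<term>."""
    is_class = row["Class/Property"].strip().lower() == "class"
    statements = ["a rdfs:Class" if is_class else "a rdf:Property"]
    if row.get("description"):
        statements.append(f"dct:description {literal(row['description'])}")
    if not is_class:
        # rdfs:domain and rdfs:range only apply to properties.
        for column in ("domain", "range"):
            if row.get(column):
                statements.append(f"rdfs:{column} {row[column]}")
    parent_predicate = "rdfs:subClassOf" if is_class else "rdfs:subPropertyOf"
    for parent in (row.get("super classes/properties") or "").split(","):
        if parent.strip():
            statements.append(f"{parent_predicate} {parent.strip()}")
    if row.get("status"):
        statements.append(f"sw:term_status {literal(row['status'])}")
    return f"dpv:{row['term']} " + " ;\n    ".join(statements) + " ."


TURTLE = [row_to_turtle(row) for row in csv.DictReader(io.StringIO(SHEET_CSV))]

if __name__ == "__main__":
    print("\n\n".join(TURTLE))
```

In the CG’s actual pipeline the rows were read from Google Sheets and rdflib handled serialisation; a complete output would also need @prefix lines for dpv:, rdf:, rdfs:, dct:, and sw:.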
5 Data Privacy Vocabulary
As a result of the process above, the ‘Data Privacy Vocabulary’ (DPV) was
published on 25th July 2019 at the namespace http://www.w3.org/ns/dpv# (for which
we will use the prefix dpv:) as a public draft for feedback. The current vocabulary
provides terms (classes and properties) to annotate and categorise instances
of legally compliant personal data handling. In particular, DPV provides extensi-
ble concepts and relationships to describe the following components (which are
elaborated in further sections):
1. Personal Data Categories
2. Purposes
3. Processing Categories
4. Technical and Organisational Measures
5. Legal Basis
6. Consent
7. Recipients, Data Controllers, Data Subjects
14 https://www.w3.org/community/dpvcg/wiki/Taxonomy
15 https://www.google.com/sheets/about/
16 https://github.com/dpvcg/extract-sheets/
17 https://github.com/RDFLib/rdflib
18 https://github.com/w3c/respec
Table 1. Columns in spreadsheet for generating RDF serialisations and documentation
Column Name              | Description                     | Representation
-------------------------+---------------------------------+--------------------------------------
Class/Property           | If term is Class or Property    | rdfs:Class or rdf:Property
term                     | The IRI of the term             | as IRI
description              | Description or definition       | dct:description
domain                   | Domain if it is a property      | rdfs:domain
range                    | Range if it is a property       | rdfs:range
super classes/properties | Parent classes or properties    | rdfs:subClassOf or rdfs:subPropertyOf
sub classes/properties   | Child classes or properties     | N/A
related terms            | Terms relevant to this          | rdfs:seeAlso
how related?             | Nature of relation              | use as is
comments                 | Comments used for discussion    | N/A
source                   | The source of the term          | rdfs:isDefinedBy
date                     | Date of creation                | dct:created
status                   | Status, e.g. accepted, proposed | sw:term_status
comments                 | Comments to be recorded         | rdfs:comment
contributor              | dc:creator                      | dct:creator
date-accepted            | Date of acceptance              | dct:dateAccepted
resolution               | Record, e.g. minutes of meeting | as IRI
These terms are intended to express Personal Data Handling in a
machine-readable form by specifying the personal data categories undergoing some
processing, for some purpose, by a data controller, justified by a legal basis, with
specific technical and organisational measures, which may result in data being
shared with some recipient.
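As a rough illustration of how these components combine, the following Python sketch serialises one instance of personal data handling as Turtle. The property names (dpv:hasPersonalDataCategory, dpv:hasProcessing, and so on) are assumptions modelled on DPV’s naming style and should be checked against the published specification; :OrderDelivery, :DeliveryAddress, :ExampleShop, and :CourierCo are hypothetical terms in an example namespace.

```python
from dataclasses import dataclass, field

# Assumed DPV property names, keyed by the dataclass attribute they serialise.
# Verify them against https://www.w3.org/ns/dpv before reuse.
PROPERTIES = {
    "personal_data": "dpv:hasPersonalDataCategory",
    "processing": "dpv:hasProcessing",
    "purpose": "dpv:hasPurpose",
    "controller": "dpv:hasDataController",
    "legal_basis": "dpv:hasLegalBasis",
    "measures": "dpv:hasTechnicalOrganisationalMeasure",
    "recipients": "dpv:hasRecipient",
}


@dataclass
class PersonalDataHandling:
    """One instance of personal data handling; each field lists term IRIs."""
    iri: str
    personal_data: list = field(default_factory=list)
    processing: list = field(default_factory=list)
    purpose: list = field(default_factory=list)
    controller: list = field(default_factory=list)
    legal_basis: list = field(default_factory=list)
    measures: list = field(default_factory=list)
    recipients: list = field(default_factory=list)

    def to_turtle(self) -> str:
        """Emit Turtle, skipping components that were left empty."""
        statements = ["a dpv:PersonalDataHandling"]
        for attribute, predicate in PROPERTIES.items():
            values = getattr(self, attribute)
            if values:
                statements.append(f"{predicate} {', '.join(values)}")
        return f"{self.iri} " + " ;\n    ".join(statements) + " ."


# Hypothetical example: a shop storing and using delivery addresses.
handling = PersonalDataHandling(
    iri=":OrderDelivery",
    personal_data=[":DeliveryAddress"],
    processing=["dpv:Store", "dpv:Use"],
    purpose=["dpv:DeliveryOfGoods"],
    controller=[":ExampleShop"],
    legal_basis=["dpv-gdpr:A6-1-b"],
    recipients=[":CourierCo"],
)
TTL = handling.to_turtle()
print(TTL)
```

Components that are not described (here, the technical and organisational measures) are simply omitted, reflecting that DPV modules can be used selectively.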
The vocabulary is built up in a modular fashion, where each ‘module’ covers
one of the above listed aspects, and the modules are linked together by a core
Base Vocabulary.
5.1 Base Ontology
The ‘Base Ontology’ describes the top-level classes defining a policy for legal
personal data handling. Classes and properties for each top-level class are
further elaborated using sub-vocabularies, which are available as separate
modules and are outlined in subsequent sections. While all concepts in DPV share
a single dpv: namespace, providing the base ontology as a separate module makes
it possible to use sub-vocabularies without the dpv:PersonalDataHandling class,
for example to refer only to purposes. Exceptions to the single namespace are
the NACE taxonomy of business sectors (cf. Section 5.3), which extends the
dpv:Sector concept in the Purposes vocabulary, and the GDPR legal bases taxonomy
(cf. Section 5.6), which extends the top-level dpv:LegalBasis class; both are
provided under separate namespaces to indicate their specialisation. The core
concepts of the Base Ontology module and their relationships are depicted in
Figure 1.
Fig. 1. DPV Base Ontology classes and properties
5.2 Personal Data Categories
DPV provides broad top-level personal data categories adapted from the tax-
onomy provided by EnterPrivacy19. The top-level concepts in this taxonomy
refer to the nature of information (financial, social, tracking) and to its inherent
source (internal, external). Each top-level concept is represented in the DPV as
a class, and is further elaborated by subclasses for referring to specific categories
of information - such as preferences or demographics.
Regulations such as the GDPR allow information about the personal data used
in processing to be provided either as specific instances of personal data (e.g.,
“John Doe”) or as categories (e.g., name). Additionally, the class
dpv:SpecialCategoryOfPersonalData represents categories that are ‘special’ or
‘sensitive’ and require additional conditions as per GDPR’s Article 9.
The categories defined in the personal data taxonomy can be used directly
or further extended to refer to the scope of personal data used in processing.
The taxonomy can be extended by subclassing the respective classes to depict
specialised concepts, such as “likes regarding movies” or combined with classes
to indicate specific contexts. The class dpv:DerivedPersonalData is one such
context where information has been derived from existing information, e.g., in-
ference of opinions from social media. Additional classes can be defined to specify
contexts such as use of machine learning, accuracy, and source.
While the taxonomy is by no means exhaustive, the aim is to provide a
sufficient coverage of abstract categories of personal data which can be extended
using the subclass mechanism to represent concepts used in the real-world. For
instance, Figure 2 shows the hierarchy of concepts for classifying depictions of
individuals in pictures.
19 https://enterprivacy.com/2017/03/01/categories-of-personal-information/
Fig. 2. Hierarchy of concepts for classifying depictions of individuals in pictures
(inspired by EnterPrivacy)
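Because extensions are expressed through subclassing, a consumer of DPV annotations can work out whether a custom category falls under dpv:SpecialCategoryOfPersonalData by walking the rdfs:subClassOf hierarchy. The Python sketch below does this with a breadth-first traversal over a hand-written hierarchy. Apart from dpv:PersonalDataCategory and dpv:SpecialCategoryOfPersonalData, all class names are hypothetical. A real application would more likely use an RDFS/OWL reasoner or a SPARQL property path over the published vocabulary.

```python
from collections import deque

# Hypothetical hierarchy: class -> set of direct rdfs:subClassOf parents.
# Only the dpv: names appear in the text; the ":" classes are illustrative,
# and placing dpv:SpecialCategoryOfPersonalData under dpv:PersonalDataCategory
# is an assumption made for this example.
SUBCLASS_OF = {
    ":LikesRegardingMovies": {":Preference"},
    ":Preference": {"dpv:PersonalDataCategory"},
    ":BloodType": {":HealthData"},
    ":HealthData": {"dpv:SpecialCategoryOfPersonalData"},
    "dpv:SpecialCategoryOfPersonalData": {"dpv:PersonalDataCategory"},
}

SPECIAL = "dpv:SpecialCategoryOfPersonalData"


def superclasses(cls: str) -> set:
    """Return every transitive superclass of cls (breadth-first traversal)."""
    seen, queue = set(), deque([cls])
    while queue:
        for parent in SUBCLASS_OF.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_special_category(cls: str) -> bool:
    """True if cls is, or specialises, a GDPR Article 9 special category."""
    return cls == SPECIAL or SPECIAL in superclasses(cls)


print(is_special_category(":BloodType"))            # True
print(is_special_category(":LikesRegardingMovies"))  # False
```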
5.3 Purposes
DPV at present defines a hierarchically (by subclassing) organized set of generic
categories of data handling purposes, as depicted in Figure 3. Overall, DPV
provides a list of 31 suggested purposes as subclasses of these generic purposes
which may be extended as shown in Listing 1 by further subclassing to create
more specific ones. As regulations such as the GDPR generally require a specific
purpose to be declared in an understandable manner, we suggest declaring such
specific purposes as subclasses of one or several dpv:Purpose categories to make
them as specific as possible, and always annotating them with a human-readable
description (e.g., by using rdfs:label and rdfs:comment).
Fig. 3. Categories of Purposes for Data Processing in DPV
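@prefix : <http://example.com/ns#> .  # hypothetical namespace for new terms
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dpv: <http://www.w3.org/ns/dpv#> .
:NewPurpose
    rdfs:subClassOf dpv:DeliveryOfGoods, dpv:FraudPreventionAndDetection ;
    rdfs:label "New Purpose" ;
    rdfs:comment "Intended delivery of goods with fraud prevention" .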
Listing 1: Extending pre-defined purposes with human-readable descriptions
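A benefit of subclassing both categories in Listing 1 is that :NewPurpose is covered wherever either parent purpose is. The following Python sketch is a simplified stand-in for reasoner-based compliance checking, not the SPECIAL implementation. It computes which permitted purposes (for instance, those listed in a consent record) cover a given purpose. Placing both DPV purposes directly under dpv:Purpose is an assumption made for this example.

```python
from functools import lru_cache

# Direct rdfs:subClassOf edges. :NewPurpose is taken from Listing 1; the
# parents of the two DPV purposes are simplified for this illustration.
PARENTS = {
    ":NewPurpose": ("dpv:DeliveryOfGoods", "dpv:FraudPreventionAndDetection"),
    "dpv:DeliveryOfGoods": ("dpv:Purpose",),
    "dpv:FraudPreventionAndDetection": ("dpv:Purpose",),
}


@lru_cache(maxsize=None)
def ancestors(purpose: str) -> frozenset:
    """The purpose itself plus every purpose it (transitively) specialises."""
    result = {purpose}
    for parent in PARENTS.get(purpose, ()):
        result |= ancestors(parent)
    return frozenset(result)


def covered_by(purpose: str, permitted: set) -> set:
    """Return the permitted purposes that are equal to or broader than purpose."""
    return set(ancestors(purpose) & permitted)


consented = {"dpv:DeliveryOfGoods"}
print(covered_by(":NewPurpose", consented))  # {'dpv:DeliveryOfGoods'}
# A broader purpose is not covered by consent to a narrower one:
print(covered_by("dpv:DeliveryOfGoods", {":NewPurpose"}))  # set()
```

The direction of the check matters: consent to a general purpose covers its specialisations, but not the other way round.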
Moreover, purposes can be further restricted to specific contexts using the
class dpv:Context and the property dpv:hasContext. Similarly, DPV provides
a way to restrict purposes to a specific business sector, i.e., allowing or
restricting data handling to purposes related to particular business activities,
using the class dpv:Sector and the property dpv:hasSector. Potential hierarchies
for defining such business sectors include NACE20 (EU), NAICS21 (USA), ISIC22
(UN), and GICS23. At the moment, we recommend using NACE (EU) codes, expressed
as dpv-nace:NACE-CODE as shown in Listing 2, where the prefix dpv-nace:
represents the DPV-defined namespace http://www.w3.org/ns/dpv-nace#.
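@prefix : <http://example.com/ns#> .  # hypothetical namespace for new terms
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dpv: <http://www.w3.org/ns/dpv#> . @prefix dpv-nace: <http://www.w3.org/ns/dpv-nace#> .
:SomePurpose a dpv:Purpose ;
    rdfs:label "Some Purpose" ;
    dpv:hasSector dpv-nace:M72 .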
Listing 2: Creating a new purpose and restricting it to Scientific Research using
the NACE sector code (M.72)
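Since NACE is itself hierarchical (sections, divisions, groups, classes), a sector restriction can be checked by comparing a declared code against the broader levels it belongs to. The Python sketch below assumes codes written as a section letter followed by the NACE Rev. 2 numeric code, e.g. M72 as in Listing 2 and, hypothetically, M72.1 or M72.11 for finer levels. Whether dpv-nace uses exactly this form below the division level is an assumption.

```python
import re

# Assumed code shape: section letter (A-U), optionally a two-digit division,
# then optionally ".<group digit>" and a class digit, e.g. M, M72, M72.1, M72.11.
NACE_CODE = re.compile(r"^([A-U])(?:(\d{2})(?:\.(\d)(\d)?)?)?$")


def nace_levels(code: str) -> list:
    """Return code and all broader NACE levels, most specific first."""
    match = NACE_CODE.match(code)
    if match is None:
        raise ValueError(f"unrecognised NACE code: {code!r}")
    section, division, group, klass = match.groups()
    levels = [section]
    if division:
        levels.append(f"{section}{division}")
    if group:
        levels.append(f"{section}{division}.{group}")
    if klass:
        levels.append(f"{section}{division}.{group}{klass}")
    return levels[::-1]


def sector_allows(permitted: str, declared: str) -> bool:
    """True if a purpose declared for sector `declared` lies within `permitted`."""
    return permitted in nace_levels(declared)


print(nace_levels("M72.11"))      # ['M72.11', 'M72.1', 'M72', 'M']
print(sector_allows("M", "M72"))  # True
print(sector_allows("M72", "M"))  # False
```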
5.4 Processing Categories
In this module, DPV provides a hierarchy of classes to specify operations asso-
ciated with processing of personal data, which are required by regulations such
as the GDPR. As common processing operations such as collect, share, and use
have certain constraints or obligations in GDPR, it is necessary to accurately
represent and define them for personal data handling. While the term ‘use’ is
liberally used to refer to a broad range of processing categories in privacy notices,
we recommend selecting the most appropriate and specific terms to accurately
reflect the nature of processing as applicable.
20 https://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=LST_NOM_DTL&StrNom=NACE_REV2
21 https://www.census.gov/eos/www/naics/
22 https://unstats.un.org/unsd/classifications
23 https://en.wikipedia.org/wiki/Global_Industry_Classification_Standard#cite_note-mapbook-1
DPV defines top-level classes to represent the following broad categories of
processing: Disclose, Copy, Obtain, Remove, Store, Transfer, Transform, and
Use, as shown in Figure 4. Each of these is further expanded using
subclasses to provide 33 processing categories, which include the terms used in
the definition of processing in the GDPR (Article 4(2)).
DPV further provides properties with a boolean range to indicate whether
processing involves Systematic Monitoring, Evaluation or Scoring, Automated
Decision-Making, Matching or Combining, Large Scale processing, or Innovative
use of new solutions, as these are relevant to the assessment of processing for
GDPR compliance.
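To show how such boolean indicators might feed an assessment, the following Python sketch records them for a processing description and flags processing that meets a configurable number of them. The field names are hypothetical Python counterparts of the indicators named above, not DPV property IRIs, and the default threshold of two is an illustrative parameter rather than a rule defined by DPV.

```python
from dataclasses import dataclass, fields


@dataclass
class ProcessingNature:
    """Boolean indicators mirroring the DPV properties listed above.

    Field names are hypothetical Python names, not the DPV property IRIs.
    """
    systematic_monitoring: bool = False
    evaluation_or_scoring: bool = False
    automated_decision_making: bool = False
    matching_or_combining: bool = False
    large_scale: bool = False
    innovative_use_of_new_solutions: bool = False

    def indicators(self) -> list:
        """Names of the indicators that are set, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


def needs_closer_assessment(nature: ProcessingNature, threshold: int = 2) -> bool:
    """Hypothetical screening rule: flag processing with >= threshold indicators."""
    return len(nature.indicators()) >= threshold


scoring = ProcessingNature(evaluation_or_scoring=True, large_scale=True)
print(scoring.indicators())              # ['evaluation_or_scoring', 'large_scale']
print(needs_closer_assessment(scoring))  # True
```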
Fig. 4. Categories of Data Processing in DPV
5.5 Technical and Organisational Measures
Regulations require certain technical and organisational measures to be in place
depending on the context of processing involving personal data. For example,
GDPR Article 32 requires implementing appropriate measures, taking into
account the state of the art, the costs of implementation and the nature, scope,
context and purposes of processing, as well as the risks to the rights and freedoms
of natural persons. Examples of measures listed in the article include:
– the pseudonymisation and encryption of personal data;
– the ability to ensure the ongoing confidentiality, integrity, availability and
resilience of processing systems and services;
– the ability to restore the availability and access to personal data in a timely
manner in the event of a physical or technical incident;
– a process for regularly testing, assessing and evaluating the effectiveness
of technical and organisational measures for ensuring the security of the
processing.
To address these requirements, DPV defines a module comprising a hierarchical
vocabulary for declaring such technical and organisational measures, as
shown in Figure 5.
For any of the measures declared in DPV, we provide a generic object property
(dpv:measureImplementedBy). As its value, we allow either a blank node with a
single rdfs:comment describing the measure, or a URI referring to a standard or
best practice followed, i.e., a well-known identifier for that standard or a URL
of a document describing it. The class StorageRestriction represents the
measures used for the storage of data, with two specific properties provided for
restricting storage location and duration.
At the moment, we do not yet refer to specific certifications or security
standards; in the future, we plan to provide a collection of URIs for identifying
recommended standards and best practices as they develop further. Feedback
on adding specific ones to future versions of the DPV specification is particularly
welcome.
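The two permitted kinds of value for dpv:measureImplementedBy can be illustrated with a small Python helper that emits Turtle. It is a sketch only: the subject :BackupPolicy, the class :EncryptedBackup, and the example.org URL are hypothetical, and the simple http(s) prefix test is an assumed way of telling a reference to a standard apart from a free-text description.

```python
def measure_turtle(subject: str, measure_class: str, implemented_by: str) -> str:
    """Describe a measure whose implementation is a URI or free text.

    A value starting with http:// or https:// is treated as the IRI of a
    standard or best practice; anything else becomes a blank node carrying
    a single rdfs:comment.
    """
    if implemented_by.startswith(("http://", "https://")):
        value = f"<{implemented_by}>"
    else:
        escaped = implemented_by.replace("\\", "\\\\").replace('"', '\\"')
        value = f'[ rdfs:comment "{escaped}" ]'
    return f"{subject} a {measure_class} ;\n    dpv:measureImplementedBy {value} ."


# Hypothetical subject, class, and URL, for illustration only.
BY_STANDARD = measure_turtle(":BackupPolicy", ":EncryptedBackup",
                             "https://example.org/standards/backup-practice")
BY_COMMENT = measure_turtle(":BackupPolicy", ":EncryptedBackup",
                            "Nightly encrypted backups stored off-site")
print(BY_STANDARD)
print(BY_COMMENT)
```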
Fig. 5. Technical and Organisational Measures in DPV
5.6 Consent and other Legal Bases
While the vocabulary provides dpv:LegalBasis as a top-level concept representing
the various legal bases that can be used for justifying the processing of personal
data, such legal bases may be defined differently in the legislation of different
jurisdictions. For the particular case of the GDPR, we therefore provide the legal
bases specific to the GDPR as a separate aligned vocabulary, under
the http://www.w3.org/ns/dpv-gdpr# namespace (prefix: dpv-gdpr:).
This vocabulary defines the legal bases set out in Articles 6 and 9 of the
GDPR, including consent, along with their descriptions and their source within
the regulation. For example, dpv-gdpr:A6-1-b denotes the legal basis provided by
the fulfilment/performance of a contract.
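Identifiers such as dpv-gdpr:A6-1-b encode the article, paragraph, and point of the GDPR they refer to. A minimal Python sketch for rendering such local names as conventional citations follows. The pattern is inferred from this single example and is an assumption; other identifiers in the dpv-gdpr namespace may not match it.

```python
import re

# Pattern inferred from the example "A6-1-b": article, paragraph, optional point.
LEGAL_BASIS_ID = re.compile(r"^A(\d+)-(\d+)(?:-([a-z]))?$")


def article_reference(local_name: str) -> str:
    """Render a local name such as 'A6-1-b' as 'Article 6(1)(b)'."""
    match = LEGAL_BASIS_ID.match(local_name)
    if match is None:
        raise ValueError(f"unexpected legal basis identifier: {local_name!r}")
    article, paragraph, point = match.groups()
    reference = f"Article {article}({paragraph})"
    return reference + (f"({point})" if point else "")


print(article_reference("A6-1-b"))  # Article 6(1)(b)
print(article_reference("A9-2-a"))  # Article 9(2)(a)
```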
In addition to the legal bases, Consent is addressed with additional properties
and classes within the core DPV vocabulary as it is a common form of legal
justification across jurisdictions. The module describing consent, illustrated in
Figure 6, provides the necessary terms to describe consent provision, withdrawal,
and expiry. This is based on an analysis of existing work in the form of Consent
Receipt [17] and GConsent [21].
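Provision, withdrawal, and expiry together determine when consent can be relied upon. The Python sketch below captures that logic with a hypothetical record type. The attribute names are illustrative and do not correspond to DPV property names. Treating withdrawal and expiry as taking effect at exactly the recorded instant is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    """Hypothetical record of when consent was given, withdrawn, or expires."""
    given_at: datetime
    withdrawn_at: Optional[datetime] = None
    expires_at: Optional[datetime] = None

    def is_valid_at(self, moment: datetime) -> bool:
        """Consent holds from provision until withdrawal or expiry, whichever is first."""
        if moment < self.given_at:
            return False
        for end in (self.withdrawn_at, self.expires_at):
            if end is not None and moment >= end:
                return False
        return True


def utc(year: int, month: int, day: int) -> datetime:
    """Timezone-aware UTC timestamp at midnight, to avoid naive/aware mixing."""
    return datetime(year, month, day, tzinfo=timezone.utc)


consent = ConsentRecord(given_at=utc(2019, 7, 25), expires_at=utc(2020, 7, 25))
print(consent.is_valid_at(utc(2019, 12, 1)))  # True
print(consent.is_valid_at(utc(2020, 8, 1)))   # False (expired)
consent.withdrawn_at = utc(2019, 10, 1)
print(consent.is_valid_at(utc(2019, 12, 1)))  # False (withdrawn)
```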
Fig. 6. Consent in DPV
5.7 Recipients, Data Controllers, and Data Subjects
Last but not least, this module of the ontology is meant for defining a
taxonomy of stakeholders involved in Personal Data Handling, extending the
top-level classes dpv:DataController, dpv:DataSubject, and dpv:Recipient from
the Base vocabulary module. We consider defining recipients important in
the context of data privacy, as it allows tracking the entities with which
personal data is shared or to which it is transferred. Similarly, a categorisation
of Data Controllers and Data Subjects has bearing on the privacy of personal data
handling, especially when considering situations such as where data subjects are
children. The vocabulary currently provides only a few top-level classes to
describe such recipients and data subjects, with an invitation to suggest/provide
more terms for future releases:
– dpv:Child as a subclass of dpv:DataSubject in order to capture policies
and restrictions of data handling related to children;
– dpv:Processor as a subclass of dpv:Recipient to denote natural or legal
persons, public authorities, agencies or other bodies which process personal
data on behalf of the controller;
– dpv:ThirdParty as a subclass of dpv:Recipient to provide a generic class
for third-party recipients, i.e., natural or legal persons, public authorities,
agencies or bodies other than the data subject, controller, processor and
persons who, under the direct authority of the controller or processor, are
authorised to process personal data.
6 Potential Adoption and Usage
The primary aim of DPV is to assist in the representation of information concerning privacy in the context of personal data processing. To this end, it models concepts at an abstract, top level so as to cover a broad range of domains. This enables the DPV to be used as a domain-independent vocabulary which can be extended or specialised for specific domains or use-cases. Though the DPV does not define or restrict how such extensions should be created, this section highlights some suggested methods for its adoption and usage.
Firstly, the modular nature of DPV enables adopting only a selected subset of the vocabulary to address a specific use-case. For example, an adopter may only wish to utilise the concepts under Purpose and PersonalDataCategory without using or describing all aspects of a particular PersonalDataHandling from the base vocabulary.
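As an illustration of such partial adoption, the sketch below emits a Turtle description of a handling that uses only the Purpose and PersonalDataCategory parts of the vocabulary. The namespace IRI `http://www.w3.org/ns/dpv#`, the property names `dpv:hasPurpose` and `dpv:hasPersonalDataCategory`, and all `ex:` terms are assumptions for illustration and should be checked against the published DPV specification.

```python
# Assumed namespace IRI and property names; verify against the DPV spec.
PREFIXES = (
    "@prefix dpv: <http://www.w3.org/ns/dpv#> .\n"
    "@prefix ex: <http://example.com/ns#> .\n"
)


def handling_to_turtle(handling: str, purposes: list[str],
                       categories: list[str]) -> str:
    """Describe a handling with only the Purpose and PersonalDataCategory
    parts of DPV; other modules, such as recipients or legal bases, are
    deliberately left out."""
    props = [
        (prop, values)
        for prop, values in (("dpv:hasPurpose", purposes),
                             ("dpv:hasPersonalDataCategory", categories))
        if values  # skip properties that have no values
    ]
    statements = [f"{handling} a dpv:PersonalDataHandling"]
    statements += [f"    {prop} {', '.join(values)}" for prop, values in props]
    # Turtle separates predicates of one subject with ';' and ends with '.'.
    return PREFIXES + "\n" + " ;\n".join(statements) + " .\n"


print(handling_to_turtle("ex:handling1",
                         ["ex:ServiceImprovement"], ["ex:EmailAddress"]))
```

Because the output only references terms from the two selected modules, a consumer that understands just those modules can still interpret it.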
In addition, the use of RDFS and OWL enables extending the DPV in a compatible manner for domain-specific use-cases. For example, an extension targeting the finance domain can define additional concepts using the RDFS subclass mechanism. Such an extension, when represented as an ontology, will be compatible with the DPV and will enable semantic interoperability of information and, ideally, applications such as automated compliance checking for privacy policies and data handling records annotated with DPV and its extensions.
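The compatibility argument rests on RDFS entailment: a term introduced by an extension inherits the meaning of the DPV class it specialises. The following self-contained Python sketch applies two RDFS rules (rdfs11, transitivity of rdfs:subClassOf, and rdfs9, propagation of rdf:type) to a hypothetical finance extension under the assumed prefix `exfin:`; in practice an RDF library or an OWL reasoner would perform this inference.

```python
RDF_TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

# Hypothetical finance extension: new purposes placed under dpv:Purpose.
extension = {
    ("exfin:CreditAssessment", SUBCLASS, "dpv:Purpose"),
    ("exfin:CreditScoring", SUBCLASS, "exfin:CreditAssessment"),
}
# Instance data annotated only with the extension term.
data = {("ex:purpose42", RDF_TYPE, "exfin:CreditScoring")}


def rdfs_closure(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply rdfs11 (subClassOf transitivity) and rdfs9 (type inheritance)
    until a fixpoint is reached."""
    graph = set(triples)
    while True:
        sub = {(s, o) for s, p, o in graph if p == SUBCLASS}
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        # rdfs9: x type A, A subClassOf C  =>  x type C
        new |= {(x, RDF_TYPE, c) for x, p, a in graph if p == RDF_TYPE
                for a2, c in sub if a == a2}
        if new <= graph:
            return graph
        graph |= new


inferred = rdfs_closure(extension | data)
print(("ex:purpose42", RDF_TYPE, "dpv:Purpose") in inferred)  # True
```

A checker that only knows dpv:Purpose therefore still recognises ex:purpose42 as a purpose, which is the kind of interoperability between DPV and its extensions described above.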
The DPV is intended to be used as an interoperable vocabulary whose terms are structured in a hierarchy and have unambiguous definitions, enabling common agreement over their semantics. Such usage involves restricting the concepts used to a pre-defined vocabulary, as seen in the case of Consent Receipts [17] and the SPECIAL vocabularies [4].
The SPECIAL project has demonstrated how the above-mentioned use case of automated compliance checking can be implemented by modelling privacy policies and log records of personal data handling in a manner compatible with DPV, cf. [15]. The SPECIAL project 24, with its industry use-case partners, may also be viewed as an early adopter of the DPV; further tools and a scalable architecture for transparent and accountable personal data processing in accordance with the GDPR are currently being developed within it.
24 http://www.specialprivacy.eu
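The core idea of checking a processing log record against a policy can be sketched as a subsumption test: a record complies if some policy rule is at least as general on every attribute. The Python below is a much-simplified illustration under stated assumptions, not SPECIAL's implementation (whose algorithms are described in [15]): the hierarchy is a hand-written, single-parent table, the rule shape (data category, purpose, recipient) is a choice made for this sketch, and the `ex:` terms are hypothetical.

```python
# Hypothetical single-parent hierarchy (assumed acyclic) mixing DPV class
# names from the text with illustrative ex: terms.
PARENT = {
    "ex:EmailAddress": "ex:ContactData",
    "ex:ContactData": "dpv:PersonalDataCategory",
    "ex:Newsletter": "ex:Marketing",
    "ex:Marketing": "dpv:Purpose",
    "dpv:Processor": "dpv:Recipient",
    "dpv:ThirdParty": "dpv:Recipient",
}


def subsumed_by(term: str, general: str) -> bool:
    """True if term equals general or lies below it in PARENT."""
    current = term
    while current is not None:
        if current == general:
            return True
        current = PARENT.get(current)  # None once the root is passed
    return False


def is_compliant(record: tuple[str, str, str],
                 policy: list[tuple[str, str, str]]) -> bool:
    """A log record (data category, purpose, recipient) complies if at least
    one policy rule is at least as general on every attribute."""
    return any(
        all(subsumed_by(value, allowed) for value, allowed in zip(record, rule))
        for rule in policy
    )


policy = [("ex:ContactData", "ex:Marketing", "dpv:Processor")]
print(is_compliant(("ex:EmailAddress", "ex:Newsletter", "dpv:Processor"), policy))   # True
print(is_compliant(("ex:EmailAddress", "ex:Newsletter", "dpv:ThirdParty"), policy))  # False
```

Policies may constrain further attributes, such as processing categories, and real hierarchies may allow multiple parents; both cases call for a proper reasoner rather than this lookup table.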
7 Conclusion
The Data Privacy Vocabulary is the outcome of over a year of cumulative effort in W3C’s Data Privacy Vocabularies and Controls Community Group (DPVCG). It represents a first step towards providing a standardised vocabulary to represent instances of legally compliant personal data handling. To this end, it provides a modular vocabulary representing concepts of personal data categories, purposes of processing, categories of processing, technical and organisational measures, legal bases, recipients, and consent.
With the advent of new privacy regulations, the DPV fills an important gap by providing the necessary terms in an interoperable and extensible format. To the best of our knowledge, it is currently the most comprehensive vocabulary of privacy-related terms that is also aligned with regulations such as the GDPR and attempts to comprehensively cover the relevant aspects of personal data handling. For the continued development of this work, the DPVCG is currently inviting participation in the form of comments, feedback, and suggestions. Specifically, the DPVCG kindly requests proposals to extend its initial taxonomies with additional terms where these are missing or need refinement in order to describe specific use cases of personal data handling.
Future plans also include producing documented examples of how the DPV
could be adopted for specific use-cases. Examples include annotating privacy
policies, documenting information for specific laws such as GDPR, and producing
transparent processing logs by mapping the DPV to existing database schemas.
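As a rough illustration of the last item, the sketch below reads rows from a hypothetical `access_log` table (created in an in-memory SQLite database with Python's standard `sqlite3` module) and maps column names and purpose codes to DPV-style annotations. The table layout, the mapping dictionaries, and the `ex:` terms are all assumptions; an actual mapping would target the published DPV terms and could instead be expressed in a dedicated mapping language such as W3C's R2RML.

```python
import sqlite3

# Hypothetical mappings from an existing schema to DPV-style annotations.
COLUMN_TO_CATEGORY = {"email": "ex:EmailAddress", "postcode": "ex:PostalAddress"}
PURPOSE_CODES = {"MKT": "ex:Marketing", "SVC": "ex:ServiceProvision"}


def processing_log(conn: sqlite3.Connection) -> list[dict]:
    """Map rows of the hypothetical access_log table to annotated entries.
    Unmapped values become None so that gaps are visible for review."""
    entries = []
    rows = conn.execute(
        "SELECT id, column_name, purpose_code FROM access_log ORDER BY id")
    for row_id, column, code in rows:
        entries.append({
            "id": f"ex:log{row_id}",
            "dpv:hasPersonalDataCategory": COLUMN_TO_CATEGORY.get(column),
            "dpv:hasPurpose": PURPOSE_CODES.get(code),
        })
    return entries


# Example database standing in for an organisation's existing schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log ("
             "id INTEGER PRIMARY KEY, column_name TEXT, purpose_code TEXT)")
conn.executemany(
    "INSERT INTO access_log (column_name, purpose_code) VALUES (?, ?)",
    [("email", "MKT"), ("postcode", "SVC")])
for entry in processing_log(conn):
    print(entry)
```

Keeping the mapping in separate tables, rather than changing the database schema, reflects the idea of mapping the DPV onto existing schemas instead of replacing them.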
References
1. Assembly Bill No. 375 Privacy: personal information: businesses. California State Legislature (Jun 2018), https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201720180AB375
2. Aleksandrova, Z.: Core Vocabularies (Nov 2016), https://ec.europa.eu/isa2/solutions/core-vocabularies_en
3. Bartolini, C., Muthuri, R.: Reconciling Data Protection Rights and Obligations: An Ontology of the Forthcoming EU Regulation. In: Workshop on Language and Semantic Technology for Legal Domain. p. 8 (2015)
4. Bonatti, B.A., Dullaert, W., Fernandez, J.D., Kirrane, S., Milosevic, U., Polleres, A.: The SPECIAL Policy Log Vocabulary (Nov 2018), https://aic.ai.wu.ac.at/qadlod/policyLog/
5. Bonatti, P., Bos, B., Decker, S., Fernández, J.D., Kirrane, S., Peristeras, V., Polleres, A., Wenning, R.: Data privacy vocabularies and controls: Semantic web for transparency and privacy. In: Semantic Web for Social Good Workshop (SWSG) co-located with ISWC2018. CEUR Workshop Proceedings, vol. 2182. CEUR-WS.org (Oct 2018), http://ceur-ws.org/Vol-2182/paper_3.pdf
6. Bonatti, P.A., Kirrane, S., Petrova, I.M., Sauro, L., Schlehahn, E.: The SPECIAL Usage Policy Language, V0.1. Tech. rep. (2018), https://www.specialprivacy.eu/vocabs
7. Cavoukian, A., et al.: Privacy by design: The 7 foundational principles. Information and Privacy Commissioner of Ontario, Canada 5 (2009)
8. Classification of Everyday Living Version 1.0 (Jan 2019), https://docs.oasis-open.org/coel/COEL/v1.0/os/COEL-v1.0-os.pdf
9. European Parliament and Council: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (May 2016)
10. Fatema, K., Hadziselimovic, E., Pandit, H.J., Debruyne, C., Lewis, D., O’Sullivan, D.: Compliance through Informed Consent: Semantic Based Consent Permission and Data Management Model. In: Proceedings of the 5th Workshop on Society, Privacy and the Semantic Web - Policy and Technology (PrivOn2017) (PrivOn) (2017), http://ceur-ws.org/Vol-1951/PrivOn2017_paper_5.pdf
11. Fielding, R.T., Singer, D.: Tracking Preference Expression (DNT) (Jan 2019), https://www.w3.org/TR/tracking-dnt/
12. Garijo, D., Gil, Y.: The P-PLAN Ontology (Mar 2014), http://vocab.linkeddata.es/p-plan/
13. Iannella, R., McKinney, J.: vCard Ontology - for describing People and Organizations (May 2014), https://www.w3.org/TR/vcard-rdf/
14. Iannella, R., Villata, S.: ODRL Information Model 2.2 (Feb 2018), https://www.w3.org/TR/odrl-model/
15. Kirrane, S., Bonatti, P., Fernández, J.D., Galdi, C., Sauro, L., Dell’Erba, D., Petrova, I., Siahaan, I.: SPECIAL deliverable D2.8 – transparency and compliance algorithms v2 (Nov 2018), https://www.specialprivacy.eu/images/documents/SPECIAL_D28_M23_V10.pdf
16. Lebo, T., Sahoo, S., McGuinness, D., Belhajjame, K., Cheney, J., Corsar, D., Garijo, D., Soiland-Reyes, S., Zednik, S., Zhao, J.: PROV-O: The PROV Ontology (2013)
17. Lizar, M., Turner, D.: Consent Receipt Specification v1.1.0. Tech. rep., Kantara Initiative (2017), https://docs.kantarainitiative.org/cis/consent-receipt-specification-v1-1-0.pdf
18. P3P: The Platform for Privacy Preferences, https://www.w3.org/P3P/
19. P7012 - Standard for Machine Readable Personal Privacy Terms, https://standards.ieee.org/project/7012.html
20. Palmirani, M., Martoni, M., Rossi, A., Bartolini, C., Robaldo, L.: PrOnto: Privacy Ontology for Legal Reasoning. In: Kő, A., Francesconi, E. (eds.) Electronic Government and the Information Systems Perspective. pp. 139–152. Lecture Notes in Computer Science, Springer International Publishing (2018)
21. Pandit, H.J., Debruyne, C., O’Sullivan, D., Lewis, D.: GConsent - A Consent Ontology Based on the GDPR. In: Hitzler, P., Fernández, M., Janowicz, K., Zaveri, A., Gray, A.J., Lopez, V., Haller, A., Hammar, K. (eds.) The Semantic Web. pp. 270–282. Lecture Notes in Computer Science, Springer International Publishing (2019), https://w3id.org/GConsent
22. Pandit, H.J., Fatema, K., O’Sullivan, D., Lewis, D.: GDPRtEXT - GDPR as a Linked Data Resource. In: The Semantic Web - European Semantic Web Conference. pp. 481–495. Lecture Notes in Computer Science, Springer, Cham (Jun 2018). https://doi.org/10/c3n4, https://link.springer.com/chapter/10.1007/978-3-319-93417-4_31
23. Pandit, H.J., Lewis, D.: Modelling Provenance for GDPR Compliance using Linked Open Data Vocabularies. In: Proceedings of the 5th Workshop on Society, Privacy and the Semantic Web - Policy and Technology (PrivOn2017) (PrivOn) (2017), http://ceur-ws.org/Vol-1951/PrivOn2017_paper_6.pdf
24. Sacco, O., Passant, A.: A Privacy Preference Ontology (PPO) for Linked Data. In: LDOW. Citeseer (2011), http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.357.3591&rep=rep1&type=pdf
25. schema.org, https://schema.org/
26. Snell, J.M., Prodromou, E.: Activity Streams 2.0 (May 2017), https://www.w3.org/TR/activitystreams-core/
27. Suárez-Figueroa, M.C., Gómez-Pérez, A., Fernández-López, M.: The NeOn Methodology for Ontology Engineering. In: Suárez-Figueroa, M.C., Gómez-Pérez, A., Motta, E., Gangemi, A. (eds.) Ontology Engineering in a Networked World, pp. 9–34. Springer Berlin Heidelberg, Berlin, Heidelberg (2012). https://doi.org/10.1007/978-3-642-24794-1_2, http://link.springer.com/10.1007/978-3-642-24794-1_2
Conference Paper
Managing Privacy and understanding the handling of personal data has turned into a fundamental right-at least for Europeans-since May 25th with the coming into force of the General Data Protection Regulation. Yet, whereas many different tools by different vendors promise companies to guarantee their compliance to GDPR in terms of consent management and keeping track of the personal data they handle in their processes, interoperability between such tools as well uniform user facing interfaces will be needed to enable true transparency, user-configurable and-manageable privacy policies and data portability (as also-implicitly-promised by GDPR). We argue that such interoper-ability can be enabled by agreed upon vocabularies and Linked Data.
Chapter
Consent is an important legal basis for the processing of personal data under the General Data Protection Regulation (GDPR), which is the current European data protection law. GPDR provides constraints and obligations on the validity of consent, and provides data subjects with the right to withdraw their consent at any time. Determining and demonstrating compliance to these obligations require information on how the consent was obtained, used, and changed over time. Existing work demonstrates feasibility of semantic web technologies in modelling information and determining compliance for GDPR. Although these address consent, they currently do not model all the information associated with it. In this paper, we address this by first presenting our analysis of information associated with consent under the GDPR. We then present GConsent, an OWL2-DL ontology for representation of consent and its associated information such as provenance. The paper presents the methodology used in the creation and validation of the ontology as well as an example use-case demonstrating its applicability. The ontology and this paper can be accessed online at https://w3id.org/GConsent.
Chapter
The General Data Protection Regulation (GDPR) is the new European data protection law, and compliance with it affects organisations in several aspects related to the use of consent and personal data. With emerging research and innovation in data management solutions claiming to assist with various provisions of the GDPR, comparing the degree and scope of such solutions is challenging without a way to consolidate them. With the GDPR available as a linked data resource, it is possible to link together information and approaches addressing specific articles and thereby compare them. Organisations can take advantage of this by linking queries and results directly to the relevant text, making it possible to record and measure their solutions for compliance with specific obligations. GDPR text extensions (GDPRtEXT) uses the European Legislation Identifier (ELI) ontology published by the European Publications Office to expose the GDPR as linked data. The dataset is published using DCAT and includes an online webpage with HTML id attributes for each article and its subpoints. A SKOS vocabulary is provided that links concepts to the relevant text in the GDPR. To demonstrate how related legislation can be linked to highlight changes and to reuse existing approaches, we provide a mapping from the Data Protection Directive (DPD), the previous data protection law, to the GDPR, showing the nature of the changes between the two. We also briefly discuss existing bodies of research that can benefit from the adoption of this resource.
In contrast to other approaches that provide methodological guidance for ontology engineering, the NeOn Methodology does not prescribe a rigid workflow, but instead suggests a variety of pathways for developing ontologies. The nine scenarios proposed in the methodology cover commonly occurring situations, for example, when available ontologies need to be re-engineered, aligned, modularized, localized to support different languages and cultures, or integrated with ontology design patterns and non-ontological resources, such as folksonomies or thesauri. In addition, the NeOn Methodology framework provides (a) a glossary of processes and activities involved in the development of ontologies, (b) two ontology life cycle models, and (c) a set of methodological guidelines for different processes and activities, which are described (i) functionally, in terms of goals, inputs, outputs, and relevant constraints; (ii) procedurally, by means of workflow specifications; and (iii) empirically, through a set of illustrative examples.
European Parliament and Council: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (May 2016)
Fatema, K., Hadziselimovic, E., Pandit, H.J., Debruyne, C., Lewis, D., Sullivan, D.: Compliance through informed consent: semantic based consent permission and data management model
Fielding, R.T., Singer, D.: Tracking Preference Expression (DNT) (Jan 2019), https://www.w3.org/TR/tracking-dnt/