Trusted Cells: A Sea Change for Personal Data Services
Nicolas Anciaux¹,², Philippe Bonnet³, Luc Bouganim¹,²,
Benjamin Nguyen¹,², Iulian Sandu Popa¹,², Philippe Pucheral¹,²
1 INRIA Paris-Rocquencourt
Le Chesnay, France
<Fname.Lname>@inria.fr
2 PRISM Laboratory
Univ. of Versailles, France
<Fname.Lname>@prism.uvsq.fr
3 IT University of Copenhagen
Copenhagen, Denmark
phbo@itu.dk
ABSTRACT
How do you keep a secret about your personal life in an age
where your daughter’s glasses record and share everything she
senses, your wallet records and shares your financial transactions,
and your set-top box records and shares your family’s energy
consumption? Your personal data has become a prime asset for
many companies around the Internet, but can you avoid -- or even
detect -- abusive usage? Today, there is a wide consensus that
individuals should have increased control over how their personal
data is collected, managed and shared. Yet there is no appropriate
technical solution to implement such personal data services:
centralized solutions sacrifice security for innovative applications,
while decentralized solutions sacrifice innovative applications for
security. In this paper, we argue that the advent of secure
hardware in all personal IT devices, at the edges of the Internet,
could trigger a sea change. We propose the vision of trusted cells:
personal data servers running on secure smart phones, set-top
boxes, secure portable tokens or smart cards to form a global,
decentralized data platform that provides security yet enables
innovative applications. We motivate our approach, describe the
trusted cells architecture and define a range of challenges for
future research.
1. INTRODUCTION
With the convergence of mobile communications, sensors and
online social network technologies, we are witnessing an
exponential increase in the creation and consumption of personal
data. Paper-based interactions (e.g., banking, health), analog
processes (e.g., photography, resource metering) or mechanical
interactions (e.g., as simple as opening a door) are now sources of
digital data linked to one or several individuals. They represent an
unprecedented potential for applications and business.
Until now, the enthusiasm for new opportunities has outweighed
privacy concerns. Nevertheless, the risk of a backlash is growing
as new devices and new services bring us closer to the dystopias
described in the science fiction literature. This risk is well
documented and there is a consensus on the nature of the solution: it is
necessary to increase the control that individuals have over their
personal data [11,9,12]. The World Economic Forum even
formulates the need for a data platform that allows individuals to
manage the collection, usage and sharing of data in different
contexts and for different types and sensitivities of data [13].
Unfortunately, none of the solutions available today can be used
to implement this vision. Centralized solutions, including
emerging cloud-based personal data vault management
platforms¹, trade security and protection for innovative services.
At best, such approaches formulate sound privacy policies, but
none of them proposes mechanisms to automatically enforce these
policies [1]. Even TrustedDB [3], which proposes tamper-resistant
hardware to secure outsourced centralized databases, does not
solve the two intrinsic problems of centralized approaches. First,
users are exposed to sudden changes in privacy policies. Second,
users are exposed to sophisticated attacks, whose benefit-to-cost
ratio is high against a centralized database.
Decentralized solutions are promising because they do not exhibit
these intrinsic limitations. However, existing decentralized
solutions sacrifice functionality or usability for security. Many
examples are discussed in [8]. Other examples include the PDS
vision [2] or the FreedomBox [4]. In PDS, a personal data server
is embedded in a tamper-resistant portable token to hold the
personal data of a user, but the sharing of data is cumbersome
(since the tokens are mostly disconnected) and the range of
personal services is limited (since the tokens have extreme
resource constraints). FreedomBox aims at providing a software
platform that interconnects groups of individuals that trust each
other, thus drastically limiting the range of services it can support.
We argue that the advent of secure hardware embedded in all
forms of personal devices, at the edges of the Internet, will trigger
a sea change. Recently, AMD announced that it will incorporate a
secure TrustZone-based² ARM processor on its chips to be
included in smart phones, set-top boxes and laptops. Such
secure tamper-resistant microcontrollers provide tangible security
guarantees in the context of well-known environments³. We can
now imagine that whenever you take a picture, your smart phone
securely contacts the personal services of all individuals in the
frame of the picture, and automatically blurs the faces of those who
request it. We can also imagine that the GPS tracker in your son’s
car gives him detailed turn-by-turn guidance, but hides those
details from local government, only delivering road-pricing results.
In this paper, we propose the vision of trusted cells, i.e., personal
data servers running on secure devices to form a decentralized data
platform. We illustrate how trusted cells can be used in the context
of an application scenario, describe the trusted cells architecture
and discuss requirements and challenges for future research.
1 These include Personal (http://www.personal.com), My personal vault
(http://www.mypersonalvault.com), or Mydex (http://www.mydex.org).
2 http://www.arm.com/products/processors/technologies/trustzone.php
3 The adoption of a standard API for secure micro-controllers [5] and the
availability of an open source embedded secure operating system based
on it (Open Virtualization) now enable higher level services.
This article is published under a Creative Commons Attribution License
(http://creativecommons.org/licenses/by/3.0/), which permits distribution
and reproduction in any medium as well as allowing derivative
works, provided that you attribute the original work to the author(s)
and CIDR 2013.
6th Biennial Conference on Innovative Data Systems Research (CIDR ’13)
January 6-9, 2013, Asilomar, California, USA.
2. MOTIVATION
Alice lives in France with Bob and their two children. Their house
is now one of the 35 million households equipped with a Linky
power meter. Once a day, the power meter reports to the
distribution company a certified time series of readings for
verification, billing and network operation [6]. Alice and Bob
have installed an energy butler app on their secure home gateway,
a trusted cell managing all smart appliances in their home and
storing their data. That award-winning app relies on external feeds
from their utility and local weather prediction, as well as a feed of
readings received every second from the Linky⁴, to control their
heat pump and the charging of their electric vehicle. This app
minimizes overall load on the distribution network and saves them
30% on their bill. In addition, Alice is engaged in a social game (a
follow-up to simpleEnergy.com) where she competes with some
friends on their energy savings, reducing consumption by 20%.
At the 1Hz granularity provided by the Linky, most electrical
appliances have a distinctive energy signature. It is thus possible
to infer from the power meter data which activities Alice and Bob
are involved in at specific points in time [7]. How do Alice and
Bob configure the home gateway trusted cell to preserve privacy
while preserving the benefit of their applications? They have a
shared account on this trusted cell. Bob, Alice and their children
have agreed that they do not want to fully disclose all their
activities to each other. Instead, they have access to 15-minute
aggregates via a visualization app: at that granularity one cannot
detect specific activities, but it is still possible to infer a daily
routine. At the same time, daily statistics feed their social game,
monthly statistics are delivered to the distribution company and
time series at the required granularity are securely exchanged with
other trusted cells in their neighborhood to achieve
peak load shaving.
None of this data leaves the trusted cell application unless it is
accessed via a predefined set of aggregate queries. The trusted cell
guarantees that no malware can tamper with the data. If the trusted
cell gets stolen, an elaborate attack would need to be mounted to
break the secure hardware and get access to their personal data.
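The idea of exposing the 1 Hz readings only through predefined aggregate queries can be illustrated with a minimal sketch. It assumes the raw readings are held in a plain list inside the cell; the function names, the consumer names and the policy table are hypothetical and are not part of the trusted cells proposal.

```python
from statistics import mean


def aggregate_readings(readings_w, window_s=900):
    """Average per-second power readings (watts) over fixed windows.

    readings_w: list of readings, one per second since the start.
    window_s:   window length in seconds (900 s = 15 min, as in the scenario).
    Only complete windows are reported, so a partial trailing window
    never exposes a finer granularity than intended.
    """
    full_windows = len(readings_w) // window_s
    return [
        mean(readings_w[i * window_s:(i + 1) * window_s])
        for i in range(full_windows)
    ]


# Hypothetical policy table: the granularity each consumer may query.
ALLOWED_WINDOWS = {
    "family_visualization": 15 * 60,   # 15-minute aggregates
    "social_game": 24 * 3600,          # daily statistics
}


def query(consumer, readings_w):
    """Answer only predefined aggregate queries; raw data is never returned."""
    if consumer not in ALLOWED_WINDOWS:
        raise PermissionError(f"no aggregate query defined for {consumer!r}")
    return aggregate_readings(readings_w, ALLOWED_WINDOWS[consumer])
```

In this sketch a consumer that is not listed in the table gets no data at all, which mirrors the opt-in stance of the scenario.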
This scenario can be easily transposed to different types of
personal data like GPS traces, Internet traces, mobile phone data,
bills, pay slips, photos as well as health, administrative or school
records. We classify the data that could be managed with trusted
cells, based on how it is produced and by whom:
(1) Data produced by smart sensors installed by companies in the
user’s home (e.g., power-meter, heat sensor) or in the user’s
environment (e.g., user’s car GPS tracking box for a PAYD
application) on which the user has full or shared ownership,
externalizing aggregated data. Users may opt-in for small-
scale sharing (e.g., local traffic optimization) or larger-scale
sharing (e.g., social games or traffic optimization).
(2) Data produced or inferred by external systems (e.g., purchase
receipt obtained by near field communication or medical data
sent by the hospital or labs). Small-scale sharing allows the
user to optimize her buying habits or to compare her medical
treatment with people having the same disease. Larger-scale
sharing brings public health insights (e.g., epidemiological
study cross-analyzing diseases and diet).
(3) Data authored by the user herself (e.g., a photo, a mail, a
document) on which she has complete ownership. The benefit of
small-scale sharing is obvious here. Larger-scale sharing of partial
data (e.g., photo location only, number of exchanged mails) is
undoubtedly a source of precious information (e.g., most
interesting places on Google Maps).
4 In France, such a short-range radio link is a requirement from the
regulation authorities. In other countries, the data from a smart meter
might not be directly accessible. In the US for example, the Green
Button initiative allows customers to obtain online the smart meter data
collected by their utility (http://www.greenbuttondata.org/)
3. TRUSTED CELLS ARCHITECTURE
What personal data services actually run on a trusted cell? How
do these services allow a user to control whom she shares her
secrets with? How do applications access these services? What
kind of guarantees do trusted cells offer about the security of the
data they manage? We obviously do not aim at answering those
questions fully in this paper. Our goal here is to draw the contours
of an architecture based on Trusted Cells interconnected via an
Untrusted Infrastructure.
Trusted Cells: A trusted cell implements a client-side reference
monitor [10] on top of secure hardware. At a minimum, the
hardware must guarantee a clear separation between secure and
non-secure software. We abstract a Trusted Cell as (1) a Trusted
Execution Environment, (2) a tamper-resistant memory where
cryptographic secrets are stored, (3) an optional and potentially
untrusted mass storage and (4) communication facilities.
Physically, a trusted cell can either be a stand-alone hardware
device (e.g., a smart token) or be embedded in an existing device
(e.g., a smartphone based on ARM’s TrustZone architecture).
The very high security provided by trusted cells comes from a
combination of factors: (1) the obligation to physically be in
contact with the device to attack it, (2) the tamper-resistance of
(part of) its processing and storage units making hardware and
side-channel attacks highly difficult, (3) the certification of the
hardware and software platform, or the openness of the code,
making software attacks (e.g., Trojan) also highly difficult, (4) the
capacity to be auto-administered, contrary to high-end multi-user
servers, avoiding insider (i.e., DBA) attacks, and (5) the
impossibility even for the trusted cell owner to directly access the
data stored locally or spy on the local computation (she must
authenticate and only gets data according to her privileges).
In terms of functionality, a full-fledged trusted cell should be able
to (1) acquire data and synchronize it with the user’s digital space,
(2) extract metadata, index it and provide query facilities on it, (3)
cryptographically protect data against confidentiality and integrity
attacks, (4) enforce access and usage control rules, (5) make all
access and usage actions accountable, and (6) participate in
computations distributed among trusted cells. Basic (e.g., sensor-
based) trusted cells may implement a subset of these functions.
Untrusted infrastructure: The infrastructure provides the
storage, computing and communication services, which expand
the resources of a single trusted cell and form the glue between
trusted cells. By definition, the infrastructure does not benefit
from the hardware security of the trusted cell and is therefore
considered untrusted. We consider that the infrastructure is
implemented by a Cloud-based service provider⁵.
In terms of functionality, the untrusted infrastructure is assumed
to: (1) ensure a highly available and resilient store for all data
outsourced by trusted cells, (2) provide communication facilities
among cells and (3) participate in distributed computations (e.g.,
store intermediate results), provided this participation can be
guaranteed harmless by security checks implemented at the
trusted cells side.
Figure 1 illustrates how trusted cells and the untrusted
infrastructure can collaborate to implement scenarios meeting the
privacy requirements stated above.
5 A P2P infrastructure among trusted cells could be envisioned but would
raise many technical issues of limited interest for this article.
Figure 1: Alice (A) and Bob (B) are equipped with fixed and
portable trusted cells, acquiring data from several data sources,
synchronizing with their encrypted personal digital space on the
cloud. Charlie (C) is travelling around the world and can securely
access all his data from any (insecure) terminal thanks to his
portable trusted cell. All users equipped with trusted cells can
securely share their encrypted data through the cloud.
Threat model: In our context, the primary adversary is the
infrastructure. The infrastructure may deviate from the protocols it
is expected to implement with the objective to breach the
confidentiality of the outsourced data. Integrity attacks (e.g., on
data related to access control) must also be deterred since they
may lead to subsequent confidentiality leaks. The infrastructure is
assumed to cheat only if it cannot be convicted as an
adversary by any trusted cell. Indeed, publicly revealing a data
leak (or a denial of service) would cause irreversible
political/financial/legal damage to the service provider. Such
an adversary is usually said to have weakly
malicious intent [14]. Trusted cells are themselves presumed
trustworthy. However, even secure hardware can be breached, though
at very high cost, so one cannot exclude with certainty that a
very small number of trusted cells may be compromised. Hence, the
trusted cells’ cryptographic secrets must be managed in such a
way that a successful attack on one trusted cell (or a small set of
them) cannot degenerate into a class-breaking attack. This is of
utmost importance considering also that an individual who succeeds
in breaking her own trusted cell could have truly malicious intent.
4. REQUIREMENTS AND CHALLENGES
We identify five major requirements for the user to actually
control how the data entering her personal digital space is
collected, protected, shared and finally used.
Controlled collection of sensed data: The targeted user(s) should
be the unique recipient(s) of raw sensed data and would accept
externalizing only aggregates by opting in/out for selected
applications/services.
At home, the power meter continuously pushes raw measurements
to Alice’s and Bob’s trusted cell gateway, while a certified
aggregated time series is sent to the power supplier company and
aggregates for a social game are pushed to the Cloud every day.
Similarly, the tracking box installed on Alice’s car is a trusted cell
delivering aggregated GPS data to her insurer and raw data to her
trusted cell smartphone that she will synchronize with her
personal space for further use when back home. Hence, adding a
trusted cell to a sensor allows defining, e.g., the frequency and/or
precision of the data that should be externalized, thus leading to a
trusted source both for the user (in terms of privacy preservation)
and the provider (in terms of certification of the output data).
Related challenges: Co-design is a primary issue to allow the
definition of affordable sensor-based trusted cells. Low cost is
indeed a prerequisite to the generalization of trusted sources,
capable of securely filtering and aggregating stream-based spatio-
temporal data with tiny hardware resources. Since some trusted
sources are weakly connected to the Internet, asynchrony problems
must also be addressed. Finally, the combination of data streams
from multiple sources, each being separately harmless, may
generate new privacy risks that must be carefully tackled.
Secure private store: All data must be made highly available,
resilient to failure and protected against confidentiality and
integrity attacks. Accessing this data from any terminal, including
those outside the user’s ownership sphere (e.g., internet café),
should leave no trace of the access.
Cryptographic techniques (i.e., encryption, hashing, signatures)
are used to protect a trusted cell’s data, keeping cryptographic keys
in its tamper-resistant memory. The data is then stored in the
Cloud and potentially cached in the trusted cell’s local mass
storage. At a minimum, trusted cells keep extended
metadata locally: access information, indexes, keywords, and
cryptographic keys. Metadata should be sufficient to allow
performing queries before accessing the Cloud to retrieve the data
of interest. Cryptographic keys never leave the trusted cell’s
tamper-resistant memory. Hence a trusted cell can be used to
securely retrieve data from any (untrusted) terminal it is connected to.
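The split between locally kept metadata and outsourced data can be sketched as follows. This is only an illustration under strong assumptions: the Cloud is modeled as a Python dict, the tamper-resistant memory as a private attribute, and encryption is deliberately left out (a real cell would encrypt each blob before outsourcing it). The sketch shows keyword search answered from local metadata and detection of a blob altered in the Cloud through an HMAC tag kept in the cell. The class and method names are hypothetical.

```python
import hashlib
import hmac
import os


class CellStore:
    """Toy model: key and metadata stay in the cell, blobs go to the cloud."""

    def __init__(self, cloud):
        self._key = os.urandom(32)   # stands in for tamper-resistant memory
        self._index = {}             # keyword -> set of blob ids (local metadata)
        self._tags = {}              # blob id -> expected integrity tag
        self._cloud = cloud          # untrusted dict-like store

    def _tag(self, blob):
        return hmac.new(self._key, blob, hashlib.sha256).digest()

    def put(self, blob_id, blob, keywords):
        # NOTE: encryption of `blob` is omitted in this sketch.
        self._tags[blob_id] = self._tag(blob)
        for keyword in keywords:
            self._index.setdefault(keyword, set()).add(blob_id)
        self._cloud[blob_id] = blob

    def search(self, keyword):
        """Answered from local metadata only; the cloud is not contacted."""
        return sorted(self._index.get(keyword, set()))

    def get(self, blob_id):
        blob = self._cloud[blob_id]
        if not hmac.compare_digest(self._tag(blob), self._tags[blob_id]):
            raise ValueError("integrity check failed: blob altered in the cloud")
        return blob
```

Because the key never leaves the `CellStore` object, the untrusted store cannot forge a valid tag for modified content, which is the property the threat model relies on to convict a cheating infrastructure.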
Related challenges: Designing an intuitive HCI for managing this
collection of heterogeneous personal data (data modeling, data
integration, querying) is a major challenge. Besides, a significant
amount of data and metadata is likely to be embedded in some
trusted cells and may need to be queried efficiently. While it does
not seem a major issue in powerful trusted cells (e.g., a smart
phone), it appears much more challenging when facing low-end
hardware devices like secure tokens (e.g., a microcontroller with
tiny RAM, connected to NAND Flash chips or SD cards, possibly
with energy consumption constraints). Whatever their complexity,
trusted cells should also be designed to support self-tuning, self-
diagnosis and self-healing to minimize the management burden
put on the trusted cell owner.
Secure sharing: The user can decide to keep her data private or
share it with other users or groups of users under certain conditions
(e.g., time, location). Under which model the access control
policies are actually defined is an open issue, but not the main
concern of this paper. However, we insist that the user must get a
proof of legitimacy for the credentials exposed by the participants
of a data exchange and must trust the evaluation of the exchange
conditions (if any).
[Figure 1 diagram, not reproduced. Labels: A & B home with a fixed trusted cell fed by a heat sensor and a power meter; portable trusted cells; external data sources (power provider, supermarket, car insurer, employer, hospital, school) supplying GPS data, pay slips, medical data and scholar folder; photos and MyFiles; an Internet café used by C; other users D to I; the cloud holding encrypted personal vaults, with synchronization and secure data exchange.]
Practically, sharing data means sharing the associated metadata
(so that the recipient user can get the referenced data in the
Cloud), the cryptographic keys (so that her trusted cell can
decrypt them) and the sticky policy (so that her trusted cell can
enforce the expected access control rules). Hence, thanks to its
security properties, including the protection against illegitimate
actions of the recipient user, the recipient trusted cell can enforce
all the conditions appearing in the access control rules (user’s
credential, contextual conditions).
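The three elements exchanged when sharing (metadata, cryptographic key, sticky policy) and the check performed by the recipient trusted cell can be sketched as below. The field names, the use of a time window as the only contextual condition and the assumption that the data key arrives already protected for the recipient cell are all illustrative choices, not a format defined in this paper.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StickyPolicy:
    allowed_users: frozenset   # credentials accepted by the owner
    not_before: datetime       # contextual condition: start of validity
    not_after: datetime        # contextual condition: end of validity


@dataclass(frozen=True)
class SharedItem:
    metadata: dict             # where to find the referenced data in the Cloud
    wrapped_key: bytes         # data key, assumed protected for the recipient cell
    policy: StickyPolicy       # travels with the data it governs


def may_access(item: SharedItem, user: str, now: datetime) -> bool:
    """Evaluated inside the recipient trusted cell, not by the recipient user."""
    policy = item.policy
    return user in policy.allowed_users and policy.not_before <= now <= policy.not_after
```

The point of the design is that `may_access` runs on the recipient's trusted cell, so the recipient user cannot skip the evaluation even though the data and key are physically on her device.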
Related challenges: Again, an intuitive HCI for defining the
access control policies and simple modes of operation must be
devised. The trusted cells themselves may be a source of
simplification (e.g., integration of biometric sensors to
automatically authenticate users, automatic production of certified
credentials safely computed on a trusted cell, definition of default
policies by trusted third parties, such as citizen associations, which
could be automatically selected depending on an individual’s
computed profile). Also, secret management is at the heart of
any sharing protocol between trusted cells (i.e., at this level a
secret is a cryptographic key) and must be carefully designed
(e.g., class-breaking attacks must be prevented, master secrets
must be restorable in case of crash/loss of a trusted cell).
Secure usage and accountability: Usage control usually refers to
UCON_ABC [10]: obligations (actions a subject must take before or
while it holds a right), conditions (environmental or system-
oriented decision factors), and mutability (decisions based on
previous usage)⁶. Again, defining appropriate usage control
policies for trusted cell applications is an open issue.
Similarly to access control rules, usage control rules can be
implemented as sticky policies so that they are made
cryptographically inseparable from the data to be protected.
Hence usage control rules will be enforced by any trusted cell
downloading data and cannot be bypassed by the recipient user.
Regarding accountability, the recipient trusted cell can maintain
an audit log, encrypt it and push it to the Cloud, addressed to
the originator trusted cell.
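The photo example used to illustrate usage control (a bounded number of accesses, valid during a given year, with the owner informed of each access date) can be sketched as a small class. This is a simplified illustration: the class name, the callback used for the obligation and the in-memory audit log are assumptions, and encryption of the log is omitted.

```python
from datetime import date


class UsageControlledItem:
    """Data item bundled with a sticky usage policy (illustrative only)."""

    def __init__(self, payload, max_uses, valid_year, notify_owner):
        self._payload = payload
        self._remaining = max_uses          # mutability: updated by each use
        self._valid_year = valid_year       # condition: environmental factor
        self._notify_owner = notify_owner   # obligation: inform the owner
        self.audit_log = []                 # accountability trail

    def access(self, user: str, today: date) -> bytes:
        if today.year != self._valid_year:
            self.audit_log.append((today, user, "denied: condition"))
            raise PermissionError("outside the validity period")
        if self._remaining <= 0:
            self.audit_log.append((today, user, "denied: quota"))
            raise PermissionError("usage quota exhausted")
        self._notify_owner(user, today)     # obligation fulfilled before release
        self._remaining -= 1
        self.audit_log.append((today, user, "granted"))
        return self._payload
```

Denied attempts are logged as well as granted ones, so the originator can later see both legitimate uses and attempts that the policy blocked.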
Related challenges: Many challenges are common with secure
sharing. However, trusted cells hold the promise of new usages
and new usage controls. For example, trusted cells could be
parameterized so that any personal data produced by a trusted
source linked to an individual A and referencing individual B be
submitted for approval to B’s trusted cell before being
integrated into A’s digital space.
Shared Commons: Privacy also has a collective dimension in the
sense that preserving one’s privacy should not hinder societal
benefits (e.g., census, epidemiologic releases, global queries). A
trusted cell user is thus expected to participate in global treatments
provided her data undergoes appropriate transformations (e.g.,
anonymization, output perturbation) depending on the
trustworthiness of the recipient(s) and the expected usage of the
data/query. When data needs to be transformed before being
delivered, the recipient trusted cell implements the transformation
on its own if possible (e.g., filtering, local data perturbation) or in
collaboration with other trusted cells if the transformation requires
a collective action (e.g., anonymization, global data perturbation).
In the latter case, the computation may be implemented in a pure
Secure Multi-Party fashion or may require the participation of the
untrusted infrastructure (e.g., to store intermediate results).
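As one possible illustration of a collective computation, the sketch below uses textbook additive secret sharing to compute a global sum without any cell revealing its own value. It is not the protocol of this paper: it assumes honest-but-curious participants, non-negative integer inputs whose total stays below the modulus, and it simulates all cells in a single process. The partial sums are the kind of intermediate result that could be stored by the untrusted infrastructure.

```python
import secrets

MODULUS = 2**61 - 1  # assumption: larger than any possible total


def split_into_shares(value, n_shares):
    """Split `value` into n random shares that add up to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_shares - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate n cells computing the sum of their private values."""
    n = len(private_values)
    # Cell i sends its j-th share to cell j; inbox[j] is what cell j receives.
    inbox = [[] for _ in range(n)]
    for value in private_values:
        for j, share in enumerate(split_into_shares(value, n)):
            inbox[j].append(share)
    # Each cell publishes only the sum of the shares it received.
    partial_sums = [sum(received) % MODULUS for received in inbox]
    return sum(partial_sums) % MODULUS
```

Each individual share is uniformly random, so a single partial sum reveals nothing about any one input; only the combination of all partial sums yields the total.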
6 For instance, a photo could be accessed ten times (mutability), in the
course of 2012 (condition), informing the owner of the precise access
date (obligation).
Related challenges: Such large-scale computations may lead to
atypical distributed protocols combining security and performance
requirements in an asymmetric context made up, on one side, of a very
large number of highly secure, low-power and weakly available
trusted cells and, on the other side, of a highly powerful, highly
available but untrusted infrastructure. Hence, the trusted cells
architecture can be seen as a massive untrusted interconnection of
trusted co-processors.
5. CONCLUSION
We proposed the trusted cell architecture, a vision reconciling
individuals’ privacy with innovative acquisition and sharing of
personal data. This vision is based on the premise of ubiquitous
and open secure hardware. Trusted cells enforce access and usage
control at the edges of the Internet, and thus constitute a sea
change with respect to personal data management. This vision
undoubtedly opens a set of exciting challenges that must be
explored by the database community.
6. ACKNOWLEDGEMENTS
This work has been partially funded by the French ANR KISS
project.
7. REFERENCES
[1] R. Agrawal, J. Kiernan, R. Srikant, Y. Xu: Hippocratic
Databases. VLDB 2002: 143-154
[2] T. Allard et al.: Secure Personal Data Servers: a Vision
Paper. PVLDB 3(1): 25-35 (2010)
[3] S. Bajaj, R. Sion: TrustedDB: a trusted hardware based
database with privacy and data confidentiality. SIGMOD
Conference 2011: 205-216
[4] FreedomBox: http://freedomboxfoundation.org/.
[5] Global Platform Device Technology. Trusted Execution
Environment Internal API Specification. Version 1.0.
December 2011.
[6] S. Katzenbeisser, K. Kursawe: Privacy and Security in
Smart Energy Grids. Dagstuhl Seminar 1151, 2011
[7] H. Lam: A Novel Method to Construct Taxonomy Electrical
Appliances Based on Load Signatures. IEEE Transactions
on Consumer Electronics, 2007.
[8] A. Narayanan, V. Toubiana, S. Barocas, H. Nissenbaum, D.
Boneh: A Critical Look at Decentralized Personal Data
Architectures. CoRR abs/1202.4503 (2012)
[9] H. Nissenbaum: Privacy in Context: Technology, Policy, and
the Integrity of Social Life. Stanford Law Books, 2010.
[10] J. Park, R. Sandhu: The UCON_ABC usage control
model. ACM Trans. Information and System Security, vol. 7, no.
1, pp. 128-174, 2004.
[11] A. Pentland et al. Personal Data: The Emergence of a New
Asset Class. World Economic Forum. January 2011.
[12] S. Petronio, Unpacking the paradoxes of privacy in CMC
relationships: The challenges of blogging and relational
communication on the internet, In Computer-mediated
communication in Personal Relationships, 2011.
[13] The World Economic Forum. Rethinking Personal Data:
Strengthening Trust. May 2012.
[14] N. Zhang, W. Zhao: Distributed privacy preserving
information sharing. VLDB 2005.
... Research projects such as Personal Data Server [5] or Trusted Cells [7] propose an enhancement for the home cloud plugs family by adding a tamper-resistant element (e.g. a chip) to the hardware. This tamper-resistant element embeds a minimal trusted computing base that may be formally proven secure and acts as a DBMS. ...
... • Secure distributed computations. In [126,124] algorithms based on Trusted Cells [7] are proposed to achieve secure distributed computations by relying on an untrusted central server leveraging its high computation capabilities. The data are encrypted or anonymized and then sent to this server which performs partial computations on them. ...
... Decentralized solutions based on secure hardware have also been proposed for aggregate queries. For example, in [125,126] the authors propose a protocol to perform global computations such as SQL aggregates using a specific secure hardware [11] and architecture [7]. In their architecture, each node is equipped with a Trusted Data Store (TDS) with limited computing resources, storage, and low availability. ...
Thesis
Grâce aux “smart disclosure initiatives”, traduit en français par « ouvertures intelligentes » et aux nouvelles réglementations comme le RGPD, les individus ont la possibilité de reprendre le contrôle sur leurs données en les stockant localement de manière décentralisée. En parallèle, les solutions dites de clouds personnels ou « système personnel de gestion de données » se multiplient, leur objectif étant de permettre aux utilisateurs d'exploiter leurs données personnelles pour leur propre bien.Cette gestion décentralisée des données personnelles offre une protection naturelle contre les attaques massives sur les serveurs centralisés et ouvre de nouvelles opportunités en permettant aux utilisateurs de croiser leurs données collectées auprès de différentes sources. D'un autre côté, cette approche empêche le croisement de données provenant de plusieurs utilisateurs pour effectuer des calculs distribués.L'objectif de cette thèse est de concevoir un protocole de calcul distribué, générique, qui passe à l’échelle et qui permet de croiser les données personnelles de plusieurs utilisateurs en offrant de fortes garanties de sécurité et de protection de la vie privée. Le protocole répond également aux deux questions soulevées par cette approche : comment préserver la confiance des individus dans leur cloud personnel lorsqu'ils effectuent des calculs croisant des données provenant de plusieurs individus ? Et comment garantir l'intégrité du résultat final lorsqu'il a été calculé par une myriade de clouds personnels collaboratifs mais indépendants ?
... Research projects such as Personal Data Server [5] or Trusted Cells [7] propose an enhancement for the home cloud plugs family by adding a tamper-resistant element (e.g. a chip) to the hardware. This tamper-resistant element embeds a minimal trusted computing base that may be formally proven secure and acts as a DBMS. ...
... • Secure distributed computations. In [126,124] algorithms based on Trusted Cells [7] are proposed to achieve secure distributed computations by relying on an untrusted central server leveraging its high computation capabilities. The data are encrypted or anonymized and then sent to this server which performs partial computations on them. ...
... Decentralized solutions based on secure hardware have also been proposed for aggregate queries. For example, in [125,126] the authors propose a protocol to perform global computations such as SQL aggregates using a specific secure hardware [11] and architecture [7]. In their architecture, each node is equipped with a Trusted Data Store (TDS) with limited computing resources, storage, and low availability. ...
Thesis
Thanks to smart disclosure initiatives and new regulations like GDPR, individuals are able to get the control back on their data and store them locally in a decentralized way. In parallel, personal data management system (PDMS) solutions, also called personal clouds, are flourishing. Their goal is to empower users to leverage their personal data for their own good.This decentralized way of managing personal data provides a de facto protection against massive attacks on central servers and opens new opportunities by allowing users to cross their data gathered from different sources. On the other side, this ap- proach prevents the crossing of data from multiple users to perform distributed computations. The goal of this thesis is to design a generic and scalable secure decentralized computing framework which allows the crossing of personal data of multiple users while answering the following two questions raised by this approach. How to preserve individuals’ trust on their PDMS when performing global computations crossing data from multiple individuals ? And how to guarantee the integrity of the final result when it has been computed by a myriad of collaborative but independent PDMSs ?
... Enhancing Home-cloud plugs with tamper-resistant hardware is a proposal that comes from research projects with the Personal Data Server [4] and Trusted Cells [7] as examples. Their approach is to embed a minimal Trusted Computing Base (TCB) that acts as a database management system within the secure element of the device, in order to form a decentralized and secured data platform. ...
... In [98,97] their objective is to show that global computation and privacy protection are compatible concepts. They assume that the network is composed of Trusted Cells [7] nodes (personal clouds secured by secure hardware) and they want to be able to compute SQL-like queries on these, with a focus on join and aggregate queries, while respecting the privacy of the users. ...
... (6) The BP decipher the IP addresses of the targets and then forward them the information. (7) The targets decipher the local query and list of actors, apply the local query, choose a DA, encrypt (hybrid encryption) their result for the selected DA, and ask the AP to forward the information. Node possessing sensitive data. ...
Thesis
In a context where we produce more and more personal data and where we control less and less how and by whom they are used, a new way of managing them is on the rise: the "personal cloud". In partnership with the French start-up Cozy Cloud (https://cozy.io) that is developing such technology, we propose through this work a way of collaboratively querying the personal clouds while preserving the privacy of the users. We detail in this thesis three contributions to achieve this objective: (1) a set of four requirements any protocol has to respect in this particular context: imposed randomness to prevent an attacker from influencing the execution of a query, knowledge dispersion to prevent any node from concentrating information, task atomicity to split the execution in as many independent tasks as necessary and hidden communications to protect the identity of the participants as well as the content of their communications; (2) SEP2P, a protocol leveraging a distributed hash table and CSAR, another protocol that generates a verifiable random number, in order to generate a random and verifiable list of actors in accordance with the first requirement; and (3) DISPERS, a protocol that applies the last three requirements and splits the execution of a query so as to minimize the impact of a leakage (in case an attacker was selected as actor) by providing to each actor the minimum amount of information it needs in order to execute its task.
... These perspectives should not eclipse the security issues raised by the PDMS paradigm given the sensitivity and quantity of managed personal data. Several products (e.g., [39][40][41][42][43][44][45][46][47][48][49][50]) and research initiatives on PDMS (e.g., [1,2,3,12,17,25,37]) are riding this wave. While PDMS have been studied and developed for more than a decade, the proposed solutions provide diverse sets of functionalities and consider diverse threat models. ...
... In the first part of the tutorial, we review, compare and categorize, academic PDMS proposals [1,2,3,12,17,19,25,32,33,37] and industrial products representative of the current PDMS offer (CozyCloud [39], Inrupt [40], MyDex [41], Digi.me [42], Meeco [43], BitsAbout.Me [44], Perkeep [45], CloudLocker [46], MyCloud [47]). We also consider products targeting data storage and synchronization for personal applications like SpiderOak [11] which are related to PDMS. ...
... •DB server with secure hardware [7,9] •SGX-based DBMS [15,28,36] •Secure indexing [17,24] •Oblivious data access [15] •Secure MR/Spark [14,22,27,37] •Secure Machine Learning [26] •Secure Distributed Comp° [3,8,18,20,21,23,31] Functionalities ...
Article
Smart disclosure initiatives and new regulations such as GDPR in the EU increase the interest for Personal Data Management Systems (PDMS) being provided to individuals to preserve their entire digital life. Consequently, the thorny issue of data security becomes more and more prominent, but highly differs from traditional privacy issues in outsourced corporate databases. Concurrently, the emergence of Trusted Execution Environments (TEE) changes the game in privacy-preserving data management with novel security models. This tutorial offers a global perspective of the current state of work at the confluence of these two rapidly growing areas. The goal is threefold: (1) review and categorize PDMS solutions and identify existing privacy threats and countermeasures; (2) review new security models capitalizing on TEEs and related privacy-preserving data management solutions relevant to the personal context; (3) discuss new challenges at the intersection of PDMS security and TEE-based data management.
... Data management embedded in secure tokens [4,34] or more generally in smart objects is no longer a new topic. Many proposals from the database community tackle this problem in the context of the Internet of Things [11], strengthening the idea that smart objects must now be considered as first-class data sources. ...
... We then consider two use-cases where an embedded keyword-based search engine is called to play a central role and which exhibit different requirements in terms of document indexing, with the objective to assess the versatility of the solution. The first use-case is in the Personal Cloud context and considers the use of a secure token embedding a Personal Data Server [4,5,34] to securely store, query and share personal files (documents, photos, emails) as presented in Section 1. This use-case is representative of situations where the indexing documents have a rich content (tens to hundreds of thousands of terms) and document updates and deletes can be performed randomly. ...
... The second use-case targets the smart sensor context and the case where documents with a poor content are integrated in a Personal Cloud. For instance, home gateways capture a variety of events issued by a growing number of smart appliances, car trackers register our locations and driving habits to compute insurance fees and carbon tax [4]. Here, the documents are time windows, the terms are events occurring during this time window, and top-k queries are useful for analytic tasks. ...
Article
The Personal Cloud paradigm has emerged as a solution that allows individuals to manage under their control the collection, usage and sharing of their data. However, by regaining the full control over their data, the users also inherit the burden of protecting it against all forms of attacks and abusive usages. The Secure Personal Cloud architecture relieves the individual from this security task by employing a secure token (i.e., a tamper-resistant hardware device) to control all the sensitive information (e.g., encryption keys, metadata, indexes) and operations (e.g., authentication, data encryption/decryption, access control, and query processing). However, secure tokens are usually equipped with extremely low RAM but have significant Flash storage capacity (Gigabytes), which raises important barriers for embedded data management. This paper presents a new embedded search engine specifically designed for secure tokens, which applies to the important use-case of managing and securing documents in the Personal Cloud context. Conventional search engines privilege either insertion or query scalability but cannot meet both requirements at the same time. Moreover, very few solutions support data deletions and updates in this context. In this paper, we introduce three design principles, namely Write-Once Partitioning, Linear Pipelining and Background Linear Merging, and show how they can be combined to produce an embedded search engine matching the hardware constraints of secure tokens and reconciling high insert/delete/update rate and query scalability. Our experimental results, obtained with a prototype running on a representative hardware platform, demonstrate the scalability of the approach on large datasets and its superiority compared to state of the art methods. Finally, we also discuss the integration of our solution in another important real use-case related to performing information retrieval in smart objects.
... PIMS holds the promise of a Privacy-by-Design storage and computing platform where each individual can gather her complete digital environment in one place and share it with applications and other users, while preserving her control over her data. The Trusted Cells architecture presented in [4], and pictured in Figure 4.1, precisely answers the PIMS requirements by preventing data leaks during computations on personal data. Hence, we consider Trusted Cells as a reference computing architecture in this chapter. ...
... Trusted Cells reference architecture[4]. ...
Thesis
The benefit of performing Big data computations over individual’s microdata is manifold, in the medical, energy or transportation fields to cite only a few, and this interest is growing with the emergence of smart-disclosure initiatives around the world. However, these computations often expose microdata to privacy leakages, explaining the reluctance of individuals to participate in studies despite the privacy guarantees promised by statistical institutes. To regain individuals’ trust, it becomes essential to propose user empowerment solutions, that is to say allowing individuals to control the privacy parameter used to make computations over their microdata. This work proposes a novel concept of personalized anonymisation based on data generalization and user empowerment. Firstly, this manuscript proposes a novel approach to push personalized privacy guarantees in the processing of database queries so that individuals can disclose different amounts of information (i.e. data at different levels of accuracy) depending on their own perception of the risk. Moreover, we propose a decentralized computing infrastructure based on secure hardware enforcing these personalized privacy guarantees all along the query execution process. Secondly, this manuscript studies the personalization of anonymity guarantees when publishing data. We propose the adaptation of existing heuristics and a new approach based on constraint programming. Experiments have been done to show the impact of such personalization on the data quality. Individuals’ privacy constraints have been built realistically using social statistics studies.
... To improve the security of home cloud plugs, research proposals like Personal Data Server (PDS) [3] and Trusted Cells [8] introduce secure (i.e., tamper-resistant) hardware at the network edges to manage the user's personal data. These approaches propose to embed a minimal Trusted Computing Base (TCB) dedicated to data management in the secure element of smart phones, set-top boxes or portable USB tokens to form a global decentralized secured data platform. ...
Article
Riding the wave of smart disclosure initiatives and new privacy-protection regulations, the Personal Cloud paradigm is emerging through a myriad of solutions offered to users to let them gather and manage their whole digital life. On the bright side, this opens the way to novel value-added services when crossing multiple sources of data of a given person or crossing the data of multiple people. Yet this paradigm shift towards user empowerment raises fundamental questions with regards to the appropriateness of the functionalities and the data management and protection techniques which are offered by existing solutions to laymen users. These questions must be answered in order to limit the risk of seeing such solutions adopted only by a handful of users and thus leaving the Personal Cloud paradigm to become no more than one of the latest missed attempts to achieve a better regulation of the management of personal data. To this end, we review, compare and analyze personal cloud alternatives in terms of the functionalities they provide and the threat models they target. From this analysis, we derive a general set of functionality and security requirements that any Personal Data Management System (PDMS) should consider. We then identify the challenges of implementing such a PDMS and propose a preliminary design for an extensive and secure PDMS reference architecture satisfying the considered requirements. Finally, we discuss several important research challenges remaining to be addressed to achieve a mature PDMS ecosystem.
... PIMS holds the promise of a Privacy-by-Design storage and computing platform where each individual can gather her complete digital environment in one place and share it with applications and other users under her control. The Trusted Cells architecture presented in (Anciaux et al., 2013), and pictured in Figure 1, precisely answers the PIMS requirements by preventing data leaks during computations on personal data. Hence, we consider Trusted Cells as a reference computing architecture in this paper. ...
Chapter
The benefit of performing Big data computations over individual’s microdata is manifold, in the medical, energy or transportation fields to cite only a few, and this interest is growing with the emergence of smart disclosure initiatives around the world. However, these computations often expose microdata to privacy leakages, explaining the reluctance of individuals to participate in studies despite the privacy guarantees promised by statistical institutes.
Conference Paper
In this paper, we address issues related to sharing information in a distributed system consisting of autonomous entities, each of which holds a private database. Semi-honest behavior has been widely adopted as the model for adversarial threats. However, it substantially underestimates the capability of adversaries in reality. In this paper, we consider a threat space containing more powerful adversaries that includes not only semi-honest but also those malicious adversaries. In particular, we classify malicious adversaries into two widely existing subclasses, called weakly malicious and strongly malicious adversaries, respectively. We define a measure of privacy leakage for information sharing systems and propose protocols that can effectively and efficiently protect privacy against different kinds of malicious adversaries.
Conference Paper
An increasing amount of personal data is automatically gathered and stored on servers by administrations, hospitals, insurance companies, etc. Citizens themselves often count on internet companies to store their data and make them reliable and highly available through the internet. However, these benefits must be weighed against privacy risks incurred by centralization. This paper suggests a radically different way of considering the management of personal data. It builds upon the emergence of new portable and secure devices combining the security of smart cards and the storage capacity of NAND Flash chips. By embedding a full-fledged Personal Data Server in such devices, user control of how her sensitive data is shared by others (by whom, for how long, according to which rule, for which purpose) can be fully reestablished and convincingly enforced. To give sense to this vision, Personal Data Servers must be able to interoperate with external servers and must provide traditional database services like durability, availability, query facilities, transactions. This paper proposes an initial design for the Personal Data Server approach, identifies the main technical challenges associated with it and sketches preliminary solutions. We expect that this paper will open exciting perspectives for future database research.
Article
In this paper, we introduce the family of UCONABC models for usage control (UCON), which integrate Authorizations (A), oBligations (B), and Conditions (C). We call these core models because they address the essence of UCON, leaving administration, delegation, and other important but second-order issues for later work. The term usage control is a generalization of access control to cover authorizations, obligations, conditions, continuity (ongoing controls), and mutability. Traditionally, access control has dealt only with authorization decisions on users' access to target resources. Obligations are requirements that have to be fulfilled by obligation subjects for allowing access. Conditions are subject and object independent environmental or system requirements that have to be satisfied for access. In today's highly dynamic, distributed environment, obligations and conditions are also crucial decision factors for richer and finer controls on usage of digital resources. Although they have been discussed occasionally in recent literature, most authors have been motivated from specific target problems and thereby limited in their approaches. The UCONABC model integrates these diverse concepts in a unified framework. Traditional authorization decisions are generally made at the time of requests but hardly recognize ongoing controls for relatively long-lived access or for immediate revocation. Moreover, mutability issues that deal with updates on related subject or object attributes as a consequence of access have not been systematically studied. Unlike other studies that have targeted on specific problems or issues, the UCONABC model seeks to enrich and refine the access control discipline in its definition and scope. UCONABC covers traditional access controls such as mandatory, discretionary, and role-based access control. Digital rights management and other modern access controls are also covered.
UCONABC lays the foundation for next generation access controls that are required for today's real-world information and systems security. This paper articulates the core of this new area of UCON and develops several detailed models.
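As a rough illustration of the ideas in this abstract, the following Python sketch combines authorization, obligation and condition predicates into one usage decision, re-evaluates that decision during access (continuity), and updates a subject attribute as a consequence of access (mutability). It is not the UCONABC formalism itself: the `UsagePolicy` class, the `use` function, and the attribute names `credits`, `cost`, `accepted_terms` and `service_open` are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Attributes = Dict[str, object]


@dataclass
class UsagePolicy:
    """Hypothetical container for the three kinds of decision factors."""
    # Authorizations look at subject and object attributes.
    authorizations: List[Callable[[Attributes, Attributes], bool]] = field(default_factory=list)
    # Obligations are requirements the subject must have fulfilled.
    obligations: List[Callable[[Attributes], bool]] = field(default_factory=list)
    # Conditions depend only on the environment, not on subject or object.
    conditions: List[Callable[[Attributes], bool]] = field(default_factory=list)

    def permits(self, subject: Attributes, obj: Attributes, env: Attributes) -> bool:
        return (all(a(subject, obj) for a in self.authorizations)
                and all(b(subject) for b in self.obligations)
                and all(c(env) for c in self.conditions))


def use(policy: UsagePolicy, subject: Attributes, obj: Attributes,
        env: Attributes, ticks: int) -> int:
    """Return how many time steps of usage were granted before revocation."""
    granted = 0
    for _ in range(ticks):
        # Continuity: the decision is re-checked at every step, not only at request time.
        if not policy.permits(subject, obj, env):
            break  # ongoing access is revoked
        # Mutability: access consumes a subject attribute (assumed for this example).
        subject["credits"] -= obj["cost"]
        granted += 1
    return granted


# Illustrative policy: pay-per-use access to a resource.
policy = UsagePolicy(
    authorizations=[lambda s, o: s["credits"] >= o["cost"]],
    obligations=[lambda s: s["accepted_terms"] is True],
    conditions=[lambda e: e["service_open"] is True],
)
```

With a subject holding 3 credits and an object costing 1 credit per step, `use` grants three steps and then revokes access, because the authorization predicate fails once the mutable `credits` attribute reaches zero.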
Article
Load signatures are the unique identities of the electrical loads expressed in electrical form. A load signature database that contains the signatures of different types of loads can be combined with other technologies, such as smart meter, to provide new services and products. As there are a huge number of loads, it is valuable to construct load taxonomy to understand the signatures of different types of loads. The objective of this study is to construct a taxonomy of the typical household appliances using voltage-current (V-I) trajectory load signatures. Shape features were extracted from the trajectories and hierarchical clustering method was employed for grouping the appliances. The resulting taxonomy showed that the shape features were able to classify the appliances according to their similarities and dissimilarities in the shapes of trajectories, and produced engineering meaningful groupings.
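The grouping step described in this abstract can be pictured with a small agglomerative clustering sketch in plain Python. The abstract does not say which linkage or distance was used, so single linkage with Euclidean distance is an assumption here, and the function name `single_linkage` and the input vectors are purely illustrative stand-ins for shape features extracted from V-I trajectories.

```python
import math
from typing import List, Sequence


def single_linkage(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Merge the two closest clusters repeatedly until only k clusters remain.

    Returns clusters as sorted lists of indices into `points`.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Start with every feature vector in its own cluster.
    clusters: List[List[int]] = [[i] for i in range(len(points))]

    def distance(a: List[int], b: List[int]) -> float:
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        best = None  # (distance, index of first cluster, index of second cluster)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    return [sorted(c) for c in clusters]
```

Recording the order of merges instead of stopping at `k` clusters would give the tree from which a taxonomy could be read; that extension is omitted to keep the sketch short.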
Article
While the Internet was conceived as a decentralized network, the most widely used web applications today tend toward centralization. Control increasingly rests with centralized service providers who, as a consequence, have also amassed unprecedented amounts of data about the behaviors and personalities of individuals. Developers, regulators, and consumer advocates have looked to alternative decentralized architectures as the natural response to threats posed by these centralized services. The result has been a great variety of solutions that include personal data stores (PDS), infomediaries, Vendor Relationship Management (VRM) systems, and federated and distributed social networks. And yet, for all these efforts, decentralized personal data architectures have seen little adoption. This position paper attempts to account for these failures, challenging the accepted wisdom in the web community on the feasibility and desirability of these approaches. We start with a historical discussion of the development of various categories of decentralized personal data architectures. Then we survey the main ideas to illustrate the common themes among these efforts. We tease apart the design characteristics of these systems from the social values that they (are intended to) promote. We use this understanding to point out numerous drawbacks of the decentralization paradigm, some inherent and others incidental. We end with recommendations for designers of these systems for working towards goals that are achievable, but perhaps more limited in scope and ambition.
Conference Paper
TrustedDB is an outsourced database prototype that allows clients to execute SQL queries with privacy and under regulatory compliance constraints without having to trust the service provider. TrustedDB achieves this by leveraging server-hosted tamper-proof trusted hardware in critical query processing stages. TrustedDB does not limit the query expressiveness of supported queries. And, despite the cost overhead and performance limitations of trusted hardware, the costs per query are orders of magnitude lower than any (existing or) potential future software-only mechanisms. TrustedDB is built and runs on actual hardware, and its performance and costs are evaluated here.
Article
Privacy is one of the most urgent issues associated with information technology and digital media. This book claims that what people really care about when they complain and protest that privacy has been violated is not the act of sharing information itself—most people understand that this is crucial to social life —but the inappropriate, improper sharing of information. Arguing that privacy concerns should not be limited solely to concern about control over personal information, Helen Nissenbaum counters that information ought to be distributed and protected according to norms governing distinct social contexts—whether it be workplace, health care, schools, or among family and friends. She warns that basic distinctions between public and private, informing many current privacy policies, in fact obscure more than they clarify. In truth, contemporary information systems should alarm us only when they function without regard for social norms and values, and thereby weaken the fabric of social life.
Article
A load signature is an electrical expression that a load device or appliance distinctly possesses. Load signatures can be applied to produce many useful services and products, such as determining the energy usage of individual appliances, monitoring the health of critical equipment, monitoring power quality, and developing facility management tools. Load signatures of typical yet extensive loads need to be collected before applying them to different services and products. As there are an enormous number of electrical appliances, it is beneficial to classify the appliances for building a well-organized load signature database. The objective of this study is to develop an effective method to classify the loads. A 2-dimensional form of load signatures, voltage-current (V-I) trajectory, is suggested for characterizing the typical household appliances. Hierarchical clustering method was employed to classify the appliances and construct the taxonomy of the appliances. The taxonomy based on V-I trajectory was compared to the taxonomies based on traditional power metrics and eigenvectors in the previous studies. It was found that the groups of appliances in the taxonomy based on V-I trajectory were well-separated and had engineering meanings.