All content in this area was uploaded by Diego F. Aranha on Feb 02, 2019
Content may be subject to copyright.
Available via license: CC BY 4.0
Alves and Aranha
RESEARCH
A framework for searching encrypted databases
Pedro G. M. R. Alves* and Diego F. Aranha
This is the extended version of a paper by the same name that appeared in XVI Brazilian Symposium on
Information and Computational Systems Security in November, 2016.
Abstract
Cloud computing is a ubiquitous paradigm responsible for a fundamental change in the way distributed computing is performed. The possibility to outsource the installation, maintenance and scalability of servers, added to competitive prices, makes this platform highly attractive to the computing industry. Despite this, privacy guarantees are still insufficient for data processed in the cloud, since the data owner has no real control over the processing hardware. This work proposes a framework for database encryption that preserves data secrecy in an untrusted environment and retains searching and updating capabilities. It employs order-revealing encryption to perform selection with time complexity in Θ(log n), and homomorphic encryption to enable computation over ciphertexts. When compared to the current state of the art, our approach provides higher security and flexibility. A proof-of-concept implementation on top of the MongoDB system is offered and applied in the implementation of some of the main predicates required by the winning solution to the Netflix Grand Prize.
Keywords: cryptography; functional encryption; homomorphic encryption; order revealing encryption;
searchable encryption; databases
1 Introduction
The massive adoption of cloud computing is responsible for a fundamental change in the way distributed computing is performed. The possibility to outsource the installation, maintenance and scalability of servers, added to competitive prices, makes this service highly attractive [1, 2]. From mobile to scientific computing, the industry increasingly embraces cloud services and takes advantage of their potential to improve availability and reduce operational costs [3, 4]. However, the cloud cannot be blindly trusted. Malicious parties may acquire full access to the servers and consequently to data. Among the threats are external entities exploiting vulnerabilities, intrusive governments requesting information, competitors seeking unfair advantages, and even possibly malicious system administrators. The data owner has no real control over the processing hardware and therefore cannot guarantee the secrecy of data [5]. The risk of confidentiality breaches caused by inadequate and insecure use of cloud computing is real and tangible.
*Correspondence: pedro.alves@ic.unicamp.br
Institute of Computing, University of Campinas, Albert Einstein Ave. 1251, 13083-852 Campinas/SP, Brazil
Full list of author information is available at the end of the article
The importance of privacy preservation is frequently underestimated, as is the damage its failure represents to society, since the unfolding of a privacy breach may be completely unpredictable. A report from Javelin Advisory Services found a distressing correlation between individuals who were victims of data breaches and later victims of financial fraud. About 75% of total fraud losses in 2016 had this characteristic, corresponding to US$ 12 billion [6]. This could be avoided with the use of strong encryption at the user side, never revealing data even to the application or the cloud.
The problem of using standard encryption in an entire database is that it eliminates the capability of selecting records or evaluating arbitrary functions without the cryptographic keys, reducing the cloud to a complex and huge storage service. For this reason, alternatives have been proposed to solve this problem, starting from anonymization and heuristic operational measures which do not provide formal privacy guarantees. Encryption schemes tailored for databases, such as searchable encryption, are a promising solution with perhaps clearer benefits [7, 8, 9, 10]. Searchable encryption enables the cloud to manipulate encrypted data on behalf of a client without learning information. Hence, it solves both of the aforementioned problems, keeping confidentiality with regard to the cloud while retaining some of its interesting features.
1.1 The frustration of data anonymization
In 2006, Netflix shared their interest in improving the recommendation system offered to their users with the academic community. This synergy was directed to an open competition during 3 rounds which offered financial prizes for the best recommendation algorithms. An important feature of Netflix’s commercial model is to efficiently and assertively guide subscribers in finding content compatible with their interests. Doing this correctly may reinforce the importance of the product for leisure activities, consolidate Netflix’s commercial position, and ensure clients’ loyalty [11].
The participants of the contest received a training set with anonymized movie ratings collected from Netflix subscribers between 1999 and 2005. There are approximately half a million customers and about 17 thousand movies classified in the set, totalling over 100 million ratings. This dataset is composed of movie titles, the timestamp when the rating was created, the rating itself, and an identification number for relating same-user records. No other information about customers was shared, such as name, address or gender. The objective of the participants was to predict with good accuracy how much someone would enjoy a movie based on their previously observed behavior on the platform.
In the same year, America Online (AOL) took a similar approach and released millions of search queries made by 658,000 of its users with the goal of contributing to the scientific community by enabling statistical work over real data [12]. Like Netflix, AOL made efforts to anonymize the data before publishing. All the obviously sensitive data, such as usernames and IP addresses, were suppressed, being replaced by unique identification numbers.
The ability to understand users’ interests and predict their behavior based on collected data is desirable in several commercial models and consequently a hot topic in the scientific literature [13, 14, 15]. However, the importance of privacy-preserving practices is still underestimated, a challenge to overcome. For instance, despite the anonymization efforts of Netflix, Narayanan and Shmatikov brilliantly demonstrated how to break the anonymity of Netflix’s dataset by cross-referencing information with public knowledge bases, such as those provided by the Internet Movie Database (IMDB) [16]. Using a similar approach, New York Times reporters were able to relate a subset of the AOL queries to a particular person by joining apparently innocent queries to non-anonymous public real estate databases [17].
1.2 “Unexpected” leaks
These events raised a still unresolved discussion about how to safely collect and use data without undermining user privacy. As remarked by Narayanan and Felten, “data privacy is a hard problem” [18]. Even when data holders choose the most conservative practice and never share data, system breaches may happen.
In 2013, a large-scale surveillance program of the USA government was revealed by Edward Snowden, a former NSA employee. Named PRISM, it was structured as a massive data interception effort to collect information for later analysis. Its techniques arguably had the support of the US legal system and were frequently applied even without the knowledge of the data-owner companies [19, 20].
Two years later, in 2015, stolen personal data of millions of users of the website Ashley Madison was leaked by malicious parties exploiting security vulnerabilities [21]. As a consequence, there were several reports of extortion and even a suicide, illustrating how increasingly sensitive data breaches are becoming.
In the same year, Sweden’s Transport Agency decided to outsource its IT operations to IBM. To fulfill the contract, the latter chose sites in Eastern Europe to place these operations. This resulted in Swedish confidential data being stored in foreign data centers, in particular in the Czech Republic, Serbia and Romania. As expected, this decision led to a massive data leak, containing information about all vehicles throughout Sweden, including police and military vehicles. Thus, names, photos and home addresses of millions of Swedish citizens, military personnel, and people under the witness relocation program were exposed [22].
In 2016, Yahoo confirmed that a massive data breach, possibly the largest known, affected about 500 million accounts and revealed to the world a dataset full of names, addresses, and telephone numbers [23].
These occurrences leave us with the disturbing feeling that, despite all efforts, the risk of data deanonymization increases in worrying ways with the amount of data made available [24, 25]. Hence, a seemingly obvious strategy to avoid such issues is to simply stop any kind of dataset collection.
1.3 Privacy by renouncing knowledge
History has proven that the task of collecting and storing data from third parties should be treated as risky. The chance of compromising user privacy by accident is too high, possibly with extreme consequences. For this reason, the concept of security by renouncing knowledge has attracted adherents, such as the search engine DuckDuckGo, which states in a blog post that “the only truly anonymised data is no data” and therefore claims to forgo the right to store its users’ data [26, 27].
A more financially realistic approach for dealing with this issue is not to give up knowledge completely but to reduce the number of entities with access to it by keeping it encrypted during its entire lifespan: transportation, storage, and processing, staying secret to the application and the cloud. Thus, a new security fence is set, tying data secrecy to formal guarantees.
1.4 Our contributions
This work follows the state of the art and proposes directives for the modeling of a searchable encrypted database [28]. We identify the main primitives of a relational algebra necessary to keep the database functional, while adding enhanced privacy-preserving properties. A set of cryptographic tools is used to construct each of these primitives. It is composed of order-revealing encryption to enable data selection, homomorphic encryption for evaluation of arbitrary functions, and a standard symmetric scheme to protect and add flexibility to the handling of general data. In particular, our proposed selection primitive achieves time complexity of Θ(log n) on the dataset size. Moreover, we provide a security analysis and performance evaluation to estimate the impact on execution time and space consumption, and a conceptual implementation that validates the framework. It works on top of MongoDB, a popular document-based database, and is implemented as a wrapper over its Python driver. The source code was made available to the community under a GNU GPLv3 license [29].
When compared to CryptDB [7], our proposal provides stronger security since it is able to keep confidentiality even in the case of a compromise of the database and application servers. Since CryptDB delegates to the application server the capability to derive users’ cryptographic keys, it is not able to provide such security guarantees. Furthermore, our work is database-agnostic: it is not limited to SQL but can be applied on different key-value databases.
This work is structured as follows. Section 2 describes the cryptographic building blocks required for building our proposed solution. Sections 3 and 4 define searchable encryption, discuss related threats, and present existing implementations. Section 5 proposes our framework, while Section 6 discusses its suitability in a recommendation system for Netflix. Our experimental validation results are presented in Section 7 and Section 8 concludes the paper.
2 Building blocks
The two main classes of cryptosystems are known as symmetric and asymmetric (or public-key) and are defined by how users exchange cryptographic keys. Symmetric schemes use the same secret key for encryption and decryption, or equivalently keys that can be efficiently computed one from the other, while asymmetric schemes generate a pair of keys composed of a public and a private key. The former is distributed openly and is the sole information needed to encrypt a message to the key owner, while the latter should be kept secret and is used for decryption.
Besides this, cryptosystems that always produce the same ciphertext for the same message-key input pair are known as deterministic. Those that instead use randomness during encryption are known as probabilistic. We next recall basic security notions and special properties that make a cryptosystem suitable for a certain application. Later, we shall make use of these concepts to analyze the security of our proposal.
2.1 Security notions
Ciphertext indistinguishability is a useful property to analyze the security of a cryptosystem. Two scenarios are considered, depending on whether or not the adversary has access to an oracle that provides decryption capabilities. Usually these are evaluated through a game in which an adversary tries to acquire information from ciphertexts generated by a challenger [30].
Indistinguishability under chosen plaintext attack – IND-CPA. In the IND-CPA game the challenger generates a pair (PK, SK) of cryptographic keys, makes PK public and keeps SK secret. The adversary's objective is to recognize a ciphertext created from a message chosen at random from a known two-element message set. A polynomially bounded number of operations is allowed, including encryption (but not decryption), over PK and the ciphertexts. A cryptosystem is indistinguishable under chosen plaintext attack if no adversary is able to achieve the objective with a probability non-negligibly better than random guessing.
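The game can be made concrete with a small simulation. The sketch below is a symmetric-key analogue of the game described above, in which an encryption oracle plays the role of the public key; the two toy ciphers, the message pair, and all function names are assumptions made for illustration only. It shows why a deterministic scheme cannot be IND-CPA: the adversary simply re-encrypts one candidate message and compares.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 used as a pseudo-random function (illustrative choice)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR, truncated to the shorter input."""
    return bytes(x ^ y for x, y in zip(a, b))


def enc_deterministic(key: bytes, msg: bytes) -> bytes:
    # The same (key, msg) pair always yields the same ciphertext.
    return xor(msg, prf(key, b"fixed-pad"))


def enc_probabilistic(key: bytes, msg: bytes) -> bytes:
    # A fresh nonce per call makes repeated encryptions differ.
    nonce = secrets.token_bytes(16)
    return nonce + xor(msg, prf(key, nonce))


def ind_cpa_round(encrypt) -> bool:
    """Play one round; return True if the adversary guesses the hidden bit."""
    key = secrets.token_bytes(32)              # challenger's secret key

    def oracle(m: bytes) -> bytes:             # encryption oracle for the adversary
        return encrypt(key, m)

    m0, m1 = b"attack at dawn!!", b"retreat at dusk!"
    b = secrets.randbelow(2)                   # challenger's hidden bit
    challenge = oracle((m0, m1)[b])
    # Adversary strategy: re-encrypt m0 and compare with the challenge.
    guess = 0 if oracle(m0) == challenge else 1
    return guess == b
```

Against `enc_deterministic` this adversary wins every round, whereas against `enc_probabilistic` the comparison carries no information and the success rate stays close to one half.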
Indistinguishability under chosen ciphertext attack and adaptive chosen ciphertext attack – IND-CCA1 and IND-CCA2. This type of indistinguishability differs from IND-CPA in that the adversary has access to a decryption oracle. In this game the challenge is again to recognize a ciphertext as described before, but now the adversary is able to use decryption results. This new game has two versions, non-adaptive and adaptive. In the non-adaptive version, IND-CCA1, the adversary may use the decryption oracle only until it receives the challenge ciphertext. In the adaptive version, IND-CCA2, the adversary is allowed to use the decryption oracle even after that event. For obvious reasons, the adversary cannot send the challenge ciphertext itself to the decryption oracle. A cryptosystem is indistinguishable under chosen ciphertext attack/adaptive chosen ciphertext attack if no adversary is able to achieve the objective with a probability non-negligibly better than random guessing.
Indistinguishability under chosen keyword attack and adaptive chosen keyword attack – IND-CKA and IND-CKA2. This security notion is specific to the context of keyword-based searchable encryption [31]. It considers a scenario in which a challenger builds an index with keyword sets from some documents. This index enables someone to use a value T_w, called a trapdoor, to verify if a document contains the word w. This game imposes that no information should be leaked from the remotely stored files or index beyond the outcome and the search pattern of the queries. The adversary has access to an oracle that provides the related trapdoor for any word. The adversary's objective is to use this oracle as training and apply the acquired knowledge to break the secrecy of unknown encrypted keywords. As in the IND-CCA1/IND-CCA2 game, the non-adaptive version of this game, IND-CKA, forbids the adversary to use the trapdoor oracle once the challenge trapdoor is sent by the challenger. The adaptive version, IND-CKA2, allows the use of the trapdoor oracle even after this event.
A cryptosystem is indistinguishable under chosen keyword attack if every adversary has only a negligible advantage over random guessing.
Indistinguishability under an ordered chosen plaintext attack – IND-OCPA. Introduced by Boldyreva et al., this notion supposes that an adversary is capable of retrieving two sequences of ciphertexts resulting from the encryption of any two sequences of messages [32]. Furthermore, the adversary knows that both sequences have identical ordering. The objective of this adversary is to distinguish between these ciphertexts. A cryptosystem is indistinguishable under an ordered chosen plaintext attack if no adversary is able to achieve the objective with a probability non-negligibly better than random guessing.
2.2 Functional encryption
Cryptographic schemes are deemed “functional” when they support one or more operations over the produced ciphertexts, hence becoming useful for more than secure storage.
Order-revealing encryption (ORE) Order-revealing encryption schemes are characterized by having, in addition to the usual set of cryptographic functions like keygen and encrypt, a function capable of comparing ciphertexts and returning the order of the original plaintexts, as shown by Definition 1.
Definition 1 (ORE) Let E be an encryption function, C be a comparison function, and m_1 and m_2 be plaintexts from the message space. The pair (E, C) is defined as an encryption scheme with the order-revealing property if:
\[
C(E(m_1), E(m_2)) =
\begin{cases}
\text{lower},   & \text{if } m_1 < m_2, \\
\text{equal},   & \text{if } m_1 = m_2, \\
\text{greater}, & \text{otherwise.}
\end{cases}
\]
This is a generalization of order-preserving encryption (OPE), which fixes C to a simple numerical comparison [33].
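To illustrate how a comparison function C can differ from plain numerical comparison, the following sketch encrypts each bit of the plaintext by adding it, modulo 3, to a PRF value computed over the preceding bits. This is a toy bitwise construction in the spirit of PRF-based ORE proposals from the literature, not necessarily the scheme adopted by the framework. The 32-bit message space, the use of HMAC-SHA256, and all names are assumptions of this example; the sketch is deterministic and reveals the position of the first differing bit. The last function shows how such a comparator supports binary search over ciphertexts kept in plaintext order.

```python
import hashlib
import hmac
from typing import List

N_BITS = 32  # assumed message space: integers in [0, 2**32)


def _prf(key: bytes, index: int, prefix: str) -> int:
    """PRF into {0, 1, 2}, keyed on the bit position and the preceding bits."""
    data = f"{index}|{prefix}".encode()
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % 3


def ore_encrypt(key: bytes, m: int) -> List[int]:
    """Encrypt m bit by bit, most significant bit first."""
    bits = format(m, f"0{N_BITS}b")
    return [(_prf(key, i, bits[:i]) + int(bits[i])) % 3 for i in range(N_BITS)]


def ore_compare(c1: List[int], c2: List[int]) -> str:
    """Return the order of the plaintexts behind c1 and c2 (no key needed)."""
    for u1, u2 in zip(c1, c2):
        if u1 != u2:
            # At the first differing position both PRF values coincide,
            # so the ciphertext symbols differ exactly by the plaintext bits.
            return "lower" if u2 == (u1 + 1) % 3 else "greater"
    return "equal"


def ore_search(sorted_cts: List[List[int]], target: List[int]) -> int:
    """Binary search over ciphertexts stored in plaintext order."""
    lo, hi = 0, len(sorted_cts) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        order = ore_compare(sorted_cts[mid], target)
        if order == "equal":
            return mid
        if order == "lower":
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

Note that `ore_compare` and `ore_search` never touch the key, so they can run on the untrusted side, and the search performs a logarithmic number of comparisons in the number of stored ciphertexts.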
Security As argued by Lewi and Wu, the “best-possible” notion of security for ORE is IND-OCPA, which means that it is possible to achieve indistinguishability of ciphertexts with a much stronger security guarantee than OPE schemes can have [34]. Furthermore, differently from OPE, ORE is not inherently deterministic [35]. For example, Chenette et al. propose an ORE scheme that applies a pseudo-random function over an OPE scheme, while Lewi and Wu propose an ORE scheme completely built upon symmetric primitives, capable of limiting the use of the comparison function and reducing the leakage inherent to this routine [36, 34]. Their solution works by defining ciphertexts composed of pairs (ct_L, ct_R). To compare ciphertexts ct_A and ct_B, it requires ct_{A,L} and ct_{B,R}, that is, the left component of ct_A and the right component of ct_B. This way, the data owner is capable of storing only one side of those pairs in a remote database, being certain that no one will be able to make comparisons between those elements. Nevertheless, any scheme that reveals the numerical order of plaintexts through ciphertexts is vulnerable to inference attacks and frequency analysis, such as those described by Naveed et al. over relational databases encrypted using deterministic and OPE schemes [37]. Although ORE does not completely discard the possibility of such attacks, it offers stronger defenses.
Homomorphic encryption (HE) Homomorphic encryption schemes have the property of conserving some plaintext structure during the encryption process, allowing the evaluation of certain functions over ciphertexts and obtaining, after decryption, a result equivalent to the same computation applied over plaintexts. Definition 2 presents this property in a more formal way.
Definition 2 (HE) Let E and D be a pair of encryption and decryption functions, and m_1 and m_2 be plaintexts. The pair (E, D) forms an encryption scheme with the homomorphic property for some operator ⋆ if and only if the following holds:
\[
E(m_1) \circ E(m_2) \equiv E(m_1 \star m_2).
\]
The operation ◦ in the ciphertext domain is equivalent to ⋆ in the plaintext domain.
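As a concrete instance of this definition, the sketch below implements a toy version of the Paillier cryptosystem, which is additively homomorphic: the operation on ciphertexts is multiplication modulo n^2 and the corresponding operation on plaintexts is addition modulo n. The primes are tiny and hard-coded, so this is only an illustration and offers no security; the function names are chosen here for exposition and are not taken from the framework's code.

```python
import math
import secrets


def keygen(p: int = 1009, q: int = 1013):
    """Toy key generation from two small, fixed primes (insecure on purpose)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    # mu is the inverse of L(g^lam mod n^2) modulo n, with L(x) = (x - 1) // n.
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pk, m: int) -> int:
    """Probabilistic encryption: c = g^m * r^n mod n^2 for a random unit r."""
    n, g = pk
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pk, sk, c: int) -> int:
    """Recover m = L(c^lam mod n^2) * mu mod n."""
    n, _ = pk
    lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add(pk, c1: int, c2: int) -> int:
    """Homomorphic addition: multiply ciphertexts modulo n^2."""
    n, _ = pk
    return (c1 * c2) % (n * n)
```

For example, with `pk, sk = keygen()`, decrypting `add(pk, encrypt(pk, 20), encrypt(pk, 22))` yields the sum of the two plaintexts, although the party that performed the addition never saw either of them.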
Homomorphic cryptosystems are classified according to the supported operations and their limitations. Partially homomorphic encryption schemes (PHE) satisfy Definition 2 for either addition or multiplication operations, while fully homomorphic encryption schemes (FHE) support both addition and multiplication operations.
PHE cryptosystems have been known for decades [38, 39]. However, the most common data processing applications, such as those arising from statistics, machine learning or genomics processing, frequently require support for both addition and multiplication operations simultaneously. Hence, PHE schemes are not suitable for general computation.
Currently, FHE performance is prohibitive, so weaker variants, such as SHE [1] and LHE [2], take the stage for solving computational problems of moderate complexity [40, 41].
Security In terms of security, homomorphic encryption schemes achieve at most IND-CCA1, which means that the scheme is not secure against an attacker with arbitrary access to a decryption oracle [30]. This is a natural consequence of the design requirements, since these cryptosystems allow any entity to manipulate ciphertexts. Most current proposals, however, reach at most IND-CPA and stay secure against attackers without access to a decryption oracle [42].
[1]SHE stands for “Somewhat homomorphic encryption”.
[2]LHE stands for “Leveled fully homomorphic encryption”.
3 Searchable encryption
We now formally define the problem of searching over encrypted data. We present three state-of-the-art implementations of solutions to this problem, namely the CryptDB, Arx, and Seabed database systems.
3.1 The problem
Suppose a scenario where Alice keeps a set of documents in untrusted storage maintained by an also untrusted entity Bob. She would like to keep this data encrypted because, as defined, Bob cannot be trusted. Alice also would like to occasionally retrieve a subset of documents according to a predicate without revealing any sensitive information to Bob. Thus, sharing the decryption key is not an option. The problem lies in the fact that communication between Alice and Bob may (and probably will) be constrained. Hence, a naive solution consisting of Bob sending all documents to Alice and letting her decrypt and select whatever she wants may not be feasible. Alice must then implement some mechanism to protect her encrypted data so that Bob will be able to identify the desired documents without knowing their contents or the selection criteria [43].
An approach that Alice can take is to create an encrypted index as in Definition 3.
Definition 3 (Encrypted indexing) Suppose a dataset DB = (m_1, ..., m_n) and a list W = (W_1, ..., W_n) of sets of keywords such that W_i contains keywords for m_i. The following routines are required to build and search on an encrypted index:
  BuildIndex_K(DB, W): The list W is encrypted using a searchable scheme under a key K and results in a searchable encrypted index I. This process may not be reversible (e.g., if a hash function is used). The routine outputs I.
  Trapdoor_K(F): This function receives a predicate F and outputs a trapdoor T. The latter is defined as the information needed to search I and find records that satisfy F.
  Search_I(T): It iterates through I applying the trapdoor T and outputs every record that returns True for the input trapdoor.
This way, if the searchable cryptosystem used is IND-CKA, then Alice is able to keep her data with Bob and remain capable of selecting subsets of it without revealing information [28].
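The three routines of the encrypted index can be sketched as follows. This is a minimal illustration under strong assumptions: the predicate F is restricted to "the record contains keyword w", keywords are tagged with HMAC-SHA256, and the function names merely mirror the routines of the definition. The sketch makes no attempt to meet IND-CKA; in particular, equal keywords in different records produce equal tags, which leaks co-occurrence to the server.

```python
import hashlib
import hmac
from typing import List, Sequence, Set


def _tag(key: bytes, keyword: str) -> bytes:
    """Keyed, one-way tag for a keyword (not reversible without brute force)."""
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()


def build_index(key: bytes, db: Sequence[object],
                keywords: Sequence[Set[str]]) -> List[Set[bytes]]:
    """BuildIndex_K(DB, W): one set of keyword tags per record of the dataset."""
    if len(db) != len(keywords):
        raise ValueError("each record needs exactly one keyword set")
    return [{_tag(key, w) for w in ws} for ws in keywords]


def trapdoor(key: bytes, word: str) -> bytes:
    """Trapdoor_K(F) for the predicate F = 'record contains word'."""
    return _tag(key, word)


def search(index: List[Set[bytes]], t: bytes) -> List[int]:
    """Search_I(T): positions of the records whose tag set matches the trapdoor."""
    return [i for i, tags in enumerate(index) if t in tags]
```

In this arrangement Alice runs `build_index` and `trapdoor` with her key, while Bob stores the index and runs `search`, seeing only opaque tags and the positions that match.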
3.2 Threat modeling
The development of efficient and secure solutions for management of datasets depends on the awareness of the threats we intend to mitigate. To this end, this work follows Grubbs’ definitions of adversaries for a database [44].
Active attacker. The worst case scenario is when the
attacker acquires full control over the server, being ca-
pable of performing arbitrary operations. Thus, he is
not committed to follow any protocol.
Snapshot attacker. The adversary obtains a snapshot
of the dataset containing the primary data and indexes
but no information about issued queries and how they
access the encrypted data.
Persistent passive attacker. Another possibility is a scenario in which the attacker cannot interfere with the server functionality but can observe all of its operations. We consider not only attackers that inspect issued queries in real time, but also those that are able to recover them later. As demonstrated by Grubbs, the data contained in a real-world database goes far beyond the primary dataset (names, addresses, …). It also includes logs, caches, and auxiliary tables (such as MySQL’s diagnostic tables) used, for instance, to guarantee ACID [3] and enable the server to undo incomplete queries after a power failure. It is very likely that an attacker competent enough to subvert the security protocols of the system will also be capable of recovering these secondary datasets.
The idea of a snapshot attacker is very popular among solutions and researchers intending to develop encrypted databases. Nevertheless, it underestimates the attacker and the many side attacks a motivated adversary can execute. As Rogaway remarks, we cannot make the mistake of reducing the adversary to the lazy and abstract Bob, but must remember that it can go far beyond that and take the form of a military-industrial-surveillance program with a billion-dollar budget and the capability to surpass the obvious [45].
4 Related work
The management of a dataset is done by a database management system (DBMS). It is composed of several layers responsible for coordinating read and write operations, guaranteeing data consistency and integrity, and controlling user access. The engineering of such a system is a complex task and requires smart optimizations to be able to store data, process queries and return the outcome with minimum latency and good scalability. For this reason, searchable encryption solutions are usually implemented not inside but on top of these systems, as a middleware that translates queries into encrypted queries for the DBMS without revealing plaintext data and decrypts the outcome, as shown in Figure 1. This strategy carries over to encrypted data the decades of optimizations incorporated into today's DBMSs.
It is important to state that, ideally, security features should be designed in conjunction with the underlying database. Long-term solutions are expected to assimilate those strategies internally in the DBMS core.
[3]Relative to a set of desirable properties for a database. Acronym for “Atomicity, Consistency, Isolation, Durability”.
Figure 1 Sequence diagram representing the process of
generating and processing an encrypted query. The proxy is
positioned between the user and the DBMS in a trusted
environment. Its responsibility is to receive a plaintext query,
apply an encryption protocol, submit the encrypted query to
the DBMS, and decrypt the outcome.
4.1 CryptDB
CryptDB is a software layer that provides capabilities to store data in a remote SQL database and query over it without revealing sensitive information to the DBMS. It introduces a proxy layer responsible for encrypting and adjusting queries to the database and decrypting the outcome [7].
The context in which CryptDB stands is a typical structure of database-backed applications, consisting of a DBMS server and a separate application server. To query a database, a predicate is generated by the application and processed by the proxy before it is sent to the DBMS server. The user interacts exclusively with the application server and is responsible for keeping his password secret. The password is provided on login to the proxy (via the application), which derives all the cryptographic keys required to interact with the database. When the user logs out, it is expected that the proxy deletes its keys.
Data encryption is done through the concept of “onions”, which consist of layers of encryption that are combined to provide different functionalities, as shown in Figure 2. Such layers are revealed as necessary to process the queries being performed. Modeling a database involves evaluating the meaning of each attribute and predicting the operations it must support. In particular, keyword-searching as described in Definition 3 is implemented as proposed in Song’s work [43]. The performance overhead over MySQL measured by the authors is up to 30%.
Figure 2 Representation of the data format used by CryptDB.
The current value to be protected lies in the center, and a new
encryption layer is overlapped to it according to the need of a
particular functionality.
Two types of threats are treated in CryptDB: curious database administrators who try to snoop and acquire information about client’s data but respect the established protocols (a persistent passive attacker); and an adversary that gains complete control of application and DBMS servers (an active attacker). The authors state that the first threat is mitigated through the encryption of stored data and the ability to query it without any decryption or knowledge about its content, while the second affects only logged-in clients. In the considered scenario, the cryptographic keys relative to data in the database are handled by the application server. Thus, if the application server is compromised, all the keys it possesses at that moment (that are expected to be only from logged-in users) are leaked to the attacker. Such arguments were revisited after works by Naveed and Grubbs et al. demonstrated how to exploit several weaknesses of the construction, such as the application of OPE [46, 47].
4.2 Arx
Arx is a database system implemented on top of MongoDB [8]. It targets much stronger security properties and claims to protect the database with the same level of security as regular AES-based encryption [4], achieving IND-CPA security. This is a direct consequence of the almost exclusive use of AES to construct selection operators, even on range queries, which brings not only strong security but also good performance due to the efficiency of symmetric primitives, sometimes even benefiting from hardware implementations. The authors report a performance overhead of approximately 10% when used to replace the database of ShareLatex. The building blocks used for searching follow those described in Definition 3. Furthermore, they apply a different AES key for each keyword when generating the trapdoor, requiring the client to store counters, as explained in the next paragraph.
[4]The Advanced Encryption Standard (AES) is a well-established symmetric block cipher enabling high performance implementation in hardware and software [48].
At its core, Arx introduces two database indexes, Arx-Range for range and order-by-limit queries and Arx-EQ for equality queries, both built on top of AES and using chained garbled circuits. The former uses an obfuscation strategy to protect data, while enabling searches in logarithmic time. The latter embeds a counter into each repeating value. This ensures that the encryptions of repeated values are different, protecting them against frequency analysis. Using a token provided by the client, the database is able to expand it into many search tokens and return all the desired occurrences, allowing an index to be built over encrypted data.
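The counter idea behind equality search can be sketched as follows. This is a simplified reading of the description above and not Arx's actual construction: HMAC-SHA256 stands in for the AES-based primitives, the index is a plain dictionary, and the class and function names are hypothetical. Each occurrence of a value is encrypted under a per-value token combined with a counter kept by the client, so repeated values yield distinct index entries; to search, the client reveals the token and the counter, and the server expands them into one search token per occurrence.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def _prf(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 as a stand-in pseudo-random function."""
    return hmac.new(key, data, hashlib.sha256).digest()


class Client:
    """Holds the secret key and one counter per distinct value (client state)."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.counters: Dict[str, int] = {}

    def encrypt_value(self, value: str) -> bytes:
        """Index entry for the next occurrence of value."""
        token = _prf(self.key, value.encode())
        ctr = self.counters.get(value, 0)
        self.counters[value] = ctr + 1
        return _prf(token, ctr.to_bytes(8, "big"))

    def search_token(self, value: str) -> Tuple[bytes, int]:
        """Per-value token plus the number of occurrences seen so far."""
        return _prf(self.key, value.encode()), self.counters.get(value, 0)


def server_search(index: Dict[bytes, int], token: bytes, count: int) -> List[int]:
    """Expand the token into count search tokens and collect matching record ids."""
    matches = []
    for i in range(count):
        entry = _prf(token, i.to_bytes(8, "big"))
        if entry in index:
            matches.append(index[entry])
    return matches
```

Before any query is issued, the server sees only unrelated-looking entries, even for repeated values; once a token is revealed, it learns which entries belong to that single value and nothing about the others.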
The context in which Arx stands is similar to CryptDB. However, the authors consider the data owner as the application itself. This way, it simplifies the security measures and considers the responsibility to keep the application server secure outside of its scope.
4.3 Seabed
Seabed was developed by Papadimitriou et al. and aims at Business Intelligence (BI) applications interested in keeping data secure on the cloud [49]. Like CryptDB and Arx, Seabed was built as a layer on top of an existing system, consisting of a client-side query translator (to SQL), a query planner, and a proxy that connects to an Apache Spark instance [50]. Its main foundations are two new cryptographic constructions, additively symmetric homomorphic encryption (ASHE) and Splayed ASHE (SPLASHE). The former is used to replace Paillier as the additively homomorphic encryption scheme, with the authors stating that their construction is up to three orders of magnitude faster. The latter is used to protect the database against inference attacks [37].
SPLASHE works by splitting sensitive data into multiple
attributes, obscuring the low entropy of deterministic
encryption. Formally, let $C$ be a sensitive attribute of a
dataset that can be filled with $d$ possible discrete
values. The approach taken by SPLASHE is to replace this
attribute in the encrypted database by
$\{C_1, C_2, \dots, C_d\}$ such that $C_v = 1$ and
$C_t = 0$ for $t \neq v$ if $C = v$. When encrypted by
ASHE, the ciphertexts will look random to the adversary.
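The splaying step can be sketched as follows. This is only an illustration of the indicator-column encoding described above, on plaintext values; the function names `splay` and `unsplay` are hypothetical, and the ASHE encryption that Seabed would apply to each column is deliberately left out.

```python
from typing import Dict, Hashable, Sequence


def splay(value: Hashable, domain: Sequence[Hashable]) -> Dict[str, int]:
    """Replace one attribute C by d indicator columns C_1..C_d.

    Exactly one column is 1 (the one matching `value`); all others are 0.
    In Seabed each column would then be encrypted with ASHE (not shown).
    """
    if value not in domain:
        raise ValueError("value outside the attribute domain")
    return {f"C_{i + 1}": int(candidate == value)
            for i, candidate in enumerate(domain)}


def unsplay(columns: Dict[str, int], domain: Sequence[Hashable]) -> Hashable:
    """Recover the original value from decrypted indicator columns."""
    for i, candidate in enumerate(domain):
        if columns[f"C_{i + 1}"] == 1:
            return candidate
    raise ValueError("no indicator column is set")


countries = ["AR", "BR", "CL"]
row = splay("BR", countries)
```

Summing a column over a set of records counts the occurrences of the corresponding value, which is why an additively homomorphic scheme is sufficient for this encoding.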
Seabed’s authors argue that SPLASHE is strong enough to
mitigate frequency analysis, enabling the use of
deterministic encryption whenever it is required in the
database model. However, Grubbs et al. state that
SPLASHE’s protection may be circumvented through the
auxiliary data stored by the database [44]. Their work
demonstrates how state-of-the-art databases store
metadata that can be used to reconstruct issued queries
and thereby recognize access patterns on the attributes.
Such patterns leak the information that SPLASHE intended
to hide. Considering this, the only adversary against whom
SPLASHE really protects the deterministic encryption of
Seabed is a snapshot attacker.
5 Proposed framework
The goal of the proposed framework is to develop a
database model capable of storing encrypted records and
applying relational algebra primitives to them without
knowledge of any cryptographic keys or the need for
decryption. A trade-off between performance and security
is desirable; however, we avoid deterministic encryption
whenever possible for security reasons. The only
exceptions are contexts with unique records, which by
definition avoid the weaknesses intrinsic to deterministic
encryption. The applicability of this framework goes
beyond SQL databases. Besides the relational algebra used
here to describe the framework, it can be extended to
key-value, document-oriented, full-text and several other
database classes that keep the same attribute structure.
The three main operations needed to build a use-
ful database are insertion, selection and update. Once
data is loaded, being able to select only those pieces
that correspond to an arbitrary predicate is the basic
block to construct more complex operations, such as
grouping and equality joins. This functionality is fun-
damental when there is a physical separation between
the database and the data owner, otherwise high de-
mand for bandwidth is incurred to transmit large frac-
tions of the database records. Furthermore, real data
is frequently mutable and thus the database must sup-
port updates to remain useful.
We define as secure a system model that guarantees that
the data owner is the only entity capable of revealing
data, which can be achieved by his exclusive possession of
the cryptographic keys. Thus, a fundamental aspect of our
proposal is the scenario in which the database and the
application server handle data with minimum knowledge.
Lastly, the framework does not ensure integrity,
freshness or completeness of results returned to the
application or the user, since an adversary that com-
promises the database in some way can delete or alter
records. We consider this threat to be outside the scope
of this framework.
5.1 Classes of attributes
Records in an encrypted database are composed of
attributes. These consist of a name and a value, which
can be an integer, float, string or even a binary blob.
Values of attributes are classified according to their
purpose:
static: An immutable value only used for storage. It is
  not expected to be evaluated with any function, so there
  is no special requirement for its encryption.
index: Used for building a single or multivalued
  searchable index. It should enable one to verify whether
  an arbitrary term is contained in a set without the need
  to acquire knowledge of its content.
computable: A mutable value. It supports evaluation with
  arithmetic circuits and ensures that the result obtained
  after decryption is equivalent to the same circuit
  applied over plaintexts.
The implementation of each attribute must satisfy these
requirements without leaking any vital information beyond
what relates directly to the attribute’s objective (e.g.,
order for index attributes). Since the name of an attribute
reveals information, it may need to be protected as well.
However, an attribute is referenced by its name, so even
anonymous attributes must be traceable in a query. An
option for anonymizing the attribute name is to treat it as
an index.
The aforementioned cryptosystems are natural suggestions
for these classes. Since static is a class for storage
only, with no other requirements, any scheme with an
appropriate security level and performance may be used,
such as AES. On the other hand, index and computable
attributes are immediate applications of ORE and HE
schemes. In particular, the operations required of a
computable attribute determine the HE scheme. Attributes
that require only one operation can be implemented with a
PHE scheme, which provides good performance, while those
that require arbitrary additions and multiplications must
use FHE and deal with the performance issues.
Definition 4 (Secure ORE) Let $E$ and $C$ be,
respectively, an encryption and a comparison function.
The pair $(E, C)$ forms an encryption scheme with the
order-revealing property defined as “secure” if and only
if it satisfies Definition 1; the encryption of a message
$m$ can be written as
$E(m) = (c_L, c_R) = (E_L(m), E_R(m))$, where $E_L$ and
$E_R$ are complementary encryption functions; and the
comparison between two ciphertexts $c_1$ and $c_2$ is
done by $C(c_{L_1}, c_{R_2})$. This way, $C$ may be
applied without the complete knowledge of the ciphertexts.
In order to build a secure and efficient index, an ORE
scheme that satisfies Definition 4 should be used. We
define the search framework as in Definition 5.
Definition 5 (Encrypted search framework) Let $S$ be a
set of words, $sk$ a secret key, and (Enc, Cmp) an ORE
scheme that satisfies Definition 4. The operations
required for an encrypted search over $S$ are defined as
follows:
$\mathrm{BuildIndex}_{sk}(S)$: Outputs the set
  $S^* = \{c_R \mid (c_L, c_R) = \mathrm{Enc}_{sk}(w), \forall w \in S\}$.
$\mathrm{Trapdoor}_{sk}(w)$: Outputs the trapdoor
  $T_w = (c_L \mid (c_L, c_R) = \mathrm{Enc}_{sk}(w))$.
$\mathrm{Search}_{S^*, r}(T_w)$: To select all records in
  $S^*$ with the relation
  $r \in \{\text{lower}, \text{equal}, \text{greater}\}$ to
  word $w$, one computes the trapdoor $T_w$ and iterates
  through $S^*$ looking for the records $w^* \in S^*$ that
  satisfy
  $\mathrm{Cmp}(T_w, w^*) = r$.
  The set $\hat{S}$ with all the elements in $S^*$ that
  satisfy this equation is returned.
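To make Definition 5 concrete, the sketch below instantiates the three operations over a toy left/right ORE for a very small integer domain. It follows the general idea of a small-domain left/right construction but is not the scheme or code used by the authors: the domain size, the helper names and the keyed shuffle (which is not a cryptographic permutation) are assumptions made for illustration, and the code must not be used to protect real data. Here `cmp` returns the relation of the query word to the stored word.

```python
import hashlib
import hmac
import os
import random
from typing import List, Tuple

DOMAIN = 32                      # toy plaintext domain: integers 0..31
EQUAL, LOWER, GREATER = 0, 1, 2  # relation of the query word to the stored word

Left = Tuple[bytes, int]         # c_L: (slot key, permuted slot)
Right = Tuple[bytes, List[int]]  # c_R: (nonce, masked comparison table)


def _prf(key: bytes, value: int) -> bytes:
    return hmac.new(key, value.to_bytes(4, "big"), hashlib.sha256).digest()


def _mask(key: bytes, nonce: bytes) -> int:
    digest = hmac.new(key, nonce, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % 3


def _permutation(sk: bytes) -> List[int]:
    perm = list(range(DOMAIN))
    random.Random(sk).shuffle(perm)  # keyed shuffle; NOT a cryptographic PRP
    return perm


def _relation(x: int, y: int) -> int:
    return EQUAL if x == y else (LOWER if x < y else GREATER)


def enc(sk: bytes, m: int) -> Tuple[Left, Right]:
    perm = _permutation(sk)
    inverse = [0] * DOMAIN
    for value, slot in enumerate(perm):
        inverse[slot] = value
    left = (_prf(sk, perm[m]), perm[m])
    nonce = os.urandom(16)           # fresh randomness for every right side
    table = [(_relation(inverse[i], m) + _mask(_prf(sk, i), nonce)) % 3
             for i in range(DOMAIN)]
    return left, (nonce, table)


def cmp(left: Left, right: Right) -> int:
    key, slot = left
    nonce, table = right
    return (table[slot] - _mask(key, nonce)) % 3


def build_index(sk: bytes, words: List[int]) -> List[Right]:
    return [enc(sk, w)[1] for w in words]   # only right sides are stored


def trapdoor(sk: bytes, w: int) -> Left:
    return enc(sk, w)[0]                    # only the left side is sent


def search(index: List[Right], r: int, t_w: Left) -> List[int]:
    return [i for i, c_r in enumerate(index) if cmp(t_w, c_r) == r]


sk = b"k" * 16
index = build_index(sk, [3, 7, 7, 12])
equal_to_7 = search(index, EQUAL, trapdoor(sk, 7))
```

The stored right sides of the two records holding 7 differ because each uses a fresh nonce, yet both match the same trapdoor. The linear scan in `search` mirrors the definition; a tree-based index avoids it.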
5.2 Database operations
Let us consider a model composed of an encrypted dataset
stored on a remote server and a user that possesses the
secret cryptographic keys. The latter would like to
perform queries on the data without revealing sensitive
information to the server, as defined in Section 3.1.
In 1970, Codd proposed the use of a relational algebra as
a model for databases, on which SQL was later built [51].
This consists of a small set of operators that can be
combined to execute complex queries over the data.
Through the functions defined in Definition 5, a
relational algebra for encrypted database operations can
be built. The basic operators for such an algebra are
defined as follows.
1. Projection ($\pi_A$): Returns a subset $A$ of
   attributes from selected records. This subset may be
   defined by attribute names that may or may not be
   encrypted.
   (a) encrypted: If encrypted, a deterministic scheme is
       used or they are treated as index values.
       deterministic scheme: The user computes
       $A^* = \{\mathrm{EncDet}(a) \mid a \in A\}$. $A^*$ is
       sent to the server, which picks the projected
       attributes using a standard algorithm.
       index attributes: The user computes
       $A^* = \{\mathrm{Trapdoor}_{sk}(a) \mid a \in A\}$.
       $A^*$ is sent to the server, which picks the
       projected attributes using Search.
   (b) unencrypted: Unencrypted selectors are sent to and
       selected by the server using a standard algorithm.
2. Selection ($\sigma_\varphi$): Given a predicate
   $\varphi$, returns only the records satisfying it.
   • Handles exclusively index attributes, hence $\varphi$
     must be equivalent to a combination of comparative
     operators supported by Search.
   • Let $w \diamond x \leftarrow \varphi$, where
     $\diamond$ is a compatible comparative operator, $w$
     an index attribute, and $x$ the operand to be compared
     (e.g., $\sigma_{age > 30}$ selects the records whose
     attribute named “age” has a value greater than 30).
     The trapdoor
     $T_\varphi = \mathrm{Trapdoor}_{sk}(\varphi)$ is sent
     to the server, which executes Search.
3. Cartesian product ($\times$): The Cartesian product of
   two datasets is executed using a standard algorithm.
4. Difference ($-$): The difference between two datasets
   $A$ and $B$ encrypted with the same keys is defined as
   $A - B = \{x \mid x \in A \text{ and } x \notin B\}$.
5. Union ($\cup$): The union of two datasets $A$ and $B$
   encrypted with the same keys is defined as
   $A \cup B = \{x \mid x \in A \text{ or } x \in B\}$.
Union and difference are defined over datasets with
the same set of attributes. The opposite is expected for
Cartesian product, so that no attribute may be shared
between operands.
Ramakrishnan calls these “basic operators” in the sense
that they are essential and sufficient to execute
relational operations [52]. Additional useful operators
can be built over those, for instance rename, join-like
operators, and division. The same observation applies in
the encrypted domain, and complex operators can be
constructed given basic operators defined over the
encrypted domain.
6. Rename ($\rho_{a,b}$): Renames attributes. Their names
   may or may not be encrypted.
   (a) encrypted: Encryption shall be executed using a
       deterministic cryptosystem, or names shall be
       treated as index values.
       deterministic scheme: Let $a$ be an attribute name
       to be replaced by $b$. The user computes
       $a^* \leftarrow \mathrm{EncDet}(a)$ and
       $b^* \leftarrow \mathrm{EncDet}(b)$, and sends the
       output to the server, which applies a standard
       replacement algorithm.
       index attributes: Let $a$ be an attribute name to be
       replaced by $b$. The user computes
       $a^* \leftarrow \mathrm{Trapdoor}(a)$ and
       $b^* \leftarrow c_R \mid (c_L, c_R) = \mathrm{Enc}_{index}(b)$,
       and sends the output to the server, which selects
       the attributes related to $a^*$ as equal through the
       operation Search and renames the result to $b^*$.
   (b) unencrypted: Unencrypted attribute names may be
       renamed by the server using a standard algorithm.
7. Natural join ($\bowtie$): Let $A$ and $B$ be datasets
   with a common subset of attributes. The natural join
   between $A$ and $B$ is defined as the selection of all
   elements that lie in $A$ and $B$ and match all the
   values in those attributes. More formally, let
   $c_1, c_2, \dots, c_n$ be attributes common to $A$ and
   $B$; $x_1, x_2, \dots, x_n$ be attributes not contained
   in $A$ or in $B$; $a_1, a_2, \dots, a_m$ be attributes
   unique to $A$; $b_1, b_2, \dots, b_k$ be attributes
   unique to $B$; and $K = \mathbb{N}^*_{n+1}$. We have
   that
   $A \bowtie B \equiv \sigma_{c_i = x_i}(\rho_{(c_i, x_i)}(A) \times B), \forall i \in K$.
8. Equi-join ($\bowtie_\theta$): Let $A$ and $B$ be
   datasets. The equi-join between $A$ and $B$ is defined
   as the selection of all elements that lie in $A$ and $B$
   and satisfy a condition $\theta$. More formally,
   $A \bowtie_\theta B = \sigma_\theta(A \times B)$.
9. Division ($/$): Let $A$ and $B$ be datasets and $C$ the
   subset of attributes unique to $A$. The division
   operator joins the operands by common attributes but
   projects only those unique to the dividend. Hence,
   $A / B = \pi_C(A \bowtie B)$.
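The composition of derived operators from basic ones can be sketched on plaintext records. The code below is a minimal illustration, assuming datasets are lists of dictionaries and that all records in a dataset share the same attribute names; the helper names are hypothetical. It follows the definitions of natural join and division given in this section (rename, Cartesian product, selection, projection), not any particular DBMS implementation.

```python
from itertools import product
from typing import Callable, Dict, List

Record = Dict[str, object]
Dataset = List[Record]


def select(pred: Callable[[Record], bool], data: Dataset) -> Dataset:
    return [r for r in data if pred(r)]


def project(attrs: List[str], data: Dataset) -> Dataset:
    out: Dataset = []
    for r in data:
        row = {a: r[a] for a in attrs}
        if row not in out:               # set semantics: drop duplicates
            out.append(row)
    return out


def rename(old: str, new: str, data: Dataset) -> Dataset:
    return [{(new if k == old else k): v for k, v in r.items()} for r in data]


def cartesian(a: Dataset, b: Dataset) -> Dataset:
    return [{**ra, **rb} for ra, rb in product(a, b)]


def natural_join(a: Dataset, b: Dataset) -> Dataset:
    if not a or not b:
        return []
    common = [k for k in a[0] if k in b[0]]
    temp = {c: f"__{c}" for c in common}    # x_i: names in neither dataset
    renamed = a
    for c, x in temp.items():
        renamed = rename(c, x, renamed)
    matched = select(lambda r: all(r[c] == r[x] for c, x in temp.items()),
                     cartesian(renamed, b))
    keep = [k for k in matched[0] if k not in temp.values()] if matched else []
    return project(keep, matched)


def divide(a: Dataset, b: Dataset) -> Dataset:
    if not a or not b:
        return []
    unique_to_a = [k for k in a[0] if k not in b[0]]
    return project(unique_to_a, natural_join(a, b))


users = [{"uid": 1, "name": "alice"}, {"uid": 2, "name": "bob"}]
orders = [{"uid": 1, "item": "book"}, {"uid": 1, "item": "pen"}]
joined = natural_join(users, orders)
```

In the encrypted setting, the equality test inside `select` would be replaced by Search with an equality trapdoor, while the remaining operators manipulate ciphertexts as opaque values.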
Finally, it is important to define data insertion and
update, although these cannot be properly defined as
relational operators.
• Insert: Encrypted data is provided and inserted into
  the database using a standard algorithm.
• Update: An update operation is defined as a selection
  followed by the evaluation of a computable attribute by
  a supported homomorphic operation.
This set of operators enables operating over an encrypted
database without knowledge of the cryptographic keys and
without acquiring sensitive information from user queries.
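An update over a computable attribute can be sketched with a toy additively homomorphic scheme. The code below uses textbook Paillier with deliberately tiny primes, so it offers no security and only illustrates the data flow. The `update` helper and its plaintext predicate are assumptions standing in for a selection performed with Search.

```python
import math
import secrets
from typing import Callable, Dict, List

P, Q = 10007, 10009                  # toy primes: far too small for real use
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                 # valid because the generator is g = N + 1


def encrypt(m: int) -> int:
    """Client side: c = g^m * r^N mod N^2 with a random r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Client side: m = L(c^lambda mod N^2) * mu mod N, L(x) = (x - 1) / N."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add(c1: int, c2: int) -> int:
    """Server side: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % N2


def update(collection: List[Dict], matches: Callable[[Dict], bool],
           attribute: str, delta_ct: int) -> None:
    """Selection followed by a homomorphic addition on each selected record."""
    for record in collection:
        if matches(record):
            record[attribute] = add(record[attribute], delta_ct)


db = [{"id": 1, "views": encrypt(5)}, {"id": 2, "views": encrypt(9)}]
update(db, lambda rec: rec["id"] == 1, "views", encrypt(3))
```

The server never sees 5, 3 or 8; it only multiplies ciphertexts. The data owner obtains 8 after decrypting the updated attribute.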
5.3 Security analysis
We assume the scenario in which the data owner has
exclusive possession of the cryptographic keys. This way,
data must be encrypted locally before being sent to the
server for insertion. Neither the database nor the
application ever deals with plaintext data. Our framework
thus has the advantage over CryptDB of preserving privacy
even in the event of a compromised database or
application server.
Despite being conceptually similar to OPE, ORE is able to
address several of its security limitations. ORE does not
necessarily generate ciphertexts that reveal their order by
design, but allows someone to protect this information by
only revealing it through specific functions. ORE is able
to achieve the IND-OCPA security notion and adds
randomization to ciphertexts. Those characteristics make
it much safer against inference attacks [37]. The proposal
of Lewi and Wu goes even beyond that and is capable of
limiting the use of the comparison function [34]. Their
scheme generates a ciphertext that can be decomposed into
left and right components such that a comparison between
two ciphertexts requires only the left component of one
ciphertext and the right component of the other. This way,
the authors argue that robustness against such attacks is
ensured since the database dump may contain only the right
component, which is encrypted using semantically secure
encryption. Their scheme satisfies our notion of a Secure
ORE and, therefore, provides strong defenses against
snapshot attackers.
An eavesdropper (active or persistent passive attacker)
is not capable of executing comparisons by himself in a
Secure ORE. However, he may learn the results of these
and recognize repeated queries by observing the outcome of
a selection. This weakness may still be used for inference
attacks, which can breach the confidentiality of related
attributes. This issue gets worse if the trapdoor is
deterministic, in which case there is no solution other
than implementing a key refreshment algorithm. Besides
that, knowledge of the numerical order between every pair
of elements in a sequence may leak information, depending
on the application. This problem manifests itself in our
proposal on the $\sigma$ primitive if it uses a weak index
structure, like a naive sequential index. A
balanced-tree-based structure, on the other hand, obscures
the numerical order of elements in different branches.
This way, in a database with $n$ elements, an attacker is
capable of recovering the order of up to $O(\log n)$
database elements and can only make inferences about the
others.
Schemes used with computable attributes are lim-
ited to IND-CCA1, and typically reach only IND-CPA.
Moreover, homomorphic ciphertexts are malleable by
design. Thus, an attacker that acquires knowledge
about a ciphertext can use it to predictably manip-
ulate others.
Finally, BuildIndex is not able to hide the quantity of
records that share the same index. This way, one is able to
make inferences about those by the number of records. There
is also no built-in protection for the number of entries in
the database. A workaround is to fix the size of each static
attribute value and round the quantity of records in the
database using padding. This approach increases secrecy but
also the storage overhead.
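The padding workaround can be sketched as follows. The length-prefixed encoding, the function names and the bucket size are assumptions chosen for illustration; values are padded before encryption so that every static ciphertext has the same length, and dummy records round the collection size up to a multiple of a bucket.

```python
import struct


def pad_value(value: bytes, size: int) -> bytes:
    """Pad a plaintext to exactly `size` bytes (2-byte length prefix + zeros)."""
    if size > 65537 or len(value) > size - 2:
        raise ValueError("value does not fit in the fixed size")
    filler = b"\x00" * (size - 2 - len(value))
    return struct.pack(">H", len(value)) + value + filler


def unpad_value(padded: bytes) -> bytes:
    """Recover the original plaintext after decryption."""
    (length,) = struct.unpack(">H", padded[:2])
    return padded[2:2 + length]


def dummies_needed(n_records: int, bucket: int) -> int:
    """Number of dummy records that rounds the count up to a bucket multiple."""
    return -n_records % bucket


padded = pad_value(b"hi", 16)
```

Larger fixed sizes and buckets hide more but cost proportionally more storage, which is the trade-off noted above.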
5.4 Performance analysis
The application of ORE as the main approach to build a
database index provides an extremely important
contribution to selection queries. Search does not require
walking through all the records testing a trapdoor, but
only a logarithmic subset of them when implemented over an
optimal index structure, such as an AVL tree or a
B-tree-based structure [53]. This characteristic is
highlighted in union, intersection and difference
operations, which work by comparing and selecting elements
in different groups. Moreover, current state-of-the-art
ORE proposals enjoy the good performance provided by
symmetric primitives and do not require more expensive
approaches such as public-key cryptography [36, 34, 33].
In particular, although fully homomorphic cryptosystems
promise to fulfill this task and progress has been made
with new cryptographic constructions [54], they are still
prohibitively expensive for real-world deployments [55].
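The logarithmic cost can be illustrated by counting comparisons. The sketch below uses a sorted list and binary search as a stand-in for a balanced tree, and a plaintext comparator as a stand-in for the ORE comparison function; both substitutions, and all names, are assumptions made to keep the example self-contained.

```python
from typing import Callable, Dict, List, Tuple


def make_counting_cmp() -> Tuple[Callable[[int, int], int], Dict[str, int]]:
    """Stand-in for Cmp(T_w, w*): returns -1, 0 or 1 and counts its calls."""
    calls = {"n": 0}

    def cmp(query: int, stored: int) -> int:
        calls["n"] += 1
        return (query > stored) - (query < stored)

    return cmp, calls


def lower_bound(index: List[int], query: int,
                cmp: Callable[[int, int], int]) -> int:
    """Position of the first stored element that is not lower than the query."""
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if cmp(query, index[mid]) > 0:   # query is greater than stored value
            lo = mid + 1
        else:
            hi = mid
    return lo


index = list(range(0, 2048, 2))          # 1024 sorted stand-in "ciphertexts"
cmp, calls = make_counting_cmp()
position = lower_bound(index, 1000, cmp)
```

With 1024 entries the search needs at most 11 comparisons, against up to 1024 for a sequential scan.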
Space consumption is also affected. Ciphertexts are
computed as a combination of the plaintext with random
data. This way, a non-trivial expansion rate is expected.
Unlike speed overheads, which are affected by a single
attribute type, all attributes suffer from the expansion
rate of encryption.
5.5 Capabilities and limitations
Our framework is capable of providing an always-encrypted
database that preserves secrecy as long as the data owner
keeps the cryptographic keys secure. One is able to select
records through index attributes and apply arbitrary
operations on attributes defined as computable.
Furthermore, it increases the security of data while
maintaining the computational complexity of standard
relational primitives, achieving a fair trade-off between
security and performance.
Although the framework places no constraints on
attributes classified as both index and computable, there
is no known encryption scheme in the literature capable of
satisfying all the requirements. This way, the relational
model of the database must be as precise as possible when
assigning attributes to each class, especially because the
costs of a model refactor can be prohibitive.
Some scenarios appear to be more compatible with an
encrypted database as described than others. An e-mail
service, for example, can be trivially adapted. The
e-mails received by a user are stored in encrypted form as
static, and some heuristic is applied to their content to
generate a set of keywords to be used with BuildIndex.
This heuristic may use all unique words in the e-mail, for
example. The sender address may be an important value for
querying as well, so it may be stored as an index. To
optimize common queries, a secondary collection of records
may be instantiated with, for example, counters: the
quantity of e-mails received from a particular sender, how
often a term appears, or how many messages are received in
a time frame. Storing this metadata in a secondary data
collection avoids some of the high costs of searching in
the main dataset.
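A minimal sketch of the client-side preparation for this e-mail scenario is given below. The tokenization rule, the record layout and the function names are assumptions; the keywords and the sender would then be encrypted as index attributes, the body as static, and the counters as computable values.

```python
import re
from collections import Counter
from typing import Dict, List, Set


def extract_keywords(body: str) -> Set[str]:
    """Heuristic from the text: every unique word of the e-mail is a keyword."""
    return set(re.findall(r"[a-z0-9']+", body.lower()))


def sender_counters(emails: List[Dict[str, str]]) -> Counter:
    """Secondary collection: number of e-mails received per sender."""
    return Counter(email["sender"] for email in emails)


inbox = [
    {"sender": "bob@mail.com", "body": "Attack at dawn. Attack!"},
    {"sender": "bob@mail.com", "body": "Retreat at dusk."},
    {"sender": "carol@mail.com", "body": "Lunch?"},
]
keywords = extract_keywords(inbox[0]["body"])
counters = sender_counters(inbox)
```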
However, our proposal fails when the user wants to search
for something that was not previously expected, such as a
regular expression. Suppose a query that searches for all
the sentences that start with “Attack” and end with
“dawn”, or all the e-mails on the domain “mail.com”. If
these patterns were not foreseen when the keyword index
was built, then no one will be able to correctly execute
this selection without the decryption of the entire
database. Since the format of the strings is lost on
encryption, this kind of search is impossible in our
proposal.
Lastly, relational integrity is a desired property for a
relational database. It connects two or more sets using
same-value attributes in both sets (e.g., every value of a
column in a table $A$ exists in a column in table $B$),
and establishes a primary-foreign key relationship. This
way, the existence of a record with an attribute
classified as a foreign key depends on the existence of
the related record in the other set, in which the primary
key is equal to that foreign key. To implement such a
feature, one must give the DBMS the capability to enforce
relational integrity rules. In other words, the server
must be able to recognize such a relationship to guarantee
it will be respected by issued queries.
Figure 3 Simple diagram describing the interaction between
users and products composing the information regarding an
order. Notice that the existence of users and products is
independent, but there is a dependence for orders.
An example of the applicability of this concept is an
e-commerce database. Best practices dictate that user data
should be stored separately from products and orders.
Thus, one may model it as in Figure 3. When a new order
arrives, it is clear that a user chose some product and
informed the store about his intent to buy it. Users and
products are concrete elements. However, a sale is an
abstract object and cannot happen without a buyer and a
product. This way, to maintain the consistency of the
database, the DBMS must ensure that no sale record exists
without a related user and product. This can be achieved
by constructing the sales table such that records contain
foreign keys for the user and product tables (implying
that these contain attributes classified as primary keys).
By definition, this feature imposes an inherent
requirement that the DBMS has knowledge about this
relationship between records in different tables. Any
approach to protect the attributes against third parties
will affect the DBMS itself and will never really achieve
the needed protection. Thus, any effort to implement
secure relational integrity is at best security through
obscurity [5].
6 The winning solution of the Netflix Prize
The winner of the Netflix Grand Prize was BellKor’s
Pragmatic Chaos team, who built a solution over the
progress achieved in the 2007 and 2008 Progress
Prizes [56]. Several machine learning predictors were
combined in the final solution with the objective of
anticipating the suitability of Netflix content for some
user considering previous behavior in the platform. The
foundation used for this considered diverse factors, such
as:
• What is the general behavior of users when rating?
  What is the average rating?
• How critical is a user and how does this change over
  time?
• Does the user demonstrate preference for a specific
  movie or genre?
• Does the user demonstrate preference for blockbusters
  or non-mainstream content?
• What property of a movie affects the rating? Is there a
  correlation between the rating of a user and the
  presence of a particular actor in a particular genre?
[5] When the security of a system relies only on the lack
of knowledge by adversaries about its implementation
details and flaws.
The strategy used to combine these factors (and many
others) is beyond the scope of this work. We focus only on
the need to extract data from the dataset to feed the
learning model.
6.1 Searching the encrypted Netflix database
An interesting application of our framework is enabling
an entity to maintain an encrypted database on third-party
hardware with a structure similar to that of Netflix’s
dataset, while being able to implement a prediction
algorithm with minimum data leakage to the DBMS. The
database should be capable of answering the requested
predicates regarding user behavior.
Two scenarios must be considered: the recommendation
system running on Netflix’s infrastructure, and the
dataset becoming public. The former offers an execution
environment that is apparently honest (no one would share
data with an openly malicious party) but that can be
compromised at some point. To mitigate the damage, the
data owner can implement different strategies to reduce
the usefulness of any leakage that might happen. Thus,
handling data exclusively in encrypted form on the server
is a natural option, since security breaches would reveal
nothing but incomprehensible ciphertexts. This is the
best-case scenario, since the data owner has as much
control of the execution machine as possible, so our
framework can be applied in its full capacity.
As an example of the latter, an important feature
required for running the Netflix Prize is the capability
of in-dataset comparisons. This time, any security
solution should find the balance between protecting data
secrecy and offering conditions for experimentation.
Moreover, the execution environment cannot be considered
honest anymore. This way, the suitability of our framework
depends on relaxing the indexing method: values of index
attributes must be published to enable comparisons. For
instance, both sides of Lewi-Wu ciphertexts should be
published, or even an OPE scheme may be used for the
encryption of the index. From the perspective of the
secrecy of ciphertexts, if an IND-OCPA scheme is used then
there will be no security reduction beyond what the
corresponding threat model expects, as discussed in
Section 2.1. The adversary learns the ciphertext order but
has restricted ability to make inferences using
information acquired from public databases. The only
strategy that can be applied uses the data distribution in
the dataset (which can be retrieved by enabling
comparisons), which puts an attacker in this scenario in a
position very similar to that of the persistent passive
attacker.
Given the boundary conditions for privacy preservation,
we cannot precisely state the robustness of our framework
in the context of the Netflix Prize. It clearly increases
the hardness of an inference attack, since the adversary
is unable to observe the plaintext, but the distribution
leaked will give him hints about its content, for
instance the correlation of age groups and most watched
(or better rated) movies. It is a fact that all these are
expressed as ciphertexts, but as previously stated, a
motivated adversary may be able to combine such hints and
defeat our security barriers.
Our framework performs much better in the more
conservative scenario, where a production server provides
recommendations to users with comparisons controlled by
the data owner through the two-sided index attributes.
The impossibility of arbitrary comparisons makes snapshot
attacks infeasible.
As previously discussed, a motivated adversary with
access to the database may be able to also retrieve
logs and auxiliary collections. Consequently, previous
queries may leak the second side of index ciphertexts
and recall the danger of persistent passive attacks. So,
an important feature for future work is the develop-
ment of a key refreshment algorithm to nullify the use-
fulness of such information.
6.2 Data structure
The dataset shared by Netflix is composed of more than
100 million real movie ratings from 480,000 users about
17,000 movies, made between 1999 and 2005, and formatted
as a training test set [11, 56]. It contains a subset of
4.2 million of those ratings, with up to 9 ratings per
user. It consists of:
• CustomerID: A unique identification number per user,
• MovieID: A unique identification number per movie,
• Title: The English title of the movie,
• YearOfRelease: The year the movie was released,
• Rating: The rating itself,
• Date: The timestamp informing when the rating happened.
6.3 Constructing queries of interest over encrypted data
In the following, we rewrite some of the main predicates
required for BellKor’s solution using the relational
algebra of Section 5.2, thus enabling their execution over
an encrypted dataset.
Let
• $DB$ be a dataset as described in Section 6.2,
• $A_{ID}$ be the CustomerID related to a particular user
  (whom we shall call Alice),
• $B_{ID}$ be the CustomerID related to a particular user
  different from Alice (whom we shall call Bob),
• $M_{ID}$ be the MovieID related to an arbitrary movie in
  the dataset (which we shall refer to as $M$),
• $T = (T_{start}, T_{end})$ be a time interval of
  interest,
• $T_{first\text{-}alice}$ be the timestamp of the first
  rating Alice ever made,
• $C()$ be a function that receives a set and returns the
  quantity of items contained,
• $r_H$ and $r_L$ be thresholds for extreme ratings
  characterizing users that hated or loved a movie,
• $\sigma_{Date \in T}(DB) \equiv \sigma_{Date \geq T_{start}}(DB) + \sigma_{Date < T_{end}}(DB)$,
• $f(X) = \frac{\sum_{x \in X} \pi_{Rating}(x)}{C(X)}$.
Then, some of the required predicates for BellKor’s
solution are:
• Movies rated by Alice: Returns all movies that received
  some rating from Alice. For
  $U(X) = \sigma_{CustomerID = X}(DB)$,
  we have the query
  $\pi_{MovieID}(U(A_{ID}))$. (1)
• Users who rated M: Returns all users that sent some
  rating for $M_{ID}$. For
  $M(X) = \sigma_{MovieID = X}(DB)$,
  we have the query
  $\pi_{CustomerID}(M(M_{ID}))$. (2)
• Average of Alice’s ratings over time: Computes the
  average of all ratings sent by Alice during a particular
  time interval $T$. For
  $A_{A_{ID},T} = \sigma_{Date \in T}(U(A_{ID}))$,
  we have that
  $avg(A_{ID}, T) = \begin{cases} f(A_{A_{ID},T}) & \text{if } C(A_{A_{ID},T}) > 0, \\ 0 & \text{otherwise.} \end{cases}$ (3)
• Average of ratings for a particular movie M in a time
  interval: Computes the average of all ratings sent by
  all users during a particular time interval $T$ for a
  movie $M$. For
  $M_{M_{ID},T} = \sigma_{Date \in T}(M(M_{ID}))$
  we have that
  $avg(M_{ID}, T) = \begin{cases} f(M_{M_{ID},T}) & \text{if } C(M_{M_{ID},T}) > 0, \\ 0 & \text{otherwise.} \end{cases}$ (4)
• Number of days since Alice’s first rating: Computes how
  many days have passed since Alice submitted her first
  rating of a movie, relative to a moment $I$.
  $dsf(A_{ID}, I) = I - \pi_{Date}(\sigma_{\min(Date)}(U(A_{ID})))$. (5)
• Quantity of users who hated M: Counts the quantity of
  very bad ratings $M$ received since its release.
  $C_H(M) = C(\sigma_{MovieID = M_{ID}}(\sigma_{Rating \leq r_H}(DB)))$. (6)
• Quantity of users who loved M: Counts the quantity of
  very good ratings $M$ received since its release.
  $C_L(M) = C(\sigma_{MovieID = M_{ID}}(\sigma_{Rating \geq r_L}(DB)))$. (7)
• Users that are similar to Alice: The similarity
  assessment between users requires the derivation of a
  specific metric according to the boundary conditions.
  The winning solution developed a sophisticated strategy,
  building a graph of neighborhoods considering similar
  movies and users and computing a weighted mean of the
  ratings. For simplicity, we shall condense two factors
  that can be used for this objective: the set of commonly
  rated movies, and how close the ratings are. To query
  the movies rated by both Alice and Bob, let
  $\alpha_{A_{ID}} = \pi_{MovieID, RatingA}(\rho_{Rating, RatingA}(U(A_{ID})))$
  and
  $\beta_{B_{ID}} = \pi_{MovieID, RatingB}(\rho_{Rating, RatingB}(U(B_{ID})))$.
  Then
  $SimilaritySet(A_{ID}, B_{ID}) = \alpha_{A_{ID}} \bowtie \beta_{B_{ID}}$ (8)
  returns a sequence of tuples of ratings made by Alice and
  Bob. A simple approach for evaluating proximity is to
  compute the average of the absolute difference of
  ratings for each movie returned by Equation 8, as shown
  in Equation 9.
  $\frac{\sum_{SimilaritySet(A_{ID}, B_{ID})} |RatingA - RatingB|}{C(SimilaritySet(A_{ID}, B_{ID}))}$ (9)
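The semantics of Equations 1 to 9 can be checked on a small plaintext example. The sketch below is only a reference for what each predicate should return: the ratings are made up, the function names are hypothetical, and the time-interval selection is read as requiring both bounds to hold. In the encrypted setting, each filter on CustomerID, MovieID, Rating or Date would be replaced by a Search call with the corresponding trapdoor.

```python
from datetime import date
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]
Interval = Tuple[date, date]

DB: List[Row] = [  # made-up ratings, for illustration only
    {"CustomerID": 1, "MovieID": 10, "Rating": 5, "Date": date(2004, 1, 5)},
    {"CustomerID": 1, "MovieID": 11, "Rating": 1, "Date": date(2004, 3, 1)},
    {"CustomerID": 2, "MovieID": 10, "Rating": 4, "Date": date(2004, 2, 9)},
    {"CustomerID": 2, "MovieID": 11, "Rating": 2, "Date": date(2005, 6, 2)},
    {"CustomerID": 3, "MovieID": 10, "Rating": 1, "Date": date(2005, 7, 7)},
]


def U(x: int) -> List[Row]:                      # sigma_{CustomerID = x}(DB)
    return [r for r in DB if r["CustomerID"] == x]


def M(x: int) -> List[Row]:                      # sigma_{MovieID = x}(DB)
    return [r for r in DB if r["MovieID"] == x]


def in_interval(rows: List[Row], t: Interval) -> List[Row]:
    start, end = t                               # T_start <= Date < T_end
    return [r for r in rows if start <= r["Date"] < end]


def avg(rows: List[Row], t: Interval) -> float:  # Equations 3 and 4
    selected = in_interval(rows, t)
    if not selected:
        return 0
    return sum(r["Rating"] for r in selected) / len(selected)


def movies_rated_by(a: int) -> Set[int]:         # Equation 1
    return {r["MovieID"] for r in U(a)}


def users_who_rated(m: int) -> Set[int]:         # Equation 2
    return {r["CustomerID"] for r in M(m)}


def days_since_first(a: int, now: date) -> int:  # Equation 5
    return (now - min(r["Date"] for r in U(a))).days


def count_hated(m: int, r_h: int) -> int:        # Equation 6
    return len([r for r in M(m) if r["Rating"] <= r_h])


def count_loved(m: int, r_l: int) -> int:        # Equation 7
    return len([r for r in M(m) if r["Rating"] >= r_l])


def similarity_set(a: int, b: int) -> List[Tuple[int, int]]:  # Equation 8
    ra = {r["MovieID"]: r["Rating"] for r in U(a)}
    rb = {r["MovieID"]: r["Rating"] for r in U(b)}
    return [(ra[m], rb[m]) for m in sorted(ra.keys() & rb.keys())]


def mean_rating_gap(a: int, b: int) -> float:    # Equation 9
    pairs = similarity_set(a, b)
    if not pairs:
        return 0
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


year_2004 = (date(2004, 1, 1), date(2005, 1, 1))
```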
7 Implementation
A proof-of-concept implementation of the proposed
framework was developed and made available to the
community under a GNU GPLv3 license [29]. It runs on top
of the popular document-based database MongoDB and was
designed as a wrapper over its Python driver [57]. Hence,
we are able to evaluate its competence as a search
framework as well as its compatibility with a
state-of-the-art DBMS. Moreover, running as a wrapper
makes it database-agnostic and restricts the server to
dealing with encrypted data. We chose to implement our
wrapper over a NoSQL database so we could avoid dealing
with the SQL interpreter and thus reduce the
implementation complexity. However, our solution should be
easily portable to any SQL database because of its strong
roots in relational algebra. Table 1 provides the schemes
used for each attribute class, the parameter sizes and
their security levels.
Table 1 Chosen cryptosystems for each attribute presented in
Section 5.
Attribute        Cryptosystem   Parameters   Sec. level
static           AES            128 bits     128 bits
index            Lewi-Wu        128 bits     128 bits
computable (+)   Paillier       3072 bits    128 bits
computable (×)   ElGamal        3072 bits    128 bits
Lewi-Wu’s ORE scheme relies on symmetric primitives and
achieves IND-OCPA. The authors claim that it is more
secure than all existing practical OPE and ORE
schemes [34]. Finally, Paillier and ElGamal are well-known
public-key schemes.
Both achieve IND-CPA and are based on the hardness
of solving integer factorization and discrete logarithm
problems, respectively. Paillier supports homomorphic
addition, while ElGamal provides homomorphic multi-
plication. Both are classied as PHE schemes [38, 39].
The implementation of AES was provided by the py-
crypto toolkit [58]; we wrote a Python binding over
the implementation of Lewi-Wu provided by the au-
thors [59]; and we implemented Paillier and ElGamal
schemes. An AVL tree was used as the index structure.
It is important to notice that performance was not the
main focus in this proof-of-concept implementation.
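The multiplicative homomorphism of ElGamal can be illustrated with a toy instance. The modulus, the base element and the direct encoding of messages as integers below are assumptions chosen for brevity; they do not correspond to the 3072-bit parameters of Table 1 nor to the authors' implementation, and the sketch is not secure.

```python
import secrets
from typing import Tuple

P = 2**61 - 1   # a Mersenne prime; toy-sized modulus
G = 3           # base element (assumed; not checked to be a generator)

Ciphertext = Tuple[int, int]


def keygen() -> Tuple[int, int]:
    """Returns (secret exponent x, public value h = G^x mod P)."""
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)


def encrypt(h: int, m: int) -> Ciphertext:
    """Encrypts a message 1 <= m < P as (G^r, m * h^r)."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(h, r, P)) % P


def multiply(c1: Ciphertext, c2: Ciphertext) -> Ciphertext:
    """Component-wise product: decrypts to the product of the plaintexts."""
    return (c1[0] * c2[0]) % P, (c1[1] * c2[1]) % P


def decrypt(x: int, c: Ciphertext) -> int:
    shared = pow(c[0], x, P)
    return (c[1] * pow(shared, -1, P)) % P


secret, public = keygen()
product_ct = multiply(encrypt(public, 6), encrypt(public, 7))
```

Decrypting `product_ct` with the secret exponent yields the product of the two plaintexts, computed without the server ever seeing 6 or 7.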
The machines used to run our experiments are described
in Tables 2 and 3. The former specifies the machine used
to host the MongoDB server, and the latter describes the
one used to run the client. Both machines were connected
by a Gigabit local network connection.
Table 2 Specifications of the machine used for running the MongoDB instance.
CPU      2 x Intel Xeon E5-2670 v1 @ 2.60GHz
OS       CentOS 7.3
Memory   16 x DDR3 DIMM 8192MB @ 1600MHz
Disk     7200RPM Western Digital HDD (SATA)
Table 3 Specifications of the machine used for running the queries described in this document.
CPU      2 x Intel Xeon E5-2640 v2 @ 2.60GHz
OS       Ubuntu 16.04.2
Memory   4 x DDR3 DIMM 8192MB @ 1600MHz
Disk     7200RPM Western Digital HDD (SATA)
While it was trivial to index the plaintext dataset
natively, it was not so simple with the encrypted version.
MongoDB is not friendly to custom index structures
or comparators, so we decided to construct the
structure with Python code and then insert it into
the database using pointers based on MongoDB’s native
identity codes. Walking through the index tree
depends on a database-external operation on the Python
side, which calls MongoDB’s find method to locate the
documents referenced by the left/right pointers, starting
from the tree root. This limitation brings a major performance
overhead that especially affects range queries.
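The client-driven tree walk can be sketched as follows. This is a minimal illustration, not our wrapper: the collection is replaced by an in-memory dictionary, find_one stands in for one call to MongoDB’s find, ore_compare stands in for the ORE comparison (plain integers are used in place of ciphertexts), and all field names are hypothetical.

```python
# In-memory stand-in for the index collection: identifier -> node document.
# Field names ("ctxt", "left", "right", "docs") are hypothetical.
INDEX = {
    "n4": {"ctxt": 40, "left": "n2", "right": "n6", "docs": ["d40"]},
    "n2": {"ctxt": 20, "left": "n1", "right": "n3", "docs": ["d20"]},
    "n6": {"ctxt": 60, "left": "n5", "right": "n7", "docs": ["d60"]},
    "n1": {"ctxt": 10, "left": None, "right": None, "docs": ["d10"]},
    "n3": {"ctxt": 30, "left": None, "right": None, "docs": ["d30"]},
    "n5": {"ctxt": 50, "left": None, "right": None, "docs": ["d50"]},
    "n7": {"ctxt": 70, "left": None, "right": None, "docs": ["d70"]},
}

def find_one(node_id):
    """Stand-in for one client/server round trip fetching a node by id."""
    return INDEX.get(node_id)

def ore_compare(a, b):
    """Placeholder for the ORE comparison; returns -1, 0 or 1."""
    return (a > b) - (a < b)

def search(root_id, query):
    """Walk the tree from the root, one round trip per visited node."""
    node_id, round_trips = root_id, 0
    while node_id is not None:
        node = find_one(node_id)
        round_trips += 1
        order = ore_compare(query, node["ctxt"])
        if order == 0:
            return node["docs"], round_trips
        node_id = node["left"] if order < 0 else node["right"]
    return [], round_trips

print(search("n4", 30))
```

Each loop iteration corresponds to one exchange with the server, which is why the number of visited nodes dominates the latency of range queries in this design.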
7.1 Netflix’s prize dataset
We used the Netflix dataset to measure the computational
costs of managing an encrypted database.
We consider the two threat scenarios discussed in
Section 6.1: a recommendation system running in production,
and the disclosure of a real ratings dataset.
Both require the ability to run all queries presented
in Section 6.3, differing only in the content
that must be inserted in the encrypted dataset (for
instance, how much of the index ciphertexts may be
stored). Hence, to demonstrate the suitability of our
framework for developing and executing
a good predictor in such contexts while
mitigating damage to user privacy, we
implemented those queries in an encrypted instance of
the dataset.
As shown in Table 4, all four chosen attributes are
classified as static, which uses the fastest encryption
and decryption available. Rating is tagged computable
for addition and multiplication, thus being compatible
with Equations 3 and 4. We use CustomerID, MovieID,
and Date for indexing. Encrypting the document structure
takes 540µs per record.
Table 4 Attribute structure of elements in the Netflix’s prize dataset.
Name         Value type   Class
CustomerID   integer      index, static
MovieID      integer      index, static
Rating       integer      static, computable
Date         integer      index, static
There is no way to implement integer division
over Paillier ciphertexts. Thus, the predictor may be
adapted to use the non-divided result on Equations 3
and 4. Otherwise, a division oracle must be provided,
to which one could submit their homomorphically
added values and ask for a ciphertext equivalent to
its division by an arbitrary integer. This approach
does not reduce security for an IND-CPA homomor-
phic scheme.
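A division oracle of this kind can be sketched in Python as below. The sketch assumes the oracle runs on the side of the key holder, who decrypts, divides and re-encrypts; the toy Paillier parameters and every function name are illustrative assumptions rather than part of our implementation.

```python
import math
import random

# Toy Paillier key pair held by the data owner (insecure sizes, illustration only).
p, q = 1009, 1013
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
mu = pow(lam, -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

def division_oracle(ciphertext, divisor):
    """Key-holder side: decrypt, apply integer division, re-encrypt."""
    return encrypt(decrypt(ciphertext) // divisor)

# Server side: add encrypted ratings homomorphically, then query the oracle.
ratings = [4, 5, 3, 4]
acc = encrypt(0)
for rating in ratings:
    acc = (acc * encrypt(rating)) % n2
avg_ct = division_oracle(acc, len(ratings))
```

The server only ever handles ciphertexts; the quotient comes back freshly encrypted, so the party submitting the sum learns nothing beyond what it already holds.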
Handling such a large dataset was not an easy task.
The ciphertext expansion factor caused by the AES, Paillier
and ElGamal cryptosystems was relatively small,
but the Lewi-Wu implementation is very inefficient in
this regard, having an expansion of about 400×. This
directly affects the index building and motivated us
to explore different strategies to encrypt and load the
dataset into a MongoDB instance in reasonable time.
Again, MongoDB is not friendly to custom indexing.
A contribution by Grim, Wiersma and Turkmen
to our code enables us to manage the AVL tree inside
the database through JavaScript code stored inside
MongoDB’s engine (the only way to execute arbitrary
code in MongoDB) [60]. Thus, our primary approach
to feed the DBMS with the dataset was quite
simple: encrypt each record in our wrapper, insert it in
the database, and update the index and balance the
tree inside the DBMS. The first two operations suffered
from extremely high memory consumption and by
far surpassed our available RAM capacity. However,
an even worse problem was building the AVL
tree. For the first thousand records we could do the
node insertion and tree balancing at a rate
of about 600 documents per second, but it dropped
quickly as the tree height increased, reaching less than
1 document per second before the insertion of the 10,000th
record.
Table 5 Latency for each step in the construction of an AVL tree following Algorithm 1 for each index attribute specified in Table 4.
Attribute    sort (s)   group (s)   build_index (s)
CustomerID   329        459         129
MovieID      270        161         2
Date         187        197         5
We found out that the initial insertions required a
novel approach. We completely decoupled the index
from the static data encryption and chose to first feed
the database with the static ciphertexts, constructing
the entire AVL tree from the plaintext in client-side
memory, and then inserting it in the database. Moreover,
to speed up the index construction we followed
Algorithms 1 and 2 to construct the AVL tree. This procedure takes
a sorted list of inputs and builds the tree with time
complexity of O(n) in the list size. As a result of this
approach we were able to build the encrypted database
and the index at a rate of 3000 documents per second during
the entire procedure.
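The construction described by Algorithms 1 and 2 can be sketched in Python as follows. This is a simplified illustration over plaintext keys, not our wrapper code: the Node class, the key argument and the in_order helper are assumptions, and empty ranges are handled by returning None instead of the two explicit base cases of Algorithm 2.

```python
from itertools import groupby

class Node:
    def __init__(self, key, docs):
        self.key, self.docs = key, docs
        self.left = self.right = None
        self.height = 0

def height(node):
    # An empty subtree has height -1, so a leaf has height 0.
    return -1 if node is None else node.height

def build_aux(groups, lo, hi):
    """Build a balanced subtree from groups[lo..hi], which is sorted and unique."""
    if lo > hi:
        return None
    mid = lo + (hi - lo) // 2
    key, docs = groups[mid]
    node = Node(key, docs)
    node.left = build_aux(groups, lo, mid - 1)
    node.right = build_aux(groups, mid + 1, hi)
    node.height = 1 + max(height(node.left), height(node.right))
    return node

def build_index(docs, key):
    """Sort, combine documents with equal keys, then build the tree."""
    ordered = sorted(docs, key=key)
    groups = [(k, list(g)) for k, g in groupby(ordered, key=key)]
    return build_aux(groups, 0, len(groups) - 1)

def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)

docs = [{"MovieID": m} for m in [5, 3, 8, 3, 1, 9, 7]]
root = build_index(docs, key=lambda d: d["MovieID"])
print(in_order(root), root.height)
```

Because every node is created exactly once and no rotation is ever needed, the recursive step is linear in the number of distinct keys; the sort dominates the overall cost.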
Algorithm 1 Build an AVL tree using an array of documents.
procedure build_index(docs)
    docs_sort ← sort(docs)
    docs_group ← group(docs_sort)    // Combine equal elements
    return build_aux(docs_group, 0, length(docs_group) − 1)
end procedure
Table 5 shows the latency of each step we observed
during the construction of the AVL tree-based indexes.
The total time to build those 3 indexes was 40 minutes.
Algorithm 2 Recursively builds an AVL tree from a sorted array of documents without repeated elements. Receives the array itself, and the indexes of the leftmost and rightmost elements to be handled in each recursive call.
procedure build_aux(docs, L, R)
    if L = R then
        return new_node(docs[L])
    else if L + 1 = R then
        left_node ← new_node(docs[L])
        right_node ← new_node(docs[R])
        left_node.right ← right_node
        left_node.height ← 1
        return left_node
    else
        M ← L + ⌊(R − L)/2⌋
        middle_node ← new_node(docs[M])
        middle_node.left ← build_aux(docs, L, M − 1)
        middle_node.right ← build_aux(docs, M + 1, R)
        lh ← middle_node.left.height
        rh ← middle_node.right.height
        middle_node.height ← 1 + max(lh, rh)
        return middle_node
    end if
end procedure
The queries we derived in Section 6.3 were ported to
our encrypted database, and the latency for each one
can be seen in Table 6. The parameters used for each
equation were arbitrarily selected. The CustomerIDs
for Alice and Bob (AID and BID) were 1061110 and
2486445, respectively, while MID was fixed as 6287.
The time interval used was 01/01/2003 to 01/01/2004.
Lastly, we defined a “loved” rating as one greater
than 3, and a “hated” rating as one lower than 3. We
made some effort to optimize the execution;
however, these results can still be improved.
As can be seen, complex queries composed of
range selections, as well as those with numerous outcomes,
suffered from the slow communication between
the server and the client. The latter influenced even the
plaintext results. The outcome of Equation 1 is quite
small, requiring much less time to return than the outcome
of Equation 2 (the number of movies rated by a
user is much smaller than the number of users that
rated a movie).
The time interval selection in Equations 3 and 4 required
our implementation to visit many nodes in the
index tree for Date. Because each iteration requires a
round trip between the server and the client, this
dramatically impacted the performance. The latencies
for Equations 1 and 5 were only 1.4 times higher in
the encrypted database; however, the slowdown reached 710 times
for Equation 3. Lastly, Equations 6 and 7 depend on
Paillier’s homomorphic additions, which resulted in a
factor-12 slowdown.
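The slowdown factors quoted above follow directly from the latencies reported in Table 6. The short Python sketch below recomputes them; the dictionary layout and names are illustrative, and the values are simply those of Table 6 converted to milliseconds.

```python
# Latencies in milliseconds as (encrypted, plaintext), taken from Table 6.
latency_ms = {
    "Equation 1": (16.6, 11.9),
    "Equation 3": (2700.0, 3.8),
    "Equation 5": (16.8, 11.8),
    "Equations 6 and 7": (12.0, 1.0),
}

# Slowdown = encrypted latency divided by plaintext latency.
slowdown = {name: enc / plain for name, (enc, plain) in latency_ms.items()}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.1f}x")
```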
Table 6 Execution times for implementations of the Equations presented in Section 6.3 on an encrypted MongoDB collection and an equivalent plaintext version. Each row contains the latency for the entire circuit required by the respective Equation and returning the outcome to the client. Times are computed as the average for 100 independent executions. The machine and parameters used in each cryptosystem follow those defined in Section 7.
Equation   Encrypted   Plaintext
1          16.6 ms     11.9 ms
2          2 s         850 ms
3          2.7 s       3.8 ms
4          2.7 s       1.0 s
5          16.8 ms     11.8 ms
6 and 7    12 ms       1.0 ms
9          603 ms      200 ms
The implementation of queries based on Equations
3 and 4 followed the previous suggestion and skipped the
final division. We believe this does not undermine any
procedure that eventually consumes this outcome.
The optimal implementation of Equations 6 and 7
requires indexing the MovieID and Rating attributes.
However, due to limitations in our implementation,
rather than indexing the latter we use a linear search
over the outcome of the movie selection on the client side.
Our approach for building indexes uses the set data
structure of MongoDB documents. Yet, in the most recent
release such a structure holds up to 16MB of data,
much less than required for indexing the entire
dataset for Rating with our strategy.
Lastly, Equation 8 was implemented aiming at the
joining of data regarding two users, Alice and Bob. We
leave the evaluation of such information by a
similarity-evaluation function as future work.
8 Conclusion
We presented the problem of searching in encrypted
data and proposed a framework that guides the
modeling of a database with support for this functionality.
This is achieved by combining different cryptographic
concepts and using different cryptosystems, such as
order-revealing encryption and homomorphic encryption,
to satisfy the requirements of each attribute.
On top of this approach, a relational algebra was built to
support encrypted data, composed of projection, selection,
Cartesian product, difference, union, rename,
and join-like operators.
An overview of the security provided is discussed,
as well as a performance analysis of the impact on
a realistic database. As a case study we explored the
Netflix prize, which published an anonymized dataset
with real-world information about user behavior that
was later deanonymized through correlation attacks involving
public databases.
We offered a proof-of-concept implementation in
Python over the document-based database MongoDB.
To demonstrate its functionality, we selected and ran
some of the main predicates required by the winning
solution of the Netflix Grand Prize and measured
the performance impact of the execution on an
encrypted version of the dataset. We conclude that
our proposal offers robustness against a compromised
server and we discuss how it would help to avoid the
deanonymization of the Netflix dataset. In comparison
with CryptDB, our proposal provides higher security,
since it delegates exclusively to the data owner the
responsibility of encrypting and decrypting data. This
way, privacy holds even in a scenario of database or
application compromise.
As future research objectives we can mention:
• Extend the scope to associative arrays: Despite
being powerful for SQL, Codd’s relational algebra
is not completely applicable to non-relational
databases. For instance, NoSQL and NewSQL
databases lack the concept of joining. A more convenient
foundation for such a context is the algebra of
associative arrays [61]. Hence, the formalization
of our primitives in such an algebra would be
interesting work.
• Reduce the leakage of index construction in the
database: Our proposal leaks both sides of index
ciphertexts to enable the index construction. At
this moment, an eavesdropper monitoring queries
would learn all information required to freely compare
the exposed ciphertexts. As discussed in this
document, such a capability must be restricted, under
risk of enabling an inference attack.
• Key refreshment algorithm: A persistent passive
attacker is capable of learning the information required
to perform comparisons across the entire
database, just by observing issued queries and
their outcomes. Thus, the framework primitives must
be improved to support an algorithm capable of
avoiding any damage caused by the knowledge of
such information.
• Hide repeated queries: Even with encrypted queries
and outcomes, the access pattern in a database
may indicate repeated queries and the associated
records. A technique such as ORAM could be useful
to protect such information [62].
• Explore different databases: As stated, MongoDB
is a very popular NoSQL database. However, it
is not friendly to custom indexing or third-party
code running in its engine. Thus, replacing it with a
more appropriate database could provide a more
productive system.
• Improve performance of our implementation: Our
implementation was intended as a proof of concept
that demonstrates how the proposal works.
The development of a space- and speed-optimized
version is an important next step.
Authors’ contributions
The first author developed the study design, carried out the implementation
efforts and wrote most of the paper. The second author contributed with
discussions about the proposal and its validation with the case study. Both
authors read and approved the final manuscript.
Acknowledgements
A prior version of this paper was presented at the XVI Brazilian Symposium
on Information and Computational Systems Security (SBSeg16) [63].
The authors thank Prof. André Santanchè for the initial opportunity to
develop this work, the Multidisciplinary High Performance Computing Lab
(LMCAD) for providing the required infrastructure, and the anonymous
reviewers who helped improve this work.
Funding
This research was partially funded by CNPq and the Google Research
Awards Latin America.
Availability of data and materials
Our proof-of-concept implementation is publicly available on GitHub, as
well as a generator for a synthetic dataset used for testing [29]. The Netflix
dataset is available in academic repositories [64].
Competing interests
The authors declare that they have no competing interests.
References
1. Buyya, R.: Market-Oriented Cloud Computing: Vision, Hype, and
Reality of Delivering Computing As the 5th Utility. In: Proceedings of
the 2009 9th IEEE/ACM International Symposium on Cluster
Computing and the Grid. CCGRID ’09, p. 1. IEEE Computer Society,
Washington, DC, USA (2009)
2. Vecchiola, C., Pandey, S., Buyya, R.: High-Performance Cloud
Computing: A View of Scientific Applications. In: Pervasive Systems,
Algorithms, and Networks (ISPAN), 2009 10th International
Symposium On, pp. 4–16 (2009)
3. Hoffa, C., Mehta, G., Freeman, T., Deelman, E., Keahey, K.,
Berriman, B., Good, J.: On the Use of Cloud Computing for Scientific
Workflows. In: eScience 08. IEEE Fourth International Conference On,
pp. 640–645 (2008)
4. Dinh, H.T., Lee, C., Niyato, D., Wang, P.: A survey of mobile cloud
computing: architecture, applications, and approaches. Wireless
Communications and Mobile Computing 13(18), 1587–1611 (2013)
5. Xiao, Z., Xiao, Y.: Security and Privacy in Cloud Computing. IEEE
Communications Surveys Tutorials 15(2), 843–859 (2013)
6. Pascual, A.: 2017 Data Breach Fraud Impact Report: Going
Undercover and Recovering Data. Technical report, Javelin Advisory
Services (2017)
7. Popa, R.A., Redfield, C.M.S., Zeldovich, N., Balakrishnan, H.:
CryptDB: Protecting confidentiality with encrypted query processing.
In: Proceedings of the Twenty-Third ACM Symposium on Operating
Systems Principles. SOSP ’11, pp. 85–100. ACM, New York, NY, USA
(2011)
8. Poddar, R., Boelter, T., Popa, R.A.: Arx: A Strongly Encrypted
Database System. Cryptology ePrint Archive, Report 2016/591 (2016)
9. Ramamurthy, R., Eguro, K., Arasu, A., Kaushik, R., Kossmann, D.,
Venkatesan, R.: A Secure Coprocessor for Database Applications
(2013)
10. Tu, S., Kaashoek, M.F., Madden, S., Zeldovich, N.: Processing
analytical queries over encrypted data. Proc. VLDB Endow. 6(5),
289–300 (2013)
11. Bennett, J., Lanning, S., et al.: The Netflix prize. In: Proceedings of
KDD Cup and Workshop, vol. 2007, p. 35 (2007). New York, NY,
USA
12. Michael Arrington: AOL Proudly Releases Massive Amounts of Private
Data. https://techcrunch.com/2006/08/06/
aol-proudly-releases-massive-amounts-of-user-search-data/.
Accessed 24 July 2017 (2006)
13. Said, A., Bellogín, A.: Comparative recommender system evaluation:
benchmarking recommendation frameworks. In: Proceedings of the 8th
ACM Conference on Recommender Systems, pp. 129–136 (2014).
ACM
14. Wang, Z., Liao, J., Cao, Q., Qi, H., Wang, Z.: Friendbook: a
semantic-based friend recommendation system for social networks.
IEEE Transactions on Mobile Computing (3), 538–551 (2015)
15. Pazzani, M.J., Billsus, D.: Content-based recommendation systems. In:
The Adaptive Web, pp. 325–341. Springer, ??? (2007)
16. Narayanan, A., Shmatikov, V.: Robust de-anonymization of large
sparse datasets. In: Proceedings of the 2008 IEEE Symposium on
Security and Privacy. SP ’08, pp. 111–125. IEEE Computer Society,
Washington, DC, USA (2008)
17. Barbaro, M., Zeller, T.: A Face Is Exposed for AOL Searcher No.
4417749. The New York Times. Accessed 05 April 2017 (2006)
18. Narayanan, A., Felten, E.W.: No silver bullet: De-identification still
doesn’t work. Technical report (2014)
19. Greenwald, G., MacAskill, E.: NSA Prism program taps in to user data
of Apple, Google and others. The Guardian (2013)
20. Weber, H.: How the NSA & FBI made Facebook the perfect mass
surveillance tool. Venture Beat. Published on 05/15/2014. (2014)
21. Thomsen, S.: Extramarital affair website Ashley Madison has been
hacked and attackers are threatening to leak data online. Business
Insider. Accessed 25 May 2016 (2015)
22. Niklas Magnusson and Niclas Rolander: Sweden Tries to Stem Fallout
of Security Breach in IBM Contract. Bloomberg (2017)
23. BBC News: Yahoo ’state’ hackers stole data from 500 million users.
Accessed 23 September 2016 (2016)
24. Sweeney, L.: Simple Demographics Often Identify People Uniquely
(2000). http://dataprivacylab.org/projects/identifiability/
25. Golle, P.: Revisiting the uniqueness of simple demographics in the us
population. In: Proceedings of the 5th ACM Workshop on Privacy in
Electronic Society. WPES ’06, pp. 77–80. ACM, New York, NY, USA
(2006)
26. DuckDuckGo: Privacy Mythbusting #3: Anonymized data is safe,
right? (Er, no.).
https://spreadprivacy.com/dataanonymization-e1e2b3105f3c.
Accessed 24 July 2017
27. Schneier, B.: Data is a toxic asset (2016). https://www.schneier.
com/essays/archives/2016/03/data_is_a_toxic_asse.html
28. Bösch, C., Hartel, P., Jonker, W., Peter, A.: A survey of provably
secure searchable encryption. ACM Comput. Surv. 47(2), 18:1–18:51
(2014)
29. Alves, P.: A proof-of-concept searchable encryption backend for
mongodb. https://github.com/pdroalves/encrypted-mongodb.
Accessed July 2017 (2016)
30. Bellare, M., Desai, A., Pointcheval, D., Rogaway, P.: Relations among
notions of security for public-key encryption schemes. Advances in
Cryptology — CRYPTO ’98: 18th Annual International Cryptology
Conference Santa Barbara, California, USA August 23–27, 1998
Proceedings, pp. 26–45. Springer, Berlin, Heidelberg (1998)
31. Curtmola, R., Garay, J., Kamara, S., Ostrovsky, R.: Searchable
symmetric encryption: Improved definitions and efficient constructions.
Journal of Computer Security 19(5), 895–934 (2011)
32. Boldyreva, A., Chenette, N., Lee, Y., O’Neill, A.: Order-preserving
symmetric encryption. Lecture Notes in Computer Science 5479,
224–241 (2009)
33. Boneh, D., Lewi, K., Raykova, M., Sahai, A., Zhandry, M.,
Zimmerman, J.: Semantically Secure Order-Revealing Encryption:
Multi-input Functional Encryption Without Obfuscation, pp. 563–594.
Springer, Berlin, Heidelberg (2015)
34. Lewi, K., Wu, D.J.: Order-Revealing Encryption: New Constructions,
Applications, and Lower Bounds. Cryptology ePrint Archive, Report
2016/612 (2016)
35. Kolesnikov, V., Shikfa, A.: On the limits of privacy provided by
Order-Preserving Encryption. Bell Labs Technical Journal (2012)
36. Chenette, N., Lewi, K., Weis, S.A., Wu, D.J.: Practical
Order-Revealing Encryption with Limited Leakage. In FSE (2016)
37. Naveed, M., Kamara, S., Wright, C.V.: Inference attacks on
property-preserving encrypted databases. In: Proceedings of the 22Nd
ACM SIGSAC Conference on Computer and Communications Security.
CCS ’15, pp. 644–655. ACM, New York, NY, USA (2015)
38. Paillier, P.: In: Stern, J. (ed.) Public-Key Cryptosystems Based on
Composite Degree Residuosity Classes, pp. 223–238. Springer, Berlin,
Heidelberg (1999)
39. El Gamal, T.: A public key cryptosystem and a signature scheme based
on discrete logarithms. In: Proceedings of CRYPTO 84 on Advances in
Cryptology, pp. 10–18. Springer, New York, NY, USA (1985).
http://dl.acm.org/citation.cfm?id=19478.19480
40. Gentry, C.: Computing Arbitrary Functions of Encrypted Data.
Commun. ACM 53(3), 97–105 (2010)
41. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (leveled) fully
homomorphic encryption without bootstrapping. In: Proceedings of the
3rd Innovations in Theoretical Computer Science Conference. ITCS
’12, pp. 309–325. ACM, New York, NY, USA (2012)
42. Loftus, J., May, A., Smart, N.P., Vercauteren, F.: On CCA-Secure
Somewhat Homomorphic Encryption. In: Proceedings of the 18th
International Conference on Selected Areas in Cryptography. SAC’11,
pp. 55–72. Springer, Berlin, Heidelberg (2012)
43. Song, D.X., Wagner, D., Perrig, A.: Practical techniques for
searches on encrypted data. Proceeding 2000 IEEE Symposium on
Security and Privacy. S&P 2000, 44–55 (2000)
44. Grubbs, P., Ristenpart, T., Shmatikov, V.: Why Your Encrypted
Database Is Not Secure. Cryptology ePrint Archive, Report 2017/468.
http://eprint.iacr.org/2017/468 (2017)
45. Rogaway, P.: The moral character of cryptographic work. IACR
Cryptology ePrint Archive, 1162 (2015)
46. Naveed, M., Kamara, S., Wright, C.V.: Inference attacks on
property-preserving encrypted databases. In: Proceedings of the 22Nd
ACM SIGSAC Conference on Computer and Communications Security.
CCS ’15, pp. 644–655. ACM, New York, NY, USA (2015)
47. Grubbs, P., McPherson, R., Naveed, M., Ristenpart, T., Shmatikov,
V.: Breaking web applications built on top of encrypted data. In:
Proceedings of the 2016 ACM SIGSAC Conference on Computer and
Communications Security. CCS ’16, pp. 1353–1364. ACM, New York,
NY, USA (2016)
48. Daemen, J., Rijmen, V.: AES Proposal: Rijndael (1999)
49. Papadimitriou, A., Bhagwan, R., Chandran, N., Ramjee, R., Haeberlen,
A., Singh, H., Modi, A., Badrinarayanan, S.: Big data analytics over
encrypted datasets with seabed. In: Proceedings of the 12th USENIX
Conference on Operating Systems Design and Implementation.
OSDI’16, pp. 587–602. USENIX Association, Berkeley, CA, USA
(2016). http://dl.acm.org/citation.cfm?id=3026877.3026922
50. Shoro, A.G., Soomro, T.R.: Big data analysis: Apache spark
perspective. Global Journal of Computer Science and Technology
15(1) (2015)
51. Codd, E.F.: A relational model of data for large shared data banks.
Commun. ACM 26 (6), 64–69 (1983)
52. Ramakrishnan, R., Gehrke, J.: Database Management Systems, 3rd
edn. McGraw-Hill, Inc., New York, NY, USA (2003)
53. Sedgewick, R.: Algorithms, p. 199. Addison-Wesley, ??? (1983). Chap.
15
54. Doröz, Y., Hoffstein, J., Pipher, J., Silverman, J.H., Sunar, B., Whyte,
W., Zhang, Z.: Fully Homomorphic Encryption from the Finite Field
Isomorphism Problem. Cryptology ePrint Archive, Report 2017/548
(2017)
55. Boneh, D., Gentry, C., Halevi, S., Wang, F., Wu, D.J.: Private
database queries using somewhat homomorphic encryption. In:
Proceedings of the 11th International Conference on Applied
Cryptography and Network Security. ACNS’13, pp. 102–118. Springer,
Berlin, Heidelberg (2013)
56. Töscher, A., Jahrer, M., Bell, R.M.: The BigChaos Solution to the
Netflix Grand Prize (2009)
57. Chodorow, K., Dirolf, M.: MongoDB: The Definitive Guide, 1st edn.
O’Reilly Media, Inc., USA (2010)
58. Litzenberger, D.: Python Cryptography Toolkit.
http://www.pycrypto.org/. Accessed 03 July 2016 (2016)
59. Wu, D.J., Lewi, K.: FastORE.
https://github.com/kevinlewi/fastore. Accessed July 2017
(2016)
60. Grim, M.W., Wiersma, A.T., Turkmen, F.: Security and Performance
Analysis of Encrypted NoSQL Databases. Technical report, University
of Amsterdam (2017)
61. Kepner, J., Gadepally, V., Hutchison, D., Jananthan, H., Mattson,
T.G., Samsi, S., Reuther, A.: Associative array model of sql, nosql, and
newsql databases. CoRR (2016)
62. Stefanov, E., Van Dijk, M., Shi, E., Fletcher, C., Ren, L., Yu, X.,
Devadas, S.: Path oram: an extremely simple oblivious ram protocol.
In: Proceedings of the 2013 ACM SIGSAC Conference on Computer &
Communications Security, pp. 299–310 (2013). ACM
63. Alves, P.G.M.R., Aranha, D.F.: A framework for searching encrypted
databases. In: Proceedings of the XVI Brazilian Symposium on
Information and Computational Systems Security (2016)
64. Netflix Prize Data Set. http://academictorrents.com/details/
9b13183dc4d60676b773c9e2cd6de5e5542cee9a. Accessed July 2017
(2009)