Dynamic Authenticated Index Structures for Outsourced
†CS Department, Boston University, USA.‡AT&T Labs-Research, USA.
(lifeifei, gkollios, reyzin)@cs.bu.edu, firstname.lastname@example.org
Technical Report BUCS-TR-2006-004
April 22, 2006
In outsourced database (ODB) systems the database owner publishes its data through a
number of remote servers, with the goal of enabling clients at the edge of the network to access
and query the data more efficiently. As servers might be untrusted or can be compromised,
query authentication becomes an essential component of ODB systems. Existing solutions for
this problem concentrate mostly on static scenarios and are based on idealistic properties for
certain cryptographic primitives. In this work, first we define a variety of essential and practical
cost metrics associated with ODB systems. Then, we analytically evaluate a number of
different approaches, in search of a solution that best leverages all metrics. Most importantly,
we look at solutions that can handle dynamic scenarios, where owners periodically update the
data residing at the servers. Finally, we discuss query freshness, a new dimension in data
authentication that has not been explored before. A comprehensive experimental evaluation of
the proposed and existing approaches is used to validate the analytical models and verify our
claims. Our findings show that the proposed solutions improve performance substantially
over existing approaches, both for static and dynamic environments.
Database outsourcing is a new paradigm that has been proposed recently and has received
considerable attention. The basic idea is that data owners delegate their database needs and
functionalities to a third party that provides services to the users of the database. Since the third
party can be untrusted or can be compromised, security concerns must be addressed before this
There are three main entities in the Outsourced Database (ODB) model: the data owner,
the database service provider (a.k.a. server) and the client. In general, many instances of each
entity may exist. In practice, usually there is a single or a few data owners, a few servers, and
many clients. The data owner first creates the database, along with the associated index and
authentication structures and uploads them to the servers. It is assumed that the data owner may
update the database periodically or occasionally, and that data management and retrieval
happen only at the servers. Clients submit queries about the owner’s data to the servers and get
back results through the network.
It is much cheaper to maintain ordinary servers than to maintain truly secure ones, particularly
in the distributed setting. To guard against malicious/compromised servers, the owner must give
the clients the ability to authenticate the answers they receive without having to trust the servers.
In that respect, query authentication has three important dimensions: correctness, completeness
and freshness. Correctness means that the client must be able to validate that the returned records
do exist in the owner’s database and have not been modified in any way. Completeness means
that no answers have been omitted from the result. Finally, freshness means that the results are
based on the most current version of the database, that incorporates the latest owner updates. It
should be stressed here that query freshness is an important dimension of query authentication
that has not been extensively explored in the past, since it is a requirement arising from updates
to the ODB systems, an aspect that has not been sufficiently studied yet.
There are a number of important costs pertaining to the aforementioned model, relating to
the construction, query, and update phases. In particular, in this work the following metrics are
considered: 1. The computation overhead for the owner, 2. The owner-server communication
cost, 3. The storage overhead for the server, 4. The computation overhead for the server, 5. The
client-server communication cost, and 6. The computation cost for the client (for verification).
Previous work has addressed the problem of query authentication mostly for static scenarios,
where owners never issue data updates. In addition, existing solutions take into account only
a subset of the metrics proposed here, and hence are optimized only for particular scenarios
and not the general case. Finally, previous work was mostly of theoretical nature, analyzing
the performance of the proposed techniques using analytical cost formulas, and not taking into
account the fact that certain cryptographic primitives do not feature idealistic characteristics
in practice. For example, trying to minimize the I/O cost associated with the construction of
an authenticated structure does not take into account the fact that generating signatures using
popular public signature schemes is two times slower than a random disk page access on today’s
computers. To the best of our knowledge, no previous work ever conducted empirical evaluations
on a working prototype of existing techniques.
Our Contributions. In this work, we: 1. Conduct a methodical analysis of existing approaches
over all six metrics, 2. Propose a novel authenticated structure that best leverages all metrics,
3. Formulate detailed cost models for all techniques that take into account not only the usual
structural maintenance overheads, but the cost of cryptographic operations as well, 4. Discuss
the extensions of the proposed techniques for dynamic environments (where data is frequently
updated), 5. Consider possible solutions for guaranteeing query freshness, 6. Implement a fully
working prototype and perform a comprehensive experimental evaluation and comparison of all
We would like to point out that there are other security issues in ODB systems that are
orthogonal to the problems considered here. Examples include: privacy-preservation issues [14, 1,
10], secure query execution , security in conjunction with access control requirements [20, 29, 5]
and query execution assurance . In particular, query execution assurance of  does not
provide authentication: the server could pass the challenges and yet still return false query results.
The rest of the paper is organized as follows. Section 2 presents background on essential
cryptography tools, and a brief review of related work. Section 3 discusses the authenticated
index structures for static ODB scenarios. Section 4 extends the discussion to the dynamic case
and Section 5 addresses query freshness. Finally, the empirical evaluation is presented in Section
6. Section 7 concludes the paper.
Figure 1: Example of a Merkle hash tree.
The basic idea of the existing solutions to the query authentication problem is the following. The
owner creates a specialized data structure over the original database that is stored at the servers
together with the database. The structure is used by a server to provide a verification object VO
along with the answers, which the client can use for authenticating the results. Verification usually
occurs by means of using collision-resistant hash functions and digital signature schemes. Note
that in any solution, some information that is authentic to the owner must be made available
to the client; else, from the client’s point of view, the owner cannot be differentiated from a
(potentially malicious) server. Examples of such information include the owner’s public signature
verification key or a token that in some way authenticates the database. Any successful scheme
must make it computationally infeasible for a malicious server to find incorrect query results
and a verification object that will be accepted by a client who has the appropriate authentication
information from the owner.
2.1 Cryptography essentials
Collision-resistant hash functions. For our purposes, a hash function h is an efficiently
computable function that takes a variable-length input x to a fixed-length output y = h(x).
Collision resistance states that it is computationally infeasible to find two inputs, x_1 ≠ x_2, such
that h(x_1) = h(x_2). Collision-resistant hash functions can be built provably based on various
cryptographic assumptions, such as hardness of discrete logarithms. However, in this work
we concentrate on using heuristic hash functions, which have the advantage of being very fast to
evaluate, and specifically focus on SHA1, which takes variable-length inputs to 160-bit (20-byte)
outputs. SHA1 is currently considered collision-resistant in practice; we also note that any
eventual replacement to SHA1 developed by the cryptographic community can be used instead of
SHA1 in our solution.
Public-key digital signature schemes. A public-key digital signature scheme, formally defined
in , is a tool for authenticating the integrity and ownership of the signed message. In such
a scheme, the signer generates a pair of keys (SK,PK), keeps the secret key SK secret, and
publishes the public key PK associated with her identity. Subsequently, for any message m that
she sends, a signature s_m is produced by s_m = S(SK, m). The recipient of s_m and m can verify
s_m via V(PK, m, s_m), which outputs “valid” or “invalid.” A valid signature on a message assures
the recipient that the owner of the secret key intended to authenticate the message, and that
the message has not been changed. The most commonly used public digital signature scheme is
RSA . Existing solutions [26, 27, 21, 23] for the query authentication problem chose to use this
scheme, hence we adopt the common 1024-bit (128-byte) RSA. Its signing and verification cost is
one hash computation and one modular exponentiation with 1024-bit modulus and exponent.
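The S and V operations described above can be illustrated with a minimal hash-then-sign sketch of textbook RSA. The key is a toy assumption (two Mersenne primes, giving a modulus of about 150 bits rather than the 1024 bits used in this work), the names `digest`, `sign` and `verify` are ours, and no padding is applied, so this is for illustration only and is not a secure implementation.

```python
import hashlib

# Toy key material (assumption for illustration only): two known Mersenne primes.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus, about 150 bits
E = 65537                            # small, fixed public verification exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # secret exponent (needs Python 3.8+)

PK = (N, E)
SK = (N, D)


def digest(message: bytes, n: int) -> int:
    """SHA1 of the message, reduced into the range of the modulus."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big") % n


def sign(sk, message: bytes) -> int:
    """s_m = S(SK, m): one hash computation and one modular exponentiation."""
    n, d = sk
    return pow(digest(message, n), d, n)


def verify(pk, message: bytes, signature: int) -> str:
    """V(PK, m, s_m): compare s_m^e mod n with the recomputed hash."""
    n, e = pk
    return "valid" if pow(signature, e, n) == digest(message, n) else "invalid"


s_m = sign(SK, b"tuple 42: alice, 1000")
print(verify(PK, b"tuple 42: alice, 1000", s_m))   # unmodified message
print(verify(PK, b"tuple 42: alice, 9999", s_m))   # tampered message
```

The first call is expected to accept the signature and the second to reject it, since the tampered message hashes to a different value.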
Aggregating several signatures. In the case when t signatures s_1, ..., s_t on t messages
m_1, ..., m_t signed by the same signer need to be verified all at once, certain signature schemes
allow for more efficient communication and verification than t individual signatures. Namely, for
RSA it is possible to combine the t signatures into a single aggregated signature s_{1,t} that has the
same size as an individual signature and that can be verified (almost) as fast as an individual
signature. This technique is called Condensed-RSA. The combining operation can be done by
anyone, as it does not require knowledge of SK; moreover, the security of the combined signature
is the same as the security of individual signatures. In particular, aggregation of t RSA signatures
can be done at the cost of t−1 modular multiplications, and verification can be performed at the
cost of t−1 multiplications, t hashing operations, and one modular exponentiation (thus, the
computational gain is that t − 1 modular exponentiations are replaced by modular multiplications).
Note that aggregating signatures is possible only for some digital signature schemes.
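The aggregation step can be sketched as follows. This is a minimal illustration of the Condensed-RSA idea using unpadded textbook RSA with a toy key (two Mersenne primes, an assumption made only to keep the example short); the function names are ours. It relies on the multiplicative property of RSA: the product of the signatures, raised to the public exponent, equals the product of the message hashes modulo N.

```python
import hashlib
from functools import reduce

# Toy key (assumption for illustration): real deployments use a 1024-bit or larger modulus.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # secret exponent, known only to the signer


def digest(message: bytes) -> int:
    """SHA1 of the message, reduced modulo N."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Individual signature s_i on message m_i (requires the secret exponent)."""
    return pow(digest(message), D, N)


def condense(signatures):
    """Combine t signatures with t - 1 modular multiplications; no secret key needed."""
    return reduce(lambda acc, s: (acc * s) % N, signatures)


def verify_condensed(messages, s_agg) -> bool:
    """t hashes, a running product of them, and a single modular exponentiation."""
    product = 1
    for m in messages:
        product = (product * digest(m)) % N
    return pow(s_agg, E, N) == product


messages = [b"record 1", b"record 2", b"record 3"]
s_1_t = condense([sign(m) for m in messages])
print(verify_condensed(messages, s_1_t))
print(verify_condensed([b"record 1", b"forged", b"record 3"], s_1_t))
```

The aggregated value is a single residue modulo N, so it occupies the same space as one signature regardless of t.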
The Merkle Hash Tree. An improvement on the straightforward solution for authenticating
a set of data values is the Merkle hash tree (see Figure 1), first proposed by . It solves the
simplest form of the query authentication problem for point queries and datasets that can fit in
main memory. The Merkle hash tree is a binary tree, where each leaf contains the hash of a
data value, and each internal node contains the hash of the concatenation of its two children.
Verification of data values is based on the fact that the hash value of the root of the tree is
authentically published (authenticity can be established by a digital signature). To prove the
authenticity of any data value, all the prover has to do is to provide the verifier, in addition to the
data value itself, with the values stored in the siblings of the path that leads from the root of the
tree to that value. The verifier, by iteratively computing all the appropriate hashes up the tree,
at the end can simply check if the hash she has computed for the root matches the authentically
published value. The security of the Merkle hash tree is based on the collision-resistance of the
hash function used: it is computationally infeasible for a malicious prover to fake a data value,
since this would require finding a hash collision somewhere in the tree (because the root remains
the same and the leaf is different—hence, there must be a collision somewhere in between). Thus,
the authenticity of any one of n data values can be proven at the cost of providing and computing
log_2 n hash values, which is generally much cheaper than storing and verifying one digital signature
per data value. Furthermore, the relative position (leaf number) of any of the data values within
the tree is authenticated along with the value itself.
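The construction and verification procedure can be sketched in a few lines. This is a minimal in-memory illustration, assuming the number of data values is a power of two; the function names `build_levels`, `prove` and `verify` are ours, and the root is taken as authentic without modeling the owner's signature on it.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA1 digest (20 bytes)."""
    return hashlib.sha1(data).digest()


def build_levels(values):
    """Return all tree levels, leaves first. Assumes len(values) is a power of two."""
    level = [h(v) for v in values]
    levels = [level]
    while len(level) > 1:
        # Each internal node hashes the concatenation of its two children.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hash at every level on the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])   # sibling differs in the lowest bit
        index //= 2
    return proof


def verify(value, index, proof, root):
    """Recompute the hashes up the tree and compare with the published root."""
    node = h(value)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


values = [b"r1", b"r2", b"r3", b"r4", b"r5", b"r6", b"r7", b"r8"]
levels = build_levels(values)
root = levels[-1][0]                     # assumed to be authentically published
proof = prove(levels, 5)
print(len(proof), verify(b"r6", 5, proof, root))
```

For n = 8 values the proof holds log_2 8 = 3 sibling hashes. Because the leaf index decides the concatenation order at each level, presenting the right value at the wrong position also fails verification.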
Cost models for SHA1, RSA and Condensed-RSA. Since all existing authenticated struc-
tures are based on SHA1 and RSA, it is imperative to evaluate the relative cost of these operations
in order to be able to draw conclusions about which is the best alternative in practice. Based
on experiments with two widely used cryptography libraries, Crypto++ and OpenSSL,
we obtained results for hashing, signing, verifying and performing modular multiplications. On our
testbed computer, one hashing operation takes approximately 2 to 3 µs. Modular
multiplication, signing and verifying are, respectively, approximately 100, 10,000 and 1,000 times
slower than hashing (verification is faster than signing due to the fact that the public verification
exponent can be fixed to a small value).
Thus, it is clear that multiplication, signing and verification operations are very expensive,
and comparable to random disk page accesses. The cost of these operations needs to be taken into
account in practice, for the proper design of authenticated structures. In addition, since the cost
of hashing is orders of magnitude smaller than that of signing, it is essential to design structures
that use as few signing operations as possible, and hashing instead.
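The relative costs above translate into rough absolute figures with a short back-of-envelope calculation. The sketch below assumes 2.5 µs per hash (the midpoint of the reported 2 to 3 µs range) and approximates the single modular exponentiation in Condensed-RSA verification by the cost of one ordinary verification; the function names are ours and the numbers are estimates, not measurements.

```python
import math

HASH_US = 2.5   # assumed midpoint of the reported 2 to 3 microseconds per hash
RELATIVE_COST = {"hash": 1, "multiply": 100, "sign": 10_000, "verify": 1_000}


def cost_us(operation: str, count: int = 1) -> float:
    """Estimated time in microseconds for `count` operations of one kind."""
    return count * RELATIVE_COST[operation] * HASH_US


def individual_verification_us(t: int) -> float:
    """Verifying t RSA signatures one by one."""
    return cost_us("verify", t)


def condensed_verification_us(t: int) -> float:
    """t - 1 multiplications, t hashes and one exponentiation (approximated by 'verify')."""
    return cost_us("multiply", t - 1) + cost_us("hash", t) + cost_us("verify")


def merkle_proof_us(n: int) -> float:
    """Recomputing the log2(n) hashes on a Merkle path (root signature check excluded)."""
    return cost_us("hash", math.ceil(math.log2(n)))


for op in RELATIVE_COST:
    print(f"{op:>8}: {cost_us(op) / 1000:.4f} ms")
print("100 individual verifications:", individual_verification_us(100) / 1000, "ms")
print("condensed verification, t=100:", condensed_verification_us(100) / 1000, "ms")
print("Merkle path, n = 2^20:", merkle_proof_us(2**20), "µs")
```

Under these assumptions a single signing operation costs about 25 ms, which is why the number of signatures an authenticated structure needs dominates its construction and update cost.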
Figure 12: Authentication cost. (a) VO size. (b) Verification time.
Figure 13 summarizes the results for insertion operations. The ASB-tree requires computing
between σN_D + 1 and 2σN_D signatures. Essentially, every newly inserted tuple
requires two signature computations, unless two new tuples are consecutive in order, in which
case one signature computation can be avoided. Since the update operations are uniformly
distributed, only a few such pairs are expected on average. Figure 13(a) verifies this claim. The other
structures are not included in the graph since they only require one signature re-computation in
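The bounds on the number of ASB-tree signatures can be checked with a small counting sketch. It rests on an assumption not restated in this section: that each signature covers a pair of tuples that are consecutive in sort order, so a signature must be recomputed whenever at least one tuple of a consecutive pair is new. The function name and the sample keys are hypothetical, and k stands for the number of inserted tuples (our reading of σN_D), all of which fall strictly inside the existing key range.

```python
def signatures_to_recompute(existing_keys, new_keys):
    """Count consecutive pairs, after insertion, that involve at least one new key."""
    new = set(new_keys)
    merged = sorted(set(existing_keys) | new)
    return sum(1 for a, b in zip(merged, merged[1:]) if a in new or b in new)


existing = [10, 20, 30, 40]

# Two inserts that are not neighbours: each one needs two signatures (2k in total).
print(signatures_to_recompute(existing, [15, 35]))

# Two inserts that end up adjacent share one pair, saving a signature (k + 1 in total).
print(signatures_to_recompute(existing, [15, 16]))
```

With k inserted tuples the count therefore ranges from k + 1, when all new tuples form one consecutive run, to 2k, when no two new tuples are adjacent.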
The total number of pages affected is shown in Figure 13(b). The ASB-tree needs to update
both the pages containing the affected signatures and the B+-tree structure. Clearly, the signature
updates dominate the cost, as they are linear in the number of update operations. Other structures
need to update only the nodes of the index. Trees with smaller fanout result in a larger number of
affected pages. Even though the EMB−-tree and MB-tree have smaller fanout than the ASB-tree,
they produce a much smaller number of affected pages. The EMB-tree and EMB∗-tree produce the
largest number of affected pages. Part of the reason is that in our experiments all indexes
are bulk-loaded with 70% utilization and the update workloads contain only insertions. This
quickly leads to many split operations, especially for indexes with small fanout, which create a
lot of new pages.
Another contributing factor to the update cost is the computation overhead. As
Figure 13(c) shows, the ASB-tree has the worst performance: its cost is orders
of magnitude larger than that of all other indexes, as it has to perform a linear number of signature
computations (w.r.t. the number of update operations). For the other indexes, the computation cost
is mainly due to hashing operations and index maintenance. As Figure 13(d)
shows, the total update cost is simply the page I/O cost plus the computation cost. Our proposed
structures are the clear winners. Finally, the communication cost incurred by update operations
is equal to the number of pages affected.
The experimental results clearly show that the authenticated structures proposed in this paper
perform better than the state-of-the-art with respect to all metrics except the VO size. Still,
our optimizations reduced the size to four times the size of the VO of the ASB-tree. Overall,
the EMB−-tree gives the best trade-off between all performance metrics, and it should be the
preferred technique in the general case. By adjusting the fanout of the embedded trees, we obtain
a nice trade-off between query (VO) size, verification time, construction (update) time and storage
Figure 13: Performance analysis for insertions. (a) Number of signature re-computations (axis: number of signatures to recompute). (b) Number of pages affected (axis: total pages accessed). (c) Computation cost. (d) Total update time.
We presented a comprehensive evaluation of authenticated index structures based on a variety
of cost metrics and taking into account the cost of cryptographic operations, as well as that of
index maintenance. We proposed a novel structure that achieves good performance across all
metrics. We extended the work to dynamic environments, a facet that had not been explored
in the past. We also formulated the problem of query freshness, which is a direct outcome of
the dynamic case. Finally, we presented a comprehensive experimental evaluation to verify our
claims. For future work, we plan to explore other directions for guaranteeing query freshness,
extend our ideas to multidimensional structures, and explore more involved types of queries.
Acknowledgments This work was partially supported by NSF grants IIS-0133825, CCR-0311485
and CCF-0515100. The authors would like to thank the anonymous reviewers for their constructive
[1] R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proc. of ACM Management of Data (SIGMOD), pages 439–450, 2000.
[2] Authenticated Index Structures Library. http://cs-people.bu.edu/lifeifei/aisl/.
[3] E. Bertino, B. Carminati, E. Ferrari, B. Thuraisingham, and A. Gupta. Selective and authentic third-party distribution of XML documents. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(10):1263–1278, 2004.
[4] L. Bouganim, C. Cremarenco, F. D. Ngoc, N. Dieu, and P. Pucheral. Safe data sharing and data dissemination on smart devices. In Proc. of ACM Management of Data (SIGMOD), pages 888–890,
[5] L. Bouganim, F. D. Ngoc, P. Pucheral, and L. Wu. Chip-secured data access: Reconciling access rights with data encryption. In Proc. of Very Large Data Bases (VLDB), pages 1133–1136, 2003.
[6] D. Comer. The ubiquitous B-tree. ACM Computing Surveys, 11(2):121–137, 1979.
[7] Crypto++ Library. http://www.eskimo.com/~weidai/cryptlib.html.
[8] P. Devanbu, M. Gertz, C. Martel, and S. Stubblebine. Authentic data publication over the internet. Journal of Computer Security, 11(3):291–314, 2003.
[9] P. Devanbu, M. Gertz, C. Martel, and S. G. Stubblebine. Authentic third-party data publication. In IFIP Workshop on Database Security (DBSec), pages 101–112, 2000.
[10] A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Proc. of ACM Symposium on Principles of Database Systems (PODS), pages 211–222,
[11] S. Goldwasser, S. Micali, and R. L. Rivest. A digital signature scheme secure against adaptive chosen-message attacks. SIAM Journal on Computing, 17(2):96–99, 1988.
[12] H. Hacigumus, B. R. Iyer, C. Li, and S. Mehrotra. Executing SQL over encrypted data in the database service provider model. In Proc. of ACM Management of Data (SIGMOD), pages 216–227, 2002.
[13] H. Hacigumus, B. R. Iyer, and S. Mehrotra. Providing database as a service. In Proc. of International Conference on Data Engineering (ICDE), pages 29–40, 2002.
[14] B. Hore, S. Mehrotra, and G. Tsudik. A privacy-preserving index for range queries. In Proc. of Very Large Data Bases (VLDB), pages 720–731, 2004.
[15] F. Li, M. Hadjieleftheriou, G. Kollios, and L. Reyzin. Authenticated Index Structures for Outsourced Database Systems. Technical Report BUCS-TR 2006-004, CS Department, Boston University, 2006.
[16] C. Martel, G. Nuckolls, P. Devanbu, M. Gertz, A. Kwong, and S. Stubblebine. A general model for authenticated data structures. Algorithmica, 39(1):21–41, 2004.
[17] K. McCurley. The discrete logarithm problem. In Proc. of the Symposium in Applied Mathematics, pages 49–74. American Mathematical Society, 1990.
[18] R. C. Merkle. A certified digital signature. In Proc. of Advances in Cryptology (CRYPTO), pages
[19] S. Micali. Efficient certificate revocation. Technical Report MIT/LCS/TM-542b, Massachusetts Institute of Technology, Cambridge, MA, March 1996.
[20] G. Miklau and D. Suciu. Controlling access to published data using cryptography. In Proc. of Very Large Data Bases (VLDB), pages 898–909, 2003.
[21] E. Mykletun, M. Narasimha, and G. Tsudik. Authentication and integrity in outsourced databases. In Symposium on Network and Distributed Systems Security (NDSS), 2004.
[22] E. Mykletun, M. Narasimha, and G. Tsudik. Signature bouquets: Immutability for aggregated/condensed signatures. In European Symposium on Research in Computer Security (ESORICS), pages 160–176, 2004.
[23] M. Narasimha and G. Tsudik. DSAC: Integrity of outsourced databases with signature aggregation and chaining. In Proc. of Conference on Information and Knowledge Management (CIKM), pages
[24] National Institute of Standards and Technology. FIPS PUB 180-1: Secure Hash Standard. National Institute of Standards and Technology, 1995.
[25] OpenSSL. http://www.openssl.org.
[26] H. Pang, A. Jain, K. Ramamritham, and K.-L. Tan. Verifying completeness of relational query results in data publishing. In Proc. of ACM Management of Data (SIGMOD), pages 407–418, 2005.
[27] H. Pang and K.-L. Tan. Authenticating query results in edge computing. In Proc. of International Conference on Data Engineering (ICDE), pages 560–571, 2004.
[28] R. L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM (CACM), 21(2):120–126, 1978.
[29] S. Rizvi, A. Mendelzon, S. Sudarshan, and P. Roy. Extending query rewriting techniques for fine-grained access control. In Proc. of ACM Management of Data (SIGMOD), pages 551–562, 2004.
[30] R. Sion. Query execution assurance for outsourced databases. In Proc. of Very Large Data Bases (VLDB), pages 601–612, 2005.
[31] R. Tamassia and N. Triandopoulos. Efficient Content Authentication over Distributed Hash Tables. Technical report, CS Department, Brown University, 2005.