Divyakant Agrawal

Divyakant Agrawal
University of California, Santa Barbara | UCSB · CCS - Department of Computer Science

About

430
Publications
39,825
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
14,401
Citations
Introduction
Skills and Expertise

Publications

Publications (430)
Article
The tutorial focuses on Private Information Retrieval (PIR), which allows clients to privately query public or server-owned databases without disclosing their queries. The tutorial covers the basic concepts of PIR such as its types, construction, and critical building blocks, including homomorphic encryption. It also discusses the performance of PI...
Article
The Diversity, Equity and Inclusion (DEI) initiative started as the Diversity/Inclusion initiative in 2020 [4]. The current report summarizes our activities in 2022. Our responsibility as a community is to ensure that attendees of DB conferences feel included, irrespective of their scientific perspective and personal background. One of the first st...
Preprint
The Log Structured Merge Trees (LSM-tree) based key-value stores are widely used in many storage systems to support a variety of operations such as updates, point reads, and range reads. Traditionally, LSM-tree's merge policy organizes data into multiple levels of exponentially increasing capacity to support high-speed writes. However, we contend t...
Article
System operators are often interested in extracting different feature streams from multi-dimensional data streams; and reporting their distributions at regular intervals, including the heavy hitters that contribute to the tail portion of the feature distribution. Satisfying these requirements to increase data rates with limited resources is challen...
Article
Consider a cloud server that owns a key-value store and provides a private query service to its clients. Preserving client privacy in this setting is difficult because the key-value store is public , and a client cannot encrypt or modify it. Therefore, privacy in this context implies hiding the access pattern of a client. Pantheon is a system that...
Conference Paper
Full-text available
Edge computing while bringing computation and data closer to users in order to improve response time, distributes edge servers in wide area networks resulting in increased communication latency between the servers. Synchronizing globally distributed edge servers, especially in the presence of Byzantine servers, becomes costly due to the high commun...
Article
Today's large-scale data management systems need to address distributed applications' confidentiality and scalability requirements among a set of collaborative enterprises. This paper presents Qanaat , a scalable multi-enterprise permissioned blockchain system that guarantees the confidentiality of enterprises in collaboration workflows. Qanaat pre...
Preprint
Linear sketches have been widely adopted to process fast data streams, and they can be used to accurately answer frequency estimation, approximate top K items, and summarize data distributions. When data are sensitive, it is desirable to provide privacy guarantees for linear sketches to preserve private information while delivering useful results w...
Preprint
Full-text available
Byzantine fault-tolerant protocols cover a broad spectrum of design dimensions from environmental setting on communication topology, to more technical features such as commitment strategy and even fundamental social choice related properties like order fairness. Designing and building BFT protocols remains a laborious task despite of years of inten...
Article
In this paper, we propose the first deterministic algorithms to solve the frequency estimation and frequent item problems in the bounded-deletion model. We establish the space lower bound for solving the deterministic frequent items problem in the bounded-deletion model, and propose Lazy SpaceSaving ± and SpaceSaving ± algorithms with optimal space...
Preprint
In this paper, we propose the first deterministic algorithms to solve the frequency estimation and frequent item problems in the bounded deletion model. We establish the space lower bound for solving the deterministic frequent items problem in the bounded deletion model, and propose the Lazy SpaceSaving$^\pm$ and SpaceSaving$^\pm$ algorithms with o...
Preprint
Full-text available
Today's large-scale data management systems need to address distributed applications' confidentiality and scalability requirements among a set of collaborative enterprises. In this paper, we present Qanaat, a scalable multi-enterprise permissioned blockchain system that guarantees confidentiality. Qanaat consists of multiple enterprises where each...
Conference Paper
Metadata from voice calls, such as the knowledge of who is communicating with whom, contains rich information about people’s lives. Indeed, it is a prime target for powerful adversaries such as nation states. Existing systems that hide voice call metadata either require trusted intermediaries in the network or scale to only tens of users. This pape...
Article
This errata article discusses and corrects a minor error in our work published in VLDB 2019. The discrepancy specifically pertains to Algorithms 3 and 4. The algorithms presented in the paper are biased towards a commit decision in a specific failure scenario. We explain the error using an example before correcting the algorithm.
Article
Recently the long standing problem of optimal construction of quantile sketches was resolved by K arnin, L ang, and L iberty using the KLL sketch (FOCS 2016). The algorithm for KLL is restricted to online insert operations and no delete operations. For many real-world applications, it is necessary to support delete operations. When the data set is...
Conference Paper
Full-text available
The unique features of blockchains such as immutability, transparency, provenance, and authenticity have been used by many large-scale data management systems to deploy a wide range of distributed applications including supply chain management, health-care, and crowdworking in a permissioned setting. Unlike permissionless settings, e.g., Bitcoin, w...
Preprint
Distributed caches are widely deployed to serve social networks and web applications at billion-user scales. This paper presents Cache-on-Track (CoT), a decentralized, elastic, and predictive caching framework for cloud environments. CoT proposes a new cache replacement policy specifically tailored for small front-end caches that serve skewed workl...
Preprint
Full-text available
Despite recent intensive research, existing crowdworking systems do not adequately address all the requirements of a real-world crowdworking environment. First, crowdworking platforms need to integrate within society and in particular to interface with legal and social institutions. Global regulations must be enforced, such as minimal and maximal w...
Article
The recent adoption of blockchain technologies and open permissionless networks suggest the importance of peer-to-peer atomic cross-chain transaction protocols. Users should be able to atomically exchange tokens and assets without depending on centralized intermediaries such as exchanges. Recent peer-to-peer atomic cross-chain swap protocols use ha...
Conference Paper
Full-text available
Modern large-scale data management systems utilize consensus protocols to provide fault tolerance. Consensus protocols are extensively used in the distributed database infrastructure of large enterprises such as Google, Amazon, and Facebook as well as permissioned blockchain systems like IBM's Hyperledger Fabric. In the last four decades, numerous...
Preprint
Full-text available
Significant amounts of data are currently being stored and managed on third-party servers. It is impractical for many small scale enterprises to own their private datacenters, hence renting third-party servers is a viable solution for such businesses. But the increasing number of malicious attacks, both internal and external, as well as buggy softw...
Preprint
Full-text available
Scalability is one of the main roadblocks to business adoption of blockchain systems. Despite recent intensive research on using sharding techniques to enhance the scalability of blockchain systems, existing solutions do not efficiently address cross-shard transactions. In this paper, we introduce SharPer, a permissioned blockchain system that enha...
Preprint
Full-text available
This paper presents TXSC, a framework that provides smart contract developers with transaction primitives. These primitives allow developers to write smart contracts without the need to reason about the anomalies that can arise due to concurrent smart contract function executions.
Conference Paper
Full-text available
Despite recent intensive research, existing blockchain systems do not adequately address all the characteristics of distributed applications. In particular, distributed applications collaborate with each other following service level agreements (SLAs) to provide different services. While collaboration between applications, e.g., cross-application t...
Conference Paper
Full-text available
IoT devices influence many different spheres of society and are predicted to have a huge impact on our future. Extracting real-time insights from diverse sensor data and dealing with the underlying uncertainty of sensor data are two main challenges of the IoT ecosystem In this paper, we propose a data processing architecture, M-DB, to effectively i...
Conference Paper
Full-text available
Permissioned Blockchain systems rely mainly on Byzantine fault-tolerant protocols to establish consensus on the order of transactions. While Byzantine fault-tolerant protocols mostly guarantee consistency (safety) in an asynchronous network using 3f+1 machines to overcome the simultaneous malicious failure of any f nodes, in many systems, e.g., blo...
Article
Full-text available
Despite recent intensive research, existing blockchain systems do not adequately address all the characteristics of distributed applications. In particular, distributed applications collaborate with each other following service level agreements (SLAs) to provide different services. While collaboration between applications, e.g., cross-application t...
Preprint
Full-text available
Large scale data management systems utilize State Machine Replication to provide fault tolerance and to enhance performance. Fault-tolerant protocols are extensively used in the distributed database infrastructure of large enterprises such as Google, Amazon, and Facebook, as well as permissioned blockchain systems like IBM's Hyperledger Fabric. How...
Conference Paper
Full-text available
The uprise of Bitcoin and other peer-to-peer cryptocurrencies has opened many interesting and challenging problems in cryptography, distributed systems, and databases. The main underlying data structure is blockchain, a scalable fully replicated structure that is shared among all participants and guarantees a consistent view of all user transaction...
Preprint
Full-text available
Permissionless blockchains (e.g., Bitcoin, Ethereum, etc) have shown a wide success in implementing global scale peer-to-peer cryptocurrency systems. In such blockchains, new currency units are generated through the mining process and are used in addition to transaction fees to incentivize miners to maintain the blockchain. Although it is clear how...
Preprint
Full-text available
The recent adoption of blockchain technologies and open permissionless networks suggest the importance of peer-to-peer atomic cross-chain transaction protocols. Users should be able to atomically exchange tokens and assets without depending on centralized intermediaries such as exchanges. Recent peer-to-peer atomic cross-chain swap protocols use ha...
Article
Data storage in the Cloud needs to be scalable and fault-tolerant. Atomic commitment protocols such as Two Phase Commit (2PC) provide ACID guarantees for transactional access to sharded data and help in achieving scalability. Whereas consensus protocols such as Paxos consistently replicate data across different servers and provide fault tolerance....
Article
Full-text available
Bitcoin is a successful and interesting example of a global scale peer-to-peer cryptocurrency that integrates many techniques and protocols from cryptography, distributed systems, and databases. The main underlying data structure is blockchain, a scalable fully replicated structure that is shared among all participants and guarantees a consistent v...
Conference Paper
In this paper, we propose Dynamic Paxos (DPaxos), a Paxos-based consensus protocol to manage access to partitioned data across globally-distributed datacenters and edge nodes. DPaxos is intended to implement a State Machine Replication component in data management systems for the edge. DPaxos targets the unique opportunities of utilizing edge compu...
Article
Cloud-based data-intensive applications have to process high volumes of transactional and analytical requests on large-scale data. Businesses base their decisions on the results of analytical requests, creating a need for real-time analytical processing. We propose Janus, a hybrid scalable cloud datastore, which enables the efficient execution of d...
Article
Today's web applications and social networks are serving billions of users around the globe. These users generate billions of key lookups and millions of data object updates per second. A single user's social network page load requires hundreds of key lookups. This scale creates many design challenges for the underlying storage systems. First, thes...
Conference Paper
Today's web applications and social networks are serving billions of users around the globe. These users generate billions of key lookups and millions of data object updates per second. A single user's social network page load requires hundreds of key lookups. This scale creates many design challenges for the underlying storage systems. First, thes...
Article
The widely adopted single-threaded OLTP model assigns a single thread to each static partition of the database for processing transactions in a partition. This simplifies concurrency control while retaining parallelism. However, it suffers performance loss arising from skewed workloads as well as transactions that span multiple partitions. In this...
Conference Paper
By maintaining the data in main memory, in-memory databases dramatically reduce the I/O cost of transaction processing. However, for recovery purposes, in-memory systems still need to flush the log to disk, which incurs a substantial number of I/Os. Recently, command logging has been proposed to replace the traditional data log (e.g., ARIES logging...
Conference Paper
We describe Shasta, a middleware system built at Google to support interactive reporting in complex user-facing applications related to Google's Internet advertising business. Shasta targets applications with challenging requirements: First, user query latencies must be low. Second, underlying transactional data stores have complex "read-unfriendly...
Conference Paper
Global-scale data management (GSDM) empowers systems by providing higher levels of fault-tolerance, read availability, and efficiency in utilizing cloud resources. This has led to the emergence of global-scale data management and event processing. However, the Wide-Area Network (WAN) latency separating data is orders of magnitude larger than conven...
Conference Paper
Geo-replication is the process of maintaining copies of data at geographically dispersed datacenters for better availability and fault-tolerance. The distinguishing characteristic of geo-replication is the large wide-area latency between datacenters that varies widely depending on the location of the datacenters. Thus, choosing which datacenters to...
Article
Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-Time data ingestion and retrieval, as well as high availability, reliability, fault tolera...
Conference Paper
Data cubes are widely used as a powerful tool to provide multi-dimensional views in data warehousing and On-Line Analytical Processing (OLAP). However, with increasing data sizes, it is becoming computationally expensive to perform data cube analysis. In this paper, we introduce HaCube, an extension of MapReduce, designed for efficient parallel dat...
Article
Nowadays, we are witnessing the fast production of very large amount of data, particularly by the users of online systems on the Web. However, processing this big data is very challenging since both space and computational requirements are hard to satisfy. One solution for dealing with such requirements is to take advantage of parallel frameworks,...
Conference Paper
Although MapReduce has been praised for its high scalability and fault tolerance, it has been criticized in some points, in particular, its poor performance in the case of data skew. There are important cases where a high percentage of processing in the reduce side is done by a few nodes, or even one node, while the others remain idle. There have b...
Article
We propose a new data-centric synchronization framework for carrying out of machine learning (ML) tasks in a distributed environment. Our framework exploits the iterative nature of ML algorithms and relaxes the application agnostic bulk synchronization parallel (BSP) paradigm that has previously been used for distributed machine learning. Data-cent...
Conference Paper
Cross datacenter replication is increasingly being deployed to bring data closer to the user and to overcome datacenter outages. The extent of the influence of wide-area communication on serializable transactions is not yet clear. In this work, we derive a lower-bound on commit latency. The sum of the commit latency of any two datacenters is at lea...
Conference Paper
For data-intensive applications with many concurrent users, modern distributed main memory database management systems (DBMS) provide the necessary scale-out support beyond what is possible with single-node systems. These DBMSs are optimized for the short-lived transactions that are common in on-line transaction processing (OLTP) workloads. One way...
Article
With large elastic and scalable infrastructures, the Cloud is the ideal storage repository for Big Data applications. Big Data is typically characterized by three V's: Volume, Variety and Velocity. Supporting these properties raises significant challenges in a cloud setting, including partitioning for scale out; replication across data centers for...
Conference Paper
Big Data provides an opportunity to interrogate some of the deepest scientific mysteries, e.g., how the brain works and develop new technologies, like driverless cars which, till very recently, were more in the realm of science fiction than reality. However Big Data as an entity in its own right creates several computational and statistical challen...
Article
Multicore CPUs and large memories are increasingly becoming the norm in modern computer systems. However, current database management systems (DBMSs) are generally ineffective in exploiting the parallelism of such systems. In particular, contention can lead to a dramatic fall in performance. In this paper, we propose a new concurrency control proto...
Article
By maintaining the data in main memory, in-memory databases dramatically reduce the I/O cost of transaction processing. However, for recovery purpose, those systems still need to flush the logs to disk, generating a significant number of I/Os. A new type of logs, the command log, is being employed to replace the traditional data log (e.g., ARIES lo...
Article
Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tol...
Article
In this collaborative keynote address, we will share Google's experience in building a scalable data infrastructure that leverages datacenters for managing Google's advertising data over the last decade. In order to support the massive online advertising platform at Google, the data infrastructure must simultaneously support both transactional and...
Article
Data outsourcing or database as a service is a new paradigm for data management. The third party service provider hosts databases as a service. These parties provide efficient and cheap data management by obviating the need to purchase expensive hardware and software, deal with software upgrades and hire professionals for administrative and mainten...
Conference Paper
With hundreds of millions of users worldwide, social networks provide incredible opportunities for social connection, learning, political and social change, and individual entertainment and enhancement in a multiple contexts. Because many social interactions currently take place in online networks, social scientists have access to unprecedented amo...
Conference Paper
Attributed graphs are becoming important tools for modeling information networks, such as the Web and various social networks (e.g. Facebook, LinkedIn, Twitter). However, it is computationally challenging to manage and analyze attributed graphs to support effective decision making. In this paper, we propose, Pagrol, a parallel graph OLAP (Online An...
Conference Paper
Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tol...
Article
Using geosocial applications, such as FourSquare, millions of people interact with their surroundings through their friends and their recommendations. Without adequate privacy protection, however, these systems can be easily misused, for example, to track users or target them for home invasion. In this paper, we introduce LocX, a novel alternative...
Article
The past decade has witnessed an increasing adoption of cloud database technology, which provides better scalability, availability, and fault-tolerance via transparent partitioning and replication, and automatic load balancing and fail-over. However, only a small number of cloud databases provide strong consistency guarantees for distributed transa...
Article
The First Law of Geography states "Everything is related to everything else, but near things are more related than distant things". This spatial significance has implications in various applications, trend detection being one of them. In this paper we propose a new algorithmic tool, GeoScope, to detect geo-trends. GeoScope is a data streams solutio...
Article
Full-text available
Data cubes are widely used as a powerful tool to provide multidimensional views in data warehousing and On-Line Analytical Processing (OLAP). However, with increasing data sizes, it is becoming computationally expensive to perform data cube analysis. The problem is exacerbated by the demand of supporting more complicated aggregate functions (e.g. C...
Article
In many scientific applications, it is critical to determine if there is a relationship between a combination of objects. The strength of such an association is typically computed using some statistical measures. In order not to miss any important associations, it is not uncommon to exhaustively enumerate all possible combinations of a certain size...
Conference Paper
Reducing data transfer in MapReduce’s shuffle phase is very important because it increases data locality of reduce tasks, and thus decreases the overhead of job executions. In the literature, several optimizations have been proposed to reduce data transfer between mappers and reducers. Nevertheless, all these approaches are limited by how intermedi...
Article
Advances in operating system and storage-level virtualization technologies have enabled the effective consolidation of heterogeneous applications in a shared cloud infrastructure. Novel research challenges arising from this new shared environment include load balancing, workload estimation, resource isolation, machine replication, live migration, a...
Article
Web service providers have been using NoSQL datastores to provide scalability and availability for globally distributed data at the cost of sacrificing transactional guarantees. Recently, major web service providers like Google have moved towards building storage systems that provide ACID transactional guarantees for globally distributed data. For...
Conference Paper
A multitenant database management system (DBMS) in the cloud must continuously monitor the trade-off between efficient resource sharing among multiple application databases (tenants) and their performance. Considering the scale of \attn{hundreds to} thousands of tenants in such multitenant DBMSs, manual approaches for continuous monitoring are not...
Conference Paper
Technology trends are not only transforming the hardware landscape of end-user devices but are also dramatically changing the types of software applications that are deployed on these devices. With the maturity of cloud computing during the past few years, users increasingly rely on networked applications that are deployed in the cloud. In particul...
Article
A database management system (DBMS) serving a cloud platform must handle large numbers of application databases (or tenants) that are characterized by diverse schemas, varying footprints, and unpredictable load patterns. Scaling out using clusters of commodity servers and sharing resources among tenants (i.e., multitenancy) are important features o...
Conference Paper
Over the past few years, cloud computing and the growth of global large scale computing systems have led to applications which require data management across multiple datacenters. Initially the models provided single row level transactions with eventual consistency. Although protocols based on these models provide high availability, they are not id...
Conference Paper
Consolidation of multiple databases on the same server allows service providers to save significant resources because many production database servers are often under-utilized. Recent research investigates the problem of minimizing the number of servers required to host a set of tenants when the working sets of tenants are kept in main memory (e.g....
Article
Reducing data transfer in MapReduce's shuffle phase is very important because it increases data locality of reduce tasks, and thus decreases the overhead of job executions. In the literature, several optimizations have been proposed to reduce data transfer between mappers and reducers. Nevertheless, all these approaches are limited by how intermedi...
Chapter
With the growing popularity of the Internet, many applications and services started being delivered over the Internet and the scale of these applications also increased rapidly. As a result, many Internet companies, such as Google, Yahoo!, and Amazon, faced the challenge of serving hundreds of thousands to millions of concurrent users with ever inc...
Chapter
In the previous chapters of this book, we focused on large-scale applications that needed their databases to scale to thousands of transactions per seconds and span tens of thousands of servers within a data center or across geographically separated data centers. In this chapter, we shift our focus to another class of applications typically observe...
Chapter
The foundations of cloud computing are based on many of the fundamental concepts, protocols and models developed over the years in Computer Science, and especially in distributed computing and distributed data management. In this chapter, we cover some of the basic background in distributed systems and data management, which forms the foundation of...
Chapter
Key-value stores, such as Bigtable, PNUTS, and Dynamo, were designed to support single-operation and single data item transactions. Such operational semantics were sufficient for the initial set of applications for which these systems were designed. As these early data management platforms for the cloud became mature, an expanded set of application...
Chapter
In the previous chapter, we discussed abstractions and techniques to efficiently support transactional semantics on physically or logically co-located data. In this chapter, we discuss a set of approaches that do not require co-location (neither logical nor physical) of data accessed by a transaction. Distributed synchronization is inherent to such...