Daniel J. Abadi’s research while affiliated with University of Maryland, College Park and other places

Publications (96)


FileScale: Fast and Elastic Metadata Management for Distributed File Systems
  • Conference Paper

October 2023 · 25 Reads · 1 Citation

Gang Liao · Daniel J. Abadi

Detock: High Performance Multi-region Transactions at Scale

June 2023 · 26 Reads · 3 Citations

Proceedings of the ACM on Management of Data

Many globally distributed data stores need to replicate data across large geographic distances. Since synchronously replicating data across such distances is slow, systems with high consistency requirements often geo-partition data and direct all linearizable requests to the primary region of the accessed data. This significantly improves performance for workloads where most transactions access data close to where they originate. However, supporting serializable multi-geo-partition transactions is a challenge, and they often degrade the performance of the whole system. This becomes even more challenging when they conflict with single-partition requests: optimistic protocols lead to high numbers of aborts, and pessimistic protocols lead to high numbers of distributed deadlocks. In this paper, we describe the design of concurrency control and deadlock resolution protocols, built within a practical, complete implementation of a geographically replicated database system called Detock, that enable processing strictly serializable multi-region transactions with near-zero performance degradation under extremely high conflict, an order of magnitude higher throughput relative to state-of-the-art geo-replication approaches, and latency improvements of up to a factor of 5.
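
The routing idea described above can be made concrete with a toy sketch. This is a minimal illustration under assumed names (HOME_REGION, Txn, and Router are invented), not Detock's actual protocol: transactions whose data is homed in a single region are ordered only by that region's local log, while multi-region transactions are placed into one deterministic global order that every region observes identically, so all regions acquire locks, and resolve would-be deadlocks, in the same way.

    # Illustrative sketch only; names and structures are hypothetical.
    from dataclasses import dataclass
    from itertools import count

    HOME_REGION = {"alice": "us-east", "bob": "us-east", "carol": "eu-west"}

    @dataclass
    class Txn:
        txn_id: int
        read_set: set
        write_set: set

        def regions(self):
            return {HOME_REGION[k] for k in self.read_set | self.write_set}

    class Router:
        def __init__(self):
            self._seq = count(1)       # stand-in for a global ordering step
            self.regional_logs = {}    # region -> transactions ordered locally
            self.global_log = []       # one deterministic order seen by all regions

        def submit(self, txn: Txn):
            regions = txn.regions()
            if len(regions) == 1:
                # Fast path: ordered only by the accessed data's home region.
                self.regional_logs.setdefault(regions.pop(), []).append(txn)
            else:
                # Multi-region path: every region applies these transactions at
                # the same deterministic position, so lock acquisition order
                # (and hence deadlock resolution) is identical everywhere.
                self.global_log.append((next(self._seq), txn))

    router = Router()
    router.submit(Txn(1, {"alice"}, {"bob"}))    # single-home: stays in us-east
    router.submit(Txn(2, {"alice"}, {"carol"}))  # multi-home: globally ordered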


C5: cloned concurrency control that always keeps up

November 2022 · 11 Reads · 3 Citations

Proceedings of the VLDB Endowment

Jeffrey Helt · Abhinav Sharma · Daniel J. Abadi · [...]

Asynchronously replicated primary-backup databases are commonly deployed to improve availability and offload read-only transactions. To both apply replicated writes from the primary and serve read-only transactions, the backups implement a cloned concurrency control protocol. The protocol ensures read-only transactions always return a snapshot of state that previously existed on the primary. This compels the backup to exactly copy the commit order resulting from the primary's concurrency control. Existing cloned concurrency control protocols guarantee this by limiting the backup's parallelism. As a result, the primary's concurrency control executes some workloads with more parallelism than these protocols. In this paper, we prove that this parallelism gap leads to unbounded replication lag, where writes can take arbitrarily long to replicate to the backup, a problem that has led to catastrophic failures in production systems. We then design C5, the first cloned concurrency control protocol to provide bounded replication lag. We implement two versions of C5. Our evaluation in MyRocks, a widely deployed database, demonstrates that C5 provides bounded replication lag. Our evaluation in Cicada, a recent in-memory database, demonstrates that C5 keeps up with even the fastest of primaries.
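
The lag-versus-consistency tension above can be illustrated with a toy backup replica. This is a sketch under assumed semantics (the Backup class and its methods are invented, not C5's design): replicated transactions may be replayed out of order by parallel workers, but only the contiguous prefix of the primary's commit order is published, so read-only transactions always observe a state that actually existed on the primary.

    # Toy illustration of prefix-consistent replay on a backup replica.
    import threading

    class Backup:
        def __init__(self):
            self.snapshot = {}   # state visible to read-only transactions
            self.pending = {}    # seq -> writes replayed but not yet visible
            self.watermark = 0   # snapshot reflects all txns with seq <= watermark
            self.lock = threading.Lock()

        def apply(self, seq, writes):
            """Replay one replicated transaction; workers may call this out of
            order. In a real backup the expensive storage/index work happens
            before this short publication step, which is what allows parallelism."""
            with self.lock:
                self.pending[seq] = writes
                # Publish only the contiguous prefix of the primary's commit order.
                while self.watermark + 1 in self.pending:
                    self.watermark += 1
                    self.snapshot.update(self.pending.pop(self.watermark))

        def read_snapshot(self):
            """Read-only transactions see a prefix of the primary's history."""
            with self.lock:
                return self.watermark, dict(self.snapshot)

    b = Backup()
    b.apply(2, {"x": 2})         # arrives early; held back until seq 1 is applied
    b.apply(1, {"x": 1, "y": 1})
    print(b.read_snapshot())     # (2, {'x': 2, 'y': 1})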



C5: Cloned Concurrency Control that Always Keeps Up

July 2022 · 2 Reads · 1 Citation




The Seattle Report on Database Research

February 2020 · 363 Reads · 58 Citations

ACM SIGMOD Record

Approximately every five years, a group of database researchers meet to do a self-assessment of our community, including reflections on our impact on the industry as well as challenges facing our research community. This report summarizes the discussion and conclusions of the 9th such meeting, held during October 9-10, 2018 in Seattle.


Integration of large-scale data processing systems and traditional parallel database technology

August 2019 · 25 Reads · 1 Citation

Proceedings of the VLDB Endowment

In 2009 we explored the feasibility of building a hybrid SQL data analysis system that takes the best features from two competing technologies: large-scale data processing systems (such as Google MapReduce and Apache Hadoop) and parallel database management systems (such as Greenplum and Vertica). We built a prototype, HadoopDB, and demonstrated that it can deliver the high SQL query performance and efficiency of parallel database management systems while still providing the scalability, fault tolerance, and flexibility of large-scale data processing systems. Subsequently, HadoopDB grew into a commercial product, Hadapt, whose technology was eventually acquired by Teradata. In this paper, we provide an overview of HadoopDB's original design and its evolution during the subsequent ten years of research and development effort. We describe how the project innovated both in the research lab and as a commercial product at Hadapt and Teradata. We then discuss the current vibrant ecosystem of software projects (most of which are open source) that continue HadoopDB's legacy of implementing a systems-level integration of large-scale data processing systems and parallel database technology.
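
The split-execution idea at the heart of this line of work can be sketched in a few lines. The example below is illustrative only: each "node" is simulated with an in-memory SQLite database holding one horizontal partition of a made-up orders table, the aggregation SQL is pushed down to each node-local database, and a small merge step plays the role of the dataflow layer; none of this corresponds to HadoopDB's or Hadapt's actual interfaces.

    # Illustrative only: push partial aggregation into node-local databases,
    # then merge the partial results in a dataflow-style step.
    import sqlite3
    from concurrent.futures import ThreadPoolExecutor

    PARTITIONS = [                               # made-up data, one list per "node"
        [("c1", 10.0), ("c2", 5.0)],
        [("c1", 7.5), ("c3", 2.0)],
    ]

    PUSHDOWN_SQL = "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"

    def run_on_node(rows):
        conn = sqlite3.connect(":memory:")       # stands in for a node-local DBMS
        conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(PUSHDOWN_SQL).fetchall()

    def merge(partials):
        totals = {}                              # the "reduce" side of the plan
        for rows in partials:
            for customer_id, partial_total in rows:
                totals[customer_id] = totals.get(customer_id, 0.0) + partial_total
        return totals

    with ThreadPoolExecutor() as pool:
        print(merge(pool.map(run_on_node, PARTITIONS)))
        # {'c1': 17.5, 'c2': 5.0, 'c3': 2.0}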


SLOG: serializable, low-latency, geo-replicated transactions

July 2019 · 114 Reads · 53 Citations

Proceedings of the VLDB Endowment

For decades, applications deployed on a world-wide scale have been forced to give up at least one of (1) strict serializability, (2) low-latency writes, or (3) high transactional throughput. In this paper we discuss SLOG: a system that avoids this tradeoff for workloads that exhibit physical region locality in data access. SLOG achieves high-throughput, strictly serializable ACID transactions at geo-replicated distance and scale for all transactions submitted across the world, all the while achieving low latency for transactions that initiate from a location close to the home region of the data they access. Experiments find that SLOG can reduce latency by more than an order of magnitude relative to state-of-the-art strictly serializable geo-replicated database systems such as Spanner and Calvin, while maintaining high throughput under contention.
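
The latency argument can be made concrete with a back-of-the-envelope model. The data placement and round-trip times below are invented for illustration and this is not SLOG's mechanism: transactions that touch only data homed in the region where they originate pay a single intra-region ordering step, while transactions spanning home regions also pay cross-region coordination.

    # Hypothetical placement and RTTs, purely to illustrate the latency gap.
    HOME_REGION = {"item_a": "us-east", "item_b": "us-east", "item_c": "ap-south"}
    INTRA_REGION_RTT_MS = 2      # assumed
    CROSS_REGION_RTT_MS = 150    # assumed

    def estimated_commit_latency(keys, origin_region):
        homes = {HOME_REGION[k] for k in keys}
        if homes == {origin_region}:
            # Single-home and local to the client: one regional ordering round.
            return INTRA_REGION_RTT_MS
        if len(homes) == 1:
            # Single-home but remote: forward to the home region and back.
            return CROSS_REGION_RTT_MS + INTRA_REGION_RTT_MS
        # Multi-home: the involved regions must agree on a global order.
        return 2 * CROSS_REGION_RTT_MS

    print(estimated_commit_latency({"item_a", "item_b"}, "us-east"))   # 2 ms
    print(estimated_commit_latency({"item_a", "item_c"}, "us-east"))   # 300 ms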


OLTP through the looking glass, and what we found there

December 2018 · 202 Reads · 47 Citations

Online Transaction Processing (OLTP) databases include a suite of features---disk-resident B-trees and heap files, locking-based concurrency control, support for multi-threading---that were optimized for computer technology of the late 1970s. Advances in modern processors, memories, and networks mean that today's computers are vastly different from those of 30 years ago, such that many OLTP databases will now fit in main memory, and most OLTP transactions can be processed in milliseconds or less. Yet database architecture has changed little. Based on this observation, we look at some interesting variants of conventional database systems that one might build to exploit recent hardware trends, and speculate on their performance through a detailed instruction-level breakdown of the major components involved in a transaction processing database system (Shore) running a subset of TPC-C. Rather than simply profiling Shore, we progressively modified it so that after every feature removal or optimization, we had a (faster) working system that fully ran our workload. Overall, we identify overheads and optimizations that explain a total difference of about a factor of 20 in raw performance. We also show that there is no single "high pole in the tent" in modern (memory-resident) database systems; rather, substantial time is spent in logging, latching, locking, B-tree, and buffer management operations.
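
The factor-of-20 figure is easiest to see as compounding removals. The fractions below are round placeholders chosen for illustration, not the paper's measured instruction counts; the point is only that several moderate per-component overheads multiply into a large end-to-end difference once all of them are stripped away.

    # Illustrative arithmetic only; the fractions are placeholders.
    overheads = {            # fraction of instructions spent in each component
        "buffer management": 0.30,
        "locking":           0.17,
        "latching":          0.14,
        "logging":           0.12,
        "other overhead":    0.21,
    }
    useful_work = 1.0 - sum(overheads.values())

    print(f"useful work fraction: {useful_work:.2f}")
    print(f"speedup if all overheads are removed: {1.0 / useful_work:.0f}x")
    # With these placeholder numbers, ~6% useful work -> roughly 17x, the same
    # order of magnitude as the factor of 20 reported above.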


Citations (71)


... Another line of research aims to reduce lock contention by enforcing partial or full determinism in concurrency control. Calvin [53] and Detock [54] use a global agreement scheme to sequence lock requests deterministically. Deterministic techniques require a priori knowledge of read-set and write-set. ...

Reference:

GeoTP: Latency-aware Geo-Distributed Transaction Processing in Database Middlewares (Extended Version)
Detock: High Performance Multi-region Transactions at Scale
  • Citing Article
  • June 2023

Proceedings of the ACM on Management of Data

... Data-driven decision-making is becoming increasingly vital across various domains [1]. Top-K queries, which retrieve the top k tuples from a dataset using a user-defined function, offer a solution to identifying interesting objects from large databases. ...

The Seattle report on database research
  • Citing Article
  • August 2022

Communications of the ACM

... Today, storing and processing data on third-party remote cloud servers is widely used, showing explosive growth [1]. However, as the scale, value, and centralization of data increases, the reverse side of this process is revealed-the problems of ensuring the security and privacy of data are aggravated, which causes serious concern for owners and users of data. ...

The Seattle Report on Database Research
  • Citing Article
  • February 2020

ACM SIGMOD Record

... It achieves high performance by exploiting parallelism among a set of nodes. Massively Parallel Processing (MPP) data warehouse systems, such as Aster 1 and Greenplum 2 , have integrated MapReduce into their systems. Experiments in [14] show that combining MapReduce and data warehouse systems produces better performance. ...

Integration of large-scale data processing systems and traditional parallel database technology
  • Citing Article
  • August 2019

Proceedings of the VLDB Endowment

... Since database logs comprehensively record data increments in real time, log analysis can enhance consistency and availability for the system to some extent [8][9][10]. For a specific data exchange request, the receiver can synchronize data according to the data increments recorded in the initiator's log. ...

SLOG: serializable, low-latency, geo-replicated transactions
  • Citing Article
  • July 2019

Proceedings of the VLDB Endowment

... The term NoSQL was introduced to describe the database that did not include any SQL schemas. It was later on changed to Not only SQL, referring to another class of databases that do not show the characteristics of classical databases [1][2][3][4][5][6]. Nowadays, several features for NoSQL databases can be named. ...

The end of an architectural era: it's time for a complete rewrite
  • Citing Chapter
  • December 2018

... Combining transactional and analytical query workloads is a problem that has not been yet tackled for RDF graph data, despite its importance in future Big graph ecosystems [2]. Inspired by the relational database literature, we propose a triple store architecture using a buffer to store the updates [3] [4]. The key idea of the buffer is that most of the data is stored in a read-optimized main partition and updates are accumulated into a write-optimized buffer partition we call delta. ...

C-store: a column-oriented DBMS
  • Citing Chapter
  • December 2018

... Data quality will directly affect the efficiency of a small sample database, especially the discrepancy data in a small sample database will largely reduce the quality of small space storage information [9][10][11][12]. Combined with the difference data elimination method, it can effectively ensure that there is no duplicate data in different types of cloud databases, avoid differences such as data heterogeneity, effectively reduce the differences between the constructed small sample databases, and fundamentally improve the realtime, rapidity and reliability of database storage [13][14][15][16][17]. In addition, in order to use the transmission bandwidth as little as possible to complete the elimination of data with different nature differences, when the target data node and the source data node in the sample object synchronize the data, the data length should be as short as possible [18][19][20][21]. ...

OLTP through the looking glass, and what we found there
  • Citing Chapter
  • December 2018

... Advantages over Deterministic Databases. Existing deterministic databases [9, 24,51,52,57,61,62,[68][69][70] replicate batches of SQL transaction requests to multiple master nodes. After collecting all transactions of the same batch, each replica is required to execute these transactions according to a predefined serial order. ...

An overview of deterministic database systems
  • Citing Article
  • August 2018

Communications of the ACM