
High-throughput state-machine replication using software transactional memory


Abstract

State-machine replication is a common way of constructing general-purpose fault-tolerant systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this can severely limit system throughput. The issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing them concurrently. However, identifying and tracking non-conflicting requests requires intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot easily be adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit of using software transactional memory in state-machine replication is that general-purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault-tolerant systems based on state-machine replication with excellent throughput can be easily designed and maintained. We introduce three concurrency control mechanisms for state-machine replication using software transactional memory: ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that the speculative timestamp-based multiversion concurrency control mechanism has the best performance for all types of workload, whereas the conventional timestamp-based multiversion concurrency control mechanism offers the worst performance because of its high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution, with excellent performance under low-contention workloads and fairly good performance under high-contention workloads.
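To make the abort behaviour of conventional timestamp-based multiversion concurrency control concrete, the following is a minimal sketch of textbook multiversion timestamp ordering. It is not the paper's mechanism, which the abstract does not spell out. It assumes that each request's position in the total order serves as its timestamp; `MVStore`, `Abort`, and the method names are hypothetical.

```python
from bisect import bisect_right


class Abort(Exception):
    """Raised when a write arrives after a later-ordered read saw the old version."""


class MVStore:
    """Textbook multiversion timestamp ordering (illustrative only)."""

    def __init__(self):
        # key -> (write timestamps in ascending order, values, read timestamps)
        self.versions = {}

    def _visible(self, key, ts):
        # Every key starts with an initial version written at timestamp 0.
        wts, vals, rts = self.versions.setdefault(key, ([0], [None], [0]))
        i = bisect_right(wts, ts) - 1  # newest version with write ts <= ts
        return wts, vals, rts, i

    def read(self, key, ts):
        wts, vals, rts, i = self._visible(key, ts)
        rts[i] = max(rts[i], ts)  # remember the latest reader of this version
        return vals[i]

    def write(self, key, ts, value):
        wts, vals, rts, i = self._visible(key, ts)
        if rts[i] > ts:
            # A request ordered later has already read the version this write
            # would supersede, so the writer must abort and be re-executed.
            raise Abort(f"write to {key!r} at ts={ts} conflicts with read at ts={rts[i]}")
        if wts[i] == ts:
            vals[i] = value  # same transaction overwriting its own version
        else:
            wts.insert(i + 1, ts)
            vals.insert(i + 1, value)
            rts.insert(i + 1, ts)


store = MVStore()
store.write("x", 3, "a")      # request 3 writes x
store.read("x", 5)            # request 5 reads the version written by request 3
try:
    store.write("x", 4, "b")  # request 4 runs late: its write must abort
except Abort:
    pass
```

Under this rule, any write that runs after a later-ordered request has read the affected item is aborted, which is one way to see why contention between concurrently executing requests can translate into many aborts.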
J Supercomput (2016) 72:4379–4398
DOI 10.1007/s11227-016-1747-2
Wenbing Zhao¹ · William Yang² · Honglei Zhang³ · Jack Yang⁴ · Xiong Luo⁵ · Yueqin Zhu⁶ · Mary Yang⁷ · Chaomin Luo⁸
Published online: 13 May 2016
© Springer Science+Business Media New York 2016
✉ Wenbing Zhao
wenbing@ieee.org
¹ Department of Electrical Engineering and Computer Science, Cleveland State University, Cleveland, OH 44115, USA
² Texas Advanced Computing Center, University of Texas at Austin, 10100 Burnet Road, Austin, TX 78758-4497, USA
³ Agilysys Inc., Bellevue, WA, USA
⁴ Division of Biostatistics and Biomathematics, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
⁵ School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
⁶ Development Research Center of China Geological Survey, Key Laboratory of Geological Information Technology, Ministry of Land and Resources, Beijing 100037, China
⁷ Department of Information Science, George Washington Donaghey College of Engineering and Information Technology, Joint Bioinformatics Program of University of Arkansas at Little Rock and University of Arkansas for Medical Sciences, 2801 S. University Avenue, Little Rock, AR 72204, USA
⁸ Department of Electrical and Computer Engineering, University of Detroit Mercy, Detroit, MI 48221, USA
... Byzantine fault tolerance has been intensely studied since Castro and Liskov revitalized this research field [4,25]. The strategy employed in this research is along the lines of application-aware Byzantine fault tolerance [5–7, 23, 24, 28–30]. The essence of this strategy is to minimize the use of traditional Byzantine agreement algorithms, which typically incur high runtime overhead (in terms of both latency and throughput), by exploiting application semantics [27]. ...
Article
In this article, we present a set of lightweight mechanisms to enhance the dependability of a safety-critical real-time distributed system referred to as an integrated clinical environment (ICE). In an ICE, medical devices are interconnected and work together with the help of a supervisory computer system to enhance patient safety during clinical operations. Inevitably, there are strong dependability requirements on the ICE. We introduce a set of mechanisms that essentially make the supervisor component a trusted computing base, which can withstand common hardware failures and malicious attacks. The mechanisms rely on the replication of the supervisor component and introduce only one input-exchange phase into the critical path of the operation of the ICE. Our analysis shows that the runtime latency overhead is much lower than that of traditional approaches.
... Since NGS data are usually very large, high-performance computing is often needed. We recently developed new high-performance computing techniques for high-throughput state-machine replication using software transactional memory [5]. In this project, because Docker [6] is a platform that can package an application and its dependencies so that the application can run on any Linux server, we built our NGS workflows using Docker container technology. ...
Conference Paper
Next generation sequencing (NGS) technology has generated a huge volume of sequence data, opening unprecedented opportunities to gain new insight into biological systems. Meanwhile, the exponential growth of sequence data poses many challenges in processing and transferring data as well as in data storage and analysis. Here, we have developed an NGS data management system to address these difficulties. Our system automates data processing and analysis, and provides efficient utilization of available resources. The system consists of components for data input and output, processing, and storage. After registration, the user can log in and transfer data into the system. The NGS workflows are built using Docker container technology. Docker is a platform that can package an application and its dependencies so that the application is able to run on any Linux server. A job scheduler and a Docker service manager work together to ensure efficient allocation of resources, such as CPU and memory, for the jobs in the queue. Data size, job priority, and the type of applications in the workflows are the key parameters for the job scheduler. The job scheduler determines the cores, memory, and order of execution of a job. The Docker service manager keeps track of available Docker servers, executes commands on each Docker server using the Docker remote APIs, and keeps track of the computational resources used by each running Docker container. The NGS management system is highly flexible and portable, and can run on a local server as well as in the cloud. NGS is becoming more available to many laboratories as the cost of sequencing continues to decrease. This web tool can facilitate biological discoveries from large-scale sequence data.
Chapter
Blockchain technology has attracted huge interest in the last several years. This chapter first offers insight into the value of blockchain technology in terms of the different levels of benefits it can bring to applications. Second, it reviews the existing proposals on various blockchain applications for cyber‐physical systems. Third, the chapter summarizes the work on addressing the limited blockchain throughput issue using various means. Xu and Zou reviewed blockchain technology from three perspectives: understanding the blockchain technology from the economy point of view; the economic functions of blockchain; and the use of blockchain as a financial infrastructure. Finally, the chapter introduces the work by Xu and Zou on their view of what blockchain can and cannot do and their opinion on the balance between decentralization and trust in third parties.
Article
The pervasiveness of cloud-based services has significantly increased the demand for highly dependable systems. State machine replication is a powerful way of constructing highly dependable systems. However, state machine replication requires replicas to run deterministically and to process requests sequentially according to a total order. In this article, we review various techniques that have been used to engineer fault tolerance systems for better performance. Common to most such techniques is the customization of fault tolerance mechanisms based on the application semantics. By incorporating application semantics into fault tolerance design, we could enable concurrent processing of requests, reduce the frequency of distributed agreement operations, and control application nondeterminism. We start this review by making a case for considering application semantics for state machine replication. We then present a classification of various approaches to enhancing the performance of fault tolerance systems. This is followed by the description of various fault tolerance mechanisms. We conclude this article by outlining potential future research in high performance fault tolerance computing.
Conference Paper
Byzantine fault tolerance has been intensively studied over the past decade as a way to enhance the intrusion resilience of computer systems. However, state-machine-based Byzantine fault tolerance algorithms require deterministic application processing and sequential execution of totally ordered requests. One way of increasing the practicality of Byzantine fault tolerance is to exploit the application semantics, which we refer to as application-aware Byzantine fault tolerance. Application-aware Byzantine fault tolerance makes it possible to facilitate concurrent processing of requests, to minimize the use of Byzantine agreement, and to identify and control replica nondeterminism. In this paper, we provide an overview of recent works on application-aware Byzantine fault tolerance techniques. We elaborate the need for exploiting application semantics for Byzantine fault tolerance and the benefits of doing so, provide a classification of various approaches to application-aware Byzantine fault tolerance, and outline the mechanisms used in achieving application-aware Byzantine fault tolerance according to our classification.
Conference Paper
Event stream processing has been used to construct many mission-critical event-driven applications, such as business intelligence applications and collaborative intrusion detection applications. In this paper, we argue that event stream processing is also a good fit for autonomic computing and describe how to design such a system that is resilient to both hardware failures and malicious attacks. Based on a comprehensive threat analysis of event stream processing, we propose a set of lightweight mechanisms that help achieve Byzantine fault tolerant event processing for autonomic computing. The mechanisms consist of voting at the event consumers and an on-demand state synchronization mechanism triggered when an event consumer fails to collect a quorum of matching decision messages. We also introduce an evidence-based safe-guarding mechanism that prevents a faulty event consumer from inducing unnecessary rounds of state synchronization.
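The voting step described in this abstract can be pictured with a small sketch. This is only an illustration under assumptions: the quorum size is left as a parameter because the abstract does not state it, and the function name `vote` and the replica identifiers are hypothetical.

```python
from collections import Counter


def vote(decisions, quorum):
    """Return (value, True) if at least `quorum` replicas sent the same decision.

    `decisions` maps a replica id to the (hashable) decision it reported.
    A result of (None, False) means no quorum of matching messages was
    collected, which is the case that would trigger state synchronization.
    """
    if not decisions:
        return None, False
    value, count = Counter(decisions.values()).most_common(1)[0]
    if count >= quorum:
        return value, True
    return None, False


# Two of three replicas agree, so a quorum of 2 is met.
agreed = vote({"r1": "alert", "r2": "alert", "r3": "ok"}, quorum=2)
# All three replicas disagree, so the consumer would request a state sync.
split = vote({"r1": "a", "r2": "b", "r3": "c"}, quorum=2)
```

Keying the messages by replica id means each replica contributes at most one vote, so a single faulty sender cannot reach the quorum by repeating its message.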
Conference Paper
Complex event processing has become an important technology for big data and intelligent computing because it facilitates the creation of actionable, situational knowledge from a potentially large number of events in soft real time. Complex event processing can be instrumental for many mission-critical applications, such as business intelligence, algorithmic stock trading, and intrusion detection. Hence, the servers that carry out complex event processing must be made trustworthy. In this paper, we present a threat analysis of complex event processing systems and describe a set of mechanisms that can be used to control various threats. By exploiting the application semantics of typical event processing operations, we are able to design lightweight mechanisms that incur minimal runtime overhead, appropriate for soft real-time computing.
Chapter
In this paper, we argue for the need for, and benefits of, providing Byzantine fault tolerance as a service to mission-critical Web applications. In this new approach to Byzantine fault tolerance, an application server can partition the incoming requests into different domains for concurrent processing and decide which set of messages should be totally ordered, if any, based on its application semantics. This flexibility would reduce the end-to-end latency experienced by the clients and significantly increase the system throughput. Perhaps most importantly, we propose a middleware framework that provides a uniform interface to the applications so that they are not strongly tied to any particular Byzantine fault tolerance algorithm implementation.
Conference Paper
In this paper, we present a comprehensive study on how to achieve Byzantine fault tolerance for services with commutative operations. Recent research suggests that services may be implemented using Conflict-free Replicated Data Types (CRDTs) for highly efficient optimistic replication with the crash-fault model. We extend such studies by adopting the Byzantine fault model, which encompasses crash faults as well as malicious faults. We carefully analyze the threats towards the operations in a system constructed with CRDTs, and propose a lightweight solution to achieve Byzantine fault tolerance with low runtime overhead. We define a set of correctness properties for such systems and prove that the proposed Byzantine fault tolerance mechanisms guarantee these properties. Furthermore, we show that our mechanisms exhibit excellent performance with a proof-of-concept replicated shopping cart service constructed using CRDTs.
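For readers unfamiliar with CRDTs, the sketch below shows why commutative operations allow optimistic replication. It is a plain state-based grow-only counter per item under the crash-fault model only; it is not the shopping cart service or the Byzantine fault tolerance mechanism from the paper, and the class `GCounterCart` is hypothetical.

```python
class GCounterCart:
    """Add-only shopping cart built from per-replica grow-only counters."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        # (item, replica id) -> total quantity that replica has added
        self.counts = {}

    def add(self, item, qty=1):
        # A replica only ever increments its own entry.
        key = (item, self.replica_id)
        self.counts[key] = self.counts.get(key, 0) + qty

    def merge(self, other):
        # Pointwise maximum is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for key, n in other.counts.items():
            self.counts[key] = max(self.counts.get(key, 0), n)

    def quantity(self, item):
        return sum(n for (name, _), n in self.counts.items() if name == item)


a, b = GCounterCart("a"), GCounterCart("b")
a.add("book", 2)
b.add("book")
b.add("pen")
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing
```

After the merges both replicas hold the same state without any agreement on the order of the `add` operations, which is the property that makes such services candidates for lightweight fault tolerance.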
Article
This article presents a lightweight Byzantine fault tolerance (BFT) framework for session-oriented multi-tiered applications. Based on a comprehensive study of the threat model and the state model of session-oriented multi-tiered applications, we conclude that a lightweight BFT algorithm is sufficient in place of a traditional BFT algorithm. The lightweight BFT algorithm uses source ordering, rather than total ordering, of incoming requests to achieve Byzantine fault tolerant state-machine replication of this type of application. The performance of the lightweight BFT framework is evaluated using a shopping cart application prototype built on the web services platform. The same shopping cart application is used as a running example to illustrate the problem we address and our proposed solution. Performance evaluation results obtained from the prototype implementation show that our lightweight BFT algorithm incurs only insignificant overhead.
Conference Paper
Byzantine fault tolerance typically is achieved via state-machine replication, which requires the execution of all requests at the server replicas sequentially in a total order. This could severely limit the system throughput. We have seen tremendous efforts on the partial removal of the constraint on the sequential execution of all requests. Most of them rely on using application semantics to develop customized replication algorithms that could identify independent requests and execute them in parallel. In this paper, we describe concurrency control mechanisms for Byzantine fault tolerance systems using software transactional memory. This is an attractive approach to increasing the system throughput because no application-specific rules are required to determine whether or not two requests are conflicting. We present mechanisms for two common types of software transactional memory implementations, one based on transaction logs with two-phase locking, and the other based on multiversion concurrency control. We show that standard concurrency control mechanisms designed for these types cannot be used directly to ensure one-copy serializability, and introduce our solutions.
Article
The primary concern of traditional Byzantine fault tolerance is to ensure strong replica consistency by executing incoming requests sequentially according to a total order. Speculative execution at both clients and server replicas has been proposed as a way of reducing the end-to-end latency. In this article, we introduce optimistic Byzantine fault tolerance. Optimistic Byzantine fault tolerance aims to achieve higher throughput and lower end-to-end latency by using a weaker replica consistency model. Instead of ensuring strong safety as in traditional Byzantine fault tolerance, nonfaulty replicas are brought to a consistent state periodically and on-demand in optimistic Byzantine fault tolerance. Not all applications are suitable for optimistic Byzantine fault tolerance. We identify three types of applications, namely, realtime collaborative editing, event stream processing, and services constructed with conflict-free replicated data types, as good candidates for applying optimistic Byzantine fault tolerance. Furthermore, we provide a design guideline on how to achieve eventual consistency and how to recover from conflicts at different replicas. In optimistic Byzantine fault tolerance, a replica executes a request immediately without first establishing a total order of the message, and Byzantine agreement is used only to establish a common state synchronization point and the set of individual states needed to resolve conflicts. The recovery mechanism ensures both replica consistency and the validity of the system by identifying and removing the operations introduced by faulty clients and server replicas.
Book
This book covers the most essential techniques for designing and building dependable distributed systems. Instead of covering a broad range of research works for each dependability strategy, the book includes only a selected few (usually the most seminal works, the most practical approaches, or the first publication of each approach) and explains them in depth, usually with a comprehensive set of examples. The goal is to dissect each technique thoroughly so that readers who are not familiar with dependable distributed computing can actually grasp the technique after studying the book. The book contains eight chapters. The first chapter introduces the basic concepts and terminology of dependable distributed computing, and also provides an overview of the primary means for achieving dependability. The second chapter describes in detail the checkpointing and logging mechanisms, which are the most commonly used means to achieve a limited degree of fault tolerance. Such mechanisms also serve as the foundation for more sophisticated dependability solutions. Chapter three covers the works on recovery-oriented computing, which focus on practical techniques that reduce the fault detection and recovery times for Internet-based applications. Chapter four outlines the replication techniques for data and service fault tolerance. This chapter also pays particular attention to optimistic replication and the CAP theorem. Chapter five explains a few seminal works on group communication systems. Chapter six introduces the distributed consensus problem and covers a number of Paxos family algorithms in depth. Chapter seven introduces the Byzantine generals problem and its latest solutions, including the seminal Practical Byzantine Fault Tolerance (PBFT) algorithm and a number of its derivatives. The final chapter covers the latest research results on application-aware Byzantine fault tolerance, which is an important step towards the practical use of Byzantine fault tolerance techniques.