Conference Paper

# The Consensus Problem in Unreliable Distributed Systems (A Brief Survey)


## Abstract

Agreement problems involve a system of processes, some of which may be faulty. A fundamental problem of fault-tolerant distributed computing is for the reliable processes to reach a consensus. We survey the considerable literature on this problem that has developed over the past few years and give an informal overview of the major theoretical results in the area.

**1 Agreement Problems.** To achieve reliability in distributed systems, protocols are needed that enable the system as a whole to continue to function despite the failure of a limited number of components. These protocols, as well as many other distributed computing problems, require cooperation among the processes. Fundamental to such cooperation is the problem of agreeing on a piece of data upon which the computation depends. For example, the data managers in a distributed database system need to agree on whether to commit or abort a given transaction [20, 26]. In a replicated file system, the nodes might need to agree o...

## Citation contexts

... In distributed agreement problems, a set of n participants, also called nodes, competes or cooperates to achieve the same goal: reaching a common agreement without a central authority. There are three well-known and closely related distributed agreement problems [45]: the interactive consistency problem [90], the Byzantine generals problem [75], and the consensus problem [46]. These problems differ in who provides the initial value(s) and in what the agreement is on. ...
... The Byzantine generals problem can be applied in database management systems [51], where, for example, a user command is executed on each database stored by the nodes and an agreed result has to be sent back to the user. Regarding the consensus problem, it may address the clock synchronization problem [45], where each node has an initial clock value and a periodic agreement on a single clock value is reached such that two honest nodes never differ by more than some bound. ...
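The clock-synchronization use of consensus mentioned in this excerpt can be illustrated with a classical fault-tolerant averaging rule. This is an illustrative sketch under assumptions of my own (the function name and parameters are not from the cited work): each node collects clock readings, discards the f largest and f smallest, and adopts the mean of the rest, so up to f faulty readings cannot drag honest nodes apart.

```python
def fault_tolerant_average(readings, f):
    """Discard the f smallest and f largest readings, average the rest.

    With enough nodes relative to f, the surviving interval is bounded
    by honest readings, so honest nodes that run this on approximately
    the same multiset stay within a bounded distance of each other.
    """
    if len(readings) <= 2 * f:
        raise ValueError("need more than 2f readings")
    trimmed = sorted(readings)[f:len(readings) - f]
    return sum(trimmed) / len(trimmed)

# One synchronization round: 4 nodes, at most f = 1 faulty.
# The faulty node reports a wildly wrong clock value (1000.0),
# which the trimming step discards.
readings = [10.0, 10.2, 9.9, 1000.0]
print(fault_tolerant_average(readings, f=1))  # close to 10.1
```

The trimming step is what makes the rule Byzantine-tolerant: a faulty value is either discarded outright or bracketed by honest values.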
... Another closely related problem, which has also been studied extensively in the literature, is the consensus problem [45,46]. Every node has its own initial value, which may be different from the others'. ...
Thesis
In recent years, two research domains in cryptography have received considerable attention: consensus protocols for blockchain technologies, due to the emergence of cryptocurrencies, and quantum cryptanalysis, due to the threat of quantum computers. Naturally, our research topics are geared towards these two domains, which are studied separately in this thesis. In the first part, we analyze the security of consensus protocols, one of the main challenges in these technologies. We focus more specifically on the leader election of consensus protocols. After a study of the state of the art on consensus protocols before and after the emergence of blockchain technologies, we study the security of two promising approaches to constructing these protocols, called Algorand and Single Secret Leader Election. As a result, we define a security model of leader election with five security properties that address well-known issues and attacks against consensus protocols. Then, we provide a new leader election protocol called LEP-TSP, intended for the private setting, and prove that LEP-TSP meets the expected security properties as long as more than two thirds of the participants are honest. As additional work, we provide a high-level description of a new consensus protocol called Useful Work that uses computing power to solve any real-world problem. In the second part of this thesis, we review the best known cryptanalysis results on Misty schemes and provide new quantum cryptanalysis results. First, we describe non-adaptive quantum chosen-plaintext attacks (QCPA) against 4-round Misty L, 4-round Misty LKF, 3-round Misty R and 3-round Misty RKF schemes. We extend the QCPA attack against 3-round Misty RKF schemes to recover the keys of d-round Misty RKF schemes. As additional work, we show that the best known non-quantum attack against 3-round Misty R schemes is optimal.
... The impossibility result of the two generals problem has had far-reaching implications in the field of distributed protocols and databases, including the study of binary consensus [19]. In the binary consensus problem, every agent is initially assigned some binary value, referred to as the agent's initial opinion. ...
... where (19) follows from the law of total probability and (21) holds for all large enough n, due to Proposition 1. Furthermore, ...
... where (C.15) is true since $\mathrm{Bin}(1, q') \le 1$ with probability one. It follows by symmetry that (19) holds, which implies that ...
Preprint
In this work, we analyze the performance of a simple majority-rule protocol solving a fundamental coordination problem in distributed systems - \emph{binary majority consensus}, in the presence of probabilistic message loss. Using probabilistic analysis for a large scale, fully-connected, network of $2n$ agents, we prove that the Simple Majority Protocol (SMP) reaches consensus in only three communication rounds with probability approaching $1$ as $n$ grows to infinity. Moreover, if the difference between the numbers of agents that hold different opinions grows at a rate of $\sqrt{n}$, then the SMP with only two communication rounds attains consensus on the majority opinion of the network, and if this difference grows faster than $\sqrt{n}$, then the SMP reaches consensus on the majority opinion of the network in a single round, with probability converging to $1$ exponentially fast as $n \rightarrow \infty$. We also provide some converse results, showing that these requirements are not only sufficient, but also necessary.
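The majority-rule dynamics analyzed in this abstract can be sketched in a few lines. The simulation below is a hedged illustration, not the paper's exact model: function names, the loss model (each message lost independently), and the tie-breaking rule (keep your own opinion) are my assumptions.

```python
import random

def smp_round(opinions, loss_prob, rng):
    """One round of a simple majority protocol under message loss:
    every agent broadcasts its binary opinion, each message is lost
    independently with probability loss_prob, and each agent adopts
    the majority among the messages it actually received (keeping
    its own opinion on a tie or if nothing arrives)."""
    new_opinions = []
    for own in opinions:
        received = [op for op in opinions if rng.random() > loss_prob]
        ones = sum(received)
        zeros = len(received) - ones
        if ones > zeros:
            new_opinions.append(1)
        elif zeros > ones:
            new_opinions.append(0)
        else:
            new_opinions.append(own)
    return new_opinions

rng = random.Random(0)
opinions = [1] * 300 + [0] * 200  # 500 agents, initial majority of 1s
for _ in range(3):  # three rounds suffice w.h.p. per the paper
    opinions = smp_round(opinions, loss_prob=0.2, rng=rng)
print(set(opinions))  # all agents end up holding the majority opinion
```

Even with 20% message loss, each agent sees an unbiased sample of the network, so the initial majority is amplified round by round, which is the intuition behind the paper's three-round result.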
... 3. Furthermore, Jadbabaie et al. [16] utilized an undirected graph model with a single leader, whereas we consider a directed one with multiple influencing agents. 4. Whereas Genter and Stone [9,10] give only experimental results regarding this behaviour, we also propose theoretical proofs of useful properties when such behaviour is incorporated. ...
... In the traditional consensus problem of distributed computing [4], the goal is for all processors in a network to agree on a certain value. The problem becomes challenging as the network becomes more complex, and the consensus protocol should handle sparse networks, asynchronous timing and possible faults in the system [5,20,19]. ...
... We shall now consider Equation 4. Based on this equation, we note that the following quantity is invariant: $$AVG_2(t) = \frac{\sum_i |N_i(t)|\,\theta_i(t)}{\sum_i |N_i(t)|}$$ Based on Lemma 3.2, we would like to characterize the group decision value of a connected component in the influencing-neighbors graph. ...
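The invariance of the neighbourhood-weighted average can be checked numerically. The sketch below assumes a fixed undirected graph (with self-loops) and a plain neighbour-averaging update, a simplification of the Vicsek-style dynamics discussed in the excerpt; names and the example graph are my own.

```python
def step(theta, neighbors):
    """Neighbor-averaging update: each agent adopts the mean heading
    of its neighborhood (a Vicsek-style linear update)."""
    return [sum(theta[j] for j in neighbors[i]) / len(neighbors[i])
            for i in range(len(theta))]

def weighted_avg(theta, neighbors):
    """AVG2: the neighborhood-size-weighted average of headings."""
    num = sum(len(neighbors[i]) * theta[i] for i in range(len(theta)))
    den = sum(len(neighbors[i]) for i in range(len(theta)))
    return num / den

# A fixed undirected graph on 4 agents; each neighborhood includes self.
neighbors = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3}]
theta = [0.1, 0.5, 0.9, 0.3]

before = weighted_avg(theta, neighbors)
for _ in range(5):
    theta = step(theta, neighbors)
after = weighted_avg(theta, neighbors)
print(abs(before - after) < 1e-12)  # True: AVG2 is conserved
```

The invariance follows because, on an undirected graph, summing degree-weighted headings after the update just re-counts each old heading once per neighbor.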
Preprint
This work concentrates on different aspects of the *consensus problem* when applying it to a swarm of flocking agents. We examine the possible influence an external agent, referred to as an *influencing agent*, has on the flock. We prove that even a single influencing agent with a *Face Desired Orientation behaviour* injected into the flock is sufficient to guarantee the desired consensus of a flock of agents that follow a Vicsek-inspired model. We further show that in some cases this can also be guaranteed in dynamic environments.
... Distributed systems are associated with a fundamental problem, i.e., achieving reliability in the presence of a number of faulty processes (nodes) in the system [16,34,39]. This problem is referred to as the consensus problem. ...
... • Crash failure - occurs when a node stops its activity abruptly and does not resume its functions. In this failure, other nodes in the system can detect the crash [16,34,39]. • Byzantine failure - in this failure the node behaves arbitrarily and no assumption can be made about its behaviour. ...
... • Byzantine failure - in this failure the node behaves arbitrarily and no assumption can be made about its behaviour. It may send conflicting messages to other nodes, or it may remain silent and act as dead for a while and then revive itself [16,34,35,39,40]. The Byzantine problem was first introduced by Lamport et al. [40] as the Byzantine Generals problem. ...
Article
Full-text available
Blockchain's popularity has seen a historic rise over the last decade. However, existing blockchain systems have a major issue with scalability, which has become one of the main obstacles to the technology's mainstream adoption. There have been several attempts to address this limitation by identifying blockchain's scalability/performance bottlenecks (e.g., those mainly related to consensus algorithms) and proposing solutions (e.g., new consensus protocols) to address them. Other works applied sharding to tackle the issue. All solutions, however, have mainly focused on cryptocurrency applications, so addressing the scalability of blockchain systems for general applications remains a concern. This work proposes a scalable blockchain protocol for general applications (i.e., not restricted to cryptocurrencies). To improve the two major factors affecting transaction scalability, namely throughput and latency, we needed to modify both the blockchain structure and the block generation process. ZyConChain, the proposed blockchain system, introduces three types of blocks that form three separate chains: parentBlock, sideBlock, and state block. These blocks are generated based on different consensus algorithms, as each algorithm has specific properties that make it suitable for each type of block. To improve overall performance, ZyConChain generates sideBlocks (which carry transactions) at a high rate and keeps them in a pool. To generate a parentBlock, miners, instead of packing transactions into a block as they do in conventional blockchains, pack sideBlocks into a parentBlock. SideBlocks are generated based on an adapted Zyzzyva consensus protocol with O(log n) complexity. This reduces the final consensus complexity per transaction in comparison to previous work. To enable the protocol to scale out with the number of nodes, ZyConChain applies a sharding technique.
Parallel state chains have also been introduced to address cross-shard transactions.
... The proof-of-work and proof-of-stake protocols that we have just reviewed try to solve a problem that is very similar to a well-known problem in the academic literature. This problem, introduced as early as 1983 by Fischer et al. in [32], is known as the consensus problem. Its blockchain version is introduced by Gramoli et al. in [37]. ...
... Sharding is beyond the scope of this paper, but it should be noted that the scalability trilemma is not an impossibility result but an observation. ...
Thesis
The technological lock that this thesis addresses is therefore the interoperability of blockchains. Each blockchain is an independent environment with its own network, protocol, and rules. They were not necessarily designed with interoperability in mind. By checking the history of transactions and identifying the author of a transaction thanks to digital signatures, it is possible to verify whether or not a transaction can be added to the chain. But to date, there is no mechanism for coordinating transactions between multiple chains to make an exchange. A system for exchanging crypto-assets between two chains seeks to satisfy the following properties: atomicity, the exchange takes place entirely or not at all; security, the participants do not risk losing their crypto-assets; and finally liveness, the duration of the exchange must be bounded in time.
... A node is called a Byzantine node if it could behave arbitrarily as follows [87]. ...
... The Byzantine node model is a commonly used fault model in distributed systems, and an algorithm that can tolerate such nodes is called a Byzantine Fault Tolerance (BFT) algorithm, defined as follows [87]: a set of nodes achieves state machine replication and satisfies consistency and liveness in the presence of Byzantine nodes. ...
Article
Sharding is the prevalent approach to breaking the trilemma of simultaneously achieving decentralization, security, and scalability in traditional blockchain systems, which are implemented as replicated state machines relying on atomic broadcast for consensus on an immutable chain of valid transactions. Sharding is to be understood broadly as techniques for dynamically partitioning nodes in a blockchain system into subsets (shards) that perform storage, communication, and computation tasks without fine-grained synchronization with each other. Despite much recent research on sharding blockchains, much remains to be explored in the design space of these systems. Towards that aim, we conduct a systematic analysis of existing sharding blockchain systems and derive a conceptual decomposition of their architecture into functional components and the underlying assumptions about system models and attackers they are built on. The functional components identified are node selection, epoch randomness, node assignment, intra-shard consensus, cross-shard transaction processing, shard reconfiguration, and motivation mechanism. We describe interfaces, functionality, and properties of each component and show how they compose into a sharding blockchain system. For each component, we systematically review existing approaches, identify potential and open problems, and propose future research directions. We focus on potential security attacks and performance problems, including system throughput and latency concerns such as confirmation delays. We believe our modular architectural decomposition and in-depth analysis of each component, based on a comprehensive literature study, provides a systematic basis for conceptualizing state-of-the-art sharding blockchain systems, proving or improving security and performance properties of components, and developing new sharding blockchain system designs.
... The impossibility result of the two generals' problem had far-reaching implications in the field of distributed protocols and databases, including the study of binary consensus [19]. In the binary consensus problem, every agent is initially assigned some binary value, referred to as the agent's initial opinion. ...
... where (19) follows from the law of total probability and (21) holds for all large enough n, due to Proposition 1. Furthermore, $$\mathbb{P}\{B_n\} = \mathbb{P}\{B_n \mid A_n\}\,\mathbb{P}\{A_n\} + \mathbb{P}\{B_n \mid A_n^c\}\,\mathbb{P}\{A_n^c\} \quad (22)$$ $$\ge \mathbb{P}\{B_n \mid A_n\}\,\mathbb{P}\{A_n\} \quad (23)$$ ...
Article
Full-text available
In this work, we analyze the performance of a simple majority-rule protocol solving a fundamental coordination problem in distributed systems—binary majority consensus—in the presence of probabilistic message loss. Using probabilistic analysis for a large-scale, fully connected network of 2n agents, we prove that the Simple Majority Protocol (SMP) reaches consensus in only three communication rounds, with probability approaching 1 as n grows to infinity. Moreover, if the difference between the numbers of agents that hold different opinions grows at a rate of √n, then the SMP with only two communication rounds attains consensus on the majority opinion of the network, and if this difference grows faster than √n, then the SMP reaches consensus on the majority opinion of the network in a single round, with probability converging to 1 exponentially fast as n→∞. We also provide some converse results, showing that these requirements are not only sufficient, but also necessary.
... Definition 22 (Byzantine Node). A node is called a Byzantine node if it could behave arbitrarily as follows [79]. ...
... Definition 23 (Byzantine Fault Tolerance [79]). A set of nodes achieve state machine replication and satisfy consistency and liveness in the presence of Byzantine nodes. ...
Preprint
Full-text available
Sharding is the prevalent approach to breaking the trilemma of simultaneously achieving decentralization, security, and scalability in traditional blockchain systems, which are implemented as replicated state machines relying on atomic broadcast for consensus on an immutable chain of valid transactions. Sharding is to be understood broadly as techniques for dynamically partitioning nodes in a blockchain system into subsets (shards) that perform storage, communication, and computation tasks without fine-grained synchronization with each other. Despite much recent research on sharding blockchains, much remains to be explored in the design space of these systems. Towards that aim, we conduct a systematic analysis of existing sharding blockchain systems and derive a conceptual decomposition of their architecture into functional components and the underlying assumptions about system models and attackers they are built on. The functional components identified are node selection, epoch randomness, node assignment, intra-shard consensus, cross-shard transaction processing, shard reconfiguration, and motivation mechanism. We describe interfaces, functionality, and properties of each component and show how they compose into a sharding blockchain system. For each component, we systematically review existing approaches, identify potential and open problems, and propose future research directions. We focus on potential security attacks and performance problems, including system throughput and latency concerns such as confirmation delays. We believe our modular architectural decomposition and in-depth analysis of each component, based on a comprehensive literature study, provides a systematic basis for conceptualizing state-of-the-art sharding blockchain systems, proving or improving security and performance properties of components, and developing new sharding blockchain system designs.
... Validity is defined as: if all honest nodes start from the initial value b ∈ {0, 1}, then all honest nodes must decide on b. Here we consider the multi-value case, and we follow the definition in [3]. ...
... In fact, even weakly fair validity cannot be achieved. We will elucidate this in Section IV. In this paper, fair validity and responsiveness in our protocols depend on λ. ...
Article
Full-text available
The Byzantine generals problem is the core problem that consensus algorithms try to solve, and it is at the heart of the design of blockchains. As a result, we have seen numerous proposals of consensus algorithms in recent years trying to improve the level of decentralization, performance, and security of blockchains. In our opinion, there are two particularly challenging issues in the design of such algorithms in the context of powering blockchains in practice. First, the outcome of a consensus algorithm usually depends on the underlying incentive model, so each participant should have an equal probability of receiving rewards for its work. Secondly, the protocol should be able to resist network failures, such as cloud service shutdowns, while maintaining high performance otherwise. We address these two critical issues in this paper. First, we propose a new metric, called fair validity, for measuring the performance of Byzantine agreements. Intuitively, fair validity provides a lower bound on the probability of acceptance of honest nodes' proposals. This is a strong notion of fairness, and we argue that it is crucial for the success of a blockchain in practice. We then show that no Byzantine agreement can achieve fair validity in an asynchronous network, so we focus on synchronous protocols. This leads to our second contribution: we propose a fair, responsive, and partition-resilient Byzantine agreement protocol able to tolerate up to 1/3 corruptions. As we show in the paper, our protocol achieves fair validity and is responsive in the sense that the termination time depends only on the actual network delay, as opposed to an arbitrary, pre-determined time bound. Furthermore, our proposal is partition-resilient. Last but not least, experimental results show that our Byzantine agreement protocol outperforms a wide variety of state-of-the-art synchronous protocols, combining the best of both the theoretical and practical worlds.
... The committee is guaranteed to reach Byzantine agreement [19,22,47] in the presence of an adversary that can corrupt nodes and control their actions. The Algorand protocol is resilient to such adversary as long as it cannot corrupt more than 1/3 of the nodes. ...
Preprint
Full-text available
Founded in 2017, Algorand is one of the world's first carbon-negative, public blockchains inspired by proof of stake. Algorand uses a Byzantine agreement protocol to add new blocks to the blockchain. The protocol can tolerate malicious users as long as a supermajority of the stake is controlled by non-malicious users. The protocol achieves about 100x more throughput compared to Bitcoin and can be easily scaled to millions of nodes. Despite its impressive features, Algorand lacks a reward-distribution scheme that can effectively incentivize nodes to participate in the protocol. In this work, we study the incentive issue in Algorand through the lens of game theory. We model the Algorand protocol as a Bayesian game and propose a novel reward scheme to address the incentive issue in Algorand. We derive necessary conditions to ensure that participation in the protocol is a Bayesian Nash equilibrium under our proposed reward scheme even in the presence of a malicious adversary. We also present quantitative analysis of our proposed reward scheme by applying it to two real-world deployment scenarios. We estimate the costs of running an Algorand node and simulate the protocol to measure the overheads in terms of computation, storage, and networking.
... Consensus is a fundamental problem in distributed computation, and as such many works have been dedicated to solving it and its variations. Early work focused on solving consensus in multiple contexts, such as in the presence of faults [13]. In a fully asynchronous setting, the presence of even a single undetectable fault was shown to make non-termination possible [14], leading to consensus algorithms that assume some level of faults is detectable [6] or that focus on reducing the synchrony needed [11] to achieve consensus. ...
Preprint
Full-text available
Consensus and leader election are fundamental problems in distributed systems. Consensus is the problem in which all processes in a distributed computation must agree on some value. Average consensus is a popular form of consensus, where the agreed-upon value is the average of the initial values of all the processes. In a typical solution for consensus, each process learns the values of the others to determine the final decision. However, this is undesirable if processes want to keep their values secret from others. With this motivation, we present a solution to privacy-preserving average consensus, where no process can learn the initial value of any other process. Additionally, we augment our approach to provide outlier resistance, where extreme values are not included in the average calculation. Privacy is fully preserved at every stage, including preventing any process from learning the identities of processes that hold outlier values. To our knowledge, this is the first privacy-preserving average consensus algorithm featuring outlier resistance. In the context of leader election, each process votes for the one that it wants to be the leader. The goal is to ensure that the leader is elected in such a way that each vote remains secret and the sum of votes remains secret during the election. Only the final vote tally is available to all processes. This ensures that processes that vote early are not able to influence the votes of other processes. We augment our approach with shallow ranked voting by allowing processes not only to vote for a single process, but also to designate a secondary process to vote towards in the event that their primary candidate does not win the election.
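A standard building block for this kind of privacy-preserving aggregation is additive secret sharing; the sketch below is an illustration of the general idea, not a claim about this paper's actual construction (the modulus, function names, and share-routing are assumptions). Each process splits its value into random shares that sum to the value modulo a prime, so no single share, and no single process's view, reveals anything beyond the final total.

```python
import random

P = 2**61 - 1  # a large prime modulus (an assumption for this sketch)

def share(value, n, rng):
    """Split `value` into n additive shares mod P; any n-1 shares
    are uniformly random and reveal nothing about the value."""
    shares = [rng.randrange(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def private_average(values, rng):
    """Each process i sends its j-th share to process j; every process
    sums what it received, and only these partial sums are combined,
    so only the total (hence the average) is ever revealed."""
    n = len(values)
    all_shares = [share(v, n, rng) for v in values]
    # partial[j] = sum of the j-th shares; individually meaningless
    partial = [sum(all_shares[i][j] for i in range(n)) % P
               for j in range(n)]
    total = sum(partial) % P
    return total / n

rng = random.Random(42)
print(private_average([10, 20, 30, 40], rng))  # 25.0
```

This sketch omits the outlier-resistance and ranked-voting machinery the paper describes; it only shows why the sum can be public while each addend stays private.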
... The algorithm was based on task parallelism, where the workload is shared between replicas of a logical process and message passing is done with MPI. Process replication was examined and evaluated as a viable solution for fault tolerance and reliability of exascale systems [37]. This research mainly focused on MPI applications and MPI process replication, which require consistency between replicas. ...
Article
Full-text available
Large-scale HPC systems experience failures arising from faults in hardware, software, and/or networking. Failure rates continue to grow as systems scale up and out. Crash fault tolerance has until now been the focus when considering means to augment the Message Passing Interface (MPI) for fault-tolerant operation. This narrow model of faults (usually restricted to process or node failures) is insufficient. Without a more general model for consensus, gaps will arise in the ability to detect, isolate, mitigate, and recover HPC applications efficiently. Focusing on crash failures is insufficient because a chain of underlying components may lead to system failures in MPI. What is more, clusters and leadership-class machines alike often have Reliability, Availability, and Serviceability systems to convey predictive and real-time fault and error information, which does not map strictly to process and node crashes. A broader study of failures beyond crash failures in MPI will thus be useful, in conjunction with consensus mechanisms, for developers as they continue to design, develop, and implement fault-tolerant HPC systems that reflect observable faults in actual systems. We describe key factors that must be considered during consensus-mechanism design. We illustrate some of the current MPI fault tolerance models based on use cases. We offer a novel classification of common consensus mechanisms based on factors such as the network model and failure types, and on use cases of consensus in the computation process (e.g., fault detection, synchronization), including crash fault tolerance as one category.
... The consensus problem is traditionally considered in a distributed system. In distributed systems, we need a protocol that enables consensus on values (e.g., data files) to maintain state consistency among system nodes even in the presence of faulty nodes; see [17] for a survey. In this paper, we consider each system node to be a machine-learned predictor. ...
Preprint
Full-text available
Blockchains with smart contracts are distributed ledger systems that achieve block-state consistency among distributed nodes by allowing only deterministic operations of smart contracts. However, the power of smart contracts is enabled by interacting with stochastic off-chain data, which in turn opens the possibility of undermining block-state consistency. To address this issue, an oracle smart contract can be used to provide a single consistent source of external data; but this simultaneously introduces a single point of failure, which is called the oracle problem. To address the oracle problem, we propose an adaptive conformal consensus (ACon²) algorithm, which derives consensus from multiple oracle contracts via recent advances in online uncertainty-quantification learning. In particular, the proposed algorithm returns a consensus set, which quantifies the uncertainty of the data and achieves a desired correctness guarantee in the presence of Byzantine adversaries and distribution shift. We demonstrate the efficacy of the proposed algorithm on two price datasets and an Ethereum case study. In particular, the Solidity implementation of the proposed algorithm shows its practicality, implying that online machine learning algorithms are applicable to addressing issues in blockchains.
... Consensus algorithms are a well-known research area in distributed systems [7]. In this regard, many consensus algorithms were designed to maintain a distributed database, a distributed log, or a state-machine replication system using a set of fully interconnected servers. ...
Article
Blockchain designed for Mobile Ad hoc Networks (MANETs) and mesh networks is an emerging research topic that has to cope with the network partition problem. However, existing consensus algorithms used in blockchains have been designed to work in a fully connected network with reliable communication. As this assumption no longer holds in mobile wireless networks, we describe in this paper the problem of network partitions and their impact on blockchain. Then, we propose a new consensus algorithm called Consensus for Mesh (C4M), inspired by Raft, as a solution to this problem. The C4M consensus algorithm is integrated with Blockgraph, a blockchain solution for MANETs and mesh networks. We implemented our solution in NS-3 to analyze its performance through simulations. The simulation results give a first characterization of our algorithm, its performance, and its limits, especially in the case of topology changes.
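C4M is described as Raft-inspired, and the core reason Raft-style protocols behave safely under partitions is their quorum rule. The sketch below illustrates that generic rule (not C4M's specific algorithm): progress requires a strict majority of the full cluster membership, and since any two majorities intersect, at most one partition can commit at a time.

```python
def can_commit(partition_size, cluster_size):
    """Raft-style quorum rule: a leader can commit only with acks
    from a strict majority of the FULL cluster membership, not of
    the nodes currently reachable in its partition."""
    return partition_size > cluster_size // 2

# A 7-node cluster split into partitions of 4, 2, and 1 node:
# only the 4-node partition holds a majority and can make progress.
cluster = 7
print([can_commit(p, cluster) for p in (4, 2, 1)])
```

Note the failure mode this exposes for mesh networks: if no partition holds a majority (e.g., a 6-node cluster split 3/3), the whole system stalls, which is exactly the problem the paper's Blockgraph setting has to confront.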
... Consensus has always been an essential problem in distributed systems [31]. Therefore, decentralized consensus plays a key role in building robust solutions for the edge-cloud continuum. ...
Preprint
Full-text available
Robotic systems are more connected, networked, and distributed than ever. New architectures that comply with the *de facto* robotics middleware standard, ROS 2, have recently emerged to fill the gap in terms of hybrid systems deployed from edge to cloud. This paper reviews new architectures and technologies that enable containerized robotic applications to seamlessly run at the edge or in the cloud. We also give an overview of systems that include solutions ranging from extensions to ROS 2 tooling to the integration of Kubernetes and ROS 2. Another important trend is robot learning, and how new simulators and cloud simulations are enabling, e.g., large-scale reinforcement learning or distributed federated learning solutions. This has also enabled deeper integration of continuous integration and continuous deployment (CI/CD) pipelines for robotic systems development, going beyond standard software unit tests with simulated tests to build and validate code automatically. We discuss the current technology readiness and list the potential new application scenarios that are becoming available. Finally, we discuss the current challenges in distributed robotic systems and list open research questions in the field.
... Tamperproof: Blockchain technology is tamperproof [9,12]. A vital point that should be considered when an election is being conducted is how tamperproof the system is. ...
Article
The voting system in Nigeria has received a major setback over the years: citizens have stopped believing that the system is free, reliable, tamperproof, free of interference, and credible. The present system of voting in Nigeria has led to incessant riots and has given rise to election rigging, double voting, ballot snatching, tampering with results, third-party interference, an increased death rate, and an unfavourable atmosphere for business and tourism. The traditional system of voting in Nigeria is paper-ballot voting, where citizens come out and line up to place a thumbprint on the ballot paper of their preferred candidate. This system is not reliable, has been tampered with over the years, leads to double voting and to the loss or snatching of ballot boxes through third-party interference, and is so arduous that many registered voters end up not exercising their franchise. Nigeria needs a voting system that is tamperproof, disallows double voting, keeps an accurate record of voters, and does not allow third-party interference; hence the need for blockchain technology. Blockchain technology is a novel disruptive technology that is transparent, immutable, needs no third-party interference, and also serves as a public repository for record keeping. For the purpose of this research, the Ethereum blockchain is considered: a blockchain voting application is proposed in which a smart contract is written as the executable code serving as the policy guide and rules of the application proposed for voting in Nigeria. The aim of this proposed research is to build a blockchain application and use it as a tool to secure the voting system in the Nigerian election environment.
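The tamperproof and no-double-voting properties invoked in the abstract can be illustrated with a toy hash-chained ledger. This is only a sketch: class and method names are illustrative, and a real Ethereum-based system would implement these rules in a smart contract rather than a local data structure.

```python
import hashlib
import json

class VoteLedger:
    """Toy append-only ledger: each block's hash covers the previous
    block, so altering a recorded vote breaks the chain."""

    def __init__(self):
        self.chain = [{"index": 0, "prev": "0" * 64, "data": "genesis"}]
        self.voters = set()  # IDs that have already voted

    @staticmethod
    def block_hash(block):
        return hashlib.sha256(
            json.dumps(block, sort_keys=True).encode()).hexdigest()

    def cast_vote(self, voter_id, candidate):
        if voter_id in self.voters:
            return False  # reject double voting
        prev = self.block_hash(self.chain[-1])
        self.chain.append({"index": len(self.chain), "prev": prev,
                           "data": {"voter": voter_id,
                                    "candidate": candidate}})
        self.voters.add(voter_id)
        return True

    def verify(self):
        # every block must reference the hash of its predecessor
        return all(b["prev"] == self.block_hash(self.chain[i])
                   for i, b in enumerate(self.chain[1:]))
```

Tampering with any recorded vote invalidates every later block's `prev` link, which is the basic mechanism behind the "tamperproof" claim.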
... The notion of Byzantine agreement was introduced for the binary case (i.e. when the initial value consists of a bit) by Lamport, Shostak, and Pease [9], then quickly extended to arbitrary initial values (see the survey of Fischer [6] ...
Article
Full-text available
In this paper we present the Multidimensional Byzantine Agreement (MBA) Protocol, a leaderless Byzantine agreement protocol defined for complete and synchronous networks that allows a network of nodes to reach consensus on a vector of relevant information regarding a set of observed events. The consensus process is carried out in parallel on each component, and the output is a vector whose components are either values with wide agreement in the network (even if no individual node agrees on every value) or a special value ⊥ that signals irreconcilable disagreement. The MBA Protocol is probabilistic and its execution halts with probability 1, and the number of steps necessary to halt follows a Bernoulli-like distribution. The design combines a Multidimensional Graded Consensus and a Multidimensional Binary Byzantine Agreement, the generalization to the multidimensional case of two protocols presented by Micali et al. (SIAM J Comput 26(4):873–933, 1997; Byzantine agreement, made trivial, 2016). We prove the correctness and security of the protocol assuming a synchronous network where less than a third of the nodes are malicious.
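The output shape described in the abstract (per-component values with wide agreement, or ⊥ otherwise) can be sketched as follows, assuming a 2/3 agreement threshold. This illustrates only the output rule, not the MBA Protocol itself; the function name and threshold are illustrative.

```python
from collections import Counter

BOTTOM = None  # stands for the special value ⊥ (irreconcilable disagreement)

def componentwise_agreement(vectors, threshold=2/3):
    """Given one value-vector per node, keep a component's value only if
    more than `threshold` of the nodes report the same value for it;
    otherwise output ⊥ for that component."""
    n = len(vectors)
    m = len(vectors[0])
    out = []
    for j in range(m):
        value, count = Counter(v[j] for v in vectors).most_common(1)[0]
        out.append(value if count > threshold * n else BOTTOM)
    return out
```

Note how the output vector can contain agreed values even when no single input vector is agreed on in full, matching the abstract's remark.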
... Trust between NGOs and donor agencies is managed in a distributed system with restricted access. Using consensus [23,24], only those entities are allowed to join the network that can be verified as valid NGOs or donor agencies. The verification includes looking up the public keys of known NGOs and donor agencies and confirming that the given entity is one of these valid entities, using digital signatures. ...
Article
Full-text available
Citation: Rehman, E.; Khan, M.A.; Soomro, T.R.; Taleb, N.; Afifi, M.A.; Ghazal, T.M. Using Blockchain to Ensure Trust between Donor Agencies and NGOs in Under-Developed Countries. Computers 2021, 10, 98. https://doi.org/10.3390/computers10080098
Abstract: Non-governmental organizations (NGOs) in underdeveloped countries receive funds from donor agencies for various purposes, including relief from natural disasters and other emergencies, promoting education, women's empowerment, economic development, and many more. Some donor agencies have lost their trust in NGOs in underdeveloped countries, as some NGOs have been involved in the misuse of funds. This is evident from irregularities in the records. For instance, in education funds, on some occasions the same student has appeared in the records of multiple NGOs as a beneficiary when, in fact, at most one NGO could be paying for a particular beneficiary. Therefore, the number of actual beneficiaries would be smaller than the number of claimed beneficiaries. This research proposes a blockchain-based solution to ensure trust between donor agencies from all over the world and NGOs in underdeveloped countries. The list of national IDs, along with other keys, would be available publicly on a blockchain. The distributed software would ensure that the same set of keys is not entered twice in this blockchain, preventing the problem highlighted above. The details of the funds provided to a student would also be available on the blockchain and would be encrypted and digitally signed by the NGOs. In the case that a record inserted into this blockchain is discovered to be fake, this research provides a way to cancel that record. A cancellation record is inserted only if it is digitally signed by the relevant donor agency.
... These algorithms find applications in many fields including sensor networks [11], coordination of vehicles [12], or even blockchain [13]. In practice, consensus algorithms must be robust to faults that arise from relatively frequent occurrences of interrupted communication links or corrupted signals [14]. Therefore, the convergence of resilient consensus algorithms was rigorously studied under different considerations for the nature of adversarial attacks [15], graph topology [16], [17], or frequency of communication [18]. ...
... The third commonality among traditional group communication systems named by the authors is the lack of utilization of the consensus abstraction. Since the definition of the consensus problem in (Michael J. Fischer, 1983), the utility of consensus in implementing atomic broadcast, group membership and view synchrony has been shown. Despite this, only one of the systems analyzed, namely Phoenix, uses a consensus component. ...
Article
Full-text available
Due to the continuous growth of modern networks and the resulting rise in complexity of communication protocols, efficient and elegant solutions to the problem of network communication are called for. Two approaches that have been proposed in the past are the extension of middleware architecture with new components and the use of intelligent network controllers that adapt to changing circumstances. This paper seeks to explore the possibility of combining the strengths of the two. The atomic-/generic-broadcast based architecture for group communication systems provides powerful abstractions for improving efficiency in message transfer, as well as improved means of ensuring consistency in the network. The capability of intelligent network controllers to find potent parameterizations could make for a powerful addition to this architecture. Furthermore, the integration of a learning system would allow the system to prepare for and react to new situations, further increasing its reliability.
... Agreement and consensus protocols in the presence of faults or attacks have long been studied in computer science [22,43]. A seminal work that sparked interest in agreement algorithms resilient to faults and adversaries is [44], which introduces the Byzantine Generals Problem. ...
Thesis
... The notion of Byzantine agreement was introduced for the binary case (i.e. when the initial value consists of a bit) by Lamport, Shostak, and Pease [9], then quickly extended to arbitrary initial values (see the survey of Fischer [7]). A (binary) Byzantine agreement protocol or Byzantine Fault Tolerant (BFT) protocol, is a protocol that allows a set of mutually mistrusting players to reach agreement on an arbitrary (respectively binary) value. ...
Preprint
Full-text available
In this paper we will present the Multidimensional Byzantine Agreement (MBA) Protocol, a leaderless Byzantine agreement protocol defined for complete and synchronous networks that allows a network of nodes to reach consensus on a vector of relevant information regarding a set of observed events. The consensus process is carried out in parallel on each component, and the output is a vector whose components are either values with wide agreement in the network (even if no individual node agrees on every value) or a special value $\bot$ that signals irreconcilable disagreement. The MBA Protocol is probabilistic and its execution halts with probability 1, and the number of steps necessary to halt follows a Bernoulli-like distribution. The design combines a Multidimensional Graded Consensus and a Multidimensional Binary Byzantine Agreement, the generalization to the multidimensional case of two protocols by Micali and Feldman. We prove the correctness and security of the protocol assuming a synchronous network where less than a third of the nodes are malicious.
... These algorithms find applications in many fields including sensor networks [11], coordination of vehicles [12], or even blockchain [13]. In practice, consensus algorithms must be robust to faults that arise from relatively frequent occurrences of interrupted communication links or corrupted signals [14]. Therefore, the convergence of resilient consensus algorithms was rigorously studied under different considerations for the nature of adversarial attacks [15], graph topology [16], [17], or frequency of communication [18]. ...
Preprint
Full-text available
Recently, many cooperative distributed multi-agent reinforcement learning (MARL) algorithms have been proposed in the literature. In this work, we study the effect of adversarial attacks on a network that employs a consensus-based MARL algorithm. We show that an adversarial agent can persuade all the other agents in the network to implement policies that optimize an objective that it desires. In this sense, the standard consensus-based MARL algorithms are fragile to attacks.
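Resilient consensus algorithms of the kind referenced in the surrounding contexts commonly filter out extreme neighbor values before averaging, so that a bounded number of adversarial neighbors cannot drag the state arbitrarily. A minimal trimmed-mean update step, assuming at most f adversarial neighbors; the function name and interface are illustrative, not any specific cited algorithm.

```python
def trimmed_mean_step(own, neighbor_values, f):
    """One update of a resilient consensus iteration: discard the f
    largest and f smallest neighbor values (where an adversary could
    hide), then average the rest together with the node's own value."""
    vals = sorted(neighbor_values)
    kept = vals[f:len(vals) - f] if f > 0 else vals
    pool = kept + [own]
    return sum(pool) / len(pool)
```

With f = 1, a single adversary reporting an extreme value is simply trimmed away, which is the intuition behind the fragility result above: unfiltered consensus averaging has no such protection.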
... From this, we can say that a consensus algorithm is a mechanism that allows different nodes in a network to agree on some information and work as a coherent group despite the failure of some of the nodes [22]. The study of Fischer et al. [23] states that a Byzantine failure can be a faulty process that sends messages when it is not supposed to. This means sending conflicting information that opposes the general view of the system. ...
Article
Full-text available
The Internet of Things (IoT) represents a significant area of network research due to the many opportunities derived from its problems and applications. The most recurrent problems are mobility, availability, and limited resources. A well-known concern in networks, and therefore in the IoT, is to monitor properties of the network and its nodes [1, 2]. These problems can have a significant impact on monitoring efforts: mobility and availability can produce incomplete monitoring results, and monitoring distributed properties is also a challenge. The literature states that accuracy is not always reliable and is difficult to achieve due to the dynamic properties of the IoT, in particular with M2M communications and mobile devices. We therefore propose a distributed monitoring architecture that relies on multiple points of observation. It provides a consensus mechanism that allows it to aggregate results and provide a more meaningful and accurate outcome. We support our proposal with mathematical definitions that model local results for a single node and global results for the network. Finally, we evaluate our architecture with an emulator that relies on AWS, NS3, and Docker, with varying numbers of nodes, network sizes, network densities, speeds, mobility algorithms, and timeouts. We obtain very promising results, especially regarding accuracy.
... Most solutions typically adopt a predefined frequency [23,24], but this strategy does not take into account that such frequency should be properly tuned to match the dynamics of the data being aggregated. Also, synchronising processes to let them organise aggregation in successive rounds is far from easy in asynchronous systems lacking a shared physical clock [25,26]. Our proposal starts from similar problems, but is specifically conceived for distributed field-based aggregation, and accounts for the strict relations between the spatial and temporal dimensions that exist in situated computations. ...
Preprint
Full-text available
Emerging application scenarios, such as cyber-physical systems (CPSs), the Internet of Things (IoT), and edge computing, call for coordination approaches addressing openness, self-adaptation, heterogeneity, and deployment agnosticism. Field-based coordination is one such approach, promoting the idea of programming system coordination declaratively from a global perspective, in terms of functional manipulation and evolution in "space and time" of distributed data structures called fields. More specifically regarding time, in field-based coordination (as in many other distributed approaches to coordination) it is assumed that local activities in each device are regulated by a fair and unsynchronised fixed clock working at the platform level. In this work, we challenge this assumption, and propose an alternative approach where scheduling is programmed in a natural way (along with usual field-based coordination) in terms of causality fields, each enacting a programmable distributed notion of a computation "cause" (why and when a field computation has to be locally computed) and how it should change across time and space. Starting from low-level platform triggers, such causality fields can be organised into multiple layers, up to high-level, collectively-computed time abstractions, to be used at the application level. This reinterpretation of time in terms of articulated causality relations allows us to express what we call "time-fluid" coordination, where scheduling can be finely tuned so as to select the triggers to react to, generally allowing to adaptively balance performance (system reactivity) and cost (resource usage) of computations. We formalise the proposed scheduling framework for field-based coordination in the context of the field calculus, discuss an implementation in the aggregate computing framework, and finally evaluate the approach via simulation on several case studies.
... Another potential approach would be to attempt to reduce IC to Byzantine Agreement (BA), by running n parallel instances of BA, as it was suggested for synchronous systems [30]. In each instance, a node n i would spread its private value v i to the rest of the system. ...
Article
Interactive consistency is the problem in which n distinct nodes, each holding its own private value and up to t of which may be Byzantine, run an algorithm that allows all non-faulty nodes to infer the values of every other node. This problem is relevant to critical applications that rely on combining the opinions of multiple peers to provide a service. Examples include monitoring a content source to prevent equivocation or to track variability in the content provided, and resolving divergent state among the nodes of a distributed system. Previous works assume a fully synchronous system, where one can make strong assumptions such as negligible message delivery delays and/or detection of absent messages. However, practical, real-world systems are mostly asynchronous, i.e., they exhibit only some periods of synchrony during which message delivery is timely, thus requiring a different approach. In this paper, we present a thorough study of practical interactive consistency. We leverage the vast prior work on broadcast and Byzantine consensus algorithms to design, implement and evaluate a set of randomized algorithms, with only a single synchronization barrier and varying message complexities, that can be used to achieve interactive consistency in real-world distributed systems. We present formal proofs of correctness and message complexity of our proposed algorithms. We provide a complete, open-source implementation of each proposed interactive consistency algorithm by building a multi-layered software stack of algorithms that includes several broadcast algorithms, as well as a binary and a multi-valued consensus algorithm. Most of these algorithms have never been implemented and evaluated in a real system before.
Finally, we analyze the performance of our suite of algorithms experimentally by testing both single instance and multiple parallel instances of each alternative and present a case study of achieving interactive consistency in a real-world distributed e-voting system.
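The reduction mentioned in the citing context (running n parallel broadcast/agreement instances, one per node's private value) can be illustrated with a deliberately simplified synchronous simulation: every node broadcasts its value, honest nodes echo what they received, and each sender's value is fixed by a per-sender majority over honest echoes, so all honest nodes end with identical view vectors. This toy sketch is not one of the paper's algorithms, and all names in it are illustrative.

```python
def interactive_consistency(values, byzantine, lie="?"):
    """Toy synchronous interactive consistency via n parallel broadcasts.
    Byzantine senders equivocate, sending `lie` to half of the nodes."""
    n = len(values)
    honest = [i for i in range(n) if i not in byzantine]
    # round 1: what node j claims to have received from each sender i
    recv = [[values[i] if (i not in byzantine or j % 2 == 0) else lie
             for i in range(n)] for j in range(n)]
    # round 2: take a per-sender majority over the honest nodes' echoes,
    # which is the same deterministic computation at every honest node
    view = []
    for i in range(n):
        echoes = [recv[j][i] for j in honest]
        view.append(max(set(echoes), key=echoes.count))
    return view  # identical at every honest node
```

Honest senders' values always survive the majority; an equivocating sender's slot may hold either of its claimed values, but crucially the same one at every honest node.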
... Although different in several aspects, such as performance, permissions, provable security, and computational completeness, any blockchain implementation satisfies the above definition. As an example, a central blockchain tool that differs among implementations is the consensus algorithm [8]. It solves the following problem, which is crucial in designing an efficient SMR protocol. ...
Article
Full-text available
Anomaly detection tools play a role of paramount importance in protecting networks and systems from unforeseen attacks, usually by automatically recognizing and filtering out anomalous activities. Over the years, different approaches have been designed, all focused on lowering the false positive rate. However, no proposal has addressed attacks specifically targeting blockchain-based systems. In this paper, we present BAD: Blockchain Anomaly Detection. This is the first solution, to the best of our knowledge, that is tailored to detect anomalies in blockchain-based systems. BAD is a complete framework, relying on several components leveraging, at its core, blockchain meta-data in order to collect potentially malicious activities. BAD enjoys some unique features: (i) it is distributed (thus avoiding any central point of failure); (ii) it is tamper-proof (making it impossible for a malicious software to remove or to alter its own traces); (iii) it is trusted (any behavioral data is collected and verified by the majority of the network); and, (iv) it is private (avoiding any third party to collect/analyze/store sensitive information). Our proposal is described in detail and validated via both experimental results and analysis, that highlight the quality and viability of our Blockchain Anomaly Detection solution.
... Then data from multiple sources can be used to reduce uncertainty and to improve the overall level of data quality. Simple methods for data reconciliation of conflicting sensor data are voting systems [5]. More elaborate fusion methods are the Bayes method [6,7], Dempster-Shafer method [8,9], and heuristic methods [10,11]. ...
Chapter
Reliability of sensor information in today’s highly automated systems is crucial. Neglected and not quantifiable uncertainties lead to lack of knowledge which results in erroneous interpretation of sensor data. Physical redundancy is an often-used approach to reduce the impact of lack of knowledge but in many cases is infeasible and gives no absolute certainty about which sensors and models to trust. However, structural models can link spatially distributed sensors to create analytical redundancy. By using existing sensor data and models, analytical redundancy comes with the benefits of unchanged structural behavior and cost efficiency. The detection of conflicting data using analytical redundancy reveals lack of knowledge, e.g. in sensors or models, and supports the inference from conflict to cause. We present an approach to enforce analytical redundancy by using an information model of the technical system formalizing sensors, physical models and the corresponding uncertainty in a unified framework. This allows for continuous validation of models and the verification of sensor data. This approach is applied to a structural dynamic system with various sensors based on an aircraft landing gear system.
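The simple voting-based reconciliation of conflicting sensor data mentioned in the citing context [5] can be sketched as follows: group redundant readings that agree within a tolerance, then let the largest group outvote the outliers. The function name and tolerance parameter are illustrative assumptions, not taken from the cited works.

```python
def vote_on_readings(readings, tol=0.5):
    """Voting-based reconciliation of redundant sensors: group readings
    that agree within `tol` of their neighbour, then return the mean of
    the largest group. Faulty outlier sensors are outvoted."""
    groups = []
    for r in sorted(readings):
        if groups and r - groups[-1][-1] <= tol:
            groups[-1].append(r)  # extends the current agreement group
        else:
            groups.append([r])    # starts a new group
    best = max(groups, key=len)
    return sum(best) / len(best)
```

This is the crudest form of analytical redundancy; the Bayes and Dempster–Shafer methods cited above additionally weight sources by their uncertainty instead of counting votes.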
... Reaching agreement in the presence of faults is a fundamental and non-trivial problem in the distributed systems literature, and the subject of countless papers. See (Cristian, 1991; Fischer, 1983) for surveys. Within the context of this work, the distributed system represents the airborne self-separation environment, and the processes correspond to the aircraft. ...
Chapter
Full-text available
This chapter introduces the reader to the benefits of distributed computing in air transportation. It presents a solution to airborne self-separation based on RAPTOR, a stack of distributed protocols that allows aircraft to reach different types of agreement in the presence of faults, both of accidental and malicious nature. These protocols are used as primitives to implement specific services for airborne self-separation, which are created within the context of a conflict resolution algorithm based on game theory.
... In the DPs, the decision on the global system time is made by specific algorithms aiming to solve the consensus problem. Consensus, in distributed systems, requires the individual nodes to "agree" on a given property, decision, or quantity [21] (in our case, time). A consensus algorithm has the following properties: ...
Article
Full-text available
One of the objectives of medicine is to modify patients' ways of living. In this context, a key role is played by diagnosis. When dealing with acquisition systems consisting of multiple wireless devices located in different parts of the body, it becomes fundamental to ensure synchronization between the individual units. This task is truly a challenge, since one aims to limit the computational complexity and ensure long periods of operation. In fact, in the absence of synchronization, it is impossible to relate all the measurements coming from the different subsystems on a single time scale for the extraction of complex characteristics. In this paper, we first analyze in detail all the possible causes that lead to a system that is not synchronous and therefore not usable. Then, we propose a firmware implementation strategy and a simple but effective protocol that guarantees perfect synchrony between the devices while keeping computational complexity low. The employed network has a star topology with a master/slave architecture. In this paper, a new approach to the synchronization problem is introduced to guarantee a precise, but not necessarily accurate, synchronization between the units. In order to demonstrate the effectiveness of the proposed solution, a platform consisting of two different types of units has been designed and built. In particular, a nine Degrees of Freedom (DoF) Inertial Measurement Unit (IMU) is used in one unit, while a nine-DoF IMU and all circuits for the analysis of surface electromyography (sEMG) are present on the other unit. The system is completed by an Android app that acts as a user interface for starting and stopping the logging operations. The paper experimentally demonstrates that the proposed solution overcomes all the limits set out and guarantees perfect synchronization of the single measurements, even during long-duration acquisitions. In fact, a time mismatch of less than 30 μs was registered over a 24 h test, and the possibility of performing complex post-processing on the acquired data with a simple and effective system has been proven.
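Pairwise master/slave offset estimation in such synchronization schemes often follows an NTP/Cristian-style timestamp exchange. The sketch below assumes a roughly symmetric link delay; it is a generic textbook scheme, not the firmware protocol of the paper above.

```python
def estimate_offset(t1, t2, t3, t4):
    """NTP-style offset estimate from a request/reply exchange:
    t1 = slave send, t2 = master receive, t3 = master send,
    t4 = slave receive. Under symmetric link delay, the delay terms
    cancel and the clock offset remains."""
    return ((t2 - t1) + (t3 - t4)) / 2.0

# simulate: master clock runs 5 ms ahead, one-way delay 2 ms
true_offset, delay = 0.005, 0.002
t1 = 1.000                       # slave clock
t2 = t1 + delay + true_offset    # master clock at receive
t3 = t2 + 0.0001                 # master clock at reply
t4 = t3 - true_offset + delay    # back on the slave clock
```

An asymmetric delay biases the estimate by half the asymmetry, which is why such schemes achieve precise rather than necessarily accurate synchronization, as the abstract distinguishes.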
... A wide range of resilience and fault tolerance notions for distributed networks have been explored extensively over the years. This led to a plentiful list of algorithms for fundamental problems, such as Consensus [14,15,16,22], Broadcast [38,34,23,35], Gossiping [6,2,8], and Agreement [13,33,9,7]. See [34] for a survey on this topic. ...
Preprint
We present the first round-efficient algorithms for several fundamental distributed tasks in the presence of a Byzantine edge. Our algorithms work in the CONGEST model of distributed computing. In the \emph{Byzantine Broadcast} problem, given is a network $G=(V,E)$ with an unknown Byzantine edge $e'$. There is a source node $s$ holding an initial message $m_0$, and the goal is for all the nodes in the network to receive a copy of $m_0$, while ignoring all other messages. Perhaps surprisingly, to the best of our knowledge, all existing algorithms for the problem either assume that the Byzantine behavior is probabilistic, use polynomially large messages, or else suffer from a large round complexity. We give an $\widetilde{O}(D^2)$-round \footnote{The notion $\widetilde{O}$ hides poly-logarithmic terms, and the notion $\widehat{O}$ hides a multiplicative factor of a $2^{O(\sqrt{\log n})}$ term.} algorithm for the Byzantine Broadcast problem, where $D$ is the diameter of the graph. The communication graph is required to be $3$-edge connected, which is known to be a necessary condition. We also provide a Leader Election algorithm in the presence of a Byzantine edge with the same round complexity of $\widetilde{O}(D^2)$ rounds. We use these algorithms to provide the efficient construction of \emph{Byzantine cycle covers} which serve as the basis for (i) Byzantine BFS algorithms and (ii) a general compiler for algorithms in the presence of a Byzantine edge. We hope that the tools provided in this paper will pave the way towards obtaining \textbf{round-efficient algorithms} for many more distributed problems in the presence of Byzantine edges and nodes.
Chapter
Given a discrete-state continuous-time reactive system, like a digital circuit, the classical approach is to first model it as a state transition system and then prove its properties. Our contribution advocates a different approach: to directly operate on the input-output behavior of such systems, without identifying states and their transitions in the first place. We discuss the benefits of this approach at hand of some examples, which demonstrate that it nicely integrates with concepts of self-stabilization and fault-tolerance. We also elaborate on some unexpected artefacts of module composition in our framework, and conclude with some open research questions.
Preprint
Full-text available
Given a discrete-state continuous-time reactive system, like a digital circuit, the classical approach is to first model it as a state transition system and then prove its properties. Our contribution advocates a different approach: to directly operate on the input-output behavior of such systems, without identifying states and their transitions in the first place. We discuss the benefits of this approach at hand of some examples, which demonstrate that it nicely integrates with concepts of self-stabilization and fault-tolerance. We also elaborate on some unexpected artefacts of module composition in our framework, and conclude with some open research questions.
Chapter
Day by day, both data and network sizes are growing at a rapid rate. It is essential to keep private data secure and also to prevent malicious activities. In a permissionless blockchain, nodes do not need permission to participate: one can directly mine a block by performing an open task. Security can be a significant issue here. Also, there is no third-party involvement in a blockchain, so keeping trust among peers is an essential feature. The distributed public ledger stores the history of old transactions to maintain trust between peers. To prevent malicious activities, consensus algorithms are used, defined as complex tasks that a miner must perform to mine new blocks into the blockchain. In this chapter, various consensus mechanisms are described with their merits and demerits. With high computational power and digital currencies, nodes can quickly get into the blockchain and perform malicious activities. For that, various consensus algorithms are used, such as Proof of Work (PoW), Proof of Stake (PoS), Proof of Burn (PoB), Proof of Capacity (PoC), etc. Every consensus algorithm is developed to solve issues of previously developed ones and provide more efficiency with respect to resource allocation, scalability, security against attacks, power consumption, etc. Bitcoin is one of the use cases of blockchain and is built upon the PoW consensus method. Various companies have developed cryptocurrencies that are based on consensus algorithms. Consensus can be implemented in smart contracts to govern specific rules in the blockchain. While working with extensive transactions and a large chain of blocks, scalability, efficiency, and malicious attacks are significant issues. We present a comparative analysis of all the consensus algorithms based on such issues. Keywords: Blockchain, Consensus, Permissionless, Proof-based, Voting-based
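The "open task" behind PoW can be illustrated with a toy miner that searches for a nonce whose hash has a required number of leading zeros; finding the nonce is expensive, while verifying it takes a single hash. Difficulty and encoding choices below are illustrative, not Bitcoin's actual parameters.

```python
import hashlib

def mine(block_data, difficulty=3):
    """Toy Proof of Work: find a nonce such that SHA-256 over the block
    data starts with `difficulty` hex zeros. Work scales as 16**difficulty
    on average; verification is one hash."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1
```

Raising `difficulty` by one multiplies the expected work by 16, which is how PoW networks tune block rates against the total computational power of miners.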
Article
Emerging application scenarios, such as cyber-physical systems (CPSs), the Internet of Things (IoT), and edge computing, call for coordination approaches addressing openness, self-adaptation, heterogeneity, and deployment agnosticism. Field-based coordination is one such approach, promoting the idea of programming system coordination declaratively from a global perspective, in terms of functional manipulation and evolution in "space and time" of distributed data structures called fields. More specifically regarding time, in field-based coordination (as in many other distributed approaches to coordination) it is assumed that local activities in each device are regulated by a fair and unsynchronised fixed clock working at the platform level. In this work, we challenge this assumption, and propose an alternative approach where scheduling is programmed in a natural way (along with usual field-based coordination) in terms of causality fields, each enacting a programmable distributed notion of a computation "cause" (why and when a field computation has to be locally computed) and how it should change across time and space. Starting from low-level platform triggers, such causality fields can be organised into multiple layers, up to high-level, collectively-computed time abstractions, to be used at the application level. This reinterpretation of time in terms of articulated causality relations allows us to express what we call "time-fluid" coordination, where scheduling can be finely tuned so as to select the triggers to react to, generally allowing to adaptively balance performance (system reactivity) and cost (resource usage) of computations. We formalise the proposed scheduling framework for field-based coordination in the context of the field calculus, discuss an implementation in the aggregate computing framework, and finally evaluate the approach via simulation on several case studies.
Article
The influence vanishing property in social networks states that the influence of the most influential agent vanishes as society grows. Removing this assumption causes a failure of learning of boundedly rational dynamics. We suggest a boundedly rational methodology that leads to learning in almost all networks. The methodology adjusts the agent's weights based on the Sinkhorn-Knopp matrix scaling algorithm. It is a simple, local, Markovian, and time-independent methodology that can be applied to multiple settings.
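The Sinkhorn-Knopp scaling that the abstract builds on alternates row and column normalisations of a positive matrix until it becomes (approximately) doubly stochastic. A minimal sketch; the iteration count is an arbitrary choice, and this shows only the scaling step, not the paper's weight-adjustment methodology.

```python
def sinkhorn_knopp(matrix, iters=200):
    """Sinkhorn-Knopp: alternately normalise rows and columns of a
    positive square matrix so that both row and column sums approach 1."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    for _ in range(iters):
        for row in m:                          # row normalisation
            s = sum(row)
            for j in range(n):
                row[j] /= s
        for j in range(n):                     # column normalisation
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m
```

For strictly positive matrices the alternation converges geometrically, so even a modest iteration count yields a matrix that is doubly stochastic to high precision.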
Article
Failure detectors (FDs) are celebrated for their modularity in solving distributed problems. Algorithms are constructed using FD building blocks. Synchrony assumptions to implement FDs are studied separately and are typically expressed as eventual guarantees that need to hold, after some point in time, forever and deterministically. But in practice, they may hold only probabilistically and temporarily. This paper studies FDs in a realistic system N, where asynchrony is inflicted by probabilistic synchronous communication. We first address a problem with ⋄S, the weakest FD to solve consensus: an implementation of “consensus with probability 1” is possible in N without randomness in the algorithm, while an implementation of “⋄S with probability 1” is impossible in N. We introduce ⋄S⁎, a new FD with probabilistic and temporal accuracy. We prove that ⋄S⁎ (i) is implementable in N and (ii) can replace ⋄S, in several existing deterministic consensus algorithms that use ⋄S, to yield an algorithm that solves “consensus with probability 1”. We extend our results to other FD classes, e.g., ⋄P, and to a larger set of problems (beyond consensus), which we call decisive problems.
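The probabilistic-and-temporary accuracy discussed above can be illustrated with a toy heartbeat detector over a lossy link that doubles its timeout after every false suspicion. All parameters and names below are illustrative; this is a generic adaptive-timeout sketch, not the paper's construction of ⋄S⁎.

```python
import random

def simulate_fd(rounds=2000, p_deliver=0.9, seed=1):
    """Heartbeat failure detector over a probabilistically synchronous
    link: suspect the (actually alive) peer after `timeout` silent
    rounds, and double the timeout on every false suspicion. Since k
    consecutive losses have probability (1 - p_deliver)**k, false
    suspicions quickly become rare: accuracy holds probabilistically."""
    rng = random.Random(seed)
    timeout, silent, suspicions = 1, 0, []
    for r in range(rounds):
        if rng.random() < p_deliver:   # heartbeat arrives
            silent = 0
        else:
            silent += 1
            if silent >= timeout:      # false suspicion of a live peer
                suspicions.append(r)
                timeout *= 2           # back off adaptively
                silent = 0
    return suspicions
```

The number of false suspicions is bounded roughly logarithmically, since each one doubles the run of consecutive losses needed to trigger the next.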
Chapter
A distributed computing system is a collection of processors that communicate either by reading and writing from shared memory or by sending messages over some communication network. Most prior biologically inspired distributed computing algorithms rely on message passing as the communication model. Here we show that in the process of genome-wide epigenetic modifications, cells utilize their DNA as a shared memory system. We formulate a particular consensus problem, called the epigenetic consensus problem, that cells attempt to solve using this shared memory model and then present algorithms, derive expected run time and discuss, analyze and simulate improved methods for solving this problem. Analysis of real biological data indicates that the computational methods indeed reflect aspects of the biological process for genome-wide epigenetic modifications.
Article
Full-text available
Software-Defined Networking (SDN) emerges as one of the leading technologies to address pressing networking problems such as network virtualization and data center complexity using programmable switches and controllers. To exchange information effectively and coordinate control in an SDN environment, multi-controller SDN networks have been proposed. Unfortunately, this network architecture still faces the problem of faulty controllers that may forge commands arbitrarily, so the whole network can easily break down. It is therefore important to propose a fault-tolerance scheme for SDN network systems. In this paper, a new SDN environment called Multi-Controller Overlay Groups (MCOG), where each device is managed by the controllers, is proposed. Besides, the degree of faulty influence is revised to define different bounds of fault tolerance in the MCOG environment. Furthermore, a new protocol named the General Dynamic Multi-Controller Agreement (GDMCA) protocol is proposed to solve the Byzantine Agreement (BA) problem in the MCOG environment. Based on the proof of complexity, the proposed GDMCA protocol is shown to be optimal under the MCOG environment.
Article
In this paper, we propose a game of the Byzantine Generals, which is a coordination game of agents seeking consensus by strategically transmitting information on a sequence of time-varying communication graphs. The first scenario of the game is where the generals cannot communicate with others at the same "level" in the communication graph. The second scenario is where those generals can. In either scenario, we examine the influences of the number of traitors and the decision rule held by the generals on equilibrium predictions of the game.
Preprint
Full-text available
Consensus in a distributed system, i.e., the notion that all the nodes in the network agree upon a common data value, typically proposed by one of the nodes, is critical to the efficient functioning of a distributed system. Paxos is a key algorithm in the field of distributed systems aimed at achieving consensus despite the lack of a global clock and the possibility of faulty nodes in the network. Briefly, the algorithm relies on three main entities, i.e., Proposer, Acceptor, and Learner, where a majority of the acceptors are required to agree on a value proposed by a proposer for consensus. The paper describes the implementation of the Basic Paxos algorithm in UPPAAL, an environment for modeling, validation, and verification of real-time systems represented using timed automata. UPPAAL provides a real-time simulator that simulates the execution and supports verification using Computation Tree Logic to check the safety and liveness properties of the implemented model, i.e., the Paxos algorithm in this case. Besides, UPPAAL provides statistical model checking capabilities that are leveraged to demonstrate the impact of three factors (the rate of the exponential distribution, probabilistic branching, and the number of tries) on the behavior of the UPPAAL model.
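The Proposer/Acceptor roles and majority rule described above can be sketched directly. This is a single-threaded, failure-free illustration of Basic Paxos, assuming synchronous in-process calls; the class and function names are ours, not from the paper or from UPPAAL.

```python
class Acceptor:
    """Basic Paxos acceptor: promises not to accept proposals below the
    highest ballot it has seen, and reports any value it has accepted."""
    def __init__(self):
        self.promised = -1
        self.accepted = None              # (ballot, value) or None

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "accepted"
        return "nack"

def propose(acceptors, ballot, value):
    """Proposer: phase 1 gathers promises from a majority; if any acceptor
    already accepted a value, that value must be re-proposed (safety)."""
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [p for p in promises if p[0] == "promise"]
    if len(granted) <= len(acceptors) // 2:
        return None                       # no majority promised
    prior = [p[1] for p in granted if p[1] is not None]
    if prior:
        value = max(prior)[1]             # highest-ballot prior accept wins
    acks = [a.accept(ballot, value) for a in acceptors]
    return value if acks.count("accepted") > len(acceptors) // 2 else None

cluster = [Acceptor() for _ in range(3)]
decided = propose(cluster, ballot=1, value="X")
```

A later proposer with a higher ballot and a different value is forced to re-propose "X", which is the core safety argument Paxos verifies.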
Article
Full-text available
Nowadays, to solve a problem, people/systems typically use knowledge from different sources. A binary vector is a useful structure to represent knowledge states, and determining the consensus for a binary vector collective is helpful in many areas. However, determining a consensus that satisfies postulate 2-Optimality is an NP-hard problem; therefore, many heuristic algorithms have been proposed. The basic heuristic algorithm is the fastest in the literature, and most widely used to solve this problem. The computational complexity of the basic heuristic algorithm is $O(m^{2}n)$ . In this study, we propose a quick algorithm (called QADC) to determine the 2-Optimality consensus. The QADC algorithm is developed based on a new approach for calculating the distances from a candidate consensus to the collective members. The computational complexity of the QADC algorithm has been reduced to $O(mn)$ , and the consensus quality of QADC algorithm and the basic heuristic algorithm is the same.
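For context, the 2-Optimality criterion minimizes the sum of squared Hamming distances from the consensus vector to the collective members. The greedy sketch below follows the spirit of the basic heuristic (majority start, then bit flips that lower the cost); it is our illustration with a quadratic pass, not the paper's O(mn) QADC algorithm.

```python
def sq_cost(cand, collective):
    """Sum of squared Hamming distances: the 2-Optimality criterion."""
    return sum(sum(c != x for c, x in zip(cand, v)) ** 2
               for v in collective)

def heuristic_2opt(collective):
    """Start from the per-position majority vector, then greedily flip
    any bit that lowers the squared-distance cost."""
    n = len(collective[0])
    cand = [int(2 * sum(v[i] for v in collective) > len(collective))
            for i in range(n)]
    best = sq_cost(cand, collective)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            cand[i] ^= 1                    # try flipping bit i
            c = sq_cost(cand, collective)
            if c < best:
                best, improved = c, True
            else:
                cand[i] ^= 1                # revert: no improvement
    return cand, best

votes = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
consensus, cost = heuristic_2opt(votes)
```

Note that squaring couples the positions, which is why a simple per-column majority is not automatically 2-optimal and a search step is needed.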
Preprint
The era of the Internet of Things (IoT) has begun to evolve, and with it the devices around us are becoming more and more connected. Vehicular Ad-hoc NETworks (VANETs) are one application of IoT, allowing vehicles within such networks to communicate effectively with one another. VANETs can provide an extensive range of applications that support and enhance passenger safety and comfort. It is important that VANETs are applied within a safe and reliable network topology; however, the challenge of reaching reliable and trustworthy agreement in such distributed systems is one of the most important issues in designing a fault-tolerant system. Protocols are therefore required so that systems can still execute correctly, reaching agreement on the same values in a distributed system, even if certain components fail. In this study, the agreement problem is revisited in a VANET subject to multiple faults. The proposed protocol allows all fault-free nodes (vehicles) to reach agreement with minimal rounds of message exchange, and tolerates the maximal number of allowable faulty components in the VANET.
Chapter
Cyber-intelligence sharing can leverage the development and deployment of security plans and teams within organizations, making infrastructures resilient and resistant to cyberattacks.
Article
Full-text available
This paper describes an application of Byzantine Agreement [DoSt82a, DoSt82e, LyFF82] to distributed transaction commit. We replace the second phase of one of the commit algorithms of [MoLi83] with Byzantine Agreement, providing certain trade-offs and advantages at the time of commit and providing speed advantages at the time of recovery from failure. The present work differs from that presented in [DoSt82b] by increasing the scope (handling a general tree of processes, and multi-cluster transactions) and by providing an explicit set of recovery algorithms. We also provide a model for classifying failures that allows comparisons to be made among various proposed distributed commit algorithms. The context for our work is the Highly Available Systems project at the IBM San Jose Research Laboratory [AAF-KM83].
Conference Paper
Full-text available
This paper deals only with one aspect of security: properties of the system that are hidden from an enemy who may make inferences. Informally, a participant (honest or dishonest) is presented with information and properties that he brings to the protocol as a priori information. Whatever is to be excluded from knowledge (e.g., the knowledge of secret keys in a public key system) must be explicitly excluded from this information. This information is modelled by a set theoretic structure, and so the basic inferences that can be drawn by a participant are the sentences of the complete logical theory of this structure. A participant can also apply cryptographic operations to generate new messages. The basic mechanism for this is an inference function which is assigned to each participant. The nature of an inference function is unspecified, except that it satisfies a losslessness condition.
Conference Paper
Full-text available
Recently, Fischer, Lynch and Paterson [3] proved that no completely asynchronous consensus protocol can tolerate even a single unannounced process death. We exhibit here a probabilistic solution for this problem, which guarantees that as long as a majority of the processes continues to operate, a decision will be made (Theorem 1). Our solution is completely asynchronous and is rather strong: As in [4], it is guaranteed to work with probability 1 even against an adversary scheduler who knows all about the system.
Conference Paper
Full-text available
Byzantine Agreement has become increasingly important in establishing distributed properties when errors may exist in the systems. Recent polynomial algorithms for reaching Byzantine Agreement provide us with feasible solutions for obtaining coordination and synchronization in distributed systems. In this paper the amount of information exchange necessary to ensure Byzantine Agreement is studied. This is measured by the total number of messages the participating processors have to send in the worst case. In algorithms that use a signature scheme, the number of signatures appended to messages are also counted. First it is shown that Ω(nt) is a lower bound for the number of signatures for any algorithm using authentication, where n denotes the number of processors and t the upper bound on the number of faults the algorithm is supposed to handle. For algorithms that reach Byzantine Agreement without using authentication this is even a lower bound for the total number of messages. If n is large compared to t, these bounds match the upper bounds from previously known algorithms. For the number of messages in the authenticated case we prove the lower bound Ω(n + t²). Finally algorithms that achieve this bound are presented.
Article
Full-text available
Reaching agreement in a distributed system in the presence of faulty processors is a central issue for reliable computer systems. Using an authentication protocol, one can limit the undetected behavior of faulty processors to a simple failure to relay messages to all intended targets. In this paper the authors show that, in spite of such an ability to limit faulty behavior, and no matter what message types or protocols are allowed, reaching (Byzantine) agreement requires at least t+1 phases or rounds of information exchange, where t is an upper bound on the number of faulty processors. They present algorithms for reaching agreement based on authentication that require a total number of messages sent by correctly operating processors that is polynomial in both t and the number of processors, n. The best algorithm uses only t+1 phases and O(nt) messages.
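The authenticated-agreement idea can be sketched as follows: a relayed value is extracted only if it arrives carrying r distinct signatures by round r, so t+1 rounds suffice. This is our simplified illustration: signatures are simulated as sets of signer ids, and faulty processes simply stay silent (crash), a far weaker fault model than the one the paper handles.

```python
def dolev_strong(n, t, sender_value, faulty=frozenset()):
    """Authenticated-agreement sketch: process 0 is the sender; a relayed
    value must carry r distinct 'signatures' (modeled as signer-id sets)
    by round r. Faulty processes here simply stay silent."""
    extracted = {p: set() for p in range(n)}
    msgs = []
    if 0 not in faulty:                    # round 1: sender signs and sends
        msgs = [(sender_value, {0})]
        for p in range(n):
            extracted[p].add(sender_value)
    for _ in range(2, t + 2):              # rounds 2 .. t+1: relay, co-sign
        relayed = []
        for value, sigs in msgs:
            for p in range(n):
                if p not in faulty and p not in sigs:
                    relayed.append((value, sigs | {p}))
                    for q in range(n):
                        extracted[q].add(value)
        msgs = relayed
    # Each correct process decides its unique extracted value, else a default.
    return [next(iter(extracted[p])) if len(extracted[p]) == 1 else "default"
            for p in range(n) if p not in faulty]
```

With real (unforgeable) signatures the same signature-counting rule also defeats Byzantine relays, which is the basis of the t+1 phase, polynomial-message algorithms in the paper.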
Article
Full-text available
Byzantine Agreement has become increasingly important in establishing distributed properties when there may exist errors in the systems. Recent polynomial algorithms for reaching Byzantine Agreement provide us with feasible solutions for obtaining coordination and synchronization in distributed systems. In this paper we study the amount of information exchange necessary to ensure Byzantine Agreement. This is measured by the number of messages and the number of signatures appended to messages (in the case of authenticated algorithms) the participating processors need to send, in the worst case, in order to reach Byzantine Agreement. The lower bound for the number of signatures in the authenticated case is Ω(nt), where n is the number of participating processors and t is the upper bound on the number of faults. If n is large compared to t, it matches the upper bounds from previously known algorithms. The lower bound for the number of messages is Ω(n + t²). We present an algorithm that achieves this bound and for which the number of phases does not exceed the minimum t+1 by more than a constant factor.
Article
Full-text available
The consensus problem involves an asynchronous system of processes, some of which may be unreliable. The problem is for the reliable processes to agree on a binary value. In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process. By way of contrast, solutions are known for the synchronous case, the “Byzantine Generals” problem.
Article
Full-text available
A general method is described for implementing a distributed system with any desired degree of fault tolerance. Reliable clock synchronization and a solution to the Byzantine Generals problem are assumed.
Conference Paper
Full-text available
Reaching agreement in a distributed system while handling malfunctioning behavior is a central issue for reliable computer systems. All previous algorithms for reaching the agreement required an exponential number of messages to be sent, with or without authentication. We give polynomial algorithms for reaching (Byzantine) agreement, both with and without the use of authentication protocols. We also prove that no matter what kind of information is exchanged, there is no way to reach agreement with fewer than t+1 rounds of exchange, where t is the upper bound on the number of faults.
Conference Paper
Full-text available
Reaching agreement is a primitive of distributed computing. While this poses no problem in an ideal, failure-free environment, it imposes certain constraints on the capabilities of an actual system: a system is viable only if it permits the existence of consensus protocols tolerant to some number of failures. Fischer, Lynch and Paterson [FLP] have shown that in a completely asynchronous model, even one failure cannot be tolerated. In this paper we extend their work, identifying several critical system parameters, including various synchronicity conditions, and examine how varying these affects the number of faults which can be tolerated. Our proofs expose general heuristic principles that explain why consensus is possible in certain models but not possible in others.
Article
Full-text available
SIFT (Software Implemented Fault Tolerance) is an ultrareliable computer for critical aircraft control applications that achieves fault tolerance by the replication of tasks among processing units. The main processing units are off-the-shelf minicomputers, with standard microcomputers serving as the interface to the I/O system. Fault isolation is achieved by using a specially designed redundant bus system to interconnect the processing units. Error detection and analysis and system reconfiguration are performed by software. Iterative tasks are redundantly executed, and the results of each iteration are voted upon before being used. Thus, any single failure in a processing unit or bus can be tolerated with triplication of tasks, and subsequent failures can be tolerated after reconfiguration. Independent execution by separate processors means that the processors need only be loosely synchronized, and a novel fault-tolerant synchronization method is described. The SIFT software is highly structured and is formally specified using the SRI-developed SPECIAL language. The correctness of SIFT is to be proved using a hierarchy of formal models. A Markov model is used both to analyze the reliability of the system and to serve as the formal requirement for the SIFT design. Axioms are given to characterize the high-level behavior of the system, from which a correctness statement has been proved. An engineering test version of SIFT is currently being built.
Article
Two-Phase Commit and other distributed commit protocols provide a method to commit changes while preserving consistency in a distributed database. These protocols can cope with various failures occurring in the system. But in case of failure they do not guarantee termination (of protocol processing) within a given time: sometimes the protocol requires waiting for a failed processor to be returned to operation. It happens that a straightforward use of timeouts in a distributed system is fraught with unexpected peril and does not provide an easy solution to the problem. In this paper we will combine Byzantine Agreement with Two-Phase Commit, using observations of Lamport to provide a method to cope with failure within a given time bound. An extra benefit of this combination of ideas is that it handles undetected and transient faults as well as the more usual system or processor down faults handled by other distributed commit protocols.
Article
The problem addressed here concerns a set of isolated processors, some unknown subset of which may be faulty, that communicate only by means of two-party messages. Each nonfaulty processor has a private value of information that must be communicated to each other nonfaulty processor. Nonfaulty processors always communicate honestly, whereas faulty processors may lie. The problem is to devise an algorithm in which processors communicate their own values and relay values received from others that allows each nonfaulty processor to infer a value for each other processor. The value inferred for a nonfaulty processor must be that processor's private value, and the value inferred for a faulty one must be consistent with the corresponding value inferred by each other nonfaulty processor. It is shown that the problem is solvable for, and only for, n ≥ 3m + 1, where m is the number of faulty processors and n is the total number. It is also shown that if faulty processors can refuse to pass on information but cannot falsely relay information, the problem is solvable for arbitrary n ≥ m ≥ 0. This weaker assumption can be approximated in practice using cryptographic methods.
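The recursive oral-messages algorithm from this setting, usually written OM(m), can be sketched compactly. This is our illustration: traitors are simplified to inverting every value they send, whereas real Byzantine behavior is unrestricted; the n ≥ 3m + 1 bound from the abstract is what makes the majority votes come out right.

```python
def om(m, commander, value, lieutenants, traitor):
    """Recursive oral-messages sketch OM(m). Each lieutenant decides by
    majority over the commander's value and the values the other
    lieutenants relay via OM(m-1)."""
    send = lambda frm, v: 1 - v if traitor[frm] else v
    got = {p: send(commander, value) for p in lieutenants}
    if m == 0:
        return got
    votes = {p: [got[p]] for p in lieutenants}
    for i in lieutenants:      # lieutenant i relays its value via OM(m-1)
        sub = om(m - 1, i, got[i],
                 [q for q in lieutenants if q != i], traitor)
        for j, v in sub.items():
            votes[j].append(v)
    return {p: int(2 * sum(vs) > len(vs)) for p, vs in votes.items()}

# n = 4 with m = 1 traitor (process 3): the loyal lieutenants still
# agree on the loyal commander's value.
decision = om(1, commander=0, value=1, lieutenants=[1, 2, 3],
              traitor=[False, False, False, True])
```

With n = 4 and m = 1 this meets the 3m + 1 bound exactly; with one fewer process the majority step can be made to fail, matching the impossibility half of the result.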
Article
The Byzantine Generals Problem requires processes to reach agreement upon a value even though some of them may fail. It is weakened by allowing them to agree upon an 'incorrect' value if a failure occurs. The transaction commit problem for a distributed database is a special case of the weaker problem. It is shown that, like the original Byzantine Generals Problem, the weak version can be solved only if fewer than one-third of the processes may fail. Unlike the original problem, an approximate solution exists that can tolerate arbitrarily many failures.
Article
We define a new model for algorithms to reach Byzantine Agreement. It allows one to measure the complexity more accurately, to differentiate between processor faults, and to include communication link failures. A deterministic algorithm is presented that exhibits early stopping by phase 2f + 3 in the worst case, where f is the actual number of faults, under less stringent conditions than the ones of previous algorithms. Its average performance can also easily be analysed making realistic assumptions on random distribution of faults. We show that it stops with high probability after a small number of phases.
Article
Can unanimity be achieved in an unreliable distributed system? This problem was named the “Byzantine Generals Problem” by L. Lamport, R. Shostak, and M. Pease (Technical Report 54, Computer Science Laboratory, SRI International, March 1980). The results obtained in the present paper prove that unanimity is achievable in any distributed system if and only if the number of faulty processors in the system is: (1) less than one-third of the total number of processors; and (2) less than one-half of the connectivity of the system's network. In cases where unanimity is achievable, algorithms for obtaining it are given. This result forms a complete characterization of networks in the light of the Byzantine Problem.
Conference Paper
LOCUS is a distributed operating system that provides a very high degree of network transparency while at the same time supporting high performance and automatic replication of storage. By network transparency we mean that at the system call interface there is no need to mention anything network related. Knowledge of the network and code to interact with foreign sites is below this interface and is thus hidden from both users and programs under normal conditions. LOCUS is application code compatible with Unix, and performance compares favorably with standard, single system Unix. LOCUS runs on a high bandwidth, low delay local network. It is designed to permit both a significant degree of local autonomy for each site in the network while still providing a network-wide, location independent name structure. Atomic file operations and extensive synchronization are supported. Small, slow sites without local mass store can coexist in the same network with much larger and more powerful machines without larger machines being slowed down through forced interaction with slower ones. Graceful operation during network topology changes is supported.
Article
Algorithms are described for maintaining clock synchrony in a distributed multiprocess system where each process has its own clock. These algorithms work in the presence of arbitrary clock or process failures, including “two-faced clocks” that present different values to different processes. Two of the algorithms require that fewer than one-third of the processes be faulty. A third algorithm works if fewer than half the processes are faulty, but requires digital signatures.
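One of the averaging-style algorithms in this family can be sketched as follows: each process averages the clock readings it collects, but screens out any reading too far from its own, so a "two-faced" clock cannot drag the average arbitrarily far. The screening rule and values below are an illustrative simplification, not the paper's exact algorithms.

```python
def interactive_convergence(readings, own_index, delta):
    """Averaging sketch: replace any clock reading that differs from our
    own by more than delta with our own value, then average. A faulty
    ('two-faced') clock thus contributes at most a bounded error."""
    own = readings[own_index]
    screened = [r if abs(r - own) <= delta else own for r in readings]
    return sum(screened) / len(screened)

clocks = [100.0, 101.0, 99.5, 500.0]   # the last clock is faulty
new0 = interactive_convergence(clocks, own_index=0, delta=5.0)
```

Because two correct processes screen against nearly the same window, their new clock values stay close, which is the convergence property such algorithms must prove when fewer than one-third of the processes are faulty.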
Article
An encryption method is presented with the novel property that publicly revealing an encryption key does not thereby reveal the corresponding decryption key. This has two important consequences: Couriers or other secure means are not needed to transmit keys, since a message can be enciphered using an encryption key publicly revealed by the intended recipient. Only he can decipher the message, since only he knows the corresponding decryption key. A message can be “signed” using a privately held decryption key. Anyone can verify this signature using the corresponding publicly revealed encryption key. Signatures cannot be forged, and a signer cannot later deny the validity of his signature. This has obvious applications in “electronic mail” and “electronic funds transfer” systems. A message is encrypted by representing it as a number M, raising M to a publicly specified power e, and then taking the remainder when the result is divided by the publicly specified product, n , of two large secret prime numbers p and q. Decryption is similar; only a different, secret, power d is used, where e * d = 1(mod (p - 1) * (q - 1)). The security of the system rests in part on the difficulty of factoring the published divisor, n .
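The encryption and signing operations described above reduce to modular exponentiation, and can be demonstrated with the standard textbook toy parameters (real deployments use large random primes and padding):

```python
# Toy parameters only; real RSA needs large random primes and padding.
p, q = 61, 53
n = p * q                       # public modulus
phi = (p - 1) * (q - 1)
e = 17                          # public encryption exponent
d = pow(e, -1, phi)             # secret decryption exponent (Python 3.8+)

msg = 65
cipher = pow(msg, e, n)         # encrypt: msg^e mod n
plain = pow(cipher, d, n)       # decrypt: cipher^d mod n

sig = pow(msg, d, n)            # "sign" with the private exponent
ok = pow(sig, e, n) == msg      # anyone can verify with the public key
```

The symmetry e·d ≡ 1 (mod (p−1)(q−1)) is what makes decryption invert encryption and verification invert signing, and the security rests in part on the difficulty of factoring n.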
Article
In this paper, we answer this question in the negative. That is, we show that any algorithm which assures interactive consistency in the presence of m faulty processors requires at least m + 1 rounds of communication.
Article
Thesis (Ph. D.)--Georgia Institute of Technology, 1983. Vita. Includes bibliographical references (leaves 128-130). Photocopy.
Conference Paper
Two different notions of Byzantine Agreement - immediate and eventual - are defined depending on whether the agreement involves an action to be performed synchronously or not. The lower bounds for time complexity depend on what kind of agreement has to be achieved. All previous algorithms to reach Byzantine Agreement ensure immediate agreement. We present two algorithms that in many cases reach the second type of agreement faster than previously known algorithms showing that there actually is a difference between the two notions: Eventual Byzantine Agreement can be reached earlier than Immediate.
Conference Paper
Can unanimity be achieved in an unknown and unreliable distributed system? We analyze two extreme models of networks: one in which all the routes of communication are known, and the other in which not even the topology of the network is known. We prove that independently of the model, unanimity is achievable if and only if the number of faulty processors in the system is 1. less than one half of the connectivity of the system's network, and 2. less than one third of the total number of processors. In cases where unanimity is achievable, an algorithm to obtain it is given.
Conference Paper
We present a randomized solution for the Byzantine Generals Problem. The solution works in the synchronous as well as the asynchronous case and produces Byzantine Agreement within a fixed small expected number of computational rounds, independent of the number n of processes and the bound t on the number of faulty processes. The solution uses A. Shamir's method for sharing secrets. It specializes to produce a simple solution for the Distributed Commit problem.
Article
Two kinds of contemporary developments in cryptography are examined. Widening applications of teleprocessing have given rise to a need for new types of cryptographic systems, which minimize the need for secure key distribution channels and supply the equivalent of a written signature. This paper suggests ways to solve these currently open problems. It also discusses how the theories of communication and computation are beginning to provide the tools to solve cryptographic problems of long standing.
Article
Most of the published work on massive redundancy makes one crucial assumption: the redundant modules are synchronized. There are three ways of achieving synchronization in redundant systems-independent accurate clocks, a common external reference, and mutual feedback. The use of a common external reference is currently the most widely used technique, but suffers from vulnerability to common-point failures. We introduce a novel mutual feedback technique, called "synchronization voting," that does not have this drawback. A practical application of synchronization voting is described in the appendix—a fault-tolerant crystal-controlled clock.
Article
The Byzantine Generals problem involves a system of N processes, t of which may be unreliable. The problem is for the reliable processes to agree on a binary value sent by a "general", which may itself be one of the N processes. If the general sends the same value to each process, then all reliable processes must agree on that value, but in any case, they must agree on the same value. We give an explicit solution for a binary value among N = 3t + 1 processes, using 2t + 4 rounds and O(t³ log t) message bits, where t bounds the number of faulty processes. This solution is easily extended to the general case of N ≥ 3t + 1 to give a solution using 2t + 5 rounds and O(tN + t³ log t) message bits.
Article
The transaction commit problem in a distributed database system is an instance of the Weak Byzantine Generals problem. It is shown that even under the assumption that a process can fail only by "crashing"---failing to send any more messages---a solution to this problem that can tolerate k failures must, in the worst case, require at least k + 1 message-passing delays. Under this same assumption, a simple solution that exhibits the optimal worst-case behavior is given. In many database systems, there is a point in the processing of a transaction when an irrevocable decision is made whether to abort or commit it---where committing the transaction involves inserting any changes it made into the database. For a distributed database system, this decision must be announced to all the sites affected by the transaction. We will show that designin...
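The commit protocol whose timing is bounded here follows the familiar two-phase shape: collect votes, then announce the decision. The sketch below is a minimal failure-free illustration with names of our choosing; the abstract's lower bound says that once k crash failures must be tolerated, no such protocol can finish in fewer than k + 1 message-passing delays in the worst case.

```python
class Participant:
    """Minimal participant: votes in phase 1, records the decision."""
    def __init__(self, vote="yes"):
        self.vote, self.state = vote, "active"
    def prepare(self):
        return self.vote
    def finish(self, decision):
        self.state = decision

def two_phase_commit(participants):
    """Two-phase commit sketch: commit only if every participant votes
    yes in phase 1; announce the decision in phase 2."""
    votes = [p.prepare() for p in participants]              # phase 1
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    for p in participants:                                   # phase 2
        p.finish(decision)
    return decision

group = [Participant(), Participant(), Participant(vote="no")]
outcome = two_phase_commit(group)   # a single "no" vote forces abort
```

Note that this failure-free sketch already uses two message delays; tolerating crashes of the coordinator or participants is what drives the delay count up to k + 1.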
A Discussion of Distributed Systems
• J Gray
J. Gray. A discussion of distributed systems. Research Report RJ2699, IBM, September 1979.
• F B Schneider
• D Gries
• R D Schlichting
F. B. Schneider, D. Gries, and R. D. Schlichting. Fast reliable broadcasts. Computer Science Technical Report TR 82-519, Cornell University, September 1982.
Distributed commit with bounded waiting
• D Dolev
• H R Strong
D. Dolev and H. R. Strong. Distributed commit with bounded waiting. In Proc. Second Symposium on Reliability in Distributed Software and Database System, Pittsburgh, July 1982.
• B G Lindsay
B. G. Lindsay et al. Notes on distributed databases. Research Report RJ2571, IBM, July 1979.