Conference Paper

Network-Assisted Raft Consensus Algorithm

Authors:

Abstract

Consensus is a fundamental problem in distributed computing. In this poster, we ask the following question: can we partially offload the execution of a consensus algorithm to the network to improve its performance? We argue for an affirmative answer by proposing a network-assisted implementation of the Raft consensus algorithm. Our approach reduces consensus latency, is failure-aware, and does not sacrifice correctness or scalability. To enable Raft-aware forwarding and quick responses, we use P4-based programmable switches and offload part of Raft's functionality to the switch. We demonstrate the efficacy of our approach and the performance improvements it offers via a prototype implementation.
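One way to picture the offloaded Raft functionality is a switch on the path between the leader and its followers that counts AppendEntries acknowledgements per log index and announces commitment as soon as a majority of the cluster (leader included) holds the entry. The Python sketch below models only that counting step, under stated assumptions: the names `RaftAwareSwitch` and `on_append_ack` are hypothetical, every tracked entry is assumed to belong to the current term, and term changes, retransmissions, and leader failover are left to the end hosts. It is not the poster's published design; a data-plane version would hold this state in P4 registers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class RaftAwareSwitch:
    """Hypothetical switch-side helper that counts follower acks per log index.

    Assumptions: the leader has persisted every entry it sends (so it always
    counts toward the majority), and all tracked entries are from `current_term`.
    """

    cluster_size: int  # leader + followers
    current_term: int = 0
    commit_index: int = 0
    acks: Dict[int, Set[str]] = field(default_factory=dict)  # index -> follower ids

    def majority(self) -> int:
        return self.cluster_size // 2 + 1

    def on_append_ack(self, term: int, follower_id: str, match_index: int) -> Optional[int]:
        """Record that `follower_id` stored all entries up to `match_index`.

        Returns the new commit index if it advanced, otherwise None.
        """
        if term != self.current_term:
            return None  # stale or newer term: defer to the end hosts
        # Raft's log matching means an ack for index i covers every earlier index.
        for idx in range(self.commit_index + 1, match_index + 1):
            self.acks.setdefault(idx, set()).add(follower_id)
        advanced = None
        idx = self.commit_index + 1
        # The "+ 1" counts the leader's own copy of the entry.
        while idx in self.acks and len(self.acks[idx]) + 1 >= self.majority():
            self.commit_index = idx
            del self.acks[idx]
            advanced = idx
            idx += 1
        return advanced
```

For a five-node cluster, a first follower ack leaves index 1 uncommitted (two copies out of five), and a second follower ack for the same index commits it (three out of five).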

... • Reconfigurability: the parser and the processing logic can be redefined in the field. [Taxonomy residue: Variations [54]-[62]; Collectors and Solutions [63]-[67]; Congestion Control [68]-[76]; Measurements; AQM [99]-[109]; QoS and TM [110]-[114]; Multicast [115]-[117]; Load Balancing [118]-[126]; Caching [127]-[136]; Telecom Services [137]-[146]; Content-centric Networking [147]-[152]; Consensus [153]-[160]; Machine Learning [161]-[166]; Miscellaneous [167]-[175]; Aggregation [176]-[179]; Service Automation [180, 181]; Heavy Hitter [182]-[190]; Cryptography [191]-[195]; Anonymity [196]-[200]; Access Control [201]-[208]; Attacks and Defenses; Troubleshoot [230]-[234]; Verification [235]-[243].] • Protocol independence: the switch is protocol-agnostic. The programmer defines the protocols, the parser, and the operations to process the headers. ...
... The system offloads the comparison of controllers' outputs required for correct BFT operations to programmable switches. Finally, Han et al. [156] offloaded part of the Raft consensus algorithm [296] to programmable switches in order to improve its performance. The authors selected Raft due to the fact that it has been formally proven to be more safe than Paxos, and it has been implemented on popular SDN controllers. ...
... [Table residue: Others: Eris [155], Novel; P4BFT [159], BFT, N/A; [156], Raft, ×.] Unordered and completely asynchronous networks require the full implementation and complexity of Paxos. NOPaxos suggests that the communication layer should provide a new Ordered Unreliable Multicast (OUM) primitive; that is, there is a guarantee that receivers will process the multicast messages in the same order, though messages can be lost. ...
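The Ordered Unreliable Multicast guarantee described above (the same order at every receiver, but possible loss) can be realized by having a sequencer stamp each multicast message with a consecutive number, so that receivers deliver in order and detect gaps. The Python sketch below illustrates that receiver-side contract under this assumption; `Sequencer`, `OumReceiver`, and the `DROP` marker are hypothetical names, and NOPaxos's actual wire format and gap-recovery protocol are not reproduced here.

```python
from typing import List, Tuple, Union

DROP = "DROP"  # hypothetical drop-notification marker

Delivered = Union[str, Tuple[str, int]]


class Sequencer:
    """Stamps each multicast message with a consecutive sequence number."""

    def __init__(self) -> None:
        self.next_seq = 1

    def stamp(self, payload: str) -> Tuple[int, str]:
        seq = self.next_seq
        self.next_seq += 1
        return seq, payload


class OumReceiver:
    """Delivers stamped messages in sequence order and reports gaps."""

    def __init__(self) -> None:
        self.expected = 1
        self.log: List[Delivered] = []

    def receive(self, seq: int, payload: str) -> None:
        if seq < self.expected:
            return  # duplicate or late message: already delivered or reported lost
        # Every number skipped over was lost in the network: report each one.
        for missing in range(self.expected, seq):
            self.log.append((DROP, missing))
        self.log.append(payload)
        self.expected = seq + 1
```

Two receivers fed the same stamped stream, one of which loses message 2, end up with logs `["a", "b", "c"]` and `["a", ("DROP", 2), "c"]`: the surviving messages appear in the same order at both, and the gap is reported explicitly rather than silently skipped.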
Article
Traditionally, the data plane has been designed with fixed functions to forward packets using a small set of protocols. This closed-design paradigm has limited the capability of the switches to proprietary implementations which are hard-coded by vendors, inducing a lengthy, costly, and inflexible process. Recently, data plane programmability has attracted significant attention from both the research community and the industry, permitting operators and programmers in general to run customized packet processing functions. This open-design paradigm is paving the way for an unprecedented wave of innovation and experimentation by reducing the time of designing, testing, and adopting new protocols; enabling a customized, top-down approach to develop network applications; providing granular visibility of packet events defined by the programmer; reducing complexity and enhancing resource utilization of the programmable switches; and drastically improving the performance of applications that are offloaded to the data plane. Despite the impressive advantages of programmable data plane switches and their importance in modern networks, the literature has been missing a comprehensive survey. To this end, this paper provides a background encompassing an overview of the evolution of networks from legacy to programmable, describing the essentials of programmable switches, and summarizing their advantages over Software-defined Networking (SDN) and legacy devices. The paper then presents a unique, comprehensive taxonomy of applications developed with P4 language; surveying, classifying, and analyzing more than 200 articles; discussing challenges and considerations; and presenting future perspectives and open research issues.
Preprint
Traditionally, the data plane has been designed with fixed functions to forward packets using a small set of protocols. This closed-design paradigm has limited the capability of the switches to proprietary implementations which are hardcoded by vendors, inducing a lengthy, costly, and inflexible process. Recently, data plane programmability has attracted significant attention from both the research community and the industry, permitting operators and programmers in general to run customized packet processing functions. This open-design paradigm is paving the way for an unprecedented wave of innovation and experimentation by reducing the time of designing, testing, and adopting new protocols; enabling a customized, top-down approach to develop network applications; providing granular visibility of packet events defined by the programmer; reducing complexity and enhancing resource utilization of the programmable switches; and drastically improving the performance of applications that are offloaded to the data plane. Despite the impressive advantages of programmable data plane switches and their importance in modern networks, the literature has been missing a comprehensive survey. To this end, this paper provides a background encompassing an overview of the evolution of networks from legacy to programmable, describing the essentials of programmable switches, and summarizing their advantages over Software-defined Networking (SDN) and legacy devices. The paper then presents a unique, comprehensive taxonomy of applications developed with P4 language; surveying, classifying, and analyzing more than 150 articles; discussing challenges and considerations; and presenting future perspectives and open research issues.
... Downstream nodes need to parse only the UPC to make forwarding decisions. [Table residue (work, year, target): [485], 2017, -; Sankaran et al. [486], 2020, -; Zhang et al. [487], 2017, bmv2; Dang et al. [488, 489], 2016/20, Tofino; [490]; P4BFT [491, 492], 2019, bmv2 and Netronome; SwiShmem [493], 2020, -; SC-BFT [494], 2020, bmv2.] ...
... Zhang et al. [487] propose to offload parts of the Raft consensus algorithm to P4 switches. However, the mechanisms require an additional client to run on the switch. ...
Preprint
With traditional networking, users can configure control plane protocols to match the specific network configuration, but without the ability to fundamentally change the underlying algorithms. With SDN, users may provide their own control plane, which can control network devices through their data plane APIs. Programmable data planes allow users to define their own data plane algorithms for network devices including appropriate data plane APIs which may be leveraged by user-defined SDN control. Thus, programmable data planes and SDN offer great flexibility for network customization, be it for specialized, commercial appliances, e.g., in 5G or data center networks, or for rapid prototyping in industrial and academic research. Programming protocol-independent packet processors (P4) has emerged as the currently most widespread abstraction, programming language, and concept for data plane programming. It is developed and standardized by an open community and it is supported by various software and hardware platforms. In this paper, we survey the literature from 2015 to 2020 on data plane programming with P4. Our survey covers 497 references of which 367 are scientific publications. We organize our work into two parts. In the first part, we give an overview of data plane programming models, the programming language, architectures, compilers, targets, and data plane APIs. We also consider research efforts to advance P4 technology. In the second part, we analyze a large body of literature considering P4-based applied research. We categorize 241 research papers into different application domains, summarize their contributions, and extract prototypes, target platforms, and source code availability.
... The impact of (a) the number of controllers and (b) the number of disjoint controller clusters on the control plane load footprint in Internet2 and Fat-Tree (k = 4) topologies, for 5000 randomized controller placements each. (a) randomizes the placement but fixes the number of disjoint clusters to 3; (b) randomizes the number of disjoint clusters among {1, 7, 13, 17} but fixes the number of controllers to 17. Fig. 4 depicts the processing delay incurred in the processing node for a single client request. The delay corresponds to the P4 pipeline execution time spent identifying a correct controller message, comprising (i) the hash computation over controller messages; (ii) incrementing the counters for the computed hash; (iii) signing the correct packet; and (iv) propagating it to the correct egress port. ...
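The four pipeline steps listed in the caption can be mimicked in software. The Python sketch below hashes each controller replica's message, counts distinct replicas per hash, and releases a message once `f + 1` replicas have sent identical content, which ensures that at least one correct replica vouches for it when at most `f` are faulty. The `f + 1` threshold, the SHA-256 hash, and the class name are assumptions for illustration; the signing step is a placeholder tag rather than real cryptography, and P4BFT's actual register layout and signature scheme are not shown.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Optional, Set


class ControllerMessageComparator:
    """Hypothetical in-network comparator for replicated controller messages."""

    def __init__(self, max_faulty: int) -> None:
        self.threshold = max_faulty + 1  # f + 1 matching copies (assumed rule)
        self.senders: Dict[str, Set[str]] = defaultdict(set)
        self.released: Set[str] = set()

    def on_message(self, replica_id: str, payload: bytes) -> Optional[bytes]:
        # (i) hash computation over the controller message
        digest = hashlib.sha256(payload).hexdigest()
        # (ii) increment the counter for this hash (at most one vote per replica)
        self.senders[digest].add(replica_id)
        if digest in self.released or len(self.senders[digest]) < self.threshold:
            return None
        self.released.add(digest)
        # (iii) "sign" the agreed packet: placeholder tag, not real crypto
        signed = b"SIGNED:" + payload
        # (iv) the caller would now forward `signed` to the correct egress port
        return signed
```

Without such a comparator, the target switch would receive one copy of each configuration message per controller replica; with it placed upstream, the target receives a single agreed copy, which is the kind of load reduction the P4BFT evaluation measures.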
... In this paper, we investigate whether a similar claim can be transferred to BFT algorithms in the SDN context. In the same spirit, in [17], end-hosts partially offload the log replication and log commitment operations of the Raft consensus algorithm to neighboring P4 devices, thus accelerating the overall commit time. ...
Preprint
Byzantine Fault Tolerance (BFT) enables correct operation of distributed, i.e., replicated applications in the face of malicious take-over and faulty/buggy individual instances. Recently, BFT designs have gained traction in the context of Software Defined Networking (SDN). In SDN, controller replicas are distributed and their state replicated for high availability purposes. Malicious controller replicas, however, may destabilize the control plane and manipulate the data plane, thus motivating the BFT requirement. Nonetheless, deploying BFT in practice comes at a disadvantage of increased traffic load stemming from replicated controllers, as well as a requirement for proprietary switch functionalities, thus putting strain on switches' control plane where particular BFT actions must be executed in software. P4BFT leverages an optimal strategy to decrease the total amount of messages transmitted to switches that are the configuration targets of SDN controllers. It does so by means of message comparison and deduction of correct messages in the determined optimal locations in the data plane. In terms of the incurred control plane load, our P4-based data plane extensions outperform the existing solutions by ~33.2% and ~40.2% on average, in random 128-switch and Fat-Tree/Internet2 topologies, respectively. To validate the correctness and performance gains of P4BFT, we deploy bmv2 and Netronome Agilio SmartNIC-based topologies. The advantages of P4BFT can thus be reproduced both with software switches and "commodity" P4-enabled hardware. A hardware-accelerated controller packet comparison procedure results in an average ~96.4% decrease in processing delay per request compared to existing software approaches.
... It uses multiple candidate nodes to replicate logs in parallel, which improves scalability and transaction throughput. In 2017, Zhang et al. [56] proposed the Network-Assisted Raft algorithm. While preserving correctness and scalability, it offloads part of the Raft algorithm from the consensus layer to a programmable switch in the network layer, optimizing the algorithm's forwarding and response steps to improve performance. ...
Article
Blockchain technology can solve the problem of trust in open networks in a decentralized way. It has broad application prospects and has attracted extensive attention from academia and industry. The blockchain consensus algorithm ensures that the nodes in the chain reach consensus in a complex network environment, so that the node states ultimately remain consistent. The consensus algorithm is one of the core technologies of blockchain and plays a pivotal role in blockchain research. This article introduces the basic concepts of the blockchain, summarizes its key technologies with a particular focus on consensus algorithms, explains the general principles of the consensus process, and classifies the mainstream consensus algorithms. Then, focusing on improvements to consensus algorithm performance, it reviews the research progress of consensus algorithms in detail; analyzes and compares the characteristics, suitable scenarios, and possible shortcomings of different consensus algorithms; and, on this basis, discusses future development trends of consensus algorithms.
... Recently, implementations of consensus algorithms in networking hardware (e.g., those of Paxos [30], [31], Raft [32] and Byzantine agreement [33], [34]) have started gaining traction. Dang et al. [30], [31] portray throughput, latency and flexibility benefits of network-supported consensus execution at line speed. ...
Conference Paper
Centralized Software Defined Networking (SDN) controllers and Network Management Systems (NMS) introduce the issue of the controller as a single point of failure (SPOF). The SPOF correspondingly motivated the introduction of distributed controllers, in which controller instances are grouped into replicated clusters to enable high availability. The replication of the controller state relies on distributed consensus and state synchronization for correct operation. Recent works have, however, demonstrated issues with this approach. False positives in failure detectors deployed in replicas may result in oscillating leadership and control plane unavailability. In this paper, we first elaborate the problematic scenario. We resolve the related issues by decoupling the failure detector from the underlying signaling methodology and by introducing event agreement as a necessary component of the proposed design. The effectiveness of the proposed model is validated using an exemplary implementation and demonstration in the problematic scenario. We present an analytic model to describe the worst-case delay required to reliably agree on replica failures. The effectiveness of the analytic formulation is confirmed empirically using varied cluster configurations in an emulated environment. Finally, we discuss the impact of each component of our design on the replica failure- and recovery-detection delay, as well as on the imposed communication overhead.
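The decoupling described above, where local failure detectors only raise suspicions and a replica is declared failed only after the cluster agrees, can be illustrated with a small sketch. Here each detector suspects a peer whose last heartbeat is older than a timeout, and a failure event is accepted only when a majority of the cluster reports the same suspicion, so a single detector's false positive cannot by itself trigger a leadership change. The timeout, the majority rule, and all names are assumptions chosen for illustration; the paper's actual agreement protocol and its analytic worst-case delay model are not reproduced.

```python
from typing import Dict, Set


class HeartbeatDetector:
    """Local, possibly wrong, failure detector run by one replica."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, peer: str, now: float) -> None:
        self.last_seen[peer] = now

    def suspects(self, now: float) -> Set[str]:
        # Suspect every peer whose last heartbeat is older than the timeout.
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


def agreed_failures(reports: Dict[str, Set[str]], cluster_size: int) -> Set[str]:
    """Accept a failure only if a majority of the cluster suspects the same peer.

    `reports` maps each reporting replica to the peers it currently suspects.
    """
    majority = cluster_size // 2 + 1
    votes: Dict[str, int] = {}
    for suspected in reports.values():
        for peer in suspected:
            votes[peer] = votes.get(peer, 0) + 1
    return {peer for peer, n in votes.items() if n >= majority}
```

In a three-replica cluster, one replica with a flaky link to `r2` produces a suspicion that is not agreed on, while a genuine crash of `r2`, suspected by both remaining replicas, is.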
... In this paper, we investigate whether a similar claim can be transferred to BFT algorithms in the SDN context. In the same spirit, in [14], end-hosts partially offload the log replication and log commitment operations of the Raft consensus algorithm to neighboring P4 devices, thus accelerating the overall commit time. In the context of in-network computation, Sapio et al. [15] discuss the benefit of offloading data aggregation to constrained network devices to reduce data volume and minimize workers' computation time. ...
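The data-aggregation offload attributed to Sapio et al. [15] can be pictured as a switch-resident accumulator: each worker sends its chunk of a vector, the device adds it element-wise into a slot, and only the completed sum leaves the device once every worker has contributed, so the downstream host receives one message instead of one per worker. The sketch below is a generic illustration under those assumptions (integer values, a fixed and known number of workers, hypothetical names), not the design from [15].

```python
from typing import List, Optional, Set


class AggregationSlot:
    """Hypothetical switch-side accumulator for one chunk of a vector."""

    def __init__(self, num_workers: int, chunk_len: int) -> None:
        self.num_workers = num_workers
        self.total = [0] * chunk_len  # integer values assumed
        self.seen: Set[int] = set()

    def add(self, worker_id: int, chunk: List[int]) -> Optional[List[int]]:
        if len(chunk) != len(self.total):
            raise ValueError("chunk length does not match the slot size")
        if worker_id in self.seen:
            return None  # retransmission: do not double-count
        self.seen.add(worker_id)
        self.total = [a + b for a, b in zip(self.total, chunk)]
        if len(self.seen) == self.num_workers:
            return self.total  # emit a single aggregated message downstream
        return None
```

With three workers sending `[1, 2]`, `[3, 4]`, and `[5, 6]`, nothing is emitted until the third chunk arrives, at which point the slot releases `[9, 12]`.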
Conference Paper
Byzantine Fault Tolerance (BFT) enables correct operation of distributed, i.e., replicated applications in the face of malicious takeover and faulty/buggy individual instances. Recently, BFT designs have gained traction in the context of Software Defined Networking (SDN). In SDN, controller replicas are distributed and their state replicated for high availability purposes. Malicious controller replicas, however, may destabilize the control plane and manipulate the data plane, thus motivating the BFT requirement. Nonetheless, deploying BFT in practice comes at a disadvantage of increased traffic load stemming from replicated controllers, as well as a requirement for proprietary switch functionalities, thus putting strain on switches' control plane where particular BFT actions must be executed in software. P4BFT leverages an optimal strategy to decrease the total amount of messages transmitted to switches that are the configuration targets of SDN controllers. It does so by means of message comparison and deduction of correct messages in the determined optimal locations in the data plane. In terms of the incurred control plane load, our P4-based data plane extensions outperform the existing solutions by ~33.2% and ~40.2% on average, in random 128-switch and Fat-Tree/Internet2 topologies, respectively. To validate the correctness and performance gains of P4BFT, we deploy bmv2 and Netronome Agilio SmartNIC-based topologies. The advantages of P4BFT can thus be reproduced both with software switches and "commodity" P4-enabled hardware. A hardware-accelerated controller packet comparison procedure results in an average 96.4% decrease in processing delay per request compared to existing software approaches.
Article
Despite the enormous number of online docking services available, consumers sometimes struggle to discover the services they require. Moreover, when looking for matching or recommendation platforms from an academic or industry perspective, most of the related work one can find consists of centralized systems. Unfortunately, centralized systems often have shortcomings, such as being advertisement-driven, lack of trust, non-transparency, and unfairness. The authors propose a peer-to-peer (P2P) service network for service discovery and recommendation. ServiceNet is a blockchain-based service ecosystem that promises to provide an open, transparent, self-growing, and self-managing service environment. The article provides the basic concept, the design of the prototype architecture, and the implementation and performance assessment of the initial prototype.
Article
Unprecedented attention to blockchain technology is serving as a game-changer in fostering the development of distinctive blockchain-enabled frameworks. However, the fragmentation unleashed by its underlying concepts hinders different stakeholders from effectively utilizing blockchain-supported services, obstructing its wide-scale adoption. Exploring synergies among the isolated frameworks requires a comprehensive study of inter-blockchain communication approaches. These approaches broadly fall under the umbrella of Blockchain Interoperability (BI), which can facilitate a novel paradigm of an integrated blockchain ecosystem that connects disparate state-of-the-art blockchains. Currently, there is a lack of studies that comprehensively review BI, which is a stumbling block to its development. Therefore, this article aims to articulate the potential of BI by reviewing it from diverse perspectives. Beginning with a glance at blockchain architecture fundamentals, this article discusses its associated platforms, taxonomy, and consensus mechanisms. Subsequently, it argues for the need for BI by exemplifying its potential opportunities and application areas. An architecture appears to be a missing link for BI; hence, this article introduces a layered architecture for the effective development of protocols and methods for interoperable blockchains. Furthermore, this article proposes an in-depth BI research taxonomy and provides insight into state-of-the-art projects. Finally, it identifies possible open challenges and future research directions in the domain.
Article
This paper describes an implementation of the well-known consensus protocol, Paxos, in the P4 programming language. P4 is a language for programming the behavior of network forwarding devices (i.e., the network data plane). Moving consensus logic into network devices could significantly improve the performance of the core infrastructure and services in data centers. Moreover, implementing Paxos in P4 provides a critical use case and set of requirements for data plane language designers. In the long term, we imagine that consensus could someday be offered as a network service, just as point-to-point communication is provided today.
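The core of a data-plane Paxos implementation is the acceptor, whose state is small enough to keep in per-instance switch registers. The Python sketch below models that state and the standard Paxos acceptor rules for Phase 1 (promise) and Phase 2 (accept); the register-like arrays, the return values, and the class name are assumptions for illustration, not the P4 program described in the paper. Round 0 is reserved to mean "none yet".

```python
from typing import List, Optional, Tuple


class PaxosAcceptor:
    """Acceptor state held per consensus instance, as switch registers would be."""

    def __init__(self, num_instances: int) -> None:
        self.promised = [0] * num_instances  # highest round promised
        self.accepted_round = [0] * num_instances
        self.accepted_value: List[Optional[str]] = [None] * num_instances

    def phase1a(self, inst: int, rnd: int) -> Optional[Tuple[int, Optional[str]]]:
        """Prepare: promise `rnd` only if it exceeds every earlier promise."""
        if rnd <= self.promised[inst]:
            return None  # reject: an equal or higher round was already promised
        self.promised[inst] = rnd
        # The Phase 1b reply reports any previously accepted (round, value).
        return self.accepted_round[inst], self.accepted_value[inst]

    def phase2a(self, inst: int, rnd: int, value: str) -> bool:
        """Accept: vote for `value` unless a higher round was promised."""
        if rnd < self.promised[inst]:
            return False
        self.promised[inst] = rnd
        self.accepted_round[inst] = rnd
        self.accepted_value[inst] = value
        return True  # a Phase 2b vote would now be sent to the learners
```

Because each instance needs only three fixed-width fields, the same logic maps naturally onto register arrays indexed by instance number, which is what makes a line-rate acceptor plausible on a switch.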
Article
OpenFlow is a vendor-agnostic API for controlling hardware and software switches. In its current form, OpenFlow is specific to particular protocols, making it hard to add new protocol headers. It is also tied to a specific processing paradigm. In this paper we make a strawman proposal for how OpenFlow should evolve in the future, starting with the definition of an abstract forwarding model for switches. We have three goals: (1) Protocol independence: Switches should not be tied to any specific network protocols. (2) Target independence: Programmers should describe how switches are to process packets in a way that can be compiled down to any target switch that fits our abstract forwarding model. (3) Reconfigurability in the field: Programmers should be able to change the way switches process packets once they are deployed in a network. We describe how to write programs using our abstract forwarding model and our P4 programming language in order to configure switches and populate their forwarding tables.
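The abstract forwarding model described above can be summarized as a programmable parser followed by match-action tables whose entries are populated at runtime. The Python sketch below mirrors that structure for a toy header: the 4-byte header layout, the table, and the `forward`/`drop` actions are hypothetical, and real P4 programs express the same ideas with `parser`, `table`, and `action` declarations compiled for a specific target.

```python
import struct
from typing import Callable, Dict, Tuple

# Hypothetical 4-byte header: 2-byte destination id, 2-byte egress hint.
HEADER_FMT = "!HH"

Action = Callable[[Dict[str, int]], Tuple[str, int]]


def parse(packet: bytes) -> Dict[str, int]:
    """Parser stage: extract header fields into a dictionary."""
    dst, hint = struct.unpack_from(HEADER_FMT, packet)
    return {"dst": dst, "hint": hint}


def forward(port: int) -> Action:
    return lambda hdr: ("forward", port)


def drop() -> Action:
    return lambda hdr: ("drop", -1)


class ExactMatchTable:
    """Match-action stage: exact match on one field, default action on a miss."""

    def __init__(self, key: str, default: Action) -> None:
        self.key = key
        self.entries: Dict[int, Action] = {}
        self.default = default

    def add_entry(self, value: int, action: Action) -> None:
        # Entries are installed at runtime, e.g. by a control plane.
        self.entries[value] = action

    def apply(self, hdr: Dict[str, int]) -> Tuple[str, int]:
        return self.entries.get(hdr[self.key], self.default)(hdr)
```

Separating the parser from the tables is what gives protocol independence in this picture: changing the header format touches only `parse`, while the control plane keeps populating tables through the same `add_entry` interface.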
Article
In this paper, we describe ZooKeeper, a service for coordinating processes of distributed applications. Since ZooKeeper is part of critical infrastructure, ZooKeeper aims to provide a simple and high-performance kernel for building more complex coordination primitives at the client. It incorporates elements from group messaging, shared registers, and distributed lock services in a replicated, centralized service. The interface exposed by ZooKeeper has the wait-free aspects of shared registers with an event-driven mechanism similar to cache invalidations of distributed file systems to provide a simple, yet powerful coordination service. The ZooKeeper interface enables a high-performance service implementation. In addition to the wait-free property, ZooKeeper provides a per-client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state. These design decisions enable the implementation of a high-performance processing pipeline with read requests being satisfied by local servers. We show for the target workloads, 2:1 to 100:1 read-to-write ratios, that ZooKeeper can handle tens to hundreds of thousands of transactions per second. This performance allows ZooKeeper to be used extensively by client applications.
Conference Paper
This paper explores the possibility of implementing the widely deployed Paxos consensus protocol in network devices. We present two different approaches: (i) a detailed design description for implementing the full Paxos logic in SDN switches, which identifies a sufficient set of required OpenFlow extensions; and (ii) an alternative, optimistic protocol which can be implemented without changes to the OpenFlow API, but relies on assumptions about how the network orders messages. Although neither of these protocols can be fully implemented without changes to the underlying switch firmware, we argue that such changes are feasible in existing hardware. Moreover, we present an evaluation that suggests that moving Paxos logic into the network would yield significant performance benefits for distributed applications.
Article
... therefore be of some interest to computer scientists. I present here a short history of the Paxos Parliament's protocol, followed by an even shorter discussion of its relevance for distributed systems. Author's address: Systems Research, Digital Equipment Corporation, 130 Lytton Avenue, Palo Alto, CA 94301. 1998 ACM 0734-2071/98/0500--0133. Footnote 1: It should not be confused with the Ionian island of Paxoi, whose name is sometimes corrupted to Paxos. ACM Transact
NetPaxos: Consensus at network speed
H. T. Dang et al. NetPaxos: Consensus at network speed. In Proc. SOSR, 2015.
Consensus in a box: Inexpensive coordination in hardware
Z. István et al. Consensus in a box: Inexpensive coordination in hardware. In Proc. NSDI, 2016.