
P4BFT: A Demonstration of Hardware-Accelerated BFT in Fault-Tolerant Network Control Plane

Ermin Sakic
Cristian Bermudez Serna
Siemens AG
Munich, Germany
Endri Goshi
Nemanja Deric
Wolfgang Kellerer
Technical University Munich
Munich, Germany
CCS Classication:
Networks -> Programmable Networks;
Systems Security -> Distributed Systems Security;
ONOS and OpenDaylight deploy RAFT consensus to enforce update ordering and leader fail-over in the face of controller failures. RAFT is, however, unable to distinguish malicious or incorrect (e.g., buggy [ ]) controller decisions from correct ones, and can easily be manipulated by an adversary in possession of the leader [ ]. Byzantine Fault Tolerance (BFT)-enabled controller designs support correct consensus in scenarios where a subset of controllers is faulty due to a malicious adversary or internal bugs. Existing proposals [ ] base their correctness on the premise that controller outputs are collected by trusted configuration targets, which compare the payloads to identify the correct message. Namely, each controller instance of the administrative domain transmits its configuration to the target switch. In in-band [ ] deployments, where application flows share the same infrastructure as the control flows, the traffic generated by controller replicas imposes a non-negligible network load [ ]. Furthermore, comparing and processing controller messages in the switches’ control plane incurs additional CPU load [2, 4] and reconfiguration time.
P4BFT introduces the concept of in-network processing nodes, which intercept and collect the individually computed controller outputs for matching client requests. After collecting a sufficient number of packets to identify the correct message payload, a processing node forwards the correct payload to the destined configuration target. By intercepting control flows in processing switches and establishing point-to-point connections between processing switches and target destinations, P4BFT minimizes the network load imposed by BFT operation. P4BFT realizes the processing node functionality purely in a P4 pipeline, which collects controller packets, identifies the correct packet, and forwards it to the destination nodes at line rate. It thus minimizes accesses to the switches’ software control plane and vastly outperforms software-based BFT solutions.
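To make the collect-and-match step concrete, the following Python sketch models a processing node's decision logic in software. It is not the P4 pipeline itself: the class name `ProcessingNode`, the use of SHA-256 payload digests, and the threshold passed in (for instance F_M + 1 identical replies when up to F_M controllers are Byzantine) are illustrative assumptions, not details taken from P4BFT.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Optional, Set


class ProcessingNode:
    """Hypothetical software model of a processing node's matching logic."""

    def __init__(self, threshold: int) -> None:
        # Number of distinct controllers that must send an identical payload.
        self.threshold = threshold
        # request_id -> payload digest -> ids of controllers that sent that payload
        self.votes: Dict[int, Dict[bytes, Set[str]]] = defaultdict(lambda: defaultdict(set))
        # Requests whose correct payload has already been forwarded.
        self.decided: Set[int] = set()

    def on_packet(self, request_id: int, controller_id: str, payload: bytes) -> Optional[bytes]:
        """Return the payload to forward to the target, or None if still undecided."""
        if request_id in self.decided:
            return None  # late replies for a decided request are dropped
        digest = hashlib.sha256(payload).digest()
        voters = self.votes[request_id][digest]
        voters.add(controller_id)  # repeated copies from one controller count once
        if len(voters) >= self.threshold:
            self.decided.add(request_id)
            del self.votes[request_id]  # free per-request state
            return payload
        return None


if __name__ == "__main__":
    node = ProcessingNode(threshold=2)              # e.g. F_M = 1 -> 2 matching replies
    print(node.on_packet(7, "C1", b"flow-mod A"))  # None
    print(node.on_packet(7, "C2", b"flow-mod B"))  # None (diverging, possibly faulty reply)
    print(node.on_packet(7, "C3", b"flow-mod A"))  # b'flow-mod A'
```

Counting distinct controller ids per digest means a single faulty replica cannot reach the threshold by repeating or varying its own reply.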
In P4BFT, network controller instances are congured as a
set of replicated state machines, i.e., where each instance
calculates its decision in isolation from other controllers,
and transmits its decision to the destination switch. Control
packets are intercepted by the processing nodes (i.e., pro-
cessing switches) responsible for decisions destined for the
target switch. Consider Fig. 1. Given the placement of con-
trollers and the processing nodes’ capacity, with objective
to minimize the total control plane footprint and response
time, incurred for target conguration switches, P4BFT’s Re-
assigner component identies the optimal processing node
as the best t processing node for
. The multi-objective
formulation further considers the delay metric, available
processing capacities at switches (e.g., a hardware-enabled
P4BFT node has a higher throughput than the software-based
one), and thus minimizes the total traversed critical path be-
tween the controller furthest away from the conguration
target (
3in the worst case in Fig. 1, assuming a delay
weight of 1per hop). The resulting assignment additionally
minimizes the communication overhead to
11. This
is compared to state-of-the-art works [
] that default the
processing node assignment to reconguration targets, thus
resulting in higher FDand FCcompared to P4BFT.
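The placement trade-off can be illustrated with a simplified hop-count model. In the sketch below, which is an assumption and not P4BFT's actual optimization formulation, F_C is the sum of hops of all controller copies to a candidate processing node plus one forwarded copy to the target, and F_D is the worst-case controller-to-target path via that node. The candidate list is assumed to be pre-filtered to switches with spare processing capacity; the weights `w_c` and `w_d` and the toy topology are hypothetical and do not reproduce Fig. 1.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Graph = Dict[str, List[str]]


def undirected(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an adjacency list from undirected links."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def hop_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Breadth-first search: hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def place_processing_node(graph: Graph, controllers: List[str], target: str,
                          candidates: Iterable[str], w_c: float = 1.0,
                          w_d: float = 1.0) -> Tuple[str, int, int]:
    """Return (node, F_C, F_D) for the candidate minimizing w_c * F_C + w_d * F_D."""
    from_ctrl = [hop_distances(graph, c) for c in controllers]
    to_target = hop_distances(graph, target)  # undirected graph, so distances are symmetric
    best: Optional[Tuple[float, str, int, int]] = None
    for p in candidates:
        f_c = sum(d[p] for d in from_ctrl) + to_target[p]  # all copies to p, one copy p -> target
        f_d = max(d[p] for d in from_ctrl) + to_target[p]  # slowest controller's path via p
        cost = w_c * f_c + w_d * f_d
        if best is None or cost < best[0]:
            best = (cost, p, f_c, f_d)
    if best is None:
        raise ValueError("no candidate processing node")
    _, p, f_c, f_d = best
    return p, f_c, f_d


if __name__ == "__main__":
    # Hypothetical toy topology (not Fig. 1): controllers C1-C3, switches S1-S4, target S4.
    edges = [("C1", "S1"), ("C2", "S1"), ("C3", "S2"),
             ("S1", "S2"), ("S2", "S3"), ("S3", "S4")]
    g = undirected(edges)
    ctrls = ["C1", "C2", "C3"]
    print(place_processing_node(g, ctrls, "S4", ["S1", "S2", "S3", "S4"]))  # ('S2', 7, 4)
    print(place_processing_node(g, ctrls, "S4", ["S4"]))  # ('S4', 11, 4): processing at target
```

On this toy topology the weighted choice is S2, which lowers F_C from 11 to 7 compared with processing at the target S4 while keeping F_D at 4 hops; raising `w_d` relative to `w_c` shifts the choice toward low-delay nodes.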
SIGCOMM’19, August 19-24, 2019, Beijing, China Sakic et al.

Figure 1: P4BFT’s offloading of processing role capability to intermediate switches leads to decreased packet footprint and control flow delays on the critical path, F_C = 11 and F_D = 3 hops, respectively. [Figure: controllers C1-C5 grouped into Cluster 1 and Cluster 2; switches S1-S5; legend: controller cluster, processing node, destination node.]

Reassigner is the component that dynamically reassigns the controller-switch connections based on the events collected from the detection mechanism of the P4BFT switches. Upon detection of faulty controllers, it excludes them from the assignment procedure. It determines the minimum number of controllers required to tolerate a configurable number of availability failures F_A and Byzantine failures F_M [ ], and assigns the controllers to each switch of its administrative domain. Additionally, the Reassigner maps a processing node, in charge of comparing controller messages, to each destination switch. Switches declared as processing nodes thus become responsible for collecting and forwarding control packets. Reassigner executes once during network bootstrapping and on selected control plane changes (i.e., on addition / disconnection of a switch / controller). The optimization output is enforced upon the P4 match-action tables of the switches: i) the Processing Table, necessary to identify the switches responsible for comparing controller messages; and ii) the Forwarding Tables, necessary for forwarding controller messages to processing nodes and reconfiguration targets. Given the user-configurable number of failures to tolerate, the Reassigner reports to the processing nodes the number of matching messages that must be collected before a controller message is marked as correct. The details of the P4 control flow, as well as the match-action pairs of the P4 tables, are presented in [ ].
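The Reassigner's sizing rules can be sketched as a few helper functions. The controller count 2F_M + F_A + 1 is the bound this paper states for the replicated Reassigner; the matching threshold F_M + 1 is an assumption based on standard BFT reasoning: after F_A replicas become unavailable, F_M + 1 correct replicas remain and agree, while at most F_M faulty ones can never reach the threshold. `ProcessingEntry` and `build_processing_tables` are hypothetical names, not P4BFT's actual table schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProcessingEntry:
    """Hypothetical Processing Table entry installed on a processing node."""
    target: str          # configuration target whose controller messages are compared here
    matches_needed: int  # identical controller messages required before forwarding


def controllers_required(f_m: int, f_a: int) -> int:
    """Replicas needed to tolerate f_m Byzantine and f_a availability failures."""
    return 2 * f_m + f_a + 1


def matches_required(f_m: int) -> int:
    """Assumed threshold: f_m + 1 identical replies include at least one correct one."""
    return f_m + 1


def build_processing_tables(assignment: Dict[str, str], f_m: int) -> Dict[str, List[ProcessingEntry]]:
    """Group Processing Table entries per processing node.

    `assignment` maps each destination switch to its processing node,
    e.g. the output of a placement step.
    """
    tables: Dict[str, List[ProcessingEntry]] = {}
    for target, node in sorted(assignment.items()):
        tables.setdefault(node, []).append(ProcessingEntry(target, matches_required(f_m)))
    return tables


if __name__ == "__main__":
    f_m, f_a = 1, 1
    print(controllers_required(f_m, f_a))  # 4
    print(matches_required(f_m))           # 2
    print(build_processing_tables({"S4": "S2", "S5": "S2", "S3": "S3"}, f_m))
```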
P4BFT’s Reassigner can be deployed as a trusted component of a switch, or as a replicated component of the network controller, i.e., in at least 2F_M + F_A + 1 instances, so as to tolerate F_M Byzantine and F_A availability faults of the Reassigner.
The accompanying demonstration showcases the practical advantage of P4BFT in the 34-switch Internet2 topology and in exemplary topologies (cf. Fig. 1), in a testbed equipped with software and physical P4-enabled switches. The significant load footprint and reconfiguration delay improvements over state-of-the-art works are visualized on a real-time dashboard, similar to Fig. 2. Furthermore, the demonstration shows how hardware-based packet comparison lowers the total reconfiguration delay in scenarios where the processing role is centralized in a single P4-enabled hardware node, since fewer accesses to the software-based control plane are needed. The software switches are instances of the open-source bmv2 reference switch, adhering to the P4_16 language specification. The hardware-based P4BFT data plane node is a Netronome Agilio CX 10GE SmartNIC. An option to disable the offloading of the processing node capability is implemented for comparison with the methods presented in existing works. The configurable weight parameters allow for fine-tuning of the multi-objective optimization, and thus let the user prefer either minimized communication overhead or minimized reconfiguration latency. The special case, where all control plane packets stemming from replicated controller instances traverse a single hardware node, demonstrates the advantages of hardware-accelerated packet hashing and comparison. It thus underlines the case for hybrid deployments, where the control plane relies on state-of-the-art control protocols (e.g., OpenFlow / NETCONF+YANG), whereas the P4BFT-equipped edge nodes internalize the processing node capabilities.
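A P4 pipeline cannot allocate memory per request, so a line-rate comparison has to work on fixed-size register arrays and hash outputs. The Python sketch below emulates such a layout under stated assumptions: `RegisterMatcher`, the slot index `request_id % slots`, and CRC32 as the payload digest are hypothetical choices, not P4BFT's actual pipeline. A newer request mapped to an occupied slot simply evicts the older one, and 32-bit digest collisions are ignored; a real design would need to handle both.

```python
import zlib
from typing import List, Optional

EMPTY = -1  # sentinel: CRC32 digests and (assumed) request ids are non-negative


class RegisterMatcher:
    """Hypothetical emulation of fixed-size register arrays for payload comparison."""

    def __init__(self, slots: int, n_controllers: int, threshold: int) -> None:
        self.slots = slots
        self.n = n_controllers
        self.threshold = threshold
        self.owner: List[int] = [EMPTY] * slots   # request id currently using each slot
        self.done: List[bool] = [False] * slots   # whether that request was forwarded
        self.digests: List[List[int]] = [[EMPTY] * n_controllers for _ in range(slots)]

    def on_packet(self, request_id: int, controller_idx: int, payload: bytes) -> Optional[bytes]:
        """Store this controller's digest; forward once `threshold` digests match."""
        s = request_id % self.slots             # register index for this request
        if self.owner[s] != request_id:         # new request (re)claims the slot
            self.owner[s] = request_id
            self.done[s] = False
            self.digests[s] = [EMPTY] * self.n
        if self.done[s]:
            return None                         # suppress late duplicates
        digest = zlib.crc32(payload)            # stand-in for a data-plane hash extern
        row = self.digests[s]
        row[controller_idx] = digest            # one register cell per controller
        if sum(1 for d in row if d == digest) >= self.threshold:
            self.done[s] = True
            return payload
        return None


if __name__ == "__main__":
    m = RegisterMatcher(slots=4, n_controllers=3, threshold=2)
    print(m.on_packet(9, 0, b"A"))  # None
    print(m.on_packet(9, 1, b"B"))  # None
    print(m.on_packet(9, 2, b"A"))  # b'A'
```

Because each controller owns exactly one register cell per slot, a replica that changes its reply overwrites its earlier digest instead of voting twice.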
Figure 2: P4BFT’s performance gains in terms of control plane load and reconfiguration latency. [Plot panels: cumulative probability of switch reconfiguration delay [ms]; mean control plane load (footprint) improvement [%] compared to [1, 2, 4]. Series: State-of-Art ([1, 2, 4]); P4BFT (bmv2); P4BFT - 1x Processing Node (P4_16 SmartNIC); P4BFT - 1x Processing Node (bmv2).]
This work has received funding from the EU in the context
of the H2020 project SEMIOTICS (grant agreement number
[1] He Li et al. 2014. Byzantine-resilient secure SDN with multiple controllers in cloud. IEEE Transactions on Cloud Computing 2, 4 (2014).
[2] Purnima Murali Mohan et al. 2017. Primary-Backup Controller Mapping for Byzantine Fault Tolerance in SDN. In 2017 IEEE Global Communications Conference (IEEE Globecom 2017). IEEE.
[3] Abubakar Siddique Muqaddas et al. 2016. Inter-controller traffic in ONOS clusters for SDN. In 2016 IEEE International Conference on Communications (IEEE ICC 2016). IEEE.
[4] Ermin Sakic et al. 2018. MORPH: An Adaptive Framework for Efficient and Byzantine Fault-Tolerant SDN Control Plane. IEEE Journal on Selected Areas in Communications 36, 10 (2018).
[5] Ermin Sakic et al. 2019. P4BFT: Hardware-Accelerated Byzantine-Resilient Network Control Plane. CoRR abs/1905.04064 (2019).
[6] Liron Schiff et al. 2016. In-band synchronization for distributed SDN control planes. ACM SIGCOMM CCR 46, 1 (2016).
[7] Petra Vizarreta et al. 2017. An empirical study of software reliability in SDN controllers. In 2017 13th International Conference on Network and Service Management (IEEE CNSM 2017). IEEE.