
MORPH: An Adaptive Framework for Efficient and Byzantine Fault-Tolerant SDN Control Plane


Abstract and Figures

Current approaches to tackling the single point of failure in SDN entail a distributed operation of SDN controller instances. Their state synchronization process relies on the assumption of correct decision-making in the controllers. Successful introduction of SDN in critical infrastructure networks also requires catering to the issue of unavailable, unreliable (e.g. buggy) and malicious controller failures. We propose MORPH, a framework tolerant to unavailability and Byzantine failures, that distinguishes and localizes faulty controller instances and appropriately reconfigures the control plane. Our controller-switch connection assignment leverages the awareness of the source of failure to optimize the number of active controllers and minimize the controller and switch reconfiguration delays. The proposed re-assignment executes dynamically after each successful failure identification. We require 2F_M + F_A + 1 controllers to tolerate F_M malicious and F_A availability-induced failures. After a successful detection of F_M malicious controllers, MORPH reconfigures the control plane to require a single controller message to forward the system state. Next, we outline and present a solution to the practical correctness issues related to the statefulness of the distributed SDN controller applications, previously ignored in the literature. We base our performance analysis on a resource-aware routing application, deployed in an emulated testbed comprising up to 16 controllers and up to 34 switches, so as to tolerate up to 5 unique Byzantine and an additional 5 availability-induced controller failures (a total of 10 unique controller failures). We quantify and highlight the dynamic decrease in the packet and CPU load and the response time after each successful failure detection.
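The controller bound stated in the abstract can be checked with simple arithmetic. The following is a minimal sketch that only encodes the formula from the abstract (twice the malicious failures, plus the availability-induced failures, plus one); the function name `required_controllers` is hypothetical and is not part of MORPH.

```python
def required_controllers(f_malicious: int, f_availability: int) -> int:
    """Controllers needed per the abstract's bound: 2*F_M + F_A + 1."""
    if f_malicious < 0 or f_availability < 0:
        raise ValueError("failure counts must be non-negative")
    return 2 * f_malicious + f_availability + 1


# Evaluated scenario from the abstract: 5 Byzantine and 5 availability-induced failures.
print(required_controllers(5, 5))  # 16
```

With F_M = 5 and F_A = 5 the bound gives 2*5 + 5 + 1 = 16, which agrees with the testbed size of up to 16 controllers reported in the abstract.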
... Replicas agree on the order of updates and eventually commit the update requests in the per-application replicated log, thus providing strong consistency (i.e., serializability and linearizability [43]). We assume a non-Byzantine [8], [33], fail-stop [5] model: replicas that fail cease to work correctly. Network: We assume Raft replicas connected in an any-to-any manner, with reliability provided using disjoint paths that enable fail-over of replica-to-replica connections in case of network link/node outages. ...
... In contrast, the links of the Fat-Tree only possess the inherent processing and queuing delays. In the Internet2, we leverage a Raft replica placement that allows for high robustness against replica failures according to [13], [33]. In the case of Fat-Tree, replica/client pairs were placed on leaf nodes in round-robin order, similar to [33]. ...
... In the Internet2, we leverage a Raft replica placement that allows for high robustness against replica failures according to [13], [33]. In the case of Fat-Tree, replica/client pairs were placed on leaf nodes in round-robin order, similar to [33]. PPM model computation: If a feature has a variance that is orders of magnitude larger than others, it might dominate EN's objective function due to values with smaller amplitudes being penalized more by L1 and L2 regularizers, as a result making the estimator unable to learn from all features. ...
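The scaling problem described in the snippet above is commonly addressed by standardizing each feature to zero mean and unit variance before fitting a regularized model. The snippet is truncated before it states its remedy, so the following is only a sketch of that common practice, not the cited authors' procedure; the feature names and values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def standardize(column: List[float]) -> List[float]:
    """Return z-scores: subtract the mean, divide by the population std. dev."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical features on very different scales.
rtt_us = [120000.0, 95000.0, 240000.0, 180000.0]  # microseconds
cpu_load = [0.20, 0.35, 0.80, 0.55]               # fraction of one core

rtt_z = standardize(rtt_us)
cpu_z = standardize(cpu_load)
print(pstdev(rtt_z), pstdev(cpu_z))  # both approximately 1.0
```

After standardization both columns have comparable spread, so the L1 and L2 penalties act on their coefficients on an equal footing.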
Preprint
Modern stateful web services and distributed SDN controllers rely on log replication to avoid data loss in case of fail-stop failures. In single-leader execution, the leader replica is responsible for ordering log updates and initiating distributed commits, in order to guarantee log consistency. Network congestion, resource-heavy computation, and imbalanced resource allocations may, however, result in inappropriate leader election and increased cluster response times. We present SEER, a logically centralized approach to performance prediction and efficient leader election in leader-based consensus systems. SEER autonomously identifies the replica that minimizes the average cluster response time, using prediction models trained dynamically at runtime. To balance exploration and exploitation, SEER explores replicas' performance and updates their prediction models only after detecting significant system changes. We evaluate SEER in a traffic management scenario comprising [3..7] Raft replicas, and well-known data-center and WAN topologies. Compared to Raft's uniform leader election, SEER decreases the mean control plane response time by up to ~32%. The benefit comes at the expense of a minimal adaptation of the Raft election procedure and a slight increase in leader reconfiguration frequency, the latter being tunable with a guaranteed upper bound. No safety properties of Raft are invalidated by SEER.
... Recent approaches to handling the issues of controller state consistency [151][152][153] recommended the use of adaptive consistency for the distributed SDN controller platforms. Aslan et al. [151] attempted to mitigate the impact of controller state distribution on SDN application performance by proposing an adaptive tunable consistency model following the delta consistency model. ...
... Recent research in SDN [151][152][153] has introduced the concept of adaptive consistency in the context of distributed SDN control. Unlike static consistency approaches, adaptively consistent controllers adjust their consistency level at run-time to reach the desired application performance and consistency requirements. ...
Thesis
Centralized SDN designs raise many challenges including the issues of scalability and reliability. The latter can be addressed with the physical decentralization of the SDN control plane. However, such physically distributed, but logically centralized systems bring an additional set of open challenges. This thesis deals with the problem of decentralizing the SDN control plane in the context of large-scale networks. First, to assist recent initiatives in putting the SDN paradigm into practice, this thesis proposes original classifications that make comparisons between the broad range of state-of-the-art SDN controller platforms with respect to various criteria. It also provides a thorough analysis of the major challenges encountered by the existing distributed SDN controller platforms. Furthermore, three novel approaches are proposed to decentralize the SDN control plane in large-scale networks while tackling some of the most prominent associated challenges. The first approach addresses the SDN controller placement problem by proposing scalability- and reliability-aware strategies for the placement of distributed SDN controllers at scale using different types of multi-criteria optimization algorithms. The second and third approaches investigate the knowledge sharing problem in distributed SDN control by proposing adaptive and continuous consistency models for the distributed SDN controllers. The second approach uses a novel Anti-Entropy reconciliation mechanism for applications with eventual consistency needs on top of the ONOS controllers. The third approach puts forward an intelligent Quorum-based replication strategy for a CDN-like application developed on ONOS. The last two approaches are mainly aimed at achieving a consistency adaptation strategy that provides, at run-time, balanced trade-offs between the application’s continuous performance and consistency requirements.
These real-time trade-offs should provide minimal application inter-controller overhead while satisfying the application-defined thresholds specified in the given application SLAs.
... The protection against attacks on SDN begins with overcoming the weaknesses of the architecture of traditional SDNs. For example, maintaining the control of the network in a logically centralized, but physically distributed manner can overcome the challenges of resource exhaustion attacks, as well as ensure availability of network control points for the data plane [35]. Such resilience can be achieved through devolving the controller functions (e.g., local decision-making) [36], implementing hierarchical controllers [37], increasing resources and resource capabilities, and using intelligent security systems equipped with Machine Learning (ML) so that proactive measures are in place before attacks reach the weak points in a network [32]. ...
Article
5G enables the use of different types of services over the same physical infrastructure through the concepts and technologies of virtualization, softwarization, network slicing and cloud computing. Mobile Virtual Network Operators (MVNOs), using these concepts, provide an opportunity to share the same physical infrastructure among multiple operators. Each MVNO can have its own distinct operating and support systems. However, the technologies used to enable such an environment have their own explicit security challenges and solutions. The integrated environment built upon these novel concepts and technologies will thus have complex security implications and requirements to be satisfied. In this vein, this article provides an overview of the security challenges and potential solutions for MVNOs.
... Therefore, let us briefly analyze the research conducted and published in recent years (Table 2) [15][16][17][18][19][20][21][22][23][24]. In [16], an approach to handling link failures in the data plane was proposed: Controlled based Robust Network (CORONET). ...
Article
The article is devoted to the Network Layer means to ensure resilience during designing an infocommunication system that can counteract faults and failures. A review of the default gateway redundancy protocols concept and analysis of recent developments to overcome fault tolerance challenges in the Software-Defined Networks (SDN) control plane are conducted. In addition, an approach to the use of default gateway redundancy protocols in the existing Software-Defined Network architecture is proposed. Therefore, within the approach, the redundancy of the virtual controller is organized based on the current protocol implemented in traditional IP networks, and the SDN switch interacts with the virtual controller. This mechanism aims to reduce the amount of circulating overhead (control traffic), and the backup controller’s organization increases the control plane’s reliability. Whereas in hybrid and hierarchical SDN networks with border routers, the GLBP mechanism can be applied, which increases the reliability of the controller connected to the data plane. In addition, there are several scenarios where the controller that manages the operation of the SDN data plane may have multiple backup controllers to switch in case of failure, or a controller pool is used to manage each network that makes up the SDN data plane. It also highlights promising future areas for research and development to improve Software-Defined Network resilience, which contributes to the emergence of new solutions. Thus, future research directions are seen in proposing mathematical flow-based models of fault-tolerant interaction of the control plane and the data plane based on redundancy. At the same time, setting the problem in an optimization form with the implementation of load balancing will help to use available network resources effectively.
Article
Software-defined wide area networking (SD-WAN) enables dynamic network policy control over a large distributed network via network updates. To be practical, network updates must be consistent (i.e., free of transient errors caused by updates to multiple switches), secure (i.e., only be executed when sent from valid controllers), and reliable (i.e., function despite the presence of faulty or malicious members in the control plane), while imposing only minimal overhead on controllers and switches. We present SERENE: a protocol for secure and reliable network updates for SD-WAN environments. In short: Consistency is provided through the combination of an update scheduler and a distributed transactional protocol. Security is preserved by authenticating network events and updates, the latter with an adaptive threshold cryptographic scheme. Reliability is provided by replicating the control plane and making it resilient to a dynamic adversary by using a distributed ledger as a controller failure detector. We ensure practicality by providing a mechanism for scalability through the definition of independent network domains and exploiting parallelism of network updates both within and across domains. We formally define SERENE’s protocol and prove its safety with regards to event-linearizability. Extensive experiments show that SERENE imposes minimal switch burden and scales to large networks running multiple network applications all requiring concurrent network updates, imposing at worst a 16% overhead on short-lived flow completion and negligible overhead on anticipated normal workloads.
Article
Mobile edge computing (MEC) is a key feature of next-generation mobile networks aimed at providing a variety of services for different applications by performing related processing tasks closer to the users. With the advent of the next-generation mobile networks, researchers have turned their attention to various aspects of edge computing in an effort to leverage the new capabilities offered by 5G. So, the integration of software defined networking (SDN) and MEC techniques was seriously considered to facilitate the orchestration and management of Mobile Edge Hosts (MEH). Edge clouds can be installed as an interface between the local servers and the core to provide the required services based on the known concept of the SDN networks. Nonetheless, the problem of reliability and fault tolerance will be of great importance in such networks. The paper introduced a dynamic architecture that focuses on the end-to-end mobility support required to maintain service continuity and quality of service. This paper also presents an SDN control plane with stochastic network calculus (SNC) framework to control MEC data flows. In accordance with the entrance processes of different QoS-class data flows, closed-form problems were formulated to determine the correlation between resource utilization and the violation probability of each data flow. Compared to other solutions investigated in the literature, the proposed approach exhibits a significant increase in the throughput distributed over the active links of mobile edge hosts. It also proved that the outage index and the system’s aggregate data rate can be effectively improved by up to 32%.
Article
The decoupling of the data plane and the control plane in the Software-Defined Network (SDN) can increase the flexibility of network management and operation, and it can reduce the network limitations caused by the hardware. However, the centralized scheme in SDN can also introduce other security issues, such as the single point of failure, data consistency in a multi-controller environment, and spoofing attacks initiated by a malicious device in the data plane. To solve these problems, a security framework for SDN based on Blockchain (BCSDN) is proposed in this paper. BCSDN adopts a physically distributed and logically centralized multi-controller architecture. The LLDP protocol is used periodically to obtain the link state information of the network. The main controller, selected using the PoW mechanism, establishes a Merkle tree from the collected link information and generates a KSI-based signature for each link submitted by a switch. In this way, dynamic changes of the network topology are recorded on the Blockchain, and the consistency of the topology information among multiple controllers can be guaranteed. The main controller issues the signature to the corresponding switch, and a controller checks the legitimacy of a switch by verifying the signature when the switch later requests the flow rule table from the controller. The signature verification ensures authenticated communication between a controller and a switch. Finally, the new scheme is implemented on the Mininet network emulation platform, and experiments are conducted to verify the solution. We also informally analyze the security attributes provided by BCSDN.
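To illustrate how a Merkle tree summarizes a set of link records in a single digest, here is a minimal sketch. It is not the BCSDN implementation: the choice of SHA-256, the rule of duplicating the last node on odd-sized levels, and the link record strings are all assumptions made for illustration, and the KSI signing step is omitted.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    """SHA-256 digest (hash function chosen here as an assumption)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Compute a Merkle root over the given leaf records."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: duplicate the last node on odd-sized levels.
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical link records, e.g. as learned via LLDP: "switch:port-switch:port".
links = [b"s1:1-s2:1", b"s2:2-s3:1", b"s3:2-s1:2"]
root = merkle_root(links)
print(root.hex())
```

Any change to a single link record changes the root, which is what lets controllers compare topology views by exchanging one digest instead of the full link set.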
Article
The separation of control and data plane in Software Defined Networking (SDN) introduces new security threats. A compromised controller can leverage its position to perform attacks by installing malicious rules in switches while avoiding detection. Current approaches propose broadcast of flow-setup requests to multiple controllers simultaneously and to check consistency of forwarding rules to install the correct rule and identify compromised controllers. However, such approaches result in heavy load on the control plane, resulting in longer response times to requests and higher network cost to accommodate the increased load. To alleviate this issue, we propose a game-theory based framework to detect a malicious controller without overloading the control plane. Instead of broadcasting every request to multiple controllers, switches randomly broadcast requests on the basis of a randomization strategy obtained by the Stackelberg game, whose solution results in a randomization strategy that maximizes the detection probability of a malicious controller. We formulate a two-level optimization problem in the context of our game-theoretic framework that aims to maximize the attack detection probability among the set of controllers by mapping switches to controllers and obtaining randomization strategies for each controller. We develop Midas (MalIcious controller Detection mApping Strategy), a heuristic algorithm to obtain an effective solution to the optimization problem in reasonable time. Midas achieves minimum detection probability within 12% of the optimal solution. Further, it achieves at least 80% of min-max ratio of load at the controllers, implying higher fairness in load distribution compared to optimal solution, a state-of-art algorithm and a baseline heuristic.
Conference Paper
With Software Defined Networking (SDN) the control plane logic of forwarding devices, switches and routers, is extracted and moved to an entity called SDN controller, which acts as a broker between the network applications and physical network infrastructure. Failures of the SDN controller inhibit the network's ability to respond to new application requests and react to events coming from the physical network. Despite the huge impact that a controller has on the network performance as a whole, a comprehensive study on its failure dynamics is still missing in the state-of-the-art literature. The goal of this paper is to analyse, model and evaluate the impact that different controller failure modes have on its availability. A model in the formalism of Stochastic Activity Networks (SAN) is proposed and applied to a case study of a hypothetical controller based on commercial controller implementations. In the case study we show how the proposed model can be used to estimate the controller steady state availability, quantify the impact of different failure modes on controller outages, as well as the effects of software ageing, and impact of software reliability growth on the transient behaviour. Characterization of failure dynamics in SDN controllers. Available from: https://www.researchgate.net/publication/320832289_Characterization_of_failure_dynamics_in_SDN_controllers [accessed Feb 12 2018].
Conference Paper
Security in Software Defined Networks (SDNs) has been a major concern for its deployment. Byzantine threats in SDNs are more sophisticated to defend since control messages issued by a compromised controller look legitimate. Applying traditional Byzantine Fault Tolerance approach to SDNs requires each switch to be mapped to 3f + 1 controllers to defend against f simultaneous controller failures. This approach on one hand overloads the controllers due to multiple requests from switches. On the other hand, it raises new challenges concerning the switch-controller mapping and determining minimum number of controllers required in the network. In this paper, we present a novel primary-backup controller mapping approach in which a switch is mapped to only f + 1 primary and f backup controllers to defend against simultaneous Byzantine attacks on f controllers. We develop an optimization programming formulation that provides the switch-controller mapping solution and minimizes the total number of controllers required. We consider the controller processing capacity and communication delay between switches and controllers as problem constraints. Our approach also facilitates capacity sharing of backup controllers when two switches use the same backup controller but do not need it simultaneously. We demonstrate the effectiveness of the proposed approach through numerical analysis. The results show that the proposed approach significantly reduces the total number of controllers required by up to 50% compared to an existing scheme while guaranteeing better load balancing among controllers with a fairness index of up to 0.92.
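The per-switch mapping counts named in the abstract above can be compared directly. The sketch below encodes only the two figures given there (3f + 1 controllers for the traditional approach, f + 1 primary plus f backup controllers for the proposed one); the function names are hypothetical. It does not model the optimization formulation or the capacity sharing of backup controllers, so it says nothing about the total number of controllers in the network.

```python
from typing import Tuple


def classic_bft_per_switch(f: int) -> int:
    """Controllers each switch is mapped to under traditional BFT: 3f + 1."""
    return 3 * f + 1


def primary_backup_per_switch(f: int) -> Tuple[int, int]:
    """(primary, backup) controllers per switch in the proposed mapping."""
    return f + 1, f


for f in (1, 2, 3):
    primary, backup = primary_backup_per_switch(f)
    print(f, classic_bft_per_switch(f), primary, backup, primary + backup)
```

For f = 1 a switch is mapped to 4 controllers in the traditional scheme, versus 2 primary and 1 backup (3 in total) in the primary-backup scheme; in fault-free operation only the f + 1 primaries need to process each request.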
Article
In software-defined networking (SDN), as the data plane scale expands, scalability and reliability of the control plane have become major concerns. To mitigate such concerns, two kinds of solutions have been proposed separately. One is multi-controller architecture, i.e., a logically centralized control plane with physically distributed controllers. The other is control devolution, i.e., delegating control of some flows back to switches. Most existing solutions adopt either static switch-controller association or static devolution, which may not adapt well to traffic variation, leading to high communication costs between switches and controllers, and high computation costs for switches. In this paper, we propose a novel scheme to jointly consider both solutions, i.e., we dynamically associate switches with controllers and dynamically devolve control of flows to switches. Our scheme is an efficient online algorithm that does not need the statistics of traffic flows. By adjusting a parameter V, we can make a trade-off between costs and queue backlogs. Theoretical analysis and extensive simulations show that our scheme yields much lower costs and latency compared to static schemes, and balanced loads among controllers.
Conference Paper
State synchronisation in clustered Software Defined Networking controller deployments ensures that all instances of the controller have the same state information in order to provide redundancy. Current implementations of controllers use a strong consistency model, where configuration changes must be synchronised across a number of instances before they are applied on the network infrastructure. For large deployments, this blocking process increases the delay of state synchronisation across cluster members and consequently has a detrimental effect on network operations that require rapid response, such as fast failover and Quality of Service applications. In this paper, we introduce an adaptive consistency model for SDN Controllers that employs concepts of eventual consistency models along with a novel 'cost-based' approach where strict synchronisation is employed for critical operations that affect a large portion of the network resources while less critical changes are periodically propagated across cluster nodes. We use simulation to evaluate our model and demonstrate the potential gains in performance.
Conference Paper
In SDN, the logically centralized control plane ("network OS") is often realized via multiple SDN controllers for scalability and reliability. ONOS is such an example, where it employs Raft -- a new consensus protocol developed recently -- for state replication and consistency among the distributed SDN controllers. The reliance of network OS on consensus protocols to maintain consistent network state introduces an intricate inter-dependency between the network OS and the network under its control, thereby creating new kinds of fault scenarios or instabilities. In this paper, we use Raft to illustrate the problems that this inter-dependency may introduce in the design of distributed SDN controllers and discuss possible solutions to circumvent these issues.
Conference Paper
Software-Defined Networking (SDN) is a novel architectural model for cloud network infrastructure, improving resource utilization, scalability and administration. SDN deployments increasingly rely on virtual switches executing on commodity operating systems with large code bases, which are prime targets for adversaries attacking the network infrastructure. We describe and implement TruSDN, a framework for bootstrapping trust in SDN infrastructure using Intel Software Guard Extensions (SGX), allowing to securely deploy SDN components and protect communication between network endpoints. We introduce ephemeral flow-specific pre-shared keys and propose a novel defense against cuckoo attacks on SGX enclaves. TruSDN is secure under a powerful adversary model, with a minor performance overhead.