Conference Paper

Dynamic switch-controller association and control devolution for SDN systems

Authors:
  • Shenzhen Institute of Artificial Intelligence and Robotics for Society

... (a) Internet2 topology [38] (b) Fat-tree topology [39] Fig. 4. Exemplary network topologies and controller placements used in the evaluation of MORPH. Elements highlighted in green and blue represent the switches and clients, respectively. ...
... Elements highlighted in green and blue represent the switches and clients, respectively. Red elements are the controller instances placed as per [38], [39]. Each switch instance in the Internet2 topology is allocated a client, while the fat-tree topology hosts clients at the leaf switches. ...
... The links of the fat-tree topology only possess the inherent processing and queuing delays. The arrival rates of the incoming service embedding requests were modeled using a negative exponential distribution [39]. In the fat-tree topology, each leaf-switch was connected to 2 client instances, bringing the total number of clients up to 16. ...
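The arrival model mentioned in the excerpt above can be illustrated with a minimal sketch. It assumes that the negative exponential distribution describes the gaps between consecutive requests; the function name `arrival_times`, the rate, and the horizon are hypothetical and are not taken from the cited work.

```python
import random

def arrival_times(rate, horizon, seed=0):
    """Return arrival timestamps in [0, horizon) with exponential inter-arrival gaps.

    `rate` is the mean number of arrivals per time unit (assumed value).
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)  # gap to the next request
        if t >= horizon:
            return times
        times.append(t)

# Illustrative use: roughly rate * horizon requests are generated.
requests = arrival_times(rate=5.0, horizon=1000.0)
print(len(requests))
```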
Article
Full-text available
Current approaches to tackling the single point of failure in SDN entail a distributed operation of SDN controller instances. Their state synchronization process relies on the assumption of correct decision-making in the controllers. Successful introduction of SDN in critical infrastructure networks also requires catering to the issue of unavailable, unreliable (e.g. buggy) and malicious controller failures. We propose MORPH, a framework tolerant to unavailability and Byzantine failures, that distinguishes and localizes faulty controller instances and appropriately reconfigures the control plane. Our controller-switch connection assignment leverages the awareness of the source of failure to optimize the number of active controllers and minimize the controller and switch reconfiguration delays. The proposed re-assignment executes dynamically after each successful failure identification. We require 2F_M + F_A + 1 controllers to tolerate F_M malicious and F_A availability-induced failures. After a successful detection of F_M malicious controllers, MORPH reconfigures the control plane to require a single controller message to forward the system state. Next, we outline and present a solution to the practical correctness issues related to the statefulness of the distributed SDN controller applications, previously ignored in the literature. We base our performance analysis on a resource-aware routing application, deployed in an emulated testbed comprising up to 16 controllers and up to 34 switches, so as to tolerate up to 5 unique Byzantine and an additional 5 availability-induced controller failures (a total of 10 unique controller failures). We quantify and highlight the dynamic decrease in the packet and CPU load and the response time after each successful failure detection.
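The controller bound stated in the abstract above can be checked with a short calculation. This is only an arithmetic illustration; the function name `required_controllers` is hypothetical.

```python
def required_controllers(f_malicious: int, f_availability: int) -> int:
    """Controllers needed according to the bound 2*F_M + F_A + 1 from the abstract."""
    if f_malicious < 0 or f_availability < 0:
        raise ValueError("failure counts must be non-negative")
    return 2 * f_malicious + f_availability + 1

# Evaluation scenario from the abstract: 5 Byzantine and 5 availability-induced failures.
print(required_controllers(5, 5))  # 16
```

The result matches the testbed size of up to 16 controllers reported in the same abstract.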
... Nonetheless, such a joint design often involves two concerns. One concern is about the tradeoff among the communication costs (e.g., bandwidth or round-trip time) incurred by uploading requests to the control plane, local computational costs on switches, and queue stability in SDN systems [17]. The other concern comes from various uncertainties in SDN systems. ...
... Among such works, two lines of research in recent years focus on switch-controller association [3]-[12] and control devolution [13]-[16], respectively. Instead of studying these two topics separately, later works [17], [27] considered the problem of joint switch-controller association and control devolution, and proposed online control and predictive control schemes, respectively, to optimize system costs with a queue stability guarantee. Although the problem studied by such works is similar to ours, we would like to point out that all such works implicitly assume the full availability of instant system dynamics, which is hard to attain in practice. ...
... On the one hand, the benefit from uploading requests lies in that controllers have adequate service capacities and thus they incur shorter processing latencies than switches. Such a benefit may be offset unexpectedly by considerable communication costs (e.g., round-trip time [17]) of request uploading. On the other hand, if most requests are kept processed locally, switches may be overwhelmed by prohibitively high computational costs. ...
Preprint
In software-defined networking (SDN) systems, it is a common practice to adopt a multi-controller design and control devolution techniques to improve the performance of the control plane. However, in such systems, the decision-making for joint switch-controller association and control devolution often involves various uncertainties, e.g., the temporal variations of controller accessibility, and computation and communication costs of switches. In practice, statistics of such uncertainties are unattainable and need to be learned in an online fashion, calling for an integrated design of learning and control. In this paper, we formulate a stochastic network optimization problem that aims to minimize time-average system costs and ensure queue stability. By transforming the problem into a combinatorial multi-armed bandit problem with long-term stability constraints, we adopt bandit learning methods and optimal control techniques to handle the exploration-exploitation tradeoff and long-term stability constraints, respectively. Through an integrated design of online learning and online control, we propose an effective Learning-Aided Switch-Controller Association and Control Devolution (LASAC) scheme. Our theoretical analysis and simulation results show that LASAC achieves a tunable tradeoff between queue stability and system cost reduction with a sublinear time-averaged regret bound over a finite time horizon.
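The exploration-exploitation tradeoff mentioned in the abstract above can be illustrated with a plain UCB1-style bandit. This is not the LASAC scheme: it omits the combinatorial structure and the queue stability constraints. It only sketches, under assumed Bernoulli costs, how a single switch could learn which controller is cheapest. All names and values are hypothetical.

```python
import math
import random

def ucb_pick(counts, mean_costs, t):
    """Pick the arm with the lowest optimistic (lower-confidence) cost estimate."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # try every arm once before using the bound
    scores = [m - math.sqrt(2 * math.log(t) / n)
              for m, n in zip(mean_costs, counts)]
    return min(range(len(scores)), key=scores.__getitem__)

def run(true_costs, rounds, seed=0):
    """Simulate `rounds` decisions; return how often each arm was chosen."""
    rng = random.Random(seed)
    counts = [0] * len(true_costs)
    means = [0.0] * len(true_costs)
    for t in range(1, rounds + 1):
        arm = ucb_pick(counts, means, t)
        cost = 1.0 if rng.random() < true_costs[arm] else 0.0  # assumed Bernoulli cost
        counts[arm] += 1
        means[arm] += (cost - means[arm]) / counts[arm]  # running average
    return counts

# Three hypothetical controllers with unknown mean costs 0.8, 0.2, and 0.6.
print(run([0.8, 0.2, 0.6], rounds=2000))
```

Over time the choices concentrate on the controller with the lowest mean cost, while the confidence term keeps the others sampled occasionally.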
... The resulting controller placement is depicted in Fig. 3a. SDN controller replicas in the data-center topology are assumed to run on the leaf-nodes, deployed as virtual machines (VMs) (Fig. 3b), similar to the controller placement presented in [33]. ...
... (a) Internet2 topology [31] (b) Fat-tree topology [33] Fig. 3. Exemplary network topologies and controller placements used in the evaluation of the SC, AC and EC frameworks. Elements highlighted in green and blue represent the forwarding and compute devices, respectively. ...
... Elements highlighted in green and blue represent the forwarding and compute devices, respectively. Red elements are the OpenDaylight controller instances placed as per [31], [33]. ...
Article
Full-text available
Scalability of the control plane in a Software Defined Network (SDN) is enabled by means of decentralization of the decision-making logic, i.e. by replication of controller functions to physically or virtually dislocated controller replicas. Replication of a centralized controller state also enables the protection against controller failures by means of primary and backup replicas responsible for managing the underlying SDN data plane devices. In this work, we investigate the effect of the deployed consistency model on scalability and correctness metrics of the SDN control plane. In particular, we compare the strong and eventual consistency, and make a case for a novel adaptive consistency approach. The existing controller platforms rely on either strong or eventual consistency mechanisms in their state distribution. We show how an adaptive consistency model offers the scalability benefits in terms of the total request-handling throughput and response time, in contrast to the strong consistency model. We also outline how the adaptive consistency approach can provide for correctness semantics that are unachievable with the eventual consistency paradigm in practice. The adaptability of our approach provides a balanced and tunable trade-off of scalability and correctness for the SDN application implemented on top of the adaptive framework. To validate our assumptions, we evaluate and compare the different approaches in an emulated testbed with an example of a load balancer controller application. The experimental setup comprises up to five extended OpenDaylight controller instances and two network topologies from the area of service provider and data center networks.
... The approach studied in [10] attempts to obtain a strategy to jointly optimize controller-switch assignment and control-path routing to minimize the number of recovery stages. To reduce flow setup time, reference [3] introduces a multiple mapping Closer to our work is [11] which aims to find a dynamic switch-controller association so as to minimize the long-term average cost. However, the authors do not consider the constraints of resource as well as service latency, which have a significant impact on the success of service delivery. ...
... where B = E {Y,eeG [ ^ \ 0 ( t ) } and the fact that the service processing rate p_e(t), ∀e ∈ G, is independent of the current queue backlog. □ However, while minimizing a bound on Δ(Θ(t)) would stabilize the system and satisfy constraint (11), it may result in a high cost m. Instead, by incorporating queue stability into the system cost, we introduce a Lyapunov drift-plus-penalty function as follows ...
... where V is a control parameter that determines how much cost minimization is emphasized. Then, from Theorem 1, the objective function of problem (14) can be reformulated in combination with constraint (11). The problem is restated as follows: in every slot t, given the current queue states Θ(t), x, y, z must satisfy constraints (1)-(6) and ...
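The role of the control parameter V in a drift-plus-penalty rule can be sketched for a single queue. This is a generic illustration rather than the formulation of the cited work: the service options, costs, and arrival process are assumed values, and the names `choose` and `simulate` are hypothetical.

```python
import random

# Hypothetical per-slot options: (service rate, cost). Values are illustrative only.
OPTIONS = [(0, 0.0), (1, 1.0), (3, 6.0)]

def choose(q, v, options=OPTIONS):
    """Pick the option minimizing V*cost - Q*rate, the per-slot drift-plus-penalty term."""
    return min(options, key=lambda o: v * o[1] - q * o[0])

def simulate(v, slots=5000, seed=1):
    """Return (average cost, average backlog) under the rule above."""
    rng = random.Random(seed)
    q, total_cost, total_q = 0.0, 0.0, 0.0
    for _ in range(slots):
        rate, cost = choose(q, v)
        arrivals = rng.choice([0, 1, 2])   # assumed arrivals, mean 1 per slot
        q = max(q - rate, 0.0) + arrivals  # standard queue update
        total_cost += cost
        total_q += q
    return total_cost / slots, total_q / slots

for v in (1.0, 50.0):
    print(v, simulate(v))
```

In this sketch a larger V places more weight on cost, so the backlog is allowed to grow further before the more expensive service option is selected.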
... As the data plane expands, a control plane implemented with a single controller may not be able to process the increasing number of requests, resulting in unacceptable flow setup latency. To address such problems, it is necessary to propose a modern controller design [11]. The multi-controller design has become a new SDN design scheme that could solve the problems caused by a single controller. ...
... Based on a real-time network situation, the RLS makes the forwarding decision for every new flow [19]. However, using RLS as a centralized controller in the SDN poses some problems such as bottlenecks of a single controller, responsiveness, reliability and poor scalability [20] [11]. To overcome the mentioned problems, using multiple distributed controllers working together is a simple solution to overcome the functions of the logically centralized controller [21]. ...
Preprint
Full-text available
Software-defined networking (SDN) is a new networking paradigm that has the potential to have a major impact on Internet technology. A key aspect of SDN is the decoupling of the data plane from the control plane. The control plane consists of one or more controllers, which are considered the brain of the SDN network. A centralized controller suffers from reliability and scalability issues: as the network size increases, it cannot support the growing flow processing load. A multi-controller design is therefore the promising solution for large-scale SDN networks, but it introduces the problem of load imbalance. If the load is unevenly distributed in an SDN network, network performance is greatly affected. Many SDN-based load balancing strategies have been proposed to improve the performance of SDN networks. In this paper, we present a comprehensive survey that summarizes and classifies the load balancing schemes in SDN networks and analyzes the advantages and disadvantages of the proposed methods. We also give an overview of the research works and list some interesting open challenges for future directions.
... A large body of works focuses on static delegation of certain functions or flows [6,10,27]. Some recent works [24] [11] have even proposed more flexible schemes to perform delegation based on the real-time distribution of network states or workloads on controllers. ...
... From the above example, we find that predictive scheduling can effectively shorten the request response time by utilizing the present residual processor capacities. However, it remains challenging to incorporate predictive scheduling into joint switch-controller association and control devolution, especially given the non-trivial trade-off between system cost reduction and queueing stability in the system [11]. Furthermore, the benefits of predictive scheduling in SDN systems remain unexplored. ...
Conference Paper
In software-defined networking (SDN) systems, the scalability and reliability of the control plane still remain major concerns. Existing solutions adopt either multi-controller designs or control devolution back to the data plane. The former requires a flexible yet efficient switch-controller association mechanism to adapt to workload changes and potential failures, while the latter demands timely decision making with low overheads. The integrated design for both is even more challenging. Meanwhile, the dramatic advancement in machine learning techniques has boosted the practice of predictive scheduling to improve the responsiveness in various systems. Nonetheless, so far little work has been conducted for SDN systems. In this paper, we study the joint problem of dynamic switch-controller association and control devolution, while investigating the benefits of predictive scheduling in SDN systems. We propose POSCAD, an efficient, online, and distributed scheme that exploits predictive future information to minimize the total system cost and the average request response time with queueing stability guarantee. Theoretical analysis and trace-driven simulation results show that POSCAD requires only mild-value of future information to achieve a near-optimal system cost and near-zero average request response time. Further, POSCAD is robust against mis-prediction to reduce the average request response time.
... They have designed in this work a greedy set coverage algorithm with the coalitional game in order to reduce both control resource consumption and control traffic overhead. Huang et al. [25] are concerned with a combination of dynamic switch-controller association and control devolution. For this, a greedy algorithm based on a stochastic optimization problem is designed in order to decide whether requests will be added to the local queues of switches or uploaded to the controller. ...
Article
Full-text available
In a digital world changing at high speed, data centers still play a major role in new network architectures. Software‐defined network (SDN) technology is considered a suitable solution to overcome several limitations of standard data centers, including flexibility, scalability, and ease of management. However, in an SDN with a distributed control plane, the assignment of switches can generate significant latency that could degrade the Quality of Experience (QoE) of certain services and applications hosted in the data center. In this paper, we propose a matching game algorithm ensuring a dynamic assignment of switches. Its main characteristic is to achieve a fair assignment while balancing the load generated by the data plane among all controllers, resulting in a balanced response time. The numerical results show that our algorithm outperforms other existing state‐of‐the‐art algorithms in terms of minimizing controllers' response time. In a distributed SDN-enabled data center environment, we propose in this work a novel approach to lower the response time of controllers. A fair assignment of switches is done, resulting in a balanced response time among all controllers. The numerical results confirm the effectiveness of our algorithm.
... The incoming client request arrivals follow a negative exponential distribution (n.e.d.) [15]. To reason about the prediction performance for imbalanced client workloads, we consider two distributions for the rate parameter λ: (i) a balanced equal rate for each client / source replica with λ r0 = . . . ...
Preprint
Modern stateful web services and distributed SDN controllers rely on log replication to prevent data loss in case of fail-stop failures. In single-leader execution, the leader replica is responsible for ordering log updates and the initiation of distributed commits, in order to guarantee log consistency. Network congestion, resource-heavy computation, and imbalanced resource allocations may, however, result in inappropriate leader election and increased cluster response times. We present SEER, a logically centralized approach to performance prediction and efficient leader election in leader-based consensus systems. SEER autonomously identifies the replica that minimizes the average cluster response time, using prediction models trained dynamically at runtime. To balance exploration and exploitation, SEER explores replicas' performance and updates their prediction models only after detecting significant system changes. We evaluate SEER in a traffic management scenario comprising [3..7] Raft replicas, and well-known data-center and WAN topologies. Compared to Raft's uniform leader election, SEER decreases the mean control plane response time by up to ~32%. The benefit comes at the expense of a minimal adaptation of the Raft election procedure and a slight increase in leader reconfiguration frequency, the latter being tunable with a guaranteed upper bound. No safety properties of Raft are invalidated by SEER.
... Experimental results show that the proposed management structure can effectively realize load balancing and improve link utilization. However, the increasing complexity of the network scale means that the burden of the controller will be seriously aggravated in this method, thereby leading to an increasing computation time of the topology update [12,13]. ...
Article
Full-text available
A data center undertakes increasing background services of various applications, and the data flows transmitted between the nodes in data center networks (DCNs) are consequently increased. At the same time, the traffic of each link in a DCN changes dynamically over time. Flow scheduling algorithms can improve the distribution of data flows among the network links so as to improve the balance of link loads in a DCN. However, most current load balancing works achieve flow scheduling decisions to the current links on the basis of past link flow conditions. This situation impedes the existing link scheduling methods from implementing optimal decisions for scheduling data flows among the network links in a DCN. This paper proposes a predictive link load balance routing algorithm for a DCN based on residual networks (ResNet), i.e., the link load balance route (LLBR) algorithm. The LLBR algorithm predicts the occupancy of the network links in the next duty cycle, according to the ResNet architecture, and then the optimal traffic route is selected according to the predictive network environment. The LLBR algorithm, round-robin scheduling (RRS), and weighted round-robin scheduling (WRRS) are used in the same experimental environment. Experimental results show that compared with the WRRS and RRS, the LLBR algorithm can reduce the transmission time by approximately 50%, reduce the packet loss rate from 0.05% to 0.02%, and improve the bandwidth utilization by 30%.
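The weighted round-robin baseline named in the abstract above can be sketched as follows. This is a simple expanded-cycle variant with assumed integer weights and hypothetical link names; it is not the implementation evaluated in the cited paper, and it does not cover the ResNet-based LLBR algorithm.

```python
from itertools import cycle, islice

def weighted_round_robin(weights):
    """Cycle over link names in proportion to their integer weights."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    if not expanded:
        raise ValueError("at least one link needs a positive weight")
    return cycle(expanded)

# Hypothetical links: link_a is assumed to have twice the capacity of link_b.
picks = list(islice(weighted_round_robin({"link_a": 2, "link_b": 1}), 6))
print(picks)  # ['link_a', 'link_a', 'link_b', 'link_a', 'link_a', 'link_b']
```

Setting every weight to 1 reduces this to plain round-robin scheduling.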
... More than one SMA may exist in the network at the same time, but all SMAs are independent and established SMAs do not intersect. Thus, an overloaded controller Cr combines the possible subdomains and creates an SMA, in which the algorithm selects a switch from the switch set in the overloaded area and migrates it to the target controller Ck [49,50]. ...
Article
Full-text available
The biggest challenge for network service providers is the day-to-day advancement of technologies, which makes traditional networks difficult to manage. This advancement motivates vendors to develop, deploy, and migrate their services, install new hardware, train people, and upgrade infrastructure, all of which involve considerable cost and time. These frequent changes demand a new network architecture that supports future technologies and solves these issues, namely software-defined networking (SDN). A large amount of data is being generated, and we interact with the world through the Internet using smart devices such as tablets, sensors, and smartphones, following the concepts of the Internet of Things (IoT). Along with this continuous growth and development, service demands are heterogeneous and ever-increasing. This raises the challenge of load balancing in networks to meet the highly demanding requirements (e.g., high performance, lower latency, high throughput, and high availability) of IoT and 5G network applications. To meet these demands, various load balancing techniques have been proposed; some require a dedicated load balancer for every service, while others require manual recognition of the device for every new service. In conventional networks, load balancing is established on the basis of local information in the network. SDN controllers, however, hold a global view of the network and allow more optimized load balancers to be produced. The well-known conventional techniques are therefore time-consuming, expensive, and impractical, and many existing load balancing schemes do not consider service types.
This paper focuses on an SDN-based load balancing (SBLB) service that minimizes response time and maximizes resource utilization for users of cloud servers. The proposed scheme consists of an application module that runs alongside an SDN controller and server pools that connect to the controller through SDN-enabled switches. The application module contains a dynamic load balancing module, a monitoring module, and a service classification module. The controller handles all messages in real time and maintains the host pool. The performance of the proposed scheme has been validated experimentally: the results show that SBLB significantly decreases the average response and reply time.
... The main aim of load balancing [25][26][27][28] is to improve the throughput avoiding processing delays in optimal path selection to balance the load. The various factors influencing load balancing in DCN include the energy of nodes, residual bandwidth, scalability of the network, types of flows. ...
Article
Full-text available
Servers in data center networks handle heterogeneous bulk loads. Load balancing, therefore, plays an important role in optimizing network bandwidth and minimizing response time. Complete knowledge of the current network status is needed to provide a stable load in the network. Cataloging the network status in a traditional network needs additional processing, which increases complexity, whereas in software-defined networking the control plane continuously monitors the overall working of the network. This motivates an efficient load balancing algorithm built on SDN. This paper proposes TA-ASLB (traffic-aware adaptive server load balancing), an efficient algorithm to balance the flows to the servers in a data center network. It works based on two parameters, residual bandwidth and server capacity. It detects the elephant flows and forwards them towards the optimal server, where they can be processed quickly. It has been tested with the Mininet simulator and gave considerably better results than the existing server load balancing algorithms in the Floodlight controller.
... The authors in [6] provide a solution that combines the dynamic switch-controller association and dynamic control devolution while maintaining a reasonable wait time in queues. This trade-off aims to reduce the communication cost between the data plane and the control plane and the computational cost spent by the switch to process its requests locally. ...
... They aim to provide more resiliency by solving the mapping problem related to the connections between the control and data planes, which influences the overall SDN network. The authors in [1] combine the dynamic switch-to-controller association and dynamic control devolution while preserving a reasonable wait time in queues. As a result, a reduced communication cost between the data and control planes is achieved, and the computational cost spent by the switch to process its requests locally is minimized. ...
Article
Full-text available
Software defined networking (SDN) gains a lot of interest from network operators due to its ability to offer flexibility, efficiency and fine-grained control over forwarding elements (FE) by decoupling control and data planes. In the control plane, a centralized node, denoted controller, receives requests from ingress switches and makes decisions on path forwarding. Unfortunately, requests processing may lead to controller performance degradation as the number of incoming requests goes up. This paper deals with the controller performance issue in Software Defined WAN (SD-WAN). Mainly, it proposes a new approach to optimize the switch-to-controller assignment problem with load balancing support. The issue is formulated as a Minimum Cost Bipartite Assignment optimization problem which is solved using an improvement of the Hungarian algorithm. The new algorithm is based on the introduction of the load-driven penalty concept which aims to achieve a trade-off between the round trip time and the controller load. Finally, a new protocol denoted Distributed Hungarian-based Assignment Protocol (DHAP) is described as an implementation of the proposed solution in multi-controller environments. As shown in results, the proposed solution outperforms parallel schemes in terms of flow setup time and load balancing.
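The idea of trading off round-trip time against controller load in a minimum-cost assignment can be sketched on a tiny instance. This sketch uses brute-force enumeration instead of the Hungarian algorithm and is not DHAP; the penalty form `rtt + alpha * load`, the one-switch-per-controller restriction, and all numbers are assumptions made for illustration.

```python
from itertools import permutations

def assign(rtt, load, alpha):
    """Brute-force minimum-cost assignment of switches to distinct controllers.

    rtt[i][j] is the round-trip time from switch i to controller j and load[j]
    is the current load of controller j. The per-pair cost is assumed to be
    rtt[i][j] + alpha * load[j]. Returns a tuple: entry i is switch i's controller.
    """
    n_switches, n_controllers = len(rtt), len(load)

    def total(choice):
        return sum(rtt[i][j] + alpha * load[j] for i, j in enumerate(choice))

    return min(permutations(range(n_controllers), n_switches), key=total)

# Two switches, three controllers; controller 0 is close but heavily loaded.
rtt = [[1.0, 2.0, 5.0],
       [2.0, 1.0, 5.0]]
load = [0.9, 0.1, 0.2]
print(assign(rtt, load, alpha=0.0))   # (0, 1): nearest controllers win
print(assign(rtt, load, alpha=10.0))  # (2, 1): the loaded controller 0 is avoided
```

Enumeration is only practical for very small instances; a polynomial-time method such as the Hungarian algorithm named in the abstract would be needed at scale.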
Article
Full-text available
Decentralized orchestration of the control plane is critical to the scalability and reliability of software-defined network (SDN). However, existing orchestrations of SDN are either one-off or centralized, and would be inefficient in the presence of temporal and spatial variations in traffic requests. In this paper, a fully distributed orchestration is proposed to minimize the time-average cost of SDN, adapting to the variations. This is achieved by stochastically optimizing the on-demand activation of controllers, adaptive association of controllers and switches, and real-time request processing and dispatching. The proposed approach is able to operate at multiple timescales for activation and association of controllers, and request processing and dispatching, thereby alleviating potential service interruptions caused by orchestration. A new analytic framework is developed to confirm the asymptotic optimality of the proposed approach in the presence of non-negligible signaling delays between controllers. As corroborated by extensive simulations, the proposed approach can save up to 73% of the time-average operational cost of SDN, as compared to the existing static orchestration.
Conference Paper
Full-text available
Distributed Software Defined Networking (SDN) controllers aim to solve the issue of single-point-of-failure and improve the scalability of the control plane. Byzantine and faulty controllers, however, may enforce incorrect configurations and thus endanger the control plane correctness. Multiple Byzantine Fault Tolerance (BFT) approaches relying on Replicated State Machine (RSM) execution have been proposed in the past to cater for this issue. The scalability of such solutions is, however, limited. Additionally, the interplay between progressing the state of the distributed controllers and the consistency of the external reconfigurations of the forwarding devices has not been thoroughly investigated. In this work, we propose an agreement-and-execution group-based approach to increase the overall throughput of a BFT-enabled distributed SDN control plane. We adapt a proven sequencing-based BFT protocol, and introduce two optimized BFT protocols that preserve the uniform agreement, causality and liveness properties. A state-hashing approach which ensures causally ordered switch reconfigurations is proposed, that enables an opportunistic RSM execution without relying on strict sequencing. The proposed designs are implemented and validated for two realistic topologies, a path computation application and a set of KPIs: switch reconfiguration (response) time, signaling overhead, and acceptance rates. We show a clear decrease in the system response time and communication overhead with the proposed models, compared to a state-of-the-art approach.
Article
In a typical software defined network (SDN), switches report the Packet-In messages of newly arrived flows to the controllers. With more and more flows arriving at a network, the controller load significantly increases, which may lead to long (or even unacceptable) controller response time. Though previous solutions, such as dynamic controller assignment, help to reduce the controller response time, they still lead to various disadvantages, such as an unacceptable controller re-assignment delay, massive communication overhead between controllers, or poor routing performance. To address this issue, our intention is to trade data plane resource for better control plane performance. Specifically, we propose to pre-install wildcard entries for some aggregated flows to reduce controller response time, and perform dynamic routing for new-arrival flows to optimize the network performance. We define the problem of reducing controller response time with minimum data plane resource cost, and prove its NP-hardness. We then present an efficient algorithm based on randomized rounding, and analyze that our algorithm can achieve constant bicriteria approximation under most practical situations. Some practical issues are discussed to enhance our algorithm. We have implemented the proposed algorithm on a real SDN testbed. The experimental results and the extensive simulation results show that our method can reduce the controller response time by 47%, or improve the network throughput by 56% compared with the previous solutions, even with significant traffic dynamics.
Article
A distributed control plane is more scalable and robust in software defined networking. This paper focuses on controller load balancing using packet-in request redirection, that is, given the instantaneous state of the system, determining whether to redirect packet-in requests for each switch, such that the overall control plane response time (CPRT) is minimized. To address the above problem, we propose a framework based on Lyapunov optimization. First, we use the drift-plus-penalty algorithm to combine CPRT minimization problem with controller capacity constraints, and further derive a non-linear program, whose optimal solution is obtained with brute force using standard linearization techniques. Second, we present a greedy strategy to efficiently obtain a solution with a bounded approximation ratio. Third, we reformulate the program as a problem of maximizing a non-monotone submodular function subject to matroid constraints. We implement a controller prototype for packet-in request redirection, and conduct trace-driven simulations to validate our theoretical results. The results show that our algorithms can reduce the average CPRT by 81.6% compared to static assignment, and achieve a 3× improvement in maximum controller capacity violation ratio.
Article
For software-defined networking (SDN) systems, to enhance the scalability and reliability of the control plane, existing solutions adopt either a multi-controller design with static switch-controller association, or static control devolution by delegating certain request processing back to switches. Such solutions can fall short in the face of temporal variations in request traffic, incurring considerable local computation costs on switches and communication costs to controllers. So far, it remains an open problem to develop a joint online scheme that conducts dynamic switch-controller association and dynamic control devolution. In addition, the fundamental benefits of predictive scheduling to SDN systems remain unexplored. In this paper, we identify the non-trivial trade-off in such a joint design and formulate a stochastic network optimization problem which aims to minimize time-averaged total system costs and ensure long-term queue stability. By exploiting the unique problem structure, we devise a predictive online switch-controller association and control devolution (POSCAD) scheme, which solves the problem through a series of online distributed decisions. Theoretical analysis shows that without prediction, POSCAD can achieve near-optimal total system costs with a tunable trade-off for queue stability. With prediction, POSCAD can achieve even better performance with shorter latencies. We conduct extensive simulations to evaluate POSCAD. Notably, with only mild amounts of future information, POSCAD incurs a significant reduction in request latencies, even when faced with prediction errors.
Article
In software-defined networking (SDN) systems, it is a common practice to adopt a multi-controller design and control devolution techniques to improve the performance of the control plane. However, in such systems the decision-making for joint switch-controller association and control devolution often involves various uncertainties, e.g., the temporal variations of controller accessibility, and computation and communication costs of switches. In practice, statistics of such uncertainties are unattainable and need to be learned in an online fashion, calling for an integrated design of learning and control. In this article, we formulate a stochastic network optimization problem that aims to minimize time-average system costs and ensure queue stability. By transforming the problem into a combinatorial multi-armed bandit problem with long-term stability constraints, we adopt bandit learning methods and optimal control techniques to handle the exploration-exploitation tradeoff and long-term stability constraints, respectively. Through an integrated design of online learning and online control, we propose an effective Learning-Aided Switch-Controller Association and Control Devolution (LASAC) scheme. Our theoretical analysis and simulation results show that LASAC achieves a tunable tradeoff between queue stability and system cost reduction with a sublinear time-averaged regret bound over a finite time horizon.
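The exploration-exploitation tradeoff mentioned in this abstract can be illustrated with a generic confidence-bound rule. The sketch below is not the LASAC scheme: it is a plain UCB1-style lower-confidence-bound selection over controllers with unknown mean costs, and the function name `lcb_pick` and its arguments are hypothetical.

```python
import math
from typing import List


def lcb_pick(cost_sums: List[float], counts: List[int], t: int) -> int:
    """Pick the controller with the lowest optimistic cost estimate.

    cost_sums[j]: total cost observed so far for controller j (assumed input)
    counts[j]:    number of times controller j has been chosen
    t:            current round number (t >= 1)
    """
    # Try every controller once before trusting any estimate.
    for j, n in enumerate(counts):
        if n == 0:
            return j
    # Empirical mean cost minus an exploration bonus (UCB1-style radius).
    scores = [
        cost_sums[j] / counts[j] - math.sqrt(2.0 * math.log(t) / counts[j])
        for j in range(len(counts))
    ]
    return min(range(len(scores)), key=scores.__getitem__)
```

A rarely sampled controller receives a large bonus and is therefore tried again even when its empirical mean cost is higher; once all controllers are well sampled, the choice follows the empirical means.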
Article
Full-text available
With the expansion of networks and the growth in their users, as well as the emergence of new technologies such as cloud computing and big data, managing traditional networks has become difficult. Therefore, it is necessary to change the traditional network architecture. To address this issue, Software Defined Networking (SDN) has recently been proposed, which makes network management more adaptable. Due to limited network resources and the need to meet Quality of Service (QoS) requirements, one issue that must be considered is load balancing, which distributes data traffic among multiple resources in order to maximize the efficiency and reliability of network resources. In conventional networks, load balancing is based on local information about the network and is therefore not very precise. However, SDN controllers have a global view of the network and can produce better-optimized load balancing. Although load balancing mechanisms are important in SDN, to the best of our knowledge, there exists no precise and systematic review or survey investigating these issues. Hence, this paper systematically reviews the load balancing mechanisms that have been used in SDN, based on two categories, deterministic and non-deterministic. This study also presents the benefits and weaknesses of the selected load balancing algorithms and investigates their metrics. Additionally, the important challenges of these algorithms are reviewed, so that researchers can apply better load balancing techniques in the future.
Article
Full-text available
In software-defined networking (SDN), as the data plane scale expands, scalability and reliability of the control plane have become major concerns. To mitigate such concerns, two kinds of solutions have been proposed separately. One is multi-controller architecture, i.e., a logically centralized control plane with physically distributed controllers. The other is control devolution, i.e., delegating control of some flows back to switches. Most existing solutions adopt either static switch-controller association or static devolution, which may not adapt well to traffic variation, leading to high communication costs between switches and controllers, and high computation costs on switches. In this paper, we propose a novel scheme to jointly consider both solutions, i.e., we dynamically associate switches with controllers and dynamically devolve control of flows to switches. Our scheme is an efficient online algorithm that does not need the statistics of traffic flows. By adjusting a parameter V, we can make a trade-off between costs and queue backlogs. Theoretical analysis and extensive simulations show that our scheme yields much lower costs and latency compared to static schemes, and balanced loads among controllers.
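To make the role of the parameter V concrete, here is a minimal sketch of a per-slot decision in the drift-plus-penalty style. It is an illustrative simplification, not the algorithm from the paper: the scoring rule, the function name `choose_action`, and all arguments are assumptions introduced for this example.

```python
from typing import List, Optional


def choose_action(
    q_switch: float,
    q_ctrl: List[float],
    comm_cost: List[float],
    local_cost: float,
    v: float,
) -> Optional[int]:
    """Decide where one switch sends its requests in the current time slot.

    Returns the index of the chosen controller, or None to process
    requests locally on the switch (control devolution).

    q_switch:   request backlog at the switch
    q_ctrl[j]:  request backlog at controller j
    comm_cost:  per-request communication cost to each controller
    local_cost: per-request computation cost on the switch
    v:          weight on cost versus backlog (larger v favors low cost)
    """
    # Local processing drains the switch queue and pays computation cost.
    best_choice: Optional[int] = None
    best_score = v * local_cost - q_switch
    for j, (z, w) in enumerate(zip(q_ctrl, comm_cost)):
        # Forwarding moves backlog from the switch to controller j,
        # so the queue term is the backlog difference (backpressure-like).
        score = v * w - (q_switch - z)
        if score < best_score:
            best_score, best_choice = score, j
    return best_choice
```

With a small v the backlog terms dominate, so requests move toward lightly loaded controllers or are handled locally; with a large v the per-request costs dominate, at the price of longer queues.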
Article
Full-text available
The advent of software defined networking enables flexible, reliable and feature-rich control planes for data center networks. However, the tight coupling of centralized control and complete visibility leads to a wide range of issues among which scalability has risen to prominence. To address this, we present LazyCtrl, a novel hybrid control plane design for data center networks where network control is carried out by distributed control mechanisms inside independent groups of switches while complemented with a global controller. Our design is motivated by the observation that data center traffic is usually highly skewed and thus edge switches can be grouped according to traffic locality. LazyCtrl aims at bringing laziness to the global controller by dynamically devolving most of the control tasks to independent switch groups to process frequent intra-group events near datapaths while handling rare inter-group or other specified events by the controller. We implement LazyCtrl and build a prototype based on Open vSwitch and Floodlight. Trace-driven experiments on our prototype show that an effective switch grouping is easy to maintain in multi-tenant clouds and the central controller can be significantly shielded by staying lazy, with its workload reduced by up to 82%.
Article
Full-text available
Software Defined Networks (SDN) give network designers freedom to refactor the network control plane. One core benefit of SDN is that it enables the network control logic to be designed and operated on a global network view, as though it were a centralized application, rather than a distributed system - logically centralized. Regardless of this abstraction, control plane state and logic must inevitably be physically distributed to achieve responsiveness, reliability, and scalability goals. Consequently, we ask: "How does distributed SDN state impact the performance of a logically centralized control application?" Motivated by this question, we characterize the state exchange points in a distributed SDN control plane and identify two key state distribution trade-offs. We simulate these exchange points in the context of an existing SDN load balancer application. We evaluate the impact of inconsistent global network view on load balancer performance and compare different state management approaches. Our results suggest that SDN control state inconsistency significantly degrades performance of logically centralized control applications agnostic to the underlying state distribution.
Conference Paper
Full-text available
OpenFlow is a great concept, but its original design imposes excessive overheads. It can simplify network and traffic management in enterprise and data center environments, because it enables flow-level control over Ethernet switching and provides global visibility of the flows in the network. However, such fine-grained control and visibility comes with costs: the switch-implementation costs of involving the switch's control-plane too often and the distributed-system costs of involving the OpenFlow controller too frequently, both on flow setups and especially for statistics-gathering. In this paper, we analyze these overheads, and show that OpenFlow's current design cannot meet the needs of high-performance networks. We design and evaluate DevoFlow, a modification of the OpenFlow model which gently breaks the coupling between control and global visibility, in a way that maintains a useful amount of visibility without imposing unnecessary costs. We evaluate DevoFlow through simulations, and find that it can load-balance data center traffic as well as fine-grained solutions, without as much overhead: DevoFlow uses 10–53 times fewer flow table entries at an average switch, and uses 10–42 times fewer control messages.
Conference Paper
Full-text available
Computer networks lack a general control paradigm, as traditional networks do not provide any network-wide management abstractions. As a result, each new function (such as routing) must provide its own state distribution, element discovery, and failure recovery mechanisms. We believe this lack of a common control platform has significantly hindered the development of flexible, reliable and feature-rich network control planes. To address this, we present Onix, a platform on top of which a network control plane can be implemented as a distributed system. Control planes written within Onix operate on a global view of the network, and use basic state distribution primitives provided by the platform. Thus Onix provides a general API for control plane implementations, while allowing them to make their own trade-offs among consistency, durability, and scalability.
Article
Full-text available
This whitepaper proposes OpenFlow: a way for researchers to run experimental protocols in the networks they use every day. OpenFlow is based on an Ethernet switch, with an internal flow-table, and a standardized interface to add and remove flow entries. Our goal is to encourage networking vendors to add OpenFlow to their switch products for deployment in college campus backbones and wiring closets. We believe that OpenFlow is a pragmatic compromise: on one hand, it allows researchers to run experiments on heterogeneous switches in a uniform way at line-rate and with high port-density; while on the other hand, vendors do not need to expose the internal workings of their switches. In addition to allowing researchers to evaluate their ideas in real-world traffic settings, OpenFlow could serve as a useful campus component in proposed large-scale testbeds like GENI. Two buildings at Stanford University will soon run OpenFlow networks, using commercial Ethernet switches and routers. We will work to encourage deployment at other schools; and we encourage you to consider deploying OpenFlow in your university network too.
Article
Several distributed SDN controller architectures have been proposed to mitigate the risks of overload and failure. However, since they statically assign switches to controller instances and store state in distributed data stores (which doubles flow setup latency), they hinder operators' ability to minimize both flow setup latency and controller resource consumption. To address this, we propose a novel approach for assigning SDN switches and partitions of SDN application state to distributed controller instances. We present a new way to partition SDN application state that considers the dependencies between application state and SDN switches. We then formally model the assignment problem as a variant of multi-dimensional bin packing and propose a practical heuristic to solve the problem with strict time constraints. Our preliminary evaluations show that our approach yields a 44% decrease in flow setup latency and a 42% reduction in controller operating costs.
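As a rough illustration of treating switch-to-controller assignment as bin packing, the sketch below applies the classic first-fit decreasing heuristic in a single dimension. It is not the heuristic proposed in the paper above, which addresses a multi-dimensional variant with application state; the function name, the load values, and the uniform capacity are assumptions for this example.

```python
from typing import Dict, List


def first_fit_decreasing(loads: Dict[str, int], capacity: int) -> List[List[str]]:
    """Group switches onto as few controller instances as the heuristic finds.

    loads:    hypothetical per-switch load (e.g. requests per second)
    capacity: load each controller instance can absorb (assumed uniform)
    Returns one list of switch names per controller instance.
    """
    bins: List[dict] = []  # each bin tracks free capacity and its switches
    # Place the heaviest switches first; this is the "decreasing" part.
    for name, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > capacity:
            raise ValueError(f"switch {name} exceeds controller capacity")
        for b in bins:
            if b["free"] >= load:  # first controller with enough room
                b["free"] -= load
                b["switches"].append(name)
                break
        else:
            # No existing controller fits, so open a new instance.
            bins.append({"free": capacity - load, "switches": [name]})
    return [b["switches"] for b in bins]
```

For example, `first_fit_decreasing({"s1": 6, "s2": 5, "s3": 4, "s4": 3, "s5": 2}, 10)` packs the five switches onto two controller instances.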
Conference Paper
Cloud computing realises the vision of utility computing. Tenants can benefit from on-demand provisioning of computational resources according to a pay-per-use model and can outsource hardware purchases and maintenance. Tenants, however, have only limited ...
Conference Paper
Distributed controllers have been proposed for Software Defined Networking to address the issues of scalability and reliability that a centralized controller suffers from. One key limitation of the distributed controllers is that the mapping between a switch and a controller is statically configured, which may result in uneven load distribution among the controllers. To address this problem, we propose ElastiCon, an elastic distributed controller architecture in which the controller pool is dynamically grown or shrunk according to traffic conditions and the load is dynamically shifted across controllers. We propose a novel switch migration protocol for enabling such load shifting, which conforms with the OpenFlow standard. We also build a prototype to demonstrate the efficacy of our design.
Conference Paper
OpenFlow assumes a logically centralized controller, which ideally can be physically distributed. However, current deployments rely on a single controller which has major drawbacks including lack of scalability. We present HyperFlow, a distributed event-based control plane for OpenFlow. HyperFlow is logically centralized but physically distributed: it provides scalability while keeping the benefits of network control centralization. By passively synchronizing network-wide views of OpenFlow controllers, HyperFlow localizes decision making to individual controllers, thus minimizing the control plane response time to data plane requests. HyperFlow is resilient to network partitioning and component failures. It also enables interconnecting independently managed OpenFlow networks, an essential feature missing in current OpenFlow deployments. We have implemented HyperFlow as an application for NOX. Our implementation requires minimal changes to NOX, and allows reuse of existing NOX applications with minor modifications. Our preliminary evaluation shows that, assuming sufficient control bandwidth, to bound the window of inconsistency among controllers by a factor of the delay between the farthest controllers, the network changes must occur at a rate lower than 1000 events per second across the network.
Article
Limiting the overhead of frequent events on the control plane is essential for realizing a scalable Software-Defined Network. One way of limiting this overhead is to process frequent events in the data plane. This requires modifying switches and comes at the cost of visibility in the control plane. Taking an alternative route, we propose Kandoo, a framework for preserving scalability without changing switches. Kandoo has two layers of controllers: (i) the bottom layer is a group of controllers with no interconnection, and no knowledge of the network-wide state, and (ii) the top layer is a logically centralized controller that maintains the network-wide state. Controllers at the bottom layer run only local control applications (i.e., applications that can function using the state of a single switch) near datapaths. These controllers handle most of the frequent events and effectively shield the top layer. Kandoo's design enables network operators to replicate local controllers on demand and relieve the load on the top layer, which is the only potential bottleneck in terms of scalability. Our evaluations show that a network controlled by Kandoo has an order of magnitude lower control channel consumption compared to normal OpenFlow networks.
Conference Paper
Although there is tremendous interest in designing improved networks for data centers, very little is known about the network-level traffic characteristics of data centers today. In this paper, we conduct an empirical study of the network traffic in 10 data centers belonging to three different categories, including university, enterprise campus, and cloud data centers. Our definition of cloud data centers includes not only data centers employed by large online service providers offering Internet-facing applications but also data centers used to host data-intensive (MapReduce style) applications. We collect and analyze SNMP statistics, topology and packet-level traces. We examine the range of applications deployed in these data centers and their placement, the flow-level and packet-level transmission properties of these applications, and their impact on network and link utilizations, congestion and packet drops. We describe the implications of the observed traffic patterns for data center internal traffic engineering as well as for recently proposed architectures for data center networks.
Book
This text presents a modern theory of analysis, control, and optimization for dynamic networks. Mathematical techniques of Lyapunov drift and Lyapunov optimization are developed and shown to enable constrained optimization of time averages in general stochastic systems. The focus is on communication and queueing systems, including wireless networks with time-varying channels, mobility, and randomly arriving traffic. A simple drift-plus-penalty framework is used to optimize time averages such as throughput, throughput-utility, power, and distortion. Explicit performance-delay tradeoffs are provided to illustrate the cost of approaching optimality. This theory is also applicable to problems in operations research and economics, where energy-efficient and profit-maximizing decisions must be made without knowing the future. Topics in the text include the following:
• Queue stability theory
• Backpressure, max-weight, and virtual queue methods
• Primal-dual methods for non-convex stochastic utility maximization
• Universal scheduling theory for arbitrary sample paths
• Approximate and randomized scheduling theory
• Optimization of renewal systems and Markov decision systems
Detailed examples and numerous problem set questions are provided to reinforce the main concepts.
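For readers unfamiliar with the drift-plus-penalty framework named in this description, its standard form can be summarized as follows. The notation is an assumption chosen for this summary: queues $Q_i(t)$ with arrivals $a_i(t)$ and service $b_i(t)$, a penalty $p(t)$, a weight $V \ge 0$, and a finite constant $B$ that bounds the second moments of arrivals and service.

```latex
% Queue dynamics and quadratic Lyapunov function
\[
Q_i(t+1) = \max\{Q_i(t) - b_i(t),\, 0\} + a_i(t),
\qquad
L(\mathbf{Q}(t)) = \tfrac{1}{2} \sum_i Q_i(t)^2 .
\]
% One-slot conditional Lyapunov drift
\[
\Delta(\mathbf{Q}(t)) =
\mathbb{E}\big[\, L(\mathbf{Q}(t+1)) - L(\mathbf{Q}(t)) \,\big|\, \mathbf{Q}(t) \big].
\]
% Drift-plus-penalty bound; each slot, the controller chooses the
% action that minimizes the right-hand side given the observed queues
\[
\Delta(\mathbf{Q}(t)) + V\, \mathbb{E}\big[\, p(t) \,\big|\, \mathbf{Q}(t) \big]
\;\le\;
B + V\, \mathbb{E}\big[\, p(t) \,\big|\, \mathbf{Q}(t) \big]
+ \sum_i Q_i(t)\, \mathbb{E}\big[\, a_i(t) - b_i(t) \,\big|\, \mathbf{Q}(t) \big].
\]
```

Under the usual assumptions, this yields a time-average penalty within $O(1/V)$ of the optimum while the average queue backlog grows as $O(V)$, which is the performance-delay tradeoff referred to above.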