Predictive Switch-Controller Association and Control Devolution for SDN Systems

  • Shenzhen Institute of Artificial Intelligence and Robotics for Society


For software-defined networking (SDN) systems, existing solutions enhance the scalability and reliability of the control plane either through multi-controller designs with static switch-controller associations, or through static control devolution that delegates part of the request processing back to switches. Such solutions can fall short in the face of temporal variations in request traffic, incurring considerable local computation costs on switches and communication costs between switches and controllers. It remains an open problem to develop a joint online scheme that conducts both dynamic switch-controller association and dynamic control devolution. Moreover, the fundamental benefits of predictive scheduling to SDN systems remain unexplored. In this paper, we identify the non-trivial trade-off in such a joint design and formulate a stochastic network optimization problem that aims to minimize time-averaged total system costs while ensuring long-term queue stability. By exploiting the unique problem structure, we devise a predictive online switch-controller association and control devolution (POSCAD) scheme, which solves the problem through a series of online, distributed decisions. Theoretical analysis shows that, without prediction, POSCAD can achieve near-optimal total system costs with a tunable trade-off against queue stability. With prediction, POSCAD can achieve even better performance with shorter latencies. We conduct extensive simulations to evaluate POSCAD. Notably, with only a modest amount of future information, POSCAD achieves a significant reduction in request latencies, even in the face of prediction errors.
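The drift-plus-penalty structure underlying such Lyapunov-based schemes can be illustrated with a small sketch. The rule below is a hypothetical, simplified stand-in for a per-slot POSCAD-style decision; the option names, costs, and service rates are illustrative assumptions, not the paper's formulation.

```python
# Hedged sketch of a drift-plus-penalty decision rule, the kind of
# per-slot rule Lyapunov-based schemes like POSCAD build on.
# All names and numbers below are illustrative assumptions.

def choose_action(queue_backlog, options, V):
    """Pick the option minimizing V*cost - backlog*service_rate.

    options: list of (name, cost, service_rate) tuples.
    V tunes the cost/stability trade-off: a larger V favors low
    instantaneous cost; a smaller V favors draining the queue.
    """
    return min(options, key=lambda o: V * o[1] - queue_backlog * o[2])[0]

# A heavily backlogged switch prefers the costlier but faster controller;
# a lightly loaded one devolves processing locally.
options = [("local", 1.0, 2.0), ("controller", 3.0, 8.0)]
busy = choose_action(queue_backlog=100, options=options, V=1.0)  # "controller"
idle = choose_action(queue_backlog=1, options=options, V=10.0)  # "local"
```

This mirrors the tunable trade-off the analysis refers to: raising V lowers time-averaged cost at the price of larger queue backlogs.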




Decentralized orchestration of the control plane is critical to the scalability and reliability of software-defined networking (SDN). However, existing orchestrations of SDN are either one-off or centralized, and would be inefficient in the presence of temporal and spatial variations in traffic requests. In this paper, a fully distributed orchestration is proposed to minimize the time-average cost of SDN while adapting to these variations. This is achieved by stochastically optimizing the on-demand activation of controllers, the adaptive association of controllers and switches, and real-time request processing and dispatching. The proposed approach is able to operate at multiple timescales for the activation and association of controllers and for request processing and dispatching, thereby alleviating potential service interruptions caused by orchestration. A new analytic framework is developed to confirm the asymptotic optimality of the proposed approach in the presence of non-negligible signaling delays between controllers. Corroborated by extensive simulations, the proposed approach can save up to 73% of the time-average operational cost of SDN, as compared to existing static orchestration.
As opposed to the decentralized control logic underpinning the design of the Internet as a complex bundle of box-centric protocols and vertically-integrated solutions, the SDN paradigm advocates the separation of the control logic from hardware and its centralization in software-based controllers. These key tenets offer new opportunities to introduce innovative applications and incorporate automatic and adaptive control aspects, thereby easing network management and guaranteeing the user's QoE. Despite the excitement, SDN adoption raises many challenges, including the scalability and reliability issues of centralized designs, which can be addressed with the physical decentralization of the control plane. However, such physically distributed, but logically centralized, systems bring an additional set of challenges. This paper presents a survey on SDN with a special focus on distributed SDN control. Besides reviewing the SDN concept and studying the SDN architecture as compared to the classical one, the main contribution of this survey is a detailed analysis of state-of-the-art distributed SDN controller platforms, which assesses their advantages and drawbacks and classifies them in novel ways (physical and logical classifications) in order to provide useful guidelines for SDN research and deployment initiatives. A thorough discussion of the major challenges of distributed SDN control is also provided, along with some insights into emerging and future trends in the area.
Conference Paper
Software Defined Networking (SDN) uses a logically centralized controller to replace the distributed control plane of a traditional network. One of the central challenges faced by the SDN paradigm is the scalability of the logical controller. As a network grows in size, the computational and communication demand faced by a controller may soon exceed the capabilities of a commodity server. In this work, we revisit the division of labour between the controller and switches, and propose FOCUS, an architecture that offloads a specific subset of control functions, i.e., stable local functions, to the switches' software stack. We implemented a prototype of FOCUS and analyzed the benefits of converting several SDN applications. Due to space restrictions, we only present results for ARP, LLDP, and elephant flow detection. Our initial results are promising: FOCUS can reduce a controller's communication overhead by 50% to nearly 100%, and its computational overhead by 80% to 98%. Furthermore, we observe that FOCUS offloading saves switch CPU, because FOCUS reduces the overheads of communicating with the controller.
Conference Paper
An experimental setup of 32 honeypots reported 17M login attempts originating from 112 different countries and over 6,000 distinct source IP addresses. Thanks to their decoupled control and data planes, Software Defined Networks (SDN) can handle this increasing number of attacks by blocking malicious network connections at the switch level. The challenge, however, lies in defining the set of rules on the SDN controller that block malicious connections. Historical network attack data can be used to automatically identify and block them. A few existing open-source software tools monitor and limit the number of login attempts per source IP address one by one, but such solutions cannot efficiently act against a chain of attacks comprising the multiple IP addresses used by each attacker. In this paper, we propose using machine learning algorithms, trained on historical network attack data, to identify potential malicious connections and attack destinations. We use four widely known machine learning algorithms, C4.5, Bayesian Network (BayesNet), Decision Table (DT), and Naive Bayes, to predict the host that will be attacked based on the historical data. Experimental results show that an average prediction accuracy of 91.68% is attained using Bayesian Networks.
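As a toy illustration of learning attack destinations from historical data, the sketch below uses a simple majority-class predictor in place of the paper's Bayesian-network classifiers; the source features and host names are made up.

```python
# Illustrative stand-in for learned attack-destination prediction:
# a majority-class predictor over (source feature, attacked host) history.
from collections import Counter, defaultdict

def train(history):
    """Count attacked hosts per source feature (e.g., source country).

    history: list of (source_feature, attacked_host) pairs.
    """
    model = defaultdict(Counter)
    for src, host in history:
        model[src][host] += 1
    return model

def predict(model, src):
    """Predict the most frequently attacked host for a source feature."""
    return model[src].most_common(1)[0][0]

# Hypothetical honeypot log: (source country, attacked host).
log = [("CN", "h1"), ("CN", "h1"), ("CN", "h2"), ("RU", "h3")]
model = train(log)
```

A real deployment would replace the majority vote with a trained classifier, but the pipeline shape (train on history, predict the next target, install a blocking rule) is the same.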
The explosive growth of global mobile traffic has led to a rapid growth in the energy consumption of communication networks. In this paper, we focus on the energy-aware design of network selection, subchannel allocation, and power allocation in cellular and Wi-Fi networks, while taking into account the traffic delay of mobile users. The problem is particularly challenging due to the two-timescale operation: network selection on a large timescale, and subchannel and power allocation on a small timescale. Based on the two-timescale Lyapunov optimization technique, we first design an online Energy-Aware Network Selection and Resource Allocation (ENSRA) algorithm. The ENSRA algorithm yields a power consumption within an O(1/V) bound of the optimal value, and guarantees an O(V) traffic delay for any positive control parameter V. Motivated by recent advances in the accurate estimation and prediction of user mobility, channel conditions, and traffic demands, we further develop a novel predictive Lyapunov optimization technique to utilize predictive information, and propose a Predictive Energy-Aware Network Selection and Resource Allocation (P-ENSRA) algorithm. We characterize the performance bounds of P-ENSRA in terms of the power-delay trade-off theoretically. To reduce the computational complexity, we finally propose a Greedy Predictive Energy-Aware Network Selection and Resource Allocation (GP-ENSRA) algorithm, in which the operator solves the problem in P-ENSRA approximately and iteratively. Numerical results show that GP-ENSRA significantly improves the power-delay performance over ENSRA in the large-delay regime: for a wide range of system parameters, GP-ENSRA reduces traffic delay over ENSRA by 20% to 30% under the same power consumption.
Conference Paper
Software-Defined Networking (SDN) is now envisioned for Wide Area Networks (WAN) and deployed constrained networks. Such networks require a resilient, scalable and easily extensible SDN control plane. In this paper, we propose DISCO, a DIstributed SDN COntrol plane able to cope with the distributed and heterogeneous nature of modern overlay networks and deployed networks. A DISCO controller manages its own network domain, communicates with other DISCO controllers to provide end-to-end network services and share aggregated network-wide information. This east-west communication is based on a lightweight and highly manageable control channel which can self-adapt to network conditions.
Software-Defined Networking (SDN) is an emerging paradigm that promises to change the state of affairs of current networks, by breaking vertical integration, separating the network's control logic from the underlying routers and switches, promoting (logical) centralization of network control, and introducing the ability to program the network. The separation of concerns introduced between the definition of network policies, their implementation in switching hardware, and the forwarding of traffic is key to the desired flexibility: by breaking the network control problem into tractable pieces, SDN makes it easier to create and introduce new abstractions in networking, simplifying network management and facilitating network evolution. Today, SDN is both a hot research topic and a concept gaining wide acceptance in industry, which justifies the comprehensive survey presented in this paper. We start by introducing the motivation for SDN, explaining its main concepts and how it differs from traditional networking. Next, we present the key building blocks of an SDN infrastructure using a bottom-up, layered approach. We provide an in-depth analysis of the hardware infrastructure, southbound and northbound APIs, network virtualization layers, network operating systems, network programming languages, and management applications. We also look at cross-layer problems such as debugging and troubleshooting. In an effort to anticipate the future evolution of this new paradigm, we discuss the main ongoing research efforts and challenges of SDN. In particular, we address the design of switches and control platforms -- with a focus on aspects such as resiliency, scalability, performance, security and dependability -- as well as new opportunities for carrier transport networks and cloud providers. Last but not least, we analyze the position of SDN as a key enabler of a software-defined environment.
Conference Paper
The data center network is increasingly a cost, reliability and performance bottleneck for cloud computing. Although multi-tree topologies can provide scalable bandwidth and traditional routing algorithms can provide eventual fault tolerance, we argue that recovery speed can be dramatically improved through the co-design of the network topology, routing algorithm and failure detector. We create an engineered network and routing protocol that directly address the failure characteristics observed in data centers. At the core of our proposal is a novel network topology that has many of the same desirable properties as FatTrees, but with much better fault recovery properties. We then create a series of failover protocols that benefit from this topology and are designed to cascade and complement each other. The resulting system, F10, can almost instantaneously reestablish connectivity and load balance, even in the presence of multiple failures. Our results show that following network link and switch failures, F10 has less than 1/7th the packet loss of current schemes. A trace-driven evaluation of MapReduce performance shows that F10's lower packet loss yields a 30% median application-level speedup.
Conference Paper
OpenFlow is a great concept, but its original design imposes excessive overheads. It can simplify network and traffic management in enterprise and data center environments, because it enables flow-level control over Ethernet switching and provides global visibility of the flows in the network. However, such fine-grained control and visibility comes with costs: the switch-implementation costs of involving the switch's control-plane too often and the distributed-system costs of involving the OpenFlow controller too frequently, both on flow setups and especially for statistics-gathering. In this paper, we analyze these overheads, and show that OpenFlow's current design cannot meet the needs of high-performance networks. We design and evaluate DevoFlow, a modification of the OpenFlow model which gently breaks the coupling between control and global visibility, in a way that maintains a useful amount of visibility without imposing unnecessary costs. We evaluate DevoFlow through simulations, and find that it can load-balance data center traffic as well as fine-grained solutions, without as much overhead: DevoFlow uses 10--53 times fewer flow table entries at an average switch, and uses 10--42 times fewer control messages.
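The devolution idea can be sketched in a few lines: the switch keeps per-flow byte counters locally and only involves the controller once a flow crosses an elephant threshold. This is an illustrative simplification, not DevoFlow's actual mechanism; the names and threshold are assumptions.

```python
# Hedged sketch of devolved flow handling: count locally, report rarely.
def handle_packet(flow_counters, flow_id, nbytes, threshold):
    """Devolved flow accounting: accumulate bytes per flow locally and
    return True (i.e., notify the controller) only when a flow becomes
    an elephant by crossing `threshold` cumulative bytes."""
    flow_counters[flow_id] = flow_counters.get(flow_id, 0) + nbytes
    return flow_counters[flow_id] >= threshold

counters = {}
first = handle_packet(counters, "flow-a", 500, threshold=1000)   # False: still a mouse flow
second = handle_packet(counters, "flow-a", 600, threshold=1000)  # True: elephant, report it
```

The controller then only pays per-elephant costs instead of per-flow-setup and per-statistics-poll costs, which is the source of the message-count reductions reported above.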
Conference Paper
Computer networks lack a general control paradigm, as traditional networks do not provide any network-wide management abstractions. As a result, each new function (such as routing) must provide its own state distribution, element discovery, and failure recovery mechanisms. We believe this lack of a common control platform has significantly hindered the development of flexible, reliable and feature-rich network control planes. To address this, we present Onix, a platform on top of which a network control plane can be implemented as a distributed system. Control planes written within Onix operate on a global view of the network, and use basic state distribution primitives provided by the platform. Thus Onix provides a general API for control plane implementations, while allowing them to make their own trade-offs among consistency, durability, and scalability.
This whitepaper proposes OpenFlow: a way for researchers to run experimental protocols in the networks they use every day. OpenFlow is based on an Ethernet switch, with an internal flow-table, and a standardized interface to add and remove flow entries. Our goal is to encourage networking vendors to add OpenFlow to their switch products for deployment in college campus backbones and wiring closets. We believe that OpenFlow is a pragmatic compromise: on one hand, it allows researchers to run experiments on heterogeneous switches in a uniform way at line-rate and with high port-density; while on the other hand, vendors do not need to expose the internal workings of their switches. In addition to allowing researchers to evaluate their ideas in real-world traffic settings, OpenFlow could serve as a useful campus component in proposed large-scale testbeds like GENI. Two buildings at Stanford University will soon run OpenFlow networks, using commercial Ethernet switches and routers. We will work to encourage deployment at other schools, and we encourage you to consider deploying OpenFlow in your university network too.
The operations landscape today is more complex than ever. IT Ops teams have to fight an uphill battle managing the massive amounts of data generated by modern IT systems. They are expected to handle more incidents than ever before with shorter service-level agreements (SLAs), respond to these incidents more quickly, and improve on key metrics such as mean time to detect (MTTD), mean time to failure (MTTF), mean time between failures (MTBF), and mean time to repair (MTTR). This is not for lack of tools: Digital Enterprise Journal research suggests that 41 percent of enterprises use ten or more tools for IT performance monitoring. Downtime is also expensive, with companies losing a whopping $5.6 million per outage, while an average MTTR of 4.2 hours wastes precious resources. With hybrid multi-cloud, multi-tenant environments, organizations need even more tools to manage the multiple facets of capacity planning, resource utilization, storage management, anomaly detection, and threat detection and analysis, to name a few.
In recent years, with the rapid development of Internet and mobile communication technologies, the infrastructure, devices, and resources in networking systems have become more complex and heterogeneous. To efficiently organize, manage, maintain, and optimize networking systems, more intelligence needs to be deployed. However, due to the inherently distributed nature of traditional networks, machine learning techniques are difficult to apply and deploy for network control and operation. Software Defined Networking (SDN) brings new opportunities to provide intelligence inside networks: its capabilities (e.g., logically centralized control, a global view of the network, software-based traffic analysis, and dynamic updating of forwarding rules) make it easier to apply machine learning techniques. In this paper, we provide a comprehensive survey of the literature on machine learning algorithms applied to SDN. First, related works and background knowledge are introduced. Then, we present an overview of machine learning algorithms. In addition, we review how machine learning algorithms are applied in the realm of SDN from the perspectives of traffic classification, routing optimization, Quality of Service (QoS)/Quality of Experience (QoE) prediction, resource management, and security. Finally, challenges and broader perspectives are discussed.
Software defined networking is increasingly prevalent in data center networks for it enables centralized network configuration and management. However, since switches are statically assigned to controllers and controllers are statically provisioned, traffic dynamics may cause long response time and incur high maintenance cost. To address these issues, we formulate the dynamic controller assignment problem (DCAP) as an online optimization to minimize the total cost caused by response time and maintenance on the cluster of controllers. By applying the randomized fixed horizon control framework, we decompose DCAP into a series of stable matching problems with transfers, guaranteeing a small loss in competitive ratio. Since the matching problem is NP-hard, we propose a hierarchical two-phase algorithm that integrates key concepts from both matching theory and coalitional games to solve it efficiently. Theoretical analysis proves that our algorithm converges to a near-optimal Nash stable solution within tens of iterations. Extensive simulations show that our online approach reduces total cost by about 46%, and achieves better load balancing among controllers compared with static assignment.
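The stable-matching subroutine such an assignment scheme relies on can be sketched with capacitated deferred acceptance: switches propose in preference order, and an over-capacity controller evicts its least-preferred switch. This is a generic Gale-Shapley variant under assumed preference lists and capacities, not the paper's full two-phase algorithm with transfers.

```python
# Hedged sketch: capacitated deferred acceptance for switch-controller
# assignment. Preference lists and capacities below are assumptions.
def stable_match(switch_prefs, controller_prefs, capacity):
    """Switches propose to controllers in preference order; a controller
    over its capacity evicts the switch it ranks lowest, which then
    proposes to its next choice."""
    rank = {c: {s: i for i, s in enumerate(p)} for c, p in controller_prefs.items()}
    assigned = {c: [] for c in controller_prefs}
    next_choice = {s: 0 for s in switch_prefs}
    free = list(switch_prefs)
    while free:
        s = free.pop()
        c = switch_prefs[s][next_choice[s]]  # best controller s has not tried yet
        next_choice[s] += 1
        assigned[c].append(s)
        if len(assigned[c]) > capacity[c]:
            worst = max(assigned[c], key=lambda x: rank[c][x])
            assigned[c].remove(worst)
            free.append(worst)
    return assigned

# All three switches prefer c1, but c1 only keeps its two favourites;
# s3 falls back to c2.
match = stable_match(
    {"s1": ["c1", "c2"], "s2": ["c1", "c2"], "s3": ["c1", "c2"]},
    {"c1": ["s1", "s2", "s3"], "c2": ["s3", "s1", "s2"]},
    {"c1": 2, "c2": 2},
)
```

The resulting assignment is stable in the usual sense: no switch-controller pair would both prefer each other over their current match.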
In online service systems, the delay experienced by users from service request to service completion is one of the most critical performance metrics. To improve user delay experience, recent industrial practices suggest a modern system design mechanism: proactive serving, where the service system predicts future user requests and allocates its capacity to serve these upcoming requests proactively. This approach complements the conventional mechanism of capability boosting. In this paper, we propose queuing models for online service systems with proactive serving capability and characterize the user delay reduction by proactive serving. In particular, we show that proactive serving decreases average delay exponentially (as a function of the prediction window size) in the cases where service time follows light-tailed distributions. Furthermore, the exponential decrease in user delay is robust against prediction errors (in terms of miss detection and false alarm) and user demand fluctuation. Compared with the conventional mechanism of capability boosting, proactive serving is more effective in decreasing delay when the system is in the light-load regime. Our trace-driven evaluations demonstrate the practical power of proactive serving: for example, for the data trace of light-tailed YouTube videos, the average user delay decreases by 50% when the system predicts 60 s ahead. Our results provide, from a queuing-theoretical perspective, justifications for the practical application of proactive serving in online service systems.
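The benefit of proactive serving can be checked with a tiny slotted-queue simulation. The model below is an illustrative assumption (a single-slot-per-request server with perfect predictions a fixed window ahead), not the paper's queuing model.

```python
# Hedged sketch: delay in a slotted single-server queue where requests
# predicted `window` slots ahead may be pre-served early.
import random

def avg_delay(arrival_prob, window, slots=20000, seed=1):
    """Average delay when the server may start a request up to `window`
    slots before its true arrival. Delay is counted from the true
    arrival time and never negative."""
    rng = random.Random(seed)
    arrivals = [t for t in range(slots) if rng.random() < arrival_prob]
    free_at = 0  # earliest slot the server is available
    total = 0
    for t in arrivals:
        start = max(free_at, t - window)  # may begin up to `window` slots early
        free_at = start + 1
        total += max(0, start + 1 - t)
    return total / len(arrivals)

# On the same arrival sample, a larger prediction window can only start
# each request no later, so average delay is non-increasing in the window.
d0, d5, d20 = avg_delay(0.8, 0), avg_delay(0.8, 5), avg_delay(0.8, 20)
```

With perfect predictions the simulation shows the qualitative effect described above; the paper additionally quantifies robustness to miss detections and false alarms, which this sketch omits.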
Datacenter networks suffer unpredictable performance due to a lack of application level bandwidth guarantees. A lot of attention has been drawn to solve this problem such as how to provide bandwidth guarantees for virtualized machines (VMs), proportional bandwidth share among tenants, and high network utilization under peak traffic. However, existing solutions fail to cope with highly dynamic traffic in datacenter networks. In this paper, we propose eBA, an efficient solution to bandwidth allocation that provides end-to-end bandwidth guarantee for VMs under large numbers of short flows and massive bursty traffic in datacenters. eBA leverages a novel distributed VM-to-VM rate control algorithm based on the logistic model under the control-theoretic framework. eBA's implementation requires no changes to hardware or applications and can be deployed in standard protocol stack. The theoretical analysis and the experimental results show that eBA not only guarantees the bandwidth for VMs, but also provides fast convergence to efficiency and fairness, as well as smooth response to bursty traffic.
We consider online convex optimization (OCO) problems with switching costs and noisy predictions. While the design of online algorithms for OCO problems has received considerable attention, the design of algorithms in the context of noisy predictions is largely open. To this point, two promising algorithms have been proposed: Receding Horizon Control (RHC) and Averaging Fixed Horizon Control (AFHC). The comparison of these policies is largely open. AFHC has been shown to provide better worst-case performance, while RHC outperforms AFHC in many realistic settings. In this paper, we introduce a new class of policies, Committed Horizon Control (CHC), that generalizes both RHC and AFHC. We provide average-case analysis and concentration results for CHC policies, yielding the first analysis of RHC for OCO problems with noisy predictions. Further, we provide explicit results characterizing the optimal CHC policy as a function of properties of the prediction noise, e.g., variance and correlation structure. Our results provide a characterization of when AFHC outperforms RHC and vice versa, as well as when other CHC policies outperform both RHC and AFHC.
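Receding Horizon Control, one of the two baselines CHC generalizes, is easy to sketch on a scalar quadratic OCO instance: at each step, plan over the predicted horizon with a switching penalty, then commit only the first action. The tracking-plus-switching objective below is an assumed toy instance, not the paper's general setting.

```python
# Hedged sketch of one RHC step for the toy objective
#   sum_i (x_i - y_i)^2 + beta * (x_i - x_{i-1})^2
# given noisy predictions `preds` of the targets y_i.
def rhc_step(x_prev, preds, beta, iters=300):
    """Plan over the horizon and return only the first (committed) action.
    Solved by Gauss-Seidel sweeps; each coordinate update is the exact
    minimizer, and the system is diagonally dominant, so sweeps converge."""
    w = len(preds)
    x = list(preds)
    for _ in range(iters):
        for i in range(w):
            left = x_prev if i == 0 else x[i - 1]
            if i < w - 1:
                x[i] = (preds[i] + beta * (left + x[i + 1])) / (1 + 2 * beta)
            else:
                x[i] = (preds[i] + beta * left) / (1 + beta)
    return x[0]

# With no switching cost RHC simply tracks the prediction; with a large
# switching cost it moves only slightly away from the previous point.
free_move = rhc_step(0.0, [1.0, 1.0, 1.0], beta=0.0)
sticky = rhc_step(0.0, [1.0, 1.0, 1.0], beta=100.0)
```

AFHC and CHC differ only in how such per-step plans are committed and averaged across overlapping horizons.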
Big data, with their promise to discover valuable insights for better decision making, have recently attracted significant interest from both academia and industry. Voluminous data are generated from a variety of users and devices, and are to be stored and processed in powerful data centers. As such, there is a strong demand for building an unimpeded network infrastructure to gather geographically distributed and rapidly generated data, and move them to data centers for effective knowledge discovery. The express network should also be seamlessly extended to interconnect multiple data centers as well as the server nodes within a data center. In this article, we take a close look at the unique challenges in building such a network infrastructure for big data. Our study covers each and every segment in this network highway: the access networks that connect data sources, the Internet backbone that bridges them to remote data centers, as well as the dedicated network among data centers and within a data center. We also present two case studies of real-world big data applications that are empowered by networking, highlighting interesting and promising future research directions.
Conference Paper
Cloud computing realises the vision of utility computing. Tenants can benefit from on-demand provisioning of computational resources according to a pay-per-use model and can outsource hardware purchases and maintenance. Tenants, however, have only limited ...
Conference Paper
Distributed controllers have been proposed for Software Defined Networking to address the issues of scalability and reliability that a centralized controller suffers from. One key limitation of distributed controllers is that the mapping between a switch and a controller is statically configured, which may result in uneven load distribution among the controllers. To address this problem, we propose ElastiCon, an elastic distributed controller architecture in which the controller pool is dynamically grown or shrunk according to traffic conditions and the load is dynamically shifted across controllers. We propose a novel switch migration protocol for enabling such load shifting, which conforms to the OpenFlow standard. We also build a prototype to demonstrate the efficacy of our design.
Conference Paper
OpenFlow assumes a logically centralized controller, which ideally can be physically distributed. However, current deployments rely on a single controller which has major drawbacks including lack of scalability. We present HyperFlow, a distributed event-based control plane for OpenFlow. HyperFlow is logically centralized but physically distributed: it provides scalability while keeping the benefits of network control centralization. By passively synchronizing network-wide views of OpenFlow controllers, HyperFlow localizes decision making to individual controllers, thus minimizing the control plane response time to data plane requests. HyperFlow is resilient to network partitioning and component failures. It also enables interconnecting independently managed OpenFlow networks, an essential feature missing in current OpenFlow deployments. We have implemented HyperFlow as an application for NOX. Our implementation requires minimal changes to NOX, and allows reuse of existing NOX applications with minor modifications. Our preliminary evaluation shows that, assuming sufficient control bandwidth, to bound the window of inconsistency among controllers by a factor of the delay between the farthest controllers, the network changes must occur at a rate lower than 1000 events per second across the network.
Conference Paper
Although there is tremendous interest in designing improved networks for data centers, very little is known about the network-level traffic characteristics of data centers today. In this paper, we conduct an empirical study of the network traffic in 10 data centers belonging to three different categories: university, enterprise campus, and cloud data centers. Our definition of cloud data centers includes not only data centers employed by large online service providers offering Internet-facing applications, but also data centers used to host data-intensive (MapReduce-style) applications. We collect and analyze SNMP statistics, topology and packet-level traces. We examine the range of applications deployed in these data centers and their placement, the flow-level and packet-level transmission properties of these applications, and their impact on network and link utilizations, congestion and packet drops. We describe the implications of the observed traffic patterns for data center internal traffic engineering as well as for recently proposed architectures for data center networks.
Conference Paper
Today's data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Non-uniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.
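For concreteness, the element counts of the standard k-ary fat-tree built from commodity k-port switches have a simple closed form. This sketch assumes the usual construction (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches):

```python
def fat_tree(k):
    """Switch and host counts for a k-ary fat-tree of k-port switches."""
    assert k % 2 == 0, "k must be even"
    return {
        "core": (k // 2) ** 2,        # (k/2)^2 core switches
        "aggregation": k * (k // 2),  # k pods * k/2 aggregation switches each
        "edge": k * (k // 2),         # k pods * k/2 edge switches each
        "hosts": k ** 3 // 4,         # each edge switch serves k/2 hosts
    }

# With 48-port commodity switches, the topology scales to 27,648 hosts
# at full bisection bandwidth.
```

The cubic growth in hosts from fixed-port commodity switches is what makes this design competitive with specialized high-end gear.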
P. Berde, M. Gerola, J. Hart, Y. Higuchi, M. Kobayashi, T. Koide, B. Lantz, B. O'Connor, P. Radoslavov, W. Snow et al., "ONOS: Towards an open, distributed SDN OS," in Proceedings of ACM HotSDN, 2014.
J. Broughton, "Netflix adds download functionality," https://technology., 2016.
S. Hassas Yeganeh and Y. Ganjali, "Kandoo: A framework for efficient and scalable offloading of control applications," in Proceedings of ACM HotSDN, 2012.
S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu et al., "B4: Experience with a globally-deployed software defined WAN," ACM SIGCOMM Computer Communication Review, vol. 43, no. 4, pp. 3-14, 2013.
A. Krishnamurthy, S. P. Chandrabose, and A. Gember-Jacobson, "Pratyaastha: An efficient elastic distributed SDN control plane," in Proceedings of ACM HotSDN, 2014.
D. Levin, A. Wundsam, B. Heller, N. Handigol, and A. Feldmann, "Logically centralized? State distribution trade-offs in software defined networks," in Proceedings of ACM HotSDN, 2012.
M. A. Santos, B. A. Nunes, K. Obraczka, T. Turletti, B. T. De Oliveira, and C. B. Margi, "Decentralizing SDN's control plane," in Proceedings of IEEE LCN, 2014.
A. Singla, C.-Y. Hong, L. Popa, and P. B. Godfrey, "Jellyfish: Networking data centers, randomly," in Proceedings of USENIX NSDI, 2012.