Preprint

Joint Switch-Controller Association and Control Devolution for SDN Systems: An Integration of Online Control and Online Learning

Authors:
  • Shenzhen Institute of Artificial Intelligence and Robotics for Society

Abstract

In software-defined networking (SDN) systems, it is a common practice to adopt a multi-controller design and control devolution techniques to improve the performance of the control plane. However, in such systems, the decision-making for joint switch-controller association and control devolution often involves various uncertainties, e.g., the temporal variations of controller accessibility, and computation and communication costs of switches. In practice, statistics of such uncertainties are unattainable and need to be learned in an online fashion, calling for an integrated design of learning and control. In this paper, we formulate a stochastic network optimization problem that aims to minimize time-average system costs and ensure queue stability. By transforming the problem into a combinatorial multi-armed bandit problem with long-term stability constraints, we adopt bandit learning methods and optimal control techniques to handle the exploration-exploitation tradeoff and long-term stability constraints, respectively. Through an integrated design of online learning and online control, we propose an effective Learning-Aided Switch-Controller Association and Control Devolution (LASAC) scheme. Our theoretical analysis and simulation results show that LASAC achieves a tunable tradeoff between queue stability and system cost reduction with a sublinear time-averaged regret bound over a finite time horizon.
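
The combination the abstract describes, bandit learning for unknown costs plus a control rule for queue stability, can be illustrated with a toy sketch. This is not the LASAC scheme itself. It assumes a UCB1-style confidence bonus and a drift-plus-penalty weight with a tunable parameter V, neither of which the abstract spells out. All names, cost values, and the arrival and service rates are hypothetical.

```python
import math
import random


def run_sketch(num_switches=3, num_controllers=2, horizon=2000, V=10.0, seed=0):
    """Toy loop: UCB-style cost learning plus a drift-plus-penalty rule."""
    rng = random.Random(seed)
    # Hypothetical mean communication costs (switch -> controller), hidden from the learner.
    true_cost = [[rng.uniform(0.2, 0.8) for _ in range(num_controllers)]
                 for _ in range(num_switches)]
    local_cost = 1.0      # assumed known cost of handling a request at the switch
    service_per_slot = 1  # assumed number of requests each controller serves per slot

    counts = [[0] * num_controllers for _ in range(num_switches)]
    means = [[0.0] * num_controllers for _ in range(num_switches)]
    backlog = [0] * num_controllers  # controller queue lengths
    max_backlog, total_cost, handled = 0, 0.0, 0

    for t in range(1, horizon + 1):
        for i in range(num_switches):
            if rng.random() > 0.7:  # assumed Bernoulli(0.7) arrivals per switch
                continue
            # Default option: local processing (control devolution), weight V * local_cost.
            best, best_weight = None, V * local_cost
            for j in range(num_controllers):
                if counts[i][j] == 0:
                    lcb = 0.0  # optimistic estimate forces initial exploration
                else:
                    bonus = math.sqrt(2.0 * math.log(t) / counts[i][j])
                    lcb = max(0.0, means[i][j] - bonus)  # lower confidence bound on cost
                weight = V * lcb + backlog[j]  # penalty term plus queue backlog
                if weight < best_weight:
                    best, best_weight = j, weight
            if best is None:
                total_cost += local_cost
            else:
                noisy = true_cost[i][best] + rng.uniform(-0.1, 0.1)
                cost = min(1.0, max(0.0, noisy))  # observed cost, clipped to [0, 1]
                counts[i][best] += 1
                means[i][best] += (cost - means[i][best]) / counts[i][best]
                backlog[best] += 1
                total_cost += cost
            handled += 1
        max_backlog = max(max_backlog, max(backlog))
        backlog = [max(0, q - service_per_slot) for q in backlog]

    return total_cost / max(1, handled), max_backlog


if __name__ == "__main__":
    avg_cost, max_backlog = run_sketch()
    print(f"average cost per request: {avg_cost:.3f}, peak backlog: {max_backlog}")
```

Raising V makes the rule weigh estimated cost more heavily and tolerate longer controller queues. In this toy model a controller is chosen only while its backlog is below V * local_cost, so the backlog never exceeds V * local_cost + 1.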

Article
Decentralized orchestration of the control plane is critical to the scalability and reliability of a software-defined network (SDN). However, existing orchestrations of SDN are either one-off or centralized, and would be inefficient in the presence of temporal and spatial variations in traffic requests. In this paper, a fully distributed orchestration is proposed to minimize the time-average cost of SDN, adapting to the variations. This is achieved by stochastically optimizing the on-demand activation of controllers, adaptive association of controllers and switches, and real-time request processing and dispatching. The proposed approach is able to operate at multiple timescales for activation and association of controllers, and request processing and dispatching, thereby alleviating potential service interruptions caused by orchestration. A new analytic framework is developed to confirm the asymptotic optimality of the proposed approach in the presence of non-negligible signaling delays between controllers. As corroborated by extensive simulations, the proposed approach can save up to 73% of the time-average operational cost of SDN, as compared to the existing static orchestration.
Conference Paper
We consider a distributed Software Defined Networking (SDN) architecture adopting a cluster of multiple controllers to improve network performance and reliability. Unlike previous work, we focus on the control traffic exchanged among the controllers, in addition to the OpenFlow control traffic exchanged between controllers and switches. We develop an analytical model to estimate the reaction time perceived at the switches due to the inter-controller communications, based on the data-ownership model adopted in the cluster. We advocate a careful placement of the controllers, taking into account both kinds of control traffic. We evaluate, for some real ISP network topologies, the possible delay tradeoffs for the controller placement problem.
Article
Softwarization of networks simplifies the deployment, configuration and management of network functions. The driving force behind this evolution is Software Defined Networking (SDN), which allows more flexible and dynamic network resource allocation and management. Efficient resource allocation and orchestration are two primary targets of this softwarization process; however, centralized methodologies prove complex and exhibit scalability issues. Distributed solutions are therefore to be preferred but, in order to be effective, should quickly converge towards equilibrium solutions. In this paper, we focus on making distributed resource allocation and orchestration a viable approach, and prove convergence of the relevant mechanisms. Specifically, we exploit game theory to model interactions between users requesting network functions and servers providing these functions. Accordingly, a two-stage Stackelberg game is presented where servers act as leaders of the game and users as followers. Servers have conflicting interests and try to maximize their utility; users, on the other hand, use a replicator behavior and try to imitate other users' decisions to improve their benefit. The framework proves the existence and uniqueness of an equilibrium, and a learning mechanism to converge to such equilibrium is proposed. Numerical results show the effectiveness of the approach.
Article
In this paper, the problem of proactive deployment of cache-enabled unmanned aerial vehicles (UAVs) for optimizing the quality-of-experience (QoE) of wireless devices in a cloud radio access network (CRAN) is studied. In the considered model, the network can leverage human-centric information such as users' visited locations, requested contents, gender, job, and device type to predict the content request distribution and mobility pattern of each user. Then, given these behavior predictions, the proposed approach seeks to find the user-UAV associations, the optimal UAVs' locations, and the contents to cache at UAVs. This problem is formulated as an optimization problem whose goal is to maximize the users' QoE while minimizing the transmit power used by the UAVs. To solve this problem, a novel algorithm based on the machine learning framework of conceptor-based echo state networks (ESNs) is proposed. Using ESNs, the network can effectively predict each user's content request distribution and its mobility pattern when limited information on the states of users and the network is available. Based on the predictions of the users' content request distribution and their mobility patterns, we derive the optimal user-UAV association, optimal locations of the UAVs as well as the content to cache at UAVs. Simulation results using real pedestrian mobility patterns from BUPT and actual content transmission data from Youku show that the proposed algorithm can yield 40% and 61% gains, respectively, in terms of the average transmit power and the percentage of the users with satisfied QoE compared to a benchmark algorithm without caching and a benchmark solution without UAVs.
Conference Paper
Classical collaborative filtering and content-based filtering methods try to learn a static recommendation model given training data. These approaches are far from ideal in highly dynamic recommendation domains such as news recommendation and computational advertisement, where the set of items and users is very fluid. In this work, we investigate an adaptive clustering technique for content recommendation based on exploration-exploitation strategies in contextual multi-armed bandit settings. Our algorithm takes into account the collaborative effects that arise due to the interaction of the users with the items, by dynamically grouping users based on the items under consideration and, at the same time, grouping items based on the similarity of the clusterings induced over the users. The resulting algorithm thus takes advantage of preference patterns in the data in a way akin to collaborative filtering methods. We provide an empirical analysis on medium-size real-world datasets, showing scalability and increased prediction performance (as measured by click-through rate) over state-of-the-art methods for clustering bandits. We also provide a regret analysis within a standard linear stochastic noise setting.
Article
The IMT 2020 requirements of 20 Gbps peak data rate and 1 millisecond latency present significant engineering challenges for the design of 5G cellular systems. Use of the millimeter wave (mmWave) bands above 10 GHz, where vast quantities of spectrum are available, is a promising 5G candidate that may be able to rise to the occasion. However, while the mmWave bands can support massive peak data rates, delivering these data rates for end-to-end services while maintaining reliability and ultra-low latency performance will require rethinking all layers of the protocol stack. This paper surveys some of the challenges and possible solutions for delivering end-to-end, reliable, ultra-low latency services in mmWave cellular systems in terms of the Medium Access Control (MAC) layer, congestion control and core network architecture.
Article
Software-Defined Networking (SDN) is an emerging paradigm that promises to change the state of affairs of current networks, by breaking vertical integration, separating the network's control logic from the underlying routers and switches, promoting (logical) centralization of network control, and introducing the ability to program the network. The separation of concerns introduced between the definition of network policies, their implementation in switching hardware, and the forwarding of traffic, is key to the desired flexibility: by breaking the network control problem into tractable pieces, SDN makes it easier to create and introduce new abstractions in networking, simplifying network management and facilitating network evolution. Today, SDN is both a hot research topic and a concept gaining wide acceptance in industry, which justifies the comprehensive survey presented in this paper. We start by introducing the motivation for SDN, explaining its main concepts and how it differs from traditional networking. Next, we present the key building blocks of an SDN infrastructure using a bottom-up, layered approach. We provide an in-depth analysis of the hardware infrastructure, southbound and northbound APIs, network virtualization layers, network operating systems, network programming languages, and management applications. We also look at cross-layer problems such as debugging and troubleshooting. In an effort to anticipate the future evolution of this new paradigm, we discuss the main ongoing research efforts and challenges of SDN. In particular, we address the design of switches and control platforms (with a focus on aspects such as resiliency, scalability, performance, security and dependability) as well as new opportunities for carrier transport networks and cloud providers. Last but not least, we analyze the position of SDN as a key enabler of a software-defined environment.
Article
Network architectures such as Software-Defined Networks (SDNs) move the control logic off packet processing devices and onto external controllers. These network architectures with decoupled control planes open many unanswered questions regarding reliability, scalability, and performance when compared to more traditional purely distributed systems. This paper opens the investigation by focusing on two specific questions: given a topology, how many controllers are needed, and where should they go? To answer these questions, we examine fundamental limits to control plane propagation latency on an upcoming Internet2 production deployment, then expand our scope to over 100 publicly available WAN topologies. As expected, the answers depend on the topology. More surprisingly, one controller location is often sufficient to meet existing reaction-time requirements (though certainly not fault tolerance requirements).
Conference Paper
OpenFlow is a great concept, but its original design imposes excessive overheads. It can simplify network and traffic management in enterprise and data center environments, because it enables flow-level control over Ethernet switching and provides global visibility of the flows in the network. However, such fine-grained control and visibility comes with costs: the switch-implementation costs of involving the switch's control-plane too often and the distributed-system costs of involving the OpenFlow controller too frequently, both on flow setups and especially for statistics-gathering. In this paper, we analyze these overheads, and show that OpenFlow's current design cannot meet the needs of high-performance networks. We design and evaluate DevoFlow, a modification of the OpenFlow model which gently breaks the coupling between control and global visibility, in a way that maintains a useful amount of visibility without imposing unnecessary costs. We evaluate DevoFlow through simulations, and find that it can load-balance data center traffic as well as fine-grained solutions, without as much overhead: DevoFlow uses 10 to 53 times fewer flow table entries at an average switch, and uses 10 to 42 times fewer control messages.
Conference Paper
Computer networks lack a general control paradigm, as traditional networks do not provide any network-wide management abstractions. As a result, each new function (such as routing) must provide its own state distribution, element discovery, and failure recovery mechanisms. We believe this lack of a common control platform has significantly hindered the development of flexible, reliable and feature-rich network control planes. To address this, we present Onix, a platform on top of which a network control plane can be implemented as a distributed system. Control planes written within Onix operate on a global view of the network, and use basic state distribution primitives provided by the platform. Thus Onix provides a general API for control plane implementations, while allowing them to make their own trade-offs among consistency, durability, and scalability.
Article
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e., the search for a balance between exploring the environment to find profitable actions and taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss due to the fact that the globally optimal policy is not followed at all times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
Article
In recent years, Software Defined Networking (SDN) has emerged as a pivotal element not only in data centers and wide-area networks, but also in next-generation networking architectures such as vehicular ad hoc networks (VANETs) and 5G. SDN is characterized by decoupled data and control planes, and a logically centralized control plane. The centralized control plane in SDN offers several opportunities as well as challenges. A key design choice of the SDN control plane is the placement of the controller(s), which impacts a wide range of network issues ranging from latency to resiliency, from energy efficiency to load balancing, and so on. In this paper, we present a comprehensive survey on the controller placement problem (CPP) in SDN. We introduce the CPP in SDN and highlight its significance. We present the classical CPP formulation along with its supporting system model. We also discuss a wide range of CPP modeling choices and associated metrics. We classify the CPP literature based on the objectives and methodologies. Apart from the primary use cases of the CPP in data-center networks and wide area networks, we also examine the recent application of the CPP in several new domains such as mobile/cellular networks, 5G, named data networks, wireless mesh networks and VANETs. We conclude our survey with a discussion of open issues and the future scope of this topic.
Conference Paper
In software-defined networking (SDN) systems, the scalability and reliability of the control plane remain major concerns. Existing solutions adopt either multi-controller designs or control devolution back to the data plane. The former requires a flexible yet efficient switch-controller association mechanism to adapt to workload changes and potential failures, while the latter demands timely decision making with low overheads. The integrated design of both is even more challenging. Meanwhile, the dramatic advancement in machine learning techniques has boosted the practice of predictive scheduling to improve the responsiveness of various systems. Nonetheless, so far little work has been conducted for SDN systems. In this paper, we study the joint problem of dynamic switch-controller association and control devolution, while investigating the benefits of predictive scheduling in SDN systems. We propose POSCAD, an efficient, online, and distributed scheme that exploits predictive future information to minimize the total system cost and the average request response time with a queueing stability guarantee. Theoretical analysis and trace-driven simulation results show that POSCAD requires only a mild amount of future information to achieve a near-optimal system cost and near-zero average request response time. Further, POSCAD remains robust against misprediction in reducing the average request response time.
Article
In recent years, with the rapid development of Internet and mobile communication technologies, the infrastructure, devices and resources in networking systems have become more complex and heterogeneous. In order to efficiently organize, manage, maintain and optimize networking systems, more intelligence needs to be deployed. However, due to the inherently distributed nature of traditional networks, machine learning techniques are hard to apply and deploy to control and operate networks. Software Defined Networking (SDN) offers new opportunities to provide intelligence inside networks. The capabilities of SDN (e.g., logically centralized control, global view of the network, software-based traffic analysis, and dynamic updating of forwarding rules) make it easier to apply machine learning techniques. In this paper, we provide a comprehensive survey of the literature on machine learning algorithms applied to SDN. First, the related works and background knowledge are introduced. Then, we present an overview of machine learning algorithms. In addition, we review how machine learning algorithms are applied in the realm of SDN, from the perspective of traffic classification, routing optimization, Quality of Service (QoS)/Quality of Experience (QoE) prediction, resource management and security. Finally, challenges and broader perspectives are discussed.
Article
Software Defined Networking shifts the control plane of forwarding devices to one or more external entities known as controllers. Determining the optimal location of controllers in the network and the assignment of switches to them is widely known as the controller placement problem. In case of controller failures, the switches are disconnected from the controller until they are reassigned to other active controllers with enough spare capacity. However, there is a significant upsurge in the worst-case latency after the reassignment due to a lack of planning for controller failures. In this paper, we propose a controller placement strategy that not only considers the reliability and capacity of controllers but also plans ahead for controller failures to avoid repeated administrative intervention, drastic increases in latency, and disconnections. It is formulated as a mixed integer linear program (MILP). The objective is to minimize the maximum, over all switches, of the sum of the latency from the switch to the nearest controller with enough capacity (first reference controller) and the latency from the first reference controller to its closest controller with enough capacity (second reference controller). We also propose a generalized model which can be used to minimize the average latency, and extend it to multiple controller failures. Furthermore, we present a simulated annealing heuristic that efficiently solves the problem on large-scale networks. The proposed formulation and heuristic are evaluated on various networks from the Internet Topology Zoo. Simulation results show that our proposed method performs better than a controller placement that does not plan ahead for failures.
Article
Software defined networking is increasingly prevalent in data center networks because it enables centralized network configuration and management. However, since switches are statically assigned to controllers and controllers are statically provisioned, traffic dynamics may cause long response times and incur high maintenance costs. To address these issues, we formulate the dynamic controller assignment problem (DCAP) as an online optimization to minimize the total cost caused by response time and maintenance on the cluster of controllers. By applying the randomized fixed horizon control framework, we decompose DCAP into a series of stable matching problems with transfers, guaranteeing a small loss in competitive ratio. Since the matching problem is NP-hard, we propose a hierarchical two-phase algorithm that integrates key concepts from both matching theory and coalitional games to solve it efficiently. Theoretical analysis proves that our algorithm converges to a near-optimal Nash stable solution within tens of iterations. Extensive simulations show that our online approach reduces total cost by about 46%, and achieves better load balancing among controllers compared with static assignment.
Conference Paper
In this paper, a QoS-aware traffic classification framework for software defined networks is proposed. Instead of identifying specific applications, as in most previous work on traffic classification, our approach classifies the network traffic into different classes according to the QoS requirements, which provide the crucial information to enable fine-grained and QoS-aware traffic engineering. The proposed framework is fully located in the network controller so that real-time, adaptive, and accurate traffic classification can be realized by exploiting the superior computation capacity, the global visibility, and the inherent programmability of the network controller. More specifically, the proposed framework jointly exploits deep packet inspection (DPI) and semi-supervised machine learning so that accurate traffic classification can be realized, while requiring minimal communications between the network controller and the SDN switches. Based on a real Internet data set, the simulation results show that the proposed classification framework can provide good performance in terms of classification accuracy and communication costs.
Conference Paper
Software-defined networks (SDNs) have been recognized as the next-generation networking paradigm that decouples the data forwarding from the centralized control. To realize the merits of dedicated QoS provisioning and fast route (re-)configuration services over the decoupled SDNs, various QoS requirements in packet delay, loss, and throughput should be supported by an efficient transportation with respect to each specific application. In this paper, QoS-aware adaptive routing (QAR) is proposed in the designed multi-layer hierarchical SDNs. Specifically, the distributed hierarchical control plane architecture is employed to minimize signaling delay in large SDNs via a three-level design of controllers, i.e., the super, domain (or master), and slave controllers. Furthermore, the QAR algorithm is proposed with the aid of reinforcement learning and a QoS-aware reward function, achieving time-efficient, adaptive, QoS-provisioning packet forwarding. Simulation results confirm that QAR outperforms the existing learning solution and provides fast convergence with QoS provisioning, facilitating practical implementations in large-scale software service-defined networks.
Conference Paper
Software-Defined Networking (SDN) allows control applications to install fine-grained forwarding policies in the underlying switches. While Ternary Content Addressable Memory (TCAM) enables fast lookups in hardware switches with flexible wildcard rule patterns, the cost and power requirements limit the number of rules the switches can support. To make matters worse, these hardware switches cannot sustain a high rate of updates to the rule table. In this paper, we show how to give applications the illusion of high-speed forwarding, large rule tables, and fast updates by combining the best of hardware and software processing. Our CacheFlow system "caches" the most popular rules in the small TCAM, while relying on software to handle the small amount of "cache miss" traffic. However, we cannot blindly apply existing cache-replacement algorithms, because of dependencies between rules with overlapping patterns. Rather than cache large chains of dependent rules, we "splice" long dependency chains to cache smaller groups of rules while preserving the semantics of the policy. Experiments with our CacheFlow prototype, on both real and synthetic workloads and policies, demonstrate that rule splicing makes effective use of limited TCAM space, while adapting quickly to changes in the policy and the traffic demands.
Conference Paper
Distributed controllers have been proposed for Software Defined Networking to address the issues of scalability and reliability that a centralized controller suffers from. One key limitation of the distributed controllers is that the mapping between a switch and a controller is statically configured, which may result in uneven load distribution among the controllers. To address this problem, we propose ElastiCon, an elastic distributed controller architecture in which the controller pool is dynamically grown or shrunk according to traffic conditions and the load is dynamically shifted across controllers. We propose a novel switch migration protocol for enabling such load shifting, which conforms with the OpenFlow standard. We also build a prototype to demonstrate the efficacy of our design.
Conference Paper
OpenFlow assumes a logically centralized controller, which ideally can be physically distributed. However, current deployments rely on a single controller which has major drawbacks including lack of scalability. We present HyperFlow, a distributed event-based control plane for OpenFlow. HyperFlow is logically centralized but physically distributed: it provides scalability while keeping the benefits of network control centralization. By passively synchronizing network-wide views of OpenFlow controllers, HyperFlow localizes decision making to individual controllers, thus minimizing the control plane response time to data plane requests. HyperFlow is resilient to network partitioning and component failures. It also enables interconnecting independently managed OpenFlow networks, an essential feature missing in current OpenFlow deployments. We have implemented HyperFlow as an application for NOX. Our implementation requires minimal changes to NOX, and allows reuse of existing NOX applications with minor modifications. Our preliminary evaluation shows that, assuming sufficient control bandwidth, to bound the window of inconsistency among controllers by a factor of the delay between the farthest controllers, the network changes must occur at a rate lower than 1000 events per second across the network.
Conference Paper
Although there is tremendous interest in designing improved networks for data centers, very little is known about the network-level traffic characteristics of data centers today. In this paper, we conduct an empirical study of the network traffic in 10 data centers belonging to three different categories, including university, enterprise campus, and cloud data centers. Our definition of cloud data centers includes not only data centers employed by large online service providers offering Internet-facing applications but also data centers used to host data-intensive (MapReduce style) applications. We collect and analyze SNMP statistics, topology and packet-level traces. We examine the range of applications deployed in these data centers and their placement, the flow-level and packet-level transmission properties of these applications, and their impact on network and link utilizations, congestion and packet drops. We describe the implications of the observed traffic patterns for data center internal traffic engineering as well as for recently proposed architectures for data center networks.