Preprint

Online VNF Chaining and Predictive Scheduling: Optimality and Trade-offs

Authors:
  • Shenzhen Institute of Artificial Intelligence and Robotics for Society
Abstract

For NFV systems, the key design space includes function chaining for network requests and resource scheduling for servers. The problem is challenging because NFV systems usually must balance multiple (often conflicting) design objectives while making real-time decisions efficiently with only limited information. Furthermore, the benefits of predictive scheduling for NFV systems remain unexplored. In this paper, we propose POSCARS, an efficient predictive and online service chaining and resource scheduling scheme that achieves tunable trade-offs among various system metrics with a queue stability guarantee. Through a careful choice of granularity in system modeling, we acquire a better understanding of the trade-offs in our design space. Through a non-trivial transformation, we decouple the complex optimization problem into a series of online sub-problems and achieve optimality with only limited information. By employing randomized load balancing techniques, we propose three variants of POSCARS that reduce the overhead of decision making. Theoretical analysis and simulations show that POSCARS and its variants require only a mild amount of future information to achieve near-optimal system cost with ultra-low request response times.
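As a rough illustration of the ideas in the abstract (lookahead prediction plus randomized dispatch), the following sketch simulates a toy multi-server system in which a dispatcher pre-serves predicted arrivals within a lookahead window and routes each request to the shorter of two randomly sampled queues. It is a minimal stand-in, not the POSCARS algorithm; the arrival rate, service rate, window sizes, and perfect-prediction assumption are all illustrative.

```python
"""Toy simulation of predictive dispatch with power-of-two-choices routing.

Not the POSCARS algorithm: rates, window sizes, and perfect prediction
inside the lookahead window are illustrative assumptions.
"""
import numpy as np

rng = np.random.default_rng(0)
N, SLOTS = 8, 10_000          # servers, simulated time slots
LAM, MU = 0.8, 1              # per-server arrival rate, per-server service per slot

def simulate(window):
    """Average total backlog with a lookahead of `window` slots."""
    queues = np.zeros(N, dtype=int)
    # predicted arrival counts for the next `window` slots (perfect prediction assumed)
    future = list(rng.poisson(LAM * N, size=window))
    total_backlog = 0
    for _ in range(SLOTS):
        # arrivals for this slot: the oldest prediction, or a fresh draw if no lookahead
        arrivals = future.pop(0) if window > 0 else rng.poisson(LAM * N)
        if window > 0:
            future.append(rng.poisson(LAM * N))
        # power-of-two-choices: each request joins the shorter of two sampled queues
        for _ in range(arrivals):
            a, b = rng.integers(0, N, size=2)
            queues[a if queues[a] <= queues[b] else b] += 1
        # each server serves up to MU requests per slot
        served = np.minimum(queues, MU)
        queues -= served
        # leftover (idle) capacity pre-serves predicted future arrivals
        idle = int(N * MU - served.sum())
        for k in range(len(future)):
            take = min(idle, future[k])
            future[k] -= take
            idle -= take
            if idle == 0:
                break
        total_backlog += queues.sum()
    return total_backlog / SLOTS

for w in (0, 1, 3, 5):
    print(f"lookahead {w}: average backlog {simulate(w):.2f}")
```

In this toy model, larger windows tend to reduce the average backlog, echoing the abstract's message that a modest amount of future information already helps.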


References
Conference Paper
Full-text available
With the evolution of network function virtualization (NFV), diverse network services can be flexibly offered as service function chains (SFCs) consisting of different virtual network functions (VNFs). However, network state and traffic typically exhibit unpredictable variations due to stochastically arriving requests with different quality of service (QoS) requirements. Thus, an adaptive online SFC deployment approach is needed to handle real-time network variations and various service requests. In this paper, we first introduce a Markov decision process (MDP) model to capture the dynamic network state transitions. To jointly minimize the operation cost of NFV providers and maximize the total throughput of requests, we propose NFVdeep, an adaptive, online, deep reinforcement learning approach to automatically deploy SFCs for requests with different QoS requirements. Specifically, we use a serialization-and-backtracking method to effectively deal with the large discrete action space. We also adopt a policy-gradient-based method to improve training efficiency and convergence to optimality. Extensive experimental results demonstrate that NFVdeep converges fast in the training process and responds rapidly to arriving requests, especially in large, frequently changing network state spaces. Consequently, NFVdeep surpasses the state-of-the-art methods with 32.59% higher accepted throughput and 33.29% lower operation cost on average.
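To make the policy-gradient ingredient concrete, here is a minimal REINFORCE sketch for a toy sequential-placement task (place a short chain of VNFs on a few servers so that load stays balanced). It is not the NFVdeep agent: there is no neural network and no serialization-and-backtracking, and the reward, sizes, and learning rate are assumptions for illustration only.

```python
"""Minimal REINFORCE sketch for sequential VNF placement on a toy problem."""
import numpy as np

rng = np.random.default_rng(0)
N_SERVERS, CHAIN_LEN, EPISODES, ALPHA = 4, 6, 3000, 0.05
theta = np.zeros((N_SERVERS, N_SERVERS))   # linear policy: logits = theta @ load_vector
baseline = 0.0                              # running-average reward baseline

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for ep in range(EPISODES):
    loads = np.zeros(N_SERVERS)             # current load per server
    grads = []
    for _ in range(CHAIN_LEN):
        probs = softmax(theta @ loads)
        a = rng.choice(N_SERVERS, p=probs)  # sample a server for the next VNF
        # gradient of log pi(a | loads) for a linear-softmax policy
        dlogits = -probs
        dlogits[a] += 1.0
        grads.append(np.outer(dlogits, loads))
        loads[a] += 1.0                     # each VNF adds one unit of load
    reward = -loads.max()                   # terminal reward: penalize imbalance
    baseline += 0.01 * (reward - baseline)
    for g in grads:                         # vanilla policy-gradient update
        theta += ALPHA * (reward - baseline) * g

# ideally close to an even split of CHAIN_LEN over N_SERVERS
print("final-episode loads per server:", loads)
```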
Conference Paper
Full-text available
In this paper, we present Metron, a Network Functions Virtualization (NFV) platform that achieves high resource utilization by jointly exploiting the underlying network and commodity servers' resources. This synergy allows Metron to: (i) offload part of the packet processing logic to the network, (ii) use smart tagging to setup and exploit the affinity of traffic classes, and (iii) use tag-based hardware dispatching to carry out the remaining packet processing at the speed of the servers' fastest cache(s), with zero inter-core communication. Metron also introduces a novel resource allocation scheme that minimizes the resource allocation overhead for large-scale NFV deployments. With commodity hardware assistance, Metron deeply inspects traffic at 40 Gbps and realizes stateful network functions at the speed of a 100 GbE network card on a single server. Metron has 2.75-6.5x better efficiency than OpenBox, a state-of-the-art NFV system, while ensuring key requirements such as elasticity, fine-grained load balancing, and flexible traffic steering.
Article
Full-text available
The Network Function Virtualization (NFV) paradigm has gained increasing interest in both academia and industry as it promises scalable and flexible network management and orchestration. In NFV networks, network services are provided as chains of different Virtual Network Functions (VNFs), which are instantiated and executed on dedicated VNF-compliant servers. The problem of composing those chains is referred to as the Service Chain Composition problem. In contrast to centralized solutions that suffer from scalability and privacy issues, in this paper we leverage non-cooperative game theory to achieve a low-complexity distributed solution to the above problem. Specifically, to account for selfish and competitive behavior of users, we formulate the service chain composition problem as an atomic weighted congestion game with unsplittable flows and player-specific cost functions. We show that the game possesses a weighted potential function and admits a Nash Equilibrium (NE). We prove that the price of anarchy (PoA) is upper-bounded, and also propose a distributed and privacy-preserving algorithm which provably converges towards a NE of the game in polynomial time. Finally, through extensive numerical results, we assess the performance of the proposed distributed solution to the service chain composition problem.
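A minimal sketch of best-response dynamics in an atomic weighted congestion game is shown below. It is a simplified stand-in for the paper's distributed, privacy-preserving algorithm: candidate chains are collapsed into single resources with assumed affine cost functions, and the cost functions are common rather than player-specific. Because affine weighted congestion games admit a weighted potential, this loop terminates at a Nash equilibrium.

```python
"""Best-response dynamics for a toy atomic weighted congestion game."""
import random

random.seed(0)
N_USERS, N_RES = 12, 4
weights = [random.randint(1, 3) for _ in range(N_USERS)]    # player-specific demand
a = [1.0, 2.0, 1.5, 0.5]                                    # fixed cost per resource
b = [0.5, 0.2, 0.3, 0.8]                                    # congestion slope per resource
choice = [random.randrange(N_RES) for _ in range(N_USERS)]  # initial strategies

def load(r):
    return sum(w for w, c in zip(weights, choice) if c == r)

def cost(user, r):
    # cost the user would experience on resource r, including its own weight
    other = load(r) - (weights[user] if choice[user] == r else 0)
    return a[r] + b[r] * (other + weights[user])

changed, rounds = True, 0
while changed:
    changed = False
    rounds += 1
    for u in range(N_USERS):
        best = min(range(N_RES), key=lambda r: cost(u, r))
        if cost(u, best) < cost(u, choice[u]) - 1e-12:   # strict improvement only
            choice[u] = best
            changed = True

print(f"converged after {rounds} round(s); resource loads:",
      [load(r) for r in range(N_RES)])
```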
Article
Full-text available
Network function virtualization (NFV) has become a new paradigm for network architectures. By migrating network functions (NFs) from dedicated hardware to virtualization platforms, NFV can effectively improve the flexibility of deploying and managing service function chains (SFCs). However, resource allocation for requested SFCs in NFV-based infrastructures is not trivial, as it mainly consists of three phases: virtual network function (VNF) chain composition, VNF forwarding graph embedding, and VNF scheduling. The decisions in these three phases can be mutually dependent, which also makes it a tough task. Therefore, a coordinated approach is studied in this paper to jointly optimize NFV resource allocation across these three phases. We apply a general cost model that considers both network costs and service performance. The coordinated NFV-RA is formulated as a mixed-integer linear program (MILP), and a heuristic algorithm (JoraNFV) is proposed to obtain a near-optimal solution. To make the coordinated NFV-RA more tractable, JoraNFV is divided into two sub-algorithms: one-hop optimal traffic scheduling and a multi-path greedy algorithm for VNF chain composition and VNF forwarding graph embedding. Lastly, extensive simulations are performed to evaluate the performance of JoraNFV; the results show that JoraNFV obtains a solution within 1.25 times the optimum with reasonable execution time, which indicates that JoraNFV can be used for online NFV planning.
Conference Paper
Full-text available
Network Functions Virtualization (NFV) is incrementally deployed by Internet Service Providers (ISPs) in their carrier networks, by means of Virtual Network Function (VNF) chains, to address customers' demands. The motivation is the increasing manageability, reliability, and performance of NFV systems and the gains in energy and space granted by virtualization, at a cost that is becoming competitive with legacy physical network function nodes. From a network optimization perspective, the routing of VNF chains across a carrier network introduces key novelties that make the VNF chain routing problem unique with respect to the state of the art: the bitrate of each demand flow can change along a VNF chain, the VNF processing latency and computing load can be functions of the demand traffic, VNFs can be shared among demands, etc. In this paper, we provide an NFV network model suitable for ISP operations. We define the generic VNF chain routing optimization problem and devise a mixed integer linear programming formulation. Through extensive simulation on realistic ISP topologies, we draw conclusions on the trade-offs achievable between legacy Traffic Engineering (TE) ISP goals and novel combined TE-NFV goals.
Conference Paper
Full-text available
An experimental setup of 32 honeypots reported 17M login attempts originating from 112 different countries and over 6000 distinct source IP addresses. Owing to their decoupled control and data planes, Software Defined Networks (SDN) can handle this increasing number of attacks by blocking malicious network connections at the switch level. However, the challenge lies in defining the set of rules on the SDN controller to block malicious network connections. Historical network attack data can be used to automatically identify and block malicious connections. A few existing open-source software tools monitor and limit the number of login attempts per source IP address, one by one. However, these solutions cannot efficiently act against a chain of attacks that comprises multiple IP addresses used by each attacker. In this paper, we propose using machine learning algorithms, trained on historical network attack data, to identify potential malicious connections and potential attack destinations. We use four widely known machine learning algorithms, C4.5, Bayesian Network (BayesNet), Decision Table (DT), and Naive Bayes, to predict the host that will be attacked based on the historical data. Experimental results show that an average prediction accuracy of 91.68% is attained using Bayesian Networks.
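The pipeline below is a rough analogue using scikit-learn stand-ins (GaussianNB and a decision tree instead of Weka-style BayesNet, C4.5, and Decision Table) on synthetic records; the features and labels are invented solely to show the train-then-predict shape of the approach.

```python
"""Sketch of predicting attack targets from historical connection records.

Synthetic data and scikit-learn stand-in classifiers only; not the paper's
dataset or its exact algorithms.
"""
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic "historical attack" records: [coarse source id, hour of day, targeted port]
X = np.column_stack([
    rng.integers(0, 50, 2000),
    rng.integers(0, 24, 2000),
    rng.choice([22, 23, 80, 443, 3389], 2000),
])
# Label: which honeypot/host was attacked (toy rule plus noise, 8 classes)
y = (X[:, 2] % 7 + X[:, 1] // 8 + rng.integers(0, 2, 2000)) % 8

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
for clf in (GaussianNB(), DecisionTreeClassifier(max_depth=8, random_state=0)):
    clf.fit(X_tr, y_tr)                       # train on historical records
    print(f"{type(clf).__name__}: test accuracy {clf.score(X_te, y_te):.2f}")
```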
Article
Full-text available
Network functions virtualization (NFV) is a new network architecture framework in which network functions that traditionally used dedicated hardware (middleboxes or network appliances) are now implemented in software running on top of general-purpose hardware such as high-volume servers. NFV emerged as an initiative from the industry (network operators, carriers, and manufacturers) to increase the deployment flexibility and integration of new network services with increased agility within operators' networks, and to obtain significant reductions in operating and capital expenditures. NFV promotes virtualizing network functions such as transcoders, firewalls, and load balancers, among others, which were previously carried out by specialized hardware devices, and migrating them to software-based appliances. One of the main challenges for the deployment of NFV is the resource allocation of demanded network services in NFV-based network infrastructures. This challenge has been called the NFV resource allocation (NFV-RA) problem. This paper presents a comprehensive state of the art of NFV-RA by introducing a novel classification of the main approaches proposed to solve it. The paper also presents the research challenges that remain subjects of future investigation in the NFV-RA realm.
Article
Full-text available
Network Function Virtualization (NFV) has drawn significant attention from both industry and academia as an important shift in telecommunication service provisioning. By decoupling Network Functions (NFs) from the physical devices on which they run, NFV has the potential to lead to significant reductions in Operating Expenses (OPEX) and Capital Expenses (CAPEX) and facilitate the deployment of new services with increased agility and faster time-to-value. The NFV paradigm is still in its infancy and there is a large spectrum of opportunities for the research community to develop new architectures, systems and applications, and to evaluate alternatives and trade-offs in developing technologies for its successful deployment. In this paper, after discussing NFV and its relationship with complementary fields of Software Defined Networking (SDN) and cloud computing, we survey the state-of-the-art in NFV, and identify promising research directions in this area. We also overview key NFV projects, standardization efforts, early implementations, use cases and commercial products.
Conference Paper
Full-text available
The virtualization and softwarization of modern computer networks enables the definition and fast deployment of novel network services called service chains: sequences of virtualized network functions (e.g., firewalls, caches, traffic optimizers) through which traffic is routed between source and destination. This paper attends to the problem of admitting and embedding a maximum number of service chains, i.e., a maximum number of source-destination pairs which are routed via a sequence of to-be-allocated, capacitated network functions. We consider an online variant of this maximum Service Chain Embedding Problem (OSCEP), where requests arrive over time in a worst-case manner. Our main contribution is a deterministic O(log L)-competitive online algorithm, under the assumption that capacities are at least logarithmic in L. We show that this is asymptotically optimal within the class of deterministic and randomized online algorithms. We also explore lower bounds for offline approximation algorithms, and prove that the offline problem is APX-hard for unit capacities and small L > 2, and even Poly-APX-hard in general, when there is no bound on L. These approximation lower bounds may be of independent interest, as they also extend to other problems such as Virtual Circuit Routing. Finally, we present an exact algorithm based on 0-1 programming, implying that the general offline SCEP is in NP and, by the above hardness results, NP-complete for constant L.
Conference Paper
Full-text available
The popularity of multimedia services offered over the Internet has increased tremendously during the last decade. The technologies used to deliver these services are evolving at a rapidly increasing pace. However, new technologies often demand updating the dedicated hardware (e.g., transcoders) that is required to deliver the services. Currently, these updates require installing the physical building blocks at different locations across the network. These manual interventions are time-consuming and extend the time to market of new and improved services, reducing their monetary benefits. To alleviate the aforementioned issues, Network Function Virtualization (NFV) was introduced, decoupling the network functions from the physical hardware and leveraging IT virtualization technology to allow running Virtual Network Functions (VNFs) on commodity hardware at data centers across the network. In this paper, we investigate how existing service chains can be mapped onto NFV-based Service Function Chains (SFCs). Furthermore, the different alternative SFCs are explored and their impact on network and data center resources (e.g., bandwidth, storage) is quantified. We propose to use these findings to cost-optimally distribute data centers across an Internet Service Provider (ISP) network.
Conference Paper
Full-text available
We explore the nature of traffic in data centers designed to support the mining of massive data sets. We instrument the servers to collect socket-level logs, with negligible performance impact. In a 1500-server operational cluster, we thus amass roughly a petabyte of measurements over two months, from which we obtain and report detailed views of traffic and congestion conditions and patterns. We further consider whether traffic matrices in the cluster might be obtained instead via tomographic inference from coarser-grained counter data.
Conference Paper
Future networks are expected to support low-latency, context-aware and user-specific services in a highly flexible and efficient manner. One approach to support emerging use cases such as virtual reality and in-network image processing is to introduce virtualized network functions (vNFs) at the edge of the network, placed in close proximity to the end users to reduce end-to-end latency, time-to-response, and unnecessary utilisation of the core network. While placement of vNFs has been studied before, it has so far mostly focused on reducing the utilisation of server resources (i.e., minimising the number of servers required in the network to run a specific set of vNFs), without taking network conditions into consideration, such as end-to-end latency, the constantly changing network dynamics, and user mobility patterns. In this paper, we first formulate the Edge vNF placement problem to allocate vNFs to a distributed edge infrastructure, minimising end-to-end latency from all users to their associated vNFs. Furthermore, we present a way to dynamically reschedule the optimal placement of vNFs based on temporal network-wide latency fluctuations using optimal stopping theory. We evaluate our dynamic scheduler over a simulated nationwide backbone network using real-world ISP latency characteristics. We show that our proposed dynamic placement scheduler minimises vNF migrations compared to other schedulers (e.g., periodic and always-on scheduling of a new placement), and offers Quality of Service guarantees by not exceeding a maximum number of latency violations that can be tolerated by certain applications.
Article
Cloud computing and network slicing are essential concepts of forthcoming 5G mobile systems. Network slices are essentially chunks of virtual computing and connectivity resources, configured and provisioned for particular services according to their characteristics and requirements. The success of cloud computing and network slicing hinges on the efficient allocation of virtual resources (e.g. VCPU, VMDISK) and the optimal placement of Virtualized Network Functions (VNFs) composing the network slices. In this context, this paper elaborates on issues that may disrupt the placement of VNFs and VMs. The paper classifies the existing solutions for VM Placement (VMP) based on their nature, whether the placement is dynamic or static, their objectives, and their metrics. The paper then proposes a classification of VNF Placement (VNFP) approaches, first, regarding the general placement and management issues of VNFs, and second, based on the target VNF type.
Article
Dynamic Service Function Chaining (SFC) is a technique that facilitates the enforcement of advanced services and differentiated traffic forwarding policies. It dynamically steers the traffic through an ordered list of Service Functions (SFs). Enabling SFC capabilities in the context of a Software Defined Networking (SDN) architecture is promising, as it takes advantage of the SDN flexibility and automation abilities to structure service chains and improve the delivery time. However, the delivery time depends also on the traffic steering techniques used by an SFC solution. This paper provides a closer look at the current SDN architectures for SFC and provides an analysis of traffic steering techniques used by the current SDN-based SFC approaches. This study presents a comprehensive analysis of these approaches using efficiency criteria. It concludes that the studied solutions are not efficient enough to be deployed in real-life networks, principally due to scalability and flexibility limitations. Accordingly, the paper identifies relevant research challenges.
Conference Paper
Network function virtualization (NFV) can significantly reduce the operation cost and speed up the deployment of network services to markets. Under NFV, a network service is composed of a chain of ordered virtual functions, which we call a "network function chain." A fundamental question is: given a number of network function chains, on which servers should we place these functions, and how should we chain them together? This is challenging due to the intricate dependency relationships among functions and the intrinsically complex nature of the optimization. In this paper, we formulate the function placement and chaining problem as an integer optimization, where each variable indicates whether a service chain can be deployed on a configuration (a possible function placement of a service chain). While this problem is generally NP-hard, our contribution is to show that it can be mapped to an exponential number of min-cost flow problems. Instead of solving all the min-cost problems, one can select a small number of the mapped min-cost problems that are likely to have low cost. To achieve this, we relax the integer problem into a fractional linear program and theoretically prove that the fractional solutions possess desirable properties: the number and the utilization of selected configurations can be upper- and lower-bounded, respectively. Based on these properties, we determine "good" configurations selected from the fractional solution, determine the mapped min-cost flow problems, and thereby develop efficient algorithms for network function placement and chaining. Via extensive simulations, we show that our algorithms significantly outperform state-of-the-art algorithms and achieve near-optimal performance.
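For readers unfamiliar with the min-cost flow building block the paper maps to, the snippet below sets up and solves one small, entirely made-up instance with networkx; it does not reproduce the paper's configuration-selection step or its mapping from service chains to flow networks.

```python
"""Solving one illustrative min-cost flow instance with networkx.

All node demands, edge capacities, and costs below are made up.
"""
import networkx as nx

G = nx.DiGraph()
# Source injects 10 units of chain traffic, sink absorbs it.
G.add_node("src", demand=-10)
G.add_node("dst", demand=10)

# Two candidate configurations (server paths) for the chain, with
# per-edge capacity and per-unit routing/processing cost.
G.add_edge("src", "cfgA", capacity=6, weight=2)
G.add_edge("src", "cfgB", capacity=8, weight=3)
G.add_edge("cfgA", "dst", capacity=6, weight=1)
G.add_edge("cfgB", "dst", capacity=8, weight=1)

flow = nx.min_cost_flow(G)          # dict of dicts: flow[u][v]
print("flow on each edge:", flow)
print("total cost:", nx.cost_of_flow(G, flow))
```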
Article
Network Functions Virtualization (NFV) has enabled operators to dynamically place and allocate resources for network services to match workload requirements. However, unbounded end-to-end (e2e) latency of Service Function Chains (SFCs) resulting from distributed Virtualized Network Function (VNF) deployments can severely degrade performance. In particular, SFC instantiations with inter-data center links can incur high e2e latencies and Service Level Agreement (SLA) violations. These latencies can trigger timeouts and protocol errors with latency-sensitive operations. Traditional solutions to reduce e2e latency involve physical deployment of service elements in close proximity. These solutions are, however, no longer viable in the NFV era. In this paper, we present our solution that bounds the e2e latency in SFCs and inter-VNF control message exchanges by creating micro-service aggregates based on the affinity between VNFs. Our system, Contain-ed, dynamically creates and manages affinity aggregates using light-weight virtualization technologies like containers, allowing them to be placed in close proximity and hence bounding the e2e latency. We have applied Contain-ed to the Clearwater IP Multimedia Subsystem and built a proof-of-concept. Our results demonstrate that, by utilizing application and protocol specific knowledge, affinity aggregates can effectively bound SFC delays and significantly reduce protocol errors and service disruptions.
Article
Forecasting is a common data science task that helps organizations with capacity planning, goal setting, and anomaly detection. Despite its importance, there are serious challenges associated with producing reliable and high quality forecasts – especially when there are a variety of time series and analysts with expertise in time series modeling are relatively rare. To address these challenges, we describe a practical approach to forecasting “at scale” that combines configurable models with analyst-in-the-loop performance analysis. We propose a modular regression model with interpretable parameters that can be intuitively adjusted by analysts with domain knowledge about the time series. We describe performance analyses to compare and evaluate forecasting procedures, and automatically flag forecasts for manual review and adjustment. Tools that help analysts to use their expertise most effectively enable reliable, practical forecasting of business time series.
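A hand-rolled, numpy-only sketch of the kind of decomposable model described above (linear trend plus one weekly Fourier pair, fit by least squares on synthetic data) is shown below; the authors' actual tool is far more featureful, so treat this purely as an illustration of the modeling idea.

```python
"""Tiny decomposable forecasting sketch: linear trend + weekly seasonality."""
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)                       # days
season = 5.0 * np.sin(2 * np.pi * t / 7)              # weekly pattern
y = 0.3 * t + season + rng.normal(0, 1.5, t.size)     # observed synthetic series

# Design matrix: intercept, trend, and one Fourier pair for the weekly cycle
X = np.column_stack([
    np.ones_like(t),
    t,
    np.sin(2 * np.pi * t / 7),
    np.cos(2 * np.pi * t / 7),
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # least-squares fit

# Forecast 14 days ahead by extending the same design matrix
t_f = np.arange(200, 214, dtype=float)
X_f = np.column_stack([np.ones_like(t_f), t_f,
                       np.sin(2 * np.pi * t_f / 7), np.cos(2 * np.pi * t_f / 7)])
forecast = X_f @ beta
print("fitted trend slope:", round(beta[1], 3))
print("14-day forecast:", np.round(forecast, 1))
```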
Conference Paper
The goals of load balancing are diverse. We may distribute load to servers in order to achieve the same utilizations or average latencies. However, these goals are not a perfect fit in virtualized or software-defined networks. First, it is more difficult to assume homogeneous server capacities. Even for two (virtualized) functions with the same nominal capacities, the capacities seen by the customer might be heterogeneous simply because they belong to different providers, are shared with others, or are located differently so that their communication costs differ. Heterogeneous server capacity blurs the aim of keeping the same utilizations. Second, the relevant latency metric in such networks is usually a (stochastic) bound rather than an average value. In this paper, we parameterize the server capacities and use the stochastic latency bound as the metric guiding load balancing. We also model the load balancing process as a Markov-modulated process and observe the influence of its parameters on achieving balance. The proposed model will benefit the implementation of load balancing functions and infrastructure design in virtualized or software-defined networks.
Article
Although network function virtualization (NFV) is a promising approach for providing elastic network functions, it faces several challenges in terms of adaptation to diverse network appliances and reduction of the capital and operational expenses of the service providers. In particular, to deploy service chains, providers must consider different objectives, such as minimizing the network latency or the operational cost, which are coupled objectives that have traditionally been addressed separately. In this paper, the problem of virtual network function (vNF) placement for service chains is studied for the purpose of energy and traffic-aware cost minimization. This problem is formulated as an optimization problem named the joint operational and network traffic cost (OPNET) problem. First, a sampling-based Markov approximation (MA) approach is proposed to solve the combinatorial NP-hard problem, OPNET. Even though the MA approach can yield a near-optimal solution, it requires a long convergence time that can hinder its practical deployment. To overcome this issue, a novel approach that combines the MA with matching theory, named SAMA, is proposed to find an efficient solution for the original problem OPNET. Simulation results show that the proposed framework can reduce the total incurred cost by up to 19% compared to the existing non-coordinated approach.
Conference Paper
This manuscript investigates the issue of implementing chains of network functions in a “softwarized” environment where edge network middle-boxes are replaced by software appliances running in virtual machines within a data center. The primary goal is to show that this approach allows space and time diversity in service chaining, with a higher degree of dynamism and flexibility with respect to conventional hardware-based architectures. The manuscript describes implementation alternatives of the virtual function chaining in a SDN scenario, showing that both layer 2 and layer 3 approaches are functionally viable. A proof-of-concept implementation with the Mininet emulation platform is then presented to provide a practical example of the feasibility and degree of complexity of such approaches.
Conference Paper
Network resource virtualization has recently emerged as the future of communication technology, and the advent of Software Defined Networking (SDN) and Network Function Virtualization (NFV) enables its realization. NFV virtualizes traditional physical middle-boxes that implement specific network functions. Since multiple network functions can be virtualized on a single server or data center, the network operator can save Capital Expenditure (CAPEX) and Operational Expenditure (OPEX) through NFV. Since each customer demands different types of VNFs with various applications, the service requirements differ across VNFs. Therefore, allocating multiple Virtual Network Functions (VNFs) to limited network resources requires efficient resource allocation. We propose an efficient resource allocation strategy for VNFs in a single server by employing a mixed queuing network model while minimizing the customers' waiting time in the system. The problem is formulated as a convex program. However, it cannot be solved exactly because of the closed queuing network calculation, so we use an approximation algorithm to solve it. Numerical results of this model show the performance metrics of the mixed queuing network. We also find that the approximation algorithm yields a near-optimal solution by comparing it with neighboring solutions.
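As a much simpler stand-in for the allocation problem described above, the sketch below applies the classical square-root capacity-assignment rule to independent M/M/1 queues under a total capacity budget; the arrival rates and budget are invented, and the paper's mixed queuing network is not modeled.

```python
"""Classical square-root capacity assignment for parallel M/M/1 queues."""
import math

lam = [4.0, 1.0, 9.0]        # arrival rate of each VNF queue (req/s), illustrative
C = 20.0                     # total service capacity to distribute (req/s)

slack = C - sum(lam)
root_sum = sum(math.sqrt(l) for l in lam)
# Square-root rule: each queue gets its own load plus a share of the spare
# capacity proportional to the square root of its arrival rate.
mu = [l + slack * math.sqrt(l) / root_sum for l in lam]

def avg_delay(mu_alloc):
    # traffic-weighted mean M/M/1 sojourn time, 1 / (mu - lambda) per queue
    return sum(l / (m - l) for l, m in zip(lam, mu_alloc)) / sum(lam)

proportional = [C * l / sum(lam) for l in lam]   # naive split for comparison
print("square-root allocation:", [round(m, 2) for m in mu])
print("avg delay (square-root): ", round(avg_delay(mu), 3))
print("avg delay (proportional):", round(avg_delay(proportional), 3))
```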
Article
Diverse proprietary network appliances increase both the capital and operational expenses of service providers, meanwhile causing problems of network ossification. Network function virtualization (NFV) is proposed to address these issues by implementing network functions as pure software on commodity, general-purpose hardware. NFV allows flexible provisioning, deployment, and centralized management of virtual network functions. Integrated with SDN, the software-defined NFV architecture further offers agile traffic steering and joint optimization of network functions and resources. This architecture benefits a wide range of applications (e.g., service chaining) and is becoming the dominant form of NFV. In this survey, we present a thorough investigation of the development of NFV under the software-defined NFV architecture, with an emphasis on service chaining as its application. We first introduce the software-defined NFV architecture as the state of the art of NFV and present the relationship between NFV and SDN. Then, we provide a historic view of the evolution from middleboxes to NFV. Finally, we introduce significant challenges and relevant solutions of NFV, and discuss its future research directions in different application domains.
Article
In online service systems, the delay experienced by a user from the service request to the service completion is one of the most critical performance metrics. To improve user delay experience, recent industrial practice suggests a modern system design mechanism: proactive serving, where the system predicts future user requests and allocates its capacity to serve these upcoming requests proactively. In this paper, we investigate the fundamentals of proactive serving from a theoretical perspective. In particular, we show that proactive serving decreases average delay exponentially (as a function of the prediction window size). Our results provide theoretical foundations for proactive serving and shed light on its application in practical systems.
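The delay benefit of proactive serving can be seen in a toy single-server simulation: with a lookahead window, idle slots pre-serve predicted requests, which then never queue. The bursty arrival process, unit service capacity, and perfect-prediction assumption below are illustrative, not the paper's analytical model.

```python
"""Toy single-server simulation of proactive serving with a lookahead window."""
from collections import deque
import random

random.seed(0)
SLOTS = 50_000

def draw():
    # bursty arrivals: 2 requests w.p. 0.45, else none (mean load 0.9 per slot)
    return 2 if random.random() < 0.45 else 0

def avg_delay(w):
    future = deque(draw() for _ in range(w))   # predicted arrivals, next w slots
    pending = deque()                          # arrival slot of each waiting request
    total_delay = served = 0
    for t in range(SLOTS):
        arrivals = future.popleft() if w > 0 else draw()
        if w > 0:
            future.append(draw())
        for _ in range(arrivals):
            pending.append(t)
        # unit service capacity: serve a waiting request, otherwise pre-serve a
        # predicted one from the lookahead window (it then never queues)
        if pending:
            total_delay += t - pending.popleft()
            served += 1
        else:
            for k in range(len(future)):
                if future[k]:
                    future[k] -= 1
                    served += 1                # pre-served request: zero delay
                    break
    return total_delay / max(served, 1)

for w in (0, 1, 2, 5, 10):
    print(f"prediction window {w:2d}: average delay {avg_delay(w):.2f} slots")
```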
Article
Network function virtualization was recently proposed to improve the flexibility of network service provisioning and reduce the time to market of new services. By leveraging virtualization technologies and commercial off-the-shelf programmable hardware, such as general-purpose servers, storage, and switches, NFV decouples the software implementation of network functions from the underlying hardware. As an emerging technology, NFV brings several challenges to network operators, such as the guarantee of network performance for virtual appliances, their dynamic instantiation and migration, and their efficient placement. In this article, we provide a brief overview of NFV, explain its requirements and architectural framework, present several use cases, and discuss the challenges and future directions in this burgeoning research area.
Conference Paper
Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an ideal scheduler.
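In the spirit of Sparrow's decentralized sampling (though omitting late binding and the distributed schedulers), the sketch below compares per-task probing with batch probing of workers on a synthetic workload; worker counts, job sizes, and load are assumptions.

```python
"""Toy comparison of per-task vs. batch sampling of worker queues."""
import numpy as np

rng = np.random.default_rng(0)
WORKERS, SLOTS, TASKS_PER_JOB, D = 100, 2_000, 10, 2
JOB_RATE = 0.9 * WORKERS / TASKS_PER_JOB      # ~90% utilization

def simulate(batch):
    q = np.zeros(WORKERS, dtype=int)
    samples = []
    for _ in range(SLOTS):
        for _ in range(rng.poisson(JOB_RATE)):
            if batch:
                # probe D*m workers once, put the m tasks on the least loaded
                probes = rng.choice(WORKERS, D * TASKS_PER_JOB, replace=False)
                chosen = probes[np.argsort(q[probes])[:TASKS_PER_JOB]]
                q[chosen] += 1
            else:
                # probe D workers independently for every task
                for _ in range(TASKS_PER_JOB):
                    probes = rng.choice(WORKERS, D, replace=False)
                    q[probes[np.argmin(q[probes])]] += 1
        q = np.maximum(q - 1, 0)              # each worker finishes one task per slot
        samples.append(q.max())
    return np.mean(samples[SLOTS // 2:])      # mean of the longest queue (tail proxy)

print("per-task sampling, mean max queue:", round(simulate(False), 2))
print("batch sampling,    mean max queue:", round(simulate(True), 2))
```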
Article
Motivated by the increasing popularity of learning and predicting human user behavior in communication and computing systems, in this paper we investigate the fundamental benefit of predictive scheduling, i.e., predicting and pre-serving arrivals, in controlled queueing systems. Based on a lookahead window prediction model, we first establish a novel equivalence between the predictive queueing system with a fully-efficient scheduling scheme and an equivalent queueing system without prediction. This connection allows us to analytically demonstrate that predictive scheduling necessarily improves system delay performance and can drive it to zero with increasing prediction power. We then propose the Predictive Backpressure (PBP) algorithm for achieving optimal utility performance in such predictive systems. PBP efficiently incorporates prediction into stochastic system control and avoids the great complication due to the exponential state space growth in the prediction window size. We show that PBP can achieve a utility performance that is within O(ε) of the optimal, for any ε > 0, while guaranteeing that the system delay distribution is a shifted-to-the-left version of that under the original Backpressure algorithm. Hence, the average packet delay under PBP is strictly better than that under Backpressure, and vanishes with increasing prediction window size. This implies that the resulting utility-delay tradeoff with predictive scheduling beats the known optimal [O(ε), O(log(1/ε))] tradeoff for systems without prediction.
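The differential-backlog rule that PBP builds on can be shown in a few lines: a two-hop tandem queue with one transmission opportunity per slot, where the scheduler activates the link with the largest backlog differential. The predictive (shifted) part of PBP is not reproduced, and the arrival rate is an assumption.

```python
"""Minimal backpressure (max-weight) sketch for a two-hop tandem queue."""
import random

random.seed(0)
SLOTS = 50_000
q1 = q2 = 0                 # backlog at hop 1 and hop 2
total_backlog = 0

for _ in range(SLOTS):
    q1 += 1 if random.random() < 0.45 else 0      # exogenous arrivals to hop 1
    # one transmission opportunity per slot: activate the link with the
    # largest differential backlog (queue minus downstream queue)
    w1, w2 = q1 - q2, q2
    if w1 >= w2 and q1 > 0:
        q1 -= 1             # move a packet from hop 1 to hop 2
        q2 += 1
    elif q2 > 0:
        q2 -= 1             # deliver a packet out of hop 2
    total_backlog += q1 + q2

print("average total backlog:", round(total_backlog / SLOTS, 2))
```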
Conference Paper
Although there is tremendous interest in designing improved networks for data centers, very little is known about the network-level traffic characteristics of data centers today. In this paper, we conduct an empirical study of the network traffic in 10 data centers belonging to three different categories, including university, enterprise campus, and cloud data centers. Our definition of cloud data centers includes not only data centers employed by large online service providers offering Internet-facing applications but also data centers used to host data-intensive (MapReduce-style) applications. We collect and analyze SNMP statistics, topology, and packet-level traces. We examine the range of applications deployed in these data centers and their placement, the flow-level and packet-level transmission properties of these applications, and their impact on network and link utilizations, congestion, and packet drops. We describe the implications of the observed traffic patterns for data center internal traffic engineering as well as for recently proposed architectures for data center networks.
Conference Paper
Today's data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Non-uniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.
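The k-ary fat-tree described above has well-known closed-form counts (k^3/4 hosts, (k/2)^2 core switches, 5k^2/4 switches in total, all with k ports); the small helper below simply evaluates them for a few values of k.

```python
"""Standard k-ary fat-tree arithmetic (illustrative values of k)."""
def fat_tree(k):
    assert k % 2 == 0, "k must be even"
    edge = agg = k * (k // 2)        # k pods, each with k/2 edge and k/2 aggregation switches
    core = (k // 2) ** 2
    hosts = k * (k // 2) * (k // 2)  # k^3 / 4 hosts, each on an edge-switch port
    return {"hosts": hosts, "edge": edge, "aggregation": agg, "core": core,
            "switches": edge + agg + core}

for k in (4, 8, 48):
    print(f"k={k:2d}:", fat_tree(k))
```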