Conference Paper

An automatic machine scaling solution for cloud systems

Authors:

Abstract

The Infrastructure as a Service (IaaS) paradigm allows service providers and/or end-users to outsource the computing resources they need. Cloud providers offer the infrastructure as a utility with a pay-per-use model, relying on virtualization technologies to provide hardware resources to the cloud consumers. IaaS can potentially bring obvious advantages, but the virtualized resources tend to be manually controlled by the consumers, and this may lead to unaffordable administration costs. The demand for computing resources can fluctuate over time, and managing the virtual machines “rented” from the cloud provider to meet peak requirements while avoiding overprovisioning is a significant challenge. This paper presents AMAS (Automatic MAchine Scaling), a distributed solution capable of automatically creating and releasing virtual machines in order to minimize the number of virtual resources instantiated to run an application while meeting the consumer's performance requirements. Furthermore, the complete design and a first real implementation of this solution are described to validate that it is capable of handling sudden load changes, maintaining the desired quality of service, minimizing the number of virtual machines and significantly reducing the consumer's management effort.
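The abstract does not specify AMAS's actual scaling policy, but the core decision an automatic scaler repeats can be illustrated with a minimal threshold-based sketch in Python (the utilization thresholds and the one-VM-at-a-time step are hypothetical assumptions, not details from the paper):

```python
def scaling_decision(avg_utilization, n_vms, min_vms=1,
                     scale_up_at=0.80, scale_down_at=0.30):
    """Return the new VM count for an observed average utilization.

    Scale out when the pool runs hot, scale in when it is mostly idle
    (never below min_vms), and otherwise keep the current size.
    """
    if avg_utilization > scale_up_at:
        return n_vms + 1
    if avg_utilization < scale_down_at and n_vms > min_vms:
        return n_vms - 1
    return n_vms
```

A real controller such as AMAS would run this kind of decision periodically per application, feeding it monitored load and quality-of-service metrics rather than a single utilization number.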


Conference Paper
Earlier virtual machine (VM) migration techniques consisted of stop-and-copy: the VM was stopped, its address space was copied to a different physical machine, and the VM was restarted at that machine. Recent VM hypervisors support live VM migration, which allows pages to be copied while the VM is running. If any copied page is dirtied (i.e., modified), it has to be copied again. The process stops when a fraction α of the pages need to be copied. Then, the VM is stopped and the remaining pages are copied. This paper derives a model to compute the downtime, total number of pages copied, and network utilization due to VM migration, as a function of α and other parameters under uniform and non-uniform dirtying rates. The paper also presents a non-linear optimization model to find the value of α that minimizes the downtime subject to network utilization constraints.
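The pre-copy process described above is easy to simulate. The toy Python model below (a simplification under a uniform dirtying-rate assumption; the paper's analytical model is more general) iterates copy rounds until the dirty set drops below a fraction alpha of memory, then charges the remainder to downtime:

```python
def precopy_migration(total_pages, dirty_rate, bandwidth, alpha,
                      max_rounds=30):
    """Simulate iterative pre-copy live migration.

    dirty_rate and bandwidth are in pages per second.  Each round copies
    the pages dirtied during the previous round; iteration stops once
    the dirty set falls below alpha * total_pages, after which the VM is
    paused and the remainder is copied.  Returns (downtime_seconds,
    total_pages_copied).
    """
    to_copy = float(total_pages)
    copied = 0.0
    for _ in range(max_rounds):
        if to_copy <= alpha * total_pages:
            break
        copied += to_copy
        # pages dirtied while this round was being transferred
        to_copy = min(float(total_pages), dirty_rate * (to_copy / bandwidth))
    downtime = to_copy / bandwidth  # final stop-and-copy phase
    return downtime, copied + to_copy
```

When the dirtying rate is well below the copy bandwidth the dirty set shrinks geometrically each round; as alpha grows, downtime rises but total pages copied (network utilization) falls, which is exactly the trade-off the paper's optimization model navigates.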
Article
Provisioning of multi-tier applications in cloud environments raises new challenges not addressed by prior work on provisioning single-tier applications, on dynamic load balancing or on resource allocation in other types of distributed systems. Flexible and general automatic mechanisms are needed to determine how many virtual resources need to be allocated to each tier of the application, minimizing resource consumption while meeting the service level agreement. Both the research community and the main cloud providers are proposing solutions of this kind, but most of them are application-specific, provider-specific, centralized and focused only on batch applications. This paper presents an automatic provisioning solution for multi-tier applications called AutoMAP. The proposed mechanism is general (application and provider independent); it can be implemented with different architectures, from centralized to distributed, and can even be provided as a service; and it is able to deal with both batch and interactive applications, allowing horizontal and vertical scaling (based on replication and resizing, respectively). A first prototype of AutoMAP has been implemented to demonstrate its efficiency with experimental results using a widely used benchmark, RUBiS, on a real cloud architecture.
Conference Paper
Blocking is the phenomenon where a service request is momentarily stopped, but not lost, until the service becomes available again. Despite its importance, blocking is a difficult phenomenon to model analytically, because it creates strong inter-dependencies among the system's components. Mean Value Analysis (MVA) is one of the most appealing evaluation methodologies because of its low computational cost and ease of use. In this paper, an approximate MVA for Blocking After Service is presented that greatly outperforms previous results. The new algorithm is obtained by analyzing the inter-dependencies due to the blocking mechanism and by modifying the MVA equations accordingly. The proposed algorithm is tested and then applied to a capacity planning and admission control study of a web server system.
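For context, the recursion being approximated is the classic exact MVA for closed product-form networks, sketched below without blocking (the paper's contribution is precisely the modification of these equations to capture Blocking After Service dependencies):

```python
def mva(service_demands, n_customers):
    """Exact MVA for a closed product-form queueing network.

    service_demands[k] is the mean service demand at station k.
    Returns (throughput, per-station mean queue lengths) for
    n_customers >= 1 circulating customers.
    """
    K = len(service_demands)
    q = [0.0] * K  # mean queue lengths with 0 customers
    for n in range(1, n_customers + 1):
        # residence times via the arrival theorem
        r = [service_demands[k] * (1 + q[k]) for k in range(K)]
        x = n / sum(r)  # system throughput by Little's law
        q = [x * r[k] for k in range(K)]
    return x, q
```

The cost is O(K·N) in the number of stations and customers, which is why MVA-style algorithms are attractive for capacity planning studies like the one in the paper.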
Article
Full-text available
The growth in computer and networking technologies over the past decades has produced a new type of collaborative computing environment called the Grid network. A Grid is a parallel and distributed computing network system that can achieve higher throughput by taking advantage of the many computing resources available in the network. Therefore, to achieve a scalable and reliable Grid network system, the load needs to be efficiently distributed among the resources accessible on the network. In this paper, we present a distributed and scalable load balancing framework for Grid networks using biased random sampling. The generated network system is self-organized and depends only on local information for load distribution and resource discovery. We demonstrate that introducing a geographic awareness factor into the random walk sampling can reduce the effects of communication latency in the Grid network environment. Simulation results show that the generated network system provides an effective, scalable, and reliable load balancing scheme for the distributed resources available on Grid networks.
Conference Paper
Full-text available
Scalability is critical to the success of many enterprises currently involved in doing business on the Web and in providing information that may vary drastically from one time to another. Maintaining sufficient resources just to meet peak requirements can be costly. Cloud computing provides a powerful computing model that allows users to access resources on demand. In this paper, we describe a novel architecture for the dynamic scaling of Web applications based on thresholds in a virtualized cloud computing environment. We illustrate our scaling approach with a front-end load-balancer for routing and balancing user requests to Web applications deployed on Web servers installed in virtual machine instances. A dynamic scaling algorithm for automated provisioning of virtual machine resources based on a threshold number of active sessions is introduced. The on-demand capability of the cloud to rapidly provision and dynamically allocate resources to users is discussed. Our work has demonstrated the compelling benefits of the cloud, which is capable of handling sudden load surges, delivering IT resources on demand to users, and maintaining higher resource utilization, thus reducing infrastructure and management costs.
Conference Paper
Full-text available
Server consolidation based on virtualization is an important technique for improving power efficiency and resource utilization in cloud infrastructures. However, to ensure satisfactory performance on shared resources under changing application workloads, dynamic management of the resource pool via online adaptation is critical. The inherent tradeoffs between power and performance as well as between the cost of an adaptation and its benefits make such management challenging. In this paper, we present Mistral, a holistic controller framework that optimizes power consumption, performance benefits, and the transient costs incurred by various adaptations and the controller itself to maximize overall utility. Mistral can handle multiple distributed applications and large-scale infrastructures through a multi-level adaptation hierarchy and scalable optimization algorithm. We show that our approach outstrips other strategies that address the tradeoff between only two of the objectives (power, performance, and transient costs).
Conference Paper
Full-text available
Dynamic provisioning is a useful technique for handling virtualized multi-tier applications in cloud environments. Understanding the performance of virtualized multi-tier applications is crucial for efficient cloud infrastructure management. In this paper, we present a novel dynamic provisioning technique for cluster-based virtualized multi-tier applications that employs a flexible hybrid queueing model to determine the number of virtual machines at each tier of a virtualized application. We present a virtual-machine-based cloud data center model to optimize resource provisioning. Using simulation experiments with a three-tier application, we adopt an optimization model to minimize the total number of virtual machines while satisfying the customer average response time constraint and the request arrival rate constraint. Our experiments show that cloud data center resources can be allocated accurately with these techniques, and that the extra cost can be effectively reduced.
Conference Paper
Full-text available
Although cloud computing has gained sufficient popularity recently, there are still some key impediments to enterprise adoption. Cloud management is one of the top challenges. The ability to partition hardware resources on the fly into virtual machine (VM) instances provides an elastic computing environment to users. But the extra layer of resource virtualization poses challenges for effective cloud management. Time-varying user demand, the complicated interplay between co-hosted VMs and the arbitrary deployment of multi-tier applications make it difficult for administrators to plan good VM configurations. In this paper, we propose a distributed learning mechanism that facilitates self-adaptive virtual machine resource provisioning. We treat cloud resource allocation as a distributed learning task, in which each VM, as a highly autonomous agent, submits resource requests according to its own benefit. The mechanism evaluates the requests and replies with feedback. We develop a reinforcement learning algorithm with a highly efficient representation of experiences as the heart of the VM-side learning engine. We prototype the mechanism and the distributed learning algorithm in an iBalloon system. Experimental results on a Xen-based cloud test bed demonstrate the effectiveness of iBalloon. The distributed VM agents are able to reach near-optimal configuration decisions in 7 iteration steps at no more than 5% performance cost. Most importantly, iBalloon shows good scalability on resource allocation by scaling to 128 correlated VMs.
Article
Full-text available
In this paper, we investigate the global self-aggregation dynamics arising from local decision-based rewiring of an overlay network, used as an abstraction for an autonomic service-oriented architecture. We measure the ability of a selected set of local rules to foster self-organization of what is originally a random graph into a structured network. Scalability issues with respect to the key parameters of system size and diversity are extensively discussed. Conflicting goals are introduced, in the form of a population of nodes actively seeking to acquire neighbours of a type different from their own, resulting in decreased local homogeneity. We show that a ‘secondary’ self-organization process ensues, whereby nodes spontaneously cluster according to their implicit objective. Finally, we introduce dynamic goals by making the preferred neighbour type a function of the local characteristics of a simulated workload. We demonstrate that in this context, an overlay rewiring process based purely on local decisions and interactions can result in efficient load-balancing without central planning. We conclude by discussing the implications of our findings for the design of future distributed applications, the likely influence of other factors and of extreme parameter values on the ability of the system to self-organize and the potential improvements to our framework.
Conference Paper
Full-text available
This paper addresses the problem of hosting multiple applications on a provider's virtualized multi-tier infrastructure. Building from a previous model, we design a new self-adaptive capacity management framework, which combines a two-level SLA-driven pricing model, an optimization model and an analytical queuing-based performance model to maximize the provider's business objective. Our main contributions are the more accurate multi-queue performance model, which captures application specific bottlenecks and the parallelism inherent to multi-tier platforms, as well as the solution of the extended and much more complex optimization model. Our approach is evaluated via simulation with synthetic as well as realistic workloads, in various scenarios. The results show that our solution is significantly more cost-effective, in terms of the provider's achieved revenues, than the approach it is built upon, which uses a single-resource performance model. It also significantly outperforms a multi-tier static allocation strategy for heavy and unbalanced workloads. Finally, preliminary experiments assess the applicability of our framework to virtualized environments subjected to capacity variations caused by the processing of management and security-related tasks.
Article
Cloud systems provide the computing infrastructure and on-demand capacity required to host services. In this paper we present a new provisioning mechanism for cloud systems. Our project addresses the key requirements of managing resources at the infrastructure level. The proposed resource manager allocates virtual resources in a flexible way, taking into account time, costs and physical resources. It provides on-demand elasticity, one of the most important features of cloud computing compared to traditional hosting strategies. New resources can be allocated dynamically, on demand or policy-based. This is one of the novel contributions of this paper, as the majority of cloud resource managers provide just static allocation. For resource scheduling we have developed a mechanism that takes into account virtual machines' capabilities and well-defined policies, and it uses a genetic algorithm. We use a QoS-constraint-based algorithm in order to maximize performance according to different defined policies. To demonstrate the performance of the presented resource management mechanism we provide and interpret several experimental results.
Conference Paper
Cloud computing is a versatile technology that can support a broad spectrum of applications. The low cost of cloud computing and its dynamic scaling render it an innovation driver for small companies, particularly in the developing world. Cloud-deployed enterprise resource planning (ERP), supply chain management (SCM), customer relationship management (CRM), medical and mobile applications have the potential to reach millions of users. Cloud-deployed applications that employ mobile devices as end-points are particularly exciting due to the high penetration of mobile devices in countries like China, South Africa and India. With the opportunities in cloud computing being greater than at any other time in history, we had to pause and reflect on our own experiences with cloud computing - both as producers and consumers of that technology. Our interests and attitudes toward cloud technology differ considerably for each side of the cloud-computing topic. As producers of cloud-like infrastructure, much of our interest was in the technology itself. We experimented with algorithms for managing remote program invocation, fault tolerance, dynamic load balancing, proactive resource management and meaningful distributed application monitoring. As consumers of cloud computing, however, our focus switched from interesting technology to usability, simplicity, reliability and guaranteed rock-solid data stability. With an eye to the many cloud articles in the recent news, we have to ask: is cloud computing ready for prime time? After reviewing stories about current cloud deployments, we conclude that cloud computing is not yet ready for general use; many significant cloud service failures have been reported and several important issues remain unaddressed. Furthermore, besides the failures and gaps in the current cloud offerings, there is an inherent flaw in the model itself.
Today, the cloud represents an opportunity for a client to outsource hardware/software function or program computing cycles. The missing piece is responsibility outsourcing - today something found only in IT Outsourcing contracts. This missing piece represents an essential component of a cloud offering. Without it, cloud consumers are left without any real reassurances that their data is safe from failures, catastrophe or court-ordered search and seizure. In this paper, we explore the different viewpoints of cloud computing. Leveraging our experiences on both sides of clouds, we examine clouds from a technology aspect, a service aspect and a responsibility aspect. We highlight some of the opportunities in cloud computing, underlining the importance of clouds and showing why that technology must succeed. Finally, we propose some usability changes for cloud computing that we feel are needed to make clouds ready for prime time.
Conference Paper
Cloud providers can offer cloud consumers two plans to provision resources, namely reservation and on-demand plans. With the reservation plan, the consumer can reduce the total resource provisioning cost. However, this resource provisioning is challenging due to uncertainty. For example, consumers' demand and providers' resource prices can fluctuate. Moreover, inefficient resource provisioning leads to either an overprovisioning or an underprovisioning problem. In this paper, we propose a robust cloud resource provisioning (RCRP) algorithm to minimize the total resource provisioning cost (i.e., overprovisioning and underprovisioning costs). Various types of uncertainty are considered in the algorithm. To obtain the optimal solution, a robust optimization model is formulated and solved. Extensive numerical studies show that the solution obtained from the RCRP algorithm achieves both solution- and model-robustness. That is, the total resource provisioning cost is close to optimal (i.e., solution-robustness), and the overprovisioning and underprovisioning costs are significantly reduced (i.e., model-robustness).
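The over/underprovisioning trade-off can be made concrete with a brute-force toy model (a hypothetical illustration only; the paper's robust optimization handles price and demand uncertainty far more carefully):

```python
def provisioning_cost(reserved, demand, on_demand_price, reserved_price):
    """Cost of serving `demand` units with `reserved` prepaid instances;
    any shortfall is covered at the (higher) on-demand price."""
    shortfall = max(0, demand - reserved)
    return reserved * reserved_price + shortfall * on_demand_price

def best_reservation(demand_scenarios, on_demand_price, reserved_price):
    """Reservation level minimizing total cost over demand scenarios
    (a brute-force stand-in for the paper's optimization model)."""
    candidates = range(max(demand_scenarios) + 1)
    return min(candidates, key=lambda r: sum(
        provisioning_cost(r, d, on_demand_price, reserved_price)
        for d in demand_scenarios))
```

Reserving too much pays for idle instances (overprovisioning); reserving too little pays on-demand premiums (underprovisioning); the best reservation balances the two across plausible demands.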
Conference Paper
This paper explores autonomic approaches for optimizing provisioning for heterogeneous workloads on enterprise grids and clouds. Specifically, it presents a decentralized, robust online clustering approach that addresses the distributed nature of these environments and can be used to detect patterns and trends, and to use this information to optimize the provisioning of virtual machine (VM) resources. It then presents a model-based approach for estimating application service time using long-term application performance monitoring, to provide feedback about the appropriateness of requested resources as well as the system's ability to meet QoS constraints and SLAs. Specifically for high-performance computing workloads, the use of a quadratic response surface model (QRSM) is justified with respect to traditional models, demonstrating the need for application-specific modeling. The proposed approaches are evaluated using a real computing center workload trace, and the results demonstrate both their effectiveness and cost-efficiency.
Article
A cluster-based server consists of a front-end dispatcher and multiple back-end servers. The dispatcher receives incoming jobs and then decides how to assign them to back-end servers, which in turn serve the jobs according to some discipline. Cluster-based servers have been widely deployed, as they combine good performance with low costs. Several assignment policies have been proposed for cluster-based servers, most of which aim to balance the load among back-end servers. There are two main strategies for load balancing: the first aims to balance the amount of workload at back-end servers, while the second aims to balance the number of jobs assigned to back-end servers. Examples of policies using these strategies are Dynamic and LC (Least Connected), respectively. In this paper we propose a policy, called LC*, which combines the two aforementioned strategies. The paper shows experimentally that when preemption is permitted (i.e., when jobs execute concurrently on back-end servers), LC* substantially outperforms both Dynamic and LC in terms of response-time metrics. This improved performance is achieved by using only information readily available to the dispatcher, rendering LC* a practical policy to implement. Finally, we study a refinement, called ALC* (Adaptive LC*), which further improves on the response-time performance of LC* by adapting its actions to incoming traffic rates.
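The LC half of the combination is simple to sketch (a hypothetical illustration; LC* additionally folds in the workload information that plain LC ignores):

```python
def least_connected(connections):
    """Index of the back-end server with the fewest active jobs."""
    return min(range(len(connections)), key=lambda i: connections[i])

def dispatch(n_jobs, n_servers):
    """Assign each arriving job to the least-connected server,
    assuming (unrealistically) that no job completes meanwhile."""
    connections = [0] * n_servers
    assignment = []
    for _ in range(n_jobs):
        i = least_connected(connections)
        connections[i] += 1
        assignment.append(i)
    return assignment
```

In a real dispatcher, connection counts fall as jobs complete, so LC tracks instantaneous load; LC* improves on it by also weighing how much work each open connection represents.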
Article
With the significant advances in Information and Communications Technology (ICT) over the last half century, there is an increasingly perceived vision that computing will one day be the 5th utility (after water, electricity, gas, and telephony). This computing utility, like the other four existing utilities, will provide the basic level of computing service that is considered essential to meet the everyday needs of the general community. To deliver this vision, a number of computing paradigms have been proposed, of which the latest one is known as Cloud computing. Hence, in this paper, we define Cloud computing and provide the architecture for creating Clouds with market-oriented resource allocation by leveraging technologies such as Virtual Machines (VMs). We also provide insights on market-based resource management strategies that encompass both customer-driven service management and computational risk management to sustain Service Level Agreement (SLA)-oriented resource allocation. In addition, we reveal our early thoughts on interconnecting Clouds for dynamically creating global Cloud exchanges and markets. Then, we present some representative Cloud platforms, especially those developed in industries, along with our current work towards realizing market-oriented resource allocation of Clouds as realized in Aneka enterprise Cloud technology. Furthermore, we highlight the difference between High Performance Computing (HPC) workload and Internet-based services workload. We also describe a meta-negotiation infrastructure to establish global Cloud exchanges and markets, and illustrate a case study of harnessing ‘Storage Clouds’ for high performance content delivery. Finally, we conclude with the need for convergence of competing IT paradigms to deliver our 21st century vision.
Conference Paper
The adoption of virtualization technologies and the Cloud compute model by Web service providers is accelerating. These technologies, commonly known as the Cloud compute model, are built upon an efficient and reliable dynamic resource allocation system. Maintaining sufficient resources to meet peak workloads while minimizing cost determines to a large extent the profitability of a Cloud service provider. The traditional centralized approach to resource provisioning, with global optimization and statistical strategies, can be complex, difficult to scale, computationally intensive and often non-traceable, which adds to the cost of Cloud operation and reduces its efficiency, especially in industrial environments. As we have learned in real life, the most efficient economic system is the one that provides individuals with incentives for their own decisions. The same is true for computing systems. In this paper, we present an architecture for dynamic resource provisioning via distributed decisions. We illustrate our approach with a Cloud-based scenario in which each physical resource makes its own utilization decision based on its current system and workload characteristics, and a light-weight provisioning optimizer with a replaceable routing algorithm handles resource provisioning and scaling. This approach makes the resource provisioning system more scalable, reliable, traceable, and simple to manage. In an industrial setting, the importance of these characteristics often exceeds the goal of squeezing the absolute last CPU cycles out of the underlying physical resources.
Conference Paper
Cloud computing is the latest computing paradigm that delivers IT resources as services, in which users are free from the burden of worrying about low-level implementation or system administration details. However, significant problems remain with regard to the efficient provisioning and delivery of applications using Cloud-based IT resources. These barriers concern various levels such as workload modeling, virtualization, performance modeling, deployment, and monitoring of applications on virtualized IT resources. If these problems can be solved, then applications can operate more efficiently, with reduced financial and environmental costs, reduced under-utilization of resources, and better performance at times of peak load. In this paper, we present a provisioning technique that automatically adapts to workload changes related to applications, facilitating the adaptive management of the system and offering end-users guaranteed Quality of Service (QoS) in large, autonomous, and highly dynamic environments. We model the behavior and performance of applications and Cloud-based IT resources to adaptively serve end-user requests. To improve the efficiency of the system, we use analytical performance (queueing network system model) and workload information to supply intelligent input about system requirements to an application provisioner with limited information about the physical infrastructure. Our simulation-based experimental results using production workload models indicate that the proposed provisioning technique detects changes in workload intensity (arrival pattern, resource demands) that occur over time and allocates multiple virtualized IT resources accordingly to achieve application QoS targets.
Article
The problem of computing a large set of different tasks on a set of heterogeneous resources connected by a network is very common nowadays in very different environments, and load balancing is indispensable for achieving high performance and high throughput in systems such as clusters. Cluster heterogeneity increases the difficulty of balancing the load across the system nodes and, although the relationship between heterogeneity and load balancing is difficult to describe analytically, in this paper different models and performance metrics are proposed to describe heterogeneous cluster behavior and to perform an exhaustive analysis of the effects of heterogeneity on load balancing algorithm performance. This analysis allows us to propose efficient solutions capable of dealing with heterogeneity in all the load balancing algorithm stages. Furthermore, a load balancing algorithm has been implemented following these solutions to demonstrate, with experimental results, its efficiency on real heterogeneous clusters.
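One elementary way to account for heterogeneity (a hypothetical illustration, not one of the paper's proposed solutions) is to split work in proportion to measured node capacity:

```python
def weighted_shares(n_tasks, capacities):
    """Split n_tasks across nodes proportionally to their capacities,
    giving rounding leftovers to the most capable nodes first."""
    total = sum(capacities)
    shares = [n_tasks * c // total for c in capacities]
    leftover = n_tasks - sum(shares)
    # hand out the remaining tasks to the fastest nodes
    for i in sorted(range(len(capacities)),
                    key=lambda i: -capacities[i])[:leftover]:
        shares[i] += 1
    return shares
```

Static proportional splitting ignores dynamic load, which is one reason the paper analyzes every stage of a load balancing algorithm rather than just the assignment step.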
Article
Clouds have changed the way we think about IT infrastructure management. Providers of software-based services are now able to outsource the operation of the hardware platforms required by those services. However, as the utilization of cloud platforms grows, users are realizing that the implicit promise of clouds (relieving them of the tasks related to infrastructure management) is not fulfilled. A reason for this is that current clouds offer interfaces too close to that infrastructure, while users demand functionalities that automate the management of their services as a whole unit. To overcome this limitation, we propose a new abstraction layer closer to the lifecycle of services that allows for their automatic deployment and scaling depending on the service status (not only on the infrastructure). This abstraction layer can sit on top of different cloud providers, hence mitigating the potential lock-in problem and allowing the transparent federation of clouds for the execution of services. Here, we present Claudia, a service management system that implements such an abstraction layer, and the results of the deployment of a grid service (based on the Sun Grid Engine software) on this system.
Article
The success of different computing models, performance analysis techniques and load balancing algorithms depends on processor availability information, because there is a strong relationship between a process's response time and the processor time available for its execution. Therefore, predicting the processor availability for a new process or task in a computer system is a basic problem that arises in many important contexts. Unfortunately, making such predictions is not easy because of the dynamic nature of current computer systems and their workload, which can vary drastically in a short interval of time. This paper presents two new availability prediction models. The first, called the SPAP (Static Process Assignment Prediction) model, is capable of predicting the CPU availability for a new task on a computer system given information about the tasks in its run queue. The second, called the DYPAP (DYnamic Process Assignment Prediction) model, is an improvement of the SPAP model capable of making these predictions from real-time measurements provided by a monitoring tool, without any kind of information about the tasks in the run queue. Furthermore, the implementation of this monitoring tool for Linux workstations is presented.