Autonomic virtual resource management for service hosting platforms

Proceedings of the Workshop on Software Engineering Challenges in Cloud Computing 05/2009; DOI: 10.1109/CLOUD.2009.5071526
Source: OAI

ABSTRACT Cloud platforms host several independent applications on a shared resource pool with the ability to allocate computing power to applications on a per-demand basis. The use of server virtualization techniques for such platforms provides great flexibility with the ability to consolidate several virtual machines on the same physical server, to resize a virtual machine's capacity, and to migrate virtual machines across physical servers. A key challenge for cloud providers is to automate the management of virtual servers while taking into account both high-level QoS requirements of hosted applications and resource management costs. This paper proposes an autonomic resource manager to control the virtualized environment which decouples the provisioning of resources from the dynamic placement of virtual machines. This manager aims to optimize a global utility function which integrates both the degree of SLA fulfillment and the operating costs. We resort to a Constraint Programming approach to formulate and solve the optimization problem. Results obtained through simulations validate our approach.

  • Source
    ABSTRACT: A challenge in cloud resource management is to design self-adaptable solutions capable of reacting to unpredictable workload fluctuations and changing utility principles. This paper analyzes the problem from the perspective of an Application Service Provider (ASP) that uses a cloud infrastructure to achieve scalable provisioning of its services while respecting QoS constraints. First, we draw a taxonomy of IaaS providers and use the identified features to drive the design of four autonomic service management architectures that differ in the degree of control an ASP has over the system. We implemented two of these solutions and the related mechanisms to test five different resource provisioning policies. The implemented testbed was evaluated on the Amazon EC2 platform under a realistic workload based on Wikipedia access traces. The experimental evaluation confirms that the proposed policies are capable of properly dimensioning the system resources, making the whole system self-adaptable with respect to workload fluctuations. Moreover, having full control over the resource management plan saves up to 32% of the resource allocation cost while still respecting SLA constraints.
    Computer Networks 02/2013; 57(3):795–810. DOI:10.1016/j.comnet.2012.10.020 · 1.28 Impact Factor
  • Source
    ABSTRACT: Infrastructure-as-a-service (IaaS) is one of the powerful emerging cloud computing services currently provided by the IT industry. This paper considers the interaction between on-demand requests and the allocation of virtual machines in a server farm operated by a specific infrastructure owner. We formulate an analytic performance model of the server farm that takes into account the quality of service (QoS) guaranteed to users and the operational energy consumption of the server farm. We compare several scheduling algorithms in terms of the average energy consumption and heat emission of servers as well as the blocking probabilities of on-demand requests. Numerical results comparing different allocation strategies show that allocating virtual machines to physical servers based on priority can save energy in the operational range (where on-demand requests do not face an unacceptably high blocking probability).
    Journal of Systems and Software 06/2012; 85(6-6):1400-1408. DOI:10.1016/j.jss.2012.01.019 · 1.25 Impact Factor
  • Source
    ABSTRACT: Distributed computing infrastructures are commonly used through scientific gateways, but operating these gateways requires substantial human intervention to handle operational incidents. This paper presents a self-healing process that quantifies incident degrees of workflow activities from metrics measuring long-tail effect, application efficiency, data transfer issues, and site-specific problems. These metrics are simple enough to be computed online and they make few assumptions about the application or resource characteristics. Incidents are classified in levels and associated with sets of healing actions that are selected based on association rules modeling correlations between incident levels. The healing process is parametrized on real application traces acquired in production on the European Grid Infrastructure. Implementation and experimental results obtained in the Virtual Imaging Platform show that the proposed method speeds up execution by up to a factor of 4 and properly detects unrecoverable errors.
    Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on; 05/2012

