Conference Paper

High-available grid services through the use of virtualized clustering.

Tech. Univ. of Catalonia, Barcelona
DOI: 10.1109/GRID.2007.4354113
Conference: 8th IEEE/ACM International Conference on Grid Computing (GRID 2007), September 19-21, 2007, Austin, Texas, USA, Proceedings
Source: DBLP

ABSTRACT Grid applications comprise several components and web services, which makes them highly prone to transient software failures and aging problems. Such failures often result in degraded performance and unexpected partial crashes. In this paper we present a technique that offers high availability for Grid services based on virtualization, clustering and software rejuvenation. To show the effectiveness of our approach, we conducted experiments with the OGSA-DAI middleware. One implementation of OGSA-DAI uses Apache Axis V1.2.1, a SOAP implementation that suffers from severe memory leaks. Without changing the middleware layer at all, we were able to anticipate most of the problems caused by those leaks and to increase the overall availability of the OGSA-DAI Application Server. Although these results are tied to this middleware, our technique is neutral and can be applied to any other Grid service that needs to be highly available.

  • ABSTRACT: A number of studies have reported the phenomenon of “software aging”, caused by resource exhaustion and characterized by progressive degradation of software performance. In this article, we carry out an experimental study of software aging and rejuvenation for an on-line bookstore application, following the standard configuration of the TPC-W benchmark. While a real website is used for the bookstore, the clients are emulated. To reduce the time to application failures caused by memory leaks, we use the accelerated life testing (ALT) approach. We then select the Weibull time-to-failure distribution at normal level, to be used in a semi-Markov process, to compute the optimal software rejuvenation trigger interval. Since validating the optimal rejuvenation trigger interval with emulated browsers would take an inordinately long time, we develop a simulation model to validate the ALT experimental results, and also estimate the steady-state availability to cross-validate the results of the semi-Markov availability model.
    ACM Journal on Emerging Technologies in Computing Systems 01/2014; 10(1):1. · 0.76 Impact Factor
  • ABSTRACT: Software aging is caused by resource exhaustion and can lead to progressive performance degradation or result in a crash. We develop experiments that simulate an on-line bookstore application, using the standard configuration of the TPC-W benchmark. We study application failures due to memory leaks, using accelerated life testing (ALT). ALT significantly reduces the time needed to estimate the time to failure at normal level. We then select the Weibull time-to-failure distribution at normal level, to be used in a semi-Markov model so as to optimize the software rejuvenation trigger interval. We derive the optimal rejuvenation schedule interval by fixed-point iteration and by an alternative non-parametric estimation algorithm. Finally, we develop a simulation model using importance sampling (IS) to cross-validate the ALT experimental results and the semi-Markov model. We also apply the non-parametric method to cross-validate the optimized trigger intervals, comparing the availabilities obtained from the semi-Markov model with those from IS simulation using the non-parametric method.
    Performance Evaluation 01/2013; 70(11):917–933. · 0.84 Impact Factor
  • ABSTRACT: Software rejuvenation has been addressed in hundreds of papers since it was proposed in 1995 by Huang et al. The growing number of research papers shows the great importance of this topic. However, no paper has yet studied software rejuvenation in the real world. This paper investigates to what extent software rejuvenation techniques are integrated into IT and Telco solutions. For this purpose, an intensive search was conducted of different sources such as company product websites, technical papers, white papers, US patents, and consultant surveys. The results show that IT and Telco companies develop software rejuvenation solutions to deal with software aging. The number of US patents addressing this issue confirms the interest of industry in developing mechanisms to deal with software aging-related failures. It has been observed that real software rejuvenation solutions mainly use time-based or threshold-based policies, while the US patents are focused on predictive approaches.
    Software Reliability Engineering Workshops (ISSREW), 2012 IEEE 23rd International Symposium on; 01/2012
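
The first two abstracts in the list above describe choosing a rejuvenation trigger interval from a Weibull time-to-failure distribution. A minimal Python sketch of that idea follows. It is not the semi-Markov model of those papers. It assumes a simpler age-based renewal model in which the service is either rejuvenated after a fixed uptime interval or repaired after a failure, whichever comes first. All function names and parameter values (Weibull shape and scale, repair and rejuvenation times) are hypothetical and chosen only for illustration.

```python
import math


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape): probability of no failure before time t."""
    return math.exp(-((t / scale) ** shape))


def availability(interval, shape, scale, repair_time, rejuvenation_time, steps=2000):
    """Steady-state availability when rejuvenating after `interval` hours of uptime.

    Assumed model: each cycle ends either with a failure (costing repair_time)
    or with a planned rejuvenation (costing rejuvenation_time).
    """
    # Expected uptime per cycle is the integral of R(t) over [0, interval],
    # approximated here with the trapezoid rule.
    h = interval / steps
    uptime = 0.0
    for i in range(steps):
        a = weibull_reliability(i * h, shape, scale)
        b = weibull_reliability((i + 1) * h, shape, scale)
        uptime += 0.5 * (a + b) * h
    # Probability that the cycle reaches the planned rejuvenation without failing.
    survive = weibull_reliability(interval, shape, scale)
    downtime = (1.0 - survive) * repair_time + survive * rejuvenation_time
    return uptime / (uptime + downtime)


def best_interval(candidates, **params):
    """Return the candidate interval with the highest modelled availability."""
    return max(candidates, key=lambda t: availability(t, **params))


if __name__ == "__main__":
    # Hypothetical values: shape > 1 models aging (increasing failure rate).
    params = dict(shape=2.0, scale=100.0, repair_time=2.0, rejuvenation_time=0.1)
    candidates = [float(t) for t in range(5, 301, 5)]
    t_opt = best_interval(candidates, **params)
    print(t_opt, availability(t_opt, **params))
```

Under these assumptions the search only trades off two costs: rejuvenating too often wastes planned downtime, while rejuvenating too rarely lets failures with longer repair times dominate. A grid search is used here for simplicity; the second abstract mentions fixed-point iteration and a non-parametric estimator instead.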
