Evaluation of system reliability for a cloud computing system with imperfect nodes.

Systems Engineering (Impact Factor: 0.66). 01/2012; 15:83-94. DOI: 10.1002/sys.20196
Source: DBLP

ABSTRACT: From the perspective of system design and quality of service (QoS), system reliability is one of the essential performance indicators of a network. In a practical cloud computing system (CCS), edges and nodes have multiple capacities, or states, due to failure, partial failure, or maintenance. The CCS is therefore a typical capacitated-flow network. To guarantee a good level of quality and reliability, the CCS should be maintained so that it does not fall into a failed state in which it cannot provide sufficient capacity to satisfy demand. Accordingly, this paper develops a system reliability measure to evaluate the capability of the CCS to send d units of data from the cloud to the client through two paths under both maintenance-budget and time constraints. An algorithm with an adjusting procedure based on the branch-and-bound approach is proposed to evaluate the system reliability, and an accompanying proof shows that the algorithm is appropriate for measuring the system reliability of the CCS. By comparing different maintenance budgets and the corresponding system reliability, the system supervisor can determine a reasonable maintenance budget that keeps the quality and reliability of the CCS at a good level. From the perspective of system design, the supervisor can further conduct a sensitivity analysis based on system reliability to identify and improve the most important components of a large CCS. © 2011 Wiley Periodicals, Inc.
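To make the notion of system reliability in a capacitated-flow network concrete, here is a minimal Python sketch that computes, by brute-force enumeration, the probability that a small two-path network can deliver d units of data. It is not the paper's branch-and-bound algorithm and it ignores the maintenance-budget and time constraints. It assumes independent components, edge-disjoint paths (so deliverable flow is the sum of path capacities), and a path capacity equal to its weakest component. The component names, capacity distributions, and the functions `path_capacity` and `system_reliability` are hypothetical.

```python
from itertools import product
from typing import Dict, List

# Hypothetical model: each component maps capacity -> probability.
Dist = Dict[int, float]


def path_capacity(state: Dict[str, int], path: List[str]) -> int:
    """A path carries at most the capacity of its weakest component."""
    return min(state[c] for c in path)


def system_reliability(dists: Dict[str, Dist], paths: List[List[str]], d: int) -> float:
    """Exact reliability by enumerating every joint capacity state.

    Assumes independent components and edge-disjoint paths, so the flow
    the system can carry is the sum of the individual path capacities.
    """
    names = list(dists)
    total = 0.0
    for combo in product(*(dists[n].items() for n in names)):
        state = {n: cap for n, (cap, _) in zip(names, combo)}
        prob = 1.0
        for _, p in combo:
            prob *= p  # independence: joint probability is the product
        if sum(path_capacity(state, path) for path in paths) >= d:
            total += prob
    return total


# Hypothetical example: path 1 uses components a1 and a2, path 2 uses b1.
dists = {
    "a1": {0: 0.05, 1: 0.15, 2: 0.8},
    "a2": {0: 0.1, 2: 0.9},
    "b1": {0: 0.1, 1: 0.9},
}
paths = [["a1", "a2"], ["b1"]]
print(round(system_reliability(dists, paths, d=2), 4))  # 0.8415
```

Exhaustive enumeration grows exponentially with the number of components, so it is only practical for toy networks like this one.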

  • ABSTRACT: Cloud computing has become increasingly popular due to the deployment of cloud solutions that enable enterprises to reduce costs and gain operational flexibility. Reliability is a key metric for assessing performance in such systems. Fault-tolerance methods are extensively used to enhance reliability in Cloud Computing Systems (CCS). However, these methods impose extra hardware and/or software costs. Proper resource allocation is an alternative approach that can significantly improve system reliability without any extra overhead. On the other hand, considering reliability irrespective of energy consumption and Quality of Service (QoS) requirements is not desirable in CCSs. In this paper, an analytical model is introduced to analyze system reliability together with energy consumption and QoS requirements. Based on this model, a new online resource allocation algorithm is proposed to find the right compromise between system reliability and energy consumption while satisfying QoS requirements. The algorithm is a new swarm intelligence technique based on imperialist competition that combines the strengths of several well-known meta-heuristic algorithms with an effective fast local search. A wide range of simulation results based on real data demonstrate the high efficiency of the proposed algorithm.
    15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013); 10/2013
  • ABSTRACT: Cloud computing is widely referred to as the next generation of computing systems. Reliability is a key metric for assessing performance in such systems. Redundancy and diversity are prevalent approaches to enhancing reliability in Cloud Computing Systems (CCS). Proper resource allocation is an alternative approach to improving reliability in such systems. In contrast to redundancy, appropriate resource allocation can improve system reliability without imposing extra cost. On the other hand, considering reliability irrespective of Quality of Service (QoS) requirements may be undesirable in most CCSs. In this paper, we focus on the resource allocation approach and introduce an analytical model to analyze system reliability while considering application and resource constraints. Task precedence structure and QoS are taken into account as the application constraints. The memory and storage limits of each server, as well as the maximum communication load on each link, are considered as the principal resource constraints. In addition, the effect of network topology on system reliability is discussed in detail, and the model is extended to cover various network topologies.
    IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS), Japan; 06/2013
  • ABSTRACT: In this paper we study facility location problems on graphs under the most common optimization criteria, such as median, center and centdian, but we incorporate some reliability aspects into the objective function. Assuming that facilities may become unavailable with a certain probability, the problem consists of locating facilities so as to minimize the overall or the maximum expected service cost in the long run, or a convex combination of the two. We show that the k-facility problem on general networks is NP-hard. We then provide efficient algorithms for these problems for the cases k = 1, 2, both on general networks and on trees. We also explain how our methodology extends to a more general class of unreliable point facility location problems related to the ordered median objective function.
    Discrete Applied Mathematics (Impact Factor: 0.68). 01/2014; 166:188–203.