Evaluation of system reliability for a cloud computing system with imperfect nodes

Systems Engineering (Impact Factor: 0.7). 03/2012; 15(1):83-94. DOI: 10.1002/sys.20196
Source: DBLP


From the perspective of system design and quality of service (QoS), system reliability is an essential performance indicator of a network. In a practical cloud computing system (CCS), edges and nodes take various capacities or states because of failure, partial failure, or maintenance. Thus, the CCS is a typical capacitated-flow network. To guarantee a good level of quality and reliability, the CCS should be maintained so that it does not fall into a failed state in which it cannot provide sufficient capacity to satisfy demand. This paper therefore develops system reliability to evaluate the capability of the CCS to send d units of data from the cloud to the client through two paths under both maintenance budget and time constraints. An algorithm with an adjusting procedure based on the branch-and-bound approach is proposed to evaluate the system reliability. The relevant proof shows that the proposed algorithm is reasonable and appropriate for measuring the system reliability of the CCS. By comparing different maintenance budgets with the corresponding system reliability, the system supervisor can determine a reasonable maintenance budget that maintains a good level of quality and reliability of the CCS. From the perspective of system design, the system supervisor can further conduct a sensitivity analysis based on system reliability to identify and improve the most important part of a large CCS. © 2011 Wiley Periodicals, Inc.
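To make the capacitated-flow idea concrete, the following is a minimal sketch, not the paper's branch-and-bound algorithm. It assumes that system reliability can be read as the probability that two disjoint paths together carry at least d units, that every edge or node has an independent discrete capacity distribution, and that a path's capacity is the minimum over its components. The maintenance budget and time constraints are left out. The function names and all numbers are hypothetical.

```python
from itertools import product


def path_capacity_distribution(components):
    """Return {capacity: probability} for one path.

    `components` is a list of dicts, one per edge or node on the path,
    each mapping a capacity level to its probability. The path capacity
    is taken to be the minimum component capacity (an assumption).
    """
    dist = {}
    # Enumerate every combination of component states (brute force).
    for states in product(*(c.items() for c in components)):
        capacity = min(cap for cap, _ in states)
        prob = 1.0
        for _, p in states:
            prob *= p  # components are assumed independent
        dist[capacity] = dist.get(capacity, 0.0) + prob
    return dist


def system_reliability(path1, path2, demand):
    """Probability that the two paths jointly deliver at least `demand` units.

    The paths are assumed disjoint, so their capacities are independent.
    """
    d1 = path_capacity_distribution(path1)
    d2 = path_capacity_distribution(path2)
    return sum(
        p1 * p2
        for c1, p1 in d1.items()
        for c2, p2 in d2.items()
        if c1 + c2 >= demand
    )


if __name__ == "__main__":
    # Hypothetical two-path network: each dict is one imperfect edge or node.
    path_a = [{0: 0.05, 2: 0.15, 4: 0.80}, {0: 0.10, 4: 0.90}]
    path_b = [{0: 0.10, 3: 0.90}, {0: 0.05, 3: 0.95}]
    for d in (2, 4, 6):
        print(d, system_reliability(path_a, path_b, d))
```

Exhaustive enumeration grows exponentially with the number of components, which is one reason a pruned search such as branch and bound is attractive for larger networks.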

  • "They also ignored hard disk failures and assumed that server reliability is only dependent on reliability of its processors. Another important effort in this category belongs to Lin and Chang [11], who evaluated system reliability for a typical CCS with imperfect nodes. They proposed an algorithm based on the branch and bound approach."
    ABSTRACT: Cloud computing is widely referred to as the next generation of computing systems. Reliability is a key metric for assessing performance in such systems. Redundancy and diversity are prevalent approaches to enhancing reliability in Cloud Computing Systems (CCS). Proper resource allocation is an alternative approach to reliability improvement in such systems. In contrast to redundancy, appropriate resource allocation can improve system reliability without imposing extra cost. On the other hand, considering reliability irrespective of Quality of Service (QoS) requirements may be undesirable in most CCSs. In this paper, we focus on the resource allocation approach and introduce an analytical model to analyze system reliability while considering application and resource constraints. Task precedence structure and QoS are taken into account as the application constraints. The memory and storage limitations of each server, as well as the maximum communication load on each link, are considered as the principal resource constraints. In addition, the effect of network topology on system reliability is discussed in detail, and the model is extended to cover various network topologies.
    IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS), Japan; 06/2013
  • ABSTRACT: Network systems have become more and more complex with fast-evolving technologies in order to accomplish missions that are too complicated for any stand-alone system. The question of how reliable these network systems are when facing such complex missions requires prior attention. Unlike a stand-alone system that carries out one mission at a time, network systems deal with many missions simultaneously. It is natural to model these missions as a mission network. In this paper, we employed a two-layer network model, comprising a mission network and a physical network, to test the network reliability against its missions, called mission-oriented network reliability. As observed by many researchers, hierarchy is one of the most common phenomena in complex networks. By quantifying the hierarchies of the mission network and the physical network, we investigated the effects of the coupling between the hierarchies of these two networks on the mission-oriented network reliability. The simulation results show that positive coupling between the hierarchies of the two-layer networks leads to a positive correlation between the mission-oriented reliability and the significance of the hierarchy of the mission network; similarly, negative coupling between the hierarchies results in a negative correlation between the network reliability and the significance of the hierarchy of the mission network.
    Quality, Reliability, Risk, Maintenance, and Safety Engineering (ICQR2MSE), 2012 International Conference on; 01/2012
  • ABSTRACT: Reworking in a manufacturing system is usually an environment-friendly way to eliminate waste during production. This paper develops a performance evaluation procedure for the parallel-line manufacturing system with different station failure rates. Considering reworking actions, we propose a graphical-based methodology to model the manufacturing system as a stochastic-flow network. First, a transformation technique is utilized to build the manufacturing system as a manufacturing network. Second, a simple algorithm integrating a decomposition technique is proposed to generate the minimal capacity vectors that stations should provide to satisfy the given demand. We evaluate the probability that the manufacturing network can meet the demand in terms of the minimal capacity vectors, where the probability is referred to as the system reliability. A footwear manufacturing system is utilized to demonstrate the performance evaluation procedure. A further decision-making issue is discussed based on the derived system reliability for achieving higher demand satisfaction and less waste.
    Journal of Cleaner Production 11/2012; 35:93–101. DOI:10.1016/j.jclepro.2012.05.023 · 3.84 Impact Factor