Evaluation of system reliability for a cloud computing system with imperfect nodes
ABSTRACT From the perspective of system design and quality of service (QoS), system reliability is one of the essential performance indicators for measuring the reliability of a network. In a practical cloud computing system (CCS), edges and nodes have multiple possible capacities, or states, owing to failure, partial failure, or maintenance. The CCS is therefore a typical capacitated-flow network. To guarantee a good level of quality and reliability, the CCS should be maintained so that it does not fall into a failed state in which it cannot provide sufficient capacity to satisfy demand. Accordingly, this paper develops system reliability as a measure of the capability of the CCS to send d units of data from the cloud to the client through two paths under both a maintenance budget and a time constraint. An algorithm with an adjusting procedure based on the branch-and-bound approach is proposed to evaluate the system reliability. The accompanying proof shows that the proposed algorithm is sound and appropriate for measuring the system reliability of the CCS. By comparing system reliability across different maintenance budgets, the system supervisor can determine a reasonable maintenance budget that maintains a good level of quality and reliability in the CCS. From the perspective of system design, the system supervisor can further conduct a sensitivity analysis, based on system reliability, to identify and improve the most important parts of a large CCS. © 2011 Wiley Periodicals, Inc.
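As a rough illustration of the quantity this abstract describes, the following Python sketch computes system reliability by brute-force enumeration of component states, not by the paper's branch-and-bound algorithm. Everything specific in it is a hypothetical assumption: the two component-disjoint paths, the capacity distributions, the per-component lead times, the quickest-path-style transmission time (path lead time plus ceil(units / path capacity)), and a maintenance cost modeled as the sum of c_i * x_i over component states x_i. A state counts as successful if its cost is within budget B and some split of the d units across the two paths finishes within time T.

```python
from itertools import product
from math import ceil, prod

# Hypothetical capacity distributions: component -> {capacity: probability}.
COMPONENTS = {
    "a1": {0: 0.05, 1: 0.15, 2: 0.80},
    "a2": {0: 0.05, 2: 0.15, 3: 0.80},
    "b1": {0: 0.10, 1: 0.90},
    "b2": {0: 0.10, 1: 0.20, 2: 0.70},
}
PATHS = [["a1", "a2"], ["b1", "b2"]]                 # two disjoint paths (assumed)
LEAD = {"a1": 1, "a2": 1, "b1": 1, "b2": 2}          # lead times (assumed)
COST = {"a1": 1.0, "a2": 1.0, "b1": 2.0, "b2": 1.0}  # cost per capacity unit (assumed)


def transmission_time(units, capacity, lead):
    """Quickest-path style time: lead time plus ceil(units / capacity)."""
    if units == 0:
        return 0
    if capacity == 0:
        return float("inf")
    return lead + ceil(units / capacity)


def feasible(state, d, T, B):
    """True if the state is within budget and some split of d meets time T."""
    if sum(COST[c] * x for c, x in state.items()) > B:
        return False
    caps = [min(state[c] for c in path) for path in PATHS]
    leads = [sum(LEAD[c] for c in path) for path in PATHS]
    for d1 in range(d + 1):
        t1 = transmission_time(d1, caps[0], leads[0])
        t2 = transmission_time(d - d1, caps[1], leads[1])
        if max(t1, t2) <= T:
            return True
    return False


def system_reliability(d, T, B):
    """Sum the probabilities of all feasible capacity states (exhaustive)."""
    names = list(COMPONENTS)
    total = 0.0
    for combo in product(*(COMPONENTS[n].items() for n in names)):
        state = {n: cap for n, (cap, _) in zip(names, combo)}
        if feasible(state, d, T, B):
            total += prod(p for _, p in combo)
    return total


if __name__ == "__main__":
    for budget in (6, 8, 12):
        print(budget, round(system_reliability(d=4, T=5, B=budget), 4))
```

Exhaustive enumeration grows exponentially with the number of components, which is why a pruning strategy such as the paper's branch-and-bound matters for a large CCS; the loop over budgets mirrors the budget-versus-reliability comparison the abstract describes.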
- Source available from: Hamid Reza Faragardi
"They also ignored hard disk failures and assumed that server reliability depends only on the reliability of its processors. Another important effort in this category belongs to Lin and Chang, who evaluated system reliability for a typical CCS with imperfect nodes. They proposed an algorithm based on the branch-and-bound approach."
ABSTRACT: Cloud computing is widely referred to as the next generation of computing systems. Reliability is a key metric for assessing performance in such systems. Redundancy and diversity are prevalent approaches to enhancing reliability in Cloud Computing Systems (CCS). Proper resource allocation is an alternative approach to improving reliability in such systems: in contrast to redundancy, appropriate resource allocation can improve system reliability without imposing extra cost. On the other hand, considering reliability irrespective of Quality of Service (QoS) requirements may be undesirable in most CCSs. In this paper, we focus on the resource allocation approach and introduce an analytical model to analyze system reliability while accounting for application and resource constraints. Task precedence structure and QoS are taken into account as the application constraints. The memory and storage limits of each server, as well as the maximum communication load on each link, are considered as the principal resource constraints. In addition, the effect of network topology on system reliability is discussed in detail, and the model is extended to cover various network topologies.
IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS), Japan; 06/2013
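The resource-allocation view in this abstract can be illustrated with a common textbook reliability model. This is an assumption, not necessarily the model used in that paper: each server and the communication link fail at constant rates, so an allocation's reliability is exp(-exposure), where exposure sums server failure rate times task execution time plus link failure rate times the data exchanged between tasks placed on different servers. The sketch exhaustively searches allocations that respect per-server memory limits; all names and numbers are hypothetical, and precedence, QoS, storage, and topology constraints are omitted.

```python
from itertools import product
from math import exp

# All figures below are hypothetical.
EXEC_TIME = [[4, 6], [3, 2], [5, 5]]   # EXEC_TIME[task][server]
MEMORY_NEED = [2, 1, 3]                # memory needed by each task
SERVER_MEMORY = [4, 4]                 # memory available on each server
FAILURE_RATE = [0.001, 0.002]          # constant failure rate of each server
COMM = {(0, 1): 2, (1, 2): 1}          # data volume exchanged between task pairs
LINK_FAILURE_RATE = 0.0005             # one uniform link rate (assumed)


def reliability(assign):
    """exp(-exposure), where exposure sums rate x time on servers and the link."""
    exposure = sum(FAILURE_RATE[s] * EXEC_TIME[t][s] for t, s in enumerate(assign))
    exposure += sum(
        LINK_FAILURE_RATE * volume
        for (i, j), volume in COMM.items()
        if assign[i] != assign[j]  # only inter-server traffic uses the link
    )
    return exp(-exposure)


def fits(assign):
    """Check the per-server memory limit for an allocation."""
    used = [0] * len(SERVER_MEMORY)
    for task, server in enumerate(assign):
        used[server] += MEMORY_NEED[task]
    return all(u <= cap for u, cap in zip(used, SERVER_MEMORY))


def best_allocation():
    """Exhaustively search memory-feasible allocations for maximum reliability."""
    n_servers, n_tasks = len(SERVER_MEMORY), len(EXEC_TIME)
    feasible = [a for a in product(range(n_servers), repeat=n_tasks) if fits(a)]
    return max(feasible, key=reliability)


if __name__ == "__main__":
    best = best_allocation()
    print(best, round(reliability(best), 6))
```

Because no redundant copies are added, any reliability gain here comes purely from placement, which matches the abstract's point that allocation can improve reliability without extra cost.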
ABSTRACT: Network systems have become increasingly complex with fast-evolving technologies, in order to accomplish missions that are too complicated for any stand-alone system. How reliable these network systems are when facing such complex missions deserves our attention. Unlike a stand-alone system carrying out one mission at a time, network systems handle many missions simultaneously, so it is natural to model these missions as a mission network. In this paper, we employ a two-layer network model, comprising a mission network and a physical network, to test network reliability against its missions, called mission-oriented network reliability. As many researchers have observed, hierarchy is one of the most common phenomena in complex networks. By quantifying the hierarchies of the mission network and the physical network, we investigate the effects of the coupling between the hierarchies of these two networks on mission-oriented network reliability. The simulation results show that positive coupling between the hierarchies of the two layers leads to a positive correlation between mission-oriented reliability and the significance of the hierarchy of the mission network; similarly, negative coupling between the hierarchies results in a negative correlation between network reliability and the significance of the hierarchy of the mission network.
Quality, Reliability, Risk, Maintenance, and Safety Engineering (ICQR2MSE), 2012 International Conference on; 01/2012
ABSTRACT: Correlation poses a serious threat to many engineered systems because the simultaneous failure of multiple components can dangerously degrade performance. Given the high cost of system failures in business and mission-critical applications, methods that explicitly consider the impact of correlation on system reliability are essential. This paper constructs a stochastic-flow network model to analyze the performance of a computer network in which the failures of all the physical lines and routers comprising the edges and nodes of the network are correlated. That is, we address global-scale events that can cause widespread damage to the performance of the network. We propose a simulation approach to estimate the probability that a given amount of data can be sent from a source to a sink through this network. This probability that the network satisfies a specified level of demand is referred to as the system reliability. Experimental results demonstrate that correlation can have a substantial impact on system reliability. The proposed approach thus captures the influence of correlation on system reliability and offers a method to quantify the utility of reducing correlation.
Reliability Engineering & System Safety 01/2013; 109:32–40. DOI:10.1016/j.ress.2012.08.008 · 2.05 Impact Factor
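A minimal Monte Carlo sketch of the idea in this abstract, assuming a hypothetical common-shock model of correlation rather than the paper's own: with probability shock_prob a global event raises every component's failure probability to shock_fail; otherwise components fail independently with base_fail. The network is two disjoint series paths (all names and numbers hypothetical), so the maximum flow is simply the sum of the path bottlenecks. Running it against an independent-failure case with the same marginal failure probability (0.05 × 0.5 + 0.95 × 0.02 = 0.044) isolates the effect of correlation.

```python
import random

# Hypothetical network: two disjoint series paths from source to sink.
PATHS = [["e1", "e2"], ["e3", "e4"]]
CAPACITY = {"e1": 3, "e2": 3, "e3": 2, "e4": 2}  # capacity when working


def sample_state(rng, shock_prob, base_fail, shock_fail):
    """Draw one capacity state; a global shock makes failures co-occur."""
    shock = rng.random() < shock_prob
    p_fail = shock_fail if shock else base_fail
    return {e: 0 if rng.random() < p_fail else c for e, c in CAPACITY.items()}


def max_flow(state):
    """For disjoint paths, max flow is the sum of each path's bottleneck."""
    return sum(min(state[e] for e in path) for path in PATHS)


def estimate_reliability(demand, shock_prob, base_fail, shock_fail,
                         trials=100_000, seed=1):
    """Fraction of sampled states whose max flow meets the demand."""
    rng = random.Random(seed)
    hits = sum(
        max_flow(sample_state(rng, shock_prob, base_fail, shock_fail)) >= demand
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    correlated = estimate_reliability(2, shock_prob=0.05, base_fail=0.02, shock_fail=0.5)
    independent = estimate_reliability(2, shock_prob=0.0, base_fail=0.044, shock_fail=0.5)
    print(f"correlated: {correlated:.4f}, independent: {independent:.4f}")
```

For a demand of 2 units either path suffices, and the exact values under these assumptions are about 0.970 (correlated) versus 0.993 (independent): the shock tends to take down both paths at once, which is exactly the redundancy-defeating effect the abstract describes.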