A Fault Tolerance Service for QoS in Grid Computing.
ABSTRACT This paper proposes a fault tolerance service to satisfy QoS requirements in grid computing. The probability of failure in
grid computing is higher than in traditional parallel computing. Since resource failures can be fatal to job execution,
a fault tolerance service is essential in grid computing. Grid services are also often expected to meet some minimum level of
quality of service (QoS) for desirable operation. However, the Globus toolkit does not provide a fault tolerance service that supports
fault detection and fault management and satisfies QoS requirements. In order to provide a fault tolerance service
and satisfy QoS requirements, we expand the definition of failure to include cases such as process failure, processor failure, and network
failure. We propose a fault detection service and a fault management service and show simulation results.
- Source: Available from citeseerx.ist.psu.edu
Conference Proceeding: An Infrastructure for Monitoring and Management in Computational Grids.
ABSTRACT: We present the design and implementation of an infrastructure that enables monitoring of resources, services, and applications in a computational grid and provides a toolkit to help manage these entities when faults occur. This infrastructure builds on three basic monitoring components: sensors to perform measurements, actuators to perform actions, and an event service to communicate events between remote processes. We describe how we apply our infrastructure to support a grid service and an application: (1) the Globus Metacomputing Directory Service; and (2) a long-running and coarse-grained parameter study application. We use these applications to show that our monitoring infrastructure is highly modular, conveniently retargettable, and extensible.
Languages, Compilers, and Run-Time Systems for Scalable Computers, 5th International Workshop, LCR 2000, Rochester, NY, USA, May 25-27, 2000, Selected Papers; 01/2000
- Scalable Computing: Practice and Experience. 01/2000; 3.
- ABSTRACT: Reservation and adaptation are two well-known and effective techniques for enhancing the end-to-end performance of network applications. However, both techniques also have limitations, particularly when dealing with high-bandwidth, dynamic flows: fixed-capability reservations tend to be wasteful of resources and hinder graceful degradation in the face of congestion, while adaptive techniques fail when congestion becomes excessive. We propose an approach to quality of service (QoS) that overcomes these difficulties by combining features of reservations and adaptation. In this approach, a combination of online control interfaces for resource management, a sensor permitting online monitoring, and decision procedures embedded in resources enable a rich variety of dynamic feedback interactions between applications and resources. We describe a QoS architecture, GARA, that has been extended to support these mechanisms, and use three examples of application-level adaptive strategies to show ho...
07/2000