Conference Paper

A Fault Tolerant Adaptive Method for the Scheduling of Tasks in Dynamic Grids

Escuela Superior de Informática, Universidad de Castilla-La Mancha, 13071
DOI: 10.1109/ADVCOMP.2009.15
Conference: Third International Conference on Advanced Engineering Computing and Applications in Sciences, 2009 (ADVCOMP '09)


An essential issue in distributed high-performance computing is how to allocate the workload efficiently among the processors. This is especially important in a computational Grid, where resources are heterogeneous and dynamic. Algorithms such as Quadratic Self-Scheduling (QSS) and Exponential Self-Scheduling (ESS) are useful for obtaining a good load balance while reducing the communication overhead. Here, a fault tolerant adaptive approach to scheduling tasks in dynamic Grid environments is proposed. The aim of this approach is to optimize the list of chunks that QSS and ESS generate, that is, the way the tasks are scheduled. To that end, when the environment changes, new optimal QSS and ESS parameters are obtained so that the remaining tasks are scheduled in an optimal way, maintaining a good load balance. Moreover, failed tasks are rescheduled. The results show that the adaptive approach achieves good performance for both QSS and ESS, even in a highly dynamic environment.
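The general idea of chunked self-scheduling with rescheduling of failed tasks can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `chunk_sizes`, `quadratic`, `schedule`, and `run_task` are hypothetical, the quadratic form `a + b*t + c*t*t` and its coefficients are assumptions chosen only for illustration, and the re-fitting of parameters when the environment changes is omitted.

```python
from collections import deque


def quadratic(a, b, c):
    """Return an illustrative chunk-size function of the chunk index t.

    The form a + b*t + c*t*t and its coefficients are assumptions for this
    sketch, not the QSS definition from the paper.
    """
    return lambda t: a + b * t + c * t * t


def chunk_sizes(total_tasks, size_fn):
    """Split total_tasks into chunks whose sizes follow size_fn(t), t = 0, 1, ..."""
    sizes, remaining, t = [], total_tasks, 0
    while remaining > 0:
        # Clamp every chunk to the range [1, remaining] so the loop terminates.
        size = max(1, min(remaining, int(size_fn(t))))
        sizes.append(size)
        remaining -= size
        t += 1
    return sizes


def schedule(tasks, size_fn, run_task):
    """Run tasks chunk by chunk; a task that fails is queued again.

    run_task(task) is a hypothetical callback returning True on success.
    If a task never succeeds, this loop never ends; a real scheduler
    would cap the number of retries.
    """
    pending = deque(tasks)
    done = []
    t = 0
    while pending:
        size = max(1, min(len(pending), int(size_fn(t))))
        chunk = [pending.popleft() for _ in range(size)]
        for task in chunk:
            if run_task(task):
                done.append(task)
            else:
                pending.append(task)  # reschedule the failed task
        t += 1
    return done
```

In an adaptive variant, `size_fn` would be replaced by a newly fitted function whenever the environment changes, and only the tasks still in `pending` would be redistributed.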



Available from: Javier Diaz-Montes, Oct 01, 2015
  • Source
    ABSTRACT: Parameter sweep experiments (PSE) involve several issues. In this work, we consider two of them: the generation of the parameter space and the scheduling of the associated tasks. We propose a general model to generate the parameter space of any PSE by applying the Nested Summation Symbol operator. For the scheduling of these kinds of problems, we test an adaptive scheduling approach with fault tolerance. This approach has been implemented, using the DRMAA-C version for GridWay, to allocate tasks in a Grid environment. In the tests, the scheduler shows good performance. Moreover, the CPU usage of the scheduler is quite low.
    Procedia Computer Science 05/2010; 1(1):565-572. DOI:10.1016/j.procs.2010.04.060
  • Source
    ABSTRACT: Grid computing is becoming a new face of distributed computing, allowing the aggregation of geographically dispersed resources. The dynamic and heterogeneous nature of the Grid makes it more vulnerable to various faults, which cause job failure, delay in job completion, or even re-execution of the job from the starting point. In this paper, an empirical analysis of different faults is carried out, and nine fault tolerant job scheduling approaches for dealing with these faults are discussed; for each approach, a comparative quantitative analysis is carried out. There is still a need to provide better resource sharing, improved resource utilization, and computational speed for computationally intensive applications. However, a technique based on the combination of RFOH and application checkpointing may provide robust fault tolerant job scheduling.
    2014 International Conference on Circuits, Systems, Communication and Information Technology Applications (CSCITA); 04/2014