Impact of Parallel Download on Job Scheduling in Data Grid Environment
ABSTRACT Data-intensive applications, such as those in high-energy physics, usually have a large amount of input data that requires analysis. These data are often shared and replicated across the data grid. As computing power increases, the delay caused by "waiting for input data" becomes more pronounced. In this paper, we study the impact of parallel download on job scheduler performance in a data grid environment. We present a parallel downloading system that supports replicating data fragments and downloading the replicated fragments in parallel. The performance of the parallel downloading system is compared with that of a non-parallel downloading system under three scheduling heuristics: shortest turnaround time (STT), least relative load (LRL), and data present (DP). Our simulation results show that the proposed parallel download approach greatly improves data grid performance for all three scheduling algorithms, as measured by the geometric mean of job turnaround time. The advantage of the parallel downloading system is most pronounced when the data grid has relatively low network bandwidth and relatively high computing power.
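To make the evaluation metric and the intuition behind parallel download concrete, the following minimal sketch computes the geometric mean of job turnaround times and compares an idealized single-source download with an idealized parallel download. It is an illustration only, not the simulator used in this paper: the function names, the units (MB and MB/s), the assumption that turnaround time is download time plus compute time, and the assumption that fragments are split across replicas in proportion to bandwidth are all hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def single_source_time(size_mb, bandwidths):
    """Assumed baseline: fetch the whole file from the fastest replica."""
    return size_mb / max(bandwidths)


def parallel_time(size_mb, bandwidths):
    """Assumed ideal case: fragments are assigned to replicas in proportion
    to their bandwidth, so all transfers finish at the same moment."""
    return size_mb / sum(bandwidths)


def turnaround(download_s, compute_s):
    """Assumed model: a job waits for its input data, then computes."""
    return download_s + compute_s


if __name__ == "__main__":
    size_mb = 1200.0
    bandwidths = [10.0, 20.0, 30.0]  # MB/s per replica site (made-up values)
    for compute_s in (10.0, 100.0):  # fast vs. slow compute element
        single = turnaround(single_source_time(size_mb, bandwidths), compute_s)
        parallel = turnaround(parallel_time(size_mb, bandwidths), compute_s)
        print(f"compute={compute_s:.0f}s single={single:.0f}s "
              f"parallel={parallel:.0f}s ratio={single / parallel:.2f}")
```

Under these assumptions, the ratio of single-source to parallel turnaround time is larger when the compute time is small relative to the download time, which is consistent with the observation that parallel download helps most when bandwidth is low and computing power is high.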