Conference Proceeding

Impact of Parallel Download on Job Scheduling in Data Grid Environment

Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore
11/2008; DOI:10.1109/GCC.2008.57. In proceedings of: Seventh International Conference on Grid and Cooperative Computing (GCC '08), 2008
Source: IEEE Xplore

ABSTRACT Data-intensive applications, such as those in high energy physics, usually have a large amount of input data that requires analysis. These data are often shared and replicated across the data grid. As computing power increases, the delay caused by "waiting for input data" will become more pronounced. In this paper, we study the impact of parallel download on job scheduler performance in a data grid environment. We present a parallel downloading system that supports replicating data fragments and downloading the replicated fragments in parallel. Its performance is compared with that of a non-parallel downloading system under three scheduling heuristics: shortest turnaround time (STT), least relative load (LRL), and data present (DP). Our simulation results show that the proposed parallel download approach greatly improves data grid performance for all three scheduling algorithms, as measured by the geometric mean of job turnaround time. The advantage of the parallel downloading system is most pronounced when the data grid has relatively low network bandwidth and relatively high computing power.
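The abstract does not spell out the transfer model or how the metric is computed, so the following is only a rough sketch under stated assumptions. It compares fetching a whole file from one replica site with fetching fragments from several replica sites at once, and computes the geometric mean of a set of job turnaround times. The function names, the units (MB and MB/s), and the idealized model (no latency, no link contention, fragment sizes proportional to each link's bandwidth) are all hypothetical and are not taken from the paper.

```python
import math


def single_source_time(size_mb, bandwidth_mbps):
    """Seconds to fetch the whole file from one replica site (assumed model)."""
    return size_mb / bandwidth_mbps


def parallel_time(size_mb, bandwidths_mbps):
    """Seconds to fetch the file as fragments from several replica sites at once.

    Assumption: each site serves a fragment sized in proportion to its
    bandwidth, so all transfers finish together and the aggregate rate is
    the sum of the individual rates. Latency and contention are ignored.
    """
    return size_mb / sum(bandwidths_mbps)


def geometric_mean(turnaround_times):
    """Geometric mean of positive job turnaround times."""
    if not turnaround_times or any(t <= 0 for t in turnaround_times):
        raise ValueError("turnaround times must be non-empty and positive")
    log_sum = sum(math.log(t) for t in turnaround_times)
    return math.exp(log_sum / len(turnaround_times))


if __name__ == "__main__":
    size = 1000              # file size in MB (illustrative value)
    links = [10, 5, 5]       # per-replica bandwidth in MB/s (illustrative values)
    print("single source:", single_source_time(size, max(links)), "s")
    print("parallel:     ", parallel_time(size, links), "s")
    print("geometric mean of [100, 400]:", geometric_mean([100, 400]))
```

Under this simplified model the benefit of parallel download grows as individual links get slower relative to the compute time of a job, which is consistent with the abstract's remark about low network bandwidth and high computing power.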


Keywords

computing power increases
 
data fragments
 
data grid
 
data grid environment
 
data grid performance
 
Data intensive applications
 
data present
 
input data
 
job scheduler performance
 
job turnaround time
 
large amount
 
non-parallel downloading system
 
parallel download
 
parallel downloading system
 
proposed parallel download approach
 
replicated data fragments
 
scheduling heuristics
 
shortest turnaround time
 
simulation results
 
three scheduling algorithms