Conference Paper

Design and analysis of a dynamic scheduling strategy with resource estimation for large-scale Grid systems

Department of Electrical & Computer Engineering, National University of Singapore, Tumasik, 00, Singapore
DOI: 10.1109/GRID.2004.19 Conference: Grid Computing, 2004. Proceedings. Fifth IEEE/ACM International Workshop on
Source: IEEE Xplore

ABSTRACT In this paper, we present a resource-conscious dynamic scheduling strategy for handling large-volume, computationally intensive loads in a grid system involving multiple sources and sinks (processing nodes). We consider a "pull-based" strategy, wherein the processing nodes request load from the sources. We employ the Incremental Balancing Strategy (IBS) algorithm proposed in the literature and propose a buffer estimation strategy to derive the optimal load distribution. We consider non-time-critical loads that arrive at arbitrary times, with time-varying buffer availability at the sinks, and use buffer reclamation techniques to schedule the loads. We demonstrate the detailed workings of the proposed algorithm with illustrative examples, using real-life parameters derived from the STAR experiments at BNL for scheduling large-volume loads.

  • ABSTRACT: Publisher’s description: This book constitutes the refereed proceedings of the Third International Conference on Information Systems, Technology and Management, ICISTM 2009, held in Ghaziabad, India, in March 2009. The 30 revised full papers presented together with 4 keynote papers were carefully reviewed and selected from 79 submissions. The papers are organized in topical sections on storage and retrieval systems; data mining and classification; managing digital goods and services; scheduling and distributed systems; advances in software engineering; case studies in information management; algorithms and workflows; authentication and detection systems; recommendation and negotiation; secure and multimedia systems; as well as 14 extended poster abstracts. The articles of this volume will not be reviewed individually.
  • ABSTRACT: Current High Performance Computing (HPC) applications have seen an explosive growth in the size of data in recent years. Many application scientists have initiated efforts to integrate data-intensive computing into computation-intensive HPC facilities, particularly for data analytics. We have observed several scientific applications that must migrate their data from an HPC storage system to a data-intensive one for analytics. There is a gap between the data semantics of HPC storage and those of data-intensive systems; hence, once migrated, the data must be further refined and reorganized. This reorganization must be performed before existing data-intensive tools such as MapReduce can be used to analyze the data. It requires at least two complete scans through the data set and then at least one MapReduce program to prepare the data before analyzing it. Running multiple MapReduce phases causes significant overhead for the application in the form of excessive I/O operations: for every MapReduce phase, a distributed read and write operation on the file system must be performed. Our contribution is to develop a MapReduce-based framework for HPC analytics that eliminates the multiple scans and also reduces the number of data-preprocessing MapReduce programs. We also implement a data-centric scheduler to further improve the performance of HPC analytics MapReduce programs by maintaining data locality. We have added expressiveness to the MapReduce language to allow application scientists to specify the logical semantics of their data such that 1) the data can be analyzed without running multiple data-preprocessing MapReduce programs, and 2) the data can be simultaneously reorganized as it is migrated to the data-intensive file system. Using our augmented MapReduce system, MapReduce with Access Patterns (MRAP), we have demonstrated up to 33 percent throughput improvement in one real application, and up to 70 percent in an I/O kernel of another application. Our results for scheduling show up to 49 percent improvement for an I/O kernel of a prevalent HPC analysis application.
    IEEE Transactions on Parallel and Distributed Systems 01/2013; 24(1):158-169. DOI:10.1109/TPDS.2012.88 · 2.17 Impact Factor
  • ABSTRACT: The widespread availability of high-speed internet access has brought about a migration of computation from local company-owned servers and personal computers to shared resources or on-demand platforms. Rather than performing computation on local machines, more organizations are utilizing pooled computational resources, e.g., grid computing, or software provided as an on-demand service, e.g., cloud computing. These environments are open in that no single entity has control or full knowledge of outcomes. Entities are owned and deployed by different organizations or individuals, who have conflicting interests. These entities can be modeled as self-interested agents with private information. The design of systems deployed in open environments must be aligned with the agents’ incentives to ensure desirable outcomes. I propose open mechanism design, an open infrastructure model in which anyone can own resources and deploy mechanisms to support automated decision making and
