Conference Paper

An adaptive strategy for scheduling data-intensive applications in Grid environments


Abstract

Data-intensive applications are becoming increasingly common in Grid environments. These applications require enormous volumes of data for their computation. Most conventional meta-scheduling approaches are aimed at computation-intensive applications and do not take the data requirements of applications into account, leading to poor performance. Efficient scheduling of data-intensive applications in Grid environments is a challenging problem. In addition to processor utilization and average turnaround time, it is important to consider the worst-case turnaround time in evaluating the performance of Grid scheduling strategies. In this paper, we propose an adaptive scheduling scheme that takes into account both the computational and the data requirements of jobs when making scheduling decisions. In our scheme, data transfer is treated on par with computation and considered explicitly during scheduling. Jobs are dispatched to the sites that are optimal in terms of both data transfer time and computation time. In addition, our scheme overlaps a job's data transfer time with its own queuing time and with other jobs' computation time as much as possible. Trace-based simulations show that the proposed scheme gains significant performance benefits for data-intensive jobs.
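As an illustration of the decision rule the abstract describes, here is a minimal sketch (all names hypothetical; `Site`, `Job`, and the time estimators are assumptions, not the authors' implementation). A job is dispatched to the site minimizing estimated turnaround, where staging overlaps the queue wait so only the transfer time exceeding the wait delays the job's start:

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    queue_wait: float    # estimated seconds until a new job would start
    compute_rate: float  # relative speed factor of this site
    bandwidth: float     # bytes/sec from the job's data source to this site

@dataclass
class Job:
    work: float          # compute demand (seconds on a unit-speed site)
    data_size: float     # bytes that must be staged in before execution

def estimated_turnaround(job: Job, site: Site) -> float:
    transfer = job.data_size / site.bandwidth
    compute = job.work / site.compute_rate
    # Data staging proceeds while the job waits in the queue, so only
    # the transfer time that exceeds the queue wait delays the start.
    start_delay = max(site.queue_wait, transfer)
    return start_delay + compute

def pick_site(job: Job, sites: list) -> Site:
    return min(sites, key=lambda s: estimated_turnaround(job, s))
```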


... For data-intensive applications, most of the running time is spent on data transfer [25][26][27], whereas computation-intensive jobs spend most of their running time processing data [28]. In fact, some applications do not belong strictly to either of these two classes; they require data and computational resources in comparable proportions to be executed. ...
Article
Full-text available
Cloud computing is one of the most important approaches for business operations in today's industry. Characteristics of the cloud such as on-demand capabilities, measured service, virtualization, and rapid elasticity make it attractive to scientific organizations. With an increasing number of users and jobs, optimal job scheduling becomes a strenuous process. Most available scheduling techniques in the cloud concentrate on only one job type, which can be data-intensive or computation-intensive. However, scheduling based on a single job type is not appropriate for all environments and can waste resources of the other kind. To address the problem of taking both job types into account simultaneously, a cost-based job scheduling (CJS) algorithm is proposed in this paper. The CJS algorithm uses data, processing power, and network characteristics in the job allocation process. Finally, we conducted simulations using the CloudSim toolkit and compared CJS with existing algorithms such as FUGE, Berger, MQS, and HPSO. The CJS method reduces the response time of submitted jobs, which may consist of both data-intensive and computation-intensive jobs.
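The paper's CJS algorithm is not reproduced here; purely as an illustration, a per-job cost that combines processing and network terms, with all names and weights hypothetical, might look like:

```python
def job_cost(cpu_demand, data_size, vm_mips, net_bandwidth,
             w_cpu=1.0, w_net=1.0):
    """Estimated cost of one job on one VM: execution time weighed
    against the time to move the job's data to that VM (sketch)."""
    exec_time = cpu_demand / vm_mips           # seconds of computation
    transfer_time = data_size / net_bandwidth  # seconds of data movement
    return w_cpu * exec_time + w_net * transfer_time

def assign(jobs, vms):
    # Send each job to the VM with the lowest combined cost, so
    # data-intensive jobs favor well-connected VMs and
    # computation-intensive jobs favor fast VMs.
    return {j["id"]: min(vms, key=lambda v: job_cost(
        j["cpu"], j["data"], v["mips"], v["bw"]))["id"] for j in jobs}
```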
... This approach concentrates on user satisfaction by improving resource utilization and throughput; it is both system-centric and application-centric [1,15]. User satisfaction is achieved by allocating the most suitable resources to jobs without missing their expected completion times. ...
Article
Full-text available
Due to the rapid evolution of grid computing, which deals with the effective utilization of globally distributed computer resources to solve massive problems, grid scheduling is a major focus. Efficient scheduling algorithms are needed to exploit the unused CPU cycles distributed across various geographic locations. Existing job scheduling algorithms in grid computing have mainly concentrated on system performance rather than user satisfaction. In this paper we present a new prioritized user demand algorithm that focuses on better meeting the deadlines of statically available jobs, as expected by the users. The algorithm also concentrates on better utilization of the available heterogeneous resources. Performance analysis shows that the prioritized user demand algorithm outperforms other heuristic scheduling algorithms in terms of makespan and resource utilization rate.
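A hedged sketch of a deadline-prioritized scheme of this kind (not the authors' exact algorithm): order the statically known jobs by user deadline and greedily place each on the resource that finishes it soonest:

```python
def schedule_by_deadline(jobs, resources):
    """Earliest-deadline-first placement over heterogeneous resources.
    `jobs` and `resources` are dicts with illustrative fields; tracks
    each resource's busy-until time and records whether the chosen
    placement meets the user's deadline."""
    busy = {r["id"]: 0.0 for r in resources}
    plan = []
    for job in sorted(jobs, key=lambda j: j["deadline"]):
        best = min(resources,
                   key=lambda r: busy[r["id"]] + job["length"] / r["speed"])
        finish = busy[best["id"]] + job["length"] / best["speed"]
        busy[best["id"]] = finish
        plan.append((job["id"], best["id"], finish, finish <= job["deadline"]))
    return plan
```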
Article
To make cloud computing satisfy the quality of service (QoS) of scheduled tasks while maximizing service profit, a cost-driven task scheduling strategy for cloud computing is proposed from the viewpoint of cloud service providers. Under the QoS constraints of the submitted tasks, the scheduling objective is to maximize the service profit per unit of computing spending in the cloud environment. A corresponding task scheduling model is established, and a genetic algorithm is adopted to optimize the scheduling objective in polynomial time. Simulations were conducted on the CloudSim simulator. The results show that the proposed method outperforms the conventional Min-min algorithm and an improved QoS-constrained Min-min algorithm in terms of scheduling makespan, the ratio of deadline violations, and the service profit per unit of computing spending.
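The genetic algorithm itself is standard; what is specific here is the objective. A sketch of a fitness function of that shape (field names such as `price_per_sec` and `revenue` are hypothetical, and deadline violations are modeled as forfeited revenue):

```python
def fitness(schedule, tasks, vms):
    """Service profit per unit of computing spending for one candidate
    schedule (a dict mapping task id -> vm id); illustrative only."""
    profit, spending = 0.0, 0.0
    finish = {vm_id: 0.0 for vm_id in vms}
    for task_id, vm_id in schedule.items():
        t, v = tasks[task_id], vms[vm_id]
        run = t["length"] / v["mips"]
        finish[vm_id] += run                     # tasks run back to back
        spending += run * v["price_per_sec"]     # cost of compute time
        if finish[vm_id] <= t["deadline"]:
            profit += t["revenue"]               # on-time tasks earn revenue
    return profit / spending if spending else 0.0
```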
Article
Effective task scheduling has attracted widespread interest in heterogeneous Grid computing environments, and quality of service (QoS) is an important factor in scheduling efficiency. Considering application tasks with deadline and budget constraints, a constrained Multi-Objective Grid Task Collaborative Scheduling Model (MOC-CSM) is proposed to address the involved QoS requirements. To meet the differing demands of different users by adjusting the weights of the objectives, the multi-objective optimization is transformed into a single-objective optimization using a membership-degree function and solved with a new genetic algorithm with newly designed evolution operators. The approximate optimal solution is then used to assist the scheduling decision. Simulations conducted on GridSim show that MOC-CSM outperforms the conventional Min-Min scheduling algorithm and the improved QoS-guided Min-Min scheduling algorithm in average makespan, rate of deadline violation, and average scheduling cost under the same conditions.
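The weighted transformation to a single objective can be pictured as a weighted sum of normalized objectives; the paper uses a membership-degree function, so the simple min-max normalization below is only an assumption:

```python
def scalarize(makespan, cost, deadline_violations, weights, bounds):
    """Collapse three objectives into one score in [0, 1], lower is
    better. `bounds` maps each objective name to (best, worst) values
    used for min-max normalization; `weights` should sum to 1 and
    encode a user's relative preferences."""
    objs = {"makespan": makespan, "cost": cost,
            "violations": deadline_violations}
    score = 0.0
    for name, value in objs.items():
        best, worst = bounds[name]
        norm = (value - best) / (worst - best) if worst > best else 0.0
        score += weights[name] * min(max(norm, 0.0), 1.0)
    return score
```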
Article
Meta-scheduling systems play a crucial role in scheduling jobs that are submitted for execution, and they require special attention because an increasing number of jobs are being executed on a limited number of resources. The primary problem of meta-scheduling is selecting the best resources (sites) on which to execute the underlying jobs while achieving the following objectives: reducing the mean job turnaround time, ensuring site load balance, and respecting job priorities. We introduce an enhanced meta-scheduling system, called Job Nature Meta-scheduling over Grid (JNMgrid), that achieves these objectives. JNMgrid consists of three components: (1) the Job Analyzer and Monitor, which determines the types of jobs in specific ratios; (2) the Job Decider, which matches jobs with the appropriate resources; and (3) the Job Batcher, which determines the best number of jobs for execution. The performance of JNMgrid is compared with similar existing systems, such as Random, Queue Length, File Access Cost, and File Access Cost + Job Queue Access Cost. The simulation results demonstrate that JNMgrid outperforms these systems and can thus be deployed in any grid middleware to improve the sharing of limited resources among grid users.
Chapter
Full-text available
A large number of approaches to the modeling and solution of job shop scheduling problems have been reported in the OR literature, with varying degrees of success. These approaches revolve around a series of technological advances that have occurred over the last 35 years, including mathematical programming, dispatching rules, expert systems, neural networks, genetic algorithms, and inductive learning. In this article, we take an evolutionary view in describing how these technologies have been applied to job shop scheduling problems. To do this, we discuss a few of the most important contributions in each of these technology areas and the most recent trends.
Conference Paper
Full-text available
In this paper we report on preliminary work and architectural design carried out in the "Data Management" work package of the International Data Grid project. Our aim, within a time scale of three years, is to provide Grid middleware services supporting the I/O-intensive, world-wide distributed next-generation experiments in High-Energy Physics, Earth Observation and Bioinformatics. The goal is to specify, develop, integrate and test tools and middleware infrastructure to coherently manage and share Petabyte-range information volumes in high-throughput production-quality Grid environments. The middleware will allow secure access to massive amounts of data in a universal name-space, to move and replicate data at high speed from one geographical site to another, and to manage synchronisation of remote copies. We pay particular attention to clearly specifying and categorising existing work on the Grid, especially data management in Grid-related projects. Challenging use cases are described, along with how they map to architectural decisions concerning data access, replication, meta data management, security and query optimisation.
Conference Paper
Full-text available
Even though middleware support for grid computing has been the subject of extensive research, scheduling policies for the grid context have not been much studied. In addition to processor utilization, it is important to consider the response times of jobs in evaluating the performance of grid scheduling strategies. In this paper we propose distributed scheduling algorithms that use multiple simultaneous requests at different sites. Trace-based simulations show that the use of multiple simultaneous requests provides significant performance benefits. We also show how this scheme can be adapted to provide priority to local jobs, without much loss of performance.
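The core mechanism, queuing the same job at several sites at once and cancelling the duplicates when one copy starts, can be sketched as follows (class and method names such as `enqueue` and `cancel` are hypothetical):

```python
def submit_everywhere(job, sites, k=3):
    """Place the job in the queues of the k most lightly loaded sites
    and return one handle per pending request."""
    chosen = sorted(sites, key=lambda s: s.queue_length)[:k]
    return [site.enqueue(job) for site in chosen]

def on_job_start(starting_handle, handles):
    # The site that dequeues the job first wins; withdraw the
    # duplicate requests still waiting at the other sites.
    for h in handles:
        if h is not starting_handle:
            h.cancel()
```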
Conference Paper
Full-text available
The Grid paradigm implies the sharing of a variety of resources across multiple administrative domains. In order to execute a work-flow on these distributed resources, an instrument is needed to co-allocate resources by reaching agreements with the different local scheduling systems involved. Apart from the compute resources that execute the work-flow, the co-ordinated usage of other resource types must also be guaranteed, for example network connectivity with dedicated QoS parameters or a visualisation device. We present a Web Service-based MetaScheduling Service that negotiates a common time slot with local resource management systems to enable the execution of a distributed work-flow. A successful negotiation results in a formal agreement based on the WS-Agreement recommendation currently being specified by the GRAAP working group of the Global Grid Forum. As a use case, we demonstrate the integration of this MetaScheduling Service into the UNICORE middleware.
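One way to picture the negotiation outcome, an earliest slot that every local scheduler can honor simultaneously, is the sketch below; the real service exchanges WS-Agreement messages rather than calling a function like this:

```python
def earliest_common_slot(offers, duration):
    """Each local scheduler offers a list of free (start, end) windows.
    Return the earliest time at which every resource can host
    `duration` units of work simultaneously, or None. It suffices to
    test window starts, since any common slot begins at one of them."""
    candidates = sorted(start for windows in offers for start, _ in windows)
    for t in candidates:
        if all(any(s <= t and t + duration <= e for s, e in windows)
               for windows in offers):
            return t
    return None

# Example: two resources and a 2-hour workflow step (times in hours).
offers = [[(0, 3), (5, 10)], [(2, 6), (8, 12)]]
print(earliest_common_slot(offers, 2))  # -> 8 (both free in [8, 10])
```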
Article
Full-text available
Data-intensive applications often require exploratory analysis of large datasets. If analysis is performed on distributed resources, data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, thus allowing faster response to subsequent requests that refer to the same data; when demand drops, resources are released. This approach can provide the benefits of dedicated hardware without the associated high costs, depending on workload and resource characteristics. The approach is reminiscent of cooperative caching, web-caching, and peer-to-peer storage systems, but addresses different application demands. Other data-aware scheduling approaches assume dedicated resources, which can be expensive and/or inefficient if load varies significantly. To explore the feasibility of the data diffusion approach, we have extended the Falkon resource provisioning and task scheduling system to support data caching and data-aware scheduling. Performance results from both micro-benchmarks and a large scale astronomy application demonstrate that our approach improves performance relative to alternative approaches, as well as provides improved scalability as aggregated I/O bandwidth scales linearly with the number of data cache nodes.
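A toy rendition of the dispatch policy described (not Falkon's actual API): tasks go to workers that already cache their input, the pool grows under load, and replicas are created wherever tasks land:

```python
def dispatch(task, workers, cache, max_workers, acquire):
    """Pick a worker for `task`. `workers` maps worker id -> queue
    depth; `cache` maps worker id -> set of cached files; `acquire`
    is a callable returning a fresh worker id. The caller enqueues
    the task on the returned worker (workers[target] += 1)."""
    holders = [w for w in workers if task["input"] in cache.get(w, set())]
    if holders:
        # Data-aware choice: schedule close to an existing replica.
        return min(holders, key=lambda w: workers[w])
    if all(q > 0 for q in workers.values()) and len(workers) < max_workers:
        workers[acquire()] = 0  # demand exceeds supply: grow the pool
    target = min(workers, key=lambda w: workers[w])
    cache.setdefault(target, set()).add(task["input"])  # replicate on demand
    return target
```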
Conference Paper
Full-text available
"Grid" computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation. In this article, we define this new field. First, we review the "Grid problem," which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources-what we refer to as virtual organizations. In such settings, we encounter unique authentication, authorization, resource access, resource discovery, and other challenges. It is this class of problem that is addressed by Grid technologies. Next, we present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. We describe requirements that we believe any such mechanisms must satisfy, and we discuss the central role played by the intergrid protocols that enable interoperability among different Grid systems. Finally, we discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. We maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.
Conference Paper
Full-text available
We are looking at the problem of scheduling compute tasks on a cluster of servers. These tasks require files that reside on a remote archive, and may also be cached on some subset of the servers. A task can only be run on a server that has the files it requires. This introduces the problem of scheduling data movement in coordination with the scheduling of computation. Our goal is to maximize throughput while minimizing data movement. FIFO scheduling is not efficient in this situation due to its lack of awareness of the data movement required. We looked at two other strategies, called shortest job first and linear programming based optimization, and compared them under various configurations.
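Folding the required data movement into each job's length estimate makes the shortest-job-first strategy concrete; the cost model below (cached files are free, missing files come from the remote archive) is an assumed simplification:

```python
def shortest_job_first(pending, server_caches, archive_bw):
    """Order tasks by their best-case estimated time across servers.
    `server_caches` maps server id -> set of cached files; files
    missing on a server must be fetched from the archive at
    `archive_bw` bytes/sec (illustrative model)."""
    def est(task):
        def on(server):
            missing = [f for f in task["files"]
                       if f not in server_caches[server]]
            move = sum(task["size"][f] for f in missing) / archive_bw
            return task["cpu_time"] + move
        return min(on(s) for s in server_caches)
    return sorted(pending, key=est)
```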
Conference Paper
Full-text available
As with any computer system, the performance of supercomputers depends upon the workloads that serve as their input. Unfortunately, however, there are many important aspects of the supercomputer workloads that have not been modeled, or that have been modeled only incipiently. This paper attacks this problem by considering requested time (and its relation with execution time) and the possibility of job cancellation, two aspects of the supercomputer workload that have not been modeled yet. Moreover, we also improve upon existing models for the arrival instant and partition size.
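A sketch of the two quantities being modeled, a requested time drawn as a random overestimate of the actual runtime plus a per-job cancellation flag; the distributions below are placeholders, not the paper's fitted models:

```python
import random

def synthetic_job(mean_runtime=3600.0, cancel_prob=0.1):
    """Generate one synthetic job: actual runtime, user-requested time
    (an overestimate by a random factor), and a cancellation flag."""
    runtime = random.expovariate(1.0 / mean_runtime)
    requested = runtime * random.uniform(1.0, 10.0)  # users over-request
    cancelled = random.random() < cancel_prob
    return {"runtime": runtime, "requested": requested,
            "cancelled": cancelled}
```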
Article
Full-text available
Scheduling jobs on the IBM SP2 system and many other distributed-memory MPPs is usually done by giving each job a partition of the machine for its exclusive use. Allocating such partitions in the order in which the jobs arrive (FCFS scheduling) is fair and predictable, but suffers from severe fragmentation, leading to low utilization. This situation led to the development of the EASY scheduler, which uses aggressive backfilling: small jobs are moved ahead to fill in holes in the schedule, provided they do not delay the first job in the queue. We compare this approach with a more conservative approach in which small jobs move ahead only if they do not delay any job in the queue, and show that the relative performance of the two schemes depends on the workload. For workloads typical on SP2 systems, the aggressive approach is indeed better, but for other workloads both algorithms are similar. In addition, we study the sensitivity of backfilling to the accuracy of the runtime estimates provided by the users and find a very surprising result: backfilling actually works better when users overestimate the runtime by a substantial factor.
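The aggressive (EASY) rule admits a compact statement: a queued job may jump ahead only if it can start now and will not delay the reserved start of the queue's head job. A sketch, with the inputs assumed to be precomputed by the scheduler:

```python
def can_backfill(job, free_now, spare_at_reservation, head_start, now):
    """EASY backfilling test. `free_now` is the number of idle
    processors; `spare_at_reservation` is how many processors will
    remain beyond the head job's needs at its reserved start time;
    `job["estimate"]` is the user-supplied runtime estimate."""
    if job["procs"] > free_now:
        return False                       # cannot start right now
    ends_in_time = now + job["estimate"] <= head_start
    fits_beside_head = job["procs"] <= spare_at_reservation
    # Either the job finishes before the head job's reservation, or it
    # only uses processors the head job will not need then.
    return ends_in_time or fits_beside_head
```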
Article
Full-text available
…multiple resources respectively. In the multistage flow shop problem, each job consists of several tasks which require processing by distinct resources, but there is a common route for all jobs. Finally, in the multistage job shop situation, alternative resource sets and routes can be chosen, possibly for the same job, allowing the production of different part types. The third dimension, scheduling criteria, states the desired objectives to be met; "they are numerous, complex, and often conflicting" (2). Some commonly used scheduling criteria include the following: 1. minimize total tardiness; 2. minimize the number of late jobs; 3. maximize system/resource utilization; 4. minimize in-process inventory; 5. balance resource usage; 6. maximize production rate. The fourth dimension, parameter variability, indicates the degree of uncertainty of the various parameters of the scheduling problem. If the degree of uncertainty is insignificant (i.e., "the uncertainty in the various quantiti…
Article
Full-text available
The next generation of scientific experiments and studies, popularly called e-Science, is carried out by large collaborations of researchers distributed around the world engaged in the analysis of huge collections of data generated by scientific instruments. Grid computing has emerged as an enabler for e-Science as it permits the creation of virtual organizations that bring together communities with common objectives. Within a community, data collections are stored or replicated on distributed resources to enhance storage capability or efficiency of access. In such an environment, scientists need to have the ability to carry out their studies by transparently accessing distributed data and computational resources. In this paper, we propose and develop a Grid broker that mediates access to distributed resources by (a) discovering suitable data sources for a given analysis scenario, (b) discovering suitable computational resources, (c) optimally mapping analysis jobs to resources, (d) deploying and monitoring job execution on selected resources, (e) accessing data from local or remote data sources during job execution, and (f) collating and presenting results. The broker supports a declarative and dynamic parametric programming model for creating grid applications. We have used this model in grid-enabling a high energy physics analysis application (Belle Analysis Software Framework). The broker has been used in deploying Belle experiment data analysis jobs on a grid testbed, called Belle Analysis Data Grid, with resources distributed across Australia interconnected through GrangeNet.
Article
The unbounded increase in the computation and data requirements of scientific applications has necessitated the use of widely distributed compute and storage resources to meet the demand. In a widely distributed environment, data is no longer locally accessible and thus has to be retrieved and stored remotely. Efficient and reliable access to data sources and archiving destinations in such an environment brings new challenges. Placing data on temporary local storage devices offers many advantages, but such "data placements" also require careful management of storage resources and data movement, i.e. allocating storage space, staging-in of input data, staging-out of generated data, and de-allocation of local storage after the data is safely stored at the destination. Traditional systems closely couple data placement and computation, and consider data placement a side effect of computation. Data placement is either embedded in the computation, delaying it, or performed via simple scripts that do not have the privileges of a job. The insufficiency of traditional systems and existing CPU-oriented schedulers in dealing with this complex data handling problem has given rise to a new era: the data-aware schedulers. One of the first examples of such schedulers is the Stork data placement scheduler. In this paper, we discuss the limitations of traditional schedulers in handling the challenging data scheduling problem of large-scale distributed applications, give our vision for the new paradigm in data-intensive scheduling, and elaborate on our case study: the Stork data placement scheduler.
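The shift described, data placement as a first-class job rather than a side effect of computation, is often expressed as a small DAG whose transfer steps are schedulable in their own right; the job format below is illustrative, not Stork's actual job language:

```python
# Each step is a schedulable job in its own right: the data-aware
# scheduler can queue, retry, and order the transfers independently
# of the CPU scheduler that runs the compute step.
workflow = [
    {"id": "stage_in", "type": "transfer",
     "src": "gsiftp://archive/input.dat", "dst": "file:///scratch/input.dat"},
    {"id": "compute", "type": "compute",
     "exe": "/bin/analyze", "args": ["/scratch/input.dat", "/scratch/out.dat"],
     "after": ["stage_in"]},
    {"id": "stage_out", "type": "transfer",
     "src": "file:///scratch/out.dat", "dst": "gsiftp://archive/out.dat",
     "after": ["compute"]},
]

def runnable(done):
    """Steps whose dependencies are all complete (done is a set of ids)."""
    return [s for s in workflow
            if s["id"] not in done
            and all(d in done for d in s.get("after", []))]
```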
Article
The job-shop problem is one of the most difficult classical scheduling problems. An instance with ten jobs to be processed on ten machines formulated in 1963 was open for more than 25 years. It was finally solved by a branch-and-bound algorithm. Very simple special cases of the job-shop problem are already strongly NP-hard. After a short review of these old challenges we consider practical applications like problems in flexible manufacturing, multiprocessor task scheduling, robotic cell scheduling, railway scheduling, air traffic control which all have an underlying job-shop structure. Methods to solve these problems and new challenges in connection with them are indicated.
Conference Paper
This paper proposes extensions to the backfilling job-scheduling algorithm that significantly improve its performance. We introduce variations that sort the "backfilling order" in priority-based and randomized fashions. We examine the effectiveness of guarantees present in conservative backfilling and find that initial guarantees have limited practical value, while the performance of a "no-guarantee" algorithm can be significantly better when combined with extensions that we introduce. Our study differs from many similar studies in using traces that contain user estimates. We find that actual overestimates are large and significantly different from simple models. We propose the use of speculative backfilling and speculative test runs to counteract these large overestimations. Finally, we explore the impact of dynamic, system-directed adaptation of application parallelism. The cumulative improvements of these techniques decrease the bounded slowdown, our primary metric, to less than 15% of conservative backfilling.
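The "backfilling order" variations are easy to picture: rather than scanning the wait queue in FCFS order for backfill candidates, sort or shuffle it first (the priority key below is an assumption):

```python
import random

def backfill_candidates(queue, order="priority"):
    """Return queued jobs in the order they should be tried for
    backfilling. 'fcfs' is the classic scan; the priority-based and
    randomized variations reorder it."""
    if order == "fcfs":
        return list(queue)
    if order == "priority":
        return sorted(queue, key=lambda j: (-j["priority"], j["submit_time"]))
    if order == "random":
        shuffled = list(queue)
        random.shuffle(shuffled)
        return shuffled
    raise ValueError(order)
```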
Conference Paper
Grid computing is evolving in two directions. The Computational Grid focuses on reducing the execution time of applications that require a great number of processing cycles. The Data Grid provides a way to solve large-scale data management problems. Data-intensive applications such as high-energy physics and bioinformatics require both Computational and Data Grid features. Job scheduling in Grids has mostly been discussed from the perspective of the Computational Grid; scheduling on the Data Grid has become a focus of Grid computing activities only recently. In a Data Grid environment, an effective scheduling mechanism that considers both computational and data storage resources must be provided for large-scale data-intensive applications. In this paper, we describe a new scheduling model that considers both the amount of computational resources and data availability in a Data Grid environment. We implemented a scheduler, called Chameleon, based on the proposed application scheduling model. Chameleon shows performance improvements for data-intensive applications that require both a large number of processors and data replication mechanisms. The results achieved with Chameleon are presented.
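A sketch of a Chameleon-style site ranking that weighs data availability against compute capacity (all field names hypothetical; not the authors' implementation):

```python
def rank_sites(job, sites, replica_catalog, wan_bw):
    """Prefer sites already holding a replica of the job's dataset
    (no staging needed) and with enough free processors; otherwise
    charge the WAN transfer and queue wait against the site's score."""
    def score(site):
        has_replica = site["name"] in replica_catalog.get(job["dataset"], ())
        staging = 0.0 if has_replica else job["data_size"] / wan_bw
        wait = 0.0 if site["free_cpus"] >= job["procs"] else site["queue_wait"]
        return staging + wait + job["work"] / site["speed"]
    return sorted(sites, key=score)
```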
Conference Paper
Computer system batch schedulers typically require information from the user upon job submission, including a runtime estimate. Inaccuracy of these runtime estimates, relative to the actual runtime of the job, has been well documented and is a perennial problem mentioned in the job scheduling literature. Typically users provide these estimates under circumstances where their job will be killed after the provided amount of time elapses. Also, users may be unaware of the potential benefits of providing accurate estimates, such as increased likelihood of backfilling. This study examines user behavior when the threat of job killing is removed, and when a tangible reward is provided for accuracy. We show that under these conditions, about half of users provide an improved estimate, but there is not a substantial improvement in the overall average accuracy.
Article
In high energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints (e.g., resource utilization, response time, global and local allocation policies) while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources.
The ANL/IBM SP Scheduling System
  • D Lifka
[23] perfSONAR: www.perfsonar.net/
[24] NWS: http://nws.cs.ucsb.edu/ewiki/

A Comprehensive Model of the Supercomputer Workload
  • Walfredo Cirne
  • Fran Berman
Walfredo Cirne and Fran Berman, A Comprehensive Model of the Supercomputer Workload, in Proc. of the IEEE 4th Annual Workshop on Job Scheduling Strategies for Parallel Processing, Cambridge, MA, 2001.