Conference Paper

An adaptive strategy for scheduling data-intensive applications in Grid environments


Abstract

Data-intensive applications are becoming increasingly common in Grid environments. These applications require enormous volumes of data for their computation. Most conventional meta-scheduling approaches are aimed at computation-intensive applications and do not take the data requirements of applications into account, leading to poor performance. Efficient scheduling of data-intensive applications in Grid environments is a challenging problem. In addition to processor utilization and average turnaround time, it is important to consider the worst-case turnaround time in evaluating the performance of Grid scheduling strategies. In this paper, we propose an adaptive scheduling scheme that takes into account both the computational and the data requirements of jobs when making scheduling decisions. In our scheme, data transfer is treated on par with computation and considered explicitly during scheduling. Jobs are dispatched to the sites that are optimal in terms of both data transfer time and computation time. In addition, our scheme overlaps a job's data transfer time with its own queuing time and with other jobs' computation time as much as possible. Trace-based simulations show that the proposed scheme gains significant performance benefits for data-intensive jobs.
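As an illustration of the decision rule the abstract describes, here is a minimal sketch (all names hypothetical; `Site`, `Job`, and the time estimators are assumptions, not the authors' implementation). A job is dispatched to the site minimizing estimated turnaround, where staging overlaps the queue wait so only the transfer time exceeding the wait delays the job's start:

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    queue_wait: float    # estimated seconds until a new job would start
    compute_rate: float  # relative speed factor of this site
    bandwidth: float     # bytes/sec from the job's data source to this site

@dataclass
class Job:
    work: float          # compute demand (seconds on a unit-speed site)
    data_size: float     # bytes that must be staged in before execution

def estimated_turnaround(job: Job, site: Site) -> float:
    transfer = job.data_size / site.bandwidth
    compute = job.work / site.compute_rate
    # Data staging proceeds while the job waits in the queue, so only
    # the transfer time that exceeds the queue wait delays the start.
    start_delay = max(site.queue_wait, transfer)
    return start_delay + compute

def pick_site(job: Job, sites: list) -> Site:
    return min(sites, key=lambda s: estimated_turnaround(job, s))
```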


... For data-intensive applications, most of the running time is spent on data transfer [25][26][27], whereas computation-intensive jobs spend most of their running time processing data [28]. In fact, some applications do not belong strictly to either of these two classes; they require data and computational resources in comparable proportions to be executed. ...
Article
Full-text available
Cloud computing is one of the most important approaches for business operations in today's industry. Characteristics of the cloud such as on-demand capabilities, measured service, virtualization, and rapid elasticity make it attractive to scientific organizations. With an increasing number of users and jobs, optimal job scheduling becomes a strenuous process. Most available scheduling techniques in the cloud concentrate on only one job type, which can be data-intensive or computation-intensive. However, scheduling based on a single job type is not appropriate for all environments and can waste resources of the other kind. To address the problem of taking both job types into account simultaneously, a cost-based job scheduling (CJS) algorithm is proposed in this paper. The CJS algorithm uses data, processing power, and network characteristics in the job allocation process. Finally, we conducted simulations using the CloudSim toolkit and compared CJS with existing algorithms such as FUGE, Berger, MQS, and HPSO. The CJS method reduces the response time of submitted jobs, which may consist of both data-intensive and computation-intensive jobs.
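The paper's CJS algorithm is not reproduced here; purely as an illustration, a per-job cost that combines processing and network terms, with all names and weights hypothetical, might look like:

```python
def job_cost(cpu_demand, data_size, vm_mips, net_bandwidth,
             w_cpu=1.0, w_net=1.0):
    """Estimated cost of one job on one VM: execution time weighed
    against the time to move the job's data to that VM (sketch)."""
    exec_time = cpu_demand / vm_mips           # seconds of computation
    transfer_time = data_size / net_bandwidth  # seconds of data movement
    return w_cpu * exec_time + w_net * transfer_time

def assign(jobs, vms):
    # Send each job to the VM with the lowest combined cost, so
    # data-intensive jobs favor well-connected VMs and
    # computation-intensive jobs favor fast VMs.
    return {j["id"]: min(vms, key=lambda v: job_cost(
        j["cpu"], j["data"], v["mips"], v["bw"]))["id"] for j in jobs}
```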
... This approach concentrates on user satisfaction by improving resource utilization and throughput; it is both system-centric and application-centric [1,15]. User satisfaction is achieved by allocating the most suitable resources to jobs without missing their expected completion times. ...
Article
Full-text available
Due to the rapid evolution of grid computing, which deals with the effective utilization of globally distributed computer resources to solve massive problems, grid scheduling is a major focus. Efficient scheduling algorithms are needed to exploit the unused CPU cycles distributed across various geographic locations. Existing job scheduling algorithms in grid computing have mainly concentrated on system performance rather than user satisfaction. In this paper we present a new prioritized user demand algorithm that focuses on better meeting the deadlines of statically available jobs, as expected by the users. The algorithm also concentrates on better utilization of the available heterogeneous resources. Performance analysis shows that the prioritized user demand algorithm outperforms other heuristic scheduling algorithms in terms of makespan and resource utilization rate.
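A hedged sketch of a deadline-prioritized scheme of this kind (not the authors' exact algorithm): order the statically known jobs by user deadline and greedily place each on the resource that finishes it soonest:

```python
def schedule_by_deadline(jobs, resources):
    """Earliest-deadline-first placement over heterogeneous resources.
    `jobs` and `resources` are dicts with illustrative fields; tracks
    each resource's busy-until time and records whether the chosen
    placement meets the user's deadline."""
    busy = {r["id"]: 0.0 for r in resources}
    plan = []
    for job in sorted(jobs, key=lambda j: j["deadline"]):
        best = min(resources,
                   key=lambda r: busy[r["id"]] + job["length"] / r["speed"])
        finish = busy[best["id"]] + job["length"] / best["speed"]
        busy[best["id"]] = finish
        plan.append((job["id"], best["id"], finish, finish <= job["deadline"]))
    return plan
```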
Article
To make cloud computing satisfy the quality of service (QoS) of scheduled tasks while maximizing service profit, a cost-driven task scheduling strategy for cloud computing is proposed from the viewpoint of cloud service providers. Under the QoS constraints of the submitted tasks, the scheduling objective is to maximize the service profit per unit of computing spending in the cloud environment. A corresponding task scheduling model is established, and a genetic algorithm is adopted to optimize the scheduling objective in polynomial time. Simulations were conducted on the CloudSim simulator. The results show that the proposed method outperforms the conventional Min-min algorithm and an improved QoS-constrained Min-min algorithm in terms of scheduling makespan, the ratio of deadline violations, and the service profit per unit of computing spending.
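The genetic algorithm itself is standard; what is specific here is the objective. A sketch of a fitness function of that shape (field names such as `price_per_sec` and `revenue` are hypothetical, and deadline violations are modeled as forfeited revenue):

```python
def fitness(schedule, tasks, vms):
    """Service profit per unit of computing spending for one candidate
    schedule (a dict mapping task id -> vm id); illustrative only."""
    profit, spending = 0.0, 0.0
    finish = {vm_id: 0.0 for vm_id in vms}
    for task_id, vm_id in schedule.items():
        t, v = tasks[task_id], vms[vm_id]
        run = t["length"] / v["mips"]
        finish[vm_id] += run                     # tasks run back to back
        spending += run * v["price_per_sec"]     # cost of compute time
        if finish[vm_id] <= t["deadline"]:
            profit += t["revenue"]               # on-time tasks earn revenue
    return profit / spending if spending else 0.0
```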
Article
Effective task scheduling has attracted widespread interest in heterogeneous Grid computing environments, and quality of service (QoS) is an important factor in scheduling efficiency. Considering application tasks with deadline and budget constraints, a constrained Multi-Objective Grid Task Collaborative Scheduling Model (MOC-CSM) is proposed to address the involved QoS requirements. To meet the differing demands of different users by adjusting the weights of the objectives, the multi-objective optimization is transformed into a single-objective optimization using a membership-degree function and solved with a new genetic algorithm with newly designed evolution operators. The approximate optimal solution is then used to assist the scheduling decision. Simulations conducted on GridSim show that MOC-CSM outperforms the conventional Min-Min scheduling algorithm and the improved QoS-guided Min-Min scheduling algorithm in average makespan, rate of deadline violation, and average scheduling cost under the same conditions.
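The weighted transformation to a single objective can be pictured as a weighted sum of normalized objectives; the paper uses a membership-degree function, so the simple min-max normalization below is only an assumption:

```python
def scalarize(makespan, cost, deadline_violations, weights, bounds):
    """Collapse three objectives into one score in [0, 1], lower is
    better. `bounds` maps each objective name to (best, worst) values
    used for min-max normalization; `weights` should sum to 1 and
    encode a user's relative preferences."""
    objs = {"makespan": makespan, "cost": cost,
            "violations": deadline_violations}
    score = 0.0
    for name, value in objs.items():
        best, worst = bounds[name]
        norm = (value - best) / (worst - best) if worst > best else 0.0
        score += weights[name] * min(max(norm, 0.0), 1.0)
    return score
```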
Article
Meta-scheduling systems play a crucial role in scheduling jobs that are submitted for execution, and they require special attention because an increasing number of jobs are being executed on a limited number of resources. The primary problem of meta-scheduling is selecting the best resources (sites) on which to execute the underlying jobs while achieving the following objectives: reducing the mean job turnaround time, ensuring site load balance, and respecting job priorities. We introduce an enhanced meta-scheduling system, called Job Nature Meta-scheduling over Grid (JNMgrid), that achieves these objectives. JNMgrid consists of three components: (1) the Job Analyzer and Monitor, which determines the types of jobs in specific ratios; (2) the Job Decider, which matches jobs with the appropriate resources; and (3) the Job Batcher, which determines the best number of jobs for execution. The performance of JNMgrid is compared with similar existing systems, such as Random, Queue Length, File Access Cost, and File Access Cost + Job Queue Access Cost. The simulation results demonstrate that JNMgrid outperforms these systems and can thus be deployed in any grid middleware to improve the sharing of limited resources among grid users.
Chapter
Full-text available
A large number of approaches to the modeling and solution of job shop scheduling problems have been reported in the OR literature, with varying degrees of success. These approaches revolve around a series of technological advances that have occurred over the last 35 years, including mathematical programming, dispatching rules, expert systems, neural networks, genetic algorithms, and inductive learning. In this article, we take an evolutionary view in describing how these technologies have been applied to job shop scheduling problems. To do this, we discuss a few of the most important contributions in each of these technology areas and the most recent trends.
Conference Paper
Full-text available
In this paper we report on preliminary work and architectural design carried out in the "Data Management" work package of the International Data Grid project. Our aim, within a time scale of three years, is to provide Grid middleware services supporting the I/O-intensive, world-wide distributed next-generation experiments in High-Energy Physics, Earth Observation and Bioinformatics. The goal is to specify, develop, integrate and test tools and middleware infrastructure to coherently manage and share Petabyte-range information volumes in high-throughput production-quality Grid environments. The middleware will allow secure access to massive amounts of data in a universal name-space, to move and replicate data at high speed from one geographical site to another, and to manage synchronisation of remote copies. We pay particular attention to clearly specifying and categorising existing work on the Grid, especially data management in Grid-related projects. Challenging use cases are described, along with how they map to architectural decisions concerning data access, replication, meta data management, security and query optimisation.
Conference Paper
Full-text available
Even though middleware support for grid computing has been the subject of extensive research, scheduling policies for the grid context have not been much studied. In addition to processor utilization, it is important to consider the response times of jobs in evaluating the performance of grid scheduling strategies. In this paper we propose distributed scheduling algorithms that use multiple simultaneous requests at different sites. Trace-based simulations show that the use of multiple simultaneous requests provides significant performance benefits. We also show how this scheme can be adapted to provide priority to local jobs, without much loss of performance.
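The core mechanism, queuing the same job at several sites at once and cancelling the duplicates when one copy starts, can be sketched as follows (class and method names such as `enqueue` and `cancel` are hypothetical):

```python
def submit_everywhere(job, sites, k=3):
    """Place the job in the queues of the k most lightly loaded sites
    and return one handle per pending request."""
    chosen = sorted(sites, key=lambda s: s.queue_length)[:k]
    return [site.enqueue(job) for site in chosen]

def on_job_start(starting_handle, handles):
    # The site that dequeues the job first wins; withdraw the
    # duplicate requests still waiting at the other sites.
    for h in handles:
        if h is not starting_handle:
            h.cancel()
```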
Conference Paper
Full-text available
The Grid paradigm implies the sharing of a variety of resources across multiple administrative domains. In order to execute a work-flow on these distributed resources, an instrument is needed to co-allocate resources by reaching agreements with the different local scheduling systems involved. Apart from the compute resources that execute the work-flow, the co-ordinated usage of other resource types must also be guaranteed, for example network connectivity with dedicated QoS parameters or a visualisation device. We present a Web Service-based MetaScheduling Service that negotiates a common time slot with local resource management systems to enable the execution of a distributed work-flow. A successful negotiation results in a formal agreement based on the WS-Agreement recommendation currently being specified by the GRAAP working group of the Global Grid Forum. As a use case, we demonstrate the integration of this MetaScheduling Service into the UNICORE middleware.
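One way to picture the negotiation outcome, an earliest slot that every local scheduler can honor simultaneously, is the sketch below; the real service exchanges WS-Agreement messages rather than calling a function like this:

```python
def earliest_common_slot(offers, duration):
    """Each local scheduler offers a list of free (start, end) windows.
    Return the earliest time at which every resource can host
    `duration` units of work simultaneously, or None. It suffices to
    test window starts, since any common slot begins at one of them."""
    candidates = sorted(start for windows in offers for start, _ in windows)
    for t in candidates:
        if all(any(s <= t and t + duration <= e for s, e in windows)
               for windows in offers):
            return t
    return None

# Example: two resources and a 2-hour workflow step (times in hours).
offers = [[(0, 3), (5, 10)], [(2, 6), (8, 12)]]
print(earliest_common_slot(offers, 2))  # -> 8 (both free in [8, 10])
```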
Article
Full-text available
Data-intensive applications often require exploratory analysis of large datasets. If analysis is performed on distributed resources, data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, thus allowing faster response to subsequent requests that refer to the same data; when demand drops, resources are released. This approach can provide the benefits of dedicated hardware without the associated high costs, depending on workload and resource characteristics. The approach is reminiscent of cooperative caching, web-caching, and peer-to-peer storage systems, but addresses different application demands. Other data-aware scheduling approaches assume dedicated resources, which can be expensive and/or inefficient if load varies significantly. To explore the feasibility of the data diffusion approach, we have extended the Falkon resource provisioning and task scheduling system to support data caching and data-aware scheduling. Performance results from both micro-benchmarks and a large scale astronomy application demonstrate that our approach improves performance relative to alternative approaches, as well as provides improved scalability as aggregated I/O bandwidth scales linearly with the number of data cache nodes.
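A toy rendition of the dispatch policy described (not Falkon's actual API): tasks go to workers that already cache their input, the pool grows under load, and replicas are created wherever tasks land:

```python
def dispatch(task, workers, cache, max_workers, acquire):
    """Pick a worker for `task`. `workers` maps worker id -> queue
    depth; `cache` maps worker id -> set of cached files; `acquire`
    is a callable returning a fresh worker id. The caller enqueues
    the task on the returned worker (workers[target] += 1)."""
    holders = [w for w in workers if task["input"] in cache.get(w, set())]
    if holders:
        # Data-aware choice: schedule close to an existing replica.
        return min(holders, key=lambda w: workers[w])
    if all(q > 0 for q in workers.values()) and len(workers) < max_workers:
        workers[acquire()] = 0  # demand exceeds supply: grow the pool
    target = min(workers, key=lambda w: workers[w])
    cache.setdefault(target, set()).add(task["input"])  # replicate on demand
    return target
```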
Conference Paper
Full-text available
"Grid" computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation. In this article, we define this new field. First, we review the "Grid problem," which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources-what we refer to as virtual organizations. In such settings, we encounter unique authentication, authorization, resource access, resource discovery, and other challenges. It is this class of problem that is addressed by Grid technologies. Next, we present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. We describe requirements that we believe any such mechanisms must satisfy, and we discuss the central role played by the intergrid protocols that enable interoperability among different Grid systems. Finally, we discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. We maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.
Conference Paper
Full-text available
We are looking at the problem of scheduling compute tasks on a cluster of servers. These tasks require files that reside on a remote archive, and may also be cached on some subset of the servers. A task can only be run on a server that has the files it requires. This introduces the problem of scheduling data movement in coordination with the scheduling of computation. Our goal is to maximize throughput while minimizing data movement. FIFO scheduling is not efficient in this situation due to its lack of awareness of the data movement required. We looked at two other strategies, called shortest job first and linear programming based optimization, and compared them under various configurations.
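Folding the required data movement into each job's length estimate makes the shortest-job-first strategy concrete; the cost model below (cached files are free, missing files come from the remote archive) is an assumed simplification:

```python
def shortest_job_first(pending, server_caches, archive_bw):
    """Order tasks by their best-case estimated time across servers.
    `server_caches` maps server id -> set of cached files; files
    missing on a server must be fetched from the archive at
    `archive_bw` bytes/sec (illustrative model)."""
    def est(task):
        def on(server):
            missing = [f for f in task["files"]
                       if f not in server_caches[server]]
            move = sum(task["size"][f] for f in missing) / archive_bw
            return task["cpu_time"] + move
        return min(on(s) for s in server_caches)
    return sorted(pending, key=est)
```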
Conference Paper
Full-text available
As with any computer system, the performance of supercomputers depends upon the workloads that serve as their input. Unfortunately, however, there are many important aspects of the supercomputer workloads that have not been modeled, or that have been modeled only incipiently. This paper attacks this problem by considering requested time (and its relation with execution time) and the possibility of job cancellation, two aspects of the supercomputer workload that have not been modeled yet. Moreover, we also improve upon existing models for the arrival instant and partition size.
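A sketch of the two quantities being modeled, a requested time drawn as a random overestimate of the actual runtime plus a per-job cancellation flag; the distributions below are placeholders, not the paper's fitted models:

```python
import random

def synthetic_job(mean_runtime=3600.0, cancel_prob=0.1):
    """Generate one synthetic job: actual runtime, user-requested time
    (an overestimate by a random factor), and a cancellation flag."""
    runtime = random.expovariate(1.0 / mean_runtime)
    requested = runtime * random.uniform(1.0, 10.0)  # users over-request
    cancelled = random.random() < cancel_prob
    return {"runtime": runtime, "requested": requested,
            "cancelled": cancelled}
```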
Article
Full-text available
Scheduling jobs on the IBM SP2 system and many other distributed-memory MPPs is usually done by giving each job a partition of the machine for its exclusive use. Allocating such partitions in the order in which the jobs arrive (FCFS scheduling) is fair and predictable, but suffers from severe fragmentation, leading to low utilization. This situation led to the development of the EASY scheduler, which uses aggressive backfilling: small jobs are moved ahead to fill in holes in the schedule, provided they do not delay the first job in the queue. We compare this approach with a more conservative approach in which small jobs move ahead only if they do not delay any job in the queue, and show that the relative performance of the two schemes depends on the workload. For workloads typical on SP2 systems, the aggressive approach is indeed better, but for other workloads both algorithms are similar. In addition, we study the sensitivity of backfilling to the accuracy of the runtime estimates provided by the users and find a very surprising result: backfilling actually works better when users overestimate the runtime by a substantial factor.
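The aggressive (EASY) rule admits a compact statement: a queued job may jump ahead only if it can start now and will not delay the reserved start of the queue's head job. A sketch, with the inputs assumed to be precomputed by the scheduler:

```python
def can_backfill(job, free_now, spare_at_reservation, head_start, now):
    """EASY backfilling test. `free_now` is the number of idle
    processors; `spare_at_reservation` is how many processors will
    remain beyond the head job's needs at its reserved start time;
    `job["estimate"]` is the user-supplied runtime estimate."""
    if job["procs"] > free_now:
        return False                       # cannot start right now
    ends_in_time = now + job["estimate"] <= head_start
    fits_beside_head = job["procs"] <= spare_at_reservation
    # Either the job finishes before the head job's reservation, or it
    # only uses processors the head job will not need then.
    return ends_in_time or fits_beside_head
```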
Article
Full-text available
…multiple resources respectively. In the multistage flow shop problem, each job consists of several tasks which require processing by distinct resources, but there is a common route for all jobs. Finally, in the multistage job shop situation, alternative resource sets and routes can be chosen, possibly for the same job, allowing the production of different part types. The third dimension, scheduling criteria, states the desired objectives to be met; "they are numerous, complex, and often conflicting" (2). Some commonly used scheduling criteria include the following: 1. minimize total tardiness; 2. minimize the number of late jobs; 3. maximize system/resource utilization; 4. minimize in-process inventory; 5. balance resource usage; 6. maximize production rate. The fourth dimension, parameter variability, indicates the degree of uncertainty of the various parameters of the scheduling problem. If the degree of uncertainty is insignificant (i.e., "the uncertainty in the various quantiti…
Article
Full-text available
The next generation of scientific experiments and studies, popularly called e-Science, is carried out by large collaborations of researchers distributed around the world engaged in the analysis of huge collections of data generated by scientific instruments. Grid computing has emerged as an enabler for e-Science as it permits the creation of virtual organizations that bring together communities with common objectives. Within a community, data collections are stored or replicated on distributed resources to enhance storage capability or efficiency of access. In such an environment, scientists need to have the ability to carry out their studies by transparently accessing distributed data and computational resources. In this paper, we propose and develop a Grid broker that mediates access to distributed resources by (a) discovering suitable data sources for a given analysis scenario, (b) discovering suitable computational resources, (c) optimally mapping analysis jobs to resources, (d) deploying and monitoring job execution on selected resources, (e) accessing data from local or remote data sources during job execution, and (f) collating and presenting results. The broker supports a declarative and dynamic parametric programming model for creating grid applications. We have used this model in grid-enabling a high energy physics analysis application (Belle Analysis Software Framework). The broker has been used in deploying Belle experiment data analysis jobs on a grid testbed, called Belle Analysis Data Grid, with resources distributed across Australia interconnected through GrangeNet.
Article
The unbounded increase in the computation and data requirements of scientific applications has necessitated the use of widely distributed compute and storage resources to meet the demand. In a widely distributed environment, data is no longer locally accessible and thus has to be retrieved and stored remotely. Efficient and reliable access to data sources and archiving destinations in such an environment brings new challenges. Placing data on temporary local storage devices offers many advantages, but such "data placements" also require careful management of storage resources and data movement, i.e. allocating storage space, staging-in of input data, staging-out of generated data, and de-allocation of local storage after the data is safely stored at the destination. Traditional systems closely couple data placement and computation, and consider data placement a side effect of computation. Data placement is either embedded in the computation, delaying it, or performed via simple scripts that do not have the privileges of a job. The insufficiency of traditional systems and existing CPU-oriented schedulers in dealing with this complex data handling problem has given rise to a new era: the data-aware schedulers. One of the first examples of such schedulers is the Stork data placement scheduler. In this paper, we discuss the limitations of traditional schedulers in handling the challenging data scheduling problem of large-scale distributed applications, give our vision for the new paradigm in data-intensive scheduling, and elaborate on our case study: the Stork data placement scheduler.
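The shift described, data placement as a first-class job rather than a side effect of computation, is often expressed as a small DAG whose transfer steps are schedulable in their own right; the job format below is illustrative, not Stork's actual job language:

```python
# Each step is a schedulable job in its own right: the data-aware
# scheduler can queue, retry, and order the transfers independently
# of the CPU scheduler that runs the compute step.
workflow = [
    {"id": "stage_in", "type": "transfer",
     "src": "gsiftp://archive/input.dat", "dst": "file:///scratch/input.dat"},
    {"id": "compute", "type": "compute",
     "exe": "/bin/analyze", "args": ["/scratch/input.dat", "/scratch/out.dat"],
     "after": ["stage_in"]},
    {"id": "stage_out", "type": "transfer",
     "src": "file:///scratch/out.dat", "dst": "gsiftp://archive/out.dat",
     "after": ["compute"]},
]

def runnable(done):
    """Steps whose dependencies are all complete (done is a set of ids)."""
    return [s for s in workflow
            if s["id"] not in done
            and all(d in done for d in s.get("after", []))]
```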
Article
The job-shop problem is one of the most difficult classical scheduling problems. An instance with ten jobs to be processed on ten machines formulated in 1963 was open for more than 25 years. It was finally solved by a branch-and-bound algorithm. Very simple special cases of the job-shop problem are already strongly NP-hard. After a short review of these old challenges we consider practical applications like problems in flexible manufacturing, multiprocessor task scheduling, robotic cell scheduling, railway scheduling, air traffic control which all have an underlying job-shop structure. Methods to solve these problems and new challenges in connection with them are indicated.
Conference Paper
This paper proposes extensions to the backfilling job-scheduling algorithm that significantly improve its performance. We introduce variations that sort the "backfilling order" in priority-based and randomized fashions. We examine the effectiveness of guarantees present in conservative backfilling and find that initial guarantees have limited practical value, while the performance of a "no-guarantee" algorithm can be significantly better when combined with extensions that we introduce. Our study differs from many similar studies in using traces that contain user estimates. We find that actual overestimates are large and significantly different from simple models. We propose the use of speculative backfilling and speculative test runs to counteract these large overestimations. Finally, we explore the impact of dynamic, system-directed adaptation of application parallelism. The cumulative improvements of these techniques decrease the bounded slowdown, our primary metric, to less than 15% of conservative backfilling.
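The "backfilling order" variations are easy to picture: rather than scanning the wait queue in FCFS order for backfill candidates, sort or shuffle it first (the priority key below is an assumption):

```python
import random

def backfill_candidates(queue, order="priority"):
    """Return queued jobs in the order they should be tried for
    backfilling. 'fcfs' is the classic scan; the priority-based and
    randomized variations reorder it."""
    if order == "fcfs":
        return list(queue)
    if order == "priority":
        return sorted(queue, key=lambda j: (-j["priority"], j["submit_time"]))
    if order == "random":
        shuffled = list(queue)
        random.shuffle(shuffled)
        return shuffled
    raise ValueError(order)
```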
Conference Paper
Grid computing is evolving in two directions. The Computational Grid focuses on reducing the execution time of applications that require a great number of processing cycles. The Data Grid provides a way to solve large-scale data management problems. Data-intensive applications such as high-energy physics and bioinformatics require both Computational and Data Grid features. Job scheduling in Grids has mostly been discussed from the perspective of the Computational Grid; scheduling on the Data Grid has become a focus of Grid computing activities only recently. In a Data Grid environment, an effective scheduling mechanism that considers both computational and data storage resources must be provided for large-scale data-intensive applications. In this paper, we describe a new scheduling model that considers both the amount of computational resources and data availability in a Data Grid environment. We implemented a scheduler, called Chameleon, based on the proposed application scheduling model. Chameleon shows performance improvements for data-intensive applications that require both a large number of processors and data replication mechanisms. The results achieved with Chameleon are presented.
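A sketch of a Chameleon-style site ranking that weighs data availability against compute capacity (all field names hypothetical; not the authors' implementation):

```python
def rank_sites(job, sites, replica_catalog, wan_bw):
    """Prefer sites already holding a replica of the job's dataset
    (no staging needed) and with enough free processors; otherwise
    charge the WAN transfer and queue wait against the site's score."""
    def score(site):
        has_replica = site["name"] in replica_catalog.get(job["dataset"], ())
        staging = 0.0 if has_replica else job["data_size"] / wan_bw
        wait = 0.0 if site["free_cpus"] >= job["procs"] else site["queue_wait"]
        return staging + wait + job["work"] / site["speed"]
    return sorted(sites, key=score)
```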
Conference Paper
Computer system batch schedulers typically require information from the user upon job submission, including a runtime estimate. Inaccuracy of these runtime estimates, relative to the actual runtime of the job, has been well documented and is a perennial problem mentioned in the job scheduling literature. Typically users provide these estimates under circumstances where their job will be killed after the provided amount of time elapses. Also, users may be unaware of the potential benefits of providing accurate estimates, such as increased likelihood of backfilling. This study examines user behavior when the threat of job killing is removed, and when a tangible reward is provided for accuracy. We show that under these conditions, about half of users provide an improved estimate, but there is not a substantial improvement in the overall average accuracy.
Article
In high energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints (e.g., resource utilization, response time, global and local allocation policies) while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources.
The ANL/IBM SP Scheduling System
  • D Lifka
[23] perfSONAR: www.perfsonar.net/
[24] NWS: http://nws.cs.ucsb.edu/ewiki/

A Comprehensive Model of the Supercomputer Workload
  • Walfredo Cirne
  • Fran Berman
Walfredo Cirne and Fran Berman, A Comprehensive Model of the Supercomputer Workload, in Proc. of the IEEE 4th Annual Workshop on Job Scheduling Strategies for Parallel Processing, Cambridge, MA, 2001.