Conference Proceeding

A modeling approach for estimating execution time of long-running scientific applications

Florida Int. Univ. (FIU), Miami, FL
05/2008; DOI:10.1109/IPDPS.2008.4536214 ISBN: 978-1-4244-1693-6 pp. 1-8. In proceedings of: IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008)
Source: IEEE Xplore

ABSTRACT In a grid computing environment, resources are shared among a large number of applications. Brokers and schedulers find matching resources and schedule the execution of the applications by monitoring dynamic resource availability and employing policies such as first-come-first-served and back-filling. To support applications with timeliness requirements in such an environment, brokering and scheduling algorithms must address an additional problem: they must be able to estimate the execution time of the application on the currently available resources. In this paper, we present a modeling approach to estimating the execution time of long-running scientific applications. The modeling approach we propose is generic; models can be constructed by merely observing the application execution "externally" without using intrusive techniques such as code inspection or instrumentation. The model is cross-platform; it enables prediction without the need for the application to be profiled first on the target hardware. To show the feasibility and effectiveness of this approach, we developed a resource usage model that estimates the execution time of a weather forecasting application in a multi-cluster grid computing environment. We validated the model through extensive benchmarking and profiling experiments and observed prediction errors that were within 10% of the measured values. Based on our initial experience, we believe that our approach can be used to model the execution time of other time-sensitive scientific applications, thereby enabling the development of more intelligent brokering and scheduling algorithms.

  • Article: Coordinated rescheduling of Bag‐of‐Tasks for executions on multiple resource providers
    ABSTRACT: Metaschedulers can distribute parts of a Bag-of-Tasks (BoT) application among various resource providers in order to speed up its execution. The expected completion time of the user application is then calculated based on the run-time estimates of all applications running and waiting for resources. However, because of inaccurate run-time estimates, initial schedules are not those that provide users with the earliest completion time. These estimates increase the time distance between the first and last tasks of a BoT application, which increases average user response time, especially in multi-provider environments. This paper proposes a coordinated rescheduling algorithm to handle inaccurate run-time estimates when executing BoT applications in multi-provider environments. The coordinated rescheduling defines which tasks can have their start time updated based on the expected completion time of the entire BoT application. We have also evaluated the impact of system-generated run-time estimates to schedule BoT applications on multiple providers. We performed experiments using simulations and a real distributed platform, Grid'5000. From our experiments, we obtained reductions of up to 5 and 10% for response time and slowdown metrics, respectively, by using coordinated rescheduling over a traditional rescheduling solution. Moreover, coordinated rescheduling requires little modification of existing scheduling systems. System-generated predictions, on the other hand, are more complex to deploy and may not reduce response times as much as coordinated rescheduling. Copyright © 2011 John Wiley & Sons, Ltd.
    Concurrency and Computation: Practice and Experience, 09/2011 · 0.64 Impact Factor

Keywords

application execution
applications
available resources
execution time
first-come-first-served
intelligent brokering
intrusive techniques
long-running scientific applications
measured values
modeling approach
monitoring dynamic resource availability
multi-cluster grid
prediction errors
profiled first
profiling experiments
resource usage model
scheduling algorithms
support applications
time-sensitive scientific applications
timeliness requirements

S.M. Sadjadi