Article

Performance problems of using system-predicted runtimes for parallel job scheduling

Abstract

Many prediction techniques based on historical data have been proposed to reduce over-estimations of job runtimes provided by users. They were shown to improve the accuracy of runtime estimates and scheduling performance of backfill policies, according to particular error metrics and average performance measures. However, using a more complete set of performance measures and a new error metric, we show potential performance problems of using previous prediction techniques for job scheduling. Furthermore, we show simply adding half of the requested runtime to each initial prediction greatly reduces the problems.
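As an illustration, the adjustment the abstract proposes is a single line of arithmetic; a minimal sketch in Python (the function name and units are ours, not the paper's):

```python
def adjust_prediction(predicted: float, requested: float) -> float:
    """Inflate a system-generated runtime prediction by half of the
    user's requested runtime, per the abstract, to curb the
    under-estimations that hurt backfill scheduling."""
    return predicted + 0.5 * requested

# Example: a 1-hour prediction for a job requesting 4 hours becomes
# a 3-hour prediction, far less likely to under-estimate the runtime.
print(adjust_prediction(3600.0, 14400.0))  # 10800.0 seconds
```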


... Existing studies on runtime estimation show that many researchers focus on extracting valuable information from system logs in order to build prediction models [5, 9-12]. They use big-data analysis and machine learning methods to mine the potential value in the logs. ...
... The formal definitions are shown in Eqs. (9)-(11), where h_i is the output of the previous layer in the MLP, and W_i and b_i are the trainable parameters of the MLP at layer i. ...
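Read literally, this describes a standard MLP layer stack; a minimal sketch under that reading (the ReLU activation and layer sizes are assumptions, since Eqs. (9)-(11) are not reproduced here):

```python
import numpy as np

def mlp_layer(h_prev: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One MLP layer, h_i = act(W_i h_{i-1} + b_i), where W_i and b_i
    are the layer's trainable parameters. ReLU is assumed here."""
    return np.maximum(0.0, W @ h_prev + b)

# Hypothetical three-layer stack over an 8-dimensional feature vector.
rng = np.random.default_rng(0)
h = rng.normal(size=8)
for n_out, n_in in [(16, 8), (16, 16), (1, 16)]:
    W = rng.normal(scale=0.1, size=(n_out, n_in))
    b = np.zeros(n_out)
    h = mlp_layer(h, W, b)
print(h)  # final-layer output, e.g. a runtime estimate
```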
Article
Full-text available
Accurate job finish time estimation is one of the key parts of scheduling strategy design in supercomputing systems. Existing research concentrates on designing better or more complex machine learning models that predict job runtimes from non-job-specific parameters such as the number of processors consumed, the user-estimated runtime, the job submit time, and the job ID. However, more useful information can be extracted from the system logs to assist runtime prediction: the logs contain intermediate output results and input parameters, which motivates us to analyze the running status of a job and predict its finish time. Since VASP is one of the most popular supercomputing applications in the world, in this paper we conduct the first investigation into its running features and deeply analyze its job-specific parameters. Based on the running and job-specific features, we propose RunningNet, a dynamic finish-time prediction model that operates while the job is running and combines running features, represented as a time series, with parameter features. Experiments on the VASP job set of the supercomputing system at USTC show that RunningNet achieves state-of-the-art results, with a mean absolute percentage error of about 10.3%.
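For reference, the reported metric, commonly written as mean absolute percentage error (MAPE), is simple to compute; a sketch with made-up numbers:

```python
import numpy as np

def mape(actual, predicted) -> float:
    """Mean absolute percentage error: mean of |actual - predicted| / actual,
    expressed as a percentage (the ~10.3% figure above is this metric)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

print(mape([100, 200, 400], [90, 210, 380]))  # ~6.67
```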
... Our previous work (Sangsuree Vasupongayya and Su-Hui Chiang, 2007) showed that under-estimations induced by system-generated runtime prediction methods, which were found to have better performance (Sangsuree Vasupongayya, 2008), have a worse impact than over-estimations provided by users. ...
Article
The impact of three under-estimate adjustment methods on the performance of three history-based system-generated prediction methods and three inflated prediction methods is evaluated. The results show that an inflated prediction is the best technique for predicting parallel job runtimes. The adjustment methods have minimal or no impact on scheduling performance: the predictions result in poor maximum-wait performance, and no adjustment method can improve it. The same result is observed when an inflated prediction is used.
Article
The European Organization for Nuclear Research (CERN) is the largest research organization for particle physics. ALICE, short for A Large Ion Collider Experiment, serves as one of the main detectors at CERN and produces approximately 15 petabytes of data each year. The computing associated with an ALICE experiment consists of both online and offline processing. An online cluster retrieves data while an offline cluster farm performs a broad range of data analysis. Online processing occurs as collision events are streamed from the detector to the online cluster. This process compresses and calibrates the data before storing it in a data storage system for subsequent offline processing, e.g., event reconstruction. Due to the large volume of stored data to process, offline processing seeks to minimize the execution time and data-staging time of the applications via a two-tier offline cluster: the Event Processing Node (EPN) as the first tier and the World LHC Grid Computing (WLGC) as the second tier. This two-tier cluster requires a smart job scheduler to efficiently manage the running of the applications. Thus, we propose a runtime estimation method for this offline processing in the ALICE environment. Our approach exploits application profiles to predict the runtime of a high-performance computing (HPC) application without the need for any additional metadata. To evaluate our proposed framework, we performed our experiments on actual ALICE applications. In addition, we also tested the efficacy of our runtime estimation method in predicting the run times of HPC applications on the Amazon EC2 cloud. The results show that our approach generally delivers accurate predictions, i.e., low error percentages.
Conference Paper
We present in this paper a novel method to predict application runtimes on backfilling parallel systems. The method is based on mining historical data to obtain important parameters, which are then applied to predict the runtimes of future applications. Previous work has shown that both under-estimation and inaccuracy in prediction have adverse impacts on the scheduling performance of backfilling systems. In our study, we try to reduce the number of jobs that are under-estimated and to reduce the prediction error as much as possible. Compared with other predictors, experimental results show that our predictor is up to 25% better with respect to the under-estimation problem. Moreover, using a previously proposed accuracy metric, our predictor improves accuracy by up to 32%.
Chapter
Full-text available
We present a technique for deriving predictions for the run times of parallel applications from the run times of “similar” applications that have executed in the past. The novel aspect of our work is the use of search techniques to determine those application characteristics that yield the best definition of similarity for the purpose of making predictions. We use four workloads recorded from parallel computers at Argonne National Laboratory, the Cornell Theory Center, and the San Diego Supercomputer Center to evaluate the effectiveness of our approach. We show that on these workloads our techniques achieve predictions that are between 14 and 60 percent better than those achieved by other researchers; our approach achieves mean prediction errors that are between 40 and 59 percent of mean application run times.
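The core idea, searching for the attribute subset that best defines "similar" jobs, can be sketched as follows (the attributes, records, and exhaustive search are illustrative stand-ins for the paper's richer templates and search techniques):

```python
from itertools import combinations
from statistics import mean

# Hypothetical history of completed jobs: attributes plus observed runtime.
history = [
    {"user": "alice", "app": "cfd", "nodes": 64, "runtime": 3600},
    {"user": "alice", "app": "cfd", "nodes": 64, "runtime": 3900},
    {"user": "bob",   "app": "md",  "nodes": 32, "runtime": 1200},
]
ATTRS = ("user", "app", "nodes")

def predict(job, template):
    """Predict a job's runtime as the mean runtime of past jobs that
    match it on every attribute in the similarity template."""
    similar = [h["runtime"] for h in history
               if all(h[a] == job[a] for a in template)]
    return mean(similar) if similar else None

def best_template():
    """Search all attribute subsets for the template whose predictions
    have the lowest mean absolute error over the history."""
    best, best_err = None, float("inf")
    for k in range(1, len(ATTRS) + 1):
        for tmpl in combinations(ATTRS, k):
            err = mean(abs(predict(h, tmpl) - h["runtime"]) for h in history)
            if err < best_err:
                best, best_err = tmpl, err
    return best

print(best_template())
```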
Conference Paper
Full-text available
To balance performance goals and allow administrators to declaratively specify high-level performance goals, we apply complete search algorithms to design on-line job scheduling policies for workloads that run on parallel computer systems. We formulate a hierarchical two-level objective that contains two goals commonly placed on parallel computer systems: (1) minimizing the total excessive wait; (2) minimizing the average slowdown. Ten monthly workloads that ran on a Linux cluster (IA-64) at NCSA are used in our simulation of policies. A wide range of measures is used for performance evaluation, including the average slowdown, average wait, maximum wait, and new measures based on excessive wait. For the workloads studied, our results show that the best search-based scheduling policy reported here (i.e., DDS/lxf/dynB) simultaneously beats both FCFS-backfill and LXF-backfill, which roughly provide lower bounds on the maximum wait and the average slowdown, respectively, among backfill policies.
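The hierarchical objective compares candidate schedules lexicographically; a minimal sketch (the wait threshold, units, and exact excessive-wait definition are assumptions):

```python
def slowdown(wait: float, runtime: float) -> float:
    """Slowdown of a job: (wait + runtime) / runtime."""
    return (wait + runtime) / runtime

def objective(jobs, threshold: float):
    """Hierarchical two-level objective: minimize total excessive wait
    first, then average slowdown; tuples compare lexicographically."""
    total_excess = sum(max(0.0, j["wait"] - threshold) for j in jobs)
    avg_sd = sum(slowdown(j["wait"], j["runtime"]) for j in jobs) / len(jobs)
    return (total_excess, avg_sd)

# Two hypothetical schedules of the same two jobs: b spreads the wait
# more evenly, incurring less excessive wait, so it wins level one.
a = [{"wait": 100.0, "runtime": 50.0}, {"wait": 0.0, "runtime": 500.0}]
b = [{"wait": 60.0, "runtime": 50.0}, {"wait": 40.0, "runtime": 500.0}]
print(min([a, b], key=lambda js: objective(js, threshold=50.0)) is b)  # True
```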
Article
Full-text available
We present a holistic approach to estimation that uses rough sets theory to determine a similarity template and then compute a runtime estimate using identified similar applications. We tested the technique in two real-life data-intensive applications: data mining and high-performance computing.
Article
Full-text available
Scheduling jobs on the IBM SP2 system and many other distributed-memory MPPs is usually done by giving each job a partition of the machine for its exclusive use. Allocating such partitions in the order in which the jobs arrive (FCFS scheduling) is fair and predictable, but suffers from severe fragmentation, leading to low utilization. This situation led to the development of the EASY scheduler, which uses aggressive backfilling: small jobs are moved ahead to fill in holes in the schedule, provided they do not delay the first job in the queue. We compare this approach with a more conservative approach in which small jobs move ahead only if they do not delay any job in the queue, and show that the relative performance of the two schemes depends on the workload. For workloads typical on SP2 systems, the aggressive approach is indeed better, but, for other workloads, both algorithms are similar. In addition, we study the sensitivity of backfilling to the accuracy of the runtime estimates provided by the users and find a very surprising result: backfilling actually works better when users overestimate the runtime by a substantial factor.
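A compact sketch of the aggressive (EASY-style) rule: start queued jobs FCFS while they fit, then let later jobs jump ahead only if they cannot delay the first queued job (the data structures and tie-breaking here are ours, not the EASY implementation's):

```python
def easy_backfill(queue, running, free, now):
    """One EASY-style scheduling pass (a sketch).
    queue:   FCFS list of waiting jobs, dicts with 'nodes' and 'estimate'
    running: list of (end_time, nodes) for currently running jobs
    free:    number of idle processors
    Returns the jobs started at time `now`."""
    started = []
    while queue and queue[0]["nodes"] <= free:   # plain FCFS starts
        job = queue.pop(0)
        free -= job["nodes"]
        running.append((now + job["estimate"], job["nodes"]))
        started.append(job)
    if not queue:
        return started
    # Shadow time: earliest time the blocked head job could start, based
    # on the (user-estimated) end times of running jobs. Assumes the head
    # fits on the whole machine, so a shadow time always exists.
    head, avail = queue[0], free
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head["nodes"]:
            shadow, extra = end, avail - head["nodes"]
            break
    # Backfill: a later job may start now if it fits and either ends by
    # the shadow time or only uses processors the head job won't need.
    for job in list(queue[1:]):
        on_spare = job["nodes"] <= extra
        if job["nodes"] <= free and (now + job["estimate"] <= shadow or on_spare):
            queue.remove(job)
            free -= job["nodes"]
            if on_spare:
                extra -= job["nodes"]
            running.append((now + job["estimate"], job["nodes"]))
            started.append(job)
    return started
```

The conservative variant discussed above would instead check every queued job's reservation, not just the head's.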
Conference Paper
Scheduling algorithms that use application and system knowledge have been shown to be more effective at scheduling parallel jobs on a multiprocessor than algorithms that do not. This paper focuses on obtaining such information for use by a scheduler in a network-of-workstations environment. The log files from three parallel systems are examined to determine both how to categorize parallel jobs for storage in a job database and what job information would be useful to a scheduler. A Historical Profiler is proposed that stores information about programs and users, and manipulates this information to provide schedulers with execution time predictions. Several preemptive and non-preemptive versions of the FCFS, EASY and Least Work First scheduling algorithms are compared to evaluate the utility of the profiler. It is found that both preemption and the use of application execution time predictions obtained from the Historical Profiler lead to improved performance.
Conference Paper
The question of whether more accurate requested runtimes can significantly improve production parallel system performance has previously been studied for the FCFS-backfill scheduler, using a limited set of system performance measures. This paper examines the question for higher-performance backfill policies, heavier system loads as observed in current leading-edge production systems such as the large Origin 2000 system at NCSA, and a broader range of system performance measures. The new results show that more accurate requested runtimes can improve system performance much more significantly than suggested by previous results. For example, average slowdown decreases by a factor of two to six, depending on system load and the fraction of jobs that have the more accurate requests. The new results also show that (a) nearly all of the performance improvement is realized even if the more accurate runtime requests are a factor of two higher than the actual runtimes, (b) most of the performance improvement is achieved when test runs are used to obtain more accurate runtime requests, and (c) in systems where only a fraction (e.g., 60%) of the jobs provide approximately accurate runtime requests, the users that provide the approximately accurate requests achieve even greater improvements in performance, such as an order-of-magnitude improvement in average slowdown for jobs with runtimes of up to fifty hours.
Conference Paper
Using historical information to predict future runs of parallel jobs has been shown to be valuable in job scheduling. Trends toward more flexible job-scheduling techniques such as adaptive resource allocation, and toward the expansion of scheduling to grids, make runtime predictions even more important. We present a technique that employs both a user's knowledge of his/her parallel application and historical application-run data, synthesizing them to derive accurate and scalable predictions for future runs. These scalable predictions apply to runtime characteristics for different numbers of nodes (processor scalability) and different problem sizes (problem-size scalability). We employ multiple linear regression and show that for reasonably accurate complexity models, good prediction accuracy can be obtained.
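The synthesis described here, fitting a user-supplied complexity model to observed runs via multiple linear regression, might look like this (the model form t ~ a*n^2/p + b*n + c and the sample data are hypothetical):

```python
import numpy as np

# Hypothetical observed runs: (processors p, problem size n, runtime t).
runs = np.array([
    (4,  1000,  260.0),
    (8,  1000,  140.0),
    (8,  2000,  520.0),
    (16, 2000,  270.0),
    (16, 4000, 1050.0),
])
p, n, t = runs[:, 0], runs[:, 1], runs[:, 2]

# User-supplied complexity model, e.g. t ~ a*(n^2/p) + b*n + c.
# Multiple linear regression fits a, b, c by least squares.
X = np.column_stack([n**2 / p, n, np.ones_like(n)])
coef, *_ = np.linalg.lstsq(X, t, rcond=None)

def predict(p_new: float, n_new: float) -> float:
    """Scalable prediction: evaluate the fitted model at a new
    processor count (processor scalability) and problem size
    (problem-size scalability)."""
    return float(coef @ np.array([n_new**2 / p_new, n_new, 1.0]))

print(predict(32, 8000))  # extrapolated runtime on 32 nodes, size 8000
```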
Conference Paper
This paper proposes extensions to the backfilling job-scheduling algorithm that significantly improve its performance. We introduce variations that sort the "backfilling order" in priority-based and randomized fashions. We examine the effectiveness of the guarantees present in conservative backfilling and find that initial guarantees have limited practical value, while the performance of a "no-guarantee" algorithm can be significantly better when combined with the extensions that we introduce. Our study differs from many similar studies in using traces that contain user estimates. We find that actual overestimates are large and significantly different from simple models. We propose the use of speculative backfilling and speculative test runs to counteract these large overestimations. Finally, we explore the impact of dynamic, system-directed adaptation of application parallelism. The cumulative improvements of these techniques decrease the bounded slowdown, our primary metric, to less than 15% of that of conservative backfilling.
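The "backfilling order" variations amount to reordering the candidate scan; a sketch (the shortest-estimate key is our stand-in for the paper's priority schemes):

```python
import random

def backfill_candidates(waiting, policy="priority"):
    """Order the jobs considered for backfilling. FCFS scans the queue
    in arrival order; the variations sort by a priority key or shuffle."""
    if policy == "priority":
        # Shortest user estimate first (one plausible key, not
        # necessarily the paper's exact choice).
        return sorted(waiting, key=lambda j: j["estimate"])
    if policy == "random":
        shuffled = list(waiting)
        random.shuffle(shuffled)
        return shuffled
    return list(waiting)  # FCFS order
```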
Conference Paper
In a computational Grid consisting of many computer clusters, job start time predictions are useful to guide resource selection and balance the workload distribution. However, the basic Grid middleware available today either has no means of expressing the time that a site will take before starting a job or uses a simple linear scale. In this paper we introduce a system for predicting job start times on clusters. Our predictions are based on statistical analysis of historical job traces and simulation of site schedulers. We have deployed the system on the EDG (European DataGrid) production cluster at NIKHEF. The experimental results show that acceptable prediction accuracy is achieved, reflecting real site states and site-specific scheduling policies. We find that the average error of our job start time predictions is 18.9 percent of the average job queue wait time, around 20 times smaller than the average prediction error using the EDG solution.
Article
The most commonly used scheduling algorithm for parallel supercomputers is FCFS with backfilling, as originally introduced in the EASY scheduler. Backfilling means that short jobs are allowed to run ahead of their time provided they do not delay previously queued jobs (or at least the first queued job). Backfilling depends on runtime estimates, and system-generated predictions based on history can be more accurate than the estimates users provide. However, predictions have not been incorporated into production schedulers, partially due to a misconception (that we resolve) claiming that inaccuracy actually improves performance, but mainly because underprediction is technically unacceptable: users will not tolerate jobs being killed just because system predictions were too short. We solve this problem by divorcing the kill-time from the runtime prediction and correcting predictions adaptively as needed if they prove wrong. The end result is a surprisingly simple scheduler, which requires minimal deviations from current practices (e.g., using FCFS as the basis) and behaves exactly like EASY as far as users are concerned; nevertheless, it achieves significant improvements in performance, predictability, and accuracy. Notably, this is based on a very simple runtime predictor that just averages the runtimes of the last two jobs by the same user; counter-intuitively, our results indicate that using recent data is more important than mining the history for similar jobs. All the techniques suggested in this paper can be used to enhance any backfilling algorithm and are not limited to EASY.
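The predictor highlighted above is tiny; a sketch (class and method names are ours, and the correction rule shown is one simple possibility rather than the paper's exact scheme):

```python
from collections import defaultdict, deque

class LastTwoPredictor:
    """Predict a job's runtime as the average of the last two runtimes
    by the same user; fall back to the user's estimate when there is
    no history. The kill-time stays at the user's estimate, so a bad
    prediction never kills a job."""
    def __init__(self):
        self.recent = defaultdict(lambda: deque(maxlen=2))

    def predict(self, user: str, user_estimate: float) -> float:
        runs = self.recent[user]
        return sum(runs) / len(runs) if runs else user_estimate

    def correct(self, prediction: float, elapsed: float) -> float:
        """Adaptive correction: once a job outlives its prediction,
        extend the prediction instead of trusting it."""
        return max(prediction, elapsed)

    def record(self, user: str, runtime: float) -> None:
        self.recent[user].append(runtime)

pred = LastTwoPredictor()
pred.record("alice", 600.0)
pred.record("alice", 1000.0)
print(pred.predict("alice", user_estimate=7200.0))  # 800.0
```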
Article
Jobs that do not require all processors in the system can be packed together for gang scheduling. We examine accounting traces from several parallel computers to show that indeed many jobs have small sizes and can be packed together. We then formulate a number of such packing algorithms and evaluate their effectiveness using simulations based on our workload study. The results are that two algorithms are the best: either perform the mapping based on a buddy system of processors, or use migration to re-map the jobs more tightly whenever a job arrives or terminates. Other approaches, such as mapping to the least loaded PEs, proved to be counterproductive. The buddy system approach depends on the capability to gang-schedule jobs in multiple slots if there is space. The migration algorithm is more robust but is expected to suffer greatly due to the overhead of the migration itself. In either case fragmentation is not an issue, and utilization may top 90% with sufficiently...
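A minimal sketch of the buddy-system mapping (processor allocation only; gang-scheduling time slots and the migration variant are omitted, and all names are ours):

```python
class BuddyAllocator:
    """Buddy system over a power-of-two pool of processors: requests
    round up to a power of two, and freed blocks merge with their buddy."""
    def __init__(self, total: int):          # total must be a power of two
        self.total = total
        self.free_lists = {total: [0]}       # block size -> start offsets

    def alloc(self, n: int):
        size = 1
        while size < n:                       # round request up to 2^k
            size *= 2
        s = size                              # smallest free block >= size
        while not self.free_lists.get(s):
            s *= 2
            if s > self.total:
                return None                   # no block large enough
        start = self.free_lists[s].pop()
        while s > size:                       # split down, freeing the halves
            s //= 2
            self.free_lists.setdefault(s, []).append(start + s)
        return start, size

    def release(self, start: int, size: int):
        while True:                           # merge with buddy while possible
            buddy = start ^ size
            peers = self.free_lists.get(size, [])
            if buddy in peers and size < self.total:
                peers.remove(buddy)
                start, size = min(start, buddy), size * 2
            else:
                self.free_lists.setdefault(size, []).append(start)
                return

pool = BuddyAllocator(16)
print(pool.alloc(3))   # (0, 4): a 3-processor job gets a 4-processor block
print(pool.alloc(8))   # (8, 8)
```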