Conference Paper

Fault-aware, utility-based job scheduling on Blue Gene/P systems

Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
DOI: 10.1109/CLUSTR.2009.5289206
Conference: 2009 IEEE International Conference on Cluster Computing and Workshops (CLUSTER '09)
Source: IEEE Xplore

ABSTRACT: Job scheduling on large-scale systems is an increasingly complicated affair, with numerous factors influencing scheduling policy. Addressing these concerns results in sophisticated scheduling policies that can be difficult to reason about. In this paper, we present a general utility-based scheduling framework to balance various scheduling requirements and priorities. It enables system owners to customize scheduling policies under different circumstances without changing the scheduling code. We also develop a fault-aware job allocation strategy for Blue Gene/P systems to address the increasing concern of system failures. We demonstrate the effectiveness of these facilities by means of event-driven simulations with real job traces collected from the production Blue Gene/P system at Argonne National Laboratory.
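The abstract names two mechanisms but does not give their interfaces. As a rough illustration only, the Python sketch below assumes that a utility function scores each queued job (higher scores are considered first) and that each idle partition carries a predicted failure probability. The classes `Job` and `Partition`, the `POLICIES` table, the two example utility functions, the `fail_prob` field, and the partition names are all hypothetical; they are not taken from the paper or from any production Blue Gene/P scheduler.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Job:
    job_id: str
    nodes: int        # requested node count
    walltime: float   # requested runtime, minutes
    queued: float     # time already spent waiting, minutes


@dataclass
class Partition:
    name: str
    size: int            # nodes in the partition
    fail_prob: float     # hypothetical predicted failure probability
    busy: bool = False


UtilityFn = Callable[[Job], float]

# Hypothetical policies: the owner selects one by name instead of editing
# the scheduling loop. A higher utility means the job is considered earlier.
POLICIES: Dict[str, UtilityFn] = {
    "fcfs": lambda j: j.queued,
    "favor_large": lambda j: (j.queued / j.walltime) * j.nodes,
}


def pick_partition(job: Job, parts: List[Partition]) -> Optional[Partition]:
    """Fault-aware choice: among idle partitions large enough for the job,
    prefer the lowest predicted failure probability, then the smallest size."""
    fits = [p for p in parts if not p.busy and p.size >= job.nodes]
    if not fits:
        return None
    return min(fits, key=lambda p: (p.fail_prob, p.size))


def schedule(queue: List[Job], parts: List[Partition],
             policy: str) -> List[Tuple[str, str]]:
    """Order the queue by the selected utility, then place jobs greedily.
    Marks chosen partitions busy and returns (job_id, partition) pairs."""
    utility = POLICIES[policy]
    placed = []
    # sorted() is stable, so jobs with equal utility keep their queue order.
    for job in sorted(queue, key=utility, reverse=True):
        part = pick_partition(job, parts)
        if part is not None:
            part.busy = True
            placed.append((job.job_id, part.name))
    return placed


if __name__ == "__main__":
    queue = [Job("a", 512, 60, 10), Job("b", 1024, 120, 30), Job("c", 512, 20, 5)]
    for policy in ("fcfs", "favor_large"):
        parts = [Partition("R00", 512, 0.01), Partition("R01", 512, 0.20),
                 Partition("R02-R03", 1024, 0.05)]
        print(policy, schedule(queue, parts, policy))
```

With these made-up inputs, `fcfs` places jobs b, a, c on R02-R03, R00, R01, while `favor_large` ranks job c (shorter requested walltime) ahead of a, so c lands on the lowest-risk partition R00. Only the selected policy name changes between the two runs; the scheduling loop stays the same.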

  • ABSTRACT: In this paper, we present a new method for job scheduling and model access requests to web pages in a computer network or the Internet as job scheduling on a single machine. We simulate this model on two kinds of problems: (1) small-scale problems (10 users) and (2) large-scale problems (100 users). In both cases, the goal is to find the minimum mean and variance of time. Since these problems are NP-hard, we propose an innovative V-shaped arrangement for job scheduling. For small-scale problems, the optimal solution can be found in little time, so the optimal solutions (minimum mean and variance) were found and evaluated by examining all possible states.
    Communication Systems and Network Technologies (CSNT), 2012 International Conference on; 01/2012
  • ABSTRACT: The analysis and research of power systems necessitate current computing. However, the bottleneck of current computing lies in the limited computing capacity of power systems. The service-oriented characteristics of Cloud computing have introduced a new way of service provisioning called utility-based computing, which could provide powerful computing capability for current computing. However, in deploying a practical current computing Cloud, we encounter a challenge: existing job scheduling algorithms for utility-based computing do not take hardware/software failure and recovery in the Cloud into account. To address this challenge, we introduce failure and recovery scenarios into the current computing Cloud entities and propose a Reinforcement Learning (RL) based algorithm that makes job scheduling in the current computing Cloud fault tolerant. We carry out an experimental comparison with the Resource-constrained Utility Accrual algorithm (RUA), the Utility Accrual Packet scheduling algorithm (UPA), and LBESA to demonstrate the feasibility of our proposed approach.
    2012 IEEE 3rd International Conference on Software Engineering and Service Science (ICSESS); 06/2012
  • ABSTRACT: Long-term execution of scientific applications often leads to dynamic workloads and varying application requirements. When the execution uses resources provisioned from IaaS clouds, and thus incurs consumption-related payment, efficient, online scheduling algorithms must be found. Portfolio scheduling, which dynamically selects a suitable policy from a broad portfolio, may provide a solution to this problem. However, selecting the right policy online from possibly tens of alternatives remains challenging. In this work, we introduce an abstract model to explore this selection problem. Based on the model, we present a comprehensive portfolio scheduler that includes tens of provisioning and allocation policies. We propose an algorithm that can increase the chance of selecting the best policy in limited time, possibly online. Through trace-based simulation, we evaluate various aspects of our portfolio scheduler and find performance improvements from 7% to 100% in comparison with the best constituent policies, with high improvements for bursty workloads.
    SC; 01/2013
