Evaluation of gang scheduling performance
and cost in a cloud computing system
Ioannis A. Moschakis · Helen D. Karatza
Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
© Springer Science+Business Media, LLC 2010
Abstract Cloud Computing refers to the notion of outsourcing on-site available services, computational facilities, or data storage to an off-site, location-transparent centralized facility or “Cloud.” Gang Scheduling is an efficient job scheduling algorithm for time sharing, already applied in parallel and distributed systems. This paper studies the performance of a distributed Cloud Computing model, based on the Amazon Elastic Compute Cloud (EC2) architecture, that implements a Gang Scheduling scheme. Our model utilizes the concept of Virtual Machines (or VMs), which act as the computational units of the system. Initially, the system includes no VMs, but depending on the computational needs of the jobs being serviced, new VMs can be leased and later released dynamically. A simulation of the aforementioned model is used to study, analyze, and evaluate both the performance and the overall cost of two major gang scheduling algorithms. Results reveal that Gang Scheduling can be effectively applied in a Cloud Computing environment both performance-wise and cost-wise.

Keywords Cloud computing · Gang scheduling · HPC · Virtual machines
1 Introduction

Cloud Computing is a revolutionary way of providing shared resources over the Internet. Through the use of low-level virtualization software, such as Xen, the Cloud provides virtualized computing hardware infrastructure in a manner similar to the public utilities; thus, it is also termed Infrastructure-as-a-Service (IaaS) or Utility Computing. Since all hardware is virtualized, the Cloud gives the illusion of limitless resources, which can be made available to the user on demand and can be dynamically
scaled up or down. On the other hand, Cloud Computing also refers to the applications and software platforms being offered through the Cloud, usually under the notion of a service model, hence called Software-as-a-Service (SaaS).
The importance of Cloud Computing lies in the opportunity that it provides for the development of application services without requiring Capital Expenditure (CapEx) prior to deployment. This allows startup Internet companies with tight budgets to use their profits for Operational Expenditure (OpEx) alone. Furthermore, in scientific fields of study, Cloud Computing presents us with the ability to lease computational resources from its virtually infinite pool for use in High Performance Computing (HPC). In this way, even small institutions or individuals can have access to a large number of computational resources at a fraction of the cost of maintaining a supercomputer center. Since the Cloud is cost-associative, we pay only for the computing time that we spend running each VM and for data transfers in and out of the Cloud. One could, of course, argue that this problem is already addressed by the Grid, but the Grid imposes certain restrictions on the availability of software, whereas Cloud VMs can be custom-built with virtually any software a user needs.
In order to take advantage of computational resources that span more than one server, or in our case more than one virtual machine, a parallel or distributed computing scheme must be applied. Although Cloud Computing infrastructure is virtualized, and thus provides no direct access to the underlying hardware, the Amazon EC2 specification provides multicore VMs, hence parallelization even on a single VM is possible. Moreover, one of the main features of Cloud Computing is its ability to adapt, so a user can expand or contract the system dynamically. Consequently, if Cloud Computing is going to be used for HPC, whose market share comprises a third of the server market [2, 5], appropriate methods must be considered for both parallel job scheduling and VM scalability.
The importance of scheduling methods is apparent in every distributed system. The scheduling algorithm must seek a way to maximize the performance of the system by avoiding unnecessary delays and, in our case, also maintain a good response time to leasing cost ratio. The main task of the scheduler is to allocate processors to parallel jobs that have entered the system. In the system modeled, parallel jobs consist of tasks that are in very frequent communication and, therefore, must execute both simultaneously and concurrently. Gang scheduling is a special case of job scheduling that allows the scheduling of such jobs. A system that applies this kind of scheduling must guarantee that every task of a given job will be allocated to a different processor so that it will begin and finish its execution at the same time as the other tasks. In this way, the system can avoid cases where a task is blocked while waiting for input from another task that is not currently executing.
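To make this all-or-nothing constraint concrete, the following Python sketch (our own illustration with hypothetical Job and VM types, not the authors' simulator) dispatches a gang only when each of its tasks can be placed on a distinct VM:

from dataclasses import dataclass, field

@dataclass
class VM:
    queue: list = field(default_factory=list)  # tasks waiting on this VM

@dataclass
class Job:
    tasks: list  # one entry per task; len(tasks) is the job's degree of parallelism

def dispatch_gang(job: Job, vms: list[VM]) -> bool:
    """Place every task of the gang on a different VM, or place nothing at all."""
    if len(job.tasks) > len(vms):
        return False                      # not enough distinct VMs: the whole gang waits
    for task, vm in zip(job.tasks, vms):  # sibling tasks never share a queue
        vm.queue.append(task)
    return True

Dispatching only part of a gang would leave some tasks blocked on communication with siblings that are not yet running, which is exactly the situation the constraint avoids.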
This type of scheduling has been extensively studied in the past in the area of distributed and Grid systems [9, 11, 13, 23, 24]. In [11–14], Karatza has studied the performance of the Adaptive First Come First Serve (AFCFS) and Largest Job First Served (LJFS) gang scheduling policies. Gang Scheduling has also been examined in situations involving more than one cluster of processors [7, 18, 19]. Task migration strategies with the inclusion of high-priority jobs in the process have also been considered. In the aforementioned publications, the number of processors available to the system was always static during the simulation, and the workload consisted of jobs with a degree of parallelism in the range [1..P], P being the total number of available processors, regardless of distribution.
Scheduling strategies have been studied before under the notion of Cloud Computing. Assunção et al. studied the use of Cloud Computing as an extension to private clusters. In their model, tasks were separate from each other and did not communicate. Virtual Machine usage and leasing has also been studied in [21, 22] through the use of the Haizea VM-based lease management architecture.
In this paper, the simulation model consists of one distributed and dynamically scaling Cloud Computing cluster of VMs. The workload consists of parallel jobs (gangs) that are either small or large, based on a job size coefficient specified before the simulation. We compare AFCFS and LJFS under this model in order to study their performance and cost efficiency in a Cloud Computing environment. Additionally, we implement a complex system for adding and removing virtual machines from the system depending on the system’s load at any specific time. To the best of our knowledge, no other publications have addressed this specific topic.
The structure of this paper is as follows. Section 2 presents an in-depth description of the system and workload models. Section 3 describes the Dispatching and Scheduling strategies utilized in the simulation. In Sect. 4, we discuss the VM handling system that we have implemented. Section 5 presents the metrics used to measure performance and cost, the parameters of the simulation, and the results along with an analysis of them. Finally, Sect. 6 provides some conclusive remarks along with our thoughts about future work on the subject.
2 System and workload models
The simulation model consists of a single cluster of Virtual Machines connected with
a Dispatcher Virtual Machine (DVM). Initially, the system leases no VMs, so the cluster is empty. Depending on the workload at any specific moment, the system has the ability to lease new VMs up to a total number of Pmax = 120. This is a limitation posed by Amazon EC2, which allows up to 20 “Regular” VMs and up to 100 “Spot” VMs that can be leased under certain conditions, hence virtually up to 120 VMs. The user can request even more VMs through an electronic request, but the approval of the request is not certain, nor is an answer guaranteed within a specified time limit. Therefore, for the time being, such a feature is excluded from the model.
Each Virtual Machine incorporates its own task waiting queue, where the tasks of parallel jobs are dispatched by the Dispatcher Virtual Machine (DVM). The DVM also includes a waiting queue for jobs that were unable to be dispatched at the moment of their arrival, due either to an insufficient number of VMs at that moment or to overloaded VMs. For the sake of simplicity, the DVM is not counted within the overall limit of VMs, Pmax.
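As a rough sketch of how these entities fit together (the class names, the use of Python, and the deque-based queues are our own assumptions, not the paper's implementation), the model can be pictured as follows:

from collections import deque

P_MAX = 120  # lease cap implied by EC2: 20 "Regular" + 100 "Spot" VMs

class VirtualMachine:
    def __init__(self, vm_id):
        self.vm_id = vm_id
        self.task_queue = deque()   # tasks dispatched to this VM by the DVM

class DispatcherVM:
    def __init__(self):
        self.vms = []               # currently leased VMs (initially none)
        self.job_queue = deque()    # jobs that could not be dispatched on arrival

    def lease_vm(self):
        """Lease one more VM, unless the cap has been reached."""
        if len(self.vms) >= P_MAX:
            return None
        vm = VirtualMachine(len(self.vms))
        self.vms.append(vm)
        return vm

    def release_vm(self, vm):
        """Release a leased VM; only sensible once its task queue is empty."""
        if not vm.task_queue:
            self.vms.remove(vm)

The DVM itself is not counted against P_MAX, mirroring the simplification made above.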
In this paper, we assume that the communication between the virtual machines is contention-free. Therefore, we consider that the communication latencies are included implicitly in the jobs’ execution time. However, we do consider explicit delays when jobs are not immediately dispatched, for the reasons discussed in the previous paragraph.

Fig. 1 The system model
We also assume that all virtual machines are identical, that is, they all belong to the
same class of EC2 virtual machines. As is true with nonvirtualized systems, VMs can
suffer from inequalities in their performance depending on the state of the underlying
hardware at any specific moment. However, studies [2, 16, 17] have shown that VMs
are able to provide near homogeneous performance as long as no I/O takes place.
Even this problem is expected to be resolved in the near future through the use of
newer types of flash memory such as solid state drives (SSD). For these reasons, we
consider that any overhead that may exist due to temporal performance differences between VMs is implicitly included in the execution time of jobs.
Gang scheduling is a special case of scheduling parallel jobs in which the tasks of a job need to communicate very frequently. Thus, each job requires a number of processors equal to its degree of parallelism, i.e., the number of tasks that it consists of, in order to be dispatched and executed. In the model under study, degrees of parallelism are random numbers following the discrete uniform distribution. Furthermore, jobs fall into two different categories of size:
– Lowly Parallel Jobs, that have job sizes in the range [1..16], with probability q
– Highly Parallel Jobs, that have job sizes in the range [17..32], with probability 1 − q
where q is the job size coefficient which determines the proportion of jobs that belong to the first or the second category.
So we can compute the average number of tasks per job or Average Job Size (AJS)
in the following way:
AJS = q · (1 + 16)/2 + (1 − q) · (17 + 32)/2 = 8.5q + 24.5(1 − q)
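For instance, an illustrative job size coefficient of q = 0.5 gives AJS = 0.5 · 8.5 + 0.5 · 24.5 = 16.5 tasks per job.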
Job interarrival times are exponentially distributed with a mean of 1/λ. Task service times are exponentially distributed with a mean of 1/μ. There is no correlation between service times and job size; for example, it is not necessary for a large job to have a long service time.
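A minimal workload-generation sketch under these assumptions (Python/NumPy and all parameter values are our own illustrative choices, not taken from the paper) could look like this:

import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameters, not the paper's experimental values.
lam = 0.5   # arrival rate: interarrival times ~ Exponential(mean 1/lam)
mu = 1.0    # service rate: task service times ~ Exponential(mean 1/mu)
q = 0.5     # job size coefficient: probability that a job is lowly parallel

def next_job():
    """Draw one job: interarrival gap, degree of parallelism, task service time."""
    interarrival = rng.exponential(1.0 / lam)
    if rng.random() < q:
        size = rng.integers(1, 17)    # lowly parallel: discrete uniform on [1..16]
    else:
        size = rng.integers(17, 33)   # highly parallel: discrete uniform on [17..32]
    service = rng.exponential(1.0 / mu)  # drawn independently of the job's size
    return interarrival, size, service

The sketch draws a single service demand per gang, since sibling tasks start and finish together; drawing per-task demands instead would be an equally plausible reading of the model.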
We must emphasize here that jobs always execute to completion and that no preemption takes place. This happens because context switching in the case of Gang Scheduling involves high overhead, since network status must be saved and then restored when switching between tasks. Also, as noted in the same reference, there is a possibility that some messages that should have been received by a process before it was switched may be received by another process after the context switch. For this reason, it is impractical and possibly dangerous to either preempt or migrate gang tasks when they are already running.
3 Dispatching and scheduling strategies
3.1 Job routing
The job entry point for the system is the Dispatcher VM. If the degree of parallelism
of any arriving job is less than or equal to the number of available VMs, the job
is immediately dispatched. The allocation of VMs to tasks is handled by the DVM
which employs the Shortest Queue First (SQF) algorithm for this. SQF dispatches
tasks to VMs with the shortest, least loaded, queues. Tasks that belong to the same
job, also called sibling tasks, cannot occupy the same queue since gang scheduling
requires that there exists a one-to-one mapping of tasks to server VMs. An abstracted
view of SQF is provided in Algorithm 1.
Algorithm 1 Shortest Queue First
vmsByQueueLength := getVMsByQueueLengthIncremental();
for i = 0 to numberOfTasks do