A Multi-Objective Performance Evaluation in Grid Task Scheduling
using Evolutionary Algorithms
MIGUEL CAMELO, YEZID DONOSO, HAROLD CASTRO
Systems and Computer Engineering Department
Universidad de los Andes
Carrera 1Este # 19A-40
COLOMBIA
ydonoso@uniandes.edu.co
Abstract: - This paper presents a new strategy that approximates the solution of the Grid task scheduling problem
with multiple objectives (NP-Hard) in polynomial time using evolutionary algorithms. The results obtained by our
proposed algorithm were compared and evaluated against the classic ε-constraint Multi-Objective Optimization
method, which uses the deterministic Branch and Bound algorithm to find the real Pareto front solutions. The main
contributions of this paper are the proposed mathematical model and the algorithm to solve it.
Key-Words: Multi-Objective Optimization, Grid Task Scheduling, Evolutionary Algorithms, Pareto Front.
1 Introduction
Grid computing technologies, specifically the ones used in High Throughput Computing (HTC) systems, allow
a large amount of processing over long periods of time. The key for these infrastructures to offer high
performance is the effective management and exploitation of the available computing resources [1]. Traditionally,
task scheduling in computing environments has been addressed using mono-objective optimization strategies
[2] [3]. Although this approach obtains good results on a single aspect of performance (e.g. delay time
minimization), the Multi-Objective (MO) nature of Grid Computing [4] makes it ineffective. MO Optimization is
therefore needed to capture the characteristics and requirements of the different stakeholders of the Grid
infrastructure, obtaining a set of trade-off solutions, each good in some aspect.
Our proposal uses the paradigm of Multi-Objective Evolutionary Algorithms (MOEA). These algorithms,
combined with a set of evolutionary operators that take advantage of the characteristics of the general
operations of Grid Computing, solve efficiently and effectively the task scheduling problem in Grid computing
environments, which is an NP-Hard problem [5]. The evaluation of our algorithm's performance showed that
the set of non-dominated solutions found is close to the Pareto-optimal front and uniformly spaced, and that the
algorithm converges fast. The true Pareto front was found using the classical ε-constraint MO Optimization
method and the deterministic Branch and Bound (B&B) algorithm [6]. The rest of the paper is organized
as follows. Section 2 reviews related work, and Section 3 introduces the task scheduling problem in the
Multi-Objective context using a mathematical model of Grid Computing. The proposed algorithm to solve the
problem is described in Section 4. A set of performance tests in which the proposed algorithm is compared against
the ε-constraint method is presented in Section 5, and Section 6 presents the conclusions and future work.
2 Related Work
Very few papers address Grid task scheduling with multiple objectives using MO Optimization. In [4], the
quality of the solutions found by the Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO),
Simulated Annealing (SA) and Genetic Algorithms (GA) Metaheuristics was compared. The results show that PSO
and GA are highly efficient and effective in the task scheduling problem. Later, these Metaheuristics (combined
with the classic MO Optimization Weighted Sum method) were compared against a MOEA algorithm (which one
was not specified), optimizing the makespan and flowtime [6] objective functions. Compared to the other
Metaheuristics, the MOEA algorithm produced better quality solutions and converged to the optimal value on each
execution, whereas the other Metaheuristics may or may not reach the optimal value on a given execution, causing
deviations between the solutions of different executions.
In [7], the authors propose an algorithm called Multi-Objective Resource Scheduling Approach - MORSA,
which is a combination between NPGA and NSGA, two first-generation MOEA Algorithms. They combine the
Applied Mathematics and Informatics
ISBN: 978-960-474-260-8
sorting algorithm of non-dominated solutions with the process of Niche Sharing to ensure diversity. It uses the
execution cost on resources and the flowtime as objective functions. This proposal has the disadvantage of using
first-generation MOEA algorithms: although they are better than classical MO optimization methods, they are less
efficient and effective than second-generation MOEA [8]. Finally, another interesting proposal is presented in
[9], where NSGA-II (a second-generation MOEA) is used as the base algorithm. It aims at balancing
the load between autonomous sites (called Virtual Organizations or VOs) while minimizing the response time
of their jobs. Although this solution has the benefits of using NSGA-II as the base algorithm, the
authors do not incorporate any problem knowledge into the algorithm (using fast heuristics or Metaheuristics),
which prevents it from reaching better results. This assertion is supported by the results of our proposal.
3 Grid Computing as Multi-Objective Optimization Problem
The need to formulate Grid resource management as an MO problem results from the characteristics of the
Grid environment itself. This environment incorporates a set of requirements from different stakeholder groups.
Each group has its own point of view and policies, which should be considered when evaluating different
task schedules against different criteria or objectives.
Grid characteristics can be captured in a formal mathematical model, which allows designing an
optimization algorithm for the task scheduling problem. The features of the mathematical model that support
MO task scheduling are defined based on the High Throughput Computing Job Scheduling informational
document of the Open Grid Forum [10]. In general, the Grid is composed of one or more
Virtual Organizations (VO), each of which is a set of individuals and/or domains governed by a common set of
policies for sharing resources. The objectives/preferences of the users, resource providers and VO administrators
(called stakeholders in the literature) are often inconsistent or conflicting with each other. A user is an entity
that uses available resources to execute computing tasks for a certain period of time. A resource provider, on
the other hand, is an entity that manages computational units inside a single administrative domain. Users
and resource providers can be organized dynamically between VOs, each one with different policies and
preferences. Additionally, a VO manager is the entity responsible for maintaining and controlling the
inter-domain policies for resource exchange and the security standards. Finally, in Grid Computing, the service
responsible for ensuring high efficiency in resource management is the Resource Management and Task
Scheduling service. This service is responsible for identifying requirements, matching resources to tasks (for an
efficient implementation), submitting the task executions to the selected resources and monitoring them until
they finish.
The problem considered in this paper consists of a finite set of users who need to submit a number of tasks
to resources offered by a finite set of resource providers. Each resource provider has an associated subset of a
finite set of resources. Each resource is described by a set of attributes (or resource features). Each computing
task carries hard constraints drawn from a finite set HC. A hard constraint can be defined as a relation between
a resource attribute (e.g. Operating System, Processor Architecture, CPU speed of at least 2 GHz) and a task
requirement concerning that attribute, for each task and each of its requirements. The relation is satisfied if
the resource attribute matches the requirement, in other words if the attribute value is greater than, equal to
and/or lower than the value required for that attribute, as the case may be. For example, if the requirement
states that the attribute has to be greater than a value, the relation is satisfied if the attribute of the resource
is greater than that value. Let a schedule be the set of assignments of resources to execute all tasks; an
assignment is said to be feasible if it satisfies this relation.
Additionally, a required resource quantity can be set for a task (e.g. 1 GB of disk space or 4 CPUs).
It is important to verify that a strong restriction on the access rights that a user needs to perform an
operation (i.e. a task execution) over a resource from a resource provider is satisfied. The permissions
relationship of a user to access and use a resource holds if and only if the task owner has the access rights to
use the allocated resource; assigning a resource to a task is valid only in that case. Given the definitions,
notations and considerations that describe the task scheduling problem in Grid environments, the formal
definition of the Multi-Objective problem is:
[Equations (1) to (8): formal definition of the Multi-Objective problem. The equation bodies are not recoverable.]
Equation (2) indicates that all the attribute requirements of a task have to be satisfied under the hard
constraints. Equation (3) ensures that the available resources meet the resource requirements of a task, and
equation (4) is the access-rights constraint of a user over the resources used to execute their tasks. Equations (5)
and (6) show the sets and attributes used in the mathematical model, and equation (7) indicates a feasible
allocation of tasks to resources, which should be part of the set of all feasible solutions that satisfy the
problem constraints and meet the definition of the Pareto Front [11].
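The three constraint families described in this section (attribute hard constraints, required quantities and access rights) can be illustrated with a small feasibility check. This is only a minimal sketch under assumptions: the dictionary layout, the field names (`attrs`, `capacity`, `allowed_users`, `requirements`, `quantities`, `owner`) and the operator strings are hypothetical and are not part of the model in this paper.

```python
import operator

# Comparison operators that a hard constraint may use (assumed set).
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}

def satisfies_hard_constraints(task, resource):
    """Every (attribute, op, value) requirement must hold on the resource."""
    for attr, op, required in task["requirements"]:
        if attr not in resource["attrs"]:
            return False
        if not OPS[op](resource["attrs"][attr], required):
            return False
    return True

def has_enough_capacity(task, resource):
    """Requested quantities (e.g. CPUs, disk) must not exceed what is available."""
    return all(resource["capacity"].get(name, 0) >= amount
               for name, amount in task["quantities"].items())

def has_access(task, resource):
    """The task owner must be allowed to use the resource."""
    return task["owner"] in resource["allowed_users"]

def is_feasible(task, resource):
    """An assignment is feasible only if all three checks pass."""
    return (satisfies_hard_constraints(task, resource)
            and has_enough_capacity(task, resource)
            and has_access(task, resource))

# Hypothetical resource and task used only for illustration.
resource = {"attrs": {"os": "Linux", "cpu_ghz": 2.4},
            "capacity": {"cpus": 8, "disk_gb": 100},
            "allowed_users": {"alice", "bob"}}
task = {"owner": "alice",
        "requirements": [("os", "==", "Linux"), ("cpu_ghz", ">=", 2.0)],
        "quantities": {"cpus": 4, "disk_gb": 1}}
```

In an evolutionary algorithm, a check of this kind would typically be applied when generating or repairing individuals, so that only feasible assignments enter the population.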
4 NSGA-II for Grid Task Scheduling
In 2000, Kalyanmoy Deb et al. [11] presented the Non-dominated Sorting Genetic Algorithm II
(NSGA-II), which is a revision of the NSGA algorithm from the first generation of MOEAs. NSGA-II reduces the
computational complexity, incorporates an elitism operator and eliminates the parameters of the diversity
operator, allowing for greater transparency in the algorithm.
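NSGA-II ranks solutions by Pareto dominance. As an illustration, assuming that all objectives are minimized and that each solution is summarized by a tuple of objective values, dominance and the extraction of the first non-dominated front can be sketched as follows. The function names are hypothetical, and the quadratic filter shown here is a plain illustration, not the fast non-dominated sorting procedure of NSGA-II.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(points):
    """Keep the points that no other point dominates (first front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (makespan, flowtime) pairs, both to be minimized.
points = [(10, 50), (12, 40), (11, 55), (15, 40), (9, 60)]
front = non_dominated(points)
```

Here `(11, 55)` is dominated by `(10, 50)` and `(15, 40)` by `(12, 40)`, so the remaining three points form the trade-off set.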
4.1 Evolutionary Algorithm Operators
The convergence of the solution set to the Pareto Front is closely linked to the evolutionary operators used.
This section describes the configuration parameters and operators designed for NSGA-II applied to the Job
Scheduling Problem. Algorithm 1 shows the NSGA-II pseudocode applied to task scheduling and Figure 1 shows
the algorithm's operation.
Fig. 1 NSGA-II Operation
4.1.1 Chromosome, Comparison and Selection
The chromosome encoding is done indirectly. Given a chromosome of length n, the position of each gene in the
chromosome is the task identifier and the allele (the value the gene takes), in the range 1 to m, is the resource
to which the task is assigned. If n is the number of tasks to be scheduled in the batch and m the number of
resources available, an instance with n = 6 and m = 3 indicates that the chromosome that represents a solution is
encoded as a vector of six positions, where each position can take the values 1, 2 and 3. The crowding
comparison operator is used to choose which solutions of the last front enter the next generation population.
The binary tournament operator is used in the creation of the child population.
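The encoding described above can be sketched in a few lines. The example uses a chromosome of six positions with alleles in 1 to 3, as in the text; the helper names `random_chromosome` and `decode`, and the use of 0-based task indices, are assumptions made for illustration.

```python
import random

def random_chromosome(n_tasks, m_resources, rng=random):
    """Gene position = task index (0-based here); allele = resource id in 1..m."""
    return [rng.randint(1, m_resources) for _ in range(n_tasks)]

def decode(chromosome, m_resources):
    """Group task indices by the resource each task is assigned to."""
    schedule = {r: [] for r in range(1, m_resources + 1)}
    for task, resource in enumerate(chromosome):
        schedule[resource].append(task)
    return schedule

chromosome = [1, 3, 2, 1, 3, 3]   # six tasks, three resources
schedule = decode(chromosome, 3)  # {1: [0, 3], 2: [2], 3: [1, 4, 5]}
```

With this representation every vector of length n with values in 1 to m is a syntactically valid schedule, so crossover and mutation never produce a malformed individual; only the problem constraints remain to be checked.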
4.1.2 Crossover and Mutation
The crossover operator extends the search space by generating new solutions from the existing ones. The selected
crossover strategy is single-point crossover [8], which performed well on this problem.
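A minimal sketch of single-point crossover on the vector encoding follows. The function name and the optional `rng` parameter are hypothetical; the cut point is drawn so that each child receives at least one gene from each parent.

```python
import random

def single_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length parents at a random cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2:
        return parent_a[:], parent_b[:]
    cut = rng.randint(1, len(parent_a) - 1)   # cut in 1..n-1
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

Because the gene position identifies the task, both children still assign every task to exactly one resource.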
The mutation operation provides Metaheuristics with local search functions and solution diversity. The
mutation process acts in four different ways:
a) Fast load balancing between the most loaded resource (greatest makespan) and the least loaded one (lowest
makespan). This heuristic selects the most loaded and the least loaded resources of a chromosome and
exchanges two tasks between them.
b) Random migration of jobs between resources. Given a chromosome, two genes are randomly selected and
the values of their alleles are exchanged.
c) Min-Min strategy. Given a random point in the chromosome, the Min-Min heuristic generates the remaining
values of the chromosome.
d) Incorporation of new random population. New solutions, which must meet the problem constraints, are
incorporated at random to extend the search space.
The mutation strategy has a low complexity and increases the capability of NSGA-II to find the real
Pareto Front. Procedures a), b) and c) apply to 90% of the individuals to be mutated; the remaining
10% are eliminated and replaced using procedure d).
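Procedures a) and b) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: `etc[t][r-1]` is a hypothetical matrix with the expected run time of task `t` on resource `r`, the load of a resource is taken to be the sum of the run times of its tasks, and the load-balancing step exchanges one randomly chosen task from each of the two resources (or moves a single task if the least loaded resource is empty).

```python
import random

def resource_loads(chromosome, etc, m_resources):
    """Total run time per resource; etc[t][r-1] is an assumed run-time matrix."""
    loads = {r: 0.0 for r in range(1, m_resources + 1)}
    for task, resource in enumerate(chromosome):
        loads[resource] += etc[task][resource - 1]
    return loads

def swap_mutation(chromosome, rng=random):
    """Procedure b): exchange the alleles of two randomly chosen genes."""
    child = chromosome[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def load_balancing_mutation(chromosome, etc, m_resources, rng=random):
    """Procedure a): exchange tasks between the most and least loaded resources."""
    child = chromosome[:]
    loads = resource_loads(child, etc, m_resources)
    busiest = max(loads, key=loads.get)
    idlest = min(loads, key=loads.get)
    if busiest == idlest:            # all loads equal, nothing to balance
        return child
    from_busy = [t for t, r in enumerate(child) if r == busiest]
    from_idle = [t for t, r in enumerate(child) if r == idlest]
    child[rng.choice(from_busy)] = idlest
    if from_idle:                    # the idle resource may hold no task at all
        child[rng.choice(from_idle)] = busiest
    return child
```

Both operators run in time linear in the chromosome length, which is in line with the low complexity claimed for the mutation strategy.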
4.2 Evolutionary Algorithm Parameters
The initial population in our algorithm is mostly generated at random, meeting the problem constraints while
keeping the complexity of the algorithm low. A few individuals, on the other hand, are generated with fast
Metaheuristics/Heuristics, enabling better convergence and a faster algorithm. The proposed initial
population has a size of 100 individuals, of which 95 (95%) are randomly generated. Of the remaining 5
(5%), four are generated by genetic algorithms (GA) that minimize either the makespan or the flowtime, and
the last individual is generated by the Min-Min heuristic. The termination criterion is the maximum number of
generations, which was set at 500. This value was chosen experimentally because increasing it offered no
further improvement in the solutions. Finally, the relevant objective functions in the Grid according to [10]
can be classified according to the stakeholders involved in the Grid (users, resource providers and
administrators of the VO). For the model, we selected two of these functions to observe their behavior in the
performance tests of the Multi-Objective Algorithm: load balancing (makespan minimization) and job completion
time minimization (flowtime minimization).
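The two objectives and the Min-Min seed individual can be illustrated with the sketch below. It rests on assumptions that the text does not spell out: `etc[t][r]` is a hypothetical matrix of expected run times (0-based resource column), tasks assigned to the same resource run one after another in gene order, makespan is the latest finishing time over all resources, and flowtime is the sum of the finishing times of all tasks. Min-Min is written in its usual form: repeatedly schedule the pending task with the smallest achievable completion time on the resource that achieves it.

```python
def evaluate(chromosome, etc, m_resources):
    """Return (makespan, flowtime) for alleles in 1..m (assumed definitions)."""
    clock = [0.0] * (m_resources + 1)   # clock[r]: time at which resource r is free
    flowtime = 0.0
    for task, resource in enumerate(chromosome):
        clock[resource] += etc[task][resource - 1]
        flowtime += clock[resource]     # finishing time of this task
    return max(clock[1:]), flowtime

def min_min(etc, m_resources):
    """Min-Min heuristic: build one chromosome greedily."""
    ready = [0.0] * m_resources         # time at which each resource becomes free
    chromosome = [0] * len(etc)
    pending = set(range(len(etc)))
    while pending:
        # Smallest completion time over all (pending task, resource) pairs.
        finish, task, res = min((ready[r] + etc[t][r], t, r)
                                for t in pending for r in range(m_resources))
        chromosome[task] = res + 1
        ready[res] = finish
        pending.remove(task)
    return chromosome

# Hypothetical run times for four tasks on two resources.
etc = [[2, 4], [3, 1], [5, 5], [1, 2]]
objectives = evaluate([1, 2, 1, 2], etc, 2)   # (7.0, 13.0)
seed = min_min(etc, 2)                        # [1, 2, 2, 1]
```

On this toy instance the Min-Min seed evaluates to `(6.0, 12.0)`, better in both objectives than the arbitrary schedule `[1, 2, 1, 2]`, which shows why a few heuristic individuals can speed up convergence.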
5 Evaluations and Results
One of the best ways to measure the quality of a Multi-Objective algorithm is to use metrics that evaluate it
against the theoretical solutions of the model. In this work, we built the real Pareto Front by solving the
mathematical problem with the ε-constraint method using the deterministic/exact B&B algorithm.
The results obtained allow us, first, to use the generational distance (GD) metric [11] to evaluate the
convergence of the Pareto front found by our proposal towards the real Pareto front found by the exact
algorithm. It is also necessary to evaluate the ability to generate well-distributed solutions along the real
Pareto front, so the spacing (S) metric [11] is used. The GD and S equations are presented in equations 9 and
10. The algorithm was run 20 times and the GD and S metrics were obtained on each run. Their average and
standard deviation were then computed. The configuration parameters of the ε-constraint method are presented
in Table 1.
[Equation (9): generational distance (GD)]
[Equation (10): spacing (S)]

Options                                   Value
Integer Problem Relaxation Solution       Simplex-Dual
Search Heuristic                          Pure Depth First
Quality Integer solution for Acceptance   0.05%
Max Time Exploration for Solution         3600 seconds
ε values calculated                       30
Table 1. ε-constraint method configuration parameters.
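For readers who want to reproduce the two metrics, the sketch below uses their commonly cited forms, which are an assumption here and may differ in detail from the definitions used in this paper: GD as the p-norm (p = 2) of the Euclidean distances from each solution found to its nearest point on the true front, divided by the number of solutions found, and spacing as the standard deviation (with divisor equal to the number of solutions) of the nearest-neighbour Manhattan distances within the front found.

```python
import math

def generational_distance(front, true_front, p=2):
    """Aggregate distance from each found solution to the nearest true-front point."""
    dists = [min(math.dist(a, b) for b in true_front) for a in front]
    return sum(d ** p for d in dists) ** (1.0 / p) / len(front)

def spacing(front):
    """Spread of nearest-neighbour distances; 0 means evenly spaced solutions."""
    if len(front) < 2:
        return 0.0
    d = [min(sum(abs(x - y) for x, y in zip(a, b))
             for j, b in enumerate(front) if j != i)
         for i, a in enumerate(front)]
    mean = sum(d) / len(d)
    return math.sqrt(sum((di - mean) ** 2 for di in d) / len(d))
```

A GD of zero means every solution found lies on the true front, and a spacing of zero means the solutions are equidistant from their nearest neighbours; both metrics are expressed in the units of the objective functions.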
5.1 Pareto Front, Spacing and GD Results
In this evaluation, three instances of the problem were used to calculate the real Pareto Front and the one
generated by our proposal. An instance is a pair (m, n), where m is the number of machines and n the
number of jobs. The instances used for evaluation were: a small instance (3,13), which generates a search
space of 1,594,323 solutions; a medium instance (5,100), which generates a search space of $5^{100}$ solutions;
and a big instance (50,300), with a search space of $50^{300}$ solutions. The Pareto Fronts obtained are
presented in Figures 2, 3 and 4, and the spacing and generational distance values in Table 2.
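The size of the search space follows directly from the encoding: assuming each of the n jobs can be assigned to any of the m machines, there are m^n candidate schedules, which reproduces the 1,594,323 figure quoted for the small instance. The helper name below is hypothetical.

```python
def search_space_size(m_machines, n_jobs):
    """Number of length-n vectors with alleles in 1..m (constraints ignored)."""
    return m_machines ** n_jobs

instances = [(3, 13), (5, 100), (50, 300)]
sizes = {inst: search_space_size(*inst) for inst in instances}
digits = {inst: len(str(size)) for inst, size in sizes.items()}
```

Python integers have arbitrary precision, so the counts for the medium and big instances are computed exactly; `digits` gives the number of decimal digits of each count, which makes the growth between instances easy to compare.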
Fig. 2 Pareto Front in the instance (3,13)
Fig. 3 Pareto Front in the instance (5,100)
The results for the (3,13) instance show that all the points of the real Pareto front were found by the MOEA;
the spacing and generational distance values computed for it are below unity because of the precision of
the decimal values of the exact algorithm. These results confirm the convergence of the algorithm towards the
real Pareto front. Furthermore, both the generational distance and the spacing have a low standard deviation,
which shows that the algorithm converges consistently on every execution.
Instance   Metric    Average   Standard deviation
(3,13)     GD        0.091     0.038
(3,13)     Spacing   1.42      0.45
(5,100)    GD        1.29      0.75
(5,100)    Spacing   1.44      1.18
(50,300)   GD        Not calculated because the ε-constraint method did not work (RAM was exceeded)
(50,300)   Spacing   1.787     1.3217
Table 2. Generational Distance and Spacing metrics results
Fig. 4 Pareto Front in the instance (50,300)
In the (5,100) instance, the algorithm's efficiency was demonstrated again, even in a bigger search space.
The calculated metrics show the high quality and diversity of the non-dominated solutions of the Pareto front
found by the proposed algorithm. Although the spacing and the GD both increased, they are still low on the scale
of the objective functions. Finally, in the (50,300) instance (a large-scale combinatorial problem), the program
used to calculate the optimum values of the real Pareto front did not work. The space and computational
complexity of the mathematical programming model rendered the deterministic algorithm of the classic method
useless. The calculated spacing and its deviation showed that, across the executions of the algorithm, the
distribution of the Pareto front stayed almost constant. This indicates that the fronts generated on each
execution were similar to one another, meaning convergence to the same optimum values.
5.2 Complexity Analysis
The computational complexity of the NSGA-II algorithm is polynomial and equal to $O(MN^2)$ for each of the $G$
generations, where $M$ is the number of objective functions and $N$ the number of individuals in the population,
meaning the complexity of its implementation is $O(GMN^2)$, and this is independent of the number of resources
and tasks. On the other hand, B&B has, in the worst case, to fully evaluate the search space of the
combinatorial problem, so this algorithm is, in the worst case, of exponential complexity $O(m^n)$, where $m$ is
the number of resources and $n$ is the number of tasks. In practical terms, the execution times of our proposal
and of the traditional method were measured. Table 3 shows the execution times measured for both
algorithms.
This table shows the inefficiency of deterministic algorithms on NP-Complete problems. In a small
problem instance, the difference in execution time between the algorithms is low. However, in
large problem instances the deterministic algorithm has very high execution times, so the optimization results
have to be obtained as a gap with respect to the relaxed solution of the optimization problem. In the last
instance (50,300), the
deterministic algorithm did not finish because its space complexity is beyond the capabilities of the machine
used for the assessments (Dual Core Xeon Processor and 4 GB of RAM).
Instance   NSGA-II for Grid Scheduling (seconds)   ε-constraint using Branch and Bound (seconds)
3-13       10.31                                   >>1
10-50      32.36                                   62.5
5-100      37.88                                   147.8
20-200     49.27                                   <<3600
50-300     94.28                                   Intractable
Table 3. Execution Time of NSGA-II and Branch and Bound
6 Conclusions and Future Work
It is important to note that the proposed algorithm finds a set of solutions close to the optimal Pareto
Front. The charts presented, supported by the GD and spacing metrics, showed that the use of MOEA
algorithms is highly recommended to tackle Multi-Objective task scheduling problems, which are of
combinatorial nature and generate a large-scale and/or non-convex search space. In the medium- and large-scale
Grid scheduling problems (5-100 and 50-300), we noted the difficulties suffered by the traditional deterministic
method when trying to find the exact Pareto front (because of its computational and space complexity). In the
5-100 instance, the computational complexity made the B&B algorithm very slow at finding the values used to
generate the real Pareto front with the classic MO method. In the 50-300 instance, the B&B algorithm failed
because the optimizer software exceeded the memory capacity of the machine: the constraint matrix of the
problem required more than 4 GB of RAM.
It is worth remembering that the HTC task scheduling problem includes scenarios of hundreds of
thousands of resources/tasks that must be scheduled quickly, where deterministic algorithms cannot be
used and it is therefore necessary to use alternatives such as the one suggested here, which proved to be
efficient and effective when tackling the problem. As future work, we propose to compare the algorithm with
other MOEAs on large-scale instances. Furthermore, we also suggest evaluating other evolutionary
operators and incorporating new heuristics into the mutation operator to speed up the algorithm's
convergence to the real Pareto front.
References:
[1] Lei, Zhou, Yun, Zhifeng and Allen, Gabrielle., "Grid Resource Allocation." [ed.] Lizhe Wang, Wei Jie and Jinjun
Chen. Grid Computing: Infrastructure, Service, and Applications. Boca Raton : CRC Press, 2009, 7, pp. 1172-188.
[2] Braun, Tracy D., et al., A comparison of eleven static heuristics for mapping a class of independent tasks onto
heterogeneous distributed computing systems. s.l. : Journal of Parallel and Distributed Computing, Academic Press, Inc,
2001. pp. 810 - 837. Vol. 61. ISSN:0743-7315.
[3] Izakian, Hesam, Abraham, Ajith and Snasel, Václav., "Comparison of Heuristics for Scheduling Independent Tasks
on Heterogeneous Distributed Environments." Proceedings of the 2009 International Joint Conference on Computational
Sciences and Optimization. s.l. : IEEE Computer Society, 2009. Vol. 1, pp. 8-12.
[4] Xhafa, Fatos and Abraham, Ajith., Metaheuristics for Scheduling in Distributed Computing Environments. 2008.
ISBN: 978-3-540-69260-7.
[5] Bäck, Thomas., Evolutionary Algorithms in Theory and Practice. New York : Oxford University Press, Inc, 1996.
ISBN 0-19-509971-0.
[6] Pinedo, Miguel L., Scheduling: Theory, Algorithms, and Systems. Fifth Edition. s.l. : Springer, 2008. p. 678. ISBN
0387789340.
[7] Ye, Guangchang, Rao, Ruonan and Li, Minglu., A Multiobjective Resources Scheduling Approach Based on Genetic
Algorithms in Grid Environment. Hunan, China : Fifth International Conference on Grid and Cooperative Computing
Workshops, 2006. ISBN: 0-7695-2695-0.
[8] Deb, Kalyanmoy., Multi-Objective Optimization using Evolutionary Algorithms. New York : John Wiley & Sons,
Ltd., 2001. ISBN: 0-471-87339-X.
[9] Grimme, Christian, Lepping, Joachim and Papaspyrou, Alexander., "Discovering Performance Bounds for Grid
Scheduling by using Evolutionary Multiobjective Optimization." Atlanta, GA, USA : Proceedings of the 10th annual
conference on Genetic and evolutionary computation, ACM, 2008, pp. 1491-1498.
[10] WG, OGSA HPC Profile., "HPC Job Scheduling: Base Case and Common Cases." [Online] 2006. [Cited: 06 10,
2010.] http://www.ogf.org/documents/GFD.100.pdf.
[11] Deb, K., et al., "A fast elitist non-dominated sorting genetic algorithm for multi-objective optimisation: NSGA-II."
PPSN VI: Proceedings of the 6th International Conference on Parallel Problem Solving from Nature, London, UK :
Springer Verlag, 2000, pp. 849–858.
[12] Open Grid Forum - Group Scheduling Working., "Ten Actions When SuperScheduling." [Online] 2001.
http://www.ogf.org/documents/GFD.4.pdf.