MISTA 2013
An Iterative Local Search Algorithm for Scheduling
Precedence-Constrained Applications on Heterogeneous
Machines
Aurelio A. Santiago Pineda · Johnatan E. Pecero · Héctor J. Fraire Huacuja · Juan J. Gonzalez Barbosa · Pascal Bouvry
Abstract The paper deals with the problem of scheduling precedence-constrained applications on a distributed heterogeneous computing system with the aim of minimizing the response time, or total execution time. The main contribution is a scheduling algorithm based on an iterative local search process. Due to the lack of generally accepted standard benchmarks for the evaluation of scheduling algorithms on heterogeneous computing systems, we also generate a benchmark of synthetic instances. The benchmark is composed of small-size synthetic deterministic non-preemptive program graphs known in the literature. We compute the optimal solution and the global optimal value with an exact enumerative search method that explores all possible solutions, and we compare the performance of the proposed local search algorithm with these optimal values. We have also simulated the proposed algorithm using graphs obtained from real-world applications, demonstrating the practical interest of the approach.
1 Introduction
Heterogeneous computing systems are a commonplace infrastructure providing distributed resources, interconnected via networks, for executing parallel applications with large data and computing requirements. In such a system, a parallel application can be partitioned into a number of cooperative tasks that are distributed to the resources for parallel execution. However, the performance of a parallel application executed on a parallel and distributed computing system depends heavily on the scheduling of the application's tasks onto the available processors; if this problem is not properly solved, it can nullify the benefits of parallelization and the power of the distributed computing resources. Moreover, not only is the performance of the parallel application deteriorated, but issues related to energy consumption are also affected if the scheduling problem is not properly handled [17,20,28].
Johnatan E. Pecero ·Pascal Bouvry
University of Luxembourg
E-mail: {johnatan.pecero, pascal.bouvry}@uni.lu
Aurelio A. Santiago Pineda ·H´ector J. Fraire Huacuja ·Juan J. Gonzalez Barbosa
Instituto Tecnológico de Ciudad Madero
E-mail: alx.santiago@gmail.com, automatas2002@yahoo.com.mx
E-mail: jjgonzalezbarbosa@hotmail.com
6th Multidisciplinary International Conference on Scheduling : Theory and Applications (MISTA 2013)
27-29 August 2013, Gent, Belgium
In its general form, the precedence-constrained scheduling problem is NP-complete [7]. Therefore, many heuristic-based scheduling algorithms have been proposed that find a sub-optimal solution and attempt to balance running time, complexity, and schedule quality [5,14–16,19,25]. However, the performance of these heuristics is still an open research problem [4,10,11,27,31]. Consequently, there is increasing interest in investigating and designing heuristics for scheduling precedence-constrained parallel programs on heterogeneous computing systems. The main motivations are not only the availability of heterogeneous computing platforms, such as Grid and P2P systems, but also the increasing interest in industry and science in executing parallel applications that can be modelled by precedence task graphs or workflows [11].
In this paper we propose a local search algorithm based on an iterative search process to solve the precedence-constrained scheduling problem. The iterated local search (ILS) algorithm is a straightforward, yet powerful, technique for extending simple local search algorithms. We generate a benchmark composed of small-size synthetic deterministic parallel program graphs proposed in the literature, and compute the optimal solution by an enumerative search process that exhaustively explores the search space. We compare the performance of the proposed ILS algorithm with the optimal values using an approximation factor. To evaluate and investigate scalability issues, we also simulated the ILS algorithm using parallel graphs that model real-world applications. The results of the experimental study show that the algorithm is efficient in solving the problem, providing results close to the optimal value.
The paper is organized as follows. In Section 2 we describe the precedence-constrained
scheduling problem. Section 3 discusses related work. The proposed iterated local search
algorithm is described in Section 4. Next, in Section 5 we present the benchmark and
experimental results. Section 6 concludes the paper.
2 Problem description
The target system used in this work is represented by an undirected unweighted graph Gs = (Vs, Es), called a system graph (see, e.g., [31]). Vs is the set of Ns nodes of the system graph, representing the m processors. Es is the set of edges, representing bidirectional channels between processors, and defines the topology of the distributed system. The processors have different processing speeds, i.e., they provide different processing performance in terms of MIPS (Million Instructions Per Second), and communication via the links does not consume any processor time.
As usual, a parallel program is represented by a weighted directed acyclic graph (DAG). The DAG, called a precedence task graph or program graph, is defined as G = (T, E), where T is a finite set of nodes (vertices) and E is a finite set of edges. Each node ti ∈ T is associated with one task ti of the modeled parallel program. To every task ti there is an associated value pij representing the computation cost of task ti on processor mj; its average computation cost is denoted pi. Each edge (ti1, ti2) ∈ E (with ti1, ti2 ∈ T) is a precedence constraint between tasks and represents inter-task data communication: the output produced by task ti1 has to be communicated to task ti2. We consider the same communication model as in [4]. That is, the data parameter is a t × t matrix of communication data, where data(ti, tj) is the amount of data required to be transmitted from task ti to task tj. The rate parameter is an m × m matrix representing the data transfer rates between processors. The communication cost of edge (ti, tj) ∈ E, which is for data transfer from task
ti (scheduled on processor mk) to task tj (scheduled on processor ml), is defined by c(ti, tj) = data(ti, tj)/rate(mk, ml). When both ti and tj are scheduled on the same processor (mk = ml), c(ti, tj) becomes zero. The average communication cost of an edge is defined by c̄(ti, tj) = data(ti, tj)/rate, where rate is the average transfer rate between the processors in the domain. For a given DAG, the communication-to-computation ratio (CCR) is a measure that indicates whether a task graph is communication intensive, computation intensive, or moderate. It is computed as the average communication cost divided by the average computation cost on the target system.
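The communication cost and CCR definitions above can be illustrated with a small sketch. This is our own toy code, not from the paper; the dictionary-based data/rate layout and all values are assumptions for illustration only.

```python
def comm_cost(data, rate, ti, tj, mk, ml):
    """Cost of edge (ti, tj) when ti runs on processor mk and tj on ml."""
    if mk == ml:                      # same processor: no communication cost
        return 0.0
    return data[(ti, tj)] / rate[(mk, ml)]

def ccr(data, avg_rate, comp_costs):
    """CCR = average communication cost / average computation cost."""
    avg_comm = sum(d / avg_rate for d in data.values()) / len(data)
    avg_comp = sum(comp_costs) / len(comp_costs)
    return avg_comm / avg_comp

# Toy example: two edges, average rate 2.0, average computation cost 5.
data = {(0, 1): 10, (0, 2): 20}
print(comm_cost(data, {(0, 1): 2.0}, 0, 1, 0, 1))   # 10 / 2.0 = 5.0
print(ccr(data, 2.0, [5, 5, 5]))                    # 7.5 / 5 = 1.5
```

A CCR above 1 marks a communication-intensive graph; below 1, a computation-intensive one.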
A simple task graph and its details are shown in Figure 1. The values in the last column of the table are computed with a frequently used task prioritization method, the bottom level (blevel). The blevel of a node is the length of the longest path from that node to an exit node. Note that both the computation and communication costs are averaged over all nodes and links. The blevel(ti) is computed recursively by traversing the DAG upward, starting from the exit task texit, as follows (Eq. 1):

blevel(ti) = pi + max_{tj ∈ succ(ti)} {blevel(tj) + c̄(ti, tj)},   (1)

where succ(ti) is the set of immediate successors of ti and blevel(texit) = p(texit).
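Eq. (1) translates directly into a recursive function. The following is a minimal sketch with assumed names (succ, p_avg, c_avg are our own, not the paper's):

```python
def blevel(ti, succ, p_avg, c_avg):
    """blevel(ti) = p_avg[ti] + max over tj in succ(ti) of
    (blevel(tj) + c_avg[(ti, tj)]), per Eq. (1).
    An exit task (no successors) has blevel equal to its average
    computation cost."""
    if not succ.get(ti):
        return p_avg[ti]
    return p_avg[ti] + max(blevel(tj, succ, p_avg, c_avg) + c_avg[(ti, tj)]
                           for tj in succ[ti])

# Toy chain 0 -> 1: blevel(1) = 2, blevel(0) = 3 + (2 + 4) = 9.
succ = {0: [1], 1: []}
print(blevel(0, succ, {0: 3, 1: 2}, {(0, 1): 4}))   # 9
```

For larger DAGs, memoizing the recursion (or traversing tasks in reverse topological order) avoids recomputing shared subpaths.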
task  r0  r1  r2  pi  blevel
0     11  13   9  11  101.3
1     10  15  11  12   66.7
2      9  12  14  12   63.3
3     12  16  10  12   73.0
4     15  11  19  15   79.3
5     13   9   5   9   41.7
6     11  15  13  12   37.3
7     11  15  10  12   12.0

Fig. 1 On the left, a sample DAG with the task indexes i inside the nodes and the values of the c̄(ti1, ti2) function next to the corresponding edges. On the right, the computation costs (pi) and the task priorities (blevel).
The aim of scheduling is to distribute the tasks among the processors in such a way that the precedence constraints are preserved and the response time Cmax (the total execution time, or makespan) is minimized. The response time Cmax for a given precedence task graph depends on the allocation of the tasks in the distributed computing topology and on the scheduling policy applied by the individual processors [31]:

Cmax = f(allocation, scheduling policy)   (2)

The scheduling policy defines an order for processing tasks and assigns a starting time to each task that is ready to run on a given processor. We assume that this policy is the same for any run of the scheduling algorithm, and we focus on finding an allocation of the tasks of a parallel application in a distributed computing system so as to
minimize the makespan. Although the minimization of the makespan is crucial, the tasks of a DAG in this work are not associated with deadlines, as they would be in real-time systems.
An approximation factor is used to evaluate the proposed algorithms. The factor is defined as ρ = Cmax / C*max, where C*max is the optimal response time, or makespan [11].
3 Related Work
The scheduling problem is NP-hard even in its simplest version (the homogeneous case without communications). Therefore, many heuristics have been proposed to schedule DAG applications on heterogeneous distributed computing systems. A well-known scheduling algorithm is the Heterogeneous Earliest Finish Time (HEFT) algorithm [33]. The HEFT algorithm maintains a list of all tasks of a given graph ordered by their priorities, usually based on the blevel method. It consists of two phases. In the first phase, a ready task is selected from the priority list: the task with the highest priority for which all predecessor tasks have finished is chosen. This corresponds to the task prioritizing, or task selection, phase. Thereafter, a suitable processor that minimizes a predefined cost function is selected (the processor selection phase); in this case, the processor that yields the earliest finish time for that task. HEFT is one of the algorithms most frequently used as a basis for comparison when evaluating the performance of newly proposed scheduling algorithms [2–5,8–10,12,13,15,16,25–27,29,30,35]. Therefore, we also use HEFT in Section 5 to validate the proposed approach.
A number of local search algorithms for scheduling have been investigated in the literature. Kwok et al. [24] present a first-improvement random local search algorithm named FAST. In this algorithm, a task is randomly picked and then moved to a randomly selected processor. If the schedule length is reduced, the move is accepted; otherwise, the task is moved back to its original processor. Kwok and Ahmad [23] modified the FAST algorithm; the major improvement is the use of a nested loop for a probabilistic jump. A parallel version of FAST is named FASTEST. Wu et al. [34] proposed a local search algorithm based on topological ordering. The algorithm is a deterministic guided search that uses the level of a task, defined as the sum of the top level and the blevel, to schedule the tasks. It first selects the task with the largest level and then assigns it to the processor that yields the smallest level for that task. The level of each task is calculated dynamically and is used to determine the search direction. However, the computing system considered is based on homogeneous processors. Kim et al. [18] report a deterministic local-search-based scheduling algorithm. The algorithm starts with a schedule found by a deterministic scheduling algorithm and then iteratively attempts to improve the current best solution using a deterministic guided search method based on prior knowledge about the task scheduling problem and the target computing environment. The main idea is to move tasks to fill the idle periods of processors. A major limitation of this approach is that it assumes complete knowledge of the problem and that the information about tasks and communications is always accurate; in practice, many external events can modify the parameters of the scheduling problem. Kang et al. [14] propose an iterated greedy algorithm whose main idea is to improve the quality of the assignment iteratively, using results from previous iterations. The algorithm first uses a constructive heuristic to find an initial assignment and iteratively improves it in
a greedy way. The authors consider additional resource constraints in the scheduling problem; however, in this work we do not consider such constraints.
4 Iterated Local Search Based Scheduling
The solution we propose in this work is a scheduling algorithm based on local search. Local search was one of the earliest techniques for combinatorial optimization. The principle is to refine a given initial point in the solution space by searching through a neighborhood of that point. If an improvement can be achieved in this manner, a new solution is obtained, and the process continues until no further improvement can be found. However, local minima are common in many problems, and local search algorithms can converge quickly to them, getting stuck in a local optimum far from the global optimum. We propose a local search algorithm based on an iterated local search (ILS) to alleviate this problem.
Algorithm 1 shows the pseudocode of the ILS we use in this paper. The ILS algorithm is a trajectory-based metaheuristic that can be seen as a straightforward, yet powerful, technique for extending simple local search algorithms. The algorithm starts by generating and evaluating an initial solution. The search process can be initialized in various ways, for example from a randomly generated solution or from a heuristically constructed solution to the given problem. Then, following an iterative approach, it seeks to improve the solutions from one iteration to the next. At each iteration, a perturbation of the obtained local optimum is carried out; the perturbation mechanism modifies a given candidate solution to allow the search process to escape from local optima. A local search is then applied to the perturbed solution. The new solution is evaluated and accepted as the new current solution under some conditions. The algorithm finishes when the termination condition is met. The following subsections detail the core components of the ILS customized to the task scheduling problem.
Algorithm 1 Algorithm outline of ILS based scheduling algorithm
1: sol = GenerateInitialSolution();
2: EvaluateSolution(sol);
3: bestSol = sol;
4: repeat
5: Perturbation(sol);
6: LocalSearch(sol);
7: EvaluateSolution(sol);
8: if sol <bestSol then
9: bestSol = sol;
10: end if
11: until termination condition met
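Algorithm 1 can be sketched as a runnable loop. The following is an illustrative Python sketch, not the authors' code: the function names and the toy problem below are our own assumptions, and the acceptance rule follows Section 4.4 (a candidate is kept only if it improves the makespan):

```python
import random

def iterated_local_search(initial, perturb, local_search, makespan,
                          max_iters=15, seed=0):
    """Generic ILS loop: perturb the incumbent, apply local search,
    evaluate, and accept only improving schedules."""
    rng = random.Random(seed)
    best = initial()
    best_val = makespan(best)
    for _ in range(max_iters):
        cand = local_search(perturb(list(best), rng))
        val = makespan(cand)
        if val < best_val:            # acceptance criterion of Section 4.4
            best, best_val = cand, val
    return best, best_val

# Hypothetical toy problem: minimize the sum of a processor-assignment vector.
toy_makespan = sum
def toy_perturb(sol, rng):
    sol[rng.randrange(len(sol))] = rng.randrange(4)   # move one task
    return sol
def toy_local_search(sol):
    return [min(x, 1) for x in sol]                   # crude improvement step

best, val = iterated_local_search(lambda: [3, 3, 3], toy_perturb,
                                  toy_local_search, toy_makespan)
```

In the actual scheduler, `initial`, `perturb`, and `local_search` would be the components of Sections 4.1–4.3, and `makespan` a DAG schedule evaluator.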
4.1 Initial solution
The ILS-based scheduling algorithm starts off by generating an initial feasible solution as the starting point for the local search procedure. We construct the
initial solutions as follows. First, we sort the tasks by priority; then we select the task with the highest priority and schedule it on the processor that optimizes a predetermined objective function. To assign priorities, two different methods have been evaluated: the first is based on the blevel method, and the second is based on the list scheduling principle using random feasible orders. The main idea of random feasible orders is to randomly select a task among the ready tasks and place it at the top of a priority list. A task is ready when it is an entry task or when all its predecessors have already been selected and are in the priority list. The tasks are selected from the priority list and scheduled following the HEFT heuristic, that is, each task is assigned to the processor that minimizes its earliest finish time. In the case of blevel + HEFT, the quality of the initial solution is equal to that of the HEFT algorithm. In the case of random feasible orders, the quality of the solution depends on the priorities of the tasks; here, we generate 50 initial random feasible solutions, evaluate each of them, and keep the best. Hence, we have evaluated two different methods of constructing the initial solution.
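The random-feasible-orders construction can be sketched as repeatedly drawing a random ready task. This is an illustrative sketch under an assumed data layout (preds maps each task to the list of its predecessors); it is not the paper's implementation:

```python
import random

def random_feasible_order(preds, seed=0):
    """Build a priority list by repeatedly picking a random ready task,
    i.e., a task whose predecessors are all already in the list."""
    rng = random.Random(seed)
    pending = {t: set(ps) for t, ps in preds.items()}
    order = []
    while pending:
        ready = [t for t, ps in pending.items() if not ps]  # entry/released tasks
        t = rng.choice(ready)
        order.append(t)                 # place the task in the priority list
        del pending[t]
        for ps in pending.values():
            ps.discard(t)               # t's successors may become ready
    return order

# Diamond DAG: task 0 precedes 1 and 2, which both precede 3.
print(random_feasible_order({0: [], 1: [0], 2: [0], 3: [1, 2]}))
```

Every list produced this way is a feasible topological order, so scheduling tasks in that order never violates a precedence constraint.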
4.2 Perturbation process
The perturbation process, like the local search process, is an essential aspect determining the good behavior of an ILS algorithm.
We have tested two different configurations for the perturbation method, each exploring the search space differently. In the first perturbation process, called probability-based movement, every task ti of the current solution has a probability of being moved from its current processor; if the move is triggered, the task ti is reassigned to a new random processor. The second perturbation process, called random movement, moves only one task at a time from its current location to a random processor. Although the new assignment could be selected by a more sophisticated criterion, we randomly select the new processor in order to keep the time complexity of the algorithm as low as possible.
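The two perturbation operators can be sketched as follows, assuming the solution is encoded as a list mapping task index to processor index (an assumed encoding, not stated in the paper):

```python
import random

def probability_based_movement(assign, n_procs, prob=0.05, rng=None):
    """Each task is moved to a random processor with probability `prob`."""
    rng = rng or random.Random(0)
    return [rng.randrange(n_procs) if rng.random() < prob else p
            for p in assign]

def random_movement(assign, n_procs, rng=None):
    """Move exactly one randomly chosen task to a random processor."""
    rng = rng or random.Random(0)
    new = list(assign)
    new[rng.randrange(len(new))] = rng.randrange(n_procs)
    return new
```

Probability-based movement can relocate several tasks in one step (a stronger kick), while random movement changes at most one assignment per call.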
4.3 Local search
We propose a local search based on the first-improvement pivoting rule. The first-improvement strategy avoids the cost of evaluating the entire neighborhood by performing the first improving move encountered during the inspection of the neighborhood. The proposed algorithm evaluates the neighboring candidate solutions in a particular fixed order: we use the order of the priority list computed during initial solution construction (i.e., blevel or random feasible order), as in classical list scheduling algorithms. For each task in the list, the neighborhood is defined as the set of assignments that can be obtained by removing the task from its current processor and reallocating it to another. If a makespan improvement is found, the search process is restarted from the new solution.
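The first-improvement scan described above can be sketched as follows. This is our own illustration (the additive toy makespan in the example is an assumption; the real evaluator would simulate the DAG schedule):

```python
def first_improvement(assign, priority_list, n_procs, makespan):
    """Scan tasks in priority-list order; try reallocating each task to
    every other processor, take the first move that shortens the
    makespan, and restart the scan from the improved solution."""
    best = list(assign)
    best_val = makespan(best)
    improved = True
    while improved:
        improved = False
        for t in priority_list:
            for p in range(n_procs):
                if p == best[t]:
                    continue
                cand = list(best)
                cand[t] = p
                val = makespan(cand)
                if val < best_val:
                    best, best_val = cand, val
                    improved = True      # first improvement: restart scan
                    break
            if improved:
                break
    return best, best_val

# Toy example: two tasks, two processors, cost[t][p], additive makespan.
cost = [[5, 1], [4, 2]]
makespan = lambda a: sum(cost[t][a[t]] for t in range(len(a)))
print(first_improvement([0, 0], [0, 1], 2, makespan))   # ([1, 1], 3)
```

Because each accepted move strictly reduces the makespan, the loop terminates at a local optimum of the reallocation neighborhood.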
4.4 Acceptance criterion and termination condition
An important step in the ILS algorithm is deciding whether the new schedule is accepted as the incumbent solution for the next iteration. The proposed ILS accepts a solution if and only if it improves the makespan, and rejects it otherwise.
Different conditions can be used to stop the algorithm. The ILS-based scheduler stops the process when the algorithm reaches a maximum number of iterations.
5 Synthetic Benchmark and Performance Comparison
Many heuristics have been developed to solve the heterogeneous DAG scheduling problem. Most of them use HEFT as a basis for comparison because of the lack of generally accepted standard benchmarks for the evaluation of scheduling heuristics on heterogeneous distributed computing systems. In this work, we construct a benchmark as follows. We have collected a set of small-size synthetic deterministic non-preemptive program graphs from the literature. We compute the optimal solution and the optimal value with an enumerative algorithm that performs an exhaustive search: it explores all the possible solutions in the search space, keeping the best solution found. The problem reduces to generating the possible permutations, with and without repetition, and evaluating the resulting solutions.
Since generating optimal solutions for arbitrarily structured task graphs takes exponential time, it is not feasible to obtain optimal solutions for larger graphs [22]. However, to investigate the scalability of the proposed ILS algorithm, we have also used a set of structured real-world parallel applications: the robot control application and the sparse matrix solver from the Standard Task Graph set (STG) [32], and a subroutine of the Laser Interferometer Gravitational-wave Observatory (LIGO) application [6]. Table 1 summarizes the main characteristics of these applications: instance size, number of edges, and the edge-to-task ratio (ETR), which gives information about the degree of parallelism.
Table 1 Instance types: numbers of tasks and edges, and Edge Task Ratio (ETR).

Type     Tasks  Edges  ETR
LIGO       76    132   1.73
Robot      88    131   1.48
Sparse     96     67   0.69
The STG set is composed of homogeneous instances with constant execution times across machines. Since we are interested in heterogeneous instances, we have only kept the structure of these applications and have implemented the procedure described in [33] to introduce heterogeneity. We fixed the parameter β to 1; β is the heterogeneity factor for processor speeds, and a high value (i.e., β = 1) causes a significant difference in a task's computation cost among the processors. For each graph we varied the CCR: a randomization procedure that changes the edge weights was executed to reach the needed CCR, and five CCR values (0.1, 0.5, 1, 5, 10) were generated for each graph. The tested system sizes were 8, 16, and 32 processors, and we generated 15 different instances for each application type.
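One plausible way to realize the CCR adjustment above is to rescale all edge weights by a common factor; this is our own hedged sketch, not necessarily the exact randomization procedure used for the benchmark:

```python
def rescale_edges_to_ccr(edge_w, avg_comp_cost, target_ccr):
    """Scale every edge weight by one common factor so that the average
    edge weight (average communication cost) equals
    target_ccr * avg_comp_cost."""
    cur_avg = sum(edge_w.values()) / len(edge_w)
    factor = (target_ccr * avg_comp_cost) / cur_avg
    return {e: w * factor for e, w in edge_w.items()}

# Two edges with average weight 3.0; target CCR 2.0 with average
# computation cost 3.0 requires an average edge weight of 6.0.
w = rescale_edges_to_ccr({(0, 1): 2.0, (0, 2): 4.0}, 3.0, 2.0)
print(w)   # {(0, 1): 4.0, (0, 2): 8.0}
```

A randomized variant could perturb individual weights first and then apply the same global rescaling so the target CCR still holds on average.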
5.1 ILS Configurations
We have four different ILS configurations, depending on the method used to generate the initial solution and on the perturbation process. The four ILS algorithms studied are listed below.
– ILS-Alg1: Uses the blevel method to generate the order of task execution and the initial solution. The perturbation process is the probability-based movement.
– ILS-Alg2: The initial solution is based on the blevel method and the perturbation process is the random movement.
– ILS-Alg3: Generates the initial solutions based on random feasible orders: 50 solutions are generated and the best one is used as the initial solution. The perturbation process is the probability-based movement.
– ILS-Alg4: To construct the initial solution, the algorithm generates 50 solutions using the random feasible orders method and keeps the best among them. The random movement is applied to perturb the solution during the iteration process.
In Figure 2 we study the effects of different probabilities for the probability-based perturbation process, in order to identify the probability to use in the experiments. For this experiment we used a LIGO instance on an eight-processor system, with CCR equal to 5. Figure 2 shows a typical run of the algorithms for probabilities from 0.05 to 0.2 in increments of 0.05. We only show the results obtained by ILS-Alg1; the results for the other algorithms exhibit the same behavior. The main conclusion of this study is that probabilities of 0.05 and 0.15 perform better than the other two (0.1 and 0.2). Comparing these two, we noticed that the lower probability (0.05) leads to faster convergence; therefore, the results we report for ILS-Alg1 and ILS-Alg3 were computed using that movement probability. The maximum number of iterations in the local search process without improving the current best solution was set to 50. The maximum number of iterations per algorithm (i.e., the termination condition) was set to 15. The algorithms were executed 15 times on each instance, and each run is independent.
5.2 Results for Synthetic Deterministic Program Graphs
We report in this section some preliminary results obtained with the ILS algorithms. We compare the results against the global optimal value computed with the enumerative search method, and also against the results obtained by HEFT. Table 2 shows the best results reported in the literature and the optimal value. The first column corresponds to the name of the instance, composed as follows: id m n bestres, where id is the surname of the main author of the reference in which the instance is proposed, m is the number of processors used in the reference to schedule the instance, n is the size of the instance, and bestres is the best value reported in the literature. For example, instance Ahmad 3 9 28 is read as: the Ahmad instance, simulated on a three-processor system, with 9 tasks, and a best reported result equal to 28 units of time. The second column in Table 2 provides the reference where the instance is proposed. The third column presents the
Fig. 2 Results for various probability values used by the perturbation process based on probability movement (response time versus number of iterations, for probabilities 0.05, 0.1, 0.15, and 0.2).
Table 2 Results for synthetic deterministic program graphs known in the literature and the optimal value

Instance            Reference  Best Result  Optimal
Ahmad 3 9 28        [1]        28           22
Hsu 3 10 84         [12]       84           80
Eswari 2 11 61      [10]       61           56
Hamid 3 10          [2]        ?            100
Heteropar 4 12 124  [3]        124          124
Ilavarasan 3 10 77  [13]       77           73
Kang 3 10 76        [16]       76           73
Kang 3 10 84        [15]       84           79
Kuan 3 10 28        [25]       28           26
Liang 3 10 80       [26]       80           73
Daoud 2 11 64       [8]        64           56
YCLee 3 8 80        [27]       80           66
Sample 3 8 100                 100          81
SampleFig1 3 8 89              89           66
best result reported in the literature. The fourth column shows the optimal value computed with the enumerative search algorithm. Table 3 reports the results computed by HEFT in the third column; the last four columns present the best response time computed by each of the four ILS configurations and, in parentheses, the number of times out of 15 independent runs that the best value equaled the optimal value. Rows in gray highlight the instances for which an algorithm found the optimal value in all executions.
As can be seen in Table 2, the optimal solutions are not always found by the reported algorithms, despite the instances being small (at most a four-processor system and 12 tasks). Moreover, there are some instances for which the ILS configurations do not find the optimal solution in every run. Recall that the scheduling problem is NP-hard even for a two-processor system.
From Table 3 it can be observed that the ILS configurations with initial solutions generated using random feasible orders (i.e., ILS-Alg3 and ILS-Alg4) find the optimal
Table 3 Comparison results for synthetic deterministic program graphs known in the literature

Instance            Optimal  HEFT  ILS-Alg1  ILS-Alg2  ILS-Alg3  ILS-Alg4
Ahmad 3 9 28        22       24    23 (0)    23 (0)    22 (15)   22 (15)
Hsu 3 10 84         80       92    80 (12)   80 (13)   80 (14)   80 (15)
Eswari 2 11 61      56       76    58 (0)    58 (0)    56 (4)    56 (2)
Hamid 3 10          100      110   100 (10)  100 (10)  100 (15)  100 (15)
Heteropar 4 12 124  124      150   124 (12)  124 (15)  124 (15)  124 (15)
Ilavarasan 3 10 77  73       80    73 (13)   73 (13)   73 (10)   73 (10)
Kang 3 10 76        73       80    75 (0)    75 (0)    73 (5)    73 (4)
Kang 3 10 84        79       109   82 (0)    82 (0)    83 (0)    83 (0)
Kuan 3 10 28        26       30    26 (15)   26 (15)   26 (8)    26 (11)
Liang 3 10 80       73       80    73 (13)   73 (13)   73 (15)   73 (10)
Daoud 2 11 64       56       76    58 (0)    58 (0)    56 (3)    56 (1)
YCLee 3 8 80        66       88    68 (0)    68 (0)    66 (15)   66 (15)
Sample 3 8 100      81       84    81 (1)    81 (7)    81 (4)    81 (4)
SampleFig1 3 8 89   66       89    69 (0)    69 (0)    66 (15)   66 (15)
value for more instances than the ILS algorithms using blevel for the initial solution (i.e., ILS-Alg1 and ILS-Alg2). The main reason is that by generating random initial solutions the algorithm is able to explore different regions of the search space, whereas with the deterministic blevel method the algorithms are restricted to a specific region of it. In fact, ILS-Alg3 and ILS-Alg4 find the optimal value for around 93% of the instances, while ILS-Alg1 and ILS-Alg2 find it for only 56% of the instances. The ILS configurations with random feasible orders not only find the optimal value for more instances than the configurations with blevel, but they also find the optimal value in all executions for more instances than ILS-Alg1 and ILS-Alg2, as can be verified in Table 3 by the rows highlighted in gray.
Table 4 Approximation Factor

Instance            HEFT   ILS-Alg1  ILS-Alg2  ILS-Alg3  ILS-Alg4
Ahmad 3 9 28        1.090  1.045     1.045     1         1
Hsu 3 10 84         1.150  1.012     1.011     1.001     1
Eswari 2 11 61      1.366  1.061     1.038     1.054     1.064
Hamid 3 10          1.100  1.019     1.019     1         1
Heteropar 4 12 124  1.209  1.036     1         1         1
Ilavarasan 3 10 77  1.095  1.005     1.005     1.012     1.013
Kang 3 10 76        1.095  1.030     1.030     1.025     1.026
Kang 3 10 84        1.379  1.043     1.046     1.058     1.049
Kuan 3 10 28        1.153  1         1         1.028     1.015
Liang 3 10 80       1.095  1.005     1.005     1         1.010
Daoud 2 11 64       1.366  1.057     1.042     1.058     1.056
YCLee 3 8 80        1.333  1.030     1.030     1         1
Sample 3 8 100      1.037  1.018     1.008     1.012     1.009
SampleFig1 3 8 89   1.348  1.045     1.045     1         1
Average             1.191  1.029     1.022     1.017     1.018
Table 4 provides more detail on the performance comparison by presenting the approximation factor (ρ) for each algorithm. The first column gives the instance names, and the second shows the approximation factor for the HEFT algorithm. Recall that HEFT is a well-known scheduling algorithm often used as a basis for comparison and validation of newly proposed heuristics; however, to the best of our knowledge, no comparison has previously been provided between the results computed by HEFT and the optimal value. The remaining columns present the approximation factors for the ILS configurations, and the last row gives the average approximation factor for each algorithm.
From Table 4 we observe that the ILS algorithm, in all its configurations, outperforms HEFT. The ILS algorithms are around 3% on average (for ILS-Alg1) from the optimal values, whereas HEFT is around 19% on average from the optimal values. We also see that the ILS configurations based on random feasible orders compute better results than the ILS algorithms using blevel to generate the initial solution, although the results are comparable among the four configurations.
5.3 Results for Real-world Parallel Applications
In this section we present results on the set of real-world parallel applications. The
aim is to evaluate the scalability of the ILS configurations. We present statistical results.
In our tests, each algorithm was executed for 15 independent runs. We have evaluated the
ILS algorithms using a Friedman test [21]. The Friedman test is a non-parametric
statistical tool that compares a set of non-normalized populations to verify whether
significant statistical differences exist in the sample. Moreover, the resulting ranking shows
which algorithm has the higher performance.
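The per-instance ranking that underlies the Friedman test (and the average ranks reported in Table 5) can be sketched as follows. The makespan matrix is illustrative, and the ranks are inverted so that a better (smaller) makespan receives a higher rank, matching the paper's "higher ranking is better" convention:

```python
# Average Friedman ranks for a minimization objective, with the paper's
# convention that the best makespan gets the HIGHEST rank.
# The makespan matrix below is illustrative, not the paper's data.

def average_ranks(rows):
    """rows[i][j] = makespan of algorithm j on run i.
    Returns the average rank of each algorithm; ties share the mean rank."""
    k = len(rows[0])
    totals = [0.0] * k
    for row in rows:
        # sort column indices by makespan, descending: worst first,
        # so the best makespan ends up with rank k (the highest)
        order = sorted(range(k), key=lambda j: -row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            # extend the group while consecutive values are tied
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1  # ranks are 1-based
            for t in range(i, j + 1):
                ranks[order[t]] = mean_rank
            i = j + 1
        for col in range(k):
            totals[col] += ranks[col]
    return [t / len(rows) for t in totals]

rows = [[120, 100, 130, 95],
        [118, 101, 129, 96],
        [119,  99, 131, 99]]
print(average_ranks(rows))
```

With this convention, the column that most often achieves the smallest makespan accumulates the largest average rank, as in Table 5.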
Table 5 provides the results obtained after applying the Friedman test. The ranking
shows the performance of the algorithms. As we are dealing with a minimization
problem, the higher the value of the ranking, the better the algorithm performs in
finding a solution for the set of instances. We can observe that the behavior of the
algorithms differs as the problem scales with respect to the small size instances. The ranking
shows that ILS-Alg4 still computes better results than the other three configurations.
However, ILS-Alg2 shows better performance than ILS-Alg1 and ILS-Alg3. The common
characteristic of ILS-Alg4 and ILS-Alg2 is that both use the same perturbation process,
which slightly changes the current solution by moving only one task at a time and then
explores the neighbors of the perturbed solution. The main reason is that the changes
made by the algorithms (moving only one task) do not disrupt the structure of the
solution to a large extent, so the algorithm can continuously search for better and better
solutions [31]. However, it could be interesting to investigate a more sophisticated
heuristic that decides which task to move based on some knowledge of the current
structure of the scheduling problem under consideration, without increasing complexity.
We can assume that the ILS algorithm provides results better than or comparable to
those of HEFT. We can verify this by considering that ILS-Alg1 and ILS-Alg2 use the
output of HEFT as their initial solution; hence, if ILS does not improve the initial
solution, we obtain the same results as HEFT.
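The one-task-move perturbation shared by ILS-Alg2 and ILS-Alg4 can be sketched as follows. The function name and the `preds`/`succs` dictionaries are our assumptions for illustration, not the paper's implementation: the task is reinserted at a random position that keeps the order precedence-feasible, i.e. after all its predecessors and before all its successors.

```python
# Sketch of a one-task-move perturbation on a precedence-feasible
# task order. Names (perturb, preds, succs) are illustrative assumptions.

import random

def perturb(order, preds, succs):
    """Move one task of a feasible order to another feasible slot."""
    order = list(order)
    task = random.choice(order)
    order.remove(task)
    # earliest feasible slot: just after the last predecessor
    lo = max((order.index(p) + 1 for p in preds.get(task, ())), default=0)
    # latest feasible slot: just before the first successor
    hi = min((order.index(s) for s in succs.get(task, ())), default=len(order))
    order.insert(random.randint(lo, hi), task)  # randint is inclusive
    return order
```

Because the original order is feasible, the slot range `[lo, hi]` always contains the task's old position, so the move is always well defined and the perturbed order remains a valid schedule order.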
We performed a statistical hypothesis test to verify whether the differences with respect
to the algorithm that provides the best results are significant (i.e., the null hypothesis is
rejected) or whether, due to the random nature of the solutions, the obtained results may
be considered statistically equivalent.
Table 5 Average rankings of the algorithms on all the sets of instances
Algorithm LIGO Robot Sparse
ILS-Alg1 2.2888 2.3733 2.1155
ILS-Alg2 3.0466 2.9333 3.2355
ILS-Alg3 1.3444 1.3000 1.6155
ILS-Alg4 3.3200 3.3933 3.0333
Table 6 presents the statistical tests based on the p-value obtained between each
algorithm and the best performing one (ILS-Alg4), with a level of significance of
α = 0.05: if the observed value is larger than the 1 − α quantile of the distribution,
the null hypothesis is rejected. As mentioned, the results indicate that the candidate
algorithm performs better than at least one of the compared algorithms.
In Table 6 we can observe that ILS-Alg4 outperforms the ILS-Alg1 and ILS-Alg3
configurations, but is statistically equivalent to ILS-Alg2 on the LIGO and Sparse sets.
However, ILS-Alg4 finds better solutions than ILS-Alg2 on the Robot application.
Table 6 Statistical hypothesis testing for the best algorithm by makespan with α = 0.05/4
(Bonferroni correction)

Set      ILS-Alg4 vs.   p-value        H0 Rejected
LIGO     ILS-Alg1       2.4237E-17     Yes
         ILS-Alg2       0.0247         No
         ILS-Alg3       3.0522E-59     Yes
Robot    ILS-Alg1       5.2857E-17     Yes
         ILS-Alg2       1.5727E-4      Yes
         ILS-Alg3       2.7254E-66     Yes
Sparse   ILS-Alg1       4.6884E-14     Yes
         ILS-Alg2       0.0966         No
         ILS-Alg3       2.3426E-31     Yes
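The accept/reject decisions in Table 6 follow from comparing each pairwise p-value against the Bonferroni-corrected level α/4. A minimal sketch of that decision rule (the function and constant names are ours; the p-values are those reported above for the LIGO set):

```python
# Bonferroni-corrected rejection rule, as used in Table 6:
# reject H0 when p < alpha / m, with alpha = 0.05 and m = 4.

ALPHA = 0.05
N_COMPARISONS = 4  # the paper's alpha = 0.05/4 correction

def h0_rejected(p_value, alpha=ALPHA, m=N_COMPARISONS):
    return p_value < alpha / m

# p-values reported for the LIGO set (ILS-Alg4 vs. each algorithm)
ligo = {"ILS-Alg1": 2.4237e-17, "ILS-Alg2": 0.0247, "ILS-Alg3": 3.0522e-59}
for alg, p in ligo.items():
    print(alg, "rejected" if h0_rejected(p) else "not rejected")
```

Note that 0.0247 would be significant at the uncorrected α = 0.05 but not at α/4 = 0.0125, which is why ILS-Alg4 and ILS-Alg2 are declared statistically equivalent on LIGO.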
6 Conclusions and Future Work
In this paper we have proposed an iterative local search based scheduling algorithm
to solve the precedence-constrained scheduling problem in heterogeneous computing
systems. We have generated a synthetic benchmark composed of small size determin-
istic parallel applications. We computed the optimal solutions by using an exhaustive
enumerative search algorithm. The generated benchmark can be used as a basis for
comparison during the design of new scheduling heuristics.
We have investigated four configurations of the proposed ILS algorithm. We compared
the results against optimal solutions. From the set of experimental results we
observed that ILS benefited not only from the generation of several feasible orders
(diversifying solutions) and keeping the best one instead of generating only one
solution, but also from the small perturbations of the current solution that intensify the
search in its neighborhood, two basic considerations when designing local search
methods. One of the main advantages of the proposed ILS is that it can be coupled with
a more sophisticated heuristic without increasing complexity.
We plan to extend this work in several directions. First, we intend to extend the
benchmark; for that, we plan to parallelize the enumerative search method and design a
branch and bound algorithm. We also intend to design more sophisticated heuristics to
decide which task to move during the perturbation process. One possibility to explore is
to move the task with the highest CCR, such that the communication delay can be
reduced. Nowadays, one important aspect to consider is the size of the workflows, which
are usually composed of thousands of tasks. We plan to evaluate the scalability of the
proposed local search on bigger workflows; for that, we consider applying a partitioning
technique to first decompose the DAG into subgraphs, and then apply our local search
in a cooperative way to locally optimize these subgraphs while considering the global
optimization.
Acknowledgements This work is partially supported by the National Research Fund, Lux-
embourg, in the framework of the AFR Green Energy-Efficient Computing project (PDR-09-
067). The Mexican researchers were supported by the Consejo Nacional de Ciencia y Tecnología
(CONACyT), Mexico.
References
1. Ahmad, I., Dhodhi, M., Ul-Mustafa, R.: DPS: dynamic priority scheduling heuristic for
heterogeneous computing systems. IEE Proceedings: Computers and Digital Techniques
145(6), 411–418 (1998)
2. Arabnejad, H.: List based task scheduling algorithms on heterogeneous systems - an
overview. http://paginas.fe.up.pt/˜prodei/dsie12/papers/paper 30.pdf (2011). Available
Online. Consulted January, 2013
3. Arabnejad, H., Barbosa, J.: Performance evaluation of list based scheduling on heteroge-
neous systems. http://icl.eecs.utk.edu/heteropar2011/slides/heteropar
JorgeBarbosa.pdf. HeteroPar’11. Consulted online 2012
4. Arabnejad, H., Barbosa, J.G.: Performance evaluation of list based scheduling on heteroge-
neous systems. In: Proceedings of the 2011 international conference on Parallel Processing,
Euro-Par’11, pp. 440–449. Springer-Verlag, Berlin, Heidelberg (2012)
5. Bittencourt, L.F., Sakellariou, R., Madeira, E.R.M.: Dag scheduling using a lookahead
variant of the heterogeneous earliest finish time algorithm. In: Proceedings of the 2010
18th Euromicro Conference on Parallel, Distributed and Network-based Processing, PDP
’10, pp. 27–34. IEEE Computer Society (2010)
6. Brown, D., Brady, P., Dietz, A., Cao, J., Johnson, B., McNabb, J.: A case study on the
use of workflow technologies for scientific analysis: Gravitational wave data analysis. In:
I. Taylor, E. Deelman, D. Gannon, M. Shields (eds.) Workflows for e-Science, pp. 39–59.
Springer London (2007)
7. Coffman, E.G.: Computer and jobshop scheduling theory. John Wiley & Sons Inc (1976)
8. Daoud, M.I., Kharma, N.: A hybrid heuristic-genetic algorithm for task scheduling in
heterogeneous processor networks. J. Parallel Distrib. Comput. 71(11), 1518–1531 (2011)
9. Demiröz, B., Topcuoglu, H.R.: Static task scheduling with a unified objective on time and
resource domains. Comput. J. 49(6), 731–743 (2006)
10. Eswari, R., Nickolas, S.: Path-based heuristic task scheduling algorithm for heterogeneous
distributed computing systems. In: Proceedings of the 2010 International Conference on
Advances in Recent Technologies in Communication and Computing, ARTCOM ’10, pp.
30–34. IEEE Computer Society, Washington, DC, USA (2010)
11. Hirales-Carbajal, A., Tchernykh, A., Yahyapour, R., González-García, J., Röblitz, T.,
Ramírez-Alcaraz, J.: Multiple workflow scheduling strategies with user run time estimates
on a grid. J. Grid Comput. 10(2), 325–346 (2012)
12. Hsu, C.H., Hsieh, C.W., Yang, C.T.: A generalized critical task anticipation technique for
dag scheduling. In: Proceedings of the 7th international conference on Algorithms and
architectures for parallel processing, ICA3PP’07, pp. 493–505. Springer-Verlag, Berlin,
Heidelberg (2007)
13. Ilavarasan, E., Thambidurai, P., Mahilmannan, R.: Performance effective task scheduling
algorithm for heterogeneous computing system. In: Proceedings of the The 4th Interna-
tional Symposium on Parallel and Distributed Computing, ISPDC ’05, pp. 28–38. IEEE
Computer Society, Washington, DC, USA (2005)
14. Kang, Q., He, H., Song, H.: Task assignment in heterogeneous computing systems using
an effective iterated greedy algorithm. J. Syst. Softw. 84(6), 985–992 (2011)
15. Kang, Y., Lin, Y.: A recursive algorithm for scheduling of tasks in a heterogeneous dis-
tributed environment. In: Y. Ding, Y. Peng, R. Shi, K. Hao, L. Wang (eds.) BMEI, pp.
2099–2103. IEEE (2011)
16. Kang, Y., Zhang, Z., Chen, P.: An activity-based genetic algorithm approach to multipro-
cessor scheduling. In: Y. Ding, H. Wang, N. Xiong, K. Hao, L. Wang (eds.) ICNC, pp.
1048–1052. IEEE (2011)
17. Khan, S.U., Ahmad, I.: A cooperative game theoretical technique for joint optimization
of energy consumption and response time in computational grids. IEEE Trans. Parallel
Distrib. Syst. 20(3), 346–360 (2009)
18. Kim, S.C., Lee, S., Hahm, J.: Push-pull: Deterministic search-based dag scheduling for
heterogeneous cluster systems. IEEE Trans. Parallel Distrib. Syst. 18(11), 1489–1502
(2007)
19. Kolodziej, J., Khan, S.U.: Multi-level hierarchic genetic-based scheduling of independent
jobs in dynamic heterogeneous grid environment. Inf. Sci. 214, 1–19 (2012)
20. Kolodziej, J., Khan, S.U., Xhafa, F.: Genetic algorithms for energy-aware scheduling in
computational grids. In: F. Xhafa, L. Barolli, J. Kolodziej, S.U. Khan (eds.) 3PGCIC, pp.
17–24. IEEE (2011)
21. Kvam, P.H., Vidakovic, B.: Nonparametric Statistics with Applications to Science and
Engineering (Wiley Series in Probability and Statistics). Wiley-Interscience (2007)
22. Kwok, Y.K., Ahmad, I.: Efficient scheduling of arbitrary task graphs to multiprocessors
using a parallel genetic algorithm. J. Parallel Distrib. Comput. 47(1), 58–77 (1997). DOI
10.1006/jpdc.1997.1395. URL http://dx.doi.org/10.1006/jpdc.1997.1395
23. Kwok, Y.K., Ahmad, I.: Fastest: A practical low-complexity algorithm for compile-time
assignment of parallel programs to multiprocessors. IEEE Trans. Parallel Distrib. Syst.
10(2), 147–159 (1999)
24. Kwok, Y.K., Ahmad, I., Gu, J.: Fast: A low-complexity algorithm for efficient scheduling
of dags on parallel processors. In: ICPP, Vol. 2, pp. 150–157 (1996)
25. Lai, K.C., Yang, C.T.: A dominant predecessor duplication scheduling algorithm for het-
erogeneous systems. J. Supercomput. 44(2), 126–145 (2008)
26. Lee, L.T., Chen, C.W., Chang, H.Y., Tang, C.C., Pan, K.C.: A non-critical path earliest-
finish algorithm for inter-dependent tasks in heterogeneous computing environments. In:
HPCC, pp. 603–608. IEEE (2009)
27. Lee, Y.C., Zomaya, A.: A novel state transition method for metaheuristic-based scheduling
in heterogeneous computing systems. IEEE Trans. Parallel Distrib. Syst. 19(9), 1215–1223
(2008)
28. Lee, Y.C., Zomaya, A.Y.: Energy conscious scheduling for distributed computing systems
under different operating conditions. IEEE Trans. Parallel Distrib. Syst. 22(8), 1374–1381
(2011)
29. Liu, G.Q., Poh, K.L., Xie, M.: Iterative list scheduling for heterogeneous computing. J.
Parallel Distrib. Comput. 65(5), 654–665 (2005)
30. Shen, L., Choe, T.Y.: Posterior task scheduling algorithms for heterogeneous computing
systems. In: Proceedings of the 7th international conference on High performance com-
puting for computational science, VECPAR’06, pp. 172–183. Springer-Verlag (2007)
31. Switalski, P., Seredynski, F.: Multiprocessor scheduling by generalized extremal optimiza-
tion. J. of Scheduling 13(5), 531–543 (2010)
32. Tobita, T., Kasahara, H.: A standard task graph set for fair evaluation of multiprocessor
scheduling algorithms. Journal of Scheduling 5(5), 379–394 (2002)
33. Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task
scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 13(3), 260–
274 (2002)
34. Wu, M.Y., Shu, W., Gu, J.: Efficient local search for dag scheduling. IEEE Trans. Parallel
Distrib. Syst. 12(6), 617–627 (2001)
35. Zhao, H., Sakellariou, R.: An experimental investigation into the rank function of the
heterogeneous earliest finish time scheduling algorithm. In: H. Kosch, L. Böszörményi,
H. Hellwagner (eds.) Euro-Par 2003 Parallel Processing, Lecture Notes in Computer
Science, vol. 2790, pp. 189–194. Springer Berlin / Heidelberg (2003)