
MISTA 2013

An Iterative Local Search Algorithm for Scheduling Precedence-Constrained Applications on Heterogeneous Machines

Aurelio A. Santiago Pineda · Johnatan E. Pecero · Héctor J. Fraire Huacuja · Juan J. Gonzalez Barbosa · Pascal Bouvry

Abstract The paper deals with the problem of scheduling precedence-constrained applications on a distributed heterogeneous computing system with the aim of minimizing the response time or total execution time. The main contribution is a scheduling algorithm based on an iterative local search process. Due to the lack of generally accepted standard benchmarks for the evaluation of scheduling algorithms on heterogeneous computing systems, we also generate a benchmark of synthetic instances. The benchmark is composed of small-size synthetic deterministic non-preemptive program graphs known in the literature. We compute the optimal solution and the global optimal value with an exact enumerative search method that explores all possible solutions, and we compare the performance of the proposed local search algorithm against these optimal values. We have also simulated the proposed algorithm using graphs obtained from real-world applications, emphasizing the interest of the approach.

1 Introduction

Heterogeneous computing systems are a commonplace infrastructure providing resources, interconnected via networks, in a distributed way for executing parallel applications with large data and computing power requirements. In such a system, a parallel application can be partitioned into a number of cooperative tasks that are distributed to the resources for parallel execution. However, the performance of a parallel application executed on a parallel and distributed computing system heavily depends on the scheduling of the application's tasks onto the available processors in the system; if this problem is not properly solved, it can nullify the benefits of parallelization and the power of the distributed computing resources. Moreover, not only is the performance of the parallel application deteriorated, but issues related to energy consumption are also affected if the scheduling problem is not properly handled [17, 20, 28].

Johnatan E. Pecero · Pascal Bouvry
University of Luxembourg
E-mail: {johnatan.pecero, pascal.bouvry}@uni.lu

Aurelio A. Santiago Pineda · Héctor J. Fraire Huacuja · Juan J. Gonzalez Barbosa
Instituto Tecnológico de Ciudad Madero
E-mail: alx.santiago@gmail.com, automatas2002@yahoo.com.mx, jjgonzalezbarbosa@hotmail.com

6th Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA 2013), 27-29 August 2013, Gent, Belgium

In its general form, the precedence-constrained scheduling problem is NP-complete [7]. Therefore, many heuristic-based scheduling algorithms have been proposed that find a sub-optimal solution and attempt to balance running time, complexity and schedule quality [5, 14-16, 19, 25]. However, the performance of these heuristics is still an open research problem [4, 10, 11, 27, 31]. Consequently, there is an increasing interest in investigating and designing heuristics for scheduling precedence-constrained parallel programs on heterogeneous computing systems. The main motivations are not only the availability of heterogeneous computing platforms, such as Grid and P2P systems, but also the increasing interest in industry and science in executing parallel applications that can be modelled by precedence task graphs or workflows [11].

In this paper we propose a local search algorithm based on an iterative search process to solve the precedence-constrained scheduling problem. The iterated local search (ILS) algorithm is a straightforward, yet powerful, technique for extending simple local search algorithms. We generate a benchmark composed of small-size synthetic deterministic parallel program graphs proposed in the literature. We compute the optimal solution by an enumerative search process that exhaustively explores the search space, and we compare the performance of the proposed ILS algorithm with the optimal values using an approximation factor. To evaluate and investigate scalability issues, we also simulated the ILS algorithm using parallel graphs that model real-world applications. Results of the experimental study show that the algorithm is efficient in solving the problem, providing results close to the optimal value.

The paper is organized as follows. In Section 2 we describe the precedence-constrained scheduling problem. Section 3 discusses related work. The proposed iterated local search algorithm is described in Section 4. Next, in Section 5 we present the benchmark and experimental results. Section 6 concludes the paper.

2 Problem description

The target system used in this work is represented by an undirected unweighted graph Gs = (Vs, Es), called a system graph (see, e.g., [31]). Vs is the set of Ns nodes of the system graph representing the m processors. Es is the set of edges representing bidirectional channels between processors, and it defines the topology of the distributed system. The processors have different processing speeds, i.e., they provide different processing performance in terms of MIPS (Million Instructions Per Second), and communication via links does not consume any processor time.

As usual, a parallel program is represented by a weighted directed acyclic graph (DAG). The DAG, called a precedence task graph or a program graph, is defined as G = (T, E), where T is a finite set of nodes (vertices) and E is a finite set of edges. Each node ti ∈ T is associated with one task ti of the modeled parallel program. To every task ti there is an associated value pij representing the computation cost of task ti on a processor mj; its average computation cost is denoted as pi. Each edge (ti1, ti2) ∈ E (with ti1, ti2 ∈ T) is a precedence constraint between tasks and represents inter-task data communication: the output produced by task ti1 has to be communicated to task ti2. We consider the same communication model as in [4]. That is, the data parameter is a t × t matrix of communication data, where data(ti, tj) is the amount of data required to be transmitted from task ti to task tj. The rate parameter is an m × m matrix that represents the data transfer rates between processors. The communication cost of edge (ti, tj) ∈ E, which accounts for the data transfer from task ti (scheduled on processor mk) to task tj (scheduled on processor ml), is defined by c(ti, tj) = data(ti, tj)/rate(mk, ml). When both ti and tj are scheduled on the same processor (mk = ml), c(ti, tj) becomes zero. The average communication cost of an edge is defined by c(ti, tj) = data(ti, tj)/rate, where rate here denotes the average transfer rate between the processors in the domain. For a given DAG, the communication-to-computation ratio (CCR) is a measure that indicates whether a task graph is communication intensive, computation intensive, or moderate. It is computed as the average communication cost divided by the average computation cost on a target system.
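The cost model above can be summarized in a short sketch. This is an illustrative implementation under assumed data layouts (nested lists for the data and rate matrices, a list mapping tasks to processors); the function names are ours, not from the paper.

```python
# Sketch of the communication cost c(ti, tj) and the CCR measure described
# above. data[i][j]: amount of data sent from task i to task j; rate[k][l]:
# transfer rate between processors k and l. All names are illustrative.

def comm_cost(i, j, proc_of, data, rate):
    """Cost of edge (ti, tj) when ti runs on proc_of[i] and tj on proc_of[j]."""
    k, l = proc_of[i], proc_of[j]
    if k == l:                      # same processor: communication is free
        return 0.0
    return data[i][j] / rate[k][l]

def ccr(data, edges, p, avg_rate):
    """Communication-to-computation ratio of a DAG.

    p[i][j] is the cost of task i on processor j; avg_rate is the average
    transfer rate used for the average communication cost."""
    avg_comm = sum(data[i][j] / avg_rate for (i, j) in edges) / len(edges)
    avg_comp = sum(sum(row) / len(row) for row in p) / len(p)
    return avg_comm / avg_comp
```

A CCR well above 1 would mark a graph as communication intensive, and well below 1 as computation intensive.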

A simple task graph and its details are shown in Figure 1. The values presented in the last column of the table are computed with a frequently used task prioritization method, the bottom level (blevel). The blevel of a node is the length of the longest path from the node to an exit node. Note that both the computation and communication costs are averaged over all nodes and links. The blevel(ti) is computed recursively by traversing the DAG upward, starting from the exit task texit, as follows (Eq. 1):

blevel(ti) = pi + max_{tj ∈ succ(ti)} {blevel(tj) + cij},    (1)

where succ(ti) is the set of immediate successors of ti and blevel(texit) = p(texit).
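Eq. (1) can be sketched directly as a memoized recursion. This is an illustrative sketch under assumed container shapes (dicts for the successor lists, average costs and edge costs), not the authors' implementation.

```python
from functools import lru_cache

def blevels(succ, p, c):
    """Bottom levels per Eq. (1). succ[i]: immediate successors of task i;
    p[i]: average computation cost of i; c[(i, j)]: average cost of edge (i, j).
    Container shapes are illustrative."""
    @lru_cache(maxsize=None)
    def bl(i):
        if not succ[i]:                 # exit task: blevel(t_exit) = p(t_exit)
            return p[i]
        return p[i] + max(bl(j) + c[(i, j)] for j in succ[i])
    return {i: bl(i) for i in succ}
```

On a small diamond-shaped DAG this reproduces the longest downward path from each node, as the blevel definition requires.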


task  r0  r1  r2  pi  blevel
0     11  13   9  11  101.3
1     10  15  11  12   66.7
2      9  12  14  12   63.3
3     12  16  10  12   73.0
4     15  11  19  15   79.3
5     13   9   5   9   41.7
6     11  15  13  12   37.3
7     11  15  10  12   12.0

Fig. 1 On the left, a sample DAG with the task indexes i inside the nodes and the values of the c(ti1, ti2) function next to the corresponding edges. On the right, the computation costs (pi) and the task priorities (blevel).

The aim of scheduling is to distribute the tasks among the processors in such a way that the precedence constraints are preserved and the response time Cmax (the total execution time or makespan) is minimized. The response time Cmax for a given precedence task graph depends on the allocation of tasks in the distributed computing topology and on the scheduling policy applied in the individual processors [31]:

Cmax = f(allocation, scheduling policy)    (2)

The scheduling policy defines an order of processing tasks and assigns a starting time to each task ready to run on a given processor. We assume that it is the same for any run of the scheduling algorithm, and we focus on finding an allocation of the tasks of a parallel application in a distributed computing system that minimizes the makespan. Although the minimization of the makespan is crucial, tasks of a DAG in this work are not associated with deadlines as in real-time systems.

An approximation factor is used to evaluate the proposed algorithms. The factor is defined as ρ = Cmax / C*max, where C*max is the optimal response time or makespan [11].

3 Related Work

The scheduling problem is NP-hard even in its simplest version (the homogeneous case without communications). Therefore, many heuristics have been proposed to schedule DAG applications on heterogeneous distributed computing systems. A well-known scheduling algorithm is the Heterogeneous Earliest Finish Time (HEFT) algorithm [33]. The HEFT algorithm maintains a list of all tasks of a given graph ordered by their priorities, usually based on the blevel method. It consists of two phases. In the first phase, a ready task is selected from the priority list: the task with the highest priority for which all dependent tasks have finished is chosen. This corresponds to the task prioritizing or task selection phase. Thereafter, a suitable processor that minimizes a predefined cost function is selected (the processor selection phase); in this case, the processor that yields the earliest finish time for that task. The HEFT algorithm is one of the most widely used algorithms as a basis for comparison when evaluating newly proposed scheduling algorithms [2-5, 8-10, 12, 13, 15, 16, 25-27, 29, 30, 35]. Therefore, we also use HEFT in Section 5 to validate the proposed approach.

A number of local search algorithms for scheduling have been investigated in the literature. Kwok et al. [24] present a first-improvement random local search algorithm named FAST. In this algorithm, a task is randomly picked and then moved to a randomly selected processor. If the schedule length is reduced, the move is accepted; otherwise, the task is moved back to its original processor. Kwok and Ahmad [23] modified the FAST algorithm; the major improvement is the use of a nested loop for a probabilistic jump. A parallel version of FAST is named FASTEST. Wu et al. [34] proposed a local search algorithm based on topological ordering. The algorithm is a deterministic guided search that uses the level of a task, defined as the sum of the top level and the blevel, to schedule the tasks. The algorithm first selects the task with the largest level and then assigns it to the processor that generates the smallest level for that task. The level of each task is dynamically calculated and is used to determine the search direction. However, the considered computing system is based on homogeneous processors. Kim et al. [18] report a deterministic local-search-based scheduling algorithm. The algorithm starts with a schedule found by a deterministic scheduling algorithm and then iteratively attempts to improve the current best solution using a deterministic guided search method based on prior knowledge about the task scheduling problem and the target computing environment. The main idea is to move tasks to fill the idle periods of processors. One of the major limitations of this approach is that it assumes complete knowledge of the problem and that the information about tasks and communications is always accurate; however, many external events can modify the parameters of the scheduling problem. Kang et al. [14] propose an iterated greedy algorithm. The main idea of this algorithm is to improve the quality of the assignment in an iterative manner using results from previous iterations. The algorithm first uses a constructive heuristic to find an initial assignment and then iteratively improves it in a greedy way. The authors consider additional resource constraints in the scheduling problem; in this work we do not consider such constraints.

4 Iterated Local Search Based Scheduling

The solution we propose in this work is a scheduling algorithm based on local search. Local search was one of the early techniques for combinatorial optimization. The principle is to refine a given initial solution point in the solution space by searching through a neighborhood of that point. If an improvement can be achieved in this manner, a new solution is obtained, and this process continues until no further improvement can be found. However, in many cases local minima are common, and local search algorithms can converge quickly to a local optimum far away from the global optimum and get stuck there. We propose a local search algorithm based on an iterated local search (ILS) to alleviate this problem.

Algorithm 1 shows the pseudocode of the ILS we have used in this paper. The ILS algorithm is a trajectory-based metaheuristic that can be seen as a straightforward, yet powerful, technique for extending simple local search algorithms. The algorithm starts off by generating and evaluating an initial solution. The search process (i.e., the initial solution) can be initialized in various ways, for example by starting from a randomly generated initial solution or from a heuristically constructed solution to the given problem. Then, following an iteration-based approach, it seeks to improve the solutions from one iteration to the next. At each iteration, a perturbation of the obtained local optimum is carried out. The perturbation mechanism introduces a modification to a given candidate solution to allow the search process to escape from the local optimum. A local search is then applied to the perturbed solution. The new solution is evaluated and accepted as the new current solution under some conditions. The algorithm finishes when the termination condition is met. In the following subsections we detail the core components of the ILS customized to the task scheduling problem.

Algorithm 1 Outline of the ILS based scheduling algorithm
1: sol = GenerateInitialSolution();
2: EvaluateSolution(sol);
3: bestSol = sol;
4: repeat
5:   Perturbation(sol);
6:   LocalSearch(sol);
7:   EvaluateSolution(sol);
8:   if sol < bestSol then
9:     bestSol = sol;
10:  end if
11: until termination condition met

4.1 Initial solution

The ILS based scheduling algorithm starts off by generating an initial feasible solution as the starting point for the local search procedure. We have constructed the initial solutions as follows. First, we sort the tasks by priority; then we select the task with the highest priority and schedule it on the processor that optimizes a predetermined objective function. To assign priorities, two different methods have been evaluated: the first is based on the blevel method, and the second is based on the list scheduling principle using random feasible orders. The main idea of random feasible orders is to randomly select a task among the ready tasks and place it at the top of a priority list. A task is ready when it is an entry task or when all its predecessors have already been selected and are in the priority list. The tasks are selected from the priority list and scheduled on the basis of the HEFT heuristic, that is, each task is assigned to the processor that minimizes its earliest finish time. In the case of blevel + HEFT, the quality of the initial solution is equal to that of the HEFT algorithm. In the case of random feasible orders, the quality of the solution depends on the priorities of the tasks; here we generate 50 initial random feasible solutions, evaluate each of them, and keep the best. Hence, we have evaluated two different methods of constructing the initial solution.
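The random feasible order construction described above can be sketched as follows. This is an illustrative sketch assuming a predecessor-set representation of the DAG; the function name and containers are ours.

```python
import random

def random_feasible_order(preds, rng=None):
    """One random feasible (topological) order: repeatedly pick, uniformly at
    random, a ready task, i.e. one whose predecessors are all already listed.
    preds[i] is the set of immediate predecessors of task i (illustrative)."""
    rng = rng or random.Random()
    remaining = {t: set(ps) for t, ps in preds.items()}
    order = []
    while remaining:
        ready = [t for t, ps in remaining.items() if not ps]
        t = rng.choice(ready)               # uniform pick among ready tasks
        order.append(t)
        del remaining[t]
        for ps in remaining.values():       # successors of t may become ready
            ps.discard(t)
    return order
```

Calling this 50 times, scheduling each order with the HEFT processor-selection rule, and keeping the best schedule would reproduce the second initialization method.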

4.2 Perturbation process

The perturbation process, like the local search process, is an essential aspect determining the good behavior of an ILS algorithm.

We have tested two different configurations for the perturbation method, each exploring the search space differently. In the first perturbation process, called probability based movement, every task ti of the current solution has a probability of being moved from its current processor; if the move is triggered, task ti is removed from its current location and assigned to a new random processor. The second perturbation process, called random movement, moves only one task at a time from its current location to a random processor. Although the new assignment could be selected by a more sophisticated criterion, we have decided to select the new processor randomly to keep the time complexity of the algorithm as low as possible.
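The two perturbations can be sketched on a task-to-processor assignment vector. This is an illustrative sketch under an assumed representation (assign[i] is the processor of task i); names and defaults are ours.

```python
import random

def probability_based_movement(assign, m, q=0.05, rng=None):
    """First perturbation: each task independently moves to a random
    processor (out of m) with probability q."""
    rng = rng or random.Random()
    return [rng.randrange(m) if rng.random() < q else proc for proc in assign]

def random_movement(assign, m, rng=None):
    """Second perturbation: exactly one randomly chosen task is moved
    to a randomly chosen processor."""
    rng = rng or random.Random()
    new = list(assign)
    new[rng.randrange(len(new))] = rng.randrange(m)
    return new
```

Random movement changes at most one entry per call, which matches the observation in Section 5.3 that it only slightly disturbs the current solution.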

4.3 Local search

We propose a local search based on the first-improvement pivoting rule. The first-improvement strategy avoids the cost of evaluating the whole neighborhood by taking the first improving step encountered during the inspection of the neighborhood. The proposed algorithm evaluates the neighboring candidate solutions in a particular fixed order: we use the order of the priority list computed during the initial solution construction (i.e., blevel or random feasible order), as in classical list scheduling algorithms. For each task in the list, the neighborhood is defined as the set of assignments that can be obtained by removing the task from its current processor and reallocating it to another. If a makespan improvement is found, the search process is restarted from the new solution.
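The first-improvement scan described above can be sketched as follows, with the makespan computation abstracted into a callable. This is an illustrative sketch, not the authors' implementation; assign, order and makespan are assumed representations.

```python
def first_improvement(assign, order, m, makespan):
    """First-improvement local search: scan tasks in the fixed priority
    order; the first reassignment that shortens the makespan is taken and
    the scan restarts from the new solution (illustrative sketch)."""
    best = makespan(assign)
    improved = True
    while improved:
        improved = False
        for t in order:
            for proc in range(m):
                if proc == assign[t]:
                    continue
                cand = list(assign)
                cand[t] = proc              # move task t to another processor
                val = makespan(cand)
                if val < best:              # first improving move found
                    assign, best = cand, val
                    improved = True
                    break
            if improved:
                break                       # restart the scan from the new solution
    return assign, best
```

Because the scan restarts after every improvement, the procedure terminates at a solution for which no single-task reassignment shortens the makespan, i.e. a local optimum of this neighborhood.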


4.4 Acceptance criterion and termination condition

An important step in the ILS algorithm is to decide whether or not the new schedule is accepted as the incumbent solution for the next iteration. The proposed ILS accepts a solution if and only if it improves the makespan, and rejects it otherwise.

Different conditions can be used to stop the algorithm. The ILS based scheduler stops the process when the algorithm reaches a maximum number of iterations.

5 Synthetic Benchmark and Performance Comparison

Many heuristics have been developed to solve the heterogeneous DAG scheduling problem. Most of them use HEFT as a basis for comparison because of the lack of generally accepted standard benchmarks for the evaluation of scheduling heuristics on heterogeneous distributed computing systems. In this work, we construct a benchmark as follows. We have collected a set of small-size synthetic deterministic non-preemptive program graphs from the literature. We compute the optimal solution and the optimal value with an enumerative algorithm that performs an exhaustive search: the algorithm explores all possible solutions in the search space, keeping the best solution found. The problem reduces to generating the possible permutations with and without repetitions and evaluating the resulting solutions.

Since generating optimal solutions for arbitrarily structured task graphs takes exponential time, it is not feasible to obtain optimal solutions for larger graphs [22]. However, to investigate the scalability of the proposed ILS algorithm we have also used a set of structured real-world parallel applications. The applications used in our experiments are the robot control application and the sparse matrix solver from the Standard Task Graph set (STG) [32], and a subroutine of the Laser Interferometer Gravitational-wave Observatory (LIGO) application [6]. Table 1 summarizes the main characteristics of these applications: instance size, number of edges, and the edge-to-task ratio (ETR), which gives an indication of the degree of parallelism.

Table 1 Instance types: numbers of tasks and edges, and Edge Task Ratio (ETR).

Type    Tasks  Edges  ETR
LIGO      76    132   1.73
Robot     88    131   1.48
Sparse    96     67   0.69

The STG set is composed of homogeneous instances with constant execution times across machines. Since we are interested in heterogeneous instances, we have only kept the structure of these applications and implemented the procedure described in [33] to introduce heterogeneity. We fixed the parameter β to 1. The parameter β is basically the heterogeneity factor for processor speeds; a high value (i.e., β = 1) causes a significant difference in a task's computation cost among the processors. For each graph we varied the CCR ratio: a randomization procedure that changes the edge weights was executed to obtain the required CCR. We generated five CCRs (0.1, 0.5, 1, 5, 10) for each graph. The tested system sizes were 8, 16 and 32 processors. We generated 15 different instances for each application type.


5.1 ILS Conﬁgurations

We have four different ILS configurations, depending on the method used to generate the initial solution and on the perturbation process. The four ILS algorithms studied are listed below.

– ILS-Alg1: uses the blevel method to generate the order of task execution and the initial solution. The perturbation process is the probability based movement.
– ILS-Alg2: the initial solution is based on the blevel method, and the perturbation process is the random movement.
– ILS-Alg3: generates 50 initial solutions based on random feasible orders and uses the best one as the initial solution. The perturbation process is the probability based movement.
– ILS-Alg4: constructs the initial solution by generating 50 solutions with the random feasible orders method and keeping the best among them. The algorithm applies the random movement to perturb the solution during the iteration process.

We study in Figure 2 the effects of different probabilities for the perturbation process based on the probability movement, in order to identify the probability to set in the experiments. For this experiment we used a LIGO instance on an eight-processor system, with CCR equal to 5. Figure 2 shows a typical run of the algorithms for probabilities from 0.05 to 0.2 in increments of 0.05. We only show the results obtained by ILS-Alg1; the results for the rest of the algorithms show the same behavior. The main result that can be drawn from this study is that the probabilities 0.05 and 0.15 perform better than the other two (0.1 and 0.2). When comparing these two probabilities we noticed that the low probability (0.05) leads to faster convergence; therefore, the results we report for ILS-Alg1 and ILS-Alg3 were computed using that probability. The maximum number of iterations in the local search process without improving the current best solution was set to 50. The maximum number of iterations per algorithm (i.e., the termination condition) was set to 15. The algorithms are executed 15 times on each instance, and each run is independent.

5.2 Results for Synthetic Deterministic Program Graphs

We report in this section some preliminary results obtained with the ILS algorithms. We compare the results against the global optimal value computed with the enumerative search method, and also against the results obtained by HEFT. Table 2 shows the results reported in the literature and the optimal value. The first column corresponds to the name of the instance, composed as follows: id m n bestres, where id is the surname of the main author of the reference in which the instance is proposed, m is the number of processors used in the reference to schedule the instance, n is the size of the instance, and bestres is the best value reported in the literature. For example, instance Ahmad 3 9 28 reads as: Ahmad is the id of the instance, simulated on a three-processor system, with 9 tasks, and the best reported result is 28 units of time. The second column in Table 2 provides the reference in which the instance is proposed. The third column presents the best result reported in the literature. The fourth column shows the optimal value computed with the enumerative search algorithm. Table 3 depicts the results computed by HEFT in the third column. The last four columns present the best response time computed by each of the four ILS configurations and the number of times, over 15 independent runs, that the best value equals the optimal value. Rows in gray highlight the algorithms that find the optimal value for a given instance in all executions.

Fig. 2 Response time versus number of iterations for the probability values (0.05, 0.1, 0.15, 0.2) used by the perturbation process based on probability movement. [Plot data not recoverable from the extraction.]

Table 2 Results for synthetic deterministic program graphs known in the literature and the optimal value

Instance            Reference  Best Result  Optimal
Ahmad 3 9 28        [1]        28           22
Hsu 3 10 84         [12]       84           80
Eswari 2 11 61      [10]       61           56
Hamid 3 10          [2]        ?            100
Heteropar 4 12 124  [3]        124          124
Ilavarasan 3 10 77  [13]       77           73
Kang 3 10 76        [16]       76           73
Kang 3 10 84        [15]       84           79
Kuan 3 10 28        [25]       28           26
Liang 3 10 80       [26]       80           73
Daoud 2 11 64       [8]        64           56
YCLee 3 8 80        [27]       80           66
Sample 3 8 100      -          100          81
SampleFig1 3 8 89   -          89           66

As we can notice in Table 2, the algorithms cannot always find the optimal solutions, despite the instances being of small size, with at most a four-processor system and 12 tasks. Moreover, there are some instances for which the ILS configurations are not able to find the optimal solution in every run. Recall that the scheduling problem is NP-hard even for a two-processor system.

Table 3 Comparison results for synthetic deterministic program graphs known in the literature

Instance            Optimal  HEFT  ILS Alg1  ILS Alg2  ILS Alg3  ILS Alg4
Ahmad 3 9 28        22       24    23 (0)    23 (0)    22 (15)   22 (15)
Hsu 3 10 84         80       92    80 (12)   80 (13)   80 (14)   80 (15)
Eswari 2 11 61      56       76    58 (0)    58 (0)    56 (4)    56 (2)
Hamid 3 10          100      110   100 (10)  100 (10)  100 (15)  100 (15)
Heteropar 4 12 124  124      150   124 (12)  124 (15)  124 (15)  124 (15)
Ilavarasan 3 10 77  73       80    73 (13)   73 (13)   73 (10)   73 (10)
Kang 3 10 76        73       80    75 (0)    75 (0)    73 (5)    73 (4)
Kang 3 10 84        79       109   82 (0)    82 (0)    83 (0)    83 (0)
Kuan 3 10 28        26       30    26 (15)   26 (15)   26 (8)    26 (11)
Liang 3 10 80       73       80    73 (13)   73 (13)   73 (15)   73 (10)
Daoud 2 11 64       56       76    58 (0)    58 (0)    56 (3)    56 (1)
YCLee 3 8 80        66       88    68 (0)    68 (0)    66 (15)   66 (15)
Sample 3 8 100      81       84    81 (1)    81 (7)    81 (4)    81 (4)
SampleFig1 3 8 89   66       89    69 (0)    69 (0)    66 (15)   66 (15)

From Table 3 it can be observed that the ILS configurations with initial solutions generated using random feasible orders (i.e., ILS Alg3 and ILS Alg4) find the optimal value for a larger number of instances than the ILS algorithms using blevel-based initial solutions (i.e., ILS Alg1 and ILS Alg2). The main reason is that by generating random initial solutions the algorithm is able to explore different regions of the search space, whereas with the deterministic blevel method the algorithms are restricted to a specific region. In fact, ILS Alg3 and ILS Alg4 find the optimal value for around 93% of the instances, while ILS Alg1 and ILS Alg2 find it for only 56% of the instances. The ILS configurations with random feasible orders not only find the optimal value for more instances than the configurations with blevel, but are also able to find the optimal value in all executions for more instances than ILS Alg1 and ILS Alg2, as can be verified in Table 3 by the rows highlighted in gray.

Table 4 Approximation factor (ρ)

Instance            HEFT   ILS Alg1  ILS Alg2  ILS Alg3  ILS Alg4
Ahmad 3 9 28        1.090  1.045     1.045     1         1
Hsu 3 10 84         1.150  1.012     1.011     1.001     1
Eswari 2 11 61      1.366  1.061     1.038     1.054     1.064
Hamid 3 10          1.100  1.019     1.019     1         1
Heteropar 4 12 124  1.209  1.036     1         1         1
Ilavarasan 3 10 77  1.095  1.005     1.005     1.012     1.013
Kang 3 10 76        1.095  1.030     1.030     1.025     1.026
Kang 3 10 84        1.379  1.043     1.046     1.058     1.049
Kuan 3 10 28        1.153  1         1         1.028     1.015
Liang 3 10 80       1.095  1.005     1.005     1         1.010
Daoud 2 11 64       1.366  1.057     1.042     1.058     1.056
YCLee 3 8 80        1.333  1.030     1.030     1         1
Sample 3 8 100      1.037  1.018     1.008     1.012     1.009
SampleFig1 3 8 89   1.348  1.045     1.045     1         1
Average             1.191  1.029     1.022     1.017     1.018


Table 4 provides more details about the performance comparison by presenting the approximation factor (ρ) for each algorithm. The first column provides the names of the instances. The second column shows the approximation factor for the HEFT algorithm. Recall that HEFT is a well-known scheduling algorithm and is often used as a basis for comparison and validation of newly proposed heuristics; however, to the best of our knowledge, no comparison has been provided between the results computed by HEFT and the optimal value. The remaining columns present the approximation factor for the ILS configurations, together with the average approximation factor for each ILS algorithm. From Table 4 we observe that the ILS algorithm in its different configurations outperforms HEFT. The ILS algorithms are within around 3% of the optimal values on average (for ILS Alg1), whereas HEFT is around 19% from the optimal values on average. We also see that the ILS configurations based on random feasible orders compute better results than the ILS algorithms using blevel to generate the initial solution; however, the results are comparable among the four configurations.

5.3 Results for Real-world Parallel Applications

In this section we present results obtained on the set of real-world parallel applications. The aim is to evaluate the scalability of the ILS configurations, so we report statistical results. In our tests, each algorithm was executed for 15 independent runs. We evaluated the ILS algorithms using a Friedman test [21]. The Friedman test is a non-parametric statistical tool that compares a set of non-normalized populations to verify whether significant statistical differences exist in the sample. Moreover, the resulting ranking shows which algorithm has the higher performance.

Table 5 provides the results obtained after applying the Friedman test. The ranking shows the performance of the algorithms: in the convention used here, the higher the value of the ranking, the better the algorithm performs in finding solutions for the set of instances. We can observe that the behavior of the algorithms changes when the problem scales with respect to the small-size instances. The ranking shows that ILS-Alg4 still computes better results than the other three configurations. However, ILS-Alg2 now performs better than ILS-Alg1 and ILS-Alg3. The common characteristic of ILS-Alg4 and ILS-Alg2 is that both use the same perturbation process, which slightly changes the current solution by moving only one task at a time and then explores the neighbors of the perturbed solution. The main reason is that such a change (moving only one task) does not disrupt the structure of the solution to a large extent, so the algorithm can continuously search for better and better solutions [31]. Nevertheless, it could be interesting to investigate a more sophisticated heuristic that decides which task to move based on some knowledge of the structure of the scheduling problem under consideration, without increasing complexity. We can also state that the ILS algorithm provides results better than or comparable to those of HEFT: since ILS-Alg1 and ILS-Alg2 use the output of HEFT as their initial solution, if ILS does not improve the initial solution we obtain the same results as HEFT.
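The "move one task" perturbation described above can be sketched as follows. This is our reading of the mechanism, not the authors' implementation: the schedule is represented as a precedence-feasible order (a topological order of the DAG), and the hypothetical maps `preds` and `succs` give each task's direct predecessors and successors.

```python
import random

# Perturbation sketch: remove one randomly chosen task from a topological
# order and reinsert it at another feasible position, i.e. after its last
# predecessor and before its first successor, so precedence is preserved.
def perturb(order, preds, succs, rng=random):
    order = list(order)
    task = order.pop(rng.randrange(len(order)))
    # Earliest legal slot: just after the last predecessor in the order.
    lo = max((order.index(p) + 1 for p in preds.get(task, ()) if p in order),
             default=0)
    # Latest legal slot: just before the first successor in the order.
    hi = min((order.index(s) for s in succs.get(task, ()) if s in order),
             default=len(order))
    order.insert(rng.randrange(lo, hi + 1), task)
    return order
```

Because every predecessor of the moved task precedes every one of its successors in a topological order, the interval [lo, hi] is always non-empty, so the move is always feasible.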

We then performed a statistical hypothesis test to verify whether the difference between the best-performing algorithm and the others is significant, in which case the null hypothesis is rejected, or whether, due to the random nature of the solutions, the obtained results may be considered statistically equivalent.


Table 5 Average rankings of the algorithms on all the sets of instances

Algorithm   LIGO     Robot    Sparse
ILS-Alg1    2.2888   2.3733   2.1155
ILS-Alg2    3.0466   2.9333   3.2355
ILS-Alg3    1.3444   1.3000   1.6155
ILS-Alg4    3.3200   3.3933   3.0333
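The rankings in Table 5 follow the Friedman procedure: the algorithms are ranked on each instance, and the per-instance ranks are averaged. A pure-Python sketch of that rank computation, under our "higher is better" convention for this minimization problem, is given below; ties are ignored for simplicity, and in practice a library routine such as scipy.stats.friedmanchisquare would also supply the test statistic. The data layout (`results[i][a]` = makespan of algorithm `a` on instance `i`) is an assumption for illustration.

```python
# Average Friedman-style ranks. On each instance, algorithms are sorted
# from worst (largest makespan) to best (smallest makespan): the worst
# gets rank 1 and the best gets rank k, so higher average rank is better.
def average_ranks(results, algorithms):
    totals = {a: 0.0 for a in algorithms}
    for row in results:
        ordered = sorted(algorithms, key=lambda a: row[a], reverse=True)
        for rank, a in enumerate(ordered, start=1):
            totals[a] += rank
    return {a: totals[a] / len(results) for a in algorithms}
```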

Table 6 presents the statistical tests, which consider the p-value obtained between each algorithm and the best-performing one (ILS-Alg4), with a significance level of α = 0.05: if the observed value is larger than the 1 − α quantile of the corresponding distribution, the null hypothesis is rejected. As mentioned, a rejection indicates that the candidate algorithm gives better performance than the compared algorithm.

In Table 6 we can observe that ILS-Alg4 outperforms the ILS-Alg1 and ILS-Alg3 configurations but is statistically equivalent to ILS-Alg2 on the LIGO and Sparse applications. However, ILS-Alg4 finds significantly better solutions than ILS-Alg2 on the Robot application.

Table 6 Statistical hypothesis testing for the best algorithm by makespan with α = 0.05/4 (Bonferroni correction)

Set      ILS-Alg4 vs.   p-value      H0 rejected
LIGO     ILS-Alg1       2.4237E-17   Yes
         ILS-Alg2       0.0247       No
         ILS-Alg3       3.0522E-59   Yes
Robot    ILS-Alg1       5.2857E-17   Yes
         ILS-Alg2       1.5727E-4    Yes
         ILS-Alg3       2.7254E-66   Yes
Sparse   ILS-Alg1       4.6884E-14   Yes
         ILS-Alg2       0.0966       No
         ILS-Alg3       2.3426E-31   Yes
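The rejection decisions in Table 6 follow a Bonferroni-style rule: each p-value is compared against α divided by the number of comparisons, here 0.05/4 = 0.0125. The sketch below reproduces the decision for the LIGO row; the p-values are taken from Table 6, while the function name is ours.

```python
# Bonferroni-corrected decision: reject H0 when p < alpha / m, where m is
# the number of pairwise comparisons against the best algorithm.
def bonferroni_rejections(p_values, alpha=0.05, m=4):
    threshold = alpha / m
    return {name: p < threshold for name, p in p_values.items()}

# p-values of ILS-Alg4 vs. the other configurations on LIGO (Table 6).
ligo = {"ILS-Alg1": 2.4237e-17, "ILS-Alg2": 0.0247, "ILS-Alg3": 3.0522e-59}
decisions = bonferroni_rejections(ligo)
```

Note that 0.0247 exceeds the corrected threshold 0.0125, which is why H0 is not rejected for ILS-Alg2 on LIGO even though the raw p-value is below 0.05.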

6 Conclusions and Future Work

In this paper we have proposed an iterative local search (ILS) based scheduling algorithm to solve the precedence-constrained scheduling problem in heterogeneous computing systems. We have generated a synthetic benchmark composed of small-size deterministic parallel applications and computed the optimal solutions using an exhaustive enumerative search algorithm. The generated benchmark can be used as a basis for comparison during the design of new scheduling heuristics.

We have investigated four configurations of the proposed ILS algorithm and compared their results against the optimal solutions. From the experimental results we observed that ILS benefited not only from generating several feasible orders (diversifying solutions) and keeping the best one, instead of generating only one solution, but also from applying small perturbations to the current solution, thereby intensifying the search in its neighborhood; these are two basic considerations when designing local search methods. One of the main advantages of the proposed ILS is that it can be coupled with a more sophisticated heuristic without increasing complexity.


We plan to extend this work in several directions. First, we intend to extend the benchmark; to that end, we plan to parallelize the enumerative search method and to design a branch-and-bound algorithm. We also intend to design more sophisticated heuristics to decide which task to move during the perturbation process; one possibility is to move the task with the highest CCR, so that the communication delay can be reduced. Finally, one important aspect to consider nowadays is the size of workflows, which are usually composed of thousands of tasks. We therefore plan to evaluate the scalability of the proposed local search on bigger workflows; for that purpose, we consider applying a partitioning technique to first decompose the DAG into subgraphs and then apply our local search in a cooperative way to locally optimize these subgraphs while taking the global optimization into account.

Acknowledgements This work is partially supported by the National Research Fund, Luxembourg, in the framework of the AFR Green Energy-Efficient Computing project (PDR-09-067). The Mexican researchers were supported by the Consejo Nacional de Ciencia y Tecnología (CONACyT), Mexico.

References

1. Ahmad, I., Dhodhi, M., Ul-Mustafa, R.: DPS: dynamic priority scheduling heuristic for heterogeneous computing systems. IEE Proceedings: Computers and Digital Techniques 145(6), 411–418 (1998)

2. Arabnejad, H.: List based task scheduling algorithms on heterogeneous systems - an overview. http://paginas.fe.up.pt/~prodei/dsie12/papers/paper 30.pdf (2011). Available online. Consulted January 2013

3. Arabnejad, H., Barbosa, J.: Performance evaluation of list based scheduling on heterogeneous systems. http://icl.eecs.utk.edu/heteropar2011/slides/heteropar JorgeBarbosa.pdf. HeteroPar'11. Consulted online 2012

4. Arabnejad, H., Barbosa, J.G.: Performance evaluation of list based scheduling on heteroge-

neous systems. In: Proceedings of the 2011 international conference on Parallel Processing,

Euro-Par’11, pp. 440–449. Springer-Verlag, Berlin, Heidelberg (2012)

5. Bittencourt, L.F., Sakellariou, R., Madeira, E.R.M.: DAG scheduling using a lookahead variant of the heterogeneous earliest finish time algorithm. In: Proceedings of the 2010

18th Euromicro Conference on Parallel, Distributed and Network-based Processing, PDP

’10, pp. 27–34. IEEE Computer Society (2010)

6. Brown, D., Brady, P., Dietz, A., Cao, J., Johnson, B., McNabb, J.: A case study on the use of workflow technologies for scientific analysis: Gravitational wave data analysis. In: I. Taylor, E. Deelman, D. Gannon, M. Shields (eds.) Workflows for e-Science, pp. 39–59.

Springer London (2007)

7. Coﬀman, E.G.: Computer and jobshop scheduling theory. John Wiley & Sons Inc (1976)

8. Daoud, M.I., Kharma, N.: A hybrid heuristic-genetic algorithm for task scheduling in

heterogeneous processor networks. J. Parallel Distrib. Comput. 71(11), 1518–1531 (2011)

9. Demiröz, B., Topcuoglu, H.R.: Static task scheduling with a unified objective on time and

resource domains. Comput. J. 49(6), 731–743 (2006)

10. Eswari, R., Nickolas, S.: Path-based heuristic task scheduling algorithm for heterogeneous

distributed computing systems. In: Proceedings of the 2010 International Conference on

Advances in Recent Technologies in Communication and Computing, ARTCOM ’10, pp.

30–34. IEEE Computer Society, Washington, DC, USA (2010)

11. Hirales-Carbajal, A., Tchernykh, A., Yahyapour, R., González-García, J., Röblitz, T., Ramírez-Alcaraz, J.: Multiple workflow scheduling strategies with user run time estimates on a grid. J. Grid Comput. 10(2), 325–346 (2012)

12. Hsu, C.H., Hsieh, C.W., Yang, C.T.: A generalized critical task anticipation technique for

DAG scheduling. In: Proceedings of the 7th international conference on Algorithms and

architectures for parallel processing, ICA3PP’07, pp. 493–505. Springer-Verlag, Berlin,

Heidelberg (2007)


13. Ilavarasan, E., Thambidurai, P., Mahilmannan, R.: Performance effective task scheduling algorithm for heterogeneous computing system. In: Proceedings of the 4th International Symposium on Parallel and Distributed Computing, ISPDC '05, pp. 28–38. IEEE

Computer Society, Washington, DC, USA (2005)

14. Kang, Q., He, H., Song, H.: Task assignment in heterogeneous computing systems using

an eﬀective iterated greedy algorithm. J. Syst. Softw. 84(6), 985–992 (2011)

15. Kang, Y., Lin, Y.: A recursive algorithm for scheduling of tasks in a heterogeneous dis-

tributed environment. In: Y. Ding, Y. Peng, R. Shi, K. Hao, L. Wang (eds.) BMEI, pp.

2099–2103. IEEE (2011)

16. Kang, Y., Zhang, Z., Chen, P.: An activity-based genetic algorithm approach to multipro-

cessor scheduling. In: Y. Ding, H. Wang, N. Xiong, K. Hao, L. Wang (eds.) ICNC, pp.

1048–1052. IEEE (2011)

17. Khan, S.U., Ahmad, I.: A cooperative game theoretical technique for joint optimization

of energy consumption and response time in computational grids. IEEE Trans. Parallel

Distrib. Syst. 20(3), 346–360 (2009)

18. Kim, S.C., Lee, S., Hahm, J.: Push-pull: Deterministic search-based DAG scheduling for

heterogeneous cluster systems. IEEE Trans. Parallel Distrib. Syst. 18(11), 1489–1502

(2007)

19. Kolodziej, J., Khan, S.U.: Multi-level hierarchic genetic-based scheduling of independent

jobs in dynamic heterogeneous grid environment. Inf. Sci. 214, 1–19 (2012)

20. Kolodziej, J., Khan, S.U., Xhafa, F.: Genetic algorithms for energy-aware scheduling in

computational grids. In: F. Xhafa, L. Barolli, J. Kolodziej, S.U. Khan (eds.) 3PGCIC, pp.

17–24. IEEE (2011)

21. Kvam, P.H., Vidakovic, B.: Nonparametric Statistics with Applications to Science and

Engineering (Wiley Series in Probability and Statistics). Wiley-Interscience (2007)

22. Kwok, Y.K., Ahmad, I.: Efficient scheduling of arbitrary task graphs to multiprocessors

using a parallel genetic algorithm. J. Parallel Distrib. Comput. 47(1), 58–77 (1997). DOI

10.1006/jpdc.1997.1395. URL http://dx.doi.org/10.1006/jpdc.1997.1395

23. Kwok, Y.K., Ahmad, I.: FASTEST: A practical low-complexity algorithm for compile-time

assignment of parallel programs to multiprocessors. IEEE Trans. Parallel Distrib. Syst.

10(2), 147–159 (1999)

24. Kwok, Y.K., Ahmad, I., Gu, J.: FAST: A low-complexity algorithm for efficient scheduling of DAGs on parallel processors. In: ICPP, Vol. 2, pp. 150–157 (1996)

25. Lai, K.C., Yang, C.T.: A dominant predecessor duplication scheduling algorithm for het-

erogeneous systems. J. Supercomput. 44(2), 126–145 (2008)

26. Lee, L.T., Chen, C.W., Chang, H.Y., Tang, C.C., Pan, K.C.: A non-critical path earliest-finish algorithm for inter-dependent tasks in heterogeneous computing environments. In:

HPCC, pp. 603–608. IEEE (2009)

27. Lee, Y.C., Zomaya, A.: A novel state transition method for metaheuristic-based scheduling

in heterogeneous computing systems. IEEE Trans. Parallel Distrib. Syst. 19(9), 1215–1223

(2008)

28. Lee, Y.C., Zomaya, A.Y.: Energy conscious scheduling for distributed computing systems

under diﬀerent operating conditions. IEEE Trans. Parallel Distrib. Syst. 22(8), 1374–1381

(2011)

29. Liu, G.Q., Poh, K.L., Xie, M.: Iterative list scheduling for heterogeneous computing. J.

Parallel Distrib. Comput. 65(5), 654–665 (2005)

30. Shen, L., Choe, T.Y.: Posterior task scheduling algorithms for heterogeneous computing

systems. In: Proceedings of the 7th international conference on High performance com-

puting for computational science, VECPAR’06, pp. 172–183. Springer-Verlag (2007)

31. Switalski, P., Seredynski, F.: Multiprocessor scheduling by generalized extremal optimiza-

tion. J. of Scheduling 13(5), 531–543 (2010)

32. Tobita, T., Kasahara, H.: A standard task graph set for fair evaluation of multiprocessor

scheduling algorithms. Journal of Scheduling 5(5), 379–394 (2002)

33. Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task

scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 13(3), 260–

274 (2002)

34. Wu, M.Y., Shu, W., Gu, J.: Efficient local search for DAG scheduling. IEEE Trans. Parallel

Distrib. Syst. 12(6), 617–627 (2001)

35. Zhao, H., Sakellariou, R.: An experimental investigation into the rank function of the heterogeneous earliest finish time scheduling algorithm. In: H. Kosch, L. Böszörményi, H. Hellwagner (eds.) Euro-Par 2003 Parallel Processing, Lecture Notes in Computer Science, vol. 2790, pp. 189–194. Springer Berlin / Heidelberg (2003)
