
Design and Analysis of a Dynamic Scheduling Strategy with Resource Estimation for Large-Scale Grid Systems

Sivakumar Viswanathan, Bharadwaj Veeravalli

Department of Electrical and Computer Engineering, National University of Singapore

4 Engineering Drive 3, Singapore 117576

{g0306272, elebv}@nus.edu.sg

Dantong Yu

Department of Physics, Brookhaven National Laboratory, Upton, NY 11973, USA

dtyu@bnl.gov

Thomas G. Robertazzi

Department of Electrical and Computer Engineering, Stony Brook University

Stony Brook, NY 11794, USA

tom@ece.sunysb.edu

Abstract

In this paper, we present a resource-conscious dynamic scheduling strategy for handling large-volume, computationally intensive loads in a Grid system involving multiple sources and sinks (processing nodes). We consider a "pull-based" strategy, wherein the processing nodes request load from the sources. We employ the Incremental Balancing Strategy (IBS) algorithm proposed in the literature and propose a buffer estimation strategy to derive an optimal load distribution. We consider non-time-critical loads that arrive at arbitrary times, with time-varying buffer availability at the sinks, and utilize buffer reclamation techniques to schedule the loads. We demonstrate the detailed workings of the proposed algorithm with illustrative examples using real-life parameters derived from the STAR experiments at BNL for scheduling large-volume loads.

1. Introduction

The large volumes of computational data being generated in high energy and nuclear physics experiments demand new strategies for collecting, sharing, transferring, and analyzing the data. For example, the Solenoidal Tracker At RHIC (STAR) experiment at Brookhaven National Laboratory (BNL) is collecting data at a rate of over a terabyte per day. The STAR collaboration is a large international collaboration of about 400 high energy and nuclear physicists located at 40 institutions in the United States, France, Russia, Germany, Israel, Poland, and so on. After the Relativistic Heavy-Ion Collider (RHIC) experiments at BNL came on-line in 1999, STAR began data taking and concurrent data analysis that will last about ten years. STAR needs to perform data acquisition and analysis over approximately 250 terabytes of raw data and 1 petabyte of derived and reconstructed data per year. The volume of data is expected to increase by a factor of 10 in the next five years. Details on data acquisition and hardware can be found in [12]. These experiments require effective analysis of large amounts of data by widely distributed researchers who must work closely together. Expanding collaborations and intensive data analysis, coupled with increasing computational and networking capabilities, stimulate a new era of service-oriented computing: Grid computing [1].

Grid computing consists of large sets of diverse, geographically distributed resources that are collected into a virtual computer for high-performance computation. Grid computing creates middleware and standards that function between computers and networks to allow full resource sharing among individuals, research institutes, and corporate organizations, and to dynamically allocate idle computing capacity to the users who need it at remote sites. The diversity of these computing resources and their large number of users create a challenge in efficiently scheduling and utilizing these resources.

Scheduling is a significant problem in fairly allocating resources in cluster and Grid systems. In the divisible load theory (DLT) domain, scheduling loads under time-varying processor and link speeds has been studied in [6]. However, to date there has been no work on dynamically scheduling divisible loads (large-volume loads) in a Grid environment when the resource availability at the sinks varies randomly over time. The motivation for this paper stems from the challenges in managing and utilizing computing resources in Grids as efficiently as possible, with performance optimization as the main focus. The performance metrics of interest include job throughput, resource utilization, and job response time and its variance. We propose an efficient dynamic scheduling strategy for situations wherein the jobs need to be completed as early as possible. We provide a detailed analysis of the algorithm with respect to the above metrics and demonstrate its performance using illustrative examples with real-life parameters derived from the STAR experiments at BNL. The analytical flexibility offered by DLT is thoroughly exploited to design resource-aware algorithms that make the best use of the available resources on the Grid. It also offers an exciting opportunity to optimally schedule multiple divisible loads in Grid computing.

Our contributions in this paper are multi-fold. We consider the problem of scheduling several loads generated in a Grid system onto a set of nodes, using the DLT paradigm. We design and propose a dynamic scheduling algorithm that considers system-level constraints such as finite buffer capacities at the processing nodes. We incorporate resource reclaiming strategies in the design of the algorithm. The workings of our algorithm are elaborated using a numerical example with real-life parameters and data acquired from the STAR experiments described above. Our algorithm is designed to adapt to network and load size scalability. Our study and systematic design clearly elicit the advantages offered by our strategy.

The paper is organized as follows. In Section 2 we provide the research background and related work on Grid scheduling and DLT. In Section 3 we formalize the multi-source and multi-sink problem in a Grid system. In Section 4 we discuss the load distribution strategy and provide an incremental scheduling strategy for the dynamic environment. In Section 5 we present the buffer estimation strategy. In Section 6 we discuss the scheduling algorithm, highlight its advantages, and provide the conclusions.

2. Related Work

In this section, we present some of the related work relevant to the problem addressed in this paper. For divisible loads, research since 1988 has established that optimal allocation/scheduling of divisible load to processors and links can be solved through the use of a very tractable linear model formulation, referred to as DLT. It is rich in features such as easy computation, a schematic language, equivalent network element modelling, results for infinite-sized networks, and numerous applications. This linear theory formulation opens up striking modelling possibilities for systems incorporating communication and computation issues, as in parallel, distributed, and Grid computing. Optimality here, involving solution time and speedup, is defined in the context of a specific scheduling policy and interconnection topology. The linear model formulation usually produces optimal solutions through linear equation solution or, in simpler models, through recursive algebra. The model can take into account heterogeneous processor and link speeds as well as relative computation and communication intensity.

DLT can model a wide variety of approaches, such as store-and-forward and virtual cut-through switching and the presence or absence of front-end processors. Front-end processors allow a processor to both communicate and compute simultaneously by assuming communication duties [8]. There exists a literature of some sixty journal papers on DLT. In addition to the monograph [8], two up-to-date introductory surveys have been published recently [4, 9]. DLT has proven to be remarkably flexible in the sense that the model allows the analytical tractability to derive a rich set of results regarding several important properties of the proposed strategies and to analyze their performance. Consequently, we do not attempt to present another survey here; instead, we refer to the following papers that are either directly or indirectly relevant to the context of this paper. Most directly relevant to the problem addressed in this paper is [3], in which the authors propose an Incremental Balancing Strategy (IBS) to accommodate divisible loads when there are buffer constraints at the processors. An alternative scheme for dynamic environments that considers admission control mechanisms has been studied recently in [10]. Issues such as processor release times coupled with buffer capacity constraints are studied in [7]. The solution time (the time at which the processed loads/solution is made known at the originator) is discussed in [5]. We refer astute readers to the recent surveys mentioned above for up-to-date literature in this domain.

3. Problem Formulation and Some Remarks

The Grid computing system to be considered here comprises N control processors, referred to as sources, that have load to be processed, and M computing elements, referred to as sinks, for processing the loads, as shown in Figure 1. Each sink might include one supercomputer or a cluster of computers connected by local area networks and controlled by a head (root) node, and may have different computing and communication capabilities. In a simplified view, these clusters of processors can be replaced with a single equivalent processor. The Grid computing system can then be modelled as a fully connected bipartite graph, as in Figure 2: the set of graph vertices can be decomposed into two disjoint sets such that no two vertices within the same set are adjacent, while any pair of vertices from these two sets are adjacent. This bipartite graph is thus a representation of the fact that each source can schedule its load on all the sinks.

Figure 1. Grid System

In real-life situations, one of the practical constraints is the availability of the resources on a Grid. For instance, this resource could be the memory capacity that a sink can offer to the sources. Thus, whenever two or more sources compete for a sink, the available memory (equivalently, the amount of load that can be accepted for processing from each source) is to be shared among such sources. We precisely consider this real-life constraint in our proposed algorithm. Further, in such a Grid environment, one can follow either a push-based approach or a pull-based approach to distribute and schedule the loads. In a push-based approach, the sources themselves identify potential sinks (with an assumption/knowledge about the available resources at the sinks) and schedule their loads. In a pull-based approach, by contrast, the sinks collect the requests from the competing sources and schedule them depending on the availability of the resources among themselves. Both schemes have their merits, and the choice of approach depends purely on the application requirements and implementation constraints, such as the size of the grid, the resource availability, etc. In this paper, we shall consider a pull-based approach in the design of our scheduling strategy.

Now we will formally define the problem we address. As described above, we consider a Grid system with N sources, denoted S1, S2, ..., SN, and M sinks, denoted K1, K2, ..., KM. For each source there is a direct link to all the sinks, and we denote the link between Si and Kj as li,j, i = 1, ..., N, j = 1, ..., M. Each source Si has a load, denoted by Li, to process. Without loss of generality, we assume that all sources can send their loads to all the sinks simultaneously. Similarly, we also assume that all the sinks can request and receive load portions from all sources.

Figure 2. Abstract Overview of Grid System

The objective in this study is to schedule all the N loads among the M sink nodes such that the processing time, defined as the time instant when all the M sinks complete processing the loads, is a minimum. The scheduling strategy is such that the scheduler (without loss of generality, we assume that the scheduler resides in K1) will first obtain the information about the available memory capacities at the other sinks, their computing speeds, and the sizes of the loads at the sources. The scheduler will then calculate and notify each sink of the optimum load fractions that are to be requested from each source. This information can be easily communicated using any standard or customized communication protocol without incurring any significant communication overhead.

The sources, upon knowing the amounts of load that they should give to each sink, will send their loads to all the sinks simultaneously. Following Kim's model [2], we assume that the sinks will start computing the load fractions as they start receiving them. We also assume that the communication time delay is negligibly smaller than the computation time, owing to high-speed links, so that no sinks starve for load. We shall now introduce the definitions and notations that are used throughout this paper.

N (M): Total number of sources (sinks) in the system, with each source (sink) denoted by Si, i = 1, ..., N (Kj, j = 1, ..., M).
wj: Inverse of the computing speed of Kj.
Li: Load at Si, such that the total load in the system is L = Sum(Li, ∀i = 1...N).
αi,j (α̂i,j): Actual (estimated) amount of load that Kj shall request from Si in an iteration.
αj: Fraction of the load L(q) that Kj should take in an iteration, αj = Sum(αi,j, ∀i = 1...N).
Tcp: Computing intensity constant.
T(q): Time taken to process the loads in the q-th iteration.
Y: Fraction of the load L that should be taken into consideration in an iteration (installment), where Y ≤ 1.
p: Buffer estimator confidence factor.
B(q)j (B̂(q)j): Available (estimated) buffer space at Kj in the q-th iteration.
Pall (Pnow): Set of all sinks in the system (set of sinks with buffer space available for processing in an iteration).
Xnow: Set of sources whose loads are being processed in an iteration.
Xnew: Set of sources that arrive at the system while the system is idle or busy processing loads from other sources.

4. Dynamic Incremental Scheduling Strategy

We employ Kim's multi-port communication model [2] for load distribution and assume that K1 generates the required schedule satisfying the resource constraints. Here, we also assume that the sinks will start computing the load fractions as they start receiving them, and that the communication time delay is negligibly smaller than the computation time, owing to high-speed links, so that no sinks starve for load. In the DLT literature [8], it was shown that in order to derive an optimal solution it is necessary and sufficient that all the sinks that participate in the computation stop at the same time instant; otherwise, load could be redistributed to improve the processing time.

Using this optimality principle and assuming infinite buffer space at the sink nodes (i.e., a sink can hold any amount of load from the sources), the load fraction that a sink Kj shall receive from the source Si is derived in [11] as

αi,j = (1/wj) / Sum(1/wx, ∀x = 1...M) * Li    (1)

While deriving these load fractions, it is assumed that each sink requests a load fraction that is proportional to the size of the load at the source. Moreover, each sink requests the same load fraction (percentage of the total load) from each source.
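As a concrete illustration of Equation (1), the following minimal sketch computes these infinite-buffer fractions; the helper name `load_fractions` is ours, and the speed parameters are the ones used later in Example 1.

```python
# Illustrative sketch of Equation (1): with infinite buffers, sink K_j's
# request from source S_i is proportional to its speed share (1/w_j) of
# the total speed Sum(1/w_x, x = 1..M) and to the load L_i at that source.
def load_fractions(w, loads):
    # w[j]: inverse computing speed of sink K_j; loads[i]: load L_i at S_i
    inv_speed_sum = sum(1.0 / wj for wj in w)
    return [[(1.0 / wj) / inv_speed_sum * Li for wj in w] for Li in loads]

# Sink speed parameters from Example 1; two sources with loads 5 and 2:
alpha = load_fractions([1.11e-9, 6.25e-10, 5.00e-10, 3.57e-10], [5.0, 2.0])
# Each source's load is fully distributed across the sinks:
assert all(abs(sum(row) - L) < 1e-9 for row, L in zip(alpha, [5.0, 2.0]))
```

Note that every source is split in the same proportions across the sinks, which is exactly the property stated above.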

However, in real-life situations, each sink always has a limit to the amount of buffer space that can be used. Further, in a generic Grid environment, each node may be running multiple tasks and must share the available resources, so there may be only a limited amount of buffer space allocated for processing particular loads at a given time. As a result, we are naturally confronted with the problem of scheduling divisible loads under buffer capacity constraints. The IBS algorithm proposed in [3] produces a minimum-time solution given pre-specified buffer capacity constraints, and it also exhibits finite convergence, but it does not consider scheduling under dynamic environments and buffer space variations at the processing nodes. In this paper, we propose a Dynamic Incremental Scheduling Strategy (DISS) that takes into account the variations in buffer space availability at the sink nodes, and we also propose an adaptive estimation scheme to schedule the processing loads in an incremental fashion.

In a real-life system, the total amount of load to be processed may exceed the available buffer space at the sinks, and the loads may also arrive at the system for processing at arbitrary times. Thus, the number of loads to be processed may vary over time, and demand for processing may arise at any time. It is difficult to estimate a priori the maximum amount of load that may be in the system. This is especially true on Grids, as any node can attempt to inject a load whenever it has one to be processed. Under such conditions, a feasible schedule may not exist unless the sink nodes allow their buffer space to be reclaimed after a given load is processed. This means that, after processing a given load, the sinks shall make their buffer space available for subsequent processing. Thus, in order to handle the situation wherein sources demand processing at various time instants, dynamic scheduling strategies need to be designed in such a way that the sinks continue to render their available buffer space to the sources. The dynamic scheduling strategies also need to take into account that the amount of buffer space available at the sinks may vary over time and that this variation may not be known a priori. Under such conditions, each sink shall estimate the amount of buffer space that it can offer for scheduling in the next iteration and communicate it to the scheduler node. A buffer estimation strategy is described later, in Section 5. With this information, the scheduler node shall generate the required schedule satisfying the resource constraints. It may be noted that in order to estimate the load fractions α̂i,j, the scheduler uses Equation (1).

Figure 3. Flowchart for a "Pull-based" DISS with Distributed Buffer Space Estimation

Initial state:
  I = {1, 2, ..., N}, J = {1, 2, ..., M}, q = 0, T(0) = 0, p = 0.95
  B̂(1)j = B(0)j, α(0)i,j = 0

Step 1: K1 computes α̂(q+1)i,j and T(q+1) and communicates them to all Sink Nodes:
  Li = Li − Sum(α(q)i,j, ∀j = 1...M), ∀Si ∈ Xnow, i ∈ I
  If (Xnew ≠ ∅) { Xnow = Xnow ∪ Xnew, Xnew = ∅ }
  If (Li = 0) { Xnow = Xnow − {Si} }, ∀Si ∈ Xnow, i ∈ I
  L = Sum(Li, ∀i = 1...N), ∀Si ∈ Xnow, i ∈ I
  Pnow = Pall
  If (B̂(q+1)j = 0) { Pnow = Pnow − {Kj} }, ∀Kj, j ∈ J
  α(q+1)j = 1 / (wj * Sum(1/wx, ∀x = 1...M)), ∀Kj ∈ Pnow, j ∈ J
  Y = min{ B̂(q+1)j / (α(q+1)j * L), ∀Kj ∈ Pnow }, j ∈ J
  If (Y > 1) { Y = 1 }
  α̂(q+1)i,j = Y * α(q+1)j * Li, ∀Si ∈ Xnow, i ∈ I, ∀Kj, j ∈ J
  T(q+1) = Sum(α̂(q+1)i,j, ∀i = 1...N) * wj * Tcp, Kj ∈ Pnow
  K1 communicates α̂(q+1)i,j and T(q+1) to all Sink Nodes

Step 2: All Sink Nodes compute B̂(q+1)j and α(q)i,j and communicate them to K1:
  All Sink Nodes wait till their Current Time = T(q)
  q = q + 1
  B̂(q+1)j = (Sum(k * B(k)j, ∀k = 1...q) / Sum(k, ∀k = 1...q)) * p
  If (Sum(α̂(q)i,j, ∀i = 1...N) > B(q)j) {
      α(q)i,j = α̂(q)i,j * B(q)j / Sum(α̂(q)i,j, ∀i = 1...N)
      Communicate B̂(q+1)j and α(q)i,j to K1 }
  else {
      α(q)i,j = α̂(q)i,j
      Communicate B̂(q+1)j to K1 }

Step 3: All Sink Nodes schedule the loads from the Source Nodes:
  B(q)j = B(q)j − Sum(α(q)i,j, ∀i = 1...N)
  Sink Nodes request and process the load fractions α(q)i,j from the Source Nodes
  Go to Step 1

Figure 4. Pseudo code for a "Pull-based" DISS with Distributed Buffer Space Estimation

As long as there is sufficient load in the system to completely consume the estimated amount of buffer space at one of the sink nodes, the load fractions αi,j that a sink Kj may request from a source Si have to be reduced by a factor Y, given by

Y = min{ B̂j / (αj * L), ∀Kj ∈ Pnow }    (2)

This ensures that at each iteration all the sinks that participate in processing the loads complete processing at the same time instant, provided the actual buffer space available at a sink node equals the estimated one at that node. The algorithm attempts to fill up one or more sinks' buffer space in every iteration. In any iteration, if the remaining load is not enough to completely consume the buffer space at any of the participating nodes, the distribution suggested by Equation (1) is used. When two sinks have identical buffer space, the buffer at the faster sink will be fully utilized. As long as there is enough load to be processed in an iteration, the algorithm ensures that at least one sink's buffer is completely utilized. The processing time for the q-th iteration is given by

T(q) = Sum(α̂(q)i,j, ∀i = 1...N) * wj * Tcp    (3)

In the above algorithm, the load fractions are calculated based on the estimated buffer availabilities at the sinks. At the start of the next iteration, however, the actual buffer availabilities at the sinks may differ from the estimated values. As long as the load fractions assigned to a sink node by the node K1 are less than or equal to the actual buffer availability at that sink node, the sink node can request the load fractions assigned to it from the sources. But if the buffer available at a sink is less than the load fraction assigned to it, then it cannot process the excess load that has been assigned to it. Hence, such sinks shall recompute the load fractions as given by

αi,j = α̂i,j * Bj / Sum(α̂i,j, ∀i = 1...N)    (4)

and request these load fractions from the sources. In addition to requesting these load fractions from the sources, the sink node also has to communicate the modified load fractions that it has requested to the master sink node K1. This allows K1 to compute the exact amount of load that remains at the sources for processing in the next iteration. This information can be piggy-backed on the estimated buffer availability that all sink nodes communicate to the master sink node K1. Also, suppose that K1 attempts to fill the entire buffer of K2. If K2 cannot accommodate all the load assigned to it, then it modifies the value of α̂i,j assigned to it. Then K2, together with the other sink nodes that did not participate in processing in that iteration, shall wait for all the other nodes to complete their processing (that is, until the time T(q)) before requesting loads from the sources again.
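The proportional recomputation of Equation (4) can be sketched in a few lines; `cap_to_buffer` is an illustrative helper of ours, not the authors' implementation.

```python
# Sketch of the recomputation in Equation (4): if the actual buffer B_j at a
# sink is smaller than the total load assigned to it, every per-source
# fraction is scaled down by the same factor B_j / Sum(estimated fractions).
def cap_to_buffer(est_fractions, actual_buffer):
    total = sum(est_fractions)
    if total <= actual_buffer:
        return list(est_fractions)    # the buffer suffices; keep estimates
    scale = actual_buffer / total     # proportional reduction factor
    return [a * scale for a in est_fractions]

# A sink was assigned 1.2 and 0.6 units from two sources but can buffer only 1.0:
capped = cap_to_buffer([1.2, 0.6], 1.0)
assert abs(sum(capped) - 1.0) < 1e-9            # exactly fills the actual buffer
assert abs(capped[0] / capped[1] - 2.0) < 1e-9  # source proportions preserved
```

Scaling all sources by the same factor keeps the request proportional to the original estimates, so no source is singled out when a sink's buffer falls short.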

The optimal load fractions for the (q + 1)-th iteration shall be estimated by the master sink node K1 while the load for the q-th iteration is being processed, based on the total amount of load that remains to be processed. This process shall continue until all of the loads are processed. Note that in this case the load requesting by the sinks and the processing are dynamic, in the sense that the IBS algorithm is invoked to estimate the load distribution depending on the number of sources and their respective load sizes. It may be noted that it is not necessary for a sink to render diminishing buffer space to each source in every iteration, since the load to be processed from a source also diminishes. Further, it should be realized that the buffer space availability at the sinks does not have an affinity towards any source. Thus, if no other sources demand processing, then the entire buffer is allocated to the demanding source. Figures 3 and 4 summarize the above discussed policy.

The new set of loads and the unprocessed loads from the existing sources are considered together for scheduling in the next iteration. The following example clarifies the working principle of the above strategy. The parameters and data for this example are from real-life high energy nuclear physics experiments [12].

Example 1: Suppose that there are three sources with loads to be processed and four sinks that can process these loads. Let the speed parameters of the sinks be w1 = 1.11 × 10^-9, w2 = 6.25 × 10^-10, w3 = 5.00 × 10^-10, and w4 = 3.57 × 10^-10, respectively, and let Tcp = 6.52 × 10^12 sec/load. Let the actual buffer capacities at the sinks at the initial state (that is, iteration q = 0) and at iteration q = 1 be B(q)1 = 6, B(q)2 = 5, B(q)3 = 0, and B(q)4 = 2; at iteration q = 2 be B(2)1 = 4, B(2)2 = 3, B(2)3 = 1, and B(2)4 = 1; at iteration q = 3 be B(3)1 = 2, B(3)2 = 0, B(3)3 = 2, and B(3)4 = 1; and at iteration q = 4 be B(4)1 = 1, B(4)2 = 1, B(4)3 = 3, and B(4)4 = 1 units, respectively. These values are generated randomly using a uniform probability

Table 1. Buffer utilization values

q = 1    Σi α̂(1)i,j   Σi α(1)i,j   B̂(1)j    B(1)j
K1       0.643        0.643        6.000    5.357
K2       1.143        1.143        5.000    3.857
K3       0.000        0.000        0.000    0.000
K4       2.000        2.000        2.000    0.000

q = 2    Σi α̂(2)i,j   Σi α(2)i,j   B̂(2)j    B(2)j
K1       0.546        0.546        5.700    3.454
K2       0.970        0.970        4.750    2.030
K3       0.000        0.000        0.000    1.000
K4       1.698        1.000        1.900    0.000

q = 3    Σi α̂(3)i,j   Σi α(3)i,j   B̂(3)j    B(3)j
K1       0.284        0.284        4.433    1.716
K2       0.506        0.000        3.483    0.000
K3       0.633        0.633        0.633    1.367
K4       0.886        0.886        1.267    0.114

q = 4    Σi α̂(4)i,j   Σi α(4)i,j   B̂(4)j    B(4)j
K1       0.235        0.235        3.167    0.765
K2       0.415        0.415        1.742    0.585
K3       0.518        0.518        1.267    2.482
K4       0.727        0.727        1.108    0.273

distribution in the range [0, 7] in each iteration. We let the three sources have loads L1 = 5, L2 = 2, and L3 = 3 units, respectively. Let loads L1 and L2 arrive at t = 0 seconds, and let load L3 arrive at t = 5 × 10^3 seconds. Note that the computationally intensive nature of the problem is reflected by the parameter Tcp. Using the above algorithm, we obtain the values for α(q)i,j shown in Tables 1 and 2. The unutilized buffer space in each iteration is given in the last column of Table 1. From these results, we observe that the buffer of K4 is fully utilized in iterations 1 and 2, whereas the buffer of K3 is not utilized at all in iteration 2 (because its estimated buffer size is 0 for that iteration). For iteration 3, the buffer of K3 is estimated to be less than the actual value, and hence the buffers of all the available sinks are under-utilized in that iteration. At the final iteration, the remaining load is insufficient to completely fill up the buffer at any of the sinks. The distribution suggested by the values αi,j in Table 2 is used by the sinks. Iterations 1 to 4 are scheduled at times t = 0, 4.655 × 10^3, 8.607 × 10^3, and 1.067 × 10^4 seconds, respectively. The total processing time for all three loads is t = 1.236 × 10^4 seconds. From this example, it is seen that, because of the new source S3 and the buffer space variations at the sinks, the processing time for the other sources in the system is stretched to t = 1.236 × 10^4 seconds. Below we describe the buffer estimation strategy; its impact on the performance with respect to this example is discussed in Section 6.
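The first iteration of Example 1 can be reproduced with a short computation; this is our own sketch of the scheduler's Step 1, following Equations (1) and (2), not the authors' code.

```python
# Reproducing iteration 1 of Example 1: sinks K1..K4 with inverse speeds w_j,
# estimated buffers B̂(1)_j, and total pending load L = L1 + L2 = 7 units.
w = [1.11e-9, 6.25e-10, 5.00e-10, 3.57e-10]
b_hat = [6.0, 5.0, 0.0, 2.0]
L = 7.0

inv_sum = sum(1.0 / wj for wj in w)
frac = [(1.0 / wj) / inv_sum for wj in w]   # per-sink fraction of L (Eq. (1))
# Reduction factor Y over sinks with nonzero estimated buffer (Eq. (2)):
Y = min(1.0, min(b_hat[j] / (frac[j] * L) for j in range(4) if b_hat[j] > 0))
request = [Y * frac[j] * L if b_hat[j] > 0 else 0.0 for j in range(4)]
print([round(r, 3) for r in request])
# Matches the q = 1 row of Table 1 up to rounding: K4's buffer of 2 units is
# filled exactly, and K3 (estimated buffer 0) requests nothing.
```

Here Y is determined by the fastest sink, K4, whose 2 units of buffer would otherwise be exceeded; the other sinks' requests shrink by the same factor so that all finish together.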

Table 2. Values for load fractions

q = 1          K1      K2      K3      K4
S1             0.459   0.817   0.000   1.429
S2             0.184   0.326   0.000   0.571
Σi α(1)i,j     0.643   1.143   0.000   2.000

q = 2          K1      K2      K3      K4
S1             0.390   0.693   0.000   0.714
S2             0.156   0.277   0.000   0.286
Σi α(2)i,j     0.546   0.970   0.000   1.000

q = 3          K1      K2      K3      K4
S1             0.038   0.000   0.085   0.119
S2             0.015   0.000   0.034   0.048
S3             0.231   0.000   0.514   0.719
Σi α(3)i,j     0.284   0.000   0.633   0.886

q = 4          K1      K2      K3      K4
S1             0.032   0.056   0.070   0.098
S2             0.013   0.022   0.028   0.040
S3             0.190   0.337   0.420   0.589
Σi α(4)i,j     0.235   0.415   0.518   0.727

5. Buffer Estimation Strategy

We propose a distributed buffer estimation strategy based on weighted average calculations. The weights for computing the estimates are based on the iteration indices up to the current iteration. We refer to this estimator as the Iteration Index based Buffer estimator (IIB). Our IIB algorithm shall be executed at all sink nodes. A sink node, after estimating the buffer space to render in the next iteration, shall communicate it to the master sink node K1. Then K1 shall execute the dynamic scheduling algorithm described in Figure 4 to determine the α̂i,j that the participating sink nodes shall request from the sources.

For estimating the buffer availability at a sink, each sink Kj needs to keep track of the actual buffer sizes B(k)j from its previous iterations. Note that for implementation purposes it is sufficient to keep a cumulative value for the weighted buffer space. In any iteration q, each sink node shall estimate the buffer size that will be available for the next iteration (q + 1) as

B̂(q+1)j = (Sum((k/q) * B(k)j, ∀k = 1...q) / Sum(k/q, ∀k = 1...q)) * p
        = (Sum(k * B(k)j, ∀k = 1...q) / Sum(k, ∀k = 1...q)) * p    (5)

and declare it to the master sink node. In Equation (5), p is

the probability that the estimated buffer size will be available at the sink in the next iteration. The value of p can be chosen based on the confidence level of the buffer estimator. For practical purposes, we shall assume that p equals 0.95. This guarantees that the expected buffer sizes will be available at the sinks, with a confidence level of 95%, for the next iteration. This may be observed in our example, as discussed in Section 6.
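Equation (5) amounts to a linearly weighted running average; the sketch below shows it in Python. The name `iib_estimate` is ours, and the buffer history uses K1's actual capacities from Example 1.

```python
# Sketch of the Iteration Index based Buffer estimator (IIB), Equation (5):
# a weighted average of past actual buffer sizes, with weight k for the
# buffer observed in iteration k (so recent iterations count more),
# discounted by the confidence factor p.
def iib_estimate(buffer_history, p=0.95):
    # buffer_history[k-1] = actual buffer B(k)_j rendered in iteration k
    weighted_sum = sum(k * b for k, b in enumerate(buffer_history, start=1))
    weight_total = sum(range(1, len(buffer_history) + 1))
    return (weighted_sum / weight_total) * p

# K1's actual buffer capacities in iterations 1..3 of Example 1 were 6, 4, 2:
print(round(iib_estimate([6, 4, 2]), 3))  # → 3.167, the value B̂(4)_1 in Table 1
```

In practice a sink only needs to carry the running sums `weighted_sum` and `weight_total` across iterations, matching the cumulative-value remark above.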

6. Discussions and Conclusions

The contributions of this paper are geared towards designing and analyzing a dynamic scheduling strategy for handling large-volume loads that arrive at a Grid system for processing. The strategy that we proposed in this paper is suitable for handling the large-scale data generated in physics experiments (as discussed in Section 1). Since a Grid infrastructure is always viewed as a repository of resources that can be availed of by careful scheduling, implicit to this problem are some real-life constraints, such as the availability of the nodes for processing, the amount of resources they can render, and the speeds with which the nodes and links can respond. Also, as in the case of any networked system, here too we can follow a "pull-based" or a "push-based" approach. In this paper, we considered a pull-based strategy. Further, we considered a real-life situation wherein the sinks have finite-sized buffers, and hence the available buffer space has to be shared in an optimal manner among the competing sources. Also, we assumed that every sink attempts to request loads from all participating sources for processing. We tuned the IBS algorithm proposed in the literature to tackle the posed problem. In addition, since the availability of buffer space is dynamic, we proposed an estimation strategy, IIB, which works on weighted average values as explained. The impact of IIB with respect to Example 1 is as

follows. In Table 1 we show the estimated as well as the ac-

tual loads requested by the sinks. Further we also project

the estimated buffer values. Following are important points

to observe. In iteration 1, the estimated and the actual loads

being same, the buffer rendered is adequate to handle the es-

timated load. However, this does not carry far in the second

iteration. In iteration 2, we observe that at K4, the estimated

load being more than the actual buffer rendered, the actual

load that is to be requested is tailored to adapt to the avail-

ablespace.Itmayalsobeobservedthatiniteration2,thees-

timated buffers take into account the actual buffers rendered

in the past iteration. This will be cumulatively done in each

iteration, which is indeed the essence of our design. Fur-

ther, in iteration 2, the actual buffer available is unutilized,

as the estimated value is 0. This prevents K3to request any

load from the source at this iteration. This is somewhat nat-

ural to expect which is captured in our design. Another im-

portant observation comes from the fact that in iteration 2, if


the estimated load sizes had been requested by all the sinks, then the sources S1 and S2 could have been completely processed in this iteration itself. However, since K4 could not accommodate the estimated load, S1 and S2 are forced to be considered for scheduling in future iterations as well. Also, note that although S3 becomes available for processing after iteration 2 starts, it is considered for processing only from iteration 3 onwards. Note that in iteration 3, the estimated buffer at K3 is observed to be less than the actual buffer available. Thus, the scheduler considers a load based on the minimum of the actual and estimated buffer space; in our case, this turns out to be the estimated buffer value. Finally, when the estimated total load to be processed is less than or equal to the available buffer space, all the loads can be scheduled and processed in that iteration itself. This happens at iteration 4 in our example.
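The estimate-then-clamp behaviour traced through Table 1 can be summarized in a minimal sketch. The weighted-average estimator and the particular weights below are simplified stand-ins for the IIB rule (assumptions for this sketch); the clamping to the minimum of the estimated and actual buffer, and the "estimate 0 means no request" behaviour, follow the discussion above.

```python
def estimate_buffer(past_actual, weights):
    """Weighted average of the actual buffer sizes rendered in past
    iterations -- a simplified stand-in for the IIB estimator.
    The choice of weights is an assumption of this sketch."""
    assert len(past_actual) == len(weights)
    return sum(w * b for w, b in zip(weights, past_actual)) / sum(weights)

def load_to_request(estimated_buffer, actual_buffer):
    """The scheduler requests load based on the minimum of the estimated
    and the actually available buffer space; an estimate of 0 means the
    sink requests no load in this iteration."""
    return min(estimated_buffer, actual_buffer)

# Hypothetical history of actual buffers rendered in past iterations,
# with more recent iterations weighted higher (an assumption):
history = [40, 25, 30]
weights = [1, 2, 3]
est = estimate_buffer(history, weights)   # -> 30.0
print(load_to_request(est, 20))           # actual buffer (20) limits the request
print(load_to_request(0, 35))             # estimate 0 -> no load requested
```

The cumulative aspect of the design corresponds to extending `history` with the buffer actually rendered after each iteration, so the estimate adapts as the sinks' availability varies.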

The proposed IIB strategy works as long as the buffer variations are not drastic. Further, if new loads arrive in the system before the loads being processed are completed, then the processing of the existing loads will be stretched. Thus, the strategy is highly recommended when the loads to be processed are not time-critical. This aspect indeed raises an open issue of refining the approach to accommodate an admission control mechanism that could adapt to random arrivals of loads; this is one of our future extensions.

Acknowledgments

Thomas Robertazzi's research is funded by NSF grant CCR-99-12331. Dantong Yu's research is supported by DOE PPDG/ATLAS/RHIC grants. Sivakumar Viswanathan's research is supported by the Institute for Infocomm Research, Singapore.
