Design and Analysis of a Dynamic Scheduling Strategy
with Resource Estimation for Large-Scale Grid Systems
Sivakumar Viswanathan, Bharadwaj Veeravalli
Department of Electrical and Computer Engineering, National University of Singapore
4 Engineering Drive 3, Singapore 117576
{g0306272, elebv}@nus.edu.sg
Dantong Yu
Department of Physics, Brookhaven National Laboratory, Upton, NY 11973, USA
dtyu@bnl.gov
Thomas G. Robertazzi
Department of Electrical and Computer Engineering, Stony Brook University
Stony Brook, NY 11794, USA
tom@ece.sunysb.edu
Abstract
In this paper, we present a resource-conscious dynamic scheduling strategy for handling large-volume, computationally intensive loads in a Grid system involving multiple sources and sinks/processing nodes. We consider a "pull-based" strategy, wherein the processing nodes request load from the sources. We employ the Incremental Balancing Strategy (IBS) algorithm proposed in the literature and propose a buffer estimation strategy to derive an optimal load distribution. Here, we consider non-time-critical loads that arrive at arbitrary times with time-varying buffer availability at the sinks, and we utilize buffer reclamation techniques to schedule the loads. We demonstrate the detailed workings of the proposed algorithm with illustrative examples using real-life parameters derived from STAR experiments at BNL for scheduling large-volume loads.
1. Introduction
The large volumes of computational data being generated in high-energy and nuclear physics experiments demand new strategies for collecting, sharing, transferring, and analyzing the data. For example, the Solenoidal Tracker At RHIC (STAR) experiment at Brookhaven National Laboratory (BNL) is collecting data at a rate of over a terabyte per day. The STAR collaboration is a large international collaboration of about 400 high-energy and nuclear physicists located at 40 institutions in the United States, France, Russia, Germany, Israel, Poland, and so on. After the Relativistic Heavy-Ion Collider (RHIC) experiments at BNL came online in 1999, STAR began data taking and concurrent data analysis that will last about ten years. STAR needs to perform data acquisition and analysis over approximately 250 terabytes of raw data and 1 petabyte of derived and reconstructed data per year. The volume of data is expected to increase by a factor of 10 in the next five years. Details on data acquisition and hardware can be found in [12]. These experiments require effective analysis of large amounts of data by widely distributed researchers who must work closely together. Expanding collaborations and intensive data analysis, coupled with increasing computational and networking capabilities, stimulate a new era of service-oriented computing: Grid computing [1].
Grid computing consists of large sets of diverse, geographically distributed resources that are collected into a virtual computer for high-performance computation. Grid computing creates middleware and standards that function between computers and networks to allow full resource sharing among individuals, research institutes, and corporate organizations, and to dynamically allocate idle computing capability to the users who need it at remote sites. The diversity of these computing resources and their large number of users make it a challenge to efficiently schedule and utilize these resources.
Scheduling is a significant problem in fairly allocating resources in cluster and grid systems. In the divisible load scheduling theory (DLT) domain, scheduling loads under time-varying processor and link speeds has been studied in [6]. However, to date there has been no work on dynamically scheduling divisible loads (large-volume loads) in a Grid environment when the resource availability at the sinks varies randomly over time. The motivation for this paper stems from the challenges in managing and utilizing computing resources in Grids as efficiently as possible, with performance optimization as the main focus. The performance metrics of interest include job throughput, resource utilization, and job response time and its variance. We propose an efficient dynamic scheduling strategy for situations wherein the jobs need to be completed as early as possible. We provide a detailed analysis of the algorithm with respect to the above metrics and demonstrate the performance using illustrative examples with real-life parameters derived from STAR experiments at BNL. The analytical flexibility offered by DLT is thoroughly exploited to design resource-aware algorithms that make the best use of the available resources on the grid. It also offers an exciting opportunity to optimally schedule multiple divisible loads in Grid computing.
Our contributions in this paper are multifold. We consider the problem of scheduling several loads generated in a Grid system onto a set of nodes, using the DLT paradigm. We design and propose a dynamic scheduling algorithm that considers system-level constraints such as finite buffer capacity constraints at the processing nodes. We propose resource reclaiming strategies in our design of the algorithm. The workings of our algorithm are elaborated using numerical examples with real-life parameters and data acquired from the STAR experiments described above. Our algorithm is designed to adapt to network and load-size scalability. Our study and systematic design clearly elicit the advantages offered by our strategy.
The paper is organized as follows. In Section 2 we provide the research background and related work on Grid scheduling and DLT. In Section 3 we formalize the multi-source and multi-sink problem in a Grid system. In Section 4 we discuss the load distribution strategy and provide an incremental scheduling strategy for the dynamic environment. In Section 5 we present the buffer estimation strategy. In Section 6 we discuss the scheduling algorithm, highlight its advantages, and provide the conclusion.
2. Related Work
In this section, we present some of the related work relevant to the problem addressed in this paper. For divisible loads, research since 1988 has established that the optimal allocation/scheduling of divisible load to processors and links can be solved through the use of a very tractable linear model formulation, referred to as DLT. It is rich in features such as easy computation, a schematic language, equivalent network element modelling, results for infinite-sized networks, and numerous applications. This linear theory formulation opens up striking modelling possibilities for systems incorporating communication and computation issues, as in parallel, distributed, and Grid computing. Optimality here, involving solution time and speedup, is defined in the context of a specific scheduling policy and interconnection topology. The linear model formulation usually produces optimal solutions through linear equation solution or, in simpler models, through recursive algebra. The model can take into account heterogeneous processor and link speeds as well as relative computation and communication intensity.
DLT can model a wide variety of approaches, such as store-and-forward and virtual cut-through switching and the presence or absence of front-end processors. Front-end processors allow a processor to both communicate and compute simultaneously by assuming communication duties [8]. There exists a literature of some sixty journal papers on DLT. In addition to the monograph [8], two introductory up-to-date surveys have been published recently [4, 9]. DLT has proven to be remarkably flexible in the sense that the model allows the analytical tractability to derive a rich set of results regarding several important properties of the proposed strategies and to analyze their performance. Consequently, we do not attempt to present another survey here; however, we refer to the following papers that are either directly or indirectly relevant to the context of this paper. Directly relevant to the problem addressed in this paper is [3], in which the authors propose an Incremental Balancing Strategy (IBS) to accommodate divisible loads when there are buffer constraints at the processors. An alternative scheme for dynamic environments that considers admission control mechanisms has been studied recently in [10]. Issues such as processor release times coupled with buffer capacity constraints are studied in [7]. The solution time (the time at which the processed loads/solution is made known at the originator) is discussed in [5]. We refer astute readers to the recent surveys mentioned above for up-to-date literature in this domain.
3. Problem Formulation and Some Remarks
The Grid computing system considered here comprises N control processors, referred to as sources, that have load to be processed, and M computing elements, referred to as sinks, for processing loads, as shown in Figure 1. Each sink might include one supercomputer or a cluster of computers connected by local area networks and controlled by a header (root) node, and may have different computing and communication capabilities. In a simplified view, these clusters of processors can be replaced with a single equivalent processor. The grid computing system can then be modelled as a fully connected bipartite graph as in Figure 2: the set of graph vertices can be decomposed into two disjoint sets such that no two graph vertices within the same set are adjacent, while any pair of graph vertices drawn from the two different sets are adjacent. Thus, this bipartite graph represents the fact that each source can schedule its load on all the sinks.

Figure 1. Grid System
In real-life situations, one of the practical constraints is the availability of resources on a Grid. For instance, the resource could be the memory capacity that a sink can offer to the sources. Thus, whenever two or more sources compete for a sink, the available memory (equivalently, the amount of load that can be accepted for processing from each source) must be shared among those sources. We precisely consider this real-life constraint in our proposed algorithm. Further, in such a Grid environment, one can follow either a push-based or a pull-based approach to distribute and schedule the loads. In a push-based approach, the sources themselves identify potential sinks (with an assumption/knowledge about the available resources at the sinks) and schedule their loads. In a pull-based approach, the sinks collect the requests from the competing sources and schedule them depending on the availability of the resources among themselves. Both schemes have their merits, and the choice of approach depends purely on the application requirements and implementation constraints, such as the size of the grid, the resource availability, etc. In this paper, we consider a pull-based approach in the design of our scheduling strategy.
We now formally define the problem we address. As described above, we consider a Grid system with N sources, denoted S_1, S_2, ..., S_N, and M sinks, denoted K_1, K_2, ..., K_M. Each source has a direct link to all the sinks, and we denote the link between S_i and K_j as l_{i,j}, i = 1, ..., N, j = 1, ..., M. Each source S_i has a load, denoted L_i, to process. Without loss of generality, we assume that all sources can send their loads to all the sinks simultaneously. Similarly, we also assume that all the sinks can request and receive load portions from all sources.
Figure 2. Abstract Overview of Grid System

The objective in this study is to schedule all the N loads among the M sink nodes such that the processing time, defined as the time instant when all the M sinks complete processing the loads, is a minimum. The scheduling strategy is such that the scheduler (without loss of generality, we assume that the scheduler resides in K_1) will first obtain the information about the available memory capacities at the other sinks, their computing speeds, and the sizes of the loads at the sources. The scheduler will then calculate and notify each sink of the optimum load fractions that are to be requested from each source. This information can be easily communicated using any of the standard or customized communication protocols without incurring any significant communication overhead.
The sources, upon knowing the amount of load that they should give to each sink, will send their loads to all sinks simultaneously. Following Kim's model [2], we assume that the sinks start computing the load fractions as they start receiving them. We also assume that the communication time delay is negligibly small compared to the computation time, owing to high-speed links, so that no sink starves for load. We now introduce the definitions and notations that are used throughout this paper.
N (M): Total number of sources (sinks) in the system, with each source (sink) denoted by S_i, i = 1, ..., N (K_j, j = 1, ..., M).
w_j: Inverse of the computing speed of K_j.
L_i: Load at S_i, such that the total load in the system is L = Sum(L_i, ∀i = 1...N).
α_{i,j} (α̂_{i,j}): Amount of load (estimated amount of load) that K_j shall request from S_i in an iteration.
α_j: Fraction of the load L^(q) that K_j should take in an iteration, α_j = Sum(α_{i,j}, ∀i = 1...N).
T_cp: Computing intensity constant.
T^(q): Time taken to process the loads in the qth iteration.
Y: Fraction of the load L that should be taken into consideration in an iteration of an installment, where Y ≤ 1.
p: Buffer estimator confidence factor.
B^(q)_j (B̂^(q)_j): Available (estimated) buffer space at K_j in the qth iteration.
P_all (P_now): Set of sinks (with buffer space available for processing in an iteration) in the system.
X_now: Set of sources that are being processed in an iteration.
X_new: Set of sources that arrive at the system while the system is idle or busy processing other sources.
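For concreteness, the notation above can be collected into a small state container. The following Python sketch is ours, not part of the paper; all names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class GridState:
    """System state in the notation of Section 3 (illustrative sketch)."""
    w: list          # w[j]: inverse computing speed of sink K_j
    loads: dict      # loads[i]: remaining load L_i at source S_i
    buffers: list    # buffers[j]: available buffer space B_j at sink K_j
    Tcp: float       # computing intensity constant
    q: int = 0       # iteration index
    X_now: set = field(default_factory=set)  # sources currently being served
    X_new: set = field(default_factory=set)  # sources arrived since last iteration

    @property
    def total_load(self):
        # L = Sum(L_i) over the sources being processed
        return sum(self.loads[i] for i in self.X_now)

# Two sinks, two sources with loads 5 and 2 units, both being served.
state = GridState(w=[1.0, 0.5], loads={0: 5.0, 1: 2.0}, buffers=[6.0, 5.0],
                  Tcp=1.0, X_now={0, 1})
```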
4. Dynamic Incremental Scheduling Strategy
We employ Kim's multi-port communication model [2] for load distribution and assume that K_1 generates the required schedule satisfying the resource constraints. Here, we again assume that the sinks start computing the load fractions as they start receiving them and that the communication time delay is negligibly small compared to the computation time, owing to high-speed links, so that no sink starves for load. In the DLT literature [8], it is shown that, in order to derive an optimal solution, it is necessary and sufficient that all the sinks participating in the computation stop at the same time instant; otherwise, load could be redistributed to improve the processing time.
Using this optimality principle and assuming infinite buffer space at the sink nodes, i.e., a sink can hold any amount of load from the sources, the load fraction that a sink K_j shall receive from the source S_i is derived in [11] as

α_{i,j} = (1 / (w_j ∗ Sum(1/w_x, ∀x = 1...M))) ∗ L_i    (1)

While deriving these load fractions, it is assumed that each sink requests a load fraction that is proportional to the size of the load at the source. Moreover, each sink requests the same load fraction (percentage of the total load) from each source.
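Under the infinite-buffer assumption, Equation (1) can be computed directly. A minimal Python sketch (the function and variable names are ours, not from the paper):

```python
def optimal_fractions(w, loads):
    """Load fraction alpha[i][j] that sink K_j requests from source S_i
    under Equation (1): proportional to the sink's speed 1/w_j and to
    the load L_i at the source (infinite buffers assumed)."""
    inv_speed_sum = sum(1.0 / wj for wj in w)  # Sum(1/w_x, x = 1..M)
    return [[(1.0 / wj) / inv_speed_sum * Li for wj in w] for Li in loads]

# Example: two sinks of equal speed split each source's load evenly.
alpha = optimal_fractions([1.0, 1.0], [4.0, 2.0])
```

Note that every row of the result sums to the corresponding L_i, matching the requirement that each source's load is fully distributed.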
However, in real-life situations, each sink always has a limit on the amount of buffer space that can be used. Further, in a generic Grid environment, each node may be running multiple tasks and thus be required to share its available resources; hence there may be only a limited amount of buffer space allocated for processing particular loads at a given time. As a result, we are naturally confronted with the problem of scheduling divisible loads under buffer capacity constraints. The IBS algorithm proposed in [3] produces a minimum-time solution given pre-specified buffer capacity constraints, and it also exhibits finite convergence, but it does not consider scheduling under dynamic environments and buffer space variations at the processing nodes. In this paper, we propose a Dynamic Incremental Scheduling Strategy (DISS) that takes into account the variations in buffer space availability at the sink nodes, and we also propose an adaptive estimation scheme to schedule the processing loads in an incremental fashion.
In a real-life system, the total amount of load to be processed may exceed the available buffer space at the sinks, and loads may also arrive at arbitrary times for processing. Thus, the number of loads to be processed may vary over time, and demand for processing may arise at any time. It is difficult to estimate a priori the maximum amount of load that may be in the system. This is especially true on Grids, as any node can attempt to inject a load whenever it has one to be processed. Under such conditions, a feasible schedule may not exist unless the sink nodes allow their buffer space to be reclaimed after a given load is processed. This means that, after processing a given load, the sinks make their buffer space available for subsequent processing. Thus, in order to handle the situation wherein sources demand processing at various time instants, dynamic scheduling strategies need to be designed in such a way that sinks continue to render their available buffer space to the sources. The dynamic scheduling strategies also need to take into account that the amount of buffer space available at the sinks may vary over time and that this variation may not be known a priori. Under such conditions, each sink shall estimate the amount of buffer space that it can offer for scheduling in the next iteration and communicate it to the scheduler node. A buffer estimation strategy is described in Section 5. With this information, the scheduler node generates the required schedule satisfying the resource constraints. It may be noted that, in order to estimate the load fractions α̂_{i,j}, the scheduler uses Equation (1).

Figure 3. Flowchart for a "Pull-based" DISS with Distributed Buffer Space Estimation

Initial state:
  I = {1, 2, ..., N}, J = {1, 2, ..., M}, q = 0, T^(0) = 0, p = 0.95
  B̂^(1)_j = B^(0)_j, α^(0)_{i,j} = 0

Step 1: K_1 computes α̂^(q+1)_{i,j} and T^(q+1) and communicates them to all Sink Nodes:
  L_i = L_i − Sum(α^(q)_{i,j}, ∀j = 1...M), ∀S_i ∈ X_now, i ∈ I
  If (X_new ≠ ∅) { X_now = X_now ∪ X_new; X_new = ∅ }
  If (L_i = 0) { X_now = X_now − {S_i} }, ∀S_i ∈ X_now, i ∈ I
  P_now = P_all
  If (B̂^(q+1)_j = 0) P_now = P_now − K_j, ∀K_j, j ∈ J
  L = Sum(L_i, ∀i = 1...N), ∀S_i ∈ X_now, i ∈ I
  α^(q+1)_j = 1 / (w_j ∗ Sum(1/w_x, ∀x = 1...M)), ∀K_j ∈ P_now, j ∈ J
  Y = min{ B̂^(q+1)_j / (α^(q+1)_j ∗ L), ∀K_j ∈ P_now }, j ∈ J
  If (Y > 1) { Y = 1 }
  α̂^(q+1)_{i,j} = Y ∗ α^(q+1)_j ∗ L_i, ∀S_i ∈ X_now, i ∈ I, ∀K_j, j ∈ J
  T^(q+1) = Sum(α̂^(q+1)_{i,j}, ∀i = 1...N) ∗ w_j ∗ T_cp, K_j ∈ P_now
  K_1 communicates α̂^(q+1)_{i,j} and T^(q+1) to all Sink Nodes

Step 2: All Sink Nodes compute B̂^(q+1)_j and α^(q)_{i,j} and communicate them to K_1:
  All Sink Nodes wait till their Current Time = T^(q)
  q = q + 1
  B̂^(q+1)_j = (Sum(k ∗ B^(k)_j, ∀k = 1...q) / Sum(k, ∀k = 1...q)) ∗ p
  If (Sum(α̂^(q)_{i,j}, ∀i = 1...N) > B^(q)_j) {
    α^(q)_{i,j} = α̂^(q)_{i,j} ∗ B^(q)_j / Sum(α̂^(q)_{i,j}, ∀i = 1...N)
    Communicate B̂^(q+1)_j and α^(q)_{i,j} to K_1 }
  else {
    α^(q)_{i,j} = α̂^(q)_{i,j}
    Communicate B̂^(q+1)_j to K_1 }

Step 3: All Sink Nodes schedule the loads from the Source Nodes:
  B^(q)_j = B^(q)_j − Sum(α^(q)_{i,j}, ∀i = 1...N)
  Sink Nodes request and process the load fractions α^(q)_{i,j} from the Source Nodes
  Go to Step 1

Figure 4. Pseudo code for a "Pull-based" DISS with Distributed Buffer Space Estimation
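The scheduler-side planning step described above (per-sink fractions from Equation (1), reduced by the buffer-constrained factor Y) can be sketched in Python. This is our own illustrative sketch of a single planning step, assuming at least one sink reports a non-zero estimated buffer; all names are ours:

```python
def plan_iteration(w, buffers_est, loads):
    """One scheduling step of a pull-based, buffer-constrained plan
    (a sketch in the spirit of Step 1 of DISS).
    w[j]: inverse speed of sink j; buffers_est[j]: estimated buffer B_hat_j;
    loads[i]: remaining load at source i. Returns alpha_hat[i][j]."""
    L = sum(loads)
    inv_sum = sum(1.0 / wj for wj in w)
    # Per-sink fraction of the total load (Equation (1) normalised by L);
    # sinks with no estimated buffer are excluded from this iteration.
    alpha_j = [(1.0 / wj) / inv_sum if buffers_est[j] > 0 else 0.0
               for j, wj in enumerate(w)]
    active = [j for j in range(len(w)) if alpha_j[j] > 0]
    # Reduction factor Y so that at least one sink's estimated buffer is
    # exactly filled (Equation (2)); never scale up past 1.
    Y = min(1.0, min(buffers_est[j] / (alpha_j[j] * L) for j in active))
    # Estimated request of sink j from source i.
    return [[Y * alpha_j[j] * Li for j in range(len(w))] for Li in loads]
```

With two equal-speed sinks, one of which has a small buffer, Y shrinks all requests proportionally until that buffer is exactly filled.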
As long as there is sufficient load in the system to completely consume the estimated amount of buffer space at one of the sink nodes, the load fraction α_{i,j} that a sink K_j may request from a source S_i has to be reduced by a factor Y, given by

Y = min{ B̂^(q)_j / (α_j ∗ L), ∀K_j ∈ P_now }    (2)

This ensures that at each iteration all the sinks that participate in processing the loads complete processing at the same time instant, provided the actual buffer space available at a sink node equals the estimated one at that node. The algorithm attempts to fill up one or more sinks' buffer space in every iteration. In any iteration, if the remaining load is not enough to completely consume the buffer space at any of the participating nodes, the distribution suggested by Equation (1) is used. When two sinks have identical buffer space, the buffer at the faster sink is fully utilized. As long as there is enough load to be processed in an iteration, the algorithm ensures that at least one sink's buffer is completely utilized. The processing time for the qth iteration is given by

T^(q) = Sum(α̂^(q)_{i,j}, ∀i = 1...N) ∗ w_j ∗ T_cp    (3)

In the above algorithm, the load fractions are calculated based on the estimated buffer availabilities at the sinks. However, at the start of the next iteration, the actual buffer availabilities at the sinks may differ from the estimated values. As long as the load fraction assigned to each sink node by K_1 is less than or equal to the actual buffer availability at that sink node, the sink node can request the load fractions assigned to it from the sources. But if the buffer available at a sink is less than the load fraction assigned to it, it cannot process the excess load that has been assigned to it. Hence, such sinks shall recompute the load fractions as given by

α_{i,j} = α̂_{i,j} ∗ B_j / Sum(α̂_{i,j}, ∀i = 1...N)    (4)

and request these load fractions from the sources. In addition to requesting these load fractions from the sources, the
sink node also has to communicate the modified load fractions that it has requested from the sources to the master sink node K_1. This is done so that K_1 can compute the exact amount of load that remains at the sources for processing in the next iteration. This information can be piggybacked along with the estimated buffer availability that all sink nodes communicate to the master sink node K_1. Also, suppose that K_1 attempts to fill the entire buffer of K_2. If K_2 cannot accommodate all the load assigned to it, it modifies the values of α̂_{i,j} assigned to it. Then K_2, together with the other sink nodes that did not participate in processing in that iteration, shall wait for all the other nodes to complete their processing (that is, for the time T^(q)) before requesting loads from the sources again.
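The sink-side rescaling of Equation (4), applied when a sink's actual buffer turns out smaller than the assigned total, can be sketched as follows (a sketch with our own names):

```python
def rescale_to_buffer(alpha_hat_col, actual_buffer):
    """Scale the fractions assigned to one sink so their sum fits the
    sink's actual buffer (Equation (4)); alpha_hat_col[i] is the amount
    estimated for this sink from source i."""
    total = sum(alpha_hat_col)
    if total <= actual_buffer:
        return list(alpha_hat_col)  # everything fits: request as assigned
    # Shrink each source's share proportionally so the sum equals the buffer.
    return [a * actual_buffer / total for a in alpha_hat_col]

# A sink assigned 3.0 units in total but holding only 1.5 requests half of each.
```

The proportional shrink preserves each source's relative share, which is what lets K_1 account exactly for the load left at each source in the next iteration.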
The optimal load fractions for the (q + 1)th iteration are estimated by the master sink node K_1 while the load for the qth iteration is being processed, based on the total amount of load that remains to be processed. This process continues until all of the loads are processed. Note that in this case the load requesting by the sinks and the processing are dynamic, in the sense that the IBS algorithm is invoked to estimate the load distribution depending on the number of sources and their respective load sizes. It may be noted that it is not necessary for a sink to render diminishing buffer space to each source in every iteration, since the load to be processed from a source also diminishes. Further, it should be realized that the buffer space available at the sinks does not have an affinity towards any source. Thus, if no other sources demand processing, then the entire buffer is allocated to the demanding source. Figures 3 and 4 summarize the policy discussed above.
The new set of loads and the unprocessed loads from the existing sources are considered together for scheduling in the next iteration. The following example clarifies the working principle of the above strategy. The parameters and data for this example are from real-life high-energy nuclear physics experiments [12].
Example 1: Let us suppose that there are three sources with loads to be processed and four sinks that can process these loads. Let the speed parameters of the sinks be w_1 = 1.11 × 10^-9, w_2 = 6.25 × 10^-10, w_3 = 5.00 × 10^-10, and w_4 = 3.57 × 10^-10, respectively. Let T_cp = 6.52 × 10^12 sec/load. Let the actual buffer capacities at the sinks at the initial state (that is, iteration q = 0) and iteration q = 1 be B^(q)_1 = 6, B^(q)_2 = 5, B^(q)_3 = 0, and B^(q)_4 = 2; at iteration q = 2 be B^(2)_1 = 4, B^(2)_2 = 3, B^(2)_3 = 1, and B^(2)_4 = 1; at iteration q = 3 be B^(3)_1 = 2, B^(3)_2 = 0, B^(3)_3 = 2, and B^(3)_4 = 1; and at iteration q = 4 be B^(4)_1 = 1, B^(4)_2 = 1, B^(4)_3 = 3, and B^(4)_4 = 1 units, respectively. These values are generated randomly using a uniform probability
q = 1   Sum_i α̂^(1)_{i,j}   Sum_i α^(1)_{i,j}   B̂^(1)_j   Unutilized B^(1)_j
K1      0.643                0.643                6.000      5.357
K2      1.143                1.143                5.000      3.857
K3      0.000                0.000                0.000      0.000
K4      2.000                2.000                2.000      0.000

q = 2   Sum_i α̂^(2)_{i,j}   Sum_i α^(2)_{i,j}   B̂^(2)_j   Unutilized B^(2)_j
K1      0.546                0.546                5.700      3.454
K2      0.970                0.970                4.750      2.030
K3      0.000                0.000                0.000      1.000
K4      1.698                1.000                1.900      0.000

q = 3   Sum_i α̂^(3)_{i,j}   Sum_i α^(3)_{i,j}   B̂^(3)_j   Unutilized B^(3)_j
K1      0.284                0.284                4.433      1.716
K2      0.506                0.000                3.483      0.000
K3      0.633                0.633                0.633      1.367
K4      0.886                0.886                1.267      0.114

q = 4   Sum_i α̂^(4)_{i,j}   Sum_i α^(4)_{i,j}   B̂^(4)_j   Unutilized B^(4)_j
K1      0.235                0.235                3.167      0.765
K2      0.415                0.415                1.742      0.585
K3      0.518                0.518                1.267      2.482
K4      0.727                0.727                1.108      0.273

Table 1. Buffer utilization values
distribution in the range [0, 7] in each iteration. We let the three sources have loads L_1 = 5, L_2 = 2, and L_3 = 3 unit loads, respectively. Let loads L_1 and L_2 arrive at t = 0 seconds, and let load L_3 arrive at t = 5 × 10^3 seconds. Note that the computationally intensive nature of the problem is reflected by the parameter T_cp. Using the above algorithm, we obtain the values for α^(q)_{i,j} shown in Tables 1 and 2. The unutilized buffer space in all the iterations is given in the last column of Table 1. From these results, we observe that the buffer of K_4 is fully utilized in iterations 1 and 2, whereas the buffer of K_3 is not utilized at all in iteration 2 (because its estimated buffer size is 0 for that iteration). For iteration 3, the buffer of K_3 is estimated to be less than its actual value, and hence the buffers of all the available sinks are under-utilized in that iteration. At the final iteration, the remaining load is insufficient to completely fill up the buffer at any of the sinks, so the distribution suggested by the values α_{i,j} in Table 2 is used by the sinks. Iterations 1 to 4 are scheduled at times t = 0, 4.655 × 10^3, 8.607 × 10^3, and 1.067 × 10^4 seconds, respectively. The total processing time for processing all three loads is t = 1.236 × 10^4 seconds. From this example, it is seen that, because of the new source S_3 and the buffer space variations at the sinks, the processing time for the other sources in the system is stretched to t = 1.236 × 10^4 seconds. Below we describe the buffer estimation strategy; its impact on the performance with respect to this example is discussed in Section 6.
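The first-iteration figures in Table 1 can be reproduced directly from the parameters of Example 1. A quick check in Python (our own sketch; the tolerances are ours):

```python
w = [1.11e-9, 6.25e-10, 5.00e-10, 3.57e-10]   # sink speed parameters
Tcp = 6.52e12                                  # computing intensity, sec/load
B_est = [6.0, 5.0, 0.0, 2.0]                   # estimated buffers, iteration 1
L = 5.0 + 2.0                                  # loads L1 + L2 present at t = 0

inv_sum = sum(1.0 / wj for wj in w)
alpha_j = [(1.0 / wj) / inv_sum for wj in w]   # Equation (1) fractions
# K3 is excluded (estimated buffer 0); Y fills K4's buffer exactly (Eq. (2)).
Y = min(B_est[j] / (alpha_j[j] * L) for j in (0, 1, 3))
req = [Y * alpha_j[j] * L if B_est[j] > 0 else 0.0 for j in range(4)]
T1 = req[3] * w[3] * Tcp                       # iteration time, Equation (3)
# req comes out to about [0.643, 1.143, 0.000, 2.000],
# and T1 to about 4.655e3 s, matching Table 1 and the schedule times above.
```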
q = 1   S1      S2      Sum_i α^(1)_{i,j}
K1      0.459   0.184   0.643
K2      0.817   0.326   1.143
K3      0.000   0.000   0.000
K4      1.429   0.571   2.000

q = 2   S1      S2      Sum_i α^(2)_{i,j}
K1      0.390   0.156   0.546
K2      0.693   0.277   0.970
K3      0.000   0.000   0.000
K4      0.714   0.286   1.000

q = 3   S1      S2      S3      Sum_i α^(3)_{i,j}
K1      0.038   0.015   0.231   0.284
K2      0.000   0.000   0.000   0.000
K3      0.085   0.034   0.514   0.633
K4      0.119   0.048   0.719   0.886

q = 4   S1      S2      S3      Sum_i α^(4)_{i,j}
K1      0.032   0.013   0.190   0.235
K2      0.056   0.022   0.337   0.415
K3      0.070   0.028   0.420   0.518
K4      0.098   0.040   0.589   0.727

Table 2. Values for load fractions
5. Buffer Estimation Strategy
Weproposeadistributedbufferestimationstrategybased
on weighted average calculations. The weights for comput
ing the estimates are based on the iteration indices until the
current iteration. We refer to this estimator as Iteration In
dex based Buffer estimator (IIB). Our IIB algorithm shall be
executed at all sink nodes. A sink node, after estimating the
buffer space to render in the next iteration, shall communi
cate it to the master sink node K1. Then, K1shall execute
the dynamic scheduling algorithm described in Figure 4 to
determine the ˆ αi,jthat the participating sink nodes shall re
quest from the sources.
For estimating the buffer availability at a sink, each sink K_j needs to keep track of the actual buffer sizes B_j from its previous iterations. Note that for implementation purposes it is sufficient to keep a cumulative value for the weighted buffer space. In any iteration q, each sink node shall estimate the buffer size that will be available for the next iteration (q + 1) as

\hat{B}_j^{(q+1)} = \left( \frac{\sum_{k=1}^{q} (k/q)\, B_j^{(k)}}{\sum_{k=1}^{q} (k/q)} \right) p = \left( \frac{\sum_{k=1}^{q} k\, B_j^{(k)}}{\sum_{k=1}^{q} k} \right) p    (5)

and declare it to the master sink node. In Equation 5, p is the probability that the estimated buffer size will be available at a sink at the next iteration. The value of p can be chosen based on the confidence level of the buffer estimator. For practical purposes we shall assume that p equals 0.95. This guarantees that the expected buffer sizes will be available at the sinks, with a confidence level of 95%, for the next iteration. This may be observed in our example, as discussed in Section 6.
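As a minimal sketch (ours, not the paper's implementation), the IIB update in Equation 5 needs only two running values per sink, the cumulative weighted buffer space and the iteration count, matching the implementation note above:

```python
class IIBEstimator:
    """Iteration Index based Buffer (IIB) estimator for one sink (illustrative sketch).

    Keeps only a cumulative weighted sum, as the text notes is sufficient,
    instead of the full history B_j^(1), ..., B_j^(q).
    """

    def __init__(self, p=0.95):
        self.p = p               # confidence factor p in Equation 5
        self.q = 0               # number of iterations observed so far
        self.weighted_sum = 0.0  # running sum of k * B_j^(k)

    def observe(self, actual_buffer):
        """Record the actual buffer size rendered in the current iteration."""
        self.q += 1
        self.weighted_sum += self.q * actual_buffer

    def next_estimate(self):
        """Estimated buffer for iteration q + 1, per Equation 5."""
        weight_total = self.q * (self.q + 1) / 2  # sum of indices 1..q
        return self.p * self.weighted_sum / weight_total
```

For example, a sink that rendered buffers of 10 units in iterations 1 and 2 would declare 0.95 × 10 = 9.5 units for iteration 3.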
6. Discussions and Conclusions

The contributions in this paper are geared towards designing and analyzing a dynamic scheduling strategy for handling large-volume loads that arrive at a grid system for processing. The strategy proposed in this paper is suitable for handling the large-scale data generated in physics experiments (as discussed in Section 1). Since a grid infrastructure is viewed as a repository of resources that can be exploited through careful scheduling, implicit to this problem are some real-life constraints, such as the availability of the nodes for processing, the amount of resources they can render, and the speeds with which the nodes and links can respond.
Also, as in the case of any networked system, here too we can follow a “pull-based” or a “push-based” approach. In this paper, we considered a pull-based strategy. Further, we considered a real-life situation wherein the sinks have finite-sized buffers and hence the available buffer space has to be shared in an optimal manner among the competing sources. Also, we assumed that every sink attempts to request loads from all participating sources for processing.
We tuned the IBS algorithm proposed in the literature to tackle the posed problem. In addition, since the availability of buffer space is dynamic, we proposed an estimation strategy, IIB, which works on weighted-average values as explained. The impact of IIB with respect to Example 1 is as follows. In Table 1 we show the estimated as well as the actual loads requested by the sinks; we also show the estimated buffer values. The following points are important to observe. In iteration 1, the estimated and the actual loads being the same, the buffer rendered is adequate to handle the estimated load. However, this does not carry far into the second iteration. In iteration 2, we observe that at K4, the estimated load being more than the actual buffer rendered, the actual load to be requested is tailored to fit the available space. It may also be observed that in iteration 2, the estimated buffers take into account the actual buffers rendered in the past iteration. This is done cumulatively in each iteration, which is indeed the essence of our design. Further, in iteration 2, the actual buffer available at K3 is unutilized, as the estimated value is 0. This prevents K3 from requesting any load from the sources in this iteration. This is natural to expect and is captured in our design. Another important observation comes from the fact that in iteration 2, if
the estimated load sizes had been requested by all the sinks, then the sources S1 and S2 could have been completely processed in this iteration itself. However, since K4 could not accommodate the estimated load, S1 and S2 are forced to be considered for scheduling in future iterations as well. Also, note that although S3 becomes available for processing after iteration 2 starts, it is considered for processing only from iteration 3 onwards. Note that in iteration 3, the estimated buffer at K3 is observed to be less than the actual buffer available. Thus, the scheduler considers a load based on the minimum of the actual and estimated buffer space; in our case, this turns out to be the estimated buffer value. When the estimated total load to be processed is less than or equal to the available buffer space, all the loads can be scheduled and processed in that iteration itself. This happens at iteration 4 in our example.
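The per-iteration behavior described above, where each sink requests the minimum of its estimated and actual buffer and the remaining load is fully scheduled once it fits in the granted space, can be sketched as follows (the function and variable names are ours, for illustration only):

```python
def plan_iteration(remaining_load, estimated, actual):
    """One scheduling step over all sinks.

    estimated[j], actual[j]: estimated and actual buffer at sink j.
    Each sink is granted min(estimated, actual); if the remaining load
    fits in the total granted space, it is fully scheduled this iteration.
    Returns (granted buffers, load left over for future iterations).
    """
    granted = [min(e, a) for e, a in zip(estimated, actual)]
    scheduled = min(remaining_load, sum(granted))
    return granted, remaining_load - scheduled
```

Note that a sink with a zero estimate requests nothing even if actual buffer is free, mirroring K3's behavior in iteration 2; once the remaining load drops below the total granted space, everything is scheduled, as in iteration 4.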
The proposed IIB strategy works as long as the buffer variations are not drastic. Further, if new loads arrive at the system before the loads being processed are completed, then the processing of the existing loads will be stretched. Thus, the strategy is highly recommended when the loads to be processed are not time-critical. This aspect raises an open issue: refining this approach to accommodate an admission control mechanism that can adapt to random arrivals of loads. This is one of our future extensions.
Acknowledgments

Thomas Robertazzi’s research is funded by NSF grant CCR-9912331. Dantong Yu’s research is supported by DOE PPDG/ATLAS/RHIC grants. Sivakumar Viswanathan’s research is supported by the Institute for Infocomm Research, Singapore.
References

[1] I. Foster and C. Kesselman, editors. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1999.
[2] H.-J. Kim. A Novel Optimal Load Distribution Algorithm for Divisible Loads. Kluwer Academic Publishers, January 2003.
[3] X. Li, B. Veeravalli, and C. Ko. Divisible Load Scheduling on Single-Level Tree Networks with Buffer Constraints. IEEE Transactions on Aerospace and Electronic Systems, 36(4):1298–1308, Oct. 2000.
[4] T. Robertazzi. Ten Reasons to Use Divisible Load Theory. Computer, 2003.
[5] A. Rosenberg. Sharing Partitionable Workload in Heterogeneous NOWs: Greedier is Not Better. In Proc. of IEEE International Conference on Cluster Computing, pages 124–131, Newport Beach, CA, USA, 2001.
[6] J. Sohn and T. Robertazzi. An Optimal Load Sharing Strategy for Divisible Jobs with Time-Varying Processor Speeds. IEEE Transactions on Aerospace and Electronic Systems, 34(3):907–923, July 1998.
[7] B. Veeravalli and G. Barlas. Scheduling Divisible Loads with Processor Release Times and Finite Size Buffer Capacity Constraints. In T. G. Robertazzi and D. Ghose, editors, special issue of Cluster Computing on Divisible Load Scheduling, volume 6(1), pages 63–74. Kluwer Academic Publishers, Jan. 2003.
[8] B. Veeravalli, D. Ghose, V. Mani, and T. Robertazzi. Scheduling Divisible Loads in Parallel and Distributed Systems. IEEE Computer Society Press, Los Alamitos, CA, Sept. 1996.
[9] B. Veeravalli, D. Ghose, and T. G. Robertazzi. Divisible Load Theory: A New Paradigm for Load Scheduling in Distributed Systems. In T. G. Robertazzi and D. Ghose, editors, special issue of Cluster Computing on Divisible Load Scheduling, volume 6(1), pages 7–18. Kluwer Academic Publishers, Jan. 2003.
[10] S. Viswanathan, B. Veeravalli, D. Yu, and T. G. Robertazzi. Pull-based resource-aware scheduling on large-scale computational grid systems. Technical Report TR/OSSL/VB/GC012004.
[11] H. Wong, D. Yu, B. Veeravalli, and T. Robertazzi. Data-Intensive Grid Scheduling: Multiple Sources with Capacity Constraints. In IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2003), Marina del Rey, CA, Nov. 2003.
[12] D. Yu and T. Robertazzi. Divisible Load Scheduling for Grid Computing. In IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2003), Marina del Rey, CA, Nov. 2003.