The effect of time delays on the stability of load balancing algorithms for parallel computations
ABSTRACT A deterministic dynamic nonlinear time-delay system is developed to model load balancing in a cluster of computer nodes used for parallel computations. The model is shown to be self-consistent in that the queue lengths cannot go negative and the total number of tasks in all the queues and the network is conserved (i.e., load balancing can neither create nor lose tasks). Further, it is shown that using the proposed load balancing algorithms, the system is stable in the sense of Lyapunov. Experimental results are presented and compared with the predicted results from the analytical model. In particular, simulations of the models are compared with an experimental implementation of the load balancing algorithm on a distributed computing network.

932 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 13, NO. 6, NOVEMBER 2005
The Effect of Time Delays on the Stability of Load
Balancing Algorithms for Parallel Computations
John Chiasson, Senior Member, IEEE, Zhong Tang, Member, IEEE, Jean Ghanem,
Chaouki T. Abdallah, Senior Member, IEEE, J. Douglas Birdwell, Fellow, IEEE,
Majeed M. Hayat, Senior Member, IEEE, and Henry Jérez
Abstract—A deterministic dynamic nonlinear time-delay system is developed to model load balancing in a cluster of computer nodes used for parallel computations. The model is shown to be self-consistent in that the queue lengths cannot go negative and the total number of tasks in all the queues and the network is conserved (i.e., load balancing can neither create nor lose tasks). Further, it is shown that using the proposed load balancing algorithms, the system is stable in the sense of Lyapunov. Experimental results are presented and compared with the predicted results from the analytical model. In particular, simulations of the models are compared with an experimental implementation of the load balancing algorithm on a distributed computing network.
Index Terms—Computer networks, load balancing, time-delay systems.
Manuscript received April 11, 2004. Manuscript received in final form May 25, 2005. Recommended by Associate Editor J. Sarangapani. The work of C. T. Abdallah, J. Ghanem, and M. M. Hayat was supported in part by the National Science Foundation under Information Technology Grants ANI0312611 and INT9818312. The work of J. D. Birdwell, J. Chiasson, and Z. Tang was supported in part by the National Science Foundation under ITR Grant ANI0312182 and in part by U.S. Department of Justice, Federal Bureau of Investigation under Contract JFBI98083. J. D. Birdwell and J. Chiasson were also supported in part by a Challenge Grant Award from the Center for Information Technology Research at the University of Tennessee. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Government.
J. Chiasson, Z. Tang, and J. D. Birdwell are with the Electrical and Computer Engineering Department, University of Tennessee, Knoxville, TN 37996 USA (e-mail: chiasson@utk.edu; ztang@utk.edu; birdwell@utk.edu).
J. Ghanem was with the Electrical and Computer Engineering Department, University of New Mexico, Albuquerque, NM 87131 USA. He is now with Verizon Communications.
C. T. Abdallah, M. M. Hayat, and H. Jérez are with the Electrical and Computer Engineering Department, University of New Mexico, Albuquerque, NM 87131 USA (e-mail: chaouki@ece.unm.edu; hayat@ece.unm.edu; hjerez@ece.unm.edu).
Digital Object Identifier 10.1109/TCST.2005.854339
I. INTRODUCTION
Distributed computing architectures utilize a set of computational elements (CEs) to achieve performance that is not attainable on a single CE. A common architecture is the cluster of otherwise independent computers communicating through a shared network. To make use of parallel computing resources, problems must be broken down into smaller units that can be solved individually by each CE while exchanging information with CEs solving other problems. For a background on mathematical treatments of load balancing, the reader is referred to [1]–[3]. For example, the Federal Bureau of Investigation (FBI) National DNA Index System (NDIS) and Combined DNA Index System (CODIS) software are candidates for parallelization. New methods developed by Wang et al. [4]–[7] lead naturally to a parallel decomposition of the DNA database search problem while providing orders of magnitude improvements in performance over the current release of the CODIS software. In this type of application, the search itself, initiated on any particular node, can initiate subsequent new searches which are added to the node’s queue. Consequently, it is of great advantage to the overall system to carry out load balancing to make effective use of the overall computational resources. The projected growth of the NDIS database and the demand for searches of the database can be met by migration to a parallel computing platform.
Effective utilization of a parallel computer architecture requires the computational load to be distributed, more or less, evenly over the available CEs. The qualifier “more or less” is used because the communications required to distribute the load consume both computational resources and network bandwidth. A point of diminishing returns therefore exists.
Distribution of computational load across available resources is referred to as the load balancing problem in the literature. Various taxonomies of load balancing algorithms exist [8]. Direct methods examine the global distribution of computational load and assign portions of the workload to resources before processing begins. Iterative methods examine the progress of the computation and the expected utilization of resources, and adjust the workload assignments periodically as computation progresses. Assignment may be either deterministic, as with the dimension exchange/diffusion [9] and gradient methods, stochastic, or optimization based. A comparison of several deterministic methods is provided by Willebeek-LeMair and Reeves [10]. Here, a deterministic model is developed.
The present work focuses upon the effects of delays in the exchange of information among CEs, and the constraints these effects impose on the design of a load balancing strategy. Motivated by the authors’ previous work in [11] and [12], a new nonlinear model is developed here (see also [13]). Specifically, a deterministic dynamic nonlinear time-delay system is developed to model load balancing. The model is shown to be self-consistent in that the queue lengths cannot go negative and that the total number of tasks in all the queues and the network is conserved (i.e., load balancing can neither create nor lose tasks). Further, it is shown that the controller proposed here is asymptotically stable in the sense of Lyapunov. Simulations of the nonlinear model are compared with an experimental implementation of the load balancing algorithm performed on a distributed computing network.
1063-6536/$20.00 © 2005 IEEE
Section II presents our approach to modeling the computer network and load balancing algorithm to incorporate the presence of delay in the communication between nodes and task transfers. In Section III, we show that the model captures the nonnegativity of the queue lengths as well as the fact that the totality of tasks in all the queues and in transit is conserved by the load balancing algorithm. Section IV shows that the system is asymptotically stable in the sense of Lyapunov for any choice of positive gains in the load balancing algorithm (controller). Section V presents simulations of the nonlinear models for comparison with the actual experimental data. Section VI presents experimental results from an implementation of the load balancing controller on a parallel computer consisting of a networked cluster of nodes. Section VII presents experiments conducted over a geographically dispersed distributed environment (i.e., PlanetLab). Both the effects of the network delays and the variances in the task processing time on the behavior of the system are assessed. Finally, Section VIII is a summary and conclusion of the present work and a discussion of future work.
II. MATHEMATICAL MODEL
Continuous time models are developed in this section that model the load balancing dynamics among a network of computers. To introduce the approach, consider a computing network consisting of $n$ computers (nodes) all of which can communicate with each other. At start up, the computers are assigned an equal number of tasks, each of which has essentially the same processing time (homogeneous tasks). However, in some applications when a node executes a particular task it can, in turn, generate more tasks so that very quickly the loads on various nodes become unequal. To balance the loads, each computer in the network sends (broadcasts) its queue size $q_j(t)$ to all other computers in the network. A node $i$ receives this information from node $j$ delayed by a finite amount of time $\tau_{ij}$; that is, it receives $q_j(t-\tau_{ij})$. Each node $i$ then uses this information to compute its estimate of the network average of the number of tasks in all $n$ queues of the network. Based on the most recent observations, the simple (local) estimate of the network average is computed by the $i$th node as $\left(\sum_{j=1}^{n} q_j(t-\tau_{ij})\right)/n$ (with the convention $\tau_{ii}=0$). Node $i$ then compares its queue size $q_i(t)$ with its estimate of the network average by estimating its excess load, $q_i(t)-\left(\sum_{j=1}^{n} q_j(t-\tau_{ij})\right)/n$. If its excess load is greater than zero or some positive threshold, the node sends some of its tasks to the other nodes. If it is less than zero, no tasks are sent. Further, the tasks sent by node $i$ are received by node $j$ with a delay $h_{ji}$. The controller (load balancing algorithm) decides how often to do load balancing (transfer tasks among the nodes) and how many tasks are to be sent to each node.
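The local estimate and the send decision described above can be sketched in a few lines of Python. This is only an illustration: the function names, the `threshold` parameter, and the sample queue sizes are assumptions made for the example and are not taken from the implementation reported in this paper.

```python
def local_average(observed):
    """Node i's estimate of the network average.

    `observed` holds the most recent queue sizes known to node i:
    its own current size and the delayed sizes reported by the others.
    """
    return sum(observed) / len(observed)


def excess_load(own_index, observed):
    """Queue size of node `own_index` minus its local average estimate."""
    return observed[own_index] - local_average(observed)


def should_send(own_index, observed, threshold=0.0):
    """A node sends tasks only when its excess exceeds the threshold."""
    return excess_load(own_index, observed) > threshold


# Hypothetical snapshot as seen by node 0 (entries 1 and 2 are delayed values).
observed_by_node_0 = [6000, 3900, 2100]
print(excess_load(0, observed_by_node_0), should_send(0, observed_by_node_0))
```

Because each node works from delayed values, two nodes generally hold different estimates of the same network average at the same instant, which is the effect the model in this section is built to capture.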
The mathematical model of the task load dynamics at a given computing node is given by
$$
\begin{aligned}
\frac{dx_i(t)}{dt} &= \lambda_i - \mu_i + u_i(t) - \sum_{j=1}^{n} p_{ij}\,\frac{t_{p_i}}{t_{p_j}}\,u_j(t-h_{ij}) \\
y_i(t) &= x_i(t) - \frac{\sum_{j=1}^{n} x_j(t-\tau_{ij})}{n} \\
u_i(t) &= -K_i\,\mathrm{sat}\big(y_i(t)\big)
\end{aligned}
\tag{1}
$$
where $p_{ij} \ge 0$, $p_{jj} = 0$, $\sum_{i=1}^{n} p_{ij} = 1$ and
$$
\mathrm{sat}(y) =
\begin{cases}
y_{max} & \text{if } y > y_{max} \\
y & \text{if } 0 \le y \le y_{max} \\
0 & \text{if } y < 0.
\end{cases}
$$
Further, in this model, we define
• $x_i(t)$ is the expected waiting time experienced by a task inserted into the queue of the $i$th node. With $q_i(t)$ the number of tasks in the $i$th node and $t_{p_i}$ as the average time needed to process a task on the $i$th node, the expected (average) waiting time is given by $x_i(t) = q_i(t)\,t_{p_i}$. Note that $x_j/t_{p_j} = q_j$ is the number of tasks in the queue of node $j$. If these tasks were transferred to node $i$, then the waiting time transferred is $q_j t_{p_i} = x_j t_{p_i}/t_{p_j}$, so that the fraction $t_{p_i}/t_{p_j}$ converts waiting time on node $j$ to waiting time on node $i$.
• $\lambda_i$ is the rate of generation of waiting times on the $i$th node caused by the addition of tasks (rate of increase in $x_i$).
• $\mu_i$ is the rate of reduction in waiting time caused by the service of tasks at the $i$th node and is given by $\mu_i \equiv (1 \times t_{p_i})/t_{p_i} = 1$ for all $x_i(t) > 0$, while if $x_i(t) = 0$ then $\mu_i = 0$; that is, if there are no tasks in the queue, then the queue cannot possibly decrease.
• $u_i(t)$ is the rate of removal (transfer) of the tasks from node $i$ at time $t$ by the load balancing algorithm at node $i$. Note that $u_i(t) \le 0$.
• $p_{ij}$ is the fraction of the $j$th node’s tasks to be sent out that it sends to the $i$th node. In more detail, $p_{ij}u_j(t)$ is the rate at which node $j$ sends waiting time (tasks) to node $i$ at time $t$ where, as all the tasks must go to some node, one requires that $p_{ij} \ge 0$, $\sum_{i=1}^{n} p_{ij} = 1$ and $p_{jj} = 0$. That is, the transfer from node $j$ of expected waiting time (tasks) $\int_{t_1}^{t_2} u_j(t)\,dt$ in the interval of time $[t_1, t_2]$ to the other nodes is carried out with the $i$th node receiving the fraction $p_{ij}\,(t_{p_i}/t_{p_j})\int_{t_1}^{t_2} u_j(t)\,dt$, where the ratio $t_{p_i}/t_{p_j}$ converts the task from waiting time on node $j$ to waiting time on node $i$. As $\sum_{i=1}^{n} p_{ij}\int_{t_1}^{t_2} u_j(t)\,dt = \int_{t_1}^{t_2} u_j(t)\,dt$, this results in removing all of the excess waiting time from node $j$.
• The quantity $-p_{ij}\,(t_{p_i}/t_{p_j})\,u_j(t-h_{ij})$ is the rate of increase (rate of transfer) of the expected waiting time (tasks) at time $t$ from node $j$ by (to) node $i$, where $h_{ij}$ ($h_{ii} = 0$) is the time delay for the task transfer from node $j$ to node $i$.
In this model, all rates are in units of the rate of change of expected waiting time, or time/time, which is dimensionless. This normalization of the queue length (i.e., $x_i = q_i t_{p_i}$) at each node by the local average processing time $t_{p_i}$ is simply a way to account for unequal task processing rates by each node. As $u_i(t) \le 0$, node $i$ can only send tasks to other nodes and cannot initiate transfers from another node to itself. A delay is experienced by transmitted tasks before they are received at the other node. The control law $u_i(t) = -K_i\,\mathrm{sat}(y_i(t))$ states that if the $i$th node output $x_i(t)$ is above its estimate of the network average $\left(\sum_{j=1}^{n} x_j(t-\tau_{ij})\right)/n$, then it sends data to the other nodes, while if it is less than this average nothing is sent. The $j$th node receives the fraction $p_{ji}\,(t_{p_j}/t_{p_i})$ of the transferred waiting time $\int u_i(t)\,dt$, delayed by the time $h_{ji}$.
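To make the structure of (1) concrete, the following Python sketch integrates the model with a forward-Euler step. It is an illustration under stated assumptions rather than the simulation code used for the results in this paper: a single communication delay `tau` and a single transfer delay `h` are assumed for every pair of distinct nodes, the $p_{ij}$ are taken constant and equal to $1/(n-1)$, the history before $t = 0$ is held at the initial condition, and the clipping at zero is a numerical guard that is not part of the model. The function and parameter names are hypothetical.

```python
def sat(y, y_max=float("inf")):
    """Saturation in the control law: 0 below zero, y_max above y_max."""
    return min(max(y, 0.0), y_max)


def simulate(x0, K, tau, h, t_p, dt, steps, lam=0.0):
    """Forward-Euler integration of model (1) with constant p_ij = 1/(n-1)."""
    n = len(x0)
    d_tau, d_h = round(tau / dt), round(h / dt)   # delays in time steps
    p = 1.0 / (n - 1)
    xs = [list(x0)]   # xs[k][i] = x_i at step k
    us = []           # us[k][i] = u_i at step k

    def past(hist, k, i, default):
        # Signal of node i at step k; before t = 0 fall back to the default.
        return hist[k][i] if k >= 0 else default[i]

    for k in range(steps):
        x = xs[k]
        u = []
        for i in range(n):
            # Local average: own value undelayed, other nodes delayed by tau.
            avg = (x[i] + sum(past(xs, k - d_tau, j, x0)
                              for j in range(n) if j != i)) / n
            u.append(-K[i] * sat(x[i] - avg))
        us.append(u)
        nxt = []
        for i in range(n):
            mu = 1.0 if x[i] > 0 else 0.0
            # Waiting time arriving from the other nodes, delayed by h.
            inflow = sum(p * t_p[i] / t_p[j] * past(us, k - d_h, j, [0.0] * n)
                         for j in range(n) if j != i)
            dx = lam - mu + u[i] - inflow
            nxt.append(max(x[i] + dt * dx, 0.0))   # numerical guard only
        xs.append(nxt)
    return xs


# Parameter values of the kind used in Section V (three nodes).
xs = simulate(x0=[0.6, 0.4, 0.2], K=[1000.0] * 3, tau=200e-6, h=400e-6,
              t_p=[10e-6] * 3, dt=10e-6, steps=500)
print(xs[-1])
```

A fixed step of 10 μs keeps the product of gain and step size small for the gains considered here; larger gains would call for a smaller step or a dedicated delay-differential-equation solver.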
A. Specification of the Factors
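The model described in (1) is the basic model, but an important detail remains unspecified, namely the exact form of the $p_{ij}$ for each sending node $j$. One approach is to choose them as constant and equal, that is, $p_{ij} = 1/(n-1)$ for $j \ne i$ and $p_{ii} = 0$. Another approach is to use the local information of the waiting times $x_i(t)$, $i = 1, \ldots, n$, to set their values. With $x_j^{avg} \triangleq \left(\sum_{k=1}^{n} x_k(t-\tau_{jk})\right)/n$ denoting the local average of node $j$, the quantity $x_i(t-\tau_{ji}) - x_j^{avg}$ is node $j$’s estimate of the excess (or deficit) waiting time in the queue of node $i$ with respect to the local average of node $j$. If node $i$’s queue is above the local average, then node $j$ does not send tasks to it. Therefore, $\mathrm{sat}\big(x_j^{avg} - x_i(t-\tau_{ji})\big)$ is a measure by node $j$ as to how much node $i$ is below the local average. Node $j$ performs this computation for all the other nodes and then portions out its tasks among the other nodes according to the amounts they are below the local average, that is
$$
p_{ij} = \frac{\mathrm{sat}\big(x_j^{avg} - x_i(t-\tau_{ji})\big)}{\sum_{k \ne j} \mathrm{sat}\big(x_j^{avg} - x_k(t-\tau_{jk})\big)}. \tag{2}
$$
If the denominator $\sum_{k \ne j} \mathrm{sat}\big(x_j^{avg} - x_k(t-\tau_{jk})\big)$ is zero, then the $p_{ij}$ are defined to be zero and no load is transferred. This is illustrated in Fig. 1 for node 1.
Remark: If the denominator is zero, then $x_j^{avg} - x_i(t-\tau_{ji}) \le 0$ for all $i \ne j$. However, by definition of the average, $\sum_{i=1}^{n}\big(x_j^{avg} - x_i(t-\tau_{ji})\big) = 0$, which implies $x_j^{avg} - x_j(t) = -\sum_{i \ne j}\big(x_j^{avg} - x_i(t-\tau_{ji})\big) \ge 0$. That is, if the denominator is zero, the node $j$ is not greater than the local average, so $u_j(t) = 0$ and node $j$ is therefore not sending out any tasks.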
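The allocation rule (2) can be written as a short Python function. This is a sketch under assumptions: the function name and the sample values are hypothetical, and the saturation is taken without an upper limit, so only the lower clipping at zero is applied.

```python
def transfer_fractions(j, x_seen_by_j):
    """Fractions p_ij of node j's outgoing load, following (2).

    x_seen_by_j[i] is node j's (possibly delayed) view of node i's waiting
    time; x_seen_by_j[j] is its own current value.
    """
    n = len(x_seen_by_j)
    avg = sum(x_seen_by_j) / n
    # How far each other node is below node j's local average (0 if above).
    deficits = [max(avg - x_seen_by_j[i], 0.0) if i != j else 0.0
                for i in range(n)]
    total = sum(deficits)
    if total == 0.0:
        return [0.0] * n   # nobody is below the average: send nothing
    return [d / total for d in deficits]


# Node 0 sees waiting times 8, 3 and 1; the local average is 4.
print(transfer_fractions(0, [8.0, 3.0, 1.0]))
```

With these values node 1 is 1 below the average and node 2 is 3 below it, so node 2 receives three times the share of node 1.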
III. MODEL CONSISTENCY
It is now shown that the model is consistent with actual
working systems in that the queue lengths cannot go negative,
and the load balancing algorithm cannot create or lose tasks; it
can only move tasks between nodes [13], [14].
Fig. 1. Illustration of a hypothetical distribution $p_{i1}$ of the load at some time $t$ from node 1’s point of view. Node 1 will send data out to node $i$ in proportion $p_{i1}$ to the amount it estimates node $i$ is below the average, where $\sum_{i=1}^{n} p_{i1} = 1$ and $p_{11} = 0$.
A. Nonnegativity of the Queue Lengths
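To show the nonnegativity of the queue lengths, recall that the queue length of each node is given by $q_i(t) = x_i(t)/t_{p_i}$. The model is rewritten in terms of these quantities as
$$
\frac{d}{dt}\left(\frac{x_i(t)}{t_{p_i}}\right) = \frac{\lambda_i - \mu_i}{t_{p_i}} + \frac{1}{t_{p_i}}\,u_i(t) - \sum_{j=1}^{n} \frac{p_{ij}}{t_{p_j}}\,u_j(t-h_{ij}). \tag{3}
$$
Given that $x_i(0) > 0$ for all $i$, it follows from the right-hand side of (3) that $q_i(t) = x_i(t)/t_{p_i} \ge 0$ for all $t \ge 0$ and all $i$. To see this, suppose without loss of generality that $q_i(t) = x_i(t)/t_{p_i}$ is the first queue to go to zero, and let $t_1$ be the time when $x_i(t_1) = 0$. At the time $t_1$, $\lambda_i - \mu_i = \lambda_i \ge 0$ by the definition of $\mu_i$, and $\sum_{j=1}^{n} (p_{ij}/t_{p_j})\,u_j(t-h_{ij}) \le 0$ for all time by the definition of the $u_j$. Further, the term $u_i(t)$ is negative only if $x_i(t) > \left(\sum_{j=1}^{n} x_j(t-\tau_{ij})\right)/n$, that is, only if
$$
x_i(t) > \frac{1}{n-1}\sum_{j \ne i} x_j(t-\tau_{ij}). \tag{4}
$$
By supposition (up to time $t_1$) all the $x_j(t) > 0$ for $j \ne i$ and $x_i(t_1) = 0$, so that $u_i(t_1) = 0$ as the right side of (4) is positive at time $t_1$. Consequently, at time $t_1$ all terms on the right-hand side of (3) are nonnegative. Further, $x_i(t)$ cannot go negative in a neighborhood of $t_1$. For if it did, as the right-hand side of (4) is continuous, it follows that
$$
x_i(t) < 0 < \frac{1}{n-1}\sum_{j \ne i} x_j(t-\tau_{ij}) \tag{5}
$$
for all $t \in (t_1, t_1+\epsilon)$ for some $\epsilon$ with $\epsilon > 0$. Therefore, $u_i(t) = 0$ for all $t \in (t_1, t_1+\epsilon)$ and the right-hand side of (3) is nonnegative for all $t \in (t_1, t_1+\epsilon)$, which contradicts $x_i(t) < 0$. Note that $t_1 + \epsilon$ can be taken to be at least as large as the time at which the next queue goes to zero, as the right-hand side of (5) must remain positive until then.
If $x_i$ goes positive after $t_1$, then the previous argument is repeated at the next time a queue goes to zero. If $x_i$ remains identically zero in the interval $(t_1, t_2)$, where $t_2$ is the time at which a second queue $x_k$ goes to zero, then the argument is also similar in that at time $t_2$, both $x_i$ and $x_k$ are then zero. As the remaining $n-2$ nodes are still positive, the right-hand side of (5) continues to hold with both $x_i$ and $x_k$ zero at time $t_2$, and one again gets a contradiction if either $x_i$ or $x_k$ goes negative in an interval $(t_2, t_2+\epsilon)$. Continuing in this manner, it follows that $x_i(t)$ cannot go negative for all $i$.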
B. Conservation of Queue Lengths
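It is now shown that the total number of tasks in all the queues and the network is conserved. To do so, sum up (3) from $i = 1, \ldots, n$ to obtain
$$
\frac{d}{dt}\sum_{i=1}^{n} q_i(t) = \sum_{i=1}^{n} \frac{\lambda_i - \mu_i}{t_{p_i}} + \sum_{i=1}^{n} \frac{u_i(t)}{t_{p_i}} - \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{p_{ij}}{t_{p_j}}\,u_j(t-h_{ij}) \tag{6}
$$
which is the rate of change of the total queue lengths on all the nodes. However, the network itself also contains tasks in transit between nodes. The dynamic model of the queue lengths in the network is given by
$$
\frac{d}{dt}\,q_{net_{ij}}(t) = -\frac{p_{ij}}{t_{p_j}}\,u_j(t) + \frac{p_{ij}}{t_{p_j}}\,u_j(t-h_{ij}). \tag{7}
$$
Here $q_{net_{ij}}$ is the number of tasks put on the network that are being sent to node $i$ from node $j$. This equation simply says that the $j$th node is putting tasks on the network to be sent to node $i$ at the rate $-(p_{ij}/t_{p_j})\,u_j(t)$ while the $i$th node is taking these tasks from node $j$ off the network at the rate $-(p_{ij}/t_{p_j})\,u_j(t-h_{ij})$. Summing (7) over all the nodes and using $\sum_{i=1}^{n} p_{ij} = 1$, one obtains
$$
\frac{d}{dt}\sum_{i=1}^{n}\sum_{j=1}^{n} q_{net_{ij}}(t) = -\sum_{j=1}^{n} \frac{u_j(t)}{t_{p_j}} + \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{p_{ij}}{t_{p_j}}\,u_j(t-h_{ij}). \tag{8}
$$
Adding (6) and (8), one obtains the conservation of queue lengths given by
$$
\frac{d}{dt}\left(\sum_{i=1}^{n} q_i(t) + \sum_{i=1}^{n}\sum_{j=1}^{n} q_{net_{ij}}(t)\right) = \sum_{i=1}^{n} \frac{\lambda_i - \mu_i}{t_{p_i}}. \tag{9}
$$
In words, the total number of tasks which are in the system (i.e., in the nodes and/or in the network) can increase only by the rate of arrival of tasks $\sum_{i=1}^{n} \lambda_i/t_{p_i}$ at all the nodes, or similarly, decrease by the rate of processing of tasks $\sum_{i=1}^{n} \mu_i/t_{p_i}$ at all the nodes. The load balancing itself cannot increase or decrease the total number of tasks in all the queues.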
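The conservation property (9) can be checked with a simple bookkeeping sketch in Python. The setup is deliberately simplified and is an assumption of this example, not the paper's implementation: task counts are integers, no tasks arrive or are processed ($\lambda_i = \mu_i = 0$), every node uses the undelayed average, outgoing tasks are split equally, and transfers arrive after a fixed number of rounds. The names `run`, `k_z`, and `delay_rounds` are hypothetical.

```python
from collections import deque


def run(queues, k_z=0.5, delay_rounds=3, rounds=20):
    """Track queued and in-transit tasks over repeated balancing rounds."""
    n = len(queues)
    q = list(queues)
    # pipeline[d] holds (destination, count) pairs arriving d + 1 rounds later
    pipeline = deque([[] for _ in range(delay_rounds)])
    totals = []
    for _ in range(rounds):
        # Deliver the tasks whose transfer delay has elapsed.
        for dest, count in pipeline.popleft():
            q[dest] += count
        shipments = []
        avg = sum(q) / n
        for j in range(n):
            excess = q[j] - avg
            if excess <= 0:
                continue
            per_node = int(k_z * excess) // (n - 1)
            for i in range(n):
                if i != j and per_node > 0:
                    q[j] -= per_node            # tasks leave node j ...
                    shipments.append((i, per_node))  # ... and enter the network
        pipeline.append(shipments)
        in_transit = sum(c for batch in pipeline for _, c in batch)
        totals.append(sum(q) + in_transit)
    return q, totals


final_queues, totals = run([6000, 4000, 2000])
print(final_queues, set(totals))
```

Every task removed from a queue is placed in the pipeline and every delivered task is added to a queue, so the sum of queued and in-transit tasks stays at its initial value in each round.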
IV. STABILITY OF THE CONTROLLER
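The controller in the model (1) is $u_i(t) = -K_i\,\mathrm{sat}(y_i(t))$, where the gains $K_i > 0$ are to be specified. Physically, these gains are limited by the bandwidth constraints in the network. One can also view the $p_{ij}$ as controller parameters to be specified subject to the constraints given previously.
Interestingly, it turns out that the system (1) is asymptotically stable in the sense of Lyapunov for any set of gains $K_i > 0$ and any set of $p_{ij}$ with $p_{ij} \ge 0$, $p_{jj} = 0$ and $\sum_{i=1}^{n} p_{ij} = 1$. Specifically, we have the following theorem.
Theorem: Given the system described by (1) and (7) with $\lambda_i = 0$ for $i = 1, \ldots, n$ and initial conditions $x_i(0) \ge 0$, then $q_i(t) \to 0$ and $q_{net_{ij}}(t) \to 0$ as $t \to \infty$.
Proof: First note that the $q_{net_{ij}}$ are nonnegative since by (7), it follows that
$$
q_{net_{ij}}(t) = -\frac{p_{ij}}{t_{p_j}} \int_{t-h_{ij}}^{t} u_j(s)\,ds \ge 0. \tag{10}
$$
Under the conditions of the theorem, (9) becomes
$$
\frac{d}{dt}\left(\sum_{i=1}^{n} q_i(t) + \sum_{i=1}^{n}\sum_{j=1}^{n} q_{net_{ij}}(t)\right) = -\sum_{i=1}^{n} \frac{\mu_i}{t_{p_i}}. \tag{11}
$$
Let $V(t) \triangleq \sum_{i=1}^{n} q_i(t) + \sum_{i=1}^{n}\sum_{j=1}^{n} q_{net_{ij}}(t)$ and, as the $q_i$, $q_{net_{ij}}$ are nonnegative, $V \ge 0$ and is equal to zero if and only if $q_i = 0$ and $q_{net_{ij}} = 0$ for every $i$, $j$. Further, as $\mu_i = 1$ for $q_i > 0$ and $\mu_i = 0$ if and only if $q_i = 0$, it follows that
$$
\frac{dV}{dt} = -\sum_{i=1}^{n} \frac{\mu_i}{t_{p_i}} \le 0. \tag{12}
$$
This then implies that $V(t)$ is monotonically decreasing. As $V$ is bounded below, we have $\lim_{t \to \infty} V(t) = V_\infty \ge 0$, or
$$
\int_{0}^{\infty} \sum_{i=1}^{n} \frac{\mu_i(t)}{t_{p_i}}\,dt = V(0) - V_\infty < \infty. \tag{13}
$$
The quantity $\mu_i(t)$ is either 1 or 0 depending on whether $q_i(t)$ is positive or zero, so $\mu_i(t)$ can be viewed as a set of pulses of unit height and varying width. The integral $\int_0^\infty \mu_i(t)\,dt$ is finite by (13), which implies that the widths of the unit-height pulses making up $\mu_i$ must go to zero as $t \to \infty$. So, even if a $q_i$ continues to switch between zero and positive values, the time intervals for which it is nonzero must go to zero as $t \to \infty$. Summarizing, the $q_i$ are nonnegative, continuous functions, bounded by the nonnegative monotonically decreasing function $V(t)$, and the intervals for which the $q_i$ are nonzero go to zero as $t \to \infty$. More precisely, if we define $S_i(T) \triangleq \{t \ge T : q_i(t) > 0\}$, then the Lebesgue measure of $S_i(T)$, denoted by $m(S_i(T))$, converges to 0 as $T \to \infty$. Further, as the local average $\left(\sum_{j=1}^{n} x_j(t-\tau_{ij})\right)/n \ge 0$ is always true and $u_i(t) = 0$ whenever $x_i(t) = 0$ for every $i$, it follows that the time intervals for which the bounded functions $u_i$ are nonzero must go to zero as $t \to \infty$. This follows from the observation that $u_i(t) < 0$ necessarily implies $x_i(t) > 0$. Thus, the integral in (10) can be upper bounded by $K_j\,y_{max}\,m\big(S_j(t-h_{ij})\big)$ (recall that $|u_j(t)| \le K_j\,y_{max}$), which converges to zero as $t \to \infty$. Consequently, $q_{net_{ij}}(t) \to 0$ as $t \to \infty$.
We now show that the monotonically decreasing function $V(t)$ must go to zero, that is, $V_\infty = 0$. Suppose not, so that $V_\infty > 0$. As $q_{net_{ij}}(t) \to 0$, choose $T$ large enough so that $\sum_{i,j} q_{net_{ij}}(t) < V_\infty/2$ for $t > T$. Since $V(t) \ge V_\infty$ for all $t$, we have $\sum_{i=1}^{n} q_i(t) > V_\infty/2$ for $t > T$. For every $t > T$, there exists at least one $i$ (which depends on $t$) for which $q_i(t) > 0$. Therefore, $\sum_{i=1}^{n} \mu_i/t_{p_i} \ge \min_i\{1/t_{p_i}\}$ for all $t > T$. By (11), we then have
$$
V(t) \le V(T) - \min_i\left\{\frac{1}{t_{p_i}}\right\}(t - T). \tag{14}
$$
As the right side of (14) eventually becomes negative, we have a contradiction and therefore $V_\infty = 0$. As it has already been shown that $q_i(t) \ge 0$ and $q_{net_{ij}}(t) \ge 0$ for all $t$, $V(t) \to 0$ then implies that $q_i(t) \to 0$ and $q_{net_{ij}}(t) \to 0$ for all $i$, $j$. This completes the proof of the theorem.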
V. SIMULATION RESULTS
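Experimental procedures to determine the delay values are given in [15] and summarized in [16]. These give representative values for a Fast Ethernet network with three nodes of $\tau_{ij} = 200\ \mu\text{s}$ for $i \ne j$, $\tau_{ii} = 0$, and $h_{ij} = 400\ \mu\text{s}$ for $i \ne j$, $h_{ii} = 0$. The initial conditions were $x_1(0) = 0.6$, $x_2(0) = 0.4$ and $x_3(0) = 0.2$. The inputs were set as $\lambda_1 = 0$, $\lambda_2 = 0$, $\lambda_3 = 0$, $\mu_1 = \mu_2 = \mu_3 = 1$. The $t_{p_i}$’s were taken to be equal to $10\ \mu\text{s}$.
In this set of simulations, the model (1) is used. Figs. 2 and 3 show the responses with the gains set as $K_1 = K_2 = K_3 = 1000$ and as $K_1 = 6667$, $K_2 = 4167$, $K_3 = 5000$, respectively.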
VI. EXPERIMENTAL RESULTS
A parallel machine has been built to implement an experimental facility for evaluation of load balancing strategies and parallel databases. A root node communicates with $k$ groups of computer networks. Each of these groups is composed of $n$ nodes (hosts) holding identical copies of a portion of the database. Any pair of groups correspond to different databases, which are not necessarily disjoint. A specific record (or DNA profile in our specific case) is in general stored in two groups for redundancy to protect against failure of a node. Within each node, there are either one or two processors. In the experimental facility, the dual processor machines use 1.4 GHz Athlon MP processors, and the single processor machines use 1.3 GHz Athlon processors. All run the Linux operating system. Our interest here is in the load balancing in any one group of $n$ nodes.
Fig. 2. Constant $p_{ij}$ nonlinear output responses with $K = 1000$.
Fig. 3. Nonlinear simulation with constant $p_{ij}$ and $K_1 = 6666.7$; $K_2 = 4166.7$; $K_3 = 5000$.
The database is implemented as a set of queues with associated search engine threads, typically assigned one per node of the parallel machine. The search requests are created not only by the database clients; the search process also creates search requests as the index tree is descended by any search thread. This creates the opportunity for parallelism; search requests that await processing may be placed in any queue associated with a search engine, and the contents of these queues may be moved arbitrarily among the processing nodes of a group to achieve a balance of the load.
An important point is that the actual delays experienced by the network traffic in the parallel machine are random. Work has been performed to characterize the bandwidth and delay on unloaded and loaded network switches, in order to identify the delay parameters of the analytic models, and is reported in [15] and [16]. The value $\tau = 200\ \mu\text{s}$ used for simulations represents an average value for the delay and was found using the procedure described in [16]. The interest here is to compare the experimental data with that from the three models previously developed.
To explain the connection between the control gain $K$ and the actual implementation, recall that the waiting time is related to the number of tasks by $x_i(t) = q_i(t)\,t_{p_i}$, where $t_{p_i}$ is the average time to carry out a task. The continuous time control law is $u(t) = -K\,\mathrm{sat}(y(t))$, where $u(t)$ is the rate of decrease of waiting time per unit time. Consequently, the gain $K$ represents the rate of reduction of waiting time per second in the continuous time model. Also, $y_i(t) = \left(q_i(t) - \left(\sum_{j=1}^{n} q_j(t-\tau_{ij})\right)/n\right) t_{p_i} = r_i(t)\,t_{p_i}$, where $r_i(t)$ is simply the number of tasks above the estimated (local) average number of tasks and, as the interest here is the case $y_i(t) > 0$, consider $u(t) = -K\,y_i(t)$. The implementation must execute the load balancing control law repetitively with a (possibly random) time interval between balancing actions. With $\Delta t$ the time interval between successive executions of the load balancing algorithm, a discrete time control law $-K_z\,r_i(t)$ is defined that removes a fraction $K_z$ of the queue excess in the time $\Delta t$. The rate of reduction of waiting time is $-K_z\,y_i(t)/\Delta t$, so that an equivalent continuous time control law, given the discrete time gain $K_z$ and control interval $\Delta t$, is
$$
u(t) = -\frac{K_z}{\Delta t}\,y_i(t). \tag{15}
$$
This shows that the gain $K = K_z/\Delta t$ is related to the actual implementation by how fast the load balancing can be carried out and how much (fraction) of the load is transferred. In the experimental work reported here, $\Delta t$ actually varies each time the load is balanced. As a consequence, the value of $\Delta t$ used in (15) is an average value for that run. The average time $t_{p_i}$ to process a task is the same on all nodes used for the experiments (identical processors) and is equal to $10\ \mu\text{s}$, while the time it takes to ready a load for transfer is about $5\ \mu\text{s}$. The initial conditions were taken as $q_1(0) = 6000$, $q_2(0) = 4000$, $q_3(0) = 2000$ (corresponding to $x_1(0) = 0.06$, $x_2(0) = 0.04$, $x_3(0) = 0.02$). All of the experimental responses were carried out with constant $p_{ij} = 1/2$ for $i \ne j$.
Fig. 4. Experimental response of the load balancing algorithm. The average values of the gains are ($K_z = 0.5$) $K_1 = 6667$, $K_2 = 4167$, $K_3 = 5000$ with constant $p_{ij}$.
Fig. 5. Experimental response of the load balancing algorithm. The average values of the gains are ($K_z = 0.3$) $K_1 = 2400$, $K_2 = 7273$, $K_3 = 2500$ with constant $p_{ij}$.
Fig. 4 is a plot of the responses $r_i(t) = q_i(t) - \left(\sum_{j=1}^{n} q_j(t-\tau_{ij})\right)/n$ for $i = 1, 2, 3$ (recall that $y_i(t) = r_i(t)\,t_{p_i}$). The (average) values of the gains were ($K_z = 0.5$) $K_1 = 6667$, $K_2 = 4167$, $K_3 = 5000$. This figure compares favorably with Fig. 3 except for the time scale being off; that is, the experimental responses are slower. The explanation for this is that the discrete load balancing implementation is not accurately modeled in the continuous time simulations; only its average effect is represented in the gains $K_i$. That is, the continuous time model does not stop processing jobs (at the average rate $t_{p_i}$) while it is transferring tasks to do the load balancing. Fig. 5 shows the plots of the response for the (average) values of the gains given by ($K_z = 0.3$) $K_1 = 2400$, $K_2 = 7273$, $K_3 = 2500$. Fig. 6 shows the plots of the response for the (average) values of the gains given by ($K_z = 0.2$) $K_1 = 1600$, $K_2 = 2500$, $K_3 = 2857$. The initial conditions were $q_1(0) = 6000$, $q_2(0) = 4000$, $q_3(0) = 2000$ ($x_1(0) = 0.06$, $x_2(0) = 0.04$, $x_3(0) = 0.02$).
Fig. 6. Experimental response of the load balancing algorithm. The average values of the gains are ($K_z = 0.2$) $K_1 = 1600$, $K_2 = 2500$, $K_3 = 2857$ with constant $p_{ij}$.
Fig. 7. Summary of the load balance time as a function of the feedback gain $K_z$.
Fig. 7 summarizes the data from several experimental runs of the type shown in Figs. 4–6. For $K_z = 0.1, 0.2, 0.3, 0.4, 0.5$, ten runs were made and the settling times (time to load balance) were determined. These are marked as small horizontal ticks on Fig. 7. (For all such runs, the initial queues were the same and equal to $q_1(0) = 600$, $q_2(0) = 400$, $q_3(0) = 200$.) For each value of $K_z$, the average settling time for these ten runs was computed and is marked as a dot on Fig. 7. For values of $K_z = 0.6$ and higher (with increments of 0.1 in $K_z$), consistent results could not be obtained. In many cases, ringing extended throughout the experiment’s time interval (200 ms). For example, Fig. 8 shows the plots of the queue length less the local queue average for an experimental run with $K_z = 0.6$ where the settling time is approximately 7 ms. In contrast, Fig. 9 shows the experimental results under the same conditions where persistent ringing regenerates for 40 ms. It was found that the response was so oscillatory that a settling time was not possible to determine accurately. However, Fig. 7 shows that one desires to choose the gain to be close to 0.5 to achieve a faster response time without breaking into oscillatory behavior.
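The relation in (15) between the discrete time gain $K_z$, the control interval, and the continuous time gain $K$ is simple arithmetic, sketched below in Python. The function names are hypothetical, and the 75 μs interval in the first call is an illustrative value chosen for the example; the text reports only the resulting average gains, not the individual intervals.

```python
def continuous_gain(k_z, delta_t):
    """Equivalent continuous time gain K = K_z / delta_t from (15)."""
    return k_z / delta_t


def average_interval(k_z, k):
    """Average control interval implied by a reported gain K and fraction K_z."""
    return k_z / k


# Illustrative interval (assumed, not reported): 75 microseconds with K_z = 0.5.
print(continuous_gain(0.5, 75e-6))

# Intervals implied by the average gains quoted for Fig. 4 (K_z = 0.5).
for k in (6667.0, 4167.0, 5000.0):
    print(k, average_interval(0.5, k))
```

Read this way, a node that balances more often (smaller interval) behaves like a node with a larger continuous time gain, even though each balancing action removes the same fraction of its excess.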
VII. EXPERIMENTS OVER PLANETLAB
A geographically distributed system was developed to vali
date the theoretical work for large delays and to assess different
Fig. 8.
? ? 0.6—settling time is approximately 7 ms.
Fig. 9.
persists.
? ? 0.6 These are the same conditions as Fig. 8, but now the ringing
load balancing policies in a real environment. The system con
sists of several nodes running the same code. The nodes are part
of PlanetLab, a planetaryscale network involving more than
350 nodes positioned around the globe and connected via the
Internet (www.planetlab.org). The application used to illustrate
Page 8
CHIASSON et al.: EFFECT OF TIME DELAYS ON STABILITY OF LOAD BALANCING ALGORITHMS 939
Fig. 10. Parameters and settings of the experiment.
Fig. 11.Average network delays and transmission rates.
Fig. 12.
delays. Gain ? ? 0.3 and ?
Experimental response of the load balancing algorithm under large
? 0.5.
theloadbalancingprocesswasmatrixmultiplication,whereone
task is defined as the multiplication of one row by a static ma
trix duplicated on all nodes (3 nodes in our experiment). The
number of elements in each row were generated randomly from
a specified range, which made the execution time of a task vari
able. The network protocol UDP was used to exchange queue
size information among the nodes, and TCP (connectionbased)
was used to transfer task data from one machine to another.
To match the experimental settings of the previous sections,
3 nodes were used; node1 at the University of New Mexico,
node2 in Taipei, Taiwan and node3 in Frankfurt, Germany. The
were set to 1/2 for. The initial parameters and settings
for theexperimentare summarized in theTable giveninFig. 10.
Throughout the experiments, network statistics related to transmission rates and delays were collected. The averages of these statistics are shown in the table given in Fig. 11. Large delays were observed in the network due to the dispersed geographical location of the nodes. Moreover, the transmission rates detected between the nodes were very low, mainly because the amount of data exchanged (in bytes) is small. Indeed, the average size of data needed to transmit a single task was 20 bytes, which caused variability in the observed transmission rates due to the large communication delays and their variation.
Fig. 13. Experimental response of the load balancing algorithm under large delays. Gain = 0.5 and p_ij = 0.5.
Fig. 14. Summary of the load balancing time as a function of the gain.
Fig. 15. Experimental response of the load balancing algorithm under large delays. Gain = 0.8 and p_ij = 0.5.
In order to observe the behavior of the system under various gains, several experiments were conducted for different gain values ranging from 0.1 to 1.0. Fig. 12 is a plot of the responses corresponding to each node where the gain was set to 0.3. Similarly, Fig. 13 shows the system response for gain equal to 0.5. Fig. 14 summarizes several runs corresponding to different gain values. For each of the gain values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, ten runs were made, and the settling times (time to load balance) were determined. For gain values of 0.8 and higher, consistent results could not be obtained. For instance, in most of the runs no settling time could be achieved.
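One way to extract a settling time from recorded queue traces is sketched below. The criterion is an assumption made for illustration, since this section does not state the exact rule used: a run is taken to have settled at the first sample after which every node's queue stays within a tolerance `tol` of the instantaneous average queue length. The name `settling_time` and its arguments are hypothetical.

```python
def settling_time(times, queue_samples, tol):
    """Return the first time after which all queues stay within tol of the mean.

    times         -- sample times, one per row of queue_samples
    queue_samples -- list of per-sample lists, one queue length per node
    tol           -- allowed deviation from the instantaneous mean (assumed)
    Returns None if the final sample is still unbalanced (no settling time).
    """
    last_unbalanced = -1
    for k, sample in enumerate(queue_samples):
        mean = sum(sample) / len(sample)
        if any(abs(q - mean) > tol for q in sample):
            last_unbalanced = k
    if last_unbalanced == len(queue_samples) - 1:
        return None
    return times[last_unbalanced + 1]


# A run that balances by the third sample, and one that keeps ringing.
settled = settling_time([0, 1, 2, 3],
                        [[6, 0, 0], [4, 1, 1], [2, 2, 2], [2, 2, 2]], tol=0.5)
ringing = settling_time([0, 1], [[6, 0, 0], [0, 6, 0]], tol=0.5)
```

Comparing against the instantaneous mean, rather than a fixed target, keeps the criterion meaningful while tasks are still being processed and the total load is shrinking. A `None` result corresponds to the runs in which no settling time could be achieved.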
Fig. 16. Experimental response of the load balancing algorithm under large delays. Gain = 0.4 and p_ij = 0.5.
Fig. 17. Experimental response of the load balancing algorithm under large delays. Gain = 0.8 and p_ij = 0.5.
However, when the observed network delays were stable, the system response was steady and converged quickly to a balanced state when the gain was equal to 0.8 (Fig. 15). As previously mentioned, this scenario was not frequently seen. The system behaviors in these experiments do not exactly match the results obtained in the previous sections. This is due to the difference in network topology and delays (and very likely due to the random nature of the delay [17], [18]). For instance, the ratio between the average delay and the task process time is 20 for the local area network (LAN) setting and 12 (120 ms/10 ms) for the distributed setting. This fact is one of the reasons ringing is observed earlier (for a gain of 0.6) in the LAN experiment, whereas under PlanetLab the ringing responses were observed starting at a gain of 0.8. Interestingly, we have previously observed a similar behavior, using a Monte Carlo simulation of a 3-node distributed system with random delays in [17] as well as experiments on a wireless LAN in [18].
The previous experiments have been conducted under the normal network conditions stated in the table given in Fig. 11. However, another set of experiments was carried out in which the network conditions worsened and larger delays were observed. In particular, the data transmission rate between node 2 (Taiwan) and node 3 (Germany) dropped from 1.03 KB/s to 407 B/s. Figs. 16 and 17 show the system responses for gains 0.4 and 0.8, respectively. These experiments clearly show the negative effect of the delay on the stability of the system.
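The delay-to-processing-time ratio quoted for the distributed setting is plain arithmetic on the two averages given in the text, and it has a direct reading: it is roughly the number of tasks a node completes while one message is in flight, so queue-size information is stale by about that many tasks when it arrives. The helper name below is hypothetical.

```python
def delay_to_processing_ratio(avg_delay_ms, task_time_ms):
    """Average network delay expressed in units of one task's processing time."""
    return avg_delay_ms / task_time_ms


# Distributed (PlanetLab) setting: 120 ms average delay, 10 ms per task.
distributed_ratio = delay_to_processing_ratio(120.0, 10.0)
```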
Fig. 18. Experimental response of the load balancing algorithm under large variance in the task processing time. Gain = 0.3 and p_ij = 0.5.
Fig. 19. Experimental response of the load balancing algorithm under large variance in the task processing time. Gain = 0.8 and p_ij = 0.5.
Nevertheless, we see that with a low gain of 0.4, the settling time is approximately 22 ms. On the other hand, when the gain was set to 0.8, the system did not reach an equilibrium, as shown by the nodes’ responses in Fig. 17. At this point, only the effect of delay on the stability of the system was tested. In order to study the effect of the variability of the task processing time on system behavior, the matrix multiplication application was adjusted so that the average task processing time was kept at 10.2 ms, but the standard deviation became 7.15 ms instead of 2.5 ms. Figs. 18 and 19 show the respective system responses for gains 0.3 and 0.8. Comparing Figs. 12 and 18, we can see that in the latter case some ringing persists and the system response did not completely settle out. On the other hand, setting the gain to 0.8 led the system to accommodate the variances in the task processing time.
The experiments presented in this section support the ones reported in the previous section where a local area network was used. In particular, high gains were shown to lead to persistent ringing. Conversely, systems with low gain values lead to slow responses in the load balancing.
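To put the two task-time distributions used in the variability experiment on a common scale, one can compare their coefficients of variation (standard deviation divided by mean). This statistic is not reported in the text; it is computed here only from the stated mean of 10.2 ms and the two stated standard deviations, and the function name is hypothetical.

```python
def coefficient_of_variation(mean_ms, std_ms):
    """Relative spread of the task processing time (dimensionless)."""
    return std_ms / mean_ms


cv_baseline = coefficient_of_variation(10.2, 2.5)    # roughly 0.25
cv_high_var = coefficient_of_variation(10.2, 7.15)   # roughly 0.70
spread_increase = cv_high_var / cv_baseline          # same as 7.15 / 2.5
```

Under these figures the relative spread of task times grows by a factor of about 2.9 while the mean is unchanged, so any change in the responses can be attributed to variability rather than to the average workload.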
VIII. SUMMARY AND CONCLUSION
A load balancing algorithm was modeled as a nonlinear time-delay system. The model was shown to be consistent in that the total number of tasks was conserved and the queues were always nonnegative. Further, the system was shown to be always stable, but the delays do create a limit on the size of the controller gains in order to ensure performance (fast enough response without oscillatory behavior). Experiments demonstrated a correlation of the continuous time model with the actual implementation. Future work will consider the fact that the load balancing operation involves processor time which is not being used to process tasks. There is a tradeoff between using processor time/network bandwidth and the advantage of distributing the load evenly between the nodes to reduce overall processing time, which has not been fully captured in the present work.
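The two consistency properties summarized above (conservation of tasks and nonnegative queues) can be checked numerically on a toy model. The sketch below is not the continuous-time model of this paper: it is a simplified discrete-time stand-in written for illustration, in which each node compares its own queue with a stale view of the other queues, sends a fraction `gain` of its estimated excess, and splits that amount equally among the other nodes (1/2 each for three nodes, as in the experiments). The function name, the update rule, and all parameter values are assumptions.

```python
from collections import deque


def simulate(q0, gain, delay, steps):
    """Toy delayed load balancer; returns (queue_trace, in_transit_trace)."""
    if delay < 1 or not 0.0 < gain <= 1.0 or len(q0) < 2:
        raise ValueError("need delay >= 1, 0 < gain <= 1, and at least 2 nodes")
    n = len(q0)
    q = [float(x) for x in q0]
    # Queue snapshots from the last `delay` steps; stale[0] is the oldest.
    stale = deque([list(q) for _ in range(delay)], maxlen=delay)
    # Transfers in flight; pipeline[0] is delivered at the next step.
    pipeline = deque([[0.0] * n for _ in range(delay)], maxlen=delay)
    queue_trace, transit_trace = [], []
    for _ in range(steps):
        arrivals = pipeline.popleft()
        q = [q[i] + arrivals[i] for i in range(n)]
        seen = stale[0]
        outgoing = [0.0] * n
        for i in range(n):
            # Node i knows its own queue exactly but only old values of others.
            est_avg = (q[i] + sum(seen[j] for j in range(n) if j != i)) / n
            excess = q[i] - est_avg
            if excess > 0.0:
                sent = gain * excess          # never more than q[i]
                q[i] -= sent
                for j in range(n):
                    if j != i:
                        outgoing[j] += sent / (n - 1)
        pipeline.append(outgoing)
        stale.append(list(q))
        queue_trace.append(list(q))
        transit_trace.append(sum(sum(batch) for batch in pipeline))
    return queue_trace, transit_trace


queues, in_transit = simulate([60, 30, 10], gain=0.3, delay=3, steps=200)
totals = [sum(q) + t for q, t in zip(queues, in_transit)]
```

In this toy model a node never sends more than it holds, because its estimated excess cannot exceed its own queue length when the gain is at most 1, and every task that leaves a queue is accounted for in the in-flight pipeline until it arrives. Running it with different `gain` and `delay` values is a cheap way to explore the gain and delay tradeoff qualitatively, but it makes no claim to reproduce the experimental responses.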
REFERENCES
[1] E. Altman and H. Kameda, “Equilibria for multiclass routing in multi-agent networks,” in Proc. 40th IEEE Conf. Decision and Control, Orlando, FL, Dec. 2001, pp. 604–609.
[2] C. K. Hisao Kameda, J. Li, and Y. Zhang, Optimal Load Balancing in Distributed Computer Systems. New York: Springer-Verlag, 1997.
[3] H. Kameda, I. R. El-Zoghdy Said Fathy, and J. Li, “A performance comparison of dynamic versus static load balancing policies in a mainframe,” in Proc. 39th IEEE Conf. Decision and Control, Sydney, Australia, Dec. 2000, pp. 1415–1420.
[4] J. D. Birdwell, R. D. Horn, D. J. Icove, T. W. Wang, P. Yadav, and S. Niezgoda, “A hierarchical database design and search method for CODIS,” in Proc. 10th Int. Symp. Human Identification, Orlando, FL, Sep. 1999.
[5] J. D. Birdwell, T. W. Wang, R. D. Horn, P. Yadav, and D. J. Icove, “Method of indexed storage and retrieval of multidimensional information,” in Proc. 10th SIAM Conf. Parallel Processing for Scientific Computation, Sep. 2000.
[6] J. D. Birdwell, T. W. Wang, and M. Rader, “The University of Tennessee’s new search engine for CODIS,” in Proc. 6th CODIS Users Conf., Arlington, VA, Feb. 2001.
[7] T. W. Wang, J. D. Birdwell, P. Yadav, D. J. Icove, S. Niezgoda, and S. Jones, “Natural clustering of DNA/STR profiles,” in Proc. 10th Int. Symp. Human Identification, Orlando, FL, Sep. 1999.
[8] H. G. Rotithor, “Taxonomy of dynamic task scheduling schemes in distributed computing systems,” Inst. Elect. Eng. Proc. Comput. Dig. Techniques, vol. 141, no. 1, pp. 1–10, 1994.
[9] A. Corradi, L. Leonardi, and F. Zambonelli, “Diffusive load-balancing policies for dynamic applications,” IEEE Concurrency, vol. 22, no. 1, pp. 979–993, Jan.–Feb. 1999.
[10] M. H. Willebeek-LeMair and A. P. Reeves, “Strategies for dynamic load balancing on highly parallel computers,” IEEE Trans. Parallel Distrib. Syst., vol. 4, no. 9, pp. 979–993, Sep. 1993.
[11] C. T. Abdallah, J. D. Birdwell, J. Chiasson, V. Chupryna, Z. Tang, and T. W. Wang, “Load balancing instabilities due to time delays in parallel computation,” in Proc. 3rd IFAC Conf. Time Delay Systems, Santa Fe, NM, Dec. 2001, pp. 198–202.
[12] J. D. Birdwell, J. Chiasson, Z. Tang, C. T. Abdallah, M. Hayat, and T. Wang, “Dynamic time delay models for load balancing part I: deterministic models,” in Proc. CNRS-NSF Workshop: Advances in Control of Time-Delay Systems, Paris, France, Jan. 2003, pp. 355–370.
[13] J. D. Birdwell, J. Chiasson, Z. Tang, C. T. Abdallah, and M. M. Hayat, “The effect of feedback gains on the performance of a load balancing network with time delays,” in Proc. IFAC Workshop on Time-Delay Systems (TDS’03), Rocquencourt, France, Sep. 2003, pp. 371–385.
[14] J. D. Birdwell, J. Chiasson, C. T. Abdallah, Z. Tang, N. Alluri, and T. Wang, “The effect of time delays in the stability of load balancing algorithms for parallel computations,” in Proc. 42nd IEEE Conf. Decision and Control, Maui, HI, Dec. 2003, pp. 582–587.
[15] P. Dasgupta, “Performance evaluation of fast ethernet, ATM and myrinet under PVM,” M.S. thesis, Univ. Tennessee, Knoxville, 2001.
[16] P. Dasgupta, J. D. Birdwell, and T. W. Wang, “Timing and congestion studies under PVM,” in Proc. 10th SIAM Conf. Parallel Processing for Scientific Computation, Portsmouth, VA, Mar. 2001.
[17] S. Dhakal, B. Paskaleva, M. Hayat, E. Schamiloglu, and C. Abdallah, “Dynamical discrete-time load balancing in distributed systems in the presence of time delays,” in Proc. IEEE Conf. Decision Control, Maui, HI, Dec. 2003, pp. 5128–5134.
[18] J. Ghanem, S. Dhakal, M. M. Hayat, H. Jerez, C. T. Abdallah, and J. Chiasson, “On load balancing in distributed systems with large time delays: theory and experiments,” in Proc. IEEE Mediterranean Control Conf. (MED’04), Kusadasi, Turkey, Jun. 6–9, 2004.
John Chiasson (S’81–M’85–SM’03) received the B.S. degree in mathematics from the University of Arizona, Tucson, the M.S. degree in electrical engineering from Washington State University, Pullman, and the Ph.D. degree in controls from the University of Minnesota, Minneapolis.
He has worked in industry at Boeing Aerospace, Control Data, and ABB Daimler-Benz Transportation. Since 1999, he has been on the faculty of Electrical and Computer Engineering at the University of Tennessee, Knoxville.
Zhong Tang (S’99–M’05) received the B.S. and M.S. degrees in automatic control engineering from Huazhong University of Science and Technology, Wuhan, China, in 1994 and 1997, respectively, and the Ph.D. degree in electrical engineering from the University of Tennessee, Knoxville, in 2005.
His research interests include control systems, parallel databases, and distributed computing.
Jean Ghanem received the B.S. degree in computer and communication engineering from the American University of Beirut and the M.S. degree in computer engineering from the University of New Mexico, Albuquerque.
His research focused on the analysis and implementation of load balancing policies in large time-delay networks, mainly the Internet and wireless networks. He has also performed work related to securing 802.11 wireless networks. He has software development experience in network, multithreading, database, and parallel programming. He is currently a solution architect consultant for Verizon Communications.
Chaouki T. Abdallah (M’81–SM’85) received the M.S. and Ph.D. degrees in electrical engineering from the Georgia Institute of Technology, Atlanta, in 1982 and 1988, respectively.
He is currently a Professor, Associate Chair, and the Director of the Graduate Program in the Electrical and Computer Engineering Department, University of New Mexico, Albuquerque. He conducts research and teaches courses in the general area of systems theory with focus on control and communications systems. His research has been funded by national funding agencies (NSF, AFOSR, NRL), national laboratories (SNL, LANL), and by various companies (Boeing, HP). He has also been active in designing and implementing various international graduate programs with Latin American and European countries. He was a cofounder in 1990 of the ISTEC consortium, which currently includes more than 150 universities in the US, Spain, and Latin America. He has coauthored four books, and more than 150 peer-reviewed papers.
Dr. Abdallah’s IEEE professional service credits include serving as an IEEE CSS BOG Appointed Member (2004–2005), as a member of the IEEE CSS Long Range Planning Committee (2004–2005), and as the Program Chair of the IEEE Conference on Decision and Control, Hawaii, 2003. He will also be serving as the General Chair of CDC’08. He is a recipient of the IEEE Millennium medal.
J. Douglas Birdwell (S’73–M’78–SM’85–F’99) received the Ph.D. degree in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1978.
He is currently a Professor of Electrical and Computer Engineering at the University of Tennessee (UT), Knoxville. He joined the faculty at UT in 1978, and is Director of the Laboratory for Information Technologies, which develops secure distributed information systems and analysis tools. His experience includes computer hardware and software applications development, high performance data base design with applications in bioinformatics, parallel computation and load balancing, and artificial intelligence.
Dr. Birdwell has served in a number of positions in the IEEE Control Systems Society, including President (2004), member of the Board of Governors (1990–2001), General Chair of IEEE Conference on Decision and Control (1998), and Associate Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL.
Majeed M. Hayat (S’89–M’92–SM’00) was born in Kuwait in 1963. He received the B.S. degree (summa cum laude) in 1985 in electrical engineering from the University of the Pacific, Stockton, CA, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Wisconsin-Madison, in 1988 and 1992, respectively.
From 1993 to 1996, he worked at the University of Wisconsin-Madison as a Research Associate and co-principal investigator of a project on statistical minefield modeling and detection, which was funded by the Office of Naval Research. In 1996, he joined the faculty of the Electro-Optics Graduate Program and the Department of Electrical and Computer Engineering, University of Dayton, OH. He is currently an Associate Professor in the Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque. His research contributions cover a broad range of topics in statistical communication theory, optoelectronics, signal processing, and applied probability theory including avalanche photodiodes, optical communication systems, image processing, models for distributed computing, as well as infrared and spectral imaging.
Dr. Hayat was a recipient of a 1998 National Science Foundation Early Faculty Career Award. He is a member of SPIE and OSA.
Henry Jérez received the B.Sc. degree in systems engineering from the Private University of Bolivia and the M.Sc. and Ph.D. degrees in computer engineering from the University of New Mexico, Albuquerque.
He is a Research Scientist with the Corporation for National Research Initiatives, Reston, VA. His research in the area of distributed digital object repositories, load balancing, and wireless communications has been funded by AISTI, Hewlett Packard, Microsoft and Sun Microsystems Latin America. His primary work has been in the area of digital libraries and distributed digital object repositories as a member of the Prototyping team for the Research library at Los Alamos National Laboratory working on LANL’s new Digital Object Storage Architecture. He has served as faculty member, conference reviewer, enterprise consultant and international advisor to several companies and universities across Latin and North America, and is part of the advisory board for the Digital Library Linkages Initiative from the Ibero American Science and Technology Education Consortium ISTEC.