
932 IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 13, NO. 6, NOVEMBER 2005

The Effect of Time Delays on the Stability of Load Balancing Algorithms for Parallel Computations

John Chiasson, Senior Member, IEEE, Zhong Tang, Member, IEEE, Jean Ghanem, Chaouki T. Abdallah, Senior Member, IEEE, J. Douglas Birdwell, Fellow, IEEE, Majeed M. Hayat, Senior Member, IEEE, and Henry Jérez

Abstract—A deterministic dynamic nonlinear time-delay system is developed to model load balancing in a cluster of computer nodes used for parallel computations. The model is shown to be self-consistent in that the queue lengths cannot go negative and the total number of tasks in all the queues and the network is conserved (i.e., load balancing can neither create nor lose tasks). Further, it is shown that, using the proposed load balancing algorithms, the system is stable in the sense of Lyapunov. Experimental results are presented and compared with the predicted results from the analytical model. In particular, simulations of the model are compared with an experimental implementation of the load balancing algorithm on a distributed computing network.

Index Terms—Computer networks, load balancing, time delay systems.

Manuscript received April 11, 2004. Manuscript received in final form May 25, 2005. Recommended by Associate Editor J. Sarangapani. The work of C. T. Abdallah, J. Ghanem, and M. M. Hayat was supported in part by the National Science Foundation under Information Technology Grants ANI-0312611 and INT-9818312. The work of J. D. Birdwell, J. Chiasson, and Z. Tang was supported in part by the National Science Foundation under ITR Grant ANI-0312182 and in part by the U.S. Department of Justice, Federal Bureau of Investigation under Contract J-FBI-98-083. J. D. Birdwell and J. Chiasson were also supported in part by a Challenge Grant Award from the Center for Information Technology Research at the University of Tennessee. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Government.

J. Chiasson, Z. Tang, and J. D. Birdwell are with the Electrical and Computer Engineering Department, University of Tennessee, Knoxville, TN 37996 USA (e-mail: chiasson@utk.edu; ztang@utk.edu; birdwell@utk.edu).

J. Ghanem was with the Electrical and Computer Engineering Department, University of New Mexico, Albuquerque, NM 87131 USA. He is now with Verizon Communications.

C. T. Abdallah, M. M. Hayat, and H. Jérez are with the Electrical and Computer Engineering Department, University of New Mexico, Albuquerque, NM 87131 USA (e-mail: chaouki@ece.unm.edu; hayat@ece.unm.edu; hjerez@ece.unm.edu).

Digital Object Identifier 10.1109/TCST.2005.854339

I. INTRODUCTION

DISTRIBUTED computing architectures utilize a set of computational elements (CEs) to achieve performance that is not attainable on a single CE. A common architecture is the cluster of otherwise independent computers communicating through a shared network. To make use of parallel computing resources, problems must be broken down into smaller units that can be solved individually by each CE while exchanging information with CEs solving other problems. For a background on mathematical treatments of load balancing, the reader is referred to [1]–[3]. As an example of such a problem, the Federal Bureau of Investigation (FBI) National DNA Index System (NDIS) and Combined DNA Index System (CODIS) software are candidates for parallelization. New methods developed by Wang et al. [4]–[7] lead naturally to a parallel decomposition of the DNA database search problem while providing orders of magnitude improvements in performance over the current release of the CODIS software. In this type of application, the search itself, initiated on any particular node, can initiate subsequent new searches which are added to the node's queue. Consequently, it is of great advantage to the overall system to carry out load balancing to make effective use of the overall computational resources. The projected growth of the NDIS database and the demand for searches of the database can be met by migration to a parallel computing platform.

Effective utilization of a parallel computer architecture requires the computational load to be distributed, more or less, evenly over the available CEs. The qualifier "more or less" is used because the communications required to distribute the load consume both computational resources and network bandwidth. A point of diminishing returns therefore exists.

Distribution of computational load across available resources is referred to as the load balancing problem in the literature. Various taxonomies of load balancing algorithms exist [8]. Direct methods examine the global distribution of computational load and assign portions of the workload to resources before processing begins. Iterative methods examine the progress of the computation and the expected utilization of resources, and adjust the workload assignments periodically as computation progresses. Assignment may be deterministic, as with the dimension exchange/diffusion [9] and gradient methods, stochastic, or optimization based. A comparison of several deterministic methods is provided by Willebeek-LeMair and Reeves [10]. Here, a deterministic model is developed.

The present work focuses on the effects of delays in the exchange of information among CEs, and on the constraints these effects impose on the design of a load balancing strategy. Motivated by the authors' previous work in [11] and [12], a new nonlinear model is developed here (see also [13]). Specifically, a deterministic dynamic nonlinear time-delay system is developed to model load balancing. The model is shown to be self-consistent in that the queue lengths cannot go negative and the total number of tasks in all the queues and the network is conserved (i.e., load balancing can neither create nor lose tasks). Further, it is shown that the closed-loop system under the controller proposed here is asymptotically stable in the sense of Lyapunov. Simulations of the nonlinear model are compared with an experimental implementation of the load balancing algorithm performed on a distributed computing network.

1063-6536/$20.00 © 2005 IEEE

Section II presents our approach to modeling the computer network and load balancing algorithm so as to incorporate the presence of delay in the communication between nodes and in task transfers. In Section III, we show that the model captures the nonnegativity of the queue lengths as well as the fact that the totality of tasks in all the queues and in transit is conserved by the load balancing algorithm. Section IV shows that the system is asymptotically stable in the sense of Lyapunov for any choice of positive gains in the load balancing algorithm (controller). Section V presents simulations of the nonlinear model for comparison with the actual experimental data. Section VI presents experimental results from an implementation of the load balancing controller on a parallel computer consisting of a networked cluster of nodes. Section VII presents experiments conducted over a geographically dispersed distributed environment (i.e., PlanetLab), in which both the effects of the network delays and of the variance in the task processing time on the behavior of the system are assessed. Finally, Section VIII is a summary and conclusion of the present work and a discussion of future work.

II. MATHEMATICAL MODEL

Continuous time models are developed in this section that model the load balancing dynamics among a network of computers. To introduce the approach, consider a computing network consisting of $n$ computers (nodes), all of which can communicate with each other. At start up, the computers are assigned an equal number of tasks, each of which has essentially the same processing time (homogeneous tasks). However, in some applications, when a node executes a particular task it can, in turn, generate more tasks, so that very quickly the loads on various nodes become unequal. To balance the loads, each computer in the network sends (broadcasts) its queue size $q_j(t)$ to all other computers in the network. A node $i$ receives this information from node $j$ delayed by a finite amount of time $\tau_{ij}$ (with the convention $\tau_{ii} = 0$); that is, it receives $q_j(t - \tau_{ij})$. Each node $i$ then uses this information to compute its estimate of the network average of the number of tasks in all queues of the network. Based on the most recent observations, the simple (local) estimate of the network average is computed by the $i$th node as $\frac{1}{n}\sum_{j=1}^{n} q_j(t - \tau_{ij})$. Node $i$ then compares its queue size $q_i(t)$ with its estimate of the network average by estimating its excess load, $q_i(t) - \frac{1}{n}\sum_{j=1}^{n} q_j(t - \tau_{ij})$. If its excess load is greater than zero or some positive threshold, the node sends some of its tasks to the other nodes. If it is less than zero, no tasks are sent. Further, the tasks sent by node $i$ are received by node $j$ with a delay $h_{ji}$. The controller (load balancing algorithm) decides how often to do load balancing (transfer tasks among the nodes) and how many tasks are to be sent to each node.
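The local average and excess-load test just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes node $i$ already holds the delayed reports $q_j(t-\tau_{ij})$ in a list (with its own current queue size in position $i$), and the `threshold` argument is a hypothetical stand-in for the "some positive threshold" mentioned above.

```python
from typing import Sequence

def local_average(reports: Sequence[float]) -> float:
    """Node i's estimate (1/n) * sum_j q_j(t - tau_ij) of the network average.

    reports[j] is the latest queue size node i has received from node j; the
    entry for node i itself is its current queue size (tau_ii = 0).
    """
    return sum(reports) / len(reports)

def excess_load(own_queue: float, reports: Sequence[float]) -> float:
    """q_i(t) minus the local average; positive means node i holds extra tasks."""
    return own_queue - local_average(reports)

def should_send(own_queue: float, reports: Sequence[float], threshold: float = 0.0) -> bool:
    """Send only if the excess load exceeds a (hypothetical) threshold."""
    return excess_load(own_queue, reports) > threshold

# Node 1 holds 6000 tasks and has delayed reports of 4000 and 2000 from nodes 2 and 3.
reports = [6000, 4000, 2000]
print(excess_load(reports[0], reports))   # 2000.0
print(should_send(reports[2], reports))   # False
```

Only nodes above their own delayed estimate of the average act as senders; a node below it simply waits to receive tasks.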

The mathematical model of the task load dynamics at a given computing node is given by

$$
\begin{aligned}
\frac{dx_i(t)}{dt} &= \lambda_i - \mu_i + u_i(t) - \sum_{j=1}^{n} p_{ji}\,\frac{t_{p_i}}{t_{p_j}}\,u_j(t - h_{ij})\\
u_i(t) &= -K_i\,\mathrm{sat}\!\left(x_i(t) - \frac{1}{n}\sum_{j=1}^{n} x_j(t - \tau_{ij})\right)
\end{aligned}
\tag{1}
$$

where $K_i > 0$, the $p_{ij} \ge 0$ satisfy $p_{ii} = 0$ and $\sum_{j=1}^{n} p_{ij} = 1$, and

$$
\mathrm{sat}(y) = \begin{cases} y & \text{if } y > 0\\ 0 & \text{if } y \le 0. \end{cases}
$$

Further, in this model, we define the following.

• $x_i(t)$ is the expected waiting time experienced by a task inserted into the queue of the $i$th node. With $q_i(t)$ the number of tasks in the $i$th queue and $t_{p_i}$ the average time needed to process a task on the $i$th node, the expected (average) waiting time is given by $x_i(t) = q_i(t)\,t_{p_i}$. Note that $x_i/t_{p_i} = q_i$ is the number of tasks in the queue of node $i$. If these tasks were transferred to node $j$, then the waiting time transferred is $q_i t_{p_j} = x_i t_{p_j}/t_{p_i}$, so that the fraction $t_{p_j}/t_{p_i}$ converts waiting time on node $i$ to waiting time on node $j$.

• $\lambda_i \ge 0$ is the rate of generation of waiting times on the $i$th node caused by the addition of tasks (rate of increase in $x_i$).

• $\mu_i \ge 0$ is the rate of reduction in waiting time caused by the service of tasks at the $i$th node and is given by $\mu_i = t_{p_i}/t_{p_i} = 1$ for all $i$ if $x_i(t) > 0$ (one task of duration $t_{p_i}$ is completed every $t_{p_i}$ seconds), while if $x_i(t) = 0$ then $\mu_i = 0$; that is, if there are no tasks in the queue, then the queue cannot possibly decrease.

• $u_i(t)$ is the rate of removal (transfer) of the tasks from node $i$ at time $t$ by the load balancing algorithm at node $i$. Note that $u_i(t) \le 0$.

• $p_{ij}$ is the fraction of the $i$th node's tasks to be sent out that it sends to the $j$th node. In more detail, $-p_{ij}u_i(t)$ is the rate at which node $i$ sends waiting time (tasks) to node $j$ at time $t$ where, as all the tasks must go to some node, one requires that $\sum_{j=1}^{n} p_{ij} = 1$. That is, the transfer from node $i$ of expected waiting time (tasks) $-\int_{t_1}^{t_2} u_i(t)\,dt$ in the interval of time $[t_1, t_2]$ to the other nodes is carried out with the $j$th node receiving the fraction $-p_{ij}\frac{t_{p_j}}{t_{p_i}}\int_{t_1}^{t_2} u_i(t)\,dt$, where the ratio $t_{p_j}/t_{p_i}$ converts the task from waiting time on node $i$ to waiting time on node $j$. As $\sum_{j=1}^{n}\left(-p_{ij}\int_{t_1}^{t_2} u_i(t)\,dt\right) = -\int_{t_1}^{t_2} u_i(t)\,dt$, this results in removing all of the excess waiting time from node $i$.

• The quantity $-p_{ji}\frac{t_{p_i}}{t_{p_j}}u_j(t - h_{ij})$ is the rate of increase (rate of transfer) of the expected waiting time (tasks) at time $t$ from node $j$ by (to) node $i$, where $h_{ij} \ge 0$ ($h_{ii} = 0$) is the time delay for the task transfer from node $j$ to node $i$.

In this model, all rates are in units of the rate of change of expected waiting time, or time/time, which is dimensionless. This normalization of the queue length (i.e., $x_i = q_i t_{p_i}$) at each node by the local average processing time $t_{p_i}$ is simply a way to account for unequal task processing rates by each node. As $u_i(t) \le 0$, node $i$ can only send tasks to other nodes and cannot initiate transfers from another node to itself. A delay is experienced by transmitted tasks before they are received at the other node. The control law $u_i(t)$ in (1) states that if the $i$th node output $x_i(t)$ is above its estimate of the network average $\frac{1}{n}\sum_{j=1}^{n} x_j(t - \tau_{ij})$, then it sends data to the other nodes, while if it is less than this average nothing is sent. The $j$th node receives the fraction $p_{ij}$ of the transferred waiting time (rescaled by $t_{p_j}/t_{p_i}$), delayed by the time $h_{ji}$.
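To make the sign conventions in (1) concrete, the control law and the delayed inflow term can be written as small Python functions. This is a sketch under the conventions used here ($u_i \le 0$, and $p_{ji}$ the fraction of node $j$'s outgoing load addressed to node $i$); the function names are illustrative only.

```python
from typing import Sequence

def sat(y: float) -> float:
    """One-sided saturation used in (1): y if y > 0, else 0."""
    return y if y > 0.0 else 0.0

def control(K_i: float, x_i: float, delayed_x: Sequence[float]) -> float:
    """u_i(t) = -K_i * sat(x_i(t) - (1/n) * sum_j x_j(t - tau_ij)); never positive.

    delayed_x[j] is x_j(t - tau_ij); the entry for node i itself is x_i(t).
    """
    avg = sum(delayed_x) / len(delayed_x)
    return -K_i * sat(x_i - avg)

def inflow(p_ji: float, tp_i: float, tp_j: float, u_j_delayed: float) -> float:
    """Growth rate of x_i due to node j: -p_ji * (t_pi / t_pj) * u_j(t - h_ij) >= 0."""
    return -p_ji * (tp_i / tp_j) * u_j_delayed

u1 = control(1000.0, 3.0, [3.0, 2.0, 1.0])
print(u1)                                # -1000.0
print(inflow(0.5, 10e-6, 10e-6, u1))     # 500.0
```

Because $u_j \le 0$, the leading minus sign in `inflow` makes the received rate nonnegative, matching the statement that a node cannot pull tasks from another node.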

A. Specification of the Factors $p_{ij}$

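The model described in (1) is the basic model, but an important detail remains unspecified, namely, the exact form of the $p_{ij}$ for each sending node $i$. One approach is to choose them as constant and equal, that is, $p_{ij} = 1/(n-1)$ for $j \ne i$ and $p_{ii} = 0$. Another approach is to use the local information of the waiting times $x_j(t - \tau_{ij})$ to set their values. With $x^{i}_{\mathrm{avg}}(t) = \frac{1}{n}\sum_{k=1}^{n} x_k(t - \tau_{ik})$ denoting node $i$'s local average, the quantity $x^{i}_{\mathrm{avg}}(t) - x_j(t - \tau_{ij})$ is node $i$'s estimate of the excess (or deficit) waiting time in the queue of node $j$ with respect to the local average of node $i$. If node $j$'s queue is above the local average, then node $i$ does not send tasks to it. Therefore, $\mathrm{sat}\big(x^{i}_{\mathrm{avg}}(t) - x_j(t - \tau_{ij})\big)$ is a measure by node $i$ of how much node $j$ is below the local average. Node $i$ performs this computation for all the other nodes and then portions out its tasks among the other nodes according to the amounts they are below the local average, that is

$$
p_{ij} = \frac{\mathrm{sat}\big(x^{i}_{\mathrm{avg}}(t) - x_j(t - \tau_{ij})\big)}{\sum_{k \ne i}\mathrm{sat}\big(x^{i}_{\mathrm{avg}}(t) - x_k(t - \tau_{ik})\big)}, \qquad j \ne i.
\tag{2}
$$

If the denominator $\sum_{k \ne i}\mathrm{sat}\big(x^{i}_{\mathrm{avg}}(t) - x_k(t - \tau_{ik})\big)$ is zero, then the $p_{ij}$ are defined to be zero and no load is transferred. This is illustrated in Fig. 1 for node 1.

Remark: If the denominator is zero, then $x^{i}_{\mathrm{avg}}(t) - x_j(t - \tau_{ij}) \le 0$ for all $j \ne i$. However, by the definition of the average (recall $\tau_{ii} = 0$),

$$
\sum_{j \ne i}\big(x^{i}_{\mathrm{avg}}(t) - x_j(t - \tau_{ij})\big) + x^{i}_{\mathrm{avg}}(t) - x_i(t) = 0,
$$

which implies $x_i(t) - x^{i}_{\mathrm{avg}}(t) \le 0$. That is, if the denominator is zero, node $i$ is not above the local average, so $u_i(t) = 0$ and it is therefore not sending out any tasks.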
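A direct transcription of (2), including the zero-denominator rule, might look as follows. It is a sketch only: `delayed_x[j]` is assumed to hold $x_j(t-\tau_{ij})$ as seen by node $i$, with `delayed_x[i]` equal to $x_i(t)$ since $\tau_{ii} = 0$.

```python
from typing import List, Sequence

def sat(y: float) -> float:
    """One-sided saturation: y if y > 0, else 0."""
    return y if y > 0.0 else 0.0

def transfer_fractions(i: int, delayed_x: Sequence[float]) -> List[float]:
    """Fractions p_ij from (2) for sending node i.

    delayed_x[j] holds x_j(t - tau_ij) as seen by node i (delayed_x[i] is x_i(t)).
    If no other node is below node i's local average, all fractions are zero.
    """
    n = len(delayed_x)
    avg = sum(delayed_x) / n                       # node i's local average
    deficits = [0.0 if j == i else sat(avg - delayed_x[j]) for j in range(n)]
    total = sum(deficits)
    if total == 0.0:
        return [0.0] * n                           # zero denominator: nothing is sent
    return [d / total for d in deficits]

print(transfer_fractions(0, [9.0, 3.0, 0.0]))  # [0.0, 0.2, 0.8]
```

In the example, the local average is 4, so node 2 (deficit 1) receives one fifth and node 3 (deficit 4) four fifths of what node 1 sends out.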

III. MODEL CONSISTENCY

It is now shown that the model is consistent with actual

working systems in that the queue lengths cannot go negative,

and the load balancing algorithm cannot create or lose tasks; it

can only move tasks between nodes [13], [14].

Fig. 1. Illustration of a hypothetical distribution $x_i(t)$ of the load at some time $t$ from node 1's point of view. Node 1 will send data out to node $j$ in proportion $p_{1j}$ to the amount it estimates node $j$ is below the average, where $\sum_{j=1}^{n} p_{1j} = 1$ and $p_{11} = 0$.

A. Nonnegativity of the Queue Lengths

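To show the nonnegativity of the queue lengths, recall that the queue length of each node is given by $q_i(t) = x_i(t)/t_{p_i}$. The model is rewritten in terms of these quantities as

$$
\frac{dq_i(t)}{dt} = \frac{\lambda_i - \mu_i}{t_{p_i}} + \frac{u_i(t)}{t_{p_i}} - \sum_{j=1}^{n} p_{ji}\,\frac{u_j(t - h_{ij})}{t_{p_j}}.
\tag{3}
$$

Since $t_{p_i} > 0$, $q_i$ and $x_i$ always have the same sign. Given that $\lambda_i \ge 0$ for all $i$, it follows from the right-hand side of (3) that $q_i(t) \ge 0$ for all $t \ge 0$ and all $i$. To see this, suppose without loss of generality that $q_i(t)$ is the first queue to go to zero, and let $t_1$ be the time when $q_i(t_1) = 0$. At the time $t_1$, $\lambda_i - \mu_i = \lambda_i \ge 0$, as $\mu_i = 0$ by the definition of $\mu_i$. Further, the term $-\sum_{j} p_{ji}u_j(t - h_{ij})/t_{p_j} \ge 0$ for all time by the definition of the $p_{ji}$ and because $u_j \le 0$. Finally, the term $u_i(t)$ is negative only if $x_i(t) > \frac{1}{n}\sum_{j} x_j(t - \tau_{ij})$, that is (recall $\tau_{ii} = 0$), only if

$$
\left(1 - \frac{1}{n}\right)x_i(t) > \frac{1}{n}\sum_{j \ne i} x_j(t - \tau_{ij}).
\tag{4}
$$

By supposition, up to time $t_1$ all the $x_j > 0$ for $j \ne i$, so that the right-hand side of (4) is positive at time $t_1$, while its left-hand side is zero. Consequently, $u_i(t_1) = 0$, and at time $t_1$ all terms on the right-hand side of (3) are nonnegative. Further, $q_i$ cannot go negative in a neighborhood of $t_1$. For if it did, as the right-hand side of (4) is continuous, it would follow that

$$
\left(1 - \frac{1}{n}\right)x_i(t) < \frac{1}{n}\sum_{j \ne i} x_j(t - \tau_{ij})
\tag{5}
$$

for all $t \in (t_1, t_1 + \epsilon)$ for some $\epsilon > 0$. Therefore, $u_i(t) = 0$ for all $t \in (t_1, t_1 + \epsilon)$, and the right-hand side of (3) is nonnegative there, which contradicts $q_i$ going negative. Note that $\epsilon > 0$ can be taken to be at least as large as the time at which some other $x_j$ goes to zero, as the right-hand side of (5) must remain positive until then.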


If $q_i$ goes positive after $t_1$, then the previous argument is repeated at the next time a queue goes to zero. If $q_i$ remains identically zero in the interval $(t_1, t_2)$, where $t_2$ is the time at which a second queue $q_j$ goes to zero, then the argument is also similar in that at time $t_2$ both $q_i$ and $q_j$ are zero. As the remaining $n - 2$ nodes are still positive, the inequality (5) continues to hold with both $x_i$ and $x_j$ zero at time $t_2$, and one again gets a contradiction if either $q_i$ or $q_j$ goes negative in an interval $(t_2, t_2 + \epsilon)$. Continuing in this manner, it follows that $q_i(t)$ cannot go negative for any $i$.

B. Conservation of Queue Lengths

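It is now shown that the total number of tasks in all the queues and the network is conserved. To do so, sum up (3) from $i = 1$ to $n$ to obtain

$$
\sum_{i=1}^{n}\frac{dq_i(t)}{dt} = \sum_{i=1}^{n}\frac{\lambda_i - \mu_i}{t_{p_i}} + \sum_{i=1}^{n}\frac{u_i(t)}{t_{p_i}} - \sum_{i=1}^{n}\sum_{j=1}^{n} p_{ji}\,\frac{u_j(t - h_{ij})}{t_{p_j}}
\tag{6}
$$

which is the rate of change of the total queue lengths on all the nodes. However, the network itself also contains tasks in transit between nodes. The dynamic model of the queue lengths in the network is given by

$$
\frac{dq_{\mathrm{net}_{ij}}(t)}{dt} = -p_{ji}\,\frac{u_j(t)}{t_{p_j}} + p_{ji}\,\frac{u_j(t - h_{ij})}{t_{p_j}}.
\tag{7}
$$

Here $q_{\mathrm{net}_{ij}}(t)$ is the number of tasks put on the network that are being sent from node $j$ to node $i$. This equation simply says that the $j$th node is putting tasks on the network to be sent to node $i$ at the rate $-p_{ji}u_j(t)/t_{p_j}$, while the $i$th node is taking these tasks from node $j$ off the network at the rate $-p_{ji}u_j(t - h_{ij})/t_{p_j}$. Summing (7) over all the nodes and using $\sum_{i=1}^{n} p_{ji} = 1$, one obtains

$$
\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{dq_{\mathrm{net}_{ij}}(t)}{dt} = -\sum_{j=1}^{n}\frac{u_j(t)}{t_{p_j}} + \sum_{i=1}^{n}\sum_{j=1}^{n} p_{ji}\,\frac{u_j(t - h_{ij})}{t_{p_j}}.
\tag{8}
$$

Adding (6) and (8), one obtains the conservation of queue lengths given by

$$
\frac{d}{dt}\left(\sum_{i=1}^{n} q_i(t) + \sum_{i=1}^{n}\sum_{j=1}^{n} q_{\mathrm{net}_{ij}}(t)\right) = \sum_{i=1}^{n}\frac{\lambda_i - \mu_i}{t_{p_i}}.
\tag{9}
$$

In words, the total number of tasks in the system (i.e., in the nodes and/or in the network) can increase only by the rate of arrival of tasks $\sum_{i}\lambda_i/t_{p_i}$ at all the nodes, or similarly, decrease only by the rate of processing of tasks $\sum_{i}\mu_i/t_{p_i}$ at all the nodes. The load balancing itself cannot increase or decrease the total number of tasks in all the queues and in transit.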
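The bookkeeping behind (6)–(9) can be checked numerically. The sketch below integrates (3) and (7) with forward Euler in task units and compares the tasks in queues plus in transit with a ledger of tasks generated and served. For brevity it assumes identical $t_{p_i}$ on all nodes (so $-u_i/t_{p_i} = K_i\,\mathrm{sat}(q_i - \frac{1}{n}\sum_j q_j(t-\tau_{ij}))$ tasks per second), a common gain, $p_{ij} = 1/(n-1)$, and delays that are whole multiples of the step; none of these choices come from the text.

```python
def conservation_check(q0, K, lam, tp, d_tau, d_h, dt, steps):
    """Forward-Euler sketch of (3) and (7) in task units.

    Assumptions (for illustration only): identical t_p on every node, a common
    gain K, p_ij = 1/(n-1) for i != j, and delays tau, h equal to d_tau, d_h steps.
    Returns (totals, ledger): tasks in queues plus in transit at each step, and
    sum(q0) + tasks generated - tasks served up to that step.
    """
    n = len(q0)
    p = 1.0 / (n - 1)
    q_hist = [list(q0)]
    sent = []                              # sent[k][j]: tasks/s node j puts on the network
    net = [[0.0] * n for _ in range(n)]    # net[i][j]: tasks in transit from j to i
    generated = served = 0.0
    totals = [float(sum(q0))]
    ledger = [float(sum(q0))]
    for k in range(steps):
        q = q_hist[k]
        q_old = q_hist[max(k - d_tau, 0)]  # delayed reports q_j(t - tau)
        s = []
        for i in range(n):
            avg = (q[i] + sum(q_old[j] for j in range(n) if j != i)) / n
            s.append(K * max(q[i] - avg, 0.0))          # equals -u_i / t_p
        sent.append(s)
        landed = sent[k - d_h] if k >= d_h else [0.0] * n   # sent h earlier
        mu = [1.0 if qi > 0.0 else 0.0 for qi in q]
        q_next = []
        for i in range(n):
            inflow = sum(p * landed[j] for j in range(n) if j != i)
            q_next.append(q[i] + dt * ((lam - mu[i]) / tp - s[i] + inflow))
            for j in range(n):
                if j != i:
                    net[i][j] += dt * p * (s[j] - landed[j])
        q_hist.append(q_next)
        generated += dt * n * lam / tp
        served += dt * sum(mu) / tp
        totals.append(sum(q_next) + sum(map(sum, net)))
        ledger.append(sum(q0) + generated - served)
    return totals, ledger

totals, ledger = conservation_check([600.0, 400.0, 200.0], K=1000.0, lam=0.5,
                                    tp=10e-6, d_tau=200, d_h=400, dt=1e-6, steps=3000)
print(max(abs(a - b) for a, b in zip(totals, ledger)))  # floating-point rounding only
```

The transfer terms cancel between the queue and network updates exactly as in (6)+(8), so any mismatch between `totals` and `ledger` is rounding error.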

IV. STABILITY OF THE CONTROLLER

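The controller in the model (1) is

$$
u_i(t) = -K_i\,\mathrm{sat}\!\left(x_i(t) - \frac{1}{n}\sum_{j=1}^{n} x_j(t - \tau_{ij})\right)
$$

where the gains $K_i > 0$ are to be specified. Physically, these gains are limited by the bandwidth constraints in the network. One can also view the $p_{ij}$ as controller parameters to be specified subject to the constraints given previously.

Interestingly, it turns out that the system (1) is asymptotically stable in the sense of Lyapunov for any set of gains $K_i > 0$ and any set of $p_{ij}$ with $p_{ij} \ge 0$ and $\sum_{j=1}^{n} p_{ij} = 1$. Specifically, we have the following theorem.

Theorem: Given the system described by (1) and (7) with $\lambda_i = 0$ for $i = 1, \dots, n$ and initial conditions $x_i(0) \ge 0$, then $q_i(t) \to 0$ and $q_{\mathrm{net}_{ij}}(t) \to 0$ as $t \to \infty$.

Proof: First note that the $q_{\mathrm{net}_{ij}}$ are nonnegative since, by (7) with no tasks in transit at $t = 0$ (i.e., $u_j(t) = 0$ for $t < 0$), it follows that

$$
q_{\mathrm{net}_{ij}}(t) = -\int_{t - h_{ij}}^{t} p_{ji}\,\frac{u_j(\sigma)}{t_{p_j}}\,d\sigma \ge 0.
\tag{10}
$$

Under the conditions of the theorem, (9) becomes

$$
\frac{d}{dt}\left(\sum_{i=1}^{n} q_i(t) + \sum_{i=1}^{n}\sum_{j=1}^{n} q_{\mathrm{net}_{ij}}(t)\right) = -\sum_{i=1}^{n}\frac{\mu_i(t)}{t_{p_i}}.
\tag{11}
$$

Let

$$
V(t) = \sum_{i=1}^{n} q_i(t) + \sum_{i=1}^{n}\sum_{j=1}^{n} q_{\mathrm{net}_{ij}}(t).
$$

As the $q_i$ and $q_{\mathrm{net}_{ij}}$ are nonnegative, $V(t) \ge 0$, and $V(t) = 0$ if and only if $q_i(t) = 0$ and $q_{\mathrm{net}_{ij}}(t) = 0$ for every $i, j$. Further, as $\mu_i(t) = 1$ if $q_i(t) > 0$ and $\mu_i(t) = 0$ if $q_i(t) = 0$, it follows that

$$
\frac{dV(t)}{dt} = -\sum_{i=1}^{n}\frac{\mu_i(t)}{t_{p_i}} \le 0
\tag{12}
$$

with $dV/dt = 0$ if and only if $q_i(t) = 0$ for all $i$. This then implies that $V(t)$ is monotonically nonincreasing. As $V(t)$ is bounded below, we have $V(t) \to V_\infty \ge 0$, or

$$
\int_{0}^{\infty}\sum_{i=1}^{n}\frac{\mu_i(t)}{t_{p_i}}\,dt = V(0) - V_\infty < \infty.
\tag{13}
$$

The quantity $\mu_i(t)$ is either 1 or 0 depending on whether $q_i(t) > 0$ or $q_i(t) = 0$, so $\mu_i(t)$ can be viewed as a set of pulses of unit height and varying width. The integral $\int_0^\infty \mu_i(t)\,dt$ is finite by (13), which implies that the total width of the unit-height pulses making up $\mu_i(t)$ on $[T, \infty)$ must go to zero as $T \to \infty$. So, even if $q_i(t)$ continues to switch between zero and positive values, the time intervals for which it is nonzero must shrink to zero as $t \to \infty$. More precisely, let $S_i(T) = \{t \ge T : q_i(t) > 0\}$; then the Lebesgue measure of $S_i(T)$, denoted by $m(S_i(T))$, converges to 0 as $T \to \infty$. Summarizing, the $q_i$ are nonnegative, continuous functions, bounded by the nonnegative monotonically nonincreasing function $V$, and the time intervals for which the $q_i$ are nonzero go to zero as $t \to \infty$.

Further, as $u_j(t) \le 0$ is always true and $u_j(t) < 0$ necessarily implies $q_j(t) > 0$, it follows that the time intervals for which the bounded functions $u_j$ are nonzero must also go to zero as $t \to \infty$. (The $u_j$ are bounded because $0 \le -u_j(t) \le K_j x_j(t) = K_j t_{p_j} q_j(t) \le K_j t_{p_j} V(0)$.) Thus, the integral in (10) can be upper bounded by

$$
q_{\mathrm{net}_{ij}}(t) \le p_{ji}\,K_j\,V(0)\, m\big(S_j(t - h_{ij})\big),
$$

which converges to zero as $t \to \infty$. Consequently, $q_{\mathrm{net}_{ij}}(t) \to 0$ as $t \to \infty$.

We now show that the monotonically nonincreasing function $V(t)$ must go to zero, that is, $V_\infty = 0$. Suppose not, so that $V_\infty > 0$. As $q_{\mathrm{net}_{ij}}(t) \to 0$, choose $T$ large enough so that $\sum_{i,j} q_{\mathrm{net}_{ij}}(t) < V_\infty/2$ for $t \ge T$. Since $V(t) \ge V_\infty$ for all $t$, we have $\sum_{i} q_i(t) > V_\infty/2 > 0$ for $t \ge T$. For every $t \ge T$, there exists at least one $i$ (which depends on $t$) for which $q_i(t) > 0$, and hence $\mu_i(t) = 1$. Therefore, $\sum_i \mu_i(t)/t_{p_i} \ge 1/t_{p,\max}$ for all $t \ge T$, where $t_{p,\max} = \max_i t_{p_i}$. By (11), we then have

$$
V(t) \le V(T) - \frac{t - T}{t_{p,\max}}, \qquad t \ge T.
\tag{14}
$$

As the right-hand side of (14) eventually becomes negative, we have a contradiction, and therefore $V_\infty = 0$. As it has already been shown that $q_{\mathrm{net}_{ij}}(t) \to 0$ for all $i, j$, this then implies that $q_i(t) \to 0$ for all $i$. This completes the proof of the theorem.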

V. SIMULATION RESULTS

Experimental procedures to determine the delay values are given in [15] and summarized in [16]. These give representative values for a Fast Ethernet network with three nodes of $\tau_{ij} = 200\ \mu\mathrm{s}$ for $i \ne j$, $\tau_{ii} = 0$, and $h_{ij} = 400\ \mu\mathrm{s}$ for $i \ne j$, $h_{ii} = 0$. The initial conditions were $x_1(0) = 0.6$, $x_2(0) = 0.4$, and $x_3(0) = 0.2$. The inputs were set as $\lambda_i = 0$, $\mu_i = 1$. The $t_{p_i}$'s were taken to be equal to $10\ \mu\mathrm{s}$.

In this set of simulations, the model (1) is used. Figs. 2 and 3 show the responses with the gains set as $K_1 = K_2 = K_3 = 1000$ and as $K_1 = 6667$, $K_2 = 4167$, $K_3 = 5000$, respectively.
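The simulation just described can be approximated with a forward-Euler sketch of (1) that stores the past states needed for the delays. It assumes the Section V values (three nodes, $\tau_{ij} = 200$ μs, $h_{ij} = 400$ μs, $K_i = 1000$, $t_{p_i} = 10$ μs, $p_{ij} = 1/2$), a 1 μs step, and that the history before $t = 0$ equals the initial condition with nothing in flight; it is not the code used to produce Figs. 2 and 3.

```python
def simulate_model1(x0, K, p, tp, tau, h, dt, t_end, lam=0.0):
    """Forward-Euler sketch of model (1) with the same tau and h for every pair i != j.

    x0: initial waiting times (s); K: gains (1/s); p[i][j]: fraction of node i's
    outgoing load sent to node j; tp: average task times (s). History before t = 0
    is taken as x0, with nothing in flight. Returns the list of states x(k*dt).
    """
    n = len(x0)
    d_tau, d_h = round(tau / dt), round(h / dt)
    xs = [list(x0)]
    us = []                                   # us[k][i] = u_i(k*dt) <= 0
    for k in range(round(t_end / dt)):
        x, x_old = xs[k], xs[max(k - d_tau, 0)]
        u = []
        for i in range(n):
            avg = (x[i] + sum(x_old[j] for j in range(n) if j != i)) / n
            u.append(-K[i] * max(x[i] - avg, 0.0))
        us.append(u)
        u_old = us[k - d_h] if k >= d_h else [0.0] * n
        x_next = []
        for i in range(n):
            mu = 1.0 if x[i] > 0.0 else 0.0
            inflow = -sum(p[j][i] * tp[i] / tp[j] * u_old[j] for j in range(n))
            x_next.append(x[i] + dt * (lam - mu + u[i] + inflow))
        xs.append(x_next)
    return xs

n = 3
p = [[0.0 if i == j else 0.5 for j in range(n)] for i in range(n)]
xs = simulate_model1([0.6, 0.4, 0.2], [1000.0] * n, p, [10e-6] * n,
                     tau=200e-6, h=400e-6, dt=1e-6, t_end=20e-3)
print("spread after 20 ms:", max(xs[-1]) - min(xs[-1]))
```

With $K_i \Delta t = 10^{-3}$ per step the Euler scheme is well resolved; a smaller step would only be needed for much larger gains.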

Fig. 2. Constant $p_{ij}$ nonlinear output responses with $K_i = 1000$.

Fig. 3. Nonlinear simulation with constant $p_{ij}$ and $K_1 = 6666.7$; $K_2 = 4166.7$; $K_3 = 5000$.

VI. EXPERIMENTAL RESULTS

A parallel machine has been built to implement an experimental facility for evaluation of load balancing strategies and parallel databases. A root node communicates with groups of computer networks. Each of these groups is composed of $n$ nodes (hosts) holding identical copies of a portion of the database. Any pair of groups corresponds to different databases, which are not necessarily disjoint. A specific record (or DNA profile in our specific case) is in general stored in two groups for redundancy to protect against failure of a node. Within each node, there are either one or two processors. In the experimental facility, the dual processor machines use 1.4 GHz Athlon MP processors, and the single processor machines use 1.3 GHz Athlon processors. All run the Linux operating system. Our interest here is in the load balancing in any one group of $n$ nodes.

The database is implemented as a set of queues with associated search engine threads, typically assigned one per node of the parallel machine. The search requests are created not only by the database clients; the search process also creates search requests as the index tree is descended by any search thread. This creates the opportunity for parallelism; search requests that await processing may be placed in any queue associated with a search engine, and the contents of these queues may be moved arbitrarily among the processing nodes of a group to achieve a balance of the load.

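An important point is that the actual delays experienced by the network traffic in the parallel machine are random. Work has been performed to characterize the bandwidth and delay on unloaded and loaded network switches, in order to identify the delay parameters of the analytic models, and is reported in [15] and [16]. The value $\tau = 200\ \mu\mathrm{s}$ used for simulations represents an average value for the delay and was found using the procedure described in [16]. The interest here is to compare the experimental data with that from the model previously developed.

To explain the connection between the control gain $K_i$ and the actual implementation, recall that the waiting time is related to the number of tasks by $x_i(t) = q_i(t)\,t_{p_i}$, where $t_{p_i}$ is the average time to carry out a task. The continuous time control law is

$$
u_i(t) = -K_i\,\mathrm{sat}\!\left(x_i(t) - \frac{1}{n}\sum_{j=1}^{n} x_j(t - \tau_{ij})\right)
$$

where $u_i$ is the rate of decrease of waiting time per unit time. Consequently, the gain $K_i$ represents the rate of reduction of waiting time per second in the continuous time model. Also, when all nodes share the same $t_{p_i}$ (as in these experiments),

$$
\mathrm{sat}\!\left(x_i(t) - \frac{1}{n}\sum_{j=1}^{n} x_j(t - \tau_{ij})\right) = \mathrm{sat}\!\left(q_i(t) - \frac{1}{n}\sum_{j=1}^{n} q_j(t - \tau_{ij})\right) t_{p_i}
$$

where $q_i(t) - \frac{1}{n}\sum_{j} q_j(t - \tau_{ij})$ is simply the number of tasks above the estimated (local) average number of tasks. As the interest here is the case in which this excess is positive, consider

$$
u_i(t) = -K_i\left(q_i(t) - \frac{1}{n}\sum_{j=1}^{n} q_j(t - \tau_{ij})\right) t_{p_i}.
$$

The implementation must execute the load balancing control law repetitively with a (possibly random) time interval between balancing actions. With $\Delta t$ the time interval between successive executions of the load balancing algorithm, a discrete time control law is defined that removes a fraction $K_z$ of the queue excess $q_i(t) - \frac{1}{n}\sum_{j} q_j(t - \tau_{ij})$ in the time $\Delta t$. The rate of reduction of waiting time is $-K_z\left(q_i(t) - \frac{1}{n}\sum_{j} q_j(t - \tau_{ij})\right)t_{p_i}/\Delta t$, so that an equivalent continuous time control law, given the discrete time gain $K_z$ and control interval $\Delta t$, is

$$
K_i = \frac{K_z}{\Delta t}.
\tag{15}
$$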
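Relation (15) can also be read backward: given an average gain $K_i$ and the fraction $K_z$, it implies an average balancing interval $\Delta t = K_z/K_i$. The helpers below are a hypothetical illustration of that arithmetic, applied to the Fig. 4 gains ($K_z = 0.5$ with $K_1 = 6667$, $K_2 = 4167$, $K_3 = 5000$); the intervals they print are implied by (15), not separately measured values.

```python
def equivalent_gain(K_z: float, delta_t: float) -> float:
    """Continuous-time gain from (15): K_i = K_z / delta_t, in 1/s."""
    return K_z / delta_t

def implied_interval(K_z: float, K_i: float) -> float:
    """Inverse of (15): average balancing interval delta_t = K_z / K_i, in seconds."""
    return K_z / K_i

# Average gains reported for Fig. 4 (K_z = 0.5).
for node, K_i in enumerate([6667.0, 4167.0, 5000.0], start=1):
    print(node, f"{implied_interval(0.5, K_i) * 1e6:.0f} us")
# prints: 1 75 us / 2 120 us / 3 100 us
```

Intervals of roughly 75 to 120 μs are an order of magnitude longer than the 10 μs task time, which is consistent with the observation that follows: the real system spends time balancing that the continuous model does not charge against processing.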

This shows that the gain $K_i$ is related to the actual implementation by how fast the load balancing can be carried out and how much (fraction) of the load is transferred. In the experimental work reported here, $\Delta t$ actually varies each time the load is balanced. As a consequence, the value of $\Delta t$ used in (15) is an average value for that run. The average time $t_{p_i}$ to process a task is the same on all nodes used for the experiments (identical processors) and is equal to $10\ \mu\mathrm{s}$, while the time it takes to ready a load for transfer is about $5\ \mu\mathrm{s}$. The initial conditions were taken as $q_1(0) = 6000$, $q_2(0) = 4000$, $q_3(0) = 2000$ (corresponding to $x_1(0) = 0.06$, $x_2(0) = 0.04$, $x_3(0) = 0.02$). All of the experimental responses were carried out with constant $p_{ij} = 1/2$ for $i \ne j$.

Fig. 4 is a plot of the responses $q_i(t) - \frac{1}{n}\sum_{j} q_j(t - \tau_{ij})$ for $i = 1, 2, 3$. The (average) values of the gains were $K_z = 0.5$, giving $K_1 = 6667\ \mathrm{s}^{-1}$, $K_2 = 4167\ \mathrm{s}^{-1}$, and $K_3 = 5000\ \mathrm{s}^{-1}$. This figure compares favorably with Fig. 3 except for the time scale being off; that is, the experimental responses are slower. The explanation for this is that the discrete load balancing implementation is not accurately modeled in the continuous time simulations; only its average effect is represented in the gains $K_i$. That is, the continuous time model does not stop processing jobs (at the average rate $1/t_{p_i}$) while it is transferring tasks to do the load balancing. Fig. 5 shows the plots of the response for the (average) values of the gains given by $K_z = 0.3$, $K_1 = 2400\ \mathrm{s}^{-1}$, $K_2 = 7273\ \mathrm{s}^{-1}$, $K_3 = 2500\ \mathrm{s}^{-1}$.

Fig. 6 shows the plots of the response for the (average) values of the gains given by $K_z = 0.2$, $K_1 = 1600\ \mathrm{s}^{-1}$, $K_2 = 2500\ \mathrm{s}^{-1}$, $K_3 = 2857\ \mathrm{s}^{-1}$. The initial conditions were $q_1(0) = 6000$, $q_2(0) = 4000$, $q_3(0) = 2000$ (corresponding to $x_1(0) = 0.06$, $x_2(0) = 0.04$, $x_3(0) = 0.02$).

Fig. 4. Experimental response of the load balancing algorithm. The average values of the gains are $K_z = 0.5$; $K_1 = 6667$, $K_2 = 4167$, $K_3 = 5000$ with constant $p_{ij}$.

Fig. 5. Experimental response of the load balancing algorithm. The average values of the gains are $K_z = 0.3$; $K_1 = 2400$, $K_2 = 7273$, $K_3 = 2500$ with constant $p_{ij}$.

Fig. 6. Experimental response of the load balancing algorithm. The average values of the gains are $K_z = 0.2$; $K_1 = 1600$, $K_2 = 2500$, $K_3 = 2857$ with constant $p_{ij}$.

Fig. 7 summarizes the data from several experimental runs of the type shown in Figs. 4–6. For $K_z = 0.1, 0.2, 0.3, 0.4, 0.5$, ten runs were made and the settling times (time to load balance) were determined. These are marked as small horizontal ticks on Fig. 7. (For all such runs, the initial queues were the same and equal to $q_1(0) = 600$, $q_2(0) = 400$, $q_3(0) = 200$.) For each value of $K_z$, the average settling time for these ten runs was computed and is marked as a dot on Fig. 7. For values of $K_z = 0.6$ and higher (with increments of 0.1 in $K_z$), consistent results could not be obtained. In many cases, ringing extended throughout the experiment's time interval (200 ms). For example, Fig. 8 shows the plots of the queue length less the local queue average for an experimental run with $K_z = 0.6$ where the settling time is approximately 7 ms. In contrast, Fig. 9 shows the experimental results under the same conditions where persistent ringing regenerates for 40 ms. It was found that the response was so oscillatory that a settling time could not be determined accurately. However, Fig. 7 shows that one should choose the gain close to 0.5 to achieve a faster response time without breaking into oscillatory behavior.

Fig. 7. Summary of the load balance time as a function of the feedback gain $K_z$.
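The settling times summarized in Fig. 7 require a rule for deciding when a response has settled. The text does not state the rule used, so the sketch below assumes one: the earliest sample after which every node's deviation from its local average stays within a tolerance `tol` until the end of the record. A run that ends outside the band (persistent ringing) returns `None`, and the hypothetical `mean_settling` helper then declines to report an average, mirroring the gains for which consistent results could not be obtained.

```python
from typing import Optional, Sequence

def settling_time(times: Sequence[float], deviations: Sequence[Sequence[float]],
                  tol: float) -> Optional[float]:
    """Start of the final stretch during which every |q_i - local average| <= tol.

    deviations[k][i] is node i's queue length minus its local average at times[k].
    Returns None if the record ends outside the band (e.g., persistent ringing).
    """
    settled_from: Optional[float] = None
    for t, devs in zip(times, deviations):
        if all(abs(d) <= tol for d in devs):
            if settled_from is None:
                settled_from = t          # entered the band
        else:
            settled_from = None           # left the band again: restart
    return settled_from

def mean_settling(runs: Sequence[Optional[float]]) -> Optional[float]:
    """Average over repeated runs; None if any run failed to settle."""
    if not runs or any(r is None for r in runs):
        return None
    return sum(runs) / len(runs)

times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]                      # ms
devs = [[50, -50], [20, -10], [5, 2], [30, 0], [1, 1], [0, 0]]
print(settling_time(times, devs, tol=5))                    # 4.0
```

The example shows why re-entering the band matters: the response is briefly within tolerance at 2 ms, rings out again at 3 ms, and only settles for good at 4 ms.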

Fig. 8. $K_z = 0.6$; the settling time is approximately 7 ms.

Fig. 9. $K_z = 0.6$. These are the same conditions as Fig. 8, but now the ringing persists.

VII. EXPERIMENTS OVER PLANETLAB

A geographically distributed system was developed to validate the theoretical work for large delays and to assess different load balancing policies in a real environment. The system consists of several nodes running the same code. The nodes are part of PlanetLab, a planetary-scale network involving more than 350 nodes positioned around the globe and connected via the Internet (www.planetlab.org). The application used to illustrate the load balancing process was matrix multiplication, where one task is defined as the multiplication of one row by a static matrix duplicated on all nodes (3 nodes in our experiment). The number of elements in each row was generated randomly from a specified range, which made the execution time of a task variable. The network protocol UDP was used to exchange queue size information among the nodes, and TCP (connection-based) was used to transfer task data from one machine to another.

To match the experimental settings of the previous sections, 3 nodes were used: node 1 at the University of New Mexico, node 2 in Taipei, Taiwan, and node 3 in Frankfurt, Germany. The $p_{ij}$ were set to 1/2 for $i \ne j$. The initial parameters and settings for the experiment are summarized in the table given in Fig. 10.

Fig. 10. Parameters and settings of the experiment.

Fig. 11. Average network delays and transmission rates.

Fig. 12. Experimental response of the load balancing algorithm under large delays. Gain $K_z = 0.3$ and $p_{ij} = 0.5$.
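The text specifies only that queue sizes travel over UDP and task data over TCP. As a hypothetical illustration of how small a queue-size report can be, the sketch below packs a node identifier, a queue length, and a send timestamp with Python's `struct` module; the field layout is an assumption, not the format used in the experiments.

```python
import struct

# Hypothetical layout (the text does not specify one): network byte order,
# unsigned 32-bit node id, unsigned 32-bit queue length, 64-bit send time.
QUEUE_REPORT = struct.Struct("!IId")

def encode_queue_report(node_id: int, queue_len: int, sent_at: float) -> bytes:
    """Pack one queue-size report into the payload of a single UDP datagram."""
    return QUEUE_REPORT.pack(node_id, queue_len, sent_at)

def decode_queue_report(payload: bytes) -> tuple:
    """Unpack a report into (node_id, queue_len, sent_at)."""
    return QUEUE_REPORT.unpack(payload)

msg = encode_queue_report(2, 4000, 1.5)
print(len(msg), decode_queue_report(msg))   # 16 (2, 4000, 1.5)
```

Each report could then be sent with `socket.sendto` on a `SOCK_DGRAM` socket; using the embedded timestamp to estimate $\tau_{ij}$ directly would additionally require synchronized clocks across the sites.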

Throughout the experiments, network statistics related to transmission rates and delays were collected. The averages of these statistics are shown in the table given in Fig. 11. Large delays were observed in the network due to the dispersed geographical location of the nodes. Moreover, the transmission rates detected between the nodes were very low, mainly because the amount of data exchanged (in bytes) is small. Indeed, the average size of data needed to transmit a single task was 20 bytes, which caused variability in the observed transmission rates due to the large communication delays and their variation.

In order to observe the behavior of the system under various gains, several experiments were conducted for different gain values $K_z$ ranging from 0.1 to 1.0. Fig. 12 is a plot of the responses $q_i(t) - \frac{1}{n}\sum_{j} q_j(t - \tau_{ij})$ corresponding to each node, where the gain $K_z$ was set to 0.3. Similarly, Fig. 13 shows the system response for gain $K_z$ equal to 0.5. Fig. 14 summarizes several runs corresponding to different gain values. For each $K_z = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7$, ten runs were made, and the settling times (time to load balance) were determined. For gain values higher than 0.8, consistent results could not be obtained; for instance, in most of the runs no settling time could be achieved.

Fig. 13. Experimental response of the load balancing algorithm under large delays. Gain $K_z = 0.5$ and $p_{ij} = 0.5$.

Fig. 14. Summary of the load balancing time as a function of the gain $K_z$.

Fig. 15. Experimental response of the load balancing algorithm under large delays. Gain $K_z = 0.8$ and $p_{ij} = 0.5$.

Page 9

940IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 13, NO. 6, NOVEMBER 2005

Fig. 16.

delays. Gain ? ? 0.4 and ?

Experimental response of the load balancing algorithm under large

? 0.5.

Fig. 17.

delays. Gain ? ? 0.8 and ?

Experimental response of the load balancing algorithm under large

? 0.5.

However, when the observed network delays were stable, the

system response was steady and converged quickly to a bal-

anced state when

was equal to 0.8 (Fig. 15). As previously

mentioned, this scenario was not frequently seen. The system

behaviors in these experiments do not exactly match, and the

results obtained in the previous sections. This is due to the

difference in network topology and delays (and very likely due

to the random nature of the delay [17], [18]). For instance, the

ratio between the average delay and the task process time is

20

s s for the local area network (LAN) setting

and 12 (120 ms/10 ms) for the distributed setting. This fact is

one of the reasons ringing is observed earlier (for

in the LAN experiment whereas under PlanetLab the ringing

responses were observed starting at

we have previously observed a similar behavior, using a Monte

Carlo simulation of a 3-node distributed system with random

delays in [17] as well as experiments on a wireless LAN in [18].

Thepreviousexperimentshavebeen conductedundernormal

network conditions stated in the Table given in Fig. 11. How-

ever, another set of experiments were carried out where the net-

work condition worsens and larger delays were observed. In

particular, the data transmission rate between node 2 (Taiwan)

and node 3 (Germany) dropped from 1.03 KB/s to 407 KB/s.

Figs. 16 and 17 show the system responses for gains

and

0.8, respectively. These experiments clearly show

the negative effect of the delay on the stability of the system.

0.6)

0.8. Interestingly,

0.4

Fig. 18. Experimental response of the load balancing algorithm under large variance in the task processing time. Gain $K = 0.3$ and $p_{ij} = 0.5$.

Fig. 19. Experimental response of the load balancing algorithm under large variance in the task processing time. Gain $K = 0.8$ and $p_{ij} = 0.5$.

Nevertheless, we see that with a low gain ($K = 0.4$), the settling time is approximately 22 ms. On the other hand, when the gain was set to 0.8, the system did not reach an equilibrium, as shown by the nodes’ responses in Fig. 17.

Up to this point, only the effect of delay on the stability of the system had been tested. In order to study the effect of the variability of the task processing time on system behavior, the matrix multiplication application was adjusted so that the average task processing time was kept at 10.2 ms while its standard deviation increased from 2.5 ms to 7.15 ms. Figs. 18 and 19 show the respective system responses for gains $K = 0.3$ and $K = 0.8$. Comparing Figs. 12 and 18, we can see that in the latter case some ringing persists and the system response did not completely settle out. On the other hand, setting the gain to 0.8 allowed the system to accommodate the variance in the task processing time.

The experiments presented in this section support those reported in the previous section, where a local area network was used. In particular, high gains were shown to lead to persistent ringing, whereas low gains lead to slow load balancing responses.
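To make the gain/delay tradeoff concrete, the following is a hypothetical discrete-time toy model, not the continuous-time model or the PlanetLab implementation used in this paper. It assumes three nodes, no task processing or arrivals, one fixed delay (in steps) for both the exchanged queue-length information and the transferred tasks, and even splitting $p_{ij} = 1/(n-1)$; the names `simulate`, `gain` (playing the role of $K$), and `delay` are illustrative. A 12-step delay loosely echoes the 120 ms/10 ms ratio quoted above. The loop also asserts the two consistency properties discussed in the conclusion: tasks in queues plus tasks in transit stay constant, and no queue goes negative.

```python
def simulate(gain, delay, initial, steps=300):
    """Hypothetical toy model of delayed load balancing (not the paper's model).

    At each step t, node i compares its own queue with the average of its own
    queue and the other nodes' queues as recorded `delay` steps earlier. It ships
    gain * max(excess, 0) tasks, split evenly over the other n - 1 nodes
    (p_ij = 1/(n - 1)); the shipped tasks join the receivers' queues `delay`
    steps later. Tasks are neither processed nor generated, so only the
    balancing dynamics are visible.
    """
    if delay < 1:
        raise ValueError("delay must be at least one step")
    n = len(initial)
    x = [float(v) for v in initial]
    total = sum(x)
    # arrivals[t][j]: tasks that reach node j at the start of step t.
    arrivals = [[0.0] * n for _ in range(steps + delay)]
    in_flight = 0.0
    history = []  # history[t]: queue lengths at the start of step t
    for t in range(steps):
        for j in range(n):  # deliver tasks shipped `delay` steps ago
            x[j] += arrivals[t][j]
            in_flight -= arrivals[t][j]
        history.append(list(x))
        stale = history[max(t - delay, 0)]  # delayed view of the other nodes
        sends = []
        for i in range(n):
            seen_avg = (x[i] + sum(stale[j] for j in range(n) if j != i)) / n
            # Clamp to the local queue so that no queue can go negative.
            sends.append(min(gain * max(x[i] - seen_avg, 0.0), x[i]))
        for i, u in enumerate(sends):
            x[i] -= u
            in_flight += u
            for j in range(n):
                if j != i:
                    arrivals[t + delay][j] += u / (n - 1)
        # Invariants: tasks in queues plus tasks in transit stay constant,
        # and every queue length stays nonnegative.
        assert abs(sum(x) + in_flight - total) <= 1e-9 * total
        assert min(x) >= 0.0
    return history


if __name__ == "__main__":
    # 12-step delay, loosely echoing the 120 ms / 10 ms ratio quoted above.
    for gain in (0.1, 0.9):
        trace = simulate(gain, delay=12, initial=[300, 0, 0])
        lowest = min(q[0] for q in trace)
        final = [round(v, 1) for v in trace[-1]]
        print(f"K = {gain}: lowest queue at node 0 = {lowest:.3g}, final queues = {final}")
```

In this toy setting (balanced level 100 tasks per node), the high gain empties node 0 to nearly zero before any feedback about the other queues reaches it, while the low gain undershoots far less in its first swing but moves the load in much smaller increments. The exact numbers depend entirely on the assumed delay and initial load.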

VIII. SUMMARY AND CONCLUSION

A load balancing algorithm was modeled as a nonlinear time-delay system. The model was shown to be consistent in that the total number of tasks was conserved and the queues were always nonnegative. Further, the system was shown to be always stable, but the delays do limit the size of the controller gains that can be used while still ensuring performance (fast enough response without oscillatory behavior). Experiments demonstrated a correlation of the continuous-time model with the actual implementation. Future work will consider the fact that the load balancing operation consumes processor time that is not being used to process tasks. There is a tradeoff between spending processor time and network bandwidth on balancing and the advantage of distributing the load evenly between the nodes to reduce overall processing time, a tradeoff that has not been fully captured in the present work.

REFERENCES

[1] E. Altman and H. Kameda, “Equilibria for multiclass routing in multi-agent networks,” in Proc. 40th IEEE Conf. Decision and Control, Orlando, FL, Dec. 2001, pp. 604–609.

[2] H. Kameda, J. Li, C. Kim, and Y. Zhang, Optimal Load Balancing in Distributed Computer Systems. New York: Springer-Verlag, 1997.

[3] H. Kameda, I. R. El-Zoghdy Said Fathy, and J. Li, “A performance comparison of dynamic versus static load balancing policies in a mainframe,” in Proc. 39th IEEE Conf. Decision and Control, Sydney, Australia, Dec. 2000, pp. 1415–1420.

[4] J. D. Birdwell, R. D. Horn, D. J. Icove, T. W. Wang, P. Yadav, and S. Niezgoda, “A hierarchical database design and search method for CODIS,” in Proc. 10th Int. Symp. Human Identification, Orlando, FL, Sep. 1999.

[5] J. D. Birdwell, T. W. Wang, R. D. Horn, P. Yadav, and D. J. Icove, “Method of indexed storage and retrieval of multidimensional information,” in Proc. 10th SIAM Conf. Parallel Processing for Scientific Computation, Sep. 2000.

[6] J. D. Birdwell, T.-W. Wang, and M. Rader, “The University of Tennessee’s new search engine for CODIS,” in Proc. 6th CODIS Users Conf., Arlington, VA, Feb. 2001.

[7] T. W. Wang, J. D. Birdwell, P. Yadav, D. J. Icove, S. Niezgoda, and S. Jones, “Natural clustering of DNA/STR profiles,” in Proc. 10th Int. Symp. Human Identification, Orlando, FL, Sep. 1999.

[8] H. G. Rotithor, “Taxonomy of dynamic task scheduling schemes in distributed computing systems,” Inst. Elect. Eng. Proc. Comput. Dig. Techniques, vol. 141, no. 1, pp. 1–10, 1994.

[9] A. Corradi, L. Leonardi, and F. Zambonelli, “Diffusive load-balancing policies for dynamic applications,” IEEE Concurrency, vol. 22, no. 1, pp. 979–993, Jan.–Feb. 1999.

[10] M. H. Willebeek-LeMair and A. P. Reeves, “Strategies for dynamic load balancing on highly parallel computers,” IEEE Trans. Parallel Distrib. Syst., vol. 4, no. 9, pp. 979–993, Sep. 1993.

[11] C. T. Abdallah, J. D. Birdwell, J. Chiasson, V. Chupryna, Z. Tang, and T. W. Wang, “Load balancing instabilities due to time delays in parallel computation,” in Proc. 3rd IFAC Conf. Time Delay Systems, Santa Fe, NM, Dec. 2001, pp. 198–202.

[12] J. D. Birdwell, J. Chiasson, Z. Tang, C. T. Abdallah, M. Hayat, and T. Wang, “Dynamic time delay models for load balancing part I: deterministic models,” in Proc. CNRS-NSF Workshop: Advances in Control of Time-Delay Systems, Paris, France, Jan. 2003, pp. 355–370.

[13] J. D. Birdwell, J. Chiasson, Z. Tang, C. T. Abdallah, and M. M. Hayat, “The effect of feedback gains on the performance of a load balancing network with time delays,” in Proc. IFAC Workshop on Time-Delay Systems (TDS’03), Rocquencourt, France, Sep. 2003, pp. 371–385.

[14] J. D. Birdwell, J. Chiasson, C. T. Abdallah, Z. Tang, N. Alluri, and T. Wang, “The effect of time delays in the stability of load balancing algorithms for parallel computations,” in Proc. 42nd IEEE Conf. Decision and Control, Maui, HI, Dec. 2003, pp. 582–587.

[15] P. Dasgupta, “Performance evaluation of Fast Ethernet, ATM and Myrinet under PVM,” M.S. thesis, Univ. Tennessee, Knoxville, 2001.

[16] P. Dasgupta, J. D. Birdwell, and T. W. Wang, “Timing and congestion studies under PVM,” in Proc. 10th SIAM Conf. Parallel Processing for Scientific Computation, Portsmouth, VA, Mar. 2001.

[17] S. Dhakal, B. Paskaleva, M. Hayat, E. Schamiloglu, and C. Abdallah, “Dynamical discrete-time load balancing in distributed systems in the presence of time delays,” in Proc. IEEE Conf. Decision Control, Maui, HI, Dec. 2003, pp. 5128–5134.

[18] J. Ghanem, S. Dhakal, M. M. Hayat, H. Jerez, C. T. Abdallah, and J. Chiasson, “On load balancing in distributed systems with large time delays: theory and experiments,” in Proc. IEEE Mediterranean Control Conf. (MED’04), Kusadasi, Turkey, Jun. 6–9, 2004.

John Chiasson (S’81–M’85–SM’03) received the B.S. degree in mathematics from the University of Arizona, Tucson, the M.S. degree in electrical engineering from Washington State University, Pullman, and the Ph.D. degree in controls from the University of Minnesota, Minneapolis.

He has worked in industry at Boeing Aerospace, Control Data, and ABB Daimler-Benz Transportation. Since 1999, he has been on the faculty of Electrical and Computer Engineering at the University of Tennessee, Knoxville.

Zhong Tang (S’99–M’05) received the B.S. and M.S. degrees in automatic control engineering from Huazhong University of Science and Technology, Wuhan, China, in 1994 and 1997, respectively, and the Ph.D. degree in electrical engineering from the University of Tennessee, Knoxville, in 2005.

His research interests include control systems, parallel databases, and distributed computing.

Jean Ghanem received the B.S. degree in computer and communication engineering from the American University of Beirut and the M.S. degree in computer engineering from the University of New Mexico, Albuquerque.

His research focused on the analysis and implementation of load balancing policies in networks with large time delays, mainly the Internet and wireless networks. He has also performed work related to securing 802.11 wireless networks. He has software development experience in network, multithreading, database, and parallel programming. He is currently a solution architect consultant for Verizon Communications.

Chaouki T. Abdallah (M’81–SM’85) received the M.S. and Ph.D. degrees in electrical engineering from the Georgia Institute of Technology, Atlanta, in 1982 and 1988, respectively.

He is currently a Professor, Associate Chair, and the Director of the Graduate Program in the Electrical and Computer Engineering Department, University of New Mexico, Albuquerque. He conducts research and teaches courses in the general area of systems theory with focus on control and communications systems. His research has been funded by national funding agencies (NSF, AFOSR, NRL), national laboratories (SNL, LANL), and by various companies (Boeing, HP). He has also been active in designing and implementing various international graduate programs with Latin American and European countries. He was a cofounder in 1990 of the ISTEC consortium, which currently includes more than 150 universities in the US, Spain, and Latin America. He has coauthored four books and more than 150 peer-reviewed papers.

Dr. Abdallah’s IEEE professional service credits include serving as an IEEE CSS BOG Appointed Member (2004–2005), as a member of the IEEE CSS Long Range Planning Committee (2004–2005), and as the Program Chair of the IEEE Conference on Decision and Control, Hawaii, 2003. He will also be serving as the General Chair of CDC’08. He is a recipient of the IEEE Millennium medal.


J. Douglas Birdwell (S’73–M’78–SM’85–F’99) received the Ph.D. degree in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1978.

He is currently a Professor of Electrical and Computer Engineering at the University of Tennessee (UT), Knoxville. He joined the faculty at UT in 1978 and is Director of the Laboratory for Information Technologies, which develops secure distributed information systems and analysis tools. His experience includes computer hardware and software applications development, high-performance database design with applications in bioinformatics, parallel computation and load balancing, and artificial intelligence.

Dr. Birdwell has served in a number of positions in the IEEE Control Systems Society, including President (2004), member of the Board of Governors (1990–2001), General Chair of the IEEE Conference on Decision and Control (1998), and Associate Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL.

Majeed M. Hayat (S’89–M’92–SM’00) was born in Kuwait in 1963. He received the B.S. degree (summa cum laude) in 1985 in electrical engineering from the University of the Pacific, Stockton, CA, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Wisconsin-Madison, in 1988 and 1992, respectively.

From 1993 to 1996, he worked at the University of Wisconsin-Madison as a Research Associate and co-principal investigator of a project on statistical minefield modeling and detection, which was funded by the Office of Naval Research. In 1996, he joined the faculty of the Electro-Optics Graduate Program and the Department of Electrical and Computer Engineering, University of Dayton, OH. He is currently an Associate Professor in the Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque. His research contributions cover a broad range of topics in statistical communication theory, optoelectronics, signal processing, and applied probability theory including avalanche photodiodes, optical communication systems, image processing, models for distributed computing, as well as infrared and spectral imaging.

Dr. Hayat was a recipient of a 1998 National Science Foundation Early Faculty Career Award. He is a member of SPIE and OSA.

Henry Jérez received the B.Sc. degree in systems engineering from the Private University of Bolivia and the M.Sc. and Ph.D. degrees in computer engineering from the University of New Mexico, Albuquerque.

He is a Research Scientist with the Corporation for National Research Initiatives, Reston, VA. His research in the area of distributed digital object repositories, load balancing, and wireless communications has been funded by AISTI, Hewlett Packard, Microsoft, and Sun Microsystems Latin America. His primary work has been in the area of digital libraries and distributed digital object repositories as a member of the Prototyping team for the Research Library at Los Alamos National Laboratory, working on LANL’s new Digital Object Storage Architecture. He has served as a faculty member, conference reviewer, enterprise consultant, and international advisor to several companies and universities across Latin and North America, and is part of the advisory board for the Digital Library Linkages Initiative of the Ibero American Science and Technology Education Consortium (ISTEC).