Improved Algorithms for Leader Election in Distributed Systems

Mohammadreza Effatparvar∗, Nasser Yazdani∗, Mehdi Effatparvar†, Aresh Dadlani∗ ‡ and Ahmad Khonsari∗‡

∗School of Electrical & Computer Engineering, University of Tehran, Tehran, Iran

†Electrical & Computer Engineering Department, Islamic Azad University - Ardabil Branch, Ardabil, Iran

‡School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics, Tehran, Iran

{effatparvar, yazdani}@ut.ac.ir, effatparvar@iauardabil.ac.ir, {a.dadlani, ak}@ipm.ir

Abstract

An important challenge in distributed systems is the adoption of suitable and efficient algorithms for coordinator election. The main role of an elected coordinator is to manage the use of a shared resource in an optimal manner. Among the algorithms reported in the literature, the Bully and Ring algorithms have gained the most popularity. In this paper, we describe novel approaches towards improving the Bully and Ring algorithms and also propose the heap tree mechanism for electing the coordinator. The higher efficiency and better performance of the presented algorithms with respect to the existing algorithms are validated through extensive simulation results.

1. Introduction

Designating a single node as an organizer in distributed systems is a challenging issue that calls for suitable election algorithms. In distributed systems, nodes communicate with each other using shared memory or via message passing. The key requirement for nodes to execute any distributed task effectively is coordination. In a pure distributed system, there exists no central controlling node that arbitrates decisions, and thus every node has to communicate with the rest of the nodes in the network to make an apt decision. Often during the decision process, not all nodes make the same decision. Communication between nodes is time-consuming, and so is the decision-making process. Coordination among nodes becomes difficult when consistency is needed among all nodes. Centralized controlling nodes can be selected from the group of available nodes to reduce the complexity of decision making. Many distributed algorithms require one node to act as coordinator, initiator, or otherwise perform some special role. In general, it does not matter which node takes on this special responsibility, but one of them has to do it.

Leader election is a technique that can be used to break the

symmetry of distributed systems [1]. In order to determine

a central controlling node in a distributed system, a node is

elected from the group of nodes as the leader to serve as the

centralized controller for that decentralized system [2].

Some applications of leader election include finding a spanning tree with the elected leader as root [3], breaking a deadlock, reconstructing a lost token in a token ring network, and coordination in ad hoc networks [4][5]. Leader election algorithms for static networks have been proposed in [6]. These algorithms work by constructing several spanning trees with a prospective leader at the root of each spanning tree and recursively reducing the number of spanning trees to one. However, these algorithms work only in cases where the topology remains static and hence cannot be used in a mobile setting [3][7].

The purpose of leader election is to choose a node that will coordinate activities of the system [8]. In any leader election algorithm, a leader is usually chosen based on some criterion, such as choosing the node with the largest identifier. Once the leader is elected, the nodes reach a certain state known as the terminated state. In leader election algorithms, such states are partitioned into elected states and non-elected states [9]. When a node enters either state, it always remains in that state. Every leader election algorithm must satisfy the safety and liveness conditions for an execution to be admissible [10]. The liveness condition states that every node will eventually enter an elected state or a non-elected state. The safety condition for leader election requires that only a single node can enter the elected state and eventually become the leader of the distributed system.

Several leader election algorithms such as the Bully algorithm [11], Ring algorithm [12], Chang and Roberts’ algorithm [13], Peterson’s algorithm [14], LeLann’s algorithm [15], and Franklin’s algorithm [16] have been proposed over the years. These algorithms, however, require nodes to be directly involved in leader election. Information is exchanged between nodes by transmitting messages to one another until an agreement is reached. Once a decision is made, a node is elected as the leader and all the other nodes acknowledge the role of that node as the leader.

In this paper, we present modifications to the existing Bully and Ring algorithms in Sections 2 and 3, respectively. In Section 4, we introduce the heap tree method for electing the leader node, which imposes lower complexity in message passing and bandwidth usage. In Section 5, we compare our algorithms with the existing algorithms, followed by concluding remarks in Section 6.

Pre-print

2. Modified Bully algorithm

The number of messages exchanged between nodes in the Bully algorithm is very high. In [17], we presented a new approach based on a sort mechanism to reduce the number of messages. That modified Bully algorithm may, however, take more time than the original Bully algorithm to find and elect the leader. In this section, we introduce another approach that equips the Bully algorithm with fault tolerance capabilities.

In this algorithm, when a node, say N, notices that the leader has crashed, it sends an election message to all nodes with higher ID numbers. Each node that receives the election message sends its ID as a response to N. If no node responds to N, it broadcasts a coordinator message to all nodes. If some nodes respond to N, it selects the node with the highest ID number as coordinator and sends a new message with the selected ID number to all nodes, informing them about the new leader. One drawback of this approach is that the message carrying the highest node ID may get lost before reaching N. Therefore, a fault-tolerant mechanism is required to guard against such a fault. As illustrated in Figure 1, when node N selects the highest ID number, it sends the selected ID to the rest of the nodes. The newly elected leader then sends an election message to nodes with greater ID numbers to ensure that there exists no node with a greater ID number than itself. If the leader receives a response from nodes with ID numbers greater than its own, it introduces the greatest one as the new leader. Otherwise, the leader remains unchanged.

Figure 1. The modified Bully algorithm (a) Node 2 notices the coordinator has crashed and sends an election message to nodes 3, 4 and 5 (b) 2 receives OK messages from 3 and 4, but not 5 (c) 4 is elected as leader (d) 4 sends an election message to 5 (e) 5 responds with an OK to 4 (f) 4 introduces 5 as the new leader to the rest.
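The election steps described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name, its parameters, and the fault model (a set of replies that never reach N, with a loss-free second round) are hypothetical and are not taken from the simulations reported in this paper.

```python
def modified_bully_election(alive_ids, initiator, lost_replies=()):
    """Return (first_choice, final_leader) for one election.

    alive_ids    -- IDs of the nodes still running (the crashed leader excluded)
    initiator    -- node N, the one that noticed the crash
    lost_replies -- IDs whose reply to N is lost in transit (assumed fault model)
    """
    alive = set(alive_ids)
    lost = set(lost_replies)

    # Step 1: N queries every node with a higher ID and collects the replies
    # that actually arrive.
    replies = [i for i in alive if i > initiator and i not in lost]

    # Step 2: N announces the highest ID it heard of, or itself if nobody replied.
    first_choice = max(replies, default=initiator)

    # Step 3 (fault tolerance): the announced node queries the nodes above it.
    # This second round is assumed to be loss-free in this sketch.
    above = [i for i in alive if i > first_choice]
    final_leader = max(above, default=first_choice)
    return first_choice, final_leader


# Scenario of Figure 1: coordinator 6 has crashed, node 2 starts the election,
# and the reply of node 5 never reaches node 2.
print(modified_bully_election([1, 2, 3, 4, 5], initiator=2, lost_replies=[5]))
```

In the Figure 1 scenario the sketch first selects node 4 and then corrects the choice to node 5, which is the role of the second, fault-tolerant round.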

3. Modified Ring algorithm

In this section, we introduce a modification of the Ring algorithm that reduces the number of messages passed and the additional messages sent to the elected leader.

As in Figure 2, when a node notices that the leader has crashed, it sends its ID number to its neighboring node in the ring. Thus, it is not necessary for all nodes to send their IDs into the ring. The receiving node then compares the received ID with its own and forwards whichever is greater. This comparison is done by all the nodes, so that only the greatest ID remains in the ring. Finally, the greatest ID returns to the node that owns it. If the ID a node receives equals its own, the node declares itself the leader by sending a coordinator message into the ring.

It can be observed that this method reduces the overhead involved in message passing: if many nodes notice the absence of the leader at the same time, only the message carrying the greatest ID keeps circulating in the ring, which prevents smaller IDs from being forwarded.

Figure 2. The modified Ring algorithm (a) Nodes 2 and 4 simultaneously notice that the coordinator has crashed (b) They send their IDs into the ring (c, d) The greatest ID always remains in the ring (e) 5 is declared as the leader.
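A minimal Python sketch of this forwarding rule is given below. It is a sketch under assumptions rather than the simulator used for the results in Section 5: the function name, the FIFO delivery order, and the rule that a node drops a token no greater than one it has already forwarded are all hypothetical choices made to model how smaller IDs are prevented from circulating.

```python
from collections import deque


def modified_ring_election(ring, detectors):
    """Return (leader, messages) for a ring of alive node IDs.

    ring      -- alive node IDs in ring order (the crashed leader excluded)
    detectors -- nodes that notice the crash and inject their own ID
    """
    n = len(ring)
    successor = {ring[k]: ring[(k + 1) % n] for k in range(n)}
    # Greatest ID each node has already sent on (assumed bookkeeping).
    best_sent = {node: None for node in ring}

    queue = deque()
    for d in detectors:
        best_sent[d] = d
        queue.append((successor[d], d))  # (receiver, ID carried by the message)

    messages = 0
    while queue:
        receiver, token = queue.popleft()
        messages += 1
        if token == receiver:
            # A node got its own ID back, so it holds the greatest ID.
            return receiver, messages
        forward = max(token, receiver)
        if best_sent[receiver] is not None and forward <= best_sent[receiver]:
            continue  # nothing new to forward: the smaller ID dies here
        best_sent[receiver] = forward
        queue.append((successor[receiver], forward))
    raise ValueError("at least one detector is required")


# Scenario of Figure 2: coordinator 6 has crashed, nodes 2 and 4 notice it.
leader, messages = modified_ring_election([1, 2, 3, 4, 5], detectors=[2, 4])
print(leader, messages)
```

The coordinator message that the winner finally sends around the ring is not counted in this sketch.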

4. Leader election with heap tree

In this section, we describe a novel heap tree-based algorithm for leader election. In this approach, each node of the tree corresponds to an element of the array that stores the value in the node. The tree is completely filled on all levels except possibly the lowest, which is filled from the left up to a point. An array A that represents a heap is an object with two attributes: length[A] and heap-size[A], which are the number of elements in the array and in the heap stored within array A, respectively. Although A[1..length[A]] may be valid, no element past A[heap-size[A]] is an element of the heap, where heap-size[A] ≤ length[A].

The root of the tree is A[1], and given the index i of a node, the indices of its parent PARENT(i), left child LEFT(i), and right child RIGHT(i) can be computed easily. Based upon the type of heap being used, the values in the nodes satisfy a heap property. The property of the max-heap is that for every node i other than the root, A[PARENT(i)] ≥ A[i], i.e., the value of a node is at most the value of its parent. Thus, the largest element in a max-heap is stored at the root, and the sub-tree rooted at a node contains values no larger than that contained at the node itself. A min-heap is organized in the opposite manner; the min-heap property is that for every node i other than the root, A[PARENT(i)] ≤ A[i]. Hence, the root is always the smallest element.
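For a heap stored in an array indexed from 1, the standard index arithmetic is PARENT(i) = ⌊i/2⌋, LEFT(i) = 2i, and RIGHT(i) = 2i + 1. The short Python sketch below illustrates this and checks the max-heap property; the function names are illustrative, and slot 0 of the list is left unused so that the indices match the text.

```python
def parent(i):
    return i // 2


def left(i):
    return 2 * i


def right(i):
    return 2 * i + 1


def is_max_heap(a, heap_size):
    """Check A[PARENT(i)] >= A[i] for every node i other than the root.

    a[0] is a placeholder so that the heap occupies a[1..heap_size].
    """
    return all(a[parent(i)] >= a[i] for i in range(2, heap_size + 1))


heap = [None, 16, 14, 10, 8, 7, 9, 3]
print(is_max_heap(heap, 7))              # the root 16 is the largest element
print(parent(5), left(2), right(2))      # indices around node 2
```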

The MAX-HEAPIFY() procedure, which runs in O(log n) time, is the key to maintaining the max-heap property. The BUILD-MAX-HEAP() procedure, which runs in linear time, produces a max-heap from an unordered input array. The MAX-HEAP-INSERT(), HEAP-EXTRACT-MAX(), HEAP-INCREASE-KEY(), and HEAP-MAXIMUM() procedures, which run in O(log n) time, allow the heap to be used as a priority queue.
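As an illustration, the two procedures named first can be written as follows. This is a textbook-style sketch in Python rather than code from this paper; the list keeps slot 0 unused so that the heap occupies indices 1 to heap_size.

```python
def max_heapify(a, i, heap_size):
    """Let the value at index i sink until the subtree rooted at i is a max-heap."""
    l, r = 2 * i, 2 * i + 1
    largest = i
    if l <= heap_size and a[l] > a[largest]:
        largest = l
    if r <= heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


def build_max_heap(a):
    """Turn a[1..] into a max-heap in place and return the heap size."""
    heap_size = len(a) - 1
    # Leaves are already heaps, so start at the last internal node.
    for i in range(heap_size // 2, 0, -1):
        max_heapify(a, i, heap_size)
    return heap_size


ids = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_max_heap(ids)
print(ids[1:])  # the largest ID ends up at the root, ids[1]
```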

Hereafter, we describe our method using the heap tree characteristics. We adopt the max-heap to explain our algorithm and, unlike the Ring algorithm, our approach does not require any connectivity in the group. Also, the nodes need not possess complete information regarding the other nodes. In our method, each node that joins the group records the information about its parent and children. The data item stored in each node is greater than or equal to the data items stored in its children. For the sake of simplicity, we implement heaps using arrays rather than linked list-like structures. We simply number the nodes in the heap from top to bottom, numbering the nodes on each level from left to right, and store the ith node in the ith location of the array. The height of an n-element heap based on a binary tree is log n, which keeps tree reconstruction inexpensive. The basic operations on heaps run in time at most proportional to the height of the tree and thus take O(log n) time.

Figure 3. The heap tree algorithm (a) The original nodes (b) Nodes with their indices (c, d) Node comparison according to the algorithm.

In the new heap tree approach, each node joins the tree and then compares its ID with its parent's ID. If its ID is greater than that of its parent, the node swaps position with the parent, thus resulting in a tree reconstruction. In this structure, when the root is deleted from the tree, we say that the leader has crashed. As shown in Figure 3, when a node realizes that the leader has crashed, it sends the election message to its parent. This message travels up to the children of the deleted root, where the left and right children of the deleted root compare their IDs with each other to determine the new leader. Thus, it is not necessary for all nodes to start sending their own IDs or election messages in the tree. The election message sent by the node reaches its direct parent in the tree, and the receiving node analyzes this message to determine whether it is a duplicate. If it is a duplicate, it is dropped by that node; otherwise, it is sent to the next parent in the tree. By doing so, the leader can be selected in O(log n) time with a comparably reduced number of messages. In this method, each node should save the information of its parent, left and right children, and its sibling. Therefore, unlike the Bully algorithm, which requires a memory space of $n^2$, our approach requires a smaller memory space of only $4n$. More details on these algorithms are provided in Table 1.
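The joining and election steps can be sketched as follows. The class and method names are hypothetical, the tree is kept in a single array for illustration (whereas in the described method each node only stores its parent, children, and sibling), and duplicate suppression and the rebuilding of the tree after the election are left out.

```python
class HeapTree:
    """Array-backed max-heap of node IDs; ids[1] is the current leader."""

    def __init__(self):
        self.ids = [None]  # slot 0 unused so that the root sits at index 1

    def join(self, node_id):
        """Append a node, then swap it upwards while it beats its parent."""
        self.ids.append(node_id)
        i = len(self.ids) - 1
        while i > 1 and self.ids[i] > self.ids[i // 2]:
            self.ids[i], self.ids[i // 2] = self.ids[i // 2], self.ids[i]
            i //= 2

    def elect_after_root_crash(self, detector_index):
        """Return (new_leader, hops) when the node at detector_index (> 1)
        notices that the root has crashed."""
        hops = 0
        i = detector_index
        while i // 2 > 1:  # climb until i is a child of the deleted root
            i //= 2
            hops += 1
        size = len(self.ids) - 1
        children = [c for c in (2, 3) if c <= size]
        # The two children of the deleted root compare their IDs.
        winner = max(children, key=lambda c: self.ids[c])
        return self.ids[winner], hops


tree = HeapTree()
for node_id in [3, 9, 5, 12, 7, 1, 10]:
    tree.join(node_id)
print(tree.ids[1:])                     # array layout after all joins
print(tree.elect_after_root_crash(4))   # node at index 4 detects the crash
```

Because of the max-heap property, the larger child of the deleted root holds the greatest remaining ID, and the election message needs at most as many hops as the height of the tree.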

5. Simulations and evaluation results

In this section, we compare and evaluate the algorithms

based on their message passing complexity.


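Table 1. Comparison of leader election algorithms

Method     Total memory   Order   Min.       Max.
           needed                 messages   messages
Max-Heap   4n             log n   log n      log n + (n − 1)
Bully      n^2            n^2     2n − 2     n^2
Ring       n^2            n^2     n          n^2

Approximate number of messages when the leader has crashed:

Max-Heap: $\sum_{i=1}^{k} \lfloor \log(C_i) \rfloor - \sum_{i,j=1;\, i \neq j}^{k} \lfloor \log(\max(A_i \cap B_j)) \rfloor$, with $A_i = \{ f(C_i) = [C_i, f(\lfloor C_i/2 \rfloor)] \mid 0 < C_i \}$ and $B_j = \{ f(C_j) = [C_j, f(\lfloor C_j/2 \rfloor)] \mid 0 < C_j \}$

Bully: $N(i) = (n-i+1)(n-i) + (n-1)$

Ring: $\sum_{i=1}^{n} (n-i) = \tfrac{1}{2}\left[(n-i)(n-i+1)\right]$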

5.1. Analysis of the modified Bully algorithm

In the modified Bully algorithm, if a single node detects the crashed coordinator, $N(i)$ is obtained with an order of $O(n)$ as follows:

$$N(i) = 2(n-i) + (n-2). \qquad (1)$$

With fault tolerance, the order of message passing increases to $O(n^2)$ as follows:

$$N(i) = 2(n-i) + (n-2) + 2(n-i') + (n-2) = 2\left[(n-i) + (n-2) + (n-i')\right], \qquad (2)$$

where $i'$ is the ID number of the leader selected in the first step.
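To make the expressions concrete, the following Python sketch evaluates Eq. (1), Eq. (2), and the Bully expression listed in Table 1 for given values. The function names are illustrative, and the sample values (n = 6, i = 2, i' = 4) are simply borrowed from the scenario of Figure 1 and are not simulation results.

```python
def bully_messages(n, i):
    """Bully expression from Table 1: (n - i + 1)(n - i) + (n - 1)."""
    return (n - i + 1) * (n - i) + (n - 1)


def modified_bully_messages(n, i):
    """Eq. (1): 2(n - i) + (n - 2)."""
    return 2 * (n - i) + (n - 2)


def fault_tolerant_bully_messages(n, i, i_prime):
    """Eq. (2): the count of Eq. (1) for node i plus the same count for i'."""
    return modified_bully_messages(n, i) + modified_bully_messages(n, i_prime)


n, i, i_prime = 6, 2, 4
print(bully_messages(n, i))                          # 25
print(modified_bully_messages(n, i))                 # 12
print(fault_tolerant_bully_messages(n, i, i_prime))  # 20
```

The last value agrees with the factored form of Eq. (2), since 2[(6 − 2) + (6 − 2) + (6 − 4)] = 20.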

Figure 4(a) plots the Bully algorithm against the modified Bully algorithm for the case when only one node notices that the coordinator has crashed. The number of messages passed in terms of the number of nodes that realize the absence of the coordinator is depicted in Figure 4(b) for the fault-tolerant Bully algorithm, the Bully algorithm based on the sort mechanism, and the heap tree approach.

Table 2 lists the number of faults and of sent and received messages in the fault-tolerant Bully algorithm. For instance, if the 4th node found that the coordinator has crashed, it would send 145 messages to the nodes with greater IDs. However, due to the occurrence of 12 faults in the network, the number of messages received reduces to 133. After initially being designated, the selected coordinator sends messages to the nodes with IDs greater than its own, of which there are 7 in this case. Since it receives 7 responses from nodes with higher IDs, it declares the final coordinator to the rest.

5.2. Modified Ring Algorithm

In this sub-section, we compare the message complexity of the Ring algorithm with that of its modified version. If $n_{\{i_1, i_2, \cdots, i_m\}}$ is the number of nodes that concurrently detect the absence of the crashed coordinator and $n$ is the number of nodes in the ring, then the total number of messages passed, with an order of $O(n^2)$, is as follows:

$$T = n_{\{i_1, i_2, \cdots, i_m\}} \times n. \qquad (3)$$

[Figure 4 plot. x-axis: ID of the process that notices that the coordinator has crashed (N = 150), from 4 to 149; y-axis: number of messages passed, from 0 to 25000; series: Bully, modified Bully with fault tolerance.]

Figure 4. Message passing in terms of nodes that realize that the coordinator has crashed (a) Bully versus fault-tolerant Bully algorithm (b) Fault-tolerant Bully versus sort-based Bully versus heap tree.

Similarly, for the modified Ring algorithm, we have:

$$\sum_{i=0}^{n-1} i = n(n-1)/2 = \tfrac{1}{2}(n^2 - n), \qquad (4)$$

with an order of $O(n^2)$ and a reduced number of messages passed. Thus, although both are of order $O(n^2)$, the modified Ring algorithm passes fewer messages than the Ring algorithm. Figure 5 compares the Ring, modified Ring, and heap tree algorithms where the number of nodes in the ring is assumed to be 10.
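As a worked example, the sketch below evaluates Eq. (3) and the closed form n(n − 1)/2 of Eq. (4) for a ring of 10 nodes, the size assumed for Figure 5. The function names are illustrative, and the choice of letting all 10 nodes detect the crash is an assumption made only to show the largest gap between the two expressions; the numbers are not read from Figure 5.

```python
def ring_messages(num_detectors, n):
    """Eq. (3): every detecting node triggers n messages."""
    return num_detectors * n


def modified_ring_messages(n):
    """Closed form of Eq. (4): n(n - 1) / 2."""
    return n * (n - 1) // 2


n = 10
print(ring_messages(n, n))           # 100, all n nodes detect the crash
print(modified_ring_messages(n))     # 45
```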

Table 2. Number of messages passed in the fault-tolerant Bully algorithm and the heap tree method

Node   Messages in   Messages in      Sent       Received   No. of   Messages sent   Messages received   Coordinator
ID     heap tree     fault-tolerant   messages   messages   faults   to higher ID    from higher ID      messages
                     Bully
4      23            442              145        133        12       7               7                   149
30     85            385              119        110        9        3               3                   149
60     102           341              89         32         57       35              35                  149
90     113           270              59         51         8        5               5                   149
120    122           205              29         20         9        3               3                   149

Figure 5. Message passing in Ring, modified Ring and heap tree when several nodes notice simultaneously that the coordinator has crashed.

6. Conclusion and Future Works

Leader election algorithms play a vital role in distributed environments. In this paper, we presented modifications to the well-known Bully and Ring algorithms. In addition, we proposed a novel approach, the heap tree method, for leader election based on the max-heap data structure. The lower message-passing complexity exhibited by this method is supported by the obtained simulation results. In the future, we intend to apply these approaches in ad hoc and sensor environments.

References

[1] G. N. Frederickson and N. Lynch, “The impact of synchronous communication on the problem of electing a leader in a ring”, in Proc. 16th ACM Symp. on Theory of Computing, Washington, USA, pp. 493-503, 1984.

[2] R. G. Gallager, “Choosing a leader in a network”, Internal Memorandum, Laboratory for Information and Decision Systems, MIT, 1977.

[3] R. G. Gallager, P. Humblet and P. Spira, “A distributed algorithm for minimum weight spanning trees”, ACM Trans. on Programming Languages and Systems, vol. 4, no. 1, pp. 66-77, Jan. 1983.

[4] N. Malpani, J. Welch and N. Vaidya, “Leader election algorithms for mobile ad hoc networks”, in Proc. 4th Intl. Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications, Boston, USA, pp. 96-103, Aug. 2000.

[5] P. Basu, N. Khan and T. Little, “A mobility based metric for clustering in mobile ad hoc networks”, in Proc. 21st Intl. Conference on Distributed Computing Systems, Washington, USA, pp. 413, Apr. 2001.

[6] D. Peleg, “Time optimal leader election in general networks”, Journal of Parallel and Distributed Computing, vol. 8, no. 1, pp. 96-99, Jan. 1990.

[7] P. Humblet, “Selecting a leader in a clique in O(n log n) messages”, Internal Memorandum, Laboratory for Information and Decision Systems, MIT, 1984.

[8] J. Welch and H. Attiya, Distributed Computing: Fundamentals, Simulations, and Advanced Topics, 2nd ed. London, UK: McGraw-Hill Publishing Company, 2001.

[9] E. Korach, S. Moran, and S. Zaks, “Tight lower and upper bounds for some distributed algorithms for a complete network of processors”, in Proc. 3rd ACM Symp. on Principles of Distributed Computing, Vancouver, Canada, pp. 199-207, Aug. 1984.

[10] P. M. B. Vitanyi, “Distributed election in an Archimedean ring of processors”, in Proc. 16th ACM Symp. on Theory of Computing, Washington, USA, pp. 542-547, 1984.

[11] H. Garcia-Molina, “Elections in a distributed computing system”, IEEE Trans. on Computers, vol. C-31, no. 1, Jan. 1982.

[12] G. N. Frederickson and N. Lynch, “Electing a leader in a synchronous ring”, Journal of the ACM, vol. 34, no. 1, pp. 98-115, 1987.

[13] E. Chang and R. Roberts, “An improved algorithm for decentralized extrema-finding in circular configurations of processes”, Communications of the ACM, vol. 22, no. 5, pp. 281-283, May 1979.

[14] G. L. Peterson, “An O(n log n) unidirectional algorithm for the circular extrema problem”, ACM Trans. on Programming Languages and Systems, pp. 758-762, Oct. 1982.

[15] G. LeLann, “Distributed systems - towards a formal approach”, Information Processing Letters, pp. 155-160, 1977.

[16] W. R. Franklin, “On an improved algorithm for decentralized extrema finding in circular configurations of processors”, Communications of the ACM, pp. 336-337, 1982.

[17] M. Effatparvar, M. R. Effatparvar, A. Bemana, and M. Dehghan, “Determining a central controlling processor with fault tolerant method in distributed system”, in Proc. of ITNG’07, May 2007.
