Leader Election Algorithm in 2D Torus Networks with the Presence of One Link Failure
The International Arab Journal of Information Technology, Vol. 7, No. 2, April 2010 105
Leader Election Algorithm in 2D Torus Networks
with the Presence of One Link Failure
Mohammed Refai¹, Ahmad Sharieh², and Fahad Alshammari³
¹Sciences and Information Technology College, Zarqa Private University, Jordan
²King Abdullah II School for Information Technology, University of Jordan, Jordan
³Information Technology and Computer Science College, University of Malaya, Malaysia
Abstract: Leader election algorithms solve the instability problem caused in a network by leader failure. In this
paper, we propose a new leader election algorithm for two-dimensional torus networks. The algorithm aims to elect one node to
be the new leader. The new leader is identified by characteristics that no other node in the network has. When the process
terminates, the network returns to a stable state with one node as leader and all other nodes aware of this leader. The new
algorithm solves this problem despite the existence of one link failure. In a network of N nodes connected as a two-dimensional
torus, the new algorithm uses O(N) messages to elect a new leader in O(√N) time steps. These results are valid for both
cases: the simple case (when the leader failure is detected by one node) and the worst case (when the failure is discovered by up
to N-1 nodes).
Keywords: Concurrency, leader election, link failure, message complexity, 2D torus networks.
Received May 13, 2008; accepted November 25, 2008
1. Introduction
One of the most fundamental problems in distributed
systems is the leader failure. This problem can be
solved by Leader Election Algorithms (LEAs). These
algorithms move the system from an initial state where
all the nodes are in the same computation state into a
new state where only one node is distinguished
computationally (called leader) and all other nodes are
aware of this leader [1, 4, 8].
Distributed systems are used to increase the
computational speed of problem solving. These systems
use a number of computers which cooperate with each
other to execute tasks. The control of distributed
algorithms requires one node to act as a controller
(leader). If the leader crashes or fails for any reason, a
new leader should be automatically elected to keep the
network working. LEAs solve this problem by
replacing the failed leader with a new, deserving leader
[5, 30, 31].
The election process is a program distributed over all
nodes. It starts when one or more nodes discover that
the leader has failed. It terminates when the remaining
nodes know who the new leader is. LEAs are
widely used in centralized systems to solve the single
point of failure problem. For example, in client-server
systems, LEAs are used when the server fails and the
system needs to transfer the leadership to another
station. LEAs are also used in token rings: when the node
that holds the token fails, the system should select a new
node to hold the token.
In distributed systems, there are many network
topologies, such as hypercube, mesh, torus, ring, and
bus. These topologies may be either hardware
processors or software processes embedded over another
hardware topology [10, 14]. This study focuses on
the 2D torus topology where one node works as a
leader. This paper proposes a new election algorithm that
solves leader failure in a 2D torus network automatically.
It also guarantees to solve the leader failure problem
despite the existence of one link failure.
The election algorithm starts when the leader failure
is detected by one node in the simple case, or by a subset
of up to N-1 nodes in the worst case. It terminates
when the new leader is elected and all other nodes
become aware of the new leader.
Section 2 presents related work. Section 3 describes
the 2D torus model structure and properties. Section 4
presents the new algorithm. A mathematical analysis of the
time steps and message complexity is presented in
section 5. Section 6 concludes the results and
suggests future work.
2. Related Work
Leader election algorithms have been studied by many
researchers [1, 2, 4, 6, 7, 10, 11, 12, 13, 14, 17, 18, 20,
22, 23, 25, 26, 27, 28, 34]. In these studies, the
researchers presented different methods to deal with
leader election. In distributed systems, a
major problem is leader failure and the corresponding
leader election algorithm.
Figure 1. 2D (7x4) Torus network.
Figure 2. Node links and codes.
The election algorithms vary based on the following:
• The nature of the algorithms (dynamic vs.
static) [7, 12, 22, 23].
• Node identity (ID) (unique identity vs. anonymous,
i.e., distinguished vs. not distinguished).
• Topology types such as ring, tree, complete graph,
mesh, torus, and hypercube [1, 9, 22, 23].
• Communication mechanism used (synchronous vs.
asynchronous) [22, 23].
• Transmission media (wired vs. wireless or radio).
• Whether link failure is handled; some of the previous
work dealt with link failure [1, 26].
The leader election problem was first considered at the
end of the seventies, starting with ring and
complete networks [1, 17, 26, 18, 11]. In the nineties,
meshes, hypercubes, and trees were studied. To date,
these topologies and wireless networks are still being
studied [13, 17]. Singh proposed a protocol
for leader election that tolerates intermittent link
failure in the complete graph network. Gerard
proposed an election algorithm for the oriented hypercube,
where each edge is assumed to be labeled with its
dimension in the hypercube. The election
problem in hypercube networks was also studied using
two models with sense of direction. The problem
of one link failure in addition to the leader failure in the
hypercube has been solved. Fault-tolerant leader
election in asynchronous complete (fully connected)
distributed networks has also been considered.
Antonoiu and Srimani proposed a self-
stabilizing algorithm for leader election in a tree graph.
Navneet and others presented two new leader
election algorithms for mobile ad-hoc networks. They
also proposed two algorithms for an
asynchronous distributed system in which the various
rounds of election proceed in a lock-step fashion.
Most of the previous researchers employed
theoretical proof to verify their algorithms. They used
big O notation to express the complexity in terms of the
number of messages and time steps, which are the
dominant factors of algorithm complexity [9, 11].
Other researchers used simulation to validate their
algorithms.
3. Model Description
In a 2D torus network, the interconnection topology is a
torus graph with N = X * Y nodes (X is the number of
nodes in the X dimension, and Y is the number of nodes
in the Y dimension of the torus network). This section
presents the model description, its properties, and the
design assumptions for this research [32, 33].
The 2D torus network is similar to the 2D mesh, except
for the connections between the first and the last nodes in
each dimension. These wraparound connections give every
node four neighbors (left, right, up, down), which
makes the topology more flexible [32, 33]. Figure
1 shows a two-dimensional torus network with seven
columns and four rows (7 X 4).
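The wraparound connections described above can be sketched with modular arithmetic. This is a minimal illustration only, not part of the proposed algorithm: the function name `neighbors`, the (row, column) position format, and the convention that "up" increases the row index are assumptions made for the sketch.

```python
def neighbors(pos, rows, cols):
    """Return the four torus neighbors of pos = (row, col).

    Assumption for this sketch: rows and columns are numbered from 0 and
    "up" means row + 1; the modulo provides the wraparound links that
    distinguish a torus from a mesh.
    """
    row, col = pos
    return {
        "right": (row, (col + 1) % cols),
        "left": (row, (col - 1) % cols),
        "up": ((row + 1) % rows, col),
        "down": ((row - 1) % rows, col),
    }


# The 7 x 4 torus of Figure 1: seven columns and four rows.
corner = neighbors((0, 0), rows=4, cols=7)
print(corner["left"])   # (0, 6): wraps around to the last column
print(corner["down"])   # (3, 0): wraps around to the last row
```

Every node, including those on the border of the grid, obtains exactly four neighbors in this way.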
3.1. Model Properties
The target architectures for the proposed algorithm are
distributed-memory, two-dimensional torus multi-
computers. For research analysis, we use a model
with the following properties. The multi-computer
consists of N nodes, which can be labeled 0, 1, 2, … ,
N-1. The nodes physically form an X * Y
(columns * rows) two-dimensional torus.
A node communicates with only one node at a time.
Multicast is not implemented in hardware. A node can
send and receive simultaneously, to and from the same
or different nodes. The network uses XY-routing: a
message is routed within a row to the column that
contains the destination node and subsequently routed
within the column. Leader failure can occur at any time.
This failure may be discovered by one node in the simple
case, or concurrently by more than one node, up to
N-1 nodes in the worst case. The proposed algorithm
solves leader failure even when there is a link failure.
Each node has a distinct ID used in the election
algorithm. Each node is connected by four links, as
shown in Figure 2.
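The XY-routing rule can be illustrated with a short sketch. The paper does not state which way around the ring a message travels, so choosing the shorter wraparound direction (ties going in the increasing direction) is an assumption of this sketch, as are the names `ring_steps` and `xy_route` and the (row, column) position format.

```python
def ring_steps(src, dst, size):
    """Unit steps (+1 or -1) from src to dst on a ring of `size` nodes.

    Assumption: the shorter way around is used; ties go in the +1 direction.
    """
    forward = (dst - src) % size
    backward = (src - dst) % size
    if forward <= backward:
        return [1] * forward
    return [-1] * backward


def xy_route(src, dst, rows, cols):
    """Nodes visited from src to dst: first within the row, then the column."""
    row, col = src
    path = [src]
    for step in ring_steps(col, dst[1], cols):   # reach the destination column
        col = (col + step) % cols
        path.append((row, col))
    for step in ring_steps(row, dst[0], rows):   # then move within that column
        row = (row + step) % rows
        path.append((row, col))
    return path


print(xy_route((0, 0), (2, 5), rows=4, cols=7))
# [(0, 0), (0, 6), (0, 5), (1, 5), (2, 5)]
```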
A torus network has advantages that make it one of
the preferable topologies. The torus is an attractive
structure for parallel processing due to its symmetry
and regularity. The diameter of the torus is
⌊X/2⌋ + ⌊Y/2⌋. A node is labeled as (x, y) and uses
X-Y routing techniques. The number of links is 2XY,
since each of the X*Y nodes has four links and each link
is shared by two nodes. In fact, the torus has
been shown to be a very versatile and robust
architecture which is capable of executing several
efficient parallel algorithms. This topology is a suitable
architecture for designing tightly coupled systems in
both parallel and distributed systems.
This research assumes the following:
• Routers should work all the time even with a faulty
node, because the fault is in the leader properties.
• All communication links are bidirectional.
• The leader node could fail for different reasons,
which leads to the loss of the leadership property.
Other nodes detect this failure when the timeout
expires without acknowledgement. Nodes
which detect this failure start the election algorithm.
• To solve the leader failure problem, each node
calculates a weight that defines its relative
importance. It then compares this weight with the
weights it has received from other nodes and propagates
the maximum weight. This weight is represented by a
distinguished identification (ID) for each node.
• Each node has a distinguished ID. The election
algorithm depends on this ID.
• When the leader node crashes, its ID degrades to 0,
so it cannot win the election.
• One intermittent link failure is recoverable.
• Leader failure may be detected by a subset of nodes
(concurrent failure). This case becomes complicated
when the failure is detected by N-1 nodes (worst
case).
Each node has the following variables:
• ID: a unique value for the election process.
• Position: the label that indicates its position.
• Leader ID, leader position.
• Phase and step.
• State: leader or normal or candidate.
4. Proposed Algorithm
Before describing the proposed algorithm, we define
the node states, phases, and messages, which help in
understanding the algorithm.
Node states: during the execution of the algorithm,
a node is in one of the following states:
• Normal: the network is normal and no leader failure is
detected by this node.
• Candidate: there is a failure and the election process
is in progress inside this node.
• Leader: one node must have this state in a stable
network.
The algorithm uses X and Y to represent the
dimensions and x and y to represent node position.
Phases: the proposed algorithm is composed of four
phases, as follows:
• Phase One: the node that detects leader failure
informs its row of the failure event.
• Phase Two: the nodes in candidate state carry out the
election process within each column to obtain the
result in the first row.
• Phase Three: nodes in the first row carry out the
election within the first row to obtain the result in
node (0, 0).
• Phase Four: the node that becomes aware of the new
leader in phase three broadcasts the new leader to all
nodes.
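The node variables and states listed above could be held in a structure such as the following. This is a sketch under assumptions: the class and field names are illustrative, the initial phase and step values are arbitrary, and the two methods only model the stated assumptions that a crashed leader's ID degrades to 0 and that a node detecting the failure becomes a candidate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

Position = Tuple[int, int]


class State(Enum):
    NORMAL = "normal"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class Node:
    node_id: int                                # unique ID used in the election
    position: Position                          # label of the node in the torus
    leader_id: Optional[int] = None
    leader_position: Optional[Position] = None
    phase: int = 0                              # 0 = no election running (assumed)
    step: int = 0
    state: State = State.NORMAL

    def on_leader_crash(self) -> None:
        """A crashed leader's ID degrades to 0, so it cannot win."""
        if self.state is State.LEADER:
            self.node_id = 0

    def on_failure_detected(self) -> None:
        """A normal node that learns of the failure joins the election."""
        if self.state is State.NORMAL:
            self.state = State.CANDIDATE
            self.phase, self.step = 1, 1


leader = Node(node_id=20, position=(0, 0), state=State.LEADER)
leader.on_leader_crash()
print(leader.node_id)  # 0
```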
Now, let us explain the events in each phase of the
algorithm:
• Phase One: the algorithm is started by the node(s) that
detect the leader failure. Such a node sends failure
messages through links 1 (right) and 3 (left) to
inform its row about the leader failure. To tolerate a
possible link failure in this phase, the failure
message is sent in both directions. Each
node which receives this message performs the
following:
a. Changes its state to candidate.
b. Passes the failure message on through the link
opposite to the one on which it was received (1 to 3,
or 3 to 1).
c. Starts phase two: selects its own ID as the greater
ID and sends an election message through link 2
(up). The election message is composed of (message
type, phase, step, greater ID, position of the
greater ID, and position of the message initiator).
d. Ignores the received message if its state is
already candidate.
• Phase Two: the candidate nodes send election
messages through link 2. Any node which receives
the election message compares its ID with the
received ID and continues with the greater one.
This process ends when the message returns to the
initiator position. After the column
election, the result is sent to the first node of each
column. This phase faces two problems: concurrent
initialization and link failure. To deal with the first
problem, any candidate node which receives the
election message ignores the message. If there is no
link failure, the result for the column is found in the
node that completed the ring. This node sends the
result to the node labeled (x, 0). To solve the second
problem, the node that sends the election message
waits for an acknowledgment. If the node does not
receive the acknowledgment before the timeout, it
concludes that there is a link failure. The node that
detects the link failure sends a link-failure
message through link 3. The node which receives
this message forwards it through link 2, and then
left to pass the failed link. To complete the
algorithm, the result is sent to the node labeled (x,
0). In the end, one node in each column is aware of the
column leader, so that the result for the leader is
within the first row of the network.
• Phase Three: when the node labeled (0, 0)
(the leftmost node in the first row) finishes phase
two, it starts phase three by sending an election message
through link 1 and waits for an acknowledgment. Any
node which receives a phase three election message
from the left sends an acknowledgment message.
Then it compares IDs and sends a phase three
election message through link 1. If a node does not
receive the acknowledgment message before the timeout,
it concludes that there is a link failure. To solve this
problem in this phase, the node sends a link-failure
message through link 2, then link 1, and then down to
bypass the failed link. In this phase, any node which
receives a phase three message before finishing phase
two waits for the last message in phase two and then
continues. Phase three terminates when the phase
three election message is received by node (0, 0).
This node starts phase four by broadcasting the
result to all nodes.
• Phase Four: after phase three, one node is aware of
the new leader information. This node broadcasts
the result as follows:
a. Row broadcast: the node sends a leader
message in two directions through links 1 and 3
in order to make all nodes in the first row aware
of the new leader.
b. Column broadcast: the receivers of the row
broadcast update their stored leader information
and change their state to normal. Then,
they send the leader message through links 2 and
4. Any node which is already aware of the new leader
in phase four ignores any new message about
the leader.
The initiators of the leader message, within the row in
the row broadcast and within the columns in the column
broadcast, send the leader message in two directions.
This is to tolerate one possible link failure.
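The comparison logic of phases two and three can be sketched as a small, failure-free simulation. It is a simplification and not the authors' pseudo code: acknowledgments, timeouts, concurrent initiators, and link failures are left out, each column election is assumed to start at the first-row node, and the names `Election`, `ring_election`, and `elect` are illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Election:
    """Fields of the election message described in phase one."""
    phase: int
    step: int
    greater_id: int
    greater_pos: tuple
    initiator: tuple


def ring_election(ids, positions, phase, initiator):
    """Pass one election message once around a ring, keeping the greater ID."""
    msg = Election(phase, 1, ids[0], positions[0], initiator)
    for node_id, pos in zip(ids[1:], positions[1:]):
        if node_id > msg.greater_id:
            msg = replace(msg, greater_id=node_id, greater_pos=pos)
        msg = replace(msg, step=msg.step + 1)    # one more hop on the ring
    return msg


def elect(grid):
    """grid[row][col] holds node IDs; returns (leader ID, leader position)."""
    rows, cols = len(grid), len(grid[0])
    column_results = []
    for col in range(cols):                      # phase two: one ring per column
        positions = [(row, col) for row in range(rows)]
        ids = [grid[row][col] for row in range(rows)]
        column_results.append(ring_election(ids, positions, 2, positions[0]))
    # Phase three: node (0, 0) circulates a message over the column results.
    final = ring_election([m.greater_id for m in column_results],
                          [m.greater_pos for m in column_results], 3, (0, 0))
    return final.greater_id, final.greater_pos


# 4 x 4 torus; the crashed leader's ID has degraded to 0.
grid = [[5, 12, 0, 9],
        [3, 8, 14, 1],
        [11, 2, 6, 16],
        [7, 13, 10, 4]]
print(elect(grid))  # (16, (2, 3))
```

Because the crashed leader carries ID 0, it can never hold the greater ID in any comparison.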
This example is applied to a 4X4 torus network.
Assume that the link between nodes (0, 1) and (0, 2)
has failed, as shown in Figure 3(a). Node (1, 2) detects
the leader failure, so it starts the algorithm by sending
two leader-failure messages through links 1 and 3 (thin
arrows). Node (1, 2) also starts phase two by sending
an election message through link 2 (bold arrow), as shown
in Figure 3(a). In the second step, the nodes that
received the leader-failure message pass it on through
the opposite link and start phase two. The node that
started the algorithm in the first step waits for an
acknowledgement. Node (2, 2) continues phase two
after comparing IDs and selecting the greater one.
Then, it sends an Ack message to node (1, 2), as shown in
Figure 3(b). The nodes continue the algorithm as in
Figures 3(c) and 3(d).
The election messages are shown as bold arrows
and the Ack messages as gray arrows. The election steps in
phase two continue until the messages reach the
election initiator in the column. Then, the column
results are sent to the first row, as shown by the dotted
arrows.
Phase three starts when node (0, 0) receives its
column result; it then sends an election message via link 1.
Node (0, 1) passes this message on and returns the Ack
message. As shown in Figure 3(j), when the waiting time
of node (0, 1) expires, it detects the link failure, so
it uses the detour shown in Figure 3(j). The election
process continues until node (0, 0) receives the
election message and so obtains the identification of the
new leader. In Figure 3(k), node (0, 0) starts
broadcasting the new leader information to the first
row.
Each node that receives the leader message changes its
state to normal and broadcasts the leader information
to its column, as in Figure 3(l).
Figure 3. Steps for explaining the proposed algorithm when the link
between nodes (0, 1) and (0, 2) fails and the leader failure is
detected by node (1, 2).
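The detour taken by node (0, 1) in this example (link 2, then link 1, then down) can be written out explicitly. The sketch assumes positions are (row, column) pairs and that link 2 (up) increases the row index, which matches node (1, 2) reaching node (2, 2) in the example; the function name is illustrative.

```python
def phase_three_detour(node, rows, cols):
    """Nodes visited when the link to the right of `node` has failed.

    The link-failure message goes up (link 2), right (link 1), and then
    down, arriving at the right-hand neighbor of `node`.
    """
    row, col = node
    up = ((row + 1) % rows, col)
    up_right = (up[0], (col + 1) % cols)
    right_neighbor = (row, (col + 1) % cols)
    return [node, up, up_right, right_neighbor]


path = phase_three_detour((0, 1), rows=4, cols=4)
print(path)           # [(0, 1), (1, 1), (1, 2), (0, 2)]
print(len(path) - 1)  # 3 messages to bypass the failed link
```

The three hops correspond to the three messages counted for covering the link failure in the performance evaluation.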
4.2. Abstract Algorithm
This section presents the pseudo code for the
algorithm. A number of assumptions and variables
have to be assigned. Each node has the following
variables:
• Local ID: the node ID that participates in the
election.
• Local Pos: the node position.
The algorithm uses five types of messages:
• Election: composed of: steps of phases one to four;
ID (the winner ID); Pos (the winner position); and
the position of the message initiator.
• Leader: contains the new leader (ID and position).
• Link-failure: similar to the election message, except
for the type, and used to bypass the link failure.
• Column-result: used in phase two to inform the first
row of the column election result.
• Failure: used to inform the row about the leader
failure.
The nodes are in one of four states:
• Normal: when the node is unaware of any failure
and the network is stable.
• Candidate: when the node is aware of the failure and
is participating in the election process.
• Leader: one node must have this state in a stable
network.
• Failure: when the leader loses the leader properties.
Figure 4 shows the pseudo code for the proposed
algorithm.
5. Performance Evaluation
Performance evaluation is carried out by computing
the number of messages and time steps. The analysis
is carried out for two cases. The first is
the simple case, when the failure is detected by one
node. The second is the case when the leader failure
is detected by a subset of nodes, which can reach all
other nodes in the worst case.
5.1. Simple Case
5.1.1. Number of Messages
• Phase One: one node detects the leader failure. This
node starts phase one by sending 2 leader-failure
messages through links 1 and 3. Step two to step
X/2 + 1: each step needs two messages (any node
which receives the leader-failure message sends this
message through the inverse link). The last two
nodes may use an extra two messages if a node
sends a leader-failure message before receiving it
from the inverse link. Another way to find the number
of messages needed for phase one is to count
received messages. Each node receives one
message, except the last two nodes, which receive
two messages. So, the number of messages needed
for phase one is (X + 2) messages.
• Phase Two: step one to step Y: in each step, X
election messages are needed through the links labeled
2, and the same number is needed for
acknowledgment. So, the total number of these
messages is expressed as in equation 1.
$\sum_{i=1}^{Y} 2X = 2XY$ messages (1)
Each column needs one column-result message in
order to inform the first row of its result. This needs
X messages. The total number of messages for phase
two is 2XY + X.
• Phase Three: when node (0,0) receives the column-
result message, it starts phase three. In step one, node
(0,0) sends an election message through link 1 and
waits for an acknowledgement. In steps 2 to X,
each node receives the election message and sends
an acknowledgment message through link 3 and an
election message through link 1. When node (0,0)
receives the election message, it obtains the new
leader information after sending one message for
acknowledgement. In other words, in phase three
each node sends two messages (election and
acknowledgement messages). So, the number of
messages in phase three is as in equation 2.
$2X$ messages (2)
• Phase Four: node (0,0) starts a row broadcast by
sending two messages through links 1 and 3 in step
1. In steps 2 to X/2, two messages are used in each
step. In the last step, two extra messages are used if
the last nodes send the message before receiving it.
Row broadcast needs (X + 2) messages.
In column broadcast, as in the row broadcast, Y + 2
messages are needed for each column. Therefore, the
total number of messages for X columns is X(Y + 2). For
phase four, the number of messages needed is as in
equation 3.
$(X + 2) + (XY + 2X) = XY + 3X + 2$ (3)
In order to cover the link failure in phase two or phase
three, the algorithm needs three extra messages. So, the
total number of messages for the whole algorithm is as in
equation 4.
$(X + 2) + (2XY + X) + 2X + (XY + 3X + 2) + 3 = 3XY + 7X + 7$ (4)
When X = Y, XY = N and X = \sqrt{N}. Thus, the total
number of messages in terms of N is expressed in equation 5.
$3N + 7\sqrt{N} + 7 = O(N)$ messages (5)
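The per-phase counts can be checked numerically against the closed forms in equations 4 and 5. The sketch below only re-adds the counts stated in this section; the function names are illustrative.

```python
import math


def message_counts(x, y):
    """Simple-case message counts per phase for an X-by-Y torus."""
    return {
        "phase_one": x + 2,
        "phase_two": 2 * x * y + x,           # equation 1 plus X column results
        "phase_three": 2 * x,                 # equation 2
        "phase_four": (x + 2) + x * (y + 2),  # equation 3
        "link_failure": 3,                    # detour around the failed link
    }


def total_messages(x, y):
    return sum(message_counts(x, y).values())


# Equation 4: the total equals 3XY + 7X + 7 for every size tried.
for x in range(3, 10):
    for y in range(3, 10):
        assert total_messages(x, y) == 3 * x * y + 7 * x + 7

# Equation 5 for a square torus with N = 16 nodes (X = Y = 4).
n = 16
side = math.isqrt(n)
print(total_messages(side, side))  # 83 = 3*16 + 7*4 + 7
```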