Reducing Null Messages Using Grouping and Status Retrieval for a Conservative
Discrete-Event Simulation System
Bevin Thomas, Syed S. Rizvi, Khaled M. Elleithy
Department of Computer Science, University of Bridgeport, Bridgeport, CT 06604, USA
{srizvi, bthomas, elleithy}@bridgeport.edu
Tel: 92 (111) 002-004
Fax:92 (021) 509-0968
Keywords: Discrete event simulation, conservative
algorithms, null message algorithm, parallel systems.
Abstract
In this paper, we investigate the Chandy-Misra-Bryant null message algorithm (NMA) and propose a grouping technique to improve its performance. This technique, together with status retrieval, which is explained in detail below, can improve performance compared to the traditional conservative algorithm of Chandy, Misra, and Bryant. The NMA is an efficient conservative algorithm that uses null messages to synchronize logical processes (LPs) in a parallel discrete event simulation (PDES) system. Its performance can degrade if the LPs generate a large number of null messages to avoid deadlock. The main objective of this work is to propose a new grouping technique that reduces the null messages exchanged between the LPs. Since the performance of the NMA depends mainly on the Lookahead (L) values, the proposed technique can also be used to determine an optimum value of the Lookahead.
1. INTRODUCTION
Parallel and distributed simulation refers to technologies that enable a simulation program to execute on multiple processors connected by a network. Parallel simulations execute on multiple processors or multiple computers confined to a single machine room, while distributed simulations execute on computers that are distributed geographically. Time management is required to ensure that the execution of a distributed simulation is properly synchronized, that is, that events are processed in the correct order. Time management algorithms assume that logical processes (LPs) communicate by exchanging timestamped messages or events. The criterion is that each LP processes its events in timestamp order.
There are two kinds of parallel simulation: optimistic and conservative. Optimistic simulation allows processors to simulate events independently, assuming they are temporally correct. When a temporal discrepancy is discovered, the simulation is “rolled back” to the time of the discrepancy and then proceeds again. Conservative simulation never allows discrepancies; an event is processed only when it can be guaranteed that the event will not be altered. The principal task of conservative simulation is to determine when it is “safe” to process an event. An event is said to be safe if its timestamp is less than the Lower Bound on the Time Stamp (LBTS).
The algorithms developed by Chandy, Misra, and Bryant were the first synchronization algorithms to be developed. Each LP sends messages with non-decreasing timestamps, and messages are received in the same order in which they were sent. Each process executes the event with the smallest timestamp from its input queues. If any of the queues is empty, the process is blocked until that queue is no longer empty. This approach is prone to deadlock, and Chandy, Misra, and Bryant therefore suggested the Null Message Algorithm (NMA), in which null messages are used to avoid deadlock. A null message with timestamp T_null sent from LP_a to LP_b is a promise by LP_a that it will not later send a message to LP_b carrying a timestamp smaller than T_null. The null message algorithm relies on a key property, the Lookahead. In simple words, an LP at simulation time T has Lookahead L if it can guarantee that any message it sends in the future will have a timestamp of at least T + L, regardless of what messages it may later receive.
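The null message promise and the safety test described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the class `LogicalProcess`, its fields, and the method names are hypothetical, and the LBTS is assumed here to be the minimum of the latest timestamps seen on the incoming channels.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class LogicalProcess:
    """Hypothetical LP model used only for illustration."""
    name: str
    lookahead: float          # L: minimum distance of future sends from the clock
    clock: float = 0.0        # T: current simulation time of this LP
    # Latest timestamp seen on each incoming channel, keyed by sender name.
    channel_clock: Dict[str, float] = field(default_factory=dict)

    def receive(self, sender: str, timestamp: float) -> None:
        # Real and null messages both advance the channel clock.
        # Timestamps on one channel are non-decreasing, so keep the maximum.
        previous = self.channel_clock.get(sender, 0.0)
        self.channel_clock[sender] = max(previous, timestamp)

    def lbts(self) -> float:
        # Assumed definition: no incoming channel can deliver anything earlier
        # than the smallest channel clock. With no channels, nothing can arrive.
        if not self.channel_clock:
            return float("inf")
        return min(self.channel_clock.values())

    def is_safe(self, event_timestamp: float) -> bool:
        # An event is safe when its timestamp is less than the LBTS.
        return event_timestamp < self.lbts()

    def null_message(self) -> Tuple[str, float]:
        # Promise: nothing sent later will carry a timestamp below T + L.
        return (self.name, self.clock + self.lookahead)
```

For example, an LP at T = 10 with L = 3 emits a null message with T_null = 13. A receiver that has also seen timestamp 15 on another channel then has an LBTS of 13, so an event at time 12 is safe while an event at time 14 is not.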
1.1 Problem Identification
The null message algorithm (NMA) resolves the problem of deadlock by sending null messages between neighboring LPs. The drawback of the NMA is that performance can degrade drastically if a large number of null messages are sent across the network between LPs. The number of null messages depends on the Lookahead value (L): if the Lookahead value is very small, many null messages are sent. The main objective is to calculate an optimum Lookahead value so that the number of null messages is reduced and performance therefore improves.
1-4244-1457-1/08/$25.00 © IEEE
This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the SpringSim 2009 proceeding.
2. RELATED WORK
A few studies have addressed reducing the number of null messages in the NMA and calculating the optimum Lookahead value. Ronald C. De Vries [4] reduced the number of null messages through the prediction of channel times. That work presents a framework on which distributed discrete event simulation can be built for applications that can be decomposed into feed-forward and feedback networks.
Another notable work is that of Syed S. Rizvi, K. M. Elleithy, and Aasia Riasat [2], who proposed a mathematical model that can be used to approximate the optimal values of critical parameters such as the frequency of transmission, the Lookahead (L) values, and the variance of null message elimination. According to B. R. Preiss, W. M. Loucks, J. D. MacIntyre, and J. A. Field [3], null message cancellation can improve performance by a large factor. Null message cancellation is an algorithmic modification to the basic conservative synchronization scheme wherein a null message is discarded before receipt when overcome by a message with a larger timestamp.
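As a rough illustration of the cancellation idea just described, the following Python sketch models one channel as a FIFO queue and drops a queued null message when a later message with a larger timestamp is sent on the same channel. It is not the implementation studied in [3]; the names `Message` and `Channel` and the `cancelled` counter are assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass(frozen=True)
class Message:
    timestamp: float
    is_null: bool


class Channel:
    """Hypothetical FIFO channel between two LPs."""

    def __init__(self) -> None:
        self.pending: Deque[Message] = deque()  # sent but not yet received
        self.cancelled = 0                      # null messages discarded so far

    def send(self, msg: Message) -> None:
        # A not-yet-received null message at the tail carries no information
        # once a message with a larger timestamp follows it, so discard it.
        while (self.pending
               and self.pending[-1].is_null
               and self.pending[-1].timestamp < msg.timestamp):
            self.pending.pop()
            self.cancelled += 1
        self.pending.append(msg)

    def receive(self) -> Optional[Message]:
        # Deliver the oldest surviving message, or None if the queue is empty.
        return self.pending.popleft() if self.pending else None
```

Sending null messages at times 5 and 8 followed by a real message at time 9 leaves only the real message in the queue, with two null messages cancelled before receipt.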
3. METHOD OF GROUPING AND STATUS RETRIEVAL
In this paper, the approach to reducing the number of null messages is to group logical processes, where each group consists of n LPs. The number of LPs in each group is chosen either according to their similarities or as an optimum value, which should be calculated. Each group is controlled by a controller. The role of the controller is to synchronize the LPs within its group and to send synchronization messages across groups to other controllers. Controllers send synchronization messages at fixed intervals of time. Whenever a controller receives a message from a neighboring linked controller, it broadcasts the message within its group.
The controller must be directly connected to the LPs of its group. All LPs are connected to their neighboring LPs using a mesh topology. However, an LP can send synchronization messages only to its controller. A synchronization message from an LP to its controller indicates that the sending LP has finished its assigned tasks. Upon receiving a synchronization message from one of its LPs, the controller broadcasts it inside the group.
Fig. 1 represents the implementation of the proposed algorithm. The upper dotted box of Fig. 1 represents the behavior of a controller when it receives a message from a grouped LP. The lower dotted box of Fig. 1 represents how the controller deals with a message originating from an ungrouped LP.
Fig. 1. An illustration of the proposed algorithm. The upper dotted box represents the implementation of the first 4 steps. The lower dotted box represents steps 5 to 8 of the proposed algorithm. The last 2 steps of the proposed algorithm are used in both the upper and lower boxes.
In Fig. 2, there are 9 LPs, grouped into 3 groups of 3 LPs each. Each group also has a controller, which is directly connected to each LP.
3.1 Proposed Algorithm
1. While (simulation is not over)
2. Controller receives a message from an LP inside the group
3. Record the LP and the timestamp
4. Broadcast the message to the other LPs
5. If a message is received from the controller of another group
6. Broadcast the message to the LPs within the group
7. Send null messages to neighboring controllers
8. with the smallest timestamp, indicating a lower bound on future
messages sent from that group,
9. approximately (T + L)
10. End if
11. End loop
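The steps above can be sketched in Python as follows. This is one possible reading of the pseudocode, not the authors' implementation: the `Controller` class, its `outbox` list, and the handler names are hypothetical, and T is assumed to be the smallest timestamp the controller has recorded for the LPs of its group.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Message = Tuple[str, float]  # (original sender, timestamp)


@dataclass
class Controller:
    """Hypothetical group controller used only for illustration."""
    name: str
    lookahead: float                      # L for this group
    members: List[str]                    # LPs directly connected to the controller
    neighbors: List[str] = field(default_factory=list)   # other controllers
    last_timestamp: Dict[str, float] = field(default_factory=dict)
    # Messages queued for delivery as (destination, message) pairs.
    outbox: List[Tuple[str, Message]] = field(default_factory=list)

    def on_lp_message(self, lp: str, timestamp: float) -> None:
        # Steps 2-4: record the LP and its timestamp, then broadcast
        # the message to every other LP in the group.
        self.last_timestamp[lp] = timestamp
        for other in self.members:
            if other != lp:
                self.outbox.append((other, (lp, timestamp)))

    def on_controller_message(self, sender: str, timestamp: float) -> None:
        # Steps 5-6: broadcast a message from another group inside this group.
        for lp in self.members:
            self.outbox.append((lp, (sender, timestamp)))
        # Steps 7-9: send null messages carrying T + L, where T is assumed to be
        # the smallest recorded timestamp (or the incoming one if none recorded).
        t = min(self.last_timestamp.values(), default=timestamp)
        for controller in self.neighbors:
            self.outbox.append((controller, (self.name, t + self.lookahead)))
```

With a group of three LPs, one neighboring controller, and L = 2, recording timestamps 5 and 7 from two LPs and then receiving a controller message produces a single outgoing null message with timestamp 7, so only one message leaves the group.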
Fig. 2. Topological map of 3 groups with nine LPs and three master controllers. Within a group, the LPs are connected via a mesh topology. LPs can communicate with each other in a group, but they can only send synch-messages to their respective controller.
Fig. 3 shows the implementation of the proposed algorithm with eight LPs. If all LPs were interconnected, a total of 32 null messages would be sent for a single deadlock. If the Lookahead is a small value, the total number of null messages transmitted between the LPs can be approximated as follows:
Total null messages = 32 * n, where n is the iteration number.
Note in Fig. 3 that the total number of null messages is reduced by the proposed algorithm. Node A sends a sync message to controller A. On receiving it, the controller sends the message inside the group on every link except the one on which it arrived, and also keeps track of the latest timestamp of each LP. Thus, in our example, it sends 9 messages within the group. The total number of messages exchanged across all the LPs is
Total number of null messages = 9 × 2 (inside the group) + 2 (between controllers) + 3 × 2 (broadcast inside group) = 26.
The main point to note is that only 2 messages are sent outside the group. This brief analysis emphasizes the significance of the proposed methodology of grouping the LPs with their respective controllers and connecting them using the mesh topology.
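The message counts in this example can be checked with a few lines of Python. The function and parameter names are hypothetical, and the factor of 2 in the grouped count is assumed to stand for the number of groups in Fig. 3; the constants 32, 9, 2, and 3 are taken directly from the text.

```python
def ungrouped_null_messages(iterations: int, per_deadlock: int = 32) -> int:
    # Total null messages = 32 * n for the eight interconnected LPs.
    return per_deadlock * iterations


def grouped_null_messages(
    inside_group: int = 9,         # messages sent within one group
    groups: int = 2,               # assumed meaning of the factor "2"
    between_controllers: int = 2,  # messages that leave a group
    broadcast_in_group: int = 3,   # controller broadcast inside one group
) -> int:
    # 9 x 2 (inside the group) + 2 (between controllers) + 3 x 2 (broadcast).
    return inside_group * groups + between_controllers + broadcast_in_group * groups


baseline = ungrouped_null_messages(iterations=1)
grouped = grouped_null_messages()
saved = baseline - grouped  # messages avoided for a single deadlock
```

For a single deadlock this gives 32 messages without grouping and 26 with grouping, a difference of 6, and only 2 of the 26 cross a group boundary.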
4. CONCLUSION
In this paper, we presented a new technique of grouping the LPs combined with a status retrieval method. To support the proposed technique, we presented an algorithm for the modified NMA. We also discussed the implementation of the proposed technique with the help of schematics. We believe that the proposed method of grouping LPs can reduce the transmission of null messages to a reasonable extent. Although this technique is expensive, since a controller has to be set up, it serves the purpose of reducing the null messages.
REFERENCES
[1] K. M. Chandy and J. Misra, “Distributed Simulation: A Case Study in Design and Verification of Distributed Programs,” IEEE Transactions on Software Engineering, SE-5:5, pp. 440-452, 1979.
[2] Syed S. Rizvi, K. M. Elleithy, and Aasia Riasat, “Minimizing the Null Message Exchange in Conservative Distributed Simulation,” International Joint Conferences on Computer, Information, and Systems Sciences, and Engineering, CISSE 2006, pp. 443-448, Bridgeport, CT, December 4-14, 2006.
[3] B. R. Preiss, W. M. Loucks, J. D. MacIntyre, and J. A. Field, “Null Message Cancellation in Conservative Distributed Simulation,” Distributed Simulation 91: Proceedings of the SCS Multiconference on Advances in Parallel and Distributed Simulation, 1991.
[4] Ronald C. De Vries, “Reducing Null Messages in Misra’s Distributed Discrete Event Simulation Method.”
Fig. 3. Implementation of 8 LPs with the mesh topology within the group. The figure shows a different construction of grouped LPs connecting via a unicast connection between B and H.