Reducing null message traffic in large parallel and distributed systems

Syed S. Rizvi and Khaled M. Elleithy
Computer Science and Engineering Department
University of Bridgeport, Bridgeport, CT 06601
{srizvi, elleithy}@bridgeport.edu

Aasia Riasat
Department of Computer Science
Institute of Business Management
Karachi, Pakistan 78100
aasia.riasat@iobm.edu.pk

978-1-4244-2703-1/08/$25.00 ©2008 IEEE
Authorized licensed use limited to: University of Bridgeport. Downloaded on February 24,2010 at 11:56:32 EST from IEEE Xplore. Restrictions apply.

Abstract
The null message algorithm (NMA) is one of the efficient conservative time management algorithms that use null messages to provide synchronization between the logical processes (LPs) in a parallel discrete event simulation (PDES) system. However, the performance of a PDES system could be severely degraded if a large number of null messages need to be generated by LPs to avoid deadlock. In this paper, we present a mathematical model based on the quantitative criteria specified in [12] to optimize the performance of NMA by reducing the null message traffic. Moreover, the proposed mathematical model can be used to approximate the optimal values of some critical parameters such as frequency of transmission, Lookahead (L) values, and the variance of null message elimination. In addition, the performance analysis of the proposed mathematical model incorporates both uniform and non-uniform distribution of L values across multiple output lines of an LP. Our simulation and numerical analysis suggest that an optimal NMA offers better scalability in a PDES system if it is used with the proper selection of critical parameters.
Keywords: Conservative distributed simulation, discrete event, null messages, parallel and distributed systems.
1. Introduction
While much research has evaluated the performance of the conservative NMA in terms of message transmission overhead and processor idle time, comparatively little work has been devoted to optimizing the NMA. This paper presents a mathematical model based on the quantitative criteria specified in [12] to optimize the performance of NMA by minimizing the null message transmission across each LP.
In PDES systems, the distributed discrete events need to be tightly synchronized with each other in order to produce correct results. If these events are not properly synchronized, the performance of a PDES environment may degrade significantly [2]. Time management algorithms are, therefore, required to ensure that the execution of a PDES is properly synchronized. The two main classes of time management algorithms are optimistic and conservative. In optimistic time management algorithms, errors are detected and recovered from at run time. However, the performance of optimistic synchronization protocols depends mainly on the transmission delay [13]. In conservative PDES, on the other hand, each LP processes events strictly in time stamp order. Since the LPs do not share a consistent view of the state of the entire system, they must exchange information to determine when events are safe to process [1, 3].
Although much research has evaluated the performance of the conservative NMA with respect to inefficiencies and overhead [3, 12], none of it suggests any potential optimization for the NMA. Reference [12] proposed a quantitative criterion that incorporates many critical parameters relevant to the performance of NMA. It has been shown that the selection of values for several critical parameters, such as the Lookahead (L), the null message ratio (NMR), and the frequency of transmission, plays an important role in the generation of null messages [12]. If these values are not properly chosen by a simulation designer, the result will be an excessive number of null messages across each LP. The situation becomes more severe when the NMA is used to run a detailed logistics simulation over a huge amount of data in a distributed environment [9].
This paper presents a mathematical model based on the quantitative criteria specified in [12] to optimize the performance of NMA by reducing the null message traffic. The reduction in the null message traffic significantly improves the performance of a PDES system by both minimizing the transmission overhead and maintaining a consistent parallelization. Moreover, the proposed mathematical model can be used to approximate the optimal values of some critical parameters such as frequency of transmission, L values, and the variance of null message elimination. These optimal values can further be used to eliminate the unnecessary generation of null messages across the LPs. In addition, the performance analysis of the proposed mathematical model incorporates both uniform and non-uniform distribution of L values across multiple output lines of an LP. Our simulation and numerical analysis suggest that an optimal NMA offers better scalability in a PDES system if it is used with the proper selection of critical parameters.
The rest of the paper is organized as follows. Section
2 provides an overview of the conservative
synchronization protocols. Section 3 presents the
proposed mathematical model based on the quantitative
criteria specified in [12]. Section 4 discusses the
potential optimizations in the NMA based on the
proposed mathematical model. Section 5 presents a
performance analysis for both the proposed
mathematical model and the optimizations for NMA.
Finally, Section 6 concludes the paper.
2. Related work
Event synchronization is an essential part of parallel
simulation. In general, synchronization protocols can
be categorized into two different families: conservative
and optimistic. Conservative protocols fundamentally
maintain causality in event execution by strictly
disallowing the processing of events out of timestamp
order. The main problems faced in conservative
algorithms are overcoming deadlock and guaranteeing
the steady progress of simulation time. Examples of
conservative mechanisms include the null message
protocol (NMP) of Chandy, Misra, and Bryant [6];
Peacock, Manning, and Wong [11] likewise avoided
deadlock through null messages. The primary problem
associated with null messages is that if their
timestamps are chosen inappropriately, the simulation
becomes choked with null messages and performance
suffers. More intelligent approaches to null message
generation include generation on demand [8] and
generation after a time-out [5]. Some earlier research
on discrete event simulation has focused on variants of
the NMP, with the objective of reducing the high null
message overhead.
For instance, Bain and Scott [4] attempt to simplify the
communication topology to resolve the problem of
transmitting redundant null messages due to low
Lookahead cycles. Other recent developments [10]
have focused on incorporating knowledge about the LP
into the synchronization algorithms. Cota and Sargent
[7] focused on the skew in simulation time between
different LPs by exploiting knowledge about the LPs
and the topology of the interconnections.
Although earlier work has aimed to optimize the
performance of the NMA by proposing the variants of
the NMP [3, 4, 8, 10, 12], it has not addressed reducing
the exchange of null messages that is caused by
improper selection of the parameters.
The principal problem with the NMA is that it uses
only the current simulation time of each LP and the L
value to predict the minimum time stamp of messages it
can generate in the future [12]. These messages with
the minimum time stamp are then used to avoid
deadlock. As a result, if one of the important
parameters such as the L value is chosen poorly, the
performance will degrade significantly due to an
excessive number of null messages. However, the
prediction of minimum time stamps of messages can be
improved by understanding the relationship between
the time stamp and the L value [12].
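The prediction step described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the sketch assumes the usual conservative rule that an LP at simulation time t with Lookahead L will not send any message stamped earlier than t + L.

```python
from typing import Sequence


def null_message_timestamp(current_time: float, lookahead: float) -> float:
    """Earliest timestamp any future message from this LP can carry (assumed t + L)."""
    if lookahead < 0:
        raise ValueError("lookahead must be non-negative")
    return current_time + lookahead


def is_safe_to_process(event_time: float, input_link_clocks: Sequence[float]) -> bool:
    """An event is safe once no input link can still deliver an earlier message."""
    return event_time <= min(input_link_clocks)


if __name__ == "__main__":
    # An LP at simulation time 10.0 with Lookahead 0.5 promises nothing earlier than this.
    print(null_message_timestamp(10.0, 0.5))
    # Input links have advanced to 3.0 and 4.5; compare events stamped 3.0 and 4.0.
    print(is_safe_to_process(3.0, [3.0, 4.5]), is_safe_to_process(4.0, [3.0, 4.5]))
```

A small L keeps the promised timestamp close to the current time, so many null messages are needed before a neighbor's clock advances far enough to unblock it.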
3. Mathematical model for NMA
A PDES environment involves synchronization
overhead which is added due to the distributed nature
of simulation. With NMA, this overhead is mainly
associated with the transmission of null messages.
Therefore, when comparing the performance of a
PDES environment that uses NMA with the
performance of sequential execution, the message
overhead can make a significant performance
difference between the two approaches. Before
presenting the proposed mathematical model, it is worth mentioning some of our key assumptions.
We assume that the value of L may change during the execution of a Lookahead period. However, the value of L cannot be reduced instantaneously.
Initially, a constant event arrival (job intensity) rate is assumed for each LP participating in the simulation. However, for the sake of experimental verification, we also consider a non-uniform distribution of L values across the multiple output lines of an LP.
For the frequency of message transmission, we assume that all messages are equally distributed among the LPs. For the proposed mathematical model, we assume that the simulation has n LPs, all connected with each other by a highly reliable mesh network topology. Each LP also maintains n-1 input links and n-1 output links to its input and output neighboring LPs, respectively.
3.1. Definition of system parameters
All model variables, along with their definitions, are listed in Table 1. Based on the concept of NMA, we assume that each LP maintains two clock times, one for each of its input and output neighbors, as shown in Figure 1. One is the minimum receiving time (MRT) for the input neighbor LP, whereas the second is the minimum sending time (MST) for the output neighbor LP. The MRT represents the earliest time at which an LP can receive an event message from one of its input neighboring LPs, whereas the MST represents the earliest time at which an LP can send a message to one of its neighboring LPs. The performance (P) of a conservative distributed simulation environment mainly depends on the amount of computation required for processing an event per second. In addition, the event arrival rate ($\rho$) represents the number of events that occur per second (in practice, events occur per simulation second). Unlike performance, the parameter $\rho$ mainly depends on the model. Lookahead (L) is measured in seconds. Frequency of transmission ($F_T$) is the frequency of sending a message from one LP to another. In addition, $T_{Null}$ represents the timestamp of a null message sent from one LP to other LPs.
In order to measure the performance, it is
imperative to consider one parameter that can compute
simulation time advancement (STA). The STA can be
defined as a ratio of performance to event arrival rate.
This relationship can be expressed as:
$$STA = \frac{P}{\rho} \tag{1}$$
The value of MRT is updated by the time stamp of a
null message coming from other neighboring LPs on
one of the input links of a receiving LP. Any event
scheduled by an LP must have a timestamp at least as
large as the LP’s simulation time clock [1]. This
requirement is also referred to as the local causality
constraint. To strictly follow this requirement, a large
number of null messages may be transmitted by LPs
before the non-null messages can be processed. This
large message overhead may degrade
the performance of a conservative distributed
simulation. It is, therefore, worth computing the ratio of
null messages to the total messages transmitted among
LPs. The NMR can be simply defined as the ratio of
total number of null messages to total messages where
total messages include both null and event messages.
Mathematically, it can be expressed as follows:
$$NMR = \frac{\text{Total Number of Null Messages}}{\text{Total Messages}} \tag{2}$$
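As a quick numeric illustration of the two ratios defined in (1) and (2), the following sketch computes both. The function names and the sample values are illustrative assumptions and are not taken from the paper.

```python
def simulation_time_advancement(performance: float, event_arrival_rate: float) -> float:
    """STA as the ratio of performance (P) to event arrival rate (rho), as in (1)."""
    if event_arrival_rate <= 0:
        raise ValueError("event arrival rate must be positive")
    return performance / event_arrival_rate


def null_message_ratio(null_messages: int, event_messages: int) -> float:
    """NMR as null messages over total (null + event) messages, as in (2)."""
    total = null_messages + event_messages
    if total == 0:
        return 0.0  # no traffic at all, so no null-message overhead
    return null_messages / total


if __name__ == "__main__":
    print(simulation_time_advancement(200.0, 50.0))
    print(null_message_ratio(30, 70))
```

An NMR close to 1 means that almost all traffic is synchronization overhead, which is the situation the model in the next section tries to avoid.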
3.2. The proposed mathematical model
First, we present a mathematical model based on the
quantitative criteria specified in [12]. In addition, the
proposed mathematical model is also based on the
internal architecture of an LP as shown in Fig. 1. The
architecture for m number of LPs is shown in Fig. 2.
Using the quantitative criteria defined in [12], we can
approximate the advancement in the simulation time as
a ratio of performance to event arrival rate. This leads
us to the following mathematical expression of the
relative speed for simulation advancement:
$$P \cong (T_S)\,\{E_{Msg}\} \tag{3}$$
Taking this into account, we can give the following hypothesis for approximating the number of null messages transmitted per LP: if we assume an average value of L associated with one of the output lines of an LP, then P can be approximated as:
$$P \cong E_{Msg}\left(\frac{1}{L}\right) \tag{4}$$
Combining (3) and (4) yields the estimated number
of null messages transmitted per LP that has only one
output line as shown in (5).
$$Null_{LP} \cong (E_{Msg})\left(\frac{1}{L}\right)(T_S) \cong E_{Msg}\,(T_S)\left(\frac{1}{L}\right) \tag{5}$$
Table 1. System parameter definition

Parameter   Definition
P           Computation required for processing an event per second
ρ           Event arrival rate (events per second)
MRT         Minimum receiving time
MST         Minimum sending time
L           Lookahead
STA         Simulation time advancement
F_T         Frequency of transmission
T_Null      Timestamp of a null message
T_S         Current simulation time of an LP
T_Total     Total simulation time in seconds

Furthermore, if we assume that each LP has O output lines with a uniform distribution of the L value on each output line, then (5) can be further generalized for O output lines per LP as follows:
$$Null_{LP} \cong (E_{Msg})\left(\frac{O}{L}\right)(T_S) \cong E_{Msg}\,(T_S)\left(\frac{O}{L}\right) \tag{6}$$
It should be noted that (6) represents the total number of null messages transmitted per LP via O output lines to the neighboring LPs. If we assume that the system has a total of m LPs, each with O output lines, we can extend (6) and generalize it for m LPs in a distributed simulation, as shown in (7). It can be seen that (7) gives the total number of null messages exchanged among all LPs.
$$Null_{m\text{-}LP} \cong (E_{Msg})\left(\frac{O}{L}\right)(m)(T_S) \cong E_{Msg}\,(T_S)\left(\frac{O}{L}\right)(m) \tag{7}$$
where the term $O/L$ in (7) reflects a uniform distribution of the L value over the O output lines.
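The uniform-Lookahead estimates can be evaluated directly. The sketch below reads (5) to (7) as the product of E_Msg, the ratio O/L, T_S, and m; that reading, the function names, and the sample value of E_Msg are assumptions made for illustration, since the paper does not list E_Msg in Table 1.

```python
def null_messages_per_lp(e_msg: float, lookahead: float, t_s: float,
                         output_lines: int = 1) -> float:
    """Estimated null messages sent by one LP with a uniform Lookahead on each line."""
    if lookahead <= 0:
        raise ValueError("lookahead must be positive")
    return e_msg * (output_lines / lookahead) * t_s


def null_messages_all_lps(e_msg: float, lookahead: float, t_s: float,
                          output_lines: int, num_lps: int) -> float:
    """Estimated null messages exchanged among num_lps identical LPs."""
    return num_lps * null_messages_per_lp(e_msg, lookahead, t_s, output_lines)


if __name__ == "__main__":
    # e_msg = 0.1 is a placeholder value, not a figure from the paper.
    print(null_messages_per_lp(0.1, 0.2, 500.0, output_lines=8))
    print(null_messages_all_lps(0.1, 0.2, 500.0, output_lines=4, num_lps=10))
```

Under this reading the estimate grows linearly in O and m and is inversely proportional to L, so halving the Lookahead doubles the null message count.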
The assumption of uniform distribution of
Lookahead among O output lines of an LP simplifies
the procedure for computing the number of null
messages transmitted per LP to other neighboring LPs.
However, the values for L may change during the
execution of a Lookahead period that makes the
uniform distribution assumption of Lookahead a little
unrealistic. We should therefore also account for a non-uniform distribution of Lookahead, where each output line of an LP can have a different value of L. We can rewrite (6) as:
$$Null_{LP} \cong (E_{Msg})\left(\sum_{i=1}^{O}\frac{1}{L_i}\right)(T_S) \cong E_{Msg}\,(T_S)\sum_{i=1}^{O}\frac{1}{L_i} \tag{8}$$
It should be noted that (8) represents the total number of null messages transmitted per LP to other neighboring LPs. If we assume that the model is partitioned into m LPs, where each LP can have at most O output lines, we can extend (8) to m LPs:
$$Null_{m\text{-}LP} \cong (E_{Msg})\left(\sum_{h=1}^{m}\sum_{i=1}^{O}\frac{1}{L_{hi}}\right)(T_S) \cong E_{Msg}\,(T_S)\sum_{h=1}^{m}\sum_{i=1}^{O}\frac{1}{L_{hi}} \tag{9}$$
Evidently, (9) gives the total number of null messages exchanged among all LPs.
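The non-uniform case can be sketched in the same way. The code below assumes that each output line contributes a term 1/L_i, which is the reading under which the per-line form collapses to O/L when all lines share the same L. The function names and sample values are hypothetical.

```python
from typing import Sequence


def null_messages_per_lp_nonuniform(e_msg: float, lookaheads: Sequence[float],
                                    t_s: float) -> float:
    """Per-LP estimate where each output line i has its own Lookahead L_i."""
    if any(l <= 0 for l in lookaheads):
        raise ValueError("every lookahead must be positive")
    return e_msg * t_s * sum(1.0 / l for l in lookaheads)


def null_messages_all_lps_nonuniform(e_msg: float,
                                     lookahead_rows: Sequence[Sequence[float]],
                                     t_s: float) -> float:
    """System-wide estimate; row h holds the Lookahead values of LP h's output lines."""
    return sum(null_messages_per_lp_nonuniform(e_msg, row, t_s)
               for row in lookahead_rows)


if __name__ == "__main__":
    # Two LPs with different numbers of output lines and different L values.
    rows = [[0.2, 0.5, 0.8], [0.4, 0.4]]
    print(null_messages_all_lps_nonuniform(0.1, rows, 500.0))
```

Because each row is summed independently, LPs may have different numbers of output lines, which matches the "at most O output lines" wording above.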
4. Performance optimization of NMA
In this section, we introduce two different ways to
optimize the performance of NMA. We first implement
the concept of frequency of transmission described in
[12] to minimize the exchange of null messages across
the LP. Secondly, we present the new concept of
variance that works with the frequency of transmission
to avoid the unnecessary generation of null messages
and consequently minimize the overall synchronization
overhead. For both concepts, we derive a closed form
mathematical expression that can be used to evaluate
the performance of NMA in the presence of deadlock
situation.
4.1. Frequency of transmission
Transmitting a null message on each occurrence of an event generates unnecessary null messages and increases the synchronization overhead. We believe that, instead of being sent after every event, null messages should be transmitted with a certain frequency of transmission. This frequency of transmission ($F_T$) is a fixed amount of time, measured in simulation seconds per second. Recalling (2) and (4), the number of events processed per second per LP can be equated from both equations. This yields the following approximation for $F_T$ in terms of the L value.
Figure 1. Internal architecture of an LP
Figure 2. m number of LPs with I number of input queues and O number of output queues
$$E_{Msg}\left(\frac{1}{L}\right) \cong E_{Msg}\left(\frac{1}{2F_T}\right) \;\Rightarrow\; L \cong 2F_T \tag{10}$$
Substituting the value of (10) into (5), we get,
$$Null_{LP} \cong E_{Msg}\left(\frac{1}{2F_T}\right)T_S \cong \frac{E_{Msg}\,T_S}{2F_T} \tag{11}$$
Equation (11) can be generalized for O number of
output lines per LP when the number of null messages
is assumed to generate with a certain frequency of
transmission as shown in (12).
$$Null_{LP} \cong E_{Msg}\left(\frac{O}{2F_T}\right)T_S \cong \frac{E_{Msg}\,T_S\,O}{2F_T} \tag{12}$$
Equation (12) gives an estimated number of null messages transmitted by an LP that has O output lines, where each line carries an equal percentage of the L value in terms of a fixed frequency of transmission per output line. In addition, if we assume that the system consists of a total of m LPs, each with a fixed number of output lines, then (12) can be further extended to m LPs.
$$Null_{m\text{-}LP} \cong E_{Msg} \times m\left(\frac{O}{2F_T}\right)T_S \cong \frac{E_{Msg}\,T_S \times m}{2\,(F_T/O)}, \quad \text{where } F_T \cong \%\,L \tag{13}$$
where the denominator of (13) (i.e., $F_T/O$) represents a uniform rate of null message transmission per output line. Based on (13), we can conclude that non-uniformity in the null message algorithm results in a non-linear generation of null messages. An expression can be derived for O output lines where each line may carry a different value of $F_T$:
$$Null_{LP} \cong E_{Msg}\left(\sum_{i=1}^{O}\frac{1}{2F_{Ti}}\right)T_S \cong E_{Msg}\,T_S\sum_{i=1}^{O}\frac{1}{2F_{Ti}}, \quad \text{where } F_{Ti} \cong \%\,L_i \tag{14}$$
Furthermore, (14) can be further extended and
generalized for m number of LPs where each LP can
have at most O number of output lines.
$$Null_{m\text{-}LP} \cong E_{Msg}\left(\sum_{k=1}^{m}\sum_{i=1}^{O}\frac{1}{2F_{T(ki)}}\right)T_S \cong E_{Msg}\,T_S\sum_{k=1}^{m}\sum_{i=1}^{O}\frac{1}{2F_{T(ki)}}, \quad \text{where } F_{T(ki)} \cong \%\,L_{ki} \tag{15}$$
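The frequency-of-transmission variant can be sketched as follows. The code assumes that each output line contributes a term 1/(2 F_T) and that F_T is chosen as a fraction of that line's Lookahead. Both assumptions, and all names and sample values, are illustrative rather than taken from the authors' C++ model.

```python
from typing import Sequence


def transmission_frequency(lookahead: float, fraction: float) -> float:
    """F_T taken as a fraction (a percentage divided by 100) of the Lookahead."""
    if lookahead <= 0 or not 0 < fraction <= 1:
        raise ValueError("lookahead must be positive and fraction in (0, 1]")
    return fraction * lookahead


def null_messages_with_frequency(e_msg: float, t_s: float,
                                 frequency_rows: Sequence[Sequence[float]]) -> float:
    """Estimate over all LPs; row k holds F_T for each output line of LP k."""
    total = sum(1.0 / (2.0 * f) for row in frequency_rows for f in row)
    return e_msg * t_s * total


if __name__ == "__main__":
    # Two LPs, four output lines each, with F_T set to 50% of L = 0.5 on every line.
    f_t = transmission_frequency(0.5, 0.5)
    print(null_messages_with_frequency(0.1, 500.0, [[f_t] * 4, [f_t] * 4]))
```

A larger F_T spaces null messages further apart, so the estimate falls as the chosen fraction of L rises.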
4.2. Variance for null message elimination
In this scenario, it is also essential to cancel the unnecessary generation of null messages. The variance represents the probability of cancellation of unnecessary null messages, and its value lies between 0 and 1. It is subtracted from 1 so that an increase in variance causes a decrease in the overall number of null messages, whereas a decrease in variance results in an increase in null messages. If the variance is 0, we obtain the same results as without using variance. To reflect the variance of null message cancellation, we can rewrite (13) for m LPs with a uniform distribution of null message transmission per output line:
$$Null_{m\text{-}LP} \cong E_{Msg} \times m\left(\frac{O}{2F_T}\right)(1-\sigma)\,T_S \cong \frac{E_{Msg}\,T_S \times m\,(1-\sigma)}{2\,(F_T/O)}, \quad \text{where } F_T \cong \%\,L \text{ and } 0 \le \sigma < 1 \tag{16}$$
where $\sigma$ represents the probability of null message cancellation.
The same concept of null message cancellation can
be implemented with a simulation model where the L
values are non-uniformly distributed among O number
of output lines. This leads us to the following
modification in (16):
$$Null_{m\text{-}LP} \cong E_{Msg}\left(\sum_{k=1}^{m}\sum_{i=1}^{O}\frac{1}{2F_{T(ki)}}\right)(1-\sigma)\,T_S \cong E_{Msg}\,T_S\sum_{k=1}^{m}\sum_{i=1}^{O}\frac{1-\sigma}{2F_{T(ki)}}, \quad \text{where } 0 \le \sigma < 1 \tag{17}$$
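The effect of the cancellation probability can be checked numerically. The sketch below assumes that the estimate is simply scaled by (1 - sigma), which matches the statement that a variance of 0 leaves the result unchanged. The function name and sample values are hypothetical.

```python
from typing import Sequence


def null_messages_with_cancellation(e_msg: float, t_s: float,
                                    frequency_rows: Sequence[Sequence[float]],
                                    sigma: float) -> float:
    """Estimate over all LPs, reduced by the cancellation probability sigma."""
    if not 0 <= sigma < 1:
        raise ValueError("sigma must satisfy 0 <= sigma < 1")
    base = sum(1.0 / (2.0 * f) for row in frequency_rows for f in row)
    return e_msg * t_s * base * (1.0 - sigma)


if __name__ == "__main__":
    rows = [[0.25] * 4, [0.25] * 4]
    for sigma in (0.0, 0.25, 0.5):
        print(sigma, null_messages_with_cancellation(0.1, 500.0, rows, sigma))
```

With sigma = 0 the function returns the same value as the estimate without cancellation, and each increase in sigma removes a proportional share of the null messages.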
5. Performance analysis of NMA
For the sake of performance analysis, we simulate 5
different cases. The system is modeled in C++.
5.1. Multiple output lines per LP
Using (6), Fig. 3 shows the null message transmission for the following simulation parameters: simulation time (Ts) = 500 sec, with L uniformly distributed per output line. The number of output lines varies from 0 to 8 for all results, as shown in Fig. 3. The simulation results of Fig. 3 present a comparison of null message transmission per LP versus multiple output lines.
5.2. Multiple LPs with multiple output lines per LP
We assume multiple LPs with a fixed number O of output lines per LP. Using (7), Fig. 4 shows the null message transmission for the following simulation parameters: Ts = 500 sec, L uniformly distributed per output line, and a fixed number of output lines for each LP (O = 4). The number of LPs is varied from 1 to 10, as shown in Fig. 4.
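The two uniform-Lookahead scenarios of Sections 5.1 and 5.2 can be tabulated with a short parameter sweep. The sketch uses Ts = 500 sec and O = 4 as stated in the text, but the value of E_Msg is a placeholder assumption and the product form of the estimate is our reading of the model, so the numbers are illustrative and are not a reproduction of the published figures.

```python
from typing import Dict, List, Sequence

T_S = 500.0   # simulation time in seconds, as stated in the text
E_MSG = 0.1   # placeholder value (assumption); the paper does not state E_Msg


def null_messages(lookahead: float, output_lines: int, num_lps: int = 1) -> float:
    """Uniform-Lookahead estimate: E_Msg * (O / L) * m * T_S."""
    return E_MSG * (output_lines / lookahead) * num_lps * T_S


def sweep_output_lines(lookaheads: Sequence[float],
                       line_counts: Sequence[int]) -> Dict[float, List[float]]:
    """One series per L value, varying the number of output lines of a single LP."""
    return {l: [null_messages(l, o) for o in line_counts] for l in lookaheads}


def sweep_lps(lookaheads: Sequence[float], lp_counts: Sequence[int],
              output_lines: int = 4) -> Dict[float, List[float]]:
    """One series per L value, varying the number of LPs at fixed output lines."""
    return {l: [null_messages(l, output_lines, m) for m in lp_counts]
            for l in lookaheads}


if __name__ == "__main__":
    for l, series in sweep_output_lines([0.2, 0.4, 0.6, 0.7], range(2, 9)).items():
        print("L =", l, [round(v) for v in series])
    for l, series in sweep_lps([0.2, 0.3, 0.4, 0.6, 0.7], range(1, 11)).items():
        print("L =", l, [round(v) for v in series])
```

Each series is a straight line through the origin, and smaller L values give steeper lines, which is the qualitative behavior the uniform model predicts.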
5.3. Multiple output lines per LP with non-uniform distribution of Lookahead
For this simulation, we assume a single LP with O output lines, where each output line can have a different value of L. Using (8), Fig. 5 shows the null message transmission for the following simulation parameters: Ts = 500 sec, with L non-uniformly distributed over the output lines. The number of output lines varies from 1 to 10, as shown in Fig. 5. Note that the value of L is chosen randomly within the range 0 to 1 and assigned to each output line at run time. This random selection may control the generation of unnecessary null messages as long as the value is chosen appropriately.
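The random assignment of L described above can be mimicked with a small Monte Carlo sketch. It assumes that each output line contributes a term 1/L_i, draws L from the half-open interval (0, 1] to avoid division by zero, and averages over a number of runs. The function names, the seed handling, and the value of E_Msg are assumptions, not details from the authors' C++ model.

```python
import random
from typing import List


def random_lookaheads(count: int, rng: random.Random) -> List[float]:
    """Draw one Lookahead per output line; 1 - random() lies in (0, 1]."""
    return [1.0 - rng.random() for _ in range(count)]


def mean_null_messages(e_msg: float, t_s: float, output_lines: int,
                       runs: int, seed: int = 0) -> float:
    """Average per-LP estimate over several runs with randomly assigned L values."""
    rng = random.Random(seed)  # seeded so that repeated calls give the same result
    total = 0.0
    for _ in range(runs):
        lookaheads = random_lookaheads(output_lines, rng)
        total += e_msg * t_s * sum(1.0 / l for l in lookaheads)
    return total / runs


if __name__ == "__main__":
    for lines in (2, 6, 10):
        print(lines, mean_null_messages(0.1, 500.0, lines, runs=100, seed=1))
```

Because 1/L grows without bound as L approaches 0, a single very small draw can dominate the average, so results from this sketch may differ noticeably between seeds and run counts.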
5.4. Multiple LPs with multiple fixed output lines
For this simulation, we assume multiple LPs, each with a fixed number of output lines, where each line of an LP can have a different value of L. Using (9), Fig. 6 shows the null message transmission for the following simulation parameters: Ts = 500 sec, with L non-uniformly distributed over the output lines. The number of LPs is varied from 1 to 20. Note that both m and O vary in this scenario. In line with our expectations, the number of null messages increases with the number of LPs. However, this increase is limited and controlled due to the random behavior of Lookahead. Because of the non-uniform distribution, this case can also be regarded as an irregular network.
Figure 3. Multiple output lines per LP versus null message transmission per LP (x-axis: output lines (O) per LP; y-axis: null message transmission per LP; curves for L = 0.2, 0.4, 0.6, and 0.7)
Figure 4. Multiple LPs with fixed output lines per LP versus null message transmission (x-axis: number of LPs; y-axis: null message transmission; curves for L = 0.2, 0.3, 0.4, 0.6, and 0.7)
Figure 5. Multiple output lines per LP with non-uniform distribution of L value (x-axis: number of output lines (O) per LP; y-axis: null message transmission per LP; curves for 0<L<1 over 100, 300, 500, and 700 runs)
6. Conclusion
We have proposed a mathematical model to predict
the optimum values of critical parameters that have
great impact on the performance of NMA. The derived
properties of the proposed mathematical model account
for the cases when the NMA would send too many null
messages. The proposed mathematical model provides
a quick and practical way for simulation designers to
predict whether a simulation model has potential to
perform well under NMA in a given simulation
environment by giving the approximate optimal values
of the critical parameters. We have experimentally
verified that if critical parameters, specifically the L
value, are chosen intelligently, we can limit the
transmission of null messages among the LPs and
consequently improve the performance of NMA in a
distributed simulation environment.
REFERENCES
[1] R. M. Fujimoto, "Distributed Simulation Systems," Proceedings of the 2003 Winter Simulation Conference, College of Computing, Georgia Institute of Technology, Atlanta.
[2] Y. M. Teo, Y. K. Ng, and B. S. S. Onggo, "Conservative Simulation using Distributed Shared Memory," Proceedings of the 16th Workshop on Parallel and Distributed Simulation (PADS-02), IEEE Computer Society, 2002.
[3] B. R. Preiss, W. M. Loucks, J. D. MacIntyre, and J. A. Field, "Null Message Cancellation in Conservative Distributed Simulation," Distributed Simulation 91, Proceedings of the SCS Multiconference on Advances in Parallel and Distributed Simulation, 1991.
[4] W. L. Bain and D. S. Scott, "An Algorithm for Time Synchronization in Distributed Discrete Event Simulation," Proceedings of the SCS Multiconference on Distributed Simulation, Vol. 3, pp. 30-33, February 1988.
[5] N. J. Davis, D. L. Mannix, W. H. Shaw, and T. C. Hartrum, "Distributed Discrete-Event Simulation using Null Message Algorithms on Hypercube Architectures," Journal of Parallel and Distributed Computing, Vol. 8, No. 4, pp. 349-357, April 1990.
[6] K. M. Chandy and J. Misra, "Distributed Simulation: A case study in design and verification of distributed programs," IEEE Transactions on Software Engineering, SE-5:5, pp. 440-452, 1979.
[7] B. A. Cota and R. G. Sargent, "An Algorithm for Parallel Discrete Event Simulation using Common Memory," Proc. 22nd Ann. Simulation Symp., pp. 23-31, March 1989.
[8] J. K. Peacock, J. W. Wong, and E. Manning, "Synchronization of Distributed Simulation using Broadcast Algorithms," Computer Networks, Vol. 4, pp. 3-10, 1980.
[9] L. A. Belfore, S. Mazumdar, S. S. Rizvi, et al., "Integrating the joint operation feasibility tool with JFAST," Proceedings of the Fall 2006 Simulation Interoperability Workshop, Orlando, FL, September 10-15, 2006.
[10] D. M. Nicol and P. F. Reynolds, "Problem Oriented Protocol Design," Proc. 1984 Winter Simulation Conf., pp. 471-474, Nov. 1984.
[11] J. K. Peacock, J. W. Wong, and E. Manning, "A Distributed Approach to Queuing Network Simulation," Proc. 1979 Winter Simulation Conf., pp. 399-406, Dec. 1979.
[12] S. S. Rizvi, K. M. Elleithy, and A. Riasat, "Minimizing the Null Message Exchange in Conservative Distributed Simulation," International Joint Conferences on Computer, Information, and Systems Sciences, and Engineering (CISSE 2006), pp. 443-448, Bridgeport, CT, December 4-14, 2006.
[13] S. S. Rizvi, K. M. Elleithy, and A. Riasat, "Trees and Butterflies Barriers in Distributed Simulation System: A Better Approach to Improve Latency and the Processor Idle Time," IEEE International Conference on Information and Emerging Technologies (ICIET-2007), pp. 1-6, July 6-7, 2007, Karachi, Pakistan.
Figure 6. Multiple LPs and multiple fixed output lines with non-uniform distribution of L versus null message transmission (x-axis: number of LPs; y-axis: null message transmission; curves for 4, 6, 8, and 10 output lines per LP with 0<L<1, over 100, 300, 500, and 700 runs respectively)
... This paper presents a new logical process (LP) simulation model for a conservative distributed simulation that uses Null Message Algorithm (NMA) as an underlying synchronization mechanism to avoid deadlock or recover from it if it occurs. The proposed simulation model is based on the mathematical equations derived in [1] for quantifying the null message transmission. The term distributed refers to distributing the execution of a single run of a simulation program across multiple processors [2]. ...
... This clearly demands a persistent simulation model with some well driven underlying mathematical equations that can be used by simulation designers to predict the behaviour of conservative simulation for large-scale networks. Although, previous studies have evaluated the performance of conservative synchronization algorithms for transmission overhead (e.g., see [1]), most of these studies have conducted using commercial network simulators without using a simulation model. It is, therefore, hard to generalize these research results for n number of LPs and could not be used to realize the modelling and simulation of a large-scale network. ...
... In this paper, we present a new LP simulation model for distributed simulation systems where NMA is used as an underlying TMA to provide synchronization among LPs by exchanging null messages. The proposed simulation model is based on the mathematical equations derived in [1] for quantifying the synchronization messages. Our proposed LP simulation model provides a detailed view of sub system components and models such as communication interface, simulation engine, and application to describe the proper order in which coordination need to be done between participating LPs to safely execute eventmessages with a minimum transmission overhead. ...
Article
Full-text available
This paper presents a new logical process (LP) simulation model for distributed simulation systems where Null Message Algorithm (NMA) is used as an underlying time management algorithm (TMA) to provide synchronization among LPs. To extend the proposed simulation model for n number of LPs, this paper provides a detailed overview of the internal architecture of each LP and its coordination with the other LPs through sub-system components and models such as communication interface and simulation executive. The proposed architecture of LP simulation model describes the proper sequence of coordination that need to be done among LPs though different subsystem components and models to achieve synchronization. To execute the proposed LP simulation model for different set of parameters, a queuing network model is used. Experiments will be performed to verify the accuracy of the proposed simulation model using the pre-derived mathematical equations. Our numerical and simulation results can be used to observe the exchange of null messages and overhead indices.
... One of the main problems associated with the distributed simulation is the synchronization of a distributed execution. If not properly handled, synchronization problems may degrade the performance of a distributed simulation environment [5]. This situation gets more severe when the synchronization algorithm needs to run to perform a detailed logistics simulation in a distributed environment to simulate a huge amount of data [6]. ...
Conference Paper
One of the most common optimistic synchronization protocols for parallel simulation is the Time Warp algorithm proposed by Jefferson [12]. The Time Warp algorithm is based on the virtual time paradigm, which has the potential for greater exploitation of parallelism and, perhaps more importantly, greater transparency of the synchronization mechanism to the simulation programmer. It is widely believed that the optimistic Time Warp algorithm suffers from large memory consumption due to frequent rollbacks. To achieve optimal memory management, the Time Warp algorithm needs to reclaim memory periodically. To determine which event-messages have been committed and which portion of memory can be reclaimed, the computation of global virtual time (GVT) is essential. Mattern [2] uses a distributed snapshot algorithm to approximate GVT that does not rely on first in first out (FIFO) channels. Specifically, it uses a ring structure to establish cuts C1 and C2 to calculate the GVT for distinguishing between safe and unsafe event-messages. Although the distributed snapshot algorithm provides a straightforward way of computing GVT, more efficient solutions for message acknowledgment and for delaying the sending of event-messages while awaiting control messages are desired. This paper studies the memory requirement and time complexity of GVT computation. The main objective of this paper is to apply the concept of a matrix to Mattern's original GVT algorithm to speed up the process of GVT computation while at the same time reducing the memory requirement. Our analysis shows that the use of a matrix in GVT computation improves the overall performance in terms of memory saving and latency.
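The role GVT plays in memory reclamation can be illustrated with a minimal sketch. It assumes an idealized, instantaneous view of every LP clock and every in-transit message, which a real system does not have; it is not Mattern's cut-based algorithm or the matrix scheme described above. The function names `global_virtual_time` and `reclaimable` are hypothetical.

```python
from typing import Iterable, List


def global_virtual_time(local_clocks: Iterable[float],
                        in_transit_timestamps: Iterable[float]) -> float:
    """Return GVT as the smallest timestamp that could still cause a rollback.

    Assumption: we can observe all LP clocks and all in-transit messages
    at one instant. Distributed algorithms such as Mattern's only
    approximate this value.
    """
    candidates = list(local_clocks) + list(in_transit_timestamps)
    if not candidates:
        raise ValueError("at least one LP clock is required")
    return min(candidates)


def reclaimable(saved_timestamps: Iterable[float], gvt: float) -> List[float]:
    """Saved events strictly older than GVT can never be rolled back."""
    return [t for t in saved_timestamps if t < gvt]


# Three LPs at virtual times 12, 9, and 15; two messages still in transit.
gvt = global_virtual_time([12.0, 9.0, 15.0], [7.0, 20.0])
old_events = reclaimable([3.0, 7.0, 8.0], gvt)
```

In this example the in-transit message with timestamp 7 holds GVT back, which is why a GVT algorithm has to account for transient messages and not only for LP clocks.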
... In addition, this layer provides a deterministic model for NMA that allows the simulation designer to choose one of the most appropriate DES protocols from OPL with respect to the model specified at MAL of OTMA framework. The details of the proposed deterministic model can be found in [23]. ...
Conference Paper
Recent evolutions in wireless networks will require more efficient use of the underlying parallel discrete-event simulation (PDES) synchronization protocols to accommodate the demand for large-scale network simulation. In this dissertation, we investigate underlying synchronization protocols to improve the performance of large-scale network simulators operating over PDES systems. We begin by proposing a generic optimized time management algorithms (OTMA) framework that combines the improved forms of synchronization protocols on a single platform. In particular, for the proposed OTMA framework, we use a layered architecture approach to combine the optimized forms of conservative and optimistic time management algorithms. To support the implementation of the OTMA framework, a new m-LP (logical process) simulation model is proposed. Another challenge of large-scale network simulations is the lack of a realistic analytical and mathematical model for the underlying PDES protocols. In this research work, the proposed OTMA framework integrates both conservative and optimistic synchronization algorithms on a single platform. In particular, for the purposes of this research, we provide an improved form of NMA by developing a new deterministic model that quantifies the performance-dependent critical parameters for PDES systems. In addition, for the implementation of NMA, a new m-LP simulation model is proposed along with a network topology with varying parameters. Finally, we provide a quantitative model to support the simulation results and experimental verifications for NMA. Current DES-based simulators have a large end-to-end latency and poor memory utilization. The OTMA framework will provide an improved form of the existing Time Warp algorithm by proposing a new unacknowledged message list (UML) scheme.
The proposed UML scheme will provide global synchronization among a large number of nodes along with a foolproof solution for the transient message and simultaneous reporting problems. To illustrate the implementation of the proposed UML scheme, two algorithms are proposed for coordinating and non-coordinating LPs. To further improve the global virtual time (GVT) computation process, synchronous barriers (such as tree and butterfly barriers) will be combined with asynchronous algorithms (such as the Time Warp algorithm) to provide an efficient GVT computation mechanism for large-scale distributed networks.
... Bain and Scott [1] try to simplify the network topology to resolve the problem of null message overhead. Recently, Rizvi et al. [5] [6] have proposed a mathematical model to quantify the null messages under different network loads. All of this work aims to optimize the performance of conservative distributed event simulations. ...
Article
A conservative distributed simulation requires all logical processes (LPs) to obey the causality constraint. This implies that all event-messages are processed in strict timestamp order. Apart from the timestamp of each event generated by the LPs, synchronization between all LPs is the second most important requirement. Finally, there must be no deadlock in the distributed environment. A deadlock may occur when no events are present in the queue of an LP. To avoid deadlock in such cases, Chandy, Misra, and Bryant presented the Null Message Algorithm (NMA) [3]. Null messages are passed as event-messages to other LPs and stored in one of the receiving LP's queues. A null message indicates that all events in the queue with a timestamp smaller than that of the null message are safe to process. In other words, no event will arrive from the sending LP until the current simulation time equals the timestamp of the null message. The timestamp of the null message includes an added Lookahead value. This Lookahead value can be derived from parameters such as the delay to transmit a message, the propagation delay, etc. Calculating the Lookahead value is therefore the most important part, as it affects the performance of the conservative distributed event simulation. A proper Lookahead value can reduce the number of null messages, which decreases the network traffic. In this paper, we demonstrate some calculations on the Lookahead that show the performance of the distributed event simulation.
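The null message rule described in this abstract can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (one clock value per input channel, no network, no event processing), not the implementation evaluated in the paper. The names `Message`, `null_message`, and `safe_events` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(order=True)
class Message:
    timestamp: float
    is_null: bool = field(default=False, compare=False)


def null_message(local_time: float, lookahead: float) -> Message:
    """Promise that the sender emits nothing earlier than local_time + lookahead."""
    if lookahead <= 0:
        raise ValueError("lookahead must be positive for the simulation to advance")
    return Message(local_time + lookahead, is_null=True)


def safe_events(pending: List[Message],
                channel_clocks: Dict[str, float]) -> List[Message]:
    """Return real events that are safe to process, in timestamp order.

    channel_clocks holds the timestamp of the last message (real or null)
    received on each input channel. Nothing earlier than the smallest of
    these values can still arrive, so events up to that bound are safe.
    """
    horizon = min(channel_clocks.values())
    return sorted(m for m in pending if not m.is_null and m.timestamp <= horizon)


# An LP at time 10 with lookahead 2.5 sends a null message stamped 12.5.
promise = null_message(10.0, 2.5)
clocks = {"from_lp_a": promise.timestamp, "from_lp_b": 20.0}
ready = safe_events([Message(13.0), Message(5.0), Message(11.0)], clocks)
```

Here the events stamped 5 and 11 become safe because the slowest input channel has promised silence until 12.5, while the event stamped 13 has to wait for a later message on that channel.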
Conference Paper
Global virtual time (GVT) is used in parallel discrete event simulations to reclaim memory, commit output, detect termination, and handle errors. Mattern [1] has proposed a GVT approximation with a distributed termination detection algorithm. This algorithm works well and gives optimal performance in terms of accurate GVT computation, at the expense of a slower execution rate. This slower execution rate results in a high GVT latency. Due to the high GVT latency, the processors involved in communication remain idle during that period. As a result, the overall throughput of a discrete event parallel simulation system degrades significantly. Thus, the high GVT latency prevents the widespread use of this algorithm in discrete event parallel simulation systems. However, if the latency of GVT computation could be improved, most discrete event parallel simulation systems would likely take advantage of this technique for accurate GVT computation. In this paper, we examine the potential use of tree and butterfly barriers with Mattern's GVT structure using a ring. Simulation results demonstrate that the use of tree barriers with Mattern's GVT structure can significantly improve the latency and thus increase the overall throughput of the parallel simulation system. The performance measure adopted in this paper is the achievable latency for a fixed ... transmission during the GVT computation.
1. Introduction
The term distributed refers to distributing the execution of a single run of a simulation program across multiple processors [2]. One of the main problems associated with distributed simulation is the synchronization of distributed execution. If not properly handled, synchronization problems may degrade the performance of a distributed simulation environment [5]. This situation gets more severe when the synchronization algorithm needs to run to perform a detailed logistics simulation in a distributed environment to simulate a huge amount of data, as specified in [6]. Event synchronization is an essential part of parallel simulation [2]. In general, synchronization protocols can be categorized into two different families: conservative and optimistic. Time Warp is an optimistic protocol for synchronizing parallel discrete event simulations [3]. Global virtual time (GVT) is used in the Time Warp synchronization mechanism to reclaim memory, commit output, detect termination, and handle errors. GVT can be considered a global function that is computed many times during the course of a simulation. The time required to compute the value of GVT may result in performance degradation due to a slower execution rate [4]. On the other hand, a small GVT latency (the delay between its occurrence and detection) reduces the processor's idle time and thus improves the overall throughput of the distributed simulation system. Mattern [1] has proposed a GVT approximation with a distributed termination detection algorithm. This algorithm works well and gives optimal performance in terms of accurate GVT computation, at the expense of a slower execution rate. This slower execution rate results in a high GVT latency. Due to the high GVT latency, the processors involved in communication ...
... whereas C2 guarantees that no message generated prior to the first cut is in transit.
* For our analysis, we assume that tp is the time required to send one message from one processor to its neighbor (note that this neighboring processor might be a child for C1 and a parent for C2).
* In addition, we also assume that both rounds of message transmission are required ...
Chapter
The performance of a conservative time management algorithm in a distributed simulation system degrades significantly if a large number of null messages are exchanged across the logical processes in order to avoid deadlock. This situation gets more severe when the exchange of null messages increases due to the poor selection of key parameters such as lookahead values. However, with a mathematical model that can approximate the optimal values of the parameters that directly affect the performance of a time management algorithm, we can limit the exchange of null messages. The reduction in the exchange of null messages greatly improves the performance of the time management algorithm by both minimizing the transmission overhead and maintaining consistent parallelization. This paper presents a generic mathematical model that can be used effectively to evaluate the performance of a conservative distributed simulation system that uses null messages to avoid deadlock. Since the proposed mathematical model is generic, the performance of any conservative synchronization algorithm can be approximated. In addition, we develop a performance model that demonstrates how a conservative distributed simulation system performs with the null message algorithm (NMA). The simulation results show that the performance of a conservative distributed system degrades if the NMA generates an excessive number of null messages due to the improper selection of parameters. In addition, the proposed mathematical model presents the critical role of lookahead, which may increase or decrease the number of null messages across the logical processes. Furthermore, the proposed mathematical model is not limited to NMA. It can also be used with any conservative synchronization algorithm to approximate the optimal values of parameters.
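The dependence of null message traffic on lookahead can be made concrete with a toy calculation. The sketch below is not the mathematical model proposed in this work. It assumes the simplest possible case: LPs on a cycle that have no real event until a given time, where each null message advances the promised time by exactly one lookahead. The function name `count_null_messages` is hypothetical.

```python
def count_null_messages(next_event_time: float, lookahead: float,
                        start_time: float = 0.0) -> int:
    """Count null messages needed before the next real event becomes safe.

    Assumption: every null message carries the sender's clock plus the
    lookahead, so the promised time grows by one lookahead per message.
    """
    if lookahead <= 0:
        raise ValueError("lookahead must be positive for the simulation to advance")
    promised = start_time
    sent = 0
    while promised < next_event_time:
        promised += lookahead  # timestamp carried by the next null message
        sent += 1
    return sent


# The same 100-unit idle period with a small and a large lookahead.
small = count_null_messages(100, 1)
large = count_null_messages(100, 10)
```

Under these assumptions a tenfold larger lookahead cuts the null message count by a factor of ten, which is the qualitative effect the abstract attributes to a proper choice of lookahead.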
Discrete simulation is a widely used technique for system performance evaluation. The conventional approach to discrete simulation (e.g., GPSS, Simscript) does not attempt to exploit the parallelism typically available in queueing network models. In this paper, a distributed approach to discrete simulation is presented. It involves the decomposition of a simulation into components and the synchronization of these components by message passing. This approach can result in the speedup of the total time to complete a given simulation if a network of processors is available. The architecture of a microcomputer network suitable for distributed simulation is described and some results concerning the distributed approach are presented.
Conference Paper
The prevention of deadlock in certain types of distributed simulation systems requires special synchronization protocols. These protocols often create an excessive amount of performance-degrading communication; yet a protocol with the minimum amount of communication may not lead to the fastest network finishing time. We propose a protocol that attempts to balance the network's need for auxiliary synchronization information with the cost of providing that information. Using an empirical study, we demonstrate the efficiency of this protocol. Also, we show that the synchronization requirements at different interfaces may vary; an integral part of our proposal assigns a protocol to an interface according to the interface's synchronization needs.
Article
This paper explores several variants of the Chandy-Misra Null Message algorithm for distributed simulation. The Chandy-Misra algorithm is one of a class of “conservative” algorithms that maintain the correct order of simulation throughout the execution of the model by means of constraints on simulation time advance. The algorithms developed in this paper incorporate an “event-oriented” view of the physical process and message-passing. The effect of the computational workload required to compute each event is related to the speedup attained over an equivalent sequential simulation. The effects of network topology are investigated, and performance is evaluated for the variants on transmission of null messages. The performance analysis is supported with empirical results based on an implementation of the algorithm on an Intel iPSC 32-node hypercube multiprocessor. Results show that speedups over sequential simulation of greater than N, using N processors, can be achieved in some circumstances.
Conference Paper
Originating from basic research conducted in the 1970s and 1980s, the parallel and distributed simulation field has matured over the last few decades. Today, operational systems have been fielded for applications such as military training, analysis of communication networks, and air traffic control systems, to mention a few. The article gives an overview of technologies to distribute the execution of simulation programs over multiple computer systems. Particular emphasis is placed on synchronization (also called time management) algorithms as well as data distribution techniques.
Conference Paper
Most work on parallel discrete event simulation has been based on a distributed model of computation in which processes can only communicate through message passing. Here we study parallel discrete event simulation under a common memory model of computation. An algorithm for parallel discrete event simulation is developed based on the assumption that every process has direct access to the state of any other process. The objective is to avoid the high overhead associated with null messages and request messages in distributed algorithms. This algorithm is then compared to distributed synchronization algorithms.