Implementation of Tree and Butterfly Barriers with Optimistic Time Management Algorithms for Discrete Event Simulation

Syed S. Rizvi (contact author: srizvi@bridgeport.edu) and Dipali Shah
Computer Science and Engineering Department, University of Bridgeport, Bridgeport, CT 06601
{srizvi, dipalis}@bridgeport.edu

Aasia Riasat
Department of Computer Science, Institute of Business Management, Karachi, Pakistan 78100
Aasia.riasat@iobm.edu.pk

In K. Elleithy (ed.), Advanced Techniques in Computing Sciences and Software Engineering, DOI 10.1007/978-90-481-3660-5_78, © Springer Science+Business Media B.V. 2010
Abstract: The Time Warp algorithm [3] offers a run-time recovery mechanism that deals with causality errors. This recovery mechanism consists of rollback, anti-message, and Global Virtual Time (GVT) techniques. Rollback requires the computation of GVT, which is used in discrete-event simulation to reclaim memory, commit output, detect termination, and handle errors. However, the computation of GVT must deal with the transient message problem and the simultaneous reporting problem. These problems can be handled efficiently by Samadi's algorithm [8], which works correctly in the presence of causality errors. However, the performance of both the Time Warp and Samadi's algorithms depends on the latency involved in GVT computation, and both algorithms suffer from poor latency for large simulation systems, especially in the presence of causality errors. To improve the latency and reduce the processor idle time, we implement tree and butterfly barriers with the optimistic algorithm. Our analysis shows that the use of synchronous barriers such as the tree and butterfly barriers with the optimistic algorithm not only minimizes the GVT latency but also minimizes the processor idle time.
I. INTRODUCTION
The main problem associated with distributed systems is the synchronization of discrete events that run simultaneously on multiple machines [4]. If the synchronization problem is not properly handled, it can degrade the performance of parallel and distributed systems [7]. There are two types of synchronization algorithms that can be used with parallel discrete-event simulation (PDES): conservative and optimistic synchronization algorithms. Conservative synchronization ensures that the local causality constraint is never violated by the logical processes (LPs) within the simulation system [5]. On the other hand, optimistic synchronization allows violations of the local causality constraint. However, such violations can not only be detected at run time but can also be dealt with by using the rollback mechanism provided by optimistic algorithms [1, 2, 6].
The Time Warp algorithm [2, 3] is one of the mechanisms of the optimistic time management algorithm (TMA) family; it includes rollback, anti-message, and GVT computation techniques [1, 4]. The rollback mechanism is used to remove causality errors by dealing with straggler events. Straggler events are events whose time-stamp is less than the current simulation time of an LP. The occurrence of a straggler event may also cause the propagation of incorrect event messages to the other neighboring LPs. The optimistic TMA demands the cancellation of all such event messages that might have been processed by the other LPs. The anti-message is the Time Warp technique that deals with these incorrect event messages by cancelling them out.
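As a rough illustration of these two mechanisms, the sketch below models a single LP that rolls back when a straggler arrives and emits anti-messages for the output it must retract. It is a minimal sketch only; the class and field names (LogicalProcess, Event, outbox, and so on) are illustrative and not taken from the paper or any existing library.

```python
# Minimal sketch of Time Warp rollback and anti-message generation.
# All names are illustrative; handling of *incoming* anti-messages is omitted.

from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    timestamp: int
    payload: str
    is_anti: bool = False   # True marks an anti-message

class LogicalProcess:
    def __init__(self):
        self.clock = 0        # local virtual time
        self.processed = []   # events already executed
        self.sent = []        # (destination, event) pairs this LP has emitted
        self.outbox = []      # messages (including anti-messages) waiting to be sent

    def receive(self, event: Event):
        if event.timestamp < self.clock:
            # Straggler: undo everything executed "in the future" first
            self._rollback(event.timestamp)
        self._execute(event)

    def _execute(self, event: Event):
        self.clock = event.timestamp
        self.processed.append(event)

    def _rollback(self, to_time: int):
        # Un-execute events with a timestamp later than the straggler
        self.processed = [e for e in self.processed if e.timestamp <= to_time]
        # Cancel output sent on behalf of the rolled-back events via anti-messages
        still_valid = []
        for dest, ev in self.sent:
            if ev.timestamp > to_time:
                self.outbox.append((dest, Event(ev.timestamp, ev.payload, is_anti=True)))
            else:
                still_valid.append((dest, ev))
        self.sent = still_valid
        self.clock = to_time
```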
Formally, the GVT can be defined as the minimum time-stamp among all the unprocessed and partially processed event messages and anti-messages present in the simulation at the current clock time Ts. This implies that events whose time-stamp is strictly less than the value of GVT can be considered safe event messages whose memory can be reclaimed. In addition to normal event messages, the anti-messages and the state information associated with event messages whose time-stamp is less than the GVT value are also considered safe, which in turn allows the global control mechanism to reclaim the memory. The GVT is a global function that may be computed several times during the execution of a distributed simulation. In other words, the success of the global control mechanism depends on how fast the GVT is computed. Therefore, the time required to compute the GVT value is critical for the optimal performance of the optimistic TMA. If the latency for computing the GVT is high, which is true in the case of optimistic algorithms, the performance of the optimistic TMA degrades significantly due to a lower execution speed. The increase in latency also increases the processor idle time, since a processor has to wait until the global control mechanism knows the current value of GVT. Once the new GVT value is known, the
event processing will then be resumed by the individual
processors of the distributed machines.
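This definition can be read directly as a minimum over everything that is still pending or in flight. The following minimal sketch (hypothetical data layout, not the paper's implementation) computes such a lower bound from each LP's unprocessed event time-stamps plus the time-stamps of any transient messages.

```python
# Hedged sketch: GVT as the minimum timestamp over all unprocessed events
# and in-transit (transient) messages. The data layout is illustrative only.

def compute_gvt(unprocessed_per_lp, transient_timestamps):
    """unprocessed_per_lp: list of lists of timestamps still pending at each LP.
    transient_timestamps: timestamps of messages sent but not yet received."""
    candidates = [ts for pending in unprocessed_per_lp for ts in pending]
    candidates += list(transient_timestamps)
    return min(candidates) if candidates else float("inf")

# Events with a timestamp strictly below this bound are "safe": their memory
# (events, anti-messages, saved states) can be reclaimed.
if __name__ == "__main__":
    gvt = compute_gvt([[12, 18], [9, 25]], transient_timestamps=[10])
    print(gvt)  # 9
```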
Once the GVT computation is initiated, the controller freezes all the LPs in the system. The freezing of an LP indicates that all LPs have entered the find mode and cannot send or receive messages to or from the other neighboring LPs until the new GVT value is announced by the controller. In this situation, there is a possibility that one or more messages are delayed and stuck somewhere in the network. In other words, these messages were sent by the source LP but have not yet been received by the destination LP. Such event messages are referred to as transient messages, and they should be considered in the GVT computation by each LP. Since the LPs are frozen during the GVT computation, they might not be able to receive any transient message that arrives after they enter the find mode. This is generally referred to as the transient message problem, as shown in Fig. 1. To overcome this problem, a message acknowledgement technique is used in which the sending LP keeps track of how many event messages it has sent and how many acknowledgements it has received. The LP remains blocked until it has received acknowledgements for all the transmitted messages.
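The bookkeeping behind this acknowledgement scheme amounts to two counters per LP: a count of messages sent and a count of acknowledgements received. The sketch below is a minimal, hypothetical illustration of that rule; the class and method names are not taken from the paper.

```python
# Sketch of the acknowledgement bookkeeping used to rule out transient messages:
# an LP should not report its local minimum while any of its sent event
# messages is still unacknowledged. Names are illustrative only.

class AckTracker:
    def __init__(self):
        self.sent = 0    # event messages transmitted
        self.acked = 0   # acknowledgements received

    def on_send(self):
        self.sent += 1

    def on_ack(self):
        self.acked += 1

    def may_report_lbts(self) -> bool:
        # No message of ours can still be in transit once sent == acked.
        return self.sent == self.acked
```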
The other problem faced when computing the GVT is the simultaneous reporting problem, shown in Fig. 2. In this scenario, once the controller sends a message to an LP asking it to start computing its local minimum, there is always a possibility that the LP may receive one or more event messages from the other neighboring LPs. If this happens, the LP cannot compute the correct value of the local minimum, since the LBTS computation does not account for the time-stamps of the event messages that arrive at the LP during the LBTS computation. The result is a wrong LBTS value, which in turn may cause an error in the GVT value. For instance, as shown in Fig. 2, the controller sends a message to both LPA and LPB asking them to compute their local minima. Once LPB receives this message, it immediately reports its local minimum of 35 to the controller. However, the second message from the controller to LPA is delayed. LPA sends one event message to LPB with time-stamp 30 and later reports its local minimum (40) to the controller after processing an event message with time-stamp 40. The controller therefore computes the global minimum (GVT) as 35 rather than 30. This problem arises because LPB did not account for the event message with time-stamp 30.
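The numbers of Fig. 2 can be replayed in a few lines: taking the minimum of only the reported LBTS values gives 35, whereas accounting for the in-flight message with time-stamp 30 gives the correct value. This is an illustrative sketch of the arithmetic only, not a model of the protocol.

```python
# Replaying the Fig. 2 scenario (values taken from the text above).
reported_lbts = {"LPA": 40, "LPB": 35}   # local minima reported to the controller
in_flight = [30]                         # event message from LPA to LPB, counted by nobody

naive_gvt = min(reported_lbts.values())                       # 35 -- wrong
correct_gvt = min(list(reported_lbts.values()) + in_flight)   # 30 -- the true GVT
print(naive_gvt, correct_gvt)
```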
II. THE ORIGINAL SAMADI'S GVT ALGORITHM
In Samadi's algorithm [8], the sender has to keep track of how many messages it has sent so far, how many acknowledgments it has received for the transmitted messages, and how many sent messages are still unacknowledged. The transient message problem cited above is solved by keeping track of all such messages. However, in order to solve the simultaneous reporting problem, Samadi's algorithm requires that an LP place a flag on each acknowledgement message it transmits to the other LPs after it starts computing its local minimum. This flagged acknowledgement indicates that the time-stamp of the acknowledged message was not considered by the sending LP in its LBTS computation. If the receiving LP receives this flagged acknowledgement before it starts computing its own local minimum, it must consider the time-stamp of that message. In this way, we ensure that not only the unprocessed messages but also the transient and anti-messages are considered by all LPs while computing their local minimum values, which in turn leads to a correct GVT computation.
Fig. 2. Representation of the simultaneous reporting problem between two LPs and the controller. The x-axis represents the global current simulation time of the system.
Fig. 1. Representation of the transient message problem between two LPs and the controller. The x-axis represents the global current simulation time of the system. The first two messages are transmitted from the controller (indicated by full dotted lines) to initiate the GVT computation. LPA and LPB compute their LBTS values as 15 and 20, respectively, and report them to the controller (indicated by partially cut lines). A transient message with time stamp 10 then arrives at LPA. If this time stamp is not considered, the GVT will be incorrectly computed as 15 rather than 10.
For instance, as shown in Fig. 3, if an event message with time-stamp 40 reaches LPB from LPA, the receiving LP has to send an acknowledgement back to the sending LP. If LPB has already started its LBTS computation, this acknowledgement must be flagged so that LPA includes the time-stamp in its own local minimum if it has not yet started computing its LBTS value. If this is not managed properly, the GVT computation will not only be delayed by transient messages but will also produce incorrect GVT values.
This implies that the role of GVT computation is critical to the performance of the optimistic TMA. Samadi's algorithm provides a foolproof solution to all the problems cited above as long as it is implemented as described. However, the algorithm itself does not guarantee that the GVT computation is fast enough to minimize the processors' idle time. Therefore, in order to minimize the latency involved in GVT computation, we deploy synchronous barriers.
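A minimal sketch of the flag rule described above is given below. It models only the bookkeeping (when an acknowledgement is flagged and when a flagged acknowledgement must be folded into a local minimum); the class, field, and message names are illustrative, and the real algorithm involves additional counters for unacknowledged messages.

```python
# Simplified sketch of Samadi-style flagged acknowledgements.
# Only the flag rule is modelled; names and data layout are illustrative.

class SamadiLP:
    def __init__(self):
        self.local_min = float("inf")  # running LBTS estimate
        self.in_find_mode = False      # True once this LP starts its LBTS computation

    def start_gvt_round(self, pending_timestamps):
        self.in_find_mode = True
        for ts in pending_timestamps:
            self.local_min = min(self.local_min, ts)

    def acknowledge(self, msg_timestamp):
        # The ack carries a flag when this LP has already started computing its
        # local minimum and therefore did NOT count the acknowledged message.
        return {"ack_for": msg_timestamp, "flagged": self.in_find_mode}

    def on_ack(self, ack):
        # A flagged ack arriving before this LP starts its own computation means
        # the timestamp was counted by nobody yet, so it must be folded in here.
        if ack["flagged"] and not self.in_find_mode:
            self.local_min = min(self.local_min, ack["ack_for"])
```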
III. COMPARATIVE ANALYSIS OF TREE BARRIER
In the tree barrier (see Fig. 4), the LPs are organized into three different levels. The first level is the root LP, which is responsible for initiating and finishing the GVT computation. At the second level, there may be several non-leaf LPs, each of which is both a parent and a child. Finally, at the lowest level of the tree, there are the leaf LPs, which do not have any child LP. The GVT computation is carried out in both directions between the root and the leaves via the non-leaf nodes. The root LP initiates the GVT computation by broadcasting a single message that propagates all the way from the root node to the leaf LPs via the non-leaf nodes.
When this initial broadcast message is received by the LPs, each LP starts computing its local minimum value and sends a message to its parent node. A parent node does not send a message to its own parent until it has received a message from each of its child LPs; the leaf nodes are exempt from this condition. Once a non-leaf node has received a message with an LBTS value from each of its child LPs, it can determine the local minimum and send the resulting value to its parent. This cycle of LBTS computation continues until the root LP has received the messages from its child LPs. The final computation is done by the root LP, which determines the minimum of all the LBTS values it receives from its child LPs. Finally, a message is broadcast to all LPs with the new GVT value.
This analysis shows that the number of messages transmitted in the two directions (i.e., from the root to the leaf LPs and vice versa) is the same. In other words, each LP in the tree barrier processes two messages: one for the message it receives and one for acknowledging that message. After receiving the acknowledgement from all of its child nodes, a parent node starts computing its local minimum, which automatically solves the simultaneous reporting problem. Since no LP in the system is frozen, there is no way a message can be left stranded in the network while the new GVT value is computed; this solves the transient message problem. This discussion implies that the tree barrier offers a complete structure that can be used to solve the problems of the optimistic TMA while at the same time offering a low latency for computing the GVT value.
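The two sweeps of the tree barrier amount to a minimum-reduction from the leaves to the root followed by a broadcast of the result. The sketch below illustrates this with a recursive reduction over an explicit parent-child table; the function names, the data layout, and the particular 14-LP tree (chosen to be consistent with Fig. 4) are assumptions for illustration only.

```python
# Hedged sketch of the tree-barrier GVT computation: each LP reports the
# minimum of its own LBTS and its children's reports, and the root
# broadcasts the resulting GVT back down. Names are illustrative.

def reduce_min(node, children, lbts):
    """node: LP id; children: dict mapping LP id -> list of child ids;
    lbts: dict mapping LP id -> local LBTS value."""
    value = lbts[node]
    for child in children.get(node, []):
        value = min(value, reduce_min(child, children, lbts))  # fold in each child's report
    return value

def tree_barrier_gvt(root, children, lbts):
    gvt = reduce_min(root, children, lbts)   # leaves -> root (first sweep)
    return {lp: gvt for lp in lbts}          # root -> leaves broadcast (second sweep)

if __name__ == "__main__":
    # One possible 14-LP layout consistent with Fig. 4: LP0 is the root/controller.
    children = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [7, 8],
                4: [9, 10], 5: [11, 12], 6: [13]}
    lbts = {lp: 100 - 5 * lp for lp in range(14)}   # arbitrary LBTS values
    print(tree_barrier_gvt(0, children, lbts)[0])   # 35, the global minimum
```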
The total number of messages transmitted in the tree barrier is clearly less than in Samadi's algorithm. This is simply because, in the proposed tree barrier, no message contains events, and all messages travel along the edges via which the LPs are connected. If there are N LPs in the system, they typically need N-1 edges. This implies that the total number of messages that need to be transmitted in order to compute the GVT value in both directions (i.e., from the root LP to the leaves and vice versa) cannot exceed twice the number of edges in the tree barrier.
Fig. 3. Representation of Samadi's GVT algorithm dealing with the transient message problem between two LPs and the controller. The x-axis represents the global current simulation time of the system.
Fig. 4. The organization of 14 LPs in a tree barrier structure. LP0 is the controller, which computes the global minimum; LP1 to LP6 are non-leaf nodes; LP7 to LP13 are leaf nodes. Green lines represent LBTS computation and GVT announcement, whereas red lines represent synchronization messages.
The total number of messages exchanged in both directions can therefore be approximated as 2(N-1).
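As a concrete check, the 14-LP tree of Fig. 4 has 13 edges, so at most 2(14-1) = 26 messages are exchanged per GVT computation.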
The reduction in the number of messages clearly reduces the latency, which in turn minimizes the processor idle time. In other words, the latency depends largely on the number of rounds of messages between the initiator and the LPs, as shown in Fig. 5. Since there are only two rounds in the case of the tree barrier, the latency is reduced by a large margin compared to Samadi's algorithm. Figure 5 shows a comparison between the number of rounds and the latency for the implementation of the tree barrier with Samadi's algorithm.
IV. COMPARATIVE ANALYSIS OF BUTTERFLY BARRIER
The butterfly barrier is a technique in which we eliminate the broadcast messages. Assume that there are N LPs present in the system. If, for instance, LP2 wants to synchronize with the other neighboring LPs once it is done executing its safe event messages, it can send a synchronization message to one of the neighboring LPs whose binary address differs in one bit. LP2 can only initiate this synchronization once it has finished processing its safe event messages, so that it can compute its own local minimum. Therefore, the first message sent from LP2 to the neighboring LP carries its LBTS value. When the neighboring LP receives such a synchronization message, it first determines its own status to see whether it has unprocessed safe event messages. If the receiving LP has unprocessed safe event messages, it first executes them before it synchronizes with LP2. However, if the receiving LP has already finished executing its safe event messages, it computes its own LBTS value and compares it with the one it received from LP2. The minimum of the two LBTS values is selected as the local minimum for the two synchronized LPs. The receiving LP then sends an acknowledgement to LP2 with the selected LBTS value. In other words, once the two LPs are synchronized, each of them holds the identical LBTS value, which is the local minimum within the synchronized pair. For two LPs that are synchronized after the execution of their safe event messages, this can be expressed as:
LBTS(LP1, LP2) = Min{LBTS_LP1, LBTS_LP2}
This cycle of synchronization goes on until all N LPs are completely synchronized with each other. Once all N LPs are synchronized, each one contains the identical value of the minimum time-stamp. This minimum time-stamp can then be taken as the GVT value, since it is the smallest local minimum among all LPs. In the butterfly barrier, each LP typically processes two messages per step: one for the transmission of the synchronization message and one for its acknowledgment. In other words, each LP must send and receive one message in each step of the algorithm. Since the synchronization is achieved in a pairwise fashion, a total of log2(N) steps are required to complete the algorithm; that is, each LP is released from the barrier once it has completed log2(N) pairwise barriers. Taking these factors into account, the total number of messages that must be sent and received before the GVT value is computed can be approximated as N log2(N). Figure 6 shows a comparison between the number of rounds and the GVT latency for the implementation of the butterfly barrier with Samadi's algorithm. It should be noted in Fig. 6 that the performance of the butterfly barrier is not as impressive as that of the tree barrier, due to the high latency for a large number of rounds.
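The pairwise exchange pattern described above can be sketched as a minimum-reduction over log2(N) rounds, where the partner in round k is the LP whose identifier differs in bit k. The sketch below is a simplification (it assumes N is a power of two and does not model safe-event processing or acknowledgements), and all names are illustrative.

```python
# Hedged sketch of the butterfly barrier reduction: in round k, each LP
# exchanges its current minimum with the partner whose id differs in bit k,
# so after log2(N) rounds every LP holds the global minimum (the GVT).
# Assumes N is a power of two; names are illustrative.

import math

def butterfly_gvt(lbts):
    n = len(lbts)
    assert n and (n & (n - 1)) == 0, "sketch assumes a power-of-two number of LPs"
    values = list(lbts)
    for k in range(int(math.log2(n))):
        new_values = values[:]
        for lp in range(n):
            partner = lp ^ (1 << k)                  # id differing in exactly one bit
            new_values[lp] = min(values[lp], values[partner])
        values = new_values
    return values                                    # identical at every LP

if __name__ == "__main__":
    print(butterfly_gvt([42, 17, 88, 23, 65, 30, 51, 19]))  # every entry is 17
```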
V. PERFORMANCE EVALUATION OF TREE AND
BUTTERFLY BARRIERS
Figure 7 shows a comparison between Samadi's algorithm and the tree and butterfly barriers for N = 9. Based on the simulation results of Fig. 7, one can clearly infer that the tree barrier outperforms Samadi's algorithm in terms of the number of messages needed to compute the GVT value. This reduction in the number of messages not only minimizes the GVT latency but also improves the CPU utilization by reducing the processor idle time. However, the implementation of the butterfly barrier for computing the GVT values results in higher latency than Samadi's algorithm. This is mainly because the butterfly barrier may not perform well in the presence of network errors or delayed acknowledgements; in other words, the simultaneous reporting problem causes the performance degradation of the butterfly barrier.
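Using the approximate message counts derived in Sections III and IV, the two barriers can be compared for the N = 9 configuration of Fig. 7 with a few lines of arithmetic. Samadi's message count depends on implementation details and is not modelled here, and rounding the butterfly count up for a non-power-of-two N is our assumption.

```python
# Approximate message counts taken from Sections III and IV (illustrative only).
import math

def tree_messages(n):
    # One sweep from the leaves to the root and one broadcast back: 2(N - 1).
    return 2 * (n - 1)

def butterfly_messages(n):
    # N * log2(N) from Section IV, rounded up for non-power-of-two N (assumption).
    return math.ceil(n * math.log2(n))

if __name__ == "__main__":
    n = 9
    print(tree_messages(n), butterfly_messages(n))   # 16 and roughly 29
```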
Next, we discuss how the butterfly barrier deals with the common problems of the optimistic TMA, namely transient messages and simultaneous reporting. Like the tree barrier, the butterfly barrier avoids the possibility of any LP freezing.
Fig. 5. Tree implementation with Samadi's algorithm: number of rounds versus latency for N = 10.
Since an LP cannot get involved in a deadlock situation, there is no way that the system can have a transient message stranded somewhere in the network. Therefore, transient messages should not be a problem in the case of the butterfly barrier. In addition to the transient message problem, the butterfly barrier needs to address the simultaneous reporting problem. Simultaneous reporting may cause a serious problem for the butterfly barrier, especially when the communication network is not reliable. For instance, let LP2 send a synchronization message carrying its LBTS value to the LP whose address differs in one bit position. Also assume that the receiving LP has finished executing its safe event messages and has successfully computed its own LBTS value. After comparing the two LBTS values, it sends an acknowledgement with the local minimum back to LP2. If that acknowledgement does not arrive at the destination LP (i.e., LP2) due to a network error, or if it is delayed, the pairwise synchronization fails. As a result, when the sending LP does not hear anything from the receiving LP, it will eventually look for another LP (again, only an LP whose binary address differs in one bit) and send a new synchronization message.
If we assume that LP2 has successfully synchronized with one of the other neighboring LPs, what happens if the delayed acknowledgement message then arrives at LP2? LP2 had the impression that the previous LP was not interested in the synchronization process, whereas the previous LP had the impression that it was synchronized with LP2. In this scenario, the butterfly barrier fails to correctly compute the GVT value. However, if we assume that the network is completely reliable and all the links between the LPs work without malfunction, the simultaneous reporting problem cannot arise within the pairwise synchronization of the butterfly barrier structure.
VI. CONCLUSION
In this paper, we presented the implementation of synchronous barriers, such as the tree and butterfly barriers, with the optimistic TMA. This approach is quite new since it combines two different families of algorithms (conservative and optimistic) for the same synchronization task. We started our discussion with optimistic algorithms in general and the Time Warp and Samadi's algorithms in particular. We also presented an analysis that shows why an optimistic algorithm must be able to deal with common problems such as rollback, memory reclamation, transient messages, and simultaneous reporting. Finally, we showed how the tree and butterfly barriers can be implemented with the optimistic algorithm to compute the GVT value. Both our theoretical analysis and the simulation results clearly suggest that the tree barrier performs better than the pure optimistic algorithm in terms of the number of messages needed to compute the GVT value. In addition, we discussed how these two barriers deal with the common problems of optimistic algorithms. For future work, it will be interesting to design a simulation in which we can compare the performance of these barriers with the Time Warp algorithm.
REFERENCES
[1] D. Bauer, G. Yaun, C. Carothers, and S. Kalyanaraman, "Seven-O'Clock: A New Distributed GVT Algorithm Using Network Atomic Operations," 19th Workshop on Principles of Advanced and Distributed Simulation (PADS'05), pp. 39-48, 2005.
[2] F. Mattern, H. Mehl, A. Schoone, and G. Tel, "Global Virtual Time Approximation with Distributed Termination Detection Algorithms," Tech. Rep. RUU-CS-91-32, Department of Computer Science, University of Utrecht, The Netherlands, 1991.
[3] F. Mattern, "Efficient Algorithms for Distributed Snapshots and Global Virtual Time Approximation," Journal of Parallel and Distributed Computing, vol. 18, pp. 423-434, 1993.
Fig. 6. Butterfly barrier implementation with Samadi's algorithm: GVT computation, number of rounds versus latency for N = 20.
Fig. 7. Comparison of Samadi's algorithm with the tree and butterfly barriers: number of processors versus number of messages for N = 9.
[4] R. Fujimoto, "Distributed Simulation Systems," Proceedings of the 2003 Winter Simulation Conference, College of Computing, Georgia Institute of Technology, Atlanta, 2003.
[5] S. Rizvi, K. Elleithy, and A. Riasat, "Trees and Butterflies Barriers in Mattern's GVT: A Better Approach to Improve the Latency and the Processor Idle Time for Wide Range Parallel and Distributed Systems," IEEE International Conference on Information and Emerging Technologies (ICIET-2007), Karachi, Pakistan, July 6-7, 2007.
[6] F. Mattern, "Efficient Algorithms for Distributed Snapshots and Global Virtual Time Approximation," Journal of Parallel and Distributed Computing, vol. 18, no. 4, 1993.
[7] R. Fujimoto, "Parallel Discrete Event Simulation," Communications of the ACM, vol. 33, no. 10, pp. 30-53, Oct. 1990.
[8] B. Samadi, Distributed Simulation, Algorithms and Performance Analysis (Load Balancing, Distributed Processing), PhD Thesis, Computer Science Department, University of California, Los Angeles, 1985.