Article

pGVT: an algorithm for accurate GVT estimation

Authors:

Abstract

The time warp mechanism uses memory space to save event and state information for rollback processing. As the simulation advances in time, old state and event information can be discarded and the memory space reclaimed. This reclamation process is called fossil collection and is guided by a global time value called Global Virtual Time (GVT). That is, GVT represents the greatest minimum time of the fully committed events (the time before which no rollback will occur). GVT is then used to establish a boundary for fossil collection. This paper presents a new algorithm for GVT estimation called pGVT. pGVT was designed to support accurate estimates of the actual GVT value and it operates in an environment where the communication subsystem does not support FIFO message delivery and where message delivery failure may occur. We show that pGVT correctly estimates GVT values and present some performance comparisons with other GVT algorithms.
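The relationship between GVT and fossil collection described in the abstract can be illustrated with a minimal sketch. It is not the pGVT algorithm: it assumes an omniscient observer that can see every logical process and every message in transit, which a real distributed simulation cannot do. All names (`LogicalProcess`, `true_gvt`, `fossil_collect`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalProcess:
    """Toy logical process (LP); an illustrative structure, not from the paper."""
    lvt: float                                          # local virtual time
    saved_states: list = field(default_factory=list)    # (timestamp, state), ascending


def true_gvt(lps, in_transit_timestamps):
    """Minimum over every LP's local virtual time and every in-transit message.

    Assumes global knowledge, so this is the ideal value that GVT
    estimation algorithms can only approximate from below.
    """
    return min([lp.lvt for lp in lps] + list(in_transit_timestamps))


def fossil_collect(lp, gvt):
    """Discard saved states older than GVT and return how many were reclaimed.

    The most recent state before GVT is kept, since a rollback to a time
    at or just after GVT must still be able to restore from it.
    """
    older = [s for s in lp.saved_states if s[0] < gvt]
    newer = [s for s in lp.saved_states if s[0] >= gvt]
    kept = older[-1:] + newer
    reclaimed = len(lp.saved_states) - len(kept)
    lp.saved_states = kept
    return reclaimed


# Two LPs plus one message in transit that is older than both clocks.
lp_a = LogicalProcess(lvt=40, saved_states=[(10, "a"), (20, "b"), (35, "c")])
lp_b = LogicalProcess(lvt=25, saved_states=[(5, "x"), (15, "y"), (22, "z")])
gvt = true_gvt([lp_a, lp_b], in_transit_timestamps=[18])
```

In this example the in-transit message, not either local clock, determines the bound, which is why an estimator that ignores messages in transit can overestimate GVT and reclaim state that a later rollback still needs.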


... There are some GVT algorithms that cannot be easily grouped into any of the three categories discussed so far. The pGVT algorithm [15] uses a GVT manager to monitor the progress of every processor and to compute the GVT based on information collected from processors. Processors are required to report to the GVT manager whenever they receive a straggler message. ...
... The Continuously Monitored GVT (CMGVT) algorithm allows processors to calculate GVT based on the local information constantly available to each processor, supplemented with the global information, such as the local virtual time of each processor and information about messages in transit, that is appended to simulation messages [24]. Hence, the algorithm works well when there is a lot of simulation message traffic and the communication is local, as was the case of the spatially explicit simulations considered in [24]. ...
[Table rows interleaved with this excerpt; column headers not recoverable]
D'Souza, Fan, and Wilsey [15] | Report stragglers to a GVT manager | Yes | No | Any | 2 | [15]
Bauer, Yuan, Carothers, Yuksel, and Kalyanaraman [23] | Extend Fujimoto's shared-memory GVT algorithm with the notion of network atomic operations | No | No | Maximum Delay | 16 | [23]
Deelman and Szymanski [24] | Use vector and matrix clocks to keep track of messages in transit | No | Yes | Any | 16 | [24]
Article
Full-text available
This paper presents a new Global Virtual Time (GVT) algorithm, called TQ-GVT, that is at the heart of a new high performance Time Warp simulator designed for large-scale clusters. Starting with a survey of numerous existing GVT algorithms, the paper discusses how other GVT solutions, especially Mattern's GVT algorithm, influenced the design of TQ-GVT, as well as how it avoided several types of overheads that arise in clusters executing parallel discrete simulations. The algorithm is presented in detail, with a proof of its correctness. Its effectiveness is then verified by experimental results obtained on more than 1,000 processors for two applications, one synthetic workload and the other a spiking neuron network simulation.
... Other GVT algorithms with different ideas are summarized as follows. To compute the GVT value, D'Souza et al. applied a GVT manager to collect information from all LPs [37]. Similarly, Chen and Szymanski used hierarchical GVT masters to collect GVT reports from LPs passively and then distribute the new GVT value to LPs [17]. ...
... By using the communication topology of processors, the workload of GVT computation is distributed to all processors. Our algorithm does not use a GVT monitor or GVT master to perform GVT calculation like other algorithms [17,21,32,37]. Each processor maintains a counter indicating the index of the current EB and increases the counter at the end of the EB. ...
Article
Full-text available
Global Virtual Time computation of Parallel Discrete Event Simulation is crucial for conducting fossil collection and detecting the termination of simulation. The triggering condition of GVT computation in typical approaches is generally based on the wall-clock time or logical time intervals. However, the GVT value depends on the timestamps of events rather than the wall-clock time or logical time intervals. Therefore, it is difficult for the existing approaches to select appropriate time intervals to compute the GVT value. In this study, we propose a scalable GVT estimation algorithm based on Lower Bound of Event-Bulk-Time, which triggers the computation of the GVT value according to the number of processed events. In order to calculate the number of transient messages, our algorithm employs Event-Bulk to record the messages sent and received by Logical Processes. To eliminate the performance bottleneck, we adopt an overlapping computation approach to distribute the workload of GVT computation to all worker-threads. We compare our algorithm with the fast asynchronous GVT algorithm using PHOLD benchmark on the shared memory machine. Experimental results indicate that our algorithm has a light overhead and shows higher speedup and accuracy of GVT computation than the fast asynchronous GVT algorithm.
... GVT is also essential for commitment of output. There have been several algorithms proposed for computing GVT including [23], [24], [25]. As of now, a centralized GVT management algorithm, Samadi's algorithm, is being used. ...
... On the other hand, it is very easy to compute, given the minima from each LP. Other distributed algorithms which are more efficient in terms of messages have been proposed [24], [25]. However, they are computationally more expensive than Samadi's algorithm. ...
Article
One of the methods used to reduce the time spent simulating VHDL designs is by parallelizing the simulation. In this paper, we describe the implementation of an object-oriented Time Warp simulator for VHDL on an actor based environment. The actor model of computation allows the exploitation of fine grained parallelism in a truly asynchronous manner and allows for the overlap of computation with communication. Some preliminary results obtained by simulating a set of multipliers and some ISCAS benchmark circuits are provided. In addition, the importance of placing processes based on circuit partitioning techniques for improving runtimes and scalability is demonstrated. Results are reported on a Sun SPARCServer 1000 and an Intel Paragon. 1 Introduction The design of a digital VLSI system commonly begins with a description of the system being written in a Hardware Description Language, an example of which is VHDL [1]. Subsequent to verifying the functionality of the description, it is give...
... Fossil-collection occurs either as a "scavenge all fossils" operation [9] or a "scavenge one item" operation (on-the-fly fossil collection) [2]. In addition, GVT estimates can be maintained continuously [3,12] or explicitly requested (usually when memory space is exhausted) [9]. Algorithms that continuously update GVT vary according to the level of aggressiveness in updating GVT. ...
... In order to minimize state saving overhead, states may be saved incrementally, or periodically, reducing the cost of state-saving but requiring an overhead when a rollback occurs [5]. [Footnote 3] Here GVT refers to the true simulation GVT; by the time GVT estimation algorithms reach a consensus on GVT, true GVT could have further advanced. ...
Article
In the Time-Warp synchronization model, the processes must occasionally interrupt execution in order to reclaim memory space used by state and event histories that are no longer needed (fossil-collection). Traditionally, fossil-collection techniques have required the processes to reach a consensus on the Global Virtual-Time (GVT), the global progress time. Events with time-stamps less than GVT are guaranteed to have been processed correctly; their histories can be safely collected. This paper presents Optimistic Fossil-Collection (OFC), a new fossil-collection algorithm that is fully distributed. OFC uses a local decision function to estimate the fossilized portion of the histories (and optimistically collects them). Because a global property is estimated using local information only, an erroneous estimate is possible. Accordingly, OFC must also include a recovery mechanism to be feasible. An uncoordinated distributed checkpointing algorithm for Time-Warp that is domi...
... No rollback can reset a local clock to a value below the GVT, so the saved state from before the GVT can be discarded, and simulation results up to the GVT can be committed. Several algorithms exist for computing GVT, and trade off the computation cost of time spent computing GVT with the benefit of reducing memory demands [37]. ...
... In recent years, researchers at the University of Cincinnati and MTL Systems, Inc., developed the QUEST simulator [38] followed by the VAST simulator [39]. These simulators facilitated research into many of the issues discussed above, including lazy or aggressive cancellation, throttling mechanisms, and state saving [31,32,33,34,35,36,37]. The QUEST simulator was developed for distributed memory architectures, and a version of QUEST also exists for execution using MPI (thus supporting either a shared-memory or distributed-memory model, but not a mix). ...
Article
Full-text available
This paper presents a set of recommended hardware description language modeling practices to help modelers and users achieve high-performance parallel simulation. This work begins with a taxonomy of parallel HDL simulation techniques, which provides a structure for the recommended practices. Research efforts and commercial products fit in this context to better understand them and the appropriate methods for effective parallel simulation.
... The approaches to improve GVT estimation have fallen chiefly along three lines: those that work to improve the frequency of GVT estimations, thereby reducing the amount of saved history information [7,27,26]; those that attempt to reduce the number of uncommitted events via cancel-back, bounded time windows, or some other flow control mechanism [23]; and those that attempt to eliminate the need for GVT estimation [84,85]. ...
... D'Souza et al. [26,27] use an algorithm called pGVT to improve the frequency and accuracy of GVT estimation. pGVT removes a portion of messaging overhead by letting the LPs decide when to report new GVT information to the GVT Manager. ...
Article
Many Time Warp simulation tools are used by a wide variety of application developers, each with different demands and patterns of use. It is unlikely, under these circumstances, for off-the-shelf simulation software to be "optimal" for any application in any processing environment. The main form of adaptation that is presently available is hand-crafted and problem specific; where the needs and patterns of use of the application are defined and the Time Warp simulation kernel software is fitted to optimize the performance of this typical application. The problem with this is, by their nature, Time Warp simulations are subject to constant change and adaptation. This situation is exacerbated by changes in network topology and hardware platforms. For most simulations, successfully adapting to the imbalances in the system is often a question of dynamically adjusting the right set of parameters in the executing simulation. Unfortunately, due to the dynamic nature of Time Warp simulation systems, identification of this critical set of parameters is not trivial. Also, modifying these parameters in the simulation system affects both the executing simulation and the execution environment. Hence, in addition to studying methods to adjust the set of critical parameters, the effect of these adjustments on the execution and the system resources must also be investigated.
... D'Souza et al. [17] propose a statistical approach to estimating GVT. Pancerella [18] proposes a hardware based scheme whereby host systems using custom network interface cards are interconnected to form an efficient reduction network to rapidly compute GVT. ...
Conference Paper
Full-text available
One of the most common optimistic synchronization protocols for parallel simulation is the Time Warp algorithm proposed by Jefferson [12]. The Time Warp algorithm is based on the virtual time paradigm, which has the potential for greater exploitation of parallelism and, perhaps more importantly, greater transparency of the synchronization mechanism to the simulation programmer. It is widely believed that the optimistic Time Warp algorithm suffers from large memory consumption due to frequent rollbacks. In order to achieve optimal memory management, the Time Warp algorithm needs to periodically reclaim memory. In order to determine which event-messages have been committed and which portion of memory can be reclaimed, the computation of global virtual time (GVT) is essential. Mattern [2] uses a distributed snapshot algorithm to approximate GVT that does not rely on first in first out (FIFO) channels. Specifically, it uses a ring structure to establish cuts C1 and C2 to calculate the GVT for distinguishing between the safe and unsafe event-messages. Although the distributed snapshot algorithm provides a straightforward way of computing GVT, more efficient solutions for message acknowledging and delaying of sending event messages while awaiting control messages are desired. This paper studies the memory requirement and time complexity of GVT computation. The main objective of this paper is to apply the concept of a matrix to Mattern's original GVT algorithm to speed up the process of GVT computation while at the same time reducing the memory requirement. Our analysis shows that the use of a matrix in GVT computation improves the overall performance in terms of memory saving and latency.
... The algorithm works by specifying a Target Virtual Time TVT and an initiator to detect when GVT ~ TVT. The idea of passive GVT calculation was presented by D'Souza, Fan, and Wilsey [10]. In their passive response GVT algorithm, called pGVT, a central GVT manager calculates new GVT values from information reported by logical processes. ...
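The passive pattern described in this excerpt, in which a central manager computes GVT only from what logical processes choose to report, can be sketched as follows. This is not the pGVT algorithm itself, whose details are not given here. The class name, the method names, and in particular the assumption that each reported value already bounds both the reporting LP's local virtual time and any messages it has sent that are not yet known to be received are all hypothetical simplifications.

```python
class PassiveGvtManager:
    """Hypothetical passive manager: it never polls; LPs push reports to it."""

    def __init__(self, lp_ids, start_time=0.0):
        # Latest value reported by each LP. Until an LP reports, it is
        # assumed to be no further along than the simulation start time.
        self.reported = {lp_id: start_time for lp_id in lp_ids}
        self.gvt = start_time

    def report(self, lp_id, safe_time):
        """Record one LP's report and return the updated GVT estimate.

        ASSUMPTION: safe_time is a lower bound on the LP's local virtual
        time and on the timestamps of its outstanding sent messages.
        Tracking those messages is deliberately left out of this sketch.
        """
        self.reported[lp_id] = safe_time
        candidate = min(self.reported.values())
        # The published estimate only ever moves forward.
        self.gvt = max(self.gvt, candidate)
        return self.gvt


manager = PassiveGvtManager(["lp0", "lp1"])
manager.report("lp0", 10.0)   # lp1 has not reported, so the estimate stays at 0.0
manager.report("lp1", 7.0)    # both have reported; estimate becomes 7.0
```

The design choice this illustrates is that the manager's estimate is only as fresh as the slowest reporter, so the frequency with which LPs decide to report directly controls how tight the estimate is.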
Article
Full-text available
Parallel and Distributed Simulation (PADS) algorithms are typically categorized as belonging to one of two categories. They are either conservative or optimistic with respect to the method of handling causality. Conservative systems strictly preserve causality, while optimistic systems detect and correct causality errors when they occur. Time Warp is the basis of optimistic algorithms where rolling back the simulation clock allows the simulation to correct for errors. The Global Virtual Time (GVT) is the variable that maintains information about simulation progress, termination decision, and for committing input/output data. In this paper the basis for an environment for visualizing distributed simulations with time warp on a network of UNIX workstations is presented. The visualization environment provides a graphical overview of simulation processes, and provides insight for algorithm performance. Extensions to the visualizations are also possible for animation of simulation results.
... In Time Warp optimistic simulation, fossil reclamation through GVT estimation is a prominent and well proven technique (Jefferson 1985). There are several GVT algorithms proposed in the literature (Lin and Lazowska 1990; Bellenot 1990; D'Souza, Fan, and Wilsey 1994; Mattern 1993; Fujimoto and Hybinette 1994; Tomlinson and Garg 1993). These algorithms either measure the rate of virtual time progress (D'Souza, Fan, and Wilsey 1994) or identify consistent snapshots (Mattern 1993) or keep track of the peak and valley messages (Lin and Lazowska 1990). ...
Article
This paper presents a time warp fossil collection mechanism that functions without need for a GVT estimation algorithm. Effectively each logical process (LP) collects causality information during normal event execution and then each LP utilizes this information to identify fossils. In this mechanism, LPs use constant size vectors (that are independent of the total number of parallel simulation objects) as timestamps called Plausible Total Clocks to disseminate causality information. For proper operation, this mechanism requires that the communication layer preserves a FIFO ordering on messages. A detailed description of this new fossil collection mechanism and its proof of correctness is presented in this paper.
... The process of identifying and reclaiming this space is called fossil collection. The global time against which fossil collection algorithms operate is called the global virtual time (or GVT) and several algorithms for GVT estimation have been proposed [3,4,5,6,7,8]. In addition to its use for fossil collection, GVT is also useful for deciding when irrevocable operations (such as I/O) can be performed and, in some instances, when the simulation has completed. ...
... At about the same time, D'Souza et al. [4] propose a statistical approach to estimating GVT. Prior to Fujimoto's GVT algorithm, Xiao et al. [27] propose an asynchronous GVT algorithm that exploits shared-memory multiprocessor architectures. ...
Conference Paper
Full-text available
In this paper we introduce a new concept, network atomic operations (NAOs) to create a zero-cost consistent cut. Using NAOs, we define a wall-clock-time driven GVT algorithm called Seven O'Clock that is an extension of Fujimoto's shared memory GVT algorithm. Using this new GVT algorithm, we report good optimistic parallel performance on a cluster of state-of-the-art Itanium-II quad processor systems for both benchmark applications such as PHOLD and real-world applications such as a large-scale TCP/Internet model. In some cases, super-linear speedup is observed.
... The process of identifying and reclaiming this space is called fossil collection. The global time against which fossil collection algorithms operate is called the Global Virtual Time (or GVT) and several algorithms for GVT estimation have been proposed [5] [9] [16] [19]. In addition to its use for fossil collection, GVT is also useful for deciding when irrevocable operations can be performed and, in some instances, when the simulation has completed. ...
Conference Paper
Several optimizations to the Time Warp synchronization protocol for parallel discrete event simulation have been proposed and studied. Many of these optimizations have included some form of dynamic adjustment (or control) of the operating parameters of the simulation (e.g., checkpoint interval, cancellation strategy). Traditionally dynamic parameter adjustment has been performed at the simulation object level; each simulation object collects measures of its operating behaviors (e.g., rollback frequency, rollback length, etc) and uses them to adjust its operating parameters. The performance data collection functions and parameter adjustment are overhead costs that are incurred in the expectation of higher throughput. This paper presents a method of eliminating some of these overheads through the use of an external object to adjust the control parameters. That is, instead of inserting code for adjusting simulation parameters in the simulation object, an external control object is defined to periodically analyze each simulation object's performance data and revise that object's operating parameters. An implementation of an external control object in the WARPED Time Warp simulation kernel has been completed. The simulation parameters updated by the implemented control system are: checkpoint interval, and cancellation strategy (lazy or aggressive). A comparative analysis of three test cases shows that the external control mechanism provides speedups between 5%-17% over the best performing embedded dynamic adjustment algorithms.
... GVT estimation operates concurrently with the simulation: if it is carried out aggressively, it incurs a higher overhead but the obtained estimate is tighter, allowing more timely garbage collection. WARPED implements two GVT algorithms: pGVT [10] and Mattern's algorithm [18]. We use Mattern's algorithm because it has a lower overhead and produces good estimates. ...
Conference Paper
Full-text available
This paper explores optimization of parallel discrete event simulators (PDES) on a cluster of workstations with programmable network interface cards (NICs). We explore reprogramming the firmware on the NIC to optimize the performance of distributed simulation. This is a new implementation model for distributed applications where: (i) application specific communication optimizations can be implemented on the NIC; (ii) portions of the application that are most heavily communicating can be migrated to the NIC; (iii) some messages can be filtered out at the NIC without burdening the primary processor resources; and (iv) critical events are detected and handled early. The combined effect is to optimize the application communication behavior as well as reduce the load on the host processor resources. We explore this new model by implementing two optimizations to a time-warp simulator on the NIC: (1) the migration of the global virtual time estimation algorithm to the NIC; and (2) early cancellation of messages in place upon early detection of rollbacks. We believe that the model generalizes to other distributed applications
... This global snapshot cannot be implemented directly in a distributed system, but an estimate of GVT can be computed. Numerous distributed algorithms [3,4,7,9,17,20,29] to estimate GVT have been proposed. According to the definition of GVT, it represents a lower bound to the time to which a process can rollback. ...
Article
Any digital system has to be tested for correctness before manufacturing it to keep costs down. Simulation of digital models written in a Hardware Description Language provides a fast and inexpensive testing method. VHDL is such a widely accepted modeling language. This thesis explores the design and implementation of one such VHDL simulator, TyVIS, based on a parallel discrete event simulation paradigm Time Warp. The input VHDL is parsed and analyzed into an intermediate representation prescribed by the AIRE standard. A code generator translates this intermediate representation to an equivalent C++ implementation. The design of these C++ equivalent forms are discussed in detail, explaining their relevance in light of the Time Warp simulation algorithm used. warped, a simulation kernel based on Time Warp is used as the underlying simulation engine. While this enables a TyVIS simulation to be distributed across a network of workstations, it poses some restrictions on the design. The TyVIS kernel provides an elegant solution to these constraints.
... This global snapshot cannot be implemented directly in a distributed system, but an estimate of GVT can be computed. Numerous distributed algorithms [6,7,26,47,51,76] to estimate GVT have been proposed. According to the definition of GVT, it represents a lower bound to the time to which a process can rollback. ...
Article
With the advent of cheap and powerful hardware for workstations and networks, a new cluster-based architecture for Time Warp simulations has been envisioned. However, fine-grained Time Warp applications that communicate frequently are not the ideal candidates for such architectures due to their high latency communication costs. Hence, designers of fine-grained Time Warp applications on clusters are faced with the problem of reducing the high communication latency of the communication subsystem in such architectures. An efficient communication subsystem consumes a lower fraction of the processing cycles for communication operations and allows the majority of the processing cycles to be used by the application. This increases the performance of Time Warp applications. This thesis reduces the latency of the communication subsystem by selecting one of the following approaches: (i) reducing network latency by employing a higher performance network hardware (i.e., Fast Ethernet versus Myrinet) and (ii) using more efficient communication libraries (MPICH versus MPI-BIP on Myrinet and TCPMPL (TCP/IP based custom message passing layer) on Fast Ethernet). TCPMPL was developed as part of this research after extensive investigations of the communication subsystem. In addition, this thesis evaluates the performance of different message passing libraries (on Fast Ethernet) to determine the most suitable library for warped, an extant Time Warp simulation kernel. The communication subsystem of warped was suitably modified to perform the aforementioned studies. The performance of MPI-BIP was studied with warped applications and an improvement of 96% in the execution time was observed for most applications. The performance of TCPMPL was also studied and an improvement of 75% in the exec...
... The process of identifying and reclaiming this space is called fossil collection. The global time against which fossil collection algorithms operate is called the global virtual time (or GVT) and several algorithms for GVT estimation have been proposed [3,4,5,6,7,8]. In addition to its use for fossil collection, GVT is also useful for deciding when irrevocable operations (such as I/O) can be performed and, in some instances, when the simulation has completed. ...
Article
Parallelization is a popular technique for improving the performance of discrete event simulation. Due to the complex, distributed nature of parallel simulation algorithms, debugging implemented systems is a daunting, if not impossible task. Developers are plagued with transient errors that prove difficult to replicate and eliminate. Recently, researchers at The University of Cincinnati developed a parallel simulation kernel, warped, implementing a generic parallel discrete event simulator based on the Time Warp optimistic synchronization algorithm. The intent was to provide a common base from which domain specific simulators can be developed. Due to the complexity of the Time Warp algorithm and the dependence of many simulators on the simulation kernel's correctness, a formal specification was developed and verified for critical aspects of the Time Warp system. This paper describes these specifications, their verification and their interaction with the development process....
Chapter
In the previous chapter several systems that offer a different approach to load distribution have been described. This chapter now deals with application based solutions to distribute the workload. Again the presentation uses the terminology defined in chapter 2.
Conference Paper
Global Virtual Time (GVT) is the fundamental synchronization concept in optimistic simulations. It is defined as the earliest time tag within the set of unprocessed pending events in a distributed simulation. A number of techniques for determining GVT have been proposed in recent years, each having their own intrinsic properties. However, most of these techniques either focus on specific types of simulation problems or assume specific hardware support. This paper specifically addresses the GVT problem in the context of the following areas: Scalability, Efficiency, Portability, Flow control, Interactive support, Real time use. A new GVT algorithm, called SPEEDES GVT, has been developed in the Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) framework. The new algorithm runs periodically but does not disrupt event processing. It provides flow control by processing events risk-free while flushing out messages during the GVT computation. SPEEDES GVT is built from a set of global reduction operations that are easily implementable on any hardware system
Article
Full-text available
An introduction to the field of Parallel and Distributed Simulation (PADS) is given. The capabilities and limitations of currently used PADS techniques are discussed. A review of the recently developed hybrid and adaptive PADS techniques is also given. Sample performance results of some PADS techniques are presented using a network of workstations.
Article
Full-text available
The Time Warp algorithm is a well-known mechanism of optimistic synchronization in a parallel discrete-event simulation (PDES) system. It offers a run time recovery mechanism that deals with the causality errors. For an efficient use of rollback, the global virtual time (GVT) computation is performed to reclaim the memory, commit the output, detect the termination, and handle the errors. This paper presents a new unacknowledged message list (UML) scheme for an efficient and accurate GVT computation. The proposed UML scheme is based on the assumption that certain variables are accessible by all processors. In addition to GVT computation, the proposed UML scheme provides an effective solution for both simultaneous reporting and transient message problems in the context of synchronous algorithm. To support the proposed UML approach, two algorithms are presented in detail, with a proof of correctness. Empirical evidence from an experimental study of the proposed UML scheme on PHOLD benchmark fully confirms the theoretical outcomes of this paper.
Chapter
Trends and Challenges of Distributed Simulation; A Brief History of Distributed Simulation; Synchronization Algorithms for Parallel and Distributed Simulation; Distributed Simulation Middleware; Conclusion; References
Conference Paper
A monitoring circuit for individual photovoltaic (PV) panels in grid-connected systems is proposed, which exhibits a number of features devised to simplify and reduce cost of diagnostics and maintenance of the PV plant. In particular, the system is provided with an effective energy harvesting supply stage, which eliminates the requirement for external supply or batteries; furthermore, no cables are needed for data transfer due to the adoption of a rugged wireless connectivity.
Article
Time Warp is the most common mechanism used for implementing optimistically synchronized Parallel Discrete Event Simulation (PDES). Rollback relaxation is an optimization to Time Warp that reduces the space and time requirements of rollback. Rollback relaxation is applicable to simulation systems that contain memoryless components (i.e., components whose output at any instant of time is determined completely by its inputs at that time). For such components, a complete rollback is not necessary for the correct completion of simulation. Instead, on the receipt of a straggler message, a rollback-relaxed process merely aligns the input set to send new, and validate already sent, output messages. This optimization has been implemented and has experimentally proven to enhance the performance of Time Warp simulations. However, no formal proof of the correctness of rollback relaxation exists (although correctness proofs of Time Warp do). In this paper, we formally specify and verify the correctness of rollback relaxation. The problem is specified using the Prototype Verification System (PVS) Specification Language and proved using the PVS Prover.
Conference Paper
We developed and implemented two highly optimized optimistic discrete event simulation techniques based on an efficient and scalable Parallel Heap data structure as a global event queue. The primary results are (i) the design of an optimistic simulation algorithm, namely SyncSim, which does not rely on traditional state and message saving data structures, but employs only one backup state per state variable, (ii) a demonstration, through implementation of SyncSim, of an optimistic technique which overcomes the two main mutually conflicting and unbounded overheads of the existing optimistic simulation algorithms: SyncSim bounds the additional space requirements to just one copy per state variable and drastically limits the number of rollbacks encountered. Furthermore, SyncSim beats the highly optimized traditional simulator simglobal on a wide variety of large networks on an Origin-2000 computer. The algorithm SyncSim could form a basis for a good parallelizing engine attachable relatively easily to an existing serial simulator.
Conference Paper
Over 5000 publications on parallel discrete event simulation (PDES) have appeared in the literature to date. Nevertheless, few articles have focused on empirical studies of PDES performance on large supercomputer-based systems. This gap is bridged here by undertaking a parameterized performance study on thousands of processor cores of a Blue Gene supercomputing system. In contrast to theoretical insights from analytical studies, our study is based on an actual software implementation, incurring the actual messaging and computational overheads for both conservative and optimistic synchronization approaches of PDES. Complex and counter-intuitive effects are uncovered and analyzed, with different event timestamp distributions and available levels of concurrency in the synthetic benchmark models. The results are intended to provide guidance to the PDES community on how the synchronization protocols behave at high processor core counts on state-of-the-art supercomputing systems.
Conference Paper
The time warp mechanism is a technique for optimistically synchronizing Parallel and distributed Discrete Event-driven Simulators (PDES). Within this synchronization paradigm lie numerous parallel algorithms, chief among them the estimation of the Global Virtual Time (GVT) value for fossil collection and output commit. Because the optimistic synchronization strategy allows for temporary violations of causal relations in the system being simulated, developing algorithms that correctly estimate GVT can prove extremely difficult. Testing and debugging can also prove difficult, as error situations are frequently not repeatable due to varying load conditions and processing orders. Consequently, the application of formal methods to develop and analyze such algorithms is of extreme importance. This paper addresses the application of formal methods to the development of GVT estimation algorithms. More precisely, the paper presents a formal specification and verification of one specific GVT estimation algorithm, the pGVT algorithm. The specifications are presented in the Larch Shared Language, and the verification is completed using the Larch Proof Assistant. The ultimate goal of this work is to develop a reusable infrastructure for GVT proof development that can be used by developers of new GVT estimation algorithms.
Conference Paper
Parallel and distributed software systems are representative of large-scale critical and complex systems that require the application of formal methods. Parallel and distributed software systems are notoriously unreliable because implementors often design and develop such systems without a complete understanding of the problem domain; in addition, the nondeterministic nature of certain parallel and distributed systems makes system validation difficult if not impossible. In this paper, the application of formal specification and verification to a class of parallel and distributed software systems is presented. Specifically, the prototype verification system (PVS) is applied to the specification and verification of the time warp protocol, a parallel optimistic discrete event simulation algorithm. The paper discusses how the specification of the time warp protocol can be mechanized within a general-purpose higher-order logic framework like PVS. In addition, the paper presents the extensibility of the specification to address and verify different aspects and optimizations of the basic time warp protocol.
Conference Paper
Parallel and distributed systems are representative of large and complex systems that require the application of formal methods. These systems are often unreliable because implementors design and develop them without a complete understanding of the problem domain; in addition, the nondeterministic nature of certain parallel and distributed systems makes system validation difficult if not impossible. To address this issue, the application of formal specification and verification to a class of parallel and distributed software systems is presented in this paper. Specifically, the Prototype Verification System (PVS) is applied to the specification and verification of the Time Warp protocol, a distributed optimistic discrete event simulation algorithm. The paper discusses how the specification of the Time Warp protocol can be mechanized within a general-purpose higher-order theorem proving framework like PVS. In addition, the paper presents the extensibility of the specification to address and verify different aspects and optimizations of the basic Time Warp protocol. As an illustrative example, our experiences in specifying and verifying the infrequent state saving optimization to the basic Time Warp protocol are reported in the paper.
Article
The calculation of GVT has been a requirement to identify fossilized state and event space during Time Warp simulations. This paper outlines methods that use observations of past behavior to estimate future behavior for the purposes of fossil reclamation. More precisely, predictions of future rollback behavior are used to determine a probability that a particular item of saved state or event information is no longer needed. This probability is compared against a user-defined risk factor to decide if the space can be reclaimed and reused. This method is called optimistic fossil collection and it is fully distributed, not requiring the global estimate of GVT for operation.
Article
This thesis is concerned with the experimental development of parallel simulation tools that not only exploit diverse multiprocessor environments, but also allow parallel simulations to be built with reasonable effort. We work on two fronts: model replication and model decomposition. We describe the design of EcliPSe, a parallel simulation system for replicative applications whose programming interface is designed to enable easy parallelization of such programs. We investigate solutions to serializing bottlenecks that arise when samples are collected from many processes. We also examine how the structure of replicative applications can be exploited to provide fault tolerance with low execution overhead. Experiments using up to 128 workstations resulted in excellent performance, showing the scalability of the system. In model decomposition (also called parallel discrete-event simulation), we depart from the standard approach usually taken in current parallel tools and use the active-transaction approach. By obviating the need for explicitly sending messages, we make modeling easier for analysts that are not used to parallel programming constructs. We describe the design of the ParaSol model-decomposed parallel simulation tool. Using this threads-based tool as a testbed, we investigate how existing methods for model decomposition can be adapted to the active-transaction approach. We show, using performance experiments, that this approach does not incur a substantial run-time penalty. Finally, to demonstrate that ParaSol enables a simplified approach to implementing models, we use it to develop the first parallel implementation of the widely used GPSS simulation language. Initial performance experiments showed promising results: despite the overheads associated with model-decomposed parallel simulations, we were able to achieve a 34% reduction in execution time when going from two to four processors in a GPSS program execution.
Conference Paper
This paper presents the design, implementation and performance of a time warp simulator, called DSIM, which targets clusters comprised of thousands of processors. DSIM employs a novel technique for GVT computation, called the time quantum GVT algorithm that requires no message acknowledgement, relies on constant-length messages and is efficient on clusters with large numbers of processors. Its implementation uses a technique called Local Fossil Collection to alleviate the overhead of memory reclamation and to support efficient event management. DSIM is also equipped with a simple programming interface to ease programming and debugging of simulations. Experimental results obtained on the PHOLD benchmark demonstrated that DSIM can process as many as 228 million events per second on 1033 processors.
Conference Paper
WARPED is a publicly available time warp simulation kernel. The kernel defines a standard interface to the application developer and is designed to provide a highly configurable environment for the integration of time warp optimizations. It is written in C++, uses the MPI message passing standard, and executes on a variety of parallel and distributed processing platforms. Version 2.0 of WARPED, described here, is distributed with several applications, and the configuration can be set so that a sequential kernel implementation is instantiated. The kernel supports LP clustering, various GVT algorithms, and numerous optimizations to adaptively adjust simulation parameters at runtime.
Conference Paper
Each process in a time-warp parallel simulation requires a time-varying set of state and event histories to be retained for recovering from erroneous computations. Erroneous computation is discovered when straggler messages arrive with time-stamps in the process's past. The traditional method of determining the set of histories to retain has been through the estimation of a global virtual time (GVT) for the distributed simulation. A distributed GVT calculation requires an estimation of the global progress of the simulation during a real-time interval. Optimistic fossil collection (OFC) predicts a bound for the needed histories using local information, or previously collected information, that enables the process to continue. In most cases, OFC requires less communication overhead and less memory usage, and estimates the set of committed events faster. These benefits come at the cost of a possible penalty of having to recover from a state history that was incorrectly fossil-collected (an OFC fault). Sufficiently lightweight checkpointing and recovery techniques compensate for this possibility while yielding good performance. In this paper, the requirements of an OFC-based simulator are detailed (the algorithm has been implemented in the WARPED time-warp parallel discrete-event simulator), along with a presentation of results from an OFC simulation. Performance statistics are given comparing the execution time and required memory usage of each logical process for different fossil collection methods.
Conference Paper
Optimistic fossil collection (OFC) is a fully distributed mechanism to reclaim memory from the state and event histories of a time warp simulation. Each fossil collector executes with a logical process (LP) and operates independently of other fossil collectors. Each one examines event arrival times and creates a statistical model of the expected variance from local virtual time (LVT). From this, it is possible to determine the probability that the LP will, in the future, roll back a distance X from LVT. Thus, the fossil collector can examine the time-stamps of items in the state and event histories to find the probability that they will be needed in the future. Comparing this probability against a user-specified risk factor, the fossil collector decides if the item can be marked as a fossil and scavenged. OFC allows for the possibility of simulation failure, so it may be desirable to periodically have complete checkpoints taken and archived during the simulation for a possible restart with a smaller risk factor specified. This method of memory management assumes there is an underlying stationary distribution for the rollback lengths during a time interval t. This is reasonable, since rollback lengths in time warp are relatively constant in length. This assumption can be relaxed for models that operate without an underlying assumption about the distribution of rollback lengths. This paper reviews the design and implementation of two rollback models for OFC. One assumes a geometrically distributed rollback length; the other assumes an arbitrary distribution of rollback lengths with fixed mean and variance.
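The risk-factor test described in this abstract can be sketched as follows. This is a simplified illustration rather than the paper's model: it assumes rollback lengths are integers of at least 1 and geometrically distributed, fits the parameter from the sample mean, and treats an item as needed only if a single future rollback reaches back to it. The function names `fit_geometric`, `prob_rollback_reaches`, and `is_fossil` are hypothetical.

```python
def fit_geometric(rollback_lengths):
    """Estimate p for a geometric distribution on {1, 2, ...} (mean = 1/p)."""
    mean = sum(rollback_lengths) / len(rollback_lengths)
    return 1.0 / mean  # assumes every observed length is >= 1


def prob_rollback_reaches(distance, p):
    """P(rollback length >= distance) under the geometric assumption."""
    if distance <= 1:
        return 1.0
    return (1.0 - p) ** (distance - 1)


def is_fossil(item_time, lvt, p, risk):
    """Mark an item as a fossil when the chance of needing it is below `risk`."""
    return prob_rollback_reaches(lvt - item_time, p) < risk


p = fit_geometric([1, 2, 3, 2])  # sample mean 2, so p = 0.5
old_item = is_fossil(item_time=95, lvt=100, p=p, risk=0.1)
recent_item = is_fossil(item_time=98, lvt=100, p=p, risk=0.1)
```

With p = 0.5, an item five time units behind LVT is needed with probability 0.5^4 = 0.0625, which is below the risk factor of 0.1, so it is scavenged; an item two units behind is needed with probability 0.5 and is retained. A smaller risk factor keeps more history and lowers the chance of an OFC fault.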
Conference Paper
The computation of Global Virtual Time is of fundamental importance in Time Warp based Parallel Discrete Event Simulation Systems. Shared memory multiprocessor architectures can support interprocess communication with much smaller overheads than distributed memory systems. This paper presents a new, completely asynchronous, Gvt algorithm which provides very fast and accurate Gvt estimation with significantly lower overhead than previous approaches. The algorithm presented is able to support more efficient memory management, termination, and other global control mechanisms. The Gvt algorithm described enables any Time Warp entity to compute Gvt at any time without slowing down other entities, in particular, those executing on the critical path. Experimental results are presented for a shared memory Time Warp system that employs a two tiered distributed memory management scheme. The proof of the correctness and the accuracy of the algorithm are also presented. Finally, some suggestions on possible further optimization of the implementation are given.
Conference Paper
We present an algorithm for computing the global virtual time (GVT) in an optimistic parallel discrete event simulation on the distributed memory hypercube architecture. Our algorithm uses only 3N messages and runs in O(log N) time, where N is the number of logical processors (LPs) representing components of the simulation system. It is based on the construction of a spanning binomial tree in the hypercube. In most simulation systems, there is an LP designated for GVT computation, called the GVT manager. Failure of the physical processor running this LP causes the simulation process to stop, and in such a case reorganization of LPs is necessary so that another logical processor takes the role of the GVT manager. In our algorithm, any LP in the system can elect itself to be the GVT manager, and hence such reorganization is not necessary. We show how our algorithm can be used for memory management and hierarchical load balancing in a hypercube machine, and suggest a new technique to handle transient messages.
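To illustrate why a spanning binomial tree gives logarithmic time, the sketch below simulates a minimum reduction over a hypercube with node 0 as the root. It covers only a bottom-up reduction of per-node values; it does not attempt to reproduce the paper's 3N-message algorithm, its manager election, or its handling of transient messages. The function name and the return format are hypothetical.

```python
def hypercube_min_reduce(values):
    """Simulate a binomial-tree min reduction rooted at node 0.

    values[i] is the local value held by hypercube node i.
    Returns (minimum at node 0, number of rounds, messages sent).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of nodes must be a power of two")
    vals = list(values)
    rounds = messages = 0
    step = 1  # the hypercube dimension handled in this round, as a bit mask
    while step < n:
        # Nodes whose lowest set bit is `step` send their partial minimum
        # to the neighbour across that dimension (same id with the bit cleared).
        for node in range(step, n, 2 * step):
            parent = node ^ step
            vals[parent] = min(vals[parent], vals[node])
            messages += 1
        rounds += 1
        step *= 2
    return vals[0], rounds, messages


lvts = [12.0, 9.5, 14.0, 8.25, 10.0, 11.0, 9.0, 13.0]
result = hypercube_min_reduce(lvts)
```

For 8 nodes the reduction finishes in 3 rounds, one per hypercube dimension, using 7 messages. All sends within a round are independent, which is where the O(log N) running time comes from.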
Article
This paper has since been published as: A. Ferscha and J. Luthi. "Estimating Rollback Overhead for Optimism Control in Timewarp". In: Proceedings of the 28
Article
The optimistic approach to parallel discrete event simulation (PDES) has led to a number of algorithms capable of fully exploiting the inherent parallelism of discrete event systems. On the other hand, these parallel algorithms, as well as most implementations of the Time Warp mechanism, were designed to suit a specific parallel architecture and therefore suffer from a lack of portability. This paper proposes the bulk synchronous parallel (BSP) model as a target platform for the design of portable parallel algorithms for optimistic simulation. After an overview of the main directions in PDES, the paper describes the Time Warp mechanism, presenting the most important issues related to optimistic simulation. A class of BSP algorithms for GVT computation is introduced and analysed in terms of the BSP cost model. Then, two BSP algorithms for optimistic PDES are discussed; the first algorithm aims at avoiding recursive rollbacks in aggressive-cancellation Time Warp implementations, while t...
Article
The Time Warp mechanism is a protocol for synchronising a distributed computation of message-passing processes. Since its introduction in 1985 it has received attention specifically as a mechanism for synchronising a distributed discrete event simulation. The Time Warp mechanism is conceptually simple and has many attractive features.
Article
The achievements attained in accelerating the simulation of the dynamics of complex discrete event systems using parallel or distributed multiprocessing environments are comprehensively presented. While parallel discrete event simulation (DES) governs the evolution of the system over simulated time in an iterative SIMD way, distributed DES tries to spatially decompose the event structure underlying the system, and executes event occurrences in spatial subregions by logical processes (LPs) usually assigned to different (physical) processing elements. Synchronization protocols are necessary in this approach to avoid timing inconsistencies and to guarantee the preservation of event causalities across LPs.
Article
Traditionally, parallel discrete event simulators based on the Time Warp synchronization protocol have been implemented using either the shared memory programming model or the distributed memory, message passing programming model. This was because the preferred hardware platform was either a shared memory multiprocessor workstation or a network of uniprocessor workstations. However, the advent of clumps (clusters of multiprocessors) has mandated a change in this dichotomous view. Programming for clumps can be quite novel as the platform allows the implementor to apply both shared memory and distributed memory programming techniques within the same framework. This thesis explores the design and implementation issues involved in exploiting this new platform for Time Warp simulations. Specifically, this thesis presents a few strategies for implementing Time Warp simulators on clumps. In addition, experiences in implementing these strategies on an extant distributed memory, message passing Time Warp simulator (warped) are presented. Performance results comparing the modified clump-specific simulation kernel to the unmodified distributed memory, message passing simulation kernel are also presented.