Conference Paper

Determining the Global Virtual Time in a Distributed Simulation.


... As it is impossible to measure local virtual times at the same wall-clock time on different processors, techniques were needed to make the measurements look like they were taken simultaneously. Most of these techniques are in general based on overlapping intervals [10, 12, 14, 16, 19], two cuts [13, 17], or global reduction [18, 20, 21]. ...
... Still, such an optimization does not completely alleviate the problem. Lin and Lazowska [16] proposed to use sequence numbers to reduce acknowledgment overhead. Messages sent from one processor to another are marked with consecutive, increasing sequence numbers. ...
... First of those is the need for message acknowledgments that increase the network traffic and interfere with the transmission of normal messages. Lin and Lazowska's idea [16] eliminates the necessity of explicit acknowledgments, but at the expense of larger latency of GVT computation and of complex data structures to store messages. Second such property is the use of any vector whose size is linear with the number of processors. ...
Article
Full-text available
This paper presents a new Global Virtual Time (GVT) algorithm, called TQ-GVT, that is at the heart of a new high performance Time Warp simulator designed for large-scale clusters. Starting with a survey of numerous existing GVT algorithms, the paper discusses how other GVT solutions, especially Mattern's GVT algorithm, influenced the design of TQ-GVT, as well as how it avoided several types of overheads that arise in clusters executing parallel discrete simulations. The algorithm is presented in detail, with a proof of its correctness. Its effectiveness is then verified by experimental results obtained on more than 1,000 processors for two applications, one synthetic workload and the other a spiking neuron network simulation.
... Some of these algorithms (e.g., [15], [18]) acknowledge individual messages, reducing the time interval along which a message can result as still in-transit. Other approaches (e.g., [16]) acknowledge batches of messages reducing the network overhead, but stretching the interval of time along which a message still results in-transit (although being potentially already processed at the destination). This, in its turn, leads to worsening the approximation provided by the algorithms on the actual GVT, given that "obsolete" timestamps might be still considered in the global reduction while computing the new GVT value. ...
... The y axis is logarithmic. Three different GVT reduction algorithms are compared: ORCHESTRA, an asynchronous algorithm where the local computation is protected by critical sections, according to the Fujimoto & Hybinette algorithm in [9] (referred to as F&H in the plot), and an acknowledgement-based reduction based on the work in [16] (referred to as Ack in the plot). ...
Article
Cities present new challenges and needs that must be met to improve the lifestyle of their citizens under the “Smart City” concept. Achieving this goal globally requires new technologies such as robotics, but public entities are unaware of the possibilities this technology offers for meeting their needs. This paper explains the development of Innovative Public Procurement instruments, specifically the PDTI (Public end Users Driven Technological Innovation) process, as a driving force of robotic research and development, and offers a list of robotic urban challenges proposed by European cities that have participated in such a process. In the next phases of the procedure, this will provide novel robotic solutions addressed to public demand that can serve as an example for other Smart Cities.
... As for the first category, several proposals have been based on explicit message acknowledgment schemes [12], [13], [14] in order to determine which messages (or antimessages) are still in transit and which processes are responsible for keeping into account the timestamps of in-transit messages while computing the new GVT value. Some of these algorithms (see, e.g., [12], [14]) opt for acknowledging individual messages, which reduces the time interval along which a message can result as still in-transit. ...
... Some of these algorithms (see, e.g., [12], [14]) opt for acknowledging individual messages, which reduces the time interval along which a message can result as still in-transit. On the other hand, other approaches (see, e.g., [13]) opt for acknowledging batches of messages (rather than individual ones) which allows for reducing the overhead due to acknowledgment messages, but stretches the interval of time along which a message still results in-transit (although being potentially already processed at the destination). This, in its turn, leads to worsening the approximation provided by the algorithms on the actual GVT, given that "obsolete" timestamps might be still considered in the global reduction while computing the new GVT value. ...
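The trade-off described in these excerpts can be illustrated with a minimal Python sketch. It is not taken from any of the cited algorithms: the `Process` class, its fields, and the message identifiers are hypothetical, and the reduction is assumed to be a plain minimum over each process's local virtual time and the timestamps of its unacknowledged messages. The sketch only shows why a delayed (batched) acknowledgment lets an "obsolete" timestamp hold the estimate down.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One simulation process (illustrative names, not from the cited papers)."""
    lvt: float                                   # local virtual time
    unacked: dict = field(default_factory=dict)  # message id -> timestamp, no ack yet

    def send(self, msg_id, timestamp):
        # The sender stays responsible for the timestamp until an ack arrives.
        self.unacked[msg_id] = timestamp

    def ack(self, msg_ids):
        # An acknowledgment may cover a single message or a whole batch.
        for msg_id in msg_ids:
            self.unacked.pop(msg_id, None)

    def local_minimum(self):
        # Smallest timestamp this process contributes to the reduction.
        return min([self.lvt, *self.unacked.values()])


def estimate_gvt(processes):
    """Global reduction: the minimum over all local minima."""
    return min(p.local_minimum() for p in processes)


p = Process(lvt=50.0)
q = Process(lvt=40.0)
p.send("m1", 12.0)
p.send("m2", 30.0)
stale = estimate_gvt([p, q])   # acks pending: m1 still counts as in transit
p.ack(["m1", "m2"])            # the batched acknowledgment arrives
fresh = estimate_gvt([p, q])
print(stale, fresh)
```

Until the batch is acknowledged, the estimate stays at the timestamp of `m1` even if the receiver has already processed it. Once the acknowledgment arrives, the estimate rises to the smaller of the two local virtual times.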
Conference Paper
Full-text available
Global Virtual Time (GVT) is a powerful abstraction used to discriminate what events belong (and what do not belong) to the past history of a parallel/distributed computation. For high performance simulation systems based on the Time Warp synchronization protocol, where concurrent simulation objects are allowed to process their events speculatively and causal consistency is achieved via rollback/recovery techniques, GVT is used to determine which portion of the simulation can be considered as committed. Hence it is the base for actuating memory recovery (e.g. of obsolete logs that were taken in order to support state recoverability) and non-revocable operations (e.g. I/O). For shared memory implementations of simulation platforms based on the Time Warp protocol, the reference GVT algorithm is the one presented by Fujimoto and Hybinette [1]. However, this algorithm relies on critical sections that make it non-wait-free, and which can hamper scalability. In this article we present a wait-free shared memory GVT algorithm that requires no critical section. Rather, correct coordination across the processes while computing the GVT value is achieved via atomic memory operations, namely compare-and-swap. The price paid by our proposal is an increase in the number of GVT computation phases, as opposed to the single phase required by the proposal in [1]. However, as we show via the results of an experimental study, the wait-free nature of the phases carried out in our GVT algorithm pays off in reducing the actual cost incurred by the proposal in [1].
... In Time Warp optimistic simulation, fossil reclamation through GVT estimation is a prominent and well proven technique [37]. There are several GVT algorithms proposed in the literature [5,22,33,41,46,79]. These algorithms either measure the rate of virtual time progress [22], or identify consistent snapshots [46], or keep track of the peak and valley messages [41]. ...
... There are several GVT algorithms proposed in the literature [5,22,33,41,46,79]. These algorithms either measure the rate of virtual time progress [22], or identify consistent snapshots [46], or keep track of the peak and valley messages [41]. GVT computation and memory consumption are directly related, and a trade-off has to be reached between the frequency of GVT computations and the memory overhead experienced by simulation objects [15,16]. ...
... GVT calculation is an interesting problem both from a theoretical and a practical point of view, so the literature on the subject is extensive. There are two main difficulties when calculating a GVT value, as pointed out e.g. by Lin and Lazowska [108] or Fujimoto [74]: ...
... The need for message acknowledgements was eliminated in the algorithm proposed by Lin and Lazowska [108]. Essentially, the set of transient messages is determined actively. ...
... In Time Warp optimistic simulation, fossil reclamation through GVT estimation is a prominent and well proven technique (Jefferson 1985). There are several GVT algorithms proposed in the literature (Lin and Lazowska 1990; Bellenot 1990; D'Souza, Fan, and Wilsey 1994; Mattern 1993; Fujimoto and Hybinette 1994; Tomlinson and Garg 1993). These algorithms either measure the rate of virtual time progress (D'Souza, Fan, and Wilsey 1994) or identify consistent snapshots (Mattern 1993) or keep track of the peak and valley messages (Lin and Lazowska 1990). ...
... There are several GVT algorithms proposed in the literature (Lin and Lazowska 1990; Bellenot 1990; D'Souza, Fan, and Wilsey 1994; Mattern 1993; Fujimoto and Hybinette 1994; Tomlinson and Garg 1993). These algorithms either measure the rate of virtual time progress (D'Souza, Fan, and Wilsey 1994) or identify consistent snapshots (Mattern 1993) or keep track of the peak and valley messages (Lin and Lazowska 1990). On the other hand, Optimistic Fossil Collection (OFC) designed by Young (Young, Abu-Ghazaleh, and Wilsey 1998) identifies fossils by predicting future rollback lengths based on the past rollback behavior. ...
Article
This paper presents a time warp fossil collection mechanism that functions without need for a GVT estimation algorithm. Effectively, each logical process (LP) collects causality information during normal event execution and then utilizes this information to identify fossils. In this mechanism, LPs use constant size vectors (that are independent of the total number of parallel simulation objects) as timestamps, called Plausible Total Clocks, to disseminate causality information. For proper operation, this mechanism requires that the communication layer preserve a FIFO ordering on messages. A detailed description of this new fossil collection mechanism and its proof of correctness are presented in this paper.
... (In such an environment, the transient messages in the system cannot be directly accessed.) In most approaches [3, 18, 27, 34], the task of finding GVT involves all the processes in the system. One of the processes, called the coordinator, is assigned to initiate the task. ...
... In this approach, when process q receives a data message from p, it needs to send an acknowledgement back to p. We have proposed a GVT algorithm [18] that does not require acknowledgements for data messages. Thus, this algorithm can eliminate about 50% of the message sending during the simulation. ...
Article
Simulation is a powerful tool for studying the dynamics of a system. However, simulation is time-consuming. Thus, it is natural to attempt to use multiple processors to speed up the simulation process. Many protocols have been proposed to perform discrete event simulation in multi-processor environments. Most of these distributed discrete event simulation protocols are either conservative or optimistic. The most common optimistic distributed simulation protocol is called Time Warp. Several issues must be considered when designing a Time Warp simulation; examples are reducing the state saving overhead and designing the global control mechanism (i.e., global virtual time computation, memory management, distributed termination, and fault tolerance). This paper addresses these issues. We propose a heuristic to select the checkpoint interval to reduce the state saving overhead, generalize a previously proposed global virtual time computation algorithm, and present new algorithms for memory management, distributed termination, and fault tolerance. The main contribution of this paper is to provide guidelines for designing an efficient Time Warp simulation.
... We need one saved state with a time not greater than GVT in case a rollback to GVT occurs. Many algorithms for efficiently computing or estimating GVT have been proposed, see, e.g., [8,10,26,43,47,58,91,104,129,138,141,142,159]. ...
... Numerous algorithms for GVT computation or estimation have been proposed, including algorithms employing a central controller [10,43,91,129], distributed GVT algorithms [104], and algorithms specifically for shared-memory multiprocessors [58,159]. Note that GVT only tells the set of committed events, and therefore the performance of the GVT algorithm does ...
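The rule quoted above, that one saved state with a time not greater than GVT must survive in case a rollback to GVT occurs, can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `fossil_collect` and the representation of saved states as a sorted list of virtual times are assumptions, not part of any cited simulator.

```python
from bisect import bisect_right


def fossil_collect(saved_times, gvt):
    """Return the saved-state times to keep, given sorted times and a GVT estimate.

    Keeps the newest state whose time is <= gvt (needed if a rollback to GVT
    occurs) together with every later state. Everything older is a fossil.
    """
    idx = bisect_right(saved_times, gvt)   # count of states with time <= gvt
    keep_from = max(idx - 1, 0)            # retain the newest of those states
    return saved_times[keep_from:]


kept = fossil_collect([0, 10, 20, 30, 40], gvt=25)
print(kept)
```

With a GVT estimate of 25, the states saved at times 0 and 10 can be reclaimed, while the state at time 20 is retained as the restart point for any rollback to GVT.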
Article
Thesis (Ph. D.)--Nanyang Technological University, School of Computer Engineering, 2003.
... For example, a scheme that acknowledges individual messages doubles the number of messages, which decreases the performance of simulations. Although some methods, such as acknowledging batches of messages [30] and piggyback acknowledgement [21,24], were devised to reduce the acknowledgement overhead, the acknowledgement scheme becomes complex. ...
Article
Full-text available
Global Virtual Time computation of Parallel Discrete Event Simulation is crucial for conducting fossil collection and detecting the termination of simulation. The triggering condition of GVT computation in typical approaches is generally based on the wall-clock time or logical time intervals. However, the GVT value depends on the timestamps of events rather than the wall-clock time or logical time intervals. Therefore, it is difficult for the existing approaches to select appropriate time intervals to compute the GVT value. In this study, we propose a scalable GVT estimation algorithm based on Lower Bound of Event-Bulk-Time, which triggers the computation of the GVT value according to the number of processed events. In order to calculate the number of transient messages, our algorithm employs Event-Bulk to record the messages sent and received by Logical Processes. To eliminate the performance bottleneck, we adopt an overlapping computation approach to distribute the workload of GVT computation to all worker-threads. We compare our algorithm with the fast asynchronous GVT algorithm using PHOLD benchmark on the shared memory machine. Experimental results indicate that our algorithm has a light overhead and shows higher speedup and accuracy of GVT computation than the fast asynchronous GVT algorithm.
... As is typical with optimistic synchronisation, garbage collection is utilised to free memory used for keeping historical state information. Several algorithms have been proposed to get a snapshot of the system and determine the removable states [11,12,13,14]. PDES-MAS implements an adaptation of Mattern's GVT algorithm [12]. ...
Article
Full-text available
Multi-agent systems (MAS) are increasingly being acknowledged as a modelling paradigm for capturing the dynamics of complex systems in a wide range of domains, from system biology to adaptive socio-technical system of systems. The execution of such MAS simulations on parallel machines is a challenging problem due to their dynamic, non-deterministic, data-centric behaviour and nature. These problems are exacerbated as the scale of such MAS models increases. PDES-MAS is a distributed simulation kernel developed specifically to support MAS models addressing the problems of partitioning, load balancing and interest management in an integrated, transparent and adaptive manner. This paper presents an overview of PDES-MAS and for the first time it provides a quantitative evaluation of the system.
... Time Warp [12] is an optimistic synchronization protocol that uses run-time detection of errors caused by out-of-order execution of portions of a parallel computation, and recovery using a rollback mechanism [8]. The main advantage of the Time Warp protocol is that it offers the potential for greater exploitation of parallelism and, perhaps more importantly, greater transparency of the synchronization mechanism to the simulation programmer [13]. ...
Conference Paper
Full-text available
One of the most common optimistic synchronization protocols for parallel simulation is the Time Warp algorithm proposed by Jefferson [12]. The Time Warp algorithm is based on the virtual time paradigm, which has the potential for greater exploitation of parallelism and, perhaps more importantly, greater transparency of the synchronization mechanism to the simulation programmer. It is widely believed that the optimistic Time Warp algorithm suffers from large memory consumption due to frequent rollbacks. In order to achieve optimal memory management, the Time Warp algorithm needs to periodically reclaim memory. In order to determine which event-messages have been committed and which portion of memory can be reclaimed, the computation of global virtual time (GVT) is essential. Mattern [2] uses a distributed snapshot algorithm to approximate GVT which does not rely on first in first out (FIFO) channels. Specifically, it uses a ring structure to establish cuts C1 and C2 to calculate the GVT for distinguishing between the safe and unsafe event-messages. Although the distributed snapshot algorithm provides a straightforward way of computing GVT, more efficient solutions for message acknowledging and delaying of sending event messages while awaiting control messages are desired. This paper studies the memory requirement and time complexity of GVT computation. The main objective of this paper is to apply a matrix to Mattern's original GVT algorithm to speed up GVT computation while at the same time reducing the memory requirement. Our analysis shows that the use of a matrix in GVT computation improves the overall performance in terms of memory saving and latency.
... A collection of optimizations to time warp is provided in [10]. The technical report describing time warp [15] does not solve the problem of determining global virtual time, however an efficient algorithm for the determination of global virtual time is presented in [19]. ...
... Lin and Lazowska [72] propose a new data structure that reduces the frequency of acknowledgment messages. The work done by Tomlinson and Garg [66] uses counters to detect transient messages. ...
... Collections can be done in the reverse order of broadcasts. Lin and Lazowska [8] proposed an algorithm which is similar to Samadi's algorithm but which does not use an acknowledgment message for every single message. Rather, every message carries a sequence number and when a process gets a certain control message it sends to every neighboring process the smallest sequence number which is missing from that process. ...
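The sequence-number idea in these excerpts can be sketched roughly as follows. This is a simplified Python illustration, not the published data structure: the `Channel` class, the `in_transit` helper, and the choice to start sequence numbers at 0 are all assumptions. The sender here conservatively treats every message from the first gap onward as possibly in transit.

```python
class Channel:
    """Receiver-side view of one sender (illustrative, not the published structure)."""

    def __init__(self):
        self.received = set()   # out-of-order sequence numbers seen so far
        self.next_missing = 0   # smallest sequence number not yet received

    def deliver(self, seq):
        self.received.add(seq)
        # Advance past any contiguous prefix that has now arrived.
        while self.next_missing in self.received:
            self.received.discard(self.next_missing)
            self.next_missing += 1

    def report(self):
        # Returned in reply to a control message instead of per-message acks.
        return self.next_missing


def in_transit(sent_timestamps, missing_seq):
    """Sender side: messages numbered >= missing_seq may still be in transit."""
    return sent_timestamps[missing_seq:]


ch = Channel()
for seq in (0, 1, 3):           # message 2 has not arrived yet
    ch.deliver(seq)

sent = [5.0, 7.0, 9.0, 11.0]    # timestamps of messages 0..3, indexed by sequence number
pending = in_transit(sent, ch.report())
print(ch.report(), pending)
```

A single reply carrying the smallest missing sequence number replaces one acknowledgment per message. The cost is that message 3, although already delivered, is still counted as pending until the gap at 2 closes.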
Article
Full-text available
Parallel and Distributed Simulation (PADS) algorithms are typically categorized as belonging to one of two categories. They are either conservative or optimistic with respect to the method of handling causality. Conservative systems strictly preserve causality, while optimistic systems detect and correct causality errors when they occur. Time Warp is the basis of optimistic algorithms, where rolling back the simulation clock allows the simulation to correct for errors. The Global Virtual Time (GVT) is the variable that maintains information about simulation progress, termination decision, and for committing input/output data. In this paper the basis for an environment for visualizing distributed simulations with time warp on a network of UNIX workstations is presented. The visualization environment provides a graphical overview of simulation processes, and provides insight into algorithm performance. Extensions to the visualizations are also possible for animation of simulation results.
... The process of identifying and reclaiming this space is called fossil collection. The global time against which fossil collection algorithms operate is called the global virtual time (or GVT) and several algorithms for GVT estimation have been proposed [3,4,5,6,7,8]. In addition to its use for fossil collection, GVT is also useful for deciding when irrevocable operations (such as I/O) can be performed and, in some instances, when the simulation has completed. ...
... This space can be freed only when global progress of the simulation advances beyond the (simulation) time at which the saved information is needed. The process of identifying and reclaiming this space is called fossil collection. The global time against which fossil collection algorithms operate is called the global virtual time (or GVT) and several algorithms for GVT estimation have been proposed [1,6,7,15,19,24,26]. In addition to its use for fossil collection, GVT is also useful for deciding when irrevocable operations (such as I/O) can be performed and, in some instances, when the simulation has completed. ...
Conference Paper
The time warp mechanism is a technique for optimistically synchronizing Parallel and distributed Discrete Event-driven Simulators (PDES). Within this synchronization paradigm lie numerous parallel algorithms, chief among them being an estimation of the Global Virtual Time (GVT) value for fossil collection and output commit. Because the optimistic synchronization strategy allows for temporary violations of causal relations in the system being simulated, developing algorithms that correctly estimate GVT can prove extremely difficult. Testing and debugging can also prove difficult as error situations are frequently not repeatable due to varying load conditions and processing orders. Consequently, the application of formal methods to develop and analyze such algorithms is of extreme importance. This paper addresses the application of formal methods for the development of GVT estimation algorithms. More precisely, the paper presents a formal specification for and verification of one specific GVT estimation algorithm, the pGVT algorithm. The specifications are presented in the Larch Shared Language and verification completed using the Larch Proof Assistant. The ultimate goal of this work is to develop a reusable infrastructure for GVT proof development that can be used by developers of new GVT estimation algorithms.
... However, instantaneous values of GVT are impossible to compute in a distributed system. Hence, several methods to accurately estimate GVT have been proposed in the literature [1, 3, 12, 14]. Estimates of GVT are then used for termination detection, memory management, error handling, and committing input/output operations. ...
Conference Paper
Parallel and distributed software systems are representative of large scale critical and complex systems that require the application of formal methods. Parallel and distributed software systems are notoriously unreliable because implementors often design and develop such systems without a complete understanding of the problem domain; in addition, the nondeterministic nature of certain parallel and distributed systems makes system validation difficult if not impossible. In this paper, the application of formal specification and verification to a class of parallel and distributed software systems is presented. Specifically, the prototype verification system (PVS) is applied to the specification and verification of the time warp protocol, a parallel optimistic discrete event simulation algorithm. The paper discusses how the specification of the time warp protocol can be mechanized within a general-purpose higher-order logic framework like PVS. In addition, the paper presents the extensibility of the specification to address and verify different aspects and optimizations of the basic time warp protocol.
... Improvements to this algorithm have been proposed by Bellenot [1] which reduce the complexity of message passing by organizing PEs into trees rather than a ring. Lin and Lazowska [17] propose a new data structure that reduces the frequency of acknowledgment messages. The work done by Tomlinson and Garg [24] uses counters to detect transient messages. ...
Conference Paper
Full-text available
In this paper we introduce a new concept, network atomic operations (NAOs) to create a zero-cost consistent cut. Using NAOs, we define a wall-clock-time driven GVT algorithm called Seven O'Clock that is an extension of Fujimoto's shared memory GVT algorithm. Using this new GVT algorithm, we report good optimistic parallel performance on a cluster of state-of-the-art Itanium-II quad processor systems for both benchmark applications such as PHOLD and real-world applications such as a large-scale TCP/Internet model. In some cases, super-linear speedup is observed.
... The process of identifying and reclaiming this space is called fossil collection. The global time against which fossil collection algorithms operate is called the Global Virtual Time (or GVT) and several algorithms for GVT estimation have been proposed [5] [9] [16] [19]. In addition to its use for fossil collection, GVT is also useful for deciding when irrevocable operations can be performed and, in some instances, when the simulation has completed. ...
Conference Paper
Several optimizations to the Time Warp synchronization protocol for parallel discrete event simulation have been proposed and studied. Many of these optimizations have included some form of dynamic adjustment (or control) of the operating parameters of the simulation (e.g., checkpoint interval, cancellation strategy). Traditionally dynamic parameter adjustment has been performed at the simulation object level; each simulation object collects measures of its operating behaviors (e.g., rollback frequency, rollback length, etc.) and uses them to adjust its operating parameters. The performance data collection functions and parameter adjustment are overhead costs that are incurred in the expectation of higher throughput. This paper presents a method of eliminating some of these overheads through the use of an external object to adjust the control parameters. That is, instead of inserting code for adjusting simulation parameters in the simulation object, an external control object is defined to periodically analyze each simulation object's performance data and revise that object's operating parameters. An implementation of an external control object in the WARPED Time Warp simulation kernel has been completed. The simulation parameters updated by the implemented control system are: checkpoint interval, and cancellation strategy (lazy or aggressive). A comparative analysis of three test cases shows that the external control mechanism provides speedups between 5%-17% over the best performing embedded dynamic adjustment algorithms.
... Determination of Global Virtual Time should be done as defined by [22]. This algorithm allows Global Virtual Time to be determined in a message-passing environment as opposed to the easier case of a shared memory environment. ...
Article
Full-text available
There is a trend toward the use of predictive systems in communications networks. At the systems and network management level predictive capabilities are focused on anticipating network faults and performance degradation. Simultaneously, mobile communication networks are being developed with predictive location and tracking mechanisms. The interactions and synergies between these systems present a new set of problems. A new predictive network management framework is developed and examined. The interaction between a predictive mobile network and the proposed network management system is discussed. The Rapidly Deployable Radio Network (RDRN) is used as a specific example to illustrate these interactions.
... GVT computation is especially likely to overwhelm forward execution time in less than ideal simulations. This is probably why there are so many algorithms for GVT calculation in the literature [32,33,46,35]. ...
Article
Printout. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2005. Vita. Includes bibliographical references (leaves 144-150)
... Several algorithms have been proposed in the literature for computing GVT. Software-based GVT schemes either synchronize all processors and take a global snapshot, or compute GVT concurrently with the simulation (e.g., see [24, 2, 19, 18]). Schemes utilizing a global snapshot entail an unacceptable amount of overhead for our purposes because we require GVT to be performed very frequently, possibly as often as after each event. ...
Article
This paper describes issues concerning the design of an optimistic parallel discrete event simulation system that executes in environments that impose real-time constraints on the simulator's execution. Two key problems must be addressed by such a system. First the timing characteristics of the parallel simulator must be sufficiently predictable to allow one to guarantee that real-time deadlines for completing simulation computations will be met. Second, the optimistic computation must be able to interact with its surrounding environment with as little latency as possible, necessitating rapid commitment of I/O operations. To address the first question, we show that optimistic simulators that never send incorrect messages (sometimes called “aggressive-no-risk” simulators) provide sufficient predictability to allow traditional schedulability analysis techniques commonly used in real-time systems to be applied. We show that incremental state saving techniques introduce sufficient unpredictability that they are not well-suited for real-time environments. We observe that the traditional “lowest timestamp first” scheduling policy used in many optimistic parallel simulation systems is an optimal (in the real-time sense) scheduling algorithm when event timestamps and real-time deadlines are the same. Finally, to address the question for rapid commitment of I/O operations, we utilize a continuous GVT computation scheme for shared-memory multiprocessors where a new value of GVT is computed after processing each event in the simulation. These ideas are incorporated in a parallel, optimistic, real-time simulation system called PORTS. Initial performance measurements of the shared-memory based PORTS system executing on a Kendall Square Research multiprocessor are presented. Initial performance results are encouraging, demonstrating that PORTS achieves performance approaching that of a conventional Time Warp system for the benchmark programs that were tested.
Article
This is Part 2 of a trio of papers intended to provide a unifying framework in which conservative and optimistic synchronization for parallel discrete event simulations (PDES) can be freely and transparently combined in the same logical process (LP) on an event-by-event basis. In this paper we continue the outline of an approach called Unified Virtual Time (UVT) that was introduced in Part 1, and show in detail via two extended examples how conservative synchronization can be refactored and combined with optimistic synchronization in the UVT framework. We describe UVT versions of both a basic time windowing algorithm called USTW (Unified Simple Time Windows) and a refactored version of the Chandy-Misra-Bryant Null Message algorithm called UCMB (Unified CMB).
Article
Algorithms for synchronization of parallel discrete event simulation have historically been divided between conservative methods that require lookahead but not rollback, and optimistic methods that require rollback but not lookahead. In this paper we present a new approach in the form of a framework called Unified Virtual Time (UVT) that unifies the two approaches, combining the advantages of both within a single synchronization theory. Whenever timely lookahead information is available, a logical process (LP) executes conservatively using an irreversible event handler. When lookahead information is not available the LP does not block, as it would in a classical conservative execution, but instead executes optimistically using a reversible event handler. The switch from conservative to optimistic synchronization and back is decided on an event-by-event basis by the simulator, transparently to the model code. UVT treats conservative synchronization algorithms as optional accelerators for an underlying optimistic synchronization algorithm, enabling the speed of conservative execution whenever it is applicable, but otherwise falling back on the generality of optimistic execution. We describe UVT in a novel way, based on fundamental invariants, monotonicity requirements, and synchronization rules. UVT permits zero-delay messages and pays careful attention to tie-handling using superposition. We prove that under fairly general conditions a UVT simulation always makes progress in virtual time. This is Part 1 of a trio of papers describing the UVT framework for PDES, mixing conservative and optimistic synchronization and integrating throttling control.
Article
One way to reduce the time spent simulating VHDL designs is to parallelize the simulation. In this paper, we describe the implementation of an object-oriented Time Warp simulator for VHDL on an actor based environment. The actor model of computation allows the exploitation of fine grained parallelism in a truly asynchronous manner and allows for the overlap of computation with communication. Some preliminary results obtained by simulating a set of multipliers and some ISCAS benchmark circuits are provided. In addition, the importance of placing processes based on circuit partitioning techniques for improving runtimes and scalability is demonstrated. Results are reported on a Sun SPARCServer 1000 and an Intel Paragon.
Patent
A method for managing data between a virtual machine and a bus controller includes transmitting an input output (IO) request from the virtual machine to a service virtual machine that owns the bus controller. According to an alternate embodiment, managing data between a virtual machine and a bus controller includes trapping a register access made by the virtual machine. A schedule is generated to be implemented by the bus controller. Status is returned to the virtual machine via a virtual host controller. Other embodiments are described and claimed.
Article
Global Virtual Time (GVT) computation is a key determinant of the efficiency and runtime dynamics of Parallel Discrete Event Simulations (PDES), especially on large-scale parallel platforms. Here, three execution modes of a generalized GVT computation algorithm are studied on high-performance parallel computing systems: (1) a synchronous GVT algorithm that affords ease of implementation, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm to exploit one-sided communication in extant supercomputing platforms. Performance results are presented of implementations of these algorithms on up to 216,000 cores of a Cray XT5 system, exercised on a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and nonsynthetic applications, and different lookahead values. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly coupled discrete event dynamics on massively parallel platforms.
Article
An introduction to the field of Parallel and Distributed Simulation (PADS) is given. The capabilities and limitations of currently used PADS techniques are discussed. A review of the recently developed hybrid and adaptive PADS techniques is also given. Sample performance results of some PADS techniques are presented using a network of workstations.
Article
It has been established elsewhere (Reynolds, Int. J. Comput. Simulation, to appear) that hardware to support parallel discrete event simulations (PDES) is desirable. We describe the steps leading to the implementation of a hardware-based framework to support PDES. We begin with an exploration of the criteria that must be met to make such a framework both practical and useful, concluding that maintenance of sequential consistency is sufficient, while "observable" sequential consistency is more desirable but difficult to attain. We derive a functional design based on these criteria, and from that derive a prototype design. Also, we establish the utility of our design, showing that computation of critical global values, such as global virtual time, can be done in times at least two orders of magnitude faster than typical event times in discrete event simulations.
Article
The two main approaches to parallel discrete event simulation – conservative and optimistic – are likely to encounter some limitations when the size and complexity of the simulation system increases. For such large scale simulations, the conservative approach appears to be limited by blocking overhead and sensitivity to lookahead, whereas the optimistic approach may become prone to cascading rollbacks, state saving overhead, and demands for larger memory space. These drawbacks restrict the synchronization schemes based on each of the two approaches from scaling up. A combined approach may resolve these limitations, while preserving and utilizing potential advantages of each method. However, the schemes proposed so far integrate the two views at the same level, i.e. local to a logical process, and hence may not be able to fully solve the problems. In this paper we propose the Local Time Warp method for parallel discrete-event simulation and present a novel synchronization scheme for it called HCTW. The new scheme hierarchically combines a Conservative Time Window algorithm with Time Warp and aims at reducing cascade rollbacks, sensitivity to lookahead, and the scalability problems. Local Time Warp is believed to be suitable for parallel machines equipped with thousands of processors and thus an appropriate candidate for simulation of large and complex systems.
Article
The paper gives an introduction to parallel discrete event simulation (PDES), emphasising the classification of the research done in the area. Such a classification includes a time line presentation of different streams of thought in the field and research from different angles. In this respect four synchronisation streams including conservative, optimistic, hybrid and adaptive protocols are reviewed. Then performance related issues in PDES are summarised, in which topics such as synchronisation overheads, lookahead, time parallelism, performance modelling, and load balancing are further emphasised.
Conference Paper
A monitoring circuit for individual photovoltaic (PV) panels in grid-connected systems is proposed, which exhibits a number of features devised to simplify and reduce cost of diagnostics and maintenance of the PV plant. In particular, the system is provided with an effective energy harvesting supply stage, which eliminates the requirement for external supply or batteries; furthermore, no cables are needed for data transfer due to the adoption of a rugged wireless connectivity.
Article
This paper studies the space complexity of an optimistic parallel simulation protocol called Time Warp. We evaluate four Time Warp memory management algorithms: fossil collection, message sendback, cancelback and artificial rollback. We identify two criteria in designing Time Warp memory management algorithms. Criterion 1 tests if a memory management algorithm ensures that Time Warp simulation always stops (either completes or terminates when memory is exhausted). If an algorithm does not satisfy this criterion, then the simulation may be trapped in an infinite loop. Criterion 2 tests if a memory management algorithm is independent of processor parameters (e.g., number of processors available for the parallel simulation, processor speed and interprocessor communication costs). We show that if an algorithm satisfies this second criterion, then the amount of memory consumed by Time Warp simulation is bounded by the amount consumed by sequential simulation. For algorithms that do not have full control of uncommitted objects (e.g., fossil collection and message sendback), Criterion 2 is not satisfied in general. For algorithms that have full control of uncommitted objects (e.g., cancelback and artificial rollback), special treatments are necessary to satisfy Criterion 1 (i.e., to ensure that the algorithms do not cancel future objects such that global virtual time never advances).
Conference Paper
In the field of distributed discrete event simulation we introduced the split queue time warp algorithm, which is a generalization of the well known time warp algorithm. The main feature is allowing lazy message reception and so the rollback frequency may be reduced. This paper describes a method for global virtual time approximation during a simulation run respecting the specific structure of such a system. This will keep the interference with the underlying simulation at a very low degree.
Conference Paper
Parallel discrete event simulations (PDES) encompass a broad range of analytical simulations. Their utility lies in their ability to model a system and provide information about its behavior in a timely manner. Current PDES methods provide limited performance improvements over sequential simulation. Many logical models for applications have fine granularity making them challenging to parallelize. In POSE, we examine the overhead required for optimistically synchronizing events. We have designed an object model based on the concept of visualization and new adaptive optimistic methods to improve the performance of fine-grained PDES applications. These novel approaches exploit the speculative nature of optimistic protocols to improve single-processor parallel over sequential performance and achieve scalability for previously hard-to-parallelize fine-grained simulations.
Conference Paper
Historically, large-scale low-lookahead parallel simulation has been a difficult problem. As a solution, we have designed a global synchronization unit (GSU) that would reside centrally on a multi-core chip and asynchronously compute the lower bound on time stamps (LBTS), the minimum timestamp of all unprocessed events in the simulation, on demand to synchronize conservative parallel simulators. Our GSU also accounts for transient messages, messages that have been sent but not yet processed by their recipient, eliminating the need for the simulator to acknowledge received messages. In this paper we analyze the sensitivity of simulation performance to the time required to access the GSU. The sensitivity analysis revealed that with GSU access times as high as hundreds of cycles, there was still a significant performance advantage over the baseline shared-memory implementation.
Article
Techniques are proposed for computing a global virtual time (GVT), which is the minimum local virtual time of processes in Time Warp. The algorithm computes a conservative estimate of GVT using an approach which is considerably simpler than previous algorithms for computing GVT. This algorithm does not require a global synchronization of processors. An inherent problem in GVT computation relates to handling messages in transit. Several alternatives are proposed for solving the transient message problem. The algorithm is suitable for distributed shared memory machines such as the BBN Butterfly and message passing machines with a variety of interconnection networks.
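As a rough illustration of the definition in this abstract, the following minimal Python sketch takes GVT as the minimum over the local virtual times and the timestamps of messages still in transit. It is not the algorithm proposed in the paper: it assumes a consistent snapshot of all values is already available, which is precisely what a real GVT algorithm has to arrange. The function name `estimate_gvt` and its parameters are hypothetical.

```python
from typing import Iterable


def estimate_gvt(local_virtual_times: Iterable[float],
                 in_transit_timestamps: Iterable[float] = ()) -> float:
    """Return a GVT estimate from an (assumed consistent) snapshot.

    Assumption: both collections were observed at the same instant.
    Obtaining such a snapshot without global synchronization is the
    hard part and is not modeled here.
    """
    candidates = list(local_virtual_times) + list(in_transit_timestamps)
    if not candidates:
        raise ValueError("at least one local virtual time is required")
    # A message in transit may roll its receiver back to the message's
    # timestamp, so it must bound the estimate as well.
    return min(candidates)


if __name__ == "__main__":
    lvts = [12.0, 9.5, 15.0]
    print(estimate_gvt(lvts))          # transient messages ignored
    print(estimate_gvt(lvts, [7.0]))   # transient message lowers the bound
```

Ignoring the message with timestamp 7.0 would overestimate GVT, which is why transient messages need explicit handling.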
Article
Time Warp is an optimistic synchronization protocol used for parallel discrete event simulation. While Time Warp has the potential to reduce the execution time of large simulations, it has been plagued by a variety of problems, namely: 1. Instability due to thrashing effects caused by echoing and cascading rollbacks. 2. Memory bottlenecks due to state saving and excessive optimism. 3. Inefficient scheduling algorithms for scheduling Time Warp processes on each processing node. These problems have inhibited the widespread use of Time Warp as a general purpose synchronization algorithm. The general trend of researchers attempting to solve these problems has been to statically limit the optimism of Time Warp. Unfortunately, these attempts have achieved only limited success. This is because a static set of parameters may perform well for one simulation but not for another. This paper attacks the problem using adaptive mechanisms to control optimism, using an index of performance called useful work. This research presents solutions for the above mentioned problems, by: 1. Stabilizing Time Warp using adaptive bounded time windows. 2. Reducing memory usage and overall execution time by using an adaptive mechanism to vary the checkpoint interval. 3. Scheduling Time Warp processes with the useful work parameter to favor more productive processes. Using this new performance index called Useful Work, several modifications to Time Warp are implemented to stabilize and improve Time Warp. Thus, this new improved Time Warp synchronization mechanism termed Parameterized Time Warp provides an integrated adaptive solution to optimistic Parallel Discrete Event Simulation. Empirical work showing that PTW outperforms an equivalent Time Warp simulation executing under similar partitioning and load conditions is also presented.
Article
Global virtual time (GVT) is used in the Time Warp synchronization mechanism to perform irrevocable operations such as I/O and to reclaim storage. Most existing algorithms for computing GVT assume a message-passing programming model. Here, GVT computation is examined in the context of a shared-memory model. We observe that computation of GVT is much simpler in shared-memory multiprocessors because these machines normally guarantee that no two processors will observe a set of memory operations as occurring in different orders. Exploiting this fact, we propose an efficient, asynchronous, shared-memory GVT algorithm and prove its correctness. This algorithm does not require message acknowledgments, special GVT messages, or FIFO delivery of messages, and requires only a minimal number of shared variables and data structures. The algorithm only requires one round of interprocessor communication to compute GVT, in contrast to many message-based algorithms that require two. An efficient implementation is described that eliminates the need for a processor to explicitly compute a local minimum for time warp systems using a lowest-timestamp-first scheduling policy in each processor. In addition, we propose a new mechanism called on-the-fly fossil collection that enables efficient storage reclamation for simulations containing large numbers, e.g., hundreds of thousands or even millions of simulator objects. On-the-fly fossil collection can be used in time warp systems executing on either shared-memory or message-based machines. Performance measurements of the GVT algorithm and the on-the-fly fossil collection mechanism on a Kendall Square Research KSR-2 machine demonstrate that these techniques enable frequent GVT and fossil collections, e.g., every millisecond, without incurring a significant performance penalty.
Article
In this thesis, we consider the problem of dynamic load balancing for parallel discrete event simulation. We focus on the optimistic synchronization protocol, Time Warp. A distributed load balancing algorithm was developed, which makes use of the active process migration in Clustered Time Warp. Clustered Time Warp is a hybrid synchronization protocol; it uses an optimistic approach between the clusters and a sequential approach within the clusters. As opposed to the centralized algorithm developed by H. Avril for Clustered Time Warp, the presented load balancing algorithm is a distributed token-passing one. We present two metrics for measuring the load: processor utilization and processor advance simulation rate. Different models were simulated and tested: VLSI models and queuing network models (pipeline and distributed networks). Results show that improving the performance of the system depends a great deal on the nature of the simulated model. For the VLSI model, we also examined the effect of the dynamic load balancing algorithm on the total number of processed messages per unit time. Performance results show that, by dynamically balancing the load, the throughput of the simulation was improved by more than 100%.
Article
This thesis is concerned with the experimental development of parallel simulation tools that not only exploit diverse multiprocessor environments, but also allow parallel simulations to be built with reasonable effort. We work on two fronts: model replication and model decomposition. We describe the design of EcliPSe, a parallel simulation system for replicative applications whose programming interface is designed to enable easy parallelization of such programs. We investigate solutions to serializing bottlenecks that arise when samples are collected from many processes. We also examine how the structure of replicative applications can be exploited to provide fault tolerance with low execution overhead. Experiments using up to 128 workstations resulted in excellent performance, showing the scalability of the system. In model decomposition (also called parallel discrete-event simulation), we depart from the standard approach usually taken in current parallel tools and use the active-transaction approach. By obviating the need for explicitly sending messages, we make modeling easier for analysts that are not used to parallel programming constructs. We describe the design of the ParaSol model-decomposed parallel simulation tool. Using this threads-based tool as a testbed, we investigate how existing methods for model decomposition can be adapted to the active-transaction approach. We show, using performance experiments, that this approach does not incur a substantial run-time penalty. Finally, to demonstrate that ParaSol enables a simplified approach to implementing models, we use it to develop the first parallel implementation of the widely used GPSS simulation language. Initial performance experiments showed promising results: despite the overheads associated with model-decomposed parallel simulations, we were able to achieve a 34% reduction in execution time when going from two to four processors in a GPSS program execution.
Conference Paper
We present two GVT computation algorithms for PDES techniques with event based activities, relying on a space-time memory abstraction. Algorithm 2 involves a modification in the activity control, and is based on an epoch coloring scheme. The effect of the modification is assessed through an experimental study on a simulator implemented in the Linda coordination language. Experiments are performed on a cluster of workstations, and show that the modified activity control discipline is able to enhance performance.
Conference Paper
It is well known that Time Warp may suffer from poor performance due to excessive rollbacks caused by overly optimistic execution. The authors present a simple flow control mechanism using only local information and GVT that limits the number of uncommitted messages generated by a processor, thus throttling overly optimistic TW execution. The flow control scheme is analogous to traditional networking flow control mechanisms. A “window” of messages defines the maximum number of uncommitted messages allowed to be scheduled by a process. Committing messages is analogous to acknowledgments in networking flow control. The initial size of the window is calculated using a simple analytical model that estimates the instantaneous number of messages that a process will eventually commit. This window is expanded so that the process may progress up to the next commit point (generally the next fossil collection), and to accommodate optimistic execution. The expansions to the window are based on monitoring TW performance statistics so the window size automatically adapts to changing program behaviors. The flow control technique presented here is simple and fully automatic. No global knowledge or synchronization (other than GVT) is required. They also develop an implementation of the flow control scheme for shared memory multiprocessors that uses dynamically sized pools of free message buffers. Experimental data indicates that the adaptive flow control scheme maintains high performance for “balanced workloads”, and achieves as much as a factor of 7 speedup over unthrottled TW for certain irregular workloads.
Conference Paper
Optimistic fossil collection (OFC) is a fully-distributed mechanism to reclaim memory from the state and event histories of a time warp simulation. Each fossil collector executes with a logical process (LP) and operates independently of other fossil collectors. Each one examines event arrival times and creates a statistical model of the expected variance from local virtual time (LVT). From this, it is possible to determine the probability that the LP will, in the future, roll back a distance X from LVT. Thus, the fossil collector can examine the time-stamps of items in the state and event histories to find the probability that they will be needed in the future. Comparing this probability against a user-specified risk factor, the fossil collector decides if the item can be marked as a fossil and scavenged. OFC allows for the possibility of simulation failure, so it may be desirable to periodically have complete checkpoints taken and archived during the simulation for a possible restart with a smaller risk factor specified. This method of memory management assumes there is an underlying stationary distribution for the rollback lengths during a time interval t. This is reasonable, since rollback lengths in time warp are relatively constant in length. This assumption can be relaxed for models that operate without an underlying assumption about the distribution of rollback lengths. This paper reviews the design and implementation of two rollback models for OFC. One assumes a geometrically distributed rollback length; the other assumes an arbitrary distribution of rollback lengths with fixed mean and variance.
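The decision rule described in this abstract can be sketched for the geometric case as follows. This is a hedged illustration, not the paper's implementation: it assumes integer virtual-time distances, a rollback length L that is geometric on {1, 2, ...} with a known parameter `p`, and hypothetical names `prob_rollback_at_least` and `is_fossil`. How `p` would be estimated from observed event arrival times is not modeled.

```python
def prob_rollback_at_least(distance: int, p: float) -> float:
    """P(L >= distance) for L ~ Geometric(p) on {1, 2, ...} (assumed model)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if distance <= 1:
        # Any rollback covers at least one unit; items at or beyond LVT
        # are treated as certainly still needed.
        return 1.0
    return (1.0 - p) ** (distance - 1)


def is_fossil(item_time: int, lvt: int, p: float, risk: float) -> bool:
    """Mark an item as a fossil if the chance it is needed is below `risk`.

    An item stamped `item_time` is needed again only if the LP rolls
    back at least `lvt - item_time` units from its local virtual time.
    """
    distance = lvt - item_time
    return prob_rollback_at_least(distance, p) < risk


if __name__ == "__main__":
    # Hypothetical parameters: p = 0.5, user-specified risk factor 0.2.
    print(is_fossil(item_time=6, lvt=10, p=0.5, risk=0.2))  # old item
    print(is_fossil(item_time=9, lvt=10, p=0.5, risk=0.2))  # recent item
```

A smaller risk factor keeps more history and lowers the chance of an unrecoverable rollback, which matches the abstract's suggestion of restarting from a checkpoint with a smaller risk factor after a failure.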
Conference Paper
The computation of Global Virtual Time is of fundamental importance in Time Warp based Parallel Discrete Event Simulation Systems. Shared memory multiprocessor architectures can support interprocess communication with much smaller overheads than distributed memory systems. This paper presents a new, completely asynchronous, Gvt algorithm which provides very fast and accurate Gvt estimation with significantly lower overhead than previous approaches. The algorithm presented is able to support more efficient memory management, termination, and other global control mechanisms. The Gvt algorithm described enables any Time Warp entity to compute Gvt at any time without slowing down other entities, in particular, those executing on the critical path. Experimental results are presented for a shared memory Time Warp system that employs a two tiered distributed memory management scheme. The proof of the correctness and the accuracy of the algorithm are also presented. Finally, some suggestions on possible further optimization of the implementation are given.