Article

Virtual time II: the cancelback protocol for storage management in distributed simulation

... Thus, it is important to design an efficient memory management protocol which guarantees that the memory consumption of parallel simulation is of the same order as sequential simulation. (Such an algorithm is referred to as an optimal memory management algorithm; a formal definition will be given in Definition 1.) Previous work [6, 7, 8] has been devoted to reducing the space complexity of Time Warp simulation. The first optimal memory management protocol (called cancelback) was proposed by Jefferson [8]. ...
... (Such an algorithm is referred to as an optimal memory management algorithm; a formal definition will be given in Definition 1.) Previous work [6, 7, 8] has been devoted to reducing the space complexity of Time Warp simulation. The first optimal memory management protocol (called cancelback) was proposed by Jefferson [8]. Although cancelback is considered a complete solution for the storage management problem in Time Warp, some efficiency issues in implementing this algorithm must be considered. ...
... Now we show that Chandy-Misra may consume non-constant bounded memory. Lin and Lazowska [17], and Jefferson [8], show that there exist Chandy-Misra simulations that have space complexities of O(kM_s) for arbitrary k. (Figure 1: the case when Chandy-Misra consumes less storage than the sequential simulation.) ...
Article
Recently there has been a great deal of interest in performance evaluation of parallel simulation. Most work is devoted to the time complexity and assumes that the amount of memory available for parallel simulation is unlimited. This paper studies the space complexity of parallel simulation. Our goal is to design an efficient memory management protocol which guarantees that the memory consumption of parallel simulation is of the same order as sequential simulation. (Such an algorithm is referred to as optimal.) We first derive the relationships among the space complexities of sequential simulation, Chandy-Misra simulation, and Time Warp simulation. We show that Chandy-Misra may consume more storage than sequential simulation, or vice versa. Then we show that Time Warp never consumes less memory than sequential simulation. Then we describe cancelback, an optimal Time Warp memory management protocol proposed by Jefferson. Although cancelback is considered a complete solution for the storage management problem in Time Warp, some efficiency issues in implementing this algorithm must be considered. In this paper, we propose an optimal algorithm called artificial rollback. We show that this algorithm is easy to implement and analyze. An implementation of artificial rollback is given, which is integrated with processor scheduling to adjust the memory consumption rate based on the amount of free storage available in the system.
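To make the artificial rollback idea concrete, here is a minimal sketch in Python. It assumes a shared memory pool and a per-LP rollback operation that releases the storage held by optimistically processed events; the LP class, the ROLLBACK_STEP constant, and the byte accounting are illustrative inventions, not the paper's implementation.

    # A minimal sketch of artificial rollback (illustrative names throughout).
    ROLLBACK_STEP = 10          # virtual-time distance to roll an LP back per attempt

    class LP:
        def __init__(self, name, lvt, bytes_per_unit=100):
            self.name, self.lvt = name, lvt
            self.bytes_per_unit = bytes_per_unit   # storage held per unit of optimism

        def rollback(self, to_time):
            # Rolling back releases the storage held by events beyond to_time.
            freed = (self.lvt - to_time) * self.bytes_per_unit
            self.lvt = to_time
            return freed

    def artificial_rollback(lps, gvt, needed_bytes):
        # Roll the most optimistic LPs back toward GVT until enough storage
        # is reclaimed; no LP is ever rolled back earlier than GVT, so
        # committed progress is preserved.
        freed = 0
        for lp in sorted(lps, key=lambda p: p.lvt, reverse=True):
            if freed >= needed_bytes or lp.lvt <= gvt:
                break
            freed += lp.rollback(max(gvt, lp.lvt - ROLLBACK_STEP))
        return freed >= needed_bytes

    lps = [LP("a", 150), LP("b", 90), LP("c", 60)]
    print(artificial_rollback(lps, gvt=50, needed_bytes=2000))   # True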
... In this paper, we present a comprehensive empirical evaluation of a rollback-based active memory management protocol called cancelback [15]. In this context, we also describe an efficient implementation of the cancelback protocol in an existing multiprocessor Time Warp kernel. ...
... In this context, we also describe an efficient implementation of the cancelback protocol in an existing multiprocessor Time Warp kernel. The cancelback protocol is attractive because it has the "storage optimal" property [15], [19]. The storage optimal property states that the cancelback protocol is able to complete the Time Warp computation within the amount of memory required for the equivalent sequential computation. ...
... Message sendback and Gafni's protocol do not have the storage optimal property: They may not be able to complete the simulation within the sequential amount of memory. The cancelback protocol [15], however, is storage optimal. Unlike message sendback and Gafni's protocol, cancelback is targeted for a shared memory architecture where there is a single shared pool of memory. ...
Article
The performance of the Time Warp mechanism is experimentally evaluated when only a limited amount of memory is available to the parallel computation. An implementation of the cancelback protocol is used for memory management on a shared memory architecture, viz., KSR to evaluate the performance vs. memory tradeoff. The implementation of the cancelback protocol supports canceling back more than one memory object when memory has been exhausted (the precise number is referred to as the salvage parameter) and incorporates a non-work-conserving processor scheduling technique to prevent starvation. Several synthetic and benchmark programs are used that provide interesting stress cases for evaluating the limited memory behavior. The experiments are extensively monitored to determine the extent to which various factors may affect performance. Several observations are made by analyzing the behavior of Time Warp under limited memory: 1) Depending on the available memory and asymmetry in the workload, canceling back several memory objects at one time (i.e., a salvage parameter value of more than one) improves performance significantly, by reducing certain overheads. However, performance is relatively insensitive to the salvage parameter except at extreme values. 2) The speedup vs. memory curve for Time Warp programs has a well-defined knee before which speedup increases very rapidly with memory and beyond which there is little performance gain with increased memory. 3) A performance nearly equivalent to that with large amounts of memory can be achieved with only a modest amount of additional memory beyond that required for sequential execution, if memory management overheads are small compared to the event granularity. These results indicate that contrary to the comm...
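The salvage parameter described above lends itself to a small illustration. The sketch below, with invented names (cancelback, salvage), simply selects the `salvage` uncommitted objects with the latest send times beyond GVT as cancellation victims; a real kernel would also return the freed buffers to the shared pool and send the cancelled messages back to their senders.

    # Illustrative sketch: pick the `salvage` uncommitted objects with the
    # latest send times (all strictly later than GVT) as cancellation victims.
    import heapq

    def cancelback(send_times, gvt, salvage):
        eligible = [t for t in send_times if t > gvt]   # earlier objects may be committed
        return heapq.nlargest(salvage, eligible)        # victims, latest send times first

    # With salvage=2, one memory-exhaustion event reclaims two objects at once,
    # amortizing the overhead of invoking the protocol.
    print(cancelback([12, 7, 25, 30, 3], gvt=7, salvage=2))   # [30, 25]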
... However, such violations can not only be detected at run time but can also be dealt with by using the rollback mechanism provided by optimistic algorithms [14], [15], [16]. Time Warp [5], [17] is one of the optimistic time management algorithms (TMAs), and includes rollback, anti-message, and global virtual time (GVT) computation techniques [1]. GVT defines a lower bound on the timestamp of any unprocessed event in the system and defines the point beyond which events should not be reclaimed [15]. ...
Article
Full-text available
The Time Warp algorithm is a well-known mechanism for optimistic synchronization in parallel discrete-event simulation (PDES) systems. It offers a run-time recovery mechanism that deals with causality errors. For efficient use of rollback, the global virtual time (GVT) computation is performed to reclaim memory, commit output, detect termination, and handle errors. This paper presents a new unacknowledged message list (UML) scheme for efficient and accurate GVT computation. The proposed UML scheme is based on the assumption that certain variables are accessible by all processors. In addition to GVT computation, the proposed UML scheme provides an effective solution for both the simultaneous reporting and transient message problems in the context of a synchronous algorithm. To support the proposed UML approach, two algorithms are presented in detail, with proofs of their correctness. Empirical evidence from an experimental study of the proposed UML scheme on the PHOLD benchmark fully confirms the theoretical outcomes of this paper.
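A simplified reading of the unacknowledged-message-list idea can be sketched as follows; the function names and data layout are illustrative. Each processor contributes the minimum of its local virtual time and the timestamps of its unacknowledged (possibly in-transit) messages, which is what keeps transient messages from being missed.

    # Illustrative sketch of a GVT round with unacknowledged-message lists.
    def local_minimum(lvt, unacked):
        # Each processor reports min(local virtual time, timestamps of
        # messages it sent that are not yet acknowledged), so a message
        # still in transit cannot be overlooked.
        return min([lvt] + list(unacked))

    def compute_gvt(reports):
        # reports: one (lvt, unacked_timestamps) pair per processor,
        # as gathered by the synchronous GVT round.
        return min(local_minimum(lvt, unacked) for lvt, unacked in reports)

    print(compute_gvt([(42, [37]), (50, []), (35, [40, 33])]))   # 33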
... A parallel simulation may consume much more storage space than a sequential simulation, no matter which parallel simulation protocol is used [13, 24]. Since extra memory is required to store the histories of processes, memory management for Time Warp is more critical than for conservative simulation protocols, such as the Chandy-Misra approach. ...
... (A memory management algorithm for parallel simulation is called optimal if the amount of memory consumed by the algorithm is of the same order as the corresponding sequential simulation.) Jefferson proposed the first optimal memory management algorithm, called cancelback [13]. In this protocol, when the Time Warp simulation runs out of memory, objects (i.e., input messages, states, or output messages) with send times later than GVT are cancelled to make more room. ...
Article
Simulation is a powerful tool for studying the dynamics of a system. However, simulation is time-consuming. Thus, it is natural to attempt to use multiple processors to speed up the simulation process. Many protocols have been proposed to perform discrete event simulation in multi-processor environments. Most of these distributed discrete event simulation protocols are either conservative or optimistic. The most common optimistic distributed simulation protocol is called Time Warp. Several issues must be considered when designing a Time Warp simulation; examples are reducing the state saving overhead and designing the global control mechanism (i.e., global virtual time computation, memory management, distributed termination, and fault tolerance). This paper addresses these issues. We propose a heuristic to select the checkpoint interval to reduce the state saving overhead, generalize a previously proposed global virtual time computation algorithm, and present new algorithms for memory management, distributed termination, and fault tolerance. The main contribution of this paper is to provide guidelines for designing an efficient Time Warp simulation.
... When the originating process receives the sent-back message, it rolls back to the state when the message was sent. The primary goal is to return the process to a condition where there is room to process one message [1, 5]. Gafni's protocol: When an LP runs out of memory, this protocol selects a memory object and discards it. ...
... They have both been shown to be memory optimal by Lin [?]. Cancelback: Cancelback extends Gafni's protocol to select any memory object with a timestamp greater than GVT for cancelling [1]. Artificial rollback: Artificial rollback can be viewed as a simplified implementation of cancelback. ...
Conference Paper
This paper examines memory management issues associated with Time Warp synchronized parallel simulation on distributed memory machines. The paper begins with a summary of the techniques which have been previously proposed for memory management on various parallel processor memory structures. It then concentrates the discussion on parallel simulation executing on a distributed memory computer—a system comprised of separate computers, interconnected by a communications network. An important characteristic of the software developed for such systems is the fact that the dynamic memory is allocated from a pool of memory that is shared by all of the processes at a given processor. This paper presents a new memory management protocol, pruneback, which recovers space by discarding previous states. This is different from all previous schemes such as artificial rollback and cancelback which recover memory space by causing one or more logical processes to roll back to an earlier simulation time. The paper includes an empirical study of a parallel simulation of a closed stochastic queueing network showing the relationship between simulation execution time and amount of memory available. The results indicate that using pruneback is significantly more effective than artificial rollback (adapted for a distributed memory computer) for this problem. In the study, varying the memory limits over a 2:1 range resulted in a 1:2 change in artificial rollback execution time and almost no change in pruneback execution time.
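The distinction pruneback draws, reclaiming state copies rather than rolling anything back, can be sketched briefly. Assuming checkpoints are kept as an ascending list of (virtual time, state) pairs, the sketch below keeps the newest state plus the restoration point needed should a genuine rollback to GVT ever occur; all names are illustrative.

    # Illustrative pruneback sketch over an ascending list of checkpoints.
    def pruneback(saved_states, gvt):
        # Keep the latest checkpoint at or before GVT (needed if a genuine
        # rollback ever occurs) and the newest checkpoint (needed to keep
        # executing). Intermediate states are discardable: they can be
        # rebuilt by re-executing forward from an earlier checkpoint.
        last_safe = max((i for i, (t, _) in enumerate(saved_states) if t <= gvt),
                        default=0)
        newest = len(saved_states) - 1
        return [s for i, s in enumerate(saved_states) if i in (last_safe, newest)]

    states = [(0, "s0"), (10, "s1"), (20, "s2"), (30, "s3")]
    print(pruneback(states, gvt=15))   # [(10, 's1'), (30, 's3')]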
... The optimistic scheme, the Time Warp protocol [3, 4], allows each LP_i to keep calculating in its local simulated time (LVT_i) under the assumption that messages between processors arrive at the proper time. When a causality error occurs, e.g., because LP_i receives a message with a timestamp smaller than LVT_i, the calculations are rolled back. ...
Article
Full-text available
The paper describes the design, performance and applications of ASimJava, a Java-based library for distributed simulation of large networks. The important issues associated with the implementation of parallel and distributed simulation are discussed. The focus is on the effectiveness of different synchronization protocols implemented in ASimJava. The practical example - computer network simulation - is provided to illustrate the operation of the presented software tool.
... Time Warp [5, 17] is one of the optimistic time management algorithms (TMAs), and includes rollback, anti-message, and global virtual time (GVT) computation techniques [1]. GVT defines a lower bound on the timestamp of any unprocessed event in the system and defines the point beyond which events should not be reclaimed [15]. ...
... In fact, it is possible for some simulations to run to completion without ever restoring state-saving memory resources [27]. On the other hand, simulations using full state saving mechanisms, especially when object states are large, consume large amounts of memory and must perform GVT updates more frequently, use state skipping strategies to reduce memory consumption [5], or protect their memory by cancelback protocols [7,13]. With incremental state saving techniques becoming more popular, the requirement for frequent GVT updates (in order to return memory resources) has diminished. ...
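An incremental state saving scheme of the kind mentioned above can be sketched with an undo log; the class below is an illustrative toy, not any particular kernel's API.

    # Illustrative incremental state saving with an undo log.
    class IncrementalState:
        def __init__(self, state):
            self.state = state
            self.undo_log = []           # (key, old_value) pairs, oldest first

        def write(self, key, value):
            self.undo_log.append((key, self.state.get(key)))   # save old value
            self.state[key] = value

        def rollback(self, n_writes):
            # Undo the last n_writes writes by replaying the log in reverse.
            for key, old in reversed(self.undo_log[-n_writes:]):
                if old is None:
                    self.state.pop(key, None)    # key did not exist before
                else:
                    self.state[key] = old
            del self.undo_log[-n_writes:]

    s = IncrementalState({"x": 1})
    s.write("x", 2); s.write("y", 9)
    s.rollback(2)
    print(s.state)   # {'x': 1}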
Article
Full-text available
Global Virtual Time (GVT) is the fundamental synchronization concept in optimistic simulations. It is defined as the earliest time tag within the set of unprocessed pending events in a distributed simulation. A number of techniques for determining GVT have been proposed in recent years, each having their own intrinsic properties. However, most of these techniques either focus on specific types of simulation problems or assume specific hardware support. This paper specifically addresses the GVT problem in the context of the following areas: scalability, efficiency, portability, flow control, interactive support, and real-time use. A new GVT algorithm, called SPEEDES GVT, has been developed in the Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) framework. The new algorithm runs periodically but does not disrupt event processing. It provides flow control by processing events risk-free while flushing out messages during the GVT computation. SPEEDES GVT is built from a set of global reduction operations that are easily implementable on any hardware system.
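The reduction at the heart of such an algorithm can be sketched in a few lines; the function below stands in for both the hardware-supported global min-reduction and the message-flushing phase described above, and its names are illustrative.

    # Illustrative stand-in for the global min-reduction behind a GVT round.
    def gvt_round(local_lvts, in_transit):
        # Each node contributes min(its LVT, the earliest timestamp of any
        # message it sent that may still be in flight); GVT is the global
        # minimum of these contributions. On real hardware the outer min()
        # maps onto a tree- or butterfly-structured reduction primitive.
        contributions = [min(lvt, *ts) if ts else lvt
                         for lvt, ts in zip(local_lvts, in_transit)]
        return min(contributions)

    print(gvt_round([40, 55, 38], [[35], [], [41]]))   # 35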
... These techniques include infrequent state-saving [2], incremental state-saving [17, 30], and most recently reverse computation [6]. Rollback-based protocols such as Artificial Rollback [22] and Cancelback [19] have demonstrated that Time Warp systems can execute in no more memory than the corresponding sequential simulation; however, performance suffers. Adaptive techniques [10], which adjust the amount of memory dynamically, have been shown to improve performance under "rollback thrashing" conditions and to reduce memory consumption to within a constant factor of sequential. ...
Article
In this paper, we introduce a new Time Warp system called ROSS: Rensselaer's optimistic simulation system. ROSS is an extremely modular kernel that is capable of achieving event rates as high as 1,250,000 events per second when simulating a wireless telephone network model (PCS) on a quad processor PC server. In a head-to-head comparison, we observe that ROSS outperforms the Georgia Tech Time Warp (GTW) system by up to 180% on a quad processor PC server and up to 200% on the SGI Origin 2000. ROSS only requires a small constant amount of memory buffers greater than the amount needed by the sequential simulation for a constant number of processors. ROSS demonstrates for the first time that stable, highly efficient execution using little memory above what the sequential model would require is possible for low-event granularity simulation models. The driving force behind these high-performance and low-memory utilization results is the coupling of an efficient pointer-based implementation framework, Fujimoto's fast GVT algorithm for shared memory multiprocessors, reverse computation, and the introduction of kernel processes (KPs). KPs lower fossil collection overheads by aggregating processed event lists. This aspect allows fossil collection to be done with greater frequency, thus lowering the overall memory necessary to sustain stable, efficient parallel execution. These characteristics make ROSS an ideal system for use in large-scale networking simulation models. The principal conclusion drawn from this study is that the performance of an optimistic simulator is largely determined by its memory usage.
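The kernel-process idea can be sketched as follows: processed events are aggregated per KP rather than per LP, so fossil collection scans a few long lists instead of many short ones. The class and event layout below are illustrative, not ROSS's actual data structures.

    # Illustrative sketch of fossil collection over a kernel process (KP).
    class KernelProcess:
        def __init__(self):
            self.processed = []          # processed events of all LPs mapped
                                         # to this KP, in timestamp order

        def commit(self, event):
            self.processed.append(event)

        def fossil_collect(self, gvt):
            # Events with timestamps below GVT can never be rolled back,
            # so their buffers are reclaimed in one sweep of the KP list.
            keep = [e for e in self.processed if e[0] >= gvt]
            freed = len(self.processed) - len(keep)
            self.processed = keep
            return freed

    kp = KernelProcess()
    for ts in (3, 8, 15, 21):
        kp.commit((ts, "event"))
    print(kp.fossil_collect(gvt=15))   # 2 buffers reclaimed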
... Two protocols used to limit memory utilization in an overoptimistic simulation are cancelback (Jefferson 1990) and artificial rollback (Lin and Preiss 1991). These protocols are used when the system runs out of memory and fossil collection attempts cannot reclaim the memory needed for the simulation to progress. ...
Conference Paper
In standard optimistic parallel event simulation, no restriction exists on the maximum lag in simulation time between the fastest and slowest logical processes (LPs). Over-optimistic applications exhibit a large lag, which encourages rollback and may degrade performance. We investigate an approach for controlling over-optimism that classifies LPs as fast, medium, or slow and migrates fast and/or slow processes. Fast LPs are aggregated, forcing them to compete for CPU cycles. Slow LPs are dispersed, to limit their competition for CPU cycles. The approach was implemented on distributed Georgia Tech Time Warp (GTW) (Das et al. 1994) and experiments performed using the synthetic application P-Hold (Fujimoto 1990). For over-optimistic test cases, our approach was found to perform 1.25 to 2.75 times better than the standard approach in terms of useful work and to exhibit execution times shorter than or equal to the standard computation.
... The objective of memory management is the reclamation of space from simulation objects so that a stalled simulation can continue. These techniques include cancelback [10] , artificial rollback [11], and adaptive memory management [6]. In an attempt to prevent a simulation from running out of memory and to increase processor utilization, dynamic load balancing algorithms transfer LPs from heavily loaded processors to lightly loaded ones [3, 1, 4]. ...
Conference Paper
We present an algorithm which integrates flow control and dynamic load balancing in Time Warp. The algorithm is intended for use in a distributed memory environment. Our flow control algorithm makes use of stochastic learning automata and is similar to the leaky-bucket flow control algorithm used in computer networks. It regulates the flow of messages between processors continuously throughout the course of the simulation, while the dynamic load balancing algorithm is invoked only when a load imbalance is detected. We compare the performance of the flow control algorithm, the dynamic load balancing algorithm and the integrated algorithm with that of a simulation without these controls. We simulated large shuffle ring networks with and without hot spots and a PCS network on an SGI Origin 2000 system. Our results indicate that the flow control scheme alone succeeds in greatly reducing the number and length of rollbacks as well as the number of anti-messages, thereby increasing the number of non-rolled-back messages processed per second. It results in a large reduction in the amount of memory used and outperforms the dynamic load balancing algorithm for these measures. The integrated scheme produces even better results for all of these measures and results in reduced execution times.
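A toy version of the leaky-bucket scheme with a learning-automaton-style rate adjustment is sketched below; the constants and the reward rule are illustrative, not the paper's tuned parameters.

    # Illustrative leaky-bucket limiter with a crude learning-style update.
    class LeakyBucket:
        def __init__(self, rate, capacity):
            self.rate, self.capacity = rate, capacity
            self.tokens = capacity

        def tick(self):
            # Tokens leak in at the controlled rate, up to the bucket size.
            self.tokens = min(self.capacity, self.tokens + self.rate)

        def try_send(self):
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False                 # hold the message for now

        def feedback(self, useful):
            # Learning-automaton flavour: reinforce the rate when messages
            # lead to useful work, back off when they get rolled back.
            self.rate *= 1.05 if useful else 0.8

    b = LeakyBucket(rate=0.5, capacity=4)
    print(sum(b.try_send() for _ in range(6)))   # 4: bursts are capped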
... The purpose of this restriction was to limit the memory usage of a Chandy-Misra simulation. However, Lin and Preiss [8] and Jefferson [9] showed that in general limiting the input buffer capacities of processes does not limit the total memory usage for a Chandy-Misra simulation. We assume that the input buffer capacity of a process is unlimited. ...
Article
Most small-scale simulation applications are implemented by sequential simulation techniques. As the problem size increases, however, sequential techniques may be unable to manage the time complexity of the simulation applications adequately. It is natural to consider re-implementing the corresponding large-scale simulations using parallel techniques, which have been reported to be successful in reducing the time complexity for several examples. However, parallel simulation may not be effective for every application. Since the implementation of parallel simulation for an application is usually very expensive, it is necessary to investigate the performance of parallel simulation for a particular application before re-implementing the simulation. The Chandy-Misra parallel, discrete-event simulation paradigm has been utilized in many large-scale simulation experiments, and several significant extensions have been based on it. Hence the Chandy-Misra protocol is adopted here as a basic model of parallel simulation to which our performance prediction techniques are applied. For an existing sequential simulation program based on the process interaction model, this paper proposes a technique for evaluating Chandy-Misra parallel simulation without actually implementing the parallel program. The idea is to insert parallelism analysis code into the sequential simulation program. When the modified sequential program is executed, the time complexity of the parallel simulation based on the Chandy-Misra protocol is computed. Our technique has been used to determine whether a giant Signaling System 7 simulation (sequential implementation) should be re-implemented using the parallel simulation approach.
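The parallelism-analysis idea translates naturally into a critical-path computation over a sequential event trace: each event's hypothetical parallel finish time is the maximum finish time of the events it depends on plus its own service time. A minimal sketch, with an invented trace format:

    # Illustrative critical-path analyzer over a sequential event trace.
    def predicted_parallel_time(trace):
        # trace: (event_id, ids of events it depends on, service time),
        # listed in the order the sequential simulator executed them.
        finish = {}
        for eid, deps, service in trace:
            finish[eid] = max((finish[d] for d in deps), default=0) + service
        return max(finish.values())      # length of the critical path

    trace = [("a", [], 2), ("b", ["a"], 3), ("c", ["a"], 1), ("d", ["b", "c"], 2)]
    print(predicted_parallel_time(trace))   # 7: critical path a -> b -> d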
... The Jade TimeWarp system found it necessary, as a practical measure, to solve this problem. This was done using an elegant algorithm called "cancelback" [Jefferson 1990]. It provides a consistent technique for reclaiming memory from optimistic execution and events, in such a way that memory exhaustion is handled deterministically. ...
Article
Full-text available
The use of shared memory computers to implement the TimeWarp algorithm for distributed discrete event simulation is discussed. Actual experience on an implementation for the KSR-1 is described and compared with another implementation on explicit message passing machines. Modifications necessary to achieve good speedup are described. Performance results on the KSR-1 and on the SPARCServer-1000 are reported. TimeWarp is a complex algorithm for distributing and parallelizing discrete event simulations. The algorithm was originally proposed by Jefferson in 1982 [Jefferson 1985], [Fujimoto 1990a]. Parallelizing discrete event simulations is a difficult problem because of the need to synchronise the simulation times of different simulation objects. The TimeWarp algorithm solves this by executing events optimistically and later, if necessary, rolling back (undoing) events that should not in fact have been executed. A simulation executed using TimeWarp is usually decomposed as ...
Article
The paper is concerned with discrete event-driven simulation, which is a well-known technique used for modeling and simulating complex parallel systems. The focus of the paper is on distributed simulation, in which multiple simulated event queues are processed in parallel according to the Time Warp approach. With this approach, parallel simulation based on event queues is allowed to progress in an optimistic way until event correlation errors appear in parallel simulation branches, which results in simulation rollbacks. In the absence of synchronization between simulator parallel queues, massive processing rollbacks can strongly slow down simulation. The paper presents a distributed optimistic event-driven simulation control based on simulator global state monitoring. A systematic control of the simulator global states prevents excessive rollbacks in the Time Warp simulation. Each simulation event queue reports its progress to a global synchronizer which monitors the global simulation state based on virtual timestamps of recently processed events. Based on the global state, the synchronizer checks the simulation progress and sends control signals which asynchronously slow down queues that have advanced too far. The paper describes the principles of the proposed approach and experimental results of its basic program implementation. Comparisons to existing parallel simulation methods are provided.
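The synchronizer's control rule can be sketched in a few lines. Assuming each queue reports its local virtual time, queues leading the global minimum by more than a window are told to slow down; the window value and signal names are illustrative.

    # Illustrative control rule for the global synchronizer.
    WINDOW = 100    # maximum permitted lead over the slowest queue

    def control_signals(reported_lvts):
        gvt = min(reported_lvts.values())
        return {q: "slow_down" if lvt - gvt > WINDOW else "proceed"
                for q, lvt in reported_lvts.items()}

    print(control_signals({"q1": 510, "q2": 420, "q3": 405}))
    # {'q1': 'slow_down', 'q2': 'proceed', 'q3': 'proceed'}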
Chapter
This chapter is about the history of the Time Warp algorithm and optimistic approaches to parallel discrete event simulation. It concentrates on the early history from our personal perspective as active developers of the ideas over several decades.
Conference Paper
Global Virtual Time (GVT) is the fundamental synchronization concept in optimistic simulations. It is defined as the earliest time tag within the set of unprocessed pending events in a distributed simulation. A number of techniques for determining GVT have been proposed in recent years, each having their own intrinsic properties. However, most of these techniques either focus on specific types of simulation problems or assume specific hardware support. This paper specifically addresses the GVT problem in the context of the following areas: Scalability, Efficiency, Portability, Flow control, Interactive support, Real time use. A new GVT algorithm, called SPEEDES GVT, has been developed in the Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) framework. The new algorithm runs periodically but does not disrupt event processing. It provides flow control by processing events risk-free while flushing out messages during the GVT computation. SPEEDES GVT is built from a set of global reduction operations that are easily implementable on any hardware system
Article
Full-text available
This project focused on research involving a simulation-based planning methodology and toolkit to help in real-time planning and decision making. A new methodology called OOPM (Object Oriented Physical Modeling) was developed. Development of the MOOSE (Multimodeling Object Oriented Simulation Environment) system, based on the OOPM methodology, has also been underway including modeling windows for finite state automata, differential equations, and functional models. The object-oriented multimodel framework of MOOSE is composed of three parts consisting of a graphical user interface, BLOCKS models based on the BLOCKS modeling language, and the SimPack Toolkit. MOOSE represents a software prototype for constructing simulation-based planning scenarios. The current scenario involves an interdiction mission. The goal is to achieve the mission goals with the fewest casualties. The best plan to achieve this goal is determined with a combination of a 'qualitative' rule-based model and lower level 'quantitative' block-structured models. The MOOSE architecture permits a model abstraction capability to allow users to realize time constraints on the simulation-based planning by simulating models at different levels. The use of an iterative deepening approach to simulation is being investigated.
Article
Full-text available
The article deals with the problems of implementing parallel discrete event simulation in which every process represents an object in the simulation. The main problem of parallel discrete event simulation is the time synchronization of the processes running on different processors. One approach to the solution of this problem, called the Virtual Time Concept, or Time Warp, is presented. In this article we describe an approach to implementing Time Warp and some techniques that allow Time Warp to be implemented more efficiently.
Article
Full-text available
The article deals with the problems of parallel discrete event simulation in which every process represents an object in the simulation. The main problem of parallel discrete event simulation is the time synchronization of the processes running on different processors. One approach to the solution of this problem, called the Virtual Time Concept, or Time Warp, is presented. The Linda programming language is considered as a tool for describing parallel running processes. The Linda programming language (or system) works in a distributed environment. The distributed environment is thought of as a set of processors which run in parallel and which do not share a common memory. The processors communicate only via communication links.
Article
This paper studies the space complexity of an optimistic parallel simulation protocol called Time Warp. We evaluate four Time Warp memory management algorithms: fossil collection, message sendback, cancelback and artificial rollback. We identify two criteria in designing Time Warp memory management algorithms. Criterion 1 tests if a memory management algorithm ensures that Time Warp simulation always stops (either completes or terminates when memory is exhausted). If an algorithm does not satisfy this criterion, then the simulation may be trapped in an infinite loop. Criterion 2 tests if a memory management algorithm is independent of processor parameters (e.g., number of processors available for the parallel simulation, processor speed and interprocessor communication costs). We show that if an algorithm satisfies this second criterion, then the amount of memory consumed by Time Warp simulation is bounded by the amount consumed by sequential simulation. For algorithms that do not have full control of uncommitted objects (e.g., fossil collection and message sendback), Criterion 2 is not satisfied in general. For algorithms that have full control of uncommitted objects (e.g., cancelback and artificial rollback), special treatments are necessary to satisfy Criterion 1 (i.e., to ensure that the algorithms do not cancel future objects such that global virtual time never advances).
Article
Over the past decade, techniques for parallel and distributed discrete-event simulation have been developed largely within the academic research community. The advent of cluster computing and desktop multiprocessors has now made it feasible to bring these results to bear on the simulation of large, complex engineering systems and on physical systems in a cost-effective manner. Our challenge now is to apply these techniques to real-world problems, i.e., to bring this area out of the laboratory. This special issue of JPDC is devoted to a collection of articles which are representative of the areas to which parallel and distributed simulation are now being applied. We feel that these articles represent a major change in the way in which discrete-event simulation will be done in the future. Taken together, they give us an indication of the benefits which will accrue from our increased ability to simulate problems which it is not possible to approach on conventional uniprocessors.
Chapter
Full-text available
In this chapter we describe real-time simulation of large-scale networks and compare it against other major tools for networking research. We discuss the problems that may prevent simulation from achieving real-time performance and subsequently present our current solutions. We conduct large-scale network experiments incorporating real-time simulation to demonstrate its capabilities. Future work includes efficient background traffic models for large-scale networks, high-performance communication conduits for connecting virtual machines and the real-time simulator, and effective methods for configuring, running and visualizing network experiments.
Article
The verification of VLSI circuits, which are ever increasing in size and complexity, is bottlenecked by simulation within the circuit design process. Distributed simulation on a cluster of workstations or a shared-memory multiprocessor computer offers a cost-effective solution. The key factor in the performance of these simulations is the development of distributed simulation algorithms that make use of the circuits' underlying properties. This study presents the design and implementation of a distributed simulation framework to better produce and test simulation algorithms. The framework includes a technique to generate circuits (since large commercial circuits are hard to obtain), a Verilog compiler, and a simulator template to ease implementation.
Article
In this thesis, we consider the problem of dynamic load balancing for parallel discrete event simulation. We focus on the optimistic synchronization protocol, Time Warp. A distributed load balancing algorithm was developed, which makes use of the active process migration in Clustered Time Warp. Clustered Time Warp is a hybrid synchronization protocol; it uses an optimistic approach between the clusters and a sequential approach within the clusters. As opposed to the centralized algorithm developed by H. Avril for Clustered Time Warp, the presented load balancing algorithm is a distributed token-passing one. We present two metrics for measuring the load: processor utilization and processor advance simulation rate. Different models were simulated and tested: VLSI models and queuing network models (pipeline and distributed networks). Results show that improving the performance of the system depends a great deal on the nature of the simulated model. For the VLSI model, we also examined the effect of the dynamic load balancing algorithm on the total number of processed messages per unit time. Performance results show that dynamically balancing the load, the throughput of the simulation was improved by more than 100%.
Article
The research project reported in this thesis considers Multiple Distributed Replications in Parallel (MDRIP), a hybrid approach to parallelisation of quantitative stochastic discrete-event simulation. Parallel Discrete-Event Simulation (PDES) generally covers distributed simulation or simulation with replicated trials. Distributed simulation requires model partitioning and synchronisation among submodels. Simulation with replicated trials can be executed on-line by applying Multiple Replications in Parallel (MRIP). MDRIP has been proposed for overcoming problems related to the large size of simulated models and their complexity, as well as with the problem of controlling the accuracy of the final simulation results. A survey of PDES investigates several primary issues which are directly related to the parallelisation of DES. A secondary issue related to implementation efficiency is also covered. Statistical analysis as a supporting issue is described. The AKAROA2 package is an implementation of making such supporting issue effortless. Existing solutions proposed for PDES have exclusively focused on collecting of output data during simulation and conducting analysis of these data when simulation is finished. Such off-line statistical analysis of output data offers no control of statistical errors of the final estimates. On-line control of statistical errors during simulation has been successfully implemented in AKAROA2, an automated controller of output data analysis during simulation executed in MRIP. However, AKAROA2 cannot be applied directly to distributed simulation. This thesis reports results of a research project aimed at employing AKAROA2 for launching multiple replications of distributed simulation models and for on-line sequential control of statistical errors associated with a distributed performance measure; i.e. with a performance measure which depends on output data being generated by a number of submodels of distributed simulation. We report changes required in the architecture of AKAROA2 to make MDRIP possible. A new MDRIP-related component of AKAROA2, a distributed simulation engine mdrip engine, is introduced. Stochastic simulation in its MDRIP version, as implemented in AKAROA2, has been tested in a number of simulation scenarios. We discuss two specific simulation models employed in our tests: (i) a model consisting of independent queues, and (ii) a queueing network consisting of tandem connection of queueing systems. In the first case, we look at the correctness of message orderings from the distributed messages. In the second case, we look at the correctness of output data analysis when the analysed performance measures require data from all submodels of a given (distributed) simulation model. Our tests confirm correctness of our mdrip engine design in the cases considered; i.e. in models in which causality errors do not occur. However, we argue that the same design principles should be applicable in the case of distributed simulation models with (potential) causality errors.
Conference Paper
Full-text available
Distributed discrete-event simulation is proposed as an alternative to traditional sequential simulation. The paper discusses some important issues associated with the implementation of parallel and distributed simulation. The topics that are covered are classical and new synchronization protocols. The effectiveness of different algorithms for asynchronous simulation is discussed based on numerical test results.
Conference Paper
This paper reviews issues concerning the design of adaptive protocols for parallel discrete event simulation (PDES). The need for adaptive protocols is motivated against the background of the classical synchronisation problem that has driven much of the research in this field. Traditional conservative and optimistic protocols, and the hybrid variants that form the basis of adaptive protocols, are also discussed. Adaptive synchronisation protocols are reviewed with special reference to their characteristics regarding the aspects of the simulation state that influence the adaptive decision and the control parameters used. Finally, adaptive load management strategies and their relationship to the synchronisation protocol are discussed.
Conference Paper
We introduce a new time warp system called ROSS: Rensselaer's Optimistic Simulation System. ROSS is an extremely modular kernel that is capable of achieving event rates as high as 1,250,000 events per second when simulating a wireless telephone network model (PCS) on a quad processor PC server. In a head-to-head comparison, we observe that ROSS outperforms the Georgia Tech Time Warp (GTW) system on the same computing platform by up to 180%. ROSS only requires a small constant amount of memory buffers greater than the amount needed by the sequential simulation for a constant number of processors. The driving force behind these high-performance and low memory utilization results is the coupling of an efficient pointer-based implementation framework, Fujimoto's (1989) fast GVT algorithm for shared memory multiprocessors, reverse computation, and the introduction of kernel processes (KPs). KPs lower fossil collection overheads by aggregating processed event lists. This aspect allows fossil collection to be done with greater frequency, thus lowering the overall memory necessary to sustain stable, efficient parallel execution.
Conference Paper
Focuses on evaluating metrics for use with the dynamic load balancing of optimistic simulations. We present a load balancing algorithm which is token-based and is used in conjunction with clustered time warp (CTW). CTW is a hybrid synchronization protocol, which makes use of a sequential algorithm within clusters of logical processes (LPs) and time warp between the clusters. We define three separate metrics and measure their effectiveness in different simulation environments. One metric measures processor utilization, a second measures the difference in virtual times between the clusters, while a third is a combination of these two metrics. We compare the execution time, memory consumption and throughput obtained in three simulation environments by each of these metrics and to the results obtained without load balancing. Our categories of simulation are VLSI simulations, characterized by a large number of LPs and a low computational granularity; distributed network simulations, in which the workload varies spatially over the execution of the simulation; and a pipeline simulation, characterized by a single direction of message flow. The experiments revealed a significant improvement in the simulation times in the first two categories of simulations when we employed the processor utilization and the combination metrics. For example, improvements of up to 70% were obtained for VLSI simulations. None of the metrics proved to be effective for the pipeline simulation
Conference Paper
We present an algorithm which integrates flow control and dynamic load balancing in order to improve the performance and stability of Time Warp. The algorithm is intended for use in a distributed memory environment such as a cluster of workstations connected by a high speed switch. Our flow control algorithm makes use of stochastic learning automata and operates in the fashion of the leaky-bucket flow control algorithm used in computer networks. It regulates the flow of messages between processors continuously throughout the course of the simulation, while the dynamic load balancing algorithm is invoked only when a load imbalance is detected. Both algorithms make use of a space-time product metric and collect the requisite information via a snapshot-based GVT algorithm. We compare the performance of the flow control algorithm, the dynamic load balancing algorithm and the integrated algorithm with that of a simulation without any of these controls. We simulated large shuffle ring networks with and without hot spots and a PCS network on an SGI Origin 2000 system. Our results indicate that the flow control scheme alone succeeds in greatly reducing the number and length of roll-backs as well as the number of anti-messages, thereby increasing the number of non-rolled back messages processed per second. It results in a large reduction in the amount of memory used and outperforms the dynamic load balancing algorithm for these measures. The integrated scheme produces even better results for all of these measures and results in reduced execution times as well
Conference Paper
In this paper we describe a method for the use of the Java object-oriented programming language for simulation. Our approach is to use the Unified Modeling Language, recently proposed for general-purpose modeling and software engineering, as a modeling and simulation language with Java as the implementation platform. Using the Model-View-Controller design pattern, views are added to models for such purposes as statistics collection, animation, and checkpoint recording. Controllers are added to models to handle events, or in the case of a simulation, to synthetically generate events. CASE tools being developed for UML will include Java code generation, making the approach proposed in this paper a practical method for simulation modeling
Conference Paper
In a distributed memory environment the communication overhead of Time Warp as induced by the rollback procedure due to “overoptimistic” progression of the simulation is the dominating performance factor. To limit optimism to an extent that can be justified from the inherent model parallelism, an optimism control mechanism is proposed, which by maintaining a history record of virtual time differences from the time stamps carried by arriving messages, and forecasting the timestamps of forthcoming messages, probabilistically delays the execution of scheduled events to avoid potential rollback and associated communication overhead (antimessages). After investigating statistical forecast methods which express only the central tendency of the arrival process, we demonstrate that arrival processes in the context of Time Warp simulations of timed Petri nets have certain predictable and consistent ARIMA characteristics, which encourage the use of sophisticated and recursive forecast procedures based on those models. Adaptiveness is achieved in two respects: the synchronization behavior of logical processes automatically adjusts to that point in the continuum between optimistically progressing and conservatively blocking, that is the most adequate for (i) the specific simulation model and (ii) the communication/computation speed characteristics of the underlying execution platform
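A drastically simplified stand-in for the forecast mechanism, using exponential smoothing of timestamp gaps instead of the ARIMA models the paper argues for; all constants and names are illustrative.

    # Illustrative forecast-and-delay rule using exponential smoothing.
    ALPHA = 0.3     # smoothing constant (illustrative)

    class ArrivalForecaster:
        def __init__(self, first_ts):
            self.prev = float(first_ts)
            self.trend = 0.0             # smoothed inter-arrival timestamp gap

        def observe(self, ts):
            self.trend = ALPHA * (ts - self.prev) + (1 - ALPHA) * self.trend
            self.prev = float(ts)

        def next_arrival(self):
            return self.prev + self.trend

        def should_delay(self, event_ts):
            # Executing past the forecast arrival risks a rollback, so the
            # scheduled event is (probabilistically) held back.
            return event_ts > self.next_arrival()

    f = ArrivalForecaster(first_ts=10)
    for ts in (14, 18, 23):
        f.observe(ts)
    print(f.should_delay(40), f.should_delay(25))   # True False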
Conference Paper
This paper proposes a critical path-like analyzer to predict the amount of memory consumed in a specific Chandy-Misra simulation: Segments of code are inserted into the existing sequential simulation program, and this modified simulation program is called a memory analyzer. The amount of memory needed in the corresponding Chandy-Misra simulation is computed along with the execution of the memory analyzer. Experiments to evaluate the analyzer are in progress
Conference Paper
Full-text available
The main performance pitfall of the Time Warp distributed discrete event simulation (DDES) protocol has been widely recognized to be the overoptimistic progression of event execution into the simulated future. The premature execution of events that eventually have to be “rolled back” due to causality violations induces memory and communication overheads as sources of performance inefficiencies. Optimistic Time Windows and self adaptive mechanisms have been proposed in the literature to control the optimism in Time Warp in order to improve or optimize its execution performance. An adaptive optimism control mechanism based on the observed model parallelism is proposed. Methodologically, logical processes (LPs) monitor the local virtual time (LVT) progression per unit CPU time from the timestamp of arriving messages and establish a cost model for the tradeoff between optimistically progressing and conservatively blocking the simulation engine. Compared to previous approaches, an optimal CPU delay interval is computed from the rollback probability and the overhead induced by the rollback procedure, such that the LP can adapt the synchronization behavior to the amount of optimism that can be justified from the parallelism inherent in the simulation model. Experiments with an implementation on a distributed memory multiprocessor (iPSC/860) show that the protocol is able to automatically adjust the local virtual time progression such that rollback overhead is minimized, and that the original Time Warp protocol can be outperformed
Article
Complex models may have model components distributed over a network and generally require significant execution times. The field of parallel and distributed simulation has grown over the past fifteen years to accommodate the need of simulating the complex models using a distributed versus sequential method. In particular, asynchronous parallel discrete event simulation (PDES) has been widely studied, and yet we envision greater acceptance of this methodology as more readers are exposed to PDES introductions that carefully integrate real-world applications. With this in mind, we present two key methodologies (conservative and optimistic) which have been adopted as solutions to PDES systems. We discuss PDES terminology and methodology under the umbrella of the personal communications services application
Article
This manual gives an introduction to writing parallel discrete event simulation programs for the Georgia Tech Time Warp (GTW) system (version 3.1). Time Warp is a synchronization mechanism for parallel discrete event simulation programs. GTW is a Time Warp simulation kernel implemented on distributed networks of uniprocessor and shared memory multiprocessor machines.
Article
this memory to the application. It is functionally identical to the standard malloc() procedure defined in the C library. TWMalloc() should only be called during the initialization phase of the simulation. Applications calling TWMalloc() during the simulation itself should still execute correctly provided sufficient memory is available in the system. However, calling TWMalloc() in this way suffers from the following problems: 1. If an event invoking TWMalloc() is rolled back, the memory will be lost; at present, the kernel does not automatically reclaim it. 2. The current implementation of TWMalloc() calls malloc() to allocate memory. The
Article
Full-text available
The main performance pitfall of the Time Warp distributed discrete event simulation (DDES) protocol has been widely recognized to be the overoptimistic progression of event execution into the simulated future. The premature execution of events that eventually have to be "rolled back" due to causality violations induces memory and communication overheads as sources of performance inefficiencies. Optimistic Time Windows and self adaptive mechanisms have been proposed in the literature to control the optimism in Time Warp in order to improve or optimize its execution performance.
Article
Full-text available
The Time Warp mechanism is a protocol for synchronising a distributed computation of message-passing processes. Since its introduction in 1985 it has received attention specifically as a mechanism for synchronising a distributed discrete event simulation. The Time Warp mechanism is conceptually simple and has many attractive features.
Article
I.6.7 [Simulation and Modeling]: Simulation Support Systems. General Terms: Experimentation, Measurement, Performance. A preliminary version of this work appeared in the Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1994.
Conference Paper
Full-text available
This paper describes the Time Warp Operating System, under development for three years at the Jet Propulsion Laboratory for the Caltech Mark III Hypercube multiprocessor. Its primary goal is concurrent execution of large, irregular discrete event simulations at maximum speed. It also supports any other distributed applications that are synchronized by virtual time. The Time Warp Operating System includes a complete implementation of the Time Warp mechanism, and is a substantial departure from conventional operating systems in that it performs synchronization by a general distributed process rollback mechanism. The use of general rollback forces a rethinking of many aspects of operating system design, including programming interface, scheduling, message routing and queueing, storage management, flow control, and commitment. In this paper we review the mechanics of Time Warp, describe the TWOS operating system, show how to construct simulations in object-oriented form to run under TWOS, and offer a qualitative comparison of Time Warp to the Chandy-Misra method of distributed simulation. We also include details of two benchmark simulations and preliminary measurements of time-to-completion, speedup, rollback rate, and antimessage rate, all as functions of the number of processors used.
Article
Full-text available
The problem of system simulation is typically solved in a sequential manner due to the wide and intensive sharing of variables by all parts of the system. We propose a distributed solution where processes communicate only through messages with their neighbors; there are no shared variables and there is no central process for message routing or process scheduling. Deadlock is avoided in this system despite the absence of global control. Each process in the solution requires only a limited amount of memory. The correctness of a distributed system is proven by proving the correctness of each of its component processes and then using inductive arguments. The proposed solution has been empirically found to be efficient in preliminary studies. The paper presents formal, detailed proofs of correctness.
Article
Although many distributed simulation strategies have been developed, to data, little empirical data is available to evaluate their performance. A multiprocessor-based, distributed simulation testbed is described that was designed to facilitate controlled experimentation with distributed simulation algorithms. Using this testbed, the performance of simulation strategies using deadlock avoidance and deadlock detection and recovery techniques was examined under various synthetic workloads. The distributed simulators were compared with a uniprocessor-based event list implementation. Results of a series of experiments are reported that demonstrate that message population and the degree to which processes can look ahead in simulated time play critical roles in the performance of distributed simulators using these algorithms. An avalanche phenomenon was observed in the deadlock detection and recovery simulators as message population was increased, and was found to be a necessary condition for achieving good performance. It is demonstrated that these distributed simulation algorithms can provide significant speedups over sequential event list implementations for some workloads, even in the presence of only a moderate amount of parallelism and many feedback loops. However, a moderate to high degree of parallelism was not sufficient to guarantee good performance for all workloads that were tested.
Article
An approach to carrying out asynchronous, distributed simulation on multiprocessor message-passing architectures is presented. This scheme differs from other distributed simulation schemes because (1) the amount of memory required by all processors together is bounded and is no more than the amount required in sequential simulation and (2) the multiprocessor network is allowed to deadlock, the deadlock is detected, and then the deadlock is broken. Proofs for the correctness of this approach are outlined.
Article
Traditional discrete-event simulations employ an inherently sequential algorithm. In practice, simulations of large systems are limited by this sequentiality, because only a modest number of events can be simulated. Distributed discrete-event simulation (carried out on a network of processors with asynchronous message-communicating capabilities) is proposed as an alternative; it may provide better performance by partitioning the simulation among the component processors. The basic distributed simulation scheme, which uses time encoding, is described. Its major shortcoming is a possibility of deadlock. Several techniques for deadlock avoidance and deadlock detection are suggested. The focus of this work is on the theory of distributed discrete-event simulation.
Conference Paper
In this paper we introduce a new analytical approach to modeling the performance of systems synchronized by timestamp mechanisms, including database systems. We define the virtual time - real time (T-V) plane, and an important kind of stochastic process that we call linear Poisson processes. We show how to calculate the rate of preemption (corresponding to the rate of abortion or rollback in concurrency control mechanisms) and the waiting time until last preemption (corresponding to commit time) for linear Poisson processes. Finally, we apply this theory, analyzing one example system synchronized by the Time Warp mechanism.
Conference Paper
A variation of the Time Warp parallel discrete event simulation mechanism is presented that is optimized for execution on a shared memory multiprocessor. In particular, the direct cancellation mechanism is proposed that eliminates the need for anti-messages and provides an efficient mechanism for cancelling erroneous computations. The mechanism thereby eliminates many of the overheads associated with conventional, message-based implementations of Time Warp. More importantly, this mechanism effects rapid repairs of the parallel computation when an error is discovered. Initial performance measurements of an implementation of the mechanism executing on a BBN Butterfly multiprocessor are presented. These measurements indicate that the mechanism achieves good performance, particularly for many workloads where conservative clock synchronization algorithms perform poorly. Speedups as high as 56.8 using 64 processors were obtained. However, our studies also indicate that state saving overheads represent a significant stumbling block for many parallel simulations using Time Warp.
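Direct cancellation can be sketched with plain object references: on a shared memory machine an event can keep pointers to the events it scheduled, so undoing it cancels its descendants in place rather than chasing them with anti-messages. The Event class below is an illustrative toy, not the paper's implementation.

    # Illustrative sketch of direct cancellation via object references.
    class Event:
        def __init__(self, ts):
            self.ts = ts
            self.children = []           # events this event scheduled
            self.cancelled = False

        def schedule(self, child):
            self.children.append(child)
            return child

        def cancel(self):
            # Follow the pointers directly; no anti-messages are enqueued.
            self.cancelled = True
            for c in self.children:
                if not c.cancelled:
                    c.cancel()

    root = Event(10)
    leaf = root.schedule(Event(12)).schedule(Event(15))
    root.cancel()
    print(leaf.cancelled)   # True: cancellation propagated via pointers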
Article
We address the question of whether one can find a worst-case example simulation model, on which the Time Warp approach to parallel discrete event simulation can arbitrarily outperform the Chandy-Misra conservative methods - or vice versa. Under our simplifying assumptions, we prove that: (1) there exists a p-process simulation model on which Time Warp outperforms Chandy-Misra by a factor of p, and that; (2) no opposite example exists; Chandy-Misra can only outperform Time Warp by a constant factor.
Performance of the Colliding Pucks Simulation on the Time Warp Operating System (Part I: Asynchronous behavior and sectoring)
  • Philip Hontalas
  • Brian Beckman
  • Mike Diloreto
  • Leo Blume
  • Peter Reiher
  • Kathy Sturdevant
  • L Van Warren
  • John Wedel
  • Fred Wieland
  • David Jefferson
SCS Conf. Dist. Sim., Vol. 22, No. 2, SCS, Jan. 1990. [Gafni 85] Anat Gafni, "Space Management and Cancellation Mechanisms for Time Warp", Ph.D. Diss., Dept. of Comp. Sci., USC, TR-85-341, Dec. 1985. [Hontalas 89a] Philip Hontalas, Brian Beckman, Mike DiLoreto, Leo Blume, Peter Reiher, Kathy Sturdevant, L. Van Warren, John Wedel, Fred Wieland, and David Jefferson, "Performance of the Colliding Pucks Simulation on the Time Warp Operating System (Part I: Asynchronous Behavior and Sectoring)", Proc. 1989 SCS Conf. on Dist. Sim., Sim. Series Vol. 21, No. 2, SCS, San Diego, 1989.
Performance of the Colliding Pucks Simulation on the Time Warp Operating System (Part II: Detailed Analysis)
  • Philip Hontalas
  • Brian Beckman
  • David Jefferson
[Hontalas 89b] Philip Hontalas, Brian Beckman, and David Jefferson, "Performance of the Colliding Pucks Simulation on the Time Warp Operating System (Part II: Detailed Analysis)", Proc. SCS Summer Comp. Sim. Conf., Austin, Texas, July 1989.
The effect of feedback on the performance of conservative algorithms
  • Edwina Leung
  • John Cleary
  • Greg Lomow
  • Dirk Baezner
  • Brian Unger
[Leung 89] Edwina Leung, John Cleary, Greg Lomow, Dirk Baezner, and Brian Unger, "The effect of feedback on the performance of conservative algorithms", Proc. 1989 SCS Conf. on Dist. Sim., Sim. Series Vol. 21, No. 2, SCS, San Diego, 1989.
Scalability of the bounded lag distributed discrete event simulation
[Lubachevsky 89] Boris Lubachevsky, "Scalability of the bounded lag distributed discrete event simulation", Proc. 1989 SCS Conf. on Dist. Sim., Sim. Series Vol. 21, No. 2, SCS, San Diego, 1989.
The Performance of Distributed Combat Simulation with the Time Warp Operating System
  • Abraham Feinberg
  • Michael Diloreto
  • Leo Bloom
  • Joseph Ruffles
  • Peter Reiher
  • Brian Beckman
  • Phil Hontalas
  • Steve Bellenot
  • David Jefferson
Abraham Feinberg, Michael DiLoreto, Leo Bloom, Joseph Ruffles, Peter Reiher, Brian Beckman, Phil Hontalas, Steve Bellenot, and David Jefferson, "The Performance of Distributed Combat Simulation with the Time Warp Operating System", Concurrency Practice and Experience, Vol. 1, No. 1, Sept. 1989