Chapter

Minimizing the Null Message Exchange in Conservative Distributed Simulation


Abstract

The performance of a conservative time management algorithm in a distributed simulation system degrades significantly if a large number of null messages are exchanged across the logical processes in order to avoid deadlock. This situation becomes more severe when the exchange of null messages increases because of a poor selection of key parameters such as lookahead values. However, with a mathematical model that approximates the optimal values of the parameters directly involved in the performance of a time management algorithm, we can limit the exchange of null messages. Reducing the exchange of null messages greatly improves the performance of the time management algorithm by both minimizing the transmission overhead and maintaining consistent parallelization. This paper presents a generic mathematical model that can be effectively used to evaluate the performance of a conservative distributed simulation system that uses null messages to avoid deadlock. Since the proposed mathematical model is generic, the performance of any conservative synchronization algorithm can be approximated. In addition, we develop a performance model that demonstrates how a conservative distributed simulation system performs with the null message algorithm (NMA). The simulation results show that the performance of a conservative distributed system degrades if the NMA generates an excessive number of null messages due to an improper selection of parameters. The proposed mathematical model also highlights the critical role of lookahead, which may increase or decrease the number of null messages exchanged across the logical processes. Furthermore, the proposed mathematical model is not limited to the NMA: it can be used with any conservative synchronization algorithm to approximate the optimal values of parameters.
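To make the lookahead effect concrete, here is a toy Python model. It is our own simplification, not the chapter's mathematical model: LPs form a ring, LP 0 holds the only real event at time T, and in each synchronous round every LP sends one null message stamped with its lower bound plus the lookahead L. The function name `null_messages_until_safe` and the round-based scheduling are hypothetical.

```python
import math


def null_messages_until_safe(n_lps: int, lookahead: float, event_time: float) -> int:
    """Count null messages sent in a ring of LPs before LP 0 may process an
    event with timestamp `event_time`.

    Toy, round-based model (an assumption, not the paper's model): LP i's only
    input comes from LP i-1, no real messages are in flight, and in every round
    each LP sends one null message stamped min(channel clock, next event) + L.
    """
    if lookahead <= 0:
        raise ValueError("lookahead must be positive; with L = 0 the ring never advances")
    channel_clock = [0.0] * n_lps          # clock of each LP's single input channel
    next_event = [math.inf] * n_lps        # only LP 0 holds a pending real event
    next_event[0] = event_time
    sent = 0
    while channel_clock[0] < event_time:   # LP 0 stays blocked until its input reaches T
        updated = channel_clock[:]
        for i in range(n_lps):
            stamp = min(channel_clock[i], next_event[i]) + lookahead
            updated[(i + 1) % n_lps] = stamp  # null message to the successor LP
            sent += 1
        channel_clock = updated
    return sent


for L in (1.0, 2.5, 5.0):
    print(L, null_messages_until_safe(n_lps=2, lookahead=L, event_time=10.0))
```

In this toy setting every channel clock grows by L per round, so the count works out to n·⌈T/L⌉. Halving the lookahead roughly doubles the null traffic, which matches the qualitative trend the abstract describes.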
... While there is a considerable literature exploring how a poor selection of critical parameters might result in poor performance of PDES systems [8, 12], surprisingly little work has examined how these critical parameters affect the performance of PDES systems. These works indicate a strong relationship among several critical parameters, such as Lookahead and frequency of transmission, that one may use to quantify the impact of these parameters on PDES performance. ...
... Cota and Sargent [7] focused on the skew in simulation time between different LPs by exploiting knowledge about the LPs and the topology of their interconnections. Although much research has been done to evaluate the conservative NMA in terms of inefficiencies and transmission overhead [3, 8, 12], none of these works suggests any potential optimization for the NMA. Reference [12] proposed a new approach that relates several parameters to one another in order to quantify the performance of PDES systems running under the NMA. ...
... It has been shown that the values selected for several critical parameters, such as Lookahead, the null message ratio (NMR), and the frequency of transmission, play an important role in the generation of null messages [12]. ...
Conference Paper
The null message algorithm is an important conservative time management protocol in parallel discrete event simulation systems; it provides synchronization between distributed computers and is capable of both avoiding and resolving deadlock. However, the excessive generation of null messages prevents the widespread use of this algorithm. This excessive generation results from an improper choice of critical parameters such as the frequency of transmission and the Lookahead values. If the generation of null messages could be minimized, most parallel discrete event simulation systems would likely take advantage of this algorithm to gain increased system throughput and minimal transmission delays. In this paper, a new mathematical model for optimizing the performance of parallel and distributed simulation systems is proposed. The proposed mathematical model utilizes various optimization techniques, such as variance of null message elimination, to improve the performance of parallel and distributed simulation systems. In the simulations, we consider both uniform and non-uniform distributions of Lookahead values across multiple output lines of an LP. Our experimental verification demonstrates that an optimal NMA offers better scalability in parallel discrete event simulation systems if it is used with a proper selection of critical parameters.
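The uniform vs. non-uniform Lookahead comparison can be illustrated with a back-of-the-envelope Python estimate. This is our own assumption and not the paper's model: each output line j must advance its receiver's channel clock up to a horizon H, and every null message on that line extends the guarantee by exactly its own lookahead L_j, giving ⌈H/L_j⌉ messages per line. The function name `null_messages_per_line` is hypothetical.

```python
import math
from typing import Sequence


def null_messages_per_line(horizon: float, lookaheads: Sequence[float]) -> list[int]:
    """Rough per-output-line null message count (illustrative assumption).

    Assumes output line j must push its receiver's channel clock from 0 up to
    `horizon`, and that each null message on that line extends the guarantee
    by exactly its own lookahead L_j, giving ceil(horizon / L_j) messages.
    """
    if any(L <= 0 for L in lookaheads):
        raise ValueError("every lookahead must be positive")
    return [math.ceil(horizon / L) for L in lookaheads]


uniform = null_messages_per_line(10.0, [2.0, 2.0, 2.0])
non_uniform = null_messages_per_line(10.0, [1.0, 2.0, 3.0])  # same mean lookahead
print(sum(uniform), sum(non_uniform))
```

Under this estimate the cost per line is about H/L_j. Because 1/L is convex, spreading the lookahead values around the same mean cannot lower the unrounded total (Jensen's inequality). The line with the smallest lookahead dominates the null traffic.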
... Conservative protocols fundamentally maintain causality in event execution by strictly disallowing the processing of events out of time-stamp order [4]. Some recent research on conservative algorithms in DES can be found in [11][12][13][14]. An effort to combine conservative and optimistic synchronization algorithms on a common layered architecture framework is proposed in [15]. ...
Article
This paper presents a new logical process (LP) simulation model for distributed simulation systems in which the Null Message Algorithm (NMA) is used as the underlying time management algorithm (TMA) to provide synchronization among LPs. To extend the proposed simulation model to n LPs, this paper provides a detailed overview of the internal architecture of each LP and its coordination with the other LPs through subsystem components and models such as the communication interface and the simulation executive. The proposed LP architecture describes the sequence of coordination that must be carried out among LPs through the different subsystem components and models to achieve synchronization. To execute the proposed LP simulation model for different sets of parameters, a queuing network model is used. Experiments will be performed to verify the accuracy of the proposed simulation model using the previously derived mathematical equations. Our numerical and simulation results can be used to observe the exchange of null messages and the overhead indices.
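As a rough illustration of what sits inside one LP under the NMA, here is a minimal Python sketch of a Chandy-Misra-Bryant style LP. It is not the paper's architecture: the communication interface and simulation executive are collapsed into one hypothetical class, `LogicalProcess`, with per-input channel clocks, a pending-event heap, a safe-time test, and the timestamp the LP would put on its next null message.

```python
import heapq
import math
from typing import Iterable


class LogicalProcess:
    """Minimal Chandy-Misra-Bryant style LP (illustrative sketch only).

    `inputs` names the upstream LPs; each input channel keeps a clock equal to
    the timestamp of the last message (real or null) received on it.
    """

    def __init__(self, name: str, inputs: Iterable[str], lookahead: float) -> None:
        self.name = name
        self.lookahead = lookahead
        self.channel_clocks = {src: 0.0 for src in inputs}
        self.events: list[float] = []  # min-heap of pending real-event timestamps
        self.clock = 0.0

    def receive(self, sender: str, timestamp: float, is_null: bool) -> None:
        # Channels are FIFO with nondecreasing timestamps, so the latest
        # timestamp bounds everything the sender can still send on this channel.
        if timestamp < self.channel_clocks[sender]:
            raise ValueError("out-of-order message on a FIFO channel")
        self.channel_clocks[sender] = timestamp
        if not is_null:
            heapq.heappush(self.events, timestamp)

    def safe_time(self) -> float:
        # No input channel can deliver anything earlier than its channel clock.
        return min(self.channel_clocks.values(), default=math.inf)

    def process_safe_events(self) -> list[float]:
        # Ties with the bound are processed (a common simplification).
        bound = self.safe_time()
        done = []
        while self.events and self.events[0] <= bound:
            self.clock = heapq.heappop(self.events)
            done.append(self.clock)
        return done

    def null_message_timestamp(self) -> float:
        # Promise to successors: nothing will be sent earlier than this time.
        next_event = self.events[0] if self.events else math.inf
        return min(self.safe_time(), next_event) + self.lookahead


lp = LogicalProcess("B", inputs=["A", "C"], lookahead=2.0)
lp.receive("A", 5.0, is_null=False)  # real event at t = 5
lp.receive("C", 3.0, is_null=True)   # null message: C promises nothing before t = 3
```

Here the event at t = 5 is blocked until a later null message from C pushes that channel's clock to at least 5. This blocking is exactly what the exchange of null messages is meant to resolve.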
... By its nature, it requires the definition of artificial events so that the simulation can proceed. The number of such messages introduced by the synchronization algorithm can be very large [74,75]. This communication overhead has a significant effect on the wall-clock time (WCT). ...
Article
Recent advances in computing architectures and networking are bringing parallel computing systems to the masses, thus increasing the number of potential users of these kinds of systems. In particular, two important technological evolutions are happening at the two ends of the computing spectrum: at the “small” scale, processors now include an increasing number of independent execution units (cores), to the point that a single CPU can be considered a parallel shared-memory computer; at the “large” scale, the Cloud Computing paradigm allows applications to scale by offering resources from a large pool on a pay-as-you-go model. Multi-core processors and Clouds both require applications to be suitably modified to take advantage of the features they provide. Despite lying at opposite extremes of the computing architecture spectrum (multi-core processors at the small scale, Clouds at the large scale), they share an important common trait: both are specific forms of parallel/distributed architectures. As such, they present developers with well-known problems of synchronization, communication, workload distribution, and so on. Is parallel and distributed simulation ready for these challenges? In this paper, we analyze the state of the art of parallel and distributed simulation techniques and assess their applicability to multi-core architectures or Clouds. It turns out that most current approaches exhibit limitations in terms of usability and adaptivity, which may hinder their application to these new computing architectures. We propose an adaptive simulation mechanism, based on the multi-agent system paradigm, to partially address some of those limitations. While it is unlikely that a single approach will work well in both settings, we argue that the proposed adaptive mechanism has useful features that make it attractive both on a multi-core processor and in a Cloud system.
These features include the ability to reduce communication costs by migrating simulation components, and support for adding (or removing) nodes to the execution architecture at runtime. We will also show that, with the help of an additional support layer, parallel and distributed simulations can be executed on top of unreliable resources.
... without any semantic content) with the aim of making the simulation proceed and avoiding deadlock. The number of such events introduced by the synchronization algorithm can be very large [18], [38]. Over the years, many variants have been proposed to reduce the number of such events [41], but the amount of extra communication needed to deliver them can still be prohibitive. ...
Conference Paper
Today, two main changes are revolutionizing the execution architectures used to run simulations: at the bottom level, processors (CPUs) are gaining more and more cores, while at the high level we need to cope with virtual resources, i.e., "everything as a service" in a Public Cloud infrastructure. Given the magnitude of these changes, what is going to happen to simulation? What are the (many) limits of current approaches, technologies, and tools? Is it possible to finally find a solution to some of the many problems of PADS while broadening its scope? In this tutorial we aim to introduce all the basic aspects of these subjects and to discuss the main drawbacks of the current approaches. The main aim of the tutorial is to foster discussion and to increase awareness of some currently undervalued technologies. The last part of the tutorial will cover our practical experience in the development of the ARTACE simulation middleware and in the proposal of a new paradigm for adaptive distributed simulation (called GAIA) that could tackle some of the issues described above. The tutorial will conclude with some examples derived from our experience in the performance evaluation of complex systems.
... One of the main problems associated with distributed simulation is the synchronization of a distributed execution. If not properly handled, synchronization problems may degrade the performance of a distributed simulation environment [5]. The situation becomes more severe when the synchronization algorithm must support a detailed logistics simulation that processes a huge amount of data in a distributed environment [6]. ...
Conference Paper
Mattern’s GVT algorithm is a time management algorithm that helps achieve synchronization in parallel and distributed systems. This algorithm uses a ring structure to establish cuts C1 and C2 to calculate the GVT. The latency of calculating the GVT is critical in parallel/distributed systems, and it is extremely high when the GVT is calculated with this algorithm. However, using synchronous barriers with Mattern’s algorithm can help improve the GVT computation process by minimizing the GVT latency. In this paper, we incorporate the butterfly barrier to establish the two cuts C1 and C2 and obtain the resulting GVT at an affordable latency. Our analysis shows that the proposed GVT computation algorithm significantly improves the overall performance in terms of memory saving and latency.
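The latency advantage of a butterfly barrier over a ring comes from its communication pattern. A ring needs n - 1 hops to combine n values, while a butterfly lets every process hold the global minimum after log2(n) pairwise exchange rounds. The following Python sketch simulates only that min-reduction step. It assumes each process has already folded the smallest timestamp of its in-transit messages into its local value; Mattern's message coloring and the cuts C1 and C2 are not shown, and the function name `butterfly_min` is hypothetical.

```python
from typing import Sequence


def butterfly_min(local_values: Sequence[float]) -> tuple[list[float], int]:
    """Simulate a butterfly all-reduce of minima over n = 2**k processes.

    Each entry of `local_values` is assumed to be a process's local lower bound
    (its LVT combined with the smallest timestamp of its in-transit messages).
    Returns every process's final value and the number of pairwise rounds.
    """
    n = len(local_values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two process count")
    values = list(local_values)
    rounds = 0
    stride = 1
    while stride < n:
        # In each round, process i exchanges its value with partner i XOR stride
        # and both keep the smaller one.
        values = [min(values[i], values[i ^ stride]) for i in range(n)]
        stride <<= 1
        rounds += 1
    return values, rounds


lvts = [12.0, 7.5, 9.0, 30.0, 8.0, 7.0, 15.0, 11.0]
gvt_estimates, rounds = butterfly_min(lvts)
```

With eight processes, the sketch needs three rounds, and every process ends up holding the same GVT estimate. A ring-based reduction of the same values would take seven sequential hops.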
... A framework is presented on which distributed discrete event simulations can be built for applications that can be decomposed into feed-forward and feedback networks. Another notable work mentioned in [2] is the research by Syed S. Rizvi, K. M. Elleithy, and Aasia Riasat, in which they proposed a mathematical model which ...
Conference Paper
In this paper we investigate the Chandy-Misra-Bryant null message algorithm and propose a grouping technique to improve its performance. This technique, along with status retrieval (explained in detail in the paper), can improve performance compared to the traditional conservative algorithm by Chandy, Misra, and Bryant. The null message algorithm is an efficient conservative algorithm that uses null messages to provide synchronization between logical processes (LPs) in a parallel discrete event simulation (PDES) system. Performance can decrease if a large number of null messages are generated by LPs to avoid deadlock. The main objective of this research work is to propose a new grouping technique that can be used to reduce the number of null messages exchanged between the logical processes. Since the performance of the null message algorithm mainly depends on the Lookahead (L) values, our proposed technique can be used to determine an optimum value of the Lookahead.
... without any semantic content) with the aim of making the simulation proceed and avoiding deadlock. The number of such events introduced by the synchronization algorithm can be very large [18], [38]. Over the years, many variants have been proposed to reduce the number of such events [41], but the amount of extra communication needed to deliver them can still be prohibitive. ...
Article
In this tutorial paper, we first review some basic simulation concepts and then introduce parallel and distributed simulation techniques in view of some new challenges of today and tomorrow. In particular, in recent years there has been a wide diffusion of many-core architectures, and we can expect this trend to continue. On the other hand, the success of cloud computing is strongly promoting the everything-as-a-service paradigm. Is parallel and distributed simulation ready for these new challenges? The current approaches present many limitations in terms of usability and adaptivity: there is a strong need for new evaluation metrics and for revising the currently implemented mechanisms. In the last part of the paper, we propose a new approach based on multi-agent systems for the simulation of complex systems. It is possible to implement advanced techniques such as the migration of simulated entities in order to build mechanisms that are both adaptive and very easy to use. Adaptive mechanisms are able to significantly reduce the communication cost in parallel/distributed architectures, to implement load-balancing techniques, and to cope with execution environments that are both variable and dynamic. Finally, such mechanisms will be used to build simulations on top of unreliable cloud services.
Conference Paper
The performance of a conservative time management algorithm in a distributed simulation system degrades significantly if a large number of null messages are exchanged across the logical processes in order to avoid deadlock. This situation becomes more severe when the exchange of null messages increases because of a poor selection of key parameters such as lookahead values. This paper presents a generic mathematical model that can be used to evaluate the performance of a conservative distributed simulation system that uses null messages to avoid deadlock. Since the proposed mathematical model is generic, the performance of any conservative synchronization algorithm can be approximated. In addition, we develop a performance model that demonstrates how a conservative distributed simulation system performs with the null message algorithm (NMA). The simulation results show that the performance of a distributed system degrades if the NMA generates an excessive number of null messages due to an improper selection of parameters.
Conference Paper
This paper presents a computing technique for efficient parallel simulation of large-scale discrete-event models on the IBM Cell Broadband Engine (CBE), which has one Power Processor Element (PPE) and eight Synergistic Processing Elements (SPEs). Based on the general-purpose Discrete Event System Specification (DEVS), the technique tackles all performance bottlenecks, combining multi-dimensional parallelism and various optimizations. Preliminary experiments have produced very promising results, attaining speedups of up to 134.34 and 41.23 over the baseline implementations on the PPE and on an Intel Core2 Duo E6400 processor, respectively. The methods can also be applied to other multicore and shared-memory architectures. We conclude that the technique not only allows discrete-event simulation users to tap CBE potential without being distracted by multicore programming, but also provides insight into the migration of legacy software to current and future multicore platforms.
Conference Paper
This paper focuses on conservative simulation using distributed shared memory for inter-processor communication. JavaSpaces, a special service of Java Jini, provides a shared persistent memory for simulation message communication among processors. Two benchmark programs written using our SPaDES/Java parallel simulation library are used. The first program is a linear pipeline system representing a loosely-coupled open system. The second, PHOLD, represents a strongly-connected closed system. Experiments are carried out using a cluster of Pentium II PCs. We used a combination of Wood and Turner's carrier-null, flushing, and demand-driven algorithms for null message synchronization. To optimize message communication, we replace the SPaDES/Java inter-processor communication implemented using Java's Remote Method Invocation (RMI) with one JavaSpace. For PHOLD (16x16, 16) running on eight processors, this change reduces simulation runtime by more than half, reduces null message overhead by a further 15%, and more than doubles the event rate. Based on our memory analysis methodology, the memory cost of null message synchronization for PHOLD is less than 9% of the total memory needed by the simulation.
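For readers unfamiliar with the PHOLD benchmark, here is a minimal sequential Python sketch of its usual form. A fixed population of jobs circulates among LPs, and each processed event schedules exactly one new event at a randomly chosen LP, with a timestamp increment made of a fixed lookahead plus an exponential delay. The function name `phold` and its parameters (`lookahead`, `mean_delay`, `seed`) are our own assumptions. The sketch does not reproduce the paper's PHOLD (16x16, 16) configuration or any distributed synchronization; it only produces the reference event ordering that a conservative parallel run must match.

```python
import heapq
import random


def phold(num_lps: int, jobs_per_lp: int, end_time: float,
          lookahead: float = 1.0, mean_delay: float = 1.0, seed: int = 0):
    """Sequential PHOLD reference run (illustrative sketch).

    Returns the processed (timestamp, lp) pairs in execution order and the
    number of jobs still pending when simulated time reaches `end_time`.
    """
    rng = random.Random(seed)
    pending = []  # min-heap of (timestamp, destination LP)
    for lp in range(num_lps):
        for _ in range(jobs_per_lp):
            heapq.heappush(pending, (rng.expovariate(1.0 / mean_delay), lp))
    processed = []
    while pending and pending[0][0] < end_time:
        t, lp = heapq.heappop(pending)
        processed.append((t, lp))
        # Each event schedules exactly one new event, so the job population is
        # constant; the fixed lookahead term keeps every increment positive.
        dest = rng.randrange(num_lps)
        heapq.heappush(pending, (t + lookahead + rng.expovariate(1.0 / mean_delay), dest))
    return processed, len(pending)


processed, remaining = phold(num_lps=4, jobs_per_lp=2, end_time=50.0, seed=1)
```

Because every event both consumes and creates one job, PHOLD is a closed system. The lookahead parameter directly controls how far a conservative LP can safely advance ahead of its neighbors.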
Article
The problem of system simulation is typically solved in a sequential manner due to the wide and intensive sharing of variables by all parts of the system. We propose a distributed solution where processes communicate only through messages with their neighbors; there are no shared variables and there is no central process for message routing or process scheduling. Deadlock is avoided in this system despite the absence of global control. Each process in the solution requires only a limited amount of memory. The correctness of a distributed system is proven by proving the correctness of each of its component processes and then using inductive arguments. The proposed solution has been empirically found to be efficient in preliminary studies. The paper presents formal, detailed proofs of correctness.
Article
Discrete simulation is a widely used technique for system performance evaluation. The conventional approach to discrete simulation (e.g., GPSS, Simscript) does not attempt to exploit the parallelism typically available in queueing network models. In this paper, a distributed approach to discrete simulation is presented. It involves the decomposition of a simulation into components and the synchronization of these components by message passing. This approach can result in the speedup of the total time to complete a given simulation if a network of processors is available. The architecture of a microcomputer network suitable for distributed simulation is described and some results concerning the distributed approach are presented.
Conference Paper
The prevention of deadlock in certain types of distributed simulation systems requires special synchronization protocols. These protocols often create an excessive amount of performance-degrading communication; yet a protocol with the minimum amount of communication may not lead to the fastest network finishing time. We propose a protocol that attempts to balance the network's need for auxiliary synchronization information with the cost of providing that information. Using an empirical study, we demonstrate the efficiency of this protocol. Also, we show that the synchronization requirements at different interfaces may vary; an integral part of our proposal assigns a protocol to an interface according to the interface's synchronization needs.
Article
This paper explores several variants of the Chandy-Misra null message algorithm for distributed simulation. The Chandy-Misra algorithm is one of a class of “conservative” algorithms that maintain the correct order of simulation throughout the execution of the model by means of constraints on simulation time advance. The algorithms developed in this paper incorporate an “event-oriented” view of the physical process and message passing. The computational workload required to compute each event is related to the speedup attained over an equivalent sequential simulation. The effects of network topology are investigated, and performance is evaluated for the variants of null message transmission. The performance analysis is supported with empirical results based on an implementation of the algorithm on an Intel iPSC 32-node hypercube multiprocessor. Results show that speedups over sequential simulation of greater than N, using N processors, can be achieved in some circumstances.
Article
Simulation, particularly of networks of queues, is an application with a high degree of inherent parallelism, and is of considerable practical interest. We continue the analysis of synchronization methods for distributed simulation, defined by the taxonomy in our previous paper. Specifically, we develop algorithms for time-driven simulation using a network of processors. For most of the synchronization methods considered, each node $k$ of an $n$-node network simulation cannot proceed directly with its part of a simulation. Rather, it must compute some function $B_k(\nu_1, \nu_2, \ldots, \nu_n)$, where $\nu_i$ is some value which must be obtained from node $i$. The value of $\nu_i$ at each node changes as the simulation progresses, and must be broadcast to every other node for the recomputation of the $B$-functions. In some cases, it is advantageous to compute the $B$-function in a distributed manner. Broadcast algorithms for such distributed computation are presented. Since the performance of a broadcast algorithm depends on the properties of the inter-process communication facility, we characterize some particular cases and give algorithms for each of them.
Conference Paper
An overview of technologies concerned with distributing the execution of simulation programs across multiple processors is presented. Here, particular emphasis is placed on discrete event simulations. The High Level Architecture (HLA) developed by the Department of Defense in the United States is first described to provide a concrete example of a contemporary approach to distributed simulation. The remainder of this paper is focused on time management, a central issue concerning the synchronization of computations on different processors. Time management algorithms broadly fall into two categories, termed conservative and optimistic synchronization. A survey of both conservative and optimistic algorithms is presented focusing on fundamental principles and mechanisms. Finally, time management in the HLA is discussed as a means to illustrate how this standard supports both approaches to synchronization.